Dev transformer #1723

Open
wants to merge 28 commits into
base: master

Conversation

ZaoZhe6666

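Hello. To support a business-side Transformer model, I have added ARM support for the Where, Cast, Unsqueeze, Shape, Not, Equal, and Greater operators to your project. I have also added support for INT32 computation on ARM: the data is converted in place to FLOAT, computed, and then converted back. For a more detailed explanation, see the README_EVA file in the commit and the iwiki link mentioned there. Thank you.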
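
One consequence of routing INT32 computation through FLOAT can be sketched in a few lines. This is a minimal illustration only, assuming IEEE 754 single-precision floats; `RoundTripThroughFloat` and `AddViaFloat` are hypothetical helpers written for this note and are not functions from the PR or from TNN.

```cpp
#include <cstdint>

// Hypothetical helper: convert an int32 to float and back,
// mirroring the "convert, compute, convert back" idea with no compute step.
std::int32_t RoundTripThroughFloat(std::int32_t value) {
    float as_float = static_cast<float>(value);
    return static_cast<std::int32_t>(as_float);
}

// Hypothetical helper: add two int32 values using float arithmetic.
std::int32_t AddViaFloat(std::int32_t a, std::int32_t b) {
    float sum = static_cast<float>(a) + static_cast<float>(b);
    return static_cast<std::int32_t>(sum);
}
```

Small values survive the round trip unchanged (`AddViaFloat(7, 5)` gives 12). A 32-bit float has a 24-bit significand, so integers above 2^24 = 16777216 are not all representable, and a value such as 16777217 comes back as 16777216. Whether this matters depends on the value ranges the model actually produces.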

@@ -470,12 +478,44 @@ Status ArmConcatLayerAcc::Exec(const std::vector<Blob *> &inputs, const std::vec
return TNN_OK;
}

// Change: added a new function TransDataType, which converts data of type T_IN and stores it as type T_OUT
Collaborator

Converting a blob's data type in place may produce incorrect results when the blob is referenced by multiple layers.
Besides, the TransDataType function is defined in multiple layers, which is hard to maintain. Please add an optimizer that inserts cast layers to do the conversion instead.
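
The aliasing concern can be illustrated with a small standalone sketch. It assumes a toy blob type with 4-byte storage slots; `ToyBlob`, `ConvertInt32ToFloatInPlace`, `ReadAsInt32`, and `CastToFloat` are hypothetical names for this illustration and do not correspond to TNN's `Blob` API or to the PR's `TransDataType`.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a blob: a type tag plus raw 4-byte slots.
enum class DType { Int32, Float };

struct ToyBlob {
    DType dtype;
    std::vector<std::uint32_t> storage;  // each slot holds int32 or float bits
};

// In-place conversion: overwrites every slot's int32 bits with float bits.
void ConvertInt32ToFloatInPlace(ToyBlob &blob) {
    for (auto &slot : blob.storage) {
        std::int32_t i;
        std::memcpy(&i, &slot, sizeof(i));
        float f = static_cast<float>(i);
        std::memcpy(&slot, &f, sizeof(f));
    }
    blob.dtype = DType::Float;
}

// A second consumer that still interprets the shared storage as int32.
std::int32_t ReadAsInt32(const ToyBlob &blob, std::size_t idx) {
    std::int32_t i;
    std::memcpy(&i, &blob.storage[idx], sizeof(i));
    return i;
}

// Out-of-place alternative, roughly what an inserted cast layer would do:
// write converted values to a separate buffer and leave the source intact.
std::vector<float> CastToFloat(const ToyBlob &blob) {
    std::vector<float> out;
    out.reserve(blob.storage.size());
    for (auto slot : blob.storage) {
        std::int32_t i;
        std::memcpy(&i, &slot, sizeof(i));
        out.push_back(static_cast<float>(i));
    }
    return out;
}
```

If one layer calls `ConvertInt32ToFloatInPlace` on a blob that another layer later reads with `ReadAsInt32`, the second layer sees float bit patterns instead of the original integers. With `CastToFloat`, the source blob is unchanged and each consumer gets the type it expects, which is the property the suggested cast-inserting optimizer would provide.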

@@ -369,9 +415,39 @@ Status ArmBinaryLayerAcc::ExecInt8(const std::vector<Blob *> &inputs, const std:
return TNN_OK;
}

template <typename T_IN, typename T_OUT>
Collaborator

Same as above.

@bluaxe
Collaborator

bluaxe commented Aug 2, 2022

Including binary files is not recommended. Please resubmit the PR without source/tnn/tnn.zip.

zezhao and others added 19 commits August 3, 2022 10:38
* add logsoftmax kernel and trt layer builder

Signed-off-by: sjfeng1999 <j514681085@icloud.com>

* add logsoftmax unittest

Signed-off-by: sjfeng1999 <j514681085@icloud.com>

* [CUDA][TRT] unpack logsoftmaxPlugin to SoftmaxLayer and UnargLayer

Signed-off-by: sjfeng1999 <j514681085@icloud.com>
* [UPD] fix some about error status again

* [UPD]enable const folder to infer blobs shape for coreml; fix reshape shape size logic;

* [UPD]unify op system;check apple neral engine;

* [UPD]unify op system;check apple neral engine;

* [FIX] reset multi input in network forward for support image classifier demo

* [FIX] fix multi input in network forward

* [FIX] fix const op about weight shape(=1)

* [FIX] fix const op about weight shape(=1) again

* [UPD] update to support multi output forward

* [UPD] update to support split op

* [UPD]fix coreml multi output case; add cache logic;

* [UPD]fix coreml multi output case; add cache logic;

* [UPD]fix coreml multi output case; add cache logic;

* [UPD]fix multi output error

* [FIX] fix pool op about pad

* [UPD] update to support pad op (only allowed for H and W dimensions)

* [UPD]remove blob manager of coreml network

* [UPD]rename coreml_executor to coremlmodel

* [UPD] remove InitCoreMLExecutor

* [FIX] fix to support different input data type (float32 & int32) in forward

* [UPD] update to support expand dims & reduce dims reshape by adding unsqueeze & squeeze

* [UPD]change internal device from metal to arm for device npu

* [FIX] fix conv op about group conv

* [FIX] fix deconv op about group deconv

* [UPD] update to support sub op

* [UPD] update to support clip op

* [UPD] update to support slice op

* [UPD] update to support upsample op

* [FIX] fix slice op about endindex

* [UPD] update to support constant padding, allowed for C , H and W dimensions

* [UPD]fix camera switch device

* [UPD]fix actual device display error

* [UPD]fix cache path

* [UPD] upodate to add sub & slice & clip to project

* [FIX] fix demo use NPU error

* [UPD]fix ocr error

* [FIX] fix upsample op about align_corners

* [FIX] fix upsample op about Fractional scales

* [BUG]fix coreml output nil error; fix upsample nn for fractional scale

* [FIX] fix upsample op about scales order

* [UPD] update to support slice v2 op

* [UPD] update to support tanh v2 op

* [FIX] fix batchnorm op about mean value

* [FIX] fix some annotation

* [BUG]fix upsample error; add shuffle channel coreml layer

* [FIX] fix innerproduct op about inputchannels

* [UPD] remove slicev2 to slice file

* [UPD] remove tanhv2 to slice file

* [UPD] update to reshape op about expand dims & reduce dims

* [UPD] update to innerproduct op adout adding squeeze to reduce dims (in order to match old TNN model)

* [UPD] update to support flatten to 2D op

* [UPD] update to support relu6 op

* [ADD]]add cast coreml layer

* [ADD]]add shape coreml layer

* [UPD] add flatten & relu6 & shuffle_channel to xcode project

* [ADD]]add gather coreml layer

* [ADD]]add gelu coreml layer

* [ADD]]add layernorm coreml layer

* [BUG]support int32 for coreml const layer

* [BUG]support shape input for coreml reshape layer

* [BUG]support model check for TNN_APPLE_NPU_ENABLE using MLComputeUnitsCPUOnly

* [ADD]]add mat_mul coreml layer;

* [UPD] update to support reshape layer when reshape_type = 1

* [UPD] update to coreml model input&output support int32 data tpye

* [FIX] fix reshape layer about reshapedynamic input & output

* [BUG]support mlmodel and mlmodelc for benchmark

* [UPD] update to support conv layer with fp16 data type

* [FIX] add 'APPLE_NPU' to model_check device_type_message

* [FIX] fix some about conv layer with fp16 data type (TNN fp16 -> CoreML fp32)

* [FIX] fix some about const layer with fp16 data type (TNN fp16 -> CoreML fp32)

* [UPD] update to support deconv layer with fp16 data type (TNN fp16 -> CoreML fp32)

* [UPD] update to support innerproduct layer with fp16 data type (TNN fp16 -> CoreML fp32)

* [UPD] update to support batchnorm layer with fp16 data type (TNN fp16 -> CoreML fp32)

* [UPD] update to support layernorm layer with fp16 data type (TNN fp16 -> CoreML fp32)

* [UPD] update to support prelu layer with fp16 data type (TNN fp16 -> CoreML fp32)

* [UPD] update to support matmul layer with fp16 data type (TNN fp16 -> CoreML fp32)

* [FIX] fix some about matmul layer with fp16 data type (TNN fp16 -> CoreML fp32)

* [UPD]support fuse form mul+add to batchnorm

* [BUG]fix import error

* [BUG]fix reshape error

* [BUG] fix reshape layer when reshape_type=1 (input_shape_size = output_shape_size = 4)

* [BUG] fix reshape layer when reshape_type=1 (input_shape_size = output_shape_size = 4)

* [UPD]support ssd

* [ADD]]ssdlite-mobilenetv2 from tf

* [UPD] update to support conv & deconv & const & innerproduct & batchnorm & layernorm & matmul & prelu layers with fp16 data type (TNN fp16 -> CoreML fp16)

* [UPD] update to support batchnorm layers with fp16 data type (TNN fp16 -> CoreML fp16)

* [FIX] set coreml layer default using full precision

* [UPD] update to support hardsigmoid layer

* [UPD] update to support hardswish layer

* [UPD] update to support reducesum layer

* [UPD] update to support reducemean layer

* [UPD] add some coreml layer files to xcode project

* [FIX] fix some annotation about hardswish

* [BUG]fix reshape for tensor with dims size=0

* [UPD]support landscapeleft ui; clear navbar left items

* [UPD]support landscapeleft ui; add stackview to support minor camera preview;

* [ADD]add monodepth demo

* [UPD] update to support unit_test

* [FIX] upload missing download_model.sh and download_model.bat

* [UPD] update concat & conv & shuffle uint_test files for APPLE_NPU

* [FIX] rename unit_test model

* [UPD] update to support softplus layer

* [UPD] update to support softsign layer

* [UPD] update to support div layer

* [UPD] update binary layer unit_test for APPLE_NPU

* [UPD] update to support reducemax layer

* [UPD] update to support reducemin layer

* [UPD]update project file

* [UPD]add log error

* [UPD] update hardswish layer unit_test for APPLE_NPU

* [UPD]add log error

* [UPD] update to skip stride_slice when APPLE_NPU

* [BUG]fix batchnorm unitest

* [BUG]fix prelu unitest

* [BUG]fix prelu unitest

* [BUG]fix prelu unitest

* [BUG] fix unsqueeze unittest

* [BUG] fix split unittest

* [BUG] fix reshape unittest

* [BUG]fix updample unitest

* [BUG] fix reduce op (reducesum/reducemean/reducemax/reducemin) unittest

* [BUG]fix layernorm unitest

* [BUG] fix reduce op unittest again

* [BUG] fix deconv unittest

* [BUG] fix innerproduct unittest

* [BUG]fix ssd demo display error

* [BUG] fix matmul unittest

* [BUG]fix benchmark error to support multiple model in the same directory

* [BUG] add some explanation about reduce op unittest

* [BUG]fix benchmark error to support multiple model in the same directory

* [BUG] add some explanation about reduce op unittest again

* [BUG]fix batchnorm param error

* [BUG] fix reshape layer unittest

* [BUG]fix batchnorm param error

* [BUG]fix conv/deconv input/output channel error

* [UPD] update to support stride_slice & unittest

* [BUG] fix reshape layer unittest when reshape_type = 1

* [BUG] fix reshape layer unittest when reshape_type = 1 using reshapestatic

* [BUG] fix reshape layer unittest using reshapestatic

* [BUG] fix some annotation about reshape layer

* [BUG] fix reshape layer output permute when reshape_type = 1

* [BUG] fix reshape layer using reshapestatic whem reshape_type = 1

* [BUG]fix broadcast layer error for input form constant map; fix bert demo error;

* [BUG]fix blob convert error for int32 mat

* [BUG]fix reshape name style

* [UPD]add tiny bert fixed length 256

* [BUG] fix add layer by binary op base class

* [BUG] fix div/mul/sub layer by binary op base class

* [BUG]fix batchnorm unitest

* [BUG]ensure clean up mlmodelc if error raises when compile

* [UPD]adjust demo list

* [BUG] fix conv layer about activation inplace

* [BUG] fix conv layer about relu6

* [BUG] fix cleanup func none of return

* [BUG] remove repetitive line

* [BUG]fix batchnorm unitest

* [BUG] fix conv layer about relu6 inplace

* [UPD]automatically use apple npu

* [UPD]add clean logic for coreml

* [BUG] fix hardswish layer with 2 inputs

* [UPD] update README.md & support.md about APPLE_NPU

* [UPD]unify rawbuffer2coremlweight

* [UPD]support coreml lstm

* [UPD]fix lstm error

* [UPD]support coreml lstm bidirection

* [UPD]support coreml constofshape

* [UPD]support slice at axis=0

* [UPD]ignore

* [UPD]fix reshape error

* [UPD]fix lstm error; replace suqeeze with reshape because some case suqeeze raise runtime compile error for axis = {3, 4}

* [UPD]fix slice error

* [UPD]support multiple mlmodel in the same dirctory; add autorelease memory, because coreml may need large memory in ocr demo

* ignore

* [UPD]add log msg

* [UPD]fix reshape and slice error

* [UPD]add auto release to model

* [UPD]add auto release to model

* [UPD]unify convertion from rawbuffer to coreml weight param

* [FIX] fix matmul from rawbuffer to coreml weight param

* [UPD]fix innerproduct input channel error

* [BUG] fix matmul weight bug

* remove some annotation

* [BUG] fix matmul layer about fp16

* [FIX] fix sliceV2 op  conflict with master

* [FIX] fix sliceV2 op  conflict with master

* merge master (Tencent#1721)

* Fix trt multistream logger (Tencent#1521)

* [FIX] fix trt logger

* [FIX] catch std::bad_alloc error for trt8 building

* [FIX] return null while shape_tensor size -1

* Update version.h

Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com>

* Update split_utils.cc (Tencent#1528)

I got an error when compiling with mingw32, because the mingw32 compiler still requires the namespace qualifier:
[ 99%] Building CXX object CMakeFiles/TNN.dir/source/tnn/utils/split_utils.cc.obj
D:\TNN\source\tnn\utils\split_utils.cc: In static member function 'static tnn::Status tnn::SplitUtils::SplitStr(const char*, tnn::str_arr&, const char*, bool, bool, bool, bool, bool)':
D:\TNN\source\tnn\utils\split_utils.cc:163:23: error: 'min' was not declared in this scope
             int len = min((i - cursor), subs_length - 1);
I think this change is better: it works with mingw32 and stays compatible with the previously supported compilers.

Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com>

* Update README.md (Tencent#1538)

Typos

* [UPD]update QQ group (Tencent#1552)

* [BUG]fix YouTu face alignment model

* [UPD]update mean pts file logic

* [UPD]draw face points green

* [UPD]unify example controller list

* [UPD]unify example controller list

* [UPD]move blaze anchor file to resource

* [METAL]update tnn project

* [UPD]update tool onnx2coreml

* [ADD]support ShareCommandQueue between instances

* [ADD]support ShareCommandQueue between instances

* [UPD]add log message

* [UPD]transfer file half.hpp

* [UPD]fix xcode compile error with fp16

* [UPD]fix xcode compile error with fp16

* [UPD]update model type erro msg

* [FIX]fix logic error of constofshape

* [UPD]update debug message

* [FIX]fsupport int32 for neg op

* [BUG]fix init error with nil commadbuffer

* [UPD]add mac build xcode project; fix ios mac build script;

* [UPD]add mac build xcode project; fix ios mac build script;

* [ADD]add QQ group 2 of TNN

Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com>

* [opencl][fix] try save program cache (Tencent#1557)

* Dev roi align (Tencent#1511)

* [ARM] fix int32 blob cvt to mat

* [ARM] support roi align

* [ARM] add roi align unit test

* [ARM] add to xcodeproj

Co-authored-by: lucasktian <lucasktian@tencent.com>
Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com>

* Fix arm gather and constant blob (Tencent#1564)

* [ARM][BUG] fix gather error for indice < 0

* [ARM][BUG] fix buffer to blob error without converting precision

* [ARM] update type convert in layer_norm fp16

Co-authored-by: quinnrong94 <67782915+quinnrong94@users.noreply.github.com>

* Dev add config layer (Tencent#1569)

* add config layer param to set arm conv algorithm for specific layer

Co-authored-by: powerpwang <powerpwang@outlook.com>
Co-authored-by: ealinli <ealinli@tencent.com>

* Fix the onnx2tnn build failure caused by the protobuf version upgrade (Tencent#1571)

* [ONNX][BUG]1. fix compile bug;

* [ONNX2TNN][BUG]1. fix the build issue introduced by the protobuf version upgrade;

* [ADD][TOOLS] add dynamic range quantization (Tencent#1572)

* [ADD][TOOLS] support fake quantization

* [UPD][FAKE_QUANT] fix bug

* [UPD][DOC] add fake quantization in doc

* [UPD] 1.rename fake quant to dynamic range quant 2.move dequant to net_optimizer

* [UPD] remove redundant comment

* [UPD] update comment for DynamicRangeDequant

* [DRQuant][UPD] fix namespace issue

* [DRQuant][UPD] Turn off TNN_SYMBOL_HIDE to fix ci

Co-authored-by: ealinli <ealinli@tencent.com>
Co-authored-by: Dandi Ding <bluaxe@users.noreply.github.com>
Co-authored-by: lucasktian <lucasktian@tencent.com>

* [UPD][OPENCL] opencl support using unoptimized conv (Tencent#1581)

Co-authored-by: ealinli <ealinli@tencent.com>

* [UPD][CONVERTER] lstm support sequence_lens (Tencent#1585)

Co-authored-by: ealinli <ealinli@tencent.com>

* [MODEL_CHECK][BUG]1. fix bug for dump layer(fp16); (Tencent#1567)

Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com>

* Bugfix from train branch (Tencent#1592)

* [BUG] fix get dims value bug when input is 1D or 2D in arm_reduce_layer_acc.cc.
* [BUG] fix Convert from NCHW to NHWC error when input is on arm device.
* [BUG] fix convert mat to blob bug when input is NC_INT32 on arm device.
* [BUG] fix tflite_converter bug when transform a activation layer.
* add nchw format condition when copy int32 mat to blob
* rollback changes on tflite_op_converter.cc

Co-authored-by: sanerzheng <sanerzheng@tencent.com>

* [UPD][OPENCL] opencl support x86 mat (Tencent#1593)

Co-authored-by: ealinli <ealinli@tencent.com>

* [CONVERTER][BUG]1. fix issue 1595; (Tencent#1596)

* [UPD][OPENCL] add ocl version check (Tencent#1601)

* [UPD][OPENCL] add ocl version check

* [UPD][OPENCL] update message for vervion check

Co-authored-by: ealinli <ealinli@tencent.com>

* [UPD][OPENCL] solve the problem that matmul, tile have incorrect results on helio p65 (Tencent#1602)

Co-authored-by: ealinli <ealinli@tencent.com>

* [UPD][DYQ] fix dynamic range quant compile error on windows (Tencent#1604)

Co-authored-by: ealinli <ealinli@tencent.com>

* [DOC][UPD] modify image links in doc (Tencent#1617)

Co-authored-by: ealinli <ealinli@tencent.com>

* remove redundant test cases (Tencent#1614)

* Fix typos. (Tencent#1626)

* Fix typos.

* Update Readme.

Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com>

* Interpreter change from std::map to safe_map, later one offers a const operator[] function (Tencent#1618)

Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com>
Co-authored-by: lucasktian <lucasktian@tencent.com>

* [UPD][OPENCL] get opencl version when GpuType is OTHER (Tencent#1636)

* [UPD][OPENCL] get opencl version when GpuType is OTHER

* [UPD][OPENCL] optimize nv gpu judgment logic

Co-authored-by: ealinli <ealinli@tencent.com>

* Patch x86 avx support (Tencent#1633)

* merge dev_vc14_m1_debug, support x86 avx

* add option to support x86 avx2 compile

* update win_x86_opencl building script

Co-authored-by: Dandiding <Dandiding@tencent.com>

* fix x86 avx2 options (Tencent#1638)

* fix typos in doc (Tencent#1634)

Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com>

* [X86][BUG] fix deconv layer build error (Tencent#1641)

* [OPENCL][FIX] fix conv and dwconv on some of the AMD GPUs

* [UPD][OPENCL] add coor check for conv and dwconv

* [OPENCL][FIX] fix compilation issues

* [OPENCL][UPD] optimize AMD GPU judgment logic

Co-authored-by: ealinli <ealinli@tencent.com>

* [OPENCL][UPD] fix deconv, avgpool on AMD GPU (Tencent#1646)

* [OPENCL][UPD] fix deconv and avgpool when read image

* [OPENCL][UPD] add header file for pooling

Co-authored-by: ealinli <ealinli@tencent.com>

* [OPENCL][UPD] opencl support cache on windows (Tencent#1645)

* [UPD][OPENCL] add coor check for conv and dwconv

* [OPENCL][FIX] fix compilation issues

* [OPENCL][UPD] optimize AMD GPU judgment logic

* [OPENCL][UPD] support cache on windows

* [OPENCL][UPD] fix load cache on windows

Co-authored-by: ealinli <ealinli@tencent.com>

* [DRQ][UPD] dynamic range quant model support do const folder (Tencent#1647)

* [DRQ][UPD] dynamic range quant model support do const folder

* [TOOLS][UPD] dynamic range quant updates usage

Co-authored-by: ealinli <ealinli@tencent.com>

* 1. make model_check support dynamic range quantized model; (Tencent#1653)

* [ADD][TUTORIAL] add mbv2-ssd conversion and deployment tutorial (Tencent#1640)

* [ADD][TUTORIAL] add mbv2-ssd conversion and deployment tutorial

* [TUTORIAL][UPD] update code link

* [TUTORIAL][UPD] fix typo

Co-authored-by: ealinli <ealinli@tencent.com>

* [X86][FIX] binary op support fp16 weights (Tencent#1655)

* [X86][FIX] binary op support fp16 weights

* [X86][FIX] matmul support fp16 weights

Co-authored-by: ealinli <ealinli@tencent.com>

* Feature dynamic quant fc (Tencent#1660)

* [DYNAMIC_QUANT][UPD]1. dynamic quant support inner_product layer;

* [ARM][UPD]1. arm gemm uses the Kahan summation algorithm in some cases to avoid fp16 accumulation error;

* [FIX][CPU][TRT] Fix CPU Not OP bug, Fix TensorRT ShapeTensor Class Bug. (Tencent#1663)

* [FIX] Fix CPU Not Operator data type error.

* [FIX] Fix TensorRT ShapeTensor class ConvertTo1D() func bug

* fix _mm256_load_ps segmentation fault (Tencent#1682)

* fix _mm256_load_ps segmentation fault

* fix crash on mm256_load when  innerproduct

* use loadu instead of stride-judgement

* remove unused code

Co-authored-by: fishdai <fishdai@tencent.com>

* x86_acc & blob_converter now will consider the BlobHandle.bytes_offset (Tencent#1684)

* Dev x86 layer adapter (Tencent#1683)

* [X86] add layer acc adapter

* [X86] NULL to nullptr

* [X86][OPENVINO] add openvino adapter layer builder, fallback to cpu naive impl if there is no normal ov layer builder

* [X86][OPENVINO] fix hard code of ov precision

Co-authored-by: anonymous <anonymous@mail.org>

* [ARM] fix arm cross compile error caused by float-abi (Tencent#1678)

* avoid nullptr in IsSupport (Tencent#1685)

* [UPD][TOOLS] 1.increase subs_length 2.align model support bool and int32 input 3. fix gather and onehot convert 4. gather_nd support indices_shape[-1] < r (Tencent#1686)

Co-authored-by: ealinli <ealinli@tencent.com>

* Dev metal ngray (Tencent#1693)

* [METAL] metal support ngray input mat

* [METAL]fix bytes_size

* [COREML] fix dynamic quantization model about coreml

Co-authored-by: jacinhu <jacinhu@tencent.com>
Co-authored-by: darrenyao87 <62542779+darrenyao87@users.noreply.github.com>

* [UPD][DRQ] support quantizing matmul's const weight (Tencent#1698)

* [UPD][DRQ] support quantizing matmul's const weight

* [UPD][DRQ] add scale check in constant map

Co-authored-by: ealinli <ealinli@tencent.com>

* [FIX] fix compile macos framework (Tencent#1687)

Co-authored-by: darrenyao87 <62542779+darrenyao87@users.noreply.github.com>

* Optimize dynamic range quantize (Tencent#1699)

* [DynamicRangeQuantize][UPD]1. add logic that decides whether to quantize based on the weight distribution;

* [DynamicQuantization][UPD]1. dynamic_range_quantization support TNN fp16 model;

* [DRQ][UPD]1. fix a bug in the model_check_android.sh script where a specified reference file was not used during inference; 2. optimize part of the dynamic_range_quantization code;

* [DRQ][UPD]1.fix conflict with merge master code;

Co-authored-by: ealinli <37806708+1627180283@users.noreply.github.com>

* Fix windows x86 build (Tencent#1697)

* [FIX] remove nanodet for windows

* remove ninga compile for some bug

* fix x86 mat type register macro name

* fix x86 matmul with 2 inputs

Co-authored-by: darrenyao87 <62542779+darrenyao87@users.noreply.github.com>

* [METAL] fix stride slice crach when dims is 2 (Tencent#1701)

Co-authored-by: darrenyao87 <62542779+darrenyao87@users.noreply.github.com>

* [mac] 1. FIX X86 and ARM conflict; 2. ADD ARM arch on intel cpu (You can use ARM if rosetta-X86 crash).  3. Use ios project build/profile M1-Mac. (Tencent#1700)

Co-authored-by: gennyxu <gennyxu@tencent.com>
Co-authored-by: lucasktian <lucasktian@tencent.com>

* [iOS][UPD]1. add missing file for xcode project; (Tencent#1705)

* [BUG]fix coreml error of slicev2、padv2 and matmul; (Tencent#1703)

* [BUG]fix YouTu face alignment model

* [UPD]update mean pts file logic

* [UPD]draw face points green

* [UPD]unify example controller list

* [UPD]unify example controller list

* [UPD]move blaze anchor file to resource

* [METAL]update tnn project

* [UPD]update tool onnx2coreml

* [ADD]support ShareCommandQueue between instances

* [ADD]support ShareCommandQueue between instances

* [UPD]add log message

* [UPD]transfer file half.hpp

* [UPD]fix xcode compile error with fp16

* [UPD]fix xcode compile error with fp16

* [UPD]update model type erro msg

* [FIX]fix logic error of constofshape

* [UPD]update debug message

* [FIX]fsupport int32 for neg op

* [BUG]fix init error with nil commadbuffer

* [UPD]add mac build xcode project; fix ios mac build script;

* [UPD]add mac build xcode project; fix ios mac build script;

* [ADD]add QQ group 2 of TNN

* [BUG]fix dynamic dequant error; fix arm pad error;

* [BUG]support coreml padv2

* [BUG]fix ccoreml matmul error when it has const input blob

* [BUG]fix coreml slicev2

* [UPD]add convert logic of swish

* [BUG]fix  error cpu error for x86 mac

* [UPD]support fusion for gemm + bn

* [UPD]add convert logic of swish

Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com>
Co-authored-by: lucasktian <lucasktian@tencent.com>

* [UPD]update merge logic for swish groupnorm deconv (Tencent#1708)

* [BUG]fix YouTu face alignment model

* [UPD]update mean pts file logic

* [UPD]draw face points green

* [UPD]unify example controller list

* [UPD]unify example controller list

* [UPD]move blaze anchor file to resource

* [METAL]update tnn project

* [UPD]update tool onnx2coreml

* [ADD]support ShareCommandQueue between instances

* [ADD]support ShareCommandQueue between instances

* [UPD]add log message

* [UPD]transfer file half.hpp

* [UPD]fix xcode compile error with fp16

* [UPD]fix xcode compile error with fp16

* [UPD]update model type erro msg

* [FIX]fix logic error of constofshape

* [UPD]update debug message

* [FIX]fsupport int32 for neg op

* [BUG]fix init error with nil commadbuffer

* [UPD]add mac build xcode project; fix ios mac build script;

* [UPD]add mac build xcode project; fix ios mac build script;

* [ADD]add QQ group 2 of TNN

* [BUG]fix dynamic dequant error; fix arm pad error;

* [BUG]support coreml padv2

* [BUG]fix ccoreml matmul error when it has const input blob

* [BUG]fix coreml slicev2

* [UPD]add convert logic of swish

* [BUG]fix  error cpu error for x86 mac

* [UPD]support fusion for gemm + bn

* [UPD]add convert logic of swish

* [UPD]support fusion for deconv+add and deconv+add+bn

* [UPD]add aliyun disk link for tnn models

* [UPD]support fusion for group norm

* [UPD]support fusion for swish

Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com>
Co-authored-by: lucasktian <lucasktian@tencent.com>

* [DRQ][BUG]1. fix bug for max_values; (Tencent#1716)

* Hotfix m1 build (Tencent#1715)

* fix apple m1 clang 13.1 compile error

* fix unit test compile error

Co-authored-by: quinnrong <quinnrong@quinnrongs-MacBook-Pro.local>
Co-authored-by: ealinli <37806708+1627180283@users.noreply.github.com>

Co-authored-by: shenpenwang <41420892+Maosquerade@users.noreply.github.com>
Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com>
Co-authored-by: sxj731533730 <sxj731533730@gmail.com>
Co-authored-by: Yulv-git <34329208+Yulv-git@users.noreply.github.com>
Co-authored-by: darrenyao87 <62542779+darrenyao87@users.noreply.github.com>
Co-authored-by: quinnrong94 <67782915+quinnrong94@users.noreply.github.com>
Co-authored-by: lucasktian <lucasktian@tencent.com>
Co-authored-by: powerpwang <72859430+powerpwang@users.noreply.github.com>
Co-authored-by: ealinli <37806708+1627180283@users.noreply.github.com>
Co-authored-by: powerpwang <powerpwang@outlook.com>
Co-authored-by: ealinli <ealinli@tencent.com>
Co-authored-by: Dandi Ding <bluaxe@users.noreply.github.com>
Co-authored-by: saner zheng <zqawszqaws@126.com>
Co-authored-by: sanerzheng <sanerzheng@tencent.com>
Co-authored-by: Feng Shijie <j514681085@icloud.com>
Co-authored-by: Dandiding <Dandiding@tencent.com>
Co-authored-by: FeiGeChuanShu <774074168@qq.com>
Co-authored-by: seanxcwang <66675860+seanxcwang@users.noreply.github.com>
Co-authored-by: doxutx <92915535+doxutx@users.noreply.github.com>
Co-authored-by: kumbayaco <xyu.dai@gmail.com>
Co-authored-by: fishdai <fishdai@tencent.com>
Co-authored-by: anonymous <anonymous@mail.org>
Co-authored-by: jacinhu <jacinhu@tencent.com>
Co-authored-by: XDC <196890111@qq.com>
Co-authored-by: gennyxu <gennyxu@tencent.com>
Co-authored-by: quinnrong <quinnrong@quinnrongs-MacBook-Pro.local>

* [FIX] fix sliceV2 op  conflict with master again

* [METAL][OP][FIX] 1.metal support groupnorm & swish op 2.fix metal blob conveter & reformat bug when input dim is 1

* reset model

* [COREML] coreml support swish op

* [COREML] fix coreml batchnorn bug

* [COREML]coreml support groupmorm

* [COREML]coreml support instancenorm

* reset model

* solve conflict

* solve conflict

* Dev groupnorm (Tencent#1726)

* Fix trt multistream logger (Tencent#1521)

* [FIX] fix trt logger

* [FIX] catch std::bad_alloc error for trt8 building

* [FIX] return null while shape_tensor size -1

* Update version.h

Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com>

* Update split_utils.cc (Tencent#1528)

I got an error when compiling with mingw32, because the mingw32 compiler still requires the namespace qualifier:
[ 99%] Building CXX object CMakeFiles/TNN.dir/source/tnn/utils/split_utils.cc.obj
D:\TNN\source\tnn\utils\split_utils.cc: In static member function 'static tnn::Status tnn::SplitUtils::SplitStr(const char*, tnn::str_arr&, const char*, bool, bool, bool, bool, bool)':
D:\TNN\source\tnn\utils\split_utils.cc:163:23: error: 'min' was not declared in this scope
             int len = min((i - cursor), subs_length - 1);
I think this change is better: it works with mingw32 and stays compatible with the previously supported compilers.

Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com>

* Update README.md (Tencent#1538)

Typos

* [UPD]update QQ group (Tencent#1552)

* [BUG]fix YouTu face alignment model

* [UPD]update mean pts file logic

* [UPD]draw face points green

* [UPD]unify example controller list

* [UPD]unify example controller list

* [UPD]move blaze anchor file to resource

* [METAL]update tnn project

* [UPD]update tool onnx2coreml

* [ADD]support ShareCommandQueue between instances

* [ADD]support ShareCommandQueue between instances

* [UPD]add log message

* [UPD]transfer file half.hpp

* [UPD]fix xcode compile error with fp16

* [UPD]fix xcode compile error with fp16

* [UPD]update model type erro msg

* [FIX]fix logic error of constofshape

* [UPD]update debug message

* [FIX]fsupport int32 for neg op

* [BUG]fix init error with nil commadbuffer

* [UPD]add mac build xcode project; fix ios mac build script;

* [UPD]add mac build xcode project; fix ios mac build script;

* [ADD]add QQ group 2 of TNN

Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com>

* [opencl][fix] try save program cache (Tencent#1557)

* Dev roi align (Tencent#1511)

* [ARM] fix int32 blob cvt to mat

* [ARM] support roi align

* [ARM] add roi align unit test

* [ARM] add to xcodeproj

Co-authored-by: lucasktian <lucasktian@tencent.com>
Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com>

* Fix arm gather and constant blob (Tencent#1564)

* [ARM][BUG] fix gather error for indice < 0

* [ARM][BUG] fix buffer to blob error without converting precision

* [ARM] update type convert in layer_norm fp16

Co-authored-by: quinnrong94 <67782915+quinnrong94@users.noreply.github.com>

* Dev add config layer (Tencent#1569)

* add config layer param to set arm conv algorithm for specific layer

Co-authored-by: powerpwang <powerpwang@outlook.com>
Co-authored-by: ealinli <ealinli@tencent.com>

* Fix the onnx2tnn build failure caused by the protobuf version upgrade (Tencent#1571)

* [ONNX][BUG]1. fix compile bug;

* [ONNX2TNN][BUG]1. fix the build issue introduced by the protobuf version upgrade;

* [ADD][TOOLS] add dynamic range quantization (Tencent#1572)

* [ADD][TOOLS] support fake quantization

* [UPD][FAKE_QUANT] fix bug

* [UPD][DOC] add fake quantization in doc

* [UPD] 1.rename fake quant to dynamic range quant 2.move dequant to net_optimizer

* [UPD] remove redundant comment

* [UPD] update comment for DynamicRangeDequant

* [DRQuant][UPD] fix namespace issue

* [DRQuant][UPD] Turn off TNN_SYMBOL_HIDE to fix ci

Co-authored-by: ealinli <ealinli@tencent.com>
Co-authored-by: Dandi Ding <bluaxe@users.noreply.github.com>
Co-authored-by: lucasktian <lucasktian@tencent.com>

* [UPD][OPENCL] opencl support using unoptimized conv (Tencent#1581)

Co-authored-by: ealinli <ealinli@tencent.com>

* [UPD][CONVERTER] lstm support sequence_lens (Tencent#1585)

Co-authored-by: ealinli <ealinli@tencent.com>

* [MODEL_CHECK][BUG]1. fix bug for dump layer(fp16); (Tencent#1567)

Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com>

* Bugfix from train branch (Tencent#1592)

* [BUG] fix get dims value bug when input is 1D or 2D in arm_reduce_layer_acc.cc.
* [BUG] fix Convert from NCHW to NHWC error when input is on arm device.
* [BUG] fix convert mat to blob bug when input is NC_INT32 on arm device.
* [BUG] fix tflite_converter bug when transform a activation layer.
* add nchw format condition when copy int32 mat to blob
* rollback changes on tflite_op_converter.cc

Co-authored-by: sanerzheng <sanerzheng@tencent.com>

* [UPD][OPENCL] opencl support x86 mat (Tencent#1593)

Co-authored-by: ealinli <ealinli@tencent.com>

* [CONVERTER][BUG]1. fix issue 1595; (Tencent#1596)

* [UPD][OPENCL] add ocl version check (Tencent#1601)

* [UPD][OPENCL] add ocl version check

* [UPD][OPENCL] update message for vervion check

Co-authored-by: ealinli <ealinli@tencent.com>

* [UPD][OPENCL] solve the problem that matmul, tile have incorrect results on helio p65 (Tencent#1602)

Co-authored-by: ealinli <ealinli@tencent.com>

* [UPD][DYQ] fix dynamic range quant compile error on windows (Tencent#1604)

Co-authored-by: ealinli <ealinli@tencent.com>

* [DOC][UPD] modify image links in doc (Tencent#1617)

Co-authored-by: ealinli <ealinli@tencent.com>

* remove redundant test cases (Tencent#1614)

* Fix typos. (Tencent#1626)

* Fix typos.

* Update Readme.

Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com>

* Interpreter changed from std::map to safe_map; the latter offers a const operator[] function (Tencent#1618)

Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com>
Co-authored-by: lucasktian <lucasktian@tencent.com>

* [UPD][OPENCL] get opencl version when GpuType is OTHER (Tencent#1636)

* [UPD][OPENCL] get opencl version when GpuType is OTHER

* [UPD][OPENCL] optimize nv gpu judgment logic

Co-authored-by: ealinli <ealinli@tencent.com>

* Patch x86 avx support (Tencent#1633)

* merge dev_vc14_m1_debug, support x86 avx

* add option to support x86 avx2 compile

* update win_x86_opencl building script

Co-authored-by: Dandiding <Dandiding@tencent.com>

* fix x86 avx2 options (Tencent#1638)

* fix typos in doc (Tencent#1634)

Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com>

* [X86][BUG] fix deconv layer build error (Tencent#1641)

* [OPENCL][FIX] fix conv and dwconv on some of the AMD GPUs

* [UPD][OPENCL] add coor check for conv and dwconv

* [OPENCL][FIX] fix compilation issues

* [OPENCL][UPD] optimize AMD GPU judgment logic

Co-authored-by: ealinli <ealinli@tencent.com>

* [OPENCL][UPD] fix deconv, avgpool on AMD GPU (Tencent#1646)

* [OPENCL][UPD] fix deconv and avgpool when read image

* [OPENCL][UPD] add header file for pooling

Co-authored-by: ealinli <ealinli@tencent.com>

* [OPENCL][UPD] opencl support cache on windows (Tencent#1645)

* [UPD][OPENCL] add coor check for conv and dwconv

* [OPENCL][FIX] fix compilation issues

* [OPENCL][UPD] optimize AMD GPU judgment logic

* [OPENCL][UPD] support cache on windows

* [OPENCL][UPD] fix load cache on windows

Co-authored-by: ealinli <ealinli@tencent.com>

* [DRQ][UPD] dynamic range quant model support do const folder (Tencent#1647)

* [DRQ][UPD] dynamic range quant model support do const folder

* [TOOLS][UPD] dynamic range quant updates usage

Co-authored-by: ealinli <ealinli@tencent.com>

* 1. make model_check support dynamic range quantized model; (Tencent#1653)

* [ADD][TUTORIAL] add mbv2-ssd conversion and deployment tutorial (Tencent#1640)

* [ADD][TUTORIAL] add mbv2-ssd conversion and deployment tutorial

* [TUTORIAL][UPD] update code link

* [TUTORIAL][UPD] fix typo

Co-authored-by: ealinli <ealinli@tencent.com>

* [X86][FIX] binary op support fp16 weights (Tencent#1655)

* [X86][FIX] binary op support fp16 weights

* [X86][FIX] matmul support fp16 weights

Co-authored-by: ealinli <ealinli@tencent.com>

* Feature dynamic quant fc (Tencent#1660)

* [DYNAMIC_QUANT][UPD]1. dynamic quant support inner_product layer;

* [ARM][UPD]1. arm gemm uses the Kahan sum algorithm in some cases to avoid fp16 accumulation error;

* [FIX][CPU][TRT] Fix CPU Not OP bug, Fix TensorRT ShapeTensor Class Bug. (Tencent#1663)

* [FIX] Fix CPU Not Operator data type error.

* [FIX] Fix TensorRT ShapeTensor class ConvertTo1D() func bug

* fix _mm256_load_ps segmentation fault (Tencent#1682)

* fix _mm256_load_ps segmentation fault

* fix crash on mm256_load in innerproduct

* use loadu instead of stride-judgement

* remove unused code

Co-authored-by: fishdai <fishdai@tencent.com>

* x86_acc & blob_converter now will consider the BlobHandle.bytes_offset (Tencent#1684)

* Dev x86 layer adapter (Tencent#1683)

* [X86] add layer acc adapter

* [X86] NULL to nullptr

* [X86][OPENVINO] add openvino adapter layer builder, fallback to cpu naive impl if there is no normal ov layer builder

* [X86][OPENVINO] fix hard code of ov precision

Co-authored-by: anonymous <anonymous@mail.org>

* [ARM] fix arm cross compile error caused by float-abi (Tencent#1678)

* avoid nullptr in IsSupport (Tencent#1685)

* [UPD][TOOLS] 1.increase subs_length 2.align model support bool and int32 input 3. fix gather and onehot convert 4. gather_nd support indices_shape[-1] < r (Tencent#1686)

Co-authored-by: ealinli <ealinli@tencent.com>

* Dev metal ngray (Tencent#1693)

* [METAL] metal support ngray input mat

* [METAL]fix bytes_size

* [COREML] fix dynamic quantization model about coreml

Co-authored-by: jacinhu <jacinhu@tencent.com>
Co-authored-by: darrenyao87 <62542779+darrenyao87@users.noreply.github.com>

* [UPD][DRQ] support quantizing matmul's const weight (Tencent#1698)

* [UPD][DRQ] support quantizing matmul's const weight

* [UPD][DRQ] add scale check in constant map

Co-authored-by: ealinli <ealinli@tencent.com>

* [FIX] fix compile macos framework (Tencent#1687)

Co-authored-by: darrenyao87 <62542779+darrenyao87@users.noreply.github.com>

* Optimize dynamic range quantize (Tencent#1699)

* [DynamicRangeQuantize][UPD]1. add logic that decides whether to quantize based on the weight distribution;

* [DynamicQuantization][UPD]1. dynamic_range_quantization support TNN fp16 model;

* [DRQ][UPD]1. fix a bug where the reference file specified in the model_check_android.sh script was not used during inference; 2. optimize some of the code in dynamic_range_quantization;

* [DRQ][UPD]1.fix conflict with merge master code;

Co-authored-by: ealinli <37806708+1627180283@users.noreply.github.com>

* Fix windows x86 build (Tencent#1697)

* [FIX] remove nanodet for windows

* remove ninja compile because of a bug

* fix x86 mat type register macro name

* fix x86 matmul with 2 inputs

Co-authored-by: darrenyao87 <62542779+darrenyao87@users.noreply.github.com>

* [METAL] fix stride slice crash when dims is 2 (Tencent#1701)

Co-authored-by: darrenyao87 <62542779+darrenyao87@users.noreply.github.com>

* [mac] 1. FIX X86 and ARM conflict; 2. ADD ARM arch on intel cpu (You can use ARM if rosetta-X86 crash).  3. Use ios project build/profile M1-Mac. (Tencent#1700)

Co-authored-by: gennyxu <gennyxu@tencent.com>
Co-authored-by: lucasktian <lucasktian@tencent.com>

* [iOS][UPD]1. add missing file for xcode project; (Tencent#1705)

* [BUG]fix coreml error of slicev2, padv2 and matmul; (Tencent#1703)

* [BUG]fix YouTu face alignment model

* [UPD]update mean pts file logic

* [UPD]draw face points green

* [UPD]unify example controller list

* [UPD]unify example controller list

* [UPD]move blaze anchor file to resource

* [METAL]update tnn project

* [UPD]update tool onnx2coreml

* [ADD]support ShareCommandQueue between instances

* [ADD]support ShareCommandQueue between instances

* [UPD]add log message

* [UPD]transfer file half.hpp

* [UPD]fix xcode compile error with fp16

* [UPD]fix xcode compile error with fp16

* [UPD]update model type error msg

* [FIX]fix logic error of constofshape

* [UPD]update debug message

* [FIX]support int32 for neg op

* [BUG]fix init error with nil commandbuffer

* [UPD]add mac build xcode project; fix ios mac build script;

* [UPD]add mac build xcode project; fix ios mac build script;

* [ADD]add QQ group 2 of TNN

* [BUG]fix dynamic dequant error; fix arm pad error;

* [BUG]support coreml padv2

* [BUG]fix coreml matmul error when it has const input blob

* [BUG]fix coreml slicev2

* [UPD]add convert logic of swish

* [BUG]fix cpu error for x86 mac

* [UPD]support fusion for gemm + bn

* [UPD]add convert logic of swish

Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com>
Co-authored-by: lucasktian <lucasktian@tencent.com>

* [UPD]update merge logic for swish groupnorm deconv (Tencent#1708)

* [BUG]fix YouTu face alignment model

* [UPD]update mean pts file logic

* [UPD]draw face points green

* [UPD]unify example controller list

* [UPD]unify example controller list

* [UPD]move blaze anchor file to resource

* [METAL]update tnn project

* [UPD]update tool onnx2coreml

* [ADD]support ShareCommandQueue between instances

* [ADD]support ShareCommandQueue between instances

* [UPD]add log message

* [UPD]transfer file half.hpp

* [UPD]fix xcode compile error with fp16

* [UPD]fix xcode compile error with fp16

* [UPD]update model type error msg

* [FIX]fix logic error of constofshape

* [UPD]update debug message

* [FIX]support int32 for neg op

* [BUG]fix init error with nil commandbuffer

* [UPD]add mac build xcode project; fix ios mac build script;

* [UPD]add mac build xcode project; fix ios mac build script;

* [ADD]add QQ group 2 of TNN

* [BUG]fix dynamic dequant error; fix arm pad error;

* [BUG]support coreml padv2

* [BUG]fix coreml matmul error when it has const input blob

* [BUG]fix coreml slicev2

* [UPD]add convert logic of swish

* [BUG]fix cpu error for x86 mac

* [UPD]support fusion for gemm + bn

* [UPD]add convert logic of swish

* [UPD]support fusion for deconv+add and deconv+add+bn

* [UPD]add aliyun disk link for tnn models

* [UPD]support fusion for group norm

* [UPD]support fusion for swish

Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com>
Co-authored-by: lucasktian <lucasktian@tencent.com>

* [DRQ][BUG]1. fix bug for max_values; (Tencent#1716)

* Hotfix m1 build (Tencent#1715)

* fix apple m1 clang 13.1 compile error

* fix unit test compile error

Co-authored-by: quinnrong <quinnrong@quinnrongs-MacBook-Pro.local>
Co-authored-by: ealinli <37806708+1627180283@users.noreply.github.com>

* [ARM] support groupnorm

* [ARM] support swish

* add swish to conv-post-fuse

* [ADD][OPENCL] opencl add group norm and swish (Tencent#1722)

Co-authored-by: ealinli <ealinli@tencent.com>

* add x86 swish and groupnorm operator; explicitly enable sse4.2 with low-version compilers

Co-authored-by: shenpenwang <41420892+Maosquerade@users.noreply.github.com>
Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com>
Co-authored-by: sxj731533730 <sxj731533730@gmail.com>
Co-authored-by: Yulv-git <34329208+Yulv-git@users.noreply.github.com>
Co-authored-by: darrenyao87 <62542779+darrenyao87@users.noreply.github.com>
Co-authored-by: quinnrong94 <67782915+quinnrong94@users.noreply.github.com>
Co-authored-by: lucasktian <lucasktian@tencent.com>
Co-authored-by: powerpwang <72859430+powerpwang@users.noreply.github.com>
Co-authored-by: ealinli <37806708+1627180283@users.noreply.github.com>
Co-authored-by: powerpwang <powerpwang@outlook.com>
Co-authored-by: ealinli <ealinli@tencent.com>
Co-authored-by: Dandi Ding <bluaxe@users.noreply.github.com>
Co-authored-by: saner zheng <zqawszqaws@126.com>
Co-authored-by: sanerzheng <sanerzheng@tencent.com>
Co-authored-by: Feng Shijie <j514681085@icloud.com>
Co-authored-by: Dandiding <Dandiding@tencent.com>
Co-authored-by: FeiGeChuanShu <774074168@qq.com>
Co-authored-by: seanxcwang <66675860+seanxcwang@users.noreply.github.com>
Co-authored-by: doxutx <92915535+doxutx@users.noreply.github.com>
Co-authored-by: kumbayaco <xyu.dai@gmail.com>
Co-authored-by: fishdai <fishdai@tencent.com>
Co-authored-by: anonymous <anonymous@mail.org>
Co-authored-by: jacinhu <jacinhu@tencent.com>
Co-authored-by: XDC <196890111@qq.com>
Co-authored-by: gennyxu <gennyxu@tencent.com>
Co-authored-by: quinnrong <quinnrong@quinnrongs-MacBook-Pro.local>
Co-authored-by: quinnrong <quinnrong@tencent.com>
Co-authored-by: shenpenwang <565067453@qq.com>

* fix coreml groupnorm unit test

* [ADD]add exp op

* [BUG]fix deconv bias error

* [UPD]init cpu memory with 0 for bert model

* [BUG]fix reshape static error; reshape static layer cannot handle 0 or -1

* [UPD]support inst norm for coreml; update tnn project file;

* [BUG]fix error for layers without a layer resource: the [] operator will add one, which is not thread safe
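
The hazard described in the commit above can be sketched as follows. This is an illustration only, not the code changed in the commit: `ResourceMap` and `FindResource` are hypothetical names, and the resource is modeled as a plain `int`. The point is that `std::map::operator[]` inserts a default-constructed entry when the key is missing, so what looks like a read mutates the container, whereas `find` does not.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for a per-layer resource table (not TNN's type).
using ResourceMap = std::map<std::string, int>;

// Read-only lookup: returns nullptr for a missing key and never inserts,
// so concurrent readers do not modify the map.
inline const int* FindResource(const ResourceMap& resources,
                               const std::string& layer_name) {
    auto it = resources.find(layer_name);
    return it == resources.end() ? nullptr : &it->second;
}
```

Taking the map by `const` reference also makes the compiler reject an accidental `resources[layer_name]`, because `std::map` has no `const` overload of `operator[]`.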

* [UPD]add param to batchnorm to support instancenorm

* [UPD]adjust groupnorm with batchnorm

* [UPD]support instancenorm with groupnorm by setting group==channels

* [UPD]update unit test of instancenorm

* [BUG]fix unit test error for layer batchnorm

* [UPD]update tnn project

* [BUG]fix unit test error for APPLE NPU

* [BUG]fix unit test crash for layer batchnorm

* [UPD]ignore cpu or gpu benchmark for mlmodel or mlmodelc

* [UPD]ignore

* [UPD]ignore pixelshuffle for apple npu

* [UPD]ignore matconvert for apple npu

* [UPD]ignore some unary op for apple npu

* [UPD]unify before and after coreml layer, simplify lstm layer

* [UPD]fix lstm error for ht and ct for biLSTM

* [UPD]fix const input load error

* [UPD]fix internal error

* [UPD]ignore

Co-authored-by: jacinhu <jacinhu@tencent.com>
Co-authored-by: teslawho <597645882@qq.com>
Co-authored-by: teslawho <71381575+teslawho@users.noreply.github.com>
Co-authored-by: shenpenwang <41420892+Maosquerade@users.noreply.github.com>
Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com>
Co-authored-by: sxj731533730 <sxj731533730@gmail.com>
Co-authored-by: Yulv-git <34329208+Yulv-git@users.noreply.github.com>
Co-authored-by: quinnrong94 <67782915+quinnrong94@users.noreply.github.com>
Co-authored-by: lucasktian <lucasktian@tencent.com>
Co-authored-by: powerpwang <72859430+powerpwang@users.noreply.github.com>
Co-authored-by: ealinli <37806708+1627180283@users.noreply.github.com>
Co-authored-by: powerpwang <powerpwang@outlook.com>
Co-authored-by: ealinli <ealinli@tencent.com>
Co-authored-by: Dandi Ding <bluaxe@users.noreply.github.com>
Co-authored-by: saner zheng <zqawszqaws@126.com>
Co-authored-by: sanerzheng <sanerzheng@tencent.com>
Co-authored-by: Feng Shijie <j514681085@icloud.com>
Co-authored-by: Dandiding <Dandiding@tencent.com>
Co-authored-by: FeiGeChuanShu <774074168@qq.com>
Co-authored-by: seanxcwang <66675860+seanxcwang@users.noreply.github.com>
Co-authored-by: doxutx <92915535+doxutx@users.noreply.github.com>
Co-authored-by: kumbayaco <xyu.dai@gmail.com>
Co-authored-by: fishdai <fishdai@tencent.com>
Co-authored-by: anonymous <anonymous@mail.org>
Co-authored-by: XDC <196890111@qq.com>
Co-authored-by: gennyxu <gennyxu@tencent.com>
Co-authored-by: quinnrong <quinnrong@quinnrongs-MacBook-Pro.local>
Co-authored-by: quinnrong <quinnrong@tencent.com>
Co-authored-by: shenpenwang <565067453@qq.com>
* [ARM] support groupnorm

* [ARM] support swish

* add swish to conv-post-fuse

* [ADD][OPENCL] opencl add group norm and swish (Tencent#1722)

Co-authored-by: ealinli <ealinli@tencent.com>

* add x86 swish and groupnorm operator; explicitly enable sse4.2 with low-version compilers

* fix lstm unit test

Co-authored-by: ealinli <37806708+1627180283@users.noreply.github.com>
Co-authored-by: ealinli <ealinli@tencent.com>
Co-authored-by: shenpenwang <565067453@qq.com>
In order to make the changes ready to merge, now delete the readme file
* [UPD] fix some about error status again

* [UPD]enable const folder to infer blobs shape for coreml; fix reshape shape size logic;

* [UPD]unify op system;check apple neural engine;

* [UPD]unify op system;check apple neural engine;

* [FIX] reset multi input in network forward for support image classifier demo

* [FIX] fix multi input in network forward

* [FIX] fix const op about weight shape(=1)

* [FIX] fix const op about weight shape(=1) again

* [UPD] update to support multi output forward

* [UPD] update to support split op

* [UPD]fix coreml multi output case; add cache logic;

* [UPD]fix coreml multi output case; add cache logic;

* [UPD]fix coreml multi output case; add cache logic;

* [UPD]fix multi output error

* [FIX] fix pool op about pad

* [UPD] update to support pad op (only allowed for H and W dimensions)

* [UPD]remove blob manager of coreml network

* [UPD]rename coreml_executor to coremlmodel

* [UPD] remove InitCoreMLExecutor

* [FIX] fix to support different input data type (float32 & int32) in forward

* [UPD] update to support expand dims & reduce dims reshape by adding unsqueeze & squeeze

* [UPD]change internal device from metal to arm for device npu

* [FIX] fix conv op about group conv

* [FIX] fix deconv op about group deconv

* [UPD] update to support sub op

* [UPD] update to support clip op

* [UPD] update to support slice op

* [UPD] update to support upsample op

* [FIX] fix slice op about endindex

* [UPD] update to support constant padding, allowed for C , H and W dimensions

* [UPD]fix camera switch device

* [UPD]fix actual device display error

* [UPD]fix cache path

* [UPD] update to add sub & slice & clip to project

* [FIX] fix demo use NPU error

* [UPD]fix ocr error

* [FIX] fix upsample op about align_corners

* [FIX] fix upsample op about Fractional scales

* [BUG]fix coreml output nil error; fix upsample nn for fractional scale

* [FIX] fix upsample op about scales order

* [UPD] update to support slice v2 op

* [UPD] update to support tanh v2 op

* [FIX] fix batchnorm op about mean value

* [FIX] fix some annotation

* [BUG]fix upsample error; add shuffle channel coreml layer

* [FIX] fix innerproduct op about inputchannels

* [UPD] remove slicev2 to slice file

* [UPD] remove tanhv2 to slice file

* [UPD] update to reshape op about expand dims & reduce dims

* [UPD] update innerproduct op to add squeeze to reduce dims (in order to match old TNN model)

* [UPD] update to support flatten to 2D op

* [UPD] update to support relu6 op

* [ADD]add cast coreml layer

* [ADD]add shape coreml layer

* [UPD] add flatten & relu6 & shuffle_channel to xcode project

* [ADD]add gather coreml layer

* [ADD]add gelu coreml layer

* [ADD]add layernorm coreml layer

* [BUG]support int32 for coreml const layer

* [BUG]support shape input for coreml reshape layer

* [BUG]support model check for TNN_APPLE_NPU_ENABLE using MLComputeUnitsCPUOnly

* [ADD]add mat_mul coreml layer;

* [UPD] update to support reshape layer when reshape_type = 1

* [UPD] update coreml model input & output to support int32 data type

* [FIX] fix reshape layer about reshapedynamic input & output

* [BUG]support mlmodel and mlmodelc for benchmark

* [UPD] update to support conv layer with fp16 data type

* [FIX] add 'APPLE_NPU' to model_check device_type_message

* [FIX] fix some about conv layer with fp16 data type (TNN fp16 -> CoreML fp32)

* [FIX] fix some about const layer with fp16 data type (TNN fp16 -> CoreML fp32)

* [UPD] update to support deconv layer with fp16 data type (TNN fp16 -> CoreML fp32)

* [UPD] update to support innerproduct layer with fp16 data type (TNN fp16 -> CoreML fp32)

* [UPD] update to support batchnorm layer with fp16 data type (TNN fp16 -> CoreML fp32)

* [UPD] update to support layernorm layer with fp16 data type (TNN fp16 -> CoreML fp32)

* [UPD] update to support prelu layer with fp16 data type (TNN fp16 -> CoreML fp32)

* [UPD] update to support matmul layer with fp16 data type (TNN fp16 -> CoreML fp32)

* [FIX] fix some about matmul layer with fp16 data type (TNN fp16 -> CoreML fp32)

* [UPD]support fusing mul+add into batchnorm

* [BUG]fix import error

* [BUG]fix reshape error

* [BUG] fix reshape layer when reshape_type=1 (input_shape_size = output_shape_size = 4)

* [BUG] fix reshape layer when reshape_type=1 (input_shape_size = output_shape_size = 4)

* [UPD]support ssd

* [ADD]ssdlite-mobilenetv2 from tf

* [UPD] update to support conv & deconv & const & innerproduct & batchnorm & layernorm & matmul & prelu layers with fp16 data type (TNN fp16 -> CoreML fp16)

* [UPD] update to support batchnorm layers with fp16 data type (TNN fp16 -> CoreML fp16)

* [FIX] set coreml layer default using full precision

* [UPD] update to support hardsigmoid layer

* [UPD] update to support hardswish layer

* [UPD] update to support reducesum layer

* [UPD] update to support reducemean layer

* [UPD] add some coreml layer files to xcode project

* [FIX] fix some annotation about hardswish

* [BUG]fix reshape for tensor with dims size=0

* [UPD]support landscapeleft ui; clear navbar left items

* [UPD]support landscapeleft ui; add stackview to support minor camera preview;

* [ADD]add monodepth demo

* [UPD] update to support unit_test

* [FIX] upload missing download_model.sh and download_model.bat

* [UPD] update concat & conv & shuffle unit_test files for APPLE_NPU

* [FIX] rename unit_test model

* [UPD] update to support softplus layer

* [UPD] update to support softsign layer

* [UPD] update to support div layer

* [UPD] update binary layer unit_test for APPLE_NPU

* [UPD] update to support reducemax layer

* [UPD] update to support reducemin layer

* [UPD]update project file

* [UPD]add log error

* [UPD] update hardswish layer unit_test for APPLE_NPU

* [UPD]add log error

* [UPD] update to skip stride_slice when APPLE_NPU

* [BUG]fix batchnorm unittest

* [BUG]fix prelu unittest

* [BUG]fix prelu unittest

* [BUG]fix prelu unittest

* [BUG] fix unsqueeze unittest

* [BUG] fix split unittest

* [BUG] fix reshape unittest

* [BUG]fix upsample unittest

* [BUG] fix reduce op (reducesum/reducemean/reducemax/reducemin) unittest

* [BUG]fix layernorm unittest

* [BUG] fix reduce op unittest again

* [BUG] fix deconv unittest

* [BUG] fix innerproduct unittest

* [BUG]fix ssd demo display error

* [BUG] fix matmul unittest

* [BUG]fix benchmark error to support multiple model in the same directory

* [BUG] add some explanation about reduce op unittest

* [BUG]fix benchmark error to support multiple model in the same directory

* [BUG] add some explanation about reduce op unittest again

* [BUG]fix batchnorm param error

* [BUG] fix reshape layer unittest

* [BUG]fix batchnorm param error

* [BUG]fix conv/deconv input/output channel error

* [UPD] update to support stride_slice & unittest

* [BUG] fix reshape layer unittest when reshape_type = 1

* [BUG] fix reshape layer unittest when reshape_type = 1 using reshapestatic

* [BUG] fix reshape layer unittest using reshapestatic

* [BUG] fix some annotation about reshape layer

* [BUG] fix reshape layer output permute when reshape_type = 1

* [BUG] fix reshape layer using reshapestatic when reshape_type = 1

* [BUG]fix broadcast layer error for input from constant map; fix bert demo error;

* [BUG]fix blob convert error for int32 mat

* [BUG]fix reshape name style

* [UPD]add tiny bert fixed length 256

* [BUG] fix add layer by binary op base class

* [BUG] fix div/mul/sub layer by binary op base class

* [BUG]fix batchnorm unittest

* [BUG]ensure clean up mlmodelc if error raises when compile

* [UPD]adjust demo list

* [BUG] fix conv layer about activation inplace

* [BUG] fix conv layer about relu6

* [BUG] fix cleanup func missing a return

* [BUG] remove repetitive line

* [BUG]fix batchnorm unittest

* [BUG] fix conv layer about relu6 inplace

* [UPD]automatically use apple npu

* [UPD]add clean logic for coreml

* [BUG] fix hardswish layer with 2 inputs

* [UPD] update README.md & support.md about APPLE_NPU

* [UPD]unify rawbuffer2coremlweight

* [UPD]support coreml lstm

* [UPD]fix lstm error

* [UPD]support coreml lstm bidirection

* [UPD]support coreml constofshape

* [UPD]support slice at axis=0

* [UPD]ignore

* [UPD]fix reshape error

* [UPD]fix lstm error; replace squeeze with reshape because in some cases squeeze raises a runtime compile error for axis = {3, 4}

* [UPD]fix slice error

* [UPD]support multiple mlmodel in the same directory; add autorelease memory, because coreml may need large memory in ocr demo

* ignore

* [UPD]add log msg

* [UPD]fix reshape and slice error

* [UPD]add auto release to model

* [UPD]add auto release to model

* [UPD]unify conversion from rawbuffer to coreml weight param

* [FIX] fix matmul from rawbuffer to coreml weight param

* [UPD]fix innerproduct input channel error

* [BUG] fix matmul weight bug

* remove some annotation

* [BUG] fix matmul layer about fp16

* [FIX] fix sliceV2 op  conflict with master

* [FIX] fix sliceV2 op  conflict with master

* merge master (Tencent#1721)

* Fix trt multistream logger (Tencent#1521)

* [FIX] fix trt logger

* [FIX] catch std::bad_alloc error for trt8 building

* [FIX] return null while shape_tensor size -1

* Update version.h

Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com>

* Update split_utils.cc (Tencent#1528)

Compiling with mingw32 reports the error below, because the mingw32 compiler still requires the namespace qualifier:
[ 99%] Building CXX object CMakeFiles/TNN.dir/source/tnn/utils/split_utils.cc.obj
D:\TNN\source\tnn\utils\split_utils.cc: In static member function 'static tnn::Status tnn::SplitUtils::SplitStr(const char*, tnn::str_arr&, const char*, bool, bool, bool, bool, bool)':
D:\TNN\source\tnn\utils\split_utils.cc:163:23: error: 'min' was not declared in this scope
             int len = min((i - cursor), subs_length - 1);
I think this change is better: it works with mingw32 while staying compatible with the previously supported compilers.
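
A minimal sketch of the kind of change described above, assuming the fix is to qualify the call as `std::min` and include `<algorithm>` (the actual diff is not shown here). The helper name `ClampedLength` is hypothetical; only the expression mirrors the line quoted in the error message.

```cpp
#include <algorithm>

// Hypothetical helper mirroring `min((i - cursor), subs_length - 1)` from the
// error message. An unqualified `min` only compiles when some header happens
// to provide a global `min`; `std::min` from <algorithm> is portable.
inline int ClampedLength(int span, int subs_length) {
    return std::min(span, subs_length - 1);
}
```

For example, `ClampedLength(10, 5)` yields 4, since the span is clamped to `subs_length - 1`.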

Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com>

* Update README.md (Tencent#1538)

Typos

* [UPD]update QQ group (Tencent#1552)

* [BUG]fix YouTu face alignment model

* [UPD]update mean pts file logic

* [UPD]draw face points green

* [UPD]unify example controller list

* [UPD]unify example controller list

* [UPD]move blaze anchor file to resource

* [METAL]update tnn project

* [UPD]update tool onnx2coreml

* [ADD]support ShareCommandQueue between instances

* [ADD]support ShareCommandQueue between instances

* [UPD]add log message

* [UPD]transfer file half.hpp

* [UPD]fix xcode compile error with fp16

* [UPD]fix xcode compile error with fp16

* [UPD]update model type error msg

* [FIX]fix logic error of constofshape

* [UPD]update debug message

* [FIX]support int32 for neg op

* [BUG]fix init error with nil commandbuffer

* [UPD]add mac build xcode project; fix ios mac build script;

* [UPD]add mac build xcode project; fix ios mac build script;

* [ADD]add QQ group 2 of TNN

Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com>

* [opencl][fix] try save program cache (Tencent#1557)

* Dev roi align (Tencent#1511)

* [ARM] fix int32 blob cvt to mat

* [ARM] support roi align

* [ARM] add roi align unit test

* [ARM] add to xcodeproj

Co-authored-by: lucasktian <lucasktian@tencent.com>
Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com>

* Fix arm gather and constant blob (Tencent#1564)

* [ARM][BUG] fix gather error for indice < 0

* [ARM][BUG] fix buffer to blob error without converting precision

* [ARM] update type convert in layer_norm fp16

Co-authored-by: quinnrong94 <67782915+quinnrong94@users.noreply.github.com>

* Dev add config layer (Tencent#1569)

* add config layer param to set arm conv algorithm for specific layer

Co-authored-by: powerpwang <powerpwang@outlook.com>
Co-authored-by: ealinli <ealinli@tencent.com>

* Fix onnx2tnn compile failure caused by the protobuf version upgrade (Tencent#1571)

* [ONNX][BUG]1. fix compile bug;

* [ONNX2TNN][BUG]1. fix compile issues caused by the protobuf version upgrade;

* [ADD][TOOLS] add dynamic range quantization (Tencent#1572)

* [ADD][TOOLS] support fake quantization

* [UPD][FAKE_QUANT] fix bug

* [UPD][DOC] add fake quantization in doc

* [UPD] 1.rename fake quant to dynamic range quant 2.move dequant to net_optimizer

* [UPD] remove redundant comment

* [UPD] update comment for DynamicRangeDequant

* [DRQuant][UPD] fix namespace issue

* [DRQuant][UPD] Turn off TNN_SYMBOL_HIDE to fix ci

Co-authored-by: ealinli <ealinli@tencent.com>
Co-authored-by: Dandi Ding <bluaxe@users.noreply.github.com>
Co-authored-by: lucasktian <lucasktian@tencent.com>

* [UPD][OPENCL] opencl support using unoptimized conv (Tencent#1581)

Co-authored-by: ealinli <ealinli@tencent.com>

* [UPD][CONVERTER] lstm support sequence_lens (Tencent#1585)

Co-authored-by: ealinli <ealinli@tencent.com>

* [MODEL_CHECK][BUG]1. fix bug for dump layer(fp16); (Tencent#1567)

Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com>

* Bugfix from train branch (Tencent#1592)

* [BUG] fix get dims value bug when input is 1D or 2D in arm_reduce_layer_acc.cc.
* [BUG] fix Convert from NCHW to NHWC error when input is on arm device.
* [BUG] fix convert mat to blob bug when input is NC_INT32 on arm device.
* [BUG] fix tflite_converter bug when transform a activation layer.
* add nchw format condition when copy int32 mat to blob
* rollback changes on tflite_op_converter.cc

Co-authored-by: sanerzheng <sanerzheng@tencent.com>

* [UPD][OPENCL] opencl support x86 mat (Tencent#1593)

Co-authored-by: ealinli <ealinli@tencent.com>

* [CONVERTER][BUG]1. fix issue 1595; (Tencent#1596)

* [UPD][OPENCL] add ocl version check (Tencent#1601)

* [UPD][OPENCL] add ocl version check

* [UPD][OPENCL] update message for version check

Co-authored-by: ealinli <ealinli@tencent.com>

* [UPD][OPENCL] solve the problem that matmul, tile have incorrect results on helio p65 (Tencent#1602)

Co-authored-by: ealinli <ealinli@tencent.com>

* [UPD][DYQ] fix dynamic range quant compile error on windows (Tencent#1604)

Co-authored-by: ealinli <ealinli@tencent.com>

* [DOC][UPD] modify image links in doc (Tencent#1617)

Co-authored-by: ealinli <ealinli@tencent.com>

* remove redundant test cases (Tencent#1614)

* Fix typos. (Tencent#1626)

* Fix typos.

* Update Readme.

Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com>

* Interpreter changed from std::map to safe_map; the latter offers a const operator[] function (Tencent#1618)

Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com>
Co-authored-by: lucasktian <lucasktian@tencent.com>

* [UPD][OPENCL] get opencl version when GpuType is OTHER (Tencent#1636)

* [UPD][OPENCL] get opencl version when GpuType is OTHER

* [UPD][OPENCL] optimize nv gpu judgment logic

Co-authored-by: ealinli <ealinli@tencent.com>

* Patch x86 avx support (Tencent#1633)

* merge dev_vc14_m1_debug, support x86 avx

* add option to support x86 avx2 compile

* update win_x86_opencl building script

Co-authored-by: Dandiding <Dandiding@tencent.com>

* fix x86 avx2 options (Tencent#1638)

* fix typos in doc (Tencent#1634)

Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com>

* [X86][BUG] fix deconv layer build error (Tencent#1641)

* [OPENCL][FIX] fix conv and dwconv on some of the AMD GPUs

* [UPD][OPENCL] add coor check for conv and dwconv

* [OPENCL][FIX] fix compilation issues

* [OPENCL][UPD] optimize AMD GPU judgment logic

Co-authored-by: ealinli <ealinli@tencent.com>

* [OPENCL][UPD] fix deconv, avgpool on AMD GPU (Tencent#1646)

* [OPENCL][UPD] fix deconv and avgpool when read image

* [OPENCL][UPD] add header file for pooling

Co-authored-by: ealinli <ealinli@tencent.com>

* [OPENCL][UPD] opencl support cache on windows (Tencent#1645)

* [UPD][OPENCL] add coor check for conv and dwconv

* [OPENCL][FIX] fix compilation issues

* [OPENCL][UPD] optimize AMD GPU judgment logic

* [OPENCL][UPD] support cache on windows

* [OPENCL][UPD] fix load cache on windows

Co-authored-by: ealinli <ealinli@tencent.com>

* [DRQ][UPD] dynamic range quant model support do const folder (Tencent#1647)

* [DRQ][UPD] dynamic range quant model support do const folder

* [TOOLS][UPD] dynamic range quant updates usage

Co-authored-by: ealinli <ealinli@tencent.com>

* 1. make model_check support dynamic range quantized model; (Tencent#1653)

* [ADD][TUTORIAL] add mbv2-ssd conversion and deployment tutorial (Tencent#1640)

* [ADD][TUTORIAL] add mbv2-ssd conversion and deployment tutorial

* [TUTORIAL][UPD] update code link

* [TUTORIAL][UPD] fix typo

Co-authored-by: ealinli <ealinli@tencent.com>

* [X86][FIX] binary op support fp16 weights (Tencent#1655)

* [X86][FIX] binary op support fp16 weights

* [X86][FIX] matmul support fp16 weights

Co-authored-by: ealinli <ealinli@tencent.com>

* Feature dynamic quant fc (Tencent#1660)

* [DYNAMIC_QUANT][UPD]1. dynamic quant support inner_product layer;

* [ARM][UPD]1. arm gemm uses the Kahan sum algorithm in some cases to avoid fp16 accumulation error;

* [FIX][CPU][TRT] Fix CPU Not OP bug, Fix TensorRT ShapeTensor Class Bug. (Tencent#1663)

* [FIX] Fix CPU Not Operator data type error.

* [FIX] Fix TensorRT ShapeTensor class ConvertTo1D() func bug

* fix _mm256_load_ps segmentation fault (Tencent#1682)

* fix _mm256_load_ps segmentation fault

* fix crash on mm256_load in innerproduct

* use loadu instead of stride-judgement

* remove unused code

Co-authored-by: fishdai <fishdai@tencent.com>

* x86_acc & blob_converter now will consider the BlobHandle.bytes_offset (Tencent#1684)

* Dev x86 layer adapter (Tencent#1683)

* [X86] add layer acc adapter

* [X86] NULL to nullptr

* [X86][OPENVINO] add openvino adapter layer builder, fallback to cpu naive impl if there is no normal ov layer builder

* [X86][OPENVINO] fix hard code of ov precision

Co-authored-by: anonymous <anonymous@mail.org>

* [ARM] fix arm cross compile error caused by float-abi (Tencent#1678)

* avoid nullptr in IsSupport (Tencent#1685)

* [UPD][TOOLS] 1.increase subs_length 2.align model support bool and int32 input 3. fix gather and onehot convert 4. gather_nd support indices_shape[-1] < r (Tencent#1686)

Co-authored-by: ealinli <ealinli@tencent.com>

* Dev metal ngray (Tencent#1693)

* [METAL] metal support ngray input mat

* [METAL]fix bytes_size

* [COREML] fix dynamic quantization model about coreml

Co-authored-by: jacinhu <jacinhu@tencent.com>
Co-authored-by: darrenyao87 <62542779+darrenyao87@users.noreply.github.com>

* [UPD][DRQ] support quantizing matmul's const weight (Tencent#1698)

* [UPD][DRQ] support quantizing matmul's const weight

* [UPD][DRQ] add scale check in constant map

Co-authored-by: ealinli <ealinli@tencent.com>

* [FIX] fix compile macos framework (Tencent#1687)

Co-authored-by: darrenyao87 <62542779+darrenyao87@users.noreply.github.com>

* Optimize dynamic range quantize (Tencent#1699)

* [DynamicRangeQuantize][UPD]1. added logic that decides whether to quantize based on the weight distribution;

* [DynamicQuantization][UPD]1. dynamic_range_quantization support TNN fp16 model;

* [DRQ][UPD]1. fixed a bug where the reference file specified in the model_check_android.sh script was not used during inference; 2. optimized some of the code in dynamic_range_quantization;

* [DRQ][UPD]1.fix conflict with merge master code;

Co-authored-by: ealinli <37806708+1627180283@users.noreply.github.com>

* Fix windows x86 build (Tencent#1697)

* [FIX] remove nanodet for windows

* remove ninja compile for some bug

* fix x86 mat type register macro name

* fix x86 matmul with 2 inputs

Co-authored-by: darrenyao87 <62542779+darrenyao87@users.noreply.github.com>

* [METAL] fix stride slice crash when dims is 2 (Tencent#1701)

Co-authored-by: darrenyao87 <62542779+darrenyao87@users.noreply.github.com>

* [mac] 1. FIX X86 and ARM conflict; 2. ADD ARM arch on intel cpu (You can use ARM if rosetta-X86 crash).  3. Use ios project build/profile M1-Mac. (Tencent#1700)

Co-authored-by: gennyxu <gennyxu@tencent.com>
Co-authored-by: lucasktian <lucasktian@tencent.com>

* [iOS][UPD]1. add missing file for xcode project; (Tencent#1705)

* [BUG]fix coreml error of slicev2、padv2 and matmul; (Tencent#1703)

* [BUG]fix YouTu face alignment model

* [UPD]update mean pts file logic

* [UPD]draw face points green

* [UPD]unify example controller list

* [UPD]unify example controller list

* [UPD]move blaze anchor file to resource

* [METAL]update tnn project

* [UPD]update tool onnx2coreml

* [ADD]support ShareCommandQueue between instances

* [ADD]support ShareCommandQueue between instances

* [UPD]add log message

* [UPD]transfer file half.hpp

* [UPD]fix xcode compile error with fp16

* [UPD]fix xcode compile error with fp16

* [UPD]update model type error msg

* [FIX]fix logic error of constofshape

* [UPD]update debug message

* [FIX]support int32 for neg op

* [BUG]fix init error with nil commandbuffer

* [UPD]add mac build xcode project; fix ios mac build script;

* [UPD]add mac build xcode project; fix ios mac build script;

* [ADD]add QQ group 2 of TNN

* [BUG]fix dynamic dequant error; fix arm pad error;

* [BUG]support coreml padv2

* [BUG]fix coreml matmul error when it has const input blob

* [BUG]fix coreml slicev2

* [UPD]add convert logic of swish

* [BUG]fix  error cpu error for x86 mac

* [UPD]support fusion for gemm + bn

* [UPD]add convert logic of swish

Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com>
Co-authored-by: lucasktian <lucasktian@tencent.com>

* [UPD]update merge logic for swish groupnorm deconv (Tencent#1708)

* [BUG]fix YouTu face alignment model

* [UPD]update mean pts file logic

* [UPD]draw face points green

* [UPD]unify example controller list

* [UPD]unify example controller list

* [UPD]move blaze anchor file to resource

* [METAL]update tnn project

* [UPD]update tool onnx2coreml

* [ADD]support ShareCommandQueue between instances

* [ADD]support ShareCommandQueue between instances

* [UPD]add log message

* [UPD]transfer file half.hpp

* [UPD]fix xcode compile error with fp16

* [UPD]fix xcode compile error with fp16

* [UPD]update model type error msg

* [FIX]fix logic error of constofshape

* [UPD]update debug message

* [FIX]support int32 for neg op

* [BUG]fix init error with nil commandbuffer

* [UPD]add mac build xcode project; fix ios mac build script;

* [UPD]add mac build xcode project; fix ios mac build script;

* [ADD]add QQ group 2 of TNN

* [BUG]fix dynamic dequant error; fix arm pad error;

* [BUG]support coreml padv2

* [BUG]fix coreml matmul error when it has const input blob

* [BUG]fix coreml slicev2

* [UPD]add convert logic of swish

* [BUG]fix  error cpu error for x86 mac

* [UPD]support fusion for gemm + bn

* [UPD]add convert logic of swish

* [UPD]support fusion for deconv+add and deconv+add+bn

* [UPD]add aliyun disk link for tnn models

* [UPD]support fusion for group norm

* [UPD]support fusion for swish

Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com>
Co-authored-by: lucasktian <lucasktian@tencent.com>

* [DRQ][BUG]1. fix bug for max_values; (Tencent#1716)

* Hotfix m1 build (Tencent#1715)

* fix apple m1 clang 13.1 compile error

* fix unit test compile error

Co-authored-by: quinnrong <quinnrong@quinnrongs-MacBook-Pro.local>
Co-authored-by: ealinli <37806708+1627180283@users.noreply.github.com>

Co-authored-by: shenpenwang <41420892+Maosquerade@users.noreply.github.com>
Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com>
Co-authored-by: sxj731533730 <sxj731533730@gmail.com>
Co-authored-by: Yulv-git <34329208+Yulv-git@users.noreply.github.com>
Co-authored-by: darrenyao87 <62542779+darrenyao87@users.noreply.github.com>
Co-authored-by: quinnrong94 <67782915+quinnrong94@users.noreply.github.com>
Co-authored-by: lucasktian <lucasktian@tencent.com>
Co-authored-by: powerpwang <72859430+powerpwang@users.noreply.github.com>
Co-authored-by: ealinli <37806708+1627180283@users.noreply.github.com>
Co-authored-by: powerpwang <powerpwang@outlook.com>
Co-authored-by: ealinli <ealinli@tencent.com>
Co-authored-by: Dandi Ding <bluaxe@users.noreply.github.com>
Co-authored-by: saner zheng <zqawszqaws@126.com>
Co-authored-by: sanerzheng <sanerzheng@tencent.com>
Co-authored-by: Feng Shijie <j514681085@icloud.com>
Co-authored-by: Dandiding <Dandiding@tencent.com>
Co-authored-by: FeiGeChuanShu <774074168@qq.com>
Co-authored-by: seanxcwang <66675860+seanxcwang@users.noreply.github.com>
Co-authored-by: doxutx <92915535+doxutx@users.noreply.github.com>
Co-authored-by: kumbayaco <xyu.dai@gmail.com>
Co-authored-by: fishdai <fishdai@tencent.com>
Co-authored-by: anonymous <anonymous@mail.org>
Co-authored-by: jacinhu <jacinhu@tencent.com>
Co-authored-by: XDC <196890111@qq.com>
Co-authored-by: gennyxu <gennyxu@tencent.com>
Co-authored-by: quinnrong <quinnrong@quinnrongs-MacBook-Pro.local>

* [FIX] fix sliceV2 op  conflict with master again

* [METAL][OP][FIX] 1.metal support groupnorm & swish op 2.fix metal blob converter & reformat bug when input dim is 1

* reset model

* [COREML] coreml support swish op

* [COREML] fix coreml batchnorm bug

* [COREML]coreml support groupnorm

* [COREML]coreml support instancenorm

* reset model

* solve conflict

* solve conflict

* Dev groupnorm (Tencent#1726)

* Fix trt multistream logger (Tencent#1521)

* [FIX] fix trt logger

* [FIX] catch std::bad_alloc error for trt8 building

* [FIX] return null while shape_tensor size -1

* Update version.h

Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com>

* Update split_utils.cc (Tencent#1528)

Compiling with mingw32 reports the error below, because the mingw32 compiler still requires the namespace qualifier:
[ 99%] Building CXX object CMakeFiles/TNN.dir/source/tnn/utils/split_utils.cc.obj
D:\TNN\source\tnn\utils\split_utils.cc: In static member function 'static tnn::Status tnn::SplitUtils::SplitStr(const char*, tnn::str_arr&, const char*, bool, bool, bool, bool, bool)':
D:\TNN\source\tnn\utils\split_utils.cc:163:23: error: 'min' was not declared in this scope
             int len = min((i - cursor), subs_length - 1);
I think this change is better: it works with mingw32 while remaining compatible with the previously supported compilers.

Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com>

* Update README.md (Tencent#1538)

Typos

* [UPD]update QQ group (Tencent#1552)

* [BUG]fix YouTu face alignment model

* [UPD]update mean pts file logic

* [UPD]draw face points green

* [UPD]unify example controller list

* [UPD]unify example controller list

* [UPD]move blaze anchor file to resource

* [METAL]update tnn project

* [UPD]update tool onnx2coreml

* [ADD]support ShareCommandQueue between instances

* [ADD]support ShareCommandQueue between instances

* [UPD]add log message

* [UPD]transfer file half.hpp

* [UPD]fix xcode compile error with fp16

* [UPD]fix xcode compile error with fp16

* [UPD]update model type error msg

* [FIX]fix logic error of constofshape

* [UPD]update debug message

* [FIX]support int32 for neg op

* [BUG]fix init error with nil commandbuffer

* [UPD]add mac build xcode project; fix ios mac build script;

* [UPD]add mac build xcode project; fix ios mac build script;

* [ADD]add QQ group 2 of TNN

Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com>

* [opencl][fix] try save program cache (Tencent#1557)

* Dev roi align (Tencent#1511)

* [ARM] fix int32 blob cvt to mat

* [ARM] support roi align

* [ARM] add roi align unit test

* [ARM] add to xcodeproj

Co-authored-by: lucasktian <lucasktian@tencent.com>
Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com>

* Fix arm gather and constant blob (Tencent#1564)

* [ARM][BUG] fix gather error for indice < 0

* [ARM][BUG] fix buffer to blob error without converting precision

* [ARM] update type convert in layer_norm fp16

Co-authored-by: quinnrong94 <67782915+quinnrong94@users.noreply.github.com>

* Dev add config layer (Tencent#1569)

* add config layer param to set arm conv algorithm for specific layer

Co-authored-by: powerpwang <powerpwang@outlook.com>
Co-authored-by: ealinli <ealinli@tencent.com>

* Fix the onnx2tnn compilation failure caused by the protobuf version upgrade (Tencent#1571)

* [ONNX][BUG]1. fix compile bug;

* [ONNX2TNN][BUG]1. fix the compilation problem introduced by the protobuf version upgrade;

* [ADD][TOOLS] add dynamic range quantization (Tencent#1572)

* [ADD][TOOLS] support fake quantization

* [UPD][FAKE_QUANT] fix bug

* [UPD][DOC] add fake quantization in doc

* [UPD] 1.rename fake quant to dynamic range quant 2.move dequant to net_optimizer

* [UPD] remove redundant comment

* [UPD] update comment for DynamicRangeDequant

* [DRQuant][UPD] fix namespace issue

* [DRQuant][UPD] Turn off TNN_SYMBOL_HIDE to fix ci

Co-authored-by: ealinli <ealinli@tencent.com>
Co-authored-by: Dandi Ding <bluaxe@users.noreply.github.com>
Co-authored-by: lucasktian <lucasktian@tencent.com>

* [UPD][OPENCL] opencl support using unoptimized conv (Tencent#1581)

Co-authored-by: ealinli <ealinli@tencent.com>

* [UPD][CONVERTER] lstm support sequence_lens (Tencent#1585)

Co-authored-by: ealinli <ealinli@tencent.com>

* [MODEL_CHECK][BUG]1. fix bug for dump layer(fp16); (Tencent#1567)

Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com>

* Bugfix from train branch (Tencent#1592)

* [BUG] fix get dims value bug when input is 1D or 2D in arm_reduce_layer_acc.cc.
* [BUG] fix Convert from NCHW to NHWC error when input is on arm device.
* [BUG] fix convert mat to blob bug when input is NC_INT32 on arm device.
* [BUG] fix tflite_converter bug when transform a activation layer.
* add nchw format condition when copy int32 mat to blob
* rollback changes on tflite_op_converter.cc

Co-authored-by: sanerzheng <sanerzheng@tencent.com>

* [UPD][OPENCL] opencl support x86 mat (Tencent#1593)

Co-authored-by: ealinli <ealinli@tencent.com>

* [CONVERTER][BUG]1. fix issue 1595; (Tencent#1596)

* [UPD][OPENCL] add ocl version check (Tencent#1601)

* [UPD][OPENCL] add ocl version check

* [UPD][OPENCL] update message for version check

Co-authored-by: ealinli <ealinli@tencent.com>

* [UPD][OPENCL] solve the problem that matmul, tile have incorrect results on helio p65 (Tencent#1602)

Co-authored-by: ealinli <ealinli@tencent.com>

* [UPD][DYQ] fix dynamic range quant compile error on windows (Tencent#1604)

Co-authored-by: ealinli <ealinli@tencent.com>

* [DOC][UPD] modify image links in doc (Tencent#1617)

Co-authored-by: ealinli <ealinli@tencent.com>

* remove redundant test cases (Tencent#1614)

* Fix typos. (Tencent#1626)

* Fix typos.

* Update Readme.

Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com>

* Interpreter change from std::map to safe_map, later one offers a const operator[] function (Tencent#1618)

Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com>
Co-authored-by: lucasktian <lucasktian@tencent.com>

* [UPD][OPENCL] get opencl version when GpuType is OTHER (Tencent#1636)

* [UPD][OPENCL] get opencl version when GpuType is OTHER

* [UPD][OPENCL] optimize nv gpu judgment logic

Co-authored-by: ealinli <ealinli@tencent.com>

* Patch x86 avx support (Tencent#1633)

* merge dev_vc14_m1_debug, support x86 avx

* add option to support x86 avx2 compile

* update win_x86_opencl building script

Co-authored-by: Dandiding <Dandiding@tencent.com>

* fix x86 avx2 options (Tencent#1638)

* fix typos in doc (Tencent#1634)

Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com>

* [X86][BUG] fix deconv layer build error (Tencent#1641)

* [OPENCL][FIX] fix conv and dwconv on some of the AMD GPUs

* [UPD][OPENCL] add coor check for conv and dwconv

* [OPENCL][FIX] fix compilation issues

* [OPENCL][UPD] optimize AMD GPU judgment logic

Co-authored-by: ealinli <ealinli@tencent.com>

* [OPENCL][UPD] fix deconv, avgpool on AMD GPU (Tencent#1646)

* [OPENCL][UPD] fix deconv and avgpool when read image

* [OPENCL][UPD] add header file for pooling

Co-authored-by: ealinli <ealinli@tencent.com>

* [OPENCL][UPD] opencl support cache on windows (Tencent#1645)

* [UPD][OPENCL] add coor check for conv and dwconv

* [OPENCL][FIX] fix compilation issues

* [OPENCL][UPD] optimize AMD GPU judgment logic

* [OPENCL][UPD] support cache on windows

* [OPENCL][UPD] fix load cache on windows

Co-authored-by: ealinli <ealinli@tencent.com>

* [DRQ][UPD] dynamic range quant model support do const folder (Tencent#1647)

* [DRQ][UPD] dynamic range quant model support do const folder

* [TOOLS][UPD] dynamic range quant updates usage

Co-authored-by: ealinli <ealinli@tencent.com>

* 1. make model_check support dynamic range quantized model; (Tencent#1653)

* [ADD][TUTORIAL] add mbv2-ssd conversion and deployment tutorial (Tencent#1640)

* [ADD][TUTORIAL] add mbv2-ssd conversion and deployment tutorial

* [TUTORIAL][UPD] update code link

* [TUTORIAL][UPD] fix typo

Co-authored-by: ealinli <ealinli@tencent.com>

* [X86][FIX] binary op support fp16 weights (Tencent#1655)

* [X86][FIX] binary op support fp16 weights

* [X86][FIX] matmul support fp16 weights

Co-authored-by: ealinli <ealinli@tencent.com>

* Feature dynamic quant fc (Tencent#1660)

* [DYNAMIC_QUANT][UPD]1. dynamic quant support inner_product layer;

* [ARM][UPD]1. arm gemm uses the Kahan sum algorithm in some cases to avoid fp16 accumulation error;

* [FIX][CPU][TRT] Fix CPU Not OP bug, Fix TensorRT ShapeTensor Class Bug. (Tencent#1663)

* [FIX] Fix CPU Not Operator data type error.

* [FIX] Fix TensorRT ShapeTensor class ConvertTo1D() func bug

* fix _mm256_load_ps segmentation fault (Tencent#1682)

* fix _mm256_load_ps segmentation fault

* fix crash on mm256_load when  innerproduct

* use loadu instead of stride-judgement

* remove unused code

Co-authored-by: fishdai <fishdai@tencent.com>

* x86_acc & blob_converter now will consider the BlobHandle.bytes_offset (Tencent#1684)

* Dev x86 layer adapter (Tencent#1683)

* [X86] add layer acc adapter

* [X86] NULL to nullptr

* [X86][OPENVINO] add openvino adapter layer builder, fallback to cpu naive impl if there is no normal ov layer builder

* [X86][OPENVINO] fix hard code of ov precision

Co-authored-by: anonymous <anonymous@mail.org>

* [ARM] fix arm cross compile error caused by float-abi (Tencent#1678)

* avoid nullptr in IsSupport (Tencent#1685)

* [UPD][TOOLS] 1.increase subs_length 2.align model support bool and int32 input 3. fix gather and onehot convert 4. gather_nd support indices_shape[-1] < r (Tencent#1686)

Co-authored-by: ealinli <ealinli@tencent.com>

* Dev metal ngray (Tencent#1693)

* [METAL] metal support ngray input mat

* [METAL]fix bytes_size

* [COREML] fix dynamic quantization model about coreml

Co-authored-by: jacinhu <jacinhu@tencent.com>
Co-authored-by: darrenyao87 <62542779+darrenyao87@users.noreply.github.com>

* [UPD][DRQ] support quantizing matmul's const weight (Tencent#1698)

* [UPD][DRQ] support quantizing matmul's const weight

* [UPD][DRQ] add scale check in constant map

Co-authored-by: ealinli <ealinli@tencent.com>

* [FIX] fix compile macos framework (Tencent#1687)

Co-authored-by: darrenyao87 <62542779+darrenyao87@users.noreply.github.com>

* Optimize dynamic range quantize (Tencent#1699)

* [DynamicRangeQuantize][UPD]1. added logic that decides whether to quantize based on the weight distribution;

* [DynamicQuantization][UPD]1. dynamic_range_quantization support TNN fp16 model;

* [DRQ][UPD]1. fixed a bug where the reference file specified in the model_check_android.sh script was not used during inference; 2. optimized some of the code in dynamic_range_quantization;

* [DRQ][UPD]1.fix conflict with merge master code;

Co-authored-by: ealinli <37806708+1627180283@users.noreply.github.com>

* Fix windows x86 build (Tencent#1697)

* [FIX] remove nanodet for windows

* remove ninja compile for some bug

* fix x86 mat type register macro name

* fix x86 matmul with 2 inputs

Co-authored-by: darrenyao87 <62542779+darrenyao87@users.noreply.github.com>

* [METAL] fix stride slice crash when dims is 2 (Tencent#1701)

Co-authored-by: darrenyao87 <62542779+darrenyao87@users.noreply.github.com>

* [mac] 1. FIX X86 and ARM conflict; 2. ADD ARM arch on intel cpu (You can use ARM if rosetta-X86 crash).  3. Use ios project build/profile M1-Mac. (Tencent#1700)

Co-authored-by: gennyxu <gennyxu@tencent.com>
Co-authored-by: lucasktian <lucasktian@tencent.com>

* [iOS][UPD]1. add missing file for xcode project; (Tencent#1705)

* [BUG]fix coreml error of slicev2、padv2 and matmul; (Tencent#1703)

* [BUG]fix YouTu face alignment model

* [UPD]update mean pts file logic

* [UPD]draw face points green

* [UPD]unify example controller list

* [UPD]unify example controller list

* [UPD]move blaze anchor file to resource

* [METAL]update tnn project

* [UPD]update tool onnx2coreml

* [ADD]support ShareCommandQueue between instances

* [ADD]support ShareCommandQueue between instances

* [UPD]add log message

* [UPD]transfer file half.hpp

* [UPD]fix xcode compile error with fp16

* [UPD]fix xcode compile error with fp16

* [UPD]update model type error msg

* [FIX]fix logic error of constofshape

* [UPD]update debug message

* [FIX]support int32 for neg op

* [BUG]fix init error with nil commandbuffer

* [UPD]add mac build xcode project; fix ios mac build script;

* [UPD]add mac build xcode project; fix ios mac build script;

* [ADD]add QQ group 2 of TNN

* [BUG]fix dynamic dequant error; fix arm pad error;

* [BUG]support coreml padv2

* [BUG]fix coreml matmul error when it has const input blob

* [BUG]fix coreml slicev2

* [UPD]add convert logic of swish

* [BUG]fix  error cpu error for x86 mac

* [UPD]support fusion for gemm + bn

* [UPD]add convert logic of swish

Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com>
Co-authored-by: lucasktian <lucasktian@tencent.com>

* [UPD]update merge logic for swish groupnorm deconv (Tencent#1708)

* [BUG]fix YouTu face alignment model

* [UPD]update mean pts file logic

* [UPD]draw face points green

* [UPD]unify example controller list

* [UPD]unify example controller list

* [UPD]move blaze anchor file to resource

* [METAL]update tnn project

* [UPD]update tool onnx2coreml

* [ADD]support ShareCommandQueue between instances

* [ADD]support ShareCommandQueue between instances

* [UPD]add log message

* [UPD]transfer file half.hpp

* [UPD]fix xcode compile error with fp16

* [UPD]fix xcode compile error with fp16

* [UPD]update model type error msg

* [FIX]fix logic error of constofshape

* [UPD]update debug message

* [FIX]support int32 for neg op

* [BUG]fix init error with nil commandbuffer

* [UPD]add mac build xcode project; fix ios mac build script;

* [UPD]add mac build xcode project; fix ios mac build script;

* [ADD]add QQ group 2 of TNN

* [BUG]fix dynamic dequant error; fix arm pad error;

* [BUG]support coreml padv2

* [BUG]fix coreml matmul error when it has const input blob

* [BUG]fix coreml slicev2

* [UPD]add convert logic of swish

* [BUG]fix  error cpu error for x86 mac

* [UPD]support fusion for gemm + bn

* [UPD]add convert logic of swish

* [UPD]support fusion for deconv+add and deconv+add+bn

* [UPD]add aliyun disk link for tnn models

* [UPD]support fusion for group norm

* [UPD]support fusion for swish

Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com>
Co-authored-by: lucasktian <lucasktian@tencent.com>

* [DRQ][BUG]1. fix bug for max_values; (Tencent#1716)

* Hotfix m1 build (Tencent#1715)

* fix apple m1 clang 13.1 compile error

* fix unit test compile error

Co-authored-by: quinnrong <quinnrong@quinnrongs-MacBook-Pro.local>
Co-authored-by: ealinli <37806708+1627180283@users.noreply.github.com>

* [ARM] support groupnorm

* [ARM] support swish

* add swish to conv-post-fuse

* [ADD][OPENCL] opencl add group norm and swish (Tencent#1722)

Co-authored-by: ealinli <ealinli@tencent.com>

* add x86 swish and groupnorm operator; explicitly open sse4.2 with low version of compiler

Co-authored-by: shenpenwang <41420892+Maosquerade@users.noreply.github.com>
Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com>
Co-authored-by: sxj731533730 <sxj731533730@gmail.com>
Co-authored-by: Yulv-git <34329208+Yulv-git@users.noreply.github.com>
Co-authored-by: darrenyao87 <62542779+darrenyao87@users.noreply.github.com>
Co-authored-by: quinnrong94 <67782915+quinnrong94@users.noreply.github.com>
Co-authored-by: lucasktian <lucasktian@tencent.com>
Co-authored-by: powerpwang <72859430+powerpwang@users.noreply.github.com>
Co-authored-by: ealinli <37806708+1627180283@users.noreply.github.com>
Co-authored-by: powerpwang <powerpwang@outlook.com>
Co-authored-by: ealinli <ealinli@tencent.com>
Co-authored-by: Dandi Ding <bluaxe@users.noreply.github.com>
Co-authored-by: saner zheng <zqawszqaws@126.com>
Co-authored-by: sanerzheng <sanerzheng@tencent.com>
Co-authored-by: Feng Shijie <j514681085@icloud.com>
Co-authored-by: Dandiding <Dandiding@tencent.com>
Co-authored-by: FeiGeChuanShu <774074168@qq.com>
Co-authored-by: seanxcwang <66675860+seanxcwang@users.noreply.github.com>
Co-authored-by: doxutx <92915535+doxutx@users.noreply.github.com>
Co-authored-by: kumbayaco <xyu.dai@gmail.com>
Co-authored-by: fishdai <fishdai@tencent.com>
Co-authored-by: anonymous <anonymous@mail.org>
Co-authored-by: jacinhu <jacinhu@tencent.com>
Co-authored-by: XDC <196890111@qq.com>
Co-authored-by: gennyxu <gennyxu@tencent.com>
Co-authored-by: quinnrong <quinnrong@quinnrongs-MacBook-Pro.local>
Co-authored-by: quinnrong <quinnrong@tencent.com>
Co-authored-by: shenpenwang <565067453@qq.com>

* fix coreml groupnorm unit test

* [ADD]add exp op

* [BUG]fix deconv bias error

* [UPD]init cpu memory with 0 for bert model

* [BUG]fix reshape static error; reshape static layer cannot handle 0 or -1

* [UPD]support inst norm for coreml; update tnn project file;

* [BUG]fix error for layer without layer resource, [] operator will add one, which is not thread safe

* [UPD]add param to batchnorm to support instancenorm

* [UPD]adjust groupnorm with batchnorm

* [UPD]support instancenorm with groupnorm by setting group==channels

* [UPD]update unit test of instancenorm

* [BUG]fix unit test error for layer batchnorm

* [UPD]update tnn project

* [BUG]fix unit test error for APPLE NPU

* [BUG]fix unit test crash for layer batchnorm

* [UPD]ignore cpu or gpu benchmark for mlmodel or mlmodelc

* [UPD]ignore

* [UPD]ignore pixelshuffle for apple npu

* [UPD]ignore matconvert for apple npu

* [UPD]ignore some unary op for apple npu

* [UPD]unify before and after coreml layer, simplify lstm layer

* [UPD]fix lstm error for ht and ct for biLSTM

* [UPD]fix const input load error

* [UPD]fix internal error

* [UPD]ignore

Co-authored-by: jacinhu <jacinhu@tencent.com>
Co-authored-by: teslawho <597645882@qq.com>
Co-authored-by: teslawho <71381575+teslawho@users.noreply.github.com>
Co-authored-by: shenpenwang <41420892+Maosquerade@users.noreply.github.com>
Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com>
Co-authored-by: sxj731533730 <sxj731533730@gmail.com>
Co-authored-by: Yulv-git <34329208+Yulv-git@users.noreply.github.com>
Co-authored-by: quinnrong94 <67782915+quinnrong94@users.noreply.github.com>
Co-authored-by: lucasktian <lucasktian@tencent.com>
Co-authored-by: powerpwang <72859430+powerpwang@users.noreply.github.com>
Co-authored-by: ealinli <37806708+1627180283@users.noreply.github.com>
Co-authored-by: powerpwang <powerpwang@outlook.com>
Co-authored-by: ealinli <ealinli@tencent.com>
Co-authored-by: Dandi Ding <bluaxe@users.noreply.github.com>
Co-authored-by: saner zheng <zqawszqaws@126.com>
Co-authored-by: sanerzheng <sanerzheng@tencent.com>
Co-authored-by: Feng Shijie <j514681085@icloud.com>
Co-authored-by: Dandiding <Dandiding@tencent.com>
Co-authored-by: FeiGeChuanShu <774074168@qq.com>
Co-authored-by: seanxcwang <66675860+seanxcwang@users.noreply.github.com>
Co-authored-by: doxutx <92915535+doxutx@users.noreply.github.com>
Co-authored-by: kumbayaco <xyu.dai@gmail.com>
Co-authored-by: fishdai <fishdai@tencent.com>
Co-authored-by: anonymous <anonymous@mail.org>
Co-authored-by: XDC <196890111@qq.com>
Co-authored-by: gennyxu <gennyxu@tencent.com>
Co-authored-by: quinnrong <quinnrong@quinnrongs-MacBook-Pro.local>
Co-authored-by: quinnrong <quinnrong@tencent.com>
Co-authored-by: shenpenwang <565067453@qq.com>
* [ARM] support groupnorm

* [ARM] support swish

* add swish to conv-post-fuse

* [ADD][OPENCL] opencl add group norm and swish (Tencent#1722)

Co-authored-by: ealinli <ealinli@tencent.com>

* add x86 swish and groupnorm operator; explicitly open sse4.2 with low version of compiler

* fix lstm unit test

Co-authored-by: ealinli <37806708+1627180283@users.noreply.github.com>
Co-authored-by: ealinli <ealinli@tencent.com>
Co-authored-by: shenpenwang <565067453@qq.com>
@ZaoZhe6666
Author

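Switched to using an Optimizer to support INT-type computation. The changes include:

  1. Added an optimizer pass that picks out operators that compute on INT32 data or produce INT8 output, and inserts CAST operators at the appropriate positions before and after them.
  2. Removed the original TransDataType functions; in-place conversion is no longer used.
  3. Removed the INT32 computation from the internal logic of some operators, keeping only the original FLOAT path (INT data is converted to FLOAT by CAST before it takes part in the computation).
  4. Added CAST support for FLOAT_TO_INT8 and INT32_TO_INT8.
  5. Some Constant-type initializer data now gets a CAST operator that converts it to FLOAT. These newly added operators should not be treated as Constant by default (otherwise the address transformation would be skipped), so corresponding filtering logic was added in base_layer and optimizer/layout_reformat.
  6. The equal/greater operators reuse Binary's Float computation logic; the output is then converted to INT8 by a Cast operator.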
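
For readers following the discussion, the Cast-insertion idea in the list above can be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not the code in this PR: `DType`, `Layer`, `Graph`, `InsertCastLayers`, the helper predicates, and the `_cast_fp32` / `_fp32` blob-name suffixes are all hypothetical stand-ins rather than TNN types, and the set of layer types handled is assumed.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for illustration; not TNN's real types.
enum class DType { Float, Int32, Int8 };

struct Layer {
    std::string type;                  // e.g. "Add", "Equal", "Cast"
    std::vector<std::string> inputs;   // names of consumed blobs
    std::vector<std::string> outputs;  // names of produced blobs
    DType cast_to;                     // only meaningful when type == "Cast"
};

struct Graph {
    std::vector<Layer> layers;            // assumed topologically ordered
    std::map<std::string, DType> dtypes;  // blob name -> data type
};

// Assumption: these layer types only have a FLOAT kernel.
static bool NeedsFloatInputs(const std::string &type) {
    return type == "Add" || type == "Equal" || type == "Greater";
}

// Assumption: comparison results are stored as INT8.
static bool ProducesMask(const std::string &type) {
    return type == "Equal" || type == "Greater";
}

static bool IsInt32(const Graph &g, const std::string &blob) {
    auto it = g.dtypes.find(blob);
    return it != g.dtypes.end() && it->second == DType::Int32;
}

// Returns a rewritten graph in which INT32 inputs are converted by explicit
// Cast layers that write to NEW blobs. The original INT32 blob is untouched,
// so other layers that reference it still see integer data.
Graph InsertCastLayers(const Graph &src) {
    Graph dst;
    dst.dtypes = src.dtypes;
    for (Layer layer : src.layers) {  // work on a copy of each layer
        if (NeedsFloatInputs(layer.type)) {
            for (std::string &in : layer.inputs) {
                if (!IsInt32(dst, in)) continue;
                const std::string casted = in + "_cast_fp32";
                if (dst.dtypes.count(casted) == 0) {  // reuse an earlier cast
                    dst.layers.push_back(Layer{"Cast", {in}, {casted}, DType::Float});
                    dst.dtypes[casted] = DType::Float;
                }
                in = casted;
            }
        }
        if (ProducesMask(layer.type)) {
            assert(!layer.outputs.empty());
            // The float kernel writes to a temporary blob; a trailing Cast
            // produces the INT8 blob under the original output name.
            const std::string final_name = layer.outputs[0];
            const std::string tmp = final_name + "_fp32";
            layer.outputs[0] = tmp;
            dst.dtypes[tmp] = DType::Float;
            dst.layers.push_back(layer);
            dst.layers.push_back(Layer{"Cast", {tmp}, {final_name}, DType::Int8});
            dst.dtypes[final_name] = DType::Int8;
        } else {
            dst.layers.push_back(layer);
        }
    }
    return dst;
}
```

Writing the converted data to a separate blob is what addresses the review concern about in-place conversion: a blob referenced by several layers keeps its original type, and each consumer that needs FLOAT reads the cast copy instead.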
