Dev transformer #1723

ZaoZhe6666 · 2022-07-12T02:52:38Z

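Hello. To support Transformer models on our business side, we added support for the ARM Where, Cast, Unsqueeze, Shape, Not, Equal, and Greater operators to your project. We also added support for INT32 computation on ARM: the data is converted to FLOAT in place, the computation runs, and the result is converted back. For a more detailed explanation, see the README_EVA file in the commit and the iwiki link it mentions. Thank you.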
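As a rough illustration of the INT32 path described above (convert to FLOAT in place, compute, convert back), here is a minimal standalone sketch. It does not use any TNN API: `AddScalarViaFloat` and its plain `std::vector<int32_t>` buffer are hypothetical stand-ins for a layer and its blob, and the sketch assumes a 4-byte IEEE-754 `float`.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

static_assert(sizeof(float) == sizeof(int32_t), "sketch assumes a 4-byte float");

// Hypothetical helper (not part of TNN): adds `addend` to every element of an
// int32 buffer by running the arithmetic in float, reusing the same storage.
void AddScalarViaFloat(std::vector<int32_t>& buf, int32_t addend) {
    // Step 1: overwrite each int32 slot with the bit pattern of its float value.
    for (auto& slot : buf) {
        float f = static_cast<float>(slot);
        std::memcpy(&slot, &f, sizeof(f));
    }
    // Step 2: compute in float; the buffer currently holds float bit patterns.
    for (auto& slot : buf) {
        float f;
        std::memcpy(&f, &slot, sizeof(f));
        f += static_cast<float>(addend);
        std::memcpy(&slot, &f, sizeof(f));
    }
    // Step 3: convert each slot back to int32 (truncation toward zero).
    for (auto& slot : buf) {
        float f;
        std::memcpy(&f, &slot, sizeof(f));
        slot = static_cast<int32_t>(f);
    }
}
```

For small values this round trip is exact, for example `{1, 2, 3}` plus 10 gives `{11, 12, 13}`. A `float` has a 24-bit significand, so integers above 2^24 are not all representable: adding 1 to 16777216 this way cannot yield 16777217.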

bluaxe · 2022-08-02T06:13:03Z

source/tnn/device/arm/acc/arm_concat_layer_acc.cc

@@ -470,12 +478,44 @@ Status ArmConcatLayerAcc::Exec(const std::vector<Blob *> &inputs, const std::vec
    return TNN_OK;
 }

+// Change: added a new function TransDataType, which converts data of type T_IN and stores it as type T_OUT


Converting a blob's data type in place may produce incorrect results when the blob is referenced by multiple layers.
Besides, the TransDataType function is defined in multiple layers, which is hard to maintain. Please add an optimizer that inserts cast layers to do the conversion instead.
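The hazard described in this comment can be sketched outside TNN as follows. `ToyBlob`, `ConvertToFloatInPlace`, and `CastToFloatCopy` are hypothetical names used only for illustration, not TNN types or the PR's actual code, and the sketch assumes a 4-byte IEEE-754 `float`. It contrasts an in-place conversion, which changes the bytes seen by every consumer of the blob, with a cast that writes into a separate blob, which is roughly what an inserted cast layer would do.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

static_assert(sizeof(float) == sizeof(int32_t), "sketch assumes a 4-byte float");

// Hypothetical stand-in for a blob: a type tag plus raw 4-byte elements.
enum class ToyDataType { kInt32, kFloat };

struct ToyBlob {
    ToyDataType data_type;
    std::vector<unsigned char> bytes;
};

ToyBlob MakeInt32Blob(const std::vector<int32_t>& values) {
    ToyBlob blob{ToyDataType::kInt32,
                 std::vector<unsigned char>(values.size() * sizeof(int32_t))};
    if (!values.empty()) {
        std::memcpy(blob.bytes.data(), values.data(), blob.bytes.size());
    }
    return blob;
}

// Reads element i the way a layer that expects INT32 input would.
int32_t ReadInt32(const ToyBlob& blob, std::size_t i) {
    assert((i + 1) * sizeof(int32_t) <= blob.bytes.size());
    int32_t v;
    std::memcpy(&v, blob.bytes.data() + i * sizeof(int32_t), sizeof(v));
    return v;
}

// Reads element i the way a layer that expects FLOAT input would.
float ReadFloat(const ToyBlob& blob, std::size_t i) {
    assert((i + 1) * sizeof(float) <= blob.bytes.size());
    float v;
    std::memcpy(&v, blob.bytes.data() + i * sizeof(float), sizeof(v));
    return v;
}

// In-place conversion: every other consumer of `blob` now sees float bits.
void ConvertToFloatInPlace(ToyBlob& blob) {
    const std::size_t count = blob.bytes.size() / sizeof(int32_t);
    for (std::size_t i = 0; i < count; ++i) {
        const float f = static_cast<float>(ReadInt32(blob, i));
        std::memcpy(blob.bytes.data() + i * sizeof(float), &f, sizeof(f));
    }
    blob.data_type = ToyDataType::kFloat;
}

// Cast into a separate blob: the source stays valid for its other consumers.
ToyBlob CastToFloatCopy(const ToyBlob& src) {
    ToyBlob dst = src;
    ConvertToFloatInPlace(dst);
    return dst;
}
```

With a blob built from `{1, 2, 3}`, `CastToFloatCopy` leaves the source readable as INT32, whereas after `ConvertToFloatInPlace` a second layer calling `ReadInt32` on the same blob reads the bit pattern of `1.0f` rather than 1.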

bluaxe · 2022-08-02T06:14:55Z

source/tnn/device/arm/acc/arm_binary_layer_acc.cc

@@ -369,9 +415,39 @@ Status ArmBinaryLayerAcc::ExecInt8(const std::vector<Blob *> &inputs, const std:
    return TNN_OK;
 }

+template <typename T_IN, typename T_OUT>


Same as above.

bluaxe · 2022-08-02T06:18:07Z

Binary files should not be included. Please resubmit the PR without source/tnn/tnn.zip

…page.action?pageId=1928435580

* add logsoftmax kernel and trt layer builder Signed-off-by: sjfeng1999 <j514681085@icloud.com> * add logsoftmax unittest Signed-off-by: sjfeng1999 <j514681085@icloud.com> * [CUDA][TRT] unpack logsoftmaxPlugin to SoftmaxLayer and UnargLayer Signed-off-by: sjfeng1999 <j514681085@icloud.com>

* [UPD] fix some about error status again * [UPD]enable const folder to infer blobs shape for coreml; fix reshape shape size logic; * [UPD]unify op system;check apple neral engine; * [UPD]unify op system;check apple neral engine; * [FIX] reset multi input in network forward for support image classifier demo * [FIX] fix multi input in network forward * [FIX] fix const op about weight shape(=1) * [FIX] fix const op about weight shape(=1) again * [UPD] update to support multi output forward * [UPD] update to support split op * [UPD]fix coreml multi output case; add cache logic; * [UPD]fix coreml multi output case; add cache logic; * [UPD]fix coreml multi output case; add cache logic; * [UPD]fix multi output error * [FIX] fix pool op about pad * [UPD] update to support pad op (only allowed for H and W dimensions) * [UPD]remove blob manager of coreml network * [UPD]rename coreml_executor to coremlmodel * [UPD] remove InitCoreMLExecutor * [FIX] fix to support different input data type (float32 & int32) in forward * [UPD] update to support expand dims & reduce dims reshape by adding unsqueeze & squeeze * [UPD]change internal device from metal to arm for device npu * [FIX] fix conv op about group conv * [FIX] fix deconv op about group deconv * [UPD] update to support sub op * [UPD] update to support clip op * [UPD] update to support slice op * [UPD] update to support upsample op * [FIX] fix slice op about endindex * [UPD] update to support constant padding, allowed for C , H and W dimensions * [UPD]fix camera switch device * [UPD]fix actual device display error * [UPD]fix cache path * [UPD] upodate to add sub & slice & clip to project * [FIX] fix demo use NPU error * [UPD]fix ocr error * [FIX] fix upsample op about align_corners * [FIX] fix upsample op about Fractional scales * [BUG]fix coreml output nil error; fix upsample nn for fractional scale * [FIX] fix upsample op about scales order * [UPD] update to support slice v2 op * [UPD] update to support tanh v2 op * [FIX] 
fix batchnorm op about mean value * [FIX] fix some annotation * [BUG]fix upsample error; add shuffle channel coreml layer * [FIX] fix innerproduct op about inputchannels * [UPD] remove slicev2 to slice file * [UPD] remove tanhv2 to slice file * [UPD] update to reshape op about expand dims & reduce dims * [UPD] update to innerproduct op adout adding squeeze to reduce dims (in order to match old TNN model) * [UPD] update to support flatten to 2D op * [UPD] update to support relu6 op * [ADD]]add cast coreml layer * [ADD]]add shape coreml layer * [UPD] add flatten & relu6 & shuffle_channel to xcode project * [ADD]]add gather coreml layer * [ADD]]add gelu coreml layer * [ADD]]add layernorm coreml layer * [BUG]support int32 for coreml const layer * [BUG]support shape input for coreml reshape layer * [BUG]support model check for TNN_APPLE_NPU_ENABLE using MLComputeUnitsCPUOnly * [ADD]]add mat_mul coreml layer; * [UPD] update to support reshape layer when reshape_type = 1 * [UPD] update to coreml model input&output support int32 data tpye * [FIX] fix reshape layer about reshapedynamic input & output * [BUG]support mlmodel and mlmodelc for benchmark * [UPD] update to support conv layer with fp16 data type * [FIX] add 'APPLE_NPU' to model_check device_type_message * [FIX] fix some about conv layer with fp16 data type (TNN fp16 -> CoreML fp32) * [FIX] fix some about const layer with fp16 data type (TNN fp16 -> CoreML fp32) * [UPD] update to support deconv layer with fp16 data type (TNN fp16 -> CoreML fp32) * [UPD] update to support innerproduct layer with fp16 data type (TNN fp16 -> CoreML fp32) * [UPD] update to support batchnorm layer with fp16 data type (TNN fp16 -> CoreML fp32) * [UPD] update to support layernorm layer with fp16 data type (TNN fp16 -> CoreML fp32) * [UPD] update to support prelu layer with fp16 data type (TNN fp16 -> CoreML fp32) * [UPD] update to support matmul layer with fp16 data type (TNN fp16 -> CoreML fp32) * [FIX] fix some about matmul layer with 
fp16 data type (TNN fp16 -> CoreML fp32) * [UPD]support fuse form mul+add to batchnorm * [BUG]fix import error * [BUG]fix reshape error * [BUG] fix reshape layer when reshape_type=1 (input_shape_size = output_shape_size = 4) * [BUG] fix reshape layer when reshape_type=1 (input_shape_size = output_shape_size = 4) * [UPD]support ssd * [ADD]]ssdlite-mobilenetv2 from tf * [UPD] update to support conv & deconv & const & innerproduct & batchnorm & layernorm & matmul & prelu layers with fp16 data type (TNN fp16 -> CoreML fp16) * [UPD] update to support batchnorm layers with fp16 data type (TNN fp16 -> CoreML fp16) * [FIX] set coreml layer default using full precision * [UPD] update to support hardsigmoid layer * [UPD] update to support hardswish layer * [UPD] update to support reducesum layer * [UPD] update to support reducemean layer * [UPD] add some coreml layer files to xcode project * [FIX] fix some annotation about hardswish * [BUG]fix reshape for tensor with dims size=0 * [UPD]support landscapeleft ui; clear navbar left items * [UPD]support landscapeleft ui; add stackview to support minor camera preview; * [ADD]add monodepth demo * [UPD] update to support unit_test * [FIX] upload missing download_model.sh and download_model.bat * [UPD] update concat & conv & shuffle uint_test files for APPLE_NPU * [FIX] rename unit_test model * [UPD] update to support softplus layer * [UPD] update to support softsign layer * [UPD] update to support div layer * [UPD] update binary layer unit_test for APPLE_NPU * [UPD] update to support reducemax layer * [UPD] update to support reducemin layer * [UPD]update project file * [UPD]add log error * [UPD] update hardswish layer unit_test for APPLE_NPU * [UPD]add log error * [UPD] update to skip stride_slice when APPLE_NPU * [BUG]fix batchnorm unitest * [BUG]fix prelu unitest * [BUG]fix prelu unitest * [BUG]fix prelu unitest * [BUG] fix unsqueeze unittest * [BUG] fix split unittest * [BUG] fix reshape unittest * [BUG]fix updample unitest * 
[BUG] fix reduce op (reducesum/reducemean/reducemax/reducemin) unittest * [BUG]fix layernorm unitest * [BUG] fix reduce op unittest again * [BUG] fix deconv unittest * [BUG] fix innerproduct unittest * [BUG]fix ssd demo display error * [BUG] fix matmul unittest * [BUG]fix benchmark error to support multiple model in the same directory * [BUG] add some explanation about reduce op unittest * [BUG]fix benchmark error to support multiple model in the same directory * [BUG] add some explanation about reduce op unittest again * [BUG]fix batchnorm param error * [BUG] fix reshape layer unittest * [BUG]fix batchnorm param error * [BUG]fix conv/deconv input/output channel error * [UPD] update to support stride_slice & unittest * [BUG] fix reshape layer unittest when reshape_type = 1 * [BUG] fix reshape layer unittest when reshape_type = 1 using reshapestatic * [BUG] fix reshape layer unittest using reshapestatic * [BUG] fix some annotation about reshape layer * [BUG] fix reshape layer output permute when reshape_type = 1 * [BUG] fix reshape layer using reshapestatic whem reshape_type = 1 * [BUG]fix broadcast layer error for input form constant map； fix bert demo error； * [BUG]fix blob convert error for int32 mat * [BUG]fix reshape name style * [UPD]add tiny bert fixed length 256 * [BUG] fix add layer by binary op base class * [BUG] fix div/mul/sub layer by binary op base class * [BUG]fix batchnorm unitest * [BUG]ensure clean up mlmodelc if error raises when compile * [UPD]adjust demo list * [BUG] fix conv layer about activation inplace * [BUG] fix conv layer about relu6 * [BUG] fix cleanup func none of return * [BUG] remove repetitive line * [BUG]fix batchnorm unitest * [BUG] fix conv layer about relu6 inplace * [UPD]automatically use apple npu * [UPD]add clean logic for coreml * [BUG] fix hardswish layer with 2 inputs * [UPD] update README.md & support.md about APPLE_NPU * [UPD]unify rawbuffer2coremlweight * [UPD]support coreml lstm * [UPD]fix lstm error * [UPD]support 
coreml lstm bidirection * [UPD]support coreml constofshape * [UPD]support slice at axis=0 * [UPD]ignore * [UPD]fix reshape error * [UPD]fix lstm error; replace suqeeze with reshape because some case suqeeze raise runtime compile error for axis = {3, 4} * [UPD]fix slice error * [UPD]support multiple mlmodel in the same dirctory; add autorelease memory, because coreml may need large memory in ocr demo * ignore * [UPD]add log msg * [UPD]fix reshape and slice error * [UPD]add auto release to model * [UPD]add auto release to model * [UPD]unify convertion from rawbuffer to coreml weight param * [FIX] fix matmul from rawbuffer to coreml weight param * [UPD]fix innerproduct input channel error * [BUG] fix matmul weight bug * remove some annotation * [BUG] fix matmul layer about fp16 * [FIX] fix sliceV2 op conflict with master * [FIX] fix sliceV2 op conflict with master * merge master (Tencent#1721) * Fix trt multistream logger (Tencent#1521) * [FIX] fix trt logger * [FIX] catch std::bad_alloc error for trt8 building * [FIX] return null while shape_tensor size -1 * Update version.h Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com> * Update split_utils.cc (Tencent#1528) 我使用mingw32编译提示错误，因为使用mingw32编译器仍然需要空间命名 [ 99%] Building CXX object CMakeFiles/TNN.dir/source/tnn/utils/split_utils.cc.obj D:\TNN\source\tnn\utils\split_utils.cc: In static member function 'static tnn::Status tnn::SplitUtils::SplitStr(const char*, tnn::str_arr&, const char*, bool, bool, bool, bool, bool)': D:\TNN\source\tnn\utils\split_utils.cc:163:23: error: 'min' was not declared in this scope int len = min((i - cursor), subs_length - 1); 个人认为修改这样更好一下，可以适应mingw32和兼顾之前的编译器 Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com> * Update README.md (Tencent#1538) Typos * [UPD]update QQ group (Tencent#1552) * [BUG]fix YouTu face alignment model * [UPD]update mean pts file logic * [UPD]draw face points green * [UPD]unify example controller list * [UPD]unify 
example controller list * [UPD]move blaze anchor file to resource * [METAL]update tnn project * [UPD]update tool onnx2coreml * [ADD]support ShareCommandQueue between instances * [ADD]support ShareCommandQueue between instances * [UPD]add log message * [UPD]transfer file half.hpp * [UPD]fix xcode compile error with fp16 * [UPD]fix xcode compile error with fp16 * [UPD]update model type erro msg * [FIX]fix logic error of constofshape * [UPD]update debug message * [FIX]fsupport int32 for neg op * [BUG]fix init error with nil commadbuffer * [UPD]add mac build xcode project; fix ios mac build script; * [UPD]add mac build xcode project; fix ios mac build script; * [ADD]add QQ group 2 of TNN Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com> * [opencl][fix] try save program cache (Tencent#1557) * Dev roi align (Tencent#1511) * [ARM] fix int32 blob cvt to mat * [ARM] support roi align * [ARM] add roi align unit test * [ARM] add to xcodeproj Co-authored-by: lucasktian <lucasktian@tencent.com> Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com> * Fix arm gather and constant blob (Tencent#1564) * [ARM][BUG] fix gather error for indice < 0 * [ARM][BUG] fix buffer to blob error without converting precision * [ARM] update type convert in layer_norm fp16 Co-authored-by: quinnrong94 <67782915+quinnrong94@users.noreply.github.com> * Dev add config layer (Tencent#1569) * add config layer param to set arm conv algorithm for specific layer Co-authored-by: powerpwang <powerpwang@outlook.com> Co-authored-by: ealinli <ealinli@tencent.com> * 修复 protobuf 版本升级造成的 onnx2tnn 编译失败的问题 (Tencent#1571) * [ONNX][BUG]1. fix compile bug; * [ONNX2TNN][BUG]1. 
修复因为 protobuf 版本升级带来的编译问题; * [ADD][TOOLS] add dynamic range quantization (Tencent#1572) * [ADD][TOOLS] support fake quantization * [UPD][FAKE_QUANT] fix bug * [UPD][DOC] add fake quantization in doc * [UPD] 1.rename fake quant to dynamic range quant 2.move dequant to net_optimizer * [UPD] remove redundant comment * [UPD] update comment for DynamicRangeDequant * [DRQuant][UPD] fix namespace issue * [DRQuant][UPD] Turn off TNN_SYMBOL_HIDE to fix ci Co-authored-by: ealinli <ealinli@tencent.com> Co-authored-by: Dandi Ding <bluaxe@users.noreply.github.com> Co-authored-by: lucasktian <lucasktian@tencent.com> * [UPD][OPENCL] opencl support using unoptimized conv (Tencent#1581) Co-authored-by: ealinli <ealinli@tencent.com> * [UPD][CONVERTER] lstm support sequence_lens (Tencent#1585) Co-authored-by: ealinli <ealinli@tencent.com> * [MODEL_CHECK][BUG]1. fix bug for dump layer(fp16); (Tencent#1567) Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com> * Bugfix from train branch (Tencent#1592) * [BUG] fix get dims value bug when input is 1D or 2D in arm_reduce_layer_acc.cc. * [BUG] fix Convert from NCHW to NHWC error when input is on arm device. * [BUG] fix convert mat to blob bug when input is NC_INT32 on arm device. * [BUG] fix tflite_converter bug when transform a activation layer. * add nchw format condition when copy int32 mat to blob * rollback changes on tflite_op_converter.cc Co-authored-by: sanerzheng <sanerzheng@tencent.com> * [UPD][OPENCL] opencl support x86 mat (Tencent#1593) Co-authored-by: ealinli <ealinli@tencent.com> * [CONVERTER][BUG]1. 
fix issue 1595; (Tencent#1596) * [UPD][OPENCL] add ocl version check (Tencent#1601) * [UPD][OPENCL] add ocl version check * [UPD][OPENCL] update message for vervion check Co-authored-by: ealinli <ealinli@tencent.com> * [UPD][OPENCL] solve the problem that matmul, tile have incorrect results on helio p65 (Tencent#1602) Co-authored-by: ealinli <ealinli@tencent.com> * [UPD][DYQ] fix dynamic range quant compile error on windows (Tencent#1604) Co-authored-by: ealinli <ealinli@tencent.com> * [DOC][UPD] modify image links in doc (Tencent#1617) Co-authored-by: ealinli <ealinli@tencent.com> * remove redundant test cases (Tencent#1614) * Fix typos. (Tencent#1626) * Fix typos. * Update Readme. Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com> * Interpreter change from std::map to safe_map, later one offers a const operator[] function (Tencent#1618) Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com> Co-authored-by: lucasktian <lucasktian@tencent.com> * [UPD][OPENCL] get opencl version when GpuType is OTHER (Tencent#1636) * [UPD][OPENCL] get opencl version when GpuType is OTHER * [UPD][OPENCL] optimize nv gpu judgment logic Co-authored-by: ealinli <ealinli@tencent.com> * Patch x86 avx support (Tencent#1633) * merge dev_vc14_m1_debug, support x86 avx * add option to support x86 avx2 compile * update win_x86_opencl building script Co-authored-by: Dandiding <Dandiding@tencent.com> * fix x86 avx2 options (Tencent#1638) * fix typos in doc (Tencent#1634) Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com> * [X86][BUG] fix deconv layer build error (Tencent#1641) * [OPENCL][FIX] fix conv and dwconv on some of the AMD GPUs * [UPD][OPENCL] add coor check for conv and dwconv * [OPENCL][FIX] fix compilation issues * [OPENCL][UPD] optimize AMD GPU judgment logic Co-authored-by: ealinli <ealinli@tencent.com> * [OPENCL][UPD] fix deconv, avgpool on AMD GPU (Tencent#1646) * [OPENCL][UPD] fix deconv and avgpool when 
read image * [OPENCL][UPD] add header file for pooling Co-authored-by: ealinli <ealinli@tencent.com> * [OPENCL][UPD] opencl support cache on windows (Tencent#1645) * [UPD][OPENCL] add coor check for conv and dwconv * [OPENCL][FIX] fix compilation issues * [OPENCL][UPD] optimize AMD GPU judgment logic * [OPENCL][UPD] support cache on windows * [OPENCL][UPD] fix load cache on windows Co-authored-by: ealinli <ealinli@tencent.com> * [DRQ][UPD] dynamic range quant model support do const folder (Tencent#1647) * [DRQ][UPD] dynamic range quant model support do const folder * [TOOLS][UPD] dynamic range quant updates usage Co-authored-by: ealinli <ealinli@tencent.com> * 1. make model_check support dynamic range quantized model; (Tencent#1653) * [ADD][TUTORIAL] add mbv2-ssd conversion and deployment tutorial (Tencent#1640) * [ADD][TUTORIAL] add mbv2-ssd conversion and deployment tutorial * [TUTORIAL][UPD] update code link * [TUTORIAL][UPD] fix typo Co-authored-by: ealinli <ealinli@tencent.com> * [X86][FIX] binary op support fp16 weights (Tencent#1655) * [X86][FIX] binary op support fp16 weights * [X86][FIX] matmul support fp16 weights Co-authored-by: ealinli <ealinli@tencent.com> * Feature dynamic quant fc (Tencent#1660) * [DYNAMIC_QUANT][UPD]1. dynamic quant support inner_product layer; * [ARM][UPD]1. arm gemm 部分情况下使用 Kahan sum 算法,以避免 fp16 累加误差; * [FIX][CPU][TRT] Fix CPU Not OP bug, Fix TensorRT ShapeTensor Class Bug. (Tencent#1663) * [FIX] Fix CPU Not Operator data type error. 
* [FIX] Fix TensorRT ShapeTensor class ConvertTo1D() func bug * fix _mm256_load_ps segmentation fault (Tencent#1682) * fix _mm256_load_ps segmentation fault * fix crash on mm256_load when innerproduct * use loadu instead of stride-judgement * remove unused code Co-authored-by: fishdai <fishdai@tencent.com> * x86_acc & blob_converter now will consider the BlobHandle.bytes_offset (Tencent#1684) * Dev x86 layer adapter (Tencent#1683) * [X86] add layer acc adapter * [X86] NULL to nullptr * [X86][OPENVINO] add openvino adapter layer builder, fallback to cpu naive impl if there is no normal ov layer builder * [X86][OPENVINO] fix hard code of ov precision Co-authored-by: anonymous <anonymous@mail.org> * [ARM] fix arm cross compile error caused by float-abi (Tencent#1678) * avoid nullptr in IsSupport (Tencent#1685) * [UPD][TOOLS] 1.increase subs_length 2.align model support bool and int32 input 3. fix gather and onehot convert 4. gather_nd support indices_shape[-1] < r (Tencent#1686) Co-authored-by: ealinli <ealinli@tencent.com> * Dev metal ngray (Tencent#1693) * [METAL] metal support ngray input mat * [METAL]fix bytes_size * [COREML] fix dynamic quantization model about coreml Co-authored-by: jacinhu <jacinhu@tencent.com> Co-authored-by: darrenyao87 <62542779+darrenyao87@users.noreply.github.com> * [UPD][DRQ] support quantizing matmul's const weight (Tencent#1698) * [UPD][DRQ] support quantizing matmul's const weight * [UPD][DRQ] add scale check in constant map Co-authored-by: ealinli <ealinli@tencent.com> * [FIX] fix compile macos framework (Tencent#1687) Co-authored-by: darrenyao87 <62542779+darrenyao87@users.noreply.github.com> * Optimize dynamic range quantize (Tencent#1699) * [DynamicRangeQuantize][UPD]1. 添加了根据权重分布判断是否量化的逻辑; * [DynamicQuantization][UPD]1. dynamic_range_quantization support TNN fp16 model; * [DRQ][UPD]1. 修复了 model_check_android.sh 脚本中指定 reference file,但是推理没有用到的 bug;2. 
优化了 dynamic_range_quantization 中的部分代码; * [DRQ][UPD]1.fix conflict with merge master code; Co-authored-by: ealinli <37806708+1627180283@users.noreply.github.com> * Fix windows x86 build (Tencent#1697) * [FIX] remove nanodet for windows * remove ninga compile for some bug * fix x86 mat type register macro name * fix x86 matmul with 2 inputs Co-authored-by: darrenyao87 <62542779+darrenyao87@users.noreply.github.com> * [METAL] fix stride slice crach when dims is 2 (Tencent#1701) Co-authored-by: darrenyao87 <62542779+darrenyao87@users.noreply.github.com> * [mac] 1. FIX X86 and ARM conflict; 2. ADD ARM arch on intel cpu (You can use ARM if rosetta-X86 crash). 3. Use ios project build/profile M1-Mac. (Tencent#1700) Co-authored-by: gennyxu <gennyxu@tencent.com> Co-authored-by: lucasktian <lucasktian@tencent.com> * [iOS][UPD]1. add missing file for xcode project; (Tencent#1705) * [BUG]fix coreml error of slicev2、padv2 and matmul; (Tencent#1703) * [BUG]fix YouTu face alignment model * [UPD]update mean pts file logic * [UPD]draw face points green * [UPD]unify example controller list * [UPD]unify example controller list * [UPD]move blaze anchor file to resource * [METAL]update tnn project * [UPD]update tool onnx2coreml * [ADD]support ShareCommandQueue between instances * [ADD]support ShareCommandQueue between instances * [UPD]add log message * [UPD]transfer file half.hpp * [UPD]fix xcode compile error with fp16 * [UPD]fix xcode compile error with fp16 * [UPD]update model type erro msg * [FIX]fix logic error of constofshape * [UPD]update debug message * [FIX]fsupport int32 for neg op * [BUG]fix init error with nil commadbuffer * [UPD]add mac build xcode project; fix ios mac build script; * [UPD]add mac build xcode project; fix ios mac build script; * [ADD]add QQ group 2 of TNN * [BUG]fix dynamic dequant error; fix arm pad error; * [BUG]support coreml padv2 * [BUG]fix ccoreml matmul error when it has const input blob * [BUG]fix coreml slicev2 * [UPD]add convert logic of swish * 
[BUG]fix error cpu error for x86 mac * [UPD]support fusion for gemm + bn * [UPD]add convert logic of swish Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com> Co-authored-by: lucasktian <lucasktian@tencent.com> * [UPD]update merge logic for swish groupnorm deconv (Tencent#1708) * [BUG]fix YouTu face alignment model * [UPD]update mean pts file logic * [UPD]draw face points green * [UPD]unify example controller list * [UPD]unify example controller list * [UPD]move blaze anchor file to resource * [METAL]update tnn project * [UPD]update tool onnx2coreml * [ADD]support ShareCommandQueue between instances * [ADD]support ShareCommandQueue between instances * [UPD]add log message * [UPD]transfer file half.hpp * [UPD]fix xcode compile error with fp16 * [UPD]fix xcode compile error with fp16 * [UPD]update model type erro msg * [FIX]fix logic error of constofshape * [UPD]update debug message * [FIX]fsupport int32 for neg op * [BUG]fix init error with nil commadbuffer * [UPD]add mac build xcode project; fix ios mac build script; * [UPD]add mac build xcode project; fix ios mac build script; * [ADD]add QQ group 2 of TNN * [BUG]fix dynamic dequant error; fix arm pad error; * [BUG]support coreml padv2 * [BUG]fix ccoreml matmul error when it has const input blob * [BUG]fix coreml slicev2 * [UPD]add convert logic of swish * [BUG]fix error cpu error for x86 mac * [UPD]support fusion for gemm + bn * [UPD]add convert logic of swish * [UPD]support fusion for deconv+add and deconv+add+bn * [UPD]add aliyun disk link for tnn models * [UPD]support fusion for group norm * [UPD]support fusion for swish Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com> Co-authored-by: lucasktian <lucasktian@tencent.com> * [DRQ][BUG]1. 
fix bug for max_values; (Tencent#1716) * Hotfix m1 build (Tencent#1715) * fix apple m1 clang 13.1 compile error * fix unit test compile error Co-authored-by: quinnrong <quinnrong@quinnrongs-MacBook-Pro.local> Co-authored-by: ealinli <37806708+1627180283@users.noreply.github.com> Co-authored-by: shenpenwang <41420892+Maosquerade@users.noreply.github.com> Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com> Co-authored-by: sxj731533730 <sxj731533730@gmail.com> Co-authored-by: Yulv-git <34329208+Yulv-git@users.noreply.github.com> Co-authored-by: darrenyao87 <62542779+darrenyao87@users.noreply.github.com> Co-authored-by: quinnrong94 <67782915+quinnrong94@users.noreply.github.com> Co-authored-by: lucasktian <lucasktian@tencent.com> Co-authored-by: powerpwang <72859430+powerpwang@users.noreply.github.com> Co-authored-by: ealinli <37806708+1627180283@users.noreply.github.com> Co-authored-by: powerpwang <powerpwang@outlook.com> Co-authored-by: ealinli <ealinli@tencent.com> Co-authored-by: Dandi Ding <bluaxe@users.noreply.github.com> Co-authored-by: saner zheng <zqawszqaws@126.com> Co-authored-by: sanerzheng <sanerzheng@tencent.com> Co-authored-by: Feng Shijie <j514681085@icloud.com> Co-authored-by: Dandiding <Dandiding@tencent.com> Co-authored-by: FeiGeChuanShu <774074168@qq.com> Co-authored-by: seanxcwang <66675860+seanxcwang@users.noreply.github.com> Co-authored-by: doxutx <92915535+doxutx@users.noreply.github.com> Co-authored-by: kumbayaco <xyu.dai@gmail.com> Co-authored-by: fishdai <fishdai@tencent.com> Co-authored-by: anonymous <anonymous@mail.org> Co-authored-by: jacinhu <jacinhu@tencent.com> Co-authored-by: XDC <196890111@qq.com> Co-authored-by: gennyxu <gennyxu@tencent.com> Co-authored-by: quinnrong <quinnrong@quinnrongs-MacBook-Pro.local> * [FIX] fix sliceV2 op conflict with master again * [METAL][OP][FIX] 1.metal support groupnorm & swish op 2.fix metal blob conveter & reformat bug when input dim is 1 * reset model * [COREML] coreml 
support swish op * [COREML] fix coreml batchnorn bug * [COREML]coreml support groupmorm * [COREML]coreml support instancenorm * reset model * solve conflict * solve conflict * Dev groupnorm (Tencent#1726) * Fix trt multistream logger (Tencent#1521) * [FIX] fix trt logger * [FIX] catch std::bad_alloc error for trt8 building * [FIX] return null while shape_tensor size -1 * Update version.h Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com> * Update split_utils.cc (Tencent#1528) 我使用mingw32编译提示错误，因为使用mingw32编译器仍然需要空间命名 [ 99%] Building CXX object CMakeFiles/TNN.dir/source/tnn/utils/split_utils.cc.obj D:\TNN\source\tnn\utils\split_utils.cc: In static member function 'static tnn::Status tnn::SplitUtils::SplitStr(const char*, tnn::str_arr&, const char*, bool, bool, bool, bool, bool)': D:\TNN\source\tnn\utils\split_utils.cc:163:23: error: 'min' was not declared in this scope int len = min((i - cursor), subs_length - 1); 个人认为修改这样更好一下，可以适应mingw32和兼顾之前的编译器 Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com> * Update README.md (Tencent#1538) Typos * [UPD]update QQ group (Tencent#1552) * [BUG]fix YouTu face alignment model * [UPD]update mean pts file logic * [UPD]draw face points green * [UPD]unify example controller list * [UPD]unify example controller list * [UPD]move blaze anchor file to resource * [METAL]update tnn project * [UPD]update tool onnx2coreml * [ADD]support ShareCommandQueue between instances * [ADD]support ShareCommandQueue between instances * [UPD]add log message * [UPD]transfer file half.hpp * [UPD]fix xcode compile error with fp16 * [UPD]fix xcode compile error with fp16 * [UPD]update model type erro msg * [FIX]fix logic error of constofshape * [UPD]update debug message * [FIX]fsupport int32 for neg op * [BUG]fix init error with nil commadbuffer * [UPD]add mac build xcode project; fix ios mac build script; * [UPD]add mac build xcode project; fix ios mac build script; * [ADD]add QQ group 2 of TNN 
Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com> * [opencl][fix] try save program cache (Tencent#1557) * Dev roi align (Tencent#1511) * [ARM] fix int32 blob cvt to mat * [ARM] support roi align * [ARM] add roi align unit test * [ARM] add to xcodeproj Co-authored-by: lucasktian <lucasktian@tencent.com> Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com> * Fix arm gather and constant blob (Tencent#1564) * [ARM][BUG] fix gather error for indice < 0 * [ARM][BUG] fix buffer to blob error without converting precision * [ARM] update type convert in layer_norm fp16 Co-authored-by: quinnrong94 <67782915+quinnrong94@users.noreply.github.com> * Dev add config layer (Tencent#1569) * add config layer param to set arm conv algorithm for specific layer Co-authored-by: powerpwang <powerpwang@outlook.com> Co-authored-by: ealinli <ealinli@tencent.com> * 修复 protobuf 版本升级造成的 onnx2tnn 编译失败的问题 (Tencent#1571) * [ONNX][BUG]1. fix compile bug; * [ONNX2TNN][BUG]1. 修复因为 protobuf 版本升级带来的编译问题; * [ADD][TOOLS] add dynamic range quantization (Tencent#1572) * [ADD][TOOLS] support fake quantization * [UPD][FAKE_QUANT] fix bug * [UPD][DOC] add fake quantization in doc * [UPD] 1.rename fake quant to dynamic range quant 2.move dequant to net_optimizer * [UPD] remove redundant comment * [UPD] update comment for DynamicRangeDequant * [DRQuant][UPD] fix namespace issue * [DRQuant][UPD] Turn off TNN_SYMBOL_HIDE to fix ci Co-authored-by: ealinli <ealinli@tencent.com> Co-authored-by: Dandi Ding <bluaxe@users.noreply.github.com> Co-authored-by: lucasktian <lucasktian@tencent.com> * [UPD][OPENCL] opencl support using unoptimized conv (Tencent#1581) Co-authored-by: ealinli <ealinli@tencent.com> * [UPD][CONVERTER] lstm support sequence_lens (Tencent#1585) Co-authored-by: ealinli <ealinli@tencent.com> * [MODEL_CHECK][BUG]1. 
fix bug for dump layer(fp16); (Tencent#1567) Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com> * Bugfix from train branch (Tencent#1592) * [BUG] fix get dims value bug when input is 1D or 2D in arm_reduce_layer_acc.cc. * [BUG] fix Convert from NCHW to NHWC error when input is on arm device. * [BUG] fix convert mat to blob bug when input is NC_INT32 on arm device. * [BUG] fix tflite_converter bug when transform a activation layer. * add nchw format condition when copy int32 mat to blob * rollback changes on tflite_op_converter.cc Co-authored-by: sanerzheng <sanerzheng@tencent.com> * [UPD][OPENCL] opencl support x86 mat (Tencent#1593) Co-authored-by: ealinli <ealinli@tencent.com> * [CONVERTER][BUG]1. fix issue 1595; (Tencent#1596) * [UPD][OPENCL] add ocl version check (Tencent#1601) * [UPD][OPENCL] add ocl version check * [UPD][OPENCL] update message for vervion check Co-authored-by: ealinli <ealinli@tencent.com> * [UPD][OPENCL] solve the problem that matmul, tile have incorrect results on helio p65 (Tencent#1602) Co-authored-by: ealinli <ealinli@tencent.com> * [UPD][DYQ] fix dynamic range quant compile error on windows (Tencent#1604) Co-authored-by: ealinli <ealinli@tencent.com> * [DOC][UPD] modify image links in doc (Tencent#1617) Co-authored-by: ealinli <ealinli@tencent.com> * remove redundant test cases (Tencent#1614) * Fix typos. (Tencent#1626) * Fix typos. * Update Readme. 
Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com> * Interpreter change from std::map to safe_map, later one offers a const operator[] function (Tencent#1618) Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com> Co-authored-by: lucasktian <lucasktian@tencent.com> * [UPD][OPENCL] get opencl version when GpuType is OTHER (Tencent#1636) * [UPD][OPENCL] get opencl version when GpuType is OTHER * [UPD][OPENCL] optimize nv gpu judgment logic Co-authored-by: ealinli <ealinli@tencent.com> * Patch x86 avx support (Tencent#1633) * merge dev_vc14_m1_debug, support x86 avx * add option to support x86 avx2 compile * update win_x86_opencl building script Co-authored-by: Dandiding <Dandiding@tencent.com> * fix x86 avx2 options (Tencent#1638) * fix typos in doc (Tencent#1634) Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com> * [X86][BUG] fix deconv layer build error (Tencent#1641) * [OPENCL][FIX] fix conv and dwconv on some of the AMD GPUs * [UPD][OPENCL] add coor check for conv and dwconv * [OPENCL][FIX] fix compilation issues * [OPENCL][UPD] optimize AMD GPU judgment logic Co-authored-by: ealinli <ealinli@tencent.com> * [OPENCL][UPD] fix deconv, avgpool on AMD GPU (Tencent#1646) * [OPENCL][UPD] fix deconv and avgpool when read image * [OPENCL][UPD] add header file for pooling Co-authored-by: ealinli <ealinli@tencent.com> * [OPENCL][UPD] opencl support cache on windows (Tencent#1645) * [UPD][OPENCL] add coor check for conv and dwconv * [OPENCL][FIX] fix compilation issues * [OPENCL][UPD] optimize AMD GPU judgment logic * [OPENCL][UPD] support cache on windows * [OPENCL][UPD] fix load cache on windows Co-authored-by: ealinli <ealinli@tencent.com> * [DRQ][UPD] dynamic range quant model support do const folder (Tencent#1647) * [DRQ][UPD] dynamic range quant model support do const folder * [TOOLS][UPD] dynamic range quant updates usage Co-authored-by: ealinli <ealinli@tencent.com> * 1. 
make model_check support dynamic range quantized model; (Tencent#1653) * [ADD][TUTORIAL] add mbv2-ssd conversion and deployment tutorial (Tencent#1640) * [ADD][TUTORIAL] add mbv2-ssd conversion and deployment tutorial * [TUTORIAL][UPD] update code link * [TUTORIAL][UPD] fix typo Co-authored-by: ealinli <ealinli@tencent.com> * [X86][FIX] binary op support fp16 weights (Tencent#1655) * [X86][FIX] binary op support fp16 weights * [X86][FIX] matmul support fp16 weights Co-authored-by: ealinli <ealinli@tencent.com> * Feature dynamic quant fc (Tencent#1660) * [DYNAMIC_QUANT][UPD]1. dynamic quant support inner_product layer; * [ARM][UPD]1. arm gemm uses the Kahan sum algorithm in some cases to avoid fp16 accumulation error; * [FIX][CPU][TRT] Fix CPU Not OP bug, Fix TensorRT ShapeTensor Class Bug. (Tencent#1663) * [FIX] Fix CPU Not Operator data type error. * [FIX] Fix TensorRT ShapeTensor class ConvertTo1D() func bug * fix _mm256_load_ps segmentation fault (Tencent#1682) * fix _mm256_load_ps segmentation fault * fix crash on mm256_load when innerproduct * use loadu instead of stride-judgement * remove unused code Co-authored-by: fishdai <fishdai@tencent.com> * x86_acc & blob_converter now will consider the BlobHandle.bytes_offset (Tencent#1684) * Dev x86 layer adapter (Tencent#1683) * [X86] add layer acc adapter * [X86] NULL to nullptr * [X86][OPENVINO] add openvino adapter layer builder, fallback to cpu naive impl if there is no normal ov layer builder * [X86][OPENVINO] fix hard code of ov precision Co-authored-by: anonymous <anonymous@mail.org> * [ARM] fix arm cross compile error caused by float-abi (Tencent#1678) * avoid nullptr in IsSupport (Tencent#1685) * [UPD][TOOLS] 1.increase subs_length 2.align model support bool and int32 input 3. fix gather and onehot convert 4. 
gather_nd support indices_shape[-1] < r (Tencent#1686) Co-authored-by: ealinli <ealinli@tencent.com> * Dev metal ngray (Tencent#1693) * [METAL] metal support ngray input mat * [METAL]fix bytes_size * [COREML] fix dynamic quantization model about coreml Co-authored-by: jacinhu <jacinhu@tencent.com> Co-authored-by: darrenyao87 <62542779+darrenyao87@users.noreply.github.com> * [UPD][DRQ] support quantizing matmul's const weight (Tencent#1698) * [UPD][DRQ] support quantizing matmul's const weight * [UPD][DRQ] add scale check in constant map Co-authored-by: ealinli <ealinli@tencent.com> * [FIX] fix compile macos framework (Tencent#1687) Co-authored-by: darrenyao87 <62542779+darrenyao87@users.noreply.github.com> * Optimize dynamic range quantize (Tencent#1699) * [DynamicRangeQuantize][UPD]1. add logic that decides whether to quantize based on the weight distribution; * [DynamicQuantization][UPD]1. dynamic_range_quantization support TNN fp16 model; * [DRQ][UPD]1. fix a bug where the reference file specified in the model_check_android.sh script was not used during inference; 2. optimize some of the code in dynamic_range_quantization; * [DRQ][UPD]1.fix conflict with merge master code; Co-authored-by: ealinli <37806708+1627180283@users.noreply.github.com> * Fix windows x86 build (Tencent#1697) * [FIX] remove nanodet for windows * remove ninga compile for some bug * fix x86 mat type register macro name * fix x86 matmul with 2 inputs Co-authored-by: darrenyao87 <62542779+darrenyao87@users.noreply.github.com> * [METAL] fix stride slice crach when dims is 2 (Tencent#1701) Co-authored-by: darrenyao87 <62542779+darrenyao87@users.noreply.github.com> * [mac] 1. FIX X86 and ARM conflict; 2. ADD ARM arch on intel cpu (You can use ARM if rosetta-X86 crash). 3. Use ios project build/profile M1-Mac. (Tencent#1700) Co-authored-by: gennyxu <gennyxu@tencent.com> Co-authored-by: lucasktian <lucasktian@tencent.com> * [iOS][UPD]1. 
add missing file for xcode project; (Tencent#1705) * [BUG]fix coreml error of slicev2、padv2 and matmul; (Tencent#1703) * [BUG]fix YouTu face alignment model * [UPD]update mean pts file logic * [UPD]draw face points green * [UPD]unify example controller list * [UPD]unify example controller list * [UPD]move blaze anchor file to resource * [METAL]update tnn project * [UPD]update tool onnx2coreml * [ADD]support ShareCommandQueue between instances * [ADD]support ShareCommandQueue between instances * [UPD]add log message * [UPD]transfer file half.hpp * [UPD]fix xcode compile error with fp16 * [UPD]fix xcode compile error with fp16 * [UPD]update model type erro msg * [FIX]fix logic error of constofshape * [UPD]update debug message * [FIX]fsupport int32 for neg op * [BUG]fix init error with nil commadbuffer * [UPD]add mac build xcode project; fix ios mac build script; * [UPD]add mac build xcode project; fix ios mac build script; * [ADD]add QQ group 2 of TNN * [BUG]fix dynamic dequant error; fix arm pad error; * [BUG]support coreml padv2 * [BUG]fix ccoreml matmul error when it has const input blob * [BUG]fix coreml slicev2 * [UPD]add convert logic of swish * [BUG]fix error cpu error for x86 mac * [UPD]support fusion for gemm + bn * [UPD]add convert logic of swish Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com> Co-authored-by: lucasktian <lucasktian@tencent.com> * [UPD]update merge logic for swish groupnorm deconv (Tencent#1708) * [BUG]fix YouTu face alignment model * [UPD]update mean pts file logic * [UPD]draw face points green * [UPD]unify example controller list * [UPD]unify example controller list * [UPD]move blaze anchor file to resource * [METAL]update tnn project * [UPD]update tool onnx2coreml * [ADD]support ShareCommandQueue between instances * [ADD]support ShareCommandQueue between instances * [UPD]add log message * [UPD]transfer file half.hpp * [UPD]fix xcode compile error with fp16 * [UPD]fix xcode compile error with fp16 * 
[UPD]update model type erro msg * [FIX]fix logic error of constofshape * [UPD]update debug message * [FIX]fsupport int32 for neg op * [BUG]fix init error with nil commadbuffer * [UPD]add mac build xcode project; fix ios mac build script; * [UPD]add mac build xcode project; fix ios mac build script; * [ADD]add QQ group 2 of TNN * [BUG]fix dynamic dequant error; fix arm pad error; * [BUG]support coreml padv2 * [BUG]fix ccoreml matmul error when it has const input blob * [BUG]fix coreml slicev2 * [UPD]add convert logic of swish * [BUG]fix error cpu error for x86 mac * [UPD]support fusion for gemm + bn * [UPD]add convert logic of swish * [UPD]support fusion for deconv+add and deconv+add+bn * [UPD]add aliyun disk link for tnn models * [UPD]support fusion for group norm * [UPD]support fusion for swish Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com> Co-authored-by: lucasktian <lucasktian@tencent.com> * [DRQ][BUG]1. fix bug for max_values; (Tencent#1716) * Hotfix m1 build (Tencent#1715) * fix apple m1 clang 13.1 compile error * fix unit test compile error Co-authored-by: quinnrong <quinnrong@quinnrongs-MacBook-Pro.local> Co-authored-by: ealinli <37806708+1627180283@users.noreply.github.com> * [ARM] support groupnorm * [ARM] support swish * add swish to conv-post-fuse * [ADD][OPENCL] opencl add group norm and swish (Tencent#1722) Co-authored-by: ealinli <ealinli@tencent.com> * add x86 swish and groupnorm operator; explicitly open see4.2 with low version of compiler Co-authored-by: shenpenwang <41420892+Maosquerade@users.noreply.github.com> Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com> Co-authored-by: sxj731533730 <sxj731533730@gmail.com> Co-authored-by: Yulv-git <34329208+Yulv-git@users.noreply.github.com> Co-authored-by: darrenyao87 <62542779+darrenyao87@users.noreply.github.com> Co-authored-by: quinnrong94 <67782915+quinnrong94@users.noreply.github.com> Co-authored-by: lucasktian <lucasktian@tencent.com> 
Co-authored-by: powerpwang <72859430+powerpwang@users.noreply.github.com> Co-authored-by: ealinli <37806708+1627180283@users.noreply.github.com> Co-authored-by: powerpwang <powerpwang@outlook.com> Co-authored-by: ealinli <ealinli@tencent.com> Co-authored-by: Dandi Ding <bluaxe@users.noreply.github.com> Co-authored-by: saner zheng <zqawszqaws@126.com> Co-authored-by: sanerzheng <sanerzheng@tencent.com> Co-authored-by: Feng Shijie <j514681085@icloud.com> Co-authored-by: Dandiding <Dandiding@tencent.com> Co-authored-by: FeiGeChuanShu <774074168@qq.com> Co-authored-by: seanxcwang <66675860+seanxcwang@users.noreply.github.com> Co-authored-by: doxutx <92915535+doxutx@users.noreply.github.com> Co-authored-by: kumbayaco <xyu.dai@gmail.com> Co-authored-by: fishdai <fishdai@tencent.com> Co-authored-by: anonymous <anonymous@mail.org> Co-authored-by: jacinhu <jacinhu@tencent.com> Co-authored-by: XDC <196890111@qq.com> Co-authored-by: gennyxu <gennyxu@tencent.com> Co-authored-by: quinnrong <quinnrong@quinnrongs-MacBook-Pro.local> Co-authored-by: quinnrong <quinnrong@tencent.com> Co-authored-by: shenpenwang <565067453@qq.com> * fix coreml groupnorm unit test * [ADD]add exp op * [BUG]fix deconv bisas error * [UPD]init cpu memory with 0 for bert model * [BUG]fix reshape static error; reshape static layer cannot handle 0 or -1 * [UPD]support inst norm for coreml; update tnn project file; * [BUG]fix error for layer without layer resource, [] operater will add one, which is not thread safe * [UPD]add param to batchnorm to support instancenorm * [UPD]adjust groupnorm with batchnorm * [UPD]support instancenorm with groupnorm by setting group==channels * [UPD]update unit test of instancenorm * [BUG]fix unit test error for layer batchnorm * [UPD]update tnn project * [BUG]fix unit test error for APPLE NPU * [BUG]fix unit test crash for layer batchnorm * [UPD]ignore cpu or gpu benchmark for mlmodel or mlmodelc * [UPD]ignore * [UPD]ignore pixelshuffle for apple npu * [UPD]ignore matconvert 
for apple npu * [UPD]ignore some unary op for apple npu * [UPD]unify before and after coreml layer, simplify lstm layer * [UPD]fix lstm error for ht and ct for biLSTM * [UPD]fix const input load error * [UPD]fix internal error * [UPD]ignore Co-authored-by: jacinhu <jacinhu@tencent.com> Co-authored-by: teslawho <597645882@qq.com> Co-authored-by: teslawho <71381575+teslawho@users.noreply.github.com> Co-authored-by: shenpenwang <41420892+Maosquerade@users.noreply.github.com> Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com> Co-authored-by: sxj731533730 <sxj731533730@gmail.com> Co-authored-by: Yulv-git <34329208+Yulv-git@users.noreply.github.com> Co-authored-by: quinnrong94 <67782915+quinnrong94@users.noreply.github.com> Co-authored-by: lucasktian <lucasktian@tencent.com> Co-authored-by: powerpwang <72859430+powerpwang@users.noreply.github.com> Co-authored-by: ealinli <37806708+1627180283@users.noreply.github.com> Co-authored-by: powerpwang <powerpwang@outlook.com> Co-authored-by: ealinli <ealinli@tencent.com> Co-authored-by: Dandi Ding <bluaxe@users.noreply.github.com> Co-authored-by: saner zheng <zqawszqaws@126.com> Co-authored-by: sanerzheng <sanerzheng@tencent.com> Co-authored-by: Feng Shijie <j514681085@icloud.com> Co-authored-by: Dandiding <Dandiding@tencent.com> Co-authored-by: FeiGeChuanShu <774074168@qq.com> Co-authored-by: seanxcwang <66675860+seanxcwang@users.noreply.github.com> Co-authored-by: doxutx <92915535+doxutx@users.noreply.github.com> Co-authored-by: kumbayaco <xyu.dai@gmail.com> Co-authored-by: fishdai <fishdai@tencent.com> Co-authored-by: anonymous <anonymous@mail.org> Co-authored-by: XDC <196890111@qq.com> Co-authored-by: gennyxu <gennyxu@tencent.com> Co-authored-by: quinnrong <quinnrong@quinnrongs-MacBook-Pro.local> Co-authored-by: quinnrong <quinnrong@tencent.com> Co-authored-by: shenpenwang <565067453@qq.com>

* [ARM] support groupnorm * [ARM] support swish * add swish to conv-post-fuse * [ADD][OPENCL] opencl add group norm and swish (Tencent#1722) Co-authored-by: ealinli <ealinli@tencent.com> * add x86 swish and groupnorm operator; explicitly enable SSE4.2 for older compiler versions * fix lstm unit test Co-authored-by: ealinli <37806708+1627180283@users.noreply.github.com> Co-authored-by: ealinli <ealinli@tencent.com> Co-authored-by: shenpenwang <565067453@qq.com>

Delete the README file so that the changes are ready to merge

…page.action?pageId=1928435580

* [UPD] fix some about error status again * [UPD]enable const folder to infer blobs shape for coreml; fix reshape shape size logic; * [UPD]unify op system;check apple neral engine; * [UPD]unify op system;check apple neral engine; * [FIX] reset multi input in network forward for support image classifier demo * [FIX] fix multi input in network forward * [FIX] fix const op about weight shape(=1) * [FIX] fix const op about weight shape(=1) again * [UPD] update to support multi output forward * [UPD] update to support split op * [UPD]fix coreml multi output case; add cache logic; * [UPD]fix coreml multi output case; add cache logic; * [UPD]fix coreml multi output case; add cache logic; * [UPD]fix multi output error * [FIX] fix pool op about pad * [UPD] update to support pad op (only allowed for H and W dimensions) * [UPD]remove blob manager of coreml network * [UPD]rename coreml_executor to coremlmodel * [UPD] remove InitCoreMLExecutor * [FIX] fix to support different input data type (float32 & int32) in forward * [UPD] update to support expand dims & reduce dims reshape by adding unsqueeze & squeeze * [UPD]change internal device from metal to arm for device npu * [FIX] fix conv op about group conv * [FIX] fix deconv op about group deconv * [UPD] update to support sub op * [UPD] update to support clip op * [UPD] update to support slice op * [UPD] update to support upsample op * [FIX] fix slice op about endindex * [UPD] update to support constant padding, allowed for C , H and W dimensions * [UPD]fix camera switch device * [UPD]fix actual device display error * [UPD]fix cache path * [UPD] upodate to add sub & slice & clip to project * [FIX] fix demo use NPU error * [UPD]fix ocr error * [FIX] fix upsample op about align_corners * [FIX] fix upsample op about Fractional scales * [BUG]fix coreml output nil error; fix upsample nn for fractional scale * [FIX] fix upsample op about scales order * [UPD] update to support slice v2 op * [UPD] update to support tanh v2 op * [FIX] 
fix batchnorm op about mean value * [FIX] fix some annotation * [BUG]fix upsample error; add shuffle channel coreml layer * [FIX] fix innerproduct op about inputchannels * [UPD] remove slicev2 to slice file * [UPD] remove tanhv2 to slice file * [UPD] update to reshape op about expand dims & reduce dims * [UPD] update to innerproduct op adout adding squeeze to reduce dims (in order to match old TNN model) * [UPD] update to support flatten to 2D op * [UPD] update to support relu6 op * [ADD]]add cast coreml layer * [ADD]]add shape coreml layer * [UPD] add flatten & relu6 & shuffle_channel to xcode project * [ADD]]add gather coreml layer * [ADD]]add gelu coreml layer * [ADD]]add layernorm coreml layer * [BUG]support int32 for coreml const layer * [BUG]support shape input for coreml reshape layer * [BUG]support model check for TNN_APPLE_NPU_ENABLE using MLComputeUnitsCPUOnly * [ADD]]add mat_mul coreml layer; * [UPD] update to support reshape layer when reshape_type = 1 * [UPD] update to coreml model input&output support int32 data tpye * [FIX] fix reshape layer about reshapedynamic input & output * [BUG]support mlmodel and mlmodelc for benchmark * [UPD] update to support conv layer with fp16 data type * [FIX] add 'APPLE_NPU' to model_check device_type_message * [FIX] fix some about conv layer with fp16 data type (TNN fp16 -> CoreML fp32) * [FIX] fix some about const layer with fp16 data type (TNN fp16 -> CoreML fp32) * [UPD] update to support deconv layer with fp16 data type (TNN fp16 -> CoreML fp32) * [UPD] update to support innerproduct layer with fp16 data type (TNN fp16 -> CoreML fp32) * [UPD] update to support batchnorm layer with fp16 data type (TNN fp16 -> CoreML fp32) * [UPD] update to support layernorm layer with fp16 data type (TNN fp16 -> CoreML fp32) * [UPD] update to support prelu layer with fp16 data type (TNN fp16 -> CoreML fp32) * [UPD] update to support matmul layer with fp16 data type (TNN fp16 -> CoreML fp32) * [FIX] fix some about matmul layer with 
fp16 data type (TNN fp16 -> CoreML fp32) * [UPD]support fuse form mul+add to batchnorm * [BUG]fix import error * [BUG]fix reshape error * [BUG] fix reshape layer when reshape_type=1 (input_shape_size = output_shape_size = 4) * [BUG] fix reshape layer when reshape_type=1 (input_shape_size = output_shape_size = 4) * [UPD]support ssd * [ADD]]ssdlite-mobilenetv2 from tf * [UPD] update to support conv & deconv & const & innerproduct & batchnorm & layernorm & matmul & prelu layers with fp16 data type (TNN fp16 -> CoreML fp16) * [UPD] update to support batchnorm layers with fp16 data type (TNN fp16 -> CoreML fp16) * [FIX] set coreml layer default using full precision * [UPD] update to support hardsigmoid layer * [UPD] update to support hardswish layer * [UPD] update to support reducesum layer * [UPD] update to support reducemean layer * [UPD] add some coreml layer files to xcode project * [FIX] fix some annotation about hardswish * [BUG]fix reshape for tensor with dims size=0 * [UPD]support landscapeleft ui; clear navbar left items * [UPD]support landscapeleft ui; add stackview to support minor camera preview; * [ADD]add monodepth demo * [UPD] update to support unit_test * [FIX] upload missing download_model.sh and download_model.bat * [UPD] update concat & conv & shuffle uint_test files for APPLE_NPU * [FIX] rename unit_test model * [UPD] update to support softplus layer * [UPD] update to support softsign layer * [UPD] update to support div layer * [UPD] update binary layer unit_test for APPLE_NPU * [UPD] update to support reducemax layer * [UPD] update to support reducemin layer * [UPD]update project file * [UPD]add log error * [UPD] update hardswish layer unit_test for APPLE_NPU * [UPD]add log error * [UPD] update to skip stride_slice when APPLE_NPU * [BUG]fix batchnorm unitest * [BUG]fix prelu unitest * [BUG]fix prelu unitest * [BUG]fix prelu unitest * [BUG] fix unsqueeze unittest * [BUG] fix split unittest * [BUG] fix reshape unittest * [BUG]fix updample unitest * 
[BUG] fix reduce op (reducesum/reducemean/reducemax/reducemin) unittest * [BUG]fix layernorm unitest * [BUG] fix reduce op unittest again * [BUG] fix deconv unittest * [BUG] fix innerproduct unittest * [BUG]fix ssd demo display error * [BUG] fix matmul unittest * [BUG]fix benchmark error to support multiple model in the same directory * [BUG] add some explanation about reduce op unittest * [BUG]fix benchmark error to support multiple model in the same directory * [BUG] add some explanation about reduce op unittest again * [BUG]fix batchnorm param error * [BUG] fix reshape layer unittest * [BUG]fix batchnorm param error * [BUG]fix conv/deconv input/output channel error * [UPD] update to support stride_slice & unittest * [BUG] fix reshape layer unittest when reshape_type = 1 * [BUG] fix reshape layer unittest when reshape_type = 1 using reshapestatic * [BUG] fix reshape layer unittest using reshapestatic * [BUG] fix some annotation about reshape layer * [BUG] fix reshape layer output permute when reshape_type = 1 * [BUG] fix reshape layer using reshapestatic whem reshape_type = 1 * [BUG]fix broadcast layer error for input form constant map； fix bert demo error； * [BUG]fix blob convert error for int32 mat * [BUG]fix reshape name style * [UPD]add tiny bert fixed length 256 * [BUG] fix add layer by binary op base class * [BUG] fix div/mul/sub layer by binary op base class * [BUG]fix batchnorm unitest * [BUG]ensure clean up mlmodelc if error raises when compile * [UPD]adjust demo list * [BUG] fix conv layer about activation inplace * [BUG] fix conv layer about relu6 * [BUG] fix cleanup func none of return * [BUG] remove repetitive line * [BUG]fix batchnorm unitest * [BUG] fix conv layer about relu6 inplace * [UPD]automatically use apple npu * [UPD]add clean logic for coreml * [BUG] fix hardswish layer with 2 inputs * [UPD] update README.md & support.md about APPLE_NPU * [UPD]unify rawbuffer2coremlweight * [UPD]support coreml lstm * [UPD]fix lstm error * [UPD]support 
coreml lstm bidirection * [UPD]support coreml constofshape * [UPD]support slice at axis=0 * [UPD]ignore * [UPD]fix reshape error * [UPD]fix lstm error; replace suqeeze with reshape because some case suqeeze raise runtime compile error for axis = {3, 4} * [UPD]fix slice error * [UPD]support multiple mlmodel in the same dirctory; add autorelease memory, because coreml may need large memory in ocr demo * ignore * [UPD]add log msg * [UPD]fix reshape and slice error * [UPD]add auto release to model * [UPD]add auto release to model * [UPD]unify convertion from rawbuffer to coreml weight param * [FIX] fix matmul from rawbuffer to coreml weight param * [UPD]fix innerproduct input channel error * [BUG] fix matmul weight bug * remove some annotation * [BUG] fix matmul layer about fp16 * [FIX] fix sliceV2 op conflict with master * [FIX] fix sliceV2 op conflict with master * merge master (Tencent#1721) * Fix trt multistream logger (Tencent#1521) * [FIX] fix trt logger * [FIX] catch std::bad_alloc error for trt8 building * [FIX] return null while shape_tensor size -1 * Update version.h Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com> * Update split_utils.cc (Tencent#1528) Compiling with mingw32 fails, because the mingw32 compiler still requires the namespace qualifier: [ 99%] Building CXX object CMakeFiles/TNN.dir/source/tnn/utils/split_utils.cc.obj D:\TNN\source\tnn\utils\split_utils.cc: In static member function 'static tnn::Status tnn::SplitUtils::SplitStr(const char*, tnn::str_arr&, const char*, bool, bool, bool, bool, bool)': D:\TNN\source\tnn\utils\split_utils.cc:163:23: error: 'min' was not declared in this scope int len = min((i - cursor), subs_length - 1); I think this change is better, since it works with mingw32 while staying compatible with the previously used compilers. Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com> * Update README.md (Tencent#1538) Typos * [UPD]update QQ group (Tencent#1552) * [BUG]fix YouTu face alignment model * [UPD]update mean pts file logic * [UPD]draw face points green * [UPD]unify example controller list * [UPD]unify 
example controller list * [UPD]move blaze anchor file to resource * [METAL]update tnn project * [UPD]update tool onnx2coreml * [ADD]support ShareCommandQueue between instances * [ADD]support ShareCommandQueue between instances * [UPD]add log message * [UPD]transfer file half.hpp * [UPD]fix xcode compile error with fp16 * [UPD]fix xcode compile error with fp16 * [UPD]update model type erro msg * [FIX]fix logic error of constofshape * [UPD]update debug message * [FIX]fsupport int32 for neg op * [BUG]fix init error with nil commadbuffer * [UPD]add mac build xcode project; fix ios mac build script; * [UPD]add mac build xcode project; fix ios mac build script; * [ADD]add QQ group 2 of TNN Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com> * [opencl][fix] try save program cache (Tencent#1557) * Dev roi align (Tencent#1511) * [ARM] fix int32 blob cvt to mat * [ARM] support roi align * [ARM] add roi align unit test * [ARM] add to xcodeproj Co-authored-by: lucasktian <lucasktian@tencent.com> Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com> * Fix arm gather and constant blob (Tencent#1564) * [ARM][BUG] fix gather error for indice < 0 * [ARM][BUG] fix buffer to blob error without converting precision * [ARM] update type convert in layer_norm fp16 Co-authored-by: quinnrong94 <67782915+quinnrong94@users.noreply.github.com> * Dev add config layer (Tencent#1569) * add config layer param to set arm conv algorithm for specific layer Co-authored-by: powerpwang <powerpwang@outlook.com> Co-authored-by: ealinli <ealinli@tencent.com> * Fix the onnx2tnn build failure caused by the protobuf version upgrade (Tencent#1571) * [ONNX][BUG]1. fix compile bug; * [ONNX2TNN][BUG]1. 
fix the compile problem caused by the protobuf version upgrade; * [ADD][TOOLS] add dynamic range quantization (Tencent#1572) * [ADD][TOOLS] support fake quantization * [UPD][FAKE_QUANT] fix bug * [UPD][DOC] add fake quantization in doc * [UPD] 1.rename fake quant to dynamic range quant 2.move dequant to net_optimizer * [UPD] remove redundant comment * [UPD] update comment for DynamicRangeDequant * [DRQuant][UPD] fix namespace issue * [DRQuant][UPD] Turn off TNN_SYMBOL_HIDE to fix ci Co-authored-by: ealinli <ealinli@tencent.com> Co-authored-by: Dandi Ding <bluaxe@users.noreply.github.com> Co-authored-by: lucasktian <lucasktian@tencent.com> * [UPD][OPENCL] opencl support using unoptimized conv (Tencent#1581) Co-authored-by: ealinli <ealinli@tencent.com> * [UPD][CONVERTER] lstm support sequence_lens (Tencent#1585) Co-authored-by: ealinli <ealinli@tencent.com> * [MODEL_CHECK][BUG]1. fix bug for dump layer(fp16); (Tencent#1567) Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com> * Bugfix from train branch (Tencent#1592) * [BUG] fix get dims value bug when input is 1D or 2D in arm_reduce_layer_acc.cc. * [BUG] fix Convert from NCHW to NHWC error when input is on arm device. * [BUG] fix convert mat to blob bug when input is NC_INT32 on arm device. * [BUG] fix tflite_converter bug when transform a activation layer. * add nchw format condition when copy int32 mat to blob * rollback changes on tflite_op_converter.cc Co-authored-by: sanerzheng <sanerzheng@tencent.com> * [UPD][OPENCL] opencl support x86 mat (Tencent#1593) Co-authored-by: ealinli <ealinli@tencent.com> * [CONVERTER][BUG]1. 
fix issue 1595; (Tencent#1596) * [UPD][OPENCL] add ocl version check (Tencent#1601) * [UPD][OPENCL] add ocl version check * [UPD][OPENCL] update message for vervion check Co-authored-by: ealinli <ealinli@tencent.com> * [UPD][OPENCL] solve the problem that matmul, tile have incorrect results on helio p65 (Tencent#1602) Co-authored-by: ealinli <ealinli@tencent.com> * [UPD][DYQ] fix dynamic range quant compile error on windows (Tencent#1604) Co-authored-by: ealinli <ealinli@tencent.com> * [DOC][UPD] modify image links in doc (Tencent#1617) Co-authored-by: ealinli <ealinli@tencent.com> * remove redundant test cases (Tencent#1614) * Fix typos. (Tencent#1626) * Fix typos. * Update Readme. Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com> * Interpreter change from std::map to safe_map, later one offers a const operator[] function (Tencent#1618) Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com> Co-authored-by: lucasktian <lucasktian@tencent.com> * [UPD][OPENCL] get opencl version when GpuType is OTHER (Tencent#1636) * [UPD][OPENCL] get opencl version when GpuType is OTHER * [UPD][OPENCL] optimize nv gpu judgment logic Co-authored-by: ealinli <ealinli@tencent.com> * Patch x86 avx support (Tencent#1633) * merge dev_vc14_m1_debug, support x86 avx * add option to support x86 avx2 compile * update win_x86_opencl building script Co-authored-by: Dandiding <Dandiding@tencent.com> * fix x86 avx2 options (Tencent#1638) * fix typos in doc (Tencent#1634) Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com> * [X86][BUG] fix deconv layer build error (Tencent#1641) * [OPENCL][FIX] fix conv and dwconv on some of the AMD GPUs * [UPD][OPENCL] add coor check for conv and dwconv * [OPENCL][FIX] fix compilation issues * [OPENCL][UPD] optimize AMD GPU judgment logic Co-authored-by: ealinli <ealinli@tencent.com> * [OPENCL][UPD] fix deconv, avgpool on AMD GPU (Tencent#1646) * [OPENCL][UPD] fix deconv and avgpool when 
read image * [OPENCL][UPD] add header file for pooling Co-authored-by: ealinli <ealinli@tencent.com> * [OPENCL][UPD] opencl support cache on windows (Tencent#1645) * [UPD][OPENCL] add coor check for conv and dwconv * [OPENCL][FIX] fix compilation issues * [OPENCL][UPD] optimize AMD GPU judgment logic * [OPENCL][UPD] support cache on windows * [OPENCL][UPD] fix load cache on windows Co-authored-by: ealinli <ealinli@tencent.com> * [DRQ][UPD] dynamic range quant model support do const folder (Tencent#1647) * [DRQ][UPD] dynamic range quant model support do const folder * [TOOLS][UPD] dynamic range quant updates usage Co-authored-by: ealinli <ealinli@tencent.com> * 1. make model_check support dynamic range quantized model; (Tencent#1653) * [ADD][TUTORIAL] add mbv2-ssd conversion and deployment tutorial (Tencent#1640) * [ADD][TUTORIAL] add mbv2-ssd conversion and deployment tutorial * [TUTORIAL][UPD] update code link * [TUTORIAL][UPD] fix typo Co-authored-by: ealinli <ealinli@tencent.com> * [X86][FIX] binary op support fp16 weights (Tencent#1655) * [X86][FIX] binary op support fp16 weights * [X86][FIX] matmul support fp16 weights Co-authored-by: ealinli <ealinli@tencent.com> * Feature dynamic quant fc (Tencent#1660) * [DYNAMIC_QUANT][UPD]1. dynamic quant support inner_product layer; * [ARM][UPD]1. arm gemm uses the Kahan sum algorithm in some cases to avoid fp16 accumulation error; * [FIX][CPU][TRT] Fix CPU Not OP bug, Fix TensorRT ShapeTensor Class Bug. (Tencent#1663) * [FIX] Fix CPU Not Operator data type error. 
* [FIX] Fix TensorRT ShapeTensor class ConvertTo1D() func bug * fix _mm256_load_ps segmentation fault (Tencent#1682) * fix _mm256_load_ps segmentation fault * fix crash on mm256_load when innerproduct * use loadu instead of stride-judgement * remove unused code Co-authored-by: fishdai <fishdai@tencent.com> * x86_acc & blob_converter now will consider the BlobHandle.bytes_offset (Tencent#1684) * Dev x86 layer adapter (Tencent#1683) * [X86] add layer acc adapter * [X86] NULL to nullptr * [X86][OPENVINO] add openvino adapter layer builder, fallback to cpu naive impl if there is no normal ov layer builder * [X86][OPENVINO] fix hard code of ov precision Co-authored-by: anonymous <anonymous@mail.org> * [ARM] fix arm cross compile error caused by float-abi (Tencent#1678) * avoid nullptr in IsSupport (Tencent#1685) * [UPD][TOOLS] 1.increase subs_length 2.align model support bool and int32 input 3. fix gather and onehot convert 4. gather_nd support indices_shape[-1] < r (Tencent#1686) Co-authored-by: ealinli <ealinli@tencent.com> * Dev metal ngray (Tencent#1693) * [METAL] metal support ngray input mat * [METAL]fix bytes_size * [COREML] fix dynamic quantization model about coreml Co-authored-by: jacinhu <jacinhu@tencent.com> Co-authored-by: darrenyao87 <62542779+darrenyao87@users.noreply.github.com> * [UPD][DRQ] support quantizing matmul's const weight (Tencent#1698) * [UPD][DRQ] support quantizing matmul's const weight * [UPD][DRQ] add scale check in constant map Co-authored-by: ealinli <ealinli@tencent.com> * [FIX] fix compile macos framework (Tencent#1687) Co-authored-by: darrenyao87 <62542779+darrenyao87@users.noreply.github.com> * Optimize dynamic range quantize (Tencent#1699) * [DynamicRangeQuantize][UPD]1. add logic that decides whether to quantize based on the weight distribution; * [DynamicQuantization][UPD]1. dynamic_range_quantization support TNN fp16 model; * [DRQ][UPD]1. fix a bug where the reference file specified in the model_check_android.sh script was not used during inference; 2. 
optimize some of the code in dynamic_range_quantization; * [DRQ][UPD]1.fix conflict with merge master code; Co-authored-by: ealinli <37806708+1627180283@users.noreply.github.com> * Fix windows x86 build (Tencent#1697) * [FIX] remove nanodet for windows * remove ninga compile for some bug * fix x86 mat type register macro name * fix x86 matmul with 2 inputs Co-authored-by: darrenyao87 <62542779+darrenyao87@users.noreply.github.com> * [METAL] fix stride slice crach when dims is 2 (Tencent#1701) Co-authored-by: darrenyao87 <62542779+darrenyao87@users.noreply.github.com> * [mac] 1. FIX X86 and ARM conflict; 2. ADD ARM arch on intel cpu (You can use ARM if rosetta-X86 crash). 3. Use ios project build/profile M1-Mac. (Tencent#1700) Co-authored-by: gennyxu <gennyxu@tencent.com> Co-authored-by: lucasktian <lucasktian@tencent.com> * [iOS][UPD]1. add missing file for xcode project; (Tencent#1705) * [BUG]fix coreml error of slicev2、padv2 and matmul; (Tencent#1703) * [BUG]fix YouTu face alignment model * [UPD]update mean pts file logic * [UPD]draw face points green * [UPD]unify example controller list * [UPD]unify example controller list * [UPD]move blaze anchor file to resource * [METAL]update tnn project * [UPD]update tool onnx2coreml * [ADD]support ShareCommandQueue between instances * [ADD]support ShareCommandQueue between instances * [UPD]add log message * [UPD]transfer file half.hpp * [UPD]fix xcode compile error with fp16 * [UPD]fix xcode compile error with fp16 * [UPD]update model type erro msg * [FIX]fix logic error of constofshape * [UPD]update debug message * [FIX]fsupport int32 for neg op * [BUG]fix init error with nil commadbuffer * [UPD]add mac build xcode project; fix ios mac build script; * [UPD]add mac build xcode project; fix ios mac build script; * [ADD]add QQ group 2 of TNN * [BUG]fix dynamic dequant error; fix arm pad error; * [BUG]support coreml padv2 * [BUG]fix ccoreml matmul error when it has const input blob * [BUG]fix coreml slicev2 * [UPD]add convert logic of swish * 
[BUG]fix error cpu error for x86 mac * [UPD]support fusion for gemm + bn * [UPD]add convert logic of swish Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com> Co-authored-by: lucasktian <lucasktian@tencent.com> * [UPD]update merge logic for swish groupnorm deconv (Tencent#1708) * [BUG]fix YouTu face alignment model * [UPD]update mean pts file logic * [UPD]draw face points green * [UPD]unify example controller list * [UPD]unify example controller list * [UPD]move blaze anchor file to resource * [METAL]update tnn project * [UPD]update tool onnx2coreml * [ADD]support ShareCommandQueue between instances * [ADD]support ShareCommandQueue between instances * [UPD]add log message * [UPD]transfer file half.hpp * [UPD]fix xcode compile error with fp16 * [UPD]fix xcode compile error with fp16 * [UPD]update model type erro msg * [FIX]fix logic error of constofshape * [UPD]update debug message * [FIX]fsupport int32 for neg op * [BUG]fix init error with nil commadbuffer * [UPD]add mac build xcode project; fix ios mac build script; * [UPD]add mac build xcode project; fix ios mac build script; * [ADD]add QQ group 2 of TNN * [BUG]fix dynamic dequant error; fix arm pad error; * [BUG]support coreml padv2 * [BUG]fix ccoreml matmul error when it has const input blob * [BUG]fix coreml slicev2 * [UPD]add convert logic of swish * [BUG]fix error cpu error for x86 mac * [UPD]support fusion for gemm + bn * [UPD]add convert logic of swish * [UPD]support fusion for deconv+add and deconv+add+bn * [UPD]add aliyun disk link for tnn models * [UPD]support fusion for group norm * [UPD]support fusion for swish Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com> Co-authored-by: lucasktian <lucasktian@tencent.com> * [DRQ][BUG]1. 
fix bug for max_values; (Tencent#1716) * Hotfix m1 build (Tencent#1715) * fix apple m1 clang 13.1 compile error * fix unit test compile error Co-authored-by: quinnrong <quinnrong@quinnrongs-MacBook-Pro.local> Co-authored-by: ealinli <37806708+1627180283@users.noreply.github.com> Co-authored-by: shenpenwang <41420892+Maosquerade@users.noreply.github.com> Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com> Co-authored-by: sxj731533730 <sxj731533730@gmail.com> Co-authored-by: Yulv-git <34329208+Yulv-git@users.noreply.github.com> Co-authored-by: darrenyao87 <62542779+darrenyao87@users.noreply.github.com> Co-authored-by: quinnrong94 <67782915+quinnrong94@users.noreply.github.com> Co-authored-by: lucasktian <lucasktian@tencent.com> Co-authored-by: powerpwang <72859430+powerpwang@users.noreply.github.com> Co-authored-by: ealinli <37806708+1627180283@users.noreply.github.com> Co-authored-by: powerpwang <powerpwang@outlook.com> Co-authored-by: ealinli <ealinli@tencent.com> Co-authored-by: Dandi Ding <bluaxe@users.noreply.github.com> Co-authored-by: saner zheng <zqawszqaws@126.com> Co-authored-by: sanerzheng <sanerzheng@tencent.com> Co-authored-by: Feng Shijie <j514681085@icloud.com> Co-authored-by: Dandiding <Dandiding@tencent.com> Co-authored-by: FeiGeChuanShu <774074168@qq.com> Co-authored-by: seanxcwang <66675860+seanxcwang@users.noreply.github.com> Co-authored-by: doxutx <92915535+doxutx@users.noreply.github.com> Co-authored-by: kumbayaco <xyu.dai@gmail.com> Co-authored-by: fishdai <fishdai@tencent.com> Co-authored-by: anonymous <anonymous@mail.org> Co-authored-by: jacinhu <jacinhu@tencent.com> Co-authored-by: XDC <196890111@qq.com> Co-authored-by: gennyxu <gennyxu@tencent.com> Co-authored-by: quinnrong <quinnrong@quinnrongs-MacBook-Pro.local> * [FIX] fix sliceV2 op conflict with master again * [METAL][OP][FIX] 1.metal support groupnorm & swish op 2.fix metal blob conveter & reformat bug when input dim is 1 * reset model * [COREML] coreml 
support swish op * [COREML] fix coreml batchnorn bug * [COREML]coreml support groupmorm * [COREML]coreml support instancenorm * reset model * solve conflict * solve conflict * Dev groupnorm (Tencent#1726) * Fix trt multistream logger (Tencent#1521) * [FIX] fix trt logger * [FIX] catch std::bad_alloc error for trt8 building * [FIX] return null while shape_tensor size -1 * Update version.h Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com> * Update split_utils.cc (Tencent#1528) 我使用mingw32编译提示错误，因为使用mingw32编译器仍然需要空间命名 [ 99%] Building CXX object CMakeFiles/TNN.dir/source/tnn/utils/split_utils.cc.obj D:\TNN\source\tnn\utils\split_utils.cc: In static member function 'static tnn::Status tnn::SplitUtils::SplitStr(const char*, tnn::str_arr&, const char*, bool, bool, bool, bool, bool)': D:\TNN\source\tnn\utils\split_utils.cc:163:23: error: 'min' was not declared in this scope int len = min((i - cursor), subs_length - 1); 个人认为修改这样更好一下，可以适应mingw32和兼顾之前的编译器 Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com> * Update README.md (Tencent#1538) Typos * [UPD]update QQ group (Tencent#1552) * [BUG]fix YouTu face alignment model * [UPD]update mean pts file logic * [UPD]draw face points green * [UPD]unify example controller list * [UPD]unify example controller list * [UPD]move blaze anchor file to resource * [METAL]update tnn project * [UPD]update tool onnx2coreml * [ADD]support ShareCommandQueue between instances * [ADD]support ShareCommandQueue between instances * [UPD]add log message * [UPD]transfer file half.hpp * [UPD]fix xcode compile error with fp16 * [UPD]fix xcode compile error with fp16 * [UPD]update model type erro msg * [FIX]fix logic error of constofshape * [UPD]update debug message * [FIX]fsupport int32 for neg op * [BUG]fix init error with nil commadbuffer * [UPD]add mac build xcode project; fix ios mac build script; * [UPD]add mac build xcode project; fix ios mac build script; * [ADD]add QQ group 2 of TNN 
Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com> * [opencl][fix] try save program cache (Tencent#1557) * Dev roi align (Tencent#1511) * [ARM] fix int32 blob cvt to mat * [ARM] support roi align * [ARM] add roi align unit test * [ARM] add to xcodeproj Co-authored-by: lucasktian <lucasktian@tencent.com> Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com> * Fix arm gather and constant blob (Tencent#1564) * [ARM][BUG] fix gather error for indice < 0 * [ARM][BUG] fix buffer to blob error without converting precision * [ARM] update type convert in layer_norm fp16 Co-authored-by: quinnrong94 <67782915+quinnrong94@users.noreply.github.com> * Dev add config layer (Tencent#1569) * add config layer param to set arm conv algorithm for specific layer Co-authored-by: powerpwang <powerpwang@outlook.com> Co-authored-by: ealinli <ealinli@tencent.com> * 修复 protobuf 版本升级造成的 onnx2tnn 编译失败的问题 (Tencent#1571) * [ONNX][BUG]1. fix compile bug; * [ONNX2TNN][BUG]1. 修复因为 protobuf 版本升级带来的编译问题; * [ADD][TOOLS] add dynamic range quantization (Tencent#1572) * [ADD][TOOLS] support fake quantization * [UPD][FAKE_QUANT] fix bug * [UPD][DOC] add fake quantization in doc * [UPD] 1.rename fake quant to dynamic range quant 2.move dequant to net_optimizer * [UPD] remove redundant comment * [UPD] update comment for DynamicRangeDequant * [DRQuant][UPD] fix namespace issue * [DRQuant][UPD] Turn off TNN_SYMBOL_HIDE to fix ci Co-authored-by: ealinli <ealinli@tencent.com> Co-authored-by: Dandi Ding <bluaxe@users.noreply.github.com> Co-authored-by: lucasktian <lucasktian@tencent.com> * [UPD][OPENCL] opencl support using unoptimized conv (Tencent#1581) Co-authored-by: ealinli <ealinli@tencent.com> * [UPD][CONVERTER] lstm support sequence_lens (Tencent#1585) Co-authored-by: ealinli <ealinli@tencent.com> * [MODEL_CHECK][BUG]1. 
fix bug for dump layer(fp16); (Tencent#1567) Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com> * Bugfix from train branch (Tencent#1592) * [BUG] fix get dims value bug when input is 1D or 2D in arm_reduce_layer_acc.cc. * [BUG] fix Convert from NCHW to NHWC error when input is on arm device. * [BUG] fix convert mat to blob bug when input is NC_INT32 on arm device. * [BUG] fix tflite_converter bug when transform a activation layer. * add nchw format condition when copy int32 mat to blob * rollback changes on tflite_op_converter.cc Co-authored-by: sanerzheng <sanerzheng@tencent.com> * [UPD][OPENCL] opencl support x86 mat (Tencent#1593) Co-authored-by: ealinli <ealinli@tencent.com> * [CONVERTER][BUG]1. fix issue 1595; (Tencent#1596) * [UPD][OPENCL] add ocl version check (Tencent#1601) * [UPD][OPENCL] add ocl version check * [UPD][OPENCL] update message for vervion check Co-authored-by: ealinli <ealinli@tencent.com> * [UPD][OPENCL] solve the problem that matmul, tile have incorrect results on helio p65 (Tencent#1602) Co-authored-by: ealinli <ealinli@tencent.com> * [UPD][DYQ] fix dynamic range quant compile error on windows (Tencent#1604) Co-authored-by: ealinli <ealinli@tencent.com> * [DOC][UPD] modify image links in doc (Tencent#1617) Co-authored-by: ealinli <ealinli@tencent.com> * remove redundant test cases (Tencent#1614) * Fix typos. (Tencent#1626) * Fix typos. * Update Readme. 
Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com> * Interpreter change from std::map to safe_map, later one offers a const operator[] function (Tencent#1618) Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com> Co-authored-by: lucasktian <lucasktian@tencent.com> * [UPD][OPENCL] get opencl version when GpuType is OTHER (Tencent#1636) * [UPD][OPENCL] get opencl version when GpuType is OTHER * [UPD][OPENCL] optimize nv gpu judgment logic Co-authored-by: ealinli <ealinli@tencent.com> * Patch x86 avx support (Tencent#1633) * merge dev_vc14_m1_debug, support x86 avx * add option to support x86 avx2 compile * update win_x86_opencl building script Co-authored-by: Dandiding <Dandiding@tencent.com> * fix x86 avx2 options (Tencent#1638) * fix typos in doc (Tencent#1634) Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com> * [X86][BUG] fix deconv layer build error (Tencent#1641) * [OPENCL][FIX] fix conv and dwconv on some of the AMD GPUs * [UPD][OPENCL] add coor check for conv and dwconv * [OPENCL][FIX] fix compilation issues * [OPENCL][UPD] optimize AMD GPU judgment logic Co-authored-by: ealinli <ealinli@tencent.com> * [OPENCL][UPD] fix deconv, avgpool on AMD GPU (Tencent#1646) * [OPENCL][UPD] fix deconv and avgpool when read image * [OPENCL][UPD] add header file for pooling Co-authored-by: ealinli <ealinli@tencent.com> * [OPENCL][UPD] opencl support cache on windows (Tencent#1645) * [UPD][OPENCL] add coor check for conv and dwconv * [OPENCL][FIX] fix compilation issues * [OPENCL][UPD] optimize AMD GPU judgment logic * [OPENCL][UPD] support cache on windows * [OPENCL][UPD] fix load cache on windows Co-authored-by: ealinli <ealinli@tencent.com> * [DRQ][UPD] dynamic range quant model support do const folder (Tencent#1647) * [DRQ][UPD] dynamic range quant model support do const folder * [TOOLS][UPD] dynamic range quant updates usage Co-authored-by: ealinli <ealinli@tencent.com> * 1. 
make model_check support dynamic range quantized model; (Tencent#1653) * [ADD][TUTORIAL] add mbv2-ssd conversion and deployment tutorial (Tencent#1640) * [ADD][TUTORIAL] add mbv2-ssd conversion and deployment tutorial * [TUTORIAL][UPD] update code link * [TUTORIAL][UPD] fix typo Co-authored-by: ealinli <ealinli@tencent.com> * [X86][FIX] binary op support fp16 weights (Tencent#1655) * [X86][FIX] binary op support fp16 weights * [X86][FIX] matmul support fp16 weights Co-authored-by: ealinli <ealinli@tencent.com> * Feature dynamic quant fc (Tencent#1660) * [DYNAMIC_QUANT][UPD]1. dynamic quant support inner_product layer; * [ARM][UPD]1. arm gemm 部分情况下使用 Kahan sum 算法,以避免 fp16 累加误差; * [FIX][CPU][TRT] Fix CPU Not OP bug, Fix TensorRT ShapeTensor Class Bug. (Tencent#1663) * [FIX] Fix CPU Not Operator data type error. * [FIX] Fix TensorRT ShapeTensor class ConvertTo1D() func bug * fix _mm256_load_ps segmentation fault (Tencent#1682) * fix _mm256_load_ps segmentation fault * fix crash on mm256_load when innerproduct * use loadu instead of stride-judgement * remove unused code Co-authored-by: fishdai <fishdai@tencent.com> * x86_acc & blob_converter now will consider the BlobHandle.bytes_offset (Tencent#1684) * Dev x86 layer adapter (Tencent#1683) * [X86] add layer acc adapter * [X86] NULL to nullptr * [X86][OPENVINO] add openvino adapter layer builder, fallback to cpu naive impl if there is no normal ov layer builder * [X86][OPENVINO] fix hard code of ov precision Co-authored-by: anonymous <anonymous@mail.org> * [ARM] fix arm cross compile error caused by float-abi (Tencent#1678) * avoid nullptr in IsSupport (Tencent#1685) * [UPD][TOOLS] 1.increase subs_length 2.align model support bool and int32 input 3. fix gather and onehot convert 4. 
gather_nd support indices_shape[-1] < r (Tencent#1686) Co-authored-by: ealinli <ealinli@tencent.com> * Dev metal ngray (Tencent#1693) * [METAL] metal support ngray input mat * [METAL]fix bytes_size * [COREML] fix dynamic quantization model about coreml Co-authored-by: jacinhu <jacinhu@tencent.com> Co-authored-by: darrenyao87 <62542779+darrenyao87@users.noreply.github.com> * [UPD][DRQ] support quantizing matmul's const weight (Tencent#1698) * [UPD][DRQ] support quantizing matmul's const weight * [UPD][DRQ] add scale check in constant map Co-authored-by: ealinli <ealinli@tencent.com> * [FIX] fix compile macos framework (Tencent#1687) Co-authored-by: darrenyao87 <62542779+darrenyao87@users.noreply.github.com> * Optimize dynamic range quantize (Tencent#1699) * [DynamicRangeQuantize][UPD]1. 添加了根据权重分布判断是否量化的逻辑; * [DynamicQuantization][UPD]1. dynamic_range_quantization support TNN fp16 model; * [DRQ][UPD]1. 修复了 model_check_android.sh 脚本中指定 reference file,但是推理没有用到的 bug;2. 优化了 dynamic_range_quantization 中的部分代码; * [DRQ][UPD]1.fix conflict with merge master code; Co-authored-by: ealinli <37806708+1627180283@users.noreply.github.com> * Fix windows x86 build (Tencent#1697) * [FIX] remove nanodet for windows * remove ninga compile for some bug * fix x86 mat type register macro name * fix x86 matmul with 2 inputs Co-authored-by: darrenyao87 <62542779+darrenyao87@users.noreply.github.com> * [METAL] fix stride slice crach when dims is 2 (Tencent#1701) Co-authored-by: darrenyao87 <62542779+darrenyao87@users.noreply.github.com> * [mac] 1. FIX X86 and ARM conflict; 2. ADD ARM arch on intel cpu (You can use ARM if rosetta-X86 crash). 3. Use ios project build/profile M1-Mac. (Tencent#1700) Co-authored-by: gennyxu <gennyxu@tencent.com> Co-authored-by: lucasktian <lucasktian@tencent.com> * [iOS][UPD]1. 
add missing file for xcode project; (Tencent#1705) * [BUG]fix coreml error of slicev2、padv2 and matmul; (Tencent#1703) * [BUG]fix YouTu face alignment model * [UPD]update mean pts file logic * [UPD]draw face points green * [UPD]unify example controller list * [UPD]unify example controller list * [UPD]move blaze anchor file to resource * [METAL]update tnn project * [UPD]update tool onnx2coreml * [ADD]support ShareCommandQueue between instances * [ADD]support ShareCommandQueue between instances * [UPD]add log message * [UPD]transfer file half.hpp * [UPD]fix xcode compile error with fp16 * [UPD]fix xcode compile error with fp16 * [UPD]update model type erro msg * [FIX]fix logic error of constofshape * [UPD]update debug message * [FIX]fsupport int32 for neg op * [BUG]fix init error with nil commadbuffer * [UPD]add mac build xcode project; fix ios mac build script; * [UPD]add mac build xcode project; fix ios mac build script; * [ADD]add QQ group 2 of TNN * [BUG]fix dynamic dequant error; fix arm pad error; * [BUG]support coreml padv2 * [BUG]fix ccoreml matmul error when it has const input blob * [BUG]fix coreml slicev2 * [UPD]add convert logic of swish * [BUG]fix error cpu error for x86 mac * [UPD]support fusion for gemm + bn * [UPD]add convert logic of swish Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com> Co-authored-by: lucasktian <lucasktian@tencent.com> * [UPD]update merge logic for swish groupnorm deconv (Tencent#1708) * [BUG]fix YouTu face alignment model * [UPD]update mean pts file logic * [UPD]draw face points green * [UPD]unify example controller list * [UPD]unify example controller list * [UPD]move blaze anchor file to resource * [METAL]update tnn project * [UPD]update tool onnx2coreml * [ADD]support ShareCommandQueue between instances * [ADD]support ShareCommandQueue between instances * [UPD]add log message * [UPD]transfer file half.hpp * [UPD]fix xcode compile error with fp16 * [UPD]fix xcode compile error with fp16 * 
[UPD]update model type erro msg * [FIX]fix logic error of constofshape * [UPD]update debug message * [FIX]fsupport int32 for neg op * [BUG]fix init error with nil commadbuffer * [UPD]add mac build xcode project; fix ios mac build script; * [UPD]add mac build xcode project; fix ios mac build script; * [ADD]add QQ group 2 of TNN * [BUG]fix dynamic dequant error; fix arm pad error; * [BUG]support coreml padv2 * [BUG]fix ccoreml matmul error when it has const input blob * [BUG]fix coreml slicev2 * [UPD]add convert logic of swish * [BUG]fix error cpu error for x86 mac * [UPD]support fusion for gemm + bn * [UPD]add convert logic of swish * [UPD]support fusion for deconv+add and deconv+add+bn * [UPD]add aliyun disk link for tnn models * [UPD]support fusion for group norm * [UPD]support fusion for swish Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com> Co-authored-by: lucasktian <lucasktian@tencent.com> * [DRQ][BUG]1. fix bug for max_values; (Tencent#1716) * Hotfix m1 build (Tencent#1715) * fix apple m1 clang 13.1 compile error * fix unit test compile error Co-authored-by: quinnrong <quinnrong@quinnrongs-MacBook-Pro.local> Co-authored-by: ealinli <37806708+1627180283@users.noreply.github.com> * [ARM] support groupnorm * [ARM] support swish * add swish to conv-post-fuse * [ADD][OPENCL] opencl add group norm and swish (Tencent#1722) Co-authored-by: ealinli <ealinli@tencent.com> * add x86 swish and groupnorm operator; explicitly open see4.2 with low version of compiler Co-authored-by: shenpenwang <41420892+Maosquerade@users.noreply.github.com> Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com> Co-authored-by: sxj731533730 <sxj731533730@gmail.com> Co-authored-by: Yulv-git <34329208+Yulv-git@users.noreply.github.com> Co-authored-by: darrenyao87 <62542779+darrenyao87@users.noreply.github.com> Co-authored-by: quinnrong94 <67782915+quinnrong94@users.noreply.github.com> Co-authored-by: lucasktian <lucasktian@tencent.com> 
Co-authored-by: powerpwang <72859430+powerpwang@users.noreply.github.com> Co-authored-by: ealinli <37806708+1627180283@users.noreply.github.com> Co-authored-by: powerpwang <powerpwang@outlook.com> Co-authored-by: ealinli <ealinli@tencent.com> Co-authored-by: Dandi Ding <bluaxe@users.noreply.github.com> Co-authored-by: saner zheng <zqawszqaws@126.com> Co-authored-by: sanerzheng <sanerzheng@tencent.com> Co-authored-by: Feng Shijie <j514681085@icloud.com> Co-authored-by: Dandiding <Dandiding@tencent.com> Co-authored-by: FeiGeChuanShu <774074168@qq.com> Co-authored-by: seanxcwang <66675860+seanxcwang@users.noreply.github.com> Co-authored-by: doxutx <92915535+doxutx@users.noreply.github.com> Co-authored-by: kumbayaco <xyu.dai@gmail.com> Co-authored-by: fishdai <fishdai@tencent.com> Co-authored-by: anonymous <anonymous@mail.org> Co-authored-by: jacinhu <jacinhu@tencent.com> Co-authored-by: XDC <196890111@qq.com> Co-authored-by: gennyxu <gennyxu@tencent.com> Co-authored-by: quinnrong <quinnrong@quinnrongs-MacBook-Pro.local> Co-authored-by: quinnrong <quinnrong@tencent.com> Co-authored-by: shenpenwang <565067453@qq.com> * fix coreml groupnorm unit test * [ADD]add exp op * [BUG]fix deconv bisas error * [UPD]init cpu memory with 0 for bert model * [BUG]fix reshape static error; reshape static layer cannot handle 0 or -1 * [UPD]support inst norm for coreml; update tnn project file; * [BUG]fix error for layer without layer resource, [] operater will add one, which is not thread safe * [UPD]add param to batchnorm to support instancenorm * [UPD]adjust groupnorm with batchnorm * [UPD]support instancenorm with groupnorm by setting group==channels * [UPD]update unit test of instancenorm * [BUG]fix unit test error for layer batchnorm * [UPD]update tnn project * [BUG]fix unit test error for APPLE NPU * [BUG]fix unit test crash for layer batchnorm * [UPD]ignore cpu or gpu benchmark for mlmodel or mlmodelc * [UPD]ignore * [UPD]ignore pixelshuffle for apple npu * [UPD]ignore matconvert 
for apple npu * [UPD]ignore some unary op for apple npu * [UPD]unify before and after coreml layer, simplify lstm layer * [UPD]fix lstm error for ht and ct for biLSTM * [UPD]fix const input load error * [UPD]fix internal error * [UPD]ignore Co-authored-by: jacinhu <jacinhu@tencent.com> Co-authored-by: teslawho <597645882@qq.com> Co-authored-by: teslawho <71381575+teslawho@users.noreply.github.com> Co-authored-by: shenpenwang <41420892+Maosquerade@users.noreply.github.com> Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com> Co-authored-by: sxj731533730 <sxj731533730@gmail.com> Co-authored-by: Yulv-git <34329208+Yulv-git@users.noreply.github.com> Co-authored-by: quinnrong94 <67782915+quinnrong94@users.noreply.github.com> Co-authored-by: lucasktian <lucasktian@tencent.com> Co-authored-by: powerpwang <72859430+powerpwang@users.noreply.github.com> Co-authored-by: ealinli <37806708+1627180283@users.noreply.github.com> Co-authored-by: powerpwang <powerpwang@outlook.com> Co-authored-by: ealinli <ealinli@tencent.com> Co-authored-by: Dandi Ding <bluaxe@users.noreply.github.com> Co-authored-by: saner zheng <zqawszqaws@126.com> Co-authored-by: sanerzheng <sanerzheng@tencent.com> Co-authored-by: Feng Shijie <j514681085@icloud.com> Co-authored-by: Dandiding <Dandiding@tencent.com> Co-authored-by: FeiGeChuanShu <774074168@qq.com> Co-authored-by: seanxcwang <66675860+seanxcwang@users.noreply.github.com> Co-authored-by: doxutx <92915535+doxutx@users.noreply.github.com> Co-authored-by: kumbayaco <xyu.dai@gmail.com> Co-authored-by: fishdai <fishdai@tencent.com> Co-authored-by: anonymous <anonymous@mail.org> Co-authored-by: XDC <196890111@qq.com> Co-authored-by: gennyxu <gennyxu@tencent.com> Co-authored-by: quinnrong <quinnrong@quinnrongs-MacBook-Pro.local> Co-authored-by: quinnrong <quinnrong@tencent.com> Co-authored-by: shenpenwang <565067453@qq.com>

* [ARM] support groupnorm * [ARM] support swish * add swish to conv-post-fuse * [ADD][OPENCL] opencl add group norm and swish (Tencent#1722) Co-authored-by: ealinli <ealinli@tencent.com> * add x86 swish and groupnorm operator; explicitly open see4.2 with low version of compiler * fix lstm unit test Co-authored-by: ealinli <37806708+1627180283@users.noreply.github.com> Co-authored-by: ealinli <ealinli@tencent.com> Co-authored-by: shenpenwang <565067453@qq.com>

ZaoZhe6666 · 2022-08-29T09:49:00Z

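Switched to an Optimizer to support INT-type computation. The changes include:

Added an optimizer pass that selects operators which compute on INT32 data or produce INT8 output, and inserts CAST operators at the appropriate positions before and after them.
Removed the original TransDataType function; in-place conversion is no longer used.
Removed the INT32 computation from the internal logic of some operators, keeping only the original FLOAT path (INT data is CAST to FLOAT before it takes part in the computation).
Added CAST support for FLOAT_TO_INT8 and INT32_TO_INT8.
Some Constant-type Initial data now gets a CAST operator that converts it to FLOAT. These newly added operators should not be treated as Constant by default (otherwise no address conversion is performed), so the corresponding filtering logic was added in base_layer and optimizer/layout_reformat.
The equal/greater operators reuse Binary's Float computation logic, and their output is then converted to INT8 by a Cast operator.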
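
The cast-insertion idea described above can be sketched roughly as follows. This is a minimal illustration only, not the code in this PR: `Layer`, `DType`, `MakeCast`, `InsertCasts`, the blob-naming scheme, and the list of float-only operator names are all hypothetical stand-ins rather than TNN types or APIs. Each Cast writes to a fresh blob, so other layers that read the original INT32 blob are unaffected.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins; these are NOT TNN's real types.
enum class DType { Float, Int32, Int8 };

struct Layer {
    std::string type;                  // e.g. "Add", "Equal", "Cast"
    std::vector<std::string> inputs;   // input blob names
    std::vector<std::string> outputs;  // output blob names
    DType to;                          // target type, meaningful only for "Cast"
};

// Build a Cast layer that reads `src` and writes a new blob `dst`.
static Layer MakeCast(const std::string& src, const std::string& dst, DType to) {
    return Layer{"Cast", {src}, {dst}, to};
}

// Insert Cast layers so that float-only layers never receive INT32 inputs.
// `int32_blobs` names the blobs known to hold INT32 data on entry.
std::vector<Layer> InsertCasts(const std::vector<Layer>& net,
                               const std::vector<std::string>& int32_blobs) {
    std::set<std::string> int32(int32_blobs.begin(), int32_blobs.end());
    auto is_compare = [](const std::string& t) { return t == "Equal" || t == "Greater"; };
    auto is_float_only = [&](const std::string& t) {
        return is_compare(t) || t == "Add" || t == "Sub" || t == "Mul";
    };

    std::vector<Layer> out;
    for (std::size_t i = 0; i < net.size(); ++i) {
        Layer layer = net[i];
        if (!is_float_only(layer.type)) {
            out.push_back(layer);
            continue;
        }

        bool had_int32 = false;
        for (std::size_t j = 0; j < layer.inputs.size(); ++j) {
            if (int32.count(layer.inputs[j]) == 0) continue;
            // Cast into a fresh blob; the original INT32 blob is left untouched.
            std::string casted = layer.inputs[j] + "_f32_" + std::to_string(i) +
                                 "_" + std::to_string(j);
            out.push_back(MakeCast(layer.inputs[j], casted, DType::Float));
            layer.inputs[j] = casted;
            had_int32 = true;
        }

        if (!is_compare(layer.type) && !had_int32) {
            out.push_back(layer);  // already a pure FLOAT layer
            continue;
        }

        // Comparisons end in INT8; other ops fed by INT32 are cast back to INT32.
        assert(!layer.outputs.empty());
        DType back = is_compare(layer.type) ? DType::Int8 : DType::Int32;
        std::string final_name = layer.outputs[0];
        layer.outputs[0] = final_name + "_f32";
        out.push_back(layer);
        // The trailing Cast keeps the original output name for downstream layers.
        out.push_back(MakeCast(layer.outputs[0], final_name, back));
        if (back == DType::Int32) int32.insert(final_name);
    }
    return out;
}
```

For an `Equal` layer with one INT32 input, this sketch emits three layers: a Cast to FLOAT on that input, the `Equal` layer itself computing in FLOAT, and a trailing Cast to INT8 that restores the original output blob name.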

bluaxe reviewed Aug 2, 2022

ZaoZhe6666 force-pushed the dev_transformer branch from 93de5a8 to 040f62f Compare August 2, 2022 10:40

zezhao and others added 19 commits August 3, 2022 10:38

Add the README_EVA.md documentation file

47d318f

Fix the ONNX2TNN failure caused by the GEMM operator

28610c0

Fix the 2TNN result mismatch caused by the default value of the Selu operator

90e90c5

Add support for the Transformer task requirements; the related changes are documented on iwiki https://iwiki.woa.com/pages/view…

62cff44

…page.action?pageId=1928435580

return error when mat.data_ == nullptr (Tencent#1733)

16e2fae

Delete README_EVA file.

17fb9dc

In order to make the changes ready to merge, now delete the readme file

refine PR 1723 comments

3d1d70c

fix onnx converter

09ec860

refine comments for PR 1723

d2819e9

bugfix: in CPU mode, the layer_res->element_shape value was not passed correctly, so binary operators used only the first element in their computation

21528dc

Add support for the Transformer task requirements; the related changes are documented on iwiki https://iwiki.woa.com/pages/view…

3e88ced

…page.action?pageId=1928435580

refine PR 1723 comments

8aa24e4

fix onnx converter

2f67e21

refine comments for PR 1723

1ecee3e

ZaoZhe6666 force-pushed the dev_transformer branch from 040f62f to 1ecee3e Compare August 3, 2022 03:00

zezhao(赵泽) added 3 commits August 29, 2022 17:20

Implement INT32 computation support via an Optimizer

729b1b8

merge with master

6acbd67

merge with master part 2

0ad3828

gttiankai and others added 2 commits September 21, 2022 20:07

Merge branch 'master' into dev_transformer

11fdfb0

Merge branch 'master' into dev_transformer

cc195da

bluaxe and others added 4 commits January 4, 2023 14:18

Merge branch 'master' into dev_transformer

abc85ac

Merge branch 'master' into dev_transformer

e08c15e

Merge branch 'master' into dev_transformer

1ed16fd

Merge branch 'master' into dev_transformer

7839397
