  • Phase 1 summary of the clang9 adaptation

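    1. Overview

    As of November 25, 2021, the sdk, gtest, and dsopt modules compile with clang9.

    Using the script described on the wiki page below, I downloaded every change related to [TR-16607] clang9 cross-compilation toolchain creation and verification - Enflame Company JIRA, including merged changes and changes that are still open:

    How to export detailed patches from gerrit in bulk - 周榮華_Ronghua - enflame wiki

     

    One caveat: the gerrit query command cannot contain parentheses. When a query combines several conditions, they are ANDed by default; to get OR semantics, the alternative expressions must be joined explicitly with OR.

     

    A quick count shows 3924 lines added and 4164 lines deleted: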

    PS D:\code> grep "^+[^+]" .\diffrecord.txt |wc
       3924   24785  152346
    PS D:\code> grep "^-[^-]" .\diffrecord.txt |wc
       4164   23159  147430

     

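    Early in the work -Werror was enabled, so some of the changes address fairly minor warnings. Because there were simply too many warnings, the blanket -Werror was later switched off temporarily and only a set of specific -Werror=<warning> options was kept.

    In addition, the code under tops contains a great many implicit conversions from wider to narrower integer types, so -Wno-c++11-narrowing was later added to suppress the related diagnostics for now.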

     

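    2. How to find and fix problems

    Running a full build after every single fix is usually very time-consuming. The methods below recover an individual compile or link command, so a fix can be verified in isolation.

    2.1. Getting compile commands from cmake

    cmake can emit a compilation database: a "compile_commands.json" file is generated in the cmake_build directory (the directory where the cmake command was run; yours may be different). This requires CMAKE_EXPORT_COMPILE_COMMANDS to be ON. The file records the working directory and the full command used to turn each .c/.cc/.cpp file into a .o. For example, the compile command for hlir_utils_test.cc can be found like this: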
    grep hlir_utils_test.cc compile_commands.json
      "command": "/opt/efb/clang9/bin/clang++  -DLLVM_DISABLE_ABI_BREAKING_CHECKS_ENFORCING -D_GLIBCXX_USE_CXX11_ABI=0 -I/home/ronghua.zhou/clang1_build/tops/../test_sdk_build/execroot/dtu_sdk/bazel-out/k8-opt/bin/external/sdksrc/include/_virtual_includes/include -I/home/ronghua.zhou/clang1_build/tops/../test_sdk_build/execroot/dtu_sdk/bazel-out/k8-opt/bin/external/sdksrc/include/_virtual_includes/include/dtu -I/home/ronghua.zhou/clang1_build/tops/../test_sdk_build/execroot/dtu_sdk/bazel-out/k8-opt/bin/lib/umd/include/_virtual_includes/include -I/home/ronghua.zhou/clang1_build/tops/../test_sdk_build/execroot/dtu_sdk/bazel-out/k8-opt/bin/external/ef_log/include/_virtual_includes/include -I/home/ronghua.zhou/clang1_build/tops/sdk -I/home/ronghua.zhou/clang1_build/tops/sdk/lib -I/home/ronghua.zhou/clang1_build/tops/sdk/lib/cpu_ops -I/home/ronghua.zhou/clang1_build/tops/../test_sdk_build/execroot/dtu_sdk/external/llvm-project/llvm/include -I/home/ronghua.zhou/clang1_build/tops/../test_sdk_build/execroot/dtu_sdk/external/llvm-project/mlir/include -I/home/ronghua.zhou/clang1_build/tops/../test_sdk_build/execroot/dtu_sdk/external/org_tensorflow -I/home/ronghua.zhou/clang1_build/tops/../test_sdk_build/execroot/dtu_sdk/external/eigen_archive -I/home/ronghua.zhou/clang1_build/tops/../test_sdk_build/execroot/dtu_sdk/external/com_google_absl -I/home/ronghua.zhou/clang1_build/tops/../test_sdk_build/execroot/dtu_sdk/external/com_google_protobuf/src -I/home/ronghua.zhou/clang1_build/tops/../test_sdk_build/external/dtu_sdk/bazel-bin -I/home/ronghua.zhou/clang1_build/tops/../test_sdk_build/external/llvm-project/llvm/utils/unittest/googlemock/include -I/home/ronghua.zhou/clang1_build/tops/../test_sdk_build/external/com_googlesource_code_re2 -I/home/ronghua.zhou/clang1_build/tops/../test_sdk_build/execroot/dtu_sdk/bazel-out/k8-opt/bin -I/home/ronghua.zhou/clang1_build/tops/../test_sdk_build/execroot/dtu_sdk/bazel-out/k8-opt/bin/lib 
-I/home/ronghua.zhou/clang1_build/tops/../test_sdk_build/execroot/dtu_sdk/bazel-out/k8-opt/bin/external/org_tensorflow -I/home/ronghua.zhou/clang1_build/tops/../test_sdk_build/execroot/dtu_sdk/bazel-out/k8-opt/bin/external/llvm-project/llvm/include -I/home/ronghua.zhou/clang1_build/tops/../test_sdk_build/execroot/dtu_sdk/bazel-out/k8-opt/bin/external/llvm-project/mlir/include -isystem /home/ronghua.zhou/clang1_build/tops/3rdparty/googletest/include -isystem /home/ronghua.zhou/clang1_build/tops/3rdparty/googletest  -O3 -g0 -DNDEBUG -fPIE   -m64 -march=x86-64 -mtune=generic -Werror=array-bounds -Werror=empty-body -Werror=format-extra-args -Werror=incompatible-pointer-types -Werror=array-bounds-pointer-arithmetic -Werror=c++-compat -Werror=shift-count-overflow -Werror=sizeof-pointer-memaccess -Werror=for-loop-analysis -Werror=unused-label -Werror=delete-incomplete -Werror=empty-translation-unit -Werror=unused-local-typedef -Werror=gnu-case-range -Werror=mismatched-new-delete -Werror=infinite-recursion -Werror=unreachable-code -Werror=sometimes-uninitialized -Werror=c++14-binary-literal -Werror=implicit-fallthrough -Werror=constant-logical-operand -Werror=exceptions -fcxx-exceptions -Werror=extra-tokens -Werror=format -Werror=format-security -Werror=header-guard -Werror=literal-conversion -Werror=null-conversion -Werror=pointer-bool-conversion -Werror=shift-overflow -Werror=tautological-constant-out-of-range-compare -Werror=tautological-pointer-compare -Werror=varargs -Wdouble-promotion -Wno-error=extern-c-compat -Wall -Wno-c++11-narrowing -Wextra -fsanitize=address -fno-omit-frame-pointer -std=gnu++14 -std=gnu++14 -o sdk/tests/hlir/cc_tests/CMakeFiles/hlir_utils_test.dir
    hlir_utils_test.cc.o -c /home/ronghua.zhou/clang1_build/tops/sdk/tests/hlir/cc_tests/hlir_utils_test.cc",
      "file": "/home/ronghua.zhou/clang1_build/tops/sdk/tests/hlir/cc_tests/hlir_utils_test.cc"

     

     

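    2.2. Getting compile commands from bazel

    https://github.com/vincent-picaud/Bazel_and_CompileCommands

    The open-source project above says that compile commands can be captured by adding --experimental_action_listener=//tools/actions:generate_compile_commands_listener to the bazel command line, but I tried it several times without success. I ended up simply running ps while the build is in progress. For example, to get the compile command for hlir_utils_test.cc: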

    ps -elf |grep hlir_utils_test.cc

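    Alternatively, adding the -s option to the bazel command also prints each command as it is executed.

    2.3. Getting link commands

    If the specific link target is known, ps can be used as in 2.2. For example, to get the command that links libdtu_sdk.so: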

    ps -elf |grep libdtu_sdk.so

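    If the specific target is not known and there are not too many link steps, use "ps -elf" to list all processes. The list will show many "ld @/tmp/response-xxx.txt" processes. Copy all current /tmp/response* files to another directory and inspect them to see which target each one links. These files contain the complete link command and its arguments, so the link command can be reconstructed from them.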

     

    3. Categories of actual changes

    3.1. Compile option changes

    3.1.1. Options added

    -fcxx-exceptions: dsopt uses exceptions, and C++ exception handling was off by default in this clang setup, so it has to be enabled explicitly.

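    -Wno-c++11-narrowing: the code under tops contains a great many implicit conversions from wider to narrower integer types. The diagnostic is disabled temporarily and will be re-enabled once each component has cleaned up these cases. clang treats such a narrowing conversion as an error by default (it is reported under -Wc++11-narrowing).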

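    3.1.2. Options removed

    -Werror: there are simply too many warnings, and eliminating all of them is not realistic for now, so this option is removed temporarily.

    3.1.3. Options modified

    set (CMAKE_CXX_STANDARD 14): the standard was previously set to 17, which conflicts with TensorFlow's default of C++14 and with gcc's default of C++14, so it is changed to C++14.

    -fno-canonical-system-headers: only gcc supports this option; clang does not. It used to be passed to every compiler and is now passed only to gcc.

    3.1.4. Notes on bazel options

    bazel has three kinds of compile options: copt, cxxopt, and conlyopt. copt applies to both C and C++, cxxopt applies only to C++, and conlyopt applies only to C. Using the wrong one produces a large number of warnings.

     

    3.1.5. On a cmake rerun, CMAKE_TOOLCHAIN_FILE may come back with a full path, so a direct STREQUAL comparison fails

    When cmake is rerun, there is some chance that the CMAKE_TOOLCHAIN_FILE variable is expanded to the full path of the toolchain file found on the search path. The fix, in CMakeLists.txt, is to use MATCHES instead of STREQUAL, so that the check passes whether or not the full path has been prepended.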

    3.2. Template-related errors

    3.2.1. use 'template' keyword to treat 'cast' as a dependent template name

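    When a member function template is called on an object whose type depends on a template parameter, clang requires the template keyword to state explicitly that the name is a template; otherwise it reports an error. See ISO C++03 14.2/4: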

    When the name of a member template specialization appears after . or -> in a postfix-expression, or after nested-name-specifier in a qualified-id, and the postfix-expression or qualified-id explicitly depends on a template-parameter (14.6.2), the member template name must be prefixed by the keyword template. Otherwise the name is assumed to name a non-template.

     

    For example, the SinkTransposeWithScalarBroadcast class in hlir calls cast<mlir::RankedTensorType>() and cast<mlir::ShapedType>():

     
    diff --git a/sdk/lib/hlir/transforms/TopsInferenceHlirPass/HlirTransposeMoverExt.cc b/sdk/lib/hlir/transforms/TopsInferenceHlirPass/HlirTransposeMoverExt.cc
    index c82fa217a21..9952ddbc470 100644
    --- a/sdk/lib/hlir/transforms/TopsInferenceHlirPass/HlirTransposeMoverExt.cc
    +++ b/sdk/lib/hlir/transforms/TopsInferenceHlirPass/HlirTransposeMoverExt.cc
    @@ -237,11 +237,14 @@ struct SinkTransposeWithScalarBroadcast : public mlir::OpRewritePattern<T> {
         }
         llvm::SmallVector<mlir::Value, 4> new_operands(root->getNumOperands(), {});
         for (auto& it : broadcast_ops) {
    -      auto transposedTy = getTransposedType(std::get<1>(it)
    -                                                ->getResult(0)
    -                                                .getType()
    -                                                .cast<mlir::RankedTensorType>(),
    -                                            prePermutation);
    +      // fix error:
    +      // use 'template' keyword to treat 'cast' as a dependent template name
    +      auto transposedTy =
    +          getTransposedType(std::get<1>(it)
    +                                ->getResult(0)
    +                                .getType()
    +                                .template cast<mlir::RankedTensorType>(),
    +                            prePermutation);
           auto new_attr = llvm::cast<HlirOp::BroadcastInDimOp>(std::get<1>(it))
                               .broadcast_dimensionsAttr();
           if (new_attr) {
    @@ -251,7 +254,7 @@ struct SinkTransposeWithScalarBroadcast : public mlir::OpRewritePattern<T> {
               new_data[i] = layout[data[i]];
             }
             new_attr = mlir::DenseIntElementsAttr::get(
    -            new_attr.getType().cast<mlir::RankedTensorType>(),
    +            new_attr.getType().template cast<mlir::RankedTensorType>(),
                 llvm::makeArrayRef(new_data));
           }
           mlir::Operation* transpose_bs_op =
    @@ -274,7 +277,7 @@ struct SinkTransposeWithScalarBroadcast : public mlir::OpRewritePattern<T> {
         mlir::Operation* ret_transpose = rewriter.create<HlirOp::TransposeOp>(
             root->getLoc(), root->getResult(0).getType(), new_root->getResult(0),
             mlir::DenseIntElementsAttr::get(
    -            permutation.getType().cast<mlir::ShapedType>(), layout));
    +            permutation.getType().template cast<mlir::ShapedType>(), layout));
         root->replaceAllUsesWith(ret_transpose);
       }

     

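    Note that the template keyword is not needed when the object expression does not depend on a template parameter, and the same class also contains calls that need no change. For example, the ss object in the same file has the fixed type mlir::Operation*, so calling the cast member template through ss does not need the template prefix: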

     
    mlir::Operation* ss = op.getOperation();
    auto new_operand_ty = getTransposedType(operand_ty, prePermutation);
    auto new_source_ty = getTransposedType(source_ty, prePermutation);
    auto new_result_ty = getTransposedType(
        ss->getResult(0).getType().cast<mlir::RankedTensorType>(),
        prePermutation);
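
    The rule can be reproduced outside MLIR with a minimal sketch. FakeType, castDependent, and castFixed are hypothetical names invented for this illustration; only the placement of the template keyword mirrors the diff above.

```cpp
#include <cassert>

// Hypothetical stand-in for a type such as mlir::Type: it exposes a member
// function template named cast.
struct FakeType {
  int payload;

  template <typename To>
  To cast() const {
    return static_cast<To>(payload);
  }
};

// The type of `value` depends on the template parameter T, so while parsing
// the compiler cannot know that `cast` names a template. The `template`
// keyword tells it to read `<` as the start of a template argument list.
template <typename T>
long castDependent(const T& value) {
  return value.template cast<long>();
}

// The object type is fixed here, so no `template` keyword is needed.
inline long castFixed(const FakeType& value) {
  return value.cast<long>();
}
```

    Calling castDependent(FakeType{7}) and castFixed(FakeType{7}) gives the same result; the keyword only affects how the dependent expression is parsed.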

     

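    A related problem appears in factor_profiler_pass.cc in the factor module. There the diff goes the other way: the template keyword had been written before member functions that are called without template arguments, and it is removed: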

    diff --git a/sdk/lib/factor/codegen/passes/factor_profiler_pass.cc b/sdk/lib/factor/codegen/passes/factor_profiler_pass.cc
    index 43419fd305a..ad23a709f20 100644
    --- a/sdk/lib/factor/codegen/passes/factor_profiler_pass.cc
    +++ b/sdk/lib/factor/codegen/passes/factor_profiler_pass.cc
    @@ -55,11 +55,11 @@ mlir::Value getFirstOperand<mlir::Value>(mlir::Value op) {
      
     template <typename T>
     int getSrcCompressed(T op) {
    -  return op.template dma_src_compressedAttr().getInt();
    +  return op.dma_src_compressedAttr().getInt();
     }
     template <typename T>
     int getDstDecompressed(T op) {
    -  return op.template dma_dst_decompressAttr().getInt();
    +  return op.dma_dst_decompressAttr().getInt();
     }
      
     #define DISABLE_DMA_COMPRESS_ATTR_GETTER(OP) \
    @@ -84,11 +84,11 @@ DISABLE_DMA_COMPRESS_ATTR_GETTER(mlir::factor::FactorDeSliceOp)
      
     template <typename T>
     int getReverseLr(T op) {
    -  return op.template dma_reverse_lrAttr().getInt();
    +  return op.dma_reverse_lrAttr().getInt();
     }
     template <typename T>
     int getReverseTb(T op) {
    -  return op.template dma_reverse_tbAttr().getInt();
    +  return op.dma_reverse_tbAttr().getInt();
     }
      
     #define DISABLE_REVERSE_ATTR_GETTER(OP) \
    @@ -114,7 +114,7 @@ DISABLE_REVERSE_ATTR_GETTER(mlir::factor::FactorDeSliceOp)
      
     template <typename T>
     int getDmaType(T op) {
    -  return op.template dma_typeAttr().getInt();
    +  return op.dma_typeAttr().getInt();
     }
      
     #define DISABLE_DMA_TYPE_GETTER(OP) \
    @@ -142,8 +142,8 @@ std::string formatDmaAttrs(int direction, int src_compressed,
     template <typename T>
     void extractDmaMetaInfoTo(T op, dtu_activity_data &data) {
       auto &args = data.args;
    -  mlir::Value from = getFirstOperand(op.template from());
    -  mlir::Value to = getFirstOperand(op.template to());
    +  mlir::Value from = getFirstOperand(op.from());
    +  mlir::Value to = getFirstOperand(op.to());
       auto engine_type = getDmaType(op);
       auto direction = op.dma_directionAttr().getInt();

     

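    3.2.2. Ambiguity

    When a call matches both function template A and function template B during instantiation, clang reports an ambiguity error, while gcc does not.

    Take EraseHelp below. The original version declared two overloads. When the second argument is a TypeSequence with exactly one remaining type, both TypeSequence<Last> and TypeSequence<Inner, Right...> (with Right empty) match, so the compiler cannot tell whether to peel off Last or Inner. Even if the two functions had different logic, gcc would still accept the code without an error; it is unclear whether it picks whichever matching overload it finds, or always the first or the last one.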

    constexpr static auto EraseHelp(TypeSequence<Left...>, TypeSequence<Last>);

    constexpr static auto EraseHelp(TypeSequence<Left...>, TypeSequence<Inner, Right...>);
    diff --git a/sdk/lib/hlir/ir/type_utils.h b/sdk/lib/hlir/ir/type_utils.h
    index 3cf2bc7994a..0e645fd1e7e 100644
    --- a/sdk/lib/hlir/ir/type_utils.h
    +++ b/sdk/lib/hlir/ir/type_utils.h
    @@ -157,12 +157,9 @@ struct EraseSeqIf {
         using type = decltype(EraseHelp(LeftSeq(), TypeSequence<Right...>()));
         return type();
       }
    -  template <typename... Left, typename Last>
    -  constexpr static auto EraseHelp(TypeSequence<Left...>, TypeSequence<Last>) {
    -    using type = typename std::conditional<!Pred<Last>::value,
    -                                           TypeSequence<Left..., Last>,
    -                                           TypeSequence<Left...>>::type;
    -    return type();
    +  template <typename... Left>
    +  constexpr static auto EraseHelp(TypeSequence<Left...>, TypeSequence<>) {
    +    return TypeSequence<Left...>();
       }
       using type = decltype(EraseHelp(TypeSequence<>(), TypeSequence<T...>()));
     };
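
    The fixed shape of the overload set can be sketched in isolation. This is not the code from type_utils.h: TypeSequence and EraseSeqIf are re-declared here in simplified form, EraseHelp is written as a free function template instead of a static member, and the predicate is passed as an explicit template argument. The point is that the recursive overload needs at least one remaining type while the terminal overload accepts only the empty sequence, so no call can match both.

```cpp
#include <cassert>
#include <type_traits>

template <typename... T>
struct TypeSequence {};

// Terminal case: nothing left to examine, the accumulated types are the result.
template <template <typename> class Pred, typename... Left>
constexpr auto EraseHelp(TypeSequence<Left...>, TypeSequence<>) {
  return TypeSequence<Left...>();
}

// Recursive case: only matches when at least one type (Inner) remains.
template <template <typename> class Pred, typename... Left, typename Inner,
          typename... Right>
constexpr auto EraseHelp(TypeSequence<Left...>, TypeSequence<Inner, Right...>) {
  // Drop Inner when the predicate holds, otherwise append it to the result.
  using LeftSeq = typename std::conditional<Pred<Inner>::value,
                                            TypeSequence<Left...>,
                                            TypeSequence<Left..., Inner>>::type;
  return EraseHelp<Pred>(LeftSeq(), TypeSequence<Right...>());
}

// Removes every type for which Pred<T>::value is true.
template <template <typename> class Pred, typename... T>
struct EraseSeqIf {
  using type =
      decltype(EraseHelp<Pred>(TypeSequence<>(), TypeSequence<T...>()));
};

static_assert(
    std::is_same<EraseSeqIf<std::is_integral, int, float, long, double>::type,
                 TypeSequence<float, double>>::value,
    "integral types are erased, the others keep their order");
```

    The static_assert doubles as a usage example: erasing the integral types from int, float, long, double leaves TypeSequence<float, double>.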

     

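    3.3. Type mismatches

    3.3.1. Implicit conversion from a wider to a narrower integer type

    For example, func_entry in sdk/tests/llir/dataflow1_pingpang_buffer_test.cc is declared as int64_t, but the function it is passed to expects a uint32_t parameter, which triggers an implicit int64_t → uint32_t conversion: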

    diff --git a/sdk/tests/llir/dataflow1_pingpang_buffer_test.cc b/sdk/tests/llir/dataflow1_pingpang_buffer_test.cc
    index fa824f03d9a..70298b1fb59 100644
    --- a/sdk/tests/llir/dataflow1_pingpang_buffer_test.cc
    +++ b/sdk/tests/llir/dataflow1_pingpang_buffer_test.cc
    @@ -522,7 +522,7 @@ TEST(Pavo2xCDMAPattern1Test, Pavo2xCDMAPattern1WithPingpangTest) {
                                  {{0}, {1}, {2}, {3}, {4}, {5}}, 1, 1, 1, -1, -1,
                                  output_queues_l1);
      
    -    int64_t func_entry = 0;
    +    uint32_t func_entry = 0;
         // trigger sip
         for (uint64_t idx = 0; idx < SIP_COUNT; ++idx) {
           std::string sip_name = std::string("sip") + std::to_string(idx);

     

    Other files with similar changes:

    sdk/tests/llir/dataflow1_test.cc

    sdk/tests/llir/dataflow2_test.cc

    sdk/tests/llir/dataflow3_test.cc

    sdk/tests/llir/dataflow5_test.cc

    sdk/tests/llir/dataflow5_test_1xcdma.cc

    sdk/tests/llir/dataflow7_test.cc

    sdk/tests/llir/llir2assembler_leo_test.cc

    sdk/tests/llir/utils/llir_test_util.cc

    sdk/tests/llir/utils/llir_test_util.h
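
    When the variable cannot simply be re-declared at the right width, an explicit, checked conversion is another option. The following is a sketch under assumptions: LaunchArgs, fitsInU32, and makeArgs are hypothetical names, and the braced initialization stands in for whatever call site triggers the diagnostic in the real tests.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical parameter block; the field is 32 bits wide, like the
// parameter of the callee described above.
struct LaunchArgs {
  std::uint32_t func_entry;
};

// True when the 64-bit value can be stored in 32 unsigned bits without loss.
inline bool fitsInU32(std::int64_t value) {
  return value >= 0 &&
         value <= static_cast<std::int64_t>(
                      std::numeric_limits<std::uint32_t>::max());
}

// LaunchArgs{entry} would be a narrowing conversion inside braces, which
// clang rejects by default. The explicit cast documents the intent, and the
// assert catches values that would not survive the conversion.
inline LaunchArgs makeArgs(std::int64_t entry) {
  assert(fitsInU32(entry));
  return LaunchArgs{static_cast<std::uint32_t>(entry)};
}
```

    Declaring the variable as uint32_t from the start, as the diff does, remains the simpler fix when the value never needs 64 bits.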

     

    3.3.2. Implicit conversion from signed to unsigned

    -1 converted to an unsigned integer type:

    diff --git a/sdk/lib/hlir/ir/type_utils.h b/sdk/lib/hlir/ir/type_utils.h
    index 0e645fd1e7e..f84360269f3 100644
    --- a/sdk/lib/hlir/ir/type_utils.h
    +++ b/sdk/lib/hlir/ir/type_utils.h
    @@ -122,10 +122,9 @@ struct FindIf<Pred, T, R...> {
      
     template <template <typename N> typename Pred, typename T>
     struct FindIf<Pred, T> {
    -  using type =
    -      typename std::conditional<Pred<T>::value,
    -                                std::integral_constant<size_t, 0>,
    -                                std::integral_constant<size_t, -1>>::type;
    +  using type = typename std::conditional<
    +      Pred<T>::value, std::integral_constant<size_t, 0>,
    +      std::integral_constant<size_t, static_cast<size_t>(-1)>>::type;
     };
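
    The same idiom, writing the conversion of -1 out explicitly, can be shown in a small self-contained sketch. NotFound, indexOf, and lookupDemo are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <type_traits>

// "Not found" marker. The cast makes the wrap-around of -1 to the largest
// size_t value explicit instead of relying on an implicit conversion.
using NotFound =
    std::integral_constant<std::size_t, static_cast<std::size_t>(-1)>;

// Hypothetical linear search that returns the marker on failure.
inline std::size_t indexOf(const int* data, std::size_t count, int wanted) {
  // The counter has the same unsigned type as the bound, so the comparison
  // involves no signed/unsigned conversion.
  for (std::size_t i = 0; i < count; ++i) {
    if (data[i] == wanted) return i;
  }
  return NotFound::value;
}

// Small driver over a fixed array, for demonstration only.
inline std::size_t lookupDemo(int wanted) {
  static const int kData[] = {4, 5, 6};
  return indexOf(kData, sizeof(kData) / sizeof(kData[0]), wanted);
}
```

    lookupDemo(5) returns index 1, while lookupDemo(9) returns NotFound::value, which equals the maximum value of size_t.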

     

    Most of the other cases involve a loop counter declared as int that is compared against uint32_t values, which causes an implicit int → uint32_t conversion:

    diff --git a/sdk/lib/umd/tests/sample/launch_code.cc b/sdk/lib/umd/tests/sample/launch_code.cc
    index 1152a283052..708b1f44e7d 100644
    --- a/sdk/lib/umd/tests/sample/launch_code.cc
    +++ b/sdk/lib/umd/tests/sample/launch_code.cc
    @@ -719,11 +716,10 @@ static void _launch_code_for_eight_sip(int cid, bool check_result) {
       dtu_mem_handle param = cluster_mem[cid];
       u64 param_off = A_B_SIZE + EIGHT_C_SIZE;
       u64 param_size = PARAM_TRUE_SIZE;
    -  u16 launch_entry = 0;
       dtu_sip_mode_cfg_st mode;
       mode.mode_dw = 0x5070f10;
       LaunchKernelParameter parameter[8];
    -  for (int i = 0; i < run_sip_count; i++) {
    +  for (u32 i = 0; i < run_sip_count; i++) {
         parameter[i] =
             LaunchKernelParameter(sip[i], param, param_off + i * ONE_PARAM_SIZE,
                                   param_size, 0, mode, 0, false, false, "op_0");

     

    Other files:

    sdk/lib/spm/src/buddy_policy.c

    system_test/tools/vpd_cycle/vpd_cycle.c

    sdk/lib/spm/include/spm.h

    sdk/tests/llir/llir2assembler_leo_test.cc

    sdk/tests/llir/dataflow5_test_1xcdma.cc

    sdk/tests/llir/dataflow5_test.cc

    sdk/tests/llir/utils/llir_test_util.cc

    sdk/tests/llir/utils/llir_test_util.h

     

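    The change to sdk/lib/umd/tools/kernel_code_processor/dturt.inc is a bit more involved. ___leo_runtime___ and ___x_runtime___ were declared as char[], but some initializer values are greater than 127, which does not fit in a signed char. The functions that use these arrays, and the functions those call in turn, all require char[]. The final fix declares the arrays as unsigned char[] and performs one explicit cast in the function that references them first.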

    diff --git a/sdk/lib/umd/tools/kernel_code_processor/dturt.inc b/sdk/lib/umd/tools/kernel_code_processor/dturt.inc
    index 1f22b52d8af..d1ed30a049d 100644
    --- a/sdk/lib/umd/tools/kernel_code_processor/dturt.inc
    +++ b/sdk/lib/umd/tools/kernel_code_processor/dturt.inc
    @@ -1,4 +1,4 @@
    -static const char ___leo_runtime___[] = {
    +static const unsigned char ___leo_runtime___[] = {
         0x21, 0x3C, 0x61, 0x72, 0x63, 0x68, 0x3E, 0x0A, 0x2F, 0x20, 0x20, 0x20,
         0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20,
         0x30, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20,
    @@ -2885,7 +2885,7 @@ static const char ___leo_runtime___[] = {
         0x00, 0x00,
     };
     static const int ___leo_runtime_size___ = sizeof(___leo_runtime___);
    -static const char ___x_runtime___[] = {
    +static const unsigned char ___x_runtime___[] = {
         0x21, 0x3C, 0x61, 0x72, 0x63, 0x68, 0x3E, 0x0A, 0x2F, 0x20, 0x20, 0x20,
         0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20,
         0x30, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20,

     

    diff --git a/sdk/lib/umd/tools/kernel_code_processor/kernel_code.h b/sdk/lib/umd/tools/kernel_code_processor/kernel_code.h
    index d193a8823ac..f61d048cfd6 100644
    --- a/sdk/lib/umd/tools/kernel_code_processor/kernel_code.h
    +++ b/sdk/lib/umd/tools/kernel_code_processor/kernel_code.h
    @@ -128,7 +128,10 @@ class Kernel {
       struct __target__ : public KernelCode<__target__>, public Kernel {         \
         using KernelCode<__target__>::KernelCode;                                \
         static const llvm::StringRef GetArch() { return #__arch__; }             \
    -    static const char* GetRTBuffer() { return ___##__arch__##_runtime___; }  \
    +    static const char* GetRTBuffer() {                                       \
    +      return static_cast<char*>(static_cast<void*>(                          \
    +          const_cast<unsigned char*>(___##__arch__##_runtime___)));          \
    +    }                                                                        \
         static int GetRTBufferSize() { return ___##__arch__##_runtime_size___; } \
       };                                                                         \
       template class KernelCode<__target__>
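
    A reduced version of this pattern is sketched below. kBlob, blobAsChars, and blobSize are hypothetical names. Note that the sketch uses a single reinterpret_cast to const char*, which keeps the const qualifier; this is a swapped-in alternative for illustration, not the cast chain used in kernel_code.h.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical embedded blob. 0xE2 is above 127, so it cannot be stored in a
// signed char without a narrowing conversion; unsigned char holds it as is.
static const unsigned char kBlob[] = {0x21, 0x3C, 0xE2, 0x00};

// View the bytes as const char* for callers that require that type.
// Accessing any object through a char pointer is permitted by the language.
static const char* blobAsChars() {
  return reinterpret_cast<const char*>(kBlob);
}

static std::size_t blobSize() { return sizeof(kBlob); }
```

    Callers that need the original byte value can convert back with static_cast<unsigned char>(blobAsChars()[i]).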

     

     

     

     

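    3.3.3. Implicit conversion from floating point to integer

    The fractional part is simply discarded, so a nonzero value silently becomes 0: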

    diff --git a/sdk/tests/tops/tops_bnForwardTrainingEx_integration_test.cc b/sdk/tests/tops/tops_bnForwardTrainingEx_integration_test.cc
    index 73712aba4ad..df82dadfa65 100644
    --- a/sdk/tests/tops/tops_bnForwardTrainingEx_integration_test.cc
    +++ b/sdk/tests/tops/tops_bnForwardTrainingEx_integration_test.cc
    @@ -1890,7 +1890,7 @@ TEST_F(TopsTest, topsConvolutionForward_BatchNorm_RELU_UV) {
       int k_c = 1;
       int k_h = 3;
       int k_w = 3;
    -  int epsilon = 0.01;
    +  float epsilon = 0.01;
      
       int input_size = n * c * h * w;
       int kernel_size = k_n * k_c * k_h * k_w;
    @@ -2181,7 +2181,7 @@ TEST_F(TopsTest, topsConvolutionForward_BatchNorm_RELU_SV) {
       int k_c = 1;
       int k_h = 3;
       int k_w = 3;
    -  int epsilon = 0.01;
    +  float epsilon = 0.01;
      
       int input_size = n * c * h * w;
       int kernel_size = k_n * k_c * k_h * k_w;

     

    Other similar changes:

    sdk/tests/op/hlir/pavo/bert/hlir_div_test.cc
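
    Why the truncation matters can be seen in a short sketch. The normalize function is a hypothetical stand-in for a batch-norm style computation and is not taken from the tests above.

```cpp
#include <cassert>
#include <cmath>

// What `int epsilon = 0.01;` actually stores: the fractional part is dropped.
inline int truncatedEpsilon() { return static_cast<int>(0.01); }

// Hypothetical normalization step. epsilon keeps the denominator away from
// zero when the variance is zero; an epsilon of 0 would remove that guard.
inline float normalize(float x, float mean, float variance, float epsilon) {
  return (x - mean) / std::sqrt(variance + epsilon);
}
```

    With epsilon declared as float, normalize(1.0f, 0.0f, 0.0f, 0.01f) stays finite; with the truncated value the denominator would be zero.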

     

    3.3.4. Implicit conversion from double to float

    diff --git a/sdk/lib/umd/tests/sample/launch_code.cc b/sdk/lib/umd/tests/sample/launch_code.cc
    index 1152a283052..708b1f44e7d 100644
    --- a/sdk/lib/umd/tests/sample/launch_code.cc
    +++ b/sdk/lib/umd/tests/sample/launch_code.cc
    @@ -783,8 +779,8 @@ static void _launch_code_for_eight_sip(int cid, bool check_result) {
         float *result = (float *)((u64)dtu_mem_get_cpu_ptr(host_mem) + A_B_SIZE);
         for (u32 i = 0; i < (run_sip_count * DTU_ALIGN(DATA_BUFF_SIZE, 128)) / 4;
              i++) {
    -      if (result[i] - ((2 * i) % (DATA_BUFF_SIZE / 2)) > 0.01 ||
    -          ((2 * i) % (DATA_BUFF_SIZE / 2)) - result[i] > 0.01) {
    +      if (result[i] - ((2 * i) % (DATA_BUFF_SIZE / 2)) > 0.01f ||
    +          ((2 * i) % (DATA_BUFF_SIZE / 2)) - result[i] > 0.01f) {
             dtu_command_queue_destroy(queue);
             dtu_mem_free_hbm(hbm_mem);
             dtu_mem_free_host(host_mem);
    @@ -425,7 +425,7 @@ static void launch_code_for_one_sip(void) {
      
       float *result = (float *)((u64)dtu_mem_get_cpu_ptr(host_mem) + A_B_SIZE);
       for (int i = 0; i < DATA_BUFF_SIZE / 4; i++) {
    -    if (result[i] - (2 * i) > 0.01 || (2 * i) - result[i] > 0.01) {
    +    if (result[i] - (2 * i) > 0.01f || (2 * i) - result[i] > 0.01f) {
           dtu_command_queue_destroy(queue);
           dtu_mem_free_hbm(hbm_mem);
           dtu_mem_free_host(host_mem);
    @@ -605,8 +605,8 @@ static void launch_one_sip_twice(void) {
      
       float *result = (float *)((u64)dtu_mem_get_cpu_ptr(host_mem) + A_B_SIZE);
       for (int i = 0; i < 2 * DATA_BUFF_SIZE / 4; i++) {
    -    if (result[i] - ((2 * i) % (DATA_BUFF_SIZE / 2)) > 0.01 ||
    -        ((2 * i) % (DATA_BUFF_SIZE / 2)) - result[i] > 0.01) {
    +    if (result[i] - ((2 * i) % (DATA_BUFF_SIZE / 2)) > 0.01f ||
    +        ((2 * i) % (DATA_BUFF_SIZE / 2)) - result[i] > 0.01f) {
           dtu_command_queue_destroy(queue);
           dtu_mem_free_hbm(hbm_mem);
           dtu_mem_free_host(host_mem);

     

    Other similar changes:

    sdk/tests/op/hlir/pavo/resnet50/hlir_general_resize_test.cc

    3.3.5. Implicit conversion from pointer to bool

    diff --git a/system_test/tools/vpd_cycle/vpd_cycle.c b/system_test/tools/vpd_cycle/vpd_cycle.c
    index 31d57fa0f9c..ccc9f71b827 100644
    --- a/system_test/tools/vpd_cycle/vpd_cycle.c
    +++ b/system_test/tools/vpd_cycle/vpd_cycle.c
    @@ -75,14 +83,14 @@ static int ProcessDB(const char *path) {
       char *name = strdup(path);
       char *base = basename(name);
       char *p;
    -  if (p = strrchr(base, '.')) *p = '\0';
    +  if ((p = strrchr(base, '.')) != NULL) *p = '\0';
       fprintf(output_fp, "%s,%lu\n", base, end - start);
       free(name);
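
    The same fix can be written as a small C++ helper. stripExtension is a hypothetical function for illustration; it mirrors the parenthesized assignment with an explicit comparison from the diff above, but works on std::string instead of a strdup'ed buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Returns name without its last ".ext" suffix, if it has one.
inline std::string stripExtension(std::string name) {
  const char* base = name.c_str();
  const char* p;
  // The assignment is parenthesized and compared explicitly, so the
  // condition is a genuine bool expression rather than a converted pointer.
  if ((p = std::strrchr(base, '.')) != nullptr) {
    name.resize(static_cast<std::size_t>(p - base));
  }
  return name;
}
```

    For instance, stripExtension("model.db") yields "model", and a name without a dot is returned unchanged.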

     

     

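    3.3.6. Implicit conversion between different types

     

    fixed_size_mem_pool.h assigns between dtu_status and int directly. Although dtu_status is an enum and behaves much like an int, C++ does not implicitly convert an int to an enum, and clang reports the assignment as an error.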
    diff --git a/sdk/runtime/lib/top_scheduler/fixed_size_mem_pool.h b/sdk/runtime/lib/top_scheduler/fixed_size_mem_pool.h
    index 73d02f3b1f4..b7be6ee39c4 100644
    --- a/sdk/runtime/lib/top_scheduler/fixed_size_mem_pool.h
    +++ b/sdk/runtime/lib/top_scheduler/fixed_size_mem_pool.h
    @@ -118,7 +118,7 @@ class DeviceFixedSizeMemPool final
       ~DeviceFixedSizeMemPool() {}
      
       Status Init(dtu_umd::MemoryMgr *mgr, uint32_t mc, uint32_t flags) override {
    -    dtu_status status = 0;
    +    dtu_status status = DTU_SUCCESS;
         status =
             mgr->AllocDevice(NODE_NUMBER * NODE_SIZE, mc, flags, &(this->mem_));
         if (status) {
    --

     

    Definition of dtu_status:

     
    typedef enum dtu_status_code {
      DTU_SUCCESS = 0,
      DTU_ERROR_INVALID_PARAMETER = -100,
      DTU_ERROR_INVALID_MEM_TYPE = -101,
      DTU_ERROR_OUT_OF_MEMORY = -102,
      DTU_ERROR_OUT_OF_RESOURCES = -103,
      DTU_ERROR_NOT_INITIALIZED = -104,
      DTU_ERROR_INVALID_CTX_OBJ = -105,
      DTU_ERROR_INVALID_CLUSTER_OBJ = -106,
      DTU_ERROR_INVALID_SIP_OBJ = -107,
      DTU_ERROR_INVALID_MEM_OBJ = -108,
      DTU_ERROR_INVALID_CMD_QUEUE_OBJ = -109,
      DTU_ERROR_INVALID_CMD_DESC_OBJ = -110,
      DTU_ERROR_INVALID_PROGRAM_OBJ = -111,
      DTU_ERROR_INVALID_FUNCTION_OBJ = -112,
      DTU_ERROR_INVALID_EVENT_OBJ = -113,
      DTU_ERROR_CLUSTER_BUSY = -114,
      DTU_ERROR_SIP_BUSY = -115,
      DTU_ERROR_IN_DRM = -116,
      DTU_ERROR_IN_IOCTRL = -117,
      DTU_ERROR_GEM_CREATE = -118,
      DTU_ERROR_GEM_CLOSE = -119,
      DTU_ERROR_GEM_MMAP = -120,
      DTU_ERROR_GEM_UNMMAP = -121,
      DTU_ERROR_CMD_QUEUE_SYNC = -122,
      DTU_ERROR_CMD_QUEUE_EMIT = -123,
      DTU_ERROR_CLUSTER_ACQUIRE = -124,
      DTU_ERROR_CLUSTER_RELEASE = -125,
      DTU_ERROR_NOT_MATCH = -126,
      DTU_ERROR_NOT_RELEASE_REF = -127,
      DTU_ERROR_GET_DEVICE_HDL = -128,
      DTU_ERROR_ALLOC_HOST = -129,
      DTU_ERROR_ALLOC_HBM = -130,
      DTU_ERROR_ALLOC_CLUSTER = -131,
      DTU_ERROR_FREE_HOST = -132,
      DTU_ERROR_FREE_HBM = -133,
      DTU_ERROR_FREE_CLUSTER = -134,
      DTU_ERROR_CMD_QUEUE_EMITED = -135,
      DTU_ERROR_OPEN_FILE = -136,
      DTU_ERROR_READ_FILE = -137,
      DTU_ERROR_WRITE_FILE = -138,
      DTU_ERROR_INVALID_BIN_TYPE = -139,
      DTU_ERROR_LOAD_BIN_FILE = -140,
      DTU_ERROR_LOAD_BIN_IMAGE = -141,
      DTU_ERROR_FUNCTION_NOT_FOUND = -142,
      DTU_ERROR_INVALID_OPERATION = -143,
      DTU_ERROR_EVENT_GET_ID = -144,
      DTU_ERROR_EVENT_WAIT_STATUS = -145,
      DTU_ERROR_EVENT_SIGNAL_STATUS = -146,
      DTU_ERROR_EVENT_TYPE = -147,
      DTU_ERROR_EVENT_NOT_SUBMIT = -148,
      DTU_ERROR_EVENT_DESTROYED = -149,
      DTU_ERROR_EVENT_SIGNAL_TWICE = -150,
      DTU_ERROR_MEMORY_OVERLAP = -151,
      DTU_ERROR_THREAD_POOL_QUEUE_OVERFLOW = -152,
      DTU_ERROR_PCI_BUS_SCAN = -153,
      DTU_ERROR_ALLOC_USERPTR = -154,
      DTU_ERROR_FREE_USERPTR = -155,
      DTU_ERROR_DUMP_CMEM = -156,
      DTU_ERROR_LOAD_CMEM = -157,
      DTU_ERROR_DUMP_SMEM = -158,
      DTU_ERROR_LOAD_SMEM = -159,
      DTU_ERROR_READ_REGISTERS = -160,
      DTU_ERROR_WRITE_REGISTERS = -161,
      DTU_ERROR_ALLOC_SIP = -162,
      DTU_ERROR_FREE_SIP = -163,
      DTU_ERROR_UNKNOWN = -164,
      DTU_ERROR_ALLOC_HUGE = -165,
      DTU_ERROR_INVALID_USR_IRQ_OBJ = -166,
      DTU_ERROR_LINK_CCIX_IO = -167,
      DTU_ERROR_PLACEHOLDER_NOT_FEED = -168,
      DTU_ERROR_LAUNCH_DMA = -169,
      DTU_ERROR_INVALID_PROFILE_MAGIC = -170,
      DTU_ERROR_INVALID_TIMESTAMP = -180,
      DTU_ERROR_INVALID_CONFIG = -181,
      DTU_ERROR_CHILD_NOT_SUBMIT = -182,
      DTU_ERROR_ALREADY_FORKED = -183,
      DTU_ERROR_LABEL_USED = -184,
      DTU_ERROR_LABEL_NOT_VALIDATED = -185,
      DTU_ERROR_COMMAND_TYPE_MISMATCH = -186,
      DTU_ERROR_VECTOR_NUMBER = -187,
      DTU_ERROR_VECTOR_FLAG_MISMATCH = -188,
      DTU_ERROR_DEVICE_RESET = -189,
      DTU_ERROR_EXECUTABLE_CRC_VERIFY = -190,
      DTU_ERROR_EXECUTABLE_DEVICE_VERIFY = -191,
      DTU_ERROR_INVALID_TS_OBJ = -192,
      DTU_ERROR_ALLOC_VDEV = -193,
      DTU_ERROR_FREE_VDEV = -194,
      DTU_ERROR_VDEV_BUSY = -195,
    } dtu_status;
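
    A reduced sketch of the rule follows. demo_status is a hypothetical two-value enum that only imitates the shape of dtu_status; fromRaw and failed are illustrative helpers.

```cpp
#include <cassert>

// Hypothetical miniature of a C-style status enum.
typedef enum demo_status_code {
  DEMO_SUCCESS = 0,
  DEMO_ERROR_OUT_OF_MEMORY = -102,
} demo_status;

// `demo_status s = 0;` is valid C but ill-formed C++: an int does not
// convert to an enum implicitly. Use the enumerator, or cast explicitly
// when the value really arrives as a raw integer.
inline demo_status fromRaw(int raw) { return static_cast<demo_status>(raw); }

// The opposite direction (enum to int or bool) is still implicit for an
// unscoped enum, which is why `if (status)` keeps compiling.
inline bool failed(demo_status status) { return status != DEMO_SUCCESS; }
```

    Initializing with DEMO_SUCCESS, as the diff does with DTU_SUCCESS, avoids the cast entirely.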

     

     

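    NULL and 0 compare equal, but they are not interchangeable: NULL is a null pointer constant (void* in C, an implementation-defined null pointer constant in C++), whereas 0 is an int. Using NULL to initialize an integer array is a type mismatch.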

    diff --git a/sdk/lib/umd/tests/sample/sample_run.cc b/sdk/lib/umd/tests/sample/sample_run.cc
    index c5a3557c2a5..23e9563859b 100644
    --- a/sdk/lib/umd/tests/sample/sample_run.cc
    +++ b/sdk/lib/umd/tests/sample/sample_run.cc
    @@ -35,7 +35,7 @@ void usage() {
      
     dtu_context ctx;
     dtu_cluster cluster[4] = {NULL};
    -u32 cluster_id[4] = {NULL};
    +u32 cluster_id[4] = {0};
     dtu_mem_handle cluster_mem[4] = {NULL};
     dtu_sip sip[32] = {NULL};

     

    3.3.7. Implicit const conversion in a function prototype

    diff --git a/sdk/lib/cpu/cpu_func_manager.cc b/sdk/lib/cpu/cpu_func_manager.cc
    index 940bde5d91a..ec8967203c2 100644
    --- a/sdk/lib/cpu/cpu_func_manager.cc
    +++ b/sdk/lib/cpu/cpu_func_manager.cc
    @@ -31,7 +31,7 @@ struct FunctionInvoker {
       }
       template <size_t... idx>
       void unpack(std::index_sequence<idx...> seq, const void* func, char** argvs) {
    -    (*reinterpret_cast<void (*)(...)>(func))(argvs[idx]...);
    +    (*reinterpret_cast<void (*)(...)>(const_cast<void*>(func)))(argvs[idx]...);
       }
     };
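
    The underlying rule is that reinterpret_cast may not remove const, so the qualifier has to be dropped with const_cast first. Below is a sketch under assumptions: AddFn, addInts, eraseType, and invoke are hypothetical names, and converting between function pointers and void* is only conditionally supported by the standard (it works on the usual POSIX toolchains).

```cpp
#include <cassert>

using AddFn = int (*)(int, int);

inline int addInts(int a, int b) { return a + b; }

// Store the function pointer in a type-erased, const-qualified handle.
inline const void* eraseType(AddFn fn) {
  return reinterpret_cast<const void*>(fn);
}

// Recover the function pointer. A direct reinterpret_cast from const void*
// would cast away const, which clang rejects, so const_cast comes first.
inline int invoke(const void* func, int a, int b) {
  AddFn fn = reinterpret_cast<AddFn>(const_cast<void*>(func));
  return fn(a, b);
}
```

    Unlike the variadic signature in the diff, the sketch casts back to the exact original function type, which keeps the call well-defined.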

     

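    3.3.8. Implicit conversion from void* to char*

    Many modules perform arithmetic directly on void* pointers. The size of the object a void* points to is unknown, so adding to or subtracting from it as an address really relies on an implicit void* → char* conversion. clang does not allow this in C++ code: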

    diff --git a/sdk/tests/hlir/cc_tests/hlir_4c_add_test.cc b/sdk/tests/hlir/cc_tests/hlir_4c_add_test.cc
    index 5b9c2dcc98f..7b568f934bb 100644
    --- a/sdk/tests/hlir/cc_tests/hlir_4c_add_test.cc
    +++ b/sdk/tests/hlir/cc_tests/hlir_4c_add_test.cc
    @@ -66,8 +66,8 @@ static void Add4CTest(SimpleModuleOpBuilder::ShapeType &shape,
       executor.run(false);
      
       auto output_hanlde = executor.get_output(0);
    -  T* result =
    -      static_cast<T*>(output_hanlde->CPUPtr() + output_hanlde->offset());
    +  T* result = static_cast<T*>(static_cast<void*>(
    +      static_cast<char*>(output_hanlde->CPUPtr()) + output_hanlde->offset()));
       for (size_t i = 0; i < l_data.size(); ++i) {
         EXPECT_EQ(result[i], out_data[i]);
       }

     

    Other similar changes:

    sdk/tests/hlir/cc_tests/hlir_corner_test.cc

    sdk/tests/hlir/cc_tests/hlir_press_test.cc

    sdk/tests/tops/tops_dot_test.cc

    sdk/tests/op/hlir/pavo/bert/hlir_broadcast_test.cc

    sdk/tests/op/hlir/pavo/resnet50/hlir_transpose_test.cc

    sdk/tests/op/hlir/pavo/resnet50/hlir_test_header.h

    sdk/tests/op/hlir/pavo/resnet50/hlir_slice_test.cc

    sdk/tests/op/hlir/pavo/resnet50/hlir_select_and_scatter_test.cc

    sdk/tests/op/hlir/pavo/resnet50/hlir_select_and_scatter_non4c_test.cc

    sdk/tests/op/hlir/pavo/resnet50/hlir_pad_test.cc

    sdk/tests/op/hlir/pavo/resnet50/hlir_dynamic_update_slice_test.cc

    sdk/tests/op/hlir/pavo/resnet50/hlir_dynamic_slice_test.cc

    sdk/tests/op/hlir/pavo/resnet50/hlir_concat_test.cc

    sdk/tests/op/hlir/pavo/resnet50/hlir_broadcast_test.cc

    sdk/tests/op/hlir/pavo/dnn/hlir_test_header.h

    sdk/tests/op/hlir/hlir_test_header.h

    sdk/tests/runtime/executable_test.cc
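
    Since the same cast chain recurs in many tests, it could be wrapped in a helper. The following is only a sketch: offsetAs and secondElementDemo are hypothetical names and do not exist in the code base described here.

```cpp
#include <cassert>
#include <cstddef>

// Advance a type-erased pointer by a byte offset, then view the result as T*.
// The arithmetic is done on char*, whose element size is 1 by definition.
template <typename T>
T* offsetAs(void* base, std::size_t byte_offset) {
  return static_cast<T*>(
      static_cast<void*>(static_cast<char*>(base) + byte_offset));
}

// Small driver: skip one int and read the next element.
inline int secondElementDemo() {
  int data[3] = {10, 20, 30};
  return *offsetAs<int>(data, sizeof(int));
}
```

    With such a helper a call site would read offsetAs<T>(handle->CPUPtr(), handle->offset()), assuming CPUPtr() returns void* as the diff suggests.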

    3.3.9. Implicit conversion from string to char*

    diff --git a/sdk/lib/umd/tools/kernel_code_processor/kernel_code.cc b/sdk/lib/umd/tools/kernel_code_processor/kernel_code.cc
    index 7e366337561..41fb573a562 100644
    --- a/sdk/lib/umd/tools/kernel_code_processor/kernel_code.cc
    +++ b/sdk/lib/umd/tools/kernel_code_processor/kernel_code.cc
    @@ -23,7 +23,7 @@ KernelCode<T>::KernelCode(StringRef file)
         : compiled_(false), name_(file), module_("KernelModule", context_) {
       auto mb_or_err = MemoryBuffer::getFile(file);
       if (auto ec = mb_or_err.getError()) {
    -    EF_PRINT(UmdMsg::UMD_CANNOT_OPEN_MSG, file.data(), ec.message());
    +    EF_PRINT(UmdMsg::UMD_CANNOT_OPEN_MSG, file.data(), ec.message().c_str());
         EF_THROW_WITH << -1 << std::endl;
       }
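
    The reason for the .c_str() call can be illustrated with a plain printf-style function. This sketch assumes that EF_PRINT forwards its arguments to a printf-like formatter, which the source does not state; formatOpenError is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Builds an error message with a printf-style format string.
inline std::string formatOpenError(const std::string& file,
                                   const std::string& reason) {
  char buf[256];
  // %s expects a const char*. A std::string object cannot be passed through
  // a C variadic argument list, so .c_str() is required for both arguments.
  std::snprintf(buf, sizeof(buf), "cannot open %s: %s", file.c_str(),
                reason.c_str());
  return buf;
}
```

    For example, formatOpenError("a.ll", "No such file") produces "cannot open a.ll: No such file".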

     

     

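    3.4. Missing break in switch

    3.4.1. Cases where a break is semantically required: add the break

    For example, parser.hpp had no break before the final default branch. Because the default branch is currently empty, behavior is not affected, but if any handling were added to the default branch later, it would introduce a bug: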

    diff --git a/3rdparty/inja/include/inja/parser.hpp b/3rdparty/inja/include/inja/parser.hpp
    index 6266c4a0f74..466499ecc8b 100644
    --- a/3rdparty/inja/include/inja/parser.hpp
    +++ b/3rdparty/inja/include/inja/parser.hpp
    @@ -296,7 +296,7 @@ class Parser {
               operator_stack.pop();
               function_stack.pop();
             }
    -      }
    +      } break;
           default:
             break;
           }

     

    Other similar changes:

    sdk/sdk.bzl
    sdk/third_party/inja.patch

     

    3.4.2. Cases where break is semantically not required: add a pragma so the compiler skips the check

    This kind of issue is fairly common.

    diff --git a/sdk/tests/runtime/chunk_allocator_test.cc b/sdk/tests/runtime/chunk_allocator_test.cc
    index e63568ddc63..78896778a21 100644
    --- a/sdk/tests/runtime/chunk_allocator_test.cc
    +++ b/sdk/tests/runtime/chunk_allocator_test.cc
    @@ -552,6 +552,10 @@ TEST_F(ChunkAllocatorTest, copy_constructor_test) {
         uint64_t offset0 = 0;
         uint64_t offset1 = 0;
      
    +#if defined(__clang__)
    +#pragma clang diagnostic push
    +#pragma clang diagnostic ignored "-Wimplicit-fallthrough"
    +#endif
         switch (op) {
           case TestOpAllocTopDown:
           case TestOpAllocDownTop: {
    @@ -590,6 +594,9 @@ TEST_F(ChunkAllocatorTest, copy_constructor_test) {
             }
           } break;
         }
    +#if defined(__clang__)
    +#pragma clang diagnostic pop
    +#endif
       }
     }
     }  // namespace

     

     

    Other similar changes:

    sdk/tests/runtime/mem_manager_test.cc
    sdk/tests/runtime/mem_pool_test.cc
    sdk/tools/dtu_compiler/dtu_compiler.cc
    sdk/lib/umd/tests/sample/tinyxmlparser.cc

    In addition, C++17 introduced the fallthrough attribute, which offers a simple way to tell the compiler that a fallthrough is intentional: C++ attribute: fallthrough (since C++17) - cppreference.com
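    As a minimal sketch of the attribute mentioned above (the function name and the stage-counting logic are hypothetical, chosen only for illustration), an intentional fallthrough can be annotated like this instead of wrapping the switch in diagnostic pragmas:

```cpp
#include <cassert>

// Hypothetical helper: counts how many cleanup stages run for a given level.
// Level 2 intentionally continues into the level 1 handling.
int CleanupStages(int level) {
  int stages = 0;
  switch (level) {
    case 2:
      ++stages;
      [[fallthrough]];  // C++17: marks the fallthrough as intentional
    case 1:
      ++stages;
      break;
    default:
      break;
  }
  return stages;
}
```

    The attribute documents intent at the exact point of the fallthrough, so the rest of the switch is still checked by -Wimplicit-fallthrough.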

    
    

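    3.5. Format mismatches

    3.5.1. Mismatched, but with no functional impact in practice

    When the format string does not match the arguments actually passed, serious problems can result. In the tops code, however, most mismatches are ll length modifiers used with 64-bit data, which has little functional impact in practice. If a future platform (say, a 128-bit processor) made ll 128 bits wide, though, the mismatch could corrupt the stack.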

     

    diff --git a/sdk/tests/runtime/chunk_allocator_test.cc b/sdk/tests/runtime/chunk_allocator_test.cc
    index e63568ddc63..78896778a21 100644
    --- a/sdk/tests/runtime/chunk_allocator_test.cc
    +++ b/sdk/tests/runtime/chunk_allocator_test.cc
    @@ -426,7 +426,7 @@ TEST_F(ChunkAllocatorTest, basic_stress_test) {
               if (allocated_size < allocated_size_pass) {
                 char str_buf[256];
                 snprintf(str_buf, sizeof(str_buf),
    -                     "allocated_size: %llx, allocated_chunks.size(): %lu",
    +                     "allocated_size: %lx, allocated_chunks.size(): %lu",
                          allocated_size, allocated_chunks.size());
                 EXPECT_TRUE(false) << str_buf;
                 break;

     

    Other files:

    sdk/lib/umd/tests/sample/mm_test.cc

    sdk/include/driver/mem_handle.h

    sdk/include/runtime/command_packet.h

    sdk/include/driver/mem_handle.h

    sdk/tests/runtime/mem_pool_test.cc

    sdk/lib/umd/tests/sample/performance_test.cc

    sdk/tests/profile/test_zebu.cc

    sdk/runtime/tests/top_scheduler/loop_task_utils.h
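    One way to avoid choosing between %lx and %llx by hand is the fixed-width format macros from <cinttypes>. This is only a sketch of an alternative to what the patches do; the function name and message text are hypothetical:

```cpp
#include <cassert>
#include <cinttypes>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// PRIx64 expands to the correct length modifier for uint64_t on the target
// platform, and %zu is the standard conversion for size_t.
std::string FormatAllocatedSize(uint64_t allocated_size,
                                std::size_t chunk_count) {
  char buf[128];
  std::snprintf(buf, sizeof(buf), "allocated_size: %" PRIx64 ", chunks: %zu",
                allocated_size, chunk_count);
  return buf;
}
```

    With this spelling the format string stays correct regardless of whether uint64_t is unsigned long or unsigned long long on a given toolchain.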

     

     

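    3.5.2. Mismatched, and affecting functionality

     

    The intent below was to print the data pointed to by a uint16_t* pointer, but the pointer itself was passed, so an address would be printed rather than the value. Fortunately it is only a print statement, but %hu expects an unsigned short argument (promoted to int when passed through varargs), while the pointer is 64 bits on a 64-bit machine, so the mismatch is undefined behavior and can still corrupt the stack: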

    diff --git a/sdk/include/runtime/command_packet.h b/sdk/include/runtime/command_packet.h
    index a2d061e9117..5006a601cc0 100644
    --- a/sdk/include/runtime/command_packet.h
    +++ b/sdk/include/runtime/command_packet.h
    @@ -362,7 +362,7 @@ struct CommandPacket {
        */
       static std::string MemberToString(uint16_t* p, std::string tab = "    ") {
         char buf[256];
    -    snprintf(buf, sizeof(buf), "%hu", p);
    +    snprintf(buf, sizeof(buf), "%hu", *p);
         return buf;
       }

     

    3.6. Defined but unused

    3.6.1. Unused variables

     

    diff --git a/sdk/lib/umd/tests/sample/launch_code.cc b/sdk/lib/umd/tests/sample/launch_code.cc
    index 1152a283052..708b1f44e7d 100644
    --- a/sdk/lib/umd/tests/sample/launch_code.cc
    +++ b/sdk/lib/umd/tests/sample/launch_code.cc
    @@ -180,7 +180,6 @@ static void launch_code_with_cluster_check(void) {
       dtu_mem_handle param = cluster_mem[0];
       u64 param_off = A_B_SIZE + ONE_C_SIZE;
       u64 param_size = PARAM_TRUE_SIZE;
    -  u16 launch_entry = 0;
       dtu_sip_mode_cfg_st mode;
       mode.mode_dw = 0x5070f10;
       LaunchKernelParameter parameter(sip[0], param, param_off, param_size, 0, mode,
    @@ -363,7 +362,6 @@ static void launch_code_for_one_sip(void) {
       dtu_mem_handle param = cluster_mem[0];
       u64 param_off = A_B_SIZE + ONE_C_SIZE;
       u64 param_size = PARAM_TRUE_SIZE;
    -  u16 launch_entry = 0;
       dtu_sip_mode_cfg_st mode;
       mode.mode_dw = 0x5070f10;
       LaunchKernelParameter parameter(sip[0], param, param_off, param_size, 0, mode,
    @@ -537,7 +535,6 @@ static void launch_one_sip_twice(void) {
       dtu_mem_handle param = cluster_mem[0];
       u64 param_off = A_B_SIZE + TWO_C_SIZE;
       u64 param_size = PARAM_TRUE_SIZE;
    -  u16 launch_entry = 0;
       dtu_sip_mode_cfg_st mode;
       mode.mode_dw = 0x5070f10;
       LaunchKernelParameter parameter[2];
    @@ -719,11 +716,10 @@ static void _launch_code_for_eight_sip(int cid, bool check_result) {
       dtu_mem_handle param = cluster_mem[cid];
       u64 param_off = A_B_SIZE + EIGHT_C_SIZE;
       u64 param_size = PARAM_TRUE_SIZE;
    -  u16 launch_entry = 0;
       dtu_sip_mode_cfg_st mode;
       mode.mode_dw = 0x5070f10;
       LaunchKernelParameter parameter[8];

     

    Other files:

    sdk/tests/spm/basic.cc

    sdk/lib/spm/src/best_fit_policy.c

     

     

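    3.6.2. Unused parameters

    There are a great many of these, especially in third-party components, which additionally require dedicated patches to modify. This warning is the main reason -Werror was eventually turned off: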

     
    diff --git a/sdk/lib/umd/tests/sample/callback_multi_ctx_test.cc b/sdk/lib/umd/tests/sample/callback_multi_ctx_test.cc
    index abd50ad4e81..1b39f594406 100644
    --- a/sdk/lib/umd/tests/sample/callback_multi_ctx_test.cc
    +++ b/sdk/lib/umd/tests/sample/callback_multi_ctx_test.cc
    @@ -16,6 +16,7 @@
     #include "dtu/umd/dtu.h"
     #include "dtu/umd/dtu_base_obj.h"
     #include "dtu/umd/dtu_log.h"
    +#include "dtu/umd/dtu_utils.h"
     #include "lib/umd/src/dtu_memory.h"
     #include "lib/umd/tests/sample/sample.h"
     #include "lib/umd/tests/sample/sample_assert.h"
    @@ -26,6 +27,7 @@ std::mutex mtx;
      
     void event_callback_func_1(dtu_callback callback, void *user_data,
                                u32 engine_id) {
    +  MAYBE_UNUSED(callback);
       std::unique_lock<std::mutex> lock(mtx);
       *(int *)user_data = 1;
       DTU_ERROR_LOG(TEST, "event callback_1 call[%d]\n", engine_id);
    @@ -33,6 +35,7 @@ void event_callback_func_1(dtu_callback callback, void *user_data,
      
     void event_callback_func_2(dtu_callback callback, void *user_data,
                                u32 engine_id) {
    +  MAYBE_UNUSED(callback);
       std::unique_lock<std::mutex> lock(mtx);
       *(int *)user_data = 1;
       DTU_ERROR_LOG(TEST, "event callback_2 call[%d]\n", engine_id);

     

    Other files:

    tools/logging/lib/logging/log.cc

    tools/logging/lib/logging/to/file.cc

    tools/logging/lib/logging/to/std_err.cc

    tools/logging/lib/util/signal_handler.cc

    tools/logging/tests/logging/log_to_test.h

    sdk/lib/umd/include/dtu_utils.h

    sdk/lib/umd/include/reference_obj.h

    3rdparty/protobuf-3.8.0/src/google/protobuf/arena.h

    sdk/lib/umd/tests/sample/device_reset.cc

    sdk/lib/umd/tests/sample/usr_irq.cc

    sdk/lib/umd/tests/sample/callback_test.cc

    3rdparty/protobuf-3.8.0/src/google/protobuf/map_type_handler.h

    3rdparty/protobuf-3.8.0/src/google/protobuf/parse_context.h

    kmd/utils/ktest/kmd-test.cpp

    sdk/lib/spm/src/buddy_policy.c

    sdk/lib/umd/include/dtu_command_obj.h

    sdk/lib/umd/include/dtu_context_obj.h

    sdk/lib/umd/include/dtu_dqm_obj.h

    sdk/lib/umd/include/dtu_driver.h

    system_test/tools/vpd_cycle/vpd_cycle.c

    sdk/lib/spm/src/buddy_policy.c

    sdk/lib/spm/src/interface.c

    sdk/lib/spm/src/rbtree.c

    sdk/lib/umd/include/dtu_device.h

    sdk/lib/umd/include/dtu_driver.h
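    The definition of MAYBE_UNUSED in dtu_utils.h is not shown in this document. A common way to write such a macro is a cast to void; the sketch below uses a deliberately different, hypothetical name and a hypothetical callback signature to make clear it is not the real definition:

```cpp
#include <cassert>

// Hypothetical stand-in for MAYBE_UNUSED; the real macro may differ.
#define MAYBE_UNUSED_SKETCH(x) (void)(x)

// A callback whose signature is fixed by an API but which ignores some of
// its parameters.
void OnEvent(int callback, void* user_data, unsigned engine_id) {
  MAYBE_UNUSED_SKETCH(callback);   // deliberately unused
  MAYBE_UNUSED_SKETCH(engine_id);  // deliberately unused
  *static_cast<int*>(user_data) = 1;
}
```

    The cast evaluates the name without doing anything with it, which counts as a use and silences -Wunused-parameter while keeping the signature intact.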

     

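    Separately, the unused file and line parameters in tools/logging/include/logging/check.h are a special case: they are actually meant to be used, but the wrong interface was called, so the information was lost along the way: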

    diff --git a/tools/logging/include/logging/check.h b/tools/logging/include/logging/check.h
    index eb856b7df85..67a667477f1 100644
    --- a/tools/logging/include/logging/check.h
    +++ b/tools/logging/include/logging/check.h
    @@ -47,16 +47,16 @@
     #define EFCHECK_STRCASENE(s1, s2) EF_DTU_CHECK_STROP(strcasecmp, !=, false, s1, s2)
      
     #undef EFCHECK_NOTNULL
    -#define EFCHECK_NOTNULL(val) \
    -  ::ef_log::CheckNotNull(__FILE__, __LINE__, "'" #val "' Must be non NULL", (val))
    -
    +#define EFCHECK_NOTNULL(val)                                                \
    +  ::ef_log::CheckNotNull(__FILE__, __LINE__, "'" #val "' Must be non NULL", \
    +                         (val))
      
     namespace ef_log {
      
     template <typename T>
     T&& CheckNotNull(const char* file, int line, const char* exprtext, T&& t) {
       if (t == nullptr) {
    -    EFLOG(FATAL) << std::string(exprtext);
    +    ::ef_log::FatalLog(file, line) << std::string(exprtext);
       }
       return std::forward<T>(t);
     }

     

     

     

    3.6.3. Unused labels

     

    diff --git a/sdk/include/scheduler/cmd_packet_pass_util.h b/sdk/include/scheduler/cmd_packet_pass_util.h
    index d56b5f362e8..1cc18ddc603 100644
    --- a/sdk/include/scheduler/cmd_packet_pass_util.h
    +++ b/sdk/include/scheduler/cmd_packet_pass_util.h
    @@ -457,7 +457,6 @@ void MultiThreadDo(PacketGraph* graph, InitFuncS initf, ThreadFunc f,
       uninif(core_count);
      
       delete[] ptl;
    -Exit0:
       return;
     }

     

     

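    3.6.4. Unreachable code

    For the code below, the developers explained that the feature is currently unsupported but they do not want to delete the code, so it is commented out for now: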

    diff --git a/sdk/runtime/tests/top_scheduler/TimerTest.cc b/sdk/runtime/tests/top_scheduler/TimerTest.cc
    index cb1e2269dd4..5aea6c1f956 100644
    --- a/sdk/runtime/tests/top_scheduler/TimerTest.cc
    +++ b/sdk/runtime/tests/top_scheduler/TimerTest.cc
    @@ -127,12 +127,12 @@ TEST_F(TimerTest, Timer) {
         L3DMA = EngineType::Type::ODMA;
       } else if (IsPavoT20() || IsPavoT21()) {
         return;  // Need TS FW;
    -    assembler = new ExecutableAssembler(TargetType::PAVO);
    -    L3DMA = EngineType::Type::CDMA_LITE;
    +    // assembler = new ExecutableAssembler(TargetType::PAVO);
    +    // L3DMA = EngineType::Type::CDMA_LITE;
       } else if (IsDoradoI20() || IsDoradoI21()) {
         return;  // Need TS FW;
    -    assembler = new ExecutableAssembler(TargetType::DORADO);
    -    L3DMA = EngineType::Type::CDMA;
    +    // assembler = new ExecutableAssembler(TargetType::DORADO);
    +    // L3DMA = EngineType::Type::CDMA;
       } else {
         return;
       }

     

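    sdk/tests/tops/tops_customop_upsample_nearest_test.cc also reports unreachable code, mainly because Co is currently a fixed value, so the first-level if condition is always false. The loop in the else branch already handles the Co == 1 case, so the if branch can be removed entirely: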

    diff --git a/sdk/tests/tops/tops_customop_upsample_nearest_test.cc b/sdk/tests/tops/tops_customop_upsample_nearest_test.cc
    index 2b59c3fc0fc..cf19502b9b4 100644
    --- a/sdk/tests/tops/tops_customop_upsample_nearest_test.cc
    +++ b/sdk/tests/tops/tops_customop_upsample_nearest_test.cc
    @@ -175,32 +175,17 @@ TEST_F(TopsTest, CustomCall_UpSample_Nearest_1) {
       int n_offset = Ho * Wo * Co;
       int h_offset = Wo * Co;
      
    -  if (Co == 1) {
    -    for (int n = 0; n < N; ++n) {
    -      int n_offset = n * n_offset;
    -      for (int h = 0; h < Ho; ++h) {
    -        int h_index = h / scale_H;
    -        for (int w = 0; w < Wo; ++w) {
    -          int w_index = w / scale_W;
    -          output_ref[n_offset + h * h_offset + w] =
    -              image_data[n * Hi * Wi * Ci + h_index * Wi * Ci + w_index * Ci];
    -        }
    -      }
    -    }
    -
    -  } else {
    -    for (int n = 0; n < N; ++n) {
    -      int n_offset = n * Ho * Wo * Co;
    -      for (int h = 0; h < Ho; ++h) {
    -        int h_index = h / scale_H;
    -        for (int w = 0; w < Wo; ++w) {
    -          int w_index = w / scale_W;
    -          for (int c = 0; c < Co; ++c) {
    -            int c_index = c / scale_C;
    -            output_ref[n_offset + h * h_offset + w * Co + c] =
    -                image_data[n * Hi * Wi * Ci + h_index * Wi * Ci + w_index * Ci +
    -                           c_index];
    -          }
    +  for (int n = 0; n < N; ++n) {
    +    int n_offset = n * Ho * Wo * Co;
    +    for (int h = 0; h < Ho; ++h) {
    +      int h_index = h / scale_H;
    +      for (int w = 0; w < Wo; ++w) {
    +        int w_index = w / scale_W;
    +        for (int c = 0; c < Co; ++c) {
    +          int c_index = c / scale_C;
    +          output_ref[n_offset + h * h_offset + w * Co + c] =
    +              image_data[n * Hi * Wi * Ci + h_index * Wi * Ci + w_index * Ci +
    +                         c_index];
             }
           }
         }

     

     

    3.6.5. Inline functions that are never called

    diff --git a/sdk/lib/umd/tests/sample/memcpy_odma.cc b/sdk/lib/umd/tests/sample/memcpy_odma.cc
    index 3cfc9777934..2b565c5e55e 100644
    --- a/sdk/lib/umd/tests/sample/memcpy_odma.cc
    +++ b/sdk/lib/umd/tests/sample/memcpy_odma.cc
    @@ -9,6 +9,7 @@
      
     #include "dtu/umd/dtu.h"
     #include "dtu/umd/dtu_interface.h"
    +#include "dtu/umd/dtu_utils.h"
     #include "lib/umd/tests/sample/sample.h"
     #include "lib/umd/tests/sample/sample_assert.h"
      
    @@ -991,6 +992,7 @@ static void memcpy_host_to_hbm_mc_scan_sync(void) {
     }
     MAKE_SAMPLE_FROM_FUNCTION(memcpy_host_to_hbm_mc_scan_sync);
      
    +#if 0
     static int odma_copy(dtu_mem_handle dst_hdl, u64 dst_offset,
                          dtu_mem_handle src_hdl, u64 src_offset, u64 size,
                          u32 engine_id) {
    @@ -1034,6 +1036,7 @@ static int odma_copy(dtu_mem_handle dst_hdl, u64 dst_offset,
       dtu_command_queue_destroy(queue);
       return 0;
     }
    +#endif
      
     #define MB (1 * 1024 * 1024)
     #if 0

     

     

    Other files:

    sdk/lib/spm/src/buddy_policy.c

    sdk/lib/umd/tests/sample/mm_test.cc

     

    3.6.6. Unused class declarations

    diff --git a/dtu_backend/dtu_executor.h b/dtu_backend/dtu_executor.h
    index 5149656537f..c361bb15e9d 100644
    --- a/dtu_backend/dtu_executor.h
    +++ b/dtu_backend/dtu_executor.h
    @@ -50,7 +50,6 @@ class ClusterAllocation;
     }
      
     class DTUObject;
    -class sr::TaskContext;
     class DTUExecutor : public ::xla::dtu::DTUExecutorInterface {
      public:
       typedef typename sr::TaskContext context_type;

     

    3.6.7. Unused type definitions

    diff --git a/sdk/tests/tops/tops_transform_parameter_test.cc b/sdk/tests/tops/tops_transform_parameter_test.cc
    index ee7718e40f1..c7b99fb323d 100644
    --- a/sdk/tests/tops/tops_transform_parameter_test.cc
    +++ b/sdk/tests/tops/tops_transform_parameter_test.cc
    @@ -483,7 +483,6 @@ TEST_P(TopsGraphTransformParameterTest, TopsConv) {
           break;
       }
      
    -  typedef float D_TYPE;
       int inputdata_size = input_length * (sizeof(input_data[0]));
      
       topsMemory_t output_mem;

     

     

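    3.7. Duplicate definitions

    Across the tops code stack, many macros are defined separately by each module, so once modules start including each other there are numerous redefinition problems. The real fix is to extract shared headers, but the modules currently do not want to depend on each other, so for now the definitions are wrapped in ifndef as a temporary workaround: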

    diff --git a/sdk/lib/umd/tests/sample/loop_task.cc b/sdk/lib/umd/tests/sample/loop_task.cc
    index 2869f571029..099dcaf53f1 100644
    --- a/sdk/lib/umd/tests/sample/loop_task.cc
    +++ b/sdk/lib/umd/tests/sample/loop_task.cc
    @@ -22,12 +22,14 @@
      
     using namespace std;
      
    +#ifndef EFCHECK
     #define EFCHECK(__statement__)                                       \
       do {                                                               \
         sts = __statement__;                                             \
         if (sts != DTU_SUCCESS)                                          \
           failed_assertion("Failed:", __FILE__, __FUNCTION__, __LINE__); \
       } while (0)
    +#endif
      
     template <int N>
     struct DataLayout {

     

    Other files:

    sdk/lib/umd/tests/sample/sample_assert.h

    system_test/tools/vpd_cycle/vpd_cycle.c

     

    3.8. Wrong initialization order in the constructor initializer list

    This occurred only once:

    diff --git a/sdk/include/factor/func.h b/sdk/include/factor/func.h
    index af24b782ed1..50137f7c5dd 100644
    --- a/sdk/include/factor/func.h
    +++ b/sdk/include/factor/func.h
    @@ -4163,10 +4163,10 @@ struct FACTOR_EXPORT ConvGenDescParams {
                         int64_t Co, int64_t R, int64_t S)
           : conv_type(conv_type),
             data_format(data_format),
    -        stride(stride),
    -        dailations(dailations),
             opt_level(opt_level),
             padding(padding),
    +        stride(stride),
    +        dailations(dailations),
             N(N),
             Hi(Hi),
             Wi(Wi),

     

    Other modified files:

    sdk/tests/tops/tops_convert_parameter_test.cc
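    The reason the order matters: C++ always initializes members in declaration order, whatever order the initializer list is written in. A minimal sketch with a hypothetical struct (not taken from func.h) where one member depends on another:

```cpp
#include <cassert>

// Members are initialized in declaration order: first_, then second_.
struct Range {
  int first_;
  int second_;
  // The initializer list follows the declaration order, so -Wreorder stays
  // quiet and second_ can safely read the already-initialized first_.
  explicit Range(int start) : first_(start), second_(first_ + 10) {}
};
```

    If second_ were declared before first_, it would be initialized first and would read an uninitialized first_, no matter how the initializer list was ordered; keeping both orders identical makes such bugs visible.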

     

     

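    3.9. Incomplete type declarations

    clang reports an error when a class is only forward-declared and no complete definition can be found in the included headers.

    Working out the definition order of the TF headers is very tedious. Fortunately clang locates the header automatically, so the include is simply wrapped in a clang-specific macro guard.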

    diff --git a/sdk/lib/cpu/cpu_func_runtime_context.h b/sdk/lib/cpu/cpu_func_runtime_context.h
    index 530b4def8ad..62ba4099e6e 100644
    --- a/sdk/lib/cpu/cpu_func_runtime_context.h
    +++ b/sdk/lib/cpu/cpu_func_runtime_context.h
    @@ -23,6 +23,10 @@
     #include <tuple>
     #include <vector>
      
    +#if defined(__clang__)
    +#include "tensorflow/compiler/xla/service/cpu/simple_orc_jit.h"
    +#endif
    +
     namespace xla {
     namespace cpu {
     class SimpleOrcJIT;

     

    3.10. Array initialization

    3.10.1. Arrays that genuinely must be variable-length: allocate and free with new[]() and delete[]

    diff --git a/sdk/lib/cpu_ops/naive/dot.cc b/sdk/lib/cpu_ops/naive/dot.cc
    index f4bb6b7d877..be7ddb0ab23 100755
    --- a/sdk/lib/cpu_ops/naive/dot.cc
    +++ b/sdk/lib/cpu_ops/naive/dot.cc
    @@ -31,9 +31,9 @@ void vectorMul_4_4(const int64_t M, const int64_t N, const int64_t K, outT* out,
         int64_t m_stride = (M - m) >= stride ? stride : (M - m);
         for (int64_t n = 0; n < N;) {
           int64_t n_stride = (N - n) >= stride ? stride : (N - n);
    -      register outT out_reg[m_stride * n_stride] = {0};
    -      register lhsT lhs_reg[m_stride];
    -      register rhsT rhs_reg[n_stride];
    +      register outT* out_reg = new outT[m_stride * n_stride]();
    +      register outT* lhs_reg = new outT[m_stride]();
    +      register outT* rhs_reg = new outT[n_stride]();
           for (int64_t i = 0; i < K; i++) {
             for (auto idx = 0; idx < m_stride; idx++) {
               lhs_reg[idx] = ELEMENT(lhs, m + idx, i, K);
    @@ -53,6 +53,9 @@ void vectorMul_4_4(const int64_t M, const int64_t N, const int64_t K, outT* out,
             }
           }
           n += n_stride;
    +      delete[] rhs_reg;
    +      delete[] lhs_reg;
    +      delete[] out_reg;
         }
         m += m_stride;
       }

     

    Other similar changes:

    sdk/lib/umd/tests/sample/mm_test.cc

    sdk/lib/cpu_ops/naive/dot.cc

    sdk/lib/factor/codegen/macro_instruction/minst_conv2d_bpi.cc
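    The patches above use new[]() and delete[]. As an alternative sketch (not what the patches do), std::vector gives the same runtime-sized, zero-initialized storage without a manual delete[]; the function and parameter names here are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Multiplies two runtime-sized tiles element by element and sums the result.
// The vector replaces a variable-length array: its elements start at 0 and
// the storage is released automatically when the function returns.
int64_t TileDot(std::size_t m_stride, const int64_t* lhs, const int64_t* rhs) {
  std::vector<int64_t> products(m_stride);
  for (std::size_t i = 0; i < m_stride; ++i) {
    products[i] = lhs[i] * rhs[i];
  }
  int64_t sum = 0;
  for (int64_t v : products) sum += v;
  return sum;
}
```

    Because the vector owns its memory, early returns or exceptions inside the loop cannot leak the buffer, which is the main risk of paired new[]/delete[] calls.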

     

     

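    3.10.2. Arrays that are semantically fixed-length: add the const qualifier

    This is very common in tests. People tend not to add the const qualifier to the variables that define array lengths; adding it not only can improve execution efficiency but also makes mistakes less likely.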

     

    diff --git a/sdk/sample/batchnormalInference/tops_batchnormalInference.cc b/sdk/sample/batchnormalInference/tops_batchnormalInference.cc
    index 8df67784dec..1db6440e22c 100644
    --- a/sdk/sample/batchnormalInference/tops_batchnormalInference.cc
    +++ b/sdk/sample/batchnormalInference/tops_batchnormalInference.cc
    @@ -67,16 +67,16 @@ void topsBatchNormalInferenceNHWC() {
       topsTensorDescriptor_t xDesc;
       topsTensorDescriptor_t yDesc;
      
    -  int x_c = 4;
    -  int x_h = 2;
    -  int x_n = 3;
    -  int x_w = 2;
    +  const int x_c = 4;
    +  const int x_h = 2;
    +  const int x_n = 3;
    +  const int x_w = 2;
      
    -  int scaleNums = x_c;
    -  int y_c = x_c;
    -  int y_h = x_h;
    -  int y_n = x_n;
    -  int y_w = x_w;
    +  const int scaleNums = x_c;
    +  const int y_c = x_c;
    +  const int y_h = x_h;
    +  const int y_n = x_n;
    +  const int y_w = x_w;
      
       topsContext_t context;
       int clusters[] = {0};
    @@ -90,7 +90,7 @@ void topsBatchNormalInferenceNHWC() {
       topsSetTensorDescriptor(yDesc, TOPS_TENSOR_NHWC, TOPS_DATA_FLOAT, y_n, y_c,
                               y_h, y_w);
      
    -  int inputdata_num = x_c * x_h * x_n * x_w;
    +  const int inputdata_num = x_c * x_h * x_n * x_w;
      
       D_TYPE InputData[inputdata_num] = {
           2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
    @@ -170,16 +170,16 @@ void topsBatchNormalInferenceCHNW() {
       topsTensorDescriptor_t xDesc;
       topsTensorDescriptor_t yDesc;
      
    -  int x_c = 4;
    -  int x_h = 2;
    -  int x_n = 3;
    -  int x_w = 2;
    +  const int x_c = 4;
    +  const int x_h = 2;
    +  const int x_n = 3;
    +  const int x_w = 2;
      
    -  int scaleNums = x_c;
    -  int y_c = x_c;
    -  int y_h = x_h;
    -  int y_n = x_n;
    -  int y_w = x_w;
    +  const int scaleNums = x_c;
    +  const int y_c = x_c;
    +  const int y_h = x_h;
    +  const int y_n = x_n;
    +  const int y_w = x_w;
      
       topsContext_t context;
       int clusters[] = {0};
    @@ -193,7 +193,7 @@ void topsBatchNormalInferenceCHNW() {
       topsSetTensorDescriptor(yDesc, TOPS_TENSOR_CHNW, TOPS_DATA_FLOAT, y_n, y_c,
                               y_h, y_w);
      
    -  int inputdata_num = x_c * x_h * x_n * x_w;
    +  const int inputdata_num = x_c * x_h * x_n * x_w;
      
       D_TYPE InputData[inputdata_num] = {
           2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
    @@ -275,16 +275,16 @@ void topsBatchNormalInferenceBoundary() {
       topsTensorDescriptor_t xDesc;
       topsTensorDescriptor_t yDesc;
      
    -  int x_c = 4;
    -  int x_h = 2;
    -  int x_n = 50;
    -  int x_w = 2;
    +  const int x_c = 4;
    +  const int x_h = 2;
    +  const int x_n = 50;
    +  const int x_w = 2;
      
    -  int scaleNums = x_c;
    -  int y_c = x_c;
    -  int y_h = x_h;
    -  int y_n = x_n;
    -  int y_w = x_w;
    +  const int scaleNums = x_c;
    +  const int y_c = x_c;
    +  const int y_h = x_h;
    +  const int y_n = x_n;
    +  const int y_w = x_w;
      
       topsContext_t context;
       int clusters[] = {0};
    @@ -298,7 +298,7 @@ void topsBatchNormalInferenceBoundary() {
       topsSetTensorDescriptor(yDesc, TOPS_TENSOR_CHNW, TOPS_DATA_FLOAT, y_n, y_c,
                               y_h, y_w);
      
    -  int inputdata_num = x_c * x_h * x_n * x_w;
    +  const int inputdata_num = x_c * x_h * x_n * x_w;
      
       D_TYPE InputData[inputdata_num];
       for (int i = 0; i < inputdata_num; i++) {
    @@ -380,16 +380,16 @@ void topsBatchNormalInferenceScaleOffset() {
       topsTensorDescriptor_t xDesc;
       topsTensorDescriptor_t yDesc;
      
    -  int x_c = 4;
    -  int x_h = 2;
    -  int x_n = 3;
    -  int x_w = 2;
    +  const int x_c = 4;
    +  const int x_h = 2;
    +  const int x_n = 3;
    +  const int x_w = 2;
      
    -  int scaleNums = x_c;
    -  int y_c = x_c;
    -  int y_h = x_h;
    -  int y_n = x_n;
    -  int y_w = x_w;
    +  const int scaleNums = x_c;
    +  const int y_c = x_c;
    +  const int y_h = x_h;
    +  const int y_n = x_n;
    +  const int y_w = x_w;
      
       topsContext_t context;
       int clusters[] = {0};
    @@ -403,7 +403,7 @@ void topsBatchNormalInferenceScaleOffset() {
       topsSetTensorDescriptor(yDesc, TOPS_TENSOR_NHWC, TOPS_DATA_FLOAT, y_n, y_c,
                               y_h, y_w);
      
    -  int inputdata_num = x_c * x_h * x_n * x_w;
    +  const int inputdata_num = x_c * x_h * x_n * x_w;
      
       D_TYPE InputData[inputdata_num] = {
           2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,

     

    There are many more; only the file names are listed:

    sdk/sample/batchnormalTraining/tops_batchnormalTraining.cc

    sdk/sample/broadcast/tops_broadcast.cc

    sdk/sample/resnet50/TopsOpApi.cc

    sdk/tests/tops/tops_batchnormalBackward_test.cc

    sdk/tests/tops/tops_batchnormalTraining_test.cc

    sdk/tests/tops/tops_concat_test.cc

    sdk/tests/tops/tops_convert_test.cc

    sdk/tests/tops/tops_customop_test.cc

    sdk/tests/tops/tops_scatter_test.cc

    sdk/tests/tops/tops_bnForwardTrainingEx_unit_test.cc (this file had 1800+ lines changed, which forced me to split it into a separate patch)

    sdk/tests/tops/tops_broadcast_test.cc

    sdk/tests/tops/tops_concat_test.cc

    sdk/tests/tops/tops_convert_test.cc

    sdk/tests/tops/tops_descriptor_test.cc

    sdk/tests/tops/tops_pad_test.cc

    sdk/tests/tops/tops_scatter_test.cc
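    Why const fixes this: a const int initialized from constant expressions can itself be used in a constant expression, so the array bound is fixed at compile time and the array may take an initializer list. A small sketch with hypothetical values:

```cpp
#include <cassert>

// All three sizes are const ints with constant initializers, so
// inputdata_num is a constant expression and input_data is an ordinary
// fixed-size array rather than a variable-length array.
int SumInput() {
  const int x_c = 2;
  const int x_h = 3;
  const int inputdata_num = x_c * x_h;
  int input_data[inputdata_num] = {1, 2, 3, 4, 5, 6};
  int sum = 0;
  for (int i = 0; i < inputdata_num; ++i) sum += input_data[i];
  return sum;
}
```

    Without the const qualifiers, inputdata_num would be a runtime value, the array would become a variable-length array (a compiler extension in C++), and clang would refuse the initializer list.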

     

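    3.11. auto in function prototypes

    clang 9 rejects auto parameters in function prototypes (GCC accepts them as an extension). My understanding is that the main considerations are:

    1. If the function is exposed as an interface, what argument types should callers use?

    2. If several calls pass different argument types, will processing the parameters inside the function body trigger implicit type conversions? clang is strict about implicit conversions that lose information.

    3. If the arguments of different calls have different storage sizes, could the stack be corrupted? For example, with some callers passing int and others long, should compilation instantiate two entities or a single one?

    4. When the function is translated to a C function, how should its name be generated? The rules for converting C++ function names to C function names (name mangling) do not cover auto parameters.

    The auto-parameter problem shows up mainly under sdk/lib/tuner/pavo/ and sdk/tests/factor/targets/pavo/dnn/conv/: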

    diff --git a/sdk/lib/tuner/pavo/pavo_conv_dataflow3_bpi_non4c_impl.cc b/sdk/lib/tuner/pavo/pavo_conv_dataflow3_bpi_non4c_impl.cc
    index dba48418fc2..2c59bc4eda6 100644
    --- a/sdk/lib/tuner/pavo/pavo_conv_dataflow3_bpi_non4c_impl.cc
    +++ b/sdk/lib/tuner/pavo/pavo_conv_dataflow3_bpi_non4c_impl.cc
    @@ -31,8 +31,8 @@ namespace factor {
     using namespace hlir;
      
     static std::vector<std::vector<int64_t>> build_dim(
    -    std::vector<int64_t> dim_count, auto cores_on_dim, auto sip_cord,
    -    int64_t sip_num) {
    +    std::vector<int64_t> dim_count, std::vector<int64_t> cores_on_dim,
    +    std::vector<std::vector<int64_t>> sip_cord, int sip_num) {
       std::vector<int64_t> dim_count1 = {
           dim_count[0] / cores_on_dim[0], dim_count[1] / cores_on_dim[1],
           dim_count[2] / cores_on_dim[2], dim_count[3] / cores_on_dim[3]};

     

    Changes to the other functions are similar; only the file names are listed:

    sdk/lib/tuner/pavo/pavo_conv_dataflow5_bpi_non4c_impl.cc

    sdk/lib/tuner/pavo/pavo_conv_dataflow7_bpi_non4c_impl.cc

    sdk/lib/tuner/pavo/pavo_conv_dataflow1_bpi_non4c_impl.cc

    sdk/lib/tuner/pavo/pavo_conv_dataflow2_bpi_non4c_impl.cc

    sdk/lib/tuner/pavo/pavo_conv_dataflow3_1_forward_non4c_impl.cc

    sdk/lib/tuner/pavo/pavo_conv_dataflow5_1_forward_non4c_impl.cc

    sdk/lib/tuner/pavo/pavo_conv_dataflow6_bpk_non4c_impl.cc

    sdk/lib/tuner/pavo/pavo_conv_dataflow7_1_forward_non4c_impl.cc

    sdk/tests/factor/targets/pavo/dnn/conv/dnn_conv_gen_1c6s_bpi_dataflow1_template_test.cc

    sdk/tests/factor/targets/pavo/dnn/conv/dnn_conv_gen_bpk_1c4s_dataflow7_template_test.cc

    sdk/tests/factor/targets/pavo/dnn/conv/dnn_conv_gen_bpk_1c6s_dataflow6_template_test.cc

    sdk/tests/factor/targets/pavo/dnn/conv/dnn_conv_gen_ff_dataflow3_1_template_test.cc

    sdk/tests/factor/targets/pavo/dnn/conv/dnn_conv_gen_ff_dataflow5_1_template_test.cc

    sdk/tests/factor/targets/pavo/dnn/conv/dnn_conv_gen_ff_dataflow7_1_template_test.cc

    sdk/tests/factor/targets/pavo/dnn/conv/dnn_conv_gen_1c4s_bpi_dataflow2_template_test.cc

    sdk/tests/factor/targets/pavo/dnn/conv/dnn_conv_gen_1c4s_bpi_dataflow3_template_test.cc

    sdk/tests/factor/targets/pavo/dnn/conv/dnn_conv_gen_1c4s_bpi_dataflow5_template_test.cc

    sdk/tests/factor/targets/pavo/dnn/conv/dnn_conv_gen_1c4s_bpi_dataflow7_template_test.cc

    sdk/lib/ops/common/dtu_elementwise_fusion_impl.cc

    sdk/tests/llir/dma_test/slice_dma_test.cc

    sdk/tests/llir/dma_test/broadcast_dma_test.cc

    sdk/tests/llir/dma_test/deslice_dma_test.cc

    sdk/tests/llir/dma_test/mirror_dma_test.cc

    sdk/tests/llir/dma_test/padding_dma_test.cc

    sdk/tests/llir/dma_test/subsampling_dma_test.cc

    sdk/tests/llir/dma_test/transpose_dma_test.cc
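    The patches replace each auto parameter with a concrete type. When a function really does need to accept several types, an explicit template is the portable pre-C++20 spelling. The sketch below shows both options; the function names are hypothetical:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Option 1: an explicit template parameter instead of an `auto` parameter.
// Each distinct Container type gets its own instantiation.
template <typename Container>
int64_t SumDims(const Container& dims) {
  int64_t total = 0;
  for (const auto& d : dims) total += d;  // `auto` for locals is fine
  return total;
}

// Option 2: spell out the single concrete type, as the patches do.
int64_t SumDimsExplicit(const std::vector<int64_t>& dims) {
  return SumDims(dims);
}
```

    The concrete-type version makes the interface self-documenting, which answers the first question in the list above; the template version keeps the flexibility while making the per-type instantiation explicit.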

     

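    3.12. strlen return value is not treated as a constant

    clang treats the return value of strlen as a non-constant value; to use it as a constant expression, you need to define your own function: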

    diff --git a/sdk/lib/profile/topspti/reader/helper.h b/sdk/lib/profile/topspti/reader/helper.h
    index c56502bef86..1d642e2431e 100644
    --- a/sdk/lib/profile/topspti/reader/helper.h
    +++ b/sdk/lib/profile/topspti/reader/helper.h
    @@ -28,7 +28,6 @@
      
     #include <cstring>
     #include <string>
    -
     #include "utils/utils.h"
      
     namespace topspti2 {
    @@ -36,6 +35,10 @@ namespace topspti2 {
     #define TENSOR_MARK "!dtu.tensor<"
     #define TENSOR_MARK_SZ (sizeof(TENSOR_MARK) - 1)
      
    +int constexpr CONSTEXPR_STRLEN(const char *str) {
    +  return *str ? 1 + CONSTEXPR_STRLEN(str + 1) : 0;
    +}
    +
     static inline bool HasDPF(const std::string &product) {
       return (product != "" && product != "unknown" && product != "T10" &&
               product != "T11" && product != "T10s" && product != "I10");
    @@ -123,7 +126,7 @@ static inline bool FastParseSizeFromTensor(const std::string &tensor,
       if (std::string::npos == pos) {
         return false;
       }
    -  constexpr int tensor_mark_sz = strlen(TENSOR_MARK);
    +  constexpr int tensor_mark_sz = CONSTEXPR_STRLEN(TENSOR_MARK);
       const char *data = tensor.c_str();
       while (pos != std::string::npos) {
         pos += tensor_mark_sz;
    @@ -202,7 +205,7 @@ static inline bool FastParseSizeFromMemref(const std::string &memref,
       if (0 != pos) {
         return false;
       }
    -  constexpr auto memref_mark_sz = strlen(MEMREF_MARK);
    +  constexpr auto memref_mark_sz = CONSTEXPR_STRLEN(MEMREF_MARK);
       pos += memref_mark_sz;
       int64_t prod = 1;
       size_t lz = memref.size();
    @@ -261,7 +264,7 @@ static inline bool ParseTensorInfoFromString(const std::string &input,
                                                  TensorInfoValue &tiv) {
       tiv = TensorInfoValue();
       constexpr const char *const szstr = "size:";
    -  constexpr int sz = strlen(szstr);
    +  constexpr int sz = CONSTEXPR_STRLEN(szstr);
      
       if (input.size() > sz && !strncmp(input.c_str(), szstr, sz)) {
         tiv.size = stoll(input.substr(sz));

     

    Other similar changes:

    sdk/lib/profile/libprofile_defs.h
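    For string literals specifically, the length can also be taken from the array type, without recursion. This is an alternative sketch, not the CONSTEXPR_STRLEN function from the patch; LiteralLength is a hypothetical name:

```cpp
#include <cassert>
#include <cstddef>

// N is the array size of the literal including the terminating '\0',
// so the string length is N - 1. The result is a constant expression.
template <std::size_t N>
constexpr std::size_t LiteralLength(const char (&)[N]) {
  return N - 1;
}

#define TENSOR_MARK "!dtu.tensor<"
static_assert(LiteralLength(TENSOR_MARK) == 12,
              "length is computed at compile time");
```

    This only works when the argument is an array such as a string literal or macro; for a const char* variable like szstr in the diff above, a recursive constexpr function is still needed.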

    3.13. Other syntax issues

    3.13.1. Lambda syntax issues

    See Lambda expressions (since C++11) - cppreference.com. Lambda captures are described as follows:

    a comma-separated list of zero or more captures, optionally beginning with a capture-default.

    See below for the detailed description of captures.

    A lambda expression can use a variable without capturing it if the variable

    • is a non-local variable or has static or thread local storage duration (in which case the variable cannot be captured), or
    • is a reference that has been initialized with a constant expression.

    A lambda expression can read the value of a variable without capturing it if the variable

    • has const non-volatile integral or enumeration type and has been initialized with a constant expression, or
    • is constexpr and has no mutable members.

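    The description above means that no capture needs to be specified in the following cases:

    1) non-local variables (global variables)

    2) static variables

    3) thread-local variables (here it is not that a capture is unnecessary; such a variable cannot be captured at all)

    4) references initialized with a constant expression

    5) const non-volatile integral or enumeration types initialized with a constant expression (read-only access)

    6) constexpr variables with no mutable members (read-only access)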
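    A compact sketch of several of these cases in one function (all names are hypothetical): only the automatic variable local needs to appear in the capture list.

```cpp
#include <cassert>

int g_offset = 100;  // case 1: non-local variable, used without capture

int Compute(int local) {
  static int s_calls = 0;  // case 2: static storage, used without capture
  const int kScale = 3;    // case 5: const integral with a constant
                           // initializer, its value is read without capture
  auto f = [local]() {     // only the automatic variable is captured
    ++s_calls;
    return local * kScale + g_offset;
  };
  return f();
}
```

    Adding g_offset or s_calls to the capture list would be rejected by clang, since variables without automatic storage duration cannot be captured.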

    The module_str used in sdk/tests/hlir/cc_tests/hlir_pass_manager_test.cc is a global variable and needs no capture. The original code compiles with gcc5, but gcc7 and clang report an error directly:
    diff --git a/sdk/tests/hlir/cc_tests/hlir_pass_manager_test.cc b/sdk/tests/hlir/cc_tests/hlir_pass_manager_test.cc
    index 6f17f27ca4d..70506e38506 100644
    --- a/sdk/tests/hlir/cc_tests/hlir_pass_manager_test.cc
    +++ b/sdk/tests/hlir/cc_tests/hlir_pass_manager_test.cc
    @@ -38,7 +38,7 @@ TEST(MTTest, PassMgr) {
       std::vector<std::thread> th_vec;
       th_vec.reserve(thread_count);
       for (size_t i = 0; i < thread_count; ++i) {
    -    th_vec.emplace_back([&module_str]() {
    +    th_vec.emplace_back([]() {
           mlir::MLIRContext context;
           mlir::OwningModuleRef module =
               mlir::parseSourceString(module_str, &context);

     

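    In the code below, this is captured but never used, which produces the warning "expression result unused [-Wunused-value]". A capture effectively declares the variable inside the lambda, so leaving it unused triggers a warning: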

    diff --git a/tools/logging/tests/logging/test_log_old_api.cc b/tools/logging/tests/logging/test_log_old_api.cc
    index 638a84cac6b..91cbf6b2676 100644
    --- a/tools/logging/tests/logging/test_log_old_api.cc
    +++ b/tools/logging/tests/logging/test_log_old_api.cc
    @@ -18,7 +18,7 @@ class OldLogTest : public testing::Test {
         Test::SetUp();
         RegisterLogTo(this->pLog);
         pLog->setCallback(
    -        [this](const std::string &msg) { std::cerr << msg << std::endl; });
    +        [](const std::string &msg) { std::cerr << msg << std::endl; });
         pLog->SetAutoClear(true);
       }

     

    Similarly, capturing the constant alignment in sdk/lib/ops/common/dtu_scatter_impl.cc is also an error:

    diff --git a/sdk/lib/ops/common/dtu_scatter_impl.cc b/sdk/lib/ops/common/dtu_scatter_impl.cc
    index 032c19e6ef9..99058631d4c 100644
    --- a/sdk/lib/ops/common/dtu_scatter_impl.cc
    +++ b/sdk/lib/ops/common/dtu_scatter_impl.cc
    @@ -92,7 +92,7 @@ bool predicate_func(int64_t i) {
      
     // alloc_ memory with alignment of 128 byte.
     const uint32_t alignment = 128;
    -auto GetAlignedSize = [alignment](uint64_t size) {
    +auto GetAlignedSize = [](uint64_t size) {
       return (size + alignment - 1) / alignment * alignment;
     };

     

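    3.13.2. std::move in return statements

    Calling std::move in a return statement disables the compiler's copy elision. For the pre-change code below, clang reports the warning "moving a local object in a return statement prevents copy elision [-Wpessimizing-move]". So what is copy elision?

    Copy elision - cppreference.com defines it as: Omits copy and move (since C++11) constructors, resulting in zero-copy pass-by-value semantics.

    That is, without std::move the compiler tries to omit the copy or move of the returned object, achieving zero-copy; with std::move it is forced to call the object's move constructor, which is clearly the more expensive of the two.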

     
    diff --git a/tools/logging/tests/logging/log_to_test.h b/tools/logging/tests/logging/log_to_test.h
    index de91f49b34d..0c92e2acd8a 100644
    --- a/tools/logging/tests/logging/log_to_test.h
    +++ b/tools/logging/tests/logging/log_to_test.h
    @@ -21,7 +21,7 @@ class LogToString : public LogDestination {
         if (autoClear_) {
           Clear();
         }
    -    return std::move(ret);
    +    return ret;
       }
       void SetAutoClear(bool autoClear) { autoClear_ = autoClear; }
       void Clear() { str_.clear(); }
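    The difference can be observed with a small counting type. This is only a sketch: Tracer, make_plain, make_with_move and moves_for are hypothetical names, and the exact count for the plain return depends on whether the compiler performs the (optional) named return value optimization.

```cpp
#include <cassert>
#include <utility>

// Hypothetical type that counts how often its move constructor runs.
struct Tracer {
  static int moves;
  Tracer() = default;
  Tracer(const Tracer&) = default;
  Tracer(Tracer&&) noexcept { ++moves; }
};
int Tracer::moves = 0;

Tracer make_plain() {
  Tracer t;
  return t;  // eligible for copy elision; the move is usually omitted
}

Tracer make_with_move() {
  Tracer t;
  return std::move(t);  // -Wpessimizing-move: the move constructor must run
}

// Returns how many move constructions one call to `factory` caused.
int moves_for(Tracer (*factory)()) {
  Tracer::moves = 0;
  Tracer result = factory();
  (void)result;
  return Tracer::moves;
}

void demo() {
  int with_move = moves_for(make_with_move);
  int plain = moves_for(make_plain);
  assert(with_move >= 1);     // std::move always costs at least one move
  assert(plain <= with_move);  // the plain return never costs more
}
```

    With gcc and clang at default settings the plain version typically reports zero moves, which is the zero-copy behavior described above.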

     

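    3.13.3. Use of an uninitialized object

    In the previous version of sdk/tests/runtime/device_manager_test.cc, if result.ok() is false, cluster is never initialized, yet it is still passed to device->ClusterMemoryHandle() afterwards, which can have very nasty consequences: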
    diff --git a/sdk/tests/runtime/device_manager_test.cc b/sdk/tests/runtime/device_manager_test.cc
    index cf9075367a7..0adf469da8b 100644
    --- a/sdk/tests/runtime/device_manager_test.cc
    +++ b/sdk/tests/runtime/device_manager_test.cc
    @@ -109,14 +109,13 @@ TEST_F(DeviceManagerTest, ClusterMemoryHandle_SuccessFail) {
       dtu::driver::DeviceManager* device = dtu::driver::DeviceManager::instance();
       device->AcquireDevice(0);
       dtu::StatusOr<dtu_cluster> result = device->Cluster(0, 0);
    -  dtu_cluster cluster;
       if (result.ok()) {
    -    cluster = std::move(result.ValueOrDie());
    -    EXPECT_NE(cluster, nullptr);
    +    dtu_cluster cluster = std::move(result.ValueOrDie());
    +    dtu::StatusOr<dtu_mem_handle> result1 =
    +        device->ClusterMemoryHandle(cluster);
       } else {
         EFLOG(FATAL) << "Get ClusterIds error: " << result.status();
       }
    -  dtu::StatusOr<dtu_mem_handle> result1 = device->ClusterMemoryHandle(cluster);
       EXPECT_EQ(result.ok(), true);
       EXPECT_NE(result.ValueOrDie(), nullptr);
       device->ReleaseCluster(0, 0);

     

    3.13.4. clang does not allow parenthesized initialization of arrays

    For the pre-change code below, clang reports the error "parenthesized initialization of a member array is a GNU extension [-Wgnu-array-member-paren-init]", and gcc reports the warning "list-initializer for non-class type must not be parenthesized":
    diff --git a/sdk/tests/tops/tops_broadcast_parameter_test.cc b/sdk/tests/tops/tops_broadcast_parameter_test.cc
    index 831d9d23791..321c56171f3 100644
    --- a/sdk/tests/tops/tops_broadcast_parameter_test.cc
    +++ b/sdk/tests/tops/tops_broadcast_parameter_test.cc
    @@ -139,14 +139,18 @@ class TopsBroadcastParameterTest
     };
      
     TopsBroadcastParameterTest::TopsBroadcastParameterTest()
    -    : x_desc_dim({GetParam().x.h, GetParam().x.w}),
    -      y_desc_dim(
    -          {GetParam().y.n, GetParam().y.c, GetParam().y.h, GetParam().y.w}),
    -      broadcast_dims(
    -          {GetParam().broadcast_dim.dim_1, GetParam().broadcast_dim.dim_2}),
    -      input_length(GetParam().x.h * GetParam().x.w),
    +    : input_length(GetParam().x.h * GetParam().x.w),
           output_length(GetParam().y.n * GetParam().y.c * GetParam().y.h *
    -                    GetParam().y.w) {}
    +                    GetParam().y.w) {
    +  x_desc_dim[0] = GetParam().x.h;
    +  x_desc_dim[1] = GetParam().x.w;
    +  y_desc_dim[0] = GetParam().y.n;
    +  y_desc_dim[1] = GetParam().y.c;
    +  y_desc_dim[2] = GetParam().y.h;
    +  y_desc_dim[3] = GetParam().y.w;
    +  broadcast_dims[0] = GetParam().broadcast_dim.dim_1;
    +  broadcast_dims[1] = GetParam().broadcast_dim.dim_2;
    +}
      
     void TopsBroadcastParameterTest::freeDebugInfo() {
       if (input_mem == nullptr) {

     

    Similar changes were also made in:

    sdk/tests/tops/tops_dot_parameter_test.cc

    sdk/tests/tops/tops_pad_parameter_test.cc
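    Besides assigning the elements in the constructor body, standard C++11 also allows a member array to be brace-initialized directly in the member initializer list, without the surrounding parentheses. The sketch below shows that form; Shape and DescHolder are hypothetical stand-ins, and whether it fits the real test classes (for example, brace initialization rejects narrowing conversions) has not been checked.

```cpp
#include <cassert>

// Hypothetical parameter struct standing in for the test's GetParam() result.
struct Shape {
  int h;
  int w;
};

class DescHolder {
 public:
  // GNU extension (rejected by clang):  dims_({s.h, s.w})
  // Standard list-initialization:       dims_{s.h, s.w}
  explicit DescHolder(const Shape& s) : dims_{s.h, s.w} {}

  int dim(int i) const { return dims_[i]; }

 private:
  int dims_[2];
};

void demo() {
  DescHolder d(Shape{3, 4});
  assert(d.dim(0) == 3);
  assert(d.dim(1) == 4);
}
```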

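    3.13.5. clang instantiates a template only when something actually uses it

    The constructor is never called in the SDK's own code, so libdtu_sdk.so contains no symbol for it. The tests need it, however, so as a last resort a stub function was added to force the constructor to be instantiated.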

    diff --git a/sdk/lib/umd/tools/kernel_code_processor/kernel_code.cc b/sdk/lib/umd/tools/kernel_code_processor/kernel_code.cc
    index 7e366337561..41fb573a562 100644
    --- a/sdk/lib/umd/tools/kernel_code_processor/kernel_code.cc
    +++ b/sdk/lib/umd/tools/kernel_code_processor/kernel_code.cc
    @@ -248,4 +248,13 @@ vector<string> KernelCode<T>::LinkArgs() {
       return args;
     }
      
    +// stab function for undefined reference to
    +// 'dtu_umd::KernelCode<dtu_umd::PavoKernel>::KernelCode(llvm::StringRef)'
    +void kernel_code_stab() {
    +  StringRef file_name = "stab_file";
    +  KernelCode<PavoKernel> k_stab1(file_name);
    +  KernelCode<DoradoKernel> k_stab2(file_name);
    +  KernelCode<LeoKernel> k_stab3(file_name);
    +}
    +
     }  // namespace dtu_umd
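    A commonly used alternative to a stub function is an explicit instantiation definition, which asks the compiler to emit all members of the template for a given type without any dummy call. The single-file sketch below only illustrates the syntax: KernelCode, PavoKernel and DoradoKernel are stripped-down stand-ins (taking std::string rather than llvm::StringRef), not the SDK's real classes, and whether this would work in the real build (symbol visibility, header layout) is untested.

```cpp
#include <cassert>
#include <string>

// Simplified stand-ins for the kernel tag types.
struct PavoKernel {
  static const char* name() { return "pavo"; }
};
struct DoradoKernel {
  static const char* name() { return "dorado"; }
};

// Declaration, as it would appear in a header.
template <typename T>
class KernelCode {
 public:
  explicit KernelCode(const std::string& file_name);
  std::string Describe() const;

 private:
  std::string file_name_;
};

// Definitions, as they would appear in the .cc file.
template <typename T>
KernelCode<T>::KernelCode(const std::string& file_name)
    : file_name_(file_name) {}

template <typename T>
std::string KernelCode<T>::Describe() const {
  return std::string(T::name()) + ":" + file_name_;
}

// Explicit instantiation definitions: the constructor and Describe() are
// emitted for these types even if nothing in this file calls them.
template class KernelCode<PavoKernel>;
template class KernelCode<DoradoKernel>;

void demo() {
  KernelCode<PavoKernel> k("a.bin");
  assert(k.Describe() == "pavo:a.bin");
}
```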

     

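    3.13.6. clang does not allow complex objects that need memory management inside constexpr functions

    The template below creates a new vector object, whose constructor has to manage memory. Left unchanged, it fails with "variable of non-literal type 'std::vector<size_t>' (aka 'vector<unsigned long>') cannot be defined in a constexpr function"; removing the constexpr specifier from the template fixes the build.

    Section 3.9/10 of the C++ standard defines literal types (roughly, constants or simple variables). The vector in unpack_seq_to_vector is neither a simple variable nor an array of simple variables. Switching to array should probably compile, but every caller of this function would then have to change as well: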

    A type is a literal type if it is:

    • void; or

    • a scalar type; or

    • a reference type; or

    • an array of literal type; or

    • a class type (Clause 9) that has all of the following properties:

      • it has a trivial destructor,

      • it is an aggregate type (8.5.1) or has at least one constexpr constructor or constructor template that is not a copy or move constructor, and

      • all of its non-static data members and base classes are of non-volatile literal types

    diff --git a/sdk/tests/hlir/cc_tests/hlir_utils_test.cc b/sdk/tests/hlir/cc_tests/hlir_utils_test.cc
    index bdc21e3f317..24e37191fc4 100644
    --- a/sdk/tests/hlir/cc_tests/hlir_utils_test.cc
    +++ b/sdk/tests/hlir/cc_tests/hlir_utils_test.cc
    @@ -152,7 +152,7 @@ TEST(HlirUtilTest, ConstSplatValue) {
     }
      
     template <size_t... Idx>
    -constexpr static auto unpack_seq_to_vector(hlir::IndexSeq<Idx...>) {
    +static auto unpack_seq_to_vector(hlir::IndexSeq<Idx...>) {
       std::vector<size_t> ret = {Idx...};
       return ret;
     }
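    For reference, a std::array variant that stays constexpr could look roughly like the sketch below. It assumes C++14; std::index_sequence is used as a stand-in for hlir::IndexSeq, and unpack_seq_to_array is a hypothetical name, not a function in the SDK.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <utility>

// std::array of a scalar type is a literal type, so the function can remain
// constexpr. std::index_sequence stands in for hlir::IndexSeq here.
template <std::size_t... Idx>
constexpr std::array<std::size_t, sizeof...(Idx)> unpack_seq_to_array(
    std::index_sequence<Idx...>) {
  return {{Idx...}};
}

// The result is usable in constant expressions.
static_assert(unpack_seq_to_array(std::make_index_sequence<3>{}).size() == 3,
              "three indices expected");

void demo() {
  constexpr auto values = unpack_seq_to_array(std::index_sequence<4, 5, 6>{});
  assert(values[0] == 4);
  assert(values[2] == 6);
}
```

    The return type changes from std::vector<size_t> to std::array, which is why the callers would need to be adapted.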

     

    3.13.7. clang requires an explicit override keyword on virtual function overrides

     

    diff --git a/tools/logging/include/logging/to/file.h b/tools/logging/include/logging/to/file.h
    index bdda687afdc..c6d39779bde 100644
    --- a/tools/logging/include/logging/to/file.h
    +++ b/tools/logging/include/logging/to/file.h
    @@ -18,7 +18,7 @@ class LogToFile : public LogDestination {
       DISALLOW_COPY_AND_ASSIGN(LogToFile);
      
       static pointer Create(const std::string &file_name);
    -  void Message(int level, const std::string &message);
    +  void Message(int level, const std::string &message) override;
       void Flush() override;
      
      private:

     

    Other similar changes:

    tools/logging/include/logging/to/std_err.h
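    The pattern in this diff (Flush() marked override, Message() not) is presumably what clang's -Winconsistent-missing-override diagnostic is about: once one override in a class is marked, clang expects the others to be marked too. A minimal sketch of the consistent form follows; LogDestination is reduced to two virtual functions and LogToBuffer is a hypothetical class invented for this example.

```cpp
#include <cassert>
#include <string>

// Reduced stand-in for the logging base class.
class LogDestination {
 public:
  virtual ~LogDestination() = default;
  virtual void Message(int level, const std::string& message) = 0;
  virtual void Flush() {}
};

// Hypothetical destination that collects messages in memory.
class LogToBuffer : public LogDestination {
 public:
  // Every overriding function carries `override`, so a signature mismatch
  // with the base class becomes a compile error instead of a new overload.
  void Message(int level, const std::string& message) override {
    buffer_ += std::to_string(level) + ":" + message + "\n";
  }
  void Flush() override { buffer_.clear(); }

  const std::string& buffer() const { return buffer_; }

 private:
  std::string buffer_;
};

void demo() {
  LogToBuffer sink;
  LogDestination& dest = sink;
  dest.Message(1, "x");
  assert(sink.buffer() == "1:x\n");
  dest.Flush();
  assert(sink.buffer().empty());
}
```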

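    3.13.8. Misuse of alignas

    alignas is meant to improve access efficiency by placing a struct on a larger address boundary when it is defined; it is not the same concept as pack in C. pack(1) can be forced onto any object to guarantee that its members are not shifted by padding, whereas the value given to alignas must be at least as large as the strictest alignment among the struct's members: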

    The object or the type declared by such a declaration will have its alignment requirement equal to the strictest (largest) non-zero expression of all alignas specifiers used in the declaration, unless it would weaken the natural alignment of the type.

    The struct defined below has a uint16_t member, so in theory its minimum alignment is 2 and alignas(1) cannot be used on it:

    diff --git a/sdk/lib/hlir/utils/types.h b/sdk/lib/hlir/utils/types.h
    index 87aee25fe31..90cabe7bdb6 100644
    --- a/sdk/lib/hlir/utils/types.h
    +++ b/sdk/lib/hlir/utils/types.h
    @@ -151,13 +151,13 @@ enum class CompareType {
      
     // define raw data type
     // lower to factor need raw data
    -struct alignas(1) raw_bf16_ty {
    +struct alignas(2) raw_bf16_ty {
       uint16_t data;
     };
     static_assert(sizeof(raw_bf16_ty) == 2, "");
      
     // half
    -struct alignas(1) raw_fp16_ty {
    +struct alignas(2) raw_fp16_ty {
       uint16_t data;
     };
     static_assert(sizeof(raw_fp16_ty) == 2, "");
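    The contrast between alignas and pack can be checked with a few static_asserts. The sketch below assumes a typical platform where alignof(uint16_t) is 2; the struct names are hypothetical, and #pragma pack is a compiler extension (supported by gcc, clang and MSVC) rather than standard C++.

```cpp
#include <cassert>
#include <cstdint>

// alignas equal to the natural alignment: accepted, no effect.
struct alignas(2) raw_u16 {
  std::uint16_t data;
};

// alignas stricter than the natural alignment: size is padded up to 8.
struct alignas(8) padded_u16 {
  std::uint16_t data;
};

// Default layout: one byte of padding is inserted before `data`.
struct plain_pair {
  std::uint8_t tag;
  std::uint16_t data;
};

// pack(1) removes the padding and lowers the alignment to 1.
#pragma pack(push, 1)
struct packed_pair {
  std::uint8_t tag;
  std::uint16_t data;
};
#pragma pack(pop)

static_assert(alignof(raw_u16) == 2 && sizeof(raw_u16) == 2, "");
static_assert(alignof(padded_u16) == 8 && sizeof(padded_u16) == 8, "");
static_assert(sizeof(plain_pair) == 4, "");
static_assert(sizeof(packed_pair) == 3 && alignof(packed_pair) == 1, "");

void demo() {
  padded_u16 value{1};
  // The object really does start on an 8-byte boundary.
  assert(reinterpret_cast<std::uintptr_t>(&value) % 8 == 0);
  assert(value.data == 1);
}
```

    alignas can only raise the alignment; lowering it below the natural alignment, as alignas(1) tried to do, is what pack is for.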

     

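    3.14. Cleanups made along the way while fixing warnings

    3.14.1. Redundant computation

    The change to tools/logging/lib/logging/log_message.cc was originally meant to fix the initialization of a variable-length array. While reading the code, however, I found that the seconds and microseconds of the timeval were first combined into a total microsecond count that was never used as such; it was immediately split back into seconds and microseconds. The conversion therefore served no purpose, and after confirming with the code owner the redundant computation was removed.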

    diff --git a/tools/logging/lib/logging/log_message.cc b/tools/logging/lib/logging/log_message.cc
    index 77fa33fe129..8d4cd48b348 100644
    --- a/tools/logging/lib/logging/log_message.cc
    +++ b/tools/logging/lib/logging/log_message.cc
    @@ -25,18 +25,15 @@ std::string LogMessage::GenerateMessage() {
       std::stringstream os;
       struct timeval tv;
       gettimeofday(&tv, nullptr);
    -  uint64_t now_micros = static_cast<uint64_t>(tv.tv_sec) * 1000000 + tv.tv_usec;
    -  time_t now_seconds = static_cast<time_t>(now_micros / 1000000);
    -  int32_t micros_remainder = static_cast<int32_t>(now_micros % 1000000);
       const size_t time_buffer_size = 50;
    -  struct tm now_time = {0};
    -  char time_buffer[time_buffer_size];
    -  localtime_r(&now_seconds, &now_time);
    +  struct tm now_time = tm();
    +  char time_buffer[time_buffer_size]={0};
    +  localtime_r(&tv.tv_sec, &now_time);
       strftime(time_buffer, time_buffer_size, "%Y-%m-%d %H:%M:%S", &now_time);
      
       os << time_buffer << ".";
       os.width(6);
    -  os << micros_remainder << ": ";
    +  os << tv.tv_usec << ": ";
       os << "DIWEF"[severity_];
       if(msg_code_) {
         os << msg_code_;

     

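    3.14.2. Redundant comparison of a reference's address with the null pointer

    A reference always refers to an existing object, so its address can never be null, and comparing it with nullptr is meaningless: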

    diff --git a/tools/logging/lib/logging/log_module.cc b/tools/logging/lib/logging/log_module.cc
    index f40e13d6fea..3ea150b37a0 100644
    --- a/tools/logging/lib/logging/log_module.cc
    +++ b/tools/logging/lib/logging/log_module.cc
    @@ -27,10 +27,6 @@ LogModuleMgr &LogModuleMgr::Instance() {
     }
      
     void LogModuleMgr::UpdateModuleMaskFromEnv(const std::string &env) {
    -  if (&env == nullptr) {
    -    return;
    -  }
    -
       EFLOG(DBG) << "Init Logging Module" << std::endl;
       EFLOG(DBG) << "ENFLAME_LOG_DEBUG_MOD = " << env << std::endl;
       auto tokens = strutil::split(env, ',');
    @@ -91,4 +87,4 @@ void LogModuleMgr::SetModuleOff(EF_LOG_MOD module) {
       mod_status_[static_cast<int>(module)] = false;
     }
      
    -} // namespace dtu
    \ No newline at end of file
    +} // namespace dtu

     

     

    posted @ 2021-12-06 16:11  周榮華