How to find out what takes the most time in query processing

Hello, where can I find out what takes the most time in query processing?

I have the following query, but it takes almost 40 seconds to process on a 10GB dataset:

select s_acctbal,s_name,n_name,p_partkey,p_mfgr,s_address,s_phone,s_comment from part,supplier,partsupp,nation,region where p_partkey = ps_partkey and s_suppkey = ps_suppkey and p_size = 15
and p_type like '%BRASS' and s_nationkey = n_nationkey and n_regionkey = r_regionkey and r_name = 'EUROPE' and ps_supplycost = (select min(ps_supplycost) from partsupp,supplier,nation,region
where p_partkey = ps_partkey and s_suppkey = ps_suppkey and s_nationkey = n_nationkey and n_regionkey = r_regionkey and r_name = 'EUROPE') order by s_acctbal desc,n_name,s_name,p_partkey fetch next 100 rows only;

Hi @Sipy,

You can start the database with the parameter enable-debug-timer set to true.
The timings of the macro steps of the query, such as the execution on CPU/GPU, the projection, and the "order by", will then be written to the log.

We don't have anything that returns the timings of the individual steps of the query, such as group by or joins.

That would be difficult given the extreme degree of parallelism in GPU execution.

I tried the query on a 100GB dataset, using 1 or 2 GPUs, and the response time was between 4000ms and 1200ms.

heavysql> select s_acctbal,s_name,n_name,p_partkey,p_mfgr,s_address,s_phone,s_comment from part,supplier,partsupp,nation,region where p_partkey = ps_partkey and s_suppkey = ps_suppkey and p_size = 15 and p_type like '%BRASS' and s_nationkey = n_nationkey and n_regionkey = r_regionkey and r_name = 'EUROPE' and ps_supplycost = (select min(ps_supplycost) from partsupp,supplier,nation,region where p_partkey = ps_partkey and s_suppkey = ps_suppkey and s_nationkey = n_nationkey and n_regionkey = r_regionkey and r_name = 'EUROPE') order by s_acctbal desc,n_name,s_name,p_partkey limit 10;
s_acctbal|s_name|n_name|p_partkey|p_mfgr|s_address|s_phone|s_comment
9999.70|Supplier#000239544|UNITED KINGDOM|6739531|Manufacturer#4|,6oexY7ft68JKJCN1SQ LUO5Xf|33-509-584-9496|ets are. blithely special accounts wake across t
9999.65|Supplier#000143654|FRANCE|2393647|Manufacturer#3|HhHCZ,RosE8He4uYvyIDqsPZe,7cZiJh1y9,|16-166-504-5864|special accounts. never dogged deposits
9999.49|Supplier#000615014|GERMANY|12365001|Manufacturer#3|4C,uOuJDig7z,4e|17-780-902-4027|pinto beans affix along th
9999.28|Supplier#000494480|ROMANIA|17244462|Manufacturer#3|hNB2qcYmnyeqrd em,o29TxzLwwrl|29-756-312-1779|ly. slyly final pains detect furiously qu
9998.87|Supplier#000826281|ROMANIA|14326252|Manufacturer#3|LkGmLSAjbjgKS5ZRepL|29-775-451-4774|slyly silent accounts affix furiously across the final, even ideas; tithe
9998.56|Supplier#000039514|UNITED KINGDOM|7539499|Manufacturer#2|fUN95rpikfiqeGo,okl,27ItSq9fMpr|33-317-828-7758|olites use about the blithely regular warhorses. carefully final instructions a
9997.89|Supplier#000319666|RUSSIA|19319665|Manufacturer#1|SZtdx3rzXnFuiVuJzKathV9|32-972-151-6038|furiously ironic pearls. furiously regular foxes along the furiously
9997.85|Supplier#000718770|ROMANIA|718769|Manufacturer#1|2173woKsCg7zpHCqhiEXFTmYYqpT4XONnRAsUryE|29-663-409-2865|posits sleep across the blithely express requests.
9997.83|Supplier#000348318|FRANCE|9348317|Manufacturer#4|AvfnvjCFAg1aNpj|16-718-707-5676|e pending requests: furiously even deposits boost furiously slyly even requests.
9997.73|Supplier#000329974|RUSSIA|4079969|Manufacturer#3|eKtIadopsE|32-268-488-7178|uickly. furiously ironic requests are slyly above the regular
10 rows returned.
Execution time: 1133 ms, Total time: 1179 ms

Could you enable the debug timer and post the log of your execution?
If the default storage directory is used, the log file is /var/lib/heavyai/log/omnisci_server.INFO

The result of my execution is:

1107ms total duration for executeRelAlgQuery
  1107ms start(0ms) executeRelAlgQueryNoRetry RelAlgExecutor.cpp:516
    0ms start(0ms) Query pre-execution steps RelAlgExecutor.cpp:517
    1106ms start(0ms) executeRelAlgSeq RelAlgExecutor.cpp:814
      4ms start(0ms) executeRelAlgStep RelAlgExecutor.cpp:962
        4ms start(0ms) executeCompound RelAlgExecutor.cpp:2116
          4ms start(0ms) executeWorkUnit RelAlgExecutor.cpp:3388
            1ms start(0ms) compileWorkUnit NativeCodegen.cpp:2589
              0ms start(2ms) markDeadRuntimeFuncs NativeCodegen.cpp:1849
              0ms start(2ms) optimizeAndCodegenGPU NativeCodegen.cpp:1326
            0ms start(2ms) ExecutionKernel::run ExecutionKernel.cpp:125
            0ms start(2ms) fetchChunks Execute.cpp:2941
            0ms start(2ms) getQueryExecutionContext QueryMemoryDescriptor.cpp:781
            0ms start(2ms) executePlanWithoutGroupBy Execute.cpp:3362
              0ms start(2ms) launchGpuCode QueryExecutionContext.cpp:223
            0ms start(3ms) collectAllDeviceResults Execute.cpp:2333
              0ms start(3ms) reduceMultiDeviceResults Execute.cpp:1309
                0ms start(3ms) reduceMultiDeviceResultSets Execute.cpp:1355
            1ms start(3ms) compileWorkUnit NativeCodegen.cpp:2589
              0ms start(4ms) markDeadRuntimeFuncs NativeCodegen.cpp:1849
              0ms start(4ms) optimizeAndCodegenGPU NativeCodegen.cpp:1326
            0ms start(4ms) ExecutionKernel::run ExecutionKernel.cpp:125
            0ms start(4ms) fetchChunks Execute.cpp:2941
            0ms start(4ms) getQueryExecutionContext QueryMemoryDescriptor.cpp:781
            0ms start(4ms) executePlanWithGroupBy Execute.cpp:3583
              0ms start(4ms) launchGpuCode QueryExecutionContext.cpp:223
              0ms start(5ms) getRowSet QueryExecutionContext.cpp:158
                0ms start(5ms) reduceMultiDeviceResults Execute.cpp:1309
                  0ms start(5ms) reduceMultiDeviceResultSets Execute.cpp:1355
            0ms start(5ms) resultsUnion Execute.cpp:1279
      156ms start(5ms) executeRelAlgStep RelAlgExecutor.cpp:962
        156ms start(5ms) executeCompound RelAlgExecutor.cpp:2116
          156ms start(6ms) executeWorkUnit RelAlgExecutor.cpp:3388
            2ms start(6ms) compileWorkUnit NativeCodegen.cpp:2589
              0ms start(6ms) getInstance HashJoin.cpp:295
                0ms start(6ms) reify PerfectJoinHashTable.cpp:354
                  0ms start(6ms) getOneColumnFragment ColumnFetcher.cpp:78
                  0ms start(6ms) getOneColumnFragment ColumnFetcher.cpp:78
                    New thread(170)
                      0ms start(0ms) initHashTableForDevice PerfectJoinHashTable.cpp:721
                        0ms start(0ms) initHashTableOnGpu PerfectHashTableBuilder.h:89
                    End thread(170)
                    New thread(171)
                      0ms start(0ms) initHashTableForDevice PerfectJoinHashTable.cpp:721
                        0ms start(0ms) initHashTableOnGpu PerfectHashTableBuilder.h:89
                    End thread(171)
              0ms start(7ms) getInstance HashJoin.cpp:295
                0ms start(7ms) reify PerfectJoinHashTable.cpp:354
                  0ms start(7ms) getOneColumnFragment ColumnFetcher.cpp:78
                  0ms start(7ms) getOneColumnFragment ColumnFetcher.cpp:78
                    New thread(172)
                      0ms start(0ms) initHashTableForDevice PerfectJoinHashTable.cpp:721
                        0ms start(0ms) initHashTableOnGpu PerfectHashTableBuilder.h:89
                    End thread(172)
                    New thread(173)
                      0ms start(0ms) initHashTableForDevice PerfectJoinHashTable.cpp:721
                        0ms start(0ms) initHashTableOnGpu PerfectHashTableBuilder.h:89
                    End thread(173)
              0ms start(7ms) getInstance HashJoin.cpp:295
                0ms start(7ms) synthesize_metadata InputMetadata.cpp:306
                0ms start(7ms) reify PerfectJoinHashTable.cpp:354
                  0ms start(7ms) getOneColumnFragment ColumnFetcher.cpp:78
                    0ms start(7ms) ColumnarResults ColumnarResults.cpp:64
                  0ms start(7ms) getOneColumnFragment ColumnFetcher.cpp:78
                    New thread(174)
                      0ms start(0ms) initHashTableForDevice PerfectJoinHashTable.cpp:721
                        0ms start(0ms) initHashTableOnGpu PerfectHashTableBuilder.h:89
                    End thread(174)
                    New thread(175)
                      0ms start(0ms) initHashTableForDevice PerfectJoinHashTable.cpp:721
                        0ms start(0ms) initHashTableOnGpu PerfectHashTableBuilder.h:89
                    End thread(175)
              0ms start(8ms) markDeadRuntimeFuncs NativeCodegen.cpp:1849
              0ms start(8ms) optimizeAndCodegenGPU NativeCodegen.cpp:1326
            0ms start(8ms) ExecutionKernel::run ExecutionKernel.cpp:125
            0ms start(8ms) fetchChunks Execute.cpp:2941
                New thread(47)
                  0ms start(0ms) ExecutionKernel::run ExecutionKernel.cpp:125
                  0ms start(0ms) fetchChunks Execute.cpp:2941
                  0ms start(0ms) getQueryExecutionContext QueryMemoryDescriptor.cpp:781
                  131ms start(0ms) executePlanWithGroupBy Execute.cpp:3583
                    131ms start(0ms) launchGpuCode QueryExecutionContext.cpp:223
                    0ms start(131ms) getRowSet QueryExecutionContext.cpp:158
                      0ms start(131ms) reduceMultiDeviceResults Execute.cpp:1309
                        0ms start(131ms) reduceMultiDeviceResultSets Execute.cpp:1355
                End thread(47)
            0ms start(8ms) getQueryExecutionContext QueryMemoryDescriptor.cpp:781
            130ms start(8ms) executePlanWithGroupBy Execute.cpp:3583
              130ms start(8ms) launchGpuCode QueryExecutionContext.cpp:223
              0ms start(139ms) getRowSet QueryExecutionContext.cpp:158
                0ms start(139ms) reduceMultiDeviceResults Execute.cpp:1309
                  0ms start(139ms) reduceMultiDeviceResultSets Execute.cpp:1355
            21ms start(140ms) collectAllDeviceResults Execute.cpp:2333
              21ms start(140ms) reduceMultiDeviceResults Execute.cpp:1309
                21ms start(140ms) reduceMultiDeviceResultSets Execute.cpp:1355
      4ms start(162ms) executeRelAlgStep RelAlgExecutor.cpp:962
        4ms start(162ms) executeCompound RelAlgExecutor.cpp:2116
          4ms start(162ms) executeWorkUnit RelAlgExecutor.cpp:3388
            1ms start(162ms) compileWorkUnit NativeCodegen.cpp:2589
              0ms start(163ms) markDeadRuntimeFuncs NativeCodegen.cpp:1849
              0ms start(163ms) optimizeAndCodegenGPU NativeCodegen.cpp:1326
            0ms start(163ms) ExecutionKernel::run ExecutionKernel.cpp:125
            0ms start(163ms) fetchChunks Execute.cpp:2941
            0ms start(163ms) getQueryExecutionContext QueryMemoryDescriptor.cpp:781
            0ms start(163ms) executePlanWithoutGroupBy Execute.cpp:3362
              0ms start(163ms) launchGpuCode QueryExecutionContext.cpp:223
            0ms start(164ms) collectAllDeviceResults Execute.cpp:2333
              0ms start(164ms) reduceMultiDeviceResults Execute.cpp:1309
                0ms start(164ms) reduceMultiDeviceResultSets Execute.cpp:1355
            1ms start(164ms) compileWorkUnit NativeCodegen.cpp:2589
              0ms start(165ms) markDeadRuntimeFuncs NativeCodegen.cpp:1849
              0ms start(165ms) optimizeAndCodegenGPU NativeCodegen.cpp:1326
            0ms start(165ms) ExecutionKernel::run ExecutionKernel.cpp:125
            0ms start(165ms) fetchChunks Execute.cpp:2941
            0ms start(165ms) getQueryExecutionContext QueryMemoryDescriptor.cpp:781
            0ms start(165ms) executePlanWithGroupBy Execute.cpp:3583
              0ms start(165ms) launchGpuCode QueryExecutionContext.cpp:223
              0ms start(166ms) getRowSet QueryExecutionContext.cpp:158
                0ms start(166ms) reduceMultiDeviceResults Execute.cpp:1309
                  0ms start(166ms) reduceMultiDeviceResultSets Execute.cpp:1355
            0ms start(166ms) resultsUnion Execute.cpp:1279
      5ms start(166ms) executeRelAlgStep RelAlgExecutor.cpp:962
        5ms start(166ms) executeCompound RelAlgExecutor.cpp:2116
          5ms start(166ms) executeWorkUnit RelAlgExecutor.cpp:3388
            1ms start(166ms) compileWorkUnit NativeCodegen.cpp:2589
              0ms start(167ms) markDeadRuntimeFuncs NativeCodegen.cpp:1849
              0ms start(167ms) optimizeAndCodegenGPU NativeCodegen.cpp:1326
            0ms start(167ms) ExecutionKernel::run ExecutionKernel.cpp:125
            0ms start(167ms) fetchChunks Execute.cpp:2941
            0ms start(167ms) getQueryExecutionContext QueryMemoryDescriptor.cpp:781
            0ms start(168ms) executePlanWithoutGroupBy Execute.cpp:3362
              0ms start(168ms) launchGpuCode QueryExecutionContext.cpp:223
            0ms start(168ms) collectAllDeviceResults Execute.cpp:2333
              0ms start(168ms) reduceMultiDeviceResults Execute.cpp:1309
                0ms start(168ms) reduceMultiDeviceResultSets Execute.cpp:1355
            1ms start(169ms) compileWorkUnit NativeCodegen.cpp:2589
              0ms start(170ms) markDeadRuntimeFuncs NativeCodegen.cpp:1849
              0ms start(170ms) optimizeAndCodegenGPU NativeCodegen.cpp:1326
            0ms start(170ms) ExecutionKernel::run ExecutionKernel.cpp:125
            0ms start(170ms) fetchChunks Execute.cpp:2941
            0ms start(170ms) getQueryExecutionContext QueryMemoryDescriptor.cpp:781
            1ms start(170ms) executePlanWithGroupBy Execute.cpp:3583
              1ms start(170ms) launchGpuCode QueryExecutionContext.cpp:223
              0ms start(172ms) getRowSet QueryExecutionContext.cpp:158
                0ms start(172ms) reduceMultiDeviceResults Execute.cpp:1309
                  0ms start(172ms) reduceMultiDeviceResultSets Execute.cpp:1355
            0ms start(172ms) resultsUnion Execute.cpp:1279
      893ms start(172ms) executeRelAlgStep RelAlgExecutor.cpp:962
        893ms start(172ms) executeProject RelAlgExecutor.cpp:2166
          892ms start(173ms) executeWorkUnit RelAlgExecutor.cpp:3388
            376ms start(204ms) compileWorkUnit NativeCodegen.cpp:2589
              0ms start(205ms) getInstance HashJoin.cpp:295
                0ms start(205ms) reify PerfectJoinHashTable.cpp:354
                  0ms start(205ms) getOneColumnFragment ColumnFetcher.cpp:78
                  0ms start(205ms) getOneColumnFragment ColumnFetcher.cpp:78
                    New thread(176)
                      0ms start(0ms) initHashTableForDevice PerfectJoinHashTable.cpp:721
                        0ms start(0ms) initHashTableOnGpu PerfectHashTableBuilder.h:89
                    End thread(176)
                    New thread(177)
                      0ms start(0ms) initHashTableForDevice PerfectJoinHashTable.cpp:721
                        0ms start(0ms) initHashTableOnGpu PerfectHashTableBuilder.h:89
                    End thread(177)
              13ms start(206ms) getInstance HashJoin.cpp:295
                4ms start(206ms) synthesize_metadata InputMetadata.cpp:306
                8ms start(210ms) reify PerfectJoinHashTable.cpp:354
                  8ms start(210ms) getOneColumnFragment ColumnFetcher.cpp:78
                    7ms start(210ms) ColumnarResults ColumnarResults.cpp:64
                  0ms start(219ms) getOneColumnFragment ColumnFetcher.cpp:78
                    New thread(178)
                      0ms start(0ms) initHashTableForDevice PerfectJoinHashTable.cpp:721
                        0ms start(0ms) initHashTableOnGpu PerfectHashTableBuilder.h:89
                    End thread(178)
                    New thread(179)
                      0ms start(0ms) initHashTableForDevice PerfectJoinHashTable.cpp:721
                        0ms start(0ms) initHashTableOnGpu PerfectHashTableBuilder.h:89
                    End thread(179)
              0ms start(219ms) getInstance HashJoin.cpp:295
                0ms start(219ms) reify PerfectJoinHashTable.cpp:354
                  0ms start(219ms) getOneColumnFragment ColumnFetcher.cpp:78
                  0ms start(219ms) getOneColumnFragment ColumnFetcher.cpp:78
                    New thread(180)
                      0ms start(0ms) initHashTableForDevice PerfectJoinHashTable.cpp:721
                        0ms start(0ms) initHashTableOnGpu PerfectHashTableBuilder.h:89
                    End thread(180)
                    New thread(181)
                      0ms start(0ms) initHashTableForDevice PerfectJoinHashTable.cpp:721
                        0ms start(0ms) initHashTableOnGpu PerfectHashTableBuilder.h:89
                    End thread(181)
              0ms start(219ms) getInstance HashJoin.cpp:295
                0ms start(219ms) synthesize_metadata InputMetadata.cpp:306
                0ms start(219ms) reify PerfectJoinHashTable.cpp:354
                  0ms start(219ms) getOneColumnFragment ColumnFetcher.cpp:78
                    0ms start(219ms) ColumnarResults ColumnarResults.cpp:64
                  0ms start(220ms) getOneColumnFragment ColumnFetcher.cpp:78
                    New thread(182)
                      0ms start(0ms) initHashTableForDevice PerfectJoinHashTable.cpp:721
                        0ms start(0ms) initHashTableOnGpu PerfectHashTableBuilder.h:89
                    End thread(182)
                    New thread(183)
                      0ms start(0ms) initHashTableForDevice PerfectJoinHashTable.cpp:721
                        0ms start(0ms) initHashTableOnGpu PerfectHashTableBuilder.h:89
                    End thread(183)
              360ms start(220ms) getInstance HashJoin.cpp:295
                235ms start(220ms) synthesize_metadata InputMetadata.cpp:306
                124ms start(456ms) reify PerfectJoinHashTable.cpp:354
                  115ms start(456ms) getOneColumnFragment ColumnFetcher.cpp:78
                    107ms start(456ms) ColumnarResults ColumnarResults.cpp:64
                  6ms start(572ms) getOneColumnFragment ColumnFetcher.cpp:78
                    New thread(184)
                      0ms start(0ms) initHashTableForDevice PerfectJoinHashTable.cpp:721
                        0ms start(0ms) initHashTableOnGpu PerfectHashTableBuilder.h:89
                    End thread(184)
                    New thread(185)
                      0ms start(0ms) initHashTableForDevice PerfectJoinHashTable.cpp:721
                        0ms start(0ms) initHashTableOnGpu PerfectHashTableBuilder.h:89
                    End thread(185)
              0ms start(580ms) markDeadRuntimeFuncs NativeCodegen.cpp:1849
              0ms start(580ms) optimizeAndCodegenGPU NativeCodegen.cpp:1326
            0ms start(581ms) ExecutionKernel::run ExecutionKernel.cpp:125
            24ms start(581ms) fetchChunks Execute.cpp:2941
                New thread(88)
                  0ms start(0ms) ExecutionKernel::run ExecutionKernel.cpp:125
                  37ms start(0ms) fetchChunks Execute.cpp:2941
                  0ms start(37ms) getQueryExecutionContext QueryMemoryDescriptor.cpp:781
                  2ms start(37ms) executePlanWithoutGroupBy Execute.cpp:3362
                    2ms start(37ms) launchGpuCode QueryExecutionContext.cpp:223
                End thread(88)
            0ms start(605ms) getQueryExecutionContext QueryMemoryDescriptor.cpp:781
            2ms start(605ms) executePlanWithoutGroupBy Execute.cpp:3362
              2ms start(605ms) launchGpuCode QueryExecutionContext.cpp:223
            0ms start(622ms) collectAllDeviceResults Execute.cpp:2333
              0ms start(622ms) reduceMultiDeviceResults Execute.cpp:1309
                0ms start(622ms) reduceMultiDeviceResultSets Execute.cpp:1355
            394ms start(622ms) compileWorkUnit NativeCodegen.cpp:2589
              1ms start(623ms) getInstance HashJoin.cpp:295
                1ms start(623ms) reify PerfectJoinHashTable.cpp:354
                  0ms start(623ms) getOneColumnFragment ColumnFetcher.cpp:78
                  0ms start(623ms) getOneColumnFragment ColumnFetcher.cpp:78
                    New thread(186)
                      0ms start(0ms) initHashTableForDevice PerfectJoinHashTable.cpp:721
                        0ms start(0ms) initHashTableOnGpu PerfectHashTableBuilder.h:89
                    End thread(186)
                    New thread(187)
                      1ms start(0ms) initHashTableForDevice PerfectJoinHashTable.cpp:721
                        1ms start(0ms) initHashTableOnGpu PerfectHashTableBuilder.h:89
                    End thread(187)
              17ms start(625ms) getInstance HashJoin.cpp:295
                7ms start(625ms) synthesize_metadata InputMetadata.cpp:306
                9ms start(633ms) reify PerfectJoinHashTable.cpp:354
                  8ms start(633ms) getOneColumnFragment ColumnFetcher.cpp:78
                    8ms start(633ms) ColumnarResults ColumnarResults.cpp:64
                  0ms start(642ms) getOneColumnFragment ColumnFetcher.cpp:78
                    New thread(188)
                      0ms start(0ms) initHashTableForDevice PerfectJoinHashTable.cpp:721
                        0ms start(0ms) initHashTableOnGpu PerfectHashTableBuilder.h:89
                    End thread(188)
                    New thread(189)
                      0ms start(0ms) initHashTableForDevice PerfectJoinHashTable.cpp:721
                        0ms start(0ms) initHashTableOnGpu PerfectHashTableBuilder.h:89
                    End thread(189)
              0ms start(642ms) getInstance HashJoin.cpp:295
                0ms start(642ms) reify PerfectJoinHashTable.cpp:354
                  0ms start(643ms) getOneColumnFragment ColumnFetcher.cpp:78
                  0ms start(643ms) getOneColumnFragment ColumnFetcher.cpp:78
                    New thread(190)
                      0ms start(0ms) initHashTableForDevice PerfectJoinHashTable.cpp:721
                        0ms start(0ms) initHashTableOnGpu PerfectHashTableBuilder.h:89
                    End thread(190)
                    New thread(191)
                      0ms start(0ms) initHashTableForDevice PerfectJoinHashTable.cpp:721
                        0ms start(0ms) initHashTableOnGpu PerfectHashTableBuilder.h:89
                    End thread(191)
              0ms start(643ms) getInstance HashJoin.cpp:295
                0ms start(643ms) synthesize_metadata InputMetadata.cpp:306
                0ms start(643ms) reify PerfectJoinHashTable.cpp:354
                  0ms start(643ms) getOneColumnFragment ColumnFetcher.cpp:78
                    0ms start(643ms) ColumnarResults ColumnarResults.cpp:64
                  0ms start(643ms) getOneColumnFragment ColumnFetcher.cpp:78
                    New thread(192)
                      0ms start(0ms) initHashTableForDevice PerfectJoinHashTable.cpp:721
                        0ms start(0ms) initHashTableOnGpu PerfectHashTableBuilder.h:89
                    End thread(192)
                    New thread(193)
                      0ms start(0ms) initHashTableForDevice PerfectJoinHashTable.cpp:721
                        0ms start(0ms) initHashTableOnGpu PerfectHashTableBuilder.h:89
                    End thread(193)
              371ms start(643ms) getInstance HashJoin.cpp:295
                238ms start(643ms) synthesize_metadata InputMetadata.cpp:306
                133ms start(881ms) reify PerfectJoinHashTable.cpp:354
                  124ms start(881ms) getOneColumnFragment ColumnFetcher.cpp:78
                    111ms start(881ms) ColumnarResults ColumnarResults.cpp:64
                  7ms start(1006ms) getOneColumnFragment ColumnFetcher.cpp:78
                    New thread(194)
                      0ms start(0ms) initHashTableForDevice PerfectJoinHashTable.cpp:721
                        0ms start(0ms) initHashTableOnGpu PerfectHashTableBuilder.h:89
                    End thread(194)
                    New thread(195)
                      0ms start(0ms) initHashTableForDevice PerfectJoinHashTable.cpp:721
                        0ms start(0ms) initHashTableOnGpu PerfectHashTableBuilder.h:89
                    End thread(195)
              0ms start(1015ms) markDeadRuntimeFuncs NativeCodegen.cpp:1849
              0ms start(1016ms) optimizeAndCodegenGPU NativeCodegen.cpp:1326
            0ms start(1016ms) ExecutionKernel::run ExecutionKernel.cpp:125
              New thread(196)
                0ms start(0ms) ExecutionKernel::run ExecutionKernel.cpp:125
                16ms start(27ms) fetchChunks Execute.cpp:2941
                0ms start(44ms) getQueryExecutionContext QueryMemoryDescriptor.cpp:781
                4ms start(44ms) executePlanWithGroupBy Execute.cpp:3583
                  4ms start(44ms) launchGpuCode QueryExecutionContext.cpp:223
                  0ms start(49ms) getRowSet QueryExecutionContext.cpp:158
                    0ms start(49ms) reduceMultiDeviceResults Execute.cpp:1309
                      0ms start(49ms) reduceMultiDeviceResultSets Execute.cpp:1355
              End thread(196)
              New thread(197)
                0ms start(0ms) ExecutionKernel::run ExecutionKernel.cpp:125
                23ms start(0ms) fetchChunks Execute.cpp:2941
                0ms start(24ms) getQueryExecutionContext QueryMemoryDescriptor.cpp:781
                6ms start(24ms) executePlanWithGroupBy Execute.cpp:3583
                  6ms start(24ms) launchGpuCode QueryExecutionContext.cpp:223
                  0ms start(30ms) getRowSet QueryExecutionContext.cpp:158
                    0ms start(30ms) reduceMultiDeviceResults Execute.cpp:1309
                      0ms start(30ms) reduceMultiDeviceResultSets Execute.cpp:1355
              End thread(197)
            23ms start(1016ms) fetchChunks Execute.cpp:2941
            0ms start(1040ms) getQueryExecutionContext QueryMemoryDescriptor.cpp:781
            3ms start(1040ms) executePlanWithGroupBy Execute.cpp:3583
              3ms start(1040ms) launchGpuCode QueryExecutionContext.cpp:223
              0ms start(1043ms) getRowSet QueryExecutionContext.cpp:158
                0ms start(1043ms) reduceMultiDeviceResults Execute.cpp:1309
                  0ms start(1043ms) reduceMultiDeviceResultSets Execute.cpp:1355
            0ms start(1065ms) resultsUnion Execute.cpp:1279
      41ms start(1066ms) executeRelAlgStep RelAlgExecutor.cpp:962
        41ms start(1066ms) executeSort RelAlgExecutor.cpp:2985
          37ms start(1066ms) executeWorkUnit RelAlgExecutor.cpp:3388
            1ms start(1066ms) compileWorkUnit NativeCodegen.cpp:2589
              0ms start(1067ms) markDeadRuntimeFuncs NativeCodegen.cpp:1849
              0ms start(1067ms) optimizeAndCodegenGPU NativeCodegen.cpp:1326
            0ms start(1067ms) ExecutionKernel::run ExecutionKernel.cpp:125
            16ms start(1067ms) fetchChunks Execute.cpp:2941
              15ms start(1067ms) ColumnarResults ColumnarResults.cpp:64
            0ms start(1083ms) getQueryExecutionContext QueryMemoryDescriptor.cpp:781
            0ms start(1083ms) executePlanWithoutGroupBy Execute.cpp:3362
              0ms start(1083ms) launchGpuCode QueryExecutionContext.cpp:223
            0ms start(1084ms) collectAllDeviceResults Execute.cpp:2333
              0ms start(1084ms) reduceMultiDeviceResults Execute.cpp:1309
                0ms start(1084ms) reduceMultiDeviceResultSets Execute.cpp:1355
            1ms start(1084ms) compileWorkUnit NativeCodegen.cpp:2589
              0ms start(1085ms) markDeadRuntimeFuncs NativeCodegen.cpp:1849
              0ms start(1085ms) optimizeAndCodegenGPU NativeCodegen.cpp:1326
            0ms start(1085ms) ExecutionKernel::run ExecutionKernel.cpp:125
            15ms start(1085ms) fetchChunks Execute.cpp:2941
              14ms start(1085ms) ColumnarResults ColumnarResults.cpp:64
            0ms start(1100ms) getQueryExecutionContext QueryMemoryDescriptor.cpp:781
            2ms start(1101ms) executePlanWithGroupBy Execute.cpp:3583
              2ms start(1101ms) launchGpuCode QueryExecutionContext.cpp:223
              0ms start(1103ms) getRowSet QueryExecutionContext.cpp:158
                0ms start(1103ms) reduceMultiDeviceResults Execute.cpp:1309
                  0ms start(1103ms) reduceMultiDeviceResultSets Execute.cpp:1355
            0ms start(1103ms) resultsUnion Execute.cpp:1279
          3ms start(1103ms) sort ResultSet.cpp:768
            0ms start(1103ms) initPermutationBuffer ResultSet.cpp:846
            0ms start(1104ms) createComparator ResultSet.h:840
            3ms start(1104ms) topPermutation ResultSet.cpp:1208

Quite complex :wink:
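
One way to make a tree like this easier to read is to collapse it into "self time" per function, that is, each entry's duration minus the durations of its direct children. The following is only a rough sketch, not a HEAVY.AI tool: the function names `self_times` and `top_consumers` are made up for illustration, and it assumes the timer lines keep the `<N>ms start(<M>ms) <function> <file>:<line>` shape and indentation shown above. `SAMPLE` is a trimmed excerpt of the log above with the deeper levels omitted, so the `reify` figure still includes the time of its omitted children.

```python
import re
from collections import defaultdict

# Matches lines such as "    235ms start(220ms) synthesize_metadata InputMetadata.cpp:306".
TIMER_LINE = re.compile(r"^(\s*)(\d+)ms start\((\d+)ms\) (\S+) (\S+):(\d+)\s*$")


def self_times(log_text):
    """Return {function name: summed self time in ms} for a debug timer tree."""
    totals = defaultdict(int)
    stack = []  # open nodes as [indent, name, duration_ms, child_ms]

    def close(node):
        _, name, duration, child_ms = node
        # Children running on other threads can overlap, so clamp at zero.
        totals[name] += max(0, duration - child_ms)

    for line in log_text.splitlines():
        match = TIMER_LINE.match(line)
        if match is None:
            continue  # header, "New thread"/"End thread" markers, blank lines
        indent = len(match.group(1))
        duration = int(match.group(2))
        name = match.group(4)
        # Entries at the same or deeper indent are finished once this line appears.
        while stack and stack[-1][0] >= indent:
            close(stack.pop())
        if stack:
            stack[-1][3] += duration  # charge this entry to its direct parent
        stack.append([indent, name, duration, 0])
    while stack:
        close(stack.pop())
    return dict(totals)


def top_consumers(log_text, limit=5):
    """Return the `limit` largest (name, self_ms) pairs, slowest first."""
    ranked = sorted(self_times(log_text).items(), key=lambda item: item[1], reverse=True)
    return ranked[:limit]


SAMPLE = """\
376ms start(204ms) compileWorkUnit NativeCodegen.cpp:2589
  13ms start(206ms) getInstance HashJoin.cpp:295
    4ms start(206ms) synthesize_metadata InputMetadata.cpp:306
    8ms start(210ms) reify PerfectJoinHashTable.cpp:354
  360ms start(220ms) getInstance HashJoin.cpp:295
    235ms start(220ms) synthesize_metadata InputMetadata.cpp:306
    124ms start(456ms) reify PerfectJoinHashTable.cpp:354
"""

if __name__ == "__main__":
    for name, ms in top_consumers(SAMPLE):
        print(f"{ms:5d} ms  {name}")
```

Entries inside "New thread" blocks run concurrently with the main thread, so the summed figures can exceed the wall-clock total and are best read as a ranking rather than an exact breakdown.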

Regards,
Candido
