Hi @Sipy,
You can start the database with the parameter enable-debug-timer set to true.
The timings of the macro steps of the query, such as the execution on CPU/GPU, the projection, and the “order by”, will then be written to the log.
We don’t have anything that returns the timings of the individual steps of a query, such as group by or joins.
That would be difficult, given the extreme degree of parallelism in GPU execution.
I tried the query on 100GB, using 1 or 2 GPUs, and the response time is between 4000ms and 1200ms.
heavysql> select s_acctbal,s_name,n_name,p_partkey,p_mfgr,s_address,s_phone,s_comment from part,supplier,partsupp,nation,region where p_partkey = ps_partkey and s_suppkey = ps_suppkey and p_size = 15 and p_type like '%BRASS' and s_nationkey = n_nationkey and n_regionkey = r_regionkey and r_name = 'EUROPE' and ps_supplycost = (select min(ps_supplycost) from partsupp,supplier,nation,region where p_partkey = ps_partkey and s_suppkey = ps_suppkey and s_nationkey = n_nationkey and n_regionkey = r_regionkey and r_name = 'EUROPE') order by s_acctbal desc,n_name,s_name,p_partkey limit 10;
s_acctbal|s_name|n_name|p_partkey|p_mfgr|s_address|s_phone|s_comment
9999.70|Supplier#000239544|UNITED KINGDOM|6739531|Manufacturer#4|,6oexY7ft68JKJCN1SQ LUO5Xf|33-509-584-9496|ets are. blithely special accounts wake across t
9999.65|Supplier#000143654|FRANCE|2393647|Manufacturer#3|HhHCZ,RosE8He4uYvyIDqsPZe,7cZiJh1y9,|16-166-504-5864|special accounts. never dogged deposits
9999.49|Supplier#000615014|GERMANY|12365001|Manufacturer#3|4C,uOuJDig7z,4e|17-780-902-4027|pinto beans affix along th
9999.28|Supplier#000494480|ROMANIA|17244462|Manufacturer#3|hNB2qcYmnyeqrd em,o29TxzLwwrl|29-756-312-1779|ly. slyly final pains detect furiously qu
9998.87|Supplier#000826281|ROMANIA|14326252|Manufacturer#3|LkGmLSAjbjgKS5ZRepL|29-775-451-4774|slyly silent accounts affix furiously across the final, even ideas; tithe
9998.56|Supplier#000039514|UNITED KINGDOM|7539499|Manufacturer#2|fUN95rpikfiqeGo,okl,27ItSq9fMpr|33-317-828-7758|olites use about the blithely regular warhorses. carefully final instructions a
9997.89|Supplier#000319666|RUSSIA|19319665|Manufacturer#1|SZtdx3rzXnFuiVuJzKathV9|32-972-151-6038|furiously ironic pearls. furiously regular foxes along the furiously
9997.85|Supplier#000718770|ROMANIA|718769|Manufacturer#1|2173woKsCg7zpHCqhiEXFTmYYqpT4XONnRAsUryE|29-663-409-2865|posits sleep across the blithely express requests.
9997.83|Supplier#000348318|FRANCE|9348317|Manufacturer#4|AvfnvjCFAg1aNpj|16-718-707-5676|e pending requests: furiously even deposits boost furiously slyly even requests.
9997.73|Supplier#000329974|RUSSIA|4079969|Manufacturer#3|eKtIadopsE|32-268-488-7178|uickly. furiously ironic requests are slyly above the regular
10 rows returned.
Execution time: 1133 ms, Total time: 1179 ms
Could you enable the debug timer and post the log of your execution?
If the default storage directory is used, the log file is /var/lib/heavyai/log/omnisci_server.INFO
The result of my execution is:
1107ms total duration for executeRelAlgQuery
1107ms start(0ms) executeRelAlgQueryNoRetry RelAlgExecutor.cpp:516
0ms start(0ms) Query pre-execution steps RelAlgExecutor.cpp:517
1106ms start(0ms) executeRelAlgSeq RelAlgExecutor.cpp:814
4ms start(0ms) executeRelAlgStep RelAlgExecutor.cpp:962
4ms start(0ms) executeCompound RelAlgExecutor.cpp:2116
4ms start(0ms) executeWorkUnit RelAlgExecutor.cpp:3388
1ms start(0ms) compileWorkUnit NativeCodegen.cpp:2589
0ms start(2ms) markDeadRuntimeFuncs NativeCodegen.cpp:1849
0ms start(2ms) optimizeAndCodegenGPU NativeCodegen.cpp:1326
0ms start(2ms) ExecutionKernel::run ExecutionKernel.cpp:125
0ms start(2ms) fetchChunks Execute.cpp:2941
0ms start(2ms) getQueryExecutionContext QueryMemoryDescriptor.cpp:781
0ms start(2ms) executePlanWithoutGroupBy Execute.cpp:3362
0ms start(2ms) launchGpuCode QueryExecutionContext.cpp:223
0ms start(3ms) collectAllDeviceResults Execute.cpp:2333
0ms start(3ms) reduceMultiDeviceResults Execute.cpp:1309
0ms start(3ms) reduceMultiDeviceResultSets Execute.cpp:1355
1ms start(3ms) compileWorkUnit NativeCodegen.cpp:2589
0ms start(4ms) markDeadRuntimeFuncs NativeCodegen.cpp:1849
0ms start(4ms) optimizeAndCodegenGPU NativeCodegen.cpp:1326
0ms start(4ms) ExecutionKernel::run ExecutionKernel.cpp:125
0ms start(4ms) fetchChunks Execute.cpp:2941
0ms start(4ms) getQueryExecutionContext QueryMemoryDescriptor.cpp:781
0ms start(4ms) executePlanWithGroupBy Execute.cpp:3583
0ms start(4ms) launchGpuCode QueryExecutionContext.cpp:223
0ms start(5ms) getRowSet QueryExecutionContext.cpp:158
0ms start(5ms) reduceMultiDeviceResults Execute.cpp:1309
0ms start(5ms) reduceMultiDeviceResultSets Execute.cpp:1355
0ms start(5ms) resultsUnion Execute.cpp:1279
156ms start(5ms) executeRelAlgStep RelAlgExecutor.cpp:962
156ms start(5ms) executeCompound RelAlgExecutor.cpp:2116
156ms start(6ms) executeWorkUnit RelAlgExecutor.cpp:3388
2ms start(6ms) compileWorkUnit NativeCodegen.cpp:2589
0ms start(6ms) getInstance HashJoin.cpp:295
0ms start(6ms) reify PerfectJoinHashTable.cpp:354
0ms start(6ms) getOneColumnFragment ColumnFetcher.cpp:78
0ms start(6ms) getOneColumnFragment ColumnFetcher.cpp:78
New thread(170)
0ms start(0ms) initHashTableForDevice PerfectJoinHashTable.cpp:721
0ms start(0ms) initHashTableOnGpu PerfectHashTableBuilder.h:89
End thread(170)
New thread(171)
0ms start(0ms) initHashTableForDevice PerfectJoinHashTable.cpp:721
0ms start(0ms) initHashTableOnGpu PerfectHashTableBuilder.h:89
End thread(171)
0ms start(7ms) getInstance HashJoin.cpp:295
0ms start(7ms) reify PerfectJoinHashTable.cpp:354
0ms start(7ms) getOneColumnFragment ColumnFetcher.cpp:78
0ms start(7ms) getOneColumnFragment ColumnFetcher.cpp:78
New thread(172)
0ms start(0ms) initHashTableForDevice PerfectJoinHashTable.cpp:721
0ms start(0ms) initHashTableOnGpu PerfectHashTableBuilder.h:89
End thread(172)
New thread(173)
0ms start(0ms) initHashTableForDevice PerfectJoinHashTable.cpp:721
0ms start(0ms) initHashTableOnGpu PerfectHashTableBuilder.h:89
End thread(173)
0ms start(7ms) getInstance HashJoin.cpp:295
0ms start(7ms) synthesize_metadata InputMetadata.cpp:306
0ms start(7ms) reify PerfectJoinHashTable.cpp:354
0ms start(7ms) getOneColumnFragment ColumnFetcher.cpp:78
0ms start(7ms) ColumnarResults ColumnarResults.cpp:64
0ms start(7ms) getOneColumnFragment ColumnFetcher.cpp:78
New thread(174)
0ms start(0ms) initHashTableForDevice PerfectJoinHashTable.cpp:721
0ms start(0ms) initHashTableOnGpu PerfectHashTableBuilder.h:89
End thread(174)
New thread(175)
0ms start(0ms) initHashTableForDevice PerfectJoinHashTable.cpp:721
0ms start(0ms) initHashTableOnGpu PerfectHashTableBuilder.h:89
End thread(175)
0ms start(8ms) markDeadRuntimeFuncs NativeCodegen.cpp:1849
0ms start(8ms) optimizeAndCodegenGPU NativeCodegen.cpp:1326
0ms start(8ms) ExecutionKernel::run ExecutionKernel.cpp:125
0ms start(8ms) fetchChunks Execute.cpp:2941
New thread(47)
0ms start(0ms) ExecutionKernel::run ExecutionKernel.cpp:125
0ms start(0ms) fetchChunks Execute.cpp:2941
0ms start(0ms) getQueryExecutionContext QueryMemoryDescriptor.cpp:781
131ms start(0ms) executePlanWithGroupBy Execute.cpp:3583
131ms start(0ms) launchGpuCode QueryExecutionContext.cpp:223
0ms start(131ms) getRowSet QueryExecutionContext.cpp:158
0ms start(131ms) reduceMultiDeviceResults Execute.cpp:1309
0ms start(131ms) reduceMultiDeviceResultSets Execute.cpp:1355
End thread(47)
0ms start(8ms) getQueryExecutionContext QueryMemoryDescriptor.cpp:781
130ms start(8ms) executePlanWithGroupBy Execute.cpp:3583
130ms start(8ms) launchGpuCode QueryExecutionContext.cpp:223
0ms start(139ms) getRowSet QueryExecutionContext.cpp:158
0ms start(139ms) reduceMultiDeviceResults Execute.cpp:1309
0ms start(139ms) reduceMultiDeviceResultSets Execute.cpp:1355
21ms start(140ms) collectAllDeviceResults Execute.cpp:2333
21ms start(140ms) reduceMultiDeviceResults Execute.cpp:1309
21ms start(140ms) reduceMultiDeviceResultSets Execute.cpp:1355
4ms start(162ms) executeRelAlgStep RelAlgExecutor.cpp:962
4ms start(162ms) executeCompound RelAlgExecutor.cpp:2116
4ms start(162ms) executeWorkUnit RelAlgExecutor.cpp:3388
1ms start(162ms) compileWorkUnit NativeCodegen.cpp:2589
0ms start(163ms) markDeadRuntimeFuncs NativeCodegen.cpp:1849
0ms start(163ms) optimizeAndCodegenGPU NativeCodegen.cpp:1326
0ms start(163ms) ExecutionKernel::run ExecutionKernel.cpp:125
0ms start(163ms) fetchChunks Execute.cpp:2941
0ms start(163ms) getQueryExecutionContext QueryMemoryDescriptor.cpp:781
0ms start(163ms) executePlanWithoutGroupBy Execute.cpp:3362
0ms start(163ms) launchGpuCode QueryExecutionContext.cpp:223
0ms start(164ms) collectAllDeviceResults Execute.cpp:2333
0ms start(164ms) reduceMultiDeviceResults Execute.cpp:1309
0ms start(164ms) reduceMultiDeviceResultSets Execute.cpp:1355
1ms start(164ms) compileWorkUnit NativeCodegen.cpp:2589
0ms start(165ms) markDeadRuntimeFuncs NativeCodegen.cpp:1849
0ms start(165ms) optimizeAndCodegenGPU NativeCodegen.cpp:1326
0ms start(165ms) ExecutionKernel::run ExecutionKernel.cpp:125
0ms start(165ms) fetchChunks Execute.cpp:2941
0ms start(165ms) getQueryExecutionContext QueryMemoryDescriptor.cpp:781
0ms start(165ms) executePlanWithGroupBy Execute.cpp:3583
0ms start(165ms) launchGpuCode QueryExecutionContext.cpp:223
0ms start(166ms) getRowSet QueryExecutionContext.cpp:158
0ms start(166ms) reduceMultiDeviceResults Execute.cpp:1309
0ms start(166ms) reduceMultiDeviceResultSets Execute.cpp:1355
0ms start(166ms) resultsUnion Execute.cpp:1279
5ms start(166ms) executeRelAlgStep RelAlgExecutor.cpp:962
5ms start(166ms) executeCompound RelAlgExecutor.cpp:2116
5ms start(166ms) executeWorkUnit RelAlgExecutor.cpp:3388
1ms start(166ms) compileWorkUnit NativeCodegen.cpp:2589
0ms start(167ms) markDeadRuntimeFuncs NativeCodegen.cpp:1849
0ms start(167ms) optimizeAndCodegenGPU NativeCodegen.cpp:1326
0ms start(167ms) ExecutionKernel::run ExecutionKernel.cpp:125
0ms start(167ms) fetchChunks Execute.cpp:2941
0ms start(167ms) getQueryExecutionContext QueryMemoryDescriptor.cpp:781
0ms start(168ms) executePlanWithoutGroupBy Execute.cpp:3362
0ms start(168ms) launchGpuCode QueryExecutionContext.cpp:223
0ms start(168ms) collectAllDeviceResults Execute.cpp:2333
0ms start(168ms) reduceMultiDeviceResults Execute.cpp:1309
0ms start(168ms) reduceMultiDeviceResultSets Execute.cpp:1355
1ms start(169ms) compileWorkUnit NativeCodegen.cpp:2589
0ms start(170ms) markDeadRuntimeFuncs NativeCodegen.cpp:1849
0ms start(170ms) optimizeAndCodegenGPU NativeCodegen.cpp:1326
0ms start(170ms) ExecutionKernel::run ExecutionKernel.cpp:125
0ms start(170ms) fetchChunks Execute.cpp:2941
0ms start(170ms) getQueryExecutionContext QueryMemoryDescriptor.cpp:781
1ms start(170ms) executePlanWithGroupBy Execute.cpp:3583
1ms start(170ms) launchGpuCode QueryExecutionContext.cpp:223
0ms start(172ms) getRowSet QueryExecutionContext.cpp:158
0ms start(172ms) reduceMultiDeviceResults Execute.cpp:1309
0ms start(172ms) reduceMultiDeviceResultSets Execute.cpp:1355
0ms start(172ms) resultsUnion Execute.cpp:1279
893ms start(172ms) executeRelAlgStep RelAlgExecutor.cpp:962
893ms start(172ms) executeProject RelAlgExecutor.cpp:2166
892ms start(173ms) executeWorkUnit RelAlgExecutor.cpp:3388
376ms start(204ms) compileWorkUnit NativeCodegen.cpp:2589
0ms start(205ms) getInstance HashJoin.cpp:295
0ms start(205ms) reify PerfectJoinHashTable.cpp:354
0ms start(205ms) getOneColumnFragment ColumnFetcher.cpp:78
0ms start(205ms) getOneColumnFragment ColumnFetcher.cpp:78
New thread(176)
0ms start(0ms) initHashTableForDevice PerfectJoinHashTable.cpp:721
0ms start(0ms) initHashTableOnGpu PerfectHashTableBuilder.h:89
End thread(176)
New thread(177)
0ms start(0ms) initHashTableForDevice PerfectJoinHashTable.cpp:721
0ms start(0ms) initHashTableOnGpu PerfectHashTableBuilder.h:89
End thread(177)
13ms start(206ms) getInstance HashJoin.cpp:295
4ms start(206ms) synthesize_metadata InputMetadata.cpp:306
8ms start(210ms) reify PerfectJoinHashTable.cpp:354
8ms start(210ms) getOneColumnFragment ColumnFetcher.cpp:78
7ms start(210ms) ColumnarResults ColumnarResults.cpp:64
0ms start(219ms) getOneColumnFragment ColumnFetcher.cpp:78
New thread(178)
0ms start(0ms) initHashTableForDevice PerfectJoinHashTable.cpp:721
0ms start(0ms) initHashTableOnGpu PerfectHashTableBuilder.h:89
End thread(178)
New thread(179)
0ms start(0ms) initHashTableForDevice PerfectJoinHashTable.cpp:721
0ms start(0ms) initHashTableOnGpu PerfectHashTableBuilder.h:89
End thread(179)
0ms start(219ms) getInstance HashJoin.cpp:295
0ms start(219ms) reify PerfectJoinHashTable.cpp:354
0ms start(219ms) getOneColumnFragment ColumnFetcher.cpp:78
0ms start(219ms) getOneColumnFragment ColumnFetcher.cpp:78
New thread(180)
0ms start(0ms) initHashTableForDevice PerfectJoinHashTable.cpp:721
0ms start(0ms) initHashTableOnGpu PerfectHashTableBuilder.h:89
End thread(180)
New thread(181)
0ms start(0ms) initHashTableForDevice PerfectJoinHashTable.cpp:721
0ms start(0ms) initHashTableOnGpu PerfectHashTableBuilder.h:89
End thread(181)
0ms start(219ms) getInstance HashJoin.cpp:295
0ms start(219ms) synthesize_metadata InputMetadata.cpp:306
0ms start(219ms) reify PerfectJoinHashTable.cpp:354
0ms start(219ms) getOneColumnFragment ColumnFetcher.cpp:78
0ms start(219ms) ColumnarResults ColumnarResults.cpp:64
0ms start(220ms) getOneColumnFragment ColumnFetcher.cpp:78
New thread(182)
0ms start(0ms) initHashTableForDevice PerfectJoinHashTable.cpp:721
0ms start(0ms) initHashTableOnGpu PerfectHashTableBuilder.h:89
End thread(182)
New thread(183)
0ms start(0ms) initHashTableForDevice PerfectJoinHashTable.cpp:721
0ms start(0ms) initHashTableOnGpu PerfectHashTableBuilder.h:89
End thread(183)
360ms start(220ms) getInstance HashJoin.cpp:295
235ms start(220ms) synthesize_metadata InputMetadata.cpp:306
124ms start(456ms) reify PerfectJoinHashTable.cpp:354
115ms start(456ms) getOneColumnFragment ColumnFetcher.cpp:78
107ms start(456ms) ColumnarResults ColumnarResults.cpp:64
6ms start(572ms) getOneColumnFragment ColumnFetcher.cpp:78
New thread(184)
0ms start(0ms) initHashTableForDevice PerfectJoinHashTable.cpp:721
0ms start(0ms) initHashTableOnGpu PerfectHashTableBuilder.h:89
End thread(184)
New thread(185)
0ms start(0ms) initHashTableForDevice PerfectJoinHashTable.cpp:721
0ms start(0ms) initHashTableOnGpu PerfectHashTableBuilder.h:89
End thread(185)
0ms start(580ms) markDeadRuntimeFuncs NativeCodegen.cpp:1849
0ms start(580ms) optimizeAndCodegenGPU NativeCodegen.cpp:1326
0ms start(581ms) ExecutionKernel::run ExecutionKernel.cpp:125
24ms start(581ms) fetchChunks Execute.cpp:2941
New thread(88)
0ms start(0ms) ExecutionKernel::run ExecutionKernel.cpp:125
37ms start(0ms) fetchChunks Execute.cpp:2941
0ms start(37ms) getQueryExecutionContext QueryMemoryDescriptor.cpp:781
2ms start(37ms) executePlanWithoutGroupBy Execute.cpp:3362
2ms start(37ms) launchGpuCode QueryExecutionContext.cpp:223
End thread(88)
0ms start(605ms) getQueryExecutionContext QueryMemoryDescriptor.cpp:781
2ms start(605ms) executePlanWithoutGroupBy Execute.cpp:3362
2ms start(605ms) launchGpuCode QueryExecutionContext.cpp:223
0ms start(622ms) collectAllDeviceResults Execute.cpp:2333
0ms start(622ms) reduceMultiDeviceResults Execute.cpp:1309
0ms start(622ms) reduceMultiDeviceResultSets Execute.cpp:1355
394ms start(622ms) compileWorkUnit NativeCodegen.cpp:2589
1ms start(623ms) getInstance HashJoin.cpp:295
1ms start(623ms) reify PerfectJoinHashTable.cpp:354
0ms start(623ms) getOneColumnFragment ColumnFetcher.cpp:78
0ms start(623ms) getOneColumnFragment ColumnFetcher.cpp:78
New thread(186)
0ms start(0ms) initHashTableForDevice PerfectJoinHashTable.cpp:721
0ms start(0ms) initHashTableOnGpu PerfectHashTableBuilder.h:89
End thread(186)
New thread(187)
1ms start(0ms) initHashTableForDevice PerfectJoinHashTable.cpp:721
1ms start(0ms) initHashTableOnGpu PerfectHashTableBuilder.h:89
End thread(187)
17ms start(625ms) getInstance HashJoin.cpp:295
7ms start(625ms) synthesize_metadata InputMetadata.cpp:306
9ms start(633ms) reify PerfectJoinHashTable.cpp:354
8ms start(633ms) getOneColumnFragment ColumnFetcher.cpp:78
8ms start(633ms) ColumnarResults ColumnarResults.cpp:64
0ms start(642ms) getOneColumnFragment ColumnFetcher.cpp:78
New thread(188)
0ms start(0ms) initHashTableForDevice PerfectJoinHashTable.cpp:721
0ms start(0ms) initHashTableOnGpu PerfectHashTableBuilder.h:89
End thread(188)
New thread(189)
0ms start(0ms) initHashTableForDevice PerfectJoinHashTable.cpp:721
0ms start(0ms) initHashTableOnGpu PerfectHashTableBuilder.h:89
End thread(189)
0ms start(642ms) getInstance HashJoin.cpp:295
0ms start(642ms) reify PerfectJoinHashTable.cpp:354
0ms start(643ms) getOneColumnFragment ColumnFetcher.cpp:78
0ms start(643ms) getOneColumnFragment ColumnFetcher.cpp:78
New thread(190)
0ms start(0ms) initHashTableForDevice PerfectJoinHashTable.cpp:721
0ms start(0ms) initHashTableOnGpu PerfectHashTableBuilder.h:89
End thread(190)
New thread(191)
0ms start(0ms) initHashTableForDevice PerfectJoinHashTable.cpp:721
0ms start(0ms) initHashTableOnGpu PerfectHashTableBuilder.h:89
End thread(191)
0ms start(643ms) getInstance HashJoin.cpp:295
0ms start(643ms) synthesize_metadata InputMetadata.cpp:306
0ms start(643ms) reify PerfectJoinHashTable.cpp:354
0ms start(643ms) getOneColumnFragment ColumnFetcher.cpp:78
0ms start(643ms) ColumnarResults ColumnarResults.cpp:64
0ms start(643ms) getOneColumnFragment ColumnFetcher.cpp:78
New thread(192)
0ms start(0ms) initHashTableForDevice PerfectJoinHashTable.cpp:721
0ms start(0ms) initHashTableOnGpu PerfectHashTableBuilder.h:89
End thread(192)
New thread(193)
0ms start(0ms) initHashTableForDevice PerfectJoinHashTable.cpp:721
0ms start(0ms) initHashTableOnGpu PerfectHashTableBuilder.h:89
End thread(193)
371ms start(643ms) getInstance HashJoin.cpp:295
238ms start(643ms) synthesize_metadata InputMetadata.cpp:306
133ms start(881ms) reify PerfectJoinHashTable.cpp:354
124ms start(881ms) getOneColumnFragment ColumnFetcher.cpp:78
111ms start(881ms) ColumnarResults ColumnarResults.cpp:64
7ms start(1006ms) getOneColumnFragment ColumnFetcher.cpp:78
New thread(194)
0ms start(0ms) initHashTableForDevice PerfectJoinHashTable.cpp:721
0ms start(0ms) initHashTableOnGpu PerfectHashTableBuilder.h:89
End thread(194)
New thread(195)
0ms start(0ms) initHashTableForDevice PerfectJoinHashTable.cpp:721
0ms start(0ms) initHashTableOnGpu PerfectHashTableBuilder.h:89
End thread(195)
0ms start(1015ms) markDeadRuntimeFuncs NativeCodegen.cpp:1849
0ms start(1016ms) optimizeAndCodegenGPU NativeCodegen.cpp:1326
0ms start(1016ms) ExecutionKernel::run ExecutionKernel.cpp:125
New thread(196)
0ms start(0ms) ExecutionKernel::run ExecutionKernel.cpp:125
16ms start(27ms) fetchChunks Execute.cpp:2941
0ms start(44ms) getQueryExecutionContext QueryMemoryDescriptor.cpp:781
4ms start(44ms) executePlanWithGroupBy Execute.cpp:3583
4ms start(44ms) launchGpuCode QueryExecutionContext.cpp:223
0ms start(49ms) getRowSet QueryExecutionContext.cpp:158
0ms start(49ms) reduceMultiDeviceResults Execute.cpp:1309
0ms start(49ms) reduceMultiDeviceResultSets Execute.cpp:1355
End thread(196)
New thread(197)
0ms start(0ms) ExecutionKernel::run ExecutionKernel.cpp:125
23ms start(0ms) fetchChunks Execute.cpp:2941
0ms start(24ms) getQueryExecutionContext QueryMemoryDescriptor.cpp:781
6ms start(24ms) executePlanWithGroupBy Execute.cpp:3583
6ms start(24ms) launchGpuCode QueryExecutionContext.cpp:223
0ms start(30ms) getRowSet QueryExecutionContext.cpp:158
0ms start(30ms) reduceMultiDeviceResults Execute.cpp:1309
0ms start(30ms) reduceMultiDeviceResultSets Execute.cpp:1355
End thread(197)
23ms start(1016ms) fetchChunks Execute.cpp:2941
0ms start(1040ms) getQueryExecutionContext QueryMemoryDescriptor.cpp:781
3ms start(1040ms) executePlanWithGroupBy Execute.cpp:3583
3ms start(1040ms) launchGpuCode QueryExecutionContext.cpp:223
0ms start(1043ms) getRowSet QueryExecutionContext.cpp:158
0ms start(1043ms) reduceMultiDeviceResults Execute.cpp:1309
0ms start(1043ms) reduceMultiDeviceResultSets Execute.cpp:1355
0ms start(1065ms) resultsUnion Execute.cpp:1279
41ms start(1066ms) executeRelAlgStep RelAlgExecutor.cpp:962
41ms start(1066ms) executeSort RelAlgExecutor.cpp:2985
37ms start(1066ms) executeWorkUnit RelAlgExecutor.cpp:3388
1ms start(1066ms) compileWorkUnit NativeCodegen.cpp:2589
0ms start(1067ms) markDeadRuntimeFuncs NativeCodegen.cpp:1849
0ms start(1067ms) optimizeAndCodegenGPU NativeCodegen.cpp:1326
0ms start(1067ms) ExecutionKernel::run ExecutionKernel.cpp:125
16ms start(1067ms) fetchChunks Execute.cpp:2941
15ms start(1067ms) ColumnarResults ColumnarResults.cpp:64
0ms start(1083ms) getQueryExecutionContext QueryMemoryDescriptor.cpp:781
0ms start(1083ms) executePlanWithoutGroupBy Execute.cpp:3362
0ms start(1083ms) launchGpuCode QueryExecutionContext.cpp:223
0ms start(1084ms) collectAllDeviceResults Execute.cpp:2333
0ms start(1084ms) reduceMultiDeviceResults Execute.cpp:1309
0ms start(1084ms) reduceMultiDeviceResultSets Execute.cpp:1355
1ms start(1084ms) compileWorkUnit NativeCodegen.cpp:2589
0ms start(1085ms) markDeadRuntimeFuncs NativeCodegen.cpp:1849
0ms start(1085ms) optimizeAndCodegenGPU NativeCodegen.cpp:1326
0ms start(1085ms) ExecutionKernel::run ExecutionKernel.cpp:125
15ms start(1085ms) fetchChunks Execute.cpp:2941
14ms start(1085ms) ColumnarResults ColumnarResults.cpp:64
0ms start(1100ms) getQueryExecutionContext QueryMemoryDescriptor.cpp:781
2ms start(1101ms) executePlanWithGroupBy Execute.cpp:3583
2ms start(1101ms) launchGpuCode QueryExecutionContext.cpp:223
0ms start(1103ms) getRowSet QueryExecutionContext.cpp:158
0ms start(1103ms) reduceMultiDeviceResults Execute.cpp:1309
0ms start(1103ms) reduceMultiDeviceResultSets Execute.cpp:1355
0ms start(1103ms) resultsUnion Execute.cpp:1279
3ms start(1103ms) sort ResultSet.cpp:768
0ms start(1103ms) initPermutationBuffer ResultSet.cpp:846
0ms start(1104ms) createComparator ResultSet.h:840
3ms start(1104ms) topPermutation ResultSet.cpp:1208
It is quite complex.
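To make a timer log like this easier to read, a small script can pull out the slowest entries. This is only a minimal sketch in Python, assuming every timer line has the form `<N>ms start(<M>ms) <step> <file>:<line>` as in the log above. The names `parse_timer_lines` and `slowest_steps` are hypothetical helpers, not part of HeavyDB.

```python
import re

# Matches timer lines such as:
#   131ms start(0ms) launchGpuCode QueryExecutionContext.cpp:223
# The layout is assumed from the log in this post, not from a documented format.
TIMER_LINE = re.compile(r"^\s*(\d+)ms\s+start\((\d+)ms\)\s+(\S+)\s+(\S+):(\d+)\s*$")


def parse_timer_lines(lines):
    """Yield (duration_ms, start_ms, step, source_file) for each timer line.

    Lines that do not match (for example "New thread(170)") are skipped.
    """
    for line in lines:
        m = TIMER_LINE.match(line)
        if m:
            yield int(m.group(1)), int(m.group(2)), m.group(3), m.group(4)


def slowest_steps(lines, top=5):
    """Return the `top` timer entries, longest duration first."""
    entries = list(parse_timer_lines(lines))
    return sorted(entries, key=lambda e: e[0], reverse=True)[:top]


# A few lines copied from the log above.
sample = """\
893ms start(172ms) executeProject RelAlgExecutor.cpp:2166
376ms start(204ms) compileWorkUnit NativeCodegen.cpp:2589
New thread(176)
235ms start(220ms) synthesize_metadata InputMetadata.cpp:306
2ms start(605ms) launchGpuCode QueryExecutionContext.cpp:223
""".splitlines()

for duration, start, step, src in slowest_steps(sample, top=3):
    print(f"{duration:>5}ms  start={start}ms  {step}  ({src})")
```

Keep in mind that the timers are nested, so a parent entry such as executeProject already includes the time of the steps it calls. The durations should be compared, not summed.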
Regards,
Candido