Setup a TPC-H benchmark

Hey, im trying to do a TPC-H test, but im lost, can you please provide me a tutorial on it ?

Do you mean DDLs and queries?

Which size you are going to try and how manu gpus are you going to use?

I have ddl and queries.
I will test a 20Gb benchmark with a single card.
But i need help with how to run these queries. I need each query to run separately, but in the same test.
that test, i will repeat for 10, and after every test, i need to reset cache, neednt i ?
Thanks for your help :slight_smile:

We have something to run queries against an omniscidb and get results in various formats, but you have to install our python driver

You can get everything in our GitHub repo at this link

I don’t remember right now if it’s present on OS edition.

edit: just checked and that part it’s included into binary distribution, but you can download without issue from the github repository


Hello again, i have prepared all data throught the web application, installed all requirements, can you please help me with this benchmark , Im still im getting following error:

benchmark command:
sudo python3 -u admin -p HyperInteractive -S localhost -t supplier -l testovani -d /home/dominik/Desktop/test -i 2 -n omnisci

And another question, when i have 22 queries, how can i specify all tables here, because from i saw, i can only specify 1 table here.


I tried the with pymapd, and it worked as expected; I see you used the -S parameter, a target database that stores the results, so the error.

this way is working, and it is outputting the resulting JSON to standard output; if you prefer, you can have a file in output changing the -e switch

python -t mytable  -l "first run" -i 10 -u admin -p HyperInteractive -s localhost -n tpch1000 -d queries/ -e output

the -t switch is used when you want to run the queries against different tables, so as an example you want to query two different tables one with a billion rows and another with ten billion, you can write the query is this way

FROM ###TAB###

the will substitute the ###TAB### with the name you specify using -t switch

but with the TPCH benchmark this isn’t useful because there are multiple tables, so its better to create more databases with a different scaling factor, and change the database name when you connect to Heavyai.

Thank you so much guys,
I have the last question, in the following results, query_exec_first is always large, but query_exec_avg is much smaller, so isn’t the query_exec_first value more accurate in measuring how long it took to process the query?

“query_id”: “query_1”,
“query_result_set_count”: 4,
“query_error_info”: “”,
“query_conn_first”: 22.8,
“query_conn_avg”: 1.7,
“query_conn_min”: 1.1,
“query_conn_max”: 2.6,
“query_conn_85”: 2.2,
“query_exec_first”: 39270,
“query_exec_avg”: 681.6,
“query_exec_min”: 645,
“query_exec_max”: 944,
“query_exec_85”: 652.6,
“query_exec_25”: 646.0,
“query_exec_stdd”: 92.8,
“query_exec_trimmed_avg”: 649,
“query_exec_trimmed_max”: 653,
“query_render_first”: null,
“query_render_avg”: null,
“query_render_min”: null,
“query_render_max”: null,
“query_render_85”: null,
“query_render_25”: null,
“query_render_stdd”: null,
“query_total_first”: 39292.8,
“query_total_avg”: 683.3,
“query_total_min”: 646.3,
“query_total_max”: 945.1,
“query_total_85”: 653.9,
“query_total_all”: 45450.0,
“query_total_trimmed_avg”: 651.2,
“results_iter_count”: 10,
“results_iter_first”: 0.0,
“results_iter_avg”: 0.0,
“results_iter_min”: 0.0,
“results_iter_max”: 0.0,
“results_iter_85”: 0.0,
“cpu_mem_usage_mb”: 1487.4,
“gpu_mem_usage_mb”: 1487.4
“debug”: {
“query_exec_times”: [
“query_total_times”: [
“detailed_timing_last_iteration”: {}


The first run of a kind of a query will be slower than subsequent.

The first is that the engine could be forced to read the data from DISK to CPU and GPU memory, and this takes time depending on the size of data needed to move.

The second is that the query plan has to be generated from scratch, which can take at least 100ms to more depending on the complexity of the query. Still, subsequent queries with different literals will reuse that plan so that the queries will run faster than the first one.

As an example, this simple query runs against a small table containing 120MM rows on a notebook using an entry-level GPU

select extract(year from arr_timestamp),avg(arrdelay) from flights_sk_8_120m where distance<10 group by 1;

Looking at the logs, we get that the database had to create a plan to run the query
130ms start(0ms) compileWorkUnit NativeCodegen.cpp:2575

and has to read data from disk to populate the CPU and GPU caches
371ms start(130ms) fetchChunks Execute.cpp:2572

the rest of the runtime operations tooks another 9ms
9ms start(502ms) executePlanWithGroupBy Execute.cpp:3214
4ms start(502ms) launchGpuCode QueryExecutionContext.cpp:221
4ms start(507ms) getRowSet QueryExecutionContext.cpp:156

If I will run the query another time, changing the literal of the filter

select extract(year from arr_timestamp),avg(arrdelay) from flights_sk_8_120m where distance<11 group by 1;

the total runtime falls to 7 ms the engine had a valid plan to run the query, and the data was in the GPU’s RAM

0ms start(2ms) fetchChunks Execute.cpp:2572
0ms start(2ms) getQueryExecutionContext QueryMemoryDescriptor.cpp:775
  6ms start(2ms) executePlanWithGroupBy Execute.cpp:3214   
    6ms start(2ms) launchGpuCode QueryExecutionContext.cpp:221

This is the reason why in the benchmark, the results of the first run aren’t used to calculate the average, so we think the subsequent runs are more representative of the performance.

Best regards,

1 Like