It is possible that a dictionary can be more significant than the underlying table because, in the table, it’s stored only the id that oìpoint to the string in the dictionary.
So if you have a table with two integers and one dictionary encoded string (32 bits as an example), each row will table 12 bytes, and with 1B, records will allocate 12GB of disk space. Assuming a high cardinality of 250M differents trying with an average length of 80 bytes, the dictionary will use 20GB.
Anyway, if you are doing a lot of CTAS, the dictionaries are shared between source and newly created tables.
To know which tables belong to a dictionary, you have to query the catalog, so connect to the internal catalog
the “easy” way is to connect to the catalog and query for the198 and 202 as id of the dictionaries
candido@zion-legion:/mapd_storage/data_test/mapd_catalogs$ sqlite3 omnisci
SQLite version 3.35.5 2021-04-19 18:32:05
Enter ".help" for usage hints.
sqlite> select * from mapd_dictionaries where dictid in(198,202);
198|orders_O_COMMENT_dict198|32|0|1|1
202|orders_ns_O_COMMENT_dict202|32|0|1|1
so from the internal name we know that the tables are called orders and orders_ns and the column is o_comment for both.
How i can CTAS as instruct for not share the Dictionary… i am expecting by doing this activity may be Dictionary size will reduce. will this activity has any other adverse impact ?
If you want to create the new table with a new dictionary, instead of sharing the old one between the two tables, you can use the use_shared_dictionaries property, this way.
CREATE now_table AS
SELECT ...
FROM old_table
WITH (use_shared_dictionaries=false);
Generally it’s preferable to use shared dictionaries to save space and memory, but it depends by the data and how the tables are populated after the CTAS.