As a test, I am trying to load streaming data into a table that I created with max_rows set to 1000. When I look at the row count for the table during the streaming period, it fluctuates: 925, 800, 500, 925, 62, 900, and so on.
Shouldn't the number of rows stay around 1000, with any new data deleting that many rows from the beginning? I do have a timestamp column in the data. Can anyone help me understand how max_rows works when creating a table? It looks like all the rows are deleted and then new ones are inserted.
The max_rows parameter should be understood as the maximum number of rows the table can hold. The system isn't deleting individual rows; when the limit is reached it drops the oldest fragment (similar to truncating a partition), so the number of rows you observe at any given moment will typically be less than the max_rows setting.
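To see why the row count fluctuates rather than holding steady at max_rows, here is a minimal sketch of fragment-based eviction. The fragment_size of 250 is a made-up value for illustration (real deployments use much larger fragments, which is why the observed count can drop so sharply); this is a toy model, not the engine's actual implementation.

```python
# Toy model of fragment-based eviction: rows accumulate in fixed-size
# fragments, and when the total exceeds max_rows the OLDEST whole
# fragment is dropped. Individual rows are never deleted.
from collections import deque

def stream_rows(n_rows, max_rows=1000, fragment_size=250):
    fragments = deque()          # each entry = row count in one fragment
    counts = []                  # observed table size after each insert
    total = 0
    for _ in range(n_rows):
        if not fragments or fragments[-1] == fragment_size:
            fragments.append(0)  # start a new fragment
        fragments[-1] += 1
        total += 1
        if total > max_rows:     # limit hit: drop the oldest fragment
            total -= fragments.popleft()
        counts.append(total)
    return counts

counts = stream_rows(2000)
# After the table first fills, the count oscillates between
# max_rows and max_rows minus (fragment_size - 1) rows dropped.
print(min(counts[1000:]), max(counts))  # → 751 1000
```

With a much larger fragment size relative to max_rows (as in your 1000-row test), dropping "the oldest fragment" can mean dropping nearly everything at once, which matches the 925, 62, 900 pattern you saw.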
This is what the docs say about the parameter:
Used primarily for streaming datasets to limit the number of rows in a table. When the max_rows limit is reached, the oldest fragment is removed. When populating a table from a file, make sure that your row count is below the max_rows setting. If you attempt to load more rows at one time than the max_rows setting defines, the records up to the max_rows limit are removed, leaving only the additional rows. Default = 2^62.
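For reference, both max_rows and the fragment size are set in the WITH clause at table creation. This is a sketch assuming HeavyDB/OmniSci-style DDL; the column list and values are placeholders, and a smaller fragment_size makes eviction finer-grained (fewer rows dropped at a time) at the cost of more fragments:

```sql
-- Hypothetical streaming table: cap at 1000 rows, evict in
-- 100-row fragments so each drop removes at most 100 rows.
CREATE TABLE sensor_stream (
  ts TIMESTAMP,
  device_id TEXT,
  reading DOUBLE
) WITH (max_rows = 1000, fragment_size = 100);
```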
You can use the timestamp column to filter your data, combined with a larger max_rows setting, to be sure that all records are loaded.
Can I ask why you are using just 1000 rows?
I was testing with a small number of rows, but I understand the point now. Thank you!