Omni server Fatal Error

I am very frequently getting the following error on my system:

[sumit@util1 mapd_log]$ cat omnisci_server.FATAL.20200213-154433.log
2020-02-13T15:44:33.050904 F 78659 Calcite.cpp:509 Error occurred trying to communicate with Calcite server, the error was: ‘THRIFT_EAGAIN (timed out)’, omnisci_server restart will be required

Hi @Sumit
I have been able to reproduce the error in a scenario with extremely high concurrency and table metadata that had not been loaded yet.

I had to spawn 100 concurrent processes running a query against a table that hadn't received any query since the database started, so the Calcite process had to read the table's metadata from disk (I use extremely fast storage, so it's unlikely that disk performance is causing the problem).
Running the same 100 processes against a table that had been queried before (with a different query, so the new query still had to be parsed by Calcite) ran flawlessly. Assuming that concurrent metadata reads are causing your problem, you could try running some simple warmup queries at server startup to force caching of the tables' metadata.
You can set the db-query-list parameter to point to a file containing queries that will be run at server startup; a simple select count(*) from table would force the read of that table's metadata.
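
To illustrate the warmup idea, here is a minimal Python sketch that generates such a query file with one count(*) per table. The helper name build_warmup_file, the user, database, and table names are all hypothetical, and the USER ... { ... } wrapper is my recollection of the query list file format, so please check it against the documentation for your version before relying on it.

```python
def build_warmup_file(user, database, tables):
    """Return the text of a warmup query list with one COUNT(*) per table.

    Assumption: the file wraps the queries in a `USER <user> <database> { ... }`
    block. Verify this layout against the documentation for your server version.
    """
    lines = [f"USER {user} {database} {{"]
    # One cheap query per table is enough to force its metadata to be read.
    lines += [f"SELECT COUNT(*) FROM {table};" for table in tables]
    lines.append("}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical user, database, and table names; replace with your own.
    print(build_warmup_file("admin", "omnisci", ["flights", "tweets"]), end="")
```

Save the printed text to a file and point db-query-list at that file, so the queries run once at server startup before the concurrent load arrives.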

Hope this helps.

In v5.1.2 we introduced a new command-line parameter, calcite-service-timeout, which allows you to increase the timeout.
If you have high concurrency or lots of views (in your case, views with long metadata reads on the underlying table), increasing the timeout will prevent these EAGAIN error messages.
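
As a rough illustration, the timeout could be raised in the server configuration file along these lines. The value 30000 is an arbitrary example, and I am assuming the unit is milliseconds; please confirm both the unit and a sensible value in the documentation for your version.

```
# omnisci.conf (sketch; the value is an arbitrary example, assumed to be in milliseconds)
calcite-service-timeout = 30000
```

The server needs to be restarted for a changed configuration value to take effect.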
