Calculating the percentile


Simple problem really. I’m trying to calculate a set of percentiles (e.g. 50, 68, 90) on groups within a table. I’ve noticed the percentile_cont function is a reserved word in the documents, but the function is not currently implemented. I’ve attempted to go down the path of defining the function myself using the UDF extension to no avail and after looking online for several hours I’ve almost given up.

As an example using the nyc_tree_2015_683k dataset, I would like to be able to do the following (or something as close to it).

    percentile(tree_dbh, 0.50), 
    percentile(tree_dbh, 0.68), 
    percentile(tree_dbh, 0.90) 
from nyc_tree_2015_683k 
group by zipcode;

I should also mention that I have tried executing the query in a Jupyter notebook using ibis, pymapd and rbc.

*edited to align with the following post

For interest, below is the UDF I define and pass in the omnisci.config file, which is not even working now as running ./startomnisci --config omnisci.config just quits not long after starting.

    #include <cstdint>
    #include <math.h>
    #include <limits>
    #include <type_traits>
    #if defined(clang) && defined(CUDA) && defined(CUDA_ARCH) #define DEVICE device #else #define DEVICE #endif
    #if defined(clang) && defined(CUDA) && defined(CUDA_ARCH) #define NEVER_INLINE #else #define NEVER_INLINE attribute((noinline)) #endif
    #if defined(clang) && defined(CUDA) && defined(CUDA_ARCH) #define ALWAYS_INLINE #else #define ALWAYS_INLINE attribute((always_inline)) #endif

    double percentile(const Array<double> arr, double q) {
        double perc, rank, crank, frank;
        int32_t length, ridx, cidx, fidx;
        length = arr.getSize();
        rank = 1. + *static_cast<double>(length-1)*q;
        crank = ceil(rank);
        frank = floor(rank);
        ridx = *static_cast<int32_t>(rank);
        cidx = *static_cast<int32_t>(crank);
        fidx = *static_cast<int32_t>(frank)
        if (crank == frank) & (frank == rank) {
            perc = arr[ridx];
        } else {
            perc = (crank-rank)*arr[frank] + (rank-frank)*arr[crank];
        return perc;

Hi @atomslinger,

Welcome to the community forum :wink:

the only window’s functions avaiable right noew are listed here
and as younoticed the perrcentile_cont or percentile_rank aren’t implemented yet so the error.

I don’t think your UDF would work because you are assuming that you gettint into an array the distinct and order values of the tree_dbh values within the groups.

Trying to give you an alternative using sql only

Hi @candido.dessanti

I just tried implementing this solution from stackoverflow (the code which I derived my UDF from):

However, does omniscidb currently support functions? My attempt at creating even simple ones appears to fail:

RETURNS integer AS $total$  
    total integer;  
   SELECT count(tree_dbh) into total FROM nyc_tree_2015_683k;  
   RETURN total;  
$total$ LANGUAGE omnisql;


nope, currently only UDF functions written in C Language are supported. The problem is that it’s likely you have to use windows functions, that runs on CPU so performance is going to be not so good