Example: Providing a C Stored Function Wrapper for an Existing C Function

Straightforward Approach

Suppose we have a function compute_predictor that takes in two arrays of values, x and y, and uses these to predict the next value of x. The actual implementation of this model is irrelevant to this discussion; we assume that such a model already exists. The model can be called as follows:

typedef struct {
  //date
  unsigned day;
  unsigned month;
  unsigned year;
  unsigned week_day;
  unsigned day_of_month;
} date;

double *compute_predictor (date *dates, double *x_values, double *y_values,
                           int size) {
  ...;
}

The function takes an array of dates and two arrays of values corresponding to the dates, all of which are of length size. Here, x_values[0] is the value of a time series X on the date dates[0].
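
To make the calling convention concrete, here is a minimal sketch of compute_predictor. The body is a stand-in, not the real model: as an assumption, it borrows the formula used in the cfunc_instance_init code later in this example, and it assumes that the caller releases the returned array with delete[].

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Same layout as the date struct shown above.
typedef struct {
  unsigned day;
  unsigned month;
  unsigned year;
  unsigned week_day;
  unsigned day_of_month;
} date;

// Stand-in model body (an assumption, not the real model).
// Returns a newly allocated array of length size; the caller owns it
// and must release it with delete[].
double *compute_predictor (date *dates, double *x_values, double *y_values,
                           int size) {
  (void) dates;  // the stand-in formula does not look at the dates
  assert (x_values != NULL && y_values != NULL);
  if (size <= 0)
    return NULL;

  double *result = new double[size];
  // there is no previous value on the first date, so the result is a NaN
  result[0] = std::nan ("");
  for (int i = 1; i < size; i++) {
    double x_change = (x_values[i] - x_values[i - 1]) / x_values[i - 1];
    double y_change = (y_values[i] - y_values[i - 1]) / y_values[i - 1];
    result[i] = x_values[i] + 0.5 * x_values[i] * (x_change + y_change);
  }
  return result;
}
```

With this stand-in, the inputs x = {100, 110, 121} and y = {50, 55, 55} produce a NaN for the first date, followed by 121 and 127.05.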

Now let us demonstrate the straightforward way of converting this function into a C stored function that the framework can work with. This method does not require any modifications to the function that implements the model. In the next section, we will describe a more efficient way to do this, which does involve rewriting the model.

Let predictor be the name for the C stored function we want to write. The first thing we need to do is specify the input parameters for the C stored function, which in this case are going to be two security parameters. Therefore, we can define the cfunc_instance_init_defaults function as follows:

#define PRED_STOCK1 "stock1"
#define PRED_STOCK2 "stock2"

extern "C" int cfunc_instance_init_defaults (xmim_cfunc_args *args) {
  int num = 2;
  //security args
  xmim_cfunc_security_args *sec = new xmim_cfunc_security_args;
  sec->size = num;
  sec->names = new char*[num];
  sec->names[0] = PRED_STOCK1;
  sec->names[1] = PRED_STOCK2;
  sec->rels = new char*[num];
  sec->rels[0] = NULL;
  sec->rels[1] = NULL;

  args->security_args = sec;

  //define the state here

  return 1;
}

A state is specific to a particular C stored function and is defined in the cfunc_instance_init_defaults function. A state consists of all objects that need to persist between invocations of different functions, so that a function can retrieve an object it is interested in and use it for its own purposes.

Now let us define the state which our C stored function needs to keep. We need to store one of the time series that we will create based on the input stock symbols so that we are able to define the trading pattern and the trading range for predictor later. Also, in the cfunc_instance_init function we will compute the result values and "remember" them throughout our C stored function's lifetime.

Taking this into account, the cfunc_instance_init_defaults function looks as follows:

#define PRED_STOCK1 "stock1"
#define PRED_STOCK2 "stock2"
#define PRED_TIME_SERIES "time_series"
#define PRED_RESULT "result"

#define DEFAULT_COLUMN "Asks"

extern "C" int cfunc_instance_init_defaults (xmim_cfunc_args *args) {
  int num = 2;
  //security args
  xmim_cfunc_security_args *sec = new xmim_cfunc_security_args;
  sec->size = num;
  sec->names = new char*[num];
  sec->names[0] = PRED_STOCK1;
  sec->names[1] = PRED_STOCK2;
  sec->rels = new char*[num];
  sec->rels[0] = NULL;
  sec->rels[1] = NULL;

  args->security_args = sec;

  xmim_cfunc_state *state = new xmim_cfunc_state;
  num = 2;
  state->size = num;
  state->ptr = new xmim_cfunc_obj_ptr[num];
  for (int i = 0; i < num; i++)
    state->ptr[i].obj = NULL;

  state->names = new char*[num];
  state->names[0] = PRED_TIME_SERIES;
  state->names[1] = PRED_RESULT;

  args->state = state;

  return 1;
}
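
To illustrate what the names and ptr arrays are for, here is a small mock of a name-keyed state. All of the mock_ names below are hypothetical and exist only for this sketch. In real code the lookup is done for you by xmim_cfunc_get_arg and xmim_cfunc_set_arg, whose internals may differ from what is shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-ins for xmim_cfunc_obj_ptr and xmim_cfunc_state.
struct mock_obj_ptr {
  void *obj;
};

struct mock_state {
  int size;             // number of slots
  const char **names;   // names[i] is the name of slot i
  mock_obj_ptr *ptr;    // ptr[i].obj is the object stored in slot i
};

// Returns the index of the slot called name, or -1 if there is none.
int mock_state_find (const mock_state *state, const char *name) {
  assert (state != NULL && name != NULL);
  for (int i = 0; i < state->size; i++)
    if (std::strcmp (state->names[i], name) == 0)
      return i;
  return -1;
}

// Stores obj in the named slot; returns 1 on success and 0 if the name is unknown.
int mock_state_set (mock_state *state, const char *name, void *obj) {
  int i = mock_state_find (state, name);
  if (i < 0)
    return 0;
  state->ptr[i].obj = obj;
  return 1;
}

// Returns the object in the named slot, or NULL if the slot is empty or unknown.
void *mock_state_get (const mock_state *state, const char *name) {
  int i = mock_state_find (state, name);
  return (i < 0) ? NULL : state->ptr[i].obj;
}
```

Every slot starts out as NULL, which is why a NULL returned for a state object does not necessarily indicate a failure.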

In the cfunc_instance_init function, a C stored function can access the input parameters with which it was called from a user's query. For example, suppose our predictor is invoked via the MIM query:

SHOW
    predictor(EA.EPWVQ304, EA.EPWVQ404)
WHEN
    Date is after 2002 
AND     
    Date is Wednesday

Then, to get the first input stock symbol, we would use the xmim_cfunc_get_arg function:

  char *symbol = NULL;
  symbol = (char *)xmim_cfunc_get_arg (args, P_SECURITY, PRED_STOCK1);
  if (!symbol)
    return 0;

The first argument to xmim_cfunc_get_arg is an xmim_cfunc_args structure, which is populated with the input parameters after the call to cfunc_instance_init_defaults. This means that the input parameters are inaccessible only in the cfunc_instance_init_defaults function; they are accessible in all other functions of a C stored function. In general, we suggest that cfunc_instance_init_defaults be used only to specify the input parameters for the C stored function and its state. Nothing else should happen there, since the xmim_cfunc_args structure is empty at that point.

The second argument is the type of the input parameter we want to get back. All possible types are defined in:

  //arg types for a cfunc
  typedef enum {
    P_INT_CONSTANT,
    P_DBL_CONSTANT,
    P_CATEGORY,
    P_STRING,
    P_ATTR,
    P_TIME_OFFSET,
    P_TIME_PERIOD,
    P_SECURITY,
    //cfunc specific state
    P_STATE
  } xmim_cfunc_arg_type;

in the xmim_cfunc_api.h file. The third argument to xmim_cfunc_get_arg is the name of the input parameter as we defined it in the cfunc_instance_init_defaults function.

The last two lines of the code snippet above handle the exceptional condition in which the returned symbol is NULL. xmim_cfunc_get_arg returns NULL in two cases: when it fails to retrieve the named parameter, or when the parameter value itself is NULL. In the first case args->error is set to indicate the failure, while in the second case no error is set. So, if NULL can be a valid value for a parameter (xmim_cfunc_get_arg treats the state analogously to input parameters, and NULL can be a valid state value before we initialize the state), then we need to check whether args->error is set (in which case xmim_cfunc_get_arg has failed) or is NULL (in which case the parameter value is NULL).
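
The check described above can be sketched as follows. Since the framework headers are not reproduced here, the sketch uses hypothetical stand-ins: mock_args, mock_get_arg and get_state_or_fail are invented for illustration, and the error field is assumed to be a pointer that is NULL when no error has been recorded.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for xmim_cfunc_args, reduced to what the check needs.
struct mock_args {
  const char *error;       // NULL when no error has been recorded (assumption)
  const char *state_name;  // the single state slot this mock knows about
  void *state_value;       // may legitimately be NULL before initialization
};

// Mimics the behavior described above: an unknown name returns NULL and sets
// the error; a known name returns the stored value, which may itself be NULL.
void *mock_get_arg (mock_args *args, const char *name) {
  assert (args != NULL && name != NULL);
  if (std::strcmp (name, args->state_name) != 0) {
    args->error = "no such parameter";
    return NULL;
  }
  return args->state_value;
}

// Returns 0 only on a real failure. On success *out receives the value,
// which may be NULL if the state has not been initialized yet.
int get_state_or_fail (mock_args *args, const char *name, void **out) {
  *out = mock_get_arg (args, name);
  if (!*out && args->error)
    return 0;  // the lookup failed; the error message is in args->error
  return 1;    // a valid value, possibly NULL
}
```

A caller that treats every NULL as a failure would wrongly report an error the first time it reads an uninitialized state slot; checking the error field as well avoids that.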

Returning a 0 from any of the C stored function's functions signals that an error has occurred during the function's execution. The framework will then raise an error with the args->error error message.

Now let us get back to what we want to do in the cfunc_instance_init function. Given the input stock symbols, we are going to create the corresponding time series. The compute_predictor function takes three arrays as its input: an array of dates and arrays of values on those dates for the two time series. This is what we could store in our state. However, it is more convenient for us to precompute the result exactly the way we compute it in the compute_predictor function and then store it in our state. We therefore define the pred_result structure:

  typedef struct {
    double *values;
    xmim_cfunc_date_time **dates;
    int size;
  } pred_result;

which stores both the dates array and the result values on the corresponding dates.

In general, the purpose of the cfunc_instance_init function is to initialize the state. It is executed before the first call to the cfunc_instance_get_value, so all the initialization should be done there.

Our cfunc_instance_init will initialize the state as follows:

extern "C" int cfunc_instance_init (xmim_cfunc_args *args) {  
  char *relname1 = NULL, *relname2 = NULL;
  void *series1 = NULL, *series2 = NULL;
  xmim_cfunc_date_time *from_date = NULL, *to_date = NULL, *date = NULL, *old_date = NULL;
  void *offset;
  pred_result *result = NULL;
  int size = 0;

  //first input argument
  relname1 = (char *)xmim_cfunc_get_arg (args, P_SECURITY, PRED_STOCK1);
  if (!relname1)
    return 0;

  //second input argument
  relname2 = (char *)xmim_cfunc_get_arg (args, P_SECURITY, PRED_STOCK2);
  if (!relname2)
    return 0;

  //create the time series for the first stock symbol
  series1 = xmim_cfunc_get_time_series (args, relname1, DEFAULT_COLUMN);
  if (!series1)
    return 0;
  //store the first time series in the state
  if (!xmim_cfunc_set_arg (args, P_STATE, series1, PRED_TIME_SERIES))
    return 0;

  //create the time series for the second stock symbol  
  //will get deleted by the framework
  series2 = xmim_cfunc_get_time_series (args, relname2, DEFAULT_COLUMN);  
  if (!series2)
    return 0;

  // the from and to dates for the first time series
  // the date range we are going to use
  from_date = xmim_cfunc_get_from_date (args, series1);
  to_date = xmim_cfunc_get_to_date (args, series1);

  //calculate the number of values we have
  date = xmim_cfunc_get_from_date (args, series1);
  size = 1;

  while (!xmim_cfunc_date_time_equal (args, date, to_date)) {
    offset = xmim_cfunc_get_time_offset (args, 1, PL_DAYS, PL_LATER);
    old_date = date;
    date = xmim_cfunc_apply_time_offset (args, date, series1, offset);
    xmim_cfunc_free_date_time (args, old_date);
    xmim_cfunc_free_time_offset (args, offset);  
    size++;
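  }
  //the loop leaves date equal to to_date; neither is needed any more
  xmim_cfunc_free_date_time (args, date);
  xmim_cfunc_free_date_time (args, to_date);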

  //compute the result
  result = new pred_result;
  result->values = new double[size];
  result->dates = new xmim_cfunc_date_time*[size];
  result->size = size;
  
  double v1, v2, prev_v1, prev_v2;
  //the first value is a NaN since we do not have the 
  //value on the previous date
  double zero = 0;
  double *doubleNan = &zero;
  xmim_cfunc_makeNan (args, doubleNan);
  result->values[0] = *doubleNan;
  result->dates[0] = from_date;  
  prev_v1 = xmim_cfunc_get_value (args, series1, from_date);
  prev_v2 = xmim_cfunc_get_value (args, series2, from_date);
  //compute the values using the two time series
  for (int i = 1; i < size; i++) {
    offset = xmim_cfunc_get_time_offset (args, i, PL_DAYS, PL_LATER);
    date = xmim_cfunc_apply_time_offset (args, from_date, series1, offset);
    v1 = xmim_cfunc_get_value (args, series1, date);
    v2 = xmim_cfunc_get_value (args, series2, date);
    result->values[i] = v1 + 
      (0.5 * v1 * 
      (((v1 - prev_v1) / prev_v1) + 
      ((v2 - prev_v2) / prev_v2)));
        
    result->dates[i] = date;
    prev_v1 = v1;
    prev_v2 = v2;
    xmim_cfunc_free_time_offset (args, offset);  
  }

  
  //store the result in the state
  if (!xmim_cfunc_set_arg (args, P_STATE, result, PRED_RESULT))
    return 0;

  return 1;
}

xmim_cfunc_get_time_series returns a time series object corresponding to the relation and the column provided. The column is assumed to be the same for all symbols (relations) and known beforehand. A time series can be either destroyed explicitly by calling the

  void xmim_cfunc_free_time_series (xmim_cfunc_args *args, void *time_series);

function or destroyed implicitly by the framework when the C stored function instance is destroyed (note that this holds only for time series objects created by the framework; everything you create yourself, you should also destroy yourself).

When computing the values for the result array, we are using the xmim_cfunc_get_time_offset function to get the desired offset and then get a date by applying that offset to the from_date.

Then, the cfunc_instance_get_value function will retrieve the result we have computed in the cfunc_instance_init function, find a value on the requested date and return that value. It looks as follows:

extern "C" int cfunc_instance_get_value (xmim_cfunc_args *args, xmim_cfunc_date_time *dt) {
  pred_result *result = (pred_result *)xmim_cfunc_get_arg (args, P_STATE, PRED_RESULT);
  xmim_cfunc_date_time *date = NULL;

  if (!result)
    return 0;

  for (int i = 0; i < result->size; i++) {
    date = (xmim_cfunc_date_time *) result->dates[i];
    if (xmim_cfunc_date_time_equal (args, date, dt)) {
      args->result = result->values[i];
      return 1;
    }
  }
  //we do not have the requested date - return NaN.
  double zero = 0;
  double *doubleNan = &zero;
  xmim_cfunc_makeNan (args, doubleNan);
  args->result = *doubleNan;

  return 1;
}

As you may have noticed, the cfunc_instance_get_value function returns its result by setting args->result to the value it needs to return.

Then, finally in the cfunc_instance_destroy function we are going to clean up everything we have created:

extern "C" int cfunc_instance_destroy (xmim_cfunc_args *args) {
  delete[] args->security_args->rels;
  delete[] args->security_args->names;
  delete args->security_args;  
  args->security_args = NULL;
  
  //the time series will be deleted for us by the framework

  pred_result *result = (pred_result *)xmim_cfunc_get_arg (args, P_STATE, PRED_RESULT);
  if (result) {
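    //the dates were created by the framework, so release them the same
    //way cfunc_instance_init releases its temporary dates
    for (int i = 0; i < result->size; i++)
      xmim_cfunc_free_date_time (args, result->dates[i]);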

    delete[] result->dates;
    delete[] result->values;

    delete result;
  }

  return 1;
}

We have now covered the mandatory functions each C stored function is required to implement. The next two functions are optional: cfunc_instance_compute_trading_pattern and cfunc_instance_compute_trading_date_range.

They define the trading pattern (e.g., M-F, 8-5) and the trading date range (e.g., Jan 1, 1923 -- November 15, 2004) respectively and can be implemented by using one of the existing time series to get the trading pattern and the trading date range:

extern "C" int cfunc_instance_compute_trading_pattern (xmim_cfunc_args *args) {
  //PRED_TIME_SERIES is the state name defined in cfunc_instance_init_defaults
  void *time_series = xmim_cfunc_get_arg (args, P_STATE, PRED_TIME_SERIES);

  if (!time_series)
    return 0;

  xmim_cfunc_compute_trading_pattern (args, time_series);
  
  return 1;
}

extern "C" int cfunc_instance_compute_trading_date_range (xmim_cfunc_args *args) {
  //PRED_TIME_SERIES is the state name defined in cfunc_instance_init_defaults
  void *time_series = xmim_cfunc_get_arg (args, P_STATE, PRED_TIME_SERIES);

  if (!time_series)
    return 0;

  xmim_cfunc_compute_trading_date_range (args, time_series);
  
  return 1;  
}

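The following function, cfunc_instance_get_characteristic_time_series, sets args->characteristic_time_series. Judging by its names and its error message, the snippet appears to come from a VWAP C stored function rather than from predictor: VWAP_TEST_CLOSE_SERIES and vwap_set_series are not defined anywhere in the predictor code above.

extern "C" int cfunc_instance_get_characteristic_time_series (xmim_cfunc_args *args) {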
  void *close_series = xmim_cfunc_get_arg (args, P_STATE, VWAP_TEST_CLOSE_SERIES);  
  if (!close_series)
    close_series = vwap_set_series (args); 

  if (!close_series) {
    xmim_cfunc_set_error (args, "VWAP get_characteristic_time_series: failed to create the time series. Please, make sure this call is happening at the right place.");
    return 0;
  }

  args->characteristic_time_series = close_series;
   
  return 1;
}

Complete code for this example is available in the section called “Complete Predictor Code”.

Lazy Evaluation Approach

Now let us consider a more efficient way of writing a C stored function based on the example we described in the previous section. The code in this section is more efficient in two ways:

  • It does not have to allocate a time series to store all possible return values and

  • It does not compute a result unless it is actually needed by the query.

However, to achieve this efficiency we must modify the code that computes the predictor model. It should be noted that this modification may be impractical or impossible. Let us call the new C stored function we are going to write lazy_predictor.

The only code that would change in the cfunc_instance_init_defaults function is the code that handles the state, since now we want to store the time series corresponding to the input stock symbols in the state:

  xmim_cfunc_state *state = new xmim_cfunc_state;
  num = 2;
  state->size = num;
  state->ptr = new xmim_cfunc_obj_ptr[num];
  for (int i = 0; i < num; i++)
    state->ptr[i].obj = NULL;

  state->names = new char*[num];
  state->names[0] = STOCK1_TIME_SERIES;
  state->names[1] = STOCK2_TIME_SERIES;

In the cfunc_instance_init function we are going to create the time series corresponding to the two input stock symbols and store both of them in the state (in the previous section we stored only the first one).

The cfunc_instance_get_value function computes the result value only for the specific date it is asked about:

extern "C" int cfunc_instance_get_value (xmim_cfunc_args *args, xmim_cfunc_date_time *dt) {
  xmim_cfunc_date_time *from_date = NULL;
  void *series1 = xmim_cfunc_get_arg (args, P_STATE, STOCK1_TIME_SERIES);
  void *series2 = xmim_cfunc_get_arg (args, P_STATE, STOCK2_TIME_SERIES);

  if (!series1 || !series2)
    return 0;  

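  //return a NaN for the first date
  from_date = xmim_cfunc_get_from_date (args, series1);
  int is_first_date = xmim_cfunc_date_time_equal (args, from_date, dt);
  //from_date is not needed after the comparison; free it so that
  //it does not leak on every call
  xmim_cfunc_free_date_time (args, from_date);
  if (is_first_date) {
    double zero = 0;
    double *doubleNan = &zero;
    xmim_cfunc_makeNan (args, doubleNan);
    args->result = *doubleNan;
    return 1;
  }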

  //compute the result
  double v1, v2, prev_v1, prev_v2;
  xmim_cfunc_date_time *date = NULL;
  void *offset = NULL;
  offset = xmim_cfunc_get_time_offset (args, 1, PL_DAYS, PL_AGO);  
  date = xmim_cfunc_apply_time_offset (args, dt, series1, offset);
  prev_v1 = xmim_cfunc_get_value (args, series1, date);
  prev_v2 = xmim_cfunc_get_value (args, series2, date);  
  v1 = xmim_cfunc_get_value (args, series1, dt);
  v2 = xmim_cfunc_get_value (args, series2, dt);
  xmim_cfunc_free_date_time (args, date);
  xmim_cfunc_free_time_offset (args, offset);  

  args->result = v1 +
      (0.5 * v1 *
      (((v1 - prev_v1) / prev_v1) +
      ((v2 - prev_v2) / prev_v2)));

  return 1;
}

The rest of the functions here are analogous to the previous section.

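The advantage of this approach is that the values are computed "on demand": the function does no work and allocates no storage for dates the query never asks about, which is typically cheaper than pre-computing and storing every value.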
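
The difference between the two approaches can be shown without the framework. The sketch below is illustrative only: plain vectors stand in for the two time series, an index stands in for a date, and the names predict, precompute_all and compute_one are hypothetical. The formula is the one used in both versions of cfunc_instance_get_value above.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// The formula shared by both approaches in this example.
double predict (double v1, double prev_v1, double v2, double prev_v2) {
  return v1 + 0.5 * v1 * ((v1 - prev_v1) / prev_v1 + (v2 - prev_v2) / prev_v2);
}

// Straightforward approach: compute and store a value for every date up front.
std::vector<double> precompute_all (const std::vector<double> &x,
                                    const std::vector<double> &y) {
  assert (x.size () == y.size ());
  std::vector<double> result (x.size ());
  for (std::size_t i = 0; i < x.size (); i++)
    result[i] = (i == 0) ? std::nan ("")
                         : predict (x[i], x[i - 1], y[i], y[i - 1]);
  return result;
}

// Lazy approach: compute only the value for the requested index and store nothing.
double compute_one (const std::vector<double> &x, const std::vector<double> &y,
                    std::size_t i) {
  assert (i < x.size () && i < y.size ());
  if (i == 0)
    return std::nan ("");  // no previous value on the first date
  return predict (x[i], x[i - 1], y[i], y[i - 1]);
}
```

Both functions return the same value for any given index. The lazy version touches only two elements of each series per request, whereas the straightforward version pays for the whole series even if the query needs a single date.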

When should each approach be used? We suggest that if you are just sitting down to write a new C stored function, you try to use the lazy evaluation approach. However, you are more likely to have some existing functions you would like to "wrap around" to use within the framework. In this case, if the logic of the function you want to "wrap around" is easy to change so that it uses lazy evaluation, we recommend using the lazy evaluation approach.

If the function is complicated, then it might be better to "wrap it around" in a straightforward way.

Complete code for this example is available in the section called “Complete Lazy Predictor Code”.