Suppose we have a function compute_predictor that
takes two arrays of values, x and y, together with the dates they
correspond to, and uses them to predict the next value of x. The actual
implementation of this model is irrelevant to this discussion; we assume
that such a model already exists. Its interface is as
follows:
typedef struct {
//date
unsigned day;
unsigned month;
unsigned year;
unsigned week_day;
unsigned day_of_month;
} date;
//implementation omitted; the model is assumed to exist
double *compute_predictor (date *dates, double *x_values, double *y_values,
int size);
The function takes an array of dates and two arrays of values
corresponding to those dates, all of length
size. Here, x_values[0] is the value of a time
series X on the date dates[0].
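To make the interface concrete, here is a minimal, hypothetical body for compute_predictor. It assumes the model is the same formula that the C stored function later in this section evaluates (each result is x plus half of x times the sum of the relative one-step changes of x and y, with NaN on the first date), and that the caller frees the returned array with delete[]. Neither assumption comes from the model's real implementation, which is not shown.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

typedef struct {
  unsigned day;
  unsigned month;
  unsigned year;
  unsigned week_day;
  unsigned day_of_month;
} date;

// Hypothetical model: mirrors the formula used by the C stored function.
// The caller owns the returned array and must delete[] it.
double *compute_predictor (date *dates, double *x_values, double *y_values, int size) {
  (void)dates; // this simple formula does not need the calendar fields
  if (size <= 0)
    return nullptr;
  assert (x_values != nullptr && y_values != nullptr);
  double *result = new double[size];
  // there is no previous value for the first date
  result[0] = std::numeric_limits<double>::quiet_NaN ();
  for (int i = 1; i < size; i++) {
    // relative one-step changes of both series
    double rx = (x_values[i] - x_values[i - 1]) / x_values[i - 1];
    double ry = (y_values[i] - y_values[i - 1]) / y_values[i - 1];
    result[i] = x_values[i] + 0.5 * x_values[i] * (rx + ry);
  }
  return result;
}
```

With x = {100, 110, 99} and y = {50, 55, 55}, the result is NaN, 121 and 94.05.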
Now let us demonstrate the straightforward way of converting this function into a C stored function that the framework can work with. This method does not require any modifications to the function that implements the model. In the next section, we will describe a more efficient way to do this, which does involve rewriting the model.
Let predictor be the name for the C stored function
we want to write. The first thing we need to do is specify the input
parameters for the C stored function, which in this case are going to be
two security parameters. Therefore, we can define the
cfunc_instance_init_defaults function as follows:
#define PRED_STOCK1 "stock1"
#define PRED_STOCK2 "stock2"
extern "C" int cfunc_instance_init_defaults (xmim_cfunc_args *args) {
int num = 2;
//security args
xmim_cfunc_security_args *sec = new xmim_cfunc_security_args;
sec->size = num;
sec->names = new char*[num];
//names is an array of char *, so the string literals need a cast in C++
sec->names[0] = (char *)PRED_STOCK1;
sec->names[1] = (char *)PRED_STOCK2;
sec->rels = new char*[num];
sec->rels[0] = NULL;
sec->rels[1] = NULL;
args->security_args = sec;
//define the state here
return 1;
}
A state is specific to a particular C stored function and is
defined in the cfunc_instance_init_defaults function. A
state consists of all objects that need to persist between invocations
of different functions, so that a function can retrieve an object it is
interested in and use it for its own purposes.
Now let us define the state that our C stored function needs to
keep. We need to store one of the time series that we will create from
the input stock symbols so that we can later define the trading
pattern and the trading date range for predictor. Also, in the
cfunc_instance_init function we will compute the result
values and "remember" them throughout our C stored function's
lifetime.
Taking this into account,
cfunc_instance_init_defaults looks as follows:
#define PRED_STOCK1 "stock1"
#define PRED_STOCK2 "stock2"
#define PRED_TIME_SERIES "time_series"
#define PRED_RESULT "result"
#define DEFAULT_COLUMN "Asks"
extern "C" int cfunc_instance_init_defaults (xmim_cfunc_args *args) {
int num = 2;
//security args
xmim_cfunc_security_args *sec = new xmim_cfunc_security_args;
sec->size = num;
sec->names = new char*[num];
sec->names[0] = (char *)PRED_STOCK1;
sec->names[1] = (char *)PRED_STOCK2;
sec->rels = new char*[num];
sec->rels[0] = NULL;
sec->rels[1] = NULL;
args->security_args = sec;
//state: the first stock's time series and the precomputed result
xmim_cfunc_state *state = new xmim_cfunc_state;
num = 2;
state->size = num;
state->ptr = new xmim_cfunc_obj_ptr[num];
for (int i = 0; i < num; i++)
state->ptr[i].obj = NULL;
state->names = new char*[num];
state->names[0] = (char *)PRED_TIME_SERIES;
state->names[1] = (char *)PRED_RESULT;
args->state = state;
return 1;
}
In the cfunc_instance_init
function, C stored functions can access the input parameters with
which they were called in a user's query. For example, if our
predictor is invoked via the MIM query:
SHOW
predictor(EA.EPWVQ304, EA.EPWVQ404)
WHEN
Date is after 2002
AND
Date is Wednesday
Then, to get the first input stock symbol,
we would use the xmim_cfunc_get_arg function:
char *symbol = NULL;
symbol = (char *)xmim_cfunc_get_arg (args, P_SECURITY, PRED_STOCK1);
if (!symbol)
return 0;
The first argument to xmim_cfunc_get_arg is an
xmim_cfunc_args structure, which is populated with the
input parameters after the call to
cfunc_instance_init_defaults. This means that the input
parameters are inaccessible only in
cfunc_instance_init_defaults and are accessible in all
other functions of a C stored function. In general, we suggest that
cfunc_instance_init_defaults be used only to specify
the input parameters and the state of the C stored function. Nothing
else should happen there, since the xmim_cfunc_args
structure is still empty at that point.
The second argument is the type of the input parameter we want to retrieve. All possible types are defined in the xmim_cfunc_api.h file:
//arg types for a cfunc
typedef enum {
P_INT_CONSTANT,
P_DBL_CONSTANT,
P_CATEGORY,
P_STRING,
P_ATTR,
P_TIME_OFFSET,
P_TIME_PERIOD,
P_SECURITY,
//cfunc specific state
P_STATE
} xmim_cfunc_arg_type;
The third argument
to xmim_cfunc_get_arg is the name of the input parameter as
we defined it in the cfunc_instance_init_defaults
function.
The last two lines of the xmim_cfunc_get_arg snippet above handle the exceptional
case in which the returned symbol is NULL. xmim_cfunc_get_arg
returns NULL in two cases: when it fails to retrieve the
named parameter, or when the parameter's value is itself NULL. In the first case
args->error is set to indicate the failure, while in the
second case no error is set. So, if NULL can be a valid value for a
parameter (xmim_cfunc_get_arg treats the state analogously to input
parameters, and a state object is NULL until we
initialize it), then we need to check whether
args->error is set (in which case
xmim_cfunc_get_arg has failed) or is NULL (in which case
the parameter value is NULL).
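The NULL-versus-failure check can be sketched with a tiny stand-in for the framework. mock_args, mock_get_state and get_state_checked below are hypothetical names, not part of xmim_cfunc_api.h. The sketch assumes only what the text states (a failed lookup returns NULL and sets an error, while a successful lookup of a NULL-valued parameter returns NULL and leaves the error unset), plus one extra assumption of our own: the error field is cleared before each lookup.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for the state part of xmim_cfunc_args:
// parallel arrays of names and object pointers, plus an error message.
struct mock_args {
  int size;
  const char **names;
  void **objs;
  const char *error; // NULL unless the last lookup failed
};

// Stand-in for xmim_cfunc_get_arg (args, P_STATE, name): returns NULL both
// for an unknown name (error set) and for a stored NULL value (error unset).
void *mock_get_state (mock_args *args, const char *name) {
  args->error = NULL; // assumption: cleared before every lookup
  for (int i = 0; i < args->size; i++)
    if (std::strcmp (args->names[i], name) == 0)
      return args->objs[i];
  args->error = "unknown state name";
  return NULL;
}

// Returns 1 if the lookup succeeded (*out may legitimately be NULL),
// 0 if the lookup itself failed.
int get_state_checked (mock_args *args, const char *name, void **out) {
  assert (out != NULL);
  *out = mock_get_state (args, name);
  if (*out == NULL && args->error != NULL)
    return 0;
  return 1;
}
```

In the real C stored function the same decision is made by inspecting args->error after xmim_cfunc_get_arg returns NULL.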
Returning 0 from any of the functions that make up a C stored function
signals that an error occurred during that function's execution. The
framework then raises an error with the message stored in
args->error.
Now let us get back to what we want to do in
the cfunc_instance_init function. Given the input stock
symbols, we are going to create the corresponding time series. The
compute_predictor function takes three arrays as its input:
an array of dates and arrays of values on those dates for the two time
series. We could store these arrays in our state. However, it is more
convenient to precompute the result exactly the way
compute_predictor computes it and then store the result in our
state. This is the purpose of the pred_result structure:
typedef struct {
double *values;
xmim_cfunc_date_time **dates;
int size;
} pred_result;
It stores both the array of dates and the result values on the corresponding dates.
In general, the purpose of the cfunc_instance_init
function is to initialize the state. It is executed before the first
call to cfunc_instance_get_value, so all
initialization should be done there.
Our cfunc_instance_init will initialize the state as
follows:
extern "C" int cfunc_instance_init (xmim_cfunc_args *args) {
char *relname1 = NULL, *relname2 = NULL;
void *series1 = NULL, *series2 = NULL;
xmim_cfunc_date_time *from_date = NULL, *to_date = NULL, *date = NULL, *old_date = NULL;
void *offset;
pred_result *result = NULL;
int size = 0;
//first input argument
relname1 = (char *)xmim_cfunc_get_arg (args, P_SECURITY, PRED_STOCK1);
if (!relname1)
return 0;
//second input argument
relname2 = (char *)xmim_cfunc_get_arg (args, P_SECURITY, PRED_STOCK2);
if (!relname2)
return 0;
//create the time series for the first stock symbol
series1 = xmim_cfunc_get_time_series (args, relname1, DEFAULT_COLUMN);
if (!series1)
return 0;
//store the first time series in the state
if (!xmim_cfunc_set_arg (args, P_STATE, series1, PRED_TIME_SERIES))
return 0;
//create the time series for the second stock symbol
//will get deleted by the framework
series2 = xmim_cfunc_get_time_series (args, relname2, DEFAULT_COLUMN);
if (!series2)
return 0;
// the from and to dates for the first time series
// the date range we are going to use
from_date = xmim_cfunc_get_from_date (args, series1);
to_date = xmim_cfunc_get_to_date (args, series1);
//calculate the number of values we have
date = xmim_cfunc_get_from_date (args, series1);
size = 1;
while (!xmim_cfunc_date_time_equal (args, date, to_date)) {
offset = xmim_cfunc_get_time_offset (args, 1, PL_DAYS, PL_LATER);
old_date = date;
date = xmim_cfunc_apply_time_offset (args, date, series1, offset);
xmim_cfunc_free_date_time (args, old_date);
xmim_cfunc_free_time_offset (args, offset);
size++;
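}
//free the last date produced by the loop and to_date; neither is needed any more
xmim_cfunc_free_date_time (args, date);
xmim_cfunc_free_date_time (args, to_date);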
//compute the result
result = new pred_result;
result->values = new double[size];
result->dates = new xmim_cfunc_date_time*[size];
result->size = size;
double v1, v2, prev_v1, prev_v2;
//the first value is a NaN since we do not have the
//value on the previous date
double nan_value = 0;
xmim_cfunc_makeNan (args, &nan_value);
result->values[0] = nan_value;
result->dates[0] = from_date;
prev_v1 = xmim_cfunc_get_value (args, series1, from_date);
prev_v2 = xmim_cfunc_get_value (args, series2, from_date);
//compute the values using the two time series
for (int i = 1; i < size; i++) {
offset = xmim_cfunc_get_time_offset (args, i, PL_DAYS, PL_LATER);
date = xmim_cfunc_apply_time_offset (args, from_date, series1, offset);
v1 = xmim_cfunc_get_value (args, series1, date);
v2 = xmim_cfunc_get_value (args, series2, date);
result->values[i] = v1 +
(0.5 * v1 *
(((v1 - prev_v1) / prev_v1) +
((v2 - prev_v2) / prev_v2)));
result->dates[i] = date;
prev_v1 = v1;
prev_v2 = v2;
xmim_cfunc_free_time_offset (args, offset);
}
//store the result in the state
if (!xmim_cfunc_set_arg (args, P_STATE, result, PRED_RESULT))
return 0;
return 1;
}
xmim_cfunc_get_time_series returns a time series
object corresponding to the relation and the column provided. The column
is assumed to be the same for all symbols (relations) and known
beforehand; in this example it is DEFAULT_COLUMN. A time series can be destroyed either explicitly by calling the
void xmim_cfunc_free_time_series (xmim_cfunc_args *args, void *time_series);
function, or implicitly by the framework when the C stored function instance is destroyed. Note that the framework cleans up only the time series objects it created; everything you allocate yourself, you must destroy yourself.
When computing the values for the result array, we use
xmim_cfunc_get_time_offset to obtain an offset of i days and
then apply that offset to from_date with
xmim_cfunc_apply_time_offset to get the i-th date.
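Written as a formula, the loop stores the following in result->values[i], where v^{(1)}_i and v^{(2)}_i (notation introduced here) are the values of series1 and series2 on the i-th date:

```latex
r_0 = \mathrm{NaN}, \qquad
r_i = v^{(1)}_i + \frac{1}{2}\, v^{(1)}_i
      \left( \frac{v^{(1)}_i - v^{(1)}_{i-1}}{v^{(1)}_{i-1}}
           + \frac{v^{(2)}_i - v^{(2)}_{i-1}}{v^{(2)}_{i-1}} \right),
\qquad 1 \le i < \mathrm{size}
```

For example, with illustrative values 100 and 110 for series1 and 50 and 55 for series2 on two consecutive dates, r_1 = 110 + 0.5 · 110 · (0.1 + 0.1) = 121. Because each term divides by the previous date's value, a zero previous value yields an infinite or NaN result.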
Next, the cfunc_instance_get_value function
retrieves the result we computed in
cfunc_instance_init, finds the value on the requested
date and returns it. It looks as follows:
extern "C" int cfunc_instance_get_value (xmim_cfunc_args *args, xmim_cfunc_date_time *dt) {
pred_result *result = (pred_result *)xmim_cfunc_get_arg (args, P_STATE, PRED_RESULT);
xmim_cfunc_date_time *date = NULL;
if (!result)
return 0;
for (int i = 0; i < result->size; i++) {
date = (xmim_cfunc_date_time *) result->dates[i];
if (xmim_cfunc_date_time_equal (args, date, dt)) {
args->result = result->values[i];
return 1;
}
}
//we do not have the requested date - return NaN.
double nan_value = 0;
xmim_cfunc_makeNan (args, &nan_value);
args->result = nan_value;
return 1;
}
As you may have noticed, cfunc_instance_get_value
does not return the value directly: it sets
args->result to the value and returns 1 to
signal success.
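Because cfunc_instance_init generates the dates by applying increasing offsets to from_date, they are in ascending order, so the linear scan in cfunc_instance_get_value could be replaced by a binary search. The sketch below shows that alternative on a simplified, hypothetical simple_pred_result whose dates are plain integers (for example 20030102). The text only shows an equality test for real dates, xmim_cfunc_date_time_equal, so applying this to xmim_cfunc_date_time would need an ordering comparison we have not seen in the API; treat it as an illustration of the design choice, not a drop-in replacement. It also uses std::numeric_limits for NaN, whereas the framework code uses xmim_cfunc_makeNan.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical simplified pred_result: dates encoded as integers.
struct simple_pred_result {
  const double *values;
  const long *dates; // strictly increasing
  int size;
};

// Returns the precomputed value for dt, or NaN if dt is not among the dates.
double lookup_value (const simple_pred_result &r, long dt) {
  assert (r.size >= 0);
  const long *end = r.dates + r.size;
  // first date that is not less than dt
  const long *it = std::lower_bound (r.dates, end, dt);
  if (it != end && *it == dt)
    return r.values[it - r.dates];
  return std::numeric_limits<double>::quiet_NaN ();
}
```

The search costs O(log size) comparisons per call instead of O(size), which matters when a query asks for many dates over a long range.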
Finally, in the cfunc_instance_destroy function
we clean up everything we have created:
extern "C" int cfunc_instance_destroy (xmim_cfunc_args *args) {
delete[] args->security_args->rels;
delete[] args->security_args->names;
delete args->security_args;
args->security_args = NULL;
//the time series will be deleted for us by the framework
pred_result *result = (pred_result *)xmim_cfunc_get_arg (args, P_STATE, PRED_RESULT);
if (result) {
for (int i = 0; i < result->size; i++)
xmim_cfunc_free_date_time (args, result->dates[i]); //the dates were created by the framework, so free them with its API
delete[] result->dates;
delete[] result->values;
delete result;
}
return 1;
}
We have now covered the functions that every C stored function is
required to implement. The next two functions,
cfunc_instance_compute_trading_pattern and
cfunc_instance_compute_trading_date_range, are optional.
They define the trading pattern (e.g., M-F, 8-5) and the trading date range (e.g., Jan 1, 1923 to November 15, 2004), respectively, and can be implemented by delegating to one of the existing time series, here the first stock's time series kept in the state. The code below also shows cfunc_instance_get_characteristic_time_series, which hands a time series back to the framework through args->characteristic_time_series:
extern "C" int cfunc_instance_compute_trading_pattern (xmim_cfunc_args *args) {
void *time_series = xmim_cfunc_get_arg (args, P_STATE, PRED_TIME_SERIES);
if (!time_series)
return 0;
xmim_cfunc_compute_trading_pattern (args, time_series);
return 1;
}
extern "C" int cfunc_instance_compute_trading_date_range (xmim_cfunc_args *args) {
void *time_series = xmim_cfunc_get_arg (args, P_STATE, PRED_TIME_SERIES);
if (!time_series)
return 0;
xmim_cfunc_compute_trading_date_range (args, time_series);
return 1;
}
extern "C" int cfunc_instance_get_characteristic_time_series (xmim_cfunc_args *args) {
//the first stock's time series is created in cfunc_instance_init
void *time_series = xmim_cfunc_get_arg (args, P_STATE, PRED_TIME_SERIES);
if (!time_series) {
xmim_cfunc_set_error (args, "predictor get_characteristic_time_series: the time series has not been created yet. Please make sure this call is happening at the right place.");
return 0;
}
args->characteristic_time_series = time_series;
return 1;
}
Complete code for this example is available in the section called “Complete Predictor Code”.
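One gap in cfunc_instance_init as shown is that result leaks if the final xmim_cfunc_set_arg call fails, because cfunc_instance_destroy can then no longer find it in the state. A common remedy is to put the release logic in one helper that both functions call. The sketch below uses hypothetical stand-in types (fake_date, fake_pred_result) and takes the date-freeing routine as a function pointer, in the role of xmim_cfunc_free_date_time. It skips NULL entries, so on an error path the dates array should be allocated zero-initialized (new ...[size]()) so that unfilled slots are NULL.

```cpp
#include <cassert>

// Hypothetical stand-ins for xmim_cfunc_date_time and pred_result.
struct fake_date { int day; };
struct fake_pred_result {
  double *values;
  fake_date **dates;
  int size;
};

// Releases everything a result owns. free_date plays the role of
// xmim_cfunc_free_date_time; NULL date slots are skipped.
void free_pred_result (fake_pred_result *result, void (*free_date) (fake_date *)) {
  if (!result)
    return;
  assert (result->size == 0 || result->dates != nullptr);
  for (int i = 0; i < result->size; i++)
    if (result->dates[i])
      free_date (result->dates[i]);
  delete[] result->dates;
  delete[] result->values;
  delete result;
}
```

Calling the same helper from cfunc_instance_destroy and from every failure branch of cfunc_instance_init keeps the two cleanup paths from drifting apart.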
Now let us consider a more efficient way of writing the C stored function described in the previous section. The code in this section is more efficient in two ways: it does not have to allocate a pred_result array holding every possible return value, and it does not compute a result unless the query actually needs it.
However, to achieve this efficiency we must modify the code that
computes the predictor model, and this modification
may be impractical, or even impossible, for a given model. Let us call the new C stored function
we are going to write lazy_predictor.
The only code that changes in the
cfunc_instance_init_defaults function is the code that
handles the state, since we now want to store both time series
corresponding to the input stock symbols in the state (STOCK1_TIME_SERIES and STOCK2_TIME_SERIES are state-name macros analogous to PRED_TIME_SERIES):
xmim_cfunc_state *state = new xmim_cfunc_state;
num = 2;
state->size = num;
state->ptr = new xmim_cfunc_obj_ptr[num];
for (int i = 0; i < num; i++)
state->ptr[i].obj = NULL;
state->names = new char*[num];
state->names[0] = (char *)STOCK1_TIME_SERIES;
state->names[1] = (char *)STOCK2_TIME_SERIES;
args->state = state;
In the cfunc_instance_init function we are going to
create the time series corresponding to the two input stock symbols and
store both of them in the state (just as we stored the first time series in the previous
section).
cfunc_instance_get_value now computes the
result value only for the specific date it is asked about:
extern "C" int cfunc_instance_get_value (xmim_cfunc_args *args, xmim_cfunc_date_time *dt) {
xmim_cfunc_date_time *from_date = NULL;
void *series1 = xmim_cfunc_get_arg (args, P_STATE, STOCK1_TIME_SERIES);
void *series2 = xmim_cfunc_get_arg (args, P_STATE, STOCK2_TIME_SERIES);
if (!series1 || !series2)
return 0;
//return a NaN for the first date
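from_date = xmim_cfunc_get_from_date (args, series1);
int is_first_date = xmim_cfunc_date_time_equal (args, from_date, dt);
//from_date is only needed for this comparison, so free it right away
xmim_cfunc_free_date_time (args, from_date);
if (is_first_date) {
double nan_value = 0;
xmim_cfunc_makeNan (args, &nan_value);
args->result = nan_value;
return 1;
}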
//compute the result
double v1, v2, prev_v1, prev_v2;
xmim_cfunc_date_time *date = NULL;
void *offset = NULL;
offset = xmim_cfunc_get_time_offset (args, 1, PL_DAYS, PL_AGO);
date = xmim_cfunc_apply_time_offset (args, dt, series1, offset);
prev_v1 = xmim_cfunc_get_value (args, series1, date);
prev_v2 = xmim_cfunc_get_value (args, series2, date);
v1 = xmim_cfunc_get_value (args, series1, dt);
v2 = xmim_cfunc_get_value (args, series2, dt);
xmim_cfunc_free_date_time (args, date);
xmim_cfunc_free_time_offset (args, offset);
args->result = v1 +
(0.5 * v1 *
(((v1 - prev_v1) / prev_v1) +
((v2 - prev_v2) / prev_v2)));
return 1;
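}
The remaining functions are analogous to those in the previous section.
The advantage of this approach is that values are computed on demand: only the dates the query actually asks about are evaluated, and no array of precomputed results has to be allocated and kept for the lifetime of the C stored function. This saves work whenever the query needs only part of the date range.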
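The on-demand computation can be seen in isolation with plain arrays. In the hypothetical lazy_predict below, the index i stands in for the requested date and i - 1 for the date one day earlier, which the real cfunc_instance_get_value obtains with xmim_cfunc_get_time_offset (args, 1, PL_DAYS, PL_AGO). Each call reads only two entries per series, so a query that touches k dates does O(k) work instead of computing all size values up front.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical plain-array analogue of the lazy cfunc_instance_get_value:
// computes the predictor for index i only, from the values at i and i - 1.
double lazy_predict (const double *x, const double *y, int i) {
  assert (i >= 0);
  if (i == 0) // no previous date: same NaN convention as the eager version
    return std::numeric_limits<double>::quiet_NaN ();
  double rx = (x[i] - x[i - 1]) / x[i - 1];
  double ry = (y[i] - y[i - 1]) / y[i - 1];
  return x[i] + 0.5 * x[i] * (rx + ry);
}
```

For any index, this returns the same value the eager version stores in result->values[i], which is what makes the two approaches interchangeable from the query's point of view.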
When should each approach be used? If you are writing a new C stored function from scratch, we suggest trying the lazy evaluation approach. More often, however, you will have existing functions that you would like to "wrap around" for use within the framework. In this case, if the logic of the function you want to wrap is easy to change so that it uses lazy evaluation, we recommend the lazy evaluation approach.
If the function is complicated, it might be better to wrap it in the straightforward way.
Complete code for this example is available in the section called “Complete Lazy Predictor Code”.