When you use the PROTOTYPE comment to generate C code wrappers, you cannot take advantage of many of the features of the yorick language. In this example, there is no easy way to pass an indeterminate number of arguments to your function. When you write your own C code wrapper, all such limitations disappear. The price is that you need to use the yorick-specific C functions declared in ‘yapi.h’ to interact with the intepreter.
Since you no longer need an interpreted wrapper (although you might still choose to have one – some tasks are easier, occasionally dramatically easier, in interpreted code than in compiled code), your inteface definition file ‘rtleg.i’ becomes trivial:
plug_in, "rtleg"; extern rtleg; |
But your C source becomes much more complicated:
#include "yapi.h" #include <math.h> extern void Y_rtleg(int argc); void Y_rtleg(int argc) { int iarg; long i, n, dimsr[Y_DIMSIZE], dims[Y_DIMSIZE]; double *r, *rr, **px, *x, *a, xval; int *flag, flagr; if (argc < 2) y_error("rtleg needs at least two arguments"); /* get some scratch space to deal with the indeterminate * argument count, this increases iarg for arguments by 1 */ px = ypush_scratch((argc-1)*(sizeof(double)+sizeof(int)), 0); flag = (int *)&px[argc]; /* get r argument as double */ r = ygeta_d(argc, 0, dimsr); /* first pass through remaining arguments checks they are * conformable, and builds the result dimension list */ for (flagr=0, iarg=argc-1 ; iarg>0 ; iarg--) { flagr |= flag[iarg-1] = yarg_dims(iarg, dims, dimsr); if (flag[iarg-1] < 0) y_errorn("rtleg argument %ld not conformable", argc-iarg); } /* broadcast r argument to result shape if necessary */ if (flagr & 1) yarg_bcast(argc, dimsr); /* second pass broadcasts all arguments to result shape * and converts to type double */ for (iarg=argc-1 ; iarg>0 ; iarg--) { if (flag[iarg-1] & 2) yarg_bcast(iarg, dimsr); px[iarg-1] = ygeta_d(iarg, &n, 0); flag[iarg-1] = yarg_scratch(iarg); if (flag[iarg-1]) flagr = iarg; } /* push new array and fill with reciprocal of r */ rr = ypush_d(dimsr); for (i=0 ; i<n ; i++) if (r[i] > 0.) rr[i] = 1./r[i]; else if (r[i] < 0.) rr[i] = -1./r[i]; else rr[i] = -1.; if (flagr) { /* use scratch x as accumulator */ a = px[flagr-1]; for (i=0 ; i<n ; i++) { xval = a[i]*rr[i]; a[i] = -xval*xval; } } else { /* need to create accumulator on stack */ a = ypush_d(dimsr); } /* compute 1 - sum((x/r)^2) */ for (i=0 ; i<n ; i++) a[i] += (rr[i]<0.)? 0. : 1.; for (iarg=0 ; iarg<argc-1 ; iarg++) { x = px[iarg]; if (x == a) continue; for (i=0 ; i<n ; i++) { xval = x[i]*rr[i]; a[i] -= xval*xval; } } /* compute final result */ for (i=0 ; i<n ; i++) a[i] = r[i]*sqrt(a[i]); /* discard overlying stack elements to leave result on top */ if (flagr) yarg_drop(flagr+1); } |
The ypush_scratch function enables you to create scratch space that is safe against asynchronous interrupt. You have to write every compiled function assuming that the user will type Control-C and interrupt your function. If you used malloc to obtain scratch space, and your function were asynchronously interrupted before the matching free, your routine would leak memory. In this case, the scratch space is only required to cope with the fact that the function accepts an indeterminate number of arguments.
The implementation unavoidably requires a temporary array to hold the reciprocal of the first argument, which is used to normalize the remaining arguments to prevent overflows or erronious underflows when the other arguments are squared. However, there is an opportunity to reuse an argument array which happens to be a temporary (either because it is the result of an expression, or because it had to be type converted or broadcast) as the result. Taking advantage of that optimization also costs a few additional lines of source code towards the end of the function.
What you get for this modest complexity is an rtleg function that is equal in performance and features to the built in abs function.