A problem arises when building R interfaces to C/C++ libraries involves testing: how to go about replicating the existing C/C++ tests in R without undue effort. If the C/C++ tests are simple and small enough, they can be manually translated. However, when there are many tests, and each test initializes its own large data structures, the task becomes a chore.
We faced this problem with a recent release of the ECOSolveR
, a solver package crucial to our larger package CVXR
. Until version 0.4, we had been content with including one small test and a larger one using saved RDA
files in the R package. But with our work on CVXR
moving towards a version 1.0 release, we wanted to batten down the hatch.
The ECOS
C library has about 28 tests and many of them include large, initialized arrays as test data. For example, see this.
The initial thought was to parse out the arrays in the C source and write out R equivalents for testing. But that could be error-prone and if a test failed, we could never really be sure that our translation was not a problem.
So we looked around for a lazy solution that would let us create R data structures from within the C test code.
Enter RInside
RInside allows one to embed R inside C/C++.
To use RInside
, one initializes a handle to the R process in the C/C++ program. This handle can be used to assign R variables to values held in C/C++ scalars, arrays, etc. One can also evaluate arbitrary strings using the R process, which is quite handy: we can use it to execute a saveRDS
call to save R structures in a file. The C++ snippet below shows a simple example. (To run the example, first untar the RInside source, copy test.cpp
below to the RInside/inst/examples/standard
, and make test
.)
// test.cpp
#include <RInside.h>
// Stuff a double array in a std::vector
std::vector< double > dVec(double *data, int len) {
std::vector< double > result;
for (int i = 0; i < len; i++) result.push_back(data[i]);
return(result);
}
#define DLEN(x) x? (int) (sizeof(x) / sizeof(double)) : 0
int main(int argc, char* argv[]) {
int n = 1;
double y[] = {1.0, 2.0, 3.0};
RInside R(argc, argv);
R.assign(n, "n"); // assign R variable n to C scalar n
R.assign(dVec(y, DLEN(y)), "y"); // assign R variable y to C vector y
std::string rds_file = "out.RDS";
R.assign(rds_file, "rds_file"); // assign R variable rds_file the value "rds_file"
R.parseEvalQ("saveRDS(list(n = n, y = y), file = rds_file)");
return 0;
}
Note the use of some macros to determine lengths of initialized C arrays; in particular, we account for the fact that the array pointer could be NULL
, common in the code we encounter. Not shown here is a similar macro that can be used for an initialized integer array. (Dynamically allocated arrays pose no difficulty since the lengths would be known.) Such macros are made accessible to the C/C++ code via included headers.
The following features of the ECOS
C tests make it possible to exploit RInside
.
- Each C test is in a single source file. One exception has five tests in a single file.
- Each C test has a pattern: a call to a setup function to set up the data, followed by an actual call to the solver.
- Most (but not all) arrays are initialized in the C source.
- The types of the variables in the setup function call are fixed.
- The tests are all invoked from a single test harness.
This suggests a following strategy.
- Create an R process in the C library testing harness.
- Modify each C test source file by inserting
RInside
calls before calling the setup function. These calls will save the C data in R data structures, and export the R data using a callsaveRDS
. - Rerun the C library tests to generate the
RDS
files. - Use the generated
RDS
files in R package tests.
Example
Taking one test as an example, the setup call (near the bottom of the file) has the following form:
mywork = ECOS_setup(MPC01_n, MPC01_m, MPC01_p, MPC01_l, MPC01_ncones, MPC01_q, 0,
MPC01_Gpr, MPC01_Gjc, MPC01_Gir,
MPC01_Apr, MPC01_Ajc, MPC01_Air,
MPC01_c, MPC01_h, MPC01_b);
And since the signature of ECOS_setup
is known, the parameter types (scalars or arrays) can be fixed.
var_type <- c(n = "int", m = "int", p = "int", l = "int", ncones = "int",
q = "int*", e = "int",
Gpr = "double*", Gjc = "int*", Gir = "int*",
Apr = "double*", Ajc = "int*", Air = "int*",
c = "double*", h = "double*", b = "double*")
Extracting C variable names
A simple string match in R can easily identify the starting and ending lines of the C setup invocation in the test source file and return a list of the C variable names in each call.
get_setup_vars <- function(source, file_source = TRUE) {
n <- length(lines <- if (file_source) readLines(source) else source)
starts <- grep("ECOS_setup", lines)
potential_ends <- grep("\\);$", lines)
ends <- sapply(starts,
function(i) potential_ends[potential_ends >= i][1])
## starts[i]:ends[i] are chunks of interest.
lapply(seq_along(starts),
function(i) {
lines[starts[i]:ends[i]] %>%
paste(collapse = "") %>%
stringr::str_replace(pattern = "(.*ECOS_setup\\()", replacement = "") %>%
stringr::str_replace(pattern = "(\\);)$", replacement = "") %>%
stringr::str_split(pattern = ",") %>%
magrittr::extract2(1) %>%
stringr::str_trim()
})
}
A test invocation.
source_lines <- c('mywork = ECOS_setup(MPC01_n, MPC01_m, MPC01_p, MPC01_l, MPC01_ncones, MPC01_q, 0,',
'MPC01_Gpr, MPC01_Gjc, MPC01_Gir,',
'MPC01_Apr, MPC01_Ajc, MPC01_Air,',
'MPC01_c, MPC01_h, MPC01_b);')
get_setup_vars(source_lines, file_source = FALSE)
[[1]]
[1] "MPC01_n" "MPC01_m" "MPC01_p" "MPC01_l"
[5] "MPC01_ncones" "MPC01_q" "0" "MPC01_Gpr"
[9] "MPC01_Gjc" "MPC01_Gir" "MPC01_Apr" "MPC01_Ajc"
[13] "MPC01_Air" "MPC01_c" "MPC01_h" "MPC01_b"
Inserting C code
The next task is to generate the code to be inserted before the call to setup. The following code does the job, taking the types of each of the variables in the setup call and making use of macros shown in the C++ example above.
#' Assign a variable in R
#' @param x C variable name
#' @param r_name R variable name
set_val <- function(x, r_name) paste0('R.assign(', x, ', "', r_name, '");')
set_ivec <- function(x, r_name) paste0('R.assign(iVec(', x, ', ILEN(', x, ')), "', r_name, '");')
set_dvec <- function(x, r_name) paste0('R.assign(dVec(', x, ', DLEN(', x, ')), "', r_name, '");')
#' Generate C++ code lines for insertion into C test source
#' @param c_name a vector of C variable names that should be saved in R
#' @param a named list of variable types
#' @param rds_file a string naming the rds file for saving the list object
gen_cpp <- function(c_name, var_type, rds_file) {
r_name <- names(var_type)
result <- sapply(seq_along(var_type),
function(i) {
if (var_type[i] == "int" || var_type[i] == "double") {
set_val(c_name[i], r_name[i])
} else if (var_type[i] == "int*") {
set_ivec(c_name[i], r_name[i])
} else if (var_type[i] == "double*") {
set_dvec(c_name[i], r_name[i])
} else {
stop("Unknown variable type")
}
})
## Create list to save:
output <- paste("foo <- list(",
paste(sapply(r_name, function(x) paste(x, "=", x)), collapse=", "),
")")
## Set output file name and insert call to R saveRDS
c(result,
paste0("std::string fname = ", paste0('"', rds_file, '"'), ";"),
set_val('fname', 'fname'),
paste0('R.parseEvalQ("', output, '");'),
paste0('R.parseEvalQ("saveRDS(foo, file=fname)");'))
}
Does it work?
We can check that the appropriate C code is generated for inserting into the file.
c_name <- get_setup_vars(source_lines, file_source = FALSE)[[1]]
gen_cpp(c_name, var_type, "foo.rds")
[1] "R.assign(MPC01_n, \"n\");"
[2] "R.assign(MPC01_m, \"m\");"
[3] "R.assign(MPC01_p, \"p\");"
[4] "R.assign(MPC01_l, \"l\");"
[5] "R.assign(MPC01_ncones, \"ncones\");"
[6] "R.assign(iVec(MPC01_q, ILEN(MPC01_q)), \"q\");"
[7] "R.assign(0, \"e\");"
[8] "R.assign(dVec(MPC01_Gpr, DLEN(MPC01_Gpr)), \"Gpr\");"
[9] "R.assign(iVec(MPC01_Gjc, ILEN(MPC01_Gjc)), \"Gjc\");"
[10] "R.assign(iVec(MPC01_Gir, ILEN(MPC01_Gir)), \"Gir\");"
[11] "R.assign(dVec(MPC01_Apr, DLEN(MPC01_Apr)), \"Apr\");"
[12] "R.assign(iVec(MPC01_Ajc, ILEN(MPC01_Ajc)), \"Ajc\");"
[13] "R.assign(iVec(MPC01_Air, ILEN(MPC01_Air)), \"Air\");"
[14] "R.assign(dVec(MPC01_c, DLEN(MPC01_c)), \"c\");"
[15] "R.assign(dVec(MPC01_h, DLEN(MPC01_h)), \"h\");"
[16] "R.assign(dVec(MPC01_b, DLEN(MPC01_b)), \"b\");"
[17] "std::string fname = \"foo.rds\";"
[18] "R.assign(fname, \"fname\");"
[19] "R.parseEvalQ(\"foo <- list( n = n, m = m, p = p, l = l, ncones = ncones, q = q, e = e, Gpr = Gpr, Gjc = Gjc, Gir = Gir, Apr = Apr, Ajc = Ajc, Air = Air, c = c, h = h, b = b )\");"
[20] "R.parseEvalQ(\"saveRDS(foo, file=fname)\");"
This C code can be inserted before the setup call in each test file in an automated way.
Summary
RInside
can be part of the solution for generating R package tests based on underlying C/C++ library tests. The above approach, with some minor modifications, enabled us to reprogram all 28 C tests for our R package. The modified C test source can be found on GitHub. For instance, compare the original source of one test to the modified one. The modifications are towards the end.
Once the modifications were inserted into the C source files, the C tests were re-run to generate RDS
files now included in ECOSolveR
version 0.5. It was then quite straightforward to add the tests using testthat
as may be seen from the R test source.
(1) Balasubramanian Narasimhan is a Senior Research Scientist in Statistics at Stanford University, and Director of the Data Coordinating Center in the Department of Biomedical Data Sciences.
(2) Anqi Fu is a Ph.D Candidate in Electrical Engineering at Stanford University.
You may leave a comment below or discuss the post in the forum community.rstudio.com.