Designing a C interface for complex types

Update Nov 2022: After doing 4-5 interfaces in this fashion, if I had it to do over, I would probably use SWIG.  I passed on it initially because it had only one maintainer on GitHub.  However, given the time I ended up sinking into writing & debugging custom interfaces, it may have been worth the risk.

Let’s say we are designing an interface to pass a complex type between C++ modules. If there is any possibility of the two modules being built by different compilers, different versions of the same compiler, or for different platforms, the C++ ABI could be different, making updates potentially very brittle. The unfortunate result is that we’ll need to use C functions & structures for the interface. The same is true when exposing C++ to another language.

As an example, let’s say our module houses a lookup we want to share:

#include <string>
#include <unordered_map>
#include <vector>

enum SkiRunDifficultyRating
{
    Green,
    Blue,
    Black,
    DoubleBlack,
    Orange
};

class SkiRun
{
private:
    std::string Name;
    SkiRunDifficultyRating Difficulty;
public:
    const std::string& GetName() const;
    SkiRunDifficultyRating GetDifficulty() const;
};

class SkiResort
{
private:
    std::string Name;
    std::vector<SkiRun> Runs;
public:
    const std::string& GetName() const;
    std::vector<SkiRun> GetRunsByDifficulty(SkiRunDifficultyRating difficulty) const;
};

class SkiResortLookup
{
private:
    //Keyed by resort name
    std::unordered_map<std::string, SkiResort> Resorts;
public:
    //Realistically, we would probably return std::optional<SkiResort> here,
    //but for simplicity...
    const SkiResort& GetResortByName(const std::string& name) const;
};


If we wanted to make this lookup available to another module, we can’t pass our nice C++ types, for the reasons stated above. So, we would need something like this:

 

#include <stdint.h>

extern "C"
{
    //First, we need a structure to pass across the interface.
    //Perhaps the simplest option is this:
    typedef struct ExternalInterface_SkiResort
    {
        const char* Name;
    } ExternalInterface_SkiResort;

    typedef struct ExternalInterface_SkiRun
    {
        const char* Name;
        //There are problems with passing this enum, but for simplicity...
        SkiRunDifficultyRating Difficulty;
    } ExternalInterface_SkiRun;

    //I'm no C programmer, but I think this is the traditional approach:
    //the library allocates, so the outputs are pointers to the caller's pointers.
    void ExternalInterface_GetSkiResort(const char* name, ExternalInterface_SkiResort** output);
    void ExternalInterface_SkiResort_GetRunsByDifficulty(
        const ExternalInterface_SkiResort* resort,
        SkiRunDifficultyRating difficulty,
        ExternalInterface_SkiRun** runsOutput,
        uint8_t* runCountOutput);
}
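The enum problem mentioned in the comment above is that the compiler picks the size of a plain enum, so two modules can disagree on the layout of any struct that contains one. One common workaround is to send a fixed-width integer across the boundary and validate it on the way in. Here is a minimal sketch of that idea; the ExternalInterface_SkiRunDifficulty typedef and the two conversion helpers are hypothetical names, not part of the interface above:

```cpp
#include <cstdint>

enum SkiRunDifficultyRating
{
    Green,
    Blue,
    Black,
    DoubleBlack,
    Orange
};

extern "C"
{
    // Hypothetical fixed-width stand-in for the enum; its size is the same everywhere.
    typedef int32_t ExternalInterface_SkiRunDifficulty;
}

// Library side: widen the enum to the fixed-width type before it crosses the boundary.
inline ExternalInterface_SkiRunDifficulty ToExternalDifficulty(SkiRunDifficultyRating rating)
{
    return static_cast<ExternalInterface_SkiRunDifficulty>(rating);
}

// Library side: validate an incoming value before trusting it as an enum.
// Returns false (leaving output untouched) for values this build does not know about.
inline bool TryFromExternalDifficulty(
    ExternalInterface_SkiRunDifficulty value,
    SkiRunDifficultyRating* output)
{
    if (value < static_cast<int32_t>(Green) || value > static_cast<int32_t>(Orange))
    {
        return false;
    }
    *output = static_cast<SkiRunDifficultyRating>(value);
    return true;
}
```

The range check also covers the case where a newer module sends a rating that an older module was never compiled with.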

 

So, the implementation of the external-facing function could look something like this:

 

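#include <cstdlib>
#include <cstring>

void ExternalInterface_GetSkiResort(const char* name, ExternalInterface_SkiResort** output)
{
    //Yes, I know GetResortByName() isn't static,
    //but let's ignore the lifecycle of SkiResortLookup for brevity
    const SkiResort& resort = SkiResortLookup::GetResortByName(name);
    *output = static_cast<ExternalInterface_SkiResort*>(
        malloc(sizeof(ExternalInterface_SkiResort)));
    //Points into the lookup's SkiResort, so it is only valid while the lookup lives
    (*output)->Name = resort.GetName().c_str();
}

void ExternalInterface_SkiResort_GetRunsByDifficulty(
    const ExternalInterface_SkiResort* resort,
    SkiRunDifficultyRating difficulty,
    ExternalInterface_SkiRun** runsOutput,
    uint8_t* runCountOutput)
{
    const SkiResort& source = SkiResortLookup::GetResortByName(resort->Name);
    auto runs = source.GetRunsByDifficulty(difficulty);
    //Assumes fewer than 256 matching runs
    *runCountOutput = static_cast<uint8_t>(runs.size());
    *runsOutput = static_cast<ExternalInterface_SkiRun*>(
        malloc(sizeof(ExternalInterface_SkiRun) * runs.size()));
    for (size_t i = 0; i < runs.size(); ++i)
    {
        //runs is a local copy, so each name needs its own heap copy
        const std::string runName = runs[i].GetName();
        char* nameCopy = static_cast<char*>(malloc(runName.size() + 1));
        strcpy(nameCopy, runName.c_str());
        (*runsOutput)[i].Name = nameCopy;
        (*runsOutput)[i].Difficulty = runs[i].GetDifficulty();
    }
}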

 

 

The code in the caller would look something like this:

 

ExternalInterface_SkiResort* output = nullptr;
ExternalInterface_GetSkiResort("Tahoe", &output);
ExternalInterface_SkiRun* runs = nullptr;
uint8_t runCount = 0;
ExternalInterface_SkiResort_GetRunsByDifficulty(
    output,
    SkiRunDifficultyRating::Black,
    &runs,
    &runCount);
//Process the runs list
//Everything was allocated with malloc(), so it has to be released with free()
for (size_t i = 0; i < runCount; ++i)
{
    free(const_cast<char*>(runs[i].Name));
}
free(runs);
free(output);

 

I don’t love this approach. To be honest, I’ve never liked returning values via parameter. Especially in functions with several parameters, I think return values are much clearer to the reader.
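To make the comparison concrete, here is a minimal sketch of the same query written both ways. The function names, the result struct, and the hard-coded lookup are all hypothetical stand-ins, not part of the interface above:

```cpp
#include <cstdint>
#include <cstring>

extern "C"
{
    // Out-parameter style: the result comes back through the last argument,
    // and the return value only reports success (1) or failure (0).
    int ExternalInterface_TryGetRunCount(const char* resortName, uint8_t* runCountOutput);

    // Return-value style: status and value travel together in one small struct.
    typedef struct ExternalInterface_RunCountResult
    {
        uint8_t Found;
        uint8_t RunCount;
    } ExternalInterface_RunCountResult;

    ExternalInterface_RunCountResult ExternalInterface_GetRunCount(const char* resortName);
}

// Stand-in for the real lookup: only "Tahoe" exists here, with a made-up count of 3.
ExternalInterface_RunCountResult ExternalInterface_GetRunCount(const char* resortName)
{
    ExternalInterface_RunCountResult result = {0, 0};
    if (resortName != nullptr && std::strcmp(resortName, "Tahoe") == 0)
    {
        result.Found = 1;
        result.RunCount = 3;
    }
    return result;
}

int ExternalInterface_TryGetRunCount(const char* resortName, uint8_t* runCountOutput)
{
    const ExternalInterface_RunCountResult result = ExternalInterface_GetRunCount(resortName);
    if (!result.Found || runCountOutput == nullptr)
    {
        return 0;
    }
    *runCountOutput = result.RunCount;
    return 1;
}
```

With the first form the caller has to declare a variable up front and remember which argument is the output. With the second, everything the call produces is in the value it returns.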

In this case, it also puts a number of responsibilities on the caller. First, if they get a ski resort, there is nothing about that type that makes it obvious how to get more information about it (a run list by difficulty, for example). It requires that they know of the existence of ExternalInterface_SkiResort_GetRunsByDifficulty(). Sure, in our example, it is declared close by and named decently, but that is not always the case. Second, it makes the caller responsible for declaring all of the necessary pointers, then cleaning up the memory manually when finished.  We could add a cleanup function in the library that the caller can use, but again, the caller has to know about the function's existence.
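A minimal sketch of what such a cleanup function might look like, assuming the run array and each name were allocated with malloc() inside the library. ExternalInterface_FreeSkiRuns and the CreateRunsForExample helper are hypothetical names, and the int32_t field stands in for the enum:

```cpp
#include <cstdint>
#include <cstdlib>
#include <cstring>

extern "C"
{
    typedef struct ExternalInterface_SkiRun
    {
        const char* Name;
        int32_t Difficulty; // Stand-in for SkiRunDifficultyRating
    } ExternalInterface_SkiRun;

    // Hypothetical cleanup function: releases everything the library allocated
    // for a run list, so the caller never needs to know how it was allocated.
    void ExternalInterface_FreeSkiRuns(ExternalInterface_SkiRun* runs, uint8_t runCount);
}

void ExternalInterface_FreeSkiRuns(ExternalInterface_SkiRun* runs, uint8_t runCount)
{
    if (runs == nullptr)
    {
        return;
    }
    for (uint8_t i = 0; i < runCount; ++i)
    {
        std::free(const_cast<char*>(runs[i].Name));
    }
    std::free(runs);
}

// Hypothetical helper that mimics how the library side might build the list.
ExternalInterface_SkiRun* CreateRunsForExample(uint8_t runCount)
{
    auto* runs = static_cast<ExternalInterface_SkiRun*>(
        std::malloc(sizeof(ExternalInterface_SkiRun) * runCount));
    if (runs == nullptr)
    {
        return nullptr;
    }
    for (uint8_t i = 0; i < runCount; ++i)
    {
        char* name = static_cast<char*>(std::malloc(4));
        if (name != nullptr)
        {
            std::strcpy(name, "Run");
        }
        runs[i].Name = name; // free(nullptr) is harmless if the allocation failed
        runs[i].Difficulty = 0;
    }
    return runs;
}
```

This keeps allocation and deallocation on the same side of the boundary, which matters when the two modules use different runtimes. The caller still has to discover that the function exists, though.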

So, how might we design this to avoid these two issues? By taking over these responsibilities in the library:

 

extern "C"
{
    typedef void ExternalInterface_SkiRun;
    typedef struct IExternalInterface_SkiRun
    {
        ExternalInterface_SkiRun* Run;
        const char* (*GetName)(ExternalInterface_SkiRun* run);
        SkiRunDifficultyRating (*GetDifficulty)(ExternalInterface_SkiRun* run);
    } IExternalInterface_SkiRun;

    typedef void ExternalInterface_SkiRunCollection;
    typedef struct IExternalInterface_SkiRunCollection
    {
        ExternalInterface_SkiRunCollection* Collection;
        ExternalInterface_SkiRun* (*GetRunByIndex)(
            ExternalInterface_SkiRunCollection* collection,
            uint8_t index);
        const uint8_t (*GetRunCount)(ExternalInterface_SkiRunCollection* collection);
        //Having the destroy method as part of the return struct makes it obvious
        //who is responsible for memory management. This can be reinforced with a comment.
        void (*Destroy)(ExternalInterface_SkiRunCollection* collection);
    } IExternalInterface_SkiRunCollection;

    typedef void ExternalInterface_SkiResort;
    typedef struct IExternalInterface_SkiResort
    {
        ExternalInterface_SkiResort* Resort;
        const char* (*GetName)(ExternalInterface_SkiResort* resort);
        IExternalInterface_SkiRunCollection (*GetRunsByDifficulty)(
            ExternalInterface_SkiResort* resort,
            SkiRunDifficultyRating difficulty);
    } IExternalInterface_SkiResort;

    IExternalInterface_SkiResort ExternalInterface_GetSkiResort(const char* name);
}

//Implementation details: callers only reach these through the function pointers above
const char* IExternalInterface_SkiResort_GetName(ExternalInterface_SkiResort* resort);
IExternalInterface_SkiRunCollection ExternalInterface_SkiResort_GetRunsByDifficulty(
    ExternalInterface_SkiResort* resort,
    SkiRunDifficultyRating difficulty);
ExternalInterface_SkiRun* IExternalInterface_SkiRunCollection_GetRunByIndex(
    ExternalInterface_SkiRunCollection* collection,
    uint8_t index);
const uint8_t IExternalInterface_SkiRunCollection_GetRunCount(
    ExternalInterface_SkiRunCollection* collection);
void IExternalInterface_SkiRunCollection_Destroy(
    ExternalInterface_SkiRunCollection* collection);

 

 

In this example, ExternalInterface_GetSkiResort() is the only thing the caller needs to know about. We could even put it in a separate file to set it apart. Use becomes more straightforward as well:

 

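auto resort = ExternalInterface_GetSkiResort("Tahoe");
auto blackRuns = resort.GetRunsByDifficulty(resort.Resort, SkiRunDifficultyRating::Black);
//Process the runs list
//The resort struct has no Destroy member; the resort stays owned by the library.
//Only the run collection was allocated for us, so only it needs destroying.
blackRuns.Destroy(blackRuns.Collection);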
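If the caller happens to be C++, even the explicit Destroy call can be tucked away. Below is a minimal sketch of a caller-side guard, assuming a trimmed copy of the collection struct. SkiRunCollectionGuard and the fake library functions are hypothetical and exist only so the sketch can run without the real module:

```cpp
#include <cstdint>

extern "C"
{
    typedef void ExternalInterface_SkiRunCollection;
    // Trimmed copy of the interface struct, keeping only what this sketch needs.
    typedef struct IExternalInterface_SkiRunCollection
    {
        ExternalInterface_SkiRunCollection* Collection;
        uint8_t (*GetRunCount)(ExternalInterface_SkiRunCollection* collection);
        void (*Destroy)(ExternalInterface_SkiRunCollection* collection);
    } IExternalInterface_SkiRunCollection;
}

// Hypothetical caller-side guard: calls Destroy exactly once when it goes out of scope.
class SkiRunCollectionGuard
{
public:
    explicit SkiRunCollectionGuard(IExternalInterface_SkiRunCollection table)
        : Table(table) {}
    ~SkiRunCollectionGuard()
    {
        if (Table.Destroy != nullptr && Table.Collection != nullptr)
        {
            Table.Destroy(Table.Collection);
        }
    }
    // Non-copyable, so two guards can never destroy the same collection.
    SkiRunCollectionGuard(const SkiRunCollectionGuard&) = delete;
    SkiRunCollectionGuard& operator=(const SkiRunCollectionGuard&) = delete;

    uint8_t GetRunCount() const { return Table.GetRunCount(Table.Collection); }

private:
    IExternalInterface_SkiRunCollection Table;
};

// Fake library side, only so the guard can be exercised in isolation.
static int DestroyCallCount = 0;
static uint8_t FakeGetRunCount(ExternalInterface_SkiRunCollection*) { return 2; }
static void FakeDestroy(ExternalInterface_SkiRunCollection*) { ++DestroyCallCount; }

IExternalInterface_SkiRunCollection MakeFakeCollection()
{
    static int placeholder = 0; // Any non-null address works as the opaque handle here.
    IExternalInterface_SkiRunCollection result;
    result.Collection = &placeholder;
    result.GetRunCount = FakeGetRunCount;
    result.Destroy = FakeDestroy;
    return result;
}
```

A caller would wrap the value returned by GetRunsByDifficulty in the guard and let scope exit do the cleanup. Callers in C or other languages still need the explicit Destroy call.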

 

The caller no longer has to worry about setup, the operations possible on a given return structure are included in the structure, and the cleanup details are abstracted away. The complexity has been moved inside the library:

 

IExternalInterface_SkiResort ExternalInterface_GetSkiResort(const char* name)
{
    //Bind by reference: a local copy would be destroyed when this function returns,
    //leaving result.Resort dangling. The lookup keeps ownership of the resort.
    const SkiResort& resort = SkiResortLookup::GetResortByName(name);
    IExternalInterface_SkiResort result;
    result.Resort = const_cast<SkiResort*>(&resort);
    result.GetName = IExternalInterface_SkiResort_GetName;
    result.GetRunsByDifficulty = ExternalInterface_SkiResort_GetRunsByDifficulty;
    return result;
}

const char* IExternalInterface_SkiResort_GetName(ExternalInterface_SkiResort* resort)
{
    return static_cast<SkiResort*>(resort)->GetName().c_str();
}

IExternalInterface_SkiRunCollection ExternalInterface_SkiResort_GetRunsByDifficulty(
    ExternalInterface_SkiResort* resort,
    SkiRunDifficultyRating difficulty)
{
    auto runs = static_cast<SkiResort*>(resort)->GetRunsByDifficulty(difficulty);
    //We could have the class return a heap-allocated collection,
    //but that would tightly couple it to the interface.
    //Also, we would have to return a raw pointer from SkiResort.GetRunsByDifficulty(),
    //but that is poor C++ practice.
    auto heapAllocatedRuns = new std::vector<SkiRun>(runs);
    IExternalInterface_SkiRunCollection result;
    result.Collection = heapAllocatedRuns;
    result.GetRunByIndex = IExternalInterface_SkiRunCollection_GetRunByIndex;
    result.GetRunCount = IExternalInterface_SkiRunCollection_GetRunCount;
    result.Destroy = IExternalInterface_SkiRunCollection_Destroy;
    return result;
}

ExternalInterface_SkiRun* IExternalInterface_SkiRunCollection_GetRunByIndex(
    ExternalInterface_SkiRunCollection* collection,
    uint8_t index)
{
    //The returned run is owned by the collection and is destroyed along with it
    return &(*static_cast<std::vector<SkiRun>*>(collection))[index];
}

const uint8_t IExternalInterface_SkiRunCollection_GetRunCount(
    ExternalInterface_SkiRunCollection* collection)
{
    //Assumes the collection holds fewer than 256 runs
    return static_cast<uint8_t>(static_cast<std::vector<SkiRun>*>(collection)->size());
}

void IExternalInterface_SkiRunCollection_Destroy(
    ExternalInterface_SkiRunCollection* collection)
{
    delete static_cast<std::vector<SkiRun>*>(collection);
}
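IExternalInterface_SkiRun is declared in the interface, but nothing shown here returns one, so the caller only ever sees the opaque ExternalInterface_SkiRun* from GetRunByIndex. Here is a minimal sketch of how the library might turn that handle into the full struct. ExternalInterface_WrapSkiRun and the two accessor functions are hypothetical names, and the SkiRun class is a trimmed stand-in whose GetName() returns a reference so that the c_str() pointer stays valid:

```cpp
#include <string>
#include <utility>

enum SkiRunDifficultyRating { Green, Blue, Black, DoubleBlack, Orange };

// Trimmed stand-in for the SkiRun class from the top of the post.
class SkiRun
{
public:
    SkiRun(std::string name, SkiRunDifficultyRating difficulty)
        : Name(std::move(name)), Difficulty(difficulty) {}
    const std::string& GetName() const { return Name; }
    SkiRunDifficultyRating GetDifficulty() const { return Difficulty; }
private:
    std::string Name;
    SkiRunDifficultyRating Difficulty;
};

extern "C"
{
    typedef void ExternalInterface_SkiRun;
    typedef struct IExternalInterface_SkiRun
    {
        ExternalInterface_SkiRun* Run;
        const char* (*GetName)(ExternalInterface_SkiRun* run);
        SkiRunDifficultyRating (*GetDifficulty)(ExternalInterface_SkiRun* run);
    } IExternalInterface_SkiRun;

    // Hypothetical accessors, reached only through the function pointers above.
    const char* IExternalInterface_SkiRun_GetName(ExternalInterface_SkiRun* run);
    SkiRunDifficultyRating IExternalInterface_SkiRun_GetDifficulty(ExternalInterface_SkiRun* run);

    // Hypothetical: turns an opaque handle from GetRunByIndex into a usable struct.
    IExternalInterface_SkiRun ExternalInterface_WrapSkiRun(ExternalInterface_SkiRun* run);
}

const char* IExternalInterface_SkiRun_GetName(ExternalInterface_SkiRun* run)
{
    // Points into the SkiRun itself, so it is valid only while the owning collection lives.
    return static_cast<SkiRun*>(run)->GetName().c_str();
}

SkiRunDifficultyRating IExternalInterface_SkiRun_GetDifficulty(ExternalInterface_SkiRun* run)
{
    return static_cast<SkiRun*>(run)->GetDifficulty();
}

IExternalInterface_SkiRun ExternalInterface_WrapSkiRun(ExternalInterface_SkiRun* run)
{
    IExternalInterface_SkiRun result;
    result.Run = run;
    result.GetName = IExternalInterface_SkiRun_GetName;
    result.GetDifficulty = IExternalInterface_SkiRun_GetDifficulty;
    return result;
}
```

An alternative with the same effect would be to have GetRunByIndex return IExternalInterface_SkiRun directly, at the cost of changing the collection's function pointer signature.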

 

This is a lot of boilerplate. Is it worth it for the reduced complexity for the caller? I'm not sure. I've been trying to think of objective reasons the latter might be better, but every performance or memory management situation I have thought of can be handled with either approach. It seems to boil down to ease of use for the caller.
