119 lines
6.2 KiB
Markdown
119 lines
6.2 KiB
Markdown
{{#title Other Rust–C++ interop tools — Rust ♡ C++}}
|
||
# Context: other Rust–C++ interop tools
|
||
|
||
When it comes to interacting with an idiomatic Rust API or idiomatic C++ API
|
||
from the other language, the generally applicable approaches outside of the CXX
|
||
crate are:
|
||
|
||
- Build a C-compatible wrapper around the code (expressed using `extern "C"`
|
||
signatures, primitives, C-compatible structs, raw pointers). Translate that
|
||
manually to equivalent `extern "C"` declarations in the other language and
|
||
keep them in sync. Preferably, build a safe/idiomatic wrapper around the
|
||
translated `extern "C"` signatures for callers to use.
|
||
|
||
- Build a C wrapper around the C++ code and use **[bindgen]** to translate that
|
||
programmatically to `extern "C"` Rust signatures. Preferably, build a
|
||
safe/idiomatic Rust wrapper on top.
|
||
|
||
- Build a C-compatible Rust wrapper around the Rust code and use **[cbindgen]**
|
||
to translate that programmatically to an `extern "C"` C++ header. Preferably,
|
||
build an idiomatic C++ wrapper.
|
||
|
||
**If the code you are binding is already *"effectively C"*, the above has you
|
||
covered.** You should use bindgen or cbindgen, or manually translated C
|
||
signatures if there aren't too many and they seldom change.
|
||
|
||
[bindgen]: https://github.com/rust-lang/rust-bindgen
|
||
[cbindgen]: https://github.com/eqrion/cbindgen
|
||
|
||
## C++ vs C
|
||
|
||
Bindgen has some basic support for C++. It can reason about classes, member
|
||
functions, and the layout of templated types. However, everything it does
|
||
related to C++ is best-effort only. Bindgen starts from a point of wanting to
|
||
generate declarations for everything, so any C++ detail that it hasn't
|
||
implemented will cause a crash if you are lucky ([bindgen#388]) or more likely
|
||
silently emit an incompatible signature ([bindgen#380], [bindgen#607],
|
||
[bindgen#652], [bindgen#778], [bindgen#1194]) which will do arbitrary
|
||
memory-unsafe things at runtime whenever called.
|
||
|
||
[bindgen#388]: https://github.com/rust-lang/rust-bindgen/issues/388
|
||
[bindgen#380]: https://github.com/rust-lang/rust-bindgen/issues/380
|
||
[bindgen#607]: https://github.com/rust-lang/rust-bindgen/issues/607
|
||
[bindgen#652]: https://github.com/rust-lang/rust-bindgen/issues/652
|
||
[bindgen#778]: https://github.com/rust-lang/rust-bindgen/issues/778
|
||
[bindgen#1194]: https://github.com/rust-lang/rust-bindgen/issues/1194
|
||
|
||
Thus using bindgen correctly requires not just juggling all your pointers
|
||
correctly at the language boundary, but also understanding ABI details and their
|
||
workarounds and reliably applying them. For example, the programmer will
|
||
discover that their program sometimes segfaults if they call a function that
|
||
returns std::unique\_ptr\<T\> through bindgen. Why? Because unique\_ptr, despite
|
||
being "just a pointer", has a different ABI than a pointer or a C struct
|
||
containing a pointer ([bindgen#778]) and is not directly expressible in Rust.
|
||
Bindgen emitted something that *looks* reasonable and you will have a hell of a
|
||
time in gdb working out what went wrong. Eventually people learn to avoid
|
||
anything involving a non-trivial copy constructor, destructor, or inheritance,
|
||
and instead stick to raw pointers and primitives and trivial structs only
|
||
— in other words C.
|
||
|
||
## Geometric intuition for why there is so much opportunity for improvement
|
||
|
||
The CXX project attempts a different approach to C++ FFI.
|
||
|
||
Imagine Rust and C and C++ as three vertices of a scalene triangle, with length
|
||
of the edges being related to similarity of the languages when it comes to
|
||
library design.
|
||
|
||
The most similar pair (the shortest edge) is Rust–C++. These languages
|
||
have largely compatible concepts of things like ownership, vectors, strings,
|
||
fallibility, etc that translate clearly from signatures in either language to
|
||
signatures in the other language.
|
||
|
||
When we make a binding for an idiomatic C++ API using bindgen, and we fall down
|
||
to raw pointers and primitives and trivial structs as described above, what we
|
||
are really doing is coding the two longest edges of the triangle: getting from
|
||
C++ down to C, and C back up to Rust. The Rust–C edge always involves a
|
||
great deal of `unsafe` code, and the C++–C edge similarly requires care
|
||
just for basic memory safety. Something as basic as "how do I pass ownership of
|
||
a string to the other language?" becomes a strap-yourself-in moment,
|
||
particularly for someone not already an expert in one or both sides.
|
||
|
||
You should think of the `cxx` crate as being the midpoint of the Rust–C++
|
||
edge. Rather than coding the two long edges, you will code half the short edge
|
||
in Rust and half the short edge in C++, in both cases with the library playing
|
||
to the strengths of the Rust type system *and* the C++ type system to help
|
||
assure correctness.
|
||
|
||
If you've already been through the tutorial in the previous chapter, take a
|
||
moment to appreciate that the C++ side *really* looks like we are just writing
|
||
C++ and the Rust side *really* looks like we are just writing Rust. Anything you
|
||
could do wrong in Rust, and almost anything you could reasonably do wrong in
|
||
C++, will be caught by the compiler. This highlights that we are on the "short
|
||
edge of the triangle".
|
||
|
||
But it all still boils down to the same things: it's still FFI from one piece of
|
||
native code to another, nothing is getting serialized or allocated or
|
||
runtime-checked in between.
|
||
|
||
## Role of CXX
|
||
|
||
The role of CXX is to capture the language boundary with more fidelity than what
|
||
`extern "C"` is able to represent. You can think of CXX as being a replacement
|
||
for `extern "C"` in a sense.
|
||
|
||
From this perspective, CXX is a lower level tool than the bindgens. Just as
|
||
bindgen and cbindgen are built on top of `extern "C"`, it makes sense to think
|
||
about higher level tools built on top of CXX. Such a tool might consume a C++
|
||
header and/or Rust module (and/or IDL like Thrift) and emit the corresponding
|
||
safe cxx::bridge language boundary, leveraging CXX's static analysis and
|
||
underlying implementation of that boundary. We are beginning to see this space
|
||
explored by the [autocxx] tool, though nothing yet ready for broad use in the
|
||
way that CXX on its own is.
|
||
|
||
[autocxx]: https://github.com/google/autocxx
|
||
|
||
But note in other ways CXX is higher level than the bindgens, with rich support
|
||
for common standard library types. CXX's types serve as an intuitive vocabulary
|
||
for designing a good boundary between components in different languages.
|