{{#title Tutorial — Rust ♡ C++}}
# Tutorial: CXX blobstore client

This example walks through a Rust application that calls into a C++ client of a
blobstore service. In fact we'll see calls going in both directions: Rust to C++
as well as C++ to Rust. For your own use case it may be that you need just one
of these directions.

All of the code involved in the example is shown on this page, but it's also
provided in runnable form in the *demo* directory of
<https://github.com/dtolnay/cxx>. To try it out directly, run `cargo run` from
that directory.

This tutorial assumes you've read briefly about **shared structs**, **opaque
types**, and **functions** in the [*Core concepts*](concepts.md) page.

## Creating the project

We'll use Cargo, which is the build system commonly used by open source Rust
projects. (CXX works with other build systems too; refer to chapter 5.)

Create a blank Cargo project: `mkdir cxx-demo`; `cd cxx-demo`; `cargo init`.

Edit the Cargo.toml to add a dependency on the `cxx` crate:

```toml,hidelines=...
# Cargo.toml
...[package]
...name = "cxx-demo"
...version = "0.1.0"
...edition = "2021"

[dependencies]
cxx = "1.0"
```

We'll revisit this Cargo.toml later when we get to compiling some C++ code.

## Defining the language boundary

CXX relies on a description of the function signatures that will be exposed from
each language to the other. You provide this description using `extern` blocks
in a Rust module annotated with the `#[cxx::bridge]` attribute macro.

We'll open with just the following at the top of src/main.rs and walk through
each item in detail.

```rust,noplayground
// src/main.rs

#[cxx::bridge]
mod ffi {

}
#
# fn main() {}
```

The contents of this module will be everything that needs to be agreed upon by
both sides of the FFI boundary.

## Calling a C++ function from Rust

Let's obtain an instance of the C++ blobstore client, a class `BlobstoreClient`
defined in C++.

We'll treat `BlobstoreClient` as an *opaque type* in CXX's classification so
that Rust does not need to assume anything about its implementation, not even
its size or alignment. In general, a C++ type might have a move-constructor
which is incompatible with Rust's move semantics, or may hold internal
references which cannot be modeled by Rust's borrowing system. Though there are
alternatives, the easiest way to not care about any such thing on an FFI
boundary is to require no knowledge about a type by treating it as opaque.

Opaque types may only be manipulated behind an indirection such as a reference
`&`, a Rust `Box`, or a `UniquePtr` (Rust binding of `std::unique_ptr`). We'll
add a function through which C++ can return a `std::unique_ptr<BlobstoreClient>`
to Rust.

```rust,noplayground
// src/main.rs

#[cxx::bridge]
mod ffi {
    unsafe extern "C++" {
        include!("cxx-demo/include/blobstore.h");

        type BlobstoreClient;

        fn new_blobstore_client() -> UniquePtr<BlobstoreClient>;
    }
}

fn main() {
    let client = ffi::new_blobstore_client();
}
```

The nature of `unsafe` extern blocks is clarified in more detail in the
[*extern "C++"*](extern-c++.md) chapter. In brief: the programmer is **not**
promising that the signatures they have typed in are accurate; that would be
unreasonable. CXX performs static assertions that the signatures exactly match
what is declared in C++. Rather, the programmer is only on the hook for things
that C++'s semantics are not precise enough to capture, i.e. things that would
only be represented at most by comments in the C++ code. In this case, it's
whether `new_blobstore_client` is safe or unsafe to call. If that function said
something like "must be called at most once or we'll stomp yer memery", Rust
would instead want to expose it as `unsafe fn new_blobstore_client`, this time
inside a safe `extern "C++"` block because the programmer is no longer on the
hook for any safety claim about the signature.

If you build this file right now with `cargo build`, the build will fail because
we haven't written a C++ implementation of `new_blobstore_client`, nor told
Cargo how to link it into the resulting binary. You'll see an error from the
linker like this:

```console
error: linking with `cc` failed: exit code: 1
 |
 = /bin/ld: target/debug/deps/cxx-demo-7cb7fddf3d67d880.rcgu.o: in function `cxx_demo::ffi::new_blobstore_client':
   src/main.rs:1: undefined reference to `cxxbridge1$new_blobstore_client'
   collect2: error: ld returned 1 exit status
```

## Adding in the C++ code

In CXX's integration with Cargo, all #include paths begin with a crate name by
default (when not explicitly selected otherwise by a crate; see
`CFG.include_prefix` in chapter 5). That's why we see
`include!("cxx-demo/include/blobstore.h")` above &mdash; we'll be putting the
C++ header at relative path `include/blobstore.h` within the Rust crate. If your
crate is named something other than `cxx-demo` according to the `name` field in
Cargo.toml, you will need to use that name everywhere in place of `cxx-demo`
throughout this tutorial.

```cpp
// include/blobstore.h

#pragma once
#include <memory>

class BlobstoreClient {
public:
  BlobstoreClient();
};

std::unique_ptr<BlobstoreClient> new_blobstore_client();
```

```cpp
// src/blobstore.cc

#include "cxx-demo/include/blobstore.h"

BlobstoreClient::BlobstoreClient() {}

std::unique_ptr<BlobstoreClient> new_blobstore_client() {
  return std::unique_ptr<BlobstoreClient>(new BlobstoreClient());
}
```

Using `std::make_unique` would work too, as long as you select C++14 by calling
`.std("c++14")` in the build script, as described later on.

The placement in *include/* and *src/* is not significant; you can place C++
code anywhere else in the crate as long as you use the right paths throughout
the tutorial.

Be aware that *CXX does not look at any of these files.* You're free to put
arbitrary C++ code in here, #include your own libraries, etc. All we do is emit
static assertions against what you provide in the headers.

## Compiling the C++ code with Cargo

Cargo has a [build scripts] feature suitable for compiling non-Rust code.

We need to introduce a new build-time dependency on CXX's C++ code generator in
Cargo.toml:

```toml,hidelines=...
# Cargo.toml
...[package]
...name = "cxx-demo"
...version = "0.1.0"
...edition = "2021"

[dependencies]
cxx = "1.0"

[build-dependencies]
cxx-build = "1.0"
```

Then add a build.rs build script adjacent to Cargo.toml to run the cxx-build
code generator and C++ compiler. The relevant arguments are the path to the Rust
source file containing the cxx::bridge language boundary definition, and the
paths to any additional C++ source files to be compiled during the Rust crate's
build.

```rust,noplayground
// build.rs

fn main() {
    cxx_build::bridge("src/main.rs")
        .file("src/blobstore.cc")
        .compile("cxx-demo");

    println!("cargo:rerun-if-changed=src/main.rs");
    println!("cargo:rerun-if-changed=src/blobstore.cc");
    println!("cargo:rerun-if-changed=include/blobstore.h");
}
```

This build.rs would also be where you set up C++ compiler flags, for example if
you'd like to have access to `std::make_unique` from C++14. See the page on
***[Cargo-based builds](build/cargo.md)*** for more details about CXX's Cargo
integration.

```rust,noplayground
# // build.rs
#
# fn main() {
    cxx_build::bridge("src/main.rs")
        .file("src/blobstore.cc")
        .std("c++14")
        .compile("cxx-demo");
# }
```

[build scripts]: https://doc.rust-lang.org/cargo/reference/build-scripts.html

The project should now build and run successfully, though not do anything useful
yet.

```console
cxx-demo$  cargo run
  Compiling cxx-demo v0.1.0
  Finished dev [unoptimized + debuginfo] target(s) in 0.34s
  Running `target/debug/cxx-demo`

cxx-demo$
```

## Calling a Rust function from C++

Our C++ blobstore supports a `put` operation for a discontiguous buffer upload.
For example we might be uploading snapshots of a circular buffer which would
tend to consist of 2 pieces, or fragments of a file spread across memory for
some other reason (like a rope data structure).

We'll express this by handing off an iterator over contiguous borrowed chunks.
This loosely resembles the API of the widely used `bytes` crate's `Buf` trait.
During a `put`, we'll make C++ call back into Rust to obtain contiguous chunks
of the upload (all with no copying or allocation on the language boundary). In
reality the C++ client might contain some sophisticated batching of chunks
and/or parallel uploading that all of this ties into.

```rust,noplayground
// src/main.rs

#[cxx::bridge]
mod ffi {
    extern "Rust" {
        type MultiBuf;

        fn next_chunk(buf: &mut MultiBuf) -> &[u8];
    }

    unsafe extern "C++" {
        include!("cxx-demo/include/blobstore.h");

        type BlobstoreClient;

        fn new_blobstore_client() -> UniquePtr<BlobstoreClient>;
        fn put(&self, parts: &mut MultiBuf) -> u64;
    }
}
#
# fn main() {
#     let client = ffi::new_blobstore_client();
# }
```

Any signature having a `self` parameter (the Rust name for C++'s `this`) is
considered a method / non-static member function. If there is only one `type` in
the surrounding extern block, it'll be a method of that type. If there is more
than one `type`, you can disambiguate which one a method belongs to by writing
`self: &BlobstoreClient` in the argument list.

As usual, now we need to provide Rust definitions of everything declared by the
`extern "Rust"` block and a C++ definition of the new signature declared by the
`extern "C++"` block.

```rust,noplayground
// src/main.rs
#
# #[cxx::bridge]
# mod ffi {
#     extern "Rust" {
#         type MultiBuf;
#
#         fn next_chunk(buf: &mut MultiBuf) -> &[u8];
#     }
#
#     unsafe extern "C++" {
#         include!("cxx-demo/include/blobstore.h");
#
#         type BlobstoreClient;
#
#         fn new_blobstore_client() -> UniquePtr<BlobstoreClient>;
#         fn put(&self, parts: &mut MultiBuf) -> u64;
#     }
# }

// An iterator over contiguous chunks of a discontiguous file object. Toy
// implementation uses a Vec<Vec<u8>> but in reality this might be iterating
// over some more complex Rust data structure like a rope, or maybe loading
// chunks lazily from somewhere.
pub struct MultiBuf {
    chunks: Vec<Vec<u8>>,
    pos: usize,
}

pub fn next_chunk(buf: &mut MultiBuf) -> &[u8] {
    let next = buf.chunks.get(buf.pos);
    buf.pos += 1;
    next.map_or(&[], Vec::as_slice)
}
#
# fn main() {
#     let client = ffi::new_blobstore_client();
# }
```

```cpp,hidelines=...
// include/blobstore.h

...#pragma once
...#include <cstdint>
...#include <memory>
...
struct MultiBuf;

class BlobstoreClient {
public:
  BlobstoreClient();
  uint64_t put(MultiBuf &buf) const;
};
...
...std::unique_ptr<BlobstoreClient> new_blobstore_client();
```

In blobstore.cc we're able to call the Rust `next_chunk` function, exposed to
C++ by a header `main.rs.h` generated by the CXX code generator. In CXX's Cargo
integration this generated header has a path containing the crate name, the
relative path of the Rust source file within the crate, and a `.rs.h` extension.
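
To make the naming rule above concrete, here is a tiny sketch that assembles the include path from its parts. The function `generated_header_path` is hypothetical and exists only for illustration; CXX computes this path internally and does not expose a helper by this name. The sketch also assumes a non-empty include prefix, which by default is the crate name.

```rust
// Hypothetical helper, for illustration only: builds the include path of the
// header that the CXX code generator emits for a given bridge source file.
//
// `include_prefix` is the crate name unless the crate has overridden it.
// `bridge_file` is the path of the Rust source file relative to the crate root.
pub fn generated_header_path(include_prefix: &str, bridge_file: &str) -> String {
    // Prefix, then the relative path of the Rust file, then an extra `.h`,
    // which together with the file's own `.rs` gives the `.rs.h` extension.
    format!("{}/{}.h", include_prefix, bridge_file)
}
```

For this tutorial's crate, `generated_header_path("cxx-demo", "src/main.rs")` gives `cxx-demo/src/main.rs.h`, which is the path included from blobstore.cc.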

```cpp,hidelines=...
// src/blobstore.cc

#include "cxx-demo/include/blobstore.h"
#include "cxx-demo/src/main.rs.h"
#include <functional>
#include <string>
...
...BlobstoreClient::BlobstoreClient() {}
...
...std::unique_ptr<BlobstoreClient> new_blobstore_client() {
...  return std::make_unique<BlobstoreClient>();
...}

// Upload a new blob and return a blobid that serves as a handle to the blob.
uint64_t BlobstoreClient::put(MultiBuf &buf) const {
  // Traverse the caller's chunk iterator.
  std::string contents;
  while (true) {
    auto chunk = next_chunk(buf);
    if (chunk.size() == 0) {
      break;
    }
    contents.append(reinterpret_cast<const char *>(chunk.data()), chunk.size());
  }

  // Pretend we did something useful to persist the data.
  auto blobid = std::hash<std::string>{}(contents);
  return blobid;
}
```
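
To see what the loop in `put` does without crossing the language boundary, here is a pure-Rust sketch of the same traversal. The function `put_model` is hypothetical and not part of the demo. It uses Rust's `DefaultHasher` as a stand-in for `std::hash<std::string>`, so the ids it produces will not match the ones printed by the C++ code.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Same toy type and function as in src/main.rs, repeated here so that the
// sketch compiles on its own.
pub struct MultiBuf {
    chunks: Vec<Vec<u8>>,
    pos: usize,
}

pub fn next_chunk(buf: &mut MultiBuf) -> &[u8] {
    let next = buf.chunks.get(buf.pos);
    buf.pos += 1;
    next.map_or(&[], Vec::as_slice)
}

// Hypothetical Rust model of the C++ `BlobstoreClient::put` loop. Returns the
// computed id together with the concatenated contents.
pub fn put_model(buf: &mut MultiBuf) -> (u64, Vec<u8>) {
    let mut contents = Vec::new();
    loop {
        let chunk = next_chunk(buf);
        // An empty chunk is the end-of-input signal, as in the C++ loop.
        if chunk.is_empty() {
            break;
        }
        contents.extend_from_slice(chunk);
    }

    // Assumption: DefaultHasher stands in for std::hash<std::string>.
    let mut hasher = DefaultHasher::new();
    contents.hash(&mut hasher);
    (hasher.finish(), contents)
}
```

Two things follow from this shape. The id depends only on the concatenated bytes, so splitting the same data into different chunks yields the same id. And because an empty chunk signals the end, an empty `Vec` in the middle of `chunks` would cut the upload short; a real implementation would likely want a different end-of-input signal.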

This is now ready to use. :)

```rust,noplayground
// src/main.rs
#
# #[cxx::bridge]
# mod ffi {
#     extern "Rust" {
#         type MultiBuf;
#
#         fn next_chunk(buf: &mut MultiBuf) -> &[u8];
#     }
#
#     unsafe extern "C++" {
#         include!("cxx-demo/include/blobstore.h");
#
#         type BlobstoreClient;
#
#         fn new_blobstore_client() -> UniquePtr<BlobstoreClient>;
#         fn put(&self, parts: &mut MultiBuf) -> u64;
#     }
# }
#
# pub struct MultiBuf {
#     chunks: Vec<Vec<u8>>,
#     pos: usize,
# }
# pub fn next_chunk(buf: &mut MultiBuf) -> &[u8] {
#     let next = buf.chunks.get(buf.pos);
#     buf.pos += 1;
#     next.map_or(&[], Vec::as_slice)
# }

fn main() {
    let client = ffi::new_blobstore_client();

    // Upload a blob.
    let chunks = vec![b"fearless".to_vec(), b"concurrency".to_vec()];
    let mut buf = MultiBuf { chunks, pos: 0 };
    let blobid = client.put(&mut buf);
    println!("blobid = {}", blobid);
}
```

```console
cxx-demo$  cargo run
  Compiling cxx-demo v0.1.0
  Finished dev [unoptimized + debuginfo] target(s) in 0.41s
  Running `target/debug/cxx-demo`

blobid = 9851996977040795552
```

## Interlude: What gets generated?

For the curious, it's easy to look behind the scenes at what CXX has done to
make these function calls work. You shouldn't need to do this during normal
usage of CXX, but for the purpose of this tutorial it can be instructive.

CXX comprises *two* code generators: a Rust one (which is the cxx::bridge
attribute procedural macro) and a C++ one.

### Rust generated code

It's easiest to view the output of the procedural macro by installing
[cargo-expand]. Then run `cargo expand ::ffi` to macro-expand the `mod ffi`
module.

[cargo-expand]: https://github.com/dtolnay/cargo-expand

```console
cxx-demo$  cargo install cargo-expand
cxx-demo$  cargo expand ::ffi
```

You'll see some deeply unpleasant code involving `#[repr(C)]`, `#[link_name]`,
and `#[export_name]`.

### C++ generated code

For debugging convenience, `cxx_build` symlinks all generated C++ code into
Cargo's target directory under *target/cxxbridge/*.

```console
cxx-demo$  exa -T target/cxxbridge/
target/cxxbridge
├── cxx-demo
│  └── src
│     ├── main.rs.cc -> ../../../debug/build/cxx-demo-11c6f678ce5c3437/out/cxxbridge/sources/cxx-demo/src/main.rs.cc
│     └── main.rs.h -> ../../../debug/build/cxx-demo-11c6f678ce5c3437/out/cxxbridge/include/cxx-demo/src/main.rs.h
└── rust
   └── cxx.h -> ~/.cargo/registry/src/github.com-1ecc6299db9ec823/cxx-1.0.0/include/cxx.h
```

In those files you'll see declarations or templates of any CXX Rust types
present in your language boundary (like `rust::Slice<T>` for `&[T]`) and `extern
"C"` signatures corresponding to your extern functions.

If it fits your workflow better, the CXX C++ code generator is also available as
a standalone executable which outputs generated code to stdout.

```console
cxx-demo$  cargo install cxxbridge-cmd
cxx-demo$  cxxbridge src/main.rs
```

## Shared data structures

So far the calls in both directions above only used **opaque types**, not
**shared structs**.

Shared structs are data structures whose complete definition is visible to both
languages, making it possible to pass them by value across the language
boundary. Shared structs translate to a C++ aggregate-initialization compatible
struct exactly matching the layout of the Rust one.

As the last step of this demo, we'll use a shared struct `BlobMetadata` to pass
metadata about blobs between our Rust application and C++ blobstore client.

```rust,noplayground
// src/main.rs

#[cxx::bridge]
mod ffi {
    struct BlobMetadata {
        size: usize,
        tags: Vec<String>,
    }

    extern "Rust" {
        // ...
#         type MultiBuf;
#
#         fn next_chunk(buf: &mut MultiBuf) -> &[u8];
    }

    unsafe extern "C++" {
        // ...
#         include!("cxx-demo/include/blobstore.h");
#
#         type BlobstoreClient;
#
#         fn new_blobstore_client() -> UniquePtr<BlobstoreClient>;
#         fn put(&self, parts: &mut MultiBuf) -> u64;
        fn tag(&self, blobid: u64, tag: &str);
        fn metadata(&self, blobid: u64) -> BlobMetadata;
    }
}
#
# pub struct MultiBuf {
#     chunks: Vec<Vec<u8>>,
#     pos: usize,
# }
# pub fn next_chunk(buf: &mut MultiBuf) -> &[u8] {
#     let next = buf.chunks.get(buf.pos);
#     buf.pos += 1;
#     next.map_or(&[], Vec::as_slice)
# }

fn main() {
    let client = ffi::new_blobstore_client();

    // Upload a blob.
    let chunks = vec![b"fearless".to_vec(), b"concurrency".to_vec()];
    let mut buf = MultiBuf { chunks, pos: 0 };
    let blobid = client.put(&mut buf);
    println!("blobid = {}", blobid);

    // Add a tag.
    client.tag(blobid, "rust");

    // Read back the tags.
    let metadata = client.metadata(blobid);
    println!("tags = {:?}", metadata.tags);
}
```

```cpp,hidelines=...
// include/blobstore.h

#pragma once
#include "rust/cxx.h"
...#include <memory>

struct MultiBuf;
struct BlobMetadata;

class BlobstoreClient {
public:
  BlobstoreClient();
  uint64_t put(MultiBuf &buf) const;
  void tag(uint64_t blobid, rust::Str tag) const;
  BlobMetadata metadata(uint64_t blobid) const;

private:
  class impl;
  std::shared_ptr<impl> impl;
};
...
...std::unique_ptr<BlobstoreClient> new_blobstore_client();
```

```cpp,hidelines=...
// src/blobstore.cc

#include "cxx-demo/include/blobstore.h"
#include "cxx-demo/src/main.rs.h"
#include <algorithm>
#include <functional>
#include <set>
#include <string>
#include <unordered_map>

// Toy implementation of an in-memory blobstore.
//
// In reality the implementation of BlobstoreClient could be a large
// complex C++ library.
class BlobstoreClient::impl {
  friend BlobstoreClient;
  using Blob = struct {
    std::string data;
    std::set<std::string> tags;
  };
  std::unordered_map<uint64_t, Blob> blobs;
};

BlobstoreClient::BlobstoreClient() : impl(new class BlobstoreClient::impl) {}
...
...// Upload a new blob and return a blobid that serves as a handle to the blob.
...uint64_t BlobstoreClient::put(MultiBuf &buf) const {
...  // Traverse the caller's chunk iterator.
...  std::string contents;
...  while (true) {
...    auto chunk = next_chunk(buf);
...    if (chunk.size() == 0) {
...      break;
...    }
...    contents.append(reinterpret_cast<const char *>(chunk.data()), chunk.size());
...  }
...
...  // Insert into map and provide caller the handle.
...  auto blobid = std::hash<std::string>{}(contents);
...  impl->blobs[blobid] = {std::move(contents), {}};
...  return blobid;
...}

// Add tag to an existing blob.
void BlobstoreClient::tag(uint64_t blobid, rust::Str tag) const {
  impl->blobs[blobid].tags.emplace(tag);
}

// Retrieve metadata about a blob.
BlobMetadata BlobstoreClient::metadata(uint64_t blobid) const {
  BlobMetadata metadata{};
  auto blob = impl->blobs.find(blobid);
  if (blob != impl->blobs.end()) {
    metadata.size = blob->second.data.size();
    std::for_each(blob->second.tags.cbegin(), blob->second.tags.cend(),
                  [&](auto &t) { metadata.tags.emplace_back(t); });
  }
  return metadata;
}
...
...std::unique_ptr<BlobstoreClient> new_blobstore_client() {
...  return std::make_unique<BlobstoreClient>();
...}
```

```console
cxx-demo$  cargo run
  Running `target/debug/cxx-demo`

blobid = 9851996977040795552
tags = ["rust"]
```
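
If the C++ side is unfamiliar, the behavior of the toy blobstore can be modeled in plain Rust. The sketch below is a hypothetical stand-in, not code from the demo: `BlobstoreModel` and its `insert` method are names made up for illustration, and it takes `&mut self` where the C++ class uses `const` methods over a shared `impl` pointer. A `BTreeSet` plays the role of `std::set`, so tags are de-duplicated and come back sorted.

```rust
use std::collections::{BTreeSet, HashMap};

// Counterpart of the anonymous `Blob` struct in blobstore.cc.
#[derive(Default)]
struct Blob {
    data: Vec<u8>,
    tags: BTreeSet<String>,
}

// Plain-Rust counterpart of the shared struct declared in the bridge.
#[derive(Debug, Default, PartialEq)]
pub struct BlobMetadata {
    pub size: usize,
    pub tags: Vec<String>,
}

// Hypothetical model of the in-memory C++ blobstore.
#[derive(Default)]
pub struct BlobstoreModel {
    blobs: HashMap<u64, Blob>,
}

impl BlobstoreModel {
    // Models the tail end of `put`: store the data under the given id,
    // replacing any previous blob (and its tags) with that id.
    pub fn insert(&mut self, blobid: u64, data: Vec<u8>) {
        self.blobs.insert(blobid, Blob { data, tags: BTreeSet::new() });
    }

    // Like `operator[]` on std::unordered_map, this creates an empty blob
    // when the id is not present yet.
    pub fn tag(&mut self, blobid: u64, tag: &str) {
        self.blobs.entry(blobid).or_default().tags.insert(tag.to_owned());
    }

    // Unknown ids yield zeroed metadata, matching `BlobMetadata metadata{}`.
    pub fn metadata(&self, blobid: u64) -> BlobMetadata {
        match self.blobs.get(&blobid) {
            Some(blob) => BlobMetadata {
                size: blob.data.len(),
                tags: blob.tags.iter().cloned().collect(),
            },
            None => BlobMetadata::default(),
        }
    }
}
```

One detail this makes visible: tagging an id that was never uploaded silently creates an empty blob, because that is what `impl->blobs[blobid]` does in the C++ code. A production client would probably report an error instead.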

*You've now seen all the code involved in the tutorial. It's available all
together in runnable form in the* demo *directory of
<https://github.com/dtolnay/cxx>. You can run it directly, without working
through the steps above, by running `cargo run` from that directory.*

<br>

# Takeaways

The key contribution of CXX is that it gives you Rust&ndash;C++ interop in which
*all* of the Rust side of the code you write *really* looks like you are just
writing normal Rust, and the C++ side *really* looks like you are just writing
normal C++.

You've seen in this tutorial that none of the code involved feels like C or like
the usual perilous "FFI glue" prone to leaks or memory safety flaws.

An expressive system of opaque types, shared types, and key standard library
type bindings enables API design on the language boundary that captures the
proper ownership and borrowing contracts of the interface.

CXX plays to the strengths of the Rust type system *and* C++ type system *and*
the programmer's intuitions. An individual working on the C++ side without a
Rust background, or the Rust side without a C++ background, will be able to
apply all their usual intuitions and best practices about development in their
language to maintain a correct FFI.

<br><br>