================================
Using CFFI for embedding
================================

.. contents::

You can use CFFI to generate C code which exports the API of your choice
to any C application that wants to link with this C code.  This API,
which you define yourself, ends up as the API of a ``.so/.dll/.dylib``
library---or you can statically link it within a larger application.

Possible use cases:

* Exposing a library written in Python directly to C/C++ programs.

* Using Python to make a "plug-in" for an existing C/C++ program that is
  already written to load them.

* Using Python to implement part of a larger C/C++ application (with
  static linking).

* Writing a small C/C++ wrapper around Python, hiding the fact that the
  application is actually written in Python (to make a custom
  command-line interface; for distribution purposes; or simply to make
  it a bit harder to reverse-engineer the application).

The general idea is as follows:

* You write and execute a Python script, which produces a ``.c`` file
  with the API of your choice (and optionally compile it into a
  ``.so/.dll/.dylib``).  The script also gives some Python code to be
  "frozen" inside the ``.so``.

* At runtime, the C application loads this ``.so/.dll/.dylib`` (or is
  statically linked with the ``.c`` source) without having to know that
  it was produced from Python and CFFI.

* The first time a C function is called, Python is initialized and
  the frozen Python code is executed.

* The frozen Python code defines more Python functions that implement the
  C functions of your API, which are then used for all subsequent C
  function calls.

One of the goals of this approach is to be entirely independent from
the CPython C API: no ``Py_Initialize()`` nor ``PyRun_SimpleString()``
nor even ``PyObject``.  It works identically on CPython and PyPy.

This is entirely *new in version 1.5.*  (PyPy contains CFFI 1.5 since
release 5.0.)


Usage
-----

.. __: overview.html#embedding

See the `paragraph in the overview page`__ for a quick introduction.
In this section, we explain every step in more detail.  We will use
this slightly expanded example:

.. code-block:: c

    /* file plugin.h */
    typedef struct { int x, y; } point_t;
    extern int do_stuff(point_t *);

.. code-block:: c

    /* file plugin.h, Windows-friendly version */
    typedef struct { int x, y; } point_t;

    /* When including this file from ffibuilder.set_source(), the
       following macro is defined to '__declspec(dllexport)'.  When
       including this file directly from your C program, we define
       it to 'extern __declspec(dllimport)' instead.

       With non-MSVC compilers we simply define it to 'extern'.
       (The 'extern' is needed for sharing global variables;
       functions would be fine without it.  The macros always
       include 'extern': you must not repeat it when using the
       macros later.)
    */
    #ifndef CFFI_DLLEXPORT
    #  if defined(_MSC_VER)
    #    define CFFI_DLLEXPORT  extern __declspec(dllimport)
    #  else
    #    define CFFI_DLLEXPORT  extern
    #  endif
    #endif

    CFFI_DLLEXPORT int do_stuff(point_t *);

.. code-block:: python

    # file plugin_build.py
    import cffi
    ffibuilder = cffi.FFI()

    with open('plugin.h') as f:
        # read plugin.h and pass it to embedding_api(), manually
        # removing the '#' directives and the CFFI_DLLEXPORT
        data = ''.join([line for line in f if not line.startswith('#')])
        data = data.replace('CFFI_DLLEXPORT', '')
        ffibuilder.embedding_api(data)

    ffibuilder.set_source("my_plugin", r'''
        #include "plugin.h"
    ''')

    ffibuilder.embedding_init_code("""
        from my_plugin import ffi

        @ffi.def_extern()
        def do_stuff(p):
            print("adding %d and %d" % (p.x, p.y))
            return p.x + p.y
    """)

    ffibuilder.compile(target="plugin-1.5.*", verbose=True)
    # or: ffibuilder.emit_c_code("my_plugin.c")

Running the code above produces a *DLL*, i.e. a dynamically-loadable
library.  It is a file with the extension ``.dll`` on Windows,
``.dylib`` on Mac OS/X, or ``.so`` on other platforms.  As usual, it
is produced by generating some intermediate ``.c`` code and then
calling the regular platform-specific C compiler.  See below__ for
some pointers to C-level issues with using the produced library.

.. __: `Issues about using the .so`_

Here are some details about the methods used above:

* **ffibuilder.embedding_api(source):** parses the given C source, which
  declares functions that you want to be exported by the DLL.  It can
  also declare types, constants and global variables that are part of
  the C-level API of your DLL.

  The functions that are found in ``source`` will be automatically
  defined in the ``.c`` file: they will contain code that initializes
  the Python interpreter the first time any of them is called,
  followed by code to call the attached Python function (with
  ``@ffi.def_extern()``, see next point).

  The global variables, on the other hand, are not automatically
  produced.  You have to write their definition explicitly in
  ``ffibuilder.set_source()``, as regular C code (see the point after next).

* **ffibuilder.embedding_init_code(python_code):** this gives
  initialization-time Python source code.  This code is copied
  ("frozen") inside the DLL.  At runtime, the code is executed when
  the DLL is first initialized, just after Python itself is
  initialized.  This newly initialized Python interpreter has got an
  extra "built-in" module that can be loaded magically without
  accessing any files, with a line like "``from my_plugin import ffi,
  lib``".  The name ``my_plugin`` comes from the first argument to
  ``ffibuilder.set_source()``.  This module represents "the caller's C world"
  from the point of view of Python.

  The initialization-time Python code can import other modules or
  packages as usual.  You may have typical Python issues like needing
  to set up ``sys.path`` somehow manually first.

  For every function declared within ``ffibuilder.embedding_api()``, the
  initialization-time Python code or one of the modules it imports
  should use the decorator ``@ffi.def_extern()`` to attach a
  corresponding Python function to it.

  If the initialization-time Python code fails with an exception, then
  you get a traceback printed to stderr, along with more information
  to help you identify problems like wrong ``sys.path``.  If some
  function remains unattached at the time when the C code tries to
  call it, an error message is also printed to stderr and the function
  returns zero/null.

  Note that the CFFI module never calls ``exit()``, but CPython itself
  contains code that calls ``exit()``, for example if importing
  ``site`` fails.  This may be worked around in the future.

* **ffibuilder.set_source(c_module_name, c_code):** set the name of the
  module from Python's point of view.  It also gives more C code which
  will be included in the generated C code.  In trivial examples it
  can be an empty string.  It is where you would ``#include`` some
  other files, define global variables, and so on.  The macro
  ``CFFI_DLLEXPORT`` is available to this C code: it expands to the
  platform-specific way of saying "the following declaration should be
  exported from the DLL".  For example, you would put "``extern int
  my_glob;``" in ``ffibuilder.embedding_api()`` and "``CFFI_DLLEXPORT int
  my_glob = 42;``" in ``ffibuilder.set_source()``.

  Currently, any *type* declared in ``ffibuilder.embedding_api()`` must also
  be present in the ``c_code``.  This is automatic if this code
  contains a line like ``#include "plugin.h"`` in the example above.

* **ffibuilder.compile([target=...] [, verbose=True]):** make the C code and
  compile it.  By default, it produces a file called
  ``c_module_name.dll``, ``c_module_name.dylib`` or
  ``c_module_name.so``, but the default can be changed with the
  optional ``target`` keyword argument.  You can use
  ``target="foo.*"`` with a literal ``*`` to ask for a file called
  ``foo.dll`` on Windows, ``foo.dylib`` on OS/X and ``foo.so``
  elsewhere.  One reason for specifying an alternate ``target`` is to
  include characters not usually allowed in Python module names, like
  "``plugin-1.5.*``".

  For more complicated cases, you can call instead
  ``ffibuilder.emit_c_code("foo.c")`` and compile the resulting ``foo.c``
  file using other means.  CFFI's compilation logic is based on the
  standard library ``distutils`` package, which is really developed
  and tested for the purpose of making CPython extension modules; it
  might not always be appropriate for making general DLLs.  Also, just
  getting the C code is what you need if you do not want to make a
  stand-alone ``.so/.dll/.dylib`` file: this C file can be compiled
  and statically linked as part of a larger application.
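
The naming rule for ``target`` described above can be sketched in a few
lines of plain Python.  This is only an illustration of the rule as
stated in this page, not CFFI's own code: the helper name
``expected_dll_name`` is hypothetical, and we assume that a ``target``
without a trailing ``.*`` is used unchanged.

.. code-block:: python

    import sys

    def expected_dll_name(target, platform=sys.platform):
        """Return the file name that a target such as "foo.*" stands for."""
        if not target.endswith(".*"):
            return target            # assumption: explicit names are kept as-is
        if platform == "win32":
            extension = ".dll"
        elif platform == "darwin":
            extension = ".dylib"
        else:
            extension = ".so"
        # replace the literal ".*" with the platform-specific extension
        return target[:-2] + extension

    print(expected_dll_name("plugin-1.5.*"))

A build system that needs to know which file ``ffibuilder.compile()``
is going to write could use such a helper to compute the name in advance.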


More reading
------------

If you're reading this page about embedding and you are not familiar
with CFFI already, here are a few pointers to what you could read
next:

* For the ``@ffi.def_extern()`` functions, integer C types are passed
  simply as Python integers; and simple pointers-to-struct and basic
  arrays are all straightforward enough.  However, sooner or later you
  will need to read about this topic in more detail here__.

* ``@ffi.def_extern()``: see `documentation here,`__ notably on what
  happens if the Python function raises an exception.

* To create Python objects attached to C data, one common solution is
  to use ``ffi.new_handle()``.  See documentation here__.

* In embedding mode, the major direction is C code that calls Python
  functions.  This is the opposite of the regular extending mode of
  CFFI, in which the major direction is Python code calling C.  That's
  why the page `Using the ffi/lib objects`_ talks first about the
  latter, and why the direction "C code that calls Python" is
  generally referred to as "callbacks" in that page.  If you also
  need to have your Python code call C code, read more about
  `Embedding and Extending`_ below.

* ``ffibuilder.embedding_api(source)``: follows the same syntax as
  ``ffibuilder.cdef()``, `documented here.`__  You can use the "``...``"
  syntax as well, although in practice it may be less useful than it
  is for ``cdef()``.  On the other hand, it is expected that often the
  C sources that you need to give to ``ffibuilder.embedding_api()`` would be
  exactly the same as the content of some ``.h`` file that you want to
  give to users of your DLL.  That's why the example above does this::

      with open('foo.h') as f:
          ffibuilder.embedding_api(f.read())

  Note that a drawback of this approach is that ``ffibuilder.embedding_api()``
  doesn't support ``#ifdef`` directives.  You may have to use a more
  convoluted expression like::

      with open('foo.h') as f:
          lines = [line for line in f if not line.startswith('#')]
          ffibuilder.embedding_api(''.join(lines))

  As in the example above, you can also use the same ``foo.h`` from
  ``ffibuilder.set_source()``::

      ffibuilder.set_source('module_name', r'''
          #include "foo.h"
      ''')
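
The filtering of ``#`` directives mentioned above can be made a little
more robust.  The following is a minimal sketch using only the standard
library; the helper name ``header_to_cdef`` is hypothetical.  Compared
with the one-line list comprehension, it also skips indented directives
and directives continued over several lines with a trailing backslash,
and it removes the export macro:

.. code-block:: python

    def header_to_cdef(header_text, export_macro="CFFI_DLLEXPORT"):
        """Drop preprocessor directives and the export macro from a header."""
        kept = []
        in_directive = False
        for line in header_text.splitlines(keepends=True):
            if in_directive or line.lstrip().startswith("#"):
                # a directive continues on the next line if it ends with '\'
                in_directive = line.rstrip().endswith("\\")
                continue
            kept.append(line)
        return "".join(kept).replace(export_macro, "")

    HEADER = """\
    typedef struct { int x, y; } point_t;
    #ifndef CFFI_DLLEXPORT
    #  define CFFI_DLLEXPORT  extern
    #endif
    CFFI_DLLEXPORT int do_stuff(point_t *);
    """

    print(header_to_cdef(HEADER))

The result is what you would pass to ``ffibuilder.embedding_api()``.
Note that this does not evaluate the directives: declarations from both
branches of an ``#if``/``#else`` are kept, so headers with alternative
declarations still need manual care.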


.. __: using.html#working
.. __: using.html#def-extern
.. __: ref.html#ffi-new-handle
.. __: cdef.html#cdef

.. _`Using the ffi/lib objects`: using.html


Troubleshooting
---------------

* The error message

    cffi extension module 'c_module_name' has unknown version 0x2701

  means that the running Python interpreter located a CFFI version older
  than 1.5.  CFFI 1.5 or newer must be installed in the running Python.

* On PyPy, the error message

    debug: pypy_setup_home: directories 'lib-python' and 'lib_pypy' not
    found in pypy's shared library location or in any parent directory

  means that the ``libpypy-c.so`` file was found, but the standard library
  was not found from this location.  This occurs at least on some Linux
  distributions, because they put ``libpypy-c.so`` inside ``/usr/lib/``,
  instead of the way we recommend, which is: keep that file inside
  ``/opt/pypy/bin/`` and put a symlink to there from ``/usr/lib/``.
  The quickest fix is to do that change manually.


Issues about using the .so
--------------------------

This paragraph describes issues that are not necessarily specific to
CFFI.  It assumes that you have obtained the ``.so/.dylib/.dll`` file as
described above, but that you have trouble using it.  (In summary: it
is a mess.  This is my own experience, slowly built by using Google and
by listening to reports from various platforms.  Please report any
inaccuracies in this paragraph or better ways to do things.)

* The file produced by CFFI should follow this naming pattern:
  ``libmy_plugin.so`` on Linux, ``libmy_plugin.dylib`` on Mac, or
  ``my_plugin.dll`` on Windows (no ``lib`` prefix on Windows).

* First note that this file does not contain the Python interpreter
  nor the standard library of Python.  You still need it to be
  somewhere.  There are ways to compact it to a smaller number of files,
  but this is outside the scope of CFFI (please report if you used some
  of these ways successfully so that I can add some links here).

* In what we'll call the "main program", the ``.so`` can be either
  used dynamically (e.g. by calling ``dlopen()`` or ``LoadLibrary()``
  inside the main program), or at compile-time (e.g. by compiling it
  with ``gcc -lmy_plugin``).  The former case is always used if you're
  building a plugin for a program, and the program itself doesn't need
  to be recompiled.  The latter case is for making a CFFI library that
  is more tightly integrated inside the main program.

* In the case of compile-time usage: you can add the gcc
  option ``-Lsome/path/`` before ``-lmy_plugin`` to describe where the
  ``libmy_plugin.so`` is.  On some platforms, notably Linux, ``gcc``
  will complain if it can find ``libmy_plugin.so`` but not
  ``libpython27.so`` or ``libpypy-c.so``.  To fix it, you need to call
  ``LD_LIBRARY_PATH=/some/path/to/libpypy gcc``.

* When actually executing the main program, it needs to find the
  ``libmy_plugin.so`` but also ``libpython27.so`` or ``libpypy-c.so``.
  For PyPy, unpack a PyPy distribution and you get a full directory
  structure with ``libpypy-c.so`` inside a ``bin`` subdirectory, or on
  Windows ``pypy-c.dll`` inside the top directory; you must not move
  this file around, but just point to it.  One way to point to it is by
  running the main program with some environment variable:
  ``LD_LIBRARY_PATH=/some/path/to/libpypy`` on Linux,
  ``DYLD_LIBRARY_PATH=/some/path/to/libpypy`` on OS/X.

* You can avoid the ``LD_LIBRARY_PATH`` issue if you compile
  ``libmy_plugin.so`` with the path hard-coded inside in the first
  place.  On Linux, this is done by ``gcc -Wl,-rpath=/some/path``.  You
  would put this option in ``ffibuilder.set_source("my_plugin", ...,
  extra_link_args=['-Wl,-rpath=/some/path/to/libpypy'])``.  The path can
  start with ``$ORIGIN`` to mean "the directory where
  ``libmy_plugin.so`` is".  You can then specify a path relative to that
  place, like ``extra_link_args=['-Wl,-rpath=$ORIGIN/../venv/bin']``.
  (Use ``ldd libmy_plugin.so`` to look at what path is currently compiled
  in after the expansion of ``$ORIGIN``.)

  After this, you don't need ``LD_LIBRARY_PATH`` any more to locate
  ``libpython27.so`` or ``libpypy-c.so`` at runtime.  In theory it
  should also cover the call to ``gcc`` for the main program.  I wasn't
  able to make ``gcc`` happy without ``LD_LIBRARY_PATH`` on Linux if
  the rpath starts with ``$ORIGIN``, though.

* The same rpath trick might be used to let the main program find
  ``libmy_plugin.so`` in the first place without ``LD_LIBRARY_PATH``.
  (This doesn't apply if the main program uses ``dlopen()`` to load it
  as a dynamic plugin.)  You'd make the main program with ``gcc
  -Wl,-rpath=/path/to/libmyplugin``, possibly with ``$ORIGIN``.  The
  ``$`` in ``$ORIGIN`` causes various shell problems on its own: if
  using a common shell you need to say ``gcc
  -Wl,-rpath=\$ORIGIN``.  From a Makefile, you need to say
  something like ``gcc -Wl,-rpath=\$$ORIGIN``.

* On some Linux distributions, notably Debian, the ``.so`` files of
  CPython C extension modules may be compiled without saying that they
  depend on ``libpythonX.Y.so``.  This makes such Python systems
  unsuitable for embedding if the embedder uses ``dlopen(...,
  RTLD_LOCAL)``.  You get an ``undefined symbol`` error.  See
  `issue #264`__.  A workaround is to first call
  ``dlopen("libpythonX.Y.so", RTLD_LAZY|RTLD_GLOBAL)``, which will
  force ``libpythonX.Y.so`` to be loaded first.

.. __: https://foss.heptapod.net/pypy/cffi/-/issues/264


Using multiple CFFI-made DLLs
-----------------------------

Multiple CFFI-made DLLs can be used by the same process.

Note that all CFFI-made DLLs in a process share a single Python
interpreter.  The effect is the same as the one you get by trying to
build a large Python application by assembling a lot of unrelated
packages.  Some of these might be libraries that monkey-patch some
functions from the standard library, for example, which might be
unexpected from other parts.


Multithreading
--------------

Multithreading should work transparently, based on Python's standard
Global Interpreter Lock.

If two threads both try to call a C function when Python is not yet
initialized, then locking occurs.  One thread proceeds with
initialization and blocks the other thread.  The other thread will be
allowed to continue only when the execution of the initialization-time
Python code is done.

If the two threads call two *different* CFFI-made DLLs, the Python
initialization itself will still be serialized, but the two pieces of
initialization-time Python code will not.  The idea is that there is a
priori no reason for one DLL to wait for initialization of the other
DLL to be complete.

After initialization, Python's standard Global Interpreter Lock kicks
in.  The end result is that when one CPU progresses on executing
Python code, no other CPU can progress on executing more Python code
from another thread of the same process.  At regular intervals, the
lock switches to a different thread, so that no single thread should
appear to block indefinitely.
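
The initialization behaviour described above can be pictured with a
pure-Python analogy.  This is not how CFFI implements it (the real
logic lives in the generated C code); all names here are hypothetical,
and ``do_stuff()`` merely stands for an exported C function.  Several
threads call it at once, and only the first one runs the
"initialization-time code" while the others wait on the lock:

.. code-block:: python

    import threading

    _init_lock = threading.Lock()
    _initialized = False
    init_runs = []      # one entry per run of the "initialization-time code"

    def _ensure_initialized():
        global _initialized
        with _init_lock:    # other threads block here during initialization
            if not _initialized:
                init_runs.append(threading.current_thread().name)
                _initialized = True

    def do_stuff(x, y):
        _ensure_initialized()
        return x + y

    results = []
    threads = [threading.Thread(target=lambda: results.append(do_stuff(1, 2)))
               for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    print(len(init_runs), results)   # prints: 1 [3, 3, 3, 3]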


Testing
-------

For testing purposes, a CFFI-made DLL can be imported in a running
Python interpreter instead of being loaded like a C shared library.

You might have some issues with the file name: for example, on
Windows, Python expects the file to be called ``c_module_name.pyd``,
but the CFFI-made DLL is called ``target.dll`` instead.  The base name
``target`` is the one specified in ``ffibuilder.compile()``, and on Windows
the extension is ``.dll`` instead of ``.pyd``.  You have to rename or
copy the file, or on POSIX use a symlink.

The module then works like a regular CFFI extension module.  It is
imported with "``from c_module_name import ffi, lib``" and exposes on
the ``lib`` object all C functions.  You can test it by calling these
C functions.  The initialization-time Python code frozen inside the
DLL is executed the first time such a call is done.
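
The renaming step can be automated.  The sketch below is only one
possible approach, with a hypothetical helper name: it copies the
CFFI-made DLL to a file name that the running interpreter recognizes as
an extension module, using the suffixes listed in the standard
``importlib.machinery.EXTENSION_SUFFIXES``.  Whether the copy can then
actually be imported still depends on the platform.

.. code-block:: python

    import importlib.machinery
    import os
    import shutil

    def copy_as_extension_module(dll_path, c_module_name, dest_dir):
        """Copy a CFFI-made DLL under a name that 'import' looks for."""
        # the shortest accepted suffix is typically ".pyd" or ".so"
        suffix = min(importlib.machinery.EXTENSION_SUFFIXES, key=len)
        dest = os.path.join(dest_dir, c_module_name + suffix)
        shutil.copyfile(dll_path, dest)
        return dest

    # Example (file names are assumptions taken from the example above):
    #     copy_as_extension_module("plugin-1.5.so", "my_plugin", ".")

After the copy, ``dest_dir`` must be in ``sys.path`` for
"``from my_plugin import ffi, lib``" to find the module.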


Embedding and Extending
-----------------------

The embedding mode is not incompatible with the non-embedding mode of
CFFI.

You can use *both* ``ffibuilder.embedding_api()`` and
``ffibuilder.cdef()`` in the
same build script.  You put in the former the declarations you want to
be exported by the DLL; you put in the latter only the C functions and
types that you want to share between C and Python, but not export from
the DLL.

As an example of that, consider the case where you would like to have
a DLL-exported C function written in C directly, maybe to handle some
cases before calling Python functions.  To do that, you must *not* put
the function's signature in ``ffibuilder.embedding_api()``.  (Note that this
requires more hacks if you use ``ffibuilder.embedding_api(f.read())``.)
You must only write the custom function definition in
``ffibuilder.set_source()``, and prefix it with the macro ``CFFI_DLLEXPORT``:

.. code-block:: c

    CFFI_DLLEXPORT int myfunc(int a, int b)
    {
        /* implementation here */
    }

This function can, if it wants, invoke Python functions using the
general mechanism of "callbacks"---called this way because it is a
call from C to Python, although in this case it is not calling
anything back:

.. code-block:: python

    ffibuilder.cdef("""
        extern "Python" int mycb(int);
    """)

    ffibuilder.set_source("my_plugin", r"""

        static int mycb(int);   /* the callback: forward declaration, to make
                                   it accessible from the C code that follows */

        CFFI_DLLEXPORT int myfunc(int a, int b)
        {
            int product = a * b;   /* some custom C code */
            return mycb(product);
        }
    """)

and then the Python initialization code needs to contain the lines:

.. code-block:: python

    @ffi.def_extern()
    def mycb(x):
        print("hi, I'm called with x =", x)
        return x * 10

This ``@ffi.def_extern`` is attaching a Python function to the C
callback ``mycb()``, which in this case is not exported from the DLL.
Nevertheless, the automatic initialization of Python occurs when
``mycb()`` is called, if it happens to be the first function called
from C.  More precisely, it does not happen when ``myfunc()`` is
called: this is just a C function, with no extra code magically
inserted around it.  It only happens when ``myfunc()`` calls
``mycb()``.

As the above explanation hints, this is how ``ffibuilder.embedding_api()``
actually implements function calls that directly invoke Python code;
here, we have merely decomposed it explicitly, in order to add some
custom C code in the middle.

In case you need to force, from C code, Python to be initialized
before the first ``@ffi.def_extern()`` is called, you can do so by
calling the C function ``cffi_start_python()`` with no argument.  It
returns an integer, 0 or -1, to tell if the initialization succeeded
or not.  Currently there is no way to prevent a failing initialization
from also dumping a traceback and more information to stderr.
Note that the function ``cffi_start_python()`` is static: it must be
called from C source written inside ``ffibuilder.set_source()``.  To
call it from somewhere else, you need to make a function (with a
different non-static name) in the ``ffibuilder.set_source()`` that just
calls ``cffi_start_python()``.  The reason it is static is to avoid
naming conflicts in case you are ultimately trying to link a large C
program with more than one cffi embedded module in it.
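
As a minimal sketch of the wrapper just described, a build script could
look like the following.  The wrapper name ``my_plugin_start_python``
and the two-integer ``do_stuff()`` are hypothetical, chosen only for
this illustration; the script requires CFFI to be installed and only
emits the C source, without compiling it.

.. code-block:: python

    import cffi

    ffibuilder = cffi.FFI()

    ffibuilder.embedding_api("""
        int do_stuff(int, int);
    """)

    ffibuilder.set_source("my_plugin", r"""
        /* hypothetical non-static wrapper, callable from other C files */
        CFFI_DLLEXPORT int my_plugin_start_python(void)
        {
            return cffi_start_python();   /* 0 on success, -1 on failure */
        }
    """)

    ffibuilder.embedding_init_code("""
        from my_plugin import ffi

        @ffi.def_extern()
        def do_stuff(x, y):
            return x + y
    """)

    if __name__ == "__main__":
        ffibuilder.emit_c_code("my_plugin.c")

The main program would then declare ``int my_plugin_start_python(void);``
itself and call it early, for example to report an initialization
failure before any exported function is used.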