author Roland Illig <roland.illig@gmx.de> 2017-06-06 00:06:38 +0200
committer Behdad Esfahbod <behdad@behdad.org> 2017-06-05 18:06:38 -0400
commit 6b0889d9b0357ef1409c9e28e98d706658acddfd (patch)
tree 72309cd6d150ec41830d86e82aa45eb8bfe7a462 /doc
parent 2439bd08ff93d4dce761dd6b825917938bd35a4f (diff)
Rescue the old wiki pages from web.archive.org. (#87)
The conversion from HTML to MarkDown has been done mainly by Pandoc, with some manual cleanup afterwards. Fixes #83.
Diffstat (limited to 'doc')
6 files changed, 558 insertions, 0 deletions
diff --git a/doc/MicroTypeExpress.md b/doc/MicroTypeExpress.md
new file mode 100644
index 0000000..97f55d0
--- /dev/null
+++ b/doc/MicroTypeExpress.md
@@ -0,0 +1,42 @@
+# MicroType Express in sfntly
+This page describes sfntly’s implementation of MicroType Express.
+The MicroType Express format is [documented](http://www.w3.org/Submission/MTX/),
+and has recently become available under license terms we believe are friendly to
+open source (as well as proprietary) implementation. The most popular
+implementation of a decoder is Internet Explorer, as a compression
+option in its Embedded OpenType (EOT) format. Since MicroType Express
+gives an approximate 15% gain in compression over gzip, a web font
+service striving for maximum performance can benefit from implementing
+EOT compression.
+The current codebase in sfntly is for compression only (this is by far
+the most useful for web serving). The easiest way to get started is with
+the command line tool, sfnttool:
+sfnttool -e -x source.ttf source.eot
+The code has been tested extensively against IE, and is currently in
+production in Google Web Fonts, both for full fonts and for subsetting
+(using the text= parameter). This serving path also uses sfntly to
+compute the subsets, and is essentially the same as the -s parameter to sfnttool.
+If you’re interested in the code and details of the compression
+algorithm, here’s a bit of a guide:
+The top-level MicroType Express compression code is in MtxWriter.java
+(in the tools/conversion/eot directory). This code implements almost all
+of the MicroType Express format, including the hdmx tables, push
+sequences, and jump coding. The main feature missing is the VDMX table
+(vertical device metrics), which is very rarely used in web fonts.
+Patches are welcome -- possible areas include: implementing the VDMX
+table, speeding up the LZCOMP entropy coder (the match finding code is a
+straightforward adaptation of the algorithm in the format document),
+implementing a decoder in addition to an encoder, and porting the Java
+implementation to C++.
diff --git a/doc/SfntlyCPlusPlusNotes.md b/doc/SfntlyCPlusPlusNotes.md
new file mode 100644
index 0000000..0a55f85
--- /dev/null
+++ b/doc/SfntlyCPlusPlusNotes.md
@@ -0,0 +1,164 @@
+As the C++ port matures, this document will serve as a log of how
+different parts of the library work. As of today, there is some general
+info but mostly CMap-specific details.
+## Font Data Tables
+One of the important goals in `sfntly` is thread safety which is why
+tables can only be created with their nested `Builder` class and are
+immutable after creation.
+*CMap* = character map; it converts *code points* in a *code page* to
+*glyph IDs*.
+The CMapTable is a table of CMaps (CMaps are also tables; one for every
+encoding supported by the font). An encoding-dependent
+character map is represented in one of 14 formats, of which formats 0 and 4 are
+the most used; sfntly/C++ will initially only support formats 0, 2, 4
+and 12.
+### `CMapFormat0` Byte encoding table
+Format 0 is a basic table where a character’s glyph ID is looked up in a
+`glyphIdArray[256]`. As it only supports 256 characters, it can only encode
+ASCII and ISO 8859-x (alphabet-based languages).
+### `CMapFormat2` High-byte mapping through table
+Chinese, Japanese and Korean (CJK) need special 2-byte encodings, such as
+Shift-JIS, for each code point.
+### `CMapFormat4` Segment mapping to delta values
+This is the preferred format for Unicode Basic Multilingual Plane (BMP)
+encodings according to the Microsoft spec. Format 4 defines segments
+(contiguous ranges of characters; variable length). Finding a
+character’s glyph id first means finding the segment it is part of using
+a binary search (the segments are sorted). A segment has a
+**`startCode`**, an **`endCode`** (the minimum and maximum code points
+in the segment), an **`idDelta`** (delta for all code points in the
+segment) and an **`idRangeOffset`** (offset into glyphIdArray or 0).
+`idDelta` and `idRangeOffset` seem to be the same thing, offsets. In
+fact, `idRangeOffset` uses the glyph array to get the index by relying
+on the fact that the array is immediately after the `idRangeOffset`
+table in the font file. So, the segment’s offset is `idRangeOffset[i]`
+but since the `idRangeOffset` table contains words and not bytes, the
+value is divided by 2.
+``` {.prettyprint}
+glyphIndex = *(&idRangeOffset[i] + idRangeOffset[i] / 2 + (c - startCode[i]))
+```
+`idDelta[i]` is another kind of segment offset used when
+`idRangeOffset[i] = 0`, in which case it is added directly to the
+character code.
+``` {.prettyprint}
+glyphIndex = idDelta[i] + c
+```
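The two formulas above can be combined into one lookup routine. The following is a minimal sketch, not sfntly's actual code: `Format4` and `LookupFormat4` are hypothetical names, and the subtable is assumed to be already parsed into vectors. Because `glyphIdArray` sits right after the `idRangeOffset` table, the pointer expression becomes an index that is `segCount` words past `idRangeOffset[0]`. The sketch also follows the OpenType specification in adding `idDelta[i]` to a non-zero value read from `glyphIdArray`, and in doing all arithmetic modulo 65536.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical in-memory view of a format 4 subtable. The field names mirror
// the ones described above; they are not sfntly's actual classes.
struct Format4 {
  std::vector<uint16_t> end_code;         // sorted ascending, last is 0xFFFF
  std::vector<uint16_t> start_code;
  std::vector<uint16_t> id_delta;
  std::vector<uint16_t> id_range_offset;  // byte offsets, or 0
  std::vector<uint16_t> glyph_id_array;   // stored right after id_range_offset
};

// Returns the glyph ID for code point c, or 0 (missing glyph) if unmapped.
inline uint16_t LookupFormat4(const Format4& t, uint16_t c) {
  // Binary search for the first segment whose end_code is >= c.
  std::size_t lo = 0;
  std::size_t hi = t.end_code.size();
  while (lo < hi) {
    const std::size_t mid = lo + (hi - lo) / 2;
    if (t.end_code[mid] < c) {
      lo = mid + 1;
    } else {
      hi = mid;
    }
  }
  if (lo == t.end_code.size() || t.start_code[lo] > c) return 0;
  const std::size_t i = lo;

  if (t.id_range_offset[i] == 0) {
    // The cast truncates to 16 bits, i.e. the sum is taken modulo 65536.
    return static_cast<uint16_t>(t.id_delta[i] + c);
  }

  // &idRangeOffset[i] + idRangeOffset[i] / 2 + (c - startCode[i]), measured in
  // words from idRangeOffset[0]; glyph_id_array starts seg_count words in.
  const std::size_t seg_count = t.id_range_offset.size();
  const std::size_t pos = i + t.id_range_offset[i] / 2 + (c - t.start_code[i]);
  if (pos < seg_count || pos - seg_count >= t.glyph_id_array.size()) return 0;
  const uint16_t glyph = t.glyph_id_array[pos - seg_count];
  return glyph == 0 ? 0 : static_cast<uint16_t>(glyph + t.id_delta[i]);
}
```

With a segment `0x41..0x43` and `idDelta = 0xFFC0`, the code point `0x41` maps to glyph 1, since `0x41 + 0xFFC0` wraps around to `0x0001`.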
+### Class Hierarchy
+`CMapTable` is the main class and the container for all other CMap
+related classes.
+#### Utility classes
+- `CMapTable::CMapId` describes a pair of IDs, platform ID and
+ encoding ID, that form the CMap's ID. The ID a CMap has is usually a
+ good indicator of the format the CMap uses (Unicode
+ CMaps are usually either format 4 or format 12).
+- `CMapTable::CMapIdComparator`
+- `CMapTable::CMapIterator` iteration through the CMapTable is
+ supported through a Java-style iterator.
+- `CMapTable::CMapFilter` Java-style filter; CMapIterator supports
+ filtering CMaps. By default, it accepts every CMap.
+- `CMapTable::CMapIdFilter` extends CMapFilter; only accepts one type
+ of CMap. Used in conjunction with CMapIterator, this is how the CMap
+ getters are implemented.
+- **`CMapTable::Builder`** is the only way to create a CMapTable.
+#### CMaps
+- **`CMapTable::CMap`** is the abstract base class from which all
+ `CMapFormat*` classes derive. It defines basic functions and the abstract
+ `CMapTable::CMap::CharacterIterator` class to iterate through the
+ characters in the map. The basic implementation just loops through
+ every character between a start and an end. This is overridden so
+ that format specific iteration is performed.
+- `CMapFormat0` (mostly done?)
+- `CMapFormat2` (needs builders)
+- ... coming soon
+`[todo: will add images soon; need to upload to svn]`
+# Table Building Pipeline
+Building a data table in sfntly is done by the
+`FontDataTable::Builder::build` method which defines the general
+pipeline and leaves the details to each implementing subclass
+(`CMapTable::Builder` for example). Note: **`sub*`** methods are table-specific.
+**`ReadableFontDataPtr data = internalReadData()`**
+> There are 2 private fields in the `FontDataTable::Builder` class:
+> `rData` and `wData` for `ReadableFontData` and `WritableFontData`.
+> This function returns `rData` if there is any or `wData` (it is cast
+> to readable font data) if `rData` is null. *They hold the same data!*
+**`if (model_changed_)`**
+> A font is essentially a binary blob when loaded inside a `FontData`
+> object. A *model* is the Java/C++ collection of objects that represent
+> the same data in a manipulable format. If you ask for the model (even
+> if you don't write to it), it will count as changed and the underlying
+> raw data will get updated.
+**`if (!subReadyToSerialize())`**
+**`return NULL`**
+1. **`size = subDataToSerialize()`**
+2. **`WritableDataPtr new_data = container_->getNewData(size)`**
+3. **`subSerialize(new_data)`**
+4. **`data = new_data`**
+**`FontDataTablePtr table = subBuildTable(data)`**
+> The table is actually built, where `subBuildTable` is overridden by
+> every class of table but a table header is always added.
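The pipeline described above is a template-method pattern: the base builder fixes the order of the steps and subclasses fill in the `sub*` hooks. The sketch below shows that shape only. `Bytes`, `Table`, `TableBuilder` and `ToyBuilder` are hypothetical stand-ins for illustration, not sfntly's `FontData` or `FontDataTable::Builder` classes, and the toy "model" is just a list of 16-bit values.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

using Bytes = std::vector<uint8_t>;  // stand-in for sfntly's FontData classes

struct Table {
  explicit Table(Bytes d) : data(std::move(d)) {}
  Bytes data;
};

class TableBuilder {
 public:
  explicit TableBuilder(Bytes raw) : data_(std::move(raw)) {}
  virtual ~TableBuilder() = default;

  // Runs the steps in the order listed above.
  std::unique_ptr<Table> Build() {
    Bytes data = data_;  // plays the role of internalReadData()
    if (model_changed_) {
      if (!SubReadyToSerialize()) return nullptr;
      Bytes new_data(SubDataSizeToSerialize());  // like getNewData(size)
      SubSerialize(&new_data);
      data = new_data;
    }
    return SubBuildTable(data);
  }

 protected:
  void set_model_changed() { model_changed_ = true; }
  virtual bool SubReadyToSerialize() = 0;
  virtual std::size_t SubDataSizeToSerialize() = 0;
  virtual void SubSerialize(Bytes* out) = 0;
  virtual std::unique_ptr<Table> SubBuildTable(const Bytes& data) {
    return std::unique_ptr<Table>(new Table(data));
  }

 private:
  Bytes data_;
  bool model_changed_ = false;
};

// Toy subclass: the model is serialized as big-endian 16-bit values.
class ToyBuilder : public TableBuilder {
 public:
  using TableBuilder::TableBuilder;

  // Asking for the model marks it as changed, even if nothing is written.
  std::vector<uint16_t>* Model() {
    set_model_changed();
    return &model_;
  }

 protected:
  bool SubReadyToSerialize() override { return !model_.empty(); }
  std::size_t SubDataSizeToSerialize() override { return model_.size() * 2; }
  void SubSerialize(Bytes* out) override {
    for (std::size_t i = 0; i < model_.size(); ++i) {
      (*out)[2 * i] = static_cast<uint8_t>(model_[i] >> 8);
      (*out)[2 * i + 1] = static_cast<uint8_t>(model_[i] & 0xFF);
    }
  }

 private:
  std::vector<uint16_t> model_;
};
```

A builder whose model was never requested hands its raw bytes straight to `SubBuildTable`; one whose model was requested but is not ready to serialize returns a null table, matching the `return NULL` step.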
+## Subtable Builders
+Subtables are lazily built
+When creating the object view of the font and dealing with lots of
+tables, it would be wasteful to create builders for every subtable there
+is since most users only do fairly high level manipulation of the font.
+Instead, **only the tables at font level are fully built**.
+All other subtables have builders that contain valid FontData but the
+object view is not created by default. For the `CMapTable`, this means
+that if you don’t go through the `getCMapBuilders()` method, the CMap
+builders are not initialized. So, the builder map would seem to be empty
+when calling its `size()` method but there are CMaps in the font when
+calling `numCMaps(internalReadFont())`.
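The lazy behaviour described above can be pictured with a small sketch. Everything here is hypothetical (`LazyParentBuilder`, `SubBuilder`, and the use of plain integers as raw subtable records); it only illustrates why the builder map looks empty until the getter is called, while the raw data already knows how many subtables exist.

```cpp
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical per-subtable builder; a real one would wrap font data.
struct SubBuilder {
  int id;
};

// Hypothetical parent builder that creates subtable builders on demand.
class LazyParentBuilder {
 public:
  explicit LazyParentBuilder(std::vector<int> raw_subtable_ids)
      : raw_(std::move(raw_subtable_ids)) {}

  // Counts subtables straight from the raw data; creates no builders.
  std::size_t NumSubtablesInData() const { return raw_.size(); }

  // Number of builders currently present in the object view.
  std::size_t NumInitializedBuilders() const { return builders_.size(); }

  // The first call materializes one builder per subtable in the raw data.
  std::map<int, SubBuilder>* GetSubBuilders() {
    if (!initialized_) {
      for (int id : raw_) builders_[id].id = id;
      initialized_ = true;
    }
    return &builders_;
  }

 private:
  std::vector<int> raw_;
  std::map<int, SubBuilder> builders_;
  bool initialized_ = false;
};
```

Before `GetSubBuilders()` is called, `NumInitializedBuilders()` reports zero even though `NumSubtablesInData()` does not, which is the same mismatch described for `size()` and `numCMaps()`.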
+## Character encoders
+Sfntly/Java uses a native ICU-based API for encoding characters.
+Sfntly/C++ uses ICU directly. In unit tests we assume text is encoded in
+UTF-16. Public APIs will use ICU classes like `UnicodeString`.
diff --git a/doc/Useful_links.md b/doc/Useful_links.md
new file mode 100644
index 0000000..2caa409
--- /dev/null
+++ b/doc/Useful_links.md
@@ -0,0 +1,25 @@
+- TrueType
+ [http://en.wikipedia.org/wiki/TrueType](http://en.wikipedia.org/wiki/TrueType)
+- OpenType
+ [http://en.wikipedia.org/wiki/OpenType](http://en.wikipedia.org/wiki/OpenType)
+Microsoft Typography
+- Font specifications
+ [http://www.microsoft.com/typography/SpecificationsOverview.mspx](http://www.microsoft.com/typography/SpecificationsOverview.mspx),
+ including OpenType and TrueType.
+- Font Tools
+ [http://www.microsoft.com/typography/tools/tools.aspx](http://www.microsoft.com/typography/tools/tools.aspx)
+- TrueType reference manual
+ [http://developer.apple.com/fonts/TTRefMan](http://developer.apple.com/fonts/TTRefMan)
+- OpenType
+ [http://www.adobe.com/type/opentype/](http://www.adobe.com/type/opentype/)
diff --git a/doc/build_cpp.md b/doc/build_cpp.md
new file mode 100644
index 0000000..5e3d344
--- /dev/null
+++ b/doc/build_cpp.md
@@ -0,0 +1,150 @@
+# How to build sfntly C++ port
+## Build Environment Requirements
+* cmake 2.6 or above
+* C++ compiler requirement
+ * Windows: Visual C++ 2008, Visual C++ 2010
+ * Linux: g++ 4.3 or above.
+ g++ must support built-in atomic ops and have the companion libstdc++.
+ * Mac: Apple XCode 3.2.5 or above.
+## External Dependencies
+sfntly is dependent on several external packages.
+* [Google C++ Testing Framework](https://github.com/google/googletest)
+ You can download the package yourself or extract the one from `ext/redist`.
+ The package needs to be extracted to `ext` and renamed or symlinked to `gtest`.
+ The sfntly C++ port has been tested with gTest 1.6.0.
+* ICU
+ On Linux, the system's default ICU headers will be used.
+ Linux users, please make sure you have the dev packages for ICU installed.
+ For example, on Ubuntu you can run `sudo apt-get install libicu-dev` to make sure the required library is installed.
+ For Windows, download from http://site.icu-project.org/download or extract the one from `ext/redist`.
+ You can also provide your own ICU package.
+ However, you need to alter the include path, library path, and provide `icudt.dll`.
+ Tested with ICU 4.6.1 binary release.
+ For Mac users, please download ICU source tarball from http://site.icu-project.org/download
+ and install according to ICU documents.
+## Getting the Source
+Clone the Git repository from https://github.com/googlei18n/sfntly.
+## Building on Windows
+Let's assume your folder for sfntly is `d:\src\sfntly`.
+1. If you don't have cmake installed, extract the cmake-XXX.zip
+ into `d:\src\sfntly\cpp\ext\cmake` removing the "-XXX" part.
+ The extracted binary should be in `d:\src\sfntly\cpp\ext\cmake\bin\cmake.exe`.
+2. Extract gtest-XXX.zip into `d:\src\sfntly\cpp\ext\gtest`
+ removing the "-XXX" part.
+3. Extract icu4c-XXX.zip into `d:\src\sfntly\cpp\ext\icu`
+ removing the "-XXX" part.
+4. Run the following commands to create the Visual Studio solution files:
+ ```
+ d:
+ cd d:\src\sfntly\cpp
+ md build
+ cd build
+ ..\ext\cmake\bin\cmake ..
+ ```
+You should see `sfntly.sln` in `d:\src\sfntly\cpp\build`.
+5. Until the test is modified to access the fonts in the `data\ext` directory:
+ copy the test fonts from `d:\src\sfntly\cpp\data\ext\` to `d:\src\sfntly\cpp\build\bin\Debug`.
+6. Open sfntly.sln.
+ Since sfntly uses the STL extensively, please patch your Visual Studio with any STL-related hotfixes/service packs.
+7. Build the solution (if the icuuc dll is not found,
+ you may need to add `d:\src\sfntly\cpp\ext\icu\bin` to the system path).
+### Building on Windows via Command Line
+Visual Studio 2008 and 2010 support command line building,
+so you don't need the IDE to build the project.
+For Visual Studio 2008 (assuming it's installed at `c:\vs08`):
+ cd d:\src\sfntly\cpp\build
+ ..\ext\cmake\bin\cmake .. -G "Visual Studio 9 2008"
+ c:\vs08\common7\tools\vsvars32.bat
+ vcbuild sfntly.sln
+We invoke cmake with `-G` to make sure that
+Visual Studio 2008 solution/project files are generated.
+You can also use `devenv sfntly.sln /build`
+to build the solution instead of using `vcbuild`.
+There are subtle differences between `devenv` and `vcbuild`.
+Please refer to your Visual Studio manual for more details.
+For Visual Studio 2010 (assuming it's installed at `c:\vs10`):
+ cd d:\src\sfntly\cpp\build
+ ..\ext\cmake\bin\cmake .. -G "Visual Studio 10"
+ c:\vs10\common7\tools\vsvars32.bat
+ msbuild sfntly.sln
+If you have both Visual Studio 2008 and 2010 installed on your system,
+you can't run the scripts above in the same Command Prompt window,
+because `vsvars32.bat` assumes that it is run from a clean Command Prompt.
+## Building on Linux/Mac
+### Recommended Out-of-Source Building
+1. `cd` *&lt;sfntly dir&gt;*
+2. `mkdir build`
+3. `cd build`
+4. `cmake ..`
+5. `make`
+### Default In-Source Building
+> This is not recommended.
+> Please use out-of-source whenever possible.
+1. `cd` *&lt;sfntly dir&gt;*
+2. `cmake .`
+3. `make`
+### Using clang Instead
+Change the `cmake` command line to
+ CC=clang CXX=clang++ cmake ..
+The generated Makefile will use clang.
+Please note that sfntly uses a lot of advanced C++ semantics that
+might not be understood or compiled correctly by earlier versions
+of clang (2.8 and before).
+sfntly is tested to compile and run correctly on clang 3.0 (trunk 135314).
+clang 2.9 might work, but unfortunately we don't have the resources to test it.
+### Debug and Release Builds
+Currently Debug builds are set as default.
+To build Release builds, you can either modify `CMakeLists.txt`,
+or set environment variable `CMAKE_BUILD_TYPE` to `Release`
+before invoking `cmake`.
+Windows users can just switch the configuration in Visual Studio.
+## Running Unit Test
+A program named `unit_test` will be generated after a full compilation.
+To execute the unit tests, it expects the fonts from `data/ext` to be in the same directory as the program itself.
+Windows users also need to copy `icudt.dll` and `icuuc.dll` to that directory.
diff --git a/doc/smart_pointer_usage.md b/doc/smart_pointer_usage.md
new file mode 100644
index 0000000..5f77013
--- /dev/null
+++ b/doc/smart_pointer_usage.md
@@ -0,0 +1,145 @@
+# Usage of smart pointers in sfntly C++ port
+The sfntly C++ port implements an object ref-counting and smart pointer
+mechanism that works very much like COM.
+A ref-countable object type inherits from RefCounted&lt;&gt;, which has
+addRef() and release() just like IUnknown in COM (but no
+QueryInterface). Ptr&lt;&gt; is a smart pointer class like
+CComPtr&lt;&gt; which is used to hold the ref-countable objects so that
+the object ref count is handled correctly.
+Let's take a look at an example:
+``` {.prettyprint}
+class Foo : public RefCounted<Foo> {
+ public:
+  static Foo* CreateInstance() {
+    Ptr<Foo> obj = new Foo(); // ref count = 1
+    return obj.detach(); // Giving away the control of this instance.
+  }
+};
+typedef Ptr<Foo> FooPtr; // Common short-hand notation.
+FooPtr obj;
+obj.attach(Foo::CreateInstance()); // ref count = 1
+                                   // Take over control but not bumping
+                                   // ref count.
+{
+  FooPtr obj2 = obj; // ref count = 2
+                     // Assignment bumps up ref count.
+} // ref count = 1
+  // obj2 out of scope, decrements ref count
+obj.release(); // ref count = 0, object destroyed
+```
+## Notes on usage
+- Inherit virtually from the RefCount interface in the base class if smart
+ pointers are going to be defined.
+<!-- -->
+- All RefCounted objects must be instantiated on the heap. Allocating
+ the object on the stack will cause a crash.
+<!-- -->
+- Be careful when you have complex inheritance. For example,
+> In this case the smart pointer is pretty dumb, so don't count on it to
+> nicely destroy your objects as designed. Try refactoring your code like this:
+> ``` {.prettyprint}
+> class I; // the common interface and implementations
+> class A : public I, public RefCounted<A>; // A specific implementation
+> class B : public I, public RefCounted<B>; // B specific implementation
+> ```
+- Smart pointers here are very bad candidates for function parameters
+ and return values. Use dumb pointers when passing over the stack.
+<!-- -->
+- When down\_cast is performed on a dangling pointer due to bugs in
+ code, VC++ will generate SEH which is not handled well in VC++
+ debugger. One can use WinDBG to run it and get the faulting stack.
+<!-- -->
+- Idioms for heap object as return value
+> If you are not passing that object back, you are the end of scope.
+> Be very careful when using the assignment operator as opposed to
+> attaching. This can easily cause memory leaks. Have a look at [The
+> difference between assignment and attachment with ATL smart
+> pointers](http://blogs.msdn.com/b/oldnewthing/archive/2009/11/20/9925918.aspx).
+> Since we're using essentially the same model, it applies in pretty much
+> the same way. The easiest way of knowing when to attach and when to
+> assign is to check the declaration of the function you are using.
+> ``` {.prettyprint}
+> // Attach to the pointer returned by GetInstance
+> static CALLER_ATTACH FontFactory* GetInstance();
+> // Assign pointer returned by NewCMapBuilder
+> CMap::Builder* NewCMapBuilder(const CMapId& cmap_id,
+> ReadableFontData* data);
+> ```
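To make the attach-versus-assign difference concrete, here is a minimal sketch of the ATL-style model. `Counted`, `CountedPtr` and `CreateCounted` are hypothetical stand-ins written for illustration; sfntly's real `RefCounted<>` and `Ptr<>` differ in detail. The factory plays the role of a `CALLER_ATTACH` function: it returns an object whose count is already 1.

```cpp
// Hypothetical ref-counted object; the private destructor forces heap
// allocation, and `live` tracks how many objects currently exist.
class Counted {
 public:
  static int live;
  Counted() : ref_count_(0) { ++live; }
  void AddRef() { ++ref_count_; }
  void Release() {
    if (--ref_count_ == 0) delete this;
  }
  int ref_count() const { return ref_count_; }

 private:
  ~Counted() { --live; }
  int ref_count_;
};
int Counted::live = 0;

// Hypothetical smart pointer with ATL-like semantics.
class CountedPtr {
 public:
  CountedPtr() : p_(nullptr) {}
  ~CountedPtr() {
    if (p_) p_->Release();
  }
  CountedPtr(const CountedPtr&) = delete;
  CountedPtr& operator=(const CountedPtr&) = delete;

  // Assigning a raw pointer takes a NEW reference (AddRef).
  CountedPtr& operator=(Counted* p) {
    if (p) p->AddRef();
    if (p_) p_->Release();
    p_ = p;
    return *this;
  }

  // Attaching adopts a reference the caller already owns (no AddRef).
  void Attach(Counted* p) {
    if (p_) p_->Release();
    p_ = p;
  }

  Counted* get() const { return p_; }

 private:
  Counted* p_;
};

// CALLER_ATTACH-style factory: the returned object has ref count 1.
Counted* CreateCounted() {
  Counted* obj = new Counted();
  obj->AddRef();
  return obj;
}
```

Attaching the factory result leaves the count at 1, so the object is destroyed when the pointer goes out of scope. Assigning the same result bumps the count to 2, the pointer's destructor only brings it back to 1, and the object leaks unless someone calls `Release()` once more.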
+## Detecting Memory Leaks
+> The implementation of COM-like ref-counting and smart pointers remedies
+> the lack of a garbage collector in C++ to a certain extent. However, it
+> also introduces other problems, the most common being memory leaks.
+> How do we know whether our code leaks memory?
+- For Linux/Mac, valgrind
+ [http://valgrind.org](http://valgrind.org/)
+ is very useful.
+- For Windows, the easiest is to use Visual C++ built-in CRT debugging
+ [http://msdn.microsoft.com/en-us/library/x98tx3cf(v=vs.80).aspx](http://msdn.microsoft.com/en-us/library/x98tx3cf(v=vs.80).aspx)
+## Useful Tips for Debugging Ref-Counting Issues
+ ref-count related activity will be output to stderr.
+<!-- -->
+- The logs will look like (under VC2010)
+> Use your favorite editor to transform them into SQL statements, e.g.
+> Regex pattern
+> ``` {.prettyprint}
+> ^([ACDR]) class RefCounted<class sfntly::([A-Za-z0-9:]+)>[ *const]+:oc=([-0-9]+),oid=([0-9]+),rc=([-0-9]+)
+> ```
+> Replace with
+> ``` {.prettyprint}
+> insert into log values(\1, \2, \3, \4, \5);
+> ```
+- Add one line to the beginning of log
+<!-- -->
+- Run sqlite shell, use .read to input the SQL file.
+<!-- -->
+- Run following commands to get the leaking object class and object
+ id:
+<!-- -->
+- Once you know which object is leaking, it's much easier to set up
+ conditional breakpoints to identify the real culprit.
diff --git a/doc/unit_tests.md b/doc/unit_tests.md
new file mode 100644
index 0000000..0426979
--- /dev/null
+++ b/doc/unit_tests.md
@@ -0,0 +1,32 @@
+# Unit Test
+The sfntly C++ port uses the Google testing framework for unit testing.
+All test cases are under src/test.
+To run the unit tests, you need to compile sfntly first, then copy
+Tuffy.ttf from data/ext to the folder where unit\_test(.exe) is located.
+Run unit\_test.
+If you are only interested in one test case, for example, you are
+working on the name table and only want the name editing tests, you can use the
+gtest\_filter flag on the command line.
+``` {.prettyprint}
+unit_test --gtest_filter=NameEditing.*
+```
+For more info regarding running a subset of gtest, please see [GTest
+It's strongly suggested to run unit\_test with valgrind. The following
+example shows a typical test command:
+``` {.prettyprint}
+valgrind --leak-check=full -v ./unit_test --gtest_filter=*.*-FontData.*
+```
+FontData tests are the longest and you may want to bypass them during
+development. They shall be run in bots, however.