From 6b0889d9b0357ef1409c9e28e98d706658acddfd Mon Sep 17 00:00:00 2001
From: Roland Illig
Date: Tue, 6 Jun 2017 00:06:38 +0200
Subject: Rescue the old wiki pages from web.archive.org. (#87)

The conversion from HTML to MarkDown has been done mainly by Pandoc, with
some manual cleanup afterwards.

Fixes #83.
---
 doc/MicroTypeExpress.md     |  42 ++++++++++++
 doc/SfntlyCPlusPlusNotes.md | 164 ++++++++++++++++++++++++++++++++++++++++++++
 doc/Useful_links.md         |  25 +++++++
 doc/build_cpp.md            | 150 ++++++++++++++++++++++++++++++++++++++++
 doc/smart_pointer_usage.md  | 145 +++++++++++++++++++++++++++++++++++++++
 doc/unit_tests.md           |  32 +++++++++
 6 files changed, 558 insertions(+)
 create mode 100644 doc/MicroTypeExpress.md
 create mode 100644 doc/SfntlyCPlusPlusNotes.md
 create mode 100644 doc/Useful_links.md
 create mode 100644 doc/build_cpp.md
 create mode 100644 doc/smart_pointer_usage.md
 create mode 100644 doc/unit_tests.md

(limited to 'doc')

diff --git a/doc/MicroTypeExpress.md b/doc/MicroTypeExpress.md
new file mode 100644
index 0000000..97f55d0
--- /dev/null
+++ b/doc/MicroTypeExpress.md
@@ -0,0 +1,42 @@
+# MicroType Express in sfntly
+
+This page describes sfntly’s implementation of MicroType Express
+compression.
+
+The MicroType Express format is [documented](http://www.w3.org/Submission/MTX/),
+and is recently available under license terms we believe are friendly to
+open source (as well as proprietary) implementation. The most popular
+implementation of a decoder is Internet Explorer, as a compression
+option in its Embedded OpenType (EOT) format. Since MicroType Express
+gives an approximate 15% gain in compression over gzip, a web font
+service striving for maximum performance can benefit from implementing
+EOT compression.
+
+The current codebase in sfntly is for compression only (this is by far
+the most useful for web serving).
+The easiest way to get started is with
+the command line tool, sfnttool:
+
+```
+sfnttool -e -x source.ttf source.eot
+```
+
+The code has been tested extensively against IE, and is currently in
+production in Google Web Fonts, both for full fonts and for subsetting
+(using the text= parameter). This serving path also uses sfntly to
+compute the subsets, and is essentially the same as the -s parameter to
+sfnttool.
+
+If you’re interested in the code and details of the compression
+algorithm, here’s a bit of a guide:
+
+The top-level MicroType Express compression code is in MtxWriter.java
+(in the tools/conversion/eot directory). This code implements almost all
+of the MicroType Express format, including the hdmx tables, push
+sequences, and jump coding. The main feature missing is the VDMX table
+(vertical device metrics), which is very rarely used in web fonts.
+
+Patches are welcome -- possible areas include: implementing the VDMX
+table, speeding up the LZCOMP entropy coder (the match finding code is a
+straightforward adaptation of the algorithm in the format document),
+implementing a decoder in addition to an encoder, and porting the Java
+implementation to C++.
diff --git a/doc/SfntlyCPlusPlusNotes.md b/doc/SfntlyCPlusPlusNotes.md
new file mode 100644
index 0000000..0a55f85
--- /dev/null
+++ b/doc/SfntlyCPlusPlusNotes.md
@@ -0,0 +1,164 @@
+Introduction
+============
+
+As the C++ port matures, this document will serve as a log of how
+different parts of the library work. As of today, there is some general
+info but mostly CMap-specific details.
+
+------------------------------------------------------------------------
+
+Font Data Tables
+================
+
+One of the important goals in `sfntly` is thread safety, which is why
+tables can only be created with their nested `Builder` class and are
+immutable after creation.
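To make the builder-then-immutable idea concrete, here is a minimal sketch. `ToyTable`, `AddByte`, `Build`, `Size`, and `At` are hypothetical names invented for this illustration; they are not sfntly's actual classes or methods. The sketch only shows the general pattern: all mutation happens in a nested builder, and the finished object exposes no setters, so it can be read from several threads without locking.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for a font table; NOT an sfntly class.
class ToyTable {
 public:
  // Mutable staging object: all edits happen here, before the table exists.
  class Builder {
   public:
    Builder& AddByte(std::uint8_t b) {
      bytes_.push_back(b);
      return *this;
    }
    // Copies the staged bytes into a new, read-only table.
    ToyTable Build() const { return ToyTable(bytes_); }

   private:
    std::vector<std::uint8_t> bytes_;
  };

  // Read-only accessors; there are deliberately no setters.
  std::size_t Size() const { return bytes_.size(); }
  std::uint8_t At(std::size_t i) const { return bytes_.at(i); }

 private:
  // Private constructor: only the nested Builder can create a ToyTable.
  explicit ToyTable(std::vector<std::uint8_t> bytes)
      : bytes_(std::move(bytes)) {}

  const std::vector<std::uint8_t> bytes_;
};
```

A caller would write something like `ToyTable t = ToyTable::Builder().AddByte(0x10).Build();`. After that line the builder can be discarded, and nothing can change `t`.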
+
+`CMapTable`
+-----------
+
+*CMap* = character map; it converts *code points* in a *code page* to
+*glyph IDs*.
+
+The CMapTable is a table of CMaps (CMaps are also tables; one for every
+encoding supported by the font). An encoding-dependent character map is
+stored in one of several formats (numbered up to 14), of which formats 0
+and 4 are the most used; sfntly/C++ will initially only support formats
+0, 2, 4 and 12.
+
+### `CMapFormat0` Byte encoding table
+
+Format 0 is a basic table where a character’s glyph ID is looked up in
+`glyphIdArray[256]`. As it only supports 256 characters, it can only
+encode ASCII and ISO 8859-x (alphabet-based languages).
+
+### `CMapFormat2` High-byte mapping through table
+
+Chinese, Japanese and Korean (CJK) need special 2-byte encodings, such as
+Shift-JIS, for each code point.
+
+### `CMapFormat4` Segment mapping to delta values
+
+This is the preferred format for Unicode Basic Multilingual Plane (BMP)
+encodings according to the Microsoft spec. Format 4 defines segments
+(contiguous ranges of characters; variable length). Finding a
+character’s glyph ID first means finding the segment it is part of using
+a binary search (the segments are sorted). A segment has a
+**`startCode`**, an **`endCode`** (the minimum and maximum code points
+in the segment), an **`idDelta`** (delta for all code points in the
+segment) and an **`idRangeOffset`** (offset into glyphIdArray or 0).
+
+`idDelta` and `idRangeOffset` may look like the same thing: offsets. In
+fact, `idRangeOffset` uses the glyph array to get the index by relying
+on the fact that the array is immediately after the `idRangeOffset`
+table in the font file. So, the segment’s offset is `idRangeOffset[i]`,
+but since the `idRangeOffset` table contains words and not bytes, the
+value is divided by 2.
+
+``` {.prettyprint}
+glyphIndex = *(&idRangeOffset[i] + idRangeOffset[i] / 2 + (c - startCode[i]))
+```
+
+`idDelta[i]` is another kind of segment offset used when
+`idRangeOffset[i] = 0`, in which case it is added directly to the
+character code (modulo 65536).
+
+``` {.prettyprint}
+glyphIndex = idDelta[i] + c
+```
+
+### Class Hierarchy
+
+`CMapTable` is the main class and the container for all other
+CMap-related classes.
+
+#### Utility classes
+
+- `CMapTable::CMapId` describes a pair of IDs, platform ID and
+  encoding ID, that form the CMap's ID. The ID a CMap has is usually a
+  good indicator as to what kind of format the CMap uses (Unicode
+  CMaps are usually either format 4 or format 12).
+- `CMapTable::CMapIdComparator`
+- `CMapTable::CMapIterator` iteration through the CMapTable is
+  supported through a Java-style iterator.
+- `CMapTable::CMapFilter` Java-style filter; CMapIterator supports
+  filtering CMaps. By default, it accepts every CMap.
+- `CMapTable::CMapIdFilter` extends CMapFilter; only accepts one type
+  of CMap. Used in conjunction with CMapIterator, this is how the CMap
+  getters are implemented.
+- **`CMapTable::Builder`** is the only way to create a CMapTable.
+
+#### CMaps
+
+- **`CMapTable::CMap`** is the abstract base class from which all
+  `CMapFormat*` classes derive. It defines basic functions and the abstract
+  `CMapTable::CMap::CharacterIterator` class to iterate through the
+  characters in the map. The basic implementation just loops through
+  every character between a start and an end. This is overridden so
+  that format-specific iteration is performed.
+- `CMapFormat0` (mostly done?)
+- `CMapFormat2` (needs builders)
+- ...
+  coming soon
+
+`[todo: will add images soon; need to upload to svn]`
+
+------------------------------------------------------------------------
+
+# Table Building Pipeline
+
+Building a data table in sfntly is done by the
+`FontDataTable::Builder::build` method, which defines the general
+pipeline and leaves the details to each implementing subclass
+(`CMapTable::Builder` for example). Note: **`sub*`** methods are
+table-specific.
+
+**`ReadableFontDataPtr data = internalReadData()`**
+> There are 2 private fields in the `FontDataTable::Builder` class:
+> `rData` and `wData` for `ReadableFontData` and `WritableFontData`.
+> This function returns `rData` if it is set, or `wData` (cast
+> to readable font data) if `rData` is null. *They hold the same data!*
+
+**`if (model_changed_)`**
+> A font is essentially a binary blob when loaded inside a `FontData`
+> object. A *model* is the Java/C++ collection of objects that represent
+> the same data in a manipulable format. If you ask for the model (even
+> if you don't write to it), it will count as changed and the underlying
+> raw data will get updated.
+
+**`if (!subReadyToSerialize())`**
+**`return NULL`**
+`else`
+1. **`size = subDataToSerialize()`**
+2. **`WritableDataPtr new_data = container_->getNewData(size)`**
+3. **`subSerialize(new_data)`**
+4. **`data = new_data`**
+
+**`FontDataTablePtr table = subBuildTable(data)`**
+> The table is actually built here. `subBuildTable` is overridden by
+> every class of table, but a table header is always added.
+
+Subtable Builders
+-----------------
+
+Subtables are lazily built.
+
+When creating the object view of the font and dealing with lots of
+tables, it would be wasteful to create builders for every subtable there
+is, since most users only do fairly high-level manipulation of the font.
+Instead, **only the tables at font level are fully built**.
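The lazy approach described in this section can be sketched roughly as follows. This is an illustration only: `ToyLazyTableBuilder`, `ToySubBuilder`, and their methods are hypothetical names, not sfntly's API, and the "raw data" is simplified to one byte per subtable. The point is that questions answerable from the raw bytes never create builder objects, while the object view is materialized on first access.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical per-subtable builder; in this toy it only remembers a format.
struct ToySubBuilder {
  int format;
};

// Hypothetical table builder that defers creating its sub-builders.
class ToyLazyTableBuilder {
 public:
  // `raw` stands in for the table's FontData: one byte per subtable.
  explicit ToyLazyTableBuilder(std::vector<std::uint8_t> raw)
      : raw_(std::move(raw)), initialized_(false) {}

  // Answers from the raw bytes alone; no sub-builder is created.
  std::size_t NumSubtables() const { return raw_.size(); }

  // How many sub-builders exist right now (0 until GetSubBuilders() runs).
  std::size_t BuilderCount() const { return sub_builders_.size(); }

  // The first call materializes the object view; later calls reuse it.
  std::map<int, ToySubBuilder>& GetSubBuilders() {
    if (!initialized_) {
      for (std::size_t i = 0; i < raw_.size(); ++i) {
        ToySubBuilder sub = {static_cast<int>(raw_[i])};
        sub_builders_.insert(std::make_pair(static_cast<int>(i), sub));
      }
      initialized_ = true;
    }
    return sub_builders_;
  }

 private:
  std::vector<std::uint8_t> raw_;
  std::map<int, ToySubBuilder> sub_builders_;
  bool initialized_;
};
```

In this toy, a caller that only asks `NumSubtables()` pays nothing for the object view, which is the trade-off the section describes: the builder map looks empty until someone explicitly requests the sub-builders.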
+
+All other subtables have builders that contain valid FontData, but the
+object view is not created by default. For the `CMapTable`, this means
+that if you don’t go through the `getCMapBuilders()` method, the CMap
+builders are not initialized. So, the builder map would seem to be empty
+when calling its `size()` method, but there are CMaps in the font when
+calling `numCMaps(internalReadFont())`.
+
+------------------------------------------------------------------------
+
+Character encoders
+------------------
+
+Sfntly/Java uses a native ICU-based API for encoding characters.
+Sfntly/C++ uses ICU directly. In unit tests we assume text is encoded in
+UTF-16. Public APIs will use ICU classes like `UnicodeString`.
diff --git a/doc/Useful_links.md b/doc/Useful_links.md
new file mode 100644
index 0000000..2caa409
--- /dev/null
+++ b/doc/Useful_links.md
@@ -0,0 +1,25 @@
+Wiki
+
+- TrueType
+  [http://en.wikipedia.org/wiki/TrueType](http://en.wikipedia.org/wiki/TrueType)
+- OpenType
+  [http://en.wikipedia.org/wiki/OpenType](http://en.wikipedia.org/wiki/OpenType)
+
+Microsoft Typography
+[http://www.microsoft.com/typography/default.mspx](http://www.microsoft.com/typography/default.mspx)
+
+- Font specifications
+  [http://www.microsoft.com/typography/SpecificationsOverview.mspx](http://www.microsoft.com/typography/SpecificationsOverview.mspx),
+  including OpenType and TrueType.
+- Font Tools
+  [http://www.microsoft.com/typography/tools/tools.aspx](http://www.microsoft.com/typography/tools/tools.aspx)
+
+Apple
+
+- TrueType reference manual
+  [http://developer.apple.com/fonts/TTRefMan](http://developer.apple.com/fonts/TTRefMan)
+
+Adobe
+
+- OpenType
+  [http://www.adobe.com/type/opentype/](http://www.adobe.com/type/opentype/)
diff --git a/doc/build_cpp.md b/doc/build_cpp.md
new file mode 100644
index 0000000..5e3d344
--- /dev/null
+++ b/doc/build_cpp.md
@@ -0,0 +1,150 @@
+# How to build sfntly C++ port
+
+## Build Environment Requirements
+
+* cmake 2.6 or above
+* C++ compiler requirement
+  * Windows: Visual C++ 2008, Visual C++ 2010
+  * Linux: g++ 4.3 or above.
+    g++ must support built-in atomic ops and have a companion libstdc++.
+  * Mac: Apple XCode 3.2.5 or above.
+
+## External Dependencies
+
+sfntly is dependent on several external packages.
+
+* [Google C++ Testing Framework](https://github.com/google/googletest)
+
+  You can download the package yourself or extract the one from `ext/redist`.
+  The package needs to be extracted to `ext` and renamed or symlinked to `gtest`.
+
+  sfntly C++ port has been tested with gTest 1.6.0.
+
+* ICU
+
+  For Linux, the system's default ICU headers will be used.
+  Linux users, please make sure you have the dev packages for ICU.
+  For example, you can run `sudo apt-get install libicu-dev` in Ubuntu and see if the required library is installed.
+
+  For Windows, download from http://site.icu-project.org/download or extract the one from `ext/redist`.
+  You can also provide your own ICU package.
+  However, you need to alter the include path and library path, and provide `icudt.dll`.
+
+  Tested with ICU 4.6.1 binary release.
+
+  For Mac users, please download the ICU source tarball from http://site.icu-project.org/download
+  and install according to the ICU documents.
+
+## Getting the Source
+
+Clone the Git repository from https://github.com/googlei18n/sfntly.
+
+## Building on Windows
+
+Let's assume your folder for sfntly is `d:\src\sfntly`.
+
+1. If you don't have cmake installed, extract the cmake-XXX.zip
+   into `d:\src\sfntly\cpp\ext\cmake` removing the "-XXX" part.
+   The extracted binary should be in `d:\src\sfntly\cpp\ext\cmake\bin\cmake.exe`.
+2. Extract gtest-XXX.zip into `d:\src\sfntly\cpp\ext\gtest`
+   removing the "-XXX" part.
+3. Extract icu4c-XXX.zip into `d:\src\sfntly\cpp\ext\icu`
+   removing the "-XXX" part.
+4. Run the following commands to create the Visual Studio solution files:
+
+   ```
+   d:
+   cd d:\src\sfntly\cpp
+   md build
+   cd build
+   ..\ext\cmake\bin\cmake ..
+   ```
+
+You should see `sfntly.sln` in `d:\src\sfntly\cpp\build`.
+
+5. Until the test is modified to access the fonts in the `data\ext` directory:
+   copy the test fonts from `d:\src\sfntly\cpp\data\ext\` to `d:\src\sfntly\cpp\build\bin\Debug`.
+6. Open sfntly.sln.
+   Since sfntly uses STL extensively, please patch your Visual Studio for any STL-related hotfixes/service packs.
+7. Build the solution (if the icuuc dll is not found,
+   you may need to add `d:\src\sfntly\cpp\ext\icu\bin` to the system path).
+
+### Building on Windows via Command Line
+
+Visual Studio 2008 and 2010 support command line building,
+therefore you don't need the IDE to build the project.
+
+For Visual Studio 2008 (assume it's installed at `c:\vs08`)
+
+    cd d:\src\sfntly\cpp\build
+    ..\ext\cmake\bin\cmake .. -G "Visual Studio 9 2008"
+    c:\vs08\common7\tools\vsvars32.bat
+    vcbuild sfntly.sln
+
+We invoke cmake with `-G` to make sure
+Visual Studio 2008 solution/project files are generated.
+You can also use `devenv sfntly.sln /build`
+to build the solution instead of using `vcbuild`.
+
+There are subtle differences between `devenv` and `vcbuild`.
+Please refer to your Visual Studio manual for more details.
+
+For Visual Studio 2010 (assume it's installed at `c:\vs10`)
+
+
+    cd d:\src\sfntly\cpp\build
+    ..\ext\cmake\bin\cmake .. ^
+        -G "Visual Studio 10"
+    c:\vs10\common7\tools\vsvars32.bat
+    msbuild sfntly.sln
+
+If you install both Visual Studio 2008 and 2010 on your system,
+you can't run the scripts above in the same Command Prompt window.
+`vsvars32.bat` assumes that it is run from a clean Command Prompt.
+
+## Building on Linux/Mac
+
+### Recommended Out-of-Source Building
+
+1. `cd` *<sfntly dir>*
+2. `mkdir build`
+3. `cd build`
+4. `cmake ..`
+5. `make`
+
+### Default In-Source Building
+
+> This is not recommended.
+> Please use out-of-source whenever possible.
+
+1. `cd` *<sfntly dir>*
+2. `cmake .`
+3. `make`
+
+### Using clang Instead
+
+Change the `cmake` command line to
+
+    CC=clang CXX=clang++ cmake ..
+
+The generated Makefile will use clang.
+Please note that sfntly uses a lot of advanced C++ semantics that
+might not be understood or compiled correctly by earlier versions
+of clang (2.8 and before).
+
+sfntly is tested to compile and run correctly on clang 3.0 (trunk 135314).
+clang 2.9 might work, but unfortunately we don't have the resources to test it.
+
+### Debug and Release Builds
+
+Currently Debug builds are set as default.
+To build Release builds, you can either modify the `CMakeLists.txt`,
+or set the environment variable `CMAKE_BUILD_TYPE` to `Release`
+before invoking `cmake`.
+
+Windows users can just switch the configuration in Visual Studio.
+
+## Running Unit Test
+
+A program named `unit_test` will be generated after a full compilation.
+To execute the unit tests, it expects the fonts from `data/ext` to be in the directory where it resides.
+Windows users also need to copy `icudt.dll` and `icuuc.dll` to that directory.
diff --git a/doc/smart_pointer_usage.md b/doc/smart_pointer_usage.md
new file mode 100644
index 0000000..5f77013
--- /dev/null
+++ b/doc/smart_pointer_usage.md
@@ -0,0 +1,145 @@
+Usage of smart pointers in sfntly C++ port
+==========================================
+
+In sfntly C++ port, an object ref-counting and smart pointer mechanism
+is implemented. The implementation works very much like COM.
+
+A ref-countable object type inherits from RefCounted<>, which has
+addRef() and release() just like IUnknown in COM (but no
+QueryInterface). Ptr<> is a smart pointer class like
+CComPtr<> which is used to hold the ref-countable objects so that
+the object ref count is handled correctly.
+
+Let's take a look at the example:
+
+``` {.prettyprint}
+class Foo : public RefCounted<Foo> {
+ public:
+  static Foo* CreateInstance() {
+    Ptr<Foo> obj = new Foo();  // ref count = 1
+    return obj.detach();  // Giving away the control of this instance.
+  }
+};
+typedef Ptr<Foo> FooPtr;  // Common short-hand notation.
+
+FooPtr obj;
+obj.attach(Foo::CreateInstance());  // ref count = 1
+                                    // Take over control but not bumping
+                                    // ref count.
+{
+  FooPtr obj2 = obj;  // ref count = 2
+                      // Assignment bumps up ref count.
+}  // ref count = 1
+   // obj2 out of scope, decrements ref count
+
+obj.release();  // ref count = 0, object destroyed
+```
+
+Notes on usage
+--------------
+
+- Inherit virtually from the RefCount interface in the base class if
+  smart pointers are going to be defined.
+
+
+
+- All RefCounted objects must be instantiated on the heap. Allocating
+  the object on the stack will cause a crash.
+
+
+
+- Be careful when you have complex inheritance. For example,
+
+> In this case the smart pointer is pretty dumb, so don't count on it to
+> nicely destroy your objects as designed.
+> Try refactoring your code like
+>
+> ``` {.prettyprint}
+> class I;  // the common interface and implementations
+> class A : public I, public RefCounted<A>;  // A specific implementation
+> class B : public I, public RefCounted<B>;  // B specific implementation
+> ```
+
+- Smart pointers here are very bad candidates for function parameters
+  and return values. Use dumb pointers when passing over the stack.
+
+
+
+- When down\_cast is performed on a dangling pointer due to bugs in
+  code, VC++ will generate an SEH exception which is not handled well in
+  the VC++ debugger. One can use WinDBG to run it and get the faulting stack.
+
+
+
+- Idioms for heap object as return value
+
+> If you are not passing that object back, you are the end of scope.
+
+> Be very careful when using the assignment operator as opposed to
+> attaching. This can easily cause memory leaks. Have a look at [The
+> difference between assignment and attachment with ATL smart
+> pointers](http://blogs.msdn.com/b/oldnewthing/archive/2009/11/20/9925918.aspx).
+> Since we're using essentially the same model, it applies in pretty much
+> the same way. The easiest way of knowing when to attach and when to
+> assign is to check the declaration of the function you are using.
+>
+> ``` {.prettyprint}
+> // Attach to the pointer returned by GetInstance
+> static CALLER_ATTACH FontFactory* GetInstance();
+>
+> // Assign pointer returned by NewCMapBuilder
+> CMap::Builder* NewCMapBuilder(const CMapId& cmap_id,
+>                               ReadableFontData* data);
+> ```
+
+Detecting Memory Leak
+---------------------
+
+> The implementation of COM-like ref-counting and smart pointers remedies
+> the lack of a garbage collector in C++ to a certain extent; however, it
+> also introduces other problems. The most common one is memory leakage.
+> How do we know whether our code leaks memory or not?
+
+- For Linux/Mac, valgrind
+  [http://valgrind.org](http://valgrind.org/)
+  is very useful.
+- For Windows, the easiest is to use Visual C++ built-in CRT debugging
+  [http://msdn.microsoft.com/en-us/library/x98tx3cf(v=vs.80).aspx](http://msdn.microsoft.com/en-us/library/x98tx3cf(v=vs.80).aspx)
+
+Useful Tips for Debugging Ref-Counting Issues
+---------------------------------------------
+
+- Define `ENABLE_OBJECT_COUNTER` and `REF_COUNT_DEBUGGING`. All
+  ref-count related activity will be output to stderr.
+
+
+
+- The logs will look like (under VC2010)
+
+> Use your favorite editor to transform them into SQL statements, e.g.
+> Regex pattern
+>
+> ``` {.prettyprint}
+> ^([ACDR]) class RefCounted[ *const]+:oc=([-0-9]+),oid=([0-9]+),rc=([-0-9]+)
+> ```
+
+> Replace with
+>
+> ``` {.prettyprint}
+> insert into log values(\1, \2, \3, \4, \5);
+> ```
+
+- Add one line to the beginning of the log
+
+
+
+- Run the sqlite shell, and use .read to input the SQL file.
+
+
+
+- Run the following commands to get the leaking object class and object
+  id:
+
+
+
+- Once you know which object is leaking, it's much easier to set up
+  conditional breakpoints to identify the real culprit.
diff --git a/doc/unit_tests.md b/doc/unit_tests.md
new file mode 100644
index 0000000..0426979
--- /dev/null
+++ b/doc/unit_tests.md
@@ -0,0 +1,32 @@
+Unit Test
+=========
+
+sfntly C++ port uses the Google testing framework for unit testing.
+
+All test cases are under src/test.
+
+To run the unit test, you need to compile sfntly first, then copy
+Tuffy.ttf from data/ext to the folder where unit\_test(.exe) is located.
+Run the unit\_test.
+
+If you are only interested in one test case, for example, you are
+working on the name table and only want the name editing test, you can
+use the gtest\_filter flag on the command line.
+
+``` {.prettyprint}
+unit_test --gtest_filter=NameEditing.*
+```
+
+For more info regarding running a subset of gtest, please see [GTest
+Advanced
+Guide](http://code.google.com/p/googletest/wiki/V1_6_AdvancedGuide).
+
+It's strongly suggested to run unit\_test with valgrind. The following
+example shows a typical test command:
+
+``` {.prettyprint}
+valgrind --leak-check=full -v ./unit_test --gtest_filter=*.*-FontData.*
+```
+
+FontData tests are the longest and you may want to bypass them during
+development. They should still be run on the bots, however.
--
cgit v1.2.3