From 6b0889d9b0357ef1409c9e28e98d706658acddfd Mon Sep 17 00:00:00 2001
From: Roland Illig <roland.illig@gmx.de>
Date: Tue, 6 Jun 2017 00:06:38 +0200
Subject: Rescue the old wiki pages from web.archive.org. (#87)

The conversion from HTML to MarkDown has been done mainly by Pandoc,
with some manual cleanup afterwards.

Fixes #83.
---
 doc/MicroTypeExpress.md     |  42 ++++++++++++
 doc/SfntlyCPlusPlusNotes.md | 164 ++++++++++++++++++++++++++++++++++++++++++++
 doc/Useful_links.md         |  25 +++++++
 doc/build_cpp.md            | 150 ++++++++++++++++++++++++++++++++++++++++
 doc/smart_pointer_usage.md  | 145 +++++++++++++++++++++++++++++++++++++++
 doc/unit_tests.md           |  32 +++++++++
 6 files changed, 558 insertions(+)
 create mode 100644 doc/MicroTypeExpress.md
 create mode 100644 doc/SfntlyCPlusPlusNotes.md
 create mode 100644 doc/Useful_links.md
 create mode 100644 doc/build_cpp.md
 create mode 100644 doc/smart_pointer_usage.md
 create mode 100644 doc/unit_tests.md

(limited to 'doc')

diff --git a/doc/MicroTypeExpress.md b/doc/MicroTypeExpress.md
new file mode 100644
index 0000000..97f55d0
--- /dev/null
+++ b/doc/MicroTypeExpress.md
@@ -0,0 +1,42 @@
+# MicroType Express in sfntly
+
+This page describes sfntly’s implementation of MicroType Express
+compression.
+
+The MicroType Express format is [documented](http://www.w3.org/Submission/MTX/),
+and has recently become available under license terms we believe are friendly
+to open source (as well as proprietary) implementation. The most popular
+implementation of a decoder is Internet Explorer, as a compression
+option in its Embedded OpenType (EOT) format. Since MicroType Express
+gives an approximate 15% gain in compression over gzip, a web font
+service striving for maximum performance can benefit from implementing
+EOT compression.
+
+The current codebase in sfntly is for compression only (this is by far
+the most useful for web serving). The easiest way to get started is with
+the command line tool, sfnttool:
+
+```
+sfnttool -e -x source.ttf source.eot
+```
+
+The code has been tested extensively against IE, and is currently in
+production in Google Web Fonts, both for full fonts and for subsetting
+(using the text= parameter). This serving path also uses sfntly to
+compute the subsets, and is essentially the same as the -s parameter to
+sfnttool.
+
+If you’re interested in the code and details of the compression
+algorithm, here’s a bit of a guide:
+
+The top-level MicroType Express compression code is in MtxWriter.java
+(in the tools/conversion/eot directory). This code implements almost all
+of the MicroType Express format, including the hdmx tables, push
+sequences, and jump coding. The main feature missing is the VDMX table
+(vertical device metrics), which is very rarely used in web fonts.
+
+Patches are welcome -- possible areas include: implementing the VDMX
+table, speeding up the LZCOMP entropy coder (the match finding code is a
+straightforward adaptation of the algorithm in the format document),
+implementing a decoder in addition to an encoder, and porting the Java
+implementation to C++.
diff --git a/doc/SfntlyCPlusPlusNotes.md b/doc/SfntlyCPlusPlusNotes.md
new file mode 100644
index 0000000..0a55f85
--- /dev/null
+++ b/doc/SfntlyCPlusPlusNotes.md
@@ -0,0 +1,164 @@
+Introduction
+===============================================================
+
+As the C++ port matures, this document will serve as a log of how
+different parts of the library work. As of today, there is some general
+info but mostly CMap-specific details.
+
+------------------------------------------------------------------------
+
+Font Data Tables
+===========================================================================
+
+One of the important goals in `sfntly` is thread safety, which is why
+tables can only be created with their nested `Builder` class and are
+immutable after creation.
+
+`CMapTable`
+--------------------------------------------------------
+
+*CMap* = character map; it converts *code points* in a *code page* to
+*glyph IDs*.
+
+The CMapTable is a table of CMaps (CMaps are also tables; one for every
+encoding supported by the font). An encoding-dependent character map is
+represented in one of 14 formats, of which formats 0 and 4 are
+the most used; sfntly/C++ will initially only support formats 0, 2, 4
+and 12.
+
+### `CMapFormat0` Byte encoding table
+
+Format 0 is a basic table where a character’s glyph ID is looked up in a
+`glyphIdArray[256]`. As it only supports 256 characters, it can only encode
+ASCII and ISO 8859-x (alphabet-based languages).
+
+### `CMapFormat2` High-byte mapping through table
+
+Chinese, Japanese and Korean (CJK) need special 2-byte encodings for
+each code point, such as Shift-JIS.
+
+### `CMapFormat4` Segment mapping to delta values
+
+This is the preferred format for Unicode Basic Multilingual Plane (BMP)
+encodings according to the Microsoft spec. Format 4 defines segments
+(contiguous ranges of characters; variable length). Finding a
+character’s glyph id first means finding the segment it is part of using
+a binary search (the segments are sorted). A segment has a
+**`startCode`**, an **`endCode`** (the minimum and maximum code points
+in the segment), an **`idDelta`** (delta for all code points in the
+segment) and an **`idRangeOffset`** (offset into glyphIdArray or 0).
+
+`idDelta` and `idRangeOffset` seem to be the same thing, offsets. In
+fact, `idRangeOffset` uses the glyph array to get the index by relying
+on the fact that the array is immediately after the `idRangeOffset`
+table in the font file. So, the segment’s offset is `idRangeOffset[i]`
+but since the `idRangeOffset` table contains words and not bytes, the
+value is divided by 2.
+
+``` {.prettyprint}
+glyphIndex = *(&idRangeOffset[i] + idRangeOffset[i] / 2 + (c - startCode[i]))
+```
+
+`idDelta[i]` is another kind of segment offset used when
+`idRangeOffset[i] = 0`, in which case it is added directly to the
+character code.
+
+``` {.prettyprint}
+glyphIndex = idDelta[i] + c
+```
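+
+As a rough illustration of the lookup described above, here is a minimal,
+self-contained C++ sketch. It is not sfntly's implementation: the `Format4`
+struct, its field names and `LookupGlyph` are hypothetical, and the segment
+arrays are assumed to be already decoded into vectors. Because
+`glyph_id_array` is stored separately here, the pointer trick becomes an
+index from which the remaining `seg_count - i` words of the `idRangeOffset`
+table are subtracted. Following the OpenType specification, `idDelta`
+arithmetic is modulo 65536 and a stored glyph ID of 0 means "missing glyph".
+
+```cpp
+#include <cassert>
+#include <cstddef>
+#include <cstdint>
+#include <vector>
+
+// Hypothetical in-memory view of a format 4 subtable (not an sfntly type).
+struct Format4 {
+  std::vector<uint16_t> end_code;         // sorted ascending, last is 0xFFFF
+  std::vector<uint16_t> start_code;
+  std::vector<uint16_t> id_delta;
+  std::vector<uint16_t> id_range_offset;  // byte offsets, or 0
+  std::vector<uint16_t> glyph_id_array;
+};
+
+// Returns the glyph ID for character code c, or 0 if c is not mapped.
+uint16_t LookupGlyph(const Format4& t, uint16_t c) {
+  const std::size_t seg_count = t.end_code.size();
+  assert(t.start_code.size() == seg_count &&
+         t.id_delta.size() == seg_count &&
+         t.id_range_offset.size() == seg_count);
+
+  // Binary search for the first segment whose endCode >= c.
+  std::size_t lo = 0, hi = seg_count;
+  while (lo < hi) {
+    const std::size_t mid = lo + (hi - lo) / 2;
+    if (t.end_code[mid] < c) lo = mid + 1; else hi = mid;
+  }
+  if (lo == seg_count || t.start_code[lo] > c) return 0;  // c is in a gap
+  const std::size_t i = lo;
+
+  if (t.id_range_offset[i] == 0) {
+    // idDelta is added directly to the character code, modulo 65536.
+    return static_cast<uint16_t>(t.id_delta[i] + c);
+  }
+  // idRangeOffset[i] is in bytes and relative to &idRangeOffset[i]; skip the
+  // (seg_count - i) words that remain in the idRangeOffset table itself.
+  const std::size_t index =
+      t.id_range_offset[i] / 2 + (c - t.start_code[i]) - (seg_count - i);
+  if (index >= t.glyph_id_array.size()) return 0;  // malformed table
+  const uint16_t glyph = t.glyph_id_array[index];
+  return glyph == 0 ? 0 : static_cast<uint16_t>(glyph + t.id_delta[i]);
+}
+```
+
+The bounds check on `index` is a defensive choice for this sketch; a real
+parser would more likely validate the table once when it is loaded.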
+
+### Class Hierarchy
+
+`CMapTable` is the main class and the container for all other CMap
+related classes.
+
+#### Utility classes
+
+-   `CMapTable::CMapId` describes a pair of IDs, platform ID and
+    encoding ID, that form the CMap’s ID. The ID a CMap has is usually a
+    good indicator as to what kind of format the CMap uses (Unicode
+    CMaps are usually either format 4 or format 12).
+-   `CMapTable::CMapIdComparator`
+-   `CMapTable::CMapIterator` iteration through the CMapTable is
+    supported through a Java-style iterator.
+-   `CMapTable::CMapFilter` Java-style filter; CMapIterator supports
+    filtering CMaps. By default, it accepts every CMap.
+-   `CMapTable::CMapIdFilter` extends CMapFilter; only accepts one type
+    of CMap. Used in conjunction with CMapIterator, this is how the CMap
+    getters are implemented.
+-   **`CMapTable::Builder`** is the only way to create a CMapTable.
+
+#### CMaps
+
+-   **`CMapTable::CMap`** is the abstract base class from which all
+    `CMapFormat*` classes derive. It defines basic functions and the abstract
+    `CMapTable::CMap::CharacterIterator` class to iterate through the
+    characters in the map. The basic implementation just loops through
+    every character between a start and an end. This is overridden so
+    that format specific iteration is performed.
+-   `CMapFormat0` (mostly done?)
+-   `CMapFormat2` (needs builders)
+-   ... coming soon
+
+`[todo: will add images soon; need to upload to svn]`
+
+------------------------------------------------------------------------
+
+# Table Building Pipeline
+
+Building a data table in sfntly is done by the
+`FontDataTable::Builder::build` method which defines the general
+pipeline and leaves the details to each implementing subclass
+(`CMapTable::Builder` for example). Note: **`sub*`** methods are
+table-specific.
+
+**`ReadableFontDataPtr data = internalReadData()`**
+> There are 2 private fields in the `FontDataTable::Builder` class:
+> `rData` and `wData` for `ReadableFontData` and `WritableFontData`.
+> This function returns `rData` if there is any or `wData` (it is cast
+> to readable font data) if `rData` is null. *They hold the same data!*
+
+**`if (model_changed_)`**
+> A font is essentially a binary blob when loaded inside a `FontData`
+> object. A *model* is the Java/C++ collection of objects that represent
+> the same data in a manipulable format. If you ask for the model (even
+> if you don’t write to it), it will count as changed and the underlying
+> raw data will get updated.
+
+**`if (!subReadyToSerialize())`**
+**`return NULL`**
+`else`
+1.  **`size = subDataToSerialize()`**
+2.  **`WritableDataPtr new_data = container_->getNewData(size)`**
+3.  **`subSerialize(new_data)`**
+4.  **`data = new_data`**
+
+**`FontDataTablePtr table = subBuildTable(data)`**
+> The table is actually built, where `subBuildTable` is overridden by
+> every class of table but a table header is always added.
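+
+The control flow above can be sketched as a template method. This is only
+an illustration under stated assumptions, not the sfntly source: `Bytes`,
+`Table`, `Builder` and `ByteListBuilder` are hypothetical stand-ins, the
+`Sub*` method names are made up for this sketch, and the nesting of the
+readiness check inside the `model_changed_` branch is one reading of the
+steps listed above.
+
+```cpp
+#include <algorithm>
+#include <cassert>
+#include <cstddef>
+#include <memory>
+#include <vector>
+
+typedef std::vector<unsigned char> Bytes;  // stand-in for FontData
+
+struct Table {  // stand-in for FontDataTable
+  Bytes data;
+};
+
+// Hypothetical, simplified builder base class.
+class Builder {
+ public:
+  explicit Builder(const Bytes& data) : data_(data), model_changed_(false) {}
+  virtual ~Builder() {}
+
+  // Returns a null pointer if the subclass is not ready to serialize.
+  std::unique_ptr<Table> Build() {
+    Bytes data = data_;  // plays the role of internalReadData()
+    if (model_changed_) {
+      if (!SubReadyToSerialize()) return nullptr;
+      Bytes new_data(SubDataSizeToSerialize());
+      SubSerialize(&new_data);
+      data = new_data;
+    }
+    return SubBuildTable(data);
+  }
+
+ protected:
+  void set_model_changed() { model_changed_ = true; }
+  virtual bool SubReadyToSerialize() = 0;
+  virtual std::size_t SubDataSizeToSerialize() = 0;
+  virtual void SubSerialize(Bytes* out) = 0;
+  virtual std::unique_ptr<Table> SubBuildTable(const Bytes& data) {
+    std::unique_ptr<Table> table(new Table());
+    table->data = data;
+    return table;
+  }
+
+ private:
+  Bytes data_;
+  bool model_changed_;
+};
+
+// Example subclass whose "model" is just an editable copy of the bytes.
+class ByteListBuilder : public Builder {
+ public:
+  explicit ByteListBuilder(const Bytes& data) : Builder(data), model_(data) {}
+
+  // Asking for the model counts as a change, even if nothing is written.
+  Bytes* model() {
+    set_model_changed();
+    return &model_;
+  }
+
+ protected:
+  bool SubReadyToSerialize() override { return !model_.empty(); }
+  std::size_t SubDataSizeToSerialize() override { return model_.size(); }
+  void SubSerialize(Bytes* out) override {
+    assert(out->size() == model_.size());  // buffer was sized by Build()
+    std::copy(model_.begin(), model_.end(), out->begin());
+  }
+
+ private:
+  Bytes model_;
+};
+```
+
+With this sketch, a builder whose model was never requested hands the
+original bytes straight to `SubBuildTable`, while one whose model was
+touched goes through the serialize steps first.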
+
+Subtable Builders
+------------------------------------------------------------------------------
+
+Subtables are lazily built
+
+When creating the object view of the font and dealing with lots of
+tables, it would be wasteful to create builders for every subtable there
+is since most users only do fairly high level manipulation of the font.
+Instead, **only the tables at font level are fully built**.
+
+All other subtables have builders that contain valid FontData but the
+object view is not created by default. For the `CMapTable`, this means
+that if you don’t go through the `getCMapBuilders()` method, the CMap
+builders are not initialized. So, the builder map would seem to be empty
+when calling its `size()` method but there are CMaps in the font when
+calling `numCMaps(internalReadFont())`.
+
+------------------------------------------------------------------------
+
+Character encoders
+---------------------------------------------------------------------------------
+
+Sfntly/Java uses a native ICU-based API for encoding characters.
+Sfntly/C++ uses ICU directly. In unit tests we assume text is encoded in
+UTF-16. Public APIs will use ICU classes like `UnicodeString`.
diff --git a/doc/Useful_links.md b/doc/Useful_links.md
new file mode 100644
index 0000000..2caa409
--- /dev/null
+++ b/doc/Useful_links.md
@@ -0,0 +1,25 @@
+Wiki
+
+-   TrueType
+    [http://en.wikipedia.org/wiki/TrueType](http://en.wikipedia.org/wiki/TrueType)
+-   OpenType
+    [http://en.wikipedia.org/wiki/OpenType](http://en.wikipedia.org/wiki/OpenType)
+
+Microsoft Typography
+[http://www.microsoft.com/typography/default.mspx](http://www.microsoft.com/typography/default.mspx)
+
+-   Font specifications
+    [http://www.microsoft.com/typography/SpecificationsOverview.mspx](http://www.microsoft.com/typography/SpecificationsOverview.mspx),
+    including OpenType and TrueType.
+-   Font Tools
+    [http://www.microsoft.com/typography/tools/tools.aspx](http://www.microsoft.com/typography/tools/tools.aspx)
+
+Apple
+
+-   TrueType reference manual
+    [http://developer.apple.com/fonts/TTRefMan](http://developer.apple.com/fonts/TTRefMan)
+
+Adobe
+
+-   OpenType
+    [http://www.adobe.com/type/opentype/](http://www.adobe.com/type/opentype/)
diff --git a/doc/build_cpp.md b/doc/build_cpp.md
new file mode 100644
index 0000000..5e3d344
--- /dev/null
+++ b/doc/build_cpp.md
@@ -0,0 +1,150 @@
+# How to build sfntly C++ port
+
+## Build Environment Requirements
+
+*   cmake 2.6 or above
+*   C++ compiler requirement
+    *   Windows: Visual C++ 2008, Visual C++ 2010
+    *   Linux: g++ 4.3 or above.
+        g++ must support built-in atomic ops and have a companion libstdc++.
+    *   Mac: Apple XCode 3.2.5 or above.
+
+## External Dependencies
+
+sfntly is dependent on several external packages.
+ 
+*   [Google C++ Testing Framework](https://github.com/google/googletest)
+
+    You can download the package yourself or extract the one from `ext/redist`.
+    The package needs to be extracted to `ext` and renamed or symbolically linked to `gtest`.
+
+    The sfntly C++ port has been tested with gTest 1.6.0.
+ 
+*   ICU
+
+    For Linux, the system's default ICU headers will be used.
+    Linux users please make sure you have dev packages for ICU.
+    For example, you can run `sudo apt-get install libicu-dev` in Ubuntu and see if the required library is installed.
+    
+    For Windows, download from http://site.icu-project.org/download or extract the one from `ext/redist`.
+    You can also provide your own ICU package.
+    However, you need to alter the include path, library path, and provide `icudt.dll`.
+    
+    Tested with ICU 4.6.1 binary release.
+
+    For Mac users, please download ICU source tarball from http://site.icu-project.org/download
+    and install according to ICU documents.
+
+## Getting the Source
+
+Clone the Git repository from https://github.com/googlei18n/sfntly.
+ 
+## Building on Windows
+
+Let's assume your folder for sfntly is `d:\src\sfntly`.
+
+1.  If you don't have cmake installed, extract the cmake-XXX.zip
+    into `d:\src\sfntly\cpp\ext\cmake` removing the "-XXX" part.
+    The extracted binary should be in `d:\src\sfntly\cpp\ext\cmake\bin\cmake.exe`.
+2.  Extract gtest-XXX.zip into `d:\src\sfntly\cpp\ext\gtest`
+    removing the "-XXX" part.
+3.  Extract icu4c-XXX.zip into `d:\src\sfntly\cpp\ext\icu`
+    removing the "-XXX" part.
+4.  Run the following commands to create the Visual Studio solution files:
+
+    ```
+    d:
+    cd d:\src\sfntly\cpp
+    md build
+    cd build
+    ..\ext\cmake\bin\cmake ..
+    ```
+
+You should see `sfntly.sln` in `d:\src\sfntly\cpp\build`.
+ 
+5.  Until the test is modified to access the fonts in the `data\ext` directory:
+    copy the test fonts from `d:\src\sfntly\cpp\data\ext\` to `d:\src\sfntly\cpp\build\bin\Debug`.
+6.  Open sfntly.sln.
+    Since sfntly uses STL extensively, please patch your Visual Studio with any STL-related hotfixes/service packs.
+7.  Build the solution (if the icuuc dll is not found,
+    you may need to add `d:\src\sfntly\cpp\ext\icu\bin` to the system path).
+
+### Building on Windows via Command Line
+
+Visual Studio 2008 and 2010 support command line building,
+therefore you don't need the IDE to build the project.
+
+For Visual Studio 2008 (assume it's installed at `c:\vs08`)
+
+    cd d:\src\sfntly\cpp\build
+    ..\ext\cmake\bin\cmake .. -G "Visual Studio 9 2008"
+    c:\vs08\common7\tools\vsvars32.bat
+    vcbuild sfntly.sln
+
+We invoke the cmake with `-G` to make sure
+Visual Studio 2008 solution/project files are generated.
+You can also use `devenv sfntly.sln /build`
+to build the solution instead of using `vcbuild`.
+
+There are subtle differences between `devenv` and `vcbuild`.
+Please refer to your Visual Studio manual for more details.
+
+For Visual Studio 2010 (assume it's installed at `c:\vs10`)
+
+
+    cd d:\src\sfntly\cpp\build
+    ..\ext\cmake\bin\cmake .. -G "Visual Studio 10"
+    c:\vs10\common7\tools\vsvars32.bat
+    msbuild sfntly.sln
+
+If you install both Visual Studio 2008 and 2010 on your system,
+you can't run the scripts above in the same Command Prompt window.
+`vsvars32.bat` assumes that it is run from a clean Command Prompt.
+
+## Building on Linux/Mac
+
+### Recommended Out-of-Source Building
+
+1.  `cd` *&lt;sfntly dir&gt;*
+2.  `mkdir build`
+3.  `cd build`
+4.  `cmake ..`
+5.  `make`
+
+### Default In-Source Building
+
+> This is not recommended.
+> Please use out-of-source whenever possible.
+
+1.  `cd` *&lt;sfntly dir&gt;*
+2.  `cmake .`
+3.  `make`
+
+### Using clang Instead
+
+Change the `cmake` command line to
+
+    CC=clang CXX=clang++ cmake ..
+
+The generated Makefile will use clang.
+Please note that sfntly uses a lot of advanced C++ semantics that
+might not be understood or compiled correctly by earlier versions
+of clang (2.8 and before).
+   
+sfntly is tested to compile and run correctly on clang 3.0 (trunk 135314).
+clang 2.9 might work but unfortunately we don't have the resources to test it.
+
+### Debug and Release Builds
+
+Currently Debug builds are set as default.
+To build Release builds, you can either modify the `CMakeLists.txt`,
+or set environment variable `CMAKE_BUILD_TYPE` to `Release`
+before invoking `cmake`.
+ 
+Windows users can just switch the configuration in Visual Studio.
+ 
+## Running Unit Test
+
+A program named `unit_test` will be generated after a full compilation.
+It expects the fonts in `data/ext` to be in the same directory in which it resides to execute the unit tests.
+Windows users also need to copy `icudt.dll` and `icuuc.dll` to that directory.
diff --git a/doc/smart_pointer_usage.md b/doc/smart_pointer_usage.md
new file mode 100644
index 0000000..5f77013
--- /dev/null
+++ b/doc/smart_pointer_usage.md
@@ -0,0 +1,145 @@
+Usage of smart pointers in sfntly C++ port
+=========================================================================================================================================================
+
+In sfntly C++ port, an object ref-counting and smart pointer mechanism
+is implemented. The implementation works very much like COM.
+
+Ref-countable object types inherit from RefCounted&lt;&gt;, which has
+addRef() and release() just like IUnknown in COM (but no
+QueryInterface). Ptr&lt;&gt; is a smart pointer class like
+CComPtr&lt;&gt; which is used to hold the ref-countable objects so that
+the object ref count is handled correctly.
+
+Let's take a look at the example:
+
+``` {.prettyprint}
+class Foo : public RefCounted<Foo> {
+ public:
+  static Foo* CreateInstance() {
+    Ptr<Foo> obj = new Foo();  // ref count = 1
+    return obj.detach();  // Giving away the control of this instance.
+  }
+};
+typedef Ptr<Foo> FooPtr;  // Common short-hand notation.
+
+FooPtr obj;
+obj.attach(Foo::CreateInstance());  // ref count = 1
+                                    // Take over control but not bumping
+                                    // ref count.
+{
+  FooPtr obj2 = obj;  // ref count = 2
+                      // Assignment bumps up ref count.
+}  // ref count = 1
+   // obj2 out of scope, decrements ref count
+
+obj.release();  // ref count = 0, object destroyed
+```
+
+Notes on usage
+---------------------------------------------------------------------
+
+-   Inherit virtually from the RefCount interface in the base class if
+    smart pointers are going to be defined.
+
+<!-- -->
+
+-   All RefCounted objects must be instantiated on the heap. Allocating
+    the object on the stack will cause a crash.
+
+<!-- -->
+
+-   Be careful when you have complex inheritance. For example,
+
+> In this case the smart pointer is pretty dumb; don't count on it to
+> nicely destroy your objects as designed. Try refactoring your code like
+>
+> ``` {.prettyprint}
+> class I;  // the common interface and implementations
+> class A : public I, public RefCounted<A>;  // A specific implementation
+> class B : public I, public RefCounted<B>;  // B specific implementation
+> ```
+
+-   Smart pointers here are very bad candidates for function parameters
+    and return values. Use dumb pointers when passing over the stack.
+
+<!-- -->
+
+-   When down\_cast is performed on a dangling pointer due to bugs in
+    code, VC++ will generate SEH which is not handled well in VC++
+    debugger. One can use WinDBG to run it and get the faulting stack.
+
+<!-- -->
+
+-   Idioms for heap object as return value
+
+> If you are not passing that object back, you are the end of scope.
+
+> Be very careful when using the assignment operator as opposed to
+> attaching. This can easily cause memory leaks. Have a look at [The
+> difference between assignment and attachment with ATL smart
+> pointers](http://blogs.msdn.com/b/oldnewthing/archive/2009/11/20/9925918.aspx).
+> Since we're using essentially the same model, it applies in pretty much
+> the same way. The easiest way of knowing when to Attach and when to
+> assign is to check the declaration of the function you are using.
+>
+> ``` {.prettyprint}
+>   // Attach to the pointer returned by GetInstance
+>   static CALLER_ATTACH FontFactory* GetInstance();
+>
+>   // Assign pointer returned by NewCMapBuilder 
+>   CMap::Builder* NewCMapBuilder(const CMapId& cmap_id,
+>                                 ReadableFontData* data);
+> ```
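+
+To make the difference concrete, here is a small self-contained sketch of
+the same ownership model. `RefCounted`, `Ptr` and `Foo` below are
+simplified stand-ins written for this note, not sfntly's actual classes;
+the method names, the `ref_count()` accessor and the two `CountAfter*`
+helpers are assumptions made purely for illustration.
+
+```cpp
+#include <cassert>
+#include <cstddef>
+
+// Simplified stand-in for a COM-like ref-counted base class.
+template <typename T>
+class RefCounted {
+ public:
+  RefCounted() : ref_count_(0) {}
+  virtual ~RefCounted() {}
+  std::size_t AddRef() { return ++ref_count_; }
+  std::size_t Release() {
+    assert(ref_count_ > 0);
+    const std::size_t remaining = --ref_count_;
+    if (remaining == 0) delete this;  // object must live on the heap
+    return remaining;
+  }
+  std::size_t ref_count() const { return ref_count_; }
+
+ private:
+  std::size_t ref_count_;
+};
+
+// Simplified stand-in for the smart pointer.
+template <typename T>
+class Ptr {
+ public:
+  Ptr() : p_(nullptr) {}
+  Ptr(T* p) : p_(p) { if (p_) p_->AddRef(); }  // assignment-style: bumps count
+  Ptr(const Ptr& other) : p_(other.p_) { if (p_) p_->AddRef(); }
+  ~Ptr() { if (p_) p_->Release(); }
+  Ptr& operator=(const Ptr& other) {
+    if (other.p_) other.p_->AddRef();
+    if (p_) p_->Release();
+    p_ = other.p_;
+    return *this;
+  }
+  // Takes over an existing reference without bumping the count.
+  void Attach(T* p) { if (p_) p_->Release(); p_ = p; }
+  // Gives the reference away without dropping the count.
+  T* Detach() { T* p = p_; p_ = nullptr; return p; }
+  T* get() const { return p_; }
+
+ private:
+  T* p_;
+};
+
+class Foo : public RefCounted<Foo> {
+ public:
+  // CALLER_ATTACH style: the returned pointer already carries one reference.
+  static Foo* CreateInstance() {
+    Ptr<Foo> obj(new Foo());  // ref count = 1
+    return obj.Detach();      // caller now owns that single reference
+  }
+};
+
+// Correct: attach to the reference the factory handed over.
+std::size_t CountAfterAttach() {
+  Ptr<Foo> p;
+  p.Attach(Foo::CreateInstance());
+  return p.get()->ref_count();
+}
+
+// Wrong for a CALLER_ATTACH function: initializing from the raw pointer adds
+// a second reference that nobody releases.
+std::size_t CountAfterAssign() {
+  Ptr<Foo> p = Foo::CreateInstance();
+  const std::size_t count = p.get()->ref_count();
+  p.get()->Release();  // drop the extra reference so this demo does not leak
+  return count;
+}
+```
+
+In this sketch attaching leaves the count at 1, while initializing the
+smart pointer from the returned raw pointer leaves it at 2, so without the
+manual `Release()` the object would outlive every smart pointer that
+refers to it.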
+
+Detecting Memory Leak
+------------------------------------------------------------------------------------------
+
+> The implementation of COM-like ref-counting and smart pointers remedies
+> the lack of a garbage collector in C++ to a certain extent; however, it
+> also introduces other problems. The most common one is memory leakage.
+> How do we know whether our code leaks memory?
+
+-   For Linux/Mac, valgrind
+    [http://valgrind.org](http://valgrind.org/)
+    is very useful.
+-   For Windows, the easiest is to use Visual C++ built-in CRT debugging
+    [http://msdn.microsoft.com/en-us/library/x98tx3cf(v=vs.80).aspx](http://msdn.microsoft.com/en-us/library/x98tx3cf(v=vs.80).aspx)
+
+Useful Tips for Debugging Ref-Counting Issues
+------------------------------------------------------------------------------------------------------------------------------------------------------------
+
+-   Define `ENABLE_OBJECT_COUNTER` and `REF_COUNT_DEBUGGING`. All
+    ref-count related activity will be output to stderr.
+
+<!-- -->
+
+-   The logs will look like (under VC2010)
+
+> Use your favorite editor to transform them into SQL statements, e.g.
+> Regex pattern
+>
+> ``` {.prettyprint}
+> ^([ACDR]) class RefCounted<class sfntly::([A-Za-z0-9:]+)>[ *const]+:oc=([-0-9]+),oid=([0-9]+),rc=([-0-9]+)
+> ```
+
+> Replace with
+>
+> ``` {.prettyprint}
+> insert into log values(\1, \2, \3, \4, \5);
+> ```
+
+-   Add one line to the beginning of log
+
+<!-- -->
+
+-   Run sqlite shell, use .read to input the SQL file.
+
+<!-- -->
+
+-   Run the following commands to get the leaking object class and object
+    id:
+
+<!-- -->
+
+-   Once you know which object is leaking, it's much easier to set up
+    conditional breakpoints to identify the real culprit.
diff --git a/doc/unit_tests.md b/doc/unit_tests.md
new file mode 100644
index 0000000..0426979
--- /dev/null
+++ b/doc/unit_tests.md
@@ -0,0 +1,32 @@
+Unit Test
+======================================================
+
+The sfntly C++ port uses the Google C++ Testing Framework for unit testing.
+
+All test cases are under src/test.
+
+To run the unit tests, you need to compile sfntly first, then copy
+Tuffy.ttf from data/ext to the folder where unit\_test(.exe) is located.
+Run unit\_test.
+
+If you are only interested in one test case, for example, you are
+working on the name table and only want the name editing tests, you can use
+the gtest\_filter flag on the command line.
+
+``` {.prettyprint}
+unit_test --gtest_filter=NameEditing.*
+```
+
+For more info regarding running a subset of gtest, please see [GTest
+Advanced
+Guide](http://code.google.com/p/googletest/wiki/V1_6_AdvancedGuide).
+
+It's strongly suggested to run unit\_test with valgrind. The following
+example shows a typical test command
+
+``` {.prettyprint}
+valgrind --leak-check=full -v ./unit_test --gtest_filter=*.*-FontData.*
+```
+
+FontData tests are the longest and you may want to bypass them during
+development. They should still be run on the bots, however.
-- 
cgit v1.2.3