android13/external/sfntly/doc/SfntlyCPlusPlusNotes.md

Introduction
===============================================================

This document will as the C++ port matures serve as a log to how
different parts of the library work. As of today, there is some general
info but mostly CMap specific details.

------------------------------------------------------------------------

Font Data Tables
===========================================================================

One of the important goals in `sfntly` is thread safety which is why
tables can only be created with their nested `Builder` class and are
immutable after creation.

`CMapTable`
--------------------------------------------------------

*CMap* = character map; it converts *code points* in a *code page* to
*glyph IDs*.

The CMapTable is a table of CMaps (CMaps are also tables; one for every
encoding supported by the font). Representing an encoding-dependent
character map is in one of 14 formats, out of which formats 0 and 4 are
the most used; sfntly/C++ will initially only support formats 0, 2, 4
and 12.

### `CMapFormat0` Byte encoding table

Format 0 is a basic table where a character’s glyph ID is looked up in a
glyphIdArray256. As it only supports 256 characters it can only encode
ASCII and ISO 8859-x (alphabet-based languages).

### `CMapFormat2` High-byte mapping through table

Chinese, Japanese and Korean (CJK) need special 2 byte encodings for
each code point like Shift-JIS.

### `CMapFormat4` Segment mapping to delta values

This is the preferred format for Unicode Basic Multilingual Plane (BMP)
encodings according to the Microsoft spec. Format 4 defines segments
(contiguous ranges of characters; variable length). Finding a
character’s glyph id first means finding the segment it is part of using
a binary search (the segments are sorted). A segment has a
**`startCode`**, an **`endCode`** (the minimum and maximum code points
in the segment), an **`idDelta`** (delta for all code points in the
segment) and an **`idRangeOffset`** (offset into glyphIdArray or 0).

`idDelta` and `idRangeOffset` seem to be the same thing, offsets. In
fact, `idRangeOffset` uses the glyph array to get the index by relying
on the fact that the array is immediately after the `idRangeOffset`
table in the font file. So, the segment’s offset is `idRangeOffset[i]`
but since the `idRangeOffset` table contains words and not bytes, the
value is divided by 2.

``` {.prettyprint}
glyphIndex = *(&idRangeOffset[i] + idRangeOffset[i] / 2 + (c - startCode[i]))
```

`idDelta[i]` is another kind of segment offset used when
`idRangeOffset[i] = 0`, in which case it is added directly to the
character code.

``` {.prettyprint}
glyphIndex = idDelta[i] + c
```

### Class Hierarchy

`CMapTable` is the main class and the container for all other CMap
related classes.

#### Utility classes

-   `CMapTable::CMapId` describes a pair of IDs, platform ID and
    encoding ID that form the CMaps ID. The ID a CMap has is usually a
    good indicator as to what kind of format the CMap uses (Unicode
    CMaps are usually either format 4 or format 12).
-   `CMapTable::CMapIdComparator`
-   `CMapTable::CMapIterator` iteration through the CMapTable is
    supported through a Java-style iterator.
-   `CMapTable::CMapFilter` Java-style filter; CMapIterator supports
    filtering CMaps. By default, it accepts everything CMap.
-   `CMapTable::CMapIdFilter` extends CMapFilter; only accepts one type
    of CMap. Used in conjunction with CMapIterator, this is how the CMap
    getters are implemented.
-   **`CMapTable::Builder`** is the only way to create a CMapTable.

#### CMaps

-   **`CMapTable::CMap`** is the abstract base class that all
    `CMapFormat*` derive. It defines basic functions and the abstract
    `CMapTable::CMap::CharacterIterator` class to iterate through the
    characters in the map. The basic implementation just loops through
    every character between a start and an end. This is overridden so
    that format specific iteration is performed.
-   `CMapFormat0` (mostly done?)
-   `CMapFormat2` (needs builders)
-   ... coming soon

`[todo: will add images soon; need to upload to svn]`

------------------------------------------------------------------------

# Table Building Pipeline

Building a data table in sfntly is done by the
`FontDataTable::Builder::build` method which defines the general
pipeline and leaves the details to each implementing subclass
(`CMapTable::Builder` for example). Note: **`sub*`** methods are table
specific

**`ReadableFontDataPtr data = internalReadData()`**
> There are 2 private fields in the `FontDataTable::Builder` class:
> `rData` and `wData` for `ReadableFontData` and `WritableFontData`.
> This function returns `rData` if there is any or `wData` (it is cast
> to readable font data) if `rData` is null. *They hold the same data!*

**`if (model_changed_)`**
> A font is essentially a binary blob when loaded inside a `FontData`
> object. A *model* is the Java/C++ collection of objects that represent
> the same data in a manipulable format. If you ask for the model (even
> if you dont write to it), it will count as changed and the underlying
> raw data will get updated.

**`if (!subReadyToSerialize())`**
**`return NULL`**
`else`
1.  **`size = subDataToSerialize()`**
2.  **`WritableDataPtr new_data = container_->getNewData(size)`**
3.  **`subSerialize(new_data)`**
4.  **`data = new_data`**

**`FontDataTablePtr table = subBuildTable(data)`**
> The table is actually built, where `subBuildTable` is overridden by
> every class of table but a table header is always added.

Subtable Builders
------------------------------------------------------------------------------

Subtables are lazily built

When creating the object view of the font and dealing with lots of
tables, it would be wasteful to create builders for every subtable there
is since most users only do fairly high level manipulation of the font.
Instead, **only the tables at font level are fully built**.

All other subtables have builders that contain valid FontData but the
object view is not created by default. For the `CMapTable`, this means
that if you don’t go through the `getCMapBuilders()` method, the CMap
builders are not initialized. So, the builder map would seem to be empty
when calling its `size()` method but there are CMaps in the font when
calling `numCMaps(internalReadFont())`.

------------------------------------------------------------------------

Character encoders
---------------------------------------------------------------------------------

Sfntly/Java uses a native ICU-based API for encoding characters.
Sfntly/C++ uses ICU directly. In unit tests we assume text is encoded in
UTF16. Public APIs will use ICU classes like `UnicodeString`.