This is achieved by adding getBytes2() and getBytes4() to streams, and by
changing int16() and int32() to take multiple scalar args instead of an array
arg.
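A rough sketch of the shape of the change (the function bodies, and the
return convention of getBytes2(), are assumptions rather than the exact
pdf.js code):

    // Before: a temporary array was allocated just to combine two bytes.
    function int16(bytes) {
      return (bytes[0] << 8) | bytes[1];
    }
    var value = int16(stream.getBytes(2));

    // After: scalar arguments, plus a stream helper that avoids the array.
    function int16(b0, b1) {
      return (b0 << 8) | b1;
    }
    Stream.prototype.getBytes2 = function () {
      return (this.getByte() << 8) | this.getByte();
    };
    var value = stream.getBytes2();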
This reduces memory consumption for text-heavy documents. I tested five
documents and saw hit rates ranging from 97.4% to 99.8% (most of the misses
are due to |width| varying even when |fontChar| matches). On two of those
documents I saw reductions of 40 and 50 MiB.
The patch also introduces the Glyph constructor, and renames the |unicodeChars|
local variable as |unicode| for consistency with the corresponding Glyph
property.
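A sketch of the resulting cache lookup; the constructor parameters beyond
|fontChar|, |unicode| and |width|, and the |glyphCache| field name, are
illustrative:

    function Glyph(fontChar, unicode, accent, width, vmetric, operatorListId) {
      this.fontChar = fontChar;
      this.unicode = unicode;
      this.accent = accent;
      this.width = width;
      this.vmetric = vmetric;
      this.operatorListId = operatorListId;
    }

    // In charsToGlyphs(): reuse the Glyph previously built for this charcode
    // if its properties still match; a differing |width| is the usual reason
    // they don't.
    var glyph = this.glyphCache[charcode];
    if (!glyph || glyph.fontChar !== fontChar ||
        glyph.unicode !== unicode || glyph.width !== width) {
      glyph = new Glyph(fontChar, unicode, accent, width, vmetric,
                        operatorListId);
      this.glyphCache[charcode] = glyph;
    }
    glyphs.push(glyph);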
There is no need to slow down the inner loop with a test for |ltp|, because
it can only change if |prediction| is true, in which case it only changes in
the outer loop.
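In outline, the decoding loops then look something like this (heavily
simplified; |decoder|, |contexts|, |pseudoPixelContext| and computeContext()
stand in for the real template/context machinery):

    var bitmap = [];
    var ltp = 0;
    for (var i = 0; i < height; i++) {
      if (prediction) {
        // |ltp| can only toggle here, once per row.
        var sltp = decoder.readBit(contexts, pseudoPixelContext);
        ltp ^= sltp;
        if (ltp) {
          bitmap.push(bitmap[i - 1]); // duplicate the previous row
          continue;
        }
      }
      var row = [];
      for (var j = 0; j < width; j++) {
        // The inner loop no longer tests |ltp| at all.
        row.push(decoder.readBit(contexts, computeContext(i, j)));
      }
      bitmap.push(row);
    }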
When decoding a stream, the decode buffer is often grown multiple times, its
byte size increasing like so: 512, 1024, 2048, etc. This patch estimates the
minimum size in advance (using the length of the encoded stream), often
allowing the smaller sizes to be skipped. It also renames numerous |length|
variables as |maybeLength| to make it clear that they can be |null|.
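A sketch of the idea, assuming a DecodeStream-style ensureBuffer(); the
expansion factor and exact rounding are illustrative:

    function DecodeStream(maybeLength) {
      // Guess a minimum output size from the encoded length, rounded up to
      // a power of two, so the early doubling steps (512, 1024, ...) can be
      // skipped.
      var estimatedSize = 512;
      if (maybeLength) {
        while (estimatedSize < maybeLength * 2) { // assumed expansion factor
          estimatedSize *= 2;
        }
      }
      this.minBufferLength = estimatedSize;
      this.buffer = null;
      this.bufferLength = 0;
    }

    DecodeStream.prototype.ensureBuffer = function (requested) {
      var buffer = this.buffer;
      if (buffer && requested <= buffer.byteLength) {
        return buffer;
      }
      var size = this.minBufferLength; // start from the estimate, not 512
      while (size < requested) {
        size *= 2;
      }
      var newBuffer = new Uint8Array(size);
      if (buffer) {
        newBuffer.set(buffer);
      }
      return (this.buffer = newBuffer);
    };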
Measured on eight documents, this change reduces the cumulative size of
decode buffer allocations by 0--32%, with 10--20% being typical, and reduces
peak RSS by 10 or 20 MiB for several of them.
This avoids lots of unnecessary work when such streams are merely referenced
via fetch() and their bytes are never subsequently read. This is a large
performance win on some files.
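A minimal sketch of the pattern, using a hypothetical ExampleStream; the
point is that construction does no decoding work, so a stream that is
fetch()ed but never read costs almost nothing:

    function ExampleStream(str, maybeLength, params) {
      this.str = str;
      this.params = params;
      this.initialized = false;
      DecodeStream.call(this, maybeLength);
    }
    ExampleStream.prototype = Object.create(DecodeStream.prototype);

    ExampleStream.prototype.readBlock = function () {
      if (!this.initialized) {
        this.expensiveSetup(); // hypothetical: runs only on first read
        this.initialized = true;
      }
      // ... decode the next block into this.buffer ...
    };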
By checking if the data is all present before making a substream, we avoid
cases where we parse part of a stream and then throw a MissingDataException
part-way through, which forces us to later re-read the stream -- possibly
multiple times. This is a sizeable performance win in some cases where file
loading is slow (e.g. over the web).
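A sketch of the check, assuming a ChunkedStream-style source; the name of
the availability query is an assumption:

    // Before parsing the stream's contents, make sure all of its bytes are
    // already available. If they are not, throw MissingDataException for the
    // whole range up front, rather than failing part-way through parsing and
    // having to re-read (and re-parse) the stream later.
    var start = stream.start;
    var end = start + length;
    if (!stream.isDataLoaded(start, end)) { // illustrative name
      throw new MissingDataException(start, end);
    }
    var subStream = stream.makeSubStream(start, length, dict);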