<p>Posts · Life Happens, Programs Don't - Musings About Programming and Software Engineering · Thomas Leitner · https://gettalong.org/ · 2022-12-26T00:34:11+01:00</p>
<h1>Benchmarking Ruby 2.6 to 3.2</h1>
<p>https://gettalong.org/blog/2022/benchmarking-rubies.html · 2022-12-26T00:33:06+01:00 · 2022-12-26T00:25:00+01:00</p>
<p>It’s that time of the year again when a new Ruby is released and everyone asks: Is it faster? You will
find out below! And if you are interested, you can compare the results to the previous installments
of this post for the years <a href="../2016/ruby24-performance-looking-good.html">2016</a>, <a href="../2017/benchmarking-ruby-2-5.html">2017</a>, <a href="../2020/benchmarking-rubies.html">2020</a> and <a href="../2021/benchmarking-rubies.html">2021</a>.</p>
<p>This Christmas <a href="https://www.ruby-lang.org/en/news/2022/12/25/ruby-3-2-0-released/">Ruby 3.2.0</a> was released, featuring improvements across the board. The YJIT
just-in-time compiler has been ported to Rust and can now be used on ARM machines. And variable width
allocation (VWA) has been enabled by default. Let’s see how fast Ruby got in the past year!</p>
<h2 id="the-benchmark-setup">The Benchmark Setup</h2>
<p>I will be using the same benchmark setup as <a href="../2021/benchmarking-rubies.html">last year</a> with only a few changes to make the graphs
easier to read:</p>
<ol>
<li>
<p><a href="https://hexapdf.gettalong.org">HexaPDF</a></p>
<p>The following commands were executed in the <code>benchmark/</code> directory:</p>
<pre><code>./rubies.sh "2.6.10 2.7.7 3.0.5 3.1.3y 3.2.0 3.2.0y" optimization -b "hexapdf CS$"
./rubies.sh "2.6.10 2.7.7 3.0.5 3.1.3y 3.2.0 3.2.0y" raw_text -b hexapdf
./rubies.sh "2.6.10 2.7.7 3.0.5 3.1.3y 3.2.0 3.2.0y" line_wrapping -b "hexapdf C"
</code></pre>
<p>A Ruby version with an appended “y” tells the script to activate YJIT.</p>
<p>Did you know that HexaPDF is part of the <a href="https://speed.yjit.org/">YJIT headline benchmarks</a>?</p>
</li>
<li>
<p><a href="https://kramdown.gettalong.org">kramdown</a></p>
<p>The following command was executed in the kramdown repository directory:</p>
<pre><code>./benchmark/benchmark-rubies.sh "2.6.10 2.7.7 3.0.5 3.1.3y 3.2.0 3.2.0y"
</code></pre>
<p>A Ruby version with an appended “y” tells the script to activate YJIT.</p>
</li>
<li>
<p><a href="https://github.com/gettalong/geom2d">geom2d</a></p>
<p>This benchmark is done using the superb <a href="https://github.com/benchmark-driver/benchmark-driver">benchmark-driver</a> gem.</p>
<pre><code>benchmark-driver benchmark.yaml --rbenv "2.6.10;2.7.7;3.0.5;3.1.3 --yjit;3.2.0;3.2.0 --yjit" -o record
benchmark-driver benchmark_driver.record.yml -o gruff
</code></pre>
</li>
</ol>
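<p>The “y” suffix convention can be sketched as follows. This is only an illustration of the idea, not the code of the actual benchmark scripts: the helper name <code>parse_ruby_spec</code> and the mapping table are hypothetical.</p>

```ruby
# Hypothetical helper (not taken from rubies.sh): split a version spec such
# as "3.2.0y" into the Ruby version and the interpreter flags implied by
# the suffix.
SUFFIX_FLAGS = { "y" => "--yjit" }.freeze

def parse_ruby_spec(spec)
  match = spec.match(/\A(\d+\.\d+\.\d+)([a-z]?)\z/)
  raise ArgumentError, "unrecognized version spec: #{spec}" unless match

  version, suffix = match.captures
  flags = suffix.empty? ? [] : [SUFFIX_FLAGS.fetch(suffix)]
  [version, flags]
end

"2.6.10 2.7.7 3.0.5 3.1.3y 3.2.0 3.2.0y".split.each do |spec|
  version, flags = parse_ruby_spec(spec)
  puts "#{version}: ruby #{flags.join(' ')}".rstrip
end
```

<p>Keeping the suffix-to-flag mapping in one table would make it easy to add further variants without touching the parsing logic.</p>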
<h2 id="results">Results</h2>
<h3 id="hexapdf">HexaPDF</h3>
<p>The images are SVG files; click on them to open them in a new window and view the details. The raw
data is already post-processed and ready for ingestion by gnuplot, with the time in milliseconds and
the memory in kilobytes.</p>
<h4 id="optimization-benchmark">Optimization Benchmark</h4>
<p class="image fit"><a href="assets/hexapdf-optimization.svg" target="_blank"><img src="assets/hexapdf-optimization.svg" alt="HexaPDF optimization benchmark" /></a></p>
<pre><code>Time "hexapdf 2.6.10" "hexapdf 2.7.7" "hexapdf 3.0.5" "hexapdf 3.1.3-yjit" "hexapdf 3.2.0" "hexapdf 3.2.0-yjit"
"CS a.pdf" 188 191 195 424 189 305
"CS b.pdf" 796 885 860 958 819 854
"CS c.pdf" 1542 1659 1714 1502 1549 1218
"CS d.pdf" 3124 3306 3538 3066 3393 2690
"CS e.pdf" 747 819 820 969 774 797
"CS f.pdf" 45440 48377 49116 37725 46389 34489
Memory "hexapdf 2.6.10" "hexapdf 2.7.7" "hexapdf 3.0.5" "hexapdf 3.1.3-yjit" "hexapdf 3.2.0" "hexapdf 3.2.0-yjit"
"CS a.pdf" 29224 28420 27988 294724 30336 35128
"CS b.pdf" 51104 50380 50012 315624 45756 57380
"CS c.pdf" 49184 52524 52628 319708 54344 59840
"CS d.pdf" 79956 78908 75352 340456 78360 83844
"CS e.pdf" 89796 98672 88084 373956 110704 116940
"CS f.pdf" 586996 602100 597028 872624 547892 532800
Filesize "hexapdf 2.6.10" "hexapdf 2.7.7" "hexapdf 3.0.5" "hexapdf 3.1.3-yjit" "hexapdf 3.2.0" "hexapdf 3.2.0-yjit"
"CS a.pdf" 49227 49226 49228 49226 49226 49227
"CS b.pdf" 11045210 11045211 11045210 11045209 11045212 11045210
"CS c.pdf" 13180717 13180715 13180714 13180715 13180717 13180717
"CS d.pdf" 6418483 6418483 6418483 6418482 6418483 6418483
"CS e.pdf" 21751180 21751181 21751180 21751180 21751181 21751180
"CS f.pdf" 117545254 117545255 117545254 117545254 117545254 117545254
</code></pre>
<h4 id="raw-text-benchmark">Raw Text Benchmark</h4>
<p class="image fit"><a href="assets/hexapdf-raw_text.svg" target="_blank"><img src="assets/hexapdf-raw_text.svg" alt="HexaPDF raw text benchmark" /></a></p>
<pre><code>Time "hexapdf 2.6.10" "hexapdf 2.7.7" "hexapdf 3.0.5" "hexapdf 3.1.3-yjit" "hexapdf 3.2.0" "hexapdf 3.2.0-yjit"
"1x" 547 543 548 720 568 513
"5x" 1856 2182 2090 1862 1954 1570
"10x" 3661 4109 4003 3425 3679 2909
"1x ttf" 570 590 593 742 590 571
"5x ttf" 2148 2299 2423 2185 2221 1880
"10x ttf" 4284 4467 4698 3950 4269 3486
Memory "hexapdf 2.6.10" "hexapdf 2.7.7" "hexapdf 3.0.5" "hexapdf 3.1.3-yjit" "hexapdf 3.2.0" "hexapdf 3.2.0-yjit"
"1x" 35152 34252 32356 298724 38928 43348
"5x" 47480 47648 46792 313284 49712 53444
"10x" 60140 59800 59256 325612 62100 65252
"1x ttf" 33760 34400 33392 298996 36592 41268
"5x ttf" 49204 45592 43896 310380 51056 55004
"10x ttf" 63212 63012 63908 323152 70132 73668
Filesize "hexapdf 2.6.10" "hexapdf 2.7.7" "hexapdf 3.0.5" "hexapdf 3.1.3-yjit" "hexapdf 3.2.0" "hexapdf 3.2.0-yjit"
"1x" 441386 441388 441386 441386 441386 441386
"5x" 2201631 2201631 2201633 2201631 2201631 2201631
"10x" 4403276 4403278 4403276 4403278 4403276 4403278
"1x ttf" 535240 535241 535241 535239 535239 535239
"5x ttf" 2615338 2615339 2615337 2615337 2615339 2615339
"10x ttf" 5217071 5217070 5217071 5217072 5217071 5217073
</code></pre>
<h4 id="line-wrapping-benchmark">Line Wrapping Benchmark</h4>
<p class="image fit"><a href="assets/hexapdf-line_wrapping.svg" target="_blank"><img src="assets/hexapdf-line_wrapping.svg" alt="HexaPDF line wrapping benchmark" /></a></p>
<pre><code>Time "hexapdf 2.6.10" "hexapdf 2.7.7" "hexapdf 3.0.5" "hexapdf 3.1.3-yjit" "hexapdf 3.2.0" "hexapdf 3.2.0-yjit"
"C 400" 1472 1699 1646 1544 1675 1379
"C 200" 1601 1815 1851 1733 1970 1539
"C 100" 1945 2209 2210 1901 2307 1730
"C 50" 3281 3584 3576 2834 3563 2499
"C 400 ttf" 1559 1779 1753 1619 1716 1516
"C 200 ttf" 1717 1976 1958 1732 1966 1622
"C 100 ttf" 2223 2515 2423 2141 2626 1885
"C 50 ttf" 5231 6203 6047 5071 5701 4759
Memory "hexapdf 2.6.10" "hexapdf 2.7.7" "hexapdf 3.0.5" "hexapdf 3.1.3-yjit" "hexapdf 3.2.0" "hexapdf 3.2.0-yjit"
"C 400" 112224 113632 115872 369368 86496 93916
"C 200" 112144 106964 108920 369136 88396 96840
"C 100" 109572 104924 105588 367296 91876 100108
"C 50" 159716 192212 235116 467600 197040 207976
"C 400 ttf" 109004 114376 92308 382244 96928 105088
"C 200 ttf" 115452 103608 103792 372156 90280 95440
"C 100 ttf" 116328 101844 104080 369268 92012 100312
"C 50 ttf" 158264 268552 258640 541588 273504 279660
Filesize "hexapdf 2.6.10" "hexapdf 2.7.7" "hexapdf 3.0.5" "hexapdf 3.1.3-yjit" "hexapdf 3.2.0" "hexapdf 3.2.0-yjit"
"C 400" 361579 361581 361579 361579 361581 361579
"C 200" 408490 408495 408493 408493 408495 408495
"C 100" 463816 463815 463814 463816 463814 463813
"C 50" 569332 569332 569332 569332 569332 569334
"C 400 ttf" 442441 442443 442446 442440 442440 442441
"C 200 ttf" 504456 504458 504457 504456 504456 504457
"C 100 ttf" 606549 606547 606549 606549 606548 606548
"C 50 ttf" 767841 767843 767839 767839 767841 767841
</code></pre>
<h4 id="comments">Comments</h4>
<ul>
<li>
<p><strong>I’m just blown away at how much faster 3.2.0 with YJIT is compared to 3.2.0 without it!</strong> This
is very visible in longer running benchmarks, like with optimizing <code>f.pdf</code> where the difference is
a whopping 25%!</p>
</li>
<li>
<p>The three fastest Rubies are 3.2.0 with YJIT, 3.1.3 with YJIT and then 2.6.10. The other “plain” Ruby
versions are noticeably slower than 2.6.10. However, it is good to see that 3.2.0 is the fastest
among them in most benchmarks.</p>
</li>
<li>
<p>There is a big difference in memory usage between 3.1.3 with YJIT and 3.2.0 with YJIT since the
latter doesn’t reserve memory up-front anymore. The result is that the memory overhead of 3.2.0
with YJIT compared to plain 3.2.0 is very small. This means that there is no need to tune any
YJIT-related memory options anymore!</p>
</li>
<li>
<p>It seems that startup time with YJIT has drastically improved compared to 3.1.3, which is noticeable
in the short-running benchmarks.</p>
</li>
</ul>
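<p>The figures behind the first and third comment can be recomputed from the tables. The following is a small sketch using the “CS f.pdf” times and the “CS a.pdf” memory values; since plain 3.1.3 was not measured, the 3.1.3 overhead is taken relative to plain 3.2.0, which is an assumption made only for this illustration.</p>

```ruby
# Values copied from the optimization benchmark tables above.
TIME_MS = { "3.2.0" => 46_389, "3.2.0-yjit" => 34_489 }.freeze # "CS f.pdf"
MEMORY_KB = {                                                   # "CS a.pdf"
  "3.1.3-yjit" => 294_724,
  "3.2.0" => 30_336,
  "3.2.0-yjit" => 35_128,
}.freeze

# Percentage by which new_value is lower than old_value.
def percent_reduction(old_value, new_value)
  ((old_value - new_value) * 100.0 / old_value).round(1)
end

time_saved = percent_reduction(TIME_MS["3.2.0"], TIME_MS["3.2.0-yjit"])
overhead_32 = MEMORY_KB["3.2.0-yjit"] - MEMORY_KB["3.2.0"]
overhead_31 = MEMORY_KB["3.1.3-yjit"] - MEMORY_KB["3.2.0"]

puts "f.pdf run time reduction with YJIT: #{time_saved}%"
puts "a.pdf YJIT memory overhead: #{overhead_32} KB (3.2.0) vs. #{overhead_31} KB (3.1.3)"
```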
<h3 id="kramdown">kramdown</h3>
<p><img src="assets/kramdown.png" alt="kramdown benchmark" /></p>
<pre><code># ruby-2.6.10p210 || ruby-2.7.7p221 || ruby-3.0.5p211 || ruby-3.1.3p185-yjit || ruby-3.2.0 || ruby-3.2.0-0-yjit
256 0.68738 0.75570 0.73757 0.67675 0.81496 0.70220
512 1.67798 1.68121 1.54550 1.45382 1.74624 1.54535
1024 3.42654 3.40070 3.44496 3.06991 3.59604 3.18020
</code></pre>
<p>Surprisingly, plain Ruby 3.2.0 is the slowest one in this benchmark. The good news is that 3.2.0 with
YJIT performs very well: in the two larger runs it is only beaten by 3.1.3 with YJIT.</p>
<h3 id="geom2d">geom2d</h3>
<p>The bars represent iterations per second, so larger bars are better.</p>
<p class="image fit"><img src="assets/geom2d.png" alt="geom2d small benchmark" /></p>
<pre><code>Comparison:
small
3.2.0 --yjit: 8200.6 i/s
3.1.3 --yjit: 7124.2 i/s - 1.15x slower
2.6.10: 4373.8 i/s - 1.87x slower
3.0.5: 3972.3 i/s - 2.06x slower
3.2.0: 3908.2 i/s - 2.10x slower
2.7.7: 3789.6 i/s - 2.16x slower
</code></pre>
<p>Like last time, Ruby with YJIT performs best in this largely CPU-bound benchmark. And YJIT got faster
compared to Ruby 3.1.3!</p>
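<p>The “x slower” column in the comparison is simply the ratio between the fastest result and each other result. A minimal sketch that recomputes it from the numbers above (the formatting is an approximation of the benchmark-driver output, not its actual code):</p>

```ruby
# Iterations per second copied from the comparison output above.
RESULTS = {
  "3.2.0 --yjit" => 8200.6,
  "3.1.3 --yjit" => 7124.2,
  "2.6.10" => 4373.8,
  "3.0.5" => 3972.3,
  "3.2.0" => 3908.2,
  "2.7.7" => 3789.6,
}.freeze

fastest_name, fastest_ips = RESULTS.max_by { |_name, ips| ips }

RESULTS.sort_by { |_name, ips| -ips }.each do |name, ips|
  if name == fastest_name
    puts format("%-14s %8.1f i/s", name, ips)
  else
    # How many times slower this run is than the fastest one.
    puts format("%-14s %8.1f i/s - %.2fx slower", name, ips, fastest_ips / ips)
  end
end
```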
<h2 id="conclusion">Conclusion</h2>
<p>Ruby 3.2.0 with YJIT is a big step forward performance-wise for all kinds of applications. And I’m
very pleased to see that it got faster since 3.1.x and is now deemed production-ready!</p>
<p>Other enhancements like enabled-by-default variable width allocation (VWA) and object shapes are
certainly also contributing to a better and more performant Ruby experience.</p>
<h1>Benchmarking Ruby 2.5 to 3.1</h1>
<p>https://gettalong.org/blog/2021/benchmarking-rubies.html · 2021-12-26T16:11:52+01:00 · 2021-12-26T15:55:00+01:00</p>
<p>This is another Ruby comparison benchmark, in the tradition of <a href="../2016/ruby24-performance-looking-good.html">2016</a>, <a href="../2017/benchmarking-ruby-2-5.html">2017</a> and <a href="../2020/benchmarking-rubies.html">2020</a>. This
Christmas <a href="https://www.ruby-lang.org/en/news/2021/12/25/ruby-3-1-0-released/">Ruby 3.1.0</a> was released, featuring the brand-new YJIT just-in-time compiler.
<a href="https://twitter.com/_gettalong/status/1442964424130768896">Preliminary benchmarks</a> showed noticeable performance benefits for <a href="https://hexapdf.gettalong.org">HexaPDF</a>, so let’s see
what the final version brings.</p>
<h2 id="the-benchmark-setup">The Benchmark Setup</h2>
<p>I will be using the same applications/libraries as <a href="../2020/benchmarking-rubies.html#three-real-world-benchmarks">last time</a> (look there if you need more
details): <a href="https://hexapdf.gettalong.org">HexaPDF</a>, <a href="https://kramdown.gettalong.org">kramdown</a> and <a href="https://github.com/gettalong/geom2d">geom2d</a>.</p>
<p>The adapted benchmarking scripts are as follows:</p>
<ol>
<li>
<p>HexaPDF</p>
<p>The following commands were executed in the <code>benchmark/</code> directory:</p>
<pre><code>./rubies.sh "2.5.7 2.6.9 2.7.5 3.0.3 3.1.0 3.1.0m 3.1.0y" optimization -b hexapdf
./rubies.sh "2.5.7 2.6.9 2.7.5 3.0.3 3.1.0 3.1.0m 3.1.0y" raw_text -b hexapdf
./rubies.sh "2.5.7 2.6.9 2.7.5 3.0.3 3.1.0 3.1.0m 3.1.0y" line_wrapping -b hexapdf
</code></pre>
<p>A Ruby version with an appended “m” tells the script to activate MJIT, one with “y” to activate
YJIT.</p>
</li>
<li>
<p>kramdown</p>
<p>The following command was executed in the kramdown repository directory:</p>
<pre><code>./benchmark/benchmark-rubies.sh "2.5.7 2.6.9 2.7.5 3.0.3 3.1.0 3.1.0m 3.1.0y"
</code></pre>
<p>A Ruby version with an appended “m” tells the script to activate MJIT, one with “y” to activate
YJIT.</p>
</li>
<li>
<p>geom2d</p>
<p>This benchmark is done using the superb <a href="https://github.com/benchmark-driver/benchmark-driver">benchmark-driver</a> gem.</p>
<pre><code>benchmark-driver benchmark.yaml --rbenv "2.5.7;2.6.9;2.7.5;3.0.3;3.1.0;3.1.0 --mjit;3.1.0 --yjit" -o record
benchmark-driver benchmark_driver.record.yml -o gruff
</code></pre>
</li>
</ol>
<h2 id="results">Results</h2>
<h3 id="hexapdf">HexaPDF</h3>
<p>The images are SVG files; click on them to open them in a new window and view the details. The raw
data is already post-processed and ready for ingestion by gnuplot, with the time in milliseconds and
the memory in kilobytes.</p>
<h4 id="optimization-benchmark">Optimization Benchmark</h4>
<p>Note: There are six groups (different files <code>a.pdf</code> to <code>f.pdf</code>) of four benchmarks (different
hexapdf invocations; except for <code>f.pdf</code> which only has three because the CSP mode would take
<em>really</em> long) with each benchmark having seven columns (different Ruby versions).</p>
<p class="image fit"><a href="assets/hexapdf-optimization.svg" target="_blank"><img src="assets/hexapdf-optimization.svg" alt="HexaPDF optimization benchmark" /></a></p>
<pre><code>Time "hexapdf 2.5.7" "hexapdf 2.6.9" "hexapdf 2.7.5" "hexapdf 3.0.3" "hexapdf 3.1.0" "hexapdf 3.1.0-mjit" "hexapdf 3.1.0-yjit"
"a.pdf" 222 193 298 226 183 471 325
"C a.pdf" 166 153 141 159 163 394 318
"CS a.pdf" 163 154 147 162 178 402 344
"CSP a.pdf" 156 162 158 172 180 387 359
"b.pdf" 652 627 629 665 684 1058 737
"C b.pdf" 605 634 667 669 703 1037 754
"CS b.pdf" 727 741 745 802 798 1059 850
"CSP b.pdf" 4141 4767 5072 4864 4998 0 4099
"c.pdf" 1163 1178 1204 1274 1246 1492 1178
"C c.pdf" 1156 1260 1344 1358 1365 1750 1192
"CS c.pdf" 1300 1385 1477 1461 1535 0 1344
"CSP c.pdf" 4533 4883 5234 5208 5397 0 4268
"d.pdf" 2819 2845 2793 3025 2941 3831 2540
"C d.pdf" 2714 2780 2811 3000 3009 4075 2532
"CS d.pdf" 3023 3076 3162 3324 3447 4398 2783
"CSP d.pdf" 2874 3074 2976 3111 3095 4822 2630
"e.pdf" 575 581 531 572 578 710 673
"C e.pdf" 608 640 660 686 697 904 790
"CS e.pdf" 636 678 680 726 721 902 825
"CSP e.pdf" 16825 18422 19527 20789 20505 23846 17253
"f.pdf" 33949 34068 33737 35586 36043 41606 27994
"C f.pdf" 37793 38077 37730 39154 40388 46554 29839
"CS f.pdf" 44430 44506 45118 45257 48361 0 35573
"CSP f.pdf" 0 0 0 0 0 0 0
Memory "hexapdf 2.5.7" "hexapdf 2.6.9" "hexapdf 2.7.5" "hexapdf 3.0.3" "hexapdf 3.1.0" "hexapdf 3.1.0-mjit" "hexapdf 3.1.0-yjit"
"a.pdf" 15540 28152 28296 27380 28132 44620 293860
"C a.pdf" 15792 28252 28332 27536 28172 44952 294052
"CS a.pdf" 16264 28772 28472 27752 28648 44936 294300
"CSP a.pdf" 16632 29336 29292 28520 29228 45052 295064
"b.pdf" 35016 42784 46352 46016 46572 59792 312352
"C b.pdf" 33884 44772 46616 46184 47516 59688 313140
"CS b.pdf" 38200 47676 47468 49020 50344 59896 316500
"CSP b.pdf" 49040 55124 58024 57848 60796 0 327324
"c.pdf" 34224 52548 51652 48920 49300 59940 314104
"C c.pdf" 37744 47796 50580 49984 49912 60072 315448
"CS c.pdf" 40120 51608 52548 52732 52596 0 317996
"CSP c.pdf" 59140 69204 66496 68760 70772 0 336448
"d.pdf" 65220 77980 76660 76364 72688 73116 338472
"C d.pdf" 57904 74148 73436 75120 76108 77096 341764
"CS d.pdf" 58220 75648 77044 75496 75932 76432 341316
"CSP d.pdf" 78812 85504 84528 86348 89068 90924 353380
"e.pdf" 52316 57112 54956 49764 51660 51900 317468
"C e.pdf" 63492 86952 91124 88324 92276 92588 359600
"CS e.pdf" 89928 83532 97708 83388 104044 104044 369756
"CSP e.pdf" 160284 143684 158604 157276 156308 165656 418132
"f.pdf" 490684 510832 483032 485868 511396 514828 763428
"C f.pdf" 517716 488540 527900 535080 572200 576408 813928
"CS f.pdf" 616200 578648 604328 617584 616688 0 879312
"CSP f.pdf" 0 0 0 0 0 0 0
</code></pre>
<h4 id="raw-text-benchmark">Raw Text Benchmark</h4>
<p class="image fit"><a href="assets/hexapdf-raw_text.svg" target="_blank"><img src="assets/hexapdf-raw_text.svg" alt="HexaPDF raw text benchmark" /></a></p>
<pre><code>Time "hexapdf 2.5.7" "hexapdf 2.6.9" "hexapdf 2.7.5" "hexapdf 3.0.3" "hexapdf 3.1.0" "hexapdf 3.1.0-mjit" "hexapdf 3.1.0-yjit"
"1x" 466 508 514 533 547 663 627
"5x" 1755 1908 1917 2118 2076 0 1842
"10x" 3356 3838 3792 3975 3821 0 3340
"1x ttf" 498 565 558 597 578 903 671
"5x ttf" 2053 2205 2278 2257 2283 0 2103
"10x ttf" 4036 4269 4228 4353 4678 0 3879
Memory "hexapdf 2.5.7" "hexapdf 2.6.9" "hexapdf 2.7.5" "hexapdf 3.0.3" "hexapdf 3.1.0" "hexapdf 3.1.0-mjit" "hexapdf 3.1.0-yjit"
"1x" 24564 36420 35748 34852 33780 47896 298680
"5x" 37312 47088 45824 45668 48368 0 312764
"10x" 48732 58440 57240 57924 61528 0 325368
"1x ttf" 23544 33580 34232 33672 33672 49184 298680
"5x ttf" 36120 48876 44820 43888 46436 0 310888
"10x ttf" 58572 62688 62604 64524 63100 0 325932
</code></pre>
<h4 id="line-wrapping-benchmark">Line Wrapping Benchmark</h4>
<p class="image fit"><a href="assets/hexapdf-line_wrapping.svg" target="_blank"><img src="assets/hexapdf-line_wrapping.svg" alt="HexaPDF line wrapping benchmark" /></a></p>
<pre><code>Time "hexapdf 2.5.7" "hexapdf 2.6.9" "hexapdf 2.7.5" "hexapdf 3.0.3" "hexapdf 3.1.0" "hexapdf 3.1.0-mjit" "hexapdf 3.1.0-yjit"
"L 400" 1217 1283 1333 1290 1363 1679 1322
"C 400" 1513 1664 1775 1516 1579 1729 1482
"L 200" 1334 1400 1503 1503 1534 1733 1434
"C 200" 1679 1875 2000 1726 1722 2157 1564
"L 100" 1545 1636 1790 1709 1727 2147 1552
"C 100" 2073 2130 2329 1967 2086 2434 1835
"L 50" 2555 2750 2872 2757 2684 3072 2259
"C 50" 3300 3578 3668 3278 3296 3924 2724
"L 400 ttf" 1254 1365 1395 1394 1448 0 1355
"C 400 ttf" 1613 1738 1810 1577 1569 1890 1539
"L 200 ttf" 1518 1514 1622 1528 1577 0 1389
"C 200 ttf" 1867 1965 1975 1787 1783 0 1667
"L 100 ttf" 1881 1763 1854 1847 1928 0 1616
"C 100 ttf" 2322 2356 2549 2229 2273 0 1895
"L 50 ttf" 4430 4708 4804 4559 4531 0 3960
"C 50 ttf" 5466 5996 5732 5565 5489 6522 4710
Memory "hexapdf 2.5.7" "hexapdf 2.6.9" "hexapdf 2.7.5" "hexapdf 3.0.3" "hexapdf 3.1.0" "hexapdf 3.1.0-mjit" "hexapdf 3.1.0-yjit"
"L 400" 79868 106940 91156 95348 101444 101896 367380
"C 400" 85836 97632 91032 102984 94980 95248 362252
"L 200" 84244 100584 104232 89136 97136 97868 363504
"C 200" 78540 92620 87440 95608 95192 95832 362624
"L 100" 81896 97384 98992 87036 93264 93848 359116
"C 100" 82184 91844 99844 91052 91324 91940 358584
"L 50" 183380 177796 217552 232260 234476 231672 497100
"C 50" 181212 174596 199768 217608 232100 204324 488116
"L 400 ttf" 77720 103380 104088 103356 97220 0 363836
"C 400 ttf" 82780 108436 108464 99456 105296 105764 372432
"L 200 ttf" 84904 101268 93068 94876 100204 0 366180
"C 200 ttf" 83992 102280 102124 100048 98804 0 365992
"L 100 ttf" 87404 100348 92292 91812 95940 0 361592
"C 100 ttf" 85068 100708 100148 96804 96468 0 363424
"L 50 ttf" 267952 247984 278100 275644 273772 0 545924
"C 50 ttf" 300152 264944 275396 267236 293740 277408 546472
</code></pre>
<h4 id="comments">Comments</h4>
<ul>
<li>
<p>Run time generally gets a bit worse each time for each newer version of Ruby, with 2.5.7 most
often being the fastest except for Ruby 3.1.0+YJIT.</p>
</li>
<li>
<p>Ruby 3.1.0+MJIT is the slowest one, often being much slower than 3.1.0.</p>
</li>
<li>
<p><strong>Ruby 3.1.0+YJIT</strong> performs on par with Ruby 2.5.7 for most benchmarks but is much faster when
the benchmark takes longer. E.g. the optimization benchmark on <code>f.pdf</code> takes around 34 seconds for
2.5.7, but only around 28 seconds for 3.1.0+YJIT.</p>
<p>While the raw text benchmark generates many small (string) objects and doesn’t benefit much from
YJIT, the line wrapping benchmark needs to do many more computations and sees a big performance
improvement for the longer running benchmarks.</p>
<p>The drawback of using YJIT is its high memory usage, consuming an additional 256MB of RAM by
default. See below for how to change the memory used by YJIT and how that affects its performance.</p>
</li>
<li>
<p>Ruby 3.1.0+MJIT has a bug with respect to Zlib which affects HexaPDF because Zlib is used for
deflate streams. This is the reason why there is often no data for that column.</p>
</li>
</ul>
<h3 id="kramdown">kramdown</h3>
<p><img src="assets/kramdown.png" alt="kramdown benchmark" /></p>
<pre><code># ruby-2.5.7p206 || ruby-2.6.9p207 || ruby-2.7.5p203 || ruby-3.0.3p157 || ruby-3.1.0p0 || ruby-3.1.0p0-mjit || ruby-3.1.0p0-yjit ||
256 0.70620 0.69444 0.65407 0.73346 0.71877 0.74956 0.65734
512 1.65357 1.60639 1.39829 1.54884 1.56712 1.57714 1.43944
1024 3.37941 3.48940 3.16486 3.33241 3.30570 3.33736 3.05656
</code></pre>
<p>While all Ruby versions perform very similarly, Ruby+YJIT takes the lead in the longest run of this benchmark, with 2.7.5 slightly ahead in the two shorter ones.</p>
<h3 id="geom2d">geom2d</h3>
<p>The bars represent iterations per second, so larger bars are better.</p>
<p class="image fit"><img src="assets/geom2d.png" alt="geom2d small benchmark" /></p>
<pre><code>Comparison:
small
3.1.0 --yjit: 7538.8 i/s
2.6.9: 4760.6 i/s - 1.58x slower
3.0.3: 4519.3 i/s - 1.67x slower
2.5.7: 4513.3 i/s - 1.67x slower
3.1.0: 4358.0 i/s - 1.73x slower
2.7.5: 4134.1 i/s - 1.82x slower
3.1.0 --mjit: 4000.9 i/s - 1.88x slower
</code></pre>
<p>Like last time, Ruby+MJIT performed worst but, as expected, Ruby+YJIT performs best in this largely
CPU-bound benchmark.</p>
<h2 id="rubyyjit-memory-tuning">Ruby+YJIT Memory Tuning</h2>
<p>The drawback to using YJIT is its memory usage. YJIT uses 256MB of RAM for its purposes by default
but that can be tuned using the <code>--yjit-exec-mem-size</code> option.</p>
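<p>As a rough sketch of how the option is passed, the following builds a command line for a script run with YJIT and a smaller executable memory size. The script name <code>benchmark.rb</code> and both helper methods are placeholders made up for this example; the value is given in megabytes, as in the text above.</p>

```ruby
require "rbconfig"

# Build the command line for running a script with YJIT and the given
# executable memory size in megabytes. The script name is a placeholder.
def yjit_command(script, exec_mem_size_mb)
  [RbConfig.ruby, "--yjit", "--yjit-exec-mem-size=#{exec_mem_size_mb}", script]
end

# Inside a running process the YJIT status can be queried. The defined?
# guard keeps the check safe on Rubies that were built without YJIT.
def yjit_enabled?
  !!(defined?(RubyVM::YJIT) && RubyVM::YJIT.enabled?)
end

puts yjit_command("benchmark.rb", 16).join(" ")
puts "YJIT enabled in this process: #{yjit_enabled?}"
```

<p>The resulting array can be handed to <code>system(*command)</code> to actually start the run.</p>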
<p>To see the effects of different executable memory sizes, I tested again with HexaPDF and geom2d:</p>
<pre><code>|--------------------------------------------------------------------|
| Optimization || Time | Memory | File size |
|--------------------------------------------------------------------|
| 3.1.0 no YJIT | CSP e.pdf | 20.784ms | 163.532KiB | 21.186.414 |
| YJIT 256MB | CSP e.pdf | 16.242ms | 427.072KiB | 21.186.414 |
| YJIT 128MB | CSP e.pdf | 16.150ms | 297.616KiB | 21.186.416 |
| YJIT 64MB | CSP e.pdf | 16.414ms | 232.044KiB | 21.186.415 |
| YJIT 32MB | CSP e.pdf | 16.312ms | 196.820KiB | 21.186.414 |
| YJIT 16MB | CSP e.pdf | 16.267ms | 182.744KiB | 21.186.414 |
|--------------------------------------------------------------------|
| Line wrapping || Time | Memory | File size |
|--------------------------------------------------------------------|
| 3.1.0 no YJIT | L 50 | 2.717ms | 219.740KiB | 569.798 |
| YJIT 256MB | L 50 | 2.234ms | 474.148KiB | 569.797 |
| YJIT 128MB | L 50 | 2.171ms | 360.116KiB | 569.797 |
| YJIT 64MB | L 50 | 2.158ms | 283.852KiB | 569.798 |
| YJIT 32MB | L 50 | 2.129ms | 258.600KiB | 569.797 |
| YJIT 16MB | L 50 | 2.125ms | 232.604KiB | 569.797 |
|--------------------------------------------------------------------|
</code></pre>
<p class="image fit"><img src="assets/geom2d-yjit-mem.png" alt="geom2d small benchmark" /></p>
<p>In all three cases YJIT performs at a similar level regardless of whether it has 16MB or up to 256MB
of memory for its purpose; and it is faster than Ruby 3.1.0 without YJIT. <strong>I will gladly forfeit
16MB of memory in exchange for 20% (HexaPDF) to 50% (geom2d) better performance!</strong></p>
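<p>To make the trade-off concrete, here is a short sketch that recomputes the time saved and the extra memory used for each setting, based on the “CSP e.pdf” rows of the table above. Only the numbers are from the benchmark; the script itself is an illustration.</p>

```ruby
# Rows copied from the "CSP e.pdf" table above: time in ms, memory in KiB.
BASELINE = { time: 20_784, memory: 163_532 }.freeze # 3.1.0 without YJIT
YJIT_RUNS = {
  256 => { time: 16_242, memory: 427_072 },
  128 => { time: 16_150, memory: 297_616 },
  64 => { time: 16_414, memory: 232_044 },
  32 => { time: 16_312, memory: 196_820 },
  16 => { time: 16_267, memory: 182_744 },
}.freeze

YJIT_RUNS.each do |exec_mem_mb, run|
  less_time = (BASELINE[:time] - run[:time]) * 100.0 / BASELINE[:time]
  extra_mib = (run[:memory] - BASELINE[:memory]) / 1024.0
  puts format("YJIT %3dMB: %.1f%% less time, %.1f MiB more memory",
              exec_mem_mb, less_time, extra_mib)
end
```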
<p>Note that in both benchmarks the code size that can be optimized is not that large, less than 10,000
lines in HexaPDF’s case. So this might be different for e.g. big Rails applications.</p>
<h2 id="conclusion">Conclusion</h2>
<p>With YJIT, Ruby 3.1.0 brings another JIT to the runtime and <strong>this one brings benefits for all kinds
of programs, not just for very CPU intensive ones</strong>. The performance benefit is sometimes very large,
as can be seen in the geom2d benchmark.</p>
<p>One should also check whether the default of 256MB of RAM for YJIT is necessary and tune that value.
Giving YJIT as little as 16MB of RAM turned out to be as good as 256MB for the benchmarked
applications/libraries.</p>
<p>After finding last year that MJIT doesn’t perform so well with regular applications, I’m <em>very</em>
excited that YJIT works so well and I will be following its development closely!</p>
<h1>An Unusual Performance Optimization</h1>
<p>https://gettalong.org/blog/2021/an-unusual-performance-optimization.html · 2021-01-18T23:44:57+01:00 · 2021-01-14T20:24:00+01:00</p>
<p>I regularly run the <a href="https://hexapdf.gettalong.org/documentation/benchmarks/">HexaPDF benchmarks</a> to make sure that HexaPDF gets faster and not
slower. One of the benchmarks, the “raw_text” benchmark, always had me wondering why using TrueType
fonts was visibly slower. So I decided to investigate.</p>
<h2 id="the-odd-benchmark-result">The Odd Benchmark Result</h2>
<p>The “raw_text” benchmark tests the performance of close-to-the-metal text output. This is important
because it is the limiting factor when creating PDF files, especially big PDF files, with much text
content.</p>
<p>What the benchmark does is</p>
<ul>
<li>reading a file line by line,</li>
<li>putting each line on a page, without line wrapping and with manual cursor positioning,</li>
<li>and creating new pages as necessary.</li>
</ul>
<p>Here is the main part of the HexaPDF script:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="no">File</span><span class="p">.</span><span class="nf">foreach</span><span class="p">(</span><span class="no">ARGV</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="ss">mode: </span><span class="s1">'r'</span><span class="p">)</span> <span class="k">do</span> <span class="o">|</span><span class="n">line</span><span class="o">|</span>
<span class="k">if</span> <span class="n">y</span> <span class="o"><</span> <span class="n">bottom_margin</span>
<span class="c1"># Remove the canvas object out of scope for garbage collection</span>
<span class="k">if</span> <span class="n">canvas</span>
<span class="n">doc</span><span class="p">.</span><span class="nf">clear_cache</span><span class="p">(</span><span class="n">canvas</span><span class="p">.</span><span class="nf">context</span><span class="p">.</span><span class="nf">data</span><span class="p">)</span>
<span class="n">canvas</span><span class="p">.</span><span class="nf">context</span><span class="p">.</span><span class="nf">contents</span> <span class="o">=</span> <span class="n">canvas</span><span class="p">.</span><span class="nf">context</span><span class="p">.</span><span class="nf">contents</span>
<span class="k">end</span>
<span class="n">canvas</span> <span class="o">=</span> <span class="n">doc</span><span class="p">.</span><span class="nf">pages</span><span class="p">.</span><span class="nf">add</span><span class="p">.</span><span class="nf">canvas</span>
<span class="n">canvas</span><span class="p">.</span><span class="nf">font</span><span class="p">(</span><span class="n">font</span><span class="p">,</span> <span class="ss">size: </span><span class="mi">12</span><span class="p">)</span>
<span class="n">canvas</span><span class="p">.</span><span class="nf">leading</span> <span class="o">=</span> <span class="mi">14</span>
<span class="n">canvas</span><span class="p">.</span><span class="nf">move_text_cursor</span><span class="p">(</span><span class="ss">offset: </span><span class="p">[</span><span class="mi">72</span><span class="p">,</span> <span class="n">top_margin</span><span class="p">])</span>
<span class="n">y</span> <span class="o">=</span> <span class="n">top_margin</span>
<span class="k">end</span>
<span class="n">canvas</span><span class="p">.</span><span class="nf">show_glyphs_only</span><span class="p">(</span><span class="n">font</span><span class="p">.</span><span class="nf">decode_utf8</span><span class="p">(</span><span class="n">line</span><span class="p">.</span><span class="nf">rstrip!</span><span class="p">))</span>
<span class="n">canvas</span><span class="p">.</span><span class="nf">move_text_cursor</span>
<span class="n">y</span> <span class="o">-=</span> <span class="mi">14</span>
<span class="k">end</span>
</code></pre></div></div>
<p>The bottleneck is at the bottom where each line gets decoded into glyph objects (<code>font.decode_utf8</code>)
and the characters’ glyphs are shown (<code>canvas.show_glyphs_only</code>).</p>
<p>There is nothing specific here regarding TrueType fonts. The <code>font</code> object either represents one of
the built-in PDF fonts like ‘Times Roman’ or a supplied TrueType font. The methods in either version
are very similar.</p>
<p>The <code>canvas.show_glyphs_only</code> method of the canvas is also font-agnostic, calling <code>font.encode</code> to
encode a certain glyph into the needed PDF content stream operator representation.</p>
<p>What’s more is that the two font methods are using cached values as much as possible. I.e. once a
glyph is created for a certain UTF-8 character, it is re-used. And once a glyph is encoded, it
doesn’t need to be encoded again.</p>
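<p>The caching idea can be sketched with plain Ruby hashes. The classes below are hypothetical stand-ins and not HexaPDF’s actual implementation; they only mirror the method names <code>decode_utf8</code> and <code>encode</code> mentioned above.</p>

```ruby
# Hypothetical stand-ins; these are not HexaPDF's actual classes.
Glyph = Struct.new(:codepoint)

class CachingFont
  attr_reader :glyphs_created

  def initialize
    @glyphs_created = 0
    # A glyph object is built once per code point and reused afterwards.
    @glyph_cache = Hash.new do |cache, codepoint|
      @glyphs_created += 1
      cache[codepoint] = Glyph.new(codepoint)
    end
    # The encoded form is likewise computed only once per glyph. Two bytes
    # per glyph are used here, so this sketch only covers code points up
    # to 0xFFFF.
    @encoded_cache = Hash.new do |cache, glyph|
      cache[glyph] = [glyph.codepoint].pack("n")
    end
  end

  def decode_utf8(str)
    str.each_codepoint.map { |codepoint| @glyph_cache[codepoint] }
  end

  def encode(glyph)
    @encoded_cache[glyph]
  end
end

font = CachingFont.new
glyphs = font.decode_utf8("hello world")
puts "#{glyphs.size} glyphs shown, #{font.glyphs_created} glyph objects created"
```

<p>With such a cache in place, the per-character cost after the first occurrence is essentially one hash lookup, which is why the font type should not matter much at this level.</p>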
<p>So at first glance there shouldn’t have been much difference, performance-wise, when using
a TrueType font instead of a built-in font. But there was, as the benchmark results showed:</p>
<pre><code>|--------------------------------------------------------------------|
| || Time | Memory | File size |
|--------------------------------------------------------------------|
| hexapdf | 1x | 557ms | 34.160KiB | 452.598 |
|--------------------------------------------------------------------|
| hexapdf | 5x | 1.891ms | 45.244KiB | 2.258.904 |
|--------------------------------------------------------------------|
| hexapdf | 10x | 3.754ms | 57.364KiB | 4.517.825 |
|--------------------------------------------------------------------|
| hexapdf | 1x ttf | 634ms | 33.044KiB | 549.522 |
|--------------------------------------------------------------------|
| hexapdf | 5x ttf | 2.335ms | 48.908KiB | 2.687.124 |
|--------------------------------------------------------------------|
| hexapdf | 10x ttf | 4.693ms | 63.568KiB | 5.360.947 |
|--------------------------------------------------------------------|
</code></pre>
<p>The text file used by the benchmark is the Project Gutenberg text of Homer’s Odyssey (contains about
12.000 lines and about 700.000 characters). The “1x”, “5x” and “10x” indicators show the number of
times the text was output.</p>
<p>For the “10x” version the TrueType benchmark ran about 25% slower than the one with the built-in
PDF font. Some of the difference can be attributed to the need to subset the TrueType font and
embed it in the PDF. Also, when looking at the file sizes there is a difference of about 820KiB.
This is because each glyph is encoded using two bytes in the TrueType version and only one byte in
the built-in PDF font version. So the serializer also has more work to do.</p>
<p>But still, it felt a bit off…</p>
<h2 id="investigating-the-cause">Investigating the Cause</h2>
<p>As I have <a href="https://gettalong.org/blog/2017/memory-conscious-programming-in-ruby.html">written before</a>, development of HexaPDF is done in a memory and performance conscious way.
So most parts of HexaPDF are already heavily optimized in both regards. The next
step was to find out the cause - it was time to unpack the trusty profilers!</p>
<h3 id="try-1---run-time-profiling">Try 1 - Run-time Profiling</h3>
<p>Since the TrueType version was markedly slower my first thought was that there was some performance
problem. This is where <a href="https://ruby-prof.github.io/">ruby-prof</a> shines! It helps us to find out which methods are called how many
times and how long processing took in each method.</p>
<p>I have the following simple script that can just be required and it automatically runs ruby-prof:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">BEGIN</span> <span class="p">{</span>
<span class="nb">require</span> <span class="s1">'ruby-prof'</span>
<span class="vg">$profile</span> <span class="o">=</span> <span class="no">RubyProf</span><span class="o">::</span><span class="no">Profile</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="ss">measure_mode: </span><span class="no">RubyProf</span><span class="o">::</span><span class="no">WALL_TIME</span><span class="p">)</span>
<span class="vg">$profile</span><span class="p">.</span><span class="nf">start</span>
<span class="p">}</span>
<span class="k">END</span> <span class="p">{</span>
<span class="n">result</span> <span class="o">=</span> <span class="vg">$profile</span><span class="p">.</span><span class="nf">stop</span>
<span class="no">RubyProf</span><span class="o">::</span><span class="no">GraphHtmlPrinter</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="n">result</span><span class="p">).</span><span class="nf">print</span><span class="p">(</span><span class="no">File</span><span class="p">.</span><span class="nf">open</span><span class="p">(</span><span class="s1">'ruby-prof.graph.html'</span><span class="p">,</span> <span class="s1">'w+'</span><span class="p">),</span> <span class="ss">min_percent: </span><span class="mi">1</span><span class="p">)</span>
<span class="p">}</span>
</code></pre></div></div>
<p>The script creates an output file <code>ruby-prof.graph.html</code> which contains all the information we need
to pinpoint performance-hungry methods.</p>
<p>I ran the benchmark together with this script, once using a built-in font and once using a TrueType
font. The results, alas, were disappointing. The time spent in the top methods as well as the number
of calls was nearly identical. So there wasn’t really a clue there.</p>
<p>If a run-time profiler doesn’t show much difference, maybe a look at the memory consumption and the
number of created objects helps.</p>
<h3 id="try-2---memory-profiling">Try 2 - Memory Profiling</h3>
<p>There are <a href="https://github.com/SamSaffron/memory_profiler">several</a> <a href="https://github.com/tmm1/stackprof">great</a> memory profilers available. Most often, however, I
use the <a href="https://github.com/ko1/allocation_tracer">AllocationTracer</a> gem for this task.</p>
<p>As with ruby-prof I have a small script that can just be required:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">BEGIN</span> <span class="p">{</span>
<span class="nb">require</span> <span class="s1">'set'</span>
<span class="nb">require</span> <span class="s1">'forwardable'</span>
<span class="nb">require</span> <span class="s1">'allocation_tracer'</span>
<span class="no">ObjectSpace</span><span class="o">::</span><span class="no">AllocationTracer</span><span class="p">.</span><span class="nf">setup</span><span class="p">(</span><span class="sx">%i{path line type}</span><span class="p">)</span>
<span class="no">ObjectSpace</span><span class="o">::</span><span class="no">AllocationTracer</span><span class="p">.</span><span class="nf">trace</span>
<span class="p">}</span>
<span class="k">END</span> <span class="p">{</span>
<span class="k">begin</span>
<span class="n">results</span> <span class="o">=</span> <span class="no">ObjectSpace</span><span class="o">::</span><span class="no">AllocationTracer</span><span class="p">.</span><span class="nf">stop</span>
<span class="n">results</span><span class="p">.</span><span class="nf">reject</span> <span class="p">{</span><span class="o">|</span><span class="n">k</span><span class="p">,</span> <span class="n">v</span><span class="o">|</span> <span class="n">v</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o"><</span> <span class="mi">10</span><span class="p">}.</span><span class="nf">sort_by</span><span class="p">{</span><span class="o">|</span><span class="n">k</span><span class="p">,</span> <span class="n">v</span><span class="o">|</span> <span class="p">[</span><span class="n">v</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">k</span><span class="p">[</span><span class="mi">0</span><span class="p">]]}.</span><span class="nf">each</span> <span class="k">do</span> <span class="o">|</span><span class="n">k</span><span class="p">,</span> <span class="n">v</span><span class="o">|</span>
<span class="vg">$stderr</span><span class="p">.</span><span class="nf">puts</span> <span class="s2">"</span><span class="si">#{</span><span class="n">k</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="si">}</span><span class="s2">:</span><span class="si">#{</span><span class="n">k</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span><span class="si">}</span><span class="s2"> - </span><span class="si">#{</span><span class="n">k</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span><span class="si">}</span><span class="s2"> - </span><span class="si">#{</span><span class="n">v</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="si">}</span><span class="s2">"</span>
<span class="k">end</span>
<span class="vg">$stderr</span><span class="p">.</span><span class="nf">puts</span> <span class="s2">"Sum: "</span> <span class="o">+</span> <span class="n">results</span><span class="p">.</span><span class="nf">inject</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span> <span class="p">{</span><span class="o">|</span><span class="n">sum</span><span class="p">,</span> <span class="p">(</span><span class="n">k</span><span class="p">,</span><span class="n">v</span><span class="p">)</span><span class="o">|</span> <span class="n">sum</span> <span class="o">+</span> <span class="n">v</span><span class="p">[</span><span class="mi">0</span><span class="p">]}.</span><span class="nf">to_s</span>
<span class="n">pp</span> <span class="no">ObjectSpace</span><span class="o">::</span><span class="no">AllocationTracer</span><span class="p">.</span><span class="nf">allocated_count_table</span>
<span class="n">pp</span> <span class="ss">:total</span> <span class="o">=></span> <span class="no">ObjectSpace</span><span class="o">::</span><span class="no">AllocationTracer</span><span class="p">.</span><span class="nf">allocated_count_table</span><span class="p">.</span><span class="nf">values</span><span class="p">.</span><span class="nf">inject</span><span class="p">(:</span><span class="o">+</span><span class="p">)</span>
<span class="k">rescue</span>
<span class="k">end</span>
<span class="p">}</span>
</code></pre></div></div>
<p>The script outputs the location, type and count of created objects, as well as a summary at the end.</p>
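<p>To make the filtering and sorting in the <code>END</code> block easier to follow, here is a minimal sketch that runs the same chain on a hand-written hash. The hash shape (keys of <code>[path, line, type]</code>, values whose first entry is the allocation count) is an assumption based on how the script accesses <code>k</code> and <code>v</code>; the file names are shortened and the last entry is made up to show the threshold.</p>

```ruby
# Hand-written stand-in for the hash returned by ObjectSpace::AllocationTracer.stop.
# Assumption: keys follow the setup call (%i{path line type}) and the first
# element of each value is the allocation count.
results = {
  ['serializer.rb', 272, :T_STRING] => [567_231],
  ['serializer.rb', 270, :T_STRING] => [208_762],
  ['serializer.rb', 272, :T_MATCH]  => [202_480],
  ['example.rb', 1, :T_ARRAY]       => [3], # made-up entry below the threshold
}

# Drop locations with fewer than 10 allocations, then sort ascending by count
# (and by path for ties) so that the biggest allocators end up at the bottom.
report = results.reject { |_key, value| value[0] < 10 }
                .sort_by { |key, value| [value[0], key[0]] }
                .map { |(path, line, type), value| "#{path}:#{line} - #{type} - #{value[0]}" }

puts report
puts "Sum: #{results.sum { |_key, value| value[0] }}"
```

<p>Sorting in ascending order means the most interesting lines are the last ones printed, which is handy when the output scrolls by in a terminal.</p>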
<p>I ran the benchmark again, together with this script and for both versions. And… <em>eureka</em>! The
non-TrueType version allocated around 1.679 million objects whereas the TrueType version allocated
around 2.390 million objects. And the detailed output of the TrueType result also showed us where
this happened:</p>
<pre><code>/home/thomas/hexapdf/lib/hexapdf/serializer.rb:272 - T_MATCH - 202480
/home/thomas/hexapdf/lib/hexapdf/serializer.rb:270 - T_STRING - 208762
/home/thomas/hexapdf/lib/hexapdf/serializer.rb:272 - T_STRING - 567231
</code></pre>
<p>The non-TrueType version only had the second line (as last line, so with the most allocations) and
the other two added up to a bit more than the difference in allocated objects.</p>
<p>Now I knew where to look! Following is the code for the method in question, with one irrelevant
statement removed:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">serialize_string</span><span class="p">(</span><span class="n">obj</span><span class="p">)</span>
<span class="n">obj</span><span class="p">.</span><span class="nf">gsub!</span><span class="p">(</span><span class="sr">/[()\\\r]/n</span><span class="p">,</span> <span class="no">STRING_ESCAPE_MAP</span><span class="p">)</span>
<span class="s2">"(</span><span class="si">#{</span><span class="n">obj</span><span class="si">}</span><span class="s2">)"</span>
<span class="k">end</span>
</code></pre></div></div>
<p>This method serializes a string into the format used by the PDF spec. The first line was the culprit
allocating the many String and Match objects. It replaces all special characters with
backslash-escaped versions using the <code>STRING_ESCAPE_MAP</code> hash.</p>
<p>This meant that, somehow, the TrueType version generated many strings for which <code>String#gsub!</code>
needed to do something, i.e. ones that included a special character.</p>
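<p>The effect can be reproduced in isolation. The following is a minimal sketch, not HexaPDF code: the escape map is an assumed stand-in for <code>STRING_ESCAPE_MAP</code>, and allocations are counted with <code>GC.stat</code> from the standard library instead of AllocationTracer.</p>

```ruby
# Assumed escape map, for illustration only; HexaPDF's real STRING_ESCAPE_MAP may differ.
ESCAPE_MAP_SKETCH = {'(' => '\\(', ')' => '\\)', '\\' => '\\\\', "\r" => '\\r'}.freeze

# Counts the objects allocated while the given block runs.
def allocated_objects
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

# String#gsub! returns nil when nothing had to be replaced.
no_match_result = (+'AB').gsub!(/[()\\\r]/n, ESCAPE_MAP_SKETCH)
match_result = (+"A\r").gsub!(/[()\\\r]/n, ESCAPE_MAP_SKETCH)

plain = Array.new(1_000) { +'AB' }     # strings without special characters
special = Array.new(1_000) { +"A\r" }  # strings containing a carriage return

plain_allocations = allocated_objects do
  plain.each { |str| str.gsub!(/[()\\\r]/n, ESCAPE_MAP_SKETCH) }
end
special_allocations = allocated_objects do
  special.each { |str| str.gsub!(/[()\\\r]/n, ESCAPE_MAP_SKETCH) }
end

puts "without special characters: #{plain_allocations} objects"
puts "with special characters:    #{special_allocations} objects"
```

<p>The exact numbers depend on the Ruby version, but the strings that contain a special character should cause noticeably more allocations than the ones that don't.</p>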
<h2 id="finding-the-root-cause">Finding the Root Cause</h2>
<p>The next obvious step was to look for font related code that generates many strings for the
benchmark. This happens during text output in the <code>canvas.show_glyphs_only</code> method. As mentioned
before this method delegates to <code>font.encode</code> for retrieving the character codes that get put into a
PDF content stream.</p>
<p>Here is the relevant method:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">encode</span><span class="p">(</span><span class="n">glyph</span><span class="p">)</span>
<span class="p">(</span><span class="vi">@encoded_glyphs</span><span class="p">[</span><span class="n">glyph</span><span class="p">.</span><span class="nf">id</span><span class="p">]</span> <span class="o">||=</span>
<span class="k">begin</span>
<span class="k">if</span> <span class="n">glyph</span><span class="p">.</span><span class="nf">kind_of?</span><span class="p">(</span><span class="no">InvalidGlyph</span><span class="p">)</span>
<span class="k">raise</span> <span class="no">HexaPDF</span><span class="o">::</span><span class="no">Error</span><span class="p">,</span> <span class="s2">"Glyph for </span><span class="si">#{</span><span class="n">glyph</span><span class="p">.</span><span class="nf">str</span><span class="p">.</span><span class="nf">inspect</span><span class="si">}</span><span class="s2"> missing"</span>
<span class="k">end</span>
<span class="k">if</span> <span class="vi">@subsetter</span>
<span class="p">[[</span><span class="vi">@subsetter</span><span class="p">.</span><span class="nf">use_glyph</span><span class="p">(</span><span class="n">glyph</span><span class="p">.</span><span class="nf">id</span><span class="p">)].</span><span class="nf">pack</span><span class="p">(</span><span class="s1">'n'</span><span class="p">),</span> <span class="n">glyph</span><span class="p">]</span>
<span class="k">else</span>
<span class="p">[[</span><span class="n">glyph</span><span class="p">.</span><span class="nf">id</span><span class="p">].</span><span class="nf">pack</span><span class="p">(</span><span class="s1">'n'</span><span class="p">),</span> <span class="n">glyph</span><span class="p">]</span>
<span class="k">end</span>
<span class="k">end</span><span class="p">)[</span><span class="mi">0</span><span class="p">]</span>
<span class="k">end</span>
</code></pre></div></div>
<p>This method takes a glyph object and returns the character code that is needed for the respective
PDF text showing operators. As the HexaPDF default is to subset TrueType fonts,
<code>@subsetter.use_glyph(glyph.id)</code> is invoked and returns the mapped glyph index for the given glyph.
The returned glyph index is just packed into two bytes and returned.</p>
<p>As there was no further clue here I went further down the rabbit hole and inspected the
<code>#use_glyph</code> method of the subsetter class:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">use_glyph</span><span class="p">(</span><span class="n">glyph_id</span><span class="p">)</span>
<span class="k">return</span> <span class="vi">@glyph_map</span><span class="p">[</span><span class="n">glyph_id</span><span class="p">]</span> <span class="k">if</span> <span class="vi">@glyph_map</span><span class="p">.</span><span class="nf">key?</span><span class="p">(</span><span class="n">glyph_id</span><span class="p">)</span>
<span class="vi">@last_id</span> <span class="o">+=</span> <span class="mi">1</span>
<span class="vi">@glyph_map</span><span class="p">[</span><span class="n">glyph_id</span><span class="p">]</span> <span class="o">=</span> <span class="vi">@last_id</span>
<span class="k">end</span>
</code></pre></div></div>
<p>The method returns an already mapped glyph index or, if not already mapped, increases the counter
for the last used glyph index, stores it for the given glyph index and returns it. Not shown is that
the initial <code>@last_id</code> value is 0.</p>
<p>And here I found the root cause: The counter was increased for every newly encountered glyph. One of
those glyphs would be mapped to 13 (if at least thirteen different glyphs were used which is the
usual case) and this is the decimal value for <code>\r</code> and one of the special characters that needs to
be escaped! The other special characters have the decimal values 40, 41 and 92. So we can assume
that in most cases three glyphs are mapped to strings that need to be escaped when serialized.</p>
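<p>A quick way to check this reasoning is to pack the first hundred glyph indices the same way <code>encode</code> does and look for the special bytes. This is only a sketch; the constant and the helper method are names made up for this illustration.</p>

```ruby
# Byte values the serializer has to escape: \r, (, ) and \
SPECIAL_BYTES = [13, 40, 41, 92].freeze

# Returns true if the two-byte (big-endian) encoding of the glyph index
# contains one of the special bytes.
def needs_escaping?(glyph_index)
  [glyph_index].pack('n').each_byte.any? { |byte| SPECIAL_BYTES.include?(byte) }
end

problematic = (1..100).select { |index| needs_escaping?(index) }
p problematic # => [13, 40, 41, 92]
```

<p>Since every use of such a glyph produces a string containing a special byte, a single frequently used glyph is enough to trigger the escaping code over and over.</p>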
<h2 id="implementing-a-solution">Implementing a Solution</h2>
<p>Now that I knew the <em>why</em> I needed to find a solution.</p>
<p>My first instinct was to set the initial <code>@last_id</code> value to 93. Then the problematic values would
never be encountered. However, it turned out that making the necessary adjustments for this to
create a valid PDF font object was not that easy.</p>
<p>Skipping the problematic values was also not an option as that would again mean adjustments in other
places. But what if we made sure that the problematic values were just never used?</p>
<p>The solution I came up with is to use invalid keys for the <code>@glyph_map</code> hash whenever a problematic
value is reached:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">if</span> <span class="vi">@last_id</span> <span class="o">==</span> <span class="mi">13</span> <span class="o">||</span> <span class="vi">@last_id</span> <span class="o">==</span> <span class="mi">40</span> <span class="o">||</span> <span class="vi">@last_id</span> <span class="o">==</span> <span class="mi">41</span> <span class="o">||</span> <span class="vi">@last_id</span> <span class="o">==</span> <span class="mi">92</span>
<span class="vi">@glyph_map</span><span class="p">[</span><span class="ss">:"s</span><span class="si">#{</span><span class="vi">@last_id</span><span class="si">}</span><span class="ss">"</span><span class="p">]</span> <span class="o">=</span> <span class="vi">@last_id</span>
<span class="vi">@last_id</span> <span class="o">+=</span> <span class="mi">1</span>
<span class="k">end</span>
</code></pre></div></div>
<p>These invalid keys are then always specially handled in the same manner as the glyph index 0 which
represents an undefined glyph. See <a href="https://github.com/gettalong/hexapdf/commit/cdb87239e95f638d4eca19f503674670c55b3586">the commit</a> for all details of the change.</p>
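<p>Put together, the idea can be sketched as a small stand-alone class. This is not the HexaPDF subsetter: the class name, the initial map entry for glyph index 0 and the placement of the check inside <code>use_glyph</code> are assumptions, and the sketch uses a loop so that the adjacent values 40 and 41 are both skipped.</p>

```ruby
# Minimal stand-in for the subsetter's glyph mapping (hypothetical class).
class SubsetterSketch
  PROBLEMATIC = [13, 40, 41, 92].freeze

  attr_reader :glyph_map

  def initialize
    @glyph_map = {0 => 0} # assumption: glyph index 0 (undefined glyph) maps to itself
    @last_id = 0
  end

  # Returns the mapped glyph index, never handing out a problematic value.
  def use_glyph(glyph_id)
    return @glyph_map[glyph_id] if @glyph_map.key?(glyph_id)
    @last_id += 1
    while PROBLEMATIC.include?(@last_id)
      @glyph_map[:"s#{@last_id}"] = @last_id # reserve the value under an invalid key
      @last_id += 1
    end
    @glyph_map[glyph_id] = @last_id
  end
end

subsetter = SubsetterSketch.new
mapped = (1000...1100).map { |glyph_id| subsetter.use_glyph(glyph_id) }
p mapped.include?(13)                                  # => false
p subsetter.glyph_map.keys.select { |k| k.is_a?(Symbol) } # => [:s13, :s40, :s41, :s92]
```

<p>Because the reserved values stay in the map, the mapped indices remain contiguous, which is what avoids adjustments in other places.</p>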
<h2 id="benchmark-results">Benchmark Results</h2>
<p>After implementing the changes I ran the memory profiler again and the number of allocated objects
went down to about 1.680 million. That looked promising from a memory point of view. However, memory
savings do not always translate into time savings.</p>
<p>So I also ran the benchmarks again:</p>
<pre><code>|--------------------------------------------------------------------|
| || Time | Memory | File size |
|--------------------------------------------------------------------|
| hexapdf | 1x | 572ms | 34.680KiB | 452.598 |
|--------------------------------------------------------------------|
| hexapdf | 5x | 1.840ms | 45.352KiB | 2.258.904 |
|--------------------------------------------------------------------|
| hexapdf | 10x | 3.504ms | 57.464KiB | 4.517.827 |
|--------------------------------------------------------------------|
| hexapdf | 1x ttf | 542ms | 33.540KiB | 546.390 |
|--------------------------------------------------------------------|
| hexapdf | 5x ttf | 2.099ms | 43.600KiB | 2.670.953 |
|--------------------------------------------------------------------|
| hexapdf | 10x ttf | 4.016ms | 63.584KiB | 5.328.382 |
|--------------------------------------------------------------------|
</code></pre>
<p>That also looked better! The TrueType benchmark of the “10x” version was now only about 14% slower
than the one with the built-in PDF font.</p>
<h2 id="conclusion">Conclusion</h2>
<p>Most optimizations I already did for HexaPDF involved things like in-place modifications of strings,
i.e. using better tools that Ruby provides.</p>
<p>In this case, however, the solution to the performance problem was a bit unusual in that we actually
had to use deeper TrueType font and PDF format knowledge to modify how strings were created in the
first place.</p>
<p>If such a modification were made in a general-purpose font handling library, it would probably not be
merged. However, HexaPDF includes its own font handling library for exactly such reasons.</p>
Benchmarking Ruby 2.4 to 3.0https://gettalong.org/blog/2020/benchmarking-rubies.html2020-12-28T10:37:15+01:002020-12-28T10:24:00+01:00
<p>I ran some benchmarks using <a href="https://hexapdf.gettalong.org">HexaPDF</a> after Ruby 2.4 was released in <a href="../2016/ruby24-performance-looking-good.html">2016</a> and again after Ruby 2.5
was released in <a href="../2017/benchmarking-ruby-2-5.html">2017</a>. Since <a href="https://www.ruby-lang.org/en/news/2020/12/25/ruby-3-0-0-released/">Ruby 3.0.0</a> was released this Christmas, I think this warrants another
round of benchmarks. And this time three different real-world benchmarks are used to evaluate
relative Ruby performance.</p>
<h2 id="three-real-world-benchmarks">Three Real-world Benchmarks</h2>
<p>The last two times I just used HexaPDF to evaluate the relative performance of Ruby releases. This
time I will use three different Rubygems for this task:</p>
<ol>
<li>
<p><strong>HexaPDF Benchmarks</strong></p>
<p>What once started as simple gists is now part of the <a href="https://github.com/gettalong/hexapdf/tree/master/benchmark">HexaPDF repository</a>. The benchmark
descriptions are available on the <a href="https://hexapdf.gettalong.org/documentation/benchmarks/">HexaPDF website</a>. I recommend looking at the
descriptions there to understand what the benchmarks do as I will not replicate the descriptions
here.</p>
<p>The following commands were executed in the <code>benchmark/</code> directory:</p>
<pre><code>./rubies.sh "2.4.9 2.5.7 2.6.5 2.7.1 3.0.0 3.0.0j" optimization -b hexapdf
./rubies.sh "2.4.9 2.5.7 2.6.5 2.7.1 3.0.0 3.0.0j" raw_text -b hexapdf
./rubies.sh "2.4.9 2.5.7 2.6.5 2.7.1 3.0.0 3.0.0j" line_wrapping -b hexapdf
</code></pre>
<p>A Ruby version with an appended “j” tells the script to activate the JIT.</p>
</li>
<li>
<p><strong>kramdown Benchmark</strong></p>
<p><a href="https://kramdown.gettalong.org">kramdown</a> also includes a simple benchmarking script which is normally used for evaluating the
performance of different kramdown versions. I added another script to facilitate testing of a
single kramdown version on different Ruby versions.</p>
<p>The benchmark just parses and converts a sample Markdown input document. The size of the input
document is increased for each run (i.e. the original input document is just concatenated <em>X</em>
times).</p>
<p>You can run the benchmark yourself in the kramdown repository using the
<code>benchmark/benchmark-rubies.sh</code> script which needs rbenv and gnuplot installed as well as the
kramdown gem in the <code>rbenv shell --unset</code> environment.</p>
<p>The following command was used:</p>
<pre><code>./benchmark/benchmark-rubies.sh "2.4.9 2.5.7 2.6.5 2.7.1 2.7.1j 3.0.0 3.0.0j"
</code></pre>
<p>A Ruby version with an appended “j” tells the script to activate the JIT.</p>
</li>
<li>
<p><strong>geom2d Benchmark</strong></p>
<p><a href="https://github.com/gettalong/geom2d">geom2d</a> is a small library for 2D geometry. It includes an algorithm for boolean operations
(think union, intersection, …) on arbitrary polygons. The benchmark intersects polygons – one
set has just a few vertices, the other many – which is a compute-intensive operation. So I would
expect a speed-up when using the JIT here.</p>
<p>This benchmark is done using the superb <a href="https://github.com/benchmark-driver/benchmark-driver">benchmark-driver</a> gem.</p>
</li>
</ol>
<h2 id="results-and-comments">Results and Comments</h2>
<p>All benchmarks were done on Ubuntu 20.04 with an i7-8550U processor.</p>
<h3 id="hexapdf">HexaPDF</h3>
<p>The images are SVG files, click on them to open them in a new window to view details. The raw data
is already the post-processed data ready for gnuplot-ingestion, with the time in milliseconds and
the memory in kilobytes.</p>
<h4 id="optimization-benchmark">Optimization Benchmark</h4>
<p>Note: There are six groups (different files <code>a.pdf</code> to <code>f.pdf</code>) of four benchmarks (different
hexapdf invocations; except for <code>f.pdf</code> which only has three because the CSP mode would take
<em>really</em> long) with each benchmark having six columns (different Ruby versions).</p>
<p class="image fit"><a href="assets/hexapdf-optimization.svg" target="_blank"><img src="assets/hexapdf-optimization.svg" alt="HexaPDF optimization benchmark" /></a></p>
<pre><code>Time "hexapdf 2.4.9" "hexapdf 2.5.7" "hexapdf 2.6.5" "hexapdf 2.7.1" "hexapdf 3.0.0" "hexapdf 3.0.0-jit"
"a.pdf" 148 189 200 227 177 512
"C a.pdf" 141 154 152 155 157 456
"CS a.pdf" 138 143 159 157 162 441
"CSP a.pdf" 159 155 188 183 177 452
"b.pdf" 710 627 653 660 688 1069
"C b.pdf" 701 641 672 733 714 1058
"CS b.pdf" 797 723 762 788 806 1063
"CSP b.pdf" 4430 4385 4582 4573 4823 5205
"c.pdf" 1227 1224 1233 1206 1303 1571
"C c.pdf" 1308 1257 1307 1358 1407 1689
"CS c.pdf" 1461 1408 1402 1446 1515 1746
"CSP c.pdf" 4674 4823 5010 5138 5308 5993
"d.pdf" 3439 2973 2984 2873 3296 3672
"C d.pdf" 3372 2875 2957 2862 3029 3725
"CS d.pdf" 3789 3234 3277 3109 3510 4213
"CSP d.pdf" 3801 3370 3484 3150 3557 4254
"e.pdf" 687 611 641 571 601 785
"C e.pdf" 737 694 699 696 759 1016
"CS e.pdf" 770 723 738 755 757 1065
"CSP e.pdf" 17650 18082 18140 18897 19324 0
"f.pdf" 44039 0 36400 35459 33779 37168 39065
"C f.pdf" 48286 39558 39627 40512 42744 42474
"CS f.pdf" 54120 47982 45595 46055 48131 49231
Memory "hexapdf 2.4.9" "hexapdf 2.5.7" "hexapdf 2.6.5" "hexapdf 2.7.1" "hexapdf 3.0.0" "hexapdf 3.0.0-jit"
"a.pdf" 15480 15292 19404 28540 27284 43896
"C a.pdf" 15624 15240 19428 28332 27552 44408
"CS a.pdf" 15924 15720 19744 28708 27636 44344
"CSP a.pdf" 16172 16312 20824 29244 27968 44420
"b.pdf" 35308 34452 35944 46072 45180 57484
"C b.pdf" 35184 34480 35304 46236 45648 57384
"CS b.pdf" 35256 37456 36020 48068 48264 57448
"CSP b.pdf" 55608 48204 47156 57172 58824 59900
"c.pdf" 40568 34652 43412 51816 47856 57480
"C c.pdf" 42560 37588 38348 50116 49644 57328
"CS c.pdf" 44892 39936 41328 52052 51756 57508
"CSP c.pdf" 70560 59576 54672 66208 65900 67680
"d.pdf" 62144 65920 70084 76028 76336 76980
"C d.pdf" 61860 57788 64076 77260 73856 74396
"CS d.pdf" 63356 57284 64276 79816 74436 74344
"CSP d.pdf" 87832 75504 82328 99620 88016 89080
"e.pdf" 45868 51308 48000 54892 55444 55656
"C e.pdf" 100856 69980 75440 98592 102932 94516
"CS e.pdf" 100260 69748 74024 95260 100696 101788
"CSP e.pdf" 197016 176820 128996 151168 152404 0
"f.pdf" 489868 0 490452 498336 471948 472380 472804
"C f.pdf" 506684 504776 486748 528316 529732 537968
"CS f.pdf" 608280 596416 567020 595480 609728 601932
</code></pre>
<h4 id="raw-text-benchmark">Raw Text Benchmark</h4>
<p class="image fit"><a href="assets/hexapdf-raw_text.svg" target="_blank"><img src="assets/hexapdf-raw_text.svg" alt="HexaPDF raw text benchmark" /></a></p>
<pre><code>Time "hexapdf 2.4.9" "hexapdf 2.5.7" "hexapdf 2.6.5" "hexapdf 2.7.1" "hexapdf 3.0.0" "hexapdf 3.0.0-jit"
"1x" 479 465 496 511 566 784
"5x" 1809 1830 1868 1913 2021 2278
"10x" 3531 3434 3600 3765 4002 5255
"1x ttf" 544 613 599 594 629 942
"5x ttf" 2222 2389 2268 2290 2482 2981
"10x ttf" 4479 4552 4563 4638 4851 5501
Memory "hexapdf 2.4.9" "hexapdf 2.5.7" "hexapdf 2.6.5" "hexapdf 2.7.1" "hexapdf 3.0.0" "hexapdf 3.0.0-jit"
"1x" 30152 24316 26232 35520 34056 46840
"5x" 57544 36876 38608 45864 45368 55020
"10x" 77372 49840 49668 58012 57112 68320
"1x ttf" 26932 23880 24872 33752 33232 46868
"5x ttf" 59996 42572 40920 48856 48820 49588
"10x ttf" 80584 60372 55132 62728 63688 68268
</code></pre>
<h4 id="line-wrapping-benchmark">Line Wrapping Benchmark</h4>
<p class="image fit"><a href="assets/hexapdf-line_wrapping.svg" target="_blank"><img src="assets/hexapdf-line_wrapping.svg" alt="HexaPDF line wrapping benchmark" /></a></p>
<pre><code>Time "hexapdf 2.4.9" "hexapdf 2.5.7" "hexapdf 2.6.5" "hexapdf 2.7.1" "hexapdf 3.0.0" "hexapdf 3.0.0-jit"
"L 400" 1210 1219 1265 1331 1330 1703
"C 400" 1519 1581 1577 1693 1506 1848
"L 200" 1357 1344 1411 1480 1494 1744
"C 200" 1772 1869 1851 1877 1658 1948
"L 100" 1628 1617 1560 1758 1676 1856
"C 100" 2098 2077 2058 2217 2001 2310
"L 50" 2901 2787 2762 2806 2733 3172
"C 50" 3561 3461 3483 3616 3387 3903
"L 400 ttf" 1300 1376 1411 1436 1426 1933
"C 400 ttf" 1628 1700 1759 1729 1733 1898
"L 200 ttf" 1588 1525 1552 1589 1783 1923
"C 200 ttf" 1864 1892 1971 1979 1816 2178
"L 100 ttf" 1888 1833 1872 1859 1897 2288
"C 100 ttf" 2423 2420 2507 2404 2235 2805
"L 50 ttf" 5100 4845 5015 4947 4937 5574
"C 50 ttf" 6104 5946 6272 5904 5571 6493
Memory "hexapdf 2.4.9" "hexapdf 2.5.7" "hexapdf 2.6.5" "hexapdf 2.7.1" "hexapdf 3.0.0" "hexapdf 3.0.0-jit"
"L 400" 84676 77912 83476 91672 100696 101928
"C 400" 85112 80204 83344 96848 108756 109564
"L 200" 86144 74900 93812 104136 92736 93284
"C 200" 83552 80696 81436 91616 100940 101508
"L 100" 83048 74040 91552 98792 89744 90268
"C 100" 83568 76772 80712 90464 96956 97828
"L 50" 200188 179200 155004 193996 189788 207156
"C 50" 199416 174324 161544 186052 231444 221180
"L 400 ttf" 81296 79572 96844 100784 91120 92108
"C 400 ttf" 78544 82332 92356 103460 108152 108688
"L 200 ttf" 88684 85288 90144 92304 85732 86112
"C 200 ttf" 76960 89232 87568 96148 98860 99280
"L 100 ttf" 79764 84788 87452 92336 86368 87404
"C 100 ttf" 80820 88760 85492 95368 96608 97268
"L 50 ttf" 257716 270368 236536 286000 281968 284932
"C 50 ttf" 259612 291688 252096 282292 286640 284348
</code></pre>
<h4 id="comments">Comments</h4>
<ul>
<li>
<p>Run time for all Rubies except Ruby 3.0.0+JIT is roughly the same, with 3.0.0+JIT being much
slower in nearly all cases.</p>
</li>
<li>
<p>Memory usage (see raw data) generally got better from 2.4 to 2.5 but got worse starting with 2.7
and takes another hit when the JIT is used.</p>
</li>
<li>
<p>One interesting thing is that Ruby 3.0.0+JIT errors out when doing the “hexapdf CSP” optimization
benchmark on the <code>e.pdf</code> file. I will have to look at this to see what is happening there.</p>
</li>
</ul>
<h3 id="kramdown">kramdown</h3>
<p><img src="assets/kramdown.png" alt="kramdown benchmark" /></p>
<pre><code># ruby-2.4.9p362 || ruby-2.5.7p206 || ruby-2.6.5p114 || ruby-2.7.1p83 || ruby-2.7.1p83-jit || ruby-3.0.0p0 || ruby-3.0.0p0-jit
256 0.73670 0.72830 0.69474 0.69346 0.70838 0.69294 0.69133
512 1.72586 1.70017 1.64458 1.61140 1.65583 1.49876 1.46554
1024 3.58049 3.50635 3.31272 3.22631 3.25020 3.32012 3.23109
</code></pre>
<p>The general trend here is that Ruby got faster over time, with Ruby 2.7 and 3.0 being roughly the
same, irrespective of JIT usage.</p>
<h3 id="geom2d">geom2d</h3>
<p>The bars represent iterations per second, so larger bars are better.</p>
<p class="image fit"><img src="assets/geom2d-large.png" alt="geom2d large benchmark" /></p>
<p class="image fit"><img src="assets/geom2d-small.png" alt="geom2d small benchmark" /></p>
<pre><code>Comparison:
large
2.6.5: 3.0 i/s
2.4.9: 2.9 i/s - 1.02x slower
2.5.7: 2.9 i/s - 1.03x slower
3.0.0: 2.9 i/s - 1.05x slower
2.7.1: 2.8 i/s - 1.05x slower
3.0.0 --jit: 2.6 i/s - 1.14x slower
small
2.6.5: 3813.9 i/s
3.0.0: 3797.3 i/s - 1.00x slower
2.5.7: 3794.6 i/s - 1.01x slower
2.7.1: 3635.5 i/s - 1.05x slower
3.0.0 --jit: 3582.3 i/s - 1.06x slower
2.4.9: 3562.4 i/s - 1.07x slower
</code></pre>
<p>I expected that Ruby 3.0.0+JIT would perform best in this benchmark because it is largely CPU-bound.
However, it was actually one of the slowest Rubies.</p>
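<p>For reading the comparison output: the "x slower" factor is simply the i/s value of the fastest Ruby divided by the i/s value of the Ruby in question. A small sketch that recomputes the factors for the "small" run from the numbers above (the constant name is made up for this illustration):</p>

```ruby
# i/s values of the "small" run, copied from the comparison output.
SMALL_IPS = {
  '2.6.5'       => 3813.9,
  '3.0.0'       => 3797.3,
  '2.5.7'       => 3794.6,
  '2.7.1'       => 3635.5,
  '3.0.0 --jit' => 3582.3,
  '2.4.9'       => 3562.4,
}.freeze

fastest = SMALL_IPS.values.max
slowdowns = SMALL_IPS.transform_values { |ips| (fastest / ips).round(2) }

slowdowns.each do |ruby, factor|
  puts format('%-12s %.2fx slower', ruby, factor)
end
```

<p>The factors for the "large" run cannot be reproduced exactly this way because the i/s values shown there are rounded to one decimal place.</p>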
<h2 id="conclusion">Conclusion</h2>
<p>Ruby 3.0.0 brings many new things to the table with respect to concurrency and typing. If we look at
strictly CPU-bound applications it also got much better performance, especially with the JIT.</p>
<p>However, for real-world applications the performance increases are evolutionary rather than
revolutionary. Those who follow Ruby development have known this for years. And I think that is okay
because Ruby is already acceptably fast in many cases (e.g. the HexaPDF PDF library being only
50%-90% slower than a PDF library written in C++).</p>
On maintaining webgenhttps://gettalong.org/blog/2019/on-maintaining-webgen.html2019-08-15T23:16:33+02:002019-08-15T22:40:00+02:00
<p>My static website generator <a href="https://webgen.gettalong.org">webgen</a> has been around for a long time.
Though there are now many other static website generators written in Ruby, I still maintain webgen
because some of its functionality is unique.</p>
<p>Development of webgen started 16 years ago, back in 2003 because I needed a tool for creating my
personal website. From there it grew into a full-blown static website generator over the years (you
can read more about its <a href="https://webgen.gettalong.org/documentation/history.html">history</a> at the webgen homepage). It had its heyday around 2007/2008 when
not many static website generators existed (<a href="https://hobix.com">Hobix</a> anyone?). A few years later
the “boom years” for static website generators started and <a href="https://www.staticgen.com">many, many</a> were created.
Ruby lends itself especially well to such a tool due to its rich ecosystem of web related
libraries.</p>
<p>Nowadays webgen is probably only used by a handful of people besides myself; if you are one of them, I
would <a href="mailto:t_leitner@gmx.at">love to hear</a> why you are sticking with webgen. I still maintain
webgen and generate all my websites with it, like this personal website, the <a href="https://webgen.gettalong.org">webgen homepage</a>, the
<a href="https://cmdparse.gettalong.org">cmdparse homepage</a>, the <a href="https://kramdown.gettalong.org">kramdown homepage</a> or the <a href="https://hexapdf.gettalong.org">HexaPDF homepage</a>.</p>
<p>One reason is that I’m naturally very familiar with it, it works great and is reasonably fast.
However, the main reason is that it provides some unique features that I didn’t find anywhere else.
Here are two that I find especially useful:</p>
<dl>
<dt>Flexible file system layout</dt>
<dd>
<p>In contrast to most other static website generators webgen doesn’t prescribe a certain directory
structure. The default is the following:</p>
<pre><code>website/ # your website directory
webgen.config # webgen's configuration file
src/ # the directory with all the source files
ext/ # extensions to webgen's functionality
out/ # where all the generated files go
tmp/ # directory for temporary files and caches
</code></pre>
<p>However, the only thing webgen really needs is the <code>webgen.config</code> file, which can be a YAML or
Ruby file and configures webgen. <a href="https://cmdparse.gettalong.org">cmdparse</a>, for example, uses the following in
its <a href="https://github.com/gettalong/cmdparse/blob/master/webgen.config#L4"><code>webgen.config</code></a> file:</p>
<pre><code>website.config['sources'] = [['/', :file_system, 'doc']]
website.config['destination'] = [:file_system, 'htmldoc']
website.config['website.tmpdir'] = 'webgen-tmp'
</code></pre>
<p>This means that the source directory is changed to <code>doc/</code>, the output directory to <code>htmldoc/</code> and
the temporary directory to <code>webgen-tmp/</code>. Due to this flexibility it is easy to ship the
source for the documentation website with the code itself.</p>
</dd>
<dt>RDoc integration</dt>
<dd>
<p>webgen can integrate the API documentation created via RDoc into a website. This means, for
example, that the API documentation has the same look and feel as the rest of the documentation
(see, for example, the documentation for <a href="https://cmdparse.gettalong.org/api/CmdParse/CommandParser.html">CmdParse::CommandParser</a>).</p>
<p>What is more important, though, and more useful is that the other parts of the website can easily
link to any part of the API documentation. This functionality is extensively used by the <a href="https://webgen.gettalong.org">webgen
homepage</a> itself and, for example, by the <a href="https://hexapdf.gettalong.org">HexaPDF homepage</a>.</p>
<p>Taking the <a href="https://hexapdf.gettalong.org/documentation/changelog.html">HexaPDF changelog</a> as an example, you will find that all mentions of classes or methods
are linked to the correct place. There will be no dangling links because during the generation of
the website all such links are automatically checked and a warning would appear if a link target
is not found.</p>
<p>The only other tool I know that integrates API documentation in such a way is <a href="http://www.sphinx-doc.org/en/master/">Sphinx</a>. But maybe
<a href="https://twitter.com/_gettalong/status/1139094702693736448">Antora</a> (project documentation tool based on Asciidoctor) will get such a functionality,
too!</p>
</dd>
</dl>
<p>Until another tool provides at least these two functionalities, I guess I will maintain webgen. The
time and effort is not that much and if I need something new I can quickly implement it.</p>
kramdown 2.0 and beyondhttps://gettalong.org/blog/2018/kramdown-2-0-and-beyond.html2018-10-26T08:41:51+02:002018-10-26T08:40:00+02:00
<p>The <a href="https://kramdown.gettalong.org">kramdown</a> project has become an umbrella project for many
parsers, converters, math engines, … It is time to split things apart to make them more manageable
and have faster release cycles.</p>
<p>Once kramdown was really a <strong>pure-Ruby</strong> Markdown-superset conversion library. Nowadays, it includes
many extensions that rely on Ruby gems with C extensions or even on other programming languages like
NodeJS. This makes updating and testing kramdown more time intensive since one has to install and
manage all dependencies – and as we all know, NodeJS <em>loves</em> dependencies…</p>
<p>Therefore starting with release 2.0 the core kramdown gem will be reduced to a meaningful subset of
extensions, and all other extensions will get their own gem. This will allow more independent
development and faster releases.</p>
<p>I will still develop the core kramdown gem and I will help out with the extensions if I have time
but as of now I’m looking for developers/maintainers for the following extensions (<strong><a href="mailto:t_leitner@gmx.at">contact
me if you are interested</a></strong>):</p>
<ul>
<li>GFM parser</li>
<li>PDF converter</li>
<li>mathjaxnode math engine</li>
<li>sskatex math engine</li>
<li>katex math engine</li>
<li>itex2mml math engine</li>
<li>ritex math engine</li>
<li>coderay syntax highlighter</li>
</ul>
<p>The plan is to release the next kramdown version with the current pending changes. Then kramdown 2.0
and all extension gems (in their 1.0 version) will be released, with no code changes. From this
point onwards each gem has its own release cycle.</p>
Privacy Enhancementshttps://gettalong.org/blog/2018/privacy-enhancements.html2018-06-13T20:56:47+02:002018-05-10T17:40:00+02:00
<p>The European Union’s General Data Protection Regulation (GDPR) will be enforced from May 25th
forward. In the light of this I adjusted some things on <code>*.gettalong.org</code> websites.</p>
<p><a href="#update"><strong>Update 2018-06-13</strong></a></p>
<h2 id="no-external-resources">No External Resources</h2>
<p>I have never used many external resources but now even those few are gone. This means:</p>
<ul>
<li>
<p>Fonts that were previously hosted by Google Fonts are now locally hosted. So Google won’t get any
IP addresses or other data.</p>
</li>
<li>
<p>All JavaScript, CSS and images are also locally hosted. This was already the case with the
exception of a few images.</p>
</li>
</ul>
<p>The disadvantage of this approach is that browser caching won’t be as effective. However, this is
offset by using longer caching times due to the use of new cache-busting features of <a href="https://webgen.gettalong.org/news.html#webgen-1-5-0-released">webgen</a>.</p>
<h2 id="analytics">Analytics</h2>
<p>I still use <a href="https://statcounter.com">StatCounter</a> for site analytics. So “no external resources” was not 100% correct. The
thing is, however, that the websites <em>would</em> work without it and that StatCounter is blocked by
default by systems like <a href="https://github.com/gorhill/uMatrix">uMatrix</a>. For example, if you are using uMatrix, the websites will work
even if you only enabled 1st-party content.</p>
<p>To enhance the privacy of the data I have enabled <a href="https://de.statcounter.com/support/knowledge-base/314/">IP address masking</a> (which replaces the last
octet of the IP address with a dummy value) and disabled the tracking cookies in StatCounter (which
means that every visit is the first visit).</p>
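<p>To illustrate what masking the last octet means, here is a minimal Ruby sketch using the
<code>ipaddr</code> standard library. It is not StatCounter’s implementation; the method name
<code>mask_last_octet</code> and the dummy value 0 are assumptions made for this example, and it only
handles IPv4 addresses.</p>

```ruby
require 'ipaddr'

# Keep the first 24 bits of an IPv4 address and zero out the last octet.
# Assumption: the dummy value is 0; a real service may use something else.
def mask_last_octet(ip)
  IPAddr.new(ip).mask(24).to_s
end

puts mask_last_octet('203.0.113.42')
puts mask_last_octet('203.0.113.7')
```

<p>Both calls yield the same masked address, so visitors from the same /24 network can no longer be
told apart by IP address alone.</p>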
<p>If you see a cookie named <code>__cfduid</code>: It is from CloudFlare and is not used for tracking. See the
<a href="https://support.cloudflare.com/hc/en-us/articles/200170156-What-does-the-Cloudflare-cfduid-cookie-do-">CloudFlare site</a> for more information.</p>
<h2 id="web-server-enhancements">Web Server Enhancements</h2>
<p>Additionally, I’m now using some HTTP headers that will enhance the privacy:</p>
<dl>
<dt><code>Referrer-Policy "same-origin"</code></dt>
<dd>
<p>If you click on a link to an external website, the external site will normally get the URL of the
original site sent during the request. This header tells the browser to do this only for the
website itself and not for external websites (which get nothing).</p>
</dd>
<dt><code>Strict-Transport-Security "max-age=31536000"</code></dt>
<dd>
<p>Also called HSTS, this header will mandate the use of HTTPS for one year after the first access, even
if the link entered into the browser is HTTP. So, essentially, it forces the browser to use HTTPS.</p>
</dd>
<dt><code>X-Frame-Options "SAMEORIGIN"</code></dt>
<dd>
<p>This header disallows embedding the website into another website by use of <code>&lt;iframe&gt;</code> tags.</p>
</dd>
</dl>
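<p>On my sites these headers are set in the web server configuration. If a site is instead served by
a Ruby application, the same three headers could be added with a small Rack-style middleware. The
following is only a sketch: the class name <code>PrivacyHeaders</code> is made up, and it assumes the
usual Rack convention of an app returning a status, a headers hash and a body.</p>

```ruby
# Hypothetical Rack-style middleware (no gem needed for this sketch):
# it merges the three privacy related headers into every response.
class PrivacyHeaders
  HEADERS = {
    'referrer-policy' => 'same-origin',
    'strict-transport-security' => 'max-age=31536000',
    'x-frame-options' => 'SAMEORIGIN'
  }.freeze

  def initialize(app)
    @app = app
  end

  def call(env)
    status, headers, body = @app.call(env)
    # Values from HEADERS win over headers set by the wrapped app.
    [status, headers.merge(HEADERS), body]
  end
end

# Stand-in for a real application: any object responding to #call works.
inner_app = ->(_env) { [200, { 'content-type' => 'text/html' }, ['Hello']] }
status, headers, _body = PrivacyHeaders.new(inner_app).call({})
puts status
headers.each { |name, value| puts "#{name}: #{value}" }
```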
<h2 id="website-checking-tools">Website Checking Tools</h2>
<p>If you want to check your website for trackers, HSTS or security related headers, have a look at the
following websites:</p>
<dl>
<dt><a href="https://webbkoll.dataskydd.net/en">https://webbkoll.dataskydd.net/en</a></dt>
<dd>
<p>Checks for trackers and other things</p>
</dd>
<dt><a href="https://www.ssllabs.com/ssltest/">https://www.ssllabs.com/ssltest/</a></dt>
<dd>
<p>Checks whether HTTPS is correctly set up</p>
</dd>
<dt><a href="https://securityheaders.com">https://securityheaders.com</a></dt>
<dd>
<p>Checks for security related HTTP headers</p>
</dd>
</dl>
<h2 id="update">Update 2018-06-13</h2>
<p>Since StatCounter doesn’t seem to be compliant with EU regulations, I have decided to drop it and
use a self-hosted installation of Matomo instead. Now there are really no external dependencies
anymore.</p>
<p>Furthermore, I have added a <a href="../../privacy.html">privacy policy</a> and a <a href="../../legal.html">legal notice</a> page.</p>
Ruby 2.5 Is Out - Let's Benchmarkhttps://gettalong.org/blog/2017/benchmarking-ruby-2-5.html2017-12-27T15:48:05+01:002017-12-27T15:40:00+01:00
<p>Ruby’s performance is getting better and better with each release and the <a href="https://www.ruby-lang.org/en/news/2017/12/25/ruby-2-5-0-released/">newly released 2.5.0
version</a> is no different.</p>
<p>Before the release of Ruby 2.4 last year I <a href="../2016/ruby24-performance-looking-good.html">benchmarked Ruby 2.3.3p222 and 2.4.0preview3</a> and
was pleasantly surprised. Since the <a href="https://www.ruby-lang.org/en/news/2017/12/25/ruby-2-5-0-released/">release notes of Ruby 2.5</a> highlight some performance
improvements I ran another benchmark.</p>
<p>This time I benchmarked Ruby 2.3.6, 2.4.3 and 2.5.0 on all three <a href="http://hexapdf.gettalong.org">HexaPDF</a> benchmarks. Since HexaPDF
is only compatible with Ruby 2.4 and higher, I had to modify it a bit so that it also runs under
Ruby 2.3. Note that all Ruby versions are tested with the exact same HexaPDF code which means that
there is no distortion through the use of newer methods (like <code>String#match?</code>).</p>
<p>The benchmarks themselves are not that compute-intensive but generate a lot of small objects and
strings. As no heavy computation is done it means that potential speed-ups of the Ruby interpreter
are most likely not that pronounced. The three benchmarks are:</p>
<ul>
<li>
<p><a href="https://gist.github.com/gettalong/8955ff5403fe7abb7bee"><strong>Optimization</strong></a>: This benchmark involves reading various PDFs one by one, creating
in-memory representations and writing size-optimized versions of the PDFs. This involves a lot of
string to Ruby object conversion and vice versa.</p>
</li>
<li>
<p><a href="https://gist.github.com/gettalong/0d7c576064725774299cdf4d1a51d2b9"><strong>Raw Text</strong></a>: In this benchmark the text from the English version of Homer’s Odyssey is
just output line by line, with no additional line breaks being inserted or text metric measuring
being done. This tests the low-level text output facilities of HexaPDF which generate a lot of
small strings.</p>
<p>To see how more text influences the performance, this test is run using the text of Homer’s
Odyssey one, five and ten times.</p>
</li>
<li>
<p><a href="https://gist.github.com/gettalong/8afae547ac3e50e9b8ce6c521a2a0eea"><strong>Line Wrapping</strong></a>: Again Homer’s Odyssey is output but this time the line breaking algorithm
is used. This means that the text needs to be segmented into parts first and then assembled into
lines, providing a more compute-intensive benchmark.</p>
<p>This test is run on different page widths where with a page width of 400pt no additional line
breaks need to be inserted, and with a page width of 50pt even long words need to be broken.</p>
</li>
</ul>
<p>And here are the results, graphics first followed by the raw data (note that the last four columns
are missing on the optimization benchmark graphic because the bars would be so much higher):</p>
<p class="image fit"><img src="assets/optimization.svg" alt="optimization" /></p>
<p class="image fit"><img src="assets/raw_text.svg" alt="raw text" /></p>
<p class="image fit"><img src="assets/line_wrapping.svg" alt="line wrapping" /></p>
<p>Raw data for the “Optimization” benchmark:</p>
<pre><code>|---------------------------------------------------------------------------------------------|
| a.pdf (53,056)      |              Time              |                Memory                |
|---------------------------------------------------------------------------------------------|
|                     |    2.3.6 |    2.4.3 |    2.5.0 |      2.3.6 |      2.4.3 |      2.5.0 |
|---------------------------------------------------------------------------------------------|
| hexapdf             |    158ms |    154ms |    162ms |  15,644KiB |  14,108KiB |  14,424KiB |
| hexapdf C           |    151ms |    141ms |    155ms |  15,704KiB |  14,224KiB |  14,728KiB |
| hexapdf CS          |    156ms |    148ms |    161ms |  16,344KiB |  14,644KiB |  15,064KiB |
| hexapdf CSP         |    209ms |    167ms |    176ms |  16,608KiB |  14,856KiB |  15,504KiB |
|---------------------------------------------------------------------------------------------|
| b.pdf (11,520,218)  |              Time              |                Memory                |
|---------------------------------------------------------------------------------------------|
| hexapdf             |  1,093ms |    934ms |    839ms |  31,188KiB |  31,604KiB |  25,248KiB |
| hexapdf C           |  1,032ms |    953ms |    878ms |  31,532KiB |  30,440KiB |  26,004KiB |
| hexapdf CS          |  1,137ms |  1,087ms |  1,001ms |  34,264KiB |  31,352KiB |  29,120KiB |
| hexapdf CSP         |  8,938ms |  8,328ms |  7,933ms |  49,796KiB |  46,796KiB |  40,260KiB |
|---------------------------------------------------------------------------------------------|
| c.pdf (14,399,980)  |              Time              |                Memory                |
|---------------------------------------------------------------------------------------------|
| hexapdf             |  2,192ms |  1,890ms |  1,720ms |  43,272KiB |  39,936KiB |  36,520KiB |
| hexapdf C           |  2,194ms |  2,052ms |  1,898ms |  43,604KiB |  39,836KiB |  37,968KiB |
| hexapdf CS          |  2,396ms |  2,203ms |  2,047ms |  49,764KiB |  43,328KiB |  40,672KiB |
| hexapdf CSP         |  9,435ms |  9,136ms |  8,431ms |  71,284KiB |  63,592KiB |  55,780KiB |
|---------------------------------------------------------------------------------------------|
| d.pdf (8,107,348)   |              Time              |                Memory                |
|---------------------------------------------------------------------------------------------|
| hexapdf             |  5,889ms |  5,002ms |  4,238ms |  99,812KiB |  59,968KiB |  57,172KiB |
| hexapdf C           |  5,601ms |  4,967ms |  4,196ms |  85,488KiB |  57,860KiB |  57,724KiB |
| hexapdf CS          |  6,119ms |  5,576ms |  4,685ms |  83,880KiB |  60,992KiB |  59,048KiB |
| hexapdf CSP         |  6,284ms |  5,606ms |  4,833ms |  93,520KiB |  90,744KiB |  82,328KiB |
|---------------------------------------------------------------------------------------------|
| e.pdf (21,788,087)  |              Time              |                Memory                |
|---------------------------------------------------------------------------------------------|
| hexapdf             |  1,034ms |    851ms |    814ms |  44,596KiB |  49,688KiB |  50,704KiB |
| hexapdf C           |  1,093ms |  1,054ms |    920ms | 109,588KiB |  93,268KiB |  66,916KiB |
| hexapdf CS          |  1,134ms |  1,127ms |  1,006ms | 109,792KiB |  96,556KiB |  89,128KiB |
| hexapdf CSP         | 30,476ms | 29,949ms | 28,679ms | 188,592KiB | 184,464KiB | 182,472KiB |
|---------------------------------------------------------------------------------------------|
| f.pdf (154,752,614) |              Time              |                Memory                |
|---------------------------------------------------------------------------------------------|
| hexapdf             | 59,356ms | 54,641ms | 45,201ms | 583,908KiB | 461,636KiB | 473,448KiB |
| hexapdf C           | 63,415ms | 58,382ms | 49,877ms | 539,764KiB | 505,060KiB | 504,452KiB |
| hexapdf CS          | 71,359ms | 64,601ms | 55,915ms | 674,008KiB | 563,100KiB | 592,072KiB |
| ERR hexapdf CSP     |      0ms |      0ms |      0ms |       0KiB |       0KiB |       0KiB |
|---------------------------------------------------------------------------------------------|
</code></pre>
<p>Raw data for “Raw Text” benchmark:</p>
<pre><code>|-----------------------------------------------------------------------------------------|
|                 |              Time              |                Memory                |
|-----------------------------------------------------------------------------------------|
|                 |    2.3.6 |    2.4.3 |    2.5.0 |      2.3.6 |      2.4.3 |      2.5.0 |
|-----------------------------------------------------------------------------------------|
| hexapdf 1x      |    667ms |    585ms |    544ms |  28,892KiB |  20,904KiB |  20,956KiB |
| hexapdf 5x      |  2,571ms |  2,362ms |  2,176ms |  40,092KiB |  35,444KiB |  32,236KiB |
| hexapdf 10x     |  5,025ms |  4,654ms |  4,208ms |  51,360KiB |  52,664KiB |  46,092KiB |
| hexapdf 1x ttf  |    708ms |    644ms |    619ms |  25,308KiB |  21,764KiB |  20,464KiB |
| hexapdf 5x ttf  |  2,932ms |  2,680ms |  2,480ms |  44,800KiB |  45,756KiB |  36,900KiB |
| hexapdf 10x ttf |  5,785ms |  5,194ms |  4,878ms |  69,996KiB |  62,876KiB |  52,808KiB |
|-----------------------------------------------------------------------------------------|
</code></pre>
<p>Raw data for “Line Wrapping” benchmark:</p>
<pre><code>|-----------------------------------------------------------------------------------------|
|                 |     Time |     Time |     Time |     Memory |     Memory |     Memory |
|-----------------------------------------------------------------------------------------|
|                 |    2.3.6 |    2.4.3 |    2.5.0 |      2.3.6 |      2.4.3 |      2.5.0 |
|-----------------------------------------------------------------------------------------|
| hexapdf 400     |  2,251ms |  2,025ms |  1,865ms |  82,612KiB |  67,276KiB |  68,712KiB |
| hexapdf 200     |  2,553ms |  2,308ms |  2,141ms |  94,092KiB |  69,420KiB |  69,820KiB |
| hexapdf 100     |  2,945ms |  2,649ms |  2,434ms |  93,868KiB |  74,496KiB |  73,432KiB |
| hexapdf 50      |  4,572ms |  4,228ms |  3,974ms | 175,492KiB | 178,932KiB | 157,028KiB |
| hexapdf 400 ttf |  2,330ms |  2,103ms |  1,959ms |  84,920KiB |  68,812KiB |  70,660KiB |
| hexapdf 200 ttf |  2,669ms |  2,368ms |  2,201ms |  93,688KiB |  73,540KiB |  78,452KiB |
| hexapdf 100 ttf |  3,305ms |  2,971ms |  2,712ms |  98,556KiB |  76,804KiB |  80,880KiB |
| hexapdf 50 ttf  |  7,051ms |  6,602ms |  6,288ms | 268,120KiB | 236,108KiB | 266,376KiB |
|-----------------------------------------------------------------------------------------|
</code></pre>
<p>I think that the graphics and numbers speak for themselves: <strong>Ruby is clearly getting faster and
faster</strong> which is great! And I’m especially excited by the possibility of having an <a href="https://bugs.ruby-lang.org/issues/12589#note-35"><strong>MJIT in Ruby
2.6</strong></a>! Good times! 😊</p>
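<p>To put a number on “faster”, the relative time savings can be computed directly from the raw data.
The small sketch below does this for three rows taken from the tables above, comparing Ruby 2.3.6
with 2.5.0; the method name <code>time_saved_percent</code> is just a name chosen for this example.</p>

```ruby
# Timings in milliseconds for Ruby 2.3.6 and 2.5.0, copied from the raw data tables.
TIMINGS = {
  'optimization, f.pdf, hexapdf' => [59_356, 45_201],
  'raw text, hexapdf 10x'        => [5_025, 4_208],
  'line wrapping, hexapdf 400'   => [2_251, 1_865]
}.freeze

# Percentage by which the run time dropped when going from old_ms to new_ms.
def time_saved_percent(old_ms, new_ms)
  ((old_ms - new_ms) * 100.0 / old_ms).round(1)
end

TIMINGS.each do |name, (old_ms, new_ms)|
  puts format('%-30s %5.1f%% less time', name, time_saved_percent(old_ms, new_ms))
end
```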
Memory Conscious Programming in Rubyhttps://gettalong.org/blog/2017/memory-conscious-programming-in-ruby.html2017-10-31T22:06:48+01:002017-10-31T21:47:00+01:00
<p>When programming in Ruby many people think that egregious memory usage is the norm and unavoidable.
However, there are ways and strategies to keep memory usage down and in this post I will show you
some of them.</p>
<h2 id="keeping-rubys-internals-in-mind">Keeping Ruby’s Internals in Mind</h2>
<p>Ruby’s main built-in classes like <code>TrueClass</code>, <code>FalseClass</code>, <code>NilClass</code>, <code>Integer</code>, <code>Float</code>,
<code>Symbol</code>, <code>String</code>, <code>Array</code>, <code>Hash</code> and <code>Struct</code> are highly optimized in terms of execution
performance and memory usage. Note that I’m talking about CRuby (MRI) here and therefore most things
will probably not apply to other Ruby implementations.</p>
<p>Internally, i.e. in its C code, each object in Ruby is referenced via the <code>VALUE</code> type. This is a
pointer to a C structure that holds all the necessary information.</p>
<p>All given numbers below are valid for a 64-bit Linux platform but should apply to any other 64-bit
system.</p>
<h3 id="nil-true-false-and-some-integers"><code>nil</code>, <code>true</code>, <code>false</code> and Some Integers</h3>
<p>Some classes don’t need to allocate memory for the C structure when creating an object since the
<strong>objects can be directly represented by a <code>VALUE</code></strong>. This is the case for objects of the type
<code>NilClass</code> (i.e. the <code>nil</code> value), type <code>TrueClass</code> (i.e. the <code>true</code> value) and type <code>FalseClass</code>
(i.e. the <code>false</code> value).</p>
<p><strong>Small integers in the range of -2^62 to 2^62-1 are also directly represented as a <code>VALUE</code></strong>.</p>
<p>What does this mean? It means that only the bare minimum memory is needed for representing these
objects. And that you don’t need to think about memory usage when using such values.</p>
<p>We can test this by using the <code>ObjectSpace.memsize_of</code> method that returns the memory used by an
object:</p>
<pre><code>2.4.2 > require 'objspace'
=> true
2.4.2 > ObjectSpace.memsize_of(nil)
=> 0
2.4.2 > ObjectSpace.memsize_of(true)
=> 0
2.4.2 > ObjectSpace.memsize_of(false)
=> 0
2.4.2 > ObjectSpace.memsize_of(2**62-1)
=> 0
2.4.2 > ObjectSpace.memsize_of(2**62)
=> 40
</code></pre>
<p>As you can see no additional memory is used, except in the last case since the integer is too big.
Once a separate C structure has to be allocated for an object, it uses at least 40 bytes of memory.</p>
<h3 id="arrays-structs-hashes-and-strings">Arrays, Structs, Hashes and Strings</h3>
<p>Objects for these four classes use special C structures instead of the general one. These structures
allow storing some values directly inside them instead of allocating extra memory.</p>
<p><strong>Arrays with up to three elements are memory efficient</strong>. After that each new element needs 8
additional bytes:</p>
<pre><code>2.4.2 > ObjectSpace.memsize_of([])
=> 40
2.4.2 > ObjectSpace.memsize_of([1])
=> 40
2.4.2 > ObjectSpace.memsize_of([1, 2])
=> 40
2.4.2 > ObjectSpace.memsize_of([1, 2, 3])
=> 40
2.4.2 > ObjectSpace.memsize_of([1, 2, 3, 4])
=> 72
</code></pre>
<p>This also applies to structs with up to three members, i.e. those structs only need 40 bytes of
memory:</p>
<pre><code>2.4.2 > X = Struct.new(:a, :b, :c)
=> X
2.4.2 > Y = Struct.new(:a, :b, :c, :d)
=> Y
2.4.2 > ObjectSpace.memsize_of(X.new)
=> 40
2.4.2 > ObjectSpace.memsize_of(Y.new)
=> 72
</code></pre>
<p>It is a bit different with hashes but the most important thing is that <strong>hashes without elements
only need the minimum 40 bytes</strong> (so no big penalty there, e.g. for default values):</p>
<pre><code>2.4.2 > ObjectSpace.memsize_of({})
=> 40
2.4.2 > ObjectSpace.memsize_of({a: 1})
=> 192
2.4.2 > ObjectSpace.memsize_of({a: 1, b: 2, c: 3, d: 4})
=> 192
2.4.2 > ObjectSpace.memsize_of({a: 1, b: 2, c: 3, d: 4, e: 5})
=> 288
</code></pre>
<p>You can also see that a hash with up to four entries uses 192 bytes, so this is the minimum you need
for non-empty hashes.</p>
<p>Finally, <strong>strings with up to 23 bytes</strong> are stored directly in the <code>RString</code> structure that
represents a string object:</p>
<pre><code>2.4.2 > ObjectSpace.memsize_of("")
=> 40
2.4.2 > ObjectSpace.memsize_of("a"*23)
=> 40
2.4.2 > ObjectSpace.memsize_of("a"*24)
=> 65
</code></pre>
<p>How does this knowledge help you? I don’t suggest that you design purely around these constraints
but they may influence your decisions when you need to choose between alternative implementations.</p>
<h3 id="your-everyday-object">Your Everyday Object</h3>
<p>All “normal” objects, i.e. those without a special C structure, use the general <code>RObject</code> structure.
You might think that this won’t allow you to be memory conscious but you are wrong. Even this
structure has a “memory efficient” mode.</p>
<p>If you have an array the memory used by the array is for storing (<code>VALUE</code> pointers to) its entries.
Similarly, if you have a string it uses memory for storing the bytes that make up the string. So for
what purpose is memory used in case of a general object? Instance variables!</p>
<p>The values for instance variables are stored by the object, however, the names of the instance
variables are stored by the associated class object (because normally the objects of one class have
the same instance variables).</p>
<p>Like with arrays <strong>an object with up to three instance variables only uses 40 bytes</strong>, one with four
or five uses 80 bytes:</p>
<pre><code>2.4.2 > class X; def initialize(c); c.times {|i| instance_variable_set(:"@i#{i}", i)}; end; end
=> :initialize
2.4.2 > ObjectSpace.memsize_of(X.new(0))
=> 40
2.4.2 > ObjectSpace.memsize_of(X.new(1))
=> 40
2.4.2 > ObjectSpace.memsize_of(X.new(2))
=> 40
2.4.2 > ObjectSpace.memsize_of(X.new(3))
=> 40
2.4.2 > ObjectSpace.memsize_of(X.new(4))
=> 80
2.4.2 > ObjectSpace.memsize_of(X.new(5))
=> 80
2.4.2 > ObjectSpace.memsize_of(X.new(6))
=> 96
</code></pre>
<h2 id="strategies">Strategies</h2>
<h3 id="class-and-system-design">Class and System Design</h3>
<p>When developing applications/libraries that don’t need to create many objects, you don’t really need
to be memory conscious. However, if they <em>do</em> need to create many objects, it would be good to keep
the above information in the back of your mind when designing the classes and interactions.</p>
<p>Consider this example: You need to create a class that can represent CSS margin values. As per the
<a href="https://developer.mozilla.org/en-US/docs/Web/CSS/margin">CSS specification</a>, one to four values are allowed. How would you do this?</p>
<ul>
<li>
<p>One idea might be to just use an array. This would not be a good abstraction but memory-wise the
array would use either 40 bytes or, with four values, 72 bytes.</p>
</li>
<li>
<p>However, since most of the array methods are not really applicable, the array should be wrapped
inside a class. Objects of this class would use 80 or 112 bytes, depending on the size of the
array.</p>
</li>
<li>
<p>Another possibility would be to create a class and store the four values in instance variables on
initialization. Objects would then always use 80 bytes.</p>
</li>
<li>
<p>Finally, instead of a class a struct with four members could be used. Objects would only use 72
bytes.</p>
</li>
</ul>
<p>This example may be far-fetched but it nicely illustrates two points: First, that using built-in
types is often the best way to conserve memory, at the cost of a good abstraction. And
second, that having Ruby’s internals in mind when designing classes can reduce memory usage (i.e. in
the example’s case using a struct over a plain class saves 10% per object).</p>
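<p>The four alternatives can be compared with a few lines of Ruby. This is only a sketch: the class
names are made up for the comparison, and the byte counts it prints depend on the Ruby version and
platform, so they may differ from the Ruby 2.4.2 figures given in this post.</p>

```ruby
require 'objspace'

# Hypothetical classes, named only for this comparison.
class MarginWithArray            # wraps an array of values
  def initialize(*values)
    @values = values
  end
end

class MarginWithIvars            # stores the values in four instance variables
  def initialize(top, right, bottom, left)
    @top, @right, @bottom, @left = top, right, bottom, left
  end
end

MarginStruct = Struct.new(:top, :right, :bottom, :left)

values = [1, 2, 3, 4]
wrapped = MarginWithArray.new(*values)

sizes = {
  'plain array'      => ObjectSpace.memsize_of(values),
  # memsize_of does not follow references, so the wrapped array is added by hand.
  'array in class'   => ObjectSpace.memsize_of(wrapped) +
                        ObjectSpace.memsize_of(wrapped.instance_variable_get(:@values)),
  'class with ivars' => ObjectSpace.memsize_of(MarginWithIvars.new(*values)),
  'struct'           => ObjectSpace.memsize_of(MarginStruct.new(*values))
}
sizes.each { |name, bytes| puts format('%-17s %4d bytes', name, bytes) }
```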
<h3 id="object-re-use">Object Re-use</h3>
<p>Another way to conserve memory is to re-use objects when possible. This is easily done in case of
immutable objects but can be applied in other cases, too.</p>
<p>A typical example for object re-use would be a graphical text editor. The text editor needs to have
the information about each visual representation of a character (a glyph) available. By caching and
re-using the glyph information only one instance for each glyph needs to be created, even if
referenced from multiple positions.</p>
<p>Another example would be the freezing and deduplicating of strings. This can be done on a case by
case basis or globally for a Ruby source file via the <code># frozen_string_literal: true</code> pragma. This
allows the interpreter to deduplicate strings, reducing the memory usage. Starting with Ruby 2.5 you
can also deduplicate any string yourself by using the result of the <code>String#-@</code> method, e.g. <code>-str</code>.</p>
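<p>Both forms of re-use can be sketched in a few lines. The <code>Glyph</code> struct, its made-up
width calculation and the <code>GLYPH_CACHE</code> constant are assumptions for illustration only; a
real editor would store far more per glyph. The second half needs Ruby 2.5 or newer.</p>

```ruby
# Hypothetical glyph record; the width formula is invented for this sketch.
Glyph = Struct.new(:char, :width)

# Cache that creates each glyph once and hands out the same frozen instance afterwards.
GLYPH_CACHE = Hash.new do |cache, char|
  cache[char] = Glyph.new(char, char.ord % 10 + 5).freeze
end

glyphs = 'hello'.each_char.map { |char| GLYPH_CACHE[char] }
puts glyphs.size                      # number of glyph references in the text
puts GLYPH_CACHE.size                 # number of Glyph objects actually created
puts glyphs[2].equal?(glyphs[3])      # both "l" positions share one object

# String deduplication via String#-@.
a = 'kramdown'.dup
b = 'kramdown'.dup
puts a.equal?(b)                      # two separate string objects
puts((-a).equal?(-b))                 # the deduplicated results are the same object
```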
<h3 id="appropriate-use-of-methods-and-algorithms">Appropriate Use of Methods and Algorithms</h3>
<p>The best memory savings come from <strong>not allocating additional objects at all</strong>. For example, if you
have an array and need to map each value, you can either use <code>Array#map</code> or <code>Array#map!</code>. The
difference is that the first creates a new array whereas the second one modifies the array in-place.
It is often possible to use the second method without any other code changes. So if you have a
hotspot that uses a transformation method like <code>Array#map</code>, consider whether you can get away with using a
different, more memory-efficient method.</p>
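<p>The difference between the two methods is easy to see by checking object identity; a minimal
example:</p>

```ruby
values = (1..5).to_a

doubled = values.map { |v| v * 2 }    # allocates a second array for the results
puts doubled.equal?(values)           # a different object than the original

result = values.map! { |v| v * 2 }    # overwrites the entries of the existing array
puts result.equal?(values)            # still the very same array object
puts values.inspect
```

<p>Keep in mind that the in-place variant changes the array for everyone holding a reference to it,
so it is only a drop-in replacement when the original values are not needed anymore.</p>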
<p><strong>Choosing appropriate algorithms</strong> can also greatly reduce memory usage. For example, when
modifying an encrypted PDF file in HexaPDF, there are situations where decryption and re-encryption
of data streams is not needed. By identifying these situations it is possible to reduce memory usage
and speed up processing by just copying the input data stream straight into the output file. This
led to <a href="../2016/hexapdf-performance-benchmark.html">HexaPDF using less memory than a C++ library</a> when optimizing encrypted files.</p>
<h3 id="measuring-memory-usage">Measuring Memory Usage</h3>
<p>There are several gems that help with determining where a program allocates memory. The two that I
most often use are <a href="https://github.com/ko1/allocation_tracer">allocation_tracer</a> and <a href="https://github.com/SamSaffron/memory_profiler">memory_profiler</a>.</p>
<p>Both tools can measure a whole program or they can be turned on and off to only measure certain
parts of a program. Either method allows you to determine hotspots in your program and then act on
the information. For example, while developing kramdown several years ago I found that the HTML
converter class allocated huge amounts of throw-away strings. By changing this hotspot to a better
alternative kramdown got faster and used less memory.</p>
<p>To get you started on using these two gems, here are two files that are intended to get pre-loaded
using the <code>-r</code> switch of the ruby binary (i.e. <code>ruby -I. -ralloc_tracer myscript.rb</code>).</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">BEGIN</span> <span class="p">{</span>
  <span class="nb">require</span> <span class="s1">'allocation_tracer'</span>
  <span class="no">ObjectSpace</span><span class="o">::</span><span class="no">AllocationTracer</span><span class="p">.</span><span class="nf">setup</span><span class="p">(</span><span class="sx">%i{path line type}</span><span class="p">)</span>
  <span class="no">ObjectSpace</span><span class="o">::</span><span class="no">AllocationTracer</span><span class="p">.</span><span class="nf">trace</span>
<span class="p">}</span>
<span class="k">END</span> <span class="p">{</span>
  <span class="nb">require</span> <span class="s1">'pp'</span>
  <span class="n">results</span> <span class="o">=</span> <span class="no">ObjectSpace</span><span class="o">::</span><span class="no">AllocationTracer</span><span class="p">.</span><span class="nf">stop</span>
  <span class="n">results</span><span class="p">.</span><span class="nf">reject</span> <span class="p">{</span><span class="o">|</span><span class="n">k</span><span class="p">,</span> <span class="n">v</span><span class="o">|</span> <span class="n">v</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">&lt;</span> <span class="mi">10</span><span class="p">}.</span><span class="nf">sort_by</span><span class="p">{</span><span class="o">|</span><span class="n">k</span><span class="p">,</span> <span class="n">v</span><span class="o">|</span> <span class="p">[</span><span class="n">v</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">k</span><span class="p">[</span><span class="mi">0</span><span class="p">]]}.</span><span class="nf">each</span> <span class="k">do</span> <span class="o">|</span><span class="n">k</span><span class="p">,</span> <span class="n">v</span><span class="o">|</span>
    <span class="nb">puts</span> <span class="s2">"</span><span class="si">#{</span><span class="n">k</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="si">}</span><span class="s2">:</span><span class="si">#{</span><span class="n">k</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span><span class="si">}</span><span class="s2"> - </span><span class="si">#{</span><span class="n">k</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span><span class="si">}</span><span class="s2"> - </span><span class="si">#{</span><span class="n">v</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="si">}</span><span class="s2">"</span>
  <span class="k">end</span>
  <span class="nb">puts</span> <span class="s2">"Sum: "</span> <span class="o">+</span> <span class="n">results</span><span class="p">.</span><span class="nf">inject</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span> <span class="p">{</span><span class="o">|</span><span class="n">sum</span><span class="p">,</span> <span class="p">(</span><span class="n">k</span><span class="p">,</span><span class="n">v</span><span class="p">)</span><span class="o">|</span> <span class="n">sum</span> <span class="o">+</span> <span class="n">v</span><span class="p">[</span><span class="mi">0</span><span class="p">]}.</span><span class="nf">to_s</span>
  <span class="n">pp</span> <span class="no">ObjectSpace</span><span class="o">::</span><span class="no">AllocationTracer</span><span class="p">.</span><span class="nf">allocated_count_table</span>
  <span class="n">pp</span> <span class="ss">:total</span> <span class="o">=></span> <span class="no">ObjectSpace</span><span class="o">::</span><span class="no">AllocationTracer</span><span class="p">.</span><span class="nf">allocated_count_table</span><span class="p">.</span><span class="nf">values</span><span class="p">.</span><span class="nf">inject</span><span class="p">(:</span><span class="o">+</span><span class="p">)</span>
<span class="p">}</span>
</code></pre></div></div>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">BEGIN</span> <span class="p">{</span>
<span class="nb">require</span> <span class="s1">'memory_profiler'</span>
<span class="no">MemoryProfiler</span><span class="p">.</span><span class="nf">start</span>
<span class="p">}</span>
<span class="k">END</span> <span class="p">{</span>
<span class="n">report</span> <span class="o">=</span> <span class="no">MemoryProfiler</span><span class="p">.</span><span class="nf">stop</span>
<span class="n">report</span><span class="p">.</span><span class="nf">pretty_print</span>
<span class="p">}</span>
</code></pre></div></div>
<p>These two files allow you to profile memory usage without changing your program.</p>
<h2 id="conclusion">Conclusion</h2>
<p>There are various ways to reduce the memory usage of Ruby when developing libraries and
applications. Knowing a bit of Ruby interpreter internals greatly helps in understanding how Ruby
uses memory and how we can exploit that fact. Additionally, knowing the performance and memory
impact of Ruby core methods helps in choosing the appropriate method.</p>
PDF Filter Implementation in HexaPDF Using Fibershttps://gettalong.org/blog/2017/pdf-filter-implementation-in-hexapdf-using-fibers.html2017-10-07T13:50:21+02:002017-10-07T12:59:00+02:00
<p>In the <a href="../2016/pdf-object-representation-in-hexapdf.html">previous post</a> about <a href="http://hexapdf.gettalong.org">HexaPDF</a> I introduced the basic PDF object system. This post will
focus on one of the available object types, PDF streams and their filters.</p>
<p>If you are already familiar with the basics of PDF streams and filters, <a href="#hexapdf">jump down</a> to the
section about their implementation in HexaPDF.</p>
<h2 id="pdf-streams">PDF Streams</h2>
<p>As described in the previous post, a PDF stream represents a potentially unlimited sequence of
bytes. Each stream also has some meta data associated with it, and this meta data is represented by a
PDF dictionary. Since a stream is not limited in size it is used to hold data like images, font
files or content streams of pages (i.e. the instructions that tell a PDF viewer what to display and
how).</p>
<p>If you look inside a PDF file you will find that the instructions for defining streams are just
plain ASCII strings, as are the instructions for all other PDF objects. The following is a valid PDF
stream:</p>
<pre><code>1 0 obj
<</Length 12>>
stream
Hello World!
endstream
endobj
</code></pre>
<p>The first thing to notice is that the PDF stream is defined as an indirect PDF object. Since streams
have to follow a certain syntax and can be arbitrarily long, they can never be direct objects!</p>
<p>After the object definition comes the PDF dictionary that holds the meta data of the stream. There
are a few keys like <code>/Length</code>, <code>/Filter</code> and <code>/DecodeParms</code> that are valid for all streams. The only
mandatory key is <code>/Length</code> since without it it would be hard (sometimes impossible) to find the end of
the stream (in essence, we would need to scan for the <code>endstream</code> keyword which might, or might not,
work).</p>
<p>If the dictionary was followed by the <code>endobj</code> keyword, we would just have an indirect object
pointing to a dictionary. However, it is followed by the <code>stream</code> keyword, telling us that stream
data follows, and the stream data itself is followed by the <code>endstream</code> keyword.</p>
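<p>To make the role of <code>/Length</code> concrete, here is a minimal plain-Ruby sketch that pulls the stream data out of the example above. It is deliberately naive and not how HexaPDF parses files: it assumes that <code>/Length</code> is a direct integer and that the <code>stream</code> keyword is followed by a single line feed.</p>

```ruby
# Simplified sketch: assumes /Length is a direct integer and that the
# "stream" keyword is followed by exactly one line feed.
pdf = "1 0 obj\n<</Length 12>>\nstream\nHello World!\nendstream\nendobj\n"

length = pdf[/\/Length\s+(\d+)/, 1].to_i           # value of the mandatory /Length key
start = pdf.index("stream\n") + "stream\n".length  # first byte after the stream keyword
stream_data = pdf[start, length]                   # no scanning for "endstream" needed

puts stream_data
```

<p>Because the length is known up front, the data can contain anything, even the string <code>endstream</code>, without confusing the reader.</p>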
<p>And that’s how streams are represented at the file level. However, before we start exploring how
HexaPDF handles streams there is one more thing to know about: <strong>stream filters</strong>.</p>
<h2 id="stream-filters">Stream Filters</h2>
<p>The example stream shown above contains the exact byte sequence that a PDF reader would get. But
just dumping all streams without compression into a PDF would lead to large PDF files. Therefore
streams can employ filters that need to be applied to the raw stream data to get the real stream
data.</p>
<p>Of the nine filters defined by the PDF specification (I will leave out the <code>Crypt</code> filter because it
is a special construct), four deal exclusively with image data:</p>
<ul>
<li><code>CCITTFaxDecode</code> handles images encoded with group 3 and 4 CCITT fax encoding,</li>
<li><code>JBIG2Decode</code> handles monochrome images in JBIG2 encoding,</li>
<li><code>DCTDecode</code> handles JPEG images, and</li>
<li><code>JPXDecode</code> handles JPEG2000 images.</li>
</ul>
<p>Currently, none of these four filters are implemented in HexaPDF. Decoding JPEG or JPEG2000 images
is currently not necessary because we can just put a whole JPEG/JPEG2000 image in a PDF stream, set
the filter accordingly and it just works. However, this is not the case with the other two filters
and therefore they will be implemented in a future version.</p>
<p>That leaves us with five remaining filters of which two, <code>ASCIIHexDecode</code> and <code>ASCII85Decode</code>, are
used to ensure that streams in a PDF file are encoded using only ASCII characters, making it
possible to create PDFs consisting only of ASCII characters. The problem with them is that they make
the streams bigger instead of smaller (e.g. with <code>ASCIIHexDecode</code> each source byte gets encoded by
two bytes) and are therefore seldom used.</p>
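<p>The size increase is easy to see with plain Ruby. The following sketch hex-encodes a string the way <code>ASCIIHexDecode</code> expects it, using only core methods (it is an illustration, not HexaPDF's encoder):</p>

```ruby
data = 'Hello World!'
hex = data.unpack1('H*') + '>' # '>' is the end-of-data marker of the filter

puts hex
puts "#{data.bytesize} bytes become #{hex.bytesize} bytes"
```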
<p>Finally, the last three filters deal with compressing data:</p>
<ul>
<li>
<p>The <code>RunLengthDecode</code> filter employs a simple run length encoding to compress data. You will
probably never see it used.</p>
</li>
<li>
<p>The <code>LZWDecode</code> filter uses the Lempel-Ziv-Welch algorithm, which is also used by the TIFF format,
to compress data. You will also probably never see it used.</p>
</li>
<li>
<p>Finally, <code>FlateDecode</code> uses the zlib/deflate compression method and this is what is used most of
the time since it offers better compression than the other two.</p>
</li>
</ul>
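<p>Since <code>FlateDecode</code> is based on zlib/deflate, its effect can be tried out with Ruby's bundled <code>zlib</code> library. This is only a sketch of the compression itself, on made-up sample data and without any predictor:</p>

```ruby
require 'zlib'

data = 'Hello World! ' * 100                # repetitive sample data compresses well
compressed = Zlib::Deflate.deflate(data)    # what would be stored in the PDF stream
restored = Zlib::Inflate.inflate(compressed)

puts "#{data.bytesize} bytes compressed to #{compressed.bytesize} bytes"
puts restored == data
```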
<p>The <code>LZWDecode</code> and <code>FlateDecode</code> filters can additionally use a predictor algorithm that prepares
the input stream so that higher compression rates can be achieved. This predictor algorithm is taken
from the PNG specification and together with the deflate algorithm allows for the easy embedding of
PNG images into a PDF.</p>
<p>Now that you know which filters are available, we will look at how they are used.</p>
<p>If a stream has filters applied, the stream dictionary’s <code>/Filter</code> key needs to be set to the
applied filters. You read correctly, more than one filter can be applied to a stream; however, this
feature is rarely used. Additionally, the <code>/DecodeParms</code> key can be used to supply decoding
parameters for each filter.</p>
<p>Going back to our earlier example, it would look like this if the <code>ASCII85Decode</code> and
<code>ASCIIHexDecode</code> filters were applied in that order on encoding (note that the filters describe the
decoding order):</p>
<pre><code>1 0 obj
<</Length 35 /Filter [/ASCIIHexDecode /ASCII85Decode]>>
stream
3837635552445d692c2245626f38307e3e>
endstream
endobj
</code></pre>
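<p>To check the example by hand, the two filters can be undone in plain Ruby in the order given by <code>/Filter</code>. The helper methods <code>ascii_hex_decode</code> and <code>ascii85_decode</code> below are hypothetical names for this sketch, not HexaPDF's API, and they work on whole strings and skip error handling:</p>

```ruby
# Hypothetical helper: decodes hex digits up to the '>' end-of-data marker.
def ascii_hex_decode(data)
  hex = data.sub(/>.*\z/m, '').delete('^0-9A-Fa-f')
  hex << '0' if hex.length.odd? # a missing last digit is treated as 0
  [hex].pack('H*')
end

# Hypothetical helper: decodes ASCII85 data up to the '~>' end-of-data marker.
def ascii85_decode(data)
  body = data.sub(/~>.*\z/m, '').gsub(/\s/, '')
  result = ''.b
  # 'z' is shorthand for a group of four zero bytes, i.e. '!!!!!'
  body.gsub('z', '!!!!!').scan(/.{1,5}/) do |group|
    padding = 5 - group.length
    # Each group of five characters is a base-85 number for four bytes
    value = (group + 'u' * padding).each_byte.inject(0) { |sum, byte| sum * 85 + (byte - 33) }
    result << [value].pack('N')[0, 4 - padding]
  end
  result
end

raw = '3837635552445d692c2245626f38307e3e>'
decoded = ascii85_decode(ascii_hex_decode(raw)) # decoding order as listed in /Filter
puts decoded
```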
<h2 id="hexapdf">Implementation in HexaPDF</h2>
<p>Since PDF streams are essentially dictionaries with a byte stream attached, they are implemented in
HexaPDF as the subclass <a href="https://hexapdf.gettalong.org/api/HexaPDF/Stream.html">HexaPDF::Stream</a> of <a href="https://hexapdf.gettalong.org/api/HexaPDF/Dictionary.html">HexaPDF::Dictionary</a>. The class provides all necessary
convenience methods to access, decode and encode streams.</p>
<p>The stream data itself can either be a simple String or a <a href="https://hexapdf.gettalong.org/api/HexaPDF/StreamData.html">HexaPDF::StreamData</a> object. The former
is mostly used for setting the stream data when creating a PDF file or when processing the decoded
stream data. The latter is used to represent the stream data <strong>without actually reading/decoding</strong>
it. The last bit is important since it means that HexaPDF can load large stream objects without
needing to read the stream data itself if it is not used.</p>
<p><a href="https://hexapdf.gettalong.org/api/HexaPDF/StreamData.html">HexaPDF::StreamData</a> objects basically just store a reference to an IO object, an offset and a
length. When asked for the data, i.e. when a stream needs to be read and decoded, it returns an
object that reads the raw data in chunks to avoid huge memory use when possible. The raw stream data
is then passed through the filters specified by the stream dictionary to get the decoded stream
data. Since the raw data is read in chunks, it means that the filters need to be aware of that, too.
Otherwise the benefits of reading in chunks are wasted. Finally, if the whole stream data is needed
at once, it is read as described above but concatenated into one huge string.</p>
<p>The best way to think of this is as a <strong>filter pipeline</strong>:</p>
<ul>
<li>The first object in the pipeline is responsible for providing the data chunks.</li>
<li>The middle objects then transform the data chunks according to some defined algorithms.</li>
<li>The last object collects the data chunks and either concatenates them into a string or does
something else with them, e.g. writing the chunks to a file.</li>
</ul>
<p>Therefore the requirements for filter objects used in such a pipeline are:</p>
<ul>
<li>Can handle arbitrarily large chunks of source data, from 1 byte upwards</li>
<li>Can process the source data in chunks, i.e. it doesn’t need all the data to start processing</li>
</ul>
<p>Thinking about all this, Ruby’s <strong>fiber</strong> objects immediately came to mind, mostly because I
remembered a <a href="https://pragdave.me/blog/2007/12/30/pipelines-using-fibers-in-ruby-19.html">blog post about implementing pipelines using fibers</a> by Dave Thomas.</p>
<p>The neat thing about fibers is that they allow you to <strong>interrupt an algorithm at any point and
return to that exact same point later on</strong>, continuing with the algorithm. This is in stark contrast
with methods, procs and the like because they always start from the top, even if interrupted in the
middle. Koichi Sasada gave a <a href="http://rubykaigi.org/2017/presentations/ko1.html">great talk about fibers</a> at this year’s RubyKaigi that
you should definitely check out.</p>
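<p>A tiny example with Ruby's core <code>Fiber</code> class shows this behaviour. Note how the local variable <code>total</code> keeps its value between the calls to <code>resume</code>:</p>

```ruby
fiber = Fiber.new do
  total = 0
  [1, 2, 3].each do |n|
    total += n
    Fiber.yield total # pause here and hand the running total to the caller
  end
  :done               # value of the last resume, after which the fiber is dead
end

values = Array.new(4) { fiber.resume }
p values
p fiber.alive?
```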
<p>As you can see, fibers are a perfect fit for implementing the PDF filter pipeline. I have implemented
some <a href="https://hexapdf.gettalong.org/api/HexaPDF/Filter/">helper methods</a> for creating the initial, source data yielding fibers and for
collecting the results.</p>
<p>The filters themselves (e.g. <a href="https://hexapdf.gettalong.org/api/HexaPDF/Filter/ASCIIHexDecode.html">HexaPDF::Filter::ASCIIHexDecode</a>) are implemented as modules that have
the methods <code>encoder(source, options = nil)</code> and <code>decoder(source, options = nil)</code>. These two methods
create fibers that transform the data received via the <code>source</code> argument and yield the results.</p>
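<p>To illustrate the idea without depending on HexaPDF, here is a minimal, self-contained sketch of such a pipeline. The methods <code>io_source</code>, <code>hex_decoder</code> and <code>collect</code> are hypothetical stand-ins for this post and not HexaPDF's actual code. The interesting part is the filter in the middle, which has to cope with chunks that end in the middle of a byte:</p>

```ruby
require 'stringio'

# Hypothetical source: a fiber that yields chunks of at most chunk_size bytes.
def io_source(io, chunk_size)
  Fiber.new do
    while (chunk = io.read(chunk_size))
      Fiber.yield(chunk)
    end
    nil # signals the end of the data to whoever resumes this fiber
  end
end

# Hypothetical filter: decodes hex digits, coping with chunks that end mid-byte.
def hex_decoder(source)
  Fiber.new do
    rest = ''
    while source.alive? && (chunk = source.resume)
      digits = rest + chunk.delete('^0-9A-Fa-f') # drop whitespace and the '>' marker
      rest = digits.length.odd? ? digits.slice!(-1) : ''
      Fiber.yield([digits].pack('H*')) unless digits.empty?
    end
    rest.empty? ? nil : [rest + '0'].pack('H*') # a dangling digit is padded with 0
  end
end

# Hypothetical sink: resumes the last fiber until it is exhausted.
def collect(source)
  result = ''.b
  while source.alive? && (chunk = source.resume)
    result << chunk
  end
  result
end

io = StringIO.new('48656c6c6f20576f726c6421>')
decoded = collect(hex_decoder(io_source(io, 3))) # a chunk size of 3 splits bytes on purpose
puts decoded
```

<p>Nothing is read from the IO until <code>collect</code> resumes the last fiber, and at no point is more than one chunk in flight between two stages.</p>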
<p>If we were to manually read a PDF stream, the process would be like this (using our example stream
from above):</p>
<ul>
<li>
<p>Create fiber <code>a</code> that knows how to read chunks from the IO (<a href="https://github.com/gettalong/hexapdf/blob/master/lib/hexapdf/stream.rb#L87">the real code</a>):</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">a</span> <span class="o">=</span> <span class="no">HexaPDF</span><span class="o">::</span><span class="no">Filter</span><span class="p">.</span><span class="nf">source_from_io</span><span class="p">(</span><span class="n">io</span><span class="p">,</span> <span class="ss">pos: </span><span class="n">offset</span><span class="p">,</span> <span class="ss">length: </span><span class="n">length</span><span class="p">,</span> <span class="ss">chunk_size: </span><span class="n">chunk_size</span><span class="p">)</span>
</code></pre></div> </div>
</li>
<li>
<p>Check whether the stream employs filters (our example stream does) and wrap the fiber <code>a</code> in the
necessary filter fibers (<a href="https://github.com/gettalong/hexapdf/blob/master/lib/hexapdf/stream.rb#L172">the real code</a>):</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">b</span> <span class="o">=</span> <span class="no">HexaPDF</span><span class="o">::</span><span class="no">Filter</span><span class="o">::</span><span class="no">ASCIIHexDecode</span><span class="p">.</span><span class="nf">decoder</span><span class="p">(</span><span class="n">a</span><span class="p">)</span>
<span class="n">c</span> <span class="o">=</span> <span class="no">HexaPDF</span><span class="o">::</span><span class="no">Filter</span><span class="o">::</span><span class="no">ASCII85Decode</span><span class="p">.</span><span class="nf">decoder</span><span class="p">(</span><span class="n">b</span><span class="p">)</span>
</code></pre></div> </div>
</li>
<li>
<p>Note that <strong>nothing has been read so far</strong> since the fibers were just created but not resumed. To
get the string we retrieve the chunks by continuously resuming our fiber and concatenate the
chunks (<a href="https://github.com/gettalong/hexapdf/blob/master/lib/hexapdf/stream.rb#L144">the real code</a>):</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="no">HexaPDF</span><span class="o">::</span><span class="no">Filter</span><span class="p">.</span><span class="nf">string_from_source</span><span class="p">(</span><span class="n">c</span><span class="p">)</span> <span class="c1"># => "Hello World!"</span>
</code></pre></div> </div>
</li>
</ul>
<p>And that’s the whole magic!</p>
<h2 id="conclusion">Conclusion</h2>
<p>This post showed you how PDF streams and filters work in general and how they are implemented in
HexaPDF. Using Ruby’s fiber objects HexaPDF can lazily load PDF streams and perform chunk-wise
processing on them, avoiding huge memory usage.</p>
<p>In a future post I will introduce you to the security features of PDF, how they work and how HexaPDF
implements them.</p>
Animated Turtle Graphics using PDFhttps://gettalong.org/blog/2017/turtle-graphics-with-pdf.html2017-02-08T23:10:55+01:002017-02-08T19:08:00+01:00
<p>After seeing one of Jamis Buck’s <a href="http://weblog.jamisbuck.org/2016/11/5/weekly-programming-challenge-15.html">weekly programming challenges</a> being the implementation of
a turtle graphics system, I decided to tackle this one using <a href="https://hexapdf.gettalong.org">HexaPDF</a> as backend.</p>
<p>First I will introduce the basics of turtle graphics. Then I will show you what a simple
implementation using HexaPDF looks like, along with some examples. After that it’s show time - (ab)using the
presentation capabilities of PDF to animate the turtle graphics!</p>
<h2 id="turtle-graphics-basics">Turtle Graphics Basics</h2>
<p>If you are not familiar with turtle graphics, here is a short primer:</p>
<ul>
<li>There is a turtle which has an <strong>initial position and heading</strong>.</li>
<li>It can <strong>move</strong> a given number of steps forward or backwards.</li>
<li>It can <strong>turn</strong> a given number of degrees to the left or right.</li>
<li>It can <strong>move with or without drawing a line</strong>.</li>
</ul>
<p>The turtle follows the instructions that you give and draws an image consisting of lines as a
result. That’s it!</p>
<p>There can also be additional instructions like changing the width or color of the drawn lines but
they are not necessary for the basic turtle graphics.</p>
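<p>The rules above boil down to a little trigonometry. As a rough sketch in plain Ruby, without any drawing backend and without pen handling, the points visited by the turtle can be computed like this (the method name <code>turtle_path</code> and the instruction format are made up for this illustration):</p>

```ruby
# Returns the points the turtle visits, starting at the origin and
# initially heading along the positive x-axis.
def turtle_path(instructions)
  x = y = 0.0
  heading = 0.0 # in radians
  points = [[x, y]]
  instructions.each do |operation, arg|
    case operation
    when :move # positive steps move forward, negative steps backwards
      x += Math.cos(heading) * arg
      y += Math.sin(heading) * arg
      points << [x, y]
    when :turn # positive degrees turn left, negative degrees turn right
      heading += arg * Math::PI / 180.0
    end
  end
  points
end

square = [[:move, 10], [:turn, 90]] * 4
# Round away the tiny floating point errors introduced by sin and cos
path = turtle_path(square).map { |px, py| [px.round(6), py.round(6)] }
p path
```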
<h2 id="implementation-using-hexapdf">Implementation Using HexaPDF</h2>
<p>When you look at a PDF in a viewer, you will find that it can contain <strong>vector graphics</strong> besides
raster graphics and text on a page. Not everything is built-in, though. For example, there are no
native instructions for drawing circles; they have to be approximated using Bézier curves.</p>
<p>However, for the turtle graphics I only need to be able to draw lines and there <em>are</em> native PDF
instructions for them. HexaPDF has a <a href="https://hexapdf.gettalong.org/api/HexaPDF/Content/Canvas.html">Canvas</a> class that provides access to all these PDF drawing
instructions, so I chose it as backend for drawing.</p>
<p>One other design decision was that the given <strong>instructions should be recorded</strong> so that the turtle
graphics can be used multiple times, either on the same PDF page or on different pages.</p>
<p>The implementation itself was straightforward (see below for comments):</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">require</span> <span class="s1">'hexapdf'</span>
<span class="k">class</span> <span class="nc">Turtle</span>
<span class="no">Instruction</span> <span class="o">=</span> <span class="no">Struct</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="ss">:operation</span><span class="p">,</span> <span class="ss">:arg</span><span class="p">)</span> <span class="c1"># (1)</span>
<span class="k">def</span> <span class="nf">initialize</span>
<span class="vi">@instructions</span> <span class="o">=</span> <span class="p">[]</span>
<span class="vi">@x</span> <span class="o">=</span> <span class="mi">0</span>
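<span class="vi">@y</span> <span class="o">=</span> <span class="mi">0</span>
<span class="vi">@scale</span> <span class="o">=</span> <span class="mi">1</span> <span class="c1"># scale factor used by #draw, must not be nil</span>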
<span class="k">end</span>
<span class="k">def</span> <span class="nf">move</span><span class="p">(</span><span class="n">steps</span><span class="p">)</span> <span class="c1"># (2)</span>
<span class="vi">@instructions</span> <span class="o"><<</span> <span class="no">Instruction</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="ss">:move</span><span class="p">,</span> <span class="n">steps</span><span class="p">)</span>
<span class="nb">self</span>
<span class="k">end</span>
<span class="k">def</span> <span class="nf">forward</span><span class="p">(</span><span class="n">steps</span><span class="p">)</span> <span class="c1"># (2)</span>
<span class="n">move</span><span class="p">(</span><span class="n">steps</span><span class="p">)</span>
<span class="k">end</span>
<span class="k">def</span> <span class="nf">back</span><span class="p">(</span><span class="n">steps</span><span class="p">)</span> <span class="c1"># (2)</span>
<span class="n">move</span><span class="p">(</span><span class="o">-</span><span class="n">steps</span><span class="p">)</span>
<span class="k">end</span>
<span class="k">def</span> <span class="nf">turn</span><span class="p">(</span><span class="n">degrees</span><span class="p">)</span> <span class="c1"># (2)</span>
<span class="vi">@instructions</span> <span class="o"><<</span> <span class="no">Instruction</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="ss">:turn</span><span class="p">,</span> <span class="n">degrees</span><span class="p">)</span>
<span class="nb">self</span>
<span class="k">end</span>
<span class="k">def</span> <span class="nf">right</span><span class="p">(</span><span class="n">degrees</span><span class="p">)</span> <span class="c1"># (2)</span>
<span class="n">turn</span><span class="p">(</span><span class="o">-</span><span class="n">degrees</span><span class="p">)</span>
<span class="k">end</span>
<span class="k">def</span> <span class="nf">left</span><span class="p">(</span><span class="n">degrees</span><span class="p">)</span> <span class="c1"># (2)</span>
<span class="n">turn</span><span class="p">(</span><span class="n">degrees</span><span class="p">)</span>
<span class="k">end</span>
<span class="k">def</span> <span class="nf">pen</span><span class="p">(</span><span class="n">up_or_down</span><span class="p">)</span> <span class="c1"># (2)</span>
<span class="vi">@instructions</span> <span class="o"><<</span> <span class="no">Instruction</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="ss">:pen_down</span><span class="p">,</span> <span class="n">up_or_down</span> <span class="o">==</span> <span class="ss">:down</span><span class="p">)</span>
<span class="nb">self</span>
<span class="k">end</span>
<span class="k">def</span> <span class="nf">draw</span><span class="p">(</span><span class="n">canvas</span><span class="p">)</span> <span class="c1"># (3)</span>
<span class="n">x</span> <span class="o">=</span> <span class="vi">@x</span>
<span class="n">y</span> <span class="o">=</span> <span class="vi">@y</span>
<span class="n">heading</span> <span class="o">=</span> <span class="mi">0</span>
<span class="n">pen_down</span> <span class="o">=</span> <span class="kp">true</span>
<span class="n">canvas</span><span class="p">.</span><span class="nf">move_to</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span>
<span class="vi">@instructions</span><span class="p">.</span><span class="nf">each</span> <span class="k">do</span> <span class="o">|</span><span class="n">instruction</span><span class="o">|</span>
<span class="k">case</span> <span class="n">instruction</span><span class="p">.</span><span class="nf">operation</span>
<span class="k">when</span> <span class="ss">:move</span>
<span class="n">x</span> <span class="o">+=</span> <span class="no">Math</span><span class="p">.</span><span class="nf">cos</span><span class="p">(</span><span class="n">heading</span><span class="p">)</span> <span class="o">*</span> <span class="n">instruction</span><span class="p">.</span><span class="nf">arg</span> <span class="o">*</span> <span class="vi">@scale</span>
<span class="n">y</span> <span class="o">+=</span> <span class="no">Math</span><span class="p">.</span><span class="nf">sin</span><span class="p">(</span><span class="n">heading</span><span class="p">)</span> <span class="o">*</span> <span class="n">instruction</span><span class="p">.</span><span class="nf">arg</span> <span class="o">*</span> <span class="vi">@scale</span>
<span class="k">if</span> <span class="n">pen_down</span>
<span class="n">canvas</span><span class="p">.</span><span class="nf">line_to</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span>
<span class="k">else</span>
<span class="n">canvas</span><span class="p">.</span><span class="nf">move_to</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span>
<span class="k">end</span>
<span class="k">when</span> <span class="ss">:turn</span>
<span class="n">heading</span> <span class="o">+=</span> <span class="no">Math</span><span class="o">::</span><span class="no">PI</span> <span class="o">/</span> <span class="mf">180.0</span> <span class="o">*</span> <span class="n">instruction</span><span class="p">.</span><span class="nf">arg</span>
<span class="k">when</span> <span class="ss">:pen_down</span>
<span class="n">pen_down</span> <span class="o">=</span> <span class="n">instruction</span><span class="p">.</span><span class="nf">arg</span>
<span class="k">else</span>
<span class="k">raise</span> <span class="no">ArgumentError</span><span class="p">,</span> <span class="s2">"Unsupported turtle graphics operation"</span>
<span class="k">end</span>
<span class="k">end</span>
<span class="n">canvas</span><span class="p">.</span><span class="nf">stroke</span>
<span class="k">end</span>
<span class="k">def</span> <span class="nc">self</span><span class="o">.</span><span class="nf">configure</span><span class="p">(</span><span class="o">**</span><span class="n">kwargs</span><span class="p">)</span> <span class="c1"># (4)</span>
<span class="n">new</span><span class="p">.</span><span class="nf">configure</span><span class="p">(</span><span class="o">**</span><span class="n">kwargs</span><span class="p">)</span>
<span class="k">end</span>
<span class="k">def</span> <span class="nf">configure</span><span class="p">(</span><span class="ss">x: </span><span class="kp">nil</span><span class="p">,</span> <span class="ss">y: </span><span class="kp">nil</span><span class="p">)</span> <span class="c1"># (4)</span>
<span class="vi">@x</span> <span class="o">=</span> <span class="n">x</span> <span class="k">if</span> <span class="n">x</span>
<span class="vi">@y</span> <span class="o">=</span> <span class="n">y</span> <span class="k">if</span> <span class="n">y</span>
<span class="nb">self</span>
<span class="k">end</span>
<span class="k">def</span> <span class="nf">create_pdf</span><span class="p">(</span><span class="n">width</span><span class="p">,</span> <span class="n">height</span><span class="p">)</span> <span class="c1"># (5)</span>
<span class="n">doc</span> <span class="o">=</span> <span class="no">HexaPDF</span><span class="o">::</span><span class="no">Document</span><span class="p">.</span><span class="nf">new</span>
<span class="n">page</span> <span class="o">=</span> <span class="n">doc</span><span class="p">.</span><span class="nf">pages</span><span class="p">.</span><span class="nf">add</span>
<span class="n">page</span><span class="p">[</span><span class="ss">:MediaBox</span><span class="p">]</span> <span class="o">=</span> <span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="n">width</span><span class="p">,</span> <span class="n">height</span><span class="p">]</span>
<span class="n">page</span><span class="p">.</span><span class="nf">canvas</span><span class="p">.</span><span class="nf">draw</span><span class="p">(</span><span class="nb">self</span><span class="p">,</span> <span class="ss">x: </span><span class="n">width</span> <span class="o">/</span> <span class="mf">2.0</span><span class="p">,</span> <span class="ss">y: </span><span class="n">height</span> <span class="o">/</span> <span class="mf">2.0</span><span class="p">)</span>
<span class="n">doc</span>
<span class="k">end</span>
<span class="k">end</span>
<span class="no">HexaPDF</span><span class="o">::</span><span class="no">DefaultDocumentConfiguration</span><span class="p">[</span><span class="s1">'graphic_object.map'</span><span class="p">][</span><span class="ss">:turtle</span><span class="p">]</span> <span class="o">=</span> <span class="s1">'Turtle'</span> <span class="c1"># (4)</span>
</code></pre></div></div>
<p>Comments:</p>
<ol>
<li>
<p>This is the struct for representing a single instruction.</p>
</li>
<li>
<p>The methods for moving, turning and putting the pen down or up are just adding instruction
objects to the instruction list. Returning the turtle object allows chaining these methods
together.</p>
</li>
<li>
<p>After the instructions have been recorded, they have to be played back on a canvas. This is done
in the <code>#draw(canvas)</code> method by iterating over the instructions and following them.</p>
</li>
<li>
<p>The instance method <code>#configure</code> is used to set the initial position of the turtle on the canvas.</p>
<p>Together with the class method <code>::configure</code> and the instance method <code>#draw</code> the requirements for
being a <a href="https://hexapdf.gettalong.org/api/HexaPDF/Content/GraphicObject/index.html">“graphic object”</a> are fulfilled. Additionally, the turtle class is registered with
HexaPDF so that it can be used on any canvas without knowing the actual class name.</p>
</li>
<li>
<p>This convenience method returns a HexaPDF document containing a single page on which the turtle
graphics have been drawn. The document can be modified if needed, or just written out using the
<code>HexaPDF::Document#write</code> method.</p>
</li>
</ol>
<p>As can be seen, implementing the basics is rather easy. I implemented several additional features
later on, like setting the color and line width. The final implementation is <a href="https://github.com/gettalong/misc/blob/master/hexapdf-turtle/turtle.rb">available on
GitHub</a>.</p>
<h2 id="examples">Examples</h2>
<p>In addition to some examples I wrote myself, I took the three examples from Jamis Buck’s
solution (<code>boxes.rb</code>, <code>circles.rb</code> and <code>spiral.rb</code>) and adapted them for my implementation.</p>
<p>Here are the results (PDFs rendered as PNGs):</p>
<p class="align-center"><code>boxes.rb</code> <br />
<img src="assets/boxes.png" alt="Boxes" /> <br />
<code>circles.rb</code> <br />
<img src="assets/circles.png" alt="Circles" /> <br />
<code>spiral.rb</code> <br />
<img src="assets/spiral.png" alt="Spiral" /> <br />
<code>color-spiral.rb</code> <br />
<img src="assets/color-spiral.png" alt="Colored spiral" /> <br />
<code>tree.rb</code> <br />
<img src="assets/tree.png" alt="Tree" /> <br />
<code>ruby.rb</code> <br />
<img src="assets/ruby.png" alt="Ruby" /></p>
<h2 id="animating-the-turtle-graphics">Animating the Turtle Graphics</h2>
<p>Now that we have an implementation and some examples, we can come to the fun part: <strong>Animating the
turtle graphics</strong>!</p>
<p>There are several ways to do animations in PDF. However, most of them are not well supported in
viewers other than Adobe Reader:</p>
<ul>
<li>
<p>The only way (as far as I know) to provide animations in PDF without invoking presentation mode is
to <strong>use JavaScript and widgets</strong> (see the <a href="http://mirror.unl.edu/ctan/macros/latex/contrib/animate/animate.pdf">animate</a> LaTeX package as an example). However, under
Linux this only works with Adobe Reader.</p>
</li>
<li>
<p>So we are left with <strong>(ab)using presentation mode</strong> for our animation.</p>
<p>PDF supports <strong>sub-page navigation</strong> where a presentation step doesn’t show the next page but does
something else. This could be used, in conjunction with <strong>optional content groups</strong> (OCGs; think:
layers), to show frame after frame for the animation. The benefit is that only a single page is
needed. However, although OCGs are supported by most Linux PDF viewers (e.g. Okular and Evince),
sub-page navigation is not, at least not together with OCGs.</p>
</li>
<li>
<p>This means that we have to use individual pages, one page for one frame of the animation.</p>
</li>
</ul>
<p>The straightforward implementation would be to render the first frame on the first page, the first
and second frames on the second page, the first three frames on the third page and so on. But this
means that about <code>n^2/2</code> frames have to be rendered in total, i.e. the time and space complexity is
<code>O(n^2)</code>. Therefore this is not an ideal solution.</p>
<p>My first thought for remedying this situation was to use <strong>Form XObjects</strong>. This is a way in PDF to
store repeated content, like a header, in a separate object and reuse it on several pages. For the
animation we could store all frames in separate Form XObjects and then nest the Form XObjects to get
the desired result.</p>
<p>I.e. page 1 uses xobject 1 containing frame 1; page 2 uses xobject 2 containing a reference to
xobject 1 and frame 2; page 3 uses xobject 3 containing a reference to xobject 2 and frame 3; and so
on.</p>
<p>As it turns out, there is a limit on how deep Form XObjects can be nested. The limit for Adobe
Reader seems to be about 30 nesting levels, Okular’s is at about 100. So this isn’t a general
solution either.</p>
<p>What I settled for was using <strong>multiple content streams</strong>. In PDF, page definitions are separate from
the content streams. This means that multiple pages can refer to the same content stream and that a
single page can refer to multiple content streams.</p>
<p>My implementation reuses an existing content stream every 10 frames. This means that the first page
contains the first frame, the second page the first and second frames, …, up to the tenth page
which contains the first ten frames. The eleventh page then just references the content stream with
the first ten frames and uses another content stream with only the eleventh frame.</p>
<p>This means that the frames have to be iterated only once and that the needed space is also vastly
reduced. The implementation can be found in the <a href="https://github.com/gettalong/misc/blob/master/hexapdf-turtle/turtle.rb"><code>#create_pdf_animation</code> method</a>.</p>
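<p>To make the savings concrete, here is a minimal sketch of the page layout described above. It is not the code from <code>#create_pdf_animation</code>: the names <code>plan_pages</code>, <code>frames_written</code> and <code>naive_frames_written</code> are made up for illustration, and a content stream is modelled simply as the range of frame numbers it draws.</p>

```ruby
# Model of the page layout: a content stream is represented by the Range of
# frame numbers it draws (an illustrative assumption, not real PDF data).
def plan_pages(frame_count, chunk_size = 10)
  completed = []                     # streams that hold a full chunk of frames
  (1..frame_count).map do |frame|
    chunk_start = completed.size * chunk_size + 1
    partial = (chunk_start..frame)   # frames of the chunk currently being filled
    page_streams = completed + [partial]
    completed << partial if partial.size == chunk_size
    page_streams
  end
end

# Number of frames that have to be written when every distinct stream is stored once.
def frames_written(pages)
  pages.flatten.uniq.sum(&:size)
end

# Cost of the straightforward approach: page k redraws frames 1..k.
def naive_frames_written(frame_count)
  frame_count * (frame_count + 1) / 2
end

pages = plan_pages(25)
p pages[10]                       # the streams referenced by page 11
puts frames_written(pages)        # frames written with shared streams
puts naive_frames_written(25)     # frames written by the straightforward approach
```

<p>In this model every chunk of ten frames costs a fixed amount of work, so the total grows linearly with the number of frames instead of quadratically.</p>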
<h2 id="conclusion">Conclusion</h2>
<p>First and foremost: Turtle graphics are fun! Really, they are! ☺</p>
<p>But besides being fun they are also useful for <strong>teaching programming to kids</strong> (<a href="https://en.wikipedia.org/wiki/Logo_programming_language">Logo programming
language anyone?</a>) or visualizing <strong><a href="https://en.wikipedia.org/wiki/Lindenmayer_system">Lindenmayer systems</a></strong>.</p>
<p>Implementing the basics was rather easy, especially when a capable backend for drawing is already
available as was the case with HexaPDF. However, I was a bit disappointed to find out that <strong>a bit
more advanced PDF functionality like sub-page navigation isn’t implemented in most Linux PDF
viewers</strong>. There is a version of Adobe Reader available on Linux but it’s outdated and should not be
used due to security vulnerabilities.</p>
<p>Lastly, I gave a <strong>talk at <a href="http://www.vienna-rb.at/">vienna.rb</a></strong> last week on February 2nd on this topic which was quite well
received. There is a <strong><a href="https://twitter.com/viennarb/status/827250584294137857">video of a sample animation</a></strong> shown during the talk, complete
with a moving turtle. The <strong><a href="https://github.com/gettalong/misc/tree/master/talk-2017-02-vienna.rb">slides of the talk in PDF format</a> were created with
<a href="https://hexapdf.gettalong.org">HexaPDF</a></strong> and the turtle graphics system, showing off both of them.</p>
<h2 id="links">Links</h2>
<ul>
<li><a href="http://weblog.jamisbuck.org/2016/11/5/weekly-programming-challenge-15.html">Jamis Buck Weekly Programming Challenge #15</a></li>
<li><a href="https://hexapdf.gettalong.org">HexaPDF website</a></li>
<li><a href="https://github.com/gettalong/misc/blob/master/hexapdf-turtle/">Source of the turtle graphics implementation and examples</a></li>
<li><a href="https://github.com/gettalong/misc/tree/master/talk-2017-02-vienna.rb">Source for the talk at vienna.rb</a></li>
</ul>
Ruby 2.4 Performance Looking Goodhttps://gettalong.org/blog/2016/ruby24-performance-looking-good.html2016-12-07T23:48:38+01:002016-12-07T23:08:00+01:00
<p>There are <a href="http://www.blackbytes.info/2016/12/new-ruby-features/">some</a> <a href="http://blog.bigbinary.com/categories/Ruby-2-4">articles</a> highlighting new features of the upcoming Ruby 2.4. I decided to run a
basic benchmark comparing Ruby 2.4 to Ruby 2.3.3 and was pleasantly surprised.</p>
<p>A few weeks ago I wrote about <a href="hexapdf-performance-benchmark.html">HexaPDF’s performance</a> by running a benchmark that compares
HexaPDF to various other tools in regards to optimizing the size of a PDF file.</p>
<p>However, with this real-world benchmark I can not only compare HexaPDF to other tools but also to
itself on other Ruby versions. Note that this is neither an artificial benchmark nor a
micro-benchmark since PDF files are parsed, their in-memory representation modified and then
serialized again by the <a href="http://hexapdf.gettalong.org">HexaPDF library</a>. This involves a lot of string to Ruby object
conversion and vice versa.</p>
<p>Here are the results:</p>
<pre><code>|----------------------------------------------||-----------------------|
| | Ruby 2.3.3p222 || Ruby 2.4.0preview3 |
|----------------------------------------------||-----------------------|
| a.pdf (53,056) | Time | Memory || Time | Memory |
|----------------------------------------------||-----------------------|
| hexapdf | 189ms | 14,992KiB || 222ms | 13,396KiB |
| hexapdf C | 152ms | 14,912KiB || 137ms | 13,260KiB |
| hexapdf CS | 154ms | 15,920KiB || 152ms | 14,432KiB |
| hexapdf CSP | 158ms | 16,600KiB || 176ms | 15,124KiB |
|----------------------------------------------||-----------------------|
|----------------------------------------------||-----------------------|
| b.pdf (11,520,218) | Time | Memory || Time | Memory |
|----------------------------------------------||-----------------------|
| hexapdf | 1,188ms | 31,356KiB || 900ms | 32,512KiB |
| hexapdf C | 1,055ms | 33,460KiB || 1,025ms | 33,480KiB |
| hexapdf CS | 1,120ms | 34,512KiB || 1,062ms | 35,396KiB |
| hexapdf CSP | 9,469ms | 84,896KiB || 8,891ms | 79,924KiB |
|----------------------------------------------||-----------------------|
|----------------------------------------------||-----------------------|
| c.pdf (14,399,980) | Time | Memory || Time | Memory |
|----------------------------------------------||-----------------------|
| hexapdf | 2,286ms | 44,840KiB || 2,020ms | 39,808KiB |
| hexapdf C | 2,201ms | 49,940KiB || 2,063ms | 39,908KiB |
| hexapdf CS | 2,354ms | 53,076KiB || 2,211ms | 46,944KiB |
| hexapdf CSP | 10,148ms | 104,680KiB || 9,889ms | 97,088KiB |
|----------------------------------------------||-----------------------|
|----------------------------------------------||-----------------------|
| d.pdf (8,107,348) | Time | Memory || Time | Memory |
|----------------------------------------------||-----------------------|
| hexapdf | 5,834ms | 104,844KiB || 5,113ms | 65,068KiB |
| hexapdf C | 5,762ms | 90,940KiB || 5,045ms | 62,256KiB |
| hexapdf CS | 6,254ms | 84,860KiB || 5,692ms | 71,036KiB |
| hexapdf CSP | 6,327ms | 98,496KiB || 5,798ms | 102,684KiB |
|----------------------------------------------||-----------------------|
|----------------------------------------------||-----------------------|
| e.pdf (21,788,087) | Time | Memory || Time | Memory |
|----------------------------------------------||-----------------------|
| hexapdf | 1,001ms | 53,352KiB || 811ms | 47,156KiB |
| hexapdf C | 1,111ms | 107,264KiB || 1,065ms | 105,084KiB |
| hexapdf CS | 1,152ms | 108,276KiB || 1,069ms | 101,172KiB |
| hexapdf CSP | 35,771ms | 186,952KiB || 37,525ms | 202,364KiB |
|----------------------------------------------||-----------------------|
|----------------------------------------------||-----------------------|
| f.pdf (154,752,614) | Time | Memory || Time | Memory |
|----------------------------------------------||-----------------------|
| hexapdf | 60,355ms | 606,736KiB || 55,118ms | 484,672KiB |
| hexapdf C | 64,876ms | 592,752KiB || 58,753ms | 532,488KiB |
| hexapdf CS | 69,811ms | 716,004KiB || 63,725ms | 653,232KiB |
| ERR hexapdf CSP | 0ms | 0KiB || 0ms | 0KiB |
|----------------------------------------------||-----------------------|
</code></pre>
<p>When looking at the time, especially for <code>f.pdf</code>, it is clear that <strong>Ruby 2.4 is about 9% faster
than 2.3.3</strong> (ignore the <code>a.pdf</code> case since this is a very small file where the initialization cost
distorts the results)!</p>
<p>To be honest, I expected Ruby 2.4 to be faster because of the Ruby 3x3 initiative. However, what I
didn’t expect was that the <strong>memory consumption is also reduced by about 10%</strong>!</p>
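<p>For readers who want to check the percentages themselves, the following small sketch recomputes the relative reductions for the <code>f.pdf</code> rows. The numbers are copied from the table above; the constant <code>F_PDF</code> and the helper <code>reduction</code> are names made up for this example.</p>

```ruby
# Time (ms) and memory (KiB) for f.pdf, copied from the table above.
# Each pair is [Ruby 2.3.3, Ruby 2.4.0preview3].
F_PDF = {
  "hexapdf"    => {time: [60_355, 55_118], memory: [606_736, 484_672]},
  "hexapdf C"  => {time: [64_876, 58_753], memory: [592_752, 532_488]},
  "hexapdf CS" => {time: [69_811, 63_725], memory: [716_004, 653_232]},
}.freeze

# Relative reduction in percent when going from `before` to `after`.
def reduction(before, after)
  ((before - after) * 100.0 / before).round(1)
end

F_PDF.each do |name, measured|
  puts format("%-10s  time -%.1f%%  memory -%.1f%%",
              name, reduction(*measured[:time]), reduction(*measured[:memory]))
end
```

<p>The reductions differ from row to row, so the figures quoted in the text are best read as rough averages.</p>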
<p>Props to the Ruby core team – I’m looking forward to using Ruby 2.4!</p>
PDF Object Representation in HexaPDFhttps://gettalong.org/blog/2016/pdf-object-representation-in-hexapdf.html2016-11-25T17:33:57+01:002016-11-25T17:29:00+01:00
<p>To work with PDFs using a library means that you need to understand at least the part of the PDF
specification that is about the PDF object system. This post will introduce this part and then look
at how <a href="http://hexapdf.gettalong.org">HexaPDF</a> implements it.</p>
<h2 id="the-pdf-file-format---a-short-introduction">The PDF File Format - A Short Introduction</h2>
<p>If you look at a PDF file using a text editor (e.g. <code>vi -b</code>) you will find <strong>ASCII text intermingled
with binary data</strong>. The reason for this is that the basic structure of a PDF is defined using ASCII
characters. It is even possible to create a PDF using only ASCII characters, although it will be
bigger than necessary.</p>
<p>A PDF file basically consists of four parts:</p>
<dl>
<dt><strong>Header</strong></dt>
<dd>The header defines the PDF version and may contain binary bytes to indicate that the PDF contains
binary data.</dd>
<dt><strong>Body</strong></dt>
<dd>The body contains the real data of the PDF file in so called “indirect objects”, see below.</dd>
<dt><strong>Cross-Reference Table</strong></dt>
<dd>The cross-reference table contains information that allows accessing an indirect object directly,
without scanning the whole file.</dd>
<dt><strong>File Trailer</strong></dt>
<dd>This last part contains information to find the cross-reference table and certain other important
objects.</dd>
</dl>
<p>You may have noticed that I have written about “objects” that are inside the PDF file. The reason
for this is that PDF has the notion of objects of various types:</p>
<ul>
<li><strong>Booleans</strong>: Represented by <code>true</code> and <code>false</code></li>
<li><strong>Numerics</strong>: Integers like <code>123</code> and floats like <code>123.45</code></li>
<li><strong>Strings</strong>: May be serialized as literal strings using parentheses, e.g. <code>(Test)</code>, or
hexadecimal strings using angle brackets, e.g. <code>&lt;ABCDEF&gt;</code>; also supports binary strings.</li>
<li><strong>Names</strong>: Work like symbols in Ruby; represented by prefixing a slash to the name, e.g.
<code>/Name</code></li>
<li><strong>Arrays</strong>: Represented by using brackets around the values, e.g. <code>[123 (Test) /Name]</code></li>
<li><strong>Dictionaries</strong>: Like hashes in Ruby but can only have name objects as keys; represented by
double angle brackets where each key is followed by its value, e.g. <code>&lt;&lt;/Key (Value) /AnotherKey
12345&gt;&gt;</code></li>
<li><strong>Null</strong>: Like nil in Ruby; represented by <code>null</code></li>
<li><strong>Streams</strong>: A sequence of potentially unlimited bytes; represented as a dictionary followed by
<code>stream\n...stream bytes...\nendstream</code>; always has to be an indirect object and may be filtered</li>
<li><strong>Indirect objects</strong>: An object of any of the above types that is additionally assigned an object
identifier consisting of an object number (a positive integer) and a generation number (a
non-negative integer); represented like this: <code>4 0 obj (SomeObject) endobj</code>; can be referenced
from another object like this: <code>4 0 R</code></li>
</ul>
<p>Knowing the above, it is possible to read any indirect object:</p>
<ul>
<li>First, the PDF header is checked to see whether the PDF version is supported.</li>
<li>Then the end of the file is inspected to find the file trailer and the position of the
cross-reference table.</li>
<li>The cross-reference table is searched for the position of the indirect object that should be read.</li>
<li>Finally, the found position is used to read the indirect object.</li>
</ul>
<p>The first two steps only need to be done once whereas the last two steps need to be done for each
indirect object.</p>
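<p>The four steps can be sketched in a few lines of plain Ruby. This is only an illustration and not how HexaPDF reads files: <code>build_sample</code> and <code>read_object</code> are hypothetical helpers, the sample file is built in memory so that its offsets are guaranteed to be consistent, and only a single classic cross-reference section without incremental updates is handled.</p>

```ruby
# Builds a tiny PDF-like file in memory so that all offsets are consistent.
def build_sample
  bodies = ["<</Type /Catalog>>", "(Hello)"]
  data = +"%PDF-1.7\n"
  offsets = bodies.each_with_index.map do |body, index|
    offset = data.bytesize
    data << "#{index + 1} 0 obj\n#{body}\nendobj\n"
    offset
  end
  xref_pos = data.bytesize
  data << "xref\n0 #{bodies.size + 1}\n0000000000 65535 f \n"
  offsets.each { |offset| data << format("%010d 00000 n \n", offset) }
  data << "trailer\n<</Size #{bodies.size + 1} /Root 1 0 R>>\n"
  data << "startxref\n#{xref_pos}\n%%EOF\n"
end

# Returns the serialized form of the indirect object with the given number.
def read_object(data, oid)
  # Step 1: check that a PDF header with a version is present.
  version = data[/\A%PDF-(\d\.\d)/, 1]
  raise ArgumentError, "missing PDF header" unless version

  # Step 2: the end of the file tells us where the cross-reference table starts.
  xref_pos = data[/startxref\s+(\d+)\s+%%EOF\s*\z/, 1].to_i

  # Step 3: look up the byte offset of the wanted object in the table.
  lines = data.byteslice(xref_pos, data.bytesize).lines
  raise ArgumentError, "no cross-reference table found" unless lines[0].strip == "xref"
  first, count = lines[1].split.map(&:to_i)
  raise KeyError, "object #{oid} not in table" unless oid >= first && oid < first + count
  offset, _generation, kind = lines[2 + oid - first].split
  raise KeyError, "object #{oid} is marked as free" unless kind == "n"

  # Step 4: read the object found at that offset.
  data.byteslice(offset.to_i, data.bytesize)[/\A\d+ \d+ obj\s*(.*?)\s*endobj/m, 1]
end

sample = build_sample
puts read_object(sample, 2)
```

<p>Note that steps 1 and 2 are repeated on every call here for brevity; a real reader would do them once and keep the parsed table around.</p>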
<p>While this gives you access to any indirect object, the meaning of this indirect object may not be
apparent. This is where the file trailer dictionary comes in: It provides named references to the
most important objects. They in turn reference other objects and so on, building an object graph.
Like the file trailer, the most important parts of a PDF are built with dictionaries, for example
pages, fonts and annotations.</p>
<p>This all is very abstract, so let’s use the <code>hexapdf inspect</code> command to inspect a PDF file and show
the file trailer and some objects. The option <code>-o</code> is used for showing an indirect object and <code>-s</code>
for showing raw or unfiltered stream data:</p>
<asciinema-player src="assets/hexapdf-inspect.json" cols="100" rows="29" speed="2"></asciinema-player>
<h2 id="hexapdf-implementation-of-the-pdf-object-types">HexaPDF Implementation of the PDF Object Types</h2>
<p>Now that you know the basics of the PDF file format, I can move on to describing HexaPDF’s
implementation of it.</p>
<p>First and foremost, <strong>nearly all object types can be and are mapped directly to one of Ruby’s built-in
types</strong>; only stream and indirect objects need custom implementations (see <a href="http://hexapdf.gettalong.org/api/HexaPDF/Stream.html">HexaPDF::Stream</a> and
<a href="http://hexapdf.gettalong.org/api/HexaPDF/Object.html">HexaPDF::Object</a>). On the one hand, this makes working with PDF objects very easy since you can
just use the normal Ruby data structures. And on the other hand it has benefits in regards to memory
usage and execution performance.</p>
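<p>To illustrate how naturally the PDF object types line up with Ruby’s built-in types, here is a minimal serializer sketch. It is not HexaPDF’s serializer: <code>Reference</code> and <code>serialize</code> are names invented for this example, and escaping of special characters in names as well as the formatting of very large or small floats are deliberately left out.</p>

```ruby
# Hypothetical stand-in for a reference to an indirect object ("4 0 R").
Reference = Struct.new(:oid, :gen)

# Turns a plain Ruby value into its PDF syntax (simplified).
def serialize(obj)
  case obj
  when true, false then obj.to_s
  when nil then "null"
  when Integer, Float then obj.to_s
  when Symbol then "/#{obj}"                                   # PDF name
  when String then "(#{obj.gsub(/[()\\]/) { |c| "\\#{c}" }})"  # literal string
  when Array then "[#{obj.map { |item| serialize(item) }.join(' ')}]"
  when Hash
    "<<#{obj.map { |key, value| "#{serialize(key)} #{serialize(value)}" }.join(' ')}>>"
  when Reference then "#{obj.oid} #{obj.gen} R"
  else raise ArgumentError, "cannot serialize #{obj.class}"
  end
end

puts serialize([123, "Test", :Name])
puts serialize({Key: "Value", AnotherKey: 12345})
puts serialize({Type: :Page, Parent: Reference.new(4, 0), Rotate: nil})
```

<p>The first two calls reproduce the array and dictionary examples from the list of object types above.</p>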
<p>Since <strong>the PDF dictionary is the most important type</strong>, there is a wrapper class
<a href="http://hexapdf.gettalong.org/api/HexaPDF/Dictionary.html">HexaPDF::Dictionary</a> which provides convenience methods. For example, accessing a value
automatically dereferences it so that not the reference itself is returned, but the indirect object
it references.</p>
<p>This certainly increases memory usage but allows HexaPDF to do something else, too, namely
<strong>automatic mapping of PDF objects to specific subclasses of HexaPDF::Dictionary</strong>. For example, a
page object is a PDF dictionary and would normally be represented by HexaPDF::Dictionary. However,
since there is a more specific subclass <a href="http://hexapdf.gettalong.org/api/HexaPDF/Type/Page.html">HexaPDF::Type::Page</a> registered for it, this subclass is
used.</p>
<p>Internally, this is made possible by a HexaPDF::Object not actually storing the indirect object’s
data but just a <a href="http://hexapdf.gettalong.org/api/HexaPDF/PDFData.html">HexaPDF::PDFData</a> object that holds everything related to an indirect object. So it
doesn’t matter whether a HexaPDF::Object or a HexaPDF::Type::Page object is used as wrapper as long
as they use the same HexaPDF::PDFData object. Again, this increases memory usage but the gains are
worth it.</p>
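<p>The idea of several wrapper classes sharing one data object can be sketched as follows. The class names (<code>SharedData</code>, <code>GenericObject</code>, <code>GenericDictionary</code>, <code>PageDictionary</code>), the <code>TYPE_MAP</code> registry and the <code>wrap</code> helper are all invented for this illustration and only loosely mirror the HexaPDF classes mentioned above.</p>

```ruby
# Holds everything that belongs to one indirect object (simplified stand-in).
SharedData = Struct.new(:oid, :gen, :value)

# Generic wrapper: it stores no object data itself, only the shared data object.
class GenericObject
  attr_reader :data

  def initialize(data)
    @data = data
  end

  def value
    data.value
  end
end

# Wrapper for dictionaries with convenience accessors.
class GenericDictionary < GenericObject
  def [](key)
    value[key]
  end

  def []=(key, new_value)
    value[key] = new_value
  end
end

# More specific wrapper for page dictionaries.
class PageDictionary < GenericDictionary
  def media_box
    self[:MediaBox]
  end
end

# Hypothetical registry mapping the /Type entry to a wrapper class.
TYPE_MAP = {Page: PageDictionary}.freeze

# Chooses the most specific wrapper class for the given data object.
def wrap(data)
  return GenericObject.new(data) unless data.value.is_a?(Hash)
  TYPE_MAP.fetch(data.value[:Type], GenericDictionary).new(data)
end

data = SharedData.new(1, 0, {Type: :Page, MediaBox: [0, 0, 100, 100]})
generic = GenericDictionary.new(data)
page = wrap(data)
generic[:Rotate] = 90
p page.class
p page[:Rotate]   # the change made through the generic wrapper is visible here
```

<p>Because both wrappers point to the same data object, it does not matter through which one a change is made.</p>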
<p>This mapping is done automatically behind the scenes and can be configured via the global
configuration object (see <a href="http://hexapdf.gettalong.org/api/HexaPDF/index.html#GlobalConfiguration">HexaPDF::GlobalConfiguration</a>).</p>
<p>The PDF format also provides the ability to access a specific indirect object without loading any
other. This feature is used by HexaPDF so that indirect objects are loaded only when they are
accessed, i.e. it provides <strong>lazy loading of indirect objects</strong>. This provides performance and
memory benefits. However, there is currently one suboptimal part in this process: The whole
cross-reference table is loaded after loading a PDF document. This doesn’t matter for small PDF
files but for files with tens of thousands of objects there can be a rather large delay. I intend to
address this problem in the future.</p>
<p>In the context of stream objects, the unfiltered stream data (i.e. after decompression) can amount
to many mebibytes. Therefore the <strong>stream data itself is also lazily loaded</strong>: It is only read
and unfiltered when it is actually needed.</p>
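<p>Lazy loading combined with automatic dereferencing boils down to a cache that is filled on first access. The following sketch shows the principle under simplifying assumptions: <code>LazyDocument</code>, <code>Reference</code> and the loader block are hypothetical, and the cross-reference table is just a hash from object number to file offset.</p>

```ruby
# Hypothetical stand-in for a reference to an indirect object.
Reference = Struct.new(:oid, :gen)

class LazyDocument
  attr_reader :load_count

  # xref maps object numbers to file offsets; the block loads the object at an offset.
  def initialize(xref, &loader)
    @xref = xref
    @loader = loader
    @cache = {}
    @load_count = 0
  end

  # Loads the object on first access and serves it from the cache afterwards.
  def object(oid)
    @cache.fetch(oid) do
      @load_count += 1
      @cache[oid] = @loader.call(@xref.fetch(oid))
    end
  end

  # Returns the referenced object for references and the value itself otherwise.
  def deref(value)
    value.is_a?(Reference) ? object(value.oid) : value
  end
end

doc = LazyDocument.new({1 => 15, 2 => 80}) { |offset| "object read at offset #{offset}" }
puts doc.load_count                    # nothing has been loaded yet
puts doc.deref(Reference.new(2, 0))    # triggers the first and only load of object 2
puts doc.deref(Reference.new(2, 0))    # served from the cache
puts doc.load_count
```

<p>The same pattern applies to stream data: the expensive read and decompression step would sit inside the loader and run only when the data is requested.</p>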
<p>Everything mentioned above allows you to work with a <a href="http://hexapdf.gettalong.org/api/HexaPDF/Document.html">HexaPDF::Document</a> and its objects in a very
straightforward way. As an example, the following code creates a new PDF document and assembles a
page dictionary manually that is then added to the document’s page tree:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">require</span> <span class="s1">'hexapdf'</span>
<span class="n">doc</span> <span class="o">=</span> <span class="no">HexaPDF</span><span class="o">::</span><span class="no">Document</span><span class="p">.</span><span class="nf">new</span>
<span class="n">page</span> <span class="o">=</span> <span class="n">doc</span><span class="p">.</span><span class="nf">add</span><span class="p">(</span><span class="no">Type</span><span class="p">:</span> <span class="ss">:Page</span><span class="p">,</span> <span class="no">MediaBox</span><span class="p">:</span> <span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">100</span><span class="p">,</span> <span class="mi">100</span><span class="p">])</span>
<span class="n">page</span><span class="p">.</span><span class="nf">contents</span> <span class="o">=</span> <span class="s2">"0 0 m 100 100 l S"</span>
<span class="n">doc</span><span class="p">.</span><span class="nf">pages</span> <span class="o"><<</span> <span class="n">page</span>
<span class="n">doc</span><span class="p">.</span><span class="nf">write</span><span class="p">(</span><span class="s2">"sample.pdf"</span><span class="p">)</span>
</code></pre></div></div>
<p>Note that the <code>doc.add(...)</code> call actually returns a page object and not a simple dictionary,
allowing the use of the <code>#contents</code> methods.</p>
<p>One thing to note, though, is that not all special PDF dictionaries have a subclass counterpart in
HexaPDF. There are, among others, subclasses for page objects, the main catalog object and the
trailer. However, this apparent lack doesn’t prevent you from working with these special PDF
dictionaries, it just means that you need to know the various needed keys yourself. For example,
there is currently no subclass for transition dictionaries (see section 12.4.4.1 in the PDF 1.7
specification) but we can still make use of them using plain Ruby objects:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">require</span> <span class="s1">'hexapdf'</span>
<span class="n">doc</span> <span class="o">=</span> <span class="no">HexaPDF</span><span class="o">::</span><span class="no">Document</span><span class="p">.</span><span class="nf">new</span>
<span class="n">doc</span><span class="p">.</span><span class="nf">pages</span><span class="p">.</span><span class="nf">add</span>
<span class="n">second_page</span> <span class="o">=</span> <span class="n">doc</span><span class="p">.</span><span class="nf">pages</span><span class="p">.</span><span class="nf">add</span>
<span class="n">third_page</span> <span class="o">=</span> <span class="n">doc</span><span class="p">.</span><span class="nf">pages</span><span class="p">.</span><span class="nf">add</span>
<span class="n">second_page</span><span class="p">.</span><span class="nf">canvas</span><span class="p">.</span><span class="nf">line_width</span><span class="p">(</span><span class="mi">20</span><span class="p">).</span><span class="nf">stroke_color</span><span class="p">(</span><span class="mi">255</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">).</span><span class="nf">line</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">400</span><span class="p">,</span> <span class="mi">400</span><span class="p">).</span><span class="nf">stroke</span>
<span class="n">third_page</span><span class="p">.</span><span class="nf">canvas</span><span class="p">.</span><span class="nf">line_width</span><span class="p">(</span><span class="mi">20</span><span class="p">).</span><span class="nf">stroke_color</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">255</span><span class="p">,</span> <span class="mi">0</span><span class="p">).</span><span class="nf">line</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">400</span><span class="p">,</span> <span class="mi">400</span><span class="p">,</span> <span class="mi">0</span><span class="p">).</span><span class="nf">stroke</span>
<span class="n">second_page</span><span class="p">[</span><span class="ss">:Trans</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span><span class="no">Type</span><span class="p">:</span> <span class="ss">:Trans</span><span class="p">,</span> <span class="no">S</span><span class="p">:</span> <span class="ss">:Split</span><span class="p">,</span> <span class="no">D</span><span class="p">:</span> <span class="mi">5</span><span class="p">,</span> <span class="no">Dm</span><span class="p">:</span> <span class="ss">:V</span><span class="p">}</span>
<span class="n">third_page</span><span class="p">[</span><span class="ss">:Trans</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span><span class="no">Type</span><span class="p">:</span> <span class="ss">:Trans</span><span class="p">,</span> <span class="no">S</span><span class="p">:</span> <span class="ss">:Blinds</span><span class="p">,</span> <span class="no">Dm</span><span class="p">:</span> <span class="ss">:H</span><span class="p">}</span>
<span class="n">doc</span><span class="p">.</span><span class="nf">write</span><span class="p">(</span><span class="s2">"sample.pdf"</span><span class="p">)</span>
</code></pre></div></div>
<p>Open the resulting PDF file, switch to presentation mode and move to the second and third pages.
Your viewing application, if it is compatible, will show you transitions between the pages.</p>
<h2 id="conclusion">Conclusion</h2>
<p>This post introduced the PDF object system and how it is implemented in HexaPDF. As you have seen
HexaPDF provides a very Ruby-like interface for working with the PDF object system while still
trying to be as memory efficient and high-performance as possible.</p>
<p>In a future post I will show you how HexaPDF’s implementation of stream filters works and why Ruby’s
Fiber objects are essential for it.</p>
HexaPDF Performance Benchmarkhttps://gettalong.org/blog/2016/hexapdf-performance-benchmark.html2016-11-06T14:09:50+01:002016-11-06T13:22:00+01:00
<p>My pure Ruby PDF library <a href="http://hexapdf.gettalong.org">HexaPDF</a> contains an application for working with PDFs. In this post I
look at how this application performs in comparison to other such applications.</p>
<h2 id="about-pdf-optimization">About PDF Optimization</h2>
<p>Although the <code>hexapdf</code> application can perform various commands, for example displaying information
about a PDF file, modifying a PDF file or extracting files from a PDF file, I will concentrate on
the command to modify a PDF file.</p>
<p>One of the ways to use this command is to optimize a PDF file in terms of its file size. This
involves reading and writing the PDF file and performing the optimization. Sometimes the word
“optimization” is used when a PDF file is linearized for faster display on web sites. Here I always
mean file size optimization.</p>
<p>There are various ways to optimize the file size of a PDF file and they can be divided into two
groups: lossless and lossy operations. Since all used applications perform only lossless
optimizations, I only look at those:</p>
<dl>
<dt>Removing unused and deleted objects</dt>
<dd>
<p>A PDF file can store multiple revisions of an object but only the last one is used. So all other
versions can safely be deleted.</p>
</dd>
<dt>Using object and cross-reference streams</dt>
<dd>
<p>A PDF file can be thought of as a collection of random-access objects that are stored sequentially
in an ASCII-based format. Object streams take those objects and store them compressed in a binary
format. And cross-reference streams store the file offsets to the objects in a compressed manner,
instead of the standard ASCII-based format.</p>
</dd>
<dt>Recompressing page content streams</dt>
<dd>
<p>The content of a PDF page is described in an ASCII-based format. Some PDF producers don’t optimize
their output, which can lead to bigger than necessary content streams, or they don’t store the
content streams in a compressed format.</p>
</dd>
</dl>
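<p>Why storing many small ASCII objects in a compressed stream pays off can be shown with Ruby’s standard <code>zlib</code> library. This is only a rough sketch with made-up sample objects: a real object stream additionally contains a header with object numbers and offsets and is itself wrapped in a stream object.</p>

```ruby
require 'zlib'

# 200 small dictionary objects in the ASCII syntax (made-up sample data).
objects = (1..200).map { |i| "<</Type /Annot /Rect [0 0 #{i} #{i}]>>" }

# Uncompressed: every object is stored as "N 0 obj ... endobj".
plain = objects.each_with_index.map { |obj, i| "#{i + 1} 0 obj\n#{obj}\nendobj\n" }.join

# Object-stream-like: the object bodies are concatenated and deflated together.
packed = Zlib::Deflate.deflate(objects.join("\n"))

puts "plain:  #{plain.bytesize} bytes"
puts "packed: #{packed.bytesize} bytes"
```

<p>The more repetitive the objects are, the bigger the gain, which is why files with many small non-stream objects benefit most.</p>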
<p>There are some more techniques for reducing the file size like font subsetting/merging/deduplication
or object and image deduplication. However, those are rather advanced and not implemented in most
PDF libraries because it is hard to get them right.</p>
<h2 id="benchmark-setup">Benchmark Setup</h2>
<p>There are many applications that can perform some or all of the optimizations mentioned above. Since
I’m working on Linux I will use applications that are readily available on this platform and which
are command line applications.</p>
<p>Since the abilities of the applications vary, the following table lists the keys used to describe the
various operations:</p>
<table>
<thead>
<tr>
<th>Key</th>
<th>Operation</th>
</tr>
</thead>
<tbody>
<tr>
<td>C</td>
<td>Compacting by removing unused and deleted objects</td>
</tr>
<tr>
<td>S</td>
<td>Usage of object and cross-reference streams</td>
</tr>
<tr>
<td>P</td>
<td>Recompression of page content streams</td>
</tr>
</tbody>
</table>
<p>The list of the benchmarked applications:</p>
<dl>
<dt><strong>hexapdf</strong></dt>
<dd>
<p>Homepage: <a href="http://hexapdf.gettalong.org">http://hexapdf.gettalong.org</a><br />
Version: Latest version in <a href="https://github.com/gettalong/hexapdf">GitHub repository</a><br />
Abilities: Any combination of C, S and P</p>
<p><code>hexapdf</code> is used multiple times, with increasing level of compression, with the following
commands:</p>
<table>
<tbody>
<tr>
<td>hexapdf</td>
<td><code>hexapdf modify -f input.pdf --no-compact output.pdf</code></td>
</tr>
<tr>
<td>hexapdf C</td>
<td><code>hexapdf modify -f input.pdf --compact output.pdf</code></td>
</tr>
<tr>
<td>hexapdf CS</td>
<td><code>hexapdf modify -f input.pdf --compact --object-streams generate output.pdf</code></td>
</tr>
<tr>
<td>hexapdf CSP</td>
<td><code>hexapdf modify -f input.pdf --compact --object-streams generate --compress-pages output.pdf</code></td>
</tr>
</tbody>
</table>
</dd>
<dt><strong>pdftk</strong></dt>
<dd>
<p>Homepage: <a href="https://www.pdflabs.com/tools/pdftk-the-pdf-toolkit/">https://www.pdflabs.com/tools/pdftk-the-pdf-toolkit/</a><br />
Version: 2.02<br />
Abilities: C</p>
<p><code>pdftk</code> is probably the best known application because, like <code>hexapdf</code>, it allows for many
different operations on PDFs. It is based on the Java iText library which has been compiled to
native code using GCJ.</p>
<p>The application doesn’t have options for optimizing a PDF file but it can be assumed that it
removes unused and deleted objects.</p>
<p>It is used in the benchmark like this:</p>
<table>
<tbody>
<tr>
<td>pdftk C</td>
<td><code>pdftk input.pdf output output.pdf</code></td>
</tr>
</tbody>
</table>
</dd>
<dt><strong>qpdf</strong></dt>
<dd>
<p>Homepage: <a href="http://qpdf.sourceforge.net/">http://qpdf.sourceforge.net/</a><br />
Version: 6.0.0<br />
Abilities: C, CS</p>
<p>QPDF is a command line application for transforming PDF files. It is written in C++ and used like
this:</p>
<table>
<tbody>
<tr>
<td>qpdf C</td>
<td><code>qpdf input.pdf output.pdf</code></td>
</tr>
<tr>
<td>qpdf CS</td>
<td><code>qpdf --object-streams=generate input.pdf output.pdf</code></td>
</tr>
</tbody>
</table>
</dd>
<dt><strong>smpdf</strong></dt>
<dd>Homepage: <a href="http://www.coherentpdf.com/compression.html">http://www.coherentpdf.com/compression.html</a><br />
Version: 1.4.1<br />
Abilities: CSP
<p>This is a commercial application but can be used for evaluation purposes. The application is
probably written in OCaml since it uses the <a href="http://www.github.com/johnwhitington/camlpdf">CamlPDF library</a>.</p>
<p>There is no way to configure the operations done but judging from its output it seems it does all
of the lossless operations:</p>
<table>
<tbody>
<tr>
<td>smpdf CSP</td>
<td><code>smpdf input.pdf -o output.pdf</code></td>
</tr>
</tbody>
</table>
</dd>
</dl>
<p>Apart from <code>hexapdf</code> all other applications are native binaries, compiled to machine code.</p>
<p>The files used in the benchmark vary in file size and internal structure:</p>
<table>
<thead>
<tr>
<th>Name</th>
<th style="text-align: right">Size (bytes)</th>
<th style="text-align: right">Objects</th>
<th style="text-align: right">Pages</th>
<th>Details</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>a.pdf</strong></td>
<td style="text-align: right">53,056</td>
<td style="text-align: right">36</td>
<td style="text-align: right">4</td>
<td>created by Prawn</td>
</tr>
<tr>
<td><strong>b.pdf</strong></td>
<td style="text-align: right">11,520,218</td>
<td style="text-align: right">4,161</td>
<td style="text-align: right">439</td>
<td>many non-stream objects</td>
</tr>
<tr>
<td><strong>c.pdf</strong></td>
<td style="text-align: right">14,399,980</td>
<td style="text-align: right">5,263</td>
<td style="text-align: right">620</td>
<td>linearized, many streams</td>
</tr>
<tr>
<td><strong>d.pdf</strong></td>
<td style="text-align: right">8,107,348</td>
<td style="text-align: right">34,513</td>
<td style="text-align: right">20</td>
<td> </td>
</tr>
<tr>
<td><strong>e.pdf</strong></td>
<td style="text-align: right">21,788,087</td>
<td style="text-align: right">2,296</td>
<td style="text-align: right">52</td>
<td>huge content streams, many pictures, object streams, encrypted with default password</td>
</tr>
<tr>
<td><strong>f.pdf</strong></td>
<td style="text-align: right">154,752,614</td>
<td style="text-align: right">287,977</td>
<td style="text-align: right">28,365</td>
<td><em>very</em> big file</td>
</tr>
</tbody>
</table>
<p>The benchmark script is a simple Bash script that uses standard Linux CLI tools for measuring the
execution time and memory usage:</p>
<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">#!/bin/bash</span>
<span class="nv">OUT_FILE</span><span class="o">=</span>/tmp/bench-result.pdf
<span class="nb">trap exit </span>2
<span class="k">function </span>bench_file<span class="o">()</span> <span class="o">{</span>
<span class="nv">cmdname</span><span class="o">=</span><span class="nv">$1</span>
<span class="nv">FORMAT</span><span class="o">=</span><span class="s2">"| %-20s | %'6ims | %'7iKiB | %'11i |</span><span class="se">\n</span><span class="s2">"</span>
<span class="nb">shift
time</span><span class="o">=</span><span class="si">$(</span><span class="nb">date</span> +%s%N<span class="si">)</span>
/usr/bin/time <span class="nt">-f</span> <span class="s1">'%M'</span> <span class="nt">-o</span> /tmp/bench-times <span class="s2">"</span><span class="nv">$@</span><span class="s2">"</span> &>/dev/null
<span class="k">if</span> <span class="o">[</span> <span class="nv">$?</span> <span class="nt">-ne</span> 0 <span class="o">]</span><span class="p">;</span> <span class="k">then
</span><span class="nv">cmdname</span><span class="o">=</span><span class="s2">"ERR </span><span class="k">${</span><span class="nv">cmdname</span><span class="k">}</span><span class="s2">"</span>
<span class="nb">time</span><span class="o">=</span>0
<span class="nv">mem_usage</span><span class="o">=</span>0
<span class="nv">file_size</span><span class="o">=</span>0
<span class="k">else
</span><span class="nb">time</span><span class="o">=</span><span class="k">$((</span> <span class="o">(</span><span class="si">$(</span><span class="nb">date</span> +%s%N<span class="si">)</span><span class="o">-</span><span class="nb">time</span><span class="o">)/</span><span class="m">1000000</span> <span class="k">))</span>
<span class="nv">mem_usage</span><span class="o">=</span><span class="si">$(</span><span class="nb">cat</span> /tmp/bench-times<span class="si">)</span>
<span class="nv">file_size</span><span class="o">=</span><span class="si">$(</span><span class="nb">stat</span> <span class="nt">-c</span> <span class="s1">'%s'</span> <span class="nv">$OUT_FILE</span><span class="si">)</span>
<span class="k">fi
</span><span class="nb">printf</span> <span class="s2">"</span><span class="nv">$FORMAT</span><span class="s2">"</span> <span class="s2">"</span><span class="nv">$cmdname</span><span class="s2">"</span> <span class="s2">"</span><span class="nv">$time</span><span class="s2">"</span> <span class="s2">"</span><span class="nv">$mem_usage</span><span class="s2">"</span> <span class="s2">"</span><span class="nv">$file_size</span><span class="s2">"</span>
<span class="o">}</span>
<span class="nb">cd</span> <span class="si">$(</span><span class="nb">dirname</span> <span class="nv">$0</span><span class="si">)</span>
<span class="nv">FILES</span><span class="o">=(</span><span class="k">*</span>.pdf<span class="o">)</span>
<span class="k">if</span> <span class="o">[</span> <span class="nv">$# </span><span class="nt">-ne</span> 0 <span class="o">]</span><span class="p">;</span> <span class="k">then </span><span class="nv">FILES</span><span class="o">=(</span><span class="s2">"</span><span class="nv">$@</span><span class="s2">"</span><span class="o">)</span><span class="p">;</span> <span class="k">fi
for </span>file <span class="k">in</span> <span class="s2">"</span><span class="k">${</span><span class="nv">FILES</span><span class="p">[@]</span><span class="k">}</span><span class="s2">"</span><span class="p">;</span> <span class="k">do
</span><span class="nv">file_size</span><span class="o">=</span><span class="si">$(</span><span class="nb">printf</span> <span class="s2">"%'i"</span> <span class="si">$(</span><span class="nb">stat</span> <span class="nt">-c</span> <span class="s1">'%s'</span> <span class="s2">"</span><span class="nv">$file</span><span class="s2">"</span><span class="si">))</span>
<span class="nb">echo</span> <span class="s2">"|------------------------------------------------------------|"</span>
<span class="nb">printf</span> <span class="s2">"| %-20s | Time | Memory | File size |</span><span class="se">\n</span><span class="s2">"</span> <span class="s2">"</span><span class="nv">$file</span><span class="s2"> (</span><span class="nv">$file_size</span><span class="s2">)"</span>
<span class="nb">echo</span> <span class="s2">"|------------------------------------------------------------|"</span>
bench_file <span class="s2">"hexapdf "</span> ruby <span class="nt">-I</span>../lib ../bin/hexapdf modify <span class="nt">-f</span> <span class="s2">"</span><span class="k">${</span><span class="nv">file</span><span class="k">}</span><span class="s2">"</span> <span class="nt">--no-compact</span> <span class="k">${</span><span class="nv">OUT_FILE</span><span class="k">}</span>
bench_file <span class="s2">"hexapdf C"</span> ruby <span class="nt">-I</span>../lib ../bin/hexapdf modify <span class="nt">-f</span> <span class="s2">"</span><span class="k">${</span><span class="nv">file</span><span class="k">}</span><span class="s2">"</span> <span class="nt">--compact</span> <span class="k">${</span><span class="nv">OUT_FILE</span><span class="k">}</span>
bench_file <span class="s2">"hexapdf CS"</span> ruby <span class="nt">-I</span>../lib ../bin/hexapdf modify <span class="nt">-f</span> <span class="s2">"</span><span class="k">${</span><span class="nv">file</span><span class="k">}</span><span class="s2">"</span> <span class="nt">--compact</span> <span class="nt">--object-streams</span> generate <span class="k">${</span><span class="nv">OUT_FILE</span><span class="k">}</span>
bench_file <span class="s2">"hexapdf CSP"</span> ruby <span class="nt">-I</span>../lib ../bin/hexapdf modify <span class="nt">-f</span> <span class="s2">"</span><span class="k">${</span><span class="nv">file</span><span class="k">}</span><span class="s2">"</span> <span class="nt">--compact</span> <span class="nt">--object-streams</span> generate <span class="nt">--compress-pages</span> <span class="k">${</span><span class="nv">OUT_FILE</span><span class="k">}</span>
bench_file <span class="s2">"pdftk C"</span> pdftk <span class="s2">"</span><span class="k">${</span><span class="nv">file</span><span class="k">}</span><span class="s2">"</span> output <span class="k">${</span><span class="nv">OUT_FILE</span><span class="k">}</span>
bench_file <span class="s2">"qpdf C"</span> qpdf <span class="s2">"</span><span class="k">${</span><span class="nv">file</span><span class="k">}</span><span class="s2">"</span> <span class="k">${</span><span class="nv">OUT_FILE</span><span class="k">}</span>
bench_file <span class="s2">"qpdf CS"</span> qpdf <span class="s2">"</span><span class="k">${</span><span class="nv">file</span><span class="k">}</span><span class="s2">"</span> <span class="nt">--object-streams</span><span class="o">=</span>generate <span class="k">${</span><span class="nv">OUT_FILE</span><span class="k">}</span>
bench_file <span class="s2">"smpdf CSP"</span> smpdf <span class="s2">"</span><span class="k">${</span><span class="nv">file</span><span class="k">}</span><span class="s2">"</span> <span class="nt">-o</span> <span class="k">${</span><span class="nv">OUT_FILE</span><span class="k">}</span>
<span class="nb">echo</span> <span class="s2">"|------------------------------------------------------------|"</span>
<span class="nb">echo
</span><span class="k">done</span>
</code></pre></div></div>
<h2 id="benchmark-results">Benchmark Results</h2>
<p>Here are the results of running the benchmark script on all the PDF files. You will find comments on
the results afterwards:</p>
<pre><code>|------------------------------------------------------------|
| a.pdf (53,056) | Time | Memory | File size |
|------------------------------------------------------------|
| hexapdf | 146ms | 16,012KiB | 52,338 |
| hexapdf C | 181ms | 16,332KiB | 52,315 |
| hexapdf CS | 192ms | 16,992KiB | 49,181 |
| hexapdf CSP | 200ms | 17,872KiB | 48,251 |
| pdftk C | 55ms | 53,384KiB | 53,144 |
| qpdf C | 12ms | 4,568KiB | 53,179 |
| qpdf CS | 15ms | 4,652KiB | 49,287 |
| smpdf CSP | 20ms | 8,136KiB | 48,329 |
|------------------------------------------------------------|
|------------------------------------------------------------|
| b.pdf (11,520,218) | Time | Memory | File size |
|------------------------------------------------------------|
| hexapdf | 1,012ms | 32,736KiB | 11,464,892 |
| hexapdf C | 1,062ms | 35,212KiB | 11,415,701 |
| hexapdf CS | 1,091ms | 37,472KiB | 11,101,448 |
| hexapdf CSP | 9,570ms | 86,920KiB | 11,085,158 |
| pdftk C | 466ms | 68,532KiB | 11,501,669 |
| qpdf C | 581ms | 11,840KiB | 11,500,308 |
| qpdf CS | 700ms | 11,948KiB | 11,124,779 |
| smpdf CSP | 3,379ms | 51,648KiB | 11,092,428 |
|------------------------------------------------------------|
|------------------------------------------------------------|
| c.pdf (14,399,980) | Time | Memory | File size |
|------------------------------------------------------------|
| hexapdf | 2,207ms | 46,624KiB | 14,519,207 |
| hexapdf C | 2,225ms | 46,640KiB | 14,349,008 |
| hexapdf CS | 2,396ms | 55,180KiB | 13,185,262 |
| hexapdf CSP | 10,543ms | 106,448KiB | 13,111,094 |
| pdftk C | 1,625ms | 100,648KiB | 14,439,611 |
| qpdf C | 1,730ms | 34,840KiB | 14,432,647 |
| qpdf CS | 2,103ms | 35,348KiB | 13,228,102 |
| smpdf CSP | 3,117ms | 76,440KiB | 13,076,598 |
|------------------------------------------------------------|
|------------------------------------------------------------|
| d.pdf (8,107,348) | Time | Memory | File size |
|------------------------------------------------------------|
| hexapdf | 6,115ms | 110,760KiB | 7,774,817 |
| hexapdf C | 5,825ms | 92,952KiB | 7,036,577 |
| hexapdf CS | 6,352ms | 86,860KiB | 6,539,334 |
| hexapdf CSP | 6,499ms | 96,264KiB | 5,599,758 |
| pdftk C | 2,232ms | 102,276KiB | 7,279,035 |
| qpdf C | 3,153ms | 40,568KiB | 7,209,305 |
| qpdf CS | 3,197ms | 40,360KiB | 6,703,374 |
| smpdf CSP | 2,922ms | 80,288KiB | 5,528,352 |
|------------------------------------------------------------|
|------------------------------------------------------------|
| e.pdf (21,788,087) | Time | Memory | File size |
|------------------------------------------------------------|
| hexapdf | 882ms | 52,736KiB | 21,784,732 |
| hexapdf C | 1,172ms | 99,924KiB | 21,850,715 |
| hexapdf CS | 1,139ms | 101,952KiB | 21,769,651 |
| hexapdf CSP | 36,015ms | 201,240KiB | 21,195,877 |
| pdftk C | 694ms | 122,920KiB | 21,874,883 |
| qpdf C | 1,391ms | 64,144KiB | 21,802,439 |
| qpdf CS | 1,443ms | 64,588KiB | 21,787,558 |
| smpdf CSP | 38,209ms | 646,888KiB | 21,188,516 |
|------------------------------------------------------------|
|------------------------------------------------------------|
| f.pdf (154,752,614) | Time | Memory | File size |
|------------------------------------------------------------|
| hexapdf | 60,135ms | 575,172KiB | 154,077,468 |
| hexapdf C | 65,187ms | 580,344KiB | 153,946,077 |
| hexapdf CS | 71,495ms | 715,720KiB | 117,642,988 |
| ERR hexapdf CSP | 0ms | 0KiB | 0 |
| pdftk C | 30,563ms | 682,044KiB | 157,850,354 |
| qpdf C | 36,736ms | 485,060KiB | 157,723,936 |
| qpdf CS | 41,945ms | 487,516KiB | 118,114,521 |
| ERR smpdf CSP | 0ms | 0KiB | 0 |
|------------------------------------------------------------|
</code></pre>
<p>Some comments:</p>
<ul>
<li>
<p><strong>HexaPDF in CSP mode produces the smallest PDF in three cases and is second in the other three</strong>
cases where <code>smpdf</code> is the best compressor. However, since the differences in file size are
marginal, HexaPDF and <code>smpdf</code> can be considered equal.</p>
</li>
<li>
<p>When page compression is activated, HexaPDF is much slower but this is expected since each content
stream has to be parsed and serialized.</p>
</li>
<li>
<p><code>pdftk</code> is the fastest application except for the a.pdf file. And in all cases but a.pdf <strong>HexaPDF
in CS mode is only up to three times slower</strong>. This is rather good considering HexaPDF is written
in Ruby while all the other applications are compiled binaries.</p>
<p>The benchmark for a.pdf is a bit of an outlier because startup time greatly affects the result in
HexaPDF’s case.</p>
</li>
<li>
<p><strong>Looking at the memory usage, HexaPDF also fares quite well compared to C++ based <code>qpdf</code></strong>. And
in CS mode it uses less memory than <code>pdftk</code> in all cases except f.pdf!</p>
</li>
<li>
<p>There is one case where HexaPDF uses the least amount of memory, namely for e.pdf without any
special operations done. The reason for this is that HexaPDF applies stream filters only when
necessary.</p>
<p>For e.pdf this means that HexaPDF doesn’t decrypt and re-encrypt any stream unless necessary, while
all the other applications seem to do so, leading to higher memory usage.</p>
</li>
</ul>
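<p>To make the “up to three times slower” statement easy to check, here is a small Ruby sketch that
computes the slowdown of HexaPDF in CS mode relative to <code>pdftk</code>. The timings are copied from the
table above; the constant and method names are only illustrative and not part of the benchmark script.</p>

```ruby
# Times in milliseconds, copied from the results table above.
# a.pdf is left out because startup time dominates its result.
TIMES_MS = {
  "b.pdf" => { hexapdf_cs: 1_091,  pdftk: 466 },
  "c.pdf" => { hexapdf_cs: 2_396,  pdftk: 1_625 },
  "d.pdf" => { hexapdf_cs: 6_352,  pdftk: 2_232 },
  "e.pdf" => { hexapdf_cs: 1_139,  pdftk: 694 },
  "f.pdf" => { hexapdf_cs: 71_495, pdftk: 30_563 },
}.freeze

# Returns how many times slower HexaPDF in CS mode is than pdftk, per file.
def slowdown_factors(times)
  times.transform_values { |t| (t[:hexapdf_cs].to_f / t[:pdftk]).round(2) }
end

slowdown_factors(TIMES_MS).each do |file, factor|
  puts format("%-6s %.2fx", file, factor)
end
```

<p>The largest factor belongs to d.pdf and stays below three, which matches the comment above.</p>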
<h2 id="summary">Summary</h2>
<p>The HexaPDF library is already quite optimized in terms of performance and memory usage. It is only
up to three times slower than solutions in compiled languages and doesn’t use much memory. There are
still some things that I think can make HexaPDF perform better and I will look into them in the
future.</p>
<p>Also, compared to other Ruby solutions like <a href="https://github.com/gdelugre/origami">origami</a> or <a href="https://github.com/boazsegev/combine_pdf">combine_pdf</a>, HexaPDF uses less memory and
is always at least two times faster. So I think that HexaPDF is currently the way to go if one needs
to work with PDFs in Ruby.</p>
HexaPDF Code and First Version Released https://gettalong.org/blog/2016/hexapdf-code-and-first-version-released.html 2016-10-26T12:31:07+02:00 2016-10-26T12:31:00+02:00
<p>As of today the <a href="http://hexapdf.gettalong.org">HexaPDF</a> source code is available on <a href="https://github.com/gettalong/hexapdf">GitHub</a> and the first version has been
<a href="https://rubygems.org/gems/hexapdf">released</a>.</p>
<p>HexaPDF is a pure Ruby library with an accompanying application for working with PDF files. In
short, it allows</p>
<ul>
<li><strong>creating</strong> new PDF files,</li>
<li><strong>manipulating</strong> existing PDF files,</li>
<li><strong>merging</strong> multiple PDF files into one,</li>
<li><strong>extracting</strong> meta information, text, images and files from PDF files,</li>
<li><strong>securing</strong> PDF files by encrypting them and</li>
<li><strong>optimizing</strong> PDF files for smaller file size or other criteria.</li>
</ul>
<p>HexaPDF was designed with ease of use and performance in mind. It uses lazy loading and lazy
computing when possible and tries to produce small PDF files by default.</p>
<p>The initial version provides the basic functionalities for reading and writing PDFs as well as
low-level support for creating PDFs. You can find some code examples on how to use the library on
the <a href="http://hexapdf.gettalong.org">HexaPDF website</a>.</p>
<p>The command line application currently provides commands for</p>
<ul>
<li>getting general information about a PDF file,</li>
<li>extracting files from a PDF file,</li>
<li>modifying a PDF file (file size optimization, decryption, encryption, page selection and page
rotation),</li>
<li>and inspecting the internal objects and streams of a PDF file.</li>
</ul>
<p>In the future the application will provide the full range of standard PDF operations. This means,
for example, that commands for merging and splitting will be added.</p>
On HexaPDF and Releasing Early https://gettalong.org/blog/2016/on-hexapdf-and-releasing-early.html 2016-10-14T20:01:24+02:00 2016-10-14T20:01:00+02:00
<p>After my <a href="http://talks.gettalong.org/euruko2016">lightning talk</a> about HexaPDF at <a href="http://euruko2016.org/">Euruko 2016</a> I was asked why I haven’t released
the source code to HexaPDF yet. I have my reasons and in this post I will shed some light on them.</p>
<h2 id="why-releasing-early-is-good">Why Releasing Early is Good</h2>
<p>Many open source developers put their code online at very early stages of development in the hope
that others are interested and contribute in one way or another. For example, this was done by Linus
Torvalds when he announced the Linux kernel on a mailing list and many people got interested very
fast.</p>
<p>It totally makes sense to do this in many cases, for example, when you are not sure about the
overall design of your library or application and need input from others, or when you are basically
finished with the “core” and need input for more features.</p>
<p>Releasing early often brings a new set of ideas to the table on how things could be done because
there isn’t so much code and changing it is easier.</p>
<p>Whether people are interested in your code depends, naturally, on what the code does but if you put
the code on GitHub or a similar service, announce it on appropriate websites, forums, mailing
lists, etc., you will nearly always get some response and discussions.</p>
<h2 id="except-when-it-is-not">… Except When it is Not</h2>
<p>So how could this be a bad thing?</p>
<p>If you release code at very early stages where many things already work but documentation is sparse
and some things are still broken, you will inevitably find yourself answering questions on how to do
certain things, responding to issues, commenting on pull requests…</p>
<p>These tasks, while valuable to your project, take time away from actually developing your project.
And what if you don’t have much time for your project in the first place?</p>
<p>You may suddenly find yourself in a situation where your project demands much more attention than
you are able to give it which may lead to frustration on your side but will also frustrate people
interested in the project.</p>
<h2 id="so-what-about-hexapdf">So What About HexaPDF?</h2>
<p>A couple of years ago I needed to convert kramdown documents to PDF. First I used <a href="http://wkhtmltopdf.org/">wkhtmltopdf</a> and
then <a href="http://prawnpdf.org">Prawn</a> for this task. However, while implementing the Prawn solution I found that Prawn lacked
some features I would have liked to have.</p>
<p>After first investigating whether patching Prawn would do the trick, I decided to look at the PDF
specification to see what it would take to implement a PDF library from scratch. And so the idea for
HexaPDF was born about three years ago.</p>
<p>Since I didn’t have the time to start development, I read the PDF specification and looked at many
existing PDF libraries to see what features they had and how the libraries were designed. This gave
me many ideas on how I wanted to design HexaPDF.</p>
<p>About a year later, in September 2014, I started implementing HexaPDF. Since I knew the order in
which I wanted to implement the various parts of the library and for most parts how I wanted to
implement them, there was no need to release the code early to get feedback. So that bonus of
releasing early wasn’t really a bonus for me.</p>
<p>Another reason why I didn’t release the code was that I did all the development in my spare time.
This meant for me that I could code whenever I wanted (or had time) and I didn’t need to concern
myself with what other people wanted or expected from me or the code. There were times in the last
two years when I didn’t write a single line of code for months and just pondered the design.</p>
<p>I don’t know yet whether HexaPDF will garner much interest in the Ruby community but since there are
really only two libraries for working with PDFs, <a href="http://prawnpdf.org">Prawn</a> and <a href="https://github.com/yob/pdf-reader">pdf-reader</a> (neither of which implements
anywhere near all aspects of the PDF specification), I guess it will.</p>
<p>Lastly, I hadn’t decided on a license for HexaPDF and releasing the code without a license doesn’t
make much sense.</p>
<p>So here they are, my reasons for not releasing the code to HexaPDF at an early stage. I know that
many probably won’t agree with my reasons but holding the code back gave me the time and space to
really enjoy working on HexaPDF and the challenges that came with it.</p>
<p>One question still remains: When will HexaPDF be released? The code itself will be released soon,
but the <strong><a href="http://hexapdf.gettalong.org">HexaPDF website</a></strong> containing example scripts and their resulting PDF output as well as
the API documentation is available <em>now</em>.</p>
Static Websites with webgen https://gettalong.org/blog/2016/static-websites-with-webgen.html 2016-10-13T22:32:17+02:00 2016-10-13T22:30:00+02:00
<p>The static website generator <a href="http://webgen.gettalong.org">webgen</a> has been in development for over a decade now. It provides the
essential functionalities out of the box and is easy to use, even for non-programmers. Designed to
be a general-purpose static website generator, it can be used for any kind of website, not just
blogs.</p>
<p>In this post I will show you how to create a basic website with webgen.</p>
<h2 id="first-steps">First Steps</h2>
<p>As with most applications written in Ruby, webgen is just a <code>gem install webgen</code> away. After the
installation, the <code>webgen</code> binary is used for everything, from creating the needed website structure
to generating the output files (see <code>webgen help</code> for all available commands).</p>
<p>When webgen is invoked in a directory that is not a valid webgen website, it tells you so:</p>
<pre><code>$ webgen
INFO Generating website...
INFO No active source paths found - maybe not a webgen website?
INFO ... done in 0.02 seconds
</code></pre>
<p>So we need to create the website directory first. However, before we do that we install an extension
bundle that provides some pre-built templates:</p>
<pre><code>$ webgen install templates
Installed webgen-templates-bundle
</code></pre>
<p>Now we create the website and populate it with the “andreas07” template:</p>
<pre><code>$ webgen create website --template andreas07 demo -v
INFO [create] </>
INFO [create] </ext/>
INFO [create] </ext/init.rb>
INFO [create] </src/>
INFO [create] </src/andreas07.css>
INFO [create] </src/default.template>
INFO [create] </src/images/>
INFO [create] </src/images/bodybg.gif>
INFO [create] </src/images/sidebarbg.gif>
INFO [create] </webgen.config>
Created a new webgen website in <demo> using the 'andreas07' template
</code></pre>
<p>By using the <code>-v</code> option webgen provides more output, in this case showing us the created files. We
see that inside the <code>demo/</code> directory a <code>src/</code> directory was created with some files and the files
<code>ext/init.rb</code> as well as <code>webgen.config</code>.</p>
<h2 id="website-structure">Website Structure</h2>
<p>The most important file for webgen is <code>webgen.config</code> because the existence of this file means that
the directory contains a webgen website. This file is used for setting configuration values, like
the main language or base URL of the website. To see the full list of available configuration
options, use <code>webgen show config</code> or the <a href="http://webgen.gettalong.org/documentation/reference/configuration_options.html">configuration reference</a> on the website.</p>
<p>By default webgen expects all source files to be in the <code>src/</code> directory and puts the generated
files into the <code>out/</code> directory. All temporary files are put into the <code>tmp/</code> directory (which can
safely be deleted).</p>
<p>The <code>ext/init.rb</code> file is for extending the functionality of webgen by writing Ruby code. It is not
needed in many cases and can be deleted.</p>
<h2 id="generating-the-website-and-adding-content">Generating the Website and Adding Content</h2>
<p>Now that we have created the basic website structure we can use webgen to generate the output files:</p>
<pre><code>$ cd demo
$ webgen
INFO Generating website...
INFO [create] </>
INFO [create] </andreas07.css>
INFO [create] </images/>
INFO [create] </images/bodybg.gif>
INFO [create] </images/sidebarbg.gif>
INFO ... done in 0.02 seconds
$ webgen
INFO Generating website...
INFO Nothing has changed since the last invocation!
INFO ... done in 0.02 seconds
</code></pre>
<p>As you can see webgen supports <strong>partial website generation</strong>, i.e. it only generates those files
that have changed since the last invocation.</p>
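<p>To illustrate the general idea of partial generation, here is a minimal Ruby sketch that detects
changed source paths by comparing content digests between runs. This is an assumption made for
illustration only; webgen’s actual change tracking is not shown in this post and may work
differently. The method <code>changed_paths</code> and the in-memory cache are hypothetical.</p>

```ruby
require "digest"

# Hypothetical helper, not part of webgen: returns the paths whose content
# digest differs from the digest recorded during the previous run.
def changed_paths(sources, cache)
  sources.select do |path, content|
    digest = Digest::SHA256.hexdigest(content)
    changed = cache[path] != digest
    cache[path] = digest # remember the digest for the next run
    changed
  end.keys
end

cache = {}
sources = { "/index.page" => "# My Homepage", "/andreas07.css" => "body {}" }

p changed_paths(sources, cache) # first run: every path is new
p changed_paths(sources, cache) # second run: nothing has changed

sources["/index.page"] = "# My New Homepage"
p changed_paths(sources, cache) # only the edited page is reported
```

<p>A real generator would also have to persist the cache between invocations and track which output
files depend on which source files.</p>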
<p>You might have noticed that no HTML file was created. We will remedy the situation by creating a
page file. <strong>Page files</strong> are those files of a webgen website that get transformed into HTML files.
So we create the <code>src/index.page</code> file with some content and rerun webgen:</p>
<pre><code>$ cat src/index.page
---
title: Homepage
in_menu: true
---
# My Homepage
This is the first page of my homepage!
$ webgen
INFO Generating website...
INFO [update] </>
INFO [create] </index.html>
INFO ... done in 0.08 seconds
</code></pre>
<p>This file consists of two parts:</p>
<ul>
<li>
<p>The first part is the <strong>meta information section</strong> where information about the page is defined. We
defined the title of the page and an additional meta information called <code>in_menu</code>.</p>
</li>
<li>
<p>The second part, after the second line containing <code>---</code>, is the content of the page file.</p>
</li>
</ul>
<p>All page files always have at least one content section but may have more, see the <a href="http://webgen.gettalong.org/documentation/reference/webgen_page_format.html">Webgen Page
Format</a> page for details.</p>
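<p>The two-part structure can be sketched in a few lines of Ruby. This is a simplified illustration
and not webgen’s own parser: the hypothetical <code>parse_page</code> method only handles a page file with
one meta information section and a single content section.</p>

```ruby
require "yaml"

# Hypothetical helper, not webgen's own parser: splits a page file with a
# single content section into its meta information and its content.
def parse_page(text)
  match = text.match(/\A---[ \t]*\n(.*?)\n---[ \t]*\n(.*)\z/m)
  return [{}, text] unless match # no meta information section found
  [YAML.safe_load(match[1]) || {}, match[2]]
end

page = <<~PAGE
  ---
  title: Homepage
  in_menu: true
  ---
  # My Homepage
  This is the first page of my homepage!
PAGE

meta, content = parse_page(page)
p meta
puts content
```

<p>The meta information ends up as a plain Ruby hash, so a value like <code>in_menu</code> can later be
queried when a menu is built.</p>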
<p>If you open the resulting <code>out/index.html</code> file in a browser you will see the following:</p>
<p class="image main"><img src="assets/webgen-website-initial.png" alt="initial website page" /></p>
<p>What we can see is that the page’s content was transformed into HTML (see the middle part) and
embedded into a template. This is the main purpose of every static website generator: Provide means
for using other markup languages like Markdown to ease writing the content, and a templating system
to avoid duplication of the main HTML markup that defines the layout and look of the site.</p>
<p>So where is that template? It can be found in the source directory as the file
<code>src/default.template</code>. Template files follow the same format as page files but most template files
don’t use the meta information section.</p>
<p>If you look into this template file you will find that it contains a basic HTML structure with some
additional statements that don’t look like HTML:</p>
<ul>
<li>
<p>Line 10 looks like this:</p>
<div class="language-html highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nt"><link</span> <span class="na">rel=</span><span class="s">"stylesheet"</span> <span class="na">type=</span><span class="s">"text/css"</span> <span class="na">href=</span><span class="s">"{relocatable: andreas07.css}"</span> <span class="na">media=</span><span class="s">"screen,projection"</span> <span class="nt">/></span>
</code></pre></div> </div>
<p>This would be a standard HTML <code><link /></code> tag except for the value of the “href” attribute. The
content is actually what is called a <strong>webgen tag</strong>, part of a system for adding dynamic content
without programming.</p>
<p>Each tag has a name, in this case “relocatable”, and may contain parameters, in this case
“andreas07.css”. The <a href="http://webgen.gettalong.org/documentation/reference/extensions/tag/relocatable.html">“relocatable” tag</a> looks up its parameter in the tree of files
webgen knows about and creates a relative path to that file. This allows us to preview the
generated HTML file without a webserver because all generated links are relative.</p>
</li>
<li>
<p>Another webgen tag is found in line 19:</p>
<div class="language-html highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{menu: {options: {sort: true, mi: {in_menu: true}, absolute_levels: 1}}}
</code></pre></div> </div>
<p>This tag creates an <a href="http://webgen.gettalong.org/documentation/reference/extensions/tag/menu.html">HTML menu</a> by filtering the files webgen knows about using the given
options. In this case each menu entry needs to have the meta information “in_menu” set to “true”
(remember: we added that meta information to <code>src/index.page</code>) and it must be on level one in the
directory hierarchy.</p>
</li>
<li>
<p>The last line we look at is line 37:</p>
<div class="language-html highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nt"><webgen:block</span> <span class="na">name=</span><span class="s">"content"</span> <span class="nt">/></span>
</code></pre></div> </div>
<p>This definitely does look more like an XML tag than an HTML tag. It is a <a href="http://webgen.gettalong.org/documentation/reference/extensions/content_processor/blocks.html">special tag</a>
that webgen recognizes and it means that the content should be placed here.</p>
</li>
</ul>
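<p>The relative paths that the “relocatable” tag produces can be illustrated with Ruby’s standard
<code>Pathname</code> class. This is only a sketch of the idea, not the tag’s implementation; the
<code>relative_link</code> method and the second example path are made up for illustration.</p>

```ruby
require "pathname"

# Hypothetical helper: relative link from the page at `from` to the file at
# `to`, where both are absolute paths inside the generated website.
def relative_link(from, to)
  Pathname.new(to).relative_path_from(Pathname.new(from).dirname).to_s
end

puts relative_link("/index.html", "/andreas07.css")
puts relative_link("/blog/2016/post.html", "/andreas07.css")
```

<p>Because the link is computed relative to the directory of the referencing page, the same stylesheet
resolves correctly no matter how deep the page sits in the hierarchy.</p>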
<p>If a page file doesn’t specify a special template file using the <a href="http://webgen.gettalong.org/documentation/reference/meta_information_keys.html#template">“template” meta information
key</a>, the default template is used. And templates themselves can even use other templates
which is useful, for example, to embed a post into a special post template that itself is
embedded into the main template.</p>
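<p>The nesting of templates can be pictured as repeatedly substituting the block tag. The following
Ruby sketch assumes that each template contains exactly one
<code><webgen:block name="content" /></code> tag; the <code>render_chain</code> method and the two tiny
templates are hypothetical and much simpler than what webgen really does.</p>

```ruby
BLOCK_TAG = '<webgen:block name="content" />'

# Hypothetical helper: embeds `content` into each template in turn, from the
# innermost template to the outermost one.
def render_chain(content, templates)
  templates.reduce(content) do |inner, template|
    template.sub(BLOCK_TAG) { inner } # block form avoids backreference handling
  end
end

post_template = "<article>#{BLOCK_TAG}</article>"
main_template = "<html><body>#{BLOCK_TAG}</body></html>"

puts render_chain("<p>Hello</p>", [post_template, main_template])
```

<p>The post content is first wrapped by the post template and the result is then placed into the main
template.</p>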
<p>And this is all you need for a basic website! Just add some more page files, fill them with content
and you are good to go!</p>
<h2 id="conclusion">Conclusion</h2>
<p>By knowing about the <a href="http://webgen.gettalong.org/documentation/reference/configuration_options.html">configuration file <code>webgen.config</code></a>, what <a href="http://webgen.gettalong.org/documentation/reference/extensions/path_handler/page.html">page</a> and
<a href="http://webgen.gettalong.org/documentation/reference/extensions/path_handler/template.html">template</a> files are and for what they are used, as well as the <a href="http://webgen.gettalong.org/documentation/reference/extensions/content_processor/tags.html">webgen tag
system</a> you can create a basic website without knowing a programming language.</p>
<p>If this short introduction roused your interest in webgen, have a look at the <a href="http://webgen.gettalong.org/documentation/">webgen
documentation</a> to see what else is possible with webgen.</p>
<p>You can also have a look at existing websites that are done with webgen to get inspiration:</p>
<ul>
<li>
<p>Since the only file for a webgen website that has to exist is the configuration file, a webgen
website can easily be maintained next to the source code of a project. See the <a href="http://cmdparse.gettalong.org">website of
cmdparse</a> and the corresponding <a href="https://github.com/gettalong/cmdparse/blob/master/webgen.config">webgen.config file</a> as an example.</p>
</li>
<li>
<p>webgen can also be used for more complex websites like the <a href="https://github.com/gettalong/webgen-website">website of webgen</a>
itself. This website uses some extensions in the <code>ext/</code> directory but most things are handled
through built-in features, e.g. the <a href="http://webgen.gettalong.org/documentation/reference/extensions/path_handler/sitemap.html">sitemap</a>, <a href="http://webgen.gettalong.org/documentation/reference/extensions/path_handler/feed.html">RSS feeds</a> and the <a href="http://webgen.gettalong.org/documentation/reference/extensions/path_handler/api.html">automatic API documentation
generation</a>.</p>
</li>
<li>
<p><a href="https://github.com/gettalong/gettalong.org">This website</a> is also done with webgen and shows that simple blogs can be done with
webgen out of the box.</p>
</li>
</ul>
A Blog at Last https://gettalong.org/blog/2016/a-blog-at-last.html 2016-09-03T22:39:00+02:00 2016-09-03T22:39:00+02:00
<p>I never really started blogging all these years although I developed a <a href="../../projects/webgen.html">static website
generator</a> and a <a href="../../projects/kramdown.html">Markdown converter</a> ages ago. Don’t know why really… probably thought
that I wouldn’t find the time to actually post something.</p>
<p>But now is always a better time to start than never, isn’t it? :-)</p>
<p><strong>What will I blog about?</strong> Since I’m programming in my spare time I will write about programming
and software engineering, especially in the context of the programming language Ruby. And probably
about problem solving in general.</p>
<p><strong>Why about Ruby?</strong> It is the language I use nearly on a daily basis and therefore know most
intimately. My open source projects use Ruby and can provide examples of real world code for
discussion.</p>
<p>I already have some ideas for posts about HexaPDF, my newest Ruby project, a full-featured PDF
reader and writer.</p>