Copy docs site from gh-pages branch into master branch
Willem van Bergen committed Oct 24, 2020
1 parent 4f0f3db commit f73e03f
Showing 7 changed files with 373 additions and 22 deletions.
10 changes: 10 additions & 0 deletions Gemfile
@@ -10,3 +10,13 @@ end
platform :rbx do
  gem "rubysl"
end

group :jekyll do
  gem "jekyll", "~> 3.3"
  gem "kramdown-parser-gfm"
end

group :jekyll_plugins do
  gem "jekyll-commonmark"
  gem "jekyll-theme-cayman"
end
3 changes: 3 additions & 0 deletions docs/.gitignore
@@ -0,0 +1,3 @@
.sass-cache/
.jekyll-metadata
_site/
10 changes: 9 additions & 1 deletion docs/_config.yml
@@ -1 +1,9 @@
theme: jekyll-theme-cayman
theme: jekyll-theme-cayman

title: ChunkyPNG
email: willem@vanbergen.org
description: Read/write access to PNG images in pure Ruby.
url: "https://www.chunkypng.com"

highlighter: rouge
show_downloads: true
136 changes: 136 additions & 0 deletions docs/_posts/2010-01-14-memory-efficiency-when-using-ruby.md
@@ -0,0 +1,136 @@
---
author: Willem van Bergen
title: Memory efficiency when using Ruby
---

I have been spending some time creating [a pure Ruby PNG library](https://github.com/wvanbergen/chunky_png). For this library, I need to have some representation of the image, which is composed of RGB pixels, supporting an alpha channel. Because images can be composed of a lot of pixels, I want the implementation to be as memory efficient as possible. I also would like decent performance.

A very naive Ruby implementation for an image represents the red, green, blue and alpha channel using a floating point number between 0.0 and 1.0, and might look something like this:

{% highlight ruby %}
class Pixel
  attr_reader :r, :g, :b, :a

  def initialize(r, g, b, a = 1.0)
    @r, @g, @b, @a = r, g, b, a
  end
end

class Image
  attr_reader :width, :height

  def initialize(width, height)
    @width, @height = width, height
    @pixels = Array.new(width * height)
  end

  def [](x, y)
    @pixels[y * width + x]
  end

  def []=(x, y, pixel)
    @pixels[y * width + x] = pixel
  end
end
{% endhighlight %}

For a 10×10 image, this representation requires 4 times 100 floating point numbers, which require 8 bytes each. That’s already over 3kB for such a small image just for the floating point numbers! Ouch.

A simple improvement is to decide that 8-bit color depth is enough in this case, so that each channel can be represented by an integer between 0 and 255. Storing such a number only costs one byte of memory. Ruby’s Fixnum class typically uses 4-byte integers. If only the 4 channels of one byte each could be combined into a single Fixnum instance… Behold!

{% highlight ruby %}
class Pixel
  attr_reader :value
  alias :to_i :value

  def initialize(value)
    @value = value
  end

  def self.rgba(r, g, b, a = 255)
    self.new(r << 24 | g << 16 | b << 8 | a)
  end

  def r; (@value & 0xff000000) >> 24; end
  def g; (@value & 0x00ff0000) >> 16; end
  def b; (@value & 0x0000ff00) >> 8; end
  def a; (@value & 0x000000ff); end
end
{% endhighlight %}

Notice the bit operations, which are extremely fast. This only requires 100 times 4 bytes = 400 bytes for storing the RGBA values for a 10×10 image, an 8 times improvement!
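As a quick sanity check of the bit layout, the sketch below packs four channel values into one integer and reads them back. The helpers `pack_rgba` and `unpack_rgba` are hypothetical names introduced for this illustration; they only mirror what `Pixel.rgba` and the channel readers do.

```ruby
# Hypothetical helpers for illustration; they mirror Pixel.rgba and the r/g/b/a readers.
def pack_rgba(r, g, b, a = 255)
  r << 24 | g << 16 | b << 8 | a
end

def unpack_rgba(value)
  [(value >> 24) & 0xff, (value >> 16) & 0xff, (value >> 8) & 0xff, value & 0xff]
end

orange = pack_rgba(255, 128, 0)
puts format("0x%08x", orange) # prints 0xff8000ff
p unpack_rgba(orange)         # prints [255, 128, 0, 255]
```

Each channel occupies its own byte, so reading a channel back is a shift and a mask, with no floating point math involved.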

This implementation wraps every pixel inside an object. This is nice, because I want to access the separate channels of every pixel easily using the r, g, b, and a methods, and every other method that is defined for every pixel. However, a Ruby object instance has an overhead of at least 20 bytes. That’s 20 times 100, or about 2kB, for our 10×10 image!

To get rid of the object overhead, it is possible to simply store the Fixnum value for every pixel, and only wrap it inside a Pixel object when it is accessed. This can be done by modifying the Image class:

{% highlight ruby %}
class Image
  # ...

  def [](x, y)
    Pixel.new(@pixels[y * width + x]) # wrap
  end

  def []=(x, y, pixel)
    @pixels[y * width + x] = pixel.to_i # unwrap
  end
end
{% endhighlight %}

As you can see, some simple changes in the representation can really make a difference in the memory usage. Can this representation be improved further?
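Put together, the wrap-on-read and unwrap-on-write idea might look like the following self-contained sketch. The default fill value of 0 and the `raw` accessor are assumptions made for this example only, so that the internal storage can be inspected.

```ruby
# Compact copy of the integer-backed Pixel wrapper described above.
class Pixel
  attr_reader :value
  alias to_i value

  def initialize(value)
    @value = value
  end

  def self.rgba(r, g, b, a = 255)
    new(r << 24 | g << 16 | b << 8 | a)
  end

  def r; (@value >> 24) & 0xff; end
  def a; @value & 0xff; end
end

class Image
  attr_reader :width, :height

  # Assumption: unset pixels default to 0 (fully transparent black).
  def initialize(width, height)
    @width, @height = width, height
    @pixels = Array.new(width * height, 0)
  end

  def [](x, y)
    Pixel.new(@pixels[y * width + x]) # wrap on read
  end

  def []=(x, y, pixel)
    @pixels[y * width + x] = pixel.to_i # unwrap on write
  end

  # Hypothetical helper, only here to inspect the internal storage.
  def raw
    @pixels
  end
end

image = Image.new(10, 10)
image[3, 4] = Pixel.rgba(255, 0, 0)
p image[3, 4].r                           # prints 255
p image.raw.all? { |v| v.is_a?(Integer) } # prints true
```

The array only ever holds plain integers; a Pixel object exists just for as long as the caller holds on to it.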

## Integer math calculations

Because we are now using integers to represent a pixel, this can cause problems when the math requires you to use floating point numbers. For example, the formula for [alpha composition](https://en.wikipedia.org/wiki/Alpha_compositing) of two pixels is as follows:

\\[ C_o = C_a \alpha_a + C_b \alpha_b (1 - \alpha_a) \\]

in which \\(C_a\\) is the color component of the foreground pixel, \\(\alpha_a\\) the alpha channel of the foreground pixel, \\(C_b\\) and \\(\alpha_b\\) the same values for the background pixel, all of which should be values between 0 and 1.
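
As a small worked example with made-up values, take a half-transparent foreground channel \\(C_a = 1.0\\) with \\(\alpha_a = 0.5\\) over an opaque background channel \\(C_b = 0.2\\) with \\(\alpha_b = 1.0\\):

\\[ C_o = 1.0 \cdot 0.5 + 0.2 \cdot 1.0 \cdot (1 - 0.5) = 0.5 + 0.1 = 0.6 \\]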

A naive implementation could convert the integer numbers to their floating point equivalents:

{% highlight ruby %}
def compose(fg, bg)
  return bg if fg.a == 0
  return fg if fg.a == 255

  fg_alpha = fg.a / 255.0
  bg_alpha = bg.a / 255.0 # background alpha, not the foreground's
  alpha_complement = (1.0 - fg_alpha) * bg_alpha

  new_r = (fg_alpha * fg.r + alpha_complement * bg.r).round
  new_g = (fg_alpha * fg.g + alpha_complement * bg.g).round
  new_b = (fg_alpha * fg.b + alpha_complement * bg.b).round
  new_a = ((fg_alpha + alpha_complement) * 255).round

  Pixel.rgba(new_r, new_g, new_b, new_a)
end
{% endhighlight %}

This implementation is already a little bit optimized: no unnecessary conversions and calculations are being performed. However, this composition can be done a lot quicker after realizing that 255 is almost a power of two. Computers excel at powers of two, because they can use bitwise operators and shifting for some calculations.

My new approach uses a quicker implementation of multiplication of 8-bit integers that represent floating point numbers between 0 and 1:

{% highlight ruby %}
def compose(fg, bg)
  return bg if fg.a == 0
  return fg if fg.a == 255

  alpha_complement = multiply(255 - fg.a, bg.a)
  new_r = multiply(fg.a, fg.r) + multiply(alpha_complement, bg.r)
  new_g = multiply(fg.a, fg.g) + multiply(alpha_complement, bg.g)
  new_b = multiply(fg.a, fg.b) + multiply(alpha_complement, bg.b)
  new_a = fg.a + alpha_complement

  Pixel.rgba(new_r, new_g, new_b, new_a)
end

# Quicker alternative for (a * b / 255.0).round
def multiply(a, b)
  t = a * b + 0x80
  ((t >> 8) + t) >> 8
end
{% endhighlight %}

Note that the new implementation is less precise in theory, but this precision is lost anyway because we have to convert the values back to 8 bit RGBA values. Your thoughts?
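
The shortcut works because 1/255 equals 1/256 times (1 + 1/256 + 1/256² + …), so dividing by 255 is close to dividing by 256 and adding a small correction, while the added 0x80 takes care of rounding. The brute-force check below is a sketch for illustration: it compares `multiply` against the floating point expression for every pair of 8-bit inputs. Any remaining difference between the two versions of `compose` then comes from rounding each term separately instead of rounding the sum once.

```ruby
# Integer approximation of (a * b / 255.0).round, as defined in the post.
def multiply(a, b)
  t = a * b + 0x80
  ((t >> 8) + t) >> 8
end

# Compare against the floating point version for every pair of 8-bit inputs.
mismatches = 0
(0..255).each do |a|
  (0..255).each do |b|
    mismatches += 1 unless multiply(a, b) == (a * b / 255.0).round
  end
end

puts "mismatches: #{mismatches}" # prints mismatches: 0
```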
82 changes: 82 additions & 0 deletions docs/_posts/2010-01-17-ode-to-array-pack-and-string-unpack.md
@@ -0,0 +1,82 @@
---
author: Willem van Bergen
title: Ode to Array#pack and String#unpack
---

Remember [my last post]({% post_url 2010-01-14-memory-efficiency-when-using-ruby %}), where I represented a pixel with a Fixnum, storing the R, G, B and A value in its 4 bytes of memory? Well, I have been working some more on [my PNG library](https://github.com/wvanbergen/chunky_png) and I am now working on loading and saving an image.

Using the [PNG specification](https://www.w3.org/TR/PNG/), building a PNG encoder/decoder isn’t that hard, but the required algorithmic calculations make sure that performance in Ruby is less than stellar. I have rewritten all calculations to only use fast integer math (plus, minus, multiply and bitwise operators), but simply the amount of code that is getting executed is slowing Ruby down. What more can I do to improve the performance?

## Encoding RGBA images

Optimizing loading images is very hard, because PNG images can have many variations, and taking shortcuts means that some images are no longer supported. Not so with saving images: as long as an image is saved using one of the valid variations, every PNG decoder will be able to read the file. Let’s see if it is possible to optimize one of these encoding variations.

During encoding, the image gets split up into scanlines (rows) of pixels, which in turn get converted into bytes. These bytes can be filtered for optimal compression. For a 3×3 8-bit RGBA image, the result looks like this:

F Rf Gf Bf Af Rf Gf Bf Af Rf Gf Bf Af
F Rf Gf Bf Af Rf Gf Bf Af Rf Gf Bf Af
F Rf Gf Bf Af Rf Gf Bf Af Rf Gf Bf Af

Every line starts with a byte F indicating the filter method, followed by the filtered R, G, B and A values for every pixel on that line. Now, if we choose filter method 0, which means no filtering, the result looks like this:

0 Ro Go Bo Ao Ro Go Bo Ao Ro Go Bo Ao
0 Ro Go Bo Ao Ro Go Bo Ao Ro Go Bo Ao
0 Ro Go Bo Ao Ro Go Bo Ao Ro Go Bo Ao

Now, the original R, G, B and A bytes from each pixel’s Fixnum occur in [big-endian or network byte order](https://en.wikipedia.org/wiki/Endianness), starting with the top left pixel, moving left to right and then top to bottom. Exactly like the pixels are stored in our image’s pixel array! This means that we can use the Array#pack method to encode into this format. The Array#pack notation for one 3-pixel line is "xN3", in which x gets translated into a null byte, and every N into a 4-byte integer in network byte order. For optimal performance, it is best not to split the original array into lines, but to pack the complete pixel array at once. So, we can encode all pixels with this command:

{% highlight ruby %}
pixeldata = pixels.pack("xN#{width}" * height)
{% endhighlight %}

This way, splitting the image into lines, splitting the pixels into bytes, and filtering the bytes can all be skipped. In Ruby 1.8.7, this means a speedup of over 1500% (no typo)! Of course, because no filtering is applied, the subsequent compression is not optimal, but that is a tradeoff that I am willing to make.
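
To make the byte layout concrete, here is a small sketch that packs a 2×2 image and prints the resulting scanlines. The pixel values are arbitrary examples chosen so that every byte is easy to recognize.

```ruby
# Four pixels of a 2x2 image as 32-bit RGBA integers (arbitrary example values).
width, height = 2, 2
pixels = [0x11223344, 0x55667788, 0x99aabbcc, 0xddeeff00]

pixeldata = pixels.pack("xN#{width}" * height)

# Each scanline is one filter byte (0) followed by width * 4 channel bytes.
pixeldata.bytes.each_slice(1 + width * 4) do |line|
  puts line.map { |byte| format("%02x", byte) }.join(" ")
end
# prints:
# 00 11 22 33 44 55 66 77 88
# 00 99 aa bb cc dd ee ff 00
```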

## Encoding RGB images

What about RGB images without alpha channel? We can simply choose to encode these using the RGBA method, but that makes the pixel data roughly a third larger (4 bytes per pixel instead of 3). Can we fix this somehow?

The unfiltered pixel data should look something like this:

0 Ro Go Bo Ro Go Bo Ro Go Bo
0 Ro Go Bo Ro Go Bo Ro Go Bo
0 Ro Go Bo Ro Go Bo Ro Go Bo

This means that for every pixel that is encoded as a 4-byte integer, the last byte should be ditched. Luckily, the `Array#pack` method offers a modifier that does just that: `X`. Packing a 3 pixel line can be done with `"xNXNXNX"`. Again we would like to pack the whole pixel array at once:

{% highlight ruby %}
pixeldata = pixels.pack(("x" + ('NX' * width)) * height)
{% endhighlight %}

Because all the encoding steps can get skipped once again, the speed improvement is again 1500%! And the result is 25% smaller than the RGBA method. This method is actually so speedy, that saving an image using Ruby 1.9.1 is only a little bit slower (< 10%) than saving a PNG image using RMagick! See my [performance comparison](https://github.com/wvanbergen/chunky_png/wiki/performance-comparison).
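
The effect of the `X` modifier is easiest to see side by side. This sketch packs the same two example pixels once as RGBA and once as RGB; the pixel values are arbitrary.

```ruby
width, height = 2, 1
pixels = [0x11223344, 0x55667788] # the alpha bytes are 0x44 and 0x88

rgba = pixels.pack("xN#{width}" * height)
rgb  = pixels.pack(("x" + ("NX" * width)) * height)

# Helper lambda for readable output.
hex = ->(data) { data.bytes.map { |byte| format("%02x", byte) }.join(" ") }

puts hex.call(rgba) # prints 00 11 22 33 44 55 66 77 88
puts hex.call(rgb)  # prints 00 11 22 33 55 66 77
```

Every `X` backs up one byte, so the alpha byte written by the preceding `N` is dropped from the output.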

## Loading images

Given the good results of the Array#pack method, its counterpart String#unpack looks promising for speedy image loading, if you know the image’s size and the encoding format beforehand.

An RGBA formatted stream can be loaded quickly with this command:

{% highlight ruby %}
pixels = rgba_pixeldata.unpack("N#{width * height}")
image = Image.new(width, height, pixels)
{% endhighlight %}

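For an RGB formatted stream, we can use the X modifier again, but we have to make sure to set the alpha value for every pixel to 255. Because every N reads 4 bytes, the stream also needs one extra padding byte; otherwise the last pixel cannot be read and comes back as nil:

{% highlight ruby %}
# Pad with one byte so the final N directive has 4 bytes to read.
pixels = (rgb_pixeldata + "\0").unpack("NX" * (width * height))
pixels.map! { |pixel| pixel | 0x000000ff }
image = Image.new(width, height, pixels)
{% endhighlight %}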

You can even use little-endian integers to load streams in ABGR format!

{% highlight ruby %}
pixels = abgr_pixeldata.unpack("V#{width * height}")
image = Image.new(width, height, pixels)
{% endhighlight %}

Loading pixel data for an image like this is again over 1500% faster than decoding the same PNG image. However, this can only be applied if you have control over the input format of the image.
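
The three loading variants can be checked with a small round trip. This is a sketch with two arbitrary opaque pixels; the streams are generated with `Array#pack` here, whereas in practice they would come from a file or another program. Since every `N` consumes four bytes, the sketch pads the RGB stream with one byte so that the last pixel can be read.

```ruby
width, height = 2, 1
pixels = [0x112233ff, 0x556677ff] # opaque pixels, so RGB loses no information

# RGBA stream: one N per pixel.
rgba_stream = pixels.pack("N*")
rgba_pixels = rgba_stream.unpack("N#{width * height}")
p rgba_pixels == pixels # prints true

# RGB stream: pad one byte for the last N, then force alpha to 255.
rgb_stream = pixels.pack("NX" * pixels.size)
rgb_pixels = (rgb_stream + "\0").unpack("NX" * (width * height))
rgb_pixels.map! { |pixel| pixel | 0x000000ff }
p rgb_pixels == pixels # prints true

# ABGR stream: the same bytes in reverse order per pixel, read little-endian.
abgr_stream = pixels.pack("V*")
abgr_pixels = abgr_stream.unpack("V#{width * height}")
p abgr_pixels == pixels # prints true
```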

## To conclude

`Array#pack` and `String#unpack` really have increased the performance of my code. If you can apply them in your project, don’t hesitate and spread the love! For all other cases, use as little code as possible, and upgrade to Ruby 1.9 for improved algorithmic performance.
61 changes: 61 additions & 0 deletions docs/_posts/2014-11-07-the-value-of-a-pure-ruby-library.md
@@ -0,0 +1,61 @@
---
author: Willem van Bergen
title: The value of a pure Ruby library
---

In late 2009, my employer at the time &mdash; [Floorplanner](https://www.floorplanner.com) &mdash; was struggling with memory leaks in [RMagick](https://www.imagemagick.org/RMagick/doc/), a Ruby wrapper around the image manipulation library [ImageMagick](https://www.imagemagick.org/). Because we only needed a small subset of RMagick's functionality, I decided to write a simple library so we could get rid of RMagick. Not much later, [ChunkyPNG was born](https://github.com/wvanbergen/chunky_png/commit/aa8a9378eedfc02aa1d0d1e05c313badc76594a7).

Even though ChunkyPNG has grown in scope and complexity to cover the entire PNG standard, it still is a "pure Ruby" library: all of the code is Ruby, and it doesn't have any dependencies besides Ruby itself. Initially, this was purely for practical reasons: I knew Ruby wasn't the fastest language in the world, but I had no idea how to write Ruby C extensions. Performance was not an important concern for the problem at hand, and maybe RMagick being a C extension was the cause of its memory leaks? By writing pure Ruby, I could get results faster and let the Ruby interpreter do the hard work of managing memory for me. <sup>[1]</sup>

### Performance becomes important

Mostly as a learning project, I ended up implementing the entire PNG standard. This made the library suitable for a broader set of problems, and more people started using it. Performance then became more important. I put a decent effort into optimizing the memory efficiency by [optimizing storing pixels in memory]({% post_url 2010-01-14-memory-efficiency-when-using-ruby %}), and I boosted performance by [short-circuiting the PNG encoding routine using Array#pack]({% post_url 2010-01-17-ode-to-array-pack-and-string-unpack %}).

Even though these efforts resulted in sizable improvements, it became clear that there are limits on how far you can push performance in Ruby. The fact that I am implementing a library that by nature requires a lot of memory and computation is not going to change.

So what are the options? I could recommend RMagick to people asking for more performance, but that is not going to happen after all my ImageMagick bashing. <sup>[2]</sup> In the end, I had to roll up my sleeves and program some C.

### Being pure Ruby is a feature

To tackle the performance issue, I had the options of either implementing the C extension as part of ChunkyPNG, or building a separate library. <sup>[3]</sup> My initial gut feeling was to add a C extension to ChunkyPNG to give everyone a free performance boost. However, I soon discovered many people were using the library *because* it was pure Ruby. For me, it was a pragmatic implementation detail; for them, it was a feature.

Including a C extension would require everybody that wants to install ChunkyPNG to have a compiler toolchain installed. For me, installing a compiler toolchain is the first thing I do when I get a new machine. This is true for many Ruby developers, but it turns out that many of the library users are not Ruby developers at all. [Compass](http://compass-style.org/), a popular CSS authoring framework, uses ChunkyPNG to generate sprite images. Most Compass users are front-end developers who primarily use HTML, CSS and JavaScript, and not Ruby. Because OS X comes with Ruby and Rubygems installed, running `gem install compass` works out of the box. Telling them to install a C compiler toolchain is simply an unacceptable installation requirement.

There are a couple of additional advantages of being a pure Ruby library. As an open source project ChunkyPNG can attract more contributors, because only a small percentage of Ruby developers are well-versed in C. Moreover, C extensions are MRI specific. This means that many C extensions won't work on Rubinius or JRuby, and I wanted my library to work in these environments as well. <sup>[4]</sup> Finally, libraries that require a C compiler inevitably get a lot of bug reports or support requests from people who are having issues installing the library, because of differences in development environments. <sup>[5]</sup>

### OilyPNG: a mixin library

So instead of adding a C extension, I started working on a separate library: [OilyPNG](https://github.com/wvanbergen/oily_png). Rather than making this a standalone library, I designed it to be a mixin module that depends on ChunkyPNG.

The approach is simple: OilyPNG consists of modules that implement some of the methods of ChunkyPNG in C. When OilyPNG is loaded with `require 'oily_png'`, it first loads ChunkyPNG and uses `Module#include` and `Module#extend` to [overwrite some methods in ChunkyPNG with OilyPNG's faster implementation](https://github.com/wvanbergen/oily_png/blob/master/lib/oily_png.rb).
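
The mechanism can be sketched in plain Ruby. All module and class names below are hypothetical stand-ins, and the "fast" method is ordinary Ruby here, whereas the real companion library defines its methods in C.

```ruby
# Stand-in for the pure Ruby library (all names here are hypothetical).
module SlowPNG
  module Encoding
    def encode
      "encoded by pure Ruby"
    end
  end

  class Canvas
    include Encoding # the method lives in a module, not in the class body
  end
end

# Stand-in for the native companion; a real one would define this method in C.
module FastPNG
  module Encoding
    def encode
      "encoded by the fast path"
    end
  end
end

before = SlowPNG::Canvas.new.encode

# A module included later sits in front of earlier ones in the ancestor chain,
# so its methods win during method lookup.
SlowPNG::Canvas.send(:include, FastPNG::Encoding)

after = SlowPNG::Canvas.new.encode
puts before # prints: encoded by pure Ruby
puts after  # prints: encoded by the fast path
```

Note that this only works when the original method is defined in a mixin module. A method defined directly in the class body would take precedence over any included module.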

This approach allows us to keep ChunkyPNG pure Ruby, and make OilyPNG 100% API compatible with ChunkyPNG. It is even possible to make OilyPNG optional in your project:

{% highlight ruby %}
begin
  require 'oily_png'
rescue LoadError
  require 'chunky_png'
end
{% endhighlight %}

This approach has some other advantages as well. Instead of having to implement everything at once to get to a library that implements most of ChunkyPNG, we can do this step by step while always providing 100% functional parity. Profile ChunkyPNG to find a slow method, implement it in OilyPNG, and iterate. This way OilyPNG doesn't suffer from a bootstrapping problem of having to implement a minimum viable subset of ChunkyPNG right from the start. It can grow organically, one optimized method at a time.

And because we have a well tested, pure Ruby implementation available to which OilyPNG is supposed to be 100% compatible, testing OilyPNG is simple. We just call a method on ChunkyPNG, run the exact same call on an OilyPNG-enhanced ChunkyPNG, and compare the results.
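
Such a parity check could look like the sketch below. The two modules and the `parity_failures` helper are hypothetical and only illustrate the idea of replaying identical calls against a reference and an optimized implementation; this is not the actual test suite.

```ruby
# Hypothetical reference implementation: straightforward but slower.
module Reference
  def self.rgba(r, g, b, a)
    [r, g, b, a].pack("C4").unpack("N").first
  end
end

# Hypothetical optimized implementation of the same method.
module Optimized
  def self.rgba(r, g, b, a)
    r << 24 | g << 16 | b << 8 | a
  end
end

# Replays every call on both implementations and returns the calls that disagree.
def parity_failures(reference, candidate, calls)
  calls.reject do |method_name, *args|
    reference.public_send(method_name, *args) == candidate.public_send(method_name, *args)
  end
end

calls = [[:rgba, 255, 128, 0, 255], [:rgba, 0, 0, 0, 0], [:rgba, 1, 2, 3, 4]]
failures = parity_failures(Reference, Optimized, calls)
puts failures.empty? ? "parity OK" : "mismatch for #{failures.inspect}"
```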

### To conclude

Being pure Ruby can be an important feature of a library for many of its users. Don't give it up too easily, even though Ruby's lacking performance may be an issue. Using a hybrid approach of a pure Ruby library with a native companion library is a great way to have the best of both worlds. <sup>[6]</sup>

---------------------------------------

#### Footnotes

1. This is also why I avoided using the [png gem](https://github.com/seattlerb/png), an "almost-pure-ruby" library that was available at the time. It uses [inline C](https://github.com/seattlerb/rubyinline) to speed up some of the algorithms.
2. Disclaimer: I should note that I haven't used ImageMagick and RMagick since 2010. So my knowledge about the current state of these libraries is extremely outdated at this point.
3. I could have leveraged the work of [libpng](http://www.libpng.org/pub/png/libpng.html) instead of implementing the algorithms myself. I decided not to, because libpng's API doesn't lend itself very well for the cherry-picking of hotspots approach I took with OilyPNG. You basically have to go all in if you want to use libpng. I think a Ruby PNG library that simply wraps libpng still has potential, but because of the reasons outlined in this article, I will leave that as an exercise to the reader. :)
4. Rubinius has since implemented most of MRI's C API, so you can compile many C extensions against Rubinius as well, including OilyPNG. As an interesting side note: the Rubinius and JRuby developers have used ChunkyPNG as a performance benchmarking tool, because it contains a non-trivial amount of code and is computation heavy.
5. Unfortunately, OilyPNG is [not an exception](https://github.com/wvanbergen/oily_png/issues/12) to this rule.
6. My current employer &mdash; [Shopify](https://www.shopify.com) &mdash; is using the same approach for [Liquid](https://shopify.github.io/liquid/) and its C companion library [liquid-c](https://github.com/Shopify/liquid-c) with great success, even though this requires matching Liquid's parsing behavior in certain edge cases, quirk by quirk, in the C implementation.

Thanks to Simon Hørup Eskildsen, Emilie Noël, and Steven H. Noble for reviewing drafts of this post.