Monday, May 05, 2014

Build system performance: Shake vs Ninja

Summary: Ninja is a build system focused on being fast. Some limited benchmarking suggests the Shake build system might be faster.

The Shake build system aims to be both powerful and fast. The Ninja build system describes itself as small with a focus on speed. Now that Shake can run Ninja build files, I benchmarked Shake against Ninja.

The Test

I have been benchmarking Shake vs Ninja as part of the Shake continuous integration tests. My benchmark builds the Ninja source code using their Ninja build file, once from scratch with 3 threads (a full build), then builds again to ensure it is up to date (a zero build). The test is run with both Ninja and Shake, always immediately after a Ninja build (to ensure all files are cached). The average times over 71 runs are:

  • Full build: Ninja takes 6.552s, Shake takes 5.985s. Shake is 0.567s faster.
  • Zero build: Ninja takes 0.007s, Shake takes 0.012s. Ninja is 0.005s faster.

These tests are run on Linux, on a Travis machine. Both the hardware and loading of the machine is likely to vary over time. I deliberately picked a lower level of parallelism to try and ensure the build was not limited by running too many processes (it does not seem to be). It is now a test failure if Shake is slower for the full build, or if Shake is more than 0.1s slower for the zero build.

A more interesting test would be building something more substantial than Ninja - but choosing a benchmark is hard, and I am limited by the amount of Travis time I can use. It is not clear if Shake will be consistently N seconds faster than Ninja, or N% faster than Ninja, or if this result is an aberration due to the particular choice of benchmark. Shake does not implement the Ninja feature of rebuilding when the command line changes - adding that feature would be unlikely to have any impact on the full build but may slightly slow down the Shake zero build.

Improvements to Shake

When I first started benchmarking Shake vs Ninja, I had reports that Shake was significantly slower - taking around 40% longer to build large projects. As a result I made a number of improvements to Shake:

Improvement 1: --skip-commands

I added the --skip-commands flag and shakeRunCommands option to Shake, which skips running any command operations which have no return results. Provided your build system does not delete temporary files, it allows you to build normally, then build with --always-make --skip-commands to "run the build", skipping running commands, measuring the rest of the build system.

Improvement 2: Makefile parsing

Using --always-make --skip-commands on LLVM via Ninja files, I found the non-command build time was 58s. Profiling showed that most of the time was spent parsing Makefiles, so I wrote optimised routines, available from Development.Shake.Util. These changes reduced the LLVM non-command time to 15s.

Improvement 3: Filepath normalisation

Further profiling showed that filepath normalisation was now a bottleneck. I responded by both optimising the filepath normalisation (writing a large test suite and correcting several bugs in the process), and removing some redundant normalisations. These changes reduced the LLVM time to 4s, most of which went on file modification time checking.

Improvement 4: Turning off idle garbage collection

By default, programs compiled with GHC run the garbage collector if the Haskell runtime is idle for 0.3s. For Shake, which regularly becomes idle when running commands, the garbage collector ends up competing with the commands it has spawned. I now recommend people turn off the idle garbage collector by compiling with -with-rtsopts=-I0, and I do so for the shake executable.

Improvement 5: --timings

In order to accurately measure where time was going, I added the --timings flag and shakeTimings option. When run with --timings Shake prints out something like:

Start                             0.006s    1%
shakeArgsWith                     0.000s    0%
Function shake                    0.002s    0%
Database read                     0.049s   10%  ===
With database                     0.002s    0%
Running rules                     0.353s   72%  =========================
Pool finished (5 threads, 2 max)  0.000s    0%
Lint checking                     0.033s    6%  ==
Profile report                    0.041s    8%  ==
Total                             0.486s  100%
Build completed in 0:01m

Here we can see which stages are taking most time. For example, reading in the database takes 0.049s at 10% of the time. The = symbols to the right serve as a crude bar plot representing the timings.

Improvement 6: Smaller database

For zero builds I found much of the time was spent reading the database. I changed some of the representation, using smaller Int types and more compact encodings. These changes reduced the database by ~25% and had a small effect on the time to read the database.

Future improvements

For the full build, I beat Ninja, despite originally only aiming for a draw. The build overhead introduced by Shake is 0.029s, of which 0.010s is running the rules. Provided that scales linearly, the cost seems negligible compared to actually performing the build.

For the zero build, I am slower than Ninja. To investigate I measured just running --version with Ninja and Shake. Ninja takes 0.003s and Shake takes 0.004s, so a large portion of the zero build times is the cost of starting the executable, not project specific. Running --timing I see:

Start                             0.000s    3%  ==                       
shakeArgsWith                     0.000s    7%  =====                    
Ninja parse                       0.001s   16%  ===========              
Function shake                    0.000s   10%  ======                   
Database read                     0.002s   36%  =========================
With database                     0.000s    3%  ==                       
Running rules                     0.001s   20%  =============            
Pool finished (2 threads, 2 max)  0.000s    2%  =                        
Total                             0.004s  100%                           

The largest bottleneck is database reading. Duncan Coutts' recent work on moving the binary library to CBOR is likely to result in a smaller and faster database. I await that work eagerly, and will look at further optimisations after it is available.

Is Shake faster than Ninja?

For building Ninja from scratch, Shake is faster than Ninja (perhaps the Ninja developers should switch to Shake for their development work :P). Another Ninja user benchmarked building the VXL project with both Shake and Ninja and discovered Ninja took 44 mins, while Shake took 41 mins (Shake 3 minutes faster) - but this benchmark was only run a handful of times. I would be interested to hear additional results.

6 comments:

MichaƂ said...

Congratulations!

Looks like shake is contender for the best large scale build system now.

Did you consider moving database to sqlite or similar backend? Or starting to read database in parallel with other stages?

PS I should probably think about moving my Makefiles to shake now... do you recommend any particular tutorial?

Neil Mitchell said...

I think something like sqlite would be a poor fit - I describe it as "the database" but it's very much a serial list of record entries, with arbitrary Haskell payloads. If I put it in a database, I'd still have to deserialise the Haskell payloads, which is the bottleneck, and my own log can end up very efficient with the right binary library. It's interesting that both Shake and Ninja converged independently on very similar log structured files.

I have considered reading the database in parallel. In particular, if you are reading the db and come across a file entry (which is roughly every single entry) you could check the modification time of that entry and cache it. I've not tried that yet as it requires threads/locks and could potentially go slower given a fast database read. It's one of the things on my list after the binary library rewrite.

For learning Shake, I recommend the user manual which is basically a long tutorial: https://github.com/ndmitchell/shake/blob/master/docs/Manual.md#readme

Ian Monroe said...

This is all for Linux I assume? Windows would be more fertile ground for optimizing a make replacement. I build on Windows and Linux both, and basically only really notice build times on Windows with my relatively small project. Especially the 'zero build' is a lot slower on Windows. (My theory is that NTFS is horrible, but I'm just guessing.) Though I'm guessing optimizing on Windows doesn't help for Linux though.

Anyways if you can parse Ninja, that means Shake can build any cmake project. I suggest kdelibs for a more meaningful benchmark. ;)

Neil Mitchell said...

Ian: These numbers are all for Linux, but I do all my development on Windows and use Shake with Windows all day long, so I've optimised it for that too. Most optimisations tend to be shared, although I have custom filetime code on each OS, and certain things like parallelising GetModificationTime give a win on Windows but not Linux. For Windows, spawning processes has a terrible overhead compared to Linux, and Shake doesn't spawn process to check a build is up to date, while recursive Make would.

Unfortunately my home desktop is now showing its age, so building things like kdelibs takes a while and probably requires updating half my system. I welcome others to try Shake out with cmake - if providing Windows Shake binaries would make that easier, let me know.

Unknown said...

Errata: The binary package link should be http://hackage.haskell.org/package/binary .

Neil Mitchell said...

Sean: Thanks, fixed now.