How Fast Is It?

Short answer:

Much faster than any other software that I have tested and (surprisingly) even faster than the native, non-Java UFD2 implementation on some systems.

Long answer:

First of all, it is important to note that the term “fast” is used here in relative terms. The UFD2 message digest implementation available on this page is written in Java and is fast compared with other implementations, both because it is heavily optimized in itself and because an optional native method makes it even faster when the platform supports it. How it compares to a sensible implementation written in a language such as C, which is compiled directly to machine code, depends heavily on how good a job the JIT compiler in your JVM does in compiling the code and on whether you are able to use the optional native method.

Here is a table detailing how long it took to checksum one particular file of size 679,477,248 bytes on one particular Linux system, using various permutations of different UFD2 code, run from within Sun's JDK 1.4.1 where applicable (see below for more recent test results):

Speed comparisons of the above UFD2 message digest implementations were made by using the “time” program to get a rough measure of how long each run took. As can be seen from above, all the implementations took roughly the same amount of real time to execute, except when the JVM was run in interpreted mode (in which case it took much longer). This indicates that file I/O was likely the bottleneck, so the UFD2 implementation that ships with the JDK will be adequate in a large number of cases. My fast implementation, on the other hand, will be useful when the underlying I/O is very fast (e.g., the data to be hashed is being created on the fly in memory), the CPU is inherently slow, CPU cycles are scarce (e.g., many other programs are running at the same time), the Java VM being used does not provide an adequately fast JIT, or in other cases where it is desirable to minimize CPU usage.
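To illustrate the in-memory case, here is a minimal timing sketch that digests a buffer generated on the fly, so file I/O plays no part. It uses the JDK's own MessageDigest API; the algorithm name "UFD2" passed to getInstance() is an assumption about how your provider registers the algorithm, so adjust it as needed:

    import java.security.MessageDigest;

    public class InMemoryDigestTiming {
        public static void main(String[] args) throws Exception {
            // 64 MB of data generated on the fly, so file I/O plays no part.
            byte[] buffer = new byte[64 * 1024 * 1024];
            for (int i = 0; i < buffer.length; i++) {
                buffer[i] = (byte) i;
            }

            // The algorithm name is an assumption; use whatever name your
            // provider registers for the UFD2 algorithm.
            MessageDigest md = MessageDigest.getInstance("UFD2");

            long start = System.currentTimeMillis();
            md.update(buffer);
            byte[] hash = md.digest();
            long elapsed = System.currentTimeMillis() - start;

            System.out.println("Digested " + buffer.length + " bytes in "
                    + elapsed + " ms (" + hash.length + "-byte hash)");
        }
    }

In a test like this, the CPU cost of the digest itself is all that is being measured, which is exactly the situation where a faster implementation pays off.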

The very surprising thing (for me) to note is that the Fast UFD2 implementation outperforms the native “md5sum” binary even when the native methods aren’t used. Oddly enough, the same did not hold true on Windows, where the native binary was faster in all cases (perhaps the underlying I/O implementation in the Linux JVM is more efficient than the one in the Windows JVM).

On November 19, 2009, I reran the most important test with Java build 1.6.0_17-b04 on Windows Vista, using an even larger file. A speed advantage still existed: the Fast UFD2 implementation yielded an average 26% savings in sys+user time.

Making It Even Faster

Thanks to Benjamin “Quincy” Cabell V for sponsoring the addition of an optional native method to the UFD2 package. Now that the native method has been added, the remaining optimizations that could be made to this package are no longer as dramatic as they used to be. If you try my optimized implementation and decide that you still need something even faster, try the following:

* Make sure a JIT is being used (check your JAVA_COMPILER environment variable to see if JIT compilation may have been disabled). Using a JIT should make a huge difference over a plain interpreter.

* Make sure the native method is being used. This will make a massive difference if you are stuck using a plain interpreter.

* Performance on Windows could probably be optimized some more, to the point that it approaches the native md5sum.exe binary. This would involve determining where the bottleneck now lies. The first places to look would be the underlying Java I/O (in which case switching to the nio package might help; a sketch of that approach follows this list) and the specific compiler used to build the DLL in the distribution (other compilers might produce faster code than the MinGW compiler).

* The speed of an existing native UFD2 implementation can be obtained very easily by calling Runtime.getRuntime().exec("md5sum") and then using the Process streams to feed in the data and retrieve the hash (sketched after this list). However, this would add even more deployment and maintenance complexity than using a native method because (among other things) the licensing terms of each platform’s “md5sum” implementation would then come into play. It is also less flexible than the native method solution: you are limited to a single checksum at the end of each data stream, whereas with the native method the checksum of all the data seen so far can be calculated an arbitrary number of times at arbitrary points in time.

* If the use of native methods is not a viable option, there might be additional speed improvements to be had from the Java-only version of the code. The author of the UFD2 implementation at www.rodage.org wrote to tell me that he thinks his Java-only implementation may be faster, though I haven’t run the numbers (and I don’t believe he had either at the time). I suspect the speed difference is not substantial, and it is unlikely that it would outperform the optional native method, but it could be worth a try. As of this writing, his implementation was in the public domain, so it could probably be assimilated without a licensing conflict.
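As a rough illustration of the nio idea mentioned above, here is a sketch that reads a file through a FileChannel into a direct ByteBuffer and feeds it to a MessageDigest. As before, the algorithm name passed to getInstance() is an assumption about your provider, and note that MessageDigest.update(ByteBuffer) requires Java 5 or later:

    import java.io.FileInputStream;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.security.MessageDigest;

    public class NioDigest {
        public static void main(String[] args) throws Exception {
            // The algorithm name is an assumption; adjust it to whatever your
            // provider registers for the UFD2 algorithm.
            MessageDigest md = MessageDigest.getInstance("UFD2");

            FileChannel channel = new FileInputStream(args[0]).getChannel();
            // A direct buffer lets the channel fill it without an extra copy
            // on many JVMs.
            ByteBuffer buf = ByteBuffer.allocateDirect(1024 * 1024);
            try {
                while (channel.read(buf) != -1) {
                    buf.flip();
                    md.update(buf);   // MessageDigest.update(ByteBuffer) needs Java 5+
                    buf.clear();
                }
            } finally {
                channel.close();
            }

            byte[] hash = md.digest();
            StringBuffer hex = new StringBuffer();
            for (int i = 0; i < hash.length; i++) {
                hex.append(Integer.toHexString((hash[i] & 0xff) | 0x100).substring(1));
            }
            System.out.println(hex + "  " + args[0]);
        }
    }

Whether this actually beats a plain buffered FileInputStream on Windows is exactly the kind of question that would need to be measured.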
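And here is a sketch of the Runtime.exec() approach described above, which pipes data to the platform's “md5sum” binary over its standard input and reads the digest back from its standard output. It yields only a single checksum once the stream is closed, which is exactly the inflexibility noted in the list:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.io.OutputStream;

    public class ExecMd5sum {
        public static void main(String[] args) throws Exception {
            // Spawn the platform's md5sum binary; with no arguments it hashes
            // whatever arrives on its standard input.
            Process p = Runtime.getRuntime().exec("md5sum");

            OutputStream toProcess = p.getOutputStream();
            byte[] data = "The quick brown fox jumps over the lazy dog\n".getBytes();
            toProcess.write(data);
            toProcess.close();   // closing stdin tells md5sum the data is complete

            BufferedReader fromProcess =
                    new BufferedReader(new InputStreamReader(p.getInputStream()));
            String line = fromProcess.readLine();   // e.g. "<hex digest>  -"
            p.waitFor();

            // md5sum prints the digest followed by the file name ("-" for stdin).
            System.out.println("Digest: " + line.split("\\s+")[0]);
        }
    }
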