jabbalaci/SpeedTests
comparing the execution speeds of various programming languages
When I learn a new programming language, I always implement the Münchausen numbers problem in the given language. The problem is simple but it includes a lot of computations, thus it gives an idea of the execution speed of a language.
A Münchausen number is a number equal to the sum of its digits raised to each digit's power.
For instance, 3435 is a Münchausen number because 3^3 + 4^4 + 3^3 + 5^5 = 3435.
0^0 is not well-defined, thus we'll consider 0^0 = 0. In this case there are four Münchausen numbers: 0, 1, 3435, and 438579088.
Write a program that finds all the Münchausen numbers. We know that the largest Münchausen number is less than 440 million.
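
As an illustration of the problem statement above, here is a minimal Python sketch. It is not the code used for the measurements; the names `CACHE`, `is_munchausen` and `find_munchausen`, as well as the small demo limit, are assumptions made for this example. It precomputes the ten digit powers once (with the 0^0 = 0 convention) and then checks each candidate digit by digit.

```python
# Illustrative sketch only; names and the demo limit are hypothetical.

# CACHE[d] holds d**d, using the convention 0**0 = 0 from the text above.
CACHE = [0] + [d ** d for d in range(1, 10)]


def is_munchausen(number):
    """Return True if number equals the sum of its digits, each raised to itself."""
    total = 0
    n = number
    while n > 0:
        n, digit = divmod(n, 10)  # peel off the last decimal digit
        total += CACHE[digit]
    return total == number


def find_munchausen(limit):
    """Collect every Münchausen number in the half-open range [0, limit)."""
    return [n for n in range(limit) if is_munchausen(n)]


if __name__ == "__main__":
    # The full search would use limit = 440_000_000; a small limit keeps the demo fast.
    print(find_munchausen(10_000))  # prints [0, 1, 3435]
```

Running the same loop up to 440 million is what makes the problem a useful speed test: the work per number is tiny, but there are a lot of numbers.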
Dates are in `yyyy-month` format.
2025-July: F# was added.
2025-April: Python 3 with Rust removed. Common LISP updated. C3 added.
In the implementations I tried to use the same (simple) algorithm in order to make the comparisons as fair as possible.
All the tests were run on my home desktop machine (Intel Core i7-4771 CPU @ 3.50GHz with 8 CPU cores) using Manjaro Linux. Execution times are wall-clock times and they are measured with hyperfine (warmup runs: 1, benchmarked runs: 2).
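
To make the "warmup runs, then benchmarked runs, reported as mean ± deviation" idea concrete, here is a rough Python sketch of a wall-clock measurement. It is only an approximation of what a benchmarking tool does, not a replacement for hyperfine; the function name `benchmark` and its parameters are assumptions made for this example.

```python
import statistics
import subprocess
import time


def benchmark(command, warmup=1, runs=2):
    """Time `command` (a list of strings) and return (mean, stdev) in seconds.

    Hypothetical helper: warmup runs are executed but not timed,
    then `runs` executions are timed with a wall clock.
    """
    for _ in range(warmup):
        subprocess.run(command, check=True, stdout=subprocess.DEVNULL)

    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(command, check=True, stdout=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)

    # statistics.stdev needs at least two samples, hence runs >= 2.
    return statistics.mean(timings), statistics.stdev(timings)


if __name__ == "__main__":
    mean, deviation = benchmark(["./main"])  # "./main" is a placeholder binary
    print(f"{mean:.3f} ± {deviation:.3f}")
```

With only two benchmarked runs, the deviation is a coarse indicator, which is worth keeping in mind when two results in the tables below differ by a few hundredths of a second.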
The following implementations were received in the form of pull requests:
Thanks for the contributions!
If you know how to make something faster, let me know!
Languages are listed in alphabetical order.
The size of the EXE files can be further reduced with the command `strip -s`. If it's applicable, then the stripped EXE size is also shown in the table.
Below, you can find single-threaded implementations. We also have some multi-threaded implementations, see here.
Compilation | Runtime (sec) | EXE (bytes) | stripped EXE (bytes) |
---|---|---|---|
gcc -O3 main.c -o main -lm | 3.893 ± 0.01 | 15,560 | 14,408 |
gcc -O2 main.c -o main -lm | 3.892 ± 0.001 | 15,560 | 14,408 |
clang -O3 main.c -o main -lm | 2.684 ± 0.013 | 15,528 | 14,416 |
clang -O2 main.c -o main -lm | 2.672 ± 0.001 | 15,528 | 14,416 |
Notes:
- `-O2` and `-O3` give practically the same runtime. It's enough to use `-O2`.

Compilation | Runtime (sec) | EXE (bytes) | stripped EXE (bytes) |
---|---|---|---|
g++ -O3 --std=c++2a main.cpp -o main | 3.865 ± 0.01 | 15,936 | 14,432 |
g++ -O2 --std=c++2a main.cpp -o main | 3.849 ± 0.012 | 15,936 | 14,432 |
clang++ -O3 --std=c++2a main.cpp -o main | 2.913 ± 0.01 | 15,904 | 14,440 |
clang++ -O2 --std=c++2a main.cpp -o main | 2.827 ± 0.015 | 15,904 | 14,440 |
Notes:
- `-O2` and `-O3` give nearly the same runtime. Using `-O2` is even better.

Compilation | Runtime (sec) | EXE (bytes) | -- |
---|---|---|---|
dotnet publish -o dist -c Release | 5.097 ± 0.043 | 77,736 | -- |
Notes:
Compilation | Runtime (sec) | EXE (bytes) | stripped EXE (bytes) |
---|---|---|---|
c3c compile -O5 -g0 main.c3 | 3.125 ± 0.01 | 110,752 | 90,920 |
Notes:
Execution | Runtime (sec) | compiled / transpiled output size (bytes) | -- |
---|---|---|---|
clj -M -m main | 5.631 ± 0.112 | -- | -- |
mkdir classes && java -cp `clj -Spath` main | 5.339 ± 0.101 | -- | -- |
Notes:
Compilation | Runtime (sec) | EXE (bytes) | stripped EXE (bytes) |
---|---|---|---|
codon build -release main.py | 5.369 ± 0.006 | 28,400 | 26,864 |
Notes:
See https://github.com/exaloop/codon for more information about this compiler.
Execution | Runtime (sec) | -- | -- |
---|---|---|---|
clisp -C main2.cl | 517.914 ± 1.032 | -- | -- |
clisp -C main.cl | 322.324 ± 0.98 | -- | -- |
sbcl --script main.cl | 7.277 ± 0.003 | -- | -- |
sbcl --script main2.cl | 4.897 ± 0.007 | -- | -- |
Notes:
- `clisp` is very slow. Even worse than Python. And without the `-C` switch, it's ten times slower.
- With `sbcl`, you can get excellent performance.

Compilation | Runtime (sec) | EXE (bytes) | stripped EXE (bytes) |
---|---|---|---|
crystal build --release main.cr | 4.237 ± 0.077 | 807,432 | 273,424 |
Notes:
See https://crystal-lang.org for more info about this language.
Compilation | Runtime (sec) | EXE (bytes) | stripped EXE (bytes) |
---|---|---|---|
dmd -release -O main.d | 9.987 ± 0.045 | 993,816 | 712,504 |
ldc2 -release -O main.d | 3.089 ± 0.008 | 34,584 | 23,008 |
Notes:
- `dmd` is slow
- `ldc2` is the best in this case

Execution | Runtime (sec) | compiled / transpiled output size (bytes) | -- |
---|---|---|---|
dart main.dart | 23.909 ± 0.581 | -- | -- |
dart compile js main.dart -O2 -m -o main.js && node main.js | 10.509 ± 0.032 | 31,684 | -- |
dart compile exe main.dart -o main && ./main | 8.377 ± 0.009 | 5,925,856 | -- |

(*): in the first case, the Dart code is executed as a script
Notes:
Execution | Runtime (sec) | -- | -- |
---|---|---|---|
elixir main.exs | 227.963 ± 0.543 | -- | -- |
elixirc munchausen.ex && elixir caller.exs | 217.528 ± 0.762 | -- | -- |
Notes:
- `elixirc` compiles the module to `.beam` files. However, it didn't make the program much faster. The difference is very small.

Compilation | Runtime (sec) | EXE (bytes) | -- |
---|---|---|---|
dotnet publish -o dist -c Release | 4.872 ± 0.015 | 77,736 | -- |
Notes:
Compilation | Runtime (sec) | EXE (bytes) | stripped EXE (bytes) |
---|---|---|---|
# FASM x64, see v1 in Makefile | 15.792 ± 0.018 | 532 | 532 |
# FASM x86, see v2 in Makefile | 15.207 ± 0.023 | 444 | 444 |
Note: no difference between the 32-bit and 64-bit versions.
See https://en.wikipedia.org/wiki/FASM for more info about FASM.
Execution | Runtime (sec) | -- | -- |
---|---|---|---|
gforth-fast main.fs | 73.734 ± 0.034 | -- | -- |
Compilation | Runtime (sec) | EXE (bytes) | stripped EXE (bytes) |
---|---|---|---|
gfortran -O2 main.f08 -o main | 3.884 ± 0.054 | 21,016 | 14,456 |
Note: its speed is comparable to C.
Compilation | Runtime (sec) | EXE (bytes) | stripped EXE (bytes) |
---|---|---|---|
# using int, see v1 in Makefile | 4.122 ± 0.034 | 2,137,820 | 1,391,192 |
# using uint and uint32, see v2 in Makefile | 3.5 ± 0.045 | 2,137,756 | 1,391,192 |
Notes:
Compilation | Runtime (sec) | EXE (bytes) | stripped EXE (bytes) |
---|---|---|---|
# basic, see v1 in Makefile | 93.816 ± 0.043 | 3,175,704 | 754,008 |
# optimized, see v2 in Makefile | 3.517 ± 0.009 | 6,324,936 | 3,183,648 |
Notes:
Execution | Runtime (sec) | Binary size (bytes) | -- |
---|---|---|---|
javac Main.java && java Main | 5.003 ± 0.002 | 1,027 | -- |

(*): the binary size is the size of the `.class` file
Note: very good performance.
Execution | Runtime (sec) | -- | -- |
---|---|---|---|
node main1.js | 17.789 ± 0.009 | -- | -- |
node main2.js | 6.819 ± 0.001 | -- | -- |
Notes:
- `main1.js` is a straightforward implementation
- `main2.js` is an improved implementation, using a more optimal cache array size

Execution | Runtime (sec) | -- | -- |
---|---|---|---|
julia --startup=no main.jl | 3.656 ± 0.006 | -- | -- |
Note: excellent performance, almost like C.
See https://julialang.org for more info about this language.
Execution | Runtime (sec) | JAR size (bytes) | -- |
---|---|---|---|
kotlinc main.kt -include-runtime -d main.jar && java -jar main.jar | 5.092 ± 0.004 | 4,826,841 | -- |
Note: same performance as Java.
Compilation | Runtime (sec) | -- | -- |
---|---|---|---|
lua main1.lua | 112.412 ± 0.03 | -- | -- |
luajit main1.lua | 16.854 ± 0.013 | -- | -- |
luajit main2_goto.lua | 15.737 ± 0.007 | -- | -- |
Notes:
- Lua has a floor division operator (`//`), but LuaJIT doesn't understand it.

Compilation | Runtime (sec) | EXE (bytes) | stripped EXE (bytes) |
---|---|---|---|
# using Int, see v1 in Makefile | 3.844 ± 0.011 | 1,160,400 | 302,952 |
# using UInt32, see v2 in Makefile | 3.125 ± 0.043 | 1,160,400 | 302,952 |
Notes:
- Using `UInt32` makes it even faster. v2 is one of the fastest solutions here.

See https://www.modular.com/mojo for more info about Mojo.
Compilation | Runtime (sec) | EXE (bytes) | stripped EXE (bytes) |
---|---|---|---|
# NASM x86, see v2 in Makefile | 15.19 ± 0.012 | 9,228 | 8,428 |
# NASM x64, see v1 in Makefile | 15.186 ± 0.034 | 9,656 | 8,552 |
Note: no difference between the 32-bit and 64-bit versions.
See https://en.wikipedia.org/wiki/Netwide_Assembler for more info about NASM.
Compilation | Runtime (sec) | EXE (bytes) | stripped EXE (bytes) |
---|---|---|---|
nelua main.nelua --release -o main | 3.519 ± 0.02 | 15,704 | 14,432 |
nelua main.nelua --release --cc=clang -o main | 3.215 ± 0.011 | 15,616 | 14,432 |
Notes:
Compilation | Runtime (sec) | EXE (bytes) | stripped EXE (bytes) |
---|---|---|---|
nim c -d:release main.nim | 3.773 ± 0.017 | 73,696 | 63,936 |
nim c --cc:clang -d:release main.nim | 3.645 ± 0.014 | 57,440 | 47,608 |
nim c --cc:clang -d:danger main.nim | 3.41 ± 0.021 | 42,808 | 35,152 |
nim c -d:danger main.nim | 3.098 ± 0.022 | 54,808 | 47,328 |

(*): if `--cc:clang` is missing, then the default `gcc` was used
Notes:
Compilation | Runtime (sec) | EXE (bytes) | stripped EXE (bytes) |
---|---|---|---|
# using int32, see v3 in Makefile | 3.728 ± 0.02 | 57,440 | 47,608 |
# using int64, see v2 in Makefile | 3.644 ± 0.023 | 57,440 | 47,608 |
# using int, see v1 in Makefile | 3.623 ± 0.001 | 57,440 | 47,608 |
# using uint64, see v5 in Makefile | 3.427 ± 0.033 | 57,496 | 47,608 |
# using uint32, see v4 in Makefile | 3.248 ± 0.026 | 57,496 | 47,608 |

Here, we used the compiler options `--cc:clang -d:release` everywhere and tested the different integer data types.
Notes:
- `int` is platform-dependent, i.e. it's 64 bits long on a 64-bit system. Thus, on a 64-bit system, there is no difference between using `int` and `int64` (that is, v1 and v2 are equivalent).

Compilation | Runtime (sec) | EXE (bytes) | stripped EXE (bytes) |
---|---|---|---|
ocamlopt -unsafe -O3 -o main -rounds 10 main.ml | 8.18 ± 0.001 | 1,086,200 | 902,232 |
Compilation | Runtime (sec) | EXE (bytes) | stripped EXE (bytes) |
---|---|---|---|
odin build . -no-bounds-check -disable-assert -o:speed | 3.536 ± 0.338 | 151,704 | 145,616 |
See https://odin-lang.org for more info about this language.
Notes:
Compilation | Runtime (sec) | EXE (bytes) | stripped EXE (bytes) |
---|---|---|---|
# see v1 in Makefile | 17.391 ± 0.009 | 531,056 | 531,056 |
# see v2 in Makefile | 5.828 ± 0.024 | 531,056 | 531,056 |
Notes:
- Using a signed integer (`LongInt` in v1) is a bit slower than using an unsigned integer (`UInt32` in v2).
- With `UInt32`, `DivMod()` is slow. Using separate `div` and `mod` operations gives better results.
- With `LongInt`, it's the opposite. `DivMod()` is a better choice here.
- `strip` didn't make the EXE any smaller.

Execution | Runtime (sec) | -- | -- |
---|---|---|---|
perl main.pl | 494.71 ± 4.649 | -- | -- |
perl -Minteger main.pl | 423.805 ± 2.471 | -- | -- |
Notes:
Execution | Runtime (sec) | -- | -- |
---|---|---|---|
php main.php | 133.232 ± 0.113 | -- | -- |
Notes:
Execution | Runtime (sec) | -- | -- |
---|---|---|---|
python3 main.py | 313.333 ± 8.03 | -- | -- |
pypy3 main.py | 19.911 ± 0.054 | -- | -- |
Notes:
Execution | Runtime (sec) | .so (bytes) | stripped .so (bytes) |
---|---|---|---|
mypyc main.py && ./start_v3.sh | 80.481 ± 0.574 | 183,992 | 92,824 |
Notes:
- `mypyc` can compile a module. This way, the program can be 4 to 5 times faster.

Execution | Runtime (sec) | -- | -- |
---|---|---|---|
./start_v1.sh | 46.772 ± 0.203 | -- | -- |
Notes:
Execution | Runtime (sec) | -- | -- |
---|---|---|---|
python3 main.py | 5.526 ± 0.435 | -- | -- |
Notes:
Execution | Runtime (sec) | -- | -- |
---|---|---|---|
racket main1.rkt | 107.486 ± 0.5 | -- | -- |
racket main2.rkt | 43.847 ± 1.932 | -- | -- |
See https://racket-lang.org for more info about this language.
Notes:
- `main1.rkt` is a straightforward implementation
- `main2.rkt` is an improved implementation that uses type-specific procedures

Execution | Runtime (sec) | -- | -- |
---|---|---|---|
ruby main.rb | 199.632 ± 3.2 | -- | -- |
ruby --jit main.rb | 75.863 ± 1.174 | -- | -- |
Notes:
Compilation | Runtime (sec) | EXE (bytes) | stripped EXE (bytes) |
---|---|---|---|
cargo build --release | 2.936 ± 0.078 | 3,839,048 | 317,752 |
Notes:
Execution | Runtime (sec) | JAR size (bytes) | -- |
---|---|---|---|
scalac main.scala -d main.jar && scala main.jar | 5.378 ± 0.015 | 5,782 | -- |
Notes:
Execution | Runtime (sec) | EXE (bytes) | -- |
---|---|---|---|
guile -s main.scm | 148.423 ± 1.773 | -- | -- |
chez --compile-imported-libraries --optimize-level 3 -q --script main.scm | 69.826 ± 0.387 | -- | -- |
gambitc -:debug=pqQ0 -exe -cc-options '-O3' main.scm && ./main | 21.718 ± 0.229 | 9,098,392 | -- |
stalin -architecture amd64 -s -On -Ot -Ob -Om -Or -dC -dH -dP\ && ./main | 4.599 ± 0.017 | 25,472 | -- |
stalin -architecture amd64 -s -On -Ot -Ob -Om -Or -dC -dH -dP\ && ./main | 4.012 ± 0.014 | 25,512 | -- |
Note: stalin's performance is close to C.
Compilation | Runtime (sec) | EXE (bytes) | stripped EXE (bytes) |
---|---|---|---|
swiftc -Ounchecked main.swift | 3.335 ± 0.004 | 15,832 | 11,984 |
Note: the performance is similar to C++.
Execution | Runtime (sec) | EXE (bytes) | -- |
---|---|---|---|
toit.run main.toit | 120.263 ± 0.069 | -- | -- |
toit.compile -O2 -o main main.toit && ./main | 118.63 ± 0.774 | 1,254,784 | 1,254,784 |
Notes:
- The runtime with `toit.run` and `toit.compile` is the same. I'm not sure, but I think `toit.run` compiles to a temp. folder and starts the program from there.
- `toit.compile` must produce a stripped EXE. Stripping the EXE explicitly didn't change the file size.

Compilation | Runtime (sec) | EXE (bytes) | stripped EXE (bytes) |
---|---|---|---|
v -prod main.v | 4.056 ± 0.004 | 209,392 | 187,728 |
v -cc clang -prod main.v | 3.936 ± 0.018 | 212,720 | 191,736 |
By default, it uses GCC.
Notes:
Compilation | Runtime (sec) | EXE (bytes) | stripped EXE (bytes) |
---|---|---|---|
zig build-exe -OReleaseFast src/main.zig | 2.975 ± 0.037 | 1,721,168 | 170,968 |
Notes: