Add the following code to your project's shard.yml under:
dependencies
to use in production
- OR -
development_dependencies
to use in development
chars
vs codepoints
benchmarkI was curious if (like in some other languages) evaluating numeric representations of strings was faster than their character representations.
The short answer is: in Crystal codepoints
is negligibly faster.
The following is example output performed on a ThinkPad E480.
Benchmark #1: ../bin/hamming --chars
Time (mean ± σ): 3.263 s ± 0.029 s [User: 2.276 s, System: 0.975 s]
Range (min … max): 3.247 s … 3.375 s 50 runs
Benchmark #2: ../bin/hamming --codepoints
Time (mean ± σ): 3.251 s ± 0.040 s [User: 2.265 s, System: 0.974 s]
Range (min … max): 3.228 s … 3.395 s 50 runs
Summary
'../bin/hamming --codepoints' ran
1.00 ± 0.02 times faster than '../bin/hamming --chars'
The results are fairly stable, codepoints
always beat chars
, but the
difference is tiny. I wouldn't look here for an optimization, especially if you
need to actually do anything with the characters (like display).
The code under benchmark (src/hamming.cr
) is calculating the Hamming distance
between two files of 100,000,000 characters each. The test (test/bench
) is a
bash script that generates test data (using src/gen_input.cr
) and runs
hyperfine
for a performance comparision.
The idea for the using Hamming distance, and this benchmark in general, came from this exercism.io exercise where my solution got some reasonable questions from the Crystal mentor reviewing it.
shards build --release
test/bench