🚧 Under Construction 👷
A state machine compiler with no runtime dependency. Define a grammar using a subset of regular expression notation, then compile it into a blazing-fast state machine. Acorn supports lexers or custom string-based state machines.
Add this to your application's `shard.yml`:

```yaml
dependencies:
  acorn:
    github: "rmosolgo/acorn"
```
Define the grammar in a build file:

```crystal
# ./build/my_lexer.cr
require "acorn"

class MyLexer < Acorn::Lexer
  # optional, rename the generated class:
  # name("Namespace::MyLexer")
  token :letter, "a-z"
  token :number, "0-9"
  generate "./src/my_lexer.cr"
end
```
Generate the lexer:

```sh
crystal run ./build/my_lexer.cr
```
Use the compiled lexer:

```crystal
require "./src/my_lexer.cr"

input = "abc123" # any String to tokenize
MyLexer.scan(input) # => Array(Tuple(Symbol, String))
```
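Acorn's actual generated output is not reproduced here. As a rough idea of what "compiled into a state machine with no runtime dependency" means, the sketch below is a hand-written, plain-Crystal class (the name `HandLexer` and its `State` enum are hypothetical) that behaves the way the two-token grammar above suggests: each `a-z` or `0-9` character becomes one token. Raising on any other character is an assumption of this sketch, not documented Acorn behavior.

```crystal
# Hypothetical hand-written lexer for the :letter ("a-z") and :number ("0-9")
# tokens above. It is NOT Acorn's generated code, only an illustration.
class HandLexer
  enum State
    Start  # nothing read yet for the current token
    Letter # accepted one "a-z" character
    Number # accepted one "0-9" character
  end

  def self.scan(input : String) : Array(Tuple(Symbol, String))
    tokens = [] of Tuple(Symbol, String)
    state = State::Start
    input.each_char do |char|
      # Transition out of Start on the current character.
      state = case char
              when 'a'..'z' then State::Letter
              when '0'..'9' then State::Number
              else
                # Assumption: unknown input is an error in this sketch.
                raise ArgumentError.new("Unexpected character: #{char.inspect}")
              end
      # Both patterns are one character long, so every accepting state
      # emits a token immediately and the machine returns to Start.
      name = state.letter? ? :letter : :number
      tokens << {name, char.to_s}
      state = State::Start
    end
    tokens
  end
end

p HandLexer.scan("a1b")
```

Running it prints `[{:letter, "a"}, {:number, "1"}, {:letter, "b"}]`. All the "knowledge" of the grammar lives in the `case` transitions, which is why such code needs no library at runtime.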
Tokens are defined with a small regular expression language:
Feature | Example |
---|---|
Character | `a`, `1`, `❤️` |
Sequence | `ab`, `123` |
Alternation | `a\|b` |
Grouping | `(ab)\|c` |
Any character | `.` |
One of | `[abc]` |
Not one of | `[^abc]` |
Escape | `\[`, `\.` |
Unicode character range | `a-z`, `0-9` |
Zero-or-more | `a*` |
One-or-more | `a+` |
Zero-or-one | `a?` |
Specific number | `a{3}` |
Between numbers | `a{3,4}` |
At least | `a{3,}` |
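To illustrate how a bounded repetition turns into states (this is not Acorn's internal representation), `a{3,4}` is equivalent to `aaaa?`, and a deterministic machine can recognize it with five states counting the `a`s read so far. The names `ACCEPTING`, `step`, and `full_match?` are hypothetical.

```crystal
# Hypothetical transition-table DFA for "a{3,4}" (equivalently "aaaa?").
# State n means "n 'a's read so far"; states 3 and 4 accept.
ACCEPTING = {3, 4}

# Next state, or nil when no transition exists (the input is rejected).
def step(state : Int32, char : Char) : Int32?
  char == 'a' && state < 4 ? state + 1 : nil
end

def full_match?(input : String) : Bool
  state = 0
  input.each_char do |char|
    next_state = step(state, char)
    return false unless next_state
    state = next_state
  end
  ACCEPTING.includes?(state)
end

%w(aa aaa aaaa aaaaa aab).each do |s|
  puts "#{s}: #{full_match?(s)}"
end
```

Only `aaa` and `aaaa` are accepted: `aa` stops in a non-accepting state, while `aaaaa` and `aab` hit a missing transition.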
An Acorn module is a Crystal program that generates code. To get a lexer, you have to run the Acorn module. Then, your main program should use the generated code.
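The "program that generates code" idea can be sketched without Acorn. The hypothetical build step below turns a table of single-character-range tokens into Crystal source text and writes it to a file; `TOKENS`, `generate_source`, `GeneratedLexer`, and the output path `./tmp_generated_lexer.cr` are all made up for illustration, and the emitted code is far simpler than what Acorn produces.

```crystal
# Hypothetical build step: turn token definitions into Crystal source.
# Only single-character ranges like "a-z" are supported in this sketch.
TOKENS = [{:letter, 'a', 'z'}, {:number, '0', '9'}]

def generate_source(class_name : String) : String
  String.build do |io|
    io << "class " << class_name << "\n"
    io << "  def self.scan(input : String) : Array(Tuple(Symbol, String))\n"
    io << "    input.chars.map do |c|\n"
    io << "      case c\n"
    # One `when` branch per token definition, e.g. `when 'a'..'z' then ...`
    TOKENS.each do |name, from, to|
      io << "      when " << from.inspect << ".." << to.inspect
      io << " then {" << name.inspect << ", c.to_s}\n"
    end
    io << "      else raise ArgumentError.new(\"unexpected \#{c.inspect}\")\n"
    io << "      end\n"
    io << "    end\n"
    io << "  end\n"
    io << "end\n"
  end
end

source = generate_source("GeneratedLexer")
File.write("./tmp_generated_lexer.cr", source) # the "generate" step
puts source
```

The written file is ordinary Crystal that a main program could `require`, which is the same split between build time and run time described above.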
For example, if you define a lexer:

```crystal
# build/my_lexer.cr
require "acorn"

class MyLexer < Acorn::Lexer
  # ...
  generate("./app/my_lexer.cr")
end
```
You should run the file with Crystal to generate the specified file:

```sh
crystal run build/my_lexer.cr
```
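Then, your main program should `require` the generated file:

```crystal
# my_app.cr
# Relative requires need a leading "./"; a bare "app/my_lexer" is looked up
# in CRYSTAL_PATH (e.g. lib/) instead of next to this file.
require "./app/my_lexer"

input = "abc123" # any String to tokenize
MyLexer.scan(input) # => Array(Tuple(Symbol, String))
```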
The generated code has no dependency on Acorn, so you only need this library during development.
Acorn returns an array of tokens. Each token is a tuple with:

- `Symbol`: the name of this token in the lexer definition
- `String`: the segment of input which matched the pattern
- `{Int32, Int32}`: the line and column where the token begins
- `{Int32, Int32}`: the line and column where the token ends

Line numbers and column numbers are 1-indexed, so the first character in the input is `1:1`.
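To make the 1-indexed convention concrete, here is a small helper (the name `line_and_column` is hypothetical and not part of Acorn) that converts a 0-based character index into a `{line, column}` pair, assuming `"\n"` is the only line separator.

```crystal
# Hypothetical helper: convert a 0-based character index into a
# 1-indexed {line, column} pair, treating "\n" as the line separator.
def line_and_column(input : String, index : Int32) : {Int32, Int32}
  line = 1
  column = 1
  input.each_char_with_index do |char, i|
    break if i == index
    if char == '\n'
      line += 1   # a newline starts the next line...
      column = 1  # ...at its first column
    else
      column += 1
    end
  end
  {line, column}
end

text = "ab\ncd"
p line_and_column(text, 0) # => {1, 1} (the "a")
p line_and_column(text, 4) # => {2, 2} (the "d")
```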
Acorn lexers are actually a special case of a state machine. You can specify a custom machine, too.
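A custom machine runs actions against an accumulator instead of collecting tokens. The sketch below is hand-written plain Crystal, not Acorn's macros or generated code: it collects the lowercase words of a string. It borrows the `Accumulator` alias and the `acc`, `str`, `ts`, `te` action parameters from the list that follows, including the convention that `te` is inclusive (so a one-character match has `ts == te`); the names `on_word` and `scan_words` are hypothetical.

```crystal
# Hypothetical hand-written machine, not Acorn's macros or generated code.
# The accumulator is the data the actions modify while scanning.
alias Accumulator = Array(String)

# Action for a run of "a-z" characters. ts and te are inclusive indices
# into str, so a one-character word has ts == te.
def on_word(acc : Accumulator, str : String, ts : Int32, te : Int32)
  acc << str[ts..te]
end

def scan_words(str : String) : Accumulator
  acc = Accumulator.new
  ts = nil # start of the current word, or nil when between words
  str.each_char_with_index do |char, i|
    if ('a'..'z').includes?(char)
      ts ||= i # enter (or stay in) the "word" state
    elsif start = ts
      on_word(acc, str, start, i - 1) # leave the "word" state
      ts = nil
    end
  end
  # A word that runs to the end of the input is flushed here.
  if start = ts
    on_word(acc, str, start, str.size - 1)
  end
  acc
end

p scan_words("hi a cat") # => ["hi", "a", "cat"]
```

Here the single-letter word `a` is reported with `ts == te == 3`.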
A custom machine definition uses:

- `class MyMachine < Acorn::Machine` to bring in the macros
- `alias Accumulator = ...` to specify the data that will be modified during the process
  - `.new`
  - `.scan`
- `action :name, "pattern" { |acc, str, ts, te| ... }` to define patterns, where:
  - `acc` is an instance of your `Accumulator`
  - `str` is the original input
  - `ts` is the index in `str` where this token began
  - `te` is the index in `str` where this token ended (if the match is one character long, `ts == te`)
- `name("MyNamespace::MachineName")` to rename the generated Crystal class

Run the tests:

```sh
crystal run spec/prepare.cr
crystal spec
```
Goals:

- … `.cr` file and memory usage)
- … `.cr`, not a special format)

Non-goals:

- … (`(...)`, `.`, `[^...]`)
- … `a` is also a move on `:any`, how is that handled?

License: LGPLv3