CitronLexer
Generic Class
A simple rule-based lexer. The type of the token data to be returned is used as the generic parameter.
Types
TokenData
The generic parameter for CitronLexer. This is the type of the token data obtained as a result of tokenization.
LexingRule
A lexing rule can be either:
- string-based, like .string("func", funcTokenData), or
- NSRegularExpression-based, like .regexPattern("[0-9]+", { str in integerTokenData(str) })
It is defined as:
enum LexingRule {
    case string(String, TokenData?)
    case regex(NSRegularExpression, (String) -> TokenData?)
    case regexPattern(String, (String) -> TokenData?)
}
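To make the regex-based rules more concrete, the sketch below shows one way a pattern could be tried at a given position and the matched text handed to a rule's closure. It uses only Foundation's NSRegularExpression; the helper matchedText(pattern:in:at:) is hypothetical and not part of Citron, and Citron's own matching logic may differ.

```swift
import Foundation

/// Tries `pattern` at exactly `position` in `input` and, on success, returns the
/// matched text. Hypothetical helper for illustration, not part of Citron.
func matchedText(pattern: String, in input: String, at position: String.Index) throws -> Substring? {
    let regex = try NSRegularExpression(pattern: pattern, options: [])
    let range = NSRange(position..<input.endIndex, in: input)
    // .anchored restricts the match to the start of the search range.
    guard let match = regex.firstMatch(in: input, options: .anchored, range: range),
          let matchRange = Range(match.range, in: input) else {
        return nil
    }
    return input[matchRange]
}

let input = "40+2"
// Plays the role of the closure in .regexPattern("[0-9]+", { str in ... }).
let handler: (String) -> Int? = { str in Int(str) }

let first = try matchedText(pattern: "[0-9]+", in: input, at: input.startIndex)
let token = first.flatMap { handler(String($0)) }
```

Here the closure receives the matched string "40" and turns it into token data, which mirrors the shape of the (String) -> TokenData? closure in the regex and regexPattern cases.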
CitronLexerPosition
Specifies a position in the input string that marks a token or an error.
This is a tuple containing three fields:
tokenPosition: String.Index
The start of a token or error in the input string.
linePosition: String.Index
The start of the line containing the tokenPosition.
lineNumber: Int
The line number of the line containing the tokenPosition.
This type is defined outside the scope of CitronLexer.
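As an illustration of how the three fields fit together, the following sketch derives a column number and the text of the affected line from such a position. The Position typealias is a local stand-in with the same field names (in a real project you would use CitronLexerPosition itself), the helper functions are hypothetical, and the sample line number is set by hand, so whether Citron counts lines from 0 or 1 is not implied here.

```swift
// Local stand-in with the same three fields as CitronLexerPosition (assumption:
// a real project would use the type that ships with CitronLexer).
typealias Position = (tokenPosition: String.Index, linePosition: String.Index, lineNumber: Int)

/// Returns the 1-based column of `position.tokenPosition` within its line,
/// counted in Characters. Hypothetical helper.
func column(of position: Position, in input: String) -> Int {
    return input.distance(from: position.linePosition, to: position.tokenPosition) + 1
}

/// Returns the text of the line containing the position, without the newline.
/// Hypothetical helper.
func lineText(of position: Position, in input: String) -> Substring {
    let rest = input[position.linePosition...]
    let end = rest.firstIndex(where: { $0.isNewline }) ?? input.endIndex
    return input[position.linePosition..<end]
}

let input = "let a = 1\nlet b = ?\n"
let lineStart = input.index(input.startIndex, offsetBy: 10)  // start of "let b = ?"
let errorAt = input.index(lineStart, offsetBy: 8)            // the "?"
// Hand-built sample value; the line number 2 is chosen for this example only.
let position: Position = (tokenPosition: errorAt, linePosition: lineStart, lineNumber: 2)

print("line \(position.lineNumber), column \(column(of: position, in: input)): \(lineText(of: position, in: input))")
```

Because linePosition is already stored in the position, the column can be computed without rescanning the input from the beginning.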
Initializing
init(rules: [LexingRule])
Initialize the lexer with lexing rules.
Parameters:
rules:
The lexing rules to use for tokenizing.
Tokenizing
tokenize(_ string: String, onFound: Action)
Tokenize string. When a token is found, the onFound block is called.
If no lexing rule is applicable at some position in the string, an error is thrown and tokenization is aborted.
Parameters:
string:
The input string to tokenize.
onFound:
An action block of type (TokenData) throws -> Void. When a match is found as per the lexing rules, the TokenData obtained from the matching rule is passed to this action block.
Return value:
- None
Throws:
- If there is no matching rule at a particular position in the input, a .noMatchingRuleAt(errorPosition:) error is thrown.
- Any errors thrown in the onFound action block are propagated up to the caller of this method.
tokenize(_ string: String, onFound: Action, onError: ErrorAction?)
Tokenize string. When a token is found, the onFound block is called.
If no lexing rule is applicable at some position in the string, the onError block is called and tokenization continues.
Parameters:
string:
The input string to tokenize.
onFound:
An action block of type (TokenData) throws -> Void. When a match is found as per the lexing rules, the TokenData obtained from the matching rule is passed to this action block.
onError:
An action block of type (CitronLexerError) throws -> Void. If there is no matching rule at a particular position in the input, a .noMatchingRuleAt(errorPosition:) error is passed to this action block. The lexer then moves ahead to the next position in the input at which a rule can be applied, and tokenization continues from there.
Return value:
- None
Throws:
- Any errors thrown in the onFound and onError action blocks are propagated up to the caller of this method.
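The difference between the two tokenize variants (abort on the first unmatched position versus report and continue) can be sketched with a small stand-in lexer. This is not Citron's implementation: SketchLexer and SketchError are hypothetical names, rules are non-empty literal strings only, the first matching rule wins (an assumption), and positions are reported as plain character offsets rather than CitronLexerPosition values.

```swift
// Hypothetical stand-in, NOT Citron's implementation.
enum SketchError: Error {
    case noMatchingRuleAt(offset: Int)
}

struct SketchLexer<TokenData> {
    // Each rule is a non-empty literal string with optional token data.
    let rules: [(text: String, data: TokenData?)]

    /// Returns the first rule whose text is a prefix of `rest`, if any.
    private func match(_ rest: Substring) -> (text: String, data: TokenData?)? {
        return rules.first(where: { rest.hasPrefix($0.text) })
    }

    /// With `onError == nil`, the first unmatched position throws (abort mode).
    /// Otherwise the error is reported and lexing resumes at the next position
    /// where some rule matches (continue mode).
    func tokenize(_ string: String,
                  onFound: (TokenData) throws -> Void,
                  onError: ((SketchError) throws -> Void)? = nil) throws {
        var position = string.startIndex
        while position < string.endIndex {
            if let rule = match(string[position...]) {
                // Rules without token data consume input but emit nothing.
                if let data = rule.data { try onFound(data) }
                position = string.index(position, offsetBy: rule.text.count)
                continue
            }
            let error = SketchError.noMatchingRuleAt(
                offset: string.distance(from: string.startIndex, to: position))
            guard let onError = onError else { throw error }
            try onError(error)
            // Skip ahead until some rule applies again or the input ends.
            repeat {
                position = string.index(after: position)
            } while position < string.endIndex && match(string[position...]) == nil
        }
    }
}

let sketch = SketchLexer<String>(rules: [
    (text: "+", data: "plus"),
    (text: "1", data: "one"),
    (text: " ", data: nil),
])

// Continue mode: the two "?" characters produce a single reported error.
var tokens: [String] = []
var errorOffsets: [Int] = []
try sketch.tokenize("1 + ?? 1",
                    onFound: { tokens.append($0) },
                    onError: { error in
                        if case .noMatchingRuleAt(let offset) = error { errorOffsets.append(offset) }
                    })

// Abort mode: the same kind of input throws at the first unmatched position.
var abortedAt: Int? = nil
do {
    try sketch.tokenize("1 ?", onFound: { _ in })
} catch SketchError.noMatchingRuleAt(let offset) {
    abortedAt = offset
} catch {
    // No other error type is thrown in this sketch.
}
```

In continue mode the caller can collect every lexing error in one pass, whereas abort mode suits callers that want to stop at the first problem.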
Position
currentPosition: CitronLexerPosition
The current position of the lexer.
Errors
.noMatchingRuleAt(errorPosition: CitronLexerPosition)
Signifies that none of the specified rules could be applied at position errorPosition of the input.
Usage with CitronParser
A CitronLexer can be used with a CitronParser by using a TokenData that contains:
- a CitronToken, and
- a CitronTokenCode
The tokens generated by the lexer can then be passed on to the Citron-generated parser by calling its consume(token:code:) method.
Assuming that the Citron-generated parser is called Parser, we can write:
let parser = Parser()
typealias TokenData = (token: Parser.CitronToken, code: Parser.CitronTokenCode)
// Parser.CitronToken = Int (%token_type)
// Parser.CitronTokenCode is an enum with .Plus, .Minus and .Integer as values
// (where Plus, Minus and Integer are terminals used in the grammar)
let lexer = CitronLexer<TokenData>(rules: [
    .string("+", (token: 0, code: .Plus)),
    .string("-", (token: 0, code: .Minus)),
    .regexPattern("[0-9]+", { s in (token: Int(s)!, code: .Integer) })
])
try lexer.tokenize("40+2") { tokenData in
    try parser.consume(token: tokenData.token, code: tokenData.code)
}
let result = try parser.endParsing()