

CitronLexer

Generic Class

A simple rule-based lexer. The type of the token data to be returned is used as the generic parameter.

Types

TokenData

The generic parameter for the CitronLexer. This is the type of the token data to be obtained as a result of tokenization.

LexingRule

A lexing rule can be either:

- a fixed string, with an optional TokenData value for it,
- an NSRegularExpression, with a block that maps the matched string to an optional TokenData value, or
- a regular expression pattern string, with a block that maps the matched string to an optional TokenData value.

It is defined as:

    enum LexingRule {
        case string(String, TokenData?)
        case regex(NSRegularExpression, (String) -> TokenData?)
        case regexPattern(String, (String) -> TokenData?)
    }
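As a minimal sketch of constructing each kind of rule (this assumes the Citron library is available; MyTokenData and the rule contents here are hypothetical, chosen only for illustration):

```swift
import Foundation

// Hypothetical token data: the length of the matched text.
typealias MyTokenData = Int

// The block used by the two regex-based rule kinds below.
let lengthOfMatch: (String) -> MyTokenData? = { matched in matched.count }

let rules: [CitronLexer<MyTokenData>.LexingRule] = [
    // A fixed-string rule with a constant token datum.
    .string("let", 3),
    // A rule built from a pre-compiled NSRegularExpression.
    .regex(try! NSRegularExpression(pattern: "[a-z]+"), lengthOfMatch),
    // A rule built from a regular expression pattern string.
    .regexPattern("[0-9]+", lengthOfMatch)
]
```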

CitronLexerPosition

Specifies a position in the input string that marks a token or an error.

This is a tuple containing three fields.

This type is defined outside the scope of CitronLexer.

Initializing

init(rules: [LexingRule])

Initialize the lexer with lexing rules.

Parameters:

- rules: The lexing rules to be used for tokenization

Tokenizing

tokenize(_ string: String, onFound: Action)

Tokenize string. When a token is found, the onFound block is called.

In case there’s no lexing rule applicable at some position in the string, an error is thrown and tokenization is aborted.

Parameters:

- string: The string to be tokenized
- onFound: The block to be called when a token is found

Return value: None

Throws: .noMatchingRuleAt(errorPosition:) when none of the lexing rules is applicable at some position in the string; any error thrown by the onFound block is propagated
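A sketch of catching that error (this assumes the Citron library is available and that the error enum is named CitronLexerError, as in Citron's sources; the lexer and its rules here are hypothetical):

```swift
import Foundation

// The block that maps a matched string to its token data.
let identity: (String) -> String? = { s in s }

// Hypothetical lexer whose token data is the matched text itself.
let lexer = CitronLexer<String>(rules: [
    .regexPattern("[0-9]+", identity)
])

do {
    // "12" matches the digit rule; no rule applies at "x",
    // so an error is thrown and tokenization is aborted.
    try lexer.tokenize("12x") { tokenData in
        print("token:", tokenData)
    }
} catch CitronLexerError.noMatchingRuleAt(let errorPosition) {
    print("tokenization aborted at \(errorPosition)")
}
```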

tokenize(_ string: String, onFound: Action, onError: ErrorAction?)

Tokenize string. When a token is found, the onFound block is called.

In case there’s no rule applicable at some position in the string, the onError block is called and tokenization continues.

Parameters:

- string: The string to be tokenized
- onFound: The block to be called when a token is found
- onError: The block to be called when no lexing rule is applicable at some position in the string

Return value: None

Throws: Any error thrown by the onFound or onError blocks is propagated
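A sketch of the non-aborting variant (again assuming the Citron library is available, and that the onError block receives the lexing error, as in Citron's sources; the lexer and its rules are hypothetical):

```swift
import Foundation

// The block that maps a matched string to its token data.
let identity: (String) -> String? = { s in s }

let lexer = CitronLexer<String>(rules: [
    .regexPattern("[0-9]+", identity)
])

var tokens: [String] = []
try lexer.tokenize("12x34",
    onFound: { tokenData in tokens.append(tokenData) },
    onError: { error in
        // Called where no rule applies; tokenization then continues,
        // so both "12" and "34" are still reported to onFound.
        print("skipping unrecognized input: \(error)")
    })
```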

Position

currentPosition: CitronLexerPosition

The current position of the lexer.

Errors

.noMatchingRuleAt(errorPosition: CitronLexerPosition)

Signifies that at the position errorPosition in the input, none of the specified lexing rules could be applied.

Usage with CitronParser

A CitronLexer can be used with a CitronParser by using a TokenData that contains:

- the token value (of type Parser.CitronToken), and
- the token code (of type Parser.CitronTokenCode)

The tokens generated by the lexer can then be passed on to the Citron-generated parser by calling its consume(token:code:) method.

Assuming that the Citron-generated parser is called Parser, we can write:

let parser = Parser()

typealias TokenData = (token: Parser.CitronToken, code: Parser.CitronTokenCode)
// Parser.CitronToken = Int (%token_type)
// Parser.CitronTokenCode is an enum with .Plus, .Minus and .Integer as values
// (where Plus, Minus and Integer are terminals used in the grammar)

let lexer = CitronLexer<TokenData>(rules: [
    .string("+", (token: 0, code: .Plus)),
    .string("-", (token: 0, code: .Minus)),
    .regexPattern("[0-9]+", { s in (token: Int(s)!, code: .Integer) })
])

try lexer.tokenize("40+2") { tokenData in
    try parser.consume(token: tokenData.token, code: tokenData.code)
}
let result = try parser.endParsing()