Skip to content

Latest commit

 

History

History
730 lines (497 loc) · 11.5 KB

parsers.md

File metadata and controls

730 lines (497 loc) · 11.5 KB

List of parser combinators

Note: when samples use a local input variable representing the input text to parse, and a parser variable, the result is usually the outcome of calling var result = parser.Parse(input) or var success = parser.TryParse(input, out var result).

Terms and Literals

These are lowest level elements of a grammar, like a '.' (dot), predefined strings like "hello", numbers, and more. In Parlot they usually are accessed from the Literals or Terms classes, with the difference that Terms will return a parser that accepts blank spaces before the element.

Terms and Literals are accessed using the Parsers.Terms and Parsers.Literals static classes.

WhiteSpace

Matches blank spaces, optionally including new lines. Returns a TextSpan with the matched spaces. This parser is not available in the Terms static class.

Parser<TextSpan> WhiteSpace(bool includeNewLines = false)

Usage

var input = "   \thello world  ";
var parser = Literals.WhiteSpace();

Result:

"   \t"

NonWhiteSpace

Matches any non-blank spaces, optionally including new lines. Returns a TextSpan with the matched characters.

Parser<TextSpan> NonWhiteSpace(bool includeNewLines = false)

Usage:

var input = "hello world";
var parser = Terms.NonWhiteSpace();

Result:

"hello"

Text

Matches a given string, optionally in a case insensitive way.

Parser<string> Text(string text, bool caseInsensitive = false)

Usage:

var input = "hello world";
var parser = Terms.Text("hello");

Result:

"hello"

Char

Matches a given character.

Parser<char> Char(char c)

Usage:

var input = "hello world";
var parser = Terms.Char("h");

Result:

'h'

Integer

Matches an integral numeric value.

Parser<long> Integer()

Usage:

var input = "-1234";
var parser = Terms.Integer();

Result:

-1,234

Decimal

Matches a numeric value with optional digits.

Parser<decimal> Decimal()

Usage:

var input = "-1234.56";
var parser = Terms.Decimal();

Result:

-1,234.56

String

Matches a quoted string literal, optionally use single or double enclosing quotes.

Parser<TextSpan> String(StringLiteralQuotes quotes = StringLiteralQuotes.SingleOrDouble)

Usage:

var input = "'hello\\nworld'";
var parser = Terms.String();

Result:

'hello\\nworld'

Identifier

Matches an identifier, optionally with extra allowed characters. Default start chars are [$_a-zA-Z]. Other chars also include digits.

Parser<TextSpan> Identifier(Func<char, bool> extraStart = null, Func<char, bool> extraPart = null)

Usage:

var input = "slice_text();";
var parser = Terms.Identifier();

Result:

slice_text

Pattern

Matches a consecutive characters with a specific predicate, optionally defining a minimum and maximum size.

Parser<TextSpan> Pattern(Func<char, bool> predicate, int minSize = 1, int maxSize = 0)

Usage:

var input = "ababcad";
var parser = Terms.Pattern(c => c == 'a' || c == 'b');

Result:

abab

Combining parsers

Or

Matches any of two parsers.

Parser<T> Or<T>(this Parser<T> parser, Parser<T> or)

Another overload accepts to return a common base class from the two parsers

Parser<T> Or<A, B, T>(this Parser<A> parser, Parser<B> or) 

Usage:

var parser = Terms.Text("one").Or(Terms.Text("1"));
parser.Parse("1");
parser.Parse("one");
parser.Parse("hello")

Result:

"1"
"one"
null

Multiple Or() calls can be used in a row, e.g., a.Or(b).Or(c).Or(d)

And

Matches two consecutive parsers. The result is a strongly typed tuple containing the two individual results.

Parser<ValueTuple<T1, T2>> And<T1, T2>(this Parser<T1> parser, Parser<T2> and)

Usage:

var parser = Terms.Text("hello").And(Terms.Text("world"));
parser.Parse("hello world").Item1;
parser.Parse("hello world").Item2;
parser.Parse("hello");

Result:

"hello"
"world"
null

Multiple And() calls can be used in a row, e.g. to match a variable assignment like age = 12

var input = "age = 12";
var parser = Terms.Identifier().And(Terms.Char('=')).And(Terms.Integer());
var result = parser.Parse(input);

Assert.Equal("age", result.Item1);
Assert.Equal('=', result.Item2);
Assert.Equal(12, result.Item3);

AndSkip

Behaves like And but skips the later one's result.

Parser<T1> AndSkip<T1, T2>(this Parser<T1> parser, Parser<T2> and)```

Usage:

var parser = Terms.Text("hello").AndSkip(Terms.Text("world"));
parser.Parse("hello world");
parser.Parse("hello");

Result:

"hello"
null

This is useful to expect successive terms but to ignore some of them and make the result as lean as possible.

var input = "age = 12";
var parser = Terms.Identifier().AndSkip(Terms.Char('=')).And(Terms.Integer());
var result = parser.Parse(input);

Assert.Equal("age", result.Item1);
Assert.Equal(12, result.Item2);

SkipAnd

Behaves like And but skips the former one's result.

Parser<T2> SkipAnd<T1, T2>(this Parser<T1> parser, Parser<T2> and)

Usage:

var parser = Terms.Text("hello").SkipAnd(Terms.Text("world"));
parser.Parse("hello world");
parser.Parse("hello");

Result:

"world"
null

This is useful to expect successive terms but to ignore some of them and make the result as lean as possible.

var input = "age = 12";
var parser = Terms.Identifier().And(Terms.Char('=')).SkipAnd(Terms.Integer());
var result = parser.Parse(input);

Assert.Equal("age", result.Item1);
Assert.Equal(12, result.Item2);

Cardinality Parsers

ZeroOrOne

Makes an existing parser optional.

Parser<T> ZeroOrOne<T>(Parser<T> parser)

Usage:

var parser = ZeroOrOne(Terms.Text("hello"));
parser.Parse("hello");
parser.Parse(""); // returns null but with a successful state

Result:

"hello"
null

ZeroOrMany

Executes a parser as long as it's successful. The result is a list of all individual results.

Parser<List<T>> ZeroOrMany<T>(Parser<T> parser)

Usage:

var parser = ZeroOrMany(Terms.Text("hello"));
parser.Parse("hello hello");
parser.Parse("");

Result:

[ "hello", "hello" ]
[]

OneOrMany

Executes a parser as long as it's successful, and is successful if at least one occurrence is found. The result is a list of all individual results.

Parser<List<T>> OneOrMany<T>(Parser<T> parser)

Usage:

var parser = OneOrMany(Terms.Text("hello"));
parser.Parse("hello hello");
parser.Parse("");

Result:

[ "hello", "hello" ]
null

Not

Succeeds if the parser is not matching.

Parser<T> Not<T>(Parser<T> parser)

Usage:

var parser = Not(Terms.Text("hello"));
parser.Parse("hello");
parser.Parse("world");

Result:

hello // failure
null // success

Coordination parsers

Separated

Matches all occurrences of a parser that are separated by another one. If a separator is not followed by a value, it is not consumed.

Parser<List<T>> Separated<U, T>(Parser<U> separator, Parser<T> parser)

Usage:

var parser = Separated(Terms.Text(","), Terms.Integer());
parser.Parse("1, 2, 3");
parser.Parse("1,2;3");

Result:

[1, 2, 3]
[1, 2]

Between

Matches a parser when between two other ones.

Parser<T> Between<A, T, B>(Parser<A> before, Parser<T> parser, Parser<B> after)

Usage:

var parser = Between(Terms.Char('['), Terms.Integer(), Terms.Char(']'));
parser.Parse("[ 1 ]");
parser.Parse("[ 1");

Result:

1
0 // failure

SkipWhiteSpace

Matches a parser after any blank spaces. This parser respects the Scanner options related to multi-line grammars.

Parser<T> SkipWhiteSpace<T>(Parser<T> parser)

Usage:

var parser = SkipWhiteSpace(Literals.Text("abc"));
parser.Parse("abc");
parser.Parse("  abc");

Result:

"abc"
"abc"

Note: This parser is used by all Terms (e.g., Terms.Text) to skip blank spaces before a Literal.

Deferred

Creates a parser that can be references before it is actually defined. This is used when there is a cyclic dependency between parsers.

Deferred<T> Deferred<T>()

Usage:

var parser = Deferred<string>();
var group = Between(Terms.Char('('), parser, Terms.Char(')'));
parser.Parser = Terms.Integer().Or(group);

parser.Parse("((1))");
parser.Parse("1");

Result:

1
1

Recursive

Creates a parser that can reference itself.

Deferred<T> Recursive<T>(Func<Deferred<T>, Parser<T>> parser)

Usage:

var number = Terms.Decimal();
var minus = Terms.Char('-');

var parser = Recursive<decimal>((u) =>
    minus.And(u)
        .Then(static x => 0 - x.Item2)
    .Or(number)
    );

parser.Parse("--1");

Result:

1

Capture

Ignores the individual result of a parser and returns the whole matched string instead.

This can be used for pattern matching when each part of a pattern isn't more useful that the whole result.

Parser<TextSpan> Capture<T>(Parser<T> parser)

Usage:

var parser = Terms.Identifier().And(Terms.Char('=')).And(Terms.Integer());
var capture = Capture(parser);

capture.Parse("age = 12");

Result:

"age = 12"

Flow parsers

Then

Convert the result of a parser. This is usually used to create custom data structures when a parser succeeds, or to convert it to another type.

Parser<U> Then<U>(Func<T, U> conversion)
Parser<U> Then<U>(Func<ParseContext, T, U> conversion)

Usage:

var parser = 
    Terms.Integer()
    .AndSkip(Terms.Char(','))
    .And(Terms.Integer())
    .Then(x => new Point(x.Item1, y.Item2));

parser.Parse("1,2");

Result:

Point { x: 1, y: 2}

ElseError

Fails parsing with a custom error message when the inner parser didn't match.

Parser<T> ElseError(string message)

Usage:

var parser = 
    Terms.Integer().ElseError("Expected an integer")
    .AndSkip(Terms.Char(',').ElseError("Expected a comma"))
    .And(Terms.Integer().ElseError("Expected an integer"))
    .Then(x => new Point(x.Item1, y.Item2));

parser.Parse("1,");

Result:

failure: "Expected an integer at (1:3)

Error

Fails parsing with a custom error message when the inner parser matched.

Parser<T> Error(string message)
Parser<U> Error<U>(string message)

Usage:

var parser = 
    Terms.Char('a')
    .Or(Terms.Char('b')
    .Or(Terms.Char('c').Error("Unexpected char c")

parser.Parse("1,");

Result:

failure: "Expected an integer at (1:3)

When

Adds some additional logic for a parser to succeed.

Parser<T> When(Func<T, bool> predicate)

Switch

Returns the next parser based on some custom logic.

Parser<U> Switch<U>(Func<ParseContext, T, Parser<U>> action)

Eof

Expects the end of the string.

Parser<T> Eof()

Discard

Discards the previous result and replaces it with the default value or a custom one.

Parser<U> Discard<U>()
Parser<U> Discard<U>(U value)

Other parsers

AnyCharBefore

Returns any characters until the specified parser is matched.

Parser<TextSpan> AnyCharBefore<T>(Parser<T> parser, bool canBeEmpty = false, bool failOnEof = false, bool consumeDelimiter = false)

Empty

Always returns successfully, with an optional return type or value.

Parser<T> Empty<T>()
Parser<object> Empty()
Parser<T> Empty<T>(T value)

OneOf

Like Or, with an unlimited list of parsers.

Parser<T> OneOf<T>(params Parser<T>[] parsers)