First Parser
Parsers in Antelope are relatively simple to create. The interface that Antelope presents to the programmer is meant to be simple and understandable; it was designed to mirror the interface of Yacc and Bison while being much more lax about syntax. Antelope reads that interface from 'ace' files.
We're going to create our own simple language to show the basic syntax, and we'll have Antelope generate a Ruby parser for it.
# This is a comment. Anything after the # is ignored.
%require "~> 0.3" # This requires a specific version of Antelope.
# If you attempt to compile the file with a version
# of Antelope that doesn't match this, Antelope
# _will_ error.
%generator "ruby" # The generator that Antelope will use for the
# output format. This is determined by what
# Antelope can support. `null`, `output`, `ruby`,
# and `c` are all provided by default.
%define ruby.error-class {SyntaxError} # This is a definition for the
# Ruby generator. It is the error class that is
# raised.
%define output.panic-mode true # If the output generators should
# output "panic mode" code. This is turned off by
# default; more on panic mode later.
%define output.verbose true # Output verbose information about the
# parser. The only generator that actually uses
# this define is Output.
%terminal NUMBER # Defines a terminal named `NUMBER`. This can be
# used within the grammar.
%terminal IDENTIFIER
%terminal EXPONENTIATE "^" # Defines a terminal named `EXPONENTIATE`.
# Unlike before, however, a representative token is
# used to make the debug information easier to
# understand.
%terminal MULTIPLY "*"
%terminal DIVIDE "/"
%terminal ADD "+"
%terminal SUBTRACT "-"
%terminal LPAREN "("
%terminal RPAREN ")"
# This represents a code block that is copied directly into the output
# of Antelope.
%{
# encoding: utf-8
%}
# This is the boundary between the "directives" and the "productions".
# Above are directives that are used for Antelope to compile the
# parser.
%%
# Productions are a set of terminals and nonterminals that reduce to
# a nonterminal. A "nonterminal" is a term that defines an abstract
# concept. For example, an `expression` may represent any of the
# operations.
# The typical naming scheme for nonterminals is thus: if a nonterminal
# _may_ exist, then it ends in `.maybe`, and one of its productions is
# `nothing`, which is a special nonterminal that should not be defined
# by the user (it represents literally nothing).
# It is also common to align, for multi-line productions, the "or"
# symbol (`|`) with the colon (`:`).
body: expressions.maybe;
expressions.maybe: expressions | nothing;
# It turns out, semicolons are optional!
expressions: expressions expression
| expression
# `error` is also a special nonterminal; I'll explain it later.
expression: expression ADD expression
| expression SUBTRACT expression
| expression MULTIPLY expression
| expression DIVIDE expression
| expression EXPONENTIATE expression
| LPAREN expression RPAREN
| LPAREN error RPAREN
| NUMBER
| IDENTIFIER
# This is the boundary between the "productions" and the "output".
%%
# This is the contents of the output of Antelope.
module MyLibrary
module Parser
# The output by Antelope.
%{write}
end
end
This is the basic setup for an Ace language file. The directives at the beginning take their syntax from Bison and Yacc. Directives are formatted like `%<name> [<arguments>]*`; each argument can be plain text (`something`), quoted (`"something"`), in braces (`{something}`), or in angle brackets (`<something>`). Some directives require a specific type of argument.
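To make the four argument forms concrete, here is a small Ruby sketch that splits a directive line into its name and typed arguments. It only illustrates the syntax described above and is not Antelope's own scanner: `parse_directive` and the `:text`, `:string`, `:block`, and `:angle` labels are names invented for this example, and nested braces and escaped quotes are deliberately not handled.

```ruby
require "strscan"

# Illustrative only: a simplified reader for `%<name> [<arguments>]*`.
# This is NOT Antelope's scanner; the method name and the argument
# labels are assumptions made for this sketch.
def parse_directive(line)
  scanner = StringScanner.new(line.strip)
  name = scanner.scan(/%[A-Za-z][\w.-]*/)
  raise ArgumentError, "not a directive: #{line.inspect}" if name.nil?

  arguments = []
  until scanner.eos?
    if scanner.skip(/\s+/)
      next                                  # whitespace separates arguments
    elsif scanner.check(/\#/)
      break                                 # the rest of the line is a comment
    elsif scanner.scan(/"([^"]*)"/)
      arguments << [:string, scanner[1]]    # "something"
    elsif scanner.scan(/\{([^}]*)\}/)
      arguments << [:block, scanner[1]]     # {something}
    elsif scanner.scan(/<([^>]*)>/)
      arguments << [:angle, scanner[1]]     # <something>
    elsif (text = scanner.scan(/[^\s\#"{<]+/))
      arguments << [:text, text]            # something
    else
      raise ArgumentError, "cannot read argument at #{scanner.rest.inspect}"
    end
  end

  [name.delete_prefix("%"), arguments]
end

name, arguments = parse_directive('%define ruby.error-class {SyntaxError}')
puts "#{name}: #{arguments.inspect}"
```

Keeping the argument form alongside its text matters because, as noted above, some directives expect a particular form for a particular argument.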
A few directives are accepted by default:

- `require`: This requires one argument, which dictates the versions of Antelope the Ace file can be compiled with. If you've ever used a Gemfile or a gemspec file, the same syntax used there to denote gem versions can be used here. Example: `%require ">= 0"`.
- `generator`, `language`, `grammar.type`: These all require one argument, and all perform the same action: defining the primary generator that the Ace file should use. This can be overridden by the command line interface; however, if no generator is provided, Antelope will error. Right now, Antelope ships with a few generators:
  - `output`: This generator is actually just a facade for two other generators, `error` and `info`; however, using them individually is not recommended. `error` contains error information about each state, such as conflicts. `info` contains information about the states themselves. `output` is used by default along with any generator you choose within the file.
  - `null`: This generator doesn't actually generate anything, and is a good choice if you don't wish to generate any output.
  - `ruby`: This generator generates a Ruby parser that can be easily integrated into a Ruby program. How the contents are integrated is completely up to you. The generator requires one method to be defined: `type` (the documentation on this should be output with the generator).
  - `c`: This generator generates a C parser. This, like `output`, is just a facade for two other generators, `c_header` and `c_source`. This generator is also incomplete.

  Example: `%generator "ruby"`.
- `token`, `terminal`: These require one argument, with a second argument being optional. Both do the same thing: they define a terminal used within the grammar. The first argument is the terminal's name, which can be used later within the grammar file. The second argument is the terminal's representation in the `output` generator. Example: `%terminal PLUS "+"`.
- `left`, `right`, `nonassoc`: These require at least one argument, with more being optional. These all do similar things: each defines a single precedence level, with the first defining a left-associative level, the second a right-associative level, and the last a non-associative level. The arguments to each can be any terminal, defined or not. Example: `%left PLUS MINUS`.
- `define`: This requires one argument, with more being optional. It defines a single key-value pair that is normally used by the generators. Some of the defines are listed below; the list is not exhaustive, but if Antelope can't find a generator that accepts an option, it will error.
  - `null.data`, `comment`: These take any number of arguments and ignore them. Part of the Null generator.
  - `union`: Used by the C generator. It takes two arguments and defines the union type that any nonterminal or terminal can use. The first argument is the name of the union, and the second is the body of the union. Example: `%define union some_thing { int test; }`.
  - `api.prefix`: Used by the C generator. It takes one argument and defines the prefix of all identifiers that are generated. By default, this is `yy`, for compatibility reasons. Example: `%define api.prefix some_parser`.
  - `api.push-pull`: Used by the C generator. It takes one argument and defines the type of parser that is generated. By default, it is a pull parser. Example: `%define api.push-pull push`.
  - `api.value.type`: Used by the C generator. It takes one argument and defines the type that all terminals and nonterminals have. Example: `%define api.value.type uint32_t`.
  - `api.token.prefix`: Used by the C generator. It takes one argument and defines the prefix that all terminals have as enumeration constants. Example: `%define api.token.prefix SOME_PARSER`.
  - `parse-param`: Used by the C generator. It takes any number of arguments, which become the parameters passed to the parse function (defined by Antelope) for use by actions. Example: `%define parse-param {some_parser_t* parser}`.
  - `lex-param`: Used by the C generator. It takes any number of arguments, which become the parameters passed to the lex function. Example: `%define lex-param {some_lexer_t* lex}`.
  - `param`: Used by the C generator. It takes any number of arguments, which become the parameters passed to both the parse function (defined by Antelope) and the lex function. Example: `%define param {some_param_t* param}`.
  - `output.verbose`: Used by the Error generator. It takes one argument; if the value is truthy (i.e., not undefined or false), the generator outputs verbose information about conflicts that were automagically resolved using the precedence rules (more on that later). By default, it is false. Example: `%define output.verbose true`.
  - `html.show-lookahead`: Has no apparent effect.
  - `panic-mode`: Enables panic mode code generation. More on panic mode in Advanced Ace files. Example: `%define panic-mode true`.
  - `ruby.error-class`: Used by the Ruby generator. It takes one argument: the class that the Ruby parser should raise upon an error. Example: `%define ruby.error-class {SyntaxError}`.

  Technically, any of the above can be used without the `define` prefix; however, this is not recommended.
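Since `require` accepts the same version syntax as a Gemfile, RubyGems' own `Gem::Requirement` class is a convenient way to see which versions a given constraint admits. This sketch assumes Antelope's check behaves like RubyGems' does; the text above says the syntax is the same, but whether Antelope uses `Gem::Requirement` internally is not confirmed here. `compatible?` is a helper name made up for the example.

```ruby
require "rubygems"

# Hypothetical helper: does `version` satisfy the constraint string
# that would be given to `%require`? Uses RubyGems' matching rules,
# which are assumed (not confirmed) to match Antelope's.
def compatible?(constraint, version)
  Gem::Requirement.new(constraint).satisfied_by?(Gem::Version.new(version))
end

# "~> 0.3" is the pessimistic operator: at least 0.3, but below 1.0.
%w[0.2.9 0.3.0 0.9.1 1.0.0].each do |version|
  puts "#{version}: #{compatible?('~> 0.3', version)}"
end
```

Under RubyGems' rules, adding a third segment tightens the range: `~> 0.3.0` admits only the `0.3.x` releases.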
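The effect of `left` and `right` is easiest to see on a concrete expression. The sketch below is not how Antelope resolves conflicts, since Antelope works on parser states. It substitutes a small precedence-climbing evaluator, a different technique, purely to show how associativity changes the grouping of `8 - 3 - 2` and `2 ^ 3 ^ 2`. The `PRECEDENCE` table, `evaluate`, and `apply_operator` are all hypothetical names, and the level numbers are chosen for illustration only.

```ruby
# Hypothetical table of [level, associativity] per operator, standing in
# for declarations such as `%left ADD SUBTRACT` and `%right EXPONENTIATE`.
PRECEDENCE = {
  "+" => [1, :left],  "-" => [1, :left],
  "*" => [2, :left],  "/" => [2, :left],
  "^" => [3, :right]
}.freeze

def apply_operator(operator, left, right)
  case operator
  when "+" then left + right
  when "-" then left - right
  when "*" then left * right
  when "/" then left / right
  when "^" then left**right
  end
end

# Evaluates a flat token list such as [8, "-", 3, "-", 2] using
# precedence climbing. Note that `tokens` is consumed (mutated).
def evaluate(tokens, table = PRECEDENCE, min_level = 1)
  left = tokens.shift
  while (entry = table[tokens.first]) && entry[0] >= min_level
    level, associativity = entry
    operator = tokens.shift
    # Left-associative: the right operand may only contain tighter
    # operators. Right-associative: it may contain the same level again.
    next_min = associativity == :left ? level + 1 : level
    right = evaluate(tokens, table, next_min)
    left = apply_operator(operator, left, right)
  end
  left
end

puts evaluate([8, "-", 3, "-", 2])   # grouped as (8 - 3) - 2
puts evaluate([2, "^", 3, "^", 2])   # grouped as 2 ^ (3 ^ 2)
```

Swapping `:right` for `:left` on `^` regroups the second expression as `(2 ^ 3) ^ 2`, which is the kind of difference a `%left` versus `%right` declaration decides.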