Overview of PCCTS

PCCTS (the Purdue Compiler Construction Tool Set) is a set of utilities for creating compilers from a single, integrated description of the tokens, the grammar, and the actions to be performed. It accepts only LL(k) grammars, but this class is sufficient for most practical languages.

PCCTS includes the lexer generator DLG and the parser generator ANTLR. The suite also includes genmk, a support program that automatically generates a makefile for a PCCTS project.

DLG

DLG reads one or more lexclass descriptions and creates a lexical analyzer function (also called a scanner or lexer) based on a deterministic finite automaton (DFA). Each time this lexer function is called, it reads and stores characters from the incoming stream until it recognizes a token, then returns that token's type to its caller. The caller may be a hand-written C program or an ANTLR-generated parser.

Each lexclass description lists the token types into which the incoming character stream is divided. These tokens are analogous to the words, numbers, punctuation, and other symbols in a book. Each token type has a regular expression (regex) describing the input patterns that match it, and an optional action to be performed whenever the lexer recognizes a token of that type.

Actions embedded in the lexclass rules can be used to generate output or manipulate data structures when certain tokens are recognized.
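To make the lexer behaviour concrete, here is a minimal hand-written sketch in C, assuming a plain string stands in for the input stream. It is not DLG output: the token types (TOK_NUM, TOK_ID, TOK_PLUS), the functions lexer_init and next_token, and the globals lexeme and num_value are all hypothetical, and the transition table is written by hand rather than generated from regular expressions. Each call to next_token reads characters until the DFA can go no further, keeps the longest prefix that reached an accepting state, stores its text, runs an action (converting a number's text to a value), and returns the token type.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical token types; a real lexclass declares its own. */
typedef enum { TOK_EOF, TOK_NUM, TOK_ID, TOK_PLUS, TOK_ERROR } TokenType;

/* DFA states. S_NUM, S_ID and S_PLUS accept; S_DEAD means "no transition". */
typedef enum { S_START, S_NUM, S_ID, S_PLUS, S_DEAD } State;

/* Character classes keep the transition table small. */
typedef enum { C_DIGIT, C_ALPHA, C_PLUS, C_OTHER } CharClass;

static CharClass classify(unsigned char c) {
    if (isdigit(c)) return C_DIGIT;
    if (isalpha(c) || c == '_') return C_ALPHA;
    if (c == '+') return C_PLUS;
    return C_OTHER;
}

/* Hand-built table for NUM: [0-9]+, ID: [a-zA-Z_][a-zA-Z_0-9]*, PLUS: "+".
   A generator such as DLG would derive the DFA from the regexes instead. */
static const State transition[5][4] = {
    /*              digit   alpha   '+'     other  */
    /* S_START */ { S_NUM,  S_ID,   S_PLUS, S_DEAD },
    /* S_NUM   */ { S_NUM,  S_DEAD, S_DEAD, S_DEAD },
    /* S_ID    */ { S_ID,   S_ID,   S_DEAD, S_DEAD },
    /* S_PLUS  */ { S_DEAD, S_DEAD, S_DEAD, S_DEAD },
    /* S_DEAD  */ { S_DEAD, S_DEAD, S_DEAD, S_DEAD },
};

/* Token for each state; TOK_ERROR marks non-accepting states. */
static const TokenType accepts[5] = { TOK_ERROR, TOK_NUM, TOK_ID, TOK_PLUS, TOK_ERROR };

static const char *input;  /* stand-in for the incoming character stream */
static size_t pos;         /* index of the next unread character */
char lexeme[64];           /* text of the most recently recognized token */
long num_value;            /* set by the action attached to TOK_NUM */

void lexer_init(const char *text) {
    input = text;
    pos = 0;
}

/* Returns the type of the next token and stores its text in lexeme. */
TokenType next_token(void) {
    while (isspace((unsigned char)input[pos])) pos++;   /* skip white space */
    if (input[pos] == '\0') {
        lexeme[0] = '\0';
        return TOK_EOF;
    }

    State s = S_START;
    size_t len = 0, accepted_len = 0;
    TokenType type = TOK_ERROR;
    /* Advance the DFA as far as possible, remembering the longest accepted prefix. */
    while (input[pos + len] != '\0') {
        s = transition[s][classify((unsigned char)input[pos + len])];
        if (s == S_DEAD) break;
        len++;
        if (accepts[s] != TOK_ERROR) {
            type = accepts[s];
            accepted_len = len;
        }
    }
    if (type == TOK_ERROR) accepted_len = 1;            /* skip one bad character */

    size_t n = accepted_len < sizeof lexeme - 1 ? accepted_len : sizeof lexeme - 1;
    memcpy(lexeme, input + pos, n);
    lexeme[n] = '\0';
    pos += accepted_len;

    /* Action for TOK_NUM: convert the matched text to a numeric value. */
    if (type == TOK_NUM) num_value = strtol(lexeme, NULL, 10);
    return type;
}
```

For example, after lexer_init("x1 + 42"), successive calls to next_token return TOK_ID, TOK_PLUS, TOK_NUM (with num_value set to 42), and finally TOK_EOF.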

ANTLR

ANTLR reads a grammar description and builds a set of parsing functions that together form a top-down parser for the described language. The language must be LL(k) for ANTLR to generate a correct parser. Parsing begins when the function for the start rule is first called through an ANTLR macro. Each parsing function analyzes one part of the language by matching the tokens it expects and calling other parsing functions to handle the smaller pieces of that part. These functions call the lexer to obtain new tokens as needed.

The grammar description contains a set of rules which describe the parts of the language. These parts may be analogous to the sentences, paragraphs, chapters, etc. in a book, or even the book itself. Each rule describes one part of the language, and lists the parts from which it is built. There must be a rule for each part of the language that will be recognized, and ANTLR will generate one parsing function for each of these rules.
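The sketch below shows the shape of such a top-down (recursive-descent) parser in C: one function per rule, each matching expected tokens and calling the functions for smaller rules, with a single token of lookahead (k = 1). It is hand-written, not ANTLR output; the three-rule expression grammar in the comments, the token names, and the functions scan, match, and recognize are hypothetical. Because it contains no actions, it only reports whether the whole input belongs to the language (1) or not (0).

```c
#include <ctype.h>

/* Hypothetical token types for a tiny expression language. */
typedef enum { T_EOF, T_NUM, T_PLUS, T_STAR, T_LPAREN, T_RPAREN, T_BAD } Tok;

static const char *src;  /* remaining input text */
static Tok la;           /* one token of lookahead (k = 1) */
static int ok;           /* cleared on the first syntax error */

/* Minimal hand-written lexer: returns the type of the next token. */
static Tok scan(void) {
    while (*src == ' ') src++;
    if (*src == '\0') return T_EOF;
    if (isdigit((unsigned char)*src)) {
        while (isdigit((unsigned char)*src)) src++;
        return T_NUM;
    }
    switch (*src++) {
        case '+': return T_PLUS;
        case '*': return T_STAR;
        case '(': return T_LPAREN;
        case ')': return T_RPAREN;
        default:  return T_BAD;
    }
}

/* Consume the expected token and fetch the next one, or record an error. */
static void match(Tok expected) {
    if (la == expected) la = scan();
    else ok = 0;
}

static void expr(void);

/* factor : NUM | "(" expr ")" ; */
static void factor(void) {
    if (la == T_NUM) {
        match(T_NUM);
    } else if (la == T_LPAREN) {
        match(T_LPAREN);
        expr();
        match(T_RPAREN);
    } else {
        ok = 0;
    }
}

/* term : factor ( "*" factor )* ; */
static void term(void) {
    factor();
    while (ok && la == T_STAR) { match(T_STAR); factor(); }
}

/* expr : term ( "+" term )* ; */
static void expr(void) {
    term();
    while (ok && la == T_PLUS) { match(T_PLUS); term(); }
}

/* Entry point: prime the lookahead, parse the start rule, require end of input. */
int recognize(const char *text) {
    src = text;
    ok = 1;
    la = scan();
    expr();
    if (la != T_EOF) ok = 0;
    return ok;
}
```

With this sketch, recognize("(1 + 2) * 3") returns 1, while recognize("(1 + 2") returns 0 because the closing parenthesis is missing.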

Unless you specify actions to be taken while the input stream is parsed, ANTLR builds only a recognizer, which checks whether the input belongs to the language. Actions embedded in the grammar rules can be used to generate output or manipulate data structures when certain parts of the language are recognized. If the description file contains both lexclass and grammar descriptions, the lexclass description is passed to DLG and the grammar description to ANTLR.

genmk

genmk is a utility that creates a makefile for building a lexer, a recognizer, or an entire compiler from a PCCTS description file. Which of these the makefile builds depends on the contents of the description file and the command-line parameters passed to genmk.
