ParseKit Documentation

ParseKit

ParseKit is a Mac OS X framework written in Objective-C by Todd Ditchendorf and released under the Apache 2 open source license. It is suitable for use on Mac OS X Leopard and later, or on iOS. ParseKit is heavily influenced by ANTLR by Terence Parr and by "Building Parsers with Java" by Steven John Metsker. ParseKit also depends on MGTemplateEngine by Matt Gemmell for its templating features.

The ParseKit Framework offers 3 basic services of general interest to Cocoa developers:

  1. String Tokenization via the Objective-C PKTokenizer and PKToken classes.
  2. High-Level Language Parsing via Objective-C - An Objective-C parser-building API (the PKParser class and subclasses).
  3. Objective-C Parser Generation via Grammars - Generate Objective-C source code for a parser for your custom language using a BNF-style grammar syntax (similar to yacc or ANTLR). While parsing, the generated parser provides callbacks to your Objective-C code.

The ParseKit source code is available on Github.

Xcode Project

The ParseKit Xcode project consists of 7 targets:

  1. ParseKit : the ParseKit Objective-C framework. The central feature/codebase of this project.
  2. libParseKit : the ParseKit Framework as a static library for Mac OS X applications.
  3. libParseKitMobile : the ParseKit Framework as a static library for iOS applications.
  4. ParserGenApp : a simple Mac app that can convert your ParseKit grammars into Objective-C parser source code.
  5. Tests : a UnitTest Bundle containing hundreds of unit tests (or more correctly, interaction tests) for the framework as well as some example classes that serve as real-world uses of the framework.
  6. DemoApp : a simple Cocoa demo app that gives a visual presentation of the results of tokenizing text using the PKTokenizer class.
  7. DebugApp : a simple Cocoa app that exists only to run arbitrary test code through GDB with breakpoints for debugging (I was not able to do that with the UnitTest bundle).

ParseKit Framework


Tokenization

The API for tokenization is provided by the PKTokenizer class. Cocoa developers will be familiar with the NSScanner class in the Foundation framework, which provides a similar service. However, the PKTokenizer class is simpler and more powerful for many use cases.

Example usage:

NSString *s = @"\"It's 123 blast-off!\", she said, // watch out!\n"
              @"and <= 3.5 'ticks' later /* wince */, it's blast-off!";
PKTokenizer *t = [PKTokenizer tokenizerWithString:s];

PKToken *eof = [PKToken EOFToken];
PKToken *tok = nil;

while ((tok = [t nextToken]) != eof) {
    NSLog(@" (%@)", tok);
}

outputs:

 ("It's 123 blast-off!")
 (,)
 (she)
 (said)
 (,)
 (and)
 (<=)
 (3.5)
 ('ticks')
 (later)
 (,)
 (it's)
 (blast-off)
 (!)

Each token produced is an object of class PKToken. PKTokens have a tokenType (Word, Symbol, Number, QuotedString, etc.) and both a stringValue and a floatValue.
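As a minimal sketch of reading those properties, the function below walks the token stream and adds up the value of every Number token. It assumes PKToken exposes an isNumber convenience accessor alongside floatValue, and that the framework is imported through an umbrella header named ParseKit/ParseKit.h; the function name SumOfNumberTokens is made up for this illustration. Check PKToken.h in your copy of the framework before relying on these names.

```objc
#import <Foundation/Foundation.h>
#import <ParseKit/ParseKit.h> // assumed umbrella header name

// Sketch only: isNumber is an assumed convenience accessor on PKToken.
// Returns the sum of the floatValue of every Number token in the string.
static double SumOfNumberTokens(NSString *s) {
    PKTokenizer *t = [PKTokenizer tokenizerWithString:s];
    PKToken *eof = [PKToken EOFToken];
    PKToken *tok = nil;
    double total = 0.0;

    while ((tok = [t nextToken]) != eof) {
        if (tok.isNumber) {
            total += tok.floatValue; // numeric value of a Number token
        }
        // Word, Symbol and QuotedString tokens are skipped here; their text
        // is available through tok.stringValue.
    }
    return total;
}
```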

More information about a token can be easily discovered using the -debugDescription method instead of the default -description. Replace the line containing NSLog above with this line:

NSLog(@"%@", [tok debugDescription]);

and each token's type will be printed as well:

 <Quoted String «"It's 123 blast-off!"»>
 <Symbol «,»>
 <Word «she»>
 <Word «said»>
 <Symbol «,»>
 <Word «and»>
 <Symbol «<=»>
 <Number «3.5»>
 <Quoted String «'ticks'»>
 <Word «later»>
 <Symbol «,»>
 <Word «it's»>
 <Word «blast-off»>
 <Symbol «!»>

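As you can see from the output, PKTokenizer is configured by default to group characters into sensible tokens, including single- and double-quoted strings, multi-character symbols (<=), decimal numbers (3.5), and words with internal apostrophes or hyphens (it's, blast-off). Whitespace and both comment styles (// and /* */) are silently consumed.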

The PKTokenizer class is very flexible, and all of those features are configurable.
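The following is a hedged sketch of what such configuration might look like, not a definitive reference. It assumes the tokenizer exposes its per-token-type state objects as symbolState, wordState, commentState, and whitespaceState, and that those states offer the methods and properties used below; every one of these names should be verified against the framework headers. The function name MakeCustomTokenizer and the umbrella header name are likewise assumptions made for this illustration.

```objc
#import <Foundation/Foundation.h>
#import <ParseKit/ParseKit.h> // assumed umbrella header name

// Sketch only: the state properties and the methods called on them are
// assumptions to verify against the PKTokenizer and tokenizer-state headers.
static PKTokenizer *MakeCustomTokenizer(NSString *s) {
    PKTokenizer *t = [PKTokenizer tokenizerWithString:s];

    // Ask for "!==" to be recognized as a single multi-character symbol.
    [t.symbolState add:@"!=="];

    // Intended to let '.' appear inside words, so dotted names stay together.
    [t.wordState setWordChars:YES from:'.' to:'.'];

    // Report comments and whitespace as tokens instead of consuming them.
    t.commentState.reportsCommentTokens = YES;
    t.whitespaceState.reportsWhitespaceTokens = YES;

    return t;
}
```

A tokenizer configured this way is used exactly like the default one: call nextToken until the EOF token is returned.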


Parsing

ParseKit also includes a collection of token parser subclasses (of the abstract PKParser class), including collection parsers such as PKAlternation, PKSequence, and PKRepetition, as well as terminal parsers such as PKWord, PKNum, PKSymbol, and PKQuotedString. Also included are parser subclasses that work on individual characters, such as PKChar, PKDigit, and PKSpecificChar. These character parsers are useful for tasks like regular expression parsing. Generally speaking, though, the token parsers will be more useful and interesting.

The parser classes follow the Composite pattern. Programs can build a composite parser in Objective-C (rather than in a separate language, as with lex and yacc) from a collection of terminal parsers composed into alternations, sequences, and repetitions, and in this way represent an infinite number of languages.

Parsers built from ParseKit are non-deterministic, recursive descent parsers, which basically means they trade some performance for ease of user programming and simplicity of implementation.

Here is an example of how one might build a parser for a simple voice-search command language (note: ParseKit does not include any kind of speech recognition technology). The language consists of:

search google for? <search-term>
...

	[self parseString:@"search google 'iphone'"];
...
	
- (void)parseString:(NSString *)s {
	PKSequence *parser = [PKSequence sequence];

	[parser add:[[PKLiteral literalWithString:@"search"] discard]];
	[parser add:[[PKLiteral literalWithString:@"google"] discard]];

	PKAlternation *optionalFor = [PKAlternation alternation];
	[optionalFor add:[PKEmpty empty]];
	[optionalFor add:[PKLiteral literalWithString:@"for"]];

	[parser add:[optionalFor discard]];

	PKParser *searchTerm = [PKQuotedString quotedString];
	[searchTerm setAssembler:self selector:@selector(workOnSearchTermAssembly:)];
	[parser add:searchTerm];

	PKAssembly *result = [parser bestMatchFor:[PKTokenAssembly assemblyWithString:s]];
	
	NSLog(@" %@", result);

	// output:
	//  ['iphone']search/google/'iphone'^
}

...

- (void)workOnSearchTermAssembly:(PKAssembly *)a {
	PKToken *t = [a pop]; // a QuotedString token with a stringValue of 'iphone'
	[self doGoogleSearchForTerm:t.stringValue];
}
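
The -doGoogleSearchForTerm: method is left undefined above. One possible building block for it is sketched here using only Foundation. The helper name DemoSearchURLForQuotedTerm and the search URL layout are assumptions made for this illustration and are not part of ParseKit. Because the token's stringValue still carries its quote characters, the helper strips them before building the URL. NSURLComponents and NSURLQueryItem require OS X 10.10 or iOS 8 and later.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper, not part of ParseKit: turns a quoted token string
// such as 'iphone' into a web search URL.
static NSURL *DemoSearchURLForQuotedTerm(NSString *quoted) {
    // Strip the leading and trailing quote characters kept in stringValue.
    NSCharacterSet *quotes =
        [NSCharacterSet characterSetWithCharactersInString:@"'\""];
    NSString *term = [quoted stringByTrimmingCharactersInSet:quotes];

    // NSURLComponents percent-encodes the query value when building the URL.
    NSURLComponents *components = [[NSURLComponents alloc] init];
    components.scheme = @"https";
    components.host = @"www.google.com";   // assumed search endpoint
    components.path = @"/search";
    components.queryItems =
        @[[NSURLQueryItem queryItemWithName:@"q" value:term]];
    return components.URL;
}
```

An implementation of -doGoogleSearchForTerm: could pass t.stringValue to this helper and then hand the resulting URL to whatever the application uses to open links.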