Added some package docs.
parent 2293627232
commit 5904da9677
@@ -0,0 +1,51 @@
// Package parsekit provides tooling for building parsers using recursive
// descent and parser/combinator methodology.
//
// The two main components for parsing are the subpackages 'tokenize' and 'parse'.
//
// TOKENIZE
//
// The tokenize package's focus is to take UTF-8 input data and to produce
// tokens from that input: bits and pieces that can be extracted from the
// input data and that can be recognized by the parser.
//
// Traditionally, a tokenizer would produce general tokens (like 'numbers',
// 'plus sign', 'letters') without caring at all about the actual structure
// or semantics of the input; that would be the task of the parser.
//
// I said 'traditionally', because the tokenize package implements a
// parser/combinator-style parser, which makes it easy to construct complex
// tokenizers that are parsers in their own right. You can even write a
// tokenizer and use it in a stand-alone manner
// (see examples - Dutch Postcode).
//
// PARSE
//
// The parse package's focus is to interpret the tokens as provided by the
// tokenizer. The intended style for the parser code is a left-to-right
// recursive descent parser: a state machine constructed from recursive
// function calls.
//
// This might sound intimidating if you're not familiar with the terminology,
// but don't worry about it. It simply means that you implement your parser
// by writing functions that know how to handle various parts of the input,
// and that these functions invoke each other based on the input tokens that
// are found, going from left to right over the input
// (see examples - Hello Many State Parser).
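The "state machine built from function calls" idea can be sketched in plain Go. The `stateFn` pattern below is a generic illustration of that style, not parsekit's actual API:

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// parser holds the remaining input and the values collected so far.
type parser struct {
	input string
	words []string
}

// stateFn is one parsing state; it consumes some input and returns the
// next state, or nil when parsing is done.
type stateFn func(*parser) stateFn

// wordState reads one run of letters and hands control to separatorState.
func wordState(p *parser) stateFn {
	i := strings.IndexFunc(p.input, func(r rune) bool { return !unicode.IsLetter(r) })
	if i < 0 {
		i = len(p.input)
	}
	if i == 0 {
		return nil // no word where one was expected: stop
	}
	p.words = append(p.words, p.input[:i])
	p.input = p.input[i:]
	return separatorState
}

// separatorState skips a comma and space and loops back to wordState,
// or ends the machine when the input is exhausted.
func separatorState(p *parser) stateFn {
	if p.input == "" {
		return nil
	}
	p.input = strings.TrimPrefix(p.input, ", ")
	return wordState
}

func main() {
	p := &parser{input: "hello, many, state, parser"}
	for state := wordState; state != nil; {
		state = state(p) // each state decides which state runs next
	}
	fmt.Println(p.words) // [hello many state parser]
}
```

Each state function corresponds to one "part of the input it knows how to handle", and the handoff from state to state moves left to right over the input, exactly as described above.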
//
// BALANCE BETWEEN THE TWO
//
// When writing your own parser using parsekit, you will have to find a good
// balance between the responsibilities of the tokenizer and the parser.
// The tokenizer could provide anything from a stream of individual UTF-8
// runes (where the parser will have to do all the work) to a fully parsed
// and tokenized document for the parser to interpret.
//
// In general, recognizing input data belongs in a tokenizer, while
// interpreting input data belongs in a parser. For example, you could
// perfectly well write parser code that takes individual digit tokens and
// checks whether those make up a phone number, but it is a lot easier to
// have that handled by a tokenizer.
//
// When all you need is to recognize some data, maybe normalize it and
// extract some bits from it, then you might not even require a parser:
// a stand-alone tokenizer can do all that.
package parsekit
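To make the division of labor concrete, here is a minimal sketch of the phone-number case. The helper names are illustrative (not parsekit's API), and the ten-digit rule is an assumption made for the example: the tokenizer only recognizes digit groups, and the parser-side check interprets them:

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// tokenizeDigits recognizes runs of digits in the input, discarding
// separators. Recognition work like this belongs in the tokenizer.
func tokenizeDigits(input string) []string {
	return strings.FieldsFunc(input, func(r rune) bool {
		return !unicode.IsDigit(r)
	})
}

// isPhoneNumber interprets the recognized digit groups; here we assume,
// for the sake of the example, that a valid number has ten digits in
// total. Interpretation work like this belongs in the parser.
func isPhoneNumber(groups []string) bool {
	total := 0
	for _, g := range groups {
		total += len(g)
	}
	return total == 10
}

func main() {
	groups := tokenizeDigits("010-123 45 67")
	fmt.Println(groups, isPhoneNumber(groups)) // [010 123 45 67] true
}
```

Pushing the digit-run recognition into the tokenizer keeps the interpretation step down to a simple length check, which is the kind of balance the text above recommends.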