go-parsekit/parsekit.go

// Package parsekit provides tooling for building parsers using recursive descent and parser/combinator methodology.
//
// The two main components for parsing are subpackages 'tokenize' and 'parse'.
//
// TOKENIZE
//
// The tokenize package's focus is to take input data and to produce
// tokens from that input: bits and pieces that are extracted from the
// input data and that the parser can recognize.
//
// Traditionally a tokenizer would produce general tokens (like 'numbers',
// 'plus sign', 'letters') without caring at all about the actual structure
// or semantics of the input. That would be the task of the parser.
//
// I said 'traditionally', because the tokenize package provides a
// parser/combinator-style parser, which makes it easy to construct
// complex tokenizers that are parsers in their own right.
// You can even write a tokenizer and use it in a stand-alone manner
// (see examples - Dutch Postcode).
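//
// To give a feel for the combinator methodology, here is a minimal
// from-scratch sketch (this is not the parsekit API; all names below
// are made up for illustration): small matcher functions are combined
// into larger ones, and the combined result is itself a complete parser.
//
//	// Matcher reports how many bytes of input it matched (-1 = no match).
//	type Matcher func(input string) int
//
//	// Digit matches a single ASCII digit.
//	func Digit(input string) int {
//		if len(input) > 0 && input[0] >= '0' && input[0] <= '9' {
//			return 1
//		}
//		return -1
//	}
//
//	// Seq combines matchers into one that applies them in sequence.
//	func Seq(matchers ...Matcher) Matcher {
//		return func(input string) int {
//			total := 0
//			for _, m := range matchers {
//				n := m(input[total:])
//				if n < 0 {
//					return -1
//				}
//				total += n
//			}
//			return total
//		}
//	}
//
//	// FourDigits is a parser in its own right.
//	var FourDigits = Seq(Digit, Digit, Digit, Digit)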
//
// PARSE
//
// The parse package's focus is to interpret the tokens as provided
// by the tokenizer. The intended style for the parser code is a left-to-right
// recursive descent parser state machine, constructed from recursive
// function calls.
//
// This might sound intimidating if you're not familiar with the terminology,
// but don't worry about that. It simply means that you implement your parser
// by writing functions that know how to handle various parts of the input,
// and these functions invoke each other based on the input tokens that are
// found, going from left to right over the input
// (see examples - Hello Many State Parser).
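//
// As a concrete sketch of that style (again hypothetical code, not the
// parse package API), here is a tiny state machine that parses input
// like "hello, <name>!". Each state is a function, and a state hands
// control to the next state by calling it (imports needed: "errors"
// and "strings"):
//
//	type parser struct {
//		input string
//		pos   int
//	}
//
//	// startOfGreeting is the initial state.
//	func (p *parser) startOfGreeting() error {
//		if !strings.HasPrefix(p.input[p.pos:], "hello, ") {
//			return errors.New("expected 'hello, '")
//		}
//		p.pos += len("hello, ")
//		return p.name() // hand over to the next state
//	}
//
//	// name consumes the name up to the closing '!'.
//	func (p *parser) name() error {
//		start := p.pos
//		for p.pos < len(p.input) && p.input[p.pos] != '!' {
//			p.pos++
//		}
//		if p.pos == start {
//			return errors.New("expected a name")
//		}
//		return p.endOfGreeting()
//	}
//
//	// endOfGreeting is the final state.
//	func (p *parser) endOfGreeting() error {
//		if p.pos >= len(p.input) || p.input[p.pos] != '!' {
//			return errors.New("expected '!'")
//		}
//		return nil
//	}
//
// Running (&parser{input: "hello, John!"}).startOfGreeting() returns nil,
// while a malformed greeting returns a descriptive error.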
//
// BALANCE BETWEEN THE TWO
//
// When writing your own parser using parsekit, you will have to find a
// good balance between the responsibilities of the tokenizer and the parser.
// The tokenizer could provide anything from a stream of individual bytes
// (where the parser will have to do all the work) to a fully parsed
// and tokenized document for the parser to interpret.
//
// In general, recognizing input data belongs in a tokenizer, while interpreting
// input data belongs in a parser. For example, you could perfectly well
// write parser code that takes individual digit tokens and checks whether
// they make up a phone number, but it is a lot easier to let a tokenizer
// handle that.
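//
// In terms of the hypothetical Matcher sketch in the TOKENIZE section
// above, the difference is a single tokenizer rule versus a parser loop
// over digit tokens (a 10-digit number is used purely as an example):
//
//	// Recognizing a phone number as one tokenizer rule.
//	var PhoneNumber = Seq(
//		Digit, Digit, Digit, Digit, Digit,
//		Digit, Digit, Digit, Digit, Digit,
//	)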
//
// When all you need is to recognize some data, maybe normalize it and extract
// some bits from it, then you might not even require a parser. A stand-alone
// tokenizer can do all that.
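//
// Continuing the hypothetical sketches above, stand-alone use amounts
// to no more than calling the tokenizer directly:
//
//	if PhoneNumber("0201234567") >= 0 {
//		// recognized; normalize or extract bits as needed
//	}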
package parsekit