// Package parsekit provides tooling for building parsers using recursive
// descent and parser/combinator methodology.
//
// The two main components for parsing are the subpackages 'tokenize' and 'parse'.
//
// TOKENIZE
//
// The tokenize package's focus is to take input data and to produce tokens
// from that input: bits and pieces that can be extracted from the input data
// and that can be recognized by the parser.
//
// Traditionally, a tokenizer would produce generic tokens (like 'number',
// 'plus sign', 'letter') without caring at all about the actual structure
// or semantics of the input. That would be the task of the parser.
//
// I said 'traditionally', because the tokenize package provides a
// parser/combinator-style parser, which makes it easy to construct complex
// tokenizers that are parsers in their own right. You can even write a
// tokenizer and use it in a stand-alone manner (see examples - Dutch
// Postcode).
//
// PARSE
//
// The parse package's focus is to interpret the tokens as provided by the
// tokenizer. The intended style for the parser code is a left-to-right
// recursive descent parser: a state machine constructed from recursive
// function calls.
//
// This might sound intimidating if you're not familiar with the terminology,
// but don't worry about that. It simply means that you implement your parser
// by writing functions that know how to handle various parts of the input,
// and that these functions invoke each other based on the input tokens that
// are found, going from left to right over the input (see examples - Hello
// Many State Parser). A minimal stand-alone sketch of this style is included
// at the bottom of this file.
//
// BALANCE BETWEEN THE TWO
//
// When writing your own parser using parsekit, you will have to find a good
// balance between the responsibilities of the tokenizer and the parser. The
// tokenizer could provide anything from a stream of individual bytes (where
// the parser would have to do all the work) to a fully parsed and tokenized
// document for the parser to interpret.
//
// In general, recognizing input data belongs in a tokenizer, while
// interpreting input data belongs in a parser. You could, for example,
// perfectly well write parser code that takes individual digit tokens and
// checks whether those make up a phone number, but it is a lot easier to
// have that handled by a tokenizer.
//
// When all you need is to recognize some data, maybe normalize it, and
// extract some bits from it, then you might not even need a parser. A
// stand-alone tokenizer can do all that (a sketch of this idea is also
// included at the bottom of this file).
package parsekit
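
// What follows is a minimal, stand-alone sketch of the left-to-right
// recursive descent style described under PARSE above. It is an
// illustration only: it uses no parsekit API, and the function names
// (parseSignedNumber, parseDigits) are made up for this sketch; see the
// package examples for real parsekit usage. Each function handles one
// part of the input and hands control to the next one, moving from left
// to right over the input.

// parseSignedNumber recognizes an optional '-' sign followed by one or
// more decimal digits, e.g. "-42". It reports the parsed value and
// whether the complete input matched.
func parseSignedNumber(input string) (value int, ok bool) {
	pos := 0
	negative := false

	// First state: handle the optional sign, then move on to the digits.
	if pos < len(input) && input[pos] == '-' {
		negative = true
		pos++
	}

	// Second state: the digits function consumes the rest of the input.
	value, pos, ok = parseDigits(input, pos)
	if !ok || pos != len(input) {
		return 0, false
	}
	if negative {
		value = -value
	}
	return value, true
}

// parseDigits consumes one or more decimal digits, starting at pos.
func parseDigits(input string, pos int) (value, newPos int, ok bool) {
	start := pos
	for pos < len(input) && input[pos] >= '0' && input[pos] <= '9' {
		value = value*10 + int(input[pos]-'0')
		pos++
	}
	return value, pos, pos > start
}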
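
// The function below sketches the 'stand-alone tokenizer' idea from the
// BALANCE BETWEEN THE TWO section: it recognizes input, normalizes it,
// and extracts the interesting bits, all without a separate parsing
// stage. Again, this is an illustration that uses no parsekit API; the
// name normalizePhoneNumber and the 10-digit phone number format are
// assumptions made for this sketch.

// normalizePhoneNumber accepts digits optionally mixed with spaces,
// dashes and parentheses (e.g. "(012) 345-6789"), and reports the
// normalized digit string if the input forms a 10-digit phone number.
func normalizePhoneNumber(input string) (digits string, ok bool) {
	for i := 0; i < len(input); i++ {
		switch c := input[i]; {
		case c >= '0' && c <= '9':
			digits += string(c)
		case c == ' ' || c == '-' || c == '(' || c == ')':
			// Separators are recognized, but dropped while normalizing.
		default:
			// Anything else means the input is not a phone number.
			return "", false
		}
	}
	return digits, len(digits) == 10
}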