From 5904da9677888465a85ae952669f79a9eb1b4cd2 Mon Sep 17 00:00:00 2001
From: Maurice Makaay
Date: Tue, 18 Jun 2019 22:52:17 +0000
Subject: [PATCH] Added some package docs.

---
 parsekit.go | 51 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 51 insertions(+)
 create mode 100644 parsekit.go

diff --git a/parsekit.go b/parsekit.go
new file mode 100644
index 0000000..5d6aff0
--- /dev/null
+++ b/parsekit.go
@@ -0,0 +1,51 @@
+// Package parsekit provides tooling for building parsers using recursive descent and parser/combinator methodology.
+//
+// The two main components for parsing are subpackages 'tokenize' and 'parse'.
+//
+// TOKENIZE
+//
+// The tokenize package's focus is to take some UTF-8 input data and to produce
+// tokens from that input, which are bits and pieces that can be extracted
+// from the input data and that can be recognized by the parser.
+//
+// Traditionally a tokenizer would produce general tokens (like 'numbers',
+// 'plus sign', 'letters') without caring at all about the actual structure
+// or semantics of the input. That would be the task of the parser.
+//
+// I said 'traditionally', because the tokenize package implements a
+// parser/combinator-style parser, which allows you to construct complex
+// tokenizers which are parsers in their own right in an easy way.
+// You can even write a tokenizer and use it in a stand-alone manner
+// (see examples - Dutch Postcode).
+//
+// PARSE
+//
+// The parse package's focus is to interpret the tokens as provided
+// by the tokenizer. The intended style for the parser code is a left-to-right
+// recursive descent parser state machine, constructed from recursive
+// function calls.
+//
+// This might sound intimidating if you're not familiar with the terminology,
+// but don't worry about that. It simply means that you implement your parser
+// by writing functions that know how to handle various parts of the input,
+// and these functions invoke each other based on the input tokens that are
+// found, going from left to right over the input
+// (see examples - Hello Many State Parser).
+//
+// BALANCE BETWEEN THE TWO
+//
+// When writing your own parser using parsekit, you will have to find a
+// good balance between the responsibilities for the tokenizer and the parser.
+// The tokenizer could provide anything from a stream of individual UTF-8 runes
+// (where the parser will have to do all the work) to a fully parsed
+// and tokenized document for the parser to interpret.
+//
+// In general, recognizing input data belongs in a tokenizer, while interpreting
+// input data belongs in a parser. You can for example perfectly well write
+// parser code that takes individual digit tokens and checks if those make up
+// a phone number, but it is a lot easier to have that handled by a tokenizer.
+//
+// When all you need is to recognize some data, maybe normalize it and extract
+// some bits from it, then you might not even require a parser. A stand-alone
+// tokenizer can do all that.
+package parsekit
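
Note on the docs above: the split between recognizing input (tokenizer) and
interpreting it (parser), and the idea of parser functions that invoke each
other while moving left to right, can be sketched in plain Go. This is a
minimal illustration that uses only the standard library. It does not use the
parsekit API, and every name in it (token, tokenize, parser, expectNumber,
expectPlusOrEnd, Sum) is hypothetical and chosen for this example only.

```go
package main

import (
	"fmt"
	"strconv"
	"unicode"
)

// tokenKind and token are hypothetical types for this sketch; they are not
// part of the parsekit API.
type tokenKind int

const (
	numberToken tokenKind = iota
	plusToken
)

type token struct {
	kind  tokenKind
	value string
}

// tokenize recognizes numbers and plus signs and skips whitespace.
// It only recognizes input; it does not interpret it.
func tokenize(input string) ([]token, error) {
	var tokens []token
	runes := []rune(input)
	for i := 0; i < len(runes); {
		r := runes[i]
		switch {
		case unicode.IsSpace(r):
			i++
		case r == '+':
			tokens = append(tokens, token{plusToken, "+"})
			i++
		case r >= '0' && r <= '9':
			start := i
			for i < len(runes) && runes[i] >= '0' && runes[i] <= '9' {
				i++
			}
			tokens = append(tokens, token{numberToken, string(runes[start:i])})
		default:
			return nil, fmt.Errorf("unexpected character %q at position %d", r, i)
		}
	}
	return tokens, nil
}

// parser walks the tokens from left to right and keeps a running sum.
type parser struct {
	tokens []token
	pos    int
	sum    int
}

// expectNumber handles a number token, then hands over to expectPlusOrEnd.
func (p *parser) expectNumber() error {
	if p.pos >= len(p.tokens) || p.tokens[p.pos].kind != numberToken {
		return fmt.Errorf("expected a number at token %d", p.pos)
	}
	n, err := strconv.Atoi(p.tokens[p.pos].value)
	if err != nil {
		return err
	}
	p.sum += n
	p.pos++
	return p.expectPlusOrEnd()
}

// expectPlusOrEnd accepts the end of the input, or a plus sign that must be
// followed by another number.
func (p *parser) expectPlusOrEnd() error {
	if p.pos == len(p.tokens) {
		return nil
	}
	if p.tokens[p.pos].kind != plusToken {
		return fmt.Errorf("expected '+' at token %d", p.pos)
	}
	p.pos++
	return p.expectNumber()
}

// Sum tokenizes the input and then interprets the resulting tokens.
func Sum(input string) (int, error) {
	tokens, err := tokenize(input)
	if err != nil {
		return 0, err
	}
	p := &parser{tokens: tokens}
	if err := p.expectNumber(); err != nil {
		return 0, err
	}
	return p.sum, nil
}

func main() {
	total, err := Sum("1 + 22 + 3")
	fmt.Println(total, err) // prints: 26 <nil>
}
```

Because tokenize already groups digits into whole numbers, the parser
functions only have to decide what may come next. Had the tokenizer emitted
individual digit tokens instead, expectNumber would also have to assemble the
numbers itself, which is the balance the package docs describe.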