Added some package docs.
parent 2293627232
commit 5904da9677
@@ -0,0 +1,51 @@
// Package parsekit provides tooling for building parsers using recursive
// descent and parser/combinator methodology.
//
// The two main components for parsing are the subpackages 'tokenize' and 'parse'.
//
// TOKENIZE
//
// The tokenize package's focus is to take UTF-8 input data and to produce
// tokens from that input: bits and pieces that can be extracted from the
// input data and that can be recognized by the parser.
//
// Traditionally, a tokenizer would produce general tokens (like 'numbers',
// 'plus sign', 'letters') without caring at all about the actual structure
// or semantics of the input; that would be the task of the parser.
//
// I said 'traditionally', because the tokenize package implements a
// parser/combinator-style parser, which makes it easy to construct complex
// tokenizers that are parsers in their own right. You can even write a
// tokenizer and use it in a stand-alone manner
// (see examples - Dutch Postcode).
//
// PARSE
//
// The parse package's focus is to interpret the tokens as provided by the
// tokenizer. The intended style for the parser code is a left-to-right
// recursive descent parser: a state machine constructed from recursive
// function calls.
//
// This might sound intimidating if you're not familiar with the terminology,
// but don't worry about it. It simply means that you implement your parser
// by writing functions that know how to handle various parts of the input,
// and that these functions invoke each other based on the input tokens that
// are found, going from left to right over the input
// (see examples - Hello Many State Parser).
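The "state machine built from function calls" idea can be sketched in plain Go. The `stateFn` pattern below is a generic illustration of that style, not parsekit's actual API:

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// parser holds the remaining input and the values collected so far.
type parser struct {
	input string
	words []string
}

// stateFn is one parsing state; it consumes some input and returns the
// next state, or nil when parsing is done.
type stateFn func(*parser) stateFn

// wordState reads one run of letters and hands control to separatorState.
func wordState(p *parser) stateFn {
	i := strings.IndexFunc(p.input, func(r rune) bool { return !unicode.IsLetter(r) })
	if i < 0 {
		i = len(p.input)
	}
	if i == 0 {
		return nil // no word where one was expected: stop
	}
	p.words = append(p.words, p.input[:i])
	p.input = p.input[i:]
	return separatorState
}

// separatorState skips a comma and space and loops back to wordState,
// or ends the machine when the input is exhausted.
func separatorState(p *parser) stateFn {
	if p.input == "" {
		return nil
	}
	p.input = strings.TrimPrefix(p.input, ", ")
	return wordState
}

func main() {
	p := &parser{input: "hello, many, state, parser"}
	for state := wordState; state != nil; {
		state = state(p) // each state decides which state runs next
	}
	fmt.Println(p.words) // [hello many state parser]
}
```

Each state function corresponds to one "part of the input it knows how to handle", and the handoff from state to state moves left to right over the input, exactly as described above.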
//
// BALANCE BETWEEN THE TWO
//
// When writing your own parser using parsekit, you will have to find a good
// balance between the responsibilities of the tokenizer and the parser.
// The tokenizer could provide anything from a stream of individual UTF-8
// runes (where the parser will have to do all the work) to a fully parsed
// and tokenized document for the parser to interpret.
//
// In general, recognizing input data belongs in a tokenizer, while
// interpreting input data belongs in a parser. For example, you could
// perfectly well write parser code that takes individual digit tokens and
// checks whether those make up a phone number, but it is a lot easier to
// have that handled by a tokenizer.
//
// When all you need is to recognize some data, maybe normalize it and
// extract some bits from it, then you might not even require a parser:
// a stand-alone tokenizer can do all that.
package parsekit
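To make the division of labor concrete, here is a minimal sketch of the phone-number case. The helper names are illustrative (not parsekit's API), and the ten-digit rule is an assumption made for the example: the tokenizer only recognizes digit groups, and the parser-side check interprets them:

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// tokenizeDigits recognizes runs of digits in the input, discarding
// separators. Recognition work like this belongs in the tokenizer.
func tokenizeDigits(input string) []string {
	return strings.FieldsFunc(input, func(r rune) bool {
		return !unicode.IsDigit(r)
	})
}

// isPhoneNumber interprets the recognized digit groups; here we assume,
// for the sake of the example, that a valid number has ten digits in
// total. Interpretation work like this belongs in the parser.
func isPhoneNumber(groups []string) bool {
	total := 0
	for _, g := range groups {
		total += len(g)
	}
	return total == 10
}

func main() {
	groups := tokenizeDigits("010-123 45 67")
	fmt.Println(groups, isPhoneNumber(groups)) // [010 123 45 67] true
}
```

Pushing the digit-run recognition into the tokenizer keeps the interpretation step down to a simple length check, which is the kind of balance the text above recommends.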