go-parsekit

179ce57826 New read buffer peek options for extra performance. main Maurice Makaay 2019-08-01 13:26:02 +0000
f70bf8d074 Speed improvements Maurice Makaay 2019-07-29 23:51:09 +0000
b9cc91c0ae More speed improvements. Maurice Makaay 2019-07-29 22:52:38 +0000
8ef9aed096 Switching from various Byte and Rune handlers to single Char handlers. The Char handlers determine on their own if they should handle things in byte or rune mode. Maurice Makaay 2019-07-29 09:45:25 +0000
e0b1039abd Made a big jump in performance on big files with lots of comments, by reading in chunks till end of line, instead of byte-by-byte. Maurice Makaay 2019-07-28 23:50:58 +0000
53ae659ef6 Moving results to their own light weight tokenize.API.Result. Maurice Makaay 2019-07-28 22:35:33 +0000
eda71f304e Dropped PeekWithResult(), because it does not add any substantial performance. A simpler API which is virtually as fast wins any day. Maurice Makaay 2019-07-27 12:26:02 +0000
fcdd3d4ea7 Wow, going nicely! Some more milliseconds stripped. Maurice Makaay 2019-07-26 22:56:12 +0000
daf3b9838f Backup work on dropping forking support. Maurice Makaay 2019-07-26 14:51:40 +0000
4c94374107 Getting rid of forking, the new system delivers more performance. Maurice Makaay 2019-07-26 12:14:15 +0000
87cdadae78 Hmm... this whole snapshot idea seems to work and is a valid replacement for the forking method. Maurice Makaay 2019-07-26 08:02:37 +0000
bc9e718e47 Lowering the number of forks required. Maurice Makaay 2019-07-24 22:42:40 +0000
99b0abc490 WIP on lowering the number of forks required. Maurice Makaay 2019-07-24 22:42:16 +0000
548289560b Code cleanup. Maurice Makaay 2019-07-24 11:03:02 +0000
62cd84bb74 Use zero-indexed cursor positioning data inside stackframes. This simplifies some things. Also a bit of code cleanup. Maurice Makaay 2019-07-24 10:34:24 +0000
802701ade5 Added multi-byte peeks for some performance improvements. Maurice Makaay 2019-07-23 23:23:40 +0000
7037c6d24a Fixing some naming inconsistencies. Maurice Makaay 2019-07-23 17:55:13 +0000
a968f22d45 Code cleanup, making the byte and rune inputs look as much the same as possible and get rid of some unneeded functionality. Maurice Makaay 2019-07-23 08:03:16 +0000
93d2cfa6f1 Backup work. Maurice Makaay 2019-07-22 23:28:05 +0000
cf679b2225 Backup work for next refactoring step. Maurice Makaay 2019-07-22 22:16:28 +0000
070e6a13a7 Made some nice steps, backup and continue! Maurice Makaay 2019-07-22 15:37:52 +0000
dd1159e309 Committing a bit of code cleanup before trying something bigger. Maurice Makaay 2019-07-22 07:57:05 +0000
183f5df00d Brought back some lost performance. Doing everything via api.Input/Output causes an extra level of indirection and it does not cost that much, but we do lose performance through that route. So added private methods for the API struct, which are used internally to squeeze out a bit of extra performance. Maurice Makaay 2019-07-20 23:51:08 +0000
acdf83332b Use pointers instead of values, since we're updating the structs. Maurice Makaay 2019-07-20 11:50:36 +0000
7998d05113 More efficient version of MatchOctet. Maurice Makaay 2019-07-20 01:50:12 +0000
0c057e4a9a Split up the api.go into three files: api.go, api_input.go and api_output.go. This makes it easier to manage the individual code sets. Maurice Makaay 2019-07-20 00:48:11 +0000
93c75af87f Moved Input and Output related fields from the API to their respective sub-structs. Maurice Makaay 2019-07-20 00:28:37 +0000
7d2d8dbed3 Moved input-related functions to their own API.Input struct. Maurice Makaay 2019-07-19 23:41:15 +0000
9d98c9dff7 Moving output functions to its own substruct of the API. Maurice Makaay 2019-07-19 22:57:06 +0000
458d6f60a6 A nice performance gain by making a difference between AcceptRunes/AcceptBytes and the new simpler AcceptRune/AcceptByte functions. The simpler versions are faster when only accepting a single byte or rune (which is the case in most situations). Maurice Makaay 2019-07-19 21:13:15 +0000
9a53ea9012 Working on API speed. Maurice Makaay 2019-07-19 14:44:44 +0000
31055a3cd3 Bugfix for parsekit.read: when filling the buffer, the read offset was not taken into account for determining how many bytes could be read. Maurice Makaay 2019-07-19 10:13:32 +0000
3f9c745ac4 Unit tests improved for the parsekit.read package. Maurice Makaay 2019-07-19 09:50:42 +0000
22bcf4677e Some work on simplifying the reader code, to see if I can squeeze some more performance out of that part. Maurice Makaay 2019-07-19 08:47:13 +0000
1771e237c0 Switched to a []byte backing store instead of []rune for collecting input data (we can use both bytes and runes for input in an easy way now) Maurice Makaay 2019-07-18 09:26:11 +0000
b9eeac3480 Work in progress on switching to byte stack. Committing to do some performance checks against master. Maurice Makaay 2019-07-18 08:06:26 +0000
e659380a5f Implemented an efficient M.DropUntilEndOfLine handler, which is now used in the TOML parser for a dramatic speed increase on comment parsing. Maurice Makaay 2019-07-17 23:51:37 +0000
64f92696b2 Fixed unit tests for the new allocation behavior. Maurice Makaay 2019-07-17 23:03:14 +0000
0a4e44b8f8 Allow for bufio Readers that deliver data in chunks (like our unit test Reader) Maurice Makaay 2019-07-17 23:03:00 +0000
6d3eacdcae Allocate read buffer in 1024 byte chunks, and read the data in chunks as well. This is more efficient than reading byte by byte. Maurice Makaay 2019-07-17 22:12:37 +0000
5e3e4b0f0a Yay! First version for which parsing long.toml drops below 100ms! Got an outcome of 93ms. Almost down to BurntSushi's speed level, but still with a generic parser backing. Looking good!! Maurice Makaay 2019-07-16 23:34:01 +0000
ddd0ed49f6 Don't resize the stack slices, since we keep track of their starts and ends anyway. Maurice Makaay 2019-07-16 12:19:50 +0000
06faabdfe2 Small bugfix for the rune-to-byte-fallback code and added byte-support to the Str and StrNoCase matchers. Maurice Makaay 2019-07-16 07:35:06 +0000
4cfdbafa6e Further switching to byte-based input handling. Maurice Makaay 2019-07-16 07:05:10 +0000
0362763e83 Switched to byte input for built-in tokenize.Handler functions. Maurice Makaay 2019-07-15 22:48:00 +0000
d4492e4f0a Bytes reader working, now carry on switching to byte reading in the tokenizer code. Maurice Makaay 2019-07-15 20:03:05 +0000
17935b7534 Further performance optimization and code cleanup. Maurice Makaay 2019-07-12 21:32:40 +0000
56b8df3aab Removed loop protection code. This is useful, but it puts a performance burden on the code when doing it by keeping track of actual callers through the call stack. Maybe to be reintroduced in a future version with something like a simple counter and a maximum depth-style protection. Maurice Makaay 2019-07-12 12:33:18 +0000
09746c0d2e Speeding up the code some more. Big step was made by simplifying the cursor, continuing with that in the next commit. Maurice Makaay 2019-07-12 08:02:04 +0000
7116aa47df Squishing out more performance. Maurice Makaay 2019-07-12 00:21:02 +0000
a4eda45d2c Made all unit tests work again. Maurice Makaay 2019-07-11 14:55:08 +0000
3c9a678d7a Fixed the ModifyDrop() behavior. It worked, but it caused memory build-up in the old implementation. Maurice Makaay 2019-07-11 14:52:12 +0000
c532af67ca Optimization round completed (for now :-) All tests successful. Maurice Makaay 2019-07-11 12:43:57 +0000
7598b62dd0 Finalized the work-through of the new version of the tokenizer code. Maurice Makaay 2019-07-10 20:36:21 +0000
48d7fda9f8 New implementation for performance. Maurice Makaay 2019-07-10 11:26:47 +0000
7795588fe6 Speed improvement work. Maurice Makaay 2019-07-08 21:57:32 +0000
5fa0b5eace Backup work on performance improvements. Maurice Makaay 2019-07-08 14:31:01 +0000
23ca3501e1 Backup changes for performance fixes. Maurice Makaay 2019-07-08 00:12:30 +0000
7bc7fda593 Backup changes for performance fixes. Maurice Makaay 2019-07-05 15:07:07 +0000
5e9879326a Backup work to performance tuning. Maurice Makaay 2019-07-05 08:08:42 +0000
583197c37a Made a distinction between MatchWhitespace() and MatchUnicodeSpace(). Maurice Makaay 2019-07-04 11:32:07 +0000
d96511ce0a Backup work. Maurice Makaay 2019-07-03 15:46:43 +0000
92e6eec7f3 implemented Cursor.moveByRune(), to get rid of some useless rune->string conversion for updating cursor positions. Maurice Makaay 2019-06-30 10:16:46 +0000
4b0309453f Added a feature to run the parser without any of the built-in sanity checks (like loop checks). This improved performance, but at the risk of missing some runtime issues with the parser implementation. Maurice Makaay 2019-06-30 01:05:54 +0000
7ce12d1632 A few small changes used for TOML support. Maurice Makaay 2019-06-23 12:06:31 +0000
5904da9677 Added some package docs. Maurice Makaay 2019-06-18 22:52:17 +0000
2293627232 Small code cleanup things, mainly backing up the changes. Maurice Makaay 2019-06-18 15:46:09 +0000
99654c2f9e Simplified some internal code, which also fixes a bug with correct error reporting from within parsekit in various edge cases. Maurice Makaay 2019-06-17 13:59:31 +0000
cdfc4ce52c More documentation and examples. Maurice Makaay 2019-06-12 16:17:13 +0000
1a280233b0 Got rid of the testify dependency. My testing needs are so basic, that there's no need for this full fledged testing library. Maurice Makaay 2019-06-12 15:25:15 +0000
cef6ae1bc4 Working on documentation. Maurice Makaay 2019-06-12 15:24:09 +0000
27c97ae902 Big overhaul on separating packages for code containment. Maurice Makaay 2019-06-12 14:30:46 +0000
1f0e0fcc17 Splitting up functionality in packages, intermediate step. Maurice Makaay 2019-06-11 22:23:30 +0000
0f7b4e0d26 Added a few syntactic sugar methods for ParseHandler. Maurice Makaay 2019-06-11 09:09:41 +0000
65895ac502 Making parsekit.reader both simpler and more complex (more complex by adopting some buffer allocation logic from the built-in bytes package, to not be copying memory all the time during the read operations). Maurice Makaay 2019-06-09 21:55:01 +0000
9656cd4449 The parsekit.reader.Reader now caches error messages that are returned from the embedded io.Reader. When an error is returned, the read offset and the error are stored. When later on, the same or a higher offset is requested, the error is returned again. This way the code will work for Readers that do not repeatedly return the correct error when calling the Read() method multiple times after a first error has occurred. Maurice Makaay 2019-06-09 19:42:20 +0000
76336e883e Removed the use of Error.Full(). The default Error() method now includes the extra data from Full() (line and column offset) Maurice Makaay 2019-06-09 15:20:44 +0000
add28feb33 In the spirit of Go, slimmed down the ParseAPI interface. I'm no longer using ParseAPI.On(..).<DoSomething>(), but now it's simply ParseAPI.<DoSomething>(). I also dropped the difference between a Stay() and an Accept(). All that is possible now is ParseAPI.Peek() and ParseAPI.Accept(). Maurice Makaay 2019-06-09 10:25:49 +0000
9f5caa2024 Backup work. Maurice Makaay 2019-06-08 22:48:56 +0000
05ae55c487 Brought the examples up-to-date with the latest code. All are working correctly now. Maurice Makaay 2019-06-07 16:20:32 +0000
40bad51064 Improved a few TokenHandlers by letting them make use of the new MatchRuneByCallback method, instead of having them implement their own logic. Maurice Makaay 2019-06-07 15:57:53 +0000
9a5bf8b9af Further code cleaning for the interaction between ParseAPI and TokenAPI. Extra atoms added, also one based on a callback which can accept single runes based on that callback function. Maurice Makaay 2019-06-07 15:48:49 +0000
98d2db0374 Moved Reader into its own package. Maurice Makaay 2019-06-07 10:55:55 +0000
6d92e1dc68 Merged functionality of p.Expects(string) and p.UnexpectedInput(). It is now simply p.UnexpectedInput(string). This makes the naming of unexpected input not as magical, but explicit (which is a GoodThing). With one of the earlier incarnations of parsekit it did make sense, but it went in a way in which explicit is more idiomatic for the package. Maurice Makaay 2019-06-07 07:56:24 +0000
3094b09284 Adding documentation and getting the interactions between ParseAPI and TokenAPI cleaned up a bit. Maurice Makaay 2019-06-07 07:26:41 +0000
c0389283bd Added input check for MatchIntegerBetween() Maurice Makaay 2019-06-05 22:21:34 +0000
3d791233e0 Added a lot of IP-address-related TokenHandlers, so we can now process IPv4 addresses, IPv6 addresses, CIDR netmasks, IPv4 dotted quad netmasks, IPv4Net (ipv4 + mask) and IPv6Mask (ipv6 + mask). Maurice Makaay 2019-06-05 22:16:09 +0000
05585db341 Normalizing error handling, to always include the caller location in errors. This makes debugging a lot easier for users of the package, because it doesn't say stuff like 'Method() was called incorrectly', but instead something like 'Method() was called incorrectly at /path/to/file.go:1234'. Maurice Makaay 2019-06-05 10:07:50 +0000
75373e5ed5 Big simplification run once more, cleaned up code, added tests and examples, made stuff unexported where possible, to slim down the exported interface. Maurice Makaay 2019-06-04 23:15:02 +0000
4580962fb8 Backup a load of work on typed token support, making it easy to produce tokens directly from parser/combinator-based parsing rules. Maurice Makaay 2019-06-04 00:03:08 +0000
21f1aa597c Made the panic() calls (which basically indicate parser implementation bugs) more useful by referencing from where illegal calls were made. Maurice Makaay 2019-05-29 07:24:27 +0000
2fa5b8d0f4 OCD ..OCD ...OCD ... Maurice Makaay 2019-05-29 00:01:24 +0000
1e7ec7553a Tiny fix in variable naming, because the test had grown in a different direction. Maurice Makaay 2019-05-28 23:59:02 +0000
e1534f678e Simplified calculator 2 example. Maurice Makaay 2019-05-28 23:51:19 +0000
11883b06ac Added a unit test for the actual parser loop issue that I ran into myself. This one will not bite me again! Maurice Makaay 2019-05-28 23:13:28 +0000
d31d09abf0 Added crude loop protection to the parser, which should prevent parsers running in circles (happened to me a few times too). Maurice Makaay 2019-05-28 23:01:23 +0000
7aff3fc43e Added a nice example that shows how a []string-based type can be turned into a parser that fills its own slice elements during parsing. Maurice Makaay 2019-05-28 14:38:04 +0000
2d851103e5 Cleanup of stuff that I don't need anymore, because it has been fully deprecated. Also added some tests for panic() calls in parsekit, which brings test coverage to 100%. It's not a goal as such, but it's good to know that I got there without cheaty tests :) Maurice Makaay 2019-05-28 13:41:58 +0000
3dfa99c965 Modified all examples and tests to make use of the new ideas on how to keep parsing state. After this commit, I can cleanup a lot of stuff from the emitting loop-based parser which was basically crap for complex parsers. Maurice Makaay 2019-05-28 10:42:46 +0000
980c18099e A small change to the computation interpreter to get rid of one useless level of recursion. Maurice Makaay 2019-05-28 07:26:50 +0000
