License: GNU LGPL
2003-2004 (c) Németh László
2013- (c) Zséder Attila
make install
- Unix environment (shell, Unix tools),
- Flex lexical analyzer generator,
- M4 macro processor.
- Unix shell, or Cygwin on Windows
- sed
huntoken <input_raw_text >xml_output
- -h, --help: help
- -r: only sentence boundary detection
- -x: processing without hun_abbrev filter
- -b: break long sentences (needed to tokenize sentences longer than 4000 characters)
- -n: output without XML header and footer
- -e: tokenize English (use English abbreviations)
- -v, --version: version
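
A few invocations may make the usage line and the options above more concrete. This is only a sketch: the file names (sample.txt, sample.xml and so on) and the sample sentence are made up for illustration, and each call uses a single option, since nothing above says whether options can be combined. The calls are skipped if huntoken is not on the PATH.

```shell
# Create a small sample input (hypothetical file name and content).
printf 'Dr. Smith arrived at 5 p.m. He sat down.\n' > sample.txt

# Run the examples only if huntoken is installed and on the PATH.
if command -v huntoken >/dev/null 2>&1; then
    # Default run: raw text on stdin, XML on stdout.
    huntoken <sample.txt >sample.xml

    # English input: use the English abbreviations.
    huntoken -e <sample.txt >sample_en.xml

    # Sentence boundary detection only.
    huntoken -r <sample.txt >sample_sentences.xml

    # Output without the XML header and footer.
    huntoken -n <sample.txt >sample_body.xml
else
    echo "huntoken not found on PATH; skipping examples" >&2
fi
```

Because huntoken reads standard input and writes standard output, it can also be placed in the middle of a pipeline instead of using redirected files.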
See the Flex sources and the huntoken shell program.
László Németh
Attila Zséder