A tokenizer splits a body of text into individual words, which is useful, for example, in keyword searching. Other implementations of TinySegmenter are:
- Perl: http://search.cpan.org/~jiro/Text-TinySegmenter-0.01/lib/Text/TinySegmenter.pm
- JavaScript: http://chasen.org/~taku/software/TinySegmenter/
- Python: http://lilyx.net/pages/tinysegmenterp.html
- Objective-C: http://blog.bornneet.com/Entry/276/
- Lisp: http://miyamuko.s56.xrea.com/xyzzy/tiny-segmenter.html
- Ruby: http://d.hatena.ne.jp/llamerada/20080224/1203818061
- VBA: http://pub.ne.jp/arihagne/?cat_id=123314
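
To illustrate the keyword-searching use case mentioned above, here is a minimal Python sketch of how tokenizer output can feed a search index. It is not the TinySegmenter algorithm: `tokenize` is a hypothetical stand-in that simply extracts runs of word characters with a regular expression, and `build_index` and `search` are illustrative names that do not come from any of the implementations listed above.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Hypothetical stand-in tokenizer: lowercased runs of word characters."""
    return re.findall(r"\w+", text.lower())


def build_index(documents):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, keyword):
    """Return sorted ids of documents that contain the keyword as a whole token."""
    tokens = tokenize(keyword)
    if len(tokens) != 1:
        # This sketch only handles single-word queries.
        return []
    return sorted(index.get(tokens[0], set()))


documents = {
    1: "A tokenizer splits text into words.",
    2: "Keyword search uses an index of words.",
}
index = build_index(documents)
print(search(index, "words"))
print(search(index, "Tokenizer"))
```

Because the index is keyed by whole tokens, a query only matches documents in which the tokenizer produced exactly that word. A regular expression like the one above relies on spaces and punctuation to find word boundaries, so text written without spaces between words would need a dedicated segmenter in place of `tokenize`.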