- Tokenization of raw text is a standard pre-processing step for many NLP tasks. For English, tokenization usually involves punctuation splitting and separation of some affixes like possessives. Other languages require more extensive token pre-processing, which is usually called segmentation.
- View on NuGet: http://www.nuget.org/packages/Stanford.NLP.Segmenter
Here are the packages that version 18.104.22.168 of Stanford.NLP.Segmenter depends on.net.sf.mpxj-ikvm : (7.6.2)
Installing with NuGet
PM> Install-Package Stanford.NLP.Segmenter -Version 22.214.171.124
Packages that Depend on Stanford.NLP.Segmenter
No packages were found that depend on version 126.96.36.199.