Utilities for working with multilingual text in Lucene.

``CyrillicTransliteratingFilter`` injects a Latin transliteration at the
same position as each token that contains Cyrillic characters. For example,
this makes it possible to match the text ``Pasternak’s Повесть`` with
the query ``pasternak's povest``.
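
The idea can be sketched in plain Java without any Lucene dependency. This is an illustration under stated assumptions, not the filter's actual implementation: the class ``TransliterationSketch``, its ``Token`` type, and the tiny mapping table are all hypothetical, and the table covers only the letters of the example word. The one Lucene concept it borrows is the position increment, where an increment of 0 places a token at the same position as the one before it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class TransliterationSketch {
    // Hypothetical, deliberately tiny table: only the letters of the example
    // word, written as Unicode escapes. The soft sign maps to nothing.
    private static final String CYRILLIC = "\u043F\u043E\u0432\u0435\u0441\u0442\u044C";
    private static final String[] LATIN = {"p", "o", "v", "e", "s", "t", ""};

    /** A term plus its position increment (0 = same position as the previous token). */
    public static final class Token {
        public final String term;
        public final int positionIncrement;

        Token(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }
    }

    /** True if any character of s belongs to the basic Cyrillic Unicode block. */
    public static boolean containsCyrillic(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (Character.UnicodeBlock.of(s.charAt(i)) == Character.UnicodeBlock.CYRILLIC) {
                return true;
            }
        }
        return false;
    }

    /** Lowercases s and replaces mapped Cyrillic letters; other characters pass through. */
    public static String transliterate(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toLowerCase(Locale.ROOT).toCharArray()) {
            int i = CYRILLIC.indexOf(c);
            if (i >= 0) {
                sb.append(LATIN[i]);
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Emits every term, followed by a same-position Latin variant for Cyrillic terms. */
    public static List<Token> expand(List<String> terms) {
        List<Token> out = new ArrayList<>();
        for (String term : terms) {
            out.add(new Token(term, 1));
            if (containsCyrillic(term)) {
                out.add(new Token(transliterate(term), 0));
            }
        }
        return out;
    }
}
```

Because the injected token shares a position with the original, a phrase query can match either form in that slot, which is what lets a Latin query find the Cyrillic text.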

``XMLTokenizer`` tokenizes an XML document, using a different Analyzer
for each language in the document, as identified by the ``lang`` attribute.
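
A rough sketch of language-based routing, using only the JDK's DOM parser. It does not reproduce ``XMLTokenizer``: the class ``LangRoutingSketch``, the ``und`` default language tag, the inheritance of ``lang`` by child elements, and the toy ``analyze`` method standing in for real per-language Analyzers are all assumptions made for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class LangRoutingSketch {

    /** Returns tokens as "lang:term" strings, in document order. */
    public static List<String> tokenize(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            List<String> out = new ArrayList<>();
            walk(root, "und", out); // "und" (undetermined) is an assumed default
            return out;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Depth-first walk; a lang attribute applies to the element and, by
    // assumption, to its descendants until another lang attribute overrides it.
    private static void walk(Node node, String lang, List<String> out) {
        if (node instanceof Element) {
            Element e = (Element) node;
            if (e.hasAttribute("lang")) {
                lang = e.getAttribute("lang");
            }
        }
        if (node.getNodeType() == Node.TEXT_NODE) {
            for (String term : analyze(lang, node.getNodeValue())) {
                out.add(lang + ":" + term);
            }
        }
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            walk(c, lang, out);
        }
    }

    // Toy stand-in for a per-language Analyzer: splits on whitespace and
    // lowercases English only, so the routing has a visible effect.
    private static List<String> analyze(String lang, String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                terms.add("en".equals(lang) ? t.toLowerCase(Locale.ROOT) : t);
            }
        }
        return terms;
    }
}
```

In a real setup, the ``analyze`` stand-in would be replaced by a lookup from language code to a Lucene Analyzer.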