Utilities for working with multilingual text in Lucene.

``CyrillicTransliteratingFilter`` injects a Latin transliteration in the
same position as each token containing Cyrillic characters. For example,
this makes it possible to match the text ``Pasternak’s Повесть`` with
the query ``pasternak's povest``.

``XMLTokenizer`` tokenizes an XML document, using a different Analyzer
for each language in the document, as identified by the ``lang`` attribute.
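
The idea behind the transliterating filter can be sketched without Lucene. The class below is an illustration only, not this project's implementation: the names ``TransliterationSketch``, ``transliterate`` and ``expand`` are hypothetical, and the romanization table is an assumed, simplified one chosen so that ``Повесть`` becomes ``povest`` as in the example above. Each output entry is written as ``term/positionIncrement``; an increment of 0 means the term is stacked on the same position as the previous one, which is how Lucene token streams represent alternatives at one position.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class TransliterationSketch {
    // Assumed, simplified romanization table (not taken from this project).
    // CYRILLIC.charAt(i) maps to LATIN[i]; hard and soft signs map to "".
    private static final String CYRILLIC = "абвгдежзийклмнопрстуфхцчшщъыьэюя";
    private static final String[] LATIN = {
        "a", "b", "v", "g", "d", "e", "zh", "z", "i", "j", "k", "l", "m", "n",
        "o", "p", "r", "s", "t", "u", "f", "kh", "ts", "ch", "sh", "shch",
        "", "y", "", "e", "yu", "ya"
    };

    // True if any character of the token belongs to the Cyrillic script.
    static boolean containsCyrillic(String token) {
        for (int i = 0; i < token.length(); i++) {
            if (Character.UnicodeScript.of(token.charAt(i)) == Character.UnicodeScript.CYRILLIC) {
                return true;
            }
        }
        return false;
    }

    // Lowercases the token and replaces every mapped Cyrillic letter;
    // characters without a mapping are copied through unchanged.
    static String transliterate(String token) {
        StringBuilder out = new StringBuilder();
        for (char c : token.toLowerCase(Locale.ROOT).toCharArray()) {
            int index = CYRILLIC.indexOf(c);
            out.append(index >= 0 ? LATIN[index] : String.valueOf(c));
        }
        return out.toString();
    }

    // Emits every token with increment 1 and, for tokens containing Cyrillic,
    // an extra transliterated term with increment 0 (same position).
    static List<String> expand(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            out.add(token + "/1");
            if (containsCyrillic(token)) {
                out.add(transliterate(token) + "/0");
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [pasternak's/1, Повесть/1, povest/0]
        System.out.println(expand(List.of("pasternak's", "Повесть")));
    }
}
```

Because the transliteration shares a position with the original token, a phrase query for ``pasternak's povest`` sees ``povest`` directly after ``pasternak's``, just as it would see ``Повесть``.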