lucene-multilingual

Multilingual enhancements for the Lucene text search library
git clone https://code.djc.id.au/git/lucene-multilingual/

Utilities for working with multilingual text in Lucene.

``CyrillicTransliteratingFilter`` injects a Latin transliteration in the
same position as tokens containing Cyrillic characters. For example,
this makes it possible to match the text ``Pasternak’s Повесть`` with
the query ``pasternak's povest``.
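
The idea of injecting a token "in the same position" can be sketched without Lucene at all. The following is a minimal, self-contained illustration, not the filter's actual implementation: the class `TransliterationSketch`, its `Token` type, and the romanization table are all hypothetical, and the scheme the real filter uses may differ. A position increment of 0 is the convention Lucene uses for a token that shares the position of the one before it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class TransliterationSketch {
    // A term plus its position increment; 0 means "same position as the previous token".
    static final class Token {
        final String term;
        final int positionIncrement;

        Token(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }
    }

    // Assumed, simplified romanization of the Russian alphabet (hard and soft signs dropped).
    // This table is illustrative only and is not taken from the library.
    private static final Map<Character, String> TABLE = Map.ofEntries(
        Map.entry('а', "a"), Map.entry('б', "b"), Map.entry('в', "v"),
        Map.entry('г', "g"), Map.entry('д', "d"), Map.entry('е', "e"),
        Map.entry('ё', "e"), Map.entry('ж', "zh"), Map.entry('з', "z"),
        Map.entry('и', "i"), Map.entry('й', "i"), Map.entry('к', "k"),
        Map.entry('л', "l"), Map.entry('м', "m"), Map.entry('н', "n"),
        Map.entry('о', "o"), Map.entry('п', "p"), Map.entry('р', "r"),
        Map.entry('с', "s"), Map.entry('т', "t"), Map.entry('у', "u"),
        Map.entry('ф', "f"), Map.entry('х', "kh"), Map.entry('ц', "ts"),
        Map.entry('ч', "ch"), Map.entry('ш', "sh"), Map.entry('щ', "shch"),
        Map.entry('ъ', ""), Map.entry('ы', "y"), Map.entry('ь', ""),
        Map.entry('э', "e"), Map.entry('ю', "iu"), Map.entry('я', "ia"));

    static boolean containsCyrillic(String term) {
        return term.chars()
            .anyMatch(c -> Character.UnicodeBlock.of(c) == Character.UnicodeBlock.CYRILLIC);
    }

    // Lowercases the term, then replaces each mapped Cyrillic letter; other characters pass through.
    static String transliterate(String term) {
        StringBuilder latin = new StringBuilder();
        for (char c : term.toLowerCase(Locale.ROOT).toCharArray()) {
            String mapped = TABLE.get(c);
            latin.append(mapped != null ? mapped : String.valueOf(c));
        }
        return latin.toString();
    }

    // Emits every original term, followed by a transliteration at the same position when needed.
    static List<Token> inject(List<String> terms) {
        List<Token> out = new ArrayList<>();
        for (String term : terms) {
            out.add(new Token(term, 1));
            if (containsCyrillic(term)) {
                out.add(new Token(transliterate(term), 0));
            }
        }
        return out;
    }
}
```

Because the Latin form sits at the same position as the original term, a phrase query can match either spelling without shifting the positions of the surrounding tokens.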

``XMLTokenizer`` tokenizes an XML document, using a different Analyzer
for each language in the document, as identified by the ``lang`` attribute.
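
The per-language routing can be illustrated with the JDK's SAX parser alone. This is a rough sketch under stated assumptions, not the tokenizer's actual code: `LangRoutingSketch` is a hypothetical name, the "analyzers" are plain functions standing in for Lucene Analyzers, an element without a `lang` attribute is assumed to inherit its parent's language, and `und` is an assumed label for text with no declared language.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class LangRoutingSketch {
    // Hypothetical stand-ins for per-language Analyzers: each turns text into terms.
    static final Function<String, List<String>> LOWERCASING = text -> words(text, true);
    static final Function<String, List<String>> FALLBACK = text -> words(text, false);
    static final Map<String, Function<String, List<String>>> ANALYZERS =
        Map.of("en", LOWERCASING, "ru", LOWERCASING);

    static List<String> words(String text, boolean lowercase) {
        List<String> terms = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                terms.add(lowercase ? word.toLowerCase(Locale.ROOT) : word);
            }
        }
        return terms;
    }

    // Returns "lang:term" strings so the language chosen for each term stays visible.
    static List<String> tokenize(String xml) {
        List<String> out = new ArrayList<>();
        Deque<String> langs = new ArrayDeque<>();
        langs.push("und"); // assumed label for text outside any lang-tagged element
        StringBuilder text = new StringBuilder();

        DefaultHandler handler = new DefaultHandler() {
            // Analyzes the buffered text with the analyzer for the innermost language.
            private void flush() {
                String lang = langs.peek();
                Function<String, List<String>> analyzer = ANALYZERS.getOrDefault(lang, FALLBACK);
                for (String term : analyzer.apply(text.toString())) {
                    out.add(lang + ":" + term);
                }
                text.setLength(0);
            }

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                flush();
                String lang = attrs.getValue("lang");
                langs.push(lang != null ? lang : langs.peek()); // inherit when absent
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                flush();
                langs.pop();
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length); // SAX may deliver text in several chunks
            }
        };

        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        return out;
    }
}
```

Text is buffered and flushed only at element boundaries, so a word split across two `characters` callbacks is still analyzed as one term.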