[[language-intro]] == Getting Started with Languages

Elasticsearch ships with a collection of language analyzers that provide good, basic, out-of-the-box ((("language analyzers")))((("languages", "getting started with")))support for many of the world's most common languages:

Arabic, Armenian, Basque, Brazilian, Bulgarian, Catalan, Chinese, Czech, Danish, Dutch, English, Finnish, French, Galician, German, Greek, Hindi, Hungarian, Indonesian, Irish, Italian, Japanese, Korean, Kurdish, Norwegian, Persian, Portuguese, Romanian, Russian, Spanish, Swedish, Turkish, and Thai.

These analyzers typically((("language analyzers", "roles performed by"))) perform four roles:

  • Tokenize text into individual words: + The quick brown foxes -> [The, quick, brown, foxes]

  • Lowercase tokens: + The -> the

  • Remove common stopwords: + [The, quick, brown, foxes] -> [quick, brown, foxes]

  • Stem tokens to their root form: + foxes -> fox

Each analyzer may also apply other transformations specific to its language in order to make words from that((("language analyzers", "other transformations specific to the language"))) language more searchable:

  • The english analyzer ((("english analyzer")))removes the possessive 's: + John's -> john

  • The french analyzer ((("french analyzer")))removes elisions like l' and qu' and diacritics like ¨ or ^: + l'église -> eglis

  • The german analyzer normalizes((("german analyzer"))) terms, replacing ä and ae with a, or ß with ss, among others: + äußerst -> ausserst

powered by Gitbook该页面构建时间: 2017-08-11 12:51:16

results matching ""

    No results matching ""