[[language-intro]]
== Getting Started with Languages
Elasticsearch ships with a collection of language analyzers that provide good, basic, out-of-the-box ((("language analyzers")))((("languages", "getting started with")))support for many of the world's most common languages:

Arabic, Armenian, Basque, Brazilian, Bulgarian, Catalan, Chinese, Czech, Danish, Dutch, English, Finnish, French, Galician, German, Greek, Hindi, Hungarian, Indonesian, Irish, Italian, Japanese, Korean, Kurdish, Norwegian, Persian, Portuguese, Romanian, Russian, Spanish, Swedish, Turkish, and Thai.

These analyzers typically((("language analyzers", "roles performed by"))) perform four roles:

Tokenize text into individual words::
+
`The quick brown foxes` -> [`The`, `quick`, `brown`, `foxes`]

Lowercase tokens::
+
`The` -> `the`

Remove common _stopwords_::
+
[`The`, `quick`, `brown`, `foxes`] -> [`quick`, `brown`, `foxes`]

Stem tokens to their root form::
+
`foxes` -> `fox`
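
To make the four roles concrete, here is a toy pipeline in Python that reproduces the examples above. It is only a sketch: Elasticsearch builds its analyzers from Lucene tokenizers and token filters, not from code like this, and the `STOPWORDS` set and the `stem` function are hypothetical simplifications (a real language analyzer uses a full stopword list and an algorithmic stemmer).

```python
import re

# Hypothetical, tiny stopword set; real language analyzers use much larger lists.
STOPWORDS = {"a", "an", "and", "of", "the", "to"}


def tokenize(text: str) -> list[str]:
    """Role 1: split text into individual words (runs of word characters)."""
    return re.findall(r"\w+", text)


def lowercase(tokens: list[str]) -> list[str]:
    """Role 2: lowercase every token."""
    return [t.lower() for t in tokens]


def remove_stopwords(tokens: list[str]) -> list[str]:
    """Role 3: drop common words that carry little meaning for search."""
    return [t for t in tokens if t not in STOPWORDS]


def stem(token: str) -> str:
    """Role 4: reduce a token to a root form.

    Hypothetical plural stripper for illustration only; real analyzers use
    algorithmic stemmers (for example, Porter-style stemmers for English).
    """
    if token.endswith(("xes", "ses", "zes", "ches", "shes")):
        return token[:-2]  # foxes -> fox, boxes -> box
    if token.endswith("s") and not token.endswith("ss") and len(token) > 3:
        return token[:-1]  # cats -> cat
    return token


def analyze(text: str) -> list[str]:
    """Run the four roles in order."""
    tokens = lowercase(tokenize(text))
    tokens = remove_stopwords(tokens)
    return [stem(t) for t in tokens]


print(analyze("The quick brown foxes"))  # ['quick', 'brown', 'fox']
```

Lowercasing runs before stopword removal in this sketch, so the single lowercase entry `the` in the stopword set also catches `The`.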

Each analyzer may also apply other transformations specific to its language to make words from that((("language analyzers", "other transformations specific to the language"))) language more searchable:

* The `english` analyzer ((("english analyzer")))removes the possessive `'s`:
+
`John's` -> `john`

* The `french` analyzer ((("french analyzer")))removes _elisions_ like `l'` and `qu'` and _diacritics_ like `¨` or `^`:
+
`l'église` -> `eglise`

* The `german` analyzer normalizes((("german analyzer"))) terms, replacing `ä` and `ae` with `a`, or `ß` with `ss`, among others:
+
`äußerst` -> `ausserst`
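
The language-specific transformations above can likewise be sketched as small Python functions. The function names and the `FRENCH_ELISIONS` and `GERMAN_RULES` tables are hypothetical, and each function handles only the kinds of cases shown in the examples; the real analyzers rely on Lucene filters that cover many more rules and exceptions. Diacritics are removed here by Unicode NFD decomposition followed by dropping combining marks, which is one common way to fold accents, not necessarily the one Elasticsearch uses.

```python
import unicodedata

# Hypothetical subset of French elided articles and pronouns.
FRENCH_ELISIONS = ("qu'", "l'", "d'", "j'", "m'", "n'", "s'", "t'", "c'")


def english_possessive(token: str) -> str:
    """Strip a trailing possessive 's, then lowercase: John's -> john."""
    if token.lower().endswith("'s"):
        token = token[:-2]
    return token.lower()


def strip_diacritics(text: str) -> str:
    """Decompose accented characters (NFD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def french_normalize(token: str) -> str:
    """Remove a leading elision and any diacritics: l'église -> eglise."""
    token = token.lower()
    for elision in FRENCH_ELISIONS:
        if token.startswith(elision):
            token = token[len(elision):]
            break
    return strip_diacritics(token)


# Replacements applied in order; a simplification of German normalization.
GERMAN_RULES = (("ä", "a"), ("ö", "o"), ("ü", "u"), ("ß", "ss"),
                ("ae", "a"), ("oe", "o"), ("ue", "u"))


def german_normalize(token: str) -> str:
    """Fold umlauts, ß, and the ae/oe/ue spellings: äußerst -> ausserst."""
    token = token.lower()
    for src, dst in GERMAN_RULES:
        token = token.replace(src, dst)
    return token


print(english_possessive("John's"))  # john
print(french_normalize("l'église"))  # eglise
print(german_normalize("äußerst"))   # ausserst
```

Because both spellings are folded to the same form, `german_normalize` maps `Müller` and `Mueller` to the same term, `muller`, which is what lets a search for one spelling match the other.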