ποΈ 07112025 0015
Core Concept: Configure custom analyzers in index settings to control how text fields are tokenized and searched.
Why It Mattersβ
Analyzers determine how text is broken down and indexed. The right analyzer improves search relevance, enables features like autocomplete, and optimizes for your use case. This builds on elasticsearch_analyzers which explains analyzer fundamentals.
When to Useβ
- General full-text search - Use standard or custom analyzer with lowercase + ASCII folding
- Autocomplete - Use edge n-gram analyzer
- Language-specific - Use language analyzers for stemming (English, Chinese, etc.)
- Case-insensitive exact match - Use keyword tokenizer with lowercase filter
- Multi-language fields - Different analyzers per language sub-field
Trade-offsβ
Custom Analyzers:
- More control over tokenization and normalization
- Optimized for specific use cases
- Requires understanding of analyzer components
Built-in Analyzers:
- Quick to set up
- Good defaults for most cases
- Less flexibility
Critical: Cannot Change Analyzers on Existing Fields
Analyzer settings are "baked in" at index creation. To change analyzers:
- Create new index with updated analyzer settings
- Reindex all data from old β new index
- Switch application to new index using aliases
- Delete old index after validation
See Reindex API docs for migration strategy.
Quick Referenceβ
Index Structure with Custom Analyzerβ
PUT /your_index
{
"settings": {
"analysis": {
"analyzer": {
"custom_text_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": ["lowercase", "asciifolding"]
}
}
}
},
"mappings": {
"dynamic_templates": [
{
"field_values": {
"path_match": "field_values.*.*",
"mapping": {
"type": "text",
"analyzer": "custom_text_analyzer",
"fields": {
"keyword": {"type": "keyword"}
}
}
}
}
],
"properties": {
"name": {
"type": "text",
"analyzer": "custom_text_analyzer",
"fields": {"keyword": {"type": "keyword"}}
}
}
}
}
Field structure example:
{
"field_values": {
"title": {"en": "Hello World", "cn": "δ½ ε₯½δΈη"},
"description": {"en": "Test description", "cn": "ζ΅θ―ζθΏ°"}
}
}
This matches field_values.title.en, field_values.title.cn, etc.
Common Analyzer Recipesβ
General Full-Text Search:
"analyzer": {
"general_text": {
"type": "custom",
"tokenizer": "standard",
"filter": ["lowercase", "asciifolding"]
}
}
Autocomplete (Edge N-gram):
"analyzer": {
"autocomplete": {
"type": "custom",
"tokenizer": "standard",
"filter": ["lowercase", "autocomplete_filter"]
}
},
"filter": {
"autocomplete_filter": {
"type": "edge_ngram",
"min_gram": 2,
"max_gram": 10
}
}
English with Stemming:
"analyzer": {
"english_text": {
"type": "custom",
"tokenizer": "standard",
"filter": ["lowercase", "english_stop", "english_stemmer"]
}
},
"filter": {
"english_stop": {
"type": "stop",
"stopwords": "_english_"
},
"english_stemmer": {
"type": "stemmer",
"language": "english"
}
}
Case-Insensitive Exact Match:
"analyzer": {
"case_insensitive": {
"type": "custom",
"tokenizer": "keyword",
"filter": ["lowercase"]
}
}
Multi-Language Pattern:
"dynamic_templates": [
{
"english_fields": {
"path_match": "field_values.*.en",
"mapping": {
"type": "text",
"analyzer": "english",
"fields": {"keyword": {"type": "keyword"}}
}
}
},
{
"chinese_fields": {
"path_match": "field_values.*.cn",
"mapping": {
"type": "text",
"analyzer": "ik_smart",
"fields": {"keyword": {"type": "keyword"}}
}
}
}
]
Multi-Field Strategyβ
Use multiple sub-fields for different search patterns:
"field_values.title.en": {
"type": "text",
"analyzer": "english",
"fields": {
"keyword": {
"type": "keyword"
},
"raw": {
"type": "text",
"analyzer": "standard"
}
}
}
Usage:
field_values.title.en- Full-text search with English analyzerfield_values.title.en.keyword- Exact match, sorting, aggregationsfield_values.title.en.raw- Alternative search pattern (no stemming)
Test Analyzer Outputβ
GET /your_index/_analyze
{
"analyzer": "custom_text_analyzer",
"text": "Hello World! Testing 123"
}
Test specific field:
GET /your_index/_analyze
{
"field": "field_values.title.en",
"text": "Running quickly"
}
Gotcha: Define Analyzers Before Mappings
All custom analyzers must be defined in settings.analysis BEFORE referencing them in mappings. Elasticsearch validates analyzer names at index creation time.
Referencesβ
- Elasticsearch Analyzers
- Custom Analyzers
- elasticsearch_analyzers - Understanding how analyzers work
- es_ngram - N-gram configuration for autocomplete