Indexing and Updating Documents with the Elasticsearch REST API
Elasticsearch is a powerful, distributed search and analytics engine that relies on well-structured data ingestion. Managing this data involves fundamental Create, Read, Update, and Delete (CRUD) operations, primarily executed via its versatile REST API. Understanding how to correctly index new documents and efficiently update existing ones is crucial for maintaining a real-time, accurate data store.
This guide walks through the HTTP methods and API endpoints used to index new records and modify existing documents in your Elasticsearch cluster. It focuses on request syntax, the required JSON payloads, and how to read the responses Elasticsearch returns.
Prerequisites
Before proceeding, ensure you have:
- An active Elasticsearch cluster running.
- A command-line tool capable of making HTTP requests (like curl) or an HTTP client (like Postman).
- Knowledge of your target index name.
1. Indexing New Documents
Indexing is the process of storing a JSON document in an Elasticsearch index. Elasticsearch automatically assigns a unique ID to the document unless you provide one explicitly. Indexing requests use the POST or PUT HTTP method, depending on whether you supply the ID yourself.
1.1 Indexing with an Automatic ID (POST)
When you use POST to the index endpoint, Elasticsearch generates a unique document ID for you. This is often the preferred method for initial data ingestion when IDs are managed internally.
Endpoint: POST /{index_name}/_doc/
Example Request (using curl):
curl -X POST "localhost:9200/products/_doc/" -H 'Content-Type: application/json' -d'
{
"name": "Wireless Mouse X1",
"price": 25.99,
"in_stock": true
}
'
Successful Response Snippet:
{
"_index": "products",
"_id": "c7BwJ3gBpV4wT-eH_aY1",
"_version": 1,
"result": "created",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"_seq_no": 0,
"_primary_term": 1
}
The result field showing created confirms a new document was added.
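The same request can be assembled from a script. The following is a minimal sketch using only Python's standard library; it assumes the unsecured local cluster at localhost:9200 from the curl example, and the helper name build_auto_id_request is hypothetical. The request is built but not sent, so the snippet runs without a cluster.

```python
import json
import urllib.request

# Assumption: the same local, unsecured cluster used in the curl example.
BASE_URL = "http://localhost:9200"


def build_auto_id_request(index, document):
    """Return an unsent POST request that indexes `document` with an auto-generated ID."""
    return urllib.request.Request(
        f"{BASE_URL}/{index}/_doc/",
        data=json.dumps(document).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_auto_id_request(
    "products",
    {"name": "Wireless Mouse X1", "price": 25.99, "in_stock": True},
)
print(request.get_method(), request.full_url)

# To send it against a running cluster, uncomment the lines below:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response)["result"])
```

Keeping request construction separate from sending makes the payload easy to inspect or log before it reaches the cluster.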
1.2 Indexing with a Specific ID (PUT)
If your source system provides a unique identifier for the document, you should use the PUT method targeting a specific ID. If a document with that ID already exists, PUT will overwrite the entire document.
Endpoint: PUT /{index_name}/_doc/{document_id}
Example Request: Indexing a document with ID 1001.
curl -X PUT "localhost:9200/products/_doc/1001" -H 'Content-Type: application/json' -d'
{
"name": "Mechanical Keyboard K90",
"price": 129.99,
"in_stock": true
}
'
Successful Response Snippet:
{
"_index": "products",
"_id": "1001",
"_version": 1,
"result": "created",
"_shards": { ... }
}
Tip on Overwriting: If you run the exact same PUT request again with the same ID, the result will change to updated, and the _version number will increment.
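Client code can use the result and _version fields to tell a first write from an overwrite. The sketch below is illustrative only: describe_index_result is a hypothetical helper, and the second response is a hand-written example of what a repeated PUT would be expected to return, not captured output.

```python
def describe_index_result(response):
    """Summarize the body returned by PUT /{index}/_doc/{id}."""
    result = response.get("result")
    version = response.get("_version")
    doc_id = response.get("_id")
    if result == "created":
        return f"document {doc_id} was created (version {version})"
    if result == "updated":
        return f"document {doc_id} was replaced (version {version})"
    return f"unexpected result: {result!r}"


# First PUT: fields taken from the response snippet above.
first_put = {"_index": "products", "_id": "1001", "_version": 1, "result": "created"}
# Second PUT with the same ID: hand-written, assumed shape.
second_put = {"_index": "products", "_id": "1001", "_version": 2, "result": "updated"}

print(describe_index_result(first_put))
print(describe_index_result(second_put))
```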
2. Updating Existing Documents
Updating documents is distinct from overwriting. When you only want to change specific fields within an existing document without affecting unchanged fields, you use the _update endpoint, typically with the POST method.
2.1 Partial Updates using _update
The _update API performs partial updates in a single request, which avoids a separate read-and-write round trip from the client. For a simple field change, the payload contains a doc block holding only the fields you wish to modify. Elasticsearch retrieves the current document, merges the changes, and re-indexes the result.
Endpoint: POST /{index_name}/_update/{document_id}
Example Scenario: We want to update the price of product ID 1001 from $129.99 to $119.99 and mark it as out of stock.
curl -X POST "localhost:9200/products/_update/1001" -H 'Content-Type: application/json' -d'
{
"doc": {
"price": 119.99,
"in_stock": false
}
}
'
Successful Response Snippet:
{
"_index": "products",
"_id": "1001",
"_version": 2,
"result": "updated",
"_shards": { ... }
}
Notice that _version has incremented from 1 to 2, reflecting the modification.
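To make the merge step concrete, here is a local illustration in Python. It is not Elasticsearch's implementation; it is a sketch that assumes nested objects are merged recursively while all other values, including arrays, are replaced. The function name merge_partial is hypothetical.

```python
import copy


def merge_partial(source, partial):
    """Return a copy of `source` with `partial` merged in, leaving other fields untouched."""
    merged = copy.deepcopy(source)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Nested objects are merged field by field.
            merged[key] = merge_partial(merged[key], value)
        else:
            # Scalars and arrays are replaced outright.
            merged[key] = copy.deepcopy(value)
    return merged


stored = {"name": "Mechanical Keyboard K90", "price": 129.99, "in_stock": True}
partial = {"price": 119.99, "in_stock": False}

updated = merge_partial(stored, partial)
print(updated)
```

The name field survives because it is absent from the partial document, which is the key difference from a full overwrite with PUT.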
2.2 Using Scripted Updates
For more complex, conditional, or mathematical updates, Elasticsearch supports Painless scripting within the _update API. This allows you to perform operations like incrementing counters or setting fields based on their current values.
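Example Scenario: Increment the stock count by 5 for document ID 1001. This assumes the document has a numeric stock field; the sample documents above define only in_stock, and the script fails if stock is missing.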
curl -X POST "localhost:9200/products/_update/1001" -H 'Content-Type: application/json' -d'
{
"script": {
"source": "ctx._source.stock += params.count",
"params": {
"count": 5
}
}
}
'
Key Scripting Concept: ctx._source refers to the current document source.
Warning on Scripts: While powerful, complex scripts can impact performance. Use simple field updates (doc) whenever possible, as they are generally faster and safer.
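The increment script above relies on ctx._source.stock already holding a number. One possible way to guard against a missing field is sketched below: a Python helper that builds an _update payload whose Painless source initializes stock when it is null. The helper name and the guard are assumptions for illustration, not part of the original example.

```python
import json

# Painless source that initializes `stock` if absent, otherwise increments it.
GUARDED_INCREMENT = (
    "if (ctx._source.stock == null) { ctx._source.stock = params.count; } "
    "else { ctx._source.stock += params.count; }"
)


def build_stock_increment(count):
    """Build the JSON payload for POST /{index}/_update/{id}."""
    return {
        "script": {
            "lang": "painless",
            "source": GUARDED_INCREMENT,
            # Passing the amount as a parameter keeps the script text constant.
            "params": {"count": count},
        }
    }


payload = build_stock_increment(5)
print(json.dumps(payload, indent=2))
```

Passing the amount through params rather than formatting it into the script text means the same script source is reused for every increment value.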
3. Bulk Indexing and Updates
For high-volume data operations, sending individual requests for every document is inefficient. Elasticsearch provides the _bulk API to handle multiple indexing, updating, or deleting operations in a single request.
3.1 Structure of a Bulk Request
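Bulk requests use a newline-delimited JSON (NDJSON) format. Each operation starts with a metadata line that names the action and, optionally, the index and document ID, followed immediately by a document source line when the action requires one. Every line, including the last, must end with a newline character.
Action Types for Bulk: index (create or replace), create (fail if the ID already exists), update, and delete. The delete action takes no source line.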
Example Bulk Request (Mixing Indexing and Updating):
curl -X POST "localhost:9200/products/_bulk" -H 'Content-Type: application/x-ndjson' -d'
{"index": {"_id": "2001"}}
{"name": "USB-C Hub", "price": 45.00, "in_stock": true}
{"update": {"_id": "1001"}}
{"doc": {"price": 115.00}}
{"delete": {"_id": "3003"}}
'
In this example:
- Document 2001 is indexed.
- Document 1001 is partially updated (price is lowered).
- Document 3003 is deleted.
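Building the NDJSON body by hand is error-prone, so it is often generated. The sketch below shows one way to do that in Python; build_bulk_body and its tuple format are hypothetical conventions chosen for this example. It reproduces the three operations from the request above and always ends the body with a newline.

```python
import json


def build_bulk_body(operations):
    """Serialize (action, metadata, source) tuples into an NDJSON bulk body.

    `source` must be None for delete actions and a dict for all others.
    """
    lines = []
    for action, metadata, source in operations:
        lines.append(json.dumps({action: metadata}))
        if action == "delete":
            continue  # delete has no source line
        if source is None:
            raise ValueError(f"action {action!r} requires a source line")
        lines.append(json.dumps(source))
    # The body must end with a newline character.
    return "\n".join(lines) + "\n"


body = build_bulk_body([
    ("index", {"_id": "2001"}, {"name": "USB-C Hub", "price": 45.00, "in_stock": True}),
    ("update", {"_id": "1001"}, {"doc": {"price": 115.00}}),
    ("delete", {"_id": "3003"}, None),
])
print(body, end="")
```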
Bulk Response: The response will detail the success or failure of each individual operation within the batch, allowing you to pinpoint which documents succeeded and which failed.
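A bulk request can succeed at the HTTP level while individual operations fail, so the response body needs to be inspected item by item. The following sketch collects the failed items; failed_bulk_items is a hypothetical helper, and the sample response is hand-written to illustrate the general shape (a top-level errors flag and an items array keyed by action name) rather than captured from a cluster.

```python
def failed_bulk_items(bulk_response):
    """Return one summary dict per failed operation in a _bulk response."""
    failures = []
    if not bulk_response.get("errors"):
        return failures  # nothing failed, skip the scan
    for position, item in enumerate(bulk_response.get("items", [])):
        # Each item is a single-key object keyed by the action name.
        action, detail = next(iter(item.items()))
        if "error" in detail:
            failures.append({
                "position": position,
                "action": action,
                "id": detail.get("_id"),
                "status": detail.get("status"),
                "type": detail["error"].get("type"),
            })
    return failures


# Hand-written sample: the index succeeds, the update targets a missing document.
sample_response = {
    "took": 30,
    "errors": True,
    "items": [
        {"index": {"_index": "products", "_id": "2001", "result": "created", "status": 201}},
        {"update": {"_index": "products", "_id": "1001", "status": 404,
                    "error": {"type": "document_missing_exception",
                              "reason": "[1001]: document missing"}}},
    ],
}

for failure in failed_bulk_items(sample_response):
    print(failure)
```

Recording the position alongside the ID makes it possible to retry only the failed operations from the original request.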
Summary of Key API Commands
| Operation | HTTP Method | Endpoint Pattern | Effect |
|---|---|---|---|
| Index (Auto ID) | POST | /{index}/_doc/ | Creates new document with auto-generated ID. |
| Index/Overwrite | PUT | /{index}/_doc/{id} | Creates or completely replaces document at specified ID. |
| Partial Update | POST | /{index}/_update/{id} | Merges changes specified in the doc block. |
| Bulk Operations | POST | /{index}/_bulk | Executes multiple operations in one request. |
Mastering these fundamental REST API interactions provides the backbone for dynamic data management within any Elasticsearch application.