Indexing and Updating Documents with the Elasticsearch REST API

Learn the write operations of the Elasticsearch REST API: indexing new documents (with or without a specified ID) and making partial updates to existing ones. This guide details the HTTP requests, endpoints, and JSON payloads involved, with practical `curl` examples for partial updates, scripted modifications, and bulk data ingestion.


Elasticsearch is a distributed search and analytics engine. Its data is managed through Create, Read, Update, and Delete (CRUD) operations, executed primarily through the REST API. Knowing how to index new documents and update existing ones correctly is essential for keeping the data store accurate and current.

This guide walks through the HTTP methods and API endpoints used to index new documents and modify existing ones in your Elasticsearch cluster. It focuses on request syntax, the required JSON payloads, and how to read the responses.


Prerequisites

Before proceeding, ensure you have:

  • An active Elasticsearch cluster running.
  • A command-line tool capable of making HTTP requests (like curl) or an HTTP client (like Postman).
  • Knowledge of your target index name.

1. Indexing New Documents

Indexing stores a JSON document in an Elasticsearch index. Every document has a unique ID: you can supply one yourself or let Elasticsearch generate it. Use POST to index with an auto-generated ID and PUT to index under an ID you choose.

1.1 Indexing with an Automatic ID (POST)

When you POST to the index's _doc endpoint without an ID, Elasticsearch generates a unique document ID for you. This suits data that has no natural key of its own, so that the ID only needs to be unique inside Elasticsearch.

Endpoint: POST /{index_name}/_doc/

Example Request (using curl):

curl -X POST "localhost:9200/products/_doc/" -H 'Content-Type: application/json' -d'
{
  "name": "Wireless Mouse X1",
  "price": 25.99,
  "in_stock": true
}
'

Successful Response Snippet:

{
  "_index": "products",
  "_id": "c7BwJ3gBpV4wT-eH_aY1",
  "_version": 1,
  "result": "created",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 0,
  "_primary_term": 1
}

The result field showing created confirms a new document was added.
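The same request can be assembled from application code. The following is a minimal sketch using only the Python standard library; the helper name build_index_request is hypothetical, and the base URL assumes the local, unsecured cluster used in the curl examples. It builds the request without sending it, choosing POST when no ID is given and PUT otherwise.

```python
import json
import urllib.request
from urllib.parse import quote


def build_index_request(base_url, index, document, doc_id=None):
    """Build (but do not send) an index request for one document."""
    if doc_id is None:
        # No ID supplied: POST lets Elasticsearch generate one.
        method = "POST"
        url = f"{base_url}/{index}/_doc/"
    else:
        # Caller-supplied ID: PUT creates or replaces that document.
        method = "PUT"
        url = f"{base_url}/{index}/_doc/{quote(str(doc_id), safe='')}"
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method=method,
        headers={"Content-Type": "application/json"},
    )


request = build_index_request(
    "http://localhost:9200",  # assumed local cluster, as in the curl examples
    "products",
    {"name": "Wireless Mouse X1", "price": 25.99, "in_stock": True},
)
print(request.get_method(), request.full_url)

# To send it against a running cluster:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response)["_id"])
```

Keeping request construction separate from sending makes the method and URL choice easy to inspect before anything touches the cluster.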

1.2 Indexing with a Specific ID (PUT)

If your source system provides a unique identifier for the document, you should use the PUT method targeting a specific ID. If a document with that ID already exists, PUT will overwrite the entire document.

Endpoint: PUT /{index_name}/_doc/{document_id}

Example Request: Indexing a document with ID 1001.

curl -X PUT "localhost:9200/products/_doc/1001" -H 'Content-Type: application/json' -d'
{
  "name": "Mechanical Keyboard K90",
  "price": 129.99,
  "in_stock": true
}
'

Successful Response Snippet:

{
  "_index": "products",
  "_id": "1001",
  "_version": 1,
  "result": "created",
  "_shards": { ... }
}

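Tip on Overwriting: If you run the exact same PUT request again with the same ID, the result will change to updated, and the _version number will increment. To make the request fail instead of overwriting, use PUT /{index_name}/_create/{document_id}, which returns a 409 conflict when the ID already exists.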
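To make the overwrite behaviour concrete, here is a toy in-memory model of PUT semantics. It is not Elasticsearch code: the put_document function and the store dictionary are hypothetical stand-ins that only mimic the result and _version fields described above.

```python
def put_document(store, doc_id, document):
    """Toy model of PUT /{index}/_doc/{id}: create or replace the whole document."""
    previous = store.get(doc_id)
    version = 1 if previous is None else previous["_version"] + 1
    # The stored source is replaced wholesale; no field of the old source survives.
    store[doc_id] = {"_version": version, "_source": dict(document)}
    return {
        "_id": doc_id,
        "_version": version,
        "result": "created" if previous is None else "updated",
    }


store = {}
first = put_document(
    store, "1001",
    {"name": "Mechanical Keyboard K90", "price": 129.99, "in_stock": True},
)
second = put_document(store, "1001", {"price": 119.99})

print(first["result"], first["_version"])
print(second["result"], second["_version"])
print(store["1001"]["_source"])
```

After the second call the stored source contains only price: the name and in_stock fields are gone, which is why a PUT is the wrong tool for changing a single field.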


2. Updating Existing Documents

Updating documents is distinct from overwriting. When you only want to change specific fields within an existing document without affecting unchanged fields, you use the _update endpoint, typically with the POST method.

2.1 Partial Updates using _update

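The _update API changes individual fields in a single request. The payload carries a doc block containing only the fields you wish to modify. Elasticsearch retrieves the current document, merges the changes into it, and re-indexes the result, so unchanged fields are preserved. The document is still fully re-indexed internally; the gain over a separate GET followed by a PUT is fewer network round trips and a smaller window for version conflicts. If a concurrent write does cause a conflict, the retry_on_conflict query parameter sets how many times Elasticsearch retries the update.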

Endpoint: POST /{index_name}/_update/{document_id}

Example Scenario: We want to update the price of product ID 1001 from $129.99 to $119.99 and mark it as out of stock.

curl -X POST "localhost:9200/products/_update/1001" -H 'Content-Type: application/json' -d'
{
  "doc": {
    "price": 119.99,
    "in_stock": false
  }
}
'

Successful Response Snippet:

{
  "_index": "products",
  "_id": "1001",
  "_version": 2, 
  "result": "updated",
  "_shards": { ... }
}

Notice that _version has incremented from 1 to 2, reflecting the modification.
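The retrieve, merge, and re-index cycle can be modelled in a few lines. This is a simplified sketch, not the actual implementation: merge_partial and apply_update are hypothetical names. It assumes that nested objects are merged key by key, that any other value (including an array) replaces the old one, and that an update changing nothing is reported as noop without a version bump, which is the default behaviour as far as I know.

```python
import copy


def merge_partial(source, partial):
    """Recursively merge `partial` into a copy of `source`.

    Nested objects are merged key by key; any other value,
    including a list, replaces the old value.
    """
    merged = copy.deepcopy(source)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_partial(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def apply_update(stored, partial):
    """Return (new_stored, result) for a doc-style partial update."""
    merged = merge_partial(stored["_source"], partial)
    if merged == stored["_source"]:
        # Nothing changed: no re-index, version stays the same.
        return stored, "noop"
    return {"_version": stored["_version"] + 1, "_source": merged}, "updated"


stored = {
    "_version": 1,
    "_source": {"name": "Mechanical Keyboard K90", "price": 129.99, "in_stock": True},
}
stored, result = apply_update(stored, {"price": 119.99, "in_stock": False})
print(result, stored["_version"], stored["_source"]["name"])

stored, result = apply_update(stored, {"price": 119.99})
print(result, stored["_version"])
```

The first call keeps name untouched while changing the other two fields; the second call sends a value that is already stored, so the model reports noop and leaves the version at 2.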

2.2 Using Scripted Updates

For more complex, conditional, or mathematical updates, Elasticsearch supports Painless scripting within the _update API. This allows you to perform operations like incrementing counters or setting fields based on their current values.

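Example Scenario: Increment the stock count by 5 for document ID 1001. The script expects a numeric stock field; because the documents indexed above do not have one, it initializes the field when it is missing.

curl -X POST "localhost:9200/products/_update/1001" -H 'Content-Type: application/json' -d'
{
  "script": {
    "source": "if (ctx._source.stock == null) { ctx._source.stock = params.count } else { ctx._source.stock += params.count }",
    "params": {
      "count": 5
    }
  }
}
'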

Key Scripting Concept: ctx._source refers to the current document source.

Warning on Scripts: While powerful, complex scripts can impact performance. Use simple field updates (doc) whenever possible, as they are generally faster and safer.
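One way to keep scripted updates cheap is to hold the script text constant and pass changing values through params, so that Elasticsearch can reuse the compiled script instead of compiling a new one per value. The sketch below builds such request bodies; build_scripted_update and the INCREMENT_STOCK constant are hypothetical names, and the script assumes the document already has a numeric stock field.

```python
import json

# Fixed script text; only the params differ between requests.
# Assumes the target document already has a numeric "stock" field.
INCREMENT_STOCK = "ctx._source.stock += params.count"


def build_scripted_update(source, **params):
    """Return the JSON-serializable body for a scripted _update request."""
    return {"script": {"source": source, "params": params}}


restock_small = build_scripted_update(INCREMENT_STOCK, count=5)
restock_large = build_scripted_update(INCREMENT_STOCK, count=500)

print(json.dumps(restock_small, indent=2))
# Both bodies share the same script text, so only the parameter value differs.
print(restock_small["script"]["source"] == restock_large["script"]["source"])
```

Interpolating the number directly into the script string would produce a different script for every value, which is the pattern the params block is meant to avoid.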


3. Bulk Indexing and Updates

For high-volume data operations, sending individual requests for every document is inefficient. Elasticsearch provides the _bulk API to handle multiple indexing, updating, or deleting operations in a single request.

3.1 Structure of a Bulk Request

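Bulk requests use newline-delimited JSON (NDJSON). Each operation starts with a metadata line naming the action, the target index (unless it is given in the URL), and an optional ID. For index, create, and update, the next line carries the document source or the update body; delete has no second line. The request body must end with a newline.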

Action Types for Bulk: index, create, update, delete.

Example Bulk Request (Mixing Indexing and Updating):

curl -X POST "localhost:9200/products/_bulk" -H 'Content-Type: application/x-ndjson' -d'
{"index": {"_id": "2001"}}
{"name": "USB-C Hub", "price": 45.00, "in_stock": true}
{"update": {"_id": "1001"}}
{"doc": {"price": 115.00}}
{"delete": {"_id": "3003"}}
'

In this example:

  1. Document 2001 is indexed.
  2. Document 1001 is partially updated (price is lowered).
  3. Document 3003 is deleted.
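When the bulk body is generated by a program rather than typed by hand, the NDJSON rules are easy to get wrong. The following sketch serializes a list of operations into that format; build_bulk_body and its (action, metadata, source) tuple layout are hypothetical conventions chosen for this example.

```python
import json


def build_bulk_body(operations):
    """Serialize (action, metadata, source) tuples into an NDJSON bulk body.

    `source` must be None for "delete" and a dict for every other action.
    """
    lines = []
    for action, metadata, source in operations:
        # Metadata line: {"<action>": {...}}
        lines.append(json.dumps({action: metadata}))
        if action == "delete":
            if source is not None:
                raise ValueError("delete takes no source line")
            continue
        if source is None:
            raise ValueError(f"{action} requires a source line")
        # Source line: the document, or the update body such as {"doc": {...}}
        lines.append(json.dumps(source))
    # Every line, including the last, is terminated by a newline.
    return "\n".join(lines) + "\n"


body = build_bulk_body([
    ("index", {"_id": "2001"}, {"name": "USB-C Hub", "price": 45.00, "in_stock": True}),
    ("update", {"_id": "1001"}, {"doc": {"price": 115.00}}),
    ("delete", {"_id": "3003"}, None),
])
print(body, end="")
```

json.dumps without an indent argument never emits a raw newline, so each JSON object stays on its own line, as the format requires.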

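Bulk Response: The HTTP status is 200 even when individual operations fail. The response body contains a top-level errors flag and an items array with one entry per operation, in request order; each entry carries its own status and, on failure, an error object. Check these to pinpoint which documents succeeded and which failed.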
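A client therefore has to inspect the response body rather than rely on the HTTP status. The sketch below pulls the failed operations out of a parsed bulk response; failed_bulk_items is a hypothetical helper, and sample_response is a hand-written illustration of the response shape, not output captured from a cluster.

```python
def failed_bulk_items(response):
    """Return (position, action, _id, status, error) for each failed operation."""
    if not response.get("errors"):
        return []
    failures = []
    for position, item in enumerate(response["items"]):
        # Each item is a single-key object: {"index": {...}}, {"update": {...}}, ...
        (action, result), = item.items()
        if "error" in result:
            failures.append(
                (position, action, result.get("_id"), result.get("status"), result["error"])
            )
    return failures


# Hand-written sample in the shape of a bulk response (illustrative values).
sample_response = {
    "took": 30,
    "errors": True,
    "items": [
        {"index": {"_index": "products", "_id": "2001", "status": 201, "result": "created"}},
        {"update": {"_index": "products", "_id": "1001", "status": 404,
                    "error": {"type": "document_missing_exception",
                              "reason": "document missing"}}},
        {"delete": {"_index": "products", "_id": "3003", "status": 200, "result": "deleted"}},
    ],
}

for position, action, doc_id, status, error in failed_bulk_items(sample_response):
    print(f"operation {position} ({action} {doc_id}) failed with {status}: {error['type']}")
```

Because items are returned in request order, the position can be used to look up the original operation and retry or log only the ones that failed.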


Summary of Key API Commands

Operation         HTTP Method   Endpoint Pattern             Effect
---------------   -----------   --------------------------   -------------------------------------------------------
Index (Auto ID)   POST          /{index}/_doc/               Creates new document with auto-generated ID.
Index/Overwrite   PUT           /{index}/_doc/{id}           Creates or completely replaces document at specified ID.
Partial Update    POST          /{index}/_update/{id}        Merges changes specified in the doc block.
Bulk Operations   POST          /{index}/_bulk               Executes multiple operations in one request.

Mastering these fundamental REST API interactions provides the backbone for dynamic data management within any Elasticsearch application.