Conversation

@zach-shu zach-shu commented Oct 14, 2025

This PR migrates Elasticsearch guides and docs from Assistant Builder to Agent Builder.

Source docs: https://github.com/watson-developer-cloud/assistant-toolkit/tree/master/integrations/extensions/docs/elasticsearch-install-and-setup

Signed off by: [email protected]

@zach-shu zach-shu self-assigned this Oct 21, 2025
@@ -0,0 +1,26 @@
# Elasticsearch Installation and Setup Documentation

This directory contains documentation for installing and setting up Elasticsearch along with related guides and integrations.

@kndeepa-ibm kndeepa-ibm Oct 31, 2025


This document explains how to install and set up Elasticsearch, along with related guides and integrations.

## Elasticsearch Setup
- [Install Docker or Docker alternatives](how_to_install_docker.md): A guide explaining Docker and Docker Compose installation options, essential for running Elasticsearch-related applications.
- [Set up Elasticsearch from IBM Cloud and integrate it with watsonx Orchestrate](ICD_Elasticsearch_install_and_setup.md): Instructions for provisioning an Elasticsearch instance on IBM Cloud and setting up Agent Knowledge in watsonx Orchestrate.
- [Set up watsonx Discovery (aka Elasticsearch on-prem) and integrate it with watsonx Orchestrate on-prem](watsonx_discovery_install_and_setup.md): Documentation for setting up watsonx Discovery (aka Elasticsearch on-prem) and integrating it with watsonx Orchestrate on-prem.


Set up watsonx Discovery (also called Elasticsearch on-prem)

### Option 1: Add Knowledge to your agents in the Agent Builder UI
See [Connecting to an Elasticsearch content repository](https://www.ibm.com/docs/en/watsonx/watson-orchestrate/base?topic=agents-connecting-elasticsearch-content-repository) in watsonx Orchestrate documentation for more details.

### Option 2: Create Knowledge bases via watsonx Orchestrate ADK (Agent Development Kit)


Create Knowledge bases through watsonx Orchestrate Agent Development Kit (ADK)

See [Creating external knowledge bases with Elasticsearch](https://developer.watson-orchestrate.ibm.com/knowledge_base/build_kb#elasticsearch) in ADK documentation for more details.

### Configure the Advanced Elasticsearch Settings
There are two settings under `Advanced Elasticsearch Settings`, a custom query body and custom filters, for advanced search use cases. See the guide [How to configure Advanced Elasticsearch Settings](./how_to_configure_advanced_elasticsearch_settings.md) for more details.


To achieve advanced search results, use a custom query body and custom filters in Advanced Elasticsearch Settings. For more details, see How to configure Advanced Elasticsearch Settings.

There are two settings under `Advanced Elasticsearch Settings`, a custom query body and custom filters, for advanced search use cases. See the guide [How to configure Advanced Elasticsearch Settings](./how_to_configure_advanced_elasticsearch_settings.md) for more details.
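As an illustration, a custom query body in these settings typically contains a placeholder that is replaced with the user's query at search time. This is only a sketch: the `ml.tokens` field name, the `.elser_model_1` model ID, and the `$QUERY` placeholder are assumptions based on a typical ELSER setup, so check the linked guide for the exact variables your deployment expects:

```json
{
  "query": {
    "text_expansion": {
      "ml.tokens": {
        "model_id": ".elser_model_1",
        "model_text": "$QUERY"
      }
    }
  }
}
```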

### Federated search
You can follow the guide [here](federated_search.md) to run queries across multiple indexes within your Elasticsearch cluster.


Follow the guidance in Federated Search in Elasticsearch to run queries across multiple indexes within your Elasticsearch cluster.


You can now run the fscrawler to ingest your documents.

NOTE: If the updated dates of the documents are older than the current date, you must follow the instructions in [Moving files to a "watched" directory](https://fscrawler.readthedocs.io/en/latest/user/tips.html#moving-files-to-a-watched-directory) to ensure that fscrawler can pick them up for indexing.
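For example, a quick way to make existing documents look new to fscrawler is to update their modification time. The path below is a hypothetical example; point it at the directory your fscrawler job actually watches:

```shell
# DOCS_DIR is a hypothetical example path; point it at the directory
# that your fscrawler job is configured to watch.
DOCS_DIR="/tmp/fscrawler-docs"
mkdir -p "$DOCS_DIR"  # no-op if the directory already exists

# Reset every document's modification time to "now" so that fscrawler
# considers the files new and picks them up on its next scan.
find "$DOCS_DIR" -type f -exec touch {} +
```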


NOTE: If the updated dates of the documents are older than the current date, you must follow the instructions in Moving files to a “watched” directory to ensure that fscrawler can pick them up for indexing.

-H "Content-Type: application/json" --cacert "${ES_CACERT}"
```

OPTIONAL: Once all documents are indexed, you can stop the `fscrawler` app, or you can leave it running to keep the filesystem in sync as new documents are added or old ones removed. To stop the app, run the following:


OPTIONAL: Once all documents are indexed, you can stop the fscrawler app. Otherwise, you can leave it running to keep the file system in sync as new documents are added or old ones removed.
To stop the app, run the following:

docker-compose down
```

Your documents are now available in the index, ready for searching and querying. Follow the steps outlined below to use this index for Agent Knowledge in watsonx Orchestrate.

@kndeepa-ibm kndeepa-ibm Oct 31, 2025


Please remove "Follow the steps outlined below to use this index for Agent Knowledge in watsonx Orchestrate."
It's understood automatically.


### Step 5: Connecting to Agent Knowledge in watsonx Orchestrate

To configure your index for Agent Knowledge in watsonx Orchestrate, you need to follow the documentation for [Connecting to an Elasticsearch content repository](https://www.ibm.com/docs/en/watsonx/watson-orchestrate/base?topic=agents-connecting-elasticsearch-content-repository).


To configure your index for Agent Knowledge in watsonx Orchestrate, refer to Connecting to an Elasticsearch content repository.


To configure your index for Agent Knowledge in watsonx Orchestrate, you need to follow the documentation for [Connecting to an Elasticsearch content repository](https://www.ibm.com/docs/en/watsonx/watson-orchestrate/base?topic=agents-connecting-elasticsearch-content-repository).

Importantly, you need to use the right fields to configure your result content (in this guide, use `title` for Title and `text` for Body). You also need to use the right query body to make Knowledge work with your web crawler index. Here is a screenshot of the configuration:


Ensure that you use the right fields to configure your result content, that is, title for Title and text for Body. You must use the right query format so that Knowledge works properly with your web crawler index. Refer to the following configuration image:


## What is Docker?

Docker is a software platform that allows you to build, test, and deploy applications quickly. Docker packages software into standardized units called containers that have everything the software needs to run including libraries, system tools, code, and runtime. Using Docker, you can quickly deploy and scale applications into any environment and know your code will run.


Docker is a software platform designed to streamline the process of building, testing, and deploying applications. It packages software into standardized units called containers that include all necessary components such as libraries, system tools, code, and runtime. With Docker, you can efficiently deploy and scale applications in any environment, ensuring consistent and reliable execution of your code.


Docker is a software platform that allows you to build, test, and deploy applications quickly. Docker packages software into standardized units called containers that have everything the software needs to run including libraries, system tools, code, and runtime. Using Docker, you can quickly deploy and scale applications into any environment and know your code will run.

And Docker Compose is a tool for defining and running multi-container applications. It is the key to unlocking a streamlined and efficient development and deployment experience.


Docker Compose is a tool for defining and managing multi-container applications. It plays a crucial role in simplifying and optimizing both the development and deployment experience.


And Docker Compose is a tool for defining and running multi-container applications. It is the key to unlocking a streamlined and efficient development and deployment experience.

You will see references to `docker` and `docker-compose` as you work through some of our guides and this document serves to guide anyone who needs a starting point to install that software or its alternatives.


As you go through the guides, you will see references to docker and docker-compose. This document is intended to help anyone who needs a starting point for installing these tools or exploring alternative solutions.


1. [Docker Compose Overview](https://docs.docker.com/compose/)
2. [Docker Overview](https://docs.docker.com/get-docker/)


@kndeepa-ibm kndeepa-ibm Nov 4, 2025


Please add a sentence like
"You can install Docker in many ways. The following table serves as a quick guide to choose the method of installing Docker."

| Install method | Who can use it | Maintenance |
| --- | --- | --- |
| Docker Desktop | Small organizations | Maintained regularly by Docker |
| Rancher Desktop | New users to Docker who prefer an easy one-click install of basic functionality | |
| Podman | Windows or Mac users who have a Linux distribution/subsystem or Linux in a virtual machine | Manual update |
| Colima | Users who prefer a CLI to the Docker Desktop GUI | |

Please give links to all the install options so that users can go to the correct option easily.

1. [Docker Compose Overview](https://docs.docker.com/compose/)
2. [Docker Overview](https://docs.docker.com/get-docker/)

## Option 1: Docker Desktop


Using Docker Desktop


To use ELSER for text expansion queries on chunked texts, you need to build a pipeline with an inference processor that uses the ELSER model.

NOTE: The ELSER model is not enabled by default; you can enable it in Kibana by following the [download-deploy-elser instructions](https://www.elastic.co/guide/en/machine-learning/8.11/ml-nlp-elser.html#download-deploy-elser).


Follow NOTE or Note throughout.

Note: The ELSER model is not enabled by default. You can enable it in Kibana by following the download-deploy-elser instructions.


Depending on your Elasticsearch version, you can choose to deploy either the ELSER v1 or v2 model. The following steps and commands are based on the ELSER v1 model, but you can find the changes needed for ELSER v2 in the notes of each step.

You will be able to reference this pipeline in the next few steps as a part of indexing the documents of choice. It transforms the "text" field using the ELSER model and produces the terms along with weights as a sparse vector in the "ml" field at index time.
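As a sketch of what such a pipeline can look like (assuming ELSER v1, the default `.elser_model_1` model ID, and a hypothetical pipeline name of `elser-v1-test`; adjust these to your deployment):

```json
PUT _ingest/pipeline/elser-v1-test
{
  "processors": [
    {
      "inference": {
        "model_id": ".elser_model_1",
        "target_field": "ml",
        "field_map": {
          "text": "text_field"
        },
        "inference_config": {
          "text_expansion": {
            "results_field": "tokens"
          }
        }
      }
    }
  ]
}
```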


You can reference this pipeline in the upcoming steps as part of the document indexing process. It applies the ELSER model to transform the text field, generating the terms along with weights as a sparse vector in the ml field at index time.


Learn more about [inference-ingest-pipeline](https://www.elastic.co/guide/en/elasticsearch/reference/8.11/semantic-search-elser.html#inference-ingest-pipeline) in the tutorial.

Create the pipeline using the command below:


Create the pipeline using the following command:


You will be able to reference this pipeline in the next few steps as a part of indexing the documents of choice. It transforms the "text" field using the ELSER model and produces the terms along with weights as a sparse vector in the "ml" field at index time.

Learn more about [inference-ingest-pipeline](https://www.elastic.co/guide/en/elasticsearch/reference/8.11/semantic-search-elser.html#inference-ingest-pipeline) in the tutorial.


Learn more about inference-ingest-pipeline from the tutorial.


Your documents are now available in the index, ready for searching and querying. Follow the steps outlined below to use this index for Agent Knowledge in watsonx Orchestrate.

**NOTE**: There are some example documents available [here](../assets/sample_pdf_docs), if you would like to test the setup.


Refer to Example documents to test the setup.

@@ -0,0 +1,506 @@
# How to set up and use the web crawler in Elasticsearch
This documentation explains how to set up and use the web crawler in Elasticsearch and connect it to Agent Knowledge in watsonx Orchestrate.


This documentation explains how to set up and use the web crawler in Elasticsearch and connect it to Agent Knowledge in watsonx Orchestrate.

# How to set up and use the web crawler in Elasticsearch
This documentation explains how to set up and use the web crawler in Elasticsearch and connect it to Agent Knowledge in watsonx Orchestrate.

## Table of contents:


Table of contents

* [Step 4: Connect a web crawler index to Agent Knowledge in watsonx Orchestrate](#step-4-connect-a-web-crawler-index-to-agent-knowledge-in-watsonx-orchestrate)

## Step 1: Set up Enterprise Search to enable the web crawler in Elasticsearch
Before you start, you will need to install and set up your Elasticsearch cluster.


Before you start, you must install and set up your Elasticsearch cluster.

### Set up Enterprise Search for Elasticsearch on IBM Cloud
Assuming you have installed Kibana locally following [ICD-elasticsearch-install-and-setup](../../docs/elasticsearch-install-and-setup/ICD_Elasticsearch_install_and_setup.md),
follow these steps to set up Enterprise Search in Elasticsearch:
**NOTE: Enterprise Search requires at least 4GB of memory, so please make sure you have enough memory allocated to your Docker Engine.**


NOTE: Enterprise Search requires a minimum of 4GB of memory. Ensure that your Docker Engine has sufficient memory allocated to meet this requirement.

```shell
docker network create elastic
```
NOTE: `elastic` will be the name of your docker network.


NOTE: elastic is the name of your docker network.

In the ingest pipeline page, click on `Add a processor`, choose `Script` processor, and then add [a painless script](https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-scripting-painless.html) to the `Source` field.
For example,
<img src="assets/web_crawler_script_processor.png" width="577" height="718" />


Examples for painless scripts

(or any other suitable heading)

i = j;
}
```
This script splits the `body_content` into sentences using regex and combines them into `passages`. The maximum number of characters in each passage is controlled by the `model_limit` parameter. There is an overlap between two adjacent passages, controlled by the `overlap_percentage` parameter. So, `model_limit` and `overlap_percentage` need to be configured in the `Parameters` field, for example,


The overlap between two adjacent passages is controlled by the overlap_percentage parameter. So, model_limit and overlap_percentage need to be configured in the Parameters field. For example,

NOTES:
* `.elser_model_2_linux-x86_64` is an optimized version of the ELSER v2 model and is preferred if it is available. Otherwise, use `.elser_model_2` for the regular ELSER v2 model or `.elser_model_1` for ELSER v1.
* `inference_config.text_expansion` is required in the config to tell the Foreach processor to use `text_expansion` and store the results in the `tokens` field for each chunked text.
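Putting the notes above together, the Foreach processor configuration might look like the following (a sketch assuming the chunked texts live in a `passages` field produced by the script processor, each with a `text` sub-field; adjust the model ID and field names to your setup):

```json
{
  "foreach": {
    "field": "passages",
    "processor": {
      "inference": {
        "model_id": ".elser_model_2_linux-x86_64",
        "target_field": "_ingest._value.ml",
        "field_map": {
          "_ingest._value.text": "text_field"
        },
        "inference_config": {
          "text_expansion": {
            "results_field": "tokens"
          }
        }
      }
    }
  }
}
```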


align the sentence properly



> ⛔️
> **Caution**


Click Save pipeline to save your changes.


To configure your web crawler index for Agent Knowledge in watsonx Orchestrate, you need to follow the documentation for [Connecting to an Elasticsearch content repository](https://www.ibm.com/docs/en/watsonx/watson-orchestrate/base?topic=agents-connecting-elasticsearch-content-repository).

Importantly, you need to use the right fields to configure your result content (in this guide, use `title` for Title and `text` for Body). You also need to use the right query body to make Knowledge work with your web crawler index. Here is a screenshot of the configuration:


IMPORTANT: You must use the right fields to configure your result content (in this guide, use title for Title and text for Body). You must use the right query body to make Knowledge work with your web crawler index. Refer to the following example configuration image:

@@ -0,0 +1,248 @@
# How to set up and use 3rd-party text embeddings for dense vector search in Elasticsearch


How to set up and use third-party text embeddings for dense vector search in Elasticsearch

@@ -0,0 +1,248 @@
# How to set up and use 3rd-party text embeddings for dense vector search in Elasticsearch
This guide demonstrates how to deploy and use a text embedding model in Elasticsearch. The model will generate vector representations for text, enabling vector similarity (k-nearest neighbours) search.


This guide outlines the deployment and usage of a text embedding model within Elasticsearch. The model generates vector representations for text, enabling k-nearest neighbors (KNN) search based on vector similarity.


## Set up Elasticsearch

### Elasticsearch from IBM Cloud


Set up Elasticsearch on IBM Cloud

## Set up Elasticsearch

### Elasticsearch from IBM Cloud
If you are using Elasticsearch from IBM Cloud, please refer to [this guide](./ICD_Elasticsearch_install_and_setup.md) first to create an Elasticsearch instance and set up Kibana if you haven't already.


If you are using Elasticsearch from IBM Cloud, refer to the Install guide to create an Elasticsearch instance and set up Kibana.

This guide demonstrates how to deploy and use a text embedding model in Elasticsearch. The model will generate vector representations for text, enabling vector similarity (k-nearest neighbours) search.

## Set up Elasticsearch


Write one sentence about setting ES. You can't have H2 & H3 together.

```
You can find the credentials in the service credentials of your Elasticsearch instance.
## Pull and deploy an embedding model
Run the command below to pull your desired model from the [Huggingface Models Hub](https://huggingface.co/models) and deploy it on your Elasticsearch instance:


Run the following command to pull your desired model

--start
```

In this example, we are using the `multilingual-e5-small` model, a multilingual model that supports text embeddings in 100 languages. You can read more about this model [here](https://huggingface.co/intfloat/multilingual-e5-small).


The above example uses the multilingual-e5-small model, a multilingual model that supports text embeddings in 100 languages. For more information on this model, see Multilingual-E5-small.

In this example, we are using the `multilingual-e5-small` model, a multilingual model that supports text embeddings in 100 languages. You can read more about this model [here](https://huggingface.co/intfloat/multilingual-e5-small).

## Synchronize your deployed model
Go to the **Machine Learning > Trained Models** page http://localhost:5601/app/ml/trained_models and synchronize your trained models. A warning message is displayed at the top of the page that says "ML job and trained model synchronization required". Follow the link to "Synchronize your jobs and trained models", then click Synchronize.


To synchronize your deployed model:

  1. Go to http://localhost:5601/app/ml/trained_models page.
  2. Click Machine Learning > Trained Models.
  3. A warning message, "ML job and trained model synchronization required" is displayed at the top of the page.
  4. Click the link "Synchronize your jobs and trained models".
  5. Click Synchronize.

}'
```

You can verify that the ingest pipeline was created by locating it in the list of your ingest pipelines in Kibana at http://localhost:5601/app/management/ingest/ingest_pipelines.


Go to http://localhost:5601/app/management/ingest/ingest_pipelines page and verify that the ingest pipeline was created by locating it in the list of your ingest pipelines on Kibana.

}'
```

## What's next?


What to do next

@@ -0,0 +1,312 @@
# How to set up watsonx Discovery (Elasticsearch) and integrate it with watsonx Orchestrate in CloudPak


How to set up watsonx Discovery (Elasticsearch) and integrate it with watsonx Orchestrate in CloudPak for Data

@@ -0,0 +1,312 @@
# How to set up watsonx Discovery (Elasticsearch) and integrate it with watsonx Orchestrate in CloudPak
This documentation explains how to set up watsonx Discovery (aka Elasticsearch on-prem) and integrate it with watsonx Orchestrate in CloudPak.


This document provides guidance on setting up watsonx Discovery (also known as Elasticsearch on-prem) and integrating it with watsonx Orchestrate in CloudPak for Data.



## Step 1: Install Elastic Cloud on Kubernetes (ECK) on CloudPak
This step is about installing Elastic Cloud on Kubernetes (ECK) in CloudPak.


This sentence is not needed. You can remove. Title is self explanatory

## Step 1: Install Elastic Cloud on Kubernetes (ECK) on CloudPak
This step is about installing Elastic Cloud on Kubernetes (ECK) in CloudPak.

Before you begin, you will need:


Before you begin

This step is about installing Elastic Cloud on Kubernetes (ECK) in CloudPak.

Before you begin, you will need:
* Access to a CloudPak cluster


You must have access to the CloudPak cluster

cpu: 2
EOF
```
NOTE: The container resources are configurable.


NOTE: The container resources are configurable.

```

* Add an ECK enterprise license
When you install the default distribution of ECK, you receive a Basic license. If you have a valid Enterprise


Check the sentence alignment

You have obtained `ES_USER` and `ES_PASSWORD` from [obtain-the-elasticsearch-credentials](#verify-the-installation) step.

### Enable ELSER model (v2)
The ELSER model is not enabled by default, but you can enable it in Kibana. Follow the [download-deploy-elser instructions](https://www.elastic.co/guide/en/machine-learning/8.11/ml-nlp-elser.html#download-deploy-elser) to do so.


ELSER model is not enabled by default. To enable it in Kibana, follow the Download deploy ELSER instructions.

Notes:
* `search-wa-docs` will be your index name
* `text_embedding` is the field that will keep ELSER output when data is ingested, and `sparse_vector` type is required for ELSER output field
* `text` is the input field for the inference processor. In the example dataset, the name of the input field is `text`, which the ELSER model will process.


text is the input field for the inference processor.
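Based on these notes, a minimal index mapping for `search-wa-docs` could look like the following (a sketch; extend it with whatever other fields your dataset has):

```json
PUT search-wa-docs
{
  "mappings": {
    "properties": {
      "text_embedding": {
        "type": "sparse_vector"
      },
      "text": {
        "type": "text"
      }
    }
  }
}
```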


### Semantic search by using the text_expansion query
To perform semantic search, use the `text_expansion` query, and provide the query text and the ELSER model ID.
The example below uses the query text "How to set up custom extension?", the `text_embedding` field contains


The following example uses the query text "How to set up custom extension?" and the text_embedding field contains the generated ELSER output:
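Such a query might look like the following (a sketch assuming the ELSER v2 model ID and the `text_embedding` output field described above):

```json
GET search-wa-docs/_search
{
  "query": {
    "text_expansion": {
      "text_embedding": {
        "model_id": ".elser_model_2_linux-x86_64",
        "model_text": "How to set up custom extension?"
      }
    }
  }
}
```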
