• Astra DB Client

1. Overview

The Astra DB Client is a client library that interacts with the various APIs of the DataStax Astra platform. It lets you connect to, use, and administer the Astra Vector product. The library comprises four classes that work together:

  • AstraDBAdmin: This class is initialized exclusively with an organization administrator token and enables the creation and deletion of databases via the DevOps API. It facilitates automation and administration within your organization's tenant.

  • AstraDB: This is the primary entry point. It connects to a single database and performs all operations for your applications. It must be initialized with a database administrator token and the API endpoint of your database.

  • AstraDBCollection: This client class facilitates all operations at the collection level, including find(), insert(), and delete(). It is instantiated through the AstraDB class and accommodates operations on both vector and non-vector collections.

  • AstraDBRepository: This class represents a specialized form of AstraDBCollection designed for use with Java beans (T). It embodies the repository pattern, streamlining the management and access of domain entities.

Reference Architecture

2. Prerequisites

Java and Apache Maven/Gradle Setup
  • Install Java Development Kit (JDK) 11+

Use the Java reference documentation to install a Java Development Kit (JDK) tailored for your operating system. After installation, you can validate your setup with the following command:

java --version
  • Install Apache Maven (3.9+) or Gradle

Samples and tutorials are designed to be used with Apache Maven. Follow the instructions in the reference documentation to install Maven. To validate your installation, use the following command:

mvn -version
Astra Environment Setup
  • Create your DataStax Astra account:

Sign up for DataStax Astra

  • Create an organization-level Astra token

Once logged into the user interface, select settings from the left menu and then click on the tokens tab to create a new token.

Pick the following role:

Property     Value
Token Role   Organization Administrator

The token contains three properties: Client ID, Client Secret, and Token. You only need the third one (the value starting with AstraCS:).

{
  "ClientId": "ROkiiDZdvPOvHRSgoZtyAapp",
  "ClientSecret": "fakedfaked",
  "Token":"AstraCS:fake" <========== use this field
}

To operate with AstraDBAdmin, this specific organization-level token is required. For tasks involving AstraDB at the database level, a database-level token suffices. The procedure for creating such a token is detailed in subsequent sections.

3. Getting Started

Project Setup

  • If you are using Maven, update your pom.xml file with the latest version of the SDK, available on Maven Central:
<dependency>
  <groupId>com.datastax.astra</groupId>
  <artifactId>astra-db-client</artifactId>
  <version>${latest}</version>
</dependency>
  • If you are using Gradle, update your build.gradle file with:
dependencies {
    implementation 'com.datastax.astra:astra-db-client:1.0'
}

Quickstart

Getting your token and API endpoint

The AstraDB class is the entry point of the SDK. It enables interactions with one particular database within your Astra environment. The initialization can be achieved in multiple ways:

  • Using a token along with the api_endpoint. Both are retrieved from the Astra user interface.
  • Using a token with the database identifier and, optionally, the region.

To establish this connection, you can generate a token via the user interface. This token will be assigned the Database Administrator permission level, which grants sufficient privileges for interacting with a specific database.

The api_endpoint is obtained from the user interface. It adheres to the following pattern: https://{database-identifier}-{database-region}.apps.astra.datastax.com.
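
As an illustration of the pattern above, the following sketch assembles an endpoint URL from a database identifier and a region. It is plain Java and does not call the SDK; the class name ApiEndpointSketch, the method buildApiEndpoint, and the sample UUID and region are hypothetical.

```java
import java.util.UUID;

public class ApiEndpointSketch {

    // Suffix taken from the endpoint pattern quoted in the text.
    private static final String ASTRA_SUFFIX = ".apps.astra.datastax.com";

    /** Builds https://{database-identifier}-{database-region}.apps.astra.datastax.com */
    public static String buildApiEndpoint(UUID databaseId, String region) {
        if (databaseId == null || region == null || region.isBlank()) {
            throw new IllegalArgumentException("databaseId and region are required");
        }
        return "https://" + databaseId + "-" + region + ASTRA_SUFFIX;
    }

    public static void main(String[] args) {
        // Made-up identifier and region, used only for the demonstration.
        UUID id = UUID.fromString("01234567-89ab-cdef-0123-456789abcdef");
        System.out.println(buildApiEndpoint(id, "us-east1"));
    }
}
```

In practice it is simpler to copy the endpoint from the user interface; the sketch only makes the structure of the URL explicit.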

4. Reference Guide

Connection

Connect to AstraDB Vector by instantiating the AstraDB class.

General Information
  • The connection is stateless and thread-safe; initialization creates an HTTP client.
  • At initialization, a check is performed to ensure the endpoint and token are valid.
  • If no keyspace is provided, default_keyspace is used.
  • The database UUID and region are part of the endpoint URL.
AstraDB(String token, String apiEndpoint);
AstraDB(String token, String apiEndpoint, String keyspace);
AstraDB(String token, UUID databaseId);
AstraDB(String token, UUID databaseId, String keyspace);
AstraDB(String token, UUID databaseId, String region, String keyspace);
AstraDB(String token, UUID databaseId, String region, AstraEnvironment env, String keyspace);
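
Because the database UUID and region are part of the endpoint URL, both can be recovered from it. The sketch below assumes the endpoint pattern https://{database-identifier}-{database-region}.apps.astra.datastax.com given earlier in this guide. The class EndpointParserSketch and its methods are hypothetical helpers, not part of the SDK.

```java
import java.util.UUID;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EndpointParserSketch {

    // A 36-character UUID, a dash, the region, then the Astra host suffix.
    private static final Pattern ENDPOINT = Pattern.compile(
        "^https://([0-9a-fA-F-]{36})-([A-Za-z0-9-]+)\\.apps\\.astra\\.datastax\\.com/?$");

    /** Extracts the database identifier from an API endpoint. */
    public static UUID databaseId(String apiEndpoint) {
        return UUID.fromString(match(apiEndpoint).group(1));
    }

    /** Extracts the region from an API endpoint. */
    public static String region(String apiEndpoint) {
        return match(apiEndpoint).group(2);
    }

    private static Matcher match(String apiEndpoint) {
        Matcher m = ENDPOINT.matcher(apiEndpoint);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unexpected endpoint format: " + apiEndpoint);
        }
        return m;
    }

    public static void main(String[] args) {
        // Made-up endpoint, used only for the demonstration.
        String endpoint =
            "https://01234567-89ab-cdef-0123-456789abcdef-us-east1.apps.astra.datastax.com";
        System.out.println(databaseId(endpoint) + " / " + region(endpoint));
    }
}
```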

Working with Collections

Overview

AstraDB is a vector database that manages multiple collections. Each collection (AstraDBCollection) is identified by a name and stores schema-less documents. It is capable of holding any JSON document, each uniquely identified by an _id. Additionally, a JSON document within AstraDB can contain a vector. It is important to note that all documents within the same collection should utilize vectors of the same type, characterized by consistent dimensions and metrics.

Create Collection

Create a collection in the current database.

General Information
  • A collection name is unique for a database
  • A collection name should match [A-Za-z_]
  • The createCollection() method returns an instance of AstraDBCollection
  • Collection is created only if it does not exist
  • If collection exists, a check is performed for vector dimension and metric
  • There are a maximum of 5 collections per database
  • If not provided, default metric is cosine
  • Vector dimension and a metric are set at creation and cannot be changed later
  • The dimension is the size of the vector
  • The metric is the way the vector will be compared. It can be cosine, euclidean or dot_product
AstraDBCollection createCollection(String name);
AstraDBCollection createCollection(String name, int vectorDimension);
AstraDBCollection createCollection(String name, int vectorDimension, SimilarityMetric metric);
AstraDBCollection createCollection(CollectionDefinition def);
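
To make the three metric names more concrete, here is a minimal sketch of the textbook formulas behind cosine, euclidean and dot_product. It is plain Java and does not call the SDK. The class and method names are hypothetical, and how the service turns these values into a similarity score is not covered here.

```java
public class SimilarityMetricSketch {

    /** Sum of the element-wise products of two vectors. */
    public static double dotProduct(float[] a, float[] b) {
        requireSameDimension(a, b);
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += (double) a[i] * b[i];
        }
        return sum;
    }

    /** Dot product divided by the product of the two norms. */
    public static double cosine(float[] a, float[] b) {
        double norms = Math.sqrt(dotProduct(a, a)) * Math.sqrt(dotProduct(b, b));
        return dotProduct(a, b) / norms;
    }

    /** Straight-line distance between the two vectors. */
    public static double euclidean(float[] a, float[] b) {
        requireSameDimension(a, b);
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double diff = (double) a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    // Vectors in one collection share a dimension, so mismatches are rejected.
    private static void requireSameDimension(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same dimension");
        }
    }

    public static void main(String[] args) {
        float[] x = {1f, 0f};
        float[] y = {0f, 1f};
        System.out.println(cosine(x, y) + " " + euclidean(x, y) + " " + dotProduct(x, y));
    }
}
```

The dimension check mirrors the rule that the dimension is fixed when the collection is created.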
  • Data API

Below is the associated REST API payload

Create a collection with no vector

{
  "createCollection": {
    "name": "collection_simple"
  }
}

Create a collection with a vector

{
  "createCollection": {
    "name": "collection_vector",
    "options": {
      "vector": {
        "dimension": 14,
        "metric": "cosine"
      }
    }
  }
}

Create a collection with a vector and indexing options

{
  "createCollection": {
    "name": "collection_deny",
    "options": {
      "vector": {
        "dimension": 14,
        "metric": "cosine"
      },
      "indexing": {
        "deny": [
          "blob_body"
        ]
      }
    }
  }
}

List Collections

List the collections in the current database with their attributes (similarity, dimension, indexing, and so on).

General Information
  • A database can have up to 5 collections.
  • A collection with a vector has a set of options like dimension, similarity and indexing.
Stream<String> findAllCollectionNames();
Stream<CollectionDefinition> findAllCollections();
  • Data API

Below is the associated REST API payload

{
  "findCollections": {
    "options": {
      "explain": true
    }
  }
}

Find Collection

Retrieve a collection definition by name.

General Information
  • name is the identifier of the collection.
Optional<CollectionDefinition> findCollectionByName(String name);
boolean isCollectionExists(String name);
  • Data API

List collections

{
  "findCollections": {
    "options": {
      "explain": true
    }
  }
}

Delete Collection

Delete a collection by name.

General Information
  • If the collection does not exist, the method will not return any error.
void deleteCollection(String name);
  • Data API

Delete a collection by name

{
  "deleteCollection": {
    "name": "collection_vector2"
  }
}

Working with Documents

Insert One

Insert a single document with insertOne(). Several signatures are available.

General Information
  • If not provided, the identifier is generated as a Java UUID
  • The method always returns the document identifier
  • All attributes are optional (schemaless)
  • Attribute names should match [A-Za-z_]
  • All simple standard Java types are supported
  • Nested objects are supported
  • A field value should not exceed 5Kb
  • Each attribute is indexed and searchable
  • A vector cannot contain only zeros, because that would lead to a division by zero
  • Signature
JsonDocumentMutationResult 
  insertOne(JsonDocument doc);

CompletableFuture<JsonDocumentMutationResult> 
  insertOneASync(JsonDocument doc);

DocumentMutationResult<DOC> 
  insertOne(Document<DOC> document);

CompletableFuture<DocumentMutationResult<DOC>> 
  insertOneASync(Document<DOC> document);
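
The rule about all-zero vectors can be checked on the client before a document is sent. The sketch below is a hypothetical pre-flight check, not something the SDK is documented to perform. It rejects a vector whose dimension differs from the collection's dimension or whose values are all zero, since such a vector has a norm of zero and a cosine comparison would divide by it.

```java
public class VectorCheckSketch {

    /** Returns true when every component of the vector is zero. */
    public static boolean isAllZeros(float[] vector) {
        for (float value : vector) {
            if (value != 0f) {
                return false;
            }
        }
        return true;
    }

    /** Throws if the vector cannot be stored in a collection of the given dimension. */
    public static void validate(float[] vector, int expectedDimension) {
        if (vector == null || vector.length != expectedDimension) {
            throw new IllegalArgumentException(
                "Vector must have exactly " + expectedDimension + " components");
        }
        if (isAllZeros(vector)) {
            // A zero norm would make the cosine denominator zero.
            throw new IllegalArgumentException("Vector must not contain only zeros");
        }
    }

    public static void main(String[] args) {
        float[] vector = new float[14];
        java.util.Arrays.fill(vector, 0.3f);
        validate(vector, 14); // passes silently
        System.out.println("vector accepted");
    }
}
```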
  • Data API Payload
{
  "insertOne": {
    "document": {
      "product_name": "HealthyFresh - Chicken raw dog food",
      "product_price": 9.99,
      "_id": "f2472946-cc9f-4ad1-801d-f1cf21d8cb38",
      "$vector": [
        0.3, 0.3, 0.3, 0.3, 0.3,
        0.3, 0.3, 0.3, 0.3, 0.3,
        0.3, 0.3, 0.3, 0.3
      ]
    }
  }
}

Upsert One

General Information
  • insert* returns an error when the provided id already exists in the collection.
  • upsert* updates the document if it exists, or inserts it if it does not.
  • Signatures
JsonDocumentMutationResult 
  upsertOne(JsonDocument doc);

CompletableFuture<JsonDocumentMutationResult>  
  upsertOneASync(JsonDocument doc);

DocumentMutationResult<DOC>  
  upsertOne(Document<DOC> document);

CompletableFuture<DocumentMutationResult<DOC>>  
  upsertOneASync(Document<DOC> document);
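
The difference between the two families of methods can be illustrated with an in-memory map standing in for a collection. This is only a sketch of the semantics described above: the class UpsertSemanticsSketch is hypothetical, it does not call the SDK, and the exception type is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class UpsertSemanticsSketch {

    // In-memory stand-in for a collection, keyed by document id.
    private final Map<String, Map<String, Object>> documents = new LinkedHashMap<>();

    /** Fails when the id is already present, like insert*. */
    public String insertOne(String id, Map<String, Object> document) {
        if (documents.containsKey(id)) {
            throw new IllegalStateException("Document already exists: " + id);
        }
        documents.put(id, document);
        return id;
    }

    /** Replaces the document when the id exists, inserts it otherwise, like upsert*. */
    public String upsertOne(String id, Map<String, Object> document) {
        documents.put(id, document);
        return id;
    }

    public Map<String, Object> get(String id) {
        return documents.get(id);
    }

    public static void main(String[] args) {
        UpsertSemanticsSketch collection = new UpsertSemanticsSketch();
        collection.insertOne("1", Map.of("a", "a", "b", "initial"));
        collection.upsertOne("1", Map.of("a", "a", "b", "updated"));
        System.out.println(collection.get("1").get("b"));
    }
}
```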
  • Data API Payload
{
  "findOneAndReplace": {
    "filter": {
      "_id": "1"
    },
    "options": {
      "upsert": true
    },
    "replacement": {
      "a": "a",
      "b": "updated",
      "_id": "1"
    }
  }
}

Insert Many

General Information
  • The underlying REST API is paged. The maximum page size is 20.
  • To perform bulk loading, distributing the workload is recommended
  • The insertManyChunked* methods are helpers that distribute the workload
  • If more than 20 documents are provided, chunking is applied under the hood
  • Signatures
// Use a json String
List<JsonDocumentMutationResult> 
   insertMany(String json);
CompletableFuture<List<JsonDocumentMutationResult>> 
   insertManyASync(String json);

// Use an Array of JsonDocuments
List<JsonDocumentMutationResult>
   insertMany(JsonDocument... documents);
CompletableFuture<List<JsonDocumentMutationResult>>
   insertManyASync(JsonDocument... documents);

// Use a list of JsonDocument
List<JsonDocumentMutationResult> 
   insertManyJsonDocuments(List<JsonDocument> documents);
CompletableFuture<List<JsonDocumentMutationResult>> 
   insertManyJsonDocumentsASync(List<JsonDocument> documents);

// Use an Array of Document<T>
List<DocumentMutationResult<DOC>> 
   insertMany(Document<DOC>... documents);
CompletableFuture<List<DocumentMutationResult<DOC>>>
   insertManyASync(Document<DOC>... documents);

// Use a list of Document<T>
List<DocumentMutationResult<DOC>> 
   insertMany(List<Document<DOC>> documents);
CompletableFuture<List<DocumentMutationResult<DOC>>>
    insertManyASync(List<Document<DOC>> documents);
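
The chunking mentioned above amounts to slicing the input into groups of at most 20 documents. The sketch below shows one plausible way to do this; it is not taken from the SDK, and the names ChunkingSketch and chunk are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkingSketch {

    // Maximum page size stated in the text.
    public static final int MAX_PAGE_SIZE = 20;

    /** Splits the documents into consecutive chunks of at most chunkSize elements. */
    public static <T> List<List<T>> chunk(List<T> documents, int chunkSize) {
        if (chunkSize < 1 || chunkSize > MAX_PAGE_SIZE) {
            throw new IllegalArgumentException("chunkSize must be between 1 and " + MAX_PAGE_SIZE);
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < documents.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, documents.size());
            chunks.add(new ArrayList<>(documents.subList(start, end)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<Integer> documents = new ArrayList<>();
        for (int i = 0; i < 45; i++) {
            documents.add(i);
        }
        // 45 documents are split into chunks of 20, 20 and 5.
        for (List<Integer> c : chunk(documents, MAX_PAGE_SIZE)) {
            System.out.println(c.size());
        }
    }
}
```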
  • Data API

Insert Many with ordered false

{
  "insertMany": {
    "options": {
      "ordered": false
    },
    "documents": [
      {
        "product_name": "test1",
        "product_price": 12.99,
        "_id": "doc1"
      },
      {
        "product_name": "test2",
        "product_price": 2.99,
        "_id": "doc2"
      }
    ]
  }
}

Insert Many with ordered true

{
  "insertMany": {
    "options": {
      "ordered": true
    },
    "documents": [
      {
        "firstName": "Lucas",
        "lastName": "Hernandez",
        "_id": "1"
      },
      {
        "firstName": "Antoine",
        "lastName": "Griezmann",
        "_id": "2"
      },
      {
        "firstName": "N'Golo",
        "lastName": "Kanté",
        "_id": "3"
      },
      {
        "firstName": "Paul",
        "lastName": "Pogba",
        "_id": "4"
      },
      {
        "firstName": "Raphaël",
        "lastName": "Varane",
        "_id": "5"
      },
      {
        "firstName": "Hugo",
        "lastName": "Lloris",
        "_id": "6"
      },
      {
        "firstName": "Olivier",
        "lastName": "Giroud",
        "_id": "7"
      },
      {
        "firstName": "Benjamin",
        "lastName": "Pavard",
        "_id": "8"
      },
      {
        "firstName": "Kylian",
        "lastName": "Mbappé",
        "_id": "9"
      }
    ]
  }
}

Insert Many Chunked

  • Signatures
// Insert a list of json documents
List<JsonDocumentMutationResult> 
  insertManyChunkedJsonDocuments(List<JsonDocument> documents, int chunkSize, int concurrency);
CompletableFuture<List<JsonDocumentMutationResult>> 
  insertManyChunkedJsonDocumentsAsync(List<JsonDocument> documents, int chunkSize, int concurrency);

// Insert a list of documents
List<DocumentMutationResult<DOC>> 
  insertManyChunked(List<Document<DOC>> documents, int chunkSize, int concurrency);
CompletableFuture<List<DocumentMutationResult<DOC>>> 
  insertManyChunkedASync(List<Document<DOC>> documents, int chunkSize, int concurrency);
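
The chunkSize and concurrency parameters suggest that chunks are processed in parallel by a bounded number of workers. The following sketch shows that idea with a fixed thread pool. It is an assumption about the general technique, not the SDK's implementation, and the insertChunk function is a hypothetical stand-in for the call that sends one chunk to the API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class ChunkedInsertSketch {

    /** Splits documents into chunks and processes them with at most 'concurrency' threads. */
    public static <T, R> List<R> insertManyChunked(
            List<T> documents, int chunkSize, int concurrency,
            Function<List<T>, List<R>> insertChunk) {
        if (chunkSize < 1 || concurrency < 1) {
            throw new IllegalArgumentException("chunkSize and concurrency must be positive");
        }
        ExecutorService pool = Executors.newFixedThreadPool(concurrency);
        try {
            List<Future<List<R>>> futures = new ArrayList<>();
            for (int start = 0; start < documents.size(); start += chunkSize) {
                int end = Math.min(start + chunkSize, documents.size());
                List<T> chunk = new ArrayList<>(documents.subList(start, end));
                Callable<List<R>> task = () -> insertChunk.apply(chunk);
                futures.add(pool.submit(task));
            }
            // Results are collected in submission order, whatever the completion order.
            List<R> results = new ArrayList<>();
            for (Future<List<R>> future : futures) {
                results.addAll(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while inserting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A chunk failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Integer> documents = new ArrayList<>();
        for (int i = 0; i < 45; i++) {
            documents.add(i);
        }
        // Fake inserter: returns one identifier per document of the chunk.
        List<String> ids = insertManyChunked(documents, 20, 3, chunk -> {
            List<String> out = new ArrayList<>();
            for (Integer d : chunk) {
                out.add("id-" + d);
            }
            return out;
        });
        System.out.println(ids.size());
    }
}
```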

Upsert Many

  • Signatures
// Use a json String
List<JsonDocumentMutationResult>
   upsertMany(String json);
CompletableFuture<List<JsonDocumentMutationResult>>
   upsertManyASync(String json);

// Use a list of JsonDocument
List<JsonDocumentMutationResult>
   upsertManyJsonDocuments(List<JsonDocument> documents);
CompletableFuture<List<JsonDocumentMutationResult>>
   upsertManyJsonDocumentsASync(List<JsonDocument> documents);

// Use a list of Document<T>
List<DocumentMutationResult<DOC>>
   upsertMany(List<Document<DOC>> documents);
CompletableFuture<List<DocumentMutationResult<DOC>>>
   upsertManyASync(List<Document<DOC>> documents);

Find By Id

  • Signatures
Optional<JsonDocumentResult> findById(String id);
Optional<DocumentResult<T>> findById(String id, Class<T> bean);
Optional<DocumentResult<T>> findById(String id, DocumentResultMapper<T> mapper);
boolean isDocumentExists(String id);
  • Data API
{
  "findOne": {
    "filter": {
      "_id": "p1"
    }
  }
}

Find By Vector

  • Signatures
Optional<JsonDocumentResult> findOneByVector(float[] vector);
Optional<DocumentResult<T>> findOneByVector(float[] vector, Class<T> bean);
Optional<DocumentResult<T>> findOneByVector(float[] vector, DocumentResultMapper<T> mapper);

Find One

Introducing SelectQuery

Under the hood, every search against the REST API takes four parameters:

  • $filter: your search criteria (the where clause)
  • $projection: the list of fields you want to retrieve (the select clause)
  • $sort: the ordering of the results, either in memory (order by) or by vector search (order by ANN)
  • $options: everything else, such as paging and limit

The SelectQuery class is a fluent builder that helps you assemble these parameters into a query.

As with findById and findOneByVector, three methods are available to retrieve a document. If the SelectQuery matches several documents, only the first one is returned. When in doubt, use find() or, even better, findPage(), which does not exhaust the whole collection.

Optional<JsonDocumentResult> findOne(SelectQuery query);
Optional<DocumentResult<T>> findOne(SelectQuery query, Class<T> clazz);
Optional<DocumentResult<T>> findOne(SelectQuery query, ResultMapper<T> mapper);

Here is a sample class detailing the usage of the findOne method.

  • Data API

Find with a Greater Than or Equals

{
  "find": {
    "filter": {
      "product_price": {
        "$gte": 12.99
      }
    }
  }
}

Find with a Less Than

{
  "find": {
    "filter": {
      "product_price": {
        "$lt": 10
      }
    }
  }
}

Find with a Less Than or Equals

{
  "find": {
    "filter": {
      "product_price": {
        "$lte": 9.99
      }
    }
  }
}

Find with an Equals

{
  "find": {
    "filter": {
      "product_price": 9.99
    }
  }
}

Find with a Not Equals

{
  "find": {
    "filter": {
      "product_price": {
        "$ne": 9.99
      }
    }
  }
}

Find with an Exists

{
  "find": {
    "filter": {
      "product_price": {
        "$exists": true
      }
    }
  }
}

Find with an And

{
  "find": {
    "filter": {
      "$and": [
        {
          "product_price": {
            "$exists": true
          }
        },
        {
          "product_price": {
            "$ne": 9.99
          }
        }
      ]
    }
  }
}

Find with an In

{
  "find": {
    "filter": {
      "metadata_string": {
        "$in": [
          "hello",
          "world"
        ]
      }
    }
  }
}

Find with a Not In

{
  "find": {
    "filter": {
      "metadata_string": {
        "$nin": [
          "Hallo",
          "Welt"
        ]
      }
    }
  }
}

Find with a Size

{
  "find": {
    "filter": {
      "metadata_boolean_array": {
        "$size": 3
      }
    }
  }
}

Find with a Less Than Instant

{
  "find": {
    "filter": {
      "metadata_instant": {
        "$lt": {
          "$date": 1707483540638
        }
      }
    }
  }
}
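
Most of the single-field payloads above share one shape: a field name, an operator, and a value. The sketch below renders that shape as a compact JSON string, including the {"$date": ...} wrapper used for instants. It is a hand-rolled illustration and not the SDK's SelectQuery; the class FindPayloadSketch and its methods are hypothetical, and field and operator names are assumed to need no escaping.

```java
import java.time.Instant;

public class FindPayloadSketch {

    /** Renders a value as a JSON literal; an Instant becomes {"$date": epochMillis}. */
    static String jsonValue(Object value) {
        if (value instanceof Instant) {
            return "{\"$date\":" + ((Instant) value).toEpochMilli() + "}";
        }
        if (value instanceof Number || value instanceof Boolean) {
            return String.valueOf(value);
        }
        // Everything else is treated as a string and minimally escaped.
        String text = String.valueOf(value).replace("\\", "\\\\").replace("\"", "\\\"");
        return "\"" + text + "\"";
    }

    /** Builds {"find":{"filter":{field:{operator:value}}}}. */
    public static String find(String field, String operator, Object value) {
        return "{\"find\":{\"filter\":{\"" + field + "\":{\""
            + operator + "\":" + jsonValue(value) + "}}}}";
    }

    public static void main(String[] args) {
        System.out.println(find("product_price", "$gte", 12.99));
        System.out.println(find("metadata_instant", "$lt", Instant.ofEpochMilli(1707483540638L)));
    }
}
```

A real client would use a JSON library rather than string concatenation; the sketch only shows how a where/operator/value triple maps onto the filter document.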

Find

Reminders on SelectQuery

Under the hood, every search against the REST API takes four parameters:

  • $filter: your search criteria (the where clause)
  • $projection: the list of fields you want to retrieve (the select clause)
  • $sort: the ordering of the results, either in memory (order by) or by vector search (order by ANN)
  • $options: everything else, such as paging and limit

The SelectQuery class is a fluent builder that helps you assemble these parameters into a query.

 SelectQuery.builder()
 .where("product_price")
 .isEqualsTo(9.99)
 .build();
Important

With the JSON API, all queries are paged and the maximum page size is 20. The findAll() and find() methods fetch the pages one after the other until pagingState is null, so use them with caution.

  • To retrieve every document of a collection use findAll()
// Find All for VectorStore<MyBean>
Stream<JsonResult> all = col1.findAll();
  • Find with a Query
  • To perform semantic search use findVector()

Paging

Every request to the JSON API is paged, and the maximum page size is 20. The paging methods return a Page that contains the data and a field called pagingState.

  • Find Page

The signatures are close to those of find(), because find() uses findPage() under the hood. The difference is that find() exhausts all the pages and returns a Stream<JsonResult>.

Page<JsonResult> jsonResult = findPage(SelectQuery query);
Page<Result<T>> jsonResult2 = findPage(SelectQuery query, Class<T> clazz);
Page<Result<T>> jsonResult3 = findPage(SelectQuery query, ResultMapper<T> mapper);
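
The relationship between find() and findPage() can be sketched as a loop that keeps requesting pages until pagingState is null. The code below uses a hypothetical Page class and an in-memory page source so that it runs without the SDK; the real Page type and its accessors may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class PagingSketch {

    /** Hypothetical stand-in for the SDK's Page type. */
    public static final class Page<T> {
        private final List<T> results;
        private final String pagingState;

        public Page(List<T> results, String pagingState) {
            this.results = results;
            this.pagingState = pagingState;
        }

        public List<T> getResults() { return results; }

        public String getPagingState() { return pagingState; }
    }

    /** Fetches pages one after the other until pagingState is null. */
    public static <T> List<T> fetchAll(Function<String, Page<T>> fetchPage) {
        List<T> all = new ArrayList<>();
        String pagingState = null; // null requests the first page
        do {
            Page<T> page = fetchPage.apply(pagingState);
            all.addAll(page.getResults());
            pagingState = page.getPagingState();
        } while (pagingState != null);
        return all;
    }

    /** In-memory page source, used only to exercise the loop. */
    public static Function<String, Page<Integer>> inMemorySource(int total, int pageSize) {
        return state -> {
            int start = (state == null) ? 0 : Integer.parseInt(state);
            int end = Math.min(start + pageSize, total);
            List<Integer> items = new ArrayList<>();
            for (int i = start; i < end; i++) {
                items.add(i);
            }
            // The next state is the next offset, or null on the last page.
            return new Page<>(items, end < total ? String.valueOf(end) : null);
        };
    }

    public static void main(String[] args) {
        System.out.println(fetchAll(inMemorySource(45, 20)).size());
    }
}
```

With a large collection this loop issues one request per 20 documents, which is why findPage() is the safer choice when only part of the result is needed.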

Update One

Updates an existing document.

Update Many

Updates the set of documents matching a request.

Delete One

Deletes an existing document.

Delete Many

Deletes the set of documents matching a request.

Clear

Empties a collection.

Object Mapping

Overview

Instead of interacting with the database through keys and values, you may want to associate an object with each record in the collection. For this, you can use CollectionRepository.

Repository Pattern

Instead of working with raw JsonDocument objects, you can work with your own classes. Each object is serialized to JSON and stored in the database. Rather than providing a ResultMapper on every call, use the repository pattern. The signatures follow those of CrudRepository from Spring Data.

long count();
void delete(T entity);
void deleteAll();
void deleteAll(Iterable<? extends T> entities);
void deleteAllById(Iterable<? extends ID> ids);
void deleteById(ID id);
boolean existsById(ID id);
Iterable<T> findAll();
Iterable<T> findAllById(Iterable<ID> ids);
Optional<T> findById(ID id);
<S extends T> S save(S entity);
<S extends T> Iterable<S> saveAll(Iterable<S> entities);
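
To show how these signatures behave, here is a minimal in-memory sketch that implements a few of them on top of a map. It is a stand-in for illustration only and does not talk to a database; the class InMemoryRepositorySketch and the idExtractor function are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

public class InMemoryRepositorySketch<T, ID> {

    private final Map<ID, T> store = new LinkedHashMap<>();
    private final Function<T, ID> idExtractor;

    /** idExtractor tells the repository how to read the identifier of an entity. */
    public InMemoryRepositorySketch(Function<T, ID> idExtractor) {
        this.idExtractor = idExtractor;
    }

    /** Inserts the entity, or replaces the one stored under the same id. */
    public <S extends T> S save(S entity) {
        store.put(idExtractor.apply(entity), entity);
        return entity;
    }

    public Optional<T> findById(ID id) {
        return Optional.ofNullable(store.get(id));
    }

    public boolean existsById(ID id) {
        return store.containsKey(id);
    }

    public Iterable<T> findAll() {
        return new ArrayList<>(store.values());
    }

    public long count() {
        return store.size();
    }

    public void deleteById(ID id) {
        store.remove(id);
    }

    public void deleteAll() {
        store.clear();
    }

    public static void main(String[] args) {
        // Strings keyed by their length, purely for demonstration.
        InMemoryRepositorySketch<String, Integer> repo =
            new InMemoryRepositorySketch<>(String::length);
        repo.save("ab");
        repo.save("abc");
        System.out.println(repo.count() + " " + repo.findById(3).orElse("none"));
    }
}
```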

Create collection

Insert One

Insert Many

Find One

  • To get a single document use findById() or findByVector()

Find

  • To perform search use find()
  • To perform semantic search use findVector()

Update One

Update Many

Delete One

Delete Many

Clear

Working with databases

Connection

About token permissions

To work with databases, you need a token with organization-level permissions and the AstraDBClient class.

To establish a connection with AstraDB using the client SDK, you must supply a token. This token enables two primary connection modes:

  • Direct database-level connection, which gives access to a specific database. This is the mode described above and the primary way of working with the SDK.

  • Organization-level connection, which allows interaction with multiple databases under your organization. This mode is described below.

The AstraDBClient class facilitates interactions with all components within your Astra organization, rather than limiting operations to a single database. This enables a broader scope of management and control across the organization's databases. The token used for this connection must be scoped to the organization with the following role:

Property     Value
Token Role   Organization Administrator

List databases

Create database

To create a database, you need a token with organization-level permissions and the AstraDBClient class.

Find database

  • Accessing object AstraDB
AstraDB myDB = client.database("getting_started");

Delete database

  • Delete Databases with deleteDatabase

The function can take a database identifier (uuid) or the database name.

Working with Keyspaces

Create Keyspace

Create a keyspace in the current database with the given name.

General Information
  • Default keyspace is default_keyspace
  • If the keyspace already exists, the method throws KeyspaceAlreadyExistException
void createKeyspace(String databaseName, String keyspaceName);
void createKeyspace(UUID databaseId, String keyspaceName);

Delete Keyspace

Delete a keyspace in the current database by name.

General Information
  • Default keyspace is default_keyspace
  • If the keyspace does not exist, the method throws KeyspaceNotFoundException
void deleteKeyspace(String databaseName, String keyspaceName);
void deleteKeyspace(UUID databaseId, String keyspaceName);

Find Keyspace

General Information
  • A database is not limited in number of keyspaces.
  • A keyspace is a logical grouping of collections.
  • Default keyspace name is default_keyspace
boolean isKeyspaceExists(String keyspaceName);
Stream<String> findAllKeyspaceNames();
String getCurrentKeyspace(String keyspaceName);
void changeKeyspace(String keyspaceName);

6. Class Diagram

7. Working with CassIO

CassIO is a framework, originally implemented in Python, for using open source Cassandra as a vector store. It has been partially ported to Java. The idea is for Java to use the same tables created by CassIO.

Connection

General Information
  • CassIO is a framework to use Open Source Cassandra as a Vector Store.
  • The Java port covers only two tables: metadata_vector and clustered_metadata_vector
  • The tables are created with a specific schema to store vectors and metadata
  • The indices are created to perform efficient search on the vector
CqlSession init(String token, UUID databaseId, String databaseRegion, String keyspace);

MetadataVectorTable

General Information
  • Creating a Cassandra table with the following schema and associated indices
CREATE TABLE vector_store (
 row_id          timeuuid,
 attributes_blob text,
 body_blob       text,
 metadata_s      map<text, text>,
 vector          vector<float, 1536>,
 PRIMARY KEY (row_id)
);

ClusteredMetadataVectorTable

General Information
  • Creating a Cassandra table
CREATE TABLE goodbards.vector_store_openai_by_tenant (
 partition_id    text,
 row_id          timeuuid,
 attributes_blob text,
 body_blob       text,
 metadata_s      map<text, text>,
 vector          vector<float, 1536>,
 PRIMARY KEY (partition_id, row_id)
) WITH CLUSTERING ORDER BY (row_id DESC);

8. Working with Langchain4j



Last update: 2025-01-23