• Astra DB Client

1. Overview

The Astra DB Client is a client library that interacts with the various APIs of the DataStax Astra platform. It lets you connect to, use, and administer the Astra Vector product. The library provides four classes that work in tandem:

  • AstraDBAdmin: This class is initialized exclusively with an organization administrator token and enables the creation and deletion of databases via the DevOps API. It facilitates automation and administration within your organization's tenant.

  • AstraDB: This is the primary entry point. It connects to a single database and performs all operations for your applications. It requires a database administrator token and the API endpoint of your database.

  • AstraDBCollection: This client class facilitates all operations at the collection level, including find(), insert(), and delete(). It is instantiated through the AstraDB class and accommodates operations on both vector and non-vector collections.

  • AstraDBRepository: This class represents a specialized form of AstraDBCollection designed for use with Java beans (T). It embodies the repository pattern, streamlining the management and access of domain entities.

Reference Architecture

2. Prerequisites

Java and Apache Maven/Gradle Setup
  • Install Java Development Kit (JDK) 11+

Use the Java reference documentation to install a Java Development Kit (JDK) tailored for your operating system. After installation, you can validate your setup with the following command:

java --version
  • Install Apache Maven (3.9+) or Gradle

Samples and tutorials are designed to be used with Apache Maven. Follow the instructions in the reference documentation to install Maven. To validate your installation, use the following command:

mvn -version
Astra Environment Setup
  • Create your DataStax Astra account:

Sign Up to Datastax Astra

  • Create an organization level Astra Token

Once logged into the user interface, select settings from the left menu and then click on the tokens tab to create a new token.

You want to pick the following role:

Property      Value
Token Role    Organization Administrator

The token is made of three properties: Client ID, Client Secret, and Token. You only need the third one, which starts with AstraCS:

{
  "ClientId": "ROkiiDZdvPOvHRSgoZtyAapp",
  "ClientSecret": "fakedfaked",
  "Token":"AstraCS:fake" <========== use this field
}

To operate with AstraDBAdmin, this specific organization-level token is required. For tasks involving AstraDB at the database level, a database-level token suffices. The procedure for creating such a token is detailed in subsequent sections.
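As a minimal sketch of handling the token safely, the helper below reads it from an environment variable and checks the AstraCS: prefix described above. The class name, the helper method, and the environment variable name are assumptions made for illustration; none of them are part of the SDK.

```java
// Hypothetical helper, not part of the SDK: keeps the token out of source
// code and checks that the right field of the generated token was copied.
final class AstraTokenCheck {

    // Assumed variable name; use whatever fits your deployment.
    static final String ENV_VAR = "ASTRA_DB_APPLICATION_TOKEN";

    private static final String PREFIX = "AstraCS:";

    // True when the value starts with the expected prefix and has content after it.
    static boolean looksLikeAstraToken(String token) {
        return token != null
                && token.startsWith(PREFIX)
                && token.length() > PREFIX.length();
    }

    public static void main(String[] args) {
        String token = System.getenv(ENV_VAR);
        if (!looksLikeAstraToken(token)) {
            System.err.println("Set " + ENV_VAR + " to the Token field (it starts with " + PREFIX + ")");
            return;
        }
        System.out.println("Token format looks plausible");
    }
}
```

The check only looks at the format. Whether the token is actually valid is verified by the service when the client is initialized.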

3. Getting Started

Project Setup

  • If you are using Maven, update your pom.xml file with the latest version of the Vector SDK, available on Maven Central:
<dependency>
  <groupId>com.datastax.astra</groupId>
  <artifactId>astra-db-client</artifactId>
  <version>${latest}</version>
</dependency>
  • If you are using Gradle, update your build.gradle file with:
dependencies {
    implementation 'com.datastax.astra:astra-db-client:1.0'
}

Quickstart

Getting your token and Api Endpoint

The AstraDB class is the entry point of the SDK. It enables interactions with one particular database within your Astra environment. You can initialize it in either of two ways:

  • Using a token along with the api_endpoint. Both are retrieved from the Astra user interface.
  • Using a token with the database identifier and, optionally, the region.

To establish this connection, you can generate a token via the user interface. This token will be assigned the Database Administrator permission level, which grants sufficient privileges for interacting with a specific database.

The api_endpoint is obtained from the user interface. It adheres to the following pattern: https://{database-identifier}-{database-region}.apps.astra.datastax.com.
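The endpoint pattern above can be sketched as a small helper. This is only an illustration of the URL format: the class and method names are hypothetical, and the database identifier and region used in main are placeholders, not real values.

```java
import java.util.UUID;

// Hypothetical helper, not part of the SDK: assembles the endpoint from
// the pattern https://{database-identifier}-{database-region}.apps.astra.datastax.com
final class ApiEndpointSketch {

    static String apiEndpoint(UUID databaseId, String region) {
        if (databaseId == null || region == null || region.isEmpty()) {
            throw new IllegalArgumentException("databaseId and region are required");
        }
        return "https://" + databaseId + "-" + region + ".apps.astra.datastax.com";
    }

    public static void main(String[] args) {
        // Placeholder values for illustration only
        UUID databaseId = UUID.fromString("00000000-0000-0000-0000-000000000000");
        System.out.println(apiEndpoint(databaseId, "us-east1"));
    }
}
```

In practice, copying the endpoint from the user interface avoids mistakes in either component.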

QuickStart.java
package com.dtsx.astra.sdk.documentation;

import com.dtsx.astra.sdk.AstraDB;
import com.dtsx.astra.sdk.AstraDBCollection;
import io.stargate.sdk.data.domain.query.Filter;
import io.stargate.sdk.data.domain.JsonDocument;
import io.stargate.sdk.data.domain.JsonDocumentResult;

import java.util.Map;
import java.util.stream.Stream;

public class QuickStart {
  public static void main(String[] args) {

    // Initialize the client
    AstraDB myDb = new AstraDB("TOKEN", "API_ENDPOINT");

    // Create a collection
    AstraDBCollection demoCollection = myDb.createCollection("demo",14);

    // Insert vectors
    demoCollection.insertOne(
        new JsonDocument()
           .id("doc1") // generated if not set
           .vector(new float[]{1f, 0f, 1f, 1f, 1f, 1f, 0f, 0f, 0f, 0f, 0f, 0f, 0f, 0f})
           .put("product_name", "HealthyFresh - Beef raw dog food")
           .put("product_price", 12.99));
    demoCollection.insertOne(
        new JsonDocument()
           .id("doc2")
           .vector(new float[]{1f, 1f, 1f, 1f, 1f, 0f, 0f, 0f, 0f, 0f, 0f, 0f, 0f, 0f})
           .put("product_name", "HealthyFresh - Chicken raw dog food")
           .put("product_price", 9.99));
    demoCollection.insertOne(
        new JsonDocument()
           .id("doc3")
           .vector(new float[]{1f, 1f, 1f, 1f, 1f, 0f, 0f, 0f, 0f, 0f, 0f, 0f, 0f, 0f})
           .data(Map.of("product_name", "HealthyFresh - Chicken raw dog food")));
    demoCollection.insertOne(
        new JsonDocument()
           .id("doc4")
           .vector(new float[]{1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f})
           .put("product_name", "HealthyFresh - Chicken raw dog food")
           .put("product_price", 9.99));

    // Perform a similarity search
    float[] embeddings = new float[] {1f, 1f, 1f, 1f, 1f, 0f, 0f, 0f, 0f, 0f, 0f, 0f, 0f, 0f};
    Filter metadataFilter = new Filter().where("product_price").isEqualsTo(9.99);
    int maxRecord = 10;
    long top = System.currentTimeMillis();
    Stream<JsonDocumentResult> resultsSet = demoCollection.findVector(embeddings, metadataFilter, maxRecord);
    System.out.println(System.currentTimeMillis() - top);

  }
}

4. Reference Guide

Connection

Connect to AstraDB Vector by instantiating the AstraDB class.

General Information
  • The connection is stateless and thread safe; the client initializes an HTTP client.
  • At initialization, a check is performed to ensure the endpoint and token are valid.
  • If no keyspace is provided, the default keyspace is default_keyspace.
  • Database UUID and region are part of the endpoint URL.
AstraDB(String token, String apiEndpoint);
AstraDB(String token, String apiEndpoint, String keyspace);
AstraDB(String token, UUID databaseId);
AstraDB(String token, UUID databaseId, String keyspace);
AstraDB(String token, UUID databaseId, String region, String keyspace);
AstraDB(String token, UUID databaseId, String region, AstraEnvironment env, String keyspace);
  • Sample Code
Connecting.java
package com.dtsx.astra.sdk.documentation;

import com.dtsx.astra.sdk.AstraDB;

import java.util.UUID;

public class Connecting {
  public static void main(String[] args) {
    // Default initialization
    AstraDB db = new AstraDB("TOKEN", "API_ENDPOINT");

    // Initialize with a non-default keyspace
    AstraDB db1 = new AstraDB("TOKEN", "API_ENDPOINT", "<keyspace>");

    // Initialize with an identifier instead of an endpoint
    UUID databaseUuid = UUID.fromString("<database_id>");
    AstraDB db2 = new AstraDB("TOKEN", databaseUuid);
  }
}

Working with Collections

Overview

AstraDB is a vector database that manages multiple collections. Each collection (AstraDBCollection) is identified by a name and stores schema-less documents. It is capable of holding any JSON document, each uniquely identified by an _id. Additionally, a JSON document within AstraDB can contain a vector. It is important to note that all documents within the same collection should utilize vectors of the same type, characterized by consistent dimensions and metrics.
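To make the notion of a metric concrete, here is a minimal sketch of the raw mathematical definitions behind the three metric names a collection can use (cosine, euclidean, dot_product). The class and method names are hypothetical, and the similarity score returned by the service may be scaled differently from these raw values.

```java
// Illustration only, not SDK code: textbook definitions of the metrics.
final class SimilaritySketch {

    // Sum of element-wise products; larger means more similar.
    static double dotProduct(float[] a, float[] b) {
        requireSameDimension(a, b);
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += (double) a[i] * b[i];
        }
        return sum;
    }

    // Straight-line distance; smaller means more similar.
    static double euclideanDistance(float[] a, float[] b) {
        requireSameDimension(a, b);
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double diff = (double) a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    // Cosine of the angle between the vectors; NaN if either vector is all zeros.
    static double cosine(float[] a, float[] b) {
        return dotProduct(a, b) / (Math.sqrt(dotProduct(a, a)) * Math.sqrt(dotProduct(b, b)));
    }

    // Vectors can only be compared when they have the same dimension.
    private static void requireSameDimension(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Dimension mismatch: " + a.length + " vs " + b.length);
        }
    }

    public static void main(String[] args) {
        float[] a = {1f, 0f, 1f};
        float[] b = {1f, 1f, 0f};
        System.out.println("cosine=" + cosine(a, b));
        System.out.println("euclidean=" + euclideanDistance(a, b));
        System.out.println("dot_product=" + dotProduct(a, b));
    }
}
```

The dimension check in the sketch mirrors why all documents in a collection need vectors of the same size: the metrics are undefined otherwise.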

Create Collection

Create a collection in the current database.

General Information
  • A collection name is unique for a database
  • A collection name should match [a-zA-Z][a-zA-Z0-9_]*
  • The createCollection() method returns an instance of AstraDBCollection
  • Collection is created only if it does not exist
  • If collection exists, a check is performed for vector dimension and metric
  • There are a maximum of 5 collections per database
  • If not provided, default metric is cosine
  • Vector dimension and a metric are set at creation and cannot be changed later
  • The dimension is the size of the vector
  • The metric is the way the vector will be compared. It can be cosine, euclidean or dot_product
AstraDBCollection createCollection(String name);
AstraDBCollection createCollection(String name, int vectorDimension);
AstraDBCollection createCollection(String name, int vectorDimension, SimilarityMetric metric);
AstraDBCollection createCollection(CollectionDefinition def);
  • Sample Code
CreateCollection.java
package com.dtsx.astra.sdk.documentation;

import com.dtsx.astra.sdk.AstraDB;
import com.dtsx.astra.sdk.AstraDBCollection;
import io.stargate.sdk.data.domain.CollectionDefinition;
import io.stargate.sdk.data.domain.SimilarityMetric;
import io.stargate.sdk.data.exception.DataApiException;

public class CreateCollection {
  public static void main(String[] args) {
    AstraDB db = new AstraDB("TOKEN", "API_ENDPOINT");

    // Create a non-vector collection
    AstraDBCollection collection1 = db.createCollection("collection_simple");

    // Create a vector collection
    AstraDBCollection collection2 = db.createCollection(
        "collection_vector1",
        14,
        SimilarityMetric.cosine);

    // Create a vector collection with a builder
    AstraDBCollection collection3 = db.createCollection(CollectionDefinition
        .builder()
        .name("collection_vector2")
        .vector(1536, SimilarityMetric.euclidean)
        .build());

    // Collection names should use snake case ([a-zA-Z][a-zA-Z0-9_]*)
    try {
      db.createCollection("invalid.name");
    } catch(DataApiException e) {
      // invalid.name is not valid
    }
  }
}
  • Data API

Below is the associated REST API payload

Create a collection with no vector

{
  "createCollection": {
    "name": "collection_simple"
  }
}

Create a collection with a vector

{
  "createCollection": {
    "name": "collection_vector",
    "options": {
      "vector": {
        "dimension": 14,
        "metric": "cosine"
      }
    }
  }
}

Create a collection with a vector and indexing options

{
  "createCollection": {
    "name": "collection_deny",
    "options": {
      "vector": {
        "dimension": 14,
        "metric": "cosine"
      },
      "indexing": {
        "deny": [
          "blob_body"
        ]
      }
    }
  }
}
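If you want to catch invalid names before calling the service, a client-side check is one option. The sketch below assumes the pattern quoted in the sample code comment ([a-zA-Z][a-zA-Z0-9_]*); the class and method names are hypothetical, and the service remains the authority on what it accepts.

```java
import java.util.regex.Pattern;

// Hypothetical pre-flight check, not part of the SDK.
final class CollectionNameCheck {

    // Pattern taken from the comment in the CreateCollection sample
    private static final Pattern VALID_NAME = Pattern.compile("[a-zA-Z][a-zA-Z0-9_]*");

    static boolean isValid(String name) {
        return name != null && VALID_NAME.matcher(name).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("collection_vector1"));
        System.out.println(isValid("invalid.name"));
    }
}
```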

List Collections

List collections in the current database with their attributes (similarity, dimension, indexing...).

General Information
  • A database can have up to 5 collections.
  • A collection with a vector has a set of options like dimension, similarity and indexing.
Stream<String> findAllCollectionsNames();
Stream<CollectionDefinition> findAllCollections();
  • Sample Code
FindAllCollections.java
package com.dtsx.astra.sdk.documentation;

import com.dtsx.astra.sdk.AstraDB;
import io.stargate.sdk.data.domain.CollectionDefinition;

public class FindAllCollections {
  public static void main(String[] args) {
    AstraDB db = new AstraDB("TOKEN", "API_ENDPOINT");

    // Get Collection Names
    db.findAllCollectionsNames().forEach(System.out::println);

    // Iterate over all collections and print each vector definition
    db.findAllCollections().forEach(col -> {
      System.out.print("\nname=" + col.getName());
      if (col.getOptions() != null && col.getOptions().getVector() != null) {
        CollectionDefinition.Options.Vector vector = col.getOptions().getVector();
        System.out.print(", dim=" + vector.getDimension());
        System.out.print(", metric=" + vector.getMetric());
      }
    });
  }
}
  • Data API

Below is the associated REST API payload

{
  "findCollections": {
    "options": {
      "explain": true
    }
  }
}

Find Collection

Retrieve a collection definition by its name.

General Information
  • name is the identifier of the collection.
Optional<CollectionDefinition> findCollectionByName(String name);
boolean isCollectionExists(String name);
  • Sample Code
FindCollection.java
package com.dtsx.astra.sdk.documentation;

import com.dtsx.astra.sdk.AstraDB;
import io.stargate.sdk.data.domain.CollectionDefinition;
import java.util.Optional;

public class FindCollection {
  public static void main(String[] args) {
    AstraDB db = new AstraDB("TOKEN", "API_ENDPOINT");

    // Find a collection
    Optional<CollectionDefinition> collection = db.findCollectionByName("collection_vector1");

    // Check if a collection exists
    boolean collectionExists = db.isCollectionExists("collection_vector2");
  }
}
  • Data API

List collections

{
  "findCollections": {
    "options": {
      "explain": true
    }
  }
}

Delete Collection

Delete a collection by its name.

General Information
  • If the collection does not exist, the method will not return any error.
void deleteCollection(String name);
  • Sample Code
DeleteCollection.java
package com.dtsx.astra.sdk.documentation;

import com.dtsx.astra.sdk.AstraDB;

public class DeleteCollection {
  public static void main(String[] args) {
    AstraDB db = new AstraDB("TOKEN", "API_ENDPOINT");

    // Delete an existing collection
    db.deleteCollection("collection_vector2");
  }
}
  • Data API

Delete a collection by its name

{
  "deleteCollection": {
    "name": "collection_vector2"
  }
}

Working with Documents

Insert One

You can insert a single document with the insertOne() function. Multiple signatures are available.

General Information
  • If not provided, the identifier is generated as a Java UUID
  • The method always returns the document identifier
  • All attributes are optional (schemaless)
  • Attribute names should match [A-Za-z_]
  • All simple standard Java types are supported
  • Nested objects are supported
  • A field value should not exceed 5 KB
  • Each attribute is indexed and searchable
  • A vector cannot contain only zeros, as that would lead to a division by 0
  • Signatures
JsonDocumentMutationResult 
  insertOne(JsonDocument doc);

CompletableFuture<JsonDocumentMutationResult> 
  insertOneASync(JsonDocument doc);

DocumentMutationResult<DOC> 
  insertOne(Document<DOC> document);

CompletableFuture<DocumentMutationResult<DOC>> 
  insertOneASync(Document<DOC> document);
  • Sample Code
InsertOne.java
package com.dtsx.astra.sdk.documentation;

import com.dtsx.astra.sdk.AstraDB;
import com.dtsx.astra.sdk.AstraDBCollection;
import io.stargate.sdk.data.domain.JsonDocumentMutationResult;
import io.stargate.sdk.data.domain.JsonDocument;
import java.util.Map;

public class InsertOne {
  public static void main(String[] args) {
    AstraDB db = new AstraDB("TOKEN", "API_ENDPOINT");

    // Assumes a collection with a vector field of dimension 14
    AstraDBCollection collection = db.getCollection("collection_vector1");

    // You must delete any existing rows with the same IDs as the
    // rows you want to insert
    collection.deleteAll();

    // Insert rows defined by key/value
    collection.insertOne(
        new JsonDocument()
            .id("doc1") // uuid is generated if not explicitly set
            .vector(new float[]{1f, 0f, 1f, 1f, 1f, 1f, 0f, 0f, 0f, 0f, 0f, 0f, 0f, 0f})
            .put("product_name", "HealthyFresh - Beef raw dog food")
            .put("product_price", 12.99));

    // Insert rows defined as a JSON String
    collection.insertOne(
        new JsonDocument()
            .data(
                "{" +
                "\"_id\": \"doc2\", " +
                "\"$vector\": [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0], " +
                "\"product_name\": \"HealthyFresh - Chicken raw dog food\", " +
                "\"product_price\": 9.99" +
                "}"));

    // Insert rows defined as a Map Asynchronously
    collection.insertOneASync(
        new JsonDocument()
            .id("doc3")
            .vector(new float[]{1f, 1f, 1f, 1f, 1f, 0f, 0f, 0f, 0f, 0f, 0f, 0f, 0f, 0f})
            .data(Map.of("product_name", "HealthyFresh - Chicken raw dog food")));

    // If you do not provide an ID, they are generated automatically
    JsonDocumentMutationResult result = collection.insertOne(
        new JsonDocument().put("demo", 1));
    String generatedId = result.getDocument().getId();
  }
}
  • Data API Payload
{
  "insertOne": {
    "document": {
      "product_name": "HealthyFresh - Chicken raw dog food",
      "product_price": 9.99,
      "_id": "f2472946-cc9f-4ad1-801d-f1cf21d8cb38",
      "$vector": [
        0.3, 0.3, 0.3, 0.3, 0.3,
        0.3, 0.3, 0.3, 0.3, 0.3,
        0.3, 0.3, 0.3, 0.3
      ]
    }
  }
}
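Two of the constraints listed for insertOne (vectors of a consistent dimension, and no all-zero vectors) can be checked locally before a document is sent. The following is a minimal sketch of such a check; the class and method names are hypothetical and the SDK may already perform equivalent validation.

```java
// Hypothetical pre-flight validation, not part of the SDK.
final class VectorPreflight {

    // Throws if the vector has the wrong dimension or contains only zeros.
    static void requireUsable(float[] vector, int expectedDimension) {
        if (vector == null || vector.length != expectedDimension) {
            throw new IllegalArgumentException(
                "Expected a vector of dimension " + expectedDimension);
        }
        for (float value : vector) {
            if (value != 0f) {
                return; // at least one non-zero component: the norm is not 0
            }
        }
        // An all-zero vector has norm 0, so a cosine similarity would divide by 0
        throw new IllegalArgumentException("A vector cannot contain only zeros");
    }

    public static void main(String[] args) {
        requireUsable(new float[]{1f, 0f, 1f, 1f, 1f, 1f, 0f, 0f, 0f, 0f, 0f, 0f, 0f, 0f}, 14);
        System.out.println("Vector accepted");
    }
}
```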

Upsert One

General Information
  • insert* returns an error when the provided id already exists in the collection.
  • upsert* updates the document if it exists, or inserts it if it does not.
  • Signatures
JsonDocumentMutationResult 
  upsertOne(JsonDocument doc);

CompletableFuture<JsonDocumentMutationResult>  
  upsertOneASync(JsonDocument doc);

DocumentMutationResult<DOC>  
  upsertOne(Document<DOC> document);

CompletableFuture<DocumentMutationResult<DOC>>  
  upsertOneASync(Document<DOC> document);
  • Sample Code
UpsertOne.java
package com.dtsx.astra.sdk.documentation;

import com.dtsx.astra.sdk.AstraDB;
import com.dtsx.astra.sdk.AstraDBCollection;
import io.stargate.sdk.data.domain.JsonDocument;
import io.stargate.sdk.data.domain.JsonDocumentMutationResult;
import org.junit.jupiter.api.Assertions;

import static io.stargate.sdk.data.domain.DocumentMutationStatus.CREATED;
import static io.stargate.sdk.data.domain.DocumentMutationStatus.UNCHANGED;
import static io.stargate.sdk.data.domain.DocumentMutationStatus.UPDATED;

public class UpsertOne {
  public static void main(String[] args) {
    AstraDB db = new AstraDB("TOKEN", "API_ENDPOINT");

    // Assumes a collection with a vector field of dimension 14
    AstraDBCollection collection = db.getCollection("collection_vector1");

    // Insert rows defined by key/value
    JsonDocument doc1 = new JsonDocument()
            .id("doc1") // uuid is generated if not explicitly set
            .put("product_name", "HealthyFresh - Beef raw dog food")
            .put("product_price", 12.99);

    // Create the document
    JsonDocumentMutationResult res1 = collection.upsertOne(doc1);
    Assertions.assertEquals(CREATED, res1.getStatus());

    // Upserting the same document again changes nothing
    JsonDocumentMutationResult res2 = collection.upsertOne(doc1);
    Assertions.assertEquals(UNCHANGED, res2.getStatus());

    // Document is updated (async)
    doc1.put("new_property", "value");
    collection.upsertOneASync(doc1).thenAccept(res ->
      Assertions.assertEquals(UPDATED, res.getStatus()));
  }
}
  • Data API Payload
{
  "findOneAndReplace": {
    "filter": {
      "_id": "1"
    },
    "options": {
      "upsert": true
    },
    "replacement": {
      "a": "a",
      "b": "updated",
      "_id": "1"
    }
  }
}
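The three outcomes used in the sample (CREATED, UNCHANGED, UPDATED) can be pictured with an in-memory stand-in. This sketch uses a plain map instead of a real collection, and its class, enum, and method are hypothetical; it only illustrates the decision logic, not how the service implements it.

```java
import java.util.HashMap;
import java.util.Map;

// In-memory illustration of upsert semantics, not SDK code.
final class UpsertSketch {

    enum Status { CREATED, UNCHANGED, UPDATED }

    private final Map<String, Map<String, Object>> store = new HashMap<>();

    // Replaces the stored document and reports what changed.
    Status upsert(String id, Map<String, Object> document) {
        // Store a copy so later changes to the caller's map do not leak in
        Map<String, Object> previous = store.put(id, new HashMap<>(document));
        if (previous == null) {
            return Status.CREATED;
        }
        return previous.equals(document) ? Status.UNCHANGED : Status.UPDATED;
    }

    public static void main(String[] args) {
        UpsertSketch collection = new UpsertSketch();
        Map<String, Object> doc1 = new HashMap<>();
        doc1.put("product_name", "HealthyFresh - Beef raw dog food");
        doc1.put("product_price", 12.99);

        System.out.println(collection.upsert("doc1", doc1));
        System.out.println(collection.upsert("doc1", doc1));
        doc1.put("new_property", "value");
        System.out.println(collection.upsert("doc1", doc1));
    }
}
```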

Insert Many

General Information
  • The underlying REST API is paged. The maximum page size is 20.
  • To perform bulk loading, distributing the workload is recommended.
  • The insertManyChunked* methods are helpers that distribute the workload.
  • If more than 20 documents are provided, chunking is applied under the hood.
  • Signatures
// Use a json String
List<JsonDocumentMutationResult> 
   insertMany(String json);
CompletableFuture<List<JsonDocumentMutationResult>> 
   insertManyASync(String json);

// Use an Array of JsonDocuments
List<JsonDocumentMutationResult>
   insertMany(JsonDocument... documents);
CompletableFuture<List<JsonDocumentMutationResult>>
   insertManyASync(JsonDocument... documents);

// Use a list of JsonDocument
List<JsonDocumentMutationResult> 
   insertManyJsonDocuments(List<JsonDocument> documents);
CompletableFuture<List<JsonDocumentMutationResult>> 
   insertManyJsonDocumentsASync(List<JsonDocument> documents);

// Use an Array of Document<T>
List<DocumentMutationResult<DOC>> 
   insertMany(Document<DOC>... documents);
CompletableFuture<List<DocumentMutationResult<DOC>>>
   insertManyASync(Document<DOC>... documents);

// Use a list of Document<T>
List<DocumentMutationResult<DOC>> 
   insertMany(List<Document<DOC>> documents);
CompletableFuture<List<DocumentMutationResult<DOC>>>
    insertManyASync(List<Document<DOC>> documents);
  • Sample Code
InsertMany.java
package com.dtsx.astra.sdk.documentation;

import com.dtsx.astra.sdk.AstraDB;
import com.dtsx.astra.sdk.AstraDBCollection;
import io.stargate.sdk.data.domain.JsonDocumentMutationResult;
import io.stargate.sdk.data.domain.JsonDocument;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class InsertMany {
  public static void main(String[] args) {
    AstraDB db = new AstraDB("TOKEN", "API_ENDPOINT");
    AstraDBCollection collection = db.createCollection("collection_vector1",14);

    // Insert documents into the collection (IDs are generated automatically)
    List<JsonDocumentMutationResult> identifiers = collection.insertManyJsonDocuments(List.of(
        new JsonDocument()
            .vector(new float[]{1f, 0f, 1f, 1f, .5f, 1f, 0f, 0.3f, 0f, 0f, 0f, 0f, 0f, 0f})
            .put("product_name", "Yet another product")
            .put("product_price", 99.99),
        new JsonDocument()
            .vector(new float[]{1f, 0f, 1f, 1f, .5f, 1f, 0f, 0.3f, 0f, 0f, 0f, 0f, 0f, 0f})
            .put("product_name", "product3")
            .put("product_price", 99.99)));

    // Insert large collection of documents
    List<JsonDocument> largeList = IntStream
             .rangeClosed(1, 1000)
             .mapToObj(id -> new JsonDocument()
                     .id(String.valueOf(id))
                     .put("sampleKey", id))
             .collect(Collectors.toList());
    int chunkSize   = 20;  // In between 1 and 20
    int threadCount = 10;  // How many chunks processed in parallel
    List<JsonDocumentMutationResult> result = collection
            .insertManyChunkedJsonDocuments(largeList, chunkSize, threadCount);
  }
}
  • Data API

Insert Many with ordered false

{
  "insertMany": {
    "options": {
      "ordered": false
    },
    "documents": [
      {
        "product_name": "test1",
        "product_price": 12.99,
        "_id": "doc1"
      },
      {
        "product_name": "test2",
        "product_price": 2.99,
        "_id": "doc2"
      }
    ]
  }
}

Insert Many with ordered true

{
  "insertMany": {
    "options": {
      "ordered": true
    },
    "documents": [
      {
        "firstName": "Lucas",
        "lastName": "Hernandez",
        "_id": "1"
      },
      {
        "firstName": "Antoine",
        "lastName": "Griezmann",
        "_id": "2"
      },
      {
        "firstName": "N'Golo",
        "lastName": "Kanté",
        "_id": "3"
      },
      {
        "firstName": "Paul",
        "lastName": "Pogba",
        "_id": "4"
      },
      {
        "firstName": "Raphaël",
        "lastName": "Varane",
        "_id": "5"
      },
      {
        "firstName": "Hugo",
        "lastName": "Lloris",
        "_id": "6"
      },
      {
        "firstName": "Olivier",
        "lastName": "Giroud",
        "_id": "7"
      },
      {
        "firstName": "Benjamin",
        "lastName": "Pavard",
        "_id": "8"
      },
      {
        "firstName": "Kylian",
        "lastName": "Mbappé",
        "_id": "9"
      }
    ]
  }
}

Insert Many Chunked

  • Signatures
// Insert a list of json documents
List<JsonDocumentMutationResult> 
  insertManyChunkedJsonDocuments(List<JsonDocument> documents, int chunkSize, int concurrency);
CompletableFuture<List<JsonDocumentMutationResult>> 
  insertManyChunkedJsonDocumentsAsync(List<JsonDocument> documents, int chunkSize, int concurrency);

// Insert a list of documents
List<DocumentMutationResult<DOC>> 
  insertManyChunked(List<Document<DOC>> documents, int chunkSize, int concurrency);
CompletableFuture<List<DocumentMutationResult<DOC>>> 
  insertManyChunkedASync(List<Document<DOC>> documents, int chunkSize, int concurrency);
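To give an idea of what chunkSize and concurrency control, here is a minimal sketch of the general technique: split the list into pages of at most 20 items and process the pages on a fixed thread pool. It is not the SDK's implementation; the class, the methods, and the insertPage function (a stand-in for one paged API call) are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Illustration of chunked, concurrent processing; not SDK code.
final class ChunkedInsertSketch {

    // Splits the list into consecutive pages of at most chunkSize items.
    static <T> List<List<T>> chunk(List<T> items, int chunkSize) {
        if (chunkSize < 1 || chunkSize > 20) {
            throw new IllegalArgumentException("chunkSize must be between 1 and 20");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int i = 0; i < items.size(); i += chunkSize) {
            chunks.add(items.subList(i, Math.min(i + chunkSize, items.size())));
        }
        return chunks;
    }

    // Runs insertPage on every page, with at most 'concurrency' pages in flight.
    static <T, R> List<R> processChunked(List<T> items, int chunkSize, int concurrency,
                                         Function<List<T>, List<R>> insertPage) {
        ExecutorService pool = Executors.newFixedThreadPool(concurrency);
        try {
            List<Future<List<R>>> futures = new ArrayList<>();
            for (List<T> page : chunk(items, chunkSize)) {
                Callable<List<R>> task = () -> insertPage.apply(page);
                futures.add(pool.submit(task));
            }
            // Collect in submission order so results line up with the input
            List<R> results = new ArrayList<>();
            for (Future<List<R>> future : futures) {
                results.addAll(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for a chunk", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A chunk failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

With 1000 documents and a chunk size of 20, this produces 50 pages; a concurrency of 10 means up to 10 of them are processed at the same time.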

Upsert Many

  • Signatures
// Use a json String
List<JsonDocumentMutationResult>
   upsertMany(String json);
CompletableFuture<List<JsonDocumentMutationResult>>
   upsertManyASync(String json);

// Use a list of JsonDocument
List<JsonDocumentMutationResult>
   upsertManyJsonDocuments(List<JsonDocument> documents);
CompletableFuture<List<JsonDocumentMutationResult>>
   upsertManyJsonDocumentsASync(List<JsonDocument> documents);

// Use a list of Document<T>
List<DocumentMutationResult<DOC>>
   upsertMany(List<Document<DOC>> documents);
CompletableFuture<List<DocumentMutationResult<DOC>>>
   upsertManyASync(List<Document<DOC>> documents);

Find By Id

  • Signatures
Optional<JsonDocumentResult> findById(String id);
Optional<DocumentResult<T>> findById(String id, Class<T> bean);
Optional<DocumentResult<T>> findById(String id, DocumentResultMapper<T> mapper);
boolean isDocumentExists(String id);
  • Sample Code
FindById.java
package com.dtsx.astra.sdk.documentation;

import com.dtsx.astra.sdk.AstraDB;
import com.dtsx.astra.sdk.AstraDBCollection;
import com.fasterxml.jackson.annotation.JsonProperty;
import io.stargate.sdk.data.domain.JsonDocumentResult;
import io.stargate.sdk.data.domain.odm.DocumentResult;
import java.util.Optional;

public class FindById {
  public static void main(String[] args) {
    AstraDB db = new AstraDB("TOKEN", "API_ENDPOINT");
    AstraDBCollection collection = db.getCollection("collection_vector1");

    // Fetch a document by ID and return it as JSON
    Optional<JsonDocumentResult> res = collection.findById("doc1");
    res.ifPresent(jsonResult -> System.out.println(jsonResult.getSimilarity()));

    // Fetch a document by ID and map it to an object with ResultMapper
    Optional<DocumentResult<MyBean>> res2 = collection.findById("doc1", record -> {
      MyBean bean = new MyBean(
          (String) record.getData().get("product_name"),
          (Double) record.getData().get("product_price"));
      return new DocumentResult<>(record, bean);
    });

    // Fetch a document by ID and map it to a class
    Optional<DocumentResult<MyBean>> res3 = collection.findById("doc1", MyBean.class);

    // Check if a document exists
    boolean exists = collection.isDocumentExists("doc1");
  }

  public static class MyBean {
    @JsonProperty("product_name") String name;
    @JsonProperty("product_price") Double price;
    public MyBean(String name, Double price) {
      this.name = name;
      this.price = price;
    }
  }
}
  • Data API
{
  "findOne": {
    "filter": {
      "_id": "p1"
    }
  }
}

Find By Vector

  • Signatures
Optional<JsonDocumentResult> findOneByVector(float[] vector);
Optional<DocumentResult<T>> findOneByVector(float[] vector, Class<T> bean);
Optional<DocumentResult<T>> findOneByVector(float[] vector, DocumentResultMapper<T> mapper);
  • Sample Code
FindByVector.java
package com.dtsx.astra.sdk.documentation;

import com.dtsx.astra.sdk.AstraDB;
import com.dtsx.astra.sdk.AstraDBCollection;
import com.fasterxml.jackson.annotation.JsonProperty;
import io.stargate.sdk.data.domain.odm.DocumentResult;

import java.util.Optional;

public class FindByVector {
  public static void main(String[] args) {
    AstraDB db = new AstraDB("TOKEN", "API_ENDPOINT");
    AstraDBCollection collection = db.getCollection("collection_vector1");

    // Fetch a row by vector and return JSON
    collection
        .findOneByVector(new float[]{1f, 0f, 1f, 1f, 1f, 1f, 0f, 0f, 0f, 0f, 0f, 0f, 0f, 0f})
        .ifPresent(jsonResult -> System.out.println(jsonResult.getSimilarity()));

    // Fetch a row by vector and map it to an object with ResultMapper
    Optional<DocumentResult<MyBean>> res2 = collection
        .findOneByVector(
            new float[]{1f, 0f, 1f, 1f, 1f, 1f, 0f, 0f, 0f, 0f, 0f, 0f, 0f, 0f},
                record -> {
                    MyBean bean = new MyBean(
                        (String)record.getData().get("product_name"),
                        (Double)record.getData().get("product_price"));
                    return new DocumentResult<>(record, bean);
                }
        );

    // Fetch a row by vector and map the result to a class
    Optional<DocumentResult<MyBean>> res3 = collection.findOneByVector(
        new float[]{1f, 0f, 1f, 1f, 1f, 1f, 0f, 0f, 0f, 0f, 0f, 0f, 0f, 0f},
        MyBean.class);
  }

  public static class MyBean {
    @JsonProperty("product_name") String name;
    @JsonProperty("product_price") Double price;
    public MyBean(String name, Double price) {
      this.name = name;
      this.price = price;
    }
  }
}
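For intuition about what a search by vector returns, the sketch below finds the single closest document in an in-memory map using exact cosine similarity. This is a brute-force illustration under stated assumptions: the class and method are hypothetical, and the service relies on approximate nearest neighbor (ANN) search rather than a full scan.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Brute-force illustration of a nearest-vector lookup; not SDK code.
final class NearestVectorSketch {

    static double cosine(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Returns the id of the document whose vector is most similar to the query.
    static Optional<String> findOneByVector(Map<String, float[]> documents, float[] query) {
        String bestId = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, float[]> entry : documents.entrySet()) {
            double score = cosine(entry.getValue(), query);
            if (score > bestScore) {
                bestScore = score;
                bestId = entry.getKey();
            }
        }
        // Empty when there are no documents to compare against
        return Optional.ofNullable(bestId);
    }

    public static void main(String[] args) {
        Map<String, float[]> documents = new LinkedHashMap<>();
        documents.put("doc1", new float[]{1f, 0f, 1f});
        documents.put("doc2", new float[]{0f, 1f, 0f});
        findOneByVector(documents, new float[]{1f, 0f, 0.5f})
            .ifPresent(id -> System.out.println("closest=" + id));
    }
}
```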

Find One

Introducing SelectQuery

Under the hood, every search against the REST API is built from 4 parameters:

  • $filter: your criteria (where clause)
  • $projection: the list of fields you want to retrieve (select)
  • $sort: the ordering of the results, either in memory (order by) or by vector search (order by ANN)
  • $options: everything else, such as paging and limit

The SelectQuery class is a builder with a fluent API that helps you assemble these parameters into a query.
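As a rough picture of how the four parameters fit together, the sketch below assembles a find payload by hand. The class and method are hypothetical, and the key names and shapes used for projection, sort, and options are assumptions about the wire format; prefer the SelectQuery builder in real code.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Hand-rolled payload assembly for illustration; not SDK code.
final class FindPayloadSketch {

    // filterJson is inserted as-is; field names are assumed to need no escaping.
    static String findPayload(String filterJson, String[] fields, float[] vector, int limit) {
        // Projection: each requested field is flagged with 1
        String projection = Arrays.stream(fields)
            .map(field -> "\"" + field + "\": 1")
            .collect(Collectors.joining(", ", "{", "}"));
        // Arrays.toString renders a float[] as a JSON-compatible array
        String vectorJson = Arrays.toString(vector);
        return "{\"find\": {"
            + "\"filter\": " + filterJson + ", "
            + "\"projection\": " + projection + ", "
            + "\"sort\": {\"$vector\": " + vectorJson + "}, "
            + "\"options\": {\"limit\": " + limit + "}"
            + "}}";
    }

    public static void main(String[] args) {
        System.out.println(findPayload(
            "{\"product_price\": 9.99}",
            new String[]{"product_name", "product_price"},
            new float[]{1f, 0f, 1f},
            10));
    }
}
```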

As with findById and findByVector, there are 3 methods available to retrieve a document. If the SelectQuery matches multiple documents, only the first one is returned. When in doubt, use find() or, even better, findPage() to avoid exhausting the whole collection.

Optional<JsonDocumentResult> findOne(SelectQuery query);
Optional<DocumentResult<T>> findOne(SelectQuery query, Class<T> clazz);
Optional<DocumentResult<T>> findOne(SelectQuery query, ResultMapper<T> mapper);

Here is a sample class detailing the usage of the findOne method.

FindOne.java
package com.dtsx.astra.sdk.documentation;

import com.dtsx.astra.sdk.AstraDB;
import com.dtsx.astra.sdk.AstraDBCollection;
import io.stargate.sdk.data.domain.query.Filter;
import io.stargate.sdk.data.domain.query.SelectQuery;

import static io.stargate.sdk.http.domain.FilterOperator.EQUALS_TO;

public class FindOne {
  public static void main(String[] args) {
    AstraDB db = new AstraDB("TOKEN", "API_ENDPOINT");
    AstraDBCollection collection = db.createCollection("collection_vector1", 14);

    // Retrieve the first document where product_price exists
    Filter filter = new Filter()
            .where("product_price")
            .exists();
    collection.findOne(SelectQuery.builder()
            .filter(filter).build())
            .ifPresent(System.out::println);

    // Retrieve the first document where product_price is 12.99
    Filter filter2 = new Filter()
            .where("product_price")
            .isEqualsTo(12.99);
    collection.findOne(SelectQuery.builder()
        .filter(filter2).build())
    .ifPresent(System.out::println);

    // Send the request as a JSON String
    collection.findOne(
        "{" +
        "\"filter\":{" +
        "\"product_price\":9.99," +
        "\"product_name\":\"HealthyFresh - Chicken raw dog food\"}" +
        "}")
    .ifPresent(System.out::println);

    // Only retrieve the product_name and product_price fields
    collection.findOne(SelectQuery.builder()
        .select("product_name", "product_price")
        .filter(filter2)
        .build())
    .ifPresent(System.out::println);

    // Perform a similarity search
    collection.findOne(SelectQuery.builder()
        .filter(filter2)
        .orderByAnn(new float[]{1f, 0f, 1f, 1f, 1f, 1f, 0f, 0f, 0f, 0f, 0f, 0f, 0f, 0f})
        .build());

    // Perform a complex query with AND and OR
    Filter yaFilter = new Filter()
            .and()
              .or()
                .where("product_price", EQUALS_TO, 9.99)
                .where("product_name", EQUALS_TO, "HealthyFresh - Beef raw dog food")
              .end()
              .or()
                .where("product_price", EQUALS_TO, 9.99)
                .where("product_name", EQUALS_TO, "HealthyFresh - Beef raw dog food")
              .end();
    // Attach the filter to the query; otherwise the query matches any document
    SelectQuery sq2 = SelectQuery.builder()
        .filter(yaFilter)
        .build();
    collection.findOne(sq2).ifPresent(System.out::println);
  }
}
  • Data API

Find with a Greater Than or Equals

{
  "find": {
    "filter": {
      "product_price": {
        "$gte": 12.99
      }
    }
  }
}

Find with a Less Than

{
  "find": {
    "filter": {
      "product_price": {
        "$lt": 10
      }
    }
  }
}

Find with a Less Than or Equals

{
  "find": {
    "filter": {
      "product_price": {
        "$lte": 9.99
      }
    }
  }
}

Find with an Equals

{
  "find": {
    "filter": {
      "product_price": 9.99
    }
  }
}

Find with a Not Equals

{
  "find": {
    "filter": {
      "product_price": {
        "$ne": 9.99
      }
    }
  }
}

Find with an Exists

{
  "find": {
    "filter": {
      "product_price": {
        "$exists": true
      }
    }
  }
}

Find with an And

{
  "find": {
    "filter": {
      "$and": [
        {
          "product_price": {
            "$exists": true
          }
        },
        {
          "product_price": {
            "$ne": 9.99
          }
        }
      ]
    }
  }
}

Find with an In

{
  "find": {
    "filter": {
      "metadata_string": {
        "$in": [
          "hello",
          "world"
        ]
      }
    }
  }
}

Find with a Not In

{
  "find": {
    "filter": {
      "metadata_string": {
        "$nin": [
          "Hallo",
          "Welt"
        ]
      }
    }
  }
}

Find with a Size

{
  "find": {
    "filter": {
      "metadata_boolean_array": {
        "$size": 3
      }
    }
  }
}

Find with a Less Than Instant

{
  "find": {
    "filter": {
      "metadata_instant": {
        "$lt": {
          "$date": 1707483540638
        }
      }
    }
  }
}
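
To make the operators above concrete, the following sketch evaluates them against a single in-memory document. It only illustrates the intended semantics and is not how the Data API or the SDK evaluates filters; the class `FilterSemanticsSketch` and its `matches` method are hypothetical names, and `$gt` is included by analogy with `$gte`. Equality is type-sensitive here (the integer 10 does not equal the double 10.0), which is a simplification.

```java
import java.util.List;
import java.util.Map;

public class FilterSemanticsSketch {

  /** Evaluates one condition, e.g. {"product_price": {"$gte": 12.99}}, against a document. */
  public static boolean matches(Map<String, Object> doc, String field, String op, Object operand) {
    Object value = doc.get(field);
    switch (op) {
      case "$eq":
        return operand.equals(value);
      case "$ne":
        return !operand.equals(value);
      case "$exists":
        return doc.containsKey(field) == (Boolean) operand;
      case "$in":
        return value != null && ((List<?>) operand).contains(value);
      case "$nin":
        return value == null || !((List<?>) operand).contains(value);
      case "$size":
        return value instanceof List && ((List<?>) value).size() == (Integer) operand;
      case "$gt":
      case "$gte":
      case "$lt":
      case "$lte":
        return compareNumbers(value, operand, op);
      default:
        throw new IllegalArgumentException("Unsupported operator: " + op);
    }
  }

  private static boolean compareNumbers(Object value, Object operand, String op) {
    // A missing or non-numeric field never satisfies a range condition in this sketch
    if (!(value instanceof Number)) {
      return false;
    }
    int cmp = Double.compare(((Number) value).doubleValue(), ((Number) operand).doubleValue());
    switch (op) {
      case "$gt":  return cmp > 0;
      case "$gte": return cmp >= 0;
      case "$lt":  return cmp < 0;
      default:     return cmp <= 0; // "$lte"
    }
  }

  public static void main(String[] args) {
    Map<String, Object> doc = Map.of(
        "product_price", 12.99,
        "metadata_string", "hello",
        "metadata_boolean_array", List.of(true, false, true));

    System.out.println(matches(doc, "product_price", "$gte", 12.99));
    System.out.println(matches(doc, "product_price", "$lt", 10));
    System.out.println(matches(doc, "metadata_string", "$in", List.of("hello", "world")));
    System.out.println(matches(doc, "metadata_boolean_array", "$size", 3));
  }
}
```

An `$and` filter is then the conjunction of several such conditions on the same document.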

Find

Reminders on SelectQuery

Under the hood, every search against the REST API is built from four parameters:

  • filter: your criteria (the where clause)
  • projection: the list of fields you want to retrieve (select)
  • sort: the ordering of the results, either in memory (order by) or by vector search (order by ANN)
  • options: everything else, such as paging and limit

The SelectQuery class provides a fluent builder that helps you assemble these parameters into a query.

 SelectQuery.builder()
 .where("product_price")
 .isEqualsTo(9.99)
 .build();
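
As a rough illustration of how a fluent builder can assemble the four parts of a search into one request body, here is a minimal sketch using plain maps. `FindRequestSketch` and all of its methods are hypothetical and are not the SDK's SelectQuery; the key names follow the JSON examples on this page (`find`, `filter`), and the `{"field": 1}` projection shape is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FindRequestSketch {
  private final Map<String, Object> filter = new LinkedHashMap<>();
  private final Map<String, Object> projection = new LinkedHashMap<>();
  private final Map<String, Object> sort = new LinkedHashMap<>();
  private final Map<String, Object> options = new LinkedHashMap<>();

  /** Adds an equality criterion to the filter. */
  public FindRequestSketch whereEquals(String field, Object value) {
    filter.put(field, value);
    return this;
  }

  /** Restricts the returned fields. */
  public FindRequestSketch select(String... fields) {
    for (String field : fields) {
      projection.put(field, 1);
    }
    return this;
  }

  /** Orders by a field: 1 for ascending, -1 for descending. */
  public FindRequestSketch orderBy(String field, int direction) {
    sort.put(field, direction);
    return this;
  }

  /** Caps the number of returned documents. */
  public FindRequestSketch limit(int limit) {
    options.put("limit", limit);
    return this;
  }

  /** Builds the request body, leaving out the parts that were never set. */
  public Map<String, Object> build() {
    Map<String, Object> find = new LinkedHashMap<>();
    if (!filter.isEmpty()) find.put("filter", filter);
    if (!projection.isEmpty()) find.put("projection", projection);
    if (!sort.isEmpty()) find.put("sort", sort);
    if (!options.isEmpty()) find.put("options", options);
    Map<String, Object> body = new LinkedHashMap<>();
    body.put("find", find);
    return body;
  }

  public static void main(String[] args) {
    Map<String, Object> body = new FindRequestSketch()
        .whereEquals("product_price", 9.99)
        .select("product_name", "product_price")
        .orderBy("product_price", 1)
        .limit(10)
        .build();
    System.out.println(body);
  }
}
```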
Important

With the JSON API, all queries are paged and the maximum page size is 20. The methods findAll() and find() fetch the pages one after the other until pagingState is null, so use them with caution on large collections.

  • To retrieve every document of a collection use findAll()
// Find All for VectorStore<MyBean>
Stream<JsonResult> all = col1.findAll();
  • Find with a Query
Find.java
package com.dtsx.astra.sdk.documentation;

import com.dtsx.astra.sdk.AstraDB;
import com.dtsx.astra.sdk.AstraDBCollection;
import io.stargate.sdk.data.domain.query.Filter;
import io.stargate.sdk.data.domain.query.SelectQuery;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class Find {
  public static void main(String[] args) {
    AstraDB db = new AstraDB("TOKEN", "API_ENDPOINT");
    AstraDBCollection collection = db.createCollection("collection_vector1", 14);

    // Retrieve all documents that have a product_price
    Filter filter = new Filter()
            .where("product_price")
            .exists();
    collection.find(
        SelectQuery.builder().filter(filter).build()
    ).forEach(System.out::println);

    // Retrieve all documents where the product_price is 12.99
    Filter filter2 = new Filter()
            .where("product_price")
            .isEqualsTo(12.99);
    collection
            .find(SelectQuery.builder().filter(filter2).build())
            .forEach(System.out::println);

    // Only retrieve the product_name and product_price fields
    collection.find(
        SelectQuery.builder()
            .select("product_name", "product_price")
            .filter(filter2)
            .build())
        .forEach(System.out::println);

    // Order the results by similarity
    collection.find(
        SelectQuery.builder()
            .filter(filter2)
            .orderByAnn(new float[]{1f, 0f, 1f, 1f, 1f, 1f, 0f, 0f, 0f, 0f, 0f, 0f, 0f, 0f})
            .build())
        .forEach(System.out::println);

    // Order the results by a specific field
    Filter filter3 = new Filter()
            .where("product_name")
            .isEqualsTo("HealthyFresh - Chicken raw dog food");
    collection.find(
        SelectQuery.builder()
            .filter(filter3)
            .orderBy("product_price", 1)
            .build())
        .forEach(System.out::println);

    // Complex query with AND and OR:
    //     (product_price == 9.99 OR product_name == "HealthyFresh - Beef raw dog food")
    // AND (product_price == 12.99 OR product_name == "HealthyFresh - Beef raw dog food")
    SelectQuery sq2 = new SelectQuery();
    sq2.setFilter(new HashMap<>());
    Map<String, List<Map<String, Object>>> or1Criteria = new HashMap<>();
    or1Criteria.put("$or", new ArrayList<Map<String, Object>>());
    or1Criteria.get("$or").add(Map.of("product_price", 9.99));
    or1Criteria.get("$or").add(Map.of("product_name", "HealthyFresh - Beef raw dog food"));
    Map<String, List<Map<String, Object>>> or2Criteria = new HashMap<>();
    or2Criteria.put("$or", new ArrayList<Map<String, Object>>());
    or2Criteria.get("$or").add(Map.of("product_price", 12.99));
    or2Criteria.get("$or").add(Map.of("product_name", "HealthyFresh - Beef raw dog food"));
    List<Map<String, List<Map<String, Object>>>> andCriteria = new ArrayList<>();
    andCriteria.add(or1Criteria);
    andCriteria.add(or2Criteria);
    sq2.getFilter().put("$and", andCriteria);
    collection.find(sq2).forEach(System.out::println);
  }
}
  • To perform semantic search use findVector()
FindVector.java
package com.dtsx.astra.sdk.documentation;

import com.dtsx.astra.sdk.AstraDB;
import com.dtsx.astra.sdk.AstraDBCollection;
import io.stargate.sdk.data.domain.JsonDocumentResult;
import io.stargate.sdk.data.domain.query.Filter;
import io.stargate.sdk.data.domain.query.SelectQuery;
import java.util.stream.Stream;

public class FindVector {
  public static void main(String[] args) {
    AstraDB db = new AstraDB("TOKEN", "API_ENDPOINT");
    AstraDBCollection collection = db.createCollection("collection_vector1", 14);

    float[] embeddings = new float[]{1f, 0f, 1f, 1f, 1f, 1f, 0f, 0f, 0f, 0f, 0f, 0f, 0f, 0f};
    Filter metadataFilter = new Filter().where("product_price").isEqualsTo(9.99);
    int maxRecord = 10;

    // Retrieve the documents matching the price filter, ordered by ANN search
    collection.findVector(SelectQuery.builder()
       .filter(metadataFilter)
       .orderByAnn(embeddings)
       .withLimit(maxRecord)
       .build())
    .forEach(System.out::println);

    // Same using another signature
    Stream<JsonDocumentResult> result = collection.findVector(embeddings, metadataFilter, maxRecord);
  }
}
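
To give an intuition for what ordering by similarity means, the sketch below computes the cosine similarity between a query vector and two candidate vectors. This assumes a cosine metric and an exact computation; the metric actually configured on a collection and the approximate (ANN) index used by the database are not shown here. `CosineSimilaritySketch` is a hypothetical name.

```java
public class CosineSimilaritySketch {

  /** Cosine similarity: dot(a, b) / (|a| * |b|). Higher means more similar. */
  public static double cosine(float[] a, float[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("Vectors must have the same dimension");
    }
    double dot = 0;
    double normA = 0;
    double normB = 0;
    for (int i = 0; i < a.length; i++) {
      dot += a[i] * b[i];
      normA += a[i] * a[i];
      normB += b[i] * b[i];
    }
    return dot / (Math.sqrt(normA) * Math.sqrt(normB));
  }

  public static void main(String[] args) {
    // Same 14-dimension query vector as in the sample above
    float[] query = {1f, 0f, 1f, 1f, 1f, 1f, 0f, 0f, 0f, 0f, 0f, 0f, 0f, 0f};
    float[] close = {1f, 0f, 1f, 1f, .5f, 1f, 0f, 0.3f, 0f, 0f, 0f, 0f, 0f, 0f};
    float[] far   = {0f, 1f, 0f, 0f, 0f, 0f, 1f, 1f, 1f, 0f, 0f, 0f, 0f, 0f};

    // A similarity search returns the candidates with the highest scores first
    System.out.println("close: " + cosine(query, close));
    System.out.println("far:   " + cosine(query, far));
  }
}
```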

Paging

Every request to the JSON API is paged, and the maximum page size is 20. The paged methods return a Page that contains the data along with a field called `pagingState`.

  • Find Page

The signatures are close to those of find(), because find() uses findPage() under the hood. The difference is that find() exhausts all the pages and returns a Stream<JsonResult>.

Page<JsonResult> jsonResult = findPage(SelectQuery query);
Page<Result<T>> jsonResult2 = findPage(SelectQuery query, Class<T> clazz);
Page<Result<T>> jsonResult3 = findPage(SelectQuery query, ResultMapper<T> mapper);
FindPage.java
package com.dtsx.astra.sdk.documentation;

import com.dtsx.astra.sdk.AstraDB;
import com.dtsx.astra.sdk.AstraDBCollection;
import com.fasterxml.jackson.annotation.JsonProperty;
import io.stargate.sdk.core.domain.Page;
import io.stargate.sdk.data.domain.JsonDocumentResult;
import io.stargate.sdk.data.domain.odm.DocumentResult;
import io.stargate.sdk.data.domain.query.Filter;
import io.stargate.sdk.data.domain.query.SelectQuery;

public class FindPage {
  public static void main(String[] args) {
    AstraDB db = new AstraDB("TOKEN", "API_ENDPOINT");
    AstraDBCollection collection = db.createCollection("collection_vector1", 14);

    // Retrieve page 1 of a search (up to 20 results)
    Filter filter = new Filter()
            .where("product_price")
            .exists();
    Page<JsonDocumentResult> page1 = collection.findPage(
        SelectQuery.builder()
            .filter(filter)
            .build());

    // Retrieve page 2 of the same search (if there are more than 20 results).
    // The same filter is reused; only the paging state changes.
    page1.getPageState().ifPresent(pageState -> {
        Page<JsonDocumentResult> page2 = collection.findPage(
            SelectQuery.builder()
                .filter(filter)
                .withPagingState(pageState)
                .build());
    });

    // You can map the output as Result<T> using either a Java POJO or a mapper
    Filter filter2 = new Filter()
            .where("product_price")
            .isEqualsTo(12.99);
    Page<DocumentResult<MyBean>> page = collection.findPage(
        SelectQuery.builder().filter(filter2).build(),
        MyBean.class);
  }

  public static class MyBean {
    @JsonProperty("product_name") String name;
    @JsonProperty("product_price") Double price;

    public MyBean(String name, Double price) {
      this.name = name;
      this.price = price;
    }
  }
}
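
The relationship between find() and findPage() described in this section can be sketched without the SDK: call a page function repeatedly, feeding back the returned paging state, until that state is null. Everything below is hypothetical (the `Page` class, `fakeFindPage`, and `fetchAll`); the fake backend serves 45 integers in pages of 20, and results are collected eagerly for simplicity, whereas the SDK returns a Stream.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class PagingSketch {

  /** Hypothetical page: a batch of results plus the state needed to fetch the next one. */
  public static final class Page<T> {
    public final List<T> results;
    public final String pagingState; // null when this is the last page

    public Page(List<T> results, String pagingState) {
      this.results = results;
      this.pagingState = pagingState;
    }
  }

  static final int PAGE_SIZE = 20;
  static final int TOTAL = 45;

  /** Stand-in for a paged call: serves the integers 0..44, 20 at a time. */
  public static Page<Integer> fakeFindPage(String pagingState) {
    int offset = pagingState == null ? 0 : Integer.parseInt(pagingState);
    int end = Math.min(offset + PAGE_SIZE, TOTAL);
    List<Integer> results = IntStream.range(offset, end).boxed().collect(Collectors.toList());
    return new Page<>(results, end < TOTAL ? String.valueOf(end) : null);
  }

  /** Calls the page function repeatedly until the paging state is null. */
  public static <T> List<T> fetchAll(Function<String, Page<T>> findPage) {
    List<T> all = new ArrayList<>();
    String state = null;
    do {
      Page<T> page = findPage.apply(state);
      all.addAll(page.results);
      state = page.pagingState;
    } while (state != null);
    return all;
  }

  public static void main(String[] args) {
    List<Integer> all = fetchAll(PagingSketch::fakeFindPage);
    System.out.println(all.size() + " results fetched");
  }
}
```

Each loop iteration corresponds to one request, which is why exhausting a large result set can be costly.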

Update One

Updates an existing document:

UpdateOne.java
package com.dtsx.astra.sdk.documentation;

import com.dtsx.astra.sdk.AstraDB;
import com.dtsx.astra.sdk.AstraDBCollection;
import io.stargate.sdk.data.domain.JsonDocument;
import io.stargate.sdk.data.domain.query.UpdateQuery;

import static io.stargate.sdk.http.domain.FilterOperator.EQUALS_TO;

public class UpdateOne {
  public static void main(String[] args) {
    AstraDB db = new AstraDB("TOKEN", "API_ENDPOINT");
    AstraDBCollection collection = db.getCollection("collection_vector1");

    // You must delete any existing rows with the same IDs as the
    // rows you want to insert
    collection.deleteAll();

    // Upsert a document based on a query: set the price of the matching product
    collection.updateOne(UpdateQuery.builder()
      .updateSet("product_price", 12.99)
      .where("product_name", EQUALS_TO, "HealthyFresh - Beef raw dog food")
      .build());

    // Upsert a document by ID
    collection.upsertOne(new JsonDocument()
        .id("id1")
        .put("product_price", 12.99));
  }
}

Update Many

Updates the set of documents matching a query.

UpdateMany.java
package com.dtsx.astra.sdk.documentation;

import com.dtsx.astra.sdk.AstraDB;
import com.dtsx.astra.sdk.AstraDBCollection;
import io.stargate.sdk.data.domain.query.Filter;
import io.stargate.sdk.data.domain.query.UpdateQuery;
import io.stargate.sdk.http.domain.FilterOperator;

public class UpdateMany {
  public static void main(String[] args) {
    AstraDB db = new AstraDB("TOKEN", "API_ENDPOINT");
    AstraDBCollection collection = db.getCollection("collection_vector1");

    // Update multiple documents based on a query
    collection.updateMany(UpdateQuery.builder()
        .updateSet("product_price", 12.99)
        .filter(new Filter("product_name", FilterOperator.EQUALS_TO, "HealthyFresh - Beef raw dog food"))
        .build());
  }
}

Delete One

Deletes an existing document.

DeleteOne.java
package com.dtsx.astra.sdk.documentation;

import com.dtsx.astra.sdk.AstraDB;
import com.dtsx.astra.sdk.AstraDBCollection;
import io.stargate.sdk.data.domain.query.DeleteQuery;
import io.stargate.sdk.data.domain.query.DeleteResult;

public class DeleteOne {
  public static void main(String[] args) {
    AstraDB db = new AstraDB("TOKEN", "API_ENDPOINT");
    AstraDBCollection collection = db.createCollection("collection_vector1", 14);

    // Delete a single document from an existing collection by ID
    DeleteResult deletedCount = collection
            .deleteOne(DeleteQuery.deleteById("id1"));
  }
}

Delete Many

Deletes the set of documents matching a query.

DeleteMany.java
package com.dtsx.astra.sdk.documentation;

import com.dtsx.astra.sdk.AstraDB;
import com.dtsx.astra.sdk.AstraDBCollection;
import io.stargate.sdk.data.domain.query.DeleteQuery;
import io.stargate.sdk.data.domain.query.DeleteResult;

import static io.stargate.sdk.http.domain.FilterOperator.EQUALS_TO;

public class DeleteMany {
  public static void main(String[] args) {
    AstraDB db = new AstraDB("TOKEN", "API_ENDPOINT");
    AstraDBCollection collection = db.createCollection("collection_vector1", 14);

    // Build our query
    DeleteQuery deleteQuery = DeleteQuery.builder()
      .where("product_price", EQUALS_TO, 9.99)
      .build();

    // Delete up to 20 records (a single page)
    DeleteResult page = collection
            .deleteManyPaged(deleteQuery);

    // Delete all documents matching the query
    DeleteResult allDeleted = collection
            .deleteMany(deleteQuery);

    // Delete all documents matching the query in a distributed way
    DeleteResult result = collection
            .deleteManyChunked(deleteQuery, 5);
  }
}

Clear

Empties a collection.

ClearCollection.java
package com.dtsx.astra.sdk.documentation;

import com.dtsx.astra.sdk.AstraDB;
import com.dtsx.astra.sdk.AstraDBCollection;

public class ClearCollection {
  public static void main(String[] args) {
    AstraDB db = new AstraDB("TOKEN", "API_ENDPOINT");
    AstraDBCollection collection = db.createCollection("collection_vector1", 14);

    // Delete all rows from an existing collection
    collection.deleteAll();
  }
}

Object Mapping

Overview

Instead of interacting with the database through keys and values, you may want to associate an object with each record in the collection. For this, you can use CollectionRepository.

Repository Pattern

Instead of working with raw JsonDocument objects, you can work with your own classes. Each object is serialized to JSON and stored in the database. Rather than providing a ResultMapper on every call, you can use the repository pattern. The repository follows the method signatures of CrudRepository from Spring Data.

long count();
void delete(T entity);
void deleteAll();
void deleteAll(Iterable<? extends T> entities);
void deleteAllById(Iterable<? extends ID> ids);
void deleteById(ID id);
boolean existsById(ID id);
Iterable<T> findAll();
Iterable<T> findAllById(Iterable<ID> ids);
Optional<T> findById(ID id);
<S extends T> S save(S entity);
<S extends T> Iterable<S> saveAll(Iterable<S> entities);
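
To show the shape of the repository pattern independently of the database, here is a minimal in-memory sketch exposing a few of the signatures listed above. It is not the SDK's AstraDBRepository: `InMemoryRepository`, the `Product` fields, and the ID-extractor function are assumptions made for illustration, and IDs are plain strings.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

public class InMemoryRepositorySketch {

  /** Example entity; the field names are illustrative. */
  public static final class Product {
    public final String id;
    public final String name;
    public final double price;

    public Product(String id, String name, double price) {
      this.id = id;
      this.name = name;
      this.price = price;
    }
  }

  /** Hypothetical repository keeping entities in a map keyed by ID. */
  public static final class InMemoryRepository<T> {
    private final Map<String, T> store = new LinkedHashMap<>();
    private final Function<T, String> idOf;

    public InMemoryRepository(Function<T, String> idOf) {
      this.idOf = idOf;
    }

    /** Inserts the entity, or replaces the one already stored under the same ID. */
    public <S extends T> S save(S entity) {
      store.put(idOf.apply(entity), entity);
      return entity;
    }

    public Optional<T> findById(String id) { return Optional.ofNullable(store.get(id)); }
    public boolean existsById(String id)   { return store.containsKey(id); }
    public Iterable<T> findAll()           { return store.values(); }
    public long count()                    { return store.size(); }
    public void deleteById(String id)      { store.remove(id); }
    public void deleteAll()                { store.clear(); }
  }

  public static void main(String[] args) {
    InMemoryRepository<Product> repository = new InMemoryRepository<>(p -> p.id);
    repository.save(new Product("product1", "HealthyFresh - Beef raw dog food", 12.99));
    repository.save(new Product("product2", "HealthyFresh - Chicken raw dog food", 9.99));

    // Callers work with Product objects only; no JSON or mapper is visible here
    repository.findById("product1").ifPresent(p -> System.out.println(p.name));
    System.out.println(repository.count());
  }
}
```

The benefit of the pattern is that application code depends on these entity-level methods rather than on documents and mappers.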

Create collection

ObjectMappingCreateCollection.java
package com.dtsx.astra.sdk.documentation;

import com.dtsx.astra.sdk.AstraDB;
import com.dtsx.astra.sdk.AstraDBRepository;
import com.fasterxml.jackson.annotation.JsonProperty;
import io.stargate.sdk.data.domain.CollectionDefinition;
import io.stargate.sdk.data.domain.SimilarityMetric;

public class ObjectMappingCreateCollection {

  static class Product {
    @JsonProperty("product_name") private String name;
    @JsonProperty("product_price") private Double price;
  }

  public static void main(String[] args) {
    AstraDB db = new AstraDB("TOKEN", "API_ENDPOINT");

    // Create a non-vector collection
    AstraDBRepository<Product> collection1 =
        db.createCollection("collection_simple", Product.class);

    // Create a vector collection with a builder
    AstraDBRepository<Product> collection2 =
        db.createCollection(
            CollectionDefinition.builder()
                .name("collection_vector2")
                .vector(1536, SimilarityMetric.euclidean)
                .build(),
            Product.class);
  }
}

Insert One

ObjectMappingInsertOne.java
package com.dtsx.astra.sdk.documentation;

import com.dtsx.astra.sdk.AstraDB;
import com.dtsx.astra.sdk.AstraDBRepository;
import com.fasterxml.jackson.annotation.JsonProperty;
import io.stargate.sdk.data.domain.odm.Document;

public class ObjectMappingInsertOne {
  static class Product {
    @JsonProperty("product_name") private String name;
    @JsonProperty("product_price") private Double price;
    Product(String name, Double price) {
      this.name = name;
      this.price = price;
    }
  }

  public static void main(String[] args) {
    AstraDB db = new AstraDB("TOKEN", "API_ENDPOINT");
    AstraDBRepository<Product> productRepository =
        db.createCollection("collection_vector1", 14, Product.class);

    // Upsert document
    productRepository.save(new Document<Product>()
        .id("product1")
        .vector(new float[]{1f, 0f, 1f, 1f, .5f, 1f, 0f, 0.3f, 0f, 0f, 0f, 0f, 0f, 0f})
        .data(new Product("product1", 9.99)));
  }
}

Insert Many

ObjectMappingInsertMany.java
package com.dtsx.astra.sdk.documentation;

import com.dtsx.astra.sdk.AstraDB;
import com.dtsx.astra.sdk.AstraDBRepository;
import com.fasterxml.jackson.annotation.JsonProperty;
import io.stargate.sdk.data.domain.DocumentMutationResult;
import io.stargate.sdk.data.domain.odm.Document;
import java.util.List;

public class ObjectMappingInsertMany {
  static class Product {
    @JsonProperty("product_name") private String name;
    @JsonProperty("product_price") private Double price;
    Product(String name, Double price) {
      this.name = name;
      this.price = price;
    }
  }

  public static void main(String[] args) {
    AstraDB db = new AstraDB("TOKEN", "API_ENDPOINT");
    AstraDBRepository<Product> productRepository =
        db.createCollection("collection_vector1", 14, Product.class);

    // Insert documents into the collection (IDs are generated automatically)
    List<DocumentMutationResult<Product>> identifiers = productRepository.saveAll(
        List.of(
            new Document<Product>()
                .vector(new float[]{1f, 0f, 1f, 1f, .5f, 1f, 0f, 0.3f, 0f, 0f, 0f, 0f, 0f, 0f})
                .data(new Product("product1", 9.99)),
            new Document<Product>()
                .vector(new float[]{1f, 0f, 1f, 1f, .5f, 1f, 0f, 0.3f, 0f, 0f, 0f, 0f, 0f, 0f})
                .data(new Product("product2", 12.99))));
  }
}

Find One

  • To get a single document use findById() or findByVector()
ObjectMappingFindOne.java
package com.dtsx.astra.sdk.documentation;

import com.dtsx.astra.sdk.AstraDB;
import com.dtsx.astra.sdk.AstraDBRepository;
import com.fasterxml.jackson.annotation.JsonProperty;
import io.stargate.sdk.data.domain.odm.DocumentResult;
import java.util.Optional;

public class ObjectMappingFindOne {
  static class Product {
    @JsonProperty("product_name") private String name;
    @JsonProperty("product_price") private Double price;
  }

  public static void main(String[] args) {
    AstraDB db = new AstraDB("TOKEN", "API_ENDPOINT");
    AstraDBRepository<Product> productRepository =
        db.createCollection("collection_vector1", 14, Product.class);

    // Retrieve a product by its ID
    Optional<DocumentResult<Product>> res1 = productRepository.findById("id1");

    // Retrieve a product from its vector
    float[] vector = new float[]{1f, 0f, 1f, 1f, 1f, 1f, 0f, 0f, 0f, 0f, 0f, 0f, 0f, 0f};
    Optional<DocumentResult<Product>> res2 = productRepository.findByVector(vector);
  }
}

Find

  • To perform search use find()
ObjectMappingFind.java
package com.dtsx.astra.sdk.documentation;

import com.dtsx.astra.sdk.AstraDB;
import com.dtsx.astra.sdk.AstraDBCollection;
import io.stargate.sdk.data.domain.query.Filter;
import io.stargate.sdk.data.domain.query.SelectQuery;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ObjectMappingFind {
  public static void main(String[] args) {
    AstraDB db = new AstraDB("TOKEN", "API_ENDPOINT");
    AstraDBCollection collection = db.createCollection("collection_vector1", 14);

    // Retrieve all documents that have a product_price
    Filter filter = new Filter()
            .where("product_price")
            .exists();
    collection.find(
        SelectQuery.builder()
            .filter(filter)
            .build())
        .forEach(System.out::println);

    // Retrieve all documents where product_price is 12.99
    Filter filter2 = new Filter()
            .where("product_price")
            .isEqualsTo(12.99);
    collection.find(
        SelectQuery.builder()
            .filter(filter2)
            .build())
        .forEach(System.out::println);

    // Only retrieve the product_name and product_price fields
    collection.find(
        SelectQuery.builder()
            .select("product_name", "product_price")
            .filter(filter2)
            .build())
        .forEach(System.out::println);

    // Order the results by similarity
    collection.find(
        SelectQuery.builder()
            .filter(filter2)
            .orderByAnn(new float[]{1f, 0f, 1f, 1f, 1f, 1f, 0f, 0f, 0f, 0f, 0f, 0f, 0f, 0f})
            .build())
        .forEach(System.out::println);

    // Order the results by a specific field
    collection.find(
        SelectQuery.builder()
            .filter(filter2)
            .orderBy("product_price", 1)
            .build())
        .forEach(System.out::println);

    // Complex query with AND and OR:
    //     (product_price == 9.99 OR product_name == "HealthyFresh - Beef raw dog food")
    // AND (product_price == 12.99 OR product_name == "HealthyFresh - Beef raw dog food")
    SelectQuery sq2 = new SelectQuery();
    sq2.setFilter(new HashMap<>());
    Map<String, List<Map<String, Object>>> or1Criteria = new HashMap<>();
    or1Criteria.put("$or", new ArrayList<Map<String, Object>>());
    or1Criteria.get("$or").add(Map.of("product_price", 9.99));
    or1Criteria.get("$or").add(Map.of("product_name", "HealthyFresh - Beef raw dog food"));
    Map<String, List<Map<String, Object>>> or2Criteria = new HashMap<>();
    or2Criteria.put("$or", new ArrayList<Map<String, Object>>());
    or2Criteria.get("$or").add(Map.of("product_price", 12.99));
    or2Criteria.get("$or").add(Map.of("product_name", "HealthyFresh - Beef raw dog food"));
    List<Map<String, List<Map<String, Object>>>> andCriteria = new ArrayList<>();
    andCriteria.add(or1Criteria);
    andCriteria.add(or2Criteria);
    sq2.getFilter().put("$and", andCriteria);
    collection.find(sq2).forEach(System.out::println);
  }
}
  • To perform semantic search use findVector()
ObjectMappingFindVector.java
package com.dtsx.astra.sdk.documentation;

import com.dtsx.astra.sdk.AstraDB;
import com.dtsx.astra.sdk.AstraDBRepository;
import com.fasterxml.jackson.annotation.JsonProperty;
import io.stargate.sdk.data.domain.odm.DocumentResult;
import io.stargate.sdk.data.domain.query.Filter;

import java.util.List;

public class ObjectMappingFindVector {
  static class Product {
    @JsonProperty("product_name") private String name;
    @JsonProperty("product_price") private Double price;
  }

  public static void main(String[] args) {
    AstraDB db = new AstraDB("TOKEN", "API_ENDPOINT");
    AstraDBRepository<Product> productRepository =
        db.createCollection("collection_vector1", 14, Product.class);

    // Perform a semantic search
    float[] embeddings = new float[]{1f, 0f, 1f, 1f, 1f, 1f, 0f, 0f, 0f, 0f, 0f, 0f, 0f, 0f};
    Filter metadataFilter = new Filter().where("product_price").isEqualsTo(9.99);
    int maxRecord = 10;
    List<DocumentResult<Product>> res = productRepository.findVector(embeddings, metadataFilter, maxRecord);

    // If you do not have max record or metadata filter, you can use the following
    productRepository.findVector(embeddings, maxRecord);
    productRepository.findVector(embeddings, metadataFilter);
  }
}

Update One

ObjectMappingUpdateOne.java
package com.dtsx.astra.sdk.documentation;

import com.dtsx.astra.sdk.AstraDB;
import com.dtsx.astra.sdk.AstraDBRepository;
import com.fasterxml.jackson.annotation.JsonProperty;
import io.stargate.sdk.data.domain.odm.Document;

public class ObjectMappingUpdateOne {
  static class Product {
    @JsonProperty("product_name") private String name;
    @JsonProperty("product_price") private Double price;
    Product(String name, Double price) {
      this.name = name;
      this.price = price;
    }
  }

  public static void main(String[] args) {
    AstraDB db = new AstraDB("TOKEN", "API_ENDPOINT");
    AstraDBRepository<Product> productRepository =
        db.createCollection("collection_vector1", 14, Product.class);

    // Upsert a document
    productRepository.save(new Document<Product>()
        .id("product1")
        .vector(new float[]{1f, 0f, 1f, 1f, .5f, 1f, 0f, 0.3f, 0f, 0f, 0f, 0f, 0f, 0f})
        .data(new Product("product1", 9.99)));
  }
}

Update Many

ObjectMappingUpdateMany.java
package com.dtsx.astra.sdk.documentation;

import com.dtsx.astra.sdk.AstraDB;
import com.dtsx.astra.sdk.AstraDBRepository;
import com.fasterxml.jackson.annotation.JsonProperty;
import io.stargate.sdk.data.domain.DocumentMutationResult;
import io.stargate.sdk.data.domain.odm.Document;
import java.util.List;

public class ObjectMappingUpdateMany {
  static class Product {
    @JsonProperty("product_name") private String name;
    @JsonProperty("product_price") private Double price;
    Product(String name, Double price) {
      this.name = name;
      this.price = price;
    }
  }

  public static void main(String[] args) {
    AstraDB db = new AstraDB("TOKEN", "API_ENDPOINT");
    AstraDBRepository<Product> productRepository =
        db.createCollection("collection_vector1", 14, Product.class);

    // Insert documents into the collection (IDs are generated automatically)
    List<DocumentMutationResult<Product>> identifiers = productRepository.saveAll(
        List.of(
            new Document<Product>()
                .vector(new float[]{1f, 0f, 1f, 1f, .5f, 1f, 0f, 0.3f, 0f, 0f, 0f, 0f, 0f, 0f})
                .data(new Product("product1", 9.99)),
            new Document<Product>()
                .vector(new float[]{1f, 0f, 1f, 1f, .5f, 1f, 0f, 0.3f, 0f, 0f, 0f, 0f, 0f, 0f})
                .data(new Product("product2", 12.99))));
  }
}

Delete One

ObjectMappingDeleteOne.java
package com.dtsx.astra.sdk.documentation;

import com.dtsx.astra.sdk.AstraDB;
import com.dtsx.astra.sdk.AstraDBRepository;
import com.fasterxml.jackson.annotation.JsonProperty;
import io.stargate.sdk.data.domain.odm.Document;

public class ObjectMappingDeleteOne {
  static class Product {
    @JsonProperty("product_name") private String name;
    @JsonProperty("product_price") private Double price;
  }

  public static void main(String[] args) {
    AstraDB db = new AstraDB("TOKEN", "API_ENDPOINT");
    AstraDBRepository<Product> collection1 =
        db.createCollection("collection_simple", Product.class);

    // Delete a document by ID
    collection1.deleteById("id1");

    // Delete a specific document
    collection1.delete(new Document<Product>().id("id2"));
  }
}

Delete Many

ObjectMappingDeleteMany.java
package com.dtsx.astra.sdk.documentation;

import com.dtsx.astra.sdk.AstraDB;
import com.dtsx.astra.sdk.AstraDBRepository;
import com.fasterxml.jackson.annotation.JsonProperty;
import io.stargate.sdk.data.domain.query.DeleteQuery;
import io.stargate.sdk.data.domain.query.DeleteResult;

import static io.stargate.sdk.http.domain.FilterOperator.EQUALS_TO;

public class ObjectMappingDeleteMany {
  static class Product {
    @JsonProperty("product_name") private String name;
    @JsonProperty("product_price") private Double price;
  }

  public static void main(String[] args) {
    AstraDB db = new AstraDB("TOKEN", "API_ENDPOINT");

    // Create a non-vector collection
    AstraDBRepository<Product> collection1 =
        db.createCollection("collection_simple", Product.class);

    // Delete rows based on a query
    DeleteQuery q = DeleteQuery.builder()
            .where("product_price", EQUALS_TO, 9.99)
            .build();
    DeleteResult res = collection1.deleteAll(q);
  }
}

Clear

ObjectMappingClearCollection.java
package com.dtsx.astra.sdk.documentation;

import com.dtsx.astra.sdk.AstraDB;
import com.dtsx.astra.sdk.AstraDBRepository;
import com.fasterxml.jackson.annotation.JsonProperty;

public class ObjectMappingClearCollection {
  static class Product {
    @JsonProperty("product_name") private String name;
    @JsonProperty("product_price") private Double price;
  }

  public static void main(String[] args) {
    AstraDB db = new AstraDB("TOKEN", "API_ENDPOINT");
    AstraDBRepository<Product> collection1 =
        db.createCollection("collection_simple", Product.class);

    // Delete all rows in a collection
    collection1.deleteAll();
  }
}

Working with databases

Connection

About token permissions

To work with databases, you need a token with organization-level permissions. You will work with the AstraDBAdmin class.

To establish a connection with AstraDB using the client SDK, you must supply a token. This token enables two primary connection modes:

  • Direct database-level connection, which gives access to a specific database. This is the mode described above and the primary way of working with the SDK.

  • Organization-level connection, which allows interaction with multiple databases under your organization. This is the mode detailed in this section.

The AstraDBAdmin class facilitates interactions with all components within your Astra organization, rather than limiting operations to a single database. This enables a broader scope of management and control across the organization's databases. The token used for this connection must be scoped to the organization with the following role:

Properties    Values
Token Role    Organization Administrator
ConnectingAdmin.java
package com.dtsx.astra.sdk.documentation;

import com.dtsx.astra.sdk.AstraDBAdmin;

public class ConnectingAdmin {
  public static void main(String[] args) {
    // Default Initialization
    AstraDBAdmin client = new AstraDBAdmin("TOKEN");

    // You can omit the token if you defined the `ASTRA_DB_APPLICATION_TOKEN`
    // environment variable or if you are using the Astra CLI.
    AstraDBAdmin defaultClient = new AstraDBAdmin();
  }
}
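
The fallback mentioned in the comment above can be pictured as a simple lookup order: an explicit token first, then the `ASTRA_DB_APPLICATION_TOKEN` environment variable. The sketch below is an assumption about that order and is not the SDK's implementation; `TokenResolutionSketch` and `resolveToken` are hypothetical names, and the Astra CLI configuration lookup is deliberately left out.

```java
import java.util.Map;
import java.util.Optional;

public class TokenResolutionSketch {
  static final String ENV_VAR = "ASTRA_DB_APPLICATION_TOKEN";

  /** Returns the explicit token if provided, otherwise the value found in the environment. */
  public static Optional<String> resolveToken(String explicitToken, Map<String, String> env) {
    if (explicitToken != null && !explicitToken.isBlank()) {
      return Optional.of(explicitToken);
    }
    String fromEnv = env.get(ENV_VAR);
    if (fromEnv != null && !fromEnv.isBlank()) {
      return Optional.of(fromEnv);
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    // Passing the environment as a map keeps the lookup easy to test
    Optional<String> token = resolveToken(null, System.getenv());
    System.out.println(token.isPresent() ? "Token found" : "No token configured");
  }
}
```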

List databases

FindAllDatabases.java
package com.dtsx.astra.sdk.documentation;

import com.dtsx.astra.sdk.AstraDBAdmin;
import com.dtsx.astra.sdk.db.domain.Database;
import java.util.stream.Stream;

public class FindAllDatabases {
  public static void main(String[] args) {
    AstraDBAdmin client = new AstraDBAdmin("TOKEN");
    boolean exists = client.isDatabaseExists("<database_name>");

    // List all available databases
    Stream<Database> dbStream = client.listDatabases();
  }
}

Create database

To create a database, you need a token with organization-level permissions. You will work with the AstraDBAdmin class.

CreateDatabase.java
package com.dtsx.astra.sdk.documentation;

import com.dtsx.astra.sdk.AstraDBAdmin;
import com.dtsx.astra.sdk.db.domain.CloudProviderType;
import java.util.UUID;

public class CreateDatabase {
  public static void main(String[] args) {
    AstraDBAdmin client = new AstraDBAdmin("TOKEN");

    // Choose a cloud provider (GCP, AZURE, AWS) and a region
    CloudProviderType cloudProvider = CloudProviderType.GCP;
    String cloudRegion = "us-east1";

    // Create a database
    UUID newDbId = client.createDatabase("<database_name>", cloudProvider, cloudRegion);
  }
}

Find database

FindDatabase.java
package com.dtsx.astra.sdk.documentation;

import com.dtsx.astra.sdk.AstraDBAdmin;
import com.dtsx.astra.sdk.db.domain.Database;
import java.util.Optional;
import java.util.UUID;
import java.util.stream.Stream;

public class FindDatabase {
  public static void main(String[] args) {
    AstraDBAdmin client = new AstraDBAdmin("TOKEN");

    // Check if a database exists
    boolean exists = client.isDatabaseExists("<database_name>");

    // Find a database by name (names may not be unique)
    Stream<Database> dbStream = client.getDatabaseInformations("<database_name>");
    Optional<Database> dbByName = dbStream.findFirst();

    // Find a database by ID
    Optional<Database> dbById = client
        .getDatabaseInformations(UUID.fromString("<replace_with_db_uuid>"));
  }
}
  • Accessing object AstraDB
AstraDB myDB = client.database("getting_started");
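
Because database names may not be unique, calling findFirst() on the stream returned by getDatabaseInformations(name) quietly picks one of the matches. The sketch below shows one way to insist on exactly one match before going further. It uses only the JDK: DbInfo is a hypothetical stand-in for the SDK's Database class, and requireSingle is an illustrative helper, not part of the SDK.

```java
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class UniqueDatabaseLookup {

    // Hypothetical stand-in for the SDK's Database class (illustration only).
    static final class DbInfo {
        final String id;
        final String name;

        DbInfo(String id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    // Returns the single match, an empty Optional when there is none,
    // and fails loudly when the name is ambiguous.
    static <T> Optional<T> requireSingle(Stream<T> matches) {
        // Two elements are enough to detect ambiguity.
        List<T> found = matches.limit(2).collect(Collectors.toList());
        if (found.size() > 1) {
            throw new IllegalStateException(
                "More than one database has this name; look it up by ID instead");
        }
        return found.stream().findFirst();
    }

    public static void main(String[] args) {
        // In real code the stream would come from getDatabaseInformations(name).
        Stream<DbInfo> matches = Stream.of(new DbInfo("id-1", "getting_started"));
        Optional<DbInfo> db = requireSingle(matches);
        System.out.println(db.map(d -> d.id).orElse("not found"));
    }
}
```

When the name turns out to be ambiguous, the lookup by UUID shown in FindDatabase.java is the safer route.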

Delete database

  • Delete databases with dropDatabase

The method accepts either a database identifier (UUID) or the database name.

DeleteDatabase.java
package com.dtsx.astra.sdk.documentation;

import com.dtsx.astra.sdk.AstraDBAdmin;

import java.util.UUID;

public class DeleteDatabase {
  public static void main(String[] args) {
    AstraDBAdmin client = new AstraDBAdmin("TOKEN");

    // Delete an existing database
    client.dropDatabase("<database_name>");

    // Delete an existing database by ID
    client.dropDatabase(
            UUID.fromString("<replace_with_db_uuid>"));
  }
}

Working with Keyspaces

Create Keyspace

Create a keyspace in the current database with the given name.

General Information
  • Default keyspace is default_keyspace
  • If the keyspace already exists, the method throws 'KeyspaceAlreadyExistException'
void createKeyspace(String databaseName, String keyspaceName);
void createKeyspace(UUID databaseId, String keyspaceName);
  • Sample Code
CreateKeyspace.java
package com.dtsx.astra.sdk.documentation;

import com.dtsx.astra.sdk.AstraDBAdmin;

public class CreateKeyspace {

    public static void main(String[] args) {
        AstraDBAdmin client = new AstraDBAdmin("TOKEN");

        // Create a Keyspace
        client.createKeyspace("<db_name>", "<keyspace_name>");
    }
}
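
Since creating an existing keyspace is signalled with KeyspaceAlreadyExistException, setup scripts that run more than once may want a "create if absent" wrapper. The sketch below is one way to write it, under stated assumptions: KeyspaceCreator is a hypothetical one-method view of the admin client, the nested exception is a local stand-in for the SDK class of the same name, and the exception is assumed to be unchecked.

```java
import java.util.HashSet;
import java.util.Set;

class IdempotentKeyspaceCreate {

    // Local stand-in for the SDK exception (illustration only, assumed unchecked).
    static class KeyspaceAlreadyExistException extends RuntimeException {
        KeyspaceAlreadyExistException(String keyspace) {
            super("Keyspace already exists: " + keyspace);
        }
    }

    // Hypothetical minimal view of the admin client,
    // mirroring createKeyspace(databaseName, keyspaceName).
    interface KeyspaceCreator {
        void createKeyspace(String databaseName, String keyspaceName);
    }

    // Returns true if the keyspace was created, false if it was already there.
    static boolean createIfAbsent(KeyspaceCreator admin, String databaseName, String keyspaceName) {
        try {
            admin.createKeyspace(databaseName, keyspaceName);
            return true;
        } catch (KeyspaceAlreadyExistException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // In-memory fake so the sketch runs without a database.
        Set<String> existing = new HashSet<>();
        KeyspaceCreator fake = (db, ks) -> {
            if (!existing.add(db + "/" + ks)) {
                throw new KeyspaceAlreadyExistException(ks);
            }
        };
        System.out.println(createIfAbsent(fake, "<db_name>", "<keyspace_name>"));
        System.out.println(createIfAbsent(fake, "<db_name>", "<keyspace_name>"));
    }
}
```

With the real client, the lambda would be replaced by a method reference such as client::createKeyspace, provided the SDK exception is the one being caught.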

Delete Keyspace

Delete a keyspace in the current database by name.

General Information
  • Default keyspace is default_keyspace
  • If the keyspace does not exist, the method throws 'KeyspaceNotFoundException'
void deleteKeyspace(String databaseName, String keyspaceName);
void deleteKeyspace(UUID databaseId, String keyspaceName);
  • Sample Code
DeleteKeyspace.java
package com.dtsx.astra.sdk.documentation;

import com.dtsx.astra.sdk.AstraDBAdmin;

public class DeleteKeyspace {

    public static void main(String[] args) {
        AstraDBAdmin client = new AstraDBAdmin("TOKEN");

        // Delete a Keyspace
        client.deleteKeyspace("<db_name>", "<keyspace_name>");
    }
}

Find Keyspace

General Information
  • A database is not limited in number of keyspaces.
  • A keyspace is a logical grouping of collections.
  • Default keyspace name is default_keyspace
boolean isKeyspaceExists(String keyspaceName);
Stream<String> findAllKeyspaceNames();
String getCurrentKeyspace(String keyspaceName);
void changeKeyspace(String keyspaceName);
  • Sample Code
FindKeyspace.java
package com.dtsx.astra.sdk.documentation;

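import com.dtsx.astra.sdk.AstraDB;
import com.dtsx.astra.sdk.AstraDBAdmin;
import java.util.stream.Stream;

public class FindKeyspace {

    public static void main(String[] args) {
        AstraDBAdmin client = new AstraDBAdmin("TOKEN");

        // Assumption: the methods listed above are called on the AstraDB
        // object obtained from the admin client.
        AstraDB db = client.database("<db_name>");

        // Check if a keyspace exists
        boolean exists = db.isKeyspaceExists("<keyspace_name>");

        // List all keyspace names
        Stream<String> keyspaceNames = db.findAllKeyspaceNames();
    }
}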

6. Class Diagram

7. Working with CassIO

CassIO is a framework originally implemented in Python to use open source Cassandra as a vector store. It has been partially ported to Java. The idea is for Java applications to use the same tables that CassIO creates.

Connection

General Information
  • CassIO is a framework to use Open Source Cassandra as a Vector Store.
  • The Java port covers only two tables: metadata_vector and clustered_metadata_vector
  • The tables are created with a specific schema to store vectors and metadata
  • The indices are created to perform efficient search on the vector
CqlSession init(String token, UUID databaseId, String databaseRegion, String keyspace);
  • Sample Code
CassIOConnection.java
package com.dtsx.astra.sdk.documentation;

import com.datastax.oss.driver.api.core.CqlSession;
import com.dtsx.astra.sdk.AstraDBAdmin;
import com.dtsx.astra.sdk.cassio.CassIO;
import com.dtsx.astra.sdk.utils.TestUtils;

import java.util.UUID;

public class CassIOConnection {

    public static void main(String[] args) {

        // Create db if not exists
        UUID databaseId = new AstraDBAdmin("TOKEN")
                .createDatabase("database");

        // Initializing CqlSession
        try (CqlSession cqlSession = CassIO.init("TOKEN",
                databaseId, TestUtils.TEST_REGION,
                AstraDBAdmin.DEFAULT_KEYSPACE)) {
            cqlSession
                    .execute("SELECT datacenter FROM system.local;")
                    .one()
                    .get("datacenter", String.class);
        }
    }
}

MetadataVectorTable

General Information
  • Creating a Cassandra table with the following schema and associated indices
CREATE TABLE vector_store (
 row_id          timeuuid,
 attributes_blob text,
 body_blob       text,
 metadata_s      map<text, text>,
 vector          vector<float, 1536>,
 PRIMARY KEY (row_id)
);
  • Sample Code
CassIOMetadataVectorTable.java
package com.dtsx.astra.sdk.documentation;

import com.datastax.oss.driver.api.core.CqlSession;
import com.dtsx.astra.sdk.AstraDBAdmin;
import com.dtsx.astra.sdk.cassio.AnnQuery;
import com.dtsx.astra.sdk.cassio.CassIO;
import com.dtsx.astra.sdk.cassio.MetadataVectorRecord;
import com.dtsx.astra.sdk.cassio.MetadataVectorTable;
import com.dtsx.astra.sdk.utils.TestUtils;

import java.util.List;
import java.util.Map;
import java.util.UUID;

public class CassIOMetadataVectorTable {
    public static void main(String[] args) {

        // Create db if not exists
        UUID databaseId = new AstraDBAdmin("TOKEN")
                .createDatabase("database");

        // Initializing CqlSession
        try (CqlSession cqlSession = CassIO.init("TOKEN",
                databaseId, TestUtils.TEST_REGION,
                AstraDBAdmin.DEFAULT_KEYSPACE)) {

            // Initializing table with the dimension
            MetadataVectorTable vectorStore = CassIO
                    .metadataVectorTable("vector_store", 1536);
            vectorStore.create();

            // Insert Vectors
            // Note: a real vector must have as many entries as the table
            // dimension (1536 here); the 4 values below are a shortened placeholder.
            MetadataVectorRecord record = new MetadataVectorRecord();
            record.setVector(List.of(0.1f, 0.2f, 0.3f, 0.4f));
            record.setMetadata(Map.of("key", "value"));
            record.setBody("Sample text fragment");
            record.setAttributes("handy field for special attributes");
            vectorStore.put(record);

            // Semantic Search
            AnnQuery query = AnnQuery
                    .builder()
                    .embeddings(List.of(0.1f, 0.2f, 0.3f, 0.4f))
                    .metaData(Map.of("key", "value"))
                    .build();

            vectorStore.similaritySearch(query).forEach(result -> {
                System.out.println("Similarity : " + result.getSimilarity());
                System.out.println("Record : " + result.getEmbedded().getBody());
            });
        }
    }
}
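
The schema above declares the column as vector<float, 1536>, a fixed-size type, so every stored or queried vector has to contain exactly that many values. A small guard run before put() or similaritySearch() can surface a mismatch early. The following is a minimal sketch using only the JDK; requireDimension is an illustrative helper, not part of the SDK.

```java
import java.util.List;

class VectorDimensionCheck {

    // Illustrative helper (not part of the SDK): fail fast on a dimension mismatch.
    static List<Float> requireDimension(List<Float> vector, int expected) {
        if (vector.size() != expected) {
            throw new IllegalArgumentException(
                "Expected " + expected + " dimensions but got " + vector.size());
        }
        return vector;
    }

    public static void main(String[] args) {
        // Must equal the dimension the table was created with.
        int dimension = 4;
        List<Float> embedding = requireDimension(List.of(0.1f, 0.2f, 0.3f, 0.4f), dimension);
        System.out.println("Vector accepted, size " + embedding.size());
    }
}
```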

ClusteredMetadataVectorTable

General Information
  • Creating a Cassandra table
CREATE TABLE goodbards.vector_store_openai_by_tenant (
 partition_id    text,
 row_id          timeuuid,
 attributes_blob text,
 body_blob       text,
 metadata_s      map<text, text>,
 vector          vector<float, 1536>,
 PRIMARY KEY (partition_id, row_id)
) WITH CLUSTERING ORDER BY (row_id DESC);
  • Sample Code
CassIOClusteredMetadataVectorTable.java
package com.dtsx.astra.sdk.documentation;

import com.datastax.oss.driver.api.core.CqlSession;
import com.dtsx.astra.sdk.AstraDBAdmin;
import com.dtsx.astra.sdk.cassio.AnnQuery;
import com.dtsx.astra.sdk.cassio.CassIO;
import com.dtsx.astra.sdk.cassio.ClusteredMetadataVectorRecord;
import com.dtsx.astra.sdk.cassio.ClusteredMetadataVectorTable;
import com.dtsx.astra.sdk.utils.TestUtils;

import java.util.List;
import java.util.Map;
import java.util.UUID;

import static com.dtsx.astra.sdk.cassio.AbstractCassandraTable.PARTITION_ID;

public class CassIOClusteredMetadataVectorTable {
    public static void main(String[] args) {

        // Create db if not exists
        UUID databaseId = new AstraDBAdmin("TOKEN")
                .createDatabase("database");

        // Initializing CqlSession
        try (CqlSession cqlSession = CassIO.init("TOKEN",
                databaseId, TestUtils.TEST_REGION,
                AstraDBAdmin.DEFAULT_KEYSPACE)) {

            // Initializing table with the dimension
            ClusteredMetadataVectorTable vectorStore = CassIO
                    .clusteredMetadataVectorTable("vector_store", 1536);
            vectorStore.create();

            // Insert Vectors
            // Note: a real vector must have as many entries as the table
            // dimension (1536 here); the 4 values below are a shortened placeholder.
            String partitionId = UUID.randomUUID().toString();
            ClusteredMetadataVectorRecord record = new ClusteredMetadataVectorRecord();
            record.setVector(List.of(0.1f, 0.2f, 0.3f, 0.4f));
            record.setMetadata(Map.of("key", "value"));
            record.setPartitionId(partitionId);
            record.setBody("Sample text fragment");
            record.setAttributes("handy field for special attributes");
            vectorStore.put(record);

            // Semantic Search, restricted to one partition
            AnnQuery query = AnnQuery
                    .builder()
                    .embeddings(List.of(0.1f, 0.2f, 0.3f, 0.4f))
                    .metaData(Map.of(PARTITION_ID, partitionId))
                    .build();

            vectorStore.similaritySearch(query).forEach(result -> {
                System.out.println("Similarity : " + result.getSimilarity());
                System.out.println("Record : " + result.getEmbedded().getBody());
            });
        }
    }
}
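
To make the partition-scoped search above more concrete, here is an in-memory sketch of the same idea: keep only the rows of one partition, then rank them by similarity to the query vector. It rests on several assumptions. Row is a hypothetical class, cosine similarity is assumed as the metric (the metric used by the table's index is not stated here), and the scan is exhaustive, whereas the database relies on an index.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class PartitionScopedSearch {

    // Hypothetical in-memory row: partition key, text body, embedding.
    static final class Row {
        final String partitionId;
        final String body;
        final float[] vector;

        Row(String partitionId, String body, float[] vector) {
            this.partitionId = partitionId;
            this.body = body;
            this.vector = vector;
        }
    }

    // Cosine similarity of two non-zero vectors of equal length.
    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Dimension mismatch");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Keep only the rows of one partition, then order them by decreasing similarity.
    static List<Row> search(List<Row> rows, String partitionId, float[] query, int limit) {
        return rows.stream()
                .filter(r -> r.partitionId.equals(partitionId))
                .sorted(Comparator.comparingDouble((Row r) -> cosine(r.vector, query)).reversed())
                .limit(limit)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Row> rows = new ArrayList<>();
        rows.add(new Row("tenant-a", "close match", new float[] {0.1f, 0.2f, 0.3f, 0.4f}));
        rows.add(new Row("tenant-a", "far match", new float[] {0.4f, 0.3f, 0.2f, 0.1f}));
        rows.add(new Row("tenant-b", "other tenant", new float[] {0.1f, 0.2f, 0.3f, 0.4f}));

        float[] query = {0.1f, 0.2f, 0.3f, 0.4f};
        search(rows, "tenant-a", query, 10).forEach(r -> System.out.println(r.body));
    }
}
```

Rows from other partitions never reach the ranking step, which mirrors the role of partition_id in the primary key of the clustered table.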

8. Working with Langchain4j



Last update: 2024-02-09