• Vector Client Java

1. Overview

The Astra DB Client is a client library for the APIs of the DataStax Astra platform. It lets you connect to, use, and administer the Astra Vector product. The library provides two clients that work in tandem:

  • AstraDBClient: This is the primary entry point to the library and serves as the initial object to access all its features. The client supports both schema operations (such as adding and deleting vector stores and collections) and data operations (including insert, update, and delete functions). It notably offers advanced search capabilities, which encompass similarity search, text-based search, and metadata filtering.

  • AstraDBOpsClient: This class is specifically designed for the administration of the Astra Vector platform. It facilitates the creation, deletion, and management of various databases within your tenant. Authentication is done via a token that is scoped to your tenant.

2. Prerequisites

  • Install a Java Development Kit (JDK) 11+

Use the Java reference documentation to install a Java Development Kit (JDK) for your operating system. After installation, you can validate your setup with the following command:

java --version
  • Install Apache Maven (3.9+) or Gradle

Samples and tutorials are designed to be used with Apache Maven. Follow the instructions in the reference documentation to install Maven. To validate your installation, use the following command:

mvn -version
  • Create your DataStax Astra account:

Sign up for DataStax Astra

  • Create an Astra Token

Once logged in to the user interface, select Settings from the left menu, then click the Tokens tab to create a new token.

Pick the following role:

Property      Value
Token Role    Organization Administrator

The generated token has three properties: Client ID, Client Secret, and Token. You only need the third one (it starts with AstraCS:).

{
  "ClientId": "ROkiiDZdvPOvHRSgoZtyAapp",
  "ClientSecret": "fakedfaked",
  "Token":"AstraCS:fake" <========== use this field
}
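
To keep the token out of source code, one option is to read it from an environment variable and check its shape before use. The following is a minimal sketch in plain Java, assuming the variable name ASTRA_DB_APPLICATION_TOKEN (the name section 5.1 mentions for the no-argument constructor). The class and method names TokenLoader, isWellFormed, and resolve are hypothetical helpers and are not part of the SDK.

```java
import java.util.Map;

public class TokenLoader {

    // Prefix of the token value, as shown in the JSON sample above
    static final String TOKEN_PREFIX = "AstraCS:";

    // Environment variable name (assumption, taken from section 5.1)
    static final String ENV_VAR = "ASTRA_DB_APPLICATION_TOKEN";

    /** Returns true when the value has the prefix and something after it. */
    static boolean isWellFormed(String token) {
        return token != null
            && token.startsWith(TOKEN_PREFIX)
            && token.length() > TOKEN_PREFIX.length();
    }

    /** Looks up the token in the given environment and validates its shape. */
    static String resolve(Map<String, String> env) {
        String token = env.get(ENV_VAR);
        if (!isWellFormed(token)) {
            throw new IllegalStateException(
                ENV_VAR + " is missing or does not start with " + TOKEN_PREFIX);
        }
        return token;
    }

    public static void main(String[] args) {
        try {
            String token = resolve(System.getenv());
            // Never print the token itself; only confirm that it was found
            System.out.println("Token loaded (" + token.length() + " characters)");
        } catch (IllegalStateException e) {
            System.err.println(e.getMessage());
        }
    }
}
```

This check only validates the prefix; it cannot tell whether the token is still valid or has the right role.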

3. Setup project

  • If you are using Maven, update your pom.xml file with the latest version of the Vector SDK, available on Maven Central:
<dependency>
  <groupId>com.datastax.astra</groupId>
  <artifactId>astra-sdk-vector</artifactId>
  <version>${latest}</version>
</dependency>
  • If you are using Gradle, update your build.gradle file with:
dependencies {
    // replace 1.0 with the latest version
    implementation 'com.datastax.astra:astra-sdk-vector:1.0'
}

4. Getting Started

With a valid token, you can create an AstraDBClient object and start using the library.

4.1 Using Json

// 1) Initialization
AstraDBClient astraClient = new AstraDBClient("AstraCS:....");

// 2) Create database if not exists
if (!astraClient.isDatabaseExists("getting_started")) {
  UUID dbId = astraClient.createDatabase("getting_started");
}

// 3) Select the database
AstraDB db = astraClient.database("getting_started");

// 4) Create or select collection
CollectionClient demoCollection;
if (!db.isCollectionExists("demo")) {
  demoCollection = db.createCollection("demo",14);
} else {
  demoCollection = db.collection("demo");
}

// 5) Insert a few vectors

// 5a. Insert One (attributes as key/value)
demoCollection.insertOne(new JsonDocument()
  .id("doc1") // generated if not set
  .vector(new float[]{1f, 0f, 1f, 1f, 1f, 1f, 0f, 0f, 0f, 0f, 0f, 0f, 0f, 0f})
  .put("product_name", "HealthyFresh - Beef raw dog food")
  .put("product_price", 12.99));
// 5b. Insert One (attributes as JSON)
demoCollection.insertOne(new JsonDocument()
  .id("doc2")
  .vector(new float[]{1f, 1f, 1f, 1f, 1f, 0f, 0f, 0f, 0f, 0f, 0f, 0f, 0f, 0f})
  .data("{"
  +"   \"product_name\": \"HealthyFresh - Chicken raw dog food\", "
  + "  \"product_price\": 9.99"
  + "}")
);
// 5c. Insert One (attributes as a MAP)
demoCollection.insertOne(new JsonDocument()
  .id("doc3")
  .vector(new float[]{1f, 1f, 1f, 1f, 1f, 0f, 0f, 0f, 0f, 0f, 0f, 0f, 0f, 0f})
  .data(Map.of("product_name", "HealthyFresh - Chicken raw dog food"))
);
// 5d. Insert One (key/value again, vector written with explicit decimals)
demoCollection.insertOne(new JsonDocument()
  .id("doc4")
  .vector(new float[]{1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f})
  .put("product_name", "HealthyFresh - Chicken raw dog food")
  .put("product_price", 9.99)
);

// 6) Similarity Search
float[] embeddings     = new float[] {1f, 1f, 1f, 1f, 1f, 0f, 0f, 0f, 0f, 0f, 0f, 0f, 0f, 0f};
Filter  metadataFilter = new Filter().where("product_price").isEqualsTo(9.99);
int maxRecord = 10;
List<JsonResult> resultsSet = demoCollection
        .similaritySearch(embeddings, metadataFilter, maxRecord);

4.2 Object Mapping

Instead of interacting with the database through key/value pairs, you may want to associate an object with each record in the collection. For this you can use CollectionRepository. The previous sample can be reproduced as follows:

  • Create the object
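static class Product {

  @JsonProperty("product_name")
  private String name;

  @JsonProperty("product_price")
  private Double price;

  // Default constructor, typically needed for JSON deserialization
  public Product() {}

  // Convenience constructor used in the samples below
  public Product(String name, Double price) {
    this.name = name;
    this.price = price;
  }

  // getters and setters
}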
  • Similarity Search
// 1) Initialization
AstraDB db = new AstraDBClient("AstraCS:....")
        .database("getting_started");

// 2) Create or select collection
CollectionRepository<Product> productRepository = db
        .collectionRepository("demo", Product.class);

// 3) Insert a few vectors
productRepository.insert(new Document<>("doc5",
        new Product("HealthyFresh - Beef raw dog food", 12.99),
        new float[]{1f, 1f, 1f, 1f, 1f, 0f, 0f, 0f, 0f, 0f, 0f, 0f, 0f, 0f}));
productRepository.insert(new Document<>("doc6",
        new Product("Another Product", 9.99),
        new float[]{1f, 1f, 1f, 0f, 1f, 0f, 0f, 0f, 0f, 0f, 0f, 0f, 0f, 0f}));

// 4) Similarity Search
float[] embeddings     = 
        new float[] {1f, 1f, 1f, 1f, 1f, 0f, 0f, 0f, 0f, 0f, 0f, 0f, 0f, 0f};
Filter  metadataFilter = 
        new Filter().where("product_price").isEqualsTo(9.99);
int maxRecord = 10;
List<Result<Product>> results = productRepository
        .similaritySearch(embeddings, metadataFilter, maxRecord);

5. Reference Guide

5.1 AstraDBClient

Initialization happens in the AstraDBClient class. It can be done in different ways:

  • Initialization
// 1. Expecting env var `ASTRA_DB_APPLICATION_TOKEN`
AstraDBClient client = new AstraDBClient();

// 2. Using the token
AstraDBClient client = new AstraDBClient("AstraCS:....");

// 3. Non production environment
AstraDBClient client = new AstraDBClient(astraToken, AstraEnvironment.DEV);

5.2 Working with Databases

  • List Databases with findAllDatabases
client.findAllDatabases()
        .map(Database::getInfo)
        .map(DatabaseInfo::getName)
        .forEach(log::info);
  • Create Databases with createDatabase

The function takes the database name and, optionally, the cloud provider and region.

UUID db1Id = client.createDatabase("db1");

// Specify the cloud provider and region (free-tier constants shown here)
UUID db2Id = client.createDatabase("db2",
  AstraDBClient.FREE_TIER_CLOUD,
  AstraDBClient.FREE_TIER_CLOUD_REGION);
  • Delete Databases with deleteDatabase

The function can take a database identifier (uuid) or the database name.

client.deleteDatabase("db1");

  • Access database from its name or id
// Retrieve from an id (placeholder value; use the id of an existing database)
UUID id = UUID.randomUUID();
Optional<Database> dbById = client.findDatabaseById(id);

// Retrieve from its name
Optional<Database> dbByName = client.findDatabaseByName("getting_started");
  • Check a database exists
boolean exists = client.isDatabaseExists("getting_started");
  • Accessing devops API
AstraDBOpsClient devops = client.getAstraDbOps();
  • Accessing object AstraDB
AstraDB myDB = client.database("getting_started");

5.3 AstraDB

If the database already exists, you can instantiate this class directly from an Astra token and the API endpoint. The endpoint can be copied from the user interface and has the following form:

https://{database-id}-{database-region}.apps.astra.datastax.com
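
If you prefer to assemble the endpoint from its parts rather than copy it, the template above can be filled in with a few lines of plain Java. This is a sketch only: ApiEndpoint and its of method are hypothetical helpers (not SDK classes), and the id and region in main are placeholders.

```java
import java.net.URI;
import java.util.Objects;
import java.util.UUID;

public class ApiEndpoint {

    /** Builds https://{database-id}-{database-region}.apps.astra.datastax.com */
    static URI of(UUID databaseId, String region) {
        Objects.requireNonNull(databaseId, "databaseId is required");
        if (region == null || region.isBlank()) {
            throw new IllegalArgumentException("region is required");
        }
        return URI.create(
            "https://" + databaseId + "-" + region + ".apps.astra.datastax.com");
    }

    public static void main(String[] args) {
        // Placeholder id and region, for illustration only
        UUID id = UUID.fromString("00000000-0000-0000-0000-000000000000");
        System.out.println(of(id, "us-east1"));
    }
}
```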
  • Initializations
// 1) Initialization with API endpoint
AstraDB db1 = new AstraDB("AstraCS:....", "https://...");

// 2) Initialization with databaseId 
AstraDB db2 = new AstraDB("AstraCS:....", dbId);

5.4 Working with Collections

  • Find all collections
// assuming db is an AstraDB instance
Stream<CollectionDefinition> collections = db.findAllCollections();
  • Check whether a collection exists
boolean demo  = db.isCollectionExists("collection1");
  • Find a collection from its name
Optional<CollectionDefinition> collection  = db.findCollection("collection1");
  • Delete a collection from its name
db.deleteCollection("collection1");
  • Create Collection with createCollection
// Create a collection without vector
CollectionClient col1 = db.createCollection("store_name");

// Create a collection with vector
CollectionClient col2 = db.createCollection("vector_store", 1536);

// More options through a collection definition
CollectionClient col3 = db.createCollection(CollectionDefinition.builder()
        .name("tmp_collection")
        .vector(14, cosine)
        .build());
  • Pass a bean class to the same methods to get a CollectionRepository
// Create a collection without vector
CollectionRepository<Product> col1 = db
   .createCollection("store_name", Product.class);

// Create a collection with vector
CollectionRepository<Product> col2 = db
   .createCollection("vector_store", 1536, Product.class);

// More options through a collection definition
CollectionRepository<Product>  col3 = db
   .createCollection(CollectionDefinition
     .builder()
     .name("tmp_collection")
     .vector(14, cosine)
     .build(),
    Product.class);
  • If the collection already exists
// Accessing CollectionClient
CollectionClient col1 = db
    .collection("demo");

// Accessing CollectionRepository
CollectionRepository<Product> repo = db
   .collectionRepository("demo", Product.class);

5.5 CollectionClient

  • Insertions

  • If no id is provided when inserting, the system generates one for you

// Insert with key/values
col1.insert(new JsonDocument()
  .id("doc1") // generated if not set
  .vector(new float[]{1f, 0f, 1f, 1f, 1f, 1f, 0f, 0f, 0f, 0f, 0f, 0f, 0f, 0f})
  .put("product_name", "HealthyFresh - Beef raw dog food")
  .put("product_price", 12.99));

// Insert with payload as Json
col1.insert(new JsonDocument()
  .id("doc2")
  .vector(new float[]{1f, 1f, 1f, 1f, 1f, 0f, 0f, 0f, 0f, 0f, 0f, 0f, 0f, 0f})
  .data("{"
       +"   \"product_name\": \"HealthyFresh - Chicken raw dog food\", "
       + "  \"product_price\": 9.99"
       + "}")
);

// Insert with payload as a Map
col1.insert(new JsonDocument()
   .id("doc3")
   .vector(new float[]{1f, 1f, 1f, 1f, 1f, 0f, 0f, 0f, 0f, 0f, 0f, 0f, 0f, 0f})
   .data(Map.of("product_name", "HealthyFresh - Chicken raw dog food"))
);

// Insert as a Json
col1.insert("{"
    + "   \"_id\":\"doc4\","
    + "   \"$vector\":[1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],"
    + "   \"product_name\": \"HealthyFresh - Chicken raw dog food\", "
    + "   \"product_price\": 9.99"
    + "}");

You can retrieve vector documents from their id or their vector. This is not really a search but rather a direct lookup, like findById.

  • Find By Id

Retrieve a document from its id (if exists)

// Assuming you have a VectorStore<MyBean>
Optional<MyBean> result = col1.findById("doc1");

// When working with JsonVectorStore, to return the raw 'JsonResult'
Optional<JsonResult> result = col1.findByIdJson("doc1");
  • Find By Vector

Retrieve a document from its vector (if exists)

// Assuming you have a VectorStore<MyBean>
Optional<MyBean> result = col1
        .findByVector(new float[]{1f, 1f, 1f, 1f, 1f, 0f, 0f, 0f, 0f, 0f, 0f, 0f, 0f, 0f});

// When working with JsonVectorStore, to return the raw 'JsonResult'
Optional<JsonResult> result = col1
        .findByVectorJson(new float[]{1f, 1f, 1f, 1f, 1f, 0f, 0f, 0f, 0f, 0f, 0f, 0f, 0f, 0f});
  • Find all

You can retrieve all vectors from your store, but it might be slow and consume a lot of memory. Prefer paged requests, except during development.

// Find All for VectorStore<MyBean>
Stream<JsonResult> all = col1.findAll();
  • Find with a query

You can search on any field of the document. All fields are indexed. Using a SelectQuery populated through builder you can get some precise results.

Stream<JsonResult> all = col1.findAll(SelectQuery.builder()
  .where("product_price")
  .isEqualsTo(9.99)
  .build());
  • Find Page

Find Page works like findAll(Query): you pass a SelectQuery as input. The returned Page carries a paging state that must be passed to the query for the next page.

// VectorStore<MyBean>
// JsonVectorStore
Page<JsonResult> page1 = vectorStore.findPage(SelectQuery.builder().build());
page1.getPageState().ifPresent(pagingState -> {
  Page<JsonResult> page2 = vectorStore
    .findPageJson(SelectQuery
    .builder().withPagingState(pagingState).build());
});

You can then add filters to the query with the builder.
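
To read more than two pages, the paging state of each page has to be fed into the request for the next one until no state is returned. The loop below sketches that flow in plain Java. The Page class here is a hypothetical stand-in for the SDK type, and fakeSource is an in-memory substitute for the remote call, so the example runs without a database.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

public class PagingSketch {

    /** Hypothetical stand-in for the SDK's Page: results plus an optional paging state. */
    static final class Page<T> {
        final List<T> results;
        final Optional<String> pageState;

        Page(List<T> results, Optional<String> pageState) {
            this.results = results;
            this.pageState = pageState;
        }
    }

    /**
     * Collects every page by passing the paging state of one page to the next
     * request. The fetcher receives an empty Optional for the first page.
     */
    static <T> List<T> fetchAll(Function<Optional<String>, Page<T>> fetcher) {
        List<T> all = new ArrayList<>();
        Optional<String> state = Optional.empty();
        do {
            Page<T> page = fetcher.apply(state);
            all.addAll(page.results);
            state = page.pageState;
        } while (state.isPresent());
        return all;
    }

    /** In-memory fake: serves 'data' in pages of 'pageSize'; the state is the next offset. */
    static Function<Optional<String>, Page<String>> fakeSource(List<String> data, int pageSize) {
        return state -> {
            int from = state.map(Integer::parseInt).orElse(0);
            int to = Math.min(from + pageSize, data.size());
            Optional<String> next = to < data.size()
                ? Optional.of(String.valueOf(to))
                : Optional.empty();
            return new Page<>(data.subList(from, to), next);
        };
    }

    public static void main(String[] args) {
        List<String> ids = List.of("doc1", "doc2", "doc3", "doc4", "doc5");
        // Three requests are made: 2 + 2 + 1 documents
        System.out.println(fetchAll(fakeSource(ids, 2)));
    }
}
```

With the real client, the fetcher would wrap the findPage call and build a SelectQuery with the received paging state. Collecting everything into one list has the same memory cost as findAll, so in practice each page is usually processed as it arrives.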

A similarity search is a query that will find records where vectors are the closest to a given vector. It is done by providing a vector and a number of results to return. The result is a list of JsonResult that contains the payload and the distance.
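
To make "closest" concrete, here is a small sketch of cosine similarity, the metric used in the createCollection samples of section 5.4, computed on the client with vectors from this guide. It only illustrates the measure; it says nothing about how the service indexes or scores documents. CosineSketch is a hypothetical class name.

```java
public class CosineSketch {

    /** Cosine similarity: dot(a, b) / sqrt(|a|^2 * |b|^2). */
    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            throw new IllegalArgumentException("zero vector");
        }
        return dot / Math.sqrt(normA * normB);
    }

    public static void main(String[] args) {
        float[] query = {1f, 1f, 1f, 1f, 1f, 0f, 0f, 0f, 0f, 0f, 0f, 0f, 0f, 0f};
        float[] doc1  = {1f, 0f, 1f, 1f, 1f, 1f, 0f, 0f, 0f, 0f, 0f, 0f, 0f, 0f};
        float[] doc2  = {1f, 1f, 1f, 1f, 1f, 0f, 0f, 0f, 0f, 0f, 0f, 0f, 0f, 0f};

        System.out.println(cosine(query, doc2)); // 1.0: identical direction
        System.out.println(cosine(query, doc1)); // 0.8: four of five components shared
    }
}
```

A value closer to 1 means the vectors point in a more similar direction, so doc2 would rank ahead of doc1 for this query.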

  • Simple Search
float[] embeddings = 
   new float[]{1f, 1f, 1f, 1f, 1f, 0f, 0f, 0f, 0f, 0f, 0f, 0f, 0f, 0f};
int limit = 2;
List<JsonResult> results = col1.similaritySearch(embeddings, limit);
  • Search with filter
float[] embeddings = 
   new float[]{1f, 1f, 1f, 1f, 1f, 0f, 0f, 0f, 0f, 0f, 0f, 0f, 0f, 0f};
int limit = 2;
Filter  metadataFilter = new Filter().where("product_price").isEqualsTo(9.99);
List<JsonResult> results = col1
        .similaritySearch(embeddings, metadataFilter, limit);
  • When a limit is provided, the service returns a list of results.
  • When no limit is provided, the service returns a Page of results and paging is enabled.
  • The limit must be between 1 and 20.
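
Because the limit has to stay between 1 and 20, a small client-side guard can fail fast before a request is sent. The sketch below uses hypothetical names (SearchLimit, requireValid); whether the SDK performs the same check itself is not stated in this guide.

```java
public class SearchLimit {

    static final int MIN_LIMIT = 1;
    static final int MAX_LIMIT = 20;

    /** Returns the limit unchanged when it is in range, otherwise throws. */
    static int requireValid(int limit) {
        if (limit < MIN_LIMIT || limit > MAX_LIMIT) {
            throw new IllegalArgumentException(
                "limit must be between " + MIN_LIMIT + " and " + MAX_LIMIT
                    + ", got " + limit);
        }
        return limit;
    }

    public static void main(String[] args) {
        System.out.println(requireValid(2)); // 2
    }
}
```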

5.6 CollectionRepository

6. Troubleshooting

  • 6.1. Common Errors and Solutions

List typical issues users might face and their resolutions.

  • 6.2. FAQ

Address frequently asked questions.

7. Best Practices

  • 7.1. Performance Tips

Offer guidance on optimizing usage for better performance.

  • 7.2. Security Recommendations

Share advice on secure practices when using the library.

8. Contribution Guide

  • 8.1. Code of Conduct

Outline the behavior expected from contributors.

  • 8.2. Contribution Steps

Describe how one can contribute to the library, e.g., via pull requests.

9. Release Notes/Changelog

Track changes made in each version of the library.

10. Contact and Support

  • 10.1. Reporting Bugs

Provide a link or method for users to report issues.

  • 10.2. Getting Help

Point users to forums, support channels, or other resources.


Last update: 2023-11-21