📖 Reference documentation and resources
A - Overview
📘 What is DSBulk?
DataStax Bulk Loader (DSBulk) is a unified tool for loading data into and unloading data from Cassandra-compatible databases, such as open-source Apache Cassandra®, DataStax Astra DB, and DataStax Enterprise (DSE).
Out of the box, DSBulk provides the ability to:
- Load (import) large amounts of data into the database efficiently and reliably;
- Unload (export) large amounts of data from the database efficiently and reliably;
- Count elements in a database table: how many rows in total, how many rows per replica and per token range, and how many rows in the top N largest partitions.
CSV and JSON formats are currently supported for both loading and unloading data.
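Each of the three operations above is a dsbulk subcommand (load, unload, count), and the -c option selects the csv or json connector. The following Python sketch only builds the argument list and runs nothing; the helper name dsbulk_command and the example keyspace and table names are hypothetical.

```python
import shlex
from typing import List

# The three operations listed above, each a dsbulk subcommand.
OPERATIONS = ("load", "unload", "count")
# Connectors for the two supported file formats.
CONNECTORS = ("csv", "json")


def dsbulk_command(operation: str, keyspace: str, table: str,
                   connector: str = "csv") -> List[str]:
    """Build (but do not run) a dsbulk argument list for one operation."""
    if operation not in OPERATIONS:
        raise ValueError(f"unsupported operation: {operation}")
    if connector not in CONNECTORS:
        raise ValueError(f"unsupported connector: {connector}")
    args = ["dsbulk", operation, "-k", keyspace, "-t", table]
    # count reads the table directly, so it needs no file connector.
    if operation != "count":
        args += ["-c", connector]
    return args


if __name__ == "__main__":
    for op in OPERATIONS:
        # Hypothetical keyspace and table names.
        print(shlex.join(dsbulk_command(op, "my_keyspace", "my_table")))
```

Connection options (for Astra, the secure connect bundle and credentials) still have to be appended before such a command can run.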
📘 DataStax Bulk Loader with Astra
Use DataStax Bulk Loader (dsbulk) to load and unload data in CSV or JSON format with your DataStax Astra DB database efficiently and reliably.
You can use dsbulk as a standalone tool that connects remotely to a cluster. It does not need to run locally on a database instance, although it can be used in that configuration.
B - Prerequisites
- You should have an Astra account
- You should create an Astra database
- You should have an Astra token
- You should download your secure connect bundle
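With these prerequisites in place, a dsbulk command reaches Astra through the secure connect bundle (-b) plus a username and password (-u, -p). The Python sketch below assembles those options. The helper name and bundle file name are hypothetical, and treating the username and password as the Client ID and Client Secret generated with your Astra token is an assumption; check the Astra documentation for the credentials your token provides.

```python
import shlex
from pathlib import Path
from typing import List


def astra_connection_args(bundle: Path, username: str, password: str) -> List[str]:
    """Return the dsbulk options that point it at an Astra DB database.

    -b  secure connect bundle (.zip) downloaded from Astra
    -u  username (assumed here: Client ID from the Astra token)
    -p  password (assumed here: Client Secret from the Astra token)
    """
    if bundle.suffix != ".zip":
        raise ValueError(f"expected a .zip secure connect bundle, got {bundle}")
    return ["-b", str(bundle), "-u", username, "-p", password]


if __name__ == "__main__":
    # Hypothetical bundle path and placeholder credentials.
    opts = astra_connection_args(Path("secure-connect-mydb.zip"),
                                 "<client_id>", "<client_secret>")
    print(shlex.join(["dsbulk", "count", "-k", "my_keyspace", "-t", "my_table", *opts]))
```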
This article was written for DataStax Bulk Loader version
Starting with version
dsbulk can detect and respect server-side rate limiting. This is useful when working with Astra DB, which has throughput guardrails in place by default.
C - Installation
✅ Step 1: Download the archive and unzip it locally
It takes a few seconds (the file is about 30 MB).
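The download-and-extract step can also be scripted with the Python standard library, as in the sketch below. The archive URL is an assumption (DataStax also publishes versioned archives, so confirm the current file name on the downloads page), and the helper names are hypothetical. The archive is a .tar.gz, so it is unpacked with tarfile rather than a zip tool.

```python
import tarfile
import urllib.request
from pathlib import Path

# Assumed download location; confirm it on the DataStax downloads page.
DSBULK_URL = "https://downloads.datastax.com/dsbulk/dsbulk.tar.gz"


def download(url: str, dest: Path) -> Path:
    """Fetch the archive to dest (about 30 MB, so this takes a few seconds)."""
    urllib.request.urlretrieve(url, dest)
    return dest


def extract_dsbulk(archive: Path, target_dir: Path) -> Path:
    """Unpack the archive into target_dir and return the bin/dsbulk launcher."""
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(target_dir)
    # The archive unpacks into a single versioned directory, e.g. dsbulk-<version>/.
    launchers = sorted(target_dir.glob("*/bin/dsbulk"))
    if not launchers:
        raise FileNotFoundError(f"no */bin/dsbulk under {target_dir}")
    return launchers[0]
```

For example, extract_dsbulk(download(DSBULK_URL, Path("dsbulk.tar.gz")), Path(".")) fetches the archive and unpacks it in the current directory, returning the path of the launcher to add to your PATH.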
D - Usage
📘 Load Data
- Given a table:
- A sample CSV file could be:
- Load it with the following command:
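The three load steps can be illustrated with the following Python sketch, which writes a small CSV file with a header row and builds a matching dsbulk load argument list. The keyspace demo_ks, the table users with its id, name, and email columns, and the helper names are all hypothetical. With -header true, dsbulk maps each CSV column to the table column of the same name.

```python
import csv
from pathlib import Path
from typing import Dict, List

# Hypothetical table, assumed to exist already, e.g.
# CREATE TABLE demo_ks.users (id int PRIMARY KEY, name text, email text);
KEYSPACE, TABLE = "demo_ks", "users"
COLUMNS = ["id", "name", "email"]
ROWS: List[Dict[str, str]] = [
    {"id": "1", "name": "alice", "email": "alice@example.com"},
    {"id": "2", "name": "bob", "email": "bob@example.com"},
]


def write_sample_csv(path: Path) -> Path:
    """Write ROWS to path, starting with a header row of column names."""
    with path.open("w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(ROWS)
    return path


def load_command(csv_path: Path, bundle: Path, user: str, password: str) -> List[str]:
    """dsbulk load arguments: read csv_path into KEYSPACE.TABLE on Astra."""
    return [
        "dsbulk", "load",
        "-url", str(csv_path),
        "-k", KEYSPACE, "-t", TABLE,
        "-header", "true",  # the first line holds column names
        "-b", str(bundle), "-u", user, "-p", password,
    ]
```

For example, load_command(write_sample_csv(Path("users.csv")), Path("secure-connect-mydb.zip"), "<client_id>", "<client_secret>") yields the argument list to pass to subprocess.run once dsbulk is on your PATH.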
📘 Export Data
- Unload the same table with the following command:
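For an export, -url names an output directory, into which dsbulk writes one or more CSV files, each starting with a header row by default. The Python sketch below builds the unload arguments; the helper names and paths are hypothetical, and count_unloaded_rows is only an illustrative local sanity check on the exported files, not a dsbulk feature.

```python
import csv
from pathlib import Path
from typing import List


def unload_command(out_dir: Path, keyspace: str, table: str,
                   bundle: Path, user: str, password: str) -> List[str]:
    """dsbulk unload arguments: export keyspace.table as CSV files into out_dir."""
    return [
        "dsbulk", "unload",
        "-k", keyspace, "-t", table,
        "-url", str(out_dir),  # a directory, not a single file
        "-b", str(bundle), "-u", user, "-p", password,
    ]


def count_unloaded_rows(out_dir: Path) -> int:
    """Count data rows across the exported CSV files, skipping each header row."""
    total = 0
    for path in sorted(out_dir.glob("*.csv")):
        with path.open(newline="") as fh:
            reader = csv.reader(fh)
            next(reader, None)  # skip the header row
            total += sum(1 for _ in reader)
    return total
```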
📘 Count Table Records
- Count the rows in the table with the following command:
- The command produces the following output:
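The statistics listed in the overview (total rows, rows per replica and per token range, largest partitions) correspond to the --stats.modes values global, hosts, ranges, and partitions, with --stats.numPartitions setting N for the largest-partitions report. The Python sketch below assembles such a command. The helper name is hypothetical, the Astra connection options (-b, -u, -p) would still be appended, and the option names are worth confirming against the dsbulk count documentation for your version.

```python
import shlex
from typing import List, Optional, Sequence

# Values accepted by --stats.modes.
STATS_MODES = ("global", "hosts", "ranges", "partitions")


def count_command(keyspace: str, table: str,
                  modes: Sequence[str] = ("global",),
                  top_partitions: Optional[int] = None) -> List[str]:
    """Build dsbulk count arguments for the requested statistics."""
    unknown = [m for m in modes if m not in STATS_MODES]
    if unknown:
        raise ValueError(f"unknown stats modes: {unknown}")
    args = ["dsbulk", "count", "-k", keyspace, "-t", table,
            "--stats.modes", ",".join(modes)]
    if top_partitions is not None:
        if "partitions" not in modes:
            raise ValueError("top_partitions only applies to the partitions mode")
        args += ["--stats.numPartitions", str(top_partitions)]
    return args


if __name__ == "__main__":
    # Total rows plus the 5 largest partitions of a hypothetical table.
    print(shlex.join(count_command("demo_ks", "users", ("global", "partitions"), 5)))
```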