‣ Presto
Overview¶
Presto is a distributed SQL query engine for big data analytics. Presto can query data from over 30 different data sources, including Cassandra, MongoDB, MySQL, PostgresSQL, and Redis. Common Presto use cases include:
- interactive data analytics,
- SQL-based analytics over object storage systems,
- data access and analytics across multiple data sources with query federation,
- batch ETL processing across disparate systems.
In this tutorial, we show how to use Presto to explore and query data in Astra DB with SQL. The overall architecture of this solution is depicted below. Presto CLI Client sends SQL queries to Presto Server. Presto Server retrieves data from Astra DB via CQL Proxy, computes the query results and returns them to the client.
Prerequisites¶
Setup Astra DB¶
✅ 1. Sign in
Connect to your Astra account and create a new Astra database or select an existing one. Add a new keyspace with name banking_db
or use an existing one.
✅ 2. Create the following tables using the CQL Console
CREATE TABLE customer (
id UUID,
name TEXT,
email TEXT,
PRIMARY KEY (id)
);
CREATE TABLE accounts_by_customer (
customer_id UUID,
account_number TEXT,
account_type TEXT,
account_balance DECIMAL,
customer_name TEXT STATIC,
PRIMARY KEY ((customer_id), account_number)
);
✅ 3. Insert the rows using the CQL Console
INSERT INTO customer (id,name,email) VALUES (8d6c1271-16b6-479d-8ea9-546c37381ab3,'Alice','alice@example.org');
INSERT INTO customer (id,name,email) VALUES (0e5d9e8c-2e3b-4576-8515-58b491cb859e,'Bob','bob@example.org');
INSERT INTO accounts_by_customer (customer_id,account_number,account_type,account_balance,customer_name)
VALUES (8d6c1271-16b6-479d-8ea9-546c37381ab3,'A-101','Checking',100.01,'Alice');
INSERT INTO accounts_by_customer (customer_id,account_number,account_type,account_balance,customer_name)
VALUES (8d6c1271-16b6-479d-8ea9-546c37381ab3,'A-102','Savings',200.02,'Alice');
INSERT INTO accounts_by_customer (customer_id,account_number,account_type,account_balance,customer_name)
VALUES (0e5d9e8c-2e3b-4576-8515-58b491cb859e,'B-101','Checking',300.03,'Bob');
INSERT INTO accounts_by_customer (customer_id,account_number,account_type,account_balance,customer_name)
VALUES (0e5d9e8c-2e3b-4576-8515-58b491cb859e,'B-102','Savings',400.04,'Bob');
Deploy CQL Proxy¶
✅ 4. Installation
Follow the instructions to deploy a CQL Proxy as close to a Presto Server as possible, preferrably deploying both components on the same server. The simplest way to start cql-proxy
is to use an <astra-token>
and <astra-database-id>
:
An example command with a sample, invalid token and database id:
./cql-proxy \
--astra-token AstraCS:NoBhcuwCrIhZxqzjEMCSuGos:8a85142b47a588472a1f3b1314e2141f098785895411dee9db11f2a7ade457ce \
--astra-database-id e5e4e925-289a-8231-83fd-25918093257b
Setup Presto Server¶
✅ 5. Presto intallation
Follow the instructions to download, install and configure a Presto Server or use an existing deployment. The minimal configuration requirements for a local single-machine deployment are:
- Node properties in file
etc/node.properties
node.environment=production
node.id=ffffffff-ffff-ffff-ffff-ffffffffffff
node.data-dir=/var/presto/data
- JVM config in file
etc/jvm.config
-server
-Xmx16G
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+UseGCOverheadLimit
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:+ExitOnOutOfMemoryError
-Djdk.attach.allowAttachSelf=true
- Config properties in file
etc/config.properties
coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=8080
query.max-memory=5GB
query.max-memory-per-node=1GB
query.max-total-memory-per-node=2GB
discovery-server.enabled=true
discovery.uri=http://localhost:8080
- Catalog properties in file
etc/catalog/cassandra.properties
The above configuration uses the Cassandra connector to interact with cql-proxy
.
✅ 6. Start the Presto Server:
Wait for message ======== SERVER STARTED ========
to confirm a successful start.
SQL Queries with Presto Client¶
In this section you will execute SQL Queries against Astra DB using Presto CLI Client.
✅ 7. Install Presto Client
Follow the instructions to download and install a CLI Presto Client.
✅ 8. Start the CLI Presto Client:
The server
option specifies the HTTP(S) address and port of the Presto coordinator, and the catalog
option sets the default catalog.
✅ 9. Execute the SQL query to find the total number of customers:
Output:
✅ 10. Execute the SQL query to find emails of customers with account balances of 300.00
or higher:
SELECT DISTINCT email AS customer_email
FROM banking_db.customer
INNER JOIN banking_db.accounts_by_customer
ON (id = customer_id)
WHERE account_balance >= 300.00;
Output:
✅ 11. Execute the SQL query to find customers and sums of their account balances:
SELECT id AS customer_id,
name AS customer_name,
email AS customer_email,
SUM ( CAST (
COALESCE(account_balance,0) AS DECIMAL(12,2)
) ) AS customer_funds
FROM banking_db.customer
LEFT OUTER JOIN banking_db.accounts_by_customer
ON (id = customer_id)
GROUP BY id, name, email;
Output:
customer_id | customer_name | customer_email | customer_funds
--------------------------------------+---------------+-------------------+----------------
8d6c1271-16b6-479d-8ea9-546c37381ab3 | Alice | alice@example.org | 300.03
0e5d9e8c-2e3b-4576-8515-58b491cb859e | Bob | bob@example.org | 700.07
(2 rows)