Skip to content

‣ Presto

Overview

Presto is a distributed SQL query engine for big data analytics. Presto can query data from over 30 different data sources, including Cassandra, MongoDB, MySQL, PostgresSQL, and Redis. Common Presto use cases include:

  • interactive data analytics,
  • SQL-based analytics over object storage systems,
  • data access and analytics across multiple data sources with query federation,
  • batch ETL processing across disparate systems.

In this tutorial, we show how to use Presto to explore and query data in Astra DB with SQL. The overall architecture of this solution is depicted below. Presto CLI Client sends SQL queries to Presto Server. Presto Server retrieves data from Astra DB via CQL Proxy, computes the query results and returns them to the client.

Prerequisites

Setup Astra DB

1. Sign in

Connect to your Astra account and create a new Astra database or select an existing one. Add a new keyspace with name banking_db or use an existing one.

2. Create the following tables using the CQL Console

USE banking_db;
CREATE TABLE customer (
    id UUID,
    name TEXT,
    email TEXT,
    PRIMARY KEY (id)
);

CREATE TABLE accounts_by_customer (
  customer_id UUID,
  account_number TEXT,
  account_type TEXT,
  account_balance DECIMAL,
  customer_name TEXT STATIC,
  PRIMARY KEY ((customer_id), account_number)
);

3. Insert the rows using the CQL Console

INSERT INTO customer (id,name,email) VALUES (8d6c1271-16b6-479d-8ea9-546c37381ab3,'Alice','alice@example.org');
INSERT INTO customer (id,name,email) VALUES (0e5d9e8c-2e3b-4576-8515-58b491cb859e,'Bob','bob@example.org');


INSERT INTO accounts_by_customer (customer_id,account_number,account_type,account_balance,customer_name)
VALUES (8d6c1271-16b6-479d-8ea9-546c37381ab3,'A-101','Checking',100.01,'Alice');
INSERT INTO accounts_by_customer (customer_id,account_number,account_type,account_balance,customer_name)
VALUES (8d6c1271-16b6-479d-8ea9-546c37381ab3,'A-102','Savings',200.02,'Alice');
INSERT INTO accounts_by_customer (customer_id,account_number,account_type,account_balance,customer_name)
VALUES (0e5d9e8c-2e3b-4576-8515-58b491cb859e,'B-101','Checking',300.03,'Bob');
INSERT INTO accounts_by_customer (customer_id,account_number,account_type,account_balance,customer_name)
VALUES (0e5d9e8c-2e3b-4576-8515-58b491cb859e,'B-102','Savings',400.04,'Bob');

Deploy CQL Proxy

4. Installation

Follow the instructions to deploy a CQL Proxy as close to a Presto Server as possible, preferrably deploying both components on the same server. The simplest way to start cql-proxy is to use an <astra-token> and <astra-database-id>:

./cql-proxy \
--astra-token <astra-token> \
--astra-database-id <astra-database-id>

An example command with a sample, invalid token and database id:

./cql-proxy \
--astra-token AstraCS:NoBhcuwCrIhZxqzjEMCSuGos:8a85142b47a588472a1f3b1314e2141f098785895411dee9db11f2a7ade457ce \
--astra-database-id e5e4e925-289a-8231-83fd-25918093257b

Setup Presto Server

5. Presto intallation

Follow the instructions to download, install and configure a Presto Server or use an existing deployment. The minimal configuration requirements for a local single-machine deployment are:

  • Node properties in file etc/node.properties
node.environment=production
node.id=ffffffff-ffff-ffff-ffff-ffffffffffff
node.data-dir=/var/presto/data
  • JVM config in file etc/jvm.config
-server
-Xmx16G
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+UseGCOverheadLimit
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:+ExitOnOutOfMemoryError
-Djdk.attach.allowAttachSelf=true
  • Config properties in file etc/config.properties
coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=8080
query.max-memory=5GB
query.max-memory-per-node=1GB
query.max-total-memory-per-node=2GB
discovery-server.enabled=true
discovery.uri=http://localhost:8080
  • Catalog properties in file etc/catalog/cassandra.properties
connector.name=cassandra
cassandra.contact-points=localhost
cassandra.consistency-level=QUORUM

The above configuration uses the Cassandra connector to interact with cql-proxy.

6. Start the Presto Server:

bin/launcher run

Wait for message ======== SERVER STARTED ======== to confirm a successful start.

SQL Queries with Presto Client

In this section you will execute SQL Queries against Astra DB using Presto CLI Client.

7. Install Presto Client

Follow the instructions to download and install a CLI Presto Client.

8. Start the CLI Presto Client:

./presto --server http://localhost:8080 --catalog cassandra

The server option specifies the HTTP(S) address and port of the Presto coordinator, and the catalog option sets the default catalog.

9. Execute the SQL query to find the total number of customers:

SELECT COUNT(*) AS customer_count
FROM banking_db.customer;

Output:

 customer_count
----------------
              2
(1 row)

10. Execute the SQL query to find emails of customers with account balances of 300.00 or higher:

SELECT DISTINCT email AS customer_email
FROM banking_db.customer
     INNER JOIN banking_db.accounts_by_customer
     ON (id = customer_id)
WHERE account_balance >= 300.00;

Output:

 customer_email
-----------------
 bob@example.org
(1 row)

11. Execute the SQL query to find customers and sums of their account balances:

SELECT id AS customer_id,
       name AS customer_name,
       email AS customer_email,
       SUM ( CAST (
              COALESCE(account_balance,0) AS DECIMAL(12,2)
                ) ) AS customer_funds
FROM banking_db.customer
     LEFT OUTER JOIN banking_db.accounts_by_customer
     ON (id = customer_id)
GROUP BY id, name, email;

Output:

             customer_id              | customer_name |  customer_email   | customer_funds
--------------------------------------+---------------+-------------------+----------------
 8d6c1271-16b6-479d-8ea9-546c37381ab3 | Alice         | alice@example.org | 300.03
 0e5d9e8c-2e3b-4576-8515-58b491cb859e | Bob           | bob@example.org   | 700.07
(2 rows)

Last update: 2023-10-13