cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

Data Engineering - CTAS - External Tables - Limitations of CTAS for external tables - can or cannot use options and location

Kopal
New Contributor II

Data Engineering - CTAS - External Tables

  1. Can someone help me understand why In chapter 3.3, we cannot not directly use CTAS with OPTIONS and LOCATION to specify delimiter and location of CSV?
  2. Or I misunderstood?

Details:

CREATE TABLE IF NOT EXISTS sales_csv
  (order_id LONG, email STRING, transactions_timestamp LONG, total_item_quantity INTEGER, purchase_revenue_in_usd DOUBLE, unique_items INTEGER, items STRING)
USING CSV
OPTIONS (
  header = "true",
  delimiter = "|"
)
LOCATION "${DA.paths.sales_csv}"
  • Video DE 3.3 - Set up Delta Tables
    • It is said that CTAS has limitations to support additional file OPTIONS. The example is to show how we could not define a pipe delimiter to correctly ingest data.
    • But this example opposes what was told in Chapter 2.2.
CREATE OR REPLACE TABLE sales_unparsed AS
SELECT * FROM csv.`${da.paths.datasets}/ecommerce/raw/sales-csv`;

Please help me understand.

1 ACCEPTED SOLUTION

Accepted Solutions

Anonymous
Not applicable

The 2nd statement CTAS will not be able to parse the csv in any manner because it's just the from statement that points to a file. It's more of a traditional SQL statement with select and from. It will create a Delta Table. This just happens to be the way the syntax works with AS not allowing for options.

View solution in original post

3 REPLIES 3

Anonymous
Not applicable

The 2nd statement CTAS will not be able to parse the csv in any manner because it's just the from statement that points to a file. It's more of a traditional SQL statement with select and from. It will create a Delta Table. This just happens to be the way the syntax works with AS not allowing for options.

Kopal
New Contributor II

Ahh the devil was in the details. The statements differ in USING CSV -> external table and CTAS + SELECT FROM-> managed table. I hope I got it right.

Thanks a lot for explanation. Cheers!

Anonymous
Not applicable

Glad I could help

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group