cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
cancel
Showing results for 
Search instead for 
Did you mean: 

Data Engineering - CTAS - External Tables - Limitations of CTAS for external tables - can or cannot use options and location

Kopal
New Contributor II

Data Engineering - CTAS - External Tables

  1. Can someone help me understand why In chapter 3.3, we cannot not directly use CTAS with OPTIONS and LOCATION to specify delimiter and location of CSV?
  2. Or I misunderstood?

Details:

CREATE TABLE IF NOT EXISTS sales_csv
  (order_id LONG, email STRING, transactions_timestamp LONG, total_item_quantity INTEGER, purchase_revenue_in_usd DOUBLE, unique_items INTEGER, items STRING)
USING CSV
OPTIONS (
  header = "true",
  delimiter = "|"
)
LOCATION "${DA.paths.sales_csv}"
  • Video DE 3.3 - Set up Delta Tables
    • It is said that CTAS has limitations to support additional file OPTIONS. The example is to show how we could not define a pipe delimiter to correctly ingest data.
    • But this example opposes what was told in Chapter 2.2.
CREATE OR REPLACE TABLE sales_unparsed AS
SELECT * FROM csv.`${da.paths.datasets}/ecommerce/raw/sales-csv`;

Please help me understand.

1 ACCEPTED SOLUTION

Accepted Solutions

Anonymous
Not applicable

The 2nd statement CTAS will not be able to parse the csv in any manner because it's just the from statement that points to a file. It's more of a traditional SQL statement with select and from. It will create a Delta Table. This just happens to be the way the syntax works with AS not allowing for options.

View solution in original post

3 REPLIES 3

Anonymous
Not applicable

The 2nd statement CTAS will not be able to parse the csv in any manner because it's just the from statement that points to a file. It's more of a traditional SQL statement with select and from. It will create a Delta Table. This just happens to be the way the syntax works with AS not allowing for options.

Kopal
New Contributor II

Ahh the devil was in the details. The statements differ in USING CSV -> external table and CTAS + SELECT FROM-> managed table. I hope I got it right.

Thanks a lot for explanation. Cheers!

Anonymous
Not applicable

Glad I could help

Welcome to Databricks Community: Lets learn, network and celebrate together

Join our fast-growing data practitioner and expert community of 80K+ members, ready to discover, help and collaborate together while making meaningful connections. 

Click here to register and join today! 

Engage in exciting technical discussions, join a group with your peers and meet our Featured Members.