
Running SQL queries against a parquet folder in S3

Shaimaa
New Contributor

I need to run SQL queries against a Parquet folder in S3. I am trying to use "read_files", but my queries sometimes fail with schema inference errors and sometimes fail with no stated reason.

Sample query:

SELECT
  SUM(CASE WHEN match_result.names IS NOT NULL AND ARRAY_SIZE(match_result.names) != 0 THEN 1 ELSE 0 END)
FROM read_files('s3://folder_path')

How can I enforce the schema successfully and run my query without errors?

1 REPLY

shan_chandra
Esteemed Contributor

@Shaimaa - You can split this into a nested query: an inner query that selects all the fields from S3 while enforcing an explicit schema, and an outer query built on top of it that does the aggregation. An example of the inner query is below (syntax not verified):

SELECT *
  FROM STREAM read_files(
      's3://bucket/path',
      format => 'parquet',
      schema => 'id int, ts timestamp, event string')

 
