Creating external tables using gzipped CSV file - S3 URI without extensions

AdityaM
New Contributor II

Hi Databricks community,

Hope you are doing well.
I am trying to create an external table using a Gzipped CSV file uploaded to an S3 bucket.
The S3 URI of the resource doesn't have a file extension, but the content is a Gzipped, comma-separated file that I want to read into the external table.

The command I'm using is:

CREATE EXTERNAL TABLE `mycatalog`.`myExternalTable`(
    `ID` STRING,
    `value` STRING
)
USING CSV
OPTIONS (
    PATH 's3://mybucket/filename',
    HEADER 'false',
    encoding 'UTF-8',
    compression 'gzip',
    delimiter ','
);

If I try to create the table using that exact same file, in the same bucket, with the .gz extension, it works.
But without that extension, it gives me weird, jumbled output (when doing a SELECT * on the table), indicating that decompression is not happening properly.
Is there a way to create the table without adding any extensions to the S3 file path?

Thanks for your time,
Aditya

2 REPLIES

Kaniz
Community Manager

Hi @AdityaM, it seems you're encountering an issue creating an external table from a Gzipped CSV file in Databricks when the S3 URI has no file extension.

Let’s address this step by step.

  1. SerDe (Serializer/Deserializer):

    • When creating an external table, you can specify a SerDe to handle the data format. In your case, you’ve used the USING CSV clause, which implies that Databricks will use the default CSV SerDe.
    • However, for Gzipped CSV files, you might need to use a different SerDe that understands both CSV and Gzip compression.
  2. Alternative Approach:

    • Instead of relying on the default CSV SerDe, consider using the org.apache.hadoop.hive.serde2.OpenCSVSerde SerDe. This SerDe is specifically designed for handling CSV files.
    • Here’s how you can modify your CREATE EXTERNAL TABLE statement:
    CREATE EXTERNAL TABLE `mycatalog`.`myExternalTable` (
        `ID` STRING,
        `value` STRING
    )
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
    STORED AS TEXTFILE
    LOCATION 's3://mybucket/filename'
    -- Only keep this property if the file actually has a header row; your sample uses HEADER 'false'.
    TBLPROPERTIES ('skip.header.line.count'='1');
    
    • The key change here is specifying the org.apache.hadoop.hive.serde2.OpenCSVSerde as the SerDe.
  3. Handling Date Columns:

    • If your CSV file contains date columns, ensure that the date format in the file matches the expected format (e.g., YYYY-MM-DD); see the first sketch after this list.
    • You can adjust the column data types accordingly.
  4. File Extensions:

    • Regarding the file extension, it’s generally recommended to include the .gz extension in the S3 file path when dealing with Gzipped files. However, if you want to omit the extension, you can do so.
    • Make sure that the file content is indeed Gzipped, even if the extension is missing; a quick byte-level check is sketched at the end of this reply.
  5. Testing:

    • After modifying your table definition, try creating the external table again without the .gz extension.
    • Run a SELECT * FROM mycatalog.myExternalTable query to verify that the data is correctly decompressed and loaded.
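
To illustrate point 3, here is a minimal PySpark sketch of reading the file with an explicit schema and a matching dateFormat option. It assumes a Databricks notebook where spark is already defined, and it uses a hypothetical event_date column purely for illustration; adjust the schema and format to match your actual file.

# Minimal sketch, assuming a notebook with `spark` defined; `event_date` is a
# hypothetical column used only to illustrate date parsing.
df = (
    spark.read
        .schema("ID STRING, event_date DATE")    # declare the date column explicitly
        .option("header", "false")
        .option("delimiter", ",")
        .option("dateFormat", "yyyy-MM-dd")      # must match how dates are written in the file
        .csv("s3://mybucket/filename.gz")        # a .gz extension lets Spark pick the right codec
)
df.show(5)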

Remember to adjust the column data types and other settings as needed for your specific use case. If you encounter any further issues, feel free to ask for additional assistance! 😊
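
On point 4, a quick way to confirm that the extension-less object really is Gzip-compressed is to check its first two bytes for the gzip magic number (0x1f 0x8b). Below is a minimal Python sketch using boto3; the bucket and key are taken from the question, and it assumes boto3 and AWS credentials are already set up.

import boto3

# Minimal sketch, assuming boto3 and AWS credentials are configured;
# the bucket and key come from the question and may need adjusting.
s3 = boto3.client("s3")
obj = s3.get_object(Bucket="mybucket", Key="filename")
first_two = obj["Body"].read(2)

# Every gzip stream starts with the magic bytes 0x1f 0x8b.
if first_two == b"\x1f\x8b":
    print("Object looks Gzip-compressed")
else:
    print("Object does not look Gzip-compressed:", first_two)

If the bytes do match but SELECT * still returns garbled rows, the most likely explanation is that Spark chooses the decompression codec from the file extension when reading, which would also explain why the same file works once it is named with .gz.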


AdityaM
New Contributor II

Hey @Kaniz, thanks for your response.

I tried using a SerDe (I think the OpenCSVSerde should work for me), but unfortunately I'm getting the error below from Unity Catalog:

[UC_DATASOURCE_NOT_SUPPORTED] Data source format hive is not supported in Unity Catalog. SQLSTATE: 0AKUC

Can you please suggest any other workarounds for the above?

Thanks

@Kaniz