Are training/ecommerce data tables available as CSVs?

Tim_T
New Contributor

The course "Apache Spark™ Programming with Databricks" requires data sources such as training/ecommerce/events/events.parquet. Are these available as CSV files? My company's Databricks configuration does not allow me to mount such repositories, but I can upload CSVs.

Kaniz
Community Manager

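Hi @Tim Tremper, the dataset you mentioned, "training/ecommerce/events/events.parquet", is stored in Parquet format, but you can convert it to CSV using Apache Spark™ on Databricks. Run the conversion in a workspace where the course dataset is accessible, then upload the resulting CSV to your company's environment.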

Here's a step-by-step guide to convert the Parquet dataset into a CSV file and download it locally:

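  • First, load the Parquet file into a DataFrame:
# spark is the SparkSession that Databricks notebooks provide; adjust the path if the course data lives elsewhere
parquet_df = spark.read.parquet("dbfs:/databricks-datasets/ecommerce/events/events.parquet")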

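  • Next, write the DataFrame as CSV to a temporary location in DBFS. Spark writes events.csv as a directory containing one part-*.csv file per partition; coalesce(1) reduces the output to a single part file:
# coalesce(1) funnels the write through one task, which is fine for a course-sized dataset
parquet_df.coalesce(1).write.csv("dbfs:/tmp/events.csv", mode="overwrite", header=True)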

  • Now, copy the CSV output from DBFS to the local file system of the driver node:

%fs cp -r dbfs:/tmp/events.csv file:/tmp/events.csv

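  • Finally, copy the CSV output from the driver node into /FileStore, the DBFS folder that Databricks serves for browser download (copying directly from dbfs:/tmp/events.csv to dbfs:/FileStore/events.csv also works):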

dbutils.fs.cp("file:/tmp/events.csv", "dbfs:/FileStore/events.csv", recurse=True)

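You can now download the CSV output from your browser. Because events.csv is a directory, point the browser at an individual part file inside it:

https://<your-databricks-instance>/files/events.csv/<part-file-name>

Replace <your-databricks-instance> with the URL of your Databricks workspace, and <part-file-name> with one of the part-*.csv names listed by dbutils.fs.ls("dbfs:/FileStore/events.csv").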
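Since every part file needs its own download URL, building them by hand gets tedious. The sketch below is a minimal illustration, assuming the workspace serves dbfs:/FileStore/<path> at https://<instance>/files/<path>. The helper name part_file_urls is hypothetical, and in a notebook its names argument could come from [f.name for f in dbutils.fs.ls("dbfs:/FileStore/events.csv")].

```python
from fnmatch import fnmatchcase

FILESTORE_PREFIX = "dbfs:/FileStore/"


def part_file_urls(instance, filestore_dir, names):
    """Return browser download URLs for the part-*.csv files in a /FileStore directory.

    instance:      workspace host name (placeholder, e.g. "<your-databricks-instance>")
    filestore_dir: DBFS directory under /FileStore, e.g. "dbfs:/FileStore/events.csv"
    names:         file names found in that directory (hypothetical input)
    """
    if not filestore_dir.startswith(FILESTORE_PREFIX):
        raise ValueError("only files under dbfs:/FileStore/ are served at /files/")
    # Assumed mapping: dbfs:/FileStore/<path> is served at https://<instance>/files/<path>
    rel_dir = filestore_dir[len(FILESTORE_PREFIX):].rstrip("/")
    return [
        f"https://{instance}/files/{rel_dir}/{name}"
        for name in sorted(names)
        # Keep data files only; skip marker files such as _SUCCESS or _committed_*
        if fnmatchcase(name, "part-*.csv")
    ]
```

Filtering on the part-*.csv pattern is a deliberate choice: Spark also leaves marker files in the output directory, and those are not useful downloads.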

Once you have the CSV file, you can upload it to your company's Databricks environment and use it as a data source for the "Apache Spark™ Programming with Databricks" course.
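If the output contains several part-*.csv files (Spark writes one per partition), you may prefer to merge them into a single file before uploading. The following is a plain-Python sketch that assumes the part files were downloaded into one local folder, were all written with header=True, and share the same header line. The function name merge_part_files is hypothetical. Data lines are copied verbatim rather than re-parsed, so Spark's own quoting and escaping survive unchanged.

```python
import glob
import os


def merge_part_files(part_dir, out_path):
    """Concatenate Spark part-*.csv files into one CSV that has a single header line.

    Returns the number of part files found. Raises FileNotFoundError if there are
    none, and ValueError if a part file's header differs from the first one.
    """
    part_paths = sorted(glob.glob(os.path.join(part_dir, "part-*.csv")))
    if not part_paths:
        raise FileNotFoundError(f"no part-*.csv files in {part_dir}")

    header = None
    ends_with_newline = True
    with open(out_path, "w", encoding="utf-8", newline="") as out:
        for path in part_paths:
            with open(path, encoding="utf-8", newline="") as src:
                first = src.readline()
                if not first:
                    continue  # empty part file: nothing to copy
                if header is None:
                    # Keep the header from the first non-empty part file only
                    header = first.rstrip("\r\n")
                    out.write(first)
                    ends_with_newline = first.endswith("\n")
                elif first.rstrip("\r\n") != header:
                    raise ValueError(f"header mismatch in {path}")
                for line in src:
                    if not ends_with_newline:
                        out.write("\n")  # keep records from running together
                    out.write(line)
                    ends_with_newline = line.endswith("\n")
    return len(part_paths)
```

For example, merge_part_files("events.csv", "events_merged.csv") would combine the downloaded parts into events_merged.csv, skipping the repeated header rows and marker files such as _SUCCESS.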

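Keep in mind that CSV is less capable than Parquet. The files are larger because they lack Parquet's columnar compression. Column types and other schema information are not stored, so values are read back as strings unless you pass inferSchema=True or an explicit schema to spark.read.csv. Nested columns such as structs and arrays cannot be written to CSV directly. For the course exercises these trade-offs should generally be acceptable, although any notebook cell that reads the data with spark.read.parquet will need to read the CSV instead.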
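To make the caveat about types and nested columns concrete, consider that CSV stores only text. Spark's CSV writer rejects struct and array columns outright, and in PySpark one common workaround is to serialize such columns with pyspark.sql.functions.to_json before writing. The plain-Python sketch below mimics that idea on a hypothetical record (the field names are illustrative, not taken from the course dataset). It JSON-encodes nested values on the way out and shows what has to be converted back on the way in.

```python
import csv
import io
import json


def to_csv_text(records, fieldnames):
    """Write dict records to CSV text, JSON-encoding any dict or list values."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    for rec in records:
        writer.writerow({
            k: json.dumps(v) if isinstance(v, (dict, list)) else v
            for k, v in rec.items()
        })
    return buf.getvalue()


def from_csv_text(text, converters):
    """Read CSV text back; every value stays a str unless a converter is given for its column."""
    return [
        {k: converters.get(k, str)(v) for k, v in row.items()}
        for row in csv.DictReader(io.StringIO(text))
    ]


# Hypothetical record: "geo" is struct-like, "items" is array-like
record = {"user_id": "u1", "revenue": 12.5, "geo": {"city": "Oslo"}, "items": [{"sku": "A1"}]}
fields = ["user_id", "revenue", "geo", "items"]

text = to_csv_text([record], fields)
raw = from_csv_text(text, {})  # no converters: raw[0]["revenue"] is the string "12.5"
restored = from_csv_text(text, {"revenue": float, "geo": json.loads, "items": json.loads})
```

Only the columns with an explicit converter regain their original types, which mirrors why the CSV copy needs inferSchema=True or a schema on the reading side.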
