Data Engineering

Are training/ecommerce data tables available as CSVs?

Tim_T
New Contributor

The course "Apache Sparkโ„ข Programming with Databricks" requires data sources such as training/ecommerce/events/events.parquet. Are these available as CSV files? My company's databricks configuration does not allow me to mount to such repositories, but I can upload CSVs.

1 REPLY

Kaniz_Fatma
Community Manager

Hi @Tim Tremper, the specific dataset you mentioned, "training/ecommerce/events/events.parquet", is in Parquet format, but you can easily convert it to CSV using Apache Spark™ on Databricks.

Here's a step-by-step guide to convert the Parquet dataset into a CSV file and download it locally:

  • First, load the Parquet file into a DataFrame:
parquet_df = spark.read.parquet("dbfs:/databricks-datasets/ecommerce/events/events.parquet")

  • Next, save the DataFrame as a temporary CSV file in your DBFS:
parquet_df.write.csv("dbfs:/tmp/events.csv", mode="overwrite", header=True)

  • Now, copy the CSV file from DBFS to the local file system of the driver node:

%fs cp -r dbfs:/tmp/events.csv file:/tmp/events.csv

  • Finally, copy the CSV file from the driver node back to the FileStore so it can be downloaded through your browser:

dbutils.fs.cp("file:/tmp/events.csv", "dbfs:/FileStore/events.csv", recurse=True)

You can now download the CSV file from your browser by navigating to:

https://<your-databricks-instance>/files/events.csv

Replace <your-databricks-instance> with the URL of your Databricks workspace.
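One caveat: DataFrame.write.csv() writes a directory of part files named events.csv rather than a single file, so the /files/events.csv link may not resolve directly. A minimal sketch of one way around this (assuming the parquet_df from the first step; the dbfs:/tmp/events_csv path is just an illustrative location) is to coalesce to a single partition and then copy the lone part file under a plain file name:

# Write a single part file by coalescing the DataFrame to one partition first
parquet_df.coalesce(1).write.csv("dbfs:/tmp/events_csv", mode="overwrite", header=True)

# Locate the single part file inside the output directory and copy it under a plain name
part_file = [f.path for f in dbutils.fs.ls("dbfs:/tmp/events_csv") if f.name.endswith(".csv")][0]
dbutils.fs.cp(part_file, "dbfs:/FileStore/events.csv")

For a small training dataset this is fine; for large data, coalescing to one partition funnels all the work through a single task, so it should be used with care.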

Once you have the CSV file, you can upload it to your company's Databricks environment and use it as a data source for the "Apache Spark™ Programming with Databricks" course.

Remember that converting the Parquet dataset to CSV may increase the file size and loses Parquet features such as the embedded schema and compression. However, it should be sufficient for the purposes of the course.
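If losing the embedded schema is a concern, one workaround (a sketch, not part of the steps above; the dbfs:/FileStore/tables/events.csv upload path is only an assumed example) is to export the Parquet DataFrame's schema as JSON and reapply it when reading the uploaded CSV, instead of relying on type inference:

import json
from pyspark.sql.types import StructType

# In the workspace where the Parquet file is readable:
# serialize the schema so it can travel alongside the CSV
schema_json = parquet_df.schema.json()

# In the company workspace, after uploading the CSV and pasting in the JSON string:
events_schema = StructType.fromJson(json.loads(schema_json))
events_df = spark.read.csv(
    "dbfs:/FileStore/tables/events.csv",  # assumed upload location; adjust to wherever the CSV lands
    header=True,
    schema=events_schema,
)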
