Welcome to the Databricks community! Let's address the issue you're facing with importing data in your notebook.
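The example notebook you're trying to run relies on a dataset located at /dbfs/databricks-datasets/wine-quality/winequality-white.csv. This dataset contains information about white wine quality and is commonly used for regression or classification modeling. Note that the /dbfs/ prefix is the local-file (FUSE) view of the DBFS path /databricks-datasets/wine-quality/winequality-white.csv. pandas and other local file APIs use that view, while Spark reads the same file without the prefix. On Community Edition the local /dbfs view may not be available, which can make reads through it fail.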
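One detail worth checking before reading this file: the UCI wine-quality CSVs, including this copy, separate fields with semicolons rather than commas. The minimal sketch below uses Python's standard csv module on an abbreviated, hypothetical header line (only three of the file's column names) to show why the delimiter matters. Parsed with a comma, the whole line collapses into a single field.

```python
import csv
import io

# Abbreviated header in the style of winequality-white.csv.
# Assumption: only three of the real column names are shown here.
SAMPLE_HEADER = '"fixed acidity";"volatile acidity";"quality"\n'


def parse_first_row(text: str, delimiter: str) -> list[str]:
    """Parse the first CSV row of `text` using the given delimiter."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return next(reader)


# With a comma delimiter the semicolons are never split on,
# so the whole header comes back as one field.
print(len(parse_first_row(SAMPLE_HEADER, ",")))  # 1

# With ";" each column name becomes its own field.
print(parse_first_row(SAMPLE_HEADER, ";"))
# ['fixed acidity', 'volatile acidity', 'quality']
```

In both pandas (read_csv) and Spark (spark.read.csv) the corresponding option is sep=";".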
Here are a few steps you can take to resolve the issue:
- Mount the Data: If you're using Databricks Community Edition, you might need to mount the dataset to make it accessible within your notebook. To do this, follow these steps:
  - Click the "Data" tab in the left sidebar.
  - Click "Add Data" and select "DBFS" as the source.
  - Enter the path /databricks-datasets/wine-quality/winequality-white.csv.
  - Choose a mount point (e.g., /mnt/wine-quality).
  - Click "Create Table" to create a table associated with the mounted data.
- Read Data Using Spark: Instead of reading the CSV through the local /dbfs path, let Spark read it directly from DBFS. The Spark path has no /dbfs prefix, and because this file is semicolon-delimited, pass sep=';':
df = spark.read.csv('/databricks-datasets/wine-quality/winequality-white.csv', header=True, inferSchema=True, sep=';')
- Convert to a pandas DataFrame (optional): If you prefer working with pandas, you can convert the Spark DataFrame. toPandas() collects every row to the driver, which is fine for a small file like this one:
df_pandas = df.toPandas()
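A common source of this kind of error is mixing up the two ways of addressing the same DBFS file: Spark readers take the DBFS form (/databricks-datasets/... or dbfs:/databricks-datasets/...), while local file APIs such as pandas or open() use the /dbfs/... FUSE form. The sketch below is a hypothetical helper, not part of any Databricks API, that converts between the two forms so one path constant can be reused with either reader. It only rewrites strings and does not check that the file exists.

```python
# Hypothetical helper, not a Databricks API: it only rewrites path strings
# and does not check whether the file exists.
FUSE_PREFIX = "/dbfs/"
SCHEME = "dbfs:"


def to_spark_path(path: str) -> str:
    """Return the form Spark readers expect, e.g. 'dbfs:/databricks-datasets/...'."""
    if path.startswith(FUSE_PREFIX):
        # '/dbfs/<rest>' -> 'dbfs:/<rest>'
        return SCHEME + path[len("/dbfs"):]
    # Already 'dbfs:/...' or a plain DBFS path such as '/databricks-datasets/...'.
    return path


def to_local_path(path: str) -> str:
    """Return the /dbfs FUSE form used by pandas, open(), and similar APIs."""
    if path.startswith(SCHEME):
        path = path[len(SCHEME):]  # 'dbfs:/x' -> '/x'
    if path.startswith(FUSE_PREFIX):
        return path  # already in FUSE form
    if not path.startswith("/"):
        raise ValueError(f"expected an absolute DBFS path, got {path!r}")
    return "/dbfs" + path


WINE_WHITE = "/dbfs/databricks-datasets/wine-quality/winequality-white.csv"
print(to_spark_path(WINE_WHITE))
# dbfs:/databricks-datasets/wine-quality/winequality-white.csv
print(to_local_path("dbfs:/databricks-datasets/wine-quality/winequality-white.csv"))
# /dbfs/databricks-datasets/wine-quality/winequality-white.csv
```

In a notebook you would then pass the converted path to the reader, for example spark.read.csv(to_spark_path(WINE_WHITE), header=True, inferSchema=True, sep=';').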
Remember that Databricks Community Edition provides limited resources and features, so some example notebooks may not run unchanged. If you're planning to work extensively with Databricks, consider exploring the full Databricks platform, which offers additional features and scalability.
Feel free to try the above steps, and let me know if you need further assistance!