Welcome to the Databricks community! Let’s address the issue you’re facing with importing data in your notebook.
The example notebook you’re trying to run relies on a dataset located at /dbfs/databricks-datasets/wine-quality/winequality-white.csv. This dataset contains information about white wine quality and is commonly used for regression or classification modeling.
Here are a few steps you can take to resolve the issue:
- Mount the Data: The /databricks-datasets folder is normally available in every workspace without extra setup, so this step is often unnecessary. If the file is still not visible in Databricks Community Edition, you can try making it accessible through the UI (menu labels may differ between workspace versions):
  - Click on the “Data” tab in the left sidebar.
  - Click “Add Data” and select “DBFS” as the source.
  - Enter the path /databricks-datasets/wine-quality/winequality-white.csv.
  - Choose a mount point (e.g., /mnt/wine-quality).
  - Click “Create Table” to create a table associated with the mounted data.
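- Read Data Using Spark: Instead of reading the CSV file directly through the local /dbfs path, use Spark to read it. Spark expects the DBFS path without the /dbfs prefix (dbfs:/... or simply /databricks-datasets/...), and this file is semicolon-delimited, so pass the separator explicitly:
  df = spark.read.csv('dbfs:/databricks-datasets/wine-quality/winequality-white.csv', header=True, inferSchema=True, sep=';')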
- Convert to Pandas DataFrame (Optional): If you prefer working with pandas DataFrames, you can convert the Spark DataFrame. This collects all rows onto the driver, which is fine for a dataset of this size:
  df_pandas = df.toPandas()
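A frequent source of confusion with this dataset is that DBFS paths come in two spellings: /dbfs/... for local file APIs such as pandas, and dbfs:/... for Spark. The sketch below shows one way to normalize a path before handing it to Spark. It assumes a Databricks notebook where `spark` is already defined; the helper names `to_spark_path` and `load_wine_quality` are made up for illustration and are not part of any Databricks API.

```python
# Path used by the example notebook (local /dbfs/ style).
WINE_CSV = "/dbfs/databricks-datasets/wine-quality/winequality-white.csv"


def to_spark_path(path: str) -> str:
    """Turn a /dbfs/... or bare DBFS path into a dbfs:/ URI for Spark.

    Hypothetical helper for illustration only.
    """
    if path.startswith("dbfs:/"):
        return path  # already in the form Spark expects
    if path.startswith("/dbfs/"):
        return "dbfs:/" + path[len("/dbfs/"):]
    return "dbfs:/" + path.lstrip("/")


def load_wine_quality(spark, path: str = WINE_CSV):
    """Read the white wine CSV with Spark.

    The file is semicolon-delimited, so the separator is passed explicitly.
    """
    return spark.read.csv(
        to_spark_path(path),
        header=True,
        inferSchema=True,
        sep=";",
    )


# In a notebook cell (assumes `spark` exists):
# df = load_wine_quality(spark)
# df_pandas = df.toPandas()
```

Passing the session in as an argument keeps the helper easy to try outside a notebook, but calling spark.read.csv directly with the dbfs:/ path works just as well.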
Keep in mind that Databricks Community Edition provides limited compute resources and a reduced feature set, so some steps in example notebooks may behave differently there. If you’re planning to work extensively with Databricks, consider exploring the full Databricks platform, which offers additional features and scalability.
Feel free to try the above steps, and let me know if you need further assistance! 😊