Databricks Community

Eric76 · ‎02-12-2024

Hi,

Newcomer here. I am experimenting with the Community Edition of Databricks.

I wanted to run the notebook example provided here https://community.cloud.databricks.com/?o=6085264701896358#notebook/2691200955149229

It failed because it cannot import the data from

/dbfs/databricks-datasets/wine-quality/winequality-white.csv

Is there any way to work around this? Do I need a cloud account for this example?

Kaniz_Fatma · ‎02-14-2024

Hi @Eric76,

Welcome to the Databricks community! Let’s address the issue you’re facing with importing data in your notebook.

The example notebook you’re trying to run relies on a dataset located at /dbfs/databricks-datasets/wine-quality/winequality-white.csv. This dataset contains information about white wine quality and is commonly used for regression or classification modeling.

Here are a few steps you can take to resolve the issue:

Mount the Data: If you’re using Databricks Community Edition, you might need to mount the dataset to make it accessible within your notebook. To do this, follow these steps:
- Click on the “Data” tab in the left sidebar.
- Click “Add Data” and select “DBFS” as the source.
- Enter the path /databricks-datasets/wine-quality/winequality-white.csv.
- Choose a mount point (e.g., /mnt/wine-quality).
- Click “Create Table” to create a table associated with the mounted data.
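
Read Data Using Spark: Instead of reading the CSV file through the local /dbfs/ file path, use Spark to read the data. Spark takes the DBFS path directly, so the /dbfs prefix is dropped in favor of dbfs:/. You can do this with the following code snippet in your notebook:
```
# Read the data into a Spark DataFrame.
# Spark expects a DBFS path (dbfs:/...), not the local /dbfs/... file path.
# sep=';' assumes the semicolon-delimited layout of the wine-quality CSV files.
df = spark.read.csv('dbfs:/databricks-datasets/wine-quality/winequality-white.csv', header=True, inferSchema=True, sep=';')
```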
Convert to Pandas DataFrame (Optional): If you prefer working with Pandas DataFrames, you can convert the Spark DataFrame to a Pandas DataFrame:
```
# Convert Spark DataFrame to Pandas DataFrame
df_pandas = df.toPandas()
```
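
As a rough illustration of the path difference: the `/dbfs/...` form is a local file path used by libraries such as pandas, while Spark readers take DBFS paths such as `dbfs:/...`. The helper below is only a sketch; `to_spark_path` is a hypothetical name, not part of any Databricks API, and it assumes the only change needed is swapping the prefix.

```python
def to_spark_path(path: str) -> str:
    """Map a local-style '/dbfs/...' path to a 'dbfs:/...' path for Spark readers.

    Hypothetical helper for illustration; paths without the '/dbfs/' prefix
    are returned unchanged.
    """
    prefix = "/dbfs/"
    if path.startswith(prefix):
        return "dbfs:/" + path[len(prefix):]
    return path


# Path from the example notebook, as used with local file APIs
local_path = "/dbfs/databricks-datasets/wine-quality/winequality-white.csv"
spark_path = to_spark_path(local_path)
print(spark_path)
```

In a notebook, the result could then be passed to `spark.read.csv(spark_path, header=True, inferSchema=True, sep=";")`, where `sep=";"` assumes the file is semicolon-delimited.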

Remember that Databricks Community Edition provides limited resources, so you might encounter some limitations. If you’re planning to work extensively with Databricks, consider exploring the full Databricks platform, which offers additional features and scalability.

Feel free to try the above steps, and let me know if you need further assistance! 😊

Can the Community Edition of Databricks run model training examples?
