Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

sachamourier
by Contributor
  • 2141 Views
  • 5 replies
  • 3 kudos

Resolved! Enable to use library GraphFrames

Hello, I am trying to install and use the GraphFrames library but keep receiving the following error: "AttributeError: 'SparkSession' object has no attribute '_sc'". I have tried to install the library on my all-purpose cluster (Access mode: Standard)....

Latest Reply
sachamourier
Contributor
  • 3 kudos

@szymon_dybczak Thanks for the responses. I indeed changed my all-purpose cluster access mode and it worked. I figured that was a nicer option than changing the runtime.

4 More Replies
jar
by Contributor
  • 2557 Views
  • 2 replies
  • 0 kudos

Resolved! Use of Python variable in SQL cell

If using spark.conf.set(<variable_name>, <variable_value>), or just referring to a widget value directly, in a Python cell and then referencing it in a SQL cell with ${variable_name}, one gets the warning: "SQL query contains a dollar sign parameter, $p...

Latest Reply
jar
Contributor
  • 0 kudos

Frustrating indeed. Thank you, @lingareddy_Alva 

1 More Replies
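The `${var}` interpolation that triggers this warning is deprecated in favor of named parameter markers. A minimal sketch of migrating a query, assuming the old query uses `${name}` placeholders; the `to_param_markers` helper is hypothetical, written here only to illustrate the rewrite:

```python
import re

def to_param_markers(query: str) -> str:
    """Rewrite deprecated ${name} placeholders to :name parameter markers."""
    return re.sub(r"\$\{(\w+)\}", r":\1", query)

old_query = "SELECT * FROM sales WHERE region = ${region} AND year = ${year}"
new_query = to_param_markers(old_query)
print(new_query)  # SELECT * FROM sales WHERE region = :region AND year = :year

# On Databricks you would then bind the values explicitly, e.g.:
# spark.sql(new_query, args={"region": "EMEA", "year": 2024})
```

In SQL cells on recent runtimes, `:name` markers can also be resolved against notebook widgets directly, which avoids the dollar-sign syntax entirely.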
pavlosskev
by New Contributor III
  • 3691 Views
  • 1 reply
  • 0 kudos

Oracle JDBC Load Fails with Timestamp Partitioning (lowerBound/upperBound)

Hi everyone, I'm trying to read data from an Oracle database into Databricks using JDBC with timestamp-based partitioning. However, it seems that the partitioning doesn't work as expected when I specify lowerBound and upperBound using timestamp string...

Latest Reply
mani_22
Databricks Employee
  • 0 kudos

@pavlosskev Could you try adding the following option as well to your read?

.option("sessionInitStatement", "ALTER SESSION SET NLS_TIMESTAMP_FORMAT = 'YYYY-MM-DD HH24:MI:SS'")

df = (
    spark.read.format("jdbc")
    .option("url", jdbcUrl)
    .opti...

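For context on why malformed bounds break this: Spark turns lowerBound/upperBound/numPartitions into one WHERE-clause range per partition. A rough, pure-Python sketch of that boundary computation for timestamp bounds (simplified from what Spark's JDBC partitioning does internally; the function name is mine):

```python
from datetime import datetime

def timestamp_strides(lower: datetime, upper: datetime, num_partitions: int):
    """Split [lower, upper) into num_partitions contiguous ranges, the way
    Spark derives per-partition predicates from lowerBound/upperBound."""
    stride = (upper - lower) / num_partitions
    bounds = [lower + stride * i for i in range(num_partitions + 1)]
    return list(zip(bounds[:-1], bounds[1:]))

ranges = timestamp_strides(
    datetime(2024, 1, 1), datetime(2024, 1, 5), num_partitions=4
)
for lo, hi in ranges:
    print(lo, "->", hi)  # four contiguous one-day ranges
```

If the bounds are passed as strings, they must parse in the session's timestamp format, which is exactly what the suggested `sessionInitStatement` aligns on the Oracle side.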
Sainath368
by Contributor
  • 1218 Views
  • 1 reply
  • 1 kudos

Resolved! E series vs F series VM's

Hi all, I need to run weekly maintenance on approximately 7,000 tables in my Databricks environment, involving OPTIMIZE, VACUUM, and ANALYZE TABLE (for statistics calculation) on all tables. My question is: between the Ev4, Edv4, and Fsv2 VM series, wh...

Latest Reply
mani_22
Databricks Employee
  • 1 kudos

@Sainath368 OPTIMIZE and VACUUM are compute-intensive operations, so you can choose a compute-optimized instance like the F series for both the driver and workers, which has a higher CPU-to-memory ratio. If it's a UC managed table, I recommend enabling Pr...

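Whichever VM family is chosen, the weekly sweep described here boils down to issuing three statements per table. A minimal sketch of generating them (the table name is a placeholder; on a cluster you would run each statement via `spark.sql`):

```python
def maintenance_statements(table: str) -> list:
    """Build the weekly maintenance SQL for one table:
    compaction, old-file cleanup, and statistics refresh."""
    return [
        f"OPTIMIZE {table}",
        f"VACUUM {table}",
        f"ANALYZE TABLE {table} COMPUTE STATISTICS",
    ]

for stmt in maintenance_statements("main.sales.orders"):
    print(stmt)
    # spark.sql(stmt)  # uncomment when running on a cluster
```

Batching tables and running several such loops in parallel jobs is how the CPU-heavy F-series advice above actually pays off.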
Eyespoop
by New Contributor II
  • 31203 Views
  • 4 replies
  • 4 kudos

Resolved! PySpark: Writing Parquet Files to the Azure Blob Storage Container

Currently I am having some issues with writing the parquet file to the Storage Container. I do have the code running, but whenever the dataframe writer puts the parquet into blob storage, instead of the parquet file type it is created as a f...

Latest Reply
amarv
New Contributor II
  • 4 kudos

This is my approach:

from databricks.sdk.runtime import dbutils
from pyspark.sql import DataFrame

output_base_url = "abfss://..."

def write_single_parquet_file(df: DataFrame, filename: str):
    print(f"Writing '{filename}.parquet' to ABFS")
    ...

3 More Replies
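The "folder instead of a file" behaviour is expected: Spark always writes a directory of `part-*` files. The usual workaround is to `coalesce(1)`, write to a temporary directory, then promote the single part file to the target name. A filesystem-only sketch of the promotion step, simulated locally (on Databricks you would use `dbutils.fs.cp`/`dbutils.fs.rm` against the abfss:// paths; the helper name is mine):

```python
import pathlib
import shutil
import tempfile

def promote_part_file(spark_output_dir: str, target_file: str) -> str:
    """Find the single part-*.parquet file Spark wrote and copy it
    to a stable, human-readable filename."""
    part = next(pathlib.Path(spark_output_dir).glob("part-*.parquet"))
    shutil.copy(part, target_file)
    return target_file

# Simulate a Spark output directory for demonstration.
tmp = tempfile.mkdtemp()
(pathlib.Path(tmp) / "part-00000-abc.snappy.parquet").write_bytes(b"demo")
(pathlib.Path(tmp) / "_SUCCESS").touch()

promote_part_file(tmp, f"{tmp}/report.parquet")
print(pathlib.Path(f"{tmp}/report.parquet").read_bytes())  # b'demo'
```

Note that `coalesce(1)` funnels the write through one task, so this only makes sense for outputs small enough to fit comfortably on a single executor.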
yhu126
by New Contributor
  • 1023 Views
  • 1 reply
  • 0 kudos

How to create a SparkSession in jobs run-unit-tests

I'm converting my Python unit tests to run with databricks jobs run-unit-tests. Each test needs a SparkSession, but every pattern I try fails.
What I tried:
1. Create my own local Spark:
spark = (SparkSession.builder.master("local[*]").appName("unit-test").getOr...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @yhu126, maybe the blog post below gives you some inspiration: Writing Unit Tests for PySpark in Databricks: Appr... - Databricks Community - 122398

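A common pattern for this is one shared local session per test run. A minimal sketch as a context manager, assuming `pyspark` is installed wherever the tests execute (the import is deferred so merely collecting tests does not require it):

```python
from contextlib import contextmanager

@contextmanager
def local_spark(app_name: str = "unit-test"):
    """Yield a local SparkSession for tests, stopping it on exit."""
    from pyspark.sql import SparkSession  # deferred: only needed at run time
    spark = (
        SparkSession.builder
        .master("local[*]")
        .appName(app_name)
        .config("spark.sql.shuffle.partitions", "1")  # keep tiny tests fast
        .getOrCreate()
    )
    try:
        yield spark
    finally:
        spark.stop()

# Usage in a test:
# with local_spark() as spark:
#     assert spark.range(3).count() == 3
```

The same body works as a session-scoped pytest fixture; which one `databricks jobs run-unit-tests` picks up depends on how the test runner is configured, so treat this as a starting point rather than the canonical setup.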
nkrom456
by New Contributor III
  • 2802 Views
  • 7 replies
  • 1 kudos

Resolved! Unable to resolve column error while trying to query the view

I have a federated table from Snowflake in Databricks, say employee. When I executed printSchema I was able to see the schema as "employeeid": long, "employeename": string. Tried to create a view as: create view vw_emp with schema binding as select `"employeei...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 1 kudos

Hi @nkrom456, try something like this. If you are using backticks, Spark treats the column name exactly as you type it (in this case it treats the double quotes as part of the column name): create view vw_emp with schema binding as select `employeeid` from employee ...

6 More Replies
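The reply's point is that backticks quote an identifier literally, so `` `"employeeid"` `` searches for a column whose name actually contains the double quotes. If a federated column name genuinely contains special characters, the quoting rule is: wrap in backticks and double any backtick inside the name. A small helper illustrating that rule (the function name is mine, for illustration):

```python
def quote_identifier(name: str) -> str:
    """Quote a Spark SQL identifier: wrap in backticks,
    doubling any backtick inside the name itself."""
    return "`" + name.replace("`", "``") + "`"

print(quote_identifier("employeeid"))    # `employeeid`
print(quote_identifier('"employeeid"'))  # `"employeeid"` - the quotes become part of the name
```

So the fix in the thread is simply to drop the double quotes and quote the bare name, if quoting is needed at all.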
RyHubb
by New Contributor III
  • 7289 Views
  • 6 replies
  • 1 kudos

Resolved! Databricks asset bundles job and pipeline

Hello, I'm looking to create a job which is linked to a delta live table. Given the job code like this:

my_job_name:
  name: thejobname
  schedule:
    quartz_cron_expression: 56 30 12 * * ?
    timezone_id: UTC
    pause_stat...

Latest Reply
Laurens1
New Contributor II
  • 1 kudos

This ended a frustrating search! It would be great to add this to the documentation instead of "go to the portal and copy-paste the id"!

5 More Replies
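The resolution this thread converges on is referencing the pipeline resource by interpolation instead of hard-coding its ID. A hedged sketch of the bundle YAML, assuming a DLT pipeline resource named `my_pipeline` defined in the same bundle (all names here are placeholders):

```yaml
resources:
  pipelines:
    my_pipeline:
      name: my-dlt-pipeline
      # ... pipeline settings ...
  jobs:
    my_job_name:
      name: thejobname
      schedule:
        quartz_cron_expression: "56 30 12 * * ?"
        timezone_id: UTC
      tasks:
        - task_key: run_pipeline
          pipeline_task:
            # Interpolate the ID instead of copy-pasting it from the portal.
            pipeline_id: ${resources.pipelines.my_pipeline.id}
```

The `${resources.pipelines.my_pipeline.id}` reference is resolved at deploy time, so the job always points at the pipeline deployed by the same bundle target.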
noorbasha534
by Valued Contributor II
  • 702 Views
  • 1 reply
  • 2 kudos

Machine type for different operations in Azure Databricks

Dear all, do we have a general recommendation for the virtual machine type to be used for different operations in Azure Databricks? We are looking for the below:
1. VACUUM
2. OPTIMIZE
3. ANALYZE STATS
4. DESCRIBE TABLE HISTORY
I understood at a high lev...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 2 kudos

Hi @noorbasha534, here's a general recommendation from Databricks: they recommend running OPTIMIZE on compute-optimized VMs and VACUUM on general-purpose ones. Comprehensive Guide to Optimize Data Workloads | Databricks. But as you said, VACUUM is co...

xhead
by New Contributor II
  • 30127 Views
  • 15 replies
  • 3 kudos

Does "databricks bundle deploy" clean up old files?

I'm looking at this page (Databricks Asset Bundles development work tasks) in the Databricks documentation. When repo assets are deployed to a Databricks workspace, it is not clear whether "databricks bundle deploy" will remove files from the target wo...

Data Engineering
bundle
cli
deploy
Latest Reply
ganapati
New Contributor III
  • 3 kudos

@JamesGraham this issue is related to the "databricks bundle deploy" command itself; when run inside a CI/CD pipeline, I am still seeing old configs in bundle.tf.json. Ideally it should be updated with the changes from the previous run. But I am still seeing er...

14 More Replies
Aidonis
by New Contributor III
  • 26278 Views
  • 4 replies
  • 4 kudos

Resolved! Load Data from Sharepoint Site to Delta table in Databricks

Hi, new to the community so sorry if my post lacks detail. I am trying to create a connection between Databricks and a SharePoint site to read Excel files into a Delta table. I can see there is a FiveTran partner connection that we can use to get SharePo...

Latest Reply
gaurav_singh_14
New Contributor II
  • 4 kudos

@Ajay-Pandey can we connect using a user ID without using a client ID and secrets?

3 More Replies
rizkyjarr
by New Contributor II
  • 1012 Views
  • 3 replies
  • 0 kudos

"with open" not working in single user access mode cluster (no such file or directory found)

Hi fellow engineers, I was trying to read binary files (.jpg) in an ADLS2 mounted container, but when I tried to read a file using "with open" I kept getting an error: No such file or directory. I've read something related to this matter on So...

Latest Reply
amenon
Databricks Employee
  • 0 kudos

@rizkyjarr, did you run into the issue with `with open()` using `/dbfs/mnt` paths while using a non-Unity Catalog enabled workspace, despite using the single user access mode cluster as you pointed out?

2 More Replies
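Background for this thread: Python's built-in `open()` only sees the driver's local filesystem, so DBFS/mount paths must go through the `/dbfs` FUSE prefix, and it is exactly that FUSE mount that is unavailable in some cluster configurations. A tiny converter between the two spellings (the helper name is mine, for illustration):

```python
def to_fuse_path(path: str) -> str:
    """Map a dbfs:/ URI to the local /dbfs FUSE path usable with open()."""
    if path.startswith("dbfs:/"):
        return "/dbfs/" + path[len("dbfs:/"):]
    return path

local = to_fuse_path("dbfs:/mnt/images/photo.jpg")
print(local)  # /dbfs/mnt/images/photo.jpg

# with open(local, "rb") as f:   # works where the FUSE mount is available
#     data = f.read()
```

Where the FUSE mount is not available, the alternative is reading the bytes through Spark (e.g. the `binaryFile` reader) or `dbutils.fs` instead of `open()`.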
tariq
by New Contributor III
  • 8883 Views
  • 6 replies
  • 1 kudos

SqlContext in DBR 14.3

I have a Databricks workspace in GCP and I am using a cluster with Runtime 14.3 LTS (includes Apache Spark 3.5.0, Scala 2.12). I am trying to set the checkpoint directory location using the following command in a notebook: spark.sparkContext.set...

Latest Reply
Sjors
New Contributor II
  • 1 kudos

Has this been resolved? I'm also encountering the same issue with spark.sparkContext.parallelize(). My code is verifiably running on a single user access cluster. 

5 More Replies
adhi_databricks
by Contributor
  • 1148 Views
  • 3 replies
  • 1 kudos

Resolved! Table of Contents Not Visible in Databricks Notebook

Hi everyone, I'm experiencing a strange issue with one of my Databricks notebooks: the Table of Contents (ToC) pane is no longer visible. It used to show up on the left, but now it's missing only for this specific notebook. What I've observed so far: T...

Latest Reply
Raghavan93513
Databricks Employee
  • 1 kudos

Hi @adhi_databricks,
Good day! Please check for the probable errors:
  • It is a code cell, not markdown - change to a markdown cell, then add a heading
  • Incorrect heading syntax - try using # Heading or #Heading and then refresh the page
  • No headings/titles - A...

2 More Replies
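The ToC pane is built from markdown headings, so the notebook needs at least one markdown cell containing a `#` heading for the pane to have anything to show. A minimal example cell (heading text is a placeholder):

```
%md
# Section title
## Subsection
```

After adding it, refresh the page as the reply suggests.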
sandelic
by New Contributor II
  • 2272 Views
  • 5 replies
  • 1 kudos

Databricks with Airflow

Hi there, I'm trying to understand the advantages of using Airflow operators to orchestrate Databricks notebooks, given that Databricks already offers its own workflow solution. Could someone please explain the benefits?Thanks,Stefan

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 1 kudos

Hi @sandelic, if your workload is mainly Databricks-centered then stick to Workflows. They are easy to manage and integrate directly with Databricks notebooks and jobs. But sometimes your workload requires complex orchestration and scheduling...

4 More Replies