Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

sachamourier
by Contributor
  • 2141 Views
  • 5 replies
  • 3 kudos

Resolved! Enable to use library GraphFrames

Hello, I am trying to install and use the GraphFrames library but keep receiving the following error: "AttributeError: 'SparkSession' object has no attribute '_sc'". I have tried to install the library on my all-purpose cluster (Access mode: Standard)....

Latest Reply
sachamourier
Contributor
  • 3 kudos

@szymon_dybczak Thanks for the responses. I indeed changed my all-purpose cluster access mode and it worked. I figured that was a nicer option than changing the runtime.

4 More Replies
jar
by Contributor
  • 2557 Views
  • 2 replies
  • 0 kudos

Resolved! Use of Python variable in SQL cell

If using spark.conf.set(<variable_name>, <variable_value>), or just referring to a widget value directly, in a Python cell and then referencing it in a SQL cell with ${variable_name}, one gets the warning: "SQL query contains a dollar sign parameter, $p...

Latest Reply
jar
Contributor
  • 0 kudos

Frustrating indeed. Thank you, @lingareddy_Alva 

1 More Replies
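The `${var}` interpolation that triggers this warning is deprecated in favor of named parameter markers. A minimal sketch of migrating a query, assuming the old query uses `${name}` placeholders; the `to_param_markers` helper is hypothetical, written here only to illustrate the rewrite:

```python
import re

def to_param_markers(query: str) -> str:
    """Rewrite deprecated ${name} placeholders to :name parameter markers."""
    return re.sub(r"\$\{(\w+)\}", r":\1", query)

old_query = "SELECT * FROM sales WHERE region = ${region} AND year = ${year}"
new_query = to_param_markers(old_query)
print(new_query)  # SELECT * FROM sales WHERE region = :region AND year = :year

# On Databricks you would then bind the values explicitly, e.g.:
# spark.sql(new_query, args={"region": "EMEA", "year": 2024})
```

In SQL cells on recent runtimes, `:name` markers can also be resolved against notebook widgets directly, which avoids the dollar-sign syntax entirely.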
pavlosskev
by New Contributor III
  • 3691 Views
  • 1 reply
  • 0 kudos

Oracle JDBC Load Fails with Timestamp Partitioning (lowerBound/upperBound)

Hi everyone, I'm trying to read data from an Oracle database into Databricks using JDBC with timestamp-based partitioning. However, it seems that the partitioning doesn't work as expected when I specify lowerBound and upperBound using timestamp string...

Latest Reply
mani_22
Databricks Employee
  • 0 kudos

@pavlosskev Could you try adding the following option as well to your read?

.option("sessionInitStatement", "ALTER SESSION SET NLS_TIMESTAMP_FORMAT = 'YYYY-MM-DD HH24:MI:SS'")

df = (
    spark.read.format("jdbc")
    .option("url", jdbcUrl)
    .opti...

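For context on why malformed bounds break this: Spark turns lowerBound/upperBound/numPartitions into one WHERE-clause range per partition. A rough, pure-Python sketch of that boundary computation for timestamp bounds (simplified from what Spark's JDBC partitioning does internally; the function name is mine):

```python
from datetime import datetime

def timestamp_strides(lower: datetime, upper: datetime, num_partitions: int):
    """Split [lower, upper) into num_partitions contiguous ranges, the way
    Spark derives per-partition predicates from lowerBound/upperBound."""
    stride = (upper - lower) / num_partitions
    bounds = [lower + stride * i for i in range(num_partitions + 1)]
    return list(zip(bounds[:-1], bounds[1:]))

ranges = timestamp_strides(
    datetime(2024, 1, 1), datetime(2024, 1, 5), num_partitions=4
)
for lo, hi in ranges:
    print(lo, "->", hi)  # four contiguous one-day ranges
```

If the bounds are passed as strings, they must parse in the session's timestamp format, which is exactly what the suggested `sessionInitStatement` aligns on the Oracle side.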
Sainath368
by Contributor
  • 1218 Views
  • 1 reply
  • 1 kudos

Resolved! E series vs F series VM's

Hi all, I need to run weekly maintenance on approximately 7,000 tables in my Databricks environment, involving OPTIMIZE, VACUUM, and ANALYZE TABLE (for statistics calculation) on all tables. My question is: between the Ev4, Edv4, and Fsv2 VM series, wh...

Latest Reply
mani_22
Databricks Employee
  • 1 kudos

@Sainath368 OPTIMIZE and VACUUM are compute-intensive operations, so you can choose a compute-optimized instance like the F series for both the driver and workers, which has a higher CPU-to-memory ratio. If it's a UC managed table, I recommend enabling Pr...

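Whichever VM family is chosen, the weekly sweep described here boils down to issuing three statements per table. A minimal sketch of generating them (the table name is a placeholder; on a cluster you would run each statement via `spark.sql`):

```python
def maintenance_statements(table: str) -> list:
    """Build the weekly maintenance SQL for one table:
    compaction, old-file cleanup, and statistics refresh."""
    return [
        f"OPTIMIZE {table}",
        f"VACUUM {table}",
        f"ANALYZE TABLE {table} COMPUTE STATISTICS",
    ]

for stmt in maintenance_statements("main.sales.orders"):
    print(stmt)
    # spark.sql(stmt)  # uncomment when running on a cluster
```

Batching tables and running several such loops in parallel jobs is how the CPU-heavy F-series advice above actually pays off.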
Eyespoop
by New Contributor II
  • 31203 Views
  • 4 replies
  • 4 kudos

Resolved! PySpark: Writing Parquet Files to the Azure Blob Storage Container

Currently I am having some issues with writing the parquet file to the Storage Container. I do have the code running, but whenever the dataframe writer puts the parquet into blob storage, instead of the parquet file type it is created as a f...

Latest Reply
amarv
New Contributor II
  • 4 kudos

This is my approach:

from databricks.sdk.runtime import dbutils
from pyspark.sql import DataFrame

output_base_url = "abfss://..."

def write_single_parquet_file(df: DataFrame, filename: str):
    print(f"Writing '{filename}.parquet' to ABFS")
    ...

3 More Replies
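The "folder instead of a file" behaviour is expected: Spark always writes a directory of `part-*` files. The usual workaround is to `coalesce(1)`, write to a temporary directory, then promote the single part file to the target name. A filesystem-only sketch of the promotion step, simulated locally (on Databricks you would use `dbutils.fs.cp`/`dbutils.fs.rm` against the abfss:// paths; the helper name is mine):

```python
import pathlib
import shutil
import tempfile

def promote_part_file(spark_output_dir: str, target_file: str) -> str:
    """Find the single part-*.parquet file Spark wrote and copy it
    to a stable, human-readable filename."""
    part = next(pathlib.Path(spark_output_dir).glob("part-*.parquet"))
    shutil.copy(part, target_file)
    return target_file

# Simulate a Spark output directory for demonstration.
tmp = tempfile.mkdtemp()
(pathlib.Path(tmp) / "part-00000-abc.snappy.parquet").write_bytes(b"demo")
(pathlib.Path(tmp) / "_SUCCESS").touch()

promote_part_file(tmp, f"{tmp}/report.parquet")
print(pathlib.Path(f"{tmp}/report.parquet").read_bytes())  # b'demo'
```

Note that `coalesce(1)` funnels the write through one task, so this only makes sense for outputs small enough to fit comfortably on a single executor.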
yhu126
by New Contributor
  • 1023 Views
  • 1 reply
  • 0 kudos

How to create a SparkSession in jobs run-unit-tests

I'm converting my Python unit tests to run with databricks jobs run-unit-tests. Each test needs a SparkSession, but every pattern I try fails.
What I tried:
1. Create my own local Spark:
spark = (SparkSession.builder.master("local[*]").appName("unit-test").getOr...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @yhu126, maybe the blog post below gives you some inspiration: Writing Unit Tests for PySpark in Databricks: Appr... - Databricks Community - 122398

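A common pattern for this is one shared local session per test run. A minimal sketch as a context manager, assuming `pyspark` is installed wherever the tests execute (the import is deferred so merely collecting tests does not require it):

```python
from contextlib import contextmanager

@contextmanager
def local_spark(app_name: str = "unit-test"):
    """Yield a local SparkSession for tests, stopping it on exit."""
    from pyspark.sql import SparkSession  # deferred: only needed at run time
    spark = (
        SparkSession.builder
        .master("local[*]")
        .appName(app_name)
        .config("spark.sql.shuffle.partitions", "1")  # keep tiny tests fast
        .getOrCreate()
    )
    try:
        yield spark
    finally:
        spark.stop()

# Usage in a test:
# with local_spark() as spark:
#     assert spark.range(3).count() == 3
```

The same body works as a session-scoped pytest fixture; which one `databricks jobs run-unit-tests` picks up depends on how the test runner is configured, so treat this as a starting point rather than the canonical setup.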
nkrom456
by New Contributor III
  • 2802 Views
  • 7 replies
  • 1 kudos

Resolved! Unable to resolve column error while trying to query the view

I have a federated table from Snowflake in Databricks, say employee. When I executed printSchema I was able to see the schema as "employeeid": long, "employeename": string. Tried to create a view as: create view vw_emp with schema binding as select `"employeei...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 1 kudos

Hi @nkrom456, try something like this. If you are using backticks, Spark treats the column name exactly as you type it (in this case it treats the double quotes as part of the column name): create view vw_emp with schema binding as select `employeeid` from employee ...

6 More Replies
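The reply's point is that backticks quote an identifier literally, so `` `"employeeid"` `` searches for a column whose name actually contains the double quotes. If a federated column name genuinely contains special characters, the quoting rule is: wrap in backticks and double any backtick inside the name. A small helper illustrating that rule (the function name is mine, for illustration):

```python
def quote_identifier(name: str) -> str:
    """Quote a Spark SQL identifier: wrap in backticks,
    doubling any backtick inside the name itself."""
    return "`" + name.replace("`", "``") + "`"

print(quote_identifier("employeeid"))    # `employeeid`
print(quote_identifier('"employeeid"'))  # `"employeeid"` - the quotes become part of the name
```

So the fix in the thread is simply to drop the double quotes and quote the bare name, if quoting is needed at all.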
RyHubb
by New Contributor III
  • 7289 Views
  • 6 replies
  • 1 kudos

Resolved! Databricks asset bundles job and pipeline

Hello, I'm looking to create a job which is linked to a delta live table. Given the job code like this:

my_job_name:
  name: thejobname
  schedule:
    quartz_cron_expression: 56 30 12 * * ?
    timezone_id: UTC
    pause_stat...

Latest Reply
Laurens1
New Contributor II
  • 1 kudos

This ended a frustrating search! It would be great to add this to the documentation instead of "go to the portal and copy-paste the id"!

5 More Replies
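The resolution this thread converges on is referencing the pipeline resource by interpolation instead of hard-coding its ID. A hedged sketch of the bundle YAML, assuming a DLT pipeline resource named `my_pipeline` defined in the same bundle (all names here are placeholders):

```yaml
resources:
  pipelines:
    my_pipeline:
      name: my-dlt-pipeline
      # ... pipeline settings ...
  jobs:
    my_job_name:
      name: thejobname
      schedule:
        quartz_cron_expression: "56 30 12 * * ?"
        timezone_id: UTC
      tasks:
        - task_key: run_pipeline
          pipeline_task:
            # Interpolate the ID instead of copy-pasting it from the portal.
            pipeline_id: ${resources.pipelines.my_pipeline.id}
```

The `${resources.pipelines.my_pipeline.id}` reference is resolved at deploy time, so the job always points at the pipeline deployed by the same bundle target.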
noorbasha534
by Valued Contributor II
  • 702 Views
  • 1 reply
  • 2 kudos

Machine type for different operations in Azure Databricks

Dear all, do we have a general recommendation for the virtual machine type to be used for different operations in Azure Databricks? We are looking for the below:
1. VACUUM
2. OPTIMIZE
3. ANALYZE STATS
4. DESCRIBE TABLE HISTORY
I understood at a high lev...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 2 kudos

Hi @noorbasha534, here's a general recommendation from Databricks: they recommend running OPTIMIZE on compute-optimized VMs and VACUUM on general-purpose ones. Comprehensive Guide to Optimize Data Workloads | Databricks. But as you said, VACUUM is co...

xhead
by New Contributor II
  • 30127 Views
  • 15 replies
  • 3 kudos

Does "databricks bundle deploy" clean up old files?

I'm looking at this page (Databricks Asset Bundles development work tasks) in the Databricks documentation. When repo assets are deployed to a Databricks workspace, it is not clear whether "databricks bundle deploy" will remove files from the target wo...

Data Engineering
bundle
cli
deploy
Latest Reply
ganapati
New Contributor III
  • 3 kudos

@JamesGraham this issue is related to the "databricks bundle deploy" command itself; when run inside a CI/CD pipeline, I am still seeing old configs in bundle.tf.json. Ideally it should be updated with the changes from the previous run. But I am still seeing er...

14 More Replies
Aidonis
by New Contributor III
  • 26278 Views
  • 4 replies
  • 4 kudos

Resolved! Load Data from Sharepoint Site to Delta table in Databricks

Hi, new to the community so sorry if my post lacks detail. I am trying to create a connection between Databricks and a SharePoint site to read Excel files into a Delta table. I can see there is a FiveTran partner connection that we can use to get SharePo...

Latest Reply
gaurav_singh_14
New Contributor II
  • 4 kudos

@Ajay-Pandey can we connect using a user ID without using a client ID and secrets?

3 More Replies
rizkyjarr
by New Contributor II
  • 1012 Views
  • 3 replies
  • 0 kudos

"with open" not working in single user access mode cluster (no such file or directory found)

Hi fellow engineers, I was trying to read binary files (.jpg) in an ADLS2 mounted container, but when I tried to read a file using "with open" I kept getting an error: No such file or directory. I've read something related to this matter on So...

Latest Reply
amenon
Databricks Employee
  • 0 kudos

@rizkyjarr, did you run into the issue with `with open()` using `/dbfs/mnt` paths while using a non-Unity Catalog enabled workspace, despite using the single user access mode cluster as you pointed out?

2 More Replies
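Background for this thread: Python's built-in `open()` only sees the driver's local filesystem, so DBFS/mount paths must go through the `/dbfs` FUSE prefix, and it is exactly that FUSE mount that is unavailable in some cluster configurations. A tiny converter between the two spellings (the helper name is mine, for illustration):

```python
def to_fuse_path(path: str) -> str:
    """Map a dbfs:/ URI to the local /dbfs FUSE path usable with open()."""
    if path.startswith("dbfs:/"):
        return "/dbfs/" + path[len("dbfs:/"):]
    return path

local = to_fuse_path("dbfs:/mnt/images/photo.jpg")
print(local)  # /dbfs/mnt/images/photo.jpg

# with open(local, "rb") as f:   # works where the FUSE mount is available
#     data = f.read()
```

Where the FUSE mount is not available, the alternative is reading the bytes through Spark (e.g. the `binaryFile` reader) or `dbutils.fs` instead of `open()`.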
tariq
by New Contributor III
  • 8883 Views
  • 6 replies
  • 1 kudos

SqlContext in DBR 14.3

I have a Databricks workspace in GCP and I am using a cluster with Runtime 14.3 LTS (includes Apache Spark 3.5.0, Scala 2.12). I am trying to set the checkpoint directory location using the following command in a notebook: spark.sparkContext.set...

Latest Reply
Sjors
New Contributor II
  • 1 kudos

Has this been resolved? I'm also encountering the same issue with spark.sparkContext.parallelize(). My code is verifiably running on a single user access cluster. 

5 More Replies
adhi_databricks
by Contributor
  • 1148 Views
  • 3 replies
  • 1 kudos

Resolved! Table of Contents Not Visible in Databricks Notebook

Hi everyone, I'm experiencing a strange issue with one of my Databricks notebooks: the Table of Contents (ToC) pane is no longer visible. It used to show up on the left, but now it's missing only for this specific notebook. What I've observed so far: T...

Latest Reply
Raghavan93513
Databricks Employee
  • 1 kudos

Hi @adhi_databricks,
Good day! Please check for the probable errors:
  • It is a code cell, not markdown - change to a markdown cell, then add a heading
  • Incorrect heading syntax - try using # Heading or #Heading and then refresh the page
  • No headings/titles - A...

2 More Replies
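The ToC pane is built from markdown headings, so the notebook needs at least one markdown cell containing a `#` heading for the pane to have anything to show. A minimal example cell (heading text is a placeholder):

```
%md
# Section title
## Subsection
```

After adding it, refresh the page as the reply suggests.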
sandelic
by New Contributor II
  • 2272 Views
  • 5 replies
  • 1 kudos

Databricks with Airflow

Hi there, I'm trying to understand the advantages of using Airflow operators to orchestrate Databricks notebooks, given that Databricks already offers its own workflow solution. Could someone please explain the benefits?Thanks,Stefan

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 1 kudos

Hi @sandelic, if your workload is mainly Databricks-centered then stick to Workflows. They are easy to manage and integrate directly with Databricks notebooks and jobs. But sometimes your workload requires complex orchestration and scheduling...

4 More Replies