Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

by NotARobot (New Contributor III)
  • 1590 Views
  • 0 replies
  • 2 kudos

Force DBR/Spark Version in Delta Live Tables Cluster Policy

Is there a way to use Compute Policies to force Delta Live Tables to use specific Databricks Runtime and PySpark versions? While trying to leverage some of the functions in PySpark 3.5.0, I don't seem to be able to get Delta Live Tables to use Databr...

Attachments: test_cluster_policy.png, dlt_version.png
Data Engineering
Compute Policies
Delta Live Tables
Graphframes
pyspark
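
A minimal sketch of what such a policy definition could look like, assuming a pinned spark_version is honored at all; DLT pipelines normally pick their runtime from the pipeline channel (current/preview) rather than from a fixed DBR, which may be why a policy like this has no effect. The runtime string is an assumption (DBR 14.x lines bundle PySpark 3.5.x):

```python
# Hypothetical cluster-policy definition, serialized to JSON for the Policies UI/API.
# "fixed" pins an attribute; whether a DLT pipeline cluster honors a pinned
# spark_version is exactly what this post is asking.
import json

policy_definition = {
    "spark_version": {
        "type": "fixed",
        "value": "14.3.x-scala2.12",  # assumed DBR line that ships PySpark 3.5.x
    },
}
print(json.dumps(policy_definition, indent=2))
```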
by JohnJustus (New Contributor III)
  • 13533 Views
  • 1 reply
  • 0 kudos

Accessing Excel file from Databricks

Hi, I am trying to access an Excel file that is stored in Azure Blob Storage via Databricks. In my understanding, it is not possible to access it using PySpark, so accessing it through pandas is the option. Here is my code: %pip install openpyxl, then import pandas as p...
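
A minimal sketch of the pandas route, assuming the container is reachable through a DBFS mount (the mount point and file name are hypothetical); pandas uses the local file API, so it needs the FUSE-style /dbfs/... path rather than a wasbs:// URL:

```python
# Run `%pip install openpyxl` in its own notebook cell first.
import pandas as pd

# Read through the /dbfs FUSE mount; pandas cannot open wasbs:// URLs directly.
df = pd.read_excel("/dbfs/mnt/blob/reports/data.xlsx", engine="openpyxl")
print(df.head())
```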

by databicky (Contributor II)
  • 5315 Views
  • 3 replies
  • 1 kudos

No handler for udf/udaf/udtf for function

I created a function using a JAR file that is present in the cluster location, but when executing the Hive query it shows the error "no handler for udf/udaf/udtf". This query runs fine on HDInsight clusters, but when running in databricks...

Attachment: IMG20231015164650.jpg
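
A minimal sketch of registering a permanent function from the JAR so Spark SQL has a handler class for it; the function name, class name, and JAR path are hypothetical:

```python
# Register the Hive UDF explicitly from the JAR instead of relying on the
# cluster classpath; Spark then knows which handler class backs the function.
spark.sql("""
    CREATE OR REPLACE FUNCTION my_upper
    AS 'com.example.hive.udf.MyUpper'
    USING JAR 'dbfs:/FileStore/jars/my_udfs.jar'
""")
spark.sql("SELECT my_upper('hello')").show()
```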
by dbuser1234 (New Contributor)
  • 3035 Views
  • 0 replies
  • 0 kudos

How to readstream from multiple sources?

Hi, I am trying to readStream from two sources and join them into a target table. How can I do this in PySpark? E.g. t1 + t2 as my bronze tables: I want to readStream from t1 and t2, and merge the changes into t3 (silver table).
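
A minimal sketch of one common pattern, assuming Delta tables t1/t2/t3 that share an "id" key (column names are assumptions): read both streams, join them, and upsert each micro-batch into the silver table with foreachBatch:

```python
from delta.tables import DeltaTable

s1 = spark.readStream.table("t1")
s2 = spark.readStream.table("t2")
joined = s1.join(s2, "id")  # a stream-stream join usually needs watermarks in production

def upsert_to_silver(batch_df, batch_id):
    # MERGE the micro-batch into the silver table t3.
    (DeltaTable.forName(spark, "t3").alias("t")
        .merge(batch_df.alias("s"), "t.id = s.id")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute())

(joined.writeStream
    .foreachBatch(upsert_to_silver)
    .option("checkpointLocation", "/tmp/checkpoints/t3")  # hypothetical path
    .start())
```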

by anmol_hans_de (New Contributor)
  • 8830 Views
  • 0 replies
  • 0 kudos

Exam suspended by proctor

Hi Team, I need urgent support: I was about to submit my exam and was just reviewing my responses, but the proctor suspended it because I did not satisfy the proctoring conditions, even though I was sitting in a room with a clear background and well li...

by BST (New Contributor)
  • 1372 Views
  • 0 replies
  • 0 kudos

Spark - Cluster Mode - Driver

When running a Spark job in cluster mode, how does Spark decide on which worker node to place the driver?

by anirudh_a (New Contributor II)
  • 16495 Views
  • 8 replies
  • 3 kudos

Resolved! 'No file or Directory' error when using pandas.read_excel in Databricks

I am baffled by the behaviour of Databricks: below you can see the contents of the directory using dbutils in Databricks. It shows the `test.xlsx` file clearly in the directory (and I can even open it using `dbutils.fs.head`), but when I go to use panda.re...

Data Engineering
dbfs
panda
spark
spark config
Latest Reply
DamnKush (New Contributor II)
  • 3 kudos

Hey, I encountered this recently. I can see you are using a shared cluster; try switching to a single-user cluster and it will fix it. Can someone let me know why it wasn't working with a shared cluster? Thanks.
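
If switching cluster modes is not an option, a workaround sketch: shared (multi-user) access mode restricts the direct /dbfs file access that pandas relies on, so copy the file to the driver's local disk first (paths are hypothetical):

```python
import pandas as pd

# dbutils can still reach DBFS on a shared cluster; pandas then reads the local copy.
dbutils.fs.cp("dbfs:/FileStore/data/test.xlsx", "file:/tmp/test.xlsx")
df = pd.read_excel("/tmp/test.xlsx", engine="openpyxl")
```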

7 More Replies
by Joe1912 (New Contributor III)
  • 1485 Views
  • 0 replies
  • 0 kudos

Strategy to add new table base on silver data

I have a merge function for streaming foreachBatch, something like: def mergedf(df, i): merge_func_1(df, i); merge_func_2(df, i). Then I want to add a new merge_func_3 into it. Are there any best practices for this case? When the stream is always running, how can I process...
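
A minimal sketch of one way to keep this extensible, assuming each merge_func_N upserts into its own target: cache the micro-batch and dispatch it to a list of merge functions. Note the stream must be restarted for a newly added function to take effect, and it will only see batches arriving after the restart (historical rows need a separate backfill):

```python
merge_funcs = [merge_func_1, merge_func_2, merge_func_3]

def mergedf(df, batch_id):
    df.persist()  # avoid recomputing the micro-batch once per merge
    for fn in merge_funcs:
        fn(df, batch_id)
    df.unpersist()

# stream_df is the source streaming DataFrame (hypothetical name).
stream_df.writeStream.foreachBatch(mergedf).start()
```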

by UtkarshTrehan (New Contributor)
  • 15674 Views
  • 1 reply
  • 1 kudos

Inconsistent Results When Writing to Oracle DB with Spark's dropDuplicates and foreachPartition

It's more a Spark question than a Databricks question. I'm encountering an issue when writing data to an Oracle database using Apache Spark. My workflow involves removing duplicate rows from a DataFrame and then writing the deduplicated DataFrame to ...

Latest Reply
Sidhant07 (Databricks Employee)
  • 1 kudos

The difference in behaviour between using foreachPartition and data.write.jdbc(...) after dropDuplicates() could be due to how Spark handles data partitioning and operations on partitions. When you use foreachPartition, you are manually handling the ...
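
A sketch of the built-in JDBC path for comparison; if the inconsistency comes from the deduplicated DataFrame being recomputed per partition (dropDuplicates is not deterministic about which duplicate survives a recompute), pinning the result with persist() before writing is one commonly suggested mitigation. Connection details are placeholders:

```python
deduped = df.dropDuplicates(["id"]).persist()  # pin the dedup result before writing
deduped.count()  # materialize the cache so partitions are not recomputed

(deduped.write
    .format("jdbc")
    .option("url", "jdbc:oracle:thin:@//dbhost:1521/service")
    .option("dbtable", "TARGET_TABLE")
    .option("user", "app_user")
    .option("password", "***")
    .option("driver", "oracle.jdbc.OracleDriver")
    .mode("append")
    .save())
```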

by Graham (New Contributor III)
  • 9686 Views
  • 5 replies
  • 3 kudos

"MERGE" always slower than "CREATE OR REPLACE"

Overview: To update our Data Warehouse tables, we have tried two methods: "CREATE OR REPLACE" and "MERGE". With every query we've tried, "MERGE" is slower. My question is this: Has anyone successfully gotten a "MERGE" to perform faster than a "CREATE OR...

Latest Reply
Manisha_Jena (Databricks Employee)
  • 3 kudos

Hi @Graham, can you please try Low Shuffle Merge (LSM) and see if it helps? LSM is a new MERGE algorithm that aims to maintain the existing data organization (including z-order clustering) for unmodified data, while simultaneously improving performan...
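
A minimal sketch of opting in, assuming a runtime where LSM is not already on by default; the table names are examples:

```python
# Enable Low Shuffle Merge, then run the MERGE as usual.
spark.conf.set("spark.databricks.delta.merge.enableLowShuffle", "true")

spark.sql("""
    MERGE INTO warehouse_table AS t
    USING staged_updates AS s
    ON t.id = s.id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")
```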

4 More Replies
by carlosna (New Contributor II)
  • 47716 Views
  • 0 replies
  • 0 kudos

Recover files from previous cluster execution

I saved a file with results by just opening a file via fopen("filename.csv", "a"). Once the execution ended (and the cluster shut down) I couldn't retrieve the file. I found that the file was stored in "/databricks/driver", and that folder empties w...
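
A minimal sketch of the usual prevention: /databricks/driver is ephemeral local disk that is wiped when the cluster terminates, so write to (or copy into) a DBFS-backed path instead. The paths are hypothetical:

```python
# Option A: write straight to DBFS; /dbfs/... is the FUSE view of DBFS and
# survives cluster shutdown.
with open("/dbfs/FileStore/results/filename.csv", "a") as f:
    f.write("some,result,row\n")

# Option B: copy a locally written file out before the cluster terminates.
dbutils.fs.cp("file:/databricks/driver/filename.csv",
              "dbfs:/FileStore/results/filename.csv")
```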

by Databricks143 (New Contributor III)
  • 1343 Views
  • 0 replies
  • 0 kudos

Failure to initialize configuration

Hi team, when we read a CSV file from Azure Blob using Databricks, we do not get any key error and are able to read the data from the blob. But if we try to read an XML file, it fails with a key issue: invalid configuration. Error: Failure to inti...
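
A sketch of one commonly suggested fix, assuming the spark-xml reader: it loads files through the Hadoop layer, so a storage key set only via spark.conf may not reach it; setting the key on the Hadoop configuration as well often resolves this kind of "invalid configuration" error. Account, container, and path names are placeholders:

```python
# spark-xml (com.databricks:spark-xml) must be installed on the cluster.
# Set the storage key on the Hadoop configuration, not just the Spark session conf.
spark.sparkContext._jsc.hadoopConfiguration().set(
    "fs.azure.account.key.myaccount.blob.core.windows.net", "<storage-key>")

df = (spark.read.format("xml")
      .option("rowTag", "record")
      .load("wasbs://container@myaccount.blob.core.windows.net/path/data.xml"))
```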

