Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

dave_d
by New Contributor II
  • 7797 Views
  • 2 replies
  • 0 kudos

What is the "Columnar To Row" node in this simple Databricks SQL query profile?

I am running a relatively simple SQL query that writes back to a table on a Databricks serverless SQL warehouse, and I'm trying to understand why there is a "Columnar To Row" node in the query profile that is consuming the vast majority of the time s...

Latest Reply
Annapurna_Hiriy
Databricks Employee
  • 0 kudos

 @dave_d We do not have a document listing the operations that would bring up a ColumnarToRow node. This node provides a common executor to translate an RDD of ColumnarBatch into an RDD of InternalRow. It is inserted whenever such a transition is de...

  • 0 kudos
1 More Replies
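The transition described in the reply can be spotted directly in a query's physical plan. A minimal sketch, assuming a live `spark` session (the query text you pass in is up to you):

```python
def plan_text(spark, query: str) -> str:
    """Return the formatted physical plan for a SQL query.

    EXPLAIN FORMATTED yields the plan as a single string column.
    """
    return spark.sql("EXPLAIN FORMATTED " + query).collect()[0][0]


def has_columnar_to_row(spark, query: str) -> bool:
    """True if the plan contains a ColumnarToRow transition, i.e. a
    columnar (Parquet/Delta) scan feeding a row-based operator."""
    return "ColumnarToRow" in plan_text(spark, query)
```

Operators that only consume rows (many write paths, for example) force this conversion after a columnar scan, which is one plausible reason it shows up prominently in a write-back query's profile.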
Rafal9
by New Contributor II
  • 8962 Views
  • 0 replies
  • 0 kudos

Issue during testing SparkSession.sql() with pytest.

Dear Community, I am testing PySpark code via pytest using VS Code and Databricks Connect. The SparkSession is initiated from Databricks Connect: `from databricks.connect import DatabricksSession; spark = DatabricksSession.builder.getOrCreate()`. I am receiving...

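For tests like this, one common pattern is a session-scoped fixture that builds the SparkSession once, preferring Databricks Connect when it is installed and falling back to local PySpark otherwise. A hedged sketch (names are illustrative, not from the post):

```python
import importlib.util


def make_spark():
    """Build a SparkSession for tests.

    Prefers Databricks Connect when the package is installed, otherwise
    falls back to a local PySpark session; returns None if neither is
    available so callers can skip the test instead of erroring.
    """
    if importlib.util.find_spec("databricks") and importlib.util.find_spec("databricks.connect"):
        from databricks.connect import DatabricksSession
        return DatabricksSession.builder.getOrCreate()
    if importlib.util.find_spec("pyspark"):
        from pyspark.sql import SparkSession
        return SparkSession.builder.master("local[1]").appName("tests").getOrCreate()
    return None
```

In a `conftest.py` this would typically be wrapped in `@pytest.fixture(scope="session")` so every test shares one session.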
svrdragon
by New Contributor
  • 2869 Views
  • 0 replies
  • 0 kudos

optimizeWrite takes too long

Hi, we have a Spark job that writes data to a Delta table for the last 90 date partitions. We have enabled spark.databricks.delta.autoCompact.enabled and delta.autoOptimize.optimizeWrite. The job takes 50 mins to complete. Of that, the logic takes 12 mins and optimizeWri...

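For context, the two settings mentioned live at different levels: one is a session/cluster Spark conf, the other a persistent Delta table property. A sketch of how each is typically set (the table name is a placeholder):

```python
# Session-level confs (apply to writes made in this session/cluster):
spark_confs = {
    "spark.databricks.delta.autoCompact.enabled": "true",
    "spark.databricks.delta.optimizeWrite.enabled": "true",
}

# Table-level properties (persist with the table itself):
table = "my_db.my_table"  # placeholder
alter_sql = (
    f"ALTER TABLE {table} SET TBLPROPERTIES ("
    "'delta.autoOptimize.optimizeWrite' = 'true', "
    "'delta.autoOptimize.autoCompact' = 'true')"
)
# spark.sql(alter_sql)  # run on the cluster
```

If optimizeWrite dominates the runtime of a large backfill, a common trade-off is disabling it for that write and running a scheduled OPTIMIZE afterwards.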
erigaud
by Honored Contributor
  • 4901 Views
  • 3 replies
  • 0 kudos

Merge DLT with Delta Table

Is there any way to accomplish this? I have an existing Delta table and a separate Delta Live Tables pipeline, and I would like to merge data from the DLT into my existing Delta table. Is this doable or completely impossible?

Latest Reply
LeifBruen
New Contributor II
  • 0 kudos

Merging data from a Delta Live Table (DLT) into an existing Delta table is possible with careful planning: read the DLT output as a source, transform it as needed, and merge it into the target table in a batch step, ensuring the schemas are compatible.

  • 0 kudos
2 More Replies
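One hedged way to sketch the reply above: a DLT output table is readable by name like any Delta table, so a standard MERGE from it into the existing table works. The table and key names below are placeholders:

```python
source = "lakehouse.dlt_output"      # table produced by the DLT pipeline (placeholder)
target = "lakehouse.existing_table"  # pre-existing Delta table (placeholder)

merge_sql = f"""
MERGE INTO {target} AS t
USING {source} AS s
ON t.id = s.id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *
"""
# spark.sql(merge_sql)  # run on a cluster with access to both tables
```

Scheduling this as a batch job after the DLT pipeline finishes keeps the pipeline itself unchanged.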
NotARobot
by New Contributor III
  • 1729 Views
  • 0 replies
  • 2 kudos

Force DBR/Spark Version in Delta Live Tables Cluster Policy

Is there a way to use Compute Policies to force Delta Live Tables to use specific Databricks Runtime and PySpark versions? While trying to leverage some of the functions in PySpark 3.5.0, I don't seem to be able to get Delta Live Tables to use Databr...

Labels: Data Engineering, Compute Policies, Delta Live Tables, Graphframes, pyspark
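An assumption worth verifying for this question: DLT pipelines generally select their runtime through the pipeline's release channel setting rather than a cluster policy's spark_version, which DLT manages itself. A sketch of the relevant pipeline-settings fragment (name and sizing are placeholders):

```python
# DLT pipeline settings typically pick the runtime via "channel",
# not via a cluster policy's spark_version field.
pipeline_settings = {
    "name": "my_pipeline",  # placeholder
    "channel": "PREVIEW",   # "CURRENT" or "PREVIEW"; PREVIEW tracks the newer runtime
    "clusters": [
        {"label": "default", "num_workers": 2},
    ],
}
```

So if PySpark 3.5.0 features are needed, switching the pipeline to the PREVIEW channel is the lever to try first.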
JohnJustus
by New Contributor III
  • 13865 Views
  • 1 reply
  • 0 kudos

Accessing Excel file from Databricks

Hi, I am trying to access an Excel file that is stored in Azure Blob Storage via Databricks. In my understanding, it is not possible to access it using PySpark, so accessing it through pandas is the option. Here is my code: `%pip install openpyxl` then `import pandas as p...`

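A hedged sketch of the pandas route: read the blob over HTTPS with a SAS token, which `pandas.read_excel` can fetch directly. The account, container, and token below are placeholders:

```python
# Assumes openpyxl is installed (%pip install openpyxl) and the SAS token is valid.
def excel_url(account: str, container: str, blob: str, sas: str) -> str:
    """Build an HTTPS URL that pandas.read_excel can fetch directly."""
    return f"https://{account}.blob.core.windows.net/{container}/{blob}?{sas}"

# import pandas as pd
# df = pd.read_excel(excel_url("myacct", "raw", "book.xlsx", "sv=..."), engine="openpyxl")
```

The alternative is mounting the container and reading through the local `/dbfs/...` path, since pandas only understands local filesystem paths and URLs.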
databicky
by Contributor II
  • 5533 Views
  • 3 replies
  • 1 kudos

No handler for udf/udaf/udtf for function

I created a function using a JAR file that is present in the cluster location, but when executing the Hive query it shows the error "no handler for udf/udaf/udtf". These queries run fine in HDInsight clusters, but when running in Databricks...

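For reference, Spark SQL registers a Hive UDF from a JAR with an explicit USING JAR clause, which differs from setups where the JAR is already on Hive's classpath. The class name and path are placeholders:

```python
create_fn_sql = """
CREATE OR REPLACE FUNCTION my_db.my_udf
AS 'com.example.MyUDF'
USING JAR 'dbfs:/FileStore/jars/my_udf.jar'
"""
# spark.sql(create_fn_sql)  # the JAR path must be reachable by the cluster
```

If the function was created without USING JAR, Spark may find the function name but no handler class, which matches the error in the post.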
dbuser1234
by New Contributor
  • 3141 Views
  • 0 replies
  • 0 kudos

How to readstream from multiple sources?

Hi, I am trying to readStream from 2 sources and join them into a target table. How can I do this in PySpark? E.g. t1 and t2 are my bronze tables. I want to readStream from t1 and t2, and merge the changes into t3 (the silver table).

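A sketch of one common pattern for this: stream both bronze tables, join them, and apply each micro-batch to the silver table with a foreachBatch MERGE. It assumes a live `spark` session, Delta tables named t1/t2/t3, and a placeholder join key:

```python
def build_silver_stream(spark):
    """Read t1 and t2 as streams, join them, and merge each micro-batch into t3."""
    s1 = spark.readStream.table("t1")
    s2 = spark.readStream.table("t2")
    joined = s1.join(s2, "id")  # placeholder join key

    def upsert(batch_df, batch_id):
        # Each micro-batch is a normal DataFrame, so a Delta MERGE works here.
        batch_df.createOrReplaceTempView("updates")
        batch_df.sparkSession.sql("""
            MERGE INTO t3 AS t
            USING updates AS s
            ON t.id = s.id
            WHEN MATCHED THEN UPDATE SET *
            WHEN NOT MATCHED THEN INSERT *
        """)

    return joined.writeStream.foreachBatch(upsert).outputMode("append").start()
```

Caveat: stream-stream joins keep state indefinitely unless you add watermarks, so for production use, define watermarks on both sides.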
anmol_hans_de
by New Contributor
  • 8925 Views
  • 0 replies
  • 0 kudos

Exam suspended by proctor

Hi Team, I need urgent support. I was about to submit my exam and was just reviewing my responses, but the proctor suspended it because I did not satisfy the proctoring conditions, even though I was sitting in a room with a clear background and well li...

BST
by New Contributor
  • 1439 Views
  • 0 replies
  • 0 kudos

Spark - Cluster Mode - Driver

When running a Spark job in cluster mode, how does Spark decide on which worker node to place the driver?

anirudh_a
by New Contributor II
  • 17887 Views
  • 8 replies
  • 3 kudos

Resolved! 'No file or Directory' error when using pandas.read_excel in Databricks

I am baffled by the behaviour of Databricks: below you can see the contents of the directory using dbutils in Databricks. It shows the `test.xlsx` file clearly in the directory (and I can even open it using `dbutils.fs.head`). But when I go to use pandas.re...

Labels: Data Engineering, dbfs, panda, spark, spark config
Latest Reply
DamnKush
New Contributor II
  • 3 kudos

Hey, I encountered this recently. I can see you are using a shared cluster; try switching to a single user cluster and it will fix it. Can someone let me know why it wasn't working with a shared cluster? Thanks.

  • 3 kudos
7 More Replies
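The usual root cause behind this error: pandas opens files through the local filesystem, so DBFS paths must go through the `/dbfs` FUSE mount, and that mount is restricted on shared access mode clusters (the assumption behind the reply above). A small path helper:

```python
def to_local_dbfs_path(path: str) -> str:
    """Convert a Spark-style DBFS path to the FUSE path that local
    libraries like pandas can open, e.g.
    'dbfs:/data/test.xlsx' -> '/dbfs/data/test.xlsx'."""
    if path.startswith("dbfs:/"):
        return "/dbfs/" + path[len("dbfs:/"):].lstrip("/")
    return path
```

So `dbutils.fs` sees the file (it talks to DBFS directly), while `pandas.read_excel("dbfs:/...")` fails: pandas needs the `/dbfs/...` form, and on shared clusters even that mount may be unavailable.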
