Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Matt_L
by New Contributor III
  • 7664 Views
  • 3 replies
  • 3 kudos

Resolved! Slow performance loading checkpoint file?

Using OSS Delta, hopefully this is the right forum for this question: Hey all, I could use some help as I feel like I'm doing something wrong here. I'm streaming from Kafka -> Delta on EMR/S3FS, and am seeing increasingly slow batches. When looking...

Latest Reply
Matt_L
New Contributor III
  • 3 kudos

Found the answer through the Slack user group, courtesy of Adam Binford. I had set `delta.logRetentionDuration='24 HOURS'` but did not set `delta.deletedFileRetentionDuration`, and so the checkpoint file still had all the accumulated tombstones sin...
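For reference, a minimal sketch of setting both retention properties together so tombstones age out of the checkpoint along with the log entries (the table name is hypothetical):

```python
# Hypothetical table name; both properties are shortened together so removed-file
# tombstones age out on roughly the same schedule as the transaction log entries.
spark.sql("""
    ALTER TABLE events SET TBLPROPERTIES (
        'delta.logRetentionDuration' = '24 hours',
        'delta.deletedFileRetentionDuration' = '24 hours'
    )
""")
```

Note that lowering `delta.deletedFileRetentionDuration` also limits how far back time travel can go, so this trades history for smaller checkpoints.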

2 More Replies
UmaMahesh1
by Honored Contributor III
  • 9485 Views
  • 7 replies
  • 17 kudos

Spark Structured Streaming: data write into ADLS is too slow

I'm a bit new to Spark Structured Streaming, so do ask any relevant questions if I missed something. I have a notebook which consumes events from a Kafka topic and writes those records into ADLS. The topic is JSON serialized, so I'm just writing...
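For context, a minimal sketch of the kind of pipeline being described; the broker, topic, and ADLS paths are hypothetical, and `maxOffsetsPerTrigger` is one common knob for keeping batch sizes (and hence latency) steady:

```python
from pyspark.sql.functions import col

# Hypothetical broker, topic, and ADLS paths -- illustrative only.
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")
       .option("subscribe", "events")
       .option("maxOffsetsPerTrigger", 10000)  # bound each micro-batch
       .load())

# Kafka delivers the value as binary; the topic is JSON-serialized, so cast to string.
events = raw.select(col("value").cast("string").alias("json"))

(events.writeStream
 .format("delta")
 .option("checkpointLocation", "abfss://data@account.dfs.core.windows.net/_checkpoints/events")
 .start("abfss://data@account.dfs.core.windows.net/raw/events"))
```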

Latest Reply
Miletto
New Contributor II
  • 17 kudos

 

6 More Replies
564824
by New Contributor II
  • 1361 Views
  • 1 replies
  • 1 kudos

Will enabling Unity Catalog affect existing user access and jobs in production?

Hi, at my company we are using Databricks with AWS IAM Identity Center for single sign-on. I was looking into Unity Catalog, which seems to offer centralized access, but I wanted to know if there will be any downside like loss of existing user profile ...

Latest Reply
Atanu
Databricks Employee
  • 1 kudos

You can look into this doc, https://docs.databricks.com/en/data-governance/unity-catalog/migrate.html, which has some details about your question.

SaraCorralLou
by New Contributor III
  • 8553 Views
  • 7 replies
  • 2 kudos

Poor performance with UDFs

Hello, I am contacting you because I am having a problem with the performance of my notebooks on Databricks. My notebook is written in Python (PySpark); in it I read a Delta table that I copy to a dataframe, do several transformations, and create sever...

Latest Reply
-werners-
Esteemed Contributor III
  • 2 kudos

Looping over records is a performance killer, to be avoided at all costs. See "Beware the for-loop" (databricks.com).
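To illustrate the point, a small sketch contrasting a Python UDF with the equivalent built-in column expression (table and column names are hypothetical):

```python
from pyspark.sql import functions as F

df = spark.table("sales")  # hypothetical Delta table

# Slow: a Python UDF is opaque to the optimizer and pays per-row serialization
# between the JVM and the Python worker.
add_tax_udf = F.udf(lambda amount: amount * 1.21, "double")
slow = df.withColumn("amount_with_tax", add_tax_udf(F.col("amount")))

# Fast: the same logic as a built-in expression stays inside the JVM and
# benefits from whole-stage code generation.
fast = df.withColumn("amount_with_tax", F.col("amount") * 1.21)
```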

6 More Replies
Chris_Shehu
by Valued Contributor III
  • 3807 Views
  • 2 replies
  • 1 kudos

Resolved! Custom Libraries (Unity Catalog Enabled Clusters)

I'm trying to use a custom library that I created from a .whl file in the workspace/shared location. The library attaches to the cluster without any issues and I can see it when I list the modules using pip. When I try to call the module I get an error t...
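One workaround worth trying on Unity Catalog shared clusters is a notebook-scoped install of the wheel straight from the workspace path (the path and wheel name below are hypothetical):

```python
# Hypothetical workspace path and wheel name; this installs into the notebook's
# own Python environment rather than relying on cluster-level library attachment.
%pip install /Workspace/Shared/libs/my_custom_lib-0.1.0-py3-none-any.whl
```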

Latest Reply
Szpila
New Contributor III
  • 1 kudos

Hello guys, I am working on a project where we need to use the spark-excel library (Maven) in order to ingest data from Excel files. As those 3rd-party libraries are not allowed on shared clusters, do you have any workaround other than using pandas, for exa...

1 More Replies
User15986662700
by New Contributor III
  • 5400 Views
  • 4 replies
  • 1 kudos
Latest Reply
User15986662700
New Contributor III
  • 1 kudos

Yes, it is possible to connect Databricks to a Kerberized HBase cluster. The attached article explains the steps: setting up a Kerberos client using a keytab on the cluster nodes, installing the hbase-spark integration library, and set...

3 More Replies
naga_databricks
by Contributor
  • 4553 Views
  • 1 replies
  • 0 kudos

Reading BigQuery data using a query

To read BigQuery data using spark.read, I'm using a query. This query executes and creates a table on the materializationDataset. df = spark.read.format("bigquery").option("query", query).option("materializationProject", materializationProject)...
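A fuller sketch of this read path, assuming the open-source spark-bigquery connector; the project and dataset names are hypothetical, and `viewsEnabled` is typically required for query-based reads:

```python
# Hypothetical project/dataset names -- adjust to your environment.
query = "SELECT id, amount FROM `my-project.sales.orders` WHERE amount > 100"

df = (spark.read.format("bigquery")
      .option("viewsEnabled", "true")                  # needed when reading via a query
      .option("query", query)
      .option("materializationProject", "my-project")
      .option("materializationDataset", "spark_tmp")   # temp result tables land here
      .load())
```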

EDDatabricks
by Contributor
  • 1518 Views
  • 2 replies
  • 2 kudos

Appropriate storage account type for reference data (Azure)

Hello, we are using a reference dataset for our production applications. We would like to create a Delta table for this dataset to be used from our applications. Currently, manual updates occur on this dataset through a script on a weekly basis. ...

Labels: Data Engineering, Delta Live Table, Storage account
Latest Reply
-werners-
Esteemed Contributor III
  • 2 kudos

+1 for ADLS: hierarchical namespace and hot/cold/premium storage tiers, things not possible in plain Blob Storage.

1 More Replies
irispan
by New Contributor II
  • 4793 Views
  • 4 replies
  • 1 kudos

Recommended Hive metastore pattern for Trino integration

Hi, I have several questions regarding Trino integration: Is it recommended to use an external Hive metastore, or to leverage the Databricks-maintained Hive metastore, when it comes to enabling external query engines such as Trino? When I tried to use ex...

Latest Reply
JunlinZeng
Databricks Employee
  • 1 kudos

> Is it recommended to use an external Hive metastore or leverage the Databricks-maintained Hive metastore when it comes to enabling external query engines such as Trino?

The Databricks-maintained Hive metastore is not suggested to be used externally. ...
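If you do go the external-metastore route, a hedged sketch of the cluster-level Spark config Databricks documents for pointing at an external Hive metastore; the JDBC endpoint, driver, and secret reference below are hypothetical:

```
# Hypothetical JDBC endpoint and credentials -- set in the cluster's Spark config,
# with the password pulled from a secret scope rather than stored in plain text.
spark.sql.hive.metastore.version 3.1.0
spark.sql.hive.metastore.jars maven
spark.hadoop.javax.jdo.option.ConnectionURL jdbc:mysql://metastore-db.example.com:3306/metastore
spark.hadoop.javax.jdo.option.ConnectionDriverName org.mariadb.jdbc.Driver
spark.hadoop.javax.jdo.option.ConnectionUserName hive
spark.hadoop.javax.jdo.option.ConnectionPassword {{secrets/hive/metastore-password}}
```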

3 More Replies
Agus1
by New Contributor III
  • 6372 Views
  • 3 replies
  • 3 kudos

Update destination table when using Spark Structured Streaming and Delta tables

I'm trying to implement a streaming pipeline that will run hourly using Spark Structured Streaming, Scala, and Delta tables. The pipeline will process different items with their details. The sources are Delta tables that already exist, written hourly u...

Latest Reply
Tharun-Kumar
Databricks Employee
  • 3 kudos

@Agus1 Could you try using CDC in Delta? You could use readChangeFeed to read only the changes that got applied on the source table. This is also explained here: https://learn.microsoft.com/en-us/azure/databricks/delta/delta-change-data-feed
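A minimal sketch of that approach (table names are hypothetical, and change data feed must first be enabled on the source):

```python
# Change data feed must be enabled on the source table first, e.g.:
#   ALTER TABLE source_items SET TBLPROPERTIES (delta.enableChangeDataFeed = true)

changes = (spark.readStream
           .format("delta")
           .option("readChangeFeed", "true")
           .table("source_items"))  # hypothetical source table

# Each row carries _change_type (insert / update_preimage / update_postimage /
# delete), so downstream logic can apply only the relevant changes.
(changes.writeStream
 .option("checkpointLocation", "/tmp/_checkpoints/items_cdf")
 .toTable("destination_items"))  # hypothetical destination table
```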

2 More Replies
Eric_Kieft
by New Contributor III
  • 3233 Views
  • 2 replies
  • 1 kudos

Unity Catalog Table/View Column Data Type Changes

When changing a Delta table column data type in Unity Catalog, we noticed a view that references that table did not automatically update to reflect the new data type. Is there a way to update the Delta table column data type so that it also update...

Latest Reply
Lakshay
Databricks Employee
  • 1 kudos

Can you try refreshing the view by running the command `REFRESH TABLE <viewname>`?

1 More Replies
Vibhor
by Contributor
  • 5187 Views
  • 5 replies
  • 4 kudos

Resolved! Cluster Performance

Facing an issue with cluster performance; in the event log we can see "cluster is not responsive likely due to GC". The number of pipelines (Databricks notebooks) running and the cluster configuration are the same as before, but we started seeing this issue sin...

Latest Reply
jose_gonzalez
Databricks Employee
  • 4 kudos

Hi @Vibhor Sethi, do you see any other error messages? Did your data volume increase? What kind of job are you running?

4 More Replies
ajain80
by New Contributor III
  • 23414 Views
  • 5 replies
  • 10 kudos

Resolved! SFTP Connect

How can I connect to an SFTP server from Databricks so I can write files into tables directly?

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 10 kudos

The classic solution is to copy data from SFTP to ADLS storage using Azure Data Factory, and after the copy is done in the ADF pipeline, trigger the Databricks notebook.
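If you'd rather stay inside Databricks, a hedged sketch using the paramiko library to pull a file over SFTP and land it in a table; the host, credentials, paths, and target table are all hypothetical:

```python
# Hypothetical host, credentials, and paths -- keep real secrets in a secret scope.
import paramiko

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect("sftp.example.com", port=22, username="user", password="secret")

sftp = client.open_sftp()
sftp.get("/outbound/orders.csv", "/tmp/orders.csv")  # download to local disk
sftp.close()
client.close()

# Read the downloaded file with Spark and append it to a (hypothetical) table.
df = spark.read.option("header", "true").csv("file:/tmp/orders.csv")
df.write.mode("append").saveAsTable("orders")
```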

4 More Replies
