Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

databricksdev
by New Contributor II
  • 875 Views
  • 1 reply
  • 0 kudos

Capture Automatically Added tags

Can we capture automatically added tags (e.g., RunName) from an Azure Databricks job cluster into parameters or custom tags in Azure Data Factory?

  • 875 Views
  • 1 reply
  • 0 kudos
Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @databricksdev,  Azure Databricks applies default tags to each cluster, including Vendor, Creator, ClusterName, and ClusterId. In addition, it applies two default tags to job clusters: RunName and JobId. However, these tags are only applied to...
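One way to surface those tags from inside the job itself is to read the cluster's Spark conf and hand the values back through the notebook exit value, which the Data Factory notebook activity can pick up from its output. A minimal sketch, assuming the usual spark.databricks.clusterUsageTags.clusterAllTags conf key is populated on the job cluster:

```python
import json

# The cluster's tags are exposed as a JSON-encoded list of {"key", "value"} pairs.
raw = spark.conf.get("spark.databricks.clusterUsageTags.clusterAllTags")
tags = {t["key"]: t["value"] for t in json.loads(raw)}

# Hand RunName/JobId back to the caller (e.g., Azure Data Factory reads this
# from the notebook activity's output).
dbutils.notebook.exit(json.dumps({
    "RunName": tags.get("RunName"),
    "JobId": tags.get("JobId"),
}))
```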

  • 0 kudos
ac0
by New Contributor III
  • 828 Views
  • 2 replies
  • 0 kudos

Resolved! Is it more performant to run optimize table commands on a serverless SQL warehouse or elsewhere?

Is it more performant to run optimize table commands on a serverless SQL warehouse or in a job or all-purpose compute cluster? I would presume a serverless warehouse would be faster, but I don't know how to test this.

  • 828 Views
  • 2 replies
  • 0 kudos
Latest Reply
Yeshwanth
Honored Contributor
  • 0 kudos

@ac0 Good day! Serverless SQL warehouses are likely to execute "optimize table" commands faster than job or all-purpose compute clusters due to their rapid startup time, quick upscaling for low latency, and efficient handling of varying query demand....
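To actually test this, as the question asks, one option is to time the identical statement on each compute type. A minimal sketch for the job/all-purpose side (the table name is a placeholder; on a SQL warehouse the same OPTIMIZE statement can be issued from the SQL editor or the Statement Execution API and timed there):

```python
import time

# Time the same maintenance command on each compute type and compare.
start = time.time()
spark.sql("OPTIMIZE catalog.schema.my_table")  # placeholder table name
print(f"OPTIMIZE took {time.time() - start:.1f}s")
```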

  • 0 kudos
1 More Replies
NTRT
by New Contributor III
  • 740 Views
  • 1 reply
  • 0 kudos

How to transform a json-stat 2 file to a Spark DataFrame? How to keep order in a MapType structure?

Hi, I am using different JSON files of type json-stat2. This kind of JSON file is quite commonly used by national statistics bureaus. It is multi-dimensional with multiple arrays. In a Python environment we can use the pyjstat package to easily transform json...
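For context, the pyjstat route mentioned in the post typically looks like this: read the json-stat file into a pandas DataFrame, then convert it to Spark. A minimal sketch, assuming a hypothetical file path:

```python
from pyjstat import pyjstat

# Parse a json-stat 2.0 document into a pandas DataFrame via pyjstat...
with open("/dbfs/mnt/data/stats.json") as f:  # hypothetical path
    dataset = pyjstat.Dataset.read(f.read())
pdf = dataset.write("dataframe")

# ...then hand it to Spark.
df = spark.createDataFrame(pdf)
```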

  • 740 Views
  • 1 reply
  • 0 kudos
Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

MapType does not maintain order (JSON objects do not guarantee order either). Can you apply the ordering yourself afterwards?
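One way to apply that ordering afterwards, as a minimal sketch (my_map is a placeholder column name): convert the map to an array of entries and sort it, which yields a deterministic key order.

```python
from pyspark.sql import functions as F

# map_entries() turns the map into an array of (key, value) structs;
# array_sort() orders the array by key, giving a stable, deterministic order.
ordered = df.select(F.array_sort(F.map_entries("my_map")).alias("entries"))
```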

  • 0 kudos
NTRT
by New Contributor III
  • 759 Views
  • 2 replies
  • 0 kudos

Can't read a JSON file of just 1.75 MiB?

Hi, I am relatively new to Databricks, although I am conscious of lazy evaluation, transformations and actions, and persistence. I have a complex, nested JSON file of about 1.73 MiB. When df = spark.read.option("multiLine", "false").json('dbfs:/mnt...

  • 759 Views
  • 2 replies
  • 0 kudos
Latest Reply
koushiknpvs
New Contributor III
  • 0 kudos

This can be resolved by redefining the schema structure explicitly and using that schema to read the file.

from pyspark.sql.types import StructType, StructField, StringType, IntegerType, ArrayType
# Define the schema according to the JSON structure
sch...
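The reply's code is cut off in the preview; a minimal sketch of the approach it describes, with hypothetical field names:

```python
from pyspark.sql.types import StructType, StructField, StringType, IntegerType, ArrayType

# Define the schema explicitly instead of letting Spark infer it, which
# avoids an expensive inference pass over deeply nested JSON.
schema = StructType([
    StructField("id", StringType()),
    StructField("label", StringType()),
    StructField("values", ArrayType(IntegerType())),
])

df = spark.read.schema(schema).option("multiLine", "false").json("dbfs:/mnt/...")  # path elided in the post
```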

  • 0 kudos
1 More Replies
NTRT
by New Contributor III
  • 1635 Views
  • 4 replies
  • 0 kudos

Resolved! Performance issues when reading json-stat2

Hi, I am relatively new to Databricks, although I am conscious of lazy evaluation, transformations and actions, and persistence. I have a complex, nested JSON file of about 1.73 MiB. When df = spark.read.option("multiLine", "false").json('dbfs:/mnt...

  • 1635 Views
  • 4 replies
  • 0 kudos
Latest Reply
koushiknpvs
New Contributor III
  • 0 kudos

Please give me a kudos if this works. Efficiency in Data Collection: Using .collect() on large datasets can lead to out-of-memory errors, as it collects all rows to the driver node. If the dataset is large, consider alternatives such as extracting only...
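A minimal sketch of the kind of alternatives the reply alludes to (column names and limits are illustrative):

```python
# Instead of df.collect(), bring back only what the driver actually needs:
sample = df.select("id", "label").limit(100).collect()  # few columns, bounded rows

# ...or keep the work distributed and write results out instead of collecting:
df.write.mode("overwrite").parquet("dbfs:/mnt/output/")  # hypothetical path
```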

  • 0 kudos
3 More Replies
Mathias_Peters
by Contributor
  • 778 Views
  • 2 replies
  • 0 kudos

Asset Bundles: Adding project_directory in DBT task breaks previous python task

Hi, I have a job consisting of three tasks:

tasks:
  - task_key: Kinesis_to_S3_new
    spark_python_task:
      python_file: ../src/kinesis.py
      parameters: ["${var.stream_region}", "${var.s3_base_path}"]
    j...

  • 778 Views
  • 2 replies
  • 0 kudos
Latest Reply
Mathias_Peters
Contributor
  • 0 kudos

Hi @Ajay-Pandey, thank you for the hints. I will try to recreate the job via the UI. I ran the tasks in a GitHub workflow. The file locations are mixed: the first two tasks (python and dlt) are located in the databricks/src folder. The dbt files come fro...

  • 0 kudos
1 More Replies
chandan_a_v
by Valued Contributor
  • 1868 Views
  • 2 replies
  • 1 kudos

Can't import local files under repo

I have a YAML file inside one of the subdirectories in Databricks, and I have appended the repo path to sys.path. Still, I can't access this file. https://docs.databricks.com/_static/notebooks/files-in-repos.html
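For reference, the pattern from the linked files-in-repos notebook looks roughly like this; a minimal sketch with a hypothetical repo path and file name:

```python
import sys
import yaml

# Make files/modules in the repo reachable from the notebook.
repo_root = "/Workspace/Repos/me@example.com/my-repo"  # hypothetical
sys.path.append(repo_root)

# Plain-file access then works with ordinary Python I/O.
with open(f"{repo_root}/conf/settings.yml") as f:  # hypothetical file
    settings = yaml.safe_load(f)
```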

  • 1868 Views
  • 2 replies
  • 1 kudos
Latest Reply
Abhishek10745
New Contributor III
  • 1 kudos

Hello @chandan_a_v, were you able to solve this issue? I am also experiencing the same thing, where I cannot move a file with extension .yml from a repo folder to a shared workspace folder. As per the documentation, this is a limitation or functionality of data...

  • 1 kudos
1 More Replies
zero234
by New Contributor III
  • 564 Views
  • 1 reply
  • 0 kudos

Delta Live Table is inserting data multiple times

So I have created a Delta Live Table which uses spark.sql() to execute a query and uses df.write.mode("append").insertInto() to insert data into the respective table, and at the end I return a dummy table, since this was the requirement. So now I have also ...

  • 564 Views
  • 1 reply
  • 0 kudos
Latest Reply
jose_gonzalez
Moderator
  • 0 kudos

What's your source? Your sink is a Delta table, correct? How do you verify that there are no inserts happening?

  • 0 kudos
Meshynix
by New Contributor III
  • 4762 Views
  • 6 replies
  • 0 kudos

Resolved! Not able to create external table in a schema under a Catalog.

Problem Statement: Cluster 1 (Shared Cluster) is not able to read the file location at "dbfs:/mnt/landingzone/landingzonecontainer/Inbound/", and hence we are not able to create an external table in a schema inside the Enterprise Catalog. Cluster 2 (No Isola...

  • 4762 Views
  • 6 replies
  • 0 kudos
Latest Reply
Avi_Bricks
New Contributor II
  • 0 kudos

External table creation is failing with the error: UnityCatalogServiceException: [RequestId=**** ErrorClass=INVALID_PARAMETER_VALUE] Unsupported path operation PATH_CREATE_TABLE on volume. I am able to access and create files on the external location.

  • 0 kudos
5 More Replies
pshuk
by New Contributor III
  • 1045 Views
  • 2 replies
  • 0 kudos

Run MD5 using the CLI

Hi, I want to run an MD5 checksum on a file uploaded to Databricks. I can generate the MD5 on the local file, but how do I generate one on the uploaded file on Databricks using the CLI (command-line interface)? Any help would be appreciated. I tried running databr...

  • 1045 Views
  • 2 replies
  • 0 kudos
Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @pshuk, Unfortunately, the databricks fs md5 command is not supported directly. You can run a Python script to compute the MD5 hash of the uploaded file. If your uploaded file is stored in Azure Blob Storage, you can use the azcopy tool to calcula...
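A minimal sketch of the Python route the reply suggests, assuming a hypothetical upload path (on a cluster, dbfs:/ paths are mirrored under the local /dbfs mount):

```python
import hashlib

# Stream the file in chunks so large uploads don't exhaust driver memory.
path = "/dbfs/mnt/uploads/myfile.csv"  # hypothetical upload location
md5 = hashlib.md5()
with open(path, "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):
        md5.update(chunk)
print(md5.hexdigest())
```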

  • 0 kudos
1 More Replies
Amit_Dass_Chmp
by New Contributor III
  • 591 Views
  • 1 reply
  • 0 kudos

On Unity Catalog - what is the best way to add members to groups

Hi All, on Unity Catalog, what is the best way to add members to groups: the API or the CLI? The API should be the best option, but I thought to check with you all.

  • 591 Views
  • 1 reply
  • 0 kudos
Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Amit_Dass_Chmp, In general, both the API and the CLI can be used to manage members and groups in Unity Catalog. The choice between the two often depends on your specific use case and comfort level with each tool. APIs are often preferred for their...
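For the API route, a minimal sketch against the workspace SCIM Groups endpoint (the host, token, and IDs are hypothetical):

```python
import requests

host = "https://<workspace>.azuredatabricks.net"  # hypothetical workspace URL
token = "<personal-access-token>"                 # hypothetical PAT
group_id, user_id = "123", "456"                  # hypothetical SCIM IDs

# SCIM PatchOp that adds one user to the group's member list.
resp = requests.patch(
    f"{host}/api/2.0/preview/scim/v2/Groups/{group_id}",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "schemas": ["urn:ietf:params:scim:api:messages:2.0:PatchOp"],
        "Operations": [{"op": "add", "value": {"members": [{"value": user_id}]}}],
    },
)
resp.raise_for_status()
```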

  • 0 kudos
danial
by New Contributor II
  • 5000 Views
  • 3 replies
  • 1 kudos

Connect Databricks hosted on Azure, with RDS on AWS.

We have Databricks set up and running on Azure. Now we want to connect it with RDS (AWS) to transfer data from RDS to Azure Data Lake using Databricks. I could find documentation on how to do it within the same cloud (either AWS or Azure), but n...

  • 5000 Views
  • 3 replies
  • 1 kudos
Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Danial Malik, hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so ...

  • 1 kudos
2 More Replies
Michael_Appiah
by Contributor
  • 5171 Views
  • 6 replies
  • 3 kudos

Resolved! Parameterized spark.sql() not working

Spark 3.4 introduced parameterized SQL queries, and Databricks also discussed this new functionality in a recent blog post (https://www.databricks.com/blog/parameterized-queries-pyspark). Problem: I cannot run any of the examples provided in the PySpark...
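For reference, the parameterized style from that blog post looks like this; a minimal sketch with illustrative values:

```python
# Named parameter markers (:name) are bound via the args dict in Spark >= 3.4.
df = spark.sql(
    "SELECT * FROM range(10) WHERE id < :limit",
    args={"limit": 5},
)
df.show()
```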

  • 5171 Views
  • 6 replies
  • 3 kudos
Latest Reply
Michael_Appiah
Contributor
  • 3 kudos

@Cas Unfortunately I do not have any information on this. However, I have seen that DBR 14.3 and 15.0 introduced some changes to spark.sql(). I have not checked whether those changes resolve the issue outlined here. Your best bet is probably to go ah...

  • 3 kudos
5 More Replies
bradleyjamrozik
by New Contributor III
  • 500 Views
  • 1 reply
  • 0 kudos

Autoloader Failure Creating EventSubscription

Posting this here too in case anyone else has run into this issue... Trying to set up Autoloader File Notifications but keep getting an "Internal Server Error" message.Failure on Write EventSubscription - Internal error - Microsoft Q&A

  • 500 Views
  • 1 reply
  • 0 kudos
Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @bradleyjamrozik, Ensure that your service principal for Event Grid and your storage account have the necessary permissions. Specifically, grant the Contributor role to your service principal on both Event Grid and the storage account.

  • 0 kudos
Phuonganh
by New Contributor II
  • 961 Views
  • 2 replies
  • 2 kudos

Databricks SDK for Python: Errors with parameters for Statement Execution

Hi team, I'm using the Databricks SDK for Python to run SQL queries. I created a variable as below: param = [{'name' : 'a', 'value' : 'x'}, {'name' : 'b', 'value' : 'y'}] and passed it to the statement as below: _ = w.statement_execution.execute_statement( warehous...

  • 961 Views
  • 2 replies
  • 2 kudos
Latest Reply
DonkeyKong
New Contributor II
  • 2 kudos

@Kaniz_Fatma This does not help resolve the issue. I am experiencing the same issue when following the above pointers. Here is the statement: response = w.statement_execution.execute_statement( statement='ALTER TABLE users ALTER COLUMN :col_name S...
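A minimal sketch of parameter binding with the SDK (the warehouse ID is hypothetical). Note that parameter markers bind values, not identifiers, so a marker cannot stand in for a column name as in the ALTER COLUMN statement above:

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.sql import StatementParameterListItem

w = WorkspaceClient()
resp = w.statement_execution.execute_statement(
    warehouse_id="<warehouse-id>",  # hypothetical
    statement="SELECT * FROM users WHERE country = :country",
    parameters=[StatementParameterListItem(name="country", value="US")],
)
print(resp.status.state)
```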

  • 2 kudos
1 More Replies

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group