Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Erik (Valued Contributor II)
  • 5060 Views
  • 9 replies
  • 10 kudos

Resolved! How to use dbx for local development.

Databricks Connect is a program that allows you to run Spark code locally, while the actual execution happens on a Spark cluster. Notably, it allows you to debug and step through the code locally in your own IDE. Quite useful. But it is now being...
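For anyone landing here, a minimal sketch of the workflow the post describes, assuming the Databricks Connect v2 API (DBR 13+) and auth already configured via a Databricks CLI profile; all names are placeholders:

```python
# Sketch: run Spark code locally while execution happens on a remote cluster.
# Assumes `pip install databricks-connect` and a configured CLI profile
# (or DATABRICKS_HOST / DATABRICKS_TOKEN / cluster id in the environment).
from databricks.connect import DatabricksSession

spark = DatabricksSession.builder.getOrCreate()  # connects to the remote cluster

df = spark.range(10).toDF("n")
# The filter/count runs on the cluster, but you can set breakpoints and step
# through this driver-side code in your local IDE.
print(df.filter("n % 2 = 0").count())
```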

Latest Reply
FeliciaWilliam (Contributor)

I found answers to my questions here

8 More Replies
Himanshu4 (New Contributor II)
  • 1012 Views
  • 4 replies
  • 2 kudos

Inquiry Regarding Enabling Unity Catalog in Databricks Cluster Configuration via API

Dear Databricks Community, I hope this message finds you well. I am currently working on automating cluster configuration updates in Databricks using the API. As part of this automation, I am looking to ensure that Unity Catalog is enabled within ...
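A hedged sketch of what the API side of this can look like with the Python SDK: the `data_security_mode` field is what controls Unity Catalog access modes, and every identifier below is a placeholder.

```python
# Sketch: set a Unity Catalog-capable access mode when editing a cluster via the API.
# Assumes the databricks-sdk package; cluster id and node settings are placeholders.
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.compute import DataSecurityMode

w = WorkspaceClient()
w.clusters.edit(
    cluster_id="<cluster-id>",                    # placeholder
    cluster_name="uc-enabled-cluster",
    spark_version="14.3.x-scala2.12",
    node_type_id="Standard_DS3_v2",               # placeholder node type
    num_workers=2,
    data_security_mode=DataSecurityMode.USER_ISOLATION,  # UC "shared" access mode
)
```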

Latest Reply
Himanshu4 (New Contributor II)

Hi Raphael, can we fetch job details from one workspace and create a new job in a new workspace with the same "job id" and configuration?
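On this follow-up, a sketch under the assumption that the SDK is available against both workspaces: the configuration can be copied, but the target workspace assigns a fresh job ID, so the original "job id" cannot be carried over.

```python
# Sketch: copy a job's settings between workspaces with the databricks-sdk.
# Profile names and the job id are placeholders.
from databricks.sdk import WorkspaceClient

src = WorkspaceClient(profile="source-workspace")
dst = WorkspaceClient(profile="target-workspace")

job = src.jobs.get(job_id=123)  # placeholder id
# Re-post the raw settings through the REST endpoint; the response carries
# the NEW job id chosen by the target workspace.
created = dst.api_client.do("POST", "/api/2.1/jobs/create",
                            body=job.settings.as_dict())
print(created["job_id"])
```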

3 More Replies
fury-kata (New Contributor II)
  • 604 Views
  • 2 replies
  • 0 kudos

ModuleNotFoundError when run with foreachBatch on serverless mode

I'm using Notebooks to do some transformations. I install a new whl: %pip install --force-reinstall /Workspace/<my_lib>.whl %restart_python Then I successfully import the installed lib: from my_lib.core import test However, when I run my code with fo...

Latest Reply
Kaniz_Fatma (Community Manager)

Hi @fury-kata, Make sure that the path to your custom module is correctly added to the Python path (sys.path). You mentioned installing the .whl file, so ensure that the installation path is accessible from your Databricks notebook. Verify that th...
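A sketch of that suggestion, with the package location an assumed placeholder: imports used inside foreachBatch are resolved in the worker process, so making the location importable and importing inside the callback avoids the ModuleNotFoundError in some setups.

```python
# Sketch: make a custom package importable inside a foreachBatch callback.
import sys

def process_batch(batch_df, batch_id):
    pkg_dir = "/Workspace/my_pkg_dir"     # placeholder location of the package
    if pkg_dir not in sys.path:
        sys.path.append(pkg_dir)
    from my_lib.core import test          # import inside the callback, not at top level
    test(batch_df)

(spark.readStream.table("source_table")   # placeholder streaming source
      .writeStream
      .foreachBatch(process_batch)
      .start())
```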

1 More Replies
wilco (New Contributor II)
  • 1201 Views
  • 3 replies
  • 0 kudos

SQL Warehouse: Retrieving SQL ARRAY Type via JDBC driver

Hi all, we are currently running into the following issue: we are using a serverless SQL warehouse; in a Java application we are using the latest Databricks JDBC driver (v2.6.36); we are querying the warehouse with a collect_list function, which should return...

Latest Reply
KTheJoker (Contributor II)

Hey Wilco, The answer is no, ODBC/JDBC don't support complex types so these need to be compressed into strings over the wire (usually in JSON representation) and rehydrated on the client side into a complex object.
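To illustrate the rehydration step (sketched in Python rather than Java for brevity, with an invented value):

```python
# Sketch: an ARRAY column arrives over JDBC/ODBC as a JSON-encoded string;
# the client parses it back into a native collection.
import json

wire_value = '["a", "b", "c"]'    # what a collect_list(...) result looks like on the wire
items = json.loads(wire_value)    # rehydrate into a Python list
assert items == ["a", "b", "c"]
```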

2 More Replies
source2sea (Contributor)
  • 2393 Views
  • 2 replies
  • 0 kudos

Resolved! ERROR RetryingHMSHandler: NoSuchObjectException(message:There is no database named global_temp)

ERROR RetryingHMSHandler: NoSuchObjectException(message:There is no database named global_temp) Should one create it in the workspace manually via the UI? And how? Would it get overwritten if the workspace is created via Terraform? I use the 10.4 LTS runtime.

Latest Reply
ashish2007g (New Contributor II)

I am experiencing a significant delay on my streaming. I am using the change feed connector. It's processing streaming batches very frequently but experiences sudden halts, showing no active stage for a long time. I observed the below exception continuously promp...

1 More Replies
kskistad (New Contributor III)
  • 3524 Views
  • 3 replies
  • 4 kudos

Resolved! Streaming Delta Live Tables

I'm a little confused about how streaming works with DLT. My first question is: what is the difference in behavior if you set the pipeline mode to "Continuous" but in your notebook you don't use the "streaming" prefix on table statements, and simila...
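For reference, a sketch of the two table flavors in the standard DLT Python API, with invented table names: the streaming behavior comes from how the table reads its source, while Continuous vs. Triggered pipeline mode only controls how often updates run.

```python
# Sketch: materialized vs. streaming tables in a DLT pipeline.
import dlt

@dlt.table  # materialized table: recomputed from the full input on each update
def customers_clean():
    return spark.read.table("raw_customers")

@dlt.table  # streaming table: incrementally processes only new input rows
def events_clean():
    return spark.readStream.table("raw_events")
```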

Latest Reply
Harsh141220 (New Contributor II)

Is it possible to have custom upserts in streaming tables in a Delta Live Tables pipeline? Use case: I am trying to maintain a valid session based on a timestamp column and want to upsert to the target table. Tried going through the documentation, but dl...

2 More Replies
PearceR (New Contributor III)
  • 9630 Views
  • 3 replies
  • 1 kudos

Resolved! custom upsert for delta live tables apply_changes()

Hello community :). I am currently implementing some pipelines using DLT. They are working great for my medallion architecture: landed JSON in bronze -> silver (using apply_changes), then materialized gold views on top. However, I am attempting to crea...
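The usual answer for upserts inside a DLT pipeline is the built-in CDC API rather than a hand-written MERGE; here's a sketch assuming the documented apply_changes signature, with table and column names invented:

```python
# Sketch: declarative upsert into a DLT streaming table via apply_changes.
import dlt

dlt.create_streaming_table("silver_customers")

dlt.apply_changes(
    target="silver_customers",
    source="bronze_customers",    # placeholder CDC source table/view
    keys=["customer_id"],         # match rows on this key
    sequence_by="updated_at",     # resolves out-of-order events
    stored_as_scd_type=1,         # overwrite in place (SCD type 1)
)
```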

Latest Reply
Harsh141220 (New Contributor II)

Is it possible to have custom upserts for streaming tables in Delta Live Tables? I'm getting the error: pyspark.errors.exceptions.captured.AnalysisException: `blusmart_poc.information_schema.sessions` is not a Delta table.

2 More Replies
sreeyv (New Contributor II)
  • 434 Views
  • 2 replies
  • 0 kudos

Unable to execute update statement through Databricks Notebook

I am unable to execute update statements through a Databricks Notebook, getting this error message: "com.databricks.sql.transaction.tahoe.actions.InvalidProtocolVersionException: Delta protocol version is too new for this version of the Databricks Runti...

Latest Reply
sreeyv (New Contributor II)

This is resolved; it happens when a column in the table has GENERATED BY DEFAULT AS IDENTITY defined. When you remove this column, it works fine.
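For anyone diagnosing the same error, a hedged sketch: identity columns raise the table's required Delta writer protocol version, so comparing the table's requirements against your runtime can tell you whether to drop the column or upgrade the runtime instead. The table name below is a placeholder.

```python
# Sketch: inspect a Delta table's protocol requirements before updating it.
detail = spark.sql("DESCRIBE DETAIL my_catalog.my_schema.my_table").collect()[0]
print(detail.minReaderVersion, detail.minWriterVersion)
```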

1 More Replies
deepu (New Contributor II)
  • 812 Views
  • 1 reply
  • 1 kudos

performance issue with SIMBA ODBC using SSIS

I was trying to upload data into a table in hive_metastore from SSIS using the Simba ODBC driver. The data set is huge (1.2 million records and 20 columns); it is taking more than 40 minutes to complete. Is there a config change to improve the load time?

Latest Reply
NandiniN (Honored Contributor)

Looks like a slow data upload into a table in hive_metastore using SSIS and the SIMBA ODBC driver. This could be due to a variety of factors, including the size of your dataset and the configuration of your system. One potential solution could be to ...

Ramseths (New Contributor)
  • 485 Views
  • 1 reply
  • 0 kudos

Wrong Path Databricks Repos

In a Databricks environment, I have cloned a repository that I have in Azure DevOps Repos; the repository is inside the path Workspace/Repos/<user_mail>/my_repo. Then, when I create a Python script that I want to call in a notebook using an import: imp...

Latest Reply
NandiniN (Honored Contributor)

Hi @Ramseths , If your notebook and script are in the same path, it would have picked the same relative path. Is your notebook located in /databricks/driver? Thanks!
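A sketch of the common workaround, reusing the placeholder path from the question: put the repo root on sys.path so imports resolve regardless of the notebook's working directory.

```python
# Sketch: make a Repos-cloned repo importable from a notebook.
import sys

repo_root = "/Workspace/Repos/<user_mail>/my_repo"  # placeholder from the question
if repo_root not in sys.path:
    sys.path.append(repo_root)

# from my_script import my_function  # should now resolve from the repo root
```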

JonLaRose (New Contributor III)
  • 1582 Views
  • 3 replies
  • 0 kudos

Adding custom Jars to SQL Warehouses

Hi there, I want to add custom JARs to a SQL warehouse (Pro, if that matters) like I can in an interactive cluster, yet I don't see a way. Is that degraded functionality when transitioning to a SQL warehouse, or have I missed something? Thank you.

Latest Reply
JunYang (New Contributor III)

ADD JAR is SQL syntax for the Databricks Runtime; it does not work for DBSQL/warehouses. DBSQL would throw this error: [NOT_SUPPORTED_WITH_DB_SQL] LIST JAR(S) is not supported on a SQL warehouse. SQLSTATE: 0A000. This feature is not supported as of now....

2 More Replies
pavel_merkle (New Contributor II)
  • 1423 Views
  • 3 replies
  • 0 kudos

Databricks SDK - create new job using JSON

Hello, I am trying to create a job via the Databricks SDK. As input, I use the JSON generated via the Workflows UI (Workflows -> Jobs -> View YAML/JSON -> JSON API -> Create), generating pavel_job.json. When trying to run the SDK function jobs.create as dbk = WorkspaceCli...

Latest Reply
mhiltner (New Contributor III)

It does work for me, used directly in a notebook cell (haven't tried in VS Code). This is my sample JSON: { "name": "Agregacao_Gold", "email_notifications": {}, "webhook_notifications": {}, "timeout_seconds": 0, "max_concurrent_runs": 1, ...
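A sketch of the JSON route, assuming the SDK's low-level request helper is available: jobs.create() wants typed keyword arguments, so posting the UI-exported JSON straight to the REST endpoint sidesteps the conversion. pavel_job.json is the file from the question.

```python
# Sketch: create a job from UI-exported JSON via the REST endpoint.
import json
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
with open("pavel_job.json") as f:
    job_spec = json.load(f)

resp = w.api_client.do("POST", "/api/2.1/jobs/create", body=job_spec)
print(resp["job_id"])
```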

2 More Replies
leungi (Contributor)
  • 1347 Views
  • 6 replies
  • 1 kudos

Resolved! Unable to add column comment in Materialized View (MV)

The following doc suggests the ability to add column comments during MV creation via the `column list` parameter. Thus, the SQL code below is expected to generate a table where the columns `col_1` and `col_2` are commented; however, this is not the ca...

Latest Reply
raphaelblg (Honored Contributor II)

@leungi you've shared the Python language reference. This is the SQL reference on which I've based my example.
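For readers, a sketch of the SQL-reference syntax under discussion, wrapped in spark.sql with invented names: the column comments go in the column list of the CREATE MATERIALIZED VIEW statement itself.

```python
# Sketch: column comments declared in the MV's column list (names are placeholders).
spark.sql("""
    CREATE MATERIALIZED VIEW my_schema.my_mv (
        col_1 COMMENT 'first column',
        col_2 COMMENT 'second column'
    )
    AS SELECT col_1, col_2 FROM my_schema.source_table
""")
```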

5 More Replies
Marcin_U (New Contributor II)
  • 241 Views
  • 1 reply
  • 0 kudos

Making transform on pyspark.sql.Column object outside DataFrame.withColumn method

Hello, I made some transforms on a pyspark.sql.Column object: file_path_splitted = f.split(df[filepath_col_name], '/') # returns a Column object file_name = file_path_splitted[f.size(file_path_splitted) - 1] # returns a Column object Next, I used the variable "file_na...

Latest Reply
raphaelblg (Honored Contributor II)

Hello @Marcin_U , Thank you for reaching out. The transformation you apply within or outside the `withColumn` method will ultimately result in the same Spark plan. The answer is no, it's not possible to have rows mismatch if you're referring to the s...
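A small self-contained sketch of the pattern in question, mirroring the code from the post with an invented DataFrame: the Column expression is built once outside withColumn and then reused.

```python
# Sketch: building Column expressions outside withColumn, then reusing them.
from pyspark.sql import functions as f

df = spark.createDataFrame([("a/b/file.txt",)], ["filepath"])

file_path_splitted = f.split(df["filepath"], "/")               # Column object
file_name = file_path_splitted[f.size(file_path_splitted) - 1]  # Column object

df.withColumn("file_name", file_name).show()  # same Spark plan as inlining it
```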

ckarrasexo (New Contributor II)
  • 984 Views
  • 2 replies
  • 0 kudos

pyspark.sql.connect.dataframe.DataFrame vs pyspark.sql.DataFrame

I noticed that on some Databricks 14.3 clusters, I get DataFrames of type pyspark.sql.connect.dataframe.DataFrame, while on other clusters, also with Databricks 14.3, the exact same code gets DataFrames of type pyspark.sql.DataFrame. pyspark.sql.conne...

Latest Reply
Kaniz_Fatma (Community Manager)

Hi @ckarrasexo, The distinction between pyspark.sql.connect.dataframe.DataFrame and pyspark.sql.DataFrame can be a bit confusing. DataFrame vs. SQL Queries: Both pyspark.sql.connect.dataframe.DataFrame and pyspark.sql.DataFrame represent structur...
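A quick check for which flavor a given cluster hands back, hedged for DBR 14.3 / Spark 3.5 where the two classes are still distinct (shared-access and serverless compute route through Spark Connect):

```python
# Sketch: detect whether the session returned a Spark Connect DataFrame.
from pyspark.sql import DataFrame as ClassicDataFrame

df = spark.range(1)
print(type(df))                          # shows which implementation you got
print(isinstance(df, ClassicDataFrame))  # False on Spark Connect sessions (DBR 14.3)
```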

1 More Replies
