Data Engineering

Forum Posts

Sorted by:

by hfrid • New Contributor

04-11-2023 5:40:27 AM

2832 Views
1 replies
0 kudos

JDBC connector seems to be a bottleneck when trying to insert dataframe to Azure SQL Server

Hi! I am inserting a pyspark dataframe to Azure sql server and it takes a very long time. The database is a s4 but my dataframe that is 17 million rows and 30 columns takes up to 50 minutes to insert.Is there a way to significantly speed this up? I a...

Data Engineering

2832 Views
1 replies
0 kudos

04-11-2023 5:40:27 AM

View Replies

Latest Reply

Anonymous
Not applicable

04-15-2023 6:06:47 PM

0 kudos

@Hjalmar Friden :There are several ways to improve the performance of inserting data into Azure SQL Server using JDBC connector:Increase the batch size: By default, the JDBC connector sends data in batches of 1000 rows at a time. You can increase th...

0 kudos

04-15-2023 6:06:47 PM

by Anonymous • Not applicable

04-11-2023 8:38:56 AM

1346 Views
1 replies
1 kudos

Testing framework using Databricks Notebook and Pytest.

Hi Friends,I am designing a Testing framework using Databricks and pytest. Currently stuck with report generation, that is generating blank with only default parameters only .for ex :-testsuites><testsuite name="pytest" errors="0" failures="0" skippe...

Data Engineering

1346 Views
1 replies
1 kudos

04-11-2023 8:38:56 AM

View Replies

Latest Reply

Anonymous
Not applicable

04-15-2023 6:03:36 PM

1 kudos

@Vijaya Palreddy :There are several testing frameworks available for data testing that you can consider using with Databricks and Pytest:Great Expectations: Great Expectations is an open-source framework that provides a simple way to create and main...

1 kudos

04-15-2023 6:03:36 PM

by gary7135 • New Contributor II

04-11-2023 11:08:07 AM

858 Views
1 replies
0 kudos

Unable to use GridsearchCV from spark-sklearn due to 'fit_params' error

When using GridsearchCV from spark-sklearn, I got GridSearchCV giving " __init__() got an unexpected keyword argument 'fit_params' errorI am using sklearn 1.2.2 and spark-sklearn 0.3.0I think this is because spark-sklearn GridsearchCV still has the f...

Data Engineering

858 Views
1 replies
0 kudos

04-11-2023 11:08:07 AM

View Replies

Latest Reply

Anonymous
Not applicable

04-15-2023 5:59:30 PM

0 kudos

@Gary Mu :Yes, you are correct. The error message you are seeing is likely due to the fact that the fit_params parameter was deprecated in GridSearchCV in sklearn 1.2.2. One possible solution is to use a different version of scikit-learn that is co...

0 kudos

04-15-2023 5:59:30 PM

by 709986 • New Contributor

04-12-2023 2:54:21 AM

425 Views
1 replies
0 kudos

Not able to connect with Salesforce, We need to read data from Salesforce

Not able to connect with Salesforce, We need to read data from Salesforce, we are getting NoClassDefFoundError: scala/Product$classCode:%scalaval sfDF = spark. read. format("com.springml.spark.salesforce"). option("username", "sf...

Data Engineering

425 Views
1 replies
0 kudos

04-12-2023 2:54:21 AM

View Replies

Latest Reply

Anonymous
Not applicable

04-15-2023 5:56:28 PM

0 kudos

@Amar.Kasar :The error you are getting, NoClassDefFoundError: scala/Product$class, suggests that the Scala classpath is not set up correctly. You can try the following steps to troubleshoot the issue:Check if the library com.springml:spark-salesforc...

0 kudos

04-15-2023 5:56:28 PM

by cmilligan • Contributor II

04-11-2023 12:16:48 PM

1087 Views
1 replies
0 kudos

Pull query that inserts into table

I'm trying to pull some data down for table history and am needing to view the query that inserted into a table. My team owns the process so I'm able to view the current query by just viewing it but I'm also wanting to capture changes over time witho...

Data Engineering

1087 Views
1 replies
0 kudos

04-11-2023 12:16:48 PM

View Replies

Latest Reply

Anonymous
Not applicable

04-15-2023 5:48:38 PM

0 kudos

@Coleman Milligan :Yes, in Databricks, you can use the built-in Delta Lake feature to track the history of changes made to a table, including the queries that inserted data into it.Here's an example of how to retrieve the queries that inserted data ...

0 kudos

04-15-2023 5:48:38 PM

by Arty • New Contributor II

04-09-2023 5:10:36 AM

3225 Views
5 replies
6 kudos

Resolved! How to make Autoloader delete files after a successful load

Hi AllCan you please advise how I can arrange loaded file deletion from Azure Storage upon its successful load via Autoloader? As I understood, Spark streaming "cleanSource" option is unavailable for Autoloader, so I'm trying to find the best way to ...

Data Engineering

3225 Views
5 replies
6 kudos

04-09-2023 5:10:36 AM

View Replies

Latest Reply

Anonymous
Not applicable

04-11-2023 6:34:33 AM

6 kudos

Hi @Artem Sachuk Thank you for posting your question in our community! We are happy to assist you.To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers ...

6 kudos

04-11-2023 6:34:33 AM

4 More Replies

by Julie1 • New Contributor II

04-13-2023 11:26:06 AM

1481 Views
2 replies
1 kudos

Resolved! Query data not showing in custom alert notifications and QUERY_RESULT_ROWS

I've set up a custom alert notification for one of my Databricks SQL queries, and it triggers correctly, but I'm not able to get the actual results of the query to appear in the notification email. I've followed the example/template in the custom ale...

Data Engineering

1481 Views
2 replies
1 kudos

04-13-2023 11:26:06 AM

View Replies

Latest Reply

-werners-
Esteemed Contributor III

04-14-2023 12:05:39 AM

1 kudos

The actual query results are not displayed in the alert unfortunately. You can pass the alert condition etc, but not the raw results of the underlying query.I hope this will be added in the future.A workaround is to add a link to the query, so the r...

1 kudos

04-14-2023 12:05:39 AM

1 More Replies

by Mado • Valued Contributor II

01-09-2023 10:37:24 PM

1751 Views
4 replies
3 kudos

Resolved! Streaming Delta Live Table, if I re-run the pipeline, does it append the new data to the current table?

Hi,I have a question about DLT table. Assume that I have a streaming DLT pipeline which reads data from a Bronze table and apply transformation on data. Pipeline mode is triggered. If I re-run the pipeline, does it append new data to the current tabl...

Data Engineering

1751 Views
4 replies
3 kudos

01-09-2023 10:37:24 PM

View Replies

Latest Reply

Anonymous
Not applicable

04-10-2023 6:13:39 AM

3 kudos

@Mohammad Saber :In a Databricks Delta Lake (DLT) pipeline, when you re-run the pipeline in "append" mode, new data will be appended to the existing table. Delta Lake provides built-in support for handling duplicates through its "upsert" functionali...

3 kudos

04-10-2023 6:13:39 AM

3 More Replies

by JJ_ • New Contributor II

01-30-2023 8:03:08 AM

1068 Views
3 replies
0 kudos

ODBC Connection to Another Compute Within the Same Workspace

Hello all!I couldn't find anything definitive related to this issue so I hope I'm not duplicating another topic :). I have imported an R repository that normally runs on another machine and uses ODBC driver to issue sparkSQL commands to a compute (le...

Data Engineering

1068 Views
3 replies
0 kudos

01-30-2023 8:03:08 AM

View Replies

Latest Reply

JJ_
New Contributor II

04-11-2023 2:15:08 AM

0 kudos

Thanks @Suteja Kanuri for your response! I tried all of the steps you mentioned (and many more) but never managed to make it work. My suspicion was that our azure networking setup was preventing this from happening. I have not found this documented ...

0 kudos

04-11-2023 2:15:08 AM

2 More Replies

by a2_ish • New Contributor II

10-04-2022 3:31:34 AM

1181 Views
1 replies
0 kudos

Where are delta lake files stored by given path?

I have below code which works for the path below but fails for path = azure storage account path. i have enough access to write and update the storage account. I would like to know what wrong am I doing and the path below which works , how can i phys...

Data Engineering

1181 Views
1 replies
0 kudos

10-04-2022 3:31:34 AM

View Replies

Latest Reply

Anonymous
Not applicable

04-14-2023 9:39:33 AM

0 kudos

@Ankit Kumar :The error message you received indicates that the user does not have sufficient permission to access the Azure Blob Storage account. You mentioned that you have enough access to write and update the storage account, but it's possible t...

0 kudos

04-14-2023 9:39:33 AM

by vicusbass • New Contributor II

04-13-2023 11:54:50 PM

4932 Views
3 replies
0 kudos

How to extract values from JSON array field?

Hi,While creating an SQL notebook, I am struggling with extracting some values from a JSON array field. I need to create a view where a field would be an array with values extracted from a field like the one below, specifically I need the `value` fi...

Data Engineering

4932 Views
3 replies
0 kudos

04-13-2023 11:54:50 PM

View Replies

Latest Reply

vicusbass
New Contributor II

04-14-2023 9:26:46 AM

0 kudos

Maybe I didn't explain it correctly. The JSON snippet from the description is a cell from a row from a table.

0 kudos

04-14-2023 9:26:46 AM

2 More Replies

by labromb • Contributor

04-14-2023 4:02:45 AM

1447 Views
2 replies
0 kudos

Getting Databricks SQL dashboard to recognise change to an underlying query

Hi CommunityScenario:I have created a query in Databricks SQL, built a number of visualisations from it and published them to a dashboard. I then realise that I need to add another field to the underlying query that I want to then leverage as a dashb...

Data Engineering

1447 Views
2 replies
0 kudos

04-14-2023 4:02:45 AM

View Replies

Latest Reply

youssefmrini
Honored Contributor III

04-14-2023 5:55:18 AM

0 kudos

Can you take a screenshot ?

0 kudos

04-14-2023 5:55:18 AM

1 More Replies

by Hitesh_goswami • New Contributor

04-12-2023 9:25:35 AM

590 Views
1 replies
0 kudos

Upgrading Ipython version without changing LTS version

I am using a specific Pydeeque function called ColumnProfilerRunner which is only supported with Spark 3.0.1, so I must use 7.3 LTS. Currently, I am trying to install "great_expectations" library on Python, which requires Ipython version==7.16.3, an...

Data Engineering

590 Views
1 replies
0 kudos

04-12-2023 9:25:35 AM

View Replies

Latest Reply

Anonymous
Not applicable

04-14-2023 2:36:43 AM

0 kudos

@Hitesh Goswami : please check if the below helps!To upgrade the Ipython version on a Databricks 7.3LTS cluster, you can follow these steps:Create a new library installation command using the Databricks CLI by running the following command in your l...

0 kudos

04-14-2023 2:36:43 AM

by HamidHamid_Mora • New Contributor II

04-14-2023 12:53:03 AM

1213 Views
2 replies
2 kudos

ganglia is unavailable on DBR 13.0

We created a library in databricks to ingest ganglia metrics for all jobs in our delta tables;However end point 8652 is no more available on DBR 13.0is there any other endpoint available ? since we need to log all metrics for all executed jobs not on...

Data Engineering

1213 Views
2 replies
2 kudos

04-14-2023 12:53:03 AM

View Replies

Latest Reply

Hubert-Dudek
Esteemed Contributor III

04-14-2023 2:18:27 AM

2 kudos

Ganglia is only supported on Databricks Runtime versions 12 and below. From Databricks Runtime 13, Ganglia is replaced by a new Databricks metrics system offering more features and integrations. To export metrics to external services, you can use Dat...

2 kudos

04-14-2023 2:18:27 AM

1 More Replies

by JGil • New Contributor III

04-11-2023 11:35:16 PM

1381 Views
5 replies
0 kudos

Installing Bazel on databricks cluster

I am new to azure databricks and I want to install a library on a cluster and to do that I need to install bazel build tool first.I checked the site bazel but I am still not sure how to do it in databricks?I appriciate if any can help me and write me...

Data Engineering

1381 Views
5 replies
0 kudos

04-11-2023 11:35:16 PM

View Replies

Latest Reply

Avinash_94
New Contributor III

04-14-2023 12:25:06 AM

0 kudos

Databricks migrated over from the standard Scala Build Tool (SBT) to using Bazel to build, test and deploy our Scala code. Follow this doc https://www.databricks.com/blog/2019/02/27/speedy-scala-builds-with-bazel-at-databricks.html

0 kudos

04-14-2023 12:25:06 AM

4 More Replies

User

Count

1601

736

343

284

247

Databricks

Forum Posts

JDBC connector seems to be a bottleneck when trying to insert dataframe to Azure SQL Server

Testing framework using Databricks Notebook and Pytest.

Unable to use GridsearchCV from spark-sklearn due to 'fit_params' error

Not able to connect with Salesforce, We need to read data from Salesforce

Pull query that inserts into table

Resolved! How to make Autoloader delete files after a successful load

Resolved! Query data not showing in custom alert notifications and QUERY_RESULT_ROWS

Resolved! Streaming Delta Live Table, if I re-run the pipeline, does it append the new data to the current table?

ODBC Connection to Another Compute Within the Same Workspace

Where are delta lake files stored by given path?

How to extract values from JSON array field?

Getting Databricks SQL dashboard to recognise change to an underlying query

Upgrading Ipython version without changing LTS version

ganglia is unavailable on DBR 13.0

Installing Bazel on databricks cluster

DELTA_EXCEED_CHAR_VARCHAR_LIMIT

Not able to set run_as service_principal_name

Pyspark operations slowness in CLuster 14.3LTS as ...

[Databricks Assets Bundles] Workflow trigger on fi...

Addressing Pipeline Error Handling in Databricks b...