Data Engineering

Forum Posts

hfrid
by New Contributor
  • 2832 Views
  • 1 reply
  • 0 kudos

JDBC connector seems to be a bottleneck when trying to insert dataframe to Azure SQL Server

Hi! I am inserting a PySpark dataframe into Azure SQL Server and it takes a very long time. The database is an S4, but my dataframe of 17 million rows and 30 columns takes up to 50 minutes to insert. Is there a way to significantly speed this up? I a...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Hjalmar Friden: There are several ways to improve the performance of inserting data into Azure SQL Server using the JDBC connector. Increase the batch size: by default, the JDBC connector sends data in batches of 1000 rows at a time. You can increase th...
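
For illustration, a minimal sketch of the batch-size and parallelism tuning described above; the server, database, table, and credentials are placeholders, and `df` stands for the dataframe from the question:

```python
# Sketch: tune the JDBC write path when bulk-inserting into Azure SQL Server.
# URL, table, and credentials are placeholders; `df` is the DataFrame to insert.
jdbc_url = (
    "jdbc:sqlserver://<server>.database.windows.net:1433;"
    "database=<db>;encrypt=true;loginTimeout=30"
)

(df
 .repartition(16)                 # more partitions -> more parallel JDBC connections
 .write
 .format("jdbc")
 .option("url", jdbc_url)
 .option("dbtable", "dbo.target")
 .option("user", "<user>")
 .option("password", "<password>")
 .option("batchsize", 10000)      # default is 1000; larger batches cut round trips
 .mode("append")
 .save())
```

If the plain JDBC path is still too slow, the dedicated Microsoft Spark connector for SQL Server (format "com.microsoft.sqlserver.jdbc.spark") is generally much faster for bulk loads, though it must be installed on the cluster.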

Anonymous
by Not applicable
  • 1346 Views
  • 1 reply
  • 1 kudos

Testing framework using Databricks Notebook and Pytest.

Hi Friends, I am designing a testing framework using Databricks and pytest. I am currently stuck on report generation: the report is generated blank, with only the default parameters, e.g. <testsuites><testsuite name="pytest" errors="0" failures="0" skippe...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

@Vijaya Palreddy: There are several testing frameworks available for data testing that you can consider using with Databricks and pytest. Great Expectations: Great Expectations is an open-source framework that provides a simple way to create and main...
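
A JUnit report containing only the default <testsuite> attributes usually means pytest collected zero tests, so the collection path is worth checking first. A minimal sketch of running pytest with a JUnit XML report from a notebook (test directory and report path are placeholders):

```python
# Sketch: invoke pytest programmatically from a notebook and emit a JUnit XML
# report. The test directory and report path are placeholders.
import pytest

exit_code = pytest.main([
    "/dbfs/tests",                      # directory containing test_*.py files
    "--junitxml=/dbfs/tmp/report.xml",  # JUnit-style XML for report tooling
    "-v",
])
assert exit_code == 0, f"pytest exited with code {exit_code}"
```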

gary7135
by New Contributor II
  • 858 Views
  • 1 reply
  • 0 kudos

Unable to use GridSearchCV from spark-sklearn due to 'fit_params' error

When using GridSearchCV from spark-sklearn, I get an "__init__() got an unexpected keyword argument 'fit_params'" error. I am using sklearn 1.2.2 and spark-sklearn 0.3.0. I think this is because spark-sklearn's GridSearchCV still has the f...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Gary Mu: Yes, you are correct. The error message you are seeing is likely because the fit_params parameter was deprecated and then removed from GridSearchCV well before sklearn 1.2.2. One possible solution is to use a different version of scikit-learn that is co...
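
Since spark-sklearn is no longer maintained, one alternative (an assumption, not the reply's prescription) is scikit-learn's own GridSearchCV distributed over the cluster via the joblib-spark backend:

```python
# Sketch: replace spark-sklearn's GridSearchCV with scikit-learn's own,
# parallelised over the cluster via the joblib-spark backend.
# Requires: %pip install joblibspark
from joblib import parallel_backend
from joblibspark import register_spark
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

register_spark()  # registers the "spark" joblib backend

X, y = load_iris(return_X_y=True)
grid = GridSearchCV(
    RandomForestClassifier(),
    param_grid={"n_estimators": [50, 100], "max_depth": [3, 5]},
    cv=3,
)
# n_jobs left as None so the surrounding backend's n_jobs takes effect.
with parallel_backend("spark", n_jobs=4):
    grid.fit(X, y)
print(grid.best_params_)
```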

709986
by New Contributor
  • 425 Views
  • 1 reply
  • 0 kudos

Not able to connect with Salesforce; we need to read data from Salesforce

Not able to connect with Salesforce. We need to read data from Salesforce, and we are getting NoClassDefFoundError: scala/Product$class. Code:

%scala
val sfDF = spark.
  read.
  format("com.springml.spark.salesforce").
  option("username", "sf...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Amar.Kasar: The error you are getting, NoClassDefFoundError: scala/Product$class, suggests that the Scala classpath is not set up correctly. You can try the following steps to troubleshoot the issue. Check if the library com.springml:spark-salesforc...
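
NoClassDefFoundError: scala/Product$class typically means a library compiled for Scala 2.11 is running on a Scala 2.12 runtime (which DBR 7+ uses), so installing a Scala-2.12 build of the connector, if one exists for your Spark version, should resolve it. A sketch of the read once the matching artifact is installed as a cluster library (credentials and query are placeholders):

```python
# Sketch: read from Salesforce once a connector build matching the cluster's
# Scala version is installed. Credentials and the SOQL query are placeholders.
sf_df = (spark.read
         .format("com.springml.spark.salesforce")
         .option("username", "<sf_user>")
         .option("password", "<sf_password+security_token>")
         .option("soql", "SELECT Id, Name FROM Account")
         .load())
sf_df.show(5)
```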

cmilligan
by Contributor II
  • 1087 Views
  • 1 reply
  • 0 kudos

Pull query that inserts into table

I'm trying to pull some data down for table history and need to view the query that inserted into a table. My team owns the process, so I'm able to view the current query just by viewing it, but I also want to capture changes over time witho...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Coleman Milligan: Yes, in Databricks you can use the built-in Delta Lake feature to track the history of changes made to a table, including the queries that inserted data into it. Here's an example of how to retrieve the queries that inserted data ...
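
The reply's example is truncated above; a minimal sketch of the same idea using Delta table history (the table name is a placeholder, and the exact contents of operationParameters depend on how the write was issued):

```python
# Sketch: inspect Delta table history; operationParameters often carries the
# mode/predicate details of each write. Table name is a placeholder.
from delta.tables import DeltaTable

history = DeltaTable.forName(spark, "my_schema.my_table").history()
(history
 .where("operation IN ('WRITE', 'MERGE')")
 .select("version", "timestamp", "operation", "operationParameters", "userName")
 .show(truncate=False))
```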

Arty
by New Contributor II
  • 3225 Views
  • 5 replies
  • 6 kudos

Resolved! How to make Autoloader delete files after a successful load

Hi All, Can you please advise how I can arrange deletion of a loaded file from Azure Storage upon its successful load via Autoloader? As I understand it, Spark streaming's "cleanSource" option is unavailable for Autoloader, so I'm trying to find the best way to ...

Latest Reply
Anonymous
Not applicable
  • 6 kudos

Hi @Artem Sachuk, Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers ...
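
The accepted answer is in the remaining replies; for illustration only, a sketch of one common workaround pattern (an assumption, not a confirmed answer from this thread): tag each micro-batch with its source files and delete them only after the batch has been written. All paths and table names are placeholders, and on current runtimes it is worth checking whether Auto Loader's cleanSource option now covers this natively.

```python
# Sketch of one workaround: record each micro-batch's source files and delete
# them once the batch is committed. Paths and table names are placeholders.
from pyspark.sql.functions import input_file_name

src = "abfss://landing@<account>.dfs.core.windows.net/events/"
chk = "abfss://landing@<account>.dfs.core.windows.net/_checkpoints/events/"

def process_batch(batch_df, batch_id):
    batch_df = batch_df.withColumn("src_file", input_file_name()).cache()
    batch_df.drop("src_file").write.mode("append").saveAsTable("bronze.events")
    # Delete source files only after the write above has succeeded.
    for row in batch_df.select("src_file").distinct().collect():
        dbutils.fs.rm(row.src_file)
    batch_df.unpersist()

(spark.readStream
 .format("cloudFiles")
 .option("cloudFiles.format", "json")
 .option("cloudFiles.schemaLocation", chk)
 .load(src)
 .writeStream
 .foreachBatch(process_batch)
 .option("checkpointLocation", chk)
 .start())
```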

4 More Replies
Julie1
by New Contributor II
  • 1481 Views
  • 2 replies
  • 1 kudos

Resolved! Query data not showing in custom alert notifications and QUERY_RESULT_ROWS

I've set up a custom alert notification for one of my Databricks SQL queries, and it triggers correctly, but I'm not able to get the actual results of the query to appear in the notification email. I've followed the example/template in the custom ale...

Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

The actual query results are not displayed in the alert, unfortunately. You can pass the alert condition etc., but not the raw results of the underlying query. I hope this will be added in the future. A workaround is to add a link to the query, so the r...

1 More Replies
Mado
by Valued Contributor II
  • 1751 Views
  • 4 replies
  • 3 kudos

Resolved! Streaming Delta Live Table: if I re-run the pipeline, does it append the new data to the current table?

Hi, I have a question about DLT tables. Assume that I have a streaming DLT pipeline which reads data from a Bronze table and applies transformations to the data. The pipeline mode is triggered. If I re-run the pipeline, does it append new data to the current tabl...

Latest Reply
Anonymous
Not applicable
  • 3 kudos

@Mohammad Saber: In a Databricks Delta Live Tables (DLT) pipeline, when you re-run the pipeline in "append" mode, new data will be appended to the existing table. Delta Lake provides built-in support for handling duplicates through its "upsert" functionali...
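
A minimal sketch of the scenario described: a triggered streaming DLT table that, on each run, picks up only new Bronze rows and appends them downstream (table and column names are illustrative):

```python
# Sketch of a triggered streaming DLT table: each pipeline run ingests only
# new Bronze rows and appends them. Names are illustrative.
import dlt
from pyspark.sql.functions import col

@dlt.table(name="silver_events", comment="Incrementally appended from bronze")
def silver_events():
    return (
        dlt.read_stream("bronze_events")       # streaming read: new rows only
        .where(col("event_type").isNotNull())  # example transformation
    )
```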

3 More Replies
JJ_
by New Contributor II
  • 1068 Views
  • 3 replies
  • 0 kudos

ODBC Connection to Another Compute Within the Same Workspace

Hello all! I couldn't find anything definitive related to this issue, so I hope I'm not duplicating another topic :). I have imported an R repository that normally runs on another machine and uses an ODBC driver to issue Spark SQL commands to a compute (le...

Latest Reply
JJ_
New Contributor II
  • 0 kudos

Thanks @Suteja Kanuri for your response! I tried all of the steps you mentioned (and many more) but never managed to make it work. My suspicion was that our Azure networking setup was preventing this from happening. I have not found this documented ...

2 More Replies
a2_ish
by New Contributor II
  • 1181 Views
  • 1 reply
  • 0 kudos

Where are Delta Lake files stored for a given path?

I have the code below, which works for the path shown but fails when the path is an Azure storage account path. I have enough access to write and update the storage account. I would like to know what I am doing wrong, and for the path below which works, how can I phys...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Ankit Kumar: The error message you received indicates that the user does not have sufficient permission to access the Azure Blob Storage account. You mentioned that you have enough access to write and update the storage account, but it's possible t...
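
For illustration, a minimal sketch of authenticating to ADLS Gen2 and writing Delta to an abfss:// path (account, container, and secret names are placeholders; a service principal or cluster-level credential is preferable to an inline key):

```python
# Sketch: authenticate to ADLS Gen2 with an account key and write Delta to an
# abfss:// path. Account/container/secret names are placeholders.
spark.conf.set(
    "fs.azure.account.key.<account>.dfs.core.windows.net",
    dbutils.secrets.get(scope="storage", key="account-key"),
)

path = "abfss://data@<account>.dfs.core.windows.net/delta/events"
df.write.format("delta").mode("overwrite").save(path)

# The table's physical files (parquet + _delta_log/) live at that path:
display(dbutils.fs.ls(path))
```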

vicusbass
by New Contributor II
  • 4932 Views
  • 3 replies
  • 0 kudos

How to extract values from JSON array field?

Hi, While creating an SQL notebook, I am struggling to extract some values from a JSON array field. I need to create a view where a field would be an array of values extracted from a field like the one below; specifically, I need the `value` fi...

Latest Reply
vicusbass
New Contributor II
  • 0 kudos

Maybe I didn't explain it correctly. The JSON snippet in the description is a single cell from a row of a table.
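
For that case, a minimal sketch of parsing such a cell and collecting the `value` fields into an array column; `df`, the column name `json_col`, and the element schema are assumptions based on the description:

```python
# Sketch: extract the `value` field from each element of a JSON-array column.
# `df`, `json_col`, and the element schema are assumptions from the description.
from pyspark.sql.functions import expr, from_json
from pyspark.sql.types import ArrayType, StringType, StructField, StructType

elem = StructType([
    StructField("key", StringType()),
    StructField("value", StringType()),
])

result = (df
          .withColumn("items", from_json("json_col", ArrayType(elem)))
          .withColumn("values", expr("transform(items, x -> x.value)")))
```

Since the question concerns a SQL view, the same logic works inline in SQL: from_json(json_col, 'array<struct<key:string,value:string>>') combined with the transform(...) lambda.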

2 More Replies
labromb
by Contributor
  • 1447 Views
  • 2 replies
  • 0 kudos

Getting Databricks SQL dashboard to recognise change to an underlying query

Hi Community, Scenario: I have created a query in Databricks SQL, built a number of visualisations from it, and published them to a dashboard. I then realise that I need to add another field to the underlying query, which I then want to leverage as a dashb...

Latest Reply
youssefmrini
Honored Contributor III
  • 0 kudos

Can you take a screenshot?

1 More Replies
Hitesh_goswami
by New Contributor
  • 590 Views
  • 1 reply
  • 0 kudos

Upgrading IPython version without changing LTS version

I am using a specific PyDeequ function called ColumnProfilerRunner, which is only supported with Spark 3.0.1, so I must use 7.3 LTS. Currently, I am trying to install the "great_expectations" Python library, which requires IPython version 7.16.3, an...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Hitesh Goswami: Please check if the below helps! To upgrade the IPython version on a Databricks 7.3 LTS cluster, you can follow these steps. Create a new library installation command using the Databricks CLI by running the following command in your l...
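
As a concrete sketch of the version pin, assuming notebook-scoped libraries (which DBR 7.3 LTS supports) are acceptable:

```python
# Sketch: notebook-scoped install pinning IPython to the version that
# great_expectations expects; affects only this notebook's Python environment.
%pip install ipython==7.16.3 great_expectations
```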

HamidHamid_Mora
by New Contributor II
  • 1213 Views
  • 2 replies
  • 2 kudos

Ganglia is unavailable on DBR 13.0

We created a library in Databricks to ingest Ganglia metrics for all jobs into our Delta tables. However, endpoint 8652 is no longer available on DBR 13.0. Is there any other endpoint available? We need to log all metrics for all executed jobs, not on...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 2 kudos

Ganglia is only supported on Databricks Runtime versions 12 and below. From Databricks Runtime 13, Ganglia is replaced by a new Databricks metrics system offering more features and integrations. To export metrics to external services, you can use Dat...

1 More Replies
JGil
by New Contributor III
  • 1381 Views
  • 5 replies
  • 0 kudos

Installing Bazel on a Databricks cluster

I am new to Azure Databricks, and I want to install a library on a cluster; to do that, I need to install the Bazel build tool first. I checked the Bazel site, but I am still not sure how to do it in Databricks. I'd appreciate it if anyone can help me and write me...

Latest Reply
Avinash_94
New Contributor III
  • 0 kudos

Databricks migrated from the standard Scala build tool (sbt) to Bazel to build, test, and deploy its Scala code. See this post: https://www.databricks.com/blog/2019/02/27/speedy-scala-builds-with-bazel-at-databricks.html

4 More Replies