by
hfrid
• New Contributor
- 2832 Views
- 1 replies
- 0 kudos
Hi! I am inserting a pyspark dataframe into Azure SQL Server and it takes a very long time. The database is an S4, but my dataframe, which has 17 million rows and 30 columns, takes up to 50 minutes to insert. Is there a way to significantly speed this up? I a...
Latest Reply
@Hjalmar Friden​: There are several ways to improve the performance of inserting data into Azure SQL Server using the JDBC connector: Increase the batch size: By default, the JDBC connector sends data in batches of 1000 rows at a time. You can increase th...
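A minimal sketch of the batch-size advice above, assuming a cluster-attached notebook where `df` is the dataframe to insert; the server, database, table, and credentials are placeholders, and `batchsize` is the standard Spark JDBC writer option:

```python
# Sketch: faster bulk insert to Azure SQL via the Spark JDBC writer.
# All connection values below are placeholders.
jdbc_url = "jdbc:sqlserver://<server>.database.windows.net:1433;database=<db>"

(df
 .repartition(16)                  # parallel writer connections; tune to the DB tier
 .write
 .format("jdbc")
 .option("url", jdbc_url)
 .option("dbtable", "dbo.events")  # placeholder target table
 .option("user", "<user>")
 .option("password", "<password>")
 .option("batchsize", 100000)      # default is 1000; larger batches cut round trips
 .mode("append")
 .save())
```

Repartitioning controls how many concurrent connections write at once, so both knobs should be tuned together against the database's throughput limits.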
- 1346 Views
- 1 replies
- 1 kudos
Hi Friends, I am designing a testing framework using Databricks and pytest. I am currently stuck with report generation, which is generating blank output with only default parameters. For example: <testsuites><testsuite name="pytest" errors="0" failures="0" skippe...
Latest Reply
@Vijaya Palreddy​: There are several testing frameworks available for data testing that you can consider using with Databricks and pytest: Great Expectations: Great Expectations is an open-source framework that provides a simple way to create and main...
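Independent of any particular framework, a data check in plain pytest style can be sketched like this; the rows and rules are hypothetical stand-ins for data collected from a Databricks table:

```python
# Minimal data-quality checks in plain pytest style.
# `rows` stands in for records collected from a (hypothetical) Databricks table.
rows = [
    {"id": 1, "amount": 10.5},
    {"id": 2, "amount": 3.0},
]

def test_ids_are_unique():
    ids = [r["id"] for r in rows]
    assert len(ids) == len(set(ids))

def test_amounts_non_negative():
    assert all(r["amount"] >= 0 for r in rows)
```

Running pytest with `--junitxml=report.xml` then produces a populated testsuite report, one entry per test function.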
- 858 Views
- 1 replies
- 0 kudos
When using GridSearchCV from spark-sklearn, I got the error "__init__() got an unexpected keyword argument 'fit_params'". I am using sklearn 1.2.2 and spark-sklearn 0.3.0. I think this is because spark-sklearn's GridSearchCV still has the f...
Latest Reply
@Gary Mu​: Yes, you are correct. The error message you are seeing is likely because the fit_params constructor argument was removed from GridSearchCV in modern scikit-learn releases, while spark-sklearn still passes it. One possible solution is to use a different version of scikit-learn that is co...
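In plain scikit-learn (without spark-sklearn), fit-time parameters are passed to `.fit()` rather than the constructor; a small sketch with a toy dataset:

```python
# Sketch: plain scikit-learn GridSearchCV, where fit-time arguments go to
# .fit() instead of the removed `fit_params` constructor argument.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=100, n_features=5, random_state=0)

search = GridSearchCV(LogisticRegression(max_iter=200), {"C": [0.1, 1.0]}, cv=3)
search.fit(X, y)  # fit-time keyword args (e.g. sample_weight=...) would go here
print(search.best_params_)
```

The same grid search can be parallelized over a cluster with newer tooling such as joblib-spark, rather than the unmaintained spark-sklearn.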
- 425 Views
- 1 replies
- 0 kudos
Not able to connect with Salesforce. We need to read data from Salesforce, but we are getting NoClassDefFoundError: scala/Product$class. Code: %scala val sfDF = spark.read.format("com.springml.spark.salesforce").option("username", "sf...
Latest Reply
@Amar.Kasar​: The error you are getting, NoClassDefFoundError: scala/Product$class, suggests that the Scala classpath is not set up correctly. You can try the following steps to troubleshoot the issue: Check if the library com.springml:spark-salesforc...
- 1087 Views
- 1 replies
- 0 kudos
I'm trying to pull some data down for table history and need to view the query that inserted into a table. My team owns the process, so I'm able to view the current query just by opening it, but I also want to capture changes over time witho...
Latest Reply
@Coleman Milligan​: Yes, in Databricks, you can use the built-in Delta Lake feature to track the history of changes made to a table, including the queries that inserted data into it. Here's an example of how to retrieve the queries that inserted data ...
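A minimal sketch of the Delta history lookup, assuming a notebook where `spark` is already defined; the table name `my_table` is a placeholder and `DESCRIBE HISTORY` is the documented Delta Lake command:

```python
# Sketch: inspect the operations recorded in a Delta table's transaction log.
# Assumes a Databricks notebook where `spark` exists; table name is a placeholder.
history = spark.sql("DESCRIBE HISTORY my_table")

# Each row records the version, timestamp, operation type, and its parameters,
# which is enough to see what wrote into the table and when.
(history
 .select("version", "timestamp", "operation", "operationParameters")
 .filter("operation IN ('WRITE', 'MERGE', 'CREATE TABLE AS SELECT')")
 .show(truncate=False))
```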
by
Arty
• New Contributor II
- 3225 Views
- 5 replies
- 6 kudos
Hi All! Can you please advise how I can arrange deletion of loaded files from Azure Storage upon their successful load via Autoloader? As I understood, the Spark Streaming "cleanSource" option is unavailable for Autoloader, so I'm trying to find the best way to ...
Latest Reply
Hi @Artem Sachuk​, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers ...
4 More Replies
by
Julie1
• New Contributor II
- 1481 Views
- 2 replies
- 1 kudos
I've set up a custom alert notification for one of my Databricks SQL queries, and it triggers correctly, but I'm not able to get the actual results of the query to appear in the notification email. I've followed the example/template in the custom ale...
Latest Reply
The actual query results are not displayed in the alert, unfortunately. You can pass the alert condition etc., but not the raw results of the underlying query. I hope this will be added in the future. A workaround is to add a link to the query, so the r...
1 More Replies
by
Mado
• Valued Contributor II
- 1751 Views
- 4 replies
- 3 kudos
Hi, I have a question about DLT tables. Assume that I have a streaming DLT pipeline which reads data from a Bronze table and applies transformations to the data. The pipeline mode is triggered. If I re-run the pipeline, does it append new data to the current tabl...
Latest Reply
@Mohammad Saber​: In a Databricks Delta Live Tables (DLT) pipeline, when you re-run the pipeline in "append" mode, new data will be appended to the existing table. Delta Lake provides built-in support for handling duplicates through its "upsert" functionali...
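The upsert mentioned above can be sketched as a Delta MERGE, assuming a notebook `spark` session; the table names, join key, and columns are placeholders:

```python
# Sketch: Delta "upsert" (MERGE) so re-runs update matching rows instead of
# appending duplicates. Assumes a notebook `spark`; names are placeholders.
spark.sql("""
    MERGE INTO target t
    USING updates u
    ON t.id = u.id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")
```

With a deterministic key in the `ON` clause, re-processing the same batch becomes idempotent: matched rows are overwritten and only genuinely new rows are inserted.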
3 More Replies
by
JJ_
• New Contributor II
- 1068 Views
- 3 replies
- 0 kudos
Hello all! I couldn't find anything definitive related to this issue, so I hope I'm not duplicating another topic :). I have imported an R repository that normally runs on another machine and uses an ODBC driver to issue Spark SQL commands to a compute (le...
Latest Reply
Thanks @Suteja Kanuri​ for your response! I tried all of the steps you mentioned (and many more) but never managed to make it work. My suspicion was that our Azure networking setup was preventing this from happening. I have not found this documented ...
2 More Replies
by
a2_ish
• New Contributor II
- 1181 Views
- 1 replies
- 0 kudos
I have the code below, which works for the path shown but fails when the path is an Azure storage account path. I have enough access to write to and update the storage account. I would like to know what I am doing wrong, and for the path that works, how can I phys...
Latest Reply
@Ankit Kumar​: The error message you received indicates that the user does not have sufficient permission to access the Azure Blob Storage account. You mentioned that you have enough access to write to and update the storage account, but it's possible t...
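One common way to rule out a credentials problem is to set the storage account key explicitly before writing; a minimal sketch, where the account, container, key, and path are placeholders and `spark`/`df` come from the notebook:

```python
# Sketch: authenticate to ADLS Gen2 with an account key before writing.
# Storage account, container, key, and path are placeholders.
spark.conf.set(
    "fs.azure.account.key.<storage-account>.dfs.core.windows.net",
    "<account-key>",
)

path = "abfss://<container>@<storage-account>.dfs.core.windows.net/output"
df.write.mode("overwrite").parquet(path)
```

If this succeeds, the earlier failure was an auth configuration issue rather than a path problem; for production, a service principal or managed identity is preferable to a raw account key.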
- 4932 Views
- 3 replies
- 0 kudos
Hi, while creating a SQL notebook, I am struggling with extracting some values from a JSON array field. I need to create a view where a field would be an array with values extracted from a field like the one below; specifically, I need the `value` fi...
Latest Reply
Maybe I didn't explain it correctly. The JSON snippet from the description is a cell from a row from a table.
2 More Replies
- 1447 Views
- 2 replies
- 0 kudos
Hi Community! Scenario: I have created a query in Databricks SQL, built a number of visualisations from it, and published them to a dashboard. I then realise that I need to add another field to the underlying query, which I want to leverage as a dashb...
- 590 Views
- 1 replies
- 0 kudos
I am using a specific PyDeequ function called ColumnProfilerRunner, which is only supported with Spark 3.0.1, so I must use 7.3 LTS. Currently, I am trying to install the "great_expectations" library on Python, which requires IPython version==7.16.3, an...
Latest Reply
@Hitesh Goswami​: Please check if the below helps! To upgrade the IPython version on a Databricks 7.3 LTS cluster, you can follow these steps: Create a new library installation command using the Databricks CLI by running the following command in your l...
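One simpler, notebook-scoped alternative is to pin the IPython version in the notebook itself before installing the dependent library; a sketch (the `%pip` lines are Databricks notebook magics, each run in its own cell, not plain Python):

```python
# Databricks notebook cells: pin IPython first, then install great_expectations.
# (%pip is a notebook magic; run these in notebook cells, not as a script.)
%pip install ipython==7.16.3
%pip install great_expectations
```

Notebook-scoped `%pip` installs affect only the current notebook session, which avoids disturbing the cluster-wide PyDeequ/Spark 3.0.1 setup.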
- 1213 Views
- 2 replies
- 2 kudos
We created a library in Databricks to ingest Ganglia metrics for all jobs into our Delta tables. However, endpoint 8652 is no longer available on DBR 13.0. Is there another endpoint available? We need to log all metrics for all executed jobs, not on...
Latest Reply
Ganglia is only supported on Databricks Runtime versions 12 and below. From Databricks Runtime 13, Ganglia is replaced by a new Databricks metrics system offering more features and integrations. To export metrics to external services, you can use Dat...
1 More Replies
by
JGil
• New Contributor III
- 1381 Views
- 5 replies
- 0 kudos
I am new to Azure Databricks and I want to install a library on a cluster; to do that, I need to install the Bazel build tool first. I checked the Bazel site, but I am still not sure how to do it in Databricks. I'd appreciate it if anyone could help me and write me...
Latest Reply
Databricks migrated from the standard Scala Build Tool (SBT) to Bazel to build, test, and deploy its Scala code. Follow this doc: https://www.databricks.com/blog/2019/02/27/speedy-scala-builds-with-bazel-at-databricks.html
4 More Replies