Data Engineering

Forum Posts

Leszek
by Contributor
  • 1884 Views
  • 5 replies
  • 11 kudos

Resolved! Runtime SQL Configuration - how to make it simple

Hi, I'm running a couple of notebooks in my pipeline and I would like to set a fixed value of 'spark.sql.shuffle.partitions' - the same value for every notebook. Should I do that by adding spark.conf.set... code in each notebook (Runtime SQL configurations ar...

Latest Reply
Leszek
Contributor
  • 11 kudos

Hi, thank you all for the tips. I tried to set this option in the Spark config before, but it didn't work for some reason. Today I tried again and it's working :).

4 More Replies
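
For reference, a minimal sketch of the two options discussed in this thread: set the value per notebook at runtime, or set it once at the cluster level so every attached notebook inherits it (the value 200 is just a placeholder; spark is the ambient SparkSession in a Databricks notebook):

# Per notebook, as a runtime SQL configuration:
spark.conf.set("spark.sql.shuffle.partitions", "200")

# Or set once in the cluster's Spark config (Compute > Advanced Options > Spark),
# so all notebooks attached to the cluster share the same value:
# spark.sql.shuffle.partitions 200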
Kaniz
by Community Manager
  • 936 Views
  • 3 replies
  • 2 kudos
Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 2 kudos

As @Kaniz Fatma​ wrote, you can use native functions for it:

df = spark.read.format("csv").option("header", "true").load("file.csv")

An alternative, really nice way is to use SQL syntax for that:

%sql
CREATE TEMPORARY VIEW diamonds USING CSV OPTIONS (path "/...

2 More Replies
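
A hedged, runnable sketch of both approaches from the reply (the file path and the header option are assumptions, since the original snippet is truncated):

# Native DataFrame reader:
df = spark.read.format("csv").option("header", "true").load("/mnt/data/diamonds.csv")

# Equivalent SQL syntax, runnable from Python as well as from a %sql cell:
spark.sql("""
    CREATE OR REPLACE TEMPORARY VIEW diamonds
    USING CSV
    OPTIONS (path "/mnt/data/diamonds.csv", header "true")
""")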
SRS
by New Contributor II
  • 1990 Views
  • 3 replies
  • 5 kudos

Resolved! Delta Tables incremental backup method

Hello, has anyone tried to create an incremental backup of Delta tables? What I mean is to load into the backup storage only the latest Parquet files that are part of the Delta table, and to refresh the _delta_log folder, instead of copying the whole files aga...

Latest Reply
jose_gonzalez
Moderator
  • 5 kudos

Hi @Stefan Stegaru​, you can use Delta time travel to query the data that was just added in a specific version. Then, as @Hubert Dudek​ mentioned, you can copy over this subset of data to a new table or a new location. You will need to do a deep...

2 More Replies
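
A minimal sketch of the time-travel approach described above; the table path, backup path, and version number are hypothetical, and spark is the ambient SparkSession:

# Latest state of the table, plus an earlier version for comparison:
current = spark.read.format("delta").load("/mnt/source/events")
previous = (spark.read.format("delta")
            .option("versionAsOf", 41)   # hypothetical: last version already backed up
            .load("/mnt/source/events"))

# Rows added since that version are the increment to copy to backup storage:
increment = current.exceptAll(previous)
increment.write.format("delta").mode("append").save("/mnt/backup/events")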
Kaniz
by Community Manager
  • 762 Views
  • 1 reply
  • 1 kudos
Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 1 kudos

"Zookeeper keeps track of status of the Kafka cluster nodes and it also keeps track of Kafka topics, partitions etc. Zookeeper it self is allowing multiple clients to perform simultaneous reads and writes and acts as a shared configuration service wi...

Anonymous
by Not applicable
  • 1616 Views
  • 4 replies
  • 2 kudos

Resolved! Anyone using RAPIDS and cuGraph on a current runtime?

We're in the process of migrating a large graph computation workload to NVIDIA RAPIDS + cuGraph for GPU acceleration. The package isn't part of the base runtime and is available via conda package management only, so it can't be installed via init sc...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Thanks @Prabakar Ammeappin​, we're looking at this. Strangely, the last commit removed the RAPIDS libraries from the base cuda-images. We're adding them back in.

3 More Replies
JK2021
by New Contributor III
  • 2323 Views
  • 5 replies
  • 3 kudos

Resolved! Exception handling in Databricks

We are planning to customise code on Databricks to call the Salesforce Bulk API 2.0 to load data from a Databricks Delta table to Salesforce. My question is: can all the exception handling, retries, and everything around the Bulk API be coded explicitly in Databricks...

Latest Reply
Prabakar
Esteemed Contributor III
  • 3 kudos

Hi @Jazmine Kochan​ , I haven't tried Salesforce bulk API 2.0 to load data. But in theory, it should be fine.

4 More Replies
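
A generic sketch of the retry-and-error-handling pattern being asked about, not tied to the Salesforce Bulk API; the endpoint, payload, and retry limits are hypothetical:

import time
import requests

def post_with_retries(url, payload, max_retries=3, backoff_s=2.0):
    # Retries transient HTTP failures with a simple linear backoff.
    for attempt in range(1, max_retries + 1):
        try:
            resp = requests.post(url, json=payload, timeout=60)
            resp.raise_for_status()
            return resp.json()
        except requests.RequestException:
            if attempt == max_retries:
                raise  # out of retries: let the job surface the error
            time.sleep(backoff_s * attempt)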
yatharth29
by New Contributor II
  • 2835 Views
  • 5 replies
  • 1 kudos

How can I extract the completion time and status (Failed or Succeeded) into a table every time my Databricks job finishes running?

I want to get a mail notification at the end of each day when my Databricks job has finished running, and for that I need to extract the time of its completion and its status. How can I achieve that?

Latest Reply
Prabakar
Esteemed Contributor III
  • 1 kudos

Hi @Yatharth Kaushik​, you can use the Jobs runs list API to get all the information about a job run. You can write code to extract the information that you need for the table. There are multiple APIs in the same doc that you can use to get information a...

4 More Replies
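
A hedged sketch of calling the runs list endpoint mentioned in the reply; the workspace URL, token, and job_id are placeholders, and field names follow the Jobs API 2.1 documentation:

import requests

host = "https://<your-workspace>.cloud.databricks.com"   # placeholder
token = "<personal-access-token>"                        # placeholder

resp = requests.get(
    f"{host}/api/2.1/jobs/runs/list",
    headers={"Authorization": f"Bearer {token}"},
    params={"job_id": 123, "completed_only": "true"},    # placeholder job_id
)
resp.raise_for_status()
for run in resp.json().get("runs", []):
    # end_time is epoch milliseconds; result_state is e.g. SUCCESS or FAILED
    print(run["end_time"], run["state"].get("result_state"))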
RantoB
by Valued Contributor
  • 4626 Views
  • 5 replies
  • 0 kudos

Resolved! SSLCertVerificationError - how to disable SSL certificate verification

Hi, how is it possible to disable SSL certificate verification? With the Databricks API I got this error:

SSLCertVerificationError: ("hostname 'https' doesn't match either of '*.numericable.fr', 'numericable.fr'",)
MaxRetryError: HTTPS...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Bertrand BURCKER​ - Thanks for letting us know your issue is resolved. If @Prabakar Ammeappin​'s answer solved the problem, would you be happy to mark his answer as best so others can more easily find an answer for this?

4 More Replies
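
If disabling verification really is required (for debugging only; the proper fix is a valid certificate or the correct CA bundle), a minimal sketch with the requests library, using a placeholder host and token:

import requests
import urllib3

# Suppress the warning urllib3 emits for unverified HTTPS calls:
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

resp = requests.get(
    "https://<databricks-host>/api/2.0/clusters/list",   # placeholder endpoint
    headers={"Authorization": "Bearer <token>"},         # placeholder token
    verify=False,  # skips SSL certificate verification
)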
Kaniz
by Community Manager
  • 3067 Views
  • 1 reply
  • 3 kudos
Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 3 kudos

To initialize the Athena client you need to import the boto3 library:

import boto3
client = boto3.client('athena')

You will then execute your query:

queryStart = client.start_query_execution(
    QueryString = 'SELECT * FROM myTable',
    QueryExecutionContext = { ...

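
A fuller sketch of the same pattern, since the snippet above is truncated; the database name and S3 output location are hypothetical:

import boto3

client = boto3.client('athena')
queryStart = client.start_query_execution(
    QueryString='SELECT * FROM myTable',
    QueryExecutionContext={'Database': 'my_database'},                         # hypothetical
    ResultConfiguration={'OutputLocation': 's3://my-bucket/athena-results/'},  # hypothetical
)
# start_query_execution returns a QueryExecutionId that can be polled:
status = client.get_query_execution(QueryExecutionId=queryStart['QueryExecutionId'])
print(status['QueryExecution']['Status']['State'])  # QUEUED, RUNNING, SUCCEEDED, ...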
Sarvagna_Mahaka
by New Contributor III
  • 9895 Views
  • 6 replies
  • 6 kudos

Resolved! Exporting csv files from Databricks

I'm trying to export a csv file from my Databricks workspace to my laptop. I have followed the below steps:
1. Installed Databricks CLI
2. Generated token in Azure Databricks
3. databricks configure --token
5. Token: xxxxxxxxxxxxxxxxxxxxxxxxxx
6. databrick...

Latest Reply
User16871418122
Contributor III
  • 6 kudos

Hi @Sarvagna Mahakali​, there is an easier hack: a) You can save the results locally on disk and create a hyperlink for downloading the CSV. You can copy the file to this location: dbfs:/FileStore/table1_good_2020_12_18_07_07_19.csv b) Then download with...

5 More Replies
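
A sketch of the FileStore trick from the reply; the source path and workspace hostname are placeholders, and dbutils is the ambient utility object in a Databricks notebook:

# Copy the result file somewhere under dbfs:/FileStore/, which the workspace
# serves over HTTPS at the /files/ path:
dbutils.fs.cp("dbfs:/tmp/results/part-00000.csv", "dbfs:/FileStore/my_export.csv")

# Then download it in a browser at (hostname is a placeholder):
# https://<your-workspace>/files/my_export.csv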
DB_007
by New Contributor III
  • 5017 Views
  • 8 replies
  • 4 kudos

Resolved! Databricks SQL not displaying all the databases that I have on my cluster.

I have a cluster running on 7.3 LTS and it has about 35+ databases. When I tried to set up an endpoint on Databricks SQL, I did not see any databases listed.

Latest Reply
User16871418122
Contributor III
  • 4 kudos

Hi @Arif Ali​, you may have to check the data access config to add the params for the external metastore:

spark.hadoop.javax.jdo.option.ConnectionDriverName org.mariadb.jdbc.Driver
spark.hadoop.javax.jdo.option.ConnectionUserName <mysql-username>
spark.had...

7 More Replies
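
For reference, the data access config for an external Hive metastore typically includes the full set of connection parameters sketched below (all values are placeholders; check the external metastore documentation for your metastore version):

spark.hadoop.javax.jdo.option.ConnectionDriverName org.mariadb.jdbc.Driver
spark.hadoop.javax.jdo.option.ConnectionURL jdbc:mysql://<host>:3306/<metastore-db>
spark.hadoop.javax.jdo.option.ConnectionUserName <mysql-username>
spark.hadoop.javax.jdo.option.ConnectionPassword <mysql-password>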
sarvesh
by Contributor III
  • 2098 Views
  • 5 replies
  • 8 kudos

Catch rejected data (rows) while reading with Apache Spark.

I work with Spark and Scala, and I receive data in different formats (.csv/.xlsx/.txt etc.). When I try to read/write this data from different sources to any database, many records get rejected due to various issues like (special characters, data type ...

Latest Reply
-werners-
Esteemed Contributor III
  • 8 kudos

Or maybe schema evolution on Delta Lake is enough, in combination with Hubert's answer.

4 More Replies
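
A sketch of one common way to capture rejected rows at read time: PERMISSIVE mode with a corrupt-record column. The thread is Spark-Scala, but the options are the same; shown here in PySpark with a hypothetical schema and path:

from pyspark.sql.types import StructType, StructField, IntegerType, StringType

schema = StructType([
    StructField("id", IntegerType()),
    StructField("name", StringType()),
    StructField("_corrupt_record", StringType()),  # rows that fail to parse land here
])

df = (spark.read.format("csv")
      .option("header", "true")
      .option("mode", "PERMISSIVE")
      .option("columnNameOfCorruptRecord", "_corrupt_record")
      .schema(schema)
      .load("/mnt/raw/input.csv"))

df.cache()  # Spark requires caching before queries that touch only the corrupt-record column
rejected = df.filter(df["_corrupt_record"].isNotNull())  # inspect or persist these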
Nick_Hughes
by New Contributor III
  • 3960 Views
  • 8 replies
  • 3 kudos

Resolved! Formatting on Databricks Alerts

Hi guys. I have looked at the formatting options and I'm still struggling to work out how best to format the email body of a Databricks alert. I want to be able to selectively choose columns from the query and display them in a table. Or even if I ca...

Latest Reply
Prabakar
Esteemed Contributor III
  • 3 kudos

Hi @Nick Hughes​, unfortunately, this is not available for now. We have a feature request for the same: DB-I-4105 - SQL Alerts: Formatting message body when creating Custom Template. This feature has been considered by our product team and it will be...

7 More Replies