Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Raghu_Bindingan
by New Contributor III
  • 15477 Views
  • 2 replies
  • 0 kudos

Resolved! SQL Merge Statement not working

Hi, I am trying to use the SQL MERGE statement on Databricks:
MERGE INTO target
USING source
ON source.key = target.key
WHEN MATCHED UPDATE SET *
WHEN NOT MATCHED INSERT *
WHEN NOT MATCHED BY SOURCE DELETE
This is failing with the error [PARSE_SYNTAX_ERROR...

Latest Reply
Raghu_Bindingan
New Contributor III
  • 0 kudos

I was missing the THEN before UPDATE, INSERT and DELETE. This keyword is missing from the Databricks documentation (https://learn.microsoft.com/en-us/azure/databricks/delta/merge). It now works. Thanks
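For readers hitting the same parse error, here is a minimal sketch of the corrected statement, built as a Python string. The names `target`, `source` and `key` come from the question; the helper `build_merge_sql` is hypothetical, and the commented-out `spark.sql(...)` call assumes a Databricks notebook where `spark` is already defined and a runtime that supports WHEN NOT MATCHED BY SOURCE.

```python
def build_merge_sql(target: str, source: str, key: str) -> str:
    """Return a MERGE statement with THEN before each action (hypothetical helper)."""
    return "\n".join([
        f"MERGE INTO {target}",
        f"USING {source}",
        f"ON {source}.{key} = {target}.{key}",
        # THEN is required between the WHEN condition and the action.
        "WHEN MATCHED THEN UPDATE SET *",
        "WHEN NOT MATCHED THEN INSERT *",
        "WHEN NOT MATCHED BY SOURCE THEN DELETE",
    ])


merge_sql = build_merge_sql("target", "source", "key")
print(merge_sql)

# In a Databricks notebook (assumption: `spark` is the active session):
# spark.sql(merge_sql)
```

The only change from the statement in the question is the THEN keyword in each of the three WHEN clauses.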

1 More Replies
Rexton
by New Contributor
  • 8242 Views
  • 3 replies
  • 2 kudos

AWS Databricks Pyspark - Unable to connect to Azure MySQL - Shows "SSL Connection is required"

Even after specifying SSL options, I am unable to connect to MySQL. What could have gone wrong? Has anyone experienced similar issues?
df_target_master = spark.read.format("jdbc")\
    .option("driver", "com.mysql.jdbc.Driver")\
    .option("url", host_url)\
    .optio...

Latest Reply
a2barbosa
New Contributor II
  • 2 kudos

Hey, here is the solution: the correct option for SSL is "useSSL" and not just "ssl". The code below should work:
df_target_master = spark.read.format("jdbc")\
    .option("driver", "com.mysql.jdbc.Driver")\
    .option("url", host_url)\
    .option("dbtable", supply_ma...
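Since the reply above is cut off, the following is only a sketch of how the suggested option could be wired up, not the poster's full code. The helper names, the host URL, the table name and the credentials are all hypothetical placeholders; only the driver class and the "useSSL" option name come from the thread. It relies on the Spark JDBC reader passing extra options through to the driver as connection properties.

```python
def mysql_jdbc_options(host_url: str, table: str, user: str, password: str) -> dict:
    """Collect JDBC reader options, using "useSSL" rather than "ssl" (hypothetical helper)."""
    return {
        "driver": "com.mysql.jdbc.Driver",  # driver class quoted in the thread
        "url": host_url,
        "dbtable": table,
        "user": user,
        "password": password,
        "useSSL": "true",  # option name suggested in the reply
    }


def read_mysql_table(spark, options: dict):
    """Load the table through the JDBC source; `spark` is an existing SparkSession."""
    return spark.read.format("jdbc").options(**options).load()


# Placeholder values for illustration only.
opts = mysql_jdbc_options(
    "jdbc:mysql://example-host:3306/example_db",
    "example_table",
    "example_user",
    "example_password",
)
print(sorted(opts))

# In a notebook (assumption: `spark` is defined):
# df_target_master = read_mysql_table(spark, opts)
```

Keeping the options in a dict makes it easy to check which SSL-related key is actually being sent before the connection is attempted.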

2 More Replies
Punnu
by New Contributor II
  • 2654 Views
  • 1 replies
  • 1 kudos

Error while running spark.catalog.listDatabases()

I am running the steps mentioned in https://github.com/databrickslabs/splunk-integration/blob/master/notebooks/source/push_to_splunk.py. When I run spark.catalog.listDatabases(), I get the error py4j.security.Py4JSecurityException: Method public java.l...

Latest Reply
pvignesh92
Honored Contributor
  • 1 kudos

Hi @Purnima Bhatia, I faced a similar error for a different command when I was using the wrong cluster access mode. You can try creating a cluster with a different access mode and check. I might be wrong, but it is worth trying.

Manju1202
by New Contributor II
  • 4486 Views
  • 3 replies
  • 1 kudos

Saving Number field as String in Databricks

Do we see any risk in saving a Number field as a String? Will we lose any functionality/feature if we save it as a String? Will it have any impact on performance?

Latest Reply
pvignesh92
Honored Contributor
  • 1 kudos

Hi @Manju Chugani. Yes. In short, it is not recommended to save columns as strings if all the values are expected to be numbers. Here are some of the risks:
Storage space: Storing numbers as strings can take up more storage space than storing the...
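To make the storage point concrete, here is a small plain-Python sketch (no Spark required). The byte counts compare a fixed-width 32-bit integer with its UTF-8 text form and are only illustrative, since real table formats apply their own encoding and compression. The sorting comparison is an additional consequence added here for illustration; it is not taken from the truncated reply.

```python
import struct

values = [9, 10, 100]
as_strings = [str(v) for v in values]

# Numeric sort follows magnitude; string sort is character by character.
numeric_order = sorted(values)
string_order = sorted(as_strings)

# Raw size of one value as a 32-bit integer versus as UTF-8 text.
n = 1234567890
int_bytes = len(struct.pack("<i", n))
str_bytes = len(str(n).encode("utf-8"))

print(numeric_order)         # [9, 10, 100]
print(string_order)          # ['10', '100', '9']
print(int_bytes, str_bytes)  # 4 10
```

The same lexicographic ordering applies to range filters and comparisons on string columns, so numeric columns stored as strings usually need an explicit cast before they behave as numbers.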

2 More Replies
nicole_wong
by Databricks Employee
  • 16060 Views
  • 10 replies
  • 7 kudos

Resolved! Can Terraform be used to set configurations in Admin / workspace settings?

I am posting this on behalf of my customer. They are currently working on the deployment & config of their workspace on AWS via Terraform. Is it possible to set some configs in the Admin/workspace settings via TF? According to the Terraform module, it...

Latest Reply
francly
New Contributor II
  • 7 kudos

Hi, can I get a full list of the workspace_conf settings that are currently configurable via Terraform? I can't find the list on the Terraform registry site.

9 More Replies
johnb1
by Contributor
  • 3971 Views
  • 3 replies
  • 0 kudos

Cluster Configuration for ML Model Training

Hi! I am training a Random Forest (pyspark.ml.classification.RandomForestClassifier) on Databricks with 1,000,000 training examples and 25 features. I employ a cluster with one driver (16 GB Memory, 4 Cores), 2-6 workers (32-96 GB Memory, 8-24 Cores),...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @John B, hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so we can...

2 More Replies
ALIDI
by New Contributor II
  • 3066 Views
  • 3 replies
  • 3 kudos

training_set.load_df().toPandas() fails with the new pandas version (2.0.0)

pandas 2.0.0 was released on April 3, 2023 and was pushed to my cluster on the same day. The day after, I tried using training_set.load_df().toPandas() and it failed. Reverting to pandas 1.5.3 fixed the problem.

Latest Reply
Anonymous
Not applicable
  • 3 kudos

Hi @Al IDI, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your q...

2 More Replies
RDD1
by New Contributor III
  • 2474 Views
  • 3 replies
  • 0 kudos

Hi, I have completed lakehouse fundamentals accreditation, but did not receive the badge yet, only have the certificate of completion.

Hi, I have completed lakehouse fundamentals accreditation, but did not receive the badge yet, only have the certificate of completion.

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @RD DO, apologies, we have an issue with our credentials app. We are working with the vendor to resolve it. We expect to be able to grant your badge soon. Thank you!

2 More Replies
Stephanraj
by Databricks Partner
  • 5648 Views
  • 2 replies
  • 0 kudos

TPC-DI Benchmark in Databricks (ETL)

Similar to the TPC-DS and TPC-H benchmarks, is it possible to run the TPC-DI benchmark in Databricks? How feasible is it to run TPC-DI? Please let me know if there is any existing ETL tool available in Databricks for TPC-DI. For referen...

Latest Reply
Shannon_Barrow
Databricks Employee
  • 0 kudos

Hello, we recently made public a repo that makes it easy to run the TPC-DI benchmark end-to-end - in a variety of ways. The repo can be found here. Feel free to let me know if you have any issues running it!

1 More Replies
codeexplorer
by New Contributor II
  • 11253 Views
  • 4 replies
  • 0 kudos

Update record in databricks sql table from C#.Net in visual studio 2022 using ODBC

I am trying to make a backend method call work which connects to the database and updates a record in a table. The method call completes without throwing any error, but it does not update any record in the table either. Note:...

Latest Reply
codeexplorer
New Contributor II
  • 0 kudos

I found a temporary workaround. Instead of passing the value through the parameters, I passed the value directly in the query as shown below. I know it is not the ideal way, but at this time it is working. If I do not pass the value as shown below, the log...

3 More Replies
Vijesh
by New Contributor II
  • 5861 Views
  • 5 replies
  • 1 kudos

parsing error in Databricks SQL endpoint

I have two tables, EMPLOYEE and EMPLOYEE_ROLE. I'm trying to update a column with a value from another column. I'm using a SQL Server-style join, but I get an error: [parse_syntax_error] Syntax error at or near 'FROM' line 3.
UPDATE C
SET C.title = B.title
FROM ...

Latest Reply
sensanjoy
Contributor II
  • 1 kudos

Hi @Vijesh V, try using MERGE INTO to perform CDC between tables:
MERGE INTO target a
USING source b
ON {merge_condition}
WHEN MATCHED THEN {matched_action}
WHEN NOT MATCHED THEN {not_matched_action}

4 More Replies
tech2cloud
by New Contributor II
  • 4694 Views
  • 3 replies
  • 2 kudos
Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Ravi Vishwakarma, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answ...

2 More Replies
PraveenC
by New Contributor II
  • 4779 Views
  • 4 replies
  • 3 kudos

[Databricks][JDBC](10400) Invalid type for data - column: 10, type: Array

I am getting the error below while mapping an Array column to a String[] entity. Please advise whether Databricks JDBC supports entity mapping of Array values. [The same code worked with the config below - H2 DB version - 2.1.214 and org.hibernate.dialect.H2Dialect - ...

Latest Reply
Atanu
Databricks Employee
  • 3 kudos

Hello @Emmanuel Trindade @Praveen C, this does not look like it is coming from the Databricks end. Look at the error thread:
javax.persistence.PersistenceException: org.hibernate.exception.DataException: Could not read entity state from ResultSet : EntityKey...

3 More Replies
brickster_2018
by Databricks Employee
  • 2196 Views
  • 1 replies
  • 0 kudos
Latest Reply
sagnikml
New Contributor III
  • 0 kudos

Delta Sharing is an open protocol developed by Databricks for secure data sharing with other organizations. The primary concepts underlying Delta Sharing in Databricks are shares and recipients. A share is a read-only collection of tables and table p...

MRTN
by Contributor
  • 15362 Views
  • 4 replies
  • 3 kudos

Load CSV files with slightly different schemas

I have a set of CSV files generated by a system, where the schema has evolved over the years. Some columns have been added, and at least one column has been renamed in newer files. Is there any way to elegantly load these files into a dataframe? I ha...

Latest Reply
MRTN
Contributor
  • 3 kudos

For reference, for anybody struggling with the same issue: all online examples using Auto Loader are written as one block statement of the form:
(spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "csv")
    # The schema location di...

3 More Replies