Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Harsha777
by New Contributor III
  • 1364 Views
  • 1 reply
  • 0 kudos

Casting a String (containing number in EU format) to a Decimal

Hi, I have a string column containing a number in EU format (comma instead of dot as the decimal separator), e.g. 10,35. I need to convert this string into a proper decimal data type as part of the data transformation into the target table. I could do it as below by replacing the ",...

Latest Reply
filipniziol
Esteemed Contributor
  • 0 kudos

Hi @Harsha777, Your solution looks good! However, you may also try the to_number function, but unfortunately you will still need to first replace "," with ".". from pyspark.sql.functions import to_number, regexp_replace, lit data = [("10,6523",), ("10,23"...
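The reply's snippet is truncated above; here is a minimal sketch of that approach, assuming a small illustrative DataFrame (column name and decimal format string are made up for the example):

```python
# Sketch of the to_number approach from the reply; column name and format are illustrative.
from pyspark.sql.functions import regexp_replace, to_number, lit

df = spark.createDataFrame([("10,6523",), ("10,23",)], ["amount_eu"])

# Swap the EU decimal comma for a dot, then parse the string into a decimal.
df_decimal = df.withColumn(
    "amount",
    to_number(regexp_replace("amount_eu", ",", "."), lit("999.9999"))
)

df_decimal.printSchema()  # amount: decimal(7,4)
```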

ibrahim2124
by New Contributor
  • 1632 Views
  • 1 reply
  • 0 kudos

Facing Issue in Importing Delta Live Tables - Databricks Runtime 14.3

I am facing issues while importing the dlt library in Databricks Runtime 14.3. Previously, while using Runtime 13.1, `import dlt` was working fine, but after updating the Runtime it is giving me an error. This is the Cluster's Configuration. Also ...

Latest Reply
upatint07
New Contributor II
  • 0 kudos

@Retired_mod Do you have any solution for the above problem? I saw your reply in this thread: https://community.databricks.com/t5/data-engineering/no-module-named-dlt/td-p/21105, so I am asking you. Thank you!
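For context, a common cause of this error is running the notebook interactively instead of as part of a Delta Live Tables pipeline; the dlt module is only made available to pipeline runs. A minimal sketch of a pipeline notebook (the source path is a placeholder):

```python
# Minimal DLT sketch: this import only resolves when the notebook is executed by a
# Delta Live Tables pipeline, not when attached to an interactive all-purpose cluster.
import dlt

@dlt.table(comment="Illustrative table; the source path below is a placeholder.")
def raw_events():
    return spark.read.format("json").load("/Volumes/main/default/raw_events/")
```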

Atul-Kumar
by New Contributor II
  • 1407 Views
  • 4 replies
  • 1 kudos

XML file Load to Delta table with different fields list

Hi there, I have a scenario where the source XML files may have all the fields, or maybe only 80% of the fields in the next run. How do we load the files into Delta tables so that XML files with the full field list and files with only a few fields are both handled? In smalle...

Latest Reply
Atul-Kumar
New Contributor II
  • 1 kudos

Auto Loader is not an acceptable solution in my case. I tried to create an empty table using the XSD file and then load the data frame. Somehow it worked to meet the objective.
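For readers landing here, a rough sketch of the pattern this reply describes (field names, row tag, paths, and table name are placeholders): declare the full field list as an explicit schema, derived from the XSD, so files missing some fields still load with nulls.

```python
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

# Full field list (e.g., derived from the XSD); files missing a field get nulls.
full_schema = StructType([
    StructField("id", IntegerType(), True),
    StructField("name", StringType(), True),
    StructField("category", StringType(), True),     # sometimes absent
    StructField("description", StringType(), True),  # sometimes absent
])

df = (spark.read
      .format("xml")                # Databricks native XML reader / spark-xml
      .option("rowTag", "record")   # placeholder row tag
      .schema(full_schema)
      .load("/mnt/source/xml/"))    # placeholder path

(df.write
   .format("delta")
   .mode("append")
   .option("mergeSchema", "true")   # tolerate future additive changes
   .saveAsTable("bronze.xml_records"))  # placeholder table name
```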

3 More Replies
Brahmareddy
by Esteemed Contributor
  • 5230 Views
  • 0 replies
  • 2 kudos

How I Tuned Databricks Query Performance from Power BI Desktop

As someone who frequently works with large datasets in Power BI, I’ve had my fair share of frustrations with slow query performance, especially when pulling data from Databricks. After countless hours of tweaking and experimenting, I’ve finally found...

suela
by New Contributor II
  • 1471 Views
  • 3 replies
  • 2 kudos

Resolved! Not able to get a SQL warehouse in Databricks AWS

Hey folks, I have been trying to set up a SQL warehouse for Databricks on AWS (on a new account) but I keep getting this: "Cluster Start-up Delayed. Please wait while we continue to try and start the cluster. No action is required from you." This kept h...

Latest Reply
suela
New Contributor II
  • 2 kudos

Hey Brahma, Thanks for the hints. I actually tried a lot of things back and forth, and the only thing that finally worked was to create a new workspace. Since this was a new account, that was easy. I suppose something must have gone wrong with the con...

2 More Replies
rpilli
by New Contributor
  • 3239 Views
  • 0 replies
  • 0 kudos

Conditional Execution in DLT Pipeline based on the output

Hello, I'm working on a Delta Live Tables (DLT) pipeline where I need to implement a conditional step that only triggers under specific conditions. Here's the challenge I'm facing: I have a function that checks if the data meets certain thresholds. If...
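The excerpt is cut off, so this may not match the poster's exact requirement, but one common way to gate a DLT pipeline on a data-quality threshold is with expectations: expect_or_fail stops the update when the condition is violated, while expect_or_drop just filters rows. Table and column names below are illustrative:

```python
import dlt

THRESHOLD = 1000  # illustrative threshold

@dlt.table
def source_metrics():
    return spark.read.table("bronze.metrics")  # illustrative source

@dlt.table
@dlt.expect_or_fail("meets_threshold", f"value >= {THRESHOLD}")
def validated_metrics():
    # Any row below the threshold fails the update, so downstream tables never run.
    return dlt.read("source_metrics")
```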

JayanthDE
by New Contributor
  • 1059 Views
  • 2 replies
  • 0 kudos

What are the use cases of AI for data engineers?

I am a data engineer and want to get started with AI and ML. How and where should I start?

Latest Reply
Rishabh-Pandey
Esteemed Contributor
  • 0 kudos

@JayanthDE Refer to this: https://github.com/amueller/introduction_to_ml_with_python/blob/main/03-unsupervised-learning.ipynb

1 More Reply
shiva1212
by New Contributor II
  • 1247 Views
  • 1 reply
  • 0 kudos

Clone/copy python file

We are using Databricks extensively in the company. We found out that we can’t clone/copy a *.py file using the UI. We can clone notebooks but not Python files. If we clone a folder, we only clone the notebooks inside the folder, not the Python files.

Latest Reply
Rishabh-Pandey
Esteemed Contributor
  • 0 kudos

@shiva1212 When working with Databricks and managing Python files, it's true that the UI limitations can sometimes be restrictive. You can use the Databricks CLI and the REST API for file management and copying.
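As a rough illustration of the REST route mentioned here (the databricks workspace export/import CLI commands wrap the same endpoints), copying a workspace .py file might look like the sketch below; host, token, and paths are placeholders:

```python
import requests

HOST = "https://<your-workspace>.cloud.databricks.com"
TOKEN = "<personal-access-token>"
HEADERS = {"Authorization": f"Bearer {TOKEN}"}

SRC = "/Users/someone@example.com/utils.py"       # placeholder source path
DST = "/Users/someone@example.com/utils_copy.py"  # placeholder destination path

# Export the file; the content comes back base64-encoded.
exported = requests.get(
    f"{HOST}/api/2.0/workspace/export",
    headers=HEADERS,
    params={"path": SRC, "format": "AUTO"},
).json()

# Re-import it under the new path to create the copy.
requests.post(
    f"{HOST}/api/2.0/workspace/import",
    headers=HEADERS,
    json={"path": DST, "content": exported["content"], "format": "AUTO", "overwrite": True},
).raise_for_status()
```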

Smriti1
by New Contributor
  • 2169 Views
  • 1 reply
  • 0 kudos

How to pass parameters to different notebooks in a workflow?

I have three notebooks: Notebook-1, Notebook-2, and Notebook-3, with a workflow dependency sequence of 1 -> 2 -> 3. Notebook-1 dynamically receives parameters, such as entity-1 and entity-2. Since these parameters change with each run, how can I pass t...

Latest Reply
adbooth01
New Contributor II
  • 0 kudos

As long as the parameter names are the same in all Notebooks, whatever value you trigger the workflow with will automatically be sent to all the notebooks. For example, if all three notebooks in the workflow have parameters entity-1 and entity-2, and...
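A small sketch of what that looks like inside the notebooks (parameter names come from the question; the task key is a placeholder). Job parameters are read with widgets, and values computed at run time can be handed downstream with task values:

```python
# In each of Notebook-1/2/3: read the job-level parameters by name.
entity_1 = dbutils.widgets.get("entity-1")
entity_2 = dbutils.widgets.get("entity-2")

# If Notebook-1 derives new values at run time, task values can pass them on.
dbutils.jobs.taskValues.set(key="entity-1", value=entity_1)   # in Notebook-1

entity_1_downstream = dbutils.jobs.taskValues.get(            # in Notebook-2/3
    taskKey="notebook_1",          # placeholder task key from the workflow
    key="entity-1",
    debugValue="entity-default",   # used only for interactive runs
)
```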

mailsathesh
by New Contributor II
  • 1746 Views
  • 2 replies
  • 1 kudos

Databricks Cluster Start and Stop

I want to send out an email if the cluster fails to start. I start the cluster using the Databricks CLI and then terminate it. In some cases, my cluster is not starting at all and there are some errors. My use case is to send an email using dat...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 1 kudos

Hi @mailsathesh, You can write a script that uses the Databricks CLI to start the cluster. You can use the --timeout flag to set the maximum amount of time to reach the running state. If this amount is exceeded, or if there is any error, you can then send an email with...
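A sketch of such a script, assuming the Databricks CLI is on the PATH and a plain SMTP relay for the email; cluster ID, addresses, and SMTP host are placeholders:

```python
import smtplib
import subprocess
from email.message import EmailMessage

CLUSTER_ID = "0123-456789-abcdefgh"  # placeholder cluster ID

# Start the cluster and wait up to the timeout for it to reach RUNNING.
result = subprocess.run(
    ["databricks", "clusters", "start", CLUSTER_ID, "--timeout", "20m"],
    capture_output=True,
    text=True,
)

if result.returncode != 0:
    msg = EmailMessage()
    msg["Subject"] = f"Cluster {CLUSTER_ID} failed to start"
    msg["From"] = "alerts@example.com"
    msg["To"] = "oncall@example.com"
    msg.set_content(result.stderr or result.stdout)
    with smtplib.SMTP("smtp.example.com") as smtp:  # placeholder SMTP relay
        smtp.send_message(msg)
```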

1 More Reply
CarterM
by New Contributor III
  • 8408 Views
  • 3 replies
  • 2 kudos

Resolved! Why Spark Streaming from S3 is returning thousands of files when there are only 9?

I am attempting to stream JSON endpoint responses from an S3 bucket into a Spark DLT. I have been very successful with this practice previously, but the difference this time is that I am storing the responses from multiple endpoints in the same S3 buck...
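The resolution isn't visible in this excerpt, but when several endpoints write into one bucket, a common way to keep a stream scoped to a single endpoint is to read from that endpoint's prefix, optionally with a glob filter. A hedged DLT/Auto Loader sketch with placeholder paths:

```python
import dlt

@dlt.table
def endpoint_9_responses():
    return (spark.readStream
            .format("cloudFiles")
            .option("cloudFiles.format", "json")
            .option("pathGlobFilter", "*.json")    # optional extra filtering
            .load("s3://my-bucket/endpoint_9/"))   # scope to one endpoint's prefix
```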

Latest Reply
Anonymous
Not applicable
  • 2 kudos

@Carter Mooring​ Thank you SO MUCH for coming back to provide a solution to your thread! Happy you were able to figure this out so quickly. And I am sure that this will help someone in the future with the same issue.

2 More Replies
Mangeysh
by New Contributor
  • 1461 Views
  • 2 replies
  • 0 kudos

Converting databricks query output in JSON and creating end point

Hello, I am very new to Databricks and am building a UI where I need to show data from a Databricks table. Unfortunately, I am not getting access to the Delta Sharing feature from the administrator. I am planning to develop my own API and expose an endpoint with JSON output. I am sure th...

Latest Reply
menotron
Valued Contributor
  • 0 kudos

Hi @Mangeysh, You could achieve this using the Databricks SQL Statement Execution API. I would recommend going through the docs and looking at the functionality and limitations to see if it serves your need before planning to develop your own APIs.
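For orientation, a minimal sketch of calling the SQL Statement Execution API from Python; host, token, warehouse ID, and the query are placeholders:

```python
import requests

HOST = "https://<your-workspace>.cloud.databricks.com"
TOKEN = "<personal-access-token>"
WAREHOUSE_ID = "<sql-warehouse-id>"

response = requests.post(
    f"{HOST}/api/2.0/sql/statements",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "warehouse_id": WAREHOUSE_ID,
        "statement": "SELECT * FROM samples.nyctaxi.trips LIMIT 10",
        "wait_timeout": "30s",
        "format": "JSON_ARRAY",
    },
).json()

# Small result sets are returned inline as JSON rows.
print(response["result"]["data_array"])
```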

1 More Reply
Kayla
by Valued Contributor II
  • 6101 Views
  • 12 replies
  • 22 kudos

Resolved! Environment magic command?

Noticed something strange in the Databricks version control: I'm getting hidden commands with %environment that I can only see in the UI for version control. No idea what it's from; it's just a minor nuisance, and I'm curious if anyone can shed light on it. + %env...

Latest Reply
justinbreese
Databricks Employee
  • 22 kudos

Hello all, this is Justin from the PM team at Databricks. We are sorry about the friction this caused you. This was a feature related to the new way that we are doing dependency management in our serverless offerings - Environments. We are going to r...

11 More Replies
jonathan-dufaul
by Valued Contributor
  • 5041 Views
  • 6 replies
  • 6 kudos

Why is writing to MSSQL Server 12.0 so slow directly from spark but nearly instant when I write to a csv and read it back

I have a dataframe that inexplicably takes forever to write to an MS SQL Server, even though other dataframes, even much larger ones, write nearly instantly. I'm using this code: my_dataframe.write.format("jdbc") .option("url", sqlsUrl) .optio...

Latest Reply
plondon
New Contributor II
  • 6 kudos

Had a similar issue. I can do 1-4 million rows in 1 minute via SSIS ETL on SQL Server. The table is 15 fields long. Looking at your code, it seems you have many fields, but nothing like the 300-400 fields that can affect performance. You can check SQL Server ...
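Two knobs that usually matter for the slow JDBC write described in this thread are batch size and write parallelism. A hedged sketch reusing my_dataframe and sqlsUrl from the question (target table and credentials are placeholders); the dedicated SQL Server Spark connector (com.microsoft.sqlserver.jdbc.spark) is another option if bulk insert is needed:

```python
(my_dataframe
 .repartition(8)                          # number of parallel JDBC connections
 .write
 .format("jdbc")
 .option("url", sqlsUrl)                  # same URL as in the question
 .option("dbtable", "dbo.target_table")   # placeholder target table
 .option("user", "<user>")
 .option("password", "<password>")
 .option("batchsize", 10000)              # send rows in larger insert batches
 .mode("append")
 .save())
```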

5 More Replies
