Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

sika
by New Contributor II
  • 12326 Views
  • 2 replies
  • 0 kudos

ignoreDeletes in DLT pipeline

Hi all, I have a DLT pipeline as so: raw -> cleansed (SCD2) -> curated. 'Raw' is utilizing Auto Loader to continuously read files from a data lake. These files can contain tons of duplicates, which causes our raw table to become quite large. Therefore, we ...

Latest Reply
sika
New Contributor II
  • 0 kudos

OK, I'll try and add additional details. Firstly, the diagram below shows our current dataflow. Our raw table is defined as such: TABLES = ['table1','table2']   def generate_tables(table_name): @dlt.table( name=f'raw_{table_name}', table_pro...

1 More Replies
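The looped table-factory pattern in the reply's excerpt can be sketched in plain Python. This is a minimal stand-in, not the real Delta Live Tables API: `fake_dlt_table` and `REGISTRY` replace `@dlt.table` and the DLT runtime so only the looping/naming logic is shown.

```python
# Stand-in registry for tables the "pipeline" defines.
REGISTRY = {}

def fake_dlt_table(name):
    """Stand-in for @dlt.table(name=...): records the builder function."""
    def decorator(func):
        REGISTRY[name] = func
        return func
    return decorator

TABLES = ['table1', 'table2']

def generate_tables(table_name):
    @fake_dlt_table(name=f'raw_{table_name}')
    def raw():
        # In the real pipeline this body would return a spark.readStream
        # with Auto Loader (cloudFiles) reading from the data lake.
        return f"stream for {table_name}"
    return raw

# One call per table name yields one registered raw_<name> table each.
for t in TABLES:
    generate_tables(t)
```

Because each `generate_tables` call has its own scope, every generated table closes over its own `table_name`, which is why this factory pattern is the usual way to define many similar DLT tables in a loop.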
JordanYaker
by Contributor
  • 6624 Views
  • 8 replies
  • 1 kudos

Why is Delta Lake creating a 238.0TiB shuffle on merge?

I'm frankly at a loss here. I have a task that is consistently performing just awfully. I took some time this morning to try and debug it, and the physical plan is showing a 238TiB shuffle: == Physical Plan == AdaptiveSparkPlan (40) +- == Current Plan...

Latest Reply
Vartika
Databricks Employee
  • 1 kudos

Hi @Jordan Yaker, hope all is well! Just wanted to check in to see if you were able to resolve your issue, and whether you would be happy to share the solution or mark an answer as best. Otherwise, please let us know if you need more help. We'd love to hear from you. Thank...

7 More Replies
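Oversized MERGE shuffles like the 238 TiB one above often come from the source/target join carrying no pruning predicate, so every target file becomes a rewrite candidate. A common mitigation (a general technique, not a solution taken from this thread) is to add the partition column to the ON clause so Delta can prune partitions before shuffling. The effect can be simulated in plain Python; the file and partition names are made up:

```python
# Simulate a partitioned Delta target: 3 daily partitions x 4 files each.
target_files = [
    {"partition": day, "file": f"part-{day}-{i}.parquet"}
    for day in ("2023-05-01", "2023-05-02", "2023-05-03")
    for i in range(4)
]

# The incoming batch only touches one day.
source_partitions = {"2023-05-03"}

# MERGE ... ON t.id = s.id
# -> no pruning possible, every target file joins against the source:
unpruned = target_files

# MERGE ... ON t.id = s.id AND t.partition = s.partition (or a literal
# t.partition IN (...) predicate) -> only matching partitions participate:
pruned = [f for f in target_files if f["partition"] in source_partitions]

print(len(unpruned), "files without pruning;", len(pruned), "with pruning")
```

The shuffle size scales with the candidate file set, which is why a partition predicate in the merge condition can shrink plans like the one in the post dramatically.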
Abel_Martinez
by Contributor
  • 26204 Views
  • 10 replies
  • 39 kudos

Why do Python logs show the [REDACTED] literal in place of spaces when I use dbutils.secrets.get in my code?

When I use dbutils.secrets.get in my code, spaces in the log are replaced by the "[REDACTED]" literal. This is very annoying and makes the log difficult to read. Any idea how to avoid this? See my screenshot...

Latest Reply
jlb0001
New Contributor III
  • 39 kudos

I ran into the same issue and found that the reason was that the notebook included some test keys with values of "A" and "B" for simple testing. I noticed that any string with a substring of "A" or "B" was "[REDACTED]". So, in my case, it was an eas...

9 More Replies
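The behaviour described in that reply can be reproduced in plain Python: Databricks masks every occurrence of any fetched secret value in notebook output, so a one-character "test" secret like "A" redacts every "A" in the logs. Below is a hypothetical re-implementation of that masking logic for illustration; it is not the actual Databricks code.

```python
def redact(line, secrets):
    """Replace every occurrence of any known secret value with [REDACTED],
    mimicking how output is masked for values fetched via dbutils.secrets.get."""
    for secret in secrets:
        line = line.replace(secret, "[REDACTED]")
    return line

# A one-character secret redacts far more than intended:
print(redact("Loading table ACCOUNTS", ["A"]))

# A realistic secret value only hides itself:
print(redact("token=s3cr3t-value rows=10", ["s3cr3t-value"]))
```

This is why the fix in the reply works: once the notebook stops fetching trivially short secret values, the masker no longer matches ordinary log text.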
ShellyXiao
by New Contributor II
  • 12864 Views
  • 1 replies
  • 0 kudos

Azure Databricks cluster driver config

Hi there, I am trying to set up Databricks storage account access in a global init script, according to the Azure Databricks document on creating a cluster with driver config for all clusters (https://learn.microsoft.com/en-us/azure/databricks/archive/compute...

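For reference, driver-side access to an Azure storage account over ABFS is usually expressed as Spark configuration keys like the following (a sketch: the storage account, tenant ID, application ID, and secret scope/key names are all placeholders, and the client secret is pulled via a secret-scope reference rather than hard-coded):

```
spark.hadoop.fs.azure.account.auth.type.<storage-account>.dfs.core.windows.net OAuth
spark.hadoop.fs.azure.account.oauth.provider.type.<storage-account>.dfs.core.windows.net org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider
spark.hadoop.fs.azure.account.oauth2.client.id.<storage-account>.dfs.core.windows.net <application-id>
spark.hadoop.fs.azure.account.oauth2.client.secret.<storage-account>.dfs.core.windows.net {{secrets/<scope>/<key>}}
spark.hadoop.fs.azure.account.oauth2.client.endpoint.<storage-account>.dfs.core.windows.net https://login.microsoftonline.com/<tenant-id>/oauth2/token
```

Whether these go into an init script or cluster Spark config, the key names stay the same; only the delivery mechanism differs.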
Hubert-Dudek
by Esteemed Contributor III
  • 2811 Views
  • 1 replies
  • 7 kudos

Databricks recently added an SQL alerts feature that enables users to create notifications based on various conditions and trigger them within their job ...

Databricks recently added an SQL alerts feature that enables users to create notifications based on various conditions and trigger them within their job workflows. SQL alerts inform users about potential issues and easily ensure critical data availabilit...

Latest Reply
Anonymous
Not applicable
  • 7 kudos

Thanks for sharing this, @Hubert Dudek.

sintsan
by New Contributor II
  • 2130 Views
  • 1 replies
  • 1 kudos

Resolved! spark.sparkContext.setCheckpointDir - External Azure Storage

Is it possible to direct spark.sparkContext.setCheckpointDir to an external Azure Storage container location (instead of DBFS), and if so, how? There's very little documentation on that.

Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

Yes, the directory must be an HDFS-compatible path if running on a cluster. All you need to do is provide the correct path.

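Concretely, that means passing an abfss:// URI rather than a local or DBFS path. A hypothetical sketch (the container and storage account names are placeholders; the cluster must already have credentials for the account configured):

```python
# Build the abfss:// URI for an Azure Storage container location.
container = "checkpoints"
account = "mystorageacct"
checkpoint_dir = f"abfss://{container}@{account}.dfs.core.windows.net/spark-checkpoints"

# On a Databricks cluster you would then call:
#   spark.sparkContext.setCheckpointDir(checkpoint_dir)
# and subsequent rdd.checkpoint() / df.checkpoint() calls write there.
print(checkpoint_dir)
```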
Praveen
by New Contributor II
  • 11541 Views
  • 8 replies
  • 1 kudos

Resolved! Pass Typesafe config file to the Spark Submit Job

Hello everyone! I am trying to pass a Typesafe config file to the spark-submit task and print the details in the config file. Code: import org.slf4j.{Logger, LoggerFactory} import com.typesafe.config.{Config, ConfigFactory} import org.apache.spa...

Latest Reply
source2sea
Contributor
  • 1 kudos

I've experienced similar issues; please help answer how to get this working. I've tried using the below as either a /dbfs/mnt/blah path or a dbfs:/mnt/blah path, in either spark_submit_task or spark_jar_task (via cluster spark_conf for Java options); no su...

7 More Replies
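One commonly suggested pattern for this (a sketch; the paths, class name, and jar name are hypothetical) is to put the .conf file on DBFS and point the driver JVM at it through the /dbfs FUSE mount, since Typesafe's -Dconfig.file expects a local filesystem path, not a dbfs:/ URI:

```json
{
  "spark_submit_task": {
    "parameters": [
      "--class", "com.example.Main",
      "--driver-java-options", "-Dconfig.file=/dbfs/mnt/configs/app.conf",
      "dbfs:/mnt/jars/app-assembly.jar"
    ]
  }
}
```

Note the asymmetry: the application jar is addressed with the dbfs:/ scheme, while the config file uses the /dbfs mount because it is opened by plain JVM file I/O rather than by Spark.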
Anonymous
by Not applicable
  • 1190 Views
  • 0 replies
  • 1 kudos

Hi Everyone, As we continue to see the rise of AI, particularly with technologies such as ChatGPT, it's important to consider how this will impact...

Hi Everyone, As we continue to see the rise of AI, particularly with technologies such as ChatGPT, it's important to consider how this will impact the future of our workplaces and long-term career goals. With that in mind, I would love to hear your th...

Vinay123
by New Contributor III
  • 8919 Views
  • 2 replies
  • 1 kudos

Unity Catalog replication or disaster recovery implementation

I am working on a disaster recovery implementation for Databricks on AWS. I am not able to find how to implement it with Unity Catalog. I am planning to create two workspaces in two different regions: one would be the primary workspace, which will be active, and ...

Latest Reply
karthik_p
Esteemed Contributor
  • 1 kudos

@Suram Vinay From my end I have not implemented this, but I checked this blog previously; the Terraform script will help for the DR setup. https://www.databricks.com/blog/2022/07/18/disaster-recovery-automation-and-tooling-for-a-databricks-workspace.htmlc...

1 More Replies
pranathisg97
by New Contributor III
  • 4469 Views
  • 2 replies
  • 1 kudos

readStream query throws an exception if there's no data in the Delta location.

Hi, I have a scenario where a writeStream query writes the stream data to the bronze location, and I have to read from bronze, do some processing, and finally write it to silver. I use an S3 location for the Delta tables. But for the very first execution, the readStream ...

Latest Reply
Vartika
Databricks Employee
  • 1 kudos

Hi @Pranathi Girish, hope all is well! Checking in: if @Suteja Kanuri's answer helped, would you let us know and mark the answer as best? If not, would you be happy to give us more information? We'd love to hear from you. Thanks!

1 More Replies
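A common workaround for that first-run failure is to materialize an empty Delta table with the expected schema before the stream ever starts, so readStream always finds a transaction log. The sketch below is hypothetical: on Databricks the probe would look for the table's _delta_log directory on S3, and `create_empty_table` would be a DataFrame write with the real bronze schema; here a local directory stands in for both.

```python
import os
import tempfile

def ensure_delta_source(path, create_empty_table):
    """If `path` has no _delta_log yet, create the table first so a later
    spark.readStream.format("delta").load(path) finds a schema.
    Returns True when it had to create the table."""
    if os.path.isdir(os.path.join(path, "_delta_log")):
        return False
    create_empty_table(path)
    return True

# Stand-in for writing an empty Delta table with the bronze schema:
def fake_create_empty(path):
    os.makedirs(os.path.join(path, "_delta_log"), exist_ok=True)

root = tempfile.mkdtemp()
first_run = ensure_delta_source(root, fake_create_empty)   # creates the table
second_run = ensure_delta_source(root, fake_create_empty)  # table now exists
print(first_run, second_run)
```

Running the guard in the silver job's setup step makes the very first execution behave like every later one, instead of failing until bronze has produced data.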
Hubert-Dudek
by Esteemed Contributor III
  • 4152 Views
  • 2 replies
  • 9 kudos

Databricks Photon is a next-generation engine on the Databricks Lakehouse Platform that provides speedy query performance at a low cost. Its function...

Databricks Photon is a next-generation engine on the Databricks Lakehouse Platform that provides speedy query performance at a low cost. Its function coverage is growing, and UDF support under Photon is coming, which can bring significant improvements in us...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 9 kudos

 

1 More Replies
gdev
by New Contributor
  • 8459 Views
  • 6 replies
  • 3 kudos

Resolved! Migrate notebooks, workflows, and other assets

I want to move notebooks, workflows, and data from one user to another in Azure Databricks. We have access to that Databricks workspace. Is it possible? If yes, how do we move it?

Latest Reply
deedstoke
New Contributor II
  • 3 kudos

Hope all is well!

5 More Replies
Paru
by New Contributor II
  • 2363 Views
  • 1 replies
  • 1 kudos
Latest Reply
Vartika
Databricks Employee
  • 1 kudos

Hi @Parvez Mushaf Maniyar, hope everything is going well. Does @Kaniz Fatma's response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly? We'd love to hear from you. Thanks!

owen1
by New Contributor
  • 1787 Views
  • 2 replies
  • 2 kudos

Workflow cluster creation error

I set the workflow to run at 12:00 every day, but it failed with the error message below, and I don't know why. Run result unavailable: run failed with error message Unexpected failure while waiting for the cluster (0506-0233...

Latest Reply
Murthy1
Contributor II
  • 2 kudos

Hello @Sangwoo Lee, As mentioned by Vignesh, it seems like an infra-related issue. > Does the user (which executes the job) have access to start a cluster? > In case it is not an access issue, and in case you are starting a lot of workflow jobs tog...

1 More Replies
