Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

Mr__E
by Contributor II
  • 1231 Views
  • 2 replies
  • 3 kudos

Sync prod WS DBs to dev WS DBs

We have a couple sources we'd already set up to stream to prod using a 3p system. Is there a way to sync this directly to our dev workspace to build pipelines? eg. directly connecting to a cluster in prod and pull with a job cluster, dump to S3 and u...

Latest Reply
Kaniz_Fatma
Community Manager
  • 3 kudos

Hi @Erik Louie, we haven't heard from you on the last response from @Debayan Mukherjee, and I was checking back to see if his suggestions helped you. Otherwise, if you have any solution, please share it with the community as it can be helpful to oth...

  • 3 kudos
1 More Replies
spyderfaye
by New Contributor II
  • 1324 Views
  • 3 replies
  • 1 kudos

Has anyone come across an issue where a table join fails for a single row, when there is no reason for this to happen?

So, I have a super simple left join from one table to another. Its purpose is to retrieve the date of birth for a customer, joining the customer ID FK in the transaction table to the customer ID PK in the customer table. A customer will have several transac...

Latest Reply
Vidula
Honored Contributor
  • 1 kudos

Hi @Faye Hughes, thank you so much for getting back to us. It's really great of you to send in the solution and mark the answer as best. We really appreciate your time. Wish you a great Databricks journey ahead!

  • 1 kudos
2 More Replies
Michael_Galli
by Contributor III
  • 1768 Views
  • 3 replies
  • 3 kudos

Streaming with Delta table source- definition of "File"?

Hi all, I have a Delta table as a Spark Streaming source. This table contains signals at row level: each signal is one append to the source table, which creates a new version in the Delta transaction history. I am not really sure now how Spark streaming...

Latest Reply
Vidula
Honored Contributor
  • 3 kudos

Hey there @Michael Galli, hope all is well! Just wanted to check in: were you able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from y...

  • 3 kudos
2 More Replies
j02424
by New Contributor
  • 2732 Views
  • 2 replies
  • 4 kudos

Best practice to delete /dbfs/tmp ?

What is best practice regarding the tmp folder? We have a very large amount of data in that folder and not sure whether to delete, back up etc?

Latest Reply
Kaniz_Fatma
Community Manager
  • 4 kudos

Hi @James Owen, we haven’t heard from you on the last response from @Debayan Mukherjee, and I was checking back to see if his suggestions helped you. Otherwise, if you have any solution, please share it with the community as it can be helpful to...

  • 4 kudos
1 More Replies
yatharthmahesh
by New Contributor III
  • 2478 Views
  • 4 replies
  • 6 kudos

ENABLE CHANGE DATA FEED FOR EXISTING DELTA-TABLE

I have a Delta table already created, and now I want to enable the change data feed. I read that I have to set the delta.enableChangeDataFeed property to true. However, this cannot be done using the Scala API. I tried using this but it didn't work. I am ...

Latest Reply
Kaniz_Fatma
Community Manager
  • 6 kudos

Hi @Yatharth Maheshwari, we haven’t heard from you on the last response from @Jose Gonzalez, and I was checking back to see if his suggestions helped you. Otherwise, if you have any solution, please share it with the community as it can be helpf...

  • 6 kudos
3 More Replies
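
A minimal sketch for the change data feed question above. It assumes the table is registered under a hypothetical name `my_schema.my_table` and that a SparkSession is already available as `spark`, as it is in a Databricks notebook. Delta Lake documents enabling the feed on an existing table with an `ALTER TABLE ... SET TBLPROPERTIES` statement, which can be issued through `spark.sql` from either Scala or Python. The helper names below are illustrative and not part of any library.

```python
def enable_cdf_statement(table_name: str) -> str:
    """Build the ALTER TABLE statement that turns on the change data feed."""
    return (
        f"ALTER TABLE {table_name} "
        "SET TBLPROPERTIES (delta.enableChangeDataFeed = true)"
    )


def enable_cdf(spark, table_name: str) -> None:
    """Run the statement on an existing SparkSession (not created here)."""
    spark.sql(enable_cdf_statement(table_name))


# In a Databricks notebook, where `spark` is predefined, one would call:
#   enable_cdf(spark, "my_schema.my_table")   # hypothetical table name
print(enable_cdf_statement("my_schema.my_table"))
```

Only changes made after the property is set are recorded, so earlier table versions cannot be read as a change feed.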
AnandR
by New Contributor
  • 941 Views
  • 2 replies
  • 1 kudos

I have 2 roles created for my Dbricks acc on AWS. Want to know which role will be used by Dbricks for AWS resources (ex. Cluster Creation)

I have 1 role with an AWS root account and 1 role with an AWS non-root account. How do I tell Dbricks to use a specific role for cluster creation? Please guide me here, or a pointer to any documentation will also suffice. Thanks.

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @Anandkumar Ravikumar, we haven't heard from you on the last response from @Aman Sehgal, and I was checking back to see if his suggestions helped you. Otherwise, if you have any solution, please share it with the community as it can be helpful to...

  • 1 kudos
1 More Replies
TT1
by New Contributor III
  • 2099 Views
  • 3 replies
  • 8 kudos
Latest Reply
Kaniz_Fatma
Community Manager
  • 8 kudos

Hi @Thao Ton, we haven't heard from you on the last response from @Hubert Dudek and @Aman Sehgal, and I was checking back to see if their suggestions helped you. Otherwise, if you have any solution, please share it with the community as it can be he...

  • 8 kudos
2 More Replies
zyang
by Contributor
  • 1568 Views
  • 1 replies
  • 4 kudos

pyspark delta table schema evolution

I am using schema evolution in a Delta table, and the code is written in a Databricks notebook: df.write.format("delta").mode("append").option("mergeSchema", "true").partitionBy("date").save(path) But I ...

Latest Reply
Noopur_Nigam
Valued Contributor II
  • 4 kudos

Hi @z yang, please also provide the df creation code so that we can understand the complete exception and scenario.

  • 4 kudos
Bharath_1610
by New Contributor
  • 1640 Views
  • 2 replies
  • 1 kudos

Resolved! Check Existence of table

Hi Team, how do we check the existence of a table in an ADF container using a SQL query in Databricks? Thanks in advance.

Latest Reply
Noopur_Nigam
Valued Contributor II
  • 1 kudos

Hi, please elaborate on the issue for us to help you resolve it.

  • 1 kudos
1 More Replies
lokeshr
by New Contributor
  • 1113 Views
  • 2 replies
  • 1 kudos

Clarity on usage STREAM while defining DLT tables

Hi, I am currently trying to learn Databricks and going through tutorials and learning materials. I came across this link: https://databricks.com/discover/pages/getting-started-with-delta-live-tables. While I get most of what is described on the page, I fin...

Latest Reply
jose_gonzalez
Moderator
  • 1 kudos

Hi @Lokesh Raju, just a friendly follow-up. Did Tomasz's response help you resolve your question? If it did, please mark it as best.

  • 1 kudos
1 More Replies
KumarShiv
by New Contributor III
  • 1793 Views
  • 2 replies
  • 2 kudos

Resolved! Databricks Spark SQL function "PERCENTILE_DISC()" output not accurate.

I am trying to get percentile values on different splits, but the result of the Databricks PERCENTILE_DISC() function does not appear to be accurate. I ran the same query on MS SQL and got a different result set. Here are both result sets for Pyspark ...

Latest Reply
artsheiko
Honored Contributor
  • 2 kudos

The reason might be that in SQL PERCENTILE_DISC is nondeterministic

  • 2 kudos
1 More Replies
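
For readers comparing percentile results across engines, here is a small pure-Python sketch of how the SQL-standard discrete and continuous percentiles are defined. It does not reproduce the query from the post, whose details are truncated above. The sample data and function names are assumptions made for illustration. It only shows that the two definitions can legitimately return different values for the same input, which is one thing worth ruling out when two systems disagree.

```python
import math


def percentile_disc(values, p):
    """Smallest value whose cumulative distribution (rank / n) is >= p."""
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    ordered = sorted(values)
    n = len(ordered)
    for rank, value in enumerate(ordered, start=1):
        if rank / n >= p:
            return value


def percentile_cont(values, p):
    """Linear interpolation between the two values surrounding position p."""
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    ordered = sorted(values)
    pos = p * (len(ordered) - 1)
    lower, upper = math.floor(pos), math.ceil(pos)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


data = [10, 20, 30, 40]            # illustrative sample, not from the post
print(percentile_disc(data, 0.5))  # 20   (always an actual value from the data)
print(percentile_cont(data, 0.5))  # 25.0 (interpolated between 20 and 30)
```

The discrete form always returns a value present in the input, while the continuous form may return a value that is not.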
Trung
by Contributor
  • 3054 Views
  • 5 replies
  • 5 kudos

Job fail due to Access Denied

Please help me solve this problem: my Databricks account cannot start the job, whether triggered manually or by schedule, although I can run the script without error.

Latest Reply
Vivian_Wilfred
Honored Contributor
  • 5 kudos

Hi @trung nguyen, please check whether the necessary instance profile is attached to the job cluster. You are definitely missing something related to IAM.

  • 5 kudos
4 More Replies
Anonymous
by Not applicable
  • 1405 Views
  • 4 replies
  • 4 kudos

Invalid shard address

I'm running pyspark through databricks-connect and getting an error saying ```ERROR SparkClientManager: Fail to get the SparkClient java.util.concurrent.ExecutionException: com.databricks.service.SparkServiceConnectionException: Invalid shard address:`...

Latest Reply
Prabakar
Esteemed Contributor III
  • 4 kudos

Hi @Marco Wong, was this working before and failing now? Are you behind a VPN or firewall? If so, can you check by disabling it? Can you enable traces in Wireshark and collect a dump to check whether there is traffic going to the workspace? Check if you can get curl wor...

  • 4 kudos
3 More Replies
krsimons
by New Contributor
  • 991 Views
  • 3 replies
  • 0 kudos

How do I automate my Databricks script?

How do I automate my Databricks script?

Latest Reply
Vartika
Moderator
  • 0 kudos

Hey there @Kayla Simons, hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell...

  • 0 kudos
2 More Replies
fshimamoto
by New Contributor III
  • 1976 Views
  • 3 replies
  • 2 kudos

What are the best practices for schema drift using Delta Live tables, in a scenario where the main source is a no sql database and we have a lot of ch...

What are the best practices for handling schema drift with Delta Live Tables, in a scenario where the main source is a NoSQL database and the schema changes frequently?

Latest Reply
Vartika
Moderator
  • 2 kudos

Hey there @Fernando Martin, hope all is well! Just wanted to check in: were you able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from...

  • 2 kudos
2 More Replies
