Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Gilg
by Contributor II
  • 1000 Views
  • 0 replies
  • 0 kudos

Best Practices for Near Real-time Processing

Hi All, We are ingesting 1000 JSON files of varying sizes per minute. DLT is in continuous mode, and the workspace is Unity Catalog enabled. We are using the default Autoloader setting (Directory Listing), and Silver has CDC as well. We aim to ...

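For context, a minimal sketch of the kind of DLT + Auto Loader bronze table the post describes; the table name, landing path, and the file-notification suggestion are assumptions, not details from the thread:

import dlt

@dlt.table(name="bronze_events", table_properties={"quality": "bronze"})
def bronze_events():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        # Directory Listing is the default discovery mode; at ~1000 files/min,
        # file notification mode (.option("cloudFiles.useNotifications", "true"))
        # is often worth evaluating to cut listing overhead.
        .load("/Volumes/main/landing/events")  # hypothetical path
    )
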
RajeshRK
by Contributor II
  • 8163 Views
  • 3 replies
  • 0 kudos

Need help to analyze Databricks logs for a long-running job.

Hi Team, We have a job that completes in 3 minutes on one Databricks cluster, but when we run the same job on another Databricks cluster it takes 3 hours to complete. I am quite new to Databricks and need your guidance on how to find out where Databricks s...

Latest Reply
AmitKP
New Contributor II
  • 0 kudos

Hi @Retired_mod, I am saving logs of my Databricks job compute from ADF. How can I open those files that are present in the DBFS location?

2 More Replies
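A hedged sketch of how one might browse log files delivered to DBFS, as asked in the reply above; the log directory is hypothetical and depends on the cluster's log delivery configuration (Compute > Advanced options > Logging):

log_dir = "dbfs:/cluster-logs/0123-456789-abcdef/driver/"  # assumption

# List the delivered log files with their sizes
for f in dbutils.fs.ls(log_dir):
    print(f.path, f.size)

# Peek at the first ~10 KB of the driver log (log4j-active.log is the usual name)
print(dbutils.fs.head(log_dir + "log4j-active.log", 10000))
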
Brad
by Contributor II
  • 4133 Views
  • 0 replies
  • 0 kudos

Why "rror: Invalid access to Org: xxx"

Hi team, I installed the Databricks CLI and ran "databricks auth login --profile xxx" successfully. I can also connect from VS Code to Databricks. "databricks clusters list -p xxx" also works. But when I tried to run "databricks bundle validate" I got "Error:...

Gilg
by Contributor II
  • 2397 Views
  • 1 reply
  • 0 kudos

Move files

Hi, I am using DLT with Autoloader. The DLT pipeline is running in continuous mode, and Autoloader is in Directory Listing mode (the default). Question: I want to move files that have been processed by the DLT to another folder (archived) and am planning to have another no...

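One possible approach (a sketch, not a confirmed answer from the thread): Auto Loader records every ingested file in its checkpoint, which can be queried with the cloud_files_state() table function, so a separate job can archive files that have already been processed. The checkpoint location and folder layout below are assumptions:

# Query the files Auto Loader has already ingested (checkpoint path is a placeholder)
processed = spark.sql(
    "SELECT path FROM cloud_files_state('/pipelines/<pipeline-id>/checkpoints')"
)

# Move each processed file into a parallel 'archived' folder (hypothetical layout)
for row in processed.collect():
    dbutils.fs.mv(row.path, row.path.replace("/landing/", "/archived/"))
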
Brad
by Contributor II
  • 3226 Views
  • 1 reply
  • 0 kudos

What is the behavior when merge key is not unique

Hi, When using the MERGE statement, if the merge key is not unique in both source and target, it will throw an error. If the merge key is unique in the source but not unique in the target, should WHEN MATCHED THEN DELETE/UPDATE work or not? For example, the merge key is id....

Latest Reply
Brad
Contributor II
  • 0 kudos

Cool, this is what I tested out. Great to get it confirmed. Thanks. BTW, https://medium.com/@ritik20023/delta-lake-upserting-without-primary-key-f4a931576b0 has a workaround that can fix the merge with duplicate merge keys on both source and target.

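A hedged illustration of the behavior confirmed above: Delta MERGE errors only when a single target row is matched by multiple source rows, so a unique source key with duplicate target keys updates every matching target row. Table and column names are invented:

spark.sql("CREATE OR REPLACE TABLE t (id INT, v STRING) USING delta")
spark.sql("INSERT INTO t VALUES (1, 'a'), (1, 'b')")                    # id duplicated in target
spark.sql("CREATE OR REPLACE TEMP VIEW s AS SELECT 1 AS id, 'c' AS v")  # id unique in source

# No error: each target row is matched by exactly one source row
spark.sql("""
    MERGE INTO t USING s ON t.id = s.id
    WHEN MATCHED THEN UPDATE SET t.v = s.v
""")
spark.sql("SELECT * FROM t").show()  # both rows now have v = 'c'
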
Erik_L
by Contributor II
  • 1389 Views
  • 1 reply
  • 0 kudos

Visualizations failing to show

I have a SQL query that generates a table. I created a visualization from that table with the UI. I then have a widget that updates a value used in the query and re-runs the SQL, but then the visualization shows nothing, just that there is "1 row," but if...

(Attachment: Screenshot from 2024-04-05 10-23-03.png)
397973
by New Contributor III
  • 2304 Views
  • 3 replies
  • 0 kudos

Having trouble installing my own Python wheel?

I want to install my own Python wheel package on a cluster but can't get it working. I tried two ways. I followed these steps: https://docs.databricks.com/en/workflows/jobs/how-to/use-python-wheels-in-workflows.html#:~:text=March%2025%2C%202024,code%...

Labels: Data Engineering, cluster, Notebook
Latest Reply
shan_chandra
Databricks Employee
  • 0 kudos

@397973 - Once you uploaded the .whl file, did you have a chance to list the file manually in the notebook? Also, did you have a chance to move the .whl file to /Volumes?

2 More Replies
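Following the suggestion above, a minimal sketch for verifying and installing a wheel from a Unity Catalog volume; the volume path and wheel file name are hypothetical:

# Confirm the wheel is actually visible at the volume path
display(dbutils.fs.ls("/Volumes/main/default/libs"))

# Then install it into the notebook environment (%pip must run in its own cell):
# %pip install /Volumes/main/default/libs/mypkg-0.1.0-py3-none-any.whl
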
SyedSaqib
by New Contributor II
  • 2226 Views
  • 1 reply
  • 0 kudos

Delta Live Table : [TABLE_OR_VIEW_ALREADY_EXISTS] Cannot create table or view

Hi, I have a Delta Live Table workflow with storage enabled for cloud storage to a blob store. Syntax of the bronze table in the notebook: @dlt.table(spark_conf = {"spark.databricks.delta.schema.autoMerge.enabled": "true"}, table_properties = {"quality": "bron...

Latest Reply
SyedSaqib
New Contributor II
  • 0 kudos

Hi Kaniz, Thanks for replying back. I am using Python for Delta Live Table creation, so how can I set these configurations? "When creating the table, add the IF NOT EXISTS clause to tolerate pre-existing objects. Consider using the OR REFRESH clause." Answe...

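For reference, a hedged sketch in the Python DLT API echoing the post's own syntax; the names and source path are placeholders. Python DLT has no IF NOT EXISTS or OR REFRESH clause: tables are declarative, and this error usually means the target name is already defined, either by a duplicate declaration in the pipeline or by another pipeline or object owning the same name in the catalog:

import dlt

@dlt.table(
    name="bronze_orders",  # must be unique within the pipeline and target schema
    spark_conf={"spark.databricks.delta.schema.autoMerge.enabled": "true"},
    table_properties={"quality": "bronze"},
)
def bronze_orders():
    return (spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "json")
            .load("/mnt/landing/orders"))  # hypothetical source path
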
Henrique_Lino
by New Contributor II
  • 3523 Views
  • 6 replies
  • 0 kudos

value is null after loading a saved df when using specific type in schema

I am facing an issue when using Databricks: when I set a specific type in my schema and read a JSON, its values are fine, but after saving my df and loading it again, the value is gone. I have this sample code that shows the issue: from pyspark.sql.typ...

Latest Reply
Lakshay
Databricks Employee
  • 0 kudos

@Henrique_Lino, Where are you saving your df?

5 More Replies
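A hedged repro sketch of the round trip described in the post, useful for narrowing down where the nulls appear; the schema, paths, and format are invented:

from pyspark.sql.types import StructType, StructField, StringType, LongType

schema = StructType([
    StructField("id", LongType()),
    StructField("name", StringType()),
])

df = spark.read.schema(schema).json("/tmp/in.json")
df.show()  # values reportedly look fine at this point

df.write.format("delta").mode("overwrite").save("/tmp/out")
df2 = spark.read.format("delta").load("/tmp/out")
df2.show()  # compare df.printSchema() vs df2.printSchema() if nulls show up here
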
Anandsingh
by New Contributor
  • 887 Views
  • 1 reply
  • 0 kudos

Writing to multiple files/tables from data held within a single file through autoloader

I have a requirement to read and parse JSON files using Autoloader, where the incoming JSON file has multiple sub-entities. Each sub-entity needs to go into its own Delta table. Alternatively, we can write each entity's data to individual files. We can use D...

Latest Reply
Lakshay
Databricks Employee
  • 0 kudos

I think using DLT's medallion architecture should be helpful in this scenario. You can write all the incoming data to one bronze table and one silver table. And you can have multiple gold tables based on the value of the sub-entities.

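A hedged sketch of the layout suggested above: one bronze table fed by Auto Loader, then one downstream table per sub-entity projected out of it. The path, entity names, and the assumption that each sub-entity arrives as a struct column are all invented:

import dlt
from pyspark.sql.functions import col

@dlt.table
def bronze_raw():
    return (spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "json")
            .load("/mnt/landing"))  # hypothetical path

def make_entity_table(entity):
    @dlt.table(name=f"silver_{entity}")
    def entity_table():
        # assumes each sub-entity is a struct column of that name in the JSON
        return dlt.read_stream("bronze_raw").select(col(f"{entity}.*"))
    return entity_table

for e in ["customer", "order", "invoice"]:  # hypothetical sub-entities
    make_entity_table(e)
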
Kavi_007
by New Contributor III
  • 4918 Views
  • 6 replies
  • 1 kudos

Resolved! Seeing history even after vacuuming the Delta table

Hi, I'm trying to run VACUUM on a Delta table within a Unity Catalog. The default retention is 7 days. Though I vacuum the table, I'm still able to see history beyond 7 days. I tried restarting the cluster but it is still not working. What would be the fix?...

Latest Reply
Kavi_007
New Contributor III
  • 1 kudos

No, that's wrong. VACUUM removes all files from the table directory that are not managed by Delta, as well as data files that are no longer in the latest state of the transaction log for the table and are older than a retention threshold. VACUUM - Azu...

5 More Replies
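A hedged illustration of the distinction drawn in the reply: VACUUM deletes unreferenced data files, while DESCRIBE HISTORY reads the transaction log, which is pruned separately (delta.logRetentionDuration, 30 days by default), so history entries can outlive the files they point to. The table name is a placeholder:

spark.sql("VACUUM main.default.events RETAIN 168 HOURS")   # 7 days
spark.sql("DESCRIBE HISTORY main.default.events").show()   # may still list versions older than 7 days

# Time travel to a vacuumed version fails even though its history row remains:
# spark.sql("SELECT * FROM main.default.events VERSION AS OF 1").show()
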
iarregui
by New Contributor
  • 2681 Views
  • 2 replies
  • 0 kudos

Getting a Databricks static IP

Hello. I want to connect from my Databricks workspace to an external API to extract some data. The owner of the API asks for an IP to provide the token necessary for the extraction of data. Therefore I would need to set a static IP in Databricks that...

Latest Reply
Wojciech_BUK
Valued Contributor III
  • 0 kudos

Hello, the easiest way (in Azure) is to deploy the workspace in VNet injection mode and attach a NAT Gateway to your VNet. The NAT Gateway requires a public IP, and this IP will be the static egress IP for all clusters in this workspace. Note: both the NAT Gateway and the IP address...

1 More Replies
Jon
by New Contributor II
  • 3695 Views
  • 4 replies
  • 5 kudos

IP address fix

How can I fix the IP address of my Azure cluster so that I can whitelist it to run my job daily from my Python notebook? Or can I find out the IP address to perform the whitelisting? Thanks

Latest Reply
-werners-
Esteemed Contributor III
  • 5 kudos

Depends on the scenario. You could expose a single IP address to the external internet, but Databricks itself will always use many addresses.

3 More Replies

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group