Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

manish1987c
by New Contributor III
  • 2536 Views
  • 1 reply
  • 0 kudos

Delta Live table expectations

I am able to use the expectations feature in Delta Live Tables by creating the expectations as below:
checks = {}
checks["validate circuitId col for null values"] = "(circuitId IS not NULL)"
checks["validate name col for null values"] = "(name IS not ...

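A minimal sketch of how such an expectations dict is typically applied with `@dlt.expect_all` (illustrative; `dlt` and `spark` exist only inside a DLT pipeline, so the decorator usage is shown in comments, and the table names are made up):

```python
# Build the expectation-name -> SQL-constraint dict, as in the post above.
checks = {}
checks["validate circuitId col for null values"] = "(circuitId IS NOT NULL)"
checks["validate name col for null values"] = "(name IS NOT NULL)"

# Inside a DLT pipeline the dict would be applied like this (commented out
# because `dlt` and `spark` are only available in a pipeline context):
#
#   import dlt
#
#   @dlt.table
#   @dlt.expect_all(checks)  # records violations without dropping rows
#   def circuits_validated():
#       return spark.read.table("circuits_raw")
```

`@dlt.expect_all_or_drop` and `@dlt.expect_all_or_fail` take the same dict when violating rows should be dropped or should fail the update.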
Gilg
by Contributor II
  • 1734 Views
  • 0 replies
  • 0 kudos

Best Practices Near Real-time Processing

Hi All, we are ingesting 1000 JSON files of different sizes per minute. DLT is in continuous mode, in a Unity Catalog-enabled workspace. We are using the default Autoloader setting (Directory Listing), and Silver has CDC as well. We aim to ...

RajeshRK
by Contributor II
  • 10617 Views
  • 3 replies
  • 0 kudos

Need help to analyze databricks logs for a long-running job.

Hi Team, we have a job that completes in 3 minutes on one Databricks cluster; if we run the same job on another Databricks cluster it takes 3 hours to complete. I am quite new to Databricks and need your guidance on how to find out where Databricks s...

Latest Reply
AmitKP
New Contributor II
  • 0 kudos

Hi @Retired_mod, I am saving logs of my Databricks job compute from ADF. How can I open those files present in the DBFS location?

2 More Replies
Gilg
by Contributor II
  • 3222 Views
  • 1 reply
  • 0 kudos

Move files

Hi, I am using DLT with Autoloader. The DLT pipeline is running in continuous mode, and Autoloader is in Directory Listing mode (the default). Question: I want to move files that have been processed by the DLT to another folder (archived) and planning to have another no...

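A tiny illustrative helper for the archive step (the `/mnt/...` roots are made-up names; on Databricks the actual move would typically be `dbutils.fs.mv(src, dst)`, run only after Autoloader has recorded the file as processed):

```python
# Compute the archive destination for a processed file, preserving its
# relative path under the landing root. Paths here are illustrative only.
def archive_path(src: str,
                 landing_root: str = "/mnt/landing",
                 archive_root: str = "/mnt/archived") -> str:
    if not src.startswith(landing_root + "/"):
        raise ValueError(f"{src!r} is not under {landing_root!r}")
    return archive_root + src[len(landing_root):]

print(archive_path("/mnt/landing/2024/04/05/file1.json"))
# -> /mnt/archived/2024/04/05/file1.json
```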
MikeGo
by Contributor II
  • 5247 Views
  • 1 reply
  • 0 kudos

What is the behavior when merge key is not unique

Hi, when using the MERGE statement, if the merge key is not unique on both source and target, it throws an error. If the merge key is unique in the source but not unique in the target, should WHEN MATCHED THEN DELETE/UPDATE work or not? For example, the merge key is id....

Latest Reply
MikeGo
Contributor II
  • 0 kudos

Cool, this is what I tested out. Great to get it confirmed. Thanks. BTW, https://medium.com/@ritik20023/delta-lake-upserting-without-primary-key-f4a931576b0 has a workaround that can fix the merge with a duplicate merge key on both source and target.

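The behavior confirmed above can be sketched in plain Python (a simulation of the semantics, not Spark code): when the key is unique in the source, each target row matches at most one source row, so `WHEN MATCHED THEN DELETE` removes every duplicate target row with that key.

```python
# Simulate MERGE ... WHEN MATCHED THEN DELETE with a key that is unique in
# the source but duplicated in the target. All matched target rows go.
target = [{"id": 1, "v": "a"}, {"id": 1, "v": "b"}, {"id": 2, "v": "c"}]
source = [{"id": 1, "v": "z"}]          # id is unique in the source

source_keys = {row["id"] for row in source}
after_delete = [row for row in target if row["id"] not in source_keys]

print(after_delete)  # [{'id': 2, 'v': 'c'}]
```

The error case is the mirror image: multiple source rows matching one target row make the result ambiguous, which is why Delta rejects it.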
Erik_L
by Contributor II
  • 2445 Views
  • 1 reply
  • 0 kudos

Visualizations failing to show

I have a SQL query that generates a table, and I created a visualization from that table with the UI. I then have a widget that updates a value used in the query and re-runs the SQL, but then the visualization shows nothing; it says there is "1 row," but if...

[attachment: Screenshot from 2024-04-05 10-23-03.png]
397973
by New Contributor III
  • 4273 Views
  • 3 replies
  • 0 kudos

Having trouble installing my own Python wheel?

I want to install my own Python wheel package on a cluster but can't get it working. I tried two ways: I followed these steps: https://docs.databricks.com/en/workflows/jobs/how-to/use-python-wheels-in-workflows.html#:~:text=March%2025%2C%202024,code%...

Data Engineering
cluster
Notebook
Latest Reply
shan_chandra
Databricks Employee
  • 0 kudos

@397973 - Once you uploaded the .whl file, did you have a chance to list the file manually in the notebook? Also, did you have a chance to move the .whl file to /Volumes?

2 More Replies
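One failure mode worth ruling out when a wheel won't install is a filename that doesn't follow the PEP 427 convention `{dist}-{version}(-{build})?-{python}-{abi}-{platform}.whl`. A quick pure-Python sanity check (heuristic only; the example filenames are made up):

```python
def looks_like_wheel(filename: str) -> bool:
    # PEP 427: {dist}-{version}(-{build})?-{python}-{abi}-{platform}.whl
    # => 5 or 6 dash-separated fields before the .whl suffix.
    if not filename.endswith(".whl"):
        return False
    return len(filename[: -len(".whl")].split("-")) in (5, 6)

print(looks_like_wheel("my_pkg-0.1.0-py3-none-any.whl"))  # True
print(looks_like_wheel("my_pkg.whl"))                     # False
```

Note that dashes inside the distribution name itself must be normalized to underscores (`my-pkg` becomes `my_pkg`), which this heuristic cannot distinguish from a build tag.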
SyedSaqib
by New Contributor II
  • 4021 Views
  • 1 reply
  • 0 kudos

Delta Live Table : [TABLE_OR_VIEW_ALREADY_EXISTS] Cannot create table or view

Hi, I have a Delta Live Tables workflow with storage enabled to a cloud blob store. Syntax of the bronze table in the notebook:
@dlt.table(spark_conf = {"spark.databricks.delta.schema.autoMerge.enabled": "true"}, table_properties = {"quality": "bron...

Latest Reply
SyedSaqib
New Contributor II
  • 0 kudos

Hi Kaniz, thanks for replying. I am using Python for Delta Live Table creation, so how can I set these configurations? "When creating the table, add the IF NOT EXISTS clause to tolerate pre-existing objects. Consider using the OR REFRESH clause." Answe...

Henrique_Lino
by New Contributor II
  • 6912 Views
  • 6 replies
  • 0 kudos

value is null after loading a saved df when using specific type in schema

I am facing an issue when using Databricks: when I set a specific type in my schema and read a JSON, its values are fine, but after saving my df and loading it again, the value is gone. I have this sample code that shows the issue: from pyspark.sql.typ...

Latest Reply
Lakshay
Databricks Employee
  • 0 kudos

@Henrique_Lino , Where are you saving your df?

5 More Replies
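The symptom in this thread (values silently becoming null) is typical of a declared schema type that doesn't match the data on disk: Spark's permissive JSON parsing nulls out mismatched fields rather than failing. A plain-Python analogy of that behavior (illustrative only, not the Spark implementation; the field name is made up):

```python
import json

# A permissive reader: return the field if it matches the declared type,
# else None -- analogous to Spark nulling out schema-mismatched JSON fields.
def read_field(raw: str, field: str, expected_type: type):
    value = json.loads(raw).get(field)
    return value if type(value) is expected_type else None

row = '{"score": 3.14}'
print(read_field(row, "score", float))  # 3.14
print(read_field(row, "score", int))    # None: declared int, data is float
```

Comparing `df.printSchema()` before the save and after the reload is usually the quickest way to spot which field's declared type drifted.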
Anandsingh
by New Contributor
  • 2368 Views
  • 1 reply
  • 0 kudos

Writing to multiple files/tables from data held within a single file through autoloader

I have a requirement to read and parse JSON files using Autoloader where the incoming JSON file has multiple sub-entities. Each sub-entity needs to go into its own Delta table. Alternatively, we can write each entity's data to individual files. We can use D...

Latest Reply
Lakshay
Databricks Employee
  • 0 kudos

I think using DLT's medallion architecture should be helpful in this scenario. You can write all the incoming data to one bronze table and one silver table. And you can have multiple gold tables based on the value of the sub-entities.

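The bronze-to-many-gold split described in the reply can be sketched in plain Python (the entity names `orders` and `customers` are invented for the example; in DLT each entity would be its own `@dlt.table` selecting its sub-entity from the shared bronze table):

```python
import json

# One incoming document holding several sub-entities.
doc = '{"orders": [{"id": 1}], "customers": [{"id": 9}]}'

# Route each sub-entity to its own record list -- the per-table split.
tables = {entity: rows for entity, rows in json.loads(doc).items()}

print(sorted(tables))   # ['customers', 'orders']
print(tables["orders"]) # [{'id': 1}]
```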
Kavi_007
by Databricks Partner
  • 10303 Views
  • 6 replies
  • 1 kudos

Resolved! Seeing history even after vacuuming the Delta table

Hi, I'm trying to run VACUUM on a Delta table within a Unity Catalog. The default retention is 7 days. Though I vacuum the table, I can still see history beyond 7 days. I tried restarting the cluster but it still isn't working. What would be the fix?...

Latest Reply
Kavi_007
Databricks Partner
  • 1 kudos

No, that's wrong. VACUUM removes all files from the table directory that are not managed by Delta, as well as data files that are no longer in the latest state of the transaction log for the table and are older than a retention threshold. VACUUM - Azu...

5 More Replies
Jon
by New Contributor II
  • 9295 Views
  • 4 replies
  • 5 kudos

IP address fix

How can I fix the IP address of my Azure Cluster so that I can whitelist the IP address to run my job daily on my python notebook? Or can I find out the IP address to perform whitelisting? Thanks

Latest Reply
-werners-
Esteemed Contributor III
  • 5 kudos

It depends on the scenario. You could expose a single IP address to the external internet, but Databricks itself will always use many addresses.

3 More Replies
DE_K
by New Contributor II
  • 5088 Views
  • 4 replies
  • 0 kudos

@dlt.table throws error AttributeError: module 'dlt' has no attribute 'table'

Hi everyone, I am new to DLT and am trying to run the code below to create tables dynamically, but I get the error "AttributeError: module 'dlt' has no attribute 'table'". Code snippet: def generate_tables(model_name   try:    spark.sql("select * from dlt.{0}"....

[attachment: DE_K_0-1712148007690.png]
Data Engineering
dataengineering
datapipeline
deltalivetables
dlt
Latest Reply
YuliyanBogdanov
New Contributor III
  • 0 kudos

Thank you, @DE_K. I see your point. I believe you are using @dlt.table instead of @dlt.create_table to begin with, since you want the table to be created and not to define an existing one. (https://community.databricks.com/t5/data-engineering/differenc...

3 More Replies
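Besides the decorator-name question, this error often means the name `dlt` is not the DLT module at all, e.g. a local `dlt.py` shadowing it, or the code running outside a pipeline. A small, purely illustrative diagnostic (the helper name and messages are made up):

```python
import types

# Check whether the object bound to the name `dlt` could plausibly be the
# Delta Live Tables module, i.e. whether it exposes a .table attribute.
def diagnose_dlt(dlt_obj) -> str:
    if not isinstance(dlt_obj, types.ModuleType):
        return "`dlt` is rebound to a non-module object"
    if not hasattr(dlt_obj, "table"):
        return "wrong `dlt` module (shadowed, or running outside a pipeline)"
    return "`dlt` looks usable"

fake = types.ModuleType("dlt")   # stand-in lacking .table, for illustration
print(diagnose_dlt(fake))        # wrong `dlt` module (shadowed, or ...)
```

Printing `dlt.__file__` after the import is the quickest real-world check for a shadowing local file.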