Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

AliviaB
by New Contributor
  • 714 Views
  • 1 replies
  • 0 kudos

Authorization Issue while creating first Unity catalog table

 Hi All, We are setting up our new UC-enabled Databricks workspace. We have completed the metastore setup for our workspace and have created a new catalog and schema, but while creating a table we are getting an authorization issue. Below is the table s...

Latest Reply
cgrant
Databricks Employee
  • 0 kudos

Are there locations specified for the catalog/table/schema? Or do you keep these at defaults? Also, do you have a storage credential and external location set for mystorageaccount/mycontainer?

lucami
by Contributor
  • 2715 Views
  • 1 replies
  • 0 kudos

Resolved! Understanding dropDuplicates in Delta Live Tables (DLT) with Photon

Hi everyone, I've been working with Delta Live Tables (DLT) in Databricks, and I'm particularly interested in understanding how the dropDuplicates function works when using the Photon engine. Photon is known for its columnar data processing capabiliti...

Latest Reply
cgrant
Databricks Employee
  • 0 kudos

FIRST() never stitches together values from different rows. When Photon executes dropDuplicates, it deterministically chooses one complete row for each set of duplicate keys and returns every column from that same row. If you ever encounter a result w...
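The row-level guarantee above can be illustrated with a plain-Python sketch. This is an analogy, not Photon's implementation, and note one hedge: which duplicate survives is arbitrary in Spark, while the guarantee is only that every column comes from the same surviving row:

```python
# Plain-Python analogy of dropDuplicates: one complete row per key set;
# columns are never stitched together from different rows.
def drop_duplicates(rows, keys):
    seen = {}
    for row in rows:
        k = tuple(row[c] for c in keys)
        seen.setdefault(k, row)  # keep one full row for this key; drop the rest whole
    return list(seen.values())

rows = [
    {"id": 1, "name": "a", "ts": 10},
    {"id": 1, "name": "b", "ts": 20},  # same key: dropped as a whole row
    {"id": 2, "name": "c", "ts": 30},
]
print(drop_duplicates(rows, ["id"]))
```

In this sketch the result contains the first and third rows intact; a "stitched" combination like name "b" with ts 10 can never appear, because selection happens per row, not per column.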

surajitDE
by Contributor
  • 1214 Views
  • 2 replies
  • 1 kudos

Resolved! How to Enable Sub-300 Millisecond Real-Time Mode in Delta Live Tables (DLT)

Hi folks, During the recent Data + AI Summit, there was a mention of a new real-time streaming mode in Delta Live Tables (DLT) that enables sub-300-millisecond latency. This sounds really promising! Could someone please guide me on: How do we enable thi...

Latest Reply
cgrant
Databricks Employee
  • 1 kudos

Real-time mode is currently in private preview; reach out to your account team for enablement. It's separate from pipelines.trigger.interval: the engine is the same, just a different mode within it.

1 More Replies
pargit2
by New Contributor II
  • 1443 Views
  • 5 replies
  • 0 kudos

dlt vs delta table

Hi, I'm building the gold and silver layers; in bronze I ingest using Auto Loader. The data is updated once a month. Should I save the df in the silver notebooks using a Delta Live Table or a plain Delta table? In the past I used simply: df.write.save("s3.."...

Latest Reply
nayan_wylde
Esteemed Contributor II
  • 0 kudos

I would say that if the data is not complex and you are not handling any DQ checks in the pipeline, go for a regular Databricks workflow and save the output as a Delta table, since you are refreshing the data only once a month and it is not a streaming workload.

4 More Replies
taruntarun1345
by New Contributor
  • 2904 Views
  • 1 replies
  • 0 kudos

cluster creation

Hey all, I am facing an issue creating a cluster. I can only see the SQL warehouse and serverless creation options, but I need to create a cluster to work on a data engineering project.

Latest Reply
jameshughes
Databricks Partner
  • 0 kudos

A couple of things to explore here, as this can be solved a couple of different ways. 1. A workspace admin needs to update your entitlements to allow cluster creation. This is generally not a best practice, as it can lead to unmanaged cluster sprawl....

Monteiro_12
by New Contributor II
  • 769 Views
  • 1 replies
  • 0 kudos

How to Add a Certified Tag to a Table Using a DLT Pipeline

Is there a table property or configuration that allows me to add a certified tag directly to a table when using a Delta Live Tables pipeline?

Latest Reply
SP_6721
Honored Contributor II
  • 0 kudos

Hi @Monteiro_12, As far as I know, a DLT pipeline doesn't support adding a certified tag directly through table properties or pipeline configurations. Tags like system.Certified need to be applied manually, via SQL, after the table is created.

samgon
by New Contributor III
  • 4448 Views
  • 4 replies
  • 4 kudos

Resolved! study materials for Certified Data Engineer Professional Certification?

Can anyone recommend high-quality study materials or resources (courses, documentation, practice exams, etc.) that helped you prepare for the Professional-level exam?

Data Engineering
dataengineering
Latest Reply
samgon
New Contributor III
  • 4 kudos

Thanks a lot for the suggestion, much appreciated. I already passed the associate exam!

3 More Replies
HariPrasad1
by Databricks Partner
  • 1148 Views
  • 2 replies
  • 0 kudos

Unable to create log files using logging.basicConfig()

When I run the code below, I am not able to see the file under the specified path:

import logging
logger = logging.getLogger(__name__)
logging.basicConfig(filename='/Volumes/d_use1_ach_dbw_databricks1/default/ach_elegibility_raw/logs/example.log', enco...

Latest Reply
Yogesh_Verma_
Contributor II
  • 0 kudos

The issue is happening because you're calling logging.getLogger(__name__) before setting up logging.basicConfig(). When the logger is created too early, it doesn't know about the file handler, so it doesn't write to the file. To fix this, make sure yo...
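The fix can be sketched with the standard library alone. One assumption worth flagging: this sketch uses force=True (Python 3.8+), which makes basicConfig replace any handlers a hosting environment such as a notebook may already have installed on the root logger; a pre-installed root handler is a common reason basicConfig silently does nothing and the file never appears:

```python
import logging
import os
import tempfile

# A writable placeholder path; on Databricks this would be a /Volumes/... path.
log_path = os.path.join(tempfile.gettempdir(), "example.log")

# Configure the root logger before emitting anything; force=True clears
# any handlers the environment pre-installed, so the file handler takes effect.
logging.basicConfig(filename=log_path, level=logging.INFO, force=True)

logger = logging.getLogger(__name__)
logger.info("hello from basicConfig")

print(open(log_path).read())
```

Because loggers propagate records to the root logger, the getLogger/basicConfig order matters less than whether basicConfig actually installed its handler, which is exactly what force=True guarantees.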

1 More Replies
Datamate
by New Contributor
  • 934 Views
  • 2 replies
  • 0 kudos

Databricks Connecting to ADLS Gen2 vs Azure SQL

What is the best approach to connect Databricks with Azure SQL, or to connect Databricks with ADLS Gen2? I am designing a system where I am planning to integrate Databricks with Azure. Could someone share their experience, the pros and cons of each approach, and best practic...

Latest Reply
kavithai
New Contributor II
  • 0 kudos

Use the Azure SQL Spark Connector. This method allows Databricks to read from and write to Azure SQL Database efficiently, supporting both bulk operations and secure authentication. Azure SQL: install the connector, configure JDBC, use Key Vault, set permiss...
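The JDBC route above can be sketched as follows. Every name here (server, database, table, credentials, secret scope) is a placeholder, not a real endpoint, and in a real workspace the password would come from a Key Vault-backed secret scope rather than a literal:

```python
# Hypothetical JDBC options for reading Azure SQL from Databricks.
# <server>, <db>, dbo.my_table and the credentials are placeholders.
jdbc_options = {
    "url": "jdbc:sqlserver://<server>.database.windows.net:1433;databaseName=<db>",
    "dbtable": "dbo.my_table",
    "user": "<sql-user>",
    # In practice: dbutils.secrets.get(scope="<scope>", key="<key>")
    "password": "<from-key-vault>",
    "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
}

# Inside a Databricks notebook this dict would be consumed with:
# df = spark.read.format("jdbc").options(**jdbc_options).load()
print(sorted(jdbc_options))
```

Keeping the options in one dict makes it easy to swap the credential lines for secret-scope lookups without touching the read call itself.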

1 More Replies
saicharandeepb
by Contributor
  • 846 Views
  • 1 replies
  • 1 kudos

Resolved! I'm trying to understand if predicate pushdown is supported when using the DESCRIBE HISTORY command

I'm trying to understand if predicate pushdown is supported when using the DESCRIBE HISTORY command on a Delta table in Databricks.

Latest Reply
Yogesh_Verma_
Contributor II
  • 1 kudos

DESCRIBE HISTORY on a Delta table in Databricks does not support predicate pushdown in the same way regular SQL queries on data tables do. This is because DESCRIBE HISTORY is a metadata operation that reads the Delta log files to return ta...

lucami
by Contributor
  • 1888 Views
  • 5 replies
  • 2 kudos

Resolved! Validation with views - Dlt pipeline expectations

I have a question about how expectations work when applied to views inside a Delta Live Tables (DLT) pipeline. For instance, suppose we define this view inside a pipeline to stop the pipeline if we spot some duplicates: @dlt.view( name=view_name, ...

Latest Reply
Yogesh_Verma_
Contributor II
  • 2 kudos

In DLT, expectations defined with dlt.expect_or_fail() on views are only evaluated if the view is used downstream by a materialized table. Since views are logical and lazily evaluated, if no table depends on the view, the expectation is skipped and t...
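The lazy-evaluation point is easy to reproduce with a plain-Python analogy (no DLT involved): a check inside a generator only fires when something downstream actually consumes it, just as a view's expectation only fires when a table reads from the view:

```python
# Plain-Python analogy for a lazily evaluated view with an expectation.
def validated_view(rows):
    for row in rows:
        if row["id"] is None:
            raise ValueError("expectation failed: id is null")
        yield row

view = validated_view([{"id": None}])  # nothing runs yet; no error is raised

try:
    materialized = list(view)  # a downstream "table" consumes the view
    failed = False
except ValueError:
    failed = True  # the check fired only on materialization

print(failed)
```

If nothing ever calls list(view), the bad row is never inspected and the pipeline-stopping check silently never runs, which mirrors an expectation on an unused DLT view.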

4 More Replies
bgerhardi
by New Contributor III
  • 17102 Views
  • 13 replies
  • 13 kudos

Surrogate Keys with Delta Live

We are considering moving to Delta Live Tables from a traditional SQL-based data warehouse. Worrying me is this FAQ on identity columns, Delta Live Tables frequently asked questions | Databricks on AWS; this seems to suggest that we basically can't cre...

Latest Reply
tmaund1704
New Contributor II
  • 13 kudos

Hi, is there any resolution for the above? Thanks

12 More Replies
sahil_s_jain
by New Contributor III
  • 1381 Views
  • 3 replies
  • 0 kudos

How to Exclude or Overwrite Specific JARs in Databricks Jars

Spark Version in Databricks 15.5 LTS: the runtime includes Apache Spark 3.5.x, which defines the SparkListenerApplicationEnd constructor as:

public SparkListenerApplicationEnd(long time)

This constructor takes a single long parameter. Conflicting Spark ...

Latest Reply
baljeetyadav_23
New Contributor II
  • 0 kudos

Hi Alberto_Umana, do we have a fix for this issue in 16.4 LTS?

2 More Replies
pooja_bhumandla
by Databricks Partner
  • 1620 Views
  • 3 replies
  • 0 kudos

data file size

"numRemovedFiles": "2099",
"numRemovedBytes": "29658974681",
"p25FileSize": "29701688",
"numDeletionVectorsRemoved": "0",
"minFileSize": "19920357",
"numAddedFiles": "883",
"maxFileSize": "43475356",
"p75FileSize": "34394580",
"p50FileSize": "31978037",
"numA...

Latest Reply
pooja_bhumandla
Databricks Partner
  • 0 kudos

What are the criteria based on which the max and min file sizes vary from the target file size?

2 More Replies