Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

pyter
by New Contributor III
  • 5598 Views
  • 6 replies
  • 2 kudos

Resolved! [13.3] Vacuum on table fails if shallow clone without write access exists

Hello everyone, We use Unity Catalog, separating our dev, test, and prod data into individual catalogs. We run weekly vacuums on our prod catalog using a service principal that only has (read+write) access to this production catalog, but no access to ou...
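For context, a weekly VACUUM job of this kind is usually just a loop over the catalog's tables. A minimal sketch, assuming placeholder catalog and schema names ("prod", "sales") and the default 7-day retention, not the poster's actual setup:

# Hypothetical weekly VACUUM job, run as the service principal.
for t in spark.sql("SHOW TABLES IN prod.sales").collect():
    spark.sql(f"VACUUM prod.sales.{t.tableName} RETAIN 168 HOURS")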

Latest Reply
Kaniz_Fatma
Community Manager
  • 2 kudos

Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question? This...

5 More Replies
rt-slowth
by Contributor
  • 727 Views
  • 2 replies
  • 0 kudos

Handling files used more than once in a streaming pipeline

I am implementing Structured Streaming using Delta Live Tables. I want to delete the Parquet files once they are used. What options should I set so that the files loaded in S3 are not deleted?
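For reference, a DLT source of this shape typically reads the S3 files with Auto Loader, which by default only tracks which files it has already ingested and does not delete them from the bucket; cleanup is normally left to bucket lifecycle rules. A minimal sketch with a placeholder path:

import dlt

# Hypothetical Auto Loader source; Auto Loader does not delete
# ingested files by default.
@dlt.table
def bronze_events():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "parquet")
        .load("s3://my-bucket/landing/")  # placeholder path
    )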

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question? This...

1 More Reply
JasonThomas
by New Contributor III
  • 1287 Views
  • 2 replies
  • 0 kudos

Row-level Concurrency and Liquid Clustering compatibility

The documentation is a little ambiguous: "Row-level concurrency is only supported on tables without partitioning, which includes tables with liquid clustering." https://docs.databricks.com/en/release-notes/runtime/14.2.html Tables with liquid clusterin...
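For readers hitting the same sentence: it appears to mean that liquid-clustered tables count as unpartitioned, so row-level concurrency can apply to them. A liquid-clustered table is declared roughly like this (table and column names are placeholders):

# Hypothetical unpartitioned table with liquid clustering.
spark.sql("""
    CREATE TABLE IF NOT EXISTS main.demo.events (id BIGINT, ts TIMESTAMP)
    CLUSTER BY (id)
""")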

Latest Reply
JasonThomas
New Contributor III
  • 0 kudos

Cluster-on-write is something being worked on. The limitations at the moment have to do with accommodating streaming workloads. I found the following informative: https://www.youtube.com/watch?v=5t6wX28JC_M

1 More Reply
ChristianRRL
by Contributor III
  • 1641 Views
  • 1 reply
  • 1 kudos

Resolved! DLT Deduping Best Practice in Medallion

Hi there, I have what may be a deceptively simple question, but I suspect it may have a variety of answers: what is the "right" place to handle deduping using the medallion architecture? In my example, I already have everything properly laid out with data...

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @ChristianRRL,  Your approach to handling deduplication in the Silver layer of the Medallion architecture is quite common and aligns with the general principles of this architecture. In the Medallion architecture, data flows through different l...
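The pattern being discussed, deduplicating on a business key as bronze flows into silver, looks roughly like this in DLT (the table and column names are placeholder assumptions, not from the thread):

import dlt

# Hypothetical silver table that drops duplicate rows by business key.
@dlt.table
def silver_orders():
    return dlt.read("bronze_orders").dropDuplicates(["order_id"])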

Gcabrera
by New Contributor
  • 903 Views
  • 1 reply
  • 0 kudos

Issue importing library deltalake

Hello, I'm currently seeing a rather cryptic error message whenever I try to import the deltalake library into Databricks (without actually doing anything else). import deltalake "ImportError: /local_disk0/.ephemeral_nfs/envs/pythonEnv-cbe496f6-d064-40ae...
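For anyone hitting similar errors: the deltalake package (the delta-rs Python bindings) ships a native extension, so a broken or mixed install in the notebook-scoped environment can fail exactly at import time. A hypothetical first step is a clean notebook-scoped reinstall:

# Reinstall into the notebook-scoped environment, then smoke-test the import.
%pip install --force-reinstall deltalake

from deltalake import DeltaTable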

Latest Reply
jose_gonzalez
Moderator
  • 0 kudos

Are you trying to import this library into a Databricks notebook? Are you using open-source Spark on your local machine?

N_M
by Contributor
  • 3358 Views
  • 5 replies
  • 1 kudos

Resolved! ignoreCorruptFiles behavior with CSV and COPY INTO

Hi, I'm using the COPY INTO command to insert new data (in the form of CSVs) into an already existing table. The SQL query takes care of converting the fields to the target table schema (well, there isn't another way to do that), and schema update is n...

Data Engineering
COPY INTO
ignoreCorruptFiles
Latest Reply
N_M
Contributor
  • 1 kudos

I actually found an option that could solve the newline issue I mentioned in my previous post: setting spark.sql.csv.parser.columnPruning.enabled to false with spark.conf.set("spark.sql.csv.parser.columnPruning.enabled", False) will consider malformed r...
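Putting the thread's pieces together, a sketch of that workaround alongside COPY INTO; the table name, path, and CSV options are placeholders, not the poster's actual values:

# Disable CSV column pruning so malformed/multi-line rows surface properly.
spark.conf.set("spark.sql.csv.parser.columnPruning.enabled", False)

spark.sql("""
    COPY INTO main.staging.events
    FROM 's3://my-bucket/incoming/'
    FILEFORMAT = CSV
    FORMAT_OPTIONS ('header' = 'true', 'multiLine' = 'true')
""")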

4 More Replies
shivam-singh
by New Contributor
  • 1245 Views
  • 1 reply
  • 0 kudos

DLT || Python || Aggregate Functions recomputing all the records

Hi all, I am building a real-time dashboard using a Databricks Delta Live Tables pipeline with the following steps: Bronze table: using the Auto Loader functionality provided by Databricks, it incrementally ingests new file records into a br...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @shivam-singh,  One option is to use change data capture (CDC) in Databricks Delta Live Tables Pipeline to update tables based on changes in source data. CDC lets you capture and apply changes from streaming sources like Kafka, S3, or Delta Lake t...
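The CDC route mentioned here is DLT's apply_changes API; a minimal sketch, where the table names, business key, and sequencing column are placeholder assumptions:

import dlt
from pyspark.sql.functions import col

dlt.create_streaming_table("silver_orders")

dlt.apply_changes(
    target="silver_orders",
    source="bronze_orders",        # placeholder source table
    keys=["order_id"],             # placeholder business key
    sequence_by=col("updated_at"), # placeholder ordering column
)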

rt-slowth
by Contributor
  • 481 Views
  • 0 replies
  • 0 kudos

Help design my streaming pipeline

Data source: AWS RDS; database migration tasks have been created using AWS DMS; the relevant CDC information is being stored in a specific bucket in S3. Data frequency: once a day (but not sure when, sometime after 6pm). Development environment: d...
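Given the once-a-day arrival pattern described, one hypothetical approach is a scheduled job that drains whatever DMS has written to S3 and then stops, e.g. with an availableNow trigger (all paths and table names below are placeholders):

(
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "parquet")
    .load("s3://dms-output/cdc/")  # placeholder DMS bucket
    .writeStream
    .option("checkpointLocation", "s3://checkpoints/cdc/")
    .trigger(availableNow=True)    # drain all pending files, then stop
    .toTable("bronze_cdc")
)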

Phani1
by Valued Contributor II
  • 1522 Views
  • 1 reply
  • 0 kudos

Query Delta table from .net

Hi Team, How can we expose data stored in a Delta table through an API, like exposing SQL data through a .NET API?

Data Engineering
delta
dotnet
Latest Reply
BjarkeM
New Contributor II
  • 0 kudos

You can use the SQL Statement Execution API. At energinet.dk we have created this open-source .NET client, which we use internally in the company.
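For readers not on .NET, the same API is plain REST, so the shape of a call is easy to see from any language. A hypothetical Python sketch, where the host, token, warehouse ID, and query are all placeholders:

import requests

resp = requests.post(
    "https://<workspace-host>/api/2.0/sql/statements/",
    headers={"Authorization": "Bearer <token>"},
    json={
        "warehouse_id": "<warehouse-id>",
        "statement": "SELECT * FROM main.demo.events LIMIT 10",
        "wait_timeout": "30s",
    },
)
print(resp.json()["status"]["state"])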

Muhammed
by New Contributor III
  • 16583 Views
  • 15 replies
  • 0 kudos

Filtering files for query

Hi Team, While writing my data to a data lake table I am getting 'filtering files for query', and the write gets stuck. How can I resolve this issue?

Latest Reply
kulkpd
Contributor
  • 0 kudos

My bad, somewhere in the screenshot I saw that, but I'm not able to find it now. Which source are you using to load the data: a Delta table, AWS S3, or Azure Storage?

14 More Replies
pankz-104
by New Contributor
  • 1189 Views
  • 2 replies
  • 0 kudos

How to read deleted files in ADLS

We have soft delete enabled in ADLS for 3 days, and we have manually deleted some checkpoint files, roughly 3 TB in total. Each file is just a couple of bytes, like 30 B or 40 B. The deleted file size is increasing day by day even after a couple of days. Suppose ...

Latest Reply
jose_gonzalez
Moderator
  • 0 kudos

Hi @pankz-104, just a friendly follow-up. Did you have time to test Kaniz's recommendations? Do you still have issues? Please let us know.

1 More Reply
quakenbush
by Contributor
  • 1529 Views
  • 2 replies
  • 1 kudos

Resolved! Is Autoloader suitable to load full dumps?

Hi, I recently completed the fundamentals & advanced data engineer exams, yet I've got a question about Auto Loader. Please don't go too hard on me, since I lack practical experience at this point in time. The docs say this is incremental ingestion, so it's ...
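One option sometimes used when full dumps are re-delivered under the same file names is Auto Loader's allowOverwrites flag, so modified files are picked up again; a sketch with placeholder format and path:

# Hypothetical: reprocess files whose contents change between dumps.
df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "csv")
    .option("cloudFiles.allowOverwrites", "true")
    .load("s3://my-bucket/full-dumps/")  # placeholder path
)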

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Our End-of-Year Community Survey is here! Please take a few moments to complete the survey. Your feedback matters!

1 More Reply
BriGuy
by New Contributor II
  • 1201 Views
  • 0 replies
  • 0 kudos

How can I efficiently write to easily queryable logs?

I've got a parallel-running process loading multiple tables into the data lake. I'm writing my logs to a Delta table using DataFrameWriter in append mode. The problem is that every save takes a bit of time, with what appears to be the calculation o...

BriGuy
by New Contributor II
  • 1231 Views
  • 2 replies
  • 0 kudos

Process logging optimisation

I have created a process that runs a notebook multiple times in parallel with different parameters. This was working quite quickly. However, I've added several logging steps that append log details to a DataFrame and then use DataFrameWriter to...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @BriGuy, Regarding your Databricks SQL Editor issue, you're not alone! Several users have faced similar problems. Here are some steps you can take: Contact Databricks Support: I recommend contacting Databricks support. File a support ticket t...

1 More Reply
Hubert-Dudek
by Esteemed Contributor III
  • 4345 Views
  • 2 replies
  • 0 kudos

Resolved! dlt append_flow = multiple streams into a single Delta table

With the append_flow method in Delta Live Tables, you can effortlessly combine data from multiple streams into a single Delta table.
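A minimal sketch of that pattern, assuming placeholder Kafka topic names, broker address, and target table name:

import dlt

dlt.create_streaming_table("combined_events")

@dlt.append_flow(target="combined_events")
def from_topic_a():
    # Placeholder Kafka source; any streaming source works here.
    return (spark.readStream.format("kafka")
            .option("kafka.bootstrap.servers", "<broker>")
            .option("subscribe", "topic_a").load())

@dlt.append_flow(target="combined_events")
def from_topic_b():
    return (spark.readStream.format("kafka")
            .option("kafka.bootstrap.servers", "<broker>")
            .option("subscribe", "topic_b").load())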

Latest Reply
jose_gonzalez
Moderator
  • 0 kudos

Thank you for sharing this information @Hubert-Dudek 

1 More Reply