Data Engineering

Forum Posts

by Jorge3 • New Contributor III

03-15-2024 1:59:28 AM

9906 Views
3 replies
1 kudos

Dynamic partition overwrite with Streaming Data

Hi, I'm working on a job that propagates data updates from a Delta table to Parquet files (a requirement of the consumer). The data is partitioned by day (year > month > day), and the daily data is updated every hour. I'm using table read streaming w...

Latest Reply

JacintoArias
New Contributor III

06-12-2024 8:47:04 AM

1 kudos

We had a similar situation. @Hubert-Dudek, we are using Delta, but we run into problems when propagating updates via merge, because you can no longer read the resulting table as a streaming source... so using complete overwrite over parquet partition...

2 More Replies

by dbengineer516 • New Contributor III

06-12-2024 7:05:17 AM

9142 Views
1 reply
0 kudos

Resolved! IOStream.flush Timed Out

Hello, I'm encountering an issue with a Python script/notebook that I developed and use in a daily job run in Databricks. It worked perfectly fine for months, but now it fails constantly. After digging a little deeper, when running ...

Latest Reply

raphaelblg
Databricks Employee

06-12-2024 8:42:19 AM

0 kudos

Hello @dbengineer516, from my research this looks like an IPython cache error. Your Python REPL may be getting throttled due to too many requests. Please check: https://github.com/ipython/ipykernel/issues/334 This comment seems to be a possible solu...

by Philospher1425 • New Contributor II

06-06-2024 4:39:24 AM

9559 Views
4 replies
2 kudos

Renaming a file in Databricks is so hard. How to make it simpler?

Hi Community, my requirement is actually simple: I need to drop files into Azure Data Lake Storage Gen2 from Databricks. But when I use df.coalesce(1).write.csv("url to gen 2/stage/") it creates a part .CSV file, and I need to rename it to a cust...

Latest Reply

raphaelblg
Databricks Employee

06-06-2024 12:48:02 PM

2 kudos

Hi @Philospher1425, Allow me to clarify that dbutils.fs serves as an interface to submit commands to your cloud provider storage. As such, the speed of copy operations is determined by the cloud provider and is beyond Databricks' control. That be...
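The rename step asked about in this thread can be sketched as follows. This is only a minimal illustration, assuming the output folder is reachable as a local path (for example through a mount); the function name `rename_single_part_file` and the target name `report.csv` are hypothetical. Against cloud storage on Databricks, the same idea is usually expressed with `dbutils.fs.ls` to find the part file and `dbutils.fs.mv` to move it.

```python
import shutil
from pathlib import Path


def rename_single_part_file(output_dir: str, target_name: str) -> Path:
    """Move the single part-*.csv file in output_dir to output_dir/target_name.

    Hypothetical helper: assumes the folder was written with coalesce(1),
    so exactly one part file is expected next to marker files like _SUCCESS.
    """
    out = Path(output_dir)
    parts = sorted(out.glob("part-*.csv"))
    if len(parts) != 1:
        # Refuse to guess when the writer produced zero or several part files.
        raise ValueError(f"expected exactly one part file, found {len(parts)}")
    target = out / target_name
    shutil.move(str(parts[0]), str(target))
    return target
```

Calling `rename_single_part_file("/tmp/stage", "report.csv")` would leave `/tmp/stage/report.csv` in place of the part file. On object storage a move is typically a copy followed by a delete, so its duration depends on the file size and the storage provider.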

3 More Replies

by Lizhi_Dong • New Contributor II

01-23-2023 6:27:55 AM

3789 Views
6 replies
1 kudos

Tables disappear when I re-start a new cluster on Community Edition

What would be the best plan for an independent course creator? Hi folks! I want to use Databricks Community Edition as the platform to teach online courses. As you may know, on Community Edition you need to create a new cluster when the old one terminat...

Latest Reply

Shivanshu_
Contributor

06-12-2024 8:28:03 AM

1 kudos

I believe only the metadata gets removed from the Hive metastore (HMS), not the Delta files from DBFS. Instead of loading the data again and again, try using CTAS with that DBFS location.

5 More Replies

by JonM • New Contributor

06-12-2024 8:10:17 AM

1099 Views
0 replies
0 kudos

Information_schema appears empty

Hi, we've encountered a problem with the information schema for one of our catalogs. For context: we're using dbt to implement our logic. We noticed this issue because dbt queries the information_schema.tables view to check which tables should be drop...

by venkateshgunda • New Contributor III

06-07-2024 9:29:23 AM

1457 Views
3 replies
3 kudos

Resolved! Tables and Databases disappear when I re-start a new cluster on Community Edition

Latest Reply

raphaelblg
Databricks Employee

06-10-2024 1:21:20 PM

3 kudos

Hello @venkateshgunda, Community Edition managed storage is temporary.

2 More Replies

by Pratibha • New Contributor II

06-12-2024 2:46:30 AM

1695 Views
2 replies
0 kudos

Which cluster/worker/driver type is best for analytics work?

Latest Reply

jacovangelder
Honored Contributor

06-12-2024 5:21:48 AM

0 kudos

Analytics work as in querying and analyzing data? Preferably using Databricks SQL? If so, then a SQL Warehouse is your best friend.

1 More Replies

by kiko_roy • Contributor

06-11-2024 4:08:08 AM

3177 Views
5 replies
2 kudos

Resolved! Delta Sharing open protocol in Unity Catalog: FileNotFoundError

Hi Team, I have created a recipient under Delta Sharing (Azure Databricks). Unity Catalog is enabled and data is stored in ADLS Gen2. I have downloaded the credential file and am trying to reuse it in my Python script (as per the Databricks documentation) for a...

Latest Reply

jacovangelder
Honored Contributor

06-11-2024 7:41:38 AM

2 kudos

I wasn't able to reproduce your issue. Is your Delta table operable? Can you see sample data and query the table from within Databricks? It almost looks like some Parquet files are missing, making your Delta table not queryable anym...

4 More Replies

by medha • New Contributor

06-12-2024 1:17:07 AM

1027 Views
1 reply
0 kudos

Issues with Common Data Model as Source - different column size for blobs

I have a Dataverse Synapse link set up to extract data into ADLS Gen2. I am trying to connect ADLS Gen2 as the data source to read the data files in Databricks. I have CDC enabled for CDM data, partitioned by year and month. So, for example, if ...

Latest Reply

jacovangelder
Honored Contributor

06-12-2024 2:33:23 AM

0 kudos

This is a thoughtful consideration, but have you considered using .option("mergeSchema", "true") when writing? Do keep in mind that this will affect the target table and possibly downstream consumers. Ideally you want to have a strict schema contract ...

by maskepravin02 • New Contributor II

06-03-2024 1:07:01 AM

3182 Views
2 replies
0 kudos

Resolved! How can we connect to 2 different hive spark.hadoop.hive.metastore.uris

We need to read a table from 2 different spark.hadoop.hive.metastore.uris and do some validations. We are not able to connect to both spark.hadoop.hive.metastore.uris at the same time using sparkSession. I will be using Spark version: 3.1.1 and the lan...

Latest Reply

ashraf1395
Honored Contributor

06-11-2024 11:24:58 PM

0 kudos

Hi there @maskepravin02, we once implemented this approach of reading two different Hive metastores, but it was not on AWS and GCP; maybe the docs can help. It is not recommended, though. The best approach is to create separate Spark applications...

1 More Replies

by pt07 • New Contributor

06-11-2024 4:15:42 PM

922 Views
1 reply
0 kudos

How do I pass the value from one task in the workflow to another task?

How do I pass the value from one task in the workflow to another task? #worksflow #orchestration

Latest Reply

brockb
Databricks Employee

06-11-2024 8:34:44 PM

0 kudos

Hi @pt07 , This may be what you're looking for. Can you please take a look? https://www.databricks.com/blog/2022/08/02/sharing-context-between-tasks-in-databricks-workflows.html
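For readers who want a concrete picture, Databricks exposes task values through `dbutils.jobs.taskValues`. The following is a minimal sketch, not taken from the linked post: the helper names `publish_value` and `read_value` and the example task key are hypothetical, and `dbutils` is passed in as a parameter so the functions can also be exercised outside a notebook.

```python
def publish_value(dbutils, key, value):
    """Upstream task: store a small JSON-serializable value under `key`."""
    dbutils.jobs.taskValues.set(key=key, value=value)


def read_value(dbutils, task_key, key, default=None):
    """Downstream task: read the value that task `task_key` stored under `key`.

    `default` is returned when the upstream task did not set the key.
    """
    return dbutils.jobs.taskValues.get(taskKey=task_key, key=key, default=default)
```

In a job with an upstream task named `ingest` (an assumed name), the upstream notebook would call `publish_value(dbutils, "row_count", 42)` and the downstream notebook would call `read_value(dbutils, "ingest", "row_count", default=0)`. Task values are meant for small values such as counts or flags, not for datasets.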

by subashdsouza • New Contributor

06-11-2024 4:49:10 PM

1085 Views
1 reply
0 kudos

How will Iceberg be integrated into Delta Lake

Latest Reply

mhiltner
Databricks Employee

06-11-2024 5:36:18 PM

0 kudos

Delta UniForm is the way to go at the moment: https://www.databricks.com/blog/delta-lake-universal-format-uniform-iceberg-compatibility-now-ga In short, tables are written in Delta but are compatible with Iceberg, as they will also save iceberg...

by Peter-M • New Contributor II

06-11-2024 3:59:38 PM

3553 Views
3 replies
2 kudos

Resolved! Is there a way to have multiple triggers for a single workflow?

Latest Reply

mhiltner
Databricks Employee

06-11-2024 5:34:12 PM

2 kudos

You could have a workflow with two tasks, one being a "trigger checker": a very light task scheduled to run every X hours/minutes. This first task would check for your different triggers and define a success criterion for your next task....

2 More Replies

by quachtv • New Contributor

06-11-2024 4:48:57 PM

408 Views
0 replies
0 kudos

Interesting talks at DAIS 2024

I feel some of the talks are quite interesting and inspiring. Great to see everyone here!

by 555228 • New Contributor II

06-11-2024 3:52:27 PM

886 Views
2 replies
1 kudos

Liquid Clustering

Love the new concept, will save a lot of compute and make queries a lot faster!

Latest Reply

Rajiv007
New Contributor II

06-11-2024 3:58:22 PM

1 kudos

Awesome!

1 More Replies
