Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

kmodelew
by New Contributor III
  • 76 Views
  • 7 replies
  • 12 kudos

Unable to read Excel file from Volume

Hi, I'm trying to read an Excel file directly from a Volume (not the workspace or FileStore) -> all examples on the internet use the workspace or FileStore. The Volume is an external location, so I can read from there, but I would like to read directly from the Volume. I hav...
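
One common way to do this is via pandas, since Unity Catalog volumes are FUSE-mounted under /Volumes. A minimal sketch, assuming openpyxl is installed on the cluster; the volume path is hypothetical:

```python
# Minimal sketch: read an Excel file directly from a Unity Catalog volume.
# The /Volumes path below is hypothetical; openpyxl must be installed.
import pandas as pd

pdf = pd.read_excel(
    "/Volumes/my_catalog/my_schema/my_volume/report.xlsx",  # hypothetical path
    sheet_name=0,
    engine="openpyxl",
)
df = spark.createDataFrame(pdf)  # convert to a Spark DataFrame if needed
display(df)
```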

Latest Reply
TheOC
Contributor
  • 12 kudos

@szymon_dybczak I think we also need to be conscious of the damage blind LLM usage can do. I'd hope it'd get caught early in a Community message, and in this case the hallucination was relatively harmless. However, there are plenty of instances on red...

6 More Replies
Michał
by Visitor
  • 42 Views
  • 2 replies
  • 2 kudos

How to process a streaming Lakeflow declarative pipeline in batches

Hi, I've got a problem and I have run out of ideas as to what else I can try. Maybe you can help? I've got a Delta table with hundreds of millions of records on which I have to perform relatively expensive operations. I'd like to be able to process some...

Latest Reply
Michał
Visitor
  • 2 kudos

Thanks @szymon_dybczak. From my experiments so far, you can set `maxFilesPerTrigger`, `maxBytesPerTrigger`, and other settings in both Python and SQL code when you declare streaming tables in declarative pipelines. However, I don't see any evidence th...
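
A minimal sketch of the rate limiting described above, assuming a Python declarative pipeline reading a Delta source; the table names and option values are illustrative, not verified settings:

```python
import dlt

@dlt.table(name="events_batched")  # hypothetical target table
def events_batched():
    # Cap how much data each micro-batch pulls from the Delta source,
    # so the expensive per-row work proceeds in bounded chunks.
    return (
        spark.readStream
             .option("maxFilesPerTrigger", 100)      # files per micro-batch
             .option("maxBytesPerTrigger", "10g")    # soft cap on bytes per batch
             .table("catalog.schema.source_events")  # hypothetical source
    )
```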

1 More Replies
yinan
by Visitor
  • 45 Views
  • 4 replies
  • 4 kudos

Resolved! Does the free version of Databricks not support external storage data sources?

1. Can the data I use with the free version of Databricks on Azure only be stored on Azure, AWS, and Google Cloud Storage? 2. Assuming the network is connected, can the paid version be used to access other publicly stored data (i.e., custom storage spac...

Latest Reply
BS_THE_ANALYST
Honored Contributor III
  • 4 kudos

Not sure if this is a cheeky way to get around bringing files in: https://community.databricks.com/t5/data-engineering/connect-to-azure-data-lake-storage-using-databricks-free-edition/m-p/127900#M48116, but I answered a similar thing on a different po...

3 More Replies
ManoramTaparia
by Visitor
  • 32 Views
  • 1 reply
  • 1 kudos

Identify updated rows during incremental refresh in DLT Materialized Views

Hello, every time that I run a Delta Live Tables materialized view in serverless, I get a log of "COMPLETE RECOMPUTE". I realised I was using current_timestamp as a column in my MV to identify rows which got updated in the last refresh. But that make...

Latest Reply
ck7007
New Contributor II
  • 1 kudos

@ManoramTaparia The issue is that current_timestamp() makes your MV non-deterministic, forcing complete recomputes. Here's how to fix it:
Solution: Use the source table's change tracking.
Option 1: Leverage the source table's timestamp column: @dlt.table(name...
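
The reply is cut off above; a minimal sketch of the pattern it names (option 1), assuming the source table already carries an updated_at column. All names are hypothetical, and this is a reconstruction rather than the replier's full code:

```python
import dlt
from pyspark.sql import functions as F

@dlt.table(name="orders_mv")  # hypothetical materialized view
def orders_mv():
    # Derive the "last updated" marker from the source rows rather than
    # current_timestamp(), keeping the query deterministic so incremental
    # refresh remains possible.
    return (
        spark.read.table("catalog.schema.orders")  # hypothetical source table
             .withColumn("last_refresh_marker", F.col("updated_at"))
    )
```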

yinan
by Visitor
  • 78 Views
  • 5 replies
  • 2 kudos
Latest Reply
Khaja_Zaffer
Contributor
  • 2 kudos

Hello @yinan, good day! Databricks, being a cloud-based platform, does not have direct built-in support for reading data from a truly air-gapped (completely offline, no network connectivity) Cloudera Distribution for Hadoop (CDH) environment. In such...

4 More Replies
Kurgod
by New Contributor II
  • 145 Views
  • 2 replies
  • 0 kudos

Using Databricks to transform a Cloudera lakehouse on-prem without bringing the data to the cloud

I am looking for a solution to connect Databricks to a Cloudera lakehouse hosted on-prem and transform the data using Databricks without bringing the data into Databricks Delta tables or cloud storage. Once the transformation is done, the data needs to be ...

Latest Reply
BR_DatabricksAI
Contributor III
  • 0 kudos

Hello, what is your data volume? You can connect using JDBC/ODBC, but this process will be slow if the data volume is high. Alternatively, if your Cloudera storage is in HDFS, you can also connect through the HDFS API.
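
A minimal sketch of the JDBC route mentioned above, assuming a HiveServer2 endpoint reachable from the workspace and the Hive JDBC driver installed on the cluster; host, secret scope, and table names are hypothetical:

```python
# Read an on-prem Cloudera table over JDBC without copying it to cloud storage.
df = (
    spark.read.format("jdbc")
         .option("url", "jdbc:hive2://cdh-edge-node:10000/default")  # hypothetical host
         .option("driver", "org.apache.hive.jdbc.HiveDriver")        # driver jar must be installed
         .option("dbtable", "sales.transactions")                    # hypothetical table
         .option("user", dbutils.secrets.get("cdh", "user"))         # hypothetical secret scope
         .option("password", dbutils.secrets.get("cdh", "password"))
         .option("fetchsize", "10000")  # larger fetches reduce round trips on big pulls
         .load()
)
```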

1 More Replies
Nabbott
by Visitor
  • 17 Views
  • 0 replies
  • 0 kudos

Databricks Genie

I have curated silver and gold tables in Advana that feed downstream applications. Other organizations also create tables for their own use. Can Databricks Genie query across tables from different pipelines within the same organization and across mul...

azam-io
by New Contributor II
  • 592 Views
  • 4 replies
  • 2 kudos

How can I structure pipeline-specific job params separately in a Databricks Asset Bundle?

Hi all, I am working with Databricks Asset Bundles and want to separate environment-specific job params (for example, for "env" and "dev") for each pipeline within my bundle. I need each pipeline to have its own job param values for different environ...

Latest Reply
Michał
Visitor
  • 2 kudos

Hi azam-io, were you able to solve your problem? Are you trying to have different parameters depending on the environment, or a different parameter value? I think targets would allow you to specify different parameters per environment/target. As fo...
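
A minimal databricks.yml sketch of the targets approach mentioned above, using bundle variables to give each environment its own job parameter value; the job and variable names are hypothetical, not a verified config:

```yaml
variables:
  env:
    description: Environment name handed to the job as a parameter
    default: dev

targets:
  dev:
    variables:
      env: dev
  prod:
    variables:
      env: prod

resources:
  jobs:
    my_pipeline_job:        # hypothetical job
      name: my_pipeline_job
      parameters:
        - name: env
          default: ${var.env}
```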

3 More Replies
seefoods
by Contributor II
  • 1625 Views
  • 2 replies
  • 1 kudos

Resolved! Asset bundles

Hello guys, I am working on asset bundles and want to make them generic for all teams (analytics, data engineering). Could someone share a best practice for this purpose? Cordially,

Latest Reply
Michał
Visitor
  • 1 kudos

Hi seefoods, were you able to achieve that generic asset bundle setup? I've been working on something potentially similar, and I'd be happy to discuss it, hoping to share experiences. While what I have works for a few teams, it is focused on declar...

1 More Replies
SharathE
by New Contributor III
  • 1884 Views
  • 3 replies
  • 1 kudos

Incremental refresh of materialized view in serverless DLT

Hello, every time that I run a Delta Live Tables materialized view in serverless, I get a log of "COMPLETE RECOMPUTE". How can I achieve incremental refresh in serverless DLT pipelines?

Latest Reply
drewipson
New Contributor III
  • 1 kudos

Make sure you are using the aggregates and SQL restrictions outlined in this article: https://docs.databricks.com/en/optimizations/incremental-refresh.html. If a SQL function is non-deterministic (current_timestamp() is a common one), you will have a CO...

2 More Replies
korijn
by New Contributor II
  • 646 Views
  • 4 replies
  • 0 kudos

Git integration inconsistencies between Git folders and job Git

It's a little confusing and limiting that the Git integration support is inconsistent between the two options available. Sparse checkout is only supported when using a workspace Git folder, and checking out by commit hash is only supported when using ...

Latest Reply
_J
New Contributor II
  • 0 kudos

Same here, could be a good improvement for the jobs layer guys!

3 More Replies
ck7007
by New Contributor II
  • 42 Views
  • 2 replies
  • 1 kudos

Streaming Solution

Maintain zonemaps with streaming writes. Challenge: streaming breaks zonemaps due to constant micro-batches. Solution: incremental updates.
def write_streaming_with_zonemap(stream_df, table_path):
    def update_zonemap(batch_df, batch_id):
        # Write data
        batch_d...
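
The snippet above is truncated; a minimal sketch of the foreachBatch pattern it appears to describe, tracking per-batch min/max stats in a side Delta table. The column and path names are hypothetical, and this is a reconstruction, not the author's full code:

```python
from pyspark.sql import functions as F

def write_streaming_with_zonemap(stream_df, table_path):
    def update_zonemap(batch_df, batch_id):
        # Write the micro-batch to the main Delta table
        batch_df.write.format("delta").mode("append").save(table_path)
        # Incrementally append this batch's min/max stats to a side "zonemap" table
        stats = (batch_df.agg(F.min("id").alias("min_id"),   # "id" is a hypothetical column
                              F.max("id").alias("max_id"))
                         .withColumn("batch_id", F.lit(batch_id)))
        stats.write.format("delta").mode("append").save(table_path + "_zonemap")

    return (stream_df.writeStream
                     .foreachBatch(update_zonemap)
                     .option("checkpointLocation", table_path + "_checkpoint")
                     .start())
```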

Latest Reply
ManojkMohan
Contributor III
  • 1 kudos

@ck7007 Yes, I am interested in collaborating. I am structuring the problem like below. The challenge is: how can we leverage the query performance benefits of zonemaps without sacrificing the ingestion performance of a streaming pipeline? Problem statem...

1 More Replies
stucas
by New Contributor
  • 22 Views
  • 0 replies
  • 0 kudos

DLT Pipeline and Pivot tables

TL;DR: Can DLT determine a dynamic schema, one which is generated from the results of a pivot?
Issue: I know you can't use Spark `.pivot` in a DLT pipeline, and that if you wish to pivot data you need to do that outside of the DLT-decorated functions. I have...

IONA
by New Contributor III
  • 171 Views
  • 6 replies
  • 7 kudos

Resolved! Getting data from the Spark query profiler

When you navigate to Compute > Select Cluster > Spark UI > JDBC/ODBC, you can see grids of session stats and SQL stats. Is there any way to get this data in a query so that I can do some analysis? Thanks

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 7 kudos

Hi @IONA, as @BigRoux correctly suggested, there is no native way to get stats from the JDBC/ODBC Spark UI.
1. You can try the query history system table, but it has a limited number of metrics: %sql SELECT * FROM system.query.history
2. You can use /api/2....
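
As a small follow-up to option 1, the system table can be filtered like any other table; a sketch with an illustrative time window (the available columns vary by release, so inspect the schema first):

```python
# Pull the last day of query history for analysis.
history = spark.sql("""
    SELECT *
    FROM system.query.history
    WHERE start_time >= current_timestamp() - INTERVAL 1 DAY
""")
display(history)
```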

5 More Replies
ManojkMohan
by Contributor III
  • 218 Views
  • 13 replies
  • 12 kudos

Ingesting 100 TB of raw CSV data into the Bronze layer in Parquet + Snappy

Problem I am trying to solve: Bronze is the landing zone for immutable, raw data. At this stage, I am trying to use a columnar format (Parquet or ORC) → good compression, efficient scans, and then apply lightweight compression (e.g., Snappy) → balances...

Latest Reply
BS_THE_ANALYST
Honored Contributor III
  • 12 kudos

@ManojkMohan You could just do the checkpointing inside a volume within Unity Catalog. What's the benefit of having this externally? I think resolving the AWS creds config is still good for learning, but you can bypass that. All the best, BS
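
A minimal sketch of the volume-based checkpointing suggested above, using Auto Loader for the CSV-to-Bronze ingestion; all paths and table names are hypothetical:

```python
# Incrementally ingest raw CSVs into a Bronze table, keeping schema and
# checkpoint state inside a Unity Catalog volume rather than external storage.
(spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "csv")
      .option("header", "true")
      .option("cloudFiles.schemaLocation", "/Volumes/main/bronze/_chk/events_schema")
      .load("s3://raw-landing/events/")          # hypothetical raw bucket
      .writeStream
      .option("checkpointLocation", "/Volumes/main/bronze/_chk/events")
      .trigger(availableNow=True)                # process the backlog, then stop
      .toTable("main.bronze.events"))            # hypothetical Bronze table
```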

12 More Replies
