Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

MikeGo
by Contributor III
  • 213 Views
  • 6 replies
  • 1 kudos

Resolved! Table update trigger and File Arrival trigger latency

Hi team, When using a table update or file arrival trigger, what latency can I expect? Does Databricks poll the source on some schedule? If so, is the polling free? Thanks

Latest Reply
MikeGo
Contributor III
  • 1 kudos

Hi @Ashwin_DSA, Thanks for the input. This is very helpful. For the last question, we thought about creating another table as staging, used specifically as the trigger. Any time the source has changes, we will update the staging table too. However ...
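A minimal sketch of the staging-table idea, assuming a small Delta table that a job's table update trigger watches (the table name and its single timestamp column are placeholders, not from the thread):

# Touch a small staging table whenever the source changes, so that a job's
# table update trigger pointed at the staging table fires.
# Table name and schema are assumptions for illustration.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

def touch_trigger_table(staging_table: str = "main.ops.trigger_staging") -> None:
    # One inserted row is enough to commit a new Delta version.
    spark.sql(f"INSERT INTO {staging_table} VALUES (current_timestamp())")

# Call this at the end of whatever process updates the source table.
touch_trigger_table()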

5 More Replies
maikel
by Contributor II
  • 84 Views
  • 1 replies
  • 0 kudos

Uploading a file to a volume and starting an ingestion job

Hello Community! I am writing to you with an idea about a data ingestion job which we have to implement in our project. The data we have is in CSV format and, depending on the case, it differs a little bit. Before uploading, we pivot the CSV file...

Latest Reply
Ashwin_DSA
Databricks Employee
  • 0 kudos

Hi @maikel, You don't have to build a custom solution for this. Databricks now has native components that align very well with what you want. If you want the job to start as soon as new files land in a volume, the recommended approach is to use file-...
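A minimal sketch of that approach, with Auto Loader reading CSVs from a Unity Catalog volume (all paths and table names here are placeholders, not from the thread):

# Stream new CSV files from a volume into a bronze table. A job with a file
# arrival trigger on the volume can run this with availableNow, so it
# processes whatever has landed and then stops.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

(spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "csv")
    .option("cloudFiles.schemaLocation", "/Volumes/main/raw/landing/_schema")
    .option("header", "true")
    .load("/Volumes/main/raw/landing/")
    .writeStream
    .option("checkpointLocation", "/Volumes/main/raw/landing/_checkpoint")
    .trigger(availableNow=True)
    .toTable("main.bronze.ingested_csv"))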

murtadha_s
by Databricks Partner
  • 69 Views
  • 1 replies
  • 0 kudos

What is the maximum size dbutils.fs.head can read?

Hi, What is the maximum size that can be read using dbutils.fs.head()? Is there a limit? An AI says 10 MB, but I couldn't find useful info in the documentation, and when I tried it on an actual file it was only limited by the driver memory. Thanks in advance.

Latest Reply
DivyaandData
Databricks Employee
  • 0 kudos

dbutils.fs.head() itself does not have a documented hard cap like 10 MB. From the official dbutils reference, the signature is: dbutils.fs.head(file: String, max_bytes: int = 65536): String “Returns up to the specified maximum number of bytes in t...
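A quick sketch based on the signature quoted above (the file path is a placeholder; dbutils is available in Databricks notebooks without an import):

# The default reads up to 64 KB (max_bytes=65536); pass a larger value
# explicitly. Very large values are still bounded by driver memory, as the
# original poster observed.
preview = dbutils.fs.head("/Volumes/main/raw/landing/sample.csv")
bigger = dbutils.fs.head("/Volumes/main/raw/landing/sample.csv", 10 * 1024 * 1024)
print(preview[:200])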

DavidKxx
by Contributor
  • 89 Views
  • 2 replies
  • 1 kudos

Resolved! Data in Unity Catalog that can't be previewed

This is a small deficiency, but a fix would be nice to have. For a long time now, the Sample Data previewer in the Unity Catalog explorer has been unable to show tables that contain a certain kind of column. Instead of showing sample rows of the tabl...

Latest Reply
DavidKxx
Contributor
  • 1 kudos

Yes, my vector space is commonly of dimension 4000 or 8000. I don't write any dense vectors to a table, so I can't speak to what happens when previewing that type. Thanks for taking up the issue!

1 More Replies
vidya_kothavale
by Contributor
  • 156 Views
  • 6 replies
  • 7 kudos

Resolved! Managed Delta table: time travel blocked after automatic VACUUM

Hi, On a managed Delta table I get: SELECT * FROM abc VERSION AS OF 25; Error: DELTA_UNSUPPORTED_TIME_TRAVEL_BEYOND_DELETED_FILE_RETENTION_DURATION Cannot time travel beyond delta.deletedFileRetentionDuration (168 HOURS). Audit logs show VACUUM START/END...

Latest Reply
balajij8
Contributor III
  • 7 kudos

VACUUM will never delete files in the latest version, even if version 10 was not accessed or modified, as it represents the current state of the table. VACUUM targets files that are no longer referenced by recent versions. It identifies files that were re...
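A hedged sketch of the usual remedies, assuming a table named main.sales.abc (a placeholder) and that longer retention is acceptable despite the extra storage it keeps around:

# Inspect which versions still exist in the transaction log.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
spark.sql("DESCRIBE HISTORY main.sales.abc").show(truncate=False)

# Keep removed data files longer so older versions stay queryable;
# the 30-day interval here is illustrative, not a recommendation.
spark.sql("""
    ALTER TABLE main.sales.abc SET TBLPROPERTIES (
        'delta.deletedFileRetentionDuration' = 'interval 30 days'
    )
""")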

5 More Replies
Muralidharan_A
by New Contributor
  • 66 Views
  • 1 replies
  • 0 kudos

Supporting file not recognized in DLT pipeline

We have a DLT pipeline which creates some tables based on transformations; those transformations are kept inside functions in a separate file, and that file is used via imports. We are deploying those changes...

Latest Reply
Ashwin_DSA
Databricks Employee
  • 0 kudos

Hi @Muralidharan_A, To your question about whether retry_on_failure does more than a manual refresh, the answer is yes! retry_on_failure (along with pipelines.numUpdateRetryAttempts and pipelines.maxFlowRetryAttempts) performs classified, timed retri...
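For reference, a sketch of where the settings named in the reply would live, as a fragment of the pipeline's configuration block (the keys come from the reply above; the values are illustrative, not defaults):

# Hypothetical pipeline-settings fragment; values are placeholders.
pipeline_settings_fragment = {
    "configuration": {
        "pipelines.numUpdateRetryAttempts": "5",
        "pipelines.maxFlowRetryAttempts": "2",
    }
}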

RodrigoE
by New Contributor III
  • 63 Views
  • 1 replies
  • 0 kudos

Ingest data from REST endpoint into Databricks

Hello, I'm looking for the best option to retrieve between 1-1.5 TB of data per day from a REST API into Databricks. Thank you, Rodrigo Escamilla

Latest Reply
Ashwin_DSA
Databricks Employee
  • 0 kudos

Hi @RodrigoE, It would be helpful to have additional information to recommend the best options for your scenario. Who owns the REST API? Is it in your control? Can the source push data to Databricks, or should you pull on a schedule? If the source ...
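For the pull-on-a-schedule option the reply raises, a minimal sketch that pages through an API and lands raw JSON in a Unity Catalog volume (the endpoint, paging scheme, and paths are all assumptions):

# Land raw API responses as files; a downstream job (e.g. Auto Loader)
# parses them into tables.
import json
import requests

API_URL = "https://example.com/api/records"   # placeholder endpoint
LANDING = "/Volumes/main/raw/rest_landing"    # placeholder volume path

page = 0
while True:
    resp = requests.get(API_URL, params={"page": page, "page_size": 10_000}, timeout=60)
    resp.raise_for_status()
    batch = resp.json()
    if not batch:
        break
    with open(f"{LANDING}/batch_{page:06d}.json", "w") as f:
        json.dump(batch, f)
    page += 1

At 1-1.5 TB/day a single sequential loop like this would not keep up; the pulls would need to be partitioned (by date range, ID range, or endpoint) and run as parallel tasks.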

397973
by New Contributor III
  • 141 Views
  • 2 replies
  • 1 kudos

Resolved! Jobs & Pipelines: is it possible for "Run parameters" to display a value generated in code?

Hi. I'm testing out the "Run parameters" you see in Jobs & Pipelines. As far as I know, this value is set manually by "Job parameters" on the right side bar. Can I set the value within code though? Like if I want something dynamically generated depen...

Latest Reply
Ashwin_DSA
Databricks Employee
  • 1 kudos

Hi @397973, Interesting question and I did not know the answer. So, I ran the test you described on my own workspace. Sharing what I found in case it saves you time. The short answer is that the task values won't populate the Run parameters column. V...
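For anyone landing here, a sketch of the task values mechanism being discussed (key and task names are placeholders; per the reply, values set this way pass between tasks but do not appear in the Run parameters column):

# In the producing task:
dbutils.jobs.taskValues.set(key="generated_date", value="2026-01-31")

# In a downstream task, taskKey is the producing task's name:
generated = dbutils.jobs.taskValues.get(
    taskKey="producer_task", key="generated_date", default="", debugValue=""
)
print(generated)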

1 More Replies
tsam
by New Contributor II
  • 230 Views
  • 4 replies
  • 0 kudos

Driver memory utilization grows continuously during job

I have a batch job that runs thousands of Deep Clone commands; it uses a ForEach task to run multiple Deep Clones in parallel. It was taking a very long time, and I realized that the Driver was the main culprit, since it was using up all of its memory ...

Latest Reply
nayan_wylde
Esteemed Contributor II
  • 0 kudos

What you're seeing (a monotonic, stair-step climb in driver RAM over thousands of DEEP CLONE statements) is a very common pattern when the driver is not "holding data" but holding metadata, query artifacts, and per-command state that accumulates faster ...
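One common mitigation for that accumulation, sketched under assumptions (the table list and batch size are placeholders; results vary by workload):

# Run the clones in batches and clear cached relations between batches,
# one lever for keeping driver-side state from growing without bound.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

tables_to_clone = [("src.db.t1", "tgt.db.t1"), ("src.db.t2", "tgt.db.t2")]  # thousands in practice
BATCH_SIZE = 200

for i, (src, tgt) in enumerate(tables_to_clone, start=1):
    spark.sql(f"CREATE OR REPLACE TABLE {tgt} DEEP CLONE {src}")
    if i % BATCH_SIZE == 0:
        spark.catalog.clearCache()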

3 More Replies
harisrinivasay
by New Contributor
  • 188 Views
  • 4 replies
  • 1 kudos

Resolved! Unable to View Tables While Setting Up PostgreSQL CDC via Lakeflow Connect

Dear Experts, I have a requirement to implement PostgreSQL CDC using Databricks Lakeflow Connect. While setting up the tables, I am unable to see the list of available tables, even though the connection settings appear to be correct. Could you please s...

Latest Reply
Ashwin_DSA
Databricks Employee
  • 1 kudos

Hi @harisrinivasay, @szymon_dybczak is correct. You must enter the database name. Lakeflow Connect can only connect to and query that database, and list the schemas and tables if you provide the correct name. If the name is incorrect or if you don’t ...

3 More Replies
cvh
by New Contributor II
  • 157 Views
  • 5 replies
  • 3 kudos

Does Lakeflow Connect Have Any Change Tracking Diagnostics?

We have set up Change Tracking on multiple SQL Servers for Lakeflow Connect successfully in the past, but lately we are having lots of problems with a couple of servers. The latest utility script has been run and both lakeflowSetupChangeTracking and ...

Latest Reply
cvh
New Contributor II
  • 3 kudos

Thanks @Ashwin_DSA and @amirabedhiafi for your swift responses. I had high hopes when I saw that lakeflowUtilityVersion_1_5() is queried, as I found the database user for the ingestion gateway connection (i.e. the @User parameter for both dbo.lakeflowSetupCh...

4 More Replies
Lavaneethreddy
by New Contributor
  • 53 Views
  • 0 replies
  • 1 kudos

Stop Refreshing. Start Querying.

How Databricks Metric Views Are Replacing Power BI Import Models — and What Your Team Needs to Do About It. Introduction: Power BI Import models work — until scheduled refreshes, size limits, and governance sprawl become too big to ignore. Databricks Un...

Brahmareddy
by Esteemed Contributor
  • 68 Views
  • 0 replies
  • 2 kudos

Too Many Tools Can Slow Good Data Teams Down

A Small Thing I Keep Noticing in Data Projects: Lately, I have been thinking about something I have seen again and again in big data projects. At the start, everything feels manageable. One tool is used for ingestion. Another one is used for transformat...

TX-Aggie-00
by Databricks Partner
  • 237 Views
  • 4 replies
  • 0 kudos

Resolved! Sharepoint Connector Site Limitation

Hey All! We are trying out the Beta connector for SharePoint and found that the connector will not work at the root-level site. Is there a reason for this limitation? It is unfortunately a hard blocker for us to use the native connector. MUST_START...

Latest Reply
emma_s
Databricks Employee
  • 0 kudos

Hi Scott, Just asking our product team the question. By the root-level site, do you mean content that is stored on the root-level site? Or do you mean everything across your root tenant, i.e. you want to ingest all files across your tenant in a single...

3 More Replies