Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

NehaR
by New Contributor III
  • 2231 Views
  • 5 replies
  • 1 kudos

Way to enforce partition column in where clause

Hi All, I want to know if it is possible to enforce that all queries must include a partition filter when a Delta table is partitioned in Databricks. I tried the below option and set the required property, but it doesn't work and I can still query...

Data Engineering
databricks delta table
Delta table
partition
Latest Reply
balajij8
Contributor
  • 1 kudos

Liquid clustering is flexible and handles most of these issues automatically. You can use liquid clustering instead of forcing teams to use a partition filter.

4 More Replies
Danish11052000
by Contributor
  • 409 Views
  • 1 reply
  • 0 kudos

Resolved! How should I correctly extract the full table name from request_params in audit logs?

I’m trying to build a UC usage/refresh tracking table for every workspace. For each workspace, I want to know how many times a UC table was refreshed or accessed each month. To do this, I’m reading the Databricks audit logs and I need to extract only...

Latest Reply
Ashwin_DSA
Databricks Employee
  • 0 kudos

Hi @Danish11052000, Is there a reason you prefer building your own table for this? I'm asking because there are simpler and more reliable patterns than hand-parsing. If the account has system tables enabled, you can query system.access.audit directly...

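The reply above points to querying system.access.audit rather than hand-parsing logs. As a rough illustration of the extraction step itself, here is a pure-Python sketch; the request_params field name (full_name_arg) is an assumption drawn from typical Unity Catalog getTable events and should be verified against your own audit rows.

```python
def extract_table_name(audit_row: dict):
    """Pull a fully qualified table name out of one audit-log row.

    Unity Catalog actions such as getTable typically carry the
    three-level name in request_params (field name assumed here).
    """
    params = audit_row.get("request_params") or {}
    # full_name_arg is an assumption; inspect your own rows to confirm.
    full_name = params.get("full_name_arg")
    if full_name and full_name.count(".") == 2:  # catalog.schema.table
        return full_name
    return None

row = {
    "service_name": "unityCatalog",
    "action_name": "getTable",
    "request_params": {"full_name_arg": "main.sales.orders"},
}
print(extract_table_name(row))  # main.sales.orders
```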
Skcmsa007
by New Contributor
  • 490 Views
  • 1 reply
  • 0 kudos

Databricks app 504 upstream request timeout

I have deployed my FastAPI application in Databricks Apps and set the keep-alive timeout to 1200. Issue: from the Databricks Swagger UI I am getting a 504 "upstream request timeout" after 2 minutes, while my API takes 3 minutes to respond. But in the backend my task got...

Latest Reply
Lu_Wang_ENB_DBX
Databricks Employee
  • 0 kudos

TL;DR: You cannot increase the upstream gateway timeout in Databricks Apps. The best practice and quick solution for handling operations that take longer than the gateway limit is to implement a "status pull" (polling) pattern. Why the timeout occurs: Data...

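The polling pattern described above can be sketched without any framework. This is a minimal, hypothetical illustration (an in-memory job store and a short sleep standing in for the ~3-minute computation); a real Databricks App would expose submit and poll as FastAPI endpoints and persist job state somewhere durable.

```python
import threading
import time
import uuid

# In-memory job store; in a real app this would be a database or cache.
jobs: dict = {}

def long_running_task(job_id: str) -> None:
    time.sleep(0.1)  # stand-in for the long computation
    jobs[job_id] = {"status": "done", "result": 42}

def submit() -> str:
    """POST /jobs -- return immediately with an id instead of blocking."""
    job_id = str(uuid.uuid4())
    jobs[job_id] = {"status": "running", "result": None}
    threading.Thread(target=long_running_task, args=(job_id,)).start()
    return job_id

def poll(job_id: str) -> dict:
    """GET /jobs/{id} -- cheap call that always returns well within the gateway limit."""
    return jobs[job_id]

job_id = submit()
while poll(job_id)["status"] == "running":
    time.sleep(0.05)  # the client polls until the job finishes
print(poll(job_id))  # {'status': 'done', 'result': 42}
```

Each individual request now completes in milliseconds, so the 2-minute gateway limit is never hit even though the overall work takes longer.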
Hsn
by New Contributor II
  • 701 Views
  • 5 replies
  • 1 kudos

Resolved! Suggestions on becoming a data engineer

Hey, I'm Hasan Sayyed, currently pursuing SYBCA. I want to become a Data Engineer, but as a beginner, I’ve wasted some time learning other languages and technologies due to a lack of proper knowledge about this field. If someone could guide and teach...

Latest Reply
xandermuchanga
New Contributor II
  • 1 kudos

2x

4 More Replies
raimundovidal
by New Contributor II
  • 287 Views
  • 1 reply
  • 0 kudos

Resolved! Managed File Events: Are reads from the file events cache independent per pipeline?

We have two Databricks workspaces (staging and production) attached to the same Unity Catalog metastore. Both workspaces run DLT pipelines that use Auto Loader with cloudFiles.useManagedFileEvents = "true" to ingest from the same external location (sa...

Latest Reply
Ashwin_DSA
Databricks Employee
  • 0 kudos

Hi @raimundovidal, You’re safe to run both staging and production Lakeflow Spark Declarative Pipelines with cloudFiles.useManagedFileEvents = "true" against the same external location (same S3 path) and same Unity Catalog metastore, as long as each p...

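A minimal sketch of what "independent per pipeline" looks like in practice: both environments can share the same source path, but each pipeline needs its own checkpoint, which is where Auto Loader tracks its progress. Option and path names here are illustrative only (and in a real stream, checkpointLocation is passed to the stream writer rather than the reader).

```python
# Both pipelines may point at the same external location because
# Auto Loader tracks ingestion progress per checkpoint, not per location.
COMMON_SOURCE = "s3://landing-bucket/events/"  # hypothetical shared path

def autoloader_options(env: str) -> dict:
    """Build per-environment options; only the checkpoint differs."""
    return {
        "cloudFiles.format": "json",
        "cloudFiles.useManagedFileEvents": "true",
        # The checkpoint must be unique per pipeline/environment.
        "checkpointLocation": f"s3://{env}-checkpoints/events/",
    }

staging = autoloader_options("staging")
prod = autoloader_options("prod")
assert staging["checkpointLocation"] != prod["checkpointLocation"]
```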
Eibraao
by New Contributor II
  • 533 Views
  • 6 replies
  • 0 kudos

Disable the dashboard sharing field for dashboard creators

How can I disable the dashboard sharing field for dashboard creators who are not admins? I tried changing the creator’s permission from 'CAN_MANAGE' to 'CAN_READ', but it had no effect; the creator still retains the 'CAN_MANAGE' permission.

Latest Reply
Eibraao
New Contributor II
  • 0 kudos

 

5 More Replies
Datalight
by Contributor
  • 627 Views
  • 2 replies
  • 0 kudos

Resolved! Design Oracle Fusion SCM to Azure Databricks

Hello Techies, I am planning to migrate all modules of Oracle Fusion SCM data to Azure Databricks. Is BICC (Business Intelligence Cloud Connector) the only option, or are other options available? Can anyone please help me with a reference architecture...

Latest Reply
Datalight
Contributor
  • 0 kudos

@mark_ott: Thanks a ton. Sorry for the late reply, as the client was not sure on the approach. Your solution helps a lot. Thanks again.

1 More Replies
rvo19941
by Databricks Partner
  • 5524 Views
  • 3 replies
  • 0 kudos

Auto Loader File Notification Mode not working with ADLS Gen2 and files written as a stream

Dear all, I am working on a real-time use case and am therefore using Auto Loader with file notification to ingest JSON files from a Gen2 Azure Storage Account in real time. Full refreshes of my table work fine, but I noticed Auto Loader was not picking up...

Data Engineering
ADLS
Auto Loader
Event Subscription
File Notification
Queue Storage
Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

Auto Loader file notification in Databricks relies on Azure Event Grid’s BlobCreated event to trigger notifications for newly created files in Azure Data Lake Gen2. The issue you’re experiencing is a known limitation when files are written via certai...

2 More Replies
janm2
by New Contributor II
  • 2359 Views
  • 6 replies
  • 1 kudos

Autoloader cleansource option does not take any effect

Hello everyone, I was very keen to try out Auto Loader's new cleanSource option so we can clean up our landing folder easily. However, I found out it does not have any effect whatsoever. As I cannot create a support case, I am creating this post. A sim...

Latest Reply
awhorton
New Contributor II
  • 1 kudos

I had the same issue, which was caused by colons in the filenames. It quietly failed in the app, but log4j contained warnings like this: 26/02/20 07:11:07 WARN CleanSourceFileMover: [queryId = f0e53] Unexpected exception when cleaning: /Volumes/prod/...

5 More Replies
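Since the silent cleanSource failures above were traced to colons in filenames, one pragmatic workaround is to sanitize names before (or as) files land in the ingestion path. A hypothetical sketch:

```python
import re

# Characters that can trip up file-move operations on cloud storage;
# the colon is the one identified in this thread.
UNSAFE = re.compile(r"[:]")

def sanitize(name: str) -> str:
    """Replace unsafe characters so cleanSource can move/delete the file."""
    return UNSAFE.sub("_", name)

names = ["events_2026-02-20T07:11:07.json", "plain.json"]
cleaned = [sanitize(n) for n in names]
print(cleaned)  # ['events_2026-02-20T07_11_07.json', 'plain.json']
```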
Sneeze7432
by New Contributor III
  • 5308 Views
  • 14 replies
  • 2 kudos

File Trigger Not Triggering Multiple Runs

I have a job with one task, which is to run a notebook. The job is set up with a file arrival trigger with my blob storage as the location. The trigger works and the job will start when a new file arrives in the location, but it does not run for ...

Latest Reply
maddy08
New Contributor II
  • 2 kudos

@Sneeze7432 did you solve it? File arrival groups the files when it executes; I verified this with the Databricks team. You may encounter a "multiple source rows matched" error during MERGE operations. To overcome this, it's better to APPEND only into the raw/bronze layer, ...

13 More Replies
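The append-only-bronze advice above can be sketched with plain Python standing in for Delta tables: append every grouped batch to bronze, then keep only the newest row per key downstream, which is what avoids the "multiple source rows matched" MERGE error. Table and field names here are illustrative.

```python
# Append-only bronze, dedupe downstream: a minimal sketch of the pattern.
bronze: list = []

def ingest(batch: list) -> None:
    """Never MERGE into bronze; just append whatever the trigger delivers."""
    bronze.extend(batch)

def silver_latest() -> dict:
    """Keep the newest row per key, so each key matches at most one source row."""
    latest: dict = {}
    for row in sorted(bronze, key=lambda r: r["ts"]):
        latest[row["id"]] = row
    return latest

# A file-arrival trigger may deliver several files' rows in one run:
ingest([{"id": "a", "ts": 1, "v": 10}, {"id": "a", "ts": 2, "v": 11}])
ingest([{"id": "b", "ts": 3, "v": 20}])
print(silver_latest()["a"]["v"])  # 11
```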
Shimon
by New Contributor II
  • 1055 Views
  • 3 replies
  • 0 kudos

Jackson version conflict

Hi, I am trying to implement the Spark TableProvider API and I am experiencing a JAR conflict (I am using the 17.3 runtime): com.fasterxml.jackson.databind.JsonMappingException: Scala module 2.15.2 requires Jackson Databind version >= 2.15.0 and < 2.1...

Latest Reply
emanuele_m
Databricks Employee
  • 0 kudos

Hi, this problem occurs if you have dynamic module registration, e.g. new ObjectMapper().findAndRegisterModules(), and the way to solve it is to use something like this instead: val jsonMapper = new ObjectMapper(); jsonMapper.registerModule(DefaultScalaModu...

2 More Replies
hello_world
by Databricks Partner
  • 4588 Views
  • 2 replies
  • 5 kudos

What is the purpose of the USAGE privilege?

I watched a couple of courses on Databricks Academy, none of which clearly explains or demonstrates the purpose of the USAGE privilege. USAGE: does not give any abilities, but is an additional requirement to perform any action on a schema object. I hav...

Latest Reply
Celebal2
Databricks Partner
  • 5 kudos

In Databricks (Unity Catalog), USAGE is a basic access privilege that allows a user to access a container object but not read or modify the data inside it. Think of it like: "Permission to enter the building, but not to open any rooms."

1 More Replies
dan11
by New Contributor II
  • 6427 Views
  • 5 replies
  • 1 kudos

sql delete?

Hello Databricks people, I started working with Databricks today. I have a SQL script which I developed with sqlite3 on a laptop. I want to port the script to Databricks. I started with two SQL statements: select count(prop_id) from prop0; del...

Latest Reply
oliverstonez
New Contributor III
  • 1 kudos

You aren't doing anything wrong logically, but Databricks requires row-level changes to happen on Delta Lake tables. Standard Spark tables (like those backed by raw Parquet) are often immutable. Have a look at the Language Manual for DELETE to ensure...

4 More Replies
valiro21
by Contributor
  • 1081 Views
  • 8 replies
  • 1 kudos

Resolved! Incremental monthly UDF analytics on large time-series: full reprocess vs change-driven recompute

Hi everyone, I’m looking for advice and best practices for running monthly analytics on large time-series data in Databricks, where the source data can be updated retroactively. Data model: we have a Delta table with time-series measurements: - customer_i...

Latest Reply
aleksandra_ch
Databricks Employee
  • 1 kudos

Hi @valiro21 , Your scenario is a classic late-arriving and incremental refresh data challenge. While you could build custom logic to track these changes, there is a native approach in Databricks that simplifies this significantly: Lakeflow Spark Dec...

7 More Replies
shahabm
by New Contributor III
  • 18520 Views
  • 6 replies
  • 2 kudos

Resolved! Databricks job keep getting failed due to GC issue

There is a job that was running successfully, but for more than a month we have been experiencing long runs which fail. In the stdout log file (attached), there are numerous messages like the following: [GC (Allocation Failure) [PSYoungGen:...] and [Full GC ...

Latest Reply
siddhu30
New Contributor II
  • 2 kudos

Thanks a lot @shahabm for your prompt response, appreciate it. I'll try to debug in this direction.Thanks again!

5 More Replies