Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

vishesh_berera
by New Contributor III
  • 383 Views
  • 1 reply
  • 0 kudos

How can we Implement Conditional Logic on SQL Query Output in Job Workflow

I'm trying to create a job where I define a get data task that executes a SQL query. After that, I want to apply conditional logic using an if-else task based on the query output. Specifically, I want to check each row individually—if a condition is ...

Latest Reply
BR_DatabricksAI
Contributor III
  • 0 kudos

Hello, I believe the fixed parameter option exists; it was introduced recently in Lakeflow Declarative Pipelines. Navigate to the pipeline's configuration section and add parameters there.

  • 0 kudos
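One way to wire the If/else task to the query output is to reduce the rows to a single boolean task value in an intermediate task. A minimal sketch follows; the column name `amount`, the threshold, and the task name `get_data` are illustrative assumptions, not from the original post:

```python
def rows_to_condition(rows, threshold=0):
    """Reduce query output to a single "true"/"false" string for an If/else task.

    `rows` is a list of dicts representing the query result; the column name
    and threshold are hypothetical, for illustration only.
    """
    all_pass = all(row.get("amount", 0) > threshold for row in rows)
    # In a Databricks job you would publish this with
    #   dbutils.jobs.taskValues.set(key="condition", value=str(all_pass).lower())
    # and reference it in the If/else task condition as
    #   {{tasks.get_data.values.condition}} == "true"
    return str(all_pass).lower()
```

Note the If/else task evaluates one expression per run, so per-row branching has to be collapsed into an aggregate like this (or handled inside a notebook task instead).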
Ramu1821
by New Contributor II
  • 2932 Views
  • 2 replies
  • 0 kudos

Merge using DLT

I have a requirement where I need only the last 24 hours of data from my delta table. Let's call this the latest table. This latest table should be in sync with the source, so it should handle all updates and inserts along with deletes (if something gets deleted at source, ...

Latest Reply
Ramu1821
New Contributor II
  • 0 kudos

from pyspark.sql.functions import col, lit, expr, when, to_timestamp, current_timestamp
from pyspark.sql.functions import max as max_
import dlt
from pyspark.sql.types import StructType, StructField, StringType
from pyspark.sql.utils import AnalysisExcep...

  • 0 kudos
1 More Replies
boitumelodikoko
by Valued Contributor
  • 12869 Views
  • 7 replies
  • 4 kudos

Resolved! Databricks Autoloader Checkpoint

Hello Databricks Community, I'm encountering an issue with the Databricks Autoloader where, after running successfully for a period of time, it suddenly stops detecting new files in the source directory. This issue only gets resolved when I reset the ...

Latest Reply
boitumelodikoko
Valued Contributor
  • 4 kudos

I have found that reducing the number of objects in the landing path (via an archive/cleanup process) is the most reliable fix. Auto Loader's file discovery can bog down in big/"long-lived" landing folders—especially in directory-listing mode—so clea...

  • 4 kudos
6 More Replies
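The archive/cleanup fix from the accepted answer can be sketched as a simple modification-time sweep over the landing folder. The paths and the seven-day retention window are assumptions for illustration; on cloud storage you would use the object store's API rather than local filesystem calls:

```python
import os
import shutil
import time

def archive_old_files(landing_dir, archive_dir, max_age_days=7):
    """Move files older than max_age_days out of the Auto Loader landing path.

    Keeping the landing folder small keeps directory-listing discovery fast.
    Returns the list of file names that were archived.
    """
    os.makedirs(archive_dir, exist_ok=True)
    cutoff = time.time() - max_age_days * 86400
    moved = []
    for name in os.listdir(landing_dir):
        path = os.path.join(landing_dir, name)
        if os.path.isfile(path) and os.path.getmtime(path) < cutoff:
            shutil.move(path, os.path.join(archive_dir, name))
            moved.append(name)
    return moved
```

Run on a schedule (e.g. as a separate job task) so the landing path never accumulates enough objects to slow discovery down.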
boskicl
by New Contributor III
  • 38969 Views
  • 8 replies
  • 12 kudos

Resolved! Table write command stuck "Filtering files for query."

Hello all,
Background: I am having an issue today with databricks using pyspark-sql and writing a delta table. The dataframe is made by doing an inner join between two tables and that is the table which I am trying to write to a delta table. The table ...

filtering job_info spill_memory
Latest Reply
nvashisth
New Contributor III
  • 12 kudos

@timo199, @boskicl I had a similar issue where the job was getting stuck at "Filtering files for query" indefinitely. I checked the Spark logs and, based on those, figured out that we had enabled Photon acceleration on our cluster for the job, and the datatype of our columns...

  • 12 kudos
7 More Replies
fkseki
by New Contributor III
  • 842 Views
  • 6 replies
  • 6 kudos

Resolved! List budget policies applying filter_by

I'm trying to list budget policies using the parameter "filter_by" to filter policies that start with "aaaa", but I'm getting a "400 Bad Request" error: {'error_code': 'MALFORMED_REQUEST', 'message': "Could not parse request object: Expected 'START_OB...

Latest Reply
fkseki
New Contributor III
  • 6 kudos

Thanks for the reply, @szymon_dybczak and @lingareddy_Alva. I tried both approaches but neither was successful.
url = f'{account_url}/api/2.1/accounts/{account_id}/budget-policies'
filter_by_json = json.dumps({"policy_name": "aaaa"})
params = {"filter_by": ...

  • 6 kudos
5 More Replies
sowanth
by New Contributor II
  • 502 Views
  • 3 replies
  • 0 kudos

Spark Memory Configuration– Request for Clarification

Hi Team, I have noticed the following Spark configuration is being applied, though it's not defined in our repo or anywhere in the policies:
spark.memory.offHeap.enabled = true
spark.memory.offHeap.size = around 3/4 of the node instance memory (i.e. 1-...

Latest Reply
sowanth
New Contributor II
  • 0 kudos

Now I understand how it's automatically configured in our cluster, along with the rationale behind this off-heap memory approach. However, I have some concerns about this configuration:
General applicability: Most jobs don't actually require 70% off-hea...

  • 0 kudos
2 More Replies
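If the defaults don't suit a given workload, the two settings can be overridden explicitly in the cluster's Spark config. The keys below are real Spark configuration properties; the size value is an illustrative assumption, and setting `enabled` to `false` falls back to purely on-heap memory:

```
spark.memory.offHeap.enabled true
spark.memory.offHeap.size 4g
```

Cluster-level Spark config takes effect at cluster start, so changing these requires a restart.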
jeremy98
by Honored Contributor
  • 496 Views
  • 1 reply
  • 0 kudos

how to manage a dynamic scheduled job if an INTERNAL_ERROR occurs?

Hi community, My team and I have been occasionally experiencing INTERNAL_ERROR events in Databricks. We have a job that runs on a schedule, but the start times vary. Sometimes, when the job is triggered, the underlying cluster fails to start for some ...

Latest Reply
SP_6721
Honored Contributor
  • 0 kudos

Hi @jeremy98, To investigate, check the Jobs UI for failed runs and review both error messages and cluster logs. Monitor failure trends over time and adjust cluster settings or quotas if needed. https://docs.databricks.com/gcp/en/jobs/repair-job-failu...

  • 0 kudos
ManojkMohan
by Honored Contributor II
  • 1030 Views
  • 1 reply
  • 1 kudos

Resolved! Notebook not found: Error

Last execution failed Notebook not found: Users/manojdatabricks73@gmail.com/includes/CreateRawData. Notebooks can be specified via a relative path (./Notebook or ../folder/Notebook) or via an absolute path (/Abs/Path/to/Notebook). Make sure you are s...

Latest Reply
lingareddy_Alva
Honored Contributor III
  • 1 kudos

Hi @ManojkMohan
1. Verify the exact notebook location in the workspace. In Databricks, open the Workspace browser. Navigate manually to where you think CreateRawData lives. Right-click on the notebook and select Copy Path — this gives you the exact absol...

  • 1 kudos
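The path in the error message, "Users/manojdatabricks73@gmail.com/includes/CreateRawData", lacks a leading slash, so Databricks treats it as relative to the calling notebook's directory. Plain POSIX path rules illustrate the difference; the helper below is hypothetical, for illustration only:

```python
import posixpath

def resolve_notebook_ref(calling_dir, ref):
    """Mimic how a relative vs. absolute notebook reference resolves."""
    if ref.startswith("/"):
        return posixpath.normpath(ref)  # absolute: used as-is
    # relative: resolved against the calling notebook's directory
    return posixpath.normpath(posixpath.join(calling_dir, ref))
```

So "Users/..." only resolves correctly if the caller happens to sit at the workspace root; prefixing the slash ("/Users/...") or using "./includes/CreateRawData" from the right folder avoids the error.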
databricks_use2
by New Contributor II
  • 1612 Views
  • 2 replies
  • 2 kudos

Resolved! Autoloader Checkpoint Issue

I was pulling data from an S3 source using a Databricks Autoloader pipeline. Some files in the source contained bad characters, which caused the Autoloader to fail to load the data. These problematic files have now been removed from the source, but D...

Latest Reply
lingareddy_Alva
Honored Contributor III
  • 2 kudos

Hello @databricks_use2, if you are okay with this, please mark it as the solution so that it can help others.

  • 2 kudos
1 More Replies
grazie
by Contributor
  • 3680 Views
  • 4 replies
  • 3 kudos

Do you need to be workspace admin to create jobs?

We're using a setup where we use GitLab CI to deploy workflows with a service principal, using the Jobs API (2.1): https://docs.databricks.com/dev-tools/api/latest/jobs.html#operation/JobsCreate. When we wanted to reduce the permissions of the CI to the minimu...

Latest Reply
Anonymous
Not applicable
  • 3 kudos

Hi @Geir Iversen, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers ...

  • 3 kudos
3 More Replies
susmitsircar
by New Contributor III
  • 1100 Views
  • 3 replies
  • 0 kudos

Spark streaming failing intermittently with FileAlreadyExistsException RocksDB checkpointing

We are encountering an issue in our Spark streaming pipeline when attempting to write checkpoint data to S3. The error we are seeing is as follows:
25/08/12 13:35:40 ERROR RocksDBFileManager : Error zipping to s3://xxx-datalake-binary/event-types/chec...

Latest Reply
lingareddy_Alva
Honored Contributor III
  • 0 kudos

Hi @susmitsircar
Best practices / fixes:
1. Clean up the checkpoint directory before restart. If you know the stream can safely start from scratch or reprocess data, delete the S3 checkpoint path before restarting. This ensures no stale 0.zip files remain....

  • 0 kudos
2 More Replies
susmitsircar
by New Contributor III
  • 1842 Views
  • 7 replies
  • 3 kudos

Resolved! Spark streaming failing intermittently with IllegalStateException: Found no SST files

I'm encountering the following error while trying to upload a RocksDB checkpoint in Databricks:
java.lang.IllegalStateException: Found no SST files during uploading RocksDB checkpoint version 498 with 2332 key(s). at com.databricks.sql.streaming.s...

Latest Reply
susmitsircar
New Contributor III
  • 3 kudos

@mani_22 Do you see any risk of disabling this flag in our pipeline? As far as I understand, we will be bypassing some heuristic checks while uploading the state files.
spark.databricks.rocksDB.verifyBeforeUpload false

  • 3 kudos
6 More Replies
absan
by Contributor
  • 584 Views
  • 1 reply
  • 1 kudos

Resolved! Lakeflow Designer, DAB & Git

Hi, I'm trying to understand the process and configuration needed to get the new Lakeflow Designer, DAB, and Git Folder to play together. What I've done:
Created an empty GitHub repository and created a Git Folder for it in Databricks.
In the Git Folder I cr...

Latest Reply
SP_6721
Honored Contributor
  • 1 kudos

Hi @absan ,It’s recommended to create and deploy your DAB templates from within the Git folder, as this ensures the pipeline’s root is set correctly to that folder.

  • 1 kudos
noorbasha534
by Valued Contributor II
  • 353 Views
  • 1 reply
  • 0 kudos

Column access patterns

Hello all, floating this question again separately (a few weeks ago I clubbed this as part of predictive optimization). Has anyone cracked how to get the list of columns being used in joins & filters, especially in the context that access to end users is g...

Latest Reply
SP_6721
Honored Contributor
  • 0 kudos

Hi @noorbasha534 ,I’m not aware of a direct way to do this, but one approach is to parse each view’s SQL definition to identify the columns used in join and filter conditions, then use lineage tools to trace them through nested views back to the unde...

  • 0 kudos
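The parse-the-view-definition approach in the reply can be roughed out with regular expressions, though a real implementation should use a proper SQL parser plus lineage metadata, since a regex misses subqueries, unqualified columns, and aliases. The table and column names below are assumptions for illustration:

```python
import re

def join_filter_columns(sql):
    """Extract candidate qualified column names from JOIN ... ON and WHERE clauses.

    A rough sketch only: it finds "alias.column" tokens in ON clauses and in
    comparison predicates of the WHERE clause.
    """
    columns = set()
    # Grab each ON clause up to the next JOIN/WHERE keyword (or end of string).
    for clause in re.findall(r"\bON\b(.*?)(?=\bJOIN\b|\bWHERE\b|$)", sql, re.I | re.S):
        columns.update(re.findall(r"(\w+\.\w+)", clause))
    # Grab qualified columns used in comparisons within the WHERE clause.
    where = re.search(r"\bWHERE\b(.*)$", sql, re.I | re.S)
    if where:
        columns.update(re.findall(r"(\w+\.\w+)\s*[=<>]", where.group(1)))
    return sorted(columns)
```

Running this over every view definition and joining the results to lineage information would give a first approximation of which underlying columns end users actually filter and join on.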
