Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Maxi1693
by New Contributor II
  • 4229 Views
  • 6 replies
  • 1 kudos

Monitoring Structured Streaming in an external sink

Hi! Today I am trying to collect some metrics to create a plot for my Spark Structured Streaming job. It is configured with trigger(processingTime="30 seconds"), and I am trying to collect data with the following listener class (just an example).  # D...

Screenshot 2024-03-08 113453.png
  • 4229 Views
  • 6 replies
  • 1 kudos
Latest Reply
WiliamRosa
Contributor III
  • 1 kudos

Hi everyone, I recently worked on a similar requirement and would like to share a structured approach to monitoring Structured Streaming when writing to external sinks. 1. Use a unique query name: always assign a clear and meaningful name to each streami...
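The unique-query-name plus progress-listener approach described above can be sketched in plain Python. The `MetricsCollector` class and the progress-event dicts below are hypothetical stand-ins for the payloads PySpark delivers to `StreamingQueryListener.onQueryProgress`; this is an illustration of the pattern, not Databricks-specific code.

```python
# Minimal sketch of the listener pattern from the reply above. In PySpark you
# would subclass pyspark.sql.streaming.StreamingQueryListener; here a plain
# class collects metrics from progress-event dicts (shaped loosely like
# StreamingQueryProgress) so the idea is runnable anywhere.
class MetricsCollector:
    def __init__(self, query_name):
        self.query_name = query_name  # match events by the unique query name
        self.metrics = []

    def on_query_progress(self, progress):
        # Only record events for the query we named explicitly.
        if progress.get("name") != self.query_name:
            return
        self.metrics.append({
            "batch_id": progress["batchId"],
            "num_input_rows": progress["numInputRows"],
            "rows_per_second": progress.get("processedRowsPerSecond", 0.0),
        })

collector = MetricsCollector("orders_to_sink")  # hypothetical query name
collector.on_query_progress(
    {"name": "orders_to_sink", "batchId": 7, "numInputRows": 1200,
     "processedRowsPerSecond": 40.0})
collector.on_query_progress(
    {"name": "other_query", "batchId": 3, "numInputRows": 5})  # ignored
```

Because events are filtered by name, several streams can run on one cluster while each collector tracks only its own query.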

  • 1 kudos
5 More Replies
dholea
by New Contributor II
  • 470 Views
  • 3 replies
  • 1 kudos

Help required for executing a geospatial query

We have a requirement to find the distance between points based on longitude and latitude. Can you please share the detailed steps for achieving this using PySpark? Thank you.

  • 470 Views
  • 3 replies
  • 1 kudos
Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 1 kudos

Hi @dholea, Databricks Runtime 17.1 Beta added native support for Spatial SQL, which lets you, for example, calculate the distance between coordinates. I think you can try the ST_Distance() function.
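On runtimes without Spatial SQL (the reply above notes ST_Distance requires the DBR 17.1 Beta), the great-circle distance can be computed directly with the haversine formula. A minimal plain-Python sketch, which could also be wrapped in a PySpark UDF:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance between two (lat, lon) points in kilometres,
    # using the haversine formula and the mean Earth radius.
    r = 6371.0  # km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

# Paris to London, roughly 340-345 km.
d = haversine_km(48.8566, 2.3522, 51.5074, -0.1278)
```

Note this assumes a spherical Earth; for survey-grade accuracy a geodesic library (or the native Spatial SQL functions) is preferable.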

  • 1 kudos
2 More Replies
sebih
by New Contributor II
  • 452 Views
  • 2 replies
  • 0 kudos

Cannot use join with Enzyme

I suppose I can use incrementalization on pipelines. Supported operators are listed here: https://docs.databricks.com/aws/en/optimizations/incremental-refresh#support-for-materialized-view-incremental-refresh However, when I run the pipeline, it do...

  • 452 Views
  • 2 replies
  • 0 kudos
Latest Reply
sebih
New Contributor II
  • 0 kudos

Thank you for your reply. Even though we only do one join, we keep getting this error.

  • 0 kudos
1 More Replies
turagittech
by Contributor
  • 646 Views
  • 2 replies
  • 1 kudos

Resolved! Managing values that change between development and production

Hi all, when moving from development to testing and production, one often needs to handle values that change, like the blob store or database server being different. I have seen that using widgets can be a useful way to have updateable values for notebooks and ...

  • 646 Views
  • 2 replies
  • 1 kudos
Latest Reply
turagittech
Contributor
  • 1 kudos

Great, thanks. Speed in this case isn't critical, as it's not processing massive amounts of data (well, I hope not massive amounts at this time). It'll be some batch processes that can't use DLT.
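One common way to handle the values-that-change problem from this thread is a per-environment lookup keyed by an environment variable or job parameter. A minimal sketch; the environment names, `DEPLOY_ENV` variable, and storage/server values below are illustrative assumptions, and in a notebook the selector could equally come from dbutils.widgets or a job parameter:

```python
import os

# Hypothetical per-environment values; swap in your real blob store and
# database server names.
CONFIGS = {
    "dev":  {"storage_account": "devblobstore",  "sql_server": "dev-sql.example.net"},
    "prod": {"storage_account": "prodblobstore", "sql_server": "prod-sql.example.net"},
}

def get_config(env=None):
    # Fall back to a DEPLOY_ENV variable (an assumed name), defaulting to dev.
    env = env or os.environ.get("DEPLOY_ENV", "dev")
    if env not in CONFIGS:
        raise ValueError(f"unknown environment: {env}")
    return CONFIGS[env]

cfg = get_config("prod")
```

Failing fast on an unknown environment avoids silently writing test data to production targets.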

  • 1 kudos
1 More Replies
Odoo_ERP
by New Contributor II
  • 3685 Views
  • 2 replies
  • 1 kudos

Odoo ERP customization: Odoo is one of the most popular ERP systems and is widely used by companies. Odoo customization mainly includes changing the sy...

Odoo ERP customization: Odoo is one of the most popular ERP systems and is widely used by companies. Odoo customization mainly involves changing the system by adding new features and functionalities in accordance with the business needs of the clien...

  • 3685 Views
  • 2 replies
  • 1 kudos
Latest Reply
danieljogi
New Contributor II
  • 1 kudos

Odoo ERP customization is the process of customizing modules, CRM, website, POS, reports, and more to meet specific business requirements.

  • 1 kudos
1 More Replies
Datalight
by Contributor
  • 2357 Views
  • 4 replies
  • 2 kudos

Resolved! Data Transfer using Unity Catalog full implementation

I have to share data between Azure A and Azure B using Unity Catalog and Delta Sharing. Every time data arrives in Azure A, the same data should be readable by Azure B. How do I handle incremental load? For change records, I think I need to use a MERGE statement....

  • 2357 Views
  • 4 replies
  • 2 kudos
Latest Reply
turagittech
Contributor
  • 2 kudos

This works well once set up. If you're securely configured in Azure, you will need to grant a Private Link to the underlying storage for their service to read data. For enhanced security, I'd recommend the catalog for the other party then be in external st...

  • 2 kudos
3 More Replies
vishesh_berera
by New Contributor III
  • 373 Views
  • 1 reply
  • 0 kudos

How can we Implement Conditional Logic on SQL Query Output in Job Workflow

I'm trying to create a job where I define a get data task that executes a SQL query. After that, I want to apply conditional logic using an if-else task based on the query output. Specifically, I want to check each row individually—if a condition is ...

vishesh_berera_0-1755112079334.png
  • 373 Views
  • 1 reply
  • 0 kudos
Latest Reply
BR_DatabricksAI
Contributor III
  • 0 kudos

Hello, I believe a fixed-parameter option exists and was introduced recently in Lakeflow Declarative Pipelines: you need to navigate to the configuration section and add parameters.
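The if/else-task idea from the question can be sketched in plain Python: run the query, check each row against the condition, and reduce the result to a single value the condition task can branch on. The function and column names below are illustrative, and the rows stand in for a SQL task's output:

```python
def rows_violate_threshold(rows, column, threshold):
    # Check each row individually, as described above, then reduce the query
    # output to one boolean that an if/else task can branch on.
    return any(row[column] > threshold for row in rows)

# Hypothetical query result: one row per table, with an error count each.
rows = [{"error_count": 0}, {"error_count": 5}, {"error_count": 1}]

# In a job workflow this value would be passed between tasks (for example via
# task values); here it simply selects a branch name.
branch = "alert" if rows_violate_threshold(rows, "error_count", 3) else "continue"
```

The key point is that the condition task compares one scalar, so any per-row logic has to be folded into that scalar first.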

  • 0 kudos
Ramu1821
by New Contributor II
  • 2917 Views
  • 2 replies
  • 0 kudos

Merge using DLT

I have a requirement where I need only the last 24 hours of data from my Delta table; let's call this the latest table. This latest table should be in sync with the source, so it should handle all updates and inserts along with deletes (if something gets deleted at source, ...

  • 2917 Views
  • 2 replies
  • 0 kudos
Latest Reply
Ramu1821
New Contributor II
  • 0 kudos

from pyspark.sql.functions import col, lit, expr, when, to_timestamp, current_timestamp
from pyspark.sql.functions import max as max_
import dlt
from pyspark.sql.types import StructType, StructField, StringType
from pyspark.sql.utils import AnalysisExcep...
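The trailing-24-hour requirement above hinges on a timestamp cutoff. A plain-Python sketch of that filter (the `event_ts` field is an assumed column name; in the DLT pipeline the equivalent would be a filter like `col("event_ts") >= current_timestamp() - expr("INTERVAL 24 HOURS")`):

```python
from datetime import datetime, timedelta, timezone

def last_24h(records, now=None):
    # Keep only records whose event_ts falls within the trailing 24 hours.
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(hours=24)
    return [r for r in records if r["event_ts"] >= cutoff]

now = datetime(2024, 3, 8, 12, 0, tzinfo=timezone.utc)
records = [
    {"id": 1, "event_ts": now - timedelta(hours=2)},   # kept
    {"id": 2, "event_ts": now - timedelta(hours=30)},  # dropped
]
fresh = last_24h(records, now=now)
```

Note that a filter alone does not propagate source deletes; keeping the latest table in sync with deletions still needs change-data-capture handling (for example, DLT's apply_changes) on top of the window.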

  • 0 kudos
1 More Replies
boitumelodikoko
by Valued Contributor
  • 12776 Views
  • 7 replies
  • 4 kudos

Resolved! Databricks Autoloader Checkpoint

Hello Databricks Community, I'm encountering an issue with Databricks Auto Loader where, after running successfully for a period of time, it suddenly stops detecting new files in the source directory. This issue only gets resolved when I reset the ...

  • 12776 Views
  • 7 replies
  • 4 kudos
Latest Reply
boitumelodikoko
Valued Contributor
  • 4 kudos

I have found that reducing the number of objects in the landing path (via an archive/cleanup process) is the most reliable fix. Auto Loader's file discovery can bog down in big/"long-lived" landing folders—especially in directory-listing mode—so clea...
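The archive/cleanup process described in this reply can be sketched as a small job that moves files older than a threshold out of the landing path. This is a local-filesystem sketch with assumed directory names; a production job would use cloud-storage APIs, and the 7-day threshold is only an example:

```python
import shutil
import time
from pathlib import Path

def archive_old_files(landing_dir, archive_dir, max_age_days=7):
    # Move files older than max_age_days out of the landing path so Auto
    # Loader's directory listing stays small, as described above.
    landing, archive = Path(landing_dir), Path(archive_dir)
    archive.mkdir(parents=True, exist_ok=True)
    cutoff = time.time() - max_age_days * 86400
    moved = 0
    # Materialize the listing before moving, so iteration is not disturbed.
    for f in list(landing.iterdir()):
        if f.is_file() and f.stat().st_mtime < cutoff:
            shutil.move(str(f), str(archive / f.name))
            moved += 1
    return moved
```

Only archive files Auto Loader has already processed (i.e. older than the checkpoint's progress), otherwise unprocessed data could be moved out from under the stream.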

  • 4 kudos
6 More Replies
boskicl
by New Contributor III
  • 38788 Views
  • 8 replies
  • 12 kudos

Resolved! Table write command stuck "Filtering files for query."

Hello all, background: I am having an issue today with Databricks using pyspark-sql and writing a Delta table. The dataframe is made by doing an inner join between two tables, and that is the table which I am trying to write to a Delta table. The table ...

filtering job_info spill_memory
  • 38788 Views
  • 8 replies
  • 12 kudos
Latest Reply
nvashisth
New Contributor III
  • 12 kudos

@timo199, @boskicl, I had a similar issue where the job was getting stuck at "Filtering files for query" indefinitely. I checked the Spark logs and, based on that, figured out that we had enabled Photon acceleration on our cluster for the job and the datatype of our columns...

  • 12 kudos
7 More Replies
fkseki
by New Contributor III
  • 825 Views
  • 6 replies
  • 6 kudos

Resolved! List budget policies applying filter_by

I'm trying to list budget policies using the "filter_by" parameter to filter policies that start with "aaaa", but I'm getting a "400 Bad Request" error: {'error_code': 'MALFORMED_REQUEST', 'message': "Could not parse request object: Expected 'START_OB...

  • 825 Views
  • 6 replies
  • 6 kudos
Latest Reply
fkseki
New Contributor III
  • 6 kudos

Thanks for the reply, @szymon_dybczak and @lingareddy_Alva. I tried both approaches but neither was successful.
url = f'{account_url}/api/2.1/accounts/{account_id}/budget-policies'
filter_by_json = json.dumps({"policy_name": "aaaa"})
params = {"filter_by": ...

  • 6 kudos
5 More Replies
sowanth
by New Contributor II
  • 491 Views
  • 3 replies
  • 0 kudos

Spark Memory Configuration – Request for Clarification

Hi Team, I have noticed the following Spark configuration is being applied, though it's not defined in our repo or anywhere in the policies:
spark.memory.offHeap.enabled = true
spark.memory.offHeap.size = around 3/4 of the node instance memory (i.e. 1-...

  • 491 Views
  • 3 replies
  • 0 kudos
Latest Reply
sowanth
New Contributor II
  • 0 kudos

Now I understand how it's automatically configured in our cluster, along with the rationale behind this off-heap memory approach. However, I have some concerns about this configuration. General applicability: most jobs don't actually require 70% off-hea...
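One way to address the concern above is to set the off-heap properties explicitly in the cluster's Spark conf, so the automatic 3/4-of-instance-memory sizing no longer applies. A sketch of a cluster spec fragment as it might appear in a Jobs API request; the runtime version, node type, and the 4g size are illustrative examples, not recommendations:

```python
# Explicit override of the automatically applied off-heap settings described
# above. Size the value to the job's needs, not the node's memory.
spark_conf = {
    "spark.memory.offHeap.enabled": "true",
    "spark.memory.offHeap.size": "4g",  # example value
}

new_cluster = {
    "spark_version": "15.4.x-scala2.12",  # example runtime string
    "node_type_id": "Standard_D4ds_v5",   # example node type
    "num_workers": 2,
    "spark_conf": spark_conf,
}
```

Setting the properties per cluster also makes the sizing visible in source control instead of being applied silently.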

  • 0 kudos
2 More Replies
jeremy98
by Honored Contributor
  • 489 Views
  • 1 replies
  • 0 kudos

How to manage a dynamically scheduled job when an INTERNAL_ERROR occurs?

Hi community, my team and I have been occasionally experiencing INTERNAL_ERROR events in Databricks. We have a job that runs on a schedule, but the start times vary. Sometimes, when the job is triggered, the underlying cluster fails to start for some ...

  • 489 Views
  • 1 replies
  • 0 kudos
Latest Reply
SP_6721
Honored Contributor
  • 0 kudos

Hi @jeremy98, to investigate, check the Jobs UI for failed runs and review both error messages and cluster logs. Monitor failure trends over time and adjust cluster settings or quotas if needed. https://docs.databricks.com/gcp/en/jobs/repair-job-failu...
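The repair flow referenced in the reply can be driven programmatically through the Jobs API 2.1 repair endpoint. A sketch that only constructs the request (nothing is sent); the workspace host, run ID, and task key are placeholder examples:

```python
import json

def build_repair_request(host, run_id, task_keys):
    # Build the URL and body for the Jobs API 2.1 run-repair endpoint, which
    # reruns only the failed tasks of an existing run.
    url = f"{host}/api/2.1/jobs/runs/repair"
    payload = {"run_id": run_id, "rerun_tasks": task_keys}
    return url, json.dumps(payload)

url, body = build_repair_request(
    "https://example.cloud.databricks.com",  # placeholder workspace URL
    123456,                                   # placeholder run ID
    ["ingest"],                               # placeholder task key
)
```

In practice the request would be POSTed with a bearer token, and for transient cluster-start failures a scheduled retry policy on the job itself is often simpler than repairing after the fact.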

  • 0 kudos
ManojkMohan
by Honored Contributor II
  • 999 Views
  • 1 replies
  • 1 kudos

Resolved! Notebook not found: Error

Last execution failed: Notebook not found: Users/manojdatabricks73@gmail.com/includes/CreateRawData. Notebooks can be specified via a relative path (./Notebook or ../folder/Notebook) or via an absolute path (/Abs/Path/to/Notebook). Make sure you are s...

  • 999 Views
  • 1 replies
  • 1 kudos
Latest Reply
lingareddy_Alva
Honored Contributor III
  • 1 kudos

Hi @ManojkMohan, 1. Verify the exact notebook location in the workspace: in Databricks, open the Workspace browser, navigate manually to where you think CreateRawData lives, then right-click the notebook and select Copy Path. This gives you the exact absol...

  • 1 kudos
