Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

bhargavabasava
by New Contributor III
  • 1127 Views
  • 2 replies
  • 1 kudos

Support for JDBC writes from serverless compute

Hi team, are there any plans to support JDBC writes using serverless compute?

Latest Reply
CarlosPH
Databricks Partner
  • 1 kudos

Hello! What is the standard way to write to an external database through Databricks? General-purpose compute? Thanks very much.

1 More Replies
JIWON
by New Contributor III
  • 661 Views
  • 2 replies
  • 3 kudos

Resolved! Questions on Auto Loader auto Listing Logic

Hi everyone, I’m investigating some performance patterns in our Auto Loader (S3) pipelines and would like to clarify the internal listing logic. Context: We run a batch job every hour using Auto Loader. Recently, after March 10th, we noticed our execut...

Latest Reply
aleksandra_ch
Databricks Employee
  • 3 kudos

Hi @JIWON, 1. There is no such option. 2. Assuming that the job is triggered every hour, the spikes every 8 hours can be explained by this: to ensure eventual completeness of data in auto mode, Auto Loader automatically triggers a full directory lis...

1 More Replies
jacovangelder
by Databricks MVP
  • 5584 Views
  • 4 replies
  • 10 kudos

How do you define PyPI libraries on job level in Asset Bundles?

Hello, reading the documentation, it does not state that it is possible to define libraries on job level instead of on task level. It feels really counter-intuitive putting libraries on task level in Databricks workflows provisioned by Asset Bundles. Is th...

Latest Reply
jacovangelder
Databricks MVP
  • 10 kudos

Thanks @Witold! Thought so. I decided to go with an init script where I install my dependencies rather than installing libraries. For future reference, this is what it looks like: job_clusters: - job_cluster_key: job_cluster new_cluster: ...

3 More Replies
zenwanderer
by New Contributor II
  • 840 Views
  • 4 replies
  • 0 kudos

Kill/Cancel a Notebook Cell Running Too Long on an All-purpose Cluster

Hi everyone, I’m facing an issue when running a notebook on a Databricks All-purpose cluster. Some of my cells/pipelines run for a very long time, and I want to automatically cancel/kill them when they exceed a certain time limit. I tried setting spar...

Latest Reply
MoJaMa
Databricks Employee
  • 0 kudos

@zenwanderer Have you looked into Query Watchdog? For Classic All-Purpose clusters this might be your best bet. https://docs.databricks.com/aws/en/compute/troubleshooting/query-watchdog

3 More Replies
guidotognini
by New Contributor II
  • 639 Views
  • 2 replies
  • 2 kudos

Resolved! Rename Column Name of Streaming Table in Lakeflow Spark Declarative Pipeline

Hi, I would like to know if it is possible to rename a column of a streaming table defined in a Lakeflow Spark Declarative Pipeline without having to run a Full Refresh. Could you give me any ideas on how I can achieve this?

Latest Reply
balajij8
Contributor III
  • 2 kudos

You can: Update the pipeline code to rename the old column and trigger an Incremental Update (both old_column and new_column exist after it). Old data will have NULL for new_column after the Incremental Update. Update the table to fill new_column for such cases from old_co...

1 More Replies
AnilKumarM
by New Contributor
  • 1016 Views
  • 3 replies
  • 1 kudos

Best-practice structure for config.yaml, utils, and databricks.yaml in ML project (Databricks)

Hi everyone, I’m working on an ML project in Databricks and want to design a clean, scalable, and production-ready project structure. I’d really appreciate guidance from those with real-world experience. My requirement: I need to organize my project ...

Latest Reply
Ashwin_DSA
Databricks Employee
  • 1 kudos

Hi @AnilKumarM, Agree with @-werners- here. There isn’t a single 'one true' repo layout we mandate, but there are a few public references that show the patterns Databricks recommends. For bundles/databricks.yml + multi‑env, you may want to check the ...

2 More Replies
maikel
by Contributor III
  • 1385 Views
  • 5 replies
  • 1 kudos

Resolved! SQL schemas migration

Hello Community! I would like to ask for your recommendation on best practices for SQL schema migration. In our project, we currently have different SQL schema definitions and data seeding saved in SQL files. Since we are going to higher environm...

Latest Reply
maikel
Contributor III
  • 1 kudos

@anuj_lathi and @Louis_Frolio thank you very much! This is a really great approach and example!

4 More Replies
js5
by New Contributor II
  • 624 Views
  • 1 reply
  • 0 kudos

Resolved! UNSUPPORTED_TIME_TYPE despite 18.1 runtime?

Hello, I have tried using the TimeType data type, which is supported since Spark 4.1: https://spark.apache.org/docs/latest/sql-ref-datatypes.html I am unfortunately still getting an UNSUPPORTED_TIME_TYPE error when trying to run display() on a pandas dataframe ...

Latest Reply
Ashwin_DSA
Databricks Employee
  • 0 kudos

Hi @js5, This is expected today on Databricks. You can check this out for reference. Spark 4.1 introduces a standard TIME type (TimeType) in the SQL type system, and Databricks runtimes based on Spark 4.x already expose it at the engine level (for ex...

Neelimak
by Databricks Partner
  • 1009 Views
  • 5 replies
  • 3 kudos

Resolved! ingestion pipeline configuration

When trying to create an ingestion pipeline, the auto-generated cluster hits quota limit errors. The VM type it is trying to use is not available in our region, and there seems to be no way to add a fallback to different VM types. Can you please help h...

Latest Reply
Ashwin_DSA
Databricks Employee
  • 3 kudos

Hi @Neelimak, Thanks for the feedback. I've now passed the feedback to our product team.   

4 More Replies
IM_01
by Contributor III
  • 533 Views
  • 1 reply
  • 0 kudos

How to disable full refresh on materialized view if the src is not updated since last run

Hi, is there any option to control the refresh of materialized views so that, even if the DLT pipeline is triggered in full refresh mode, the full refresh does not happen on the MVs when the source streaming tables have not been updated? Is there any way to achieve this?

Latest Reply
emma_s
Databricks Employee
  • 0 kudos

Hi, the only way that I think you can do this is to separate out the materialized views from the pipelines into standalone SQL jobs with a TRIGGER ON UPDATE clause. This means they only run if the streaming tables are updated. However, this approach ...

AndriusVitkausk
by New Contributor III
  • 372 Views
  • 1 reply
  • 0 kudos

Logging inside foreachbatch

I'm having difficulty carrying out logging inside a foreachBatch. It seems that all logging outside the foreachBatch works as expected, but messages logged inside it are only visible from the Spark UI in the driver logs. Is there a way to get this to work (including serverless)?

Latest Reply
Ashwin_DSA
Databricks Employee
  • 0 kudos

Hi @AndriusVitkausk, This is expected behaviour on Databricks. Code inside foreachBatch runs in the background as part of the long‑lived streaming query, so print() and standard logging go to the driver logs / Spark UI, not back to the notebook outpu...
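The situation described in this reply can be sketched as follows. This is a minimal illustration, not the approach the reply goes on to recommend: the logger name `stream_batches`, the handler `process_batch`, and the streaming DataFrame `stream_df` in the closing comment are all hypothetical. Per the reply, messages emitted this way would be expected to land in the driver logs rather than in the notebook output.

```python
import logging

# Hypothetical logger name, used only for this illustration.
logger = logging.getLogger("stream_batches")


def process_batch(batch_df, batch_id):
    """Log the size of one micro-batch; meant to be passed to foreachBatch."""
    row_count = batch_df.count()
    logger.info("batch %s contains %s rows", batch_id, row_count)
    return row_count


# Assumed wiring on a streaming DataFrame named `stream_df` (not defined here):
# stream_df.writeStream.foreachBatch(process_batch).start()
```

Keeping the handler as a plain function of `(batch_df, batch_id)` means it can be exercised with a stand-in object outside a streaming query.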

ShankarM
by Databricks Partner
  • 806 Views
  • 2 replies
  • 0 kudos

Masking of PII data

We have the following requirement: there is a history table where data needs to be loaded incrementally. This table contains a PII field which has been masked using a custom masking function (visible to a specific user group, XXXX for the rest). When ...

Latest Reply
emma_s
Databricks Employee
  • 0 kudos

Hi, I believe this is a permissions issue related to the service principal that runs the job. If the pipeline runs a MERGE INTO statement, it needs access to see the rows so it can tell how to match them. The first thing to do is to ma...

1 More Replies
sreya_sahithi
by Databricks Partner
  • 562 Views
  • 1 reply
  • 0 kudos

Resolved! Column Tags Not Accessible in Genie (Azure Databricks)

Hi Team, we’ve applied column-level tags to a table in Azure Databricks and attached the table in our Genie workspace. However, when querying via Genie, the column tag information is not being returned correctly (missing/incomplete results), despite t...

Latest Reply
Ale_Armillotta
Valued Contributor II
  • 0 kudos

Hi @sreya_sahithi, this is an important distinction about how Genie works: Genie queries the actual data rows in the tables attached to its space; it does not natively query Unity Catalog metadata such as column-level tags. Column tags live in INFORM...

Phani1
by Databricks MVP
  • 816 Views
  • 2 replies
  • 0 kudos

Resolved! Databricks Cost Estimation Template

Hi Databricks Team, is there a standard Databricks cost estimation template (xl), sizing calculator, or TCO tool that allows us to provide the following inputs and derive an approximate monthly and annual platform cost: Source systems and their types (...

Latest Reply
emma_s
Databricks Employee
  • 0 kudos

Hi, There isn't anything publicly available that I'm aware of. For this kind of complex migration I'd recommend working with your account team. As somebody who does Databricks sizing a lot, it's a nuanced art which I suspect is why we don't have any ...

1 More Replies
seefoods
by Valued Contributor
  • 660 Views
  • 2 replies
  • 0 kudos

Resolved! setup justfile command in order to launch your spark application

Hello guys, I built a justfile for my project that executes my wheel job task from the command line, but when I run my wheel task I encounter this error: from pyspark.sql.connect.expressions import PythonUDFEnvironment ImportErr...

Latest Reply
anuj_lathi
Databricks Employee
  • 0 kudos

This ImportError happens because you have both standalone pyspark and databricks-connect installed, and they conflict with each other. databricks-connect bundles its own version of PySpark internally — when the standalone pyspark package is also pres...
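One way to check for the coexistence described in this reply is to look up both distributions in the active environment. This is a minimal sketch using only the standard library's `importlib.metadata`; the helper names `installed_version` and `has_pyspark_conflict` are hypothetical and not part of either package, and the check only reports that both are present, not which versions are incompatible.

```python
from importlib import metadata

# Distribution names as published on PyPI.
PACKAGES = ("pyspark", "databricks-connect")


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def has_pyspark_conflict(versions):
    """True when both standalone pyspark and databricks-connect are installed."""
    return all(versions.get(name) is not None for name in PACKAGES)


if __name__ == "__main__":
    found = {name: installed_version(name) for name in PACKAGES}
    for name, version in found.items():
        print(f"{name}: {version or 'not installed'}")
    if has_pyspark_conflict(found):
        print("Both packages are installed; they may conflict.")
```

Running the script inside the same virtual environment that the justfile command uses shows whether both packages ended up installed side by side.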

1 More Replies