Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

ManojkMohan
by Honored Contributor II
  • 819 Views
  • 3 replies
  • 1 kudos

Resolved! ML-specific compute in Databricks Free Edition

Given that Databricks Free Edition offers only serverless compute, is there any workaround to choose ML-specific compute like the options below, or is paying for it the only option?

[Screenshot: ML-specific compute options]
Latest Reply
pjvi
New Contributor II
  • 1 kudos

Hi, in May 2026 I tried environment v5 and still have the same issue. However, it looks like a Databricks employee replied shortly before that it was available again in environment v4, but it isn't working for me in either v4 or v5. https://www.reddi...

plankton
by New Contributor
  • 162 Views
  • 7 replies
  • 2 kudos

R plots not rendering

Has anyone else been experiencing R plots not rendering in notebooks, starting a few days ago? It's not related to sparklyr, plotly, specific data types, or anything else. For example, in base R, plot(1:3, 5:7) runs without error, but does ...

Latest Reply
riabenko
Visitor
  • 2 kudos

The device-to-file workaround doesn't work for complex plots from ggarrange. Plots have also stopped appearing in all my old notebooks. Hope this is fixed soon.

ccsalt
by New Contributor
  • 65 Views
  • 1 replies
  • 0 kudos

Inconsistent Cluster Log Persistence to Volume/S3 (stderr, stdout, log4j-active.log)

Saving logs from an all-purpose cluster to a Volume or S3 is inconsistent, because stderr, stdout, and log4j-active.log are overwritten when the cluster is restarted between minutes 01 and 59 of an hour. Tested case: a job is configured to start every 20 minutes,...

Latest Reply
aleksandra_ch
Databricks Employee
  • 0 kudos

Hi @ccsalt, this is a known limitation. Log rotation (renaming to log4j-YYYY-MM-DD-HH.log.gz) only happens on the hour boundary. The active log file, log4j-active.log, always has the same name and is overwritten if a cluster restart happens within one...
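One possible workaround, sketched here under assumptions: at the end of each run, before the cluster is restarted, copy the active files into a uniquely named, timestamped folder so the next restart cannot overwrite them. The source and destination paths are hypothetical and are assumed to be readable and writable as local paths (for example a Unity Catalog Volume path); `snapshot_active_logs` is an illustrative helper, not a Databricks API.

```python
import shutil
from datetime import datetime
from pathlib import Path

# The three files the thread reports as overwritten on restart.
ACTIVE_LOG_FILES = ("stderr", "stdout", "log4j-active.log")


def snapshot_active_logs(src_dir, dst_root, now=None):
    """Copy the active log files into a timestamped subfolder of dst_root.

    Both paths are assumed to be locally accessible (hypothetical, e.g. a
    Volume path). Returns the destination paths that were written.
    """
    now = now or datetime.now()
    # A per-snapshot folder name means later copies never overwrite earlier ones.
    target = Path(dst_root) / now.strftime("%Y-%m-%d-%H-%M-%S")
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in ACTIVE_LOG_FILES:
        src = Path(src_dir) / name
        if src.is_file():  # skip files that do not exist yet
            dst = target / name
            shutil.copy2(src, dst)
            copied.append(dst)
    return copied
```

Using a second-resolution folder name is a simple design choice; if two snapshots could land in the same second, adding the run ID to the folder name would keep them apart.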

mgcasas-aws
by New Contributor
  • 2512 Views
  • 2 replies
  • 1 kudos

Resolved! Azure Databricks Serverless private connection to S3 bucket

I'm looking for technical references on connecting an Azure Databricks serverless workspace to an S3 bucket over a private site-to-site VPN connection. I found the following for connecting AWS (consumer) to Azure (provider), but I'm looking for the other direction....

Latest Reply
Venkatauppuluri
  • 1 kudos

Hello @Sai_Ponugoti, any progress on a solution?

AlexM
by New Contributor
  • 57 Views
  • 1 replies
  • 0 kudos

Serverless Custom Environment Imaging

Hi, I'm looking at moving from job clusters to serverless environments, ideally to reduce cost and improve startup time. I can see that it is now possible to specify a custom environment .yaml file and list Python packages to be installed. Is ther...

Latest Reply
Ashwin_DSA
Databricks Employee
  • 0 kudos

Hi @AlexM, there isn’t currently a way to bring a pre-built container image into serverless notebooks/jobs. Serverless supports custom environment YAML files and dependency installation/caching, but Databricks Container Services isn’t supported on ser...

Alessio_F
by New Contributor
  • 34 Views
  • 1 replies
  • 0 kudos

Extract SQL function in SQL Server federated database

Hi everyone, I'm using Azure Databricks with a customer who has a SQL Server database federated in Unity Catalog. It seems that, when converting some date functions to the SQL Server dialect, Databricks uses the function "extract", which is not re...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @Alessio_F, this happens because in Databricks SQL both the year and month functions are just aliases for the following patterns:
- extract(YEAR FROM expr)
- extract(MONTH FROM expr)
When Databricks pushes a predicate or expression down to the remote SQL ...
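A workaround worth trying, offered as an assumption rather than confirmed behaviour: filter with a half-open date range instead of year()/month(), so the pushed-down predicate contains only plain comparisons against literals. The helpers below are hypothetical and only build such a predicate string; check the remote query plan (for example with EXPLAIN) to confirm what is actually pushed down.

```python
from datetime import date


def year_bounds(year):
    """Half-open [start, end) date range covering one calendar year."""
    return date(year, 1, 1), date(year + 1, 1, 1)


def month_bounds(year, month):
    """Half-open [start, end) date range covering one calendar month."""
    start = date(year, month, 1)
    # December rolls over into January of the next year.
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    return start, end


def range_predicate(column, bounds):
    """Build a SQL predicate with plain comparisons instead of year()/month()."""
    start, end = bounds
    return (
        f"{column} >= DATE'{start.isoformat()}' "
        f"AND {column} < DATE'{end.isoformat()}'"
    )
```

For example, `spark.sql(f"SELECT * FROM my_catalog.dbo.orders WHERE {range_predicate('order_date', month_bounds(2024, 2))}")`, where `my_catalog.dbo.orders` and `order_date` are placeholders. The half-open range avoids end-of-month edge cases that a `BETWEEN` on the last day would have with timestamps.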

Raj_DB
by Contributor
  • 52 Views
  • 1 replies
  • 0 kudos

Automating Job Permission Updates in Databricks Using a Notebook

Hi everyone, I am looking to create a notebook that, when executed by a user, performs the following actions:
- Retrieves all Databricks jobs created by the current user
- Checks whether a specific role already has permissions on those jobs
- Automatically add...

Latest Reply
ziafazal
Databricks Partner
  • 0 kudos

Hi @Raj_DB, you can use the Databricks SDK to retrieve all jobs and filter them down to those whose owner is the current user, something like this:
from databricks.sdk import WorkspaceClient
w = WorkspaceClient()
# Specify the user email/username you want...
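Extending the reply above, a fuller sketch with the Databricks SDK for Python, under these assumptions: the "role" is a workspace group (its name is hypothetical), "created by the current user" means the job's `creator_user_name`, and the group should receive `CAN_MANAGE_RUN`. It calls `jobs.update_permissions`, which adds to the job's existing ACL, rather than `jobs.set_permissions`, which replaces the direct permissions. The helper function names are illustrative.

```python
from typing import Iterable, Optional


def has_group_permission(acl_entries: Optional[Iterable], group_name: str) -> bool:
    """True if any ACL entry already names the given group."""
    return any(getattr(e, "group_name", None) == group_name for e in acl_entries or [])


def grant_group_on_my_jobs(w, group_name: str, level_name: str = "CAN_MANAGE_RUN"):
    """Grant group_name a permission on every job created by the current user.

    w is a databricks.sdk.WorkspaceClient. Returns the IDs of updated jobs.
    """
    # Imported here so the pure helper above can be used without the SDK installed.
    from databricks.sdk.service.jobs import JobAccessControlRequest, JobPermissionLevel

    me = w.current_user.me().user_name
    updated = []
    for job in w.jobs.list():
        if job.creator_user_name != me:
            continue  # only touch jobs created by whoever runs the notebook
        perms = w.jobs.get_permissions(job_id=str(job.job_id))
        if has_group_permission(perms.access_control_list, group_name):
            continue  # the group already has some permission on this job
        # PATCH-style update: existing ACL entries are kept.
        w.jobs.update_permissions(
            job_id=str(job.job_id),
            access_control_list=[
                JobAccessControlRequest(
                    group_name=group_name,
                    permission_level=JobPermissionLevel[level_name],
                )
            ],
        )
        updated.append(job.job_id)
    return updated
```

Usage in a notebook would look like `grant_group_on_my_jobs(WorkspaceClient(), "my-role-group")`, with `my-role-group` as a placeholder. The check only looks for the group's presence, not its level; compare `all_permissions` on the entry if a specific level matters.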

micheloh
by Visitor
  • 55 Views
  • 2 replies
  • 0 kudos

Create External Catalog when dbname has special characters

Hi experts, I'm having a problem when trying to create an external catalog for my PostgreSQL database. The connection is fine, but the name of the database I want to connect to contains dashes and a colon (e.g. my-db-prod:all). When trying to connect with it, I a...

Latest Reply
Ashwin_DSA
Databricks Employee
  • 0 kudos

Hi @micheloh, from what we’ve seen, this is currently a limitation of Lakehouse Federation foreign catalog creation rather than a problem with the connection itself. The PostgreSQL connection can succeed, but the database value used when creating the...

malterializedvw
by New Contributor III
  • 1944 Views
  • 9 replies
  • 4 kudos

Parametrizing queries in DAB deployments

Hi folks, I would like to ask about best practices for parametrizing queries in Databricks Asset Bundle deployments. This is relevant for differentiating between deployments to different environments, as well as between [dev]-deployments vs...

Latest Reply
abohlin
Visitor
  • 4 kudos

I came across this thread because I was facing the exact same issue as @malterializedvw, and I want to share my fix in case anyone else is tearing their hair out over this problem. In my databricks.yml file I put in a gold_catalog and a silver_catalog variable and...
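The rest of that fix is cut off, so this is just one common pattern, stated as an assumption: pass the bundle variables to a Python task as task parameters (for example `["--gold_catalog", "${var.gold_catalog}", "--silver_catalog", "${var.silver_catalog}"]`) and build fully qualified table names inside the script. The helper names below are hypothetical.

```python
import argparse


def parse_catalog_args(argv=None):
    """Read the catalog names that the bundle passes in as task parameters."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--gold_catalog", required=True)
    parser.add_argument("--silver_catalog", required=True)
    return parser.parse_args(argv)


def quote_ident(part):
    """Backtick-quote one identifier, doubling any embedded backticks."""
    return "`" + part.replace("`", "``") + "`"


def qualified_name(catalog, schema, table):
    """Build a three-level catalog.schema.table name from quoted parts."""
    return ".".join(quote_ident(p) for p in (catalog, schema, table))
```

A query would then use something like `spark.sql(f"SELECT * FROM {qualified_name(args.gold_catalog, 'sales', 'orders')}")`, where `sales` and `orders` are placeholders. Keeping the catalog out of the SQL text means the same script runs unchanged in every target, with only the bundle variables differing.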

erigaud
by Honored Contributor
  • 9034 Views
  • 3 replies
  • 3 kudos

Get total number of files of a Delta table

I'd like to find out programmatically how many files a Delta table is made of. I know I can run:
%sql
DESCRIBE DETAIL my_table
But that only gives me the number of files in the current version. I am looking for the total number of files (basically ...

Latest Reply
gmiguel
Databricks Partner
  • 3 kudos

The best way to get this is to execute the following statement:
ANALYZE TABLE [table_name] COMPUTE STORAGE METRICS;
Applies to: Databricks Runtime 18.0 and above
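For older runtimes, a rough alternative under stated assumptions: count the data files in the table's storage directory, which also includes files from older versions that VACUUM has not yet removed. This only works if the table location (the `location` column returned by `DESCRIBE DETAIL`) is reachable as a local path, which is an assumption; the helper is hypothetical, counts only `.parquet` files, skips `_delta_log`, and does not count non-Parquet files such as deletion vectors.

```python
import os


def count_table_files(table_dir):
    """Count .parquet data files under a Delta table directory, skipping _delta_log."""
    total = 0
    for root, dirs, files in os.walk(table_dir):
        # Prune the transaction log so its checkpoint files are not counted.
        dirs[:] = [d for d in dirs if d != "_delta_log"]
        total += sum(1 for f in files if f.endswith(".parquet"))
    return total
```

Comparing this number with `numFiles` from `DESCRIBE DETAIL` gives a rough idea of how many files belong only to older versions.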

flourishingsing
by New Contributor III
  • 70 Views
  • 1 replies
  • 0 kudos

Resolved! How can I retrieve a backfill run parameter in Python?

I'm trying to run a backfill with the following parameter. How can I access it in the Python script? Do I need to change anything in the yml? I usually set task parameters the following way, and these are then parsed using the argparse Python module.

[Screenshots: the backfill parameter and the current task parameter setup]
Latest Reply
flourishingsing
New Contributor III
  • 0 kudos

Found the following solution:
Add job-level parameters:
parameters:
  - name: run_timestamp
    default: "some_default_value"
Reference it in the task-level parameters:
tasks:
  - task_key: my_task
    spark_python_task:
      python_file: ../../script.py
      ...
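On the Python side, a minimal sketch: assuming the task-level parameters forward the job parameter as `["--run_timestamp", "{{job.parameters.run_timestamp}}"]`, argparse reads it like any other task parameter. The ISO-8601 parsing and the fallback for the placeholder default are illustrative assumptions, not part of the original solution.

```python
import argparse
from datetime import datetime


def parse_args(argv=None):
    """Parse the job-level run_timestamp forwarded as a task parameter."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--run_timestamp", default="some_default_value")
    return parser.parse_args(argv)


def to_datetime(value):
    """Interpret the value as ISO-8601; return None for the placeholder default."""
    try:
        return datetime.fromisoformat(value)
    except ValueError:
        return None
```

Returning `None` for a non-timestamp value lets the script decide explicitly what a normal (non-backfill) run should do, for example falling back to the current time.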

manish_de
by New Contributor II
  • 379 Views
  • 5 replies
  • 5 kudos

Query-based connector snapshot feature

In an ingestion pipeline using a query-based connector, there is an option to select batch snapshot instead of a column name in the Cursor column dropdown. If I choose batch snapshot, will the Databricks engine run SELECT * on my source table, say Sql serv...

Latest Reply
michaelfriendly
New Contributor II
  • 5 kudos

@rbtv It may execute something very similar to a `SELECT *` on the source table unless the platform adds its own partitioning or optimisation behind the scenes. From what I've observed, selecting batch snapshot often means the connector handles each ...
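To illustrate the difference conceptually (a simplified model, not the connector's actual implementation): a batch snapshot re-reads every source row on each run, whereas a cursor column lets each run fetch only rows whose cursor value is greater than the last value seen. The function names and in-memory rows are hypothetical.

```python
def snapshot_read(rows):
    """Batch snapshot: every run re-reads the full source table, like SELECT *."""
    return list(rows)


def cursor_read(rows, cursor_column, last_value):
    """Cursor-based read: only rows whose cursor value exceeds the stored one.

    Returns the new rows and the cursor value to store for the next run.
    """
    new_rows = [r for r in rows if last_value is None or r[cursor_column] > last_value]
    new_last = max((r[cursor_column] for r in new_rows), default=last_value)
    return new_rows, new_last
```

The trade-off this models: a snapshot needs no suitable cursor column and also reflects deletes and updates to old rows, but its cost grows with table size on every run.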

koen_hai
by New Contributor II
  • 112 Views
  • 2 replies
  • 0 kudos

Resolved! Custom and community connectors

Hi, the option to enable custom and community connectors does not seem to be available on the Previews page. How can this be enabled? The feature I'm referencing: Community connectors in Lakeflow Connect - Azure Databricks | Microsoft Learn

Latest Reply
Ashwin_DSA
Databricks Employee
  • 0 kudos

Hi @koen_hai, the Community Connectors feature is controlled from the workspace-level Previews page by a workspace admin. If you don’t see that option there, the workspace likely hasn’t been enrolled for the preview yet. In that case, please contact ...
