Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

ccsalt
by New Contributor
  • 76 Views
  • 2 replies
  • 0 kudos

Inconsistent Cluster Log Persistence to Volume/S3 (stderr, stdout, log4j-active.log)

Saving logs from an all-purpose cluster to a Volume or S3 is not consistent, because stderr, stdout, and log4j-active.log get overwritten when the cluster is restarted between minutes 01 and 59 of the same hour. Tested case: a job is configured to start every 20 minutes,...

Latest Reply
aleksandra_ch
Databricks Employee
  • 0 kudos

Hi @ccsalt, this is a known limitation. Log rotation (renaming to log4j-YYYY-MM-DD-HH.log.gz) only happens on the hour boundary. The active log file, log4j-active.log, always has the same name and is overwritten if a cluster restart happens within one...
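To make the reply's point concrete, here is a simplified Python model (not Databricks code) that maps a timestamp to the hourly archive name quoted above, log4j-YYYY-MM-DD-HH.log.gz. Two restarts that map to the same name fall inside one rotation window, so the earlier log4j-active.log is never archived before it is overwritten. The helper names and the sample schedule are hypothetical.

```python
from datetime import datetime, timedelta

def rotated_log_name(ts: datetime) -> str:
    # Archive name for the hour containing ts, following the
    # log4j-YYYY-MM-DD-HH.log.gz pattern quoted in the reply.
    return f"log4j-{ts:%Y-%m-%d-%H}.log.gz"

def same_rotation_window(a: datetime, b: datetime) -> bool:
    # True when both timestamps share date and hour, i.e. no hourly
    # rotation happens between them in this simplified model.
    return rotated_log_name(a) == rotated_log_name(b)

# Hypothetical schedule: a restart every 20 minutes starting at 10:05.
start = datetime(2024, 1, 1, 10, 5)
restarts = [start + timedelta(minutes=20 * i) for i in range(4)]
for prev, nxt in zip(restarts, restarts[1:]):
    status = "overwritten" if same_rotation_window(prev, nxt) else "archived first"
    print(f"{prev:%H:%M} -> {nxt:%H:%M}: {status}")
```

In this model only the 10:45 to 11:05 pair crosses an hour boundary, so only the log written at 10:45 would be archived before the next restart.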

loujiang
by New Contributor II
  • 35 Views
  • 1 replies
  • 0 kudos

Resolved! Databricks Runtime, Pyspark and Spark Versions

Hello, dear community, I was going through the documentation for the function from_xml (pyspark.sql.functions.from_xml, PySpark 4.1.2 documentation), which states that it is available in PySpark versions higher than 4.0.0. Meanwhile, we have documentation fo...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @loujiang, Databricks Runtime is not a vanilla Apache Spark distribution. DBR is built on top of a highly optimized version of Apache Spark, but also adds enhancements and additional components that substantially improve usability, performance, an...
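Given the reply's point that DBR is not vanilla Spark, the open-source PySpark version number alone may not tell you whether a function is present. A minimal sketch, assuming you only want to know whether the environment you are actually running exposes from_xml, is to test for the function directly; has_function is a hypothetical helper, not a Databricks or PySpark API.

```python
import importlib

def has_function(module_name: str, func_name: str) -> bool:
    # Check whether the installed module exposes a callable func_name,
    # instead of inferring availability from a version number.
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Module not installed at all.
        return False
    return callable(getattr(module, func_name, None))

# Run on the target cluster: reports whether its PySpark ships from_xml.
print(has_function("pyspark.sql.functions", "from_xml"))
```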

KSharmaDE
by Visitor
  • 32 Views
  • 1 replies
  • 0 kudos

Import Data from Databricks to SQL Server

Hi, our team wants to import data from Databricks catalog tables into SQL Server. Is it possible to do so using an SSIS package on SQL Server? What settings are required on the Databricks tables? Please suggest some ETL tools and explain how to do it using SSIS.

Latest Reply
ziafazal
Databricks Partner
  • 0 kudos

Hi @KSharmaDE, you can create an ODBC/ADO.NET connection on the machine running SSIS to import data from Databricks tables. Databricks provides an ODBC driver that can be used to create the ODBC/ADO.NET connection. Follow these steps: 1. Download and install...
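For reference, a DSN-less ODBC connection string for Databricks typically looks like the one built below. This Python sketch only assembles the string; the key names follow the Simba Spark ODBC driver settings as documented by Databricks, but verify them (and the installed driver name, which differs between driver versions) against the driver you install. The function name and placeholder values are hypothetical.

```python
def databricks_odbc_conn_str(host: str, http_path: str, token: str,
                             driver: str = "Simba Spark ODBC Driver") -> str:
    # Assumed key names for the Databricks (Simba Spark) ODBC driver;
    # check them against your driver version's documentation.
    settings = {
        "Driver": "{" + driver + "}",  # braces because the name contains spaces
        "Host": host,                  # workspace server hostname
        "Port": "443",
        "HTTPPath": http_path,         # from the cluster/SQL warehouse connection details
        "SSL": "1",
        "ThriftTransport": "2",        # 2 = HTTP transport
        "AuthMech": "3",               # 3 = user name and password
        "UID": "token",                # literal "token" when using a personal access token
        "PWD": token,                  # the personal access token itself
    }
    return ";".join(f"{key}={value}" for key, value in settings.items())

# Hypothetical placeholders; copy the real values from your workspace.
print(databricks_odbc_conn_str("<server-hostname>", "<http-path>", "<personal-access-token>"))
```

In SSIS, the same settings can alternatively be stored in a DSN created with the ODBC Data Source Administrator, which keeps the token out of the package.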

plankton
by New Contributor
  • 232 Views
  • 9 replies
  • 3 kudos

R plots not rendering

Has anyone else been seeing R plots fail to render in notebooks, starting a few days ago? It's not related to sparklyr or plotly, or to specific data types, or anything else. For example, in base R, plot(1:3, 5:7) runs without error, but does ...

Latest Reply
TomB
Visitor
  • 3 kudos

One very inadequate workaround is to use the focus mode (Ctrl + Alt + o) on a cell to see that cell and output, which will show the R plot even if it doesn't show in the notebook per se. It very much does not replace the purpose of having notebooks, ...

micheloh
by New Contributor
  • 99 Views
  • 4 replies
  • 1 kudos

Resolved! Create External Catalog when dbname has special characters

Hi experts, I'm having a problem when trying to create an external catalog for my PostgreSQL database. The connection is fine, but the name of the database I want to connect to contains dashes and a colon (e.g., my-db-prod:all). When trying to connect with it, I a...

Latest Reply
Ashwin_DSA
Databricks Employee
  • 1 kudos

Hi @micheloh, From what we’ve seen, this is currently a limitation of Lakehouse Federation foreign catalog creation rather than a problem with the connection itself. The PostgreSQL connection can succeed, but the database value used when creating the...

Pranav_1699
by New Contributor II
  • 130 Views
  • 1 replies
  • 1 kudos

Building a Spark Declarative Pipeline OSS with Apache Iceberg and AWS Glue Catalog

Hey everyone, I recently worked on building a modern financial data lakehouse using Spark Declarative Pipeline OSS (SDP OSS), Apache Iceberg, and AWS Glue Catalog. The blog covers:
- Building declarative data pipelines with Spark
- Using Apache Iceberg a...

Data Engineering
Spark Declarative Pipelines
Latest Reply
sameer_yasser
New Contributor
  • 1 kudos

Really cool.

Brahmareddy
by Esteemed Contributor
  • 202 Views
  • 1 replies
  • 2 kudos

Too Many Tools Can Slow Good Data Teams Down

A Small Thing I Keep Noticing in Data Projects
Lately, I have been thinking about something I have seen again and again in big data projects. At the start, everything feels manageable. One tool is used for ingestion. Another one is used for transformat...

Latest Reply
sameer_yasser
New Contributor
  • 2 kudos

Honest advice: teams should use Databricks for data, BI, ML, and AI and close the tab. The depth of what's already there surprises most people once they actually dig in. The real problem isn't the tooling; it's that everyone chases the next shiny thin...

JstelaBR
by Databricks Partner
  • 107 Views
  • 1 replies
  • 1 kudos

Is Databricks AI/BI Genie worth it if we already have Power BI or Tableau?

One thing that really changed how I think about BI platforms happened while I was working in a large enterprise environment heavily invested in Tableau. On paper, the environment looked mature: lots of dashboards, lots of business areas onboarded, and...

Latest Reply
sameer_yasser
New Contributor
  • 1 kudos

Definitely yes, and I'll back that with a real data point. Last week I ran a POC where I replicated our most complex Power BI dashboard in Genie. The original took our team about a month to build. Genie reproduced it in under 10 minutes with zero manua...

Bank_Kirati
by New Contributor III
  • 56 Views
  • 1 replies
  • 0 kudos

Cross-region S3 reads suddenly fail with 400 Bad Request — eu-west-1 metastore to af-south-1 bucket

What changed
A production daily job that has worked unchanged for ~8 months started failing on 2026-05-18 ~23:46 UTC. The notebook does a plain spark.read.json("s3://BUCKET/...") against a bucket in af-south-1. The metastore is in eu-west-1. Same code...

Latest Reply
sameer_yasser
New Contributor
  • 0 kudos

Your debugging is really thorough and you've already done the hard work of isolating this. The 400 with an empty body (no proper S3 error code like InvalidArgument) on an opt-in region is almost always one thing: SigV4 signing region mismatch. af-sou...
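The reply is cut off before its recommendation, so the following is only an assumption about one common mitigation for a signing-region mismatch, not the thread's answer: pinning the bucket to its regional endpoint with an S3A per-bucket setting (fs.s3a.bucket.&lt;bucket&gt;.&lt;option&gt;) so requests are addressed, and signed, for af-south-1. Whether this applies to a Unity Catalog governed read, or resolves this particular failure, is unverified. The helper name is hypothetical; BUCKET is the redacted name from the post.

```python
from typing import Dict

def per_bucket_endpoint_conf(bucket: str, region: str) -> Dict[str, str]:
    # Regional S3 endpoint, e.g. s3.af-south-1.amazonaws.com.
    endpoint = f"s3.{region}.amazonaws.com"
    # S3A per-bucket override: applies only to this bucket and leaves
    # other buckets on the default endpoint.
    return {f"spark.hadoop.fs.s3a.bucket.{bucket}.endpoint": endpoint}

# Print "key value" lines in the form used by a cluster's Spark config box.
for key, value in per_bucket_endpoint_conf("BUCKET", "af-south-1").items():
    print(key, value)
```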

Rahul_Dhankhar
by Visitor
  • 47 Views
  • 1 replies
  • 2 kudos

Seeking Volunteers with Lakehouse, Fabric, Databricks, or Snowflake Experience

Hello everyone, I am a doctoral researcher at the University of the Cumberlands and am seeking 2–3 volunteers for a 20–25-minute field test for my dissertation research on Lakehouse platform adoption. The field test will be conducted over Zoom or Microsof...

Latest Reply
sameer_yasser
New Contributor
  • 2 kudos

I am interested. Let me know. 

ManojkMohan
by Honored Contributor II
  • 841 Views
  • 3 replies
  • 1 kudos

Resolved! ML-specific compute in Databricks Free Edition

Given that Databricks Free Edition has only serverless compute, is there any workaround to choose ML-specific compute like the options below? Is paying for it the only option?

[Screenshot: ML-specific compute options]
Latest Reply
pjvi
New Contributor II
  • 1 kudos

Hi, in May 2026 I tried with environment v5 and still see the same issue. However, it looks like a Databricks employee answered shortly before that it was available again in environment v4, but it is not working for me in either v4 or v5. https://www.reddi...

mgcasas-aws
by New Contributor
  • 2524 Views
  • 2 replies
  • 1 kudos

Resolved! Azure Databricks Serverless private connection to S3 bucket

I'm looking for technical references to connect an Azure Databricks serverless workspace to an S3 bucket over a private site-to-site VPN connection. Found the following to connect AWS (consumer) to Azure (provider), but I'm looking for the other way....

Latest Reply
Venkatauppuluri
  • 1 kudos

Hello @Sai_Ponugoti any progress on the solution?

AlexM
by New Contributor
  • 64 Views
  • 1 replies
  • 0 kudos

Serverless Custom Environment Imaging

Hi, I'm looking at moving from job clusters to serverless environments, ideally to reduce cost and improve startup time. I can see that it is now possible to specify a custom environment .yaml file and list Python packages to be installed. Is ther...

Latest Reply
Ashwin_DSA
Databricks Employee
  • 0 kudos

Hi @AlexM, there isn’t currently a way to bring a pre-built container image into serverless notebooks/jobs. Serverless supports custom environment YAML files and dependency installation/caching, but Databricks Container Services isn’t supported on ser...

Alessio_F
by New Contributor
  • 52 Views
  • 1 replies
  • 0 kudos

Extract SQL function in SQL Server federated database

Hi everyone, I'm using Azure Databricks with a customer who has a SQL Server database federated in Unity Catalog. It seems that, while converting some date functions to the SQL Server dialect, Databricks uses the function "extract", which is not re...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @Alessio_F, this happens because in Databricks SQL both the year and month functions are just aliases for the following patterns:
- extract(YEAR FROM expr)
- extract(MONTH FROM expr)
When Databricks pushes a predicate or expression down to the remote SQL ...
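The reply is truncated, so this does not show how Databricks' pushdown actually works. It is a toy Python rewriter that illustrates the dialect gap: the extract(FIELD FROM expr) form has a T-SQL counterpart, DATEPART(field, expr), which is what a connector would need to emit for SQL Server. The function name and regex are hypothetical, and the regex deliberately ignores expressions containing parentheses.

```python
import re

# extract(FIELD FROM expr), optionally with a space before "(".
# Simplification: expr may not contain parentheses.
_EXTRACT = re.compile(r"\bextract\s*\(\s*(\w+)\s+FROM\s+([^()]+?)\s*\)", re.IGNORECASE)

def extract_to_datepart(sql: str) -> str:
    # Rewrite each extract(FIELD FROM expr) as T-SQL DATEPART(field, expr).
    return _EXTRACT.sub(lambda m: f"DATEPART({m.group(1).lower()}, {m.group(2)})", sql)

print(extract_to_datepart("SELECT * FROM sales WHERE extract (YEAR FROM order_date) = 2024"))
# -> SELECT * FROM sales WHERE DATEPART(year, order_date) = 2024
```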
