Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

saicharandeepb
by New Contributor III
  • 96 Views
  • 5 replies
  • 1 kudos

Looking for Suggestions: Designed a Decision Tree to Recommend Optimal VM Types for Workloads

Hi everyone! I recently designed a decision tree model to help recommend the most suitable VM types for different kinds of workloads in Databricks. Thought Process Behind the Design: Determining the optimal virtual machine (VM) for a workload is heavil...

Latest Reply
Coffee77
Contributor
  • 1 kudos

It looks interesting and I'll take a deeper look! At first sight, as a suggestion, I would add a decision node to conditionally include VMs that support "delta cache acceleration", now called "disk caching". These VMs have local SSD volumes so that the...

4 More Replies
kanikvijay9
by New Contributor III
  • 58 Views
  • 2 replies
  • 10 kudos

Optimizing Delta Table Writes for Massive Datasets in Databricks

Problem Statement: In one of my recent projects, I faced a significant challenge: writing a huge dataset of 11,582,763,212 rows and 2,068 columns to a Databricks managed Delta table. The initial write operation took 22.4 hours using the following setup:...

Latest Reply
kanikvijay9
New Contributor III
  • 10 kudos

Hey @Louis_Frolio, thank you for the thoughtful feedback and great suggestions! A few clarifications: AQE is already enabled in my setup, and it definitely helped reduce shuffle overhead during the write. Regarding column pruning, in this case, the fina...

1 More Replies
RobsonNLPT
by Contributor III
  • 3773 Views
  • 2 replies
  • 0 kudos

Databricks REST API Statement Execution - External Links

Hi. I've tested the adb REST API to execute queries on Databricks SQL serverless. Using INLINE as the disposition, I get the JSON array with my correct results, but using EXTERNAL_LINKS I get the chunks but the external_link (URL starting with http://stor...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

The issue you're experiencing with Databricks SQL Serverless REST API in EXTERNAL_LINKS mode—where the external_link URL (http://storage-proxy.databricks.com/...) does not work, but you can access chunks directly via the /api/2.0/sql/statements/{stat...

1 More Replies
rbee
by New Contributor II
  • 5069 Views
  • 1 replies
  • 0 kudos

Connect to SQL Server Analysis Services (SSAS) server to run DAX query using Python

Hi, I have a Power BI server which I'm able to connect to through SSMS. I tried using pyodbc to connect to the same server in Databricks, but it throws the error below. OperationalError: ('HYT00', '[HYT00] [Microsoft][ODBC Driver 17 for SQL Server]Login timeout exp...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

Your understanding is correct: Power BI’s data model is stored in an Analysis Services (SSAS) engine, not a traditional SQL Server database. This means that while SSMS may connect to Power BI Premium datasets via XMLA endpoints, attempting to use pyo...

RobsonNLPT
by Contributor III
  • 4545 Views
  • 4 replies
  • 0 kudos

Connection timeout when connecting to MongoDB using MongoDB Connector for Spark 10.x

Hi. I'm testing a Databricks connection to a MongoDB cluster V7 (Azure cluster) using the library org.mongodb.spark:mongo-spark-connector_2.13:10.4.1. I can connect using Compass, but I get a timeout error using my adb notebook: MongoTimeoutException: Timed ...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

The error you’re seeing, MongoTimeoutException referencing localhost:27017, suggests your Databricks cluster is trying to connect to MongoDB using the wrong address or that it cannot properly reach the MongoDB cluster endpoint from the notebook, ev...
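
A reference to localhost:27017 is the connector's default address, so one possible cause (an assumption, since the full thread is not shown here) is that the connection string never reaches the 10.x connector, for example because it was set under a configuration key from an older connector version. A minimal sketch of passing it explicitly as reader options follows. The helper name `mongo_read_options`, the host, the credentials, and the database and collection names are hypothetical placeholders; `connection.uri`, `database`, and `collection` are the 10.x option names.

```python
def mongo_read_options(connection_uri: str, database: str, collection: str) -> dict:
    """Build reader options for MongoDB Spark Connector 10.x (hypothetical helper)."""
    # Fail early on an obviously malformed URI instead of waiting for a timeout.
    if not connection_uri.startswith(("mongodb://", "mongodb+srv://")):
        raise ValueError("connection_uri must start with mongodb:// or mongodb+srv://")
    return {
        "connection.uri": connection_uri,
        "database": database,
        "collection": collection,
    }


# Placeholder values for illustration only.
options = mongo_read_options(
    "mongodb+srv://user:password@example-cluster.example.net",
    "sales",
    "orders",
)

# In a notebook, where `spark` is already defined:
# df = spark.read.format("mongodb").options(**options).load()
```

If the options are set correctly and the timeout persists, the remaining suspects are network reachability and firewall rules between the cluster and the MongoDB endpoint.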

3 More Replies
kertsman_nm
by New Contributor
  • 3854 Views
  • 1 replies
  • 0 kudos

Trying to use Broadcast to run Presidio distributed

Hello, I am currently evaluating Microsoft's Presidio de-identification libraries for my organization and would like to see if we can take advantage of Spark's broadcast capabilities, but I am getting an error message: "[BROADCAST_VARIABLE_NOT_LOA...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

You’re encountering the [BROADCAST_VARIABLE_NOT_LOADED] error because Databricks in shared access mode cannot use broadcast variables with non-serializable Python objects (such as your Presidio engines) due to cluster architecture limitations. The cl...
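
One common alternative to broadcasting an engine object is to build it on the worker, once per task, and reuse it for every batch in that task. The sketch below shows only that pattern and is not taken from the thread: `per_task_processor` and `FakeEngine` are hypothetical names, and `FakeEngine` stands in for a Presidio engine so the example runs without Presidio installed.

```python
from typing import Any, Callable, Iterable, Iterator


def per_task_processor(
    engine_factory: Callable[[], Any],
    transform: Callable[[Any, Any], Any],
) -> Callable[[Iterable[Any]], Iterator[Any]]:
    """Return a function that builds the engine once per task, then reuses it."""

    def process_batches(batches: Iterable[Any]) -> Iterator[Any]:
        # Created on the worker when iteration starts; never pickled or broadcast.
        engine = engine_factory()
        for batch in batches:
            yield transform(engine, batch)

    return process_batches


class FakeEngine:
    """Stand-in for a real engine (hypothetical); counts how often it is built."""

    instances = 0

    def __init__(self) -> None:
        FakeEngine.instances += 1

    def redact(self, text: str) -> str:
        return text.replace("Alice", "<PERSON>")


process = per_task_processor(
    FakeEngine,
    lambda engine, batch: [engine.redact(text) for text in batch],
)
result = list(process([["Hi Alice"], ["Alice left", "no names"]]))
```

With Spark, a function of this shape (taking an iterator of batches and yielding batches) is what `DataFrame.mapInPandas` expects, with pandas DataFrames as the batches and a factory that constructs the real engine.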

6502
by New Contributor III
  • 3282 Views
  • 1 replies
  • 0 kudos

Schema change and OpenSearch

Let me be crystal clear: Schema Change and OpenSearch do not fit well together. However, the data pushed to it are processed and always have the same schema. The problem here is that Spark is reading a CDC feed, which is subject to Schema Change becau...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

You are encountering a common issue in Databricks Delta Lake streaming when working with Change Data Capture (CDC) feeds: schema evolution, especially with column mapping enabled, is not fully supported automatically in streaming reads—that includes ...

van45678
by New Contributor
  • 3888 Views
  • 2 replies
  • 0 kudos

Getting connection reset issue while connecting to a SQL server

Hello All, I am unable to connect to a SQL Server instance that is installed in an on-premise network from Databricks. I am able to successfully ping the server from the notebook using this command [nc -vz <hostname> <port>], which means I am able to e...

Labels: Data Engineering, Databricks, sqlserver, timeout
Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

The error you are encountering, "com.microsoft.sqlserver.jdbc.SQLServerException: Connection reset," even after a successful nc (netcat) connection, is a common but nuanced problem when connecting Databricks to an on-premise SQL Server. Although your...

1 More Replies
meghana_tulla
by New Contributor III
  • 2262 Views
  • 1 replies
  • 0 kudos

Issue: UCX Assessment Installation Error in Databricks Automation Script

Hi, I'm experiencing a problem when installing UCX Assessment through an automation script in Databricks. The script fails with this error: 13:38:06 WARNING [databricks.labs.ucx.hive_metastore.tables] {listing_tables_0} failed-table-crawl: listing tabl...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

The manual installation of UCX Assessment in Databricks works with the default <ALL> values, but automation scripts that set WORKSPACE_GROUPS="<ALL>" DATABASES="<ALL>" often encounter a SCHEMA_NOT_FOUND error related to 'ALL' not being recognized as ...

aav331
by New Contributor
  • 157 Views
  • 3 replies
  • 2 kudos

Resolved! Unable to install libraries from requirements.txt in a Serverless Job and spark_python_task

I am running into the following error while trying to deploy a serverless job running a spark_python_task with GIT as the source for the code. The job was deployed as part of a DAB from a GitHub Actions runner. Run failed with error message Library i...

Latest Reply
Louis_Frolio
Databricks Employee
  • 2 kudos

@aav331 , if you are happy with the result please "Accept as Solution." This will help others who may be in the same boat. Cheers, Louis.

2 More Replies
carlos_tasayco
by Contributor
  • 2449 Views
  • 1 replies
  • 0 kudos

DATABRICKS CLI SYNC SPECIFIC FILES

Hello, I am struggling with this problem: I need to update a Databricks repo to sync only some files. According to the documentation, this is possible: https://learn.microsoft.com/en-us/azure/databricks/dev-tools/cli/sync-commands#only-sync-specific-files In my work...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

You are trying to use the --include-from option with your .gitignore file to only sync specific files with the databricks sync command, but you are observing that all files get synced, not just the expected ones. The key issue is how the include/excl...

hgm251
by New Contributor
  • 124 Views
  • 2 replies
  • 2 kudos

badrequest: cannot create online table is being deprecated. creating new online table is not allowed

Hello! This seems so sudden. Can we really not create online tables anymore? Is there a workaround to create online tables temporarily, as we need more time to move to synced tables? #online_tables

Latest Reply
nayan_wylde
Esteemed Contributor
  • 2 kudos

Yes, the Databricks online tables (legacy) are being deprecated, and after January 15, 2026, you will no longer be able to access or create them. https://docs.databricks.com/aws/en/machine-learning/feature-store/migrate-from-online-tables Here are a few ...

1 More Replies
Vetrivel
by Contributor
  • 3516 Views
  • 1 replies
  • 0 kudos

Federate AWS Cloudwatch logs to Databricks Unity Catalog

I am looking to integrate CloudWatch logs with Databricks. Our objective is not to monitor Databricks via CloudWatch, but rather to facilitate access to CloudWatch logs from within Databricks. If anyone has implemented a similar solution, kindly prov...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

To access CloudWatch logs from within Databricks, you can set up an integration that enables Databricks to fetch, query, and analyze AWS CloudWatch log data directly—without configuring CloudWatch to monitor Databricks clusters. This approach is incr...

jeremy98
by Honored Contributor
  • 3456 Views
  • 1 replies
  • 1 kudos

Environment setup in serverless notebook task

Hi community, is there a way to install dependencies inside a notebook task using serverless compute with a Databricks Asset Bundle? Is there a way to avoid installing, every time for each serverless task that composes a job, the dependencies (or the librar...

Latest Reply
mark_ott
Databricks Employee
  • 1 kudos

For Databricks serverless compute jobs using Asset Bundles, custom dependencies (such as Python packages or wheel files) cannot be pre-installed on shared serverless infrastructure across job tasks as you can with traditional job clusters. Instead, d...

Maser_AZ
by New Contributor II
  • 3875 Views
  • 1 replies
  • 0 kudos

16.2 (includes Apache Spark 3.5.2, Scala 2.12) cluster in community edition taking long time

16.2 (includes Apache Spark 3.5.2, Scala 2.12) cluster in Community Edition is taking a long time to start. I'm trying to launch DBR 16.2, but the single-node cluster is taking a long time. Is this a bug in the Community Edition? Here is the u...

Labels: Data Engineering, Databricks
Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

The long startup time for a Databricks Runtime 16.2 (Apache Spark 3.5.2, Scala 2.12) single-node cluster in Databricks Community Edition is a known issue and not unique to your setup. Many users have reported this situation, with some clusters taking...

