Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

ChristianRRL
by Valued Contributor
  • 643 Views
  • 3 replies
  • 2 kudos

Databricks UMF Best Practice

Hi there, I would like to get some feedback on the ideal/suggested ways to get UMF data from our Azure cloud into Databricks. For context, UMF can mean either: User Managed File or User Maintained File. Basically, a UMF could be something like a si...

Data Engineering
Data ingestion
UMF
User Maintained File
User Managed File
Latest Reply
BigRoux
Databricks Employee
  • 2 kudos

I am not an expert on this topic or Azure services, but I did some research and have some suggested courses of action for you to test out. To address your request for suggested ways to get User Managed Files (UMF) from Azure into Databricks, here are...
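(The reply above is cut off in this listing. As a rough illustration of one common pattern for landing UMF-style file drops from Azure storage into Databricks, not necessarily the approach the reply goes on to describe, here is a minimal Auto Loader sketch; the container path, schema/checkpoint locations, and target table are placeholders.)

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Hypothetical ADLS landing path for UMF drops; replace with your container.
    umf_path = "abfss://umf-landing@<storage-account>.dfs.core.windows.net/finance/"

    (
        spark.readStream.format("cloudFiles")          # Auto Loader
        .option("cloudFiles.format", "csv")            # assuming the UMFs are CSVs
        .option("cloudFiles.schemaLocation", "/Volumes/main/bronze/_schemas/umf")
        .option("header", "true")
        .load(umf_path)
        .writeStream
        .option("checkpointLocation", "/Volumes/main/bronze/_checkpoints/umf")
        .trigger(availableNow=True)                    # incremental, batch-style run
        .toTable("main.bronze.umf_raw")                # hypothetical bronze table
    )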

2 More Replies
ChristianRRL
by Valued Contributor
  • 392 Views
  • 3 replies
  • 4 kudos

Resolved! toml file syntax highlighting

Hi there, I'm curious if there's a way for Databricks to support syntax highlighting for a language that is currently not supported in our DBX configuration. For example, I'm using .toml files, but Databricks doesn't understand them and displays them as ...

Latest Reply
Advika
Databricks Employee
  • 4 kudos

Hello @ChristianRRL! Sorry for the delayed response. Databricks currently does not support syntax highlighting for .toml files. As a workaround, you can edit toml files in external editors like VS Code (with plugins) and sync them to Databricks using...

2 More Replies
georgef
by New Contributor III
  • 4604 Views
  • 3 replies
  • 2 kudos

Resolved! Cannot import relative python paths

Hello, some variations of this question have been asked before, but there doesn't seem to be an answer for the following simple use case. I have the following file structure in a Databricks Asset Bundles project: src --dir1 ----file1.py --dir2 ----file2...

Latest Reply
klaas
New Contributor II
  • 2 kudos

This works as long as the script calling the module is indeed __main__; I've changed it a bit to make it more generic:

    import os
    import sys

    def find_module(path):
        while path:
            if os.path.basename(path) == "src":
                return path
            ...
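(The snippet above is truncated in this listing. A plausible completion of that helper, walking up the directory tree until a "src" folder is found and putting it on sys.path, might look like the sketch below; everything past the visible fragment is an assumption, not the replier's exact code.)

    import os
    import sys

    def find_module(path):
        # Walk up from the given path until a directory named "src" is found.
        while path:
            if os.path.basename(path) == "src":
                return path
            parent = os.path.dirname(path)
            if parent == path:      # reached the filesystem root without finding "src"
                return None
            path = parent

    # Assumed usage: make packages under src importable from any script in the bundle.
    src_root = find_module(os.path.abspath(__file__))
    if src_root and src_root not in sys.path:
        sys.path.insert(0, src_root)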

2 More Replies
lsrinivas2k13
by New Contributor II
  • 427 Views
  • 3 replies
  • 0 kudos

Not able to run Python script even after everything is in place in Azure Databricks

Getting the below error while running a Python script which connects to an Azure SQL DB: Database connection error: ('01000', "[01000] [unixODBC][Driver Manager]Can't open lib 'ODBC Driver 17 for SQL Server' : file not found (0) (SQLDriverConnect)") Can some on...

Latest Reply
BigRoux
Databricks Employee
  • 0 kudos

The error occurs because the Microsoft ODBC Driver 17 for SQL Server is missing on your Azure Databricks cluster. Here's how to fix it. Steps to resolve: Step 1: Create an init script to install the ODBC driver. 1. Create a file named `odbc-install.sh` with...
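(The step-by-step reply is truncated above. Once an init script has installed the driver on the cluster, a quick way to confirm it is visible, and then connect, is sketched below; server, database, and credentials are placeholders.)

    import pyodbc

    # The init script should make this list include 'ODBC Driver 17 for SQL Server'.
    print(pyodbc.drivers())

    # Hypothetical connection once the driver is present.
    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=<your-server>.database.windows.net,1433;"
        "DATABASE=<your-db>;UID=<user>;PWD=<password>;"
        "Encrypt=yes;TrustServerCertificate=no;"
    )
    print(conn.cursor().execute("SELECT 1").fetchone())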

2 More Replies
NikosLoutas
by New Contributor II
  • 366 Views
  • 3 replies
  • 2 kudos

Resolved! Materialized Views Compute

When creating a Materialized View (MV) without a schedule, there seems to be a cost associated with the MV once it is created, even if it is not queried. The question is: once the MV is created, is there already a "hot" compute ready for use in case a...

Latest Reply
BigRoux
Databricks Employee
  • 2 kudos

Please select "Accept as Solution" so that others can benefit from this exchange.  Regards, Louis.

2 More Replies
ak5har
by New Contributor II
  • 690 Views
  • 7 replies
  • 2 kudos

Databricks connection to on-prem Cloudera

Hello, we are trying to evaluate a Databricks solution to extract the data from an existing Cloudera schema hosted on a physical server. We are using the Databricks serverless compute provided by the Databricks Express setup, and we assume we will not need t...

Latest Reply
lorenzo1889
New Contributor II
  • 2 kudos

We are in the same situation. We have a CDH cluster with an IaaS architecture. The data are on HDFS on EC2 disks in AWS, and we want to migrate the data from CDH to Databricks in Azure. If we federate CDH's Hive metastore with Databricks, we can migrate t...

6 More Replies
noname123
by New Contributor III
  • 5364 Views
  • 2 replies
  • 0 kudos

Resolved! Delta table version protocol

I do: df.write.format("delta").mode("append").partitionBy("timestamp").option("mergeSchema", "true").save(destination) If the table doesn't exist, it creates a new table with "minReaderVersion":3,"minWriterVersion":7. Yesterday it was creating the table with "min...

Latest Reply
AddBox45
New Contributor II
  • 0 kudos

Hello, how did you fix this explicitly? How did you enable/disable the auto-enable deletion vectors setting to write again with minReaderVersion 1 and minWriterVersion 2?
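(For what it's worth, a minimal sketch of one thing to try, assuming the protocol bump comes from deletion vectors being auto-enabled on new tables; verify the resulting protocol on your DBR version before relying on it.)

    # Turn off the session default that auto-enables deletion vectors so that newly
    # created tables stay on the older protocol (minReaderVersion 1 / minWriterVersion 2).
    spark.conf.set(
        "spark.databricks.delta.properties.defaults.enableDeletionVectors", "false"
    )

    # df and destination as in the original post.
    (
        df.write.format("delta")
        .mode("append")
        .partitionBy("timestamp")
        .option("mergeSchema", "true")
        .save(destination)
    )

    # For a table that already exists, the property is set with SQL instead, e.g.:
    # ALTER TABLE delta.`<path>` SET TBLPROPERTIES (delta.enableDeletionVectors = false)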

1 More Replies
kmodelew
by New Contributor III
  • 287 Views
  • 2 replies
  • 1 kudos

Do I need many wheels for each job in a project?

I have a project with my commons, like a SparkSession object (to run code in PyCharm using the Databricks Connect library and the same code directly on Databricks). I have under src a few packages from which DAB creates separate jobs. I'm using PyCharm. S...

Latest Reply
kmodelew
New Contributor III
  • 1 kudos

Hi, I hope this will be useful. Here are my files: project structure -> DAB_project_structure.png; each yml file for job definitions -> task_group_1_job.png and task_group_2_job.png. Each .py file has a main() method. setup.py: description="wheel file based ...
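(The setup.py contents are cut off above. A minimal sketch of what a single shared wheel for a src/ layout could look like is below; the project name and dependencies are placeholders, not the poster's actual file.)

    from setuptools import setup, find_packages

    setup(
        name="my_dab_project",                     # placeholder project name
        version="0.1.0",
        description="wheel file based project shared by all jobs in the bundle",
        package_dir={"": "src"},                   # packages live under src/
        packages=find_packages(where="src"),
        install_requires=[],                       # add whatever your commons need
    )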

1 More Replies
jeremy98
by Contributor III
  • 336 Views
  • 2 replies
  • 0 kudos

how to install the package using --index-url

Hi community, I created a job using a Databricks asset bundle, but I'm worried about how to install this dependency the right way, because when I was testing the related job it seems it doesn't install the torch library properly.

Latest Reply
jeremy98
Contributor III
  • 0 kudos

I tried to do it manually and it works, but through the Databricks asset bundle it doesn't. In the end I used: dependencies: - torch==2.5.1 - --index-url https://download.pytorch.org/whl/cpu It says: Error: file doesn't exi...

1 More Replies
pratik21
by New Contributor II
  • 7095 Views
  • 4 replies
  • 1 kudos

Unexpected error while calling Notebook string matching regex `\$[\w_]+' expected but `M' found

Run result unavailable: job failed with error message INVALID_PARAMETER_VALUE: Failed to parse %run command: string matching regex `\$[\w_]+' expected but `M' found) Stacktrace: /Notebookpath: scala. To call the notebook we are using dbutils.notebook.run("N...

Latest Reply
thedeadturtle
New Contributor II
  • 1 kudos

Since you're using dbutils.notebook.run() properly now, the issue is not in your current notebook, but actually in the target notebook you're calling. Specifically, Databricks is trying to parse a %run command in that notebook, and it's hitting a synt...
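(The reply is truncated above. For reference, a hedged example of passing parameters through dbutils.notebook.run so nothing has to be interpolated into a %run line in the target notebook; the path and parameter names are placeholders.)

    # In the calling notebook (dbutils is available on Databricks):
    result = dbutils.notebook.run(
        "/Workspace/Shared/NotebookName",            # placeholder target path
        600,                                         # timeout in seconds
        {"run_date": "2024-01-01", "env": "dev"},    # parameters passed as strings
    )
    print(result)

    # In the target notebook, read the parameters with widgets, e.g.:
    # run_date = dbutils.widgets.get("run_date")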

3 More Replies
Vasu_Kumar_T
by New Contributor II
  • 105 Views
  • 1 replies
  • 0 kudos

BladeBridge Analyzer out of memory issue

We are running the BladeBridge Analyzer and we are running out of memory. We tried to increase the RAM and it still gives the same error. We cannot run the analyzer against a subset of metadata, as it would not generate a comprehensive report with how th...

Latest Reply
Brahmareddy
Honored Contributor III
  • 0 kudos

Hi Vasu_Kumar_T, how are you doing today? As per my understanding, running out of memory with BladeBridge Analyzer can be tough, especially when you're working with large and complex metadata where you need the full picture. Even if you've increased ...

patacoing
by New Contributor II
  • 302 Views
  • 1 replies
  • 1 kudos

Medallion architecture

Hello, I have an S3 data lake with, in it, a structure of files of different formats: JSON, CSV, text, binary, ... Would you consider this my bronze layer, or a "pre-bronze" layer, since it can't be processed directly by Spark (because of d...

Latest Reply
Brahmareddy
Honored Contributor III
  • 1 kudos

Hi patacoing, how are you doing today? As per my understanding, the structure you described in your S3 data lake sounds more like a "pre-bronze" layer, since the files are in mixed formats (JSON, CSV, text, binary), which makes it tricky to process t...

jeremy98
by Contributor III
  • 1003 Views
  • 9 replies
  • 0 kudos

Resolved! Error Databricks Bundle Deploy with changes in the wheel file

Hello Community, suddenly I have an error: when I deploy the new bundle to Databricks after changing the Python script, the cluster continues to point to an old version of the .py script uploaded from the Databricks asset bundle. Why is this?

Latest Reply
denis-dbx
Databricks Employee
  • 0 kudos

We've added a solution for this problem in v0.245.0. There is an opt-in "dynamic_version: true" flag on the artifact to enable automated wheel patching that breaks the cache (Example). Once set, "bundle deploy" will transparently patch the version suffix in the ...

8 More Replies
Tommabip
by New Contributor II
  • 221 Views
  • 3 replies
  • 2 kudos

Resolved! Databricks Cluster Policies

Hi, I'm trying to create a Terraform script that does the following: create a policy where I specify env variables and libraries, and create a cluster that inherits from that policy and uses the env variables specified in the policy. I saw in the docume...

Latest Reply
BigRoux
Databricks Employee
  • 2 kudos

You're correct in observing this discrepancy. When a cluster policy is defined and applied through the Databricks UI, fixed environment variables (`spark_env_vars`) specified in the policy automatically propagate to clusters created under that policy...

2 More Replies
Alex_Persin
by New Contributor III
  • 6420 Views
  • 6 replies
  • 8 kudos

How can the shared memory size (/dev/shm) be increased on databricks worker nodes with custom docker images?

PyTorch uses shared memory to efficiently share tensors between its dataloader workers and its main process. However, in a Docker container the default size of the shared memory (a tmpfs file system mounted at /dev/shm) is 64 MB, which is too small to ...

Latest Reply
stevewb
New Contributor II
  • 8 kudos

Bump again... does anyone have a solution for this?

5 More Replies
