Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

lsrinivas2k13
by New Contributor II
  • 1689 Views
  • 3 replies
  • 0 kudos

Not able to run Python script even after everything is in place in Azure Databricks

Getting the below error while running a Python script which connects to Azure SQL DB: Database connection error: ('01000', "[01000] [unixODBC][Driver Manager]Can't open lib 'ODBC Driver 17 for SQL Server' : file not found (0) (SQLDriverConnect)") can some on...

Latest Reply
Louis_Frolio
Databricks Employee
  • 0 kudos

The error occurs because the Microsoft ODBC Driver 17 for SQL Server is missing on your Azure Databricks cluster. Here's how to fix it. Steps to resolve: Step 1: Create an init script to install the ODBC driver. 1. Create a file named `odbc-install.sh` with...
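The truncated steps above can be sketched as a cluster-scoped init script. The file name `odbc-install.sh` comes from the reply; the package commands are an assumption based on Microsoft's published Ubuntu install instructions, so adjust the Ubuntu release to match your runtime:

```bash
#!/bin/bash
# Sketch: install Microsoft ODBC Driver 17 for SQL Server on an
# Ubuntu-based Databricks cluster. Assumes outbound network access.
set -e
curl -s https://packages.microsoft.com/keys/microsoft.asc | apt-key add -
# Adjust 20.04 to the Ubuntu release of your Databricks Runtime.
curl -s https://packages.microsoft.com/config/ubuntu/20.04/prod.list \
  > /etc/apt/sources.list.d/mssql-release.list
apt-get update
ACCEPT_EULA=Y apt-get install -y msodbcsql17 unixodbc-dev
```

Save it to a workspace or volume path and reference it under the cluster's Advanced options > Init Scripts; the installed driver then matches connection strings naming 'ODBC Driver 17 for SQL Server'.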

2 More Replies
NikosLoutas
by Databricks Partner
  • 1439 Views
  • 3 replies
  • 2 kudos

Resolved! Materialized Views Compute

When creating a Materialized View (MV) without a schedule, there seems to be a cost associated with the MV once it is created, even if it is not queried. The question is, once the MV is created, is there already a "hot" compute ready for use in case a...

Latest Reply
Louis_Frolio
Databricks Employee
  • 2 kudos

Please select "Accept as Solution" so that others can benefit from this exchange.  Regards, Louis.

2 More Replies
noname123
by New Contributor III
  • 7707 Views
  • 2 replies
  • 0 kudos

Resolved! Delta table version protocol

I do: df.write.format("delta").mode("append").partitionBy("timestamp").option("mergeSchema", "true").save(destination). If the table doesn't exist, it creates a new table with "minReaderVersion":3,"minWriterVersion":7. Yesterday it was creating tables with "min...

Latest Reply
AddBox45
New Contributor II
  • 0 kudos

Hello, how did you fix this explicitly? How did you enable/disable the auto-enable deletion vectors setting to write again with minReaderVersion 1 and minWriterVersion 2?
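For later readers: a jump to minReaderVersion 3 / minWriterVersion 7 is typically the deletion-vectors table feature being auto-enabled on new tables. A hedged sketch of the relevant knobs, with property and feature names taken from the Delta Lake docs (verify against your runtime version):

```sql
-- Keep new tables created in this session on the legacy protocol
SET spark.databricks.delta.properties.defaults.enableDeletionVectors = false;

-- Disable the feature on an existing table (does not by itself downgrade
-- a protocol that was already upgraded)
ALTER TABLE my_table SET TBLPROPERTIES ('delta.enableDeletionVectors' = 'false');

-- On recent runtimes, dropping the feature can downgrade the protocol again
ALTER TABLE my_table DROP FEATURE deletionVectors;
```

`my_table` is a placeholder; check the deletion-vectors documentation before dropping features on production tables.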

1 More Replies
kmodelew
by New Contributor III
  • 1571 Views
  • 2 replies
  • 1 kudos

Do I need a separate wheel for each job in a project?

I have a project with my commons, like a SparkSession object (to run code in PyCharm using the databricks-connect library, and the same code directly on Databricks). Under src I have a few packages from which DAB creates separate jobs. I'm using PyCharm. S...

Latest Reply
kmodelew
New Contributor III
  • 1 kudos

Hi, I hope it will be useful. Here are my files: project structure -> DAB_project_structure.png; each yml file for job definitions -> task_group_1_job.png and task_group_2_job.png. Each .py file has a main() method. setup.py: description="wheel file based ...

1 More Replies
jeremy98
by Honored Contributor
  • 1184 Views
  • 2 replies
  • 0 kudos

how to install the package using --index-url

Hi community, I created a job using a Databricks asset bundle, but I'm worried about how to install this dependency the right way, because when I was testing the related job it seems it doesn't install the torch library properly.

Latest Reply
jeremy98
Honored Contributor
  • 0 kudos

I tried to do it manually and it works; through Databricks asset bundle it doesn't. But I did at the end: dependencies: - torch==2.5.1 - --index-url https://download.pytorch.org/whl/cpu It says: Error: file doesn't exi...
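An "Error: file doesn't exist" here usually means the CLI treated the bare `--index-url ...` entry as a path to a requirements file rather than a pip flag. One plausible layout is to pass the flag as its own pip-style dependency line in the job's environment spec; this is a sketch only, and the key names should be checked against your CLI version's bundle schema:

```yaml
environments:
  - environment_key: default
    spec:
      client: "1"
      dependencies:
        - "--extra-index-url=https://download.pytorch.org/whl/cpu"
        - "torch==2.5.1"
```

If the spec rejects flag lines, an alternative is a requirements file in the bundle that contains both lines, referenced from the dependency list.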

1 More Replies
pratik21
by New Contributor II
  • 8851 Views
  • 4 replies
  • 1 kudos

Unexpected error while calling Notebook string matching regex `\$[\w_]+' expected but `M' found

Run result unavailable: job failed with error message INVALID_PARAMETER_VALUE: Failed to parse %run command: string matching regex `\$[\w_]+' expected but `M' found). Stacktrace: /Notebookpath: scala. To call the notebook we are using dbutils.notebook.run("N...

Latest Reply
thedeadturtle
Databricks Partner
  • 1 kudos

Since you're using dbutils.notebook.run() properly now, the issue is not in your current notebook, but actually in the target notebook you're calling. Specifically, Databricks is trying to parse a %run command in that notebook, and it's hitting a synt...
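The regex in the error message (`\$[\w_]+`) is what the %run argument parser expects, which suggests the target notebook contains a %run line whose argument doesn't start with `$`. A hedged sketch of the two call styles, with paths and argument names as placeholders:

```
# Programmatic call: arguments are passed as a dict
dbutils.notebook.run("/path/to/TargetNotebook", 600, {"my_arg": "My Value"})

# Magic-command form inside a notebook cell:
# %run /path/to/TargetNotebook $my_arg="My Value"
# Each argument must match $[\w_]+ — a bare token such as MyArg="My Value"
# (no leading $) produces the "`M' found" parse error seen above.
```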

3 More Replies
Vasu_Kumar_T
by Databricks Partner
  • 538 Views
  • 1 replies
  • 0 kudos

BladeBridge Analyzer out-of-memory issue

We are running BladeBridge Analyzer and it is running out of memory. We tried to increase the RAM and it still gives the same error. We cannot run the analyzer against a subset of metadata, as it would not generate a comprehensive report with how th...

Latest Reply
Brahmareddy
Esteemed Contributor
  • 0 kudos

Hi Vasu_Kumar_T, how are you doing today? As per my understanding, running out of memory with BladeBridge Analyzer can be tough, especially when you're working with large and complex metadata where you need the full picture. Even if you've increased ...

patacoing
by New Contributor II
  • 772 Views
  • 1 replies
  • 1 kudos

Medallion architecture

Hello, I have an S3 data lake containing a structure of files in different formats: json, csv, text, binary, ... Would you consider this my bronze layer? Or a "pre-bronze" layer, since it can't be processed directly by Spark (because of d...

Latest Reply
Brahmareddy
Esteemed Contributor
  • 1 kudos

Hi patacoing, how are you doing today? As per my understanding, the structure you described in your S3 data lake sounds more like a "pre-bronze" layer, since the files are in mixed formats (JSON, CSV, text, binary), which makes it tricky to process t...

jeremy98
by Honored Contributor
  • 3608 Views
  • 9 replies
  • 0 kudos

Resolved! Error Databricks Bundle Deploy with changes in the wheel file

Hello Community, suddenly I have an error when deploying a new bundle to Databricks after changing the Python script: the cluster continues to point to an old version of the .py script uploaded from the Databricks asset bundle. Why is this?

Latest Reply
denis-dbx
Databricks Employee
  • 0 kudos

We've added a solution for this problem in v0.245.0. There is an opt-in "dynamic_version: true" flag on the artifact to enable automated wheel patching that breaks the cache (Example). Once set, "bundle deploy" will transparently patch the version suffix in the ...
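Per the reply above, the flag lives on the artifact definition in databricks.yml; a minimal sketch, where the artifact name and path are placeholders:

```yaml
artifacts:
  my_wheel:
    type: whl
    path: ./my_package
    dynamic_version: true  # opt-in; requires Databricks CLI v0.245.0 or later
```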

8 More Replies
Tommabip
by Databricks Partner
  • 2672 Views
  • 3 replies
  • 2 kudos

Resolved! Databricks Cluster Policies

Hi, I'm trying to create a Terraform script that does the following: - create a policy where I specify env variables and libraries - create a cluster that inherits from that policy and uses the env variables specified in the policy. I saw in the docume...

Latest Reply
Louis_Frolio
Databricks Employee
  • 2 kudos

You're correct in observing this discrepancy. When a cluster policy is defined and applied through the Databricks UI, fixed environment variables (`spark_env_vars`) specified in the policy automatically propagate to clusters created under that policy...
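For context, a fixed environment variable in a cluster-policy definition looks like this (variable name and value are placeholders), following the cluster-policy definition reference:

```json
{
  "spark_env_vars.MY_ENV_VAR": {
    "type": "fixed",
    "value": "some-value"
  }
}
```

When the cluster is created through Terraform rather than the UI, the `databricks_cluster` resource must reference the policy's `policy_id`, and setting `apply_policy_default_values = true` on the cluster resource is typically needed for policy-supplied values to propagate; check the provider documentation for your version.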

2 More Replies
Alex_Persin
by New Contributor III
  • 10123 Views
  • 6 replies
  • 8 kudos

How can the shared memory size (/dev/shm) be increased on databricks worker nodes with custom docker images?

PyTorch uses shared memory to efficiently share tensors between its dataloader workers and its main process. However in a docker container the default size of the shared memory (a tmpfs file system mounted at /dev/shm) is 64MB, which is too small to ...

Latest Reply
stevewb
New Contributor III
  • 8 kudos

Bump again... does anyone have a solution for this?
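One workaround sometimes used for custom containers is enlarging the tmpfs from a cluster init script; this is a sketch, the 4g size is a placeholder, and whether the remount is permitted depends on the container's privileges:

```bash
#!/bin/bash
# Remount /dev/shm with a larger size limit so PyTorch dataloader workers
# can share tensors; runs at cluster start as an init script.
sudo mount -o remount,size=4g /dev/shm
```

An application-side alternative is `torch.multiprocessing.set_sharing_strategy("file_system")`, which avoids /dev/shm at the cost of extra file handles, or reducing the dataloader's `num_workers`.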

5 More Replies
valde
by New Contributor
  • 932 Views
  • 1 replies
  • 0 kudos

Window function VS groupBy + map

Let's say we have an RDD like this: RDD(id: Int, measure: Int, date: LocalDate). Let's say we want to apply some function that compares 2 consecutive measures by date and outputs a number, and we want to get the sum of those numbers by id. The function is b...

Latest Reply
Renu_
Valued Contributor II
  • 0 kudos

Hi @valde, those two approaches give the same result, but they don't work the same way under the hood. Spark SQL uses optimized window functions that handle things like shuffling and memory more efficiently, often making it faster and lighter. On the o...
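Since the thread compares the two approaches, the computation itself (a per-id sum of a function over consecutive measures ordered by date) can be sketched in plain Python; the sample data and the pairwise `compare` function are placeholders, not the original poster's code:

```python
from itertools import groupby
from operator import itemgetter

# Rows mirror the RDD(id: Int, measure: Int, date: LocalDate) from the question.
rows = [
    (1, 10, "2024-01-01"),
    (1, 13, "2024-01-02"),
    (1, 11, "2024-01-03"),
    (2, 5,  "2024-01-01"),
    (2, 9,  "2024-01-02"),
]

def compare(prev, curr):
    # Placeholder for the poster's function over two consecutive measures.
    return curr - prev

def sum_by_id(rows):
    # groupBy(id), sort each group by date, then sum compare() over
    # consecutive measures -- the same result a lag() window + sum gives.
    result = {}
    for key, group in groupby(sorted(rows, key=itemgetter(0)), key=itemgetter(0)):
        measures = [m for _, m, _ in sorted(group, key=itemgetter(2))]
        result[key] = sum(compare(a, b) for a, b in zip(measures, measures[1:]))
    return result

print(sum_by_id(rows))  # {1: 1, 2: 4}
```

In Spark, the window version would use F.lag("measure").over(Window.partitionBy("id").orderBy("date")) followed by groupBy("id") and a sum, letting Spark pipeline the shuffle instead of materializing each group as a list in memory.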

Nathant93
by New Contributor III
  • 2647 Views
  • 2 replies
  • 0 kudos

(java.util.concurrent.ExecutionException) Boxed Error

Has anyone ever come across the error above? I am trying to get two tables from Unity Catalog and join them; the join is fairly complex, as it imitates a WHERE NOT EXISTS top-1 SQL query.

Latest Reply
pk13
New Contributor II
  • 0 kudos

Hello @VZLA, recently I am getting the exact same error. It has a "caused by" as below: ```Caused by: kafkashaded.org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition.``` Stacktrace - ERROR: Some ...

1 More Replies
eenaagrawal
by Databricks Partner
  • 5526 Views
  • 1 replies
  • 0 kudos
Latest Reply
SP_6721
Honored Contributor II
  • 0 kudos

Hi @eenaagrawal, there isn't a specific built-in integration in Databricks to directly interact with SharePoint. However, you can accomplish this by leveraging libraries like Office365-REST-Python-Client, which enable interaction with SharePoint's RE...
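As a concrete illustration of the library the reply names, a minimal sketch of downloading a file with Office365-REST-Python-Client; the site URL, credentials, and paths are placeholders, and the exact class paths should be checked against the library's documentation:

```python
from office365.sharepoint.client_context import ClientContext
from office365.runtime.auth.client_credential import ClientCredential

site_url = "https://yourtenant.sharepoint.com/sites/yoursite"  # placeholder
ctx = ClientContext(site_url).with_credentials(
    ClientCredential("client-id", "client-secret")  # placeholder app credentials
)

# Download a file from a document library to the driver's local disk
with open("/tmp/report.xlsx", "wb") as local_file:
    ctx.web.get_file_by_server_relative_url(
        "/sites/yoursite/Shared Documents/report.xlsx"
    ).download(local_file).execute_query()
```

From there the local file can be read with Spark or pandas as usual.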

rahuja
by Contributor
  • 2460 Views
  • 2 replies
  • 0 kudos

Resolved! Cloning Git Repository in Databricks via Rest API Endpoint using Azure Service principal

Hello, I have written a Python script that uses Databricks REST API(s). I am trying to clone/update an Azure DevOps repository inside Databricks using an Azure service principal. I am able to retrieve the credential_id for the service principal I am usin...

Latest Reply
rahuja
Contributor
  • 0 kudos

@nicole_lu_PM So sorry for coming back to this issue after such a long time. But I looked into it, and it seems the concept of an OBO token applies when we use Databricks with AWS as our cloud provider. In the case of Azure, most of the commen...

1 More Replies