Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Maser_AZ
by New Contributor II
  • 3115 Views
  • 0 replies
  • 0 kudos

16.2 (includes Apache Spark 3.5.2, Scala 2.12) cluster in community edition taking long time

16.2 (includes Apache Spark 3.5.2, Scala 2.12) cluster in Community Edition taking a long time to start. I'm trying to launch 16.2 DBR, but the cluster, which is a single node, is taking a long time. Is this a bug in the Community Edition? Here is the u...

Data Engineering
Databricks
the_dude
by New Contributor II
  • 909 Views
  • 3 replies
  • 0 kudos

How are .whl files executed for Python wheel tasks?

Hello, we package a Poetry-managed project into a .whl and run it as a Python wheel task. Naturally, many of the dependencies referenced by the .whl file are already present on the Databricks cluster. Is this detected by the task setup (in its virtual...

Latest Reply
Nik_Vanderhoof
Contributor
  • 0 kudos

Hi David, I can't speak exactly to how Poetry handles dependency resolution for libraries that are already installed, or how that interacts with the Databricks runtime. However, I can offer some advice on how my team handles this situation. It'...

2 More Replies
shadowinc
by New Contributor III
  • 2835 Views
  • 0 replies
  • 0 kudos

Call SQL Function via API

Background: I created a SQL function named schema.function_name that returns a table. In a notebook the function works perfectly; however, I want to execute it via the API using a SQL endpoint. Through the API I got an insufficient-privileges error, so gr...
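
For anyone looking for the mechanics rather than the permissions issue: one way to invoke a table-valued SQL function from outside a notebook is the SQL Statement Execution API against a SQL warehouse. A rough sketch in Python (workspace URL, token, and warehouse ID are placeholders, and the caller still needs the relevant EXECUTE/USE grants on the function, schema, and catalog):

```
import requests

# Placeholders: fill in your workspace URL, PAT, and SQL warehouse ID.
host = "https://<workspace-url>"
token = "<personal-access-token>"
warehouse_id = "<sql-warehouse-id>"

resp = requests.post(
    f"{host}/api/2.0/sql/statements",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "warehouse_id": warehouse_id,
        # A table-valued function is queried like a table.
        "statement": "SELECT * FROM schema.function_name()",
        "wait_timeout": "30s",
    },
)
resp.raise_for_status()
payload = resp.json()
print(payload["status"]["state"])              # e.g. SUCCEEDED
print(payload.get("result", {}).get("data_array"))
```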

udi_azulay
by New Contributor II
  • 2935 Views
  • 6 replies
  • 1 kudos

Variant type table within DLT

Hi, I have a table with the Variant type (preview) that works well on 15.3. When I try to run code that references this Variant type in a DLT pipeline, I get: com.databricks.sql.transaction.tahoe.DeltaUnsupportedTableFeatureException: [DELTA_UNSUPPORTED_F...

Latest Reply
MAJVeld
New Contributor II
  • 1 kudos

I can indeed confirm that adding some additional table properties to the @dlt decorator in the DLT pipeline definition resolved the earlier issues. Thanks for pointing this out.
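
For readers hitting the same DELTA_UNSUPPORTED_FEATURE error, a minimal sketch of what such a definition can look like. The exact table-property key for the variant preview feature should be verified against the error message and release notes for your runtime, and the source table and column names below are made up:

```
import dlt
from pyspark.sql import functions as F

@dlt.table(
    # Assumption: the variant table feature is enabled through a Delta table
    # property; check the exact key reported by the
    # DeltaUnsupportedTableFeatureException for your runtime.
    table_properties={"delta.feature.variantType-preview": "supported"}
)
def events_with_variant():
    # parse_json() produces a VARIANT column on runtimes where the variant
    # preview is available; the source table name is hypothetical.
    return (
        spark.readStream.table("raw.events")
        .withColumn("payload_v", F.parse_json(F.col("payload")))
    )
```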

5 More Replies
Rasputin312
by New Contributor II
  • 2684 Views
  • 1 reply
  • 1 kudos

Resolved! How To Save a File as a Pickle Object to the Databricks File System

I tried running this code:
```
def save_file(name, obj):
    with open(name, 'wb') as f:
        pickle.dump(obj, f)
```
One file was saved in the local file system, but the second was too large, so I need to save it to the DBFS file system. Unfortunately, I d...

Latest Reply
JissMathew
Valued Contributor
  • 1 kudos

To save a Python object to the Databricks File System (DBFS), you can use the dbutils.fs module to write files to DBFS. Since you are dealing with a Python object and not a DataFrame, you can use the pickle module to serialize the object and then wri...
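
As a concrete illustration of that idea, here is a small sketch that takes a slightly different route than dbutils.fs and writes through the /dbfs FUSE mount available on the driver of standard (non-serverless) clusters; the path and the object are placeholders:

```
import pickle

def save_file(path, obj):
    """Serialize obj with pickle and write it to the given path."""
    with open(path, "wb") as f:
        pickle.dump(obj, f)

# Paths under /dbfs/ land in DBFS rather than on the driver's local disk.
save_file("/dbfs/tmp/large_object.pkl", {"weights": list(range(1_000_000))})

# Read it back the same way.
with open("/dbfs/tmp/large_object.pkl", "rb") as f:
    restored = pickle.load(f)
```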

swzzzsw
by New Contributor III
  • 11529 Views
  • 4 replies
  • 9 kudos

"Run now with different parameters" - different parameters not recognized by jobs involving multiple tasks

I'm running a Databricks job involving multiple tasks and would like to run the job with a different set of task parameters. I can achieve that by editing each task and changing the parameter values. However, it gets very manual when I have a lot of tas...

Latest Reply
VijayNakkonda
New Contributor II
  • 9 kudos

Dear Team, for now I found a solution: disconnect the bundle source on Databricks and edit the parameters you want to run with. After execution, redeploy your code from the repository.
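
If editing every task is the pain point, another option is to trigger the run through the Jobs API and override parameters at the job level; a rough sketch (workspace URL, token, job ID, and parameter names are placeholders, and it assumes the job defines job-level parameters that its tasks reference):

```
import requests

# Placeholders: fill in your workspace URL, PAT, and job ID.
host = "https://<workspace-url>"
token = "<personal-access-token>"

resp = requests.post(
    f"{host}/api/2.1/jobs/run-now",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "job_id": 123456789,
        # Job-level parameters flow to every task that references them,
        # so there is no per-task editing.
        "job_parameters": {"run_date": "2024-01-31", "env": "dev"},
    },
)
resp.raise_for_status()
print(resp.json()["run_id"])
```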

3 More Replies
JonathanFlint
by New Contributor III
  • 6334 Views
  • 9 replies
  • 2 kudos

Asset bundle doesn't sync files to workspace

I've created a completely fresh project with a completely empty workspace. Locally I have the Databricks CLI version 0.230.0 installed. I run databricks bundle init default-python. I have auth set up with a PAT generated by an account which has workspace ad...

Latest Reply
pherrera
New Contributor II
  • 2 kudos

OK, I feel silly. Despite reading the other messages in this thread, I didn't twig to the fact that I had added the subfolder containing the DAB to my top-level project's .gitignore, since I was just playing around and didn't want to comm...

8 More Replies
aa_204
by New Contributor II
  • 3996 Views
  • 4 replies
  • 0 kudos

Reading excel file using pandas on spark api not rendering #N/A values correctly

I am trying to read an .xlsx file using ps.read_excel(), and it has #N/A as a value in string-type columns. But in the dataframe I am getting null in place of #N/A. Is there any option with which we can read #N/A as a string from the .xlsx file?

Latest Reply
Soumik
New Contributor II
  • 0 kudos

Did you get a solution or workaround for this error? I am also facing the same issue even after using dtype=str, na_filter=False, keep_default_na=False.
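
For reference, this is roughly what that call looks like with the options mentioned above; whether they survive the pandas-on-Spark wrapper is exactly what this thread is trying to pin down (the file path is a placeholder):

```
import pyspark.pandas as ps

# Placeholder path; the options below are the usual pandas knobs for keeping
# "#N/A" as a literal string instead of converting it to null/NaN.
df = ps.read_excel(
    "/Volumes/my_catalog/my_schema/files/report.xlsx",
    dtype=str,              # read every column as string
    keep_default_na=False,  # don't treat "#N/A" (and similar tokens) as NA
    na_filter=False,        # skip NA detection entirely
)
print(df.head())
```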

3 More Replies
ah0896
by New Contributor III
  • 19325 Views
  • 18 replies
  • 10 kudos

Using init scripts on UC enabled shared access mode clusters

I know that UC-enabled shared access mode clusters do not allow init script usage, and I have tried multiple workarounds to use the required init script on the cluster (pyodbc-install.sh, in my case), including installing the pyodbc package as a workspa...

Latest Reply
praveenVP
New Contributor III
  • 10 kudos

Hello all, the workaround below worked for me:
1) pyodbc-install.sh is uploaded to a Volume
2) the shared cluster is able to navigate to the Volume to select the init script
3) the Databricks runtime is 15.4 LTS
4) the Allowlist has been updated to allo...

17 More Replies
Benni
by New Contributor III
  • 4579 Views
  • 8 replies
  • 0 kudos

UC, pyodbc, Shared Cluster, and :Can't open lib 'ODBC Driver 17 for SQL Server' : file not found

Hey Databricks! Trying to use the pyodbc init script in a Volume in UC on a shared compute cluster, but I receive the error: "[01000] [unixODBC][Driver Manager]Can't open lib 'ODBC Driver 17 for SQL Server' : file not found (0) (SQLDriverConnect)". I fo...

Latest Reply
praveenVP
New Contributor III
  • 0 kudos

Hello all, the workaround below worked for me:
1) pyodbc-install.sh is uploaded to a Volume
2) the shared cluster is able to navigate to the Volume to select the init script
3) the Databricks runtime is 15.4 LTS
4) the Allowlist has been updated to allo...

7 More Replies
poorni_sm
by New Contributor
  • 3420 Views
  • 2 replies
  • 0 kudos

Get failed records from Salesforce write target tool in AWS GLUE job

I am working with the AWS Glue service, where we are trying to migrate data from S3 to Salesforce using the Salesforce write target tool (via a Salesforce connection). The expected flow is that, once the process is done, Salesforce provides the jobId...

Latest Reply
jhonm_839
New Contributor III
  • 0 kudos

Thank you so much emillion. This helps me a lot. Keep it up!

1 More Reply
tobyevans
by New Contributor II
  • 8087 Views
  • 1 reply
  • 1 kudos

Ingesting complex/unstructured data

Hi there, my company is reasonably new to using Databricks, and we're running our first PoCs. Some of our data is structured or reasonably structured, so it drops into a bucket, we point a notebook at it, and all is well and Delta. The problem is ari...

Latest Reply
mark5
New Contributor II
  • 1 kudos

Hi Toby, managing diverse, unstructured data can be challenging. At Know2Ledge (ShareArchiver), we specialize in unstructured data management to streamline this process. To handle your scenario efficiently: 1️⃣ Pre-Process Before Ingestion – Use AI-power...

sachin_kanchan
by New Contributor III
  • 2413 Views
  • 2 replies
  • 0 kudos

Community Edition? More Like Community Abandonment - Thanks for NOTHING, Databricks!

To the Databricks Team (or whoever is pretending to care), let me get this straight. You offer a "Community Edition" to supposedly help people learn, right? Well, congratulations, you've created the most frustrating, useless signup process I've ever s...

Latest Reply
Advika_
Databricks Employee
  • 0 kudos

Hello @sachin_kanchan! I understand the frustration, and I appreciate you sharing your experience. The Community Edition is meant to provide a smooth experience, and this shouldn’t be happening. We usually ask users to drop an email to help@databrick...

1 More Reply
mstfkmlbsbdk
by New Contributor II
  • 3248 Views
  • 1 reply
  • 1 kudos

Resolved! Access ADLS with serverless. CONFIG_NOT_AVAILABLE error

I have my own Auto Loader repo, which is responsible for ingesting data from the landing layer (ADLS) and loading it into the raw layer in Databricks. In that repo I created a couple of workflows and run them on a serverless cluster, and I u...

Data Engineering
ADLS
autoloader
dbt
NCC
serverless cluster
Latest Reply
cgrant
Databricks Employee
  • 1 kudos

The recommended approach for accessing cloud storage is to create Databricks storage credentials. These storage credentials can refer to Entra service principals, managed identities, etc. After a credential is created, create an external location. Wh...
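
A minimal sketch of that setup, run as SQL from a notebook (the credential, location, container, storage account, and principal names are all placeholders; the storage credential itself is created beforehand in Catalog Explorer or via the UC APIs):

```
# Assumes a Unity Catalog storage credential named `landing_mi_cred`
# already exists; names and URLs below are placeholders.
spark.sql("""
    CREATE EXTERNAL LOCATION IF NOT EXISTS landing_adls
    URL 'abfss://landing@mystorageaccount.dfs.core.windows.net/'
    WITH (STORAGE CREDENTIAL landing_mi_cred)
""")

# Grant the principal that runs the serverless workflow access to the files.
spark.sql("""
    GRANT READ FILES, WRITE FILES
    ON EXTERNAL LOCATION landing_adls
    TO `my_service_principal`
""")
```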

Sjoshi
by New Contributor
  • 1385 Views
  • 2 replies
  • 1 kudos

How to make the write operation faster for writing a spark dataframe to a delta table

So, I am doing 4 spatial join operations on files with the following sizes: the Base_road_file is 1 gigabyte; the Telematics file is 1.2 GB; the state boundary file, BH road file, client_geofence file, and kpmg_geofence_file are not too large. My...

Latest Reply
cgrant
Databricks Employee
  • 1 kudos

We recommend using spatial frameworks such as Databricks Mosaic or Apache Sedona to speed up operations like spatial joins and point-in-polygon lookups. Without these frameworks, many of these operations result in unoptimized, explosive cross joins.
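
As a rough illustration of the Sedona route (assuming the Apache Sedona libraries are installed on the cluster; the table, column, and geometry names below are placeholders, not the poster's actual schema):

```
from sedona.spark import SedonaContext

# Registers Sedona's ST_* SQL functions on the existing SparkSession.
sedona = SedonaContext.create(spark)

spark.table("telematics").createOrReplaceTempView("telematics_v")
spark.table("client_geofence").createOrReplaceTempView("geofence_v")

# Point-in-polygon join; Sedona can index/partition this instead of
# falling back to an explosive cross join.
joined = sedona.sql("""
    SELECT t.*, g.geofence_id
    FROM telematics_v AS t
    JOIN geofence_v AS g
      ON ST_Contains(ST_GeomFromWKT(g.wkt_polygon), ST_Point(t.lon, t.lat))
""")

joined.write.format("delta").mode("append").saveAsTable("telematics_geofenced")
```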

1 More Reply
