Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Areqio
by New Contributor
  • 227 Views
  • 1 reply
  • 1 kudos

How to stream Azure Event Hub to a Databricks Delta table

I am trying to stream my IoT data from Azure Event Hub to Databricks. I'm running Databricks Runtime 17.3 LTS with Scala 2.13.

Latest Reply
balajij8
Contributor
  • 1 kudos

Hi @Areqio You can use Lakeflow Declarative Pipelines to stream Azure Event Hub IoT data into Databricks delta tables. Lakeflow Spark Declarative Pipelines extends functionality in Spark Structured Streaming and allows you to write just a few lines o...
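The ingestion described above can also be reached with plain Structured Streaming through Event Hubs' Kafka-compatible endpoint. The helper below is a sketch only; the namespace, hub name, secret scope, and table names are hypothetical placeholders:

```python
# Sketch of reading Azure Event Hubs from Spark via its Kafka-compatible
# endpoint. The namespace, hub name, and secret lookup are placeholders.

def eventhub_kafka_options(namespace: str, connection_string: str) -> dict:
    """Kafka source options for an Event Hubs namespace (port 9093, SASL/SSL)."""
    return {
        "kafka.bootstrap.servers": f"{namespace}.servicebus.windows.net:9093",
        "kafka.sasl.mechanism": "PLAIN",
        "kafka.security.protocol": "SASL_SSL",
        "kafka.sasl.jaas.config": (
            "kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule "
            f'required username="$ConnectionString" password="{connection_string}";'
        ),
    }

# On Databricks (where `spark` and `dbutils` exist), the stream would look
# roughly like this (paths and names are illustrative):
#
# conn = dbutils.secrets.get("iot", "eventhub-connection-string")
# (spark.readStream.format("kafka")
#      .options(**eventhub_kafka_options("my-namespace", conn))
#      .option("subscribe", "my-iot-hub")
#      .load()
#      .writeStream
#      .option("checkpointLocation", "/Volumes/main/iot/_checkpoints/bronze")
#      .toTable("main.iot.bronze_events"))
```

The helper keeps the SASL boilerplate in one place; swapping in Lakeflow Declarative Pipelines as the reply suggests mainly changes the write side, not these source options.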

RIDBX
by Contributor
  • 624 Views
  • 5 replies
  • 0 kudos

Reading a JSON file into relational columns?

Thanks for reviewing my threads. I would like to explore reading a JSON file into relational columns within Databricks. I have an input file at workspace path > path\receipt.js...

Latest Reply
SteveOstrowski
Databricks Employee
  • 0 kudos

Hi @RIDBX, Thanks for the thorough description. Flattening JSON into relational columns is one of the most common data engineering tasks in Databricks, and there are several powerful approaches depending on your JSON structure. Let me walk you throug...

4 More Replies
cdn_yyz_yul
by Contributor II
  • 686 Views
  • 8 replies
  • 3 kudos

Resolved! Schema evolution with Structured Streaming: upstream schema change causes downstream writer failures.

Hello, Bronze: uses classic or job compute, Auto Loader with .option("mergeSchema", "true"). Schema evolution works correctly; data goes to bronze.my_bronze_table. Silver: uses serverless compute, the reader reads bronze.my_bronze_table, does all necessary t...

Latest Reply
Ashwin_DSA
Databricks Employee
  • 3 kudos

Hi @cdn_yyz_yul, Because the silver stream runs on serverless, you can’t relax state-store schema checks or set custom Spark configs. When the upstream bronze table schema evolves in a way that changes the schema of any stateful operator, the streami...

7 More Replies
priyak
by New Contributor III
  • 9059 Views
  • 8 replies
  • 3 kudos

Resolved! Multiple versions of custom libraries on the cluster

Using the install_libraries API, I installed a custom Python whl file on a running cluster. For certain types of requests, we have a requirement to install a different version of the same custom whl file in the running cluster. My problem is that uni...

Latest Reply
bkapers
New Contributor II
  • 3 kudos

Why does the Databricks forum not filter out these spambot AI replies? If one wants an AI chatbot to guess an answer to their question, they are free to directly ask ChatGPT, etc. A community forum is for human-written answers based on actual first-hand ...

7 More Replies
Phani1
by Databricks MVP
  • 351 Views
  • 2 replies
  • 2 kudos

Resolved! Seeking Best Approach for Bulk Migration of LUA/Exasol Scripts to Databricks PySpark

Hi All, We are planning a bulk migration of Lua/Exasol scripts to Databricks-native PySpark and are evaluating the best approach for large-scale automated code conversion and testing in Databricks. So far, we have analyzed the following option...

Latest Reply
Ashwin_DSA
Databricks Employee
  • 2 kudos

Hi @Phani1, After some research, I don't believe there’s a Databricks-native, one-click tool to bulk-convert Lua/Exasol to PySpark. Databricks AI Assistant is great for interactive refactoring, but as you said, it’s not really a bulk‑migration engine...

1 More Replies
Jake3
by New Contributor III
  • 619 Views
  • 4 replies
  • 5 kudos

Resolved! Optimizing my Databricks code

I have the following code in Databricks under serverless and I want to know how to improve it to make it more efficient and run faster without having the results change (standard errors change slightly when I try to improve it): # Databricks Serverle...

Latest Reply
SteveOstrowski
Databricks Employee
  • 5 kudos

Hi @Jake3, Your Taylor-linearisation row-percent estimator is well structured. The main performance bottleneck is the Python-level loop over every domain/measure combination, with a full DataFrame copy (df.copy()) happening inside each iteration. Her...
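The refactor suggested above, replacing the per-(domain, measure) loop with a single grouped aggregation, can be sketched in pandas. The column names are placeholders, and a weighted mean stands in for the real Taylor-linearised estimator:

```python
# Sketch: replace a Python loop that calls df.copy() for every
# domain/measure combination with one vectorised groupby over all
# combinations at once. Columns are hypothetical survey-style data.
import pandas as pd

df = pd.DataFrame({
    "domain":  ["a", "a", "a", "b"],
    "measure": ["m1", "m1", "m2", "m1"],
    "weight":  [1.0, 3.0, 2.0, 1.0],
    "value":   [10.0, 20.0, 30.0, 40.0],
})

# Slow pattern (per the reply): for each combination, copy + filter + compute.
# Fast pattern: precompute weighted values once, then aggregate every group
# in a single pass.
df["wv"] = df["weight"] * df["value"]
agg = (df.groupby(["domain", "measure"], as_index=False)
         .agg(wv_sum=("wv", "sum"), w_sum=("weight", "sum")))
agg["weighted_mean"] = agg["wv_sum"] / agg["w_sum"]
```

The same shape carries over to more involved statistics: express each per-group quantity as sums of row-level terms, aggregate once, then combine the sums.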

3 More Replies
pland_yasp
by New Contributor
  • 191 Views
  • 1 reply
  • 0 kudos

No access to Unity Catalog due to NullPointerException in AWS GPU compute

In our Databricks workspace via the AWS Marketplace, serverless works, but when I run my notebooks from dedicated compute they can't reach Unity Catalog and therefore my Spark tables. Nothing was configured manually by us; everything is configured by ...

Latest Reply
emma_s
Databricks Employee
  • 0 kudos

Hi, this sounds like a network connectivity issue between your AWS VPC and the Databricks control plane. Your best course of action is to file a Databricks support ticket, as they will be able to check the logs to our control plane and see what is ...

sk1996
by New Contributor
  • 204 Views
  • 1 reply
  • 0 kudos

I am getting an error on Delta write - com.databricks.s3commit.S3CommitFailedException: Access Denied

I am facing a Delta write issue on one of my non-prod buckets. I am using the correct role and the bucket has all the required access. When data is being written as Parquet, it works fine. But when we change the format to Delta, it fails with an Access Denied error an...

Data Engineering
delta
fileformat
Latest Reply
emma_s
Databricks Employee
  • 0 kudos

Hi, I think this is happening because the write to the Delta log requires a different AWS permission from the PutObject permission that lets you write the Parquet files. My suspicion is that some kind of permission update happened in the background. The best ...

PrasadGaikwad
by New Contributor
  • 11908 Views
  • 1 reply
  • 0 kudos

TypeError: Column is not iterable when using more than one column in withColumn()

I am trying to find the quarter start date from a date column. I get the expected result when I write it using selectExpr(), but when I add the same logic in .withColumn() I get TypeError: Column is not iterable. I am using a workaround as follows: workarou...

Latest Reply
emma_s
Databricks Employee
  • 0 kudos

Hi, this is a super old question, but answering in case anyone else comes across it. This isn't working because add_months() expects an integer rather than a column; you can get around this by using expr() inside withColumn(): from pyspark.sql.func...

cristianc
by Contributor
  • 1898 Views
  • 3 replies
  • 3 kudos

Resolved! Does Databricks support AWS S3 Express One Zone?

Greetings, I'm writing this message since I learned that AWS has a storage class that is faster than S3 Standard called "S3 Express One Zone" (https://aws.amazon.com/s3/storage-classes/express-one-zone/). AWS offers support for this storage class with ...

Latest Reply
marioquark
New Contributor II
  • 3 kudos

I am also watching this feature.

2 More Replies
Nmtc9to5
by New Contributor II
  • 330 Views
  • 1 reply
  • 2 kudos

Resolved! Behavior of the Databricks Asset Bundle using Github Actions

Hi everyone, I am new to the Databricks Asset Bundles world, so I need to understand how the .databricks directory works. 1. I know that it is created when the databricks bundle deploy command is executed, and that it is a place where metadata and the cu...

Data Engineering
DAB
DABs
Databricks Asset Bundles
GitHub Actions
Latest Reply
Pat
Esteemed Contributor
  • 2 kudos

You’re right that everything is ephemeral on the GitHub runner, but that does not mean “full redeploy from scratch” every time in the workspace. The .databricks directory is local state + cache, and the real, durable state lives in the Databricks wor...

QueryingQuail
by New Contributor III
  • 609 Views
  • 4 replies
  • 3 kudos

Resolved! DLT pipeline cannot read from a Unity Catalog foreign catalog

We are having some difficulties working with OneLake connections. What we have done: set up a Databricks connection to OneLake; created a foreign catalog. We try to read using: import dlt @dlt.table def fabric_test(): return spark.read.table("fabric.db...

Latest Reply
QueryingQuail
New Contributor III
  • 3 kudos

Thank you all for the reply, and please excuse the delay on my part - I've been away. I've read all three replies, and this clarified all my questions and widened my understanding of foreign catalog handling within Unity.

3 More Replies
Dileep_Vidyadar
by New Contributor III
  • 10213 Views
  • 10 replies
  • 4 kudos

Not able to create a cluster on Community Edition for 3-4 days.

I have been learning PySpark on Community Edition for about a month. It was great until I started facing issues while creating a cluster over the past 3-4 days. Sometimes it takes 30 to 60 minutes to create a cluster, and sometimes it is not even creating a cl...

Latest Reply
Mmarin
Databricks Partner
  • 4 kudos

Hello, I am unable to create a cluster with my profile. I need to request access to create a new cluster for my PySpark jobs. Thank you.

9 More Replies
utkarshamone
by New Contributor III
  • 284 Views
  • 2 replies
  • 0 kudos

Getting runtime execution error when migrating a job to Unity

I am in the process of migrating our jobs from the legacy Hive metastore to Unity. I have modified my existing job to read and write from a different bucket as part of the migration. The only change I have made to my job config is to enable this sett...

Latest Reply
Ashwin_DSA
Databricks Employee
  • 0 kudos

Hi @utkarshamone, After looking into your error dump, it appears the driver is hanging while trying to initialise a DBFS mount backed by GCS... As an example, the highlighted paths below only appear when Databricks is resolving a DBFS mount (or root)...

1 More Replies