Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

rahult1407
by New Contributor II
  • 1422 Views
  • 3 replies
  • 2 kudos

Lakebridge code conversion from oracle to databricks sql

Hi Community, I’m working on migrating several Oracle views to Spark SQL using the Databricks Labs Lakebridge tool. I’m facing issues while converting the code for Oracle views and materialized views. Problems I’m encountering: The converted SQL...

Latest Reply
Raman_Unifeye
Honored Contributor III
  • 2 kudos

@Louis_Frolio - will there be similar guidelines for other 'source' code too, such as T-SQL and Teradata? Are there comprehensive Lakebridge docs covering each source?

2 More Replies
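For readers hitting similar conversion gaps, here is a minimal sketch of the kind of manual patch-up a converted Oracle view can need, assuming a hypothetical view that uses NVL and SYSDATE (table and column names are illustrative, not from the thread):

    # Hypothetical Oracle source, for reference:
    #   CREATE VIEW sales_v AS
    #   SELECT order_id, NVL(amount, 0) AS amount, SYSDATE AS loaded_at
    #   FROM orders;

    # Equivalent Databricks SQL issued from a notebook; Oracle-specific
    # functions are mapped to their Spark SQL counterparts.
    spark.sql("""
        CREATE OR REPLACE VIEW sales_v AS
        SELECT
            order_id,
            coalesce(amount, 0) AS amount,     -- NVL -> coalesce
            current_timestamp() AS loaded_at   -- SYSDATE -> current_timestamp()
        FROM orders
    """)

Oracle materialized views have no one-to-one equivalent; they usually become Databricks materialized views or Delta tables refreshed by a job, which tends to be where converters need the most hand-editing.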
murtadha_s
by Databricks Partner
  • 1540 Views
  • 1 reply
  • 2 kudos

Resolved! Moving files using DBUtils is so slow

I am using dbutils.fs.mv() on Databricks clusters and facing issues with move operation slowness. I move files in UC Volumes or ADLS storage via abfss links, which works but is so slow. I mean it takes hours to transfer files that used to take...

Latest Reply
Louis_Frolio
Databricks Employee
  • 2 kudos

Hello @murtadha_s , here are some helpful tips and hints to help you further diagnose the slowness. Totally expected behavior here: object-storage moves with dbutils.fs.mv will be much slower than HDFS. Under the hood, dbutils isn’t doing an atom...

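Since object-storage "moves" are a per-object copy plus delete rather than an atomic rename, one common mitigation is to issue the copies in parallel. A minimal sketch, assuming a flat directory of files under a UC Volume (both paths are placeholders):

    from concurrent.futures import ThreadPoolExecutor

    src = "/Volumes/main/default/raw"      # placeholder source directory
    dst = "/Volumes/main/default/staged"   # placeholder destination directory

    # Each mv on object storage is a copy followed by a delete; running
    # them in parallel hides the per-object request latency.
    def move_one(info):
        dbutils.fs.cp(info.path, f"{dst}/{info.name}")
        dbutils.fs.rm(info.path)

    with ThreadPoolExecutor(max_workers=16) as pool:
        list(pool.map(move_one, dbutils.fs.ls(src)))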
Charansai
by New Contributor III
  • 583 Views
  • 1 reply
  • 0 kudos

Serverless Compute – ADLS Gen2 Authorization Failure with RBAC

We are facing an authorization issue when using serverless compute with ADLS Gen2 storage. Queries fail with: AbfsRestOperationException: Operation failed: "This request is not authorized to perform this operation.", 403 AuthorizationFailureDetai...

Latest Reply
Hubert-Dudek
Databricks MVP
  • 0 kudos

Use a private link from serverless, as you are probably not allowing public internet access. See "Configure private connectivity to Azure resources - Azure Databricks | Microsoft Learn". You need to add both dfs and blob.

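A rough sketch of the "add both dfs and blob" step using the databricks-sdk network-connectivity API (method names as I recall them; argument types vary by SDK version, and every identifier below is a placeholder):

    from databricks.sdk import AccountClient

    a = AccountClient()  # assumes account-admin auth is configured

    # Create an NCC in the workspace's region, then add private endpoint
    # rules for BOTH the dfs and blob sub-resources of the storage account.
    ncc = a.network_connectivity.create_network_connectivity_configuration(
        name="ncc-serverless",   # placeholder
        region="westeurope",     # placeholder
    )

    storage_id = ("/subscriptions/<sub>/resourceGroups/<rg>"
                  "/providers/Microsoft.Storage/storageAccounts/<account>")

    for group_id in ("dfs", "blob"):
        a.network_connectivity.create_private_endpoint_rule(
            network_connectivity_config_id=ncc.network_connectivity_config_id,
            resource_id=storage_id,
            group_id=group_id,
        )

The resulting private endpoint connections still need approving on the storage account, and the NCC must be attached to the workspace.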
Fz1
by New Contributor III
  • 14589 Views
  • 7 replies
  • 3 kudos

Resolved! SQL Warehouse Serverless - Not able to access the external tables in the hive_metastore

I have DLT tables created under the hive_metastore with external data stored in ADLS Gen2. The ADLS blob storage is mounted into /mnt/<storage-account>. The tables are successfully created and accessible from my notebooks, as well as the ADLS storage. I have c...

Latest Reply
Charansai
New Contributor III
  • 3 kudos

We can use Terraform to create an NCC (Network Connectivity Configuration). It will create a private endpoint on the storage account, which you then approve manually because it is not auto-approved.

6 More Replies
adhi_databricks
by Contributor
  • 1706 Views
  • 2 replies
  • 1 kudos

Resolved! Multiple Databricks Issues: Spark Context Limit, Concurrency Load, API Character Limit & Job Timeout

I am encountering multiple issues in our Databricks environment and would appreciate guidance or best-practice recommendations for each. Details below: 1. [MaxSparkContextsExceeded] Too many execution contexts are open right now (Limit 150). Error: [Max...

Latest Reply
siva-anantha
Databricks Partner
  • 1 kudos

I would like to add my experience with 3, the Databricks API 10k character limit. We had a similar issue, and this limit cannot be changed. Instead, review concepts of sharing the input/output between Databricks and the caller using cloud storage like ADLS. Pro...

1 More Replies
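A minimal sketch of the storage-handoff pattern the reply describes, passing a UC Volume path instead of a large inline payload (all paths and parameter names are hypothetical):

    import json

    # Caller side: persist the large input to a Volume instead of inlining
    # it in the run-now call (job parameters are capped at 10k characters).
    payload = {"orders": list(range(100_000))}                  # stand-in for a big payload
    input_path = "/Volumes/main/default/job_io/input_123.json"  # placeholder
    with open(input_path, "w") as f:
        json.dump(payload, f)
    # ...then trigger the job with only {"input_path": input_path}.

    # Job side: receive the small path parameter and load the data itself.
    path = dbutils.widgets.get("input_path")
    with open(path) as f:
        data = json.load(f)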
DatabricksUser5
by New Contributor II
  • 1175 Views
  • 4 replies
  • 1 kudos

Reset committed offset of spark streaming to capture missed data

I have a very straightforward setup between Azure Event Hubs and DLT using the Kafka endpoint through Spark streaming. There were network issues and the stream didn't pick up some events, but still progressed (and committed) the offset for some reason. As ...

Data Engineering
Tags: dlt, spark, eventhub, kafka, azure
Latest Reply
DatabricksUser5
New Contributor II
  • 1 kudos

Thank you K_Anudeep! The REST API is exactly what I was looking for.

3 More Replies
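For others landing here: the REST call in question is presumably the pipeline-update endpoint, since a full refresh of just the affected table discards its checkpointed offsets and re-reads from the configured starting position. A sketch, with every identifier a placeholder:

    import requests

    host = "https://<workspace-url>"                    # placeholder
    token = dbutils.secrets.get("my-scope", "my-pat")   # hypothetical secret
    pipeline_id = "<pipeline-id>"                       # placeholder

    # Start a pipeline update that fully refreshes only one table.
    resp = requests.post(
        f"{host}/api/2.0/pipelines/{pipeline_id}/updates",
        headers={"Authorization": f"Bearer {token}"},
        json={"full_refresh_selection": ["events_table"]},  # hypothetical table
    )
    resp.raise_for_status()
    print(resp.json()["update_id"])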
murtadha_s
by Databricks Partner
  • 383 Views
  • 1 reply
  • 0 kudos

Authentication Temporarily Unavailable

This has happened a lot in the previous weeks, although both Azure and Databricks showed no issues at the time the error was received by both the Databricks Python SDK and the Java SDK. I have now started creating a retry mechanism to retry those errors selectively...

Latest Reply
siva-anantha
Databricks Partner
  • 0 kudos

My PoV: we use the Databricks REST API, and we have faced 401 and Azure Front Door related auth issues. Like you said, we use a retry mechanism. Runtime errors are recorded and retry attempts are made if the tasks are idempotent; otherwise user inter...

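A minimal sketch of the selective-retry idea from both posts, wrapping only idempotent SDK calls in exponential backoff (the error filter is a simplification; tune it to the auth errors you actually see):

    import time
    from databricks.sdk import WorkspaceClient
    from databricks.sdk.errors import DatabricksError

    def with_retries(fn, attempts=5, base_delay=1.0):
        # Only use this for idempotent calls, as the reply advises.
        for i in range(attempts):
            try:
                return fn()
            except DatabricksError:
                if i == attempts - 1:
                    raise
                time.sleep(base_delay * 2 ** i)  # exponential backoff

    w = WorkspaceClient()
    me = with_retries(lambda: w.current_user.me())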
rakshakpr11
by Databricks Partner
  • 528 Views
  • 3 replies
  • 2 kudos

Compression Export to volume is not working as expected

I am trying to write data into a volume using the below:

table.coalesce(1)
    .write
    .mode("overwrite")
    .format(file_format)
    .option("header", "true")
    .option("delimiter", field_delimiter)
    .option("compre...

Latest Reply
iyashk-DB
Databricks Employee
  • 2 kudos

It sounds like Spark is splitting your output into many small files (one per row) despite coalesce(1). Can you try setting spark.sql.files.maxRecordsPerFile? This limits how many records can be written into a single output file; if this is set to 1 ...

2 More Replies
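Putting the reply's suggestion together with the original snippet, a sketch of the fix (the table, format, delimiter, and path are assumptions):

    # If spark.sql.files.maxRecordsPerFile was set (e.g. to 1), Spark splits
    # the output into one file per record even after coalesce(1).
    # 0 means "no per-file record cap", so coalesce(1) yields a single file.
    spark.conf.set("spark.sql.files.maxRecordsPerFile", 0)

    table = spark.table("main.default.some_table")  # placeholder source

    (table.coalesce(1)
        .write
        .mode("overwrite")
        .format("csv")                  # assumed from the header/delimiter options
        .option("header", "true")
        .option("delimiter", "|")       # placeholder delimiter
        .option("compression", "gzip")  # placeholder codec
        .save("/Volumes/main/default/export/out"))  # placeholder volume path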
vr
by Valued Contributor
  • 2874 Views
  • 17 replies
  • 5 kudos

Resolved! remote_query() is not working

I am trying to experiment with the remote_query() function according to the documentation. The feature is in public preview, so I assume it should be available to everyone now.

select * from remote_query(
    'my_connection',
    database => 'mydb',
    dbtable...

Latest Reply
GA4
New Contributor II
  • 5 kudos

Hi @Coffee77, are you giving the foreign catalog details in the remote_query function? coffee77.sampleDB

16 More Replies
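For reference, the two shapes of the call per the documentation the poster references (connection, database, and table names are the poster's placeholders):

    # Push down a whole table reference...
    df = spark.sql("""
        SELECT * FROM remote_query(
            'my_connection',
            database => 'mydb',
            dbtable  => 'myschema.mytable'
        )
    """)

    # ...or an arbitrary query for the remote engine to execute.
    df2 = spark.sql("""
        SELECT * FROM remote_query(
            'my_connection',
            database => 'mydb',
            query    => 'SELECT id FROM myschema.mytable WHERE id > 100'
        )
    """)

Note the preview may not be enabled in every workspace yet, which is what most of this thread turns on.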
Mathew-Vesely
by New Contributor
  • 700 Views
  • 4 replies
  • 0 kudos

Archive of legacy system into Databricks with structure and semi-structured data

We are currently exploring using Databricks to store and archive data from a legacy system. The governance features of Unity Catalog will give us the required capabilities to ensure we meet our legal, statutory, and policy requirements for data rete...

Latest Reply
Raman_Unifeye
Honored Contributor III
  • 0 kudos

Classic Customer 360 view case, and Databricks is certainly the right platform to do so.
Structured data: stored in Delta tables.
Emails and PDFs: stored in Volumes; however, the metadata, as a path into the volume, is stored in a Delta table against the customer ID.
In...

3 More Replies
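A minimal sketch of the layout the reply describes: structured data in Delta, documents in a Volume, and the document path indexed per customer (catalog, schema, and paths are hypothetical):

    # Structured records live in a Delta table.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS archive.core.customer_records (
            customer_id BIGINT,
            record_type STRING,
            payload     STRING,
            archived_at TIMESTAMP
        )
    """)

    # Emails/PDFs live in a UC Volume; only their paths are indexed,
    # keyed by customer_id, so governance applies to both sides.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS archive.core.customer_documents (
            customer_id BIGINT,
            doc_type    STRING,
            volume_path STRING,
            archived_at TIMESTAMP
        )
    """)
    spark.sql("""
        INSERT INTO archive.core.customer_documents VALUES
        (42, 'pdf', '/Volumes/archive/core/docs/42/statement.pdf', current_timestamp())
    """)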
jitendrajha11
by New Contributor II
  • 1363 Views
  • 5 replies
  • 2 kudos

Want to see logs for lineage view run events

Hi All, I need your help. My jobs run successfully; when I click on a job there is a lineage > View run events option, and when I click on it I see the steps below. Job Started: The job is triggered. Waiting for Cluster: The job wait...

Latest Reply
Commitchell
Databricks Employee
  • 2 kudos

Hi there, I vibe-coded* a query where I was able to derive most of your events from the system tables: system.lakeflow.jobs, system.lakeflow.job_run_timeline, system.lakeflow.job_task_run_timeline. If you have SELECT access to system tables, this could b...

4 More Replies
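A rough reconstruction of the kind of query the reply describes, assuming SELECT access to the system.lakeflow tables (column names per the system-tables docs; verify in your workspace, and note that system.lakeflow.jobs keeps one row per job version, so the join can duplicate rows):

    events = spark.sql("""
        SELECT
            j.name,
            r.job_id,
            r.run_id,
            r.period_start_time,
            r.period_end_time,
            r.result_state
        FROM system.lakeflow.job_run_timeline AS r
        JOIN system.lakeflow.jobs AS j
          ON j.job_id = r.job_id
        ORDER BY r.period_start_time DESC
        LIMIT 100
    """)
    display(events)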
ssommer-ai
by New Contributor
  • 378 Views
  • 1 reply
  • 1 kudos

Error when triggering a single job run with a Table update trigger

When I trigger a single job run while having a Table update trigger, I get this error message. It has issues with this parameter:

- name: updated_tables
  default: "{{job.trigger.table_update.updated_tables}}"

I want to have the option of having the ta...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 1 kudos

Hi @ssommer-ai ,When you use a table update trigger with a parameter like {{job.trigger.table_update.updated_tables}}, this dynamic parameter only gets populated when the job is triggered by an actual table update event. When you manually click "Run ...

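One way to apply the reply in a notebook is to treat the parameter as optional and fall back on manual runs; a sketch, assuming the trigger delivers table names as a comma-separated string (the default below is hypothetical):

    dbutils.widgets.text("updated_tables", "")
    raw = dbutils.widgets.get("updated_tables")

    # Populated only when a real table-update event fired the job; on a
    # manual run the template is not resolved, so fall back to a default.
    if raw and not raw.startswith("{{"):
        updated_tables = [t.strip() for t in raw.split(",") if t.strip()]
    else:
        updated_tables = ["catalog.schema.my_table"]  # hypothetical manual default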
SRJDB
by New Contributor II
  • 889 Views
  • 3 replies
  • 5 kudos

Resolved! How to restrict the values permitted in a job or task parameter?

Hi, apologies if this is a daft question - I'm relatively new to Databricks and still finding my feet! I have a notebook with a parameter set within it via a widget, like this:

dbutils.widgets.dropdown("My widget", "A", ["A", "B", "C"])
my_variable = d...

Latest Reply
iyashk-DB
Databricks Employee
  • 5 kudos

Job/task parameters are free-form strings (or JSON) that get pushed down into tasks; there’s no built‑in way in Jobs to constrain them to an enum list like A/B/C in the UI or API. You can override them at run time, but they’re not validated against t...

2 More Replies
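The usual fallback, per the reply, is to fail fast inside the notebook; a minimal sketch mirroring the widget from the question:

    ALLOWED = {"A", "B", "C"}  # mirrors the dropdown's permitted values

    my_variable = dbutils.widgets.get("My widget")
    if my_variable not in ALLOWED:
        # Job parameters can override the widget, so validate at run time.
        raise ValueError(
            f"Invalid value {my_variable!r} for 'My widget'; "
            f"expected one of {sorted(ALLOWED)}"
        )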
Swathik
by New Contributor III
  • 2948 Views
  • 5 replies
  • 1 kudos

Resolved! Best practices for the meta data driven ETL framework

I am designing a metadata-driven ETL framework to migrate approximately 500 tables from Db2 to PostgreSQL. After reviewing multiple design patterns and blog posts, I am uncertain about the recommended approach for storing ETL metadata such as source s...

Latest Reply
nayan_wylde
Esteemed Contributor II
  • 1 kudos

For a migration of that scale, I’d lean toward storing metadata in database tables rather than YAML files. It’s easier to query, update, and integrate with orchestration tools, especially when you have 500 tables. YAML works fine for small projects, ...

4 More Replies
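A minimal sketch of the table-backed approach the reply recommends: a control table drives a per-table migration loop (every name, URL, and option below is a placeholder, and both connections assume the relevant JDBC drivers are installed):

    # Control table: one row per table to migrate.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS etl_meta.control.table_config (
            source_schema STRING,
            source_table  STRING,
            target_table  STRING,
            load_type     STRING,    -- 'full' or 'incremental'
            enabled       BOOLEAN
        )
    """)

    for c in spark.table("etl_meta.control.table_config").where("enabled").collect():
        df = (spark.read.format("jdbc")
                .option("url", "jdbc:db2://<host>:50000/<db>")        # Db2 source
                .option("dbtable", f"{c.source_schema}.{c.source_table}")
                .load())
        (df.write.format("jdbc")
            .option("url", "jdbc:postgresql://<host>:5432/<db>")      # Postgres target
            .option("dbtable", c.target_table)
            .mode("overwrite")
            .save())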
Brahmareddy
by Esteemed Contributor
  • 2567 Views
  • 4 replies
  • 9 kudos

Future of Movie Discovery: How I Built an AI Movie Recommendation Agent on Databricks Free Edition

As a data engineer deeply passionate about how data and AI can come together to create real-world impact, I’m excited to share my project for the Databricks Free Edition Hackathon 2025 — Future of Movie Discovery (FMD). Built entirely on Databricks F...

Latest Reply
AlbertaBode
New Contributor III
  • 9 kudos

Really cool project! The mood-based movie matching and conversational memory make the whole discovery experience feel way more intuitive. It’s interesting because most people still browse platforms manually, like on a streaming app, but your system s...

3 More Replies