Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Forum Posts

holychs
by New Contributor III
  • 39 Views
  • 1 reply
  • 0 kudos

Run failed with error message Cluster was terminated. Reason: JOB_FINISHED (SUCCESS)

I am running a notebook through a workflow on an all-purpose cluster ("data_security_mode": "USER_ISOLATION"). I am seeing some strange behaviour with the cluster during the run. While the job is still running, the cluster gets terminated with the Reason: Re...

Data Engineering
clusterds
clusters
jobs
Workflows
Latest Reply
Raman_Unifeye
Contributor III

@holychs - Well, this behaviour needs troubleshooting, I imagine. - What is the auto-termination value? Try increasing it to a much higher value and observe whether the behaviour is the same. - Does your workflow have multiple notebook tasks? If Task A finishes while Tas...

Punit_Prajapati
by New Contributor III
  • 183 Views
  • 2 replies
  • 1 kudos

Long-lived authentication for Databricks Apps / FastAPI when using Service Principal (IoT use case)

Hi Community, I'm working with Databricks Apps (FastAPI) and invoking the API from external IoT devices. Currently, the recommended approach is to authenticate using a Bearer token generated via a Databricks Apps Service Principal (Client ID + Client S...

Latest Reply
Punit_Prajapati
New Contributor III

Hi Databricks Team, thanks for the response. I reviewed the Unified Authentication documentation. From what I understand, the supported authentication methods are PAT, M2M (Service Principal OAuth), and U2M. For my use case, external IoT devices are cal...

1 More Replies
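Since the thread above settles on the M2M (Service Principal OAuth) flow, here is a minimal, hypothetical sketch of how a device could fetch a short-lived access token from the workspace's OIDC token endpoint using the client-credentials grant. The workspace URL, client ID, and secret are placeholders; a real device would refresh the token before expiry rather than rely on a long-lived credential.

```python
# Hypothetical sketch: M2M OAuth (client-credentials) token fetch for a
# Databricks service principal. Only standard-library modules are used.
import base64
import json
import urllib.parse
import urllib.request

def build_token_request(workspace_url: str, client_id: str, client_secret: str):
    """Build URL, body, and headers for the workspace OIDC token endpoint."""
    url = workspace_url.rstrip("/") + "/oidc/v1/token"
    body = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "scope": "all-apis",
    }).encode()
    auth = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    headers = {
        "Authorization": f"Basic {auth}",
        "Content-Type": "application/x-www-form-urlencoded",
    }
    return url, body, headers

def fetch_token(workspace_url: str, client_id: str, client_secret: str) -> str:
    # Performs the network call; shown for completeness, not executed here.
    url, body, headers = build_token_request(workspace_url, client_id, client_secret)
    req = urllib.request.Request(url, data=body, headers=headers)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["access_token"]
```

The returned token is then sent as a `Bearer` header when calling the FastAPI app.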
maddan80
by New Contributor II
  • 110 Views
  • 1 reply
  • 1 kudos

Serverless giving inconsistent results in Oracle UCM SOAP call

Hello, we have implemented a data pipeline to ingest data from Oracle UCM using the SOAP API. This was working fine with job and all-purpose clusters. Recently we wanted to use serverless to take advantage of the faster startup time. In this case we were n...

Latest Reply
mmayorga
Databricks Employee

Hi @maddan80, thank you for reaching out with your question and providing the context about your use case. Per your comments, having a 200 status code in serverless is a good initial indicator that the request is reaching the Oracle UCM server. Brain...

demo-user
by New Contributor II
  • 51 Views
  • 1 reply
  • 0 kudos

Connecting an S3-compatible endpoint (such as MinIO) to Unity Catalog

Hi everyone, is it possible to connect an S3-compatible storage endpoint that is not AWS S3 (for example MinIO) to Databricks Unity Catalog? I already have access using Spark configurations (s3a endpoint, access key, secret key, etc.), and I can read/...

Latest Reply
MoJaMa
Databricks Employee

Unfortunately, it's not supported to register those as Storage Credentials. But the ask seems to be coming up more frequently, and I believe Product is in the "Discovery" phase of supporting it in UC. Here are some standard questions that might help with coll...

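Pending UC support, the Spark-conf route the question mentions remains the workaround. A hedged sketch of the Hadoop S3A settings typically used for an S3-compatible store such as MinIO (the endpoint and keys are placeholders; exact values depend on your deployment):

```python
# Sketch of Hadoop S3A configuration for a non-AWS, S3-compatible object store
# (e.g. MinIO). These are standard fs.s3a.* properties, applied to the session
# via the spark.hadoop.* prefix.
def minio_s3a_conf(endpoint: str, access_key: str, secret_key: str) -> dict:
    return {
        "fs.s3a.endpoint": endpoint,
        "fs.s3a.access.key": access_key,
        "fs.s3a.secret.key": secret_key,
        # MinIO is usually addressed by path, not virtual-host-style buckets
        "fs.s3a.path.style.access": "true",
        "fs.s3a.connection.ssl.enabled": "true",
    }

def apply_conf(spark, conf: dict) -> None:
    # spark is an existing SparkSession; not constructed here
    for key, value in conf.items():
        spark.conf.set(f"spark.hadoop.{key}", value)
```

After applying the conf, paths like `s3a://bucket/prefix` are readable with `spark.read`, but the data remains outside Unity Catalog governance.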
cdn_yyz_yul
by Contributor
  • 55 Views
  • 3 replies
  • 0 kudos

unionbyname several streaming dataframes of different sources

Is the following type of union safe with Spark Structured Streaming? Union multiple streaming dataframes, each from a different source. Any better solution? For example, df1 = spark.readStream.table(f"{bronze_catalog}.{bronze_schema}.table1") ...

Latest Reply
cdn_yyz_yul
Contributor

Thanks @stbjelcevic, I am looking for a solution .... === Let's say I already have: df1 = spark.readStream.table(f"{bronze_catalog}.{bronze_schema}.table1") df2 = spark.readStream.table(f"{bronze_catalog}.{bronze_schema}.table2") df1a = df1.se...

2 More Replies
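For readers landing on this thread: the usual PySpark call for this pattern is `df1.unionByName(df2, allowMissingColumns=True)`, which aligns columns by name and fills columns absent from one side with null. Below is a pure-Python model of that schema alignment (illustrative only; it is not Spark code and the dict-based rows are stand-ins for DataFrame rows):

```python
# Pure-Python model of what unionByName(..., allowMissingColumns=True) does to
# the schema: align columns by name across both inputs, filling absent columns
# with None. In PySpark the streaming union itself would be:
#   unioned = df1a.unionByName(df2a, allowMissingColumns=True)
def union_by_name(rows_a, rows_b):
    cols = []
    for row in rows_a + rows_b:          # collect column names in first-seen order
        for c in row:
            if c not in cols:
                cols.append(c)
    return [{c: row.get(c) for c in cols} for row in rows_a + rows_b]
```

Each source in a streaming union keeps its own checkpointed progress, so mixing sources is supported; whether it is the right design still depends on the downstream sink semantics.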
dnchankov
by New Contributor II
  • 9860 Views
  • 5 replies
  • 9 kudos

Resolved! Why can't a notebook I created in a Repo be opened safely?

I've cloned a Repo during "Get Started with Data Engineering on Databricks". Then I'm trying to run another notebook from a cell with a magic %run command, but I get an error that the file can't be opened safely. Here is my code: notebook_a name = "John" print(f"Hello ...

Latest Reply
iyashk-DB
Databricks Employee

+1 to all the above comments. Having the %run command along with other commands will confuse the REPL execution. So having the %run notebook_b3 command alone in a new cell, maybe as the first cell in notebook_a, will resolve the issue, and your...

4 More Replies
fintech_latency
by New Contributor
  • 187 Views
  • 9 replies
  • 2 kudos

How to guarantee “always-warm” serverless compute for low-latency Jobs workloads?

We’re building a low-latency processing pipeline on Databricks and are running into serverless cold-start constraints. We ingest events (calls) continuously via a Spark Structured Streaming listener. For each event, we trigger a serverless compute tha...

Latest Reply
iyashk-DB
Databricks Employee

@fintech_latency  For streaming: refactor to one long‑running Structured Streaming job with a short trigger interval (for example, 1s) and move “assignment” logic into foreachBatch or a transactional task table updated within the micro‑batch. For per...

8 More Replies
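A hypothetical sketch of the shape the reply suggests: one long-running stream with a short trigger, with the per-event "assignment" done inside foreachBatch instead of launching a new serverless job per event. All names below are illustrative; only the deterministic routing helper is concrete, the Spark wiring is shown for structure.

```python
# Hypothetical sketch: long-running Structured Streaming job with a ~1s trigger
# and assignment logic inside foreachBatch, per the reply above.
import hashlib

def assign_worker(event_id: str, n_workers: int) -> int:
    """Deterministically route an event to one of n always-warm workers."""
    digest = hashlib.sha256(event_id.encode()).hexdigest()
    return int(digest, 16) % n_workers

def process_batch(batch_df, batch_id: int) -> None:
    # Runs once per micro-batch; batch_df is a normal (non-streaming) DataFrame.
    for row in batch_df.collect():
        worker = assign_worker(row["event_id"], n_workers=8)
        # ... hand the event to the chosen warm worker / transactional task table ...

def start(events_stream):
    # events_stream = spark.readStream...; the short trigger keeps latency low
    return (events_stream.writeStream
            .trigger(processingTime="1 second")
            .foreachBatch(process_batch)
            .start())
```

Keeping one stream alive sidesteps serverless cold starts entirely, at the cost of paying for the always-on stream.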
Fox19
by New Contributor II
  • 95 Views
  • 4 replies
  • 0 kudos

DELTA_FEATURES_REQUIRE_MANUAL_ENABLEMENT DLT Streaming Table as Variant

I am attempting to ingest csv files from an S3 bucket with Autoloader. Since the schema of the data is inconsistent (each csv may have different headers), I was hoping to ingest the data as Variant following this: https://docs.databricks.com/aws/en/i...

Latest Reply
pradeep_singh
New Contributor II

Can you also share the exact code you are running to ingest?

3 More Replies
ajay_wavicle
by New Contributor
  • 100 Views
  • 3 replies
  • 0 kudos

How to copy files of UC managed tables from the Databricks-associated storage account along with the _delta_log folder

I want to migrate managed tables from one cloud Databricks workspace to another as-is, with Delta history. I am able to do this with external tables since I have access to the storage account container folder, but that's not the case for UC managed tables. How ...

Latest Reply
lucami
Contributor

Hi @szymon_dybczak, I suggest the following: create a storage credential; register an external location for the new storage location; create the catalog with a managed location; then migrate the table using DEEP CLONE. -- Azure example CREATE EXTER...

2 More Replies
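A hedged sketch of the DEEP CLONE step from the reply, driven from Python (table names are placeholders). One caveat worth noting: a cloned Delta table starts its own transaction log, so time travel into the source table's pre-clone versions is not carried over to the clone, even though all current data files and metadata are copied.

```python
# Sketch: migrating tables between catalogs/workspaces with DEEP CLONE.
# Table names are placeholders; spark is an existing SparkSession.
def deep_clone_sql(source_table: str, target_table: str) -> str:
    return f"CREATE OR REPLACE TABLE {target_table} DEEP CLONE {source_table}"

def migrate(spark, pairs):
    # pairs: iterable of (source, target) fully qualified table names
    for source, target in pairs:
        spark.sql(deep_clone_sql(source, target))
```

DEEP CLONE can also be re-run incrementally: a second run copies only files added to the source since the previous clone.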
jeremy98
by Honored Contributor
  • 9711 Views
  • 5 replies
  • 0 kudos

Resolved! Concurrent Writes to the same DELTA TABLE

Hi Community, my team and I have written some workflows that write to the same table. One of my workflows performs a MERGE operation on the table, while another performs an append. However, these operations can occur simultaneously, leading t...

Latest Reply
tariqueanwer
New Contributor II

I'm sorry, but wouldn't the "Serializable" isolation level make it worse?

4 More Replies
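Beyond isolation levels, the common mitigation for MERGE-vs-append conflicts is to make write conditions partition-disjoint where possible and to retry the failed writer with backoff when Delta raises a concurrent-write conflict. A hypothetical, dependency-free sketch of the retry half (the name check stands in for catching `delta.exceptions.ConcurrentAppendException` directly):

```python
# Hypothetical retry-with-backoff wrapper for Delta concurrent-write conflicts.
import random
import time

def is_concurrency_conflict(exc: Exception) -> bool:
    # Stand-in for isinstance(exc, delta.exceptions.ConcurrentAppendException)
    return "Concurrent" in type(exc).__name__ or "DELTA_CONCURRENT" in str(exc)

def with_retries(operation, max_attempts: int = 5, base_delay: float = 1.0):
    """Run operation(); on a concurrency conflict, back off and retry."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception as exc:
            if not is_concurrency_conflict(exc) or attempt == max_attempts - 1:
                raise
            # exponential backoff with jitter so both writers do not collide again
            time.sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.5))
```

Usage would be `with_retries(lambda: run_merge(spark))`, where `run_merge` is your existing MERGE logic.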
batch_bender
by New Contributor
  • 77 Views
  • 1 reply
  • 0 kudos

create_auto_cdc_from_snapshot_flow vs create_auto_cdc_flow – when is snapshot CDC actually worth it?

I am deciding between create_auto_cdc_from_snapshot_flow() and create_auto_cdc_flow() in a pipeline. My source is a daily full snapshot table: no operation column (no insert/update/delete flags); order can be derived from snapshot_date (sequence by); rows ...

Latest Reply
pradeep_singh
New Contributor II

If your source only emits full daily snapshots, create_auto_cdc_from_snapshot_flow() is purpose-built for this and will likely be simpler and safer to operate than synthesizing CDC events for create_auto_cdc_flow(). It automatically computes inserts/...

AJ270990
by Contributor II
  • 135 Views
  • 1 reply
  • 0 kudos

All-purpose cluster, SQL warehouse, and job cluster are not executing code

All-purpose cluster, SQL warehouse, and job cluster are not executing the Spark code in Pro and Classic mode. When switched to serverless mode they are able to execute the code. When checked with the networking team, there were no recent subnet changes. ...

Latest Reply
MoJaMa
Databricks Employee

Can you share the actual commands and error messages? Screenshots if you have them.

dpc
by Contributor II
  • 129 Views
  • 5 replies
  • 2 kudos

Using AD groups for object ownership

Databricks has a general issue with object ownership in that only the creator can delete objects. So, if I create a catalog, table, view, schema, etc., I am the only person who can delete it. No good if it's a general table or view and some other developer ...

Latest Reply
pradeep_singh
New Contributor II

I had this problem with another client at a much larger scale. This is what we did: at the end of each pipeline that we ran in the development environment, we had an AlterOwnership task. When a user runs a pipeline with his/her credentials, all the objec...

4 More Replies
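The "AlterOwnership task" described above boils down to issuing `ALTER ... SET OWNER TO` for each object the run created, reassigning it to a shared group. A minimal sketch, assuming placeholder object names and group (your pipeline would supply the real list of created objects):

```python
# Sketch: post-pipeline ownership transfer to a group, so no single developer
# is the sole owner of catalogs/schemas/tables the run created.
def owner_sql(object_type: str, name: str, group: str) -> str:
    # object_type: "CATALOG", "SCHEMA", "TABLE", "VIEW", ...
    return f"ALTER {object_type} {name} SET OWNER TO `{group}`"

def transfer_ownership(spark, objects, group: str) -> None:
    # objects: iterable of (object_type, fully qualified name) pairs
    for object_type, name in objects:
        spark.sql(owner_sql(object_type, name, group))
```

With ownership held by a group, any member can later alter or drop the object, which removes the single-creator bottleneck the thread describes.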