Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Yuki
by Contributor
  • 952 Views
  • 2 replies
  • 1 kudos

Resolved! Can we implement Unity Catalog table lifecycle?

I want to delete tables that haven't been selected or otherwise accessed for several months. I can see the Delta table history, but it only captures DDL and update/insert/delete operations, not "select". I realized that the Unity Catalog insight, ht...

Latest Reply
Yuki
Contributor
  • 1 kudos

Hi @Renu_, I appreciate your clear response. I now have a better understanding and will work with our admin team to develop a strategy. Thank you.

  • 1 kudos
1 More Replies
Bart_DE
by New Contributor II
  • 1318 Views
  • 2 replies
  • 0 kudos

Resolved! Concurrency behavior with merge operations

Hi community, I have a case in my current project where I have to develop a solution that prevents the same data from being ingested twice into Delta Lake. On rare occasions, some of our data suppliers send us the same dataset in two diff...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

Your idea of using a log table to track processed ingestions and leveraging a MERGE operation in your pipeline is a sound approach for preventing duplicate data ingestion into Delta Lake. Delta Lake's ACID transactions and its support for concurrency...
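The log-table idea in the reply above can be sketched without a cluster. This is a minimal illustration, not the pipeline from the thread: it swaps Delta Lake's MERGE for SQLite's `INSERT OR IGNORE` on a primary key, which gives the same "insert only when not matched" behavior in a single atomic statement. The table name `ingestion_log`, the helper names, and the choice of a SHA-256 content hash as the deduplication key are all assumptions made for the example.

```python
import hashlib
import sqlite3


def dataset_fingerprint(payload: bytes) -> str:
    """Return a content hash, so the same data under two file names matches."""
    return hashlib.sha256(payload).hexdigest()


def try_register(conn: sqlite3.Connection, fingerprint: str, source_name: str) -> bool:
    """Atomically record an ingestion; return True only for the first caller."""
    with conn:  # commits on success, rolls back on error
        cursor = conn.execute(
            "INSERT OR IGNORE INTO ingestion_log (fingerprint, source_name) "
            "VALUES (?, ?)",
            (fingerprint, source_name),
        )
    # rowcount is 1 when the row was inserted and 0 when the key already existed.
    return cursor.rowcount == 1


# Hypothetical log table; SQLite stands in for a Delta table here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ingestion_log (fingerprint TEXT PRIMARY KEY, source_name TEXT NOT NULL)"
)

payload = b"id,amount\n1,10\n"
first = try_register(conn, dataset_fingerprint(payload), "supplier_a_v1.csv")
second = try_register(conn, dataset_fingerprint(payload), "supplier_a_v2.csv")
```

Only the caller that gets `True` back would go on to load the data. In Delta Lake the equivalent check would presumably be a MERGE with a `WHEN NOT MATCHED THEN INSERT` clause against the log table.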

  • 0 kudos
1 More Replies
Anonymous
by Not applicable
  • 2978 Views
  • 2 replies
  • 0 kudos

DBFS Permissions

Is there permission control at the folder/file level in DBFS? E.g., if a team member uploads a file to /Filestore/Tables/TestData/testfile, could we mask permissions on TestData and/or testfile?

Latest Reply
brickster_2018
Databricks Employee
  • 0 kudos

DBFS does not have ACLs at this point.

  • 0 kudos
1 More Replies
sahil3
by New Contributor
  • 500 Views
  • 1 reply
  • 0 kudos

NOT ABLE TO ATTACH CLUSTER

Notebook detached. Exception when creating execution context: java.util.concurrent.TimeoutException: Timed out after 15 seconds

Latest Reply
RiyazAliM
Honored Contributor
  • 0 kudos

Hey @sahil3 Try detaching and re-attaching the notebook to the cluster. Please note that this will clear the state of the notebook. If the issue persists, try restarting the cluster. Best,

  • 0 kudos
rak_haq
by New Contributor III
  • 1391 Views
  • 3 replies
  • 1 kudos

Resolved! How to use read_kafka() SQL with secret()?

Hi, I want to read data from Azure Event Hub using SQL. Can someone please give me an executable example that also reads the Event Hub connection string through the SQL function secret()? This is what I tried but it Databr...

Data Engineering
azure
event_hub
kafka
sql
streaming
Latest Reply
rak_haq
New Contributor III
  • 1 kudos

I found the solution and could successfully establish a connection to Event Hub.  SELECT cast(value as STRING) as raw_json, current_timestamp() as processing_time FROM read_kafka( bootstrapServers => '<YOUR EVENT-HUB NAMESPACE>.servicebus.windows.n...

  • 1 kudos
2 More Replies
Ajay-Pandey
by Databricks MVP
  • 4794 Views
  • 5 replies
  • 0 kudos

On-behalf-of token creation for service principals is not enabled for this workspace

Hi all, I wanted to create a PAT for a Databricks service principal, but I'm getting the error code below when calling the API or using the CLI. Please help me create a PAT for it. #dataengineering #databricks

Data Engineering
community
Databricks
Latest Reply
JackB
New Contributor II
  • 0 kudos

You can generate the token while logged in as the service principal via the Azure CLI in a Command Prompt window. To do so, make sure to install both the Azure CLI and the Databricks CLI. Install the Azure CLI for Windows | Microsoft Learn Install ...

  • 0 kudos
4 More Replies
Harrison
by New Contributor II
  • 2244 Views
  • 1 reply
  • 0 kudos

Reading CloudWatch Logs from AWS Kinesis

If you have AWS CloudWatch subscribed to write out logs to AWS Kinesis, the Kinesis stream is base64 encoded and the CloudWatch logs are GZIP compressed. The challenge we faced was how to address that in PySpark to be able to read the data.  We were ...
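The two decoding steps described above (base64, then GZIP) can be illustrated in plain Python. This is a minimal sketch, not the approach from the post, whose details are cut off. The function name `decode_cloudwatch_record` and the sample payload are assumptions; the sample is built locally by reversing the steps rather than taken from a real stream.

```python
import base64
import gzip
import json


def decode_cloudwatch_record(encoded: str) -> dict:
    """Base64-decode, gunzip, and JSON-parse one record payload."""
    compressed = base64.b64decode(encoded)
    return json.loads(gzip.decompress(compressed).decode("utf-8"))


# Illustrative payload only: compress and encode a small JSON document.
sample = {"logGroup": "example-group", "logEvents": [{"id": "1", "message": "hello"}]}
encoded_sample = base64.b64encode(
    gzip.compress(json.dumps(sample).encode("utf-8"))
).decode("ascii")

decoded = decode_cloudwatch_record(encoded_sample)
messages = [event["message"] for event in decoded["logEvents"]]
```

In PySpark, one option would be to wrap a function like this in a UDF applied to the record's data column. Depending on the connector, the data may already arrive as raw bytes, in which case the base64 step would be skipped.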

Latest Reply
oblikas
New Contributor II
  • 0 kudos

Thank you so much, this is very helpful

  • 0 kudos
BF7
by Contributor
  • 1629 Views
  • 3 replies
  • 3 kudos

Resolved! What is the difference between Spark inferSchema and cloudFiles.inferColumnTypes?

We have been using spark.read with inferSchema = True to validate Auto Loader schema inference. But I suspect the two work differently and may not always yield identical results. Has anyone ever answered this questi...

Latest Reply
lingareddy_Alva
Honored Contributor III
  • 3 kudos

Hi @BF7 Yes, there is a difference between how spark.read(...).option("inferSchema", "true") and Auto Loader's schema inference (cloudFiles.schemaHints, cloudFiles.inferColumnTypes, etc.) work. They are not guaranteed to produce identical results. Key ...

  • 3 kudos
2 More Replies
Unimog
by New Contributor III
  • 1250 Views
  • 3 replies
  • 1 kudos

Resolved! springml sftp with Spark 3.x

Is there a version of springml spark-sftp that works with Spark 3.x and Scala 2.12? If so, can you point me to it or explain how to load it in my compute?

Latest Reply
Louis_Frolio
Databricks Employee
  • 1 kudos

For Python, you might want to look at Paramiko; it seems like it might be an option. You could also look at ETL tools like Airbyte, Rivery, CData, etc.

  • 1 kudos
2 More Replies
ÓscarHernández
by New Contributor II
  • 2513 Views
  • 3 replies
  • 0 kudos

SQLSTATE: XX000 The Spark SQL phase planning failed with an internal error.

Hello everyone, I am currently working with a SQL Warehouse and have been getting the following error message: [INTERNAL_ERROR] The Spark SQL phase planning failed with an internal error. You hit a bug in Spark or the Spark plugins you use. Please, re...

Latest Reply
ÓscarHernández
New Contributor II
  • 0 kudos

I have tried to simplify the query as much as possible to see if that helps, but the bug persists. The problem seems to lie in the way Databricks treats columns passed as arguments to a function. I tried these queries: select * FROM VALU...

  • 0 kudos
2 More Replies
minhhung0507
by Valued Contributor
  • 7201 Views
  • 15 replies
  • 3 kudos

API for Restarting Individual Failed Tasks within a Job?

Hi everyone, I'm exploring ways to streamline my workflow in Databricks and could really use some expert advice. In my current setup, I have a job (named job_silver) with multiple tasks (e.g., task 1, task 2, task 3). When one of these tasks fails—say...

Latest Reply
RiyazAliM
Honored Contributor
  • 3 kudos

Hey @minhhung0507 - quick question - what cluster type are you using to run your workflow? I'm using a shared, interactive cluster, so I'm passing the parameter {'existing_cluster_id' : task['existing_cluster_id']} in the payload. This parameter ...
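For readers following this thread, here is a hedged sketch of what a "rerun only the failed tasks" request body might look like. It assumes the Jobs API repair endpoint (`POST /api/2.1/jobs/runs/repair`) with its `run_id` and `rerun_tasks` fields. The thread excerpt does not confirm which endpoint the participants settled on, and the helper name, run ID, and task key below are purely illustrative. The sketch only builds the JSON body and sends nothing.

```python
import json
from typing import List


def build_repair_payload(run_id: int, failed_task_keys: List[str]) -> dict:
    """Build a request body that reruns only the named tasks of one job run."""
    if not failed_task_keys:
        raise ValueError("at least one task key is required")
    return {"run_id": run_id, "rerun_tasks": list(failed_task_keys)}


# Hypothetical run ID and task key, for illustration only.
payload = build_repair_payload(run_id=123456, failed_task_keys=["task_2"])
body = json.dumps(payload)
```

The resulting `body` would be posted with a bearer token in the `Authorization` header. Whether extra fields such as `existing_cluster_id` are needed depends on the job's cluster configuration, as the reply above suggests.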

  • 3 kudos
14 More Replies
smpa01
by Contributor
  • 1410 Views
  • 4 replies
  • 0 kudos

Resolved! Debugging jobs/run-now endpoint

I am unable to call the jobs/run-now endpoint. I am getting this error: Error fetching files: 403 - {"error_code":"PERMISSION_DENIED","message":"User xxxx-dxxxx-xxx-xxxx does not have Manage Run or Owner or Admin permissions on job 437174060919465",...

Latest Reply
RiyazAliM
Honored Contributor
  • 0 kudos

Hi @smpa01 - Does the PAT you're using belong to a service principal, or is it your personal token? If it belongs to an SP, the SP needs permission to run the DBX workflow. Let me know if you have any questions.

  • 0 kudos
3 More Replies
21f3001806
by New Contributor III
  • 2113 Views
  • 5 replies
  • 5 kudos

Resolved! DLT pipeline showing legacy, even though everything is up to date

Some of the old DLT pipelines in my Databricks workspace are showing as legacy. I am using a serverless pipeline with the mode set to preview. Is there anything I missed?

Latest Reply
RiyazAliM
Honored Contributor
  • 5 kudos

@ashraf1395 - I understand now, let me try it once.

  • 5 kudos
4 More Replies
daan_dw
by New Contributor III
  • 674 Views
  • 1 reply
  • 0 kudos

Writing files using multithreading to dbfs

Hello, I am reading XML files from AWS S3 and storing them on dbfs:/ using multithreaded code. The code itself seems to be fine, as it works without issues for roughly the first 100,000 files and I can see the data arriving on DBFS. However it will always...

Latest Reply
SP_6721
Honored Contributor
  • 0 kudos

Hi @daan_dw I think this issue mainly comes from using multithreading to handle XML files while interacting with both S3 and DBFS. When the thread count gets too high, it likely causes race conditions. To avoid this: Try reducing the number of threads....
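The suggestion to reduce the number of threads can be sketched with a capped thread pool. This is a minimal local illustration under stated assumptions, not the original poster's code: a temporary directory stands in for DBFS, the in-memory `files` dictionary stands in for objects read from S3, and the helper names and the worker cap of 4 are arbitrary.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def write_one(target_dir: Path, name: str, payload: bytes) -> Path:
    """Write one payload to its own file; no two threads share a path."""
    path = target_dir / name
    path.write_bytes(payload)
    return path


def write_all(target_dir: Path, items: dict, max_workers: int = 8) -> list:
    """Write all items with a capped number of worker threads."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(write_one, target_dir, n, p) for n, p in items.items()]
        # result() re-raises any exception from a worker, so failures are not silent.
        return [f.result() for f in futures]


with tempfile.TemporaryDirectory() as tmp:
    files = {f"doc_{i}.xml": f"<doc id='{i}'/>".encode("utf-8") for i in range(50)}
    written = write_all(Path(tmp), files, max_workers=4)
    written_count = len(written)
    all_present = all(p.exists() for p in written)
```

Capping `max_workers` bounds the number of concurrent writes, and collecting each future's result surfaces worker errors instead of letting a failed write go unnoticed.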

  • 0 kudos
Yuki
by Contributor
  • 1099 Views
  • 2 replies
  • 2 kudos

Resolved! What do you think about continuing to use an instance profile for S3 multipart upload?

My team is currently using an instance profile to upload data to S3 since we only have Hive Metastore. I like Unity Catalog a lot, but my code uses multipart upload to S3 for efficiency. https://docs.aws.amazon.com/AmazonS3/latest/userguide/mpuoverview...

Latest Reply
Yuki
Contributor
  • 2 kudos

Hi @lingareddy_Alva, Thank you for your excellent response. I really appreciate it. I couldn't find the statement that "Instance profiles are still supported but should be used for specific, advanced access cases." I will use it for now, recognizi...

  • 2 kudos
1 More Replies
