Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

explorer
by New Contributor III
  • 5068 Views
  • 4 replies
  • 1 kudos

Resolved! Deleting records manually in a Databricks streaming table

Hi Team, please let me know if there is any way I can delete records manually from a Databricks streaming table without corrupting the table and data. Can we manually delete a few records (based on some condition) from a Databricks streaming table (having checkpoi...

Latest Reply
SparkJun
Databricks Employee
  • 1 kudos

  If you use the applyChanges method in DLT for Change Data Capture (CDC), you can delete records manually without affecting the consistency of the table, as applyChanges respects manual deletions. You must configure your DLT pipeline to respect manu...

  • 1 kudos
3 More Replies
alexkychen
by New Contributor II
  • 1066 Views
  • 2 replies
  • 0 kudos

How to read csv files stored in my Databricks workspace using a Python script in my local computer?

I am developing a Python app on my local computer, and I would like to let it read some data stored in my Databricks workspace using preferably Pandas. The data are stored in .csv files in the workspace. How can I make this happen? Is it possible to ...

Latest Reply
alexkychen
New Contributor II
  • 0 kudos

Hi Eni, Thank you very much for your reply. I also did some research, but realized that storing sensitive data (as in my case) in DBFS is no longer recommended by Databricks for security reasons, as stated here: https://docs.databricks.com/e...

  • 0 kudos
1 More Replies
BeardyMan
by New Contributor III
  • 6758 Views
  • 9 replies
  • 3 kudos

Resolved! MLFlow Serve Logging

When using Azure Databricks and serving a model, we have received requests to capture additional logging. In some instances, they would like to capture input and output or even some of the steps from a pipeline. Is there any way we can extend the lo...

Latest Reply
Dan_Z
Databricks Employee
  • 3 kudos

Another word from a Databricks employee: """You can use the custom model approach but configuring it is painful. Plus you have ended every loggable model in the custom model. Another less intrusive solution would be to have a proxy server do the loggi...

  • 3 kudos
8 More Replies
berk
by New Contributor II
  • 1886 Views
  • 2 replies
  • 1 kudos

Delete Managed Table from S3 Bucket

Hello, I am encountering an issue with our managed tables in Databricks. The tables are stored in an S3 bucket. When I drop a managed table (either through the UI or by running a DROP TABLE statement in a notebook), the associated data is not being deleted f...

Latest Reply
berk
New Contributor II
  • 1 kudos

@kenkoshaw, thank you for your reply. It is indeed interesting that the data isn't immediately deleted after the table is dropped, and that there's no way to force this process. I suppose I'll have to manually delete the files from the S3 Bucket if I...

  • 1 kudos
1 More Replies
Roxio
by New Contributor II
  • 1221 Views
  • 1 replies
  • 1 kudos

Resolved! Materialized view much slower than table, with lots of time spent on "Optimizing query & pruning files"

I have a query that calls several materialized views, but most of the query time is spent in "Optimizing query & pruning files" rather than in execution. The difference is about 2-3 secs for the optimization versus 300-400 ms for the execution. Similar i...

Latest Reply
Brahmareddy
Honored Contributor III
  • 1 kudos

Hi Roxio, How are you doing today? The difference in query times between materialized views and tables likely comes from the complexity of the views, as they often involve more steps in the background. To reduce the optimization time, you can try simp...

  • 1 kudos
basit_siddiqui
by New Contributor III
  • 492 Views
  • 1 replies
  • 0 kudos

Auto Loader error when assuming a role

Hi @Retired_mod, I have seen numerous posts by you. Thanks for continuously providing support. Can you or your colleagues help with this? We have a basic user that assumes a role with an S3 policy for a specific bucket. When we try to read the bucket from D...

Latest Reply
basit_siddiqui
New Contributor III
  • 0 kudos

Py4JJavaError: An error occurred while calling o503.json. : java.nio.file.AccessDeniedException: s3a://xxxxxx.json: shaded.databricks.org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No AWS Credentials provided by AwsCredentialContextTokenProvid...

  • 0 kudos
manoj_
by New Contributor II
  • 791 Views
  • 1 replies
  • 0 kudos

Databricks view error

Data source error DataSource.Error: ODBC: ERROR [42000] [Microsoft][Hardy] (80) Syntax or semantic analysis error thrown in server while executing query. Error message from server: org.apache.hive.service.cli.HiveSQLException: Error running query: [DEL...

Latest Reply
manoj_
New Contributor II
  • 0 kudos

This view ran fine until last week and then suddenly started giving this error, so I need to find out what is causing it.

  • 0 kudos
ksilva
by New Contributor
  • 4148 Views
  • 4 replies
  • 1 kudos

Incorrect secret value when loaded as environment variable

I recently faced an issue that took several hours to identify. I'm loading an environment variable with a secret: ENVVAR: {{secrets/scope/key}}. The secret is loaded in my application, and I could verify it's there, but its value is not correct. I realised tha...

Latest Reply
danmlopsmaz
New Contributor II
  • 1 kudos

Hi team, is there an update or fix for this?

  • 1 kudos
3 More Replies
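
The preview above is cut off before it explains why the value was wrong, so the following is only a minimal sketch of one way to check whether an environment variable matches a reference value without printing either. The helper name `compare_secret` and the variable name `DEMO_SECRET` are hypothetical; on Databricks the reference value could come from `dbutils.secrets.get(scope="scope", key="key")`.

```python
import hmac
import os


def compare_secret(env_name, expected):
    """Compare an environment variable with a reference value.

    Only lengths and booleans are returned, so the secret itself
    never ends up in logs or notebook output.
    """
    actual = os.environ.get(env_name)
    if actual is None:
        return {"present": False, "matches": False}
    return {
        "present": True,
        # Constant-time comparison of the two values.
        "matches": hmac.compare_digest(actual.encode(), expected.encode()),
        "length_env": len(actual),
        "length_expected": len(expected),
        # True if the loaded value has leading or trailing whitespace.
        "edge_whitespace": actual != actual.strip(),
    }
```

A length mismatch or an unexpected whitespace flag narrows down where the value was altered, but it does not by itself identify the cause discussed in the thread.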
marcuskw
by Contributor II
  • 2463 Views
  • 5 replies
  • 5 kudos

Resolved! IDENTIFIER not working in UPDATE

The following code works perfectly fine: df = spark.createDataFrame([('A', 1), ('B', 2)]) df.createOrReplaceTempView('temp') spark.sql(""" SELECT IDENTIFIER(:col) FROM temp """, args={ "col": "_1" } ).display(...

Latest Reply
marcuskw
Contributor II
  • 5 kudos

If it helps anyone else, I found this article that describes a few limitations: https://community.databricks.com/t5/technical-blog/how-not-to-build-an-execute-immediate-demo/ba-p/82167

  • 5 kudos
4 More Replies
leireroman
by New Contributor III
  • 1782 Views
  • 3 replies
  • 0 kudos

Resolved! RESOURCE_EXHAUSTED dbutils.jobs.taskValues.get

I have a job in Databricks running multiple tasks in parallel. Those tasks read the job's parameters using dbutils. I'm getting the following error when trying to read parameters in my different tasks: com.databricks.common.client.Databr...

Latest Reply
leireroman
New Contributor III
  • 0 kudos

Hi all, Our solution has been to use job parameters and dynamic value references. These are read using dbutils.widgets.get() instead of dbutils.jobs.taskValues.get(). Now our ETL is working well again. Pass context about job runs into job tasks - Azur...

  • 0 kudos
2 More Replies
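
As a rough illustration of the workaround described in the reply above, the sketch below reads a job parameter through `dbutils.widgets.get()`. The helper `read_job_parameter`, the parameter name `run_date`, and the stand-in classes are hypothetical and exist only so the snippet can run outside a Databricks notebook; in a real task you would pass the notebook's own `dbutils`.

```python
_MISSING = object()


def read_job_parameter(dbutils, name, default=_MISSING):
    """Return a job parameter that the job exposes to the task as a widget."""
    try:
        return dbutils.widgets.get(name)
    except Exception:
        # Assumption: looking up an undefined widget raises an exception.
        if default is _MISSING:
            raise
        return default


class FakeWidgets:
    """Local stand-in for dbutils.widgets (illustration only)."""

    def __init__(self, values):
        self._values = values

    def get(self, name):
        if name not in self._values:
            raise KeyError(name)
        return self._values[name]


class FakeDbutils:
    """Local stand-in for dbutils exposing only `.widgets`."""

    def __init__(self, values):
        self.widgets = FakeWidgets(values)
```

Inside a notebook task the call would simply be `read_job_parameter(dbutils, "run_date")`, with the value supplied as a job parameter or a dynamic value reference in the job configuration.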
4kb_nick
by New Contributor III
  • 1707 Views
  • 3 replies
  • 0 kudos

Unity Catalog Lineage Not Working on GCP

Hello, We have set up a lakehouse in Databricks for one of our clients. One of the features our client would like to use is the Unity Catalog data lineage view. This is a handy feature that we have used with other clients (in both AWS and Azure) witho...

Latest Reply
4kb_nick
New Contributor III
  • 0 kudos

Hello, It's been a few months since this exchange. The feature limitation is not documented anywhere; the documentation implies that this should work on GCP: https://docs.gcp.databricks.com/en/data-governance/unity-catalog/data-lineage.html Is this feature...

  • 0 kudos
2 More Replies
Valentin14
by New Contributor II
  • 7786 Views
  • 5 replies
  • 4 kudos

Import module never ends on random branches

Hello, For the past week, our notebooks have been stuck running on the first cells, which import Python modules from our GitHub repository cloned in Databricks. The cells stay in the running state and when we try to manually cancel the jobs in databric...

Latest Reply
timo199
New Contributor II
  • 4 kudos

@Retired_mod 

  • 4 kudos
4 More Replies
SebastianCar28
by New Contributor
  • 791 Views
  • 0 replies
  • 0 kudos

How to implement Lifecycle of Data When Use ADLS

Hello everyone, nice to greet you. I have a question about the data lifecycle in ADLS. I know ADLS has its own rules, but they aren't working properly because I have two ADLS accounts: one for hot data and another for cool storage where the informati...

weldermartins
by Honored Contributor
  • 8271 Views
  • 6 replies
  • 10 kudos

Resolved! Spark - API Jira

Hello guys. I use pyspark in my daily life. A demand has arisen to collect information in Jira. I was able to do this via Talend ESB, but I wouldn't want to use different tools to get the job done. Do you have any example of how to extract data from ...

Latest Reply
Marty73
New Contributor II
  • 10 kudos

Hi, There is also a new Databricks for Jira add-on on the Atlassian Marketplace. It is easy to set up and exports are directly created within Jira. They can be one-time, scheduled, or real-time. It can also export additional Jira data such as Assets, C...

  • 10 kudos
5 More Replies
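
For the original question in this thread (pulling Jira data from Python without a separate tool), one possible approach is to page through Jira's REST search endpoint and hand the rows to Spark. This is a minimal sketch, not the solution accepted in the thread: it assumes the `/rest/api/2/search` endpoint with `jql`, `startAt` and `maxResults` is available on your instance and that basic authentication with an email and API token is allowed. The function names and arguments are hypothetical.

```python
import base64
import json
import urllib.parse
import urllib.request


def fetch_page(base_url, jql, start_at, max_results, email, api_token):
    """Fetch one page of issues from the Jira search endpoint."""
    query = urllib.parse.urlencode(
        {"jql": jql, "startAt": start_at, "maxResults": max_results}
    )
    request = urllib.request.Request(f"{base_url}/rest/api/2/search?{query}")
    credentials = base64.b64encode(f"{email}:{api_token}".encode()).decode()
    request.add_header("Authorization", f"Basic {credentials}")
    request.add_header("Accept", "application/json")
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def iter_issues(get_page, page_size=50):
    """Yield issues page by page until the reported total is reached.

    `get_page(start_at, max_results)` must return a dict containing
    an "issues" list and a "total" count.
    """
    start_at = 0
    while True:
        page = get_page(start_at, page_size)
        issues = page.get("issues", [])
        yield from issues
        start_at += len(issues)
        if not issues or start_at >= page.get("total", 0):
            break
```

In a notebook, `get_page` could be a small lambda wrapping `fetch_page` with your site URL and credentials, and the collected issues could then be flattened into rows and passed to `spark.createDataFrame`.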
