Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

YFL
by New Contributor III
  • 9357 Views
  • 11 replies
  • 6 kudos

Resolved! When delta is a streaming source, how can we get the consumer lag?

Hi, I want to keep track of the streaming lag from the source table, which is a Delta table. I see that in the query progress logs there is some information about the last version and the last file in the version for the end offset, but this doesn't give ...
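One possible approach, sketched below: compare the Delta version recorded in the stream's last progress end offset against the table's current version. This is a minimal sketch, not from the thread; the `reservoirVersion` field is an assumption about the Delta source offset layout, and `source_path` is a hypothetical placeholder.

```python
# Hedged sketch: estimate how many Delta versions a stream is behind its source.
import json
from delta.tables import DeltaTable

def delta_stream_lag(query, spark, source_path):
    progress = query.lastProgress
    if not progress or not progress["sources"]:
        return None
    end_offset = progress["sources"][0]["endOffset"]
    if isinstance(end_offset, str):            # may surface as a JSON string
        end_offset = json.loads(end_offset)
    processed_version = end_offset["reservoirVersion"]
    latest_version = (DeltaTable.forPath(spark, source_path)
                      .history(1).select("version").first()[0])
    return latest_version - processed_version  # versions behind the source
```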

Latest Reply
Anonymous
Not applicable
  • 6 kudos

Hey @Yerachmiel Feltzman, I hope all is well. Just wanted to check in: were you able to resolve your issue, or do you need more help? We'd love to hear from you. Thanks!

10 More Replies
lshar
by New Contributor III
  • 43638 Views
  • 7 replies
  • 5 kudos

Resolved! How do I pass arguments/variables from widgets to notebooks?

Hello, I am looking for a solution to this problem, which has been known for 7 years: https://community.databricks.com/s/question/0D53f00001HKHZfCAP/how-do-i-pass-argumentsvariables-to-notebooks. What I need is to parametrize my notebooks using widget infor...
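A minimal sketch of the widget-based parametrization being asked about, assuming `dbutils` is available in the notebook; the widget and notebook names are illustrative:

```python
# Define a widget, read its value, and forward it to a child notebook.
dbutils.widgets.text("env", "dev")   # creates the widget with a default value
env = dbutils.widgets.get("env")     # reads the current value

# Pass the value on as an argument when invoking another notebook:
result = dbutils.notebook.run("./child_notebook", 600, {"env": env})
```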

Latest Reply
T_Ash
New Contributor II
  • 5 kudos

Can we create paginated reports with multiple parameters (one parameter dynamically changing another), or pass a variable from one dataset to another, like a Power BI paginated report, using a Databricks dashboard? Please let me know...

6 More Replies
rameshybr
by New Contributor II
  • 3773 Views
  • 3 replies
  • 0 kudos

Workflow - How to find the task ID at runtime in the current notebook

There are four tasks in the workflow. How can I get the task ID at the beginning of the notebook, store it after finishing all the code cells in the notebook, and then save it into a table?

  • 3773 Views
  • 3 replies
  • 0 kudos
Latest Reply
menotron
Valued Contributor
  • 0 kudos

Hi @rameshybr, you can capture these as parameters in the task configuration, and from within the notebook you can use the widget utils to get their values.
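A hedged sketch of that suggestion, assuming the job passes dynamic value references as task parameters; the parameter, widget, and table names are hypothetical:

```python
# In the task configuration, define parameters backed by dynamic value
# references, e.g.  task_name -> {{task.name}}  and  run_id -> {{task.run_id}},
# then read them via widgets and persist them at the end of the notebook.
task_name = dbutils.widgets.get("task_name")
run_id = dbutils.widgets.get("run_id")

# ... notebook logic ...

(spark.createDataFrame([(task_name, run_id)], "task_name string, run_id string")
     .write.mode("append").saveAsTable("audit.task_runs"))  # hypothetical table
```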

2 More Replies
Divs2308
by New Contributor II
  • 1325 Views
  • 2 replies
  • 1 kudos

Apply changes in Delta Live tables

Hi, I have created Delta Live Tables (@dlt) and need to capture all CDC events (inserts, updates, deletes) happening on our source. I tried creating a streaming live table but still was not able to achieve this. Do Delta Live Tables (@dlt) support only appe...

Latest Reply
Rishabh-Pandey
Esteemed Contributor
  • 1 kudos

Check this: The APPLY CHANGES APIs: Simplify change data capture with Delta Live Tables | Databricks on AWS.
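A minimal sketch of what that API looks like in a pipeline, assuming a CDC feed with an operation column; the source, key, and column names are illustrative:

```python
import dlt
from pyspark.sql.functions import col, expr

dlt.create_streaming_table("customers")

dlt.apply_changes(
    target="customers",
    source="customers_cdc_feed",          # hypothetical streaming CDC source
    keys=["customer_id"],
    sequence_by=col("event_ts"),          # orders changes per key
    apply_as_deletes=expr("op = 'DELETE'"),  # handles deletes, not just appends
    except_column_list=["op", "event_ts"],
)
```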

1 More Replies
Phani1
by Valued Contributor II
  • 3872 Views
  • 3 replies
  • 2 kudos

Multi-language support in Databricks

Hi Team, how can I set up multiple languages in Databricks? For example, if I connect from Germany, the workspace and data should support German. If I connect from China, it should support Chinese, and if I connect from the US, it should be in English...

Latest Reply
Rishabh-Pandey
Esteemed Contributor
  • 2 kudos

1. In that case you need to encode the data in that language's format. For example, if the data is in Japanese then you need to encode it in UTF-8: CREATE OR REPLACE TEMP VIEW japanese_data AS SELECT * FROM csv.`path/to/japanese_data.csv` OPTIONS ('encoding'='UTF-8')al...
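The same idea with the DataFrame reader, as a hedged sketch; the path is a placeholder and the right encoding depends on the actual file:

```python
# Read a CSV with an explicit encoding, then expose it as a temp view.
df = (spark.read.format("csv")
      .option("header", "true")
      .option("encoding", "UTF-8")   # e.g. "Shift_JIS" for legacy Japanese files
      .load("path/to/japanese_data.csv"))
df.createOrReplaceTempView("japanese_data")
```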

2 More Replies
databricks8923
by New Contributor
  • 3847 Views
  • 0 replies
  • 0 kudos

DLT Pipeline, Autoloader, Streaming Query Exception: Could not find ADLS Gen2 Token

I have set up autoloader to form a streaming table in my DLT pipeline:

import dlt

@dlt.table
def streamFiles_new():
    return (
        spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "json")
            .op...

explorer
by New Contributor III
  • 6437 Views
  • 4 replies
  • 1 kudos

Resolved! Deleting records manually in a Databricks streaming table.

Hi Team, let me know if there are any ways I can delete records manually from a Databricks streaming table without corrupting the table and its data. Can we delete a few records (based on some condition) manually in a Databricks streaming table (having checkpoi...

Latest Reply
SparkJun
Databricks Employee
  • 1 kudos

  If you use the applyChanges method in DLT for Change Data Capture (CDC), you can delete records manually without affecting the consistency of the table, as applyChanges respects manual deletions. You must configure your DLT pipeline to respect manu...
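Separately from the applyChanges route, a hedged sketch of what downstream readers typically need once you delete from a streaming source table, since a DELETE commit breaks an append-only stream; the table name and date filter are illustrative:

```python
# Manual delete on the source table:
spark.sql("DELETE FROM events WHERE event_date < '2023-01-01'")

# Downstream streaming readers can skip the resulting non-append commit:
df = (spark.readStream
      .option("skipChangeCommits", "true")   # ignore commits that change data
      .table("events"))
```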

3 More Replies
alexkychen
by New Contributor II
  • 1699 Views
  • 2 replies
  • 0 kudos

How to read csv files stored in my Databricks workspace using a Python script on my local computer?

I am developing a Python app on my local computer, and I would like to let it read some data stored in my Databricks workspace using preferably Pandas. The data are stored in .csv files in the workspace. How can I make this happen? Is it possible to ...
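One hedged way to do this from a local machine is the Workspace Export REST API, sketched below; the host, token, and file path are placeholders, and authentication details depend on your setup:

```python
import base64, io
import pandas as pd
import requests

HOST = "https://<your-workspace>.cloud.databricks.com"  # placeholder
TOKEN = "<personal-access-token>"                       # placeholder

resp = requests.get(
    f"{HOST}/api/2.0/workspace/export",
    headers={"Authorization": f"Bearer {TOKEN}"},
    params={"path": "/Users/me@example.com/data/sample.csv", "format": "AUTO"},
)
resp.raise_for_status()
content = base64.b64decode(resp.json()["content"])      # API returns base64
df = pd.read_csv(io.BytesIO(content))
```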

Latest Reply
alexkychen
New Contributor II
  • 0 kudos

Hi Eni, thank you very much for your reply. I also did some research, but realized that storing sensitive data (which is my case) in DBFS is no longer recommended by Databricks for security reasons, as stated here: https://docs.databricks.com/e...

1 More Replies
BeardyMan
by New Contributor III
  • 9060 Views
  • 9 replies
  • 3 kudos

Resolved! MLFlow Serve Logging

When using Azure Databricks and serving a model, we have received requests to capture additional logging. In some instances, they would like to capture input and output or even some of the steps from a pipeline. Is there any way we can extend the lo...

Latest Reply
Dan_Z
Databricks Employee
  • 3 kudos

Another word from a Databricks employee: "You can use the custom model approach, but configuring it is painful. Plus you have to wrap every loggable model in the custom model. Another, less intrusive solution would be to have a proxy server do the loggi...
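A hedged sketch of that custom-model approach: a pyfunc wrapper that logs requests and responses around the real model. The artifact key and the use of stdout for logging are assumptions:

```python
import mlflow.pyfunc

class LoggingModel(mlflow.pyfunc.PythonModel):
    def load_context(self, context):
        import mlflow
        # "inner_model" is a hypothetical artifact key for the wrapped model.
        self.model = mlflow.pyfunc.load_model(context.artifacts["inner_model"])

    def predict(self, context, model_input):
        # stdout typically surfaces in the serving logs.
        print(f"request: {model_input.to_json(orient='records')}")
        output = self.model.predict(model_input)
        print(f"response: {output}")
        return output
```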

8 More Replies
berk
by New Contributor II
  • 2861 Views
  • 2 replies
  • 1 kudos

Delete Managed Table from S3 Bucket

Hello, I am encountering an issue with our managed tables in Databricks. The tables are stored in an S3 bucket. When I drop a managed table (either through the UI or by running a DROP TABLE statement in a notebook), the associated data is not being deleted f...

Latest Reply
berk
New Contributor II
  • 1 kudos

@kenkoshaw, thank you for your reply. It is indeed interesting that the data isn't immediately deleted after the table is dropped, and that there's no way to force this process. I suppose I'll have to manually delete the files from the S3 Bucket if I...
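If you do end up cleaning S3 manually, a hedged sketch of capturing the table's location before dropping it, so you know which prefix to delete; the table name is a placeholder:

```python
# Record the managed table's storage location, then drop the table.
location = (spark.sql("DESCRIBE DETAIL my_schema.my_table")
            .select("location").first()[0])
spark.sql("DROP TABLE my_schema.my_table")
print(f"Leftover files (if any) live under: {location}")
```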

1 More Replies
Roxio
by New Contributor II
  • 2236 Views
  • 1 reply
  • 1 kudos

Resolved! Materialized view much slower than table, with lots of time spent on "Optimizing query & pruning files"

I have a query that calls different materialized views; most of the query time is spent in "Optimizing query & pruning files" rather than in execution. The difference is like 2-3 seconds for the optimization versus 300-400 ms for the execution. Similar i...

Latest Reply
Brahmareddy
Esteemed Contributor
  • 1 kudos

Hi Roxio, how are you doing today? The difference in query times between materialized views and tables likely comes from the complexity of the views, as they often involve more steps in the background. To reduce the optimization time, you can try simp...

basit_siddiqui
by New Contributor III
  • 800 Views
  • 1 reply
  • 0 kudos

Autoloader error when assuming a role

Hi @Retired_mod, I have seen numerous posts by you. Thanks for continuously providing support. Can you or your colleagues help with this? We have a basic user which assumes a role with an S3 policy for a specific bucket. When we try to read the bucket from D...

Latest Reply
basit_siddiqui
New Contributor III
  • 0 kudos

Py4JJavaError: An error occurred while calling o503.json. : java.nio.file.AccessDeniedException: s3a://xxxxxx.json: shaded.databricks.org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No AWS Credentials provided by AwsCredentialContextTokenProvid...
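For context, a hedged sketch of the assume-role cluster configuration Databricks documents for S3A access; the role ARN and bucket path are placeholders, and this is not a confirmed fix for the error above:

```python
# Cluster Spark config (set in the cluster UI, not in the notebook):
#   spark.hadoop.fs.s3a.credentialsType AssumeRole
#   spark.hadoop.fs.s3a.stsAssumeRole.arn arn:aws:iam::123456789012:role/my-read-role

# Then the Auto Loader read looks like:
df = (spark.readStream.format("cloudFiles")
      .option("cloudFiles.format", "json")
      .load("s3a://my-bucket/path/"))
```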

manoj_
by New Contributor II
  • 1363 Views
  • 1 reply
  • 0 kudos

Databricks view error

Data source error: DataSource.Error: ODBC: ERROR [42000] [Microsoft][Hardy] (80) Syntax or semantic analysis error thrown in server while executing query. Error message from server: org.apache.hive.service.cli.HiveSQLException: Error running query: [DEL...

Latest Reply
manoj_
New Contributor II
  • 0 kudos

This view used to run until last week and suddenly started giving this error, so I need to check what the reason for this issue could be.

ksilva
by New Contributor
  • 5282 Views
  • 4 replies
  • 1 kudos

Incorrect secret value when loaded as an environment variable

I recently faced an issue that took good hours to identify. I'm loading an environment variable with a secret: ENVVAR: {{secrets/scope/key}}. The secret is loaded in my application, and I could verify it's there, but its value is not correct. I realised tha...
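A hedged sketch of the pattern in question: an environment variable backed by a secret reference in the cluster configuration, read from Python and compared against the secret fetched directly. Scope and key names are placeholders:

```python
import os

# Set in the cluster's environment variables as:
#   ENVVAR={{secrets/scope/key}}
from_env = os.environ.get("ENVVAR")

# The same secret fetched directly in a notebook, for comparison:
direct = dbutils.secrets.get(scope="scope", key="key")
print(from_env == direct)  # should be True if the reference resolves correctly
```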

Latest Reply
danmlopsmaz
New Contributor II
  • 1 kudos

Hi team, is there an update or fix for this?

3 More Replies
