Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Jennifer
by New Contributor III
  • 203 Views
  • 5 replies
  • 0 kudos

Can external tables be created backed by current cloud files without ingesting files in Databricks?

Hi, we have a huge amount of Parquet files in S3 with the path pattern <bucket>/<customer>/yyyy/mm/dd/hh/.*.parquet. The question is: can I create an external table in Unity Catalog from this external location without actually ingesting the files? Like wha...

Latest Reply
Data_Mavericks
New Contributor
  • 0 kudos

I think the issue is that you are trying to create a Delta table in Unity Catalog from a Parquet source without converting it to Delta format first, as Unity Catalog will not allow a Delta table to be created in a non-empty location. Since you want t...
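
A minimal sketch (PySpark) of the non-Delta route this reply points toward: registering the existing files as a Parquet external table, so nothing is ingested. The catalog, schema, table, and S3 path below are placeholders, not values from the thread.

    spark.sql("""
        CREATE TABLE IF NOT EXISTS main.raw.customer_events  -- placeholder table name
        USING PARQUET
        LOCATION 's3://my-bucket/customer/'                  -- placeholder external location
    """)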

4 More Replies
Eric_Kieft
by New Contributor III
  • 126 Views
  • 3 replies
  • 1 kudos

Centralized Location of Table History/Timestamps in Unity Catalog

Is there a centralized location in Unity Catalog that retains the table history, specifically the last timestamp, for managed Delta tables? DESCRIBE HISTORY will provide it for a specific table, but I would like to get it for a number of tables. inform...

Latest Reply
Brahmareddy
Honored Contributor II
  • 1 kudos

Hi Eric_Kieft, how are you doing today? As per my understanding, Unity Catalog doesn't currently provide a direct system table that tracks all table modifications (including inserts/updates) across multiple managed Delta tables. DESCRIBE HISTOR...
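
In the absence of such a system table, one workaround is to loop DESCRIBE HISTORY over the tables of interest. A rough PySpark sketch, where the catalog and schema names are placeholders:

    # Collect the latest commit timestamp for every table in one schema
    tables = [r.tableName for r in spark.sql("SHOW TABLES IN main.analytics").collect()]
    latest = {}
    for t in tables:
        # DESCRIBE HISTORY returns the newest commit first, so LIMIT 1 is the latest
        rows = spark.sql(f"DESCRIBE HISTORY main.analytics.{t} LIMIT 1").collect()
        if rows:
            latest[t] = rows[0].timestamp
    print(latest)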

2 More Replies
Jaclaglez13
by New Contributor II
  • 93 Views
  • 1 reply
  • 1 kudos

[UNRESOLVED_ROUTINE] Cannot resolve function `date_format`

Hi all, we are getting the following error log in a Workflow: AnalysisException: [UNRESOLVED_ROUTINE] Cannot resolve function `date_format` on search path [`system`.`builtin`, `system`.`session`]. SQLSTATE: 42883. The Workflow consists of different noteb...

Latest Reply
Brahmareddy
Honored Contributor II
  • 1 kudos

Hi Jaclaglez13, how are you doing today? As per my understanding, this issue usually happens when Unity Catalog affects function resolution across different tasks in a Workflow. Since your first task runs fine but the second one fails, it's lik...
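
One way to isolate the failure per task is to pin the session catalog and schema before calling the builtin; if this still fails in the second task, the problem is resolution, not the data. A small PySpark sketch with placeholder catalog/schema names:

    spark.sql("USE CATALOG main")    # placeholder catalog
    spark.sql("USE SCHEMA default")  # placeholder schema
    spark.sql("SELECT date_format(current_date(), 'yyyy-MM-dd') AS d").show()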

Volker
by New Contributor III
  • 17 Views
  • 0 replies
  • 0 kudos

From Partitioning to Liquid Clustering

We had some Delta tables that were previously partitioned on year, month, day, and hour. This resulted in quite small partitions, and we have now switched to liquid clustering. We followed these steps: remove partitioning by doing REPLACE; ALTER TABLE --- CLU...
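
For reference, a hedged PySpark sketch of the switch the post describes, rebuilding the table without partitions and clustering on the same columns; the table name is a placeholder:

    spark.sql("""
        CREATE OR REPLACE TABLE main.logs.events  -- placeholder table name
        CLUSTER BY (year, month, day, hour)
        AS SELECT * FROM main.logs.events
    """)
    spark.sql("OPTIMIZE main.logs.events")  # rewrites files into the new clustering layout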

DaPo
by New Contributor II
  • 76 Views
  • 0 replies
  • 0 kudos

DLT Fails with Exception: CANNOT_READ_STREAMING_STATE_FILE

I have several DLT pipelines writing to a schema in a Unity Catalog. The storage location of the Unity Catalog is managed by the Databricks deployment (on AWS). The schema and the DLT pipelines are managed via Databricks Asset Bundles. I did not cha...
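
A full refresh of the pipeline is one common way to clear stale streaming-state errors like this, at the cost of reprocessing the sources. A hedged sketch with the Databricks SDK for Python, where the pipeline ID is a placeholder:

    from databricks.sdk import WorkspaceClient

    w = WorkspaceClient()
    # Rebuilds all tables from scratch, discarding the old checkpoint/state files
    w.pipelines.start_update(pipeline_id="1234-abcd-5678", full_refresh=True)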

DylanStout
by Contributor
  • 104 Views
  • 1 reply
  • 0 kudos

Error while reading file from Cloud Storage

The code we are executing:
df = spark.read.format("parquet").load("/mnt/g/drb/HN/")
df.write.mode('overwrite').saveAsTable("bronze.HN")
The error it throws:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 44 in stage 642.0 faile...

Latest Reply
ashraf1395
Valued Contributor III
  • 0 kudos

Try these solutions https://community.databricks.com/t5/data-engineering/how-can-i-convert-a-parquet-into-delta-table/td-p/14348
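
The linked thread's main suggestion is converting the Parquet data to Delta first. A minimal PySpark sketch using the path from the question, assuming the files share a consistent schema:

    # Converts the Parquet directory in place into a Delta table
    spark.sql("CONVERT TO DELTA parquet.`/mnt/g/drb/HN/`")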

LorenRD
by Contributor
  • 9709 Views
  • 15 replies
  • 11 kudos
Latest Reply
miranda_luna_db
Databricks Employee
  • 11 kudos

Hi folks - if you're unsure who your account team is and you're interested in the app delegated auth preview, please contact us via aibi-previews [at] databricks [dot] com

14 More Replies
JordanYaker
by Contributor
  • 1531 Views
  • 1 reply
  • 0 kudos

Integration options for Databricks Jobs and DataDog?

I know that there is already the Databricks (technically Spark) integration for DataDog. Unfortunately, that integration only covers cluster execution itself, which means only cluster metrics and Spark jobs and tasks. I'm looking for somethin...

Latest Reply
BrenAlex_88575
New Contributor II
  • 0 kudos

Did you ever figure out a good solution? I'm faced with the same problem.

JameDavi_51481
by Contributor
  • 196 Views
  • 5 replies
  • 1 kudos

parameterized ALTER TABLE SET TAGS

I would like to use parameterized SQL queries to run SET TAGS commands on tables, but cannot figure out how to parameterize the query to prevent SQL injection. Both the `?` and `:key` parameter syntaxes throw a syntax error. Basically, I'd like to do th...

Latest Reply
Data_Mavericks
New Contributor
  • 1 kudos

@JameDavi_51481 I agree with your point. For building dynamic queries safely, I think (if I am not wrong) you can use EXECUTE IMMEDIATE, which avoids the risk of composing the statement with direct string formatting.
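
A hedged PySpark sketch of that idea, binding only the table name through the IDENTIFIER clause, since parameter markers bind values rather than identifiers; whether the tag value itself can be bound is exactly what the thread is probing. The table name and tag below are placeholders:

    spark.sql("""
        EXECUTE IMMEDIATE
          'ALTER TABLE IDENTIFIER(:tbl) SET TAGS (''pii'' = ''true'')'
          USING 'main.sales.customers' AS tbl
    """)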

4 More Replies
GFrost
by New Contributor
  • 78 Views
  • 1 reply
  • 0 kudos

Passing values from a CTE (Common Table Expression) to user-defined functions (UDF) in Spark SQL

Hello everyone, I'm trying to pass a value from a CTE to my function (UDF). Unfortunately, it's not working. Here is the first variant: WITH fx_date_new AS ( SELECT CASE WHEN '2025-01-01' > current_date() THEN CAST(date_format...

Latest Reply
ggsmith
Contributor
  • 0 kudos

I think the issue is in your subquery. You shouldn't have the entire CTE query in parentheses, only the column from your CTE; your FROM clause is inside your UDF arguments. See if you can use the example below to fix the issue. CREATE OR REPLACE FUNC...
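
A hedged reconstruction of what that fix could look like end to end, selecting the UDF over the CTE's column instead of wrapping the whole CTE in the argument list; the UDF name my_udf is a placeholder:

    spark.sql("""
        WITH fx_date_new AS (
            SELECT CASE WHEN '2025-01-01' > current_date()
                        THEN current_date() ELSE DATE'2025-01-01' END AS fx_date
        )
        SELECT my_udf(fx_date) FROM fx_date_new  -- pass the column, not the whole CTE
    """).show()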

samanthacr
by New Contributor II
  • 407 Views
  • 4 replies
  • 0 kudos

How to use Iceberg SQL Extensions in a notebook?

I'm trying to use Iceberg's SQL extensions in my Databricks Notebook, but I get a syntax error. Specifically, I'm trying to run 'ALTER TABLE my_iceberg_table WRITE LOCALLY ORDERED BY timestamp;'. This command is listed as part of Iceberg's SQL extens...

Latest Reply
gorkaada_BI
New Contributor II
  • 0 kudos

val dfh = spark.sql(s"""CALL glue_catalog.system.create_changelog_view(table => '<>table', options => map('start-snapshot-id', '$startSnapshotId', 'end-snapshot-id', '$endSnapshotId'), changelog_view => table_v)""") leads to ParseException: [PROCED...

3 More Replies
User16790091296
by Contributor II
  • 3055 Views
  • 3 replies
  • 5 kudos

Resolved! How do I use databricks-cli without manual configuration

I want to use the Databricks CLI: databricks clusters list. But this requires a manual step that needs interactive work with the user: databricks configure --token. Is there a way to use the Databricks CLI without manual intervention so that you can run it as p...

Latest Reply
alexott
Databricks Employee
  • 5 kudos

You can set two environment variables, DATABRICKS_HOST and DATABRICKS_TOKEN, and the databricks-cli will use them. See the example of that in the DevOps pipeline; see the full list of environment variables at the end of the Authentication section of the docume...
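
A minimal sketch of driving the CLI non-interactively this way, here from Python via subprocess; the host URL and token are placeholders:

    import os
    import subprocess

    os.environ["DATABRICKS_HOST"] = "https://adb-1234567890123456.7.azuredatabricks.net"  # placeholder
    os.environ["DATABRICKS_TOKEN"] = "dapiXXXXXXXXXXXXXXXX"  # placeholder token
    subprocess.run(["databricks", "clusters", "list"], check=True)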

2 More Replies
him
by New Contributor III
  • 19253 Views
  • 13 replies
  • 9 kudos

I am getting the below error while making a GET request to a job in Databricks after successfully running it

"error_code": "INVALID_PARAMETER_VALUE",  "message": "Retrieving the output of runs with multiple tasks is not supported. Please retrieve the output of each individual task run instead."}

Latest Reply
Octavian1
Contributor
  • 9 kudos

Hi @Debayan, I'd suggest also mentioning this explicitly in the documentation of the workspace client for get_run_output. One has to pay extra attention to the example run_id=run.tasks[0].run_id, otherwise it can be easily missed.
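
For context, a small sketch of that per-task pattern with the Databricks SDK for Python; the parent run ID is a placeholder:

    from databricks.sdk import WorkspaceClient

    w = WorkspaceClient()
    run = w.jobs.get_run(run_id=123456789)  # placeholder parent run ID
    for task in run.tasks:
        # Request each task's own run_id, not the parent run's
        output = w.jobs.get_run_output(run_id=task.run_id)
        print(task.task_key, output.logs)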

12 More Replies

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group