Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Prajit0710
by New Contributor II
  • 288 Views
  • 1 replies
  • 0 kudos

Resolved! Authentication issue in HiveMetastore

Problem Statement: When I execute the below code as part of the notebook, both manually and in a workflow, it works as expected: df.write.mode("overwrite").format('delta').option('path', ext_path).saveAsTable("tbl_schema.Table_name") but when I integr...

Latest Reply
lingareddy_Alva
Honored Contributor II
  • 0 kudos

Hi @Prajit0710 This is an interesting issue where your Delta table write operation works as expected when run directly, but when executed within a function, the table doesn't get recognized by the HiveMetastore. The key difference is likely related to ...
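
The reply above is truncated, so as a hedged illustration only: a minimal sketch of writing the same external Delta table from inside a function and confirming the metastore sees it, using the path, schema, and table names from the question as placeholders.

    from pyspark.sql import SparkSession

    def write_external_table(df, ext_path, table="tbl_schema.Table_name"):
        # Write the DataFrame as an external Delta table and register it in the metastore.
        (df.write
           .mode("overwrite")
           .format("delta")
           .option("path", ext_path)   # external storage path supplied by the caller
           .saveAsTable(table))        # fully qualified schema.table name
        # Confirm the metastore can see the table from the same session that wrote it.
        spark = SparkSession.getActiveSession()
        spark.sql(f"DESCRIBE TABLE EXTENDED {table}").show(truncate=False)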

tebodelpino1234
by New Contributor
  • 1484 Views
  • 1 replies
  • 0 kudos

Can I view allow_expectations_col in Unity Catalog?

I am developing a DLT pipeline that manages expectations and it works correctly, but I need to see the columns __DROP_EXPECTATIONS_COL, __MEETS_DROP_EXPECTATIONS, and __ALLOW_EXPECTATIONS_COL in Unity Catalog. I can see them in the Delta table that the DLT generat...

Latest Reply
kamal_ch
Databricks Employee
  • 0 kudos

Materialization tables created by DLT include these columns to process expectations but they might not propagate to Unity Catalog representations such as views or schema-level metadata unless explicitly set up for such lineage or column-level exposur...
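
For context only, a minimal sketch of the kind of DLT expectations setup the thread describes (table, source, and constraint names are illustrative); the __DROP_EXPECTATIONS_COL-style columns are added by DLT's own materialization, not declared in user code.

    import dlt

    @dlt.table(name="orders_clean")
    @dlt.expect_or_drop("valid_amount", "amount > 0")        # failing rows are dropped
    @dlt.expect("has_customer", "customer_id IS NOT NULL")   # violations recorded, rows kept
    def orders_clean():
        # Source table name is a placeholder for whatever feeds the pipeline.
        return dlt.read("orders_raw")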

KS12
by New Contributor
  • 1455 Views
  • 1 replies
  • 0 kudos

Unable to get s3 data - o536.ls.

Error while executing display(dbutils.fs.ls(f"s3a://bucket-name/")). bucket-name has read/list permissions. shaded.databricks.org.apache.hadoop.fs.s3a.AWSClientIOException: getFileStatus on s3a://bucket-name/: com.amazonaws.SdkClientException: Unable to ex...

Latest Reply
kamal_ch
Databricks Employee
  • 0 kudos

To start with, enable SSL debug logging by passing the JVM option -Djavax.net.debug=ssl in the cluster configuration. This helps identify whether the handshake is failing due to missing certificates or invalid paths. Also check the cluster initialization sc...
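
As a sketch of how that JVM option is typically passed on a Databricks cluster (via the cluster's Spark config under Advanced options); whether to set it on the driver, the executors, or both depends on where the failure occurs:

    spark.driver.extraJavaOptions -Djavax.net.debug=ssl
    spark.executor.extraJavaOptions -Djavax.net.debug=ssl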

minhhung0507
by Valued Contributor
  • 1638 Views
  • 1 replies
  • 0 kudos

Error Listing Delta Log on GCS in Databricks

I am encountering an issue while working with a Delta table in Databricks. The error message is as follows: java.io.IOException: Error listing gs://cimb-prod-lakehouse/bronze-layer/dbd/customer_info_update_request_processing/_delta_log/ This issue occ...

Latest Reply
kamal_ch
Databricks Employee
  • 0 kudos

Ensure that the Databricks workspace has the necessary permissions to access the GCS bucket. Check if the service account used for Databricks has "Storage Object Viewer" or a similar role granted. Verify that the path "gs://cimb-prod-lakehouse/bronze...
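
A quick check along the lines of the reply: list the same _delta_log path from a notebook to confirm the cluster's service account can reach it (the path is copied from the error message in the question).

    # If the service account lacks Storage Object Viewer (or an equivalent role) on the
    # bucket, this call should fail with a permission error rather than a generic listing error.
    files = dbutils.fs.ls(
        "gs://cimb-prod-lakehouse/bronze-layer/dbd/customer_info_update_request_processing/_delta_log/"
    )
    display(files)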

DaPo
by New Contributor III
  • 1468 Views
  • 2 replies
  • 0 kudos

DLT Fails with Exception: CANNOT_READ_STREAMING_STATE_FILE

I have several DLT pipelines writing to a schema in a Unity Catalog. The storage location of the Unity Catalog is managed by the Databricks deployment (on AWS). The schema and the DLT pipelines are managed via Databricks Asset Bundles. I did not cha...

Latest Reply
mani_22
Databricks Employee
  • 0 kudos

Hi @DaPo, have you made any code changes to your streaming query? There are limitations on what changes in a streaming query are allowed between restarts from the same checkpoint location. Refer to this documentation. The checkpoint location appears to ...

1 More Reply
Nes_Hdr
by New Contributor III
  • 2695 Views
  • 2 replies
  • 0 kudos

Path based access not supported for tables with row filters?

Hello, I have encountered an issue recently and have not been able to find a solution yet. I have a job on Databricks that creates a table using dbt (dbt-databricks>=1.0.0,<2.0.0). I am setting the location_root configuration so that this table is externa...

Labels: Data Engineering, dbt, row_filter
Latest Reply
Nes_Hdr
New Contributor III
  • 0 kudos

To recreate the issue (PS, good to know: using dbt to create materialized tables is equivalent to running "create or replace table table_name"), the following code will create an external table with row security: create or replace table table_name using d...
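
Since the reproduction snippet above is truncated, here is a hedged sketch of the same shape: creating an external (path-based) table and attaching a row filter. The table names, storage path, and filter function below are placeholders, not the poster's exact code.

    # Create an external table (path-based), then attach a row filter to it.
    spark.sql("""
        CREATE OR REPLACE TABLE main.demo.table_name
        USING DELTA
        LOCATION 'abfss://container@account.dfs.core.windows.net/tables/table_name'
        AS SELECT * FROM main.demo.source_table
    """)
    spark.sql("""
        ALTER TABLE main.demo.table_name
        SET ROW FILTER main.demo.row_filter_fn ON (region)
    """)
    # Path-based reads such as spark.read.format("delta").load(<location>) are what the
    # thread reports failing once the row filter is in place.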

1 More Reply
oscarramosp
by New Contributor II
  • 568 Views
  • 3 replies
  • 1 kudos

DLT Pipeline upsert question

Hello, I'm working on a DLT pipeline to build what would be a data warehouse/data mart. I'm facing issues trying to "update" my fact table when the dimensions that are outside the pipeline fail to be up to date at my processing time, so on the next r...

Latest Reply
BigRoux
Databricks Employee
  • 1 kudos

The error encountered, "Cannot have multiple queries named catalog.schema.destination_fact for catalog.schema.destination_fact. Additional queries on that table must be named," arises because Delta Live Tables (DLT) disallows multiple unnamed queries...
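
A hedged sketch of the naming fix the reply describes, using DLT's append_flow API so that each query targeting the same table carries its own flow name (catalog, schema, and source names are placeholders):

    import dlt

    dlt.create_streaming_table("destination_fact")

    # Two named flows appending into one streaming table; without the name= argument,
    # DLT raises the "must be named" error quoted above.
    @dlt.append_flow(target="destination_fact", name="facts_from_orders")
    def facts_from_orders():
        return spark.readStream.table("catalog.schema.orders")

    @dlt.append_flow(target="destination_fact", name="facts_from_returns")
    def facts_from_returns():
        return spark.readStream.table("catalog.schema.returns")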

2 More Replies
Zeruno
by New Contributor II
  • 1756 Views
  • 1 replies
  • 0 kudos

UDFs with modular code - INVALID_ARGUMENT

I am migrating a massive codebase to PySpark on Azure Databricks, using DLT pipelines. It is very important that the code be modular; that is, I am looking to make use of UDFs for the time being that use modules and classes. I am receiving the following...

Latest Reply
briceg
Databricks Employee
  • 0 kudos

Hi @Zeruno. What you can do is package up your code and pip install it in your pipeline. I had the same situation where I developed some code which ran fine in a notebook, but when used in a DLT pipeline, the deps were not found. Packaging them up an...
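
A hedged sketch of that packaging approach in a DLT pipeline notebook; the wheel path and module names are hypothetical, and %pip is a notebook magic rather than plain Python:

    # Install the packaged shared code at the top of the pipeline's source notebook.
    %pip install /Volumes/main/default/libs/my_shared_lib-0.1.0-py3-none-any.whl

    # Once installed, the module is importable inside UDFs used by the pipeline.
    from pyspark.sql.functions import udf
    from my_shared_lib.transforms import normalize_name   # hypothetical module

    normalize_name_udf = udf(normalize_name)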

jlynlangford
by New Contributor
  • 310 Views
  • 1 replies
  • 0 kudos

collect() in SparkR and sparklyr

Hello, I'm seeing a vast difference in performance between SparkR::collect() and sparklyr::collect(). I have a somewhat complicated query that uses WITH ... AS syntax to get the data set I need; there are several views defined and joins required. The final data...

Latest Reply
niteshm
New Contributor III
  • 0 kudos

@jlynlangford This is a tricky situation, and multiple resolutions can be tried to address the performance gap. Schema complexity: if the DataFrame contains nested structs, arrays, or map types, collect() can become significantly slower due to complex...

Filip
by New Contributor II
  • 5489 Views
  • 5 replies
  • 0 kudos

How to Assign a User Managed Identity to a DBR Cluster so I can use it for querying ADLSv2?

Hi, I'm trying to figure out if we can switch from Entra ID SPNs to User Assigned Managed Identities, and everything works except I can't figure out how to access the lake files from a Python notebook. I've tried the below code and was running it on a ...

Latest Reply
kuniteru
New Contributor II
  • 0 kudos

Hi, it can be accessed with the following code: storageAccountName = "my-storage-account-name"; applicationClientId = "my-umi-client-id"; aadDirectoryId = "my-entra-tenant-id"; containerName = "my-lake-container"; spark.conf.set("fs.azure.account.auth.type...
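
The snippet above is cut off; one plausible completion, assuming the ABFS OAuth settings with the managed-identity token provider (the config keys are the standard hadoop-azure ones, but treat this as an illustration rather than the poster's exact code):

    host = f"{storageAccountName}.dfs.core.windows.net"
    spark.conf.set(f"fs.azure.account.auth.type.{host}", "OAuth")
    spark.conf.set(f"fs.azure.account.oauth.provider.type.{host}",
                   "org.apache.hadoop.fs.azurebfs.oauth2.MsiTokenProvider")
    spark.conf.set(f"fs.azure.account.oauth2.msi.tenant.{host}", aadDirectoryId)
    spark.conf.set(f"fs.azure.account.oauth2.client.id.{host}", applicationClientId)

    # With the configs in place, container contents can be listed directly.
    display(dbutils.fs.ls(f"abfss://{containerName}@{host}/"))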

4 More Replies
thomas_berry
by New Contributor II
  • 721 Views
  • 3 replies
  • 2 kudos

Resolved! federated queries on PostgreSQL - TimestampNTZ option

Hello, I am trying to migrate some Spark reads away from JDBC to federated queries based in Unity Catalog. Here is an example of the Spark read command that I want to migrate: spark.read.format("jdbc").option("driver", "org.postgresql.Driver").opt...

Latest Reply
lingareddy_Alva
Honored Contributor II
  • 2 kudos

Thanks @thomas_berry, I hope so.

2 More Replies
bigger_dave
by New Contributor II
  • 248 Views
  • 1 replies
  • 0 kudos

create flow for streaming table

Hi Team. I'm following the example code to create flows, which is here. When I create the streaming table without a query (see code below): CREATE OR REFRESH STREAMING TABLE target_table; I get the error "The operation CREATE WITHOUT A QUERY is not allow...

Latest Reply
BigRoux
Databricks Employee
  • 0 kudos

The error "The operation CREATE WITHOUT A QUERY is not allowed: The operation is not supported on Streaming Tables" occurs because the CREATE OR REFRESH STREAMING TABLE statement requires a query to define the data source for the streaming table. Str...

Keremmm
by New Contributor II
  • 1175 Views
  • 1 replies
  • 3 kudos

Delta Lake Commit Versions: Are Gaps Possible?

Hi everyone, I'm exploring how commit versions work in Delta Lake and have a question regarding their sequencing. Specifically, I'm curious whether commit versions are guaranteed to be dense and sequential, or if there are scenarios where gaps might o...

Latest Reply
Vidhi_Khaitan
Databricks Employee
  • 3 kudos

Commit versions in Delta Lake are not guaranteed to be dense and sequential. There are scenarios where gaps might occur between version numbers. Specifically, the DELTA_VERSIONS_NOT_CONTIGUOUS error condition indicates that versions are not contiguou...
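
A small way to see this in practice: pull the version column from the table history and check for gaps (the table name below is a placeholder).

    from delta.tables import DeltaTable

    history = DeltaTable.forName(spark, "main.demo.events").history()
    versions = sorted(row["version"] for row in history.select("version").collect())
    # Non-contiguous numbers here correspond to the gaps discussed above.
    print(versions)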

soumend7115
by New Contributor
  • 216 Views
  • 1 replies
  • 0 kudos

Is there a way to permanently purge data in Databricks based on a certain condition?

Is there a way to permanently purge data in Databricks based on a certain condition? For example, from a particular Databricks table, I want to permanently purge certain rows based on a specific condition, e.g., WHERE <col1>="Val1" AND <col2>="Val2".

Latest Reply
Lucas_TBrabo
Databricks Employee
  • 0 kudos

Hi @soumend7115! I will assume you are talking about managed tables in Unity Catalog here; if that's not the case, let me know. We can segregate this into two steps: You can use a DELETE FROM SQL statement to remove rows that match your condition. For e...
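
The reply is truncated after the first step; a hedged sketch of the usual two-step pattern, assuming the second step is a VACUUM to remove the deleted rows' underlying files once the retention window allows (table and column names are placeholders):

    # Step 1: logically delete the matching rows.
    spark.sql("DELETE FROM main.demo.my_table WHERE col1 = 'Val1' AND col2 = 'Val2'")

    # Step 2: physically remove files no longer referenced, after the retention period.
    spark.sql("VACUUM main.demo.my_table RETAIN 168 HOURS")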

