Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Prajit0710
by New Contributor II
  • 585 Views
  • 1 reply
  • 0 kudos

Resolved! Authentication issue in HiveMetastore

Problem Statement: When I execute the below code as part of the notebook, both manually and in a workflow, it works as expected:
  df.write.mode("overwrite") \
    .format('delta') \
    .option('path', ext_path) \
    .saveAsTable("tbl_schema.Table_name")
but when I integr...

Latest Reply
lingareddy_Alva
Honored Contributor III
  • 0 kudos

Hi @Prajit0710 This is an interesting issue where your Delta table write operation works as expected when run directly, but when executed within a function, the table doesn't get recognized by the HiveMetastore. The key difference is likely related to ...

tebodelpino1234
by New Contributor
  • 3464 Views
  • 1 reply
  • 0 kudos

Can I view allow_expectations_col in Unity Catalog?

I am developing a DLT pipeline that manages expectations and it works correctly, but I need to see the columns __DROP_EXPECTATIONS_COL, __MEETS_DROP_EXPECTATIONS, and __ALLOW_EXPECTATIONS_COL in Unity Catalog. I can see them in the Delta table that the DLT generat...

Latest Reply
kamal_ch
Databricks Employee
  • 0 kudos

Materialization tables created by DLT include these columns to process expectations but they might not propagate to Unity Catalog representations such as views or schema-level metadata unless explicitly set up for such lineage or column-level exposur...
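For context, a minimal sketch of where these columns come from, using the DLT Python API (the source table raw_events is hypothetical): DLT records each expectation's outcome in internal columns such as __ALLOW_EXPECTATIONS_COL on the backing materialization table, which is why they appear on the Delta table but not necessarily in the Unity Catalog-facing schema.

  import dlt

  # Hypothetical DLT table with an expectation; the rule's per-row outcome is
  # tracked in internal columns on the materialization table.
  @dlt.table
  @dlt.expect_or_drop("valid_id", "id IS NOT NULL")
  def clean_events():
      return spark.readStream.table("raw_events")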

KS12
by New Contributor
  • 3644 Views
  • 1 reply
  • 0 kudos

Unable to get s3 data - o536.ls.

Error while executing:
  display(dbutils.fs.ls(f"s3a://bucket-name/"))
bucket-name has read/list permissions.
  shaded.databricks.org.apache.hadoop.fs.s3a.AWSClientIOException: getFileStatus on s3a://bucket-name/ com.amazonaws.SdkClientException: Unable to ex...

Latest Reply
kamal_ch
Databricks Employee
  • 0 kudos

To start with, enable SSL debug logging by passing the JVM option -Djavax.net.debug=ssl in the cluster configuration. This helps identify whether the handshake is failing due to missing certificates or invalid paths. Also check the cluster initialization sc...
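For reference, a minimal sketch of that cluster setting (entered under Compute > Advanced options > Spark config; the keys are standard Spark JVM-option properties, the value is the one from the reply):

  spark.driver.extraJavaOptions -Djavax.net.debug=ssl
  spark.executor.extraJavaOptions -Djavax.net.debug=ssl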

minhhung0507
by Valued Contributor
  • 3954 Views
  • 1 reply
  • 0 kudos

Error Listing Delta Log on GCS in Databricks

I am encountering an issue while working with a Delta table in Databricks. The error message is as follows:
  java.io.IOException: Error listing gs://cimb-prod-lakehouse/bronze-layer/dbd/customer_info_update_request_processing/_delta_log/
This issue occ...

Latest Reply
kamal_ch
Databricks Employee
  • 0 kudos

Ensure that the Databricks workspace has the necessary permissions to access the GCS bucket. Check if the service account used for Databricks has "Storage Object Viewer" or a similar role granted. Verify that the path "gs://cimb-prod-lakehouse/bronze...
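As a quick sanity check, a minimal sketch using the path from the post to confirm the cluster's service account can actually list the transaction log:

  # If this raises a permission error, the service account lacks the
  # "Storage Object Viewer" (or equivalent) role on the bucket.
  files = dbutils.fs.ls("gs://cimb-prod-lakehouse/bronze-layer/dbd/customer_info_update_request_processing/_delta_log/")
  display(files)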

DaPo
by New Contributor III
  • 3760 Views
  • 2 replies
  • 0 kudos

DLT Fails with Exception: CANNOT_READ_STREAMING_STATE_FILE

I have several DLT pipelines writing to a schema in Unity Catalog. The storage location of the Unity Catalog is managed by the Databricks deployment (on AWS). The schema and the DLT pipelines are managed via Databricks Asset Bundles. I did not cha...

Latest Reply
mani_22
Databricks Employee
  • 0 kudos

Hi @DaPo, Have you made any code changes to your streaming query? There are limitations on which changes in a streaming query are allowed between restarts from the same checkpoint location; refer to this documentation. The checkpoint location appears to ...
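To illustrate the underlying constraint, here is a sketch for plain Structured Streaming (not DLT specifically; names and paths are hypothetical): the usual escape from unreadable or incompatible state is a fresh checkpoint location, at the cost of reprocessing the source.

  # Pointing the writer at a new checkpoint directory discards the old state store.
  (spark.readStream.table("source_events")
      .writeStream
      .option("checkpointLocation", "/Volumes/main/default/chk/target_v2")  # new path = fresh state
      .toTable("target_events"))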

1 More Reply
oscarramosp
by New Contributor II
  • 1487 Views
  • 3 replies
  • 1 kudos

DLT Pipeline upsert question

Hello, I'm working on a DLT pipeline to build what would be a data warehouse/data mart. I'm facing issues trying to "update" my fact table when the dimensions that are outside the pipeline fail to be up to date at my processing time, so on the next r...

Latest Reply
Louis_Frolio
Databricks Employee
  • 1 kudos

The error encountered, "Cannot have multiple queries named catalog.schema.destination_fact for catalog.schema.destination_fact. Additional queries on that table must be named," arises because Delta Live Tables (DLT) disallows multiple unnamed queries...
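A minimal sketch of the naming the reply describes, using the DLT Python append-flow API (target and source table names are placeholders):

  import dlt

  # One target streaming table...
  dlt.create_streaming_table("destination_fact")

  # ...fed by multiple flows; each additional flow on the same target
  # needs an explicit name.
  @dlt.append_flow(target="destination_fact", name="initial_load")
  def initial_load():
      return spark.readStream.table("staging_fact")

  @dlt.append_flow(target="destination_fact", name="late_dimension_updates")
  def late_dimension_updates():
      return spark.readStream.table("staging_fact_late")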

2 More Replies
Zeruno
by New Contributor II
  • 4033 Views
  • 1 reply
  • 0 kudos

UDFs with modular code - INVALID_ARGUMENT

I am migrating a massive codebase to PySpark on Azure Databricks, using DLT pipelines. It is very important that the code be modular; that is, I am looking to make use of UDFs for the time being that use modules and classes. I am receiving the following...

Latest Reply
briceg
Databricks Employee
  • 0 kudos

Hi @Zeruno. What you can do is package up your code and pip install it in your pipeline. I had the same situation where I developed some code which ran fine in a notebook, but when used in a DLT pipeline, the deps were not found. Packaging them up an...
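A minimal sketch of that approach (the wheel path and module names are hypothetical):

  # At the top of the DLT pipeline's source notebook: install your packaged modules.
  %pip install /Volumes/main/default/libs/my_transforms-0.1.0-py3-none-any.whl

  # Then import and register UDFs from the package as usual.
  from pyspark.sql.functions import udf
  from my_transforms.names import normalize_name  # hypothetical module

  normalize_name_udf = udf(normalize_name)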

jlynlangford
by New Contributor
  • 857 Views
  • 1 reply
  • 0 kudos

collect() in SparkR and sparklyr

Hello, I'm seeing a vast difference in performance between SparkR::collect() and sparklyr::collect(). I have a somewhat complicated query that uses WITH ... AS syntax to get the data set I need; there are several views defined and joins required. The final data...

Latest Reply
niteshm
New Contributor III
  • 0 kudos

@jlynlangford This is a tricky situation, and multiple resolutions can be tried to address the performance gap. Schema complexity: if the DataFrame contains nested structs, arrays, or map types, collect() can become significantly slower due to complex...
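One knob worth checking (an assumption on my part, not from the truncated reply): Spark ships an Arrow-based serialization path for SparkR that often narrows exactly this kind of collect() gap, since sparklyr can use Arrow too:

  # Enable Arrow optimization for SparkR data conversions (off by default).
  spark.conf.set("spark.sql.execution.arrow.sparkr.enabled", "true")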

thomas_berry
by New Contributor II
  • 1436 Views
  • 3 replies
  • 2 kudos

Resolved! federated queries on PostgreSQL - TimestampNTZ option

Hello, I am trying to migrate some Spark reads away from JDBC to federated queries based in Unity Catalog. Here is an example of the Spark read command that I want to migrate:
  spark.read.format("jdbc").option("driver", "org.postgresql.Driver").opt...
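For context, a minimal sketch of the Lakehouse Federation equivalent once a Postgres connection and foreign catalog have been set up (the catalog, schema, and table names here are hypothetical):

  # Reads go through Unity Catalog instead of a hand-built JDBC options block.
  df = spark.read.table("pg_catalog.public.orders")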

Latest Reply
lingareddy_Alva
Honored Contributor III
  • 2 kudos

Thanks @thomas_berry I hope so 

2 More Replies
bigger_dave
by New Contributor II
  • 621 Views
  • 1 reply
  • 0 kudos

create flow for streaming table

Hi Team. I'm following the example code to create flows, which is here. When I create the streaming table without a query (see code below):
  CREATE OR REFRESH STREAMING TABLE target_table;
I get the error "The operation CREATE WITHOUT A QUERY is not allow...

Latest Reply
Louis_Frolio
Databricks Employee
  • 0 kudos

The error "The operation CREATE WITHOUT A QUERY is not allowed: The operation is not supported on Streaming Tables" occurs because the CREATE OR REFRESH STREAMING TABLE statement requires a query to define the data source for the streaming table. Str...

Keremmm
by New Contributor II
  • 3506 Views
  • 1 reply
  • 3 kudos

Delta Lake Commit Versions: Are Gaps Possible?

Hi everyone, I'm exploring how commit versions work in Delta Lake and have a question regarding their sequencing. Specifically, I'm curious whether commit versions are guaranteed to be dense and sequential, or if there are scenarios where gaps might o...

Latest Reply
Vidhi_Khaitan
Databricks Employee
  • 3 kudos

Commit versions in Delta Lake are not guaranteed to be dense and sequential. There are scenarios where gaps might occur between version numbers. Specifically, the DELTA_VERSIONS_NOT_CONTIGUOUS error condition indicates that versions are not contiguou...
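A minimal way to inspect the versions a table actually recorded (the table name is a placeholder); gaps, if any, show up directly in the version column:

  history = spark.sql("DESCRIBE HISTORY main.default.my_table")
  display(history.select("version", "timestamp", "operation").orderBy("version"))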

soumend7115
by New Contributor
  • 663 Views
  • 1 reply
  • 0 kudos

Is there a way to permanently purge data in Databricks based on a certain condition?

Is there a way to permanently purge data in Databricks based on a certain condition? Like, from a particular Databricks table, I want to permanently purge certain rows based on a specific condition, e.g., WHERE <col1>="Val1" AND <col2>="Val2"

Latest Reply
Lucas_TBrabo
Databricks Employee
  • 0 kudos

Hi @soumend7115! I will assume you are talking about managed tables in Unity Catalog here; if that's not the case, let me know. We can segregate this into two steps: you can use a DELETE FROM SQL statement to remove rows that match your condition. For e...
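A minimal sketch of the two steps (the table name is a placeholder; the condition comes from the post; the VACUUM step is the standard follow-up, assuming the truncated reply continues that way):

  # Step 1: logically delete the matching rows from the Delta table.
  spark.sql("""
      DELETE FROM main.default.my_table
      WHERE col1 = 'Val1' AND col2 = 'Val2'
  """)

  # Step 2: physically purge the deleted data files once they fall outside
  # the retention window (default 7 days = 168 hours).
  spark.sql("VACUUM main.default.my_table RETAIN 168 HOURS")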

hravilla
by New Contributor
  • 4780 Views
  • 2 replies
  • 0 kudos

Upload file to DBFS fails with error code 0

When trying to upload to DBFS from a local machine, getting an error: "Error occurred when processing file ... : Server responded with 0 code". DBR 7.3 LTS, Spark 3.0.1, Scala 2.12. Uploading the file using the "upload" in the Databricks cloud console, the c...

Latest Reply
LokeshManne
New Contributor III
  • 0 kudos

@PramodNaik The error you are facing is not because of file size but because the file you are trying to upload contains PII or SPII data: words like dob, Token, accesskey, password, etc. Solution: rename such data like date_of_birth, token_no, access_key...

1 More Reply
HelloDatabricks
by New Contributor II
  • 8415 Views
  • 6 replies
  • 8 kudos

Connect Timeout - Error when trying to run a cell

Hello everybody. Whenever I am trying to run a simple cell I receive the following error message now:
  Notebook detached. Exception when creating execution context: java.net.SocketTimeoutException: Connect Timeout.
After that error message the cluster ...

Latest Reply
LokeshManne
New Contributor III
  • 8 kudos

@HelloDatabricks The error you are facing is due to the notebook not being detached from the old cluster, which was terminated/auto-terminated; when you then run against a new cluster with the same name and config, the notebook still tries to connect to the old cluster ...

5 More Replies
Prashant777
by New Contributor II
  • 5228 Views
  • 3 replies
  • 0 kudos

UnsupportedOperationException: Cannot perform Merge as multiple source rows matched and attempted to modify the same

My Code:
  -- CREATE OR REPLACE TEMPORARY VIEW preprocessed_source AS
  SELECT
    Key_ID,
    Distributor_ID,
    Customer_ID,
    Customer_Name,
    Channel
  FROM integr_masterdata.Customer_Master;
  -- Step 2: Perform the merge operation using the preprocessed source table...

Latest Reply
LokeshManne
New Contributor III
  • 0 kudos

@Prashant777 In your scenario, in the update section you are trying to update the primary keys as well, which the Delta table can't differentiate when you re-run the same batch/file, so the merge throws an error because the rows all look like duplicates. To run without error/failure, remove (Target.Dis...
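Separately, the usual fix for the "multiple source rows matched" error itself is to de-duplicate the source on the merge key before the MERGE, so each target row matches at most one source row. A minimal sketch using the view and columns from the post (the ORDER BY tiebreaker is an assumption):

  # Keep exactly one source row per Key_ID before merging.
  spark.sql("""
      CREATE OR REPLACE TEMPORARY VIEW preprocessed_source AS
      SELECT Key_ID, Distributor_ID, Customer_ID, Customer_Name, Channel
      FROM (
          SELECT *,
                 ROW_NUMBER() OVER (PARTITION BY Key_ID ORDER BY Customer_ID) AS rn
          FROM integr_masterdata.Customer_Master
      )
      WHERE rn = 1
  """)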

2 More Replies
