Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

DaPo
by New Contributor III
  • 3738 Views
  • 2 replies
  • 0 kudos

DLT Fails with Exception: CANNOT_READ_STREAMING_STATE_FILE

I have several DLT pipelines writing to a schema in a Unity Catalog. The storage location of the Unity Catalog is managed by the Databricks deployment (on AWS). The schema and the DLT pipelines are managed via Databricks Asset Bundles. I did not cha...

Latest Reply
mani_22
Databricks Employee
  • 0 kudos

Hi @DaPo, have you made any code changes to your streaming query? There are limitations on which changes to a streaming query are allowed between restarts from the same checkpoint location; refer to the documentation. The checkpoint location appears to ...

1 More Replies
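mani_22's point about checkpoint compatibility suggests a practical pattern: when you must make a state-incompatible change, point the writer at a fresh checkpoint directory rather than reusing the old state. A minimal sketch in plain Python, with hypothetical paths (the actual writer call is shown only as a comment):

```python
def checkpoint_path(base: str, query_version: int) -> str:
    """Build a versioned checkpoint location.

    Bump query_version whenever the streaming query changes in a way
    that is not allowed against the existing state, so the stream
    starts from a fresh state directory instead of failing on restart.
    """
    return f"{base}/v{query_version}"

# Hypothetical usage in a Structured Streaming writer:
# df.writeStream.option("checkpointLocation", checkpoint_path("/Volumes/chk/orders", 2))
print(checkpoint_path("/Volumes/chk/orders", 2))  # /Volumes/chk/orders/v2
```

Note that starting from a fresh checkpoint reprocesses the source from scratch (or from a configured starting position), so plan for the rebuild.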
oscarramosp
by New Contributor II
  • 1470 Views
  • 3 replies
  • 1 kudos

DLT Pipeline upsert question

Hello, I'm working on a DLT pipeline to build what would be a data warehouse/data mart. I'm facing issues trying to "update" my fact table when the dimensions that are outside the pipeline fail to be up to date at my processing time, so on the next r...

Latest Reply
Louis_Frolio
Databricks Employee
  • 1 kudos

The error encountered, "Cannot have multiple queries named catalog.schema.destination_fact for catalog.schema.destination_fact. Additional queries on that table must be named," arises because Delta Live Tables (DLT) disallows multiple unnamed queries...

2 More Replies
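Louis_Frolio's explanation can be made concrete with a toy model (plain Python, no dlt involved): an unnamed flow defaults to the target table's name, so a second unnamed flow into the same table collides and must be given an explicit name. All names below are illustrative:

```python
class FlowRegistry:
    """Toy model of the DLT rule that additional flows writing to the
    same target table must be explicitly named."""

    def __init__(self):
        self._names = set()

    def register(self, target, name=None):
        # An unnamed flow defaults to the target table's name.
        flow_name = name or target
        if flow_name in self._names:
            raise ValueError(
                f"Cannot have multiple queries named {flow_name} for {target}. "
                "Additional queries on that table must be named."
            )
        self._names.add(flow_name)
        return flow_name

reg = FlowRegistry()
reg.register("catalog.schema.destination_fact")  # first flow: default name is fine
reg.register("catalog.schema.destination_fact", name="late_arriving_upserts")  # second flow: must be named
```

The same idea in a real pipeline is giving each extra append flow into the fact table its own `name`.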
Zeruno
by New Contributor II
  • 4030 Views
  • 1 reply
  • 0 kudos

UDFs with modular code - INVALID_ARGUMENT

I am migrating a massive codebase to PySpark on Azure Databricks, using DLT pipelines. It is very important that the code be modular; for the time being I am looking to make use of UDFs that use modules and classes. I am receiving the following...

Latest Reply
briceg
Databricks Employee
  • 0 kudos

Hi @Zeruno. What you can do is package up your code and pip install it in your pipeline. I had the same situation where I developed some code which ran fine in a notebook, but when used in a DLT pipeline the deps were not found. Packaging them up an...

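briceg's approach, sketched. The package layout and wheel path below are hypothetical; the check simply shows the failure mode (module not resolvable in the pipeline environment) that installing the wheel fixes:

```python
# Hypothetical project layout for the shared modules:
#   project/
#     pyproject.toml
#     src/mymodules/cleaning.py
#
# After building a wheel with `python -m build`, install it in the
# DLT pipeline notebook (or the pipeline environment) before importing:
#   %pip install /Workspace/Shared/wheels/mymodules-0.1.0-py3-none-any.whl
import importlib.util

def deps_available(module_name):
    """Report whether a module resolves in the current environment;
    False is the state that produced the missing-deps error."""
    return importlib.util.find_spec(module_name) is not None

print(deps_available("json"))       # stdlib module resolves everywhere
print(deps_available("mymodules"))  # False until the (hypothetical) wheel is installed
```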
jlynlangford
by New Contributor
  • 837 Views
  • 1 reply
  • 0 kudos

collect() in SparkR and sparklyr

Hello, I'm seeing a vast difference in performance between SparkR::collect() and sparklyr::collect(). I have a somewhat complicated query that uses WITH ... AS syntax to get the data set I need; there are several views defined and joins required. The final data...

Latest Reply
niteshm
New Contributor III
  • 0 kudos

@jlynlangford This is a tricky situation, and multiple resolutions can be tried to address the performance gap. Schema complexity: if the DataFrame contains nested structs, arrays, or map types, collect() can become significantly slower due to complex...

thomas_berry
by New Contributor II
  • 1415 Views
  • 3 replies
  • 2 kudos

Resolved! federated queries on PostgreSQL - TimestampNTZ option

Hello, I am trying to migrate some Spark reads away from JDBC to federated queries based in Unity Catalog. Here is an example of the spark read command that I want to migrate: spark.read.format("jdbc").option("driver", "org.postgresql.Driver").opt...

Latest Reply
lingareddy_Alva
Honored Contributor III
  • 2 kudos

Thanks @thomas_berry, I hope so.

2 More Replies
bigger_dave
by New Contributor II
  • 613 Views
  • 1 reply
  • 0 kudos

create flow for streaming table

Hi Team, I'm following the example code to create flows, which is here. When I create the streaming table without a query (see code below): CREATE OR REFRESH STREAMING TABLE target_table; I get the error "The operation CREATE WITHOUT A QUERY is not allow...

Latest Reply
Louis_Frolio
Databricks Employee
  • 0 kudos

The error "The operation CREATE WITHOUT A QUERY is not allowed: The operation is not supported on Streaming Tables" occurs because the CREATE OR REFRESH STREAMING TABLE statement requires a query to define the data source for the streaming table. Str...

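Louis_Frolio's point, sketched as SQL embedded in Python strings. Source paths, formats, and the flow name are hypothetical; this follows the create-the-table-with-a-query, then-add-a-flow pattern, but verify the exact syntax against the current docs:

```python
# The streaming table must be created WITH a defining query...
create_stmt = """
CREATE OR REFRESH STREAMING TABLE target_table
AS SELECT * FROM STREAM read_files('/Volumes/main/raw/events', format => 'json')
"""

# ...and an additional flow can then append into it.
flow_stmt = """
CREATE FLOW backfill_flow AS
INSERT INTO target_table BY NAME
SELECT * FROM STREAM read_files('/Volumes/main/raw/backfill', format => 'json')
"""

# spark.sql(create_stmt)  # run inside the pipeline, not shown here
print("AS SELECT" in create_stmt)
```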
Keremmm
by New Contributor II
  • 3477 Views
  • 1 reply
  • 3 kudos

Delta Lake Commit Versions: Are Gaps Possible?

Hi everyone, I'm exploring how commit versions work in Delta Lake and have a question regarding their sequencing. Specifically, I'm curious whether commit versions are guaranteed to be dense and sequential, or if there are scenarios where gaps might o...

Latest Reply
Vidhi_Khaitan
Databricks Employee
  • 3 kudos

Commit versions in Delta Lake are not guaranteed to be dense and sequential. There are scenarios where gaps might occur between version numbers. Specifically, the DELTA_VERSIONS_NOT_CONTIGUOUS error condition indicates that versions are not contiguou...

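Since gaps are possible, any consumer that walks the Delta transaction log version-by-version should tolerate them rather than assume contiguity. A small plain-Python helper that reports missing ranges in a list of observed commit versions:

```python
def find_version_gaps(versions):
    """Return (start, end) pairs of missing ranges between observed
    Delta commit versions, so callers can handle non-contiguous logs."""
    gaps = []
    ordered = sorted(versions)
    for prev, nxt in zip(ordered, ordered[1:]):
        if nxt - prev > 1:
            gaps.append((prev + 1, nxt - 1))
    return gaps

print(find_version_gaps([0, 1, 2, 5, 6, 9]))  # [(3, 4), (7, 8)]
```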
soumend7115
by New Contributor
  • 647 Views
  • 1 reply
  • 0 kudos

Is there a way to permanently purge data in Databricks based on a certain condition?

Is there a way to permanently purge data in Databricks based on a certain condition? Like, from a particular Databricks table, I want to permanently purge certain rows based on a specific condition, e.g., WHERE <col1>="Val1" AND <col2>="Val2"

Latest Reply
Lucas_TBrabo
Databricks Employee
  • 0 kudos

Hi @soumend7115! I will assume you are talking about managed tables in Unity Catalog here; if that's not the case, let me know. We can split this into two steps: you can use a DELETE FROM SQL statement to remove rows that match your condition. For e...

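Lucas_TBrabo's two steps, sketched as SQL strings (the table and column names are placeholders). DELETE alone keeps the old files reachable via time travel; VACUUM is what makes the purge permanent once the retention window has passed:

```python
table = "my_catalog.my_schema.my_table"  # hypothetical fully qualified name

# Step 1: logically delete the matching rows.
delete_stmt = f"""
DELETE FROM {table}
WHERE col1 = 'Val1' AND col2 = 'Val2'
"""

# Step 2: physically remove the underlying data files once they age
# past the retention threshold (default 7 days = 168 hours).
vacuum_stmt = f"VACUUM {table} RETAIN 168 HOURS"

# spark.sql(delete_stmt); spark.sql(vacuum_stmt)  # run on the cluster
print(delete_stmt.strip().startswith("DELETE FROM"))
```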
hravilla
by New Contributor
  • 4766 Views
  • 2 replies
  • 0 kudos

Upload file to DBFS fails with error code 0

When trying to upload to DBFS from a local machine, I am getting the error "Error occurred when processing file ... : Server responded with 0 code". DBR 7.3 LTS, Spark 3.0.1, Scala 2.12. Uploading the file using the "upload" in the Databricks cloud console, the c...

Latest Reply
LokeshManne
New Contributor III
  • 0 kudos

@PramodNaik The error you are facing is not because of file size but because the file you are trying to upload contains PII or SPII data: field names like dob, Token, accesskey, password, etc. Solution: rename such fields, e.g. date_of_birth, token_no, access_key...

1 More Replies
HelloDatabricks
by New Contributor II
  • 8352 Views
  • 6 replies
  • 8 kudos

Connect Timeout - Error when trying to run a cell

Hello everybody. Whenever I try to run a simple cell I now receive the following error message: Notebook detached. Exception when creating expectation context: java.net.SocketTimeoutException: Connect Timeout. After that error message the cluster ...

Latest Reply
LokeshManne
New Contributor III
  • 8 kudos

@HelloDatabricks The error you are facing is due to the notebook not being detached from the old cluster, which was terminated/auto-terminated; when you then run on a new cluster with the same name and config, the notebook tries to connect to the old cluster ...

5 More Replies
Prashant777
by New Contributor II
  • 5192 Views
  • 3 replies
  • 0 kudos

UnsupportedOperationException: Cannot perform Merge as multiple source rows matched and attempted to modify the same

My code: CREATE OR REPLACE TEMPORARY VIEW preprocessed_source AS SELECT Key_ID, Distributor_ID, Customer_ID, Customer_Name, Channel FROM integr_masterdata.Customer_Master; -- Step 2: Perform the merge operation using the preprocessed source table...

Latest Reply
LokeshManne
New Contributor III
  • 0 kudos

@Prashant777 In your UPDATE section you are trying to update the primary keys as well, which the Delta table can't differentiate when you re-run the same batch/file, so it throws an error because the rows all look like duplicates. To run without error, remove (Target.Dis...

2 More Replies
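The usual fix for "multiple source rows matched" is to make the source unique on the merge key before the MERGE, so each target row matches at most one source row. A plain-Python sketch of that dedup step (keeping the last occurrence per key, as one might with ROW_NUMBER() in SQL; the rows are illustrative):

```python
def dedupe_by_key(rows, key):
    """Keep one row per merge key (last occurrence wins), so the
    subsequent MERGE never sees two source rows for the same target row."""
    latest = {}
    for row in rows:
        latest[row[key]] = row
    return list(latest.values())

source = [
    {"Key_ID": 1, "Channel": "old"},
    {"Key_ID": 1, "Channel": "new"},  # duplicate key: would break the MERGE
    {"Key_ID": 2, "Channel": "web"},
]
print(dedupe_by_key(source, "Key_ID"))
```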
jomt
by New Contributor III
  • 4600 Views
  • 4 replies
  • 2 kudos

Error in SQL Warehouse: User is not part of org

I tried to start the Databricks SQL Warehouse cluster today, but received the following error message: Clusters are failing to launch. Cluster launch will be retried. Request to create a cluster failed with an exception: PERMISSION_DENIED: User xxxx is...

Latest Reply
akshay4996
New Contributor II
  • 2 kudos

Hi all, what you need to do is set a new owner. You can do this by clicking Permissions, then the setup icon, and choosing "Assign new owner". It works for me. Thanks!

3 More Replies
mjar
by New Contributor III
  • 6379 Views
  • 10 replies
  • 4 kudos

ModuleNotFoundError when using foreachBatch on runtime 14 with Unity

Recently we ran into an issue using foreachBatch after upgrading our Databricks cluster on Azure to runtime version 14 (Spark 3.5) with Shared access mode and Unity Catalog. The issue manifested as a ModuleNotFoundError being throw...

Latest Reply
dataeng42io
New Contributor III
  • 4 kudos

I am having the same issue using serverless compute. I think it comes from the limitations described in this documentation: https://docs.databricks.com/aws/en/structured-streaming/foreach#behavior-changes-for-foreachbatch-in-databricks-runtime-140

9 More Replies
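A common workaround for serialized callbacks like foreachBatch (a general pattern, not an official fix for this specific runtime change) is to resolve project modules inside the function body, so the import happens where the function actually executes rather than being captured from the driver notebook. `my_project.transforms` below is hypothetical; stdlib json stands in for it:

```python
def process_batch(batch_df, batch_id):
    # Import inside the function so the module is resolved at call time
    # on whatever worker runs it. ("my_project.transforms" is a
    # hypothetical project module; json stands in for it here.)
    import json  # stand-in for: from my_project import transforms
    return json.dumps({"batch_id": batch_id})

# Hypothetical streaming usage:
# df.writeStream.foreachBatch(process_batch).start()
print(process_batch(None, 7))
```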
SKakarla
by New Contributor
  • 1229 Views
  • 2 replies
  • 0 kudos

Notebooks owner shows 'Unknown'

Hi All, We are using CI/CD to deploy notebooks from GitHub and authenticating via Azure Service Principals (SPNs). Until last week, the notebook owner was correctly displayed as the SPN. However, over the past few days, the owner is now shown as "Unkn...

Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

Same here, but indeed it does not seem to have any impact at all. So I guess something changed in the Databricks backend, as suggested before.

1 More Replies
marcio_oliveira
by New Contributor II
  • 2017 Views
  • 3 replies
  • 2 kudos

Resolved! Job run failing to import modules

I have several notebooks that run code to ingest data from various APIs into our Data Warehouse. I have several modules that I reuse in multiple notebooks, things like redshift functions, string cleaning functions and json cleaning functions. Out of ...

[attached screenshot: marcio_oliveira_0-1747149522503.png]
Latest Reply
lingareddy_Alva
Honored Contributor III
  • 2 kudos

Hi @marcio_oliveira, thanks for sharing the error and the context. This intermittent module import issue in Databricks serverless jobs is a known behavior in some environments, and here's what's likely going wrong. Root cause: a race condition or cold-...

2 More Replies
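Given the suspected race condition / cold start, one defensive pattern (an assumption on my part, not an official Databricks remedy) is to retry the import briefly before failing the job:

```python
import importlib
import time

def import_with_retry(module_name, attempts=3, delay_s=1.0):
    """Retry an import a few times to ride out transient resolution
    failures (e.g. an environment still warming up). Re-raises the
    last ImportError if the module never appears."""
    for attempt in range(attempts):
        try:
            return importlib.import_module(module_name)
        except ImportError:
            if attempt == attempts - 1:
                raise
            time.sleep(delay_s)

math_mod = import_with_retry("math")
print(math_mod.sqrt(9.0))  # 3.0
```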
