Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Th0rs7en
by New Contributor
  • 110 Views
  • 1 replies
  • 0 kudos

"databricks bundle deploy" "Error: Failed to unzip with error: invalid compression method (400 <nil>

When trying to deploy a DAB (in WSL Terminal) I get the following error:
Error: Failed to unzip with error: invalid compression method (400 <nil>)
How can I fix this?

Latest Reply
balajij8
Contributor III
  • 0 kudos

It's typically caused by an incompatibility. You can follow the steps below:
Upgrade the CLI to the latest version (since you are in WSL):
curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh
If the CLI version is below 0.230 the upgrade...
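The version threshold in the reply can be checked mechanically before deciding to upgrade. A minimal sketch, assuming `databricks --version` prints something like "Databricks CLI v0.230.0" (the exact output format is an assumption, so the regex only looks for the first X.Y.Z number) and using the 0.230 threshold mentioned in the reply; `parse_cli_version` and `needs_upgrade` are hypothetical helper names:

```python
import re

# Threshold taken from the reply; the "vX.Y.Z" output format is an assumption.
MIN_VERSION = (0, 230, 0)


def parse_cli_version(output: str) -> tuple:
    """Extract (major, minor, patch) from text such as 'Databricks CLI v0.230.0'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError(f"no version number found in {output!r}")
    return tuple(int(part) for part in match.groups())


def needs_upgrade(output: str, minimum: tuple = MIN_VERSION) -> bool:
    """Return True when the parsed version is older than `minimum` (tuple comparison)."""
    return parse_cli_version(output) < minimum


if __name__ == "__main__":
    print(needs_upgrade("Databricks CLI v0.218.0"))
    print(needs_upgrade("Databricks CLI v0.230.1"))
```

Against a real install, the input would be the captured output of `subprocess.run(["databricks", "--version"], capture_output=True, text=True).stdout`.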

Surya2
by New Contributor III
  • 415 Views
  • 2 replies
  • 1 kudos

Resolved! Auto CDC Delete Propagation Issue: Streaming CDF Reads Don't Capture Delete Events from Auto CDC

Summary: I'm exploring GDPR delete propagation through a medallion architecture (Bronze → Silver → Gold) using Auto CDC with Change Data Feed. Delete events propagate successfully from Landing → Bronze, but fail to propagate from Bronze → Silver → Gold...

Latest Reply
Surya2
New Contributor III
  • 1 kudos

Hi Louis @Louis_Frolio, thank you very much for your comprehensive troubleshooting guidance. The references you shared, particularly the technical blog post on "Propagating Deletes...", were extremely helpful and contained information I had missed earli...
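The thread turns on how delete events in a Change Data Feed are consumed downstream. As a minimal, hedged sketch in pure Python (not the Auto CDC API): CDF rows carry a `_change_type` of `insert`, `update_preimage`, `update_postimage` or `delete` plus a `_commit_version`, and a consumer that only upserts will silently drop deletes. The key column `id` and the function name are hypothetical; in a Lakeflow pipeline this step is normally expressed declaratively (for example through an `apply_as_deletes` condition on the CDC flow) rather than hand-written.

```python
from typing import Any, Dict, List


def apply_change_feed(target: Dict[Any, dict], changes: List[dict], key: str = "id") -> Dict[Any, dict]:
    """Replay Change Data Feed rows onto a keyed snapshot, honouring deletes.

    Assumes each change row carries the CDF metadata columns `_change_type`
    and `_commit_version`; `key` is a hypothetical primary-key column.
    """
    result = dict(target)
    # Replay in commit order so a later delete wins over an earlier upsert.
    for change in sorted(changes, key=lambda c: c["_commit_version"]):
        change_type = change["_change_type"]
        # Strip metadata columns (those starting with "_") from the stored row.
        payload = {k: v for k, v in change.items() if not k.startswith("_")}
        if change_type in ("insert", "update_postimage"):
            result[change[key]] = payload
        elif change_type == "delete":
            # Skipping this branch is exactly how deletes get lost downstream.
            result.pop(change[key], None)
        # update_preimage rows describe the old value and are ignored.
    return result
```

For example, an insert of ids 1 and 2 at version 1 followed by a delete of id 1 at version 2 leaves only id 2 in the result.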

NathanG
by New Contributor
  • 148 Views
  • 1 replies
  • 0 kudos

Lakeflow Connect - Pending ‘full refresh’ process that needs to be removed in gateway pipeline.

Hello, we have the following issue that we have been unable to resolve.
Gateway pipeline: gw-replication-spain
Managed ingestion pipeline: pip-replication-spain
Source: SQL Server
Table: Gestiones
Target table: repl.00_landing.gestiones (deleted due to s...

Latest Reply
Yogasathyandrun
New Contributor II
  • 0 kudos

Based on the events you've shared, it does appear that the gateway is recognizing the configuration change (Tables removed: Gestiones) but is still attempting to process a previously initiated snapshot request for that table. A few things stand out: Th...

Yogasathyandrun
by New Contributor II
  • 389 Views
  • 2 replies
  • 3 kudos

Resolved! Detecting Photon fallback in-cluster + safe right-sizing from system tables

I'm prototyping a cluster cost / right-sizing advisor and wanted to get a reality check from people running Databricks at real scale before I sink more time into it. The main thing I'm chasing is Photon fallback. Photon quietly drops to the JVM on uns...

Latest Reply
Louis_Frolio
Databricks Employee
  • 3 kudos

Hey @Yogasathyandrun, I did some digging and would like to share some thoughts that you will hopefully find useful. You've mapped the boundary here more accurately than most people do, so let me give you a quick reality check on your four sticking points...

lachu
by New Contributor II
  • 230 Views
  • 4 replies
  • 0 kudos

SDP continuous mode

Hi, I was doing a POC and therefore used open-source Spark and Kafka in a Docker container, and got it working. The sample code ingests data from Kafka, but it runs only in batch mode; I'm not able to continuously ingest the Kafka stream. Question: Can w...

Latest Reply
lachu
New Contributor II
  • 0 kudos

Sample code that I used:
from pyspark import pipelines as dp
from pyspark.sql import DataFrame, SparkSession, functions as f
from pyspark.sql.types import StructType, StructField, StringType, IntegerType, DecimalType

spark = SparkSession.active()

@dp...

thedatacrew
by Databricks Partner
  • 278 Views
  • 3 replies
  • 3 kudos

Adhoc Table Refresh in Lakeflow Spark Declarative Pipelines (SDP)

Hi, it is currently not possible to specify a list of tables to refresh and their refresh policies (full/normal) in a Lakeflow Job. It can be done via the REST API, but it's messy. For example, if you need some tables or views refreshed more regularly, ...

Latest Reply
Yogasathyandrun
New Contributor II
  • 3 kudos

This is a real limitation in the current Lakeflow / DLT job model. Today, a pipeline is treated as the unit of refresh, not individual tables inside it. That means:
• You can run or fully refresh a pipeline
• But you cannot define different refresh policies...
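The REST API route mentioned in the question can be wrapped in a small helper so a job task only has to supply table lists. A minimal standard-library sketch, assuming the Pipelines "start update" endpoint (POST /api/2.0/pipelines/{pipeline_id}/updates) with its `refresh_selection` and `full_refresh_selection` body fields; the host, token, pipeline ID and table names are placeholders, and the function only builds the request without sending it:

```python
import json
import urllib.request


def build_update_request(host: str, token: str, pipeline_id: str,
                         refresh: list, full_refresh: list) -> urllib.request.Request:
    """Build (but do not send) a pipeline update request for selected tables.

    Assumes the endpoint and body field names described above; all argument
    values are caller-supplied placeholders.
    """
    body = {
        "refresh_selection": refresh,            # tables updated normally
        "full_refresh_selection": full_refresh,  # tables recomputed from scratch
    }
    return urllib.request.Request(
        url=f"{host.rstrip('/')}/api/2.0/pipelines/{pipeline_id}/updates",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned request to `urllib.request.urlopen` would actually start the update; separate job tasks could call the helper with different table lists to approximate per-table refresh policies.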

Databrickissue
by New Contributor
  • 148 Views
  • 1 replies
  • 0 kudos

DLT Issue

I have one DLT pipeline in Databricks. When I schedule the pipeline, the data is not showing. However, when I run the pipeline manually, the data is displayed properly.

Latest Reply
Yogasathyandrun
New Contributor II
  • 0 kudos

A few details would help narrow this down. When the scheduled run executes:
• Does the pipeline update show Succeeded or Failed?
• In the pipeline Event Log, do you see rows being processed/written?
• Is your manual run a normal update or a Full Refresh?
• Is the...

Ericsson
by New Contributor II
  • 7079 Views
  • 4 replies
  • 1 kudos

SQL week format issue: it's not showing the result as 01 (ww)

Hi folks, I have a requirement to show the week number in ww format. Please see the code below:
select weekofyear(date_add(to_date(current_date, 'yyyyMMdd'), +35))
Also, please refer to the screenshot for the result.

[Screenshot: result]
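weekofyear returns an integer, so week 1 comes back as 1 rather than 01; the two-digit form needs explicit padding, for example lpad(cast(weekofyear(...) as string), 2, '0') in Spark SQL. Spark documents weekofyear as ISO-style (weeks start on Monday and week 1 is the first week with more than 3 days), which matches Python's date.isocalendar(). A minimal Python sketch of the same padding logic, with week_ww as a hypothetical helper name:

```python
from datetime import date, timedelta


def week_ww(d: date) -> str:
    """Return the ISO-8601 week number of `d` as a two-digit string, e.g. '01'."""
    iso_week = d.isocalendar()[1]  # Monday-start weeks; week 1 has at least 4 days
    return f"{iso_week:02d}"


# Mirrors weekofyear(date_add(current_date, 35)) from the question, zero-padded.
print(week_ww(date.today() + timedelta(days=35)))
# 2024-01-01 is a Monday, so 2024-01-03 falls in ISO week 1.
print(week_ww(date(2024, 1, 3)))
```

Note that ISO weeks can belong to the neighbouring year: 2021-01-01 is in week 53 and 2024-12-30 is in week 01.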
Latest Reply
Aidutchinso
New Contributor II
  • 1 kudos

"I've been exploring different communities lately, and honestly, connecting with people who share your interests makes all the difference. Whether it's diving deep into data engineering discussions or just having random conversations on platforms lik...

deepak05
by Contributor
  • 43762 Views
  • 12 replies
  • 13 kudos

Resolved! I got 70.00% on the Databricks Certified Data Engineer Professional exam but failed...

Hi everyone, today I took the Databricks exam. I got 64 questions and my result was exactly 70.00% (as per Databricks, the pass percentage is 70% or above), but the status still showed Failed and I couldn't get certified. Can anyone help me on...

Latest Reply
halliekohler
New Contributor II
  • 13 kudos

Congratulations on this achievement! Reaching this milestone feels incredibly rewarding. I had a similar experience, and quality practice resources from https://linkly.link/2l2Hb were very helpful throughout my preparation journey.

genie
by New Contributor
  • 162 Views
  • 1 replies
  • 0 kudos

Genie Code hallucinates CLI commands

I want to run some SQL commands programmatically against and decided to use Genie Code to help me, but it came up with unsupported and non-existent commands.

[Screenshot of the Genie Code suggestion]
Latest Reply
Yogasathyandrun
New Contributor II
  • 0 kudos

The command shown in the screenshot appears to be hallucinated: databricks sql-statements execute is not a valid Databricks CLI command. It looks like Genie combined concepts from the SQL Statement Execution API with CLI syntax that doesn't actually e...
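If the goal is simply to run SQL programmatically, the SQL Statement Execution API that the reply mentions can be called directly. A minimal standard-library sketch, assuming POST /api/2.0/sql/statements with `warehouse_id`, `statement` and `wait_timeout` body fields; the host, token and warehouse ID are placeholders and the helper only builds the request. Recent CLI versions can send the same JSON body through the generic `databricks api post` command, but check that your CLI version provides it.

```python
import json
import urllib.request


def build_statement_request(host: str, token: str, warehouse_id: str, sql: str) -> urllib.request.Request:
    """Build (not send) a SQL Statement Execution API call.

    Assumes the endpoint and body fields described above; all arguments are
    caller-supplied placeholders.
    """
    body = {
        "warehouse_id": warehouse_id,
        "statement": sql,
        "wait_timeout": "30s",  # wait up to 30 s for a result before returning a pending state
    }
    return urllib.request.Request(
        url=f"{host.rstrip('/')}/api/2.0/sql/statements",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )
```

Sending it with `urllib.request.urlopen` returns a JSON document whose status field indicates whether the statement finished within the wait window.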

Maxrb
by New Contributor III
  • 376 Views
  • 4 replies
  • 3 kudos

Resolved! Autoloader [FAILED_READ_FILE.PARQUET_COLUMN_DATA_TYPE_MISMATCH]

Hi, I am using Auto Loader to load Parquet files into my Unity Catalog with the following settings:
.option("cloudFiles.format", "parquet")
.option("cloudFiles.inferColumnTypes", "true")
.option("cloudFiles.schemaEvolutionMode", "addNewColumnsWithTypeWi...

Latest Reply
Yogasathyandrun
New Contributor II
  • 3 kudos

What you're seeing comes down to where the type mismatch is detected. For Parquet, some mismatches can be handled at the Auto Loader layer and end up in _rescued_data, while others fail earlier inside the Parquet reader itself. In your example, the exi...

shan-databricks
by Databricks Partner
  • 214 Views
  • 3 replies
  • 1 kudos

How to store credentials in Databricks and assign them to job parameters

I am using SQL Server, Postgres, and MongoDB as data sources, connecting through Spark and the JDBC connector. I would like to store the credentials and connection details in Databricks and pass them as job parameters, and I need guidance on possible approach...

Latest Reply
Yogasathyandrun
New Contributor II
  • 1 kudos

I'd think about this as a separation of concerns:
• Secrets are for sensitive values (usernames, passwords, tokens, connection URIs).
• Job parameters are for runtime values (connection name, database, schema, table, query, collection, source system).
In mo...
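The separation of concerns described in the reply can be sketched as a helper that takes non-sensitive job parameters and looks everything sensitive up from a secret scope. This is a minimal sketch under stated assumptions: the parameter names (connection, source, database, table), the secret keys (host, port, username, password) and the choice of using the connection name as the scope name are all hypothetical. The secret getter is injected so the code runs outside Databricks; there it would typically wrap dbutils.secrets.get(scope=..., key=...). The returned dict uses the standard Spark JDBC options url, dbtable, user and password.

```python
from typing import Callable, Dict

# Hypothetical getter signature: (scope, key) -> secret value.
SecretGetter = Callable[[str, str], str]

JDBC_TEMPLATES = {
    "sqlserver": "jdbc:sqlserver://{host}:{port};databaseName={database}",
    "postgres": "jdbc:postgresql://{host}:{port}/{database}",
}


def jdbc_options(params: Dict[str, str], get_secret: SecretGetter) -> Dict[str, str]:
    """Combine runtime job parameters with secrets into Spark JDBC reader options."""
    scope = params["connection"]  # assumption: one secret scope per connection
    url = JDBC_TEMPLATES[params["source"]].format(
        host=get_secret(scope, "host"),
        port=get_secret(scope, "port"),
        database=params["database"],
    )
    return {
        "url": url,
        "dbtable": params["table"],
        "user": get_secret(scope, "username"),
        "password": get_secret(scope, "password"),
    }
```

On Databricks the result might be passed to spark.read.format("jdbc").options(**opts).load(), with the job parameters read via dbutils.widgets.get in a notebook task. MongoDB is left out because it is not read through JDBC.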

Nick_Hughes
by New Contributor III
  • 17573 Views
  • 5 replies
  • 1 kudos

Best way to generate fake data using underlying schema

Hi, we are trying to generate fake data to run our tests. For example, we have a pipeline that creates a gold layer fact table from 6 underlying source tables in our silver layer. We want to generate the data in a way that recognises the relationships ...

Latest Reply
savlahanish27
Databricks Partner
  • 1 kudos

The core problem you're facing is that Delta Lake doesn't enforce foreign key constraints, so most datagen tools generate each table independently and your joins produce no meaningful overlap. The solution is to generate a shared key pool first - a si...
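The shared-key-pool idea from the reply can be sketched in pure Python: create the parent keys once, then have every child table sample its foreign keys from that pool so joins overlap by construction. The table and column names are hypothetical, and the resulting lists of dicts could then be passed to spark.createDataFrame to build test tables:

```python
import random


def generate_related_tables(n_customers: int, n_orders: int, seed: int = 42):
    """Generate a dimension and a fact table whose foreign keys always match.

    The key pool is created first; the fact table only samples from it, so
    every join on customer_id finds a row. Column names are hypothetical.
    """
    rng = random.Random(seed)  # seeded so the test data is reproducible
    key_pool = list(range(1, n_customers + 1))
    customers = [
        {"customer_id": k, "segment": rng.choice(["retail", "corporate"])}
        for k in key_pool
    ]
    orders = [
        {
            "order_id": i,
            "customer_id": rng.choice(key_pool),  # foreign key drawn from the shared pool
            "amount": round(rng.uniform(5, 500), 2),
        }
        for i in range(1, n_orders + 1)
    ]
    return customers, orders
```

The same pattern extends to several silver tables: generate each parent pool once and let every child table draw from the relevant pools.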

ConnorK
by Databricks Partner
  • 260 Views
  • 3 replies
  • 2 kudos

Databricks Standard SharePoint Connector Performance Issues

I've recently started using the Databricks Standard SharePoint connector within my workspace and have run into some significant performance issues. My notebook does a straightforward read using the following:
lakeflow_connection_name = 'sharepoint_dev'...

Latest Reply
Yogasathyandrun
New Contributor II
  • 2 kudos

I think your diagnosis is likely correct. One thing that stands out is that you’re only reading A1:Z2 from each workbook. Given that the operation is still taking 40+ minutes, the bottleneck is unlikely to be the Excel parsing itself and more likely t...
