Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

kmenke-em
by New Contributor II
  • 1132 Views
  • 1 replies
  • 1 kudos

Resolved! CHAR/VARCHAR fields sometimes show as STRING in a view

We've found an interesting behavior where `char` and `varchar` fields in a table show as the `string` type in a view. Consider the following table and view: create or replace table thirty_day_tables.kit_varchar_string ( str1 string, str2 char(10),...

Latest Reply
UmaMahesh1
Honored Contributor III
  • 1 kudos

In Spark SQL, string is the canonical type for all textual data. char(n) and varchar(n) are parsed and stored as metadata, but internally treated as string. When you create a view, Spark does not preserve the original char(n) or varchar(n) types — it nor...
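A minimal sketch of the behavior described above (schema, table, and view names are placeholders):

```sql
-- Table declares CHAR/VARCHAR types explicitly.
CREATE OR REPLACE TABLE demo.kit_varchar_string (
  str1 STRING,
  str2 CHAR(10),
  str3 VARCHAR(20)
);

CREATE OR REPLACE VIEW demo.kit_varchar_view AS
SELECT * FROM demo.kit_varchar_string;

-- DESCRIBE on the table reports char(10)/varchar(20) (kept as metadata),
-- while DESCRIBE on the view typically reports string for str2 and str3.
DESCRIBE demo.kit_varchar_view;
```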

Ranga_naik1180
by New Contributor III
  • 11765 Views
  • 7 replies
  • 5 kudos

Resolved! Delta Live table

Hi All, I'm working on a Databricks Delta Live Tables (DLT) pipeline where we receive daily full-snapshot CSV files in Azure cloud storage. These files contain HR data (e.g. an employee file), and I'm using Auto Loader to ingest them into a bronze-layer DLT tab...

Latest Reply
nikhilj0421
Databricks Employee
  • 5 kudos

Hi @Ranga_naik1180, There is no need to create an intermediate view in SQL. You can directly read the change data feed from silver into the gold table. You can use something like the code below: CREATE STREAMING LIVE TABLE gold_table AS SELECT * FRO...
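The truncated snippet above might look something like this complete sketch (the table names and the change-data-feed starting version are placeholders; verify the CDF syntax against your DLT runtime):

```sql
-- Read the change data feed from the silver table directly into gold,
-- keeping only inserts and post-update images.
CREATE OR REFRESH STREAMING LIVE TABLE gold_table AS
SELECT *
FROM table_changes('silver_table', 1)
WHERE _change_type IN ('insert', 'update_postimage');
```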

carlos_tasayco
by Contributor
  • 479 Views
  • 1 replies
  • 0 kudos

Columns showing as masked when they should not be

In my organization we mask some columns because they are PII. I have a DLT pipeline, and I am masking these columns like this: CASE WHEN is_account_group_member("BDAIM-{environment.upper()}-PII_Unmask") THEN Personshopper.firstName ELSE mask(Personshopper....

Latest Reply
nikhilj0421
Databricks Employee
  • 0 kudos

Hi @carlos_tasayco, this is supported for materialized views. Please check the document to confirm if you're using the right syntax:  https://docs.databricks.com/aws/en/dlt-ref/dlt-sql-ref-create-materialized-view#examples  Please let me know if you ...
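For reference, a hedged sketch of the Unity Catalog column-mask pattern the post describes (the function, group, table, and column names here are placeholders):

```sql
-- Mask function: members of the unmask group see the real value,
-- everyone else sees a redacted literal.
CREATE OR REPLACE FUNCTION pii_mask(val STRING)
RETURN CASE
  WHEN is_account_group_member('PII_Unmask') THEN val
  ELSE '***'
END;

-- Attach the mask to the column.
ALTER TABLE personshopper ALTER COLUMN firstName SET MASK pii_mask;
```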

alexbarev
by New Contributor II
  • 1466 Views
  • 2 replies
  • 0 kudos

Very Slow UDF Execution on One Cluster Compared to Another with Similar Config

Hi all, I'm experiencing a significant slowdown in Python UDF execution times on a particular cluster. The same code runs much faster on another cluster with very similar hardware and policy settings. This cell takes 2–3 minutes on the problem...

Latest Reply
SP_6721
Honored Contributor II
  • 0 kudos

Hi @alexbarev ,The slowdown is likely due to using Python UDFs on a Shared (Standard) access mode cluster with Unity Catalog, which adds extra security and isolation overhead. Using a Dedicated access mode cluster removes the extra isolation overhead...

Pavankumar7
by New Contributor III
  • 2828 Views
  • 1 replies
  • 0 kudos

Resolved! Difference between Community Edition and the Free Edition of the Databricks platform

Recently there was news from the Data + AI Summit mentioning a Free Edition of the Databricks platform; how is it different from Community Edition? Follow-up question: is there any limitation on compute resources? Will it support cloud service providers apart...

Latest Reply
ilir_nuredini
Honored Contributor
  • 0 kudos

Hello Pavankumar, Regarding your questions: The difference between Free Edition and CE: "Free Edition has been designed and extended to include full access to the Data Intelligence Platform. It provides an easy-to-use environment where you can build AI...

Parth2692
by New Contributor II
  • 971 Views
  • 6 replies
  • 0 kudos

Experiencing sorting problems with bigint columns

Experiencing sorting problems with bigint columns across the tables tested. Example: in the table, projectid (bigint as per the schema, which is correct) is sorted in SQL as if it were a string value; 1000903 is returned as the minimum projectid ...

Latest Reply
EktaPuri
New Contributor III
  • 0 kudos

Try running the EXPLAIN command to see what's happening in the background. Also, if the table is small, try writing it to another table and check.
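A quick way to test the usual cause of this symptom (table and column names are placeholders): if the column was actually ingested as a string, ORDER BY sorts lexically, and an explicit cast restores numeric order.

```sql
-- Confirm the physical column type first.
DESCRIBE my_table;

-- Lexical vs numeric ordering: as strings, '1000903' sorts before '999'.
-- Casting forces numeric comparison.
SELECT projectid
FROM my_table
ORDER BY CAST(projectid AS BIGINT);
```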

stefan-vulpe
by New Contributor II
  • 1399 Views
  • 2 replies
  • 1 kudos

Resolved! Batch Python UDFs in Unity Catalog and Spark SQL

Hello datanauts, I'm encountering a conceptual challenge regarding Batch Python UDFs within Spark SQL in Databricks. My primary question is: can Batch Python UDFs be used directly via Spark SQL? As a Databricks beginner, I'm seeking to understand ...

Data Engineering
spark sql
udf
Unity Catalog
Latest Reply
lingareddy_Alva
Honored Contributor III
  • 1 kudos

Hi @stefan-vulpe, Looking at your code and the behavior you're describing, I can identify the core issue and provide some insights about Batch Python UDFs in Databricks. The core problem: the issue you're encountering is related to session isolation and ...
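For context, a Unity Catalog Python UDF can be registered and then called from Spark SQL along these lines (the catalog, schema, and function names are placeholders):

```sql
-- Register a Python UDF in Unity Catalog.
CREATE OR REPLACE FUNCTION main.default.to_upper(s STRING)
RETURNS STRING
LANGUAGE PYTHON
AS $$
return s.upper() if s else None
$$;

-- Call it from plain Spark SQL.
SELECT main.default.to_upper('hello') AS shouted;
```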

MauricioS
by New Contributor III
  • 767 Views
  • 1 replies
  • 1 kudos

Is it possible to reprocess only a portion of a streaming table data using DLT?

Hi all, Currently I have a standard notebook that takes two dates as parameters, a start date and an end date; it goes to the source and pulls only that portion of data, then on the target table it deletes if necessary (if data within those ranges exists) the u...

Latest Reply
lingareddy_Alva
Honored Contributor III
  • 1 kudos

Hi @MauricioS, Yes, you can achieve similar reprocessing functionality with DLT streaming tables, but it requires a different approach than your current batch process. Here are the main strategies: 1. CDC pattern with tombstone records: the most common ap...
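The CDC pattern mentioned above is typically expressed in DLT SQL with APPLY CHANGES INTO; a hedged sketch (the table, key, operation, and sequence column names are placeholders, so adapt them to your feed):

```sql
CREATE OR REFRESH STREAMING TABLE target;

-- Upsert changes from the source stream; deletes arrive as tombstone rows.
APPLY CHANGES INTO LIVE.target
FROM STREAM(LIVE.source_changes)
KEYS (employee_id)
APPLY AS DELETE WHEN operation = 'DELETE'
SEQUENCE BY event_ts
STORED AS SCD TYPE 1;
```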

juancanocondes
by New Contributor III
  • 2534 Views
  • 1 replies
  • 0 kudos

Resolved! Connection with Azure service principal

I have configured a Databricks connection with Azure using a service principal that is configured in both the Azure system and Databricks. I have been getting an issue calling this API since early June: https://$URI$/api/2.0/secrets/scopes/list Headers = @{ ...

Latest Reply
juancanocondes
New Contributor III
  • 0 kudos

For anyone interested, this particular case was due to migrating Azure PowerShell to a new version. This is the link to follow, just in case: https://learn.microsoft.com/en-us/powershell/azure/migrate-az-14.0.0?view=azps-14.1.0

IGRACH
by New Contributor III
  • 773 Views
  • 2 replies
  • 1 kudos

Resolved! Specifying output mode and path when using foreachBatch

Since .foreachBatch() is "hijacking" the stream and executing arbitrary code in it, do I need to specify output mode and path? (df.writeStream .format("delta") .trigger(availableNow=True) .option("checkpointLocation", "check_point_location") .forea...

Latest Reply
Branislav
New Contributor II
  • 1 kudos

Thanks xD

ClarkElliott
by New Contributor
  • 4146 Views
  • 1 replies
  • 0 kudos

Parquet file for delta streaming live table with pipeline

I am having an issue with parquet files: I'm getting an Illegal Parquet type: INT64 (TIMESTAMP(NANOS,false)) error while trying to read a parquet file (generated outside of Databricks). I am using a Delta streaming live table with a pipeline. If I r...

Latest Reply
Saritha_S
Databricks Employee
  • 0 kudos

Hi @ClarkElliott  Good day!! Cause: Databricks Runtime 11.3 LTS and above, like open source Apache Spark, does not support the TIMESTAMP_NANOS type. If a Parquet file contains fields with the TIMESTAMP_NANOS type, attempts to...
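One commonly suggested workaround, if the pipeline can tolerate it, is to read the NANOS field as a long rather than a timestamp. A hedged sketch (the legacy conf comes from SPARK-40819 and may not be available or honored on every runtime, so verify it against yours; the path is a placeholder):

```sql
-- Read INT64 TIMESTAMP(NANOS) columns as BIGINT instead of failing.
SET spark.sql.legacy.parquet.nanosAsLong = true;

SELECT * FROM parquet.`/path/to/external/files`;
```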

shavya
by New Contributor
  • 4723 Views
  • 1 replies
  • 0 kudos

Where are default temporary checkpoint locations created for streaming queries with display command?

Hello!I created a streaming query using Auto Loader to read data from S3 and used display command to see if the query was working. Initially, cloudFiles.includeExistingFiles was set to True, but since we have data in Glacier that needs to be retrieve...

Latest Reply
Saritha_S
Databricks Employee
  • 0 kudos

Hi @shavya  Good day!! When you do not specify a checkpointLocation in a streaming query in Databricks, it uses a temporary system directory such as: dbfs:/local_disk0/tmp/temporary-<random_uuid>. To remove the temporary checkpoint, please ...

lprevost
by Contributor III
  • 1048 Views
  • 1 replies
  • 0 kudos

Streaming query error - [STREAMING_STATEFUL_OPERATOR_NOT_MATCH_IN_STATE_METADATA]

[STREAM_FAILED] Query [id = 6a821fbc-490b-4ad8-891d-e4cacc2af1d6, runId = e055fede-8012-4369-861b-47183999e91d] terminated with exception: [STREAMING_STATEFUL_OPERATOR_NOT_MATCH_IN_STATE_METADATA] Streaming stateful operator name does not match with ...

Latest Reply
Saritha_S
Databricks Employee
  • 0 kudos

Hi @lprevost  Good day!! Please find below my analysis for your issue.  Error: [STREAM_FAILED] Query [id = 6a821fbc-490b-4ad8-891d-e4cacc2af1d6, runId = e055fede-8012-4369-861b-47183999e91d] terminated with exception: [STREAMING_STATEFUL_OPERATOR_NOT...

Klusener
by Contributor
  • 1087 Views
  • 1 replies
  • 3 kudos

Resolved! Handling partition overwrite in Liquid Clustering

Hello, Currently we have delta tables in the TBs, partitioned by year, month, day. We perform dynamic partition overwrite using partitionOverwriteMode set to dynamic to handle reruns/corrections. With liquid clustering, since explicit partitions are not require...

Latest Reply
Saritha_S
Databricks Employee
  • 3 kudos

Hi @Klusener  Good day!! Dynamic partition overwrite only supports selective overwrites for partitioned columns, not for liquid clustering or regular columns. If you know the exact predicates, use replaceWhere. Note: this is not possible without knowin...
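The replaceWhere approach mentioned above looks like this sketch in SQL (the table names and the date predicate are placeholders):

```sql
-- Selectively overwrite only the rows matching the predicate,
-- instead of relying on dynamic partition overwrite.
INSERT INTO events
REPLACE WHERE event_date >= '2024-01-01' AND event_date < '2024-02-01'
SELECT * FROM staging_corrections;
```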

Malthe
by Contributor III
  • 956 Views
  • 1 replies
  • 1 kudos

Resolved! Unable to add primary key constraint to nullable identity column

While we can in fact define a primary key during table creation for an identity column that's nullable (i.e., not constrained using NOT NULL), it's not possible to add such a primary key constraint after the table has been created.We get an error mes...

Latest Reply
amuchoudhary
New Contributor III
  • 1 kudos

Creating a table with a nullable IDENTITY column and defining the primary key at creation time works.The database quietly interprets the column as NOT NULL for the purposes of the primary key, even though it's technically defined as nullable (i.e., n...
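A sketch of the working creation-time pattern versus the failing after-the-fact ALTER (schema and table names are placeholders):

```sql
-- Works at creation time, even though id is not declared NOT NULL:
CREATE OR REPLACE TABLE demo.t (
  id BIGINT GENERATED ALWAYS AS IDENTITY,
  payload STRING,
  CONSTRAINT t_pk PRIMARY KEY (id)
);

-- Adding the same constraint after creation fails unless the column
-- is made NOT NULL first:
-- ALTER TABLE demo.t ALTER COLUMN id SET NOT NULL;
-- ALTER TABLE demo.t ADD CONSTRAINT t_pk PRIMARY KEY (id);
```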
