Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
I am looking for functionality similar to Snowflake's, which allows attaching a masking policy to an existing column. The documents I found relate to masking with encryption, but my use case is on an existing table. Solutions using views along with Dynamic Vie...
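The post above is truncated, but one way to attach a mask to an existing column is a Unity Catalog column mask. A minimal sketch, assuming Unity Catalog is enabled; the table users, column email, and group admins are illustrative names, not from the post:

# Runs in a Databricks notebook where `spark` is already defined.
# Table, column, and group names below are assumptions for illustration.
spark.sql("""
    CREATE OR REPLACE FUNCTION email_mask(email STRING)
    RETURNS STRING
    RETURN CASE
        WHEN is_account_group_member('admins') THEN email
        ELSE '***MASKED***'
    END
""")

# Attach the mask to an existing column without rewriting the table.
spark.sql("ALTER TABLE users ALTER COLUMN email SET MASK email_mask")

A dynamic view with the same is_account_group_member() check is the usual alternative where column masks are not available.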
We are facing a similar issue while writing into an ADLS location in Delta format; after that we created Unity Catalog tables on top of the Delta location. Should it be possible to change the data type length in the format below, and is it supported by Spark SQL? Azure SQL Spark ...
Rename and drop columns with Delta Lake column mapping. Hi all, Databricks now supports column rename and drop. Column mapping requires the following Delta protocols: Reader version 2 or above; Writer version 5 or above. Blog URL. Available in D...
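For reference, a minimal sketch of enabling column mapping and then renaming and dropping a column; the table and column names (events, old_name, new_name, obsolete_col) are assumptions:

# Assumes a Databricks notebook where `spark` is already defined.
spark.sql("""
    ALTER TABLE events SET TBLPROPERTIES (
        'delta.minReaderVersion' = '2',
        'delta.minWriterVersion' = '5',
        'delta.columnMapping.mode' = 'name'
    )
""")

# With column mapping enabled, rename and drop are metadata-only operations.
spark.sql("ALTER TABLE events RENAME COLUMN old_name TO new_name")
spark.sql("ALTER TABLE events DROP COLUMN obsolete_col")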
Hi, I have a large Delta table of IoT data for over 10K different sensors, with timestamp, sensor name, and value columns at 1 second precision. The query pattern is usually a random 5-100 sensors at a time, but typically involves a specific year/month/day interval...
@numersoz did you Z-order on the timestamp column or on less granular columns, like Year, Month, or Day? The timestamp column is very granular (high cardinality) since it also includes hour, minute, second...
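To illustrate the suggestion above, a hedged sketch of Z-ordering on a lower-cardinality date column plus the sensor name instead of the raw timestamp; the names iot_readings, event_date, and sensor_name are assumptions:

# Assumes a Databricks notebook where `spark` is already defined.
# Z-ordering on columns that match the query pattern (day + sensor) usually
# gives better file skipping than the high-cardinality timestamp itself.
spark.sql("OPTIMIZE iot_readings ZORDER BY (event_date, sensor_name)")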
Hi @THIAM HUAT TAN, the issue is that the schema defined for the column "Rainfall_Value" is DoubleType while the values present in the data frame are of integer type. This could be because of one or multiple values. Depending on the data, you ...
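A common fix for this kind of mismatch is to cast the column explicitly before writing; a minimal sketch, where df stands for the dataframe from the post:

from pyspark.sql import functions as F
from pyspark.sql.types import DoubleType

# Cast the integer values so the column matches the declared DoubleType schema.
df = df.withColumn("Rainfall_Value", F.col("Rainfall_Value").cast(DoubleType()))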
Here's my use case: I'm migrating out of an old DWH into Databricks. When moving dimension tables into Databricks, I'd like the old SKs (surrogate keys) to be maintained, while creating the SK column as an IDENTITY column, so new dimension values get a...
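A sketch of one way to approach this, with the table names dim_customer and legacy_dwh.dim_customer as assumptions: declare the SK as GENERATED BY DEFAULT (so explicit values are allowed), load the old keys as-is, then sync the identity high-water mark so newly inserted rows continue past the migrated maximum:

# Assumes a Databricks notebook where `spark` is already defined.
spark.sql("""
    CREATE TABLE dim_customer (
        customer_sk BIGINT GENERATED BY DEFAULT AS IDENTITY,
        customer_name STRING
    )
""")

# Load legacy rows with their original surrogate keys preserved.
spark.sql("""
    INSERT INTO dim_customer (customer_sk, customer_name)
    SELECT customer_sk, customer_name FROM legacy_dwh.dim_customer
""")

# Move the identity generator past the explicitly inserted keys so new rows
# do not collide with the migrated ones.
spark.sql("ALTER TABLE dim_customer ALTER COLUMN customer_sk SYNC IDENTITY")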
Hi @anand R, hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so we ca...
I have the following code in a notebook. It is randomly giving me the error, "At least one column must be specified for the table." The error occurs (if it occurs at all) only on the first run after attaching to a cluster. Cluster details: Summary 5-1...
Hi, today I have seen very strange behavior in Databricks. I dropped one column from a dataframe and assigned the result to a new dataframe, but I am still able to use the dropped column in the filter command. In the general scenario I should get an error, but ...
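A minimal repro sketch of the behavior described above; the column names are illustrative. Depending on the Spark version, the filter on the dropped column may still succeed, because the analyzer can resolve missing attributes against the underlying plan and project them away afterwards:

from pyspark.sql import functions as F

df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "name"])
df2 = df.drop("name")

# "name" is no longer in df2's schema, yet this filter may be resolved from the
# underlying plan instead of raising an AnalysisException.
df2.filter(F.col("name") == "a").show()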
Hi, I created a delta table with an identity column using this syntax: Id BIGINT GENERATED BY DEFAULT AS IDENTITY. My steps: 1) Created the table with Id using the syntax above. 2) Added two rows with Id = 1 and Id = 2 (BY DEFAULT allows that). 3) Ran an Insert (wit...
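The post is cut off, but a sketch reproducing the steps above (the table name my_table is an assumption) shows where the usual surprise comes from: the identity generator is unaware of explicitly supplied values unless it is re-synced:

# Assumes a Databricks notebook where `spark` is already defined.
spark.sql("""
    CREATE TABLE my_table (
        Id BIGINT GENERATED BY DEFAULT AS IDENTITY,
        name STRING
    )
""")

# Step 2: explicit Ids are allowed because of BY DEFAULT.
spark.sql("INSERT INTO my_table (Id, name) VALUES (1, 'a'), (2, 'b')")

# Step 3: omitting Id generates a value, but it is not guaranteed to skip the
# explicitly inserted 1 and 2 unless the identity is re-synced first, e.g.
# ALTER TABLE my_table ALTER COLUMN Id SYNC IDENTITY.
spark.sql("INSERT INTO my_table (name) VALUES ('c')")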
I have a column that contains an array of structs as follows:
"column" : [
  { "struct_field1": "struct_value", "struct_field2": "struct_value" },
  { "struct_field1": "struct_value", "struct_field2": "struct_value" }
]
I want to apply a UDF to each f...
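The post is truncated, but for the common case of transforming every struct in the array, the built-in transform function avoids a Python UDF entirely; a sketch, where df is the dataframe from the post and the uppercase change is just an example transformation:

from pyspark.sql import functions as F

# transform() applies a function to each array element without UDF serialization cost.
df = df.withColumn(
    "column",
    F.transform(
        "column",
        lambda s: F.struct(
            F.upper(s["struct_field1"]).alias("struct_field1"),
            s["struct_field2"].alias("struct_field2"),
        ),
    ),
)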
Hi @Paweł Tomczyk, hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so...
I would like to do a big search on all field/column names that contain "XYZ". I tried the SQL below but it's giving me an error.
SELECT table_name, column_name FROM information_schema.columns WHERE column_name LIKE '%<account>%' ORDER BY table_name, column_na...
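Not a confirmed answer to the post above, but a frequent cause of errors here is that in Unity Catalog information_schema must be qualified with a catalog; a sketch, where the catalog name main is an assumption:

# Assumes a Databricks notebook and a Unity Catalog catalog named "main".
spark.sql("""
    SELECT table_catalog, table_schema, table_name, column_name
    FROM main.information_schema.columns
    WHERE column_name LIKE '%XYZ%'
    ORDER BY table_name, column_name
""").show(truncate=False)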
Hi @Ian Fox, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your ...
@Istuti Gupta: There are several algorithms you can use to mask a column in Databricks in a way that is compatible with SQL Server. One commonly used approach is pseudonymization, or tokenization. Here's an example of how you can implement pse...
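The example in the post is truncated; a minimal pseudonymization sketch of the kind described, using a salted SHA-256 hash (the column name customer_id and the salt are assumptions):

from pyspark.sql import functions as F

# Salted hash pseudonymization: identical inputs map to the same token, so joins
# still work, but the original value is not directly recoverable.
SALT = "replace-with-a-secret-salt"  # illustrative only; manage real secrets securely

masked_df = df.withColumn(
    "customer_id",
    F.sha2(F.concat(F.col("customer_id").cast("string"), F.lit(SALT)), 256),
)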