Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

442027
by New Contributor II
  • 2916 Views
  • 1 reply
  • 1 kudos

Default delta log retention interval is different than in documentation?

The documentation here notes that the default delta log retention interval is 30 days. However, when I create checkpoints in the delta log to trigger the cleanup, historical records from 30 days aren't removed; i.e. current day checkpoint is a...

Latest Reply
jose_gonzalez
Databricks Employee
  • 1 kudos

You need to set TBLPROPERTIES ('delta.checkpointRetentionDuration' = '30 days') on the table.

  • 1 kudos
Mrk
by New Contributor II
  • 14439 Views
  • 4 replies
  • 4 kudos

Resolved! Insert or merge into a table with GENERATED IDENTITY

Hi, When I create an identity column using the GENERATED ALWAYS AS IDENTITY statement and I try to INSERT or MERGE data into that table, I keep getting the following error message: Cannot write to 'table', not enough data columns; target table has x col...

Latest Reply
Aboladebaba
New Contributor III
  • 4 kudos

You can run the INSERT by passing the subset of columns you want to provide values for... for example your insert statement would be something like:INSERT INTO target_table_with_identity_col(<list-of-cols-names-without-the-identity-column>SELECT(<lis...

  • 4 kudos
3 More Replies
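The pattern in the reply above can be sketched end to end. This is a hedged illustration using Python's built-in sqlite3 as a local stand-in for Delta SQL (the table and column names are hypothetical): name only the non-identity columns in the INSERT's column list and the engine generates the identity value itself.

```python
import sqlite3

# In-memory database; AUTOINCREMENT stands in for GENERATED ALWAYS AS IDENTITY.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE target_table_with_identity_col ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, "  # the identity-style column
    "name TEXT, qty INTEGER)"
)

# List only the non-identity columns; `id` is filled in automatically.
conn.execute(
    "INSERT INTO target_table_with_identity_col (name, qty) VALUES (?, ?)",
    ("widget", 3),
)
rows = conn.execute(
    "SELECT id, name, qty FROM target_table_with_identity_col"
).fetchall()
print(rows)
```

The same column-list idea is what avoids the "not enough data columns" error in the Delta case.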
ilarsen
by Contributor
  • 3959 Views
  • 2 replies
  • 0 kudos

Structured Streaming Auto Loader UnknownFieldsException and Workflow Retries

Hi. I am using structured streaming and auto loader to read json files, and it is automated by Workflow.  I am having difficulties with the job failing as schema changes are detected, but not retrying.  Hopefully someone can point me in the right dir...

Latest Reply
ilarsen
Contributor
  • 0 kudos

Another point I have realised is that the task and the parent notebook (which then calls the child notebook that runs the auto loader part) do not fail if the schema-changed failure occurs during the auto loader process. It's the child notebook a...

  • 0 kudos
1 More Replies
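One way to approximate the restart behaviour this thread is after is a retry wrapper around the stream entry point: Auto Loader stops the stream when it detects new columns and expects a restart to pick up the updated schema. This is a generic Python sketch — `UnknownFieldsException` and `start_stream` here are stand-ins, not Databricks APIs; in a real Workflow the retry is normally configured on the task itself.

```python
class UnknownFieldsException(Exception):
    """Stand-in for the exception Auto Loader raises on a schema change."""

def run_with_schema_retries(start_stream, max_restarts=3):
    """Call start_stream(), restarting it when a schema change is detected."""
    for attempt in range(max_restarts + 1):
        try:
            return start_stream()
        except UnknownFieldsException:
            if attempt == max_restarts:
                raise  # give up and surface the failure to the Workflow

# Demo with a fake stream that fails once (schema change), then succeeds.
calls = {"n": 0}
def fake_stream():
    calls["n"] += 1
    if calls["n"] == 1:
        raise UnknownFieldsException("new column detected")
    return "stream completed"

result = run_with_schema_retries(fake_stream)
```

Whatever runs the child notebook needs to propagate the failure upward, or the parent task will not retry, which matches the observation in the reply above.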
Aidonis
by New Contributor III
  • 48213 Views
  • 3 replies
  • 3 kudos

Copilot Databricks integration

Given Copilot has now been released as a paid-for product, do we have a timeline for when it will be integrated into Databricks? Our team uses VS Code a lot for Copilot and we think it would be super awesome to have it in our Databricks environment. Ou...

Latest Reply
prasad_vaze
New Contributor III
  • 3 kudos

@Vartika No, josephk didn't answer Aidan's question. It's about comparing Copilot with Databricks Assistant, and whether Copilot can be used in the Databricks workspace.

  • 3 kudos
2 More Replies
xneg
by Contributor
  • 21026 Views
  • 12 replies
  • 9 kudos

PyPI library sometimes doesn't install during workflow execution

I have a workflow that runs on a job cluster and contains a task that requires the prophet library from PyPI: { "task_key": "my_task", "depends_on": [ { "task_key": "<...>...

Latest Reply
Vartika
Databricks Employee
  • 9 kudos

Hey @Eugene Bikkinin, thank you for your question! To assist you better, please take a moment to review the answer and let me know if it best fits your needs. Please help us select the best solution by clicking on "Select As Best" if it does. Your feed...

  • 9 kudos
11 More Replies
Michael_Galli
by Databricks Partner
  • 3040 Views
  • 1 reply
  • 0 kudos

How to add a Workflow File Arrival trigger on a file in a Unity Catalog Volume in Azure Databricks

I have a UC volume with XLSX files, and would like to run a workflow when a new file arrives in the Volume. I was thinking of a workflow file arrival trigger. But that does not work when I add the physical ADLS location of the root folder: External locat...

Latest Reply
Michael_Galli
Databricks Partner
  • 0 kudos

Worked it out with Microsoft -> file arrival triggers only work with external volumes, not managed ones. https://learn.microsoft.com/en-us/azure/databricks/workflows/jobs/file-arrival-triggers

  • 0 kudos
ShlomoSQM
by New Contributor
  • 3305 Views
  • 2 replies
  • 0 kudos

Autoloader, toTable

In Auto Loader there is the option ".toTable(catalog.volume.table_name)". I have an autoloader script that reads all the files from a source volume in Unity Catalog; inside the source I have two different files with two different schemas. I want to sen...

Latest Reply
Palash01
Valued Contributor
  • 0 kudos

Hey @ShlomoSQM, looks like @shan_chandra suggested a feasible solution. Just to add a little more context, this is how you can achieve the same if you have a column that can help you identify what is type1 and type2: file_type1_stream = readStream.opti...

  • 0 kudos
1 More Replies
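The routing idea in the reply above — one filtered stream per schema — can be sketched in plain Python, independent of Spark. The discriminator field and record shapes below are illustrative; in the actual pipeline each bucket would correspond to a filtered `readStream` writing to its own table.

```python
from collections import defaultdict

def route_by_schema(records, key="file_type"):
    """Group mixed-schema records by a discriminator field,
    one bucket per target table."""
    routed = defaultdict(list)
    for rec in records:
        routed[rec[key]].append(rec)
    return dict(routed)

# Mixed feed with two record shapes, like two file types in one volume.
records = [
    {"file_type": "type1", "a": 1},
    {"file_type": "type2", "b": 2},
    {"file_type": "type1", "a": 3},
]
routed = route_by_schema(records)
```

Without a discriminator column, the alternative is usually separate source folders (or glob patterns) so each stream only ever sees one schema.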
Data_Engineeri7
by New Contributor
  • 5202 Views
  • 3 replies
  • 0 kudos

Global or environment parameters.

Hi All, I need help creating a utility file that can be used in PySpark notebooks. The utility file contains variables like database and schema names, so I need to pass these variables to other notebooks wherever I am using the database and schema. Thanks

Latest Reply
KSI
New Contributor II
  • 0 kudos

You can use ${param_catalog}.schema.tablename. Pass the actual value in the notebook through a job param "param_catalog" or a text widget called "param_catalog".

  • 0 kudos
2 More Replies
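The utility-file idea in this thread can be sketched as a small Python module that notebooks import (or pull in via %run in Databricks). All names below are placeholders, not real catalogs.

```python
# utils.py — shared environment parameters, assumed values for illustration.
CATALOG = "dev_catalog"
SCHEMA = "sales"

def full_table_name(table, catalog=CATALOG, schema=SCHEMA):
    """Build a three-part name, the same shape as the
    ${param_catalog}.schema.tablename pattern in the reply above."""
    return f"{catalog}.{schema}.{table}"

name = full_table_name("orders")
print(name)
```

A job parameter or widget can then override `CATALOG` per environment instead of hard-coding it in each notebook.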
MarthinusBosma1
by New Contributor II
  • 2763 Views
  • 3 replies
  • 0 kudos

Unable to DROP TABLE: "Lock wait timeout exceeded"

We have a table where the underlying data has been dropped, and seemingly something else must have gone wrong as well, and we want to just get rid of the whole table and schema, but running "DROP TABLE schema.table" is throwing the following error:or...

Latest Reply
Lakshay
Databricks Employee
  • 0 kudos

The table needs to be dropped from the backend. If you raise a ticket, the support team can do it for you.

  • 0 kudos
2 More Replies
Data_Engineer3
by Contributor III
  • 9423 Views
  • 5 replies
  • 0 kudos

Resolved! Need to define struct and array-of-struct columns in a Delta Live Table (DLT) in Databricks.

I want to create columns with struct and array-of-struct datatypes in DLT live tables. Is that possible, and if so, could you share a sample? Thanks.

Latest Reply
Data_Engineer3
Contributor III
  • 0 kudos

I have created a DLT pipeline. In the Job UI, I can see only the steps, and if any failure happens it shows the error only at that stage. But if I log anything using print, it doesn't show up in the console or anywhere. How can I see the lo...

  • 0 kudos
4 More Replies
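For the original question in this thread, a hedged DLT SQL sketch (table and column names are illustrative, and the source table is assumed to exist) — struct and array-of-struct columns can be declared inline in the table definition and populated with `named_struct`:

```sql
-- Sketch only: adjust names and types to your pipeline.
CREATE OR REFRESH LIVE TABLE customer_orders (
  customer STRUCT<id: BIGINT, name: STRING>,
  order_items ARRAY<STRUCT<item_id: BIGINT, qty: INT>>
)
AS SELECT
  named_struct('id', customer_id, 'name', customer_name) AS customer,
  array(named_struct('item_id', item_id, 'qty', qty)) AS order_items
FROM LIVE.raw_orders
```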
kiko_roy
by Contributor
  • 4388 Views
  • 3 replies
  • 1 kudos

Resolved! IsBlindAppend config changes

Hello All, can someone please suggest how I can change the config IsBlindAppend from false to true? I need to do this not for a data table but for a custom log table. Also, is there any concern if I toggle the value, as a standard practice? Please suggest.

Latest Reply
Lakshay
Databricks Employee
  • 1 kudos

Hi, IsBlindAppend is not a config but an operation metric that is used in Delta Lake history. Its value changes based on the type of operation performed on the Delta table. https://docs.databricks.com/en/delta/history.html

  • 1 kudos
2 More Replies
francly
by New Contributor II
  • 5840 Views
  • 5 replies
  • 3 kudos

Resolved! terraform create multiple db user

Hi, I followed the example to create one user and it works. However, I want to create multiple users; I have tried many ways but still cannot get it to work. Please share some ideas. https://registry.terraform.io/providers/databricks/databricks/latest/docs/res...

Latest Reply
Natlab
New Contributor II
  • 3 kudos

What if I want to give a user name along with the email ID? I used the code below but it's not helping (the code is not failing, but it is not adding the user name). It seems this line: "display_name = each.key" is not working. Please suggest. terraform {required_provider...

  • 3 kudos
4 More Replies
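A hedged Terraform sketch for both questions in this thread, following the databricks_user resource docs linked above: iterate a map of email -> display name with `for_each`. Note that in the snippet from the last reply, `display_name = each.key` assigns the email as the display name; keeping the names in a map lets `each.value` carry the display name instead. Emails and names below are placeholders.

```hcl
variable "users" {
  type = map(string)
  default = {
    "alice@example.com" = "Alice Example"
    "bob@example.com"   = "Bob Example"
  }
}

resource "databricks_user" "this" {
  for_each     = var.users
  user_name    = each.key   # the login email
  display_name = each.value # the human-readable name
}
```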
364488
by New Contributor
  • 2829 Views
  • 2 replies
  • 0 kudos

java.io.IOException: Invalid PKCS8 data error when reading data from Google Storage

The Databricks workspace is hosted in AWS, and I am trying to access data in Google Cloud Platform. I have followed the instructions here: https://docs.databricks.com/en/connect/storage/gcs.html I get the error "java.io.IOException: Invalid PKCS8 data." when trying t...

Latest Reply
Debayan
Databricks Employee
  • 0 kudos

Hi, Could you also please share the whole error stack?  

  • 0 kudos
1 More Replies
Faisal
by Contributor
  • 13330 Views
  • 1 reply
  • 0 kudos

DLT quarantine records

How do I capture bad records that violate expectations into quarantine tables? Can someone provide DLT SQL syntax for this?

Latest Reply
jose_gonzalez
Databricks Employee
  • 0 kudos

I would like to share the following docs, which will have examples https://docs.databricks.com/en/delta-live-tables/expectations.html

  • 0 kudos
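The quarantine pattern those docs describe can be sketched in DLT SQL (names and the predicate are illustrative): keep valid rows in one table via an expectation that drops violators, and route the violators to a quarantine table by inverting the same predicate.

```sql
-- Sketch only: `amount > 0` stands in for your real expectation.
CREATE OR REFRESH LIVE TABLE clean_orders (
  CONSTRAINT valid_amount EXPECT (amount > 0) ON VIOLATION DROP ROW
)
AS SELECT * FROM LIVE.raw_orders;

CREATE OR REFRESH LIVE TABLE quarantine_orders
AS SELECT * FROM LIVE.raw_orders
WHERE NOT (amount > 0);
```

Keeping the predicate in one place (for example a view) avoids the two definitions drifting apart.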