Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

susanne
by Contributor
  • 863 Views
  • 3 replies
  • 0 kudos

Resolved! Authentication failure Lakeflow SQL Server Ingestion

Hi all, I am trying to create a Lakeflow ingestion pipeline for SQL Server, but I am running into the following authentication error when using my Databricks database user for the connection: Gateway is stopping. Authentication failure while obtaining ...

Latest Reply
susanne
Contributor
  • 0 kudos

Hi @szymon_dybczak, thanks a lot, that did the trick.

2 More Replies
Alena
by New Contributor II
  • 297 Views
  • 1 reply
  • 0 kudos

Programmatically set minimum workers for a job cluster based on file size?

I’m running an ingestion pipeline with a Databricks job: a file lands in S3, a Lambda is triggered, and the Lambda runs a Databricks job. The incoming files vary a lot in size, which makes processing times vary as well. My job cluster has autoscaling enabled, b...

Latest Reply
kerem
Contributor
  • 0 kudos

Hi Alena, the Jobs API has an update endpoint that can do that (https://docs.databricks.com/api/workspace/jobs_21/update). If for some reason you can’t update your pipeline before you trigger it, you can also consider creating a new job with desired c...
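
A rough sketch of that flow from the Lambda side, assuming the workspace URL, token, and job ID are available as environment variables and the job cluster uses the job_cluster_key "main"; the size threshold and worker counts are illustrative. Note that top-level fields inside new_settings are replaced wholesale, so in practice the full cluster spec for that job cluster would need to be included.

import json
import os
import urllib.request

HOST = os.environ["DATABRICKS_HOST"]            # e.g. https://adb-xxxx.azuredatabricks.net
TOKEN = os.environ["DATABRICKS_TOKEN"]
JOB_ID = int(os.environ["DATABRICKS_JOB_ID"])

def _post(path, payload):
    # Small helper around the Databricks REST API
    req = urllib.request.Request(
        f"{HOST}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {TOKEN}", "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

def lambda_handler(event, context):
    # Size of the S3 object that triggered this Lambda
    size_bytes = event["Records"][0]["s3"]["object"]["size"]
    min_workers = 2 if size_bytes < 5 * 1024**3 else 8   # illustrative sizing rule

    # Raise the job cluster's minimum workers before triggering the run
    _post("/api/2.1/jobs/update", {
        "job_id": JOB_ID,
        "new_settings": {
            "job_clusters": [{
                "job_cluster_key": "main",
                "new_cluster": {
                    # full cluster spec goes here, since this field is replaced as a whole
                    "autoscale": {"min_workers": min_workers, "max_workers": 16},
                },
            }]
        },
    })
    # Then trigger the run
    return _post("/api/2.1/jobs/run-now", {"job_id": JOB_ID})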

Nick_Pacey
by New Contributor III
  • 674 Views
  • 2 replies
  • 0 kudos

Question on best method to deliver Azure SQL Server data into Databricks Bronze and Silver.

Hi, we have an Azure SQL Server database (replicating from an on-prem SQL Server) that is required to be in Databricks bronze and beyond. This database has hundreds of tables that are all required. Table sizes will vary from very small up to the biggest tables 1...

Latest Reply
kerem
Contributor
  • 0 kudos

Hey Nick, have you tried the SQL Server connector with Lakeflow Connect? This should provide a native connection to your SQL Server, potentially allowing for incremental updates and CDC setup. https://learn.microsoft.com/en-us/azure/databricks/ingestion...

1 More Reply
yit
by Contributor III
  • 306 Views
  • 1 reply
  • 0 kudos

Unable to Upcast DECIMAL Field in Autoloader

I’m using Autoloader to read Parquet files and write them to a Delta table. I want to enforce a schema in which Column1 is defined as DECIMAL(10,2). However, in the Parquet files being ingested, Column1 is defined as DECIMAL(8,2). When Autoloader read...

Latest Reply
kerem
Contributor
  • 0 kudos

Hi Yit, to potentially simplify your issue, why not read this column as a string in your stream and then cast it to DECIMAL(10, 2) afterwards? That should eliminate the rescue behaviour. Kerem Durak
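
A minimal sketch of that suggestion; the paths, table name, and second column are illustrative stand-ins, not the original pipeline.

from pyspark.sql import functions as F

# Declare Column1 as STRING in the enforced schema, then widen it explicitly.
df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "parquet")
    .schema("Column1 STRING, Column2 STRING")
    .load("/Volumes/main/bronze/landing/")
)

(
    df.withColumn("Column1", F.col("Column1").cast("decimal(10,2)"))
      .writeStream
      .option("checkpointLocation", "/Volumes/main/bronze/_checkpoints/my_table")
      .toTable("main.bronze.my_table")
)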

ManojkMohan
by Honored Contributor
  • 335 Views
  • 2 replies
  • 0 kudos

Resolved! Compute kind SERVERLESS_REPL_VM is not allowed to use cluster scoped libraries.

I have an S3 URI 's3://salesforcedatabricksorders/orders_data.xlsx'. I have created a connector between Databricks and Salesforce. I am first getting the orders_data.xlsx into the Databricks layer, performing basic transformations on it, and then sending it to sales...

Latest Reply
kerem
Contributor
  • 0 kudos

Hello, I’ve come across the same issue reading an Excel file into a PySpark DataFrame on serverless compute. As the error states, with serverless you cannot install a cluster-scoped library, so you have to use notebook-scoped libraries (%pip install…)...
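
A minimal sketch of that workaround, assuming the workbook has been staged in a Unity Catalog volume (the volume path and the choice of openpyxl are assumptions); run the %pip line in its own cell first.

%pip install openpyxl

import pandas as pd

# Read the Excel file with pandas, then convert to a Spark DataFrame for the
# transformation and the Salesforce write-back step.
pdf = pd.read_excel("/Volumes/main/raw/files/orders_data.xlsx", engine="openpyxl")
df = spark.createDataFrame(pdf)
display(df)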

1 More Reply
Pratikmsbsvm
by Contributor
  • 613 Views
  • 1 reply
  • 1 kudos

Resolved! How to Create Metadata driven Data Pipeline in Databricks

I am creating a data pipeline as shown below. 1. Files from multiple input sources arrive in their respective folders in the bronze layer. 2. Databricks performs the transformations and loads the transformed data to Azure SQL, and also to ADLS Gen2 silver (not shown ...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 1 kudos

Hi @Pratikmsbsvm, it's a totally realistic requirement. In fact, you can find many articles that suggest approaches for designing such a control table. Take, for example, the following article: https://medium.com/dbsql-sme-engineering/a-primer-for-metadat...
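
As a very small illustration of the control-table idea (the table name, columns, and generic load below are assumptions, not the linked article's design):

# Control table describing each source and its target
spark.sql("""
    CREATE TABLE IF NOT EXISTS ops.pipeline_control (
        source_path  STRING,
        file_format  STRING,
        target_table STRING,
        is_active    BOOLEAN
    )
""")

# Generic driver: one loop handles every active source the same way
for row in spark.table("ops.pipeline_control").where("is_active").collect():
    (spark.read.format(row.file_format)
         .load(row.source_path)
         .write.mode("append")
         .saveAsTable(row.target_table))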

Hatter1337
by New Contributor III
  • 3471 Views
  • 5 replies
  • 3 kudos

Resolved! Write Spark DataFrame into OpenSearch

Hi Databricks Community, I'm trying to read an index from OpenSearch or write a DataFrame into an OpenSearch index using the native Spark OpenSearch connector: host = dbutils.secrets.get(scope="opensearch", key="host") port = dbutils.secrets.get(scope=...

Latest Reply
Alena
New Contributor II
  • 3 kudos

Thank you so much for your help—it works! One thing I’m trying to do is authenticate hadoop-opensearch using a different role than the one my cluster is mapped to. Environment variables only seem to work if they’re set in the cluster configuration. I...

4 More Replies
Sainath368
by New Contributor III
  • 607 Views
  • 1 reply
  • 1 kudos

Resolved! How to Retrieve the spark.statistics.createdAt When Statistics Were Last Updated in Databricks?

Hi everyone, I regularly (once a week) run ANALYZE TABLE COMPUTE STATISTICS on all my tables in Databricks to keep statistics up to date for query optimization. In the table UI in the catalog, I can see some statistics metadata like spark.st...

Latest Reply
Advika
Databricks Employee
  • 1 kudos

Hello @Sainath368! sql.statistics.createdAt reflects the epoch time when statistics were created. Unfortunately, there's no direct command available to check when the statistics were last updated. As a workaround, you can manually set the current tim...
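
A sketch of that workaround (the table name and the property key are illustrative): record the timestamp yourself right after running ANALYZE, then read it back from the table properties later.

from datetime import datetime, timezone

table = "main.sales.orders"
spark.sql(f"ANALYZE TABLE {table} COMPUTE STATISTICS FOR ALL COLUMNS")

# Stamp the table with the time the statistics were refreshed
ts = datetime.now(timezone.utc).isoformat()
spark.sql(f"ALTER TABLE {table} SET TBLPROPERTIES ('stats.last_analyzed' = '{ts}')")

# Later: when were the statistics last refreshed?
spark.sql(f"SHOW TBLPROPERTIES {table} ('stats.last_analyzed')").show(truncate=False)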

kenmyers-8451
by Contributor
  • 947 Views
  • 2 replies
  • 1 kudos

Should have the option to mark "succeeded with failures" as a failure rather than a success

Hi, we are having an issue with the way "succeeded with failures" is handled. We get emails telling us that we have a failure, which is correct, but then the pipeline actually treats it like a success and keeps going; actually we would like to ...

Latest Reply
kenmyers-8451
Contributor
  • 1 kudos

Thanks @Advika, we'll give that a shot for now.

1 More Reply
Itai_Sharon
by New Contributor II
  • 638 Views
  • 3 replies
  • 1 kudos

dbutils.notebook.run() returns a general error instead of the specific one

Hi, in a Python file I'm running a specific notebook using dbutils.notebook.run(). The notebook is failing, but I'm getting a general error log instead of the real, specific log. When I run the notebook directly, I get the specific error log. gen...

Latest Reply
Itai_Sharon
New Contributor II
  • 1 kudos

@Vinay_M_R BTW, when trying to run a job using the Databricks API, I encounter the same issue (a general "FAILED: Workload failed"): from databricks.sdk import WorkspaceClient client = WorkspaceClient() run = client.jobs.run_now(job_id) error message: state_...

2 More Replies
Sadam97
by New Contributor III
  • 406 Views
  • 2 replies
  • 1 kudos

Databricks job cancel does not wait for termination of streaming tasks

We have created Databricks jobs and each has multiple tasks. Each task is a 24/7 streaming task with checkpointing enabled. We want it to be stateful when we cancel and rerun the job, but it seems that when we cancel the job run, it kills the parent process a...

Latest Reply
Vidhi_Khaitan
Databricks Employee
  • 1 kudos

If the “reporting” layer is essentially micro-batching over bounded backlogs, run it with availableNow (or a scheduled batch job) so each run is naturally bounded and exits cleanly on its own, no manual cancel. This greatly reduces chances of partial...
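
For reference, a minimal sketch of an availableNow run (the source table, target table, and checkpoint path are illustrative):

# Bounded run: process everything currently available in the source, then stop on its own.
(
    spark.readStream.table("main.bronze.events")
    .writeStream
    .option("checkpointLocation", "/Volumes/main/chk/reporting_events")
    .trigger(availableNow=True)
    .toTable("main.silver.reporting_events")
)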

1 More Reply
Srajole
by New Contributor
  • 678 Views
  • 1 reply
  • 1 kudos

Write data issue

My Databricks job is completing successfully but my data is not written into the target table. The source path is correct, each and every thing is correct, but I am not sure why the data is not written into the Delta table.

Latest Reply
Vidhi_Khaitan
Databricks Employee
  • 1 kudos

Hi @Srajole, there are a bunch of possibilities as to why the data is not being written into the table: you’re writing to a path different from the table’s storage location, or using a write mode that doesn’t replace data as expected. spark.sql("DESCR...
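
Along the same lines, a couple of hedged checks you could run in a notebook (the table name below is illustrative):

# Where does the table actually live, and did the job's commit land there?
spark.sql("DESCRIBE DETAIL main.bronze.target_table").show(truncate=False)
spark.sql("DESCRIBE HISTORY main.bronze.target_table").select(
    "version", "timestamp", "operation", "operationMetrics"
).show(truncate=False)

# And confirm the write targets the table (or its exact location) with the intended mode, e.g.
# df.write.mode("append").saveAsTable("main.bronze.target_table")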

dbr_data_engg
by New Contributor III
  • 1261 Views
  • 2 replies
  • 0 kudos

Using Databricks Bladebridge or Lakebridge for SQL Migration

Getting a transpile error while executing the command for Databricks Bladebridge or Lakebridge: databricks labs lakebridge transpile --source-dialect mssql --input-source "<Path>/sample.sql" --output-folder "<Path>\output" Error: TranspileError(code=FAILURE, ...

Latest Reply
Abhimanyu
New Contributor II
  • 0 kudos

Did you find a solution?

1 More Reply
juanjomendez96
by Contributor
  • 816 Views
  • 2 replies
  • 3 kudos

Resolved! Best practices for compute usage

Hello there! I am writing this open message to learn how you are using compute in your use cases. Currently, in my company, we have multiple compute instances that can be differentiated into two main types: clusters with a large instance for b...

Latest Reply
radothede
Valued Contributor II
  • 3 kudos

Hello @juanjomendez96, to the best of my knowledge and experience, an autoscaled shared cluster (using smaller instances) works well for most second-case scenarios (clusters for ad-hoc/development team usage). This approach allows you to reuse the resources across t...

1 More Reply
