Data Engineering

Forum Posts

Sorted by:

by Aidonis • New Contributor III

02-24-2023 8:43:34 AM

6840 Views
3 replies
3 kudos

Copilot Databricks integration

Given Copilot has now been released as a paid for product. Do we have a timeline when it will be integrated into Databricks?Our team are using VScode alot for Copilot and we think it would be super awesome to have it on our Databricks environment. Ou...

Data Engineering

6840 Views
3 replies
3 kudos

02-24-2023 8:43:34 AM

View Replies

Latest Reply

prasad_vaze
New Contributor III

01-24-2024 12:03:04 PM

3 kudos

@Vartika no josephk didn't answer Aidan's question. It's about comparing copilot with databricks assistant and can copilot be used in databricks workspace?

3 kudos

01-24-2024 12:03:04 PM

2 More Replies

by xneg • Contributor

03-06-2023 7:34:01 AM

6630 Views
12 replies
9 kudos

PyPI library sometimes doesn't install during workflow execution

I have a workflow that is running upon a job cluster and contains a task that requires prophet library from PyPI:{ "task_key": "my_task", "depends_on": [ { "task_key": "<...>...

Data Engineering

6630 Views
12 replies
9 kudos

03-06-2023 7:34:01 AM

View Replies

Latest Reply

Vartika
Moderator

03-31-2023 2:40:57 AM

9 kudos

Hey @Eugene Bikkinin Thank you for your question! To assist you better, please take a moment to review the answer and let me know if it best fits your needs.Please help us select the best solution by clicking on "Select As Best" if it does.Your feed...

9 kudos

03-31-2023 2:40:57 AM

11 More Replies

by Michael_Galli • Contributor II

01-23-2024 7:39:15 AM

591 Views
1 replies
0 kudos

How to add a Workflow File Arrival trigger on a file in a Unity Catalog Volume in Azure Databricks

I have a UC volume wil XLSX files, and would like to run a workflow when a new file arrives in the Volume.I was thinking of a workflow file arrival trigger.But that does not work when I add the physical ADLS location of the root folder:External locat...

Data Engineering

591 Views
1 replies
0 kudos

01-23-2024 7:39:15 AM

View Replies

Latest Reply

Michael_Galli
Contributor II

01-24-2024 5:49:18 AM

0 kudos

Worked it out with Microsoft.-> only works with external volumes, not managed.https://learn.microsoft.com/en-us/azure/databricks/workflows/jobs/file-arrival-triggers

0 kudos

01-24-2024 5:49:18 AM

by Bram • New Contributor II

09-05-2023 6:53:03 AM

2481 Views
7 replies
0 kudos

Configuration spark.sql.sources.partitionOverwriteMode is not available.

Dear, In the current setup, we are using dbt as a modeling tool for our data lakehouse.For a specific use case, we want to use the insert_overwrite strategy, where dbt will replace all data for a specific partition:Databricks configurations | dbt Dev...

Data Engineering

2481 Views
7 replies
0 kudos

09-05-2023 6:53:03 AM

View Replies

Latest Reply

nad__
New Contributor II

01-24-2024 4:09:17 AM

0 kudos

Hi!I have same issue with insert_overwrite on Databricks with SQL Warehouse. Do you have any solution or updates? Or is it still not supported by Databricks?

0 kudos

01-24-2024 4:09:17 AM

6 More Replies

by ShlomoSQM • New Contributor

01-23-2024 6:26:27 AM

434 Views
2 replies
0 kudos

Autoloader, toTable

"In autoloader there is the option ".toTable(catalog.volume.table_name)", I have an autoloder script that reads all the files from a source volume in unity catalog, inside the source I have two different files with two different schemas.I want to sen...

Data Engineering

434 Views
2 replies
0 kudos

01-23-2024 6:26:27 AM

View Replies

Latest Reply

Palash01
Contributor III

01-23-2024 10:31:10 PM

0 kudos

Hey @ShlomoSQM, looks like @shan_chandra suggested a feasible solution, just to add a little more context this is how you can achieve the same if you have a column that can help you identify what is type1 and type 2file_type1_stream = readStream.opti...

0 kudos

01-23-2024 10:31:10 PM

1 More Replies

by dbdude • New Contributor II

08-17-2023 4:01:48 PM

4350 Views
3 replies
0 kudos

AWS Secrets Works In One Cluster But Not Another

Why can I use boto3 to go to secrets manager to retrieve a secret with a personal cluster but I get an error with a shared cluster?NoCredentialsError: Unable to locate credentials

Data Engineering

4350 Views
3 replies
0 kudos

08-17-2023 4:01:48 PM

View Replies

Latest Reply

drii_cavalcanti
New Contributor III

01-23-2024 8:56:57 PM

0 kudos

Hey @Szpila , have you found a solution for it? I am currently encountering the same issue.

0 kudos

01-23-2024 8:56:57 PM

2 More Replies

by Data_Engineeri7 • New Contributor

01-23-2024 8:57:37 AM

585 Views
3 replies
0 kudos

Global or environment parameters.

Hi All,Need a help on creating utility file that can be use in pyspark notebook.Utility file contain variables like database and schema names. So I need to pass this variables in other notebook wherever I am using database and schema.Thanks

Data Engineering

585 Views
3 replies
0 kudos

01-23-2024 8:57:37 AM

View Replies

Latest Reply

KSI
New Contributor II

01-23-2024 2:39:52 PM

0 kudos

You can use:${param_catalog}.schema.tablename.Pass actual value in the notebook through a job param "param_catalog" or widget utils through text called "param_catalog"

0 kudos

01-23-2024 2:39:52 PM

2 More Replies

by ilarsen • Contributor

11-21-2023 3:19:59 PM

1393 Views
2 replies
1 kudos

Resolved! Schema inference with auto loader (non-DLT and DLT)

Hi. Another question, this time about schema inference and column types. I have dabbled with DLT and structured streaming with auto loader (as in, not DLT). My data source use case is json files, which contain nested structures. I noticed that in t...

Data Engineering

1393 Views
2 replies
1 kudos

11-21-2023 3:19:59 PM

View Replies

Latest Reply

Kaniz
Community Manager

11-21-2023 10:37:58 PM

1 kudos

Hi @ilarsen , Certainly! Let’s delve into the nuances of schema inference and column types in the context of Delta Live Tables (DLT) and structured streaming with auto loader. DLT vs. Structured Streaming: DLT (Delta Live Tables) is a managed servi...

1 kudos

11-21-2023 10:37:58 PM

1 More Replies

by MarthinusBosma1 • New Contributor II

01-23-2024 1:40:52 AM

466 Views
3 replies
0 kudos

Unable to DROP TABLE: "Lock wait timeout exceeded"

We have a table where the underlying data has been dropped, and seemingly something else must have gone wrong as well, and we want to just get rid of the whole table and schema, but running "DROP TABLE schema.table" is throwing the following error:or...

Data Engineering

466 Views
3 replies
0 kudos

01-23-2024 1:40:52 AM

View Replies

Latest Reply

Lakshay
Esteemed Contributor

01-23-2024 3:19:50 AM

0 kudos

The table needs to be dropped from the backend. If you can raise a ticket, support team can do it for you.

0 kudos

01-23-2024 3:19:50 AM

2 More Replies

by Mystagon • New Contributor II

01-23-2024 2:10:58 AM

426 Views
1 replies
0 kudos

Performance Issues with Unity Catalog

Hey I need some help / suggestions troubleshooting this, I have two DataBricks Workspaces Common and Lakehouse. There difference between them is: Major Differences:- Lakehouse is using Unity Catalog- Lakehouse is using External Locations whereas cre...

Data Engineering

426 Views
1 replies
0 kudos

01-23-2024 2:10:58 AM

View Replies

Latest Reply

Lakshay
Esteemed Contributor

01-23-2024 4:06:37 AM

0 kudos

This needs a detailed analysis to understand the root cause. But a good point to start is to compare the Spark Ui for both runs and identify which part of execution is taking time. And then we need to look at the logs.

0 kudos

01-23-2024 4:06:37 AM

by Data_Engineer3 • Contributor II

01-22-2024 5:25:28 AM

1032 Views
5 replies
0 kudos

Resolved! Need to define the struct and array of struct field colum in the delta live table(dlt) in databrick.

I want to create the columns with datatype struct and array of struct datatype in the DLT live tables, will it be possible, if possible could you share the sample for the same.Thanks.

Data Engineering

dlt

1032 Views
5 replies
0 kudos

01-22-2024 5:25:28 AM

View Replies

Latest Reply

Data_Engineer3
Contributor II

01-22-2024 11:16:29 PM

0 kudos

I have created DLT live tables pipeline, In Job UI, i can able to see only steps and if any failure happened it show only error at that stage.But if i use any log using print, it doesn't show the logs in the console or any where. how can i see the lo...

0 kudos

01-22-2024 11:16:29 PM

4 More Replies

by kiko_roy • New Contributor III

01-18-2024 8:15:47 AM

1168 Views
3 replies
1 kudos

Resolved! IsBlindAppend config changes

Hello Allcan someone please suggest me how can I change the config IsBlindAppend true from false. I need to do this not for a data table but a custom log table .Also is there any concern If I toggle the value as standard practices. pls suggest

Data Engineering

1168 Views
3 replies
1 kudos

01-18-2024 8:15:47 AM

View Replies

Latest Reply

Lakshay
Esteemed Contributor

01-22-2024 9:05:02 AM

1 kudos

Hi, IsBlindAppend is not a config but an operation metrics that is used in Delta Lake History. The value of this changes based on the type of operation performed on Delta table. https://docs.databricks.com/en/delta/history.html

1 kudos

01-22-2024 9:05:02 AM

2 More Replies

by aa_204 • New Contributor II

07-28-2023 5:38:24 AM

977 Views
3 replies
0 kudos

Reading excel file using pandas on spark api not rendering #N/A values correctly

I am trying to read a .xlsx file using ps.read_excel() and having #N/A as a value for string type columns. But in the dataframe, i am getting "null" inplace of #N/A . Is there any option , using which we can read #N/A as a string in .xlsx file

Data Engineering

977 Views
3 replies
0 kudos

07-28-2023 5:38:24 AM

View Replies

Latest Reply

vishwanath_1
New Contributor III

01-23-2024 2:54:18 AM

0 kudos

i am facing the same issue currently even after setting keep_default_na = False still #N/A is being converted as nulldoes anyone know the solution here?

0 kudos

01-23-2024 2:54:18 AM

2 More Replies

by bayerb • New Contributor

01-22-2024 12:59:54 AM

408 Views
2 replies
0 kudos

Sink is not written into delta table in Spark structured streaming

I want to create a streaming job, that reads messages from a folder within TXT files, does the parsing, some processing, and appends the result into one of 3 possible delta tables depending on the parse result. There is a parse_failed table, an unknw...

Data Engineering

408 Views
2 replies
0 kudos

01-22-2024 12:59:54 AM

View Replies

Latest Reply

Kaniz
Community Manager

01-23-2024 12:44:33 AM

0 kudos

Thank you for posting your question in our community! We are happy to assist you.To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question?This...

0 kudos

01-23-2024 12:44:33 AM

1 More Replies

by francly • New Contributor II

08-06-2022 7:28:32 AM

1776 Views
5 replies
3 kudos

Resolved! terraform create multiple db user

Hi, follow the example to create one user. It's working however I want to create multiple users, I have tried many ways but still cannot get it work, please share some idea.https://registry.terraform.io/providers/databricks/databricks/latest/docs/res...

Data Engineering

1776 Views
5 replies
3 kudos

08-06-2022 7:28:32 AM

View Replies

Latest Reply

Natlab
New Contributor II

01-22-2024 11:10:10 PM

3 kudos

What if I want to give User Name along with the email ID?I used below code but its not helping(code is not failing, but not adding user name)It seems this code line: "display_name = each.key" is not working. Pls suggest. terraform {required_provider...

3 kudos

01-22-2024 11:10:10 PM

4 More Replies

User

Count

1601

736

343

284

246

Databricks

Forum Posts

Copilot Databricks integration

PyPI library sometimes doesn't install during workflow execution

How to add a Workflow File Arrival trigger on a file in a Unity Catalog Volume in Azure Databricks

Configuration spark.sql.sources.partitionOverwriteMode is not available.

Autoloader, toTable

AWS Secrets Works In One Cluster But Not Another

Global or environment parameters.

Resolved! Schema inference with auto loader (non-DLT and DLT)

Unable to DROP TABLE: "Lock wait timeout exceeded"

Performance Issues with Unity Catalog

Resolved! Need to define the struct and array of struct field colum in the delta live table(dlt) in databrick.

Resolved! IsBlindAppend config changes

Reading excel file using pandas on spark api not rendering #N/A values correctly

Sink is not written into delta table in Spark structured streaming

Resolved! terraform create multiple db user

DELTA_EXCEED_CHAR_VARCHAR_LIMIT

Not able to set run_as service_principal_name

Pyspark operations slowness in CLuster 14.3LTS as ...

[Databricks Assets Bundles] Workflow trigger on fi...

Addressing Pipeline Error Handling in Databricks b...