Data Engineering

Forum Posts

by Bin • New Contributor

08-21-2022 10:36:23 PM

682 Views
0 replies
0 kudos

How to do an "overwrite" output mode using spark structured streaming without deleting all the data and the checkpoint

I have a Delta Lake in ADLS that we sink data into through Spark Structured Streaming. We usually append new data from our data source to our Delta Lake, but in some cases we find errors in the data and need to reprocess everything. So what ...

by mp • New Contributor II

09-24-2021 10:26:22 PM

1403 Views
4 replies
6 kudos

Resolved! How can I convert a parquet into delta table?

I am looking to migrate my legacy warehouse data. How can I convert a Parquet table into a Delta table?

Latest Reply

Kaniz
Community Manager

03-07-2022 3:16:10 AM

6 kudos

Hi @Manish P, you have three options for converting a Parquet table to a Delta table.

Convert files to Delta Lake format and then create a Delta table:

CONVERT TO DELTA parquet.`/data-pipeline/`
CREATE TABLE events USING DELTA LOCATION '/data-pipelin...
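
As a rough illustration of the first option in the reply, the sketch below only builds the two SQL strings for a given directory and table name. The helper name `convert_to_delta_statements` is hypothetical, and it assumes the truncated LOCATION in the reply points at the same directory as the CONVERT statement.

```python
from typing import List


def convert_to_delta_statements(parquet_path: str, table_name: str) -> List[str]:
    """Build the two SQL statements quoted in the reply for one directory."""
    # Step 1: convert the Parquet files in place to Delta Lake format.
    convert = f"CONVERT TO DELTA parquet.`{parquet_path}`"
    # Step 2: register a table on top of the converted location.
    # Assumption: the table location is the same directory that was converted.
    create = f"CREATE TABLE {table_name} USING DELTA LOCATION '{parquet_path}'"
    return [convert, create]


statements = convert_to_delta_statements("/data-pipeline/", "events")
for sql in statements:
    print(sql)
```

In a notebook with Delta Lake available, each string could presumably be passed to `spark.sql(...)` in order; the sketch itself does not run anything against a cluster.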

3 More Replies

by ilarsen • Contributor

08-21-2022 5:18:12 PM

501 Views
0 replies
1 kudos

Trouble referencing a column that has been added by schema evolution (Auto Loader with Delta Live Tables)

Hi, I have a Delta Live Tables pipeline, using Auto Loader, to ingest from JSON files. I need to do some transformations, in this case converting timestamps. However, one of the timestamp columns does not exist in every file. This is causing the DLT p...

by serg-v • New Contributor III

06-17-2022 2:25:53 AM

1089 Views
3 replies
0 kudos

Running large window spark structured streaming aggregations with small slide duration

I want to run aggregations on large windows (90 days) with a small slide duration (5 minutes). The straightforward solution leads to a giant state of around hundreds of gigabytes, which doesn't look acceptable. Are there any best practices for doing this? Now I conside...
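
To see why the state gets so large, it helps to count how many overlapping windows a single event lands in. The sketch below is plain arithmetic, not Spark code; the function name and the key count of 1,000 are assumptions made for illustration.

```python
import math
from datetime import timedelta


def windows_per_event(window: timedelta, slide: timedelta) -> int:
    """Upper bound on the number of sliding windows containing one event."""
    # Window starts are `slide` apart, and every window that started within
    # the last `window` duration still covers the event.
    return math.ceil(window / slide)


per_event = windows_per_event(timedelta(days=90), timedelta(minutes=5))
print(per_event)

# Hypothetical: with 1,000 distinct grouping keys, roughly this many
# window aggregates are open at once.
distinct_keys = 1_000
open_aggregates = distinct_keys * per_event
print(open_aggregates)
```

With a 90-day window and a 5-minute slide, each event updates 25,920 window aggregates, which is consistent with the very large state described in the post.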

Latest Reply

Kaniz
Community Manager

06-23-2022 5:40:11 AM

0 kudos

Hi @Sergey Volkov, thanks for your question. Here are some fantastic articles on EWMA and event-time aggregation in Apache Spark™’s Structured Streaming. Please have a look and let us know if that helps. https://towardsdatascience.com/time-series-from-s...

2 More Replies

by SailajaB • Valued Contributor III

01-17-2022 6:05:32 AM

1062 Views
2 replies
8 kudos

Resolved! How to restrict Azure users to use launch workspace to login to ADB workspace as admin when user has owner or contributor role

Hi, is there any way to disable the Launch Workspace option in the Azure portal for ADB? We grant user access at the resource group level, so we need to prevent users who hold the Owner or Contributor role from launching the ADB workspace as admin. Thank you

Latest Reply

none_ranjeet
New Contributor III

08-20-2022 7:20:27 PM

8 kudos

Deny assignments don't block a subscription contributor from launching the workspace and becoming admin. I haven't found any way to block that, despite trying many different methods.

1 More Replies

by Malcoln_Dandaro • New Contributor

08-20-2022 1:45:21 PM

1156 Views
0 replies
0 kudos

Is there any way to navigate/access cloud files using the direct abfss URI (no mount) with default python functions/libs like open() or os.listdir()?

Hello, today in our workspace we access everything via mount points. We plan to change this to "abfss://" for security, governance and performance reasons. The problem is that sometimes we interact with files using "python only" code, and apparently ...

by danny_edm • New Contributor

08-19-2022 9:44:18 PM

352 Views
0 replies
0 kudos

collect_set wired result when Proton enable

Cluster: DBR 10.4 LTS with proton

Sample schema:
seq_no (decimal)
type (string)

Sample data:
seq_no type
1 A
1 A
2 A
2 B
2 B

Command: F.size(F.collect_set(F.col("type")).over(Window.partitionBy("seq_no"))...
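
For reference, here is a plain-Python sketch of what the command would be expected to return on that sample, so a surprising cluster result has something to be compared against. It assumes the sample data reads as five (seq_no, type) rows and uses integers in place of decimals; it emulates the semantics rather than calling Spark.

```python
from collections import defaultdict

# Sample rows as (seq_no, type); assumed reading of the flattened sample data.
rows = [(1, "A"), (1, "A"), (2, "A"), (2, "B"), (2, "B")]

# Emulate collect_set(type) over a window partitioned by seq_no:
# every row in a partition sees the set of distinct types in that partition.
distinct_types = defaultdict(set)
for seq_no, type_ in rows:
    distinct_types[seq_no].add(type_)

# Emulate size(...): attach the set size to each original row.
expected = [(seq_no, type_, len(distinct_types[seq_no])) for seq_no, type_ in rows]
for row in expected:
    print(row)
```

Under these assumptions every row with seq_no 1 gets size 1 and every row with seq_no 2 gets size 2.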

by moos • New Contributor

08-19-2022 2:47:18 PM

597 Views
0 replies
0 kudos

ManagedLibraryInstallFailed when changing Databricks Runtime Version from 9.1 to 11.0

Hi, I'm currently using Databricks Runtime Version 9.1 LTS and everything is fine. When I change it to 11.0 (while keeping everything else the same), my libraries fail to install. Here is the error message: java.lang.RuntimeException: ManagedLibrary...

by Mamdouh_Dabjan • New Contributor III

08-18-2022 12:09:34 PM

2269 Views
6 replies
2 kudos

Importing a large csv file into databricks free

Basically, I have a large CSV file that does not fit in a single worksheet. I can only use it in Power Query. I am trying to import this file into my Databricks notebook. I imported it and created a table from that file. But when I looked at the table, i...

Latest Reply

weldermartins
Honored Contributor

08-19-2022 5:19:41 AM

2 kudos

Hello, if you manually open one of the parts of the CSV file, does the view look different?

5 More Replies

by yannickmo • New Contributor III

10-12-2021 8:01:59 AM

3765 Views
8 replies
14 kudos

Resolved! Adding JAR from Azure DevOps Artifacts feed to Databricks job

Hello, we have some Scala code which is compiled and published to an Azure DevOps Artifacts feed. The issue is that we're now trying to add this JAR to a Databricks job (through Terraform) to automate the creation. To do this I'm trying to authenticate using...

Latest Reply

alexott
Valued Contributor II

11-25-2021 10:47:59 AM

14 kudos

As of right now, Databricks can't use non-public Maven repositories, because the Maven coordinates are resolved in the control plane. That's different from R and Python libraries. As a workaround you may try to install libraries via init script or ...

7 More Replies

by User16752245312 • New Contributor III

06-07-2021 3:02:32 PM

3663 Views
2 replies
2 kudos

How can I automatically capture the heap dump on the driver and executors in the event of an OOM error?

If you have a job that repeatedly runs into out-of-memory (OOM) errors on either the driver or the executors, automatically capturing a heap dump on the OOM event will help you debug the memory issue and identify the cause of the error. Spark config: spark.execu...
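
The config in the post is cut off, so the following is only a sketch of what such a setup typically looks like, not the author's exact settings. It uses the standard JVM flags `-XX:+HeapDumpOnOutOfMemoryError` and `-XX:HeapDumpPath` with the Spark properties `spark.executor.extraJavaOptions` and `spark.driver.extraJavaOptions`; the dump directory is a hypothetical placeholder.

```python
# Hypothetical dump location; replace with a path the cluster can write to.
HEAP_DUMP_DIR = "/dbfs/heapDumps"

# Standard HotSpot flags: write a heap dump when an OutOfMemoryError is thrown.
jvm_flags = f"-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath={HEAP_DUMP_DIR}"

# Apply the same flags to executors and to the driver.
spark_conf = {
    "spark.executor.extraJavaOptions": jvm_flags,
    "spark.driver.extraJavaOptions": jvm_flags,
}

# Print in the "key value" form used by a cluster's Spark config box.
for key, value in spark_conf.items():
    print(f"{key} {value}")
```

JVM options like these only take effect when the JVM starts, so they would need to be part of the cluster configuration rather than set in an already running session.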

Latest Reply

John_360
New Contributor II

08-09-2022 3:16:03 PM

2 kudos

Is it necessary to use exactly that HeapDumpPath? I find I'm unable to get driver heap dumps with a different path but otherwise the same configuration. I'm using spark_version 10.4.x-cpu-ml-scala2.12.

1 More Replies

by Serhii • Contributor

08-18-2022 9:23:59 AM

1923 Views
1 reply
1 kudos

Resolved! Behaviour of cluster launches in multi-task jobs

We are adapting the multi-task workflow example from the dbx documentation for our pipelines: https://dbx.readthedocs.io/en/latest/examples/python_multitask_deployment_example.html. As part of the configuration we specify the cluster configuration and provide ...

Latest Reply

User16873043099
Contributor

08-18-2022 10:22:33 AM

1 kudos

Tasks within the same multi-task job can reuse clusters. A shared job cluster allows multiple tasks in the same job to use the same cluster. The cluster is created and started when the first task using the cluster starts and terminates after the last ...
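
A minimal sketch of how a shared job cluster is usually expressed in a Jobs API 2.1 style job definition: the cluster is declared once under `job_clusters` and each task points at it through `job_cluster_key`. The job name, task names, notebook paths, runtime version and node type below are all hypothetical values chosen for illustration.

```python
import json

job_spec = {
    "name": "example-multitask-job",  # hypothetical job name
    "job_clusters": [
        {
            # Declared once; tasks refer to this key to share the cluster.
            "job_cluster_key": "shared_cluster",
            "new_cluster": {
                "spark_version": "10.4.x-scala2.12",  # hypothetical runtime
                "node_type_id": "Standard_DS3_v2",    # hypothetical node type
                "num_workers": 2,
            },
        }
    ],
    "tasks": [
        {
            "task_key": "ingest",
            "job_cluster_key": "shared_cluster",
            "notebook_task": {"notebook_path": "/Repos/example/ingest"},
        },
        {
            "task_key": "transform",
            "depends_on": [{"task_key": "ingest"}],
            "job_cluster_key": "shared_cluster",
            "notebook_task": {"notebook_path": "/Repos/example/transform"},
        },
    ],
}

# Sanity check: every task must reference a declared job cluster key.
declared_keys = {c["job_cluster_key"] for c in job_spec["job_clusters"]}
used_keys = {t["job_cluster_key"] for t in job_spec["tasks"]}
print(json.dumps(job_spec, indent=2))
```

Because both tasks reference the same key, they would run on one cluster instead of each launching its own.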

by Ashok1 • New Contributor II

06-13-2022 4:20:44 AM

755 Views
2 replies
1 kudos

Can we use autoloader to stream files from delta tables(source)

Latest Reply

Anonymous
Not applicable

08-18-2022 8:31:41 AM

1 kudos

Hey there @Ashok ch, hope everything is going great. Does @Ivan Tang's response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly? Else please let us know if you need more hel...

1 More Replies

by shubhamb • New Contributor III

06-12-2022 10:30:12 PM

2553 Views
3 replies
3 kudos

How to fetch environmental variables saved in one notebook into another notebook in Databricks Repos and Notebooks

I have this config.py file, which is used to store environment variables:

PUSH_API_ACCOUNT_ID = '*******'
PUSH_API_PASSCODE = '***********************'

I am using this to fetch the variables and use them in my file.py:

import sys
sys.path.append("..") ...

Latest Reply

Anonymous
Not applicable

08-18-2022 8:27:40 AM

3 kudos

Hey there @Shubham Biswas Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from ...

2 More Replies

by BradSheridan • Valued Contributor

07-27-2022 6:13:27 AM

1958 Views
9 replies
4 kudos

Resolved! How to use cloudFiles to completely overwrite the target

Hey there Community!! I have a client that will produce a CSV file daily that needs to be moved from Bronze -> Silver. Unfortunately, this source file will always be a full set of data....not incremental. I was thinking of using AutoLoader/cloudFil...

Latest Reply

BradSheridan
Valued Contributor

08-12-2022 10:44:42 AM

4 kudos

I upvoted all of @werners suggestions because they are all very valid ways of addressing my need (the true power/flexibility of the Databricks UDAP!!!). However, it turns out I'm going to end up getting incremental data after all :). So now the flow wi...

8 More Replies
