Data Engineering

Forum Posts

jonathan-dufaul
by Valued Contributor

Is there a function similar to display that downloads a dataframe?

I find myself constantly having to do display(df) and then "recompute with <5g records and download". I was just hoping I could skip the middleman and download from the get-go. Ideally it'd be a function like download(df, num_rows="max"), where num_rows i...

  • 801 Views
  • 2 replies
  • 0 kudos
Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 0 kudos

Question: where do you want to download it to? If to a cloud location, use the regular DataFrameWriter. You can install, for example, Azure Storage Explorer on your computer. Some cloud storage can even be mounted on your system as a folder or network share.

  • 0 kudos
1 More Replies
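A hedged PySpark sketch of the DataFrameWriter approach suggested in the reply above: instead of downloading through display(), write the full DataFrame to a cloud path you can then browse or mount; the abfss path below is a placeholder.

# Write the whole DataFrame out instead of re-rendering it with display().
# The abfss:// path is a placeholder -- point it at your own container/account.
(
    df.coalesce(1)                      # one output file, convenient for small exports
      .write
      .mode("overwrite")
      .option("header", "true")
      .csv("abfss://exports@<storageaccount>.dfs.core.windows.net/exports/my_df")
)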
mmenjivar
by New Contributor II

How to get the run_id from a previous task in a Databricks jobs

Hi, is there any way to share the run_id from task_A to task_B within the same job when task_A is a dbt task?

  • 1120 Views
  • 2 replies
  • 0 kudos
Latest Reply
Debayan
Esteemed Contributor III
  • 0 kudos

Hi, you can pass {{job_id}} and {{run_id}} as job arguments, print that information, and save it wherever it is needed. Please find the documentation for the same below: https://docs.databricks.com/data-engineering/jobs/jobs.html#task-parameter-varia...

  • 0 kudos
1 More Replies
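Expanding on the reply above, a hedged sketch of what the downstream notebook task might look like; the widget names and the audit table are illustrative, not fixed names.

# Task parameters configured on the job, e.g. {"job_id": "{{job_id}}", "run_id": "{{run_id}}"}
dbutils.widgets.text("job_id", "")
dbutils.widgets.text("run_id", "")

job_id = dbutils.widgets.get("job_id")
run_id = dbutils.widgets.get("run_id")
print(job_id, run_id)

# Persist the values somewhere other tasks can read them, e.g. a small Delta table.
(spark.createDataFrame([(job_id, run_id)], "job_id string, run_id string")
      .write.mode("append")
      .saveAsTable("ops.job_run_audit"))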
jonathan-dufaul
by Valued Contributor

Resolved! How can I look up the first ancestor (person,first_ancestor) of a record from a table that has (child,parent) records?

I have a table that looks like this:

/* input */
-- | parent | child |
-- | ------ | ----- |
-- |   1    |   2   |
-- |   2    |   3   |
-- |   3    |   4   |
-- |   5    |   6   |
-- |   6    |   7   |
-- |   8    |   9   |
-- |  10    |  11   |

and I...

  • 2081 Views
  • 5 replies
  • 4 kudos
Latest Reply
JGil
New Contributor III
  • 4 kudos

@Landan George Hey, I am looking into the same issue, but when I execute what's suggested in the post for CTE_Recursive (https://medium.com/globant/how-to-implement-recursive-queries-in-spark-3d26f7ed3bc9) I get an error: Error in SQL statement: AnalysisExcep...

  • 4 kudos
4 More Replies
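Since the error above comes from Spark SQL not executing recursive CTEs, a hedged PySpark alternative is an iterative self-join that keeps re-pointing each person at a higher ancestor until nothing changes; the sample rows are taken from the question.

from pyspark.sql import functions as F

# (parent, child) rows from the question.
edges = spark.createDataFrame(
    [(1, 2), (2, 3), (3, 4), (5, 6), (6, 7), (8, 9), (10, 11)],
    ["parent", "child"],
)

# Start with each child's direct parent, then climb one level per pass.
lineage = edges.select(F.col("child").alias("person"),
                       F.col("parent").alias("first_ancestor"))

changed = True
while changed:
    stepped = (
        lineage.alias("l")
        .join(edges.alias("e"), F.col("l.first_ancestor") == F.col("e.child"), "left")
        .select(
            F.col("l.person"),
            F.coalesce(F.col("e.parent"), F.col("l.first_ancestor")).alias("first_ancestor"),
        )
    )
    changed = stepped.exceptAll(lineage).limit(1).count() > 0
    lineage = stepped

lineage.show()   # e.g. person 4 -> first_ancestor 1, person 7 -> 5, person 9 -> 8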
bhawik21
by New Contributor II

Resolved! How do I invoke a data enrichment function before model.predict while serving the model

I have used MLflow and got my model served through a REST API. It works fine when all model features are provided. But my use case is that only a single feature (the primary key) will be provided by the consumer application, and my code has to look up th...

  • 1253 Views
  • 4 replies
  • 0 kudos
Latest Reply
LuisL
New Contributor II
  • 0 kudos

You can create a custom endpoint for your REST API that handles the data massaging before calling the model.predict function. This endpoint can take in the primary key as an input, retrieve the additional features from the database based on that key, ...

  • 0 kudos
3 More Replies
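Building on the reply above, a hedged sketch of one way to do this on Databricks: wrap the trained model in a custom MLflow pyfunc whose predict() performs the lookup first. Here lookup_features() is a stand-in for your own feature/database lookup, and the base_model artifact URI is a placeholder.

import mlflow
import mlflow.pyfunc

def lookup_features(keys_df):
    # Hypothetical enrichment step: join the incoming primary keys against your feature table.
    return keys_df

class EnrichedModel(mlflow.pyfunc.PythonModel):
    def load_context(self, context):
        # Load the underlying model that expects the full feature set.
        self.model = mlflow.pyfunc.load_model(context.artifacts["base_model"])

    def predict(self, context, model_input):
        # model_input carries only the primary key; enrich it before predicting.
        features = lookup_features(model_input)
        return self.model.predict(features)

# Logged like any other pyfunc, with the original model attached as an artifact,
# and then served in place of the bare model.
mlflow.pyfunc.log_model(
    artifact_path="enriched_model",
    python_model=EnrichedModel(),
    artifacts={"base_model": "runs:/<run_id>/model"},
)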
powerus
by New Contributor III

Resolved! "Failure to initialize configurationInvalid configuration value detected for fs.azure.account.key" using com.databricks:spark-xml_2.12:0.12.0

Hi community, I'm trying to read XML data from Azure Data Lake Gen 2 using com.databricks:spark-xml_2.12:0.12.0: spark.read.format('XML').load('abfss://[CONTAINER]@[storageaccount].dfs.core.windows.net/PATH/TO/FILE.xml') The code above gives the followin...

  • 3512 Views
  • 1 replies
  • 0 kudos
Latest Reply
powerus
New Contributor III
  • 0 kudos

The issue was also raised here: https://github.com/databricks/spark-xml/issues/591. A fix is to use the "spark.hadoop" prefix in front of the fs.azure Spark config keys: spark.hadoop.fs.azure.account.oauth2.client.id.nubulosdpdlsdev01.dfs.core.windows.n...

  • 0 kudos
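To make the fix above concrete, a hedged sketch of how the prefixed keys are typically applied: they go into the cluster's Spark config (shown in the comment) so that spark-xml, which reads through the Hadoop FileSystem API, picks them up; account, tenant, container, secret scope, and rowTag values are placeholders.

# Cluster Spark config (Compute > cluster > Advanced options > Spark), one entry per line:
#   spark.hadoop.fs.azure.account.auth.type.<account>.dfs.core.windows.net OAuth
#   spark.hadoop.fs.azure.account.oauth.provider.type.<account>.dfs.core.windows.net org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider
#   spark.hadoop.fs.azure.account.oauth2.client.id.<account>.dfs.core.windows.net <client-id>
#   spark.hadoop.fs.azure.account.oauth2.client.secret.<account>.dfs.core.windows.net {{secrets/<scope>/<key>}}
#   spark.hadoop.fs.azure.account.oauth2.client.endpoint.<account>.dfs.core.windows.net https://login.microsoftonline.com/<tenant-id>/oauth2/token

# With the prefixed keys in place, the read from the question works unchanged:
df = (
    spark.read.format("xml")
    .option("rowTag", "row")   # spark-xml needs a rowTag; the value here is a placeholder
    .load("abfss://[CONTAINER]@[storageaccount].dfs.core.windows.net/PATH/TO/FILE.xml")
)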
sid_de
by New Contributor II

404 Not Found [IP: 185.125.190.36 80] on trying to install google-chrome in databricks spark driver

We are installing google-chrome-stable on a Databricks cluster using apt-get install, which has been working fine for a long time, but over the past few days it has started to fail intermittently. The following is the code that we run: %sh sudo curl -s...

  • 2062 Views
  • 3 replies
  • 2 kudos
Latest Reply
sid_de
New Contributor II
  • 2 kudos

Hi, the issue is still persistent. We are trying to solve this by using a Docker image with a preinstalled Selenium driver and Chrome browser. Regards, Dharmin

  • 2 kudos
2 More Replies
quakenbush
by Contributor

Resolved! Does Databricks offer something like Oracle's dblink?

I am aware I can load anything into a DataFrame using JDBC; that works well from Oracle sources. Is there an equivalent in Spark SQL, so I can combine datasets as well? Basically something like so - you get the idea... select lt.field1, rt.fie...

  • 2073 Views
  • 4 replies
  • 5 kudos
Latest Reply
Kaniz
Community Manager
  • 5 kudos

Hi @Roger Bieri, I appreciate your attempt to choose the best answer for us. I'm glad you got your query resolved. @Joseph Kambourakis and @Adrian Łobacz, thank you for giving excellent answers.

  • 5 kudos
3 More Replies
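For reference, a hedged sketch of the dblink-style pattern usually used for this on Databricks: read each remote table over JDBC, register it as a temp view, and join the views in Spark SQL. Host, credentials, secret scope, and table names are placeholders.

oracle_url = "jdbc:oracle:thin:@//dbhost:1521/ORCLPDB1"

def jdbc_view(table, view_name):
    # Read a remote table over JDBC and expose it to Spark SQL under a view name.
    (spark.read.format("jdbc")
         .option("url", oracle_url)
         .option("dbtable", table)
         .option("user", "<user>")
         .option("password", dbutils.secrets.get("<scope>", "oracle-pw"))
         .option("driver", "oracle.jdbc.driver.OracleDriver")
         .load()
         .createOrReplaceTempView(view_name))

jdbc_view("schema1.some_table", "lt")
jdbc_view("schema2.other_table", "rt")

# The join the question sketches, now expressed over the two views.
result = spark.sql("""
    SELECT lt.field1, rt.field2
    FROM lt
    JOIN rt ON lt.id = rt.id
""")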
Fred_F
by New Contributor III

JDBC connection timeout on workflow cluster

Hi there, I've a batch process configured in a workflow which fails due to a JDBC timeout on a Postgres DB. I checked the JDBC connection configuration, and it seems to work when I query a table and do a df.show() in the process, and it displays th...

  • 4003 Views
  • 7 replies
  • 5 kudos
Latest Reply
Kaniz
Community Manager
  • 5 kudos

Hi @Fred Foucart, we haven't heard from you since the last response from @Rama Krishna N, and I was checking back to see if his suggestions helped you. Otherwise, if you have any solution, please share it with the community, as it can be helpful to ...

  • 5 kudos
6 More Replies
Direo
by Contributor

Azure databricks integration with Datadog

Before running a script which would create an agent on a cluster, you have to provide the SPARK_LOCAL_IP variable. How can I find it? Does it change over time, or is it a constant?

  • 803 Views
  • 1 replies
  • 1 kudos
Latest Reply
Debayan
Esteemed Contributor III
  • 1 kudos

Hi, could you please refer to https://www.datadoghq.com/blog/databricks-monitoring-datadog/ and let us know if this helps. SPARK_LOCAL_IP is an environment variable; FYI: https://spark.apache.org/docs/latest/configuration.html

  • 1 kudos
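A minimal sketch for inspecting the value from a notebook, assuming it is exposed to the notebook environment on your cluster (worth verifying); since it is assigned when the cluster starts, expect it to change across cluster restarts rather than stay constant.

import os
import socket

print(os.environ.get("SPARK_LOCAL_IP"))             # None if the variable isn't set in this context
print(socket.gethostbyname(socket.gethostname()))   # the driver's own address as a cross-check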
SIRIGIRI
by Contributor
  • 546 Views
  • 1 replies
  • 2 kudos
Latest Reply
Debayan
Esteemed Contributor III
  • 2 kudos

Hi, what kind of internal problem are you talking about? Anything in particular?

  • 2 kudos
Kajorn
by New Contributor III

Resolved! WHEN NOT MATCHED BY SOURCE Syntax error at or near 'BY' (DBR 11.2 ML)

Hi, I have trouble executing the SQL statement given below: MERGE INTO warehouse.pdr_debit_card as TARGET USING (SELECT * FROM ( SELECT CIF, CARD_TYPE, ISSUE_DATE, MATURITY_DATE, BOO, DATA_DATE, row_number(...

  • 3270 Views
  • 2 replies
  • 0 kudos
Latest Reply
Debayan
Esteemed Contributor III
  • 0 kudos

Hi, please refer to: https://docs.databricks.com/sql/language-manual/delta-merge-into.html

  • 0 kudos
1 More Replies
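For context on the error above: the WHEN NOT MATCHED BY SOURCE clause is only accepted on Databricks Runtime versions newer than 11.2 (the linked MERGE documentation lists the supported versions), which is consistent with the parser stopping at BY. A hedged sketch of the clause itself, with the source view name and join keys simplified from the question:

# `latest_cards` is a hypothetical deduplicated source view built from the question's subquery.
spark.sql("""
    MERGE INTO warehouse.pdr_debit_card AS target
    USING latest_cards AS source
      ON target.CIF = source.CIF AND target.CARD_TYPE = source.CARD_TYPE
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
    WHEN NOT MATCHED BY SOURCE THEN DELETE
""")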
Ender
by New Contributor

Delta Live Tables migration

How can I migrate a Delta Live Tables workflow to another Databricks workspace? PS: The data source/sink will remain the same. I only want to migrate the DLT config.

  • 528 Views
  • 0 replies
  • 0 kudos
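One hedged way to do this (an assumption, since the thread above does not record a resolution) is to copy just the pipeline settings with the Delta Live Tables Pipelines REST API; hosts, tokens, and the pipeline id below are placeholders, and notebook paths inside the spec usually need adjusting for the target workspace.

import requests

SRC_HOST, SRC_TOKEN = "https://src-workspace.cloud.databricks.com", "<src-pat>"
DST_HOST, DST_TOKEN = "https://dst-workspace.cloud.databricks.com", "<dst-pat>"
PIPELINE_ID = "<pipeline-id>"

# Export the pipeline settings ("spec") from the source workspace.
spec = requests.get(
    f"{SRC_HOST}/api/2.0/pipelines/{PIPELINE_ID}",
    headers={"Authorization": f"Bearer {SRC_TOKEN}"},
).json()["spec"]

spec.pop("id", None)   # drop the workspace-generated id before re-creating

# Re-create the pipeline with the same settings in the target workspace.
resp = requests.post(
    f"{DST_HOST}/api/2.0/pipelines",
    headers={"Authorization": f"Bearer {DST_TOKEN}"},
    json=spec,
)
print(resp.json())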
Lizhi_Dong
by New Contributor II

What would be the best plan for independent course creator?

Hi folks! I want to use Databricks Community Edition as the platform to teach online courses. As you may know, for Community Edition, you need to create a new cluster when the old one terminates. I found out, however, that tables created from the old cluster...

  • 988 Views
  • 4 replies
  • 0 kudos
Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 0 kudos

You can create a notebook for students which recreates everything, like creating the tables etc., before every exercise.

  • 0 kudos
3 More Replies
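A hedged sketch of the setup-notebook idea from the reply above: a cell students run at the start of each exercise that rebuilds the course tables from a durable source, so it doesn't matter that the Community Edition cluster was recreated; the dataset path and table names are placeholders.

# Recreate the course database and tables from scratch on every (new) cluster.
spark.sql("CREATE DATABASE IF NOT EXISTS course")

(spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("/databricks-datasets/nyctaxi/tripdata/yellow/")   # any durable source works here
      .limit(10_000)                                          # keep it small for free clusters
      .write.mode("overwrite")
      .saveAsTable("course.trips"))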
KKo
by Contributor III

Move whole workflow from Dev to Prod

I have a workflow created in Dev, and now I want to move the whole thing to Prod and schedule it. The workflow has multiple notebooks, dependent libraries, parameters, and such. How can I move the whole thing to Prod, instead of moving each notebook and rec...

  • 3521 Views
  • 4 replies
  • 0 kudos
Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 0 kudos

Alternatively, you can just click the three-dots option on the workflow, choose "View JSON", and save the JSON. Then use it in a REST API call to create a new workflow/job from that JSON (but usually some parts need to be removed).

  • 0 kudos
3 More Replies
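To make the "View JSON" route above concrete, a hedged sketch of re-creating the job in Prod with the Jobs API; the host, token, and file name are placeholders, and the exact fields to strip can vary with how the JSON was exported.

import json
import requests

with open("dev_job.json") as f:
    exported = json.load(f)

# A jobs/get-style export wraps the definition in metadata; jobs/create wants only the settings.
settings = exported.get("settings", exported)
for key in ("job_id", "created_time", "creator_user_name", "run_as_user_name"):
    settings.pop(key, None)

resp = requests.post(
    "https://prod-workspace.cloud.databricks.com/api/2.1/jobs/create",
    headers={"Authorization": "Bearer <prod-pat>"},
    json=settings,
)
print(resp.json())   # returns the new job_id on success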
RRO
by Contributor

AutoML forecasting with monthly data?

ARIMA and FBProphet have the capability to forecast monthly data. When using AutoML (via the API or the UI), it seems like it is not possible to use a monthly frequency (e.g. 'MS'). Is there a way / workaround to make it work with monthly data, or is it pla...

  • 950 Views
  • 1 replies
  • 3 kudos
Latest Reply
Mateusz_Lomansk
New Contributor II
  • 3 kudos

It is possible to use AutoML to forecast monthly data, but it may require some additional steps or adjustments. One approach is to resample the monthly data to a lower frequency such as weekly or daily, and then use AutoML to forecast at that lower fr...

  • 3 kudos
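A hedged sketch of the resampling workaround described in the reply above: expand the monthly series to a frequency AutoML accepts (daily here, via forward-fill), run the forecast there, and aggregate the predictions back to month level; column names and the example series are placeholders.

import pandas as pd

monthly = pd.DataFrame(
    {"ds": pd.date_range("2022-01-01", periods=24, freq="MS"), "y": range(24)}
)

# Repeat each monthly value across its days so the series has a daily frequency.
daily = (
    monthly.set_index("ds")
           .resample("D").ffill()
           .reset_index()
)
# `daily` can then be handed to AutoML forecasting (e.g. databricks.automl.forecast with a
# daily frequency), and the daily predictions rolled back up into monthly buckets.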