Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

mmenjivar
by New Contributor II
  • 2655 Views
  • 2 replies
  • 0 kudos

How to get the run_id from a previous task in a Databricks job

Hi, is there any way to share the run_id from a task_A to a task_B within the same job when task_A is a dbt task?

Latest Reply
Debayan
Databricks Employee
  • 0 kudos

Hi, you can pass {{job_id}} and {{run_id}} as job arguments, then print those values or save them wherever they are needed. Please find the documentation below: https://docs.databricks.com/data-engineering/jobs/jobs.html#task-parameter-varia...
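As a minimal sketch of the suggestion above, assume task_B is a Python script task whose parameters are set to ["--job-id", "{{job_id}}", "--run-id", "{{run_id}}"]. The flag names and the parse_run_context helper are hypothetical; only the {{job_id}} and {{run_id}} variables come from the reply. The script could then read the substituted values like this:

```python
import argparse


def parse_run_context(argv):
    """Parse job context values that were passed in as task parameters.

    The flag names (--job-id, --run-id) are illustrative assumptions; they
    must match whatever is configured in the task's parameter list.
    """
    parser = argparse.ArgumentParser(description="Read job context from task parameters")
    parser.add_argument("--job-id", required=True)
    parser.add_argument("--run-id", required=True)
    args = parser.parse_args(argv)
    return {"job_id": args.job_id, "run_id": args.run_id}


# Example with literal values standing in for the substituted variables.
context = parse_run_context(["--job-id", "123", "--run-id", "456"])
print(context)
```

In a real task the list would come from sys.argv[1:]. The values arrive as strings, so convert them if a numeric ID is needed.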

  • 0 kudos
1 More Replies
jonathan-dufaul
by Valued Contributor
  • 4871 Views
  • 5 replies
  • 4 kudos

Resolved! How can I look up the first ancestor (person,first_ancestor) of a record from a table that has (child,parent) records?

I have a table that looks like this:

/* input */
-- | parent | child |
-- | ------ | ----- |
-- | 1      | 2     |
-- | 2      | 3     |
-- | 3      | 4     |
-- | 5      | 6     |
-- | 6      | 7     |
-- | 8      | 9     |
-- | 10     | 11    |

and I...

Latest Reply
JGil
New Contributor III
  • 4 kudos

@Landan George Hey, I am looking into the same issue, but when I execute what's suggested in the post for CTE_Recursive (https://medium.com/globant/how-to-implement-recursive-queries-in-spark-3d26f7ed3bc9) I get this error: Error in SQL statement: AnalysisExcep...
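Where a recursive query is not available, one alternative is to walk the parent links iteratively. The sketch below is plain Python over the (parent, child) rows from the question, not the accepted answer from the thread; the first_ancestors helper is a hypothetical name, and it assumes each child has at most one parent and that the rows fit in driver memory.

```python
def first_ancestors(edges):
    """Map every child to the topmost ancestor reachable via parent links.

    edges is an iterable of (parent, child) pairs. Assumes each child has a
    single parent; a visited set guards against accidental cycles.
    """
    parent_of = {child: parent for parent, child in edges}
    result = {}
    for person in parent_of:
        ancestor = parent_of[person]
        seen = {person}
        # Keep climbing while the current ancestor itself has a parent.
        while ancestor in parent_of and ancestor not in seen:
            seen.add(ancestor)
            ancestor = parent_of[ancestor]
        result[person] = ancestor
    return result


edges = [(1, 2), (2, 3), (3, 4), (5, 6), (6, 7), (8, 9), (10, 11)]
for person, ancestor in sorted(first_ancestors(edges).items()):
    print(person, ancestor)
```

For tables too large to collect, the same idea can be expressed as repeated self-joins that stop once no row changes.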

  • 4 kudos
4 More Replies
bhawik21
by New Contributor II
  • 3135 Views
  • 4 replies
  • 0 kudos

Resolved! How do I invoke a data enrichment function before model.predict while serving the model

I have used MLflow and got my model served through a REST API. It works fine when all model features are provided. But my use case is that only a single feature (the primary key) will be provided by the consumer application, and my code has to lookup th...

Latest Reply
LuisL
New Contributor II
  • 0 kudos

You can create a custom endpoint for your REST API that handles the data massaging before calling the model.predict function. This endpoint can take in the primary key as an input, retrieve the additional features from the database based on that key, ...
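The enrichment step described above can be sketched as a thin wrapper around the trained model. Everything named here (EnrichingModel, the in-memory FEATURES table, the stand-in _SumModel) is a hypothetical illustration; in practice the lookup would query the feature database. With MLflow, the same logic could sit in the predict method of a mlflow.pyfunc.PythonModel subclass so that the served model does the lookup itself.

```python
class EnrichingModel:
    """Wraps a model so callers only need to send primary keys."""

    def __init__(self, model, feature_lookup):
        self.model = model                    # any object with a predict(rows) method
        self.feature_lookup = feature_lookup  # callable: primary key -> feature row

    def predict(self, primary_keys):
        # Fetch the full feature row for each key, then delegate to the model.
        rows = [self.feature_lookup(key) for key in primary_keys]
        return self.model.predict(rows)


class _SumModel:
    """Stand-in for a trained model; it just sums each feature row."""

    def predict(self, rows):
        return [sum(row) for row in rows]


# Hypothetical feature store keyed by primary key.
FEATURES = {"cust-1": [1.0, 2.0], "cust-2": [3.0, 4.0]}

served = EnrichingModel(_SumModel(), FEATURES.__getitem__)
predictions = served.predict(["cust-1", "cust-2"])
print(predictions)
```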

  • 0 kudos
3 More Replies
powerus
by New Contributor III
  • 5738 Views
  • 1 replies
  • 0 kudos

Resolved! "Failure to initialize configuration: Invalid configuration value detected for fs.azure.account.key" using com.databricks:spark-xml_2.12:0.12.0

Hi community,
I'm trying to read XML data from Azure Datalake Gen 2 using com.databricks:spark-xml_2.12:0.12.0:
spark.read.format('XML').load('abfss://[CONTAINER]@[storageaccount].dfs.core.windows.net/PATH/TO/FILE.xml')
The code above gives the followin...

Latest Reply
powerus
New Contributor III
  • 0 kudos

The issue was also raised here: https://github.com/databricks/spark-xml/issues/591
A fix is to use the "spark.hadoop" prefix in front of the fs.azure spark config keys:
spark.hadoop.fs.azure.account.oauth2.client.id.nubulosdpdlsdev01.dfs.core.windows.n...
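To illustrate the prefixing described above, here is a small sketch that rewrites fs.azure.* keys into their spark.hadoop.* form before they are placed in the cluster's Spark config. The with_hadoop_prefix helper and the <storage-account> and <application-id> placeholders are assumptions for illustration, not values from the post.

```python
def with_hadoop_prefix(conf):
    """Return a copy of conf with 'spark.hadoop.' added to fs.azure.* keys."""
    prefixed = {}
    for key, value in conf.items():
        if key.startswith("fs.azure."):
            key = "spark.hadoop." + key
        prefixed[key] = value
    return prefixed


account = "<storage-account>"  # placeholder, replace with the real account name
cluster_conf = with_hadoop_prefix({
    f"fs.azure.account.auth.type.{account}.dfs.core.windows.net": "OAuth",
    f"fs.azure.account.oauth2.client.id.{account}.dfs.core.windows.net": "<application-id>",
})

# Print in "key value" form, one entry per line.
for key, value in cluster_conf.items():
    print(key, value)
```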

  • 0 kudos
sid_de
by New Contributor II
  • 3685 Views
  • 2 replies
  • 2 kudos

404 Not Found [IP: 185.125.190.36 80] when trying to install google-chrome on the Databricks Spark driver

We install google-chrome-stable on a Databricks cluster using apt-get install. This has been working fine for a long time, but over the past few days it has started to fail intermittently. The following is the code that we run.
%sh sudo curl -s...

Latest Reply
sid_de
New Contributor II
  • 2 kudos

Hi, the issue still persists. We are trying to solve this by using a Docker image with a preinstalled Selenium driver and Chrome browser.
Regards,
Dharmin

  • 2 kudos
1 More Replies
Fred_F
by New Contributor III
  • 8416 Views
  • 5 replies
  • 5 kudos

JDBC connection timeout on workflow cluster

Hi there, I have a batch process configured in a workflow which fails due to a JDBC timeout on a Postgres DB. I checked the JDBC connection configuration and it seems to work when I query a table and do a df.show() in the process, and it displays th...

Latest Reply
RKNutalapati
Valued Contributor
  • 5 kudos

Hi @Fred Foucart,
The above code looks good to me. Can you try the code below as well?
spark.read \
  .format("jdbc") \
  .option("url", f"jdbc:postgresql://{host}/{database}") \
  .option("driver", "org.postgresql.Driver") \
  .option("user", username) ...
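Since the thread is about a timeout, one thing that may be worth trying is setting the PostgreSQL driver's connectTimeout and socketTimeout (both in seconds) explicitly in the JDBC URL. The sketch below only builds the option dictionary; the build_postgres_jdbc_options helper, the host, table, credentials, and the timeout values are all illustrative assumptions rather than settings from the thread.

```python
from urllib.parse import urlencode


def build_postgres_jdbc_options(host, database, table, username, password,
                                connect_timeout_s=30, socket_timeout_s=600):
    """Build options for Spark's JDBC reader with explicit driver timeouts."""
    timeouts = urlencode({
        "connectTimeout": connect_timeout_s,  # seconds to wait for the connection
        "socketTimeout": socket_timeout_s,    # seconds to wait on socket reads
    })
    return {
        "url": f"jdbc:postgresql://{host}/{database}?{timeouts}",
        "driver": "org.postgresql.Driver",
        "dbtable": table,
        "user": username,
        "password": password,
    }


options = build_postgres_jdbc_options(
    "db.example.com:5432", "sales", "public.orders", "reader", "secret"
)
print(options["url"])

# On a cluster with a SparkSession named spark:
# df = spark.read.format("jdbc").options(**options).load()
```

Keeping credentials in a secret scope rather than in code is preferable; the literal password here is only a placeholder.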

  • 5 kudos
4 More Replies
Direo
by Contributor II
  • 2642 Views
  • 1 replies
  • 1 kudos

Azure Databricks integration with Datadog

Before running a script that creates an agent on a cluster, you have to provide the SPARK_LOCAL_IP variable. How can I find it? Does it change over time, or is it constant?

Latest Reply
Debayan
Databricks Employee
  • 1 kudos

Hi, could you please refer to https://www.datadoghq.com/blog/databricks-monitoring-datadog/ and let us know if this helps? FYI, SPARK_LOCAL_IP is an environment variable; see https://spark.apache.org/docs/latest/configuration.html

  • 1 kudos
SIRIGIRI
by Contributor
  • 1257 Views
  • 1 replies
  • 2 kudos

What is the probability that the worker node is having an internal problem for a speculative task to start?

@DataBricksHelp232​ @Arjun Krishna S R​ @akash kumar​ 

Latest Reply
Debayan
Databricks Employee
  • 2 kudos

Hi, what kind of internal problem are you referring to? Anything in particular?

  • 2 kudos
Kajorn
by New Contributor III
  • 6778 Views
  • 2 replies
  • 0 kudos

Resolved! WHEN NOT MATCHED BY SOURCE Syntax error at or near 'BY' (DBR 11.2 ML)

Hi, I have trouble executing the SQL statement below.
MERGE INTO warehouse.pdr_debit_card as TARGET USING (SELECT * FROM ( SELECT CIF, CARD_TYPE, ISSUE_DATE, MATURITY_DATE, BOO, DATA_DATE, row_number(...

Latest Reply
Debayan
Databricks Employee
  • 0 kudos

Hi, please refer to: https://docs.databricks.com/sql/language-manual/delta-merge-into.html

  • 0 kudos
1 More Replies
Ender
by New Contributor II
  • 1101 Views
  • 0 replies
  • 0 kudos

Delta Live Tables migration

How can I migrate a Delta Live Tables workflow to another Databricks workspace?
PS: The data source/sink will remain the same. I only want to migrate the DLT config.

Lizhi_Dong
by New Contributor II
  • 2164 Views
  • 4 replies
  • 0 kudos

What would be the best plan for an independent course creator?

Hi folks! I want to use Databricks Community Edition as the platform to teach online courses. As you may know, in Community Edition you need to create a new cluster when the old one terminates. I found out, however, that tables created from the old cluster...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 0 kudos

You can create a notebook for students which recreates everything, like doing the installation of tables etc., before every exercise.

  • 0 kudos
3 More Replies
RRO
by Contributor
  • 1812 Views
  • 1 replies
  • 3 kudos

AutoML forecasting with monthly data?

ARIMA and FBProphet have the capability to forecast monthly data. When using AutoML (via the API or the UI) it seems like it is not possible to have a monthly freq (e.g. 'MS'). Is there a way or workaround to make it work with monthly data, or is it pla...

Latest Reply
MateuszLomanski
New Contributor II
  • 3 kudos

It is possible to use AutoML to forecast monthly data, but it may require some additional steps or adjustments. One approach is to resample the monthly data to a higher frequency such as weekly or daily, and then use AutoML to forecast at that higher fr...
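As a rough illustration of the resampling idea in this reply, the sketch below expands a monthly series to daily rows by repeating each month's value on every day of that month (a forward fill). This is only one possible choice and an assumption on my part: for quantities that are monthly totals, dividing by the number of days may be more appropriate. The monthly_to_daily helper and the sample values are hypothetical.

```python
import calendar
from datetime import date


def monthly_to_daily(monthly):
    """Expand (month_start, value) pairs into one (day, value) row per day."""
    daily = []
    for month_start, value in monthly:
        days_in_month = calendar.monthrange(month_start.year, month_start.month)[1]
        for day in range(1, days_in_month + 1):
            daily.append((date(month_start.year, month_start.month, day), value))
    return daily


monthly = [(date(2023, 1, 1), 100.0), (date(2023, 2, 1), 120.0)]
daily = monthly_to_daily(monthly)
print(len(daily), daily[0], daily[-1])
```

The daily forecasts would then need to be aggregated back to months to compare against the original series.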

  • 3 kudos
Ajay-Pandey
by Esteemed Contributor III
  • 4253 Views
  • 9 replies
  • 11 kudos

Databricks now supports running selected text in a cell. This will help us a lot when debugging code. On Windows, just select the line of code w...

Databricks now supports running selected text in a cell. This will help us a lot when debugging code. On Windows, just select the line of code you want to execute and press Ctrl+Shift+Enter.

Latest Reply
Nhan_Nguyen
Valued Contributor
  • 11 kudos

Thanks @Ajay Pandey, nice sharing

  • 11 kudos
8 More Replies
SIRIGIRI
by Contributor
  • 624 Views
  • 0 replies
  • 1 kudos

medium.com

During a shuffle operation, why does data move from memory to disk? Please find the detailed answer here. If you have any questions, please comment, and like and share if you are interested in upcoming articles. https://medium.com/@sharikrishna26/during-shuffle-operation-...

