Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

abhi_1825
by New Contributor III
  • 3407 Views
  • 6 replies
  • 1 kudos

Resolved! Databricks certified Data Engineer Associate V3 Exam - Voucher code not working.

So I have a voucher code which I received after completing the Lakehouse Fundamentals exam. However, I am not able to use it for the Databricks Certified Data Engineer Associate V3 exam in the public section. It's working for the V2 version along with other exams...

Latest Reply
abhi_1825
New Contributor III
  • 1 kudos

Thanks, guys. My voucher code problem was resolved after raising a ticket with the Training team.

5 More Replies
Toy
by New Contributor II
  • 2676 Views
  • 3 replies
  • 0 kudos

Pipeline Error [Py4JJavaError] com.databricks.WorkflowException: com.databricks.NotebookExecutionException: FAILED

I have a pipeline that used to run successfully and now all of a sudden returns this error that I cannot resolve: [Py4JJavaError]

Latest Reply
Toy
New Contributor II
  • 0 kudos

Hi guys, you're right, the problem is with the child notebook. All my notebooks are failing at this point. I can't seem to win at solving this error.

2 More Replies
Phani1
by Valued Contributor II
  • 1251 Views
  • 1 reply
  • 0 kudos

Databricks issue with returning results to Power BI

While returning results to Power BI, Databricks completed the session (in 9 minutes), but Power BI keeps waiting for the results (more than 7 hours for 20 GB of data). Could you please help us with this?

Latest Reply
daniel_sahal
Esteemed Contributor
  • 0 kudos

@Janga Reddy​ It's really hard to find a solution here without further investigating PBI Gateway performance, data models, etc. If Databricks completed the session in 9 minutes, then I assume the issue could be with the performance of the PBI datasets.

MachuPichu
by New Contributor III
  • 6857 Views
  • 9 replies
  • 4 kudos

Copying Delta table from QA to Prod

I am looking for a way to copy a large managed Delta table (about 4 TB) from one environment (QA) to another (Prod). QA and Prod are in different subscriptions and in different regions. I understand Databricks provides a way to clone tables, but I am not sure i...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 4 kudos

I would use Data Factory to copy the 4 TB of files, as it has gigantic throughput. After the copy completes, I would register it as a table in the new metastore.

8 More Replies
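The copy-then-register approach suggested in the reply can be sketched as follows. This is a minimal sketch, not the poster's actual setup; the catalog, table, and storage names are placeholders.

```python
# Hypothetical sketch: after copying the Delta files across subscriptions
# (e.g. with Azure Data Factory), register the copied location as a table
# in the target metastore. All names below are placeholders.

def register_delta_table_ddl(table_name: str, location: str) -> str:
    """Build the DDL that registers an existing Delta directory as a table."""
    return f"CREATE TABLE IF NOT EXISTS {table_name} USING DELTA LOCATION '{location}'"

ddl = register_delta_table_ddl(
    "prod_catalog.sales",
    "abfss://data@prodaccount.dfs.core.windows.net/delta/sales",
)
# In a notebook on the Prod workspace, this would be run as: spark.sql(ddl)
```

Because the Delta directory is self-describing, registering the location is enough; no schema DDL needs to be repeated.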
521746
by New Contributor
  • 520 Views
  • 0 replies
  • 0 kudos

Deleting account level group through API throws error

Hi team, we are getting an error when deleting an account-level group with the SCIM API: { "schemas": [ "urn:ietf:params:scim:api:messages:2.0:Error" ], "detail": "INTERNAL_ERROR: Unexpected error: getTenantIdForAccountId is not implemented in HybridWor...

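For reference, a hedged sketch of the account-level SCIM delete call being attempted; the host, account id, and group id below are placeholders. The INTERNAL_ERROR in the post looks server-side, so a support ticket is likely needed regardless.

```python
# Sketch of the account-level SCIM group delete URL (placeholders throughout).

def scim_group_delete_url(account_host: str, account_id: str, group_id: str) -> str:
    """Assemble the account-level SCIM Groups endpoint for a DELETE request."""
    return f"https://{account_host}/api/2.0/accounts/{account_id}/scim/v2/Groups/{group_id}"

url = scim_group_delete_url("accounts.azuredatabricks.net", "<account-id>", "<group-id>")
# The request itself would then be issued with an account-admin token:
#   requests.delete(url, headers={"Authorization": "Bearer <token>"})
```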
SivaRamaKrishna
by New Contributor
  • 6668 Views
  • 2 replies
  • 0 kudos

Resolved! I got this error; help me sort it out

The file is actually in the DEV environment... My job was syncing data in the QA environment. The query executed in the QA environment and I got this error... How do I solve this issue?​

Latest Reply
jose_gonzalez
Databricks Employee
  • 0 kudos

Hi @SivaRamaKrishna Mulagalapati​, could you copy and paste the full error stack trace, and also share the code you are trying to run?

1 More Reply
hussi
by New Contributor
  • 1333 Views
  • 3 replies
  • 0 kudos

I also have the same issue, can someone please help? I just passed the Lakehouse Fundamentals Accreditation and I haven't received any badge or ce...

I also have the same issue, can someone please help? I just passed the Lakehouse Fundamentals Accreditation and I haven't received any badge or certificate for it. I understand that I need to go to credentials.databricks.com but it is not there. How ...

Latest Reply
jose_gonzalez
Databricks Employee
  • 0 kudos

Hi @Husain Sadriwala​, just a friendly follow-up: did you receive your badge yet? Please let us know if you still need help.

2 More Replies
Craig_
by New Contributor III
  • 3535 Views
  • 4 replies
  • 0 kudos

Resolved! Caveats when importing functions from REPO stored .py files

The ability to import .py files into notebooks looked like a clean and easy way to reuse code and to ensure all notebooks are using the same version of code. However, two items remain unclear after scouring documentation and forums. Are these the rig...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 0 kudos

"does not have access to the spark session by default" Yes, that is correct; you need to pass a reference to the spark variable into the class or function, for example when you call it from the notebook: function_from_file(spark=spark). displayHTML() is desi...

3 More Replies
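The pattern described in the reply can be sketched as follows. This is a minimal illustration with hypothetical module and function names: a function defined in a repo `.py` file does not see the notebook's `spark` session implicitly, so the session is passed in as an argument.

```python
# shared/utils.py (hypothetical module checked into the repo)

def row_count(spark, table_name: str) -> int:
    """Count rows in a table using the session explicitly handed in by the caller."""
    return spark.sql(f"SELECT COUNT(*) AS c FROM {table_name}").collect()[0]["c"]

# In the notebook, the globally available session is passed in at the call site:
#   from shared.utils import row_count
#   n = row_count(spark, "my_db.my_table")
```

Passing `spark` explicitly also makes such functions easy to unit-test with a stub session, which is a side benefit of the pattern.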
jonathan-dufaul
by Valued Contributor
  • 1576 Views
  • 2 replies
  • 0 kudos

Is there a function similar to display that downloads a dataframe?

I find myself constantly having to do display(df), and then "recompute with <5g records and download". I was just hoping I could skip the middleman and download from the get-go. Ideally it'd be a function like download(df, num_rows="max") where num_rows i...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 0 kudos

Question: where do you want to download it to? If to a cloud location, use the regular DataFrameWriter. You can install, for example, Azure Storage Explorer on your computer. Some cloud storage you can even mount on your system as a folder or network share.

1 More Reply
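The DataFrameWriter route from the reply can be sketched as a small helper close to the `download(df, num_rows=...)` shape the poster asked for. This is a hedged sketch, not an existing API; the export path is a placeholder.

```python
# Hypothetical helper: write a DataFrame straight to a storage path instead of
# going through display(df) plus the download button. The path is a placeholder.

def export_df(df, path, num_rows=None, fmt="csv"):
    """Write df (optionally limited to num_rows) to a storage path."""
    if num_rows is not None:
        df = df.limit(num_rows)
    (df.coalesce(1)                 # single output file, convenient to fetch
       .write.format(fmt)
       .option("header", True)
       .mode("overwrite")
       .save(path))

# export_df(df, "abfss://exports@myaccount.dfs.core.windows.net/out", num_rows=100_000)
```

From there the file can be pulled down with Azure Storage Explorer or a mounted share, as the reply suggests.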
mmenjivar
by New Contributor II
  • 2234 Views
  • 2 replies
  • 0 kudos

How to get the run_id from a previous task in a Databricks job

Hi, is there any way to share the run_id from task_A to task_B within the same job when task_A is a dbt task?

Latest Reply
Debayan
Databricks Employee
  • 0 kudos

Hi, you can pass {{job_id}} and {{run_id}} in the job arguments, print that information, and save it wherever it is needed. Please find below the documentation for the same: https://docs.databricks.com/data-engineering/jobs/jobs.html#task-parameter-varia...

1 More Reply
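The parameter-variable approach from the linked docs can be sketched as follows; the parameter names are illustrative. The `{{...}}` templates stay literal in the job definition and are substituted by Databricks at run time.

```python
import json

# In the job definition, a task that needs the ids gets parameters like these
# (the {{...}} placeholders are left verbatim; Databricks fills them in at run time):
task_parameters = json.loads('{"job_id": "{{job_id}}", "run_id": "{{run_id}}"}')

# A downstream notebook task would then read the substituted values with, e.g.:
#   run_id = dbutils.widgets.get("run_id")
```

One caveat: `{{run_id}}` resolves per task, so for cross-task correlation the parent run id (`{{parent_run_id}}` in newer docs) may be the value actually wanted; verify against the linked documentation.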
jonathan-dufaul
by Valued Contributor
  • 4217 Views
  • 5 replies
  • 4 kudos

Resolved! How can I look up the first ancestor (person,first_ancestor) of a record from a table that has (child,parent) records?

I have a table that looks like this:

/* input */
-- | parent | child |
-- | ------ | ----- |
-- | 1      | 2     |
-- | 2      | 3     |
-- | 3      | 4     |
-- | 5      | 6     |
-- | 6      | 7     |
-- | 8      | 9     |
-- | 10     | 11    |

and I...

Latest Reply
JGil
New Contributor III
  • 4 kudos

@Landan George​ Hey, I am looking into the same issue, but when I execute what's suggested in the post for CTE_Recursive (https://medium.com/globant/how-to-implement-recursive-queries-in-spark-3d26f7ed3bc9) I get the error: Error in SQL statement: AnalysisExcep...

4 More Replies
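Spark SQL has no recursive CTE support, which is why `WITH RECURSIVE` queries fail with an AnalysisException. The ancestor lookup itself is just a loop over child-to-parent links; here is a pure-Python sketch of that logic using the sample table from the question (in Spark this loop becomes an iterative self-join that stops when no rows change).

```python
def first_ancestor(parent_of, child):
    """Follow child -> parent links until a node with no parent is reached."""
    node = child
    while node in parent_of:
        node = parent_of[node]
    return node

# child -> parent map built from the (parent, child) rows in the question
parent_of = {2: 1, 3: 2, 4: 3, 6: 5, 7: 6, 9: 8, 11: 10}
```

For example, `first_ancestor(parent_of, 4)` walks 4 → 3 → 2 → 1 and returns 1.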
bhawik21
by New Contributor II
  • 2651 Views
  • 4 replies
  • 0 kudos

Resolved! How do I invoke a data enrichment function before model.predict while serving the model

I have used MLflow and got my model served through a REST API. It works fine when all model features are provided. But my use case is that only a single feature (the primary key) will be provided by the consumer application, and my code has to look up th...

Latest Reply
LuisL
New Contributor II
  • 0 kudos

You can create a custom endpoint for your REST API that handles the data massaging before calling the model.predict function. This endpoint can take in the primary key as an input, retrieve the additional features from the database based on that key, ...

3 More Replies
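The wrapper idea in the reply can be sketched as a plain class whose `predict()` enriches the incoming key before delegating to the trained model. This is a hedged sketch with illustrative names; with MLflow the same shape would be wrapped in an `mlflow.pyfunc.PythonModel` and logged so the serving endpoint accepts just the primary key.

```python
class EnrichingModel:
    """Wraps a trained model; predict() turns primary keys into full feature rows first."""

    def __init__(self, inner_model, feature_lookup):
        self.inner = inner_model
        self.lookup = feature_lookup   # e.g. rows from a feature table, keyed by id

    def predict(self, keys):
        rows = [self.lookup[k] for k in keys]   # the data-enrichment step
        return self.inner.predict(rows)
```

A usage sketch: `EnrichingModel(model, features_by_id).predict([123])` fetches the feature row for key 123 and scores it, so the consumer application never has to supply the full feature vector.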
powerus
by New Contributor III
  • 5152 Views
  • 1 reply
  • 0 kudos

Resolved! "Failure to initialize configuration: Invalid configuration value detected for fs.azure.account.key" using com.databricks:spark-xml_2.12:0.12.0

Hi community, I'm trying to read XML data from Azure Data Lake Gen 2 using com.databricks:spark-xml_2.12:0.12.0: spark.read.format('XML').load('abfss://[CONTAINER]@[storageaccount].dfs.core.windows.net/PATH/TO/FILE.xml') The code above gives the followin...

Latest Reply
powerus
New Contributor III
  • 0 kudos

The issue was also raised here: https://github.com/databricks/spark-xml/issues/591. A fix is to use the "spark.hadoop" prefix in front of the fs.azure Spark config keys: spark.hadoop.fs.azure.account.oauth2.client.id.nubulosdpdlsdev01.dfs.core.windows.n...

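The prefixing fix can be sketched as follows; the storage account and client id are placeholders, not the values from the thread. The `spark.hadoop.` prefix forwards each key into the Hadoop configuration that spark-xml reads, which plain `fs.azure.*` session keys may not reach.

```python
# Sketch of cluster Spark config entries with the "spark.hadoop." prefix applied.
# Account name and client id below are placeholders.

account = "mystorageaccount"
base = "spark.hadoop.fs.azure.account"
cluster_conf = {
    f"{base}.auth.type.{account}.dfs.core.windows.net": "OAuth",
    f"{base}.oauth2.client.id.{account}.dfs.core.windows.net": "<client-id>",
}
# Each key/value pair goes into the cluster's Spark config
# (or spark.conf.set(...) before the session touches the storage account).
```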
sid_de
by New Contributor II
  • 3301 Views
  • 2 replies
  • 2 kudos

404 Not Found [IP: 185.125.190.36 80] on trying to install google-chrome in databricks spark driver

We are installing google-chrome-stable on a Databricks cluster using apt-get install, which had been working fine for a long time, but in the past few days it has started to fail intermittently. The following is the code that we run: %sh sudo curl -s...

Latest Reply
sid_de
New Contributor II
  • 2 kudos

Hi, the issue is still persistent. We are trying to solve it by using a Docker image with a preinstalled Selenium driver and Chrome browser. Regards, Dharmin

1 More Reply
Fred_F
by New Contributor III
  • 7382 Views
  • 5 replies
  • 5 kudos

JDBC connection timeout on workflow cluster

Hi there,​ I have a batch process configured in a workflow which fails due to a JDBC timeout on a Postgres DB.​ I checked the JDBC connection configuration, and it seems to work when I query a table and do a df.show() in the process, and it displays th...

Latest Reply
RKNutalapati
Valued Contributor
  • 5 kudos

Hi @Fred Foucart​, the above code looks good to me. Can you try the below code as well?

spark.read \
  .format("jdbc") \
  .option("url", f"jdbc:postgresql://{host}/{database}") \
  .option("driver", "org.postgresql.Driver") \
  .option("user", username) ...

4 More Replies
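Since the reply's snippet is truncated, here is a hedged sketch of the full option set for Spark's standard JDBC source; host, database, credentials, and table are placeholders.

```python
# Assemble the option map for Spark's standard JDBC source (placeholders throughout).

def jdbc_read_options(host, database, user, password, table):
    """Build the options dict for a spark.read JDBC load against Postgres."""
    return {
        "url": f"jdbc:postgresql://{host}/{database}",
        "driver": "org.postgresql.Driver",
        "user": user,
        "password": password,
        "dbtable": table,
    }

opts = jdbc_read_options("db.example.com", "sales", "svc_user", "****", "public.orders")
# df = spark.read.format("jdbc").options(**opts).load()
```

For timeout issues specifically, adding a `connectTimeout`/`socketTimeout` to the JDBC URL (Postgres driver parameters) is a common next step worth checking against the driver docs.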
