Data Engineering

Forum Posts

by ksilva • New Contributor

04-29-2022 5:04:45 AM

1729 Views
2 replies
1 kudos

Incorrect secret value when loaded as environment variable

I recently faced an issue that took several hours to identify. I'm loading an environment variable with a secret: ENVVAR: {{secrets/scope/key}}. The secret is loaded in my application, and I could verify it's there, but its value is not correct. I realised tha...

Latest Reply

User16752242622
Valued Contributor

06-23-2022 11:40:51 AM

1 kudos

Hi @kleber silva, there was a known issue, which has now been resolved: when a $ character is included in a secret value, the $ and all subsequent text are truncated. Although your question is actually related to how Spark parses the value as a...

1 More Replies
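
The truncation described in the reply above can be checked for directly. This is only a minimal sketch, not Databricks code: truncated_at_dollar is a hypothetical helper and the sample values are invented for illustration. It reports whether the value the application sees is exactly the expected secret cut off at its first $ character.

```python
def truncated_at_dollar(expected: str, actual: str) -> bool:
    """Return True if `actual` is `expected` cut off at its first '$'."""
    cut = expected.find("$")
    if cut == -1:
        # No '$' in the expected value, so this kind of truncation cannot apply.
        return False
    return actual == expected[:cut]


# Invented sample values; in a real job `actual` would be read from os.environ.
print(truncated_at_dollar("abc$def", "abc"))
print(truncated_at_dollar("abc$def", "abc$def"))
```

A True result points at the $ truncation issue; a False result suggests the mismatch has another cause.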

by rgrosskopf • New Contributor II

04-27-2022 8:27:06 AM

3111 Views
2 replies
1 kudos

How to access secrets in Hashicorp Vault from Databricks notebooks?

I see in this blog post that Databricks supports Hashicorp Vault for secrets storage but I've been unable to find any additional details on how that would work. Specifically, how would I authenticate to Vault from within a Databricks notebook?

Latest Reply

jose_gonzalez
Moderator

07-25-2022 2:49:50 PM

1 kudos

Hi @Ryan Grosskopf, just a friendly follow-up. Did any of Prabakar's responses help you resolve your question? If so, please mark it as best. Otherwise, please let us know if you still need help.

1 More Replies

by KamKam • New Contributor

04-26-2022 1:52:28 AM

698 Views
2 replies
0 kudos

How to write to a folder in an Azure Data Lake container using Delta?

Hi All,
How do I write to a folder in an Azure Data Lake container using Delta? When I run:

    write_mode = 'overwrite'
    write_format = 'delta'
    save_path = '/mnt/container-name/folder-name'
    df.write \
        .mode(write_mode) \
        .format(write_format) \
        ....

Latest Reply

jose_gonzalez
Moderator

06-01-2022 5:07:28 PM

0 kudos

Hi @Kamalen Reddy, could you share the error message, please?

1 More Replies

by dududu • New Contributor II

04-25-2022 8:31:30 AM

538 Views
1 reply
0 kudos

How to explain the huge latency between two jobs? How to optimize the job to reduce the latency?

I have run into a problem, which you can see in the following picture: there is a long delay between some jobs. I don't understand what happened or how to optimize the job. Can anybody help me? Thanks a lot.

Latest Reply

jose_gonzalez
Moderator

07-25-2022 2:40:39 PM

0 kudos

Hi @jieping zhang, did you check the driver's logs? Do you see any error messages? Please provide more details.

by wyzer • Contributor II

04-12-2022 5:12:10 AM

2438 Views
9 replies
4 kudos

Unable to read an XML file of 9 GB

Hello, we have a large XML file (9 GB) that we can't read. We get this error: VM size limit. How can we change the VM size limit? We have tested many clusters, but none of them can read this file. Thank you for your help.

Latest Reply

jose_gonzalez
Moderator

07-25-2022 2:14:39 PM

4 kudos

Hi @Salah K., just a friendly follow-up. Did any of the responses help you resolve your question? If so, please mark it as best. Otherwise, please let us know if you still need help.

8 More Replies

by MarcJustice • New Contributor

04-05-2022 6:15:04 PM

885 Views
3 replies
3 kudos

Is the promise of a data lake simply about data science, data analytics and data quality, or can it also be an integral part of core transaction processing?

Upfront, I want to let you know that I'm not a veteran data jockey, so I apologize if this topic has been covered already or is simply too basic or narrow for this community. That said, I do need help so please feel free to point me in another direc...

Latest Reply

Kaniz
Community Manager

04-26-2022 8:46:25 AM

3 kudos

Hi @Marc Barnett, just a friendly follow-up. Do you still need help, or did @Aashita Ramteke's response help you find the solution? Please let us know.

2 More Replies

by celerity12 • New Contributor II

04-03-2022 12:45:20 PM

3389 Views
7 replies
4 kudos

Pulling list of running jobs using JOBS API 2.1

I need to list only the jobs that are currently running, without getting any other jobs. The command below fetches all the jobs: curl --location --request GET 'https://xxxxxx.gcp.databricks.com/api/2.1/jobs/list?active_only=true&expand_tasks=true&run_type=JOB_RUN...

Latest Reply

User16764241763
Honored Contributor

06-11-2022 1:31:45 AM

4 kudos

Hi @Sumit Rohatgi, it seems that active_only=true applies only to the jobs/runs/list API and not to jobs/list. Can you please try the jobs/runs/list API?

6 More Replies
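
The suggestion in the reply above, calling jobs/runs/list instead of jobs/list, could look roughly like the sketch below. It only builds the request URL. The host is the masked placeholder from the question, active_runs_url is a hypothetical helper name, and authentication (a bearer token header) is assumed to be handled by whatever HTTP client sends the request.

```python
from urllib.parse import urlencode


def active_runs_url(host: str) -> str:
    """Build a Jobs API 2.1 runs/list URL that asks only for active runs."""
    query = urlencode({"active_only": "true", "expand_tasks": "true"})
    return f"https://{host}/api/2.1/jobs/runs/list?{query}"


# Placeholder host copied from the question; replace it with a real workspace URL.
print(active_runs_url("xxxxxx.gcp.databricks.com"))
```

The same query string can be passed to curl in place of the jobs/list URL used in the question.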

by C_1 • New Contributor III

03-31-2022 9:06:52 PM

2720 Views
7 replies
5 kudos

Resolved! Databricks notebook command logging

Hello Community, I am looking for a Databricks notebook command logging feature for compliance purposes. My requirement is to log the exact Spark SQL fired by the user. I didn't find Spark SQL (notebook commands) tracked in the Azure diagnostic logs....

Latest Reply

Noopur_Nigam
Valued Contributor II

05-13-2022 8:16:55 AM

5 kudos

Hi @C P, we don't have this feature implemented. However, there is already an existing idea in our idea portal here: https://databricks.aha.io/features/DB-7583. You can check it and vote for it.

6 More Replies

by CHANDY • New Contributor

05-26-2022 5:08:00 AM

691 Views
2 replies
0 kudos

Real Time data processing

Say I am getting a customer record from a website. I want to read the message and then insert or update it in a Snowflake table. Depending on whether the insert/update succeeds, I need to respond with a success or failure message within, say, 1 second. ...

Latest Reply

Anonymous
Not applicable

07-25-2022 9:48:30 AM

0 kudos

Hey @CHANDAN NANDY, just checking in with you. Does @Kaniz Fatma's answer help? If it does, would you be happy to mark it as best? If it doesn't, please tell us so we can help you further. Thanks!

1 More Replies

by JohnB • New Contributor II

05-26-2022 4:26:26 AM

1628 Views
2 replies
2 kudos

Are there implications of moving a Managed Table and mounting it as External?

The scenario is "A substantial amount of data needs to be moved from a legacy Databricks workspace that has Managed Tables to a new E2 Databricks workspace. The new bucket will be a dedicated Datalake rather than the Workspace Bucket, so they will be External Tables." U...

Latest Reply

Anonymous
Not applicable

07-25-2022 9:45:28 AM

2 kudos

Hey there @John Brandborg, hope everything is going great! Just wanted to check in: if you were able to resolve your issue, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to ...

1 More Replies

by Ravi96 • New Contributor II

05-25-2022 9:55:07 PM

2257 Views
4 replies
5 kudos

How can we sort out the timeout issue in Databricks?

We are creating a denormalized table based on a JSON ingestion, but a complex table is being generated. When we try to flatten the JSON rows, it takes more than 5 hours and fails with a timeout error. Is there any way that we could resolv...

Latest Reply

Anonymous
Not applicable

07-25-2022 9:33:39 AM

5 kudos

Hey @Raviteja Paluri, hope all is well! Just wanted to check in: if you were able to resolve your issue, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. Thanks!

3 More Replies

by ta_db • New Contributor

05-25-2022 7:12:11 PM

1040 Views
2 replies
0 kudos

Databricks SQL Endpoint Failing to create an external table on a parquet file with Decimal or Timestamp datatype

I'm using the Databricks SQL Endpoint and I'm attempting to create an external table on top of an existing parquet file. I can do this so long as my table definition does not include a reference to a decimal or timestamp/date datatype. For example, this works: C...

Latest Reply

Anonymous
Not applicable

07-25-2022 9:31:29 AM

0 kudos

Hey there @T A, hope everything is going great! Does @Kaniz Fatma's response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly? If not, would you be happy to give us more info...

1 More Replies

by 577391 • New Contributor II

07-20-2022 4:58:03 PM

1078 Views
2 replies
0 kudos

Resolved! How do I merge two tables and track changes to missing rows as well as new rows

In my scenario, the new data coming in are the current, valid records. Any records that are not in the new data should be labeled as "Gone", and any matching records should be labeled as "Updated". Finally, any new records should be added. So in sum...

Latest Reply

-werners-
Esteemed Contributor III

07-25-2022 1:04:59 AM

0 kudos

Detecting deletions does not work out of the box. The merge statement will evaluate the incoming data against the existing data. It will not check the existing data against the incoming data. To mark deletions, you will have to specifically update tho...

1 More Replies
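
The labeling asked for in this thread can be illustrated without Spark. The sketch below is plain Python over dictionaries keyed by record id, not a Delta MERGE statement; merge_with_status is a hypothetical helper, the sample rows are invented, and the "New" label for added rows is an assumption, since the question only says such rows should be added. It makes explicit the second pass over the existing data that the reply says a merge does not perform on its own.

```python
def merge_with_status(existing: dict, incoming: dict) -> dict:
    """Merge `incoming` into `existing` (both keyed by id) and label every row."""
    result = {}
    for key, row in existing.items():
        if key in incoming:
            # Matched: take the incoming values and mark the row as updated.
            result[key] = {**incoming[key], "status": "Updated"}
        else:
            # Only in the existing data: the check a plain merge does not make.
            result[key] = {**row, "status": "Gone"}
    for key, row in incoming.items():
        if key not in existing:
            # Only in the incoming data: add it ("New" is an assumed label).
            result[key] = {**row, "status": "New"}
    return result


# Invented sample rows for illustration.
existing = {1: {"name": "a"}, 2: {"name": "b"}}
incoming = {2: {"name": "b2"}, 3: {"name": "c"}}
print(merge_with_status(existing, incoming))
```

With these sample rows, id 1 ends up as "Gone", id 2 as "Updated" with the incoming values, and id 3 as "New".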

by ivanychev • Contributor

07-25-2022 8:03:45 AM

700 Views
0 replies
1 kudos

How to enable remote JMX monitoring in Databricks?

Adding these options: EXTRA_JAVA_OPTIONS = ( '-Dcom.sun.management.jmxremote.port=9999', '-Dcom.sun.management.jmxremote.authenticate=false', '-Dcom.sun.management.jmxremote.ssl=false', ) is enough in vanilla Apache Spark, but apparently it ...
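
For reference, here is a sketch of how the tuple from the post is typically passed to vanilla Apache Spark. The post does not say which setting receives the flags, so using spark.driver.extraJavaOptions here is an assumption, and nothing below addresses what else Databricks may require.

```python
EXTRA_JAVA_OPTIONS = (
    "-Dcom.sun.management.jmxremote.port=9999",
    "-Dcom.sun.management.jmxremote.authenticate=false",
    "-Dcom.sun.management.jmxremote.ssl=false",
)

# Spark takes JVM flags as a single space-separated string.
# Assumption: the flags are meant for the driver JVM.
spark_conf = {"spark.driver.extraJavaOptions": " ".join(EXTRA_JAVA_OPTIONS)}
print(spark_conf["spark.driver.extraJavaOptions"])
```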

by Sree_Patllola • New Contributor

07-25-2022 7:26:59 AM

913 Views
0 replies
0 kudos

I am in the process of connecting to X vendor and pulling back the data needed from that vendor.

For that, we have shared our Azure IP address (no VPN or corporate IP address is available as of now, as the project is still in its initial stages) with X vendor, and it is now whitelisted. Now I am trying to set up the X vendor API in Databricks to lookup into...
