This just needs an understanding of the data cleanroom. As per the documentation, Databricks Data Cleanroom provides a secure, governed, and privacy-safe environment. Participants can enable fine-grained access control on data with the help of Unity Catalog. Also...
Hi @Rajarampandian Arumugam​ Hope all is well! Just wanted to check in: were you able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear fro...
Hi, I'm using joblib for multiprocessing in one of our processes. Logging works well (apart from weird py4j errors, which I suppress), except when it's within multiprocessing. Also, how do I suppress the other errors that I always receive on DB - perha...
@Sam G​: It seems like the issue is related to the py4j library used by Spark, and not specifically to joblib or multiprocessing. The error message indicates a network error while sending a command between the Python process and the Java Virt...
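A minimal sketch of suppressing that py4j chatter, and of configuring logging inside the worker processes that joblib spawns (the logger names are the standard py4j ones; the worker function itself is illustrative):

```python
import logging

# Raise the log level of the py4j loggers so their network-error
# chatter is suppressed in the notebook driver.
logging.getLogger("py4j").setLevel(logging.ERROR)
logging.getLogger("py4j.java_gateway").setLevel(logging.ERROR)

# Child processes do not inherit handlers configured in the notebook,
# so configure logging inside the worker function itself:
def worker(x):
    logging.basicConfig(level=logging.INFO)  # per-process logging setup
    log = logging.getLogger(__name__)
    log.info("processing %s", x)
    return x * x
```

In a notebook you would then pass `worker` to `joblib.Parallel` as usual; each spawned process runs its own `basicConfig` before logging.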
I'm using Databricks for processing large-scale data with Apache Spark, but I'm experiencing performance issues. The processing time is taking longer than expected, and I'm encountering memory and CPU usage limitations. I want to optimize the perform...
@jhon marton​: Optimizing Spark performance in Databricks for large-scale data processing can involve a combination of techniques, configurations, and best practices. Below are some recommendations that can help improve the performance of your Spark ...
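The configuration side of those recommendations can be sketched as a small map of settings applied at session build time. The keys below are standard Spark config names, but the values are illustrative only and depend on your cluster size and data:

```python
# Common Spark settings to try for large jobs (values are examples,
# not recommendations for every workload):
perf_confs = {
    # Match shuffle parallelism to cluster cores instead of the default 200.
    "spark.sql.shuffle.partitions": "64",
    # Let Adaptive Query Execution coalesce small partitions at runtime.
    "spark.sql.adaptive.enabled": "true",
    # Let AQE split skewed join partitions.
    "spark.sql.adaptive.skewJoin.enabled": "true",
    # Kryo is usually faster/more compact than Java serialization.
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
}

def apply_confs(builder, confs):
    """Apply each conf to a SparkSession.builder-like object and
    return the builder for chaining."""
    for key, value in confs.items():
        builder = builder.config(key, value)
    return builder

# In a notebook:
#   spark = apply_confs(SparkSession.builder.appName("job"), perf_confs).getOrCreate()
```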
Hi, I'm performing some analysis using the Databricks SQL logs and seeing these operation names. I notice these events don't seem to have a duration or query text, unlike commandSubmit operations. Any explanation of what these operations mean exactly...
Hi @Jose Torres​, executeAdhocQuery and executeFastQuery are two types of operations that can appear in the Databricks SQL logs on Azure. executeAdhocQuery refers to the execution of an ad hoc query, which is a one-time query that is not stored as a prepared statem...
Hello, I have a Databricks account on Azure, and the goal is to compare different image tagging services from Azure, GCP, and AWS via the corresponding API calls, using a Python notebook. I have problems with GCP Vision API calls, specifically with credentials...
Ok, here is a trick: in my case, the file with GCP credentials is stored in the notebook workspace storage, which is not visible to the os.environ() command. So the solution is to read the content of this file and save it to the cluster storage attached to the no...
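The trick above can be sketched as a small helper: copy the credentials file from workspace storage to cluster-local disk, then point the Google client library at it via the `GOOGLE_APPLICATION_CREDENTIALS` environment variable. The paths here are hypothetical; the real source path is wherever your credentials JSON lives in the workspace:

```python
import os
import shutil

def stage_gcp_credentials(src: str, dst: str = "/tmp/gcp-credentials.json") -> str:
    """Copy a GCP credentials file to cluster-local disk and set
    GOOGLE_APPLICATION_CREDENTIALS so google-cloud clients can find it."""
    shutil.copyfile(src, dst)                       # workspace -> local disk
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = dst
    return dst

# Hypothetical usage in a notebook:
#   stage_gcp_credentials("/Workspace/Users/me@example.com/gcp-creds.json")
#   from google.cloud import vision
#   client = vision.ImageAnnotatorClient()  # picks up the env var
```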
Hello @asdf fdsa​, the Node.js connector is built for the Node.js environment; it will not integrate with ReactJS. For cases where execution from a web app is needed, we advise using the SQL Exec API. Please check the documentation here: https://docs.databricks.com/sql/a...
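A hedged sketch of calling the SQL Statement Execution API from server-side code (the endpoint and request fields follow the documented API; the host, token, and warehouse ID are placeholders, and the token must stay server-side, never in browser-side ReactJS code):

```python
import json
import urllib.request

def build_statement_request(host: str, token: str, warehouse_id: str, sql: str):
    """Build a POST request for /api/2.0/sql/statements."""
    body = json.dumps({
        "statement": sql,
        "warehouse_id": warehouse_id,
        "wait_timeout": "30s",   # let the API wait briefly for a result
    }).encode()
    return urllib.request.Request(
        f"https://{host}/api/2.0/sql/statements",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

def execute_statement(host, token, warehouse_id, sql):
    """Send the statement and return the decoded JSON response."""
    req = build_statement_request(host, token, warehouse_id, sql)
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)
```

Your ReactJS frontend would then call your own backend, which in turn calls `execute_statement` with a server-held token.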
Hi Databricks Experts: I'm using Databricks on Azure. I'd like to understand the following: 1) whether there is a way of automating the re-run of some specific failed tasks from a job (with several tasks); for example, if I have 4 tasks, and tasks 1 and 2 h...
You can use "retries". In Workflows, select your job, then the task, and configure retries in the options below. You can also see more options at: https://learn.microsoft.com/pt-br/azure/databricks/dev-tools/api/2.0/jobs?source=recommendations
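The same retry settings can be set programmatically in a Jobs API task payload. A hedged sketch (field names follow the Jobs API 2.1 task spec; the notebook path and values are illustrative):

```python
# Task-level retry fields in a Jobs API 2.1 payload:
task = {
    "task_key": "task_1",
    "notebook_task": {"notebook_path": "/Repos/etl/task_1"},  # hypothetical path
    "max_retries": 3,                    # retry a failed run up to 3 times
    "min_retry_interval_millis": 60000,  # wait 1 minute between attempts
    "retry_on_timeout": False,           # don't retry runs that timed out
}
```

This fragment would go inside the `tasks` list of a job create/reset request, so each task carries its own retry policy.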
I have registered an account via the AWS Marketplace. I have also deployed workspaces with Terraform. When I log in to the admin console, it redirects me to https://accounts.cloud.databricks.com/onboarding where I need to create a workspace manually, but I don't want ...
Hi Team, would you mind telling us how you provisioned? Are you using the same account ID you used during creation? If so, could you please try logging in through an incognito window and see if that works?
Can someone please provide an example while loop, including has_more=true? I can't get pagination to work for the API endpoint '/jobs/runs/list/'. Thanks
Hi @Rachel Cunningham​, could you please elaborate on what you mean by "I can't get pagination to work"? Is "has_more" set to "true" even when there are no more tasks to list? That is, do you mean it doesn't list all runs, or doesn't list tasks within each...
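In case it helps, here is one hedged example of the requested while loop against `/api/2.1/jobs/runs/list`, using the offset/limit scheme where each page reports `has_more`. The host and token are placeholders; the page-fetching is split out so the loop itself is easy to follow:

```python
import json
import urllib.request

def list_all_runs(fetch_page, limit=25):
    """Collect every run by paging while the API reports has_more=true.

    fetch_page(offset, limit) must return one decoded JSON page,
    e.g. {"runs": [...], "has_more": true}.
    """
    runs, offset = [], 0
    while True:
        page = fetch_page(offset, limit)
        runs.extend(page.get("runs", []))
        if not page.get("has_more", False):
            return runs
        offset += limit

def databricks_fetch_page(host, token):
    """Build a fetch_page callable that hits /api/2.1/jobs/runs/list."""
    def fetch(offset, limit):
        req = urllib.request.Request(
            f"https://{host}/api/2.1/jobs/runs/list?limit={limit}&offset={offset}",
            headers={"Authorization": f"Bearer {token}"},
        )
        with urllib.request.urlopen(req, timeout=60) as resp:
            return json.load(resp)
    return fetch

# Hypothetical usage:
#   runs = list_all_runs(databricks_fetch_page("myhost.cloud.databricks.com", "token"))
```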
We made another major release of the Security Analysis Tool (SAT) with Unity Catalog and Delta Sharing checks, Terraform deployments, and faster analysis if you have many workspaces. If you are on Azure Databricks, there are new step-by-step video-based ...
We are trying to make a connection to a database instance from DataHub/DBeaver and getting an error. We can make a connection manually after a few tries. We face this every time we execute our code to make a connection. We need to resolve this before ...
I would like to do the Platform Administrator learning plan, but for all components in the learning plan it mentions "in waiting list". What does this mean?
I want to go for the Databricks Certified Data Engineer Professional. Is there any predefined study material for the Databricks Certified Data Engineer Professional certification?
How do I set just the right processingTime for readStream to maximize performance? On which factors does it depend, and is there a way to measure this?
Thanks @Ajay Pandey​ and @Nandini N​ for your answers. I wanted to know more about what I should do in order to do it properly. Should I change processing times (1, 5, 10, 30, 60 seconds) and see how that affects the running job in terms of time and CPU/me...
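That trial-and-error approach can be made a bit more systematic: Structured Streaming reports per-batch timings through `query.lastProgress` / `query.recentProgress` (the `durationMs.triggerExecution` field), and a candidate interval "keeps up" when batches routinely finish well inside it. A minimal sketch of the comparison (the headroom factor is an assumption of this sketch, not a Spark setting):

```python
def trigger_keeps_up(batch_durations_ms, trigger_interval_ms, headroom=0.8):
    """Return True if batches finish, on average, within `headroom`
    times the trigger interval, i.e. the interval leaves slack."""
    avg = sum(batch_durations_ms) / len(batch_durations_ms)
    return avg <= headroom * trigger_interval_ms

# In a notebook you would collect durations like this (sketch):
#   q = df.writeStream.trigger(processingTime="10 seconds").start(...)
#   durations = [p["durationMs"]["triggerExecution"] for p in q.recentProgress]
#   trigger_keeps_up(durations, 10_000)
```

If batches take longer than the interval, Spark simply starts the next batch late and the stream falls behind, so you would raise the interval (or add resources); if batches finish far early, a shorter interval may reduce end-to-end latency.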
I have an Avro schema for my Kafka topic. That schema has defaults. I would like to exclude the defaulted columns from Databricks and just let them default to an empty array. Sample Avro; I'm trying not to provide the UserFields because I can't...
Hi @Adam Rink​, please go through the following blog and let me know if it helps: https://docs.databricks.com/spark/latest/structured-streaming/avro-dataframe.html#example-with-schema-registry
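One hedged approach to the "exclude defaulted columns" part: prune the fields that carry a `default` (such as UserFields) out of the schema JSON before handing it to `from_avro`, so Avro's own schema-resolution rules fill them in. The schema in the test is illustrative, not the poster's actual schema, and whether this matches your producer/consumer setup depends on your schema-resolution direction:

```python
import json

def drop_defaulted_fields(avro_schema_json: str) -> str:
    """Return a copy of a record schema with all fields that declare a
    'default' removed, leaving Avro to supply the default values."""
    schema = json.loads(avro_schema_json)
    schema["fields"] = [f for f in schema["fields"] if "default" not in f]
    return json.dumps(schema)

# Hypothetical usage with the pruned schema in Databricks:
#   from pyspark.sql.avro.functions import from_avro
#   df.select(from_avro("value", drop_defaulted_fields(schema_str)).alias("rec"))
```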