This just needs an understanding of the data cleanroom. As per the documentation, Databricks Data Cleanroom provides a secure, governed, and privacy-safe environment. Participants can enable fine-grained access control on data with the help of Unity Catalog. Also...
Hi @Rajarampandian Arumugam​ Hope all is well! Just wanted to check in: were you able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear fro...
Hi, I'm using joblib for multiprocessing in one of our processes. Logging works well (apart from weird py4j errors, which I suppress), except when it's within multiprocessing. Also, how do I suppress the other errors that I always receive on DB - perha...
@Sam G​: It seems like the issue is related to the py4j library used by Spark, and not specifically to joblib or multiprocessing. The error message indicates a network error while sending a command between the Python process and the Java Virt...
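A minimal sketch of suppressing that py4j chatter, and of configuring logging inside the worker processes that joblib spawns (the logger names are the standard py4j ones; the worker function itself is illustrative):

```python
import logging

# Raise the log level of the py4j loggers so their network-error
# chatter is suppressed in the notebook driver.
logging.getLogger("py4j").setLevel(logging.ERROR)
logging.getLogger("py4j.java_gateway").setLevel(logging.ERROR)

# Child processes do not inherit handlers configured in the notebook,
# so configure logging inside the worker function itself:
def worker(x):
    logging.basicConfig(level=logging.INFO)  # per-process logging setup
    log = logging.getLogger(__name__)
    log.info("processing %s", x)
    return x * x
```

In a notebook you would then pass `worker` to `joblib.Parallel` as usual; each spawned process runs its own `basicConfig` before logging.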
I'm using Databricks for processing large-scale data with Apache Spark, but I'm experiencing performance issues. The processing time is taking longer than expected, and I'm encountering memory and CPU usage limitations. I want to optimize the perform...
@jhon marton​: Optimizing Spark performance in Databricks for large-scale data processing can involve a combination of techniques, configurations, and best practices. Below are some recommendations that can help improve the performance of your Spark ...
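The configuration side of those recommendations can be sketched as a small map of settings applied at session build time. The keys below are standard Spark config names, but the values are illustrative only and depend on your cluster size and data:

```python
# Common Spark settings to try for large jobs (values are examples,
# not recommendations for every workload):
perf_confs = {
    # Match shuffle parallelism to cluster cores instead of the default 200.
    "spark.sql.shuffle.partitions": "64",
    # Let Adaptive Query Execution coalesce small partitions at runtime.
    "spark.sql.adaptive.enabled": "true",
    # Let AQE split skewed join partitions.
    "spark.sql.adaptive.skewJoin.enabled": "true",
    # Kryo is usually faster/more compact than Java serialization.
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
}

def apply_confs(builder, confs):
    """Apply each conf to a SparkSession.builder-like object and
    return the builder for chaining."""
    for key, value in confs.items():
        builder = builder.config(key, value)
    return builder

# In a notebook:
#   spark = apply_confs(SparkSession.builder.appName("job"), perf_confs).getOrCreate()
```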
Hi, I'm performing some analysis using the Databricks SQL logs and seeing these operation names. I notice these events don't seem to have a duration or query text, unlike commandSubmit operations. Any explanation of what these operations mean exactly...
Hi @Jose Torres​, executeAdhocQuery and executeFastQuery are two types of operations that can appear in the Databricks SQL logs on Azure. executeAdhocQuery refers to the execution of an ad hoc query, which is a one-time query that is not stored as a prepared statem...
Hello, I have a Databricks account on Azure, and the goal is to compare different image tagging services from Azure, GCP, and AWS via the corresponding API calls, using a Python notebook. I have problems with GCP Vision API calls, specifically with credentials...
Ok, here is a trick: in my case, the file with GCP credentials is stored in the notebook workspace storage, which is not visible to the os.environ() command. So the solution is to read the content of this file and save it to the cluster storage attached to the no...
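The trick above can be sketched as a small helper: copy the credentials file from workspace storage to cluster-local disk, then point the Google client library at it via the `GOOGLE_APPLICATION_CREDENTIALS` environment variable. The paths here are hypothetical; the real source path is wherever your credentials JSON lives in the workspace:

```python
import os
import shutil

def stage_gcp_credentials(src: str, dst: str = "/tmp/gcp-credentials.json") -> str:
    """Copy a GCP credentials file to cluster-local disk and set
    GOOGLE_APPLICATION_CREDENTIALS so google-cloud clients can find it."""
    shutil.copyfile(src, dst)                       # workspace -> local disk
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = dst
    return dst

# Hypothetical usage in a notebook:
#   stage_gcp_credentials("/Workspace/Users/me@example.com/gcp-creds.json")
#   from google.cloud import vision
#   client = vision.ImageAnnotatorClient()  # picks up the env var
```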
Hello @asdf fdsa​, the Node.js connector is built for the Node.js environment; it will not integrate with ReactJS. For cases where execution from a web app is needed, we advise using the SQL Exec API. Please check the documentation here: https://docs.databricks.com/sql/a...
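A hedged sketch of calling the SQL Statement Execution API from server-side code (the endpoint and request fields follow the documented API; the host, token, and warehouse ID are placeholders, and the token must stay server-side, never in browser-side ReactJS code):

```python
import json
import urllib.request

def build_statement_request(host: str, token: str, warehouse_id: str, sql: str):
    """Build a POST request for /api/2.0/sql/statements."""
    body = json.dumps({
        "statement": sql,
        "warehouse_id": warehouse_id,
        "wait_timeout": "30s",   # let the API wait briefly for a result
    }).encode()
    return urllib.request.Request(
        f"https://{host}/api/2.0/sql/statements",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

def execute_statement(host, token, warehouse_id, sql):
    """Send the statement and return the decoded JSON response."""
    req = build_statement_request(host, token, warehouse_id, sql)
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)
```

Your ReactJS frontend would then call your own backend, which in turn calls `execute_statement` with a server-held token.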
Hi Databricks Experts: I'm using Databricks on Azure. I'd like to understand the following: 1) whether there is a way of automating the re-run of some specific failed tasks from a job (with several tasks); for example, if I have 4 tasks, and tasks 1 and 2 h...
You can use "retries". In Workflows, select your job, then the task, and configure retries in the options below. You can also see more options at: https://learn.microsoft.com/pt-br/azure/databricks/dev-tools/api/2.0/jobs?source=recommendations
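The same retry settings can be set programmatically in a Jobs API task payload. A hedged sketch (field names follow the Jobs API 2.1 task spec; the notebook path and values are illustrative):

```python
# Task-level retry fields in a Jobs API 2.1 payload:
task = {
    "task_key": "task_1",
    "notebook_task": {"notebook_path": "/Repos/etl/task_1"},  # hypothetical path
    "max_retries": 3,                    # retry a failed run up to 3 times
    "min_retry_interval_millis": 60000,  # wait 1 minute between attempts
    "retry_on_timeout": False,           # don't retry runs that timed out
}
```

This fragment would go inside the `tasks` list of a job create/reset request, so each task carries its own retry policy.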
I have registered an account via the AWS Marketplace. I have also deployed workspaces with Terraform. When I log in to the admin console, it redirects me to https://accounts.cloud.databricks.com/onboarding where I need to create a workspace manually, but I don't want ...
Hi Team, would you mind telling us how you provisioned? Are you using the same account ID you used during creation? If so, could you please try logging in through an incognito window and see if that works?
Can someone please provide an example while loop, including has_more=true? I can't get pagination to work for the API endpoint '/jobs/runs/list/'. Thanks
Hi @Rachel Cunningham​, could you please elaborate on what you mean by "I can't get pagination to work"? Is "has_more" set to "true" even when there are no more tasks to list? That is, do you mean it doesn't list all runs, or doesn't list tasks within each...
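In case it helps, here is one hedged example of the requested while loop against `/api/2.1/jobs/runs/list`, using the offset/limit scheme where each page reports `has_more`. The host and token are placeholders; the page-fetching is split out so the loop itself is easy to follow:

```python
import json
import urllib.request

def list_all_runs(fetch_page, limit=25):
    """Collect every run by paging while the API reports has_more=true.

    fetch_page(offset, limit) must return one decoded JSON page,
    e.g. {"runs": [...], "has_more": true}.
    """
    runs, offset = [], 0
    while True:
        page = fetch_page(offset, limit)
        runs.extend(page.get("runs", []))
        if not page.get("has_more", False):
            return runs
        offset += limit

def databricks_fetch_page(host, token):
    """Build a fetch_page callable that hits /api/2.1/jobs/runs/list."""
    def fetch(offset, limit):
        req = urllib.request.Request(
            f"https://{host}/api/2.1/jobs/runs/list?limit={limit}&offset={offset}",
            headers={"Authorization": f"Bearer {token}"},
        )
        with urllib.request.urlopen(req, timeout=60) as resp:
            return json.load(resp)
    return fetch

# Hypothetical usage:
#   runs = list_all_runs(databricks_fetch_page("myhost.cloud.databricks.com", "token"))
```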
We made another major release of the Security Analysis Tool (SAT) with Unity Catalog and Delta Sharing checks, Terraform deployments, and faster analysis if you have many workspaces. If you are on Azure Databricks, there are new step-by-step video-based ...
We are trying to make a connection to a database instance from DataHub/DBeaver and getting an error. We can make a connection manually after a few tries. We face this every time we execute our code to make a connection. We need to resolve this before ...
I would like to do the Platform Administrator learning plan, but for all components in the learning plan it mentions "in waiting list". What does this mean?
I want to go for the Databricks Certified Data Engineer Professional. Is there any predefined study material for the Databricks Certified Data Engineer Professional certification?
How do I set just the right processingTime for readStream to maximize performance? On which factors does it depend, and is there a way to measure this?
Thanks @Ajay Pandey​ and @Nandini N​ for your answers. I wanted to know more about what I should do in order to do it properly. Should I change processing times (1, 5, 10, 30, 60 seconds) and see how that affects the running job in terms of time and CPU/me...
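That trial-and-error approach can be made a bit more systematic: Structured Streaming reports per-batch timings through `query.lastProgress` / `query.recentProgress` (the `durationMs.triggerExecution` field), and a candidate interval "keeps up" when batches routinely finish well inside it. A minimal sketch of the comparison (the headroom factor is an assumption of this sketch, not a Spark setting):

```python
def trigger_keeps_up(batch_durations_ms, trigger_interval_ms, headroom=0.8):
    """Return True if batches finish, on average, within `headroom`
    times the trigger interval, i.e. the interval leaves slack."""
    avg = sum(batch_durations_ms) / len(batch_durations_ms)
    return avg <= headroom * trigger_interval_ms

# In a notebook you would collect durations like this (sketch):
#   q = df.writeStream.trigger(processingTime="10 seconds").start(...)
#   durations = [p["durationMs"]["triggerExecution"] for p in q.recentProgress]
#   trigger_keeps_up(durations, 10_000)
```

If batches take longer than the interval, Spark simply starts the next batch late and the stream falls behind, so you would raise the interval (or add resources); if batches finish far early, a shorter interval may reduce end-to-end latency.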
I have an Avro schema for my Kafka topic. That schema has defaults. I would like to exclude the defaulted columns from Databricks and just let them default to an empty array. Sample Avro; I'm trying not to provide the UserFields because I can't...
Hi @Adam Rink​, please go through the following blog and let me know if it helps: https://docs.databricks.com/spark/latest/structured-streaming/avro-dataframe.html#example-with-schema-registry
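One hedged approach to the "exclude defaulted columns" part: prune the fields that carry a `default` (such as UserFields) out of the schema JSON before handing it to `from_avro`, so Avro's own schema-resolution rules fill them in. The schema in the test is illustrative, not the poster's actual schema, and whether this matches your producer/consumer setup depends on your schema-resolution direction:

```python
import json

def drop_defaulted_fields(avro_schema_json: str) -> str:
    """Return a copy of a record schema with all fields that declare a
    'default' removed, leaving Avro to supply the default values."""
    schema = json.loads(avro_schema_json)
    schema["fields"] = [f for f in schema["fields"] if "default" not in f]
    return json.dumps(schema)

# Hypothetical usage with the pruned schema in Databricks:
#   from pyspark.sql.avro.functions import from_avro
#   df.select(from_avro("value", drop_defaulted_fields(schema_str)).alias("rec"))
```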