Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Hi, so I want to essentially execute a SQL query only if a condition is met. One of the cells in my Python notebook is a SQL query (%sql followed by the query). Is there any way to put that in an 'if' statement, i.e. if an environment variable equals some value, run the query?
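One way to get there (a sketch; the variable name and query are placeholders) is to drop the %sql magic and call spark.sql() from a Python cell, which can sit inside ordinary control flow:

```python
import os

# Run the query only when the environment variable matches;
# "MY_ENV" and the query text are placeholders.
if os.environ.get("MY_ENV") == "prod":
    df = spark.sql("SELECT * FROM some_table WHERE ds = current_date()")
    display(df)
```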
We get errors like this: Recursive view `x` detected (cycle: `x` -> `x`), in long-standing code that worked just fine on Spark 2.4.5 (Runtime 6.4), when we run it on a Spark 3.2 cluster (Runtime 10.0). It happens whenever we have <x is a ...
This is a breaking change introduced in Spark 3.1. From the migration guide (SQL, Datasets and DataFrame - Spark 3.1.1 Documentation, apache.org): in Spark 3.1, the temporary view has the same behavior as a permanent view, i.e. it captures and stores runtime SQL configs, SQL text, catalog, and namespace...
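A minimal sketch of the failing pattern and one workaround, with placeholder view and column names: since the temp view can no longer be redefined in terms of itself, read it into a DataFrame first and register the result under a new name:

```python
# Fails on Spark 3.1+ with "Recursive view `x` detected":
# spark.sql("CREATE OR REPLACE TEMP VIEW x AS SELECT * FROM x WHERE id > 0")

# Workaround: materialize the existing view into a DataFrame,
# then register the result under a new view name.
df = spark.table("x").where("id > 0")
df.createOrReplaceTempView("x_filtered")
```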
Attached to this post is an ADLS Gen2 access recommendation for achieving ideal security and governance over your data. The best practice involves leveraging cluster ACLs, cluster configuration, and secret ACLs to handle user access over your data.
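As a flavor of the pattern (a sketch, not the attached recommendation; the storage account name, secret scope, and key names are all placeholders), the ABFS driver can be pointed at OAuth credentials held in a Databricks secret scope:

```python
# Service-principal auth for ADLS Gen2, with credentials read from a
# secret scope so they never appear in notebook source.
acct = "mystorageacct.dfs.core.windows.net"
spark.conf.set(f"fs.azure.account.auth.type.{acct}", "OAuth")
spark.conf.set(f"fs.azure.account.oauth.provider.type.{acct}",
               "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set(f"fs.azure.account.oauth2.client.id.{acct}",
               dbutils.secrets.get(scope="adls-scope", key="client-id"))
spark.conf.set(f"fs.azure.account.oauth2.client.secret.{acct}",
               dbutils.secrets.get(scope="adls-scope", key="client-secret"))
spark.conf.set(f"fs.azure.account.oauth2.client.endpoint.{acct}",
               "https://login.microsoftonline.com/<tenant-id>/oauth2/token")
```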
I am trying to use the AUDIT function from Vertica in Spark and get the correct table size from it, but the smallest unit AUDIT can report is bytes, and we need sizes in bits, i.e. at a granularity smaller than a byte. val size = "select audit('table_name');"
Rather, everything will be in bytes. Spark SQL has built-in methods to get the table size, but also in bytes: spark.sql("ANALYZE TABLE df COMPUTE STATISTICS NOSCAN") and spark.sql("DESCRIBE EXTENDED df").filter(col("col_name") === "Statistics").show(false)
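For reference, a runnable PySpark version of the same checks (keeping the table name df from above); if bit-level figures are truly required, the reported byte count can simply be multiplied by 8:

```python
from pyspark.sql.functions import col

# Refresh table-level size statistics without scanning the data.
spark.sql("ANALYZE TABLE df COMPUTE STATISTICS NOSCAN")

# The "Statistics" row reports size in bytes, e.g. "123456 bytes";
# multiply by 8 if bits are needed.
(spark.sql("DESCRIBE EXTENDED df")
   .filter(col("col_name") == "Statistics")
   .show(truncate=False))
```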
A quick way to start exploratory data analysis is to use the EDA notebook that is created when you run Databricks AutoML. You can then use the generated notebook as is, or as a starting point for modeling. You'll need a cluster with Databricks Runtime...
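AutoML can also be kicked off from code on an ML runtime, which produces the same EDA notebook alongside the trial notebooks (a sketch; df and the "label" target column are placeholders):

```python
import databricks.automl

# Runs an AutoML classification experiment; the EDA notebook is
# generated as part of the run and linked from the experiment page.
summary = databricks.automl.classify(df, target_col="label", timeout_minutes=30)
print(summary.best_trial.notebook_url)
```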
When I go to ideas.databricks.com it displays a screen asking for my workspace (so I enter there, for example, westeurope.azuredatabricks.net); it then redirects to login and then to... my Azure workspace instead of Ideas. When I want to use Community (I put...
Hello, I'm working in Databricks Community Edition, so I terminate my cluster after my work since it will be terminated after 2 hours anyway. I'm creating a database to store all my transformed data. Will the database be deleted when I termina...
@Hubert Dudek - Thanks for answering so quickly!! @Sriram Devineedi - If Hubert's answer solved the issue for you, would you be happy to mark his answer as best? That helps others know where to look.
My dataset has an "item" column which groups the rows into many groups. (Think of these groups as items in a store.) I want to fit 1 ML model per group. Should I tune hyperparameters for each group separately? Or should I tune them for the entire...
For the first question ("which option is better?"), you need to answer that via your understanding of the problem domain. Do you expect similar behavior across the groups (items)? If so, that's a +1 in favor of sharing hyperparameters, and vice versa. ...
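If you go the per-group route, a common Databricks pattern is to tune and fit inside applyInPandas so the groups are processed in parallel. A minimal sketch, assuming scikit-learn is available and using placeholder columns item, x, y:

```python
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

def fit_group(pdf: pd.DataFrame) -> pd.DataFrame:
    # Tune one hyperparameter per group with a small cross-validated grid.
    search = GridSearchCV(Ridge(), {"alpha": [0.1, 1.0, 10.0]}, cv=3)
    search.fit(pdf[["x"]], pdf["y"])
    return pd.DataFrame({"item": [pdf["item"].iloc[0]],
                         "best_alpha": [search.best_params_["alpha"]],
                         "cv_score": [search.best_score_]})

# One model (and one tuning run) per item, executed in parallel.
results = (df.groupBy("item")
             .applyInPandas(fit_group,
                            schema="item string, best_alpha double, cv_score double"))
```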
https://stackoverflow.com/questions/67088891/send-email-from-databricks-notebook-with-attachment — I have to send the attachment directly to the organisation's Google Drive folder instead of email. Any suggestions? Sample email-with-attachment code: msg.atta...
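One possible route (a sketch, assuming google-api-python-client is installed on the cluster and a service account has write access to the target folder; the file paths and folder ID are placeholders):

```python
from google.oauth2 import service_account
from googleapiclient.discovery import build
from googleapiclient.http import MediaFileUpload

# Authenticate as a service account that has been granted access
# to the target Drive folder.
creds = service_account.Credentials.from_service_account_file(
    "/dbfs/secrets/sa.json",
    scopes=["https://www.googleapis.com/auth/drive.file"])
drive = build("drive", "v3", credentials=creds)

# Upload a file produced by the notebook into the shared folder.
media = MediaFileUpload("/dbfs/tmp/report.csv", mimetype="text/csv")
drive.files().create(body={"name": "report.csv", "parents": ["<FOLDER_ID>"]},
                     media_body=media, fields="id").execute()
```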
I completed my Databricks associate developer assessment on 12/05/2021 and received a pass result. On 12/08/2021 I received an email stating my digital badge for this assessment is available. However, I do not see this badge or my completio...
Here is an article I wrote that puts Databricks in a historical context (why was it developed?) and provides introductory steps to help a newbie get started. Feel free to copy/link as you want.https://www.linkedin.com/pulse/databricks-introduction-ch...
Importing JSON to Databricks (PySpark) is simple in the simple case. But of course there are wrinkles for real-world data. Here are some tips/tricks to help...https://www.linkedin.com/pulse/json-databricks-pyspark-chuck-connell/
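The most common wrinkle is multi-line (pretty-printed) JSON, which needs an explicit option (a sketch with placeholder paths):

```python
# JSON Lines (one object per line) reads directly.
df = spark.read.json("/mnt/raw/events.json")

# Pretty-printed / multi-line JSON needs the multiLine option.
df_pretty = spark.read.option("multiLine", "true").json("/mnt/raw/events_pretty.json")
df_pretty.printSchema()
```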
If you want to know the version of the Databricks runtime in Azure after creation: go to the Azure Databricks portal => Clusters => Interactive Clusters => there you can find the runtime version. For more details, refer to "Azure Databricks Runtime versions". R...
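You can also check from a notebook; a one-liner sketch, assuming the standard cluster-usage tag that Azure Databricks sets on its clusters:

```python
# Prints something like "10.0.x-scala2.12".
print(spark.conf.get("spark.databricks.clusterUsageTags.sparkVersion"))
```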
Please use Repos, and in the admin settings enable "Files in Repos"; then you will be able to import a class in a notebook: from repo_folder.subfolders.file import your_class
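A minimal sketch of what that enables, mirroring the placeholder names above (i.e. repo_folder/subfolders/file.py defining your_class):

```python
# Works once "Files in Repos" is enabled and the notebook lives in the repo;
# all names here are placeholders from the answer above.
from repo_folder.subfolders.file import your_class

instance = your_class()
```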