Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
I am applying Mosaic's `grid_boundary` method on a Spark DataFrame containing a set of `h3_hex_ids`. The geometries returned are not consistent, i.e. they could be either `lat, long` or `long, lat`. Here's some sample data: ```import pyspark.sql.functions a...
The Delta protocol mentions that checkpoints for Delta tables are created every 10 commits. However, when I modify a table with more than 10 separate operations (producing more than 10 separate JSON files in the _delta_log directory), no checkpoint files ar...
As of the latest update, checkpoints for Delta tables are created every 100 commits. This was done as an improvement. If you want a checkpoint file for a Delta table every 10 commits, or after any desired number of commits, you can cust...
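A minimal sketch of the kind of customization the reply above describes, assuming it refers to the `delta.checkpointInterval` table property. The table name `my_schema.events` and the helper `checkpoint_interval_sql` are hypothetical, introduced only for illustration. The helper just builds the SQL statement, which would then be executed with `spark.sql(...)` in a notebook.

```python
def checkpoint_interval_sql(table_name: str, interval: int) -> str:
    """Build an ALTER TABLE statement that sets delta.checkpointInterval."""
    if interval < 1:
        raise ValueError("interval must be a positive number of commits")
    return (
        f"ALTER TABLE {table_name} "
        f"SET TBLPROPERTIES ('delta.checkpointInterval' = '{interval}')"
    )


# Hypothetical table name; replace with your own Delta table.
statement = checkpoint_interval_sql("my_schema.events", 10)
print(statement)

# In a Databricks notebook, where `spark` is predefined, this would be run as:
# spark.sql(statement)
```

The new interval should apply to commits made after the property is set; existing log files are not rewritten.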
Hello, I found an issue with the Apache Spark™ Programming with Databricks courses on Databricks Academy when trying to do the labs. The mount that the courses use for training data is failing with what looks to me like an authentication issue (see sc...
I found the course Git repo at https://github.com/databricks-academy/apache-spark-programming-with-databricks-english. It works, so I am using that instead of the 'apache-spark-programming-with-databricks.dbc' file available in the learning portal. #DA...
I am trying to send the default Spark metrics from the application to a StatsD sink at the job level, not the cluster level, so I set the necessary configuration on the Spark context and Spark session in code. And in a local system, which means a single n...
Has anyone else come across an issue where their R notebook executes cells before all libraries in the cluster have been installed? These are libraries in the cluster configuration since the start (not ones I've just put in). Please see attached my tw...
Hi, I applied for the Databricks Certified: Data Engineer Professional certification on 5th July 2023. The test was going fine for me, but suddenly there was an alert from the system (I think I was at a proper angle in front of the camera and was genuinely givin...
Hi, I have a problem with my on-premises SQL connection from Databricks. My Python code, which uses pymssql, works well, but my Spark code does not, even though I am using the same credentials for both. My Spark code is """# Read data from SQL Serv...
I've run a dual multiprocessing and multithreading solution in Python before, using the multiprocessing and concurrent.futures Python modules. However, since the multiprocessing module only runs on the driver node, I have to instead use sc.parallelize...
I am trying to log in to my workspace, but it takes a very long time to evaluate. Sometimes, it simply fails to do so, and I am prompted with the below message. {"error_code":"TEMPORARILY_UNAVAILABLE","message":"Authentication is temporarily unavaila...
Hi all, I recently created a custom Docker image to run on a Databricks cluster. The image built successfully and the cluster started, but when I try to run anything in the notebook it throws an error: File "/databricks/python_shell/scripts/db_ipyke...
Is there any way to check, using Python, whether a job cluster in Databricks is Unity Catalog enabled? I tried the Jobs API, https://{host_name}/api/2.0/jobs/get?job_id={job_id}, but I couldn't tell from it whether the cluster is Unity Catalog enabled. Could anyone sugges...
To check if a job cluster is Unity Catalog enabled in Databricks programmatically using Python, you can use the Databricks REST API. Here's an example of how you can do it: Import the required modules: `import requests` Set up the necessary variables: host...
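A hedged sketch of one way to interpret the response from the `jobs/get` endpoint. It assumes that a cluster spec whose `data_security_mode` is `SINGLE_USER` or `USER_ISOLATION` can be treated as Unity Catalog enabled; other or newer access-mode values are not covered here. The helper name `job_cluster_uc_status` and the sample payload are hypothetical, not real API output. The function works on the already-parsed JSON, so fetching it (for example with `requests` and a bearer token) is left out.

```python
# Access modes assumed here to indicate a Unity Catalog-capable cluster.
UC_ACCESS_MODES = {"SINGLE_USER", "USER_ISOLATION"}


def job_cluster_uc_status(job: dict) -> dict:
    """Map each cluster defined in a parsed jobs/get response to True/False."""
    settings = job.get("settings", {})
    status = {}

    # Shared job clusters declared once under settings.job_clusters.
    for job_cluster in settings.get("job_clusters", []):
        key = job_cluster.get("job_cluster_key", "<unnamed>")
        mode = job_cluster.get("new_cluster", {}).get("data_security_mode")
        status[f"job_cluster:{key}"] = mode in UC_ACCESS_MODES

    # Clusters declared inline on individual tasks.
    for task in settings.get("tasks", []):
        if "new_cluster" in task:
            key = task.get("task_key", "<unnamed>")
            mode = task["new_cluster"].get("data_security_mode")
            status[f"task:{key}"] = mode in UC_ACCESS_MODES

    return status


# Hand-written illustrative payload, not real API output.
sample_job = {
    "job_id": 123,
    "settings": {
        "job_clusters": [
            {
                "job_cluster_key": "etl_cluster",
                "new_cluster": {"data_security_mode": "SINGLE_USER"},
            }
        ],
        "tasks": [
            {"task_key": "uses_shared", "job_cluster_key": "etl_cluster"},
            {"task_key": "legacy_task", "new_cluster": {"num_workers": 2}},
        ],
    },
}

result = job_cluster_uc_status(sample_job)
print(result)
```

A missing `data_security_mode` is reported as `False` in this sketch, which matches the assumption that clusters without an explicit access mode are not Unity Catalog enabled.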
Preparing for Databricks eligibility! Is the content below correct? "If the queries are running sequentially then scale up (increase the size of the cluster from 2x small to 4x large). If the queries are running concurrently or with many users then scale...
Scaling in Databricks involves two aspects: vertical scaling (scale up) and horizontal scaling (scale out).
Vertical Scaling (Scale Up):
If your queries are running sequentially, meaning one query at a time, and you want to improve performance for a...
Hi, I'm trying to assign a location to a new database in Databricks SQL. Normally I'd do this in Python since we specify storage account names from secret scopes, however I'm attempting to do all of this from a SQL warehouse. When doing this I seem to...