Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

kll
by New Contributor III
  • 661 Views
  • 0 replies
  • 0 kudos

Mosaic's grid_boundary method returns inconsistent geometries

I am applying Mosaic's `grid_boundary` method to a Spark DataFrame containing a set of `h3_hex_ids`. The geometries returned are not consistent, i.e. the coordinates could be in either `lat, long` or `long, lat` order. Here's some sample data: ```import pyspark.sql.functions a...

Data Engineering
geospatial
mosaic
442027
by New Contributor II
  • 5672 Views
  • 2 replies
  • 3 kudos

Resolved! Delta Log checkpoints not being created?

The Delta protocol mentions that checkpoints for Delta tables are created every 10 commits. However, when I modify a table with more than 10 separate operations (producing more than 10 separate JSON files in the _delta_log directory), no checkpoint files ar...

Latest Reply
Vinay_M_R
Databricks Employee
  • 3 kudos

As of the latest update, checkpoints for Delta tables are created every 100 commits. This change was made as an improvement. If you want a checkpoint file for a Delta table every 10 commits, or after any other number of commits, you can cust...
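The reply above is cut off, so the following is only a minimal sketch of one way to customize the interval, assuming the Delta Lake table property `delta.checkpointInterval` is what applies here. The helper name `checkpoint_interval_sql` and the table name are hypothetical; the helper only builds the SQL text, so it can be checked without a cluster.

```python
def checkpoint_interval_sql(table_name: str, interval: int) -> str:
    """Build an ALTER TABLE statement that sets delta.checkpointInterval.

    table_name is assumed to be a trusted identifier (no escaping is done).
    """
    if interval < 1:
        raise ValueError("interval must be a positive integer")
    return (
        f"ALTER TABLE {table_name} "
        f"SET TBLPROPERTIES ('delta.checkpointInterval' = '{interval}')"
    )


# Hypothetical table name, used only for illustration.
stmt = checkpoint_interval_sql("my_schema.my_table", 10)
print(stmt)
```

In a notebook with an active SparkSession, the statement could then be run with `spark.sql(stmt)`. The new interval should only affect commits written after the property is set.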

1 More Replies
Vsleg
by Contributor
  • 4662 Views
  • 5 replies
  • 3 kudos

Resolved! Issue with Apache Spark™ Programming with Databricks course

Hello, I found an issue with the Apache Spark™ Programming with Databricks courses on Databricks Academy when trying to do the labs. The mount that the courses use for training data is failing with what looks to me like an authentication issue (see sc...

Latest Reply
Vsleg
Contributor
  • 3 kudos

I found the course Git repo at https://github.com/databricks-academy/apache-spark-programming-with-databricks-english. It works, so I am using that instead of the 'apache-spark-programming-with-databricks.dbc' file available in the learning portal. #DA...

4 More Replies
ah0896
by New Contributor III
  • 14296 Views
  • 17 replies
  • 10 kudos

Using init scripts on UC enabled shared access mode clusters

I know that UC-enabled shared access mode clusters do not allow init script usage, and I have tried multiple workarounds to use the required init script in the cluster (pyodbc-install.sh, in my case), including installing the pyodbc package as a workspa...

Latest Reply
karthik_p
Esteemed Contributor
  • 10 kudos

@Anonymous @Retired_mod Can anyone from Databricks please confirm the above issue? There seems to be some conflicting information about custom script support on shared access mode clusters with Unity Catalog enabled.

16 More Replies
alexiswl
by Contributor
  • 872 Views
  • 0 replies
  • 0 kudos

R Notebooks start before all libraries have been installed

Has anyone else come across an issue where their R notebook executes cells before all libraries in the cluster have been installed? These are libraries that have been in the cluster configuration since the start (not ones I've just added). Please see attached my tw...

alexiswl_0-1688599641817.png alexiswl_1-1688599678822.png
Data Engineering
Libraries
Notebook
R
Hoviedo
by New Contributor III
  • 913 Views
  • 0 replies
  • 0 kudos

load data from sql server : python works well but spark does not

Hi, I have a problem with my on-premise SQL Server connection from Databricks. My Python code, which uses pymssql, works well, but my Spark code does not, and I am using the same credentials for both. My Spark code is """# Read data from SQL Serv...
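The poster's Spark code is cut off above, so for comparison here is a minimal sketch of the options a Spark JDBC read against SQL Server typically takes. The helper name `sqlserver_jdbc_options` and all connection values are hypothetical placeholders; it assumes the Microsoft JDBC driver is available on the cluster. Note that pymssql and the JDBC driver are different clients, so a connection that works through one is not guaranteed to work through the other.

```python
from typing import Dict


def sqlserver_jdbc_options(
    host: str, port: int, database: str, table: str, user: str, password: str
) -> Dict[str, str]:
    """Assemble the option map for a Spark JDBC read from SQL Server."""
    return {
        "url": f"jdbc:sqlserver://{host}:{port};databaseName={database}",
        "dbtable": table,
        "user": user,
        "password": password,
        # Driver class of the Microsoft JDBC driver for SQL Server.
        "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    }


# Placeholder values for illustration only; use a secret scope for real credentials.
opts = sqlserver_jdbc_options(
    "sqlhost.example.com", 1433, "sales", "dbo.orders", "reader", "<password>"
)
print(opts["url"])
```

With an active SparkSession, the read itself would be `spark.read.format("jdbc").options(**opts).load()`.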

pjp94
by Contributor
  • 8547 Views
  • 0 replies
  • 0 kudos

Run threadpool on multiple nodes

I've run a combined multiprocessing and multithreading solution in Python before, using the multiprocessing and concurrent.futures modules. However, since the multiprocessing module only runs on the driver node, I have to instead use sc.parallelize...
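One possible shape for this, as a rough sketch rather than a recommended design: let Spark distribute partitions across executors and run a thread pool inside each partition. The functions `work` and `process_partition` are hypothetical names, and `work` stands in for whatever I/O-bound task is being run.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, Iterator


def work(item: int) -> int:
    # Hypothetical stand-in for an I/O-bound task (API call, file download, ...).
    return item * item


def process_partition(items: Iterable[int], max_threads: int = 8) -> Iterator[int]:
    """Process one partition's items with a local thread pool.

    Spark calls this once per partition on an executor, so the threads
    run on the worker that holds the partition, not on the driver.
    """
    batch = list(items)
    if not batch:
        return iter(())
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        # pool.map returns results in input order.
        return iter(list(pool.map(work, batch)))


# Local check without a cluster: treat a plain iterator as one partition.
print(list(process_partition(iter([1, 2, 3]))))
```

On a cluster where `sc` is available, the same function could be passed to `sc.parallelize(range(1000), numSlices=16).mapPartitions(process_partition).collect()`. The number of slices controls how the items are spread across executors, while `max_threads` controls concurrency within each partition.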

Data Engineering
parallelization
threading
Torlynet
by New Contributor III
  • 6422 Views
  • 2 replies
  • 3 kudos

Resolved! Azure Databricks workspace AAD authentication issue

I am trying to log in to my workspace, but it takes a very long time to evaluate. Sometimes, it simply fails to do so, and I am prompted with the below message. {"error_code":"TEMPORARILY_UNAVAILABLE","message":"Authentication is temporarily unavaila...

Latest Reply
Torlynet
New Contributor III
  • 3 kudos

Hi Menotron. Eventually, I realized it myself. Thank you for your comment.

1 More Replies
shreyassharmabh
by New Contributor II
  • 3922 Views
  • 2 replies
  • 1 kudos

How to check programmatically whether a job cluster is Unity Catalog enabled in Databricks

Is there any way to check whether a job cluster is Unity Catalog enabled in Databricks using Python? I tried the Jobs API, https://{host_name}/api/2.0/jobs/get?job_id={job_id}, but I couldn't find whether that cluster is Unity Catalog enabled or not. Could anyone sugges...

Latest Reply
KarenZak
New Contributor II
  • 1 kudos

To check if a job cluster is Unity Catalog enabled in Databricks programmatically using Python, you can use the Databricks REST API. Here's an example of how you can do it: Import the required modules: `import requests`. Set up the necessary variables: host...
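The reply's own code is truncated above. Separately from it, here is a hedged sketch of one way to inspect an already-fetched Jobs API response. It assumes each cluster spec carries a `data_security_mode` field and that the values `SINGLE_USER` and `USER_ISOLATION` indicate Unity Catalog capable access modes; the exact field names and values should be verified against the current API documentation. The function name `uc_status_by_cluster` is hypothetical.

```python
from typing import Any, Dict

# Assumed set of access modes that support Unity Catalog; newer API versions
# may use additional values, so treat this as illustrative.
UC_MODES = {"SINGLE_USER", "USER_ISOLATION"}


def uc_status_by_cluster(job: Dict[str, Any]) -> Dict[str, bool]:
    """Map each job cluster in a parsed jobs/get response to a UC flag."""
    settings = job.get("settings", {})
    result: Dict[str, bool] = {}

    # Multi-task format: shared job clusters listed under "job_clusters".
    for jc in settings.get("job_clusters", []):
        mode = jc.get("new_cluster", {}).get("data_security_mode")
        result[jc.get("job_cluster_key", "<unnamed>")] = mode in UC_MODES

    # Single-task format: cluster spec directly under "new_cluster".
    if "new_cluster" in settings:
        mode = settings["new_cluster"].get("data_security_mode")
        result["new_cluster"] = mode in UC_MODES

    return result


# Hand-written sample response, not real API output.
sample = {
    "settings": {
        "job_clusters": [
            {"job_cluster_key": "etl", "new_cluster": {"data_security_mode": "SINGLE_USER"}},
            {"job_cluster_key": "legacy", "new_cluster": {}},
        ]
    }
}
print(uc_status_by_cluster(sample))
```

The `job` argument would be the parsed JSON from the jobs/get call mentioned in the question, for example the result of `requests.get(...).json()` with an authorization header.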

1 More Replies
chorongs
by New Contributor III
  • 3690 Views
  • 2 replies
  • 1 kudos

Resolved! Sequential vs concurrency optimization questions from query!

Preparing for Databricks eligibility! Is the content below correct? "If the queries are running sequentially, then scale up (increase the size of the cluster from 2x small to 4x large). If the queries are running concurrently or with many users, then scale...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Scaling in Databricks involves two aspects: vertical scaling (scale up) and horizontal scaling (scale out). Vertical Scaling (Scale Up): If your queries are running sequentially, meaning one query at a time, and you want to improve performance for a...

1 More Replies
