Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

himanshu_k
by New Contributor
  • 6720 Views
  • 3 replies
  • 0 kudos

Clarification Needed: Ensuring Correct Pagination with Offset and Limit in PySpark

Hi community, I hope you're all doing well. I'm currently working on a PySpark project where I'm implementing pagination-like functionality using the offset and limit functions. My aim is to retrieve data between a specified starting_index and ending_...

Latest Reply
Mathias_Peters
Contributor II
  • 0 kudos

Hi, did you find an answer to this question? I am having similar problems and a slow solution which I need to improve upon. Thanks in advance.

2 More Replies
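The thread above turns on simple offset arithmetic. Below is a minimal pure-Python sketch of that arithmetic (function names are illustrative, not from the post); in PySpark, the resulting values would feed `df.offset(start).limit(page_size)` after an explicit `orderBy`, since row order is otherwise undefined and pages would not be stable.

```python
def page_bounds(page: int, page_size: int) -> tuple[int, int]:
    """Return (offset, limit) for a 1-indexed page."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    offset = (page - 1) * page_size
    return offset, page_size

def paginate(rows, page, page_size):
    """Pure-Python stand-in for df.offset(offset).limit(limit)."""
    offset, limit = page_bounds(page, page_size)
    return rows[offset:offset + limit]
```

For example, page 3 with a page size of 10 covers indices 20 through 29; a short final page simply returns fewer rows.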
Reza
by New Contributor III
  • 13821 Views
  • 11 replies
  • 6 kudos

Resolved! How can I search in a specific folder in Databricks?

There is a keyword search option in Databricks that searches for a command or word across the entire workspace. How can I search for a command in a specific folder or repository?

Latest Reply
Jensz007
New Contributor II
  • 6 kudos

@Atanu I agree with nelsoncardenas: the problem is not solved, and the current answer only tells us to raise a feature request. Would it be possible to at least link the feature request raised by nelsoncardenas to this post/answer? ...

10 More Replies
nayan_wylde
by Esteemed Contributor
  • 920 Views
  • 3 replies
  • 0 kudos

Installing Maven packages on a UC-enabled Standard mode cluster

Curious if anyone has faced issues installing Maven packages on a UC-enabled cluster. Traditionally we used to install Maven packages from an Artifactory repo. I am trying to install the same package from a UC-enabled cluster (Standard mode). It worked whe...

Latest Reply
lingareddy_Alva
Honored Contributor III
  • 0 kudos

Hi @nayan_wylde Yes, this is a common challenge when transitioning to Unity Catalog (UC) enabled clusters. The installation of Maven packages from Artifactory repositories does work differently in UC environments, but there are several approaches you c...

2 More Replies
PedroFaria2135
by New Contributor II
  • 1958 Views
  • 1 replies
  • 0 kudos

Resolved! How to add permissions to a Databricks Workflow deployed via Asset Bundle YAML?

Hey! I was deploying a new Databricks Workflow into my workspace via Databricks Asset Bundles. Currently, I have a very simple workflow, defined in a YAML file like this: resources:  jobs:    example_job:      name: example_job      schedule:        ...

Latest Reply
nikhilj0421
Databricks Employee
  • 0 kudos

Hi @PedroFaria2135, this can be done using the permissions key in the YAML file. Please refer to this document: https://learn.microsoft.com/en-us/azure/databricks/dev-tools/bundles/reference#permissions    permissions: - level: CAN_VIEW group_name: te...

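Expanding the truncated snippet in the reply, a hedged sketch of how a permissions key can sit on a job resource in the bundle YAML (the group and user names here are made up; see the bundle reference linked in the reply for the full list of levels):

```yaml
resources:
  jobs:
    example_job:
      name: example_job
      permissions:
        - level: CAN_VIEW
          group_name: team-viewers        # hypothetical group
        - level: CAN_MANAGE_RUN
          user_name: someone@example.com  # hypothetical user
```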
Sangamswadik
by New Contributor III
  • 3078 Views
  • 5 replies
  • 2 kudos

Resolved! Unable to see All purpose compute

In the workspace, I can only see SQL warehouses and Apps; I've attached a screenshot. I don't see an option to create all-purpose compute. Can you please tell me if there is a way to create one? Under the user entitlements page, look at Identity and access >...

Latest Reply
Execute
New Contributor II
  • 2 kudos

Please let us know how you resolved this.

4 More Replies
karthikmani
by New Contributor
  • 861 Views
  • 1 replies
  • 1 kudos

Resolved! How to log the errors?

We have a notebook with a generic framework that we created to run for multiple tables every day. We want to log errors/successes/exceptions; any such errors need to be recorded in a log table so that we can troubleshoot based on the error log f...

Latest Reply
nayan_wylde
Esteemed Contributor
  • 1 kudos

You can create custom functions to log the events, write them to a data lake, and then use Structured Streaming to read the data from the data lake into a Delta table. %scala // Functions def set_local_variables() = { // get the variables ...

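The reply's Scala snippet is truncated, so here is a minimal pure-Python sketch of the same idea, with hypothetical table and column names. On Databricks, the collected records would be appended to a Delta log table (e.g. `spark.createDataFrame(records).write.mode("append").saveAsTable("ops.run_log")`) rather than kept in a list:

```python
from datetime import datetime, timezone

def run_with_logging(table_name, fn, records):
    """Run fn() for one table, appending a success/error record to `records`."""
    rec = {
        "table": table_name,
        "ts": datetime.now(timezone.utc).isoformat(),
        "status": "success",
        "error": None,
    }
    try:
        fn()
    except Exception as e:  # record the failure instead of aborting the whole run
        rec["status"] = "error"
        rec["error"] = f"{type(e).__name__}: {e}"
    records.append(rec)
    return rec
```

Wrapping each per-table call this way means one bad table does not stop the daily loop, and the log table captures exactly which table failed and why.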
OODataEng
by New Contributor III
  • 2073 Views
  • 6 replies
  • 1 kudos

Liquid clustering performance issue

Hello, I have a table with approximately 300 million records. It weighs 3.4 GB and consists of 305 files. I wanted to enable liquid clustering for it and chose a date column as the clustering key. When I created a new table with the above details b...

Latest Reply
Yogesh_Verma_
Contributor II
  • 1 kudos

Hey @OODataEng To create a new table in Databricks using the schema and data from an existing table, you can use the CREATE TABLE AS SELECT command. This command allows you to define a new table based on the results of a SELECT query executed on the...

5 More Replies
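For reference, the CTAS pattern described in the reply can be sketched as follows (table and column names are illustrative, not from the thread; CLUSTER BY enables liquid clustering on the new table, and a subsequent OPTIMIZE clusters already-written files):

```sql
-- Recreate the table with liquid clustering on the date column
CREATE TABLE new_events
CLUSTER BY (event_date)
AS SELECT * FROM old_events;

-- Trigger clustering of the data written by the CTAS
OPTIMIZE new_events;
```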
JohanS
by New Contributor III
  • 5851 Views
  • 2 replies
  • 1 kudos

Resolved! WorkspaceClient authentication fails when running on a Docker cluster

from databricks.sdk import WorkspaceClient; w = WorkspaceClient() → ValueError: default auth: cannot configure default credentials ... I'm trying to instantiate a WorkspaceClient in a notebook on a cluster running a Docker image, but authentication fails. T...

Latest Reply
kyle_scherer1_5
New Contributor II
  • 1 kudos

Any progress here? Same issue, over a year later

1 More Replies
OODataEng
by New Contributor III
  • 818 Views
  • 2 replies
  • 0 kudos

Resolved! Git credentials for service principal running jobs

Hello, I have a permission issue when trying to access Azure DevOps and run a job using a Service Principal. I've read about the whole credentials topic, and indeed, when I create a PAT (Personal Access Token) through my personal user account, I can s...

Latest Reply
loui_wentzel
Contributor
  • 0 kudos

Using a PAT is how you authenticate as a user so that you can configure your Service Principal (SP). If you follow this link, there's a guide to the next steps (you're on step 3 now). This article explains a bit more about how to set up the SP in Azur...

1 More Replies
KristiLogos
by Contributor
  • 1728 Views
  • 4 replies
  • 1 kudos

Simba JDBC Exception When Querying Tables via BigQuery Databricks Connection

Hello, I have a federated connection to BigQuery that has GA events tables for each of our projects. I'm trying to query each daily table, which contains about 400,000 rows each day, and load it into another table, but I keep seeing this Simba JDBC exception. ...

Latest Reply
tsekityam_2
New Contributor II
  • 1 kudos

I also had this issue, and I resolved it by casting all the RECORD columns in BigQuery to string before I dump the data. I first create a view like: create view xxx as select string_1, string_2, string_3, to_json_string(record_1) as record_1, to_json_s...

3 More Replies
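The workaround from the reply can be sketched as a BigQuery view (dataset, view, and column names are illustrative; TO_JSON_STRING serializes each RECORD column so the federated connection only ever sees plain strings):

```sql
CREATE VIEW my_dataset.events_flat AS
SELECT
  string_1,
  string_2,
  string_3,
  TO_JSON_STRING(record_1) AS record_1,  -- RECORD column flattened to JSON text
  TO_JSON_STRING(record_2) AS record_2
FROM my_dataset.events;
```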
mkwparth
by New Contributor III
  • 2060 Views
  • 3 replies
  • 1 kudos

Resolved! How to increase the REPL timeout to prevent timeout errors

Hi everyone, I've tried setting the Spark configuration spark.databricks.repl.timeout to 300, but I’m still getting a REPL timeout error saying it took longer than 60 seconds. It seems like the configuration might be incorrect. Can someone guide me o...

Latest Reply
mkwparth
New Contributor III
  • 1 kudos

Hi @Saritha_S, yes! I've configured the Spark config you suggested. I'll observe for a few days and let you know. Thanks for your help.

2 More Replies
mickniz
by Contributor
  • 25378 Views
  • 8 replies
  • 2 kudos

Connect to Databricks from PowerApps

Hi All, currently I am trying to connect to Databricks Unity Catalog from a Power Apps Dataflow by using the Spark connector, specifying the HTTP URL and a Databricks personal access token as shown in the screenshot below. I am able to connect, but the issue is when...

Latest Reply
Toussaint_Webb
Databricks Employee
  • 2 kudos

If you are an Azure Databricks customer, there is now a connector for Power Platform (Power Apps, Copilot Studio, and Power Automate) in Public Preview. See the Blog and Documentation links.

7 More Replies
data4life
by New Contributor II
  • 1052 Views
  • 4 replies
  • 5 kudos

Relative path reading ambiguity when running nested run commands

Hello All, I came across an unusual error while using the %run and dbutils.notebook.run() notebook functionalities in tandem; the particular scenarios are listed below. I have the below directory structure (simplified) where all 3 notebooks are loc...

Latest Reply
jameshughes
Contributor II
  • 5 kudos

I'm going to run an experiment in my workspace and let you know if I see the same thing.  I'm not sure if I have seen this, but also not sure if my use of relative pathing previously had notebooks in different directories as you have listed.  General...

3 More Replies
ashokv
by New Contributor II
  • 757 Views
  • 2 replies
  • 0 kudos

Range join hint does not speed up Spark SQL execution

Spark SQL execution did not complete even after 12 hours; I ran it on i3.xlarge with 4 worker nodes. Only two worker nodes showed as running, with CPU at 100%. What should I do differently? --SQL INSERT INTO attribute_results...SELECT /*+ BROADCAST(t) ...

Latest Reply
saiprasadambati
New Contributor III
  • 0 kudos

Can you share the result of the below query? select count(1) from transaction_attributes where analysis_start_date = '2025-05-01' and analysis_end_date = '2025-05-01'. If it has multiple entries, the join condition will lead to a cross join and henc...

1 More Replies
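The replier's concern can be illustrated with a toy row-count calculation (pure Python, made-up data): an inner equi-join produces, for each key value, left-count × right-count rows, so a key duplicated on both sides multiplies the output toward a cross join.

```python
def join_size(left_keys, right_keys):
    """Rows an inner equi-join on one key column would produce."""
    right_counts = {}
    for k in right_keys:
        right_counts[k] = right_counts.get(k, 0) + 1
    total = 0
    for k in left_keys:
        total += right_counts.get(k, 0)  # each left row matches every right row with the same key
    return total
```

With unique keys on both sides the output stays at the input size, but 1,000 duplicates of one key on each side already yields a million output rows, which is why checking the count(1) result matters before blaming the range join hint.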
Anotech
by New Contributor II
  • 11506 Views
  • 3 replies
  • 1 kudos

How can I fix this error? ExecutionError: An error occurred while calling o392.mount: java.lang.NullPointerException

Hello, I'm trying to mount my Azure Gen2 data lake in Databricks to read in data from the container, but I get an error when executing this line of code: dbutils.fs.mount( source = "abfss://resumes@choisysresume.dfs.core.windows.net/", mount_poin...

Latest Reply
Nikhill
New Contributor II
  • 1 kudos

I was using Databricks secret scopes to get the key used in the config. I received a similar mount error while mounting with the "wasbs" driver, "ExecutionError: An error occurred while calling o427.mount."; this was because the scope ...

2 More Replies
