Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

drewtoby
by New Contributor II
  • 8013 Views
  • 2 replies
  • 1 kudos

Resolved! How to Pull Cached SQL Table into Python Dictionary?

Hello, I have been working on this issue as a proof of concept. It would be extremely helpful to iterate through tables via loops in a few scenarios. I have a simple three-column dimension that I added to a cached table: cache lazy table hedis_cache s...

Latest Reply
drewtoby
New Contributor II
  • 1 kudos

Got it to work, thank you for the tip! I needed to convert the PySpark DataFrame to a pandas DataFrame: https://www.geeksforgeeks.org/convert-pyspark-dataframe-to-dictionary-in-python/
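
A minimal sketch of the table-to-dictionary step discussed in this thread, in plain Python. It swaps the pandas route from the linked article for a `collect()`-based one: each collected PySpark `Row` exposes `asDict()`, so rows can be indexed by a key column without pandas. The helper name `rows_to_dict` and the key column name are hypothetical, and the approach assumes the table is small enough to fit in driver memory.

```python
def rows_to_dict(rows, key_column):
    """Index row-like objects by the value of one column.

    `rows` may hold PySpark Row objects (which provide asDict()) or plain
    mappings. If two rows share a key, the later row overwrites the earlier.
    """
    lookup = {}
    for row in rows:
        # PySpark Row objects expose asDict(); plain dicts are copied as-is.
        record = row.asDict() if hasattr(row, "asDict") else dict(row)
        lookup[record[key_column]] = record
    return lookup
```

In a notebook this might be called as `rows_to_dict(spark.table("hedis_cache").collect(), "measure_id")`, where `measure_id` is a placeholder for whichever column identifies a row.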

1 More Replies
AkasBala
by New Contributor III
  • 1676 Views
  • 4 replies
  • 3 kudos

Unity Catalog Primary key column taking duplicates

I have updated a Hive metastore from a Unity Catalog. I have set up primary keys on the table. When I try to insert duplicates, the inserts succeed, so it seems the PK is not working. Is anyone else seeing this behaviour?

Latest Reply
AkasBala
New Contributor III
  • 3 kudos

@Debayan Mukherjee Any information on the above, please?

3 More Replies
Anonymous
by Not applicable
  • 610 Views
  • 0 replies
  • 0 kudos

docs.databricks.com

What serverless features are you using on Databricks? I am curious to know. Is it Databricks SQL Serverless or Model Serving? Proceed here to compare serverless compute to other Databricks architectures: https://docs.databricks.com/serverless-compute/ind...

Anuj93
by New Contributor III
  • 623 Views
  • 0 replies
  • 0 kudos

Change Azure Databricks cluster owner

I wanted to add secrets to the Spark conf of the cluster, but I am not able to because I am not the cluster owner. How can we change the cluster owner?

Ryu1
by New Contributor
  • 692 Views
  • 0 replies
  • 0 kudos

Other than the "account admin" permission, is there a small permission or role to collect only catalog information?

I am going to use an open-source tool called "datahub" to collect and share metadata about Databricks (https://datahubproject.io/). Recently, however, there has been a big challenge. That is, to collect the Unity Catalog information of Databricks,...

Dean_Lovelace
by New Contributor III
  • 3193 Views
  • 1 replies
  • 1 kudos

Resolved! Efficiently move multiple files with dbutils.fs.mv command on abfs storage

As part of my batch processing, I archive a large number of small files received from the source system each day using the dbutils.fs.mv command. This takes hours because dbutils.fs.mv moves the files one at a time. How can I speed this up?

Latest Reply
daniel_sahal
Esteemed Contributor
  • 1 kudos

@Dean Lovelace You can use multithreading. See the example here: https://nealanalytics.com/blog/databricks-spark-jobs-optimization-techniques-multi-threading/
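
A rough sketch of the multithreading idea, not the code from the linked article. It fans the per-file moves out over a thread pool so that many storage calls are in flight at once. The helper name `move_files_parallel`, the `max_workers` default, and the paths in the usage note are assumptions; the move function is passed in so that `dbutils.fs.mv` can be supplied inside a Databricks notebook.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def move_files_parallel(move_fn, file_pairs, max_workers=16):
    """Run move_fn(src, dst) for every pair concurrently.

    Returns a list of (src, dst, exception) tuples for moves that failed,
    so the caller can retry or report them.
    """
    failures = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each submitted future back to the pair it is moving.
        futures = {
            pool.submit(move_fn, src, dst): (src, dst)
            for src, dst in file_pairs
        }
        for future in as_completed(futures):
            src, dst = futures[future]
            try:
                future.result()
            except Exception as exc:  # collect the failure, keep going
                failures.append((src, dst, exc))
    return failures
```

In a notebook, the pairs could be built from a directory listing, for example `[(f.path, archive_dir + f.name) for f in dbutils.fs.ls(source_dir)]` with hypothetical `source_dir` and `archive_dir` values, and then passed along with `dbutils.fs.mv` as the move function. Threads suit this case because each move spends its time waiting on storage rather than on the CPU.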

Phani1
by Valued Contributor
  • 7334 Views
  • 2 replies
  • 2 kudos

Resolved! Web application integrated with Gradio or streamlit on Databricks

We are trying to run a web application integrated with Gradio on Databricks. Although we have configured the launch parameter with (share="True"), the app executes and gives us output, but it keeps running and no public URL is generated: o/p: Running on...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Janga Reddy Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers y...

1 More Replies
youssefmrini
by Honored Contributor III
  • 675 Views
  • 1 replies
  • 0 kudos
Latest Reply
youssefmrini
Honored Contributor III
  • 0 kudos

You can now use Delta Sharing to share notebook files securely using the Databricks-to-Databricks sharing flow. Sharing notebooks empowers users to collaborate across metastores and accounts, and enables providers to demonstrate use cases and visualiz...

Kaniz_Fatma
by Community Manager
  • 366 Views
  • 0 replies
  • 2 kudos

RAFFLE ALERT: Hey there, Awesome Community Members! Tick-Tock, Tick-Tock! Time is racing, and we're just FOUR WEEKS aw...

RAFFLE ALERT: Hey there, Awesome Community Members! Tick-Tock, Tick-Tock! Time is racing, and we're just FOUR WEEKS away from the grand raffle draw! Some may think, "What if I can't reach the United States?" Well, we've got your back. We understand tha...

ArturoNuor
by New Contributor III
  • 3333 Views
  • 3 replies
  • 0 kudos

Resolved! Unable to install R geospatial libraries raster, terra, sf, ncdf4, etc

When trying to install any of these R libraries from a cmd cell/block in a notebook, or from the UI in the cluster, I receive the same error; it seems that the dependencies cannot be installed. Warning in utils::install.packages(pkgs, ...) : installation of ...

Latest Reply
ArturoNuor
New Contributor III
  • 0 kudos

For the next soul looking for an answer: I managed to solve the issue with the following 2 init scripts. It gets tricky with apt or apt-get, which was the issue: sometimes it did update, sometimes it didn't, making it possible to find libmysqlclient21. 1)...

2 More Replies
Divya_Bhadauria
by New Contributor II
  • 504 Views
  • 0 replies
  • 0 kudos

Number of rows displayed in sql cell

Even though the default limit on rows displayed is 10,000, the SQL cell shows fewer rows than the limit when my result has more than 10k rows. It should at least show the default limit.

source2sea
by Contributor
  • 3645 Views
  • 4 replies
  • 2 kudos

Resolved! how to make databricks job to fail when the application has already given "exit code 1"?

object OurMainObject extends LazyLogging with IOApp {
  def run(args: List[String]): IO[ExitCode] = {
    logger.info("Started the application")
    val conf = defaultOverrides.withFallback(defaultApplication).withFallback(defaultReference)
    val...

Latest Reply
source2sea
Contributor
  • 2 kudos

My workaround for now is to write the code as below, so that the Databricks job is marked as failed.
case Left(ex) => {
  IO(logger.error("Glue failure", ex)).map(_ => ExitCode.Error)
  IO.raiseError(ex)
}

3 More Replies
DomDuf
by New Contributor II
  • 3723 Views
  • 3 replies
  • 3 kudos

Resolved! Roll back to previous version of an AutoLoader checkpoint file

I know that to "reset" AutoLoader, you can delete the checkpoint file entirely. I was wondering whether it's possible, and how someone would: Get the checkpoint file to a previous version so I can reload certain files that were already processed; Delete certai...

Latest Reply
MRTN
New Contributor III
  • 3 kudos

This would for sure be a useful feature.

2 More Replies