Data Engineering

Forum Posts

Dean_Lovelace
by New Contributor III
  • 2390 Views
  • 1 replies
  • 1 kudos

Resolved! Efficiently move multiple files with dbutils.fs.mv command on abfs storage

As part of my batch processing I archive a large number of small files received from the source system each day using the dbutils.fs.mv command. This takes hours because dbutils.fs.mv moves the files one at a time. How can I speed this up?

Latest Reply
daniel_sahal
Esteemed Contributor
  • 1 kudos

@Dean Lovelace You can use multithreading. See the example here: https://nealanalytics.com/blog/databricks-spark-jobs-optimization-techniques-multi-threading/
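A minimal sketch of the multithreading idea, not taken from the linked post: it issues the moves from a thread pool so that many storage calls are in flight at once. The helper name `move_files_in_parallel`, the `max_workers` default, and the paths in the comments are hypothetical; the move function is passed in so that the same code can be tried locally with `shutil.move` before using `dbutils.fs.mv` in a notebook.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Iterable, List, Tuple


def move_files_in_parallel(
    pairs: Iterable[Tuple[str, str]],
    move_fn: Callable[[str, str], object],
    max_workers: int = 16,  # assumption: tune to what the driver and storage tolerate
) -> List[Tuple[str, str, Exception]]:
    """Call move_fn(src, dst) for every pair on a thread pool.

    Returns one (src, dst, error) triple for each move that raised.
    """
    failures = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit every move up front; the pool caps concurrency at max_workers.
        futures = {pool.submit(move_fn, src, dst): (src, dst) for src, dst in pairs}
        for future in as_completed(futures):
            src, dst = futures[future]
            try:
                future.result()
            except Exception as exc:  # collect failures instead of aborting the batch
                failures.append((src, dst, exc))
    return failures


# In a Databricks notebook this might be used as follows (placeholder paths):
#   files = dbutils.fs.ls("abfss://<container>@<account>.dfs.core.windows.net/incoming/")
#   pairs = [(f.path, f.path.replace("/incoming/", "/archive/")) for f in files]
#   failures = move_files_in_parallel(pairs, dbutils.fs.mv)
```

Threads fit here because each move spends most of its time waiting on the storage service rather than on the CPU. Checking the returned list afterwards shows which files, if any, still need to be retried.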

Phani1
by Valued Contributor
  • 5887 Views
  • 2 replies
  • 2 kudos

Resolved! Web application integrated with Gradio or streamlit on Databricks

We are trying to run a web application integrated with Gradio on Databricks. Although we have configured the launch parameter with (share="True"), the app executes and gives us output, but it keeps running and no public URL is generated. Output: Running on...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Janga Reddy, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers y...

1 More Replies
youssefmrini
by Honored Contributor III
  • 495 Views
  • 1 replies
  • 0 kudos
Latest Reply
youssefmrini
Honored Contributor III
  • 0 kudos

You can now use Delta Sharing to share notebook files securely using the Databricks-to-Databricks sharing flow. Sharing notebooks empowers users to collaborate across metastores and accounts, and enables providers to demonstrate use cases and visualiz...

Kaniz
by Community Manager
  • 259 Views
  • 0 replies
  • 2 kudos

RAFFLE ALERT: Hey there, Awesome Community Members! Tick-Tock, Tick-Tock! Time is racing, and we're just FOUR WEEKS away from the grand raffle draw! Some may think, "What if I can't reach the United States?" Well, we've got your back. We understand tha...

ArturoNuor
by New Contributor III
  • 2411 Views
  • 3 replies
  • 0 kudos

Resolved! Unable to install R geospatial libraries raster, terra, sf, ncdf4, etc

When trying to install any of these R libraries from a command cell in a notebook, or from the cluster UI, I receive the same error; it seems that the dependencies cannot be installed. Warning in utils::install.packages(pkgs, ...) : installation of ...

Latest Reply
ArturoNuor
New Contributor III
  • 0 kudos

For the next soul looking for an answer: I managed to solve the issue with the following two init scripts. The tricky part was apt or apt-get, and that was the issue; sometimes it did update and sometimes it didn't, which determined whether libmysqlclient21 could be found. 1)...

2 More Replies
Divya_Bhadauria
by New Contributor II
  • 349 Views
  • 0 replies
  • 0 kudos

Number of rows displayed in sql cell

Even though the default limit on rows displayed is 10,000, the SQL cell shows fewer rows than the limit when my result has more than 10,000 rows. It should at least show the default limit.

source2sea
by Contributor
  • 2607 Views
  • 4 replies
  • 2 kudos

Resolved! How to make a Databricks job fail when the application has already returned "exit code 1"?

object OurMainObject extends LazyLogging with IOApp { def run(args: List[String]): IO[ExitCode] = { logger.info("Started the application")   val conf = defaultOverrides.withFallback(defaultApplication).withFallback(defaultReference) val...

Latest Reply
source2sea
Contributor
  • 2 kudos

My workaround for now is to write the code as below, so that the Databricks job is marked as failed: case Left(ex) => { IO(logger.error("Glue failure", ex)).map(_ => ExitCode.Error) IO.raiseError(ex) }

3 More Replies
DomDuf
by New Contributor II
  • 2767 Views
  • 3 replies
  • 3 kudos

Resolved! Roll back to previous version of an AutoLoader checkpoint file

I know that to "reset" AutoLoader, you can delete the checkpoint file entirely. I was wondering whether it is possible, and how someone would: (1) get the checkpoint file to a previous version so I can reload certain files that were already processed; (2) delete certai...

Latest Reply
MRTN
New Contributor III
  • 3 kudos

This would for sure be a useful feature.

2 More Replies
MRTN
by New Contributor III
  • 2599 Views
  • 2 replies
  • 1 kudos

Resolved! Configure multiple source paths for auto loader

I am currently using two streams to monitor data in two different containers on an Azure storage account. Is there any way to configure an autoloader to read from two different locations? The schemas of the files are identical.

Latest Reply
Anonymous
Not applicable
  • 1 kudos

@Morten Stakkeland: Yes, it's possible to configure an autoloader to read from multiple locations. You can define multiple CloudFiles sources for the autoloader, each pointing to a different container in the same storage account. In your case, since ...

1 More Replies
Hubert-Dudek
by Esteemed Contributor III
  • 386 Views
  • 0 replies
  • 4 kudos

Databricks Runtime 13.1 has added the sql_keywords() function, which lists all SQL keywords. It is good practice to refrain from using these keywords as names for tables or fields, although, with ANSI mode set to false (the standard setting), it will work without problem...

KVNARK
by Honored Contributor II
  • 1581 Views
  • 2 replies
  • 1 kudos

Resolved! Notebook activity is getting timed out in ADF pipeline.

The notebook activity times out after running for a certain time (5 hours) in an ADF pipeline, and the only error reported is a timeout. The problem is that this pipeline will process terabytes of data daily. Does anyone have an idea how to fix this?

Latest Reply
KVNARK
Honored Contributor II
  • 1 kudos

@Daniel Sahal​ - Noted. Thanks Daniel!

1 More Replies
Hubert-Dudek
by Esteemed Contributor III
  • 1840 Views
  • 2 replies
  • 7 kudos

You can use Apache Hudi in Databricks without a problem:
- in cluster settings, install the Maven library org.apache.hudi:hudi-spark3.3-bundle_2.12:0.13.0 for Databricks 12.2 LTS
- in the cluster Spark config, add three lines: spark.sql.extensions org.apache.sp...

Latest Reply
ros
New Contributor III
  • 7 kudos

I tried installing the library and configuring the Spark configs, restarted the cluster, and then ran the create command in a notebook, but it gives me an error stating java.io.FileNotFoundException: No such file or directory: s3://incred-databricks-data/hudi_dms_data/...

1 More Replies
_deepak_
by New Contributor II
  • 633 Views
  • 1 replies
  • 2 kudos

Resolved! Shallow copy in databricks

Hi, I am new to Databricks. I need to set up a non-prod environment, for which I need the prod data to be cloned into non-prod. I explored a bit and learned about shallow copy. Is it possible to do a shallow copy across environments? Or is it possible to d...

Latest Reply
daniel_sahal
Esteemed Contributor
  • 2 kudos

@deepak prasad I'm not sure it's possible to do that. Even with Unity Catalog enabled, you cannot use shallow clone. You can do two things here: Without UC - simply recreate an empty table in your non-prod environment and do SELECT * from prod st...

SenthilJ
by New Contributor III
  • 675 Views
  • 1 replies
  • 2 kudos

Resolved! Databricks Account

Hi, in my org we are using Azure Databricks. As Azure AD users, my project team and I have access to Databricks workspaces. In our context, what exactly is meant by a Databricks Account? I understand it's a group of workspaces used for billing, but at...

Latest Reply
daniel_sahal
Esteemed Contributor
  • 2 kudos

@Senthilnathan J The Databricks Account is like a top-level administration layer for everything that's going on in your tenant.
