Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Vee
by New Contributor
  • 6759 Views
  • 1 reply
  • 1 kudos

Cluster configuration and optimal numbers for fs.s3a.connection.maximum, fs.s3a.threads.max

Could you please suggest the best cluster configuration for the use case stated below, and tips to resolve the errors shown? Use case: There could be 4 or 5 Spark jobs that run concurrently. Each job reads 40 input files and spits out 120 output files ...
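For reference, a minimal sketch of how these S3A settings can be applied when building a Spark session; the values are illustrative assumptions, not tuning recommendations:

# Sketch: raising S3A connection-pool and thread limits.
# The values are illustrative assumptions, not recommendations; on
# Databricks these are usually set in the cluster's Spark config UI.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    # Max simultaneous S3 connections per JVM:
    .config("spark.hadoop.fs.s3a.connection.maximum", "200")
    # Max threads for S3A uploads and copies:
    .config("spark.hadoop.fs.s3a.threads.max", "100")
    .getOrCreate()
)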

Latest Reply
jose_gonzalez
Databricks Employee
  • 1 kudos

Hi @Vetrivel Senthil, just wondering if this question is a duplicate of this one: https://community.databricks.com/s/feed/0D53f00001qvQJcCAM?

Rk2
by New Contributor II
  • 2280 Views
  • 2 replies
  • 4 kudos

Resolved! Scheduling a job with multiple notebooks using a common parameter

I have a practical use case: three notebooks (PySpark) that all have one common parameter. I need to schedule all three notebooks in a sequence. Is there any way to run them by setting one parameter value, since it is the same in all of them? Please suggest the ...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 4 kudos

@Ramesh Kotha, in the notebook, get the parameter like this: my_parameter = dbutils.widgets.get("my_parameter") and set it in a task like this:
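A minimal sketch of the full pattern, with hypothetical notebook paths and parameter value (the task payload shape follows the Jobs API's notebook_task.base_parameters):

# In each of the three notebooks, read the shared parameter from a widget
# (dbutils is provided by the Databricks runtime):
my_parameter = dbutils.widgets.get("my_parameter")

# Hypothetical sketch of three job tasks passing the same value; paths and
# the date value are examples, and depends_on gives the sequential order
# the question asks for.
job_tasks = [
    {
        "task_key": f"notebook_{i}",
        "depends_on": [{"task_key": f"notebook_{i - 1}"}] if i > 1 else [],
        "notebook_task": {
            "notebook_path": f"/Repos/etl/notebook_{i}",
            "base_parameters": {"my_parameter": "2024-01-01"},
        },
    }
    for i in (1, 2, 3)
]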

1 More Replies
SailajaB
by Valued Contributor III
  • 5693 Views
  • 3 replies
  • 7 kudos

Resolved! How can we use a config file to change PySpark dataframe names without hardcoding?

Hi, can we use a config file to change PySpark dataframe attribute names (root and nested, of both struct and array type)? Actually, in the input we are getting attributes in lowercase and we need to convert them into camel case (please note we don't have any separat...
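A minimal sketch of the flat (top-level) case, assuming the mapping is loaded from a config file; the file path and column names are hypothetical, and nested struct/array fields would additionally require rebuilding the schema:

import json

# Hypothetical config file mapping lowercase -> camelCase names, e.g.:
# {"firstname": "firstName", "lastname": "lastName"}
with open("/dbfs/mnt/config/rename_map.json") as f:  # hypothetical path
    rename_map = json.load(f)

def apply_renames(df, mapping):
    # Rename only the top-level columns present in the DataFrame.
    for old, new in mapping.items():
        if old in df.columns:
            df = df.withColumnRenamed(old, new)
    return df

renamed_df = apply_renames(input_df, rename_map)  # input_df: source DataFrame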

Latest Reply
Anonymous
Not applicable
  • 7 kudos

Hi @Sailaja B, this is awesome! Thanks for coming in and posting the solution. We really appreciate it. Cheers!

2 More Replies
Tahseen0354
by Valued Contributor
  • 1925 Views
  • 1 reply
  • 1 kudos

Configure CLI on Databricks on GCP

Hi, I have a service account in my GCP project, and the service account is added as a user in my Databricks GCP account. Is it possible to configure the CLI on Databricks on GCP using that service account? Something similar to: databricks configure ---tok...
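For reference, the CLI reads its credentials from ~/.databrickscfg, so one non-interactive option is to write that profile directly; a sketch with placeholder host/token values:

# Sketch: create the profile the Databricks CLI reads, instead of answering
# the interactive `databricks configure` prompts.
# The host and token below are placeholders, not real credentials.
from pathlib import Path

cfg = (
    "[DEFAULT]\n"
    "host = https://<workspace-url>\n"
    "token = <token-for-the-service-account>\n"
)
Path.home().joinpath(".databrickscfg").write_text(cfg)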

LukaszJ
by Contributor III
  • 5986 Views
  • 4 replies
  • 4 kudos

Resolved! Terraform: get metastore ID without creating a new metastore

Hello, I want to create a database (schema) and tables in my Databricks workspace using Terraform. I found this resource: databricks_schema. It requires databricks_catalog, which requires metastore_id. However, I have databricks_workspace and I did not cre...

Latest Reply
Atanu
Databricks Employee
  • 4 kudos

https://registry.terraform.io/providers/databrickslabs/databricks/latest/docs/resources/schema I think this is for UC. https://docs.databricks.com/data-governance/unity-catalog/index.html

3 More Replies
Juniper_AIML
by New Contributor
  • 5188 Views
  • 3 replies
  • 0 kudos

How to access the virtual environment directory where the Databricks notebooks are running?

How do we get access to a separate virtual environment space and its storage location on Databricks, so that we can move our created libraries into it without waiting for their installation each time the cluster is brought up? What we want basically is a ...
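One common workaround, sketched below under the assumption that a persistent DBFS path is acceptable as the storage location: install the libraries once to that path, then extend sys.path at cluster start instead of reinstalling.

import subprocess
import sys

PKG_DIR = "/dbfs/mnt/shared/python-packages"  # hypothetical persistent path

# One-time: install packages into the persistent directory
# ("somepkg" is a placeholder package name).
subprocess.check_call(
    [sys.executable, "-m", "pip", "install", "--target", PKG_DIR, "somepkg"]
)

# At every cluster start (init script or first notebook cell): make the
# persisted packages importable without reinstalling them.
sys.path.append(PKG_DIR)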

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hey there @Aman Gaurav, thank you for posting your question. Just wanted to check in: were you able to resolve your issue, or do you need more help? We'd love to hear from you. Thanks!

2 More Replies
alejandrofm
by Valued Contributor
  • 5484 Views
  • 4 replies
  • 4 kudos

Resolved! Are there any recommended Spark config settings for Delta/Databricks?

Hi! I'm starting to test configs on Databricks, for example, to avoid corrupting data if two processes try to write at the same time: .config('spark.databricks.delta.multiClusterWrites.enabled', 'false'). Or if I need more partitions than the default: .confi...
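For context, a sketch of how the configs mentioned in the question are set; the first value is the question's own, the partition count is an illustrative assumption, and neither is a recommendation:

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    # From the question: disable multi-cluster writes.
    .config("spark.databricks.delta.multiClusterWrites.enabled", "false")
    # Illustrative: raise shuffle partitions above the default of 200.
    .config("spark.sql.shuffle.partitions", "400")
    .getOrCreate()
)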

Latest Reply
Anonymous
Not applicable
  • 4 kudos

Hey there @Alejandro Martinez, hope everything is going well. Just wanted to see if you were able to find an answer to your question. If yes, would you be happy to let us know and mark it as best so that other members can find the solution more quickl...

3 More Replies
DejanSunderic
by New Contributor III
  • 15741 Views
  • 11 replies
  • 3 kudos

Is command stuck?

I created some ETL using DataFrames in Python. It used to run in ~180 sec, but it is now taking ~1200 sec. I have been changing it, so it could be something that I introduced, or something in the environment. Part of the process is appending results into...

Latest Reply
Carneiro
New Contributor II
  • 3 kudos

I am having a very similar problem. Since yesterday, without a known reason, some commands that used to run daily are now stuck in a "Running command" state, commands like: dataframe.show(n=1), dataframe.toPandas(), dataframe.description(), dataframe.wr...

10 More Replies
Thefan
by New Contributor II
  • 1415 Views
  • 0 replies
  • 1 kudos

Koalas dropna in DLT

Greetings! I've been trying out DLT for a few days, but I'm running into an unexpected issue when trying to use Koalas dropna in my pipeline. My goal is to drop all columns that contain only null/NA values before writing. Current code is this: @dlt...
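In case the Koalas call itself is the sticking point, a plain-PySpark sketch of the same goal (dropping columns whose values are all null); it scans the data once and could be called from inside the @dlt.table function:

from pyspark.sql import functions as F

def drop_all_null_columns(df):
    # Count non-null values per column in a single pass
    # (F.count over a column ignores nulls).
    counts = df.select([F.count(F.col(c)).alias(c) for c in df.columns]).first()
    # Keep only the columns with at least one non-null value.
    return df.select([c for c in df.columns if counts[c] > 0])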

shawncao
by New Contributor II
  • 4468 Views
  • 0 replies
  • 0 kudos

REST API to execute SQL query and read output

Hi there, I'm using these two APIs to execute SQL statements and read the output back when finished. However, it seems it always returns only 1000 rows, even though I need all the results (millions of rows). Is there a solution for this? Execute SQL: htt...
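A sketch of paging through a large result set with the SQL Statement Execution API; the endpoint paths and response field names are assumptions based on that API, and the host, token, warehouse ID, and query are placeholders:

import requests

HOST = "https://<workspace-url>"
HEADERS = {"Authorization": "Bearer <token>"}

# Submit the statement (assumes it finishes within the wait window;
# otherwise poll GET /api/2.0/sql/statements/{statement_id} until done).
resp = requests.post(
    f"{HOST}/api/2.0/sql/statements",
    headers=HEADERS,
    json={
        "statement": "SELECT * FROM big_table",  # hypothetical query
        "warehouse_id": "<warehouse-id>",
        "wait_timeout": "30s",
    },
).json()

# Fetch every result chunk instead of stopping at the first page.
statement_id = resp["statement_id"]
rows = []
for i in range(resp["manifest"]["total_chunk_count"]):
    chunk = requests.get(
        f"{HOST}/api/2.0/sql/statements/{statement_id}/result/chunks/{i}",
        headers=HEADERS,
    ).json()
    rows.extend(chunk.get("data_array", []))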

Jackie
by New Contributor II
  • 6875 Views
  • 3 replies
  • 6 kudos

Resolved! Speed up a for loop in Python (Azure Databricks)

Code example:
# a list of file paths
list_files_path = ["/dbfs/mnt/...", ..., "/dbfs/mnt/..."]
# copy all files above to this folder
dest_path = "/dbfs/mnt/..."
for file_path in list_files_path:
    # copy function
    copy_file(file_path, dest_path)
I am runni...
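One common speedup, assuming the copies are independent, I/O-bound, and copy_file is thread-safe, is to fan them out over a thread pool; a sketch reusing the question's names:

from concurrent.futures import ThreadPoolExecutor

# Run the independent copies concurrently instead of one by one.
# max_workers is an illustrative value to tune for your storage.
with ThreadPoolExecutor(max_workers=16) as pool:
    list(pool.map(lambda p: copy_file(p, dest_path), list_files_path))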

Latest Reply
Hemant
Valued Contributor II
  • 6 kudos

@Jackie Chan, what's the data size you want to copy? If it's large, then use ADF.

2 More Replies
818674
by New Contributor III
  • 11836 Views
  • 10 replies
  • 8 kudos

Resolved! How to perform a cross-check for data in multiple columns in the same table?

I am trying to check whether a certain datapoint exists in multiple locations. This is what my table looks like: I am checking whether the same datapoint is in two locations. The idea is that this datapoint should exist in BOTH locations, and be counte...

[Attachment: Table Examples of Results for Cross-Checking]
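A sketch of one way to express such a cross-check in PySpark; the datapoint/location column names and the two location values are hypothetical stand-ins for the table in the post:

from pyspark.sql import functions as F

# Keep only datapoints that appear in BOTH locations of interest.
both = (
    df.filter(F.col("location").isin("location_a", "location_b"))
      .groupBy("datapoint")
      .agg(F.countDistinct("location").alias("n_locations"))
      .filter(F.col("n_locations") == 2)
)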
Latest Reply
818674
New Contributor III
  • 8 kudos

Hi, thank you very much for following up. I no longer need assistance with this issue.

9 More Replies
