Data Engineering

Forum Posts

User16790091296
by Contributor II
  • 446 Views
  • 0 replies
  • 5 kudos

Some Tips & Tricks for Optimizing costs and performance (Clusters and Ganglia)

Some Tips & Tricks for Optimizing costs and performance (Clusters and Ganglia): [Note: This list is not exhaustive] Leverage the DataFrame or SparkSQL APIs first. They use the same execution process, resulting in parity in performance, but they also com...

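To illustrate the parity point, here is a minimal sketch comparing the two APIs. It assumes a Databricks notebook where `spark` is predefined, and the `events` table name is a hypothetical placeholder.

```python
# Minimal sketch: the same aggregation written with the DataFrame API and with
# Spark SQL. Both compile through the same Catalyst optimizer, so the physical
# plans (and therefore performance) should match. The `events` table is a
# hypothetical placeholder.
from pyspark.sql import functions as F

df_api = (
    spark.table("events")
         .groupBy("event_type")
         .agg(F.count("*").alias("n"))
)

sql_api = spark.sql(
    "SELECT event_type, COUNT(*) AS n FROM events GROUP BY event_type"
)

# Comparing the physical plans shows both APIs produce the same execution plan.
df_api.explain()
sql_api.explain()
```
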
Anonymous
by Not applicable
  • 1814 Views
  • 1 reply
  • 0 kudos

Resolved! Delta vs Parquet

When does it make sense to use Delta over Parquet? Are there any instances when you would rather use Parquet?

Latest Reply
Ryan_Chynoweth
Honored Contributor III
  • 0 kudos

Users should almost always choose Delta over Parquet. Keep in mind that Delta is a storage format that sits on top of Parquet, so the performance of writing to both formats is similar. However, reading data and transforming data with Delta is almost a...

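A minimal sketch of the point above, assuming a Databricks notebook where `spark` is predefined; the output paths are hypothetical. Only the format string changes between the two, since Delta stores its data as Parquet files plus a transaction log.

```python
# Sketch: writing and reading the same DataFrame as Parquet and as Delta.
# The /tmp paths are placeholders. Delta adds a transaction log on top of the
# Parquet data files, which enables faster reads, updates, and time travel.
df = spark.range(1000).withColumnRenamed("id", "value")

df.write.format("parquet").mode("overwrite").save("/tmp/example_parquet")
df.write.format("delta").mode("overwrite").save("/tmp/example_delta")

# Reading back: only the format string differs.
parquet_df = spark.read.format("parquet").load("/tmp/example_parquet")
delta_df = spark.read.format("delta").load("/tmp/example_delta")
```
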
Anonymous
by Not applicable
  • 6688 Views
  • 1 reply
  • 0 kudos
Latest Reply
Ryan_Chynoweth
Honored Contributor III
  • 0 kudos

An Action in Spark is any operation that does not return an RDD. Evaluation is executed when an action is taken. Actions trigger the scheduler, which builds a directed acyclic graph (DAG) as a plan of execution. The plan of execution is created by wor...

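A small sketch of the distinction between transformations and actions, assuming a notebook where `spark` is predefined:

```python
# Transformations (filter, selectExpr) are lazy: they only extend the plan.
# Actions (count, take) trigger the scheduler to build and execute the DAG.
df = spark.range(1_000_000)

evens = df.filter(df.id % 2 == 0)                # transformation: nothing runs yet
doubled = evens.selectExpr("id * 2 AS doubled")  # still lazy

row_count = doubled.count()   # action: the DAG is built and executed
sample = doubled.take(5)      # another action: triggers execution again
```
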
Anonymous
by Not applicable
  • 638 Views
  • 1 reply
  • 0 kudos

Resolved! Converting between Pandas and Koalas

When and why should I convert between a pandas DataFrame and a Koalas DataFrame? What are the implications?

Latest Reply
Ryan_Chynoweth
Honored Contributor III
  • 0 kudos

Koalas is distributed on a Databricks cluster, similar to how Spark DataFrames are distributed. Pandas DataFrames live only on the Spark driver, in memory. If you are a pandas user and are using a multi-node cluster, then you should use Koalas to p...

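A minimal sketch of converting in both directions, assuming the `databricks.koalas` package is available on the cluster (on newer runtimes the same API lives in `pyspark.pandas`):

```python
# pandas DataFrames live only on the driver; Koalas DataFrames are distributed.
import pandas as pd
import databricks.koalas as ks

pdf = pd.DataFrame({"id": range(5), "value": [10, 20, 30, 40, 50]})

kdf = ks.from_pandas(pdf)                # distribute the data across the cluster
kdf["value_doubled"] = kdf["value"] * 2  # pandas-style syntax, executed on Spark

pdf_back = kdf.to_pandas()               # collect back to the driver
# (watch driver memory when converting large distributed data back to pandas)
```
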
Anonymous
by Not applicable
  • 574 Views
  • 0 replies
  • 0 kudos

Append subset of columns to target Snowflake table

I’m using the databricks-snowflake connector to load data into a Snowflake table. Can someone point me to any example of how we can append only a subset of columns to a target Snowflake table (for example, some columns in the target Snowflake table ar...

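One hedged sketch of how this might look with the Spark Snowflake connector: select only the columns you want to append before the write, and let Snowflake fill the remaining columns with their defaults or NULLs. The connection options, table name, and column names below are placeholders, and `df` is assumed to be an existing Spark DataFrame.

```python
# Sketch (untested): appending a subset of DataFrame columns to an existing
# Snowflake table. All option values and names are hypothetical placeholders.
sf_options = {
    "sfURL": "<account>.snowflakecomputing.com",
    "sfUser": "<user>",
    "sfPassword": "<password>",
    "sfDatabase": "<database>",
    "sfSchema": "<schema>",
    "sfWarehouse": "<warehouse>",
}

subset_df = df.select("id", "event_time", "amount")  # only the columns to append

(subset_df.write
    .format("snowflake")
    .options(**sf_options)
    .option("dbtable", "TARGET_TABLE")
    .mode("append")
    .save())
```
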
Anonymous
by Not applicable
  • 535 Views
  • 0 replies
  • 0 kudos

Detailed logs for R process

We have a user notebook in R that reliably crashes the driver. Are detailed logs from the R process stored somewhere on drivers/workers?

User16790091296
by Contributor II
  • 1650 Views
  • 1 reply
  • 0 kudos

Resolved! How can I use a Python function defined in my git-repo module within the DB notebook?

I have a function within a module in my git-repo. I want to import that into my DB notebook. How can I do that?

Latest Reply
aladda
Honored Contributor II
  • 0 kudos

Databricks Repos allows you to sync your work in Databricks with a remote Git repository. This makes it easier to implement development best practices. Databricks supports integrations with GitHub, Bitbucket, and GitLab. Using Repos you can bring you...

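A minimal sketch of what the import can look like; the repo path, package, module, and function names are hypothetical. On recent runtimes the repo root is already on `sys.path` when the notebook lives in the repo, so the `sys.path.append` line may be unnecessary.

```python
# Sketch: importing a function from a module stored in a Databricks Repo.
# The path and names below are placeholders for illustration only.
import sys
sys.path.append("/Workspace/Repos/<user>/<repo>")  # may be unneeded on newer runtimes

from my_package.my_module import my_function  # hypothetical module and function

result = my_function()
```
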
Anonymous
by Not applicable
  • 1247 Views
  • 0 replies
  • 0 kudos

Seeing all columns

I have a dataframe with a lot of columns (20 or so) and 8 rows. Part of the output is being cut off, and I can scroll to the right to see the rest of the columns, but I was just wondering if it was possible to somehow "zoom out" of the table so I can se...

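One possible workaround, sketched under the assumption that the result set is small (here 8 rows) and that `df` is the Spark DataFrame in question: collect it to pandas and widen pandas' display settings so every column prints.

```python
# Sketch: pull a small result set to the driver as pandas and disable column
# truncation so all ~20 columns are shown. Avoid this for large DataFrames.
import pandas as pd

pd.set_option("display.max_columns", None)
pd.set_option("display.width", None)

df.limit(8).toPandas()
```
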
MallikSunkara
by New Contributor II
  • 7020 Views
  • 4 replies
  • 0 kudos

How to pass arguments and variables to a Databricks Python activity from Azure Data Factory

How to pass arguments and variables to a Databricks Python activity from Azure Data Factory

Latest Reply
CristianIspan
New Contributor II
  • 0 kudos

Try importing argv from sys. Then, if you have the parameter added correctly in Data Factory, you can read it in your Python script as argv[1] (index 0 is the file path).

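A small sketch of the suggestion above; the parameter handling is illustrative and assumes Data Factory passes the values as ordinary command-line arguments.

```python
# Sketch: reading parameters passed from an Azure Data Factory Databricks Python
# activity. argv[0] is the script path; argv[1] onward are the parameters in order.
import sys

if len(sys.argv) > 1:
    first_param = sys.argv[1]
    print(f"First parameter from Data Factory: {first_param}")
else:
    print("No parameters were passed to this script.")
```
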
3 More Replies