Data Engineering

Forum Posts

User16790091296
by Contributor II
  • 446 Views
  • 0 replies
  • 5 kudos

Some Tips & Tricks for Optimizing costs and performance (Clusters and Ganglia)

Some Tips & Tricks for Optimizing costs and performance (Clusters and Ganglia): [Note: This list is not exhaustive] Leverage the DataFrame or SparkSQL APIs first. They use the same execution process, resulting in parity in performance, but they also com...

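To illustrate the parity point, here is a minimal sketch comparing the two APIs. It assumes a Databricks notebook where `spark` is predefined, and the `events` table name is a hypothetical placeholder.

```python
# Minimal sketch: the same aggregation written with the DataFrame API and with
# Spark SQL. Both compile through the same Catalyst optimizer, so the physical
# plans (and therefore performance) should match. The `events` table is a
# hypothetical placeholder.
from pyspark.sql import functions as F

df_api = (
    spark.table("events")
         .groupBy("event_type")
         .agg(F.count("*").alias("n"))
)

sql_api = spark.sql(
    "SELECT event_type, COUNT(*) AS n FROM events GROUP BY event_type"
)

# Comparing the physical plans shows both APIs produce the same execution plan.
df_api.explain()
sql_api.explain()
```
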
Anonymous
by Not applicable
  • 1814 Views
  • 1 reply
  • 0 kudos

Resolved! Delta vs Parquet

When does it make sense to use Delta over Parquet? Are there any instances when you would rather use Parquet?

Latest Reply
Ryan_Chynoweth
Honored Contributor III
  • 0 kudos

Users should almost always choose Delta over Parquet. Keep in mind that Delta is a storage format that sits on top of Parquet, so the performance of writing to both formats is similar. However, reading data and transforming data with Delta is almost a...

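A minimal sketch of the point above, assuming a Databricks notebook where `spark` is predefined; the output paths are hypothetical. Only the format string changes between the two, since Delta stores its data as Parquet files plus a transaction log.

```python
# Sketch: writing and reading the same DataFrame as Parquet and as Delta.
# The /tmp paths are placeholders. Delta adds a transaction log on top of the
# Parquet data files, which enables faster reads, updates, and time travel.
df = spark.range(1000).withColumnRenamed("id", "value")

df.write.format("parquet").mode("overwrite").save("/tmp/example_parquet")
df.write.format("delta").mode("overwrite").save("/tmp/example_delta")

# Reading back: only the format string differs.
parquet_df = spark.read.format("parquet").load("/tmp/example_parquet")
delta_df = spark.read.format("delta").load("/tmp/example_delta")
```
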
Anonymous
by Not applicable
  • 6688 Views
  • 1 reply
  • 0 kudos
Latest Reply
Ryan_Chynoweth
Honored Contributor III
  • 0 kudos

An Action in Spark is any operation that does not return an RDD. Evaluation is executed when an action is taken. Actions trigger the scheduler, which builds a directed acyclic graph (DAG) as a plan of execution. The plan of execution is created by wor...

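A small sketch of the distinction between transformations and actions, assuming a notebook where `spark` is predefined:

```python
# Transformations (filter, selectExpr) are lazy: they only extend the plan.
# Actions (count, take) trigger the scheduler to build and execute the DAG.
df = spark.range(1_000_000)

evens = df.filter(df.id % 2 == 0)                # transformation: nothing runs yet
doubled = evens.selectExpr("id * 2 AS doubled")  # still lazy

row_count = doubled.count()   # action: the DAG is built and executed
sample = doubled.take(5)      # another action: triggers execution again
```
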
Anonymous
by Not applicable
  • 638 Views
  • 1 reply
  • 0 kudos

Resolved! Converting between Pandas and Koalas

When and why should I convert between a pandas DataFrame and a Koalas DataFrame? What are the implications?

Latest Reply
Ryan_Chynoweth
Honored Contributor III
  • 0 kudos

Koalas is distributed on a Databricks cluster, similar to how Spark DataFrames are distributed. Pandas DataFrames live only on the Spark driver, in memory. If you are a pandas user and are using a multi-node cluster, then you should use Koalas to p...

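A minimal sketch of converting in both directions, assuming the `databricks.koalas` package is available on the cluster (on newer runtimes the same API lives in `pyspark.pandas`):

```python
# pandas DataFrames live only on the driver; Koalas DataFrames are distributed.
import pandas as pd
import databricks.koalas as ks

pdf = pd.DataFrame({"id": range(5), "value": [10, 20, 30, 40, 50]})

kdf = ks.from_pandas(pdf)                # distribute the data across the cluster
kdf["value_doubled"] = kdf["value"] * 2  # pandas-style syntax, executed on Spark

pdf_back = kdf.to_pandas()               # collect back to the driver
# (watch driver memory when converting large distributed data back to pandas)
```
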
Anonymous
by Not applicable
  • 574 Views
  • 0 replies
  • 0 kudos

Append subset of columns to target Snowflake table

I’m using the databricks-snowflake connector to load data into a Snowflake table. Can someone point me to any example of how we can append only a subset of columns to a target Snowflake table (for example, some columns in the target Snowflake table ar...

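One hedged sketch of how this might look with the Spark Snowflake connector: select only the columns you want to append before the write, and let Snowflake fill the remaining columns with their defaults or NULLs. The connection options, table name, and column names below are placeholders, and `df` is assumed to be an existing Spark DataFrame.

```python
# Sketch (untested): appending a subset of DataFrame columns to an existing
# Snowflake table. All option values and names are hypothetical placeholders.
sf_options = {
    "sfURL": "<account>.snowflakecomputing.com",
    "sfUser": "<user>",
    "sfPassword": "<password>",
    "sfDatabase": "<database>",
    "sfSchema": "<schema>",
    "sfWarehouse": "<warehouse>",
}

subset_df = df.select("id", "event_time", "amount")  # only the columns to append

(subset_df.write
    .format("snowflake")
    .options(**sf_options)
    .option("dbtable", "TARGET_TABLE")
    .mode("append")
    .save())
```
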
Anonymous
by Not applicable
  • 535 Views
  • 0 replies
  • 0 kudos

Detailed logs for R process

We have a user notebook in R that reliably crashes the driver. Are detailed logs from the R process stored somewhere on drivers/workers?

User16790091296
by Contributor II
  • 1650 Views
  • 1 reply
  • 0 kudos

Resolved! How can I use a Python function defined in my git-repo module within the DB notebook?

I have a function within a module in my git-repo. I want to import that into my DB notebook. How can I do that?

Latest Reply
aladda
Honored Contributor II
  • 0 kudos

Databricks Repos allows you to sync your work in Databricks with a remote Git repository. This makes it easier to implement development best practices. Databricks supports integrations with GitHub, Bitbucket, and GitLab. Using Repos you can bring you...

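A minimal sketch of what the import can look like; the repo path, package, module, and function names are hypothetical. On recent runtimes the repo root is already on `sys.path` when the notebook lives in the repo, so the `sys.path.append` line may be unnecessary.

```python
# Sketch: importing a function from a module stored in a Databricks Repo.
# The path and names below are placeholders for illustration only.
import sys
sys.path.append("/Workspace/Repos/<user>/<repo>")  # may be unneeded on newer runtimes

from my_package.my_module import my_function  # hypothetical module and function

result = my_function()
```
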
Anonymous
by Not applicable
  • 1247 Views
  • 0 replies
  • 0 kudos

Seeing all columns

I have a dataframe with a lot of columns (20 or so) and 8 rows. Part of the output is being cut off, and I can scroll to the right to see the rest of the columns, but I was just wondering if it was possible to somehow "zoom out" of the table so I can se...

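One possible workaround, sketched under the assumption that the result set is small (here 8 rows) and that `df` is the Spark DataFrame in question: collect it to pandas and widen pandas' display settings so every column prints.

```python
# Sketch: pull a small result set to the driver as pandas and disable column
# truncation so all ~20 columns are shown. Avoid this for large DataFrames.
import pandas as pd

pd.set_option("display.max_columns", None)
pd.set_option("display.width", None)

df.limit(8).toPandas()
```
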
MallikSunkara
by New Contributor II
  • 7020 Views
  • 4 replies
  • 0 kudos

How to pass arguments and variables to a Databricks Python activity from Azure Data Factory

How to pass arguments and variables to a Databricks Python activity from Azure Data Factory

Latest Reply
CristianIspan
New Contributor II
  • 0 kudos

Try importing argv from sys. Then, if you have the parameter added correctly in Data Factory, you can read it in your Python script as argv[1] (index 0 is the file path).

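A small sketch of the suggestion above; the parameter handling is illustrative and assumes Data Factory passes the values as ordinary command-line arguments.

```python
# Sketch: reading parameters passed from an Azure Data Factory Databricks Python
# activity. argv[0] is the script path; argv[1] onward are the parameters in order.
import sys

if len(sys.argv) > 1:
    first_param = sys.argv[1]
    print(f"First parameter from Data Factory: {first_param}")
else:
    print("No parameters were passed to this script.")
```
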
3 More Replies