Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Mado
by Valued Contributor II
  • 2888 Views
  • 2 replies
  • 3 kudos

How to apply Pandas functions on PySpark DataFrame?

Hi, I want to apply Pandas functions (like isna, concat, append, etc.) to a PySpark DataFrame in such a way that computations are done on a multi-node cluster. I don't want to convert the PySpark DataFrame into a Pandas DataFrame since, I think, only one node is...

Latest Reply
Hubert-Dudek
Databricks MVP
  • 3 kudos

The best is to use the pandas API on Spark; it is virtually interchangeable, just a different API over Spark DataFrames:

import pyspark.pandas as ps

psdf = ps.range(10)
sdf = psdf.to_spark().filter("id > 5")
sdf.show()

1 More Replies
AJDJ
by New Contributor III
  • 16548 Views
  • 9 replies
  • 4 kudos

Delta Lake Demo - Not working

Hi there, I imported the Delta Lake demo notebook from the Databricks link, and at command 12 it errors out. I tried other ways and paths but couldn't get past the error. Maybe the notebook is outdated? https://www.databricks.com/notebooks/Demo_Hub-Delta_La...

Latest Reply
Anonymous
Not applicable
  • 4 kudos

Hi @AJ DJ​ Does @Hubert Dudek​'s response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly? We'd love to hear from you. Thanks!

8 More Replies
JoeS
by New Contributor III
  • 7311 Views
  • 1 replies
  • 1 kudos

When will Github Copilot be available in the Databricks IDE?

It's been quite difficult to stay in VSCode while developing data science experiments and tooling for Databricks. Our team would like to have GitHub Copilot for the Databricks IDE.

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Joe Shull​ Does @Kaniz Fatma​'s response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly? We'd love to hear from you. Thanks!

RJB
by New Contributor II
  • 15230 Views
  • 6 replies
  • 0 kudos

Resolved! How to pass outputs from a python task to a notebook task

I am trying to create a job which has 2 tasks as follows: a Python task which accepts a date and an integer from the user and outputs a list of dates (say, a list of 5 dates in string format), and a notebook which runs once for each of the dates from the d...

Latest Reply
BilalAslamDbrx
Databricks Employee
  • 0 kudos

Just a note that this feature, Task Values, has been generally available for a while.
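To make that concrete, here is a minimal sketch of the pattern: build the list of date strings in the Python task, then hand it downstream via Task Values. The task name and key below are hypothetical, and the dbutils calls only run on the Databricks runtime, so they are shown as comments.

```python
from datetime import date, timedelta

def make_date_list(start: str, n: int) -> list[str]:
    """Build n consecutive ISO date strings starting at `start`."""
    d0 = date.fromisoformat(start)
    return [(d0 + timedelta(days=i)).isoformat() for i in range(n)]

dates = make_date_list("2024-01-01", 5)
print(dates)  # ['2024-01-01', '2024-01-02', '2024-01-03', '2024-01-04', '2024-01-05']

# In the Python task (Databricks runtime only; "dates" is a hypothetical key):
# dbutils.jobs.taskValues.set(key="dates", value=dates)
# In the downstream notebook task ("python_task" is a hypothetical task name):
# dates = dbutils.jobs.taskValues.get(taskKey="python_task", key="dates", default=[])
```

The notebook task can then iterate over the retrieved list, or the job can fan out with a for-each construct over it.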

5 More Replies
hari
by Contributor
  • 26170 Views
  • 3 replies
  • 7 kudos

How to add the partition for an existing delta table

We didn't need to set partitions for our Delta tables as we didn't have many performance concerns, and Delta Lake's out-of-the-box optimization worked great for us. But there is now a need to set a specific partition column for some tables to allow conc...

Latest Reply
hari
Contributor
  • 7 kudos

Updated the description

2 More Replies
Anonymous
by Not applicable
  • 1185 Views
  • 0 replies
  • 1 kudos

Heads up! November Community Social!  On November 17th we are hosting another Community Social - we're doing these monthly ! We want to make sure ...

Heads up! November Community Social! On November 17th we are hosting another Community Social - we're doing these monthly! We want to make sure that we all have the chance to connect as a community often. Come network, talk data, and just get social...

Taha_Hussain
by Databricks Employee
  • 2011 Views
  • 0 replies
  • 8 kudos

Ask your technical questions at Databricks Office Hours. October 26 - 11:00 AM - 12:00 PM PT: Register Here. November 9 - 8:00 AM - 9:00 AM GMT: Register...

Ask your technical questions at Databricks Office Hours. October 26 - 11:00 AM - 12:00 PM PT: Register Here. November 9 - 8:00 AM - 9:00 AM GMT: Register Here (NEW EMEA Office Hours). Databricks Office Hours connects you directly with experts to answer all...

pen
by New Contributor II
  • 2845 Views
  • 2 replies
  • 2 kudos

Pyspark will error while I pack source zip package without dir.

PySpark errors if I send a package on spark.submit.pyFiles that I zip with this code:

import zipfile, os

def make_zip(source_dir, output_filename):
    with zipfile.ZipFile(output_filename, 'w') as zipf:
        pre_len = len(os.path...

Latest Reply
Hubert-Dudek
Databricks MVP
  • 2 kudos

I checked, and your code is OK. If you set source_dir and output_filename, please remember to start the path with /dbfs. If you work on the Community Edition, you can run into problems with access to the underlying filesystem.
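For reference, a complete version of such a helper might look like this - a sketch, not the original poster's exact code. It keeps the top-level package directory in the archive entry names, since PySpark imports from a --py-files zip by package path (e.g. "mypkg/mod.py", not bare "mod.py"):

```python
import os
import tempfile
import zipfile

def make_zip(source_dir, output_filename):
    """Zip source_dir so entries keep the top-level package directory.

    Arcnames are made relative to source_dir's parent, so modules land
    in the archive under the package name instead of at the zip root.
    """
    parent = os.path.dirname(os.path.abspath(source_dir))
    with zipfile.ZipFile(output_filename, "w", zipfile.ZIP_DEFLATED) as zipf:
        for root, _dirs, files in os.walk(source_dir):
            for name in files:
                full = os.path.join(root, name)
                zipf.write(full, os.path.relpath(full, parent))

# Quick demo with a throwaway package directory:
tmp = tempfile.mkdtemp()
pkg = os.path.join(tmp, "mypkg")
os.makedirs(pkg)
open(os.path.join(pkg, "__init__.py"), "w").close()
zip_path = os.path.join(tmp, "mypkg.zip")
make_zip(pkg, zip_path)
with zipfile.ZipFile(zip_path) as zf:
    names = zf.namelist()
print(names)  # ['mypkg/__init__.py']
```

On Databricks, remember the /dbfs prefix for source_dir and output_filename when working against DBFS paths.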

1 More Replies
mghildiy
by New Contributor
  • 1874 Views
  • 1 replies
  • 1 kudos

Checking spark performance locally

I am experimenting with Spark on my local machine. Is there some tool/API available to check the performance of the code I write? For example, I write:

val startTime = System.nanoTime()
invoicesDF
  .select(
    count("*").as("Total Number Of Inv...

Latest Reply
Hubert-Dudek
Databricks MVP
  • 1 kudos

Please check the details of your code (the tasks within jobs) in the Spark UI.
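For rough local measurements alongside the Spark UI, a small timer helper can work - a sketch only. Note that Spark evaluates lazily, so wrap an action (count, show, write), not just a chain of transformations, or you will mostly time plan construction.

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label):
    """Print the wall-clock time spent inside the with-block."""
    start = time.perf_counter()
    try:
        yield
    finally:
        print(f"{label}: {time.perf_counter() - start:.3f}s")

# Plain-Python demo; with Spark you would time an action instead,
# e.g. `with timed("agg"): invoicesDF.select(count("*")).show()`.
with timed("sum"):
    total = sum(range(1_000_000))
```

Wall-clock numbers on a single machine are only indicative; the Spark UI remains the place to see per-stage and per-task breakdowns.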

g96g
by New Contributor III
  • 6652 Views
  • 1 replies
  • 1 kudos

Resolved! how can I pass the df columns as a parameter

I'm doing self-study and want to pass a df column name as a parameter. I have defined the widget column_name = dbutils.widgets.get('column_name'), which is executing successfully (giving me a column name). Then I'm reading the df and do some transformation and ...

Latest Reply
Hubert-Dudek
Databricks MVP
  • 1 kudos

df2.select([column_name]).write
OR
df2.select(column_name).write

Mado
by Valued Contributor II
  • 31743 Views
  • 2 replies
  • 6 kudos

Resolved! Difference between "spark.table" & "spark.read.table"?

Hi, I want to make a PySpark DataFrame from a table. I would like to ask about the difference between the following commands: spark.read.table(TableName) and spark.table(TableName). Both return a PySpark DataFrame and look similar. Thanks.

Latest Reply
Mado
Valued Contributor II
  • 6 kudos

Hi @Kaniz Fatma​ I selected the answer from @Kedar Deshpande​ as the best answer.

1 More Replies
829023
by Databricks Partner
  • 3767 Views
  • 2 replies
  • 0 kudos

Faced error using Databricks SQL Connector

I installed databricks-sql-connector in PyCharm. Then I ran the query below based on the docs (https://docs.databricks.com/dev-tools/python-sql-connector.html):

from databricks import sql
import os
w...

Latest Reply
Hubert-Dudek
Databricks MVP
  • 0 kudos

It seems that one of your environment variables is incorrect. Please print them and compare them with the connection settings from the cluster or SQL warehouse endpoint.
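A quick way to run that check - a sketch that assumes the environment-variable names from the docs example (DATABRICKS_SERVER_HOSTNAME, DATABRICKS_HTTP_PATH, DATABRICKS_TOKEN); substitute whatever names your script actually reads:

```python
import os

def missing_vars(names):
    """Return the variable names that are unset or empty in the environment."""
    return [n for n in names if not os.environ.get(n)]

# Hypothetical names based on the docs example; match your own script.
required = ["DATABRICKS_SERVER_HOSTNAME", "DATABRICKS_HTTP_PATH", "DATABRICKS_TOKEN"]
print(missing_vars(required))
```

If the list comes back empty, also eyeball the values themselves against the Connection Details tab of the cluster or SQL warehouse - a stale token or a hostname with an https:// prefix fails in the same way.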

1 More Replies
ramankr48
by Databricks Partner
  • 50878 Views
  • 6 replies
  • 11 kudos

Resolved! how to find the size of a table in python or sql?

Suppose there is a database db containing many tables, and I want to get the size of those tables. How can I get it in either SQL, Python, or PySpark? Even if I have to get them one by one, that's fine.

Latest Reply
shan_chandra
Databricks Employee
  • 11 kudos

@Raman Gupta​ - could you please try the below:

%python
spark.sql("describe detail delta-table-name").select("sizeInBytes").collect()
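sizeInBytes comes back as a plain integer; a small formatting helper (my own sketch, not part of any Databricks API) makes the figure easier to read:

```python
def human_bytes(n):
    """Render a byte count in binary units (B, KiB, MiB, ...)."""
    size = float(n)
    for unit in ["B", "KiB", "MiB", "GiB", "TiB", "PiB"]:
        if size < 1024 or unit == "PiB":
            return f"{size:.1f} {unit}"
        size /= 1024

print(human_bytes(1536))            # 1.5 KiB
print(human_bytes(26_843_545_600))  # 25.0 GiB
```

To cover a whole database, loop over the names returned by SHOW TABLES and run the same describe-detail query per table.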

5 More Replies
User16835756816
by Databricks Employee
  • 9536 Views
  • 1 replies
  • 6 kudos

How can I simplify my data ingestion by processing the data as it arrives in cloud storage?

This post will help you simplify your data ingestion by utilizing Auto Loader, Delta Optimized Writes, Delta Write Jobs, and Delta Live Tables. Pre-req: you are using JSON data and Delta write commands. Step 1: Simplify ingestion with Auto Loader. Delt...

Latest Reply
youssefmrini
Databricks Employee
  • 6 kudos

This post will help you simplify your data ingestion by utilizing Auto Loader, Delta Optimized Writes, Delta Write Jobs, and Delta Live Tables. Pre-req: you are using JSON data and Delta write commands. Step 1: Simplify ingestion with Auto Loader. Delta...

ricperelli
by New Contributor II
  • 3287 Views
  • 0 replies
  • 1 kudos

How can I save a parquet file using pandas with a Data Factory orchestrated notebook?

Hi guys, this is my first question, feel free to correct me if I'm doing something wrong. Anyway, I'm facing a really strange problem: I have a notebook in which I'm performing some pandas analysis; after that, I save the resulting dataframe in a parque...
