Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Mado
by Valued Contributor II
  • 2888 Views
  • 2 replies
  • 3 kudos

How to apply Pandas functions on PySpark DataFrame?

Hi, I want to apply Pandas functions (like isna, concat, append, etc.) to a PySpark DataFrame in such a way that computations are done on a multi-node cluster. I don't want to convert the PySpark DataFrame into a Pandas DataFrame since, I think, only one node is...

Latest Reply
Hubert-Dudek
Databricks MVP
  • 3 kudos

The best is to use the pandas API on Spark; it is virtually interchangeable, just a different API over Spark DataFrames:

import pyspark.pandas as ps

psdf = ps.range(10)
sdf = psdf.to_spark().filter("id > 5")
sdf.show()

1 More Replies
AJDJ
by New Contributor III
  • 16548 Views
  • 9 replies
  • 4 kudos

Delta Lake Demo - Not working

Hi there, I imported the Delta Lake demo notebook from the Databricks link, and at command 12 it errors out. I tried other ways and paths but couldn't get past the error. Maybe the notebook is outdated? https://www.databricks.com/notebooks/Demo_Hub-Delta_La...

Latest Reply
Anonymous
Not applicable
  • 4 kudos

Hi @AJ DJ​ Does @Hubert Dudek​'s response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly? We'd love to hear from you. Thanks!

8 More Replies
JoeS
by New Contributor III
  • 7311 Views
  • 1 replies
  • 1 kudos

When will Github Copilot be available in the Databricks IDE?

It's been quite difficult to stay in VSCode while developing data science experiments and tooling for Databricks. Our team would like to have GitHub Copilot for the Databricks IDE.

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Joe Shull​ Does @Kaniz Fatma​'s response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly? We'd love to hear from you. Thanks!

RJB
by New Contributor II
  • 15230 Views
  • 6 replies
  • 0 kudos

Resolved! How to pass outputs from a python task to a notebook task

I am trying to create a job which has 2 tasks as follows: a Python task which accepts a date and an integer from the user and outputs a list of dates (say, a list of 5 dates in string format), and a notebook which runs once for each of the dates from the d...

Latest Reply
BilalAslamDbrx
Databricks Employee
  • 0 kudos

Just a note that this feature, Task Values, has been generally available for a while.
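To make that concrete, here is a minimal sketch of the pattern: build the list of date strings in the Python task, then hand it downstream via Task Values. The task name and key below are hypothetical, and the dbutils calls only run on the Databricks runtime, so they are shown as comments.

```python
from datetime import date, timedelta

def make_date_list(start: str, n: int) -> list[str]:
    """Build n consecutive ISO date strings starting at `start`."""
    d0 = date.fromisoformat(start)
    return [(d0 + timedelta(days=i)).isoformat() for i in range(n)]

dates = make_date_list("2024-01-01", 5)
print(dates)  # ['2024-01-01', '2024-01-02', '2024-01-03', '2024-01-04', '2024-01-05']

# In the Python task (Databricks runtime only; "dates" is a hypothetical key):
# dbutils.jobs.taskValues.set(key="dates", value=dates)
# In the downstream notebook task ("python_task" is a hypothetical task name):
# dates = dbutils.jobs.taskValues.get(taskKey="python_task", key="dates", default=[])
```

The notebook task can then iterate over the retrieved list, or the job can fan out with a for-each construct over it.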

5 More Replies
hari
by Contributor
  • 26170 Views
  • 3 replies
  • 7 kudos

How to add the partition for an existing delta table

We didn't need to set partitions for our Delta tables as we didn't have many performance concerns, and Delta Lake's out-of-the-box optimization worked great for us. But there is now a need to set a specific partition column for some tables to allow conc...

Latest Reply
hari
Contributor
  • 7 kudos

Updated the description

2 More Replies
Anonymous
by Not applicable
  • 1185 Views
  • 0 replies
  • 1 kudos

Heads up! November Community Social!  On November 17th we are hosting another Community Social - we're doing these monthly ! We want to make sure ...

Heads up! November Community Social! On November 17th we are hosting another Community Social - we're doing these monthly! We want to make sure that we all have the chance to connect as a community often. Come network, talk data, and just get social...

Taha_Hussain
by Databricks Employee
  • 2011 Views
  • 0 replies
  • 8 kudos

Ask your technical questions at Databricks Office Hours. October 26 - 11:00 AM - 12:00 PM PT: Register Here. November 9 - 8:00 AM - 9:00 AM GMT: Register...

Ask your technical questions at Databricks Office Hours. October 26 - 11:00 AM - 12:00 PM PT: Register Here. November 9 - 8:00 AM - 9:00 AM GMT: Register Here (NEW EMEA Office Hours). Databricks Office Hours connects you directly with experts to answer all...

pen
by New Contributor II
  • 2845 Views
  • 2 replies
  • 2 kudos

Pyspark will error while I pack source zip package without dir.

PySpark errors if I send a package on spark.submit.pyFiles that I zip with this code:

import zipfile, os

def make_zip(source_dir, output_filename):
    with zipfile.ZipFile(output_filename, 'w') as zipf:
        pre_len = len(os.path...

Latest Reply
Hubert-Dudek
Databricks MVP
  • 2 kudos

I checked, and your code is OK. If you set source_dir and output_filename, please remember to start the path with /dbfs. If you work on the Community Edition, you can run into problems with access to the underlying filesystem.
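For reference, a complete version of such a helper might look like this - a sketch, not the original poster's exact code. It keeps the top-level package directory in the archive entry names, since PySpark imports from a --py-files zip by package path (e.g. "mypkg/mod.py", not bare "mod.py"):

```python
import os
import tempfile
import zipfile

def make_zip(source_dir, output_filename):
    """Zip source_dir so entries keep the top-level package directory.

    Arcnames are made relative to source_dir's parent, so modules land
    in the archive under the package name instead of at the zip root.
    """
    parent = os.path.dirname(os.path.abspath(source_dir))
    with zipfile.ZipFile(output_filename, "w", zipfile.ZIP_DEFLATED) as zipf:
        for root, _dirs, files in os.walk(source_dir):
            for name in files:
                full = os.path.join(root, name)
                zipf.write(full, os.path.relpath(full, parent))

# Quick demo with a throwaway package directory:
tmp = tempfile.mkdtemp()
pkg = os.path.join(tmp, "mypkg")
os.makedirs(pkg)
open(os.path.join(pkg, "__init__.py"), "w").close()
zip_path = os.path.join(tmp, "mypkg.zip")
make_zip(pkg, zip_path)
with zipfile.ZipFile(zip_path) as zf:
    names = zf.namelist()
print(names)  # ['mypkg/__init__.py']
```

On Databricks, remember the /dbfs prefix for source_dir and output_filename when working against DBFS paths.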

1 More Replies
mghildiy
by New Contributor
  • 1874 Views
  • 1 replies
  • 1 kudos

Checking spark performance locally

I am experimenting with Spark on my local machine. Is there some tool/API available to check the performance of the code I write? For example, I write:

val startTime = System.nanoTime()
invoicesDF
  .select(
    count("*").as("Total Number Of Inv...

Latest Reply
Hubert-Dudek
Databricks MVP
  • 1 kudos

Please check the details of your code (the tasks within jobs) in the Spark UI.
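For rough local measurements alongside the Spark UI, a small timer helper can work - a sketch only. Note that Spark evaluates lazily, so wrap an action (count, show, write), not just a chain of transformations, or you will mostly time plan construction.

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label):
    """Print the wall-clock time spent inside the with-block."""
    start = time.perf_counter()
    try:
        yield
    finally:
        print(f"{label}: {time.perf_counter() - start:.3f}s")

# Plain-Python demo; with Spark you would time an action instead,
# e.g. `with timed("agg"): invoicesDF.select(count("*")).show()`.
with timed("sum"):
    total = sum(range(1_000_000))
```

Wall-clock numbers on a single machine are only indicative; the Spark UI remains the place to see per-stage and per-task breakdowns.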

g96g
by New Contributor III
  • 6652 Views
  • 1 replies
  • 1 kudos

Resolved! how can I pass the df columns as a parameter

I'm doing self-study and want to pass a df column name as a parameter. I have defined the widget column_name = dbutils.widgets.get('column_name'), which is executing successfully (giving me a column name). Then I'm reading the df and do some transformation and ...

Latest Reply
Hubert-Dudek
Databricks MVP
  • 1 kudos

df2.select([column_name]).write
OR
df2.select(column_name).write

Mado
by Valued Contributor II
  • 31743 Views
  • 2 replies
  • 6 kudos

Resolved! Difference between "spark.table" & "spark.read.table"?

Hi, I want to make a PySpark DataFrame from a table. I would like to ask about the difference between the following commands: spark.read.table(TableName) and spark.table(TableName). Both return a PySpark DataFrame and look similar. Thanks.

Latest Reply
Mado
Valued Contributor II
  • 6 kudos

Hi @Kaniz Fatma​ I selected the answer from @Kedar Deshpande​ as the best answer.

1 More Replies
829023
by Databricks Partner
  • 3767 Views
  • 2 replies
  • 0 kudos

Faced error using Databricks SQL Connector

I installed databricks-sql-connector in PyCharm. Then I ran the query below based on the docs (https://docs.databricks.com/dev-tools/python-sql-connector.html):

from databricks import sql
import os
w...

Latest Reply
Hubert-Dudek
Databricks MVP
  • 0 kudos

It seems that one of your environment variables is incorrect. Please print them and compare them with the connection settings from the cluster or SQL warehouse endpoint.
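A quick way to run that check - a sketch that assumes the environment-variable names from the docs example (DATABRICKS_SERVER_HOSTNAME, DATABRICKS_HTTP_PATH, DATABRICKS_TOKEN); substitute whatever names your script actually reads:

```python
import os

def missing_vars(names):
    """Return the variable names that are unset or empty in the environment."""
    return [n for n in names if not os.environ.get(n)]

# Hypothetical names based on the docs example; match your own script.
required = ["DATABRICKS_SERVER_HOSTNAME", "DATABRICKS_HTTP_PATH", "DATABRICKS_TOKEN"]
print(missing_vars(required))
```

If the list comes back empty, also eyeball the values themselves against the Connection Details tab of the cluster or SQL warehouse - a stale token or a hostname with an https:// prefix fails in the same way.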

1 More Replies
ramankr48
by Databricks Partner
  • 50878 Views
  • 6 replies
  • 11 kudos

Resolved! how to find the size of a table in python or sql?

Suppose there is a database db containing many tables, and I want to get the size of those tables. How can I get it in either SQL, Python, or PySpark? Even if I have to get them one by one, that's fine.

Latest Reply
shan_chandra
Databricks Employee
  • 11 kudos

@Raman Gupta​ - could you please try the below:

%python
spark.sql("describe detail delta-table-name").select("sizeInBytes").collect()
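sizeInBytes comes back as a plain integer; a small formatting helper (my own sketch, not part of any Databricks API) makes the figure easier to read:

```python
def human_bytes(n):
    """Render a byte count in binary units (B, KiB, MiB, ...)."""
    size = float(n)
    for unit in ["B", "KiB", "MiB", "GiB", "TiB", "PiB"]:
        if size < 1024 or unit == "PiB":
            return f"{size:.1f} {unit}"
        size /= 1024

print(human_bytes(1536))            # 1.5 KiB
print(human_bytes(26_843_545_600))  # 25.0 GiB
```

To cover a whole database, loop over the names returned by SHOW TABLES and run the same describe-detail query per table.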

5 More Replies
User16835756816
by Databricks Employee
  • 9536 Views
  • 1 replies
  • 6 kudos

How can I simplify my data ingestion by processing the data as it arrives in cloud storage?

This post will help you simplify your data ingestion by utilizing Auto Loader, Delta Optimized Writes, Delta Write Jobs, and Delta Live Tables. Pre-req: you are using JSON data and Delta write commands. Step 1: Simplify ingestion with Auto Loader. Delt...

Latest Reply
youssefmrini
Databricks Employee
  • 6 kudos

This post will help you simplify your data ingestion by utilizing Auto Loader, Delta Optimized Writes, Delta Write Jobs, and Delta Live Tables. Pre-req: you are using JSON data and Delta write commands. Step 1: Simplify ingestion with Auto Loader. Delta...

ricperelli
by New Contributor II
  • 3287 Views
  • 0 replies
  • 1 kudos

How can I save a parquet file using pandas with a Data Factory orchestrated notebook?

Hi guys, this is my first question, feel free to correct me if I'm doing something wrong. Anyway, I'm facing a really strange problem: I have a notebook in which I'm performing some pandas analysis; after that, I save the resulting dataframe in a parque...
