cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

Forum Posts

SIRIGIRI
by Contributor
  • 858 Views
  • 2 replies
  • 2 kudos

sharikrishna26.medium.com

Spark Dataframe MetadataSpark Dataframe is structurally the same as the table. However, it does not store any schema information in the metadata store. Instead, we have a runtime metadata catalog to store the Dataframe schema information. It is simil...

  • 858 Views
  • 2 replies
  • 2 kudos
Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 2 kudos

this is awesome thanks

  • 2 kudos
1 More Replies
joakon
by New Contributor III
  • 3079 Views
  • 5 replies
  • 1 kudos

Resolved! slow running query

Hi All, I would you to get some ideas on how to improve performance on a data frame with around 10M rows. adls- gen2df1 =source1 , format , parquet ( 10 m)df2 =source2 , format , parquet ( 10 m)df = join df1 and df2 type =inner join df.count() is ...

  • 3079 Views
  • 5 replies
  • 1 kudos
Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 1 kudos

hey @raghu maremanda​ did you get any answer if yes ,please update here, by that other people can also get the solution

  • 1 kudos
4 More Replies
AK98
by New Contributor II
  • 4717 Views
  • 3 replies
  • 0 kudos

Py4JJavaError when trying to write dataframe to delta table

I'm trying to write a dataframe to a delta table and am getting this error.Im not sure what the issue is as I had no problem successfully writing other dataframes to delta tables. I attached a snippet of the data as well along with the schema:

image.png image image
  • 4717 Views
  • 3 replies
  • 0 kudos
Latest Reply
ramravi
Contributor II
  • 0 kudos

Does your delta tables contains all columns what your dataframe contains. If there is schema mismatch it might be a reason for failure.df.write.format("delta") \ .option("mergeSchema", "true") \ .mode("append") \ .sav...

  • 0 kudos
2 More Replies
ratnakarsinha
by New Contributor II
  • 20705 Views
  • 3 replies
  • 0 kudos

How to get full result using DataFrame.Display method

Hi, Dataframe.Display method in Databricks notebook fetches only 1000 rows by default. Is there a way to change this default to display and download full result (more than 1000 rows) in python? Thanks, Ratnakar.

  • 20705 Views
  • 3 replies
  • 0 kudos
Latest Reply
ramravi
Contributor II
  • 0 kudos

display method doesn't have the option to choose the number of rows. Use the show method. It is not neat and you can't do visualizations and downloads.

  • 0 kudos
2 More Replies
Mado
by Valued Contributor II
  • 8414 Views
  • 6 replies
  • 2 kudos

Resolved! How to see if condition is True / False for all rows in a DataFrame?

Assume that I have a Spark DataFrame, and I want to see if records satisfy a condition.Example dataset:# Prepare Data data = [('A', 1), \ ('A', 2), \ ('B', 3) ]   # Create DataFrame columns= ['col_1', 'col_2'] df = spark.createDataF...

image image
  • 8414 Views
  • 6 replies
  • 2 kudos
Latest Reply
Ajay-Pandey
Esteemed Contributor III
  • 2 kudos

Hi you can use display() or show() function that will provide you expected results.

  • 2 kudos
5 More Replies
joakon
by New Contributor III
  • 11434 Views
  • 6 replies
  • 6 kudos
  • 11434 Views
  • 6 replies
  • 6 kudos
Latest Reply
huyd
New Contributor III
  • 6 kudos

check your read cell, "Delimeter"

  • 6 kudos
5 More Replies
Sujitha
by Databricks Employee
  • 2126 Views
  • 6 replies
  • 5 kudos

KB Feedback Discussion  In addition to the Databricks Community, we have a Support team that maintains a Knowledge Base (KB). The KB contains answers ...

KB Feedback Discussion In addition to the Databricks Community, we have a Support team that maintains a Knowledge Base (KB). The KB contains answers to common questions about Databricks, as well as information on optimisation and troubleshooting.Thes...

  • 2126 Views
  • 6 replies
  • 5 kudos
Latest Reply
Ajay-Pandey
Esteemed Contributor III
  • 5 kudos

Thanks for sharing @Sujitha Ramamoorthy​ 

  • 5 kudos
5 More Replies
James_209101
by New Contributor II
  • 6451 Views
  • 2 replies
  • 5 kudos

Using large dataframe in-memory (data not allowed to be "at rest") results in driver crash and/or out of memory

I'm having trouble working on Databricks with data that we are not allowed to save off or persist in any way. The data comes from an API (which returns a JSON response). We have a scala package on our cluster that makes the queries (almost 6k queries...

  • 6451 Views
  • 2 replies
  • 5 kudos
Latest Reply
Anonymous
Not applicable
  • 5 kudos

Hi @James Held​ Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you.Thanks!

  • 5 kudos
1 More Replies
tassiodahora
by New Contributor III
  • 60596 Views
  • 2 replies
  • 7 kudos

Resolved! Failed to merge incompatible data types LongType and StringType

Guys, good morning!I am writing the results of a json in a delta table, only the json structure is not always the same, if the field does not list in the json it generates type incompatibility when I append(dfbrzagend.write .format("delta") .mode("ap...

  • 60596 Views
  • 2 replies
  • 7 kudos
Latest Reply
Anonymous
Not applicable
  • 7 kudos

Hi @Tássio Santos​ The delta table performs schema validation of every column, and the source dataframe column data types must match the column data types in the target table. If they don’t match, an exception is raised.For reference-https://docs.dat...

  • 7 kudos
1 More Replies
Manjusha
by New Contributor II
  • 2308 Views
  • 1 replies
  • 1 kudos

SocketTimeout exception when running a display command on spark dataframe

I am using runtime 9.1LTSI have a R notebook that reads a csv into a R dataframe and does some transformations and finally is converted to spark dataframe using the createDataFrame function.after that when I call the display function on this spark da...

  • 2308 Views
  • 1 replies
  • 1 kudos
Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Manjusha Unnikrishnan​ Great to meet you, and thanks for your question! Let's see if your peers in the community have an answer to your question first. Or else bricksters will get back to you soon. Thanks.

  • 1 kudos
rajat1
by New Contributor
  • 14606 Views
  • 2 replies
  • 1 kudos

How to convert dataframe (df), to a excel file that I can share with my colleagues ?

I am working on microsoft azure databrick, I have a final dataframe of shape (3276*23) , I want to share it in form of excel file? How can I do it ( I am using ->df.to_excel('fileOutput.xlsx', sheet_name = 'Sheet1', index = False) , command is runn...

  • 14606 Views
  • 2 replies
  • 1 kudos
Latest Reply
Anonymous
Not applicable
  • 1 kudos

You could try this way, convert Pyspark Dataframe to Pandas Dataframe then export to excel file.

  • 1 kudos
1 More Replies
wyzer
by Contributor II
  • 5466 Views
  • 2 replies
  • 12 kudos

Resolved! Add the creation date of a parquet file into a DataFrame

Currently I load multiple parquet file with this code:df = spark.read.parquet("/mnt/dev/bronze/Voucher/*/*")(Inside the Voucher folder, there is one folder by date. Each one containing one parquet file)How can I add a column into this DataFrame, that...

  • 5466 Views
  • 2 replies
  • 12 kudos
Latest Reply
wyzer
Contributor II
  • 12 kudos

Thanks @Michail Karamanos​ 

  • 12 kudos
1 More Replies
markdias
by New Contributor II
  • 1591 Views
  • 3 replies
  • 2 kudos

Which is quicker: grouping a table that is a join of several others or querying data?

This may be a tricky question, so please bear with meIn a real life scenario, i have a dataframe (i'm using pyspark) called age, with is a groupBy of other 4 dataframes. I join these 4 so at the end I have a few million rows, but after the groupBy th...

  • 1591 Views
  • 3 replies
  • 2 kudos
Latest Reply
NhatHoang
Valued Contributor II
  • 2 kudos

Hi @Marcos Dias​ ,Frankly, I think we need more detail to answer your question:Are these 4 dataframes​ updated their data?How often you use the groupBy-dataframe?

  • 2 kudos
2 More Replies
Labels