Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

databicky
by Contributor II
  • 16430 Views
  • 13 replies
  • 4 kudos
Latest Reply
FerArribas
Contributor
  • 4 kudos

Hi @Hubert Dudek, the pandas API doesn't support the abfss protocol. You have three options: If you need to use pandas, you can write the Excel file to the local file system (dbfs) and then move it to ABFSS (for example, with dbutils). Write as CSV directly in abfss...
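A minimal sketch of the first option (write locally, then copy) in plain Python. The helper `write_via_local_staging` and both callbacks are hypothetical; on Databricks the copy callback would presumably wrap `dbutils.fs.cp` (or `dbutils.fs.mv`, to "move" as the reply says) with a `file:` source path, and the abfss URI in the docstring is only a placeholder.

```python
import os
import shutil
import tempfile
from typing import Callable


def write_via_local_staging(
    file_name: str,
    write_local: Callable[[str], None],
    copy_to_dest: Callable[[str], None],
) -> None:
    """Write a file to a local staging directory, then copy it elsewhere.

    Hypothetical helper. Example callbacks (assumptions, not tested here):
      write_local  = lambda p: pdf.to_excel(p, index=False)   # needs openpyxl
      copy_to_dest = lambda p: dbutils.fs.cp(
          "file:" + p,
          "abfss://<container>@<account>.dfs.core.windows.net/out.xlsx")
    """
    staging_dir = tempfile.mkdtemp()  # on the driver's local disk
    local_path = os.path.join(staging_dir, file_name)
    try:
        write_local(local_path)   # e.g. pandas writes an ordinary local file
        copy_to_dest(local_path)  # e.g. dbutils copies it to ABFSS
    finally:
        # Clean up the staging copy whether or not the copy succeeded.
        shutil.rmtree(staging_dir, ignore_errors=True)
```

Keeping the writer and the copier as callbacks means the same staging logic works for Excel, CSV, or any other format pandas can only write to a local path.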

amitdatabricksc
by New Contributor II
  • 9843 Views
  • 4 replies
  • 2 kudos

How to zip a dataframe

How do I zip a dataframe so that I get a zipped CSV output file? Please share the command. Only one dataframe is involved, not multiple.

Latest Reply
-werners-
Esteemed Contributor III
  • 2 kudos

Writing to a local directory does not work. See this topic: https://community.databricks.com/s/feed/0D53f00001M7hNlCAJ
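Assuming the dataframe is small enough to collect to the driver with `toPandas()`, pandas can write a single CSV inside a .zip archive through the `compression` argument of `to_csv`. A sketch (the helper name is hypothetical, and which paths the driver can write to depends on the cluster, per the reply above):

```python
import pandas as pd


def write_zipped_csv(pdf: pd.DataFrame, zip_path: str, csv_name: str = "data.csv") -> None:
    """Write one DataFrame as a single CSV file inside a .zip archive."""
    pdf.to_csv(
        zip_path,
        index=False,
        # "archive_name" sets the name of the CSV entry inside the zip.
        compression={"method": "zip", "archive_name": csv_name},
    )
```

With a Spark dataframe this becomes `write_zipped_csv(spark_df.toPandas(), zip_path)`. For data too large to collect, Spark's own CSV writer can compress its output with `.option("compression", "gzip")`, though it produces gzip files rather than a zip archive.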

Rani
by New Contributor
  • 8770 Views
  • 2 replies
  • 0 kudos

Divide a dataframe into multiple smaller dataframes based on values in multiple columns in Scala

I have to divide a dataframe into multiple smaller dataframes based on values in columns like gender and state; the end goal is to pick random samples from each dataframe. I am trying to implement a sample as explained below, I am quite new to th...

Latest Reply
subham0611
New Contributor II
  • 0 kudos

@raela I also have a similar use case. I am writing data to different Databricks tables based on a column value. But I am getting an insufficient disk space error and the driver is getting killed. I suspect the df.select(colName).distinct().collect() step is taki...

alexkit
by New Contributor II
  • 2356 Views
  • 4 replies
  • 3 kudos

ASP 1.2: Error creating database in Spark Programming with Databricks training

I'm on the Demo and Lab in the DataFrames section. I've imported the DBC into my company cluster and have run "%run ./Includes/Classroom-Setup" successfully. When I run the first SQL command, %sql CREATE TABLE IF NOT EXISTS events USING parquet OPTIONS (path "/m...

Latest Reply
KDOCKX
New Contributor II
  • 3 kudos

I had the same issue and solved it like this: In the Includes folder there is a reset notebook; run its first command, which unmounts all mounted databases. Go back to the ASP 1.2 notebook and run the %run ./Includes/Classroom-Setup code block. Then run ...

Ram443
by New Contributor III
  • 33032 Views
  • 9 replies
  • 5 kudos

Resolved! I created a data frame but was not able to see the data

Code to create a data frame:
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("oracle_queries").master("local[4]") \
  .config("spark.sql.warehouse.dir", "C:\\softwares\\git\\pyspark\\hive").getOrCreate()
from pyspark.sql.functions ...

Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 5 kudos

@ramanjaneyulu kancharla, can you please select my answer as the best answer?

pcriado
by New Contributor III
  • 6160 Views
  • 2 replies
  • 1 kudos

Resolved! Requested array size exceeds VM limit when saving to feature table

Hi, I'm trying to process a small dataset (less than 300 MB) composed of five queries that run with Spark. The end result of those queries is parsed using Python and merged into a data frame. Then I try to write this to a Delta Lake table using featu...

Latest Reply
pcriado
New Contributor III
  • 1 kudos

Hello, we have recently found that it's my user in particular that causes the memory issue. Two other users in my organization can run the same notebook without problems, but my user consistently consumes all available RAM and crashes the cluster... a...

etsyal1e2r3
by Honored Contributor
  • 10002 Views
  • 1 replies
  • 2 kudos

Resolved! Compiling Flattened Dataframe back to Struct Columns

I have a dataframe with columns in this format: [`first.second.third`, `alpha.bravo.test1`, `alpha.bravo.test2`]. I'd like to get an output dataframe like this: [ `first` | `alpha` ] ---------------...

Latest Reply
etsyal1e2r3
Honored Contributor
  • 2 kudos

I have figured out the solution.

konda1
by New Contributor
  • 992 Views
  • 0 replies
  • 0 kudos

Getting "executor lost" stage failure error when writing a data frame to a Delta table or to any file format such as Parquet, CSV, or Avro

We are working on multiline nested (multilevel) files. The file is read and flattened using PySpark, and the data frame shows data with the display() method. When saving the same dataframe, it gives an executor lost failure error. For some files it is givi...

Neil
by New Contributor
  • 5585 Views
  • 1 replies
  • 0 kudos

Saving a Spark dataframe to a Delta table is taking too long

While working on a video analytics task, I need to save image bytes, previously extracted into a Spark dataframe, to a Delta table. I want to overwrite the same Delta table over the course of the complete task, and the size of the input data differs...

Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

Can you check the Spark UI to see where the time is spent? It can be a join, UDF, ...

kll
by New Contributor III
  • 829 Views
  • 0 replies
  • 0 kudos

Spark DataFrame apply Databricks geospatial indexing functions

I have a Spark DataFrame with `h3` hex IDs and I am trying to obtain the polygon geometries.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, expr
from pyspark.databricks.sql.functions import *
from mosaic import enable_m...

Vishal09k
by New Contributor II
  • 2490 Views
  • 1 replies
  • 3 kudos

Display command not showing the result, giving the dataframe schema instead

Display command not showing the result, giving the dataframe schema instead

Latest Reply
Rishabh-Pandey
Esteemed Contributor
  • 3 kudos

Hey, can you try your SQL query with this method: select * from (your sql query)

arw1070
by New Contributor II
  • 2117 Views
  • 2 replies
  • 0 kudos

Downstream delta live table is unable to read data frame from upstream table

I have been working on implementing Delta Live Tables in a pre-existing workflow. I'm currently trying to create two tables, appointments_raw and notes_raw, where notes_raw is "downstream" of appointments_raw. Following this as a reference, I'm at...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Anna Wuest: Could you please send me the code snippet here? Thanks.

afzi
by New Contributor II
  • 2327 Views
  • 1 replies
  • 1 kudos

Pandas DataFrame error when using to_csv

Hi everyone, I would like to write a pandas DataFrame to /dbfs/FileStore/ using the to_csv method. Usually it would just write the DataFrame to the path described, but it has been giving me "FileNotFoundError: [Errno 2] No such file or directory: '/dbfs/FileStor...

Latest Reply
Avinash_94
New Contributor III
  • 1 kudos

f = open("/dbfs/mnt/blob/myNames.txt", "r")
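The thread doesn't confirm the cause of the FileNotFoundError. Two common ones worth ruling out (assumptions): the parent directory under /dbfs/FileStore/ doesn't exist yet, since pandas does not create directories, or the /dbfs FUSE mount isn't available on the cluster at all. A small sketch that checks both before calling `to_csv` (the helper name is hypothetical):

```python
import os

import pandas as pd


def to_csv_checked(pdf: pd.DataFrame, path: str) -> None:
    """Write `pdf` to `path`, creating missing folders or failing with a clear message."""
    # Assumption: if the path targets /dbfs but that mount point is missing,
    # the local file API to DBFS isn't available on this cluster.
    if path.startswith("/dbfs/") and not os.path.isdir("/dbfs"):
        raise RuntimeError(
            "/dbfs is not mounted here; write to a local path such as /tmp "
            "and copy it with dbutils.fs.cp instead"
        )
    parent = os.path.dirname(path)
    if parent:
        # pandas does not create missing parent directories itself.
        os.makedirs(parent, exist_ok=True)
    pdf.to_csv(path, index=False)
```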

elgeo
by Valued Contributor II
  • 4391 Views
  • 2 replies
  • 0 kudos

Transform SQL Cursor using PySpark in Databricks

We have a cursor in DB2 which, in each loop, reads data from two tables. At the end of each loop, after inserting the data into a target table, we update the records related to that loop in these two tables before moving to the next loop. An indicative example i...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @ELENI GEORGOUSI, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answe...

hv
by New Contributor
  • 4186 Views
  • 1 replies
  • 0 kudos

Error: "'Column' object is not callable"

I am trying to lowercase one of the columns (A_description) of a dataframe (df) and getting the error "'Column' object is not callable". Code:
def new_desc():
  for line in df:
    line = df['A_description'].map(str.lower)
  return line
new_desc()
Have used...

Latest Reply
Chaitanya_Raju
Honored Contributor
  • 0 kudos

Hi @Himadri Verma, I hope the suggestion below helps you with PySpark. Please let me know if you are looking for something else. Happy Learning!!
