Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

RaghuMundru
by New Contributor III
  • 29569 Views
  • 15 replies
  • 0 kudos

Resolved! I am running a simple count and I am getting an error

Here is the error that I am getting when I run the following query: statement = sqlContext.sql("SELECT count(*) FROM ARDATA_2015_09_01").show() Py4JJavaError Traceback (most rec...

14 More Replies
Anbazhagananbut
by New Contributor II
  • 6296 Views
  • 1 replies
  • 0 kudos

Get Size of a column in Bytes for a Pyspark Data frame

Hello All, I have a column in a dataframe which is struct type. I want to find the size of the column in bytes. It is failing while loading into Snowflake. I could see size functions available to get the length. How to calculate the size in bytes fo...

Latest Reply
sean_owen
Honored Contributor II
  • 0 kudos

There isn't one size for a column; it takes some amount of bytes in memory, but a different amount potentially when serialized on disk or stored in Parquet. You can work out the size in memory from its data type; an array of 100 bytes takes 100 byte...

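As the reply notes, there is no single "size in bytes" for a column; it depends on the representation. A minimal, PySpark-free sketch of one measure (the UTF-8 JSON footprint, often the closest proxy for what a loader such as Snowflake ingests from a staged file) follows; the sample struct values are hypothetical, since the thread does not show the data:

```python
import json

def estimate_column_bytes(values):
    """Rough serialized size of a column: sum of the UTF-8 JSON
    encoding of each value. In-memory and Parquet sizes will differ,
    as the reply above explains."""
    return sum(len(json.dumps(v).encode("utf-8")) for v in values)

# Hypothetical values from a struct-type column:
sample = [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]
print(estimate_column_bytes(sample))
```

In Spark itself, collecting (or sampling) the column and applying a function like this gives an estimate without claiming to be "the" size.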
ubsingh
by New Contributor II
  • 10002 Views
  • 3 replies
  • 1 kudos
Latest Reply
ubsingh
New Contributor II
  • 1 kudos

Thanks for your help @leedabee. I will go through the second option; the first one is not applicable in my case.

2 More Replies
Anbazhagananbut
by New Contributor II
  • 8581 Views
  • 1 replies
  • 1 kudos

How to handle Blank values in Array of struct elements in pyspark

Hello All, We have data in a column in a pyspark dataframe having array of struct type with multiple nested fields present. If the value is not blank it will save the data in the same array of struct type in the spark delta table. Please advise on the bel...

Latest Reply
shyam_9
Valued Contributor
  • 1 kudos

Hi @Anbazhagan anbutech17, can you please try as in the answers below: https://stackoverflow.com/questions/56942683/how-to-add-null-columns-to-complex-array-struct-in-spark-with-a-udf

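The linked Stack Overflow answer handles this with a UDF over the nested structure. The core idea can be sketched in plain Python (the field names here are hypothetical): normalize blank strings to None throughout the array-of-struct value before writing it back:

```python
def blanks_to_none(value):
    """Recursively replace blank strings with None in nested
    list/dict structures (standing in for array-of-struct data)."""
    if isinstance(value, dict):
        return {k: blanks_to_none(v) for k, v in value.items()}
    if isinstance(value, list):
        return [blanks_to_none(v) for v in value]
    return None if value == "" else value

# Hypothetical array-of-struct cell:
row = [{"city": "", "zip": "94105"}, {"city": "SF", "zip": ""}]
print(blanks_to_none(row))
```

Wrapped in a UDF with the matching struct schema, the same logic applies per row in Spark.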
Juan_MiguelTrin
by New Contributor
  • 6520 Views
  • 1 replies
  • 0 kudos

How to resolve out of memory error?

I have a Databricks notebook hosted on Azure. I am having this problem when doing an INNER JOIN. I tried creating a much higher cluster configuration but it is still producing an OutOfMemoryError. org.apache.spark.memory.SparkOutOfMemoryError: Unable to acquir...

Latest Reply
shyam_9
Valued Contributor
  • 0 kudos

Hi @Juan Miguel Trinidad, can you please try the below suggestions: http://apache-spark-developers-list.1001551.n3.nabble.com/java-lang-OutOfMemoryError-Unable-to-acquire-bytes-of-memory-td16773.html

SohelKhan
by New Contributor II
  • 9992 Views
  • 3 replies
  • 0 kudos

PySpark DataFrame: Select all but one or a set of columns

In SQL SELECT, in some implementations, we can write select -col_A to select all columns except col_A. I tried it in Spark 1.6.0 as follows: for a dataframe df with three columns col_A, col_B, col_C: df.select('col_B', 'col_C') # it works df....

Latest Reply
NavitaJain
New Contributor II
  • 0 kudos

@sk777, @zjffdu, @Lejla Metohajrova if your columns are time-series ordered OR you want to maintain their original order... use cols = [c for c in df.columns if c != 'col_A'] df[cols]

2 More Replies
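The list-comprehension answer above preserves the original column order and extends naturally to excluding a set of columns. A runnable sketch, with a plain list standing in for df.columns:

```python
def all_but(cols, *excluded):
    """Return column names in their original order, minus the excluded ones."""
    return [c for c in cols if c not in excluded]

columns = ["col_A", "col_B", "col_C"]  # stand-in for df.columns
print(all_but(columns, "col_A"))
```

In Spark you would then select with the filtered list, e.g. df.select(*all_but(df.columns, 'col_A')).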
AmitSukralia
by New Contributor
  • 22385 Views
  • 5 replies
  • 0 kudos

Listing all files under an Azure Data Lake Gen2 container

I am trying to find a way to list all files in an Azure Data Lake Gen2 container. I have mounted the storage account and can see the list of files in a folder (a container can have multiple levels of folder hierarchy) if I know the exact path of th...

Latest Reply
Balaji_su
New Contributor II
  • 0 kudos

Attachments: stackoverflow.png, files.txt

4 More Replies
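A common approach here is to recurse over directory listings. The sketch below keeps the lister pluggable so it runs anywhere; on Databricks one would typically pass a wrapper around dbutils.fs.ls (an assumption about the deployment, and the toy paths are illustrative):

```python
def deep_ls(path, list_dir):
    """Recursively yield file paths under `path`. `list_dir(path)`
    must return (name, is_dir) pairs; on Databricks it could wrap
    dbutils.fs.ls, but here it is a plain callable."""
    for name, is_dir in list_dir(path):
        full = path.rstrip("/") + "/" + name
        if is_dir:
            yield from deep_ls(full, list_dir)
        else:
            yield full

# Toy in-memory "container" to demonstrate the traversal:
tree = {
    "/mnt/lake": [("raw", True), ("readme.txt", False)],
    "/mnt/lake/raw": [("a.csv", False), ("b.csv", False)],
}
print(list(deep_ls("/mnt/lake", lambda p: tree.get(p, []))))
```

The same recursion walks arbitrarily deep folder hierarchies without needing to know paths in advance.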
cfregly
by Contributor
  • 4124 Views
  • 5 replies
  • 0 kudos
Latest Reply
srisre111
New Contributor II
  • 0 kudos

I am trying to store a dataframe as a table in Databricks and am encountering the following error; can someone help? "typeerror: field date: can not merge type <class 'pyspark.sql.types.stringtype'> and <class 'pyspark.sql.types.doubletype'>"

4 More Replies
dhanunjaya
by New Contributor II
  • 6083 Views
  • 6 replies
  • 0 kudos

How to remove empty rows from the data frame

Let's assume I have 10 columns in a data frame, and all 10 columns have empty values for 100 rows out of 200 rows. How can I skip the empty rows?

Latest Reply
GaryDiaz
New Contributor II
  • 0 kudos

You can try this: df.na.drop(how="all"). This will remove a row only if all of its values are null or NaN.

5 More Replies
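The df.na.drop(how="all") suggestion keeps a row as long as any column has a value. The same rule in plain Python, with dicts standing in for rows (hypothetical data):

```python
def drop_all_null(rows):
    """Plain-Python analogue of df.na.drop(how="all"): keep a row
    unless every one of its values is None."""
    return [r for r in rows if not all(v is None for v in r.values())]

rows = [
    {"a": None, "b": None},   # dropped: every value is null
    {"a": 1, "b": None},      # kept: at least one value present
    {"a": None, "b": 2},      # kept
]
print(drop_all_null(rows))
```

With how="any" the condition flips: a row is dropped if any value is null, which is usually too aggressive for sparse data like this.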
AlaQabaja
by New Contributor II
  • 4025 Views
  • 3 replies
  • 0 kudos

Get last modified date or create date for azure blob container

Hi Everyone, I am trying to implement a way in Python to only read files that weren't loaded since the last run of my notebook. The way I am thinking of implementing this is to keep track of the last time my notebook finished in a database table. Nex...

2 More Replies
smanickam
by New Contributor II
  • 13873 Views
  • 5 replies
  • 3 kudos

com.databricks.sql.io.FileReadException: Error while reading file dbfs:

I ran the below statement and got the error %python data = sqlContext.read.parquet("/FileStore/tables/ganesh.parquet") display(data) Error: SparkException: Job aborted due to stage failure: Task 0 in stage 27.0 failed 1 times, most recent failure:...

Latest Reply
MatthewSzafir
New Contributor III
  • 3 kudos

I'm having a similar issue reading a JSON file. It is ~550MB compressed and is on a single line: val cfilename = "c_datafeed_20200128.json.gz" val events = spark.read.json(s"/mnt/c/input1/$cfilename") display(events) The filename is correct and t...

4 More Replies
AnaDel_Campo_Me
by New Contributor
  • 10482 Views
  • 2 replies
  • 1 kudos

FileNotFoundError: [Errno 2] No such file or directory or IsADirectoryError: [Errno 21] Is a directory

I have been trying to open a file on the dbfs using all different combinations: if I use the following code: with open("/dbfs/FileStore/df/Downloadedfile.csv", 'r', newline='') as f I get IsADirectoryError: [Errno 21] Is a directory with open("dbfs:...

Latest Reply
paulmark
New Contributor II
  • 1 kudos

To get rid of this error you can try using Python's file-existence checks to confirm that Python can see the file. In other words, you can make sure that the user has indeed typed a correct path for a real existing file. If the user do...

1 More Replies
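Putting the two error messages together: a dbfs:/ URI must be rewritten to the local /dbfs/ FUSE mount before Python's open() can use it, and checking the path first gives a clearer failure. A sketch assuming the /dbfs mount convention discussed in the thread (the rewrite rule is the only assumption):

```python
import os

def open_dbfs(path, *args, **kwargs):
    """Open a DBFS file through the local /dbfs mount, failing early
    with a specific message instead of a confusing errno."""
    # dbfs:/FileStore/... -> /dbfs/FileStore/...
    local = "/dbfs/" + path[len("dbfs:/"):] if path.startswith("dbfs:/") else path
    if os.path.isdir(local):
        raise IsADirectoryError(f"{local} is a directory, not a file")
    if not os.path.isfile(local):
        raise FileNotFoundError(f"{local} does not exist")
    return open(local, *args, **kwargs)
```

If open_dbfs raises IsADirectoryError, the path points at a folder (common when a Spark write produced a directory of part-files rather than a single CSV).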
Seenu45
by New Contributor II
  • 5120 Views
  • 3 replies
  • 1 kudos

Resolved! 'JavaPackage' object is not callable :: MLeap

Hi Folks, We are working on a production Databricks project using MLeap. When running the below code on Databricks, it throws an error like "'JavaPackage' object is not callable". Code: import mleap.pyspark from mleap.pyspark.spark_support import SimpleSparkSer...

Latest Reply
Seenu45
New Contributor II
  • 1 kudos

Thanks syamspr. It is working now.

2 More Replies
pepevo
by New Contributor III
  • 11600 Views
  • 10 replies
  • 0 kudos

Resolved! How to convert column type from decimal to date in sparksql

I need to convert a column from decimal to date in Spark SQL when the format is not yyyy-mm-dd. A table contains a column declared as decimal(38,0) whose data is in yyyymmdd format, and I am unable to run SQL queries on it in a Databricks notebook. ...

Latest Reply
pepevo
New Contributor III
  • 0 kudos

thank you Tom. I made it work already.

9 More Replies
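The conversion itself is a two-step parse: render the decimal as a string, then interpret it with the yyyyMMdd pattern. A minimal plain-Python sketch of that logic (the sample value is illustrative):

```python
from datetime import datetime

def yyyymmdd_to_date(value):
    """Convert a numeric yyyymmdd value (e.g. from a decimal(38,0)
    column) to a date object."""
    return datetime.strptime(str(int(value)), "%Y%m%d").date()

print(yyyymmdd_to_date(20150901))
```

In Spark SQL the equivalent idea is to cast the decimal to a string and parse it with to_date and the 'yyyyMMdd' pattern.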
ArielHerrera
by New Contributor II
  • 15132 Views
  • 5 replies
  • 2 kudos

Resolved! How to display SHAP plots?

I am looking to display SHAP plots; here is the code: import xgboost import shap shap.initjs() # load JS visualization code to notebook X, y = shap.datasets.boston() # train XGBoost model model = xgboost.train({"learning_rate": 0.01}, xgboost.DMatri...

Latest Reply
lrnzcig
New Contributor II
  • 2 kudos

As @Vinh dqvinh87 noted, the accepted solution only works for force_plot. For other plots, the following trick works for me: import matplotlib.pyplot as plt p = shap.summary_plot(shap_values, test_df, show=False) display(p)

4 More Replies