Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

JigaoLuo
by New Contributor
  • 4981 Views
  • 3 replies
  • 0 kudos

OPTIMIZE error: org.apache.spark.sql.catalyst.parser.ParseException: mismatched input 'OPTIMIZE'

Hi everyone. I am trying to learn the keyword OPTIMIZE from this blog using Scala: https://docs.databricks.com/delta/optimizations/optimization-examples.html#delta-lake-on-databricks-optimizations-scala-notebook. But my local Spark seems not able t...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi Jigao, OPTIMIZE isn't part of the open source Delta Lake API, so it won't run on your local Spark instance - https://docs.delta.io/latest/api/scala/io/delta/tables/index.html?search=optimize
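
For reference, a minimal sketch of how the command is typically run on a Databricks cluster, where it is issued as SQL rather than through the open source Scala/Python API; the table and column names are placeholders:

```python
# Runs on Databricks Runtime only; plain open source Spark raises the
# ParseException seen above. "events" and "eventTime" are placeholder names.
spark.sql("OPTIMIZE events")                        # compact small files
spark.sql("OPTIMIZE events ZORDER BY (eventTime)")  # co-locate data by column
```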

2 More Replies
EricThomas
by New Contributor
  • 11102 Views
  • 2 replies
  • 0 kudos

!pip install vs. dbutils.library.installPyPI()

Hello, Scenario: Trying to install some python modules into a notebook (scoped to just the notebook) using...``` dbutils.library.installPyPI("azure-identity") dbutils.library.installPyPI("azure-storage-blob") dbutils.library.restartPython()``` ...ge...

Latest Reply
eishbis
New Contributor II
  • 0 kudos

Hi @ericOnline I also faced the same issue and I eventually found that upgrading the Databricks runtime version from my current "5.5 LTS (includes Apache Spark 2.4.3, Scala 2.11)" to "6.5 (Scala 2.11, Spark 2.4.5)" resolved this issue. Though the offic...
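
For anyone hitting the same comparison, a hedged sketch of both notebook-scoped approaches; dbutils.library worked on the older runtimes discussed here but was later deprecated in favor of the %pip magic:

```python
# Option 1: dbutils.library (older Databricks runtimes; notebook-scoped)
dbutils.library.installPyPI("azure-identity")
dbutils.library.installPyPI("azure-storage-blob")
dbutils.library.restartPython()  # restart Python so the packages are importable

# Option 2: the %pip magic (Databricks Runtime 7.1+), also notebook-scoped.
# Run it in its own cell:
#   %pip install azure-identity azure-storage-blob
```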

1 More Replies
RaghuMundru
by New Contributor III
  • 33000 Views
  • 15 replies
  • 0 kudos

Resolved! I am running a simple count and I am getting an error

Here is the error that I am getting when I run the following query: statement = sqlContext.sql("SELECT count(*) FROM ARDATA_2015_09_01").show() --------------------------------------------------------------------------- Py4JJavaError Traceback (most rec...

14 More Replies
Anbazhagananbut
by New Contributor II
  • 6665 Views
  • 1 reply
  • 0 kudos

Get size of a column in bytes for a PySpark DataFrame

Hello All, I have a column in a dataframe which is struct type. I want to find the size of the column in bytes. It is failing while loading into Snowflake. I could see size functions available to get the length. How to calculate the size in bytes fo...

Latest Reply
sean_owen
Honored Contributor II
  • 0 kudos

There isn't one size for a column; it takes some amount of bytes in memory, but a different amount potentially when serialized on disk or stored in Parquet. You can work out the size in memory from its data type; an array of 100 bytes takes 100 byte...
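
If a rough number is all that is needed (for example, to see why the Snowflake load fails), one hedged approach is to measure the column serialized as JSON text; this only approximates the size, and df / my_struct_col are placeholder names:

```python
from pyspark.sql import functions as F

# Serialize the struct column to JSON and sum the string lengths.
# This is an approximation, not the in-memory or Parquet size.
total = (
    df.select(F.length(F.to_json(F.col("my_struct_col"))).alias("bytes"))
      .agg(F.sum("bytes").alias("total_bytes"))
)
total.show()
```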

ubsingh
by New Contributor II
  • 10834 Views
  • 3 replies
  • 1 kudos
Latest Reply
ubsingh
New Contributor II
  • 1 kudos

Thanks for your help @leedabee. I will go through the second option; the first one is not applicable in my case.

2 More Replies
Anbazhagananbut
by New Contributor II
  • 9439 Views
  • 1 reply
  • 1 kudos

How to handle blank values in array of struct elements in PySpark

Hello All, We have data in a column in a PySpark dataframe having array of struct type with multiple nested fields present. If the value is not blank it will save the data in the same array of struct type in a Spark Delta table. Please advise on the bel...

Latest Reply
shyam_9
Valued Contributor
  • 1 kudos

Hi @Anbazhagan anbutech17, can you please try the approaches in the answers below: https://stackoverflow.com/questions/56942683/how-to-add-null-columns-to-complex-array-struct-in-spark-with-a-udf
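
Since the linked answer may move, here is a hedged sketch of the same idea using Spark 2.4+ higher-order functions instead of a UDF; the column name items and fields a, b are placeholders for the actual schema:

```python
from pyspark.sql import functions as F

# Rebuild each struct in the array, turning blank strings into NULLs.
# nullif(x, '') returns NULL when x is an empty string.
df_clean = df.withColumn(
    "items",
    F.expr("transform(items, x -> struct(nullif(x.a, '') AS a, nullif(x.b, '') AS b))"),
)
```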

Juan_MiguelTrin
by New Contributor
  • 6817 Views
  • 1 reply
  • 0 kudos

How to resolve out of memory error?

I have a Databricks notebook hosted on Azure. I am having this problem when doing an INNER JOIN. I tried creating a much larger cluster configuration but it still produces an OutOfMemoryError. org.apache.spark.memory.SparkOutOfMemoryError: Unable to acquir...

Latest Reply
shyam_9
Valued Contributor
  • 0 kudos

Hi @Juan Miguel Trinidad, can you please try the suggestions below: http://apache-spark-developers-list.1001551.n3.nabble.com/java-lang-OutOfMemoryError-Unable-to-acquire-bytes-of-memory-td16773.html
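
In case the link goes stale, a hedged sketch of the two mitigations most often suggested for join-time memory errors; all DataFrame and column names are placeholders:

```python
from pyspark.sql import functions as F

# 1. If one side of the join fits in memory, broadcast it to avoid a shuffle.
joined = large_df.join(F.broadcast(small_df), on="key", how="inner")

# 2. Otherwise raise the shuffle parallelism so each task holds less data.
spark.conf.set("spark.sql.shuffle.partitions", "800")
joined = large_df.repartition("key").join(other_df, on="key", how="inner")
```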

SohelKhan
by New Contributor II
  • 11238 Views
  • 3 replies
  • 0 kudos

PySpark DataFrame: Select all but one or a set of columns

In SQL SELECT, some implementations let us write select -col_A to select all columns except col_A. I tried it in Spark 1.6.0 as follows: for a dataframe df with three columns col_A, col_B, col_C: df.select('col_B, 'col_C') # it works df....

Latest Reply
NavitaJain
New Contributor II
  • 0 kudos

@sk777, @zjffdu, @Lejla Metohajrova if your columns are time-series ordered OR you want to maintain their original order... use cols = [c for c in df.columns if c != 'col_A'] df[cols]
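
A hedged aside: on later Spark versions the same result is available without building the list by hand, via DataFrame.drop:

```python
# Returns a new DataFrame with every column except col_A.
df_without_a = df.drop("col_A")
```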

2 More Replies
AmitSukralia
by New Contributor
  • 25418 Views
  • 5 replies
  • 0 kudos

Listing all files under an Azure Data Lake Gen2 container

I am trying to find a way to list all files in an Azure Data Lake Gen2 container. I have mounted the storage account and can see the list of files in a folder (a container can have multiple level of folder hierarchies) if I know the exact path of th...

Latest Reply
Balaji_su
New Contributor II
  • 0 kudos

(attachments: stackoverflow.png, files.txt)
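
Since the attached answer is not readable here, a hedged sketch of one common approach: walk the mount recursively with dbutils.fs.ls. The mount point /mnt/mycontainer is a placeholder:

```python
# Recursively yield every file path under a directory on DBFS.
def list_files(path):
    for info in dbutils.fs.ls(path):
        if info.isDir():
            yield from list_files(info.path)
        else:
            yield info.path

for f in list_files("/mnt/mycontainer"):
    print(f)
```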

4 More Replies
cfregly
by Contributor
  • 4921 Views
  • 5 replies
  • 0 kudos
Latest Reply
srisre111
New Contributor II
  • 0 kudos

I am trying to store a dataframe as a table in Databricks and encountering the following error, can someone help? "TypeError: field date: Can not merge type <class 'pyspark.sql.types.StringType'> and <class 'pyspark.sql.types.DoubleType'>"
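
That error usually means schema inference saw both strings and doubles in the same field. A hedged workaround is to declare the schema explicitly instead of letting Spark infer it; raw_rows and my_table are placeholders:

```python
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

# An explicit schema avoids the StringType/DoubleType merge conflict.
schema = StructType([
    StructField("date", StringType(), True),
    StructField("value", DoubleType(), True),
])
df = spark.createDataFrame(raw_rows, schema=schema)
df.write.saveAsTable("my_table")
```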

4 More Replies
dhanunjaya
by New Contributor II
  • 7123 Views
  • 6 replies
  • 0 kudos

How to remove empty rows from a data frame

Let's assume I have 10 columns in a data frame, and all 10 columns have empty values for 100 rows out of 200 rows. How can I skip the empty rows?

Latest Reply
GaryDiaz
New Contributor II
  • 0 kudos

You can try this: df.na.drop(how="all"). This will remove a row only if all of its columns are null or NaN.
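
A minimal illustration of the difference between how="all" and how="any", with made-up values:

```python
df = spark.createDataFrame([(1, "a"), (None, None), (2, None)], ["x", "y"])

df.na.drop(how="all").show()  # drops only the all-null row: keeps (1, a) and (2, null)
df.na.drop(how="any").show()  # drops rows with any null: keeps only (1, a)
```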

5 More Replies
AlaQabaja
by New Contributor II
  • 4652 Views
  • 3 replies
  • 0 kudos

Get last modified date or create date for azure blob container

Hi Everyone, I am trying to implement a way in Python to only read files that weren't loaded since the last run of my notebook. The way I am thinking of implementing this is to keep track of the last time my notebook finished in a database table. Nex...
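
For the timestamp part of this approach, a hedged sketch assuming the azure-storage-blob v12 client; the connection string, container name, and last_run_timestamp are placeholders:

```python
from azure.storage.blob import ContainerClient

container = ContainerClient.from_connection_string(
    conn_str="<connection-string>", container_name="<container>"
)
# Keep only blobs modified after the last recorded notebook run.
for blob in container.list_blobs():
    if blob.last_modified > last_run_timestamp:  # last_modified is a tz-aware datetime
        print(blob.name, blob.last_modified)
```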

2 More Replies
smanickam
by New Contributor II
  • 15352 Views
  • 5 replies
  • 3 kudos

com.databricks.sql.io.FileReadException: Error while reading file dbfs:

I ran the below statement and got the error %python data = sqlContext.read.parquet("/FileStore/tables/ganesh.parquet") display(data) Error: SparkException: Job aborted due to stage failure: Task 0 in stage 27.0 failed 1 times, most recent failure:...

Latest Reply
MatthewSzafir
New Contributor III
  • 3 kudos

I'm having a similar issue reading a JSON file. It is ~550MB compressed and is on a single line: val cfilename = "c_datafeed_20200128.json.gz" val events = spark.read.json(s"/mnt/c/input1/$cfilename") display(events) The filename is correct and t...

4 More Replies
AnaDel_Campo_Me
by New Contributor
  • 10995 Views
  • 2 replies
  • 1 kudos

FileNotFoundError: [Errno 2] No such file or directory or IsADirectoryError: [Errno 21] Is a directory

I have been trying to open a file on the dbfs using all different combinations: if I use the following code: with open("/dbfs/FileStore/df/Downloadedfile.csv", 'r', newline='') as f I get IsADirectoryError: [Errno 21] Is a directory with open("dbfs:...

Latest Reply
paulmark
New Contributor II
  • 1 kudos

To get rid of this error you can try using Python's file-existence checks to confirm that Python can see the file at all. In other words, you can make sure the path really points to an existing file. If the user do...
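
A hedged sketch of those checks, which also shows the usual cause of the two errors above: open() needs the /dbfs FUSE path, while dbfs:/ URIs only work with dbutils and Spark APIs:

```python
import os

path = "/dbfs/FileStore/df/Downloadedfile.csv"
print(os.path.exists(path))  # can Python see anything at this path?
print(os.path.isdir(path))   # True would explain IsADirectoryError
print(os.path.isfile(path))

# To inspect what actually lives there:
# display(dbutils.fs.ls("dbfs:/FileStore/df/"))
```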

1 More Replies
Seenu45
by New Contributor II
  • 5613 Views
  • 3 replies
  • 1 kudos

Resolved! 'JavaPackage' object is not callable :: MLeap

Hi Folks, We are working on a production Databricks project using MLeap. When running the below code on Databricks, it throws an error like "'JavaPackage' object is not callable". Code: import mleap.pyspark from mleap.pyspark.spark_support import SimpleSparkSer...

Latest Reply
Seenu45
New Contributor II
  • 1 kudos

Thanks syamspr. It is working now.

2 More Replies

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group