Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

SergeyIvanchuk
by New Contributor
  • 9735 Views
  • 4 replies
  • 0 kudos

Seaborn plot display in Databricks

I am using Seaborn version 0.7.1 and matplotlib version 1.5.3. The following code does not display a graph at the end. Any idea how to resolve this? (It works in the Python CLI on my local computer.) import seaborn as sns sns.set(style="darkgrid") tips = sns.lo...

Latest Reply
AbbyLemon
New Contributor II
  • 0 kudos

I found that you can create a comparison plot similar to what you get from seaborn by using display(sparkdf) and adding multiple columns to the 'Values' section while creating a 'Scatter plot'. You get to 'Customize Plot' by clicking on the icon ...

3 More Replies
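
For readers hitting the same issue: a minimal sketch of getting a seaborn figure to render in a Databricks notebook, assuming the post's truncated snippet was loading the bundled tips dataset. display() here is the Databricks notebook built-in; plt.show() also works on recent runtimes.

```
import seaborn as sns
import matplotlib.pyplot as plt

sns.set(style="darkgrid")
tips = sns.load_dataset("tips")  # a guess at the truncated sns.lo... line

# regplot exists even in seaborn 0.7.x and returns a matplotlib Axes
ax = sns.regplot(x="total_bill", y="tip", data=tips)

# Hand the underlying figure to Databricks' display() to render it inline
display(ax.get_figure())
```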
AlexRomano
by New Contributor
  • 6925 Views
  • 1 reply
  • 0 kudos

PicklingError: Could not pickle the task to send it to the workers.

I am using sklearn in a Databricks notebook to fit an estimator in parallel. Sklearn uses joblib with the loky backend to do this. Now, I have a file in Databricks from which I can import my custom Classifier, and everything works fine. However, if I lite...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @aromano, I know this issue was opened almost a year ago, but I faced the same problem and was able to solve it, so I'm sharing the solution to help others. You're probably using SparkTrials to optimize the model's hyperparameters ...

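
The reply above is truncated, so as a hedged sketch only: a common cause of this PicklingError is that the custom class isn't importable on the workers, which shipping the defining module with sc.addPyFile often resolves. The file path and class name below are hypothetical.

```
# Ship the module that defines the custom Classifier so joblib/loky (and
# Spark) can unpickle it on the workers; path and names are hypothetical.
sc.addPyFile("/dbfs/FileStore/code/my_classifier.py")

from my_classifier import MyClassifier  # import only after addPyFile
clf = MyClassifier()
```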
Mir_SakhawatHos
by New Contributor II
  • 32400 Views
  • 2 replies
  • 3 kudos

How can I delete folders from my DBFS?

I want to delete a folder I created in DBFS, but how? And how can I download files from there?

Latest Reply
IA
New Contributor II
  • 3 kudos

Hello, Max's answer focuses on the CLI. On the Community Edition platform, proceed as follows instead. You must first delete all files in your folder: 1. import org.apache.hadoop.fs.{Path, FileSystem} 2. dbutils.fs.rm("/FileStore/tables/file.cs...

1 More Replies
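
For reference, deleting a DBFS folder from a notebook comes down to one dbutils call; the path below is a placeholder, and recurse=True is what removes a non-empty folder.

```
# Delete the folder and everything under it
dbutils.fs.rm("/FileStore/tables/my_folder", recurse=True)

# List the parent before/after as a sanity check
display(dbutils.fs.ls("/FileStore/tables"))
```

As for downloading: files placed under /FileStore are typically reachable in a browser at https://<your-workspace-url>/files/<path>, though the exact URL shape depends on your deployment.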
bhaumikg
by New Contributor II
  • 15296 Views
  • 7 replies
  • 2 kudos

Databricks throwing error "SQL DW failed to execute the JDBC query produced by the connector." while pushing the column with string length more than 255

I am using Databricks to transform the data and then pushing it into the data lake. The data gets pushed in if the length of the string field is 255 or less, but it throws the following error beyond that: "SQL DW failed to execute the JDB...

Latest Reply
bhaumikg
New Contributor II
  • 2 kudos

As suggested by ZAIvR, use append mode and provide a max string length while pushing the data. Overwrite may not work with this unless the Databricks team has fixed the issue.

6 More Replies
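
A hedged sketch of what that workaround looks like with the com.databricks.spark.sqldw connector, whose maxStrLength option raises the default string column length; the URL, temp dir, and table name are placeholders.

```
(df.write
   .format("com.databricks.spark.sqldw")
   .option("url", jdbc_url)                              # assumed defined elsewhere
   .option("tempDir", "wasbs://container@account.blob.core.windows.net/tmp")
   .option("forwardSparkAzureStorageCredentials", "true")
   .option("maxStrLength", 4000)    # string columns default to 256 chars otherwise
   .option("dbTable", "dbo.my_table")
   .mode("append")                  # per the reply, append rather than overwrite
   .save())
```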
Nik
by New Contributor III
  • 12848 Views
  • 19 replies
  • 0 kudos

Write from a DataFrame to a CSV file; CSV file is blank

Hi, I am reading a text file from a blob: val sparkDF = spark.read.format(file_type).option("header", "true").option("inferSchema", "true").option("delimiter", file_delimiter).load(wasbs_string + "/" + PR_FileName) Then I test my DataFra...

Latest Reply
nl09
New Contributor II
  • 0 kudos

Create a temp folder inside the output folder, copy the part-00000* file to the output folder under the desired file name, then delete the temp folder. Python code snippet to do the same: fpath = output + '/' + 'temp' def file_exists(path): try: dbutils.fs.ls(path) return...

18 More Replies
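
The reply's snippet is truncated; here is a sketch of the same write-then-copy pattern, assuming a DataFrame named sparkDF and a placeholder output path.

```
temp_dir = output + "/temp"                     # 'output' as in the reply

# Write a single CSV part file into the temp folder
(sparkDF.coalesce(1)
        .write.option("header", "true")
        .mode("overwrite")
        .csv(temp_dir))

# Copy the part-00000* file out under the desired name, then clean up
part_file = [f.path for f in dbutils.fs.ls(temp_dir) if f.name.startswith("part-")][0]
dbutils.fs.cp(part_file, output + "/result.csv")  # result.csv is a placeholder
dbutils.fs.rm(temp_dir, recurse=True)
```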
pmezentsev
by New Contributor
  • 7925 Views
  • 7 replies
  • 0 kudos

PySpark: how to get best params in grid search

Hello! I am using Spark 2.1.1 in Python (Python 2.7, executed in a Jupyter notebook) and trying to run a grid search for linear regression parameters. My code looks like this: from pyspark.ml.tuning import CrossValidator, ParamGridBuilder from pyspark.ml impo...

Latest Reply
phamyen
New Contributor II
  • 0 kudos

This is a great article. It gave me a lot of useful information. thank you very much download app

6 More Replies
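
Since the latest reply above doesn't answer the question: the usual way to recover the winning grid point from a fitted CrossValidator in the Spark 2.x Python API is via avgMetrics and the estimator param maps. Here cv_model stands for the result of cv.fit(train_df) in the post.

```
import numpy as np

# One metric per grid point, in the same order as the param maps
best_idx = int(np.argmax(cv_model.avgMetrics))   # use argmin for error metrics like RMSE
best_params = cv_model.getEstimatorParamMaps()[best_idx]
print({p.name: v for p, v in best_params.items()})

# The model Spark refit on those params
best_model = cv_model.bestModel
```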
BingQian
by New Contributor II
  • 12566 Views
  • 2 replies
  • 0 kudos

Resolved! Error of "name 'IntegerType' is not defined" in attempting to convert a DF column to IntegerType

initialDF.withColumn("OriginalCol", initialDF.OriginalCol.cast(IntegerType)) or initialDF.withColumn("OriginalCol", initialDF.OriginalCol.cast(IntegerType())) However, it always failed with this error: NameError: name 'IntegerType' is not defined ...

Latest Reply
BingQian
New Contributor II
  • 0 kudos

Thank you @Kristo Raun!

1 More Replies
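
For completeness, the fix behind the accepted answer: the NameError only means the class was never imported. IntegerType lives in pyspark.sql.types, and cast() also accepts the string "int".

```
from pyspark.sql.types import IntegerType

# Note the parentheses: cast() wants an instance, not the class itself
df = initialDF.withColumn("OriginalCol", initialDF.OriginalCol.cast(IntegerType()))

# Equivalent, with no import needed
df = initialDF.withColumn("OriginalCol", initialDF.OriginalCol.cast("int"))
```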
prakharjain
by New Contributor
  • 18680 Views
  • 2 replies
  • 0 kudos

Resolved! I need to edit my parquet files and change field names, replacing spaces with underscores

Hello, I am facing the trouble described in the following Stack Overflow topics: https://stackoverflow.com/questions/45804534/pyspark-org-apache-spark-sql-analysisexception-attribute-name-contains-inv https://stackoverflow.com/questions/38191157/spark-...

Latest Reply
DimitriBlyumin
New Contributor III
  • 0 kudos

One option is to use something other than Spark to read the problematic file, e.g. Pandas, if your file is small enough to fit on the driver node (Pandas will only run on the driver). If you have multiple files, you can loop through them and fix on...

1 More Replies
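
A sketch of the Pandas route the reply describes, assuming pyarrow (or fastparquet) is installed and the file fits in driver memory; paths are placeholders.

```
import pandas as pd

# Read on the driver, fix the column names, write back out
pdf = pd.read_parquet("/dbfs/mnt/data/input.parquet")
pdf.columns = [c.replace(" ", "_") for c in pdf.columns]
pdf.to_parquet("/dbfs/mnt/data/output.parquet", index=False)
```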
ChristianHofste
by New Contributor II
  • 11297 Views
  • 1 reply
  • 0 kudos

Drop duplicates in Table

Hi, there is a function to delete data from a Delta Table: deltaTable = DeltaTable.forPath(spark, "/data/events/") deltaTable.delete(col("date") < "2017-01-01") But is there also a way to drop duplicates somehow? Like deltaTable.dropDuplicates()......

Latest Reply
shyam_9
Databricks Employee
  • 0 kudos

Hi @Christian Hofstetter, you can check here for info on the same: https://docs.delta.io/0.4.0/delta-update.html#data-deduplication-when-writing-into-delta-tables

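
The linked docs cover deduplication on write via MERGE; for duplicates already in the table, a blunt hedged alternative is to read, deduplicate, and overwrite in place. Delta's snapshot isolation is what makes reading and overwriting the same path workable.

```
df = spark.read.format("delta").load("/data/events/")

(df.dropDuplicates()              # optionally pass a subset of key columns
   .write.format("delta")
   .mode("overwrite")
   .save("/data/events/"))
```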
JigaoLuo
by New Contributor
  • 5130 Views
  • 3 replies
  • 0 kudos

OPTIMIZE error: org.apache.spark.sql.catalyst.parser.ParseException: mismatched input 'OPTIMIZE'

Hi everyone. I am trying to learn the OPTIMIZE keyword from this notebook, using Scala: https://docs.databricks.com/delta/optimizations/optimization-examples.html#delta-lake-on-databricks-optimizations-scala-notebook. But my local Spark seems not able t...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi Jigao, OPTIMIZE isn't in the open-source Delta API, so it won't run on your local Spark instance: https://docs.delta.io/latest/api/scala/io/delta/tables/index.html?search=optimize

2 More Replies
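
On a Databricks cluster itself the keyword parses fine, e.g. as below (table and column names are placeholders); later open-source Delta releases eventually added OPTIMIZE as well.

```
# Runs on Databricks, where the Delta SQL extensions include OPTIMIZE
spark.sql("OPTIMIZE events ZORDER BY (eventType)")
```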
EricThomas
by New Contributor
  • 11366 Views
  • 2 replies
  • 0 kudos

!pip install vs. dbutils.library.installPyPI()

Hello, Scenario: trying to install some Python modules into a notebook (scoped to just the notebook) using... ``` dbutils.library.installPyPI("azure-identity") dbutils.library.installPyPI("azure-storage-blob") dbutils.library.restartPython() ``` ...ge...

Latest Reply
eishbis
New Contributor II
  • 0 kudos

Hi @ericOnline, I also faced the same issue and eventually found that upgrading the Databricks runtime version from my current "5.5 LTS (includes Apache Spark 2.4.3, Scala 2.11)" to "6.5 (Scala 2.11, Spark 2.4.5)" resolved the issue. Though the offic...

1 More Replies
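
For reference, the notebook-scoped form from the post with pinned versions, which is often what makes resolution succeed on older runtimes; the version numbers below are illustrative, not prescribed.

```
dbutils.library.installPyPI("azure-identity", version="1.3.0")
dbutils.library.installPyPI("azure-storage-blob", version="12.3.0")
dbutils.library.restartPython()   # restart Python so the new packages are importable
```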
RaghuMundru
by New Contributor III
  • 35367 Views
  • 15 replies
  • 0 kudos

Resolved! I am running simple count and I am getting an error

Here is the error I get when I run the following query: statement = sqlContext.sql("SELECT count(*) FROM ARDATA_2015_09_01").show() Py4JJavaError Traceback (most rec...

Latest Reply
muchave
New Contributor II
  • 0 kudos

192.168.o.1 is a private IP address used to login the admin panel of a router. 192.168.l.l is the host address to change default router settings.

14 More Replies
Anbazhagananbut
by New Contributor II
  • 6831 Views
  • 1 reply
  • 0 kudos

Get size of a column in bytes for a PySpark DataFrame

Hello All, I have a column in a dataframe which is a struct type. I want to find the size of the column in bytes; it is failing while loading into Snowflake. I can see size functions available to get the length. How do I calculate the size in bytes fo...

Latest Reply
sean_owen
Databricks Employee
  • 0 kudos

There isn't one size for a column; it takes some amount of bytes in memory, but a different amount potentially when serialized on disk or stored in Parquet. You can work out the size in memory from its data type; an array of 100 bytes takes 100 byte...

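
Building on that reply, one hedged way to approximate a per-row serialized size, and to spot rows that blow past a target system's limit, is to render the struct to JSON and measure its length (a character count, so only exact for ASCII; my_struct is a placeholder column name).

```
from pyspark.sql import functions as F

sizes = df.select(F.length(F.to_json(F.col("my_struct"))).alias("approx_bytes"))
sizes.agg(F.max("approx_bytes").alias("max_bytes"),
          F.avg("approx_bytes").alias("avg_bytes")).show()
```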
ubsingh
by New Contributor II
  • 11079 Views
  • 3 replies
  • 1 kudos
Latest Reply
ubsingh
New Contributor II
  • 1 kudos

Thanks for your help @leedabee. I will go through the second option; the first one is not applicable in my case.

2 More Replies
Anbazhagananbut
by New Contributor II
  • 9741 Views
  • 1 reply
  • 1 kudos

How to handle blank values in array-of-struct elements in PySpark

Hello All, We have data in a PySpark dataframe column of array-of-struct type, with multiple nested fields present. If the value is not blank, it should save the data in the same array-of-struct type in a Spark Delta table. Please advise on the bel...

Latest Reply
shyam_9
Databricks Employee
  • 1 kudos

Hi @Anbazhagan anbutech17, can you please try the approaches in the answers here: https://stackoverflow.com/questions/56942683/how-to-add-null-columns-to-complex-array-struct-in-spark-with-a-udf

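
The question is truncated, but a common pattern for this on Spark 2.4+, where the filter higher-order function is available, drops blank entries from the array; 'items' and 'name' are hypothetical column and field names.

```
from pyspark.sql import functions as F

cleaned = df.withColumn(
    "items",
    F.expr("filter(items, x -> x.name IS NOT NULL AND x.name <> '')")
)
```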
