Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Anbazhagananbut
by New Contributor II
  • 3207 Views
  • 2 replies
  • 0 kudos

Pyspark Convert Struct Type to Map Type

Hello Sir, could you please advise on the below scenario in PySpark 2.4.3 in Databricks to load the data into the Delta table. I want to load the dataframe with this column "data" into the table as MapType in the Databricks Spark Delta table. Could you ...

Latest Reply
sherryellis
New Contributor II
  • 0 kudos

You can do it by making an API request to /api/2.0/clusters/permanent-delete. I don't see an option to delete or edit an automated cluster from the UI.

1 More Replies
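For readers landing on this thread: neither reply addresses the actual question, so here is a minimal sketch of one way to turn a struct column into a MapType column in PySpark 2.4 using create_map over the struct's fields. The column name "data" is from the question; the field names key1/key2 are assumptions for illustration.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical dataframe with a struct column named "data"
df = spark.createDataFrame(
    [(("a", "b"),)], "data struct<key1:string,key2:string>"
)

# Build a map<string,string> from the struct's field names and values
fields = df.schema["data"].dataType.fieldNames()
pairs = [p for f in fields for p in (F.lit(f), F.col("data." + f))]
df_map = df.select(F.create_map(*pairs).alias("data"))

df_map.printSchema()  # data: map<string,string>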
SimonNuss
by New Contributor II
  • 27763 Views
  • 6 replies
  • 5 kudos

Resolved! Databricks cannot access Azure Key Vault

I am trying to retrieve a secret from Azure Key Vault as follows: sqlPassword = dbutils.secrets.get(scope = "Admin", key = "SqlPassword") The scope has been created correctly, but I receive the following error message: com.databricks.common.clie...

Latest Reply
virahkumar
New Contributor II
  • 5 kudos

Sometimes turning it off and on again is underrated, so I gave up finding the problem, deleted it, and re-created the scope - worked a breeze! Mine seems like it was something silly; I was able to set up my vault but got the same issue when trying to ...

5 More Replies
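If re-creating the scope doesn't fix it, a quick sanity check is to list what the workspace can actually see before blaming Key Vault permissions. A minimal sketch; the scope and key names are the ones from the question:

# List every secret scope visible to this workspace
for s in dbutils.secrets.listScopes():
    print(s.name)

# List the keys inside the scope from the question
for k in dbutils.secrets.list("Admin"):
    print(k.key)

# Retrieve the secret (the value is redacted if printed in a notebook)
sqlPassword = dbutils.secrets.get(scope="Admin", key="SqlPassword")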
KutayKoralturk
by New Contributor
  • 7797 Views
  • 2 replies
  • 0 kudos

Filtering rows that do not contain a string

search = search.filter(!F.col("Name").contains("ABC"))
search = search.filter(F.not(F.col("Name").contains("ABC"))
Both methods fail due to a syntax error. Could you please help me filter rows that do not contain a certain string in PySpark? ^ Synta...

Latest Reply
User16857282152
Contributor
  • 0 kudos

Here is a complete example:
values = [("K1","true","false"),("K2","true","false")]
columns = ['Key', 'V1', 'V2']
df = spark.createDataFrame(values, columns)
display(df)
# filter
df2 = df.filter(df.V2 != "delete")
display(df2)

1 More Replies
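For reference, the piece both attempts in the question were missing is PySpark's negation: the operator is ~ (Scala's ! and an F.not function do not exist in the Python API). A minimal sketch, reusing the question's own dataframe name:

from pyspark.sql import functions as F

# Keep only the rows whose Name column does NOT contain "ABC"
search = search.filter(~F.col("Name").contains("ABC"))

# Equivalent spelling as a SQL expression
search = search.filter("NOT Name LIKE '%ABC%'")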
SergeyIvanchuk
by New Contributor
  • 9090 Views
  • 4 replies
  • 0 kudos

Seaborn plot display in Databricks

I am using Seaborn version 0.7.1 and matplotlib version 1.5.3. The following code does not display a graph in the end. Any idea how to resolve this? (It works in the Python CLI on my local computer.) import seaborn as sns sns.set(style="darkgrid") tips = sns.lo...

Latest Reply
AbbyLemon
New Contributor II
  • 0 kudos

I found that you can create a comparison plot similar to what you get from Seaborn by using display(sparkdf) and adding multiple columns to the 'Values' section while creating a 'Scatter plot'. You get to 'Customize Plot' by clicking on the icon ...

3 More Replies
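For later readers: in a Databricks notebook the usual trick is to hand the matplotlib figure behind the Seaborn plot to display(). A minimal sketch, assuming a recent Seaborn where scatterplot is available (the question's 0.7.1 would use a different plotting call):

import seaborn as sns
import matplotlib.pyplot as plt

sns.set(style="darkgrid")
tips = sns.load_dataset("tips")  # downloads the sample dataset
ax = sns.scatterplot(x="total_bill", y="tip", data=tips)

# Databricks renders matplotlib figures passed to display()
display(ax.get_figure())
# outside Databricks, plt.show() does the same job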
AlexRomano
by New Contributor
  • 6681 Views
  • 1 reply
  • 0 kudos

PicklingError: Could not pickle the task to send it to the workers.

I am using sklearn in a Databricks notebook to fit an estimator in parallel. Sklearn uses joblib with the loky backend to do this. Now, I have a file in Databricks which I can import my custom Classifier from, and everything works fine. However, if I lite...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi aromano, I know this issue was opened almost a year ago, but I faced the same problem and was able to solve it, so I'm sharing the solution in order to help others. Probably, you're using SparkTrials to optimize the model's hyperparameters ...

Mir_SakhawatHos
by New Contributor II
  • 31751 Views
  • 2 replies
  • 3 kudos

How can I delete folders from my DBFS?

I want to delete a folder I created in DBFS, but how? And how can I download files from there?

Latest Reply
IA
New Contributor II
  • 3 kudos

Hello, Max's answer focuses on the CLI. Instead, using the Community Edition platform, proceed as follows (you must first delete all files in your folder):
1. import org.apache.hadoop.fs.{Path, FileSystem}
2. dbutils.fs.rm("/FileStore/tables/file.cs...

1 More Replies
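In short, both operations come down to dbutils.fs; a minimal sketch (the paths are hypothetical):

# Recursively delete a folder and everything inside it
dbutils.fs.rm("/FileStore/tables/my_folder", True)  # True = recurse

# Files placed under /FileStore can be downloaded in the browser at
# https://<databricks-instance>/files/<path-under-FileStore>
dbutils.fs.cp("/FileStore/tables/results.csv", "/FileStore/download/results.csv")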
bhaumikg
by New Contributor II
  • 14823 Views
  • 7 replies
  • 2 kudos

Databricks throwing error "SQL DW failed to execute the JDBC query produced by the connector." while pushing a column with string length more than 255

I am using Databricks to transform the data and then pushing it into the data lake. The data gets pushed in if the length of the string field is 255 characters or less, but it throws the following error beyond that: "SQL DW failed to execute the JDB...

Latest Reply
bhaumikg
New Contributor II
  • 2 kudos

As suggested by ZAIvR, use append mode and provide maxlength while pushing the data. Overwrite may not work with this unless the Databricks team has fixed the issue.

6 More Replies
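The setting the reply calls "maxlength" is, in the Azure SQL DW connector, the maxStrLength option, which widens the default NVARCHAR(255) staging type. A hedged sketch of a write with it set; the URL, staging dir, and table name are placeholders:

(df.write
   .format("com.databricks.spark.sqldw")
   .option("url", jdbc_url)         # placeholder JDBC connection string
   .option("tempDir", temp_dir)     # placeholder staging path
   .option("forwardSparkAzureStorageCredentials", "true")
   .option("dbTable", "dbo.my_table")
   .option("maxStrLength", "4000")  # string columns map to NVARCHAR(4000)
   .mode("append")                  # append, as the reply suggests
   .save())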
Nik
by New Contributor III
  • 12156 Views
  • 19 replies
  • 0 kudos

Write from a DataFrame to a CSV file, CSV file is blank

Hi, I am reading from a text file in a blob:
val sparkDF = spark.read.format(file_type)
  .option("header", "true")
  .option("inferSchema", "true")
  .option("delimiter", file_delimiter)
  .load(wasbs_string + "/" + PR_FileName)
Then I test my Datafra...

Latest Reply
nl09
New Contributor II
  • 0 kudos

Create a temp folder inside the output folder, copy the part-00000* file to the output folder under the desired file name, then delete the temp folder. Python code snippet to do the same:
fpath = output + '/' + 'temp'
def file_exists(path):
    try:
        dbutils.fs.ls(path)
        return...

18 More Replies
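The pattern behind this reply, spelled out: force a single partition so Spark writes exactly one part file, then move that file to the name you want. A minimal sketch reusing the question's sparkDF; the output paths are hypothetical:

out_dir = "/mnt/out/report_tmp"

# coalesce(1) => exactly one part-00000* file in the output folder
sparkDF.coalesce(1).write.option("header", "true").mode("overwrite").csv(out_dir)

# Find the part file and copy it to its final name
part = [f.path for f in dbutils.fs.ls(out_dir) if f.name.startswith("part-")][0]
dbutils.fs.cp(part, "/mnt/out/report.csv")
dbutils.fs.rm(out_dir, True)  # clean up the temp folder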
pmezentsev
by New Contributor
  • 7596 Views
  • 7 replies
  • 0 kudos

PySpark: how to get best params in grid search

Hello! I am using Spark 2.1.1 in Python (Python 2.7 executed in a Jupyter notebook) and trying to run a grid search over linear regression parameters. My code looks like this:
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder
from pyspark.ml impo...

Latest Reply
phamyen
New Contributor II
  • 0 kudos

This is a great article. It gave me a lot of useful information. Thank you very much.

6 More Replies
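To answer the title question for anyone arriving from search: after fitting, CrossValidator pairs its param grid with avgMetrics, and bestModel holds the winning refit model. A minimal sketch, assuming cv is the configured CrossValidator and train_df a placeholder training dataframe:

cv_model = cv.fit(train_df)

# Pair every parameter combination with its mean cross-validation metric
for params, metric in zip(cv.getEstimatorParamMaps(), cv_model.avgMetrics):
    print({p.name: v for p, v in params.items()}, metric)

# The best model itself, with its resolved parameters
best = cv_model.bestModel
print(best.extractParamMap())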
BingQian
by New Contributor II
  • 12054 Views
  • 2 replies
  • 0 kudos

Resolved! Error "name 'IntegerType' is not defined" when attempting to convert a DF column to IntegerType

initialDF.withColumn("OriginalCol", initialDF.OriginalCol.cast(IntegerType))
or
initialDF.withColumn("OriginalCol", initialDF.OriginalCol.cast(IntegerType()))
However, it always failed with this error: NameError: name 'IntegerType' is not defined ...

Latest Reply
BingQian
New Contributor II
  • 0 kudos

Thank you @Kristo Raun!

1 More Replies
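The fix the thread converges on is the missing import; a minimal sketch:

from pyspark.sql.types import IntegerType

# Note the parentheses: cast() wants a DataType instance
df = initialDF.withColumn("OriginalCol", initialDF.OriginalCol.cast(IntegerType()))

# Equivalent, with no import at all
df = initialDF.withColumn("OriginalCol", initialDF.OriginalCol.cast("int"))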
prakharjain
by New Contributor
  • 17592 Views
  • 2 replies
  • 0 kudos

Resolved! I need to edit my parquet files and change field names, replacing spaces with underscores

Hello, I am facing the trouble described in the following Stack Overflow topics: https://stackoverflow.com/questions/45804534/pyspark-org-apache-spark-sql-analysisexception-attribute-name-contains-inv https://stackoverflow.com/questions/38191157/spark-...

Latest Reply
DimitriBlyumin
New Contributor III
  • 0 kudos

One option is to use something other than Spark to read the problematic file, e.g. Pandas, if your file is small enough to fit on the driver node (Pandas will only run on the driver). If you have multiple files, you can loop through them and fix on...

1 More Replies
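A sketch of the Pandas route the reply describes, for files small enough for the driver; the paths are hypothetical and a parquet engine such as pyarrow is assumed to be installed:

import pandas as pd

# /dbfs is the FUSE mount that exposes DBFS as local files
pdf = pd.read_parquet("/dbfs/mnt/data/bad columns.parquet")

# Replace spaces in every column name with underscores
pdf.columns = [c.replace(" ", "_") for c in pdf.columns]

pdf.to_parquet("/dbfs/mnt/data/fixed_columns.parquet", index=False)

# The rewritten file is now readable by Spark
df = spark.read.parquet("/mnt/data/fixed_columns.parquet")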
ChristianHofste
by New Contributor II
  • 11207 Views
  • 1 reply
  • 0 kudos

Drop duplicates in Table

Hi, there is a function to delete data from a Delta table:
deltaTable = DeltaTable.forPath(spark, "/data/events/")
deltaTable.delete(col("date") < "2017-01-01")
But is there also a way to drop duplicates somehow? Like deltaTable.dropDuplicates()...

Latest Reply
shyam_9
Valued Contributor
  • 0 kudos

Hi @Christian Hofstetter, you can check here for info on the same: https://docs.delta.io/0.4.0/delta-update.html#data-deduplication-when-writing-into-delta-tables

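The linked page covers de-duplication on the way in (a MERGE that inserts only when no match exists). For duplicates already in the table, one workable approach is simply to rewrite it; a hedged sketch using the path from the question:

# Read the current table, drop exact-duplicate rows, write it back
df = spark.read.format("delta").load("/data/events/")
(df.dropDuplicates()
   .write.format("delta")
   .mode("overwrite")
   .save("/data/events/"))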
JigaoLuo
by New Contributor
  • 4967 Views
  • 3 replies
  • 0 kudos

OPTIMIZE error: org.apache.spark.sql.catalyst.parser.ParseException: mismatched input 'OPTIMIZE'

Hi everyone. I am trying to learn the OPTIMIZE keyword from this blog using Scala: https://docs.databricks.com/delta/optimizations/optimization-examples.html#delta-lake-on-databricks-optimizations-scala-notebook. But my local Spark seems not able t...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi Jigao, OPTIMIZE isn't in the open-source Delta API, so it won't run on your local Spark instance: https://docs.delta.io/latest/api/scala/io/delta/tables/index.html?search=optimize

2 More Replies
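On open-source Delta of that era, the documented stand-in for OPTIMIZE was manual compaction: rewrite the table into fewer files with dataChange set to false so concurrent readers are not disturbed. A minimal sketch; the path and file count are placeholders:

path = "/tmp/delta/events"
num_files = 16  # target file count, tune to your data volume

(spark.read.format("delta").load(path)
   .repartition(num_files)
   .write
   .option("dataChange", "false")  # same data, just fewer files
   .format("delta")
   .mode("overwrite")
   .save(path))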
EricThomas
by New Contributor
  • 11085 Views
  • 2 replies
  • 0 kudos

!pip install vs. dbutils.library.installPyPI()

Hello, Scenario: Trying to install some Python modules into a notebook (scoped to just the notebook) using...
```
dbutils.library.installPyPI("azure-identity")
dbutils.library.installPyPI("azure-storage-blob")
dbutils.library.restartPython()
```
...ge...

Latest Reply
eishbis
New Contributor II
  • 0 kudos

Hi @ericOnline, I also faced the same issue and eventually found that upgrading the Databricks runtime version from my current "5.5 LTS (includes Apache Spark 2.4.3, Scala 2.11)" to "6.5 (Scala 2.11, Spark 2.4.5)" resolved the issue. Though the offic...

1 More Replies
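For anyone on a newer runtime: notebook-scoped installs have since moved to the %pip magic, which replaced dbutils.library.installPyPI (removed in Databricks Runtime 7.0 and later). A minimal sketch:

# Run in its own notebook cell; the install is scoped to this notebook only
%pip install azure-identity azure-storage-blob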
RaghuMundru
by New Contributor III
  • 32942 Views
  • 15 replies
  • 0 kudos

Resolved! I am running a simple count and I am getting an error

Here is the error that I am getting when I run the following query:
statement = sqlContext.sql("SELECT count(*) FROM ARDATA_2015_09_01").show()
---------------------------------------------------------------------------
Py4JJavaError Traceback (most rec...

Latest Reply
muchave
New Contributor II
  • 0 kudos

192.168.0.1 is a private IP address used to log in to the admin panel of a router. 192.168.1.1 is the host address used to change default router settings.

14 More Replies
