Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Anonymous
by Not applicable
  • 936 Views
  • 0 replies
  • 4 kudos

Happy August! On August 25th we are hosting another Community Social - we're doing these monthly! We want to make sure that we all have the chance to connect as a community often. Come network, talk data, and just get social! Join us for our August ...

AP
by New Contributor III
  • 4996 Views
  • 5 replies
  • 3 kudos

Resolved! AutoOptimize, OPTIMIZE command and Vacuum command : Order, production implementation best practices

So Databricks gives us a great toolkit in the form of OPTIMIZE and VACUUM. But in terms of operationalizing them, I am really confused about the best practice. Should we enable "optimized writes" by setting the following at a workspace level? spark.conf.set...

Latest Reply
Anonymous
Not applicable
  • 3 kudos

@AKSHAY PALLERLA​ Just checking in to see if you got a solution to the issue you shared above. Let us know! Thanks to @Werner Stinckens​ for jumping in, as always!

4 More Replies
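For the thread above, the usual operational pattern is to enable optimized writes and then run OPTIMIZE and VACUUM on a schedule, per table. A minimal sketch of such a maintenance job (the table names and the 7-day retention are illustrative assumptions, not from the thread):

```python
# Hedged sketch: build nightly maintenance statements for a set of Delta
# tables. 168 hours (7 days) is Delta's default VACUUM retention threshold.
def maintenance_commands(table, retain_hours=168):
    """Return the OPTIMIZE and VACUUM statements for one table."""
    return [
        f"OPTIMIZE {table}",
        f"VACUUM {table} RETAIN {retain_hours} HOURS",
    ]

for table in ["silver.events", "gold.daily_summary"]:  # hypothetical tables
    for cmd in maintenance_commands(table):
        print(cmd)  # in a notebook you would run spark.sql(cmd)
```

Optimized writes can also be enabled per session with `spark.conf.set("spark.databricks.delta.optimizeWrite.enabled", "true")`; whether to set it workspace-wide is exactly the judgment call the question raises.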
Jayesh
by New Contributor III
  • 3175 Views
  • 5 replies
  • 3 kudos

Resolved! How can we copy data from Databricks SQL using a notebook?

Hi Team, we have a scenario where we have to connect to Databricks SQL instance 1 from another Databricks instance 2 using a notebook or Azure Data Factory. Can you please help?

Latest Reply
Anonymous
Not applicable
  • 3 kudos

Thanks for jumping in to help, @Arvind Ravish​, @Hubert Dudek​, and @Artem Sheiko​!

4 More Replies
Jeade
by New Contributor II
  • 3663 Views
  • 3 replies
  • 1 kudos

Resolved! Pulling data from Azure Boards into Databricks

Looking for best practices/examples on how to pull data (epics, features, PBIs) from Azure Boards into Databricks for analysis. Any ideas/help appreciated!

Latest Reply
artsheiko
Databricks Employee
  • 1 kudos

You can use export to CSV (link), then push the file to storage mounted to Databricks, or simply upload the obtained file to DBFS.

2 More Replies
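Once an Azure Boards export lands in DBFS, reading it is ordinary CSV parsing. A small sketch (the column names "Work Item Type", "Title", and "State" are assumptions based on a typical Boards export, not from the thread):

```python
# Hedged sketch: parse a CSV exported from Azure Boards. A StringIO stands
# in for the exported file; in Databricks you would open the DBFS path instead.
import csv
import io

sample_export = io.StringIO(
    "Work Item Type,Title,State\n"
    "Epic,Data platform,Active\n"
    "Feature,Ingestion,New\n"
)

def load_work_items(fh):
    """Return the export as a list of dicts, one per work item."""
    return list(csv.DictReader(fh))

items = load_work_items(sample_export)
epics = [r["Title"] for r in items if r["Work Item Type"] == "Epic"]
print(epics)  # → ['Data platform']
```

For larger exports you would more likely read the uploaded file with `spark.read.csv` and analyze it as a DataFrame; the parsing logic is the same idea.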
cralle
by New Contributor II
  • 7414 Views
  • 7 replies
  • 2 kudos

Resolved! Cannot display DataFrame when I filter by length

I have a DataFrame that I have created based on a couple of datasets and multiple operations. The DataFrame has multiple columns, one of which is an array of strings. But when I take the DataFrame and try to filter based upon the size of this array co...

Latest Reply
-werners-
Esteemed Contributor III
  • 2 kudos

Strange, it works fine here. What version of Databricks are you on? To identify the issue, you could output the query plan (.explain). Creating a new df for each transformation could also help; that way you can check step by step where...

6 More Replies
tej1
by New Contributor III
  • 4499 Views
  • 5 replies
  • 7 kudos

Resolved! Trouble accessing `_metadata` column using cloudFiles in Delta Live Tables

We are building a Delta Live Tables pipeline where we ingest CSV files from AWS S3 using cloudFiles, and it is necessary to access the file modification timestamp of each file. As documented here, we tried selecting the `_metadata` column in a task in the delta live p...

Latest Reply
tej1
New Contributor III
  • 7 kudos

Update: We were able to test `_metadata` column feature in DLT "preview" mode (which is DBR 11.0). Databricks doesn't recommend production workloads when using "preview" mode, but nevertheless, glad to be using this feature in DLT.

4 More Replies
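For context, selecting the file metadata inside an Auto Loader DLT table looks roughly like the fragment below. This is a sketch only: it runs solely inside a DLT pipeline (where `dlt` and `spark` are provided), it assumes the "preview" channel / DBR 11.0+ that the reply mentions, and the S3 path is hypothetical.

```python
import dlt
from pyspark.sql import functions as F

@dlt.table
def raw_events():
    # _metadata is only resolvable on DBR 11.0+, i.e. the DLT "preview" channel
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "csv")
        .load("s3://my-bucket/raw/")  # hypothetical path
        .select(
            "*",
            F.col("_metadata.file_modification_time").alias("file_mtime"),
        )
    )
```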
alexgv12
by New Contributor III
  • 3082 Views
  • 2 replies
  • 3 kudos

Delta table: separate gold zones for different tenants

Hello, currently we have a process that builds the bronze and silver zones with Delta tables. When data reaches gold, we must create specific zones for each client because the schema changes; for this we create separate databases and tables, but when ...

Latest Reply
Noopur_Nigam
Databricks Employee
  • 3 kudos

Hi @alexander grajales vanegas​ Are you creating all the databases and tables in the gold zone manually? If so, please check out DLT https://docs.databricks.com/data-engineering/delta-live-tables/index.html, it will take care of your complete pipeline by ...

1 More Replies
GKKarthi
by New Contributor
  • 6146 Views
  • 6 replies
  • 2 kudos

Resolved! Databricks - Simba SparkJDBCDriver 500550 exception

We have a Denodo big data platform hosted on Databricks. Recently we have been facing an exception with message '[Simba][SparkJDBCDriver](500550)', which interrupts the Databricks connection after a certain time interval, usuall...

Latest Reply
PFBOLIVEIRA
New Contributor II
  • 2 kudos

Hi All, we are also experiencing the same behavior: [Simba][SimbaSparkJDBCDriver] (500550) The next rowset buffer is already marked as consumed. The fetch thread might have terminated unexpectedly. Foreground thread ID: xxxx. Background thread ID: yyyy...

5 More Replies
pankaj92
by New Contributor II
  • 4901 Views
  • 4 replies
  • 0 kudos

Extract latest files from an ADLS Gen2 mount point in Databricks using PySpark

Hi Team, I am trying to get the latest files from an ADLS mount point directory. I am not sure how to extract the latest files by last-modified date using PySpark from an ADLS Gen2 storage account. Please let me know asap. Thanks! I am looking forward to your re...

Latest Reply
Sha_1890
New Contributor III
  • 0 kudos

Hi @pankaj92​, I wrote Python code to pick the latest file from the mnt location:

import os

path = "/dbfs/mnt/xxxx"
filelist = []
for file_item in os.listdir(path):
    filelist.append(file_item)
file = len(filelist)
print(filelist[file - 1])

Thanks

3 More Replies
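One caveat about the reply above: os.listdir gives no ordering guarantee, so taking the last entry may not return the newest file. Since the question asks for the latest file by modification date, a safer sketch sorts on os.path.getmtime (the directory here is a temp folder standing in for a /dbfs/mnt path):

```python
# Hedged sketch: pick the most recently modified file in a directory by
# sorting on os.path.getmtime rather than relying on listdir order.
import os
import tempfile
import time

def latest_file(path):
    """Return the full path of the most recently modified file, or None."""
    files = [os.path.join(path, f) for f in os.listdir(path)]
    files = [f for f in files if os.path.isfile(f)]
    return max(files, key=os.path.getmtime) if files else None

# Demo on a temp directory standing in for /dbfs/mnt/xxxx
with tempfile.TemporaryDirectory() as d:
    for name in ["old.csv", "new.csv"]:
        with open(os.path.join(d, name), "w") as fh:
            fh.write("x")
        time.sleep(0.05)  # ensure distinct modification times
    print(os.path.basename(latest_file(d)))  # → new.csv
```

On DBFS mounts the same idea works through the /dbfs fuse path; for very large directories, dbutils.fs.ls (which returns modification times directly) avoids one stat call per file.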
ivanychev
by Contributor II
  • 9534 Views
  • 5 replies
  • 2 kudos

Resolved! How to find out why the cluster is in PENDING state for so long?

I'm using Databricks on AWS. Our clusters are typically in PENDING state for 5-8 minutes after they are created. I would like to find out why (ec2 instance provisioning? docker image download is slow? ...?). The cluster logs are not helpful enough be...

Latest Reply
Prabakar
Databricks Employee
  • 2 kudos

Hi @Sergey Ivanychev​, while the cluster is starting, you can see the status on the compute page. Hover the mouse pointer over the green rotating circle to the left of the cluster name. It will give a notification of what is happening on the cluster. Wh...

4 More Replies
118004
by New Contributor II
  • 2263 Views
  • 1 reply
  • 2 kudos

Resolved! Installing pdpbox plugin on cluster

Hello, we are having issues installing the pdpbox library on a fresh cluster. This includes trying to upload and install a .whl file, and using pip in a notebook. I have attached an example of an error received. Can anybody assist with installing the...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 2 kudos

PDPbox is updated rarely, and it requires older versions of matplotlib (3.1.1): https://github.com/SauceCat/PDPbox. It tries to install but fails because matplotlib requires pkgconfig. The solution is to use the Machine Learning runtime. There it will...

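In practice that usually means pinning the old matplotlib before installing PDPbox; on an ML runtime the pin is often already satisfied. A hedged notebook sketch (the versions come from the reply above and are not independently verified):

```
%pip install matplotlib==3.1.1
%pip install pdpbox
```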
PSY
by New Contributor III
  • 5663 Views
  • 5 replies
  • 2 kudos

Resolved! Updating git token fails

When updating an expired Azure DevOps personal access token (PAT) for git integration, I get the error message "Failed to save. Please try again.". The error persists with different tokens. Previously (months ago), updating the token did not result i...

Latest Reply
Atanu
Databricks Employee
  • 2 kudos

Is this happening for all users, @Pencho Yordanov​?

4 More Replies
al_joe
by Contributor
  • 6625 Views
  • 3 replies
  • 6 kudos

Resolved! Can I use Databricks CLI with community edition?

I installed the CLI but am unable to configure it to connect to my instance, as I cannot find the "Generate Access tokens" option under the User Settings page. The documentation does not say whether this feature is disabled for the community edition.

Latest Reply
Prabakar
Databricks Employee
  • 6 kudos

Hi @Al Jo​, we understand your interest in learning Databricks. However, the community edition is limited in features; certain features are available only in the paid version. If you are interested in using the full features, then I would suggest you g...

2 More Replies
Ryan512
by New Contributor III
  • 1773 Views
  • 2 replies
  • 2 kudos

Autoloader (GCP) Custom PubSub Queue

I want to know if what I describe below is possible with Auto Loader on Google Cloud Platform. Problem description: we have GCS buckets for every client/account. Inside these buckets is a path/blob for each client's instances of our platform. A clie...

Latest Reply
Noopur_Nigam
Databricks Employee
  • 2 kudos

Hello @Ryan Ebanks​, please let us know if more help is needed on this.

1 More Replies
