Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

SatheeshSathees
by New Contributor
  • 5909 Views
  • 1 reply
  • 0 kudos

How to dynamically explode an array-type column in PySpark or Scala

Hi, I have a parquet file with complex column types, with nested structs and arrays. I am using the script from the link below to flatten my parquet file. https://docs.microsoft.com/en-us/azure/synapse-analytics/how-to-analyze-complex-schema I am able ...

Latest Reply
shyam_9
Valued Contributor
  • 0 kudos

Hello, please check out the docs and notebook below, which have similar examples: https://docs.microsoft.com/en-us/azure/synapse-analytics/how-to-analyze-complex-schema https://docs.microsoft.com/en-us/azure/databricks/_static/notebooks/transform-comple...
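
For reference, a minimal PySpark sketch of the flatten-and-explode pattern those docs describe; the function name and parquet path below are illustrative, not the poster's actual script:

from pyspark.sql import functions as F
from pyspark.sql.types import ArrayType, StructType

def flatten(df):
    # Keep expanding until no struct or array columns remain.
    while True:
        complex_fields = [f for f in df.schema.fields
                          if isinstance(f.dataType, (StructType, ArrayType))]
        if not complex_fields:
            return df
        field = complex_fields[0]
        if isinstance(field.dataType, StructType):
            # Promote each nested struct field to a top-level column.
            expanded = [F.col(field.name + "." + sub.name).alias(field.name + "_" + sub.name)
                        for sub in field.dataType.fields]
            df = df.select([F.col(c) for c in df.columns if c != field.name] + expanded)
        else:
            # Turn each array element into its own row.
            df = df.withColumn(field.name, F.explode_outer(F.col(field.name)))

# flat_df = flatten(spark.read.parquet("/mnt/illustrative/complex.parquet"))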

zachary_jones
by New Contributor
  • 3241 Views
  • 3 replies
  • 0 kudos

Resolved! Python logging: 'Operation not supported' after upgrading to DBRT 6.1

My organization has an S3 bucket mounted to the Databricks filesystem under /dbfs/mnt. When using Databricks Runtime 5.5 and below, the following logging code works correctly: log_file = '/dbfs/mnt/path/to/my/bucket/test.log' logger = logging.getLogg...

Latest Reply
lycenok
New Contributor II
  • 0 kudos

It's probably worth trying to rewrite emit ... https://docs.python.org/3/library/logging.html#handlers This works for me: class OurFileHandler(logging.FileHandler): def emit(self, record): # copied from https://github.com/python/cpython/bl...
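
Filled out, the kind of override the reply describes might look like the sketch below. The class name follows the reply and the log path is the one from the question; whether closing the stream after every record is enough depends on the runtime, so treat this as a sketch rather than a verified fix:

import logging

class OurFileHandler(logging.FileHandler):
    # Re-open the log file for each record and close it again afterwards,
    # so every write is a short, self-contained file operation instead of
    # a long-lived append stream.
    def emit(self, record):
        if self.stream is None:
            self.stream = self._open()
        logging.StreamHandler.emit(self, record)
        self.close()

log_file = '/dbfs/mnt/path/to/my/bucket/test.log'   # path from the original post
logger = logging.getLogger('test')
logger.addHandler(OurFileHandler(log_file, delay=True))
logger.warning('logging to DBFS')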

2 More Replies
DimitrisMpizos
by New Contributor
  • 24087 Views
  • 16 replies
  • 0 kudos

Exporting data from Databricks

I couldn't find a way in the documentation to export an RDD as a text file to a local folder using Python. Is it possible?

Latest Reply
Manu1
New Contributor II
  • 0 kudos

To export a file to a local desktop, a workaround is basically to do a "Create a table in notebook" with DBFS. The steps are: click the "Data" icon > click the "Add Data" button > click the "DBFS" button > click the "FileStore" folder icon in the 1st pane "Sele...
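
A quick sketch of the same idea from a PySpark notebook (the folder name below is illustrative): write the data out as a single CSV under /FileStore, which can then be downloaded from the workspace's /files/ URL.

df = spark.createDataFrame([(1, "alpha"), (2, "beta")], ["id", "value"])

(df.coalesce(1)                          # one partition so only one part file is produced
   .write.mode("overwrite")
   .option("header", "true")
   .csv("dbfs:/FileStore/my_export"))    # illustrative folder name

# List the folder to find the part-0000* file, then download it via
# https://<workspace-url>/files/my_export/<part-file-name>
display(dbutils.fs.ls("dbfs:/FileStore/my_export"))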

15 More Replies
MarcoMistroni
by New Contributor II
  • 11848 Views
  • 4 replies
  • 0 kudos

pandas.read_csv

Hi all, I have uploaded a file to my cluster at location /FileStore/tables/qmwxhxvi1505337108590/PastHires.csv. However, whenever I try to read it using pandas with df = pd.read_csv('dbfs:/FileStore/tables/qmwxhxvi1505337108590/PastHires.csv'), I alwas...

Latest Reply
rohitshah
New Contributor II
  • 0 kudos

I am also having the same issue. I have uploaded a file to DBFS and it gives some default code which itself is not working. Has anyone solved this issue?
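
For anyone hitting the same error, a minimal sketch of the usual fix: pandas cannot read the dbfs:/ scheme directly, so either go through the local /dbfs mount (assuming the cluster exposes it) or read with Spark and convert.

import pandas as pd

# Read through the FUSE mount instead of the dbfs:/ URI.
pdf = pd.read_csv('/dbfs/FileStore/tables/qmwxhxvi1505337108590/PastHires.csv')

# Alternative: let Spark read the file, then hand it off to pandas.
sdf = spark.read.option("header", "true").csv('dbfs:/FileStore/tables/qmwxhxvi1505337108590/PastHires.csv')
pdf = sdf.toPandas()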

3 More Replies
olisch
by New Contributor
  • 17009 Views
  • 3 replies
  • 0 kudos

Spark: How to simultaneously read from and write to the same parquet file

How can I read a DataFrame from a parquet file, do transformations, and write this modified DataFrame back to the same parquet file? If I attempt to do so, I get an error, understandably because Spark reads from the source and one cannot writ...

Latest Reply
saravananraju
New Contributor II
  • 0 kudos

Hi, you can use insertInto instead of save. It will overwrite the target file; no need to cache or persist your DataFrame: Df.write.format("parquet").mode("overwrite").insertInto("/file_path") ~Saravanan
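
Another workaround that is often used for this, sketched below with illustrative paths: write the transformed data to a temporary location first, then swap it in, so Spark never reads and overwrites the same files in one job.

src = "dbfs:/mnt/data/events.parquet"         # illustrative paths
tmp = "dbfs:/mnt/data/events_tmp.parquet"

df = spark.read.parquet(src)
transformed = df.dropDuplicates()             # stand-in for the real transformations
transformed.write.mode("overwrite").parquet(tmp)

# Replace the original only after the temporary write has succeeded.
dbutils.fs.rm(src, recurse=True)
dbutils.fs.mv(tmp, src, recurse=True)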

2 More Replies
DineshKumar
by New Contributor III
  • 7977 Views
  • 3 replies
  • 0 kudos

How to convert the first row to column names in an existing DataFrame

I have a DataFrame like the one below. I want to use the first row as the column names for this DataFrame. How could I do this? Is there any way to convert it directly (without using df.first)? usdata.show() -----+---+------------+------------+-------------------...

Latest Reply
User16857282152
Contributor
  • 0 kudos

My point was that you are asking for column names from what you consider to be the "first row", and I am telling you that at scale, or if the data volume grows, what you consider to be the "first row" may no longer actually be the "first row" unless ...
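
With that ordering caveat in mind, a short sketch of the two usual options (file path and sample values below are illustrative):

# Option 1: if the data comes from a file, let the reader treat the first line as the header.
usdata = spark.read.option("header", "true").csv("dbfs:/mnt/illustrative/usdata.csv")

# Option 2: if the DataFrame already exists and its row order is trusted,
# take the first row's values and use them as the column names.
df = spark.createDataFrame([("name", "age"), ("Alice", "34")], ["_c0", "_c1"])
first = df.limit(1).collect()[0]
renamed = df.toDF(*[str(v) for v in first])
# The old header row remains as data and still needs to be filtered out afterwards.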

2 More Replies
RahulMukherjee
by New Contributor
  • 20373 Views
  • 1 reply
  • 1 kudos

I am trying to load a Delta table from a DataFrame, but it's giving me an error.

Code: from pyspark.sql.functions import * acDF = spark.read.format('csv').options(header='true', inferschema='true').load("/mnt/rahulmnt/Insurance_Info1.csv"); acDF.write.option("overwriteSchema", "true").format("delta").mode("overwrite").save("/delt...

Latest Reply
AbhaKhanna
New Contributor II
  • 1 kudos

1. Using the Spark SQL context in Python or Scala notebooks: sql("SET spark.databricks.delta.formatCheck.enabled=false")
2. In SQL notebooks: SET spark.databricks.delta.formatCheck.enabled=false
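
Put together with the code from the question, a sketch might look like this. The Delta target path is illustrative (the original was truncated), and the conf call is the session-level equivalent of the SET statement above:

# Disable the Delta format check for this session.
spark.conf.set("spark.databricks.delta.formatCheck.enabled", "false")

acDF = (spark.read.format("csv")
        .options(header="true", inferSchema="true")
        .load("/mnt/rahulmnt/Insurance_Info1.csv"))

(acDF.write.format("delta")
     .option("overwriteSchema", "true")
     .mode("overwrite")
     .save("/delta/insurance_info1"))    # illustrative target path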

Anbazhagananbut
by New Contributor II
  • 2713 Views
  • 2 replies
  • 0 kudos

PySpark: Convert Struct Type to Map Type

Hello, could you please advise on the scenario below in PySpark 2.4.3 on Databricks, for loading the data into a Delta table? I want to load the DataFrame with this column "data" into the table as a MapType in the Databricks Spark Delta table. Could you ...

Latest Reply
sherryellis
New Contributor II
  • 0 kudos

You can do it by making an API request to /api/2.0/clusters/permanent-delete. I don't see an option to delete or edit an automated cluster from the UI.
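
For the struct-to-map question itself, a minimal PySpark 2.4 sketch; the struct field names below are assumptions, since the original column layout was truncated:

from pyspark.sql import functions as F

# Illustrative struct; the real field names were not shown in the post.
df = spark.createDataFrame([(("a", "b"),)], "data struct<key1:string,key2:string>")

# Interleave literal field names with the struct values and feed them to create_map,
# so the conversion works for whatever fields the struct happens to have.
fields = df.schema["data"].dataType.fields
pairs = [c for f in fields for c in (F.lit(f.name), F.col("data." + f.name))]
mapped = df.withColumn("data_map", F.create_map(*pairs)).drop("data")
mapped.printSchema()   # data_map: map<string,string>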

1 More Reply
SimonNuss
by New Contributor II
  • 23183 Views
  • 6 replies
  • 4 kudos

Resolved! Databricks cannot access Azure Key Vault

I am trying to retrieve a secret from Azure Key Vault as follows: sqlPassword = dbutils.secrets.get(scope = "Admin", key = "SqlPassword") The scope has been created correctly, but I receive the following error message: com.databricks.common.clie...

Latest Reply
virahkumar
New Contributor II
  • 4 kudos

Sometimes turning it off and on again is underrated, so I gave up finding the problem, deleted it, and re-created the scope - it worked a breeze! Mine seems like it was something silly; I was able to set up my vault but got the same issue when trying to ...

5 More Replies
KutayKoralturk
by New Contributor
  • 6939 Views
  • 2 replies
  • 0 kudos

Filtering rows that do not contain a string

search = search.filter(!F.col("Name").contains("ABC")) search = search.filter(F.not(F.col("Name").contains("ABC")) Both methods fail with a syntax error. Could you please help me filter rows that do not contain a certain string in PySpark? ^ Synta...

Latest Reply
User16857282152
Contributor
  • 0 kudos

Here is a complete example:
values = [("K1","true","false"),("K2","true","false")]
columns = ['Key', 'V1', 'V2']
df = spark.createDataFrame(values, columns)
display(df)
# FILTER
df2 = df.filter(df.V2 != "delete")
display(df2)
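
And for the negation the original question was after, the usual PySpark spelling uses ~ rather than ! (sample data below is made up):

from pyspark.sql import functions as F

values = [("K1", "ABC plant"), ("K2", "XYZ plant")]
search = spark.createDataFrame(values, ["Key", "Name"])

# ~ negates a Column expression; Python has no ! operator, hence the syntax error.
kept = search.filter(~F.col("Name").contains("ABC"))
display(kept)   # only rows whose Name does not contain "ABC"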

1 More Reply
SergeyIvanchuk
by New Contributor
  • 7609 Views
  • 4 replies
  • 0 kudos

Seaborn plot display in Databricks

I am using Seaborn version 0.7.1 and matplotlib version 1.5.3. The following code does not display a graph at the end. Any idea how to resolve this? (It works in the Python CLI on my local computer.) import seaborn as sns sns.set(style="darkgrid") tips = sns.lo...

Latest Reply
AbbyLemon
New Contributor II
  • 0 kudos

I found that you can create a comparison plot similar to what you get from Seaborn by using display(sparkdf) and adding multiple columns to the 'Values' section while creating a 'Scatter plot'. You get to 'Customize Plot' by clicking on the icon ...
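
If the goal is to render the Seaborn figure itself rather than rebuild it with display(sparkdf), one pattern that generally works in a notebook cell is to hand the underlying matplotlib figure to display; the regplot call below is illustrative, since the original snippet was truncated, and loading the sample dataset assumes internet access from the cluster.

import seaborn as sns

sns.set(style="darkgrid")
tips = sns.load_dataset("tips")              # same sample data as the original post
ax = sns.regplot(x="total_bill", y="tip", data=tips)   # illustrative plot

# Pass the matplotlib figure behind the Seaborn plot to the notebook's display();
# on newer runtimes a plain plt.show() in the cell usually renders it as well.
display(ax.figure)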

3 More Replies
AlexRomano
by New Contributor
  • 5962 Views
  • 1 reply
  • 0 kudos

PicklingError: Could not pickle the task to send it to the workers.

I am using sklearn in a Databricks notebook to fit an estimator in parallel. Sklearn uses joblib with the loky backend to do this. Now, I have a file in Databricks from which I can import my custom Classifier, and everything works fine. However, if I lite...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi aromano, I know this issue was opened almost a year ago, but I faced the same problem and was able to solve it, so I'm sharing the solution in order to help others. You're probably using SparkTrials to optimize the model's hyperparameters ...

Mir_SakhawatHos
by New Contributor II
  • 29799 Views
  • 2 replies
  • 3 kudos

How can I delete folders from my DBFS?

I want to delete a folder I created from DBFS, but how? And how can I download files from there?

Latest Reply
IA
New Contributor II
  • 3 kudos

Hello, Max's answer focuses on the CLI. Instead, using the Community Edition platform, proceed as follows: # You must first delete all files in your folder. 1. import org.apache.hadoop.fs.{Path, FileSystem} 2. dbutils.fs.rm("/FileStore/tables/file.cs...
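
A compact sketch of the same operations from a Python notebook cell (the folder name below is illustrative):

# Recursively delete a DBFS folder and everything inside it.
dbutils.fs.rm("dbfs:/FileStore/tables/my_folder", recurse=True)

# Verify the folder is gone.
display(dbutils.fs.ls("dbfs:/FileStore/tables"))

# Files kept under /FileStore can usually be downloaded in a browser at
# https://<workspace-url>/files/<path-under-FileStore>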

1 More Reply
bhaumikg
by New Contributor II
  • 13304 Views
  • 7 replies
  • 2 kudos

Databricks throwing error "SQL DW failed to execute the JDBC query produced by the connector." when pushing a column with string length greater than 255

I am using Databricks to transform the data and then pushing the data into the data lake. The data gets pushed in if the length of the string field is 255 or less, but it throws the following error if it is beyond that: "SQL DW failed to execute the JDB...

Latest Reply
bhaumikg
New Contributor II
  • 2 kudos

As suggested by ZAIvR, please use append mode and provide the max length while pushing the data. Overwrite may not work with this unless the Databricks team has fixed the issue.
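
A sketch of what that append-with-max-length write can look like with the Azure SQL DW connector; the option is typically maxStrLength, and every endpoint, table, and storage value below is a placeholder:

df = spark.table("source_view")   # placeholder source DataFrame

(df.write
   .format("com.databricks.spark.sqldw")
   .option("url", "jdbc:sqlserver://placeholder.database.windows.net:1433;database=dw")   # placeholder
   .option("tempDir", "wasbs://tmp@placeholderaccount.blob.core.windows.net/stage")       # placeholder
   .option("forwardSparkAzureStorageCredentials", "true")
   .option("dbTable", "dbo.my_table")      # placeholder
   .option("maxStrLength", "4000")         # raise the string length above the ~255 default
   .mode("append")
   .save())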

6 More Replies
Nik
by New Contributor III
  • 9410 Views
  • 19 replies
  • 0 kudos

Write from a DataFrame to a CSV file, CSV file is blank

Hi, I am reading a text file from a blob: val sparkDF = spark.read.format(file_type) .option("header", "true") .option("inferSchema", "true") .option("delimiter", file_delimiter) .load(wasbs_string + "/" + PR_FileName) Then I test my Datafra...

Latest Reply
nl09
New Contributor II
  • 0 kudos

Create a temp folder inside the output folder, copy the part-00000* file to the output folder with the desired file name, then delete the temp folder. Python code snippet to do the same: fpath = output + '/' + 'temp' def file_exists(path): try: dbutils.fs.ls(path) return...
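
Spelled out with illustrative paths (the source read stands in for the question's blob read), that copy-and-rename flow looks roughly like this:

output = "dbfs:/mnt/out/report"               # illustrative output folder
tmp = output + "/temp"

sparkDF = spark.read.option("header", "true").csv("dbfs:/mnt/in/source.csv")   # placeholder source

# Write a single part file into the temp folder.
sparkDF.coalesce(1).write.mode("overwrite").option("header", "true").csv(tmp)

# Copy the lone part-0000* file to a stable name, then drop the temp folder.
part = [f.path for f in dbutils.fs.ls(tmp) if f.name.startswith("part-")][0]
dbutils.fs.cp(part, output + "/PR_output.csv")
dbutils.fs.rm(tmp, recurse=True)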

18 More Replies