Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

dchokkadi1_5588
by New Contributor II
  • 14570 Views
  • 8 replies
  • 0 kudos

Resolved! graceful dbutils mount/unmount

Is there a way to indicate to dbutils.fs.mount not to throw an error if the mount is already mounted? And vice versa, for unmount not to throw an error if it is already unmounted? I am trying to run my notebook as a job and it has an init section that...

Latest Reply
Mariano_IrvinLo
New Contributor II
  • 0 kudos

If you use Scala to mount a Gen2 data lake you could try something like this:
/* Gather relevant keys */
var ServicePrincipalID = ""
var ServicePrincipalKey = ""
var DirectoryID = ""
/* Create configurations for our connection */
var configs = Map (...
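To address the original question directly, here is a minimal PySpark sketch of the "graceful" mount/unmount idea: check dbutils.fs.mounts() before mounting or unmounting. The storage account, container, scope, and service principal values below are placeholders, not taken from the thread.

    # Placeholder ADLS Gen2 OAuth settings -- substitute your own values.
    service_principal_id = "<application-id>"
    service_principal_key = dbutils.secrets.get(scope="<scope>", key="<sp-secret-key>")
    directory_id = "<tenant-id>"
    mount_point = "/mnt/mydata"
    source = "abfss://<container>@<storageaccount>.dfs.core.windows.net/"

    configs = {
        "fs.azure.account.auth.type": "OAuth",
        "fs.azure.account.oauth.provider.type":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        "fs.azure.account.oauth2.client.id": service_principal_id,
        "fs.azure.account.oauth2.client.secret": service_principal_key,
        "fs.azure.account.oauth2.client.endpoint":
            f"https://login.microsoftonline.com/{directory_id}/oauth2/token",
    }

    # Mount only if the mount point is not already in use.
    if not any(m.mountPoint == mount_point for m in dbutils.fs.mounts()):
        dbutils.fs.mount(source=source, mount_point=mount_point, extra_configs=configs)

    # Unmount only if the mount point actually exists.
    if any(m.mountPoint == mount_point for m in dbutils.fs.mounts()):
        dbutils.fs.unmount(mount_point)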

7 More Replies
Barb
by New Contributor III
  • 6128 Views
  • 6 replies
  • 0 kudos

SQL charindex function?

Hi all, I need to use the SQL charindex function, but I'm getting a Databricks error that this doesn't exist. That can't be true, right? Thanks for any ideas about how to make this work! Barb

Latest Reply
Traveller
New Contributor II
  • 0 kudos

The best option I found to replace CHARINDEX was LOCATE. Examples from the Spark documentation:
> SELECT locate('bar', 'foobarbar', 5);
7
> SELECT POSITION('bar' IN 'foobarbar');
4
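As a quick sanity check in a notebook, a small sketch (the sample column is made up; locate is 1-based and returns 0 when the substring is not found):

    df = spark.createDataFrame([("foobarbar",)], ["s"])
    df.selectExpr(
        "locate('bar', s)      AS first_bar",   # 4
        "locate('bar', s, 5)   AS next_bar",    # 7
        "position('bar' IN s)  AS pos_bar",     # 4
    ).show()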

5 More Replies
SatheeshSathees
by New Contributor
  • 6562 Views
  • 1 reply
  • 0 kudos

how to dynamically explode array type column in pyspark or scala

Hi, I have a parquet file with complex column types with nested structs and arrays. I am using the script from the below link to flatten my parquet file. https://docs.microsoft.com/en-us/azure/synapse-analytics/how-to-analyze-complex-schema I am able ...

Latest Reply
shyam_9
Valued Contributor
  • 0 kudos

Hello, please check out the below docs and notebook, which have similar examples:
https://docs.microsoft.com/en-us/azure/synapse-analytics/how-to-analyze-complex-schema
https://docs.microsoft.com/en-us/azure/databricks/_static/notebooks/transform-comple...
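The docs above describe a recursive flatten; here is a compact, illustrative sketch of that idea (the function name and the source path are mine, not from the linked notebook):

    from pyspark.sql import DataFrame
    from pyspark.sql import functions as F
    from pyspark.sql.types import ArrayType, StructType

    def flatten(df: DataFrame) -> DataFrame:
        # Keep expanding until no struct or array columns remain.
        while True:
            complex_cols = [(f.name, f.dataType) for f in df.schema.fields
                            if isinstance(f.dataType, (StructType, ArrayType))]
            if not complex_cols:
                return df
            name, dtype = complex_cols[0]
            if isinstance(dtype, StructType):
                # Promote each struct field to a top-level column.
                expanded = [F.col(f"{name}.{sub.name}").alias(f"{name}_{sub.name}")
                            for sub in dtype.fields]
                df = df.select([F.col(c) for c in df.columns if c != name] + expanded)
            else:
                # Explode arrays into one row per element, keeping empty arrays as nulls.
                df = df.withColumn(name, F.explode_outer(name))

    flat_df = flatten(spark.read.parquet("/mnt/data/nested.parquet"))  # path is a placeholder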

zachary_jones
by New Contributor
  • 3957 Views
  • 3 replies
  • 0 kudos

Resolved! Python logging: 'Operation not supported' after upgrading to DBRT 6.1

My organization has an S3 bucket mounted to the Databricks filesystem under /dbfs/mnt. When using Databricks Runtime 5.5 and below, the following logging code works correctly: log_file = '/dbfs/mnt/path/to/my/bucket/test.log' logger = logging.getLogg...

Latest Reply
lycenok
New Contributor II
  • 0 kudos

It's probably worth trying to rewrite emit ... https://docs.python.org/3/library/logging.html#handlers This works for me: class OurFileHandler(logging.FileHandler): def emit(self, record): # copied from https://github.com/python/cpython/bl...
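The reply above is truncated, so as a hedged alternative, one workaround that avoids random writes on the FUSE mount entirely is to log to local driver disk and copy the finished log onto /dbfs afterwards. The DBFS path below reuses the one from the question; the local path is arbitrary.

    import logging
    import shutil

    local_log = "/tmp/job.log"                            # local driver disk
    dbfs_log = "/dbfs/mnt/path/to/my/bucket/test.log"     # target path from the question

    logger = logging.getLogger("job")
    logger.setLevel(logging.INFO)
    logger.addHandler(logging.FileHandler(local_log))

    logger.info("job started")
    # ... do work ...
    logger.info("job finished")

    # One sequential copy onto the mount avoids the 'Operation not supported'
    # error that random-access writes can trigger on /dbfs.
    shutil.copy(local_log, dbfs_log)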

2 More Replies
DimitrisMpizos
by New Contributor
  • 30506 Views
  • 16 replies
  • 0 kudos

Exporting data from databricks

I couldn't find in documentation a way to export an RDD as a text file to a local folder by using python. Is it possible?

Latest Reply
Manu1
New Contributor II
  • 0 kudos

To export a file to the local desktop, the workaround is basically to do a "Create a table in notebook" with DBFS. The steps are: click the "Data" icon > click the "Add Data" button > click the "DBFS" button > click the "FileStore" folder icon in the 1st pane > "Sele...
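In code, the same idea looks roughly like the sketch below: write the RDD under /FileStore and download it through the workspace's /files/ URL. The export path and workspace host are placeholders.

    # Coalesce to one partition so the export is a single text file.
    rdd = sc.parallelize(["line 1", "line 2", "line 3"])
    rdd.coalesce(1).saveAsTextFile("dbfs:/FileStore/exports/my_rdd")

    # Confirm the part file exists.
    display(dbutils.fs.ls("dbfs:/FileStore/exports/my_rdd"))

    # Anything under /FileStore can then be downloaded in a browser at:
    #   https://<databricks-instance>/files/exports/my_rdd/part-00000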

15 More Replies
MarcoMistroni
by New Contributor II
  • 16635 Views
  • 4 replies
  • 0 kudos

pandas.read_csv

Hi all, I have uploaded a file to my cluster, at location /FileStore/tables/qmwxhxvi1505337108590/PastHires.csv. However, whenever I try to read it using pandas, df = pd.read_csv('dbfs:/FileStore/tables/qmwxhxvi1505337108590/PastHires.csv'), I always...

Latest Reply
rohitshah
New Contributor II
  • 0 kudos

I am also having the same issue. I have uploaded a file into DBFS and it gives some default code which itself is not working. Has anyone solved this issue?
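A hedged note on the likely cause: pandas does not understand the dbfs:/ URI scheme, so it needs the local FUSE path under /dbfs instead. A sketch, reusing the path from the original question:

    import pandas as pd

    # pandas reads via the local FUSE mount, so use the /dbfs prefix rather than dbfs:/.
    df = pd.read_csv("/dbfs/FileStore/tables/qmwxhxvi1505337108590/PastHires.csv")

    # Alternatively, read with Spark, which does understand dbfs:/ paths, then convert.
    pdf = (spark.read.csv("dbfs:/FileStore/tables/qmwxhxvi1505337108590/PastHires.csv",
                          header=True, inferSchema=True)
                .toPandas())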

3 More Replies
olisch
by New Contributor
  • 19034 Views
  • 3 replies
  • 0 kudos

Spark: How to simultaneously read from and write to the same parquet file

How can I read a DataFrame from a parquet file, do transformations, and write the modified DataFrame back to the same parquet file? If I attempt to do so, I get an error, understandably because Spark reads from the source and one cannot writ...

Latest Reply
saravananraju
New Contributor II
  • 0 kudos

Hi, you can use insertInto instead of save. It will overwrite the target; no need to cache or persist your dataframe. Df.write.format("parquet").mode("overwrite").insertInto("/file_path") ~Saravanan
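Note that insertInto expects a table name rather than a path, so as a hedged alternative here is a sketch of another common workaround: materialize the transformed data to a temporary path first, then overwrite the original. All paths and the transformation are made up for illustration.

    path = "dbfs:/mnt/data/events.parquet"          # original source (placeholder)
    tmp_path = "dbfs:/mnt/data/events_tmp.parquet"  # staging location (placeholder)

    df = spark.read.parquet(path)
    transformed = df.withColumnRenamed("ts", "event_ts")   # example transformation

    # Never read and overwrite the same files in one job: stage the result first...
    transformed.write.mode("overwrite").parquet(tmp_path)

    # ...then re-read the staged copy and overwrite the original location.
    spark.read.parquet(tmp_path).write.mode("overwrite").parquet(path)
    dbutils.fs.rm(tmp_path, recurse=True)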

2 More Replies
DineshKumar
by New Contributor III
  • 9910 Views
  • 3 replies
  • 0 kudos

How to convert the first row as column from an existing dataframe.

I have a dataframe like below. I want to convert the first row into columns for this dataframe. How could I do this? Is there any way to convert it directly (without using df.first)? usdata.show() -----+---+------------+------------+-------------------...

Latest Reply
User16857282152
Contributor
  • 0 kudos

My point was that you are asking for column names from what you consider to be the "first row", and I am telling you that at scale, or if the data volume grows, what you consider to be the "first row" may no longer actually be the "first row" unless ...
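If the data ultimately comes from a file, the safer route is to let the reader take the header from the file itself; a sketch under that assumption (the CSV path is invented):

    # Option 1 (preferred): read the header straight from the source file.
    usdata = spark.read.csv("dbfs:/mnt/data/usdata.csv", header=True, inferSchema=True)

    # Option 2: if only the DataFrame exists (first row still holds the names),
    # promoting "the first row" needs a deterministic ordering to be meaningful.
    raw = spark.read.csv("dbfs:/mnt/data/usdata.csv", header=False)
    header = raw.limit(1).collect()[0]                  # pulls one row to the driver
    renamed = raw.toDF(*[str(v) for v in header])       # use its values as column names
    # ...and the old header row still has to be filtered out afterwards, e.g.:
    renamed = renamed.filter(renamed[renamed.columns[0]] != header[0])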

2 More Replies
RahulMukherjee
by New Contributor
  • 21151 Views
  • 1 reply
  • 1 kudos

I am trying to load a delta table from a dataframe, but it's giving me an error.

Code:
from pyspark.sql.functions import *
acDF = spark.read.format('csv').options(header='true', inferschema='true').load("/mnt/rahulmnt/Insurance_Info1.csv");
acDF.write.option("overwriteSchema", "true").format("delta").mode("overwrite").save("/delt...

Latest Reply
AbhaKhanna
New Contributor II
  • 1 kudos

1. Using the Spark SQL context in Python or Scala notebooks: sql("SET spark.databricks.delta.formatCheck.enabled=false")
2. In SQL notebooks: SET spark.databricks.delta.formatCheck.enabled=false
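For context, this error usually appears when a Delta path is read or written with a non-Delta reader. A hedged sketch that pairs the original snippet with a Delta read; the save path is a placeholder, and the formatCheck setting is only the reply's workaround, not a recommended default.

    from pyspark.sql.functions import *

    acDF = (spark.read.format("csv")
                 .options(header="true", inferSchema="true")
                 .load("/mnt/rahulmnt/Insurance_Info1.csv"))

    (acDF.write
         .option("overwriteSchema", "true")
         .format("delta")
         .mode("overwrite")
         .save("/delta/insurance_info"))            # placeholder target path

    # Read the Delta path back with the Delta reader rather than parquet/csv.
    df = spark.read.format("delta").load("/delta/insurance_info")

    # The reply's workaround, only if the format check really must be bypassed:
    spark.sql("SET spark.databricks.delta.formatCheck.enabled=false")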

Anbazhagananbut
by New Contributor II
  • 3219 Views
  • 2 replies
  • 0 kudos

Pyspark Convert Struct Type to Map Type

Hello Sir, could you please advise on the below scenario in PySpark 2.4.3 in Databricks, to load the data into the delta table? I want to load the dataframe with this column "data" into the table as MapType in the Databricks Spark delta table. Could you ...

Latest Reply
sherryellis
New Contributor II
  • 0 kudos

You can do it by making an API request to /api/2.0/clusters/permanent-delete; I don't see an option to delete or edit an automated cluster from the UI.
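That reply is about cluster deletion rather than the struct-to-map question itself; for the conversion, a minimal sketch (the struct layout and column names are invented):

    from pyspark.sql import functions as F

    df = spark.createDataFrame([(1, ("a", 10))], "id INT, data STRUCT<k: STRING, v: INT>")

    # Build a map<string,string> from the struct's field names and values.
    field_names = df.schema["data"].dataType.fieldNames()
    kv_pairs = [part for name in field_names
                     for part in (F.lit(name), F.col(f"data.{name}").cast("string"))]

    mapped = df.withColumn("data", F.create_map(*kv_pairs))
    mapped.printSchema()   # data becomes map<string,string>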

1 More Replies
SimonNuss
by New Contributor II
  • 27965 Views
  • 6 replies
  • 5 kudos

Resolved! Databricks cannot access Azure Key Vault

I am trying to retrieve a secret from Azure Key Vault as follows: sqlPassword = dbutils.secrets.get(scope = "Admin", key = "SqlPassword") The scope has been created correctly, but I receive the following error message: com.databricks.common.clie...

Latest Reply
virahkumar
New Contributor II
  • 5 kudos

Sometimes turning it off and on again is underrated, so I gave up finding the problem, deleted it and re-created the scope - worked a breeze! Mine seems like it was something silly; I was able to set up my vault but got the same issue when trying to ...
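Before deleting and recreating the scope, a quick sanity check like the sketch below can confirm whether the scope and key are even visible to the workspace (scope and key names taken from the question):

    # List all secret scopes visible to this user/workspace.
    print(dbutils.secrets.listScopes())

    # List the keys registered in the scope (secret values stay redacted).
    print(dbutils.secrets.list("Admin"))

    sqlPassword = dbutils.secrets.get(scope="Admin", key="SqlPassword")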

5 More Replies
KutayKoralturk
by New Contributor
  • 7832 Views
  • 2 replies
  • 0 kudos

Filtering rows that do not contain a string

search = search.filter(!F.col("Name").contains("ABC")) search = search.filter(F.not(F.col("Name").contains("ABC")) Both methods fail due to a syntax error. Could you please help me filter rows that do not contain a certain string in PySpark? ^ Synta...

Latest Reply
User16857282152
Contributor
  • 0 kudos

Here is a complete example:
values = [("K1","true","false"),("K2","true","false")]
columns = ['Key', 'V1', 'V2']
df = spark.createDataFrame(values, columns)
display(df)
# FILTER
df2 = df.filter(df.V2 != "delete")
display(df2)
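Applied to the column and substring from the original question, the negation itself can be written either way in the sketch below (this assumes a DataFrame named search with a Name column, as in the post):

    from pyspark.sql import functions as F

    # ~ is PySpark's negation operator on Column expressions.
    search = search.filter(~F.col("Name").contains("ABC"))

    # Equivalent spelling with a SQL LIKE pattern.
    search = search.filter(~F.col("Name").like("%ABC%"))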

1 More Replies
SergeyIvanchuk
by New Contributor
  • 9135 Views
  • 4 replies
  • 0 kudos

Seaborn plot display in Databricks

I am using Seaborn version 0.7.1 and matplotlib version 1.5.3. The following code does not display a graph in the end. Any idea how to resolve this? (It works in the Python CLI on my local computer.) import seaborn as sns sns.set(style="darkgrid") tips = sns.lo...

Latest Reply
AbbyLemon
New Contributor II
  • 0 kudos

I found that you can create a similar comparison plot to what you get from seaborn by using display(sparkdf) and adding multiple columns to the 'Values' section while creating a 'Scatter plot'. You get to 'Customize Plot' by clicking on the icon ...
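If the goal is to keep the seaborn plot itself, one commonly suggested fix is to render onto a Matplotlib figure and hand that figure to the notebook explicitly. A sketch under that assumption; the boxplot is just an example that exists in seaborn 0.7.x.

    import seaborn as sns
    import matplotlib.pyplot as plt

    sns.set(style="darkgrid")
    tips = sns.load_dataset("tips")

    fig, ax = plt.subplots()
    sns.boxplot(x="day", y="total_bill", data=tips, ax=ax)

    # Databricks notebooks need the figure handed over explicitly.
    display(fig)        # plt.show() also works on newer runtimes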

3 More Replies
AlexRomano
by New Contributor
  • 6716 Views
  • 1 reply
  • 0 kudos

PicklingError: Could not pickle the task to send it to the workers.

I am using sklearn in a Databricks notebook to fit an estimator in parallel. Sklearn uses joblib with the loky backend to do this. Now, I have a file in Databricks which I can import my custom Classifier from, and everything works fine. However, if I lite...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi aromano, I know this issue was opened almost a year ago, but I faced the same problem and I was able to solve it, so I'm sharing the solution in order to help others. Probably, you're using SparkTrials to optimize the model's hyperparameters ...

Mir_SakhawatHos
by New Contributor II
  • 31850 Views
  • 2 replies
  • 3 kudos

How can I delete folders from my DBFS?

I want to delete my created folder from DBFS. But how? How can I download files from there?

Latest Reply
IA
New Contributor II
  • 3 kudos

Hello, Max's answer focuses on the CLI. Instead, using the Community Edition platform, proceed as follows (you must first delete all files in your folder): 1. import org.apache.hadoop.fs.{Path, FileSystem} 2. dbutils.fs.rm("/FileStore/tables/file.cs...
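For the common case, a short PySpark sketch of both parts of the question; the folder and file paths are examples only:

    # Recursively delete a folder and everything inside it.
    dbutils.fs.rm("/FileStore/tables/my_folder", recurse=True)

    # To download a file, put (or copy) it under /FileStore...
    dbutils.fs.cp("/FileStore/tables/report.csv", "/FileStore/exports/report.csv")

    # ...then fetch it in a browser at:
    #   https://<databricks-instance>/files/exports/report.csv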

1 More Replies
