Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

nthomas
by New Contributor
  • 11675 Views
  • 5 replies
  • 0 kudos

Tips for properly using large broadcast variables?

I'm using a broadcast variable of about 100 MB pickled in size, which I'm approximating with: >>> data = list(range(int(10*1e6))) >>> import cPickle as pickle >>> len(pickle.dumps(data)) 98888896 Running on a cluster with 3 c3.2xlarge executors, ...
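The size estimate in the question can be reproduced on Python 3 with a sketch like the one below. It assumes Python 3, where `cPickle` no longer exists and `pickle` uses the C implementation automatically. The helper name `pickled_size_bytes` and the smaller sample list are illustrative choices, not part of the original post, and the byte count will differ from the figure quoted above because the pickle protocol differs.

```python
import pickle


def pickled_size_bytes(obj):
    """Return the number of bytes `obj` occupies once pickled."""
    # Python 3 has no cPickle module; `pickle` picks the C accelerator itself.
    return len(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))


# A smaller stand-in for the 10-million-element list in the post,
# so the sketch runs quickly on any machine.
sample = list(range(100_000))
size = pickled_size_bytes(sample)
print(f"{size / 1e6:.2f} MB pickled")
```

Measuring the object before broadcasting it gives a rough idea of how much data each executor has to receive and deserialize.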

4 More Replies
JulioManuelNava
by New Contributor
  • 8839 Views
  • 2 replies
  • 0 kudos

[pyspark] foreach + print produces no output

The following code produces no output. It seems as if the print(x) is not being executed for each "words" element: words = sc.parallelize ( ["scala", "java", "hadoop", "spark", "akka", "spark vs hadoop", "pyspark", "pysp...
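A likely explanation, offered as a sketch rather than as the thread's accepted answer: on a cluster, `foreach` runs the function on the executors, so anything printed there goes to executor logs instead of the notebook output. Bringing a bounded number of elements back to the driver and printing them there avoids that. The helper name `print_on_driver` and its `limit` parameter are hypothetical; `take` is the standard RDD method.

```python
def print_on_driver(rdd, limit=20):
    """Print up to `limit` elements of an RDD from the driver process."""
    # take() returns a plain Python list on the driver, so print() writes
    # to the notebook or console output rather than to executor logs.
    rows = rdd.take(limit)
    for row in rows:
        print(row)
    return rows


# Usage on a cluster (assumes an existing SparkContext named `sc`):
#   words = sc.parallelize(["scala", "java", "hadoop"])
#   print_on_driver(words)
```

The `limit` guard matters because pulling an entire large RDD to the driver can exhaust its memory.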

1 More Replies
dchokkadi1_5588
by New Contributor II
  • 18984 Views
  • 8 replies
  • 0 kudos

Resolved! graceful dbutils mount/unmount

Is there a way to tell dbutils.fs.mount not to throw an error if the mount point is already mounted? And vice versa, for unmount not to throw an error if it is already unmounted? I am trying to run my notebook as a job and it has an init section that...

Latest Reply
Mariano_IrvinLo
New Contributor II
  • 0 kudos

If you use Scala to mount a Gen2 data lake, you could try something like this: /Gather relevant Keys/ var ServicePrincipalID = "" var ServicePrincipalKey = "" var DirectoryID = "" /Create configurations for our connection/ var configs = Map (...
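For Python notebooks, one way to make mounting and unmounting idempotent is to check `dbutils.fs.mounts()` first. This is a minimal sketch, not the approach from the truncated Scala reply above. The helper names `mount_if_absent` and `unmount_if_present` are hypothetical, and `dbutils` is passed in explicitly so the functions can be exercised outside a notebook.

```python
def _is_mounted(dbutils, mount_point):
    # dbutils.fs.mounts() lists current mounts; each entry has a mountPoint attribute.
    return any(m.mountPoint == mount_point for m in dbutils.fs.mounts())


def mount_if_absent(dbutils, source, mount_point, extra_configs=None):
    """Mount `source` at `mount_point` unless something is already mounted there."""
    if _is_mounted(dbutils, mount_point):
        return False  # nothing to do, and no error raised
    dbutils.fs.mount(
        source=source,
        mount_point=mount_point,
        extra_configs=extra_configs or {},
    )
    return True


def unmount_if_present(dbutils, mount_point):
    """Unmount `mount_point` only if it is currently mounted."""
    if not _is_mounted(dbutils, mount_point):
        return False
    dbutils.fs.unmount(mount_point)
    return True
```

Both helpers return a boolean indicating whether they changed anything, which a job's init section can log or ignore.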

7 More Replies
Barb
by New Contributor III
  • 10188 Views
  • 6 replies
  • 0 kudos

SQL charindex function?

Hi all, I need to use the SQL charindex function, but I'm getting a Databricks error that this doesn't exist. That can't be true, right? Thanks for any ideas about how to make this work! Barb

Latest Reply
Traveller
New Contributor II
  • 0 kudos

The best option I found to replace CHARINDEX was LOCATE. Examples from the Spark documentation are below: > SELECT locate('bar', 'foobarbar', 5); 7 > SELECT POSITION('bar' IN 'foobarbar'); 4
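To make the two results above easier to read, here is a pure-Python model of the 1-based search that LOCATE performs. It is only an illustration of the semantics shown in the examples, not Spark's implementation, and it deliberately covers only start positions of 1 or more.

```python
def locate(substr, s, pos=1):
    """1-based index of the first occurrence of `substr` in `s` at or after `pos`; 0 if absent."""
    if pos < 1:
        # This sketch only models start positions >= 1.
        raise ValueError("pos must be >= 1 in this sketch")
    # str.find is 0-based and returns -1 when nothing is found,
    # so adding 1 yields a 1-based position or 0.
    return s.find(substr, pos - 1) + 1


print(locate("bar", "foobarbar", 5))  # 7, as in the first SQL example
print(locate("bar", "foobarbar"))     # 4, as in the POSITION example
```

Note the argument order: LOCATE takes the substring first, whereas T-SQL's CHARINDEX users are used to the same order, so most calls translate directly.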

5 More Replies
SatheeshSathees
by New Contributor
  • 8732 Views
  • 1 reply
  • 0 kudos

how to dynamically explode array type column in pyspark or scala

Hi, I have a parquet file with complex column types with nested structs and arrays. I am using the script from the link below to flatten my parquet file. https://docs.microsoft.com/en-us/azure/synapse-analytics/how-to-analyze-complex-schema I am able ...

Latest Reply
shyam_9
Databricks Employee
  • 0 kudos

Hello, please check out the docs and notebook below, which have similar examples: https://docs.microsoft.com/en-us/azure/synapse-analytics/how-to-analyze-complex-schema https://docs.microsoft.com/en-us/azure/databricks/_static/notebooks/transform-comple...

zachary_jones
by New Contributor
  • 5964 Views
  • 3 replies
  • 0 kudos

Resolved! Python logging: 'Operation not supported' after upgrading to DBRT 6.1

My organization has an S3 bucket mounted to the Databricks filesystem under /dbfs/mnt. When using Databricks runtime 5.5 and below, the following logging code works correctly: log_file = '/dbfs/mnt/path/to/my/bucket/test.log' logger = logging.getLogg...

Latest Reply
lycenok
New Contributor II
  • 0 kudos

It's probably worth trying to override emit: https://docs.python.org/3/library/logging.html#handlers This works for me: class OurFileHandler(logging.FileHandler): def emit(self, record): # copied from https://github.com/python/cpython/bl...
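The reply above overrides `emit`, but its code is truncated. As an alternative, the sketch below swaps in a different technique: log to a local file and copy it to the target in one sequential write when the handler closes. It rests on the assumption that the error comes from the mount not supporting appends. The class name `CopyOnCloseFileHandler` is hypothetical, and the demo writes to a temporary directory instead of a real `/dbfs/mnt` path.

```python
import logging
import os
import shutil
import tempfile


class CopyOnCloseFileHandler(logging.FileHandler):
    """Write records to a local temp file and copy it to `target_path` on close."""

    def __init__(self, target_path):
        fd, local_path = tempfile.mkstemp(suffix=".log")
        os.close(fd)
        super().__init__(local_path)
        self.target_path = target_path
        self._copied = False

    def close(self):
        # Close the local stream first so everything is flushed to disk.
        super().close()
        if not self._copied:
            # One whole-file copy; no append is attempted on the target.
            shutil.copyfile(self.baseFilename, self.target_path)
            self._copied = True


# Demo: on Databricks, `target` would be a path under /dbfs/mnt instead.
demo_dir = tempfile.mkdtemp()
target = os.path.join(demo_dir, "test.log")
handler = CopyOnCloseFileHandler(target)
logger = logging.getLogger("copy_on_close_demo")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.info("hello from the sketch")
logger.removeHandler(handler)
handler.close()
```

The trade-off is that the target file only appears once the handler is closed, so a crashed job may leave the log on the driver's local disk only.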

2 More Replies
DimitrisMpizos
by New Contributor
  • 49712 Views
  • 16 replies
  • 0 kudos

Exporting data from databricks

I couldn't find in the documentation a way to export an RDD as a text file to a local folder using Python. Is it possible?

Latest Reply
Manu1
New Contributor II
  • 0 kudos

To export a file to your local desktop, here is a workaround: basically you have to do a "Create a table in notebook" with DBFS. The steps are: Click on "Data" icon > Click "Add Data" button > Click "DBFS" button > Click "FileStore" folder icon in 1st pane "Sele...

15 More Replies
MarcoMistroni
by New Contributor II
  • 18695 Views
  • 4 replies
  • 0 kudos

pandas.read_csv

Hi all, I have uploaded a file to my cluster at location /FileStore/tables/qmwxhxvi1505337108590/PastHires.csv. However, whenever I try to read it using pandas, df = pd.read_csv('dbfs:/FileStore/tables/qmwxhxvi1505337108590/PastHires.csv'), I always...

Latest Reply
rohitshah
New Contributor II
  • 0 kudos

I am also having the same issue. I uploaded a file to DBFS, and the default code it provides does not work. Has anyone solved this issue?
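A probable cause, sketched here under the assumption that the cluster exposes DBFS on the driver's local filesystem at `/dbfs`: pandas reads local paths and does not understand the `dbfs:/` scheme, so the URI has to be rewritten as a `/dbfs/...` path first. The helper name `dbfs_uri_to_local_path` is hypothetical.

```python
def dbfs_uri_to_local_path(uri):
    """Translate 'dbfs:/a/b' into '/dbfs/a/b'; return any other path unchanged."""
    prefix = "dbfs:/"
    if uri.startswith(prefix):
        # Drop the scheme and re-root the path under the local /dbfs mount.
        return "/dbfs/" + uri[len(prefix):].lstrip("/")
    return uri


path = dbfs_uri_to_local_path(
    "dbfs:/FileStore/tables/qmwxhxvi1505337108590/PastHires.csv"
)
print(path)

# On a cluster with pandas installed, the file could then be read with:
#   import pandas as pd
#   df = pd.read_csv(path)
```

Spark APIs such as `spark.read.csv` accept the `dbfs:/` form directly, so the conversion is only needed for libraries that use ordinary file I/O.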

3 More Replies
olisch
by New Contributor
  • 25822 Views
  • 3 replies
  • 0 kudos

Spark: How to simultaneously read from and write to the same parquet file

How can I read a DataFrame from a parquet file, do transformations and write this modified DataFrame back to the same parquet file? If I attempt to do so, I get an error, understandably because Spark reads from the source and one cannot writ...

Latest Reply
saravananraju
New Contributor II
  • 0 kudos

Hi, you can use insertInto instead of save. It will overwrite the target file, with no need to cache or persist your DataFrame: Df.write.format("parquet").mode("overwrite").insertInto("/file_path") ~Saravanan

2 More Replies
DineshKumar
by New Contributor III
  • 15502 Views
  • 3 replies
  • 0 kudos

How to convert the first row as column from an existing dataframe.

I have a dataframe like the one below. I want to convert the first row into column names for this dataframe. How could I do this? Is there any way to convert it directly (without using df.first)? usdata.show() -----+---+------------+------------+-------------------...

Latest Reply
User16857282152
Databricks Employee
  • 0 kudos

My point was that you are asking for column names from what you consider to be the "first row" and I am telling you that at scale, or if the data volume grows what you consider to be the "first row" may no longer actually be the "first row" unless ...

2 More Replies
RahulMukherjee
by New Contributor
  • 22531 Views
  • 1 reply
  • 1 kudos

I am trying to load a delta table from a dataframe. But it's giving me an error.

Code: from pyspark.sql.functions import * acDF = spark.read.format('csv').options(header='true', inferschema='true').load("/mnt/rahulmnt/Insurance_Info1.csv"); acDF.write.option("overwriteSchema", "true").format("delta").mode("overwrite").save("/delt...

Latest Reply
AbhaKhanna
New Contributor II
  • 1 kudos

1. using Spark SQL Context in python, scala notebooks : sql("SET spark.databricks.delta.formatCheck.enabled=false") 2. In SQL dbc notebooks: SET spark.databricks.delta.formatCheck.enabled=false

Anbazhagananbut
by New Contributor II
  • 4625 Views
  • 2 replies
  • 0 kudos

Pyspark Convert Struct Type to Map Type

Hello Sir, could you please advise on the scenario below, in PySpark 2.4.3 on Databricks, for loading data into a Delta table? I want to load the dataframe with this column "data" into the table as MapType in the Databricks Spark Delta table. Could you ...

Latest Reply
sherryellis
New Contributor II
  • 0 kudos

You can do it by making an API request to /api/2.0/clusters/permanent-delete. I don't see an option to delete or edit an automated cluster from the UI.

1 More Replies
SimonNuss
by New Contributor II
  • 39758 Views
  • 6 replies
  • 5 kudos

Resolved! Databricks cannot access Azure Key Vault

I am trying to retrieve a secret from Azure Key Vault as follows: sqlPassword = dbutils.secrets.get(scope = "Admin", key = "SqlPassword") The scope has been created correctly, but I receive the following error message: com.databricks.common.clie...

Latest Reply
virahkumar
New Contributor II
  • 5 kudos

Sometimes turning it off and on again is underrated, so I gave up finding the problem, deleted it and re-created the scope - worked a breeze! Mine seems like it was something silly, I was able to set up my vault but got the same issue when trying to ...

5 More Replies
KutayKoralturk
by New Contributor
  • 9885 Views
  • 2 replies
  • 0 kudos

Filtering rows that do not contain a string

search = search.filter(!F.col("Name").contains("ABC")) search = search.filter(F.not(F.col("Name").contains("ABC")) Both methods fail due to a syntax error. Could you please help me filter rows that do not contain a certain string in PySpark? ^ Synta...

Latest Reply
User16857282152
Databricks Employee
  • 0 kudos

Here is a complete example values = [("K1","true","false"),("K2","true","false")] columns = ['Key', 'V1', 'V2'] df = spark.createDataFrame(values, columns) display(df) FILTER df2 = df.filter(df.column2 != "delete") display(df2)
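For the original question specifically, PySpark negates a column condition with the `~` operator, since `!` is not valid Python and `not` is a reserved word. The sketch below wraps that in a helper; the name `exclude_rows_containing` is hypothetical, and it takes the DataFrame as a parameter so it needs no imports of its own.

```python
def exclude_rows_containing(df, column, needle):
    """Return the rows of a PySpark DataFrame whose `column` does not contain `needle`."""
    # df[column] yields a Column; contains() builds the substring test
    # and `~` inverts it.
    return df.filter(~df[column].contains(needle))


# Usage with the DataFrame from the question (assumes `search` already exists):
#   search = exclude_rows_containing(search, "Name", "ABC")
```

Rows where the column is null are also dropped, because the negated condition evaluates to null rather than true for them.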

1 More Replies
SergeyIvanchuk
by New Contributor
  • 13456 Views
  • 4 replies
  • 0 kudos

Seaborn plot display in Databricks

I am using Seaborn version 0.7.1 and matplotlib version 1.5.3. The following code does not display a graph at the end. Any idea how to resolve this? (It works in the Python CLI on my local computer.) import seaborn as sns sns.set(style="darkgrid") tips = sns.lo...

Latest Reply
AbbyLemon
New Contributor II
  • 0 kudos

I found that you can create a comparison plot similar to what you get from seaborn by using display(sparkdf) and adding multiple columns to the 'Values' section while creating a 'Scatter plot'. You get to the 'Customize Plot' by clicking on the icon ...

3 More Replies