Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

vibhakar
by New Contributor
  • 4397 Views
  • 3 replies
  • 1 kudos

Not able to mount ADLS Gen2 in Databricks

py4j.security.Py4JSecurityException: Method public com.databricks.backend.daemon.dbutils.DBUtilsCore$Result com.databricks.backend.daemon.dbutils.DBUtilsCore.mount(java.lang.String,java.lang.String,java.lang.String,java.lang.String,java.util.Map) is ...

Latest Reply
cpradeep
New Contributor II
  • 1 kudos

Hi, have you sorted this issue? Can you please let me know the solution?

2 More Replies
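For reference, the documented pattern for mounting ADLS Gen2 with a service principal is sketched below; every value in angle brackets is a placeholder. The Py4JSecurityException above is commonly seen on clusters where dbutils.fs.mount is blocked (for example, with table ACLs or credential passthrough enabled), so the fix is often cluster configuration rather than syntax.

configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type":
        "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": "<application-id>",
    "fs.azure.account.oauth2.client.secret": dbutils.secrets.get("<scope>", "<key>"),
    "fs.azure.account.oauth2.client.endpoint":
        "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
}

# Mount the container; source, mount point, and configs are placeholders.
dbutils.fs.mount(
    source="abfss://<container>@<storage-account>.dfs.core.windows.net/",
    mount_point="/mnt/<mount-name>",
    extra_configs=configs,
)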
ACK
by New Contributor II
  • 3130 Views
  • 2 replies
  • 2 kudos

Resolved! How do I pass kwargs to wheel method?

Hi, I have a method named main; it takes **kwargs as a parameter. def main(**kwargs): parameterOne = kwargs["param-one"] parameterTwo = kwargs["param-two"] parameterThree = kwargs["param-optional-one"] if "param-optional-one" in kwargs else...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 2 kudos

These are command-line parameters, so they arrive like --param-one=test. You can test it with ArgumentParser:

from argparse import ArgumentParser

parser = ArgumentParser()
parser.add_argument("--param-one", dest="parameterOne")

args = parser.parse_args()

1 More Replies
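A minimal sketch bridging the parsed command-line arguments back into the existing main(**kwargs) signature; the hyphenated key names follow the original question, and the print is just for illustration.

from argparse import ArgumentParser

def main(**kwargs):
    # Keys keep the hyphenated names used in the question.
    parameter_one = kwargs["param-one"]
    parameter_two = kwargs["param-two"]
    print(parameter_one, parameter_two)

if __name__ == "__main__":
    parser = ArgumentParser()
    # An explicit hyphenated dest makes vars() yield the keys main() expects.
    parser.add_argument("--param-one", dest="param-one")
    parser.add_argument("--param-two", dest="param-two")
    main(**vars(parser.parse_args()))

Invoked as: python entry.py --param-one=test --param-two=other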
patojo94
by New Contributor II
  • 6943 Views
  • 0 replies
  • 0 kudos

Adding deduplication method to spark streaming

Hi everyone, I am having some trouble adding a deduplication step to a file stream that is already running. The code I am trying to add is this one: df = df.withWatermark("arrival_time", "20 minutes")\ .dropDuplicates(["event_id", "arrival_time"])...

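For reference, a minimal sketch of watermarked deduplication, assuming the column names from the question and a placeholder Delta source. One caveat worth noting: dropDuplicates is a stateful operator, and adding it to a query that restarts from an existing checkpoint is generally an unsupported change, so a new checkpoint location is usually required.

# Placeholder source; in the question this is a file stream.
df = (spark.readStream
      .format("delta")
      .load("/path/to/source"))

deduped = (df
    .withWatermark("arrival_time", "20 minutes")
    .dropDuplicates(["event_id", "arrival_time"]))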
User16826992666
by Valued Contributor
  • 1857 Views
  • 3 replies
  • 2 kudos

Resolved! What is the best method for bringing an already trained model into MLflow?

I already have a trained and saved model that was created outside of MLflow. What is the best way to handle it if I want this model to be added to an MLflow experiment?

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Trevor Bishop, just wanted to check in: were you able to resolve your issue, or do you need more help? We'd love to hear from you. Thanks!

2 More Replies
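One common approach is to load the saved model and log it to a run. The sketch below assumes a scikit-learn model, and load_pretrained_model is a hypothetical loader for the existing artifact; other model flavors (mlflow.pytorch, mlflow.tensorflow, or a custom mlflow.pyfunc wrapper) have analogous log_model calls.

import mlflow
import mlflow.sklearn

model = load_pretrained_model()  # hypothetical loader for the existing model

with mlflow.start_run():
    # Logs the model as a run artifact, attaching it to the experiment.
    mlflow.sklearn.log_model(model, artifact_path="model")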
Jreco
by Contributor
  • 3725 Views
  • 4 replies
  • 1 kudos

Resolved! Method iterableAsScalaIterable does not exist Pydeequ

Hello, I'm using Databricks and pydeequ to build a QA step in structured streaming. One of the analyzers that I need to use is Uniqueness. If I add another one like Completeness, it works properly, but if I add Uniqueness I get an error: py4j....

Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

I think it is because you did not attach the libraries to the cluster. When you work with a notebook, the SparkSession is already created. To add libraries, you should install them on the cluster (in the Compute tab), e.g. from PyPI or Maven.

3 More Replies
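Once the deequ JAR and the pydeequ package are attached to the cluster, a minimal analyzer run looks like the sketch below; the column name is hypothetical. Note that Uniqueness takes a list of columns, while Completeness takes a single column.

from pydeequ.analyzers import (AnalysisRunner, AnalyzerContext,
                               Completeness, Uniqueness)

result = (AnalysisRunner(spark)
          .onData(df)                             # df is the DataFrame under test
          .addAnalyzer(Completeness("event_id"))  # single column
          .addAnalyzer(Uniqueness(["event_id"]))  # list of columns
          .run())

metrics = AnalyzerContext.successMetricsAsDataFrame(spark, result)
metrics.show()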
wyzer
by Contributor II
  • 2089 Views
  • 2 replies
  • 1 kudos

Resolved! Are we using the advantage of "Map & Reduce" ?

Hello, we are new to Databricks and we would like to know if our working method is good. Currently, we are working like this: spark.sql("CREATE TABLE Temp (SELECT avg(***), sum(***) FROM aaa LEFT JOIN bbb WHERE *** >= ***)") With this method, are we us...

Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

Spark will handle the map/reduce for you. So as long as you use Spark-provided functions, be it in Scala, Python, or SQL (or even R), you will be using distributed processing. You just care about what you want as a result. And afterwards when you are more...

1 More Replies
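For illustration, the same kind of aggregation written with the DataFrame API; table, key, and column names are placeholders. Both the SQL string and this form go through the same Catalyst optimizer and execute as distributed jobs.

from pyspark.sql import functions as F

result = (spark.table("aaa")
          .join(spark.table("bbb"), on="some_key", how="left")
          .agg(F.avg("some_col"), F.sum("some_col")))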
missyT
by New Contributor III
  • 1513 Views
  • 1 reply
  • 3 kudos

Is there a reason lists don't have a .sum() method?

I do a lot of work with numpy arrays and pytorch tensors, but occasionally throw some native lists around. I naturally want to write <list>.sum(), which would work for these other third-party iterables, but doesn't work for native lists. It'd be very ...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 3 kudos

I think the reason is that a list can contain different types of objects than just integers and floats (nested lists, strings, and all other kinds of objects), so it doesn't make sense to implement a .sum() method; it would fail in many cases.

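A quick illustration of the point: the built-in sum() already covers the numeric case, and a heterogeneous list shows why a generic list.sum() method would fail.

values = [1, 2.5, 3]
print(sum(values))  # 6.5

mixed = [1, "a", [2]]
# sum(mixed) would raise TypeError: unsupported operand type(s) for +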
Anonymous
by Not applicable
  • 1663 Views
  • 0 replies
  • 0 kudos

Is the "patch"/update method of the repos API synchronous?

The repos API has a patch method to update a repo in the workspace (to do a git pull). We would like to verify: is this method fully synchronous? Is it guaranteed to only return a 200 after the update is complete? Or, would immediately referenc...

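For reference, the call in question, sketched with the requests library; workspace URL, token, and repo ID are placeholders. Whether the 200 response guarantees the pull has fully completed is exactly what the question asks, and is not asserted here.

import requests

resp = requests.patch(
    "https://<workspace-url>/api/2.0/repos/<repo-id>",
    headers={"Authorization": "Bearer <personal-access-token>"},
    json={"branch": "main"},  # update the checkout to this branch
)
resp.raise_for_status()
print(resp.json())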