Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

vibhakar
by New Contributor
  • 4397 Views
  • 3 replies
  • 1 kudos

Not able to mount ADLS Gen2 in Databricks

py4j.security.Py4JSecurityException: Method public com.databricks.backend.daemon.dbutils.DBUtilsCore$Result com.databricks.backend.daemon.dbutils.DBUtilsCore.mount(java.lang.String,java.lang.String,java.lang.String,java.lang.String,java.util.Map) is ...

Latest Reply
cpradeep
New Contributor II
  • 1 kudos

Hi, have you sorted this issue? Can you please let me know the solution?

2 More Replies
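For reference, the documented pattern for mounting ADLS Gen2 with a service principal is sketched below; every value in angle brackets is a placeholder. The Py4JSecurityException above is commonly seen on clusters where dbutils.fs.mount is blocked (for example, with table ACLs or credential passthrough enabled), so the fix is often cluster configuration rather than syntax.

configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type":
        "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": "<application-id>",
    "fs.azure.account.oauth2.client.secret": dbutils.secrets.get("<scope>", "<key>"),
    "fs.azure.account.oauth2.client.endpoint":
        "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
}

# Mount the container; source, mount point, and configs are placeholders.
dbutils.fs.mount(
    source="abfss://<container>@<storage-account>.dfs.core.windows.net/",
    mount_point="/mnt/<mount-name>",
    extra_configs=configs,
)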
ACK
by New Contributor II
  • 3130 Views
  • 2 replies
  • 2 kudos

Resolved! How do I pass kwargs to wheel method?

Hi, I have a method named main; it takes **kwargs as a parameter. def main(**kwargs): parameterOne = kwargs["param-one"] parameterTwo = kwargs["param-two"] parameterThree = kwargs["param-optional-one"] if "param-optional-one" in kwargs else...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 2 kudos

These are command-line parameters, so they arrive like --param-one=test. You can test it with ArgumentParser:

from argparse import ArgumentParser

parser = ArgumentParser()
parser.add_argument("--param-one", dest="parameterOne")

args = parser.parse_args()

1 More Replies
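A minimal sketch bridging the parsed command-line arguments back into the existing main(**kwargs) signature; the hyphenated key names follow the original question, and the print is just for illustration.

from argparse import ArgumentParser

def main(**kwargs):
    # Keys keep the hyphenated names used in the question.
    parameter_one = kwargs["param-one"]
    parameter_two = kwargs["param-two"]
    print(parameter_one, parameter_two)

if __name__ == "__main__":
    parser = ArgumentParser()
    # An explicit hyphenated dest makes vars() yield the keys main() expects.
    parser.add_argument("--param-one", dest="param-one")
    parser.add_argument("--param-two", dest="param-two")
    main(**vars(parser.parse_args()))

Invoked as: python entry.py --param-one=test --param-two=other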
patojo94
by New Contributor II
  • 6943 Views
  • 0 replies
  • 0 kudos

Adding deduplication method to spark streaming

Hi everyone, I am having some trouble adding a deduplication step to a file stream that is already running. The code I am trying to add is this one: df = df.withWatermark("arrival_time", "20 minutes")\ .dropDuplicates(["event_id", "arrival_time"])...

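For reference, a minimal sketch of watermarked deduplication, assuming the column names from the question and a placeholder Delta source. One caveat worth noting: dropDuplicates is a stateful operator, and adding it to a query that restarts from an existing checkpoint is generally an unsupported change, so a new checkpoint location is usually required.

# Placeholder source; in the question this is a file stream.
df = (spark.readStream
      .format("delta")
      .load("/path/to/source"))

deduped = (df
    .withWatermark("arrival_time", "20 minutes")
    .dropDuplicates(["event_id", "arrival_time"]))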
User16826992666
by Valued Contributor
  • 1857 Views
  • 3 replies
  • 2 kudos

Resolved! What is the best method for bringing an already trained model into MLflow?

I already have a trained and saved model that was created outside of MLflow. What is the best way to handle it if I want this model to be added to an MLflow experiment?

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Trevor Bishop, just wanted to check in: were you able to resolve your issue, or do you need more help? We'd love to hear from you. Thanks!

2 More Replies
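One common approach is to load the saved model and log it to a run. The sketch below assumes a scikit-learn model, and load_pretrained_model is a hypothetical loader for the existing artifact; other model flavors (mlflow.pytorch, mlflow.tensorflow, or a custom mlflow.pyfunc wrapper) have analogous log_model calls.

import mlflow
import mlflow.sklearn

model = load_pretrained_model()  # hypothetical loader for the existing model

with mlflow.start_run():
    # Logs the model as a run artifact, attaching it to the experiment.
    mlflow.sklearn.log_model(model, artifact_path="model")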
Jreco
by Contributor
  • 3725 Views
  • 4 replies
  • 1 kudos

Resolved! Method iterableAsScalaIterable does not exist Pydeequ

Hello, I'm using Databricks and pydeequ to build a QA step in structured streaming. One of the analyzers that I need to use is Uniqueness. If I add another one like Completeness, it works properly, but if I add Uniqueness I get an error: py4j....

Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

I think it is because you did not attach the libraries to the cluster. When you work with a notebook, the SparkSession is already created. To add libraries, you should install them on the cluster (in the Compute tab), e.g. from PyPI or Maven.

3 More Replies
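Once the deequ JAR and the pydeequ package are attached to the cluster, a minimal analyzer run looks like the sketch below; the column name is hypothetical. Note that Uniqueness takes a list of columns, while Completeness takes a single column.

from pydeequ.analyzers import (AnalysisRunner, AnalyzerContext,
                               Completeness, Uniqueness)

result = (AnalysisRunner(spark)
          .onData(df)                             # df is the DataFrame under test
          .addAnalyzer(Completeness("event_id"))  # single column
          .addAnalyzer(Uniqueness(["event_id"]))  # list of columns
          .run())

metrics = AnalyzerContext.successMetricsAsDataFrame(spark, result)
metrics.show()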
wyzer
by Contributor II
  • 2089 Views
  • 2 replies
  • 1 kudos

Resolved! Are we using the advantage of "Map & Reduce" ?

Hello, we are new to Databricks and we would like to know if our working method is good. Currently, we are working like this: spark.sql("CREATE TABLE Temp (SELECT avg(***), sum(***) FROM aaa LEFT JOIN bbb WHERE *** >= ***)") With this method, are we us...

Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

Spark will handle the map/reduce for you. So as long as you use Spark-provided functions, be it in Scala, Python, or SQL (or even R), you will be using distributed processing. You just care about what you want as a result. And afterwards when you are more...

1 More Replies
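For illustration, the same kind of aggregation written with the DataFrame API; table, key, and column names are placeholders. Both the SQL string and this form go through the same Catalyst optimizer and execute as distributed jobs.

from pyspark.sql import functions as F

result = (spark.table("aaa")
          .join(spark.table("bbb"), on="some_key", how="left")
          .agg(F.avg("some_col"), F.sum("some_col")))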
missyT
by New Contributor III
  • 1513 Views
  • 1 reply
  • 3 kudos

Is there a reason lists don't have a .sum() method?

I do a lot of work with numpy arrays and pytorch tensors, but occasionally throw some native lists around. I naturally want to write <list>.sum(), which would work for these other third-party iterables, but doesn't work for native lists. It'd be very ...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 3 kudos

I think the reason is that a list can contain different types of objects than just integers and floats (nested lists, strings, and all other kinds of objects), so it doesn't make sense to implement a .sum() method; it would fail in many cases.

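A quick illustration of the point: the built-in sum() already covers the numeric case, and a heterogeneous list shows why a generic list.sum() method would fail.

values = [1, 2.5, 3]
print(sum(values))  # 6.5

mixed = [1, "a", [2]]
# sum(mixed) would raise TypeError: unsupported operand type(s) for +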
Anonymous
by Not applicable
  • 1663 Views
  • 0 replies
  • 0 kudos

Is the "patch"/update method of the repos API synchronous?

The repos API has a patch method to update a repo in the workspace (to do a git pull). We would like to verify: is this method fully synchronous? Is it guaranteed to only return a 200 after the update is complete? Or, would immediately referenc...

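For reference, the call in question, sketched with the requests library; workspace URL, token, and repo ID are placeholders. Whether the 200 response guarantees the pull has fully completed is exactly what the question asks, and is not asserted here.

import requests

resp = requests.patch(
    "https://<workspace-url>/api/2.0/repos/<repo-id>",
    headers={"Authorization": "Bearer <personal-access-token>"},
    json={"branch": "main"},  # update the checkout to this branch
)
resp.raise_for_status()
print(resp.json())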