Data Engineering

Forum Posts

by kumarPerry • New Contributor II

04-11-2023 10:46:49 AM

3940 Views
3 replies
0 kudos

Notebook connectivity issue with aws s3 bucket using mounting

When connecting to an AWS S3 bucket using DBFS, the application throws an error like org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 7864387.0 failed 4 times, most recent failure: Lost task 0.3 in stage 7864387.0 (TID 1709732...

Latest Reply

Anonymous
Not applicable

04-15-2023 11:50:12 PM

0 kudos

Hi @Amrendra Kumar, hope everything is going great. Just wanted to check in to see if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us s...

2 More Replies

by jonathan-dufaul • Valued Contributor

12-30-2022 10:56:02 AM

2772 Views
3 replies
3 kudos

Resolved! Why does chaining spark.read from one system/driver and .write to another system/driver take so much longer than doing each piece individually?

I am reading data from IBM DB2 and saving it into MS SQL Server (the first step is moving the code itself to Databricks, and then we will move the databases to Databricks itself). The problem I'm running into is that doing something like the below will take > ...

Latest Reply

Hubert-Dudek
Esteemed Contributor III

01-02-2023 7:56:11 AM

3 kudos

Hi, it is related to partitioning optimization. By default, the JDBC driver queries the source database with only a single thread. Because the read created only one partition, the write also ran from that one partition and used a single core. When you used pandas, it d...
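The single-threaded JDBC read described in this reply can usually be avoided by asking Spark to split the read into several range queries. The following is a minimal sketch, not the poster's code: the URL, table name, column name, and bounds are hypothetical placeholders, and it assumes the source table has a numeric column suitable for splitting. The option names (`partitionColumn`, `lowerBound`, `upperBound`, `numPartitions`) are Spark's JDBC data source options.

```python
from typing import Dict


def partitioned_jdbc_options(url: str, table: str, column: str,
                             lower: int, upper: int,
                             partitions: int) -> Dict[str, str]:
    """Build Spark JDBC reader options that split one read into parallel queries."""
    if partitions < 1:
        raise ValueError("partitions must be at least 1")
    if lower >= upper:
        raise ValueError("lower must be smaller than upper")
    return {
        "url": url,
        "dbtable": table,
        # Spark expects these four options to be set together.
        "partitionColumn": column,
        "lowerBound": str(lower),
        "upperBound": str(upper),
        "numPartitions": str(partitions),
    }


def read_in_parallel(spark, options: Dict[str, str]):
    """Load a DataFrame through the JDBC source using the given options."""
    return spark.read.format("jdbc").options(**options).load()


# Hypothetical values, for illustration only.
opts = partitioned_jdbc_options(
    url="jdbc:db2://example-host:50000/SAMPLE",
    table="MYSCHEMA.ORDERS",
    column="ORDER_ID",
    lower=1,
    upper=1_000_000,
    partitions=8,
)
```

On a cluster, `read_in_parallel(spark, opts)` would return a DataFrame with up to 8 partitions, so a following `.write` can use several cores. The bounds only decide how the ranges are cut; they do not filter rows.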

2 More Replies

by Bit-Warrior • New Contributor

09-16-2022 11:52:43 AM

924 Views
0 replies
0 kudos

Installing System ML on the cluster

I am trying to install the systemml package from Maven. I ignored the libraries log4j:log4j, com:sun.jdmk, com:sun.jmx, javax:jms. But when I run one systemml command, Spark/Databricks can no longer select from tables, effectively breaking somet...

by Raymond_Garcia • Contributor II

05-19-2022 9:23:03 AM

3520 Views
4 replies
2 kudos

Migrating from Databricks Notebooks to IDE for Development

Hello, we are developers who have been creating a system in Databricks with Scala. We enabled the Git feature, so the project is in a repository. The project has a lot of notebooks and a lot of calls to other notebooks. Sometimes it is a little overw...

Latest Reply

Raymond_Garcia
Contributor II

06-24-2022 11:22:23 AM

2 kudos

It is true that we can't work without Databricks, but we can develop in an IDE and send the JAR to Databricks. This will allow us to create unit tests and use the IDE's capabilities (e.g., fast navigation among classes).

3 More Replies

by snoeprol • New Contributor II

10-17-2021 5:25:45 AM

6735 Views
3 replies
2 kudos

Resolved! Unable to open files with python, but filesystem shows files exist

Dear community, I have the following problem: I have uploaded a file of an ML model and have transferred it to the directory with %fs mv '/FileStore/Tree_point_classification-1.dlpk' '/dbfs/mnt/group22/Tree_point_classification-1.dlpk' When I now check ...

Latest Reply

Hubert-Dudek
Esteemed Contributor III

10-18-2021 3:38:36 AM

2 kudos

dbfs:/dbfs/ is displayed, so maybe the file is in the /dbfs/dbfs directory? Please check it and try to open it with open('/dbfs/dbfs. You can also use "Data" from the left menu to check what is in the DBFS file system more easily.
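The path mix-up suggested in this reply can be sketched as a small helper. It assumes the usual Databricks layout, where the DBFS root is exposed to local file APIs such as Python's `open` under `/dbfs`; the function name is hypothetical, and the `dbfs:/dbfs/mnt/...` location is an assumption based on the `dbfs:/dbfs/` prefix mentioned in the reply.

```python
def dbfs_uri_to_local_path(uri: str) -> str:
    """Translate a dbfs:/ URI into the /dbfs/... path seen by local file APIs."""
    prefix = "dbfs:/"
    if not uri.startswith(prefix):
        raise ValueError(f"not a dbfs:/ URI: {uri!r}")
    # Everything after "dbfs:/" sits under the local /dbfs mount point.
    return "/dbfs/" + uri[len(prefix):].lstrip("/")


# Assumed location of the file after the %fs mv from the question: the
# destination already began with /dbfs, so that segment appears twice.
moved_to = "dbfs:/dbfs/mnt/group22/Tree_point_classification-1.dlpk"
local_path = dbfs_uri_to_local_path(moved_to)
print(local_path)  # /dbfs/dbfs/mnt/group22/Tree_point_classification-1.dlpk
```

If this assumption holds, `open(local_path, "rb")` would be the call to try from Python, and moving the file with a destination of `/mnt/group22/...` (without the leading `/dbfs`) would avoid the doubled segment.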

2 More Replies

Databricks Community
