Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

kumarPerry
by New Contributor II
  • 2955 Views
  • 3 replies
  • 0 kudos

Notebook connectivity issue with an AWS S3 bucket using mounting

When connecting to an AWS S3 bucket using DBFS, the application throws an error like org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 7864387.0 failed 4 times, most recent failure: Lost task 0.3 in stage 7864387.0 (TID 1709732...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Amrendra Kumar, hope everything is going great. Just wanted to check in to see if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us s...

2 More Replies
jonathan-dufaul
by Valued Contributor
  • 1816 Views
  • 3 replies
  • 3 kudos

Resolved! Why does chaining spark.read from one system/driver and .write to another system/driver take so much longer than doing each piece individually?

I am reading data from IBM DB2 and saving it into MS SQL Server (the first step is moving the code itself to Databricks, and then we will move the databases to Databricks itself). The problem I'm running into is that doing something like the below will take > ...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 3 kudos

Hi, it is related to partitioning. By default, the JDBC driver queries the source database with only a single thread. Only one partition was created, so the write also ran from that single partition and used a single core. When you used pandas, it d...
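To illustrate the single-partition point above, here is a minimal sketch of how a partitioned JDBC read could be configured. The option names (`partitionColumn`, `lowerBound`, `upperBound`, `numPartitions`) are standard Spark JDBC options; the helper `jdbc_partition_options`, the URL, the table, and the column are hypothetical placeholders, not taken from the original thread.

```python
def jdbc_partition_options(url, table, partition_column,
                           lower_bound, upper_bound, num_partitions):
    """Build the option dict for a partitioned Spark JDBC read.

    Spark splits the range [lower_bound, upper_bound) of partition_column
    into num_partitions strides and issues one query per stride, so the
    read (and a following write) can use more than one core.
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    if lower_bound >= upper_bound:
        raise ValueError("lower_bound must be smaller than upper_bound")
    return {
        "url": url,
        "dbtable": table,
        "partitionColumn": partition_column,
        "lowerBound": str(lower_bound),
        "upperBound": str(upper_bound),
        "numPartitions": str(num_partitions),
    }


# Hypothetical connection details, for illustration only.
options = jdbc_partition_options(
    url="jdbc:db2://db2-host:50000/SAMPLE",
    table="MYSCHEMA.ORDERS",
    partition_column="ORDER_ID",  # assumed numeric column
    lower_bound=1,
    upper_bound=1_000_000,
    num_partitions=8,
)

# On a cluster with a SparkSession named `spark`, the read would look like:
# df = spark.read.format("jdbc").options(**options).load()
```

The bounds only decide how the column range is split into strides; they do not filter rows, so values outside the range still end up in the first or last partition.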

2 More Replies
Bit-Warrior
by New Contributor
  • 658 Views
  • 0 replies
  • 0 kudos

Installing System ML on the cluster

I am trying to install the systemml package from Maven. I ignored the libraries log4j:log4j, com:sun.jdmk, com:sun.jmx, javax:jms. But when I run one command of systemml, Spark/Databricks can no longer select from tables, effectively breaking somet...

Raymond_Garcia
by Contributor II
  • 2492 Views
  • 4 replies
  • 2 kudos

Migrating from Databricks Notebooks to IDE for Development

Hello, we are developers who have been creating a system in Databricks with Scala. We enabled the Git feature, so the project is in a repository. The project has a lot of notebooks and a lot of calls to other notebooks. Sometimes it is a little overw...

Latest Reply
Raymond_Garcia
Contributor II
  • 2 kudos

It is true that we can't work without Databricks, but we can develop in an IDE and send the JAR to Databricks. This will allow us to create unit tests and use the IDE's capabilities (e.g., fast navigation among classes).

3 More Replies
snoeprol
by New Contributor II
  • 5157 Views
  • 3 replies
  • 2 kudos

Resolved! Unable to open files with python, but filesystem shows files exist

Dear community, I have the following problem: I have uploaded a file of an ML model and have transferred it to the directory with %fs mv '/FileStore/Tree_point_classification-1.dlpk' '/dbfs/mnt/group22/Tree_point_classification-1.dlpk'. When I now check ...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 2 kudos

The listing shows dbfs:/dbfs/, so maybe the file is in the /dbfs/dbfs directory? Please check it and try to open it with open('/dbfs/dbfs. You can also use "Data" in the left menu to check what is in the DBFS file system more easily.
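As a rough sketch of why the path ends up doubled: `%fs` commands take DBFS paths, while Python's `open()` sees DBFS through the local `/dbfs` mount. So a `%fs mv` target that starts with `/dbfs/...` becomes `dbfs:/dbfs/...`, which local file APIs see as `/dbfs/dbfs/...`. The helper name `dbfs_uri_to_local_path` below is hypothetical and assumes the standard `/dbfs` mount is available on the cluster.

```python
def dbfs_uri_to_local_path(dbfs_uri: str) -> str:
    """Map a dbfs:/ URI to the path that local file APIs such as open() use."""
    prefix = "dbfs:/"
    if not dbfs_uri.startswith(prefix):
        raise ValueError(f"not a dbfs:/ URI: {dbfs_uri!r}")
    # Everything after 'dbfs:/' is appended under the local /dbfs mount point.
    return "/dbfs/" + dbfs_uri[len(prefix):].lstrip("/")


# The destination from the question, as DBFS stores it after the %fs mv:
local_path = dbfs_uri_to_local_path(
    "dbfs:/dbfs/mnt/group22/Tree_point_classification-1.dlpk"
)
print(local_path)  # /dbfs/dbfs/mnt/group22/Tree_point_classification-1.dlpk
```

If the intent was to place the file under the mount itself, the `%fs mv` destination would be written without the leading `/dbfs`, since that prefix is only needed for local file APIs.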

2 More Replies