Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

rammy
by Contributor III
  • 1500 Views
  • 1 replies
  • 5 kudos

Not able to parse a .doc file using Scala in a Databricks notebook?

I was able to parse .doc files in Java with the help of the Apache POI libraries, but when I converted the Java code to Scala, which I expected to work with the same Java libraries, it shows the following erro...

Latest Reply
UmaMahesh1
Honored Contributor III
  • 5 kudos

Hi @Ramesh Bathini, in PySpark we have a docx module. I found it to work perfectly fine. Can you try using that? Documentation and examples can be found online. Cheers...

pabloaus
by New Contributor III
  • 5010 Views
  • 2 replies
  • 4 kudos

Resolved! How to read a SQL file from a Repo into a string

I am trying to read a SQL file in the repo into a string. I have tried:
with open("/Workspace/Repos/xx@***.com//file.sql", "r") as queryFile:
    queryText = queryFile.read()
and I get the following error:
[Errno 1] Operation not permitted: '/Workspace/Repos/***@*...

Latest Reply
Senthil1
Contributor
  • 4 kudos

I checked on my Unity Catalog-enabled cluster, and I am able to access the file in Repos, read it, and display it.

1 More Replies
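The permission error in this thread comes from the cluster environment rather than the code (the reply reports the same read working on a Unity Catalog-enabled cluster), so no code change fixes it on its own. For reference, here is a minimal Scala sketch of the read itself, loading a .sql file into a string. The object name `SqlFileReader` is hypothetical, and it uses try/finally rather than `scala.util.Using` so that it also compiles on Scala 2.12.

```scala
import scala.io.Source

// Hypothetical helper: reads a whole SQL file into one string.
object SqlFileReader {

  // Reads the file as UTF-8 and always closes the underlying handle,
  // even if reading throws.
  def readSql(path: String): String = {
    val source = Source.fromFile(path, "UTF-8")
    try source.mkString
    finally source.close()
  }
}
```

In a notebook, the resulting string could then be passed to `spark.sql(...)`, which runs a single statement. The path must be the file's full workspace path; the one in the question is partly masked, so it is not reproduced here.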
Ryan_Chynoweth
by Esteemed Contributor
  • 6987 Views
  • 3 replies
  • 7 kudos

Resolved! Best language to use

Databricks supports SQL, Scala, Python, and R. Is there a most performant language to use on Databricks? I know SQL well but would like to get into one of the other languages and don't know which to focus on.

Latest Reply
Anonymous
Not applicable
  • 7 kudos

It totally depends on you. BTW, you can choose Python and SQL.

2 More Replies
Rahul_Tiwary
by New Contributor II
  • 5365 Views
  • 2 replies
  • 4 kudos

Getting error "java.lang.NoSuchMethodError: org.apache.spark.sql.AnalysisException" while writing data to Event Hubs for streaming. It works fine if I write it to another Databricks table

import org.apache.spark.sql._
import scala.collection.JavaConverters._
import com.microsoft.azure.eventhubs._
import java.util.concurrent._
import scala.collection.immutable._
import org.apache.spark.eventhubs._
import scala.concurrent.Future
import scala.c...

Latest Reply
Gepap
New Contributor II
  • 4 kudos

The DataFrame to write needs to have the following schema:
Column                  | Type
----------------------------------------------
body (required)         | string or binary
partitionId (*optional) | string
partitionKey...

1 More Replies
BkP
by Contributor
  • 2032 Views
  • 3 replies
  • 3 kudos

Scala Connectivity to Databricks Bronze Layer Raw Data from a Non-Databricks Spark environment

Hi all, we are developing a new Scala/Java program that needs to read and process, in parallel, the raw data stored in the source ADLS (which is a Databricks environment), as the volume of the source data is very high (GBs to TBs). What kind of connection ...

Latest Reply
BkP
Contributor
  • 3 kudos

Hello experts, any advice on this question? I'm tagging some folks from whom I have received answers before. Please help with this requirement or tag someone who can help: @Kaniz Fatma, @Vartika Nain, @Bilal Aslam

2 More Replies
pret
by New Contributor II
  • 2839 Views
  • 4 replies
  • 0 kudos

How can I run a Scala command line in Databricks?

I wish to run a Scala command, which I believe would normally be run from a Scala command line rather than from within a notebook. It happens to be:
scala [-cp scalatest-<version>.jar:...] org.scalatest.tools.Runner [arguments]
(scalatest_2.12__3.0.8.j...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @David Vardy, hope all is well! Just wanted to check in to see whether you were able to resolve your issue. Would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Thanks...

3 More Replies
isaac_gritz
by Valued Contributor II
  • 3083 Views
  • 6 replies
  • 8 kudos

Library Dependency

How to Install Libraries on Databricks
You can install libraries in Databricks at the cluster level for libraries commonly used on a cluster, at the notebook level using %pip, or using global init scripts when you have libraries that should be install...

Latest Reply
Chris_Shehu
Valued Contributor III
  • 8 kudos

It can be risky to install libraries without any sort of oversight/security structure to ensure those libraries have no vulnerabilities. I think more caution needs to be added to the wording of these documents to express that. All of the libraries w...

5 More Replies
learnerbricks
by New Contributor II
  • 1215 Views
  • 2 replies
  • 0 kudos

How should I get started with Databricks?

Hello guys, I am new to Databricks. I have tried to read as much of the documentation as I can. Now I want to jump in. What I want: I have stored my Parquet file in the Databricks storage system. I want to load this file into a Data Lake table. And then want to do ...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Learner bricks, hope all is well! Just wanted to check in to see whether you were able to resolve your issue. Would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Tha...

1 More Replies
齐木木
by New Contributor III
  • 1437 Views
  • 1 replies
  • 3 kudos

Resolved! The case class reports an error when running in the notebook

As shown in the figure, the case class and the JSON string are converted through fasterxml.jackson, but an unexpected error occurred while running the code. I think this problem may be related to how the notebook loads classes. Because...

Latest Reply
齐木木
New Contributor III
  • 3 kudos

code:
var str = "{\"app_type\":\"installed-app\"}"
import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.module.scala.DefaultScalaModule
val mapper = new ObjectMapper()
mapper.registerModule(DefaultScalaModule)
...

Sha_1890
by New Contributor III
  • 4248 Views
  • 8 replies
  • 0 kudos

How to execute a series of stored procedures using Scala in Databricks

I am working on a migration project where the lift-and-shift method is used to migrate a SQL Server DB from on-prem to the Azure cloud. There are a lot of stored procedures used for integration on-prem. Now, on-prem, to process the XML file and exec...

Latest Reply
Noopur_Nigam
Valued Contributor II
  • 0 kudos

Hi @shafana Roohi Jahubar, I hope your queries have been answered. Please let me know if you have any more questions.

7 More Replies
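One way to run a series of SQL Server stored procedures from Scala is to call them on the database side over JDBC. The sketch below assumes the migrated database is reachable from the cluster and that a SQL Server JDBC driver is on the classpath; `StoredProcRunner`, its methods, and any procedure names are hypothetical. It uses the standard JDBC call escape syntax `{call name(?, ...)}` with `Connection.prepareCall`.

```scala
import java.sql.Connection

// Hypothetical helper for running stored procedures over JDBC, in order.
object StoredProcRunner {

  // Builds JDBC call escape syntax, e.g. "{call dbo.usp_Load(?, ?)}".
  def callSyntax(procName: String, paramCount: Int): String = {
    val placeholders = Seq.fill(paramCount)("?").mkString(", ")
    s"{call $procName($placeholders)}"
  }

  // Runs each (procedure, parameters) pair sequentially on one connection.
  // An exception from any call stops the loop and propagates to the caller.
  def runAll(conn: Connection, procs: Seq[(String, Seq[Any])]): Unit =
    procs.foreach { case (procName, params) =>
      val stmt = conn.prepareCall(callSyntax(procName, params.size))
      try {
        // JDBC parameter indices are 1-based.
        params.zipWithIndex.foreach { case (value, i) =>
          stmt.setObject(i + 1, value.asInstanceOf[AnyRef])
        }
        stmt.execute()
      } finally stmt.close()
    }
}
```

A call might look like `StoredProcRunner.runAll(conn, Seq("dbo.usp_StageXml" -> Seq("input.xml"), "dbo.usp_Merge" -> Seq.empty))`, with `conn` obtained from `java.sql.DriverManager.getConnection(jdbcUrl, user, password)`; the procedure names, URL, and credentials are placeholders. JDBC connections start in auto-commit mode, so each procedure commits on its own unless `conn.setAutoCommit(false)` is called and `conn.commit()` is issued after the whole series.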
isaac_gritz
by Valued Contributor II
  • 1609 Views
  • 2 replies
  • 7 kudos

Versions of Spark, Python, Scala, R in each Databricks Runtime

What version of Spark, Python, Scala, R are included in each Databricks Runtime? What libraries are pre-installed? You can find this info at the Databricks runtime releases page (AWS | Azure | GCP). Let us know if you have any additional questions on t...

Latest Reply
maxdata
New Contributor II
  • 7 kudos

Wow! Thanks for the help @Isaac Gritz!

1 More Replies
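Besides the release notes page, the versions in use can also be checked from a running cluster. A minimal Scala sketch using `scala.util.Properties`; in a notebook, `spark.version` additionally returns the Spark version. The object name `RuntimeVersions` is hypothetical.

```scala
import scala.util.Properties

// Hypothetical helper reporting the versions visible to the running JVM.
object RuntimeVersions {

  // Version of the Scala library on the classpath (a string such as 2.12.x).
  def scalaVersion: String = Properties.versionNumberString

  // Version of the Java runtime executing this code (the "java.version" property).
  def javaVersion: String = Properties.javaVersion

  def summary: String = s"Scala $scalaVersion, Java $javaVersion"
}
```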
Sunny
by New Contributor III
  • 6235 Views
  • 6 replies
  • 1 kudos

Using Thread.sleep in Scala

We need to hit a REST web service every 5 minutes until a success message is received. The Scala object is inside a Jar file and gets invoked by a Databricks task within a workflow. Thread.sleep(5000) is working fine, but I'm not sure if it is a safe practice or is t...

Latest Reply
Vartika
Moderator
  • 1 kudos

Hey there @Sundeep P, hope all is well! Just wanted to check in to see whether you were able to resolve your issue. Would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. C...

5 More Replies
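`Thread.sleep` takes milliseconds, so the `Thread.sleep(5000)` in the question waits 5 seconds; a 5-minute interval is 300000 ms (5 * 60 * 1000). Below is a minimal sketch, assuming the Jar's code can call the service synchronously: a bounded polling loop that takes its interval as a `FiniteDuration`, so the unit is explicit, and gives up after `maxAttempts` instead of keeping the task alive forever. `Poller`, `pollUntil`, and the `callService()` check in the usage note are hypothetical names.

```scala
import scala.annotation.tailrec
import scala.concurrent.duration._

// Hypothetical polling helper for a Jar task.
object Poller {

  // Calls `check` until it returns true or `maxAttempts` calls have been made,
  // sleeping `interval` between attempts. Returns true on success.
  def pollUntil(interval: FiniteDuration, maxAttempts: Int)(check: () => Boolean): Boolean = {
    @tailrec
    def loop(attempt: Int): Boolean =
      if (check()) true
      else if (attempt >= maxAttempts) false
      else {
        Thread.sleep(interval.toMillis) // blocks only the calling thread
        loop(attempt + 1)
      }
    loop(1)
  }
}
```

For the case in the question, `Poller.pollUntil(5.minutes, maxAttempts = 12)(() => callService())` would make at most 12 attempts, 5 minutes apart; if it returns false, the task could throw an exception so that the workflow run is marked as failed.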
karthikM
by New Contributor
  • 1587 Views
  • 3 replies
  • 1 kudos

Delta Live Tables

Is DLT supported for Scala? Are there any reference implementations or wikis to get started?

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @Karthik Munipalle, Delta Live Tables queries can be implemented in Python or SQL. Here are a few articles that explain DLT well. Please have a look.
https://docs.databricks.com/data-engineering/delta-live-tables/index.html
https://databricks.com/...

2 More Replies
Data_Engineer3
by Contributor III
  • 1877 Views
  • 2 replies
  • 1 kudos

Unable to access Scala and Python variables between shells in the same notebook.

I am facing an issue while accessing a Python DataFrame in the Scala shell and vice versa. I am getting a "variable not defined" error.

Latest Reply
tomasz
Contributor
  • 1 kudos

The context is not shared between Scala and Python, so you won't be able to access the same variables directly. However, you can use createOrReplaceTempView to create a temporary view of your DataFrame and read it in the other language with read_df = s...

1 More Replies