Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

CBull
by New Contributor III
  • 2537 Views
  • 3 replies
  • 2 kudos

Is there a way in Azure to compare data in one field?

Is there a way to compare a time stamp within one field/column for an individual ID? For example, if I have two records for an ID and the time stamps are within 5 min of each other... I just want to keep the latest. But, for example, if they were an h...

Latest Reply
merca
Valued Contributor II

Since you are trying to do this in SQL, I hope someone else can write you the correct answer. The above example is for PySpark. You can check the SQL syntax in the Databricks documentation.

2 More Replies
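For the thread above, a minimal PySpark sketch of the kind of window logic merca refers to; the DataFrame df and the column names id and event_ts are assumptions, not taken from the original thread:

from pyspark.sql import functions as F, Window

# Assumed input: df with an "id" column and an "event_ts" timestamp column.
w = Window.partitionBy("id").orderBy("event_ts")

deduped = (
    df.withColumn("next_ts", F.lead("event_ts").over(w))
      # Drop a row when another row for the same id follows within 5 minutes,
      # so only the latest record of each close-together burst survives.
      .filter(
          F.col("next_ts").isNull()
          | ((F.col("next_ts").cast("long") - F.col("event_ts").cast("long")) > 5 * 60)
      )
      .drop("next_ts")
)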
SCOR
by New Contributor II
  • 2333 Views
  • 3 replies
  • 4 kudos

SparkJDBC42.jar Issue?

Hi there! I am using SparkJDBC42.jar in my Java application to work with my Delta Lake tables. The connection is made through a Databricks SQL endpoint, in which I created a database and stored my Delta tables. I have a simple code to open connection...

Latest Reply
jose_gonzalez
Databricks Employee

Hi @Seifeddine SNOUSSI, are you still having this issue, or were you able to resolve it? Please let us know.

2 More Replies
BasavarajAngadi
by Contributor
  • 5206 Views
  • 6 replies
  • 6 kudos

Resolved! Hi Experts, I want to know the difference between connecting a BI tool to Spark SQL and to a Databricks SQL endpoint?

It's all about spinning up the Spark cluster, and both the Spark SQL API and Databricks SQL do the same operation, so what difference does it make to BI tools?

Latest Reply
Anonymous
Not applicable

Thanks @Bilal Aslam and @Aman Sehgal for jumping in! @Basavaraj Angadi, I want to make sure you got your question(s) answered! Will you let us know? Don't forget, you can select any reply as the "best answer"!

5 More Replies
Databricks_7045
by New Contributor III
  • 2809 Views
  • 3 replies
  • 0 kudos

Resolved! Encapsulate Databricks Pyspark/SparkSql code

Hi All, I have custom code (PySpark & Spark SQL notebooks) which I want to deploy at a customer location and encapsulate so that end customers don't see the actual code. Currently we have all code in notebooks (PySpark/Spark SQL). Could you please l...

Latest Reply
-werners-
Esteemed Contributor III

With notebooks that is not possible. You can write your code in Scala/Java and build a jar, which you then run with spark-submit (example). Or use Python and deploy a wheel (example). This can become quite complex when you have dependencies. Also: a jar et...

2 More Replies
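As a rough illustration of the wheel route -werners- mentions, here is a minimal setup.py sketch; the package name customer_etl and its contents are hypothetical:

# setup.py (hypothetical layout: a customer_etl/ package holding the PySpark code)
from setuptools import setup, find_packages

setup(
    name="customer_etl",
    version="0.1.0",
    packages=find_packages(),   # picks up the customer_etl package
    install_requires=[],        # list third-party dependencies here
)

Build it with python setup.py bdist_wheel (or python -m build), attach the resulting .whl to the cluster as a library, and keep only a thin notebook that imports and calls the packaged entry point. Note that a wheel only hides the code from casual view; it is not strong obfuscation.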
tusworten
by New Contributor II
  • 6538 Views
  • 3 replies
  • 3 kudos

Spark SQL Group by duplicates, collect_list in array of structs and evaluate rows in each group.

I'm a beginner working with the Spark SQL Java API. I have a dataset with duplicate clients grouped by ENTITY and DOCUMENT_ID like this: .withColumn("ROWNUMBER", row_number().over(Window.partitionBy("ENTITY", "ENTITY_DOC").orderBy("ID"))) I added a ROWN...

Latest Reply
tusworten
New Contributor II

Hi @Kaniz Fatma, her answer didn't solve my problem, but it was useful for learning more about UDFs, which I did not know about.

2 More Replies
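The thread uses the Java API, but the same grouping can be sketched in PySpark; the extra columns ID and STATUS below are assumptions for illustration:

from pyspark.sql import functions as F

# Collect each group's duplicate rows into an array of structs.
grouped = (
    df.groupBy("ENTITY", "ENTITY_DOC")
      .agg(F.collect_list(F.struct("ID", "STATUS")).alias("rows"))
)

# Evaluate the rows of each group with a higher-order function instead of a UDF,
# e.g. flag groups in which every duplicate has STATUS = 'ACTIVE'.
flagged = grouped.withColumn(
    "all_active",
    F.forall("rows", lambda r: r["STATUS"] == "ACTIVE")
)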
MRH
by New Contributor II
  • 2399 Views
  • 4 replies
  • 4 kudos

Resolved! Simple Question

Does Spark SQL have both materialized and non-materialized views? With materialized views, it reads from cache for unchanged data, and only from the table for new/changed rows since the view was last accessed? Thanks!

Latest Reply
Anonymous
Not applicable

AWESOME!

3 More Replies
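For reference, open-source Spark SQL only has logical (non-materialized) views; there is no automatic incremental refresh. A sketch of the two usual options, run from Python with hypothetical table names:

# A plain view: just a stored query, recomputed every time it is read.
spark.sql("""
    CREATE OR REPLACE VIEW recent_orders AS
    SELECT * FROM orders WHERE order_date >= '2022-01-01'
""")

# Manual "materialization": cache the current result, or persist it as a table
# and rebuild it on a schedule yourself.
spark.sql("CACHE TABLE recent_orders")
spark.sql("CREATE OR REPLACE TABLE recent_orders_snapshot AS SELECT * FROM recent_orders")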
Robbie
by New Contributor III
  • 2519 Views
  • 1 replies
  • 2 kudos

How can I avoid this 'java.sql.SQLException: Too many connections' error?

I'm having difficulty with a job (parent) that triggers multiple parallel runs of another job (child) in batches (e.g. 10 parallel runs per batch). Occasionally some of the parallel "child" jobs will crash a few minutes in -- either during or immediate...

Latest Reply
Hubert-Dudek
Esteemed Contributor III

It is a MariaDB JDBC error, so most likely the database you are trying to connect to cannot handle this number of concurrent connections (alternatively, if you are not connecting to a MariaDB database: MariaDB is also used for the Hive metastore, so in your case maria...

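For the "Too many connections" scenario above, one mitigation is to cap how many child runs the parent fires at once. A rough Python sketch with a bounded thread pool; trigger_child_run is a hypothetical helper that submits one child job and waits for it:

from concurrent.futures import ThreadPoolExecutor

MAX_PARALLEL = 5   # keep this below what the database / metastore can handle

def trigger_child_run(params):
    # Hypothetical: call the Jobs API (or dbutils.notebook.run) for one child run
    # and block until it finishes.
    ...

batch = [{"part": i} for i in range(10)]   # one parameter dict per child run

with ThreadPoolExecutor(max_workers=MAX_PARALLEL) as pool:
    results = list(pool.map(trigger_child_run, batch))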
saltuk
by Contributor
  • 1396 Views
  • 0 replies
  • 0 kudos

Using Parquet, passing a partition on INSERT OVERWRITE. The PARTITION clause includes an expression and it gives an error.

I am new to Spark SQL; we are migrating our Cloudera workloads to Databricks. A lot of the SQL is done, only a few pieces are ongoing. We are having some trouble passing an argument and using it in an expression in the PARTITION section. LOGDATE is an argu...

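Although the full statement is cut off, the usual cause is that a static PARTITION value must be a literal, not an expression. Two common workarounds, sketched in Python with hypothetical table and column names:

# 1) Resolve the expression first, then splice the literal into the statement.
logdate = spark.sql("SELECT date_sub(current_date(), 1) AS d").first()["d"]
spark.sql(f"""
    INSERT OVERWRITE TABLE events PARTITION (LOGDATE = '{logdate}')
    SELECT id, payload FROM staging_events
""")

# 2) Use dynamic partition overwrite and emit the partition column from the SELECT.
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")
spark.sql("""
    INSERT OVERWRITE TABLE events PARTITION (LOGDATE)
    SELECT id, payload, date_sub(current_date(), 1) AS LOGDATE FROM staging_events
""")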
daschl
by Contributor
  • 12566 Views
  • 18 replies
  • 8 kudos

Resolved! NoSuchMethodError: org.apache.spark.sql.catalyst.json.CreateJacksonParser on Databricks Cloud (but not on Spark Directly)

Hi, I'm working for Couchbase on the Couchbase Spark Connector and noticed something weird which I haven't been able to get to the bottom of so far. For query DataFrames we use the Datasource v2 API and we delegate the JSON parsing to the org.apache.sp...

Latest Reply
daschl
Contributor

Since there hasn't been any progress on this for over a month, I applied a workaround and copied the classes into the connector source code so we don't have to rely on the databricks classloader. It seems to work in my testing and will be released wi...

17 More Replies
RasmusOlesen
by New Contributor III
  • 7840 Views
  • 4 replies
  • 2 kudos

Upgrading from Spark 2.4 to 3.2: Recursive view errors when using

We get errors like this: Recursive view `x` detected (cycle: `x` -> `x`) in our long-term working code, which has worked just fine in Spark 2.4.5 (Runtime 6.4), when we run it on a Spark 3.2 cluster (Runtime 10.0). It happens whenever we have <x is a ...

Latest Reply
arkrish
New Contributor II

This is a breaking change introduced in Spark 3.1. From the Migration Guide: SQL, Datasets and DataFrame - Spark 3.1.1 Documentation (apache.org): In Spark 3.1, the temporary view will have the same behavior as a permanent view, i.e. capture and store runt...

3 More Replies
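A small sketch of the pattern that trips this check and two ways around it (the view and table names are made up for illustration):

# Pattern that worked on Spark 2.4 but raises "Recursive view `x` detected" on 3.1+:
spark.table("some_table").createOrReplaceTempView("x")
spark.sql("SELECT *, 1 AS extra FROM x").createOrReplaceTempView("x")   # cycle: x -> x

# Fix 1: give the redefined view a new name so it no longer references itself.
spark.sql("SELECT *, 1 AS extra FROM x").createOrReplaceTempView("x_enriched")

# Fix 2 (temporary escape hatch): restore the pre-3.1 behavior.
spark.conf.set("spark.sql.legacy.storeAnalyzedPlanForView", "true")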
sarvesh
by Contributor III
  • 5022 Views
  • 4 replies
  • 3 kudos

Read percentage values in Spark (no casting)

I have an xlsx file which has a single column, percentage, with values 30%, 40%, 50%, -10%, 0.00%, 0%, 0.10%, 110%, 99.99%, 99.98%, -99.99%, -99.98%. When I read this using Apache Spark, the output I get is: |percentage| 0.3 | 0.4 | 0.5 | -0.1 | 0.0 | ...

Latest Reply
-werners-
Esteemed Contributor III

Affirmative. This is how Excel stores percentages; what you see is just cell formatting. Databricks notebooks do not (yet?) have the possibility to format the output. But it is easy to use a BI tool on top of Databricks, where you can change the for...

3 More Replies
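If a formatted string is wanted inside Spark itself rather than in a BI tool, one option is to rebuild it from the fraction; a small sketch assuming the column is called percentage:

from pyspark.sql import functions as F

# Excel stores 30% as 0.3, so multiply back and append the sign.
formatted = df.withColumn(
    "percentage_str",
    F.concat(F.format_number(F.col("percentage") * 100, 2), F.lit("%"))
)
# 0.3 -> "30.00%", -0.9999 -> "-99.99%"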
Constantine
by Contributor III
  • 10201 Views
  • 4 replies
  • 4 kudos

Resolved! How does Spark do lazy evaluation?

For context, I am running Spark on the Databricks platform and using Delta tables (S3). Let's assume we have a table called table_one. I create a view called view_one using the table and then call view_one. Next, I create another view, called view_two, based o...

Latest Reply
jose_gonzalez
Databricks Employee

Hi @John Constantine, the following notebook URL will help you better understand the difference between lazy transformations and actions in Spark. You will be able to compare the physical query plans and understand better what is going on when you e...

3 More Replies
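To see the lazy behavior directly, a short sketch; only table_one comes from the thread, the filter and grouping columns are assumptions:

# Transformations are lazy: nothing is read or computed here, Spark only builds a plan.
view_one = spark.table("table_one").where("amount > 0")    # "amount" is an assumed column
view_two = view_one.groupBy("customer_id").count()         # "customer_id" is assumed too

# Stacking "views" just stacks plan nodes; explain() shows the combined plan, still no execution.
view_two.explain(True)

# Only an action (show, count, write, ...) triggers the whole chain as one optimized job.
view_two.show()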
Constantine
by Contributor III
  • 2830 Views
  • 2 replies
  • 4 kudos

Resolved! Generating Spark SQL query using Python

I have a Spark SQL notebook on DB where I have a SQL query like SELECT * FROM table_name WHERE condition_1 = 'fname' OR condition_1 = 'lname' OR condition_1 = 'mname' AND condition_2 = 'apple' AND condition_3 = 'orange'. There are a lot ...

Latest Reply
jose_gonzalez
Databricks Employee

Hi @John Constantine, I think you can also use arrays_overlap() for your OR statements; docs here.

1 More Replies
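A small PySpark sketch of the arrays_overlap() idea; the literal values come from the question, while df and the surrounding structure are assumptions:

from pyspark.sql import functions as F

wanted = ["fname", "lname", "mname"]   # the values previously chained with OR

filtered = df.where(
    F.arrays_overlap(
        F.array(F.col("condition_1")),            # wrap the scalar in a one-element array
        F.array(*[F.lit(v) for v in wanted]),     # literal array of accepted values
    )
    & (F.col("condition_2") == "apple")
    & (F.col("condition_3") == "orange")
)

# For a plain scalar column, isin() is the simpler equivalent of the OR chain:
# df.where(F.col("condition_1").isin(wanted))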
krishnakash
by New Contributor II
  • 4077 Views
  • 4 replies
  • 4 kudos

Resolved! Is there any way of determining last stage of SparkSQL Application Execution?

I have created custom UDFs that generate logs. These logs can be flushed by calling another API, which is exposed by an internal layer. However, I want to call this API just after the execution of the UDF comes to an end. Is there any way of d...

Latest Reply
User16763506586
Contributor

@Krishna Kashiv, maybe ExecutorPlugin.java can help. It has all the methods you might require. Let me know if it works or not. You need to implement the interface org.apache.spark.api.plugin.SparkPlugin and expose it as spark.plugins = com.abc.Imp...

3 More Replies