Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

enavuio
by New Contributor II
  • 1626 Views
  • 2 replies
  • 3 kudos

Count on External Table to Azure Data Storage is taking too long

I have created an external table over Azure Data Lake Storage Gen2. The container has about 200K JSON files. The structure of the JSON files is defined with ```CREATE EXTERNAL TABLE IF NOT EXISTS dbo.table( ComponentInfo STRUCT<ComponentHost: STRING, ...

Latest Reply
Anonymous
Not applicable
  • 3 kudos

Hi @Ena Vu, hope all is well! Just wanted to check in: were you able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Thanks!

1 More Replies
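Much of the slowness here is usually schema inference plus file listing over 200K small files. Below is a minimal PySpark sketch of one mitigation, supplying an explicit schema so the files are not sampled for inference first; the path and fields are hypothetical stand-ins for the thread's table:

```python
from pyspark.sql.types import StructType, StructField, StringType

# 'spark' is predefined in Databricks notebooks.
# Hypothetical container path -- substitute your own.
path = "abfss://container@account.dfs.core.windows.net/json-data/"

# Declaring the schema up front avoids inferring it across all 200K files.
schema = StructType([
    StructField("ComponentInfo", StructType([
        StructField("ComponentHost", StringType(), True),
    ]), True),
])

df = spark.read.schema(schema).json(path)
# The count still has to list and open every file once; compacting many
# small JSON files into fewer large ones helps more than anything else.
print(df.count())
```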
Sadiq
by New Contributor III
  • 3301 Views
  • 5 replies
  • 4 kudos

Fixed length file from Databricks notebook ( Spark SQL)

Hi, I need help writing data from an Azure Databricks notebook into a fixed-length .txt file. The notebook has 10 lakh (1 million) rows and 86 columns. Can anyone suggest an approach?

Latest Reply
Vidula
Honored Contributor
  • 4 kudos

Hi @sadiq vali, hope all is well! Just wanted to check in: were you able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Thanks!

4 More Replies
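One way to produce fixed-width text (a sketch, not the only approach) is to cast every column to string, right-pad it to its target width, concatenate, and use the plain text writer. Column names and widths here are hypothetical:

```python
from pyspark.sql import functions as F

# Hypothetical column-width spec: {column_name: fixed_width}.
widths = {"id": 10, "name": 30, "amount": 12}

df = spark.table("my_source_table")  # hypothetical source table

# Right-pad each column to its width and glue them into one line per row.
fixed = df.select(
    F.concat(*[F.rpad(F.col(c).cast("string"), w, " ")
               for c, w in widths.items()]).alias("value")
)

# The text writer expects a single string column.
fixed.write.mode("overwrite").text("/tmp/fixed_length_output")
```

For 10 lakh rows this is a single shuffle-free projection, so it should stay cheap.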
HariharaSam
by Contributor
  • 16333 Views
  • 3 replies
  • 4 kudos

Using variables in Spark SQL

Is there a way to declare variables in Spark SQL like we do in T-SQL?

Latest Reply
Debayan
Databricks Employee
  • 4 kudos

Could you please follow the below link and let us know if this helps? https://community.databricks.com/s/question/0D53f00001HKHa3CAH/how-do-i-pass-parameters-to-my-sql-statements

2 More Replies
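Classic Spark SQL has no T-SQL-style DECLARE; the linked thread's workarounds boil down to interpolating from Python or using Spark's ${...} config substitution. A minimal sketch (the sales table is hypothetical):

```python
# Option 1: interpolate a Python variable into spark.sql().
threshold = 100
df = spark.sql(f"SELECT * FROM sales WHERE amount > {threshold}")

# Option 2: Spark's variable substitution (spark.sql.variable.substitute,
# on by default) expands ${...} from values stored with SET.
spark.sql("SET var.threshold = 100")
df2 = spark.sql("SELECT * FROM sales WHERE amount > ${var.threshold}")
```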
Data_Engineer3
by Contributor III
  • 5931 Views
  • 4 replies
  • 1 kudos

Unable to read data from Elasticsearch with spark in Databricks.

When I try to read data from Elasticsearch with Spark SQL, it throws an error like RuntimeException: Error while encoding: java.lang.RuntimeException: scala.collection.convert.Wrappers$JListWrapper is not a valid external type for schema of string...

Latest Reply
Vidula
Honored Contributor
  • 1 kudos

Hi there @KARTHICK N, hope all is well! Just wanted to check in: were you able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Thanks!

3 More Replies
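This particular error usually means a field that is sometimes a JSON array got mapped as a scalar string. The elasticsearch-hadoop connector exposes an option to declare such fields as arrays; a hedged sketch where the host, index, and field name are hypothetical:

```python
# es.read.field.as.array.include tells the connector which fields to
# treat as arrays instead of scalars, avoiding the JListWrapper mismatch.
df = (spark.read
      .format("org.elasticsearch.spark.sql")
      .option("es.nodes", "es-host")                       # hypothetical host
      .option("es.port", "9200")
      .option("es.read.field.as.array.include", "tags")    # hypothetical field
      .load("my-index"))                                   # hypothetical index

df.printSchema()
```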
ejloh
by New Contributor II
  • 3423 Views
  • 3 replies
  • 1 kudos

SQL query with leads and lags

I'm trying to create a new column that fills in the nulls below. I tried using leads and lags, but it isn't turning out right. Basically, I'm trying to figure out who is in "possession" of the record, given the TransferFrom and TransferTo columns and sequence...

Latest Reply
Vidula
Honored Contributor
  • 1 kudos

Hi there @Eric Lohbeck, does @Hubert Dudek's response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly? We'd love to hear from you. Thanks!

2 More Replies
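For possession-style forward filling, last() with ignoreNulls over a running window is often simpler than chaining lead/lag. A sketch with hypothetical table and column names, since the thread's images are not reproduced here:

```python
from pyspark.sql import functions as F
from pyspark.sql.window import Window

df = spark.table("transfers")  # hypothetical table

# Carry the most recent non-null TransferTo forward in sequence order.
w = (Window.partitionBy("RecordId")
           .orderBy("Sequence")
           .rowsBetween(Window.unboundedPreceding, Window.currentRow))

filled = df.withColumn("Possession",
                       F.last("TransferTo", ignoreNulls=True).over(w))
```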
parthibsg
by New Contributor II
  • 1483 Views
  • 1 reply
  • 2 kudos

When to use Dataframes API over Spark SQL

Hello experts, I am new to Databricks. Building data pipelines, I have both batch and streaming data. Should I use the DataFrames API to read CSV files, convert them to Parquet format, and then do the transformation? Or write to a table from the CSV and then use Spark SQL...

Latest Reply
Debayan
Databricks Employee
  • 2 kudos

Hi Rathinam, it would help to understand the pipeline better in this situation. Writing to a table from CSV and then using Spark SQL will be faster in a few cases than the other approach.

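Since both the DataFrame API and Spark SQL compile to the same Catalyst plans, the split is mostly ergonomic: file wrangling tends to read better in the API, set logic in SQL. A sketch of the ingest-then-query pattern, under hypothetical paths and schema:

```python
# Read CSV with an explicit schema, land it as a Delta table...
df = (spark.read
      .option("header", "true")
      .schema("id INT, event STRING, ts TIMESTAMP")  # hypothetical schema
      .csv("/mnt/raw/events/"))                      # hypothetical path

df.write.format("delta").mode("overwrite").saveAsTable("bronze_events")

# ...then transform with either the API or SQL; these are equivalent.
api_df = df.groupBy("event").count()
sql_df = spark.sql("SELECT event, count(*) AS count FROM bronze_events GROUP BY event")
```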
KumarShiv
by New Contributor III
  • 2124 Views
  • 2 replies
  • 2 kudos

Resolved! Databricks Spark SQL function "PERCENTILE_DISC()" output not accurate.

I am trying to get percentile values on different splits, but the result of the Databricks PERCENTILE_DISC() function does not look accurate. I have run the same query on MS SQL and get a different result set. Here are both result sets, for PySpark ...

Latest Reply
artsheiko
Databricks Employee
  • 2 kudos

The reason might be that in SQL, PERCENTILE_DISC is nondeterministic.

1 More Replies
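PERCENTILE_DISC returns an actual data value, the smallest whose cumulative distribution reaches the requested fraction, so engines that order ties or partition rows differently can legitimately surface different members of the data. A small repro sketch:

```python
# percentile_disc picks an existing value rather than interpolating.
spark.sql("""
    SELECT percentile_disc(0.5) WITHIN GROUP (ORDER BY v) AS median_disc
    FROM VALUES (1), (2), (3), (4) AS t(v)
""").show()
# Returns 2: no interpolation to 2.5 for an even row count. Discrete
# selection plus nondeterministic tie ordering is why results can
# diverge across engines.
```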
AJ270990
by Contributor II
  • 17584 Views
  • 3 replies
  • 0 kudos

Resolved! I am getting ParseException: error while running the spark SQL query

I am using the code below to create the Spark session and load the CSV file. Creating the Spark session and loading the CSV run fine; however, the SQL query raises a ParseException. %python from pyspark.sql import SparkSession # Create a SparkSessio...

Latest Reply
AJ270990
Contributor II
  • 0 kudos

This is resolved. The query below works fine now: sqldf = spark.sql("select sum(cast(enrollment as float)), sum(cast(growth as float)),`plan type`,`Parent Organization`,state,`Special Needs Plan`,`Plan Name Sec A`, CASE when `Plan ID` between '800' and '89...

2 More Replies
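The resolution hinges on backtick-quoting identifiers that contain spaces; unquoted, Spark's parser stops at the space and raises a ParseException. A minimal repro:

```python
# Column names with spaces must be backtick-quoted in Spark SQL.
df = spark.createDataFrame([(1, "A")], ["plan id", "plan type"])
df.createOrReplaceTempView("plans")

spark.sql("SELECT `plan id`, `plan type` FROM plans").show()
# spark.sql("SELECT plan id FROM plans")  # ParseException at the space
```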
C_1
by New Contributor III
  • 4746 Views
  • 5 replies
  • 4 kudos

Resolved! Databricks notebook command logging

Hello Community, I am looking for a Databricks notebook command logging feature for compliance purposes. My requirement is to log the exact Spark SQL fired by users. I didn't see Spark SQL (notebook commands) tracked under the Azure diagnostic logs...

Latest Reply
Noopur_Nigam
Databricks Employee
  • 4 kudos

Hi @C P, we don't have this feature implemented; however, there is already an existing idea in our idea portal: https://databricks.aha.io/features/DB-7583. You can check it and vote for it.

4 More Replies
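Absent a built-in command log at the time, one stopgap (a sketch only, and it captures just the SQL routed through it, not arbitrary notebook commands) is a wrapper that appends each statement to an audit table before running it; the table name is hypothetical:

```python
import datetime

def logged_sql(query: str):
    """Run a Spark SQL statement, appending it to an audit table first."""
    entry = spark.createDataFrame(
        [(datetime.datetime.utcnow(), query)],
        "ts TIMESTAMP, statement STRING",
    )
    entry.write.mode("append").saveAsTable("ops.sql_audit_log")  # hypothetical
    return spark.sql(query)

df = logged_sql("SELECT 1 AS probe")
```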
bdugar
by New Contributor II
  • 16123 Views
  • 1 reply
  • 2 kudos

Creating permanent views from dataframes?

Hi: It's possible to create temp views in PySpark from a DataFrame (df.createOrReplaceTempView()), and it's possible to create a permanent view in Spark SQL. But as far as I can tell, there is no way to create a permanent view from a DataFrame, somet...

Latest Reply
bdugar
New Contributor II
  • 2 kudos

Hi Kaniz: This is what I understood from the research I did. I was more curious as to why permanent views can't be created from DataFrames, and whether this is a feature that might be implemented by Databricks or Spark at some point. Temporary views ca...

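The underlying constraint is that a permanent view is stored as SQL text over catalog objects, and it cannot reference a temp view. Two common workarounds, sketched under hypothetical names:

```python
df = spark.range(10).withColumnRenamed("id", "n")  # stand-in DataFrame

# Workaround 1: persist the DataFrame as a table rather than a view.
df.write.mode("overwrite").saveAsTable("my_table")

# Workaround 2: define the permanent view in SQL over catalog tables.
spark.sql("CREATE OR REPLACE VIEW my_view AS SELECT n FROM my_table WHERE n > 5")

# This, by contrast, fails -- permanent views cannot see temp views:
# df.createOrReplaceTempView("tmp")
# spark.sql("CREATE VIEW v AS SELECT * FROM tmp")  # AnalysisException
```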
lav
by New Contributor III
  • 1112 Views
  • 1 reply
  • 1 kudos

Correlated Column Exception in Spark SQL

Hi Johan, were you able to resolve the correlated column exception issue? I have been stuck on this for the past week. If you can guide me, that will be a lot of help. Thanks.

Latest Reply
Johan_Van_Noten
New Contributor III
  • 1 kudos

Seems to be a duplicate of your comment on https://community.databricks.com/s/question/0D53f00001XCuCACA1/correlated-column-exception-in-sql-udf-when-using-udf-parameters. I guess you did that to be able to put other tags?

nickg
by New Contributor III
  • 4484 Views
  • 6 replies
  • 3 kudos

Resolved! I am looking to use the pivot function with Spark SQL (not Python)

Hello. I am trying to use the Pivot function for email addresses. This is what I have so far: Select fname, lname, awUniqueID, Email1, Email2 From xxxxxxxx Pivot ( count(Email) as Test For Email In (1 as Email1, 2 as Email2) ) I get everyth...

Latest Reply
nickg
New Contributor III
  • 3 kudos

Source data:
fname | lname | awUniqueID | Email
John | Smith | 22 | jsmith@gmail.com
JODI | JONES | 22 | jsmith@live.com
Desired output:
fname | lname | awUniqueID | Em...

5 More Replies
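Because each person has several e-mail rows, the pivot key needs to be a per-person rank rather than the raw Email value, and the aggregate should carry the address itself (first()) instead of count(). A hedged sketch against a hypothetical contacts table shaped like the sample:

```python
spark.sql("""
    SELECT * FROM (
        SELECT awUniqueID, Email,
               ROW_NUMBER() OVER (PARTITION BY awUniqueID
                                  ORDER BY Email) AS rn
        FROM contacts            -- hypothetical source table
    )
    PIVOT (
        first(Email) FOR rn IN (1 AS Email1, 2 AS Email2)
    )
""").show()
# One row per awUniqueID, with the first two addresses as Email1/Email2.
```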
ernijed
by New Contributor II
  • 7626 Views
  • 3 replies
  • 3 kudos

Resolved! Error in SQL statement: SparkFatalException. How to fix it?

When I try to execute a SQL query (2 joins), I get the message below: com.databricks.backend.common.rpc.DatabricksExceptions$SQLExecutionException: org.apache.spark.util.SparkFatalException at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.$a...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 3 kudos

@Erni Jed, I tested your query and it is OK, so it has to be some other issue. Maybe you could try it on a smaller data set. Please also analyze/debug using the Spark UI.

2 More Replies
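The trace implicates the broadcast exchange, so two common mitigations (they work around rather than explain the failure) are disabling automatic broadcast joins or extending the broadcast timeout:

```python
# Fall back to sort-merge/shuffle joins instead of broadcasting.
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")

# Or give the broadcast more time than the 300s default.
spark.conf.set("spark.sql.broadcastTimeout", "600")
```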
ImAbhishekTomar
by New Contributor III
  • 3032 Views
  • 1 reply
  • 1 kudos

Resolved! Trying to Flatten My Json using CosmosDB Spark connector - Azure Databricks

Hi, using the Cosmos DB query below it is possible to achieve the expected output, but how can I do the same with Spark SQL in Databricks? COSMOS DB QUERY: select c.ReportId, c.ReportName, i.price, p as provider from c join i in in_network join p in i.pr...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 1 kudos

Hi @Abhishek Tomar, if you want to get it from Cosmos DB, use the connector with a custom query: https://github.com/Azure/azure-cosmosdb-spark. If you want the JSON imported directly by Databricks/Spark, please go with the solution below: SELECT ...

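The Spark SQL analogue of Cosmos DB's JOIN ... IN array unrolling is LATERAL VIEW explode. A sketch reusing the names visible in the snippet; the table and the providers field are assumptions:

```python
spark.sql("""
    SELECT c.ReportId, c.ReportName, i.price, p AS provider
    FROM reports c                             -- hypothetical table over the JSON
    LATERAL VIEW explode(c.in_network) n AS i  -- unroll the in_network array
    LATERAL VIEW explode(i.providers) pr AS p  -- 'providers' field is assumed
""").show()
```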
haseebkhan1421
by New Contributor
  • 13989 Views
  • 2 replies
  • 1 kudos

Resolved! How can I access python variable in Spark SQL?

I have a Python variable created under %python in my jupyter notebook file in Azure Databricks. How can I access the same variable to make comparisons under %sql? Below is an example: %python RunID_Goal = sqlContext.sql("SELECT CONCAT(SUBSTRING(RunID,...

Latest Reply
Nirupam
New Contributor III
  • 1 kudos

You can use {} in spark.sql() in PySpark/Scala instead of making a SQL cell with %sql. This will result in a DataFrame. If you want, you can create a view on top of it using createOrReplaceTempView(). Below is an example using a variable: # A variab...

1 More Replies
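For completeness, the two interpolation styles from the accepted answer side by side; the variable value and table are hypothetical:

```python
RunID_Goal = 42  # hypothetical; the thread derives it from a query

# Style 1: format the value into spark.sql() -- returns a DataFrame.
df = spark.sql(f"SELECT * FROM runs WHERE RunID = {RunID_Goal}")

# Style 2: stash it in the SQL conf so %sql cells can read it via ${...}.
spark.conf.set("var.run_id_goal", str(RunID_Goal))
# Then in a %sql cell: SELECT * FROM runs WHERE RunID = ${var.run_id_goal}
```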