Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

enavuio
by New Contributor II
  • 1626 Views
  • 2 replies
  • 3 kudos

Count on External Table to Azure Data Storage is taking too long

I have created an external table over Azure Data Lake Storage Gen2. The container has about 200K JSON files. The structure of the JSON files is defined with ```CREATE EXTERNAL TABLE IF NOT EXISTS dbo.table( ComponentInfo STRUCT<ComponentHost: STRING, ...

Latest Reply
Anonymous
Not applicable
  • 3 kudos

Hi @Ena Vu, hope all is well! Just wanted to check in: were you able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Thanks!

1 More Replies
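Much of the slowness here is usually schema inference plus file listing over 200K small files. Below is a minimal PySpark sketch of one mitigation, supplying an explicit schema so the files are not sampled for inference first; the path and fields are hypothetical stand-ins for the thread's table:

```python
from pyspark.sql.types import StructType, StructField, StringType

# 'spark' is predefined in Databricks notebooks.
# Hypothetical container path -- substitute your own.
path = "abfss://container@account.dfs.core.windows.net/json-data/"

# Declaring the schema up front avoids inferring it across all 200K files.
schema = StructType([
    StructField("ComponentInfo", StructType([
        StructField("ComponentHost", StringType(), True),
    ]), True),
])

df = spark.read.schema(schema).json(path)
# The count still has to list and open every file once; compacting many
# small JSON files into fewer large ones helps more than anything else.
print(df.count())
```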
Sadiq
by New Contributor III
  • 3301 Views
  • 5 replies
  • 4 kudos

Fixed length file from Databricks notebook ( Spark SQL)

Hi, I need help writing data from an Azure Databricks notebook into a fixed-length .txt file. The notebook has 10 lakh (1 million) rows and 86 columns. Can anyone suggest an approach?

Latest Reply
Vidula
Honored Contributor
  • 4 kudos

Hi @sadiq vali, hope all is well! Just wanted to check in: were you able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Thanks!

4 More Replies
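One way to produce fixed-width text (a sketch, not the only approach) is to cast every column to string, right-pad it to its target width, concatenate, and use the plain text writer. Column names and widths here are hypothetical:

```python
from pyspark.sql import functions as F

# Hypothetical column-width spec: {column_name: fixed_width}.
widths = {"id": 10, "name": 30, "amount": 12}

df = spark.table("my_source_table")  # hypothetical source table

# Right-pad each column to its width and glue them into one line per row.
fixed = df.select(
    F.concat(*[F.rpad(F.col(c).cast("string"), w, " ")
               for c, w in widths.items()]).alias("value")
)

# The text writer expects a single string column.
fixed.write.mode("overwrite").text("/tmp/fixed_length_output")
```

For 10 lakh rows this is a single shuffle-free projection, so it should stay cheap.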
HariharaSam
by Contributor
  • 16333 Views
  • 3 replies
  • 4 kudos

Using variables in Spark SQL

Is there a way to declare variables in Spark SQL like we do in T-SQL?

Latest Reply
Debayan
Databricks Employee
  • 4 kudos

Could you please follow the below link and let us know if this helps? https://community.databricks.com/s/question/0D53f00001HKHa3CAH/how-do-i-pass-parameters-to-my-sql-statements

2 More Replies
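Classic Spark SQL has no T-SQL-style DECLARE; the linked thread's workarounds boil down to interpolating from Python or using Spark's ${...} config substitution. A minimal sketch (the sales table is hypothetical):

```python
# Option 1: interpolate a Python variable into spark.sql().
threshold = 100
df = spark.sql(f"SELECT * FROM sales WHERE amount > {threshold}")

# Option 2: Spark's variable substitution (spark.sql.variable.substitute,
# on by default) expands ${...} from values stored with SET.
spark.sql("SET var.threshold = 100")
df2 = spark.sql("SELECT * FROM sales WHERE amount > ${var.threshold}")
```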
Data_Engineer3
by Contributor III
  • 5931 Views
  • 4 replies
  • 1 kudos

Unable to read data from Elasticsearch with spark in Databricks.

When I try to read data from Elasticsearch with Spark SQL, it throws an error like RuntimeException: Error while encoding: java.lang.RuntimeException: scala.collection.convert.Wrappers$JListWrapper is not a valid external type for schema of string...

Latest Reply
Vidula
Honored Contributor
  • 1 kudos

Hi there @KARTHICK N, hope all is well! Just wanted to check in: were you able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Thanks!

3 More Replies
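This particular error usually means a field that is sometimes a JSON array got mapped as a scalar string. The elasticsearch-hadoop connector exposes an option to declare such fields as arrays; a hedged sketch where the host, index, and field name are hypothetical:

```python
# es.read.field.as.array.include tells the connector which fields to
# treat as arrays instead of scalars, avoiding the JListWrapper mismatch.
df = (spark.read
      .format("org.elasticsearch.spark.sql")
      .option("es.nodes", "es-host")                       # hypothetical host
      .option("es.port", "9200")
      .option("es.read.field.as.array.include", "tags")    # hypothetical field
      .load("my-index"))                                   # hypothetical index

df.printSchema()
```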
ejloh
by New Contributor II
  • 3423 Views
  • 3 replies
  • 1 kudos

SQL query with leads and lags

I'm trying to create a new column that fills in the nulls below. I tried using leads and lags, but it isn't turning out right. Basically, I'm trying to figure out who is in "possession" of the record, given the TransferFrom and TransferTo columns and sequence...

Latest Reply
Vidula
Honored Contributor
  • 1 kudos

Hi there @Eric Lohbeck, does @Hubert Dudek's response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly? We'd love to hear from you. Thanks!

2 More Replies
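For possession-style forward filling, last() with ignoreNulls over a running window is often simpler than chaining lead/lag. A sketch with hypothetical table and column names, since the thread's images are not reproduced here:

```python
from pyspark.sql import functions as F
from pyspark.sql.window import Window

df = spark.table("transfers")  # hypothetical table

# Carry the most recent non-null TransferTo forward in sequence order.
w = (Window.partitionBy("RecordId")
           .orderBy("Sequence")
           .rowsBetween(Window.unboundedPreceding, Window.currentRow))

filled = df.withColumn("Possession",
                       F.last("TransferTo", ignoreNulls=True).over(w))
```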
parthibsg
by New Contributor II
  • 1483 Views
  • 1 reply
  • 2 kudos

When to use Dataframes API over Spark SQL

Hello experts, I am new to Databricks. Building data pipelines, I have both batch and streaming data. Should I use the DataFrames API to read CSV files, convert them to Parquet format, and then do the transformation? Or write to a table from the CSV and then use Spark SQL...

Latest Reply
Debayan
Databricks Employee
  • 2 kudos

Hi Rathinam, it would help to understand the pipeline better in this situation. Writing to a table from CSV and then using Spark SQL will be faster in a few cases than the other approach.

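Since both the DataFrame API and Spark SQL compile to the same Catalyst plans, the split is mostly ergonomic: file wrangling tends to read better in the API, set logic in SQL. A sketch of the ingest-then-query pattern, under hypothetical paths and schema:

```python
# Read CSV with an explicit schema, land it as a Delta table...
df = (spark.read
      .option("header", "true")
      .schema("id INT, event STRING, ts TIMESTAMP")  # hypothetical schema
      .csv("/mnt/raw/events/"))                      # hypothetical path

df.write.format("delta").mode("overwrite").saveAsTable("bronze_events")

# ...then transform with either the API or SQL; these are equivalent.
api_df = df.groupBy("event").count()
sql_df = spark.sql("SELECT event, count(*) AS count FROM bronze_events GROUP BY event")
```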
KumarShiv
by New Contributor III
  • 2124 Views
  • 2 replies
  • 2 kudos

Resolved! Databricks Spark SQL function "PERCENTILE_DISC()" output not accurate.

I am trying to get percentile values on different splits, but the result of the Databricks PERCENTILE_DISC() function does not look accurate. I have run the same query on MS SQL and get a different result set. Here are both result sets, for PySpark ...

Latest Reply
artsheiko
Databricks Employee
  • 2 kudos

The reason might be that in SQL, PERCENTILE_DISC is nondeterministic.

1 More Replies
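PERCENTILE_DISC returns an actual data value, the smallest whose cumulative distribution reaches the requested fraction, so engines that order ties or partition rows differently can legitimately surface different members of the data. A small repro sketch:

```python
# percentile_disc picks an existing value rather than interpolating.
spark.sql("""
    SELECT percentile_disc(0.5) WITHIN GROUP (ORDER BY v) AS median_disc
    FROM VALUES (1), (2), (3), (4) AS t(v)
""").show()
# Returns 2: no interpolation to 2.5 for an even row count. Discrete
# selection plus nondeterministic tie ordering is why results can
# diverge across engines.
```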
AJ270990
by Contributor II
  • 17584 Views
  • 3 replies
  • 0 kudos

Resolved! I am getting ParseException: error while running the spark SQL query

I am using the code below to create the Spark session and load the CSV file. Creating the Spark session and loading the CSV run fine; however, the SQL query raises a ParseException. %python from pyspark.sql import SparkSession # Create a SparkSessio...

Latest Reply
AJ270990
Contributor II
  • 0 kudos

This is resolved. The query below works fine now: sqldf = spark.sql("select sum(cast(enrollment as float)), sum(cast(growth as float)),`plan type`,`Parent Organization`,state,`Special Needs Plan`,`Plan Name Sec A`, CASE when `Plan ID` between '800' and '89...

2 More Replies
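The resolution hinges on backtick-quoting identifiers that contain spaces; unquoted, Spark's parser stops at the space and raises a ParseException. A minimal repro:

```python
# Column names with spaces must be backtick-quoted in Spark SQL.
df = spark.createDataFrame([(1, "A")], ["plan id", "plan type"])
df.createOrReplaceTempView("plans")

spark.sql("SELECT `plan id`, `plan type` FROM plans").show()
# spark.sql("SELECT plan id FROM plans")  # ParseException at the space
```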
C_1
by New Contributor III
  • 4746 Views
  • 5 replies
  • 4 kudos

Resolved! Databricks notebook command logging

Hello Community, I am looking for a Databricks notebook command logging feature for compliance purposes. My requirement is to log the exact Spark SQL fired by users. I didn't see Spark SQL (notebook commands) tracked under the Azure diagnostic logs...

Latest Reply
Noopur_Nigam
Databricks Employee
  • 4 kudos

Hi @C P, we don't have this feature implemented; however, there is already an existing idea in our idea portal: https://databricks.aha.io/features/DB-7583. You can check it and vote for it.

4 More Replies
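Absent a built-in command log at the time, one stopgap (a sketch only, and it captures just the SQL routed through it, not arbitrary notebook commands) is a wrapper that appends each statement to an audit table before running it; the table name is hypothetical:

```python
import datetime

def logged_sql(query: str):
    """Run a Spark SQL statement, appending it to an audit table first."""
    entry = spark.createDataFrame(
        [(datetime.datetime.utcnow(), query)],
        "ts TIMESTAMP, statement STRING",
    )
    entry.write.mode("append").saveAsTable("ops.sql_audit_log")  # hypothetical
    return spark.sql(query)

df = logged_sql("SELECT 1 AS probe")
```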
bdugar
by New Contributor II
  • 16123 Views
  • 1 reply
  • 2 kudos

Creating permanent views from dataframes?

Hi: It's possible to create temp views in PySpark from a DataFrame (df.createOrReplaceTempView()), and it's possible to create a permanent view in Spark SQL. But as far as I can tell, there is no way to create a permanent view from a DataFrame, somet...

Latest Reply
bdugar
New Contributor II
  • 2 kudos

Hi Kaniz: This is what I understood from the research I did. I was more curious as to why permanent views can't be created from DataFrames, and whether this is a feature that might be implemented by Databricks or Spark at some point. Temporary views ca...

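The underlying constraint is that a permanent view is stored as SQL text over catalog objects, and it cannot reference a temp view. Two common workarounds, sketched under hypothetical names:

```python
df = spark.range(10).withColumnRenamed("id", "n")  # stand-in DataFrame

# Workaround 1: persist the DataFrame as a table rather than a view.
df.write.mode("overwrite").saveAsTable("my_table")

# Workaround 2: define the permanent view in SQL over catalog tables.
spark.sql("CREATE OR REPLACE VIEW my_view AS SELECT n FROM my_table WHERE n > 5")

# This, by contrast, fails -- permanent views cannot see temp views:
# df.createOrReplaceTempView("tmp")
# spark.sql("CREATE VIEW v AS SELECT * FROM tmp")  # AnalysisException
```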
lav
by New Contributor III
  • 1112 Views
  • 1 reply
  • 1 kudos

Correlated Column Exception in Spark SQL

Hi Johan, were you able to resolve the correlated column exception issue? I have been stuck on this for the past week. If you can guide me, that will be a lot of help. Thanks.

Latest Reply
Johan_Van_Noten
New Contributor III
  • 1 kudos

Seems to be a duplicate of your comment on https://community.databricks.com/s/question/0D53f00001XCuCACA1/correlated-column-exception-in-sql-udf-when-using-udf-parameters. I guess you did that to be able to put other tags?

nickg
by New Contributor III
  • 4484 Views
  • 6 replies
  • 3 kudos

Resolved! I am looking to use the pivot function with Spark SQL (not Python)

Hello. I am trying to use the Pivot function for email addresses. This is what I have so far: Select fname, lname, awUniqueID, Email1, Email2 From xxxxxxxx Pivot ( count(Email) as Test For Email In (1 as Email1, 2 as Email2) ) I get everyth...

Latest Reply
nickg
New Contributor III
  • 3 kudos

Source data:
fname | lname | awUniqueID | Email
John | Smith | 22 | jsmith@gmail.com
JODI | JONES | 22 | jsmith@live.com
Desired output:
fname | lname | awUniqueID | Em...

5 More Replies
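Because each person has several e-mail rows, the pivot key needs to be a per-person rank rather than the raw Email value, and the aggregate should carry the address itself (first()) instead of count(). A hedged sketch against a hypothetical contacts table shaped like the sample:

```python
spark.sql("""
    SELECT * FROM (
        SELECT awUniqueID, Email,
               ROW_NUMBER() OVER (PARTITION BY awUniqueID
                                  ORDER BY Email) AS rn
        FROM contacts            -- hypothetical source table
    )
    PIVOT (
        first(Email) FOR rn IN (1 AS Email1, 2 AS Email2)
    )
""").show()
# One row per awUniqueID, with the first two addresses as Email1/Email2.
```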
ernijed
by New Contributor II
  • 7626 Views
  • 3 replies
  • 3 kudos

Resolved! Error in SQL statement: SparkFatalException. How to fix it?

When I try to execute a SQL query (2 joins), I get the message below: com.databricks.backend.common.rpc.DatabricksExceptions$SQLExecutionException: org.apache.spark.util.SparkFatalException at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.$a...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 3 kudos

@Erni Jed, I tested your query and it is OK, so it has to be some other issue. Maybe you could try it on a smaller data set. Please also analyze/debug using the Spark UI.

2 More Replies
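The trace implicates the broadcast exchange, so two common mitigations (they work around rather than explain the failure) are disabling automatic broadcast joins or extending the broadcast timeout:

```python
# Fall back to sort-merge/shuffle joins instead of broadcasting.
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")

# Or give the broadcast more time than the 300s default.
spark.conf.set("spark.sql.broadcastTimeout", "600")
```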
ImAbhishekTomar
by New Contributor III
  • 3032 Views
  • 1 reply
  • 1 kudos

Resolved! Trying to Flatten My Json using CosmosDB Spark connector - Azure Databricks

Hi, using the Cosmos DB query below it is possible to achieve the expected output, but how can I do the same with Spark SQL in Databricks? COSMOS DB QUERY: select c.ReportId, c.ReportName, i.price, p as provider from c join i in in_network join p in i.pr...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 1 kudos

Hi @Abhishek Tomar, if you want to get it from Cosmos DB, use the connector with a custom query: https://github.com/Azure/azure-cosmosdb-spark. If you want the JSON imported directly by Databricks/Spark, please go with the solution below: SELECT ...

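The Spark SQL analogue of Cosmos DB's JOIN ... IN array unrolling is LATERAL VIEW explode. A sketch reusing the names visible in the snippet; the table and the providers field are assumptions:

```python
spark.sql("""
    SELECT c.ReportId, c.ReportName, i.price, p AS provider
    FROM reports c                             -- hypothetical table over the JSON
    LATERAL VIEW explode(c.in_network) n AS i  -- unroll the in_network array
    LATERAL VIEW explode(i.providers) pr AS p  -- 'providers' field is assumed
""").show()
```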
haseebkhan1421
by New Contributor
  • 13989 Views
  • 2 replies
  • 1 kudos

Resolved! How can I access python variable in Spark SQL?

I have a Python variable created under %python in my jupyter notebook file in Azure Databricks. How can I access the same variable to make comparisons under %sql? Below is an example: %python RunID_Goal = sqlContext.sql("SELECT CONCAT(SUBSTRING(RunID,...

Latest Reply
Nirupam
New Contributor III
  • 1 kudos

You can use {} in spark.sql() in PySpark/Scala instead of making a SQL cell with %sql. This will result in a DataFrame. If you want, you can create a view on top of it using createOrReplaceTempView(). Below is an example using a variable: # A variab...

1 More Replies
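For completeness, the two interpolation styles from the accepted answer side by side; the variable value and table are hypothetical:

```python
RunID_Goal = 42  # hypothetical; the thread derives it from a query

# Style 1: format the value into spark.sql() -- returns a DataFrame.
df = spark.sql(f"SELECT * FROM runs WHERE RunID = {RunID_Goal}")

# Style 2: stash it in the SQL conf so %sql cells can read it via ${...}.
spark.conf.set("var.run_id_goal", str(RunID_Goal))
# Then in a %sql cell: SELECT * FROM runs WHERE RunID = ${var.run_id_goal}
```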