Data Engineering

Forum Posts

Constantine
by Contributor III
  • 8370 Views
  • 3 replies
  • 6 kudos

Resolved! CREATE TEMP TABLE FROM CTE

I have written a CTE in Spark SQL: WITH temp_data AS (   ......   )   CREATE VIEW AS temp_view FROM SELECT * FROM temp_view; and I get a cryptic error. Is there a way to create a temp view from a CTE using Spark SQL in Databricks?

Latest Reply
-werners-
Esteemed Contributor III
  • 6 kudos

In the CTE you can't do a CREATE. It expects an expression in the form of expression_name [ ( column_name [ , ... ] ) ] [ AS ] ( query ), where expression_name specifies a name for the common table expression. If you want to create a view from a CTE, y...
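A minimal sketch of the working pattern (table and column names are hypothetical): the WITH clause belongs inside the view definition, not in front of the CREATE.

spark.sql("""
    CREATE OR REPLACE TEMPORARY VIEW temp_view AS
    WITH temp_data AS (
        SELECT id, amount
        FROM source_table
        WHERE amount > 0
    )
    SELECT * FROM temp_data
""")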

2 More Replies
ramravi
by Contributor II
  • 5268 Views
  • 2 replies
  • 0 kudos

spark is case sensitive?

spark is case sensitive? Spark is not case sensitive by default. If you have the same column name in different cases (Name, name) and try to select either the "Name" or the "name" column, you will get a column ambiguity error. There is a way to handle this issue b...

Latest Reply
source2sea
Contributor
  • 0 kudos

Hi, even though I set the conf to true, on writing to disk it had exceptions complaining about duplicate columns. Below is the error message: org.apache.spark.sql.AnalysisException: Found duplicate column(s) in the data to save: branchavailablity...
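A short sketch of the two behaviors in play, with hypothetical column names: spark.sql.caseSensitive only changes how column references are resolved, while file sinks such as Parquet and Delta can still reject columns that collide case-insensitively, so renaming before the write is the safer fix.

# Resolution side: with case sensitivity on, "Name" and "name" are distinct.
spark.conf.set("spark.sql.caseSensitive", "true")
df = spark.createDataFrame([(1, 2)], ["Name", "name"])
df.select("Name")  # no ambiguity error once the conf is set

# Write side: sinks may still reject case-insensitive duplicates,
# so rename one of the clashing columns before saving.
(df.withColumnRenamed("name", "name_lower")
   .write.mode("overwrite")
   .parquet("/tmp/case_demo"))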

1 More Replies
lnsnarayanan
by New Contributor II
  • 5281 Views
  • 7 replies
  • 9 kudos

Resolved! I cannot see the Hive databases or tables once I terminate the cluster and use another cluster.

I am using Databricks Community Edition for learning purposes. I created some Hive-managed tables through Spark SQL as well as with df.saveAsTable options. But when I connect to a new cluster, "SHOW DATABASES" only returns the default database...

Latest Reply
dez
New Contributor II
  • 9 kudos

This "feature" in the Community edition is based on the fact that I cannot restart a cluster. So, in the morning I create a cluster for studies purposes and in the afternoon I have to recreate the cluster.If there's any dependent objects from previou...

6 More Replies
learnerbricks
by New Contributor II
  • 3351 Views
  • 4 replies
  • 0 kudos

Unable to save file in DBFS

I took the Azure datasets that are available for practice. I got the 10 days of data from that dataset and now I want to save this data into DBFS in CSV format. I am facing an error: "No such file or directory: '/dbfs...

Latest Reply
pardosa
New Contributor II
  • 0 kudos

Hi, after some exercises you need to be aware that a folder created with dbutils.fs.mkdirs("/dbfs/tmp/myfolder") actually ends up at /dbfs/dbfs/tmp/myfolder. If you want to access the path to_csv("/dbfs/tmp/myfolder/mytest.csv"), you should create it with this script: dbutils.fs...
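The underlying rule, sketched with hypothetical file names: dbutils.fs paths are already rooted at DBFS, while local file APIs such as pandas reach the same files through the /dbfs FUSE mount, so the two prefixes must not be mixed.

import pandas as pd

# DBFS API path (no /dbfs prefix): creates dbfs:/tmp/myfolder
dbutils.fs.mkdirs("/tmp/myfolder")

# Local file API path (with /dbfs prefix): same folder via the mount
pdf = pd.DataFrame({"a": [1, 2, 3]})
pdf.to_csv("/dbfs/tmp/myfolder/mytest.csv", index=False)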

3 More Replies
alexisjohnson
by New Contributor III
  • 4601 Views
  • 7 replies
  • 6 kudos

Resolved! Window function using last/last_value with PARTITION BY/ORDER BY has unexpected results

Hi, I'm wondering if this is the expected behavior when using last or last_value in a window function? I've written a query like this: select col1, col2, last_value(col2) over (partition by col1 order by col2) as column2_last from values ...

Latest Reply
Carv
Visitor II
  • 6 kudos

For those stumbling across this: it seems LAST_VALUE emulates the same functionality as it does in SQL Server, which does not, in most people's minds, have a proper row/range frame for the window. You can adjust it with the syntax below. I understand l...
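The adjustment in question, sketched against the query from the post: with an ORDER BY, the default frame ends at the current row, so last_value returns the current row's value; widening the frame to the whole partition gives the true last value.

spark.sql("""
    SELECT col1, col2,
           LAST_VALUE(col2) OVER (
               PARTITION BY col1 ORDER BY col2
               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
           ) AS column2_last
    FROM VALUES (1, 10), (1, 20), (2, 30) AS t(col1, col2)
""").show()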

6 More Replies
ros
by New Contributor III
  • 631 Views
  • 2 replies
  • 2 kudos

merge vs MERGE INTO

From the 10.4 LTS version we have low-shuffle merge, so merge is faster. But what about the MERGE INTO command that we run in a SQL notebook on Databricks? Is there any performance difference when we use the Databricks PySpark ".merge" function vs Databricks...
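For context, both forms should drive the same Delta Lake MERGE under the hood, so low-shuffle merge would apply to either; a minimal sketch with hypothetical table names:

from delta.tables import DeltaTable

# SQL notebook form
spark.sql("""
    MERGE INTO target t
    USING updates u
    ON t.id = u.id
    WHEN MATCHED THEN UPDATE SET t.value = u.value
    WHEN NOT MATCHED THEN INSERT (id, value) VALUES (u.id, u.value)
""")

# PySpark form, resolving to the same Delta MERGE operation
(DeltaTable.forName(spark, "target").alias("t")
    .merge(spark.table("updates").alias("u"), "t.id = u.id")
    .whenMatchedUpdate(set={"value": "u.value"})
    .whenNotMatchedInsert(values={"id": "u.id", "value": "u.value"})
    .execute())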

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Roshan RC​, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers you...

1 More Replies
RateVan
by New Contributor II
  • 1226 Views
  • 3 replies
  • 0 kudos

Spark last window doesn't flush in append mode

The problem is very simple: when you use a TUMBLING window with append mode, the window is closed only when the next message arrives (+ watermark logic). In the current implementation, if you stop the incoming streaming data, the last window will NEVER...

Latest Reply
RateVan
New Contributor II
  • 0 kudos

No, the problem remains the same; the meaning doesn't change because you increased the timeout a little bit. The window did not close, and it does not close until a new message arrives.
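A minimal sketch of the mechanics (source and sink are illustrative): in append mode a window is emitted only after the watermark passes its end, and the watermark advances only on new events, which is why the last window stays open when the stream goes quiet.

from pyspark.sql import functions as F

events = spark.readStream.format("rate").load()  # timestamp, value columns

agg = (events
    .withWatermark("timestamp", "10 seconds")
    .groupBy(F.window("timestamp", "1 minute"))
    .count())

# The final 1-minute window is only flushed once an event arrives whose
# timestamp pushes the watermark past the window's end.
query = agg.writeStream.outputMode("append").format("console").start()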

2 More Replies
bluesky
by New Contributor II
  • 1294 Views
  • 2 replies
  • 1 kudos

Identity error in Spark SQL: not enough data columns; target has 3 but the inserted data has 2, and it's the identity column which is missing here

While inserting into the target table I am getting an error "not enough data columns; target has 3 but the inserted data has 2", but it's the identity column, which is the 8th column: insert into table A (col1, col2, col3) select col2, col3 from table B join t...
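A hedged sketch of the usual fix, with hypothetical names: when the target has a GENERATED ALWAYS AS IDENTITY column, name only the non-identity columns in the INSERT column list so the column counts match and Delta generates the id.

spark.sql("""
    CREATE TABLE IF NOT EXISTS table_a (
        id BIGINT GENERATED ALWAYS AS IDENTITY,
        col2 STRING,
        col3 STRING
    ) USING DELTA
""")

# Two named columns, two selected columns; the identity column is omitted.
spark.sql("""
    INSERT INTO table_a (col2, col3)
    SELECT col2, col3 FROM table_b
""")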

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @sky blue​ Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you. Thanks!

1 More Replies
Ajay-Pandey
by Esteemed Contributor III
  • 1757 Views
  • 3 replies
  • 5 kudos

Support of running multiple cells at a time in databricks notebook

Hi all, Databricks notebooks now support parallel runs of commands in a single notebook, which will help run ad hoc queries simultaneously without creating a separate notebook. Once you run...

Latest Reply
Anonymous
Not applicable
  • 5 kudos

Hi @Ajay Pandey​ Hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so w...

2 More Replies
Jyo777
by Contributor
  • 2546 Views
  • 4 replies
  • 4 kudos

need help with Azure Databricks questions on CTE and SQL syntax within notebooks

Hi amazing community folks, feel free to share your experience or knowledge regarding the questions below: 1.) Can we pass a CTE SQL statement into Spark JDBC? I tried to do it and couldn't, but I can pass normal SQL (Select * from ...) and it works. I heard th...
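A likely explanation with a hedged sketch (connection details are hypothetical): the JDBC reader wraps whatever you pass in a derived table, roughly SELECT * FROM (<your query>) alias, and most dialects reject a WITH clause inside a derived table, which would explain why plain SELECTs work but CTEs fail.

# Plain SELECTs survive the subquery wrapping; a CTE generally will not.
df = (spark.read.format("jdbc")
    .option("url", "jdbc:sqlserver://myhost:1433;databaseName=mydb")
    .option("query", "SELECT id, amount FROM dbo.orders WHERE amount > 0")
    .option("user", "myuser")
    .option("password", "mypassword")
    .load())

Rewriting the CTE as an inline subquery is the usual workaround; newer Spark releases (3.4+) also add a prepareQuery JDBC option for SQL Server that can carry the WITH clause separately.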

Latest Reply
Kaniz
Community Manager
  • 4 kudos

Hi @Jyoti j​​, We haven't heard from you since the last response from @Suteja Kanuri​, and I was checking back to see if her suggestions helped you. Or else, if you have any solution, please share it with the community, as it can be helpful to others...

3 More Replies
oleole
by Contributor
  • 4695 Views
  • 3 replies
  • 2 kudos

Resolved! Using "FOR XML PATH" in Spark SQL in sql syntax

I'm using Spark version 3.2.1 on Databricks (DBR 10.4 LTS), and I'm trying to convert a SQL Server query to a new query that runs on a Spark cluster using Spark SQL in SQL syntax. However, Spark SQL does not seem to support XML PATH as a functi...

Latest Reply
oleole
Contributor
  • 2 kudos

Posting the solution that I ended up using:
%sql
DROP TABLE IF EXISTS UserCountry;

CREATE TABLE IF NOT EXISTS UserCountry (
    UserID INT,
    Country VARCHAR(5000)
);

INSERT INTO UserCountry
SELECT L.UserID AS UserID,
       CONCAT_WS(',', co...
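For anyone landing here, the Spark idiom behind that truncated INSERT is string aggregation with COLLECT_LIST plus CONCAT_WS, which covers the common FOR XML PATH use case of building comma-separated lists; a sketch with hypothetical names:

spark.sql("""
    SELECT UserID,
           CONCAT_WS(',', COLLECT_LIST(Country)) AS Countries
    FROM UserCountryRaw
    GROUP BY UserID
""")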

2 More Replies
oleole
by Contributor
  • 6088 Views
  • 1 replies
  • 1 kudos

Resolved! MERGE to update a column of a table using Spark SQL

Coming from an MS SQL background, I'm trying to write a query in Spark SQL that simply updates a column value of table A (source table) by INNER JOINing a new table B with a filter. The MS SQL query looks like this: UPDATE T SET T.OfferAmount = OSE.EndpointEve...

Latest Reply
oleole
Contributor
  • 1 kudos

Posting the answer to my question:
MERGE INTO TempOffer VIEW
USING OfferSeq OSE
ON VIEW.OfferId = OSE.OfferID AND OSE.OfferId = 1
WHEN MATCHED THEN
    UPDATE SET VIEW.OfferAmount = OSE.EndpointEventAmountValue;

JJL
by New Contributor II
  • 8039 Views
  • 3 replies
  • 3 kudos

Resolved! Can Spark SQL perform an UPDATE with INNER JOIN and LIKE with '%' + [column] + '%'?

Hi All, I came from MS SQL and just started learning more about Spark SQL. Here is one part that I'm trying to perform. In MS SQL it can be done easily, but it seems like it can't in Spark. So, I want to make a simple update to the record, if the co...

Latest Reply
oleole
Contributor
  • 3 kudos

@Hubert Dudek​ Hello, I'm having the same issue with using UPDATE in Spark SQL and came across your answer. When you say "replace source_table_reference with view" in MERGE, do you mean to replace "P" with "VIEW", looking something like below: %sql ME...
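A hedged sketch of the pattern being discussed, with hypothetical table and column names: Delta MERGE accepts an arbitrary ON condition, so the T-SQL LIKE '%' + col + '%' idiom can be expressed with concat.

# Non-equi match conditions are allowed in the MERGE ON clause,
# though they force an expensive join, and MERGE errors out if
# several source rows match the same target row.
spark.sql("""
    MERGE INTO products p
    USING keywords k
    ON p.description LIKE concat('%', k.term, '%')
    WHEN MATCHED THEN UPDATE SET p.flagged = true
""")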

2 More Replies
ramankr48
by Contributor II
  • 11443 Views
  • 5 replies
  • 8 kudos

Resolved! How to get all the table names with a specific column or columns in a database?

Let's say there is a database db in which there are 700 tables, and we need to find the names of all tables in which the column "project_id" is present. Just an example for understanding the question.

Latest Reply
Anonymous
Not applicable
  • 8 kudos

databaseName = "db" desiredColumn = "project_id" database = spark.sql(f"show tables in {databaseName} ").collect() tablenames = [] for row in database: cols = spark.table(row.tableName).columns if desiredColumn in cols: tablenames.append(row....

4 More Replies