Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Is Spark case sensitive? Spark is not case sensitive by default. If you have the same column name in different case (Name, name) and you try to select either the "Name" or the "name" column, you will get a column ambiguity error. There is a way to handle this issue b...
Hi, I had similar issues with Parquet files when trying to query Athena. The fix was that I had to inspect the Parquet file, since it contained columns such as "Name" and "name", which the AWS crawler / Athena would interpret as a duplicate column since it would se...
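A minimal sketch of the workaround the replies above hint at, assuming they are referring to the spark.sql.caseSensitive setting (the column names and values here are made up):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Treat "Name" and "name" as distinct columns for this session.
spark.conf.set("spark.sql.caseSensitive", "true")

df = spark.createDataFrame([(1, "a", "A")], ["id", "name", "Name"])
df.select("name", "Name").show()   # ambiguous again if the setting is false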
Support for running multiple cells at a time in Databricks notebooks. Hi all, Databricks notebooks now support parallel runs of commands in a single notebook, which helps run ad hoc queries simultaneously without creating a separate notebook. Once you run...
Hi Team, I am observing that the functionality is not working as expected in the Trial workspace of Databricks. Is there a setting that needs to be enabled to allow independent SQL cells in a Databricks notebook to run in parallel, while dependent cel...
I have a field stored as a string in the format "12/30/2022 10:30:00 AM". If I use the function TO_DATE, I only get the date part... I want the full date and time. If I use the function TO_TIMESTAMP, I get the date and time, but it's assumed to be UTC, ...
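A small sketch, assuming a PySpark context, of parsing the full date and time with an explicit pattern and shifting it to UTC if the string is really local wall-clock time (the source time zone used here is an assumption):

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([("12/30/2022 10:30:00 AM",)], ["event_str"])

# Parse the full date and time with an explicit pattern, then (assumption:
# the string is wall-clock time in US Eastern) shift it to UTC explicitly.
parsed = (
    df.withColumn("event_ts", F.to_timestamp("event_str", "MM/dd/yyyy hh:mm:ss a"))
      .withColumn("event_ts_utc", F.to_utc_timestamp("event_ts", "America/New_York"))
)
parsed.show(truncate=False)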
Hi amazing community folks, feel free to share your experience or knowledge regarding the questions below: 1.) Can we pass a CTE SQL statement into Spark JDBC? I tried to do it and couldn't, but I can pass normal SQL (Select * from ) and it works. I heard th...
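Not the poster's code, but a sketch of why the CTE fails and one workaround: Spark's JDBC reader wraps whatever SQL it is given in a subquery, so a leading WITH clause is often rejected by the database, while the same logic rewritten as a nested SELECT passed through the query option usually works. The connection details, credentials, and table names below are placeholders:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Placeholder connection details.
jdbc_url = "jdbc:postgresql://dbhost:5432/sales"

# Spark wraps the supplied SQL as SELECT * FROM (<sql>) alias, which is why a
# leading WITH ... clause is often rejected. Rewriting the CTE as a nested
# SELECT avoids that.
sql_without_cte = """
    SELECT o.order_id, o.amount
    FROM (SELECT order_id, amount FROM orders WHERE amount > 100) o
"""

df = (
    spark.read.format("jdbc")
    .option("url", jdbc_url)
    .option("query", sql_without_cte)     # 'query' option available since Spark 2.4
    .option("user", "reporting_user")
    .option("password", "***")
    .option("driver", "org.postgresql.Driver")
    .load()
)
df.show()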
The problem is very simple: when you use a TUMBLING window with append mode, the window is closed only when the next message arrives (plus the watermark logic). In the current implementation, if you stop the incoming streaming data, the last window will NEVER...
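For context, a minimal sketch of the setup being described: a tumbling-window aggregation with a watermark running in append mode, where a window is emitted only after newer events advance the watermark past the window's end (the rate source and interval sizes here are arbitrary):

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical rate source standing in for the real stream.
events = (
    spark.readStream.format("rate").option("rowsPerSecond", 10).load()
    .withColumnRenamed("timestamp", "event_time")
)

windowed = (
    events
    .withWatermark("event_time", "1 minute")
    .groupBy(F.window("event_time", "5 minutes"))   # tumbling window
    .count()
)

# In append mode a window is emitted only once the watermark passes its end,
# which only happens when newer events arrive -- the behaviour described above.
query = (
    windowed.writeStream
    .outputMode("append")
    .format("memory")
    .queryName("windowed_counts")
    .start()
)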
Is the limit per table/DataFrame or for all tables/DataFrames put together? The driver collects the data from all executors (which hold the respective table or DataFrame) and distributes it to all executors. When will the memory be released in bo...
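Assuming this thread is about the broadcast join threshold (spark.sql.autoBroadcastJoinThreshold), a small sketch of the mechanism being discussed; the size check applies to each relation considered for broadcast rather than to all broadcasts combined, and broadcast blocks are cleaned up once no longer referenced:

from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.getOrCreate()

# Default is 10 MB; the check is made per relation being broadcast.
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", 10 * 1024 * 1024)

large_df = spark.range(10_000_000).withColumnRenamed("id", "key")
small_df = spark.range(1_000).withColumnRenamed("id", "key")

# The small side is collected to the driver and shipped to every executor;
# Spark's context cleaner removes the broadcast blocks once the broadcast is
# no longer referenced.
large_df.join(broadcast(small_df), "key").count()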
I am trying to create a database with an external location (abfss) but am facing the below error. AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got exception: shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs....
Changing it to the CLUSTER level for OAuth authentication helped me solve the problem. I wish the notebook AI bot could tell me the solution. Before the change, my configuration was at the notebook level, and it produced the errors below. AnalysisException: org.apac...
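A sketch of the ABFS OAuth keys involved, assuming the service-principal (client credentials) flow; the storage account, application id, secret, and tenant id are placeholders, and spark is the notebook-provided session. The fix described above amounts to putting these same key/value pairs into the cluster's Spark config instead of setting them per notebook:

storage_account = "mystorageaccount"   # placeholder
oauth_configs = {
    f"fs.azure.account.auth.type.{storage_account}.dfs.core.windows.net": "OAuth",
    f"fs.azure.account.oauth.provider.type.{storage_account}.dfs.core.windows.net":
        "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    f"fs.azure.account.oauth2.client.id.{storage_account}.dfs.core.windows.net": "<application-id>",
    f"fs.azure.account.oauth2.client.secret.{storage_account}.dfs.core.windows.net": "<client-secret>",
    f"fs.azure.account.oauth2.client.endpoint.{storage_account}.dfs.core.windows.net":
        "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
}

# Notebook-level setting (what the poster started with); the reported fix was
# to move these same key/value pairs into the cluster's Spark config.
for key, value in oauth_configs.items():
    spark.conf.set(key, value)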
I am using a Databricks SQL notebook to run these queries. I have a Python UDF like %python
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType, DoubleType, DateType

def get_sell_price(sale_prices):
    return sale_...
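The snippet above is cut off; a hypothetical completion is sketched below, assuming the goal is to call the Python UDF from SQL cells, which requires registering it with spark.udf.register (the function body and return type here are assumptions):

from pyspark.sql.types import DoubleType

# Hypothetical completion of the truncated function above: return the lowest
# sale price, or None when the input array is empty.
def get_sell_price(sale_prices):
    return float(min(sale_prices)) if sale_prices else None

# Registering the function makes it callable from SQL cells.
spark.udf.register("get_sell_price", get_sell_price, DoubleType())

# e.g. in a %sql cell:  SELECT get_sell_price(array(10.0, 12.5, 9.9))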
I had a similar situation where I was trying to order the days of the week from Monday to Sunday. I saw solutions that use Python but wanted to do it all in SQL. My original attempt was to use CONCAT_WS(',', COLLECT_LIST(DISTINCT t.LOAD_ORIG_...
I am using Databricks Community Edition for learning purposes. I created some Hive-managed tables through Spark SQL as well as with the df.saveAsTable option. But when I connect to a new cluster, "SHOW DATABASES" only returns the default database....
As the files are still in DBFS, you can just recreate the references to your tables and continue the work, with something like this:
db_name = "mydb"
from pathlib import Path
path_db = f"dbfs:/user/hive/warehouse/{db_name}.db/"
tables_dirs = dbutils.fs.l...
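The snippet above is truncated; a sketch of the rest of the idea, assuming the table data was written as Delta under the Hive warehouse path (spark and dbutils are the objects a Databricks notebook provides):

db_name = "mydb"
path_db = f"dbfs:/user/hive/warehouse/{db_name}.db/"

spark.sql(f"CREATE DATABASE IF NOT EXISTS {db_name}")

for entry in dbutils.fs.ls(path_db):            # one sub-directory per table
    table_name = entry.name.rstrip("/")
    spark.sql(
        f"CREATE TABLE IF NOT EXISTS {db_name}.{table_name} "
        f"USING DELTA LOCATION '{entry.path}'"
    )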
I have written a CTE in Spark SQL:
WITH temp_data AS (
......
)
CREATE VIEW AS temp_view FROM SELECT * FROM temp_view;
I get a cryptic error. Is there a way to create a temp view from a CTE using Spark SQL in Databricks?
Inside a CTE you can't do a CREATE. It expects an expression in the form expression_name [ ( column_name [ , ... ] ) ] [ AS ] ( query ), where expression_name specifies a name for the common table expression. If you want to create a view from a CTE, y...
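A sketch of the pattern the reply is pointing at: start with CREATE ... VIEW and put the WITH clause inside the view body. The table and view names are placeholders:

spark.sql("""
    CREATE OR REPLACE TEMPORARY VIEW temp_view AS
    WITH temp_data AS (
        SELECT * FROM some_table            -- placeholder for the real query
    )
    SELECT * FROM temp_data
""")

spark.sql("SELECT * FROM temp_view").show()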
I took the Azure datasets that are available for practice. I got the 10 days of data from that dataset and now I want to save this data into DBFS in CSV format. I am facing an error: "No such file or directory: 'No such file or directory: '/dbfs...
Hi, after some experimenting: you need to be aware that a folder created with dbutils.fs.mkdirs("/dbfs/tmp/myfolder") is actually created at /dbfs/dbfs/tmp/myfolder. If you want to access the path with to_csv("/dbfs/tmp/myfolder/mytest.csv"), you should create it with this script: dbutils.fs...
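A small sketch of the path convention described above: dbutils.fs works against the DBFS root (dbfs:/ paths), while local-file APIs such as pandas' to_csv reach the same files through the /dbfs FUSE mount (folder and file names are made up):

import pandas as pd

# dbutils.fs addresses the DBFS root, so this creates dbfs:/tmp/myfolder ...
dbutils.fs.mkdirs("dbfs:/tmp/myfolder")

# ... which local-file APIs reach through the /dbfs FUSE mount. Passing
# "/dbfs/tmp/myfolder" to dbutils.fs.mkdirs instead would create
# dbfs:/dbfs/tmp/myfolder, the mismatch described above.
pdf = pd.DataFrame({"day": [1, 2], "value": [10.0, 12.5]})   # made-up sample
pdf.to_csv("/dbfs/tmp/myfolder/mytest.csv", index=False)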
Hi, I'm wondering if this is the expected behavior when using last or last_value in a window function? I've written a query like this:
select
  col1,
  col2,
  last_value(col2) over (partition by col1 order by col2) as column2_last
from values
...
For those stumbling across this: it seems LAST_VALUE emulates the same functionality as it does in SQL Server, which does not, in most people's minds, have a proper row/range frame for the window. You can adjust it with the syntax below. I understand l...
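The excerpt cuts off before the syntax; the usual adjustment is an explicit frame clause, since with an ORDER BY the default window frame ends at the current row. A sketch with made-up sample values:

spark.sql("""
    SELECT
        col1,
        col2,
        LAST_VALUE(col2) OVER (
            PARTITION BY col1
            ORDER BY col2
            ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
        ) AS column2_last
    FROM VALUES (1, 'a'), (1, 'b'), (2, 'c') AS t(col1, col2)
""").show()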
From the 10.4 LTS version we have low shuffle merge, so merge is faster. But what about the MERGE INTO statement that we run in a SQL notebook in Databricks? Is there any performance difference when we use the Databricks PySpark ".merge" function vs Databricks...
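For reference, a sketch of the two forms being compared; both resolve to the same Delta MERGE command, so optimizations such as low shuffle merge should apply to either (table names and the join condition are placeholders):

from delta.tables import DeltaTable

# SQL form, as you would run it in a %sql cell.
spark.sql("""
    MERGE INTO target_tbl t
    USING source_tbl s
    ON t.id = s.id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")

# Equivalent PySpark form; both end up as the same Delta MERGE command.
target = DeltaTable.forName(spark, "target_tbl")
source = spark.table("source_tbl")
(
    target.alias("t")
    .merge(source.alias("s"), "t.id = s.id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)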
Hi @Roshan RC, Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers you...
While inserting into the target table I am getting an error "not enough data columns; target has 3 but the inserted data has 2", but it's the identity column which is the 8th column. insert into table A (col1, col2, col3) select col2, col3 from table B join t...
Hi @sky blue, Hope all is well! Just wanted to check in if you were able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you. Thanks!