Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

brickster_2018
by Databricks Employee
  • 8382 Views
  • 2 replies
  • 0 kudos
Latest Reply
lchari
New Contributor II
  • 0 kudos

Is the limit per "table/dataframe" or for all tables/dataframes put together? The driver collects the data from all executors (which have the respective table or dataframe) and distributes it to all executors. When will the memory be released in bo...

1 More Replies
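A minimal PySpark sketch of the broadcast mechanics discussed above, with hypothetical table names; note that spark.sql.autoBroadcastJoinThreshold is evaluated per relation, not for all tables put together:

```python
from pyspark.sql.functions import broadcast

# Per-relation threshold in bytes for automatic broadcast joins; -1 disables it.
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", 10 * 1024 * 1024)

large_df = spark.table("sales")        # hypothetical large fact table
small_df = spark.table("dim_country")  # hypothetical small dimension table

# Explicit hint: the driver collects small_df once and ships a copy to every
# executor; the copies become eligible for release when the broadcast variable
# is no longer referenced or is explicitly unpersisted.
joined = large_df.join(broadcast(small_df), "country_id")
```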
Ajay-Pandey
by Esteemed Contributor III
  • 4366 Views
  • 4 replies
  • 5 kudos

Support of running multiple cells at a time in databricks notebook

Hi all, now Databricks notebooks support parallel run of commands in a single notebook, which will help run ad hoc queries simultaneously without creating a separate notebook. Once you run...

Latest Reply
SenthilRT
New Contributor II
  • 5 kudos

Can we run parallel cell execution for python (pyspark) cells?

3 More Replies
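The built-in feature runs cells in parallel from the notebook UI; for Python (PySpark) cells specifically, a common workaround is to fan queries out with threads from a single cell, since Spark jobs submitted from different threads run concurrently on the same cluster. A hedged sketch with hypothetical queries:

```python
from concurrent.futures import ThreadPoolExecutor

queries = {  # hypothetical ad hoc queries
    "orders": "SELECT COUNT(*) AS n FROM orders",
    "users": "SELECT COUNT(*) AS n FROM users",
}

def run(item):
    name, sql = item
    return name, spark.sql(sql).first()["n"]

# Each thread submits its own Spark job; the scheduler runs them side by side.
with ThreadPoolExecutor(max_workers=4) as pool:
    for name, n in pool.map(run, queries.items()):
        print(name, n)
```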
Arpi
by New Contributor II
  • 3357 Views
  • 4 replies
  • 4 kudos

Resolved! Database creation error

I am trying to create a database with an external abfss location but am facing the below error. AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got exception: shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs....

Latest Reply
source2sea
Contributor
  • 4 kudos

Changing it to the CLUSTER level for OAuth authentication helped me solve the problem. I wish the notebook AI bot could tell me the solution. Before the change, my configuration was at the notebook level, and it had the below error: AnalysisException: org.apac...

3 More Replies
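For reference, a hedged sketch of the notebook-level OAuth configuration the reply describes moving into the cluster's Spark config; the storage account, application id, secret, and tenant id are placeholders:

```python
storage = "mystorageacct"  # hypothetical storage account name
configs = {
    f"fs.azure.account.auth.type.{storage}.dfs.core.windows.net": "OAuth",
    f"fs.azure.account.oauth.provider.type.{storage}.dfs.core.windows.net":
        "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    f"fs.azure.account.oauth2.client.id.{storage}.dfs.core.windows.net": "<application-id>",
    f"fs.azure.account.oauth2.client.secret.{storage}.dfs.core.windows.net": "<client-secret>",
    f"fs.azure.account.oauth2.client.endpoint.{storage}.dfs.core.windows.net":
        "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
}
# Session-scoped (notebook-level) settings; the fix above was to put these
# same key-value pairs in the cluster's Spark config instead.
for k, v in configs.items():
    spark.conf.set(k, v)
```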
Constantine
by Contributor III
  • 10593 Views
  • 3 replies
  • 7 kudos

Resolved! collect_list by preserving order based on another variable - Spark SQL

I am using a Databricks SQL notebook to run these queries. I have a Python UDF like: %python from pyspark.sql.functions import udf from pyspark.sql.types import StringType, DoubleType, DateType def get_sell_price(sale_prices): return sale_...

Latest Reply
villi77
New Contributor II
  • 7 kudos

I had a similar situation where I was trying to order the days of the week from Monday to Sunday. I saw solutions that use Python but wanted to do it all in SQL. My original attempt was to use: CONCAT_WS(',', COLLECT_LIST(DISTINCT t.LOAD_ORIG_...

2 More Replies
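A hedged sketch of the pure-SQL pattern for an ordered collect_list: aggregate structs, sort the array by the ordering field, then project the value. The table and columns (a day number keyed to a day name) are hypothetical:

```python
# ARRAY_SORT on structs orders by the first field (day_num), which is what
# restores Monday-to-Sunday order after the unordered COLLECT_LIST.
spark.sql("""
    SELECT CONCAT_WS(',',
             TRANSFORM(
               ARRAY_SORT(COLLECT_LIST(STRUCT(day_num, day_name))),
               s -> s.day_name)) AS days_mon_to_sun
    FROM week_days
""").show(truncate=False)
```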
Jyo777
by Contributor
  • 4990 Views
  • 4 replies
  • 4 kudos

Need help with Azure Databricks questions on CTE and SQL syntax within notebooks

Hi amazing community folks, feel free to share your experience or knowledge regarding the below questions: 1.) Can we pass a CTE SQL statement into Spark JDBC? I tried to do it and couldn't, but I can pass normal SQL (Select * from ) and it works. I heard th...

Latest Reply
vijaypavann
Databricks Employee
  • 4 kudos

CTE expressions are supported with the `prepareQuery` option. https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html From the docs: "A prefix that will form the final query together with query. As the specified query will be parenthesized as a subquery i...

3 More Replies
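A hedged sketch of the `prepareQuery` option (available in newer Spark JDBC data source versions): the CTE text goes in prepareQuery and the final SELECT in query, and the two are concatenated into the statement sent to the database. The URL and table names are placeholders:

```python
df = (spark.read.format("jdbc")
      .option("url", "jdbc:sqlserver://<host>;databaseName=<db>")  # placeholder
      .option("prepareQuery",
              "WITH recent AS (SELECT * FROM orders WHERE order_year = 2023)")
      .option("query",
              "SELECT customer_id, SUM(amount) AS total FROM recent GROUP BY customer_id")
      .option("user", "<user>")
      .option("password", "<password>")
      .load())
```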
lnsnarayanan
by New Contributor II
  • 10112 Views
  • 8 replies
  • 12 kudos

Resolved! I cannot see the Hive databases or tables once I terminate the cluster and use another cluster.

I am using Databricks community edition for learning purposes. I created some Hive-managed tables through Spark SQL as well as with df.saveAsTable options. But when I connect to a new cluster, "Show databases" only returns the default database....

Latest Reply
dhpaulino
New Contributor II
  • 12 kudos

As the files are still in DBFS, you can just recreate the references to your tables and continue the work, with something like this: db_name = "mydb" from pathlib import Path path_db = f"dbfs:/user/hive/warehouse/{db_name}.db/" tables_dirs = dbutils.fs.l...

7 More Replies
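A hedged completion of the truncated snippet above, re-registering each table directory left in the community-edition warehouse path; the database name and the DELTA format are assumptions:

```python
db_name = "mydb"  # hypothetical database name
path_db = f"dbfs:/user/hive/warehouse/{db_name}.db/"

spark.sql(f"CREATE DATABASE IF NOT EXISTS {db_name}")
for d in dbutils.fs.ls(path_db):       # one sub-directory per table
    table = d.name.strip("/")
    spark.sql(f"""
        CREATE TABLE IF NOT EXISTS {db_name}.{table}
        USING DELTA LOCATION '{d.path}'
    """)
```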
Constantine
by Contributor III
  • 12642 Views
  • 2 replies
  • 6 kudos

Resolved! CREATE TEMP TABLE FROM CTE

I have written a CTE in Spark SQL WITH temp_data AS (   ......   )   CREATE VIEW AS temp_view FROM SELECT * FROM temp_view; I get a cryptic error. Is there a way to create a temp view from CTE using Spark SQL in databricks?

Latest Reply
-werners-
Esteemed Contributor III
  • 6 kudos

In the CTE you can't do a CREATE. It expects an expression in the form expression_name [ ( column_name [ , ... ] ) ] [ AS ] ( query ), where expression_name specifies a name for the common table expression. If you want to create a view from a CTE, y...

1 More Replies
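A hedged sketch of the reply's suggestion: keep the CTE inside the view body rather than wrapping a CREATE in the CTE. Table and column names are hypothetical:

```python
spark.sql("""
    CREATE OR REPLACE TEMP VIEW temp_view AS
    WITH temp_data AS (
        SELECT id, amount FROM sales WHERE amount > 0   -- hypothetical query
    )
    SELECT * FROM temp_data
""")
spark.table("temp_view").show()
```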
ramravi
by Contributor II
  • 13977 Views
  • 1 replies
  • 0 kudos

Is Spark case sensitive?

Spark is not case sensitive by default. If you have the same column name in different cases (Name, name) and you try to select either the "Name" or "name" column, you will get a column ambiguity error. There is a way to handle this issue b...

Latest Reply
source2sea
Contributor
  • 0 kudos

Hi, even though I set the conf to true, writing to disk threw exceptions complaining about duplicate columns. Below is the error message: org.apache.spark.sql.AnalysisException: Found duplicate column(s) in the data to save: branchavailablity....

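A hedged sketch of the setting referenced in the post; per the reply above, note that some writers still reject duplicate column names in the saved data even with the flag on:

```python
# Make column resolution case sensitive so "Name" and "name" are distinct.
spark.conf.set("spark.sql.caseSensitive", "true")

df = spark.createDataFrame([(1, 2)], ["Name", "name"])
df.select("Name").show()  # unambiguous only while the flag is true
```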
learnerbricks
by New Contributor II
  • 5780 Views
  • 4 replies
  • 0 kudos

Unable to save file in DBFS

I took the Azure datasets that are available for practice. I got the 10 days of data from that dataset and now I want to save this data into DBFS in CSV format. I am facing an error: "No such file or directory: 'No such file or directory: '/dbfs...

Latest Reply
pardosa
New Contributor II
  • 0 kudos

Hi, after some exercises you need to be aware that a folder created with dbutils.fs.mkdirs("/dbfs/tmp/myfolder") is created in /dbfs/dbfs/tmp/myfolder. If you want to access the path to_csv("/dbfs/tmp/myfolder/mytest.csv"), you should create it with this script: dbutils.fs...

3 More Replies
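A hedged sketch of the path mix-up described above: dbutils.fs paths are DBFS-rooted, while plain Python I/O such as pandas .to_csv goes through the /dbfs fuse mount, so using the /dbfs prefix in both places creates /dbfs/dbfs/... folders:

```python
import pandas as pd

dbutils.fs.mkdirs("/tmp/myfolder")           # DBFS path -> dbfs:/tmp/myfolder

pdf = pd.DataFrame({"a": [1, 2, 3]})         # hypothetical sample data
pdf.to_csv("/dbfs/tmp/myfolder/mytest.csv")  # local (fuse) view of that folder
```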
alexisjohnson
by New Contributor III
  • 10947 Views
  • 5 replies
  • 6 kudos

Resolved! Window function using last/last_value with PARTITION BY/ORDER BY has unexpected results

Hi, I'm wondering if this is the expected behavior when using last or last_value in a window function? I've written a query like this: select col1, col2, last_value(col2) over (partition by col1 order by col2) as column2_last from values ...

Latest Reply
Carv
Visitor II
  • 6 kudos

For those stumbling across this: it seems LAST_VALUE emulates the same functionality as it does in SQL Server, which does not, in most people's minds, have a proper row/range frame for the window. You can adjust it with the below syntax. I understand l...

4 More Replies
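A hedged sketch of the frame adjustment from the reply: with an ORDER BY, the default window frame ends at the current row, so LAST_VALUE returns the current value; widening the frame yields the true per-partition last value:

```python
spark.sql("""
    SELECT col1, col2,
           LAST_VALUE(col2) OVER (
               PARTITION BY col1 ORDER BY col2
               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
           ) AS column2_last
    FROM VALUES ('a', 1), ('a', 2), ('b', 3) AS t(col1, col2)
""").show()
```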
ros
by New Contributor III
  • 1727 Views
  • 2 replies
  • 2 kudos

merge vs MERGE INTO

From the 10.4 LTS version we have low shuffle merge, so merge is faster. But what about the MERGE INTO command that we run in a SQL notebook on Databricks? Is there any performance difference when we use the Databricks PySpark ".merge" function vs Databricks...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Roshan RC, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers you...

1 More Replies
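For what it's worth, a hedged sketch of the two forms being compared; both should resolve to the same Delta merge plan, so any performance difference would come from the query shape rather than the API. Table names are hypothetical:

```python
from delta.tables import DeltaTable

target = DeltaTable.forName(spark, "target_tbl")   # hypothetical tables
updates = spark.table("updates_tbl")

# PySpark .merge API
(target.alias("t")
 .merge(updates.alias("s"), "t.id = s.id")
 .whenMatchedUpdateAll()
 .whenNotMatchedInsertAll()
 .execute())

# SQL MERGE INTO equivalent
spark.sql("""
    MERGE INTO target_tbl t
    USING updates_tbl s ON t.id = s.id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")
```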
RateVan
by New Contributor II
  • 2407 Views
  • 3 replies
  • 0 kudos

Spark's last window doesn't flush in append mode

The problem is very simple: when you use a TUMBLING window with append mode, the window is closed only when the next message arrives (+ watermark logic). In the current implementation, if you stop incoming streaming data, the last window will NEVER...

Latest Reply
RateVan
New Contributor II
  • 0 kudos

No, the problem remains the same. The meaning doesn't change because you increased the timeout a little bit: the window did not close, and will not close, until a new message arrives.

2 More Replies
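A hedged sketch reproducing the mechanics under discussion: in append mode a windowed aggregate is only emitted after a later event advances the watermark past the window's end, so if the input stops, the final window is never flushed:

```python
from pyspark.sql.functions import window, count

events = (spark.readStream
          .format("rate").load()               # stand-in streaming source
          .withWatermark("timestamp", "1 minute"))

agg = (events
       .groupBy(window("timestamp", "5 minutes"))
       .agg(count("*").alias("n")))

(agg.writeStream
 .outputMode("append")   # rows appear only once their window is finalized
 .format("console")
 .start())
```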
bluesky
by New Contributor II
  • 2405 Views
  • 2 replies
  • 1 kudos

Identity error in Spark SQL: "not enough data columns; target has 3 but the inserted data has 2", but it's the identity column which is missing here

While inserting into the target table I am getting an error "not enough data columns; target has 3 but the inserted data has 2", but it's the identity column which is the 8th column. insert into table A (col1, col2, col3) select col2, col3 from table B join t...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @sky blue, hope all is well! Just wanted to check in if you were able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you. Thanks!

1 More Replies
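A hedged sketch of the usual fix: list only the non-identity columns in the INSERT so the generated identity column is filled automatically. Table and column names are hypothetical:

```python
spark.sql("""
    CREATE TABLE IF NOT EXISTS tbl_a (
        id   BIGINT GENERATED ALWAYS AS IDENTITY,
        col2 STRING,
        col3 STRING
    ) USING DELTA
""")
spark.sql("""
    INSERT INTO tbl_a (col2, col3)   -- omit the identity column
    SELECT col2, col3 FROM tbl_b
""")
```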
oleole
by Contributor
  • 7998 Views
  • 3 replies
  • 2 kudos

Resolved! Using "FOR XML PATH" in Spark SQL in sql syntax

I'm using Spark version 3.2.1 on Databricks (DBR 10.4 LTS), and I'm trying to convert a SQL Server query to a new query that runs on a Spark cluster using Spark SQL in SQL syntax. However, Spark SQL does not seem to support XML PATH as a functi...

Latest Reply
oleole
Contributor
  • 2 kudos

Posting the solution that I ended up using: %sql DROP TABLE IF EXISTS UserCountry; CREATE TABLE IF NOT EXISTS UserCountry ( UserID INT, Country VARCHAR(5000) ); INSERT INTO UserCountry SELECT L.UserID AS UserID, CONCAT_WS(',', co...

2 More Replies
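A hedged, minimal version of the string aggregation that replaces SQL Server's FOR XML PATH trick, in the spirit of the accepted answer; the table and columns are hypothetical:

```python
spark.sql("""
    SELECT UserID,
           CONCAT_WS(',', COLLECT_LIST(Country)) AS Countries
    FROM user_country
    GROUP BY UserID
""").show()
```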