In Spark SQL, you could use commands like "insert overwrite directory" that indirectly creates a temporary file with the datahttps://docs.databricks.com/spark/latest/spark-sql/language-manual/sql-ref-syntax-dml-insert-overwrite-directory.html#example...
I’m trying to use sql query on azure-databricks with distinct sort and aliasesSELECT DISTINCT album.ArtistId AS my_alias
FROM album ORDER BY album.ArtistIdThe problem is that if I add an alias then I can not use not aliased name in the order by ...
Hello,as the title already suggests, i'm not able to remove a file via the shell (%sh rm -f "path") nor continue the notebook 6.2 onwards on (6.3 etc...) inside DataBricks. I'm using the DataBricks Community edition.While the error message is clear:"...
I am trying to obtain the month and year in the format of "MM-YYY", then "YYY" to get a values such as 12-2012. I noticed an error where a timestamp of 2012-12-30T00:00:00.000+0000 results in both 12-2013 and 2013. This is an error, since 2012-12-30...
Hi @ Josh21! My name is Kaniz, and I'm the technical moderator here. Great to meet you, and thanks for your question! Let's see if your peers on the Forum have an answer to your questions first. Or else I will follow up shortly with a response.
I have a SQL query which I am converting into spark sql in azure databricks running in my jupyter notebook. In my SQL query, a column named Type is created on the fly which has value 'Goal' for every row:SELECT Type='Goal', Value FROM tableNow, when...
Hi, I noticed unexpected behavior for Date type. If year value is less then 1000 then filtering do not work.
Steps:create table test (date Date); insert into test values ('0001-01-01'); select * from test where date = '0001-01-01'
Returns 0 rows....
I'm trying to create a dashboard in Databricks SQL, parameterized by table name. We have a metadata table which contains the names of all the eligible tables, and we use it to populate a drop-down box for the dashboard. This is a simplified version ...
Yep, I figured out the issue now. Both of you gave the right information to solve the problem. My first mistake was as Jacob mentioned, `date` is actually a dataframe object here. To get the string date, I had to do similar to what Amine suggested. S...
In notebook, It looks like if I need to select top N rows, I can rely on "LIMIT" keyword. It would be nice if you can support "TOP" as well
The current approach to select 10 rows:
select * from table1 LIMIT 10
Requesting TOP support:
SELECT TOP 10 *...
I am extremely sorry, this feature is not available at databricks.You can request for this feature here:-https://docs.databricks.com/resources/ideas.html
Am trying to use SQL, but createOrReplaceTempView("myDataView") fails.
I can create and display a DataFrame fine...
import pandas as pd
df = pd.DataFrame(['$3,000,000.00','$3,000.00', '$200.5', '$5.5'], columns = ['Amount'])
df
I add another cell, ...
As of this comment, SQL analytics still requires a few additional enablement steps. You will need to ask your Databricks account team to help turn this on in your workspace.
When using SQL, I can use the Create Live Table command and the Create Incremental Live Table command to set the run type I want the table to use. But I don't seem to have that same syntax for python. How can I set this table type while using Python?
The documentation at https://docs.databricks.com/data-engineering/delta-live-tables/delta-live-tables-user-guide.html#mixing-complete-tables-and-incremental-tables has an example the first two functions load data incrementally and the last one loads...
I have a high concurrency cluster where multiple users are running. However, I see the queries are running very slow. I did debug the logs and see more time is spent on the Spark driver. on the Spark UI, I do not see slowness.
It's possible the connectivity to hive metastore is causing the delay here. When there is a high degree of concurrency and contention for metastore access. Interactive clusters in DBR are configured to use up to 5 (spark.databricks.hive.metastore.cli...