Engage in discussions on data warehousing, analytics, and BI solutions within the Databricks Community. Share insights, tips, and best practices for leveraging data for informed decision-making.
Here's your Data + AI Summit 2024 - Warehousing & Analytics recap as you use intelligent data warehousing to improve performance and increase your organization’s productivity with analytics, dashboards and insights.
Keynote: Data Warehouse presente...
I have huge datasets, transformation, display, print, show are working well on this data when read in a pandas dataframe. But the same dataframe when converted to a spark dataframe, is taking minutes to display even a single row and hours to write th...
I understand you want it sooner. Did it at least write the data in 10 minutes compared to not writing before?
There are more knobs you can tweak like
spark.sql.shuffle.partitions=auto
Do you have any index columns in your spatial data that can be us...
I am unable to obtain a count of a dataframe, it always get stuck at 1 stage, I have tried reducing the size, what can be the issue? How can I read cluster logs to identify the issue?
Driver memory is good enough, it is able to handle 90 lakhs data, what I am giving it is definitely less than that, what can I do about skewed data and shuffling?
Hi Community,I try to pass the result of a CTE as a function parameter as code below WITH t1 AS (
SELECT array_join(collect_list(output), ',') AS x
FROM my_catalog.my_db.get_x(:startTime, :endTime)
)
SELECT 'AM_offline' as Type, CASE WHEN off...
Hi @szymon_dybczak Thanks for replying. I don't the issue is related to datatype, since the query works if I pass the subquery to _x parameter without CTE.Please see as below code:SELECT 'AM_offline' as Type, CASE WHEN offline_ratio > 1.5 THEN 'no-Go...
I have created Python modules containing some Python functions and I would like to import them from a notebook contained in the Workspace. For example, I have a "etl" directory, containing a "snapshot.py" file with some Python functions, and an empty...
Hi @sachamourier ,It will work, but you need carefully craft path to sys.path.append(), you even do not need __init__.py to make it work.Try to hard-code the path to the snapshot.py in workspace.Add this to your notebook: import sys
import os
absolu...
According to the official Databricks documentation on GCP, I should have the ability to deploy a serverless SQL warehouse inside Databricks. Following the documentation, it is requested to turn on Serverless SQL warehouses (On), but there is nothing ...
Hello, Following abnormally high costs when using serverless sql on September 9 and 10, I noticed that the cluster sometimes stays on for an hour even though it's not receiving any new requests, and that the auto-stop is set to 5 minutes of inactivit...
Hi @EmmaP!I have encountered this. Even though the UI says that they are complete, they actually are not. While the query itself completed, the client is still fetching the data from the SQL Warehouse.To check if this is your issue, from the monitori...
Cool. This is a very convenient feature since most people now use the PDF format when working with text files. If anyone has ever had any issues with this format, I can say that I recently needed to merge several PDF files into one, and with the help...
is there a SQL equivalent of overwriteSchema ?https://docs.databricks.com/en/delta/update-schema.html#explicitly-update-schema-to-change-column-type-or-name
In place schema adjustment =>Then ALTER TABLE XXX ADD/DROP COLUMN XXX INTExamplecreate table test (id int, first_name string, last_name string ); insert into test values (1, 'john', 'smith'); alter table test add column age int; select * from testCr...
I'm using a SQL warehouse with autostop after 5 minutes of inactivity. However, the cluster is constantly activating and deactivating without any explanation. There are no queries being executed, and I can't identify any reasons why it is happening,...
Hi @msolcuadrado ,In your case I would try to contact directly with databricks support team. This is a serious issue and I feel your pain. They should help you pinpoint an excat cause + maybe you'll get a refund
Hi,I'm receiving the error Incorrect syntax near '=' when I run simple queries like the example below. This only happens when I use a column created using a CASE statement in the WHERE clause. I can use any other column in the WHERE clause, includi...
What jumps out to me at first is the backticks on `Peak Vertical Force / BW`, but I'm assuming that's just a column name and not an attempt at division.Next that jumps out is TestType and TestTypeName being aliased as testType and testTypeName- spark...
Hi,I'm using the REST API for SQL Warehouse in order to execute queries. I have experienced multiple times that query validation fails over the REST API, while executing the same query in the Databricks UI on the same cluster succeeds. An example: [P...
Had to try for myself and it seems the sql execution context in the REST API is different than that of an *.sql script, notebook or query made against an sql warehouse through the ui. The error stems from the fact that the SET command can also be use...
Hi Joeshph,How are you doing today?Give a try with below inputs and let me know if works well.Filter and aggregate data in Databricks to reduce load before it reaches Power BI. Use DirectQuery carefully, simplify measures, and reduce the number of vi...
I have a query using LCA. When referencing another table that has a column with the same name as the column used as LCA, the behavior of the query changes and it starts referencing the table column instead of the column that is already in the select ...
Hi @Kaniz_Fatma,we had the same problem as @paulocorrea.That's why it would be correct for to me to throw an error on ambiguous columns and the LCA could/must be addressed with a default identifier.Thanks
When trying to execute a query via sql warehouse, I get the following error:INVALID_PARAMETER_MARKER_VALUE.DUPLICATE_NAMEthe sql statement uses ? placeholders and the correct number of arguments are being passed.I am not able to use named placeholder...
Hi @RickB ,Which API are you using to invoke this? Parameter markers can be provided by:Python using its pyspark.sql.SparkSession.sql() API.Scala using its org.apache.spark.sql.SparkSession.sql() API.Java using its org.apache.spark.sql.SparkSession.s...
I'm planning to connect SQL Server Management Studio (SSMS) with Databricks using Lakehouse Federation. I understand that there are some differences in the SQL dialects between SSMS and Databricks SQL. For instance, in SSMS, we use TOP 10 to limit th...
To add on this:if you really have to use T-SQL (the MS dialect of SQL), you can define the SQL warehouse from databricks as a linked server on your SQL server.As said: SSMS is merely a sql client, the SQL dialect to be used is defined by the database...