Engage in discussions on data warehousing, analytics, and BI solutions within the Databricks Community. Share insights, tips, and best practices for leveraging data for informed decision-making.
Here's your Data + AI Summit 2024 - Warehousing & Analytics recap as you use intelligent data warehousing to improve performance and increase your organization’s productivity with analytics, dashboards and insights.
Keynote: Data Warehouse presente...
I'm trying to connect a Databricks workspace to Redshift with security groups. Before I set up security groups it worked fine, but now I can't connect because of the inbound rules. Which Databricks IPs should I add to the inbound rules to make it work?
Hi all, just started using Databricks Dashboards on Azure. While generating a line chart with an x-axis value based on a date data type, I applied a filter which removes some of the data. The above is without filtering, the below is with filtering. You can o...
I'm encountering an issue while importing a .gz file containing JSON data into a Spark DataFrame in Databricks. The error indicates a column name conflict. Could you please advise on how to resolve this issue and handle duplicate column names during ...
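One common cause, offered here only as a guess since the error text is cut off: JSON keys that differ only in case (e.g. "ID" vs "id") produce a duplicate-column error when Spark treats names case-insensitively. Enabling case sensitivity for the session is a real Spark setting that may help:
-- Treat JSON keys that differ only in case as distinct columns
-- (a guess at the cause; the original error message is truncated).
SET spark.sql.caseSensitive = true;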
We have Databricks tables and an API that queries that data; the query is dynamic and the API allows users to query anything. However, users can query a lot of data and consume a lot of DBUs, and generic rate limiting won't help, as any sin...
Hi everyone, we are currently stumbling over a big challenge with loading data into Power BI, and I need some advice! To give a bit of context: we introduced Databricks instead of Azure Synapse for a client of ours. We are currently busy with moving all...
I'm new to Databricks and I have a source data model that stores the data as Name-Value pairs (i.e. normalised) in two columns in the table:
EntityID  Name    Value
1         Field1  SomeValue1
1         Field2  SomeValue2
1         Field3  SomeValue3
2         Field1  SomeValue1
2         Field3  SomeValue3
The defi...
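If the goal is to turn those Name-Value pairs into one column per Name, Spark SQL's PIVOT clause can do it. A minimal sketch, assuming the data lives in a table called entity_attributes (a placeholder name) with the three columns shown above:
-- Pivot each distinct Name into its own column, keyed by EntityID.
-- FIRST(Value) picks the single value per (EntityID, Name) pair.
SELECT *
FROM entity_attributes
PIVOT (
  FIRST(Value) FOR Name IN ('Field1' AS Field1, 'Field2' AS Field2, 'Field3' AS Field3)
);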
Your first approach didn't work because named_struct needs its arguments at odd positions (the field names) to be foldable. You can think of it this way: at compile time, the compiler needs to "see" this value. That's why even if you prepared a proper expression ...
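To make the foldability point concrete, here is a small illustration (the table and column names are made up):
-- Works: the field names at odd positions are string literals the
-- analyzer can fold into constants.
SELECT named_struct('id', 1, 'name', 'Alice') AS person;
-- Fails to compile: the field name comes from a column, so it is not
-- foldable and the analyzer cannot "see" it.
-- SELECT named_struct(col_with_name, 1) FROM some_table;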
I'm trying to find information to see who runs a notebook. I can see who created the notebook and when the notebook was run, but there doesn't seem to be any information on who ran it, only who created it.
SELECT *
FROM system.logs
WHERE event_type = 'notebook_run'
ORDER BY timestamp DESC;
@KrisMcDougal with this code you can check which user ID started the first command of the notebook, and from that you will know who ran it.
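Depending on how the workspace is configured, the Unity Catalog audit system table can answer this too. A hedged sketch, assuming system tables are enabled; the exact action names recorded vary by setup (runCommand, for example, requires verbose audit logs):
-- Who touched notebooks, most recent first.
SELECT event_time, user_identity.email, action_name, request_params
FROM system.access.audit
WHERE service_name = 'notebook'
ORDER BY event_time DESC;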
Hi all, I am trying to use Delta Live Tables to connect to MSK. We have set up serverless MSK clusters that use IAM for authentication. I cannot connect to them from a DLT notebook. The same code works, near enough, on normal clusters that have the Java libr...
Just rephrasing the question: I am trying to use DLT to connect to serverless MSK clusters authenticated by IAM. The code works on ordinary clusters but not when run on DLT clusters. I think the issue is the authentication, because we can ...
When changes are made to a Databricks SQL table, a new version is created. If changes to the table are made using Spark or Python in a notebook and the table is overwritten, will a new version be created, or will it remain as version number 0?
When changes are made to a Databricks SQL table (Delta table) using Spark or Python in a notebook, and the table is overwritten, a new version will indeed be created. It will not remain as version number 0. Each overwrite operation increments the ver...
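An easy way to verify this yourself (my_table is a placeholder name):
-- Each overwrite appears as a new history row with a higher version;
-- for df.write.mode("overwrite"), operationParameters typically show
-- mode = Overwrite.
DESCRIBE HISTORY my_table;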
If I create an external table on AWS Databricks, will it be a Delta table? If not, is there a way to make it a Delta table, or is there no Delta capability for external tables?
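For what it's worth, a minimal sketch of creating an external table that is a Delta table (catalog, schema, and S3 path here are placeholders):
-- USING DELTA makes the external table a Delta table; omitting it or
-- choosing another format creates a non-Delta external table.
CREATE TABLE main.default.my_external_table
USING DELTA
LOCATION 's3://my-bucket/path/to/table';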
Yes. First of all, open-source Spark already has a set of auto-tuning features called Adaptive Query Execution (AQE). Here are more details: https://spark.apache.org/docs/latest/sql-performance-tuning.html#adaptive-query-execution.
For even bett...
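For reference, AQE is controlled by a single Spark setting and is enabled by default on recent runtimes:
-- Toggle Adaptive Query Execution for the current session.
SET spark.sql.adaptive.enabled = true;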
I am running a query against multiple parquet files:
SELECT
  SUM(CASE WHEN match_result.year_incorporated IS NOT NULL AND match_result.year_incorporated != '' THEN 1 ELSE 0 END)
FROM
  parquet.`s3://folder_path/*`
For some files, the field `year_incorpo...
@Shaimaa The column type mismatch between the files could be the issue here. For example: if in one file column 'xyz' is of type INTEGER and in another the same column is of type STRING, Spark will give you a schema conversion error. Below is a ...
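One possible way around the mismatch, sketched here with assumed paths and field names, and assuming a runtime where the read_files table-valued function and its schema option are available: supply an explicit schema so the conflicting field is read as a single type instead of letting schema merging fail.
-- Force year_incorporated to STRING across all files.
SELECT
  SUM(CASE WHEN match_result.year_incorporated IS NOT NULL
           AND match_result.year_incorporated != '' THEN 1 ELSE 0 END)
FROM read_files(
  's3://folder_path/*',
  format => 'parquet',
  schema => 'match_result STRUCT<year_incorporated: STRING>'
);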
I am running this query against parquet:
SELECT
  SUM(CASE WHEN match_result.ecommerce.has_online_payments THEN 1 ELSE 0 END)
FROM parquet.`s3://folder_path/*`
When all the values of the object `match_result.ecommerce` are null, I get the following erro...
None of these solutions with coalesce work, because it's "match_result.ecommerce" that is null, not "match_result.ecommerce.has_online_payments". So it's still trying to extract a value from a null. Please help me modify the query accordingly.
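A hedged suggestion: if the failure happens at run time, guarding the parent struct before touching its field may be enough; if it happens at analysis time because the struct was inferred as an untyped null, an explicit read schema (as in the read_files sketch above) is the likelier fix.
-- Only dereference has_online_payments when the parent struct exists.
SELECT
  SUM(CASE WHEN match_result.ecommerce IS NOT NULL
           AND match_result.ecommerce.has_online_payments THEN 1 ELSE 0 END)
FROM parquet.`s3://folder_path/*`;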