Engage in discussions on data warehousing, analytics, and BI solutions within the Databricks Community. Share insights, tips, and best practices for leveraging data for informed decision-making.
Here's your Data + AI Summit 2024 Warehousing & Analytics recap: use intelligent data warehousing to improve performance and increase your organization's productivity with analytics, dashboards, and insights.
Keynote: Data Warehouse presente...
Here, we're trying to use a Python UDF inside the query: taking the table as function input, converting the table into a DataFrame, performing the modification, converting the DataFrame back into a table, and returning the table. How can we create a Spark context inside U...
Hi team, I believe you cannot create or access a SparkSession or run Spark operations like spark.sql() directly inside a Python UDF. input_table is a table argument, not a string with a table name. You receive it as a pandas DataFrame when using RETUR...
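If the answer above is right that the table argument arrives as a pandas DataFrame, the UDF body can do its transformation with plain pandas and no SparkSession at all. A minimal sketch of that pattern (the column names `id` and `value` are made up for illustration):

```python
import pandas as pd

def transform(input_table: pd.DataFrame) -> pd.DataFrame:
    # input_table is already a pandas DataFrame inside the UDF body;
    # do NOT call spark.sql() or create a SparkSession here.
    out = input_table.copy()
    out["value"] = out["value"] * 2  # example modification
    return out                       # the returned frame becomes the output table

df = pd.DataFrame({"id": [1, 2], "value": [10, 20]})
print(transform(df)["value"].tolist())  # -> [20, 40]
```

The key point is that everything between receiving and returning the DataFrame is ordinary pandas; Spark only gets involved outside the function.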
Hi, is it possible to connect to the Community Edition using JDBC? On the web dashboard, in the "compute" instance parameters, there are details about a JDBC connection string, but it is templated to use a PAT. Though I've read in other posts that PAT ...
Hello! I'm new to Databricks. Assume I need to migrate a 2 TB Oracle data mart to Databricks on Azure. A serverless SQL warehouse seems a valid choice. What is the better option (cost vs. performance) to store the data? Should I upload Oracle extracts to Az...
@Curious-mind
You got it. COPY INTO is a good fit for the initial load, as it's optimized for bulk loads. Going forward, use Auto Loader to incrementally process new rows.
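The property that makes Auto Loader suited to the incremental phase is that it tracks which files it has already ingested (via checkpoint state), so each run picks up only new arrivals. A toy sketch of that idea in plain Python, purely to illustrate the behavior (the real mechanism lives in Auto Loader's checkpoint, not in your code):

```python
def incremental_load(listed_files, already_ingested):
    """Return only files not yet processed, mimicking how Auto Loader's
    checkpoint lets each run pick up just the new arrivals."""
    new_files = [f for f in listed_files if f not in already_ingested]
    already_ingested.update(new_files)
    return new_files

state = set()
print(incremental_load(["a.parquet", "b.parquet"], state))
# first run -> ['a.parquet', 'b.parquet']
print(incremental_load(["a.parquet", "b.parquet", "c.parquet"], state))
# next run -> ['c.parquet'] only
```

This is why the COPY INTO / Auto Loader split works: bulk-load once, then let the file-tracking handle everything after.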
Hi, I am trying to test our Delta Sharing server using UC in our workspace. We noticed that when we execute a query, for example SELECT COUNT(1) FROM table_name WHERE col1 = 'value', it sends two /query requests to our server. The first request has ...
Hi, we understand that the client performs a two-phase query planning process. In our case, the table has around 145,000 Parquet files, and we've observed that the first request becomes a significant bottleneck: the response body is large (655 MB) an...
I am trying to compute the market_share measure using the Custom Calculations functionality in the AI/BI Dashboard. My dataset looks like this: empresa (company): the name of the company; acessos (accesses): count of accesses. My Custom Calculation express...
I have this:

connectionString = 'mongodb+srv://user:pw@something.jghu.mongodb.net/?retryWrites=true&w=majority&appName=dbricks&tls=true'
database = 'dealflow'
collection = 'activities'
frame = spark.read.format("mongodb") \
    .option("spark.mongodb.read....
Hey @attie_bc, I guess you are using an all-purpose cluster. Have you tried curl https://www.google.com? Maybe your cluster doesn't have internet access? If that's the case, DNS resolution for your MongoDB Atlas SRV connection string will fail, which woul...
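One quick way to test the DNS theory from a notebook: pull the host out of the mongodb+srv:// URI and try to resolve it. The parsing below is plain string handling via urllib; the resolution call needs network access, so it's wrapped in try/except. (Note that Atlas actually resolves an SRV record at _mongodb._tcp.<host>; resolving the bare host is just a basic connectivity check.)

```python
import socket
from urllib.parse import urlparse

def srv_host(uri: str) -> str:
    """Extract the hostname from a mongodb+srv:// connection string."""
    # urlparse handles any scheme followed by //; user:pw@ is stripped by .hostname
    return urlparse(uri).hostname

uri = "mongodb+srv://user:pw@something.jghu.mongodb.net/?retryWrites=true"
host = srv_host(uri)
print(host)  # -> something.jghu.mongodb.net

try:
    print(socket.gethostbyname(host))  # fails fast if the cluster has no DNS/egress
except OSError as e:
    print(f"DNS resolution failed: {e}")
```

If the gethostbyname call raises, the cluster cannot resolve the Atlas host and no Spark connector option will fix it; the networking has to be sorted first.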
I am currently building a dashboard in Databricks AI/BI and would like to implement a parameter-based chart selection feature. The goal is to allow users to choose a metric from a dropdown (such as Sales, Cost, or Profit), and based on that selection...
Here is something to consider:
To implement a parameter-based chart selection feature in a Databricks AI/BI dashboard, follow these steps:
Utilize Dashboard Parameters to Drive Dynamic Updates:
Databricks dashboards support the use of parameters to ...
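One common way to wire this up (a sketch, not the only option): keep all the metric columns in one dataset and let the dropdown parameter pick which column feeds the chart, e.g. with a CASE expression keyed on the parameter in SQL. The same selection logic shown in pandas, with made-up column names:

```python
import pandas as pd

data = pd.DataFrame({
    "month": ["Jan", "Feb"],
    "sales": [100, 120],
    "cost": [60, 70],
    "profit": [40, 50],
})

def metric_series(df: pd.DataFrame, metric: str) -> pd.Series:
    """Mimic a dashboard parameter choosing which column the chart shows.
    In SQL this is typically CASE WHEN :metric = 'Sales' THEN sales ... END."""
    allowed = {"Sales": "sales", "Cost": "cost", "Profit": "profit"}
    return df[allowed[metric]]  # whitelist lookup; unknown metrics raise KeyError

print(metric_series(data, "Profit").tolist())  # -> [40, 50]
```

Whitelisting the allowed metric names (rather than interpolating the parameter into column references directly) keeps the dataset query safe and the dropdown values predictable.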
Hi. I have a local MongoDB running on an EC2 instance in the same AWS VPC as my Databricks cluster but cannot get Databricks to talk to MongoDB. I've followed the guide at https://docs.databricks.com/aws/en/connect/external-systems/mongodb and have a...
Hey @Kirki, maybe it's late, but I'll try to help you or others create these connections. First, make sure you have installed the connector org.mongodb.spark:mongo-spark-connector_2.12:3.0.1 on your cluster. You can use it directly in your spark....
I've got a custom Dash app I've written and am attempting to deploy. It runs fine on my local machine (while accessing my DB SQL Warehouse), but when I try deploying to Databricks, it cannot connect to the data for some reason. I was basically follow...
With the recent addition of dashboard tabs in Databricks, I could not find a way to have filters apply to multiple tabs within the same dashboard; so far I have had to manually create and apply filters to each of my tabs individually.
Hello @brld!
Currently, dashboards do not support applying filters automatically across multiple tabs. You could try using a global parameter; this parameter will be available to all widgets using the same dataset or query. However, the filter compon...
If I create an external table on AWS Databricks, will it be a Delta table? If not, is there a way to make it a Delta table, or is there no Delta capability for external tables?
Hi Akshay,
I believe you can try this for your use case ->
CREATE TABLE IF NOT EXISTS catalog.schema.my_external_table (
id INT,
name STRING,
age INT
)
USING delta
LOCATION '<location>'
This will create a delta table.
When creating a dashboard with multiple pages connected to one dataset, it seems that only visual elements on the same page as the filter take effect. Is there a way to filter all visual elements regardless of which page the filter is on? I h...
Hi folks,
We are currently working on global filters, which will allow you to set a filter value or parameter value across multiple pages. Keep an eye out for that feature, coming soon!
We aim to implement Databricks Mirroring through the Fabric APIs for automation. However, the Mirroring API specifically states that it is not compatible with Databricks. Are there alternative APIs that could be used to achieve this functionality?
I am looking to monitor my SQL Warehouse, especially the 'Running Clusters' metric that is available in the monitoring tab of the warehouse. This shows the start and shutdown times as well as the number of running clusters. The issue I have run into i...
Hey @Dave_Nithio, to monitor the "Running Clusters" metric for your SQL Warehouse, you can use the Databricks Cluster Events API. This API retrieves a list of events related to cluster activity, such as start and shutdown times, and provides paginated...
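The events endpoint returns results in pages; the response includes a next_page object that you feed back as the next request (field names here follow the Databricks REST API docs for /api/2.0/clusters/events, but double-check them against your workspace's API version). A sketch of the pagination loop, with the HTTP call stubbed out so the looping logic stands on its own:

```python
def collect_events(fetch_page, initial_request):
    """Drain a paginated events API: keep requesting until there is no next_page.
    `fetch_page` stands in for an authenticated POST to /api/2.0/clusters/events."""
    events, request = [], initial_request
    while request is not None:
        response = fetch_page(request)
        events.extend(response.get("events", []))
        request = response.get("next_page")  # absent/None on the last page
    return events

# Stubbed pages to exercise the loop:
pages = {
    0: {"events": [{"type": "RUNNING"}], "next_page": 1},
    1: {"events": [{"type": "TERMINATED"}]},
}
print(len(collect_events(lambda r: pages[r], 0)))  # -> 2
```

In a real job, `fetch_page` would issue the HTTP request with your workspace URL and token, and you would filter the accumulated events on the RUNNING/TERMINATED types you care about.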
Hello all! I have a Python script which uses the Databricks SQL connector for Python to pull a Databricks table into a pandas DataFrame, which is used to create a table in a Spotfire report. The table contains ~1.28 million rows, with 155 co...
Hey @barchiel33, after reviewing your context further, I believe the most effective approach would be to set up an automated pipeline within Databricks that periodically extracts data at the frequency you need (daily, weekly, hourly, etc.), cre...
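Whatever the extraction cadence, pulling ~1.28M wide rows through a single fetchall() is memory-heavy; DB-API cursors (including the Databricks SQL connector's) support fetchmany, so you can stream the result in chunks instead. A sketch of the chunking generator, with a stub cursor standing in for the real connection so the logic is self-contained:

```python
def fetch_batches(cursor, batch_size=50_000):
    """Yield result rows in chunks instead of one giant fetchall().
    Works with any DB-API cursor, e.g. the databricks-sql-connector's."""
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:       # empty list signals the result set is drained
            return
        yield rows

# Stub cursor to demonstrate the generator without a live warehouse:
class StubCursor:
    def __init__(self, rows):
        self.rows = rows
    def fetchmany(self, n):
        out, self.rows = self.rows[:n], self.rows[n:]
        return out

chunks = list(fetch_batches(StubCursor(list(range(7))), batch_size=3))
print([len(c) for c in chunks])  # -> [3, 3, 1]
```

With the real connector you would execute the query once, then append each yielded chunk to the file or frame that feeds the Spotfire report, keeping peak memory bounded by batch_size rather than the full table.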