- 198 Views
- 5 replies
- 0 kudos
Issue with Multiple Stateful Operations in Databricks Structured Streaming
Hi everyone, I'm working with Databricks Structured Streaming and have encountered an issue with stateful operations. Below is my pseudo-code: df = df.withWatermark("timestamp", "1 second") df_header = df.withColumn("message_id", F.col("payload.id"))...
According to this blog post, this should basically work, right? However, I'm getting the same error: Multiple Stateful Streaming Operators | Databricks Blog. Or am I missing something? rate_df = spark.readStream.format("rate").option("rowsPerSecond", "1")...
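For reference, here is a minimal sketch of the chained-aggregation pattern that blog post describes, built on the same rate source. It assumes Spark 3.4+ (a recent DBR), where window_time() and multiple stateful operators are supported; it is not the poster's actual pipeline.

```python
from pyspark.sql import functions as F

# `spark` is the ambient SparkSession on Databricks.
rate_df = spark.readStream.format("rate").option("rowsPerSecond", "1").load()

# First stateful operator: 5-second tumbling-window counts with a watermark.
counts_5s = (
    rate_df
    .withWatermark("timestamp", "1 second")
    .groupBy(F.window("timestamp", "5 seconds"))
    .count()
)

# Second stateful operator: re-aggregate the 5-second windows into 10-second
# ones, using window_time() on the window column as the blog post shows.
counts_10s = (
    counts_5s
    .groupBy(F.window(F.window_time("window"), "10 seconds"))
    .agg(F.sum("count").alias("count"))
)

# Chained stateful operators require append output mode.
query = (
    counts_10s.writeStream
    .format("memory")
    .queryName("counts")
    .outputMode("append")
    .start()
)
```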
- 258 Views
- 4 replies
- 1 kudos
Different JSON Results when Running a Job vs Running a Notebook
I have a regularly scheduled job that runs a PySpark notebook that GETs semi-structured JSON data from an external API, loads that data into dataframes, and saves those dataframes to Delta tables in Databricks. I have the schema for the JSON defined ...
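For context, the usual way to make job runs and notebook runs parse identically is to pin an explicit schema rather than relying on inference. A hedged sketch of that pattern follows; the field names, raw_df, and table name are placeholders, not the poster's actual schema:

```python
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, LongType

# Placeholder schema; the poster's real schema would go here.
schema = StructType([
    StructField("id", StringType()),
    StructField("created_at", LongType()),
    StructField("attributes", StringType()),  # keep loosely typed fields as strings
])

# raw_df is assumed to hold the raw API responses in a string column "body".
parsed = raw_df.select(F.from_json(F.col("body"), schema).alias("doc")).select("doc.*")
parsed.write.format("delta").mode("append").saveAsTable("main.default.api_data")
```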
@Alberto_Umana Sounds good, thank you for looking into it and let me know if there's any additional information I can provide in the meantime!
- 212 Views
- 4 replies
- 0 kudos
dbt error: Data too long for column at row 1
Hi there! We are experiencing a Databricks error we don’t recognise when running one of our event-based dbt models in dbt Core (version 1.6.18). The dbt model uses the ‘insert_by_period’ materialisation that is still experimental for version 1....
We have yet to upgrade dbt Core to the latest version, but will check again once we have done so.
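Independent of the dbt version, "Data too long for column" generally means a value exceeds the target column's declared length. A hedged diagnostic sketch (table and column names are placeholders, not from the thread) for locating the offending rows:

```python
from pyspark.sql import functions as F

df = spark.table("analytics.events_staging")  # placeholder staging table

# Longest value the model is trying to insert into the suspect column.
df.select(F.max(F.length("event_payload")).alias("max_len")).show()

# Rows that would overflow e.g. a VARCHAR(255) target column.
df.filter(F.length("event_payload") > 255).show(truncate=False)
```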
- 122 Views
- 1 reply
- 0 kudos
When is it time to change from ETL in notebooks to whl/py?
Hi! I would like some input/tips from the community regarding when it is time to go from a working solution in notebooks to something more "stable", like whl/py files. What are the pros/cons of notebooks compared to whl/py? The way I structured things...
Hey @Forsen, my advice: using .py files and .whl packages is generally more secure and scalable, especially when working in a team. One of the key advantages is that code reviews and version control are much more efficient with .py files, as changes c...
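To make the .whl route concrete, here is a minimal packaging sketch; the package name and layout are invented for illustration, not from the thread. Build it with `python -m build` and install the resulting wheel on the cluster:

```python
# setup.py for a hypothetical src-layout ETL package:
#   src/my_etl/__init__.py
#   src/my_etl/jobs.py
from setuptools import setup, find_packages

setup(
    name="my_etl",
    version="0.1.0",
    packages=find_packages(where="src"),
    package_dir={"": "src"},
    python_requires=">=3.10",
)
```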
- 97 Views
- 1 reply
- 0 kudos
Terminated cluster on free account
Hi, I mistakenly terminated my cluster. Could you please advise on how I can reactivate the same cluster?
Hi @Lupo123, to reactivate a terminated cluster on a free Databricks account, you will need to create a new cluster. Unfortunately, once a cluster is terminated, it cannot be reactivated.
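If API access is available on the account (free workspaces may restrict it), a replacement cluster can also be created programmatically. A hedged sketch with the Databricks Python SDK; all values are placeholders:

```python
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()  # reads credentials from the environment/CLI config

cluster = w.clusters.create(
    cluster_name="replacement-cluster",
    spark_version="15.4.x-scala2.12",
    node_type_id="Standard_DS3_v2",
    num_workers=1,
    autotermination_minutes=30,
).result()  # blocks until the cluster reaches RUNNING
print(cluster.cluster_id)
```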
- 4493 Views
- 4 replies
- 2 kudos
Gathering Data From a PDF File
Hello everyone, I am developing an application that accepts PDF files and inserts the data into my database. The company in question that distributes this data to us only offers PDF files, which you can see attached below (I hid personal info for priv...
You can use the PDF Data Source to read data from PDF files; examples here: https://stabrise.com/blog/spark-pdf-on-databricks/. After that, use the ScaleDP library to extract data from the text in a declarative way using an LLM. Here is an example of extraction ...
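If a plain-Python starting point is enough, the text can also be pulled out with pypdf before any Spark processing. A minimal sketch; the file path is a placeholder, and real statements will need layout-aware parsing on top of this:

```python
from pypdf import PdfReader

reader = PdfReader("/Volumes/main/default/docs/statement.pdf")
for page_number, page in enumerate(reader.pages, start=1):
    text = page.extract_text() or ""  # extraction can return None for image-only pages
    print(f"--- page {page_number} ---")
    print(text)
```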
- 204 Views
- 1 reply
- 0 kudos
Speaker diarization on Databricks with NeMo throwing an error
The configuration of my compute is 15.4 LTS ML (includes Apache Spark 3.5.0, GPU, Scala 2.12) on a Standard_NC8as_T4_v3 on Azure Databricks.
Hi @Nishat, it looks like there's a problem with GPU compatibility. As mentioned in the error message, FlashAttention only supports Ampere GPUs or newer. According to the following thread, the GPU architecture you've chosen (the T4 is Turing-based) is not supported: RuntimeError: FlashAt...
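A quick way to confirm this on the cluster is to check the CUDA compute capability; FlashAttention needs 8.0 (Ampere) or newer, and the T4 in a Standard_NC8as_T4_v3 reports 7.5. A hedged sketch:

```python
import torch

if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability(0)
    print(f"{torch.cuda.get_device_name(0)}: compute capability {major}.{minor}")
    if major < 8:
        print("FlashAttention unsupported here; disable it or pick an Ampere+ node type.")
else:
    print("No CUDA device visible to PyTorch.")
```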
- 249 Views
- 1 reply
- 0 kudos
dbt run command not working when invoked using subprocess.run
Hi, I am using the code below to run a dbt model from a notebook. I am passing parameters to the dbt run command (project directory, profiles directory, schema name, etc.). The issue is that when I run this code in my local workspace it works fine, but when ...
Hi @dk09, can you share the path of dbt_project_directory? Also, try inputting the folder path manually to debug it; does it still fail?
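For comparison, a hedged sketch of the subprocess.run pattern the thread describes, with paths hard-coded for debugging; all directories and vars are placeholders, not the poster's values:

```python
import subprocess

result = subprocess.run(
    [
        "dbt", "run",
        "--project-dir", "/Workspace/Repos/me/my_dbt_project",
        "--profiles-dir", "/Workspace/Repos/me/my_dbt_project",
        "--vars", '{"schema_name": "staging"}',
    ],
    capture_output=True,
    text=True,
)
print(result.stdout)
print(result.stderr)
result.check_returncode()  # raise if dbt exited non-zero
```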
- 292 Views
- 2 replies
- 0 kudos
INSERT OVERWRITE DIRECTORY
I am using this query to create a CSV in a volume named test_volsrr that I created: INSERT OVERWRITE DIRECTORY '/Volumes/DATAMAX_DATABRICKS/staging/test_volsrr' USING CSV OPTIONS ('delimiter' = ',', 'header' = 'true') SELECT * FROM staging.extract1gb DISTR...
The DISTRIBUTE BY COALESCE(1) clause is intended to reduce the number of output files to one. However, this can lead to inefficiencies and large file sizes because it forces all data to be processed by a single task, which can cause memory and perfor...
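A hedged DataFrame-based alternative (not from the thread) that avoids DISTRIBUTE BY while still producing a single file; note it also funnels all rows through one task, so it only suits modest data volumes:

```python
df = spark.table("staging.extract1gb")

(
    df.coalesce(1)  # single output partition -> single CSV part file
    .write.mode("overwrite")
    .option("header", "true")
    .option("delimiter", ",")
    .csv("/Volumes/DATAMAX_DATABRICKS/staging/test_volsrr")
)
```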
- 1654 Views
- 2 replies
- 0 kudos
Discrepancy in Performance Reading Delta Tables from S3 in PySpark
Hello Databricks Community, I've encountered a puzzling performance difference while reading Delta tables from S3 using PySpark, particularly when applying filters and projections. I'm seeking insights to understand this variation better. I've attempte...
Use the explain method to analyze the execution plans for both methods and identify any inefficiencies or differences in the plans. You can also review the metrics to understand this further. https://www.databricks.com/discover/pages/optimize-data-wo...
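To make that concrete, here is a hedged sketch of comparing the two read paths with explain(); the path, columns, and filter are placeholders. In "formatted" mode the scan node shows PushedFilters and ReadSchema, which is where such plans usually diverge:

```python
df = spark.read.format("delta").load("s3://my-bucket/tables/events")

fast = df.select("user_id", "ts").filter("ts >= '2024-01-01'")
fast.explain("formatted")  # inspect PushedFilters / ReadSchema on the scan

slow = df.filter(df.ts >= "2024-01-01").select("user_id", "ts")
slow.explain("formatted")  # same scan and pushdown, or a different plan?
```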
- 191 Views
- 1 reply
- 0 kudos
Error changing connection information of a Databricks data source published on Tableau Server
Hello, there is a Databricks data source published on the Tableau server. When I click the 'Edit Data Source' button where the data source is published, go to the Data Source tab, and change the Databricks connection information (HTTP...
1) I am wondering if there are saved credentials, which could cause the issue. 2) If possible, try using a different authentication method (e.g., a Personal Access Token) to see if the issue persists. This can help identify whether the problem is specific to the aut...
- 198 Views
- 1 reply
- 0 kudos
Migration of Power BI reports from Synapse to Databricks SQL (DBSQL)
We have 250 Power BI reports built on top of Azure Synapse, and we are now migrating from Azure Synapse to Databricks SQL (DBSQL). How should we plan the cutover, and what is a good strategy for Power BI? I am just seeking high-level points we have to take care of in planning. Any techie ...
While your account Solution Architect (SA) will be able to guide you, if you want to see what peers did, check here: https://community.databricks.com/t5/warehousing-analytics/migrate-azure-synapse-analytics-data-to-databricks/td-p/90663 and here: http...
- 86 Views
- 1 reply
- 0 kudos
How to identify the goal of a specific Spark job?
I'm analyzing the performance of a DBR/Spark request. In this case, the cluster is created using a custom image, and then we run a job on it. I've dived into the "Spark UI" part of the DBR interface and identified 3 jobs that appear to account for an...
Spark jobs are determined by your Spark code: each action triggers one or more jobs. You can look at the Spark plan to understand what operations each Spark job/stage is executing.
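One hedged trick (not mentioned in the reply) for tying Spark UI jobs back to code is to label each action with setJobDescription and print the plan it will run; the table name below is a placeholder:

```python
# Label the next action so it shows up under this name in the Spark UI.
spark.sparkContext.setJobDescription("daily-load: dedupe step")

df = spark.table("main.default.events").dropDuplicates(["event_id"])
df.explain("formatted")  # the physical plan the action below will execute
print(df.count())        # appears in the Spark UI with the label above
```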
- 557 Views
- 3 replies
- 1 kudos
Databricks workspace adjust column width
Hi, is it possible to change the column width in the workspace overview? Currently I have a lot of jobs with names that are too wide for the standard overview, so it is not easy to find certain jobs.
Ah, my mistake! You are right. It can only be done in Workflows.
- 254 Views
- 2 replies
- 0 kudos
JDBC Invalid SessionHandle with dbSQL Warehouse
Connecting Pentaho CTools dashboards to Databricks using JDBC to a serverless DBSQL warehouse works fine on the initial load, but if we leave it idle for a while and come back, we get this error: [Databricks][JDBCDriver](500593) Communication l...
I should have mentioned that we're using AuthMech=3 and in the JDBC docs (Databricks JDBC Driver Installation and Configuration Guide) I don't see any relevant timeout settings that would apply in that scenario. Am I missing something?