- 425 Views
- 4 replies
- 0 kudos
DLT refresh time for a combination of streaming and non-streaming tables?
```python
@dlt.table
def joined_table():
    dim_df = spark.read.table("dim_table")  # Reloads every batch
    fact_df = spark.readStream.table("fact_stream")
    return fact_df.join(dim_df, "id", "left")
```
Hi, the current approach reloads dim_df in every batch, which can be inefficient. To optimize, consider broadcasting dim_df if it's small, or using mapGroupsWithState for stateful joins. Also, ensure that fact_df has sufficient watermarking to h...
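A minimal sketch of the broadcast suggestion (assuming dim_table is small enough to broadcast; table and column names are taken from the original post):

```python
import dlt
from pyspark.sql.functions import broadcast

@dlt.table
def joined_table():
    dim_df = spark.read.table("dim_table")
    fact_df = spark.readStream.table("fact_stream")
    # Broadcasting the small static side avoids reshuffling it on every micro-batch
    return fact_df.join(broadcast(dim_df), "id", "left")
```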
- 8242 Views
- 2 replies
- 0 kudos
How to detect if running in a workflow job?
Hi there, what's the best way to tell which environment my Spark session is running in? Locally I develop with databricks-connect's DatabricksSession, but that doesn't work when running a workflow job, which requires SparkSession.getOrCreate()....
```python
import json

def get_job_context():
    """Retrieve job-related context from the current Databricks notebook."""
    # Retrieve the notebook context
    ctx = dbutils.notebook.entry_point.getDbutils().notebook().getContext()
    # Convert the context...
```
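A simpler check, offered as an assumption rather than something from this thread: Databricks runtimes set the DATABRICKS_RUNTIME_VERSION environment variable, so its presence distinguishes a cluster/job session from a local databricks-connect one.

```python
import os

from pyspark.sql import SparkSession


def get_spark() -> SparkSession:
    if "DATABRICKS_RUNTIME_VERSION" in os.environ:
        # Running on a Databricks cluster (including workflow jobs)
        return SparkSession.builder.getOrCreate()
    # Running locally: fall back to databricks-connect
    from databricks.connect import DatabricksSession
    return DatabricksSession.builder.getOrCreate()
```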
- 297 Views
- 1 reply
- 0 kudos
Help Needed: Executor Lost Error in Multi-Node Distributed Training with PyTorch
Hi everyone, I'm currently working on distributed training of a PyTorch model, following the example provided here. The training runs perfectly on a single node with a single GPU. However, when I attempt multi-node training using the following configu...
We do not recommend using spot instances with distributed ML training workloads that use barrier mode, like TorchDistributor, as these workloads are extremely sensitive to executor loss. Please disable spot/preemption and try again.
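A sketch of what that could look like in a Jobs API cluster spec, assuming AWS (runtime version, node type, and sizes are placeholders; the availability attribute differs per cloud):

```python
# Hypothetical new_cluster fragment: force on-demand instances so barrier-mode
# training is not interrupted by spot reclamation.
new_cluster = {
    "spark_version": "15.4.x-gpu-ml-scala2.12",  # placeholder GPU ML runtime
    "node_type_id": "g5.4xlarge",                # placeholder GPU node type
    "num_workers": 2,
    "aws_attributes": {"availability": "ON_DEMAND"},  # no spot/preemptible nodes
}
```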
- 3682 Views
- 2 replies
- 0 kudos
cannot create external location: invalid Databricks Workspace configuration
Hi all, I am trying to create Databricks storage credentials, an external location, and a catalog with Terraform. Cloud: Azure. My storage credentials code is working correctly, but the external location code throws the below error when executing the Terraf...
Hi @manoj_2355ca, I am also facing the same error. Did you find a solution for it?
- 3850 Views
- 5 replies
- 0 kudos
typing extensions import match error
I am trying to install the stanza library and create a UDF to generate NER tags for the chunk_text column in my DataFrame.
Cluster config: DBR 14.3 LTS, Spark 3.5.0, Scala 2.12
Below is the code:

```python
def extract_entities(text):
    import stanza
    nlp = stanza....
```
@SaadhikaB Hi, when I run dbutils.library.restartPython(), I get the following error
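If the underlying issue is an outdated typing_extensions on the cluster (an assumption; the error text is truncated above), a notebook-scoped upgrade followed by a Python restart is a common first step:

```python
# Run in a notebook cell: install the newer library, then restart the Python
# process so the upgraded package is actually imported.
%pip install --upgrade typing_extensions
dbutils.library.restartPython()
```

Note that restartPython() clears the interpreter state, so any code that needs the upgraded package belongs in subsequent cells.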
- 370 Views
- 0 replies
- 0 kudos
PYTEST: Module not found error
Hi, apologies, as I am trying to use pytest for the first time. I know this question has been raised before, and I went through previous answers, but the issue still exists. I am following Databricks and other articles on using pytest. My structure is simple as -tests--co...
- 2992 Views
- 4 replies
- 3 kudos
Failure when deploying a custom serving endpoint LLM
I'm currently experimenting with vector search using Databricks. Everything runs smoothly when I load the model deployed in Unity Catalog into a notebook session and ask questions using Python. However, when I attempt to serve it, I encounter a gener...
"Deploying a custom serving endpoint for LLMs can be challenging, especially when handling model dependencies and scaling issues. Has anyone found a reliable workaround for deployment failures? Also, for those looking for updates on government assist...
- 4690 Views
- 4 replies
- 0 kudos
Resolved! What version of Python is used for the 16.1 runtime?
I'm trying to create a Spark UDF for a registered model and getting: Exception: Python versions in the Spark Connect client and server are different. To execute user-defined functions, client and server should have the same minor Python version. Pleas...
Does this mean that:
1. A new Databricks runtime comes out
2. Serverless compute automatically switches to the new runtime + new Python version
3. Any external environments that use serverless, i.e. local VS Code / CI/CD environments, also need to upgrade their pyt...
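One way to see the client-side half of the mismatch (assuming, per the release notes, that DBR 16.1 runs Python 3.12; verify for your exact runtime):

```python
import sys

# Spark Connect requires the client and server to share the same *minor*
# Python version, so e.g. a 3.11 local environment cannot call a 3.12 server.
client = f"{sys.version_info.major}.{sys.version_info.minor}"
print(f"Client Python: {client}")  # must match the server's, e.g. 3.12
```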
- 294 Views
- 1 reply
- 0 kudos
Lakehouse monitoring metrics tables not created automatically.
Hello, I have an external table created in a Databricks Unity Catalog workspace and am trying to "Create a monitor" for it from the Quality tab. The dashboard gets created, however the two metrics tables, "profile" & "drift", a...
Hello @nikhil_2212! It looks like this post duplicates the one you recently posted. A response has already been provided to the Original post. I recommend continuing the discussion in that thread to keep the conversation focused and organised.
- 305 Views
- 1 reply
- 0 kudos
Stream processing large number of JSON files and handling exception
The application writes several small JSON files, and the expected volumes of these files are high (estimate: 1 million during the peak season in an hourly window). As per the current design, these files are streamed through Spark Structured Streaming and we use autolo...
We have customers that read millions of files per hour+ using Databricks Auto Loader. For high-volume use cases, we recommend enabling file notification mode, which, instead of continuously performing list operations on the filesystem, uses cloud nat...
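A minimal Auto Loader sketch with file notification mode enabled (paths and schema location are placeholders):

```python
df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    # Use cloud-native notifications instead of repeatedly listing the directory
    .option("cloudFiles.useNotifications", "true")
    .option("cloudFiles.schemaLocation", "/Volumes/main/default/schemas/json_ingest")
    .load("/Volumes/main/default/landing/json")
)
```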
- 616 Views
- 1 reply
- 0 kudos
Urgent: Need Authentication Reset for Databricks Workspace Access
I am unable to access my Databricks workspace because it is still redirecting to Microsoft Entra ID (Azure AD) authentication, even after I have removed the Azure AD enterprise application and changed the AWS IAM Identity Center settings. Issue Detail...
Hello @Pooviond! Please submit a ticket with the Databricks Support team for assistance in resolving this issue.
- 1227 Views
- 4 replies
- 1 kudos
Resolved! How best to measure the time-spent-waiting-for-an-instance?
I'm exploring using an instance pool. Can someone clarify which job event log tells me the time-spent-waiting-for-an-instance? I've found 2 candidates:
1. The delta between "waitingForCluster" and "started" on the "run events" log, accessible v...
- 567 Views
- 2 replies
- 1 kudos
Resolved! When is it time to change from ETL in notebooks to whl/py?
Hi! I would like some input/tips from the community regarding when it is time to go from a working solution in notebooks to something more "stable", like whl/py files. What are the pros/cons of notebooks compared to whl/py? The way I structured things...
Hey @Forssen, my advice: using .py files and .whl packages is generally more secure and scalable, especially when working in a team. One of the key advantages is that code reviews and version control are much more efficient with .py files, as changes ...
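As a minimal illustration of the point about reviewable code (module and function names are hypothetical): logic extracted from a notebook into a plain .py module can be imported by notebooks, jobs, and unit tests alike, then packaged into a .whl.

```python
# src/my_etl/transforms.py -- a hypothetical module extracted from a notebook
from pyspark.sql import DataFrame
import pyspark.sql.functions as F


def add_ingest_date(df: DataFrame) -> DataFrame:
    """Stamp each row with the ingestion date; trivially unit-testable."""
    return df.withColumn("ingest_date", F.current_date())
```

A notebook would then just `from my_etl.transforms import add_ingest_date`, while the same function is exercised by pytest in CI.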
- 7282 Views
- 7 replies
- 2 kudos
Resolved! Move multiple notebooks at the same time (programmatically)
If I want to move multiple (hundreds of) notebooks at the same time from one folder to another, what is the best way to do that, other than going to each individual notebook and clicking "Move"? Is there a way to programmatically move notebooks? Like ...
You can use the export and import API calls to export a notebook to your local machine and then import it into the new workspace.
Export: https://docs.databricks.com/api/workspace/workspace/export
Import: https://docs.databricks.com/api/works...
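A hedged sketch of a copy using those two endpoints (host, token, and paths are placeholders; a true move would also delete the source afterwards):

```python
import requests

HOST = "https://<workspace-host>"
HEADERS = {"Authorization": "Bearer <token>"}


def copy_notebook(src: str, dst: str) -> None:
    # Export the notebook source (the endpoint returns base64-encoded content,
    # which the import endpoint accepts as-is)
    exported = requests.get(
        f"{HOST}/api/2.0/workspace/export",
        headers=HEADERS,
        params={"path": src, "format": "SOURCE"},
    ).json()
    # Import it at the destination path
    requests.post(
        f"{HOST}/api/2.0/workspace/import",
        headers=HEADERS,
        json={
            "path": dst,
            "format": "SOURCE",
            "language": "PYTHON",
            "content": exported["content"],
            "overwrite": False,
        },
    ).raise_for_status()
```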
- 721 Views
- 1 reply
- 0 kudos
Resolved! Deduplication with rocksdb, should old state files be deleted manually (to manage storage size)?
Hi, I have the following streaming setup: I want to remove duplicates in streaming.
1) Deduplication strategy is defined by two fields: extraction_timestamp and hash (row-wise hash)
2) Watermark strategy: extraction_timestamp with a "10 seconds" interval
--> R...
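For reference, a minimal sketch of the setup as described (assuming dropDuplicates on the two fields after the watermark):

```python
# df is the streaming DataFrame read from the source
deduped = (
    df.withWatermark("extraction_timestamp", "10 seconds")
      .dropDuplicates(["extraction_timestamp", "hash"])
)
```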
Found the solution: https://kb.databricks.com/streaming/how-to-efficiently-manage-state-store-files-in-apache-spark-streaming-applications <-- these two parameters.
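For context, the linked article concerns state-store file retention; a sketch assuming the two parameters it describes are these standard Spark configs (values are illustrative):

```python
# How many recent micro-batches of state files to keep on disk (default 100)
spark.conf.set("spark.sql.streaming.minBatchesToRetain", "20")
# How many batches of state to keep loaded in memory (default 2)
spark.conf.set("spark.sql.streaming.maxBatchesToRetainInMemory", "2")
```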