- 4088 Views
- 3 replies
- 1 kudos
drop duplicates within watermark
Recently we started using Structured Streaming to ingest data. We want to use a watermark to drop duplicate events, but we encountered some weird behavior and an unexpected exception. Can anyone help explain what the expected behavior is and how should ...
Can any maintainer help me with this question?
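As a rough illustration of what watermark-based deduplication is generally expected to do, here is a minimal pure-Python sketch. It is a simplified model, not Spark's implementation. It advances the watermark per event rather than per micro-batch, and the function name `dedup_with_watermark` and the `(event_id, event_time)` tuple layout are hypothetical. Events older than the watermark are dropped as late, and deduplication state older than the watermark is evicted, so an ID can be emitted again once its state has expired.

```python
from datetime import datetime, timedelta


def dedup_with_watermark(events, delay):
    """Simplified model of deduplication bounded by a watermark.

    events: iterable of (event_id, event_time) tuples in arrival order.
    delay:  timedelta, how far the watermark trails the max event time seen.
    Returns the list of events that would be emitted.
    """
    seen = {}              # event_id -> event_time (the dedup state)
    max_event_time = None  # largest event time observed so far
    emitted = []

    for event_id, event_time in events:
        # Late data: anything older than the current watermark is dropped.
        if max_event_time is not None and event_time < max_event_time - delay:
            continue

        # First occurrence of this id in the retained state is emitted.
        if event_id not in seen:
            seen[event_id] = event_time
            emitted.append((event_id, event_time))

        # Advance the watermark and evict state that fell behind it.
        if max_event_time is None or event_time > max_event_time:
            max_event_time = event_time
        watermark = max_event_time - delay
        seen = {k: t for k, t in seen.items() if t >= watermark}

    return emitted


if __name__ == "__main__":
    t = lambda hh, mm: datetime(2024, 1, 1, hh, mm)
    sample = [
        ("a", t(10, 0)),
        ("a", t(10, 1)),   # duplicate inside the watermark window
        ("b", t(10, 30)),  # moves the watermark to 10:20, evicting "a"
        ("a", t(10, 5)),   # older than the watermark: dropped as late
        ("a", t(10, 31)),  # state for "a" was evicted, so it is emitted again
    ]
    for row in dedup_with_watermark(sample, timedelta(minutes=10)):
        print(row)
```

In this model, the last event shows why duplicates can reappear once they are further apart than the watermark delay.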
- 6032 Views
- 2 replies
- 1 kudos
Resolved! Read zstd file from Databricks
I just started reading a `zstd`-compressed file in Databricks on Azure, Runtime 14.1 on Spark 3.5.0. I've set the PySpark commands as follows: path = f"wasbs://{container}@{storageaccount}.blob.core.windows.net/test-zstd" schema = "some schema" df = spark.read...
The available compression types are format dependent. For JSON, zstd is not (yet) available, whereas for Parquet it is.
- 3364 Views
- 0 replies
- 0 kudos
Can error messages be un-redacted?
Is there a way to un-redact the logging of error messages? Alternatively, it would be nice to have access to the source code of the classes involved, such as com.databricks.backend.common.util.CommandLineHelper or com.databricks.util.UntrustedUtils. I'm getting t...
- 5705 Views
- 1 replies
- 1 kudos
How to schedule/refresh databricks alerts using REST API?
Hi, I am deploying Databricks SQL alerts using the REST API, but I can't figure out how to schedule their refresh task. I went through the documentation, which says "Alerts can be scheduled using the sql_task type of the Jobs API, e.g. Jobs/Create". How...
What the API docs mean is that you can create a job with a sql_task of type Alert. To make it easier, you can try creating the job in the UI first and downloading the JSON config. Here is an example with the main parameters that should ...
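For readers who want a starting point, here is a minimal Python sketch that builds a request body for a scheduled job whose single task refreshes a SQL alert. It is not the example from the reply above, which is truncated. The helper name `build_alert_job_payload`, the job name, the task key, and the placeholder IDs are all hypothetical. The field names follow the Jobs API 2.1 `sql_task` and `schedule` objects as I understand them, so check them against a job config exported from the UI before relying on this.

```python
import json


def build_alert_job_payload(alert_id, warehouse_id, cron, timezone_id="UTC"):
    """Build a Jobs API create-job body with one sql_task that runs an alert.

    All argument values are supplied by the caller; nothing here talks to
    a workspace. The body would be sent as JSON to the jobs/create endpoint.
    """
    return {
        "name": "refresh-sql-alert",          # hypothetical job name
        "tasks": [
            {
                "task_key": "refresh_alert",  # hypothetical task key
                "sql_task": {
                    "warehouse_id": warehouse_id,
                    "alert": {"alert_id": alert_id},
                },
            }
        ],
        "schedule": {
            "quartz_cron_expression": cron,
            "timezone_id": timezone_id,
            "pause_status": "UNPAUSED",
        },
    }


if __name__ == "__main__":
    payload = build_alert_job_payload(
        alert_id="<alert-id>",          # placeholder, not a real ID
        warehouse_id="<warehouse-id>",  # placeholder, not a real ID
        cron="0 0 * * * ?",             # Quartz syntax: top of every hour
    )
    print(json.dumps(payload, indent=2))
```

Comparing this body with the JSON downloaded from a job created in the UI is a quick way to spot any fields your workspace expects that the sketch leaves out.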
- 1116 Views
- 0 replies
- 0 kudos
Small files and discrepancy in S3 vs catalog
Hello all, I'm in the process of optimizing my tables and I'm running into a confusing situation. I have a table named "trace_messages_fg_streaming_event". If I navigate to the Databricks catalog, it shows stats: Size: 6.7GB, Files: 464. But when I look ...
- 12028 Views
- 1 replies
- 1 kudos
Shared access vs Single user access mode
I am running a notebook to get a secret value from GCP Secret Manager. This works well with Single user access mode, but it fails when I use a cluster with Shared access mode. I have specified the same GCP service account on both of these clust...
Thanks for your response. I am using a cloud service account (the same account that was used to create the workspace) in the cluster properties of both the single user cluster and the shared cluster. This service account has all the necess...
- 17381 Views
- 6 replies
- 0 kudos
Specify bottleneck for databricks cluster
Hi, I'm trying to find out what the bottleneck is on the cluster when running a loading process. Scenario: loading CDC changes from SQL Server to the Raw zone, merging the changes into the Bronze zone, and then merging Bronze into Silver. All is orchestrated in Data Factory as ...
stdout and stderr look okay. Do you have the log4j output to share? You can put it in a doc and share the doc here.
- 30186 Views
- 3 replies
- 3 kudos
Facing Issues with Databricks JDBC Connectivity after Idle time
Hello team, I am using a commons-dbcp2 DataSource, which supports default connection pooling, in a Spring Java application (REST services that fetch Databricks data via JdbcTemplate). Initially everything works fine and I can get the data from Databricks via...
I am seeing the same issue with Hikari. When a pooled connection is created and the Databricks cluster is then terminated (or restarted), the HikariDataSource retains a stale session handle. Why does connection.isValid() return true, then executing any qu...
- 4796 Views
- 2 replies
- 0 kudos
Resolved! Unable to edit Catalog Owner
I created a catalog and ownership was assigned to me. I created a Databricks account group in UC, added my user to this account group, and assigned ownership of the catalog to it. I then deleted the account group. Now the catalog ownership is showin...
Hi, in addition to the previous message, you can refer to https://docs.databricks.com/en/data-governance/unity-catalog/manage-privileges/index.html#assign-a-metastore-admin for more information on the metastore and related topics.
- 3017 Views
- 2 replies
- 0 kudos
Merge version data files of Delta table
Hi, I have a CDC-enabled Delta table. In version 256, the table has 50 data files. I want to merge them all into a single file. How can I merge all 50 data files so that when I query version 256, I get 1 data file? Is there any com...
Hi, are you talking about merging CSV files? https://community.databricks.com/t5/machine-learning/merge-12-csv-files-in-databricks/td-p/3551#:~:text=Use%20Union()%20method%20to,from%20the%20specified%20set%2Fs.
- 2224 Views
- 1 replies
- 0 kudos
Validating record counts between SQL Server database tables and migrated Azure Data Lake Gen2 data
We are migrating our project from on-premises to Azure. The on-premises database is SQL Server, and Azure Data Lake Gen2 is the storage location where we currently store the data. So far we are validating the record count of each...
- 1825 Views
- 0 replies
- 0 kudos
Deleting external table takes 8 hrs
Hi, I am trying to delete the data from an external partitioned table. It holds around 3 years of data and is partitioned on the date column. I am trying to delete each partition first and then the schema of the table, which takes around 8hrs...
- 2456 Views
- 0 replies
- 0 kudos
Why does the code below break?
from pyspark.sql import SparkSession
from pyspark.ml.regression import LinearRegression
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.evaluation import RegressionEvaluator
from pyspark.ml import Pipeline
import numpy as np
# Create a Spark...
- 2046 Views
- 2 replies
- 0 kudos
Having trouble with ARC (Automated Record Connector) Python Notebook
I'm trying to use Databricks ARC (Automated Record Connector) and running into an object issue. I assume I'm missing something rather trivial that's not related to ARC. #Databricks Python notebook #CMD1 import AutoLinker from arc.autolinker import A...
https://www.databricks.com/blog/improving-public-sector-decision-making-simple-automated-record-linking and https://github.com/databricks-industry-solutions/auto-data-linkage#databricks-runtime-requirements
- 2759 Views
- 1 replies
- 0 kudos
Databricks Certified Data Engineer Associate Exam suspended
Initially, my exam was suspended 20 minutes into the exam. I stated that there was an internet issue on my end (although there wasn't any). I immediately reached out to the Kryterion folks, but they responded after 50 minutes and allow...
Adding my username: akhil.inala@tredence.com. Exam: Databricks Certified Data Engineer Associate Exam