- 11119 Views
- 3 replies
- 1 kudos
Fuzzy Match on PySpark using UDF/Pandas UDF
I'm trying to do fuzzy matching on two dataframes by cross joining them and then applying a UDF for the fuzzy matching. But with both a Python UDF and a pandas UDF, it's either very slow or I get an error. @pandas_udf("int") def core_match_processor(s1: pd.Ser...
- 1 kudos
I'm now getting the error: (SQL_GROUPED_AGG_PANDAS_UDF) is not supported on clusters in Shared access mode. This happens even though this article clearly states that pandas UDFs are supported on shared clusters in Databricks: https://www.databricks.com/blog/shared-clu...
- 7163 Views
- 2 replies
- 1 kudos
Resolved! Okta and Unified login
Hey folks, has anyone put Databricks behind Okta and enabled Unified Login with some workspaces that have a Unity Catalog metastore applied and some that don't? There are some workspaces we can't move over yet, and it isn't clear in the documentation if Unity Catalo...
- 1 kudos
Yes, users should be able to use a single Okta application for all workspaces, regardless of whether the Unity Catalog metastore has been applied or not. The Unity Catalog is a feature that allows you to manage and secure access to your data across a...
- 1483 Views
- 0 replies
- 0 kudos
Public preview API not working - artifact-allowlists
I am trying to hit /api/2.1/unity-catalog/artifact-allowlists/ as part of an INIT migration script. It is in public preview; do we need to enable anything else to use an API that is in public preview? I am getting a 404 error. But using the same token for ...
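One thing worth ruling out with a 404 is the request path itself. The sketch below only builds the request; it assumes the endpoint expects an artifact type as a trailing path segment (for example `INIT_SCRIPT`), which should be checked against the Unity Catalog API reference. The function name `build_allowlist_request`, the host, and the token are hypothetical placeholders.

```python
import urllib.request


def build_allowlist_request(
    host: str, token: str, artifact_type: str = "INIT_SCRIPT"
) -> urllib.request.Request:
    """Build (but do not send) a GET request for an artifact allowlist.

    Assumption: the artifact type is the last path segment of the endpoint.
    """
    base = host.rstrip("/")
    url = f"{base}/api/2.1/unity-catalog/artifact-allowlists/{artifact_type}"
    # Same bearer-token header that other workspace REST calls use.
    headers = {"Authorization": f"Bearer {token}"}
    return urllib.request.Request(url, headers=headers, method="GET")
```

Printing `build_allowlist_request(host, token).full_url` before sending makes it easy to compare the exact path against the documentation.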
- 3448 Views
- 1 reply
- 0 kudos
How to enable "Create Vector Search Index" button in DB workspace?
How do I enable the "Create Vector Search Index" button in a Databricks workspace? The following screenshot is from the Microsoft Ignite 2023 Databricks presentation:
- 0 kudos
The feature is in public preview only in some regions; you can check the available regions in the documentation here. In addition, there are certain requirements, such as a UC-enabled workspace and Serverless Compute being enabled; you can check all requir...
- 5299 Views
- 5 replies
- 0 kudos
CONVERT_TIMEZONE issue in DLT
I can run a query that uses the CONVERT_TIMEZONE function in a SQL notebook. When I move the code to my DLT notebook, the pipeline produces this error: Cannot resolve function `CONVERT_TIMEZONE`. Here is the line: CONVERT_TIMEZONE('UTC', 'America/Phoen...
- 0 kudos
Yes, the notebook is set to SQL and the convert_timezone function is within a select statement.
- 8801 Views
- 2 replies
- 1 kudos
Can we get the actual query execution plan programmatically after a query is executed? Apart from UI
Let's say I have run a query and it showed me results. We can find the respective query execution plan in the UI. Is there any way to get that execution plan programmatically or through an API?
- 1 kudos
You can obtain the query execution plan programmatically using the EXPLAIN statement in SQL. The EXPLAIN statement displays the execution plan that the database planner generates for the supplied statement. The execution plan shows how the table(s) r...
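As a minimal sketch of the EXPLAIN approach from PySpark: the helper below takes an existing `spark` session and returns the plan text as a string. The name `explain_plan` and the default `FORMATTED` mode are choices made for this illustration. Note that EXPLAIN returns the plan the optimizer generates for the statement, not the runtime metrics shown in the UI after execution.

```python
def explain_plan(spark, query: str, mode: str = "FORMATTED") -> str:
    """Return the plan text Spark generates for `query` as a plain string.

    `spark` is an active SparkSession (for example the one predefined in a notebook).
    """
    # EXPLAIN yields a single row with a single column holding the plan text.
    rows = spark.sql(f"EXPLAIN {mode} {query}").collect()
    return rows[0][0]
```

For example, `print(explain_plan(spark, "SELECT 1"))` prints the plan, and the returned string can also be logged or stored alongside job metadata.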
- 3481 Views
- 2 replies
- 4 kudos
Top Kudoed Author 🌟🤩🧑🎤
I recently saw a link to the Kudos Leaderboard for the Community Discussions. It has always been my hope and fantasy, ever since I was a little child, that I would someday be the #1 Kudoed Author on Community Discussions on community.Databricks.com....
- 10836 Views
- 5 replies
- 7 kudos
Incremental ingestion of Snowflake data with Delta Live Table (CDC)
Hello, I have some data stored in Snowflake, and I want to apply CDC on it using Delta Live Tables, but I am having some issues. Here is what I am trying to do: @dlt.view() def table1(): return spark.read.format("snowflake").options(**opt...
- 7 kudos
The CDC for Delta Live Tables works fine for Delta tables, as you have noticed. However, it is not a full-blown CDC implementation/software. If you want to capture changes in Snowflake, you will have to implement some CDC method on Snowflake itself, and read...
- 3117 Views
- 2 replies
- 0 kudos
New to PySpark
Hi all, I am trying to get the domain from an email field using the expression below, but I am getting an error. Kindly help. df.select(df.email, substring(df.email,instr(df.email,'@'),length(df.email).alias('domain')))
- 0 kudos
In your case, you want to extract the domain from the email, which starts from the position just after '@'. So, you should add 1 to the position of '@'. Also, the length of the substring should be the difference between the total length of the email ...
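The arithmetic described above can be sketched as a Spark SQL expression: start one position after '@' and take as many characters as remain after it. The helper name `domain_expr` and the output alias `domain` are hypothetical, and reading the truncated reply as "total length minus the position of '@'" is an assumption.

```python
def domain_expr(column: str = "email") -> str:
    """Build a Spark SQL expression that returns the text after '@'."""
    # instr() is 1-based, so the domain starts at position + 1.
    at_pos = f"instr({column}, '@')"
    # Characters remaining after '@' = total length - position of '@'.
    return f"substring({column}, {at_pos} + 1, length({column}) - {at_pos})"
```

It can be applied with `df.selectExpr("email", f"{domain_expr()} AS domain")`. For `a@b.com`, `instr` returns 2, so the substring starts at position 3 with length 5. If a value contains no '@', `instr` returns 0 and the expression returns the whole string, so such rows may need separate handling.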
- 2073 Views
- 1 reply
- 0 kudos
Issue in inferring schema for streaming dataframe using json files
Below is the pipeline design in Databricks, and it's not working. Kindly take a look and let me know whether it will work or not. I'm getting JSON files of different schemas from directories under the root directory, and it reads all the files using...
- 0 kudos
Could you please share some sample of your dataset and code snippet of what you're trying to implement?
- 6381 Views
- 0 replies
- 0 kudos
Database: Delta Lake or PostgreSQL
Hey all, I am searching for a non-political answer to my database questions. Please know that I am a data newbie and literally do not know anything about this topic, but I want to learn, so please be gentle. Some context: I am working for an OEM that...
- 7362 Views
- 2 replies
- 3 kudos
Resolved! Pros and cons of physically separating data in different storage accounts and containers
When setting up Unity Catalog, Databricks recommends figuring out your data isolation model when it comes to physically separating your data into different storage accounts and/or containers. There are so many options, it can be hard to be ...
- 3 kudos
Hello @pernilak, Thanks for reaching out to Databricks Community! My name is Raphael, and I'll be helping out. Should all catalogs and the metastore reside in the same storage account (but different containers)? Yes, Databricks recommends having o...
- 2362 Views
- 0 replies
- 0 kudos
New draft for every post I visit
When I visit my profile page, under the drafts section I see an entry for every post I visit in the discussions. Is this normal?
- 1763 Views
- 1 reply
- 1 kudos
Databricks Web Editor's Cell like UI in local IDE
I want to do Databricks-related development locally. There is an extension that allows running a local Python file on a remote Databricks cluster. But I want the cell-like structure that is present in the Databricks UI for Python files in my local IDE as well....
- 1 kudos
@swapnilmd You can use the VS Code extension for Databricks: https://docs.databricks.com/en/dev-tools/vscode-ext/index.html
- 6183 Views
- 2 replies
- 0 kudos
Databricks JDBC driver connection issue with Apache Solr
Hi, Databricks JDBC version - 2.6.34. I am facing the issue below when connecting to Databricks SQL from Apache Solr: Caused by: java.sql.SQLFeatureNotSupportedException: [Databricks][JDBC](10220) Driver does not support this optional feature. at com.databri...
- 0 kudos
The Databricks team recommended setting IgnoreTransactions=1 and autocommit=false in the connection string, but that didn't resolve the issue. Ultimately I had to use the Solr update API for uploading documents.