Introduction
In Part 1 of this blog series, we explored the various types of duplicates, considerations for remediation, and the impacts of unchecked duplicated records on strategic decision-making.
...
By Jeroen Meulemans, Solutions Architect at Databricks Amsterdam, Thu 7 Sep 2023
Introduction
This blog guides you through the process of configuring OAuth credential...
Authors: Anastasia Prokaieva and Puneet Jain
In the first part, we covered the main aspects of data loading using the Hugging Face integration with Spark DataFrames and how to use Ray AIR to ...
Learn to build fast, stateful pipelines for operational workloads. Discover stateless vs. stateful streams, how to set up your cluster, and more. Get hands-on building a pipeline with code snippets and ...
Authors: Abhishek Pratap (@aps) & Dipankar Kushari (@dkushari)
In this blog, we explore how to synchronize nested groups in Databricks from your organization’s identity provider - Azure Active Director...
Databricks Serverless SQL (DBSQL) is the latest offering from Databricks for building data warehouses on the Lakehouse. It incorporates all the Lakehouse features like open format, unified analytics, and ...
What is Single Sign-On (SSO)?
The rise of Software as a Service (SaaS) has driven the adoption of specialized software. This resulted in enterprises adopting many SaaS serv...
Authors: Anastasia Prokaieva and Puneet Jain
The aim of this blog is to show the end-to-end process of converting from vanilla Hugging Face to Ray AIR on Databricks, without changing the training lo...
Maintaining Slowly Changing Dimensions (SCD) is a common practice in data warehousing to manage and track changes in your records over time. It enables businesses to make more informed and strategic d...
In this blog, I would like to introduce you to the Databricks Lakehouse Platform and explain concepts like batch processing, streaming, and Apache Spark at a high level, and how it all ties together with s...