Get Started Resources
Explore essential resources to kickstart your journey with Databricks. Access tutorials, guides, and...
Stay updated on Databricks events, including webinars, conferences, and workshops. Discover opportun...
Find answers to common questions and troubleshoot issues with Databricks support FAQs. Access helpfu...
Explore in-depth articles, tutorials, and insights on data analytics and machine learning in the Dat...
Dive into a collaborative space where members like YOU can exchange knowledge, tips, and best practi...
Stay up-to-date with the latest announcements from Databricks. Learn about product updates, new feat...
Community-produced videos to help you leverage Databricks in your Data & AI journey. Tune in to expl...
Are you ready to dive into the world of data engineering and analytics? Join us for Databricks Get Started Days, a half-day virtual event designed to accelerate your learning and equip you with essential Databricks skills! Here’s what makes Get Start...
Introduction In the fast-paced world of big data, optimizing performance is critical for maintaining efficiency and reducing costs. Databricks SQL (DBSQL) Warehouse is a robust feature of the Databricks platform that enables data analysts, data engin...
Very detailed and informative. Thanks @rakhidarshi for sharing!
Everyone's rushing their Snowflake to Databricks migration, and they're setting themselves up for failure. After leading multiple enterprise migrations to Databricks last quarter, here's what shocked me: the technical lift isn't the hard part. It's th...
Hi all, I passed it some time ago, but only recently summarized all the materials that helped me do it. Pay special attention to the GitHub repository, which contains many great exercises prepared by the Databricks team: https://youtu.be...
I almost gave up after failing Databricks-Certified-Data-Engineer-Associate once, but then I found Passexamhub. Their realistic practice tests helped me understand my weak areas and improve. Passed on my second attempt—such a relief!
Today we are announcing a deep partnership with SAP which we think can be game changing for our industry. In short, it is the marriage between the most important business data for enterprises globally (SAP data) and the best data platform in the mark...
In today’s data-centric world, experimentation is essential for developers and data scientists to create cutting-edge models, test hypotheses, and build robust data pipelines. However, giving these teams access to production data raises serious conce...
Great post! I did some further reading on each topic, and I would like to add a few things here. 1. Anonymization: I wouldn't use uuid() like this. Using a hashing function would be better to ensure consistency across multiple runs: F.sha2(F.con...
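The commenter's point about uuid() versus hashing can be shown with a minimal plain-Python sketch (on Databricks you would use pyspark.sql.functions.sha2 over concat_ws instead of hashlib; the salt value below is a made-up placeholder):

```python
import hashlib
import uuid

# Hypothetical salt -- in practice keep it secret and consistent per project.
SALT = "per-project-salt"

def anonymize(value: str) -> str:
    """Deterministic pseudonymization: the same input always yields the
    same token, so keys stay joinable across runs and across tables."""
    return hashlib.sha256((SALT + value).encode("utf-8")).hexdigest()

# uuid4() mints a fresh token on every call, so re-running the pipeline
# would assign a different ID to the same person:
assert uuid.uuid4() != uuid.uuid4()

# The salted hash is stable from run to run:
assert anonymize("alice@example.com") == anonymize("alice@example.com")
```

The deterministic property is exactly what makes anonymized keys usable in joins; the trade-off is that a hash is only pseudonymous, so the salt must be protected.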
In modern data-driven enterprises, data flows like lifeblood through complex systems and repositories to drive decision-making and innovation. Each dataset, whether structured or unstructured, holds the potential to unlock insights and drive innovati...
It is a great article; I am excited for the next parts. I am not sure about having these metadata tables in the lakehouse, though: it forces us to build a data pipeline for the metadata table. Isn't it better to just use a transactional database like MongoDB or...
As organizations increasingly adopt multi-cloud strategies to leverage the unique strengths of various cloud platforms, they face the dual challenge of maintaining robust security while enabling efficient data sharing. Balancing accessibility with p...
Thank you for your feedback @Mantsama4!! Our solution is designed to tackle both B2B and Line of Business sharing by combining robust security with flexible, region-specific deployment. For B2B scenarios, we ensure that external partners access data ...
Databricks Unity Catalog (UC) is the industry’s only unified and open governance solution for data and AI, built into the Databricks Data Intelligence Platform. Unity Catalog provides a single source of truth for your organization’s data and AI asset...
Intro Ray is rapidly becoming the standard for logic-parallel computing, enabling many Databricks customers to accelerate a wide range of Python workloads. Since its general availability on Databricks in early 2024, Ray on Databricks has opened up ne...
In this article, we will cover streaming deduplication in depth: using watermarking with dropDuplicates and dropDuplicatesWithinWatermark, and how the two differ. This blog expects you to have a good understanding of how watermarking works in Spa...
Hi @MuraliTalluri, thank you for such a detailed article. I am following dropDuplicatesWithinWatermark with the same steps as yours. The only difference is that I am using Auto Loader, reading CSV files as the source and writing data to a Delta table...
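For readers skimming the thread, the behavior the article contrasts can be sketched as a toy, non-Spark model: dropDuplicatesWithinWatermark suppresses a duplicate key only while its state is within the watermark delay, after which the state is evicted and the key can appear again. The class below is illustrative, not Spark's actual implementation:

```python
from datetime import datetime, timedelta

class WithinWatermarkDeduper:
    """Toy model of dropDuplicatesWithinWatermark: duplicates are dropped
    only within the watermark delay; expired state is evicted, keeping
    state bounded (unlike plain dropDuplicates, which keeps state forever)."""

    def __init__(self, delay: timedelta):
        self.delay = delay
        self.max_event_time = datetime.min
        self.state = {}  # key -> event time when first seen

    def process(self, key, event_time):
        self.max_event_time = max(self.max_event_time, event_time)
        watermark = self.max_event_time - self.delay
        # Evict keys whose state has fallen behind the watermark.
        self.state = {k: t for k, t in self.state.items() if t >= watermark}
        if key in self.state:
            return None  # duplicate within the watermark window: dropped
        self.state[key] = event_time
        return key

d = WithinWatermarkDeduper(timedelta(minutes=10))
t0 = datetime(2024, 1, 1, 12, 0)
print(d.process("a", t0))                          # a    (first seen)
print(d.process("a", t0 + timedelta(minutes=5)))   # None (duplicate, dropped)
print(d.process("a", t0 + timedelta(minutes=30)))  # a    (state expired, kept)
```

The last line is the key difference: with plain dropDuplicates the third record would also be dropped, because its state is never evicted.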
As organizations continue to scale their data infrastructure, efficient resource utilization, cost control, and operational transparency are paramount for success. With the growing adoption of Databricks, monitoring and optimizing compute usage and d...
Thank you Mohana for sharing the detail, really appreciate it.
In the world of data integration, synchronizing external relational databases (like Oracle, MySQL) with the Databricks platform can be complex, especially when Change Data Feed (CDF) streams aren’t available. Using snapshots is a powerful way to mana...
Hi Ajay, can applying changes from snapshots handle re-processing of an older snapshot? Use case: the source has delivered data on days T, T1, and T2. Consumers realise there is an error in the day T data and make a correction in the source. The source redel...
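The snapshot approach the article describes can be sketched in plain Python: derive insert/update/delete events by diffing two full snapshots keyed by primary key. This is an illustrative toy, not Databricks' actual APPLY CHANGES machinery; the table contents and key names below are made up. In this toy model, a corrected day-T value simply surfaces as an UPDATE on the next diff:

```python
def diff_snapshots(old: dict, new: dict):
    """Derive CDC-style events by comparing two full snapshots,
    each a mapping of primary key -> row."""
    events = []
    for key, row in new.items():
        if key not in old:
            events.append(("INSERT", key, row))
        elif old[key] != row:
            events.append(("UPDATE", key, row))
    for key, row in old.items():
        if key not in new:
            events.append(("DELETE", key, row))
    return events

day_t  = {1: {"amt": 100}, 2: {"amt": 200}}
day_t1 = {1: {"amt": 150}, 3: {"amt": 300}}  # key 1 corrected, 2 gone, 3 new
print(diff_snapshots(day_t, day_t1))
# [('UPDATE', 1, {'amt': 150}), ('INSERT', 3, {'amt': 300}), ('DELETE', 2, {'amt': 200})]
```

Diffing snapshots this way is how corrections propagate without a change feed: the consumer never needs to know *why* a row changed, only that the latest snapshot differs from the last one processed.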
Databricks Serverless SQL (DBSQL) is the latest offering from Databricks to build data warehouses on the Lakehouse. It incorporates all the Lakehouse features like open format, unified analytics, and collaborative platforms across the different data ...
This is a great solution! The post provides an in-depth, structured approach to optimizing Databricks SQL Serverless, highlighting key tips such as resource optimization, query performance improvements, and the best practices for data types and cachi...
Databricks recommends four methods to migrate Hive tables to Unity Catalog, each with its pros and cons. The choice of method depends on specific requirements. SYNC: a SQL command that migrates schemas or tables to Unity Catalog external tables. Howeve...
This is a great solution! The post effectively outlines the methods for migrating Hive tables to Unity Catalog while emphasizing the importance of not just performing a simple migration but transforming the data architecture into something more robus...