
Sujitha
Community Manager

Latest Blog Posts

January 13 - 20

Did you get a chance to look at the most recent blog posts?

Here is some noteworthy content from the past week that is worth reading.

What’s New With SQL User-Defined Functions 

In this blog, we describe several enhancements we have recently made to make SQL user-defined functions even more user-friendly and powerful, along with examples of how you can use them to encapsulate logic in reusable components suitable for using on your own or sharing with others. This way, you can keep queries simple while enjoying strong type safety thanks to the Databricks SQL analyzer. Read on for more details.
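To make this concrete, here is a minimal sketch of a SQL UDF in Databricks SQL; the function and table names are hypothetical, not taken from the blog:

```sql
-- Define a typed, documented function once...
CREATE OR REPLACE FUNCTION cents_to_dollars(cents INT)
  RETURNS DECIMAL(10, 2)
  COMMENT 'Converts an integer number of cents to dollars'
  RETURN CAST(cents / 100.0 AS DECIMAL(10, 2));

-- ...then call it like any built-in function; the SQL analyzer
-- type-checks the argument and result.
SELECT order_id, cents_to_dollars(amount_cents) AS amount_usd
FROM orders;
```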

Easy Ingestion to Lakehouse With COPY INTO

This blog focuses on COPY INTO, a simple yet powerful SQL command that performs batch file ingestion into Delta Lake from cloud object stores. It is idempotent: files are ingested with exactly-once semantics even when the command is executed multiple times, which supports incremental appends and simple transformations. It can be run ad hoc or scheduled through Databricks Workflows. In recent Databricks Runtime releases, COPY INTO gained new functionality for data preview, validation, enhanced error handling, and a new way to copy into a schemaless Delta Lake table, so users can get started quickly and complete the end-to-end journey of ingesting from cloud object stores. Let's take a look at the popular COPY INTO use cases.
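For reference, a minimal sketch of the command (the table name and object-store path below are hypothetical):

```sql
-- Create an empty, schemaless Delta table; COPY INTO infers the schema.
CREATE TABLE IF NOT EXISTS raw_events;

-- Idempotent batch ingestion: rerunning the command skips files
-- that have already been loaded.
COPY INTO raw_events
FROM 's3://my-bucket/landing/events/'
FILEFORMAT = JSON
FORMAT_OPTIONS ('inferSchema' = 'true')
COPY_OPTIONS ('mergeSchema' = 'true');
```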

Streaming in Production: Collected Best Practices

The recommendations in this blog post are written from the Structured Streaming engine perspective, and most apply to both DLT and Workflows (although DLT takes care of some of these automatically, such as triggers and checkpoints). The recommendations are grouped under the headings "Before Deployment" and "After Deployment" to highlight when each concept needs to be applied, and the blog series is split along the same lines, with additional deep-dive content to follow for some of the sections. We recommend reading all sections before beginning work to productionalize a streaming pipeline or application, and revisiting these recommendations as you promote it from dev to QA and eventually to production.
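As a quick illustration of the DLT point above, here is a minimal Delta Live Tables SQL sketch (the table names are hypothetical); note that no trigger or checkpoint configuration appears, because DLT manages both automatically:

```sql
-- DLT handles triggering and checkpointing for this streaming table.
CREATE OR REFRESH STREAMING LIVE TABLE events_clean
AS SELECT *
FROM STREAM(LIVE.events_raw)
WHERE event_type IS NOT NULL;
```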

Best Practices for Super Powering Your dbt Project on Databricks

dbt is a data transformation framework that enables data teams to collaboratively model, test and document data in data warehouses. Getting started with dbt and Databricks SQL is very simple with the native dbt-databricks adapter, support for running dbt in production in Databricks Workflows, and easy connectivity to dbt Cloud through Partner Connect. You can have your first dbt project running in production in no time at all!

However, as you start to deploy more complex dbt projects into production, you will likely need to adopt advanced features such as macros and hooks, dbt packages, and third-party tools to improve your productivity and development workflow. In this blog post, we share five best practices to supercharge your dbt project on Databricks.
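To give a sense of what one such feature looks like, here is a minimal dbt macro sketch in Jinja-templated SQL; the macro, model, and source names are hypothetical:

```sql
-- macros/limit_dev_rows.sql: sample data outside production to keep dev runs fast
{% macro limit_dev_rows(row_count=1000) %}
  {% if target.name != 'prod' %} limit {{ row_count }} {% endif %}
{% endmacro %}

-- models/stg_orders.sql: the macro expands to nothing when the target is 'prod'
select *
from {{ source('shop', 'raw_orders') }}
{{ limit_dev_rows() }}
```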

Streaming in Production: Collected Best Practices, Part 2

This is the second article in our two-part blog series, "Streaming in Production: Collected Best Practices." Here we discuss the "After Deployment" considerations for a Structured Streaming pipeline. The majority of the suggestions in this post are relevant to both Structured Streaming jobs and Delta Live Tables (our flagship and fully managed ETL product that supports both batch and streaming pipelines).

The "Before Deployment" considerations are covered in Collected Best Practices, Part 1; if you haven't read that post yet, we suggest doing so first.

We still recommend reading all of the sections from both posts before beginning work to productionalize a Structured Streaming job, and we hope you will revisit these recommendations as you promote your applications from dev to QA and eventually to production.

5 REPLIES

Ajay-Pandey
Esteemed Contributor III

Thanks for sharing.

Ajay Kumar Pandey

Hubert-Dudek
Esteemed Contributor III

Thank you @Sujitha Ramamoorthy for the great insights!

Glad to know it was insightful! 🙂

Chaitanya_Raju
Honored Contributor

Thanks @Sujitha Ramamoorthy for sharing these with the community; they are worth reading and insightful.

Thanks for reading! Please like if this is useful, and comment with any improvements or feedback.

Your acknowledgement means a lot! Appreciate it! Thank you 🙂
