In Part 1, we covered why multi-table transactions matter. Now let's build one.
We'll create the tables from the claim wrap-up scenario, load sample P&C insurance data, and walk through what happens when the wrap-up succeeds, when it fails, and when...
Most Databricks streaming failures don't look dramatic. No cluster termination. No red wall of errors. The UI says RUNNING, and your customers start reporting nonsense. I wrote about the incident that changed how we think about streaming jobs on share...
Completely agree, production war stories are worth more than any documentation. I’ve cut my teeth on enough production data lake issues to write my own chapter on what can go wrong, whether that’s deploying Databricks in financial institutions or bein...
Databricks is a popular unified data analytics platform known for its powerful data processing capabilities and seamless integration with Apache Spark. However, managing and optimizing costs in Databricks can be challenging, especially when it comes ...
@Second Reply You’re right: just printing out selected_pool isn’t enough to actually leverage dynamic cluster sizing at runtime. In practice, the value of selected_pool would feed directly into your Databricks cluster creation API or workflow automati...
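To make that concrete, here is a minimal sketch of feeding selected_pool into a new-cluster spec for the Databricks Clusters REST API (POST /api/2.0/clusters/create). The POOL_SPECS table, function name, node types, and worker counts are all illustrative assumptions, not from the original thread:

```python
import json

# Hypothetical sizing table: pool name -> worker count and node type.
# These names and values are illustrative only.
POOL_SPECS = {
    "small_pool": {"num_workers": 2, "node_type_id": "i3.xlarge"},
    "large_pool": {"num_workers": 8, "node_type_id": "i3.2xlarge"},
}

def build_cluster_payload(selected_pool: str,
                          spark_version: str = "15.4.x-scala2.12") -> dict:
    """Turn the chosen pool name into a new-cluster spec suitable for
    the Databricks Clusters API (POST /api/2.0/clusters/create)."""
    spec = POOL_SPECS[selected_pool]
    return {
        "cluster_name": f"dynamic-{selected_pool}",
        "spark_version": spark_version,
        "node_type_id": spec["node_type_id"],
        "num_workers": spec["num_workers"],
    }

payload = build_cluster_payload("large_pool")
print(json.dumps(payload, indent=2))
```

The same payload could equally be passed as a `new_cluster` block in a Jobs API run submission; the point is that the runtime decision drives the spec, not a print statement.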
Tips and Techniques for Ingesting Large JSON Files with PySpark
Introduction
If you’ve ever struggled with consuming massive JSON files in PySpark, you know that malformed data can always creep in and silently d...
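One standard tip from this territory: read JSON with an explicit schema in Spark's PERMISSIVE mode, so malformed rows land in a `_corrupt_record` column instead of silently disappearing. The routing logic behind that mode can be sketched in plain Python (the `parse_lines` helper and the sample records are illustrative, not from the post):

```python
import json

def parse_lines(lines, required_fields):
    """Mimic Spark's PERMISSIVE JSON mode for newline-delimited input:
    rows that fail to parse or miss required fields are not dropped;
    their raw text is kept under '_corrupt_record' for inspection."""
    good, bad = [], []
    for line in lines:
        try:
            row = json.loads(line)
            if not all(f in row for f in required_fields):
                raise ValueError("missing required field")
            good.append(row)
        except (json.JSONDecodeError, ValueError):
            bad.append({"_corrupt_record": line})
    return good, bad

sample = ['{"id": 1, "amount": 9.5}', '{"id": 2}', '{broken json']
good, bad = parse_lines(sample, required_fields=["id", "amount"])
print(len(good), len(bad))  # one clean row, two routed to _corrupt_record
```

Counting the `_corrupt_record` rows after a load is a cheap smoke test: a sudden jump usually means the upstream producer changed its JSON shape.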
Apache Spark™ 4.0 introduces a new feature for SQL developers and data engineers: SQL Scripting. This feature extends the power and flexibility of Spark SQL, enabling users to write procedural code within SQL queries, with t...
Enforce schema consistency using declarative contracts on Databricks Lakehouse.Industrial AI is transforming how operations are optimized, from forecasting equipment failure to streamlining supply chains. But even the most advanced models are only as...
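On Databricks, declarative contracts are typically expressed as Delta table CHECK constraints or Delta Live Tables expectations; the underlying idea can be sketched in plain Python. The contract, field names, and records below are hypothetical, chosen only to show how a declared schema catches drift before it reaches a model:

```python
# Declarative contract: field name -> expected Python type.
# The contract and records are illustrative, not from the article.
SENSOR_CONTRACT = {"device_id": str, "temperature_c": float, "ts": int}

def violations(record: dict, contract: dict) -> list:
    """Return a list of contract violations for one record:
    missing fields and fields whose type drifted from the contract."""
    problems = []
    for field, expected in contract.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            problems.append(f"{field}: expected {expected.__name__}, "
                            f"got {type(record[field]).__name__}")
    return problems

ok = {"device_id": "pump-7", "temperature_c": 71.3, "ts": 1700000000}
drifted = {"device_id": "pump-7", "temperature_c": "71.3"}  # str, and no ts
print(violations(ok, SENSOR_CONTRACT))       # []
print(violations(drifted, SENSOR_CONTRACT))  # two violations
```

In a pipeline, records with a non-empty violation list would be quarantined rather than written, which is exactly what a DLT expectation with a drop-or-fail action does declaratively.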
Question: I have a role called dev-dataengineer with the following privileges on the catalog dap_catalog_dev: APPLY TAG, CREATE FUNCTION, CREATE MATERIALIZED VIEW, CREATE TABLE, CREATE VOLUME, EXECUTE, READ VOLUME, REFRESH, SELECT, USE SCHEMA, WRITE VOLUME. Despite this, u...
Managing assets in UC always carries maintenance overhead. We keep these access controls in Terraform code, and it is hard to see what level of access is granted to different personas in the org, so we are building an audit dashboard for it.