Community Platform Discussions
Connect with fellow community members to discuss general topics related to the Databricks platform, industry trends, and best practices. Share experiences, ask questions, and foster collaboration within the community.

Facing Issues with Cloud Solutions and Data Engineering – Any Advice?

ameliahebrew
New Contributor II

Hi everyone! I hope you’re all doing well. I’ve recently completed my data engineering projects in the cloud, but now I’m facing some frustrating issues. After deploying my pipelines, I’m encountering data inconsistencies and performance slowdowns that I didn’t anticipate. It seems like the data isn’t syncing properly across different cloud services, leading to gaps in my analytics.

I’ve tried optimizing my configurations and reviewing my architecture, but the problems persist. If anyone has dealt with similar challenges after implementing cloud-based data engineering solutions, I’d love to hear your insights. What strategies or tools have you found helpful in resolving these issues?

Any tips on improving data quality and performance would be greatly appreciated! Thank you in advance for your support!

Accepted Solutions

NandiniN
Databricks Employee

Hi @ameliahebrew 

If you're using Unity Catalog, it's recommended to avoid mount points and instead use external locations defined in Unity Catalog. This improves data governance and access control, which might indirectly help with the data sync issues.
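As a rough illustration of moving off mount points, the sketch below rewrites legacy /mnt/... paths into direct cloud storage URIs, which is the form of path an external location governs. The mapping, the storage account and bucket names, and the helper name resolve_mount_path are all hypothetical placeholders, not part of any Databricks API.

```python
# Hypothetical mapping from legacy mount names to cloud storage URIs.
# Both entries are placeholders, not real storage accounts or buckets.
MOUNT_TO_URI = {
    "raw": "abfss://raw@examplestorage.dfs.core.windows.net",
    "curated": "s3://example-curated-bucket",
}


def resolve_mount_path(path, mapping=MOUNT_TO_URI):
    """Rewrite '/mnt/<name>/<rest>' (optionally 'dbfs:'-prefixed) to '<uri>/<rest>'."""
    cleaned = path[len("dbfs:"):] if path.startswith("dbfs:") else path
    parts = cleaned.strip("/").split("/", 2)
    if len(parts) < 2 or parts[0] != "mnt":
        return path  # not a mount path, so leave it untouched
    name = parts[1]
    if name not in mapping:
        raise KeyError(f"no storage URI configured for mount '{name}'")
    rest = parts[2] if len(parts) > 2 else ""
    base = mapping[name].rstrip("/")
    return f"{base}/{rest}" if rest else base


if __name__ == "__main__":
    print(resolve_mount_path("/mnt/raw/sales/2024"))
    print(resolve_mount_path("dbfs:/mnt/curated"))
```

In a notebook, the returned URI could be passed to a reader in place of the old mount path, assuming an external location covering that URI has already been set up and access to it has been granted.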

If the slowness originates with the cloud provider, their support team should be able to help you understand the cause.

In general,

 

  • Large data operations, such as copying data between partitions or updating large tables, can be time-consuming. Consider breaking these operations into smaller, more manageable tasks.
  • If possible, test with a smaller subset of data to identify if the slowness is consistent across different data sizes.
  • Resource constraints could also be a cause. Monitor the Spark UI and cluster metrics to identify the specific bottlenecks or stages where the slowness occurs.
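The first two suggestions can be sketched in plain Python: split the work into small batches and time each one, so that a slow batch stands out and a small subset can be tried first. The helper names chunked and run_in_batches and the process callback are illustrative assumptions; in a real pipeline the callback would wrap whatever per-partition operation you run.

```python
import time


def chunked(items, size):
    """Yield consecutive slices of `items`, each holding at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def run_in_batches(items, size, process):
    """Call `process` on each batch and return a list of (batch, elapsed_seconds)."""
    timings = []
    for batch in chunked(items, size):
        started = time.perf_counter()
        process(batch)
        timings.append((batch, time.perf_counter() - started))
    return timings


if __name__ == "__main__":
    # Hypothetical partition keys; replace with the partitions of your own table.
    partitions = ["2024-01", "2024-02", "2024-03", "2024-04", "2024-05"]
    # Start with a small subset to see whether the slowness scales with data size.
    for batch, seconds in run_in_batches(partitions[:2], 1, lambda b: None):
        print(batch, f"{seconds:.6f}s")
```

Comparing the per-batch timings for a small subset against those for the full run gives a first hint of whether the slowdown grows with data volume or is tied to particular partitions.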

 

Thanks!

