Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Harsh1
by New Contributor II
  • 982 Views
  • 0 replies
  • 0 kudos

Issues in Metastore Migration using Databricks Migration Tool

Hi Team, I'm performing a Databricks workspace migration, and during the Metastore migration I'm facing the issue below. We found differences in the Metastore table count between the Legacy and Target workspaces, so we checked the error logs. After going through Failed...

isaac_gritz
by Databricks Employee
  • 1211 Views
  • 0 replies
  • 1 kudos

Data Mesh with Databricks

Where to Learn More about Databricks for Data Mesh
We recommend checking out our Data & AI Summit Talk on how the Databricks Lakehouse platform is the best platform for distributed architectures like Data Mesh. We would also recommend checking out thi...

isaac_gritz
by Databricks Employee
  • 7510 Views
  • 4 replies
  • 2 kudos

Performance Tuning Best Practices

Recommendations for performance tuning best practices on Databricks
We also recommend checking out this article from my colleague @Franco Patano on best practices for performance tuning on Databricks. Performance tuning your workloads is an important...

Latest Reply
isaac_gritz
Databricks Employee
  • 2 kudos

Let us know in the comments if you have any other performance tuning tips & tricks.

3 More Replies
harsha4u
by New Contributor II
  • 934 Views
  • 1 replies
  • 2 kudos

Any suggestions around automating sizing of clusters and best practices around it? Other than enabling auto scaling, are there any other practices aro...

Any suggestions for automating cluster sizing, and best practices around it? Other than enabling autoscaling, are there any other practices for right-sizing driver and worker nodes?

Latest Reply
User16766737456
Databricks Employee
  • 2 kudos

Autoscaling should help in sizing the clusters according to the workload. You may want to consider the recommendations here: https://docs.databricks.com/clusters/cluster-config-best-practices.html#cluster-sizing-considerations
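As a rough illustration of the sizing knobs discussed here, the sketch below builds a cluster spec with an autoscaling worker range and a separately sized driver. The field names (`autoscale`, `min_workers`, `max_workers`, `node_type_id`, `driver_node_type_id`) follow the Databricks Clusters API; the helper name, runtime version, node types, and worker counts are placeholder assumptions, not recommendations.

```python
def build_autoscaling_cluster_spec(name, min_workers, max_workers,
                                   worker_node_type, driver_node_type=None,
                                   spark_version="13.3.x-scala2.12"):
    """Return a Clusters API style payload with an autoscaling worker range.

    The default spark_version is a placeholder (assumption); use whatever
    runtime your workspace actually offers.
    """
    if min_workers < 0 or max_workers < min_workers:
        raise ValueError("need 0 <= min_workers <= max_workers")
    spec = {
        "cluster_name": name,
        "spark_version": spark_version,
        "node_type_id": worker_node_type,
        # Databricks adds or removes workers within this range based on load.
        "autoscale": {"min_workers": min_workers, "max_workers": max_workers},
    }
    # The driver can be sized independently of the workers, for example a
    # larger driver for jobs that collect large results.
    if driver_node_type is not None:
        spec["driver_node_type_id"] = driver_node_type
    return spec


# Hypothetical example values.
example_spec = build_autoscaling_cluster_spec(
    "etl-nightly", 2, 8, "i3.xlarge", driver_node_type="i3.2xlarge"
)
```

A payload like `example_spec` could then be submitted through the Clusters API or kept in version control as a template, so sizing changes are reviewed rather than edited by hand in the UI.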

Raymond_Garcia
by Contributor II
  • 2339 Views
  • 4 replies
  • 2 kudos

Migrating from Databricks Notebooks to IDE for Development

Hello, we are developers who have been creating a system in Databricks with Scala. We enabled the Git feature, so the project is in a repository. The project has a lot of notebooks and a lot of calls to other notebooks. Sometimes it is a little overw...

Latest Reply
Raymond_Garcia
Contributor II
  • 2 kudos

It is true that we can't work without Databricks, but we can develop in an IDE and send the JAR to Databricks. This will allow us to create unit tests and use the IDE's capabilities (e.g., fast navigation among classes).

3 More Replies
SailajaB
by Valued Contributor III
  • 7820 Views
  • 1 replies
  • 5 kudos

Resolved! Best practices for implementing unit test cases in Databricks and Azure DevOps

Hello, please suggest best practices for implementing unit test cases in Databricks Python so that they satisfy code coverage requirements in Azure DevOps.

Latest Reply
User16753725182
Databricks Employee
  • 5 kudos

Hi, the process is like traditional software development practices.
Docs to refer to: https://docs.microsoft.com/en-us/azure/databricks/dev-tools/ci-cd/ci-cd-azure-devops#unit-tests-in-azure-databricks-notebooks
Azure DevOps Best Practices: https://docs.m...
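One common way to follow traditional practice here is to keep transformation logic in plain Python functions that a test runner can import outside a notebook. The sketch below is only an illustration: `add_total_column` and its data are hypothetical, and the tests are written as pytest-style functions with bare asserts. Wiring a runner and a coverage tool into an Azure DevOps pipeline is not shown.

```python
def add_total_column(rows):
    """Return new rows with a 'total' field equal to quantity * unit_price.

    Hypothetical business logic, kept free of notebook and Spark
    dependencies so that it can be tested on any build agent.
    """
    return [{**row, "total": row["quantity"] * row["unit_price"]} for row in rows]


def test_computes_total():
    rows = [{"quantity": 3, "unit_price": 2.5}]
    assert add_total_column(rows) == [
        {"quantity": 3, "unit_price": 2.5, "total": 7.5}
    ]


def test_does_not_mutate_input():
    rows = [{"quantity": 1, "unit_price": 4.0}]
    add_total_column(rows)
    # The input rows must be left untouched.
    assert rows == [{"quantity": 1, "unit_price": 4.0}]
```

The notebook then only imports and calls `add_total_column`, while the build pipeline runs the test functions and measures coverage against the same module.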

Srikanth_Gupta_
by Valued Contributor
  • 4764 Views
  • 2 replies
  • 0 kudos

How to process images and video through structured streaming using Delta Lake?

Can we scan through videos and raise an alert in real time if something goes wrong? What are the best practices for this kind of use case?

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Maybe I'm a little off topic, but can you recommend companies that are engaged in video production? I want to make an explanatory video for my site.

1 More Replies
Anonymous
by Not applicable
  • 1563 Views
  • 1 replies
  • 1 kudos

Resolved! Access to Cluster Logs for non-admins

Suppose I have a DevOps team that needs near real-time access to cluster logs to troubleshoot job failures. What is the best way for me to grant access to view logs without granting them admin access?

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 1 kudos

Please use the logging option in the cluster settings and set a destination so that logs are delivered to separate Azure Blob or S3 storage (which needs to be mounted first).
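The same setting can be expressed in a cluster spec through the `cluster_log_conf` field of the Clusters API. The sketch below is a minimal illustration under stated assumptions: the helper name, bucket, paths, and region are hypothetical, and only the `dbfs` and `s3` destination types are handled.

```python
def with_log_delivery(cluster_spec, destination, s3_region=None):
    """Return a copy of cluster_spec with cluster log delivery configured.

    destination is a 'dbfs:/...' path (for example under a mount point)
    or an 's3://...' path; S3 destinations also need a region.
    """
    if destination.startswith("dbfs:/"):
        log_conf = {"dbfs": {"destination": destination}}
    elif destination.startswith("s3://"):
        if not s3_region:
            raise ValueError("s3 destinations require s3_region")
        log_conf = {"s3": {"destination": destination, "region": s3_region}}
    else:
        raise ValueError("destination must start with dbfs:/ or s3://")
    # Build a new dict so the caller's spec is not modified.
    return {**cluster_spec, "cluster_log_conf": log_conf}


# Hypothetical example: deliver logs under a mounted storage location.
job_cluster = with_log_delivery(
    {"cluster_name": "nightly-job"}, "dbfs:/mnt/cluster-logs/nightly-job"
)
```

Read access for the DevOps team is then granted on the storage location itself, so no workspace admin rights are involved.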

nicole_wong
by New Contributor II
  • 2314 Views
  • 1 replies
  • 1 kudos

Resolved! Best practices for working with Redshift

I have a customer with the following question, and I'm posting on their behalf to introduce them to the community. For doing modeling in a Python environment, what is our best practice for getting the data from Redshift? A "load" option seems to leave me...

Latest Reply
jose_gonzalez
Databricks Employee
  • 1 kudos

Hi @Nicole Wong, have you checked the docs from here? As far as I know, this might be the only way to read/write data to/from Redshift.

sravan_enukonda
by New Contributor II
  • 2765 Views
  • 1 replies
  • 2 kudos

Resolved! I am looking for best practices for implementing Ranger-style access control in Databricks

I need this for auditing and for tracking the number of users accessing databases and tables created in Databricks.

Latest Reply
garren_staubli
New Contributor III
  • 2 kudos

Hi Sravan, Apache Ranger is commonly used for fine-grained access controls. In your description, it sounds like you might be able to leverage Databricks audit logs, which would allow you to see user-level actions: https://docs.databricks.com/administ...

User16857281869
by New Contributor II
  • 808 Views
  • 1 replies
  • 0 kudos

We want to do demand forecasting for our supply chain. How can we benefit from Spark in developing this use case?

We have a series of blogs on the topic which describe the challenges and the best practices on development of demand forecasting usecases on Databricks. Please refer to this blog and the references in it for more info.

Latest Reply
Anonymous
Not applicable
  • 0 kudos

We have a series of blogs on the topic which describe the challenges and the best practices on development of demand forecasting usecases on Databricks. Please refer to this blog and the references in it for more info.

brickster_2018
by Databricks Employee
  • 1202 Views
  • 1 replies
  • 0 kudos

What are the best practices for Adaptive Query Execution?

What are the common configurations used, and which workloads will benefit?

Latest Reply
amr
Databricks Employee
  • 0 kudos

Leave it turned on. The bet is that AQE will get better with each Spark release and will eventually produce a better-optimized plan than manual tuning would.
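For reference, the sketch below shows the main AQE switches as Spark SQL configuration keys and a small helper that sets them on a session. The keys are standard Spark settings; the helper name and the choice of which keys to list are assumptions, and on recent Spark versions these options are already enabled by default, so this mostly serves to make the configuration explicit.

```python
# Standard Spark SQL configuration keys for Adaptive Query Execution.
AQE_SETTINGS = {
    "spark.sql.adaptive.enabled": "true",
    # Merge small shuffle partitions after a stage finishes.
    "spark.sql.adaptive.coalescePartitions.enabled": "true",
    # Split skewed partitions in sort-merge joins.
    "spark.sql.adaptive.skewJoin.enabled": "true",
}


def apply_aqe_settings(spark, settings=None):
    """Set the given AQE options on a SparkSession and return the values read back.

    `spark` is expected to be a SparkSession (or any object exposing
    conf.set and conf.get in the same way).
    """
    settings = AQE_SETTINGS if settings is None else settings
    for key, value in settings.items():
        spark.conf.set(key, value)
    return {key: spark.conf.get(key) for key in settings}
```

In a notebook this would be called as `apply_aqe_settings(spark)`. Workloads with large shuffles, or joins on skewed keys, are the ones these particular settings target.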

brickster_2018
by Databricks Employee
  • 832 Views
  • 1 replies
  • 0 kudos

Resolved! Best practices for DStream application in Databricks

I do not see any best practice guide for DStream applications in the Databricks docs. Is there any reference?

Latest Reply
brickster_2018
Databricks Employee
  • 0 kudos

DStream is unsupported by Databricks. Databricks strongly recommends migrating DStream applications to Structured Streaming: https://kb.databricks.com/streaming/dstream-not-supported.html

User16783853906
by Contributor III
  • 2696 Views
  • 3 replies
  • 0 kudos

Resolved! Frequent spot loss of driver nodes resulting in failed jobs when using spot fleet pools

When using spot fleet pools to schedule jobs, driver and worker nodes are provisioned from the spot pools, and we are noticing jobs failing with the below exception when there is a driver spot loss. Share best practices around using fleet pools with 1...

Latest Reply
User16783853906
Contributor III
  • 0 kudos

In this scenario, the driver node is reclaimed by AWS. Databricks has started a preview of the hybrid pools feature, which allows you to provision the driver node from a different pool. We recommend using an on-demand pool for the driver node to improve reliability i...
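In a cluster spec, this split can be sketched with the `instance_pool_id` and `driver_instance_pool_id` fields of the Clusters API, as below. The helper name, pool IDs, runtime version, and worker count are hypothetical placeholders, and whether each pool uses spot or on-demand instances is configured on the pools themselves, not in this payload.

```python
def build_hybrid_pool_cluster_spec(name, worker_pool_id, driver_pool_id,
                                   num_workers,
                                   spark_version="13.3.x-scala2.12"):
    """Return a cluster spec that takes workers and the driver from different pools.

    worker_pool_id is assumed to point at a spot pool and driver_pool_id
    at an on-demand pool. The default spark_version is a placeholder.
    """
    if worker_pool_id == driver_pool_id:
        raise ValueError("driver and workers should come from different pools")
    return {
        "cluster_name": name,
        "spark_version": spark_version,
        "num_workers": num_workers,
        # Workers tolerate spot loss: Spark can reschedule their tasks.
        "instance_pool_id": worker_pool_id,
        # Losing the driver fails the whole job, so it comes from on-demand.
        "driver_instance_pool_id": driver_pool_id,
    }


# Hypothetical pool IDs for illustration only.
hybrid_spec = build_hybrid_pool_cluster_spec(
    "spot-workers-job", "pool-spot-example", "pool-ondemand-example", 4
)
```

No node type is set here because nodes drawn from a pool inherit the pool's instance type.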

2 More Replies