Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Hi Team, while performing the Databricks workspace migration, I'm facing the issue below during Metastore migration. As we found differences in the Metastore table count between the Legacy and Target workspaces, we checked the error logs. After going through Failed...
Where to Learn More about Databricks for Data Mesh: We recommend checking out our Data & AI Summit Talk on how the Databricks Lakehouse platform is the best platform for distributed architectures like Data Mesh. We would also recommend checking out thi...
Recommendations for performance tuning best practices on Databricks: We also recommend checking out this article from my colleague @Franco Patano on best practices for performance tuning on Databricks. Performance tuning your workloads is an important...
Any suggestions on automating cluster sizing, and best practices around it? Other than enabling autoscaling, are there any other practices for right-sizing driver and worker nodes?
Autoscaling should help in sizing the clusters according to the workload. You may want to consider the recommendations here: https://docs.databricks.com/clusters/cluster-config-best-practices.html#cluster-sizing-considerations
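To make the autoscaling suggestion concrete, here is a minimal sketch of a cluster spec that gives a worker range instead of a fixed size. The field names follow the Databricks Clusters API; the cluster name, node types, runtime label, and the helper `autoscaling_cluster_spec` are hypothetical placeholders, and the bounds check is a sanity check of this sketch rather than an API rule.

```python
import json


def autoscaling_cluster_spec(name, spark_version, node_type_id,
                             min_workers, max_workers,
                             driver_node_type_id=None):
    """Build a cluster spec dict that autoscales between two worker counts."""
    # Sanity check for this sketch only; not a documented API constraint.
    if min_workers < 1 or min_workers > max_workers:
        raise ValueError("need 1 <= min_workers <= max_workers")
    spec = {
        "cluster_name": name,
        "spark_version": spark_version,
        "node_type_id": node_type_id,
        # 'autoscale' replaces a fixed 'num_workers' value.
        "autoscale": {"min_workers": min_workers, "max_workers": max_workers},
    }
    # The driver can be sized separately from the workers.
    if driver_node_type_id is not None:
        spec["driver_node_type_id"] = driver_node_type_id
    return spec


# All values below are hypothetical examples.
spec = autoscaling_cluster_spec(
    name="etl-nightly",
    spark_version="13.3.x-scala2.12",
    node_type_id="i3.xlarge",
    min_workers=2,
    max_workers=8,
    driver_node_type_id="i3.2xlarge",
)
print(json.dumps(spec, indent=2))
```

The resulting dict is the kind of payload you would send when creating a cluster or a job cluster; sizing the driver separately is useful when the driver collects large results or coordinates many tasks.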
Hello, we are developers who have been creating a system in Databricks with Scala. We enabled the Git feature, so the project is in a repository. The project has a lot of notebooks and a lot of calls to other notebooks. Sometimes it is a little overw...
It is true that we can't work without Databricks, but we can develop in an IDE and send the JAR to Databricks. This allows us to write unit tests and use the IDE's capabilities (e.g. fast navigation among classes).
Hi, the process is like traditional software development practices. Docs to refer to: https://docs.microsoft.com/en-us/azure/databricks/dev-tools/ci-cd/ci-cd-azure-devops#unit-tests-in-azure-databricks-notebooks Azure DevOps Best Practices: https://docs.m...
Suppose I have a DevOps team that needs near real-time access to cluster logs to troubleshoot job failures. What is the best way for me to grant access to view logs without granting them admin access?
I have a customer with the following question; I'm posting on their behalf to introduce them to the community. For doing modeling in a Python environment, what is our best practice for getting the data from Redshift? A "load" option seems to leave me...
Hi Sravan, Apache Ranger is commonly used for fine-grained access controls. In your description, it sounds like you might be able to leverage Databricks audit logs, which would allow you to see user-level actions: https://docs.databricks.com/administ...
We have a series of blogs on the topic that describe the challenges and best practices for developing demand forecasting use cases on Databricks. Please refer to this blog and the references in it for more info.
Leave it turned on. The bet is that AQE will keep improving with each Spark release and will eventually produce a more performant query plan than manual tuning would.
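For reference, a minimal sketch of what "leaving it turned on" looks like in configuration terms. The three keys are standard Spark 3.x settings for Adaptive Query Execution; `apply_settings` and `FakeConf` are hypothetical helpers added only so the sketch runs without a Spark session.

```python
# Spark SQL settings related to Adaptive Query Execution (Spark 3.x).
AQE_SETTINGS = {
    "spark.sql.adaptive.enabled": "true",
    "spark.sql.adaptive.coalescePartitions.enabled": "true",
    "spark.sql.adaptive.skewJoin.enabled": "true",
}


def apply_settings(conf, settings):
    """Apply each key/value via conf.set(key, value), as spark.conf.set does."""
    for key, value in settings.items():
        conf.set(key, value)


class FakeConf:
    """Hypothetical stand-in for spark.conf so this runs without Spark."""

    def __init__(self):
        self.values = {}

    def set(self, key, value):
        self.values[key] = value


conf = FakeConf()
apply_settings(conf, AQE_SETTINGS)
print(conf.values)
```

In a notebook with an active session, the same helper could be called as `apply_settings(spark.conf, AQE_SETTINGS)`. On recent runtimes these settings are typically already enabled by default, so the call mainly documents intent.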
DStream is unsupported by Databricks. Databricks strongly recommends migrating DStream applications to Structured Streaming: https://kb.databricks.com/streaming/dstream-not-supported.html
When using spot fleet pools to schedule jobs, driver and worker nodes are provisioned from the spot pools and we are noticing jobs failing with the below exception when there is a driver spot loss. Share best practices around using fleet pools with 1...
In this scenario, the driver node is reclaimed by AWS. Databricks has started a preview of the hybrid pools feature, which allows you to provision the driver node from a different pool. We recommend using an on-demand pool for the driver node to improve reliability i...
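As a rough illustration of that recommendation, the sketch below builds a cluster spec whose workers come from one pool and whose driver comes from another. `instance_pool_id` and `driver_instance_pool_id` are Clusters API fields; the pool IDs, runtime label, and the helper `hybrid_pool_cluster_spec` are hypothetical, and availability of separate driver pools in a given workspace is an assumption.

```python
def hybrid_pool_cluster_spec(spark_version, worker_pool_id,
                             driver_pool_id, num_workers):
    """Build a cluster spec that takes driver and workers from different pools."""
    if worker_pool_id == driver_pool_id:
        # The point of this sketch is to keep the driver off the spot pool.
        raise ValueError("driver pool should differ from the worker pool")
    return {
        "spark_version": spark_version,
        "num_workers": num_workers,
        # Workers: pool backed by spot (fleet) instances.
        "instance_pool_id": worker_pool_id,
        # Driver: pool backed by on-demand instances, so a spot
        # reclamation cannot take the driver down.
        "driver_instance_pool_id": driver_pool_id,
    }


# Hypothetical runtime label and pool IDs.
job_cluster = hybrid_pool_cluster_spec(
    spark_version="13.3.x-scala2.12",
    worker_pool_id="pool-spot-workers",
    driver_pool_id="pool-ondemand-driver",
    num_workers=10,
)
print(job_cluster)
```

Losing a spot worker usually costs only the tasks running on it, which Spark can retry, whereas losing the driver fails the whole job; that asymmetry is why only the driver is moved to on-demand capacity here.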