Yes. Please refer here for the limits per tier: https://docs.databricks.com/resources/limits.html
Hey, I am new to Databricks and heard of Photon, which is described as the fastest engine developed by Databricks. Will it make my queries faster? And what about query concurrency, will that increase as well?
Photon is Databricks' new native vectorized engine, developed in C++ for improved query performance (both speed and concurrency). It integrates directly with the Databricks Runtime and Spark, meaning no code changes are required to use Photon. At thi...
In this blog we implement a typical customer-attrition (churn) model for subscription businesses, from data preparation through to operationalisation of the model.
Hello, have you read our solution accelerator for predicting customer churn? If you have further questions, please contact your Databricks liaison and we can walk you through the solution and how you can deploy it at scale.
Delta Live Tables offers built-in data lineage between tables and views defined in a pipeline, which allows for easier monitoring and simplified recovery.
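As a rough illustration of how that lineage arises, here is a minimal Delta Live Tables sketch in Python; the source path and table names are hypothetical, and the pipeline itself still has to be created through the DLT UI or API:

```python
import dlt
from pyspark.sql.functions import col

@dlt.table(comment="Raw events ingested from cloud storage")
def raw_events():
    # Hypothetical landing path; replace with your own source
    return spark.read.format("json").load("/mnt/raw/events")

@dlt.table(comment="Cleaned events; DLT tracks the lineage from raw_events automatically")
def clean_events():
    # Reading another table in the pipeline via dlt.read() is what creates the lineage edge
    return dlt.read("raw_events").where(col("event_type").isNotNull())
```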
You can monitor user access to data and other resources using Databricks audit logs:
- Diagnostic logging in Azure Databricks
- Configure audit logging in AWS Databricks
What are best practices for Spark streaming in Databricks?
- Is it a good idea to consume multiple topics in one streaming job?
- Is autoscaling recommended for Spark streaming?
- How many worker nodes should we choose for a streaming job?
- When should we run OPTIMIZE...
See our docs for other considerations when deploying a production streaming job.
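For a concrete picture, here is a minimal sketch of a streaming job that subscribes to two Kafka topics in a single stream and writes to a Delta table with a durable checkpoint; the broker address, topic names, and paths are hypothetical:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Read two topics in one stream (hypothetical broker and topic names)
events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker-1:9092")
          .option("subscribe", "orders,payments")
          .load())

# Write to a Delta table; the checkpoint location is what makes recovery possible
query = (events.writeStream
         .format("delta")
         .option("checkpointLocation", "/mnt/checkpoints/events_bronze")
         .trigger(processingTime="1 minute")
         .toTable("events_bronze"))
```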
Looking at the API for DataFrame.unionAll(): when you have two DataFrames with different numbers of columns and different column names, unionAll() doesn't work. How can you do it? One possible solution is using the following function which performs the union of tw...
I'm not sure union is the right tool if the DataFrames have fundamentally different information in them. If the difference is merely in column names, then yes, rename them. If the columns don't actually match, then the 'union' contemplated here is really a union of columns as well as ...
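One way to express that idea in code, as a sketch rather than the poster's exact function: add the columns missing on each side as nulls and then union by name. On Spark 3.1+ you can also call df1.unionByName(df2, allowMissingColumns=True) directly.

```python
from pyspark.sql import DataFrame
from pyspark.sql.functions import lit

def union_by_name_with_nulls(df1: DataFrame, df2: DataFrame) -> DataFrame:
    """Union two DataFrames with different columns, filling the gaps with nulls."""
    for c in set(df2.columns) - set(df1.columns):
        df1 = df1.withColumn(c, lit(None))
    for c in set(df1.columns) - set(df2.columns):
        df2 = df2.withColumn(c, lit(None))
    # unionByName aligns columns by name instead of by position
    return df1.unionByName(df2)
```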
How do I start a Photon cluster, and where can I find the pricing of a Photon cluster?
As of the time of this message, Photon is available in the Data Science & Engineering workspace in Public Preview on AWS. You can reference our docs for instructions on how to provision a cluster using a Photon-enabled runtime. As for pricing, we tre...
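As a sketch of provisioning such a cluster programmatically via the Clusters API (the workspace URL, token, instance type, and the exact Photon-capable runtime string are assumptions to check against the docs for your workspace):

```python
import requests

host = "https://<your-workspace>.cloud.databricks.com"  # hypothetical workspace URL
token = "<personal-access-token>"                        # hypothetical PAT

cluster_spec = {
    "cluster_name": "photon-demo",
    "spark_version": "<photon-enabled runtime from the release notes>",  # assumption
    "node_type_id": "i3.xlarge",   # hypothetical instance type
    "num_workers": 2,
    "runtime_engine": "PHOTON",    # assumption: field available on newer API versions
}

resp = requests.post(f"{host}/api/2.0/clusters/create",
                     headers={"Authorization": f"Bearer {token}"},
                     json=cluster_spec)
resp.raise_for_status()
print(resp.json()["cluster_id"])
```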
We list the OS version in the "Environment" section of each runtime version's release notes. See link to all the runtime release notes here: https://docs.databricks.com/release-notes/runtime/releases.html
- Host your own internal PyPI mirror. That allows you to manage and approve packages rather than downloading directly from the public PyPI, and it also removes the dependency on an external service.
- Upload all wheel files to DBFS, maybe through a CI/CD proce...
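To make those options concrete, here is a sketch of how a notebook might install from an internal mirror or from a wheel staged on DBFS; the index URL, package name, and wheel path are hypothetical:

```python
# Install from an internal PyPI mirror instead of the public index (hypothetical URL and package)
%pip install --index-url https://pypi.mycompany.internal/simple my-internal-package

# Install a wheel that a CI/CD job has uploaded to DBFS (hypothetical path)
%pip install /dbfs/FileStore/wheels/my_internal_package-1.0.0-py3-none-any.whl
```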
My team and I were using a cluster for some days and it disappeared without any apparent reason. I recreated the cluster, but after some days it disappeared again. Do you know why my cluster disappeared, and how to avoid that?
A cluster is deleted 30 days after it is terminated. To keep an all-purpose cluster configuration even after a cluster has been terminated for more than 30 days, an administrator can pin the cluster. Up to 70 clusters can be pinned. To av...
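For admins who prefer scripting it, pinning can also be done through the REST API; a minimal sketch (the workspace URL, token, and cluster ID are hypothetical):

```python
import requests

host = "https://<your-workspace>.cloud.databricks.com"  # hypothetical workspace URL
token = "<personal-access-token>"                        # hypothetical PAT

# Pin a cluster so its configuration is kept beyond the 30-day retention window
resp = requests.post(f"{host}/api/2.0/clusters/pin",
                     headers={"Authorization": f"Bearer {token}"},
                     json={"cluster_id": "1234-567890-abcde123"})
resp.raise_for_status()
```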
Delta Lake is a data storage and management layer that fixes the issues with existing data lakes, e.g. on S3, GCS or ADLS. Delta supports streaming and batch operations. It's an open source project, donated to the Linux Foundation. You can check it o...
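To illustrate the "streaming and batch on the same table" point, a minimal sketch with a hypothetical table location:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

path = "/mnt/delta/events"  # hypothetical Delta table location

# Batch write to a Delta table
spark.range(100).withColumnRenamed("id", "event_id") \
    .write.format("delta").mode("append").save(path)

# Streaming read from the very same table
events_stream = spark.readStream.format("delta").load(path)
```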
I heard Databricks recommends Overwatch for monitoring clusters. Can anybody explain what metrics it provides, how it helps with monitoring, and whether it is better than Ganglia?
Overwatch is a different kind of tool - right now it can't be used for real-time monitoring like Ganglia. Overwatch collects data from multiple sources (audit logs, APIs, cluster logs, etc.), then processes, enriches, and aggregates them following...
I have a use case in MLflow with Python code: I need to find the model version that has the best metric (for instance, "accuracy") among many versions. I don't want to use the web UI but to achieve this with Python code. Any ideas?
```python
import mlflow

client = mlflow.tracking.MlflowClient()

# Search the experiment's runs ordered by the metric of interest and take the top result
runs = client.search_runs("my_experiment_id", "",
                          order_by=["metrics.rmse DESC"], max_results=1)
best_run = runs[0]
```
https://mlflow.org/docs/latest/python_api/mlflow.tracking.html#mlflow.tracking.M...
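If the versions live in the Model Registry rather than a single experiment, a hedged sketch of the same idea: iterate over the registered versions, look up each version's run, and keep the one with the best metric. The model name and metric key are hypothetical.

```python
from mlflow.tracking import MlflowClient

client = MlflowClient()

best_version, best_accuracy = None, float("-inf")
# Hypothetical registered model name; "accuracy" is the metric logged with each run
for mv in client.search_model_versions("name='churn_model'"):
    run = client.get_run(mv.run_id)
    acc = run.data.metrics.get("accuracy", float("-inf"))
    if acc > best_accuracy:
        best_accuracy, best_version = acc, mv.version

print(f"Best version: {best_version} (accuracy={best_accuracy})")
```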
We need to add unit test cases for the Spark code that we're writing in Scala and Python. But we can't use calls like `assertEqual` for comparing the content of DataFrames. Are there any special libraries for that?
There are several libraries for Scala and Python that help with writing unit tests for Spark code. For Scala you can use the following:
- Built-in Spark test suite - it's designed to test all parts of Spark. It supports RDD, Dataframe/Dataset, Streaming API...
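For the Python side, a minimal pytest sketch that compares DataFrame contents without relying on assertEqual (the data and column names are made up):

```python
import pytest
from pyspark.sql import SparkSession

@pytest.fixture(scope="session")
def spark():
    # Local Spark session for fast, isolated unit tests
    return SparkSession.builder.master("local[2]").appName("unit-tests").getOrCreate()

def test_transformation_output(spark):
    expected = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
    actual = spark.createDataFrame([(2, "b"), (1, "a")], ["id", "value"])
    # Compare the schema and the sorted, collected rows instead of the DataFrame objects
    assert actual.schema == expected.schema
    assert sorted(actual.collect()) == sorted(expected.collect())
```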