Data Engineering

Forum Posts

User16826994223
by Honored Contributor III
  • 824 Views
  • 1 replies
  • 0 kudos

What is Photon in Databricks?

Hey, I am new to Databricks and have heard of Photon, which is described as the fastest engine developed by Databricks. Will it make queries faster? And what about query concurrency: will it increase?

Latest Reply
Mooune_DBU
Valued Contributor
  • 0 kudos

Photon is Databricks' new native vectorized engine, developed in C++ for improved query performance (speed and concurrency). It integrates directly with the Databricks Runtime and Spark, meaning no code changes are required to use Photon. At thi...

  • 0 kudos
User16857281869
by New Contributor II
  • 632 Views
  • 1 replies
  • 1 kudos

What are the best ways of developing a customer churn use case on Databricks?

In this blog we implement a typical model for customer attrition in subscription models from data preparation to operationalisation of the model.

Latest Reply
Mooune_DBU
Valued Contributor
  • 1 kudos

Hello, have you read our solution accelerator for predicting customer churn? If you have further questions, please contact your Databricks liaison and we can walk you through the solution and how you can deploy it at scale.

  • 1 kudos
Srikanth_Gupta_
by Valued Contributor
  • 632 Views
  • 1 replies
  • 0 kudos
Latest Reply
craig_ng
New Contributor III
  • 0 kudos

Delta Live Tables offers built-in data lineage between tables and views defined in a pipeline, which allows for easier monitoring and simplified recovery

  • 0 kudos
craig_ng
by New Contributor III
  • 1792 Views
  • 2 replies
  • 0 kudos
Latest Reply
Anonymous
Not applicable
  • 0 kudos

You can monitor user access to data and other resources using Databricks Audit Logs. See "Diagnostic logging in Azure Databricks" and "Configure audit logging in AWS Databricks".

  • 0 kudos
1 More Replies
Srikanth_Gupta_
by Valued Contributor
  • 593 Views
  • 2 replies
  • 1 kudos

What are best practices for Spark streaming in Databricks?

What are best practices for Spark streaming in Databricks? Is it a good idea to consume multiple topics in one streaming job? Is autoscaling recommended for Spark streaming? How many worker nodes should we choose for a streaming job? When should we run OPTIMIZE...

Latest Reply
craig_ng
New Contributor III
  • 1 kudos

See our docs for other considerations when deploying a production streaming job.

  • 1 kudos
1 More Replies
User15787040559
by New Contributor III
  • 2836 Views
  • 2 replies
  • 0 kudos

How to do a unionAll() when the number and the name of columns are different?

Looking at the API for DataFrame.unionAll(): when you have two DataFrames with different numbers and names of columns, unionAll() doesn't work. How can you do it? One possible solution is using the following function which performs the union of tw...

Latest Reply
sean_owen
Honored Contributor II
  • 0 kudos

I'm not sure union is the right tool, if the DataFrames have fundamentally different information in them. If the difference is merely column name, yes, rename. If they don't, then the 'union' contemplated here is really a union of columns as well as ...
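The "union of columns as well as rows" idea can be sketched in plain Python, assuming each row is a dictionary keyed by column name. The helper name `union_by_name` is hypothetical and only illustrates the logic. In PySpark 3.1 and later, `DataFrame.unionByName(other, allowMissingColumns=True)` gives comparable behavior natively, filling missing columns with nulls.

```python
def union_by_name(left, right):
    """Union two lists of dict rows whose column sets may differ.

    Columns missing from a row are filled with None, similar to how a
    column-aligned union fills them with nulls.
    """
    rows = list(left) + list(right)
    # Collect column names in first-seen order across both inputs.
    columns = []
    for row in rows:
        for name in row:
            if name not in columns:
                columns.append(name)
    # Rebuild every row with the full column set.
    return [{name: row.get(name) for name in columns} for row in rows]


# Illustrative data: the two inputs share only the "id" column.
df1 = [{"id": 1, "name": "a"}]
df2 = [{"id": 2, "score": 0.5}]
combined = union_by_name(df1, df2)
```

Whether this kind of union is appropriate still depends on the data: if the two inputs describe fundamentally different things, most cells end up null and a join may be the better fit.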

  • 0 kudos
1 More Replies
User16826994223
by Honored Contributor III
  • 545 Views
  • 1 replies
  • 0 kudos

Start Photon cluster

How do I start a Photon cluster, and where can I find the pricing for a Photon cluster?

Latest Reply
craig_ng
New Contributor III
  • 0 kudos

As of the time of this message, Photon is available in the Data Science & Engineering workspace in Public Preview on AWS. You can reference our docs for instructions on how to provision a cluster using a Photon-enabled runtime. As for pricing, we tre...

  • 0 kudos
Anonymous
by Not applicable
  • 517 Views
  • 1 replies
  • 0 kudos
Latest Reply
craig_ng
New Contributor III
  • 0 kudos

We list the OS version in the "Environment" section of each runtime version's release notes. See link to all the runtime release notes here: https://docs.databricks.com/release-notes/runtime/releases.html

  • 0 kudos
MoJaMa
by Valued Contributor II
  • 549 Views
  • 1 replies
  • 0 kudos
Latest Reply
MoJaMa
Valued Contributor II
  • 0 kudos

Hosting your own internal PyPI mirror. That will allow you to manage and approve packages rather than downloading directly from public PyPI, and would also remove the dependency on an external service. Upload all wheel files to DBFS, maybe through a CI/CD proce...

  • 0 kudos
User16860826802
by New Contributor III
  • 3724 Views
  • 1 replies
  • 1 kudos

Resolved! Why does my cluster keep disappearing?

My team and I had been using a cluster for some days when it disappeared without any apparent reason. I recreated the cluster, but after some days it disappeared again. Do you know why my cluster disappeared, and how to avoid that?

Latest Reply
User16860826802
New Contributor III
  • 1 kudos

A cluster is deleted 30 days after it is terminated. To keep an all-purpose cluster configuration even after a cluster has been terminated for more than 30 days, an administrator can pin the cluster. Up to 70 clusters can be pinned. To av...
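Pinning can also be done programmatically. The following is a minimal sketch, assuming the Clusters API 2.0 endpoint `POST /api/2.0/clusters/pin` and a personal access token with admin rights. The workspace URL, token, and cluster ID are placeholders, and `build_pin_request` is a hypothetical helper name. The sketch only builds the request; sending it is left commented out.

```python
import json
import urllib.request


def build_pin_request(host, token, cluster_id):
    """Build (but do not send) a request that pins a cluster."""
    url = host.rstrip("/") + "/api/2.0/clusters/pin"
    body = json.dumps({"cluster_id": cluster_id}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values; replace with your own workspace URL, token, and cluster ID.
request = build_pin_request(
    "https://example.cloud.databricks.com",
    "<personal-access-token>",
    "1234-567890-abcde123",
)
# urllib.request.urlopen(request)  # uncomment to actually send the request
```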

  • 1 kudos
Anonymous
by Not applicable
  • 520 Views
  • 1 replies
  • 0 kudos
Latest Reply
User16752244127
Contributor
  • 0 kudos

Delta Lake is a data storage and management layer that fixes the issues with existing data lakes, e.g. on S3, GCS or ADLS. Delta supports streaming and batch operations. It's an open source project, donated to the Linux Foundation. You can check it o...

  • 0 kudos
User16826994223
by Honored Contributor III
  • 5086 Views
  • 1 replies
  • 1 kudos

What is Overwatch?

I heard Databricks recommends Overwatch for monitoring clusters. Can anybody explain which metrics it provides, and how it is helpful for monitoring or better than Ganglia?

Latest Reply
alexott
Valued Contributor II
  • 1 kudos

Overwatch is a different kind of tool: right now it can't be used for real-time monitoring the way Ganglia can. Overwatch collects data from multiple data sources (audit logs, APIs, cluster logs, etc.), then processes, enriches, and aggregates them following...

  • 1 kudos
User16826994223
by Honored Contributor III
  • 1047 Views
  • 1 replies
  • 0 kudos

Resolved! How to find the best model using Python in MLflow

I have a use case in MLflow where I want Python code to find the model version that has the best metric (for instance, "accuracy") among many versions. I don't want to use the web UI, only Python code. Any idea?

Latest Reply
User16826994223
Honored Contributor III
  • 0 kudos

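import mlflow

client = mlflow.tracking.MlflowClient()
# DESC returns the run with the highest value of the metric. For an error
# metric such as rmse, where lower is better, ASC would pick the best run.
runs = client.search_runs("my_experiment_id", "", order_by=["metrics.rmse DESC"], max_results=1)
best_run = runs[0]

https://mlflow.org/docs/latest/python_api/mlflow.tracking.html#mlflow.tracking.M...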
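The question asks about model versions rather than runs, so here is a hedged sketch of one way to rank registered versions by a metric logged on their source runs. It assumes the client exposes `search_model_versions` and `get_run` as `mlflow.tracking.MlflowClient` does. The function name `best_model_version`, the model name "my_model", and the `higher_is_better` flag are illustrative assumptions, not part of the original answer.

```python
def best_model_version(client, model_name, metric, higher_is_better=True):
    """Return (version, metric_value) for the best version, or None if none qualify."""
    scored = []
    for mv in client.search_model_versions(f"name='{model_name}'"):
        if not mv.run_id:
            continue  # version was not created from a tracked run
        value = client.get_run(mv.run_id).data.metrics.get(metric)
        if value is not None:
            scored.append((mv.version, value))
    if not scored:
        return None
    # Use max for metrics like accuracy, min for error metrics like rmse.
    pick = max if higher_is_better else min
    return pick(scored, key=lambda pair: pair[1])


# Usage against a real tracking server (requires the mlflow package):
#   import mlflow
#   client = mlflow.tracking.MlflowClient()
#   print(best_model_version(client, "my_model", "accuracy"))
```

This makes one `get_run` call per version, which is fine for a handful of versions but may be slow for registries with many.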

  • 0 kudos
alexott
by Valued Contributor II
  • 1271 Views
  • 1 replies
  • 0 kudos

What libraries could be used for unit testing of the Spark code?

We need to add unit tests for code that we're writing in Scala and Python, but we can't use calls like `assertEqual` to compare the content of DataFrames. Are there any special libraries for that?

Latest Reply
alexott
Valued Contributor II
  • 0 kudos

There are several libraries for Scala and Python that help with writing unit tests for Spark code. For Scala you can use the following: Built-in Spark test suite - it's designed to test all parts of Spark. It supports RDD, Dataframe/Dataset, Streaming API...
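Independent of any particular library, the core of a DataFrame content comparison can be sketched in plain Python. This is a minimal illustration, assuming rows arrive as tuples (PySpark `Row` objects returned by `df.collect()` are tuple subclasses). The helper name `assert_rows_equal` is hypothetical, and the dedicated libraries offer richer diffs and schema checks.

```python
def assert_rows_equal(actual, expected):
    """Assert that two row collections contain the same rows, ignoring order."""

    def normalize(rows):
        # Sort by repr so rows containing None or mixed types still order consistently.
        return sorted((tuple(row) for row in rows), key=repr)

    actual_rows, expected_rows = normalize(actual), normalize(expected)
    assert actual_rows == expected_rows, (
        f"rows differ:\n  actual={actual_rows}\n  expected={expected_rows}"
    )


# With PySpark this would be called as:
#   assert_rows_equal(result_df.collect(), expected_df.collect())
assert_rows_equal([(2, "b"), (1, "a")], [(1, "a"), (2, "b")])
```

Collecting to the driver is only sensible for the small DataFrames typical of unit tests.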

  • 0 kudos