Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

User16826994223
by Databricks Employee
  • 2107 Views
  • 1 replies
  • 0 kudos

What is Photon in Databricks?

Hey, I am new to Databricks and heard of Photon, which is the fastest engine developed by Databricks. Will it make queries faster? What about query concurrency, will it increase?

Latest Reply
Mooune_DBU
Databricks Employee
  • 0 kudos

Photon is Databricks' brand-new native vectorized engine, developed in C++ for improved query performance (speed and concurrency). It integrates directly with the Databricks Runtime and Spark, meaning no code changes are required to use Photon. At thi...

User16857281869
by Databricks Employee
  • 1621 Views
  • 1 replies
  • 1 kudos

What are the best ways of developing a customer churn use case on Databricks?

In this blog we implement a typical model for customer attrition in subscription models from data preparation to operationalisation of the model.

Latest Reply
Mooune_DBU
Databricks Employee
  • 1 kudos

Hello, have you read our solution accelerator for predicting customer churn? If you have further questions, please contact your Databricks liaison and we can walk you through the solution and how you can deploy it at scale.

Srikanth_Gupta_
by Databricks Employee
  • 1762 Views
  • 1 replies
  • 0 kudos
Latest Reply
craig_ng
Databricks Employee
  • 0 kudos

Delta Live Tables offers built-in data lineage between the tables and views defined in a pipeline, which allows for easier monitoring and simplified recovery.

craig_ng
by Databricks Employee
  • 3975 Views
  • 2 replies
  • 0 kudos
Latest Reply
Anonymous
Not applicable
  • 0 kudos

You can monitor user access to data and other resources using Databricks Audit Logs.
  • Diagnostic logging in Azure Databricks
  • Configure audit logging in AWS Databricks

1 More Replies
Srikanth_Gupta_
by Databricks Employee
  • 2953 Views
  • 2 replies
  • 1 kudos

What are best practices for Spark streaming in Databricks?

What are best practices for Spark streaming in Databricks?
  • Is it a good idea to consume multiple topics in one streaming job?
  • Is autoscaling recommended for Spark streaming?
  • How many worker nodes should we choose for a streaming job?
  • When should we run OPTIMIZE...

Latest Reply
craig_ng
Databricks Employee
  • 1 kudos

See our docs for other considerations when deploying a production streaming job.

1 More Replies
User15787040559
by Databricks Employee
  • 5239 Views
  • 2 replies
  • 0 kudos

How to do a unionAll() when the number and the name of columns are different?

Looking at the API for DataFrame.unionAll(): when you have two DataFrames with different numbers of columns and different column names, unionAll() doesn't work. How can you do it? One possible solution is using the following function which performs the union of tw...

Latest Reply
sean_owen
Databricks Employee
  • 0 kudos

I'm not sure union is the right tool, if the DataFrames have fundamentally different information in them. If the difference is merely column name, yes, rename. If they don't, then the 'union' contemplated here is really a union of columns as well as ...
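
The function referred to in the question is cut off above, so what follows is not the original poster's code. It is only a minimal sketch of one way such a union could be written in PySpark, assuming missing columns should be filled with nulls. The names `plan_union_columns` and `union_different_columns` are hypothetical.

```python
from typing import List, Tuple


def plan_union_columns(
    cols_a: List[str], cols_b: List[str]
) -> Tuple[List[str], List[str], List[str]]:
    """Return (all_columns, missing_in_a, missing_in_b), keeping first-seen order."""
    all_cols = list(dict.fromkeys(cols_a + cols_b))
    missing_in_a = [c for c in all_cols if c not in cols_a]
    missing_in_b = [c for c in all_cols if c not in cols_b]
    return all_cols, missing_in_a, missing_in_b


def union_different_columns(df_a, df_b):
    """Union two PySpark DataFrames whose column sets differ (hypothetical helper)."""
    from pyspark.sql import functions as F  # imported lazily; requires PySpark

    all_cols, missing_in_a, missing_in_b = plan_union_columns(df_a.columns, df_b.columns)
    # Add each missing column as a null literal so both sides share one schema.
    for name in missing_in_a:
        df_a = df_a.withColumn(name, F.lit(None))
    for name in missing_in_b:
        df_b = df_b.withColumn(name, F.lit(None))
    # union() matches columns by position, so select the same order on both sides.
    return df_a.select(all_cols).union(df_b.select(all_cols))
```

On Spark 3.1 or later, `df_a.unionByName(df_b, allowMissingColumns=True)` should give a similar result without a helper. As the reply notes, this only makes sense when the two DataFrames describe the same kind of record; if the difference is only in column names, renaming is the simpler fix.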

1 More Replies
User16826994223
by Databricks Employee
  • 1205 Views
  • 1 replies
  • 0 kudos

Start Photon cluster

How do I start a Photon cluster, and where can I find the pricing for a Photon cluster?

Latest Reply
craig_ng
Databricks Employee
  • 0 kudos

As of the time of this message, Photon is available in the Data Science & Engineering workspace in Public Preview on AWS. You can reference our docs for instructions on how to provision a cluster using a Photon-enabled runtime. As for pricing, we tre...

Anonymous
by Not applicable
  • 1467 Views
  • 1 replies
  • 0 kudos
Latest Reply
craig_ng
Databricks Employee
  • 0 kudos

We list the OS version in the "Environment" section of each runtime version's release notes. See link to all the runtime release notes here: https://docs.databricks.com/release-notes/runtime/releases.html

MoJaMa
by Databricks Employee
  • 1391 Views
  • 1 replies
  • 0 kudos
Latest Reply
MoJaMa
Databricks Employee
  • 0 kudos

  • Host your own internal PyPI mirror. That will allow you to manage and approve packages instead of downloading directly from public PyPI, and it also removes the dependency on an external service.
  • Upload all wheel files to DBFS, maybe through a CI/CD proce...

User16860826802
by Databricks Employee
  • 9064 Views
  • 1 replies
  • 1 kudos

Resolved! Why does my cluster keep disappearing?

My team and I were using a cluster for some days and it disappeared without any apparent reason. I recreated the cluster, but after some days it disappeared again. Do you know why my cluster disappeared? How can we avoid that?

Latest Reply
User16860826802
Databricks Employee
  • 1 kudos

A cluster is deleted 30 days after it is terminated. To keep an all-purpose cluster configuration even after the cluster has been terminated for more than 30 days, an administrator can pin the cluster. Up to 70 clusters can be pinned. To av...
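
Pinning is usually done from the cluster list in the UI, but it can also be scripted. The sketch below assumes the Clusters API 2.0 `pin` endpoint and a personal access token belonging to a workspace admin. The host, token, and cluster ID are placeholder values, and `build_pin_request` is a hypothetical helper name.

```python
import json
import urllib.request


def build_pin_request(host: str, token: str, cluster_id: str) -> urllib.request.Request:
    """Build (but do not send) a POST request to the Clusters API pin endpoint."""
    body = json.dumps({"cluster_id": cluster_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{host.rstrip('/')}/api/2.0/clusters/pin",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values, for illustration only.
request = build_pin_request(
    host="https://example.cloud.databricks.com",
    token="dapi-EXAMPLE",
    cluster_id="1234-567890-abcde123",
)
# To actually pin the cluster (requires admin rights), send the request:
# urllib.request.urlopen(request)
```

Building the request separately from sending it makes it easy to inspect the URL and payload before touching a real workspace.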

Anonymous
by Not applicable
  • 1612 Views
  • 1 replies
  • 0 kudos
Latest Reply
User16752244127
Databricks Employee
  • 0 kudos

Delta Lake is a data storage and management layer that fixes the issues with existing data lakes, e.g. on S3, GCS or ADLS. Delta supports streaming and batch operations. It's an open source project, donated to the Linux Foundation. You can check it o...

User16826994223
by Databricks Employee
  • 6729 Views
  • 1 replies
  • 1 kudos

What is Overwatch?

I heard Databricks recommends Overwatch for monitoring clusters. Can anybody explain which metrics it provides, and how it is helpful for monitoring or better than Ganglia?

Latest Reply
alexott
Databricks Employee
  • 1 kudos

Overwatch is a different kind of tool: right now it can't be used for real-time monitoring the way Ganglia can. Overwatch collects data from multiple data sources (audit logs, APIs, cluster logs, etc.), then processes, enriches, and aggregates it following...

