Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

User16826994223 (Honored Contributor III)
  • 1640 Views
  • 1 reply
  • 0 kudos

What is Photon in Databricks?

Hey, I am new to Databricks and have heard of Photon, the fastest engine developed by Databricks. Will it make queries faster? What about query concurrency; will it increase?

Latest Reply
Mooune_DBU (Valued Contributor)
  • 0 kudos

Photon is Databricks' brand-new native vectorized engine, developed in C++ for improved query performance (speed and concurrency). It integrates directly with the Databricks Runtime and Spark, meaning no code changes are required to use Photon. At thi...

User16857281869 (New Contributor II)
  • 1328 Views
  • 1 reply
  • 1 kudos

What are the best ways of developing a customer churn use case on Databricks?

In this blog, we implement a typical customer attrition model for subscription businesses, from data preparation through operationalisation of the model.

Latest Reply
Mooune_DBU (Valued Contributor)
  • 1 kudos

Hello, have you read our solution accelerator for predicting customer churn? If you have further questions, please contact your Databricks liaison and we can walk you through the solution and how to deploy it at scale.

Srikanth_Gupta_ (Databricks Employee)
  • 1469 Views
  • 1 reply
  • 0 kudos
Latest Reply
craig_ng (New Contributor III)
  • 0 kudos

Delta Live Tables offers built-in data lineage between the tables and views defined in a pipeline, which makes monitoring easier and simplifies recovery.

craig_ng (New Contributor III)
  • 3193 Views
  • 2 replies
  • 0 kudos
Latest Reply
Anonymous
  • 0 kudos

You can monitor user access to data and other resources using Databricks audit logs. See "Diagnostic logging in Azure Databricks" and "Configure audit logging in AWS Databricks".
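Once audit logs are being delivered, a quick way to see who is doing what is to tally records per user and action. This is a minimal sketch: it assumes JSON-lines records with top-level `serviceName` and `actionName` fields and a nested `userIdentity.email`. Those field names are an assumption here, so check them against the schema of the logs your workspace actually delivers. The helper name `summarize_access` is hypothetical.

```python
import json
from collections import Counter
from typing import Iterable


def summarize_access(lines: Iterable[str]) -> Counter:
    """Count (user email, service, action) triples in JSON-lines audit records.

    Assumed record shape (an assumption; verify against your own logs):
    {"serviceName": ..., "actionName": ..., "userIdentity": {"email": ...}, ...}
    """
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        # userIdentity may be missing or null; fall back to a placeholder email.
        email = (record.get("userIdentity") or {}).get("email", "<unknown>")
        counts[(email, record.get("serviceName"), record.get("actionName"))] += 1
    return counts
```

Passing an open log file (which iterates line by line) returns a `Counter`, so `most_common()` lists the busiest user/action pairs first.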

Srikanth_Gupta_ (Databricks Employee)
  • 1553 Views
  • 2 replies
  • 1 kudos

What are best practices for Spark Streaming in Databricks?

What are best practices for Spark Streaming in Databricks?
Is it a good idea to consume multiple topics in one streaming job?
Is autoscaling recommended for Spark Streaming?
How many worker nodes should we choose for a streaming job?
When should we run OPTIMIZE...

Latest Reply
craig_ng (New Contributor III)
  • 1 kudos

See our docs for other considerations when deploying a production streaming job.

User15787040559 (Databricks Employee)
  • 4594 Views
  • 2 replies
  • 0 kudos

How do I do a unionAll() when the number and names of columns are different?

Looking at the API for DataFrame.unionAll(): when you have two DataFrames with different numbers and names of columns, unionAll() doesn't work. How can you do it? One possible solution is using the following function which performs the union of tw...

Latest Reply
sean_owen (Databricks Employee)
  • 0 kudos

I'm not sure union is the right tool if the DataFrames hold fundamentally different information. If the difference is merely the column names, then yes, rename them. If they don't, then the 'union' contemplated here is really a union of columns as well as ...
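To make the "union of columns as well as rows" idea concrete, here is a minimal plain-Python sketch (no Spark) that aligns two tables by column name: the result schema is every column from either input, and each row gets `None` for any column its source table lacked. The `union_by_name` helper and the `(columns, rows)` table shape are hypothetical. In PySpark 3.1 and later, `DataFrame.unionByName(other, allowMissingColumns=True)` performs the same alignment on DataFrames.

```python
from typing import Any, List, Tuple

# A table is (column names, rows), each row a tuple in column order.
Table = Tuple[List[str], List[Tuple[Any, ...]]]


def union_by_name(left: Table, right: Table, fill: Any = None) -> Table:
    """Union the rows of two tables whose column sets differ."""
    left_cols, left_rows = left
    right_cols, right_rows = right
    # Result schema: left's columns in order, then columns that exist only in right.
    columns = list(left_cols) + [c for c in right_cols if c not in left_cols]

    def align(cols: List[str], rows: List[Tuple[Any, ...]]) -> List[Tuple[Any, ...]]:
        position = {name: i for i, name in enumerate(cols)}
        # Reorder each row to the result schema, padding missing columns with `fill`.
        return [
            tuple(row[position[c]] if c in position else fill for c in columns)
            for row in rows
        ]

    return columns, align(left_cols, left_rows) + align(right_cols, right_rows)
```

The padded cells are the cost of this approach: if the inputs describe different things, the result is mostly `None`, which is why a join may be the better fit.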

User16826994223 (Honored Contributor III)
  • 988 Views
  • 1 reply
  • 0 kudos

Starting a Photon cluster

How do I start a Photon cluster, and where can I find the pricing for Photon clusters?

Latest Reply
craig_ng (New Contributor III)
  • 0 kudos

As of the time of this message, Photon is available in Public Preview in the Data Science & Engineering workspace on AWS. You can refer to our docs for instructions on how to provision a cluster using a Photon-enabled runtime. As for pricing, we tre...

Anonymous
  • 1161 Views
  • 1 reply
  • 0 kudos
Latest Reply
craig_ng (New Contributor III)
  • 0 kudos

We list the OS version in the "Environment" section of each runtime version's release notes. See the list of all runtime release notes here: https://docs.databricks.com/release-notes/runtime/releases.html

MoJaMa (Databricks Employee)
  • 1111 Views
  • 1 reply
  • 0 kudos
Latest Reply
MoJaMa (Databricks Employee)
  • 0 kudos

Hosting your own internal PyPI mirror. That lets you manage and approve packages instead of downloading them directly from public PyPI, and it also removes the dependency on an external service.
Upload all wheel files to DBFS, maybe through a CI/CD proce...
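Both options come down to pointing pip somewhere other than public PyPI. The sketch below only builds the argument list: either `--index-url` for an internal mirror, or `--no-index --find-links` for a directory of wheel files. Those are standard pip flags; the mirror URL and wheel path you pass in are your own, and `pip_install_args` is an illustrative helper, not a Databricks API.

```python
import sys
from typing import List, Optional


def pip_install_args(
    packages: List[str],
    index_url: Optional[str] = None,
    wheel_dir: Optional[str] = None,
) -> List[str]:
    """Build a pip command that avoids public PyPI (wheel_dir wins if both are given)."""
    args = [sys.executable, "-m", "pip", "install"]
    if wheel_dir is not None:
        # Offline: resolve only from wheel files in wheel_dir, never from an index.
        args += ["--no-index", "--find-links", wheel_dir]
    elif index_url is not None:
        # Use the internal mirror in place of the default public index.
        args += ["--index-url", index_url]
    # With neither option set, pip falls back to its configured default index.
    return args + list(packages)
```

For example, `pip_install_args(["mylib"], wheel_dir="/dbfs/FileStore/wheels")` (a hypothetical path) builds an offline install from uploaded wheels; passing the result to `subprocess.run(args, check=True)` performs the install.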

User16860826802 (New Contributor III)
  • 7569 Views
  • 1 reply
  • 1 kudos

Resolved! Why does my cluster keep disappearing?

My team and I had been using a cluster for some days when it disappeared for no apparent reason. I recreated the cluster, but after some days it disappeared again. Do you know why my cluster disappeared, and how can I avoid that?

Latest Reply
User16860826802 (New Contributor III)
  • 1 kudos

A cluster is deleted 30 days after it is terminated. To keep an all-purpose cluster configuration even after the cluster has been terminated for more than 30 days, an administrator can pin the cluster. Up to 70 clusters can be pinned. To av...
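The retention rule in this reply can be expressed as a small check that flags terminated, unpinned clusters nearing deletion. This is a sketch under stated assumptions: the cluster records are hypothetical dicts with `name`, `terminated_at`, and `pinned` keys, not the shape returned by any Databricks API, and the 30-day and 70-cluster figures are taken from the reply above and may change over time.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional

RETENTION = timedelta(days=30)  # terminated, unpinned clusters are deleted after 30 days
MAX_PINNED = 70  # at most 70 clusters can be pinned


def scheduled_deletion(terminated_at: Optional[datetime], pinned: bool) -> Optional[datetime]:
    """When a cluster would be deleted, or None if it is exempt."""
    if pinned or terminated_at is None:
        return None  # pinned clusters are kept; running clusters have no termination time
    return terminated_at + RETENTION


def at_risk(clusters: List[Dict], now: datetime, warn: timedelta = timedelta(days=7)) -> List[str]:
    """Names of unpinned, terminated clusters due for deletion within `warn` of `now`."""
    names = []
    for cluster in clusters:
        deletion = scheduled_deletion(cluster.get("terminated_at"), cluster.get("pinned", False))
        if deletion is not None and deletion - now <= warn:
            names.append(cluster["name"])  # also includes clusters already past their date
    return names


def can_pin(currently_pinned: int) -> bool:
    """True if an administrator could pin one more cluster."""
    return currently_pinned < MAX_PINNED
```

Running `at_risk` on a schedule gives administrators time to pin a cluster before its configuration is lost, while `can_pin` guards the pin limit.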

Anonymous
  • 1255 Views
  • 1 reply
  • 0 kudos
Latest Reply
User16752244127 (Contributor)
  • 0 kudos

Delta Lake is a data storage and management layer that addresses common issues with existing data lakes, e.g. on S3, GCS, or ADLS. Delta supports both streaming and batch operations. It's an open source project that has been donated to the Linux Foundation. You can check it o...

User16826994223 (Honored Contributor III)
  • 6344 Views
  • 1 reply
  • 1 kudos

What is Overwatch?

I heard Databricks recommends Overwatch for monitoring clusters. Can anybody explain what metrics it provides, how it helps with monitoring, and whether it is better than Ganglia?

Latest Reply
alexott (Databricks Employee)
  • 1 kudos

Overwatch is a different kind of tool: right now it can't be used for real-time monitoring the way Ganglia can. Overwatch collects data from multiple data sources (audit logs, APIs, cluster logs, etc.), then processes, enriches, and aggregates them following...
