Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

User16826992783
by New Contributor II
  • 1750 Views
  • 1 replies
  • 1 kudos

Receiving a "Databricks Delta is not enabled on your account" error

The team is using Databricks Light for some pipeline development and would like to leverage Delta, but they are running into this error: "Databricks Delta is not enabled on your account". How can we enable Delta for our account?

Latest Reply
craig_ng
New Contributor III
  • 1 kudos

Databricks Light is the open source Apache Spark runtime and does not come with any type of client for Delta Lake pre-installed. You'll need to manually install open source Delta Lake in order to do any reads or writes. See our docs and release notes ...
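For reference, enabling open source Delta Lake on a plain Apache Spark runtime typically comes down to installing the `delta-spark` package and setting two session configs. A minimal sketch (the usage shown in comments assumes a working Spark installation and is not part of the runnable snippet):

```python
# Minimal sketch: enabling open source Delta Lake on plain Apache Spark.
# Prerequisite (run once on the cluster): pip install delta-spark
# These two configs register Delta's SQL extension and catalog with Spark.
delta_configs = {
    "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
    "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog",
}

def delta_builder_args():
    """Return the (key, value) pairs to pass to SparkSession.builder.config()."""
    return sorted(delta_configs.items())

# Usage on a cluster (not run here; requires pyspark + delta-spark installed):
# from pyspark.sql import SparkSession
# builder = SparkSession.builder.appName("delta-demo")
# for k, v in delta_builder_args():
#     builder = builder.config(k, v)
# spark = builder.getOrCreate()
# spark.range(5).write.format("delta").save("/tmp/delta-table")
```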

User16765131552
by Contributor III
  • 2541 Views
  • 1 replies
  • 1 kudos

Resolved! Create a new cluster in Databricks using databricks-cli

I'm trying to create a new cluster in Databricks on Azure using databricks-cli. I'm using the following command: databricks clusters create --json '{ "cluster_name": "template2", "spark_version": "4.1.x-scala2.11" }' And I'm getting back this error: Error: ...

Latest Reply
User16765131552
Contributor III
  • 1 kudos

I found the right answer here. The correct format to run this command on Azure is: databricks clusters create --json '{ "cluster_name": "my-cluster", "spark_version": "4.1.x-scala2.11", "node_type_id": "Standard_DS3_v2", "autoscale" : { "min_workers": ...
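Building the JSON payload in code first makes it easy to validate before handing it to the CLI. A sketch, where the cluster name, node type, and autoscale bounds are example values only (valid node types and Spark versions depend on your workspace):

```python
import json

# Example cluster spec for `databricks clusters create --json '<payload>'` on Azure.
# All values below are illustrative, not a definitive configuration.
cluster_spec = {
    "cluster_name": "my-cluster",
    "spark_version": "4.1.x-scala2.11",
    "node_type_id": "Standard_DS3_v2",  # Azure VM type; omitting this is what triggers the error above
    "autoscale": {"min_workers": 2, "max_workers": 8},
}

payload = json.dumps(cluster_spec)

# Shell usage (not run here; requires a configured databricks-cli):
# databricks clusters create --json "$payload"
```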

User15787040559
by Databricks Employee
  • 1680 Views
  • 1 replies
  • 1 kudos

Why do we need the ec2:CreateTags and ec2:DeleteTags permissions?

Why do we need the ec2:CreateTags and ec2:DeleteTags permissions? Are they required? Are EC2 tags used internally as well?

Latest Reply
User15787040559
Databricks Employee
  • 1 kudos

Yes, they're required. They're how Databricks tracks and tags resources. The tags are used to identify the owner of clusters on the AWS side, and Databricks uses the tag information internally as well.

User16826994223
by Honored Contributor III
  • 1106 Views
  • 1 replies
  • 1 kudos

Does Databricks provide any isolation mechanisms when deployed in my account?

Does Databricks provide any isolation mechanisms when deployed in my account?

Latest Reply
Mooune_DBU
Valued Contributor
  • 1 kudos

If you're running on AWS: Databricks deploys Spark nodes in an Amazon Virtual Private Cloud (VPC) running in the customer’s own AWS account, giving the customer full control over their data and instances. VPCs enable customers to isolate the network ...

User16826994223
by Honored Contributor III
  • 1553 Views
  • 1 replies
  • 0 kudos

What is Photon in Databricks?

Hey, I am new to Databricks and heard of Photon, the fastest engine developed by Databricks. Will it make queries faster? What about query concurrency, will that increase?

Latest Reply
Mooune_DBU
Valued Contributor
  • 0 kudos

Photon is Databricks' brand-new native vectorized engine, developed in C++ for improved query performance (speed and concurrency). It integrates directly with the Databricks Runtime and Spark, meaning no code changes are required to use Photon. At thi...

User16857281869
by New Contributor II
  • 1259 Views
  • 1 replies
  • 1 kudos

What are the best ways of developing a customer churn use case on Databricks?

In this blog we implement a typical model for customer attrition in subscription businesses, from data preparation to operationalisation of the model.

Latest Reply
Mooune_DBU
Valued Contributor
  • 1 kudos

Hello, have you read our solution accelerator for predicting customer churn? If you have further questions, please contact your Databricks liaison and we can walk you through the solution and how you can deploy it at scale.

User16826994223
by Honored Contributor III
  • 6271 Views
  • 1 replies
  • 1 kudos

What is Overwatch?

I heard Databricks recommends Overwatch for monitoring clusters. Can anybody explain what metrics it provides, how it helps with monitoring, and whether it is better than Ganglia?

Latest Reply
alexott
Databricks Employee
  • 1 kudos

Overwatch is a different kind of tool - right now it can't be used for real-time monitoring like Ganglia. Overwatch collects data from multiple data sources (audit logs, APIs, cluster logs, etc.), then processes, enriches, and aggregates it following...

User16826994223
by Honored Contributor III
  • 1405 Views
  • 3 replies
  • 0 kudos

What is Auto Loader in Databricks?

I want to know what Auto Loader is and what its advantages are.

Latest Reply
MoJaMa
Databricks Employee
  • 0 kudos

The biggest advantage is the ease with which you can start ingesting data from your cloud storage directly into a Delta table. You can choose Directory Listing mode or File Notification mode, depending on what fits your use case best.
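Auto Loader is configured through `cloudFiles` options on a streaming read, and the mode choice mentioned above maps to a single option. A sketch of how those options might be assembled (the paths and source format in the usage comment are placeholders, and the Spark calls are shown only as comments since they need a Databricks cluster):

```python
def autoloader_options(source_format: str, use_notifications: bool) -> dict:
    """Build the option map for spark.readStream.format('cloudFiles').

    Option names follow the Databricks Auto Loader docs; Spark expects
    string values, hence the lowercase boolean.
    """
    return {
        "cloudFiles.format": source_format,
        # "false" -> Directory Listing mode, "true" -> File Notification mode
        "cloudFiles.useNotifications": str(use_notifications).lower(),
    }

# Usage on Databricks (not run here):
# df = (spark.readStream.format("cloudFiles")
#         .options(**autoloader_options("json", use_notifications=True))
#         .load("s3://my-bucket/landing/"))
# (df.writeStream.format("delta")
#    .option("checkpointLocation", "/chk/landing")
#    .start("/delta/landing"))
```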

2 More Replies
MoJaMa
by Databricks Employee
  • 976 Views
  • 1 replies
  • 0 kudos
Latest Reply
MoJaMa
Databricks Employee
  • 0 kudos

Quite possibly in the future, as we progress down our roadmap. Currently it is per-workspace, and only accessible in Databricks notebooks/jobs. Please refer to our docs: https://docs.databricks.com/applications/machine-learning/feature-store.html#known-limita...

MoJaMa
by Databricks Employee
  • 939 Views
  • 1 replies
  • 0 kudos
Latest Reply
MoJaMa
Databricks Employee
  • 0 kudos

We used to require this, but starting June 9, 2021 we no longer do, and we have improved our E2 security posture. See https://docs.databricks.com/administration-guide/account-api/iam-role.html for the current permissions required.

sajith_appukutt
by Honored Contributor II
  • 1529 Views
  • 1 replies
  • 1 kudos

Resolved! Are there any ways to automatically clean up temporary files created in S3 by the Amazon Redshift connector?

The Amazon Redshift data source in Databricks seems to use S3 for storing intermediate results. Are there any ways to automatically clean up the temporary files created in S3?

Latest Reply
sajith_appukutt
Honored Contributor II
  • 1 kudos

You could use a storage lifecycle policy for the S3 bucket used for storing intermediate results and configure expiration actions. This way, temporary/intermediate results would be automatically cleaned up.
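The lifecycle rule the reply describes can be expressed as the standard S3 lifecycle configuration document. A stdlib-only sketch that builds the rule (the prefix and expiry are example values; applying it would use the AWS CLI or boto3's `put_bucket_lifecycle_configuration`, which is only shown as a comment here):

```python
def redshift_tempdir_expiry_rule(prefix: str, days: int) -> dict:
    """Build an S3 lifecycle configuration that expires objects under
    `prefix` after `days` days (shape per the S3 lifecycle API)."""
    return {
        "Rules": [
            {
                "ID": "expire-redshift-tempdir",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Expiration": {"Days": days},
            }
        ]
    }

# Example: expire the connector's tempdir objects after 1 day.
policy = redshift_tempdir_expiry_rule("redshift-temp/", 1)

# Apply with boto3 (not run here; requires AWS credentials):
# import boto3
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="my-bucket", LifecycleConfiguration=policy)
```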

sajith_appukutt
by Honored Contributor II
  • 1164 Views
  • 1 replies
  • 1 kudos
Latest Reply
sajith_appukutt
Honored Contributor II
  • 1 kudos

You'd need to open connections to:
  • Databricks web application
  • Databricks secure cluster connectivity (SCC) relay
  • AWS S3 global URL
  • AWS S3 regional URL
  • AWS STS global URL
  • AWS STS regional URL
  • AWS Kinesis regional URL
  • Table metastore RDS regional URL (by data ...

User16826994223
by Honored Contributor III
  • 1147 Views
  • 1 replies
  • 0 kudos

Multi-task jobs in Databricks

Hi team, is there any way we can utilize the same cluster to run multiple dependent jobs in a multi-task job? Starting a cluster for every job takes time.

Latest Reply
User16830818524
New Contributor II
  • 0 kudos

At this time it is not possible.

Srikanth_Gupta_
by Databricks Employee
  • 1510 Views
  • 1 replies
  • 0 kudos
Latest Reply
sajith_appukutt
Honored Contributor II
  • 0 kudos

Delta cache is an automatic, hands-free solution that leverages the high read speeds of modern SSDs to transparently create copies of remote files in nodes' local storage to accelerate data reads. In comparison, you have to choose what and when to cache wit...

User16826992666
by Valued Contributor
  • 1597 Views
  • 1 replies
  • 0 kudos

Can you use external job scheduling tools to start and schedule Databricks jobs?

I am wondering if I have to use the Databricks jobs scheduler to kick off Databricks jobs. My company already uses another job scheduler for our workflows and it would be useful to add our Databricks jobs to that flow.

Latest Reply
sajith_appukutt
Honored Contributor II
  • 0 kudos

You could use external tools to schedule jobs in Databricks. Here is a blog post explaining how Databricks can be used along with Azure Data Factory. This blog explains how to use Airflow with Databricks. It is worth noting that a lot of Databricks' f...
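Most external schedulers ultimately call the Databricks Jobs REST API; a scheduler task just needs to POST to the run-now endpoint with a job ID. A stdlib-only sketch that builds such a request without sending it (the host, token, and job ID are placeholders, and the actual `urlopen` call is commented out since it needs a real workspace):

```python
import json
import urllib.request

def build_run_now_request(host: str, token: str, job_id: int) -> urllib.request.Request:
    """Build (but do not send) a Databricks Jobs API run-now request."""
    body = json.dumps({"job_id": job_id}).encode()
    return urllib.request.Request(
        url=f"{host}/api/2.1/jobs/run-now",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",  # personal access token
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_run_now_request("https://example.cloud.databricks.com", "dapi-EXAMPLE", 123)
# urllib.request.urlopen(req)  # would trigger the job run; requires a real workspace
```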
