Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

MoJaMa
by Databricks Employee
  • 1993 Views
  • 1 reply
  • 0 kudos
Latest Reply
MoJaMa
Databricks Employee
  • 0 kudos

Currently there is no concept of "Cluster Owner" (see https://docs.databricks.com/security/access-control/cluster-acl.html#cluster-level-permissions). So you have to clone the cluster, thus making the person who cloned it the creator of the new cluster. The...
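For reference, a minimal sketch of that clone via the Clusters REST API (GET /api/2.0/clusters/get, then POST /api/2.0/clusters/create); the workspace URL, token, cluster ID, and the field whitelist below are all illustrative placeholders, not from the original thread:

import requests

HOST = "https://<your-workspace>.cloud.databricks.com"  # hypothetical workspace URL
TOKEN = "<personal-access-token>"                       # hypothetical token
headers = {"Authorization": f"Bearer {TOKEN}"}

# Fetch the existing cluster's spec
spec = requests.get(f"{HOST}/api/2.0/clusters/get",
                    headers=headers,
                    params={"cluster_id": "<existing-cluster-id>"}).json()

# Keep only writable fields; read-only fields (cluster_id, state, ...) must be
# dropped before re-creating. This whitelist is illustrative, not exhaustive.
keep = {"cluster_name", "spark_version", "node_type_id",
        "num_workers", "autoscale", "spark_conf", "custom_tags"}
clone = {k: v for k, v in spec.items() if k in keep}
clone["cluster_name"] = spec["cluster_name"] + "-clone"

# Whoever issues this create call becomes the new cluster's creator
resp = requests.post(f"{HOST}/api/2.0/clusters/create",
                     headers=headers, json=clone)
print(resp.json())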

jose_gonzalez
by Databricks Employee
  • 1639 Views
  • 1 reply
  • 0 kudos

Resolved! How can I connect my favorite IDE, like PyCharm, to a Databricks cluster?

I would like to know if there is a way to connect to a Databricks cluster using my IDE.

Latest Reply
jose_gonzalez
Databricks Employee
  • 0 kudos

Databricks Connect allows you to connect your favorite IDE to Databricks clusters. You can find more details on how to set it up and install all the libraries at https://docs.databricks.com/dev-tools/databricks-connect.html
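For reference, a minimal sketch of the legacy Databricks Connect flow; the client version must match your cluster's Databricks Runtime, and the connection details are placeholders you supply when prompted:

# Install a client matching the cluster's runtime version (placeholder shown)
pip install -U "databricks-connect==<runtime-version>.*"

# Prompts for workspace URL, token, cluster ID, org ID, and port
databricks-connect configure
databricks-connect test

Then, from PyCharm (or any IDE), ordinary PySpark code runs against the remote cluster:

from pyspark.sql import SparkSession

# getOrCreate() picks up the databricks-connect configuration,
# so this count executes on the Databricks cluster
spark = SparkSession.builder.getOrCreate()
print(spark.range(10).count())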

aladda
by Databricks Employee
  • 2223 Views
  • 1 reply
  • 0 kudos
Latest Reply
aladda
Databricks Employee
  • 0 kudos

There are two places to leverage GitHub for content management and version control in Databricks. Repos for Git integration: Repos are folders whose contents are co-versioned together by syncing them to a remote Git repository. Repos can contain only Da...

User15787040559
by Databricks Employee
  • 1750 Views
  • 1 reply
  • 0 kudos

How to translate Apache Pig FOREACH GENERATE statement to Spark?

If you have the following Apache Pig FOREACH GENERATE statement:

XBCUD_Y_TMP1 = FOREACH (FILTER XBCUD BY act_ind == 'Y') GENERATE cust_hash_key, CONCAT(brd_abbr_cd, ctry_cd) as brdCtry:chararray, updt_dt_hash_key;

the equivalent code in Apache Spark is: XB...

Latest Reply
User15725630784
Databricks Employee
  • 0 kudos

the equivalent code in Apache Spark is:

from pyspark.sql.functions import col, concat

XBCUD_Y_TMP1_DF = (XBCUD_DF
    .filter(col("act_ind") == "Y")
    .select(col("cust_hash_key"),
            concat(col("brd_abbr_cd"), col("ctry_cd")).alias("brdCtry"),
            col("updt_dt_hash_key")))
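For context, a self-contained sketch that exercises the same translation end to end; the toy rows and session setup are illustrative, not from the original post:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, concat

spark = SparkSession.builder.getOrCreate()

# Toy data standing in for XBCUD (values are made up)
XBCUD_DF = spark.createDataFrame(
    [("h1", "BRD", "US", "Y", "d1"), ("h2", "BRD", "CA", "N", "d2")],
    ["cust_hash_key", "brd_abbr_cd", "ctry_cd", "act_ind", "updt_dt_hash_key"])

# Pig's FILTER ... BY act_ind == 'Y' maps to .filter(),
# and FOREACH ... GENERATE maps to .select()
XBCUD_Y_TMP1_DF = (XBCUD_DF
    .filter(col("act_ind") == "Y")
    .select(col("cust_hash_key"),
            concat(col("brd_abbr_cd"), col("ctry_cd")).alias("brdCtry"),
            col("updt_dt_hash_key")))

XBCUD_Y_TMP1_DF.show()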

User15787040559
by Databricks Employee
  • 2627 Views
  • 1 reply
  • 0 kudos

What timezone is the “timestamp” value in the Databricks Usage log?

What timezone is the “timestamp” value in the Databricks usage log? Is it UTC? Example: timestamp 2020-12-01T00:59:59.000Z. Need to match this to the AWS Cost Explorer timezone for simplicity. It's UTC. Please see timestamp under Audit Log Schema https://docs.databrick...

Latest Reply
User15725630784
Databricks Employee
  • 0 kudos

UTC
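If you need the usage timestamps in another timezone (e.g., to line up with AWS Cost Explorer reporting), a minimal PySpark sketch; the table name and target timezone here are illustrative:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_utc_timestamp

spark = SparkSession.builder.getOrCreate()

# "usage_log" is a hypothetical table holding the usage data
usage = spark.table("usage_log")

# Shift the UTC "timestamp" column into a local reporting timezone
usage = usage.withColumn(
    "timestamp_local",
    from_utc_timestamp(col("timestamp"), "America/Los_Angeles"))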

User16765131552
by Contributor III
  • 2863 Views
  • 1 reply
  • 1 kudos

Resolved! Create a new cluster in Databricks using databricks-cli

I'm trying to create a new cluster in Databricks on Azure using databricks-cli. I'm using the following command:

databricks clusters create --json '{ "cluster_name": "template2", "spark_version": "4.1.x-scala2.11" }'

And getting back this error: Error: ...

Latest Reply
User16765131552
Contributor III
  • 1 kudos

I found the right answer here. The correct format to run this command on Azure is:

databricks clusters create --json '{ "cluster_name": "my-cluster", "spark_version": "4.1.x-scala2.11", "node_type_id": "Standard_DS3_v2", "autoscale" : { "min_workers": ...
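Since the reply is cut off, here is a hedged reconstruction of the full command; the autoscale bounds are illustrative values, not necessarily the original poster's:

databricks clusters create --json '{
  "cluster_name": "my-cluster",
  "spark_version": "4.1.x-scala2.11",
  "node_type_id": "Standard_DS3_v2",
  "autoscale": {
    "min_workers": 2,
    "max_workers": 8
  }
}'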

User16830818524
by New Contributor II
  • 23162 Views
  • 1 reply
  • 0 kudos

Read Delta Table with Pandas

Is it possible to read a Delta table directly into a Pandas Dataframe?

Latest Reply
aladda
Databricks Employee
  • 0 kudos

You'd have to convert the Delta table to PyArrow and then use to_pandas. See https://databricks.com/blog/2020/12/22/natively-query-your-delta-lake-with-scala-java-and-python.html for details.

# Create a Pandas Dataframe by initially converting the Delta Lak...
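Following the linked blog, a minimal sketch using the delta-rs Python bindings (pip install deltalake); the table path is illustrative:

from deltalake import DeltaTable

# Load the Delta table's current snapshot (path is made up)
dt = DeltaTable("/tmp/delta/events")

# Convert to a PyArrow table, then to a Pandas DataFrame
pdf = dt.to_pyarrow_table().to_pandas()
print(pdf.head())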

User15787040559
by Databricks Employee
  • 1891 Views
  • 1 reply
  • 1 kudos

Why do we need the ec2:CreateTags and ec2:DeleteTags permissions?

Why do we need the ec2:CreateTags and ec2:DeleteTags permissions? Are they required? Are EC2 tags used internally as well?

Latest Reply
User15787040559
Databricks Employee
  • 1 kudos

Yes, they're required. It's how Databricks tracks and tags resources. The tags are used to identify the owner of clusters on the AWS side, and Databricks uses the tag information internally as well.
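For reference, the relevant statement in the cross-account IAM policy looks roughly like this; the resource scoping shown is illustrative, so check the official Databricks AWS docs for the exact policy:

{
  "Effect": "Allow",
  "Action": [
    "ec2:CreateTags",
    "ec2:DeleteTags"
  ],
  "Resource": [
    "arn:aws:ec2:*:*:instance/*",
    "arn:aws:ec2:*:*:volume/*"
  ]
}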

MoJaMa
by Databricks Employee
  • 2006 Views
  • 1 reply
  • 0 kudos
Latest Reply
MoJaMa
Databricks Employee
  • 0 kudos

Yes. We can convert an existing workspace to PrivateLink on E2. So you can have one workspace that's on PL and one that's not. Please contact your Databricks representative and we can help you make this change.

HowardWong
by New Contributor II
  • 784 Views
  • 0 replies
  • 0 kudos

How do you handle Kafka offsets in a DR scenario?

If a structured streaming job with a checkpoint fails in one region for whatever reason, DR kicks in to run a job in another region. What is the best way to pick up the offset and continue where the failed job stopped?

User16826994223
by Honored Contributor III
  • 1224 Views
  • 1 reply
  • 1 kudos

Does Databricks provide any isolation mechanisms when deployed in my account?

Does Databricks provide any isolation mechanisms when deployed in my account?

Latest Reply
Mooune_DBU
Valued Contributor
  • 1 kudos

If you're running on AWS: Databricks deploys Spark nodes in an Amazon Virtual Private Cloud (VPC) running in the customer’s own AWS account, giving the customer full control over their data and instances. VPCs enable customers to isolate the network ...

User16826994223
by Honored Contributor III
  • 1903 Views
  • 1 reply
  • 0 kudos

What is Photon in Databricks?

Hey, I am new to Databricks and heard of Photon, which is said to be the fastest engine developed by Databricks. Will it make queries faster? And what about query concurrency, will it increase?

Latest Reply
Mooune_DBU
Valued Contributor
  • 0 kudos

Photon is Databricks' brand-new native vectorized engine developed in C++ for improved query performance (speed and concurrency). It integrates directly with the Databricks Runtime and Spark, meaning no code changes are required to use Photon. At thi...
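On newer Clusters API versions, Photon is selected per cluster via the runtime_engine field; a hedged CLI sketch, where the cluster name, runtime version, and node type are illustrative:

databricks clusters create --json '{
  "cluster_name": "photon-demo",
  "spark_version": "13.3.x-scala2.12",
  "node_type_id": "i3.xlarge",
  "num_workers": 2,
  "runtime_engine": "PHOTON"
}'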

User16857281869
by New Contributor II
  • 1487 Views
  • 1 reply
  • 1 kudos

What are the best ways of developing a customer churn use case on Databricks?

In this blog, we implement a typical customer attrition model for subscription businesses, from data preparation to operationalisation of the model.

Latest Reply
Mooune_DBU
Valued Contributor
  • 1 kudos

Hello, have you read our solution accelerator for predicting customer churn? If you have further questions, please contact your Databricks liaison and we can walk you through the solution and how you can deploy it at scale.

