Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

User15787040559
by Databricks Employee
  • 1878 Views
  • 1 reply
  • 0 kudos

How to translate Apache Pig FOREACH GENERATE statement to Spark?

If you have the following Apache Pig FOREACH GENERATE statement:

XBCUD_Y_TMP1 = FOREACH (FILTER XBCUD BY act_ind == 'Y') GENERATE cust_hash_key, CONCAT(brd_abbr_cd, ctry_cd) as brdCtry:chararray, updt_dt_hash_key;

the equivalent code in Apache Spark is: XB...

Latest Reply
User15725630784
Databricks Employee
  • 0 kudos

the equivalent code in Apache Spark is:

XBCUD_Y_TMP1_DF = (XBCUD_DF
    .filter(col("act_ind") == "Y")
    .select(col("cust_hash_key"),
            concat(col("brd_abbr_cd"), col("ctry_cd")).alias("brdCtry"),
            col("updt_dt_hash_key"))
)

User15787040559
by Databricks Employee
  • 2870 Views
  • 1 reply
  • 0 kudos

What timezone is the “timestamp” value in the Databricks Usage log?

What timezone is the “timestamp” value in the Databricks Usage log? Is it UTC?

timestamp: 2020-12-01T00:59:59.000Z

Need to match this to the AWS Cost Explorer timezone for simplicity. It's UTC. Please see timestamp under Audit Log Schema: https://docs.databrick...

Latest Reply
User15725630784
Databricks Employee
  • 0 kudos

UTC
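Since the usage-log timestamp is in UTC, it can be parsed directly into a timezone-aware value; a minimal Python sketch using the sample timestamp from the question above:

```python
from datetime import datetime, timezone

# Sample timestamp from the usage log (ISO 8601; trailing "Z" means UTC)
raw = "2020-12-01T00:59:59.000Z"

# datetime.fromisoformat() before Python 3.11 does not accept "Z",
# so normalize it to the equivalent "+00:00" offset first.
ts = datetime.fromisoformat(raw.replace("Z", "+00:00"))

print(ts.tzinfo)  # UTC
print(ts == datetime(2020, 12, 1, 0, 59, 59, tzinfo=timezone.utc))  # True
```

From here the value can be converted with `ts.astimezone(...)` to whatever zone AWS Cost Explorer is configured to display.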

User16765131552
by Contributor III
  • 3039 Views
  • 1 reply
  • 1 kudos

Resolved! Create a new cluster in Databricks using databricks-cli

I'm trying to create a new cluster in Databricks on Azure using databricks-cli. I'm using the following command:

databricks clusters create --json '{ "cluster_name": "template2", "spark_version": "4.1.x-scala2.11" }'

And getting back this error: Error: ...

Latest Reply
User16765131552
Contributor III
  • 1 kudos

I found the right answer here. The correct format to run this command on Azure is:

databricks clusters create --json '{ "cluster_name": "my-cluster", "spark_version": "4.1.x-scala2.11", "node_type_id": "Standard_DS3_v2", "autoscale" : { "min_workers": ...
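Building on the answer above, a complete invocation might look like the following sketch. The node type and autoscale bounds are illustrative values only, not a recommendation; adjust them for your workspace:

```shell
# Hypothetical full command for Azure Databricks.
# The key point from the answer: on Azure, node_type_id and a worker
# specification (autoscale or num_workers) must be included in the JSON.
databricks clusters create --json '{
  "cluster_name": "my-cluster",
  "spark_version": "4.1.x-scala2.11",
  "node_type_id": "Standard_DS3_v2",
  "autoscale": {
    "min_workers": 2,
    "max_workers": 8
  }
}'
```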

User16830818524
by New Contributor III
  • 23699 Views
  • 1 reply
  • 1 kudos

Read Delta Table with Pandas

Is it possible to read a Delta table directly into a Pandas Dataframe?

Latest Reply
aladda
Databricks Employee
  • 1 kudos

You'd have to convert a Delta table to pyarrow and then use to_pandas. See https://databricks.com/blog/2020/12/22/natively-query-your-delta-lake-with-scala-java-and-python.html for details.

# Create a Pandas Dataframe by initially converting the Delta Lak...
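For a table already on disk, the delta-rs Python bindings offer a direct route along the same Delta → Arrow → pandas chain the reply describes. A minimal sketch, assuming the deltalake package is installed and with a placeholder table path:

```python
# Requires: pip install deltalake pandas
# delta-rs reads Delta tables natively, without a Spark cluster.
from deltalake import DeltaTable

# Placeholder path -- point this at a real Delta table.
dt = DeltaTable("/tmp/delta/events")

# to_pandas() converts the table to Arrow and then to a pandas DataFrame.
df = dt.to_pandas()
print(df.head())
```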

User15787040559
by Databricks Employee
  • 2027 Views
  • 1 reply
  • 1 kudos

Why do we need the ec2:CreateTags and ec2:DeleteTags permissions?

Why do we need the ec2:CreateTags and ec2:DeleteTags permissions? Are they required? Are EC2 tags used internally as well?

Latest Reply
User15787040559
Databricks Employee
  • 1 kudos

Yes, it’s required. It’s how Databricks tracks and tags resources. The tags are used to identify the owner of clusters on the AWS side, and Databricks uses the tag information internally as well.

MoJaMa
by Databricks Employee
  • 2131 Views
  • 1 reply
  • 0 kudos
Latest Reply
MoJaMa
Databricks Employee
  • 0 kudos

Yes. We can convert an existing workspace to PrivateLink on E2, so you can have one workspace that's on PL and one that's not. Please contact your Databricks representative and we can help you make this change.

HowardWong
by New Contributor II
  • 848 Views
  • 0 replies
  • 0 kudos

How do you handle Kafka offsets in a DR scenario?

If a structured streaming job with a checkpoint fails in one region for whatever reason, DR kicks in to run the job in another region. What is the best way to pick up the offset and continue where the failed job stopped?

User16826994223
by Honored Contributor III
  • 1293 Views
  • 1 reply
  • 1 kudos

Does Databricks provide any isolation mechanisms when deployed in my account?

Does Databricks provide any isolation mechanisms when deployed in my account?

Latest Reply
Mooune_DBU
Valued Contributor
  • 1 kudos

If you're running on AWS: Databricks deploys Spark nodes in an Amazon Virtual Private Cloud (VPC) running in the customer’s own AWS account, giving the customer full control over their data and instances. VPCs enable customers to isolate the network ...

User16826994223
by Honored Contributor III
  • 2051 Views
  • 1 reply
  • 0 kudos

What is Photon in Databricks?

Hey, I am new to Databricks and heard of Photon, the fastest engine developed by Databricks. Will it make queries faster? And will it increase query concurrency?

Latest Reply
Mooune_DBU
Valued Contributor
  • 0 kudos

Photon is Databricks' brand-new native vectorized engine developed in C++ for improved query performance (speed and concurrency). It integrates directly with the Databricks Runtime and Spark, meaning no code changes are required to use Photon. At thi...

User16857281869
by New Contributor II
  • 1579 Views
  • 1 reply
  • 1 kudos

What are the best ways of developing a customer churn usecase on databricks?

In this blog we implement a typical model for customer attrition in subscription models, from data preparation to operationalisation of the model.

Latest Reply
Mooune_DBU
Valued Contributor
  • 1 kudos

Hello, have you read our solution accelerator for predicting customer churn? If you have further questions, please contact your Databricks liaison and we can walk you through the solution and how you can deploy it at scale.

Srikanth_Gupta_
by Databricks Employee
  • 1706 Views
  • 1 reply
  • 0 kudos
Latest Reply
craig_ng
New Contributor III
  • 0 kudos

Delta Live Tables offers built-in data lineage between tables and views defined in a pipeline, which allows for easier monitoring and simplified recovery.

craig_ng
by New Contributor III
  • 3867 Views
  • 2 replies
  • 0 kudos
Latest Reply
Anonymous
Not applicable
  • 0 kudos

You can monitor user access to data and other resources using Databricks Audit Logs:
  • Diagnostic logging in Azure Databricks
  • Configure audit logging in AWS Databricks

Srikanth_Gupta_
by Databricks Employee
  • 2883 Views
  • 2 replies
  • 1 kudos

What are Best Practices for Spark streaming in Databricks

What are best practices for Spark streaming in Databricks?
  • Is it a good idea to consume multiple topics in one streaming job?
  • Is autoscaling recommended for Spark streaming?
  • How many worker nodes should we choose for a streaming job?
  • When should we run OPTIMIZE...

Latest Reply
craig_ng
New Contributor III
  • 1 kudos

See our docs for other considerations when deploying a production streaming job.

