Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

scholar
by New Contributor II
  • 2712 Views
  • 3 replies
  • 2 kudos

How to read data from a Kafka topic using Spark Streaming

I have installed kafka-2.10-0.10.2 and am using a cluster with this configuration: Runtime 6.4 Extended Support (Scala 2.11, Spark 2.4.5). After this I am able to get messages on the producer and consumer, but when I try to read data with spark.readStream and tr...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 2 kudos

You can just use display(orders_df3) for debugging purposes.

2 More Replies
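For anyone landing here with the same question, a minimal Structured Streaming read from Kafka looks roughly like this. A sketch only: the broker address and topic name are placeholders; the Kafka source ships with Databricks runtimes, while plain Spark needs the matching spark-sql-kafka package.

    from pyspark.sql.functions import col

    # Read the topic as a stream; broker and topic are placeholders.
    orders_df = (spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "broker-host:9092")
        .option("subscribe", "orders")
        .option("startingOffsets", "earliest")
        .load())

    # Kafka keys/values arrive as binary; cast to string before parsing.
    messages = orders_df.select(col("key").cast("string").alias("key"),
                                col("value").cast("string").alias("value"))

    # As the reply notes, display() is a convenient way to debug the stream.
    display(messages)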
palzor
by New Contributor III
  • 9207 Views
  • 4 replies
  • 4 kudos

Getting an error when using CDC in Delta Live Tables

Hi, I am trying to use CDC for a Delta Live Table, and when I run the pipeline a second time I get an error: org.apache.spark.sql.streaming.StreamingQueryException: Query tbl_cdc [id = ***-xx-xx-bf7e-6cb8b0deb690, runId = ***-xxxx-4031-ba74-b4b22be05...

Latest Reply
jose_gonzalez
Databricks Employee
  • 4 kudos

Hi @Palzor Lama, a streaming live table can only process append queries; that is, queries where new rows are inserted into the source table. Processing updates from source tables, for example merges and deletes, is not supported. To process updates,...

3 More Replies
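As the reply explains, CDC-style updates need APPLY CHANGES rather than a plain streaming live table. A minimal Python sketch, where the table names, key, and sequencing column are placeholders (on older DLT runtimes the target declaration was create_target_table):

    import dlt
    from pyspark.sql.functions import col

    # Declare the SCD Type 1 target table.
    dlt.create_streaming_table("tbl_cdc")

    # Apply inserts, updates, and deletes from the CDC feed into the target.
    dlt.apply_changes(
        target="tbl_cdc",
        source="cdc_source",          # placeholder: streaming source table/view
        keys=["id"],                  # placeholder primary key
        sequence_by=col("event_ts"),  # placeholder ordering column
        stored_as_scd_type=1,
    )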
JeromeB974
by New Contributor II
  • 6708 Views
  • 5 replies
  • 6 kudos

Can we use spark-xml with Delta Live Tables?

Hi, is there a way to use spark-xml with Delta Live Tables (Azure Databricks)? I've tried something like this without any success for the moment: CREATE LIVE TABLE df17 USING com.databricks.spark.xml AS SELECT * FROM cloud_files("/mnt/dev/bronze/xml/s432799...

Latest Reply
Zachary_Higgins
Contributor
  • 6 kudos

This is a tough one, since the only magic command available is %pip, but spark-xml is a Maven package. The only way I found to do this was to install the spark-xml jar from the Maven repo using the databricks-cli. You can reference the cluster ID usin...

4 More Replies
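Once the spark-xml jar is attached to the pipeline cluster (installed from Maven as described above), a Python DLT definition could look roughly like this. A sketch only: the rowTag and path are placeholders, and since cloud_files()/Auto Loader has no XML reader here, a batch read is used:

    import dlt

    @dlt.table(name="df17")
    def df17():
        # spark-xml must already be installed on the pipeline cluster.
        return (spark.read
            .format("com.databricks.spark.xml")
            .option("rowTag", "record")      # placeholder row tag
            .load("/mnt/dev/bronze/xml/"))   # placeholder path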
Taha_Hussain
by Databricks Employee
  • 1013 Views
  • 0 replies
  • 1 kudos

Databricks Office Hours

Register for Office Hours to participate in a live Q&A session with Databricks experts! Our next events are scheduled for June 8th & June 22nd from 8:00 am - 9:00 am PT. This is your opportunity to connect directly with our experts...

thaipham
by New Contributor III
  • 1960 Views
  • 3 replies
  • 4 kudos

Resolved! How would I export the latest revision of a notebook?

I've been trying to export some notebooks from my Databricks workspace to my laptop. I can't use Git Repos because the company has restricted access to external services from the control plane. However, it looks to me like I always export the previous re...

Latest Reply
-werners-
Esteemed Contributor III
  • 4 kudos

Too bad you are not allowed to use Repos; it can be a life saver. Can you mark your answer as the best answer so the question is marked as solved?

2 More Replies
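Without Repos, the Workspace API exports the current revision of a notebook directly. A minimal sketch using the REST endpoint, where the host, token, and paths are placeholders:

    import base64
    import requests

    HOST = "https://<your-workspace-host>"   # placeholder
    TOKEN = "<personal-access-token>"        # placeholder

    resp = requests.get(
        f"{HOST}/api/2.0/workspace/export",
        headers={"Authorization": f"Bearer {TOKEN}"},
        params={"path": "/Users/me@example.com/my_notebook", "format": "SOURCE"},
    )
    resp.raise_for_status()

    # The API returns the notebook body base64-encoded in the "content" field.
    with open("my_notebook.py", "wb") as f:
        f.write(base64.b64decode(resp.json()["content"]))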
Ruby8376
by Valued Contributor
  • 2018 Views
  • 2 replies
  • 0 kudos

Primary/Foreign Key Constraints on Delta Tables?

Hi All! I am using Databricks in a data migration project. We need to transform the data before loading it into Salesforce. Can we define primary key/foreign key constraints on Databricks Delta tables?

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Ruby Rubi, following up: did you get a chance to check @Werner Stinckens's previous comments, or do you need any further help on this?

1 More Replies
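For what it's worth, Delta tables managed by Unity Catalog do accept primary and foreign key constraints, but they are informational only and not enforced. A sketch, assuming Unity Catalog tables with NOT NULL key columns (all names are placeholders):

    # Informational only: Databricks records but does not enforce PK/FK.
    spark.sql("ALTER TABLE main.sales.customers "
              "ADD CONSTRAINT customers_pk PRIMARY KEY (customer_id)")

    spark.sql("ALTER TABLE main.sales.orders "
              "ADD CONSTRAINT orders_customer_fk FOREIGN KEY (customer_id) "
              "REFERENCES main.sales.customers (customer_id)")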
laurencewells
by New Contributor III
  • 3504 Views
  • 3 replies
  • 1 kudos

Resolved! Log4J Custom Filter Not Working

Hi All, hoping you can help. I am looking to set up a custom logging process that captures application ETL logs and streaming logs. I have set up multiple custom logging appenders using the guide here: https://kb.databricks.com/clusters/overwrite-log4...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hey there @Laurence Wells! Hope you are doing great. Does @Kaniz Fatma's response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly? Thanks!

2 More Replies
lizou
by Contributor II
  • 4206 Views
  • 1 reply
  • 1 kudos

Never use the float data type

SELECT float('92233464567.33') returns 92,233,466,000. I expected the result to be around 92,233,464,567.xx; therefore, the float data type should be avoided. Using double or decimal works as expected. But I see the float data type widely used, assuming most num...

Latest Reply
Prabakar
Databricks Employee
  • 1 kudos

Float is an approximate-number data type, which means that not all values in the data type range can be represented exactly. Decimal/Numeric is a fixed-precision data type, which means that all the values in the data type range can be represented exactly w...

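The precision loss is easy to reproduce: a float is 32-bit and keeps only about 7 significant decimal digits, while double and decimal hold the value above exactly. A quick sketch:

    # float rounds to the nearest representable 32-bit value (~9.2233466E10);
    # double and decimal return 92233464567.33 as expected.
    spark.sql("""
        SELECT CAST('92233464567.33' AS FLOAT)          AS as_float,
               CAST('92233464567.33' AS DOUBLE)         AS as_double,
               CAST('92233464567.33' AS DECIMAL(20, 2)) AS as_decimal
    """).show(truncate=False)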
Krish-685291
by New Contributor III
  • 2248 Views
  • 6 replies
  • 2 kudos

Can I merge a Delta Lake table into an RDBMS table directly? What is the preferred way in Databricks?

Hi, I am dealing with updating master data. I'll do the UPSERT operations on the Delta Lake table, but after my UPSERT is complete I'd like to update the master data in the RDBMS table as well. Is there any support from Databricks to perform this operation...

Latest Reply
-werners-
Esteemed Contributor III
  • 2 kudos

I get your point and concerns. If there are plans in that direction, it will have to be a joint effort between Databricks and the DB vendor.

5 More Replies
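There is no built-in way to MERGE from Delta into an external RDBMS, so the usual workaround is to push the changed rows to a staging table over JDBC and run the merge on the database side. A rough sketch, where the connection details, table names, and change filter are placeholders:

    # 1) After the Delta upsert completes, pick up the rows that changed.
    changes = (spark.read.table("master_data")
               .filter("last_updated >= current_date()"))  # placeholder filter

    # 2) Push them to a staging table in the RDBMS over JDBC ...
    (changes.write
        .format("jdbc")
        .option("url", "jdbc:postgresql://<host>:5432/<db>")  # placeholder
        .option("dbtable", "master_data_staging")
        .option("user", "<user>")
        .option("password", "<password>")
        .mode("append")
        .save())

    # 3) ... then merge staging into the master table on the database itself
    #    (stored procedure, trigger, or scheduled SQL).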
BenBauer
by New Contributor III
  • 853 Views
  • 0 replies
  • 4 kudos

How to prevent creation of __apply_changes_* tables during the DLT create_target_table process

Hey, we are using DLT along with SCD Type 1 via the create_target_table function. It does not actually create the table as defined, but rather a view. However, on top of the expected table we see system-generated tables, e.g. __apply_changes_*. Is there a w...

naveenmamidala
by New Contributor II
  • 21603 Views
  • 1 reply
  • 1 kudos

Error: ConnectionError: HTTPSConnectionPool(host='https', port=443): Max retries exceeded with url: /api/2.0/workspace/list?path=%2F (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 11001] getaddrinfo failed'))

Error: ConnectionError: HTTPSConnectionPool(host='https', port=443): Max retries exceeded with url: /api/2.0/workspace/list?path=%2F (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x000001CAF52B4640>: Failed to establis...

Latest Reply
Sajith
New Contributor II
  • 1 kudos

Set the HTTPS proxy server in the CLI and it started working without any error: set HTTPS_PROXY=http://username:password@{proxy host}:{port}

rakeshdey
by New Contributor II
  • 1825 Views
  • 0 replies
  • 1 kudos

Why is providing a list of filenames to spark.read.csv([file1, file2, file3]) much faster than providing a directory with a wildcard, spark.read.csv("/path/*")?

I have a huge number of small files in S3, and I was going through a few blogs where people say that providing a list of files, like spark.read.csv([file1, file2, file3]), is faster than giving a directory with a wildcard. Reason: Spark actually does fi...

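The usual explanation: with a wildcard, Spark must first list every object under the prefix (paginated S3 LIST calls, which are slow for huge numbers of small files), while an explicit list skips the listing step entirely. A sketch with hypothetical paths:

    # Explicit list: Spark goes straight to the named objects, no listing.
    files = [f"s3://my-bucket/path/part-{i:05d}.csv" for i in range(3)]
    df_list = spark.read.csv(files, header=True)

    # Wildcard: Spark enumerates everything under the prefix before reading.
    df_glob = spark.read.csv("s3://my-bucket/path/*.csv", header=True)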
Sunny
by New Contributor III
  • 1065 Views
  • 1 reply
  • 0 kudos

Update task status from external application

I have a workflow with a task that is dependent on an external application's execution (not residing in Databricks). After the external application finishes, how can I update the status of the task to complete? Currently, the Jobs API doesn't support status updat...

Latest Reply
Sunny
New Contributor III
  • 0 kudos

Any inputs on this one, please?

GC-James
by Contributor II
  • 11180 Views
  • 8 replies
  • 10 kudos

RserveException: eval failed

Sometimes when I am running R code in a Databricks notebook I am given this error. The cell I am running fails, and my whole R 'session' seems to get screwed up. For example, my stored variables disappear and I have to reload my packages, etc. It is ...

Latest Reply
data_warrior
New Contributor III
  • 10 kudos

The error file is attached here.

7 More Replies
Braxx
by Contributor II
  • 4242 Views
  • 2 replies
  • 1 kudos

Resolved! delta table storage

I couldn't find it clearly explained anywhere, so I hope somebody here can shed some light on it. A few questions: 1) Where are Delta tables stored? The docs say: "Delta Lake uses versioned Parquet files to store your data in your cloud storage." So where exactly i...

Latest Reply
Braxx
Contributor II
  • 1 kudos

thanks, very helpful

1 More Replies
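For readers with the same question: a Delta table is a directory of Parquet data files plus a _delta_log/ folder of JSON commit files in your cloud storage (DBFS for managed tables). A quick sketch for inspecting this yourself, with a placeholder table name and path:

    # Where does the table live?
    spark.sql("DESCRIBE DETAIL my_delta_table") \
         .select("location").show(truncate=False)

    # Inside that location: Parquet data files plus the _delta_log/ directory
    # of JSON commit files that version the table.
    path = "dbfs:/user/hive/warehouse/my_delta_table"
    display(dbutils.fs.ls(path))
    display(dbutils.fs.ls(path + "/_delta_log"))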
