Data Engineering

Forum Posts

by User16869510359 • Esteemed Contributor

06-25-2021 3:46:28 PM

687 Views
1 replies
0 kudos

What are the advantages of using RocksDB State store compared to HDFS backed state store

Latest Reply

aladda
Honored Contributor II

06-25-2021 4:01:35 PM

0 kudos

Can you provide some additional details on this? What components are we comparing the states for?

by User16869510359 • Esteemed Contributor

06-25-2021 3:57:59 PM

1120 Views
1 replies
0 kudos

What to do if I accidentally deleted Delta log directory

Latest Reply

aladda
Honored Contributor II

06-25-2021 4:00:10 PM

0 kudos

Deleting the Delta log directory causes you to lose the transaction history of the Delta table, along with other Delta-related optimizations. In effect, the table becomes a plain Parquet table at that point.
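To make the reply above concrete: a Delta table on disk is a set of Parquet data files plus a `_delta_log` subdirectory of commit files, so once that subdirectory is gone only the Parquet files are left. The following is a minimal sketch, assuming the table lives on a local or mounted filesystem path. The helper name `describe_table_dir` is hypothetical and is not part of any Databricks or Delta Lake API.

```python
import re
from pathlib import Path

# Delta commit files use a zero-padded 20-digit version number,
# for example 00000000000000000000.json
COMMIT_FILE = re.compile(r"^\d{20}\.json$")


def describe_table_dir(table_path):
    """Hypothetical helper: report what is left in a table directory."""
    root = Path(table_path)
    log_dir = root / "_delta_log"

    commits = []
    if log_dir.is_dir():
        commits = [p.name for p in log_dir.iterdir() if COMMIT_FILE.match(p.name)]

    # Count data files only; skip anything stored under _delta_log
    # (checkpoints there are also Parquet files).
    parquet_files = [
        p for p in root.rglob("*.parquet")
        if "_delta_log" not in p.relative_to(root).parts
    ]

    return {
        "has_delta_log": bool(commits),
        "commit_count": len(commits),
        "parquet_file_count": len(parquet_files),
    }
```

If `has_delta_log` comes back `False` while `parquet_file_count` is non-zero, the directory is in the state the reply describes: data files without any transaction history.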

by User16790091296 • Contributor II

06-25-2021 3:23:42 PM

760 Views
2 replies
0 kudos

Cluster Policies: Are there any examples around implementations of cluster policies?

Latest Reply

Taha
New Contributor III

06-25-2021 3:59:25 PM

0 kudos

Also, a lot of examples here: https://docs.databricks.com/administration-guide/clusters/policies.html#cluster-policy-examples
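For a rough idea of what those examples look like: a cluster policy definition is a JSON document that maps cluster attributes to constraints such as `fixed`, `range`, or `allowlist`. The sketch below builds one in Python and serializes it. The specific attribute values (node types, limits) are illustrative assumptions rather than recommendations, and `check_definition` is a hypothetical local sanity check, not a Databricks API.

```python
import json

# Constraint types as described in the cluster policy documentation linked above.
KNOWN_TYPES = {"fixed", "forbidden", "allowlist", "blocklist", "regex", "range", "unlimited"}

# Illustrative policy definition; all values here are assumptions.
policy_definition = {
    "autotermination_minutes": {"type": "range", "minValue": 10, "maxValue": 60, "defaultValue": 30},
    "node_type_id": {"type": "allowlist", "values": ["i3.xlarge", "i3.2xlarge"], "defaultValue": "i3.xlarge"},
    "num_workers": {"type": "range", "maxValue": 10},
}


def check_definition(definition):
    """Hypothetical helper: return a list of problems found in a policy definition."""
    problems = []
    for attribute, constraint in definition.items():
        kind = constraint.get("type")
        if kind not in KNOWN_TYPES:
            problems.append(f"{attribute}: unknown type {kind!r}")
        elif kind == "range" and not ({"minValue", "maxValue"} & constraint.keys()):
            problems.append(f"{attribute}: range needs minValue or maxValue")
        elif kind in {"allowlist", "blocklist"} and not constraint.get("values"):
            problems.append(f"{attribute}: {kind} needs a non-empty values list")
    return problems


# JSON text of the kind that would go into the policy definition field.
policy_json = json.dumps(policy_definition, indent=2)
```

Here `check_definition(policy_definition)` returns an empty list, and `policy_json` holds the serialized definition.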

1 More Replies

by User16869510359 • Esteemed Contributor

06-25-2021 3:33:57 PM

811 Views
1 replies
0 kudos

What is the advantage of using the new Global init script compared to the legacy init script

Latest Reply

aladda
Honored Contributor II

06-25-2021 3:57:44 PM

0 kudos

Global init scripts run on every cluster in the workspace. They can help you enforce consistent cluster configurations across your workspace. Use them carefully because they can have unanticipated effects, such as library conflicts. Only admin users can create ...

by User16869510359 • Esteemed Contributor

06-25-2021 3:57:12 PM

993 Views
0 replies
1 kudos

Why do I see HangingTaskDetector message in my executor logs

by MoJaMa • Valued Contributor II

06-25-2021 3:56:09 PM

770 Views
1 replies
0 kudos

Does Databricks support VPC migration for an E2 workspace for both Customer Managed VPC and Databricks Managed VPC?

Latest Reply

MoJaMa
Valued Contributor II

06-25-2021 3:56:59 PM

0 kudos

Yes. Currently supported paths are:
BYO VPC -> New BYO VPC
Databricks-created VPC -> New Databricks-created VPC

by User16869510359 • Esteemed Contributor

06-25-2021 3:48:30 PM

819 Views
1 replies
0 kudos

Can I use a custom log4j file for the Databricks cluster

Latest Reply

aladda
Honored Contributor II

06-25-2021 3:56:32 PM

0 kudos

See this knowledge base article for details - https://kb.databricks.com/clusters/overwrite-log4j-logs.html

by User16869510359 • Esteemed Contributor

06-25-2021 3:50:46 PM

631 Views
1 replies
1 kudos

Can I run a Python 2 PySpark application in my Databricks GCP workspace

Latest Reply

aladda
Honored Contributor II

06-25-2021 3:55:46 PM

1 kudos

For Databricks Runtime 5.5 LTS, Spark jobs, Python notebook cells, and library installation all support both Python 2 and 3. The default Python version for clusters created using the UI is Python 3. In Databricks Runtime 5.5 LTS the default version fo...

by User16869510359 • Esteemed Contributor

06-25-2021 3:54:52 PM

411 Views
0 replies
0 kudos

What are real-time audit logs? How are they different from the usual audit logs?

by MoJaMa • Valued Contributor II

06-25-2021 3:54:16 PM

463 Views
0 replies
0 kudos

Regarding DBFS encryption using CMK: Is there a way to patch existing workspaces so DBFS can be encrypted using a customer-managed key? Or is this only for new workspaces?

by User16869510359 • Esteemed Contributor

06-25-2021 3:54:10 PM

907 Views
0 replies
0 kudos

What are closures and serialization in Spark? How do I avoid the "Task not serializable" error?

by User16869510359 • Esteemed Contributor

06-25-2021 3:53:26 PM

634 Views
0 replies
0 kudos

Is there a limit on the output generated in the stdout from a job cluster

by User16869509994 • New Contributor II

06-24-2021 7:21:43 AM

730 Views
1 replies
1 kudos

In the Databricks UI, /Workspace and /Repos are at the same level, so why do we need to give the path as /Workspace/Repos... when reading a CSV file from a notebook in Repos?

Latest Reply

aladda
Honored Contributor II

06-25-2021 3:52:43 PM

1 kudos

Can you provide an example of what exactly you mean? If the reference is to how "Repos" shows up in the UI, that's more of a UX convenience. Repos as such are designed to be a container for version-controlled notebooks that live in the Git reposi...

by User16869510359 • Esteemed Contributor

06-25-2021 3:51:24 PM

505 Views
0 replies
0 kudos

Is it safe to set ignoreMissingFiles to true on a Streaming workload

by User16790091296 • Contributor II

06-25-2021 3:38:16 PM

1026 Views
2 replies
0 kudos

What is the difference between Delta Lake (Databricks) and Delta Lake (open source, Maven)?

Latest Reply

aladda
Honored Contributor II

06-25-2021 3:51:16 PM

0 kudos

Delta Lake on Databricks adds the runtime optimizations of the Delta Engine, which further enhance the performance and scale of the open source Delta format. In addition, you also get access to a whole host of capabilities available on the Databricks...

1 More Replies
