Data Engineering

Forum Posts

by User16826994223 • Honored Contributor III

06-18-2021 4:00:26 AM

1051 Views
1 reply
0 kudos

What is databricks-sync?

I am trying to migrate my workload to another workspace (from ST to E2). I am planning to use databricks-sync, but I am still not sure: will it migrate everything, like currents, users, groups, jobs, notebooks, etc., or does it have some limitations which I s...

Latest Reply

sajith_appukutt
Honored Contributor II

06-22-2021 5:55:11 PM

0 kudos

Here is the support matrix for import/export operations for databricks-sync. Also check out https://github.com/databrickslabs/migrate

by User16826994223 • Honored Contributor III

06-21-2021 5:57:04 AM

858 Views
1 reply
0 kudos

How do we manage data recency in Databricks?

I want to know how Databricks maintains data recency.

Latest Reply

sajith_appukutt
Honored Contributor II

06-22-2021 5:43:42 PM

0 kudos

When using Delta tables in Databricks, you have the advantage of the Delta cache, which accelerates data reads by creating copies of remote files in nodes’ local storage using a fast intermediate data format. At the beginning of each query, Delta tables au...

by MoJaMa • Valued Contributor II

06-22-2021 5:26:58 PM

859 Views
1 reply
0 kudos

Since Databricks manages the runtime on SQL Endpoints, how do I know which version I'm on?

Latest Reply

MoJaMa
Valued Contributor II

06-22-2021 5:28:39 PM

0 kudos

1. Start an endpoint.
2. Run a query.
3. Go to Query History.
4. Click Details, then go to the Environment tab.
5. Search for sparkVersion.

by User16826994223 • Honored Contributor III

06-22-2021 2:25:35 AM

1038 Views
1 reply
0 kudos

Why is NPIP optional and not mandatory?

NPIP is more secure because the network traffic travels through the Microsoft backbone network, so why is it optional? It seems it should be mandatory. Is there some limitation, or a case where we may not be able to use NPIP?

Latest Reply

sajith_appukutt
Honored Contributor II

06-22-2021 5:26:34 PM

0 kudos

NPIP / secure cluster connectivity requires a NAT gateway (or similar appliance) for outbound traffic from your workspace’s subnets to the Azure backbone and public network. This incurs a small additional cost. Also, it is worth mentioning that ne...

by MoJaMa • Valued Contributor II

06-22-2021 5:22:11 PM

837 Views
1 reply
0 kudos

Databricks on GCP. How many partitions of local SSD does Databricks need per VM?

Latest Reply

MoJaMa
Valued Contributor II

06-22-2021 5:24:10 PM

0 kudos

Each local disk is 375 GB. So, for example, for n2-standard-4, it is 2 local disks (0.75 TB / 2).
https://databricks.com/wp-content/uploads/2021/05/GCP-Pricing-Estimator-v2.pdf?_ga=2.241263109.66068867.1623086616-828667513.1602536526
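
The arithmetic in the reply above can be sketched in Scala as follows. This is only an illustration: the object and method names are made up for this example, and the 375 GB disk size and the 0.75 TB total for n2-standard-4 are taken from the reply, not read from any API.

```scala
object LocalSsdEstimate {
  // Size of a single GCP local SSD disk in GB, as stated in the reply.
  val LocalDiskGb: Int = 375

  // Number of local disks implied by a total local SSD capacity (in GB).
  // Assumes the total is a positive whole multiple of the per-disk size.
  def diskCount(totalLocalSsdGb: Int): Int = {
    require(
      totalLocalSsdGb > 0 && totalLocalSsdGb % LocalDiskGb == 0,
      s"total must be a positive multiple of $LocalDiskGb GB"
    )
    totalLocalSsdGb / LocalDiskGb
  }

  def main(args: Array[String]): Unit = {
    // n2-standard-4 example from the reply: 0.75 TB (750 GB) in total.
    println(diskCount(750)) // prints 2
  }
}
```

For other machine types, the total local SSD capacity would have to come from the pricing estimator linked in the reply.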

by MoJaMa • Valued Contributor II

06-22-2021 5:20:18 PM

635 Views
1 reply
0 kudos

Databricks on GCP. For the persistent storage attached to each node, what specific type does Databricks use?

Latest Reply

MoJaMa
Valued Contributor II

06-22-2021 5:20:50 PM

0 kudos

They are Zonal SSD Persistent Disks.
https://cloud.google.com/compute/docs/disks#introduction

by User16826994223 • Honored Contributor III

06-22-2021 4:53:45 AM

1144 Views
2 replies
0 kudos

Don't want checkpoints in Delta

Suppose I am not interested in checkpoints. How can I disable checkpoint writes in Delta?

Latest Reply

sajith_appukutt
Honored Contributor II

06-22-2021 5:13:57 PM

0 kudos

Writing statistics in a checkpoint has a cost that is usually visible only for very large tables. However, it is worth mentioning that these statistics are very useful for data skipping, which speeds up subsequent operations. In Databricks Runti...

1 More Replies

by Digan_Parikh • Valued Contributor

06-22-2021 4:50:40 PM

1079 Views
1 reply
0 kudos

Resolved! Delta Live Table - landing database?

Where do you specify what database the DLT tables land in?

Latest Reply

Digan_Parikh
Valued Contributor

06-22-2021 4:53:02 PM

0 kudos

The target key, set when creating the pipeline, specifies the database that the tables get published to. Documented here: https://docs.databricks.com/data-engineering/delta-live-tables/delta-live-tables-user-guide.html#publish-tables
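
As a rough illustration of the reply above, the Scala sketch below renders a minimal pipeline settings fragment in which target names the database. Only the target key comes from the thread. The helper, the pipeline name, and the database name are hypothetical, and a real pipeline needs further settings (such as its notebook libraries) that are omitted here.

```scala
object DltPipelineSettings {
  // Hypothetical helper: renders a partial settings fragment as JSON text.
  // Values are inserted as-is, so they must not contain quotes or backslashes.
  def settingsJson(pipelineName: String, targetDatabase: String): String =
    s"""{
       |  "name": "$pipelineName",
       |  "target": "$targetDatabase"
       |}""".stripMargin

  def main(args: Array[String]): Unit = {
    // Placeholder names; tables would be published to the "sales_dlt" database.
    println(settingsJson("example_pipeline", "sales_dlt"))
  }
}
```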

by Anonymous • Not applicable

06-22-2021 11:38:15 AM

1531 Views
1 reply
0 kudos

Resolved! Questions on using Docker image with Databricks Container Service

Specifically, we have in mind:
* Create a Databricks job for testing API changes (the API library is built in a custom Jar file)
* When we want to test an API change, build a Docker image with the relevant changes in a Jar file
* Update the job configur...

Latest Reply

sajith_appukutt
Honored Contributor II

06-22-2021 4:32:11 PM

0 kudos

> Where do we put custom Jar files when building the Docker image?
/databricks/jars
> How do we update the job configuration so that the job’s cluster will be built with this new Docker image, and how long do we expect this re-configuring process to tak...

by brickster_2018 • Esteemed Contributor

06-22-2021 4:25:48 PM

5987 Views
1 reply
2 kudos

Resolved! How to find the Databricks Platform version

Latest Reply

brickster_2018
Esteemed Contributor

06-22-2021 4:27:06 PM

2 kudos

Use the following endpoint on your workspace: https://your-workspace-name.cloud.databricks.com/version
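
A minimal Scala sketch of calling that endpoint is shown below, assuming the workspace host follows the pattern in the reply. The object and method names are made up for this example. The thread does not say whether the endpoint requires an authenticated session, so the fetch is wrapped in Try and a failure is returned rather than thrown.

```scala
import scala.io.Source
import scala.util.Try

object PlatformVersion {
  // Builds the version URL for a workspace host such as
  // "your-workspace-name.cloud.databricks.com" (placeholder from the reply).
  def versionUrl(workspaceHost: String): String =
    s"https://${workspaceHost.stripSuffix("/")}/version"

  // Fetches the response body as text and closes the connection afterwards.
  // Network or authentication problems surface as a Failure.
  def fetchVersion(workspaceHost: String): Try[String] = Try {
    val source = Source.fromURL(versionUrl(workspaceHost))
    try source.mkString.trim
    finally source.close()
  }
}
```

For example, PlatformVersion.fetchVersion("your-workspace-name.cloud.databricks.com") returns a Success holding the response text when the request goes through.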

by brickster_2018 • Esteemed Contributor

06-22-2021 4:16:50 PM

1575 Views
1 reply
0 kudos

Resolved! Z-order or Partitioning? Which is better for Data skipping?

For Delta tables, which is the recommended technique for efficient data skipping: Z-ordering or partitioning?

Latest Reply

brickster_2018
Esteemed Contributor

06-22-2021 4:19:13 PM

0 kudos

Partition pruning is the most efficient way to ensure data skipping. However, choosing the right column for partitioning is very important. It is common to see the wrong partitioning column cause a large number of small file problems ...

by Srikanth_Gupta_ • Valued Contributor

06-22-2021 7:56:54 AM

1114 Views
2 replies
0 kudos

I have several thousand Delta tables in production; what is the best way to get counts

if I need a dashboard to see the increase in the number of rows on a day-to-day basis, and also a dashboard that shows the size of the Parquet/Delta files in my lake?

Latest Reply

brickster_2018
Esteemed Contributor

06-22-2021 3:53:13 PM

0 kudos

val db = "database_name"
spark.sessionState.catalog.listTables(db)
  .map(table => spark.sessionState.catalog.externalCatalog.getTable(table.database.get, table.table))
  .filter(x => x.provider.toString().toLowerCase.contains("delta"))

The above code snippet wi...

1 More Replies

by User16826992666 • Valued Contributor

06-22-2021 8:24:22 AM

3295 Views
2 replies
0 kudos

Resolved! Can I reset the checkpoint of a streaming job if I want to do a full reload of a table?

Latest Reply

sajith_appukutt
Honored Contributor II

06-22-2021 3:44:42 PM

0 kudos

If the read stream definition has something similar to

val df = spark
  .readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "host1:port1,host2:port2")
  .option("subscribePattern", "topic.*")
  .option("startingOffsets", "earliest")

resettin...

1 More Replies

by Anonymous • Not applicable

06-21-2021 2:46:41 PM

1144 Views
2 replies
0 kudos

Changing default Delta behavior in DBR 8.x for writes

Is there any way to add a Spark config that reverts the default format for table writes from Delta to Parquet in DBR 8.0+? I know you can simply specify .format("parquet"), but that could involve a decent amount of code change for some client...

Latest Reply

Anonymous
Not applicable

06-22-2021 3:26:30 PM

0 kudos

Thanks @Ryan Chynoweth !

1 More Replies

by User15761966159 • New Contributor

06-22-2021 12:17:19 PM

800 Views
1 reply
0 kudos

Does removing a user from the workspace automatically invalidate their tokens?

If you have a user that is removed from the workspace, are the tokens they've created automatically invalidated?

Latest Reply

Ryan_Chynoweth
Honored Contributor III

06-22-2021 3:13:55 PM

0 kudos

Yes, PAT tokens will be invalid if a user is removed since those tokens are attached to their current credentials and access.
