Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Anonymous
by Not applicable
  • 2670 Views
  • 1 replies
  • 0 kudos
Latest Reply
aladda
Databricks Employee
  • 0 kudos

You can use libraries such as Seaborn, Bokeh, Matplotlib, and Plotly for visualization inside Python notebooks. See https://docs.databricks.com/notebooks/visualizations/index.html#visualizations-in-python. Also, Databricks has its own built-in visualiza...
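As a minimal sketch of the Matplotlib route mentioned above (the data and labels are illustrative; in a Databricks notebook you would end the cell with the figure or pass it to display):

```python
# Sketch: rendering a chart with Matplotlib in a Python notebook cell.
# The Agg backend makes this runnable outside a notebook as well.
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3, 4], [10, 20, 25, 30], marker="o")  # sample data, not real metrics
ax.set_xlabel("day")
ax.set_ylabel("rows ingested")
ax.set_title("Sample trend")
```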

aladda
by Databricks Employee
  • 11708 Views
  • 2 replies
  • 1 kudos
Latest Reply
aladda
Databricks Employee
  • 1 kudos

Thanks @Digan Parikh. Credit to Tahir Fayyaz. Found a couple of different paths depending on whether you're looking to bring in raw GA data vs. aggregated GA data. 1) For raw data, you can bring in data from GA Universal Analytics 360 Paid version or GA ...

1 More Replies
User16826994223
by Databricks Employee
  • 2022 Views
  • 1 replies
  • 0 kudos

What is Databricks Sync?

I am trying to migrate my workload to another workspace (from ST to E2). I am planning to use Databricks Sync, but I am still not sure whether it will migrate everything (clusters, users, groups, jobs, notebooks, etc.) or has some limitations which I s...

Latest Reply
sajith_appukutt
Databricks Employee
  • 0 kudos

Here is the support matrix for import/export operations for databricks-sync. Also check out https://github.com/databrickslabs/migrate

User16826994223
by Databricks Employee
  • 1800 Views
  • 1 replies
  • 0 kudos

How do we manage data recency in Databricks

I want to know how Databricks maintains data recency.

Latest Reply
sajith_appukutt
Databricks Employee
  • 0 kudos

When using Delta tables in Databricks, you have the advantage of the Delta cache, which accelerates data reads by creating copies of remote files in the nodes' local storage using a fast intermediate data format. At the beginning of each query, Delta tables au...
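As a config sketch of the cache mentioned above (this assumes a Databricks cluster where `spark` is the active SparkSession; `spark.databricks.io.cache.enabled` is the documented disk-cache setting):

```python
# Config fragment: enable the disk (Delta) cache for the current session.
spark.conf.set("spark.databricks.io.cache.enabled", "true")
```

Note that some instance types enable the cache by default; setting it explicitly is only needed when you want to override the cluster default.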

User16826994223
by Databricks Employee
  • 2028 Views
  • 1 replies
  • 0 kudos

Why is NPIP optional and not mandatory?

Even though NPIP is more secure because the network traffic travels through the Microsoft backbone network, why is it optional? It should be mandatory. Is there some limitation or a case where we may not be able to use NPIP?

Latest Reply
sajith_appukutt
Databricks Employee
  • 0 kudos

NPIP / secure cluster connectivity requires a NAT gateway (or similar appliance) for outbound traffic from your workspace's subnets to the Azure backbone and public network. This incurs a small additional cost. Also, it is worth mentioning that ne...

MoJaMa
by Databricks Employee
  • 1573 Views
  • 1 replies
  • 0 kudos
Latest Reply
MoJaMa
Databricks Employee
  • 0 kudos

Each local disk is 375 GB. So, for example, for n2-standard-4, it is 2 local disks (0.75 TB / 2). https://databricks.com/wp-content/uploads/2021/05/GCP-Pricing-Estimator-v2.pdf
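The sizing above works out as follows (the helper function is hypothetical, just a worked example of the arithmetic):

```python
# Each GCP local SSD is 375 GB; total local storage scales with the disk count.
LOCAL_SSD_GB = 375

def local_storage_gb(num_local_disks: int) -> int:
    """Total local SSD capacity for a node with the given number of local disks."""
    return num_local_disks * LOCAL_SSD_GB

# n2-standard-4 with 2 local disks -> 750 GB, i.e. 0.75 TB
print(local_storage_gb(2))
```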

User16826994223
by Databricks Employee
  • 2449 Views
  • 2 replies
  • 0 kudos

Don't want checkpoint in delta

Suppose I am not interested in checkpoints. How can I disable checkpoint writes in Delta?

Latest Reply
sajith_appukutt
Databricks Employee
  • 0 kudos

Writing statistics in a checkpoint has a cost, which is usually visible only for very large tables. However, it is worth mentioning that these statistics are very useful for data skipping, which speeds up subsequent operations. In Databricks Runti...
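One documented way to bound the statistics cost rather than lose data skipping entirely is to limit how many leading columns Delta collects statistics for (the table name here is illustrative; `delta.dataSkippingNumIndexedCols` is a real Delta table property):

```sql
-- Config fragment: collect stats only on the first 5 columns of the table.
ALTER TABLE events SET TBLPROPERTIES ('delta.dataSkippingNumIndexedCols' = '5');
```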

1 More Replies
Digan_Parikh
by Databricks Employee
  • 2233 Views
  • 1 replies
  • 0 kudos

Resolved! Delta Live Table - landing database?

Where do you specify what database the DLT tables land in?

Latest Reply
Digan_Parikh
Databricks Employee
  • 0 kudos

The target key, when creating the pipeline, specifies the database that the tables get published to. Documented here: https://docs.databricks.com/data-engineering/delta-live-tables/delta-live-tables-user-guide.html#publish-tables
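As a sketch, the pipeline settings might look like the following; only the target key is the part described above, while the pipeline name and notebook path are hypothetical:

```json
{
  "name": "my-pipeline",
  "target": "analytics",
  "libraries": [
    { "notebook": { "path": "/Repos/dlt/my_pipeline_notebook" } }
  ]
}
```

With this setting, tables defined in the pipeline are published to the `analytics` database instead of remaining internal to the pipeline.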

Anonymous
by Not applicable
  • 3210 Views
  • 1 replies
  • 0 kudos

Resolved! Questions on using Docker image with Databricks Container Service

Specifically, we have in mind:
- Create a Databricks job for testing API changes (the API library is built in a custom Jar file)
- When we want to test an API change, build a Docker image with the relevant changes in a Jar file
- Update the job configur...

Latest Reply
sajith_appukutt
Databricks Employee
  • 0 kudos

> Where do we put custom Jar files when building the Docker image?
/databricks/jars

> How do we update the job configuration so that the job's cluster will be built with this new Docker image, and how long do we expect this re-configuring process to tak...
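A minimal Dockerfile sketch for the jar placement described above (the base image tag and jar name are assumptions; `/databricks/jars` is the path named in the reply):

```dockerfile
# Start from a Databricks Container Services base image (tag is illustrative).
FROM databricksruntime/standard:latest

# Place the custom API jar where the Databricks runtime picks up jars.
COPY build/libs/my-api.jar /databricks/jars/
```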

brickster_2018
by Databricks Employee
  • 3384 Views
  • 1 replies
  • 0 kudos

Resolved! Z-order or Partitioning? Which is better for Data skipping?

For Delta tables, between Z-ordering and partitioning, which is the recommended technique for efficient data skipping?

Latest Reply
brickster_2018
Databricks Employee
  • 0 kudos

Partition pruning is the most efficient way to ensure data skipping. However, choosing the right column for partitioning is very important. It's common to see that choosing the wrong column for partitioning causes a large number of small-file problems ...
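A common pattern combining the two techniques, as a sketch (table and column names are illustrative): partition on a low-cardinality column such as a date, then Z-order within partitions on a higher-cardinality column that queries filter on.

```sql
-- Partition on a low-cardinality date column.
CREATE TABLE events (user_id STRING, event_type STRING, event_date DATE)
USING DELTA
PARTITIONED BY (event_date);

-- Z-order within partitions on a high-cardinality filter column.
OPTIMIZE events ZORDER BY (user_id);
```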

Srikanth_Gupta_
by Databricks Employee
  • 2162 Views
  • 2 replies
  • 0 kudos

I have several thousand Delta tables in production. What is the best way to get counts?

I need a dashboard to see the increase in the number of rows on a day-to-day basis, and also a dashboard that shows the size of the Parquet/Delta files in my lake.

Latest Reply
brickster_2018
Databricks Employee
  • 0 kudos

val db = "database_name"
spark.sessionState.catalog.listTables(db)
  .map(table => spark.sessionState.catalog.externalCatalog.getTable(table.database.get, table.table))
  .filter(x => x.provider.toString().toLowerCase.contains("delta"))

The above code snippet wi...

1 More Replies