Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Anonymous
by Not applicable
  • 2670 Views
  • 1 replies
  • 0 kudos
Latest Reply
aladda
Databricks Employee
  • 0 kudos

You can use libraries such as Seaborn, Bokeh, Matplotlib, and Plotly for visualization inside Python notebooks. See https://docs.databricks.com/notebooks/visualizations/index.html#visualizations-in-python. Also, Databricks has its own built-in visualiza...
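As a minimal sketch of the Matplotlib route mentioned above (the data and labels are illustrative; in a Databricks notebook you would end the cell with the figure or pass it to display):

```python
# Sketch: rendering a chart with Matplotlib in a Python notebook cell.
# The Agg backend makes this runnable outside a notebook as well.
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3, 4], [10, 20, 25, 30], marker="o")  # sample data, not real metrics
ax.set_xlabel("day")
ax.set_ylabel("rows ingested")
ax.set_title("Sample trend")
```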

aladda
by Databricks Employee
  • 11708 Views
  • 2 replies
  • 1 kudos
Latest Reply
aladda
Databricks Employee
  • 1 kudos

Thanks @Digan Parikh. Credit to Tahir Fayyaz. Found a couple of different paths depending on whether you're looking to bring in raw GA data vs. aggregated GA data. 1) For raw data, you can bring in data from GA Universal Analytics 360 Paid version or GA ...

1 More Replies
User16826994223
by Databricks Employee
  • 2022 Views
  • 1 replies
  • 0 kudos

What is Databricks Sync?

I am trying to migrate my workload to another workspace (from ST to E2). I am planning to use Databricks Sync, but I am still not sure whether it will migrate everything (clusters, users, groups, jobs, notebooks, etc.) or has some limitations which I s...

Latest Reply
sajith_appukutt
Databricks Employee
  • 0 kudos

Here is the support matrix for import/export operations for databricks-sync. Also check out https://github.com/databrickslabs/migrate

User16826994223
by Databricks Employee
  • 1800 Views
  • 1 replies
  • 0 kudos

How do we manage data recency in Databricks

I want to know how Databricks maintains data recency.

Latest Reply
sajith_appukutt
Databricks Employee
  • 0 kudos

When using Delta tables in Databricks, you have the advantage of the Delta cache, which accelerates data reads by creating copies of remote files in the nodes' local storage using a fast intermediate data format. At the beginning of each query, Delta tables au...
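As a config sketch of the cache mentioned above (this assumes a Databricks cluster where `spark` is the active SparkSession; `spark.databricks.io.cache.enabled` is the documented disk-cache setting):

```python
# Config fragment: enable the disk (Delta) cache for the current session.
spark.conf.set("spark.databricks.io.cache.enabled", "true")
```

Note that some instance types enable the cache by default; setting it explicitly is only needed when you want to override the cluster default.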

User16826994223
by Databricks Employee
  • 2028 Views
  • 1 replies
  • 0 kudos

Why is NPIP optional and not mandatory?

Even though NPIP is more secure because the network traffic travels through the Microsoft backbone network, why is it optional? It should be mandatory. Is there some limitation or a case where we may not be able to use NPIP?

Latest Reply
sajith_appukutt
Databricks Employee
  • 0 kudos

NPIP / secure cluster connectivity requires a NAT gateway (or similar appliance) for outbound traffic from your workspace's subnets to the Azure backbone and public network. This incurs a small additional cost. Also, it is worth mentioning that ne...

MoJaMa
by Databricks Employee
  • 1573 Views
  • 1 replies
  • 0 kudos
Latest Reply
MoJaMa
Databricks Employee
  • 0 kudos

Each local disk is 375 GB. So, for example, for n2-standard-4, it is 2 local disks (0.75 TB / 2). https://databricks.com/wp-content/uploads/2021/05/GCP-Pricing-Estimator-v2.pdf
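The sizing above works out as follows (the helper function is hypothetical, just a worked example of the arithmetic):

```python
# Each GCP local SSD is 375 GB; total local storage scales with the disk count.
LOCAL_SSD_GB = 375

def local_storage_gb(num_local_disks: int) -> int:
    """Total local SSD capacity for a node with the given number of local disks."""
    return num_local_disks * LOCAL_SSD_GB

# n2-standard-4 with 2 local disks -> 750 GB, i.e. 0.75 TB
print(local_storage_gb(2))
```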

User16826994223
by Databricks Employee
  • 2449 Views
  • 2 replies
  • 0 kudos

Don't want checkpoint in delta

Suppose I am not interested in checkpoints. How can I disable checkpoint writes in Delta?

Latest Reply
sajith_appukutt
Databricks Employee
  • 0 kudos

Writing statistics in a checkpoint has a cost, which is usually visible only for very large tables. However, it is worth mentioning that these statistics are very useful for data skipping, which speeds up subsequent operations. In Databricks Runti...
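One documented way to bound the statistics cost rather than lose data skipping entirely is to limit how many leading columns Delta collects statistics for (the table name here is illustrative; `delta.dataSkippingNumIndexedCols` is a real Delta table property):

```sql
-- Config fragment: collect stats only on the first 5 columns of the table.
ALTER TABLE events SET TBLPROPERTIES ('delta.dataSkippingNumIndexedCols' = '5');
```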

1 More Replies
Digan_Parikh
by Databricks Employee
  • 2233 Views
  • 1 replies
  • 0 kudos

Resolved! Delta Live Table - landing database?

Where do you specify what database the DLT tables land in?

Latest Reply
Digan_Parikh
Databricks Employee
  • 0 kudos

The target key, when creating the pipeline, specifies the database that the tables get published to. Documented here: https://docs.databricks.com/data-engineering/delta-live-tables/delta-live-tables-user-guide.html#publish-tables
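As a sketch, the pipeline settings might look like the following; only the target key is the part described above, while the pipeline name and notebook path are hypothetical:

```json
{
  "name": "my-pipeline",
  "target": "analytics",
  "libraries": [
    { "notebook": { "path": "/Repos/dlt/my_pipeline_notebook" } }
  ]
}
```

With this setting, tables defined in the pipeline are published to the `analytics` database instead of remaining internal to the pipeline.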

Anonymous
by Not applicable
  • 3210 Views
  • 1 replies
  • 0 kudos

Resolved! Questions on using Docker image with Databricks Container Service

Specifically, we have in mind:
- Create a Databricks job for testing API changes (the API library is built in a custom Jar file)
- When we want to test an API change, build a Docker image with the relevant changes in a Jar file
- Update the job configur...

Latest Reply
sajith_appukutt
Databricks Employee
  • 0 kudos

> Where do we put custom Jar files when building the Docker image?
/databricks/jars

> How do we update the job configuration so that the job's cluster will be built with this new Docker image, and how long do we expect this re-configuring process to tak...
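A minimal Dockerfile sketch for the jar placement described above (the base image tag and jar name are assumptions; `/databricks/jars` is the path named in the reply):

```dockerfile
# Start from a Databricks Container Services base image (tag is illustrative).
FROM databricksruntime/standard:latest

# Place the custom API jar where the Databricks runtime picks up jars.
COPY build/libs/my-api.jar /databricks/jars/
```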

brickster_2018
by Databricks Employee
  • 3384 Views
  • 1 replies
  • 0 kudos

Resolved! Z-order or Partitioning? Which is better for Data skipping?

For Delta tables, between Z-ordering and partitioning, which is the recommended technique for efficient data skipping?

Latest Reply
brickster_2018
Databricks Employee
  • 0 kudos

Partition pruning is the most efficient way to ensure data skipping. However, choosing the right column for partitioning is very important. It's common to see that choosing the wrong column for partitioning causes a large number of small-file problems ...
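A common pattern combining the two techniques, as a sketch (table and column names are illustrative): partition on a low-cardinality column such as a date, then Z-order within partitions on a higher-cardinality column that queries filter on.

```sql
-- Partition on a low-cardinality date column.
CREATE TABLE events (user_id STRING, event_type STRING, event_date DATE)
USING DELTA
PARTITIONED BY (event_date);

-- Z-order within partitions on a high-cardinality filter column.
OPTIMIZE events ZORDER BY (user_id);
```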

Srikanth_Gupta_
by Databricks Employee
  • 2162 Views
  • 2 replies
  • 0 kudos

I have several thousand Delta tables in production. What is the best way to get counts?

I need a dashboard to see the increase in the number of rows on a day-to-day basis, and also a dashboard that shows the size of the Parquet/Delta files in my lake.

Latest Reply
brickster_2018
Databricks Employee
  • 0 kudos

val db = "database_name"
spark.sessionState.catalog.listTables(db)
  .map(table => spark.sessionState.catalog.externalCatalog.getTable(table.database.get, table.table))
  .filter(x => x.provider.toString().toLowerCase.contains("delta"))

The above code snippet wi...

1 More Replies