Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

aladda
by Honored Contributor II
  • 1098 Views
  • 1 reply
  • 0 kudos
Latest Reply
aladda
Honored Contributor II

Courtesy of my colleague Sri, here's some sample library code to execute on a Databricks cluster with a short SLA:
import logging
import textwrap
import time
from typing import Text
from databricks_cli.sdk import ApiClient, ClusterService
# Create a cu...

  • 0 kudos
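The preview above is truncated, so here is a minimal sketch of the kind of helper it describes: polling a cluster with the legacy databricks-cli SDK and enforcing a short SLA. The host, token, timeout, and polling interval are placeholders, not values from the original post.

```python
# Minimal sketch: wait for a cluster to reach RUNNING within an SLA window,
# using the legacy databricks-cli SDK. Host and token are placeholders.
import logging
import time

from databricks_cli.sdk import ApiClient, ClusterService

logging.basicConfig(level=logging.INFO)

client = ApiClient(host="https://<workspace-url>", token="<token>")  # placeholders
clusters = ClusterService(client)

def wait_until_running(cluster_id: str, timeout_s: int = 600) -> None:
    """Raise if the cluster is not RUNNING before the SLA deadline."""
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        state = clusters.get_cluster(cluster_id)["state"]
        logging.info("cluster %s is %s", cluster_id, state)
        if state == "RUNNING":
            return
        if state in ("TERMINATED", "ERROR"):
            raise RuntimeError(f"cluster ended in terminal state {state}")
        time.sleep(15)   # polling interval, also a placeholder
    raise TimeoutError(f"cluster {cluster_id} not RUNNING within {timeout_s}s")
```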
User16783853501
by New Contributor II
  • 1644 Views
  • 2 replies
  • 1 kudos

Using Spark SQL, or specifically %sql in a Databricks notebook, is there a way to use pagination, offset, or skip?

Latest Reply
sajith_appukutt
Honored Contributor II

There is no OFFSET support yet. Here are a few possible workarounds. If your data is all in one partition (rarely the case), you could create a column with monotonically_increasing_id and apply filter conditions. If there are multiple partitions...

  • 1 kudos
1 More Replies
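As a hedged illustration of that workaround, here is a sketch that swaps in row_number over an explicit ordering instead of monotonically_increasing_id; the table name and key column are hypothetical.

```python
# Emulate OFFSET/pagination by numbering rows over a deterministic ordering,
# then filtering to the requested page. A global window like this funnels all
# rows through one partition, so it suits modest result sets.
from pyspark.sql import Window, functions as F

df = spark.table("my_table")          # hypothetical table
w = Window.orderBy("id")              # assumes an orderable key column "id"

page, page_size = 3, 100              # e.g. rows 201-300
paged = (
    df.withColumn("rn", F.row_number().over(w))
      .filter(F.col("rn").between((page - 1) * page_size + 1, page * page_size))
      .drop("rn")
)
paged.show()
```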
aladda
by Honored Contributor II
  • 971 Views
  • 1 reply
  • 0 kudos
Latest Reply
aladda
Honored Contributor II

Delta Live Tables supports data quality checks via expectations. On encountering invalid records, you can choose to either retain them, drop them, or fail/stop the pipeline. See the link below for additional details: https://docs.databricks.com/data-e...

  • 0 kudos
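For illustration, a minimal sketch of the three expectation modes in a DLT notebook; the table, column, and constraint names are hypothetical.

```python
import dlt

@dlt.table
@dlt.expect("valid_id", "id IS NOT NULL")                      # retain: keep the row, record the violation
@dlt.expect_or_drop("positive_amount", "amount > 0")           # drop: discard offending rows
@dlt.expect_or_fail("known_currency", "currency IS NOT NULL")  # fail: stop the pipeline update
def cleaned_orders():
    return spark.table("raw_orders")   # hypothetical source table
```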
aladda
by Honored Contributor II
  • 2298 Views
  • 1 reply
  • 0 kudos
Latest Reply
aladda
Honored Contributor II

Here's the difference between a View and a Table in the context of a Delta Live Tables pipeline. Views are similar to a temporary view in SQL and are an alias for some computation. A view allows you to break a complicated query into smaller or easier-to-understan...

  • 0 kudos
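A minimal sketch of the distinction, with hypothetical names: the view is just an alias for a computation that downstream steps can read, while the table is materialized by the pipeline.

```python
import dlt

@dlt.view
def orders_deduped():
    # An alias for intermediate logic; breaks a complicated query into steps.
    return spark.table("raw_orders").dropDuplicates(["order_id"])

@dlt.table
def orders_gold():
    # Materialized as a Delta table by the pipeline.
    return dlt.read("orders_deduped").where("status = 'complete'")
```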
aladda
by Honored Contributor II
  • 1077 Views
  • 1 reply
  • 0 kudos
Latest Reply
aladda
Honored Contributor II

Yes. You can specify a "target" database as part of your DLT pipeline configuration to publish results to a target database in the metastore. See - https://docs.databricks.com/data-engineering/delta-live-tables/delta-live-tables-quickstart.html#publi...

  • 0 kudos
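A hedged sketch of where "target" sits in the pipeline settings, here submitted through the Pipelines REST API; the workspace URL, token, and all names are placeholders.

```python
import requests

settings = {
    "name": "my-dlt-pipeline",                 # hypothetical pipeline name
    "storage": "dbfs:/pipelines/my-dlt",       # pipeline data/metadata location
    "target": "my_database",                   # metastore database to publish to
    "libraries": [{"notebook": {"path": "/Users/me@example.com/dlt_notebook"}}],
}

resp = requests.post(
    "https://<workspace-url>/api/2.0/pipelines",   # placeholder host
    headers={"Authorization": "Bearer <token>"},   # placeholder token
    json=settings,
)
resp.raise_for_status()
print(resp.json())   # response includes the new pipeline_id
```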
aladda
by Honored Contributor II
  • 899 Views
  • 1 reply
  • 0 kudos
Latest Reply
aladda
Honored Contributor II

DLT Pipeline results are published to the "Storage Location" defined as part of configuring the Pipeline. Ex: https://docs.databricks.com/_images/dlt-create-notebook-pipeline.png. If an explicit Storage Location is not specified, the pipeline results ...

  • 0 kudos
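To poke at what a pipeline actually wrote, a small sketch from a notebook; the path is a placeholder, and if no explicit Storage Location was set, DLT typically defaults to a path under dbfs:/pipelines/.

```python
# List the contents of a pipeline's Storage Location.
storage = "dbfs:/pipelines/<pipeline-id>"   # placeholder
display(dbutils.fs.ls(storage))             # e.g. system/ (event log) and tables/
```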
aladda
by Honored Contributor II
  • 1237 Views
  • 1 reply
  • 1 kudos
Latest Reply
aladda
Honored Contributor II

Notebooks with Delta Live Table/View definitions just contain the pipeline definition. In order to execute Delta Live Tables notebooks, you need to define a Pipeline via the Jobs UI. The Pipeline carries with it the logic to build the dependency graph betw...

  • 1 kudos
User16137833804
by New Contributor III
  • 1288 Views
  • 1 reply
  • 1 kudos
Latest Reply
sajith_appukutt
Honored Contributor II

You could have the single-node cluster where the proxy is installed monitored by a tool like CloudWatch, Azure Monitor, or Datadog, and have it configured to send alerts on node failure.

  • 1 kudos
User16826994223
by Honored Contributor III
  • 928 Views
  • 1 reply
  • 0 kudos

Does the DBFS root reside in the customer account or the Databricks account?

If I install the root bucket, I see a root bucket created with the workspace. Does this bucket reside in the customer account or the Databricks account? How can I access the bucket, and can I see it directly in S3 or ADLS?

Latest Reply
sajith_appukutt
Honored Contributor II

Didn't get the reference about installing a bucket. Did you mean configuring a workspace with a root bucket? If so, you've probably gathered that the root storage for a workspace resides in the customer's account.

  • 0 kudos
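As a small illustration of accessing it from a notebook (the bucket itself is also visible directly in the S3/ADLS console of the account that owns it):

```python
# The workspace root storage is surfaced inside Databricks as DBFS.
display(dbutils.fs.ls("dbfs:/"))            # top-level folders of the root bucket
display(dbutils.fs.ls("dbfs:/FileStore/"))  # a well-known folder backed by root storage
```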
Ryan_Chynoweth
by Esteemed Contributor
  • 2387 Views
  • 2 replies
  • 1 kudos
Latest Reply
User16783853906
Contributor III

Delta cache accelerates data reads by creating copies of remote files in nodes’ local storage using a fast intermediate data format. The data is cached automatically whenever a file has to be fetched from a remote location. Successive reads of the sa...

  • 1 kudos
1 More Replies
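A small sketch of working with the cache explicitly; the conf key is the documented disk-cache switch, and the table name is a placeholder.

```python
spark.conf.set("spark.databricks.io.cache.enabled", "true")  # ensure the cache is on

df = spark.table("my_large_table")   # hypothetical table
df.count()   # first read fetches remote files and populates the local cache
df.count()   # successive reads are served from the nodes' local copies
```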
User16783853501
by New Contributor II
  • 1429 Views
  • 1 reply
  • 1 kudos

Converting data in Delta format to plain Parquet format

There is often a need to convert Delta tables to plain Parquet format for a number of reasons. What is the best way to do that?

Latest Reply
User16826994223
Honored Contributor III

You can easily convert a Delta table back to a Parquet table using the following steps: if you have performed Delta Lake operations that can change the data files (for example, delete or merge), run VACUUM with a retention of 0 hours to delete all data f...

  • 1 kudos
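A hedged sketch of those steps; the table path is a placeholder, and disabling the retention check (required for a 0-hour VACUUM) should be done with care.

```python
# 1) Remove stale data files so only the latest snapshot's files remain.
spark.conf.set("spark.databricks.delta.retentionDurationCheck.enabled", "false")
spark.sql("VACUUM delta.`/mnt/tables/events` RETAIN 0 HOURS")   # placeholder path

# 2) Drop the transaction log; readers then treat the directory as plain Parquet.
dbutils.fs.rm("/mnt/tables/events/_delta_log", True)

df = spark.read.parquet("/mnt/tables/events")   # now readable as plain Parquet
```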
User16783853906
by Contributor III
  • 3883 Views
  • 1 replies
  • 0 kudos

MetaException [Version information not found in metastore] during cluster [re]start

Trying to configure a new external metastore and running into the following exception during cluster initialization: Caused by: MetaException(message:Version information not found in metastore. )   at org.apache.hadoop.hive.metastore.RetryingHMSHandl...

Latest Reply
User16783853906
Contributor III

The above exception happens when the Hive schema is not available in the metastore instance. Please check your init scripts to make sure the following flag is enabled, so that the Hive schema and tables are created if not already present: datanucleus.autoCreateA...

  • 0 kudos
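The flag in the reply is truncated; as an assumption based on common external-metastore setups, the cluster Spark config usually carries DataNucleus keys along these lines (exact keys depend on the Hive version).

```python
# Hedged sketch: Spark confs (e.g. in a cluster-create payload) that let the
# metastore schema be auto-created on first use. Keys vary by Hive version.
spark_conf = {
    "spark.hadoop.datanucleus.autoCreateSchema": "true",   # Hive 1.x-era key
    "spark.hadoop.datanucleus.fixedDatastore": "false",
    # For Hive 2.x metastores the equivalent is typically:
    # "spark.hadoop.datanucleus.schema.autoCreateAll": "true",
}
```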
brickster_2018
by Esteemed Contributor
  • 1125 Views
  • 1 reply
  • 0 kudos
Latest Reply
brickster_2018
Esteemed Contributor

The below code snippet can be used to get the DBR details on a High Concurrency cluster:
print("hadoopVersion:" + sc._gateway.jvm.org.apache.hadoop.util.VersionInfo.getVersion())
print("baseVersion:" + sc._gateway.jvm.org.apache.spark.BuildInfo.sparkBranch())
print(...

  • 0 kudos
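A cleaned-up, self-contained version of that snippet, assuming a Databricks notebook where sc and spark are predefined; the cluster-tag conf at the end is an additional way to read the DBR version.

```python
# Query build details through the JVM gateway.
print("hadoopVersion: " + sc._gateway.jvm.org.apache.hadoop.util.VersionInfo.getVersion())
print("baseVersion: " + sc._gateway.jvm.org.apache.spark.BuildInfo.sparkBranch())

# The DBR version string is also exposed as a cluster usage tag:
print("dbrVersion: " + spark.conf.get("spark.databricks.clusterUsageTags.sparkVersion"))
```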
aladda
by Honored Contributor II
  • 886 Views
  • 1 reply
  • 0 kudos
Latest Reply
brickster_2018
Esteemed Contributor

Databricks notebooks can be exported and stored in S3 or any other object storage. The internal storage of a Databricks notebook cannot be changed or configured; the implementation is internal to the Databricks control plane and not user configurable.

  • 0 kudos
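For example, a hedged sketch of exporting a notebook with the legacy databricks-cli SDK so it can be stored in S3 or another object store; the host, token, and paths are placeholders.

```python
import base64

from databricks_cli.sdk import ApiClient, WorkspaceService

client = ApiClient(host="https://<workspace-url>", token="<token>")  # placeholders
ws = WorkspaceService(client)

resp = ws.export_workspace("/Users/me@example.com/my_notebook", format="SOURCE")
source = base64.b64decode(resp["content"])   # export API returns base64 content

with open("my_notebook.py", "wb") as f:      # then copy this file to S3/ADLS
    f.write(source)
```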
