Data Engineering

Forum Posts

by User16835756816 • Databricks Employee

04-05-2022 1:54:26 PM

3030 Views
1 replies
5 kudos

Announcing: Delta Live Tables!

Databricks is excited to announce the general availability of Delta Live Tables to you, our community. Anxiously awaited, Delta Live Tables (DLT) is the first ETL framework that uses a simple, declarative approach to building reliable streaming or ...

Latest Reply

User16725394280
Databricks Employee

04-08-2022 4:13:53 AM

5 kudos

Informative content, thanks for sharing.

by Kush22 • New Contributor

04-07-2022 6:34:59 AM

2518 Views
0 replies
0 kudos

Delete the file

While exporting data from Databricks to Azure Blob Storage, how can I delete the committed, started, and success files?

by sgannavaram • New Contributor III

04-07-2022 5:53:37 AM

4858 Views
1 replies
2 kudos

Resolved! How to pass variables into query string?

I have two variables, StartTimeStmp and EndTimeStmp. I am going to assign the start timestamp based on the last successful job runtime, and EndTimeStmp would be the current system time. SET StartTimeStmp = '2022-03-24 15:40:00.000'; SET EndTimeStmp = '...

Latest Reply

Hubert-Dudek
Databricks MVP

04-07-2022 6:05:46 AM

2 kudos

@Srinivas Gannavaram, in Python: spark.sql(f""" SELECT CI.CORPORATE_ITEM_INTEGRATION_ID, CI.CORPORATE_ITEM_CD WHERE CI.DW_CREATE_TS < '{my_timestamp_variable}' ; """)
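
A minimal sketch of the f-string approach from the reply, extended to the two timestamps in the question. The table name `corporate_item` and the helper `build_query` are hypothetical (the FROM clause is not shown in the reply), and the timestamp format check is an added assumption meant to keep malformed text out of the SQL string.

```python
from datetime import datetime

# Assumed timestamp layout, matching '2022-03-24 15:40:00.000' from the question.
TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def build_query(start_ts: str, end_ts: str, table: str = "corporate_item") -> str:
    """Return the SQL text with both timestamps interpolated."""
    # Reject anything that is not a well-formed timestamp before interpolating.
    for ts in (start_ts, end_ts):
        datetime.strptime(ts, TS_FORMAT)
    return f"""
        SELECT CI.CORPORATE_ITEM_INTEGRATION_ID,
               CI.CORPORATE_ITEM_CD
        FROM {table} AS CI
        WHERE CI.DW_CREATE_TS >= '{start_ts}'
          AND CI.DW_CREATE_TS < '{end_ts}'
    """


query = build_query("2022-03-24 15:40:00.000", "2022-03-24 16:40:00.000")
# In a notebook, the string would then be passed on: df = spark.sql(query)
```

Building the string in a plain function keeps it easy to print and inspect before it reaches `spark.sql`.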

by Direo • Contributor II

04-07-2022 5:06:11 AM

16122 Views
2 replies
3 kudos

Resolved! How temporary is dbfs:/tmp/? Are files periodically deleted from there?

Latest Reply

User16873043212
Databricks Employee

04-07-2022 6:05:04 AM

3 kudos

@Direo Direo, yes, this is a location inside your DBFS, and you have full control over it. Databricks does not delete anything you keep in this location.

by Direo • Contributor II

04-07-2022 5:09:31 AM

2605 Views
1 replies
5 kudos

Is it possible to write tables to Delta Lake using upsert mode? Would it be more efficient than overwrite?

Latest Reply

Hubert-Dudek
Databricks MVP

04-07-2022 5:16:25 AM

5 kudos

@Direo Direo, yes, you use the MERGE syntax for that: https://docs.delta.io/latest/delta-update.html. It is more efficient than overwriting if you want to update only part of the data, but you need to think about the logic of what to update so overwriti...
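
As a rough illustration of the MERGE-based upsert mentioned in the reply, the sketch below only assembles the Delta Lake SQL statement. The table names `events` and `events_updates`, the key `event_id`, and the helper `build_merge_sql` are hypothetical, and it assumes source and target share the same columns so that `UPDATE SET *` and `INSERT *` apply.

```python
def build_merge_sql(target: str, source: str, key: str) -> str:
    """Assemble a Delta Lake MERGE statement that upserts source rows into target."""
    return (
        f"MERGE INTO {target} AS t\n"
        f"USING {source} AS s\n"
        f"ON t.{key} = s.{key}\n"
        # Rows that already exist in the target are updated in place ...
        "WHEN MATCHED THEN UPDATE SET *\n"
        # ... and rows that are new are appended.
        "WHEN NOT MATCHED THEN INSERT *"
    )


merge_sql = build_merge_sql("events", "events_updates", "event_id")
# In a notebook, the statement would then be run with: spark.sql(merge_sql)
```

Only rows matched by the key are rewritten, which is where the advantage over a full overwrite would come from when the update touches a small part of the table.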

by Constantine • Contributor III

04-06-2022 3:47:43 PM

2341 Views
1 replies
4 kudos

Resolved! What's the best architecture for Structured Streaming and why?

I am building an ETL pipeline which reads data from a Kafka topic (data is serialized in Thrift format) and writes it to a Delta table in Databricks. I want to have two layers: Bronze Layer -> which has raw Kafka data; Silver Layer -> which has deserializ...

Latest Reply

Hubert-Dudek
Databricks MVP

04-07-2022 3:15:43 AM

4 kudos

@John Constantine, "Bronze Layer -> which has raw Kafka data": If you use confluent.io, you can also use a direct sink to Data Lake Storage for the bronze layer. "Silver Layer -> which has deserialized data": Then use Delta Live Tables to process it to del...

by cal • New Contributor

04-06-2022 6:37:22 PM

804 Views
0 replies
0 kudos

G.I.S., Inc. is a distributor and fabricator of thermal and acoustical insulation systems for industrial, commercial, power, process, original equipme...

G.I.S., Inc. is a distributor and fabricator of thermal and acoustical insulation systems for industrial, commercial, power, process, original equipment manufacturers, plumbing and HVAC industries. In today's fast paced market, consumers have a multi...

by Anonymous • Not applicable

04-06-2022 7:10:09 AM

2520 Views
1 replies
1 kudos

Resolved! "policy_id" parameter in JOB API

I can't find information about that parameter in https://docs.databricks.com/dev-tools/api/latest/jobs.html. Where is it documented?

Latest Reply

Ryan_Chynoweth
Databricks Employee

04-06-2022 2:32:32 PM

1 kudos

I believe it is just "policy_id". As an incomplete example the specification via API would be something like: { "cluster_id": "1234-567890-abd35gh", "spark_context_id": 1234567890, "cluster_name": "my_cluster", "spark_version": "9.1.x-scala2....
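
A hedged sketch of where `policy_id` might sit when a job definition carries its own cluster spec. It assumes the field goes inside the `new_cluster` object of the job settings. Every value below (job name, node type, runtime version, policy ID, notebook path) is a made-up placeholder, and `with_policy` is a hypothetical helper, not part of any Databricks SDK.

```python
import json


def with_policy(cluster_spec: dict, policy_id: str) -> dict:
    """Return a copy of the cluster spec with the cluster policy attached."""
    spec = dict(cluster_spec)  # copy so the caller's dict is left untouched
    spec["policy_id"] = policy_id
    return spec


# Placeholder values only; replace them with ones valid in your workspace.
job_settings = {
    "name": "example-job",
    "new_cluster": with_policy(
        {
            "spark_version": "10.4.x-scala2.12",
            "node_type_id": "i3.xlarge",
            "num_workers": 2,
        },
        "ABC123DEF4567890",
    ),
    "notebook_task": {"notebook_path": "/Users/someone@example.com/example"},
}

# JSON body that would be sent to the Jobs API create endpoint.
payload = json.dumps(job_settings, indent=2)
```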

by sgannavaram • New Contributor III

03-30-2022 12:47:13 PM

4536 Views
3 replies
4 kudos

Resolved! Write output of DataFrame to a file with tilde (~) separator in Databricks Mount or Storage Mount with VM.

I need to write the output of a DataFrame to a file with a tilde (~) separator in a Databricks Mount or Storage Mount with VM. Could you please help with some sample code if you have any?

Latest Reply

Hubert-Dudek
Databricks MVP

03-30-2022 12:52:37 PM

4 kudos

@Srinivas Gannavaram, does it have to be CSV with fields separated by ~? If yes, it is enough to add .option("sep", "~"): (df .write .option("sep", "~") .csv(mount_path))
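
The one-liner from the reply, laid out as a small function. This is only a sketch: `write_tilde_separated` is a hypothetical name, `df` is assumed to be a Spark DataFrame, and the `header` option is an addition that was not part of the reply.

```python
def write_tilde_separated(df, output_path: str) -> None:
    """Write a Spark DataFrame as CSV files whose fields are separated by '~'."""
    (
        df.write
        .option("sep", "~")        # field separator, as suggested in the reply
        .option("header", "true")  # assumption: include a header row
        .csv(output_path)
    )


# In a notebook, with mount_path pointing at the mounted storage:
# write_tilde_separated(df, mount_path)
```

Spark writes a directory of part files at `output_path` rather than a single file, so a downstream consumer would need to read the whole directory.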

by Braxx • Contributor II

04-06-2022 3:17:00 AM

3842 Views
1 replies
2 kudos

Resolved! List users having access to scope credentials

Hello! How do I list all the users or groups having access to the key-vault backed scope credentials? Let's say I have a scope called MyScope for which all the secrets are stored in MyKeyVault. I would like to see what users have access there and ideal...

Latest Reply

Hubert-Dudek
Databricks MVP

04-06-2022 6:38:45 AM

2 kudos

@Bartosz Wachocki, as secrets use ACLs for the scope, you need to make an API call (it can also be done via the CLI) to list the ACLs for the given scope: 2.0/secrets/acls/list. More info here: https://docs.databricks.com/dev-tools/api/latest/secrets.html#list-secre...
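
A possible way to make that call from Python using only the standard library. The workspace URL and token are placeholders, the helper names are hypothetical, and the sketch assumes the response body carries an `items` list of principal/permission pairs.

```python
import json
import urllib.parse
import urllib.request


def build_list_acls_request(host: str, token: str, scope: str) -> urllib.request.Request:
    """Prepare a GET request for 2.0/secrets/acls/list on the given scope."""
    query = urllib.parse.urlencode({"scope": scope})
    url = f"{host.rstrip('/')}/api/2.0/secrets/acls/list?{query}"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def list_scope_acls(host: str, token: str, scope: str) -> list:
    """Send the request and return the ACL entries (assumed to be under 'items')."""
    request = build_list_acls_request(host, token, scope)
    with urllib.request.urlopen(request) as response:
        return json.load(response).get("items", [])


# Placeholder workspace URL and token:
# acls = list_scope_acls("https://<workspace-url>", "<personal-access-token>", "MyScope")
```

Note that this would list ACLs held on the Databricks side of the scope. Access granted directly on the key vault itself is managed outside Databricks.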

by BeginnerBob • New Contributor III

03-15-2022 10:52:29 AM

7098 Views
2 replies
2 kudos

Bronze silver gold layers

Is there a best-practice guide on setting up Delta Lake for these three layers? I'm looking for documents or scripts to run that will assist me.

Latest Reply

jose_gonzalez
Databricks Employee

04-05-2022 5:33:23 PM

2 kudos

Hi @Lloyd Vickery, I would highly recommend using Databricks Delta Live Tables (DLT). Docs here: https://databricks.com/product/delta-live-tables

by AdamRink • New Contributor III

02-14-2022 3:03:40 PM

5972 Views
3 replies
0 kudos

Try catch multiple write streams on a job

We are having issues with checkpoints and schema versions getting out of date (no idea why), but it causes jobs to fail. We have jobs that are running 15-30 streaming queries, so if one fails, that creates an issue. I would like to trap the checkpo...

Latest Reply

AdamRink
New Contributor III

02-23-2022 9:22:41 AM

0 kudos

The problem is that if a stream fails on startup, it would never hit awaitAnyTermination? I almost want to take that while loop and put it on a background thread to start that at the beginning and then fire all the streams afterward... not sure ...
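
One way to trap startup failures so that a single bad stream does not stop the others from starting. This is a sketch, not something proposed in the thread: `start_streams` is a hypothetical helper, and each entry in `starters` is assumed to be a zero-argument callable that ends in `.start()` on a streaming writer.

```python
from typing import Any, Callable, Dict, Tuple


def start_streams(
    starters: Dict[str, Callable[[], Any]]
) -> Tuple[Dict[str, Any], Dict[str, Exception]]:
    """Start every stream, collecting failures instead of aborting on the first one."""
    started: Dict[str, Any] = {}
    failed: Dict[str, Exception] = {}
    for name, start in starters.items():
        try:
            started[name] = start()
        except Exception as exc:  # e.g. a checkpoint or schema mismatch at startup
            failed[name] = exc
    return started, failed


# In a job (paths and names are placeholders):
# starters = {
#     "orders": lambda: orders_df.writeStream.format("delta")
#         .option("checkpointLocation", "/chk/orders").start("/delta/orders"),
# }
# started, failed = start_streams(starters)
# for name, exc in failed.items():
#     print(f"stream {name} failed to start: {exc}")
# spark.streams.awaitAnyTermination()
```

The `failed` mapping gives the job a place to log or repair the affected checkpoints before deciding whether to continue with the streams that did start.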

by TS • New Contributor III

04-05-2022 12:39:13 AM

5881 Views
3 replies
3 kudos

Resolved! Turn spark.sql query into scala function

Hello, I'm learning Scala / Spark and trying to understand what's wrong with my function. I have a spark.sql query, stored in a variable: val uViewName = spark.sql(""" SELECT v.Data_View_Name FROM apoHierarchy AS h INNER JOIN apoView AS v ON h.View_N...

Latest Reply

Hubert-Dudek
Databricks MVP

04-05-2022 2:17:08 AM

3 kudos

Try adding .first()(0). It will return only the value from the first row/column, as currently you are returning a Dataset: var uViewName = spark.sql(s""" SELECT v.Data_View_Name FROM apoHierarchy AS h INNER JOIN apoView AS v ON h.View_Name = v.Context_View_N...

by brickster_2018 • Databricks Employee

06-24-2021 10:14:39 AM

4482 Views
2 replies
1 kudos

Resolved! How to test Kafka connectivity from a Databricks notebook

My structured streaming job is failing as it's unable to connect to Kafka. I believe the issue is with Spark. How can I isolate whether it's a Spark library issue or an actual network issue?

Latest Reply

brickster_2018
Databricks Employee

06-24-2021 10:15:18 AM

1 kudos

The code snippet below can be used to test the connectivity: import java.util.Arrays import java.util.Properties import org.apache.kafka.clients.admin.AdminClient import org.apache.kafka.clients.admin.AdminClientConfig import org.apache.kafka.clients.a...
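
A simpler complementary check, not the AdminClient approach from the reply: a plain TCP connection attempt from a Python notebook cell. The broker host and port in the usage comment are placeholders and `can_reach` is a hypothetical helper. If the socket connects, the network path is open and the problem is more likely in the client configuration or libraries.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


# From a notebook cell (placeholder broker address):
# print(can_reach("broker.example.com", 9092))
```

This only exercises the driver node's network path. Executors could sit behind different routing, so a successful result here would not rule out every network issue.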

by Mr__E • Contributor II

04-02-2022 4:02:36 AM

8096 Views
5 replies
5 kudos

Resolved! Using shared python wheels for job compute clusters

We have a GitHub workflow that generates a python wheel and uploads to a shared S3 available to our Databricks workspaces. When I install the Python wheel to a normal compute cluster using the path approach, it correctly installs the Python wheel and...

Latest Reply

Hubert-Dudek
Databricks MVP

04-02-2022 5:34:49 AM

5 kudos

You can mount S3 as a DBFS folder then set that library in "cluster" -> "libraries" tab -> "install new" -> "DBFS"

