Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

sgannavaram
by New Contributor III
  • 4342 Views
  • 1 reply
  • 2 kudos

Resolved! How to pass variables into query string?

I have two variables, StartTimeStmp and EndTimeStmp. I am going to assign the start timestamp based on the last successful job runtime, and EndTimeStmp would be the current system time. SET StartTimeStmp = '2022-03-24 15:40:00.000'; SET EndTimeStmp = '...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 2 kudos

@Srinivas Gannavaram​, in Python:

spark.sql(f"""
    SELECT CI.CORPORATE_ITEM_INTEGRATION_ID, CI.CORPORATE_ITEM_CD
    FROM corporate_item CI  -- the original snippet omits the FROM clause; table name assumed here
    WHERE CI.DW_CREATE_TS < '{my_timestamp_variable}'
""")
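
A fuller sketch of the same idea, assuming hypothetical table and column names, and computing the two timestamps in Python instead of with SQL SET:

from datetime import datetime

# StartTimeStmp would normally come from the last successful job run; hard-coded here for illustration
start_ts = "2022-03-24 15:40:00.000"
end_ts = datetime.now().strftime("%Y-%m-%d %H:%M:%S.%f")[:-3]

df = spark.sql(f"""
    SELECT *
    FROM corporate_item            -- hypothetical table
    WHERE dw_create_ts BETWEEN '{start_ts}' AND '{end_ts}'
""")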

Direo
by Contributor II
  • 15036 Views
  • 2 replies
  • 3 kudos
Latest Reply
User16873043212
New Contributor III
  • 3 kudos

@Direo Direo​, yes, this is a location inside your DBFS. You have full control over it; Databricks does not delete anything you keep in this location.

1 More Replies
Direo
by Contributor II
  • 2336 Views
  • 1 reply
  • 5 kudos
Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 5 kudos

@Direo Direo​, yes, you can use MERGE syntax for that: https://docs.delta.io/latest/delta-update.html. It is more efficient than overwriting if you only want to update part of the data, but you need to think about the logic of what to update, so overwriti...
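
A minimal sketch of a Delta MERGE in Python, with hypothetical table names and join key:

from delta.tables import DeltaTable

# Hypothetical target table and incoming updates
target = DeltaTable.forName(spark, "silver.customers")
updates_df = spark.table("staging.customer_updates")

(target.alias("t")
    .merge(updates_df.alias("u"), "t.customer_id = u.customer_id")  # hypothetical join key
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute())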

Constantine
by Contributor III
  • 2070 Views
  • 1 reply
  • 4 kudos

Resolved! What's the best architecture for Structured Streaming and why?

I am building an ETL pipeline which reads data from a Kafka topic (data is serialized in Thrift format) and writes it to a Delta table in Databricks. I want to have two layers: Bronze Layer -> which has raw Kafka data; Silver Layer -> which has deserializ...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 4 kudos

@John Constantine​, "Bronze Layer -> which has raw Kafka data": if you use confluent.io, you can also utilize a direct sink to Data Lake Storage as the bronze layer. "Silver Layer -> which has deserialized data": then use Delta Live Tables to process it to del...
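
For reference, a minimal Structured Streaming sketch of the bronze step in Python, with placeholder broker, topic, and paths (Thrift deserialization into the silver layer would happen in a separate step):

bronze_df = (spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
    .option("subscribe", "events")                     # placeholder topic
    .load())

(bronze_df.writeStream
    .format("delta")
    .option("checkpointLocation", "/mnt/bronze/_checkpoints/events")  # placeholder path
    .start("/mnt/bronze/events"))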

cal
by New Contributor
  • 700 Views
  • 0 replies
  • 0 kudos

G.I.S., Inc. is a distributor and fabricator of thermal and acoustical insulation systems for industrial, commercial, power, process, original equipme...

G.I.S., Inc. is a distributor and fabricator of thermal and acoustical insulation systems for industrial, commercial, power, process, original equipment manufacturers, plumbing and HVAC industries. In today's fast paced market, consumers have a multi...

Anonymous
by Not applicable
  • 2227 Views
  • 1 reply
  • 1 kudos

Resolved! "policy_id" parameter in JOB API

I can't find information about that parameter in https://docs.databricks.com/dev-tools/api/latest/jobs.html. Where is it documented?

Latest Reply
Ryan_Chynoweth
Esteemed Contributor
  • 1 kudos

I believe it is just "policy_id". As an incomplete example, the specification via the API would be something like: { "cluster_id": "1234-567890-abd35gh", "spark_context_id": 1234567890, "cluster_name": "my_cluster", "spark_version": "9.1.x-scala2....
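
For illustration, a sketch of where policy_id sits in a new_cluster spec when creating a job with Python requests; the host, token, and all cluster values below are placeholders:

import requests

new_cluster = {
    "spark_version": "9.1.x-scala2.12",
    "node_type_id": "i3.xlarge",       # placeholder node type
    "num_workers": 2,
    "policy_id": "ABC1234DEF567890",   # placeholder cluster policy ID
}

payload = {
    "name": "my_job",                  # placeholder job name
    "new_cluster": new_cluster,
    "notebook_task": {"notebook_path": "/Repos/team/my_notebook"},  # placeholder path
}

resp = requests.post(
    "https://<workspace-host>/api/2.1/jobs/create",               # replace with your workspace URL
    headers={"Authorization": "Bearer <personal-access-token>"},
    json=payload,
)
print(resp.json())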

sgannavaram
by New Contributor III
  • 4040 Views
  • 3 replies
  • 4 kudos

Resolved! Write output of DataFrame to a file with tilde (~) separator in Databricks Mount or Storage Mount with VM.

I need to write the output of a DataFrame to a file with a tilde (~) separator in a Databricks mount or storage mount with a VM. Could you please help with some sample code if you have any?

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 4 kudos

@Srinivas Gannavaram​, does it have to be CSV with fields separated by ~? If yes, it is enough to add .option("sep", "~"):

df.write.option("sep", "~").csv(mount_path)
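
A slightly fuller sketch of the same write, with an assumed mount path and a header row:

(df.write
    .option("sep", "~")
    .option("header", "true")
    .mode("overwrite")
    .csv("/mnt/my_storage/output/tilde_export"))  # hypothetical mount path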

2 More Replies
Braxx
by Contributor II
  • 3314 Views
  • 1 reply
  • 2 kudos

Resolved! list users having access to scope credentials

Hello! How do I list all the users or groups having access to the key-vault-backed scope credentials? Let's say I have a scope called MyScope for which all the secrets are stored in MyKeyVault. I would like to see what users have access there and ideal...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 2 kudos

@Bartosz Wachocki​, as secrets use ACLs per scope, you need to make an API call (it can also be done via the CLI) to list the ACLs for the given scope >> 2.0/secrets/acls/list; more info here: https://docs.databricks.com/dev-tools/api/latest/secrets.html#list-secre...
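
A minimal sketch of that call with Python requests, assuming placeholder host and token values and the MyScope scope from the question:

import requests

resp = requests.get(
    "https://<workspace-host>/api/2.0/secrets/acls/list",         # replace with your workspace URL
    headers={"Authorization": "Bearer <personal-access-token>"},
    params={"scope": "MyScope"},
)
for acl in resp.json().get("items", []):
    print(acl["principal"], acl["permission"])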

BeginnerBob
by New Contributor III
  • 6426 Views
  • 2 replies
  • 2 kudos

Bronze silver gold layers

Is there a best practice guide on setting up the delta lake for these 3 layers? I'm looking for documents or scripts to run that will assist me.

Latest Reply
jose_gonzalez
Databricks Employee
  • 2 kudos

Hi @Lloyd Vickery​, I would highly recommend using Databricks Delta Live Tables (DLT); docs here: https://databricks.com/product/delta-live-tables
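
As a rough illustration of the medallion idea in a DLT Python notebook, with hypothetical source paths, table names, and columns:

import dlt
from pyspark.sql import functions as F

@dlt.table(comment="Raw files loaded as-is (bronze)")
def bronze_events():
    return (spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "json")
            .load("/mnt/raw/events"))              # hypothetical landing path

@dlt.table(comment="Cleaned and typed records (silver)")
def silver_events():
    return dlt.read_stream("bronze_events").where(F.col("event_id").isNotNull())  # hypothetical column

@dlt.table(comment="Business-level aggregates (gold)")
def gold_daily_counts():
    return dlt.read("silver_events").groupBy("event_date").count()                # hypothetical column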

1 More Replies
AdamRink
by New Contributor III
  • 5446 Views
  • 3 replies
  • 0 kudos

Try catch multiple write streams on a job

We are having issues with checkpoints and schema versions getting out of date (no idea why), but it causes jobs to fail. We have jobs that are running 15-30 streaming queries, so if one fails, that creates an issue. I would like to trap the checkpo...

Latest Reply
AdamRink
New Contributor III
  • 0 kudos

The problem is that on startup, if a stream fails, it would never hit awaitAnyTermination. I almost want to take that while loop, put it on a background thread, start that at the beginning, and then fire all the streams afterward... not sure ...
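
One hedged sketch of that pattern in Python: start all the streams first, then monitor them in a loop so a single failure is logged and the stream is resubmitted instead of failing the whole job (the query definitions and restart policy are placeholders):

import time

def start_named_stream(name):
    # Placeholder query: the real source/sink logic for each of the 15-30 streams goes here
    return (spark.readStream.format("rate").load()
            .writeStream.queryName(name)
            .format("memory")                    # placeholder sink, for illustration only
            .start())

stream_names = ["stream_a", "stream_b"]          # placeholders
for name in stream_names:
    start_named_stream(name)

# Monitor loop: a failed stream is logged and restarted instead of killing the job
while True:
    try:
        spark.streams.awaitAnyTermination()      # blocks until any stream stops; raises if it failed
    except Exception as e:
        print(f"A stream terminated with an error: {e}")
    spark.streams.resetTerminated()
    active = {q.name for q in spark.streams.active}
    for name in stream_names:
        if name not in active:                   # simplistic restart policy, for illustration only
            start_named_stream(name)
    time.sleep(10)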

2 More Replies
TS
by New Contributor III
  • 5182 Views
  • 3 replies
  • 3 kudos

Resolved! Turn spark.sql query into scala function

Hello, I'm learning Scala / Spark and trying to understand what's wrong with my function. I have a spark.sql query stored in a variable:

val uViewName = spark.sql(""" SELECT v.Data_View_Name FROM apoHierarchy AS h INNER JOIN apoView AS v ON h.View_N...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 3 kudos

Try adding .first()(0); it will return only the value from the first row/column, as currently you are returning a Dataset:

var uViewName = spark.sql(s""" SELECT v.Data_View_Name FROM apoHierarchy AS h INNER JOIN apoView AS v ON h.View_Name = v.Context_View_N...

2 More Replies
brickster_2018
by Databricks Employee
  • 3909 Views
  • 2 replies
  • 1 kudos

Resolved! How to test Kafka connectivity from a Databricks notebook

My structured streaming job is failing as it's unable to connect to Kafka. I believe the issue is with Spark. How can I isolate whether it's a Spark library issue or an actual network issue?

Latest Reply
brickster_2018
Databricks Employee
  • 1 kudos

The code snippet below can be used to test the connectivity:

import java.util.Arrays
import java.util.Properties
import org.apache.kafka.clients.admin.AdminClient
import org.apache.kafka.clients.admin.AdminClientConfig
import org.apache.kafka.clients.a...
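
As a complementary check that does not involve Spark at all (useful for separating a pure network problem from a library problem), a small Python sketch with placeholder broker host and port:

import socket

broker_host, broker_port = "kafka-broker.example.com", 9092  # placeholders

try:
    # If this succeeds, basic TCP connectivity from the driver to the broker is fine
    with socket.create_connection((broker_host, broker_port), timeout=5):
        print("TCP connection to the Kafka broker succeeded")
except OSError as e:
    print(f"Could not reach the broker: {e}")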

1 More Replies
Mr__E
by Contributor II
  • 6853 Views
  • 5 replies
  • 5 kudos

Resolved! Using shared python wheels for job compute clusters

We have a GitHub workflow that generates a Python wheel and uploads it to a shared S3 bucket available to our Databricks workspaces. When I install the Python wheel on a normal compute cluster using the path approach, it correctly installs the Python wheel and...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 5 kudos

You can mount S3 as a DBFS folder, then set that library in the cluster's "Libraries" tab -> "Install new" -> "DBFS".
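
A sketch of the mount step, assuming placeholder bucket and mount names and that access to the bucket is already configured (e.g. via an instance profile):

# Placeholder bucket and mount point
aws_bucket = "my-shared-wheels-bucket"
mount_point = "/mnt/shared-wheels"

dbutils.fs.mount(f"s3a://{aws_bucket}", mount_point)

# The wheel can then be referenced in the cluster's Libraries tab as a DBFS path, e.g.:
# dbfs:/mnt/shared-wheels/my_package-1.0.0-py3-none-any.whl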

4 More Replies
yoniau
by New Contributor II
  • 3143 Views
  • 2 replies
  • 5 kudos

Resolved! Different configurations for same Databricks Runtime version

Hi all, on my DBR installations, the s3a scheme is mapped to shaded.databricks.org.apache.hadoop.fs.s3a.S3AFileSystem. On my customer's DBR installations it is mapped to com.databricks.s3a.S3AFileSystem. We both use the same DBR runtime, and none of us has...

Latest Reply
Prabakar
Databricks Employee
  • 5 kudos

@Yoni Au​ , If both of you are using the same DBR version, then you should not find any difference. As @Hubert Dudek​ mentioned, there might be some spark configuration change made on one of the clusters. Also, it's worth checking for any cluster sco...

1 More Replies
susan1234567
by New Contributor
  • 2291 Views
  • 1 reply
  • 2 kudos

I cannot access databricks community edition account

Last week, I suddenly could not log into https://community.cloud.databricks.com/login.html. I tried to reset the password but didn't receive the reset email. It says "Invalid email address or password. Note: Emails/usernames are case-sensitive". I e...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 2 kudos

@Kaniz Fatma​ can help; additionally, you can open a ticket here: https://help.databricks.com/s/contact-us

