Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

BeginnerBob
by New Contributor III
  • 2135 Views
  • 4 replies
  • 1 kudos

Loading Dimensions including SCDType2

I have a customer dimension, and for every incremental load I apply Type 2 or Type 1 changes to it. The dimension is based on a silver table in my Delta lake, where I apply a MERGE statement. What happens if I need to go back and track ad...

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @Lloyd Vickery, we haven't heard from you since the last response from @Werner Stinckens, and I was checking back to see if his suggestions helped you. Otherwise, if you have a solution, please share it with the community, as it can be helpful to ot...

3 More Replies
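
For readers unfamiliar with the Type 2 loading described in the post above, here is a minimal sketch of the idea in plain Python rather than as a Delta MERGE. The function name, the column names (`valid_from`, `valid_to`, `is_current`) and the sample rows are all hypothetical and are not taken from the thread.

```python
from datetime import date


def apply_scd2(dimension, incoming, key, tracked, load_date):
    """Return a new dimension list with Type 2 history applied.

    dimension: list of dict rows carrying valid_from, valid_to, is_current
    incoming:  list of dict rows from the source, one per business key
    """
    result = [dict(row) for row in dimension]  # copy, so the input is untouched
    current = {row[key]: row for row in result if row["is_current"]}
    for new in incoming:
        old = current.get(new[key])
        if old is not None and all(old[c] == new[c] for c in tracked):
            continue  # nothing changed, keep the current version
        if old is not None:
            # Close the previous version instead of overwriting it.
            old["valid_to"] = load_date
            old["is_current"] = False
        result.append(
            {**new, "valid_from": load_date, "valid_to": None, "is_current": True}
        )
    return result


# Hypothetical sample data: one existing customer and two incoming rows.
dim = [
    {"customer_id": 1, "city": "Ghent", "valid_from": date(2022, 1, 1),
     "valid_to": None, "is_current": True},
]
updates = [{"customer_id": 1, "city": "Antwerp"}, {"customer_id": 2, "city": "Bruges"}]
dim_after = apply_scd2(dim, updates, "customer_id", ["city"], date(2022, 6, 1))
```

A Type 1 change would simply overwrite the tracked columns on the current row instead of closing it and appending a new version.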
Anonymous
by Not applicable
  • 539 Views
  • 0 replies
  • 2 kudos

www.vandevelde.eu

June Featured Member of the Month! Werner Stinckens. Job Title: Data Engineer @ Van de Velde (www.vandevelde.eu). What are three words your coworkers would use to describe you? Helpful, accurate, inquisitive. What is your favorite thing about your curren...

cmotla
by New Contributor III
  • 1950 Views
  • 3 replies
  • 8 kudos

Issue with complex json based data frame select

We are getting the error below when trying to select nested columns (string type in a struct), even though the data frame has no more than 1,000 records. The schema is very complex, with a few columns of struct type and a few of array typ...

Latest Reply
Kaniz_Fatma
Community Manager
  • 8 kudos

Hi @Chaitanya Motla, just a friendly follow-up. Do you still need help, or did you find the solution? Please let us know.

2 More Replies
User16835756816
by Valued Contributor
  • 1580 Views
  • 2 replies
  • 5 kudos

Announcing: Delta Live Tables!

Databricks is excited to announce the general availability of Delta Live Tables to you, our community. Anxiously awaited, Delta Live Tables (DLT) is the first ETL framework that uses a simple, declarative approach to building reliable streaming or ...

Latest Reply
User16725394280
Contributor II
  • 5 kudos

Informative content, thanks for sharing.

1 More Replies
emanuele_maffeo
by New Contributor III
  • 2817 Views
  • 6 replies
  • 8 kudos

Resolved! Trigger.AvailableNow on scala - compile issue

Hi everybody, Trigger.AvailableNow was released with the Databricks 10.1 runtime, and we would like to use this new feature with Auto Loader. We write all our data pipelines in Scala, and our projects import Spark as a provided dependency. If we try to sw...

Latest Reply
Anonymous
Not applicable
  • 8 kudos

You can switch to Python. Depending on what you're doing and whether you're using UDFs, there shouldn't be any difference at all in terms of performance.

5 More Replies
Mirko
by Contributor
  • 2048 Views
  • 3 replies
  • 0 kudos

Resolved! Location for DB and for specific tables in DB

The following situation: I am creating a database with a location somewhere in my Azure Lake Gen 2: CREATE SCHEMA IF NOT EXISTS curated LOCATION 'somelocation'. Then I want a specific table in curated to be in a subfolder in 'somelocation': CREATE TABLE IF...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Mirko Ludewig - Thanks for letting us know. I don't like strange all that much, but I do like working as desired!

2 More Replies
RicksDB
by Contributor II
  • 4181 Views
  • 6 replies
  • 6 kudos

Resolved! SingleNode all-purpose cluster for small ETLs

Hi, I have many "small" jobs that need to be executed quickly and at a predictable low cost from several Azure Data Factory pipelines. For this reason, I configured a small single-node cluster to execute those processes. For the moment, everything se...

Latest Reply
RicksDB
Contributor II
  • 6 kudos

@Bilal Aslam In my case, it usually depends on the customers and their SLA. Most of them do not have a "true" high SLA requirement and thus prefer the jobs to be throttled when the actual cost is within a certain range of the budget instead of ...

5 More Replies
RicksDB
by Contributor II
  • 3694 Views
  • 9 replies
  • 1 kudos

Configure jobs throttling for ephemeral cluster ETLs

Hi, is it possible to configure job throttling so that jobs are queued across a workspace after a given number of concurrent executions when using the ephemeral cluster pattern? The reason is mainly cost control. We prefer reducing performance rathe...

Latest Reply
RicksDB
Contributor II
  • 1 kudos

Thanks for the help josephk. I will continue to use an interactive cluster for the time being until the release of that new feature. Hopefully, it will allow my use case. Is there visibility on the roadmap for an ETA or more information on it?

8 More Replies
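
To make the queueing behaviour asked about in the post above concrete, here is a minimal sketch in plain Python. It is not a Databricks feature or API: it only illustrates capping concurrency on the caller's side so that extra jobs wait instead of starting. The cap `MAX_CONCURRENT_JOBS`, the function `run_job` and the job names are hypothetical.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT_JOBS = 2  # hypothetical cap chosen for cost control

_lock = threading.Lock()
_running = 0
peak_running = 0  # highest number of jobs observed running at once


def run_job(name):
    """Stand-in for submitting one ETL job and waiting for it to finish."""
    global _running, peak_running
    with _lock:
        _running += 1
        peak_running = max(peak_running, _running)
    time.sleep(0.05)  # placeholder for the real work
    with _lock:
        _running -= 1
    return f"{name}: done"


# The pool runs at most MAX_CONCURRENT_JOBS at once; the rest wait in its queue.
with ThreadPoolExecutor(max_workers=MAX_CONCURRENT_JOBS) as pool:
    results = list(pool.map(run_job, [f"job-{i}" for i in range(6)]))
```

The trade-off is the one the post describes: total run time grows, but no more than the configured number of jobs consume compute at the same moment.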
MadelynM
by Contributor II
  • 685 Views
  • 0 replies
  • 1 kudos

vimeo.com

Auto Loader provides Python and Scala methods to ingest new data from a folder location into a Delta Lake table by using directory listing or file notifications. Here's a quick video (7:00) on how to use Auto Loader for Databricks on AWS with Databri...

Anonymous
by Not applicable
  • 2360 Views
  • 3 replies
  • 7 kudos

Resolved! How does 73% of the data go unused for analytics or decision-making?

Is Lakehouse the answer? Here's a good resource that was just published: https://dbricks.co/3q3471X

Latest Reply
Anonymous
Not applicable
  • 7 kudos

@Alexis Lopez - If @Dan Zafar's or @Harikrishnan Kunhumveettil's answers solved the issue, would you be happy to mark one of their answers as best so other members can find the solution more easily?

2 More Replies
eq
by New Contributor III
  • 4152 Views
  • 7 replies
  • 7 kudos

Resolved! Multi-task Jobs orchestration - simulating onComplete status

Currently, we are investigating how to effectively incorporate Databricks' latest feature for task orchestration, Multi-task Jobs. The default behaviour is that a downstream task is not executed if the previous one has failed for some reason...

Latest Reply
User16844513407
New Contributor III
  • 7 kudos

Hi @Stefan V, my name is Jan and I'm a product manager working on job orchestration. Thank you for your question. At the moment this is not directly supported; it is, however, on our radar. If you are interested in having a short conve...

6 More Replies
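
One generic way to approximate an onComplete status, given the default behaviour described in the post above, is to have the upstream task catch its own failure and report it as data, so the downstream step always runs. The sketch below is plain Python and an assumption on my part, not the approach from the thread or a Databricks Jobs API; `run_and_capture`, `failing_upstream` and `downstream` are hypothetical names.

```python
def run_and_capture(task):
    """Run a task callable and report its outcome instead of raising."""
    try:
        return {"status": "SUCCESS", "value": task(), "error": None}
    except Exception as exc:  # deliberately broad: any failure is recorded
        return {"status": "FAILED", "value": None, "error": str(exc)}


def failing_upstream():
    # Hypothetical upstream task that fails.
    raise ValueError("source file missing")


def downstream(upstream_result):
    # Runs regardless of the upstream outcome, like an onComplete hook.
    return f"cleanup after upstream {upstream_result['status']}"


upstream_result = run_and_capture(failing_upstream)
message = downstream(upstream_result)
```

The cost of this pattern is that the orchestrator sees the upstream task as succeeded, so the real status has to be checked from the captured result.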
brickster_2018
by Esteemed Contributor
  • 2580 Views
  • 2 replies
  • 3 kudos

Resolved! What is the best file format for a temporary table?

As part of my ETL process, I create intermediate/staging temporary tables. These tables are read at a later point in the ETL and finally cleaned up. Should I use Delta? Using Delta creates the overhead of running optimize jobs, which would de...

Latest Reply
Sebastian
Contributor
  • 3 kudos

Agreed. The intermediate Delta tables help, since they bring reliability to the pipeline.

1 More Replies