Data Engineering

Forum Posts


by exilon • New Contributor

02-23-2024 8:58:01 AM

1936 Views
0 replies
0 kudos

DLT streaming with sliding window missing last windows interval

Hello, I have a DLT pipeline where I want to calculate the rolling average of a column over the last 24 hours, updated every hour. I'm using the code below to achieve this: @dlt.table() def gold(): df = dlt.read_stream("silver_table")...

Data Engineering

dlt

spark

streaming

window


by CaptainJack • New Contributor III

02-23-2024 3:36:21 AM

3114 Views
2 replies
0 kudos

File arrival trigger customization

Hi all. I have a workflow that I would like to trigger when a new file arrives. The problem is that my storage account contains a few different types of files. Let's assume that I have a big csv file and a small xlsx mapping file. I would like to trigger the job, ...

Data Engineering


Latest Reply

feiyun0112
Honored Contributor

02-23-2024 4:36:21 AM

0 kudos

Use the option pathGlobFilter or fileNamePattern: https://docs.databricks.com/en/ingestion/auto-loader/options.html
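To illustrate the reply's suggestion, here is a minimal sketch, assuming the job reads the storage location with Auto Loader. `pathGlobFilter` is the option named in the reply; the `*.csv` pattern, the schema location, and all function names are hypothetical placeholders. This filters which files the stream ingests; it does not by itself change when the file arrival trigger fires. The `preview_matches` helper uses Python's `fnmatch` only as a local approximation of glob matching for simple patterns.

```python
from fnmatch import fnmatch

# Hypothetical glob: ingest only CSV files, ignore the xlsx mapping file.
CSV_GLOB = "*.csv"


def autoloader_options(glob=CSV_GLOB, schema_location="/tmp/hypothetical/_schemas"):
    """Reader options for an Auto Loader stream restricted to files matching `glob`."""
    return {
        "cloudFiles.format": "csv",
        "cloudFiles.schemaLocation": schema_location,  # assumed path, used for schema inference
        "pathGlobFilter": glob,
    }


def read_csv_only(spark, source_dir):
    """Build the streaming reader; only runs on a Databricks runtime with Auto Loader."""
    return (
        spark.readStream.format("cloudFiles")
        .options(**autoloader_options())
        .load(source_dir)
    )


def preview_matches(file_names, glob=CSV_GLOB):
    """Local approximation: which file names a simple glob would select."""
    return [name for name in file_names if fnmatch(name, glob)]
```

Calling `preview_matches` on a list of file names is a quick way to sanity-check a pattern before wiring it into the stream.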


1 More Replies

by alxsbn • Contributor

02-23-2024 2:18:10 AM

2310 Views
0 replies
1 kudos

Compute pool and AWS instance profiles

Hi everyone, we're looking at using the compute pool feature. Currently we mostly rely on all-purpose and job compute. On these two we use instance profiles to let the clusters access our S3 buckets and more. We don't see anything related to insta...

Data Engineering


by JakeerDE • New Contributor III

02-20-2024 4:18:18 AM

6273 Views
3 replies
0 kudos

Resolved! Databricks SQL - Deduplication in DLT APPLY CHANGES INTO

Hi @Retired_mod, we have a Kafka source appending data into a bronze table and a subsequent DLT APPLY CHANGES INTO to do the SCD handling. Finally, we have materialized views to create dims/facts. We are facing issues when we perform deduplication i...

Data Engineering


Latest Reply

JakeerDE
New Contributor III

02-20-2024 10:43:12 PM

0 kudos

Hi @Palash01 Thanks for the response. Below is what I am trying to do. However, it is throwing an error. APPLY CHANGES INTO LIVE.targettable FROM ( SELECT DISTINCT * FROM STREAM(sourcetable_1) tbl1 INNER JOIN STREAM(sourcetable_2) tbl2 ON tbl1.id = ...


2 More Replies

by sumitdesai • New Contributor III

02-22-2024 2:30:24 AM

4345 Views
1 reply
0 kudos

Job not able to access notebook from github

I have created a job in Databricks and configured it to use a cluster with single user access enabled, using GitHub as the source. When I try to run the job, I get the following error: run failed with error message Unable to access the notebook "d...

Data Engineering


Latest Reply

ezhil
New Contributor III

02-23-2024 12:06:55 AM

0 kudos

I think you need to link the Git account with Databricks by passing the access token generated in GitHub. Follow this document for reference: https://docs.databricks.com/en/repos/get-access-tokens-from-git-provider.html Note: While creating the...


by cltj • New Contributor III

02-22-2024 1:41:05 AM

3545 Views
1 reply
0 kudos

Managed tables and ADLS - infrastructure

Hi all. I want to get this right and therefore I am reaching out to the community. We are using Azure, and currently use one Azure Data Lake Storage for development and one for production. These are connected to dev and prod Databricks workspaces....

Data Engineering


Latest Reply

ossinova
Contributor II

02-22-2024 8:35:32 AM

0 kudos

I recommend you read this article (Managed vs External tables) and answer the following questions: Do I require direct access to the data outside of Azure Databricks clusters or Databricks SQL warehouses? If yes, then External is your only option. In rel...


by Marcin_U • New Contributor II

02-21-2024 5:54:27 AM

2661 Views
1 reply
0 kudos

AutoLoader - problem with adding new source location

Hello, I have some trouble with AutoLoader. Currently we use many different source locations on ADLS to read parquet files and write them to a Delta table using AutoLoader. Files in all locations have the same schema. Everything works fine until we have to ad...

Data Engineering


Latest Reply

Marcin_U
New Contributor II

02-22-2024 7:27:07 AM

0 kudos

Thanks for the reply @Retired_mod. I have some questions related to your answer. Checkpoint Location: Does deleting the checkpoint folder (or only the files?) mean that the next run of AutoLoader will load all files from the provided source locations? So it will dupl...


by oussValrho • New Contributor

02-22-2024 7:01:25 AM

9018 Views
0 replies
0 kudos

Cannot resolve due to data type mismatch: incompatible types ("STRING" and ARRAY<STRING>

Hey, I have had this error for a while: Cannot resolve "(needed_skill_id = needed_skill_id)" due to data type mismatch: the left and right operands of the binary operator have incompatible types ("STRING" and "ARRAY<STRING>"). SQLSTATE: 42K09; and these ...

Data Engineering


by Lightyagami • New Contributor

02-22-2024 3:25:18 AM

7452 Views
0 replies
0 kudos

Save workbook with macros

Hi, is there any way to save a workbook without losing the macros in Databricks?

Data Engineering

Databricks

pyspark


by essura • New Contributor II

02-21-2024 5:14:01 AM

3320 Views
1 reply
1 kudos

Create a docker image for dbt task

Hi there, we are trying to set up a Docker image for our dbt execution, primarily to improve execution speed, but also to simplify deployment (we are using private repos for both the dbt project and some of the dbt packages). It seems to work curre...

Data Engineering


by vijaykumar99535 • New Contributor III

02-21-2024 9:28:25 AM

2647 Views
1 reply
0 kudos

How to create job cluster using rest api

I am creating a cluster using a REST API call, but every time it creates an all-purpose cluster. Is there a way to create a job cluster and run a notebook using Python code?

Data Engineering


Latest Reply

feiyun0112
Honored Contributor

02-21-2024 4:51:04 PM

0 kudos

job_cluster_key: string, [ 1 .. 100 ] characters, ^[\w\-\_]+$. If job_cluster_key, this task is executed reusing the cluster specified in job.settings.job_clusters. Source: Create a new job | Jobs API | REST API reference | Databricks on AWS
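As a rough illustration of the reply, the sketch below builds a create-job request body in which the task points at a job cluster through `job_cluster_key`, so the notebook runs on job compute instead of an all-purpose cluster. It assumes the Jobs API 2.1 `jobs/create` endpoint; the job name, cluster key, task key, and function names are hypothetical, and the Spark version and node type are workspace-specific values you would supply yourself.

```python
import json
import urllib.request


def build_job_payload(notebook_path, spark_version, node_type_id, num_workers=1):
    """Jobs API 2.1 create-job body whose task runs on a job cluster."""
    cluster_key = "main_job_cluster"  # hypothetical key; letters, digits, "-" and "_" only
    return {
        "name": "example-notebook-job",  # hypothetical job name
        "job_clusters": [
            {
                "job_cluster_key": cluster_key,
                "new_cluster": {
                    "spark_version": spark_version,
                    "node_type_id": node_type_id,
                    "num_workers": num_workers,
                },
            }
        ],
        "tasks": [
            {
                "task_key": "run_notebook",  # hypothetical task key
                "job_cluster_key": cluster_key,  # reuse the job cluster defined above
                "notebook_task": {"notebook_path": notebook_path},
            }
        ],
    }


def create_job(host, token, payload):
    """POST the payload to the workspace and return the parsed JSON response."""
    request = urllib.request.Request(
        url=f"{host}/api/2.1/jobs/create",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Keeping payload construction separate from the HTTP call makes the body easy to inspect or log before anything is sent to the workspace.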


by William_Scardua • Valued Contributor

02-21-2024 11:57:57 AM

3513 Views
1 reply
1 kudos

groupBy without aggregation (Pyspark API)

Hi guys, do you have any idea how I can do a groupBy without aggregation (PySpark API)? Like: df.groupBy('field1', 'field2', 'field3') My goal is to make a group, but in this case it is not necessary to count records or aggregate. Thank you

Data Engineering


Latest Reply

feiyun0112
Honored Contributor

02-21-2024 4:28:46 PM

1 kudos

df.select("field1","field2","field3").distinct() Do you mean getting distinct rows for the selected columns?


by Innov • New Contributor

02-21-2024 2:38:45 PM

1273 Views
0 replies
0 kudos

Parse nested json for building footprints

Looking for some help from anyone who has worked with nested JSON files in a Databricks notebook. I am trying to parse a nested JSON file to get coordinates and use them to create a polygon for a footprint. Do I need to read it as txt? How can I use the Databricks...

Data Engineering


by zero234 • New Contributor III

02-21-2024 8:29:20 AM

1842 Views
1 reply
0 kudos

I am trying to read nested data from json file to put it into streaming table using dlt

So I have this nested data with 200+ columns, and I have extracted it into a JSON file. When I use the code below to read the JSON files, if there are a few columns in the data that have no value at all, it doesn't include those columns in the schema...

Data Engineering


Latest Reply

zero234
New Contributor III

02-21-2024 10:57:30 AM

0 kudos

Replying to my question above: we cannot use inferSchema on a streaming table; we need to specify the schema explicitly. Can anyone please suggest a way to write data in nested form to a streaming table, if this is possible?


by asad77007 • New Contributor II

06-19-2023 5:57:20 AM

4182 Views
3 replies
1 kudos

How to connect Analysis Service Cube with Databricks notebook

I am trying to connect an AS cube with a Databricks notebook but unfortunately haven't found a solution yet. Is there any possible way to connect an AS cube with a Databricks notebook? If yes, can someone please guide me?

Data Engineering


Latest Reply

omfspartan
New Contributor III

08-30-2023 7:55:11 AM

1 kudos

I am able to connect to Azure Analysis Services using the Azure Analysis Services REST API. Is yours on-prem?


2 More Replies
