Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

jeremy98
by Contributor III
  • 1644 Views
  • 22 replies
  • 1 kudos

Wheel package to install in a serverless workflow

Hi guys, what is the way, through Databricks Asset Bundles, to declare a new job definition with serverless compute associated with each task that composes the workflow, and to be able, inside each notebook task definition, to catch the dep...

Latest Reply
jeremy98
Contributor III
  • 1 kudos

Ping @Alberto_Umana 

21 More Replies
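
For anyone landing here with the same question: a commonly used pattern, sketched below under the assumption that the wheel has already been uploaded to a Unity Catalog volume (the path is hypothetical), is to install it at the top of each serverless notebook task with %pip.

    # Hypothetical volume path; replace with wherever the wheel is uploaded.
    %pip install /Volumes/main/default/artifacts/my_package-0.1.0-py3-none-any.whl
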
dbx-user7354
by New Contributor III
  • 3886 Views
  • 7 replies
  • 3 kudos

PySpark DataFrame orderBy only orders within partitions with multiple workers

I came across a PySpark issue when sorting a DataFrame by a column. It seems like PySpark only orders the data within partitions when there are multiple workers, even though it shouldn't. from pyspark.sql import functions as F import matplotlib.pyplot...

(two plot attachments)
Latest Reply
Avinash_Narala
Valued Contributor II
  • 3 kudos

Hi @dbx-user7354, orderBy() should perform a global sort, as shown in plot 2, but per your description it is sorting the data within partitions, which is the behavior of sortWithinPartitions(). To solve this, please try with the latest DBR...

6 More Replies
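
A minimal sketch of the distinction the reply describes (data and partition counts are made up): orderBy() shuffles for a global sort, while sortWithinPartitions() only sorts each partition locally.

    from pyspark.sql import functions as F

    df = spark.range(0, 1000).withColumn("v", F.rand(seed=42)).repartition(8)

    # Global sort: rows are ordered across the whole DataFrame (requires a shuffle).
    global_sorted = df.orderBy("v")

    # Local sort: each of the 8 partitions is sorted independently; no global order.
    local_sorted = df.sortWithinPartitions("v")
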
SwathiChidurala
by New Contributor II
  • 7284 Views
  • 2 replies
  • 3 kudos

Resolved! Delta format

Hi, I am a student learning Databricks. In the code below I tried to write data in Delta format to a gold layer. I authenticated using the service principal method to read, write, and execute data, and I assigned the Storage Blob Contributor role, but...

Latest Reply
Avinash_Narala
Valued Contributor II
  • 3 kudos

Hi @SwathiChidurala, the error is because you don't have the folder trip_zone inside the gold folder. You can try removing trip_zone from the location, or adding the trip_zone folder inside the gold folder in ADLS, and then try again. If th...

1 More Reply
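
For context, a hedged sketch of the write being discussed (container, storage account, and folder names are placeholders): the save path must match a location the service principal can actually access.

    # Placeholder ADLS Gen2 path; align it with the configured external location.
    (df.write
       .format("delta")
       .mode("overwrite")
       .save("abfss://gold@<storage_account>.dfs.core.windows.net/trip_zone"))
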
Abdurrahman
by New Contributor II
  • 489 Views
  • 3 replies
  • 3 kudos

Move files from DBFS to Workspace folders in Databricks

I want to move a zip file from DBFS to a workspace folder. I am using dbutils.fs.cp("dbfs file path", "workspace folder path") in a Databricks notebook and I am seeing the following error: ExecutionError: An error occurred while calling o455.cp. : jav...

Latest Reply
nick533
New Contributor III
  • 3 kudos

Permission denied appears to be the cause of the error message. To read from the DBFS path and write to the workspace folder, please make sure you have the required permissions. The following permissions may be required: the DBFS file path can be read...

2 More Replies
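
For reference, a minimal sketch of the copy with explicit URI schemes (both paths are hypothetical): the file:/ prefix targets workspace files rather than DBFS, and the caller still needs write permission on the destination folder.

    # Copy a zip from DBFS into the workspace file tree (hypothetical paths).
    dbutils.fs.cp(
        "dbfs:/FileStore/archive.zip",
        "file:/Workspace/Users/someone@example.com/archive.zip",
    )
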
nhakobian
by New Contributor
  • 276 Views
  • 1 reply
  • 0 kudos

Python Artifact Installation Error on Runtime 16.1 on Shared Clusters

I've run into an issue with no clear path to resolution. Due to various integrations we have in Unity Catalog, some jobs we have to run in a shared cluster environment in order to authenticate properly to the underlying data resource. When setting up ...

Latest Reply
NandiniN
Databricks Employee
  • 0 kudos

The 'Enable libraries and init scripts on shared Unity Catalog clusters' setting is deprecated in Databricks Runtime 16.0 and above; please refer to the deprecation documentation. Disabling this feature at the workspace level would pr...

ashraf1395
by Honored Contributor
  • 463 Views
  • 1 reply
  • 2 kudos

Resolved! Connecting Fivetran with Databricks

So, we are migrating a Hive metastore to a UC catalog. We have some Fivetran connections. We are creating all tables as external tables, and we have specified the external locations at the schema level. So when we specify the destination in the Fivetra...

Latest Reply
NandiniN
Databricks Employee
  • 2 kudos

This message is just saying that if you do not provide the {{path}}, it will use the default location, which is on DBFS. When configuring the Fivetran connector, you will be prompted to select the catalog name and schema name, and then specify the externa...

Asaph
by New Contributor
  • 671 Views
  • 4 replies
  • 0 kudos

Issue with databricks.sdk - AccountClient Service Principals API

Hi everyone, I've been trying to work with the databricks.sdk Python library to manage service principals programmatically. However, I'm running into an issue when attempting to create a service principal using the AccountClient class. Below is the co...

Latest Reply
nick533
New Contributor III
  • 0 kudos

This can be an issue with missing authentication or configuration. When constructing the AccountClient instance, please ensure that the required authentication details are present. Additionally, since this action is account-level, make su...

3 More Replies
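
A minimal sketch of the account-level construction the reply points to, with placeholder credentials; the key point is that AccountClient needs account-level auth (the account console host plus an account ID), not workspace-level auth.

    from databricks.sdk import AccountClient

    # Placeholder credentials; use an account admin's OAuth service principal.
    a = AccountClient(
        host="https://accounts.cloud.databricks.com",
        account_id="<account-id>",
        client_id="<client-id>",
        client_secret="<client-secret>",
    )

    sp = a.service_principals.create(display_name="my-service-principal")
    print(sp.id)
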
drag7ter
by Contributor
  • 580 Views
  • 4 replies
  • 1 kudos

Disable caching in Serverless SQL Warehouse

I have a Serverless SQL Warehouse cluster, and I run my SQL code in the SQL editor. When I run a query for the first time, it takes 30 secs total time, but every subsequent time I see in the query profile that it gets the result set from cache and takes 1-2 secs total...

Latest Reply
NandiniN
Databricks Employee
  • 1 kudos

I am wondering if it is using the remote result cache; in that case, the config should work. There are 4 types of cache mentioned here: https://docs.databricks.com/en/sql/user/queries/query-caching.html#types-of-query-caches-in-databricks-sql Local cache: ...

3 More Replies
om_bk_00
by New Contributor III
  • 534 Views
  • 1 reply
  • 0 kudos

How to pass parameters for jobs containing for_each_task

resources:
  jobs:
    X:
      name: X
      tasks:
        - task_key: X
          for_each_task:
            inputs: "{{job.parameters.input}}"
            task:
              task_key: X
              existing_cluster_id: ${var.my_cluster_id}
              ...

Latest Reply
NandiniN
Databricks Employee
  • 0 kudos

To reference job parameters in the inputs field, use the syntax {{job.parameters.<name>}}. Kindly refer to https://docs.databricks.com/en/jobs/for-each.html

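
On the consuming side, a hedged sketch (the parameter wiring is an assumption, not from the thread): if the inner task is a notebook that receives the per-iteration value as a parameter named input, it can read the value with widgets.

    # Assumes the inner notebook task is passed the iteration value as a
    # parameter named "input"; the name is hypothetical.
    dbutils.widgets.text("input", "")
    value = dbutils.widgets.get("input")
    print(f"Processing iteration value: {value}")
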
gvvishnu
by New Contributor
  • 520 Views
  • 1 reply
  • 0 kudos

Can Databricks support the Murmur hash function?

In our current project we are using the Murmur hash function in Hadoop. We are planning a migration to Databricks. Can Databricks support the Murmur hash function?

Latest Reply
brockb
Databricks Employee
  • 0 kudos

Hi @gvvishnu , Thanks for your question. My understanding is that the Apache Spark `hash()` function implements the `org.apache.spark.sql.catalyst.expressions.Murmur3Hash` expression. You can see this in the Spark source code here: https://github.com...

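
A quick check of that built-in, runnable as-is in a notebook: hash() is Spark's 32-bit Murmur3, and xxhash64() is a 64-bit alternative.

    from pyspark.sql import functions as F

    # hash() implements 32-bit Murmur3; xxhash64() is the 64-bit option.
    df = spark.range(3).select(
        "id",
        F.hash("id").alias("murmur3_32"),
        F.xxhash64("id").alias("xxhash_64"),
    )
    df.show()
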
shhhhhh
by New Contributor III
  • 530 Views
  • 5 replies
  • 0 kudos

How to connect from Serverless Plane to On-Prem SQL Server

So, has anybody tried connecting Databricks serverless in the serverless plane to an on-prem SQL Server? We can connect a normal Databricks cluster with federated queries and external data connections to an on-prem SQL Server. We can connect serverless to Azu...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

No, Private Link is for setting up your workspace with no access to the internet. Have you tried allowing the NCC IPs on the on-prem firewall?

4 More Replies
Greg_c
by New Contributor II
  • 227 Views
  • 1 reply
  • 0 kudos

Passing parameters (variables?) in DAGs

Regarding DAGs and the tasks in them: can I pass a parameter/variable in a task? I have the same structure as here: https://github.com/databricks/bundle-examples/blob/main/default_sql/resources/default_sql_sql_job.yml and I want to pass variables to .sq...

Latest Reply
filipniziol
Esteemed Contributor
  • 0 kudos

Hi @Greg_c, in Databricks Asset Bundles you can pass a parameter to a SQL file task. Here is an end-to-end example: 1. My SQL file (with an :id parameter). 2. The job YAML:
resources:
  jobs:
    run_sql_file_job:
      name: run_sql_file_job
      ...

priyansh
by New Contributor III
  • 889 Views
  • 3 replies
  • 1 kudos

What stuff can UCX not do?

Hey folks! I want to know the limitations of UCX, i.e., what are the things, especially during migration, that we have to do manually? UCX is currently in development, which means it may have some drawbacks too; I want to know what those are.

Latest Reply
monstercop
New Contributor II
  • 1 kudos

I guess you will find some differences before and after; for example, using a wildcard to point to folders in ADLS Gen2 for external tables is supported in Hive but not in UC catalogs.

2 More Replies
yvishal519
by Contributor
  • 511 Views
  • 1 reply
  • 0 kudos

Identifying Full Refresh vs. Incremental Runs in Delta Live Tables

Hello Community, I am working with a Delta Live Tables (DLT) pipeline that primarily operates in incremental mode. However, there are specific scenarios where I need to perform a full refresh of the pipeline. I am looking for an efficient and reliable...

Latest Reply
Takuya-Omi
Valued Contributor II
  • 0 kudos

Hello, there are two ways to determine whether a DLT pipeline is running in full-refresh or incremental mode. 1. DLT event log schema: the details column in the DLT event log schema includes information on "full_refresh". You can use this to identify whethe...

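
A hedged sketch of the event-log approach (the table name is a placeholder, and the exact JSON path under details follows the reply's description and may vary by release):

    # Query the DLT event log and read the full_refresh flag from update events.
    df = spark.sql("""
        SELECT timestamp,
               details:create_update:full_refresh::boolean AS full_refresh
        FROM event_log(TABLE(my_catalog.my_schema.my_table))
        WHERE event_type = 'create_update'
        ORDER BY timestamp DESC
    """)
    df.show(truncate=False)
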
Klusener
by New Contributor III
  • 538 Views
  • 7 replies
  • 11 kudos

Resolved! Out of Memory after adding distinct operation

I have a Spark pipeline which reads selected data from table_1 as a view, performs a few aggregations via group by in the next step, and writes to a target table. table_1 has large data, ~30 GB of compressed CSV. Step 1: create or replace temporary view base_data...

Latest Reply
MadhuB
Contributor III
  • 11 kudos

Hi @Klusener, distinct is a very expensive operation. For your case, I recommend using either of the deduplication strategies below. Most efficient method: df_deduped = df.dropDuplicates(subset=['unique_key_columns']). For a complex dedupe process: Partition...

6 More Replies
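
A minimal sketch of the window-based approach the reply starts to describe (column names are hypothetical): keep the newest row per business key instead of running distinct across every column.

    from pyspark.sql import Window, functions as F

    # Hypothetical columns: dedupe on customer_id, keeping the latest record.
    w = Window.partitionBy("customer_id").orderBy(F.col("updated_at").desc())
    df_deduped = (
        df.withColumn("rn", F.row_number().over(w))
          .filter(F.col("rn") == 1)
          .drop("rn")
    )
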
