Topics with Label: Azure data factory

Forum Posts

by irfanaziz • Contributor II

11-03-2021 12:58:01 PM

18030 Views
7 replies
8 kudos

Resolved! How to merge small parquet files into a single parquet file?

I have thousands of Parquet files with the same schema, each holding one or more records, but reading these files with Spark is very slow. Is there a way to merge the files before reading them with Spark? Or is there any ...

Data Engineering

Latest Reply

mmore500
New Contributor II

02-19-2024 11:47:30 AM

8 kudos

Give [*joinem*](https://github.com/mmore500/joinem) a try, available via PyPI: `python3 -m pip install joinem`. *joinem* provides a CLI for fast, flexible concatenation of tabular data using [polars](https://pola.rs). I/O is *lazily streamed* in order ...

6 More Replies

by MartinH • New Contributor II

03-23-2023 3:09:56 PM

2349 Views
6 replies
3 kudos

Azure Data Factory and Photon

Hello, we have Databricks Python workbooks accessing Delta tables. These workbooks are scheduled/invoked by Azure Data Factory. How can I enable Photon on the linked services that are used to call Databricks? If I specify a new job cluster, there does n...

Data Engineering

Latest Reply

CharlesReily
New Contributor III

01-16-2024 11:22:48 PM

3 kudos

When you create a cluster on Databricks, you can enable Photon by selecting the "Photon" option in the cluster configuration settings. This is typically done when creating a new cluster, and you would find the option in the advanced cluster configura...

5 More Replies

by rubenesanchez • New Contributor II

04-15-2023 8:33:12 AM

2793 Views
4 replies
0 kudos

How to dynamically pass a string parameter to a Delta Live Tables pipeline when calling from Azure Data Factory using the REST API

I want to pass some context information to the Delta Live Tables pipeline when calling it from Azure Data Factory. I know the body of the API call supports the Full Refresh parameter, but I wonder if I can add my own custom parameters and how this can be re...

Data Engineering

Latest Reply

Manjula_Ganesap
Contributor

08-23-2023 6:46:31 AM

0 kudos

@rubenesanchez - Did you find a solution to your problem? I have the same question.

3 More Replies

by Chakra • New Contributor II

08-10-2021 5:25:16 PM

826 Views
1 replies
1 kudos

Create a job cluster with a Docker image in Azure Data Factory

Is there a way to create a job cluster in Azure Data Factory with a Docker image, either through the API or the UI?

Data Engineering

Latest Reply

m_szklarczyk
New Contributor II

07-17-2023 6:31:25 AM

1 kudos

Does anyone know how to run a custom image as jobs compute from ADF?

by Enzo_Bahrami • New Contributor III

05-30-2023 12:18:46 PM

2873 Views
6 replies
1 kudos

Resolved! On-Premise SQL Server Ingestion to Databricks Bronze Layer

Hello everyone! I want to ingest tables with schemas from an on-premise SQL Server into the Databricks Bronze layer with Delta Live Tables, and I want to do it using Azure Data Factory. I want the load to be a Snapshot batch load, not an incremental lo...

Data Engineering

Latest Reply

Anonymous
Not applicable

05-31-2023 8:18:18 PM

1 kudos

Hi @Parsa Bahraminejad, Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best an...

5 More Replies

by dbx_8451 • New Contributor II

06-22-2023 7:49:14 PM

1797 Views
3 replies
0 kudos

How to set permissions on Databricks jobs that are created and run from Azure Data Factory (ADF)?

I would like to set permissions on jobs that run from ADF, such as granting "CAN_VIEW" or "CAN_MANAGE" to specific groups. It appears that we need to set the permissions in the pipeline where the job runs from ADF, but I could not figure it out.

Data Engineering

Latest Reply

dbx_8451
New Contributor II

06-23-2023 5:49:15 AM

0 kudos

Thank you @Debayan Mukherjee and @Vidula Khanna for getting back to me. But, it didn't help my case. I am specifically looking for setting permissions to the job so that our team can see the job cluster including Spark UI with that privilege. ...

2 More Replies

by timothy_uk • New Contributor III

06-14-2023 10:12:37 AM

561 Views
1 replies
1 kudos

Mysterious simultaneous long-running Databricks Workflows

Hi, This happened across 4x seemingly unrelated workflows at the same time of day, and all 4x workflows eventually completed successfully. It appeared that all workflows sat idling despite triggering via the Jobs API. The two symptoms I have observed...

Data Engineering

Latest Reply

Anonymous
Not applicable

06-17-2023 2:28:49 AM

1 kudos

Hi @Timothy Lin Great to meet you, and thanks for your question! Let's see if your peers in the community have an answer to your question. Thanks.

by RonanStokes_DB • New Contributor III

06-08-2021 10:06:15 AM

1346 Views
1 replies
1 kudos

Can you apply a specific cluster policy when launching a Databricks job via Azure Data Factory

When using Azure Data Factory to coordinate the launch of Databricks jobs - can you specify which cluster policy to apply to the job, either explicitly or implicitly?

Data Engineering

Latest Reply

mvandeborne
New Contributor II

06-16-2023 5:46:40 AM

1 kudos

You can, but not from ADF's UI. You need to edit the JSON of the linked service, adding a 'policyId' parameter in the 'typeProperties' object, pointing to the cluster policy ID from Databricks (which you can find in the Databricks URL).
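
The JSON edit described in this reply could be sketched in Python roughly as follows. This is only an illustration: the helper name `with_cluster_policy`, the sample linked service definition, and the bracketed values are placeholders, and it assumes `typeProperties` sits under `properties` as in exported ADF linked service JSON. Only the `policyId` key inside `typeProperties` comes from the reply itself.

```python
import copy
import json


def with_cluster_policy(linked_service: dict, policy_id: str) -> dict:
    """Return a copy of a linked service definition with 'policyId' set.

    Hypothetical helper: it only adds 'policyId' under
    properties.typeProperties and leaves the input untouched.
    """
    updated = copy.deepcopy(linked_service)
    type_properties = updated.setdefault("properties", {}).setdefault("typeProperties", {})
    type_properties["policyId"] = policy_id
    return updated


# Placeholder definition; a real linked service carries more fields
# (authentication, cluster sizing, and so on).
example = {
    "name": "AzureDatabricksLinkedService",
    "properties": {
        "type": "AzureDatabricks",
        "typeProperties": {
            "domain": "https://<workspace-url>",
            "newClusterVersion": "<runtime-version>",
        },
    },
}

patched = with_cluster_policy(example, "<cluster-policy-id>")
print(json.dumps(patched, indent=2))
```

The printed JSON is what would be pasted back into the linked service's code view; the policy ID placeholder has to be replaced with the real ID from the Databricks workspace.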

by selvakumar092 • New Contributor II

06-15-2023 1:48:18 AM

3181 Views
5 replies
0 kudos

Resolved! Incremental Load without Last Modified Date and Primary Key field in Azure Data Factory to create bronze data in Databricks

I am trying to do an incremental load in Azure Data Factory. Most of the tables in the Oracle database don't have a last modified date or a primary key column. Is there any way to do incremental loading without a last modified date and primary key column?

Data Engineering

Latest Reply

Anonymous
Not applicable

06-15-2023 8:16:42 PM

0 kudos

Hi @Selva Kumar Ponnusamy, Hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please te...

4 More Replies

by William_Scardua • Valued Contributor

05-30-2023 4:51:10 PM

797 Views
1 replies
0 kudos

Repos changed my notebook format

Hi guys, I have some notebooks in Repos, but I noticed that Repos changed my notebook format to .py. Because of this, my Azure Data Factory no longer recognizes the notebook (.py). Any idea how to convert that .py back to Databricks format?

Data Engineering

Latest Reply

-werners-
Esteemed Contributor III

06-06-2023 11:37:13 PM

0 kudos

That is odd. Repos is merely another location (linked to Git). You can copy/paste the code inside the .py file into a notebook, or convert it using online tools or Python libraries (like py2ipynb).
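
As a rough illustration of the conversion idea in this reply, here is a standard-library sketch that does not use py2ipynb. It assumes the .py file is in Databricks source format, where the first line is `# Databricks notebook source` and cells are separated by `# COMMAND ----------` lines. The function name `source_to_ipynb` is hypothetical, and `# MAGIC` lines are left as ordinary comments rather than turned into Markdown cells.

```python
import json

HEADER = "# Databricks notebook source"
SEPARATOR = "# COMMAND ----------"


def source_to_ipynb(py_source: str) -> dict:
    """Split Databricks source-format text into a minimal nbformat 4 notebook dict."""
    lines = py_source.splitlines()
    if lines and lines[0].strip() == HEADER:
        lines = lines[1:]  # drop the format marker line

    cells, current = [], []
    for line in lines + [SEPARATOR]:  # trailing sentinel flushes the last cell
        if line.strip() == SEPARATOR:
            text = "\n".join(current).strip("\n")
            if text:  # skip empty cells
                cells.append({
                    "cell_type": "code",
                    "execution_count": None,
                    "metadata": {},
                    "outputs": [],
                    "source": text.splitlines(keepends=True),
                })
            current = []
        else:
            current.append(line)

    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}


sample = "# Databricks notebook source\nx = 1\n\n# COMMAND ----------\n\nprint(x)\n"
notebook = source_to_ipynb(sample)
print(json.dumps(notebook, indent=1))
```

Writing the returned dict to a file ending in `.ipynb` with `json.dump` gives something that can be imported into a workspace as a notebook. Whether that resolves the ADF issue from the question is not confirmed in the thread.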

by PhaniKumar • New Contributor

05-20-2023 3:34:24 AM

484 Views
0 replies
0 kudos

Databricks MountPoints Need to be refreshed when created by an ADF Job cluster?

In my particular use case, the creation of the mount is initiated through a notebook activity in Azure Data Factory (ADF). This activity utilizes a job cluster for the current execution. However, it has come to my attention that the mounts generated ...

Data Engineering

by killjoy • New Contributor III

04-11-2023 4:06:17 AM

3555 Views
2 replies
0 kudos

Unexpected failure while fetching notebook - What can we do from our side?

Hello! We have some pipelines running in Azure Data Factory that call Databricks Notebooks to run data transformations. This morning at 6:21 AM (UTC) we got an error "Unexpected failure while fetching notebook" inside a notebook that calls another one...

Data Engineering

Latest Reply

Anonymous
Not applicable

04-17-2023 6:29:27 AM

0 kudos

@Rita Fernandes: Based on the error message you provided, it seems like the issue might be related to the version mismatch between the ANTLR tool used for code generation and the current runtime version. Additionally, the error message suggests that...

1 More Replies

by Jkb • New Contributor II

02-23-2023 1:46:00 AM

1338 Views
2 replies
2 kudos

Resolved! Workflow triggered by CLI shown "manually" triggered

We trigger different Workflows from ADF. These workflows are shown as triggered "manually". Is this behaviour intentional? At least for users, this is confusing. ADF-triggered Run: Databricks-Workflows:

Data Engineering

Latest Reply

Kaniz
Community Manager

03-02-2023 6:33:47 AM

2 kudos

Hi @J. G., Thank you for your question! To assist you better, please take a moment to review the answer and let me know if it best fits your needs. Please help us select the best solution by clicking on "Select As Best" if it does. Your feedback will...

1 More Replies

by killjoy • New Contributor III

02-14-2023 1:30:21 AM

3729 Views
7 replies
0 kudos

Resolved! Pipeline failed while calling Databricks Notebook - Cluster Terminated

Hello, We have an Azure Data Factory pipeline running during the night, and one of the activities calls a Databricks Notebook with dynamic DatabricksInstancePoolId, ClusterVersion and Workers. Yesterday, it failed with the following error: Cluster...

Data Engineering

Latest Reply

jose_gonzalez
Moderator

02-23-2023 2:14:55 PM

0 kudos

Hi @Rita Fernandes, What are you trying to install in your init script? Only the ODBC driver, or some other libraries/dependencies?

6 More Replies

by Azure_databric1 • New Contributor II

12-29-2022 1:18:23 AM

928 Views
1 replies
2 kudos

How to find the road distance between two cities? We can use Azure Databricks and Azure Maps.

We will be given an Excel file with the columns sender_city and destination_city. We have to find the distance between these two cities, and the calculated distance should be written to a column total_distance. All these processes should be...

Data Engineering

Latest Reply

sher
Valued Contributor II

01-02-2023 9:12:30 AM

2 kudos

Hey, without using latitude and longitude it is hard to work out, but you can try a distance-based algorithm.
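
One common distance-based algorithm of the kind this reply hints at is the haversine formula. The sketch below assumes each city has already been resolved to latitude and longitude. It returns the great-circle ("as the crow flies") distance, which is not the road distance asked about; that would need a routing service. The function name and the sample coordinates are illustrative and not taken from the thread.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Illustrative coordinates (approximate city centres).
paris = (48.8566, 2.3522)
london = (51.5074, -0.1278)
print(round(haversine_km(*paris, *london), 1))
```

The result can serve as a lower bound or sanity check for total_distance, since a road route between the same two points is normally longer than the straight-line figure.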
