Data Engineering

Forum Posts

irfanaziz
by Contributor II
  • 18030 Views
  • 7 replies
  • 8 kudos

Resolved! How to merge small parquet files into a single parquet file?

I have thousands of Parquet files with the same schema, and each has one or more records. Reading these files with Spark is very slow. Is there a way to merge the files before reading them with Spark? Or is there any ...

Latest Reply
mmore500
New Contributor II
  • 8 kudos

Give [*joinem*](https://github.com/mmore500/joinem) a try, available via PyPI: `python3 -m pip install joinem`. *joinem* provides a CLI for fast, flexible concatenation of tabular data using [polars](https://pola.rs). I/O is *lazily streamed* in order ...
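
As a library-level alternative to a dedicated CLI, here is a minimal sketch (not joinem's API) that groups the small files into batches of roughly a target size and streams each batch into one larger Parquet file with pyarrow. It assumes pyarrow is installed, that all files share one schema as the question states, and that the files sit flat in a single directory; `list_parquet_files`, `plan_batches` and `merge_batch` are hypothetical helper names.

```python
import os
from typing import List, Tuple


def list_parquet_files(directory: str) -> List[Tuple[str, int]]:
    """Return (path, size_in_bytes) for every .parquet file directly in `directory`."""
    return [
        (entry.path, entry.stat().st_size)
        for entry in os.scandir(directory)
        if entry.is_file() and entry.name.endswith(".parquet")
    ]


def plan_batches(files: List[Tuple[str, int]], target_bytes: int) -> List[List[str]]:
    """Greedily group files (sorted by path) into batches of roughly target_bytes.

    A file larger than target_bytes ends up alone in its own batch.
    """
    batches: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for path, size in sorted(files):
        # Close the current batch if adding this file would overshoot the target.
        if current and current_size + size > target_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(path)
        current_size += size
    if current:
        batches.append(current)
    return batches


def merge_batch(paths: List[str], out_path: str) -> None:
    """Stream the files of one batch into a single Parquet file (requires pyarrow)."""
    import pyarrow.parquet as pq  # imported lazily so batch planning works without pyarrow

    writer = None
    try:
        for path in paths:
            table = pq.read_table(path)
            if writer is None:
                # Assumes every input shares the first file's schema.
                writer = pq.ParquetWriter(out_path, table.schema)
            writer.write_table(table)
    finally:
        if writer is not None:
            writer.close()
```

For example, `for i, batch in enumerate(plan_batches(list_parquet_files("/data/small"), 128 * 1024 * 1024)): merge_batch(batch, f"/data/compacted/part-{i:05d}.parquet")` (the output directory must already exist), after which Spark reads the compacted directory instead. The paths and the 128 MiB target are illustrative; the point is that Spark then opens a handful of files rather than thousands.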

MartinH
by New Contributor II
  • 2349 Views
  • 6 replies
  • 3 kudos

Azure Data Factory and Photon

Hello, we have Databricks Python notebooks accessing Delta tables. These notebooks are scheduled/invoked by Azure Data Factory. How can I enable Photon on the linked services that are used to call Databricks? If I specify a new job cluster, there does n...

Latest Reply
CharlesReily
New Contributor III
  • 3 kudos

When you create a cluster on Databricks, you can enable Photon by selecting the "Photon" option in the cluster configuration settings. This is typically done when creating a new cluster, and you would find the option in the advanced cluster configura...
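
If the ADF linked service does not expose a Photon option, one workaround (an assumption, not something confirmed in this thread) is to submit the run through the Databricks Jobs API, for example from an ADF Web activity, and request Photon in the cluster spec via `runtime_engine`. The sketch below only builds the request body for `POST /api/2.1/jobs/runs/submit`; the Spark version, node type and notebook path are placeholders, and `photon_job_cluster` / `submit_run_payload` are hypothetical helpers.

```python
import json


def photon_job_cluster(spark_version: str, node_type_id: str, num_workers: int) -> dict:
    """Build a new_cluster spec that asks for the Photon engine."""
    return {
        "spark_version": spark_version,  # must be a Photon-capable runtime
        "node_type_id": node_type_id,
        "num_workers": num_workers,
        "runtime_engine": "PHOTON",
    }


def submit_run_payload(notebook_path: str, cluster: dict) -> str:
    """JSON body for a one-time run of a single notebook on a new cluster."""
    return json.dumps({
        "run_name": "adf-photon-run",  # hypothetical run name
        "tasks": [{
            "task_key": "main",
            "notebook_task": {"notebook_path": notebook_path},
            "new_cluster": cluster,
        }],
    })
```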

rubenesanchez
by New Contributor II
  • 2793 Views
  • 4 replies
  • 0 kudos

How to dynamically pass a string parameter to a Delta Live Tables pipeline when calling it from Azure Data Factory using the REST API

I want to pass some context information to the Delta Live Tables pipeline when calling it from Azure Data Factory. I know the body of the API call supports a Full Refresh parameter, but I wonder if I can add my own custom parameters and how this can be re...

Latest Reply
Manjula_Ganesap
Contributor
  • 0 kudos

@rubenesanchez  - Did you find a solution to your problem? I have the same question. 
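
The thread does not show an answer. One workaround, assuming only one caller at a time: write the context values into the pipeline's `configuration` map (saved with the pipeline edit endpoint, `PUT /api/2.0/pipelines/{pipeline_id}`, which takes the full settings), then start an update; code inside the pipeline can read each value with `spark.conf.get("<key>")`. The sketch builds the merged settings and the start-update request with the standard library only and sends nothing; `with_configuration` and `start_update_request` are hypothetical helpers.

```python
import json
import urllib.request


def with_configuration(settings: dict, params: dict) -> dict:
    """Return a copy of the pipeline settings with params merged into `configuration`."""
    updated = dict(settings)
    config = dict(settings.get("configuration", {}))
    config.update({key: str(value) for key, value in params.items()})  # values stored as strings
    updated["configuration"] = config
    return updated


def start_update_request(host: str, token: str, pipeline_id: str,
                         full_refresh: bool = False) -> urllib.request.Request:
    """Build (but do not send) POST /api/2.0/pipelines/{pipeline_id}/updates."""
    body = json.dumps({"full_refresh": full_refresh}).encode("utf-8")
    return urllib.request.Request(
        url=f"{host}/api/2.0/pipelines/{pipeline_id}/updates",
        data=body,
        method="POST",
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
    )
```

Because the configuration is stored on the pipeline itself, concurrent ADF runs would overwrite each other's values, so this is a sketch rather than a true per-run parameter mechanism.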

Chakra
by New Contributor II
  • 826 Views
  • 1 replies
  • 1 kudos

Create a job cluster with a Docker image in Azure Data Factory

Is there a way to create a job cluster in Azure Data Factory with a Docker image, either through the API or the UI?

Latest Reply
m_szklarczyk
New Contributor II
  • 1 kudos

Has anyone figured out how to run a custom image as job compute from ADF?
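
One route that avoids the linked service entirely, sketched here under assumptions: define the job once through the Jobs API (`POST /api/2.1/jobs/create`) with a `docker_image` on its job cluster, then have ADF trigger it with `POST /api/2.1/jobs/run-now` and the returned `job_id` (for example from a Web activity). Databricks Container Services may need to be enabled for the workspace first. The image URL, runtime and node type are placeholders, and `docker_job_payload` is a hypothetical helper that only builds the create-job body.

```python
from typing import Optional


def docker_job_payload(name: str, notebook_path: str, image_url: str,
                       spark_version: str, node_type_id: str, num_workers: int,
                       registry_user: Optional[str] = None,
                       registry_password: Optional[str] = None) -> dict:
    """Body for a job whose single notebook task runs on a custom-image job cluster."""
    cluster = {
        "spark_version": spark_version,
        "node_type_id": node_type_id,
        "num_workers": num_workers,
        "docker_image": {"url": image_url},
    }
    if registry_user:
        # Credentials for a private registry; prefer a secret over a literal password.
        cluster["docker_image"]["basic_auth"] = {
            "username": registry_user,
            "password": registry_password or "",
        }
    return {
        "name": name,
        "job_clusters": [{"job_cluster_key": "custom_image", "new_cluster": cluster}],
        "tasks": [{
            "task_key": "main",
            "job_cluster_key": "custom_image",
            "notebook_task": {"notebook_path": notebook_path},
        }],
    }
```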

Enzo_Bahrami
by New Contributor III
  • 2873 Views
  • 6 replies
  • 1 kudos

Resolved! On-Premise SQL Server Ingestion to Databricks Bronze Layer

Hello everyone! I want to ingest tables with schemas from an on-premises SQL Server into the Databricks Bronze layer with Delta Live Tables. I want to do it using Azure Data Factory, and I want the load to be a snapshot batch load, not an incremental lo...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Parsa Bahraminejad, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best an...
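
For the snapshot (full) load asked about in the question, a common pattern, assumed here rather than taken from the thread, is to have the ADF Copy activity land each full extract in a dated folder such as `snapshot_date=YYYY-MM-DD` and let the bronze step read only the newest one. This minimal sketch picks the latest snapshot folder from a directory listing; the folder naming convention and `latest_snapshot` are hypothetical.

```python
import re
from typing import Iterable, Optional

# Hypothetical folder convention written by the ADF Copy activity.
_SNAPSHOT = re.compile(r"^snapshot_date=(\d{4}-\d{2}-\d{2})$")


def latest_snapshot(folder_names: Iterable[str]) -> Optional[str]:
    """Return the most recent snapshot folder name, or None if there is none."""
    dated = [(m.group(1), name) for name in folder_names if (m := _SNAPSHOT.match(name))]
    return max(dated)[1] if dated else None
```

ISO-formatted dates sort lexicographically, so comparing the strings is enough, and folders that don't match the pattern (for example `_SUCCESS` markers) are ignored.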

dbx_8451
by New Contributor II
  • 1797 Views
  • 3 replies
  • 0 kudos

How to set permissions on Databricks jobs that are created and run from Azure Data Factory (ADF)?

I would like to set permissions on jobs that run from ADF, such as granting "CAN_VIEW" or "CAN_MANAGE" to specific groups. It appears that we need to set permissions in the pipeline where the job runs from ADF, but I could not figure it out.

Latest Reply
dbx_8451
New Contributor II
  • 0 kudos

Thank you @Debayan Mukherjee and @Vidula Khanna for getting back to me, but it didn't help in my case. I am specifically looking for setting permissions on the job so that our team can see the job cluster, including the Spark UI, with that privilege. ...
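
If ADF triggers an existing job by its ID, permissions on that job can be granted with the Databricks Permissions API (`PATCH /api/2.0/permissions/jobs/{job_id}`), which adds or updates entries rather than replacing the whole list. Runs that ADF submits as one-time runs are not jobs in the workspace, so this sketch may not cover them. It only builds the request with the standard library; the host, token, job ID and group name are placeholders, and `job_permissions_request` is a hypothetical helper.

```python
import json
import urllib.request
from typing import Dict

# Job permission levels that can be granted to a group.
VALID_LEVELS = {"CAN_VIEW", "CAN_MANAGE_RUN", "CAN_MANAGE"}


def job_permissions_request(host: str, token: str, job_id: int,
                            group_levels: Dict[str, str]) -> urllib.request.Request:
    """Build (but do not send) PATCH /api/2.0/permissions/jobs/{job_id}."""
    for group, level in group_levels.items():
        if level not in VALID_LEVELS:
            raise ValueError(f"unsupported permission level {level!r} for {group!r}")
    acl = [{"group_name": g, "permission_level": lvl} for g, lvl in sorted(group_levels.items())]
    body = json.dumps({"access_control_list": acl}).encode("utf-8")
    return urllib.request.Request(
        url=f"{host}/api/2.0/permissions/jobs/{job_id}",
        data=body,
        method="PATCH",
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
    )
```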

timothy_uk
by New Contributor III
  • 561 Views
  • 1 replies
  • 1 kudos

Mysterious simultaneous long-running Databricks Workflows

Hi, this happened across four seemingly unrelated workflows at the same time of day, and all four eventually completed successfully. It appeared that all the workflows sat idle despite being triggered via the Jobs API. The two symptoms I have observed...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Timothy Lin, great to meet you, and thanks for your question! Let's see if your peers in the community have an answer to your question. Thanks.
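
No answer appears in the thread. If "idling" means the runs sat in the `PENDING` state (for example while waiting for cluster resources), one way to catch it next time is to poll active runs (e.g. `GET /api/2.1/jobs/runs/list?active_only=true`) and flag any that stay pending too long. This sketch assumes each run dict carries `run_id`, `start_time` (epoch milliseconds) and `state.life_cycle_state`, as in Jobs API run objects; `runs_stuck_pending` and the threshold are hypothetical.

```python
from typing import Iterable, List


def runs_stuck_pending(runs: Iterable[dict], now_ms: int, threshold_minutes: float) -> List[int]:
    """Return run_ids still in PENDING more than threshold_minutes after start_time."""
    limit_ms = threshold_minutes * 60_000
    return [
        run["run_id"]
        for run in runs
        if run.get("state", {}).get("life_cycle_state") == "PENDING"
        and now_ms - run.get("start_time", now_ms) > limit_ms
    ]
```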

RonanStokes_DB
by New Contributor III
  • 1346 Views
  • 1 replies
  • 1 kudos

Can you apply a specific cluster policy when launching a Databricks job via Azure Data Factory?

When using Azure Data Factory to coordinate the launch of Databricks jobs - can you specify which cluster policy to apply to the job, either explicitly or implicitly?

Latest Reply
mvandeborne
New Contributor II
  • 1 kudos

You could, but not from ADF's UI. You need to edit the JSON of the linked service, adding a 'policyId' parameter in the 'typeProperties' object that points to the cluster policy ID from Databricks (which you can find in the Databricks URL).
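
The edit described in the reply can also be scripted, for example against a linked-service JSON exported from ADF. A minimal sketch, assuming the usual `properties.typeProperties` layout of an `AzureDatabricks` linked service; `add_policy_id` is a hypothetical helper and the policy ID is a placeholder.

```python
import json


def add_policy_id(linked_service_json: str, policy_id: str) -> str:
    """Return the linked-service JSON with properties.typeProperties.policyId set."""
    doc = json.loads(linked_service_json)
    type_props = doc.setdefault("properties", {}).setdefault("typeProperties", {})
    type_props["policyId"] = policy_id  # ID of the Databricks cluster policy
    return json.dumps(doc, indent=2)
```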

selvakumar092
by New Contributor II
  • 3181 Views
  • 5 replies
  • 0 kudos

Resolved! Incremental load without a last-modified date or primary key field in Azure Data Factory to create bronze data in Databricks

I am trying to do an incremental load in Azure Data Factory. Most of the tables in the Oracle database don't have a last-modified date or a primary key column. Is there any way to do incremental loading without a last-modified date and primary key column?

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Selva Kumar Ponnusamy, hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please te...
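
Without a key or a change-tracking column, one fallback (a swapped-in technique, not something the thread confirms) is to hash each full row and compare the hash multiset of the new extract with the previous one: new hashes are inserts, vanished hashes are deletes, and an updated row shows up as one of each. The sketch uses plain Python tuples; at scale the same idea would be expressed in Spark on a hash of all columns. `row_hash` and `diff_snapshots` are hypothetical helpers.

```python
import hashlib
import json
from collections import Counter
from typing import Dict, Iterable, List, Sequence, Tuple


def row_hash(row: Sequence[object]) -> str:
    """SHA-256 over all column values; JSON keeps None distinct from the string 'None'."""
    encoded = json.dumps(list(row), default=str, ensure_ascii=False)
    return hashlib.sha256(encoded.encode("utf-8")).hexdigest()


def _index(rows: Iterable[Sequence[object]]) -> Tuple[Counter, Dict[str, tuple]]:
    """Count each row hash and keep one example row per hash."""
    counts: Counter = Counter()
    sample: Dict[str, tuple] = {}
    for row in rows:
        h = row_hash(row)
        counts[h] += 1
        sample.setdefault(h, tuple(row))
    return counts, sample


def diff_snapshots(previous: Iterable[Sequence[object]],
                   current: Iterable[Sequence[object]]) -> Tuple[List[tuple], List[tuple]]:
    """Return (inserted_rows, deleted_rows) between two full extracts."""
    prev_counts, prev_rows = _index(previous)
    curr_counts, curr_rows = _index(current)
    # Counter subtraction keeps only positive differences, so duplicates are handled.
    inserted = [curr_rows[h] for h, n in (curr_counts - prev_counts).items() for _ in range(n)]
    deleted = [prev_rows[h] for h, n in (prev_counts - curr_counts).items() for _ in range(n)]
    return inserted, deleted
```

The trade-off is that every run still reads the full source table; hashing only reduces what is written downstream. Counting hashes instead of using a set preserves exact duplicate rows, which tables without a primary key can contain.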

William_Scardua
by Valued Contributor
  • 797 Views
  • 1 replies
  • 0 kudos

Repos changed my notebook format

Hi guys, I have some notebooks in Repos, but I noticed that Repos changed my notebook format to .py. Because of this, my Azure Data Factory no longer recognizes the notebook (.py). Any idea how to convert that .py file to the Databricks notebook format?

Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

That is odd. Repos is merely another location (linked to Git). You can copy/paste the code inside the .py file into a notebook, or convert the files using online tools or Python libraries (like py2ipynb).
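
Another option, assuming the files were originally Databricks notebooks: in Databricks' source format a Python notebook is a .py file whose first line is `# Databricks notebook source`, with cells separated by `# COMMAND ----------` lines. Restoring that header in the repo should let Databricks open the file as a notebook again; whether ADF then resolves the path is worth verifying. This is a minimal sketch, and `to_notebook_source` and `convert_file` are hypothetical helpers.

```python
from pathlib import Path

NOTEBOOK_HEADER = "# Databricks notebook source"


def to_notebook_source(py_source: str) -> str:
    """Prepend the Databricks notebook header unless the source already has it."""
    if py_source.startswith(NOTEBOOK_HEADER):
        return py_source
    return f"{NOTEBOOK_HEADER}\n{py_source}"


def convert_file(path: str) -> bool:
    """Rewrite a .py file in place as notebook source; return True if it changed."""
    file = Path(path)
    original = file.read_text(encoding="utf-8")
    converted = to_notebook_source(original)
    if converted != original:
        file.write_text(converted, encoding="utf-8")
    return converted != original
```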

killjoy
by New Contributor III
  • 3555 Views
  • 2 replies
  • 0 kudos

Unexpected failure while fetching notebook - What can we do from our side?

Hello! We have some pipelines running in Azure Data Factory that call Databricks notebooks to run data transformations. This morning at 6:21 AM (UTC) we got the error "Unexpected failure while fetching notebook" inside a notebook that calls another one...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Rita Fernandes: Based on the error message you provided, it seems the issue might be related to a version mismatch between the ANTLR tool used for code generation and the current runtime version. Additionally, the error message suggests that...
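
If the fetch failure is transient, one defensive measure (an assumption; the thread does not confirm the cause) is to retry the child-notebook call with exponential backoff. `run_with_retry` is a hypothetical helper; inside a Databricks notebook you would pass it something like `lambda: dbutils.notebook.run("./child_notebook", 3600, {})`, where the path and timeout are placeholders.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retry(call: Callable[[], T], attempts: int = 3, base_delay_s: float = 5.0,
                   sleep: Callable[[float], None] = time.sleep) -> T:
    """Invoke `call`, retrying with exponential backoff; re-raise the last error."""
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except Exception:
            if attempt == attempts:
                raise
            sleep(base_delay_s * 2 ** (attempt - 1))  # 5 s, 10 s, 20 s, ...
    raise ValueError("attempts must be >= 1")
```

Catching every exception also retries genuine failures in the child notebook, so keep `attempts` small or narrow the `except` clause to the errors you have actually seen.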

Jkb
by New Contributor II
  • 1338 Views
  • 2 replies
  • 2 kudos

Resolved! Workflow triggered by CLI shown as "manually" triggered

We trigger different workflows from ADF. These workflows are shown as triggered "manually". Is this behaviour intentional? At least for users, this is confusing. ADF-triggered run vs. Databricks workflows:

[Screenshots: ADF_Monitor, manually1, manually2]
Latest Reply
Kaniz
Community Manager
  • 2 kudos

Hi @J. G., thank you for your question! To assist you better, please take a moment to review the answer and let me know if it best fits your needs. Please help us select the best solution by clicking on "Select As Best" if it does. Your feedback will...

killjoy
by New Contributor III
  • 3729 Views
  • 7 replies
  • 0 kudos

Resolved! Pipeline failed while calling Databricks Notebook - Cluster Terminated

Hello, we have an Azure Data Factory pipeline running during the night, and one of the activities calls a Databricks notebook with a dynamic DatabricksInstancePoolId, ClusterVersion and Workers. Yesterday, it failed with the following error: Cluster...

Latest Reply
jose_gonzalez
Moderator
  • 0 kudos

Hi @Rita Fernandes, what are you trying to install in your init script? Only the ODBC driver, or some other libraries/dependencies?

Azure_databric1
by New Contributor II
  • 928 Views
  • 1 replies
  • 2 kudos

How to find the road distance between two cities? We can use Azure Databricks and Azure Maps.

We will be given an Excel file with the columns sender_city and destination_city. We have to find the distance between these two cities and write it to a column total_distance. All these processes should be...

Latest Reply
sher
Valued Contributor II
  • 2 kudos

Hey, without latitude and longitude it is hard to find out, but you can try some distance-based algorithm.
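
Once each city has coordinates (for example from a geocoding service such as Azure Maps search; not shown), a straight-line estimate is the haversine great-circle distance. This underestimates road distance; actual road distance needs a routing service such as Azure Maps' route API. The sketch uses a mean Earth radius of 6371 km; `coords` is a hypothetical city-to-(lat, lon) lookup, and the column names come from the question.

```python
import math
from typing import Dict, Iterable, Iterator, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def add_total_distance(rows: Iterable[dict],
                       coords: Dict[str, Tuple[float, float]]) -> Iterator[dict]:
    """Yield each row with a straight-line total_distance (km) between its two cities."""
    for row in rows:
        lat1, lon1 = coords[row["sender_city"]]
        lat2, lon2 = coords[row["destination_city"]]
        yield {**row, "total_distance": round(haversine_km(lat1, lon1, lat2, lon2), 1)}
```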
