I have thousands of parquet files having same schema and each has 1 or more records. But reading with spark these files is very very slow. I want to know if there is any solution how to merge the files before reading them with spark? Or is there any ...
Give [*joinem*](https://github.com/mmore500/joinem) a try, available via PyPi: `python3 -m pip install joinem`.*joinem* provides a CLI for fast, flexbile concatenation of tabular data using [polars](https://pola.rs).I/O is *lazily streamed* in order ...
Hello, we have Databricks Python workbooks accessing Delta tables. These workbooks are scheduled/invoked by Azure Data Factory. How can I enable Photon on the linked services that are used to call Databricks?If I specify new job cluster, there does n...
When you create a cluster on Databricks, you can enable Photon by selecting the "Photon" option in the cluster configuration settings. This is typically done when creating a new cluster, and you would find the option in the advanced cluster configura...
I want to pass some context information to the delta live tables pipeline when calling from Azure Data Factory. I know the body of the API call supports Full Refresh parameter but I wonder if I can add my own custom parameters and how this can be re...
Hello everyone!So I want to ingest tables with schemas from the on-premise SQL server to Databricks Bronze layer with Delta Live Table and I want to do it using Azure Data Factory and I want the load to be a Snapshot batch load, not an incremental lo...
Hi @Parsa Bahraminejad​ Thank you for posting your question in our community! We are happy to assist you.To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best an...
I would like to set the permissions to jobs such as granting "CAN_VIEW" or "CAN_MANAGE" to specific groups that run from ADF. It appears that we need to set permissions in pipe line where job runs from ADF, But I could not figure it out. ​​
Thank you @Debayan Mukherjee​ and @Vidula Khanna​ for getting back to me. But, it didn't help my case. I am specifically looking for setting permissions to the job so that our team can see the job cluster including Spark UI with that privilege. ...
Hi,This happened across 4x seemingly unrelated workflows at the same time of the day - all 4x workflows eventually completed successfully. It appeared that all workflows sat idling despite triggering via the Jobs API. The two symptoms I have observed...
When using Azure Data Factory to coordinate the launch of Databricks jobs - can you specify which cluster policy to apply to the job, either explicitly or implicitly?
you could, but not from ADF's UI. You need to edit the json of the linked service, adding a 'policyId' parameter in the 'typeProperties' object, pointing to the cluster policy ID from Databricks (which you could find in Databricks' URL).
I am trying to do incremental load in azure data factory. Most of the tables in the Oracle database doesn't have last modified date and Primary key column. Is there any way to do incremental loading without last modified date and primary key column?
Hi @Selva Kumar Ponnusamy​ Hope everything is going great.Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please te...
Hi guys,I have some notebooks with REPOS but I noticed that REPOS changed my notebook format to .py because of this my Azure Data Factory no longer recognizes the notebook (.py)Have any ideia to convert that .py to databricks format ?
that is odd. repos is merely another location (linked to git).You can copy/paste the code inside the py file into a notebook, or convert them using online tools or python libraries (like py2ipynb).
In my particular use case, the creation of the mount is initiated through a notebook activity in Azure Data Factory (ADF). This activity utilizes a job cluster for the current execution. However, it has come to my attention that the mounts generated ...
Hello!We got some pipelines running in Azure Data Factory that call Databricks Notebooks to run data transformations. This morning at 6:21 AM (UTC) we got an error " Unexpected failure while fetching notebook" inside a notebook that calls another one...
@Rita Fernandes​ :Based on the error message you provided, it seems like the issue might be related to the version mismatch between the ANTLR tool used for code generation and the current runtime version. Additionally, the error message suggests that...
We trigger different Worflows by ADF.These workflows will be shown triggered "manually".Is this behaviour intentional? At least for users, this is confusing.ADF-triggered Run: Databricks-Workflows:
Hi @J. G.​, Thank you for your question! To assist you better, please take a moment to review the answer and let me know if it best fits your needs.Please help us select the best solution by clicking on "Select As Best" if it does. Your feedback will...
Hello,We have an Azure Data Factory pipeline running during the night, and one of the activities calls a Databricks Notebook with dynamic DatabricksInstancePoolId, ClusterVersion and Workers. Yesterday, it failed with with the following error:Cluster...
We will be given an excel file, in which we can see column sender_city and destination_city. We have to find the distance between these two cities and the distance calculated should be written in a column total_distance. All these processes should be...