Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
In my project, I want the job to be terminated if it takes too long, and then retried even when there is a timeout error. In Databricks, the launched status should show "Retried by scheduler", and it should respect min_retry_interval_millis before starting the retry...
Hi, I want to set an execution termination time/timeout limit for a job in the job config file. Please help me understand how to do this by passing a parameter in the job config file.
Hi @Pratibha, you can configure optional duration thresholds for a job, including an expected completion time for the job and a maximum completion time for the job. To configure duration thresholds, click Set duration thresholds. If you are creating j...
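For setting this in the job config itself, the Jobs API exposes timeout and retry fields at the task level. A sketch of the relevant payload, with the job name and notebook path as placeholders:

```python
# Sketch of a Jobs API 2.1 task payload that enforces a timeout and retries.
# The job name and notebook path are placeholders for illustration.
job_settings = {
    "name": "example-job-with-timeout",
    "tasks": [
        {
            "task_key": "main",
            "notebook_task": {"notebook_path": "/path/to/notebook"},  # placeholder
            "timeout_seconds": 3600,             # terminate the task after 1 hour
            "max_retries": 3,                    # retry up to 3 times
            "min_retry_interval_millis": 60000,  # wait at least 60 s between retries
            "retry_on_timeout": True,            # retry even when the failure was a timeout
        }
    ],
}
```

The same dict can be sent as the body of a `POST /api/2.1/jobs/create` request or kept in a JSON job config file.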
Hello, I am wondering if there is a way in Databricks to run a job continuously except for 1 or 2 hours every night, during which the cluster could restart. We are using interactive clusters for our jobs and development in Dev and UAT. In Prod we are still...
Is it possible to avoid using Service Principal (and managing their secrets) via the Python MSAL library and, instead, use the "Access Connector for Azure Databricks" to access Azure SQL Server (just like we do for connecting to Azure Data Lake Stora...
Hi, currently all the data required resides in an Azure SQL database. We have a project in which we need to query this data on demand in Salesforce Data Cloud, to be further used for reporting in a CRMA dashboard. Do we need to move this data from Azure SQL to Delta l...
It depends. If Salesforce Data Cloud has a connector for Azure SQL (either a native one or ODBC/JDBC), you can query it directly. MS also has something like OData. AFAIK, Azure SQL does not have a query API, only APIs for DB-management purposes. If all of the above is no...
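If a direct connector is not available, the Azure SQL data can also be read from Databricks over JDBC and landed in Delta. A minimal sketch, assuming the Microsoft SQL Server JDBC driver is on the cluster; all connection values are placeholders:

```python
def read_azure_sql_table(spark, server, database, table, user, password):
    """Sketch: read an Azure SQL table into a Spark DataFrame over JDBC.

    Assumes the Microsoft SQL Server JDBC driver is available on the
    cluster; server, database, table, and credentials are placeholders.
    """
    jdbc_url = (
        f"jdbc:sqlserver://{server}.database.windows.net:1433;"
        f"database={database};encrypt=true;"
    )
    return (
        spark.read.format("jdbc")
        .option("url", jdbc_url)
        .option("dbtable", table)
        .option("user", user)
        .option("password", password)
        .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
        .load()
    )
```

The resulting DataFrame can then be written to a Delta table for Salesforce Data Cloud to pick up.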
I have around 25 GB of data in my Azure storage. I am performing data ingestion using Auto Loader in Databricks. Below are the steps I am performing: setting enableChangeDataFeed to true; reading the complete raw data using readStream; writing as del...
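The steps above can be sketched roughly as follows, assuming a Databricks runtime where the cloudFiles source is available; all paths, the source file format, and the target table name are placeholders:

```python
def start_autoloader_ingest(spark, source_path, schema_path,
                            checkpoint_path, target_table):
    """Sketch of an Auto Loader ingest with Change Data Feed enabled.

    Assumes a Databricks runtime (the cloudFiles source is not part of
    open-source Spark). Paths, the file format, and the table name are
    placeholders.
    """
    stream = (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "parquet")           # raw format (assumption)
        .option("cloudFiles.schemaLocation", schema_path)  # schema tracking location
        .load(source_path)
    )
    return (
        stream.writeStream.format("delta")
        .option("checkpointLocation", checkpoint_path)
        # set the Change Data Feed table property on the target
        .option("delta.enableChangeDataFeed", "true")
        .trigger(availableNow=True)
        .toTable(target_table)
    )
```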
Hello All, the following command is not working when run through a Databricks notebook:
%sh
# Bash code to print 'Hello, PowerShell!'
echo 'Hello, PowerShell!'
# powershell.exe -ExecutionPolicy Restricted -File /dbfs:/FileStore/Read_Vault_Inventory.ps1...
[Situation] I am using AWS DMS to store MySQL CDC in S3 as Parquet files. I have implemented a streaming pipeline using the DLT module. The target destination is Unity Catalog. [Questions and issues] - Where are the tables and materialized views specifi...
Using Databricks Runtime 12.0, when attempting to mount an Azure blob storage container, I'm getting the following exception: `IllegalArgumentException: Unsupported Azure Scheme: abfss`
dbutils.fs.mount(
    source="abfss://container@my-storage-accoun...
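The abfss scheme cannot be mounted the same way as a plain wasbs blob mount; it needs OAuth settings passed via extra_configs. A sketch under that assumption, with all credential values as placeholders and the notebook's built-in dbutils passed in:

```python
def mount_abfss(dbutils, container, storage_account, mount_point,
                client_id, client_secret, tenant_id):
    """Sketch: mount an ADLS Gen2 container over abfss using OAuth.

    All credential values are placeholders; pass the notebook's
    built-in `dbutils` object.
    """
    configs = {
        "fs.azure.account.auth.type": "OAuth",
        "fs.azure.account.oauth.provider.type":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        "fs.azure.account.oauth2.client.id": client_id,
        "fs.azure.account.oauth2.client.secret": client_secret,
        "fs.azure.account.oauth2.client.endpoint":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }
    dbutils.fs.mount(
        source=f"abfss://{container}@{storage_account}.dfs.core.windows.net/",
        mount_point=mount_point,
        extra_configs=configs,
    )
```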
I'm writing some code to perform regression testing which requires the notebook path and its default language. Based on the default language it will perform further analysis. So how can I programmatically get my notebook's default language and save it in some vari...
You can get the default language of a notebook using dbutils.notebook.get_notebook_language().
Try this example:
%python
# dbutils is available by default in Databricks notebooks; no import is needed
default_language = dbutils.notebook.get_notebook_language()
print(default_language)
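If that helper is not available on your runtime, a notebook's language is also returned by the Workspace API get-status endpoint. A sketch using only the standard library, with the workspace host, token, and notebook path as placeholders:

```python
import json
import urllib.parse
import urllib.request


def get_notebook_language(host, token, notebook_path):
    """Sketch: query /api/2.0/workspace/get-status, whose response for a
    notebook object includes a `language` field.

    `host` (e.g. "https://<workspace>.cloud.databricks.com"), `token`
    (a PAT), and `notebook_path` are placeholders.
    """
    query = urllib.parse.urlencode({"path": notebook_path})
    req = urllib.request.Request(
        f"{host}/api/2.0/workspace/get-status?{query}",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        # language is e.g. "PYTHON", "SCALA", "SQL", or "R"
        return json.load(resp)["language"]
```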
I have a dataframe that is the result of a series of transformations on big data (167 million rows), and I want to write it to Delta files and tables using the below:
try:
    (df_new.write.format('delta')
        .option("delta.minReaderVersion", "2")
        .optio...
Hi @Retired_mod, I am having the same issue: when I do an inner join on two Spark DataFrames, it runs on only a single node, and I am not sure how to modify it to run on many nodes. The same thing happens when I write 30 GB of data to a Delta table; it is almost 3...
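In cases like the one above, a join or write that runs on a single node often means the DataFrame has collapsed to one partition. A minimal sketch of repartitioning before the write, with the target path and partition count as placeholders to tune for your data size:

```python
def write_delta_in_parallel(df, target_path, num_partitions=200):
    """Sketch: repartition before writing so the work is spread across
    executors rather than a single node.

    `target_path` and `num_partitions` are placeholders; pick a
    partition count suited to your cluster and data volume.
    """
    (
        df.repartition(num_partitions)  # redistribute rows across executors
        .write.format("delta")
        .mode("overwrite")
        .save(target_path)
    )
```

Checking `df.rdd.getNumPartitions()` before the write is a quick way to confirm whether this is the bottleneck.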
Importing or cloning the .dbc folder from "advanced-data-engineering-with-databricks" into my own workspace fails with a time-out, and the folder is incomplete. How can I fix this? I tried downloading and importing the file, and also importing via URL...
I am having an issue with Databricks (Community Edition) where I can use Pandas to read a parquet file into a dataframe, but when I use Spark it states the file doesn't exist. I have tried reformatting the file path for spark but I can't seem to find...
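One common cause: pandas reads from the driver's local filesystem (where DBFS is mounted under /dbfs), while Spark resolves bare paths against DBFS itself (dbfs:/), so the same string can point at two different locations; for files that exist only on the driver's local disk, Spark needs a file:/ prefix instead. A small sketch of the /dbfs ↔ dbfs:/ translation (the example path in the usage note is a placeholder):

```python
def to_spark_path(pandas_path):
    """Convert a local /dbfs/... path (as used by pandas) into the
    dbfs:/... form that Spark expects."""
    if pandas_path.startswith("/dbfs/"):
        return "dbfs:/" + pandas_path[len("/dbfs/"):]
    return pandas_path


def to_pandas_path(spark_path):
    """Convert a dbfs:/... path into the /dbfs/... local-mount form
    that pandas (and other local-file libraries) can read."""
    if spark_path.startswith("dbfs:/"):
        return "/dbfs/" + spark_path[len("dbfs:/"):]
    return spark_path
```

For example, `spark.read.parquet(to_spark_path("/dbfs/FileStore/data.parquet"))` reads the same file pandas sees at `/dbfs/FileStore/data.parquet`.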
We are trying to retrieve the XML file name using _metadata, but it is not working. We are not able to use input_file_name() either, as we are using a shared cluster. We are reading the XML files using the com.databricks.spark.xml library.