07-24-2023 03:07 PM
Hi,
We are working on a migration project from Cloudera to Databricks.
All our code is in .py files, and we decided to keep it that way in Databricks as well, executing the files from Git through Databricks Workflows.
We have two kinds of exit functionality requirements:
1. Soft exit with sys.exit(0) - when the job criteria meet a certain condition, we do a soft exit with sys.exit(0), which terminates the job gracefully and marks it as successful.
2. Job terminate with sys.exit(1) - when the job criteria meet a certain condition, we terminate the job with sys.exit(1), which marks the job as failed.
The above behavior works as expected in any standard Python environment, but not in Databricks.
In Databricks, both sys.exit(1) and sys.exit(0) result in the job being marked as failed.
I read an article in Community and somebody mentioned that "Usage of spark.stop(), sc.stop() , System.exit() in your application can cause this behavior. Databricks manages the context shutdown on its own. Forcefully closing it can cause this abrupt behavior."
If this is true, then what is the best alternative to achieve sys.exit(0) in Databricks?
Any help would greatly be appreciated.
FYI: our code is in .py files and we don't want to use notebooks for PROD jobs.
Here is an example to execute and see the behavior:
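A minimal sketch of the pattern described above (the bucket_name condition is a hypothetical stand-in for the real job criteria; call main() from the job entry point):

```python
import sys

def main():
    bucket_name = "prod"  # hypothetical stand-in for the real job criteria
    if bucket_name == "prod":
        print("Soft exit: job should be marked successful")
        sys.exit(0)  # works as intended in plain Python, but Databricks marks the job failed
    else:
        print("Hard exit: job should be marked failed")
        sys.exit(1)
```

In plain Python, calling main() raises SystemExit(0) when the criterion is met, which the interpreter treats as a clean exit.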
07-26-2023 08:34 AM
Ramana,
I checked internally and the suggestion is to structure your code in such a way that it returns from the main function when a certain condition is met. I modified your code to what you see below.
# sys.exit(0) equivalent
def main():
    bucket_name = "prod"
    if bucket_name == "prod":
        return
    # Rest of your code

if __name__ == "__main__":
    main()
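For the sys.exit(1) side of the requirement, letting an unhandled exception propagate out of main() is a common way to have Databricks mark the job as failed. A minimal sketch, mirroring the hypothetical bucket_name criterion above:

```python
# sys.exit(1) equivalent: let an exception propagate so the job is marked failed
def main():
    bucket_name = "dev"  # hypothetical criterion, mirroring the example above
    if bucket_name != "prod":
        raise RuntimeError("Job criteria not met; failing the job")
    # Rest of your code
```

Calling main() here raises RuntimeError, which ends the run with a non-zero status and a failure message in the job output.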
07-26-2023 09:27 AM
I tested with simple code and it worked, because return statements only work inside functions, and moving all the code into a main function makes the approach work. I will test the same with the full-scale code, and if that works, I will mark this as the solution.
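As a quick check of why the wrapping matters, compile() shows that a module-level return is rejected outright (a small illustrative snippet, not from the thread):

```python
# A bare `return` is only legal inside a function; at module level it is a
# SyntaxError, which is why the script body must be wrapped in main() first.
src = "if True:\n    return\n"
try:
    compile(src, "<script>", "exec")
    outcome = "compiled"
except SyntaxError:
    outcome = "SyntaxError"
print(outcome)  # -> SyntaxError
```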
07-27-2023 07:31 AM - edited 07-27-2023 07:32 AM
I tested with different levels of nesting and it is working as expected.
Here is the sample code:
import sys

bucket_name = "prod"  # str(sys.argv[1]).lower()

def main():
    i, j = 0, 0
    while j <= 2:
        print(f"while loop iteration: {j}")
        for i in range(0, 3):
            print(f"for loop iteration: {i}")
            if bucket_name == "prod" and j == 1 and i == 1:
                print("Success. It is PROD. Exiting with 0")
                return True
                print("After return")  # unreachable; here to verify the return fires
                # Rest of your code
            else:
                # print("Fail. It is DEV. Exiting with 1")
                # sys.exit(1)
                print(f"Else: while iteration: {j} and for iteration: {i}")
                continue
            print("outside if else")
            i += 1
        print("inside for loop")
        j += 1
        print("end of for loop")
    print("end of while loop")

if __name__ == "__main__":
    main()