Security Analysis Tool (SAT) on GCP - OSError: [Errno 5] Input/output error

GlenMacLarty
New Contributor III

I am interested to hear from anyone who has set up the Security Analysis Tool (SAT) on a GCP-hosted Databricks environment.

I am in the process of getting the tool set up and I'm running into issues with the security_analysis_initializer notebook. When the readBestPracticesConfigsFile() call attempts to read the best practices CSV from the workspace location, it fails with a file access error.
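For anyone trying to reproduce this, the failure looks roughly like the sketch below: a plain CSV read from a /Workspace path on the driver's local filesystem. This is not the SAT code itself, and the directory and file names are illustrative.

# Minimal reproduction sketch (not the actual SAT code); the path and
# CSV file name are illustrative placeholders.
import pandas as pd

csv_path = "/Workspace/Users/<user>/<path>/security-analysis-tool/configs/best_practices.csv"

# On the affected cluster this raises:
# OSError: [Errno 5] Input/output error
df = pd.read_csv(csv_path)
print(df.head())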

The error returned is:

 

OSError: [Errno 5] Input/output error: '/Workspace/Users/<user>/<path>/security-analysis-tool/notebooks/Utils'

This happens when the default best practices CSV is loaded from the workspace location. I have been able to work around it by refactoring the code to use a DBFS location instead, but I then hit the same error again when the logging utils are referenced.
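For reference, the workaround I used looks roughly like the sketch below: stage the CSV in DBFS and point the loader at the DBFS copy instead of the /Workspace path. The paths are illustrative and this is not the actual SAT layout.

# Workaround sketch: stage the best practices CSV in DBFS (e.g. upload it via
# the UI or the Databricks CLI) and read it from there instead of /Workspace.
# Paths are illustrative; spark and display are provided in Databricks notebooks.
dbfs_csv = "dbfs:/FileStore/sat/security_best_practices.csv"

df = spark.read.csv(dbfs_csv, header=True)   # Spark read from DBFS
# Or, with pandas, via the local FUSE mount:
# import pandas as pd
# df = pd.read_csv("/dbfs/FileStore/sat/security_best_practices.csv")
display(df)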

Any pointers would be greatly appreciated.

 

 

2 REPLIES

Kaniz
Community Manager

Hi @GlenMacLarty,

The error message suggests a problem accessing the file at the workspace location, possibly due to permissions or filesystem access issues.

To troubleshoot this issue, you can try the following steps:

  1. Check for any permission issues with the workspace location and ensure that the user running the SAT has sufficient permissions to access and read the files.

  2. Check that the file location and file name are correct, including the case and file extension (a quick notebook-side check is sketched after this list).

  3. Try accessing the file using the Databricks CLI or REST API to see if it is accessible from outside the notebook environment.

  4. Check if any firewall or networking policies block access to the file location.

  5. Consider using a DBFS location instead of the workspace location if this does not conflict with your security requirements.

  6. Lastly, check if any other processes or jobs are accessing the same files at the same time, which could result in a conflict.
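A quick way to run the checks from steps 1-3 inside a notebook cell is sketched below; the path is illustrative, and a CLI equivalent is noted in the comments.

# Quick notebook-side check of whether the workspace path is readable from the
# cluster at all. The path is an illustrative placeholder; fill in the real one.
import os

ws_path = "/Workspace/Users/<user>/<path>/security-analysis-tool/notebooks/Utils"

try:
    print(os.listdir(ws_path))                  # driver-local filesystem view
except OSError as e:
    print(f"Not readable from this cluster: {e}")

# The same location can also be listed through dbutils (file: prefix for the
# driver-local filesystem), or from outside the notebook with the CLI, e.g.
# `databricks workspace list /Users/<user>/<path>/security-analysis-tool/notebooks`.
print(dbutils.fs.ls(f"file:{ws_path}"))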

Hopefully, by using these tips, you can identify and resolve the issue with the SAT setup.

GlenMacLarty
New Contributor III

Thanks @Kaniz,

I was able to get past this error by recreating the cluster with an absolutely bare-bones config. It was potentially a custom configuration (unknown at this time) that was causing this to fail. I will try to reproduce it once I get some further issues sorted and will provide a summary to the community to help others who may run into similar problems.

Thanks for the tips. I did refactor to use the DBFS location, but the issue was manifesting elsewhere in the official SAT code due to the cluster misconfiguration, so resolving that was the only option to ensure I wasn't running a customised SAT setup.
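For anyone hitting the same thing, by "bare-bones config" I mean a cluster with no custom Spark confs, init scripts or environment variables; something along the lines of the sketch below (field names follow the Databricks Clusters API, values are illustrative).

# Sketch of a minimal cluster definition with no custom configuration.
# Field names follow the Databricks Clusters API; values are illustrative.
minimal_cluster = {
    "cluster_name": "sat-minimal",
    "spark_version": "13.3.x-scala2.12",   # any recent LTS runtime
    "node_type_id": "n2-highmem-4",        # GCP node type, illustrative
    "num_workers": 1,
    # Deliberately no spark_conf, init_scripts, docker_image or custom env vars.
}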
