Hi all,
tl;dr
I ran the following on a Docker-backed personal compute instance (running DBR 13.3 LTS):
```sql
%sql
USE CATALOG hail;
USE SCHEMA volumes_testing;
CREATE VOLUME 1kg
COMMENT 'Testing 1000 Genomes volume';
```
But this gives:
```
ParseException: [UC_VOLUMES_NOT_ENABLED] Support for Unity Catalog Volumes is not enabled on this instance
```
Full version
I am trying to use Hail 0.2 with Databricks.
Although not officially supported, I have built a Docker container that appears to support Spark 3.4.1 with the latest Hail branch (with a patch or two). The code is here for those interested:
docker-who/repositories/hail/0.2.126--spark-3.4.1-patch at main · umccr/docker-who (github.com)
The container is publicly available here - https://github.com/umccr/docker-who/pkgs/container/hail
This Docker image is based on Databricks Runtime 13.3 LTS.
I have a personal compute cluster with the following configuration, using the Docker image above:
```json
{
  "num_workers": 0,
  "cluster_name": "hail",
  "spark_version": "13.3.x-scala2.12",
  "spark_conf": {
    "spark.databricks.cluster.profile": "singleNode",
    "spark.master": "local[*, 4]"
  },
  "aws_attributes": {
    "first_on_demand": 1,
    "availability": "ON_DEMAND",
    "zone_id": "auto",
    "spot_bid_price_percent": 100,
    "ebs_volume_count": 0
  },
  "node_type_id": "i3.xlarge",
  "driver_node_type_id": "i3.xlarge",
  "ssh_public_keys": [],
  "custom_tags": {
    "ResourceClass": "SingleNode"
  },
  "spark_env_vars": {},
  "autotermination_minutes": 20,
  "enable_elastic_disk": true,
  "init_scripts": [],
  "docker_image": {
    "url": "ghcr.io/umccr/hail:latest"
  },
  "single_user_name": "alexis.lucattini@umccr.org",
  "policy_id": "D063D7FAC000009E",
  "enable_local_disk_encryption": false,
  "data_security_mode": "SINGLE_USER",
  "runtime_engine": "STANDARD",
  "cluster_id": "1203-223920-6d7l06ne"
}
```
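As a sanity check, the parts of this spec that Unity Catalog cares about can be inspected programmatically — a minimal sketch in plain Python over an abridged copy of the JSON above (no Databricks API assumed):

```python
import json

# Abridged copy of the cluster spec shown above; only the fields
# relevant to Unity Catalog access are kept.
cluster_spec = json.loads("""
{
  "spark_version": "13.3.x-scala2.12",
  "data_security_mode": "SINGLE_USER",
  "docker_image": {"url": "ghcr.io/umccr/hail:latest"},
  "runtime_engine": "STANDARD"
}
""")

# Unity Catalog (and therefore Volumes) requires a UC-capable access
# mode: "SINGLE_USER" or "USER_ISOLATION". Mine is SINGLE_USER, so the
# access mode itself should not be the blocker.
assert cluster_spec["data_security_mode"] in ("SINGLE_USER", "USER_ISOLATION")

# The remaining unusual field is the custom Databricks Container
# Services image, which is my main suspect.
print("custom image:", cluster_spec["docker_image"]["url"])
```

So on paper the access mode looks fine; the custom container image is the one non-standard element.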
I then ran the following code (derived from the Hail MatrixTable tutorial):
```python
# Imports
import hail as hl
from tempfile import TemporaryDirectory
from pathlib import Path

# Initialisation
hl.init(
    sc,
    idempotent=True,
    quiet=True,
    skip_logging_configuration=True
)

# TMP Information
TMP_DATA_OBJ = TemporaryDirectory()
TMP_DATA_PATH = Path(TMP_DATA_OBJ.name)

# Download the 1000 Genomes tutorial data
hl.utils.get_1kg(str(TMP_DATA_PATH) + "/")

# Import VCF
hl.import_vcf(
    str(TMP_DATA_PATH / '1kg.vcf.bgz')
).write(
    str(TMP_DATA_PATH / '1kg.mt'),
    overwrite=True
)

# Read Matrix Table
mt = hl.read_matrix_table(
    str(TMP_DATA_PATH / '1kg.mt')
)
```

Then, in a separate cell:

```sql
%sql
USE CATALOG hail;
USE SCHEMA volumes_testing;
CREATE VOLUME 1kg
COMMENT 'Testing 1000 Genomes volume';
```
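For context on why I want a Volume at all: the tutorial data above sits in a driver-local TemporaryDirectory, so it disappears on cleanup (or cluster termination). A minimal stand-in, no Hail required, illustrating that lifetime:

```python
from pathlib import Path
from tempfile import TemporaryDirectory

# Same pattern as the notebook above: a driver-local scratch directory.
tmp_obj = TemporaryDirectory()
tmp_path = Path(tmp_obj.name)

# Stand-in for hl.utils.get_1kg() downloading 1kg.vcf.bgz.
vcf_path = tmp_path / "1kg.vcf.bgz"
vcf_path.touch()
assert vcf_path.exists()

# Once the TemporaryDirectory is cleaned up (or the cluster goes away),
# the data is gone -- hence wanting to persist into a UC Volume.
tmp_obj.cleanup()
assert not vcf_path.exists()
print("scratch data is ephemeral")
```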
But this gives:
```
ParseException: [UC_VOLUMES_NOT_ENABLED] Support for Unity Catalog Volumes is not enabled on this instance
```
I thought Volumes were available in Public Preview on DBR 13.2 and above?
Announcing Public Preview of Volumes in Databricks Unity Catalog | Databricks Blog
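If the gating were purely on runtime version, 13.3 should qualify — a quick sketch of that version check (my own helper for illustration, not a Databricks API):

```python
def dbr_supports_volumes(spark_version: str) -> bool:
    """Return True if the DBR version string is >= 13.2 (per the blog post)."""
    # e.g. "13.3.x-scala2.12" -> (13, 3)
    major, minor = spark_version.split(".")[:2]
    return (int(major), int(minor)) >= (13, 2)

assert dbr_supports_volumes("13.3.x-scala2.12")      # my cluster's version
assert not dbr_supports_volumes("12.2.x-scala2.12")  # pre-Volumes runtime
print("13.3 passes the version gate")
```

So the runtime version itself should be fine, which leaves the custom container image as the open question.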
Is it because my cluster uses a custom Docker image?
Can someone point me to the documentation on this if I have missed it?
Alexis