To access multiple .mdb (Microsoft Access) files stored in Azure Data Lake Storage (ADLS) or the Databricks File System (DBFS) using Python, you can use the pandas_access
library. Below are the steps and code snippets to help you achieve this:
-
Install the pandas_access
library:
%pip install pandas_access
-
Import the necessary libraries and read the .mdb file:
import pandas_access as mdb
db_filename = '/dbfs/FileStore/Campaign_Template.mdb'
for tbl in mdb.list_tables(db_filename):
print(tbl)
df = mdb.read_table(db_filename, "Campaign_Table")
df.head()
-
Handling paths in ADLS: If your .mdb files are stored in ADLS, you need to mount the ADLS container to DBFS first. Here is an example of how to mount an ADLS container:
configs = {
"fs.azure.account.auth.type": "OAuth",
"fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
"fs.azure.account.oauth2.client.id": "<client-id>",
"fs.azure.account.oauth2.client.secret": "<client-secret>",
"fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<tenant-id>/oauth2/token"
}
dbutils.fs.mount(
source = "abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/",
mount_point = "/mnt/<mount-name>",
extra_configs = configs
)
After mounting, you can access the .mdb files using the mounted path:
db_filename = '/mnt/<mount-name>/path/to/your/file.mdb'
-
Reading the .mdb file from the mounted path:
import pandas_access as mdb
db_filename = '/mnt/<mount-name>/path/to/your/file.mdb'
for tbl in mdb.list_tables(db_filename):
print(tbl)
df = mdb.read_table(db_filename, "Campaign_Table")
df.head()
These steps should help you access and manipulate data from .mdb files stored in ADLS or DBFS using Python in Databricks. If you encounter any issues, ensure that the paths are correctly specified and that the necessary libraries are installed.