cancel
Showing results forย 
Search instead forย 
Did you mean:ย 
Machine Learning
Dive into the world of machine learning on the Databricks platform. Explore discussions on algorithms, model training, deployment, and more. Connect with ML enthusiasts and experts.
cancel
Showing results forย 
Search instead forย 
Did you mean:ย 

Retrieve data from multiple .mdb files using Python.

JamesBrown54
New Contributor

Hello,

I'm interested in accessing several .mdb Access files stored in either Azure Data Lake Storage (ADLS) or the Databricks File System using Python. Could you provide guidance on how to accomplish this? It would be immensely helpful if you could also share some code snippets for reference.

2 REPLIES 2

NandiniN
Databricks Employee
Databricks Employee

To access multiple .mdb (Microsoft Access) files stored in Azure Data Lake Storage (ADLS) or the Databricks File System (DBFS) using Python, you can use the pandas_access library. Below are the steps and code snippets to help you achieve this:

  1. Install the pandas_access library:

    %pip install pandas_access
  2. Import the necessary libraries and read the .mdb file:

    import pandas_access as mdb
    
    # Path to your .mdb file in DBFS
    db_filename = '/dbfs/FileStore/Campaign_Template.mdb'
    
    # Listing the tables in the .mdb file
    for tbl in mdb.list_tables(db_filename):
        print(tbl)
    
    # Reading a specific table into a DataFrame
    df = mdb.read_table(db_filename, "Campaign_Table")
    df.head()
  3. Handling paths in ADLS: If your .mdb files are stored in ADLS, you need to mount the ADLS container to DBFS first. Here is an example of how to mount an ADLS container:

    configs = {
        "fs.azure.account.auth.type": "OAuth",
        "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        "fs.azure.account.oauth2.client.id": "<client-id>",
        "fs.azure.account.oauth2.client.secret": "<client-secret>",
        "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<tenant-id>/oauth2/token"
    }
    
    dbutils.fs.mount(
        source = "abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/",
        mount_point = "/mnt/<mount-name>",
        extra_configs = configs
    )
     

    After mounting, you can access the .mdb files using the mounted path:

    db_filename = '/mnt/<mount-name>/path/to/your/file.mdb'
  4. Reading the .mdb file from the mounted path:

    import pandas_access as mdb
    
    # Path to your .mdb file in the mounted ADLS container
    db_filename = '/mnt/<mount-name>/path/to/your/file.mdb'
    
    # Listing the tables in the .mdb file
    for tbl in mdb.list_tables(db_filename):
        print(tbl)
    
    # Reading a specific table into a DataFrame
    df = mdb.read_table(db_filename, "Campaign_Table")
    df.head()
     

These steps should help you access and manipulate data from .mdb files stored in ADLS or DBFS using Python in Databricks. If you encounter any issues, ensure that the paths are correctly specified and that the necessary libraries are installed.

NandiniN
Databricks Employee
Databricks Employee

Join Us as a Local Community Builder!

Passionate about hosting events and connecting people? Help us grow a vibrant local communityโ€”sign up today to get started!

Sign Up Now