Email Extraction
a week ago
Hi, hope you are doing well. I was trying to extract a specific email attachment from Outlook and write it to a DBFS location, but something went wrong. Could you please help? Below is the code which I used.
a week ago
If you face issues with IMAP, consider using the Microsoft Graph API for email access. It works with Outlook mailboxes without requiring you to handle IMAP details, and it authenticates with OAuth2 tokens instead of mailbox passwords.
Below is a sample script, which I haven't tested:
pip install msal
import base64
import os

import requests
from msal import ConfidentialClientApplication

# Azure AD App Credentials
CLIENT_ID = os.getenv("CLIENT_ID")  # Client ID from Azure App Registration
CLIENT_SECRET = os.getenv("CLIENT_SECRET")  # Client Secret from Azure App
TENANT_ID = os.getenv("TENANT_ID")  # Tenant ID
EMAIL_ADDRESS = "your-email@company.com"

# Microsoft Graph API URL
GRAPH_API_ENDPOINT = "https://graph.microsoft.com/v1.0"


# Authentication
def get_access_token():
    app = ConfidentialClientApplication(
        CLIENT_ID,
        authority=f"https://login.microsoftonline.com/{TENANT_ID}",
        client_credential=CLIENT_SECRET,
    )
    token_response = app.acquire_token_for_client(scopes=["https://graph.microsoft.com/.default"])
    if "access_token" in token_response:
        return token_response["access_token"]
    else:
        raise Exception(f"Failed to get access token: {token_response}")


# Get emails with attachments
def get_emails_with_attachments():
    access_token = get_access_token()
    headers = {"Authorization": f"Bearer {access_token}"}
    # Fetch the first 10 emails that have attachments
    response = requests.get(
        f"{GRAPH_API_ENDPOINT}/users/{EMAIL_ADDRESS}/messages?$filter=hasAttachments eq true&$top=10",
        headers=headers,
    )
    response.raise_for_status()
    emails = response.json()["value"]
    for email in emails:
        print(f"Email Subject: {email['subject']}")
        email_id = email["id"]
        download_attachments(email_id, headers)


# Download attachments
def download_attachments(email_id, headers):
    # An app-only token has no signed-in user, so address the mailbox
    # through /users/{EMAIL_ADDRESS} rather than /me
    response = requests.get(
        f"{GRAPH_API_ENDPOINT}/users/{EMAIL_ADDRESS}/messages/{email_id}/attachments",
        headers=headers,
    )
    response.raise_for_status()
    attachments = response.json()["value"]
    for attachment in attachments:
        # Only file attachments carry contentBytes
        if "contentBytes" in attachment:
            filename = attachment["name"]
            file_data = attachment["contentBytes"]
            file_path = f"/tmp/{filename}"
            # Save locally first; contentBytes is base64-encoded, so decode it
            with open(file_path, "wb") as f:
                f.write(base64.b64decode(file_data))
            print(f"Saved attachment: {filename}")
            # Upload to DBFS (dbutils is only available inside Databricks)
            dbfs_path = f"dbfs:/tmp/{filename}"
            dbutils.fs.cp(f"file:{file_path}", dbfs_path)
            print(f"Uploaded to DBFS: {dbfs_path}")


# Run the download
get_emails_with_attachments()
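Since the question is about one specific attachment, and Graph returns a file attachment's `contentBytes` as a base64-encoded string, a small helper could pick attachments by file name and decode them before they are copied to DBFS. This is only a rough sketch that runs on made-up sample data rather than a live mailbox. The function name `decode_matching_attachments`, the glob-style `name_pattern`, and the sample list are illustrative assumptions, not part of the script above.

```python
import base64
import fnmatch
import tempfile
from pathlib import Path


def decode_matching_attachments(attachments, name_pattern, out_dir):
    """Decode and save file attachments whose name matches name_pattern.

    `attachments` is assumed to be the "value" list of a Graph
    attachments response. Returns the list of paths written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for attachment in attachments:
        name = attachment.get("name", "")
        # Entries without a name or without contentBytes are skipped.
        if not name or "contentBytes" not in attachment:
            continue
        # Case-insensitive glob match, e.g. "*.csv" or "invoice_*.pdf".
        if not fnmatch.fnmatch(name.lower(), name_pattern.lower()):
            continue
        # Use only the final path component of the attachment name.
        target = out_dir / Path(name).name
        target.write_bytes(base64.b64decode(attachment["contentBytes"]))
        written.append(target)
    return written


# Made-up sample data shaped like a Graph attachments response.
sample = [
    {"name": "report.csv", "contentBytes": base64.b64encode(b"a,b\n1,2\n").decode("ascii")},
    {"name": "logo.png", "contentBytes": base64.b64encode(b"\x89PNG").decode("ascii")},
    {"name": "forwarded message"},
]

with tempfile.TemporaryDirectory() as tmp:
    paths = decode_matching_attachments(sample, "*.csv", tmp)
    print([p.name for p in paths])
```

In a Databricks notebook, each returned path could then be copied with `dbutils.fs.cp(f"file:{path}", f"dbfs:/tmp/{path.name}")`.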
Another approach, if you are on Azure, is to use Logic Apps. Have a look here: https://bakshiharsh55.medium.com/save-e-mail-attachment-to-blob-storage-utilizing-azure-logic-app-9d...