<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic AWS NAT (Network Address Translation) Automated On-demand Destruct / Create in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/aws-nat-network-address-translation-automated-on-demand-destruct/m-p/74765#M34787</link>
    <description>&lt;P&gt;Hi folks,&amp;nbsp;&lt;/P&gt;&lt;P&gt;Our company typically uses Databricks during a 12-hour block; however, the AWS NAT for elastic compute is up 24 hours a day, and I'd rather not pay for the idle hours.&lt;/P&gt;&lt;P&gt;I gather AWS Lambda and CloudWatch can be used to schedule / trigger NAT destruction and creation.&amp;nbsp;&lt;/P&gt;&lt;P&gt;1. Has anyone tried this with success, and can you provide guidance on best practice here?&lt;BR /&gt;2. Are there any important considerations to bear in mind (e.g., will removing the NAT also destroy attached route tables / security groups / Elastic IP allocation)?&lt;BR /&gt;&lt;BR /&gt;Thank you.&lt;/P&gt;</description>
    <pubDate>Tue, 18 Jun 2024 02:14:49 GMT</pubDate>
    <dc:creator>csmcpherson</dc:creator>
    <dc:date>2024-06-18T02:14:49Z</dc:date>
    <item>
      <title>AWS NAT (Network Address Translation) Automated On-demand Destruct / Create</title>
      <link>https://community.databricks.com/t5/data-engineering/aws-nat-network-address-translation-automated-on-demand-destruct/m-p/74765#M34787</link>
      <description>&lt;P&gt;Hi folks,&amp;nbsp;&lt;/P&gt;&lt;P&gt;Our company typically uses Databricks during a 12-hour block; however, the AWS NAT for elastic compute is up 24 hours a day, and I'd rather not pay for the idle hours.&lt;/P&gt;&lt;P&gt;I gather AWS Lambda and CloudWatch can be used to schedule / trigger NAT destruction and creation.&amp;nbsp;&lt;/P&gt;&lt;P&gt;1. Has anyone tried this with success, and can you provide guidance on best practice here?&lt;BR /&gt;2. Are there any important considerations to bear in mind (e.g., will removing the NAT also destroy attached route tables / security groups / Elastic IP allocation)?&lt;BR /&gt;&lt;BR /&gt;Thank you.&lt;/P&gt;</description>
      <pubDate>Tue, 18 Jun 2024 02:14:49 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/aws-nat-network-address-translation-automated-on-demand-destruct/m-p/74765#M34787</guid>
      <dc:creator>csmcpherson</dc:creator>
      <dc:date>2024-06-18T02:14:49Z</dc:date>
    </item>
    <item>
      <title>Re: AWS NAT (Network Address Translation) Automated On-demand Destruct / Create</title>
      <link>https://community.databricks.com/t5/data-engineering/aws-nat-network-address-translation-automated-on-demand-destruct/m-p/78880#M35631</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/9"&gt;@Retired_mod&lt;/a&gt;&amp;nbsp;&lt;BR /&gt;Thanks for your reply.&amp;nbsp;&lt;/P&gt;&lt;DIV&gt;&lt;BR /&gt;I created some Lambda functions to execute the NAT delete / create approach, factoring in route tables, elastic IP details and security groups per the forum guide.&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&lt;BR /&gt;However, there is a problem with Databricks not being able to connect to the EC2 resources - the clusters can initiate EC2 instance start up, but cannot connect to the resource, or even terminate it, and Databricks (DLT and compute) is constantly "waiting for resource", even though the instance is running in AWS.&lt;BR /&gt;&lt;BR /&gt;Is there anything that I may have missed?&lt;BR /&gt;&lt;BR /&gt;Lambda functions are below:&lt;BR /&gt;== delete lambda ==&lt;BR /&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="csmcpherson_0-1721090211510.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/9579i15BCF1A021C1C80E/image-size/medium/is-moderation-mode/true?v=v2&amp;amp;px=400" role="button" title="csmcpherson_0-1721090211510.png" alt="csmcpherson_0-1721090211510.png" /&gt;&lt;/span&gt;&lt;P&gt;== create lambda ==&lt;/P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="csmcpherson_0-1721090500932.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/9582iA363A23FD7CFD431/image-size/medium/is-moderation-mode/true?v=v2&amp;amp;px=400" role="button" title="csmcpherson_0-1721090500932.png" alt="csmcpherson_0-1721090500932.png" /&gt;&lt;/span&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="csmcpherson_2-1721090284289.png" style="width: 400px;"&gt;&lt;img 
src="https://community.databricks.com/t5/image/serverpage/image-id/9581iBF888C3EA69F4D3A/image-size/medium/is-moderation-mode/true?v=v2&amp;amp;px=400" role="button" title="csmcpherson_2-1721090284289.png" alt="csmcpherson_2-1721090284289.png" /&gt;&lt;/span&gt;&lt;/DIV&gt;</description>
      <pubDate>Tue, 16 Jul 2024 00:42:31 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/aws-nat-network-address-translation-automated-on-demand-destruct/m-p/78880#M35631</guid>
      <dc:creator>csmcpherson</dc:creator>
      <dc:date>2024-07-16T00:42:31Z</dc:date>
    </item>
    <item>
      <title>Re: AWS NAT (Network Address Translation) Automated On-demand Destruct / Create</title>
      <link>https://community.databricks.com/t5/data-engineering/aws-nat-network-address-translation-automated-on-demand-destruct/m-p/81244#M36253</link>
      <description>&lt;P&gt;For interest, this is how I ended up solving the situation, with pointers from AWS support:&lt;BR /&gt;&amp;lt;&amp;lt; CREATE NAT &amp;gt;&amp;gt;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;import boto3
import logging
from datetime import datetime

ec2 = boto3.client('ec2')
cloudwatch = boto3.client('logs')

def lambda_handler(event, context):
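    allocation_id = 'YOUR_ELASTIC_IP_ALLOCATION_ID'  # replace with your Elastic IP allocation ID
    subnet_id = 'YOUR_SUBNET_ID'  # replace with the subnet that hosts the NAT Gateway
    route_table_id = 'YOUR_ROUTE_TABLE_ID'  # replace with the route table to update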

    response = ec2.create_nat_gateway(
        AllocationId=allocation_id,
        SubnetId=subnet_id
    )

    nat_gateway_id = response['NatGateway']['NatGatewayId']
    waiter = ec2.get_waiter('nat_gateway_available')
    
    try:
        print(f"Waiting for NAT Gateway {nat_gateway_id} to become available...")
        waiter.wait(
            NatGatewayIds=[nat_gateway_id],
            # Poll every 15 s, up to 40 times (10 minutes); the Lambda timeout must allow for this.
            WaiterConfig={'Delay': 15, 'MaxAttempts': 40}
        )
        print(f"NAT Gateway {nat_gateway_id} is now available.")
        update_route_table(route_table_id, nat_gateway_id)
    except Exception as e:
        print(f"Error waiting for NAT Gateway to become available: {e}")

    try:
        # The create response may not include the public IP yet, so fall back to None.
        nat_gateway_ip = response['NatGateway']['NatGatewayAddresses'][0]['PublicIp']
    except (KeyError, IndexError) as e:
        print(f"{type(e).__name__}: {e}. The NAT Gateway response does not contain the expected key.")
        nat_gateway_ip = None


    log_nat_gateway_details(nat_gateway_id, nat_gateway_ip, 'Created')

    return {
        'statusCode': 200,
        'body': f'NAT Gateway {nat_gateway_id} created with IP {nat_gateway_ip}'
    }
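
# Optional sketch (an assumption, not part of the original post): the create_nat_gateway
# response may not include 'PublicIp' yet, so the IP can instead be read from a
# describe_nat_gateways entry once the waiter has succeeded.
# extract_public_ip is a hypothetical helper name.
def extract_public_ip(nat_gateway):
    """Return the first public IP in a NatGateway dict, or None if absent."""
    for address in nat_gateway.get('NatGatewayAddresses', []):
        public_ip = address.get('PublicIp')
        if public_ip:
            return public_ip
    return None

# Possible usage inside lambda_handler, after waiter.wait(...) succeeds:
#   described = ec2.describe_nat_gateways(NatGatewayIds=[nat_gateway_id])
#   nat_gateway_ip = extract_public_ip(described['NatGateways'][0])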

def update_route_table(route_table_id, nat_gateway_id):
    try:
        # Describe the existing routes in the route table
        route_table = ec2.describe_route_tables(RouteTableIds=[route_table_id])
        routes = route_table['RouteTables'][0]['Routes']
        
        # Check if a route for 0.0.0.0/0 exists and update it
        route_exists = False
        for route in routes:
            # Use .get(): IPv6 and prefix-list routes have no 'DestinationCidrBlock' key
            if route.get('DestinationCidrBlock') == '0.0.0.0/0':
                route_exists = True
                ec2.replace_route(
                    RouteTableId=route_table_id,
                    DestinationCidrBlock='0.0.0.0/0',
                    NatGatewayId=nat_gateway_id
                )
                print(f"Route updated in route table {route_table_id} destination 0.0.0.0/0 to point to NAT Gateway {nat_gateway_id}.")
                break

        # If no existing route for 0.0.0.0/0, create a new route
        if not route_exists:
            ec2.create_route(
                RouteTableId=route_table_id,
                DestinationCidrBlock='0.0.0.0/0',
                NatGatewayId=nat_gateway_id
            )
            print(f"New route created in route table {route_table_id} destination 0.0.0.0/0 to point to NAT Gateway {nat_gateway_id}.")
    except Exception as e:
        logging.error(f"Error updating route table: {e}")

def log_nat_gateway_details(nat_gateway_id, nat_gateway_ip, action):
    log_group = 'NATGatewayLogs'
    log_stream = 'NATGatewayActions'

    try:
        cloudwatch.create_log_group(logGroupName=log_group)
    except cloudwatch.exceptions.ResourceAlreadyExistsException:
        pass

    try:
        cloudwatch.create_log_stream(logGroupName=log_group, logStreamName=log_stream)
    except cloudwatch.exceptions.ResourceAlreadyExistsException:
        pass

    timestamp = int(datetime.now().timestamp() * 1000)
    message = f'{action} NAT Gateway: {nat_gateway_id}, IP: {nat_gateway_ip} at {datetime.now()}'

    cloudwatch.put_log_events(
        logGroupName=log_group,
        logStreamName=log_stream,
        logEvents=[
            {
                'timestamp': timestamp,
                'message': message
            }
        ]
    )&lt;/LI-CODE&gt;&lt;P&gt;&amp;lt;&amp;lt; DELETE NAT &amp;gt;&amp;gt;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;import boto3
import logging
from datetime import datetime

ec2 = boto3.client('ec2')
cloudwatch = boto3.client('logs')

def lambda_handler(event, context):
    nat_gateway_id = get_available_nat_gateway_id()

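    if not nat_gateway_id:
        # Nothing to delete: return early so no misleading "Deleted" entry is logged.
        print("No available NAT Gateway found.")
        return {
            'statusCode': 200,
            'body': 'No available NAT Gateway found'
        }

    # Deletion is asynchronous. The Elastic IP allocation and the route table are kept;
    # the 0.0.0.0/0 route stays as a blackhole until the create Lambda replaces it.
    ec2.delete_nat_gateway(NatGatewayId=nat_gateway_id)
    print(f"NAT Gateway {nat_gateway_id} deletion initiated.")

    log_nat_gateway_details(nat_gateway_id, 'Deleted')

    return {
        'statusCode': 200,
        'body': f'NAT Gateway {nat_gateway_id} deletion initiated'
    }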

def get_available_nat_gateway_id():
    response = ec2.describe_nat_gateways(
        Filters=[
            {
                'Name': 'state',
                'Values': ['available']
            }
        ]
    )
    for nat_gateway in response['NatGateways']:
        return nat_gateway['NatGatewayId']
    return None
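
# Optional sketch (an assumption, not part of the original post): the function above returns
# the first 'available' NAT Gateway in the region, which could be the wrong one if the account
# has several. select_nat_gateway_id is a hypothetical helper that narrows the choice to one VPC.
def select_nat_gateway_id(nat_gateways, vpc_id):
    """Return the ID of the first available NAT Gateway in vpc_id, or None."""
    for nat_gateway in nat_gateways:
        if nat_gateway.get('VpcId') == vpc_id and nat_gateway.get('State') == 'available':
            return nat_gateway['NatGatewayId']
    return None

# Possible usage: select_nat_gateway_id(response['NatGateways'], 'YOUR_VPC_ID')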

def log_nat_gateway_details(nat_gateway_id, action):
    log_group = 'NATGatewayLogs'
    log_stream = 'NATGatewayActions'

    try:
        cloudwatch.create_log_group(logGroupName=log_group)
    except cloudwatch.exceptions.ResourceAlreadyExistsException:
        pass

    try:
        cloudwatch.create_log_stream(logGroupName=log_group, logStreamName=log_stream)
    except cloudwatch.exceptions.ResourceAlreadyExistsException:
        pass

    timestamp = int(datetime.now().timestamp() * 1000)
    message = f'{action} NAT Gateway: {nat_gateway_id} at {datetime.now()}'

    cloudwatch.put_log_events(
        logGroupName=log_group,
        logStreamName=log_stream,
        logEvents=[
            {
                'timestamp': timestamp,
                'message': message
            }
        ]
    )&lt;/LI-CODE&gt;</description>
      <pubDate>Wed, 31 Jul 2024 01:10:05 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/aws-nat-network-address-translation-automated-on-demand-destruct/m-p/81244#M36253</guid>
      <dc:creator>csmcpherson</dc:creator>
      <dc:date>2024-07-31T01:10:05Z</dc:date>
    </item>
  </channel>
</rss>

