Tag Archives: AWS Transfer for SFTP

AWS Transfer Family SFTP connectors now support VPC-based connectivity

Post Syndicated from Betty Zheng (郑予彬) original https://aws.amazon.com/blogs/aws/aws-transfer-family-sftp-connectors-now-support-vpc-based-connectivity/

Many organizations rely on the Secure File Transfer Protocol (SFTP) as the industry standard for exchanging critical business data. Traditionally, securely connecting to private SFTP servers required custom infrastructure, manual scripting, or exposing endpoints to the public internet.

Today, AWS Transfer Family SFTP connectors now support connectivity to remote SFTP servers through Amazon Virtual Private Cloud (Amazon VPC) environments. You can transfer files between Amazon Simple Storage Service (Amazon S3) and private or public SFTP servers while applying the security controls and network configurations already defined in your VPC. This capability helps you integrate data sources across on-premises environments, partner-hosted private servers, or internet-facing endpoints, with the operational simplicity of a fully managed Amazon Web Services (AWS) service.

New capabilities with SFTP connectors
The following are the key enhancements:

  • Connect to private SFTP servers – SFTP connectors can now reach endpoints that are only accessible within your AWS VPC connection. These include servers hosted in your VPC or a shared VPC, on-premises systems connected over AWS Direct Connect, and partner-hosted servers connected through VPN tunnels.
  • Security and compliance – All file transfers are routed through the security controls already applied in your VPC, such as AWS Network Firewall or centralized ingress and egress inspection. Private SFTP servers remain private and don’t need to be exposed to the internet. You can also present static Elastic IP or bring your own IP (BYOIP) addresses to meet partner allowlist requirements.
  • Performance and simplicity – By using your own network resources such as NAT gateways, AWS Direct Connect or VPN connections, connectors can take advantage of higher bandwidth capacity for large-scale transfers. You can configure connectors in minutes through the AWS Management Console,  AWS Command Line Interface (AWS CLI), or AWS SDKs without building custom scripts or third-party tools.

How VPC- based SFTP connections work
SFTP connectors use Amazon VPC Lattice resources to establish secure connectivity through your VPC. Key constructs include a resource configuration and a resource gateway. The resource configuration represents the target SFTP server, which you specify using a private IP address or public DNS name. The resource gateway provides SFTP connector access to these configurations, enabling file transfers to flow through your VPC and its security controls.

The following architecture diagram illustrates how traffic flows between Amazon S3 and remote SFTP servers. As shown in the architecture, traffic flows from Amazon S3 through the SFTP connector into your VPC. A resource gateway is the entry point that handles inbound connections from the connector to your VPC resources. Outbound traffic is routed through your configured egress path, using Amazon VPC NAT gateways with Elastic IPs for public servers or AWS Direct Connect and VPN connections for private servers. You can use existing IP addresses from your VPC CIDR range, simplifying partner server allowlists. Centralized firewalls in the VPC enforce security policies, and customer-owned NAT gateways provide higher bandwidth for large-scale transfers.

When to use this feature
With this capability, developers and IT administrators can simplify workflows while meeting security and compliance requirements across a range of scenarios:

  • Hybrid environments – Transfer files between Amazon S3 and on-premises SFTP servers using AWS Direct Connect or AWS Site-to-Site VPN, without exposing endpoints to the internet.
  • Partner integrations – Connect with business partners’ SFTP servers that are only accessible through private VPN tunnels or shared VPCs. This avoids building custom scripts or managing third-party tools, reducing operational complexity.
  • Regulated industries – Route file transfers through centralized firewalls and inspection points in VPCs to comply with financial services, government, or healthcare security requirements.
  • High-throughput transfers – Use your own network configurations such as NAT gateways, AWS Direct Connect, or VPN connections with Elastic IP or BYOIP to handle large-scale, high-bandwidth transfers while retaining IP addresses already on partner allowlists.
  • Unified file transfer solution – Standardize on Transfer Family for both internal and external SFTP connectivity, reducing fragmentation across file transfer tools.

Start building with SFTP connectors
To begin transferring files with SFTP connectors through my VPC environment, I follow these steps:

First, I configure my VPC Lattice resources. In the Amazon VPC console, under PrivateLink and Lattice in the navigation pane, I choose Resource gateways, choose Create resource gateway to create one to act as the ingress point into my VPC. Next, under PrivateLink and Lattice in the navigation pane, I choose Resource configuration and choose Create resource configuration to create a resource configuration for my target SFTP server. Specify the private IP address or public DNS name, and the port (typically 22).

Then, I configure AWS Identity and Access Management (IAM) permissions. I ensure that the IAM role used for connector creation has transfer:* permissions, and VPC Lattice permissions (vpc-lattice:CreateServiceNetworkResourceAssociation, vpc-lattice:GetResourceConfiguration, vpc-lattice:AssociateViaAWSService). I update the trust policy on the IAM role to specify transfer.amazonaws.com as a trusted principal. This enables AWS Transfer Family to assume the role when creating and managing my SFTP connectors.

After that, I create an SFTP connector through the AWS Transfer Family console. I choose SFTP Connectors and then choose Create SFTP connector. In the Connector configuration section, I select VPC Lattice as the egress type, then provide the Amazon Resource Name (ARN) of the Resource Configuration, Access role, and Connector credentials. Optionally, include a trusted host key for enhanced security, or override the default port if my SFTP server uses a nonstandard port.

Next, I test the connection. On the Actions menu, I choose Test connection to confirm that the connector can reach the target SFTP server.

Finally, after the connector status is ACTIVE, I can begin file operations with my remote SFTP server programmatically by calling Transfer Family APIs such as StartDirectoryListing, StartFileTransfer, StartRemoteDelete, or StartRemoteMove. All traffic is routed through my VPC using my configured resources such as NAT gateways, AWS Direct Connect, or VPN connections together with my IP addresses and security controls.

For the complete set of options and advanced workflows, refer to the AWS Transfer Family documentation.

Now available

SFTP connectors with VPC-based connectivity are now available in 21 AWS Regions. Check the AWS Services by Region for the latest supported AWS Regions. You can now securely connect AWS Transfer Family SFTP connectors to private, on-premises, or internet-facing servers using your own VPC resources such as NAT gateways, Elastic IPs, and network firewalls.

Betty

Use AWS Glue to streamline SFTP data processing

Post Syndicated from Seun Akinyosoye original https://aws.amazon.com/blogs/big-data/use-aws-glue-to-streamline-sftp-data-processing/

In today’s data-driven world, seamless integration and transformation of data across diverse sources into actionable insights is paramount. AWS Glue is a serverless data integration service that helps analytics users to discover, prepare, move, and integrate data from multiple sources for analytics, machine learning (ML), and application development. With AWS Glue, you can discover and connect to hundreds of diverse data sources and manage your data in a centralized data catalog. It enables you to visually create, run, and monitor extract, transform, and load (ETL) pipelines to load data into your data lakes.

In this blog post, we explore how to use the SFTP Connector for AWS Glue from the AWS Marketplace to efficiently process data from Secure File Transfer Protocol (SFTP) servers into Amazon Simple Storage Service (Amazon S3), further empowering your data analytics and insights.

Introducing the SFTP connector for AWS Glue

The SFTP connector for AWS Glue simplifies the process of connecting AWS Glue jobs to extract data from SFTP storage and to load data into SFTP storage. This connector provides comprehensive access to SFTP storage, facilitating cloud ETL processes for operational reporting, backup and disaster recovery, data governance, and more.

Solution overview

In this example, you use AWS Glue Studio to connect to an SFTP server, then enrich that data and upload it to Amazon S3. The SFTP connector is used to manage the connection to the SFTP server. You will load the event data from the SFTP site, join it to the venue data stored on Amazon S3, apply transformations, and store the data in Amazon S3. The event and venue files are from the TICKIT dataset.

The TICKIT dataset tracks sales activity for the fictional TICKIT website, where users buy and sell tickets online for sporting events, shows, and concerts. In this dataset, analysts can identify ticket movement over time, success rates for sellers, and best-selling events, venues, and seasons.

For this example, you use AWS Glue Studio to develop a visual ETL pipeline. This pipeline will read data from an SFTP server, perform transformations, and then load the transformed data into Amazon S3. The following diagram illustrates this architecture.

solution overview

By the end of this post, your visual ETL job will resemble the following screenshot.

final solution

Prerequisites

For this solution, you need the following:

  • Subscribe to the SFTP Connector for AWS Glue in the AWS Marketplace.
  • Access to an SFTP server with permissions to upload and download data.
    • If the SFTP server is hosted on Amazon Elastic Compute Cloud (Amazon EC2), we recommend that the network communication between the SFTP server and the AWS Glue job happens within the virtual private cloud (VPC) as pictured in the preceding architecture diagram. Running your Glue job within a VPC and security group will be discussed further in the steps to create the AWS Glue job.
    • If the SFTP server is hosted within your on-premises network, we recommend that the network communication between the SFTP server and the Glue job happens through VPN or AWS DirectConnect.
  • Access to an S3 bucket or the permissions to create an S3 bucket. We recommend that you connect to that bucket using a gateway endpoint. This will allow you to connect to your S3 bucket directly from your VPC. If you need to create an S3 bucket to store the results, complete the following steps:
    1. On the Amazon S3 console, choose Buckets in the navigation pane.
    2. Choose Create bucket.
    3. For Name, enter a globally unique name for your bucket; for example, tickit-use1-<accountnumber>.
    4. Choose Create bucket.
    5. For this demonstration, create a folder with the name tickit in your S3 bucket.
    6. Create the gateway endpoint.
  • Create an AWS Identity and Access Management (IAM) role for the AWS Glue ETL job. You must specify an IAM role for the job to use. The role must grant access to all resources used by the job, including Amazon S3 (for any sources, targets, scripts, and temporary directories) and AWS Secrets Manager. For instructions, see Configure an IAM role for your ETL job.

Load dataset to SFTP site

Load the allevents_pipe.txt file and venue_pipe.txt file from the TICKIT dataset to your SFTP server.

Store SFTP server sign-in credentials

An AWS Glue connection is a Data Catalog object that stores connection information, such as URI strings and location to credentials that are stored in a Secrets Manager secret.

To store the SFTP server username and password in Secrets Manager, complete the following steps:

  1. On the Secrets Manager console, choose Secrets in the navigation pane.
  2. Choose Store a new secret.
  3. Select Other type of secret.
  4. Enter host as Secret key and your SFTP server’s IP address (for example, 153.47.122) as the Secret value, then choose Add row.
  5. Enter the username as Secret key and your SFTP username as Secret value, then choose Add row.
  6. Enter password as Secret key and your SFTP password as Secret value, then choose Add row.
  7. Enter keyS3Uri as Secret Key and the Amazon S3 location of your SFTP secret key file as Secret value

Note: Secret Value is the full S3 path where the SFTP server key file is stored. For example:s3://sftp-bucket-johndoe123/id_rsa.

  1. For Secret name, enter a descriptive name, then choose Next.
  2. Choose Next to move to the review step, then choose Store.

secret value

Create a connection to the SFTP server in AWS Glue

Complete the following steps to create your connection to the SFTP server.

  1. On the AWS Glue console, under Data Catalog in the navigation pane, choose Connections.

creating sftp connection from marketplace

  1. Select the SFTP connector for AWS Glue 4.0. Then choose Create connection.

using sftp connector

  1. Enter a name for the connection and then, under Connection access, choose the Secrets Manager secret you created for you SFTP server credentials.

finishing sftp connection

Create a connection to the VPC in AWS Glue

A data connection is used to establish network connectivity between the VPC and the AWS Glue job. To create the VPC connection, complete the following steps.

  1. On the AWS Glue console page, click on Data Connections location on the left side menu.
  2. Click the Create connection button in the Connections panel.

creating connection for VPC

  1. Select Network

choosing network option

  1. Select the VPC, Subnet, and Security Group that your SFTP server resides in. Click Next.

choosing vpc, subnet, sg for connection

  1. Name the connection SFTP VPC Connect and then click

Deploy the solution

Now that we completed the prerequisites, we are going to setup the AWS Glue Studio job for this solution. We will create a glue studio job, add events and venue data from the SFTP server, carry out data transformations and load transformed data to s3.

Create your AWS Glue Studio job:

  1. On the AWS Glue console, under ETL Jobs in the navigation pane, choose Visual ETL.
  2. Select Visual ETL in the central pane.
  3. Choose the pencil icon to enter a name for your job.
  4. Choose the Job details tab.

choosing job details

  1. Scroll down to and select Advanced properties and expand.
  2. Scroll to Connections and select SFTP VPC Connect.

choosing sftp vpc connection

  1. Choose Visual to go back to the workflow editor page.

Add the events data from the SFTP server as your first data set:

  1. Choose Add nodes and select SFTP Connector for AWS Glue 4.0 on the Sources
  2. Enter the following for Data source properties for:
    1. Connection: Select the connection to the SFTP server that you created in Create the connection to the SFTP server in AWS Glue.
    2. Enter the following key-value pairs:
Key Value
header false
path /files (this should be the path to the event file in your SFTP server)
fileFormat csv
delimiter |

glue studio job configuration

Rename the columns of the Event dataset:

  1. Choose Add nodes and choose Change Schema on the Transforms
  2. Enter the following transform properties:
    1. For Name, enter Rename Event data.
    2. For Node parents, select SFTP Connector for AWS Glue 4.0.
    3. In the Change Schema section, map the source keys to the target keys:
      1. col0: eventid
      2. col1: e_venueid
      3. col2: catid
      4. col3: dateid
      5. col4: eventname
      6. col5: starttime

transforming event data

Add the venue_pipe.txt file from the SFTP site:

  1. Choose Add nodes and choose SFTP Connector for AWS Glue 4.0 on the Sources
  2. Enter the following for Data source properties for:
    1. Connection: Select the connection to the SFTP server that you created in Create the connection to the SFTP server in AWS Glue.
    2. Enter the following key-value pairs:
Key Value
header false
path /files (this should be the path to the venue file in your SFTP site)
fileFormat csv
delimiter |

Rename the columns of the venue dataset:

  1. Choose Add nodes and choose Change Schema on the Transforms
  2. Enter the following transform properties:
    1. For Name, enter Rename Venue data.
    2. For Node parents, select Venue.
    3. In the Change Schema section, map the source keys to the target keys:
      1. col0: venueid
      2. col1: venuename
      3. col2: venuecity
      4. col3: venuestate
      5. col4: venueseats

transforming venue data

Join the venue and event datasets.

  1. Choose Add nodes and choose Join on the Transforms
  2. Enter the following transform properties:
    1. For Name, enter Join.
    2. For Node parents, select Rename Venue data and Rename Event data.
    3. For Join type¸ select Inner join.
    4. For Join conditions, select venueid for Rename Venue data and e_venueid for Rename Event data.

transform join venue and event

Drop the duplicate field:

  1. Choose Add nodes and choose Drop Fields on the Transforms
  2. Enter the following transform properties:
    1. For Name, enter Drop Fields.
    2. For Node parents, select Join.
    3. In the DropFields section, select e_venueid.

drop field transform

Load the data into your S3 bucket:

  1. Choose Add nodes and choose Amazon S3 from the Sources
  2. Enter the following transform properties:
    1. For Node parents, select Drop Fields.
    2. For Format, select CSV.
    3. For Compression Type, select None.
    4. For S3 Target Location, choose your S3 bucket and enter your desired file name followed by a slash (/).

loading data to s3 target

You can now save and run your AWS Glue visual ETL Job. Run the job and then go to the Runs tab to monitor its progress. After the job has completed, the Run status will change to Succeeded. The data will be in the target S3 bucket.

completed job

Clean up

To avoid incurring additional charges caused by resources created as part of this post, make sure you delete the items created in the AWS Account for this post:

  • Delete the Secrets Manager key created for the SFTP connector . credentials.
  • Delete the SFTP connector.
  • Unsubscribe from the SFTP Connector in AWS Marketplace.
  • Delete the data loaded to the Amazon S3 bucket and the bucket.
  • Delete the AWS Glue visual ETL job.

Conclusion

In this blog post, we demonstrated how to use the SFTP connector for AWS Glue to streamline the processing of data from SFTP servers into Amazon S3. This integration plays a pivotal role in enhancing your data analytics capabilities by offering an efficient and straightforward method to bring together disparate data sources. Whether your goal is to analyze SFTP server data for actionable insights, bolster your reporting mechanisms, or enrich your business intelligence tools, this connector ensures a more streamlined and cost-effective approach to achieving your data objectives.

For further details on the SFTP connector, see the SFTP Connector for Glue documentation.


About the Authors

Sean Bjurstrom is a Technical Account Manager in ISV accounts at Amazon Web Services, where he specializes in Analytics technologies and draws on his background in consulting to support customers on their analytics and cloud journeys. Sean is passionate about helping businesses harness the power of data to drive innovation and growth. Outside of work, he enjoys running and has participated in several marathons.

Seun Akinyosoye is a Sr. Technical Account Manager supporting public sector customer at Amazon Web Services. Seun has a background in analytics, data engineering which he uses to help customers achieve their outcomes and goals. Outside of work Seun enjoys spending time with his family, reading, traveling and supporting his favorite sports teams.

Vinod Jayendra is a Enterprise Support Lead in ISV accounts at Amazon Web Services, where he helps customers in solving their architectural, operational, and cost optimization challenges. With a particular focus on Serverless technologies, he draws from his extensive background in application development to deliver top-tier solutions. Beyond work, he finds joy in quality family time, embarking on biking adventures, and coaching youth sports team.

Kamen Sharlandjiev is a Sr. Big Data and ETL Solutions Architect, MWAA and AWS Glue ETL expert. He’s on a mission to make life easier for customers who are facing complex data integration and orchestration challenges. His secret weapon? Fully managed AWS services that can get the job done with minimal effort. Follow Kamen on LinkedIn to keep up to date with the latest MWAA and AWS Glue features and news!

Chris Scull is a Solutions Architect dealing in orchestration tools and modern cloud technologies. With two years of experience at AWS, Chris has developed an interest in Amazon Managed Workflows for Apache Airflow, which allows for efficient data processing and workflow management. Additionally, he is passionate about exploring the capabilities of GenAI with Bedrock, a platform for building generative AI applications on AWS.

Shengjie Luo is a Big data architect of Amazon Cloud Technology professional service team. Responsible for solutions consulting, architecture and delivery of AWS based data warehouse and data lake, and good at server-less computing, data migration, cloud data integration, data warehouse planning, data service architecture design and implementation.

Qiushuang Feng is a Solutions Architect at AWS, responsible for Enterprise customers’ technical architecture design, consulting, and design optimization on AWS Cloud services. Before joining AWS, Qiushuang worked in IT companies such as IBM and Oracle, and accumulated rich practical experience in development and analytics.