Over the last 12 months, organizations of all types have increasingly needed to stay connected to their customers. With the move to virtual interactions accelerating across industries, email has remained a trusted channel for customer communications. Amazon Simple Email Service (SES) has seen record outbound email traffic in 2020, supporting critical customer communications during COVID and commercial moments like Black Friday and Cyber Monday.
The importance of email during COVID
Unlike real-time communications like voice or live chat, email is asynchronous. It can be read and consumed at the customer’s leisure. In some geographies like North America, email also represents an individual’s unique identity, persisting longer than mobile phone numbers or social networking accounts. Even with the importance of email established before 2020, it was important to most organizations to send only the right messages during the COVID crisis.
Many organizations voluntarily chose to decrease promotional or marketing emails during the pandemic. This decrease in sending recognized the increased stress most individuals were facing in their personal lives. However, even with the drop in marketing emails across organizational types, there was an increased need to communicate and maintain customer engagement. Most organizations went through three distinct customer communication phases with email in 2020: React, Respond, and Reimagine.
React – These were the initial emails sent to acknowledge the COVID crisis, occurring early in 2020. These emails included messages reinforcing commitment to customer health, employee safety, or communicating new cleaning protocols.
Respond – These messages often included communication on the status of the business or event. Most businesses needed to communicate their transition to remote work, temporary closures, and the cancellation of many in-person events.
Reimagine – Throughout the crisis, organizations were reimagining how to do business. Healthcare started operating video consultations, and restaurants shifted to pick up/take out only. Email communication was vital to take customers on the journey into this “new normal,” even as some businesses started to reopen.
To send these customer communications at scale, many organizations worked with Amazon SES.
How Amazon SES scaled and supported customers in 2020
Amazon SES saw several sending spikes that aligned with organizations working to communicate with their customers during COVID. Nine times in 2020, transactions per second (TPS) in Amazon SES exceeded 150% of the previous record, set on Black Friday 2019. Spikes of over 150% also occurred on Black Friday and Cyber Monday 2020.
In addition to supporting those upsurges in throughput, the Amazon SES team also responded to customer feedback on increasing the global footprint of Amazon SES. Since January, Amazon SES increased the total number of regions supported from 7 to 14, including the US government cloud. These additional regions were deployed during 2020 as the team worked remotely. This regional expansion enabled customers to adhere to local data sovereignty requirements for email sending while also improving performance.
Customers also told us they needed tools to help them manage compliance with important governance laws like CAN-SPAM and GDPR. Amazon SES released list management to help organizations manage their customers' contact information and preferences.
Looking forward
As we move into 2021, email will remain at the forefront of customer communication channels. Enterprise customers like Netflix and Duolingo rely on Amazon SES to deliver their email at scale. For more information on how you can use Amazon SES, visit our website.
Email is a popular channel for applications, used in both marketing campaigns and other outbound customer communications. The challenge with email is that it can become increasingly complex to manage for companies that must send large quantities of messages per month. This complexity is especially true when companies need to measure detailed email engagement metrics to track campaign success.
As a marketer, you want to monitor several metrics, including open rates, click-through rates, bounce rates, and delivery rates. If you do not track your email results, you could potentially be wasting your campaign resources. Monitoring and interpreting your sending results can help you deliver the best content possible to your subscribers’ inboxes, and it can also ensure that your IP reputation stays high. Mailbox providers prioritize inbox placement for senders that deliver relevant content. As a business professional, tracking your emails can also help you stay on top of hot leads and important clients. For example, if someone has opened your email multiple times in one day, it might be a good idea to send out another follow-up email to touch base.
Building a large-scale email solution is a complex and expensive challenge for any business. You would need to build infrastructure, assemble your network, and warm up your IP addresses. Alternatively, working with some third-party email solutions requires contract negotiations and upfront costs.
Fortunately, Amazon Simple Email Service (SES) has a highly scalable and reliable backend infrastructure to reduce the preceding challenges. It has improved content filtering techniques, reputation management features, and a vast array of analytics and reporting functions. These features help email senders reach their audiences and make it easier to manage email channels across applications. Amazon SES also provides API operations to monitor your sending activities through simple API calls. You can publish these events to Amazon CloudWatch or Amazon Kinesis Data Firehose, or you can receive them by using Amazon Simple Notification Service (SNS).
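For example, a quick way to pull recent account-level sending statistics with the AWS SDK for Python (Boto3) is sketched below; it is illustrative only and not part of the pipeline built later in this post.

import boto3

ses = boto3.client('ses')

# Account-level sending statistics for the last two weeks, in 15-minute buckets.
response = ses.get_send_statistics()
for point in sorted(response['SendDataPoints'], key=lambda p: p['Timestamp']):
    print(point['Timestamp'], point['DeliveryAttempts'], point['Bounces'], point['Complaints'])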
In this post, you learn how to build and automate a serverless architecture that analyzes email events. We explore how to track important metrics such as the open and click rates of your emails.
Solution overview
The metrics that you can measure using Amazon SES are referred to as email sending events. You can use Amazon CloudWatch to retrieve Amazon SES event data. You can also use Amazon SNS to interpret Amazon SES event data. However, in this post, we are going to use Amazon Kinesis Data Firehose to monitor our user sending activity.
We enable an Amazon SES configuration set with open and click metrics and publish email sending events to Amazon Kinesis Data Firehose as JSON records. A Lambda function parses the JSON records and publishes the content to an Amazon S3 bucket.
Ingested data lands in an Amazon S3 bucket that we refer to as the raw zone. To make that data available, you have to catalog its schema in the AWS Glue data catalog. You create and run an AWS Glue crawler that crawls your data sources and constructs your Data Catalog. The Data Catalog uses pre-built classifiers for many popular source formats and data types, including JSON, CSV, and Parquet.
When the crawler is finished creating the table definition and schema, you analyze the data using Amazon Athena. It is an interactive query service that makes it easy to analyze data in Amazon S3 using SQL. Point to your data in Amazon S3, define the schema, and start querying using standard SQL, with most results delivered in seconds.
Now you can build visualizations, perform ad hoc analysis, and quickly get business insights from the Amazon SES event data using Amazon QuickSight. You can easily run SQL queries using Amazon Athena on data stored in Amazon S3, and build business dashboards within Amazon QuickSight.
Deploying the architecture:
Configuring Amazon Kinesis Data Firehose to write to Amazon S3:
Navigate to Amazon Kinesis in the AWS Management Console. Choose Kinesis Data Firehose and create a delivery stream.
Enter delivery stream name as “SES_Firehose_Demo”.
Under the source category, select “Direct Put or other sources”.
On the next page, make sure to enable Data Transformation of source records with AWS Lambda. We use AWS Lambda to parse the notification contents so that we only process the information required for our use case.
Click the “Create New” Lambda function.
Click on the "General Kinesis Data Firehose Processing" Lambda blueprint; this opens the Lambda console. Enter the following values in Lambda:
Name: SES-Firehose-Json-Parser
Execution role: Create a new role with basic Lambda permissions.
Click "Create Function". Now replace the default Lambda code with a parser along the lines of the following sketch, and save the function.
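The function below is a minimal sketch of such a parser. It assumes the standard structure of Amazon SES event JSON (eventType, mail.destination, and the open/click details) and uses the output field names referenced in this post; adjust it to match your own notification contents.

import base64
import json

def lambda_handler(event, context):
    # Kinesis Data Firehose transformation: keep only a few fields from each SES event record.
    output = []
    for record in event['records']:
        payload = json.loads(base64.b64decode(record['data']))
        detail = payload.get('open') or payload.get('click') or {}
        trimmed = {
            'Eventname': payload.get('eventType'),
            'destination_Email': (payload.get('mail', {}).get('destination') or ['unknown'])[0],
            'SourceIP': detail.get('ipAddress'),
        }
        # Firehose expects base64-encoded output; the trailing newline keeps one
        # JSON object per line in Amazon S3, which the AWS Glue crawler handles well.
        data = base64.b64encode((json.dumps(trimmed) + '\n').encode('utf-8')).decode('utf-8')
        output.append({'recordId': record['recordId'], 'result': 'Ok', 'data': data})
    return {'records': output}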
For this blog, we keep only three fields: Eventname, destination_Email, and SourceIP. If you want to store other parameters, you can modify the code accordingly. For the full list of information included in these notifications, see the Amazon SES event publishing documentation.
Set the Amazon S3 prefix and error prefix for the delivery stream. In this post, the data lands under a folder named "fhbase", which the AWS Glue crawler points to later. If you use your own prefixes, make sure to update the target values in AWS Glue accordingly in the later steps.
Keep the Amazon S3 backup option disabled and click “Next”.
On the next page, under the Permissions section, select the option to create a new role. This opens a new tab; click "Allow" to create the role.
Navigate back to the Amazon Kinesis Data Firehose console and click “Next”.
Review the changes and click on “Create delivery stream”.
Configure Amazon SES to publish event data to Kinesis Data Firehose:
Navigate to Amazon SES console and select “Email Addresses” from the left side.
Click on "Verify a New Email Address" at the top. Enter the email address that you will use to send a test email.
Go to your email inbox and click on the verification link. Navigate back to the Amazon SES console and you will see a verified status on the email address you provided.
Open the Amazon SES console and select Configuration set from the left side.
Create a new configuration set. Enter “SES_Firehose_Demo” as the configuration set name and click “Create”.
Choose Kinesis Data Firehose as the destination and provide the following details.
Name: OpenClick
Event Types: Open and Click
In the IAM Role field, select ‘Let SES make a new role’. This allows SES to create a new role and add sufficient permissions for this use case in that role.
Click “Save”.
Sending a Test email:
Navigate to Amazon SES console, click on “Email Addresses” on the left side.
Select your verified email address and click on “Send a Test email”.
Make sure you select the raw email format. You may use the following format to send out a test email from the console. Make sure you send this email to a recipient inbox to which you have access.
X-SES-CONFIGURATION-SET: SES_Firehose_Demo
X-SES-MESSAGE-TAGS: Email=NULL
From: [email protected]
To: [email protected]
Subject: Test email
Content-Type: multipart/alternative;
boundary="----=_boundary"
------=_boundary
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: 7bit
This is a test email.
<a href="https://aws.amazon.com/">Amazon Web Services</a>
------=_boundary
Once the email is received in the recipient's inbox, open the email and click the link in it. This generates open and click events and sends the event data back to Amazon SES.
Creating Glue Crawler:
Navigate to the AWS Glue console, select “crawler” from the left side, and then click on “Add crawler” on the top.
Enter the crawler name as “SES_Firehose_Crawler” and click “Next”.
Under Crawler source type, select “Data stores” and click “Next”.
Select Amazon S3 as the data source and provide the required path. Include the path up to the "fhbase" folder.
Select “no” under Add another data source section.
In the IAM role, select the option to ‘Create an IAM role’. Enter the name as “SES_Firehose-Crawler”. This provides the necessary permissions automatically to the newly created role.
In the frequency section, select run on demand and click “Next”. You may choose this value as per your use case.
Click on add Database and provide the name as “ses_firehose_glue_db”. Click on create and then click “Next”.
Review your Glue crawler settings and click on "Finish".
Run the crawler you just created. This crawls the data from the specified Amazon S3 bucket and creates a catalog and table definition.
Now navigate to “tables” on the left, and verify a “fhbase” table is created after you run the crawler.
If you want to analyze the data stored so far, you can use Amazon Athena and test the queries. If not, you can move to Amazon QuickSight directly.
Analyzing the data using Amazon Athena:
Open the Athena console and select the database that was created using AWS Glue.
Click on "Set up a query result location in Amazon S3".
Navigate to the Amazon S3 bucket created in the earlier steps and create a folder called "AthenaQueryResult". We store our Athena query results in this folder.
Now navigate back to Amazon Athena, select the Amazon S3 bucket with the folder location you just created, and click "Save".
Run the following query to test the sample output and accordingly modify your SQL query to get the desired output.
SELECT * FROM "ses_firehose_glue_db"."fhbase"
Note: If you want to track opened emails by unique IP address, you can modify your SQL query accordingly (for example, by counting distinct source IP values per destination address). This is because you receive a notification every time an email is opened, even if the same email was opened before.
Visualizing the data in Amazon QuickSight dashboards:
Now, let's analyze this data using Amazon Athena via Amazon QuickSight.
Log in to Amazon QuickSight and choose Manage data, then New dataset. Choose Amazon Athena as the new data source.
Enter the data source name as “SES-Demo” and click on “Create the data source”.
Select the "ses_firehose_glue_db" database from the drop-down and the "fhbase" table that you created in AWS Glue.
Add a custom SQL query based on your use case and click on "Confirm query".
You can perform ad hoc analysis and modify your query according to your business needs. Click "Save & Visualize".
You can now visualize your event data on an Amazon QuickSight dashboard. You can use various graphs to represent your data. For this demo, the default graph is used, with two fields selected to populate it.
Conclusion:
This architecture shows how to track your email sending activity at a granular level. You set up Amazon SES to publish event data to Amazon Kinesis Data Firehose based on fine-grained email characteristics that you define. You can also track several types of email sending events, including sends, deliveries, bounces, complaints, rejections, rendering failures, and delivery delays. This information can be useful for operational and analytical purposes.
Chirag Oswal is a solutions architect and AR/VR specialist working with the public sector in India. He works with AWS customers to help them adopt the cloud operating model on a large scale.
Apoorv Gakhar is a Cloud Support Engineer and an Amazon SES expert. He works with AWS customers to help them integrate their applications with various AWS services.
Businesses need to send real-time notifications in order to take action when alerted of a critical situation. Examples include anomaly detection, healthcare emergencies, operations failures, and fraudulent transactions. Email, SMS, and push notifications are often used to notify stakeholders in real time. However, building a large-scale, real-time notification solution can be a complex and costly challenge for a business.
Amazon Pinpoint enables you to engage with your stakeholders in real-time by sending email, SMS and push notifications. Your app can use the Amazon Pinpoint API and the AWS SDKs to send direct messages. With transactional messages, you send alerts to specific recipients, as opposed to messages that you send to segments. There is no minimum fee, no setup cost, and no fixed monthly cost with Amazon Pinpoint.
In this blog, we explore a solution that notifies stakeholders of a large customer transaction. Such a transaction requires immediate attention, because our stakeholders want to ensure that there is enough inventory available to deliver the goods without delay.
Solution Overview
The solution that we build to handle this use case can be deployed in one hour. The following diagram illustrates the AWS services integrated in this solution:
At a high level, the solution uses the following workflow:
1. Define the large value transaction threshold in a rule table in Amazon DynamoDB.
2. Set up Amazon Pinpoint and configure it to send email and SMS.
3. Set up AWS Lambda and implement the logic to send SMS and email if a customer makes a large value order transaction.
4. Create a test transaction in an order table using Amazon DynamoDB.
5. Check the details of the SMS and email received.
Setting up the solution
Step 1: Set up Amazon Pinpoint
The first step in setting up this solution is to create a new Amazon Pinpoint project and configure the SMS and email channels.
1. Navigate to Services -> Pinpoint.
2. Click on Create a project.
3. Provide a name and click on the Create button.
4. Select Email in the left panel and click on Edit.
5. Select "Enable the email channel for this project". Select "Verify a new email address" and provide a default sender address. Click on Verify email address, then click on Save. You should receive a verification mail; verify the email by clicking on the verification link.
Once your email address is verified, its status in the console shows as verified.
6. Repeat the same step to verify receiver email address as well.
Please note: by enabling the email channel, we can send up to 200 emails per day in sandbox mode. We must verify an email address or domain identity in order to send any email. While we remain in the sandbox, we must also verify all recipient email addresses before sending to them; this does not apply in production.
7. Select SMS and voice in left panel and select “Enable the SMS channel for this project”. Please make sure “Transactional” is selected for critical or time-sensitive messages. Amazon Pinpoint optimizes the delivery of these messages for highest reliability.
Please note: We do not need to verify any phone numbers for this channel. However, we can request dedicated codes (either short or long) for our own exclusive use. Otherwise AWS will automatically allocate codes when we send SMS.
Step 2: Set up Amazon DynamoDB
4. Create the order_detail table. Provide order_id as the partition key. You can select "String" as the data type.
5. Now we must enable the stream on this table. Once we enable a stream on a table, Amazon DynamoDB captures information about every modification to data items in the table. We will integrate this table with AWS Lambda to validate the order value. Click on Manage Stream.
6. Select the appropriate view type and click on Enable.
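If you prefer to script this step, the Boto3 sketch below creates the same table with streams enabled; the table and key names come from this post, while the billing mode and stream view type are assumptions you can change.

import boto3

dynamodb = boto3.client('dynamodb')

# Create the order_detail table with a DynamoDB stream enabled (mirrors the console steps above).
dynamodb.create_table(
    TableName='order_detail',
    AttributeDefinitions=[{'AttributeName': 'order_id', 'AttributeType': 'S'}],
    KeySchema=[{'AttributeName': 'order_id', 'KeyType': 'HASH'}],
    BillingMode='PAY_PER_REQUEST',
    StreamSpecification={'StreamEnabled': True, 'StreamViewType': 'NEW_IMAGE'},
)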
Step 3: Set up AWS Lambda
In this step, we create an AWS Lambda function and then integrate it with the order_detail table. The function checks the order value and sends a notification if it is large enough.
1. Navigate to Services -> Lambda.
2. Click on Create function.
3. Select Python 3.6 as the runtime and select an execution role. Please make sure that your role gives read access to the Amazon DynamoDB table.
4. Click on Add trigger.
5. Select Amazon DynamoDB from the drop-down, then select the order_detail table. Keep everything else at the default values. This integrates the order_detail stream with our AWS Lambda function.
Our Lambda function is invoked every time a transaction happens in order_detail table.
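The following is a minimal sketch of what the function's logic could look like. The Pinpoint project ID, the sender and alert addresses, and the threshold are placeholders; in the full solution the threshold and notification contacts would be read from the transaction_alert_rule table rather than hard-coded.

import boto3

pinpoint = boto3.client('pinpoint')

# Placeholder values; read these from the transaction_alert_rule table in the full solution.
PINPOINT_APP_ID = '<your-pinpoint-project-id>'
THRESHOLD = 10000
ALERT_PHONE = '+10000000000'
ALERT_EMAIL = 'stakeholder@example.com'
SENDER_EMAIL = 'sender@example.com'

def lambda_handler(event, context):
    # Triggered by the order_detail DynamoDB stream; alert on large value orders.
    for record in event['Records']:
        if record['eventName'] != 'INSERT':
            continue
        new_image = record['dynamodb'].get('NewImage', {})
        order_value = float(new_image.get('order_value', {}).get('N', '0'))
        if order_value < THRESHOLD:
            continue
        order_id = new_image.get('order_id', {}).get('S', 'unknown')
        body = 'Large value order {} received: ${:,.0f}. Please check inventory.'.format(order_id, order_value)
        pinpoint.send_messages(
            ApplicationId=PINPOINT_APP_ID,
            MessageRequest={
                'Addresses': {
                    ALERT_PHONE: {'ChannelType': 'SMS'},
                    ALERT_EMAIL: {'ChannelType': 'EMAIL'},
                },
                'MessageConfiguration': {
                    'SMSMessage': {'Body': body, 'MessageType': 'TRANSACTIONAL'},
                    'EmailMessage': {
                        'FromAddress': SENDER_EMAIL,
                        'SimpleEmail': {
                            'Subject': {'Data': 'Large value order alert'},
                            'TextPart': {'Data': body},
                        },
                    },
                },
            },
        )
    return {'status': 'done'}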
1. That's it! We have built the solution. Now it's time to do an end-to-end test by creating an order transaction in the order_detail table.
2. Run the Amazon DynamoDB put-item command to create a new item over our threshold, or replace an old item with a new one. In our case it will be a new order transaction in the order_detail table. Please make sure you have configured the AWS CLI. You can refer to the quickstart guide to learn more.
aws dynamodb put-item --table-name order_detail --item "{""order_id"":{""S"":""O0001""},""customer_id"":{""S"":""C0001""},""customer_name"":{""S"":""JOHN MILLER""},""order_date"":{""S"":""2020-06-13T17:42:34Z""},""item_id"":{""S"":""P0001""},""item_quantity"":{""N"":""12""},""order_value"":{""N"":""20000""},""unit"":{""S"":""$""},""delivery_date"":{""S"":""2020-06-20""}}" --return-consumed-capacity TOTAL
3. The command returns a success message that includes the consumed capacity.
4. The order value of $20,000 falls within our large value order transaction definition (between $10,000 and $50,000). You should receive an SMS on the mobile number you provided in the transaction_alert_rule table.
5. You will also receive an email as per the configuration in the transaction_alert_rule table.
Step 4: Clean up
You've now successfully built your real-time notification solution using Amazon Pinpoint. Delete the resources you created in this blog, such as the Amazon DynamoDB tables, to avoid ongoing charges. You can use the AWS CLI, the AWS Management Console, or the AWS APIs to perform the cleanup.
Conclusion
Customers can use Amazon Pinpoint to help scale communications across use cases, including real-time notifications. Amazon Pinpoint is a flexible and scalable outbound and inbound marketing communications service. You can connect with customers and stakeholders over channels like email, SMS, push, or voice.
A criminal group called Cosmic Lynx seems to be based in Russia:
Dubbed Cosmic Lynx, the group has carried out more than 200 BEC campaigns since July 2019, according to researchers from the email security firm Agari, particularly targeting senior executives at large organizations and corporations in 46 countries. Cosmic Lynx specializes in topical, tailored scams related to mergers and acquisitions; the group typically requests hundreds of thousands or even millions of dollars as part of its hustles.
[…]
For example, rather than use free accounts, Cosmic Lynx will register strategic domain names for each BEC campaign to create more convincing email accounts. And the group knows how to shield these domains so they’re harder to trace to the true owner. Cosmic Lynx also has a strong understanding of the email authentication protocol DMARC and does reconnaissance to assess its targets’ specific system DMARC policies to most effectively circumvent them.
Cosmic Lynx also drafts unusually clean and credible-looking messages to deceive targets. The group will find a company that is about to complete an acquisition and contact one of its top executives posing as the CEO of the organization being bought. This phony CEO will then involve “external legal counsel” to facilitate the necessary payments. This is where Cosmic Lynx adds a second persona to give the process an air of legitimacy, typically impersonating a real lawyer from a well-regarded law firm in the United Kingdom. The fake lawyer will email the same executive that the “CEO” wrote to, often in a new email thread, and share logistics about completing the transaction. Unlike most BEC campaigns, in which the messages often have grammatical mistakes or awkward wording, Cosmic Lynx messages are almost always clean.
GoatCounter 1.2 (due to be released later today or tomorrow) will switch from email-based authentication to password authentication.
The way it originally worked is that you would sign up with your email, and to log in, a "magic link" with a secret token would be emailed to you, which would set the cookie and log you in.
I did it like this after a suggestion/discussion on Lobste.rs last year, and I thought it would be easier to implement (it's not) and easier for users (it's not).
Some problems I encountered:
Some (perhaps even many?) people use password managers, and for them the whole email thing is an annoyingly different workflow. The built-in Firefox password manager has actually seen quite a few nice improvements recently, making it easier than ever.
Getting 100% of emails delivered is hard and requires quite some setup.
It’s harder to get the self-hosted setup running, since emails are a hard requirement.
If you lose access to your email, you lose access to your account.
One issue I had is people misspelling their email during signup, so they were immediately locked out of their account. This is an issue especially on GoatCounter since people choose a domain code (mycode.goatcounter.com) during signup, and this code is now “taken” if they choose to re-register.
It’s pretty hard to test if an email is deliverable without sending an actual email (and even then you don’t know, as the receiving server may silently drop it or classify it as spam).
I got quite a few “undeliverable email” returns from the “please set a password”-announcement I sent this morning, and these people will never be able to login unless they email me and ask me to change their email manually. I presume these are inactive accounts, but still…
I found it annoying to use myself:
Go to arp242.goatcounter.com
Enter my email.
Go to FastMail.
Wait 10 seconds to 5 minutes for the email to arrive.
Click the link, opening a new logged-in goatcounter tab.
Close the old goatcounter tab where I filled in my email and FastMail tab.
Whereas with password auth it’s:
Go to arp242.goatcounter.com
Enter my email and password.
It’s quite a bit of code to deal with various edge cases, allowing people to configure the From address, etc. The code is neither simpler nor shorter than just using password auth.
It’s harder in development. I added some special code to show the signin link when using -dev so I didn’t have to copy it from the GoatCounter logs, but meh.
The signin links don't always work: they're valid only once, and I think some "preview" or "prefetch" logic accessed the URL and invalidated it, but I never quite got to the bottom of it in spite of quite a bit of effort (I never experienced it myself, just got user reports).
This is fixable by allowing the link to be re-used within 15 minutes or so, but this was really the straw that made me implement the password auth.
In some other use cases it might work better, but for GoatCounter it didn’t really work very well (the Lobsters thread linked above has some people report different experiences).
I considered allowing both email and password auth, but that’s more code to maintain and more flags in the CLI, so figured it’s best to just remove the email auth. I’m not sure if anyone truly likes it.
In my own experience outside of GoatCounter as a user I find logging in with some magic token (Medium sends emails, Tinder sends SMS) annoying in general. Then again, I also find logging in with a strong password on mobile annoying and the value of some token-based auth is probably greater there. I’m not sure what the best solution is here, but I do know that forcing everyone to use token-based authentication is probably not a good idea.
The Firefox password sync solved most of my issues here. I was wary of syncing passwords from the browser for a long time, but found myself changing and re-using passwords just to log in on mobile, which is even less secure.
I might bring back some token-based auth in the future as an auxiliary method; sending tokens over SMS or Telegram is probably easier here. But for now, there's bigger fish to fry in GoatCounter.
Last year, ZecOps discovered two iPhone zero-day exploits. They will be patched in the next iOS release:
Avraham declined to disclose many details about who the targets were, and did not say whether they lost any data as a result of the attacks, but said “we were a bit surprised about who was targeted.” He said some of the targets were an executive from a telephone carrier in Japan, a “VIP” from Germany, managed security service providers from Saudi Arabia and Israel, people who work for a Fortune 500 company in North America, and an executive from a Swiss company.
[…]
On the other hand, this is not as polished a hack as others, as it relies on sending an oversized email, which may get blocked by certain email providers. Moreover, Avraham said it only works on the default Apple Mail app, and not on Gmail or Outlook, for example.
Google presented its system of using deep-learning techniques to identify malicious email attachments:
At the RSA security conference in San Francisco on Tuesday, Google’s security and anti-abuse research lead Elie Bursztein will present findings on how the new deep-learning scanner for documents is faring against the 300 billion attachments it has to process each week. It’s challenging to tell the difference between legitimate documents in all their infinite variations and those that have specifically been manipulated to conceal something dangerous. Google says that 63 percent of the malicious documents it blocks each day are different than the ones its systems flagged the day before. But this is exactly the type of pattern-recognition problem where deep learning can be helpful.
[…]
The document analyzer looks for common red flags, probes files if they have components that may have been purposefully obfuscated, and does other checks like examining macros — the tool in Microsoft Word documents that chains commands together in a series and is often used in attacks. The volume of malicious documents that attackers send out varies widely day to day. Bursztein says that since its deployment, the document scanner has been particularly good at flagging suspicious documents sent in bursts by malicious botnets or through other mass distribution methods. He was also surprised to discover how effective the scanner is at analyzing Microsoft Excel documents, a complicated file format that can be difficult to assess.
This is the sort of thing that’s pretty well optimized for machine-learning techniques.
Motherboard has a long article on apps — Edison, Slice, and Cleanfox — that spy on your email by scraping your screen, and then sell that information to others:
Some of the companies listed in the J.P. Morgan document sell data sourced from “personal inboxes,” the document adds. A spokesperson for J.P. Morgan Research, the part of the company that created the document, told Motherboard that the research “is intended for institutional clients.”
That document describes Edison as providing “consumer purchase metrics including brand loyalty, wallet share, purchase preferences, etc.” The document adds that the “source” of the data is the “Edison Email App.”
[…]
A dataset obtained by Motherboard shows what some of the information pulled from free email app users' inboxes looks like. A spreadsheet containing data from Rakuten's Slice, an app that scrapes a user's inbox so they can better track packages or get their money back once a product goes down in price, contains the item that an app user bought from a specific brand, what they paid, and a unique identification code for each buyer.
Note: This post was written by Zach Barbitta, the Product Lead for Pinpoint Journeys, and Josh Kahn, an AWS Solution Architect.
We recently added a new feature called journeys to Amazon Pinpoint. With journeys, you can create fully automated, multi-step customer engagements through an easy to use, drag-and-drop interface.
In this post, we’ll provide a quick overview of the features and capabilities of Amazon Pinpoint journeys. We’ll also provide a blueprint that you can use to build your first journey.
What is a journey?
Think of a journey as being similar to a flowchart. Every journey begins by adding a segment of participants to it. These participants proceed through steps in the journey, which we call activities. Sometimes, participants split into different branches. Some participants end their journeys early, while some proceed through several steps. The experience of going down one path in the journey can be very different from the experience of going down a different path. In a journey, you determine the outcomes that are most relevant for your customers, and then create the structure that leads to those outcomes.
Your journey can contain several different types of activities, each of which does something different when journey participants arrive on it. For example, when participants arrive on an Email activity, they receive an email. When they land on a Random Split activity, they’re randomly separated into one of up to five different groups. You can also separate customers based on their attributes, or based on their interactions with previous steps in the journey, by using a Yes/no Split or a Multivariate Split activity. There are six different types of activities that you can add to your journeys. You can find more information about these activity types in our User Guide.
The web-based Amazon Pinpoint management console includes an easy-to-use, drag-and-drop interface for creating your journeys. You can also create journeys programmatically by using the Amazon Pinpoint API, the AWS CLI, or an AWS SDK. Best of all, you can use the API to modify journeys that you created in the console, and vice-versa.
Planning our journey
The flexible nature of journeys makes it easy to create customer experiences that are timely, personalized, and relevant. In this post, we’ll assume the role of a music streaming service. We’ll create a journey that takes our customers through the following steps:
When customers sign up, we’ll send them an email that highlights some of the cool features they’ll get by using our service.
After 1 week, we'll divide customers into two separate groups based on whether or not they opened the email we sent them when they signed up.
To the group of customers who opened the first message, we’ll send an email that contains additional tips and tricks.
To the group who didn’t open the first message, we’ll send an email that reiterates the basic information that we mentioned in the first message.
The best part about journeys in Amazon Pinpoint is that you can set them up in just a few minutes, without having to do any coding, and without any specialized training (unless you count reading this blog post as training).
After we launch this journey, we’ll be able to view metrics through the same interface that we used to create the journey. For example, for each split activity, we’ll be able to see how many participants were sent down each path. For each Email activity, we’ll be able to view the total number of sends, deliveries, opens, clicks, bounces, and unsubscribes. We’ll also be able to view these metrics as aggregated totals for the entire journey. These metrics can help you discover what worked and what didn’t work in your journey, so that you can build more effective journeys in the future.
First steps
There are a few prerequisites that we have to complete before we can start creating the journey.
Create Endpoints
When a new user signs up for your service, you have to create an endpoint record for that user. One way to do this is by using AWS Amplify (if you use Amplify’s built-in analytics tools, Amplify creates Amazon Pinpoint endpoints). You can also use AWS Lambda to call Amazon Pinpoint’s UpdateEndpoint API. The exact method that you use to create these endpoints is up to your engineering team.
When your app creates new endpoints, they should include a user attribute that identifies them as someone who should participate in the journey. We’ll use this attribute to create our segment of journey participants.
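For illustration, a Boto3 sketch of such a call is shown below. The project ID, endpoint ID, address, and the attribute name ("JourneyParticipant") are placeholder assumptions, not values required by Amazon Pinpoint.

import boto3

pinpoint = boto3.client('pinpoint')

# Create or update an endpoint with a user attribute that the journey segment can filter on.
pinpoint.update_endpoint(
    ApplicationId='<your-pinpoint-project-id>',
    EndpointId='example-endpoint-id',
    EndpointRequest={
        'ChannelType': 'EMAIL',
        'Address': 'new-user@example.com',
        'User': {
            'UserId': 'example-user-id',
            'UserAttributes': {
                'JourneyParticipant': ['Yes'],
            },
        },
    },
)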
If you just want to get started without engaging your engineering team, you can import a segment of endpoints to use in your journey. To learn more, see Importing Segments in the Amazon Pinpoint User Guide.
Verify an Email Address
In Amazon Pinpoint, you have to verify the email addresses that you use to send email. By verifying an email address, you prove that you own it, and you grant Amazon Pinpoint permission to send your email from that address. For more information, see Verifying Email Identities in the Amazon Pinpoint User Guide.
Create a Template
Before you can send email from a journey, you have to create email templates. Email templates are pre-defined message bodies that you can use to send email messages to your customers. You can learn more about creating email templates in the Amazon Pinpoint User Guide.
To create the journey that we discussed earlier, we’ll need at least three email templates: one that contains the basic information that we want to share with new customers, one with more advanced content for users who opened the first email, and another that reiterates the content in the first message.
Create a Segment
Finally, you need to create the segment of users who will participate in the journey. If your service creates an endpoint record that includes a specific user attribute, you can use this attribute to create your segment. You can learn more about creating segments in the Amazon Pinpoint User Guide.
Building the Journey
Now that we’ve completed the prerequisites, we can start building our journey. We’ll break the process down into a few quick steps.
Set it up
Start by signing in to the Amazon Pinpoint console. In the navigation pane on the left side of the page, choose Journeys, and then choose Create a journey. In the upper left corner of the page, you'll see an area where you can enter a name for the journey. Give the journey a name that helps you identify it.
Add participants
Every journey begins with a Journey entry activity. In this activity, you define the segment that will participate in the journey. On the Journey entry activity, choose Set entry condition. In the list, choose the segment that contains the participants for the journey.
On this activity, you can also control how often new participants are added to the journey. Because new customers could sign up for our service at any time, we want to update the segment regularly. For this example, we’ll set up the journey to look for new segment members once every hour.
Send the initial email
Now we can configure the first message that we’ll send in our journey. Under the Journey entry activity that you just configured, choose Add activity, and then choose Send email. Choose the email template that you want to use in the email, and then specify the email address that you want to send the email from.
Add the split
Under the Email activity, choose Add activity, and then choose Yes/no split. This type of activity looks for all journey participants who meet a condition and sends them all down a path (the “Yes” path). All participants who don’t meet that condition are sent down a “No” path. This is similar to the Multivariate split activity. The difference between the two is that the Multivariate split can produce up to four branches, each with its own criteria, plus an “Else” branch for everyone who doesn’t meet the criteria in the other branches. A Yes/no split activity, on the other hand, can only ever have two branches, “Yes” and “No”. However, unlike the Multivariate split activity, the “Yes” branch in a Yes/no split activity can contain more than one matching criteria.
In this case, we'll set up the "Yes" branch to look for email open events from the Email activity that occurred earlier in the journey. We also know that the majority of recipients who open the email aren't going to do so the instant they receive it. For this reason, we'll adjust the Condition evaluation setting for the activity to wait for 7 days before looking for open events. This gives our customers plenty of time to receive the message, check their email, open the message, and, if they're interested in the content of the message, click a link in the message.
Continue building the branches
From here, you can continue building the branches of the journey. In each branch, you’ll send a different email template based on the needs of the participants in that branch. Participants in the “Yes” branch will receive an email template that contains more advanced information, while those in the “No” branch will receive a template that revisits the content from the original message.
To learn more about setting up the various types of journey activities, see Setting Up Journey Activities in the Amazon Pinpoint User Guide.
Review the journey
When you finish adding branches and activities to your journey, choose the Review button in the top right corner of the page. When you do, the Review your journey pane opens on the left side of the screen. The Review your journey pane shows you errors that need to be fixed in your journey. It also gives you some helpful recommendations, and provides some best practices that you can use to optimize your journey. You can click any of the notifications in the Review your journey page to move immediately to the journey activity that the message applies to. When you finish addressing the errors and reviewing the recommendations, you can mark the journey as reviewed.
Testing and publishing the journey
After you review the journey, you have a couple options. You can either launch the journey immediately, or test the journey before you launch it.
If you want to test the journey, close the Review your journey pane. Then, on the Actions menu, choose Test. When you test your journey, you have to choose a segment of testers. The segment that you choose should only contain the internal recipients on your team who you want to test the journey before you publish it. You also have the option of changing the wait times in your journey, or skipping them altogether. If you choose a wait time of 1 hour here, it makes it much easier to test the Yes/no split activity (which, in the production version of the journey, contains a 7 day wait). To learn more about testing your journey, including some tips and best practices, see Testing Your Journey in the Amazon Pinpoint User Guide.
When you finish your testing, complete the review process again. On the final page of content in the Review your journey pane, choose Publish. After a short delay, your journey goes live. That’s it! You’ve created and launched your very first journey!
Pretty horrible story of a US journalist who had his computer and phone searched at the border when returning to the US from Mexico.
After I gave him the password to my iPhone, Moncivias spent three hours reviewing hundreds of photos and videos and emails and calls and texts, including encrypted messages on WhatsApp, Signal, and Telegram. It was the digital equivalent of tossing someone’s house: opening cabinets, pulling out drawers, and overturning furniture in hopes of finding something — anything — illegal. He read my communications with friends, family, and loved ones. He went through my correspondence with colleagues, editors, and sources. He asked about the identities of people who have worked with me in war zones. He also went through my personal photos, which I resented. Consider everything on your phone right now. Nothing on mine was spared.
Pomeroy, meanwhile, searched my laptop. He browsed my emails and my internet history. He looked through financial spreadsheets and property records and business correspondence. He was able to see all the same photos and videos as Moncivias and then some, including photos I thought I had deleted.
If you are a U.S. citizen, border agents cannot stop you from entering the country, even if you refuse to unlock your device, provide your device password, or disclose your social media information. However, agents may escalate the encounter if you refuse. For example, agents may seize your devices, ask you intrusive questions, search your bags more intensively, or increase by many hours the length of detention. If you are a lawful permanent resident, agents may raise complicated questions about your continued status as a resident. If you are a foreign visitor, agents may deny you entry.
The most important piece of advice is to think about this all beforehand, and plan accordingly.
In Gmail addresses, the dots don’t matter. The account “[email protected]” maps to the exact same address as “[email protected]” and “[email protected]” — and so on. (Note: I own none of those addresses, if they are actually valid.)
Recently, we observed a group of BEC actors make extensive use of Gmail dot accounts to commit a large and diverse amount of fraud. Since early 2018, this group has used this fairly simple tactic to facilitate the following fraudulent activities:
Submit 48 credit card applications at four US-based financial institutions, resulting in the approval of at least $65,000 in fraudulent credit
Register for 14 trial accounts with a commercial sales leads service to collect targeting data for BEC attacks
File 13 fraudulent tax returns with an online tax filing service
Submit 12 change of address requests with the US Postal Service
Submit 11 fraudulent Social Security benefit applications
Apply for unemployment benefits under nine identities in a large US state
Submit applications for FEMA disaster assistance under three identities
In each case, the scammers created multiple accounts on each website within a short period of time, modifying the placement of periods in the email address for each account. Each of these accounts is associated with a different stolen identity, but all email from these services are received by the same Gmail account. Thus, the group is able to centralize and organize their fraudulent activity around a small set of email accounts, thereby increasing productivity and making it easier to continue their fraudulent behavior.
This isn’t a new trick. It has been previously documented as a way to trick Netflix users.
Attackers are targeting two-factor authentication systems:
Attackers working on behalf of the Iranian government collected detailed information on targets and used that knowledge to write spear-phishing emails that were tailored to the targets’ level of operational security, researchers with security firm Certfa Lab said in a blog post. The emails contained a hidden image that alerted the attackers in real time when targets viewed the messages. When targets entered passwords into a fake Gmail or Yahoo security page, the attackers would almost simultaneously enter the credentials into a real login page. In the event targets’ accounts were protected by 2fa, the attackers redirected targets to a new page that requested a one-time password.
This isn’t new. I wrote about this exact attack in 2005 and 2009.
Facebook is not content to use the contact information you willingly put into your Facebook profile for advertising. It is also using contact information you handed over for security purposes and contact information you didn’t hand over at all, but that was collected from other people’s contact books, a hidden layer of details Facebook has about you that I’ve come to call “shadow contact information.” I managed to place an ad in front of Alan Mislove by targeting his shadow profile. This means that the junk email address that you hand over for discounts or for shady online shopping is likely associated with your account and being used to target you with ads.
They found that when a user gives Facebook a phone number for two-factor authentication or in order to receive alerts about new log-ins to a user’s account, that phone number became targetable by an advertiser within a couple of weeks. So users who want their accounts to be more secure are forced to make a privacy trade-off and allow advertisers to more easily find them on the social network.
Last week, researchers disclosed vulnerabilities in a large number of encrypted e-mail clients: specifically, those that use OpenPGP and S/MIME, including Thunderbird and Apple Mail. These are serious vulnerabilities: An attacker who can alter mail sent to a vulnerable client can trick that client into sending a copy of the plaintext to a web server controlled by that attacker. The story of these vulnerabilities and the tale of how they were disclosed illustrate some important lessons about security vulnerabilities in general and e-mail security in particular.
But first, if you use PGP or S/MIME to encrypt e-mail, you need to check the list on this page and see if you are vulnerable. If you are, check with the vendor to see if they’ve fixed the vulnerability. (Note that some early patches turned out not to fix the vulnerability.) If not, stop using the encrypted e-mail program entirely until it’s fixed. Or, if you know how to do it, turn off your e-mail client’s ability to process HTML e-mail or — even better — stop decrypting e-mails from within the client. There’s even more complex advice for more sophisticated users, but if you’re one of those, you don’t need me to explain this to you.
Consider your encrypted e-mail insecure until this is fixed.
All software contains security vulnerabilities, and one of the primary ways we all improve our security is by researchers discovering those vulnerabilities and vendors patching them. It’s a weird system: Corporate researchers are motivated by publicity, academic researchers by publication credentials, and just about everyone by individual fame and the small bug-bounties paid by some vendors.
Software vendors, on the other hand, are motivated to fix vulnerabilities by the threat of public disclosure. Without the threat of eventual publication, vendors are likely to ignore researchers and delay patching. This happened a lot in the 1990s, and even today, vendors often use legal tactics to try to block publication. It makes sense; they look bad when their products are pronounced insecure.
Over the past few years, researchers have started to choreograph vulnerability announcements to make a big press splash. Clever names — the e-mail vulnerability is called “Efail” — websites, and cute logos are now common. Key reporters are given advance information about the vulnerabilities. Sometimes advance teasers are released. Vendors are now part of this process, trying to announce their patches at the same time the vulnerabilities are announced.
This simultaneous announcement is best for security. While it’s always possible that some organization — either government or criminal — has independently discovered and is using the vulnerability before the researchers go public, use of the vulnerability is essentially guaranteed after the announcement. The time period between announcement and patching is the most dangerous, and everyone except would-be attackers wants to minimize it.
Things get much more complicated when multiple vendors are involved. In this case, Efail isn’t a vulnerability in a particular product; it’s a vulnerability in a standard that is used in dozens of different products. As such, the researchers had to ensure both that everyone knew about the vulnerability in time to fix it and that no one leaked the vulnerability to the public during that time. As you can imagine, that’s close to impossible.
Efail was discovered sometime last year, and the researchers alerted dozens of different companies between last October and March. Some companies took the news more seriously than others. Most patched. Amazingly, news about the vulnerability didn’t leak until the day before the scheduled announcement date. Two days before the scheduled release, the researchers unveiled a teaser — honestly, a really bad idea — which resulted in details leaking.
After the leak, the Electronic Frontier Foundation posted a notice about the vulnerability without details. The organization has been criticized for its announcement, but I am hard-pressed to find fault with its advice. (Note: I am a board member at EFF.) Then the researchers published, and lots of press followed.
All of this speaks to the difficulty of coordinating vulnerability disclosure when it involves a large number of companies or — even more problematic — communities without clear ownership. And that’s what we have with OpenPGP. It’s even worse when the bug involves the interaction between different parts of a system. In this case, there’s nothing wrong with PGP or S/MIME in and of themselves. Rather, the vulnerability occurs because of the way many e-mail programs handle encrypted e-mail. GnuPG, an implementation of OpenPGP, decided that the bug wasn’t its fault and did nothing about it. This is arguably true, but irrelevant. They should fix it.
Expect more of these kinds of problems in the future. The Internet is shifting from a set of systems we deliberately use — our phones and computers — to a fully immersive Internet-of-things world that we live in 24/7. And like this e-mail vulnerability, vulnerabilities will emerge through the interactions of different systems. Sometimes it will be obvious who should fix the problem. Sometimes it won’t be. Sometimes it’ll be two secure systems that, when they interact in a particular way, cause an insecurity. In April, I wrote about a vulnerability that arose because Google and Netflix make different assumptions about e-mail addresses. I don’t even know who to blame for that one.
It gets even worse. Our system of disclosure and patching assumes that vendors have the expertise and ability to patch their systems, but that simply isn’t true for many of the embedded and low-cost Internet of things software packages. They’re designed at a much lower cost, often by offshore teams that come together, create the software, and then disband; as a result, there simply isn’t anyone left around to receive vulnerability alerts from researchers and write patches. Even worse, many of these devices aren’t patchable at all. Right now, if you own a digital video recorder that’s vulnerable to being recruited for a botnet — remember Mirai from 2016? — the only way to patch it is to throw it away and buy a new one.
Patching is starting to fail, which means that we’re losing the best mechanism we have for improving software security at exactly the same time that software is gaining autonomy and physical agency. Many researchers and organizations, including myself, have proposed government regulations enforcing minimal security standards for Internet-of-things devices, including standards around vulnerability disclosure and patching. This would be expensive, but it’s hard to see any other viable alternative.
Getting back to e-mail, the truth is that it's incredibly difficult to secure well. Not because the cryptography is hard, but because we expect e-mail to do so many things. We use it for correspondence, for conversations, for scheduling, and for record-keeping. I regularly search my 20-year e-mail archive. The PGP and S/MIME security protocols are outdated, needlessly complicated and have been difficult to properly use the whole time. If we could start again, we would design something better and more user friendly, but the huge number of legacy applications that use the existing standards means that we can't. I tell people that if they want to communicate securely with someone, to use one of the secure messaging systems: Signal, Off-the-Record, or — if having one of those two on your system is itself suspicious — WhatsApp. Of course they're not perfect, as last week's announcement of a vulnerability (patched within hours) in Signal illustrates. And they're not as flexible as e-mail, but that makes them easier to secure.
We all know that we should not commit any passwords or keys to the repo with our code (no matter if public or private). Yet, thousands of production passwords can be found on GitHub (and probably thousands more in internal company repositories). Some have tried to fix that by removing the passwords (once they learned it’s not a good idea to store them publicly), but passwords have remained in the git history.
Knowing what not to do is the first and very important step. But how do we store production credentials? Database credentials, system secrets (e.g. for HMACs), access keys for 3rd party services like payment providers or social networks. There doesn't seem to be an agreed-upon solution.
I've previously argued with the 12-factor app recommendation to use environment variables – if you have a few, that might be okay, but when the number of variables grows (as in any real application), it becomes impractical. And you can set environment variables via a bash script, but you'd have to store it somewhere. And in fact, even separate environment variables should be stored somewhere.
This somewhere could be a local directory (risky), a shared storage, e.g. FTP or S3 bucket with limited access, or a separate git repository. I think I prefer the git repository as it allows versioning (Note: S3 also does, but is provider-specific). So you can store all your environment-specific properties files with all their credentials and environment-specific configurations in a git repo with limited access (only Ops people). And that’s not bad, as long as it’s not the same repo as the source code.
Since many companies are using GitHub or BitBucket for their repositories, storing production credentials on a public provider may still be risky. That’s why it’s a good idea to encrypt the files in the repository. A good way to do it is via git-crypt. It is “transparent” encryption because it supports diff and encryption and decryption on the fly. Once you set it up, you continue working with the repo as if it’s not encrypted. There’s even a fork that works on Windows.
You simply run git-crypt init (after you’ve put the git-crypt binary on your OS Path), which generates a key. Then you specify your .gitattributes, e.g. like that:
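A minimal .gitattributes for this setup might look like the following; the patterns are illustrative and should be adjusted to wherever your sensitive files live:
# Encrypt properties files and everything under secrets/ (illustrative patterns)
*.properties filter=git-crypt diff=git-crypt
secrets/** filter=git-crypt diff=git-crypt
# Never encrypt .gitattributes itself
.gitattributes !filter !diff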
And you’re done. Well, almost. If this is a fresh repo, everything is good. If it is an existing repo, you’d have to clean up your history which contains the unencrypted files. Following these steps will get you there, with one addition – before calling git commit, you should call git-crypt status -f so that the existing files are actually encrypted.
You’re almost done. You should also share and back up the keys. For the sharing part, it’s not a big issue to have a team of 2-3 Ops people share the same key, but you could also use the GPG option of git-crypt (as documented in the README). What’s left is to back up your secret key (generated in the .git/git-crypt directory). You can store it (password-protected) in some other storage, be it a company shared folder, Dropbox/Google Drive, or even your email. Just make sure your computer is not the only place where it’s present and that it’s protected. I don’t think key rotation is strictly necessary, but you can devise a rotation procedure if you like.
The git-crypt authors claim it shines when it comes to encrypting just a few files in an otherwise public repo, and they recommend looking at git-remote-gcrypt as an alternative. But since there are often non-sensitive parts of environment-specific configurations, you may not want to encrypt everything, and I think it’s perfectly fine to use git-crypt even in a separate-repo scenario. And even though encryption is an okay approach to protect credentials in your source code repo, it’s still not necessarily a good idea to have the environment configurations in the same repo, especially given that different people/teams manage these credentials. Even in small companies, not all members may have production access.
The outstanding question in this case is: how do you sync the properties with code changes? Sometimes the code adds new properties that should be reflected in the environment configurations. There are two scenarios here – first, properties that can vary across environments but have default values (e.g. scheduled job periods), and second, properties that require explicit configuration (e.g. database credentials). The former can have their default values bundled in the code repo, and therefore in the release artifact, allowing external files to override them. The latter should be announced to the people who do the deployment so that they can set the proper values.
The whole process of having versioned environment-specific configurations is actually quite simple and logical, even with the encryption added to the picture. And I think it’s a good security practice we should try to follow.
Backblaze is hiring a Director of Sales. This is a critical role for Backblaze as we continue to grow the team. We need a strong leader who has experience scaling a sales team and an excellent track record of exceeding goals by selling Software as a Service (SaaS) solutions. In addition, this leader will need to be highly motivated, as well as able to create and develop a highly motivated, success-oriented sales team that has fun and enjoys what they do.
The History of Backblaze from our CEO In 2007, after a friend’s computer crash caused her some suffering, we realized that with every photo, video, song, and document going digital, everyone would eventually lose all of their information. Five of us quit our jobs to start a company with the goal of making it easy for people to back up their data.
Like many startups, for a while we worked out of a co-founder’s one-bedroom apartment. Unlike most startups, we made an explicit agreement not to raise funding during the first year. We would then touch base every six months and decide whether to raise or not. We wanted to focus on building the company and the product, not on pitching and slide decks. And critically, we wanted to build a culture that understood money comes from customers, not the magical VC giving tree. Over the course of 5 years we built a profitable, multi-million dollar revenue business — and only then did we raise a VC round.
Fast forward 10 years later and our world looks quite different. You’ll have some fantastic assets to work with:
A brand millions recognize for openness, ease-of-use, and affordability.
A computer backup service that stores over 500 petabytes of data, has recovered over 30 billion files for hundreds of thousands of paying customers — most of whom self-identify as being the people that find and recommend technology products to their friends.
Our B2 service, which provides the lowest-cost cloud storage on the planet at one-fourth the price that Amazon, Google, or Microsoft charges. While it is a newer product on the market, it already has over 100,000 IT professionals and developers signed up, as well as an ecosystem building up around it.
A growing, profitable and cash-flow positive company.
And last, but most definitely not least: a great sales team.
You might be saying, “sounds like you’ve got this under control — why do you need me?” Don’t be misled. We need you. Here’s why:
We have a great team, but we are in the process of expanding and we need to develop a structure that will easily scale and provide the most success to drive revenue.
We just launched our outbound sales efforts and we need someone to help develop that into a fully successful program that’s building a strong pipeline and closing business.
We need someone to work with the marketing department and figure out how to generate more inbound opportunities that the sales team can follow up on and close.
We need someone who will work closely in developing the skills of our current sales team and build a path for career growth and advancement.
We want someone to manage our Customer Success program.
So that’s a bit about us. What are we looking for in you?
Experience: As a sales leader, you will strategically build and drive the territory’s sales pipeline by assembling and leading a skilled team of sales professionals. This leader should be familiar with generating, developing, and closing software subscription (SaaS) opportunities. We are looking for a self-starter who can manage a team and make an immediate impact selling our Backup and Cloud Storage solutions. In this role, the sales leader will work closely with the VP of Sales, marketing staff, and service staff to develop and implement specific strategic plans to achieve and exceed revenue targets, including new business acquisition, as well as to build out our customer success program.
Leadership: We have an experienced team who’s brought us to where we are today. You need to have the people and management skills to get them excited about working with you. You need to be a strong leader and compassionate about developing and supporting your team.
Data driven and creative: The data has to show something makes sense before we scale it up. However, without creativity, it’s easy to say “the data shows it’s impossible” or to find a local maximum. Whether it’s deciding how to scale the team, figuring out what our outbound sales efforts should look like or putting a plan in place to develop the team for career growth, we’ve seen a bit of creativity get us places a few extra dollars couldn’t.
Jive with our culture: Strong leaders affect culture and the person we hire for this role may well shape, not only fit into, ours. But to shape the culture you have to be accepted by the organism, which means a certain set of shared values. We default to openness with our team, our customers, and everyone if possible. We love initiative — without arrogance or dictatorship. We work to create a place people enjoy showing up to work. That doesn’t mean ping pong tables and foosball (though we do try to have perks & fun), but it means people are friendly, non-political, working to build a good service but also a good place to work.
Do the work: Ideas and strategy are critical, but good execution makes them happen. We’re looking for someone who can help the team execute both from the perspective of being capable of guiding and organizing, but also someone who is hands-on themselves.
Additional Responsibilities needed for this role:
Recruit, coach, mentor, manage and lead a team of sales professionals to achieve yearly sales targets. This includes closing new business and expanding upon existing clientele.
Expand the customer success program to provide the best customer experience possible resulting in upsell opportunities and a high retention rate.
Develop effective sales strategies and deliver compelling product demonstrations and sales pitches.
Acquire and develop the appropriate sales tools to make the team efficient in their daily work flow.
Apply a thorough understanding of the marketplace, industry trends, funding developments, and products to all management activities and strategic sales decisions.
Ensure that sales department operations function smoothly, with the goal of facilitating sales and/or closings; operational responsibilities include accurate pipeline reporting and sales forecasts.
This position will report directly to the VP of Sales and will be staffed in our headquarters in San Mateo, CA.
Requirements:
7 – 10+ years of successful sales leadership experience as measured by sales performance against goals. Experience in developing skill sets and providing career growth and opportunities through advancement of team members.
Background in selling SaaS technologies with a strong track record of success.
Strong presentation and communication skills.
Must be able to travel occasionally nationwide.
BA/BS degree required
Think you want to join us on this adventure? Send an email to jobscontact@backblaze.com with the subject “Director of Sales.” (Recruiters and agencies, please don’t email us.) Include a resume and answer these two questions:
How would you approach evaluating the current sales team and what is your process for developing a growth strategy to scale the team?
What are the goals you would set for yourself in the 3 month and 1-year timeframes?
Thank you for taking the time to read this and I hope that this sounds like the opportunity for which you’ve been waiting.
This post is courtesy of Otavio Ferreira, Manager, Amazon SNS, AWS Messaging.
Amazon SNS message filtering provides a set of string and numeric matching operators that allow each subscription to receive only the messages of interest. Hence, SNS message filtering can simplify your pub/sub messaging architecture by offloading the message filtering logic from your subscriber systems, as well as the message routing logic from your publisher systems.
After you set the subscription attribute that defines a filter policy, the subscribing endpoint receives only the messages that carry attributes matching this filter policy. Other messages published to the topic are filtered out for this subscription. Additionally, the native integration between SNS and Amazon CloudWatch provides visibility into the number of messages delivered, as well as the number of messages filtered out.
CloudWatch metrics are captured automatically for you. To get started with SNS message filtering, see Filtering Messages with Amazon SNS.
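As a quick illustration, here is a minimal Python (boto3) sketch that attaches a filter policy to an existing subscription; the subscription ARN and the eventType attribute are placeholders, not values from this post:
import json

import boto3

sns = boto3.client('sns')

# Hypothetical subscription ARN; replace with your own.
subscription_arn = 'arn:aws:sns:us-east-1:123456789012:MyTopic:0000aaaa-bbbb-cccc-dddd-eeeeffff0000'

# Deliver only messages whose 'eventType' attribute equals 'order_placed'.
sns.set_subscription_attributes(
    SubscriptionArn=subscription_arn,
    AttributeName='FilterPolicy',
    AttributeValue=json.dumps({'eventType': ['order_placed']})
)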
Message Filtering Metrics
The following six CloudWatch metrics are relevant to understanding your SNS message filtering activity:
NumberOfMessagesPublished – Inbound traffic to SNS. This metric tracks all the messages that have been published to the topic.
NumberOfNotificationsDelivered – Outbound traffic from SNS. This metric tracks all the messages that have been successfully delivered to endpoints subscribed to the topic. A delivery takes place either when the incoming message attributes match a subscription filter policy, or when the subscription has no filter policy at all, which results in a catch-all behavior.
NumberOfNotificationsFilteredOut – This metric tracks all the messages that were filtered out because they carried attributes that didn’t match the subscription filter policy.
NumberOfNotificationsFilteredOut-NoMessageAttributes – This metric tracks all the messages that were filtered out because they didn’t carry any attributes at all and, consequently, didn’t match the subscription filter policy.
NumberOfNotificationsFilteredOut-InvalidAttributes – This metric keeps track of messages that were filtered out because they carried invalid or malformed attributes and, thus, didn’t match the subscription filter policy.
NumberOfNotificationsFailed – This last metric tracks all the messages that failed to be delivered to subscribing endpoints, regardless of whether a filter policy had been set for the endpoint. This metric is emitted after the message delivery retry policy is exhausted, and SNS stops attempting to deliver the message. At that moment, the subscribing endpoint is likely no longer reachable. For example, the subscribing SQS queue or Lambda function has been deleted by its owner. You may want to closely monitor this metric to address message delivery issues quickly.
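If you also want to pull these numbers programmatically rather than only graph them, a hedged boto3 sketch along these lines retrieves the sum of filtered-out messages for the last hour (the topic name is a placeholder):
from datetime import datetime, timedelta

import boto3

cloudwatch = boto3.client('cloudwatch')

response = cloudwatch.get_metric_statistics(
    Namespace='AWS/SNS',
    MetricName='NumberOfNotificationsFilteredOut',
    Dimensions=[{'Name': 'TopicName', 'Value': 'MyTopic'}],  # placeholder topic name
    StartTime=datetime.utcnow() - timedelta(hours=1),
    EndTime=datetime.utcnow(),
    Period=300,
    Statistics=['Sum']
)
print(sum(point['Sum'] for point in response['Datapoints']))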
Message filtering graphs
Through the AWS Management Console, you can compose graphs to display your SNS message filtering activity. The graph shows the number of messages published, delivered, and filtered out within the timeframe you specify (1h, 3h, 12h, 1d, 3d, 1w, or custom).
To compose an SNS message filtering graph with CloudWatch:
Open the CloudWatch console.
Choose Metrics, SNS, All Metrics, and Topic Metrics.
Select all metrics to add to the graph, such as:
NumberOfMessagesPublished
NumberOfNotificationsDelivered
NumberOfNotificationsFilteredOut
Choose Graphed metrics.
In the Statistic column, switch from Average to Sum.
Title your graph with a descriptive name, such as “SNS Message Filtering”
After you have your graph set up, you may want to copy the graph link for bookmarking, emailing, or sharing with co-workers. You may also want to add your graph to a CloudWatch dashboard for easy access in the future. Both actions are available to you on the Actions menu, which is found above the graph.
Summary
SNS message filtering defines how SNS topics behave in terms of message delivery. By using CloudWatch metrics, you gain visibility into the number of messages published, delivered, and filtered out. This enables you to validate the operation of filter policies and more easily troubleshoot during development phases.
SNS message filtering can be implemented easily with existing AWS SDKs by applying message and subscription attributes across all SNS supported protocols (Amazon SQS, AWS Lambda, HTTP, SMS, email, and mobile push). CloudWatch metrics for SNS message filtering are available now in all AWS Regions.
The adoption of Apache Spark has increased significantly over the past few years, and running Spark-based application pipelines is the new normal. Spark jobs that are in an ETL (extract, transform, and load) pipeline have different requirements—you must handle dependencies in the jobs, maintain order during executions, and run multiple jobs in parallel. In most of these cases, you can use workflow scheduler tools like Apache Oozie, Apache Airflow, and even Cron to fulfill these requirements.
Apache Oozie is a widely used workflow scheduler system for Hadoop-based jobs. However, its limited UI capabilities, lack of integration with other services, and heavy XML dependency might not be suitable for some users. On the other hand, Apache Airflow comes with a lot of neat features, along with powerful UI and monitoring capabilities and integration with several AWS and third-party services. However, with Airflow, you do need to provision and manage the Airflow server. The Cron utility is a powerful job scheduler. But it doesn’t give you much visibility into the job details, and creating a workflow using Cron jobs can be challenging.
What if you have a simple use case, in which you want to run a few Spark jobs in a specific order, but you don’t want to spend time orchestrating those jobs or maintaining a separate application? You can do that today in a serverless fashion using AWS Step Functions. You can create the entire workflow in AWS Step Functions and interact with Spark on Amazon EMR through Apache Livy.
In this post, I walk you through a list of steps to orchestrate a serverless Spark-based ETL pipeline using AWS Step Functions and Apache Livy.
Input data
For the source data for this post, I use the New York City Taxi and Limousine Commission (TLC) trip record data. For a description of the data, see this detailed dictionary of the taxi data. In this example, we’ll work mainly with the following three columns for the Spark jobs.
RateCodeID – Represents the rate code in effect at the end of the trip (for example, 1 for standard rate, 2 for JFK airport, 3 for Newark airport, and so on).
FareAmount – Represents the time-and-distance fare calculated by the meter.
TripDistance – Represents the elapsed trip distance in miles reported by the taxi meter.
The trip data is in comma-separated values (CSV) format with the first row as a header. To shorten the Spark execution time, I trimmed the large input data to only 20,000 rows. During the deployment phase, the input file tripdata.csv is stored in Amazon S3 in the <<your-bucket>>/emr-step-functions/input/ folder.
The following image shows a sample of the trip data:
Solution overview
The next few sections describe how Spark jobs are created for this solution, how you can interact with Spark using Apache Livy, and how you can use AWS Step Functions to create orchestrations for these Spark applications.
At a high level, the solution includes the following steps:
Trigger the AWS Step Function state machine by passing the input file path.
The first stage in the state machine triggers an AWS Lambda function.
The Lambda function interacts with Apache Spark running on Amazon EMR using Apache Livy, and submits a Spark job.
The state machine waits a few seconds before checking the Spark job status.
Based on the job status, the state machine moves to the success or failure state.
Subsequent Spark jobs are submitted using the same approach.
The state machine waits a few seconds for the job to finish.
The job finishes, and the state machine updates with its final status.
Let’s take a look at the Spark application that is used for this solution.
Spark jobs
For this example, I built a Spark jar named spark-taxi.jar. It has two different Spark applications:
MilesPerRateCode – The first job that runs on the Amazon EMR cluster. This job reads the trip data from an input source and computes the total trip distance for each rate code. The output of this job consists of two columns and is stored in Apache Parquet format in the output path.
The following are the expected output columns:
rate_code – Represents the rate code for the trip.
total_distance – Represents the total trip distance for that rate code (for example, sum(trip_distance)).
RateCodeStatus – The second job that runs on the EMR cluster, but only if the first job finishes successfully. This job depends on two different input sets:
tripdata.csv – The same trip data that is used for the first Spark job.
miles-per-rate – The output of the first job.
This job first reads the tripdata.csv file and aggregates the fare_amount by the rate_code. After this point, you have two different datasets, both aggregated by rate_code. Finally, the job uses the rate_code field to join two datasets and output the entire rate code status in a single CSV file.
The output columns are as follows:
rate_code_id – Represents the rate code type.
total_distance – Derived from first Spark job and represents the total trip distance.
total_fare_amount – A new field that is generated during the second Spark application, representing the total fare amount by the rate code type.
Note that in this case, you don’t need to run two different Spark jobs to generate that output. The goal of setting up the jobs in this way is just to create a dependency between the two jobs and use them within AWS Step Functions.
Both Spark applications take one input argument called rootPath. It’s the S3 location where the Spark job is stored along with input and output data. Here is a sample of the final output:
The next section discusses how you can use Apache Livy to interact with Spark applications that are running on Amazon EMR.
Using Apache Livy to interact with Apache Spark
Apache Livy provides a REST interface to interact with Spark running on an EMR cluster. Livy is included in Amazon EMR release version 5.9.0 and later. In this post, I use Livy to submit Spark jobs and retrieve job status. When Amazon EMR is launched with Livy installed, the EMR master node becomes the endpoint for Livy, and it starts listening on port 8998 by default. Livy provides APIs to interact with Spark.
Let’s look at a couple of examples how you can interact with Spark running on Amazon EMR using Livy.
To list active running jobs, you can execute the following from the EMR master node:
curl localhost:8998/sessions
If you want to do the same from a remote instance, just change localhost to the EMR hostname, as in the following (port 8998 must be open to that remote instance through the security group):
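For example (the master node DNS name is a placeholder):
curl http://ec2-xx-xxx-xx-xx.compute-1.amazonaws.com:8998/sessions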
Through Spark submit, you can pass multiple arguments for the Spark job and Spark configuration settings. You can also do that using Livy, by passing the S3 path through the args parameter, as shown following:
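For instance, a submission through Livy’s POST /batches API might look like the following sketch; the bucket, jar path, class name, and master DNS are placeholders rather than the exact values used by this post’s template:
curl -X POST --data '{
  "file": "s3://your-bucket/emr-step-functions/spark-taxi.jar",
  "className": "MilesPerRateCode",
  "args": ["s3://your-bucket"]
}' -H "Content-Type: application/json" http://<EMR-master-DNS>:8998/batches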
For a detailed list of Livy APIs, see the Apache Livy REST API page. This post uses GET /batches and POST /batches.
In the next section, you create a state machine and orchestrate Spark applications using AWS Step Functions.
Using AWS Step Functions to create a Spark job workflow
AWS Step Functions automatically triggers and tracks each step and retries when it encounters errors. So your application executes in order and as expected every time. To create a Spark job workflow using AWS Step Functions, you first create a Lambda state machine using different types of states to create the entire workflow.
First, you use the Task state—a simple state in AWS Step Functions that performs a single unit of work. You also use the Wait state to delay the state machine from continuing for a specified time. Later, you use the Choice state to add branching logic to a state machine.
The following is a quick summary of how to use different states in the state machine to create the Spark ETL pipeline:
Task state – Invokes a Lambda function. The first Task state submits the Spark job on Amazon EMR, and the next Task state is used to retrieve the previous Spark job status.
Wait state – Pauses the state machine until a job completes execution.
Choice state – Each Spark job execution can return a failure, an error, or a success state. So, in the state machine, you use the Choice state to create a rule that specifies the next action or step based on the success or failure of the previous step.
Here is one of my Task states, MilesPerRateCode, which simply submits a Spark job:
"MilesPerRate Job": {
"Type": "Task",
"Resource":"arn:aws:lambda:us-east-1:xxxxxx:function:blog-miles-per-rate-job-submit-function",
"ResultPath": "$.jobId",
"Next": "Wait for MilesPerRate job to complete"
}
This Task state configuration specifies the Lambda function to execute. Inside the Lambda function, it submits a Spark job through Livy using Livy’s POST API. Using ResultPath, it tells the state machine where to place the result of the executing task. As discussed in the previous section, Spark submit returns the session ID, which is captured with $.jobId and used in a later state.
The Next field in the preceding Task state tells the state machine which state to go to next. The following code section shows the Lambda function that is used to submit the MilesPerRateCode job. It uses the Python requests library to submit a POST request against the Livy endpoint hosted on Amazon EMR, passing the required parameters in JSON format through the payload. It then parses the response, grabs the id from it, and returns it.
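That function is created for you by the CloudFormation template described later; as a hedged sketch of the idea (the Livy endpoint, jar path, and class name below are placeholders, not the template’s exact code), it might look like this:
import json
import os

import requests

# Placeholder Livy endpoint; the real function would read the EMR master address
# from its configuration (for example, an environment variable).
LIVY_ENDPOINT = os.environ.get('LIVY_ENDPOINT', 'http://emr-master.example.com:8998')

def lambda_handler(event, context):
    # The state machine input provides rootPath, e.g. {"rootPath": "s3://tm-app-demos"}.
    root_path = event['rootPath']
    payload = {
        'file': root_path + '/emr-step-functions/spark-taxi.jar',
        'className': 'MilesPerRateCode',  # illustrative class name
        'args': [root_path]
    }
    response = requests.post(LIVY_ENDPOINT + '/batches',
                             data=json.dumps(payload),
                             headers={'Content-Type': 'application/json'})
    # Livy returns the batch session id; the state machine stores it under $.jobId.
    return response.json()['id']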
Just like in the MilesPerRate job, another state submits the RateCodeStatus job, but it executes only when all previous jobs have completed successfully.
Here is the Task state in the state machine that checks the Spark job status:
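The exact definition comes from the CloudFormation template; sketched from the description that follows, it looks roughly like this (the Lambda ARN and state names are placeholders):
"Query MilesPerRate job status": {
  "Type": "Task",
  "Resource": "arn:aws:lambda:us-east-1:xxxxxx:function:blog-spark-job-status-function",
  "ResultPath": "$.jobStatus",
  "Next": "MilesPerRate job complete?"
}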
Just like other states, the preceding Task executes a Lambda function, captures the result (represented by jobStatus), and passes it to the next state. The following is the Lambda function that checks the Spark job status based on a given session ID:
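A simplified sketch of that status check (again, the Livy endpoint is a placeholder, not the exact code from the template):
import os

import requests

LIVY_ENDPOINT = os.environ.get('LIVY_ENDPOINT', 'http://emr-master.example.com:8998')

def lambda_handler(event, context):
    # The submit state stored the Livy batch session id under $.jobId.
    job_id = event['jobId']
    response = requests.get('%s/batches/%s' % (LIVY_ENDPOINT, job_id))
    # Livy reports states such as 'running', 'success', or 'dead'.
    return response.json()['state']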
In the Choice state, it checks the Spark job status value, compares it with a predefined state status, and transitions the state based on the result. For example, if the status is success, move to the next state (RateCodeJobStatus job), and if it is dead, move to the MilesPerRate job failed state.
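A Choice state along those lines might look like this sketch (the state names are illustrative and follow the flow described above):
"MilesPerRate job complete?": {
  "Type": "Choice",
  "Choices": [
    {
      "Variable": "$.jobStatus",
      "StringEquals": "success",
      "Next": "RateCodeStatus Job"
    },
    {
      "Variable": "$.jobStatus",
      "StringEquals": "dead",
      "Next": "MilesPerRate job failed"
    }
  ],
  "Default": "Wait for MilesPerRate job to complete"
}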
To set up this entire solution, you need to create a few AWS resources. To make it easier, I have created an AWS CloudFormation template. This template creates all the required AWS resources and configures all the resources that are needed to create a Spark-based ETL pipeline on AWS Step Functions.
This CloudFormation template requires you to pass the following four parameters during initiation.
ClusterSubnetID – The subnet where the Amazon EMR cluster is deployed and Lambda is configured to talk to this subnet.
KeyName – The name of the existing EC2 key pair to access the Amazon EMR cluster.
VPCID – The ID of the virtual private cloud (VPC) where the EMR cluster is deployed and Lambda is configured to talk to this VPC.
S3RootPath – The Amazon S3 path where all required files (input file, Spark job, and so on) are stored and the resulting data is written.
IMPORTANT: These templates are designed only to show how you can create a Spark-based ETL pipeline on AWS Step Functions using Apache Livy. They are not intended for production use without modification. And if you try this solution outside of the us-east-1 Region, download the necessary files from s3://aws-data-analytics-blog/emr-step-functions, upload the files to the buckets in your Region, edit the script as appropriate, and then run it.
To launch the CloudFormation stack, choose Launch Stack:
Launching this stack creates the following list of AWS resources.
StepFunctionsStateExecutionRole (IAM role) – IAM role to execute the state machine and have a trust relationship with the states service.
SparkETLStateMachine (AWS Step Functions state machine) – State machine in AWS Step Functions for the Spark ETL workflow.
LambdaSecurityGroup (Amazon EC2 security group) – Security group that is used for the Lambda function to call the Livy API.
RateCodeStatusJobSubmitFunction (AWS Lambda function) – Lambda function to submit the RateCodeStatus job.
MilesPerRateJobSubmitFunction (AWS Lambda function) – Lambda function to submit the MilesPerRate job.
SparkJobStatusFunction (AWS Lambda function) – Lambda function to check the Spark job status.
LambdaStateMachineRole (IAM role) – IAM role for all Lambda functions to use the lambda trust relationship.
EMRCluster (Amazon EMR cluster) – EMR cluster where Livy is running and where the job is placed.
During the AWS CloudFormation deployment phase, the template sets up the S3 paths for input and output. Input files are stored in the <<s3-root-path>>/emr-step-functions/input/ path, whereas spark-taxi.jar is copied under <<s3-root-path>>/emr-step-functions/.
The following screenshot shows how the S3 paths are configured after deployment. In this example, I passed a bucket that I created in the AWS account s3://tm-app-demos for the S3 root path.
If the CloudFormation template completed successfully, you will see Spark-ETL-State-Machine in the AWS Step Functions dashboard, as follows:
Choose the Spark-ETL-State-Machine state machine to take a look at this implementation. The AWS CloudFormation template built the entire state machine along with its dependent Lambda functions, which are now ready to be executed.
On the dashboard, choose the newly created state machine, and then choose New execution to initiate the state machine. It asks you to pass input in JSON format. This input goes to the first state MilesPerRate Job, which eventually executes the Lambda function blog-miles-per-rate-job-submit-function.
Pass the S3 root path as input:
{
  "rootPath": "s3://tm-app-demos"
}
Then choose Start Execution:
The rootPath value is the same value that was passed when creating the CloudFormation stack. It can be an S3 bucket location or a bucket with prefixes, but it should be the same value that is used for AWS CloudFormation. This value tells the state machine where it can find the Spark jar and input file, and where it will write output files. After the state machine starts, each state/task is executed based on its definition in the state machine.
At a high level, the following represents the flow of events:
Execute the first Spark job, MilesPerRate.
The Spark job reads the input file from the location <<rootPath>>/emr-step-functions/input/tripdata.csv. If the job finishes successfully, it writes the output data to <<rootPath>>/emr-step-functions/miles-per-rate.
If the Spark job fails, it transitions to the error state MilesPerRate job failed, and the state machine stops. If the Spark job finishes successfully, it transitions to the RateCodeStatus Job state, and the second Spark job is executed.
If the second Spark job fails, it transitions to the error state RateCodeStatus job failed, and the state machine stops with the Failed status.
If this Spark job completes successfully, it writes the final output data to the <<rootPath>>/emr-step-functions/rate-code-status/ folder. It also transitions to the RateCodeStatus job finished state, and the state machine ends its execution with the Success status.
The following screenshot shows a successfully completed Spark ETL state machine:
The right side of the state machine diagram shows the details of individual states with their input and output.
When you execute the state machine for the second time, it fails because the S3 path already exists. The state machine turns red and stops at MilePerRate job failed. The following image represents that failed execution of the state machine:
You can also check your Spark application status and logs by going to the Amazon EMR console and viewing the Application history tab:
I hope this walkthrough paints a picture of how you can create a serverless solution for orchestrating Spark jobs on Amazon EMR using AWS Step Functions and Apache Livy. In the next section, I share some ideas for making this solution even more elegant.
Next steps
The goal of this post is to show a simple example that uses AWS Step Functions to create an orchestration for Spark-based jobs in a serverless fashion. To make this solution robust and production ready, you can explore the following options:
In this example, I manually initiated the state machine by passing the rootPath as input. You can instead trigger the state machine automatically. To run the ETL pipeline as soon as the files arrive in your S3 bucket, you can pass the new file path to the state machine. Because CloudWatch Events supports AWS Step Functions as a target, you can create a CloudWatch rule for an S3 event. You can then set AWS Step Functions as a target and pass the new file path to your state machine. You’re all set!
You can also improve this solution by adding an alerting mechanism in case of failures. To do this, create a Lambda function that sends an alert email and assign that Lambda function to a Fail state. That way, when any part of your state machine fails, it triggers an email and notifies the user.
If you want to submit multiple Spark jobs in parallel, you can use the Parallel state type in AWS Step Functions. The Parallel state is used to create parallel branches of execution in your state machine.
With Lambda and AWS Step Functions, you can create a very robust serverless orchestration for your big data workload.
Cleaning up
When you’ve finished testing this solution, remember to clean up all those AWS resources that you created using AWS CloudFormation. Use the AWS CloudFormation console or AWS CLI to delete the stack named Blog-Spark-ETL-Step-Functions.
Summary
In this post, I showed you how to use AWS Step Functions to orchestrate your Spark jobs that are running on Amazon EMR. You used Apache Livy to submit jobs to Spark from a Lambda function and created a workflow for your Spark jobs, maintaining a specific order for job execution and triggering different AWS events based on your job’s outcome. Go ahead—give this solution a try, and share your experience with us!
Tanzir Musabbir is an EMR Specialist Solutions Architect with AWS. He is an early adopter of open source Big Data technologies. At AWS, he works with our customers to provide them architectural guidance for running analytics solutions on Amazon EMR, Amazon Athena & AWS Glue. Tanzir is a big Real Madrid fan and he loves to travel in his free time.
Thanks to Greg Eppel, Sr. Solutions Architect, Microsoft Platform for this great blog that describes how to create a custom CodeBuild build environment for the .NET Framework. — AWS CodeBuild is a fully managed build service that compiles source code, runs tests, and produces software packages that are ready to deploy. CodeBuild provides curated build environments for programming languages and runtimes such as Android, Go, Java, Node.js, PHP, Python, Ruby, and Docker. CodeBuild now supports builds for the Microsoft Windows Server platform, including a prepackaged build environment for .NET Core on Windows. If your application uses the .NET Framework, you will need to use a custom Docker image to create a custom build environment that includes the Microsoft proprietary Framework Class Libraries. For information about why this step is required, see our FAQs. In this post, I’ll show you how to create a custom build environment for .NET Framework applications and walk you through the steps to configure CodeBuild to use this environment.
Build environments are Docker images that include a complete file system with everything required to build and test your project. To use a custom build environment in a CodeBuild project, you build a container image for your platform that contains your build tools, push it to a Docker container registry such as Amazon Elastic Container Registry (Amazon ECR), and reference it in the project configuration. When it builds your application, CodeBuild retrieves the Docker image from the container registry specified in the project configuration and uses the environment to compile your source code, run your tests, and package your application.
Step 1: Launch EC2 Windows Server 2016 with Containers
In the Amazon EC2 console, in your region, launch an Amazon EC2 instance from a Microsoft Windows Server 2016 Base with Containers AMI.
Increase disk space on the boot volume to at least 50 GB to account for the larger size of containers required to install and run Visual Studio Build Tools.
Run the following command in the directory that contains your Dockerfile. This process can take a while; the time depends on the size of the EC2 instance you launched. In my tests, a t2.2xlarge instance takes less than 30 minutes to build the image and produces an image of approximately 15 GB.
docker build -t buildtools2017:latest -m 2GB .
Run the following command to test the container and start a command shell with all the developer environment variables:
docker run -it buildtools2017
Create a repository in the Amazon ECR console. For the repository name, type buildtools2017. Choose Next step and then complete the remaining steps.
Execute the following command to authenticate your local Docker engine to your Amazon ECR registry. Make sure you have permissions to the Amazon ECR registry before you execute the command.
aws ecr get-login
In the same command prompt window, copy and paste the following commands:
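Those commands tag the local image with your ECR repository URI and push it; with placeholder account and Region values they look like this:
docker tag buildtools2017:latest <account-id>.dkr.ecr.<region>.amazonaws.com/buildtools2017:latest
docker push <account-id>.dkr.ecr.<region>.amazonaws.com/buildtools2017:latest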
In the CodeCommit console, create a repository named DotNetFrameworkSampleApp. On the Configure email notifications page, choose Skip.
Clone a .NET Framework Docker sample application from GitHub. The repository includes a sample ASP.NET Framework that we’ll use to demonstrate our custom build environment.On the EC2 instance, open a command prompt and execute the following commands:
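The commands are essentially a clone of the sample, followed by a push to the CodeCommit repository created above; with placeholder values for the GitHub URL and Region, they look like this:
git clone https://github.com/<sample-org>/<dotnet-framework-sample>.git
cd <dotnet-framework-sample>
git remote add codecommit https://git-codecommit.<region>.amazonaws.com/v1/repos/DotNetFrameworkSampleApp
git push codecommit master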
Navigate to the CodeCommit repository and confirm that the files you just pushed are there.
Step 4: Configure build spec
To build your .NET Framework application with CodeBuild, you use a build spec: a collection of build commands and related settings, in YAML format, that AWS CodeBuild uses to run a build. You can include the build spec as part of the source code, or you can define it when you create a build project. In this example, I include the build spec as part of the source code.
In the root directory of your source directory, create a YAML file named buildspec.yml.
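The exact contents depend on your solution; a minimal sketch for a .NET Framework build (the solution name and build commands are illustrative, not the sample’s exact buildspec) could be:
version: 0.2
phases:
  build:
    commands:
      - nuget restore
      - msbuild .\DotNetFrameworkSampleApp.sln /p:Configuration=Release
artifacts:
  files:
    - '**/*'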
At this point, we have a Docker image with Visual Studio Build Tools installed and stored in the Amazon ECR registry. We also have a sample ASP.NET Framework application in a CodeCommit repository. Now we are going to set up CodeBuild to build the ASP.NET Framework application.
In the Amazon ECR console, choose the repository that was pushed earlier with the docker push command. On the Permissions tab, choose Add.
For Source Provider, choose AWS CodeCommit, and then choose the DotNetFrameworkSampleApp repository.
For Environment Image, choose Specify a Docker image.
For Environment type, choose Windows.
For Custom image type, choose Amazon ECR.
For Amazon ECR repository, choose the Docker image with the Visual Studio Build Tools installed, buildtools2017. Your configuration should look like the image below:
Choose Continue and then Save and Build to create your CodeBuild project and start your first build. You can monitor the status of the build in the console. You can also configure notifications that will notify subscribers whenever builds succeed, fail, go from one phase to another, or any combination of these events.
Summary
CodeBuild supports a number of platforms and languages out of the box. By using custom build environments, it can be extended to other runtimes. In this post, I showed you how to build a .NET Framework environment on a Windows container and demonstrated how to use it to build .NET Framework applications in CodeBuild.
We’re excited to see how customers extend and use CodeBuild to enable continuous integration and continuous delivery for their Windows applications. Feel free to share what you’ve learned extending CodeBuild for your own projects. Just leave questions or suggestions in the comments.
When I talk with customers and partners, I find that they are in different stages in the adoption of DevOps methodologies. They are automating the creation of application artifacts and the deployment of their applications to different infrastructure environments. In many cases, they are creating and supporting multiple applications using a variety of coding languages and artifacts.
The management of these processes and artifacts can be challenging, but using the right tools and methodologies can simplify the process.
In this post, I will show you how you can automate the creation and storage of application artifacts through the implementation of a pipeline and custom deploy action in AWS CodePipeline. The example includes a Node.js code base stored in an AWS CodeCommit repository. A Node Package Manager (npm) artifact is built from the code base, and the build artifact is published to a JFrog Artifactory npm repository.
I frequently recommend AWS CodePipeline, the AWS continuous integration and continuous delivery tool. You can use it to quickly innovate through integration and deployment of new features and bug fixes by building a workflow that automates the build, test, and deployment of new versions of your application. And, because AWS CodePipeline is extensible, it allows you to create a custom action that performs customized, automated actions on your behalf.
JFrog’s Artifactory is a universal binary repository manager where you can manage multiple applications, their dependencies, and versions in one place. Artifactory also enables you to standardize the way you manage your package types across all applications developed in your company, no matter the code base or artifact type.
If you already have a Node.js CodeCommit repository and a JFrog Artifactory host, and you would like to automate the creation of the pipeline, including the custom action and CodeBuild project, you can use this AWS CloudFormation template to create your AWS CloudFormation stack.
This figure shows the path defined in the pipeline for this project. It starts with a change to Node.js source code committed to a private code repository in AWS CodeCommit. With this change, CodePipeline triggers AWS CodeBuild to create the npm package from the node.js source code. After the build, CodePipeline triggers the custom action job worker to commit the build artifact to the designated artifact repository in Artifactory.
This blog post assumes you have already:
· Created a CodeCommit repository that contains a Node.js project.
· Configured a two-stage pipeline in AWS CodePipeline.
The Source stage of the pipeline is configured to poll the Node.js CodeCommit repository. The Build stage is configured to use a CodeBuild project to build the npm package using a buildspec.yml file located in the code repository.
If you do not have a Node.js repository, you can create a CodeCommit repository that contains this simple ‘Hello World’ project. This project also includes a buildspec.yml file that is used when you define your CodeBuild project. It defines the steps to be taken by CodeBuild to create the npm artifact.
If you do not already have a pipeline set up in CodePipeline, you can use this template to create a pipeline with a CodeCommit source action and a CodeBuild build action through the AWS Command Line Interface (AWS CLI). If you do not want to install the AWS CLI on your local machine, you can use AWS Cloud9, our managed integrated development environment (IDE), to interact with AWS APIs.
In your development environment, open your favorite editor and fill out the template with values appropriate to your project. For information, see the readme in the GitHub repository.
Use this CLI command to create the pipeline from the template:
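If the template you filled out is a pipeline-structure JSON file, the command is along these lines (the file name is a placeholder):
aws codepipeline create-pipeline --cli-input-json file://pipeline_template.json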
It creates a pipeline that has a CodeCommit source action and a CodeBuild build action.
Integrating JFrog Artifactory
JFrog Artifactory provides default repositories for your project needs. For my NPM package repository, I am using the default virtual npm repository (named npm) that is available in Artifactory Pro. You might want to consider creating a repository per project but for the example used in this post, using the default lets me get started without having to configure a new repository.
I can use the steps in the Set Me Up -> npm section on the landing page to configure my worker to interact with the default NPM repository.
The integration requires two pieces: a custom action definition and a job worker that executes it.
The custom action definition describes the required values to run the custom action. I will define my custom action in the ‘Deploy’ category, identify the provider as ‘Artifactory’, of version ‘1’, and specify a variety of configurationProperties whose values will be defined when this stage is added to my pipeline.
The job worker polls CodePipeline for a job, scanning for its action-definition properties. In this blog post, after a job has been found, the job worker does the work required to publish the npm artifact to the Artifactory repository.
Here is the custom action definition I use, in JSON format:
{
"category": "Deploy",
"configurationProperties": [{
"name": "TypeOfArtifact",
"required": true,
"key": true,
"secret": false,
"description": "Package type, ex. npm for node packages",
"type": "String"
},
{ "name": "RepoKey",
"required": true,
"key": true,
"secret": false,
"type": "String",
"description": "Name of the repository in which this artifact should be stored"
},
{ "name": "UserName",
"required": true,
"key": true,
"secret": false,
"type": "String",
"description": "Username for authenticating with the repository"
},
{ "name": "Password",
"required": true,
"key": true,
"secret": true,
"type": "String",
"description": "Password for authenticating with the repository"
},
{ "name": "EmailAddress",
"required": true,
"key": true,
"secret": false,
"type": "String",
"description": "Email address used to authenticate with the repository"
},
{ "name": "ArtifactoryHost",
"required": true,
"key": true,
"secret": false,
"type": "String",
"description": "Public address of Artifactory host, ex: https://myexamplehost.com or http://myexamplehost.com:8080"
}],
"provider": "Artifactory",
"version": "1",
"settings": {
"entityUrlTemplate": "{Config:ArtifactoryHost}/artifactory/webapp/#/artifacts/browse/tree/General/{Config:RepoKey}"
},
"inputArtifactDetails": {
"maximumCount": 5,
"minimumCount": 1
},
"outputArtifactDetails": {
"maximumCount": 5,
"minimumCount": 0
}
}
There are seven sections to the custom action definition:
category: This is the stage in which you will be creating this action. It can be Source, Build, Deploy, Test, Invoke, Approval. Except for source actions, the category section simply allows us to organize our actions. I am setting the category for my action as ‘Deploy’ because I’m using it to publish my node artifact to my Artifactory instance.
configurationProperties: These are the parameters or variables required for your project to authenticate and commit your artifact. In the case of my custom worker, I need:
TypeOfArtifact: In this case, npm, because it’s for the Node Package Manager.
RepoKey: The name of the repository. In this case, it’s the default npm.
UserName and Password for the user to authenticate with the Artifactory repository.
EmailAddress used to authenticate with the repository.
ArtifactoryHost: The Artifactory host name or IP address.
provider: The name you define for your custom action stage. I have named the provider Artifactory.
version: Version number for the custom action. Because this is the first version, I set the version number to 1.
entityUrlTemplate: This URL is presented to your users for the deploy stage along with the title you define in your provider. The link takes the user to their artifact repository page in the Artifactory host.
inputArtifactDetails: The number of artifacts to expect from the previous stage in the pipeline.
outputArtifactDetails: The number of artifacts that should be the result from the custom action stage. Later in this blog post, I define 0 for my output artifacts because I am publishing the artifact to the Artifactory repository as the final action.
After I define the custom action in a JSON file, I use the AWS CLI to create the custom action type in CodePipeline:
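Assuming the definition above is saved to a local file (the file name here is a placeholder), the call looks something like this:
aws codepipeline create-custom-action-type --cli-input-json file://artifactory_custom_action.json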
After I create the custom action type in the same region as my pipeline, I edit the pipeline to add a Deploy stage and configure it to use the custom action I created for Artifactory:
I have created a custom worker for the actions required to commit the npm artifact to the Artifactory repository. The worker is in Python and it runs in a loop on an Amazon EC2 instance. My custom worker polls for a deploy job and publishes the NPM artifact to the Artifactory repository.
The EC2 instance is running Amazon Linux and has an IAM instance role attached that gives the worker permission to access CodePipeline. The worker process is as follows:
Take the configuration properties from the custom worker and poll CodePipeline for a custom action job.
After there is a job in the job queue with the appropriate category, provider, and version, acknowledge the job.
Download the zipped artifact created in the previous Build stage from the provided S3 buckets with the provided temporary credentials.
Unzip the artifact into a temporary directory.
A user-defined Artifactory user name and password are used to receive a temporary API key from Artifactory.
That temporary API key and the user name are then used to authenticate with the npm repository, which avoids having to write the password to a file.
Publish the Node.js package to the specified repository.
Because I am running my custom worker on an Amazon Linux EC2 instance, I installed npm with the following command:
sudo yum install nodejs npm --enablerepo=epel
For my custom worker, I used pip to install the required Python libraries:
pip install boto3 requests
For a full Python package list, see requirements.txt in the GitHub repository.
Let’s take a look at some of the code snippets from the worker.
First, the worker polls for jobs:
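# Snippet context (assumed, not shown in this excerpt): the full worker imports
# time and boto3, imports ClientError from botocore.exceptions, and creates the
# client used below with: codepipeline = boto3.client('codepipeline')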
def action_type():
    ActionType = {
        'category': 'Deploy',
        'owner': 'Custom',
        'provider': 'Artifactory',
        'version': '1'}
    return(ActionType)

def poll_for_jobs():
    try:
        artifactory_action_type = action_type()
        print(artifactory_action_type)
        jobs = codepipeline.poll_for_jobs(actionTypeId=artifactory_action_type)
        while not jobs['jobs']:
            time.sleep(10)
            jobs = codepipeline.poll_for_jobs(actionTypeId=artifactory_action_type)
        if jobs['jobs']:
            print('Job found')
        return jobs['jobs'][0]
    except ClientError as e:
        print("Received an error: %s" % str(e))
        raise
When there is a job in the queue, the poller returns a number of values from the queue such as jobId, the input and output S3 buckets for artifacts, temporary credentials to access the S3 buckets, and other configuration details from the stage in the pipeline.
After successfully receiving the job details, the worker sends an acknowledgement to CodePipeline to ensure that the work on the job is not duplicated by other workers watching for the same job:
def job_acknowledge(jobId, nonce):
    try:
        print('Acknowledging job')
        result = codepipeline.acknowledge_job(jobId=jobId, nonce=nonce)
        return result
    except Exception as e:
        print("Received an error when trying to acknowledge the job: %s" % str(e))
        raise
With the job now acknowledged, the worker publishes the source code artifact into the desired repository. The worker gets the value of the artifact S3 bucket and objectKey from the inputArtifacts in the response from the poll_for_jobs API request. Next, the worker creates a new directory in /tmp and downloads the S3 object into this directory:
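# Snippet context (assumed): the full worker imports os, tempfile, boto3, and botocore,
# and imports Session from boto3.session to build a session with the temporary credentials.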
def get_bucket_location(bucketName, init_client):
    region = init_client.get_bucket_location(Bucket=bucketName)['LocationConstraint']
    if not region:
        region = 'us-east-1'
    return region

def get_s3_artifact(bucketName, objectKey, ak, sk, st):
    init_s3 = boto3.client('s3')
    region = get_bucket_location(bucketName, init_s3)
    session = Session(aws_access_key_id=ak,
                      aws_secret_access_key=sk,
                      aws_session_token=st)
    s3 = session.resource('s3',
                          region_name=region,
                          config=botocore.client.Config(signature_version='s3v4'))
    try:
        tempdirname = tempfile.mkdtemp()
    except OSError as e:
        print('Could not create a temporary directory: %s' % str(e))
        raise
    bucket = s3.Bucket(bucketName)
    obj = bucket.Object(objectKey)
    filename = tempdirname + '/' + objectKey
    try:
        if os.path.dirname(objectKey):
            directory = os.path.dirname(filename)
            os.makedirs(directory)
        print('Downloading the %s object and writing it to disk in %s location' % (objectKey, tempdirname))
        with open(filename, 'wb') as data:
            obj.download_fileobj(data)
    except ClientError as e:
        print('Downloading the object and writing the file to disk raised this error: ' + str(e))
        raise
    return(filename, tempdirname)
Because the downloaded artifact from S3 is a zip file, the worker must unzip it first. To have a clean area in which to work, I extract the downloaded zip archive into a new directory:
def unzip_codepipeline_artifact(artifact, origtmpdir):
    # create a new temp directory
    # Unzip artifact into new directory
    try:
        newtempdir = tempfile.mkdtemp()
        print('Extracting artifact %s into temporary directory %s' % (artifact, newtempdir))
        zip_ref = zipfile.ZipFile(artifact, 'r')
        zip_ref.extractall(newtempdir)
        zip_ref.close()
        shutil.rmtree(origtmpdir)
        return(os.listdir(newtempdir), newtempdir)
    except OSError as e:
        if e.errno != errno.EEXIST:
            shutil.rmtree(newtempdir)
            raise
The worker now has the npm package that I want to store in my Artifactory NPM repository.
To authenticate with the NPM repository, the worker requests a temporary token from the Artifactory host. After receiving this temporary token, it creates a .npmrc file in the worker user’s home directory that includes a hash of the user name and temporary token. After it has authenticated, the worker runs npm config set registry <URL OF REPOSITORY> to configure the npm registry value to be the Artifactory host. Next, the worker runs npm publish --registry <URL OF REPOSITORY>, which publishes the node package to the NPM repository in the Artifactory host.
def push_to_npm(configuration, artifact_list, temp_dir, jobId):
    reponame = configuration['RepoKey']
    art_type = configuration['TypeOfArtifact']
    print("Putting artifact into NPM repository " + reponame)
    token, hostname, username = gen_artifactory_auth_token(configuration)
    npmconfigfile = create_npmconfig_file(configuration, username, token)
    url = hostname + '/artifactory/api/' + art_type + '/' + reponame
    print("Changing directory to " + str(temp_dir))
    os.chdir(temp_dir)
    try:
        print("Publishing following files to the repository: %s " % os.listdir(temp_dir))
        print("Sending artifact to Artifactory NPM registry URL: " + url)
        subprocess.call(["npm", "config", "set", "registry", url])
        req = subprocess.call(["npm", "publish", "--registry", url])
        print("Return code from npm publish: " + str(req))
        if req != 0:
            err_msg = "npm ERR! Received non-OK response while sending response to Artifactory. Return code from npm publish: " + str(req)
            signal_failure(jobId, err_msg)
        else:
            signal_success(jobId)
    except requests.exceptions.RequestException as e:
        print("Received an error when trying to commit artifact %s to repository %s: %s" % (str(art_type), str(configuration['RepoKey']), str(e)))
        raise
    return(req, npmconfigfile)
If the return value from publishing to the repository is not 0, the worker signals a failure to CodePipeline. If the value is 0, the worker signals success to CodePipeline to indicate that the stage of the pipeline has been completed successfully.
For the custom worker code, see npm_job_worker.py in the GitHub repository.
I run my custom worker on an EC2 instance using the command python npm_job_worker.py, with an optional --version flag that can be used to specify worker versions other than 1. Then I trigger a release change in my pipeline:
From my custom worker output logs, I have just committed a package named node_example at version 1.0.3:
On artifact: index.js
Committing to the repo: https://artifactory.myexamplehost.com/artifactory/api/npm/npm
Sending artifact to Artifactory URL: https://artifactory.myexamplehost.com/artifactory/api/npm/npm
npm config: 0
npm http PUT https://artifactory.myexamplehost.com/artifactory/api/npm/npm/node_example
npm http 201 https://artifactory.myexamplehost.com/artifactory/api/npm/npm/node_example
+ node_example@1.0.3
Return code from npm publish: 0
Signaling success to CodePipeline
After that has been built successfully, I can find my artifact in my Artifactory repository:
To help you automate this process, I have created this AWS CloudFormation template that automates the creation of the CodeBuild project, the custom action, and the CodePipeline pipeline. It also launches the Amazon EC2-based custom job worker in an AWS Auto Scaling group. This template requires you to have a VPC and CodeCommit repository for your Node.js project. If you do not currently have a VPC in which you want to run your custom worker EC2 instances, you can use this AWS QuickStart to create one. If you do not have an existing Node.js project, I’ve provided a sample project in the GitHub repository.
Conclusion
I‘ve shown you the steps to integrate your JFrog Artifactory repository with your CodePipeline workflow. I’ve shown you how to create a custom action in CodePipeline and how to create a custom worker that works in your CI/CD pipeline. To dig deeper into custom actions and see how you can integrate your Artifactory repositories into your AWS CodePipeline projects, check out the full code base on GitHub.
If you have any questions or feedback, feel free to reach out to us through the AWS CodePipeline forum.
Erin McGill is a Solutions Architect in the AWS Partner Program with a focus on DevOps and automation tooling.