Tag Archives: Documentation

New – SES Dedicated IP Pools

Post Syndicated from Randall Hunt original https://aws.amazon.com/blogs/aws/new-ses-dedicated-ip-pools/

Today we released Dedicated IP Pools for Amazon Simple Email Service (SES). With dedicated IP pools, you can specify which dedicated IP addresses to use for sending different types of email. Dedicated IP pools let you segment your SES sending by task: for instance, you can send transactional emails from one set of IPs and marketing emails from another.

If you’re not familiar with Amazon SES, these concepts may not make much sense. We haven’t had the chance to cover SES on this blog since 2016, which is a shame, so I want to take a few steps back and talk about the service as a whole and some of the enhancements the team has made over the past year. If you just want the details on this new feature, I strongly recommend reading the Amazon Simple Email Service Blog.

What is SES?

So, what is SES? If you’re a customer of Amazon.com, you know that we send a lot of emails. Bought something? You get an email. Order shipped? You get an email. Over time, as both email volumes and types increased, Amazon.com needed to build an email platform that was flexible, scalable, reliable, and cost-effective. SES is the result of years of Amazon’s own work in dealing with email and maximizing deliverability.

In short: SES gives you the ability to send and receive many types of email with the monitoring and tools to ensure high deliverability.

Sending an email is easy; one simple API call:

import boto3
ses = boto3.client('ses')
ses.send_email(
    Source='sender@example.com',
    Destination={'ToAddresses': ['recipient@example.com']},
    Message={
        'Subject': {'Data': 'Hello, World!'},
        'Body': {'Text': {'Data': 'Hello, World!'}}
    }
)

Receiving and reacting to emails is easy too. You can set up rulesets that forward received emails to Amazon Simple Storage Service (S3), Amazon Simple Notification Service (SNS), or AWS Lambda – you could even trigger an Amazon Lex bot through Lambda to communicate with your customers over email. SES is a powerful tool for building applications. The image below shows just a fraction of the capabilities:
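As a hedged sketch of that receiving side with boto3 (the rule set name, recipient address, bucket, and Lambda ARN are placeholders, not values from this post):

import boto3

ses = boto3.client('ses')

# Create a rule set, make it active, and add a rule that stores inbound mail
# in S3 and then invokes a Lambda function to react to it.
ses.create_receipt_rule_set(RuleSetName='inbound-rules')
ses.set_active_receipt_rule_set(RuleSetName='inbound-rules')
ses.create_receipt_rule(
    RuleSetName='inbound-rules',
    Rule={
        'Name': 'store-and-process',
        'Enabled': True,
        'Recipients': ['support@example.com'],
        'Actions': [
            {'S3Action': {'BucketName': 'example-inbound-mail', 'ObjectKeyPrefix': 'raw/'}},
            {'LambdaAction': {'FunctionArn': 'arn:aws:lambda:us-east-1:123456789012:function:ProcessEmail',
                              'InvocationType': 'Event'}}
        ],
        'ScanEnabled': True
    }
)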

Deliverability 101

Deliverability is the percentage of your emails that arrive in your recipients’ inboxes. Maintaining deliverability is a shared responsibility between AWS and the customer. AWS takes the fight against spam very seriously and works hard to make sure services aren’t abused. To learn more about deliverability I recommend the deliverability docs. For now, understand that deliverability is an important aspect of email campaigns and SES has many tools that enable a customer to manage their deliverability.

Dedicated IPs and Dedicated IP pools

When you’re starting out with SES your emails are sent through a shared IP. That IP is responsible for sending mail on behalf of many customers, and AWS works to maintain appropriate volume and deliverability on each of those IPs. However, when you reach a sufficient volume, shared IPs may not be the right solution.

By creating dedicated IPs you’re able to fully control their reputations. This makes it vastly easier to troubleshoot any deliverability or reputation issues. It’s also useful for many email certification programs, which require a dedicated IP as a commitment to maintaining your email reputation. Using the shared IPs of the Amazon SES service is still the right move for many customers, but if you have a sustained sending volume greater than hundreds of thousands of emails per day you might want to consider a dedicated IP. One caveat to be aware of: if you’re not sending a sufficient volume of email with a consistent pattern, a dedicated IP can actually hurt your reputation. Dedicated IPs are $24.95 per address per month at the time of this writing – but you can find out more at the pricing page.

Before you can use a Dedicated IP you need to “warm” it. You do this by gradually increasing the volume of emails you send through a new address. Each IP needs time to build a positive reputation. In March of this year SES released the ability to automatically warm your IPs over the course of 45 days. This feature is on by default for all new dedicated IPs.

Customers who send high volumes of email will typically have multiple dedicated IPs. Today the SES team released dedicated IP pools to make managing those IPs easier. Now when you send email you can specify a configuration set, which will route your email to an IP in a pool based on the pool’s association with that configuration set.
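A minimal sending sketch, assuming a configuration set named marketing already exists and has been associated with one of your dedicated IP pools (the addresses and set name are placeholders):

import boto3

ses = boto3.client('ses')

ses.send_email(
    Source='sender@example.com',
    Destination={'ToAddresses': ['recipient@example.com']},
    Message={
        'Subject': {'Data': 'Monthly newsletter'},
        'Body': {'Text': {'Data': 'Hello from the marketing pool!'}}
    },
    ConfigurationSetName='marketing'  # routes through the IP pool tied to this set
)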

One of the other major benefits of this feature is that it allows customers who previously split their email sending across several AWS accounts (to manage their reputation for different types of email) to consolidate into a single account.

You can read the documentation and blog for more info.

Analyzing AWS Cost and Usage Reports with Looker and Amazon Athena

Post Syndicated from Dillon Morrison original https://aws.amazon.com/blogs/big-data/analyzing-aws-cost-and-usage-reports-with-looker-and-amazon-athena/

This is a guest post by Dillon Morrison at Looker. Looker is, in their own words, “a new kind of analytics platform–letting everyone in your business make better decisions by getting reliable answers from a tool they can use.” 

As the breadth of AWS products and services continues to grow, customers are able to more easily move their technology stack and core infrastructure to AWS. One of the attractive benefits of AWS is the cost savings. Rather than paying upfront capital expenses for large on-premises systems, customers can instead pay variable expenses for on-demand services. To further reduce expenses, AWS users can reserve resources for specific periods of time, and automatically scale resources as needed.

The AWS Cost Explorer is great for aggregated reporting. However, conducting analysis on the raw data using the flexibility and power of SQL allows for much richer detail and insight, and can be the better choice for the long term. Thankfully, with the introduction of Amazon Athena, monitoring and managing these costs is now easier than ever.

In this post, I walk through setting up the data pipeline for cost and usage reports, Amazon S3, and Athena, and discuss some of the most common levers for cost savings. I surface tables through Looker, which comes with a host of pre-built data models and dashboards to make analysis of your cost and usage data simple and intuitive.

Analysis with Athena

With Athena, there’s no need to create hundreds of Excel reports, move data around, or deploy clusters to house and process data. Athena uses Apache Hive’s DDL to create tables, and the Presto querying engine to process queries. Analysis can be performed directly on raw data in S3. Conveniently, AWS exports raw cost and usage data directly into a user-specified S3 bucket, making it simple to start querying with Athena quickly. This makes continuous monitoring of costs virtually seamless, since there is no infrastructure to manage. Instead, users can leverage the power of the Athena SQL engine to easily perform ad-hoc analysis and data discovery without needing to set up a data warehouse.

After the data pipeline is established, cost and usage data (the recommended billing data, per AWS documentation) provides a plethora of comprehensive information around usage of AWS services and the associated costs. Whether you need the report segmented by product type, user identity, or region, this report can be cut-and-sliced any number of ways to properly allocate costs for any of your business needs. You can then drill into any specific line item to see even further detail, such as the selected operating system, tenancy, purchase option (on-demand, spot, or reserved), and so on.

Walkthrough

By default, the Cost and Usage report exports CSV files, which you can compress using gzip (recommended for performance). There are some additional configuration options for tuning performance further, which are discussed below.

Prerequisites

If you want to follow along, you need the following resources:

Enable the cost and usage reports

First, enable the Cost and Usage report. For Time unit, select Hourly. For Include, select Resource IDs. All options are prompted in the report-creation window.

The Cost and Usage report dumps CSV files into the specified S3 bucket. Please note that it can take up to 24 hours for the first file to be delivered after enabling the report.

Configure the S3 bucket and files for Athena querying

In addition to the CSV file, AWS also creates a JSON manifest file for each cost and usage report. Athena requires that all of the files in the S3 bucket are in the same format, so we need to get rid of all these manifest files. If you’re looking to get started with Athena quickly, you can simply go into your S3 bucket and delete the manifest file manually, skip the automation described below, and move on to the next section.

To automate the process of removing the manifest file each time a new report is dumped into S3, which I recommend as you scale, there are a few additional steps. The folks at Concurrency Labs wrote a great overview and set of scripts for this, which you can find in their GitHub repo.

These scripts take the data from an input bucket, remove anything unnecessary, and dump it into a new output bucket. We can utilize AWS Lambda to trigger this process whenever new data is dropped into S3, or on a nightly basis, or whatever makes most sense for your use-case, depending on how often you’re querying the data. Please note that enabling the “hourly” report means that data is reported at the hour-level of granularity, not that a new file is generated every hour.
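Stripped down to its essence, a cleanup Lambda of this sort might look like the sketch below. This is not the Concurrency Labs code: the bucket names, the OUTPUT_BUCKET environment variable, and the manifest-matching rule (skip anything ending in .json) are all illustrative assumptions.

import os
import boto3

s3 = boto3.client('s3')
OUTPUT_BUCKET = os.environ['OUTPUT_BUCKET']  # hypothetical: the bucket Athena will query

def lambda_handler(event, context):
    # Triggered by S3 object-created events on the report (input) bucket.
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        key = record['s3']['object']['key']
        # Skip the JSON manifest files so only CSV data reaches the Athena bucket.
        if key.endswith('.json'):
            continue
        s3.copy_object(
            Bucket=OUTPUT_BUCKET,
            Key=key,
            CopySource={'Bucket': bucket, 'Key': key}
        )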

Following the Concurrency Labs scripts, you’ll notice that we’re adding a date partition field, which isn’t necessary but improves query performance. In addition, converting data from CSV to a columnar format like ORC or Parquet also improves performance. We can automate this process using Lambda whenever new data is dropped in our S3 bucket. Amazon Web Services discusses columnar conversion at length, and provides walkthrough examples, in their documentation.

As a long-term solution, best practice is to use compression, partitioning, and conversion. However, for purposes of this walkthrough, we’re not going to worry about them so we can get up-and-running quicker.

Set up the Athena query engine

In your AWS console, navigate to the Athena service, and click “Get Started”. Follow the tutorial and set up a new database (we’ve called ours “AWS Optimizer” in this example). Don’t worry about configuring your initial table, per the tutorial instructions. We’ll be creating a new table for cost and usage analysis. Once you’ve walked through the tutorial steps, you’ll be able to access the Athena interface and can begin running Hive DDL statements to create new tables.

One thing that’s important to note is that the Cost and Usage CSVs also contain the column headers in their first row, meaning that the column headers would be included in the dataset and any queries. For testing and quick set-up, you can remove this line manually from your first few CSV files. Long-term, you’ll want to use a script to programmatically remove this row each time a new file is dropped in S3 (every few hours, typically). We’ve drafted up a sample script for ease of reference, which we run on Lambda. We utilize Lambda’s native ability to invoke the script whenever a new object is dropped in S3.
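Our production script is a bit more involved, but a minimal sketch of the idea (for uncompressed CSVs, with CLEAN_BUCKET as a hypothetical destination bucket so the write does not re-trigger the function) could look like this:

import os
import boto3

s3 = boto3.client('s3')
CLEAN_BUCKET = os.environ['CLEAN_BUCKET']  # hypothetical: bucket the Athena table points at

def lambda_handler(event, context):
    # Invoked whenever a new report CSV object is dropped in the source bucket.
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        key = record['s3']['object']['key']
        body = s3.get_object(Bucket=bucket, Key=key)['Body'].read().decode('utf-8')
        # Drop the first line (the column headers) and write the cleaned file out.
        cleaned = body.split('\n', 1)[1] if '\n' in body else ''
        s3.put_object(Bucket=CLEAN_BUCKET, Key=key, Body=cleaned)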

For cost and usage, we recommend using the DDL statement below. Since our data is in CSV format, we don’t need to use a SerDe; we can simply specify the separatorChar, quoteChar, and escapeChar, and the structure of the files (“TEXTFILE”). Note that AWS does have an OpenCSV SerDe as well, if you prefer to use that.

CREATE EXTERNAL TABLE IF NOT EXISTS cost_and_usage (
identity_LineItemId String,
identity_TimeInterval String,
bill_InvoiceId String,
bill_BillingEntity String,
bill_BillType String,
bill_PayerAccountId String,
bill_BillingPeriodStartDate String,
bill_BillingPeriodEndDate String,
lineItem_UsageAccountId String,
lineItem_LineItemType String,
lineItem_UsageStartDate String,
lineItem_UsageEndDate String,
lineItem_ProductCode String,
lineItem_UsageType String,
lineItem_Operation String,
lineItem_AvailabilityZone String,
lineItem_ResourceId String,
lineItem_UsageAmount String,
lineItem_NormalizationFactor String,
lineItem_NormalizedUsageAmount String,
lineItem_CurrencyCode String,
lineItem_UnblendedRate String,
lineItem_UnblendedCost String,
lineItem_BlendedRate String,
lineItem_BlendedCost String,
lineItem_LineItemDescription String,
lineItem_TaxType String,
product_ProductName String,
product_accountAssistance String,
product_architecturalReview String,
product_architectureSupport String,
product_availability String,
product_bestPractices String,
product_cacheEngine String,
product_caseSeverityresponseTimes String,
product_clockSpeed String,
product_currentGeneration String,
product_customerServiceAndCommunities String,
product_databaseEdition String,
product_databaseEngine String,
product_dedicatedEbsThroughput String,
product_deploymentOption String,
product_description String,
product_durability String,
product_ebsOptimized String,
product_ecu String,
product_endpointType String,
product_engineCode String,
product_enhancedNetworkingSupported String,
product_executionFrequency String,
product_executionLocation String,
product_feeCode String,
product_feeDescription String,
product_freeQueryTypes String,
product_freeTrial String,
product_frequencyMode String,
product_fromLocation String,
product_fromLocationType String,
product_group String,
product_groupDescription String,
product_includedServices String,
product_instanceFamily String,
product_instanceType String,
product_io String,
product_launchSupport String,
product_licenseModel String,
product_location String,
product_locationType String,
product_maxIopsBurstPerformance String,
product_maxIopsvolume String,
product_maxThroughputvolume String,
product_maxVolumeSize String,
product_maximumStorageVolume String,
product_memory String,
product_messageDeliveryFrequency String,
product_messageDeliveryOrder String,
product_minVolumeSize String,
product_minimumStorageVolume String,
product_networkPerformance String,
product_operatingSystem String,
product_operation String,
product_operationsSupport String,
product_physicalProcessor String,
product_preInstalledSw String,
product_proactiveGuidance String,
product_processorArchitecture String,
product_processorFeatures String,
product_productFamily String,
product_programmaticCaseManagement String,
product_provisioned String,
product_queueType String,
product_requestDescription String,
product_requestType String,
product_routingTarget String,
product_routingType String,
product_servicecode String,
product_sku String,
product_softwareType String,
product_storage String,
product_storageClass String,
product_storageMedia String,
product_technicalSupport String,
product_tenancy String,
product_thirdpartySoftwareSupport String,
product_toLocation String,
product_toLocationType String,
product_training String,
product_transferType String,
product_usageFamily String,
product_usagetype String,
product_vcpu String,
product_version String,
product_volumeType String,
product_whoCanOpenCases String,
pricing_LeaseContractLength String,
pricing_OfferingClass String,
pricing_PurchaseOption String,
pricing_publicOnDemandCost String,
pricing_publicOnDemandRate String,
pricing_term String,
pricing_unit String,
reservation_AvailabilityZone String,
reservation_NormalizedUnitsPerReservation String,
reservation_NumberOfReservations String,
reservation_ReservationARN String,
reservation_TotalReservedNormalizedUnits String,
reservation_TotalReservedUnits String,
reservation_UnitsPerReservation String,
resourceTags_userName String,
resourceTags_usercostcategory String
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  ESCAPED BY '\\'
  LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION 's3://<<your bucket name>>';

Once you’ve successfully executed the command, you should see a new table named “cost_and_usage” with the below properties. Now we’re ready to start executing queries and running analysis!

Start with Looker and connect to Athena

Setting up Looker is a quick process, and you can try it out for free here (or download from Amazon Marketplace). It takes just a few seconds to connect Looker to your Athena database, and Looker comes with a host of pre-built data models and dashboards to make analysis of your cost and usage data simple and intuitive. After you’re connected, you can use the Looker UI to run whatever analysis you’d like. Looker translates this UI to optimized SQL, so any user can execute and visualize queries for true self-service analytics.

Major cost saving levers

Now that the data pipeline is configured, you can dive into the most popular use cases for cost savings. In this post, I focus on:

  • Purchasing Reserved Instances vs. On-Demand Instances
  • Data transfer costs
  • Allocating costs over users or other attributes (denoted with resource tags)

On-Demand, Spot, and Reserved Instances

Purchasing Reserved Instances vs On-Demand Instances is arguably going to be the biggest cost lever for heavy AWS users (Reserved Instances run up to 75% cheaper!). AWS offers three options for purchasing instances:

  • On-Demand—Pay as you use.
  • Spot (variable cost)—Bid on spare Amazon EC2 computing capacity.
  • Reserved Instances—Pay for an instance for a specific, allotted period of time.

When purchasing a Reserved Instance, you can also choose to pay all-upfront, partial-upfront, or monthly. The more you pay upfront, the greater the discount.

If your company has been using AWS for some time now, you should have a good sense of your overall instance usage on a per-month or per-day basis. Rather than paying for these instances On-Demand, you should try to forecast the number of instances you’ll need, and reserve them with upfront payments.

The total amount of usage with Reserved Instances versus overall usage with all instances is called your coverage ratio. It’s important not to confuse your coverage ratio with your Reserved Instance utilization, which represents the amount of reserved hours that were actually used. Don’t worry about exceeding capacity: you can still set up Auto Scaling preferences so that more instances get added whenever your coverage or utilization crosses a certain threshold (we often see a target of 80% for both coverage and utilization among savvy customers).
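To make the distinction concrete, here is a toy calculation with invented numbers:

# Invented numbers, just to illustrate coverage vs. utilization.
total_instance_hours = 10_000       # all usage: Reserved + On-Demand + Spot
reserved_hours_used = 7_500         # usage that actually ran on Reserved Instances
reserved_hours_purchased = 8_000    # reserved hours you paid for

coverage = reserved_hours_used / total_instance_hours         # 0.75 -> 75% coverage
utilization = reserved_hours_used / reserved_hours_purchased  # ~0.94 -> 94% utilization

print(f"coverage: {coverage:.0%}, utilization: {utilization:.0%}")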

Calculating the reserved costs and coverage can be a bit tricky with the level of granularity provided by the cost and usage report. The following query shows your total cost over the last 6 months, broken out by Reserved Instance vs other instance usage. You can substitute the cost field for usage if you’d prefer. Please note that you should only have data for the time period after the cost and usage report has been enabled (though you can opt for up to 3 months of historical data by contacting your AWS Account Executive). If you’re just getting started, this query will only show a few days.


SELECT 
	DATE_FORMAT(from_iso8601_timestamp(cost_and_usage.lineitem_usagestartdate),'%Y-%m') AS "cost_and_usage.usage_start_month",
	COALESCE(SUM(cost_and_usage.lineitem_unblendedcost ), 0) AS "cost_and_usage.total_unblended_cost",
	COALESCE(SUM(CASE WHEN (CASE
         WHEN cost_and_usage.lineitem_lineitemtype = 'DiscountedUsage' THEN 'RI Line Item'
         WHEN cost_and_usage.lineitem_lineitemtype = 'RIFee' THEN 'RI Line Item'
         WHEN cost_and_usage.lineitem_lineitemtype = 'Fee' THEN 'RI Line Item'
         ELSE 'Non RI Line Item'
        END = 'RI Line Item') THEN cost_and_usage.lineitem_unblendedcost  ELSE NULL END), 0) AS "cost_and_usage.total_reserved_unblended_cost",
	1.0 * (COALESCE(SUM(CASE WHEN (CASE
         WHEN cost_and_usage.lineitem_lineitemtype = 'DiscountedUsage' THEN 'RI Line Item'
         WHEN cost_and_usage.lineitem_lineitemtype = 'RIFee' THEN 'RI Line Item'
         WHEN cost_and_usage.lineitem_lineitemtype = 'Fee' THEN 'RI Line Item'
         ELSE 'Non RI Line Item'
        END = 'RI Line Item') THEN cost_and_usage.lineitem_unblendedcost  ELSE NULL END), 0)) / NULLIF((COALESCE(SUM(cost_and_usage.lineitem_unblendedcost ), 0)),0)  AS "cost_and_usage.percent_spend_on_ris",
	COALESCE(SUM(CASE WHEN (CASE
         WHEN cost_and_usage.lineitem_lineitemtype = 'DiscountedUsage' THEN 'RI Line Item'
         WHEN cost_and_usage.lineitem_lineitemtype = 'RIFee' THEN 'RI Line Item'
         WHEN cost_and_usage.lineitem_lineitemtype = 'Fee' THEN 'RI Line Item'
         ELSE 'Non RI Line Item'
        END = 'Non RI Line Item') THEN cost_and_usage.lineitem_unblendedcost  ELSE NULL END), 0) AS "cost_and_usage.total_non_reserved_unblended_cost",
	1.0 * (COALESCE(SUM(CASE WHEN (CASE
         WHEN cost_and_usage.lineitem_lineitemtype = 'DiscountedUsage' THEN 'RI Line Item'
         WHEN cost_and_usage.lineitem_lineitemtype = 'RIFee' THEN 'RI Line Item'
         WHEN cost_and_usage.lineitem_lineitemtype = 'Fee' THEN 'RI Line Item'
         ELSE 'Non RI Line Item'
        END = 'Non RI Line Item') THEN cost_and_usage.lineitem_unblendedcost  ELSE NULL END), 0)) / NULLIF((COALESCE(SUM(cost_and_usage.lineitem_unblendedcost ), 0)),0)  AS "cost_and_usage.percent_spend_on_non_ris"
FROM aws_optimizer.cost_and_usage  AS cost_and_usage

WHERE 
	(((from_iso8601_timestamp(cost_and_usage.lineitem_usagestartdate)) >= ((DATE_ADD('month', -5, DATE_TRUNC('MONTH', CAST(NOW() AS DATE))))) AND (from_iso8601_timestamp(cost_and_usage.lineitem_usagestartdate)) < ((DATE_ADD('month', 6, DATE_ADD('month', -5, DATE_TRUNC('MONTH', CAST(NOW() AS DATE))))))))
GROUP BY 1
ORDER BY 2 DESC
LIMIT 500

The resulting table should look something like the image below (I’m surfacing tables through Looker, though the same table would result from querying via command line or any other interface).

With a BI tool, you can create dashboards for easy reference and monitoring. New data is dumped into S3 every few hours, so your dashboards can update several times per day.

It’s an iterative process to understand the appropriate number of Reserved Instances needed to meet your business needs. After you’ve properly integrated Reserved Instances into your purchasing patterns, the savings can be significant. If your coverage is consistently below 70%, you should seriously consider adjusting your purchase types and opting for more Reserved Instances.

Data transfer costs

One of the great things about AWS data storage is that it’s incredibly cheap. Most charges often come from moving and processing that data. There are several different prices for transferring data, broken out largely by transfers between regions and availability zones. Transfers between regions are the most costly, followed by transfers between Availability Zones. Transfers within the same region and same availability zone are free unless using elastic or public IP addresses, in which case there is a cost. You can find more detailed information in the AWS Pricing Docs. With this in mind, there are several simple strategies for helping reduce costs.

First, since costs increase when transferring data between regions, it’s wise to ensure that as many services as possible reside within the same region. The more you can localize services to one specific region, the lower your costs will be.

Second, you should maximize the data you’re routing directly within AWS services and IP addresses. Transfers out to the open internet are the most costly and least performant mechanisms of data transfers, so it’s best to keep transfers within AWS services.

Lastly, data transfers between private IP addresses are cheaper than between elastic or public IP addresses, so utilizing private IP addresses as much as possible is the most cost-effective strategy.

The following query provides a table depicting the total costs for each AWS product, broken out by transfer cost type. Substitute the “lineitem_productcode” field in the query to segment the costs by any other attribute. If you notice any unusually high spikes in cost, you’ll need to dig deeper to understand what’s driving that spike: location, volume, and so on. Drill down into specific costs by including “product_usagetype” and “product_transfertype” in your query to identify the types of transfer costs that are driving up your bill.

SELECT 
	cost_and_usage.lineitem_productcode  AS "cost_and_usage.product_code",
	COALESCE(SUM(cost_and_usage.lineitem_unblendedcost), 0) AS "cost_and_usage.total_unblended_cost",
	COALESCE(SUM(CASE WHEN REGEXP_LIKE(cost_and_usage.product_usagetype, 'DataTransfer')    THEN cost_and_usage.lineitem_unblendedcost  ELSE NULL END), 0) AS "cost_and_usage.total_data_transfer_cost",
	COALESCE(SUM(CASE WHEN REGEXP_LIKE(cost_and_usage.product_usagetype, 'DataTransfer-In')    THEN cost_and_usage.lineitem_unblendedcost  ELSE NULL END), 0) AS "cost_and_usage.total_inbound_data_transfer_cost",
	COALESCE(SUM(CASE WHEN REGEXP_LIKE(cost_and_usage.product_usagetype, 'DataTransfer-Out')    THEN cost_and_usage.lineitem_unblendedcost  ELSE NULL END), 0) AS "cost_and_usage.total_outbound_data_transfer_cost"
FROM aws_optimizer.cost_and_usage  AS cost_and_usage

WHERE 
	(((from_iso8601_timestamp(cost_and_usage.lineitem_usagestartdate)) >= ((DATE_ADD('month', -5, DATE_TRUNC('MONTH', CAST(NOW() AS DATE))))) AND (from_iso8601_timestamp(cost_and_usage.lineitem_usagestartdate)) < ((DATE_ADD('month', 6, DATE_ADD('month', -5, DATE_TRUNC('MONTH', CAST(NOW() AS DATE))))))))
GROUP BY 1
ORDER BY 2 DESC
LIMIT 500

When moving between regions or over the open web, many data transfer costs also include the origin and destination location of the data movement. Using a BI tool with mapping capabilities, you can get a nice visual of data flows. The point at the center of the map is used to represent external data flows over the open internet.

Analysis by tags

AWS provides the option to apply custom tags to individual resources, so you can allocate costs over whatever customized segment makes the most sense for your business. For a SaaS company that hosts software for customers on AWS, maybe you’d want to tag the size of each customer. The following query uses custom tags to display the reserved, data transfer, and total cost for each AWS service, broken out by tag categories, over the last 6 months. You’ll want to substitute the cost_and_usage.resourcetags_customersegment and cost_and_usage.customer_segment with the name of your customer field.


SELECT * FROM (
SELECT *, DENSE_RANK() OVER (ORDER BY z___min_rank) as z___pivot_row_rank, RANK() OVER (PARTITION BY z__pivot_col_rank ORDER BY z___min_rank) as z__pivot_col_ordering FROM (
SELECT *, MIN(z___rank) OVER (PARTITION BY "cost_and_usage.product_code") as z___min_rank FROM (
SELECT *, RANK() OVER (ORDER BY CASE WHEN z__pivot_col_rank=1 THEN (CASE WHEN "cost_and_usage.total_unblended_cost" IS NOT NULL THEN 0 ELSE 1 END) ELSE 2 END, CASE WHEN z__pivot_col_rank=1 THEN "cost_and_usage.total_unblended_cost" ELSE NULL END DESC, "cost_and_usage.total_unblended_cost" DESC, z__pivot_col_rank, "cost_and_usage.product_code") AS z___rank FROM (
SELECT *, DENSE_RANK() OVER (ORDER BY CASE WHEN "cost_and_usage.customer_segment" IS NULL THEN 1 ELSE 0 END, "cost_and_usage.customer_segment") AS z__pivot_col_rank FROM (
SELECT 
	cost_and_usage.lineitem_productcode  AS "cost_and_usage.product_code",
	cost_and_usage.resourcetags_customersegment  AS "cost_and_usage.customer_segment",
	COALESCE(SUM(cost_and_usage.lineitem_unblendedcost ), 0) AS "cost_and_usage.total_unblended_cost",
	1.0 * (COALESCE(SUM(CASE WHEN REGEXP_LIKE(cost_and_usage.product_usagetype, 'DataTransfer')    THEN cost_and_usage.lineitem_unblendedcost  ELSE NULL END), 0)) / NULLIF((COALESCE(SUM(cost_and_usage.lineitem_unblendedcost ), 0)),0)  AS "cost_and_usage.percent_spend_data_transfers_unblended",
	1.0 * (COALESCE(SUM(CASE WHEN (CASE
         WHEN cost_and_usage.lineitem_lineitemtype = 'DiscountedUsage' THEN 'RI Line Item'
         WHEN cost_and_usage.lineitem_lineitemtype = 'RIFee' THEN 'RI Line Item'
         WHEN cost_and_usage.lineitem_lineitemtype = 'Fee' THEN 'RI Line Item'
         ELSE 'Non RI Line Item'
        END = 'Non RI Line Item') THEN cost_and_usage.lineitem_unblendedcost  ELSE NULL END), 0)) / NULLIF((COALESCE(SUM(cost_and_usage.lineitem_unblendedcost ), 0)),0)  AS "cost_and_usage.unblended_percent_spend_on_ris"
FROM aws_optimizer.cost_and_usage_raw  AS cost_and_usage

WHERE 
	(((from_iso8601_timestamp(cost_and_usage.lineitem_usagestartdate)) >= ((DATE_ADD('month', -5, DATE_TRUNC('MONTH', CAST(NOW() AS DATE))))) AND (from_iso8601_timestamp(cost_and_usage.lineitem_usagestartdate)) < ((DATE_ADD('month', 6, DATE_ADD('month', -5, DATE_TRUNC('MONTH', CAST(NOW() AS DATE))))))))
GROUP BY 1,2) ww
) bb WHERE z__pivot_col_rank <= 16384
) aa
) xx
) zz
 WHERE z___pivot_row_rank <= 500 OR z__pivot_col_ordering = 1 ORDER BY z___pivot_row_rank

The resulting table in this example looks like the results below. In this example, you can tell that we’re making poor use of Reserved Instances because they represent such a small portion of our overall costs.

Again, using a BI tool to visualize these costs and trends over time makes the analysis much easier to consume and take action on.

Summary

Saving costs on your AWS spend is always an iterative, ongoing process. Hopefully with these queries alone, you can start to understand your spending patterns and identify opportunities for savings. However, this is just a peek into the many opportunities available through analysis of the Cost and Usage report. Each company is different, with unique needs and usage patterns. To achieve maximum cost savings, we encourage you to set up an analytics environment that enables your team to explore all potential cuts and slices of your usage data whenever necessary. Exploring different trends and spikes across regions, services, user types, etc. helps you gain a comprehensive understanding of your major cost levers and consistently implement new cost-reduction strategies.

Note that all of the queries and analysis provided in this post were generated using the Looker data platform. If you’re already a Looker customer, you can get all of this analysis, additional pre-configured dashboards, and much more using Looker Blocks for AWS.


About the Author

Dillon Morrison leads the Platform Ecosystem at Looker. He enjoys exploring new technologies and architecting the most efficient data solutions for the business needs of his company and their customers. In his spare time, you’ll find Dillon rock climbing in the Bay Area or nose deep in the docs of the latest AWS product release at his favorite cafe (“Arlequin in SF is unbeatable!”).


Wanted: Front End Developer

Post Syndicated from Yev original https://www.backblaze.com/blog/wanted-front-end-developer/

Want to work at a company that helps customers in over 150 countries around the world protect the memories they hold dear? Do you want to challenge yourself with a business that serves consumers, SMBs, Enterprise, and developers? If all that sounds interesting, you might be interested to know that Backblaze is looking for a Front End Developer​!

Backblaze is a 10-year-old company. Providing great customer experiences is the “secret sauce” that enables us to successfully compete against some of technology’s giants. We’ll finish the year at ~$20MM ARR and are a profitable business. This is an opportunity to have your work shine at scale in one of the fastest-growing verticals in tech – Cloud Storage.

You will utilize HTML, ReactJS, CSS and jQuery to develop intuitive, elegant user experiences. As a member of our Front End Dev team, you will work closely with our web development, software design, and marketing teams.

On a day-to-day basis, you must be able to convert image mockups to HTML or ReactJS – there’s some production work that needs to get done. But you will also be responsible for helping build out new features, rethinking old processes, and enabling third-party systems to empower our marketing, sales, and support teams.

Our Front End Developer must be proficient in:

  • HTML, ReactJS
  • UTF-8, Java Properties, and Localized HTML (Backblaze runs in 11 languages!)
  • JavaScript, CSS, Ajax
  • jQuery, Bootstrap
  • JSON, XML
  • Understanding of cross-browser compatibility issues and ways to work around them
  • Basic SEO principles and ensuring that applications will adhere to them
  • Learning about third party marketing and sales tools through reading documentation. Our systems include Google Tag Manager, Google Analytics, Salesforce, and Hubspot

Struts, Java, JSP, Servlet and Apache Tomcat are a plus, but not required.

We’re looking for someone who is:

  • Passionate about building friendly, easy-to-use interfaces and APIs.
  • Eager to work closely with other engineers, support, and marketing to help customers.
  • Comfortable working independently on a mutually agreed upon prioritization queue (we don’t micromanage, but we do make sure tasks are reasonably defined and scoped).
  • Diligent with quality control. Backblaze prides itself on giving our team autonomy to get work done, do the right thing for our customers, and keep a pace that is sustainable over the long run. As such, we expect everyone to check in code that is stable. We also have a small QA team that operates as a secondary check when needed.

Backblaze Employees Have:

  • Good attitude and willingness to do whatever it takes to get the job done
  • Strong desire to work for a small, fast-paced company
  • Desire to learn and adapt to rapidly changing technologies and work environment
  • Comfort with well behaved pets in the office

This position is located in San Mateo, California. Regular attendance in the office is expected. Backblaze is an Equal Opportunity Employer and we offer competitive salary and benefits, including our no policy vacation policy.

If this sounds like you
Send an email to [email protected] with:

  1. Front End Dev​ in the subject line
  2. Your resume attached
  3. An overview of your relevant experience

The post Wanted: Front End Developer appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Launch – AWS Glue Now Generally Available

Post Syndicated from Randall Hunt original https://aws.amazon.com/blogs/aws/launch-aws-glue-now-generally-available/

Today we’re excited to announce the general availability of AWS Glue. Glue is a fully managed, serverless, and cloud-optimized extract, transform and load (ETL) service. Glue is different from other ETL services and platforms in a few very important ways.

First, Glue is “serverless” – you don’t need to provision or manage any resources and you only pay for resources when Glue is actively running. Second, Glue provides crawlers that can automatically detect and infer schemas from many data sources, data types, and across various types of partitions. It stores these generated schemas in a centralized Data Catalog for editing, versioning, querying, and analysis. Third, Glue can automatically generate ETL scripts (in Python!) to translate your data from your source formats to your target formats. Finally, Glue lets you create development endpoints so your developers can use their favorite toolchains to construct their ETL scripts. Ok, let’s dive deep with an example.

In my job as a Developer Evangelist I spend a lot of time traveling, and I thought it would be cool to play with some flight data. The Bureau of Transportation Statistics is kind enough to share all of this data for anyone to use here. We can easily download this data and put it in an Amazon Simple Storage Service (S3) bucket. This data will be the basis of our work today.

Crawlers

First, we need to create a Crawler for our flight data in S3. We’ll select Crawlers in the Glue console and follow the on-screen prompts from there. I’ll specify s3://crawler-public-us-east-1/flight/2016/csv/ as my first data source (we can add more later if needed). Next, we’ll create a database called flights and give our tables a prefix of flights as well.
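If you prefer the API to the console, the same crawler can be sketched out with boto3 roughly as follows (the IAM role name is a placeholder; it needs permission to read the source bucket):

import boto3

glue = boto3.client('glue')

glue.create_crawler(
    Name='flights-crawler',
    Role='AWSGlueServiceRole-flights',  # placeholder role with access to the source data
    DatabaseName='flights',
    TablePrefix='flights',
    Targets={'S3Targets': [{'Path': 's3://crawler-public-us-east-1/flight/2016/csv/'}]}
)
glue.start_crawler(Name='flights-crawler')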

The Crawler will go over our dataset, detect partitions through various folders (in this case, months of the year), detect the schema, and build a table. We could add additional data sources and jobs to our crawler, or create separate crawlers that push data into the same database, but for now let’s look at the autogenerated schema.

I’m going to make a quick schema change to year, moving it from BIGINT to INT. Then I can compare the two versions of the schema if needed.

Now that we know how to correctly parse this data let’s go ahead and do some transforms.

ETL Jobs

Now we’ll navigate to the Jobs subconsole and click Add Job. We’ll follow the prompts from there, giving our job a name, selecting a data source, and an S3 location for temporary files. Next, we add our target by specifying “Create tables in your data target” and we’ll specify an S3 location in Parquet format as our target.

After clicking next, we’re at a screen showing the various mappings proposed by Glue. Now we can make manual column adjustments as needed – in this case we’re just going to use the X button to remove a few columns that we don’t need.

This brings us to my favorite part. This is what I absolutely love about Glue.

Glue generated a PySpark script to transform our data based on the information we’ve given it so far. On the left hand side we can see a diagram documenting the flow of the ETL job. On the top right we see a series of buttons that we can use to add annotated data sources and targets, transforms, spigots, and other features. This is the interface I get if I click on transform.

If we add any of these transforms or additional data sources, Glue will update the diagram on the left giving us a useful visualization of the flow of our data. We can also just write our own code into the console and have it run. We can add triggers to this job that fire on completion of another job, a schedule, or on demand. That way if we add more flight data we can reload this same data back into S3 in the format we need.

I could spend all day writing about the power and versatility of the jobs console but Glue still has more features I want to cover. So, while I might love the script editing console, I know many people prefer their own development environments, tools, and IDEs. Let’s figure out how we can use those with Glue.

Development Endpoints and Notebooks

A Development Endpoint is an environment used to develop and test our Glue scripts. If we navigate to “Dev endpoints” in the Glue console, we can click “Add endpoint” in the top right to get started. Next, we’ll select a VPC and a security group that references itself, and then we wait for it to provision.


Once it’s provisioned we can create an Apache Zeppelin notebook server by going to actions and clicking create notebook server. We give our instance an IAM role and make sure it has permissions to talk to our data sources. Then we can either SSH into the server or connect to the notebook to interactively develop our script.

Pricing and Documentation

You can see detailed pricing information here. Glue crawlers, ETL jobs, and development endpoints are all billed in Data Processing Unit Hours (DPU-Hours), billed by the minute. Each DPU-Hour costs $0.44 in us-east-1. A single DPU provides 4 vCPUs and 16 GB of memory.
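As a quick back-of-the-envelope example (the job size here is made up):

dpus = 10                  # DPUs allocated to a hypothetical job
runtime_minutes = 15       # how long the job ran
price_per_dpu_hour = 0.44  # us-east-1 price quoted above

cost = dpus * (runtime_minutes / 60) * price_per_dpu_hour
print(f"${cost:.2f}")  # $1.10 for this example run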

We’ve only covered about half of the features that Glue has, so I want to encourage everyone who made it this far into the post to go read the documentation and service FAQs. Glue also has a rich and powerful API that allows you to do anything the console can do and more.

We’re also releasing two new projects today. The aws-glue-libs provide a set of utilities for connecting to and talking with Glue. The aws-glue-samples repo contains a set of example jobs.

I hope you find that using Glue reduces the time it takes to start doing things with your data. Look for another post from me on AWS Glue soon because I can’t stop playing with this new service.
Randall

AWS Config Update – New Managed Rules to Secure S3 Buckets

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/aws-config-update-new-managed-rules-to-secure-s3-buckets/

AWS Config captures the state of your AWS resources and the relationships between them. Among other features, it allows you to select a resource and then view a timeline of configuration changes that affect the resource (read Track AWS Resource Relationships With AWS Config to learn more).

AWS Config Rules extends Config with a powerful rule system, with support for a “managed” collection of AWS rules as well as custom rules that you write yourself (my blog post, AWS Config Rules – Dynamic Compliance Checking for Cloud Resources, contains more info). The rules (AWS Lambda functions) represent the ideal (properly configured and compliant) state of your AWS resources. The appropriate functions are invoked when a configuration change is detected and check to ensure compliance.

You already have access to about three dozen managed rules. For example, here are some of the rules that check your EC2 instances and related resources:

Two New Rules
Today we are adding two new managed rules that will help you to secure your S3 buckets. You can enable these rules with a single click. The new rules are:

s3-bucket-public-write-prohibited – Automatically identifies buckets that allow global write access. There’s rarely a reason to create this configuration intentionally since it allows unauthorized users to add malicious content to buckets and to delete (by overwriting) existing content. The rule checks all of the buckets in the account.

s3-bucket-public-read-prohibited – Automatically identifies buckets that allow global read access. This will flag content that is publicly available, including web sites and documentation. This rule also checks all buckets in the account.
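You can turn both rules on from the Config console with a single click, or programmatically. Here is a hedged boto3 sketch (the rule names are your choice; the AWS-managed source identifiers shown are assumed to match the managed rules described above):

import boto3

config = boto3.client('config')

for rule_name, identifier in [
    ('s3-bucket-public-write-prohibited', 'S3_BUCKET_PUBLIC_WRITE_PROHIBITED'),
    ('s3-bucket-public-read-prohibited', 'S3_BUCKET_PUBLIC_READ_PROHIBITED'),
]:
    config.put_config_rule(
        ConfigRule={
            'ConfigRuleName': rule_name,
            'Scope': {'ComplianceResourceTypes': ['AWS::S3::Bucket']},
            'Source': {'Owner': 'AWS', 'SourceIdentifier': identifier},
        }
    )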

Like the existing rules, the new rules can be run on a schedule or in response to changes detected by Config. You can see the compliance status of all of your rules at a glance:

Each evaluation runs in a matter of milliseconds; scanning an account with 100 buckets will take less than a minute. Behind the scenes, the rules are evaluated by a reasoning engine that uses some leading-edge constraint solving techniques that can, in many cases, address NP-complete problems in polynomial time (we did not resolve P versus NP; that would be far bigger news). This work is part of a larger effort within AWS, some of which is described in an AWS re:Invent presentation: Automated Formal Reasoning About AWS Systems:

Now Available
The new rules are available now and you can start using them today. Like the other rules, they are priced at $2 per rule per month.

Jeff;

New – AWS SAM Local (Beta) – Build and Test Serverless Applications Locally

Post Syndicated from Randall Hunt original https://aws.amazon.com/blogs/aws/new-aws-sam-local-beta-build-and-test-serverless-applications-locally/

Today we’re releasing a beta of a new tool, SAM Local, that makes it easy to build and test your serverless applications locally. In this post we’ll use SAM Local to build, debug, and deploy a quick application that allows us to vote on tabs or spaces by curling an endpoint. AWS introduced the Serverless Application Model (SAM) last year to make it easier for developers to deploy serverless applications. If you’re not already familiar with SAM, my colleague Orr wrote a great post on how to use SAM that you can read in about 5 minutes. At its core, SAM is a powerful open source specification built on AWS CloudFormation that makes it easy to keep your serverless infrastructure as code – and they have the cutest mascot.

SAM Local takes all the good parts of SAM and brings them to your local machine.

There are a couple of ways to install SAM Local but the easiest is through NPM. A quick npm install -g aws-sam-local should get us going but if you want the latest version you can always install straight from the source: go get github.com/awslabs/aws-sam-local (this will create a binary named aws-sam-local, not sam).

I like to vote on things so let’s write a quick SAM application to vote on Spaces versus Tabs. We’ll use a very simple, but powerful, architecture of API Gateway fronting a Lambda function, and we’ll store our results in DynamoDB. In the end, a user should be able to curl our API (curl https://SOMEURL/ -d '{"vote": "spaces"}') and get back the number of votes.

Let’s start by writing a simple SAM template.yaml:

AWSTemplateFormatVersion : '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Resources:
  VotesTable:
    Type: "AWS::Serverless::SimpleTable"
  VoteSpacesTabs:
    Type: "AWS::Serverless::Function"
    Properties:
      Runtime: python3.6
      Handler: lambda_function.lambda_handler
      Policies: AmazonDynamoDBFullAccess
      Environment:
        Variables:
          TABLE_NAME: !Ref VotesTable
      Events:
        Vote:
          Type: Api
          Properties:
            Path: /
            Method: post

So we create a DynamoDB table (the AWS::Serverless::SimpleTable resource) that we expose to our Lambda function through an environment variable called TABLE_NAME.

To test that this template is valid I’ll go ahead and call sam validate to make sure I haven’t fat-fingered anything. It returns Valid! so let’s go ahead and get to work on our Lambda function.

import os
import json
import boto3
votes_table = boto3.resource('dynamodb').Table(os.getenv('TABLE_NAME'))

def lambda_handler(event, context):
    print(event)
    if event['httpMethod'] == 'GET':
        resp = votes_table.scan()
        return {'body': json.dumps({item['id']: int(item['votes']) for item in resp['Items']})}
    elif event['httpMethod'] == 'POST':
        try:
            body = json.loads(event['body'])
        except:
            return {'statusCode': 400, 'body': 'malformed json input'}
        if 'vote' not in body:
            return {'statusCode': 400, 'body': 'missing vote in request body'}
        if body['vote'] not in ['spaces', 'tabs']:
            return {'statusCode': 400, 'body': 'vote value must be "spaces" or "tabs"'}

        resp = votes_table.update_item(
            Key={'id': body['vote']},
            UpdateExpression='ADD votes :incr',
            ExpressionAttributeValues={':incr': 1},
            ReturnValues='ALL_NEW'
        )
        return {'body': "{} now has {} votes".format(body['vote'], resp['Attributes']['votes'])}

So let’s test this locally. I’ll need to create a real DynamoDB table to talk to, and I’ll need to provide the name of that table through the environment variable TABLE_NAME. I could do that with an env.json file or I can just pass it on the command line. First, I can call:
$ echo '{"httpMethod": "POST", "body": "{\"vote\": \"spaces\"}"}' |\
TABLE_NAME="vote-spaces-tabs" sam local invoke "VoteSpacesTabs"

to test the Lambda – it returns the number of votes for spaces, so theoretically everything is working. Typing all of that out is a pain, so I could generate a sample event with sam local generate-event api and pass that in to the local invocation. Far easier than all of that is just running our API locally. Let’s do that: sam local start-api. Now I can curl my local endpoints to test everything out.
I’ll run the command: $ curl -d '{"vote": "tabs"}' http://127.0.0.1:3000/ and it returns: “tabs now has 12 votes”. Now, of course, I did not write this function perfectly on my first try. I edited and saved several times. One of the benefits of hot-reloading is that as I change the function I don’t have to do any additional work to test the new function. This makes iterative development vastly easier.

Let’s say we don’t want to deal with accessing a real DynamoDB database over the network though. What are our options? Well we can download DynamoDB Local and launch it with java -Djava.library.path=./DynamoDBLocal_lib -jar DynamoDBLocal.jar -sharedDb. Then we can have our Lambda function use the AWS_SAM_LOCAL environment variable to make some decisions about how to behave. Let’s modify our function a bit:

import os
import json
import boto3
if os.getenv("AWS_SAM_LOCAL"):
    votes_table = boto3.resource(
        'dynamodb',
        endpoint_url="http://docker.for.mac.localhost:8000/"
    ).Table("spaces-tabs-votes")
else:
    votes_table = boto3.resource('dynamodb').Table(os.getenv('TABLE_NAME'))

Now we’re using a local endpoint to connect to our local database which makes working without wifi a little easier.

SAM Local even supports interactive debugging! In Java and Node.js I can just pass the -d flag and a port to immediately enable the debugger. For Python I could use a library like import epdb; epdb.serve() and connect that way. Then we can call sam local invoke -d 8080 "VoteSpacesTabs" and our function will pause execution, waiting for us to step through with the debugger.

Alright, I think we’ve got everything working so let’s deploy this!

First I’ll call the sam package command which is just an alias for aws cloudformation package and then I’ll use the result of that command to sam deploy.

$ sam package --template-file template.yaml --s3-bucket MYAWESOMEBUCKET --output-template-file package.yaml
Uploading to 144e47a4a08f8338faae894afe7563c3  90570 / 90570.0  (100.00%)
Successfully packaged artifacts and wrote output template to file package.yaml.
Execute the following command to deploy the packaged template
aws cloudformation deploy --template-file package.yaml --stack-name 
$ sam deploy --template-file package.yaml --stack-name VoteForSpaces --capabilities CAPABILITY_IAM
Waiting for changeset to be created..
Waiting for stack create/update to complete
Successfully created/updated stack - VoteForSpaces

Which brings us to our API:

I’m going to hop over into the production stage and add some rate limiting in case you guys start voting a lot – but otherwise we’ve taken our local work and deployed it to the cloud without much effort at all. I always enjoy it when things work on the first deploy!

You can vote now and watch the results live! http://spaces-or-tabs.s3-website-us-east-1.amazonaws.com/

We hope that SAM Local makes it easier for you to test, debug, and deploy your serverless apps. We have a CONTRIBUTING.md guide and we welcome pull requests. Please tweet at us to let us know what cool things you build. You can see our What’s New post here and the documentation is live here.

Randall

OSGeo-Live 11.0 Released

Post Syndicated from ris original https://lwn.net/Articles/730340/rss

OSGeo-Live is a live DVD/USB/VM distribution that includes a variety of open-source geospatial software. Version 11.0 is “a major reboot, with a refocus on leading applications and emphasis on quality over quantity. Less mature parts of the projects have been dropped with a targeted focus placed on upgrading and improving documentation.”

AWS Encryption SDK: How to Decide if Data Key Caching Is Right for Your Application

Post Syndicated from June Blender original https://aws.amazon.com/blogs/security/aws-encryption-sdk-how-to-decide-if-data-key-caching-is-right-for-your-application/


Today, the AWS Crypto Tools team introduced a new feature in the AWS Encryption SDK: data key caching. Data key caching lets you reuse the data keys that protect your data, instead of generating a new data key for each encryption operation.

Data key caching can reduce latency, improve throughput, reduce cost, and help you stay within service limits as your application scales. In particular, caching might help if your application is hitting the AWS Key Management Service (KMS) requests-per-second limit and raising the limit does not solve the problem.

However, these benefits come with some security tradeoffs. Encryption best practices generally discourage extensive reuse of data keys.

In this blog post, I explore those tradeoffs and provide information that can help you decide whether data key caching is a good strategy for your application. I also explain how data key caching is implemented in the AWS Encryption SDK and describe the security thresholds that you can set to limit the reuse of data keys. Finally, I provide some practical examples of using the security thresholds to meet cost, performance, and security goals.

Introducing data key caching

The AWS Encryption SDK is a client-side encryption library that makes it easier for you to implement cryptography best practices in your application. It includes secure default behavior for developers who are not encryption experts, while being flexible enough to work for the most experienced users.

In the AWS Encryption SDK, by default, you generate a new data key for each encryption operation. This is the most secure practice. However, in some applications, the overhead of generating a new data key for each operation is not acceptable.

Data key caching saves the plaintext and ciphertext of the data keys you use in a configurable cache. When you need a key to encrypt or decrypt data, you can reuse a data key from the cache instead of creating a new data key. You can create multiple data key caches and configure each one independently. Most importantly, the AWS Encryption SDK provides security thresholds that you can set to determine how much data key reuse you will allow.

To make data key caching easier to implement, the AWS Encryption SDK provides LocalCryptoMaterialsCache, an in-memory, least-recently-used cache with a configurable size. The SDK manages the cache for you, including adding store, search, and match logic to all encryption and decryption operations.

We recommend that you use LocalCryptoMaterialsCache as it is, but you can customize it, or substitute a compatible cache. However, you should never store plaintext data keys on disk.

The AWS Encryption SDK documentation includes sample code in Java and Python for an application that uses data key caching to encrypt data sent to and from Amazon Kinesis Streams.

Balance cost and security

Your decision to use data key caching should balance cost—in time, money, and resources—against security. In every consideration, though, the balance should favor your security requirements. As a rule, use the minimal caching required to achieve your cost and performance goals.

Before implementing data key caching, consider the details of your applications, your security requirements, and the cost and frequency of your encryption operations. In general, your application can benefit from data key caching if each operation is slow or expensive, or if you encrypt and decrypt data frequently. If the cost and speed of your encryption operations are already acceptable or can be improved by other means, do not use a data key cache.

Data key caching can be the right choice for your application if you have high encryption and decryption traffic. For example, if you are hitting your KMS requests-per-second limit, caching can help because you get some of your data keys from the cache instead of calling KMS for every request.

However, you can also create a case in the AWS Support Center to raise the KMS limit for your account. If raising the limit solves the problem, you do not need data key caching.

Configure caching thresholds for cost and security

In the AWS Encryption SDK, you can configure data key caching to allow just enough data key reuse to meet your cost and performance targets while conforming to the security requirements of your application. The SDK enforces the thresholds so that you can use them with any compatible cache.

The data key caching security thresholds apply to each cache entry. The AWS Encryption SDK will not use the data key from a cache entry that exceeds any of the thresholds that you set.

  • Maximum age (required): Set the lifetime of each cached key to be long enough to get cache hits, but short enough to limit exposure of a plaintext data key in memory to a specific time period.

You can use the maximum age threshold like a key rotation policy. Use it to limit the reuse of data keys and minimize exposure of cryptographic materials. You can also use it to evict data keys when the type or source of data that your application is processing changes.

  • Maximum messages encrypted (optional; default is 2^32 messages): Set the number of messages protected by each cached data key to be large enough to get value from reuse, but small enough to limit the number of messages that might potentially be exposed.

The AWS Encryption SDK only caches data keys that use an algorithm suite with a key derivation function. This technique avoids the cryptographic limits on the number of bytes encrypted with a single key. However, the more data that a key encrypts, the more data that is exposed if the data key is compromised.

Limiting the number of messages, rather than the number of bytes, is particularly useful if your application encrypts many messages of a similar size or when potential exposure must be limited to very few messages. This threshold is also useful when you want to reuse a data key for a particular type of message and know in advance how many messages of that type you have. You can also use an encryption context to select particular cached data keys for your encryption requests.

  • Maximum bytes encrypted (optional; default is 2^63 – 1): Set the bytes protected by each cached data key to be large enough to allow the reuse you need, but small enough to limit the amount of data encrypted under the same key.

Limiting the number of bytes, rather than the number of messages, is preferable when your application encrypts messages of widely varying size or when possibly exposing large amounts of data is much more of a concern than exposing smaller amounts of data.

In addition to these security thresholds, the LocalCryptoMaterialsCache in the AWS Encryption SDK lets you set its capacity, which is the maximum number of entries the cache can hold.

Use the capacity value to tune the performance of your LocalCryptoMaterialsCache. In general, use the smallest value that will achieve the performance improvements that your application requires. You might want to test with a very small cache of 5–10 entries and expand if necessary. You will need a slightly larger cache if you are using the cache for both encryption and decryption requests, or if you are using encryption contexts to select particular cache entries.
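As a concrete illustration, here is a minimal Python sketch of how the cache capacity and the security thresholds might be wired together using the aws-encryption-sdk package (1.x-style API). The KMS key ARN and the threshold values are placeholders chosen for illustration, not recommendations:

import aws_encryption_sdk

# Placeholder CMK ARN -- replace with your own KMS key
key_provider = aws_encryption_sdk.KMSMasterKeyProvider(
    key_ids=['arn:aws:kms:us-west-2:111122223333:key/example-key-id']
)

# Capacity: the maximum number of entries the cache can hold
cache = aws_encryption_sdk.LocalCryptoMaterialsCache(capacity=10)

# The caching materials manager enforces the security thresholds
caching_cmm = aws_encryption_sdk.CachingCryptoMaterialsManager(
    master_key_provider=key_provider,
    cache=cache,
    max_age=60.0,                            # seconds a cached data key may be reused
    max_messages_encrypted=10,               # messages per cached data key
    max_bytes_encrypted=1024 * 1024 * 1024   # bytes per cached data key
)

# Encrypt using the caching materials manager instead of the key provider directly
ciphertext, header = aws_encryption_sdk.encrypt(
    source=b'hello world',
    materials_manager=caching_cmm
)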

Consider these cache configuration examples

After you determine the security and performance requirements of your application, consider the cache security thresholds carefully and adjust them to meet your needs. There are no magic numbers for these thresholds: the ideal settings are specific to each application, its security and performance requirements, and budget. Use the minimal amount of caching necessary to get acceptable performance and cost.

The following examples show ways you can use the LocalCryptoMaterialsCache capacity setting and the security thresholds to help meet your security requirements:

  • Slow master key operations: If your master key processes only 100 transactions per second (TPS) but your application needs to process 1,000 TPS, you can meet your application requirements by allowing a maximum of 10 messages to be protected under each data key.
  • High frequency and volume: If your master key costs $0.01 per operation and you need to process a consistent 1,000 TPS while staying within a budget of $100,000 per month, allow a maximum of 275 messages for each cache entry.
  • Burst traffic: If your application’s processing bursts to 100 TPS for five seconds in each minute but is otherwise zero, and your master key costs $0.01 per operation, setting maximum messages to 3 can achieve significant savings. To prevent data keys from being reused across bursts (55 seconds), set the maximum age of each cached data key to 20 seconds.
  • Expensive master key operations: If your application uses a low-throughput encryption service that costs as much as $1.00 per operation, you might want to minimize the number of operations. To do so, create a cache that is large enough to contain the data keys you need. Then, set the byte and message limits high enough to allow reuse while conforming to your security requirements. For example, if your security requirements do not permit a data key to encrypt more than 10 GB of data, setting bytes processed to 10 GB still significantly minimizes operations and conforms to your security requirements.

Learn more about data key caching

To learn more about data key caching, including how to implement it, how to set the security thresholds, and details about the caching components, see Data Key Caching in the AWS Encryption SDK. Also, see the AWS Encryption SDKs for Java and Python as well as the Javadoc and Python documentation.

If you have comments about this blog post, submit them in the “Comments” section below. If you have questions, file an issue in the GitHub repos for the Encryption SDK in Java or Python, or start a new thread on the KMS forum.

– June

Updates to GPIO Zero, the physical computing API

Post Syndicated from Ben Nuttall original https://www.raspberrypi.org/blog/gpio-zero-update/

GPIO Zero v1.4 is out now! It comes with a set of new features, including a handy pinout command line tool. To start using this newest version of the API, update your Raspbian OS now:

sudo apt update && sudo apt upgrade

Some of the things we’ve added will make it easier for you to try your hand at different programming styles. In doing so you’ll build your coding skills and improve as a programmer. As a consequence, you’ll learn to write more complex code, which will enable you to take on advanced electronics builds. And on top of that, you can use the skills you acquire in other computing projects.

GPIO Zero pinout tool

The new pinout tool

Developing GPIO Zero

Nearly two years ago, I started the GPIO Zero project as a simple wrapper around the low-level RPi.GPIO library. I wanted to create a simpler way to control GPIO-connected devices in Python, based on three years’ experience of training teachers, running workshops, and building projects. The idea grew over time, and the more we built for our Python library, the more sophisticated and powerful it became.

One of the great things about Python is that it’s a multi-paradigm programming language. You can write code in a number of different styles, according to your needs. You don’t have to write classes, but you can if you need them. There are functional programming tools available, but beginners get by without them. Importantly, the more advanced features of the language are not a barrier to entry.

Become a more advanced programmer

As a beginner to programming, you usually start by writing procedural programs, in which the flow moves from top to bottom. Then you’ll probably add loops and create your own functions. Your next step might be to start using libraries which introduce new patterns that operate in a different manner to what you’ve written before, for example threaded callbacks (event-driven programming). You might move on to object-oriented programming, extending the functionality of classes provided by other libraries, and starting to write your own classes. Occasionally, you may make use of tools created with functional programming techniques.

Five buttons in different colours

Take control of the buttons in your life

It’s much the same with GPIO Zero: you can start using it very easily, and we’ve made it simple to progress along the learning curve towards more advanced programming techniques. For example, if you want to make a push button control an LED, the easiest way to do this is via procedural programming using a while loop:

from gpiozero import LED, Button

led = LED(17)
button = Button(2)

while True:
    if button.is_pressed:
        led.on()
    else:
        led.off()

But another way to achieve the same thing is to use events:

from gpiozero import LED, Button
from signal import pause

led = LED(17)
button = Button(2)

button.when_pressed = led.on
button.when_released = led.off

pause()

You could even use a declarative approach, and set the LED’s behaviour in a single line:

from gpiozero import LED, Button
from signal import pause

led = LED(17)
button = Button(2)

led.source = button.values

pause()

You will find that the procedural approach is a great start, but at some point you’ll hit a limit and have to try a different approach. The example above can be approached in several programming styles. However, if you’d like to control a wider range of devices or a more complex system, you need to consider carefully which style works best for what you want to achieve. Being able to choose the right programming style for a task is a skill in itself.

Source/values properties

So how does the led.source = button.values thing actually work?

Every GPIO Zero device has a .value property. For example, you can read a button’s state (True or False), and read or set an LED’s state (so led.value = True is the same as led.on()). Since LEDs and buttons operate with the same value set (True and False), you could say led.value = button.value. However, this only sets the LED to match the button once. If you wanted it to always match the button’s state, you’d have to use a while loop. To make things easier, we came up with a way of telling devices they’re connected: we added a .values property to all devices, and a .source to output devices. Now, a loop is no longer necessary, because this will do the job:

led.source = button.values

This is a simple approach to connecting devices using a declarative style of programming. In one single line, we declare that the LED should get its values from the button, i.e. when the button is pressed, the LED should be on. You can even mix the procedural with the declarative style: at one stage of the program, the LED could be set to match the button, while in the next stage it could just be blinking, and finally it might return back to its original state.

These additions are useful for connecting other devices as well. For example, a PWMLED (LED with variable brightness) has a value between 0 and 1, and so does a potentiometer connected via an ADC (analogue-digital converter) such as the MCP3008. The new GPIO Zero update allows you to say led.source = pot.values, and then twist the potentiometer to control the brightness of the LED.
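For example, a minimal sketch of that pattern might look like this (assuming the LED is wired to GPIO 17 and the potentiometer to channel 0 of an MCP3008):

from gpiozero import PWMLED, MCP3008
from signal import pause

led = PWMLED(17)          # LED with variable brightness on GPIO 17 (assumed wiring)
pot = MCP3008(channel=0)  # potentiometer on channel 0 of the MCP3008 ADC (assumed wiring)

led.source = pot.values   # LED brightness follows the potentiometer position

pause()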

But what if you want to do something more complex, like connect two devices with different value sets or combine multiple inputs?

We provide a set of device source tools, which allow you to process values as they flow from one device to another. They also let you send in artificial values such as random data, and you can even write your own functions to generate values to pass to a device’s source. For example, to control a motor’s speed with a potentiometer, you could use this code:

from gpiozero import Motor, MCP3008
from signal import pause

motor = Motor(20, 21)
pot = MCP3008()

motor.source = pot.values

pause()

This works, but it will only drive the motor forwards. If you wanted the potentiometer to drive it forwards and backwards, you’d use the scaled tool to scale its values to a range of -1 to 1:

from gpiozero import Motor, MCP3008
from gpiozero.tools import scaled
from signal import pause

motor = Motor(20, 21)
pot = MCP3008()

motor.source = scaled(pot.values, -1, 1)

pause()

And to separately control a robot’s left and right motor speeds with two potentiometers, you could do this:

from gpiozero import Robot, MCP3008
from signal import pause

robot = Robot(left=(2, 3), right=(4, 5))
left = MCP3008(0)
right = MCP3008(1)

robot.source = zip(left.values, right.values)

pause()

GPIO Zero and Blue Dot

Martin O’Hanlon created a Python library called Blue Dot which allows you to use your Android device to remotely control things on your Raspberry Pi. The API is very similar to GPIO Zero’s, and it even incorporates the value/values properties, which means you can hook it up to GPIO devices easily:

from bluedot import BlueDot
from gpiozero import LED
from signal import pause

bd = BlueDot()
led = LED(17)

led.source = bd.values

pause()

We even included a couple of Blue Dot examples in our recipes.

Make a series of binary logic gates using source/values

Read more in this source/values tutorial from The MagPi, and on the source/values documentation page.

Remote GPIO control

GPIO Zero supports multiple low-level GPIO libraries. We use RPi.GPIO by default, but you can choose to use RPIO or pigpio instead. The pigpio library supports remote connections, so you can run GPIO Zero on one Raspberry Pi to control the GPIO pins of another, or run code on a PC (running Windows, Mac, or Linux) to remotely control the pins of a Pi on the same network. You can even control two or more Pis at once!

If you’re using Raspbian on a Raspberry Pi (or a PC running our x86 Raspbian OS), you have everything you need to remotely control GPIO. If you’re on a PC running Windows, Mac, or Linux, you just need to install gpiozero and pigpio using pip. See our guide on configuring remote GPIO.

I road-tested the new pin_factory syntax at the Raspberry Jam @ Pi Towers

There are a number of different ways to use remote pins:

  • Set the default pin factory and remote IP address with environment variables:
$ GPIOZERO_PIN_FACTORY=pigpio PIGPIO_ADDR=192.168.1.2 python3 blink.py
  • Set the default pin factory in your script:
import gpiozero
from gpiozero import LED
from gpiozero.pins.pigpio import PiGPIOFactory

gpiozero.Device.pin_factory = PiGPIOFactory(host='192.168.1.2')

led = LED(17)
  • The pin_factory keyword argument allows you to use multiple Pis in the same script:
from gpiozero import TrafficHat
from gpiozero.pins.pigpio import PiGPIOFactory

factory2 = PiGPIOFactory(host='192.168.1.2')
factory3 = PiGPIOFactory(host='192.168.1.3')

local_hat = TrafficHat()
remote_hat2 = TrafficHat(pin_factory=factory2)
remote_hat3 = TrafficHat(pin_factory=factory3)

This is a really powerful feature! For more, read this remote GPIO tutorial in The MagPi, and check out the remote GPIO recipes in our documentation.

GPIO Zero on your PC

GPIO Zero doesn’t have any dependencies, so you can install it on your PC using pip. In addition to the API’s remote GPIO control, you can use its ‘mock’ pin factory on your PC. We originally created the mock pin feature for the GPIO Zero test suite, but we found that it’s really useful to be able to test that GPIO Zero code works without running it on real hardware:

$ GPIOZERO_PIN_FACTORY=mock python3
>>> from gpiozero import LED
>>> led = LED(22)
>>> led.blink()
>>> led.value
True
>>> led.value
False

You can even tell pins to change state (e.g. to simulate a button being pressed) by accessing an object’s pin property:

>>> from gpiozero import LED, Button
>>> led = LED(22)
>>> button = Button(23)
>>> led.source = button.values
>>> led.value
False
>>> button.pin.drive_low()
>>> led.value
True

You can also use the pinout command line tool if you set your pin factory to ‘mock’. It gives you a Pi 3 diagram by default, but you can supply a revision code to see information about other Pi models. For example, to use the pinout tool for the original 256MB Model B, just type pinout -r 2.

GPIO Zero documentation and resources

On the API’s website, we provide beginner recipes and advanced recipes, and we have added remote GPIO configuration including PC/Mac/Linux and Pi Zero OTG, and a section of GPIO recipes. There are also new sections on source/values, command-line tools, FAQs, Pi information and library development.

You’ll find plenty of cool projects using GPIO Zero in our learning resources. For example, you could check out the one that introduces physical computing with Python and get stuck in! We even provide a GPIO Zero cheat sheet you can download and print.

There are great GPIO Zero tutorials and projects in The MagPi magazine every month. They also publish Simple Electronics with GPIO Zero, a book which collects a series of tutorials useful for building your knowledge of physical computing. And the best thing is, you can download it, and all magazine issues, for free!

Check out the API documentation and read more about what’s new in GPIO Zero on my blog. We have lots planned for the next release. Watch this space.

Get building!

The world of physical computing is at your fingertips! Are you feeling inspired?

If you’ve never tried your hand at physical computing, our Build a robot buggy learning resource is the perfect place to start! It’s your step-by-step guide for building a simple robot controlled with the help of GPIO Zero.

If you have a gee-whizz idea for an electronics project, do share it with us below. And if you’re currently working on a cool build and would like to show us how it’s going, pop a link to it in the comments.

The post Updates to GPIO Zero, the physical computing API appeared first on Raspberry Pi.

[$] The kernel’s genpool subsystem

Post Syndicated from corbet original https://lwn.net/Articles/729653/rss

The kernel is a huge program; among other things, that means that many
problems encountered by a kernel developer have already been solved
somewhere else in the tree. But those solutions are not always well known
or documented. Recently, a seasoned developer confessed to having never encountered the
“genpool” memory allocator. This little subsystem does not appear in the
kernel documentation, and is likely to be unknown to others as well. In
the interest of fixing both of those problems, here is an overview of
genpool (or “genalloc”) and what it does.

Newly Updated: Example AWS IAM Policies for You to Use and Customize

Post Syndicated from Deren Smith original https://aws.amazon.com/blogs/security/newly-updated-example-policies-for-you-to-use-and-customize/

To help you grant access to specific resources and conditions, the Example Policies page in the AWS Identity and Access Management (IAM) documentation now includes more than thirty policies for you to use or customize to meet your permissions requirements. The AWS Support team developed these policies from their experiences working with AWS customers over the years. The example policies cover common permissions use cases you might encounter across services such as Amazon DynamoDB, Amazon EC2, AWS Elastic Beanstalk, Amazon RDS, Amazon S3, and IAM.

In this blog post, I introduce the updated Example Policies page and explain how to use and customize these policies for your needs.

The new Example Policies page

The Example Policies page in the IAM User Guide now provides an overview of the example policies and includes a link to view each policy on a separate page. Note that each of these policies has been reviewed and approved by AWS Support. If you would like to submit a policy that you have found to be particularly useful, post it on the IAM forum.

To give you an idea of the policies we have included on this page, the following are a few of the EC2 policies on the page:

To see the full list of available policies, see the Example Policies page.

In the following section, I demonstrate how to use a policy from the Example Policies page and customize it for your needs.

How to customize an example policy for your needs

Suppose you want to allow an IAM user, Bob, to start and stop EC2 instances with a specific resource tag. After looking through the Example Policies page, you see the policy, Allows Starting or Stopping EC2 Instances a User Has Tagged, Programmatically and in the Console.

To apply this policy to your specific use case:

  1. Navigate to the Policies section of the IAM console.
  2. Choose Create policy.
    Screenshot of choosing "Create policy"
  3. Choose the Select button next to Create Your Own Policy. You will see an empty policy document with boxes for Policy Name, Description, and Policy Document, as shown in the following screenshot.
  4. Type a name for the policy, copy the policy from the Example Policies page, and paste the policy in the Policy Document box. In this example, I use “start-stop-instances-for-owner-tag” as the policy name and “Allows users to start or stop instances if the instance tag Owner has the value of their user name” as the description.
  5. Update the placeholder text in the policy (see the full policy that follows this step). For example, replace <REGION> with a region from AWS Regions and Endpoints and <ACCOUNTNUMBER> with your 12-digit account number. The IAM policy variable, ${aws:username}, is a dynamic property in the policy that automatically applies to the user to which it is attached. For example, when the policy is attached to Bob, the policy replaces ${aws:username} with Bob. If you do not want to use the key value pair of Owner and ${aws:username}, you can edit the policy to include your desired key value pair. For example, if you want to use the key value pair, CostCenter:1234, you can modify “ec2:ResourceTag/Owner”: “${aws:username}” to “ec2:ResourceTag/CostCenter”: “1234”.
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "ec2:StartInstances",
                    "ec2:StopInstances"
                ],
                "Resource": "arn:aws:ec2:<REGION>:<ACCOUNTNUMBER>:instance/*",
                "Condition": {
                    "StringEquals": {
                        "ec2:ResourceTag/Owner": "${aws:username}"
                    }
                }
            },
            {
                "Effect": "Allow",
                "Action": "ec2:DescribeInstances",
                "Resource": "*"
            }
        ]
    }

  6. After you have edited the policy, choose Create policy.

You have created a policy that allows an IAM user to stop and start EC2 instances in your account, as long as these instances have the correct resource tag and the policy is attached to your IAM users. You also can attach this policy to an IAM group and apply the policy to users by adding them to that group.
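If you prefer to script these steps instead of using the console, the following is a rough boto3 sketch of the same flow. The policy name, the user name Bob, and the <REGION>/<ACCOUNTNUMBER> placeholders are illustrative values you would replace with your own:

import json
import boto3

iam = boto3.client('iam')

policy_document = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["ec2:StartInstances", "ec2:StopInstances"],
            "Resource": "arn:aws:ec2:<REGION>:<ACCOUNTNUMBER>:instance/*",
            "Condition": {"StringEquals": {"ec2:ResourceTag/Owner": "${aws:username}"}}
        },
        {
            "Effect": "Allow",
            "Action": "ec2:DescribeInstances",
            "Resource": "*"
        }
    ]
}

# Create the customized policy
response = iam.create_policy(
    PolicyName='start-stop-instances-for-owner-tag',
    Description='Allows users to start or stop instances they have tagged with their user name',
    PolicyDocument=json.dumps(policy_document)
)

# Attach it to the IAM user Bob (or attach it to an IAM group instead)
iam.attach_user_policy(
    UserName='Bob',
    PolicyArn=response['Policy']['Arn']
)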

Summary

We updated the Example Policies page in the IAM User Guide so that you have a central location where you can find examples of the most commonly requested and used IAM policies. In addition to these example policies, we recommend that you review the list of AWS managed policies, including the AWS managed policies for job functions. You can choose these predefined policies from the IAM console and associate them with your IAM users, groups, and roles.

We will add more IAM policies to the Example Policies page over time. If you have a useful policy you would like to share with others, post it on the IAM forum. If you have comments about this post, submit them in the “Comments” section below.

– Deren

Create Multiple Builds from the Same Source Using Different AWS CodeBuild Build Specification Files

Post Syndicated from Prakash Palanisamy original https://aws.amazon.com/blogs/devops/create-multiple-builds-from-the-same-source-using-different-aws-codebuild-build-specification-files/

In June 2017, AWS CodeBuild announced you can now specify an alternate build specification file name or location in an AWS CodeBuild project.

In this post, I’ll show you how to use different build specification files in the same repository to create different builds. You’ll find the source code for this post in our GitHub repo.

Requirements

The AWS CLI must be installed and configured.

Solution Overview

I have created a C program (cbsamplelib.c) that will be used to create a shared library and another utility program (cbsampleutil.c) to use that library. I’ll use a Makefile to compile these files.

I need to put this sample application in RPM and DEB packages so end users can easily deploy them. I have created a build specification file for RPM. It will use make to compile this code and the RPM specification file (cbsample.rpmspec) configured in the build specification to create the RPM package. Similarly, I have created a build specification file for DEB. It will create the DEB package based on the control specification file (cbsample.control) configured in this build specification.

RPM Build Project:

The following build specification file (buildspec-rpm.yml) uses build specification version 0.2. As described in the documentation, this version has different syntax for environment variables. This build specification includes multiple phases:

  • As part of the install phase, the required packages are installed using yum.
  • During the pre_build phase, the required directories are created and the required files, including the RPM build specification file, are copied to the appropriate location.
  • During the build phase, the code is compiled, and then the RPM package is created based on the RPM specification.

As defined in the artifacts section, the RPM file will be uploaded as a build artifact.

version: 0.2

env:
  variables:
    build_version: "0.1"

phases:
  install:
    commands:
      - yum install rpm-build make gcc glibc -y
  pre_build:
    commands:
      - curr_working_dir=`pwd`
      - mkdir -p ./{RPMS,SRPMS,BUILD,SOURCES,SPECS,tmp}
      - filename="cbsample-$build_version"
      - echo $filename
      - mkdir -p $filename
      - cp ./*.c ./*.h Makefile $filename
      - tar -zcvf /root/$filename.tar.gz $filename
      - cp /root/$filename.tar.gz ./SOURCES/
      - cp cbsample.rpmspec ./SPECS/
  build:
    commands:
      - echo "Triggering RPM build"
      - rpmbuild --define "_topdir `pwd`" -ba SPECS/cbsample.rpmspec
      - cd $curr_working_dir

artifacts:
  files:
    - RPMS/x86_64/cbsample*.rpm
  discard-paths: yes

Using cb-centos-project.json as a reference, create the input JSON file for the CLI command. This project uses an AWS CodeCommit repository named codebuild-multispec and a file named buildspec-rpm.yml as the build specification file. To create the RPM package, we need to specify a custom image name. I’m using the latest CentOS 7 image available in the Docker Hub. I’m using a role named CodeBuildServiceRole. It contains permissions similar to those defined in CodeBuildServiceRole.json. (You need to change the resource fields in the policy, as appropriate.)

{
    "name": "rpm-build-project",
    "description": "Project which will build RPM from the source.",
    "source": {
        "type": "CODECOMMIT",
        "location": "https://git-codecommit.eu-west-1.amazonaws.com/v1/repos/codebuild-multispec",
        "buildspec": "buildspec-rpm.yml"
    },
    "artifacts": {
        "type": "S3",
        "location": "codebuild-demo-artifact-repository"
    },
    "environment": {
        "type": "LINUX_CONTAINER",
        "image": "centos:7",
        "computeType": "BUILD_GENERAL1_SMALL"
    },
    "serviceRole": "arn:aws:iam::012345678912:role/service-role/CodeBuildServiceRole",
    "timeoutInMinutes": 15,
    "encryptionKey": "arn:aws:kms:eu-west-1:012345678912:alias/aws/s3",
    "tags": [
        {
            "key": "Name",
            "value": "RPM Demo Build"
        }
    ]
}

After the cli-input-json file is ready, execute the following command to create the build project.

$ aws codebuild create-project --name CodeBuild-RPM-Demo --cli-input-json file://cb-centos-project.json

{
    "project": {
        "name": "CodeBuild-RPM-Demo", 
        "serviceRole": "arn:aws:iam::012345678912:role/service-role/CodeBuildServiceRole", 
        "tags": [
            {
                "value": "RPM Demo Build", 
                "key": "Name"
            }
        ], 
        "artifacts": {
            "namespaceType": "NONE", 
            "packaging": "NONE", 
            "type": "S3", 
            "location": "codebuild-demo-artifact-repository", 
            "name": "CodeBuild-RPM-Demo"
        }, 
        "lastModified": 1500559811.13, 
        "timeoutInMinutes": 15, 
        "created": 1500559811.13, 
        "environment": {
            "computeType": "BUILD_GENERAL1_SMALL", 
            "privilegedMode": false, 
            "image": "centos:7", 
            "type": "LINUX_CONTAINER", 
            "environmentVariables": []
        }, 
        "source": {
            "buildspec": "buildspec-rpm.yml", 
            "type": "CODECOMMIT", 
            "location": "https://git-codecommit.eu-west-1.amazonaws.com/v1/repos/codebuild-multispec"
        }, 
        "encryptionKey": "arn:aws:kms:eu-west-1:012345678912:alias/aws/s3", 
        "arn": "arn:aws:codebuild:eu-west-1:012345678912:project/CodeBuild-RPM-Demo", 
        "description": "Project which will build RPM from the source."
    }
}

When the project is created, run the following command to start the build. After the build has started, get the build ID. You can use the build ID to get the status of the build.

$ aws codebuild start-build --project-name CodeBuild-RPM-Demo
{
    "build": {
        "buildComplete": false, 
        "initiator": "prakash", 
        "artifacts": {
            "location": "arn:aws:s3:::codebuild-demo-artifact-repository/CodeBuild-RPM-Demo"
        }, 
        "projectName": "CodeBuild-RPM-Demo", 
        "timeoutInMinutes": 15, 
        "buildStatus": "IN_PROGRESS", 
        "environment": {
            "computeType": "BUILD_GENERAL1_SMALL", 
            "privilegedMode": false, 
            "image": "centos:7", 
            "type": "LINUX_CONTAINER", 
            "environmentVariables": []
        }, 
        "source": {
            "buildspec": "buildspec-rpm.yml", 
            "type": "CODECOMMIT", 
            "location": "https://git-codecommit.eu-west-1.amazonaws.com/v1/repos/codebuild-multispec"
        }, 
        "currentPhase": "SUBMITTED", 
        "startTime": 1500560156.761, 
        "id": "CodeBuild-RPM-Demo:57a36755-4d37-4b08-9c11-1468e1682abc", 
        "arn": "arn:aws:codebuild:eu-west-1: 012345678912:build/CodeBuild-RPM-Demo:57a36755-4d37-4b08-9c11-1468e1682abc"
    }
}

$ aws codebuild list-builds-for-project --project-name CodeBuild-RPM-Demo
{
    "ids": [
        "CodeBuild-RPM-Demo:57a36755-4d37-4b08-9c11-1468e1682abc"
    ]
}

$ aws codebuild batch-get-builds --ids CodeBuild-RPM-Demo:57a36755-4d37-4b08-9c11-1468e1682abc
{
    "buildsNotFound": [], 
    "builds": [
        {
            "buildComplete": true, 
            "phases": [
                {
                    "phaseStatus": "SUCCEEDED", 
                    "endTime": 1500560157.164, 
                    "phaseType": "SUBMITTED", 
                    "durationInSeconds": 0, 
                    "startTime": 1500560156.761
                }, 
                {
                    "contexts": [], 
                    "phaseType": "PROVISIONING", 
                    "phaseStatus": "SUCCEEDED", 
                    "durationInSeconds": 24, 
                    "startTime": 1500560157.164, 
                    "endTime": 1500560182.066
                }, 
                {
                    "contexts": [], 
                    "phaseType": "DOWNLOAD_SOURCE", 
                    "phaseStatus": "SUCCEEDED", 
                    "durationInSeconds": 15, 
                    "startTime": 1500560182.066, 
                    "endTime": 1500560197.906
                }, 
                {
                    "contexts": [], 
                    "phaseType": "INSTALL", 
                    "phaseStatus": "SUCCEEDED", 
                    "durationInSeconds": 19, 
                    "startTime": 1500560197.906, 
                    "endTime": 1500560217.515
                }, 
                {
                    "contexts": [], 
                    "phaseType": "PRE_BUILD", 
                    "phaseStatus": "SUCCEEDED", 
                    "durationInSeconds": 0, 
                    "startTime": 1500560217.515, 
                    "endTime": 1500560217.662
                }, 
                {
                    "contexts": [], 
                    "phaseType": "BUILD", 
                    "phaseStatus": "SUCCEEDED", 
                    "durationInSeconds": 0, 
                    "startTime": 1500560217.662, 
                    "endTime": 1500560217.995
                }, 
                {
                    "contexts": [], 
                    "phaseType": "POST_BUILD", 
                    "phaseStatus": "SUCCEEDED", 
                    "durationInSeconds": 0, 
                    "startTime": 1500560217.995, 
                    "endTime": 1500560218.074
                }, 
                {
                    "contexts": [], 
                    "phaseType": "UPLOAD_ARTIFACTS", 
                    "phaseStatus": "SUCCEEDED", 
                    "durationInSeconds": 0, 
                    "startTime": 1500560218.074, 
                    "endTime": 1500560218.542
                }, 
                {
                    "contexts": [], 
                    "phaseType": "FINALIZING", 
                    "phaseStatus": "SUCCEEDED", 
                    "durationInSeconds": 4, 
                    "startTime": 1500560218.542, 
                    "endTime": 1500560223.128
                }, 
                {
                    "phaseType": "COMPLETED", 
                    "startTime": 1500560223.128
                }
            ], 
            "logs": {
                "groupName": "/aws/codebuild/CodeBuild-RPM-Demo", 
                "deepLink": "https://console.aws.amazon.com/cloudwatch/home?region=eu-west-1#logEvent:group=/aws/codebuild/CodeBuild-RPM-Demo;stream=57a36755-4d37-4b08-9c11-1468e1682abc", 
                "streamName": "57a36755-4d37-4b08-9c11-1468e1682abc"
            }, 
            "artifacts": {
                "location": "arn:aws:s3:::codebuild-demo-artifact-repository/CodeBuild-RPM-Demo"
            }, 
            "projectName": "CodeBuild-RPM-Demo", 
            "timeoutInMinutes": 15, 
            "initiator": "prakash", 
            "buildStatus": "SUCCEEDED", 
            "environment": {
                "computeType": "BUILD_GENERAL1_SMALL", 
                "privilegedMode": false, 
                "image": "centos:7", 
                "type": "LINUX_CONTAINER", 
                "environmentVariables": []
            }, 
            "source": {
                "buildspec": "buildspec-rpm.yml", 
                "type": "CODECOMMIT", 
                "location": "https://git-codecommit.eu-west-1.amazonaws.com/v1/repos/codebuild-multispec"
            }, 
            "currentPhase": "COMPLETED", 
            "startTime": 1500560156.761, 
            "endTime": 1500560223.128, 
            "id": "CodeBuild-RPM-Demo:57a36755-4d37-4b08-9c11-1468e1682abc", 
            "arn": "arn:aws:codebuild:eu-west-1:012345678912:build/CodeBuild-RPM-Demo:57a36755-4d37-4b08-9c11-1468e1682abc"
        }
    ]
}

DEB Build Project:

In this project, we will use the build specification file named buildspec-deb.yml. Like the RPM build project, this specification includes multiple phases. Here I use a Debian control file to create the package in DEB format. After a successful build, the DEB package will be uploaded as build artifact.

version: 0.2

env:
  variables:
    build_version: "0.1"

phases:
  install:
    commands:
      - apt-get install gcc make -y
  pre_build:
    commands:
      - mkdir -p ./cbsample-$build_version/DEBIAN
      - mkdir -p ./cbsample-$build_version/usr/lib
      - mkdir -p ./cbsample-$build_version/usr/include
      - mkdir -p ./cbsample-$build_version/usr/bin
      - cp -f cbsample.control ./cbsample-$build_version/DEBIAN/control
  build:
    commands:
      - echo "Building the application"
      - make
      - cp libcbsamplelib.so ./cbsample-$build_version/usr/lib
      - cp cbsamplelib.h ./cbsample-$build_version/usr/include
      - cp cbsampleutil ./cbsample-$build_version/usr/bin
      - chmod +x ./cbsample-$build_version/usr/bin/cbsampleutil
      - dpkg-deb --build ./cbsample-$build_version

artifacts:
  files:
    - cbsample-*.deb

Here we use cb-ubuntu-project.json as a reference to create the CLI input JSON file. This project uses the same AWS CodeCommit repository (codebuild-multispec) but a different buildspec file in the same repository (buildspec-deb.yml). We use the default CodeBuild image to create the DEB package. We use the same IAM role (CodeBuildServiceRole).

{
    "name": "deb-build-project",
    "description": "Project which will build DEB from the source.",
    "source": {
        "type": "CODECOMMIT",
        "location": "https://git-codecommit.eu-west-1.amazonaws.com/v1/repos/codebuild-multispec",
        "buildspec": "buildspec-deb.yml"
    },
    "artifacts": {
        "type": "S3",
        "location": "codebuild-demo-artifact-repository"
    },
    "environment": {
        "type": "LINUX_CONTAINER",
        "image": "aws/codebuild/ubuntu-base:14.04",
        "computeType": "BUILD_GENERAL1_SMALL"
    },
    "serviceRole": "arn:aws:iam::012345678912:role/service-role/CodeBuildServiceRole",
    "timeoutInMinutes": 15,
    "encryptionKey": "arn:aws:kms:eu-west-1:012345678912:alias/aws/s3",
    "tags": [
        {
            "key": "Name",
            "value": "Debian Demo Build"
        }
    ]
}

Using the CLI input JSON file, create the project, start the build, and check the status of the project.

$ aws codebuild create-project --name CodeBuild-DEB-Demo --cli-input-json file://cb-ubuntu-project.json

$ aws codebuild list-builds-for-project --project-name CodeBuild-DEB-Demo

$ aws codebuild batch-get-builds --ids CodeBuild-DEB-Demo:e535c4b0-7067-4fbe-8060-9bb9de203789

After the RPM and DEB builds complete successfully, check the S3 bucket configured in the artifacts section for the build packages. Each build project creates a directory with the name of the build project and copies its artifacts into it.

$ aws s3 ls s3://codebuild-demo-artifact-repository/CodeBuild-RPM-Demo/
2017-07-20 16:16:59       8108 cbsample-0.1-1.el7.centos.x86_64.rpm

$ aws s3 ls s3://codebuild-demo-artifact-repository/CodeBuild-DEB-Demo/
2017-07-20 16:37:22       5420 cbsample-0.1.deb

Override Buildspec During Build Start:

It’s also possible to override the build specification file of an existing project when starting a build. If we want to create the libs RPM package instead of the whole RPM, we will use the build specification file named buildspec-libs-rpm.yml. This build specification file is similar to the earlier RPM build. The only difference is that it uses a different RPM specification file to create libs RPM.

version: 0.2

env:
  variables:
    build_version: "0.1"

phases:
  install:
    commands:
      - yum install rpm-build make gcc glibc -y
  pre_build:
    commands:
      - curr_working_dir=`pwd`
      - mkdir -p ./{RPMS,SRPMS,BUILD,SOURCES,SPECS,tmp}
      - filename="cbsample-libs-$build_version"
      - echo $filename
      - mkdir -p $filename
      - cp ./*.c ./*.h Makefile $filename
      - tar -zcvf /root/$filename.tar.gz $filename
      - cp /root/$filename.tar.gz ./SOURCES/
      - cp cbsample-libs.rpmspec ./SPECS/
  build:
    commands:
      - echo "Triggering RPM build"
      - rpmbuild --define "_topdir `pwd`" -ba SPECS/cbsample-libs.rpmspec
      - cd $curr_working_dir

artifacts:
  files:
    - RPMS/x86_64/cbsample-libs*.rpm
  discard-paths: yes

Using the same RPM build project that we created earlier, start a new build and set the value of the `--buildspec-override` parameter to buildspec-libs-rpm.yml.

$ aws codebuild start-build --project-name CodeBuild-RPM-Demo --buildspec-override buildspec-libs-rpm.yml
{
    "build": {
        "buildComplete": false, 
        "initiator": "prakash", 
        "artifacts": {
            "location": "arn:aws:s3:::codebuild-demo-artifact-repository/CodeBuild-RPM-Demo"
        }, 
        "projectName": "CodeBuild-RPM-Demo", 
        "timeoutInMinutes": 15, 
        "buildStatus": "IN_PROGRESS", 
        "environment": {
            "computeType": "BUILD_GENERAL1_SMALL", 
            "privilegedMode": false, 
            "image": "centos:7", 
            "type": "LINUX_CONTAINER", 
            "environmentVariables": []
        }, 
        "source": {
            "buildspec": "buildspec-libs-rpm.yml", 
            "type": "CODECOMMIT", 
            "location": "https://git-codecommit.eu-west-1.amazonaws.com/v1/repos/codebuild-multispec"
        }, 
        "currentPhase": "SUBMITTED", 
        "startTime": 1500562366.239, 
        "id": "CodeBuild-RPM-Demo:82d05f8a-b161-401c-82f0-83cb41eba567", 
        "arn": "arn:aws:codebuild:eu-west-1:012345678912:build/CodeBuild-RPM-Demo:82d05f8a-b161-401c-82f0-83cb41eba567"
    }
}

After the build is completed successfully, check to see if the package appears in the artifact S3 bucket under the CodeBuild-RPM-Demo build project folder.

$ aws s3 ls s3://codebuild-demo-artifact-repository/CodeBuild-RPM-Demo/
2017-07-20 16:16:59       8108 cbsample-0.1-1.el7.centos.x86_64.rpm
2017-07-20 16:53:54       5320 cbsample-libs-0.1-1.el7.centos.x86_64.rpm

Conclusion

In this post, I have shown you how multiple buildspec files in the same source repository can be used to run multiple AWS CodeBuild build projects. I have also shown you how to provide a different buildspec file when starting the build.

For more information about AWS CodeBuild, see the AWS CodeBuild documentation. You can get started with AWS CodeBuild by using this step by step guide.


About the author

Prakash Palanisamy is a Solutions Architect for Amazon Web Services. When he is not working on Serverless, DevOps or Alexa, he will be solving problems in Project Euler. He also enjoys watching educational documentaries.

Run Common Data Science Packages on Anaconda and Oozie with Amazon EMR

Post Syndicated from John Ohle original https://aws.amazon.com/blogs/big-data/run-common-data-science-packages-on-anaconda-and-oozie-with-amazon-emr/

In the world of data science, users often have to spend significant time setting up clusters before they can tackle complex workloads. Amazon EMR allows data scientists to spin up complex cluster configurations easily, and to be up and running with complex queries in a matter of minutes.

Data scientists often use scheduling applications such as Oozie to run jobs overnight. However, Oozie can be difficult to configure when you are trying to use popular Python packages (such as “pandas,” “numpy,” and “statsmodels”), which are not included by default.

One popular platform that contains these types of packages (and more) is Anaconda. This post focuses on setting up the Anaconda platform on EMR and on using its packages in jobs scheduled with Oozie, a popular open source scheduler.

Walkthrough

For this post, you walk through the following tasks:

  • Create an EMR cluster.
  • Download Anaconda on your master node.
  • Configure Oozie.
  • Test the steps.

Create an EMR cluster

Spin up an Amazon EMR cluster using the console or the AWS CLI. Use the latest release, and include Apache Hadoop, Apache Spark, Apache Hive, and Oozie.

To create a three-node cluster in the us-east-1 region, issue an AWS CLI command such as the following. The command must be entered as a single line; it is split across multiple lines here for readability only, and a one-line version follows.

aws emr create-cluster \ 
--release-label emr-5.7.0 \ 
 --name '<YOUR-CLUSTER-NAME>' \
 --applications Name=Hadoop Name=Oozie Name=Spark Name=Hive \ 
 --ec2-attributes '{"KeyName":"<YOUR-KEY-PAIR>","SubnetId":"<YOUR-SUBNET-ID>","EmrManagedSlaveSecurityGroup":"<YOUR-EMR-SLAVE-SECURITY-GROUP>","EmrManagedMasterSecurityGroup":"<YOUR-EMR-MASTER-SECURITY-GROUP>"}' \ 
 --use-default-roles \ 
 --instance-groups '[{"InstanceCount":1,"InstanceGroupType":"MASTER","InstanceType":"<YOUR-INSTANCE-TYPE>","Name":"Master - 1"},{"InstanceCount":<YOUR-CORE-INSTANCE-COUNT>,"InstanceGroupType":"CORE","InstanceType":"<YOUR-INSTANCE-TYPE>","Name":"Core - 2"}]'

One-line version for reference:

aws emr create-cluster --release-label emr-5.7.0 --name '<YOUR-CLUSTER-NAME>' --applications Name=Hadoop Name=Oozie Name=Spark Name=Hive --ec2-attributes '{"KeyName":"<YOUR-KEY-PAIR>","SubnetId":"<YOUR-SUBNET-ID>","EmrManagedSlaveSecurityGroup":"<YOUR-EMR-SLAVE-SECURITY-GROUP>","EmrManagedMasterSecurityGroup":"<YOUR-EMR-MASTER-SECURITY-GROUP>"}' --use-default-roles --instance-groups '[{"InstanceCount":1,"InstanceGroupType":"MASTER","InstanceType":"<YOUR-INSTANCE-TYPE>","Name":"Master - 1"},{"InstanceCount":<YOUR-CORE-INSTANCE-COUNT>,"InstanceGroupType":"CORE","InstanceType":"<YOUR-INSTANCE-TYPE>","Name":"Core - 2"}]'

Download Anaconda

SSH into your EMR master node instance and download the official Anaconda installer:

wget https://repo.continuum.io/archive/Anaconda2-4.4.0-Linux-x86_64.sh

At the time of publication, Anaconda 4.4 is the most current version available. To find the download link for the latest Python 2.7 version (Python 3.6 may encounter issues), see https://www.continuum.io/downloads. Open the context (right-click) menu for the Python 2.7 download link, choose Copy Link Location, and use this value in the previous wget command.

This post used the Anaconda 4.4 installation. If you have a later version, it is shown in the downloaded filename:  “anaconda2-<version number>-Linux-x86_64.sh”.

Run this downloaded script and follow the on-screen installer prompts.

chmod u+x Anaconda2-4.4.0-Linux-x86_64.sh
./Anaconda2-4.4.0-Linux-x86_64.sh

For an installation directory, select somewhere with enough space on your cluster, such as “/mnt/anaconda/”.

The process should take approximately 1–2 minutes to install. When prompted if you “wish the installer to prepend the Anaconda2 install location”, select the default option of [no].

After you are done, export the PATH to include this new Anaconda installation:

export PATH=/mnt/anaconda/bin:$PATH

Zip up the Anaconda installation:

cd /mnt/anaconda/
zip -r anaconda.zip .

The zip process may take 4–5 minutes to complete.

(Optional) Upload this anaconda.zip file to your S3 bucket for easier inclusion into future EMR clusters. This removes the need to repeat the previous steps for future EMR clusters.

Configure Oozie

Next, you configure Oozie to use Pyspark and the Anaconda platform.

Get the location of your Oozie sharelibupdate folder. Issue the following command and take note of the “sharelibDirNew” value:

oozie admin -sharelibupdate

For this post, this value is “hdfs://ip-192-168-4-200.us-east-1.compute.internal:8020/user/oozie/share/lib/lib_20170616133136”.

Copy the required Pyspark files into Oozie’s sharelib location. The following files are required for Oozie to be able to run Pyspark commands:

  • pyspark.zip
  • py4j-0.10.4-src.zip

These are located on the EMR master instance in the location “/usr/lib/spark/python/lib/”, and must be put into the Oozie sharelib spark directory. This location is the value of the sharelibDirNew parameter value (shown above) with “/spark/” appended, that is, “hdfs://ip-192-168-4-200.us-east-1.compute.internal:8020/user/oozie/share/lib/lib_20170616133136/spark/”.

To do this, issue the following commands:

hdfs dfs -put /usr/lib/spark/python/lib/py4j-0.10.4-src.zip hdfs://ip-192-168-4-200.us-east-1.compute.internal:8020/user/oozie/share/lib/lib_20170616133136/spark/
hdfs dfs -put /usr/lib/spark/python/lib/pyspark.zip hdfs://ip-192-168-4-200.us-east-1.compute.internal:8020/user/oozie/share/lib/lib_20170616133136/spark/

After you’re done, Oozie can use Pyspark in its processes.

Pass the anaconda.zip file into HDFS as follows:

hdfs dfs -put /mnt/anaconda/anaconda.zip /tmp/myLocation/anaconda.zip

(Optional) Verify that it was transferred successfully with the following command:

hdfs dfs -ls /tmp/myLocation/

On your master node, execute the following command:

export PYSPARK_PYTHON=/mnt/anaconda/bin/python

Set the PYSPARK_PYTHON environment variable on the executor nodes. Put the following configurations in your “spark-opts” values in your Oozie workflow.xml file:

--conf spark.executorEnv.PYSPARK_PYTHON=./anaconda_remote/bin/python
--conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=./anaconda_remote/bin/python

This is referenced from the Oozie job in the following line in your workflow.xml file, also included as part of your “spark-opts”:

--archives hdfs:///tmp/myLocation/anaconda.zip#anaconda_remote

Your Oozie workflow.xml file should now look something like the following:

<workflow-app name="spark-wf" xmlns="uri:oozie:workflow:0.5">
<start to="start_spark" />
<action name="start_spark">
    <spark xmlns="uri:oozie:spark-action:0.1">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <prepare>
            <delete path="/tmp/test/spark_oozie_test_out3"/>
        </prepare>
        <master>yarn-cluster</master>
        <mode>cluster</mode>
        <name>SparkJob</name>
        <class>clear</class>
        <jar>hdfs:///user/oozie/apps/myPysparkProgram.py</jar>
        <spark-opts>--queue default
            --conf spark.ui.view.acls=*
            --executor-memory 2G --num-executors 2 --executor-cores 2 --driver-memory 3g
            --conf spark.executorEnv.PYSPARK_PYTHON=./anaconda_remote/bin/python
            --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=./anaconda_remote/bin/python
            --archives hdfs:///tmp/myLocation/anaconda.zip#anaconda_remote
        </spark-opts>
    </spark>
    <ok to="end"/>
    <error to="kill"/>
</action>
        <kill name="kill">
                <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
        </kill>
        <end name="end"/>
</workflow-app>

Test steps

To test this out, you can use the following job.properties and myPysparkProgram.py file, along with the following steps:

job.properties

masterNode ip-xxx-xxx-xxx-xxx.us-east-1.compute.internal
nameNode hdfs://${masterNode}:8020
jobTracker ${masterNode}:8032
master yarn
mode cluster
queueName default
oozie.libpath ${nameNode}/user/oozie/share/lib
oozie.use.system.libpath true
oozie.wf.application.path ${nameNode}/user/oozie/apps/

Note: You can get your master node IP address (denoted as “ip-xxx-xxx-xxx-xxx” here) from the value for the sharelibDirNew parameter noted earlier.

myPysparkProgram.py

from pyspark import SparkContext, SparkConf
import numpy
import sys

conf = SparkConf().setAppName('myPysparkProgram')
sc = SparkContext(conf=conf)

rdd = sc.textFile("/user/hadoop/input.txt")

x = numpy.sum([3,4,5]) #total = 12

rdd = rdd.map(lambda line: line + str(x))
rdd.saveAsTextFile("/user/hadoop/output")

Put the “myPysparkProgram.py” into the location mentioned between the “<jar>xxxxx</jar>” tags in your workflow.xml. In this example, the location is “hdfs:///user/oozie/apps/”. Use the following command to move the “myPysparkProgram.py” file to the correct location:

hdfs dfs -put myPysparkProgram.py /user/oozie/apps/

Put the above workflow.xml file into the “/user/oozie/apps/” location in hdfs:

hdfs dfs -put workflow.xml /user/oozie/apps/

Note: The job.properties file is run locally from the EMR master node.

Create a sample input.txt file with some data in it. For example:

input.txt

This is a sentence.
So is this. 
This is also a sentence.

Put this file into hdfs:

hdfs dfs -put input.txt /user/hadoop/

Execute the job in Oozie with the following command. This creates an Oozie job ID.

oozie job -config job.properties -run

You can check the Oozie job state with the command:

oozie job -info <Oozie job ID>

  1. When the job is successfully finished, the results are located at:
/user/hadoop/output/part-00000
/user/hadoop/output/part-00001

  2. Run the following commands to view the output:
hdfs dfs -cat /user/hadoop/output/part-00000
hdfs dfs -cat /user/hadoop/output/part-00001

The output will be:

This is a sentence. 12
So is this 12
This is also a sentence 12

Summary

The myPysparkProgram.py script has successfully imported the numpy library from the Anaconda platform and has produced some output with it. If you tried to run this using the standard Python installation, the import would fail because numpy is not included by default.

Now when your Python job runs in Oozie, any imported packages that are implicitly imported by your Pyspark script are imported into your job within Oozie directly from the Anaconda platform. Simple!

If you have questions or suggestions, please leave a comment below.


Additional Reading

Learn how to use Apache Oozie workflows to automate Apache Spark jobs on Amazon EMR.


About the Author

John Ohle is an AWS BigData Cloud Support Engineer II for the BigData team in Dublin. He works to provide advice and solutions to our customers on their Big Data projects and workflows on AWS. In his spare time, he likes to play music, learn, develop tools and write documentation to further help others – both colleagues and customers alike.


Use CloudFormation StackSets to Provision Resources Across Multiple AWS Accounts and Regions

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/use-cloudformation-stacksets-to-provision-resources-across-multiple-aws-accounts-and-regions/

AWS CloudFormation helps AWS customers implement an Infrastructure as Code model. Instead of setting up their environments and applications by hand, they build a template and use it to create all of the necessary resources, collectively known as a CloudFormation stack. This model removes opportunities for manual error, increases efficiency, and ensures consistent configurations over time.

Today I would like to tell you about a new feature that makes CloudFormation even more useful. This feature is designed to help you to address the challenges that you face when you use Infrastructure as Code in situations that include multiple AWS accounts and/or AWS Regions. As a quick review:

Accounts – As I have told you in the past, many organizations use a multitude of AWS accounts, often using AWS Organizations to arrange the accounts into a hierarchy and to group them into Organizational Units, or OUs (read AWS Organizations – Policy-Based Management for Multiple AWS Accounts to learn more). Our customers use multiple accounts for business units, applications, and developers. They often create separate accounts for development, testing, staging, and production on a per-application basis.

Regions – Customers also make great use of the large (and ever-growing) set of AWS Regions. They build global applications that span two or more regions, implement sophisticated multi-region disaster recovery models, replicate S3, Aurora, PostgreSQL, and MySQL data in real time, and choose locations for storage and processing of sensitive data in accord with national and regional regulations.

This expansion into multiple accounts and regions comes with some new challenges with respect to governance and consistency. Our customers tell us that they want to make sure that each new account is set up in accord with their internal standards. Among other things, they want to set up IAM users and roles, VPCs and VPC subnets, security groups, Config Rules, logging, and AWS Lambda functions in a consistent and reliable way.

Introducing StackSets
In order to address these important customer needs, we are launching CloudFormation StackSets today. You can now define an AWS resource configuration in a CloudFormation template and then roll it out across multiple AWS accounts and/or Regions with a couple of clicks. You can use this to set up a baseline level of AWS functionality that addresses the cross-account and cross-region scenarios that I listed above. Once you have set this up, you can easily expand coverage to additional accounts and regions.

This feature always works on a cross-account basis. The master account owns one or more StackSets and controls deployment to one or more target accounts. The master account must include an assumable IAM role and the target accounts must delegate trust to this role. To learn how to do this, read Prerequisites in the StackSet Documentation.

Each StackSet references a CloudFormation template and contains lists of accounts and regions. All operations apply to the cross-product of the accounts and regions in the StackSet. If the StackSet references three accounts (A1, A2, and A3) and four regions (R1, R2, R3, and R4), there are twelve targets:

  • Region R1: Accounts A1, A2, and A3.
  • Region R2: Accounts A1, A2, and A3.
  • Region R3: Accounts A1, A2, and A3.
  • Region R4: Accounts A1, A2, and A3.

Deploying a template initiates creation of a CloudFormation stack in an account/region pair. Templates are deployed sequentially to regions (you control the order) to multiple accounts within the region (you control the amount of parallelism). You can also set an error threshold that will terminate deployments if stack creation fails.

You can use your existing CloudFormation templates (taking care to make sure that they are ready to work across accounts and regions), create new ones, or use one of our sample templates. We are launching with support for the AWS partition (all public regions except those in China), and expect to expand it to the others before too long.

Using StackSets
You can create and deploy StackSets from the CloudFormation Console, via the CloudFormation APIs, or from the command line.
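For instance, a minimal boto3 sketch of the API path might look like the following. The stack set name, template URL, account IDs, and regions are placeholders, and the operation preferences mirror the region order, parallelism, and failure-tolerance options described above:

import boto3

cfn = boto3.client('cloudformation')

# Create the StackSet in the master account (template URL is a placeholder)
cfn.create_stack_set(
    StackSetName='baseline-config-rules',
    TemplateURL='https://s3.amazonaws.com/example-bucket/encrypted-volumes.yaml',
    Capabilities=['CAPABILITY_IAM']
)

# Deploy stack instances to the target accounts and regions
cfn.create_stack_instances(
    StackSetName='baseline-config-rules',
    Accounts=['111111111111', '222222222222'],
    Regions=['us-east-1', 'eu-west-1'],
    OperationPreferences={
        'RegionOrder': ['us-east-1', 'eu-west-1'],  # regions are deployed sequentially in this order
        'MaxConcurrentCount': 1,                    # accounts deployed in parallel within a region
        'FailureToleranceCount': 0                  # stop the operation if any stack creation fails
    }
)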

Using the Console, I start by clicking on Create StackSet. I can use my own template or one of the samples. I’ll use the last sample (Add config rule encrypted volumes):

I click on View template to learn more about the template and the rule:

I give my StackSet a name. The template that I selected accepts an optional parameter, and I can enter it at this time:

Next, I choose the accounts and regions. I can enter account numbers directly, reference an AWS organizational unit, or upload a list of account numbers:

I can set up the regions and control the deployment order:

I can also set the deployment options. Once I am done I click on Next to proceed:

I can add tags to my StackSet. They will be applied to the AWS resources created during the deployment:

The deployment begins, and I can track the status from the Console:

I can open up the Stacks section to see each stack. Initially, the status of each stack is OUTDATED, indicating that the template has yet to be deployed to the stack; this will change to CURRENT after a successful deployment. If a stack cannot be deleted, the status will change to INOPERABLE.

After my initial deployment, I can click on Manage StackSet to add additional accounts, regions, or both, to create additional stacks:

Now Available
This new feature is available now and you can start using it today at no extra charge (you pay only for the AWS resources created on your behalf).

Jeff;

PS – If you create some useful templates and would like to share them with other AWS users, please send a pull request to our AWS Labs GitHub repo.

How to Use Batch References in Amazon Cloud Directory to Refer to New Objects in a Batch Request

Post Syndicated from Vineeth Harikumar original https://aws.amazon.com/blogs/security/how-to-use-batch-references-in-amazon-cloud-directory-to-refer-to-new-objects-in-a-batch-request/

In Amazon Cloud Directory, it’s often necessary to add new objects or add relationships between new objects and existing objects to reflect changes in a real-world hierarchy. With Cloud Directory, you can make these changes efficiently by using batch references within batch operations.

Let’s say I want to take an existing child object in a hierarchy, detach it from its parent, and reattach it to another part of the hierarchy. A simple way to do this would be to make a call to get the object’s unique identifier, another call to detach the object from its parent using the unique identifier, and a third call to attach it to a new parent. However, if I use batch references within a batch write operation, I can perform all three of these actions in the same request, greatly simplifying my code and reducing the round trips required to make such changes.

In this post, I demonstrate how to use batch references in a single write request to simplify adding and restructuring a Cloud Directory hierarchy. I have used the AWS SDK for Java for all the sample code in this post, but you can use other language SDKs or the AWS CLI in a similar way.

Using batch references

In my previous post, I demonstrated how to add AnyCompany’s North American warehouses to a global network of warehouses. As time passes and demand grows, AnyCompany launches multiple warehouses in North American cities to fulfill customer orders with continued efficiency. This requires the company to restructure the network to group warehouses in the same region so that the company can apply similar standards to them, such as delivery times, delivery areas, and types of products sold.

For instance, in the NorthAmerica object (see the following diagram), AnyCompany has launched two new warehouses in the Phoenix (PHX) area: PHX_2 and PHX_3. AnyCompany wants to add these new warehouses to the network and regroup them with existing warehouse PHX_1 under the new node, PHX.

The state of the hierarchy before this regrouping is shown in the following diagram, where I added the NorthAmerica warehouses (also represented as NA in the diagram) to the larger network of AnyCompany’s warehouses.

Diagram showing the state of the hierarchy before this post's regrouping

Adding and grouping new warehouses in the NorthAmerica network

I want to add and group the new warehouses with a single request, and using batch references in a batch write lets me do that. A batch reference is simply an arbitrary name that you define for the object produced or affected by one batch operation, so that later operations in the same request can refer to that object. This lets you chain operations, using the outcome of one operation as input to a subsequent operation within the same batch write request.

Let’s say I have a batch write request with two batch operations: operation A and operation B. Both operate on the same object X. In operation A, I use the object X found at /NorthAmerica/Phoenix and assign it to a batch reference that I call referencePhoenix. In operation B, I want to modify the same object X, so I use referencePhoenix as the object reference that points to the same unique object X used in operation A. I will also reuse the getBatchCreateOperation helper method from my previous post. To learn more about batch references, see the ObjectReference documentation.

To add and group the new warehouses, I will take advantage of batch references to sequentially:

  1. Detach PHX_1 from the NA node and maintain a reference to PHX_1.
  2. Create a new child node, PHX, and attach it to the NA node.
  3. Create PHX_2 and PHX_3 nodes for the new warehouses.
  4. Link all three nodes—PHX_1 (using the batch reference), PHX_2, and PHX_3—to the PHX node.

The following code example achieves these changes in a single batch by using references. First, the code sets up a createObjectPHX operation to create the PHX parent object and attach it to the parent NorthAmerica object. It then sets up createObjectPHX_2 and createObjectPHX_3 and attaches these new objects to the new PHX object. The code then sets up a detachObject to detach the current PHX_1 object from its parent and assign it to a batch reference. The last operation uses that same batch reference to attach the PHX_1 object to the newly created PHX object. The code example orders these steps sequentially in a batch write operation.

   // Assumes the Cloud Directory model classes from the AWS SDK for Java
   // (com.amazonaws.services.clouddirectory.model.*) and Guava's Lists are
   // imported, and that the client, ARNs, and the getBatchCreateOperation()
   // helper are set up as in my previous post.

   // Create the new PHX node under /NorthAmerica, then the two new
   // warehouse nodes beneath it.
   BatchWriteOperation createObjectPHX = getBatchCreateOperation(
        "PHX",
        directorySchemaARN,
        "/NorthAmerica",
        "Phoenix");
   BatchWriteOperation createObjectPHX_2 = getBatchCreateOperation(
        "PHX_2",
        directorySchemaARN,
        "/NorthAmerica/Phoenix",
        "PHX_2");
   BatchWriteOperation createObjectPHX_3 = getBatchCreateOperation(
        "PHX_3",
        directorySchemaARN,
        "/NorthAmerica/Phoenix",
        "PHX_3");


   // Detach the existing PHX_1 object (currently linked as "Phoenix" under
   // /NorthAmerica) and capture it under a batch reference.
   BatchDetachObject detachObject = new BatchDetachObject()
        .withBatchReferenceName("referenceToPHX_1")
        .withLinkName("Phoenix")
        .withParentReference(new ObjectReference()
             .withSelector("/NorthAmerica"));

   // Reattach the same object, addressed as "#referenceToPHX_1", under the
   // newly created PHX node.
   BatchAttachObject attachObject = new BatchAttachObject()
        .withChildReference(new ObjectReference().withSelector("#referenceToPHX_1"))
        .withLinkName("PHX_1")
        .withParentReference(new ObjectReference()
            .withSelector("/NorthAmerica/Phoenix"));

   BatchWriteOperation detachOperation = new BatchWriteOperation()
       .withDetachObject(detachObject);
   BatchWriteOperation attachOperation = new BatchWriteOperation()
       .withAttachObject(attachObject);


   // Order matters: detach first, create the new nodes, then reattach PHX_1.
   BatchWriteRequest request = new BatchWriteRequest();
   request.setDirectoryArn(directoryARN);
   request.setOperations(Lists.newArrayList(
       detachOperation,
       createObjectPHX,
       createObjectPHX_2,
       createObjectPHX_3,
       attachOperation));

   client.batchWrite(request);

In the preceding code example, I use the batch reference referenceToPHX_1 within the same batch write request, so I never have to look up the object identifier of PHX_1. If I couldn’t use such a batch reference, I would have to use separate requests to get the PHX_1 identifier, detach it from the NA node, and then attach it to the new PHX node.
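
For readers using the AWS SDK for Python (Boto3) rather than Java, here is a minimal sketch of the same detach-and-reattach pair. The directory ARN is a placeholder, and the three CreateObject operations are omitted for brevity:

import boto3

clouddirectory = boto3.client('clouddirectory')

# Placeholder ARN; use your own directory's ARN.
directory_arn = 'arn:aws:clouddirectory:us-east-1:111111111111:directory/EXAMPLE'

# The detach operation names a batch reference; the attach operation reuses
# it (with a '#' prefix) so the object identifier never has to be looked up.
clouddirectory.batch_write(
    DirectoryArn=directory_arn,
    Operations=[
        {
            'DetachObject': {
                'ParentReference': {'Selector': '/NorthAmerica'},
                'LinkName': 'Phoenix',
                'BatchReferenceName': 'referenceToPHX_1'
            }
        },
        # ... CreateObject operations for PHX, PHX_2, and PHX_3 go here ...
        {
            'AttachObject': {
                'ParentReference': {'Selector': '/NorthAmerica/Phoenix'},
                'ChildReference': {'Selector': '#referenceToPHX_1'},
                'LinkName': 'PHX_1'
            }
        }
    ]
)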

I now have the network configuration I want, as shown in the following diagram. I have used a combination of batch operations with batch references to bring new warehouses into the network and regroup them within the same local group of warehouses.

Diagram showing the desired network configuration

Summary

In this post, I have shown how you can use batch references in a single batch write request to simplify adding and restructuring your existing hierarchies in Cloud Directory. You can use batch references in scenarios where you want to get an object identifier, but don’t want the overhead of using a read operation before a write operation. Instead, you can use a batch reference to refer to an object as part of the intermediate batch operation. To learn more about batch operations, see Batches, BatchWrite, and BatchRead.

If you have comments about this post, submit them in the “Comments” section below. If you have implementation questions, start a new thread on the Directory Service forum.

– Vineeth

Several TVAddons Domains Transferred to Canadian Lawfirm

Post Syndicated from Andy original https://torrentfreak.com/several-tvaddons-domains-transferred-to-canadian-lawfirm-170718/

The last couple of months have been the most chaotic on record for the booming Kodi third-party addon scene. After years of largely uninterrupted service, a single lawsuit changed the entire landscape.

Last month, TF broke the news that third-party Kodi add-on ZemTV and the TVAddons library were being sued in a federal court in Texas. The fallout was something to behold.

Within days the ‘pirate’ Kodi community found itself in turmoil. Several high-profile Kodi addons took the decision to shut down and even TVAddons itself went dark without explanation.

At the time, unsubstantiated rumors suggested that TVAddons’ disappearance could be attributed to some coincidental site maintenance. However, with around 40 million regular users built up over a number of years, a disappearing Facebook page, and complete radio silence during alleged “routine maintenance,” something was clearly wrong.

It would’ve taken just a couple of minutes to put a ‘maintenance’ notice on the site but one didn’t appear back in June, and one hasn’t appeared since. Behind the scenes, however, things have been shifting.

In addition to the DNS entries of TVAddons.ag being wiped, the domain’s registration record has been quietly updated on at least a couple of other occasions. The image below shows how it used to look.

TVAddons historical domain WHOIS

PrivacyDotLink refers to a service offered by Cayman Islands-based registry Uniregistry. Instead of displaying the real name and address of the domain owner (in this case the person behind TVAddons.ag), the registry replaces the information with details of its own.

The privacy service is used for many reasons, but it’s not hard to see why it’s of particular use to sites in the ‘pirate’ sector.

While some of the changes to the TVAddons domain during the past five weeks or so haven’t been obvious, this morning we observed the biggest change yet. As seen in the image below, its ownership details are no longer obscured by the privacy service.

TVAddons new domain WHOIS

What stands out here is the name Daniel Drapeau. On closer inspection, this gentleman turns out to be a Canada-based lawyer who was admitted to the Quebec Bar in 1991.

“A passion for IP and a 20 year track record, servicing corporations and individuals alike in a wide variety of industries, including industrial equipment, consumer products, publishing, food & beverage, fashion and arts,” Drapeau’s Linkedin page reads.

“His forte is the strategic use of IP rights and litigation to achieve his clients’ goals, whether they be protective, aggressive or defensive. Specialties: Expeditive remedies, including injunctions and seizure orders.”

The other fresh detail in the WHOIS is an address – 600, de Maisonneuve West, Montreal (Quebec) H3A 3J2. It’s a perfect match for the premises of DrapeauLex, a law firm launched by Drapeau in 2012.

Only adding to the intrigue is the fact that other domains operated by TVAddons, both recently and historically, have also been transferred to the law firm.

XBMCHUB.com, which was the domain used by TVAddons before making the switch several years ago, was transferred yesterday. The same can be said about Offshoregit.com, the domain used by TVAddons to distribute Kodi addons.

While there are a few explanations for a lawyer’s name appearing on the TVAddons domains, none of them are yet supported by legal documentation filed in the United States. As of this morning, the Dish Network case docket had received no additional updates. No notice of action in Canada has been made public.

Nevertheless, as a past president of the Intellectual Property Institute of Canada’s anti-counterfeiting committee, Drapeau is certainly an interesting character in the IP space. As noted in a 2009 article by Professor of Law Michael Geist, Drapeau “urged the government to adopt a system of notice-and-takedown.”

Interestingly, Drapeau also worked at law firm Smart & Biggar, where former colleague Jean-Sébastien Dupont recently went on to represent Canadian broadcasters in Wesley (Mtlfreetv.com) v. Bell Canada, the big Kodi-addon piracy case currently underway in Canada.

At this stage, it’s unclear who Drapeau is working for in the TVAddons case. It’s possible that he’s working for Dish and this is a step towards the domains being handed over to the broadcaster as part of a settlement deal with TVAddons. That being said, the XBMChub and Offshoregit domains weren’t mentioned in the Dish lawsuit so something else might be underway.

TorrentFreak reached out to Drapeau for comment and clarification, but at the time of publication, we had received no response.

Dan Drapeau talks Intellectual Property from DrapeauLex on Vimeo.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

Amazon EC2 Systems Manager Patch Manager now supports Linux

Post Syndicated from Randall Hunt original https://aws.amazon.com/blogs/aws/amazon-ec2-systems-manager-patch-manager-now-supports-linux/

Hot on the heels of some other great Amazon EC2 Systems Manager (SSM) updates is another vital enhancement: the ability to use Patch Manager on Linux instances!

We launched Patch Manager with SSM at re:Invent in 2016, and Linux support was a commonly requested feature. Starting today, Patch Manager supports:

  • Amazon Linux 2014.03 and later (2015.03 and later for 64-bit)
  • Ubuntu Server 16.04 LTS, 14.04 LTS, and 12.04 LTS
  • RHEL 6.5 and later (7.x and later for 64-Bit)

When I think about patching a big group of heterogeneous systems I get a little anxious. Years ago, I administered my school’s computer lab. This involved a modest group of machines running a small number of VMs with an immodest number of distinct Linux distros. When there was a critical security patch it was a lot of work to remember the constraints of each system. I remember having to switch back and forth between arcane invocations of various package managers – pinning and unpinning packages: sudo yum update -y, rpm -Uvh ..., apt-get, or even emerge (one of our professors loved Gentoo).

Even now, when I use configuration management systems like Chef or Puppet I still have to specify the package manager and remember a portion of the invocation – and I don’t always want to roll out a patch without some manual approval process. Based on these experiences I decided it was time for me to update my skillset and learn to use Patch Manager.

Patch Manager is a fully managed service (provided at no additional cost) that helps you simplify your operating system patching process: you define the patches you want to approve for deployment, the method and timing of patch roll-outs, and you can determine patch compliance status across your entire fleet of instances. It’s extremely configurable, with sensible defaults, and helps you easily deal with patching heterogeneous clusters.

Since I’m not running that school computer lab anymore my fleet is a bit smaller these days:

a list of instances with amusing names

As you can see above I only have a few instances in this region but if you look at the launch times they range from 2014 to a few minutes ago. I’d be willing to bet I’ve missed a patch or two somewhere (luckily most of these have strict security groups). To get started I installed the SSM agent on all of my machines by following the documentation here. I also made sure the instances had an IAM instance profile that allows them to talk to SSM – I just used the AmazonEC2RoleforSSM managed policy.

Now I need to define a Patch Baseline. We’ll make security updates critical and all other updates informational and subject to my approval.
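
The console makes this straightforward, but as a rough Boto3 sketch (the baseline name, operating system, and approval values below are illustrative assumptions), the same idea looks like this:

import boto3

ssm = boto3.client('ssm')

# Auto-approve security updates at CRITICAL compliance; everything else is
# left for manual approval. Values are illustrative.
baseline = ssm.create_patch_baseline(
    Name='AmazonLinux-Security-Baseline',
    OperatingSystem='AMAZON_LINUX',
    Description='Auto-approve security updates; all else needs approval',
    ApprovalRules={
        'PatchRules': [
            {
                'PatchFilterGroup': {
                    'PatchFilters': [
                        {'Key': 'CLASSIFICATION', 'Values': ['Security']}
                    ]
                },
                'ComplianceLevel': 'CRITICAL',
                'ApproveAfterDays': 0
            }
        ]
    }
)
print(baseline['BaselineId'])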

Next, I can run the AWS-RunPatchBaseline SSM Run Command in “Scan” mode to generate my patch baseline data.
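
Programmatically, that Run Command invocation looks roughly like the sketch below; the instance IDs are placeholders, and swapping 'Scan' for 'Install' performs the actual patching described later in this post:

import boto3

ssm = boto3.client('ssm')

# Scan the instances against their effective patch baseline without
# installing anything (instance IDs are placeholders).
ssm.send_command(
    DocumentName='AWS-RunPatchBaseline',
    Parameters={'Operation': ['Scan']},
    InstanceIds=['i-0123456789abcdef0', 'i-0fedcba9876543210']
)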

Then, I can go to the Patch Compliance page in the EC2 console and check out how I’m doing.

Yikes, looks like I need some security updates! Now, I can use Maintenance Windows, Run Command, or State Manager in SSM to actually manage this patching process. One thing to note: when patching is completed, your machine reboots – so managing that rollout with Maintenance Windows or State Manager is a best practice. If I had a larger set of instances I could group them by creating a tag named “Patch Group”.
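
As a sketch of that grouping approach (the tag value, instance IDs, and baseline ID are placeholders), I could tag the instances and then associate my baseline with the patch group:

import boto3

ec2 = boto3.client('ec2')
ssm = boto3.client('ssm')

# Tag the instances so Patch Manager treats them as one patch group
# (instance IDs and the group name are placeholders).
ec2.create_tags(
    Resources=['i-0123456789abcdef0', 'i-0fedcba9876543210'],
    Tags=[{'Key': 'Patch Group', 'Value': 'blog-fleet'}]
)

# Associate a patch baseline with this patch group; use the BaselineId
# returned by create_patch_baseline (placeholder shown here).
ssm.register_patch_baseline_for_patch_group(
    BaselineId='pb-0123456789abcdef0',
    PatchGroup='blog-fleet'
)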

For now, I’ll just run the same AWS-RunPatchBaseline document from above with the “Install” operation to update these machines.

As always, the CLIs and APIs have been updated to support these new options. The documentation is here. I hope you’re all able to spend less time patching and more time coding!

Randall