Tag Archives: Architecture

Announcing the Winners of the AWS Chatbot Challenge – Conversational, Intelligent Chatbots using Amazon Lex and AWS Lambda

Post Syndicated from Tara Walker original https://aws.amazon.com/blogs/aws/announcing-the-winners-of-the-aws-chatbot-challenge-conversational-intelligent-chatbots-using-amazon-lex-and-aws-lambda/

A couple of months ago on the blog, I announced the AWS Chatbot Challenge in conjunction with Slack. The AWS Chatbot Challenge was an opportunity to build a unique chatbot that helped to solve a problem or that would add value for its prospective users. The mission was to build a conversational, natural language chatbot using Amazon Lex and leverage Lex’s integration with AWS Lambda to execute logic or data processing on the backend.

I know that you all have been anxiously waiting to hear announcements of who were the winners of the AWS Chatbot Challenge as much as I was. Well wait no longer, the winners of the AWS Chatbot Challenge have been decided.

May I have the Envelope Please? (The Trumpets sound)

The winners of the AWS Chatbot Challenge are:

  • First Place: BuildFax Counts by Joe Emison
  • Second Place: Hubsy by Andrew Riess, Andrew Puch, and John Wetzel
  • Third Place: PFMBot by Benny Leong and his team from MoneyLion.
  • Large Organization Winner: ADP Payroll Innovation Bot by Eric Liu, Jiaxing Yan, and Fan Yang


Diving into the Winning Chatbot Projects

Let’s take a walkthrough of the details for each of the winning projects to get a view of what made these chatbots distinctive, as well as, learn more about the technologies used to implement the chatbot solution.


BuildFax Counts by Joe Emison

The BuildFax Counts bot was created as a real solution for the BuildFax company to decrease the amount the time that sales and marketing teams can get answers on permits or properties with permits meet certain criteria.

BuildFax, a company co-founded by bot developer Joe Emison, has the only national database of building permits, which updates data from approximately half of the United States on a monthly basis. In order to accommodate the many requests that come in from the sales and marketing team regarding permit information, BuildFax has a technical sales support team that fulfills these requests sent to a ticketing system by manually writing SQL queries that run across the shards of the BuildFax databases. Since there are a large number of requests received by the internal sales support team and due to the manual nature of setting up the queries, it may take several days for getting the sales and marketing teams to receive an answer.

The BuildFax Counts chatbot solves this problem by taking the permit inquiry that would normally be sent into a ticket from the sales and marketing team, as input from Slack to the chatbot. Once the inquiry is submitted into Slack, a query executes and the inquiry results are returned immediately.

Joe built this solution by first creating a nightly export of the data in their BuildFax MySQL RDS database to CSV files that are stored in Amazon S3. From the exported CSV files, an Amazon Athena table was created in order to run quick and efficient queries on the data. He then used Amazon Lex to create a bot to handle the common questions and criteria that may be asked by the sales and marketing teams when seeking data from the BuildFax database by modeling the language used from the BuildFax ticketing system. He added several different sample utterances and slot types; both custom and Lex provided, in order to correctly parse every question and criteria combination that could be received from an inquiry.  Using Lambda, Joe created a Javascript Lambda function that receives information from the Lex intent and used it to build a SQL statement that runs against the aforementioned Athena database using the AWS SDK for JavaScript in Node.js library to return inquiry count result and SQL statement used.

The BuildFax Counts bot is used today for the BuildFax sales and marketing team to get back data on inquiries immediately that previously took up to a week to receive results.

Not only is BuildFax Counts bot our 1st place winner and wonderful solution, but its creator, Joe Emison, is a great guy.  Joe has opted to donate his prize; the $5,000 cash, the $2,500 in AWS Credits, and one re:Invent ticket to the Black Girls Code organization. I must say, you rock Joe for helping these kids get access and exposure to technology.


Hubsy by Andrew Riess, Andrew Puch, and John Wetzel

Hubsy bot was created to redefine and personalize the way users traditionally manage their HubSpot account. HubSpot is a SaaS system providing marketing, sales, and CRM software. Hubsy allows users of HubSpot to create engagements and log engagements with customers, provide sales teams with deals status, and retrieves client contact information quickly. Hubsy uses Amazon Lex’s conversational interface to execute commands from the HubSpot API so that users can gain insights, store and retrieve data, and manage tasks directly from Facebook, Slack, or Alexa.

In order to implement the Hubsy chatbot, Andrew and the team members used AWS Lambda to create a Lambda function with Node.js to parse the users request and call the HubSpot API, which will fulfill the initial request or return back to the user asking for more information. Terraform was used to automatically setup and update Lambda, CloudWatch logs, as well as, IAM profiles. Amazon Lex was used to build the conversational piece of the bot, which creates the utterances that a person on a sales team would likely say when seeking information from HubSpot. To integrate with Alexa, the Amazon Alexa skill builder was used to create an Alexa skill which was tested on an Echo Dot. Cloudwatch Logs are used to log the Lambda function information to CloudWatch in order to debug different parts of the Lex intents. In order to validate the code before the Terraform deployment, ESLint was additionally used to ensure the code was linted and proper development standards were followed.


PFMBot by Benny Leong and his team from MoneyLion

PFMBot, Personal Finance Management Bot,  is a bot to be used with the MoneyLion finance group which offers customers online financial products; loans, credit monitoring, and free credit score service to improve the financial health of their customers. Once a user signs up an account on the MoneyLion app or website, the user has the option to link their bank accounts with the MoneyLion APIs. Once the bank account is linked to the APIs, the user will be able to login to their MoneyLion account and start having a conversation with the PFMBot based on their bank account information.

The PFMBot UI has a web interface built with using Javascript integration. The chatbot was created using Amazon Lex to build utterances based on the possible inquiries about the user’s MoneyLion bank account. PFMBot uses the Lex built-in AMAZON slots and parsed and converted the values from the built-in slots to pass to AWS Lambda. The AWS Lambda functions interacting with Amazon Lex are Java-based Lambda functions which call the MoneyLion Java-based internal APIs running on Spring Boot. These APIs obtain account data and related bank account information from the MoneyLion MySQL Database.


ADP Payroll Innovation Bot by Eric Liu, Jiaxing Yan, and Fan Yang

ADP PI (Payroll Innovation) bot is designed to help employees of ADP customers easily review their own payroll details and compare different payroll data by just asking the bot for results. The ADP PI Bot additionally offers issue reporting functionality for employees to report payroll issues and aids HR managers in quickly receiving and organizing any reported payroll issues.

The ADP Payroll Innovation bot is an ecosystem for the ADP payroll consisting of two chatbots, which includes ADP PI Bot for external clients (employees and HR managers), and ADP PI DevOps Bot for internal ADP DevOps team.

The architecture for the ADP PI DevOps bot is different architecture from the ADP PI bot shown above as it is deployed internally to ADP. The ADP PI DevOps bot allows input from both Slack and Alexa. When input comes into Slack, Slack sends the request to Lex for it to process the utterance. Lex then calls the Lambda backend, which obtains ADP data sitting in the ADP VPC running within an Amazon VPC. When input comes in from Alexa, a Lambda function is called that also obtains data from the ADP VPC running on AWS.

The architecture for the ADP PI bot consists of users entering in requests and/or entering issues via Slack. When requests/issues are entered via Slack, the Slack APIs communicate via Amazon API Gateway to AWS Lambda. The Lambda function either writes data into one of the Amazon DynamoDB databases for recording issues and/or sending issues or it sends the request to Lex. When sending issues, DynamoDB integrates with Trello to keep HR Managers abreast of the escalated issues. Once the request data is sent from Lambda to Lex, Lex processes the utterance and calls another Lambda function that integrates with the ADP API and it calls ADP data from within the ADP VPC, which runs on Amazon Virtual Private Cloud (VPC).

Python and Node.js were the chosen languages for the development of the bots.

The ADP PI bot ecosystem has the following functional groupings:

Employee Functionality

  • Summarize Payrolls
  • Compare Payrolls
  • Escalate Issues
  • Evolve PI Bot

HR Manager Functionality

  • Bot Management
  • Audit and Feedback

DevOps Functionality

  • Reduce call volume in service centers (ADP PI Bot).
  • Track issues and generate reports (ADP PI Bot).
  • Monitor jobs for various environment (ADP PI DevOps Bot)
  • View job dashboards (ADP PI DevOps Bot)
  • Query job details (ADP PI DevOps Bot)



Let’s all wish all the winners of the AWS Chatbot Challenge hearty congratulations on their excellent projects.

You can review more details on the winning projects, as well as, all of the submissions to the AWS Chatbot Challenge at: https://awschatbot2017.devpost.com/submissions. If you are curious on the details of Chatbot challenge contest including resources, rules, prizes, and judges, you can review the original challenge website here:  https://awschatbot2017.devpost.com/.

Hopefully, you are just as inspired as I am to build your own chatbot using Lex and Lambda. For more information, take a look at the Amazon Lex developer guide or the AWS AI blog on Building Better Bots Using Amazon Lex (Part 1)

Chat with you soon!


Announcement: IPS code

Post Syndicated from Robert Graham original http://blog.erratasec.com/2017/08/announcement-ips-code.html

So after 20 years, IBM is killing off my BlackICE code created in April 1998. So it’s time that I rewrite it.

BlackICE was the first “inline” intrusion-detection system, aka. an “intrusion prevention system” or IPS. ISS purchased my company in 2001 and replaced their RealSecure engine with it, and later renamed it Proventia. Then IBM purchased ISS in 2006. Now, they are formally canceling the project and moving customers onto Cisco’s products, which are based on Snort.

So now is a good time to write a replacement. The reason is that BlackICE worked fundamentally differently than Snort, using protocol analysis rather than pattern-matching. In this way, it worked more like Bro than Snort. The biggest benefit of protocol-analysis is speed, making it many times faster than Snort. The second benefit is better detection ability, as I describe in this post on Heartbleed.

So my plan is to create a new project. I’ll be checking in the starter bits into GitHub starting a couple weeks from now. I need to figure out a new name for the project, so I don’t have to rip off a name from William Gibson like I did last time :).

Some notes:

  • Yes, it’ll be GNU open source. I’m a capitalist, so I’ll earn money like snort/nmap dual-licensing it, charging companies who don’t want to open-source their addons. All capitalists GNU license their code.
  • C, not Rust. Sorry, I’m going for extreme scalability. We’ll re-visit this decision later when looking at building protocol parsers.
  • It’ll be 95% compatible with Snort signatures. Their language definition leaves so much ambiguous it’ll be hard to be 100% compatible.
  • It’ll support Snort output as well, though really, Snort’s events suck.
  • Protocol parsers in Lua, so you can use it as a replacement for Bro, writing parsers to extract data you are interested in.
  • Protocol state machine parsers in C, like you see in my Masscan project for X.509.
  • First version IDS only. These days, “inline” means also being able to MitM the SSL stack, so I’m gong to have to think harder on that.
  • Mutli-core worker threads off PF_RING/DPDK/netmap receive queues. Should handle 10gbps, tracking 10 million concurrent connections, with quad-core CPU.
So if you want to contribute to the project, here’s what I need:
  • Requirements from people who work daily with IDS/IPS today. I need you to write up what your products do well that you really like. I need to you write up what they suck at that needs to be fixed. These need to be in some detail.
  • Testing environment to play with. This means having a small server plugged into a real-world link running at a minimum of several gigabits-per-second available for the next year. I’ll sign NDAs related to the data I might see on the network.
  • Coders. I’ll be doing the basic architecture, but protocol parsers, output plugins, etc. will need work. Code will be in C and Lua for the near term. Unfortunately, since I’m going to dual-license, I’ll need waivers before accepting pull requests.
Anyway, follow me on Twitter @erratarob if you want to contribute.

Analyzing AWS Cost and Usage Reports with Looker and Amazon Athena

Post Syndicated from Dillon Morrison original https://aws.amazon.com/blogs/big-data/analyzing-aws-cost-and-usage-reports-with-looker-and-amazon-athena/

This is a guest post by Dillon Morrison at Looker. Looker is, in their own words, “a new kind of analytics platform–letting everyone in your business make better decisions by getting reliable answers from a tool they can use.” 

As the breadth of AWS products and services continues to grow, customers are able to more easily move their technology stack and core infrastructure to AWS. One of the attractive benefits of AWS is the cost savings. Rather than paying upfront capital expenses for large on-premises systems, customers can instead pay variables expenses for on-demand services. To further reduce expenses AWS users can reserve resources for specific periods of time, and automatically scale resources as needed.

The AWS Cost Explorer is great for aggregated reporting. However, conducting analysis on the raw data using the flexibility and power of SQL allows for much richer detail and insight, and can be the better choice for the long term. Thankfully, with the introduction of Amazon Athena, monitoring and managing these costs is now easier than ever.

In the post, I walk through setting up the data pipeline for cost and usage reports, Amazon S3, and Athena, and discuss some of the most common levers for cost savings. I surface tables through Looker, which comes with a host of pre-built data models and dashboards to make analysis of your cost and usage data simple and intuitive.

Analysis with Athena

With Athena, there’s no need to create hundreds of Excel reports, move data around, or deploy clusters to house and process data. Athena uses Apache Hive’s DDL to create tables, and the Presto querying engine to process queries. Analysis can be performed directly on raw data in S3. Conveniently, AWS exports raw cost and usage data directly into a user-specified S3 bucket, making it simple to start querying with Athena quickly. This makes continuous monitoring of costs virtually seamless, since there is no infrastructure to manage. Instead, users can leverage the power of the Athena SQL engine to easily perform ad-hoc analysis and data discovery without needing to set up a data warehouse.

After the data pipeline is established, cost and usage data (the recommended billing data, per AWS documentation) provides a plethora of comprehensive information around usage of AWS services and the associated costs. Whether you need the report segmented by product type, user identity, or region, this report can be cut-and-sliced any number of ways to properly allocate costs for any of your business needs. You can then drill into any specific line item to see even further detail, such as the selected operating system, tenancy, purchase option (on-demand, spot, or reserved), and so on.


By default, the Cost and Usage report exports CSV files, which you can compress using gzip (recommended for performance). There are some additional configuration options for tuning performance further, which are discussed below.


If you want to follow along, you need the following resources:

Enable the cost and usage reports

First, enable the Cost and Usage report. For Time unit, select Hourly. For Include, select Resource IDs. All options are prompted in the report-creation window.

The Cost and Usage report dumps CSV files into the specified S3 bucket. Please note that it can take up to 24 hours for the first file to be delivered after enabling the report.

Configure the S3 bucket and files for Athena querying

In addition to the CSV file, AWS also creates a JSON manifest file for each cost and usage report. Athena requires that all of the files in the S3 bucket are in the same format, so we need to get rid of all these manifest files. If you’re looking to get started with Athena quickly, you can simply go into your S3 bucket and delete the manifest file manually, skip the automation described below, and move on to the next section.

To automate the process of removing the manifest file each time a new report is dumped into S3, which I recommend as you scale, there are a few additional steps. The folks at Concurrency labs wrote a great overview and set of scripts for this, which you can find in their GitHub repo.

These scripts take the data from an input bucket, remove anything unnecessary, and dump it into a new output bucket. We can utilize AWS Lambda to trigger this process whenever new data is dropped into S3, or on a nightly basis, or whatever makes most sense for your use-case, depending on how often you’re querying the data. Please note that enabling the “hourly” report means that data is reported at the hour-level of granularity, not that a new file is generated every hour.

Following these scripts, you’ll notice that we’re adding a date partition field, which isn’t necessary but improves query performance. In addition, converting data from CSV to a columnar format like ORC or Parquet also improves performance. We can automate this process using Lambda whenever new data is dropped in our S3 bucket. Amazon Web Services discusses columnar conversion at length, and provides walkthrough examples, in their documentation.

As a long-term solution, best practice is to use compression, partitioning, and conversion. However, for purposes of this walkthrough, we’re not going to worry about them so we can get up-and-running quicker.

Set up the Athena query engine

In your AWS console, navigate to the Athena service, and click “Get Started”. Follow the tutorial and set up a new database (we’ve called ours “AWS Optimizer” in this example). Don’t worry about configuring your initial table, per the tutorial instructions. We’ll be creating a new table for cost and usage analysis. Once you walked through the tutorial steps, you’ll be able to access the Athena interface, and can begin running Hive DDL statements to create new tables.

One thing that’s important to note, is that the Cost and Usage CSVs also contain the column headers in their first row, meaning that the column headers would be included in the dataset and any queries. For testing and quick set-up, you can remove this line manually from your first few CSV files. Long-term, you’ll want to use a script to programmatically remove this row each time a new file is dropped in S3 (every few hours typically). We’ve drafted up a sample script for ease of reference, which we run on Lambda. We utilize Lambda’s native ability to invoke the script whenever a new object is dropped in S3.

For cost and usage, we recommend using the DDL statement below. Since our data is in CSV format, we don’t need to use a SerDe, we can simply specify the “separatorChar, quoteChar, and escapeChar”, and the structure of the files (“TEXTFILE”). Note that AWS does have an OpenCSV SerDe as well, if you prefer to use that.


identity_LineItemId String,
identity_TimeInterval String,
bill_InvoiceId String,
bill_BillingEntity String,
bill_BillType String,
bill_PayerAccountId String,
bill_BillingPeriodStartDate String,
bill_BillingPeriodEndDate String,
lineItem_UsageAccountId String,
lineItem_LineItemType String,
lineItem_UsageStartDate String,
lineItem_UsageEndDate String,
lineItem_ProductCode String,
lineItem_UsageType String,
lineItem_Operation String,
lineItem_AvailabilityZone String,
lineItem_ResourceId String,
lineItem_UsageAmount String,
lineItem_NormalizationFactor String,
lineItem_NormalizedUsageAmount String,
lineItem_CurrencyCode String,
lineItem_UnblendedRate String,
lineItem_UnblendedCost String,
lineItem_BlendedRate String,
lineItem_BlendedCost String,
lineItem_LineItemDescription String,
lineItem_TaxType String,
product_ProductName String,
product_accountAssistance String,
product_architecturalReview String,
product_architectureSupport String,
product_availability String,
product_bestPractices String,
product_cacheEngine String,
product_caseSeverityresponseTimes String,
product_clockSpeed String,
product_currentGeneration String,
product_customerServiceAndCommunities String,
product_databaseEdition String,
product_databaseEngine String,
product_dedicatedEbsThroughput String,
product_deploymentOption String,
product_description String,
product_durability String,
product_ebsOptimized String,
product_ecu String,
product_endpointType String,
product_engineCode String,
product_enhancedNetworkingSupported String,
product_executionFrequency String,
product_executionLocation String,
product_feeCode String,
product_feeDescription String,
product_freeQueryTypes String,
product_freeTrial String,
product_frequencyMode String,
product_fromLocation String,
product_fromLocationType String,
product_group String,
product_groupDescription String,
product_includedServices String,
product_instanceFamily String,
product_instanceType String,
product_io String,
product_launchSupport String,
product_licenseModel String,
product_location String,
product_locationType String,
product_maxIopsBurstPerformance String,
product_maxIopsvolume String,
product_maxThroughputvolume String,
product_maxVolumeSize String,
product_maximumStorageVolume String,
product_memory String,
product_messageDeliveryFrequency String,
product_messageDeliveryOrder String,
product_minVolumeSize String,
product_minimumStorageVolume String,
product_networkPerformance String,
product_operatingSystem String,
product_operation String,
product_operationsSupport String,
product_physicalProcessor String,
product_preInstalledSw String,
product_proactiveGuidance String,
product_processorArchitecture String,
product_processorFeatures String,
product_productFamily String,
product_programmaticCaseManagement String,
product_provisioned String,
product_queueType String,
product_requestDescription String,
product_requestType String,
product_routingTarget String,
product_routingType String,
product_servicecode String,
product_sku String,
product_softwareType String,
product_storage String,
product_storageClass String,
product_storageMedia String,
product_technicalSupport String,
product_tenancy String,
product_thirdpartySoftwareSupport String,
product_toLocation String,
product_toLocationType String,
product_training String,
product_transferType String,
product_usageFamily String,
product_usagetype String,
product_vcpu String,
product_version String,
product_volumeType String,
product_whoCanOpenCases String,
pricing_LeaseContractLength String,
pricing_OfferingClass String,
pricing_PurchaseOption String,
pricing_publicOnDemandCost String,
pricing_publicOnDemandRate String,
pricing_term String,
pricing_unit String,
reservation_AvailabilityZone String,
reservation_NormalizedUnitsPerReservation String,
reservation_NumberOfReservations String,
reservation_ReservationARN String,
reservation_TotalReservedNormalizedUnits String,
reservation_TotalReservedUnits String,
reservation_UnitsPerReservation String,
resourceTags_userName String,
resourceTags_usercostcategory String  

      ESCAPED BY '\\'

    LOCATION 's3://<<your bucket name>>';

Once you’ve successfully executed the command, you should see a new table named “cost_and_usage” with the below properties. Now we’re ready to start executing queries and running analysis!

Start with Looker and connect to Athena

Setting up Looker is a quick process, and you can try it out for free here (or download from Amazon Marketplace). It takes just a few seconds to connect Looker to your Athena database, and Looker comes with a host of pre-built data models and dashboards to make analysis of your cost and usage data simple and intuitive. After you’re connected, you can use the Looker UI to run whatever analysis you’d like. Looker translates this UI to optimized SQL, so any user can execute and visualize queries for true self-service analytics.

Major cost saving levers

Now that the data pipeline is configured, you can dive into the most popular use cases for cost savings. In this post, I focus on:

  • Purchasing Reserved Instances vs. On-Demand Instances
  • Data transfer costs
  • Allocating costs over users or other Attributes (denoted with resource tags)

On-Demand, Spot, and Reserved Instances

Purchasing Reserved Instances vs On-Demand Instances is arguably going to be the biggest cost lever for heavy AWS users (Reserved Instances run up to 75% cheaper!). AWS offers three options for purchasing instances:

  • On-Demand—Pay as you use.
  • Spot (variable cost)—Bid on spare Amazon EC2 computing capacity.
  • Reserved Instances—Pay for an instance for a specific, allotted period of time.

When purchasing a Reserved Instance, you can also choose to pay all-upfront, partial-upfront, or monthly. The more you pay upfront, the greater the discount.

If your company has been using AWS for some time now, you should have a good sense of your overall instance usage on a per-month or per-day basis. Rather than paying for these instances On-Demand, you should try to forecast the number of instances you’ll need, and reserve them with upfront payments.

The total amount of usage with Reserved Instances versus overall usage with all instances is called your coverage ratio. It’s important not to confuse your coverage ratio with your Reserved Instance utilization. Utilization represents the amount of reserved hours that were actually used. Don’t worry about exceeding capacity, you can still set up Auto Scaling preferences so that more instances get added whenever your coverage or utilization crosses a certain threshold (we often see a target of 80% for both coverage and utilization among savvy customers).

Calculating the reserved costs and coverage can be a bit tricky with the level of granularity provided by the cost and usage report. The following query shows your total cost over the last 6 months, broken out by Reserved Instance vs other instance usage. You can substitute the cost field for usage if you’d prefer. Please note that you should only have data for the time period after the cost and usage report has been enabled (though you can opt for up to 3 months of historical data by contacting your AWS Account Executive). If you’re just getting started, this query will only show a few days.


	DATE_FORMAT(from_iso8601_timestamp(cost_and_usage.lineitem_usagestartdate),'%Y-%m') AS "cost_and_usage.usage_start_month",
	COALESCE(SUM(cost_and_usage.lineitem_unblendedcost ), 0) AS "cost_and_usage.total_unblended_cost",
         WHEN cost_and_usage.lineitem_lineitemtype = 'DiscountedUsage' THEN 'RI Line Item'
         WHEN cost_and_usage.lineitem_lineitemtype = 'RIFee' THEN 'RI Line Item'
         WHEN cost_and_usage.lineitem_lineitemtype = 'Fee' THEN 'RI Line Item'
         ELSE 'Non RI Line Item'
        END = 'RI Line Item') THEN cost_and_usage.lineitem_unblendedcost  ELSE NULL END), 0) AS "cost_and_usage.total_reserved_unblended_cost",
         WHEN cost_and_usage.lineitem_lineitemtype = 'DiscountedUsage' THEN 'RI Line Item'
         WHEN cost_and_usage.lineitem_lineitemtype = 'RIFee' THEN 'RI Line Item'
         WHEN cost_and_usage.lineitem_lineitemtype = 'Fee' THEN 'RI Line Item'
         ELSE 'Non RI Line Item'
        END = 'RI Line Item') THEN cost_and_usage.lineitem_unblendedcost  ELSE NULL END), 0)) / NULLIF((COALESCE(SUM(cost_and_usage.lineitem_unblendedcost ), 0)),0)  AS "cost_and_usage.percent_spend_on_ris",
         WHEN cost_and_usage.lineitem_lineitemtype = 'DiscountedUsage' THEN 'RI Line Item'
         WHEN cost_and_usage.lineitem_lineitemtype = 'RIFee' THEN 'RI Line Item'
         WHEN cost_and_usage.lineitem_lineitemtype = 'Fee' THEN 'RI Line Item'
         ELSE 'Non RI Line Item'
        END = 'Non RI Line Item') THEN cost_and_usage.lineitem_unblendedcost  ELSE NULL END), 0) AS "cost_and_usage.total_non_reserved_unblended_cost",
         WHEN cost_and_usage.lineitem_lineitemtype = 'DiscountedUsage' THEN 'RI Line Item'
         WHEN cost_and_usage.lineitem_lineitemtype = 'RIFee' THEN 'RI Line Item'
         WHEN cost_and_usage.lineitem_lineitemtype = 'Fee' THEN 'RI Line Item'
         ELSE 'Non RI Line Item'
        END = 'Non RI Line Item') THEN cost_and_usage.lineitem_unblendedcost  ELSE NULL END), 0)) / NULLIF((COALESCE(SUM(cost_and_usage.lineitem_unblendedcost ), 0)),0)  AS "cost_and_usage.percent_spend_on_non_ris"
FROM aws_optimizer.cost_and_usage  AS cost_and_usage

	(((from_iso8601_timestamp(cost_and_usage.lineitem_usagestartdate)) >= ((DATE_ADD('month', -5, DATE_TRUNC('MONTH', CAST(NOW() AS DATE))))) AND (from_iso8601_timestamp(cost_and_usage.lineitem_usagestartdate)) < ((DATE_ADD('month', 6, DATE_ADD('month', -5, DATE_TRUNC('MONTH', CAST(NOW() AS DATE))))))))

The resulting table should look something like the image below (I’m surfacing tables through Looker, though the same table would result from querying via command line or any other interface).

With a BI tool, you can create dashboards for easy reference and monitoring. New data is dumped into S3 every few hours, so your dashboards can update several times per day.

It’s an iterative process to understand the appropriate number of Reserved Instances needed to meet your business needs. After you’ve properly integrated Reserved Instances into your purchasing patterns, the savings can be significant. If your coverage is consistently below 70%, you should seriously consider adjusting your purchase types and opting for more Reserved instances.

Data transfer costs

One of the great things about AWS data storage is that it’s incredibly cheap. Most charges often come from moving and processing that data. There are several different prices for transferring data, broken out largely by transfers between regions and availability zones. Transfers between regions are the most costly, followed by transfers between Availability Zones. Transfers within the same region and same availability zone are free unless using elastic or public IP addresses, in which case there is a cost. You can find more detailed information in the AWS Pricing Docs. With this in mind, there are several simple strategies for helping reduce costs.

First, since costs increase when transferring data between regions, it’s wise to ensure that as many services as possible reside within the same region. The more you can localize services to one specific region, the lower your costs will be.

Second, you should maximize the data you’re routing directly within AWS services and IP addresses. Transfers out to the open internet are the most costly and least performant mechanisms of data transfers, so it’s best to keep transfers within AWS services.

Lastly, data transfers between private IP addresses are cheaper than between elastic or public IP addresses, so utilizing private IP addresses as much as possible is the most cost-effective strategy.

The following query provides a table depicting the total costs for each AWS product, broken out transfer cost type. Substitute the “lineitem_productcode” field in the query to segment the costs by any other attribute. If you notice any unusually high spikes in cost, you’ll need to dig deeper to understand what’s driving that spike: location, volume, and so on. Drill down into specific costs by including “product_usagetype” and “product_transfertype” in your query to identify the types of transfer costs that are driving up your bill.

	cost_and_usage.lineitem_productcode  AS "cost_and_usage.product_code",
	COALESCE(SUM(cost_and_usage.lineitem_unblendedcost), 0) AS "cost_and_usage.total_unblended_cost",
	COALESCE(SUM(CASE WHEN REGEXP_LIKE(cost_and_usage.product_usagetype, 'DataTransfer')    THEN cost_and_usage.lineitem_unblendedcost  ELSE NULL END), 0) AS "cost_and_usage.total_data_transfer_cost",
	COALESCE(SUM(CASE WHEN REGEXP_LIKE(cost_and_usage.product_usagetype, 'DataTransfer-In')    THEN cost_and_usage.lineitem_unblendedcost  ELSE NULL END), 0) AS "cost_and_usage.total_inbound_data_transfer_cost",
	COALESCE(SUM(CASE WHEN REGEXP_LIKE(cost_and_usage.product_usagetype, 'DataTransfer-Out')    THEN cost_and_usage.lineitem_unblendedcost  ELSE NULL END), 0) AS "cost_and_usage.total_outbound_data_transfer_cost"
FROM aws_optimizer.cost_and_usage  AS cost_and_usage

	(((from_iso8601_timestamp(cost_and_usage.lineitem_usagestartdate)) >= ((DATE_ADD('month', -5, DATE_TRUNC('MONTH', CAST(NOW() AS DATE))))) AND (from_iso8601_timestamp(cost_and_usage.lineitem_usagestartdate)) < ((DATE_ADD('month', 6, DATE_ADD('month', -5, DATE_TRUNC('MONTH', CAST(NOW() AS DATE))))))))

When moving between regions or over the open web, many data transfer costs also include the origin and destination location of the data movement. Using a BI tool with mapping capabilities, you can get a nice visual of data flows. The point at the center of the map is used to represent external data flows over the open internet.

Analysis by tags

AWS provides the option to apply custom tags to individual resources, so you can allocate costs over whatever customized segment makes the most sense for your business. For a SaaS company that hosts software for customers on AWS, maybe you’d want to tag the size of each customer. The following query uses custom tags to display the reserved, data transfer, and total cost for each AWS service, broken out by tag categories, over the last 6 months. You’ll want to substitute the cost_and_usage.resourcetags_customersegment and cost_and_usage.customer_segment with the name of your customer field.


SELECT *, DENSE_RANK() OVER (ORDER BY z___min_rank) as z___pivot_row_rank, RANK() OVER (PARTITION BY z__pivot_col_rank ORDER BY z___min_rank) as z__pivot_col_ordering FROM (
SELECT *, MIN(z___rank) OVER (PARTITION BY "cost_and_usage.product_code") as z___min_rank FROM (
SELECT *, RANK() OVER (ORDER BY CASE WHEN z__pivot_col_rank=1 THEN (CASE WHEN "cost_and_usage.total_unblended_cost" IS NOT NULL THEN 0 ELSE 1 END) ELSE 2 END, CASE WHEN z__pivot_col_rank=1 THEN "cost_and_usage.total_unblended_cost" ELSE NULL END DESC, "cost_and_usage.total_unblended_cost" DESC, z__pivot_col_rank, "cost_and_usage.product_code") AS z___rank FROM (
SELECT *, DENSE_RANK() OVER (ORDER BY CASE WHEN "cost_and_usage.customer_segment" IS NULL THEN 1 ELSE 0 END, "cost_and_usage.customer_segment") AS z__pivot_col_rank FROM (
	cost_and_usage.lineitem_productcode  AS "cost_and_usage.product_code",
	cost_and_usage.resourcetags_customersegment  AS "cost_and_usage.customer_segment",
	COALESCE(SUM(cost_and_usage.lineitem_unblendedcost ), 0) AS "cost_and_usage.total_unblended_cost",
	1.0 * (COALESCE(SUM(CASE WHEN REGEXP_LIKE(cost_and_usage.product_usagetype, 'DataTransfer')    THEN cost_and_usage.lineitem_unblendedcost  ELSE NULL END), 0)) / NULLIF((COALESCE(SUM(cost_and_usage.lineitem_unblendedcost ), 0)),0)  AS "cost_and_usage.percent_spend_data_transfers_unblended",
         WHEN cost_and_usage.lineitem_lineitemtype = 'DiscountedUsage' THEN 'RI Line Item'
         WHEN cost_and_usage.lineitem_lineitemtype = 'RIFee' THEN 'RI Line Item'
         WHEN cost_and_usage.lineitem_lineitemtype = 'Fee' THEN 'RI Line Item'
         ELSE 'Non RI Line Item'
        END = 'Non RI Line Item') THEN cost_and_usage.lineitem_unblendedcost  ELSE NULL END), 0)) / NULLIF((COALESCE(SUM(cost_and_usage.lineitem_unblendedcost ), 0)),0)  AS "cost_and_usage.unblended_percent_spend_on_ris"
FROM aws_optimizer.cost_and_usage_raw  AS cost_and_usage

	(((from_iso8601_timestamp(cost_and_usage.lineitem_usagestartdate)) >= ((DATE_ADD('month', -5, DATE_TRUNC('MONTH', CAST(NOW() AS DATE))))) AND (from_iso8601_timestamp(cost_and_usage.lineitem_usagestartdate)) < ((DATE_ADD('month', 6, DATE_ADD('month', -5, DATE_TRUNC('MONTH', CAST(NOW() AS DATE))))))))
GROUP BY 1,2) ww
) bb WHERE z__pivot_col_rank <= 16384
) aa
) xx
) zz
 WHERE z___pivot_row_rank <= 500 OR z__pivot_col_ordering = 1 ORDER BY z___pivot_row_rank

The resulting table in this example looks like the results below. In this example, you can tell that we’re making poor use of Reserved Instances because they represent such a small portion of our overall costs.

Again, using a BI tool to visualize these costs and trends over time makes the analysis much easier to consume and take action on.


Saving costs on your AWS spend is always an iterative, ongoing process. Hopefully with these queries alone, you can start to understand your spending patterns and identify opportunities for savings. However, this is just a peek into the many opportunities available through analysis of the Cost and Usage report. Each company is different, with unique needs and usage patterns. To achieve maximum cost savings, we encourage you to set up an analytics environment that enables your team to explore all potential cuts and slices of your usage data, whenever it’s necessary. Exploring different trends and spikes across regions, services, user types, etc. helps you gain comprehensive understanding of your major cost levers and consistently implement new cost reduction strategies.

Note that all of the queries and analysis provided in this post were generated using the Looker data platform. If you’re already a Looker customer, you can get all of this analysis, additional pre-configured dashboards, and much more using Looker Blocks for AWS.

About the Author

Dillon Morrison leads the Platform Ecosystem at Looker. He enjoys exploring new technologies and architecting the most efficient data solutions for the business needs of his company and their customers. In his spare time, you’ll find Dillon rock climbing in the Bay Area or nose deep in the docs of the latest AWS product release at his favorite cafe (“Arlequin in SF is unbeatable!”).




Raspbian Stretch has arrived for Raspberry Pi

Post Syndicated from Simon Long original https://www.raspberrypi.org/blog/raspbian-stretch/

It’s now just under two years since we released the Jessie version of Raspbian. Those of you who know that Debian run their releases on a two-year cycle will therefore have been wondering when we might be releasing the next version, codenamed Stretch. Well, wonder no longer – Raspbian Stretch is available for download today!

Disney Pixar Toy Story Raspbian Stretch Raspberry Pi

Debian releases are named after characters from Disney Pixar’s Toy Story trilogy. In case, like me, you were wondering: Stretch is a purple octopus from Toy Story 3. Hi, Stretch!

The differences between Jessie and Stretch are mostly under-the-hood optimisations, and you really shouldn’t notice any differences in day-to-day use of the desktop and applications. (If you’re really interested, the technical details are in the Debian release notes here.)

However, we’ve made a few small changes to our image that are worth mentioning.

New versions of applications

Version 3.0.1 of Sonic Pi is included – this includes a lot of new functionality in terms of input/output. See the Sonic Pi release notes for more details of exactly what has changed.

Raspbian Stretch Raspberry Pi

The Chromium web browser has been updated to version 60, the most recent stable release. This offers improved memory usage and more efficient code, so you may notice it running slightly faster than before. The visual appearance has also been changed very slightly.

Raspbian Stretch Raspberry Pi

Bluetooth audio

In Jessie, we used PulseAudio to provide support for audio over Bluetooth, but integrating this with the ALSA architecture used for other audio sources was clumsy. For Stretch, we are using the bluez-alsa package to make Bluetooth audio work with ALSA itself. PulseAudio is therefore no longer installed by default, and the volume plugin on the taskbar will no longer start and stop PulseAudio. From a user point of view, everything should still work exactly as before – the only change is that if you still wish to use PulseAudio for some other reason, you will need to install it yourself.

Better handling of other usernames

The default user account in Raspbian has always been called ‘pi’, and a lot of the desktop applications assume that this is the current user. This has been changed for Stretch, so now applications like Raspberry Pi Configuration no longer assume this to be the case. This means, for example, that the option to automatically log in as the ‘pi’ user will now automatically log in with the name of the current user instead.

One other change is how sudo is handled. By default, the ‘pi’ user is set up with passwordless sudo access. We are no longer assuming this to be the case, so now desktop applications which require sudo access will prompt for the password rather than simply failing to work if a user without passwordless sudo uses them.

Scratch 2 SenseHAT extension

In the last Jessie release, we added the offline version of Scratch 2. While Scratch 2 itself hasn’t changed for this release, we have added a new extension to allow the SenseHAT to be used with Scratch 2. Look under ‘More Blocks’ and choose ‘Add an Extension’ to load the extension.

This works with either a physical SenseHAT or with the SenseHAT emulator. If a SenseHAT is connected, the extension will control that in preference to the emulator.

Raspbian Stretch Raspberry Pi

Fix for Broadpwn exploit

A couple of months ago, a vulnerability was discovered in the firmware of the BCM43xx wireless chipset which is used on Pi 3 and Pi Zero W; this potentially allows an attacker to take over the chip and execute code on it. The Stretch release includes a patch that addresses this vulnerability.

There is also the usual set of minor bug fixes and UI improvements – I’ll leave you to spot those!

How to get Raspbian Stretch

As this is a major version upgrade, we recommend using a clean image; these are available from the Downloads page on our site as usual.

Upgrading an existing Jessie image is possible, but is not guaranteed to work in every circumstance. If you wish to try upgrading a Jessie image to Stretch, we strongly recommend taking a backup first – we can accept no responsibility for loss of data from a failed update.

To upgrade, first modify the files /etc/apt/sources.list and /etc/apt/sources.list.d/raspi.list. In both files, change every occurrence of the word ‘jessie’ to ‘stretch’. (Both files will require sudo to edit.)

Then open a terminal window and execute

sudo apt-get update
sudo apt-get -y dist-upgrade

Answer ‘yes’ to any prompts. There may also be a point at which the install pauses while a page of information is shown on the screen – hold the ‘space’ key to scroll through all of this and then hit ‘q’ to continue.

Finally, if you are not using PulseAudio for anything other than Bluetooth audio, remove it from the image by entering

sudo apt-get -y purge pulseaudio*

The post Raspbian Stretch has arrived for Raspberry Pi appeared first on Raspberry Pi.

New – AWS SAM Local (Beta) – Build and Test Serverless Applications Locally

Post Syndicated from Randall Hunt original https://aws.amazon.com/blogs/aws/new-aws-sam-local-beta-build-and-test-serverless-applications-locally/

Today we’re releasing a beta of a new tool, SAM Local, that makes it easy to build and test your serverless applications locally. In this post we’ll use SAM local to build, debug, and deploy a quick application that allows us to vote on tabs or spaces by curling an endpoint. AWS introduced Serverless Application Model (SAM) last year to make it easier for developers to deploy serverless applications. If you’re not already familiar with SAM my colleague Orr wrote a great post on how to use SAM that you can read in about 5 minutes. At it’s core, SAM is a powerful open source specification built on AWS CloudFormation that makes it easy to keep your serverless infrastructure as code – and they have the cutest mascot.

SAM Local takes all the good parts of SAM and brings them to your local machine.

There are a couple of ways to install SAM Local but the easiest is through NPM. A quick npm install -g aws-sam-local should get us going but if you want the latest version you can always install straight from the source: go get github.com/awslabs/aws-sam-local (this will create a binary named aws-sam-local, not sam).

I like to vote on things so let’s write a quick SAM application to vote on Spaces versus Tabs. We’ll use a very simple, but powerful, architecture of API Gateway fronting a Lambda function and we’ll store our results in DynamoDB. In the end a user should be able to curl our API curl https://SOMEURL/ -d '{"vote": "spaces"}' and get back the number of votes.

Let’s start by writing a simple SAM template.yaml:

AWSTemplateFormatVersion : '2010-09-09'
Transform: AWS::Serverless-2016-10-31
    Type: "AWS::Serverless::SimpleTable"
    Type: "AWS::Serverless::Function"
      Runtime: python3.6
      Handler: lambda_function.lambda_handler
      Policies: AmazonDynamoDBFullAccess
          TABLE_NAME: !Ref VotesTable
          Type: Api
            Path: /
            Method: post

So we create a [dynamo_i] table that we expose to our Lambda function through an environment variable called TABLE_NAME.

To test that this template is valid I’ll go ahead and call sam validate to make sure I haven’t fat-fingered anything. It returns Valid! so let’s go ahead and get to work on our Lambda function.

import os
import os
import json
import boto3
votes_table = boto3.resource('dynamodb').Table(os.getenv('TABLE_NAME'))

def lambda_handler(event, context):
    if event['httpMethod'] == 'GET':
        resp = votes_table.scan()
        return {'body': json.dumps({item['id']: int(item['votes']) for item in resp['Items']})}
    elif event['httpMethod'] == 'POST':
            body = json.loads(event['body'])
            return {'statusCode': 400, 'body': 'malformed json input'}
        if 'vote' not in body:
            return {'statusCode': 400, 'body': 'missing vote in request body'}
        if body['vote'] not in ['spaces', 'tabs']:
            return {'statusCode': 400, 'body': 'vote value must be "spaces" or "tabs"'}

        resp = votes_table.update_item(
            Key={'id': body['vote']},
            UpdateExpression='ADD votes :incr',
            ExpressionAttributeValues={':incr': 1},
        return {'body': "{} now has {} votes".format(body['vote'], resp['Attributes']['votes'])}

So let’s test this locally. I’ll need to create a real DynamoDB database to talk to and I’ll need to provide the name of that database through the enviornment variable TABLE_NAME. I could do that with an env.json file or I can just pass it on the command line. First, I can call:
$ echo '{"httpMethod": "POST", "body": "{\"vote\": \"spaces\"}"}' |\
TABLE_NAME="vote-spaces-tabs" sam local invoke "VoteSpacesTabs"

to test the Lambda – it returns the number of votes for spaces so theoritically everything is working. Typing all of that out is a pain so I could generate a sample event with sam local generate-event api and pass that in to the local invocation. Far easier than all of that is just running our API locally. Let’s do that: sam local start-api. Now I can curl my local endpoints to test everything out.
I’ll run the command: $ curl -d '{"vote": "tabs"}' and it returns: “tabs now has 12 votes”. Now, of course I did not write this function perfectly on my first try. I edited and saved several times. One of the benefits of hot-reloading is that as I change the function I don’t have to do any additional work to test the new function. This makes iterative development vastly easier.

Let’s say we don’t want to deal with accessing a real DynamoDB database over the network though. What are our options? Well we can download DynamoDB Local and launch it with java -Djava.library.path=./DynamoDBLocal_lib -jar DynamoDBLocal.jar -sharedDb. Then we can have our Lambda function use the AWS_SAM_LOCAL environment variable to make some decisions about how to behave. Let’s modify our function a bit:

import os
import json
import boto3
if os.getenv("AWS_SAM_LOCAL"):
    votes_table = boto3.resource(
    votes_table = boto3.resource('dynamodb').Table(os.getenv('TABLE_NAME'))

Now we’re using a local endpoint to connect to our local database which makes working without wifi a little easier.

SAM local even supports interactive debugging! In Java and Node.js I can just pass the -d flag and a port to immediately enable the debugger. For Python I could use a library like import epdb; epdb.serve() and connect that way. Then we can call sam local invoke -d 8080 "VoteSpacesTabs" and our function will pause execution waiting for you to step through with the debugger.

Alright, I think we’ve got everything working so let’s deploy this!

First I’ll call the sam package command which is just an alias for aws cloudformation package and then I’ll use the result of that command to sam deploy.

$ sam package --template-file template.yaml --s3-bucket MYAWESOMEBUCKET --output-template-file package.yaml
Uploading to 144e47a4a08f8338faae894afe7563c3  90570 / 90570.0  (100.00%)
Successfully packaged artifacts and wrote output template to file package.yaml.
Execute the following command to deploy the packaged template
aws cloudformation deploy --template-file package.yaml --stack-name 
$ sam deploy --template-file package.yaml --stack-name VoteForSpaces --capabilities CAPABILITY_IAM
Waiting for changeset to be created..
Waiting for stack create/update to complete
Successfully created/updated stack - VoteForSpaces

Which brings us to our API:

I’m going to hop over into the production stage and add some rate limiting in case you guys start voting a lot – but otherwise we’ve taken our local work and deployed it to the cloud without much effort at all. I always enjoy it when things work on the first deploy!

You can vote now and watch the results live! http://spaces-or-tabs.s3-website-us-east-1.amazonaws.com/

We hope that SAM Local makes it easier for you to test, debug, and deploy your serverless apps. We have a CONTRIBUTING.md guide and we welcome pull requests. Please tweet at us to let us know what cool things you build. You can see our What’s New post here and the documentation is live here.


Automating Blue/Green Deployments of Infrastructure and Application Code using AMIs, AWS Developer Tools, & Amazon EC2 Systems Manager

Post Syndicated from Ramesh Adabala original https://aws.amazon.com/blogs/devops/bluegreen-infrastructure-application-deployment-blog/

Previous DevOps blog posts have covered the following use cases for infrastructure and application deployment automation:

An AMI provides the information required to launch an instance, which is a virtual server in the cloud. You can use one AMI to launch as many instances as you need. It is security best practice to customize and harden your base AMI with required operating system updates and, if you are using AWS native services for continuous security monitoring and operations, you are strongly encouraged to bake into the base AMI agents such as those for Amazon EC2 Systems Manager (SSM), Amazon Inspector, CodeDeploy, and CloudWatch Logs. A customized and hardened AMI is often referred to as a “golden AMI.” The use of golden AMIs to create EC2 instances in your AWS environment allows for fast and stable application deployment and scaling, secure application stack upgrades, and versioning.

In this post, using the DevOps automation capabilities of Systems Manager, AWS developer tools (CodePipeLine, CodeDeploy, CodeCommit, CodeBuild), I will show you how to use AWS CodePipeline to orchestrate the end-to-end blue/green deployments of a golden AMI and application code. Systems Manager Automation is a powerful security feature for enterprises that want to mature their DevSecOps practices.

Here are the high-level phases and primary services covered in this use case:


You can access the source code for the sample used in this post here: https://github.com/awslabs/automating-governance-sample/tree/master/Bluegreen-AMI-Application-Deployment-blog.

This sample will create a pipeline in AWS CodePipeline with the building blocks to support the blue/green deployments of infrastructure and application. The sample includes a custom Lambda step in the pipeline to execute Systems Manager Automation to build a golden AMI and update the Auto Scaling group with the golden AMI ID for every rollout of new application code. This guarantees that every new application deployment is on a fully patched and customized AMI in a continuous integration and deployment model. This enables the automation of hardened AMI deployment with every new version of application deployment.



We will build and run this sample in three parts.

Part 1: Setting up the AWS developer tools and deploying a base web application

Part 1 of the AWS CloudFormation template creates the initial Java-based web application environment in a VPC. It also creates all the required components of Systems Manager Automation, CodeCommit, CodeBuild, and CodeDeploy to support the blue/green deployments of the infrastructure and application resulting from ongoing code releases.

Part 1 of the AWS CloudFormation stack creates these resources:

After Part 1 of the AWS CloudFormation stack creation is complete, go to the Outputs tab and click the Elastic Load Balancing link. You will see the following home page for the base web application:

Make sure you have all the outputs from the Part 1 stack handy. You need to supply them as parameters in Part 3 of the stack.

Part 2: Setting up your CodeCommit repository

In this part, you will commit and push your sample application code into the CodeCommit repository created in Part 1. To access the initial git commands to clone the empty repository to your local machine, click Connect to go to the AWS CodeCommit console. Make sure you have the IAM permissions required to access AWS CodeCommit from command line interface (CLI).

After you’ve cloned the repository locally, download the sample application files from the part2 folder of the Git repository and place the files directly into your local repository. Do not include the aws-codedeploy-sample-tomcat folder. Go to the local directory and type the following commands to commit and push the files to the CodeCommit repository:

git add .
git commit -a -m "add all files from the AWS Java Tomcat CodeDeploy application"
git push

After all the files are pushed successfully, the repository should look like this:


Part 3: Setting up CodePipeline to enable blue/green deployments     

Part 3 of the AWS CloudFormation template creates the pipeline in AWS CodePipeline and all the required components.

a) Source: The pipeline is triggered by any change to the CodeCommit repository.

b) BuildGoldenAMI: This Lambda step executes the Systems Manager Automation document to build the golden AMI. After the golden AMI is successfully created, a new launch configuration with the new AMI details will be updated into the Auto Scaling group of the application deployment group. You can watch the progress of the automation in the EC2 console from the Systems Manager –> Automations menu.

c) Build: This step uses the application build spec file to build the application build artifact. Here are the CodeBuild execution steps and their status:

d) Deploy: This step clones the Auto Scaling group, launches the new instances with the new AMI, deploys the application changes, reroutes the traffic from the elastic load balancer to the new instances and terminates the old Auto Scaling group. You can see the execution steps and their status in the CodeDeploy console.

After the CodePipeline execution is complete, you can access the application by clicking the Elastic Load Balancing link. You can find it in the output of Part 1 of the AWS CloudFormation template. Any consecutive commits to the application code in the CodeCommit repository trigger the pipelines and deploy the infrastructure and code with an updated AMI and code.


If you have feedback about this post, add it to the Comments section below. If you have questions about implementing the example used in this post, open a thread on the Developer Tools forum.

About the author


Ramesh Adabala is a Solutions Architect in Southeast Enterprise Solution Architecture team at Amazon Web Services.

Deploy a Data Warehouse Quickly with Amazon Redshift, Amazon RDS for PostgreSQL and Tableau Server

Post Syndicated from Jorge A. Lopez original https://aws.amazon.com/blogs/big-data/deploy-a-data-warehouse-quickly-with-amazon-redshift-amazon-rds-for-postgresql-and-tableau-server/

One of the benefits of a data warehouse environment using both Amazon Redshift and Amazon RDS for PostgreSQL is that you can leverage the advantages of each service. Amazon Redshift is a high performance, petabyte-scale data warehouse service optimized for the online analytical processing (OLAP) queries typical of analytic reporting and business intelligence applications. On the other hand, a service like RDS excels at transactional OLTP workloads such as inserting, deleting, or updating rows.

In the recent JOIN Amazon Redshift AND Amazon RDS PostgreSQL WITH dblink post, we showed how you can deploy such an environment. Now, you can deploy a similar architecture using the Modern Data Warehouse on AWS Quick Start. The Quick Start is an automated deployment that uses AWS CloudFormation templates to launch, configure, and run the services required to deploy a data warehousing environment on AWS, based on Amazon Redshift and RDS for PostgreSQL.

The Quick Start also includes an instance of Tableau Server, running on Amazon EC2. This gives you the ability to host and serve analytic dashboards, workbooks and visualizations, supported by a trial license. You can play with the sample data source and dashboard, or create your own analyses by uploading your own data sets.

For more information about the Modern Data Warehouse on AWS Quick Start, download the full deployment guide. If you’re ready to get started, use one of the buttons below:

Option 1: Deploy Quick Start into a new VPC on AWS

Option 2: Deploy Quick Start into an existing VPC

If you have questions, please leave a comment below.

Next Steps

You can also join us for the webinar Unlock Insights and Reduce Costs by Modernizing Your Data Warehouse on AWS on Tuesday, August 22, 2017. Pearson, the education and publishing company, will present best practices and lessons learned during their journey to Amazon Redshift and Tableau.

Growing up alongside tech

Post Syndicated from Eevee original https://eev.ee/blog/2017/08/09/growing-up-alongside-tech/

IndustrialRobot asks… or, uh, asked last month:

industrialrobot: How has your views on tech changed as you’ve got older?

This is so open-ended that it’s actually stumped me for a solid month. I’ve had a surprisingly hard time figuring out where to even start.

It’s not that my views of tech have changed too much — it’s that they’ve changed very gradually. Teasing out and explaining any one particular change is tricky when it happened invisibly over the course of 10+ years.

I think a better framework for this is to consider how my relationship to tech has changed. It’s gone through three pretty distinct phases, each of which has strongly colored how I feel and talk about technology.

Act I

In which I start from nothing.

Nothing is an interesting starting point. You only really get to start there once.

Learning something on my own as a kid was something of a magical experience, in a way that I don’t think I could replicate as an adult. I liked computers; I liked toying with computers; so I did that.

I don’t know how universal this is, but when I was a kid, I couldn’t even conceive of how incredible things were made. Buildings? Cars? Paintings? Operating systems? Where does any of that come from? Obviously someone made them, but it’s not the sort of philosophical point I lingered on when I was 10, so in the back of my head they basically just appeared fully-formed from the æther.

That meant that when I started trying out programming, I had no aspirations. I couldn’t imagine how far I would go, because all the examples of how far I would go were completely disconnected from any idea of human achievement. I started out with BASIC on a toy computer; how could I possibly envision a connection between that and something like a mainstream video game? Every new thing felt like a new form of magic, so I couldn’t conceive that I was even in the same ballpark as whatever process produced real software. (Even seeing the source code for GORILLAS.BAS, it didn’t quite click. I didn’t think to try reading any of it until years after I’d first encountered the game.)

This isn’t to say I didn’t have goals. I invented goals constantly, as I’ve always done; as soon as I learned about a new thing, I’d imagine some ways to use it, then try to build them. I produced a lot of little weird goofy toys, some of which entertained my tiny friend group for a couple days, some of which never saw the light of day. But none of it felt like steps along the way to some mountain peak of mastery, because I didn’t realize the mountain peak was even a place that could be gone to. It was pure, unadulterated (!) playing.

I contrast this to my art career, which started only a couple years ago. I was already in my late 20s, so I’d already spend decades seeing a very broad spectrum of art: everything from quick sketches up to painted masterpieces. And I’d seen the people who create that art, sometimes seen them create it in real-time. I’m even in a relationship with one of them! And of course I’d already had the experience of advancing through tech stuff and discovering first-hand that even the most amazing software is still just code someone wrote.

So from the very beginning, from the moment I touched pencil to paper, I knew the possibilities. I knew that the goddamn Sistine Chapel was something I could learn to do, if I were willing to put enough time in — and I knew that I’m not, so I’d have to settle somewhere a ways before that. I knew that I’d have to put an awful lot of work in before I’d be producing anything very impressive.

I did it anyway (though perhaps waited longer than necessary to start), but those aren’t things I can un-know, and so I can never truly explore art from a place of pure ignorance. On the other hand, I’ve probably learned to draw much more quickly and efficiently than if I’d done it as a kid, precisely because I know those things. Now I can decide I want to do something far beyond my current abilities, then go figure out how to do it. When I was just playing, that kind of ambition was impossible.

So, I played.

How did this affect my views on tech? Well, I didn’t… have any. Learning by playing tends to teach you things in an outward sprawl without many abrupt jumps to new areas, so you don’t tend to run up against conflicting information. The whole point of opinions is that they’re your own resolution to a conflict; without conflict, I can’t meaningfully say I had any opinions. I just accepted whatever I encountered at face value, because I didn’t even know enough to suspect there could be alternatives yet.

Act II

That started to seriously change around, I suppose, the end of high school and beginning of college. I was becoming aware of this whole “open source” concept. I took classes that used languages I wouldn’t otherwise have given a second thought. (One of them was Python!) I started to contribute to other people’s projects. Eventually I even got a job, where I had to work with other people. It probably also helped that I’d had to maintain my own old code a few times.

Now I was faced with conflicting subjective ideas, and I had to form opinions about them! And so I did. With gusto. Over time, I developed an idea of what was Right based on experience I’d accrued. And then I set out to always do things Right.

That’s served me decently well with some individual problems, but it also led me to inflict a lot of unnecessary pain on myself. Several endeavors languished for no other reason than my dissatisfaction with the architecture, long before the basic functionality was done. I started a number of “pure” projects around this time, generic tools like imaging libraries that I had no direct need for. I built them for the sake of them, I guess because I felt like I was improving some niche… but of course I never finished any. It was always in areas I didn’t know that well in the first place, which is a fine way to learn if you have a specific concrete goal in mind — but it turns out that building a generic library for editing images means you have to know everything about images. Perhaps that ambition went a little haywire.

I’ve said before that this sort of (self-inflicted!) work was unfulfilling, in part because the best outcome would be that a few distant programmers’ lives are slightly easier. I do still think that, but I think there’s a deeper point here too.

In forgetting how to play, I’d stopped putting any of myself in most of the work I was doing. Yes, building an imaging library is kind of a slog that someone has to do, but… I assume the people who work on software like PIL and ImageMagick are actually interested in it. The few domains I tried to enter and revolutionize weren’t passions of mine; I just happened to walk through the neighborhood one day and decided I could obviously do it better.

Not coincidentally, this was the same era of my life that led me to write stuff like that PHP post, which you may notice I am conspicuously not even linking to. I don’t think I would write anything like it nowadays. I could see myself approaching the same subject, but purely from the point of view of language design, with more contrasts and tradeoffs and less going for volume. I certainly wouldn’t lead off with inflammatory puffery like “PHP is a community of amateurs”.


I think I’ve mellowed out a good bit in the last few years.

It turns out that being Right is much less important than being Not Wrong — i.e., rather than trying to make something perfect that can be adapted to any future case, just avoid as many pitfalls as possible. Code that does something useful has much more practical value than unfinished code with some pristine architecture.

Nowhere is this more apparent than in game development, where all code is doomed to be crap and the best you can hope for is to stem the tide. But there’s also a fixed goal that’s completely unrelated to how the code looks: does the game work, and is it fun to play? Yes? Ship the damn thing and forget about it.

Games are also nice because it’s very easy to pour my own feelings into them and evoke feelings in the people who play them. They’re mine, something with my fingerprints on them — even the games I’ve built with glip have plenty of my own hallmarks, little touches I added on a whim or attention to specific details that I care about.

Maybe a better example is the Doom map parser I started writing. It sounds like a “pure” problem again, except that I actually know an awful lot about the subject already! I also cleverly (accidentally) released some useful results of the work I’ve done thusfar — like statistics about Doom II maps and a few screenshots of flipped stock maps — even though I don’t think the parser itself is far enough along to release yet. The tool has served a purpose, one with my fingerprints on it, even without being released publicly. That keeps it fresh in my mind as something interesting I’d like to keep working on, eventually. (When I run into an architecture question, I step back for a while, or I do other work in the hopes that the solution will reveal itself.)

I also made two simple Pokémon ROM hacks this year, despite knowing nothing about Game Boy internals or assembly when I started. I just decided I wanted to do an open-ended thing beyond my reach, and I went to do it, not worrying about cleanliness and willing to accept a bumpy ride to get there. I played, but in a more experienced way, invoking the stuff I know (and the people I’ve met!) to help me get a running start in completely unfamiliar territory.

This feels like a really fine distinction that I’m not sure I’m doing justice. I don’t know if I could’ve appreciated it three or four years ago. But I missed making toys, and I’m glad I’m doing it again.

In short, I forgot how to have fun with programming for a little while, and I’ve finally started to figure it out again. And that’s far more important than whether you use PHP or not.

Firefox 55 released

Post Syndicated from ris original https://lwn.net/Articles/730198/rss

Firefox 55.0 has been released. From the release
: “Today’s release brings innovative functionality, improvements to core browser performance, and more proof that we’re committed to making Firefox better than ever. New features include support for WebVR, making Firefox the first Windows desktop browser to support VR experiences. Performance changes include significantly faster startup times when restoring lots of tabs and settings that let users take greater control of our new multi-process architecture. We’ve also upgraded the address bar to make finding what you want easier, with search suggestions and the integration of our one-click search feature, and safer, by prioritizing the secure – https – version of sites when possible.

Turbocharge your Apache Hive queries on Amazon EMR using LLAP

Post Syndicated from Jigar Mistry original https://aws.amazon.com/blogs/big-data/turbocharge-your-apache-hive-queries-on-amazon-emr-using-llap/

Apache Hive is one of the most popular tools for analyzing large datasets stored in a Hadoop cluster using SQL. Data analysts and scientists use Hive to query, summarize, explore, and analyze big data.

With the introduction of Hive LLAP (Low Latency Analytical Processing), the notion of Hive being just a batch processing tool has changed. LLAP uses long-lived daemons with intelligent in-memory caching to circumvent batch-oriented latency and provide sub-second query response times.

This post provides an overview of Hive LLAP, including its architecture and common use cases for boosting query performance. You will learn how to install and configure Hive LLAP on an Amazon EMR cluster and run queries on LLAP daemons.

What is Hive LLAP?

Hive LLAP was introduced in Apache Hive 2.0, which provides very fast processing of queries. It uses persistent daemons that are deployed on a Hadoop YARN cluster using Apache Slider. These daemons are long-running and provide functionality such as I/O with DataNode, in-memory caching, query processing, and fine-grained access control. And since the daemons are always running in the cluster, it saves substantial overhead of launching new YARN containers for every new Hive session, thereby avoiding long startup times.

When Hive is configured in hybrid execution mode, small and short queries execute directly on LLAP daemons. Heavy lifting (like large shuffles in the reduce stage) is performed in YARN containers that belong to the application. Resources (CPU, memory, etc.) are obtained in a traditional fashion using YARN. After the resources are obtained, the execution engine can decide which resources are to be allocated to LLAP, or it can launch Apache Tez processors in separate YARN containers. You can also configure Hive to run all the processing workloads on LLAP daemons for querying small datasets at lightning fast speeds.

LLAP daemons are launched under YARN management to ensure that the nodes don’t get overloaded with the compute resources of these daemons. You can use scheduling queues to make sure that there is enough compute capacity for other YARN applications to run.

Why use Hive LLAP?

With many options available in the market (Presto, Spark SQL, etc.) for doing interactive SQL  over data that is stored in Amazon S3 and HDFS, there are several reasons why using Hive and LLAP might be a good choice:

  • For those who are heavily invested in the Hive ecosystem and have external BI tools that connect to Hive over JDBC/ODBC connections, LLAP plugs in to their existing architecture without a steep learning curve.
  • It’s compatible with existing Hive SQL and other Hive tools, like HiveServer2, and JDBC drivers for Hive.
  • It has native support for security features with authentication and authorization (SQL standards-based authorization) using HiveServer2.
  • LLAP daemons are aware about of the columns and records that are being processed which enables you to enforce fine-grained access control.
  • It can use Hive’s vectorization capabilities to speed up queries, and Hive has better support for Parquet file format when vectorization is enabled.
  • It can take advantage of a number of Hive optimizations like merging multiple small files for query results, automatically determining the number of reducers for joins and groupbys, etc.
  • It’s optional and modular so it can be turned on or off depending on the compute and resource requirements of the cluster. This lets you to run other YARN applications concurrently without reserving a cluster specifically for LLAP.

How do you install Hive LLAP in Amazon EMR?

To install and configure LLAP on an EMR cluster, use the following bootstrap action (BA):


This BA downloads and installs Apache Slider on the cluster and configures LLAP so that it works with EMR Hive. For LLAP to work, the EMR cluster must have Hive, Tez, and Apache Zookeeper installed.

You can pass the following arguments to the BA.

Argument Definition Default value
--instances Number of instances of LLAP daemon Number of core/task nodes of the cluster
--cache Cache size per instance 20% of physical memory of the node
--executors Number of executors per instance Number of CPU cores of the node
--iothreads Number of IO threads per instance Number of CPU cores of the node
--size Container size per instance 50% of physical memory of the node
--xmx Working memory size 50% of container size
--log-level Log levels for the LLAP instance INFO

LLAP example

This section describes how you can try the faster Hive queries with LLAP using the TPC-DS testbench for Hive on Amazon EMR.

Use the following AWS command line interface (AWS CLI) command to launch a 1+3 nodes m4.xlarge EMR 5.6.0 cluster with the bootstrap action to install LLAP:

aws emr create-cluster --release-label emr-5.6.0 \
--applications Name=Hadoop Name=Hive Name=Hue Name=ZooKeeper Name=Tez \
--bootstrap-actions '[{"Path":"s3://aws-bigdata-blog/artifacts/Turbocharge_Apache_Hive_on_EMR/configure-Hive-LLAP.sh","Name":"Custom action"}]' \ 
--ec2-attributes '{"KeyName":"<YOUR-KEY-PAIR>","InstanceProfile":"EMR_EC2_DefaultRole","SubnetId":"subnet-xxxxxxxx","EmrManagedSlaveSecurityGroup":"sg-xxxxxxxx","EmrManagedMasterSecurityGroup":"sg-xxxxxxxx"}' 
--service-role EMR_DefaultRole \
--enable-debugging \
--log-uri 's3n://<YOUR-BUCKET/' --name 'test-hive-llap' \
--instance-groups '[{"InstanceCount":1,"EbsConfiguration":{"EbsBlockDeviceConfigs":[{"VolumeSpecification":{"SizeInGB":32,"VolumeType":"gp2"},"VolumesPerInstance":1}],"EbsOptimized":true},"InstanceGroupType":"MASTER","InstanceType":"m4.xlarge","Name":"Master - 1"},{"InstanceCount":3,"EbsConfiguration":{"EbsBlockDeviceConfigs":[{"VolumeSpecification":{"SizeInGB":32,"VolumeType":"gp2"},"VolumesPerInstance":1}],"EbsOptimized":true},"InstanceGroupType":"CORE","InstanceType":"m4.xlarge","Name":"Core - 2"}]' 
--region us-east-1

After the cluster is launched, log in to the master node using SSH, and do the following:

  1. Open the hive-tpcds folder:
    cd /home/hadoop/hive-tpcds/
  2. Start Hive CLI using the testbench configuration, create the required tables, and run the sample query:

    hive –i testbench.settings
    hive> source create_tables.sql;
    hive> source query55.sql;

    This sample query runs on a 40 GB dataset that is stored on Amazon S3. The dataset is generated using the data generation tool in the TPC-DS testbench for Hive.It results in output like the following:
  3. This screenshot shows that the query finished in about 47 seconds for LLAP mode. Now, to compare this to the execution time without LLAP, you can run the same workload using only Tez containers:
    hive> set hive.llap.execution.mode=none;
    hive> source query55.sql;

    This query finished in about 80 seconds.

The difference in query execution time is almost 1.7 times when using just YARN containers in contrast to running the query on LLAP daemons. And with every rerun of the query, you notice that the execution time substantially decreases by the virtue of in-memory caching by LLAP daemons.


In this post, I introduced Hive LLAP as a way to boost Hive query performance. I discussed its architecture and described several use cases for the component. I showed how you can install and configure Hive LLAP on an Amazon EMR cluster and how you can run queries on LLAP daemons.

If you have questions about using Hive LLAP on Amazon EMR or would like to share your use cases, please leave a comment below.

Additional Reading

Learn how to to automatically partition Hive external tables with AWS.

About the Author

Jigar Mistry is a Hadoop Systems Engineer with Amazon Web Services. He works with customers to provide them architectural guidance and technical support for processing large datasets in the cloud using open-source applications. In his spare time, he enjoys going for camping and exploring different restaurants in the Seattle area.





Deploying an NGINX Reverse Proxy Sidecar Container on Amazon ECS

Post Syndicated from Nathan Peck original https://aws.amazon.com/blogs/compute/nginx-reverse-proxy-sidecar-container-on-amazon-ecs/

Reverse proxies are a powerful software architecture primitive for fetching resources from a server on behalf of a client. They serve a number of purposes, from protecting servers from unwanted traffic to offloading some of the heavy lifting of HTTP traffic processing.

This post explains the benefits of a reverse proxy, and explains how to use NGINX and Amazon EC2 Container Service (Amazon ECS) to easily implement and deploy a reverse proxy for your containerized application.


NGINX is a high performance HTTP server that has achieved significant adoption because of its asynchronous event driven architecture. It can serve thousands of concurrent requests with a low memory footprint. This efficiency also makes it ideal as a reverse proxy.

Amazon ECS is a highly scalable, high performance container management service that supports Docker containers. It allows you to run applications easily on a managed cluster of Amazon EC2 instances. Amazon ECS helps you get your application components running on instances according to a specified configuration. It also helps scale out these components across an entire fleet of instances.

Sidecar containers are a common software pattern that has been embraced by engineering organizations. It’s a way to keep server side architecture easier to understand by building with smaller, modular containers that each serve a simple purpose. Just like an application can be powered by multiple microservices, each microservice can also be powered by multiple containers that work together. A sidecar container is simply a way to move part of the core responsibility of a service out into a containerized module that is deployed alongside a core application container.

The following diagram shows how an NGINX reverse proxy sidecar container operates alongside an application server container:

In this architecture, Amazon ECS has deployed two copies of an application stack that is made up of an NGINX reverse proxy side container and an application container. Web traffic from the public goes to an Application Load Balancer, which then distributes the traffic to one of the NGINX reverse proxy sidecars. The NGINX reverse proxy then forwards the request to the application server and returns its response to the client via the load balancer.

Reverse proxy for security

Security is one reason for using a reverse proxy in front of an application container. Any web server that serves resources to the public can expect to receive lots of unwanted traffic every day. Some of this traffic is relatively benign scans by researchers and tools, such as Shodan or nmap:

[18/May/2017:15:10:10 +0000] "GET /YesThisIsAReallyLongRequestURLbutWeAreDoingItOnPurposeWeAreScanningForResearchPurposePleaseHaveALookAtTheUserAgentTHXYesThisIsAReallyLongRequestURLbutWeAreDoingItOnPurposeWeAreScanningForResearchPurposePleaseHaveALookAtTheUserAgentTHXYesThisIsAReallyLongRequestURLbutWeAreDoingItOnPurposeWeAreScanningForResearchPurposePleaseHaveALookAtTheUserAgentTHXYesThisIsAReallyLongRequestURLbutWeAreDoingItOnPurposeWeAreScanningForResearchPurposePleaseHaveALookAtTheUserAgentTHXYesThisIsAReallyLongRequestURLbutWeAreDoingItOnPurposeWeAreScanningForResearchPurposePleaseHaveALookAtTheUserAgentTHXYesThisIsAReallyLongRequestURLbutWeAreDoingItOnPurposeWeAreScanningForResearchPurposePleaseHaveALookAtTheUserAgentTHXYesThisIsAReallyLongRequestURLbutWeAreDoingItOnPurposeWeAreScanningForResearchPurposePleaseHaveALookAtTheUserAgentTHXYesThisIsAReallyLongRequestURLbutWeAreDoingItOnPurposeWeAreScanningForResearchPurposePleaseHaveALookAtTheUserAgentTHXYesThisIsAReallyLongRequestURLbutWeAreDoingItOnPurposeWeAreScann HTTP/1.1" 404 1389 - Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.86 Safari/537.36
[18/May/2017:18:19:51 +0000] "GET /clientaccesspolicy.xml HTTP/1.1" 404 322 - Cloud mapping experiment. Contact [email protected]

But other traffic is much more malicious. For example, here is what a web server sees while being scanned by the hacking tool ZmEu, which scans web servers trying to find PHPMyAdmin installations to exploit:

[18/May/2017:16:27:39 +0000] "GET /mysqladmin/scripts/setup.php HTTP/1.1" 404 391 - ZmEu
[18/May/2017:16:27:39 +0000] "GET /web/phpMyAdmin/scripts/setup.php HTTP/1.1" 404 394 - ZmEu
[18/May/2017:16:27:39 +0000] "GET /xampp/phpmyadmin/scripts/setup.php HTTP/1.1" 404 396 - ZmEu
[18/May/2017:16:27:40 +0000] "GET /apache-default/phpmyadmin/scripts/setup.php HTTP/1.1" 404 405 - ZmEu
[18/May/2017:16:27:40 +0000] "GET /phpMyAdmin- HTTP/1.1" 404 397 - ZmEu
[18/May/2017:16:27:40 +0000] "GET /mysql/scripts/setup.php HTTP/1.1" 404 386 - ZmEu
[18/May/2017:16:27:41 +0000] "GET /admin/scripts/setup.php HTTP/1.1" 404 386 - ZmEu
[18/May/2017:16:27:41 +0000] "GET /forum/phpmyadmin/scripts/setup.php HTTP/1.1" 404 396 - ZmEu
[18/May/2017:16:27:41 +0000] "GET /typo3/phpmyadmin/scripts/setup.php HTTP/1.1" 404 396 - ZmEu
[18/May/2017:16:27:42 +0000] "GET /phpMyAdmin- HTTP/1.1" 404 399 - ZmEu
[18/May/2017:16:27:44 +0000] "GET /administrator/components/com_joommyadmin/phpmyadmin/scripts/setup.php HTTP/1.1" 404 418 - ZmEu
[18/May/2017:18:34:45 +0000] "GET /phpmyadmin/scripts/setup.php HTTP/1.1" 404 390 - ZmEu
[18/May/2017:16:27:45 +0000] "GET /w00tw00t.at.blackhats.romanian.anti-sec:) HTTP/1.1" 404 401 - ZmEu

In addition, servers can also end up receiving unwanted web traffic that is intended for another server. In a cloud environment, an application may end up reusing an IP address that was formerly connected to another service. It’s common for misconfigured or misbehaving DNS servers to send traffic intended for a different host to an IP address now connected to your server.

It’s the responsibility of anyone running a web server to handle and reject potentially malicious traffic or unwanted traffic. Ideally, the web server can reject this traffic as early as possible, before it actually reaches the core application code. A reverse proxy is one way to provide this layer of protection for an application server. It can be configured to reject these requests before they reach the application server.

Reverse proxy for performance

Another advantage of using a reverse proxy such as NGINX is that it can be configured to offload some heavy lifting from your application container. For example, every HTTP server should support gzip. Whenever a client requests gzip encoding, the server compresses the response before sending it back to the client. This compression saves network bandwidth, which also improves speed for clients who now don’t have to wait as long for a response to fully download.

NGINX can be configured to accept a plaintext response from your application container and gzip encode it before sending it down to the client. This allows your application container to focus 100% of its CPU allotment on running business logic, while NGINX handles the encoding with its efficient gzip implementation.

An application may have security concerns that require SSL termination at the instance level instead of at the load balancer. NGINX can also be configured to terminate SSL before proxying the request to a local application container. Again, this also removes some CPU load from the application container, allowing it to focus on running business logic. It also gives you a cleaner way to patch any SSL vulnerabilities or update SSL certificates by updating the NGINX container without needing to change the application container.

NGINX configuration

Configuring NGINX for both traffic filtering and gzip encoding is shown below:

http {
  # NGINX will handle gzip compression of responses from the app server
  gzip on;
  gzip_proxied any;
  gzip_types text/plain application/json;
  gzip_min_length 1000;
  server {
    listen 80;
    # NGINX will reject anything not matching /api
    location /api {
      # Reject requests with unsupported HTTP method
      if ($request_method !~ ^(GET|POST|HEAD|OPTIONS|PUT|DELETE)$) {
        return 405;
      # Only requests matching the whitelist expectations will
      # get sent to the application server
      proxy_pass http://app:3000;
      proxy_http_version 1.1;
      proxy_set_header Upgrade $http_upgrade;
      proxy_set_header Connection 'upgrade';
      proxy_set_header Host $host;
      proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
      proxy_cache_bypass $http_upgrade;

The above configuration only accepts traffic that matches the expression /api and has a recognized HTTP method. If the traffic matches, it is forwarded to a local application container accessible at the local hostname app. If the client requested gzip encoding, the plaintext response from that application container is gzip-encoded.

Amazon ECS configuration

Configuring ECS to run this NGINX container as a sidecar is also simple. ECS uses a core primitive called the task definition. Each task definition can include one or more containers, which can be linked to each other:

  "containerDefinitions": [
       "name": "nginx",
       "image": "<NGINX reverse proxy image URL here>",
       "memory": "256",
       "cpu": "256",
       "essential": true,
       "portMappings": [
           "containerPort": "80",
           "protocol": "tcp"
       "links": [
       "name": "app",
       "image": "<app image URL here>",
       "memory": "256",
       "cpu": "256",
       "essential": true
   "networkMode": "bridge",
   "family": "application-stack"

This task definition causes ECS to start both an NGINX container and an application container on the same instance. Then, the NGINX container is linked to the application container. This allows the NGINX container to send traffic to the application container using the hostname app.

The NGINX container has a port mapping that exposes port 80 on a publically accessible port but the application container does not. This means that the application container is not directly addressable. The only way to send it traffic is to send traffic to the NGINX container, which filters that traffic down. It only forwards to the application container if the traffic passes the whitelisted rules.


Running a sidecar container such as NGINX can bring significant benefits by making it easier to provide protection for application containers. Sidecar containers also improve performance by freeing your application container from various CPU intensive tasks. Amazon ECS makes it easy to run sidecar containers, and automate their deployment across your cluster.

To see the full code for this NGINX sidecar reference, or to try it out yourself, you can check out the open source NGINX reverse proxy reference architecture on GitHub.

– Nathan

Amazon QuickSight Now Supports Amazon Athena in EU (Ireland), Count Distinct, and Week Aggregation

Post Syndicated from Luis Wang original https://aws.amazon.com/blogs/big-data/amazon-quicksight-now-supports-amazon-athena-in-eu-ireland-count-distinct-and-week-aggregation/

Today, I’m excited to share a couple of new features in Amazon QuickSight. First, with this release, we expanded connectivity options by adding Amazon Athena support in the EU (Ireland) Region. Additionally, you can now use Count Distinct on your dimensions and metrics in the visualizations and aggregate date fields by week for SPICE data sets.

Athena in Ireland

Athena is one of the most popular data sources used by QuickSight customers. It allows you to deploy a serverless BI and analytics architecture for your operational and business data. With this release, the Athena connector is now available in the EU (Ireland) Region. You can connect QuickSight to your Athena databases and tables in the region and start visualizing your data in a matter of seconds.

Count Distinct

You can now perform aggregations using Count Distinct in the visualizations, one of the top requests from users. To use Count Distinct, simply select Count Distinct as the aggregation on the visual axis or in the field well. Count Distinct is supported for both direct queries and SPICE data sets. You can apply it to strings and measures. It is available for all supported visualization types.

Date aggregation by week

Time series line charts are one of the most common ways for customers to report on business trends. In addition to Year, Month, Day and Hour, you can now aggregate date fields by WEEK and visualize your data at a weekly granularity.

Learn more

To learn more about these capabilities and start using them in your dashboards, see the QuickSight User Guide.

Stay engaged

If you have questions or suggestions, you can post them on the QuickSight Discussion Forum.

Not a QuickSight user?

To get started for FREE, see quicksight.aws.


Qubes OS 4.0-rc1 released

Post Syndicated from corbet original https://lwn.net/Articles/729364/rss

For those who are curious about what the next release of the Qubes OS
distribution will bring (and want to help make it better): the first
Qubes OS 4.0 release candidate
is available.
This new Core Stack allows to easily extend the Qubes Architecture
in new directions, allowing us to finally build (in a clean way) lots of
things we’ve wanted for years, but which would have been too complex to
build on the ‘old’ Qubes infrastructure. The new Qubes Admin API, which we
introduced in a recent post, is a prime example of one such

Amazon Redshift Spectrum Extends Data Warehousing Out to Exabytes—No Loading Required

Post Syndicated from Maor Kleider original https://aws.amazon.com/blogs/big-data/amazon-redshift-spectrum-extends-data-warehousing-out-to-exabytes-no-loading-required/

When we first looked into the possibility of building a cloud-based data warehouse many years ago, we were struck by the fact that our customers were storing ever-increasing amounts of data, and yet only a small fraction of that data ever made it into a data warehouse or Hadoop system for analysis. We saw that this wasn’t just a cloud-specific anomaly. It was also true in the broader industry, where the growth rate of the enterprise storage market segment greatly surpassed that of the data warehousing market segment.

We dubbed this the “dark data” problem. Our customers knew that there was untapped value in the data they collected; why else would they spend money to store it? But the systems available to them to analyze this data were simply too slow, complex, and expensive for them to use on all but a select subset of this data. They were storing it with optimistic hope that, someday, someone would find a solution.

Amazon Redshift became one of the fastest-growing AWS services because it helped solve the dark data problem. It was at least an order of magnitude less expensive and faster than most alternatives available. And Amazon Redshift was fully managed from the start—you didn’t have to worry about capacity, provisioning, patching, monitoring, backups, and a host of other DBA headaches. Many customers, including Vevo, Yelp, Redfin, and Edmunds, migrated to Amazon Redshift to improve query performance, reduce DBA overhead, and lower the cost of analytics.

And our customers’ data continues to grow at a very fast rate. Across the board, gigabytes to petabytes, the average Amazon Redshift customer doubles the data analyzed every year. That’s why we implement features that help customers handle their growing data, for example to double the query throughput or improve the compression ratios from 3x to 4x. That gives our customers some time before they have to consider throwing away data or removing it from their analytic environments. However, there is an increasing number of AWS customers who each generate a petabyte of data every day—that’s an exabyte in only three years. There wasn’t a solution for customers like that. If your data is doubling every year, it’s not long before you have to find new, disruptive approaches that transform the cost, performance, and simplicity curves for managing data.

Let’s look at the options available today. You can use Hadoop-based technologies like Apache Hive with Amazon EMR. This is actually a pretty great solution because it makes it easy and cost-effective to operate directly on data in Amazon S3 without ingestion or transformation. You can spin up clusters as you wish when you need, and size them right for that specific job you’re running. These systems are great at high scale-out processing like scans, filters, and aggregates. On the other hand, they’re not that good at complex query processing. For example, join processing requires data to be shuffled across nodes—for a large amount of data and large numbers of nodes that gets very slow. And joins are intrinsic to any meaningful analytics problem.

You can also use a columnar MPP data warehouse like Amazon Redshift. These systems make it simple to run complex analytic queries with orders of magnitude faster performance for joins and aggregations performed over large datasets. Amazon Redshift, in particular, leverages high-performance local disks, sophisticated query execution. and join-optimized data formats. Because it is just standard SQL, you can keep using your existing ETL and BI tools. But you do have to load data, and you have to provision clusters against the storage and CPU requirements you need.

Both solutions have powerful attributes, but they force you to choose which attributes you want. We see this as a “tyranny of OR.” You can have the throughput of local disks OR the scale of Amazon S3. You can have sophisticated query optimization OR high-scale data processing. You can have fast join performance with optimized formats OR a range of data processing engines that work against common data formats. But you shouldn’t have to choose. At this scale, you really can’t afford to choose. You need “all of the above.”

Redshift Spectrum

We built Redshift Spectrum to end this “tyranny of OR.” With Redshift Spectrum, Amazon Redshift customers can easily query their data in Amazon S3. Like Amazon EMR, you get the benefits of open data formats and inexpensive storage, and you can scale out to thousands of nodes to pull data, filter, project, aggregate, group, and sort. Like Amazon Athena, Redshift Spectrum is serverless and there’s nothing to provision or manage. You just pay for the resources you consume for the duration of your Redshift Spectrum query. Like Amazon Redshift itself, you get the benefits of a sophisticated query optimizer, fast access to data on local disks, and standard SQL. And like nothing else, Redshift Spectrum can execute highly sophisticated queries against an exabyte of data or more—in just minutes.

Redshift Spectrum is a built-in feature of Amazon Redshift, and your existing queries and BI tools will continue to work seamlessly. Under the covers, we manage a fleet of thousands of Redshift Spectrum nodes spread across multiple Availability Zones. These are transparently scaled and allocated to your queries based on the data that you need to process, with no provisioning or commitments. Redshift Spectrum is also highly concurrent—you can access your Amazon S3 data from any number of Amazon Redshift clusters.

The life of a Redshift Spectrum query

It all starts when Redshift Spectrum queries are submitted to the leader node of your Amazon Redshift cluster. The leader node optimizes, compiles, and pushes the query execution to the compute nodes in your Amazon Redshift cluster. Next, the compute nodes obtain the information describing the external tables from your data catalog, dynamically pruning nonrelevant partitions based on the filters and joins in your queries. The compute nodes also examine the data available locally and push down predicates to efficiently scan only the relevant objects in Amazon S3.

The Amazon Redshift compute nodes then generate multiple requests depending on the number of objects that need to be processed, and submit them concurrently to Redshift Spectrum, which pools thousands of Amazon EC2 instances per AWS Region. The Redshift Spectrum worker nodes scan, filter, and aggregate your data from Amazon S3, streaming required data for processing back to your Amazon Redshift cluster. Then, the final join and merge operations are performed locally in your cluster and the results are returned to your client.

Redshift Spectrum’s architecture offers several advantages. First, it elastically scales compute resources separately from the storage layer in Amazon S3. Second, it offers significantly higher concurrency because you can run multiple Amazon Redshift clusters and query the same data in Amazon S3. Third, Redshift Spectrum leverages the Amazon Redshift query optimizer to generate efficient query plans, even for complex queries with multi-table joins and window functions. Fourth, it operates directly on your source data in its native format (Parquet, RCFile, CSV, TSV, Sequence, Avro, RegexSerDe and more to come soon). This means that no data loading or transformation is needed. This also eliminates data duplication and associated costs. Fifth, operating on open data formats gives you the flexibility to leverage other AWS services and execution engines across your various teams to collaborate on the same data in Amazon S3. You get all of this, and because Redshift Spectrum is a feature of Amazon Redshift, you get the same level of end-to-end security, compliance, and certifications as with Amazon Redshift.

Designed for performance and cost-effectiveness

With Amazon Redshift Spectrum, you pay only for the queries you run against the data that you actually scan. We encourage you to leverage file partitioning, columnar data formats, and data compression to significantly minimize the amount of data scanned in Amazon S3. This is important for data warehousing because it dramatically improves query performance and reduces cost. Partitioning your data in Amazon S3 by date, time, or any other custom keys enables Redshift Spectrum to dynamically prune nonrelevant partitions to minimize the amount of data processed. If you store data in a columnar format, such as Parquet, Redshift Spectrum scans only the columns needed by your query, rather than processing entire rows. Similarly, if you compress your data using one of Redshift Spectrum’s supported compression algorithms, less data is scanned.

Amazon Redshift and Redshift Spectrum give you the best of both worlds. If you need to run frequent queries on the same data, you can normalize it, store it in Amazon Redshift, and get all of the benefits of a fully featured data warehouse for storing and querying structured data at a flat rate. At the same time, you can keep your additional data in multiple open file formats in Amazon S3, whether it is historical data or the most recent data, and extend your Amazon Redshift queries across your Amazon S3 data lake.

And that is how Amazon Redshift Spectrum scales data warehousing to exabytes—with no loading required. Redshift Spectrum ends the “tyranny of OR,” enabling you to store your data where you want, in the format you want, and have it available for fast processing using standard SQL when you need it, now and in the future.

Additional Reading

10 Best Practices for Amazon Redshift Spectrum
Amazon QuickSight Adds Support for Amazon Redshift Spectrum
Amazon Redshift Spectrum – Exabyte-Scale In-Place Queries of S3 Data



About the Author

Maor Kleider is a Senior Product Manager for Amazon Redshift, a fast, simple and cost-effective data warehouse. Maor is passionate about collaborating with customers and partners, learning about their unique big data use cases and making their experience even better. In his spare time, Maor enjoys traveling and exploring new restaurants with his family.




Will Magento’s Progressive Web Apps drive more mobile revenue for merchants?

Post Syndicated from chris desantis original https://www.anchor.com.au/blog/2017/07/magentos-progressive-web-apps/

Magento Live always has a few big announcements and this year’s London edition did not disappoint. With the promise of more traffic and higher conversions, Magento, in collaboration with Google, will offer native Progressive Web Apps (PWA’s) to the Magento ecommerce community in 2018.

Even though that’s at LEAST five months away, you might already be pondering ‘What does that mean for my ecommerce site?’. Has responsive web design gone out of fashion like man buns and paleo diets? And how do I explain Progressive Web Apps to the boss?

What are Progressive Web Apps?

Progressive Web Apps are technically standard web pages that can ‘live’ on a user’s home screen and behave like a native app, without the need for downloading and installing an app. PWAs promise a faster, more reliable, and more engaging user experience.

Google Developers state that:

“Progressive Web Apps are experiences that combine the best of the web and the best of apps.”

However, for developers, creating Progressive Web Apps requires the use of Service Workers, a Manifest and Application shell architecture, and more, to ensure the promised user experience is delivered. It’s an evolution of the way developers are currently working but is yet another thing for developers to master.

Google Developers has a huge run down on all things PWAcheck it out.

But, back to the Magento announcement. It’s actually a big deal.

Magento is one of the most widely adopted ecommerce platforms in use today. By partnering with Google to bring PWAs into the Magento platform, PWAs are poised to become the new norm for forward-thinking ecommerce brands.

And what’s not to love? Productivity and cost benefits for e-commerce stores could be huge—instead of maintaining separate native app and web properties, a single Progressive Web App could be a new reality. For developers already using the Magento platform, the friction to get a PWA live is likely to be significantly reduced if this partnership lives up to the hype.

Mark Lavelle, CEO of Magento, stated:

“We see PWAs as a natural evolution of the mobile web, and by working with industry
leaders such as Google to develop PWAs, we plan to keep merchants ahead of the curve.”

So, don’t throw away your responsive site and native apps just yet. But keep an eye on Magento and Google, as we think this is going to be a huge benefit to developers and businesses in the ecommerce space. Roll on 2018!

Read the Magento press release here.

Check out pwa.rocks for interesting some examples of Progressive Web Apps in action.

Orange CTA Button | Progressive Web Apps

The post Will Magento’s Progressive Web Apps drive more mobile revenue for merchants? appeared first on AWS Managed Services by Anchor.

Libgcrypt 1.8.0 released

Post Syndicated from ris original https://lwn.net/Articles/728287/rss

The GnuPG Project has announced the availability of Libgcrypt 1.8.0.
This is a new stable version of Libgcrypt with full API
and ABI compatibility to the 1.7 series. Its main features are support
Blake-2, XTS mode, an improved RNG, and performance improvements for the
ARM architecture.

Kernel prepatch 4.13-rc1

Post Syndicated from corbet original https://lwn.net/Articles/728046/rss

Linus has released 4.13-rc1 and closed the
merge window for this cycle. “Once again, the diffstat is absolutely
dominated by some AMD gpu header files, but if you ignore that, things look
pretty regular, with about two thirds drivers and one third “rest”
(architecture, core kernel, core networking, tooling).