Tag Archives: marketing

Banning Surveillance-Based Advertising

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2021/06/banning-surveillance-based-advertising.html

The Norwegian Consumer Council just published a fantastic new report: “Time to Ban Surveillance-Based Advertising.” From the Introduction:

The challenges caused and entrenched by surveillance-based advertising include, but are not limited to:

  • privacy and data protection infringements
  • opaque business models
  • manipulation and discrimination at scale
  • fraud and other criminal activity
  • serious security risks

In the following chapters, we describe various aspects of these challenges and point out how today’s dominant model of online advertising is a threat to consumers, democratic societies, the media, and even to advertisers themselves. These issues are significant and serious enough that we believe that it is time to ban these detrimental practices.

A ban on surveillance-based practices should be complemented by stronger enforcement of existing legislation, including the General Data Protection Regulation, competition regulation, and the Unfair Commercial Practices Directive. However, enforcement currently consumes significant time and resources, and usually happens after the damage has already been done. Banning surveillance-based advertising in general will force structural changes to the advertising industry and alleviate a number of significant harms to consumers and to society at large.

A ban on surveillance-based advertising does not mean that one can no longer finance digital content using advertising. To illustrate this, we describe some possible ways forward for advertising-funded digital content, and point to alternative advertising technologies that may contribute to a safer and healthier digital economy for both consumers and businesses.

Press release. Press coverage.

I signed their open letter.

How Grab Leveraged Performance Marketing Automation to Improve Conversion Rates by 30%

Post Syndicated from Grab Tech original https://engineering.grab.com/learn-how-grab-leveraged-performance-marketing-automation

Grab, Southeast Asia’s leading superapp, is a hyperlocal three-sided marketplace that operates across hundreds of cities in Southeast Asia. Grab started out as a taxi hailing company back in 2012 and in less than a decade, the business has evolved tremendously and now offers a diverse range of services for consumers’ everyday needs.

To fuel our business growth in newer service offerings such as GrabFood, GrabMart and GrabExpress, user acquisition efforts play a pivotal role in ensuring we create a sustainable Grab ecosystem that balances the marketplace dynamics between our consumers, driver-partners and merchant-partners.

Part of our user growth strategy is centred on running direct-response app campaigns to increase trials of our superapp offerings. Executing these campaigns brings a set of unique challenges against the diverse cultural backdrop of Southeast Asia, requiring the team to stay hyperlocal in our strategies while driving user volumes at scale. To address these challenges, Grab’s performance marketing team is constantly seeking ways to leverage automation and innovate in our operations, improving our marketing efficiency and effectiveness.

Managing Grab’s Ever-expanding Business, Geographical Coverage and New User Acquisition

Grab’s ever-expanding services, extensive geographical coverage and hyperlocal strategies result in an extremely dynamic, yet complex ad account structure. This also means that whenever there is a new business vertical launch or hyperlocal campaign, the team would spend valuable hours rolling out a large volume of new ads across our accounts in the region.

Sample Google Ads account structure
A sample of our Google Ads account structure.

The granular structure of our Google Ads account provided us with flexibility to execute hyperlocal strategies, but this also resulted in thousands of ad groups that had to be individually maintained.

In 2019, Grab’s growth was simply outpacing our team’s resources and we finally hit a bottleneck. This challenged the team to take a step back and decide to pursue a fully automated solution built on the following principles for long-term sustainability:

  • Building ad-tech solutions in-house instead of acquiring off-the-shelf solutions

    Grab’s unique business model calls for several tailor-made features, none of which the existing ad tech solutions were able to provide.

  • Shifting our mindset to focus on the infinite game

    To sustain the exponential growth in the volume of ads we run, we had to take the path of automation.

For our very first automation project, we decided to look into automating creative refresh and upload for our Google Ads account. With thousands of ad groups each running multiple creatives, this had become a growing problem for the team. Over time, manually monitoring these creatives and refreshing them on a regular basis had become impossible.

The Automation Fundamentals

Grab’s superapp nature means that any automation solution needs to be built on four fundamentals:

  • Performance-driven – to maintain and improve conversion efficiency over time
  • Flexibility – to fit needs across business verticals and hyperlocal execution
  • Inclusivity – to account for future service launches and marketing tech (e.g. product feeds and more)
  • Scalability – to account for future geography/campaign coverage

We incorporated these fundamentals into the requirements for the custom creative automation tool we planned to build.

  • Performance-driven – while many advertising platforms, such as Google’s App Campaigns, have built-in algorithms to prevent low-performing creatives from being served, the fundamental bottleneck lies in the speed at which these low-performing creatives can be replaced with new assets to improve performance. Thus, solving this bottleneck became the primary goal of our tool.

  • Flexibility – to accommodate our broad range of services, geographies and marketing objectives, a mapping logic would be required to make sure the right creatives are added into the right campaigns.

    To solve this, we relied on a standardised creative naming convention, using key attributes in the file name to map an asset to a specific campaign and ad group based on the following attributes (see the parsing sketch after this list):

    • Market
    • City
    • Service type
    • Language
    • Creative theme
    • Asset type
    • Campaign optimisation goal
  • Inclusivity – to address coverage of future service offerings and interoperability with existing ad-tech vendors, we designed and built our tool to conform to common industry API and platform standards.

  • Scalability – to ensure full coverage of future geographies and campaigns, the in-house solution’s frontend and backend had to be robust enough to handle the volume. Working hand in glove with Google, we built the solution on multiple APIs, including the Google Ads and YouTube APIs, to host and replace low-performing assets across our ad groups, and deployed it on AWS’s serverless compute engine.
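
As a rough illustration of the mapping logic described above, the sketch below parses a creative file name into the attributes listed earlier and derives a campaign/ad group key from them. This is a minimal sketch under assumptions: the field order, separator, and example file name are hypothetical stand-ins, not Grab’s actual naming convention.

    from dataclasses import dataclass

    # Hypothetical field order for a file name such as
    # "SG_Singapore_GrabFood_EN_Promo_Video_Installs.mp4".
    FIELDS = ["market", "city", "service", "language", "theme", "asset_type", "goal"]

    @dataclass
    class CreativeAsset:
        market: str
        city: str
        service: str
        language: str
        theme: str
        asset_type: str
        goal: str

        @property
        def ad_group_key(self) -> str:
            # The target campaign/ad group is derived purely from the asset's attributes.
            return f"{self.market}-{self.city}-{self.service}-{self.language}-{self.goal}"

    def parse_creative_name(filename: str) -> CreativeAsset:
        """Split a standardised creative file name into its mapping attributes."""
        stem = filename.rsplit(".", 1)[0]          # drop the file extension
        parts = stem.split("_")
        if len(parts) != len(FIELDS):
            raise ValueError(f"Unexpected creative name: {filename}")
        return CreativeAsset(**dict(zip(FIELDS, parts)))

    if __name__ == "__main__":
        asset = parse_creative_name("SG_Singapore_GrabFood_EN_Promo_Video_Installs.mp4")
        print(asset.ad_group_key)   # SG-Singapore-GrabFood-EN-Installs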

Enter CARA

CARA is an automation tool that scans for any low-performing creatives and replaces them with new assets from our creative library:

CARA Workflow
A sneak peek of how CARA works
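
The post doesn’t spell out the replacement logic, but the loop can be sketched roughly as follows. This is a hedged sketch: the thresholds are illustrative only, and fetch_asset_stats, pause_asset, and upload_asset are hypothetical placeholders standing in for the Google Ads/YouTube API calls mentioned above, not real client methods.

    from typing import Iterable

    CTR_FLOOR = 0.005          # assumed minimum acceptable clickthrough rate
    MIN_IMPRESSIONS = 10_000   # don't judge an asset before it has enough data

    def fetch_asset_stats(ad_group_id: str) -> Iterable[dict]:
        """Placeholder: return per-asset performance rows for an ad group."""
        raise NotImplementedError

    def pause_asset(ad_group_id: str, asset_id: str) -> None:
        """Placeholder: pause or remove an underperforming asset."""
        raise NotImplementedError

    def upload_asset(ad_group_id: str, creative_path: str) -> None:
        """Placeholder: attach a fresh creative from the library to the ad group."""
        raise NotImplementedError

    def refresh_ad_group(ad_group_id: str, creative_library: list) -> None:
        """Replace underperforming assets in one ad group with fresh creatives."""
        fresh = iter(creative_library)
        for row in fetch_asset_stats(ad_group_id):
            enough_data = row["impressions"] >= MIN_IMPRESSIONS
            ctr = row["clicks"] / max(row["impressions"], 1)
            if enough_data and ctr < CTR_FLOOR:
                pause_asset(ad_group_id, row["asset_id"])
                replacement = next(fresh, None)
                if replacement is not None:
                    upload_asset(ad_group_id, replacement)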

In a controlled experimental launch, we saw nearly 2,000 underperforming assets automatically replaced across more than 8,000 active ad groups, translating to an 18-30% increase in clickthrough and conversion rates.

Subset of results from CARA experimental launch
A subset of results from CARA’s experimental launch

Through automation, Grab’s performance marketing team has been able to significantly improve clickthrough and conversion rates while saving valuable man-hours. We have also established a scalable foundation for future growth. The best part? We are just getting started.


Authored on behalf of the performance marketing team @ Grab. Special thanks to the CRM data analytics team, particularly Milhad Miah and Vaibhav Vij for making this a reality.


Join Us

Grab is the leading superapp platform in Southeast Asia, providing everyday services that matter to consumers. More than just a ride-hailing and food delivery app, Grab offers a wide range of on-demand services in the region, including mobility, food, package and grocery delivery services, mobile payments, and financial services across 428 cities in eight countries.

Powered by technology and driven by heart, our mission is to drive Southeast Asia forward by creating economic empowerment for everyone. If this mission speaks to you, join our team today!

Opt-in to the new Amazon SES console experience

Post Syndicated from Simon Poile original https://aws.amazon.com/blogs/messaging-and-targeting/amazon-ses-console-opt-in/

Amazon Web Services (AWS) is pleased to announce the launch of the newly redesigned Amazon Simple Email Service (SES) console. With its streamlined look and feel, the new console makes it even easier for customers to leverage the speed, reliability, and flexibility that Amazon SES has to offer. Customers can access the new console experience via an opt-in link on the classic console.

Amazon SES now offers a new, optimized console to provide customers with a simpler, more intuitive way to create and manage their resources, collect sending activity data, and monitor reputation health. It also has a more robust set of configuration options and new features and functionality not previously available in the classic console.

Here are a few of the improvements customers can find in the new Amazon SES console:

Verified identities

The verified identities page streamlines how customers manage their sender identities in Amazon SES, replacing the classic console’s identity management section. Verified identities are a centralized place in which customers can view, create, and configure both domain and email address identities on one page. Other notable improvements include:

  • DKIM-based verification
    DKIM-based domain verification replaces the previous verification method, which was based on TXT records. DomainKeys Identified Mail (DKIM) is an email authentication mechanism that receiving mail servers use to validate email. The new verification method offers customers the added benefit of enhanced deliverability with DKIM-compliant email providers, and helps them achieve compliance with DMARC (Domain-based Message Authentication, Reporting and Conformance).
  • Amazon SES mailbox simulator
    The new mailbox simulator makes it significantly easier for customers to test how their applications handle different email sending scenarios. From a dropdown, customers select which scenario they’d like to simulate. Scenario options include bounces, complaints, and automatic out-of-office responses. The mailbox simulator provides customers with a safe environment in which to test their email sending capabilities.
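
Both of the improvements above can also be exercised programmatically through the SESv2 API. The following boto3 sketch starts DKIM-based verification for a placeholder domain and then sends a test message to the bounce scenario of the mailbox simulator; the Region, domain, and sender address are assumptions you would replace with your own.

    import boto3

    ses = boto3.client("sesv2", region_name="us-east-1")  # assumed Region

    # Start DKIM-based verification for a domain identity. The returned tokens
    # become the CNAME records you publish in DNS ("example.com" is a placeholder).
    identity = ses.create_email_identity(EmailIdentity="example.com")
    for token in identity["DkimAttributes"]["Tokens"]:
        print(f"{token}._domainkey.example.com -> {token}.dkim.amazonses.com")

    # Exercise the bounce scenario of the mailbox simulator from an already-verified sender.
    ses.send_email(
        FromEmailAddress="sender@example.com",
        Destination={"ToAddresses": ["bounce@simulator.amazonses.com"]},
        Content={
            "Simple": {
                "Subject": {"Data": "Mailbox simulator test"},
                "Body": {"Text": {"Data": "This message should hard-bounce."}},
            }
        },
    )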

Configuration sets

The new console makes it easier for customers to experience the benefits of using configuration sets. Configuration sets enable customers to capture and publish event data for specific segments of their email sending program. They also isolate IP reputation by segment through dedicated IP pools. With a wider range of configuration options, such as reputation tracking and custom suppression options, customers get even more out of this powerful feature.

  • Default configuration set
    One important feature to highlight is the introduction of the default configuration set. By assigning a default configuration set to an identity, customers ensure that the assigned configuration set is always applied to messages sent from that identity at the time of sending. This enables customers to associate a dedicated IP pool or set up event publishing for an identity without having to modify their email headers.
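
The same default assignment can be made through the SESv2 API. A minimal boto3 sketch, with placeholder identity and configuration set names and an assumed Region:

    import boto3

    ses = boto3.client("sesv2", region_name="us-east-1")  # assumed Region

    # Create a configuration set and make it the default for an identity, so every
    # message sent from that identity picks it up without any extra email headers.
    ses.create_configuration_set(ConfigurationSetName="my-config-set")
    ses.put_email_identity_configuration_set_attributes(
        EmailIdentity="example.com",            # placeholder identity
        ConfigurationSetName="my-config-set",   # placeholder configuration set
    )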

Account dashboard

There is also an account dashboard for the new SES console. This feature provides customers with fast access to key information about their account, including sending limits and restrictions, and overall account health. A visual representation of the customer’s daily email usage helps them ensure that they aren’t approaching their sending limits. Additionally, customers who use the Amazon SES SMTP interface to send emails can visit the account dashboard to obtain or update their SMTP credentials.
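
The sending limits shown on the dashboard are also available from the SESv2 GetAccount operation; a short boto3 sketch, assuming the us-east-1 Region:

    import boto3

    ses = boto3.client("sesv2", region_name="us-east-1")  # assumed Region

    # Pull the account-level sending quota that the dashboard visualizes.
    account = ses.get_account()
    quota = account["SendQuota"]
    print("Max send rate (msgs/sec):", quota["MaxSendRate"])
    print("24-hour sending quota:", quota["Max24HourSend"])
    print("Sent in last 24 hours:", quota["SentLast24Hours"])
    print("Production access:", account.get("ProductionAccessEnabled"))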

Reputation metrics

The new reputation metrics page provides customers with high-level insight into historic bounce and complaint rates. This is viewed at both the account level and the configuration set level. Bounce and complaint rates are two important metrics that Amazon SES considers when assessing a customer’s sender reputation, as well as the overall health of their account.
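
These account-level figures are also published to Amazon CloudWatch, so they can be monitored outside the console. A hedged boto3 sketch, assuming the metrics are named Reputation.BounceRate and Reputation.ComplaintRate in the AWS/SES namespace:

    from datetime import datetime, timedelta

    import boto3

    cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")  # assumed Region

    # Fetch the last 24 hours of account-level reputation metrics. The metric
    # names below are assumptions; confirm them against your own CloudWatch data.
    now = datetime.utcnow()
    for metric in ("Reputation.BounceRate", "Reputation.ComplaintRate"):
        stats = cloudwatch.get_metric_statistics(
            Namespace="AWS/SES",
            MetricName=metric,
            StartTime=now - timedelta(hours=24),
            EndTime=now,
            Period=3600,
            Statistics=["Average"],
        )
        points = sorted(stats["Datapoints"], key=lambda p: p["Timestamp"])
        latest = points[-1]["Average"] if points else None
        print(metric, "->", latest)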

The redesigned Amazon SES console, with its easy-to-use workflows, will not only improve customers’ onboarding experience, it will also streamline their ongoing use of the service. The Amazon SES team remains committed to investing on behalf of our customers and empowering them to be productive anywhere, anytime. We invite you to opt in to the new Amazon SES console experience and let us know what you think.

Should There Be Limits on Persuasive Technologies?

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2020/12/should-there-be-limits-on-persuasive-technologies.html

Persuasion is as old as our species. Both democracy and the market economy depend on it. Politicians persuade citizens to vote for them, or to support different policy positions. Businesses persuade consumers to buy their products or services. We all persuade our friends to accept our choice of restaurant, movie, and so on. It’s essential to society; we couldn’t get large groups of people to work together without it. But as with many things, technology is fundamentally changing the nature of persuasion. And society needs to adapt its rules of persuasion or suffer the consequences.

Democratic societies, in particular, are in dire need of a frank conversation about the role persuasion plays in them and how technologies are enabling powerful interests to target audiences. In a society where public opinion is a ruling force, there is always a risk of it being mobilized for ill purposes — such as provoking fear to encourage one group to hate another in a bid to win office, or targeting personal vulnerabilities to push products that might not benefit the consumer.

In this regard, the United States, already extremely polarized, sits on a precipice.

There have long been rules around persuasion. The US Federal Trade Commission enforces laws requiring that claims about products “must be truthful, not misleading, and, when appropriate, backed by scientific evidence.” Political advertisers must identify themselves in television ads. If someone abuses a position of power to force another person into a contract, undue influence can be argued to nullify that agreement. Yet there is more to persuasion than the truth, transparency, or simply applying pressure.

Persuasion also involves psychology, and that has been far harder to regulate. Using psychology to persuade people is not new. Edward Bernays, a pioneer of public relations and nephew to Sigmund Freud, made a marketing practice of appealing to the ego. His approach was to tie consumption to a person’s sense of self. In his 1928 book Propaganda, Bernays advocated engineering events to persuade target audiences as desired. In one famous stunt, he hired women to smoke cigarettes while taking part in the 1929 New York City Easter Sunday parade, causing a scandal while linking smoking with the emancipation of women. The tobacco industry would continue to market lifestyle in selling cigarettes into the 1960s.

Emotional appeals have likewise long been a facet of political campaigns. In the 1860 US presidential election, Southern politicians and newspaper editors spread fears of what a “Black Republican” win would mean, painting horrific pictures of what the emancipation of slaves would do to the country. In the 2020 US presidential election, modern-day Republicans used Cuban Americans’ fears of socialism in ads on Spanish-language radio and messaging on social media. Because of the emotions involved, many voters believed the campaigns enough to let them influence their decisions.

The Internet has enabled new technologies of persuasion to go even further. Those seeking to influence others can collect and use data about targeted audiences to create personalized messaging. Tracking the websites a person visits, the searches they make online, and what they engage with on social media, persuasion technologies enable those who have access to such tools to better understand audiences and deliver more tailored messaging where audiences are likely to see it most. This information can be combined with data about other activities, such as offline shopping habits, the places a person visits, and the insurance they buy, to create a profile of them that can be used to develop persuasive messaging that is aimed at provoking a specific response.

Our senses of self, meanwhile, are increasingly shaped by our interaction with technology. The same digital environment where we read, search, and converse with our intimates enables marketers to take that data and turn it back on us. A modern-day Bernays no longer needs to ferret out the social causes that might inspire you or entice you — you’ve likely already shared that by your online behavior.

Some marketers posit that women feel less attractive on Mondays, particularly first thing in the morning — and therefore that’s the best time to advertise cosmetics to them. The New York Times once experimented by predicting the moods of readers based on article content to better target ads, enabling marketers to find audiences when they were sad or fearful. Some music streaming platforms encourage users to disclose their current moods, which helps advertisers target subscribers based on their emotional states.

The phones in our pockets provide marketers with our location in real time, helping deliver geographically relevant ads, such as propaganda to those attending a political rally. This always-on digital experience enables marketers to know what we are doing — and when, where, and how we might be feeling at that moment.

All of this is not intended to be alarmist. It is important not to overstate the effectiveness of persuasive technologies. But while many of them are more smoke and mirrors than reality, it is likely that they will only improve over time. The technology already exists to help predict moods of some target audiences, pinpoint their location at any given time, and deliver fairly tailored and timely messaging. How far does that ability need to go before it erodes the autonomy of those targeted to make decisions of their own free will?

Right now, there are few legal or even moral limits on persuasion — and few answers regarding the effectiveness of such technologies. Before it is too late, the world needs to consider what is acceptable and what is over the line.

For example, it’s been long known that people are more receptive to advertisements made with people who look like them: in race, ethnicity, age, gender. Ads have long been modified to suit the general demographic of the television show or magazine they appear in. But we can take this further. The technology exists to take your likeness and morph it with a face that is demographically similar to you. The result is a face that looks like you, but that you don’t recognize. If that turns out to be more persuasive than coarse demographic targeting, is that okay?

Another example: Instead of just advertising to you when they detect that you are vulnerable, what if advertisers craft advertisements that deliberately manipulate your mood? In some ways, being able to place ads alongside content that is likely to provoke a certain emotional response enables advertisers to do this already. The only difference is that the media outlet claims it isn’t crafting the content to deliberately achieve this. But is it acceptable to actively prime a target audience and then to deliver persuasive messaging that fits the mood?

Further, emotion-based decision-making is not the rational type of slow thinking that ought to inform important civic choices such as voting. In fact, emotional thinking threatens to undermine the very legitimacy of the system, as voters are essentially provoked to move in whatever direction someone with power and money wants. Given the pervasiveness of digital technologies, and the often instant, reactive responses people have to them, how much emotion ought to be allowed in persuasive technologies? Is there a line that shouldn’t be crossed?

Finally, for most people today, exposure to information and technology is pervasive. The average US adult spends more than eleven hours a day interacting with media. Such levels of engagement lead to huge amounts of personal data generated and aggregated about you — your preferences, interests, and state of mind. The more those who control persuasive technologies know about us, what we are doing, how we are feeling, when we feel it, and where we are, the better they can tailor messaging that provokes us into action. The unsuspecting target is grossly disadvantaged. Is it acceptable for the same services to both mediate our digital experience and to target us? Is there ever such a thing as too much targeting?

The power dynamics of persuasive technologies are changing. Access to tools and technologies of persuasion is not egalitarian. Many require large amounts of both personal data and computation power, turning modern persuasion into an arms race where the better resourced will be better placed to influence audiences.

At the same time, the average person has very little information about how these persuasion technologies work, and is thus unlikely to understand how their beliefs and opinions might be manipulated by them. What’s more, there are few rules in place to protect people from abuse of persuasion technologies, much less even a clear articulation of what constitutes a level of manipulation so great it effectively takes agency away from those targeted. This creates a positive feedback loop that is dangerous for society.

In the 1970s, there was widespread fear about so-called subliminal messaging: the claim that images of sex and death were hidden in the details of print advertisements, as in the curls of smoke in cigarette ads and the ice cubes of liquor ads. It was pretty much all a hoax, but that didn’t stop the Federal Trade Commission and the Federal Communications Commission from declaring it an illegal persuasive technology. That’s how worried people were about being manipulated without their knowledge and consent.

It is time to have a serious conversation about limiting the technologies of persuasion. This must begin by articulating what is permitted and what is not. If we don’t, the powerful persuaders will become even more powerful.

This essay was written with Alicia Wanless, and previously appeared in Foreign Policy.

Companies that Scrape Your Email

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2020/02/companies_that_.html

Motherboard has a long article on apps — Edison, Slice, and Cleanfox — that spy on your email by scraping your screen, and then sell that information to others:

Some of the companies listed in the J.P. Morgan document sell data sourced from “personal inboxes,” the document adds. A spokesperson for J.P. Morgan Research, the part of the company that created the document, told Motherboard that the research “is intended for institutional clients.”

That document describes Edison as providing “consumer purchase metrics including brand loyalty, wallet share, purchase preferences, etc.” The document adds that the “source” of the data is the “Edison Email App.”

[…]

A dataset obtained by Motherboard shows what some of the information pulled from free email app users’ inboxes looks like. A spreadsheet containing data from Rakuten’s Slice, an app that scrapes a user’s inbox so they can better track packages or get their money back once a product goes down in price, contains the item that an app user bought from a specific brand, what they paid, and a unique identification code for each buyer.

Predictive User Engagement using Amazon Pinpoint and Amazon Personalize

Post Syndicated from Brent Meyer original https://aws.amazon.com/blogs/messaging-and-targeting/predictive-user-engagement-using-amazon-pinpoint-and-amazon-personalize/

Note: This post was written by John Burry, a Solution Architect on the AWS Customer Engagement team.


Predictive User Engagement (PUE) refers to the integration of machine learning (ML) and customer engagement services. By implementing a PUE solution, you can combine ML-based predictions and recommendations with real-time notifications and analytics, all based on your customers’ behaviors.

This blog post shows you how to set up a PUE solution by using Amazon Pinpoint and Amazon Personalize. Best of all, you can implement this solution even if you don’t have any prior machine learning experience. By completing the steps in this post, you’ll be able to build your own model in Personalize, integrate it with Pinpoint, and start sending personalized campaigns.

Prerequisites

Before you complete the steps in this post, you need to set up the following:

  • Create an admin user in AWS Identity and Access Management (IAM). For more information, see Creating Your First IAM Admin User and Group in the IAM User Guide. You need to specify the credentials of this user when you set up the AWS Command Line Interface.
  • Install Python 3 and the pip package manager. Python 3 is installed by default on recent versions of Linux and macOS. If it isn’t already installed on your computer, you can download an installer from the Python website.
  • Use pip to install the following modules:
    • awscli
    • boto3
    • jupyter
    • matplotlib
    • sklearn
    • sagemaker

    For more information about installing modules, see Installing Python Modules in the Python 3.X Documentation.

  • Configure the AWS Command Line Interface (AWS CLI). During the configuration process, you have to specify a default AWS Region. This solution uses Amazon SageMaker to build a model, so the Region that you specify has to be one that supports Amazon SageMaker. For a complete list of Regions where SageMaker is supported, see AWS Service Endpoints in the AWS General Reference. For more information about setting up the AWS CLI, see Configuring the AWS CLI in the AWS Command Line Interface User Guide.
  • Install Git. Git is installed by default on most versions of Linux and macOS. If Git isn’t already installed on your computer, you can download an installer from the Git website.

Step 1: Create an Amazon Pinpoint Project

In this section, you create and configure a project in Amazon Pinpoint. This project contains all of the customers that we will target, as well as the recommendation data that’s associated with each one. Later, we’ll use this data to create segments and campaigns.

To set up the Amazon Pinpoint project

  1. Sign in to the Amazon Pinpoint console at http://console.aws.amazon.com/pinpoint/.
  2. On the All projects page, choose Create a project. Enter a name for the project, and then choose Create.
  3. On the Configure features page, under SMS and voice, choose Configure.
  4. Under General settings, select Enable the SMS channel for this project, and then choose Save changes.
  5. In the navigation pane, under Settings, choose General settings. In the Project details section, copy the value under Project ID. You’ll need this value later.

Step 2: Create an Endpoint

In Amazon Pinpoint, an endpoint represents a specific method of contacting a customer, such as their email address (for email messages) or their phone number (for SMS messages). Endpoints can also contain custom attributes, and you can associate multiple endpoints with a single user. In this example, we use these attributes to store the recommendation data that we receive from Amazon Personalize.

In this section, we create a new endpoint and user by using the AWS CLI. We’ll use this endpoint to test the SMS channel, and to test the recommendations that we receive from Personalize.

To create an endpoint by using the AWS CLI

  1. At the command line, enter the following command:
    aws pinpoint update-endpoint --application-id <project-id> \
    --endpoint-id 12456 --endpoint-request "Address='<mobile-number>', \
    ChannelType='SMS',User={UserAttributes={recommended_items=['none']},UserId='12456'}"

    In the preceding example, replace <project-id> with the Amazon Pinpoint project ID that you copied in Step 1. Replace <mobile-number> with your phone number, formatted in E.164 format (for example, +12065550142).

Note that this endpoint contains hard-coded UserId and EndpointId values of 12456. These IDs match an ID that we’ll create later when we generate the Personalize data set.
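
If the shell quoting of the shorthand syntax above is awkward in your environment, the same endpoint can be created with boto3. This is an equivalent sketch rather than part of the original walkthrough; replace the two placeholders just as you would in the CLI command:

    import boto3

    pinpoint = boto3.client("pinpoint", region_name="us-east-1")  # assumed Region

    # Equivalent of the CLI call above: create or update the test SMS endpoint
    # and seed its recommended_items attribute.
    pinpoint.update_endpoint(
        ApplicationId="<project-id>",
        EndpointId="12456",
        EndpointRequest={
            "Address": "<mobile-number>",   # E.164 format, for example +12065550142
            "ChannelType": "SMS",
            "User": {
                "UserId": "12456",
                "UserAttributes": {"recommended_items": ["none"]},
            },
        },
    )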

Step 3: Create a Segment and Campaign in Amazon Pinpoint

Now that we have an endpoint, we need to add it to a segment so that we can use it within a campaign. By sending a campaign, we can verify that our Pinpoint project is configured correctly, and that we created the endpoint correctly.

To create the segment and campaign

  1. Open the Pinpoint console at http://console.aws.amazon.com/pinpoint, and then choose the project that you created in Step 1.
  2. In the navigation pane, choose Segments, and then choose Create a segment.
  3. Name the segment “No recommendations”. Under Segment group 1, on the Add a filter menu, choose Filter by user.
  4. On the Choose a user attribute menu, choose recommended_items. Set the value of the filter to “none”.
  5. Confirm that the Segment estimate section shows that there is one eligible endpoint, and then choose Create segment.
  6. In the navigation pane, choose Campaigns, and then choose Create a campaign.
  7. Name the campaign “SMS to users with no recommendations”. Under Choose a channel for this campaign, choose SMS, and then choose Next.
  8. On the Choose a segment page, choose the “No recommendations” segment that you just created, and then choose Next.
  9. In the message editor, type a test message, and then choose Next.
  10. On the Choose when to send the campaign page, keep all of the default values, and then choose Next.
  11. On the Review and launch page, choose Launch campaign. Within a few seconds, you receive a text message at the phone number that you specified when you created the endpoint.

Step 4: Load sample data into Amazon Personalize

At this point, we’ve finished setting up Amazon Pinpoint. Now we can start loading data into Amazon Personalize.

To load the data into Amazon Personalize

  1. At the command line, enter the following command to clone the sample data and Jupyter Notebooks to your computer:
    git clone https://github.com/markproy/personalize-car-search.git

  2. At the command line, change into the directory that contains the data that you just cloned. Enter the following command:
    jupyter notebook

    A new window opens in your web browser.

  3. In your web browser, open the first notebook (01_generate_data.ipynb). On the Cell menu, choose Run all. Wait for the commands to finish running.
  4. Open the second notebook (02_make_dataset_group.ipynb). In the first step, replace the value of the account_id variable with the ID of your AWS account. Then, on the Cell menu, choose Run all. This step takes several minutes to complete. Make sure that all of the commands have run successfully before you proceed to the next step.
  5. Open the third notebook (03_make_campaigns.ipynb). In the first step, replace the value of the account_id variable with the ID of your AWS account. Then, on the Cell menu, choose Run all. This step takes several minutes to complete. Make sure that all of the commands have run successfully before you proceed to the next step.
  6. Open the fourth notebook (04_use_the_campaign.ipynb). In the first step, replace the value of the account_id variable with the ID of your AWS account. Then, on the Cell menu, choose Run all. This step takes several minutes to complete.
  7. After the fourth notebook is finished running, choose Quit to terminate the Jupyter Notebook. You don’t need to run the fifth notebook for this example.
  8. Open the Amazon Personalize console at http://console.aws.amazon.com/personalize. Verify that Amazon Personalize contains one dataset group named car-dg.
  9. In the navigation pane, choose Campaigns. Verify that it contains all of the following campaigns, and that the status for each campaign is Active:
    • car-popularity-count
    • car-personalized-ranking
    • car-hrnn-metadata
    • car-sims
    • car-hrnn

Step 5: Create the Lambda function

We’ve loaded the data into Amazon Personalize, and now we need to create a Lambda function to update the endpoint attributes in Pinpoint with the recommendations provided by Personalize.

The version of the AWS SDK for Python that’s included with Lambda doesn’t include the libraries for Amazon Personalize. For this reason, you need to download these libraries to your computer, put them in a .zip file, and upload the entire package to Lambda.

To create the Lambda function

  1. In a text editor, create a new file. Paste the following code.
    # Copyright 2010-2019 Amazon.com, Inc. or its affiliates. All Rights Reserved.
    #
    # This file is licensed under the Apache License, Version 2.0 (the "License").
    # You may not use this file except in compliance with the License. A copy of the
    # License is located at
    #
    # http://aws.amazon.com/apache2.0/
    #
    # This file is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS
    # OF ANY KIND, either express or implied. See the License for the specific
    # language governing permissions and limitations under the License.
    
    AWS_REGION = "<region>"
    PROJECT_ID = "<project-id>"
    CAMPAIGN_ARN = "<car-hrnn-campaign-arn>"
    USER_ID = "12456"
    endpoint_id = USER_ID
    
    from datetime import datetime
    import json
    import boto3
    import logging
    from botocore.exceptions import ClientError
    
    DATE = datetime.now()
    
    personalize           = boto3.client('personalize')
    personalize_runtime   = boto3.client('personalize-runtime')
    personalize_events    = boto3.client('personalize-events')
    pinpoint              = boto3.client('pinpoint')
    
    # Entry point: look up the latest recommendations for the test user and
    # write them onto the matching Pinpoint endpoint.
    def lambda_handler(event, context):
        itemList = get_recommended_items(USER_ID,CAMPAIGN_ARN)
        response = update_pinpoint_endpoint(PROJECT_ID,endpoint_id,itemList)
    
        return {
            'statusCode': 200,
            'body': json.dumps('Lambda execution completed.')
        }
    
    # Query the Personalize campaign for the top 10 recommended items for a user.
    def get_recommended_items(user_id, campaign_arn):
        response = personalize_runtime.get_recommendations(campaignArn=campaign_arn, 
                                                           userId=str(user_id), 
                                                           numResults=10)
        itemList = response['itemList']
        return itemList
    
    # Store the recommended item IDs in the endpoint's recommended_items user attribute.
    def update_pinpoint_endpoint(project_id,endpoint_id,itemList):
        itemlistStr = []
        
        for item in itemList:
            itemlistStr.append(item['itemId'])
    
        pinpoint.update_endpoint(
        ApplicationId=project_id,
        EndpointId=endpoint_id,
        EndpointRequest={
                            'User': {
                                'UserAttributes': {
                                    'recommended_items': 
                                        itemlistStr
                                }
                            }
                        }
        )    
    
        return
    

    In the preceding code, make the following changes:

    • Replace <region> with the name of the AWS Region that you want to use, such as us-east-1.
    • Replace <project-id> with the ID of the Amazon Pinpoint project that you created earlier.
    • Replace <car-hrnn-campaign-arn> with the Amazon Resource Name (ARN) of the car-hrnn campaign in Amazon Personalize. You can find this value in the Amazon Personalize console.
  2. Save the file as pue-get-recs.py.
  3. Create and activate a virtual environment. In the virtual environment, use pip to download the latest versions of the boto3 and botocore libraries. For complete procedures, see Updating a Function with Additional Dependencies With a Virtual Environment in the AWS Lambda Developer Guide. Also, add the pue-get-recs.py file to the .zip file that contains the libraries.
  4. Open the IAM console at http://console.aws.amazon.com/iam. Create a new role. Attach the following policy to the role:
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "logs:CreateLogStream",
                    "logs:DescribeLogGroups",
                    "logs:CreateLogGroup",
                    "logs:PutLogEvents",
                    "personalize:GetRecommendations",
                    "mobiletargeting:GetUserEndpoints",
                    "mobiletargeting:GetApp",
                    "mobiletargeting:UpdateEndpointsBatch",
                    "mobiletargeting:GetApps",
                    "mobiletargeting:GetEndpoint",
                    "mobiletargeting:GetApplicationSettings",
                    "mobiletargeting:UpdateEndpoint"
                ],
                "Resource": "*"
            }
        ]
    }
    
  5. Open the Lambda console at http://console.aws.amazon.com/lambda, and then choose Create function.
  6. Create a new Lambda function from scratch. Choose the Python 3.7 runtime. Under Permissions, choose Use an existing role, and then choose the IAM role that you just created. When you finish, choose Create function.
  7. Upload the .zip file that contains the Lambda function and the boto3 and botocore libraries.
  8. Under Function code, change the Handler value to pue-get-recs.lambda_handler. Save your changes.

When you finish creating the function, you can test it to make sure it was set up correctly.

To test the Lambda function

  1. On the Select a test event menu, choose Configure test events. On the Configure test events window, specify an Event name, and then choose Create.
  2. Choose the Test button to execute the function.
  3. If the function executes successfully, open the Amazon Pinpoint console at http://console.aws.amazon.com/pinpoint.
  4. In the navigation pane, choose Segments, and then choose the “No recommendations” segment that you created earlier. Verify that the number under total endpoints is 0. This is the expected value; the segment is filtered to only include endpoints with no recommendation attributes, but when you ran the Lambda function, it added recommendations to the test endpoint.

Step 6: Create segments and campaigns based on recommended items

In this section, we’ll create a targeted segment based on the recommendation data provided by our Personalize dataset. We’ll then use that segment to create a campaign.

To create a segment and campaign based on personalized recommendations

  1. Open the Amazon Pinpoint console at http://console.aws.amazon.com/pinpoint. On the All projects page, choose the project that you created earlier.
  2. In the navigation pane, choose Segments, and then choose Create a segment. Name the new segment “Recommendations for product 26304”.
  3. Under Segment group 1, on the Add a filter menu, choose Filter by user. On the Choose a user attribute menu, choose recommended_items. Set the value of the filter to “26304”. Confirm that the Segment estimate section shows that there is one eligible endpoint, and then choose Create segment.
  4. In the navigation pane, choose Campaigns, and then choose Create a campaign.
  5. Name the campaign “SMS to users with recommendations for product 26304”. Under Choose a channel for this campaign, choose SMS, and then choose Next.
  6. On the Choose a segment page, choose the “Recommendations for product 26304” segment that you just created, and then choose Next.
  7. In the message editor, type a test message, and then choose Next.
  8. On the Choose when to send the campaign page, keep all of the default values, and then choose Next.
  9. On the Review and launch page, choose Launch campaign. Within a few seconds, you receive a text message at the phone number that you specified when you created the endpoint.

Next steps

Your PUE solution is now ready to use. From here, there are several ways that you can make the solution your own:

  • Expand your usage: If you plan to continue sending SMS messages, you should request a spending limit increase.
  • Extend to additional channels: This post showed the process of setting up an SMS campaign. You can add more endpoints—for the email or push notification channels, for example—and associate them with your users. You can then create new segments and new campaigns in those channels.
  • Build your own model: This post used a sample data set, but Amazon Personalize makes it easy to provide your own data. To start building a model with Personalize, you have to provide a data set that contains information about your users, items, and interactions. To learn more, see Getting Started in the Amazon Personalize Developer Guide.
  • Optimize your model: You can enrich your model by sending your mobile, web, and campaign engagement data to Amazon Personalize. In Pinpoint, you can use event streaming to move data directly to S3, and then use that data to retrain your Personalize model. To learn more about streaming events, see Streaming App and Campaign Events in the Amazon Pinpoint User Guide.
  • Update your recommendations on a regular basis: Use the create-campaign API to create a new recurring campaign. Rather than sending messages, include the hook property with a reference to the ARN of the pue-get-recs function. By completing this step, you can configure Pinpoint to retrieve the most up-to-date recommendation data each time the campaign recurs. For more information about using Lambda to modify segments, see Customizing Segments with AWS Lambda in the Amazon Pinpoint Developer Guide.
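
As a rough illustration of that last point, a recurring campaign with a campaign hook might be created as in the boto3 sketch below. The segment ID, schedule, message body, and function ARN are placeholders, and the minimal message configuration shown is only a stand-in for your own campaign settings:

    import boto3

    pinpoint = boto3.client("pinpoint", region_name="us-east-1")  # assumed Region

    # Recurring campaign whose segment is refreshed by the pue-get-recs function
    # each time the campaign runs (FILTER mode invokes the function before sending).
    pinpoint.create_campaign(
        ApplicationId="<project-id>",
        WriteCampaignRequest={
            "Name": "Daily recommendation refresh",
            "SegmentId": "<segment-id>",
            "Schedule": {
                "Frequency": "DAILY",
                "StartTime": "2019-08-01T00:00:00Z",
                "Timezone": "UTC",
            },
            "MessageConfiguration": {
                "SMSMessage": {
                    "Body": "We picked something new for you!",
                    "MessageType": "PROMOTIONAL",
                }
            },
            "Hook": {
                "LambdaFunctionName": "<pue-get-recs-function-arn>",
                "Mode": "FILTER",
            },
        },
    )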

Creating Great Content Marketing

Post Syndicated from Roderick Bauer original https://www.backblaze.com/blog/creating-great-content-marketing/

In Cinema | Coming Soon: Content Marketing

Once the hot new marketing strategy, content marketing has lost some of its luster. If you follow marketing newsletters and blogs, you’ve likely seen the claim that content marketing is dead. Some say it’s no longer effective because consumers are oversaturated with content. Others feel that content marketing is too broad a strategy, and that it’s more effective to use influencer marketing to target the people who can directly affect the behavior of others. Still others think the hoopla over content marketing is over and that money is better spent on keyword purchases, social media, SEO, and other techniques that direct customers into the top of the marketing funnel.

Backblaze has had its own journey of discovery in figuring out which kind of marketing would help it grow from a small backup and cloud storage business into a serious competitor to Amazon, Google, Microsoft, and other storage and cloud companies. Backblaze’s story provides a useful example of how a company came to content marketing after rejecting, or not finding success with, a number of other marketing approaches. Content marketing worked for Backblaze in large part because of the culture of the company, which reinforces the argument we make a little later that content marketing is largely about your company culture. But first things first: what exactly is content marketing?

What is Content Marketing?

Content marketing is the practice of creating, publishing, and sharing content with the goal of building the reputation and visibility of your brand.

The goal of content marketing is to get customers to come to you by providing them with something they need or enjoy. Once you have their attention, you can promote (overtly or covertly) whatever it is you wish to sell to them.

Conceptually, content marketing is similar to running a movie theatre. The movie gets people into the theatre where they can be sold soft drinks, popcorn, Mike & Ikes and Raisinets, which is how theatre owners make most of their money, not from ticket sales. Now you know why movie theatre snacks and drinks are so expensive; they have to cover the cost of the loss leader, the movie itself, as well as give the owner some profit.

Movie theatre snack concession (image from Wikipedia). The movie gets the audience in the theater, but the theater owner’s profit comes from the popcorn.

The Growth of Content Marketing

Marketing in recent years has increasingly become a game of metrics. Marketers today have access to a wealth of data about customer and marketing behavior and an ever growing number of apps and tools to quantify and interpret that data. We have all this data because marketing has become largely an online game and it’s fairly easy to collect behavioral data when users interact with websites, emails, webinars, videos, and podcasts. Metrics existed before for conventional mail campaigns and the like, and focus groups provided some confirmation of what marketers guessed was true, but it was generally a matter of manually counting heads, responses, and sales. Now that we’re online, just adding snippets of code to websites, apps, and emails can provide a wealth of information about consumers’ behavior. Conversion, funnel, nurturing, and keyword ranking are in the daily lexicon of marketers who look to numbers to demystify consumer behavior and justify the funding of their programs.

A trend contrary to marketing metrics grew in importance alongside the metrics binge and that trend is modern content marketing. While modern content marketing takes advantage of the immediacy and delivery vehicles of the internet, content marketing itself is as old as any marketing technique. It isn’t close to being the world’s oldest profession, but it does go back to the first attempts by humans to lure consumers to products and services with a better or more polished pitch than the next guy.

Benjamin Franklin used his annual Poor Richard’s Almanack as early as 1732 to promote his printing business and made sure readers knew where his printing shop was located. Farming equipment manufacturer John Deere put out the first issue of The Furrow in 1895. Today it has a circulation of 1.5 million in 40 countries and 12 different languages.

Benjamin Franklin’s Poor Richard’s Almanac from 1739
Ben’s conversion pitch — The location of his printing office “near the Market”

John Deere’s The Furrow, started in 1895

One might argue that long before these examples, stained glass windows in medieval cathedrals were another example of content marketing. They presented stories that entertained and educated and were an enticement to bring people to services.

Much later, the arrival of the internet and the web, and along with them, fast and easy content creation and easy consumer targeting, fueled the rapid growth of content marketing. We now have many more types of media beyond print suitable for content marketing, including social media, blogs, video, photos, podcasts and the like, which enabled content marketing to gain even more power and importance.

What’s the Problem With So Much Content Marketing?

If content marketing is so great, why are we hearing so many statements about content marketing being dead? My view is that content marketing isn’t any more dead now than it was in Benjamin Franklin’s time, and people aren’t going to stop buying popcorn at movie theaters. The problem is that there is so much content marketing that doesn’t reach its potential because it is empty and meaningless.

Unfortunately, too many people who are running content marketing programs have the same mindset as the people running poor metrics marketing programs. They look at what’s worked in the past for themselves or others and assume that repeating an earlier campaign will be as successful as the original. The approach that’s deadly for content marketing is to think that since a little is good, more must be better, and more of the very same thing.

When content marketing isn’t working, it’s usually not the marketing vehicle that’s to blame, it’s the content itself. Hollywood produces some great and creative content that gets people into theaters, but it also produces a lot of formulaic, repetitive garbage that falls flat. If a content marketing campaign is just following a formula and counting on repeating a past success, no amount of obscure performance metric optimization is going to make the content itself any better. That applies just as much to marketing technology products as it does to marketing Hollywood movies.

The screenwriter William Goldman (Butch Cassidy and the Sundance Kid, All the President’s Men, Marathon Man, The Princess Bride) famously said of Hollywood, “Nobody knows anything.” He meant that no matter how much experience a producer or studio might have, it’s hard to predict what will resonate with an audience, because what always resonates is what is fresh and authentic, the qualities that are hardest to judge in any content and that elude simple formulas. Movie remakes sometimes work, but more often they fail to capture something that audiences responded to in the original: a fresh concept, great performances by engaged actors, an inspired director, and a great script. Just reproducing the elements of a previous success doesn’t guarantee success. The new version has to capture the magic in the original that appealed to the audience.

The Dissatisfaction With So Much Content

A lot of content just dangles an attractive hook to entice content consumers to click, and that’s all it does. Anyone can post a cute animal video or a suggestive or revealing photo, but it doesn’t do anything to help your audience understand who you are or help solve their problems.

Unfortunately for media consumers, clickbait works at simply getting users to click, which is why it hasn’t disappeared. As long as people click on the enticing image, celebrity reference, or promised secret revelation, we’ll have to suffer with clickbait. Even worse, clickbait is often used to tip the scales of value away from the reader, where it belongs, and toward the publisher. Many viral tests, quizzes, and celebrity slideshows plant advertising cookies that benefit the publisher by increasing the cost and perceived value of advertising on their site, leaving consumers feeling that they’ve been used, which of course is exactly what has happened.

Another, and I think more important, reason content marketing isn’t succeeding for many is not that the content isn’t interesting or useful, but that it isn’t connected in a meaningful way with the content publisher. Just posting memes, how-tos, thought pieces, and stories that are unrelated to who you are as a business, or that don’t reflect who your employees are and the values you hold as a company, does nothing to connect your visitors to you. Empty content is like empty calories in junk food; it doesn’t nourish and strengthen the relationship you should be building with your audience.

Is SEO the Enemy?

SEO is not the enemy, but focusing on only a few superficial SEO tactics above other approaches is not going to create a long-term bond with your visitors. Keyword stuffing and optimization can damage the user experience if the user feels manipulated. Google might still bring people to your content as a result of these techniques, but it’s a hollow relationship that has no staying power. When you create quality content that your audience will like and will recommend to others, you produce backlinks and social signals that will improve your search rankings, which is the way to win in SEO.

Despite all the supposed secret formulas and tricks to get high search engine ranking, the real secret is that Google loves quality content and will reward it, so that’s the smart SEO strategy to follow.

What is Good Content Marketing?

Similar to coming up with an idea for the next movie blockbuster to get people into theaters, content marketing is about creating good and useful content that entertains, educates, creates interest, or is useful in some way. It works best when it is the kind of content that people want to share with others. The viral effect will compound the audience you earn. That’s why content marketing has really taken off in the age of social media. Word-of-mouth and good write-ups have always propelled good content, but they are nothing compared to the effect viral online sharing can have on a good blog post, video, photograph, meme or other content.

How do you create this great content? We’re going to cover three steps that will take you from ho-hum content marketing to good and possibly even great content marketing. If you follow these three steps, you’ll be ahead of 90 percent of the businesses out there that are trying to crack the how-to of content marketing.

First — Start with Why You Do What You Do

Simon Sinek in his book, Start with Why, and in his presentations, especially his TED Talk, How Great Leaders Inspire Action, argues that people don’t base their purchasing decisions primarily on what a company does, but on why they do it. This might be hard to envision for some products, like toothpaste or laundry detergent, but I think it does apply to every purchase we make, even if in some cases it’s to a small degree. For some things it’s much more apparent. People identify with iOS or Android, Ford or Chevy, Ducati or Suzuki, based on much more than practical considerations of price, effectiveness, and other qualities. People want to use products and services that bolster their image of who they are, or who they want to be. Some companies are great at using this desire (Apple, BMW, Nike, Sephora, Ikea, Whole Foods, REI) and have a distinct identity that is the foundation for every message they put out.

Golden Circle: Why? How? What?
From Simon Sinek, Start With Why

To communicate the why of your products and services, you can’t just put out generic content that works for anyone. You have to produce content that shows specifically who you are. The best content marketing is cultural. The content you deliver tells your audience what kind of company you are, what your values are, who are the people in the company, and why they work there and do the things they do. That means you must be authentic and transparent. That takes courage, and isn’t easy, which is why so few companies are good at it. It takes vision, leadership, and a constant reminder from company leaders of what you’re doing and why it matters.

Unfortunately, this is hard to maintain as companies grow. The organizations that have grown dramatically and yet successfully maintained the core company values have either had a charismatic leader who represented and reiterated the company’s values at every opportunity (Apple), or have built them into every communication, event, and presentation by the company, no matter who is delivering them (Salesforce).

If your company isn’t good at this, don’t despair. These skills can be learned, so if your company needs to get better at understanding and communicating the why of who it is, there’s hope that, with some effort, it can happen.

Second — Put Yourself in Your Customers’ Shoes

You not only need to understand yourself and your company and present yourself authentically, you have to really understand your customer — really, really understand your customer. That takes time, research, and empathy to walk a mile in their shoes. You need to visit your customers, spend a day fielding support calls or working customer service, go places, do things, and ask questions that you’ve never asked. Are they well off with cash to burn, or do they count every penny? Do they live for themselves, their parents, their children, their community, their church, their livelihood? How could your company help them solve their problems or make their lives better?

The best marketers have imagination and empathy. They, like novelists, playwrights, and poets, are able to imagine what it would be like to live like someone else. Some marketing organizations formalize this function by having one person who is assigned to represent the customers and always advocate for their interests. This can help prevent falling into the mindset of thinking of the customer only as a source of revenue or problems that have to be solved.

One common marketing technique is to create a persona or personas that represent your ideal customer(s). What is their age, sex, occupation? What are their interests, fears, etc.? This can help make sure that the customer is never just an unknown face or potential revenue source, but instead is a real person whom you need to be close to and understand as deeply as possible.

Once you’ve made the commitment to understand your customers, you’re ready to help solve their problems.

Third — Focus on Solving Your Customers’ Problems

Once you have your authentic voice down and you really know who your customer is and how they think, the third thing you need to do is focus on providing useful content. Useful content for your customers is content that solves a real problem they have. What’s causing them pain or what’s impeding them doing what they need or want to do? The customer may or may not know they have this pain. You might be creating a new need or desire for them by telling a story about how their life will be if they only had this thing, service, or experience. Help them dream of being on a riverboat in Europe, enjoying the pool in their backyard on a summer’s day, or showing off a new mobile phone to their friends at work.

By speaking to the needs of your customers, you’re helping them solve problems, but also forging a bond of trust and usefulness that will go forward in your relationship with them.

Mastering Blogging for Content Marketing

There are many ways to create and deliver content that is authentic and serves a need. Podcasts, vlogs, events, publications, words, pictures, music, and videos all can be effective delivery vehicles for quality content. Let's focus on one vehicle that can return exceptional results when done right, and that is blogging, which has worked well for Backblaze.

Backblaze didn’t just create a blog that then turned into an overnight success. Backblaze tried a number of marketing approaches that didn’t perform as the company hoped. The company wrote about these efforts on its blog, which is a major reason why the blog became a marketing success — it showed that the company was willing to talk about both its successes and its failures. You can read about some of these marketing adventures at As Seen on Ellen and How to Save Marketing Money by Being Nice. Forbes wrote about Backblaze’s marketing history in an article in 2013, One Startup Tried Every Marketing Ploy From ‘Ellen’ To Twitter: Here’s What Worked.

Backblaze on the Ellen Show

Backblaze billboard on Highway 101 in Silicon Valley, 2011

Backblaze decided early on that it would be as transparent as possible in its business practices. That meant that if there were no good reason not to release information, the company should release it, and the blog became the place where the company made that information public. Backblaze’s CEO Gleb Budman wrote about this commitment to transparency, and the results from it, in a blog post in 2017, The Decision on Transparency. An early example of this transparency is a 2010 post in which Backblaze analyzed why a proposed acquisition of the company failed, Backblaze online backup almost acquired — Breaking down the breakup. Companies rarely write about acquisitions that fall through.

Backblaze’s blog really took off in 2015 when the company decided to publish the statistics it had collected on the failure rate of hard drives in its data centers, Reliability Data Set For 41,000 Hard Drives Now Open Source. While many cloud companies routinely collected this kind of data, including Amazon, Google, and Microsoft, none had ever made it public. It turned out that readers were tremendously hungry for data on how hard drives performed, and Backblaze’s blog readership subsequently increased by hundreds of thousands of readers. Readers analyzed the drive failure data and debated which drives were the best for their own purposes. This was despite Backblaze’s disclaimer that how Backblaze used hard drives in its data centers didn’t really reflect how the drives would perform in other applications, including homes and businesses. Customers didn’t care. They were starved for the information and waited anxiously for the release of each new Drive Stats post.

It Turns Out That Blogging with Authenticity and Transparency is Rewarded

As Gilmore and Pine wrote in their book, Authenticity, “People increasingly see the world in terms of real and fake, and want to buy something real from someone genuine, not a fake from some phony.” How do you convince your customers that you’re real and genuine? The simple answer is to communicate honestly about who you are, which means sometimes telling them about your failures and mistakes and owning up to less than stellar performances by yourself or your company. Consider lifting the veil occasionally to reveal who you are. If you put the customer first, that shouldn’t be too hard even when you fall short. If your intentions are good, being transparent will almost always be rewarded with forgiveness and greater trust and loyalty from your customers.

Many companies created blogs thinking they had to because everyone else was and they started posting articles by their executives and product marketers going on about how great their products were. Then they were surprised when they got little traffic. These people didn’t get the message about how content should be used to help customers with their problems and build a relationship with them through authenticity and transparency.

If you have a blog, you could use that as a place to write about how you do business, the lessons you’ve learned, and yes, even the mistakes you’ve made. Don’t assume that all your company information needs to be protected. If at all possible, write about the tough issues and why you made the decisions you did. Your customers will respond because they don’t see that kind of frankness elsewhere and because they appreciate understanding the kind of company they’re paying for the product or service.

Your Blog Isn’t One Audience of Thousands or Millions, But Many Audiences of One

Don’t be afraid to write to a specific audience or group on your blog. You might have multiple audiences, but you might have specialized ones, as well. When you’re writing to an audience with specialized vocabulary or acronyms, don’t be afraid to use them. Other readers will recognize that the post is not for them and skip over it, or they’ll use it as an entry to a new area that interests them. If you try to make all your posts suitable for a homogeneous reader, you’ll end up with many readers leaving because you’re not speaking directly to them using their language.

If the piece is aimed at a novice or general audience, definitely take the time to explain unfamiliar concepts and spell out abbreviations and acronyms that might not be familiar to the reader. However, if the piece is aimed at a professional audience, avoid over-explaining; readers might conclude that the post isn't written for professionals and dismiss both the post and the blog as not suitable for them.

Strive to match the content, vocabulary, graphics, technical argot, and level of reading to the intended market segment. The goal is to make each reader feel that the piece was written specifically for him or her.

Taking Just OK Content Marketing and Making It Great

Authenticity, honesty, frankness, and sincerity are all qualities that to some degree or other are present in the best content. Unfortunately, marketers have the reputation for producing content that’s at the opposite end of the spectrum. Comedian George Burns could have been parodying a modern marketing course when he wrote, “To be a fine actor, you’ve got to be honest. And if you can fake that, you’ve got it made.”

There’s a reason that the recommendation to be authentic sounds like advice you’d get from your mom or dad about how to behave on your first date. We all learn sooner or later that if someone doesn’t like you for who you are, there’s no amount of faking being someone else that will make them like you for more than just a little while.

Be yourself, say something that really means something to you, and tell a story that connects with your audience and gives them some value. Those intangibles are hard to measure in metrics, but, when done well, might earn you an honest response, some respect, and perhaps a repeat visit.

The post Creating Great Content Marketing appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

FREE NOODS with FOODBEAST and Nissin

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/free-ramen-foodbeast-nissin/

Push a button and share a hashtag to get free ramen, games, or swag with the Dream Machine, a Raspberry Pi–driven vending machine built by FOODBEAST and Nissin.

foodbeast.com on Twitter

This Instagram-powered vending machine gives away FREE @OrigCupNoodles and VIDEO GAMES 🍜🎮!! Where should it travel next? #ad https://t.co/W0YyWOCFVv

Raspberry Pi and marketing

Digital viral marketing campaigns are super popular right now, thanks to the low cost of the technology necessary to build bespoke projects for them. From story-telling phoneboxes to beer-pouring bicycles, we see more and more examples of such projects appear in our inbox every week.

The latest campaign we like is the Dream Machine, a retrofit vending machine that dispenses ramen noodles, video games, and swag in exchange for the use of an Instagram hashtag.

Free ramen from FOODBEAST and Nissin

With Dream Machines in Torrance, California and Las Vegas, Nevada, I’ve yet to convince Liz that it’s worth the time and money for me to fly out and do some field research. But, as those who have interacted with a Dream Machine know, the premise is pretty simple.

Press the big yellow button on the front of the vending machine, and it will tell you a unique hashtag to use for posting a selfie with the Dream Machine on Instagram. The machine's internet-enabled Raspberry Pi brain then uses its magic noodle powers (or, more likely, custom software) to detect the hashtag and pop out a tasty treat, video game, or gift card as a reward.

The Dream Machine vending machine from FOODBEAST and Nissin

The Dream Machines appeared at the start of March, and online sources suggest they’ll stay in their current locations throughout the month. I’d like to take this moment to suggest their next locations: Cambridge, UK and Oakland, California. Please and thank you!

Hold your horses…

We know this is a marketing ploy. We know its intention is to get Joe Public to spread the brand across social media. We know it’s all about money. We know. But still, it’s cool, harmless, and delicious. So let’s not have another robocall debate, OK 😂

The post FREE NOODS with FOODBEAST and Nissin appeared first on Raspberry Pi.

New Data Privacy Regulations

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2018/06/new_data_privac.html

When Mark Zuckerberg testified before both the House and the Senate last month, it became immediately obvious that few US lawmakers had any appetite to regulate the pervasive surveillance taking place on the Internet.

Right now, the only way we can force these companies to take our privacy more seriously is through the market. But the market is broken. First, none of us do business directly with these data brokers. Equifax might have lost my personal data in 2017, but I can’t fire them because I’m not their customer or even their user. I could complain to the companies I do business with who sell my data to Equifax, but I don’t know who they are. Markets require voluntary exchange to work properly. If consumers don’t even know where these data brokers are getting their data from and what they’re doing with it, they can’t make intelligent buying choices.

This is starting to change, thanks to a new law in Vermont and another in Europe. And more legislation is coming.

Vermont first. At the moment, we don’t know how many data brokers collect data on Americans. Credible estimates range from 2,500 to 4,000 different companies. Last week, Vermont passed a law that will change that.

The law does several things to improve the security of Vermonters’ data, but several provisions matter to all of us. First, the law requires data brokers that trade in Vermonters’ data to register annually. And while there are many small local data brokers, the larger companies collect data nationally and even internationally. This will help us get a more accurate look at who’s in this business. The companies also have to disclose what opt-out options they offer, and how people can request to opt out. Again, this information is useful to all of us, regardless of the state we live in. And finally, the companies have to disclose the number of security breaches they’ve suffered each year, and how many individuals were affected.

Admittedly, the regulations imposed by the Vermont law are modest. Earlier drafts of the law included a provision requiring data brokers to disclose how many individuals' data they hold in their databases, what sorts of data they collect, and where the data came from, but those were removed as the bill negotiated its way into law. A more comprehensive law would allow individuals to demand exactly what information a data broker has about them — and maybe allow individuals to correct and even delete that data. But it's a start, and the first statewide law of its kind to be passed in the face of strong industry opposition.

Vermont isn’t the first to attempt this, though. On the other side of the country, Representative Norma Smith of Washington introduced a similar bill in both 2017 and 2018. It goes further, requiring disclosure of what kinds of data the broker collects. So far, the bill has stalled in the state’s legislature, but she believes it will have a much better chance of passing when she introduces it again in 2019. I am optimistic that this is a trend, and that many states will start passing bills forcing data brokers to be increasingly more transparent in their activities. And while their laws will be tailored to residents of those states, all of us will benefit from the information.

A 2018 California ballot initiative could help. Among its provisions, it gives consumers the right to demand exactly what information a data broker has about them. If it passes in November, once it takes effect, lots of Californians will take the list of data brokers from Vermont's registration law and demand this information based on their own law. And again, all of us — regardless of the state we live in — will benefit from the information.

We will also benefit from another, much more comprehensive, data privacy and security law from the European Union. The General Data Protection Regulation (GDPR) was passed in 2016 and took effect on 25 May. The details of the law are far too complex to explain here, but among other things, it mandates that personal data can only be collected and saved for specific purposes and only with the explicit consent of the user. We’ll learn who is collecting what and why, because companies that collect data are going to have to ask European users and customers for permission. And while this law only applies to EU citizens and people living in EU countries, the disclosure requirements will show all of us how these companies profit off our personal data.

It has already reaped benefits. Over the past couple of weeks, you’ve received many e-mails from companies that have you on their mailing lists. In the coming weeks and months, you’re going to see other companies disclose what they’re doing with your data. One early example is PayPal: in preparation for GDPR, it published a list of the over 600 companies it shares your personal data with. Expect a lot more like this.

Surveillance is the business model of the Internet. It’s not just the big companies like Facebook and Google watching everything we do online and selling advertising based on our behaviors; there’s also a large and largely unregulated industry of data brokers that collect, correlate and then sell intimate personal data about our behaviors. If we make the reasonable assumption that Congress is not going to regulate these companies, then we’re left with the market and consumer choice. The first step in that process is transparency. These new laws, and the ones that will follow, are slowly shining a light on this secretive industry.

This essay originally appeared in the Guardian.

Hiring a Director of Sales

Post Syndicated from Yev original https://www.backblaze.com/blog/hiring-a-director-of-sales/

Backblaze is hiring a Director of Sales. This is a critical role for Backblaze as we continue to grow the team. We need a strong leader who has experience in scaling a sales team and who has an excellent track record for exceeding goals by selling Software as a Service (SaaS) solutions. In addition, this leader will need to be highly motivated, as well as able to create and develop a highly motivated, success-oriented sales team that has fun and enjoys what they do.

The History of Backblaze from our CEO
In 2007, after a friend’s computer crash caused her some suffering, we realized that with every photo, video, song, and document going digital, everyone would eventually lose all of their information. Five of us quit our jobs to start a company with the goal of making it easy for people to back up their data.

Like many startups, for a while we worked out of a co-founder’s one-bedroom apartment. Unlike most startups, we made an explicit agreement not to raise funding during the first year. We would then touch base every six months and decide whether to raise or not. We wanted to focus on building the company and the product, not on pitching and slide decks. And critically, we wanted to build a culture that understood money comes from customers, not the magical VC giving tree. Over the course of 5 years we built a profitable, multi-million dollar revenue business — and only then did we raise a VC round.

Fast forward 10 years later and our world looks quite different. You’ll have some fantastic assets to work with:

  • A brand millions recognize for openness, ease-of-use, and affordability.
  • A computer backup service that stores over 500 petabytes of data, has recovered over 30 billion files for hundreds of thousands of paying customers — most of whom self-identify as being the people that find and recommend technology products to their friends.
  • Our B2 service that provides the lowest cost cloud storage on the planet at 1/4th the price Amazon, Google, or Microsoft charge. While a newer product on the market, it already has over 100,000 IT professionals and developers signed up, as well as an ecosystem building up around it.
  • A growing, profitable and cash-flow positive company.
  • And last, but most definitely not least: a great sales team.

You might be saying, “sounds like you’ve got this under control — why do you need me?” Don’t be misled. We need you. Here’s why:

  • We have a great team, but we are in the process of expanding and we need to develop a structure that will easily scale and provide the most success to drive revenue.
  • We just launched our outbound sales efforts and we need someone to help develop that into a fully successful program that’s building a strong pipeline and closing business.
  • We need someone to work with the marketing department and figure out how to generate more inbound opportunities that the sales team can follow up on and close.
  • We need someone who will work closely in developing the skills of our current sales team and build a path for career growth and advancement.
  • We want someone to manage our Customer Success program.

So that’s a bit about us. What are we looking for in you?

Experience: As a sales leader, you will strategically build and drive the territory's sales pipeline by assembling and leading a skilled team of sales professionals. This leader should be familiar with generating, developing and closing software subscription (SaaS) opportunities. We are looking for a self-starter who can manage a team and make an immediate impact selling our Backup and Cloud Storage solutions. In this role, the sales leader will work closely with the VP of Sales, marketing staff, and service staff to develop and implement specific strategic plans to achieve and exceed revenue targets, including new business acquisition as well as building out our customer success program.

Leadership: We have an experienced team who's brought us to where we are today. You need to have the people and management skills to get them excited about working with you. You need to be a strong leader who is passionate about developing and supporting your team.

Data driven and creative: The data has to show something makes sense before we scale it up. However, without creativity, it’s easy to say “the data shows it’s impossible” or to find a local maximum. Whether it’s deciding how to scale the team, figuring out what our outbound sales efforts should look like or putting a plan in place to develop the team for career growth, we’ve seen a bit of creativity get us places a few extra dollars couldn’t.

Jive with our culture: Strong leaders affect culture and the person we hire for this role may well shape, not only fit into, ours. But to shape the culture you have to be accepted by the organism, which means a certain set of shared values. We default to openness with our team, our customers, and everyone if possible. We love initiative — without arrogance or dictatorship. We work to create a place people enjoy showing up to work. That doesn’t mean ping pong tables and foosball (though we do try to have perks & fun), but it means people are friendly, non-political, working to build a good service but also a good place to work.

Do the work: Ideas and strategy are critical, but good execution makes them happen. We’re looking for someone who can help the team execute both from the perspective of being capable of guiding and organizing, but also someone who is hands-on themselves.

Additional Responsibilities needed for this role:

  • Recruit, coach, mentor, manage and lead a team of sales professionals to achieve yearly sales targets. This includes closing new business and expanding upon existing clientele.
  • Expand the customer success program to provide the best customer experience possible resulting in upsell opportunities and a high retention rate.
  • Develop effective sales strategies and deliver compelling product demonstrations and sales pitches.
  • Acquire and develop the appropriate sales tools to make the team efficient in their daily workflow.
  • Apply a thorough understanding of the marketplace, industry trends, funding developments, and products to all management activities and strategic sales decisions.
  • Ensure that sales department operations function smoothly, with the goal of facilitating sales and/or closings; operational responsibilities include accurate pipeline reporting and sales forecasts.
  • This position will report directly to the VP of Sales and will be staffed in our headquarters in San Mateo, CA.

Requirements:

  • 7 – 10+ years of successful sales leadership experience as measured by sales performance against goals.
  • Experience in developing skill sets and providing career growth and opportunities through advancement of team members.
  • Background in selling SaaS technologies with a strong track record of success.
  • Strong presentation and communication skills.
  • Must be able to travel occasionally nationwide.
  • BA/BS degree required

Think you want to join us on this adventure?
Send an email to jobscontact@backblaze.com with the subject “Director of Sales.” (Recruiters and agencies, please don’t email us.) Include a resume and answer these two questions:

  1. How would you approach evaluating the current sales team and what is your process for developing a growth strategy to scale the team?
  2. What are the goals you would set for yourself in the 3 month and 1-year timeframes?

Thank you for taking the time to read this and I hope that this sounds like the opportunity for which you’ve been waiting.

Backblaze is an Equal Opportunity Employer.

The post Hiring a Director of Sales appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Cryptocurrency Security Challenges

Post Syndicated from Roderick Bauer original https://www.backblaze.com/blog/cryptocurrency-security-challenges/

Physical coins representing cryptocurrencies

Most likely you’ve read the tantalizing stories of big gains from investing in cryptocurrencies. Someone who invested $1,000 into bitcoins five years ago would have over $85,000 in value now. Alternatively, someone who invested in bitcoins three months ago would have seen their investment lose 20% in value. Beyond the big price fluctuations, currency holders are possibly exposed to fraud, bad business practices, and even risk losing their holdings altogether if they are careless in keeping track of the all-important currency keys.

It’s certain that beyond the rewards and risks, cryptocurrencies are here to stay. We can’t ignore how they are changing the game for how money is handled between people and businesses.

Some Advantages of Cryptocurrency

  • Cryptocurrency is accessible to anyone.
  • Decentralization means the network operates on a user-to-user (or peer-to-peer) basis.
  • Transactions can be completed for a fraction of the expense and time required to complete traditional asset transfers.
  • Transactions are digital and cannot be counterfeited or reversed arbitrarily by the sender, as with credit card charge-backs.
  • There aren’t usually transaction fees for cryptocurrency exchanges.
  • Cryptocurrency allows the cryptocurrency holder to send exactly what information is needed and no more to the merchant or recipient, even permitting anonymous transactions (for good or bad).
  • Cryptocurrency operates at the universal level and hence makes transactions easier internationally.
  • There is no other electronic cash system in which your account isn’t owned by someone else.

On top of all that, blockchain, the underlying technology behind cryptocurrencies, is already being applied to a variety of business needs and is itself becoming a hot sector of the tech economy. Blockchain is bringing traceability and cost-effectiveness to supply-chain management (which also improves quality assurance in areas such as food), reducing errors and improving accounting accuracy, enabling smart contracts that can be automatically validated, signed, and enforced through a blockchain construct, opening the possibility of secure online voting, and much more.

Like any new, booming market, there are risks involved in these new currencies. Anyone venturing into this domain needs to have their eyes wide open. While the opportunities for making money are real, there are even more ways to lose money.

We’re going to cover two primary approaches to staying safe and avoiding fraud and loss when dealing with cryptocurrencies. The first is to thoroughly vet any person or company you’re dealing with to judge whether they are ethical and likely to succeed in their business segment. The second is keeping your critical cryptocurrency keys safe, which we’ll deal with in this and a subsequent post.

Caveat Emptor — Buyer Beware

The short history of cryptocurrency has already seen the demise of a number of companies that claimed to manage, mine, trade, or otherwise help their customers profit from cryptocurrency. Mt. Gox, GAW Miners, and OneCoin are just three of the many companies that disappeared with their users’ money. This is the traditional equivalent of your bank going out of business and zeroing out your checking account in the process.

That doesn’t happen with banks because of regulatory oversight. But with cryptocurrency, you need to take the time to investigate any company you use to manage or trade your currencies. How long have they been around? Who are their investors? Are they affiliated with any reputable financial institutions? What is the record of their founders and executive management? These are all important questions to consider when evaluating a company in this new space.

Would you give the keys to your house to a service or person you didn’t thoroughly know and trust? Some companies that enable you to buy and sell currencies online will routinely hold your currency keys, which gives them the ability to do anything they want with your holdings, including selling them and pocketing the proceeds if they wish.

That doesn’t mean you shouldn’t ever allow a company to keep your currency keys in escrow. It simply means that you better know with whom you’re doing business and if they’re trustworthy enough to be given that responsibility.

Keys To the Cryptocurrency Kingdom — Public and Private

If you’re an owner of cryptocurrency, you know how this all works. If you’re not, bear with me for a minute while I bring everyone up to speed.

Cryptocurrency has no physical manifestation, such as bills or coins. It exists purely as a computer record. And unlike currencies maintained by governments, such as the U.S. dollar, there is no central authority regulating its distribution and value. Cryptocurrencies use a technology called blockchain, which is a decentralized way of keeping track of transactions. There are many copies of a given blockchain, so no single central authority is needed to validate its authenticity or accuracy.

The validity of each cryptocurrency is determined by a blockchain. A blockchain is a continuously growing list of records, called "blocks", which are linked and secured using cryptography. Blockchains by design are inherently resistant to modification of the data. They perform as an open, distributed ledger that can record transactions between two parties efficiently and in a verifiable, permanent way. A blockchain is typically managed by a peer-to-peer network collectively adhering to a protocol for validating new blocks. Once recorded, the data in any given block cannot be altered retroactively without the alteration of all subsequent blocks, which requires collusion of the network majority. On a scaled network, this level of collusion is practically impossible — making blockchain networks effectively immutable and trustworthy.
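To make the idea of blocks being "linked and secured using cryptography" concrete, here is a toy Python sketch (my own illustration, not any real cryptocurrency's code) showing how each block commits to the hash of the one before it, so tampering with an early block invalidates everything after it:

#!/usr/bin/env python
import hashlib
import json

def block_hash(block):
    # Hash the block's contents, including the previous block's hash.
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

GENESIS = "0" * 64

# Build a tiny three-block chain.
chain = []
prev = GENESIS
for txs in (["alice->bob:5"], ["bob->carol:2"], ["carol->dave:1"]):
    block = {"prev_hash": prev, "transactions": txs}
    chain.append(block)
    prev = block_hash(block)

def verify(chain):
    prev = GENESIS
    for block in chain:
        if block["prev_hash"] != prev:
            return False
        prev = block_hash(block)
    return True

print(verify(chain))                            # True
chain[0]["transactions"] = ["alice->bob:500"]   # tamper with an early block
print(verify(chain))                            # False: every later link breaks

Real blockchains add a consensus mechanism such as proof-of-work and peer-to-peer replication on top of this basic hash-linking, but the tamper-evidence comes from the same idea.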

Blockchain process

The other element common to all cryptocurrencies is their use of public and private keys, which are stored in the currency’s wallet. A cryptocurrency wallet stores the public and private “keys” or “addresses” that can be used to receive or spend the cryptocurrency. With the private key, it is possible to write in the public ledger (blockchain), effectively spending the associated cryptocurrency. With the public key, it is possible for others to send currency to the wallet.
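If you're curious what that looks like in code, here is a minimal, illustrative Python sketch using the third-party ecdsa package (pip install ecdsa). It is a simplification for explanation only; real wallets add standardized address encoding, key derivation, and careful key storage, and you should never manage real funds this way:

#!/usr/bin/env python
import hashlib
import ecdsa  # third-party package: pip install ecdsa

# The private key is essentially a very large random number;
# the public key is derived from it mathematically.
private_key = ecdsa.SigningKey.generate(curve=ecdsa.SECP256k1)
public_key = private_key.get_verifying_key()

# An address is typically a hash of the public key (simplified here to a bare SHA-256).
address = hashlib.sha256(public_key.to_string()).hexdigest()
print("Share this to receive funds:", address)

# Spending means signing a transaction with the private key...
transaction = b"send 0.1 coin to Bob"
signature = private_key.sign(transaction)

# ...and anyone holding the public key can verify the signature
# without ever seeing the private key.
print(public_key.verify(signature, transaction))  # True

Lose the private key and nothing in the sketch above can recover it, which is exactly why the rest of this post is about keeping keys safe.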

What is a cryptocurrency address?

Cryptocurrency “coins” can be lost if the owner loses the private keys needed to spend the currency they own. It’s as if the owner had lost a bank account number and had no way to verify their identity to the bank, or if they lost the U.S. dollars they had in their wallet. The assets are gone and unusable.

The Cryptocurrency Wallet

Given the importance of these keys, and lack of recourse if they are lost, it’s obviously very important to keep track of your keys.

If you’re being careful in choosing reputable exchanges, app developers, and other services with whom to trust your cryptocurrency, you’ve made a good start in keeping your investment secure. But if you’re careless in managing the keys to your bitcoins, ether, Litecoin, or other cryptocurrency, you might as well leave your money on a cafe tabletop and walk away.

What Are the Differences Between Hot and Cold Wallets?

Just like other numbers you might wish to keep track of — credit cards, account numbers, phone numbers, passphrases — cryptocurrency keys can be stored in a variety of ways. Those who use their currencies for day-to-day purchases most likely will want them handy in a smartphone app, hardware key, or debit card that can be used for purchases. These are called “hot” wallets. Some experts advise keeping the balances in these devices and apps to a minimal amount to avoid hacking or data loss. We typically don’t walk around with thousands of dollars in U.S. currency in our old-style wallets, so this is really a continuation of the same approach to managing spending money.

Bread mobile app screenshot

A “hot” wallet, the Bread mobile app

Some investors with large balances keep their keys in "cold" wallets, or "cold storage," i.e., a device or location that is not connected to the internet. If funds are needed for purchases, they can be transferred to a more easily used payment medium. Cold wallets can be hardware devices, USB drives, or even paper copies of your keys.

Trezor hardware wallet

A “cold” wallet, the Trezor hardware wallet

Ledger Nano S hardware wallet

A “cold” wallet, the Ledger Nano S

Bitcoin paper wallet

A “cold” Bitcoin paper wallet

Wallets are suited to holding one or more specific cryptocurrencies, and some people have multiple wallets for different currencies and different purposes.

A paper wallet is nothing other than a printed record of your public and private keys. Some prefer their records to be completely disconnected from the internet, and a piece of paper serves that need. Just like writing down an account password on paper, however, it’s essential to keep the paper secure to avoid giving someone the ability to freely access your funds.

How to Keep Your Keys and Cryptocurrency Secure

In a post this coming Thursday, Securing Your Cryptocurrency, we'll discuss the best strategies for backing up your cryptocurrency so that your currencies don't become part of the millions that have been lost. We'll cover the common (and uncommon) approaches to backing up hot wallets and cold wallets, and to using paper and metal solutions to keep your keys safe.

In the meantime, please tell us of your experiences with cryptocurrencies — good and bad — and how you’ve dealt with the issue of cryptocurrency security.

The post Cryptocurrency Security Challenges appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

The intersection of Customer Engagement and Data Science

Post Syndicated from Brent Meyer original https://aws.amazon.com/blogs/messaging-and-targeting/the-intersection-of-customer-engagement-and-data-science/

On the Messaging and Targeting team, we’re constantly inspired by the new and novel ways that customers use our services. For example, last year we took an in-depth look at a customer who built a fully featured email marketing platform based on Amazon SES and other AWS Services.

This week, our friends on the AWS Machine Learning team published a blog post that brings together the worlds of data science and customer engagement. Their solution uses Amazon SageMaker (a platform for building and deploying machine learning models) to create a system that makes purchasing predictions based on customers’ past behaviors. It then uses Amazon Pinpoint to send campaigns to customers based on these predictions.

The blog post is an interesting read that includes a primer on the process of creating a useful Machine Learning solution. It then goes in-depth, discussing the real-world considerations that are involved in implementing the solution.

Take a look at their post, Amazon Pinpoint campaigns driven by machine learning on Amazon SageMaker, on the AWS Machine Learning Blog.

Analyze data in Amazon DynamoDB using Amazon SageMaker for real-time prediction

Post Syndicated from YongSeong Lee original https://aws.amazon.com/blogs/big-data/analyze-data-in-amazon-dynamodb-using-amazon-sagemaker-for-real-time-prediction/

Many companies across the globe use Amazon DynamoDB to store and query historical user-interaction data. DynamoDB is a fast NoSQL database used by applications that need consistent, single-digit millisecond latency.

Often, customers want to turn their valuable data in DynamoDB into insights by analyzing a copy of their table stored in Amazon S3. Doing this separates their analytical queries from their low-latency critical paths. This data can be the primary source for understanding customers’ past behavior, predicting future behavior, and generating downstream business value. Customers often turn to DynamoDB because of its great scalability and high availability. After a successful launch, many customers want to use the data in DynamoDB to predict future behaviors or provide personalized recommendations.

DynamoDB is a good fit for low-latency reads and writes, but it's not practical to scan all data in a DynamoDB database to train a model. In this post, I demonstrate how you can use DynamoDB table data copied to Amazon S3 by AWS Data Pipeline to predict customer behavior. I also demonstrate how you can use this data to provide personalized recommendations for customers using Amazon SageMaker. You can also run ad hoc queries using Amazon Athena against the data. DynamoDB recently released on-demand backups to create full table backups with no performance impact. However, that feature is not suitable for our purposes in this post, so I chose AWS Data Pipeline instead to create managed backups that are accessible from other services.

To do this, I describe how to read the DynamoDB backup file format in Data Pipeline. I also describe how to convert the objects in S3 to a CSV format that Amazon SageMaker can read. In addition, I show how to schedule regular exports and transformations using Data Pipeline. The sample data used in this post is from the UCI Machine Learning Repository's Bank Marketing Data Set.

The solution that I describe provides the following benefits:

  • Separates analytical queries from production traffic on your DynamoDB table, preserving your DynamoDB read capacity units (RCUs) for important production requests
  • Automatically updates your model to get real-time predictions
  • Optimizes for performance (so it doesn’t compete with DynamoDB RCUs after the export) and for cost (using data you already have)
  • Makes it easier for developers of all skill levels to use Amazon SageMaker

All code and data set in this post are available in this .zip file.

Solution architecture

The following diagram shows the overall architecture of the solution.

The steps that data follows through the architecture are as follows:

  1. Data Pipeline regularly copies the full contents of a DynamoDB table as JSON into an S3 bucket.
  2. Exported JSON files are converted to comma-separated value (CSV) format to use as a data source for Amazon SageMaker.
  3. Amazon SageMaker renews the model artifact and updates the endpoint.
  4. The converted CSV is available for ad hoc queries with Amazon Athena.
  5. Data Pipeline controls this flow and repeats the cycle based on the schedule defined by customer requirements.

Building the auto-updating model

This section discusses details about how to read the DynamoDB exported data in Data Pipeline and build automated workflows for real-time prediction with a regularly updated model.

Download sample scripts and data

Before you begin, take the following steps:

  1. Download sample scripts in this .zip file.
  2. Unzip the src.zip file.
  3. Find the automation_script.sh file and edit it for your environment. For example, you need to replace 's3://<your bucket>/<datasource path>/' with your own S3 path to the data source for Amazon SageMaker. In the script, the text enclosed by angle brackets—< and >—should be replaced with your own path.
  4. Upload the json-serde-1.3.6-SNAPSHOT-jar-with-dependencies.jar file to your S3 path so that the ADD jar command in Apache Hive can refer to it.

For this solution, the banking.csv file should be imported into a DynamoDB table.

Export a DynamoDB table

To export the DynamoDB table to S3, open the Data Pipeline console and choose the Export DynamoDB table to S3 template. In this template, Data Pipeline creates an Amazon EMR cluster and performs an export in the EMRActivity activity. Set proper intervals for backups according to your business requirements.

One core node (m3.xlarge) provides the default capacity for the EMR cluster and should be suitable for the solution in this post. Leave the option to resize the cluster before running enabled in the TableBackupActivity activity so that Data Pipeline can scale the cluster to match the table size. The process of converting to CSV format and renewing models happens in this EMR cluster.

For a more in-depth look at how to export data from DynamoDB, see Export Data from DynamoDB in the Data Pipeline documentation.

Add the script to an existing pipeline

After you export your DynamoDB table, you add an additional EMR step to EMRActivity by following these steps:

  1. Open the Data Pipeline console and choose the ID for the pipeline that you want to add the script to.
  2. For Actions, choose Edit.
  3. In the editing console, choose the Activities category and add an EMR step using the custom script downloaded in the previous section, as shown below.

Paste the following command into the new step after the data upload step:

s3://#{myDDBRegion}.elasticmapreduce/libs/script-runner/script-runner.jar,s3://<your bucket name>/automation_script.sh,#{output.directoryPath},#{myDDBRegion}

The element #{output.directoryPath} references the S3 path where the data pipeline exports DynamoDB data as JSON. The path should be passed to the script as an argument.

The bash script has two goals, converting data formats and renewing the Amazon SageMaker model. Subsequent sections discuss the contents of the automation script.

Automation script: Convert JSON data to CSV with Hive

We use Apache Hive to transform the data into a new format. The Hive QL script to create an external table and transform the data is included in the custom script that you added to the Data Pipeline definition.

When you run the Hive scripts, do so with the -e option. Also, define the Hive table with the 'org.openx.data.jsonserde.JsonSerDe' row format to parse and read JSON format. The SQL creates a Hive EXTERNAL table, and it reads the DynamoDB backup data on the S3 path passed to it by Data Pipeline.
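To picture what the Hive job does to each record, here is a rough, hypothetical Python equivalent of the unwrapping step. It assumes each exported line is a JSON object whose values are DynamoDB type wrappers such as {"s": "..."} or {"n": "..."}, with field names taken from the sample dataset:

#!/usr/bin/env python
import csv
import json
import sys

# Column order mirrors the SELECT in the Hive script (target attribute y last).
COLUMNS = ["customer_id", "age", "job", "marital", "education", "default",
           "housing", "loan", "contact", "month", "day_of_week", "duration",
           "campaign", "pdays", "previous", "poutcome", "emp_var_rate",
           "cons_price_idx", "cons_conf_idx", "euribor3m", "nr_employed", "y"]

def unwrap(attr):
    # Each exported value is wrapped with its DynamoDB type, e.g. {"n": "34"} or {"s": "admin."}.
    return next(iter(attr.values())) if attr else ""

writer = csv.writer(sys.stdout)
for line in sys.stdin:                 # one exported JSON record per line
    record = json.loads(line)
    writer.writerow([unwrap(record.get(name, {})) for name in COLUMNS])

The Hive approach below does the same unwrapping at scale on the EMR cluster, which is why the external table is declared with map<string,string> columns and accessed with keys like ['s'] and ['n'].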

Note: You should create the table with the “EXTERNAL” keyword to avoid the backup data being accidentally deleted from S3 if you drop the table.

The full automation script for the conversion follows. Replace the placeholders in angle brackets with your own bucket name and data source path.

#!/bin/bash
hive -e "
ADD jar s3://<your bucket name>/json-serde-1.3.6-SNAPSHOT-jar-with-dependencies.jar ; 
DROP TABLE IF EXISTS blog_backup_data ;
CREATE EXTERNAL TABLE blog_backup_data (
 customer_id map<string,string>,
 age map<string,string>, job map<string,string>, 
 marital map<string,string>,education map<string,string>, 
 default map<string,string>, housing map<string,string>,
 loan map<string,string>, contact map<string,string>, 
 month map<string,string>, day_of_week map<string,string>, 
 duration map<string,string>, campaign map<string,string>,
 pdays map<string,string>, previous map<string,string>, 
 poutcome map<string,string>, emp_var_rate map<string,string>, 
 cons_price_idx map<string,string>, cons_conf_idx map<string,string>,
 euribor3m map<string,string>, nr_employed map<string,string>, 
 y map<string,string> ) 
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' 
LOCATION '$1/';

INSERT OVERWRITE DIRECTORY 's3://<your bucket name>/<datasource path>/' 
SELECT concat( customer_id['s'],',', 
 age['n'],',', job['s'],',', 
 marital['s'],',', education['s'],',', default['s'],',', 
 housing['s'],',', loan['s'],',', contact['s'],',', 
 month['s'],',', day_of_week['s'],',', duration['n'],',', 
 campaign['n'],',',pdays['n'],',',previous['n'],',', 
 poutcome['s'],',', emp_var_rate['n'],',', cons_price_idx['n'],',',
 cons_conf_idx['n'],',', euribor3m['n'],',', nr_employed['n'],',', y['n'] ) 
FROM blog_backup_data
WHERE customer_id['s'] > 0 ;
"

After creating the external table, you read its data and use the INSERT OVERWRITE DIRECTORY ... SELECT command to write CSV data to the S3 path that you designated as the data source for Amazon SageMaker.

Depending on your requirements, you can eliminate or process the columns in the SELECT clause in this step to optimize data analysis. For example, you might remove some columns that have unpredictable correlations with the target value, because keeping the wrong columns might expose your model to "overfitting" during training. In this post, the customer_id column is removed. Overfitting can make your predictions weak. More information about overfitting can be found in the topic Model Fit: Underfitting vs. Overfitting in the Amazon ML documentation.

Automation script: Renew the Amazon SageMaker model

After the CSV data is replaced and ready to use, create a new model artifact for Amazon SageMaker with the updated dataset on S3. To renew the model artifact, you must create a new training job. Training jobs can be run using the AWS SDK (for example, boto3 for Amazon SageMaker), the Amazon SageMaker Python SDK (installed with the pip install sagemaker command), or the AWS CLI for Amazon SageMaker, which is what this post uses.

In addition, consider how to renew your existing model smoothly and without service impact, because your model is called by applications in real time. To do this, you first create a new endpoint configuration and then update the current endpoint to use it.
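For readers who prefer the SDK, here is a minimal boto3 sketch of that create-config-then-swap flow. The model name and region are placeholders for whatever your training step produced; the endpoint name matches the "ServiceEndpoint" used elsewhere in this post.

#!/usr/bin/env python
import time
import boto3
from botocore.exceptions import ClientError

sagemaker = boto3.client("sagemaker", region_name="us-west-2")  # adjust region

model_name = "MODEL-2018-05-01-00-00-00"   # placeholder: created by the training step
config_name = "CONFIG-{}".format(time.strftime("%Y-%m-%d-%H-%M-%S"))
endpoint_name = "ServiceEndpoint"

# 1. Create a fresh endpoint configuration pointing at the new model.
sagemaker.create_endpoint_config(
    EndpointConfigName=config_name,
    ProductionVariants=[{
        "VariantName": "Users",
        "ModelName": model_name,
        "InitialInstanceCount": 1,
        "InstanceType": "ml.m4.xlarge",
    }],
)

# 2. Swap the live endpoint over to it, or create the endpoint the first time.
try:
    sagemaker.describe_endpoint(EndpointName=endpoint_name)
    sagemaker.update_endpoint(EndpointName=endpoint_name,
                              EndpointConfigName=config_name)
except ClientError:
    sagemaker.create_endpoint(EndpointName=endpoint_name,
                              EndpointConfigName=config_name)

The full bash/CLI version used by the automation script follows.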

#!/bin/bash
## Define variable 
REGION=$2
DTTIME=`date +%Y-%m-%d-%H-%M-%S`
ROLE="<your AmazonSageMaker-ExecutionRole>" 


# Select containers image based on region.  
case "$REGION" in
"us-west-2" )
    IMAGE="174872318107.dkr.ecr.us-west-2.amazonaws.com/linear-learner:latest"
    ;;
"us-east-1" )
    IMAGE="382416733822.dkr.ecr.us-east-1.amazonaws.com/linear-learner:latest" 
    ;;
"us-east-2" )
    IMAGE="404615174143.dkr.ecr.us-east-2.amazonaws.com/linear-learner:latest" 
    ;;
"eu-west-1" )
    IMAGE="438346466558.dkr.ecr.eu-west-1.amazonaws.com/linear-learner:latest" 
    ;;
 *)
    echo "Invalid Region Name"
    exit 1 ;  
esac

# Start training job and creating model artifact 
TRAINING_JOB_NAME=TRAIN-${DTTIME} 
S3OUTPUT="s3://<your bucket name>/model/" 
INSTANCETYPE="ml.m4.xlarge"
INSTANCECOUNT=1
VOLUMESIZE=5 
aws sagemaker create-training-job --training-job-name ${TRAINING_JOB_NAME} --region ${REGION}  --algorithm-specification TrainingImage=${IMAGE},TrainingInputMode=File --role-arn ${ROLE}  --input-data-config '[{ "ChannelName": "train", "DataSource": { "S3DataSource": { "S3DataType": "S3Prefix", "S3Uri": "s3://<your bucket name>/<datasource path>/", "S3DataDistributionType": "FullyReplicated" } }, "ContentType": "text/csv", "CompressionType": "None" , "RecordWrapperType": "None"  }]'  --output-data-config S3OutputPath=${S3OUTPUT} --resource-config  InstanceType=${INSTANCETYPE},InstanceCount=${INSTANCECOUNT},VolumeSizeInGB=${VOLUMESIZE} --stopping-condition MaxRuntimeInSeconds=120 --hyper-parameters feature_dim=20,predictor_type=binary_classifier  

# Wait until job completed 
aws sagemaker wait training-job-completed-or-stopped --training-job-name ${TRAINING_JOB_NAME}  --region ${REGION}

# Get newly created model artifact and create model
MODELARTIFACT=`aws sagemaker describe-training-job --training-job-name ${TRAINING_JOB_NAME} --region ${REGION}  --query 'ModelArtifacts.S3ModelArtifacts' --output text `
MODELNAME=MODEL-${DTTIME}
aws sagemaker create-model --region ${REGION} --model-name ${MODELNAME}  --primary-container Image=${IMAGE},ModelDataUrl=${MODELARTIFACT}  --execution-role-arn ${ROLE}

# create a new endpoint configuration 
CONFIGNAME=CONFIG-${DTTIME}
aws sagemaker  create-endpoint-config --region ${REGION} --endpoint-config-name ${CONFIGNAME}  --production-variants  VariantName=Users,ModelName=${MODELNAME},InitialInstanceCount=1,InstanceType=ml.m4.xlarge

# create or update the endpoint
STATUS=`aws sagemaker describe-endpoint --endpoint-name  ServiceEndpoint --query 'EndpointStatus' --output text --region ${REGION} `
if [[ "$STATUS" != "InService" ]] ;
then
    aws sagemaker  create-endpoint --endpoint-name  ServiceEndpoint  --endpoint-config-name ${CONFIGNAME} --region ${REGION}    
else
    aws sagemaker  update-endpoint --endpoint-name  ServiceEndpoint  --endpoint-config-name ${CONFIGNAME} --region ${REGION}
fi

Grant permission

Before you execute the script, you must grant proper permission to Data Pipeline. Data Pipeline uses the DataPipelineDefaultResourceRole role by default. I added the following policy to DataPipelineDefaultResourceRole to allow Data Pipeline to create, delete, and update the Amazon SageMaker model and data source in the script.

{
 "Version": "2012-10-17",
 "Statement": [
 {
 "Effect": "Allow",
 "Action": [
 "sagemaker:CreateTrainingJob",
 "sagemaker:DescribeTrainingJob",
 "sagemaker:CreateModel",
 "sagemaker:CreateEndpointConfig",
 "sagemaker:DescribeEndpoint",
 "sagemaker:CreateEndpoint",
 "sagemaker:UpdateEndpoint",
 "iam:PassRole"
 ],
 "Resource": "*"
 }
 ]
}

Use real-time prediction

After you deploy a model into production using Amazon SageMaker hosting services, your client applications use this API to get inferences from the model hosted at the specified endpoint. This approach is useful for interactive web, mobile, or desktop applications.

Following, I provide a simple Python code example that queries the Amazon SageMaker endpoint by its name ("ServiceEndpoint") and uses the response for real-time prediction.

=== Python sample for real-time prediction ===

#!/usr/bin/env python
import boto3
import json 

client = boto3.client('sagemaker-runtime', region_name ='<your region>' )
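# One CSV row with the same 20 feature values (in training order) that the model expects.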
new_customer_info = '34,10,2,4,1,2,1,1,6,3,190,1,3,4,3,-1.7,94.055,-39.8,0.715,4991.6'
response = client.invoke_endpoint(
    EndpointName='ServiceEndpoint',
    Body=new_customer_info, 
    ContentType='text/csv'
)
result = json.loads(response['Body'].read().decode())
print(result)
--- output(response) ---
{u'predictions': [{u'score': 0.7528127431869507, u'predicted_label': 1.0}]}

Solution summary

The solution takes the following steps:

  1. Data Pipeline exports DynamoDB table data into S3. The original JSON data should be kept to recover the table in the rare event that this is needed. Data Pipeline then converts JSON to CSV so that Amazon SageMaker can read the data. Note: You should select only meaningful attributes when you convert to CSV. For example, if you judge that the "campaign" attribute is not correlated, you can eliminate this attribute from the CSV.
  2. Train the Amazon SageMaker model with the new data source.
  3. When a new customer comes to your site, you can judge how likely it is for this customer to subscribe to your new product based on the prediction score returned by Amazon SageMaker.
  4. If the new user subscribes to your new product, your application must update the attribute "y" to the value 1 (for yes), as sketched just after this list. This updated data is provided for the next model renewal as a new data source and serves to improve the accuracy of your predictions. With each new entry, your application can become smarter and deliver better predictions.
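A minimal boto3 sketch of that write-back follows; the table name (Customer) and key attribute (customer_id) are placeholders for your own schema:

#!/usr/bin/env python
import boto3

dynamodb = boto3.resource("dynamodb", region_name="us-west-2")  # adjust region
table = dynamodb.Table("Customer")  # placeholder table name

def record_subscription(customer_id):
    # Flip the target attribute so the next scheduled export and training run learn from it.
    table.update_item(
        Key={"customer_id": customer_id},
        UpdateExpression="SET #y = :subscribed",
        ExpressionAttributeNames={"#y": "y"},
        ExpressionAttributeValues={":subscribed": 1},
    )

record_subscription("34")  # example key value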

Running ad hoc queries using Amazon Athena

Amazon Athena is a serverless query service that makes it easy to analyze large amounts of data stored in Amazon S3 using standard SQL. Athena is useful for examining data and collecting statistics or informative summaries about data. You can also use the powerful analytic functions of Presto, as described in the topic Aggregate Functions of Presto in the Presto documentation.

With the Data Pipeline scheduled activity, recent CSV data is always located in S3 so that you can run ad hoc queries against the data using Amazon Athena. I show this with example SQL statements following. For an in-depth description of this process, see the post Interactive SQL Queries for Data in Amazon S3 on the AWS News Blog. 

Creating an Amazon Athena table and running it

You can simply create an EXTERNAL table for the CSV data on S3 in the Amazon Athena Management Console.

=== Table Creation ===
CREATE EXTERNAL TABLE datasource (
 age int, 
 job string, 
 marital string , 
 education string, 
 default string, 
 housing string, 
 loan string, 
 contact string, 
 month string, 
 day_of_week string, 
 duration int, 
 campaign int, 
 pdays int , 
 previous int , 
 poutcome string, 
 emp_var_rate double, 
 cons_price_idx double,
 cons_conf_idx double, 
 euribor3m double, 
 nr_employed double, 
 y int 
)
ROW FORMAT DELIMITED 
FIELDS TERMINATED BY ',' ESCAPED BY '\\' LINES TERMINATED BY '\n' 
LOCATION 's3://<your bucket name>/<datasource path>/';

The following query calculates the correlation coefficient between the target attribute and other attributes using Amazon Athena.

=== Sample Query ===

SELECT corr(age,y) AS correlation_age_and_target, 
 corr(duration,y) AS correlation_duration_and_target, 
 corr(campaign,y) AS correlation_campaign_and_target,
 corr(contact,y) AS correlation_contact_and_target
FROM ( SELECT age , duration , campaign , y , 
 CASE WHEN contact = 'telephone' THEN 1 ELSE 0 END AS contact 
 FROM datasource 
 ) datasource ;

Conclusion

In this post, I introduce an example of how to analyze data in DynamoDB by using table data in Amazon S3 to optimize DynamoDB table read capacity. You can then use the analyzed data as a new data source to train an Amazon SageMaker model for accurate real-time prediction. In addition, you can run ad hoc queries against the data on S3 using Amazon Athena. I also present how to automate these procedures by using Data Pipeline.

You can adapt this example to your specific use case at hand, and hopefully this post helps you accelerate your development. You can find more examples and use cases for Amazon SageMaker in the video AWS 2017: Introducing Amazon SageMaker on the AWS website.

 


Additional Reading

If you found this post useful, be sure to check out Serving Real-Time Machine Learning Predictions on Amazon EMR and Analyzing Data in S3 using Amazon Athena.

 


About the Author

Yong Seong Lee is a Cloud Support Engineer for AWS Big Data Services. He is interested in every technology related to data/databases and helping customers who have difficulties in using AWS services. His motto is “Enjoy life, be curious and have maximum experience.”

 

 

OMG The Stupid It Burns

Post Syndicated from Robert Graham original https://blog.erratasec.com/2018/04/omg-stupid-it-burns.html

This article, pointed out by @TheGrugq, is stupid enough that it’s worth rebutting.

The article starts with the question “Why did the lessons of Stuxnet, Wannacry, Heartbleed and Shamoon go unheeded?”. It then proceeds to ignore the lessons of those things.
Some of the actual lessons should be things like how Stuxnet crossed air gaps, how Wannacry spread through flat Windows networking, how Heartbleed comes from technical debt, and how Shamoon furthers state aims by causing damage.
But this article doesn’t cover the technical lessons. Instead, it thinks the lesson should be the moral lesson, that we should take these things more seriously. But that’s stupid. It’s the sort of lesson people teach you that know nothing about the topic. When you have nothing of value to contribute to a topic you can always take the moral high road and criticize everyone for being morally weak for not taking it more seriously. Obviously, since doctors haven’t cured cancer yet, it’s because they don’t take the problem seriously.
The article continues to ignore the lesson of these cyber attacks and instead regales us with a list of military lessons from WW I and WW II. This makes the same flaw that many in the military make, trying to understand cyber through analogies with the real world. It’s not that such lessons could have no value, it’s that this article contains a poor list of them. It seems to consist of a random list of events that appeal to the author rather than events that have bearing on cybersecurity.
Then, in case we don’t get the point, the article bullies us with hyperbole, cliches, buzzwords, bombastic language, famous quotes, and citations. It’s hard to see how most of them actually apply to the text. Rather, it seems like they are included simply because he really really likes them.
The article invests much effort in discussing the buzzword “OODA loop”. Most attacks in cyberspace don’t have one. Instead, attackers flail around, trying lots of random things, overcoming defense with brute-force rather than an understanding of what’s going on. That’s obviously the case with Wannacry: it was an accident, with the perpetrator experimenting with what would happen if they added the ETERNALBLUE exploit to their existing ransomware code. The consequence was beyond anybody’s ability to predict.
You might claim that this is just the first stage, that they’ll loop around, observe Wannacry’s effects, orient themselves, decide, then act upon what they learned. Nope. Wannacry burned the exploit. It’s essentially removed any vulnerable systems from the public Internet, thereby making it impossible to use what they learned. It’s still active a year later, with infected systems behind firewalls busily scanning the Internet so that if you put a new system online that’s vulnerable, it’ll be taken offline within a few hours, before any other evildoer can take advantage of it.
See what I’m doing here? Learning the actual lessons of things like Wannacry? The thing the above article fails to do??
The article has a humorous paragraph on "defense in depth", misunderstanding the term. To be fair, it's the cybersecurity industry's fault: it adopted and then redefined the term. That's why there are two separate articles on Wikipedia: one for the old military term (as used in this article) and one for the new cybersecurity term.
As used in the cybersecurity industry, “defense in depth” means having multiple layers of security. Many organizations put all their defensive efforts on the perimeter, and none inside a network. The idea of “defense in depth” is to put more defenses inside the network. For example, instead of just one firewall at the edge of the network, put firewalls inside the network to segment different subnetworks from each other, so that a ransomware infection in the customer support computers doesn’t spread to sales and marketing computers.
The article talks about exploiting WiFi chips to bypass defense in depth measures like browser sandboxes. This conflates different types of attacks. A WiFi attack is usually considered a local attack, from somebody next to you in a bar, rather than a remote attack from a server in Russia. Moreover, far from disproving "defense in depth", such WiFi attacks highlight the need for it. Namely, phones need to be designed so that successful exploitation of other microprocessors (namely, the WiFi, Bluetooth, and cellular baseband chips) can't directly compromise the host system. In other words, once exploited with "Broadpwn", a hacker would need to extend the exploit chain with another vulnerability in the host's Broadcom WiFi driver rather than immediately exploiting a DMA attack across PCIe. This suggests that if PCIe is used to interface with peripherals in the phone, an IOMMU should be used, for "defense in depth".
Cybersecurity is a young field. There are lots of useful things that outsider non-techies can teach us. Lessons from military history would be well-received.
But that's not this story. Instead, this story is by an outsider telling us we don't know what we are doing, that they do, and then proceeding to prove they don't know what they are doing. Their argument is based on moral suasion and on bullying us with what appears on the surface to be intellectual rigor, but which is in fact devoid of anything smart.
My fear, here, is that I'm going to be in a meeting where somebody has read this pretentious garbage, explaining to me why "defense in depth" is wrong and how we need to OODA faster. I'd rather nip this in the bud by pointing out that if you found anything interesting in that article, you are wrong.

Confused About the Hybrid Cloud? You’re Not Alone

Post Syndicated from Roderick Bauer original https://www.backblaze.com/blog/confused-about-the-hybrid-cloud-youre-not-alone/

Hybrid Cloud. What is it?

Do you have a clear understanding of the hybrid cloud? If you don’t, it’s not surprising.

Hybrid cloud has been applied to a greater and more varied number of IT solutions than almost any other recent data management term. About the only thing that’s clear about the hybrid cloud is that the term hybrid cloud wasn’t invented by customers, but by vendors who wanted to hawk whatever solution du jour they happened to be pushing.

Let’s be honest. We’re in an industry that loves hype. We can’t resist grafting hyper, multi, ultra, super, and other prefixes onto words to entice customers with something new and shiny. The alphabet soup of cloud-related terms can include various options for where the cloud is located (on-premises, off-premises), whether the resources are private or shared in some degree (private, community, public), what type of services are offered (storage, computing), and what type of orchestrating software is used to manage the workflow and the resources. With so many moving parts, it’s no wonder potential users are confused.

Let’s take a step back, try to clear up the misconceptions, and come up with a basic understanding of what the hybrid cloud is. To be clear, this is our viewpoint. Others are free to do what they like, so bear that in mind.

So, What is the Hybrid Cloud?

The hybrid cloud refers to a cloud environment made up of a mixture of on-premises private cloud resources combined with third-party public cloud resources that use some kind of orchestration between them.

To get beyond the hype, let’s start with Forrester Research‘s idea of the hybrid cloud: “One or more public clouds connected to something in my data center. That thing could be a private cloud; that thing could just be traditional data center infrastructure.”

To put it simply, a hybrid cloud is a mash-up of on-premises and off-premises IT resources.

To expand on that a bit, we can say that the hybrid cloud refers to a cloud environment made up of a mixture of on-premises private cloud[1] resources combined with third-party public cloud resources that use some kind of orchestration[2] between them. The advantage of the hybrid cloud model is that it allows workloads and data to move between private and public clouds in a flexible way as demands, needs, and costs change, giving businesses greater flexibility and more options for data deployment and use.

In other words, if you have some IT resources in-house that you are replicating or augmenting with an external vendor, congrats, you have a hybrid cloud!

Private Cloud vs. Public Cloud

The cloud is really just a collection of purpose-built servers. In a private cloud, the servers are dedicated to a single tenant or a group of related tenants. In a public cloud, the servers are shared between multiple unrelated tenants (customers). A public cloud is off-site, while a private cloud can be on-site or off-site (on-prem or off-prem).

As an example, let’s look at a hybrid cloud meant for data storage, a hybrid data cloud. A company might set up a rule that says all accounting files that have not been touched in the last year are automatically moved off-prem to cloud storage to save cost and reduce the amount of storage needed on-site. The files are still available; they are just no longer stored on your local systems. The rules can be defined to fit an organization’s workflow and data retention policies.
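
To make that kind of rule concrete, here is a minimal, hypothetical sketch of what such an archiving job could look like. It is not from the original post: the folder path, the one-year threshold, and the upload_to_cloud() placeholder (standing in for whatever cloud storage client you actually use) are all assumptions.

=== Sample Code (Python, illustrative) ===

import os
import time

ACCOUNTING_DIR = "/data/accounting"        # hypothetical on-premises location
ARCHIVE_AGE_SECONDS = 365 * 24 * 60 * 60   # "not touched in the last year"


def upload_to_cloud(path: str) -> None:
    """Placeholder for your cloud storage client's upload call."""
    print(f"would move {path} to off-premises cloud storage")


def archive_stale_files(root: str) -> None:
    cutoff = time.time() - ARCHIVE_AGE_SECONDS
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) < cutoff:  # last modified over a year ago
                upload_to_cloud(path)
                # A real rule might then delete or stub out the local copy.


if __name__ == "__main__":
    archive_stale_files(ACCOUNTING_DIR)

In practice, rules like this usually live in the storage or backup software's own policy engine rather than in hand-written scripts, but the logic is the same: an age test plus a destination.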

The hybrid cloud concept also contains cloud computing. For example, at the end of the quarter, order processing application instances can be spun up off-premises in a hybrid computing cloud as needed to add to on-premises capacity.

Hybrid Cloud Benefits

If we accept that the hybrid cloud combines the best elements of private and public clouds, then its benefits are clear, and two primary benefits result from the blending of the two.

Diagram of the Components of the Hybrid Cloud

Benefit 1: Flexibility and Scalability

Undoubtedly, the primary advantage of the hybrid cloud is its flexibility. It takes time and money to manage in-house IT infrastructure, and adding capacity requires advance planning.

The cloud is ready and able to provide IT resources whenever needed on short notice. The term cloud bursting refers to the on-demand and temporary use of the public cloud when demand exceeds resources available in the private cloud. For example, some businesses experience seasonal spikes that can put an extra burden on private clouds. These spikes can be taken up by a public cloud. Demand also can vary with geographic location, events, or other variables. The public cloud provides the elasticity to deal with these and other anticipated and unanticipated IT loads. The alternative would be fixed cost investments in on-premises IT resources that might not be efficiently utilized.

For a data storage user, on-premises private cloud storage provides, among other benefits, the highest-speed access. For data that is accessed less frequently, or that does not need the absolute lowest latency, it makes sense for the organization to move it to a location that is secure but less expensive. The data is still readily available, and the public cloud provides a better platform for sharing the data with specific clients, users, or the general public.

Benefit 2: Cost Savings

The public cloud component of the hybrid cloud provides cost-effective IT resources without incurring capital expenses and labor costs. IT professionals can determine the best configuration, service provider, and location for each service, thereby cutting costs by matching the resource with the task best suited to it. Services can be easily scaled, redeployed, or reduced when necessary, saving costs through increased efficiency and avoiding unnecessary expenses.

Comparing Private vs Hybrid Cloud Storage Costs

To get an idea of the difference in storage costs between a purely on-premises solution and one that uses a hybrid of private and public storage, we’ll present two scenarios. For each scenario we’ll use data storage amounts of 100 terabytes, 1 petabyte, and 2 petabytes. Each table has the same format; all we’ve done is change how the data is distributed between the private (on-premises) cloud and the public (off-premises) cloud. We are using the costs for our own B2 Cloud Storage in this example. The math can be adapted for any set of numbers you wish to use.

Scenario 1    100% of data on-premises storage

  Total data stored                          100 TB     1,000 TB    2,000 TB
  Stored on-premises (100%)                  100 TB     1,000 TB    2,000 TB
  On-premises cost, low ($12/TB/month)       $1,200     $12,000     $24,000
  On-premises cost, high ($20/TB/month)      $2,000     $20,000     $40,000

Scenario 2    20% of data on-premises with 80% public cloud storage (B2)

  Total data stored                          100 TB     1,000 TB    2,000 TB
  Stored on-premises (20%)                   20 TB      200 TB      400 TB
  Stored in public cloud (80%)               80 TB      800 TB      1,600 TB
  On-premises cost, low ($12/TB/month)       $240       $2,400      $4,800
  On-premises cost, high ($20/TB/month)      $400       $4,000      $8,000
  Public cloud cost, low ($5/TB/month, B2)   $400       $4,000      $8,000
  Public cloud cost, high ($20/TB/month)     $1,600     $16,000     $32,000
  Combined monthly cost, low                 $640       $6,400      $12,800
  Combined monthly cost, high                $2,000     $20,000     $40,000

As can be seen in the numbers above, using a hybrid cloud solution and storing 80% of the data in the cloud with a provider such as Backblaze B2 can result in significant savings over storing only on-premises. For other cost scenarios, see the B2 Cost Calculator.
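
If you want to adapt that math to your own numbers, a small sketch like the one below reproduces the calculation. It is illustrative only, and its defaults use the low-end rates from the tables above ($12/TB/month on-premises, $5/TB/month for B2) as assumptions.

=== Sample Code (Python, illustrative) ===

def monthly_cost(total_tb: float, cloud_fraction: float,
                 on_prem_rate: float = 12.0, cloud_rate: float = 5.0) -> float:
    """Monthly storage cost for a hybrid split.

    cloud_fraction is the share of data kept in public cloud storage (0.0 to 1.0);
    rates are in dollars per TB per month.
    """
    on_prem_tb = total_tb * (1 - cloud_fraction)
    cloud_tb = total_tb * cloud_fraction
    return on_prem_tb * on_prem_rate + cloud_tb * cloud_rate


if __name__ == "__main__":
    for total in (100, 1000, 2000):  # TB, matching the scenarios above
        all_on_prem = monthly_cost(total, cloud_fraction=0.0)
        hybrid = monthly_cost(total, cloud_fraction=0.8)
        print(f"{total:>5} TB: 100% on-premises ${all_on_prem:>9,.0f}/month, "
              f"20/80 hybrid ${hybrid:>9,.0f}/month")

Plugging in 100 TB gives $1,200 versus $640 per month, matching the low-end rows of the two tables.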

When Hybrid Might Not Always Be the Right Fit

There are circumstances where the hybrid cloud might not be the best solution. Smaller organizations operating on a tight IT budget might best be served by a purely public cloud solution. The cost of setting up and running private servers is substantial.

An application that requires the highest possible speed might not be suitable for hybrid, depending on the specific cloud implementation. While latency is a factor in data storage for some users, it matters less for uploading and downloading data than it does for organizations using the hybrid cloud for computing. Because Backblaze recognized the importance of speed and low latency for customers wishing to use computing on data stored in B2, we directly connected our data centers with those of our computing partners, ensuring that latency would not be an issue even for a hybrid cloud computing solution.

It is essential to have a good understanding of workloads and their essential characteristics in order to make the hybrid cloud work well for you. Each application needs to be examined for the right mix of private cloud, public cloud, and traditional IT resources that fit the particular workload in order to benefit most from a hybrid cloud architecture.

The Hybrid Cloud Can Be a Win-Win Solution

From a high-altitude perspective, any solution that enables an organization to respond in a flexible manner to IT demands is a win. Avoiding big upfront capital expenses for in-house IT infrastructure will appeal to the CFO. Being able to quickly spin up IT resources as they’re needed will appeal to the CTO and VP of Operations.

Should You Go Hybrid?

We’ve arrived at the bottom line and the question is, should you or your organization embrace hybrid cloud infrastructures?

According to 451 Research, by 2019, 69% of companies will operate in hybrid cloud environments, and 60% of workloads will be running in some form of hosted cloud service (up from 45% in 2017). That indicates that the benefits of the hybrid cloud appeal to a broad range of companies.

In Two Years, More Than Half of Workloads Will Run in Cloud

Clearly, depending on an organization’s needs, there are advantages to a hybrid solution. While it might have been possible to dismiss the hybrid cloud in the early days of the cloud as nothing more than a buzzword, that’s no longer true. The hybrid cloud has evolved beyond the marketing hype to offer real solutions for an increasingly complex and challenging IT environment.

If an organization approaches the hybrid cloud with sufficient planning and a structured approach, a hybrid cloud can deliver on-demand flexibility, empower legacy systems and applications with new capabilities, and become a catalyst for digital transformation. The result can be an elastic and responsive infrastructure that has the ability to quickly respond to changing demands of the business.

As data management professionals increasingly recognize the advantages of the hybrid cloud, we can expect more and more of them to embrace it as an essential part of their IT strategy.

Tell Us What You’re Doing with the Hybrid Cloud

Are you currently embracing the hybrid cloud, or are you still uncertain or hanging back because you’re satisfied with how things are currently? Maybe you’ve gone totally hybrid. We’d love to hear your comments below on how you’re dealing with the hybrid cloud.


[1] Private cloud can be on-premises or a dedicated off-premises facility.

[2] Hybrid cloud orchestration solutions are often proprietary, vertical, and task dependent.

The post Confused About the Hybrid Cloud? You’re Not Alone appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Welcome Nathan – Our Solutions Engineer

Post Syndicated from Yev original https://www.backblaze.com/blog/welcome-nathan-our-solutions-engineer/

Backblaze is growing, and with it our need to cater to a lot of different use cases that our customers bring to us. We needed a Solutions Engineer to help out, and after a long search we’ve hired our first one! Let’s learn a bit more about Nathan, shall we?

What is your Backblaze Title?
Solutions Engineer. Our customers bring a thousand different use cases to both B1 and B2, and I’m here to help them figure out how best to make those use cases a reality. Also, any odd jobs that Nilay wants me to do.

Where are you originally from?
I am a native of the San Francisco Bay Area. I studied mathematics at UC Santa Cruz, and then computer science at California University of Hayward (which has since renamed itself California University of the East Hills; I observe that it’s still in Hayward).

What attracted you to Backblaze?
Backblaze is a stable company with huge growth and even bigger potential; the business model is attractive, and the team is outstanding. Add to that the strong commitment to transparency, and it’s a hard company to resist. We can store – and restore – data while offering superior reliability at an economic advantage over do-it-yourself, and that’s a great place to be.

What do you expect to learn while being at Backblaze?
Everything I need to, but principally how our customers choose to interact with web storage. Storage isn’t a solution per se, but it’s an important component of any persistent solution. I’m looking forward to working with all the different concepts our customers have to make use of storage.

Where else have you worked?
All sorts of places, but I’ll admit publicly to EMC, Gemalto, and my own little (failed, alas) startup, IC2N. I worked with low-level document imaging.

Where did you go to school?
UC Santa Cruz, BA in Mathematics; CU Hayward, Master of Science in Computer Science.

What’s your dream job?
Sipping tea in the California redwood forest. However, solutions engineer at Backblaze is a good second choice!

Favorite place you’ve traveled?
Ashland, Oregon, for the Oregon Shakespeare Festival and the marble caves (most caves form from limestone).

Favorite hobby?
Theater. Pathfinder. Writing. Baking cookies and cakes.

Of what achievement are you most proud?
Marrying the most wonderful man in the world.

Star Trek or Star Wars?
Star Trek’s utopian science fiction vision of humanity and science resonates a lot more strongly with me than the dystopian science fantasy of Star Wars.

Coke or Pepsi?
Neither. I’d much rather have a cup of jasmine tea.

Favorite food?
It varies, but I love Indian and Thai cuisine. Truly excellent Italian food is marvelous – wood fired pizza, if I had to pick only one, but the world would be a boring place with a single favorite food.

Why do you like certain things?
If I knew that, I’d be in marketing.

Anything else you’d like to tell us?
If you haven’t already encountered the amazing authors Patricia McKillip and Lois McMaster Bujold – go encounter them. Be happy.

There’s nothing wrong with a nice cup of tea and a long game of Pathfinder. Sign us up! Welcome to the team, Nathan!

The post Welcome Nathan – Our Solutions Engineer appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Needed: Associate Front End Developer

Post Syndicated from Yev original https://www.backblaze.com/blog/needed-associate-front-end-developer/

Want to work at a company that helps customers in over 150 countries around the world protect the memories they hold dear? Do you want to challenge yourself with a business that serves consumers, SMBs, Enterprise, and developers?

If all that sounds interesting, you might be interested to know that Backblaze is looking for an Associate Front End Developer!

Backblaze is a 10-year-old company. Providing great customer experiences is the “secret sauce” that enables us to successfully compete against some of technology’s giants. We’ll finish the year at ~$20MM ARR and are a profitable business. This is an opportunity to have your work shine at scale in one of the fastest growing verticals in tech — Cloud Storage.

You will utilize HTML, ReactJS, CSS and jQuery to develop intuitive, elegant user experiences. As a member of our Front End Dev team, you will work closely with our web development, software design, and marketing teams.

On a day-to-day basis, you must be able to convert image mockups to HTML or ReactJS; there’s some production work that needs to get done. But you will also be responsible for helping build out new features, rethinking old processes, and enabling third-party systems to empower our marketing, sales, and support teams.

Our Associate Front End Developer must be proficient in:

  • HTML, CSS, Javascript (ES5)
  • jQuery, Bootstrap (with responsive targets)
  • An understanding of cross-browser compatibility and browser security for features
  • Basic SEO principles and ensuring that applications will adhere to them
  • Familiarity with ES2015+, ReactJS, unit testing
  • Learning about third party marketing and sales tools through reading documentation. Our systems include Google Tag Manager, Google Analytics, Salesforce, and Hubspot.
  • React Flux, Redux, SASS, Node experience is a plus

We’re looking for someone that is:

  • Passionate about building friendly, easy to use Interfaces and APIs.
  • Likes to work closely with other engineers, support, and marketing to help customers.
  • Is comfortable working independently on a mutually agreed-upon prioritization queue (we don’t micromanage, we do make sure tasks are reasonably defined and scoped).
  • Diligent with quality control. Backblaze prides itself on giving our team autonomy to get work done, do the right thing for our customers, and keep a pace that is sustainable over the long run. As such, we expect everyone to check in code that is stable. We also have a small QA team that operates as a secondary check when needed.

Backblaze Employees Have:

  • Good attitude and willingness to do whatever it takes to get the job done.
  • Strong desire to work for a small, fast-paced company.
  • Desire to learn and adapt to rapidly changing technologies and work environment.
  • Comfort with well behaved pets in the office.

This position is located in San Mateo, California. Regular attendance in the office is expected.

Backblaze is an Equal Opportunity Employer and we offer competitive salary and benefits, including our “no policy” vacation policy.

If this sounds like you…
Send an email to: jobscontact@backblaze.com with:

  1. Associate Front End Dev in the subject line
  2. Your resume attached
  3. An overview of your relevant experience

The post Needed: Associate Front End Developer appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

The Challenges of Opening a Data Center — Part 2

Post Syndicated from Roderick Bauer original https://www.backblaze.com/blog/factors-for-choosing-data-center/

Rows of storage pods in a data center

This is part two of a series on the factors that an organization needs to consider when opening a data center and the challenges that must be met in the process.

In Part 1 of this series, we looked at the different types of data centers, the importance of location in planning a data center, data center certification, and the single most expensive factor in running a data center, power.

In Part 2, we continue to look at factors that need to be considered both by those interested in a dedicated data center and by those seeking to colocate in an existing center.

Power (continued from Part 1)

In part 1, we began our discussion of the power requirements of data centers.

As we discussed, redundancy and failover are chief requirements for data center power. A redundantly designed power supply system is also a necessity for maintenance, as it enables repairs to be performed on one network, for example, without having to turn off servers, databases, or electrical equipment.

Power Path

The common critical components of a data center’s power flow are:

  • Utility Supply
  • Generators
  • Transfer Switches
  • Distribution Panels
  • Uninterruptible Power Supplies (UPS)
  • PDUs

Utility Supply is the power that comes from one or more utility grids. While most of us consider the grid to be our primary power supply (hats off to those of you who manage to live off the grid), politics, economics, and distribution make utility supply power susceptible to outages, which is why data centers must have autonomous power available to maintain availability.

Generators are used to supply power when the utility supply is unavailable. They convert mechanical energy, usually from motors, to electrical energy.

Transfer Switches are used to transfer electric load from one source or electrical device to another, such as from one utility line to another, from a generator to a utility, or between generators. The transfer could be manually activated or automatic to ensure continuous electrical power.

Distribution Panels get the power where it needs to go, taking a power feed and dividing it into separate circuits to supply multiple loads.

A UPS, as we touched on earlier, ensures that continuous power is available even when the main power source isn’t. It often consists of batteries that can come online almost instantaneously when the current power ceases. The power from a UPS does not have to last a long time as it is considered an emergency measure until the main power source can be restored. Another function of the UPS is to filter and stabilize the power from the main power supply.

Data center UPSs

PDU stands for Power Distribution Unit; it is the device that distributes power to the individual pieces of equipment.

Network

After power, the networking connections to the data center are of prime importance. Can the data center obtain and maintain high-speed networking connections to the building? With networking, as with all aspects of a data center, availability is a primary consideration. Data center designers think of all possible ways service can be interrupted or lost, even briefly. Details such as the vulnerabilities in the route the network connections make from the core network (the backhaul) to the center, and where network connections enter and exit a building, must be taken into consideration in network and data center design.

Routers and switches are used to transport traffic between the servers in the data center and the core network. Just as with power, network redundancy is a prime factor in maintaining availability of data center services. Two or more upstream service providers are required to ensure that availability.

How fast a customer can transfer data to a data center is affected by: 1) the speed of the connections the data center has with the outside world, 2) the quality of the connections between the customer and the data center, and 3) the length of the route from the customer to the data center. The longer the route and the greater the number of packets that must be transferred, the more significant a factor latency becomes in the transfer. Latency is the delay before a transfer of data begins following an instruction for its transfer. Generally, latency, not raw speed, will be the most significant factor in transferring data to and from a data center. Packets transferred using the TCP/IP protocol suite, the set of communications protocols used on the internet and similar computer networks, must be acknowledged when received (ACK’d), and each acknowledgment requires a communications round trip. If the data is sent in larger packets, the number of ACKs required is reduced, so latency becomes a smaller factor in the overall network communications speed.
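
To see why latency, rather than raw line speed, so often dominates, here is a small, illustrative sketch (not from the original post) that estimates the ceiling on a single TCP stream’s throughput from its window size and round-trip time, using the rule of thumb that throughput cannot exceed window size divided by RTT. The 64 KB window and the RTT values are assumptions chosen only to show the shape of the effect; real connections also depend on congestion control, loss, and window scaling.

=== Sample Code (Python, illustrative) ===

def max_tcp_throughput_mbps(window_bytes: int, rtt_ms: float) -> float:
    """Rough upper bound for one TCP stream: throughput <= window / RTT.

    Ignores slow start, congestion control, and packet loss, so treat the
    result as a ceiling, not a prediction.
    """
    rtt_s = rtt_ms / 1000.0
    bits_per_second = (window_bytes * 8) / rtt_s
    return bits_per_second / 1_000_000  # convert to megabits per second


if __name__ == "__main__":
    window = 64 * 1024  # assumed 64 KB receive window
    for rtt in (5, 20, 80):  # illustrative round-trip times in milliseconds
        ceiling = max_tcp_throughput_mbps(window, rtt)
        print(f"RTT {rtt:3d} ms -> at most {ceiling:7.1f} Mbps per stream")

Roughly doubling the round-trip time halves the per-stream ceiling, which is why opening several streams in parallel (the multi-threading mentioned below) recovers much of the throughput when bandwidth itself is not the bottleneck.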

Latency generally will be less significant for data storage transfers than for cloud computing. Optimizations such as multi-threading, which is used in Backblaze’s Cloud Backup service, will generally improve overall transfer throughput if sufficient bandwidth is available.

Those interested in testing the overall speed and latency of their connection to Backblaze’s data centers can use the Check Your Bandwidth tool on our website.

Data center telecommunications equipment

Data center under floor cable runs

Cooling

Computer, networking, and power generation equipment generates heat, and there are a number of solutions employed to rid a data center of that heat. The location and climate of the data center are of great importance to the data center designer because climatic conditions dictate to a large degree which cooling technologies should be deployed, which in turn affects how much power is used and what that power costs. The power and cost required to cool a data center in a warm, humid climate differ greatly from those needed for one in a cool, dry climate. Innovation is strong in this area, and many new approaches to efficient and cost-effective cooling are used in the latest data centers.

Switch’s uninterruptible, multi-system, HVAC Data Center Cooling Units

There are three primary ways data center cooling can be achieved:

Room Cooling cools the entire operating area of the data center. This method can be suitable for small data centers, but becomes more difficult and inefficient as IT equipment density and center size increase.

Row Cooling concentrates on cooling a data center on a row by row basis. In its simplest form, hot aisle/cold aisle data center design involves lining up server racks in alternating rows with cold air intakes facing one way and hot air exhausts facing the other. The rows composed of rack fronts are called cold aisles. Typically, cold aisles face air conditioner output ducts. The rows the heated exhausts pour into are called hot aisles. Typically, hot aisles face air conditioner return ducts.

Rack Cooling tackles cooling on a rack by rack basis. Air-conditioning units are dedicated to specific racks. This approach allows for maximum densities to be deployed per rack. This works best in data centers with fully loaded racks, otherwise there would be too much cooling capacity, and the air-conditioning losses alone could exceed the total IT load.

Security

Data centers are high-security facilities, as they house business, government, and other data containing personal, financial, and other sensitive information about businesses and individuals.

This list contains the physical-security considerations when opening or co-locating in a data center:

Layered Security Zones. Systems and processes are deployed to allow only authorized personnel in certain areas of the data center. Examples include keycard access, alarm systems, mantraps, secure doors, and staffed checkpoints.

Physical Barriers. Physical barriers, fencing, and reinforced walls are used to protect facilities. In a colocation facility, one customer’s racks and servers are often inaccessible to other customers colocating in the same data center.

Backblaze racks secured in the data center

Monitoring Systems. Advanced surveillance technology monitors and records activity on approaching driveways, building entrances, exits, loading areas, and equipment areas. These systems also can be used to monitor and detect fire and water emergencies, providing early detection and notification before significant damage results.

Top-tier providers evaluate their data center security and facilities on an ongoing basis. Technology becomes outdated quickly, so providers must stay on top of new approaches and technologies in order to protect valuable IT assets.

Passing into the high-security areas of a data center requires going through a security checkpoint where credentials are verified.

The gauntlet of cameras and steel bars one must pass before entering this data center

Facilities and Services

Data center colocation providers often differentiate themselves by offering value-added services. In addition to the required space, power, cooling, connectivity and security capabilities, the best solutions provide several on-site amenities. These accommodations include offices and workstations, conference rooms, and access to phones, copy machines, and office equipment.

Additional features may consist of kitchen facilities, break rooms and relaxation lounges, storage facilities for client equipment, and secure loading docks and freight elevators.

Moving into A Data Center

Moving into a data center is a major job for any organization. We wrote a post last year, Desert To Data in 7 Days — Our New Phoenix Data Center, about what it was like to move into our new data center in Phoenix, Arizona.

Visiting a Data Center

Our Director of Product Marketing Andy Klein wrote a popular post last year on what it’s like to visit a data center called A Day in the Life of a Data Center.

Would you Like to Know More about The Challenges of Opening and Running a Data Center?

That’s it for part 2 of this series. If readers are interested, we could write a post about some of the new technologies and trends affecting data center design and use. Please let us know in the comments.

Here’s a tip on finding all the posts tagged with data center on our blog. Just follow https://www.backblaze.com/blog/tag/data-center/.

Don’t miss future posts on data centers and other topics, including hard drive stats, cloud storage, and tips and tricks for backing up to the cloud. Use the Join button above to receive notification of future posts on our blog.

The post The Challenges of Opening a Data Center — Part 2 appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Amazon Redshift – 2017 Recap

Post Syndicated from Larry Heathcote original https://aws.amazon.com/blogs/big-data/amazon-redshift-2017-recap/

We have been busy adding new features and capabilities to Amazon Redshift, and we wanted to give you a glimpse of what we’ve been doing over the past year. In this article, we recap a few of our enhancements and provide a set of resources that you can use to learn more and get the most out of your Amazon Redshift implementation.

In 2017, we made more than 30 announcements about Amazon Redshift. We listened to you, our customers, and delivered Redshift Spectrum, a feature of Amazon Redshift that gives you the ability to extend analytics to your data lake—without moving data. We launched new DC2 nodes, doubling performance at the same price. We also announced many new features that provide greater scalability, better performance, more automation, and easier ways to manage your analytics workloads.

To see a full list of our launches, visit our what’s new page—and be sure to subscribe to our RSS feed.

Major launches in 2017

Amazon Redshift Spectrum—extend analytics to your data lake, without moving data

We launched Amazon Redshift Spectrum to give you the freedom to store data in Amazon S3, in open file formats, and have it available for analytics without the need to load it into your Amazon Redshift cluster. It enables you to easily join datasets across Redshift clusters and S3 to provide unique insights that you would not be able to obtain by querying independent data silos.

With Redshift Spectrum, you can run SQL queries against data in an Amazon S3 data lake as easily as you analyze data stored in Amazon Redshift. And you can do it without loading data or resizing the Amazon Redshift cluster based on growing data volumes. Redshift Spectrum separates compute and storage to meet workload demands for data size, concurrency, and performance. Redshift Spectrum scales processing across thousands of nodes, so results are fast, even with massive datasets and complex queries. You can query open file formats that you already use—such as Apache Avro, CSV, Grok, ORC, Apache Parquet, RCFile, RegexSerDe, SequenceFile, TextFile, and TSV—directly in Amazon S3, without any data movement.

“For complex queries, Redshift Spectrum provided a 67 percent performance gain,” said Rafi Ton, CEO, NUVIAD. “Using the Parquet data format, Redshift Spectrum delivered an 80 percent performance improvement. For us, this was substantial.”

To learn more about Redshift Spectrum, watch our AWS Summit session Intro to Amazon Redshift Spectrum: Now Query Exabytes of Data in S3, and read our announcement blog post Amazon Redshift Spectrum – Exabyte-Scale In-Place Queries of S3 Data.

DC2 nodes—twice the performance of DC1 at the same price

We launched second-generation Dense Compute (DC2) nodes to provide low latency and high throughput for demanding data warehousing workloads. DC2 nodes feature powerful Intel E5-2686 v4 (Broadwell) CPUs, fast DDR4 memory, and NVMe-based solid state disks (SSDs). We’ve tuned Amazon Redshift to take advantage of the better CPU, network, and disk on DC2 nodes, providing up to twice the performance of DC1 at the same price. Our DC2.8xlarge instances now provide twice the memory per slice of data and an optimized storage layout with 30 percent better storage utilization.

“Redshift allows us to quickly spin up clusters and provide our data scientists with a fast and easy method to access data and generate insights,” said Bradley Todd, technology architect at Liberty Mutual. “We saw a 9x reduction in month-end reporting time with Redshift DC2 nodes as compared to DC1.”

Read our customer testimonials to see the performance gains our customers are experiencing with DC2 nodes. To learn more, read our blog post Amazon Redshift Dense Compute (DC2) Nodes Deliver Twice the Performance as DC1 at the Same Price.

Performance enhancements—3x to 5x faster queries

On average, our customers are seeing 3x to 5x performance gains for most of their critical workloads.

We introduced short query acceleration to speed up execution of queries such as reports, dashboards, and interactive analysis. Short query acceleration uses machine learning to predict the execution time of a query, and moves short-running queries to an express short query queue for faster processing.

We launched results caching to deliver sub-second response times for queries that are repeated, such as dashboards, visualizations, and those from BI tools. Results caching has an added benefit of freeing up resources to improve the performance of all other queries.

We also introduced late materialization to reduce the amount of data scanned for queries with predicate filters by batching and factoring in the filtering of predicates before fetching data blocks in the next column. For example, if only 10 percent of the table rows satisfy the predicate filters, Amazon Redshift can potentially save 90 percent of the I/O for the remaining columns to improve query performance.

We launched query monitoring rules and pre-defined rule templates. These features make it easier for you to set metrics-based performance boundaries for workload management (WLM) queries, and specify what action to take when a query goes beyond those boundaries. For example, for a queue that’s dedicated to short-running queries, you might create a rule that aborts queries that run for more than 60 seconds. To track poorly designed queries, you might have another rule that logs queries that contain nested loops.

Customer insights

Amazon Redshift and Redshift Spectrum serve customers across a variety of industries and sizes, from startups to large enterprises. Visit our customer page to see the success that customers are having with our recent enhancements. Learn how companies like Liberty Mutual Insurance saw a 9x reduction in month-end reporting time using DC2 nodes. On this page, you can find case studies, videos, and other content that show how our customers are using Amazon Redshift to drive innovation and business results.

In addition, check out these resources to learn about the success our customers are having building out a data warehouse and data lake integration solution with Amazon Redshift:

Partner solutions

You can enhance your Amazon Redshift data warehouse by working with industry-leading experts. Our AWS Partner Network (APN) Partners have certified their solutions to work with Amazon Redshift. They offer software, tools, integration, and consulting services to help you at every step. Visit our Amazon Redshift Partner page and choose an APN Partner. Or, use AWS Marketplace to find and immediately start using third-party software.

To see what our Partners are saying about Amazon Redshift Spectrum and our DC2 nodes mentioned earlier, read these blog posts:

Resources

Blog posts

Visit the AWS Big Data Blog for a list of all Amazon Redshift articles.

YouTube videos

GitHub

Our community of experts contribute on GitHub to provide tips and hints that can help you get the most out of your deployment. Visit GitHub frequently to get the latest technical guidance, code samples, administrative task automation utilities, the analyze & vacuum schema utility, and more.

Customer support

If you are evaluating or considering a proof of concept with Amazon Redshift, or you need assistance migrating your on-premises or other cloud-based data warehouse to Amazon Redshift, our team of product experts and solutions architects can help you with architecting, sizing, and optimizing your data warehouse. Contact us using this support request form, and let us know how we can assist you.

If you are an Amazon Redshift customer, we offer a no-cost health check program. Our team of database engineers and solutions architects give you recommendations for optimizing Amazon Redshift and Amazon Redshift Spectrum for your specific workloads. To learn more, email us at [email protected].

If you have any questions, email us at [email protected].

Additional Reading

If you found this post useful, be sure to check out Amazon Redshift Spectrum – Exabyte-Scale In-Place Queries of S3 Data, Using Amazon Redshift for Fast Analytical Reports and How to Migrate Your Oracle Data Warehouse to Amazon Redshift Using AWS SCT and AWS DMS.


About the Author

Larry Heathcote is a Principal Product Marketing Manager at Amazon Web Services for data warehousing and analytics. Larry is passionate about seeing the results of data-driven insights on business outcomes. He enjoys family time, home projects, grilling out, and the taste of classic barbeque.


[$] An overview of Project Atomic

Post Syndicated from jake original https://lwn.net/Articles/747576/rss

Terms like "cloud-native" and "web scale" are often used
and understood as pointless buzzwords. Under the layers of marketing, though,
cloud systems do work best with a new and different way of thinking about system
administration. Much of the tool set used for cloud operations is free
software, and Linux is the platform of choice for almost all cloud
applications. While just about any distribution can be made to work, there are
several projects working to create a ground-up system specifically for cloud
hosts. One of the best known of these is Project Atomic from Red Hat and the
Fedora Project.