Tag Archives: reports

Cassette deck in an old Ferrari, Pi-fied

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/cassette-deck-in-an-old-ferrari-pi-fied/

Here’s one for the classic car enthusiasts and audiophiles in the room. Matthew Leigh (Managing Director of Infomagnet by day, skilled maker by night) took the aged cassette deck from an old Ferrari and brought it into 2017 with the help of a Raspberry Pi.

Raspberry Pi Ferrari

He used a HiFiBerry DAC alongside a Raspberry Pi 3 to allow the playback of digital music through the sound system of the car. The best part? It all fits neatly into the existing tape deck.

Raspberry Pi Ferrari

Matthew was also able to integrate the tech with the existing function buttons, allowing him to use the original fast-forward, rewind, pause, and play controls.
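
The post doesn’t say how the original controls were wired in, but a common approach in builds like this is to connect the deck’s buttons to GPIO pins and map each press to a music-player command. The sketch below is a hypothetical illustration only (arbitrary pin numbers, and an MPD-based player controlled with the mpc client), not Matthew’s actual implementation.

# Hypothetical sketch only: assumes the deck's buttons are wired to GPIO pins
# 17, 27 and 22, and that playback is handled by MPD via the mpc client.
import subprocess
import RPi.GPIO as GPIO

BUTTONS = {
    17: ["mpc", "toggle"],  # play/pause
    27: ["mpc", "prev"],    # rewind -> previous track
    22: ["mpc", "next"],    # fast-forward -> next track
}

def handle_press(pin):
    # Run the mpc command mapped to whichever button was pressed.
    subprocess.call(BUTTONS[pin])

GPIO.setmode(GPIO.BCM)
for pin in BUTTONS:
    # Each button pulls its pin to ground when pressed.
    GPIO.setup(pin, GPIO.IN, pull_up_down=GPIO.PUD_UP)
    GPIO.add_event_detect(pin, GPIO.FALLING, callback=handle_press, bouncetime=300)

try:
    input("Listening for button presses; press Enter to quit.\n")
finally:
    GPIO.cleanup()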

Raspberry Pi Ferrari

The USB ports are accessible via the cassette door, allowing users to insert flash drives loaded with music. As always, the Raspberry Pi 3 is also accessible via WiFi, providing further connectivity and functionality. A network-connected tablet acts as a media centre screen.

Raspberry Pi Ferrari

The build could be taken further. The Amazon Alexa Voice Service, connected to a 4G dongle or phone, could update the driver with traffic issues, breaking news, or weather reports. In fact, we’ve seen so many ‘carputer’ builds, we’re convinced that there’s no end to the vehicular uses for a hidden Raspberry Pi.

Have you built a carputer? Or maybe hidden a Raspberry Pi in an old piece of tech, or an unexpected location? Let us know in the comments below.

The post Cassette deck in an old Ferrari, Pi-fied appeared first on Raspberry Pi.

Why your compliance scan keeps failing

Post Syndicated from Christian Larsen original https://www.anchor.com.au/blog/2017/02/compliance-scan-failing/

At Anchor, a large number of our customers seek various certifications for their hosted properties. These commonly include PCI, IRAP and many others.

The business drivers for undertaking a certification programme vary between customers, but often involve the need to meet some form of regulatory or industry compliance requirement. Whilst it may not be the initial driver, the fundamental goal of many of these activities is to reduce business risk by improving security posture.

We’ve become accustomed to dealing with auditors to assist our customers in undertaking certification initiatives. Unfortunately, just as in any industry, the quality of service, analysis and rigour that these entities employ can vary wildly.

Automated scans

Auditors commonly use automated tooling as an initial mechanism when assessing the compliance of a hosted infrastructure against the certification that their customer is attempting to achieve. This is a perfectly valid approach; automation increases consistency and reduces incidences of error as well as effort expended.

These tools will often expose a number of non-compliances, many of which are perfectly valid in context of the certification against which they are testing. Unfortunately, in our experience, there are often a large number of invalid non-compliance items on these reports.

Many of these non-compliance items relate to the use of outdated software, but auditor assumptions as to what version numbers actually mean can often be faulty.

Keeping software up to date

Without doubt, outdated and unpatched software is a huge security risk. Keeping software up to date is one of the most basic tenets of any mature security posture. As software becomes more prolific in our lives, so does the number of bugs and potential security vulnerabilities that come with it.

The Common Vulnerabilities and Exposures (CVE) system is the industry-standard mechanism for tracking known security flaws in software. Each vulnerability is assigned an ID by the MITRE Corporation, which can later be referenced by software developers and operators to assess their security exposure. Software vendors use this data as input into their development processes to associate bug fixes with the next release that addresses them.

As a consequence, consumers of affected software can reference the software change logs or documentation to identify when, and in which version, the particular vulnerability they are concerned with was fixed. The more recent the vulnerability, the more likely it is that a more recent version of the software will address it.

With this in mind, it seems reasonable to expect that to obtain the latest security fixes, one must update to the latest available version of the affected software. Many auditor tools make the same assumption; unfortunately, this line of thought is naive.

Problems with updating software

One of the problems with updating to the latest available version of software is that vendors generally do not release security fixes in isolation. They will often (almost always) bundle in new features, some of which change the functionality of the software.

It is these changes in functionality that represent a problem for many organisations. Updating and patching software becomes a monolithic and arduous task because it is difficult to test, validate and orchestrate at even small scales. Changes to functionality of interfaces can potentially break production services. The unfortunate consequence is that patching becomes an irregular task, which only serves to increase the organisation’s length of exposure to publicly known security vulnerabilities. From a business perspective, uptime will generally take precedence over security.

As a matter of policy, Anchor updates and patches all customer services on a weekly basis. For a large, heterogeneous fleet of services such as ours, given the prior constraints, this may seem like a mammoth task. Whilst not trivial at scale, it is not at all a disruptive activity for our customers — in fact, this author can count on one hand the number of incidents caused by Anchor’s patching practices, and all have been resolved within the bounds of the associated maintenance window.

Backporting

Anchor is able to achieve this record by taking advantage of others’ work. Wherever possible, the software we make available to customers is that provided and packaged by the operating system (OS) vendor. We rely heavily on the OS vendors and their incredible track record for releasing stable, reliable updates that address security concerns.

For many OS vendors, there exists a policy that software within a major release should remain stable; this means that there should be no or minimal changes to packaged software for the lifetime of the release. Red Hat Enterprise Linux (RHEL) is one such example with a stellar reputation. RHEL has a published application compatibility policy.

Stable software, however, should not be stagnant software. To ensure that the OS vendor’s packaged software remains secure, they will backport security fixes into their products from upstream software projects.

From Red Hat’s backporting documentation:

We use the term backporting to describe the action of taking a fix for a security flaw out of the most recent version of an upstream software package and applying that fix to an older version of the package we distribute.
When we backport security fixes, we:

  • identify the fixes and isolate them from any other changes
  • make sure the fixes do not introduce unwanted side effects
  • apply the fixes to our previously released versions

Red Hat and many other vendors will publicly document their security advisories so that users can assess the vulnerability of the software they have been provided. It is also possible to search by CVE identifier.
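
One practical way to check whether a vendor package already carries a backported fix is to search the package changelog for the CVE identifier, since RPM-based vendors such as Red Hat record the CVEs addressed by each update there. The snippet below is a minimal sketch; the package name and CVE ID are placeholder examples.

# Minimal sketch: check whether an installed RPM's changelog mentions a CVE ID.
# The package name and CVE below are placeholder examples.
import subprocess

def changelog_mentions_cve(package, cve_id):
    # 'rpm -q --changelog <package>' prints the changelog entries, where
    # RPM-based vendors note the CVE IDs fixed by each released errata.
    changelog = subprocess.check_output(
        ["rpm", "-q", "--changelog", package], universal_newlines=True
    )
    return cve_id in changelog

if __name__ == "__main__":
    package, cve = "openssl", "CVE-2016-2107"  # placeholder example values
    if changelog_mentions_cve(package, cve):
        print("%s appears to contain a backported fix for %s" % (package, cve))
    else:
        print("%s changelog does not mention %s; check the vendor advisory" % (package, cve))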

By taking advantage of the OS vendor’s existing security practice, Anchor is able to regularly update our customers’ infrastructure whilst ensuring stability and security.

The caveat to this approach, however, is that the software versions listed as being in use register as vulnerable and outdated to many naive auditor tools.

Ignoring auditor advice

A common consequence of an initial audit is a request to update to the most recent version of the software identified, without regard for the current security practices. A list of CVE IDs that scanning tools associate with the currently used software will also be supplied.

Again, from Red Hat’s backporting documentation:

“Backporting has a number of advantages for customers, but it can create confusion when it is not understood. Customers need to be aware that just looking at the version number of a package will not tell them if they are vulnerable or not.
Also, some security scanning and auditing tools make decisions about vulnerabilities based solely on the version number of components they find. This results in false positives as the tools do not take into account backported security fixes.”

It is at this stage that a dispute must be filed against the findings; a tiresome but necessary process. Some auditors will record current security practice as a compensating control.

Most of the time, we are successful in such discussions, but there are many times when an auditor will not accept ‘No’ for an answer.

Why not just update to the latest version?

An obvious solution to appease a difficult auditor is to simply heed their advice and update the identified software to the latest version provided by the upstream project. This can in actuality be a terrible decision that may reduce overall security in the long term.

When you deviate from the software provided by the operating system vendor, you no longer receive the benefits of their maintenance. The burden falls onto you to regularly update, test and deploy the software, dealing with any changes in functionality along the way.

This practice is difficult at any scale. As a consequence, it is very likely to become an infrequent activity, resulting in fewer updates and a longer window of exposure to any known vulnerabilities.

Anchor will always recommend to customers that they stick to OS vendor provided software packages where possible. It increases security, stability and reliability whilst reducing ongoing maintenance overhead.

In conclusion

Sticking with OS vendor packaged software is not the only option. One of the consequences of sticking to packaged software is that there will be times when you do actually require new features from current releases that aren’t available in the installed package. Remember that the OS vendor’s objective is to maintain stability, not increase functionality. In these circumstances, a trade-off may be required to achieve the desired business outcome. There’s nothing wrong with deviating from established practice if you are aware of the risks and willing to accept responsibility.

Certification programmes can be taxing on all involved, but predominantly lead to positive outcomes once completed. Auditors, whilst not often popular, can be a necessary part of the process and are ultimately there to assist wherever they can — despite common wisdom, they’re not the enemy.

It’s important to have a good understanding of your own security practices and the risks you are willing to accept. With this knowledge in hand, you’ll be well prepared to have a productive conversation with your auditor and, with some luck, survive certification with your sanity intact.

The post Why your compliance scan keeps failing appeared first on AWS Managed Services by Anchor.

Linux champion Munich takes decisive step towards returning to Windows (TechRepublic)

Post Syndicated from corbet original https://lwn.net/Articles/714544/rss

TechRepublic reports that the Munich, Germany city council has voted to begin the move back to proprietary desktop software. "Under a proposal backed by the general council, the administration will investigate how long it will take and how much it will cost to build a Windows 10 client for use by the city’s employees. Once this work is complete, the council will vote again on whether to replace LiMux, a custom version of the Linux-based OS Ubuntu, across the authority from 2021."

Research into the Root Causes of Terrorism

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2017/02/research_into_t_1.html

Interesting article in Science discussing field research on how people are radicalized to become terrorists.

The potential for research that can overcome existing constraints can be seen in recent advances in understanding violent extremism and, partly, in interdiction and prevention. Most notable is waning interest in simplistic root-cause explanations of why individuals become violent extremists (e.g., poverty, lack of education, marginalization, foreign occupation, and religious fervor), which cannot accommodate the richness and diversity of situations that breed terrorism or support meaningful interventions. A more tractable line of inquiry is how people actually become involved in terror networks (e.g., how they radicalize and are recruited, move to action, or come to abandon cause and comrades).

Reports from The Soufan Group, the International Center for the Study of Radicalisation (King’s College London), and the Combating Terrorism Center (U.S. Military Academy) indicate that approximately three-fourths of those who join the Islamic State or al-Qaeda do so in groups. These groups often involve preexisting social networks and typically cluster in particular towns and neighborhoods. This suggests that much recruitment does not need direct personal appeals by organization agents or individual exposure to social media (which would entail a more dispersed recruitment pattern). Fieldwork is needed to identify the specific conditions under which these processes play out. Natural growth models of terrorist networks then might be based on an epidemiology of radical ideas in host social networks rather than built in the abstract then fitted to data and would allow for a public health, rather than strictly criminal, approach to violent extremism.

Such considerations have implications for countering terrorist recruitment. The present USG focus is on “counternarratives,” intended as alternative to the “ideologies” held to motivate terrorists. This strategy treats ideas as disembodied from the human conditions in which they are embedded and given life as animators of social groups. In their stead, research and policy might better focus on personalized “counterengagement,” addressing and harnessing the fellowship, passion, and purpose of people within specific social contexts, as ISIS and al-Qaeda often do. This focus stands in sharp contrast to reliance on negative mass messaging and sting operations to dissuade young people in doubt through entrapment and punishment (the most common practice used in U.S. law enforcement) rather than through positive persuasion and channeling into productive life paths. At the very least, we need field research in communities that is capable of capturing evidence to reveal which strategies are working, failing, or backfiring.

Survey Data on Americans and Cybersecurity

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2017/02/survey_data_on_.html

Pew Research just published their latest research data on Americans and their views on cybersecurity:

This survey finds that a majority of Americans have directly experienced some form of data theft or fraud, that a sizeable share of the public thinks that their personal data have become less secure in recent years, and that many lack confidence in various institutions to keep their personal data safe from misuse. In addition, many Americans are failing to follow digital security best practices in their own personal lives, and a substantial majority expects that major cyberattacks will be a fact of life in the future.

Here’s the full report.

How to Enable Multi-Factor Authentication for AWS Services by Using AWS Microsoft AD and On-Premises Credentials

Post Syndicated from Peter Pereira original https://aws.amazon.com/blogs/security/how-to-enable-multi-factor-authentication-for-amazon-workspaces-and-amazon-quicksight-by-using-microsoft-ad-and-on-premises-credentials/

You can now enable multi-factor authentication (MFA) for users of AWS services such as Amazon WorkSpaces and Amazon QuickSight and their on-premises credentials by using your AWS Directory Service for Microsoft Active Directory (Enterprise Edition) directory, also known as AWS Microsoft AD. MFA adds an extra layer of protection to a user name and password (the first “factor”) by requiring users to enter an authentication code (the second factor), which has been provided by your virtual or hardware MFA solution. These factors together provide additional security by preventing access to AWS services, unless users supply a valid MFA code.

To enable MFA for AWS services such as Amazon WorkSpaces and QuickSight, a key requirement is an MFA solution that is a Remote Authentication Dial-In User Service (RADIUS) server or a plugin to a RADIUS server already implemented in your on-premises infrastructure. RADIUS is an industry-standard client/server protocol that provides authentication, authorization, and accounting management to enable users to connect to network services. The RADIUS server connects to your on-premises AD to authenticate and authorize users. For the purposes of this blog post, I will use “RADIUS/MFA” to refer to your on-premises RADIUS and MFA authentication solution.

In this blog post, I show how to enable MFA for your Amazon WorkSpaces users in two steps: 1) Configure your RADIUS/MFA server to accept Microsoft AD requests, and 2) configure your Microsoft AD directory to enable MFA.

Getting started

The solution in this blog post assumes that you already have the following components running:

  1. An active Microsoft AD directory
  2. An on-premises AD
  3. A trust relationship between your Microsoft AD and on-premises AD directories

To learn more about how to set up Microsoft AD and create trust relationships to enable Amazon WorkSpaces users to use AD on-premises credentials, see Now Available: Simplified Configuration of Trust Relationship in the AWS Directory Service Console.

Solution overview

The following network diagram shows the components you must have running to enable RADIUS/MFA for Amazon WorkSpaces. The left side in the diagram (covered in Step 1 below) represents your corporate data center with your on-premises AD connected to your RADIUS/MFA infrastructure that will provide the RADIUS user authentication. The right side (covered in Step 2 below) shows your Microsoft AD directory in the AWS Cloud connected to your on-premises AD via trust relationship, and the Amazon WorkSpaces joined to your Microsoft AD directory that will require the MFA code when you configure your environment by following Step 1 and Step 2.
Network diagram

Step 1 – Configure your RADIUS/MFA server to accept Microsoft AD requests

The following steps show you how to configure your RADIUS/MFA server to accept requests from your Microsoft AD directory.

  1. Obtain the Microsoft AD domain controller (DC) IP addresses to configure your RADIUS/MFA server:
    1. Open the AWS Management Console, choose Directory Service, and then choose your Microsoft AD Directory ID link.
      Screenshot of choosing Microsoft AD Directory ID link
    2. On the Directory details page, you will see the two DC IP addresses for your Microsoft AD directory (shown in the following screenshot as DNS Address). Your Microsoft AD DCs are the RADIUS clients to your RADIUS/MFA server.
      Screenshot of the two DC IP addresses for your Microsoft AD directory
  2. Configure your RADIUS/MFA server to add the RADIUS clients. If your RADIUS/MFA server supports DNS addresses, you will need to create only one RADIUS client configuration. Otherwise, you must create one RADIUS client configuration for each Microsoft AD DC, using the DC IP addresses you obtained in Step 1:
    1. Open your RADIUS client configuration screen in your RADIUS/MFA solution.
    2. Create one RADIUS client configuration for each Microsoft AD DC. The following are the common parameters (your RADIUS/MFA server may vary):
      • Address (DNS or IP): Type the DNS address of your Microsoft AD directory or the IP address of your Microsoft AD DC you obtained in Step 1.
      • Port number: You might need to configure the port number of your RADIUS/MFA server on which your RADIUS/MFA server accepts RADIUS client connections. The standard RADIUS port is 1812.
      • Shared secret: Type or generate a shared secret that will be used by the RADIUS/MFA server to connect with RADIUS clients.
      • Protocol: You might need to configure the authentication protocol between the Microsoft AD DCs and the RADIUS/MFA server. Supported protocols are PAP, CHAP, MS-CHAPv1, and MS-CHAPv2. MS-CHAPv2 is recommended because it provides the strongest security of the supported options.
      • Application name: This may be optional in some RADIUS/MFA servers and usually identifies the application in messages or reports.
    3. Configure your on-premises network to allow inbound traffic from the RADIUS clients (the Microsoft AD DC IP addresses obtained in Step 1) to your RADIUS/MFA server port.
    4. Add a rule to the Amazon security group of your Microsoft AD directory to allow inbound traffic from the RADIUS/MFA server IP address and port number defined previously.

Step 2 – Configure your Microsoft AD directory to enable MFA

The final step is to configure your Microsoft AD directory to enable MFA. When you enable MFA, Amazon WorkSpaces that are enabled in your Microsoft AD directory will require the user to enter an MFA code along with their user name and password.

To enable MFA in your Microsoft AD directory:

  1. Open the AWS Management Console, choose Directory Service, and then choose your Microsoft AD Directory ID link.
  2. Choose the Multi-Factor authentication tab and you will see what the following screenshot shows.
    Screenshot of Multi-Factor authentication tab
  3. Enter the following values to configure your RADIUS/MFA server to connect to your Microsoft AD directory:
    • Enable Multi-Factor Authentication: Select this check box to enable MFA configuration input settings fields.
    • RADIUS server IP address(es): Enter the IP addresses of your RADIUS/MFA server. You can enter multiple IP addresses, if you have more than one RADIUS/MFA server, by separating them with a comma (for example, 192.0.0.0, 192.0.0.12). Alternatively, you can use a DNS name for your RADIUS server when using AWS CLI.
    • Port: Enter the port number of your RADIUS/MFA server that you set in Step 1B.
    • Shared secret code: Enter the same shared secret you created in your RADIUS/MFA server in Step 1B.
    • Confirm shared secret code: Reenter your shared secret code.
    • Protocol: Select the authentication protocol between the Microsoft AD DCs and the RADIUS/MFA server. Supported protocols are PAP, CHAP, MS-CHAPv1, and MS-CHAPv2. I recommend MS-CHAPv2 because it provides the strongest security of the supported options.
    • Server timeout (in seconds): Enter the amount of time to wait for the RADIUS/MFA server to respond to authentication requests. If the RADIUS/MFA server does not respond in time, authentication will be retried (see Max retries). This value must be from 1 to 20.
    • Max retries: Specify the number of times that communication with the RADIUS/MFA server is attempted before failing. This must be a value from 0 to 10.
  4. Choose Update directory to update the RADIUS/MFA settings for your directory. The update process will take less than two minutes to complete. When the RADIUS/MFA Status changes to Completed, Amazon WorkSpaces will automatically prompt users to enter their user name and password from the on-premises AD, as well as an MFA code at next sign-in.
    1. If you receive a Failed status after choosing the Update directory button, check the following three most common errors (if you make a change to the configuration, choose Update to apply the changes):
      1. A mismatch between the shared key provided in the RADIUS/MFA server and Microsoft AD configurations.
      2. Network connectivity issues between your Microsoft AD and RADIUS/MFA server, because the on-premises network infrastructure or Amazon security groups are not properly set.
      3. The authentication protocol configured in Microsoft AD does not match or is not supported by the RADIUS/MFA server.

Summary

In this blog post, I provided a solution overview and walked through the two main steps to provide an extra layer of protection for Amazon WorkSpaces by enabling RADIUS/MFA by using Microsoft AD. Because users will be required to provide an MFA code (and have a virtual or hardware MFA device) immediately after you complete the configuration in Step 2, be sure you test this implementation in a test/development environment before deploying it in production.

You can also configure the MFA settings for Microsoft AD using the Directory Service APIs. To learn more about AWS Directory Service, see the AWS Directory Service home page. If you have questions, please post them on the Directory Service forum.
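
For those who prefer the API route, the following is a rough boto3 sketch of enabling RADIUS/MFA on a Microsoft AD directory. The directory ID, server address, and shared secret are placeholders, and the parameter names and allowed values should be verified against the current AWS Directory Service API reference.

# Hedged sketch: enable RADIUS/MFA on an AWS Microsoft AD directory via the
# Directory Service API (boto3). All identifiers and secrets are placeholders.
import boto3

ds = boto3.client("ds")

ds.enable_radius(
    DirectoryId="d-1234567890",           # placeholder Microsoft AD directory ID
    RadiusSettings={
        "RadiusServers": ["192.0.2.10"],  # placeholder RADIUS/MFA server address
        "RadiusPort": 1812,               # standard RADIUS port from Step 1
        "RadiusTimeout": 10,              # seconds to wait for a response (1-20)
        "RadiusRetries": 3,               # retries before failing (0-10)
        "SharedSecret": "my-shared-secret",
        "AuthenticationProtocol": "MS-CHAPv2",
        "DisplayLabel": "MFA",
        "UseSameUsername": True,
    },
)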

– Peter

CSIS’s Cybersecurity Agenda

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2017/02/csiss_cybersecu.html

The Center for Strategic and International Studies (CSIS) published “From Awareness to Action: A Cybersecurity Agenda for the 45th President” (press release here). There’s a lot I agree with — and some things I don’t — but these paragraphs struck me as particularly insightful:

The Obama administration made significant progress but suffered from two conceptual problems in its cybersecurity efforts. The first was a belief that the private sector would spontaneously generate the solutions needed for cybersecurity and minimize the need for government action. The obvious counter to this is that our problems haven’t been solved. There is no technological solution to the problem of cybersecurity, at least any time soon, so turning to technologists was unproductive. The larger national debate over the role of government made it difficult to balance public and private-sector responsibility and created a sense of hesitancy, even timidity, in executive branch actions.

The second was a misunderstanding of how the federal government works. All White Houses tend to float above the bureaucracy, but this one compounded the problem with its desire to bring high-profile business executives into government. These efforts ran counter to what is needed to manage a complex bureaucracy where greatly differing rules, relationships, and procedures determine the success of any initiative. Unlike the private sector, government decisionmaking is more collective, shaped by external pressures both bureaucratic and political, and rife with assorted strictures on resources and personnel.

Sandstorm is returning to its community roots

Post Syndicated from ris original https://lwn.net/Articles/713895/rss

Kenton Varda reports that Sandstorm, as a company, is no more, but community development lives on. LWN covered the Sandstorm personal cloud platform in June 2014.

Many people also know that Sandstorm is a for-profit startup, with a business model centered on charging for enterprise-oriented features, such as LDAP and SAML single-sign-on integration, organizational access control policies, and the like. This product was called “Sandstorm for Work”; it was still open source, but official builds hid the features behind a paywall. Additionally, we planned eventually to release a scalable version of Sandstorm for big enterprise users, based on the same tech that powers Sandstorm Oasis, our managed hosting service.

As an open source project, Sandstorm has been successful: We have a thriving community of contributors, many developers building and packaging apps, and thousands of self-hosted servers running in the wild. This will continue.

However, our business has not succeeded. To date, almost no one has purchased Sandstorm for Work, despite hundreds of trials and lots of interest expressed. Only a tiny fraction of Sandstorm Oasis users choose to pay for the service – enough to cover costs, but not much more.

B2 for Beginners: Inside the B2 Web interface

Post Syndicated from Peter Cohen original https://www.backblaze.com/blog/b2-for-beginners-inside-the-b2-web-interface/

B2 for Beginners

B2 Cloud Storage enables you to store data in the cloud at a fraction of what you’ll pay for other services. For instance, we’re one-fourth of the price of Amazon’s S3. We’ve made it easy to access thanks to a web interface, an API, and a command line interface. Let’s get to know the web interface a bit better, because it’s the easiest way to get around B2 and a good way to get a handle on the fundamentals of B2 use.

Anyone with a Backblaze account can set up B2 access by visiting My Settings. Look for Enabled Products and check B2 Cloud Storage.

B2 is accessed the same way as your Backblaze Computer Backup. The sidebar on the left side of your My Account window shows you all the Backblaze services you use, including B2. Let’s go through the individual links under B2 Cloud Storage to get a sense of what they are and what they do.

Buckets

Data in B2 is stored in buckets. Think of a bucket as a top-level folder or directory. You can create as many buckets as you want. What’s more, you can put in as many files as you want. Buckets can contain files of any type or size.

Buckets

Third-party applications and services can integrate with B2, and many already do. The Buckets screen is where you can get your Account ID information and create an application key – a unique identifier your apps will use to securely connect to B2. If you’re using a third-party app that needs access to your bucket, such as a NAS backup app or a file sync tool, this is where you’ll find the info you need to connect. (We’ll have more info about how to backup your NAS to B2 very soon!)
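
If you’d rather connect with a script than a third-party app, the first call any integration makes is b2_authorize_account, which trades your account ID and application key for an authorization token and an API URL used by all later calls. Here’s a rough Python sketch against the B2 native API (credentials are placeholders; check the current B2 API docs for details):

# Rough sketch: authorize a B2 account and list its buckets with the native API.
# The account ID and application key are placeholders.
import base64
import json
import urllib.request

ACCOUNT_ID = "your_account_id"
APPLICATION_KEY = "your_application_key"

# b2_authorize_account uses HTTP Basic auth with accountId:applicationKey.
credentials = base64.b64encode(
    ("%s:%s" % (ACCOUNT_ID, APPLICATION_KEY)).encode()
).decode()
request = urllib.request.Request(
    "https://api.backblazeb2.com/b2api/v1/b2_authorize_account",
    headers={"Authorization": "Basic " + credentials},
)
auth = json.loads(urllib.request.urlopen(request).read().decode("utf-8"))

# All later calls go to the returned apiUrl, using the returned token.
request = urllib.request.Request(
    auth["apiUrl"] + "/b2api/v1/b2_list_buckets",
    data=json.dumps({"accountId": auth["accountId"]}).encode(),
    headers={"Authorization": auth["authorizationToken"]},
)
buckets = json.loads(urllib.request.urlopen(request).read().decode("utf-8"))
for bucket in buckets["buckets"]:
    print(bucket["bucketName"], bucket["bucketType"])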

The Buckets window lists the buckets you’ve created and provides basic information including creation date, ID, public or private type, lifecycle information, number of files, size and snapshots.

Click the Bucket Settings link to adjust each bucket’s individual settings. You can specify if files in the bucket are public or private. Private files can’t be shared, while public ones can be.

You can also tag your bucket with customized information encoded in JSON format. Custom info can contain letters, numbers, “-” and “_”.

Browse Files

Click the Upload/Download button to see a directory of each bucket. Alternatively, click the Browse Files link on the left side of the B2 interface.

Browse Files

You can create a new subdirectory by clicking the New Folder button, or begin to upload files by clicking the Upload button. You can drag and drop files you’d like to upload and Backblaze will handle that for you. Alternatively, clicking on the dialog box that appears will enable you to select the files on your computer that you’d like to upload.

Info Button

Next to each individual file is an information button. Click it for details about the file, including name, location, kind, size and other details. You’ll also see a “Friendly URL” link. If the bucket is public and you’d like to share this file with others, you may copy that Friendly URL and paste it into an email or message to let people know where to find it.

Download

You can download the contents of your buckets by clicking the checkbox next to the filename and clicking the Download button. You can also delete files and create snapshots. Snapshots are helpful if you want to preserve copies of your files in their present state for some future download or recovery. You can also create a snapshot of the full bucket. If you have a large snapshot, you can order it as a hard drive instead of downloading it. We’ll get more into snapshots in a future blog post.

Lifecycle Settings

We recently introduced Lifecycle Settings to keep your buckets from getting cluttered with too many versions of files. Our web interface lets you manage these settings for each individual bucket.

Lifecycle Rules

By default, the bucket’s lifecycle setting is to keep all versions of files you upload. The web interface lets you adjust that so B2 only keeps the last file version, keeps the last file for a specific number of days, or keeps files based on your own custom rule. You can determine the file path, the number of days until the file is hidden, and the number of days until the file is deleted.

Lifecycle Rules
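
You can apply the same lifecycle rules programmatically with the b2_update_bucket call. The sketch below shows a rule that hides file versions 7 days after upload and deletes hidden versions 30 days later; the bucket ID is a placeholder, and the rule field names should be checked against the current B2 lifecycle rules documentation.

# Hedged sketch: set a lifecycle rule on a bucket via b2_update_bucket.
# api_url, auth_token and account_id come from b2_authorize_account (see the
# earlier sketch); the bucket ID and rule field names should be verified
# against the current B2 documentation.
import json
import urllib.request

def set_lifecycle_rule(api_url, auth_token, account_id, bucket_id):
    rule = {
        "fileNamePrefix": "",             # apply to every file in the bucket
        "daysFromUploadingToHiding": 7,   # hide a version 7 days after upload
        "daysFromHidingToDeleting": 30,   # delete it 30 days after it is hidden
    }
    body = json.dumps({
        "accountId": account_id,
        "bucketId": bucket_id,            # placeholder: pass your real bucket ID
        "lifecycleRules": [rule],
    }).encode()
    request = urllib.request.Request(
        api_url + "/b2api/v1/b2_update_bucket",
        data=body,
        headers={"Authorization": auth_token},
    )
    return json.loads(urllib.request.urlopen(request).read().decode("utf-8"))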

Reports

Backblaze updates your account daily with details on what’s happening with your B2 files. These reports are accessible through the B2 interface under the Reports tab. Clicking on Reports will reveal an easy-to-understand visual chart showing you the average number of GB stored, total GB downloaded, and total number of transactions for the month.

Reports

Look further down the page for a breakdown of monthly transactions by type, along with charts that help you track average GB stored, GB downloaded and count of average stored files for the month.

Caps and Alerts

One of our goals with B2 was to take the surprise out of cloud storage fees. The B2 web GUI sports a Caps & Alerts section to help you control how much you spend on B2.

Caps & Alerts

This is where you can see – and limit – daily storage caps, daily downloads, and daily transactions. “Transactions” are interactions with your account like creating a new bucket, listing the contents of a bucket, or downloading a file.

You can make sure to send those alerts to your cell phone and email, so you’ll never be hit with an unwelcome surprise in the form of an unexpected bill. The first 10 GB of storage is free, with unlimited free uploads and 1 GB of free downloads each day.

Edit Caps

Click the Edit Caps button to enter dollar amount limits for storage, download bandwidth, Class B and Class C transactions separately (or specify No Cap if you don’t want to be encumbered). This way, you maintain control over how much you spend with B2.

And There’s More

That’s an overview of the B2 web GUI to help you get started using B2 Cloud Storage. If you’re more technical and are interested in connecting to B2 using our API instead, make sure to check out our B2 Starter Guide for a comprehensive overview of what’s under the hood.

Still have questions about the B2 web GUI, or ideas for how we can make it better? Fire away in the comments, we want to hear from you!

The post B2 for Beginners: Inside the B2 Web interface appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Security Risks of the President’s Android Phone

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2017/01/security_risks_13.html

Reports are that President Trump is still using his old Android phone. There are security risks here, but they are not the obvious ones.

I’m not concerned about the data. Anything he reads on that screen is coming from the insecure network that we all use, and any e-mails, texts, Tweets, and whatever are going out to that same network. But this is a consumer device, and it’s going to have security vulnerabilities. He’s at risk from everybody, ranging from lone hackers to the better-funded intelligence agencies of the world. And while the risk of a forged e-mail is real — it could easily move the stock market — the bigger risk is eavesdropping. That Android has a microphone, which means that it can be turned into a room bug without anyone’s knowledge. That’s my real fear.

I commented in this story.

EDITED TO ADD (1/27): Nicholas Weaver comments.

Run Mixed Workloads with Amazon Redshift Workload Management

Post Syndicated from Suresh Akena original https://aws.amazon.com/blogs/big-data/run-mixed-workloads-with-amazon-redshift-workload-management/

Mixed workloads run batch and interactive workloads (short-running and long-running queries or reports) concurrently to support business needs or demand. Typically, managing and configuring mixed workloads requires a thorough understanding of access patterns, how system resources are being used, and performance requirements.

It’s common for mixed workloads to have some processes that require higher priority than others. Sometimes, this means a certain job must complete within a given SLA. Other times, this means you only want to prevent a non-critical reporting workload from consuming too many cluster resources at any one time.

Without workload management (WLM), each query is prioritized equally, which can cause a person, team, or workload to consume excessive cluster resources for a process which isn’t as valuable as other more business-critical jobs.

This post provides guidelines on common WLM patterns and shows how you can use WLM query insights to optimize configuration in production workloads.

Workload concepts

You can use WLM to define the separation of business concerns and to prioritize the different types of concurrently running queries in the system:

  • Interactive: Software that accepts input from humans as it runs. Interactive software includes most popular programs, such as BI tools or reporting applications.
    • Short-running, read-only user queries such as Tableau dashboard query with low latency requirements.
    • Long-running, read-only user queries such as a complex structured report that aggregates the last 10 years of sales data.
  • Batch: Execution of a series of jobs in a server program without manual intervention (non-interactive), run against a set or “batch” of inputs rather than a single input.
    • Batch queries include bulk INSERT, UPDATE, and DELETE transactions, for example, ETL or ELT programs.

Amazon Redshift Workload Management

Amazon Redshift is a fully managed, petabyte scale, columnar, massively parallel data warehouse that offers scalability, security and high performance. Amazon Redshift provides an industry standard JDBC/ODBC driver interface, which allows customers to connect their existing business intelligence tools and re-use existing analytics queries.

Amazon Redshift is a good fit for any type of analytical data model, for example, star and snowflake schemas, or simple de-normalized tables.

Managing workloads

Amazon Redshift Workload Management allows you to manage workloads of various sizes and complexity for specific environments. Parameter groups contain WLM configuration, which determines how many query queues are available for processing and how queries are routed to those queues. The default parameter group settings are not configurable. Create a custom parameter group to modify the settings in that group, and then associate it with your cluster. The following settings can be configured:

  • How many queries can run concurrently in each queue
  • How much memory is allocated among the queues
  • How queries are routed to queues, based on criteria such as the user who is running the query or a query label
  • Query timeout settings for a queue

When the user runs a query, WLM assigns the query to the first matching queue and executes rules based on the WLM configuration. For more information about WLM query queues, concurrency, user groups, query groups, timeout configuration, and queue hopping capability, see Defining Query Queues. For more information about the configuration properties that can be changed dynamically, see WLM Dynamic and Static Configuration Properties.

For example, the WLM configuration in the following screenshot has three queues to support ETL, BI, and other users. ETL jobs are assigned to the long-running queue and BI queries to the short-running queue. Other user queries are executed in the default queue.

Redshift_Workload_1

Guidelines on WLM optimal cluster configuration

1. Separate the business concerns and run queries independently from each other

Create independent queues to support different business processes, such as dashboard queries and ETL. For example, creating a separate queue for one-time queries would be a good solution so that they don’t block more important ETL jobs.

Additionally, because faster queries typically use a smaller amount of memory, you can set a low percentage for WLM memory percent to use for that one-time user queue or query group.

2. Rotate the concurrency and memory allocations based on the access patterns (if applicable)

In traditional data management, ETL jobs pull the data from the source systems in a specific batch window, transform, and then load the data into the target data warehouse. In this approach, you can allocate more concurrency and memory to the BI_USER group and very limited resources to ETL_USER during business hours. After hours, you can dynamically allocate or switch the resources to ETL_USER without rebooting the cluster so that heavy, resource-intensive jobs complete very quickly.

Note: The example AWS CLI command is shown on several lines for demonstration purposes. Actual commands should not have line breaks and must be submitted as a single line. The following JSON configuration requires escaped quotes.

Redshift_Workload_2

To change WLM settings dynamically, AWS recommends a scheduled Lambda function or scheduled data pipeline (ShellCmd).
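
As a rough sketch of what such a scheduled Lambda function might do, the example below switches the cluster’s parameter group to an after-hours WLM configuration with a single API call. The parameter group name and queue definitions are placeholders, and the wlm_json_configuration property names should be checked against the Amazon Redshift documentation.

# Hedged sketch: a scheduled Lambda handler that shifts WLM concurrency and
# memory toward the ETL queue after business hours. The parameter group name
# and queue definitions are placeholders.
import json
import boto3

redshift = boto3.client("redshift")

# After-hours configuration: most slots and memory go to ETL_USER.
AFTER_HOURS_WLM = [
    {"user_group": ["etl_user"], "query_concurrency": 10, "memory_percent_to_use": 70},
    {"user_group": ["bi_user"], "query_concurrency": 2, "memory_percent_to_use": 20},
    {"query_concurrency": 3, "memory_percent_to_use": 10},  # default queue
]

def lambda_handler(event, context):
    redshift.modify_cluster_parameter_group(
        ParameterGroupName="my-wlm-parameter-group",  # placeholder name
        Parameters=[{
            "ParameterName": "wlm_json_configuration",
            "ParameterValue": json.dumps(AFTER_HOURS_WLM),
            "ApplyType": "dynamic",  # concurrency and memory can change without a reboot
        }],
    )
    return {"status": "WLM configuration updated"}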

3. Use queue hopping to optimize or support mixed workload (ETL and BI workload) continuously

WLM queue hopping allows read-only queries (BI_USER queries) to move from one queue to another without being cancelled completely. For example, as shown in the following screenshot, you can create two queues—one with a 60-second timeout for interactive queries and another with no timeout for batch queries—and add the same user group, BI_USER, to each queue. WLM automatically re-routes any read-only BI_USER queries that time out in the interactive queue to the batch queue and restarts them.

Redshift_Workload_3

In this example, the ETL workload does not block the BI workload queries and the BI workload is eventually classified as batch, so that long-running, read-only queries do not block the execution of quick-running queries from the same user group.

4. Increase the slot count temporarily for resource-intensive ETL or batch queries

Amazon Redshift writes intermediate results to the disk to help prevent out-of-memory errors, but the disk I/O can degrade the performance. The following query shows if any active queries are currently running on disk:

SELECT query, label, is_diskbased FROM svv_query_state
WHERE is_diskbased = 't';

Query results:

query | label        | is_diskbased
-------+--------------+--------------
1025   | hash tbl=142 |      t

Typically, hashes, aggregates, and sort operators are likely to write data to disk if the system doesn’t have enough memory allocated for query processing. To fix this issue, allocate more memory to the query by temporarily increasing the number of query slots that it uses. For example, a queue with a concurrency level of 4 has 4 slots. When the slot count is set to 4, a single query uses the entire available memory of that queue. Note that assigning several slots to one query consumes the concurrency and blocks other queries from being able to run.

In the following example, I set the slot count to 4 before running the query and then reset the slot count back to 1 after the query finishes.

set wlm_query_slot_count to 4;
select
	p_brand,
	p_type,
	p_size,
	count(distinct ps_suppkey) as supplier_cnt
from
	partsupp,
	part
where
	p_partkey = ps_partkey
	and p_brand <> 'Brand#21'
	and p_type not like 'LARGE POLISHED%'
	and p_size in (26, 40, 28, 23, 17, 41, 2, 20)
	and ps_suppkey not in (
		select
			s_suppkey
		from
			supplier
		where
			s_comment like '%Customer%Complaints%'
	)
group by
	p_brand,
	p_type,
	p_size
order by
	supplier_cnt desc,
	p_brand,
	p_type,
	p_size;

set wlm_query_slot_count to 1; -- after query completion, resetting slot count back to 1

Note:  The above TPC data set query is used for illustration purposes only.

Example insights from WLM queries

The following example queries can help answer questions you might have about your workloads:

  • What is the current query queue configuration? What is the number of query slots and the time out defined for each queue?
  • How many queries are executed, queued and executing per query queue?
  • What does my workload look like for each query queue per hour? Do I need to change my configuration based on the load?
  • How is my existing WLM configuration working? What query queues should be optimized to meet the business demand?

WLM configures query queues according to internally-defined WLM service classes. The terms queue and service class are often used interchangeably in the system tables.

Amazon Redshift creates several internal queues according to these service classes along with the queues defined in the WLM configuration. Each service class has a unique ID. Service classes 1-4 are reserved for system use and the superuser queue uses service class 5. User-defined queues use service class 6 and greater.

Query: Existing WLM configuration

Run the following query to check the existing WLM configuration. Four queues are configured and every queue is assigned a number. In the query, the queue number is mapped to service_class (Queue #1 => ETL_USER => Service class 6) with the evictable flag set to false (no query timeout defined).

select service_class, num_query_tasks, evictable, eviction_threshold, name
from stv_wlm_service_class_config
where service_class > 5;
Redshift_Workload_4

The query above provides information about the current WLM configuration. This query can be automated using Lambda to send notifications to the operations team whenever there is a change to WLM.
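
A rough sketch of that automation is shown below. It assumes a pure-Python PostgreSQL driver such as pg8000 is packaged with the Lambda function; the connection details and SNS topic ARN are placeholders, and a production version would diff the result against a stored snapshot and notify only on change.

# Hedged sketch: a Lambda handler that reads the current WLM configuration from
# Redshift and publishes it to an SNS topic. Connection details and the topic
# ARN are placeholders; assumes the pg8000 driver is bundled with the function.
import boto3
import pg8000

SNS_TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:wlm-config-changes"  # placeholder

def lambda_handler(event, context):
    conn = pg8000.connect(
        host="my-cluster.xxxxxxxx.us-east-1.redshift.amazonaws.com",  # placeholder
        port=5439, database="dev", user="monitor_user", password="********",
    )
    cursor = conn.cursor()
    cursor.execute(
        "select service_class, num_query_tasks, evictable, eviction_threshold, name "
        "from stv_wlm_service_class_config where service_class > 5"
    )
    rows = cursor.fetchall()
    conn.close()

    # Publish the configuration so the operations team can spot changes.
    boto3.client("sns").publish(
        TopicArn=SNS_TOPIC_ARN,
        Subject="Current Redshift WLM configuration",
        Message="\n".join(str(row) for row in rows),
    )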

Query: Queue state

Run the following query to monitor the state of the queues, the memory allocation for each queue and the number of queries executed in each queue. The query provides information about the custom queues and the superuser queue.

select config.service_class, config.name
, trim (class.condition) as description
, config.num_query_tasks as slots
, config.max_execution_time as max_time
, state.num_queued_queries queued
, state.num_executing_queries executing
, state.num_executed_queries executed
from
STV_WLM_CLASSIFICATION_CONFIG class,
STV_WLM_SERVICE_CLASS_CONFIG config,
STV_WLM_SERVICE_CLASS_STATE state
where
class.action_service_class = config.service_class
and class.action_service_class = state.service_class
and config.service_class > 5
order by config.service_class;

Redshift_Workload_5

Service class 9 is not being used in the above results. This would allow you to configure the minimum possible resources (concurrency and memory) for the default queue. Service class 6, etl_group, has executed more queries, so you may configure or re-assign more memory and concurrency for this group.

Query: After the last cluster restart

The following query shows, by service class, the number of queries that are currently executing or have completed executing since the last cluster restart.

select service_class, num_executing_queries,  num_executed_queries
from stv_wlm_service_class_state
where service_class >5
order by service_class;

Redshift_Workload_6
Service class 9 is not being used in the above results. Service class 6, etl_group, has executed more queries than any other service class. You may want to configure more memory and concurrency for this group to speed up query processing.
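
Before moving memory between queues, it can help to see how the current configuration distributes it. The following sketch reads the query_working_mem column of STV_WLM_SERVICE_CLASS_CONFIG, which reports working memory in MB; use it as a relative comparison across queues rather than an exact accounting.

select service_class, trim(name) as queue_name,
       num_query_tasks as slots,
       query_working_mem as working_mem_mb
from stv_wlm_service_class_config
where service_class > 5
order by service_class;

Queues that execute many queries but are granted comparatively little working memory are the first candidates for reallocation.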

Query: Hourly workload for each WLM query queue

The following query returns the hourly workload for each WLM query queue. Use this query to fine-tune WLM queues that contain too many or too few slots, resulting in WLM queuing or unutilized cluster memory. You can copy this query (wlm_apex_hourly.sql) from the amazon-redshift-utils GitHub repo.

WITH
        -- Replace STL_SCAN in generate_dt_series with another table which has > 604800 rows if STL_SCAN does not
        generate_dt_series AS (select sysdate - (n * interval '1 second') as dt from (select row_number() over () as n from stl_scan limit 604800)),
        apex AS (SELECT iq.dt, iq.service_class, iq.num_query_tasks, count(iq.slot_count) as service_class_queries, sum(iq.slot_count) as service_class_slots
                FROM
                (select gds.dt, wq.service_class, wscc.num_query_tasks, wq.slot_count
                FROM stl_wlm_query wq
                JOIN stv_wlm_service_class_config wscc ON (wscc.service_class = wq.service_class AND wscc.service_class > 4)
                JOIN generate_dt_series gds ON (wq.service_class_start_time <= gds.dt AND wq.service_class_end_time > gds.dt)
                WHERE wq.userid > 1 AND wq.service_class > 4) iq
        GROUP BY iq.dt, iq.service_class, iq.num_query_tasks),
        maxes as (SELECT apex.service_class, trunc(apex.dt) as d, date_part(h,apex.dt) as dt_h, max(service_class_slots) max_service_class_slots
                        from apex group by apex.service_class, apex.dt, date_part(h,apex.dt))
SELECT apex.service_class, apex.num_query_tasks as max_wlm_concurrency, maxes.d as day, maxes.dt_h || ':00 - ' || maxes.dt_h || ':59' as hour, MAX(apex.service_class_slots) as max_service_class_slots
FROM apex
JOIN maxes ON (apex.service_class = maxes.service_class AND apex.service_class_slots = maxes.max_service_class_slots)
GROUP BY  apex.service_class, apex.num_query_tasks, maxes.d, maxes.dt_h
ORDER BY apex.service_class, maxes.d, maxes.dt_h;

For the purposes of this post, the results are broken down by service class.

Redshift_Workload_7

In the above results, service class 6 consistently uses up to 8 slots across the 24-hour period. Based on these numbers, no change is required for this service class at this point.

Redshift_Workload_8

Service class 7 can be optimized based on the above results (the query after this list can help confirm the queuing behavior). Two observations to note:

  • 6am-3pm or 6pm-6am (next day): The maximum number of slots used is 3, so there is an opportunity to rotate concurrency and memory allocation based on these access patterns. For more information about how to rotate resources dynamically, see the guidelines section earlier in the post.
  • 3pm-6pm: The peak is observed during this period, so you can leave the existing configuration in place for this window.
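
Before changing the configuration for service class 7, you can confirm whether the 3pm-6pm peak actually causes queuing. The following is a sketch against STL_WLM_QUERY; queue and execution times in this table are recorded in microseconds, and the userid > 1 filter excludes system queries, matching the convention used in the hourly workload query above.

select service_class,
       date_trunc('hour', service_class_start_time) as hr,
       count(*) as queries,
       round(avg(total_queue_time) / 1000000.0, 1) as avg_queue_secs,
       round(max(total_queue_time) / 1000000.0, 1) as max_queue_secs
from stl_wlm_query
where service_class > 5
  and userid > 1
group by 1, 2
order by 1, 2;

Hours with consistently high average queue time are the ones that justify borrowing slots or memory from underused queues such as service class 9.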

Summary

Amazon Redshift is a powerful, fully managed data warehouse that can offer significantly increased performance and lower cost in the cloud. Using the WLM feature, you can ensure that different users and processes running on the cluster receive the appropriate amount of resources to maximize performance and throughput.

If you have questions or suggestions, please leave a comment below.

 


About the Author

Suresh Akena is a Senior Big Data/IT Transformation Architect for AWS Professional Services. He works with enterprise customers to provide leadership on large-scale data strategies, including migration to the AWS platform and big data and analytics projects, and helps them optimize and improve time to market for data-driven applications on AWS. In his spare time, he likes to play with his 8- and 3-year-old daughters and watch movies.

 

 


Related

 

Top 10 Performance Tuning Techniques for Amazon Redshift


 

New SOC 2 Report Available: Confidentiality

Post Syndicated from Chad Woolf original https://aws.amazon.com/blogs/security/new-soc-2-report-available-confidentiality/

AICPA SOC logo

As with everything at Amazon, the success of our security and compliance program is primarily measured by one thing: our customers’ success. Our customers drive our portfolio of compliance reports, attestations, and certifications that support their efforts in running a secure and compliant cloud environment. As a result of our engagement with key customers across the globe, we are happy to announce the publication of our new SOC 2 Confidentiality report. This report is available now through AWS Artifact in the AWS Management Console.

We’ve been publishing SOC 2 Security and Availability Trust Principle reports for years now, and the Confidentiality criteria are complementary to the Security and Availability criteria. The SOC 2 Confidentiality Trust Principle, developed by the American Institute of CPAs (AICPA) Assurance Services Executive Committee (ASEC), outlines additional criteria focused on further safeguarding data, limiting and reducing access to authorized users, and addressing the effective and timely disposal of customer content after deletion by the customer.

The AWS SOC Report covers the data centers in the US East (N. Virginia), US West (Oregon), US West (N. California), AWS GovCloud (US), EU (Ireland), EU (Frankfurt), Asia Pacific (Singapore), Asia Pacific (Seoul), Asia Pacific (Mumbai), Asia Pacific (Sydney), Asia Pacific (Tokyo), and South America (São Paulo) Regions. See AWS Global Infrastructure for more information.

To request this report:

  1. Sign in to your AWS account.
  2. In the list of services under Security, Identity, and Compliance, choose Compliance Reports, and on the next page choose the report you would like to review. Note that you might need to request approval from Amazon for some reports. Requests are reviewed and approved by Amazon within 24 hours.

Want to know more? See answers to some frequently asked questions about the AWS SOC program.  

– Chad

New White House Privacy Report

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2017/01/new_white_house.html

Two days ago, the White House released a report on privacy: “Privacy in our Digital Lives: Protecting Individuals and Promoting Innovation.” The report summarizes things the administration has done, and lists future challenges:

Areas for Further Attention

  1. Technology will pose new consumer privacy and security challenges.
  2. Emerging technology may simultaneously create new challenges and opportunities for law enforcement and national security.
  3. The digital economy is making privacy a global value.
  4. Consumers’ voices are being heard — and must continue to be heard — in the regulatory process.
  5. The Federal Government benefits from hiring more privacy professionals.
  6. Transparency is vital for earning and retaining public trust.
  7. Privacy is a bipartisan issue.

I especially like the framing of privacy as a right. From President Obama’s introduction:

Privacy is more than just, as Justice Brandeis famously proclaimed, the “right to be let alone.” It is the right to have our most personal information be kept safe by others we trust. It is the right to communicate freely and to do so without fear. It is the right to associate freely with others, regardless of the medium. In an age where so many of our thoughts, words, and movements are digitally recorded, privacy cannot simply be an abstract concept in our lives; privacy must be an embedded value.

The conclusion:

For the past 240 years, the core of our democracy — the values that have helped propel the United States of America — have remained largely the same. We are still a people founded on the beliefs of equality and economic prosperity for all. The fierce independence that encouraged us to break from an oppressive king is the same independence found in young women and men across the country who strive to make their own path in this world and create a life unique unto to themselves. So long as that independence is encouraged, so long as it is fostered by the ability to transcend past data points and by the ability to speak and create free from intrusion, the United States will continue to lead the world. Privacy is necessary to our economy, free expression, and the digital free flow of data because it is fundamental to ourselves.

Privacy, as a right that has been enjoyed by past generations, must be protected in our digital ecosystem so that future generations are given the same freedoms to engage, explore, and create the future we all seek.

I know; rhetoric is easy, policy is hard. But we can’t change policy without a changed rhetoric.

EDITED TO ADD: The document was originally on the whitehouse.gov website, but was deleted in the Trump transition.

Converging Data Silos to Amazon Redshift Using AWS DMS

Post Syndicated from Pratim Das original https://aws.amazon.com/blogs/big-data/converging-data-silos-to-amazon-redshift-using-aws-dms/

Organizations often grow organically—and so does their data in individual silos. Such systems are often powered by traditional RDBMS systems and they grow orthogonally in size and features. To gain intelligence across heterogeneous data sources, you have to join the data sets. However, this imposes new challenges, as joining data over dblinks or into a single view is extremely cumbersome and an operational nightmare.

This post walks through using AWS Database Migration Service (AWS DMS) and other AWS services to make it easy to converge multiple heterogeneous data sources to Amazon Redshift. You can then use Amazon QuickSight to visualize the converged dataset and gain additional business insights.

AWS service overview

Here’s a brief overview of AWS services that help with data convergence.

AWS DMS

With DMS, you can migrate your data to and from most widely used commercial and open-source databases. The service supports homogenous migrations such as Oracle to Oracle, as well as heterogeneous migrations between different database platforms, such as Oracle to Amazon Aurora or Microsoft SQL Server to MySQL. It also allows you to stream data to Amazon Redshift from any of the supported sources including:

  • Amazon Aurora
  • PostgreSQL
  • MySQL
  • MariaDB
  • Oracle
  • SAP ASE
  • SQL Server

DMS enables consolidation and easy analysis of data in the petabyte-scale data warehouse. It can also be used for continuous data replication with high availability.

Amazon QuickSight

Amazon QuickSight provides very fast, easy-to-use, cloud-powered business intelligence at 1/10th the cost of traditional BI solutions. QuickSight uses a new, super-fast, parallel, in-memory calculation engine (“SPICE”) to perform advanced calculations and render visualizations rapidly.

QuickSight integrates automatically with AWS data services, enables organizations to scale to hundreds of thousands of users, and delivers fast and responsive query performance to them. You can easily connect QuickSight to AWS data services, including Amazon Redshift, Amazon RDS, Amazon Aurora, Amazon S3, and Amazon Athena. You can also upload CSV, TSV, and spreadsheet files or connect to third-party data sources such as Salesforce.

Amazon Redshift

Amazon Redshift delivers fast query performance by using columnar storage technology to improve I/O efficiency and parallelizing queries across multiple nodes. Amazon Redshift is typically priced at 1/10th of the price of the competition. We have many customers running petabyte scale data analytics on AWS using Amazon Redshift.

Amazon Redshift is also ANSI SQL compliant, supports JDBC/ODBC, and is easy to connect to your existing business intelligence (BI) solution. However, if your storage requirement is in the 10s of TB range and requires high levels of concurrency across small queries, you may want to consider Amazon Aurora as the target converged database.

Walkthrough

Assume that you have an events company specializing in sports and have built a MySQL database that holds data for the players and the sporting events. Customer and ticket information is stored in another database; in this case, assume it is PostgreSQL, and it gets updated when a customer purchases tickets from the website or mobile apps. You can download a sample dataset from the aws-database-migration-samples GitHub repo.

These databases could be anywhere: at an on-premises facility; on AWS in Amazon EC2 or Amazon RDS, or other cloud provider; or in a mixture of such locations. To complicate things a little more, you can assume that the lost opportunities (where a customer didn’t complete buying the ticket even though it was added to the shopping cart) are streamed via clickstream through Amazon Kinesis and then stored on Amazon S3. We then use AWS Data Pipeline to orchestrate a process to cleanse that data using Amazon EMR and make it ready for loading to Amazon Redshift. The clickstream integration is not covered in this post but was demonstrated in the recent Real-time Clickstream Anomaly Detection with Amazon Kinesis Analytics post.

Architecture

In this solution, you use DMS to bring the two data sources into Amazon Redshift and run analytics to gain business insights. The following diagram demonstrates the proposed solution.

DataSilos_1_1

After the data is available on Amazon Redshift, you could easily build BI dashboards and generate intelligent reports to gain insights using Amazon QuickSight. You could also take this a step further and build a model using Amazon Machine Learning. Amazon Machine Learning uses powerful algorithms to create ML models by finding patterns in your existing data stored in Amazon S3, or Amazon Redshift. It is also highly scalable and can generate billions of predictions daily, and serve those predictions in real time and at high throughput.

Creating source databases

For the purposes of this post, create two RDS databases, one with a MySQL engine, and the other with PostgreSQL and then load some data. These represent a real-life scenario where databases could be located on-premises, on AWS, or both. Just as in real life, there may be more than two source databases; the process described in this post would still be reasonably similar.

Follow the steps in Tutorial: Create a Web Server and an Amazon RDS Database to create the two source databases. Use the links from the main tutorial page to see how to connect to the specific database engines and load data.

Make a note of the security group that you create, and associate all the RDS instances with it. Call it “MyRDSSecurityGroup”.

Afterward, you should be able to see all the databases listed in the RDS Instances dashboard.

DataSilos_2_1

Setting up a target Amazon Redshift cluster

Set up a two-node cluster as shown below, with a cluster name similar to “consolidated-dwh” and a database name similar to “mydwh”. Depending on the instance type you choose, you could also set up a one-node cluster, which may be available on the AWS Free Tier.

DataSilos_3

In the next step, choose Publicly Accessible for non-production usage to keep the configuration simple.

Also, for simplicity, choose the same VPC where you have placed the RDS instances and include the MyRDSSecurityGroup in the list of security groups allowed to access the Amazon Redshift cluster.

Setting up DMS

You can set up DMS easily, as indicated in the AWS Database Migration Service post on the AWS blog. However, rather than using the wizard, you may take a step-by-step approach:

  1. Create a replication instance.
  2. Create the endpoints for the two source databases and the target Amazon Redshift database.
  3. Create a task to synchronize each of the sources to the target.

Create a replication instance

In the DMS console, choose Replication instances, Create replication instance. The instance type you select depends on the data volume you deal with. After setup, you should be able to see your replication instance.

DataSilos_4
Create endpoints

In the DMS console, choose Endpoints, Create endpoint. You need to configure the two source endpoints representing the PostgreSQL and MySQL RDS databases. You also need to create the target endpoint by supplying the Amazon Redshift database that you created in the previous steps. After configuration, the endpoints look similar to the following screenshot:

DataSilos_5

Create a task and start data migration

You can rely on DMS to create the target tables in your target Amazon Redshift database or you may want to take advantage of AWS Schema Conversion Tool to create the target schema and also do a compatibility analysis in the process. Using the AWS Schema Conversion Tool is particularly useful when migrating using heterogeneous data sources. For more information, see Getting Started with the AWS Schema Conversion Tool.

For simplicity, I avoided using the AWS Schema Conversion Tool in this post and instead let DMS create the target schema and underlying tables, and then set up the synchronization between the data sources and the target.

In the DMS console, choose Tasks, Create Tasks. Fill in the fields as shown in the following screenshot:

DataSilos_6

Note that because the source is RDS MySQL and you chose Migrate data and replicate ongoing changes, you need to enable binlog retention. Other engines have other requirements, and DMS prompts you accordingly. For this particular case, run the following command:

call mysql.rds_set_configuration('binlog retention hours', 24);
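
If you want to double-check that the retention window was applied before starting the task, RDS for MySQL provides a companion procedure. This is only an optional sanity check:

-- Optional: show the current RDS configuration, including binlog retention hours
call mysql.rds_show_configuration;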

Now, choose Start task on create. In the task settings, choose Drop tables on target to have DMS create the tables, if you haven’t already created the target tables using the AWS Schema Conversion Tool, as described earlier. Choose Enable logging but note that this incurs additional costs as the generated CloudWatch logs require storage.

In the table mappings, for Schema to migrate, ensure that the correct schema has been selected from the source databases. DMS creates the schema on the target if it does not already exist.

Repeat for the other data source, choosing the other source endpoint and the same Amazon Redshift target endpoint. In the table mappings section, choose Custom and customize as appropriate. For example, you can specify the schema names to include and tables to exclude, as shown in the following screenshot:

DataSilos_7

Using this custom configuration, you can perform some minor transformations, such as lowercasing target table names or choosing a different target schema for both sources.

After both tasks have successfully completed, the Tasks tab now looks like the following:

DataSilos_8

Running queries on Amazon Redshift

In Amazon Redshift, select your target cluster and choose Loads. You can see all operations that DMS performed in the background to load the data from the two source databases into Amazon Redshift.

DataSilos_9
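
You can also inspect the same load activity from a SQL client instead of the console. The following is a rough sketch against the STL_LOAD_COMMITS system table; the exact file names depend on how DMS stages the load, so treat the output as a high-level progress indicator.

select trim(filename) as load_file,
       min(curtime) as started,
       max(curtime) as finished,
       sum(lines_scanned) as lines_scanned
from stl_load_commits
group by trim(filename)
order by finished desc
limit 20;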

Ensure change data capture is working

Generate additional data on the Amazon RDS PostgreSQL instance in the ticketing.sporting_event_ticket table by running the generate_mlb_season.sql script provided in the aws-database-migration-samples GitHub repository. Notice that the tasks catch up and show the migration in progress. You can also query the target tables and see that the new data has arrived in the target table.
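
For a quick check, assuming you kept the schema and table names from the sample dataset, the same count can be run on both the PostgreSQL source and the Amazon Redshift target:

-- Run on the source and then on the target; the target count should catch up shortly after the script completes
select count(*) from ticketing.sporting_event_ticket;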

Visualization options

Set up QuickSight and configure your data source to be your Amazon Redshift database. If you have a Redshift cluster in the same account and in the same region, it will appear when you click Redshift (Auto-discovered) on the data sets page, as shown below.

DataSilos_16

Access to any other Redshift cluster can be configured as follows using the Redshift (Manual connect) link:

DataSilos_10

Now, create your data set. Choose New Data Set and select either a new data source or an existing data source listed at the bottom of the page. Choose Ticketing for Sports.

DataSilos_11_1
In the next step, choose Create Data Set.

In the next step, when QuickSight prompts you to choose your table, you can select the schema and the required table and choose Select. Alternatively, you may choose Edit/Preview data.

DataSilos_11

You could use the graphical options shown below to start creating your data set. Given that you have data from multiple sources, it’s safe to assume that your target tables are in separate schemas. Select the first schema and its tables, then select the other schemas and bring the appropriate tables to the palette by selecting the check box to the right of each one. For each join, select the join type and then map the appropriate keys between the tables until the two red join indicators turn into one of the blue join types.

DataSilos_12

In this case, rather than preparing the data set in the palette, you provide a custom SQL query. On the left pane, choose Tables, Switch to Custom SQL tool.

Paste the following SQL query in the Custom SQL field and enter a name.

select to_char( e.start_date_time, 'YYYY-MM-DD' ) event_date, 
to_char( e.start_date_time, 'HH24:MI' ) start_time, e.sold_out, 
e.sport_type_name, l.name event_location, l.city event_city, 
l.seating_capacity, hteam.name home_team, hl.name home_field, 
hl.city home_city, ateam.name away_team, al.name away_field, 
al.city away_city, sum( t.ticket_price ) total_ticket_price, 
avg( t.ticket_price ) average_ticket_price, 
min ( t.ticket_price ) cheapest_ticket, 
max( t.ticket_price ) most_expensive_ticket, count(*) num_tickets

from ticketing.sporting_event_ticket t, sourcemysql.sporting_event e, 
sourcemysql.sport_location l, sourcemysql.sport_team hteam, 
sourcemysql.sport_team ateam, sourcemysql.sport_location hl, 
sourcemysql.sport_location al

where t.sporting_event_id = e.id
and t.sport_location_id = l.id
and e.home_team_id = hteam.id
and e.away_team_id = ateam.id
and hteam.home_field_id = hl.id
and ateam.home_field_id = al.id

group by to_char( e.start_date_time, 'YYYY-MM-DD' ), 
to_char( e.start_date_time, 'HH24:MI' ), e.start_date_time, 
e.sold_out, e.sport_type_name, l.name, l.city, l.seating_capacity, 
hteam.name, ateam.name, hl.name, hl.city, al.name, al.city;

DataSilos_13

You can choose Save and visualize and view the QuickSight visualization toolkit and filter options. Here you can build your story or dashboards and start sharing them with your team.
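
If several analyses will reuse this logic, one option is to persist it as a view in Amazon Redshift and point QuickSight (or any other BI tool) at the view instead of pasting SQL each time. The sketch below uses a hypothetical reporting schema and view name and keeps only a subset of the columns from the custom SQL above:

create schema if not exists reporting;

create view reporting.ticket_sales_by_event as
select to_char(e.start_date_time, 'YYYY-MM-DD') as event_date,
       l.city as event_city,
       count(*) as num_tickets,
       sum(t.ticket_price) as total_ticket_price
from ticketing.sporting_event_ticket t,
     sourcemysql.sporting_event e,
     sourcemysql.sport_location l
where t.sporting_event_id = e.id
  and t.sport_location_id = l.id
group by 1, 2;

QuickSight can then treat reporting.ticket_sales_by_event like any other table when you create a data set.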

Now, you can choose various fields from the field list and the various measures to get the appropriate visualization, like the one shown below. This one is aimed at understanding the date on which each event in each city reached maximum capacity.

DataSilos_14

You can also combine many such visualizations and prepare your dashboard for management reporting. The analysis may also show where you need to invest in campaigns and where things are going better than expected, helping to ensure a healthy sales pipeline.

DataSilos_15

Summary

In this post, you used AWS DMS to converge multiple heterogeneous data sources to an Amazon Redshift cluster. You also used Amazon QuickSight to create a data visualization on the converged dataset to provide additional insights. Although this post used an e-commerce use case related to an events company, the concept of converging multiple data silos to a target is also applicable to other verticals such as retail, healthcare, finance, insurance and banking, gaming, and so on.

If you have questions or suggestions, please comment below.


About the Author

 

Pratim Das is a Specialist Solutions Architect for Analytics in EME. He works with customers on big data and analytics projects, helping them build solutions on AWS using AWS services and/or other open source or commercial solutions from the big data ecosystem. In his spare time he enjoys cooking and creating exciting new recipes, always with that spicy kick.

 

 


Related

Derive Insights from IoT in Minutes using AWS IoT, Amazon Kinesis Firehose, Amazon Athena, and Amazon QuickSight


Call for Papers! DEEM: 1st Workshop on Data Management for End-to-End Machine Learning

Post Syndicated from Joseph Spisak original https://aws.amazon.com/blogs/big-data/call-for-papers-deem-1st-workshop-on-data-management-for-end-to-end-machine-learning/

DEEM

Amazon and Matroid will hold the first workshop on Data Management for End-to-End Machine Learning (DEEM) on May 14th, 2017 in conjunction with the premier systems conference SIGMOD/PODS 2017 in Raleigh, North Carolina. For more details about the workshop focus, see Challenges and opportunities in machine learning below.

DEEM brings together researchers and practitioners at the intersection of applied machine learning, data management, and systems research to discuss data management issues in ML application scenarios.

We’re soliciting research papers that describe preliminary and ongoing research results. We’re also looking for reports from industry describing end-to-end ML deployments. Submissions can either be short papers (4 pages) or long papers (up to 10 pages) following the ACM proceedings format.

Register and submit: https://cmt3.research.microsoft.com/DEEM2017/ (account needed)

Submission Deadline: February 1, 2017

Notification of Acceptance: March 1, 2017

Final papers due: March 20, 2017

Workshop: May 14th, 2017

Follow us on twitter @deem_workshop.

Challenges and opportunities in machine learning

Applying machine learning (ML) in real-world scenarios is challenging. In recent years, the database community has focused on creating systems and abstractions for efficiently training ML models on large datasets. But model training is only one of many steps in an end-to-end ML application. Many orthogonal data management problems arise from the large-scale use of ML. The data management community needs to focus on these problems.

For example, data preprocessing and feature extraction workloads result in complex pipelines that often require the simultaneous execution of relational and linear algebraic operations. Next, the class of ML model to use needs to be chosen. For that, a set of popular approaches such as linear models, decision trees, and deep neural networks often must be analyzed, evaluated, and interpreted.

The prediction quality of such ML models depends on the choice of features and hyperparameters, which are typically selected in a costly offline evaluation process. Afterwards, the resulting models must be deployed and integrated into existing business workflows in a way that enables fast and efficient predictions while allowing for the lifecycle of models (that become stale over time) to be managed.

As a further complication, the resulting systems need to take the target audience of ML applications into account. This audience is heterogeneous, ranging from analysts without programming skills who may prefer an easy-to-use, cloud-based solution, to teams of data processing experts and statisticians who develop and deploy custom-tailored algorithms.

DEEM aims to bring together researchers and practitioners at the intersection of applied machine learning, data management and systems research to discuss data management issues in ML application scenarios. This workshop solicits regular research papers describing preliminary and ongoing research results. In addition, the workshop encourages the submission of industrial experience reports of end-to-end ML deployments.

Questions? Please send them to [email protected]

amazon_matroid

The Top 10 Most Downloaded AWS Security and Compliance Documents in 2016

Post Syndicated from Sara Duffer original https://aws.amazon.com/blogs/security/the-top-10-most-downloaded-aws-security-and-compliance-documents-in-2016/

The following list includes the ten most downloaded AWS security and compliance documents in 2016. Using this list, you can learn about what other people found most interesting about security and compliance last year.

  1. Service Organization Controls (SOC) 3 Report – This publicly available report describes internal controls for security, availability, processing integrity, confidentiality, or privacy.
  2. AWS Best Practices for DDoS Resiliency – This whitepaper covers techniques to mitigate distributed denial of service (DDoS) attacks.
  3. Architecting for HIPAA Security and Compliance on AWS – This whitepaper describes how to leverage AWS to develop applications that meet HIPAA and HITECH compliance requirements.
  4. ISO 27001 Certification – The ISO 27001 certification of our Information Security Management System (ISMS) covers our infrastructure, data centers, and services including Amazon Elastic Compute Cloud (Amazon EC2), Amazon Simple Storage Service (Amazon S3), and Amazon Virtual Private Cloud (Amazon VPC).
  5. AWS: Overview of Security Processes – This whitepaper describes the physical and operational security processes for the AWS managed network and infrastructure, and helps answer questions such as, “How does AWS help me protect my data?”
  6. AWS: Risk and Compliance – This whitepaper provides information to help customers integrate AWS into their existing control framework, including a basic approach for evaluating AWS controls and a description of AWS certifications, programs, reports, and third-party attestations.
  7. ISO 27017 Certification – The ISO 27017 certification provides guidance about the information security aspects of cloud computing, recommending the implementation of cloud-specific information security controls that supplement the guidance of the ISO 27002 and ISO 27001 standards.
  8. AWS Whitepaper on EU Data Protection – This whitepaper provides information about how to meet EU compliance requirements when using AWS services.
  9. PCI Compliance in the AWS Cloud: Technical Workbook – This workbook provides guidance about building an environment in AWS that is compliant with the Payment Card Industry Data Security Standard (PCI DSS).
  10. Auditing Security Checklist – This whitepaper provides information, tools, and approaches for auditors to use when auditing the security of the AWS managed network and infrastructure.

– Sara

Photocopier Security

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2017/01/photocopier_sec.html

A modern photocopier is basically a computer with a scanner and printer attached. This computer has a hard drive, and scans of images are regularly stored on that drive. This means that when a photocopier is thrown away, that hard drive is filled with pages that the machine copied over its lifetime. As you might expect, some of those pages will contain sensitive information.

This 2011 report was written by the Inspector General of the National Archives and Records Administration (NARA). It found that the organization did nothing to safeguard its photocopiers.

Our audit found that opportunities exist to strengthen controls to ensure photocopier hard drives are protected from potential exposure. Specifically, we found the following weaknesses.

  • NARA lacks appropriate controls to ensure all photocopiers across the agency are accounted for and that any hard drives residing on these machines are tracked and properly sanitized or destroyed prior to disposal.
  • There are no policies documenting security measures to be taken for photocopiers utilized for general use nor are there procedures to ensure photocopier hard drives are sanitized or destroyed prior to disposal or at the end of the lease term.

  • Photocopier lease agreements and contracts do not include a “keep disk” or similar clause as required by NARA’s IT Security Methodology for Media Protection Policy version 5.1.

I don’t mean to single this organization out. Pretty much no one thinks about this security threat.