Abstract: The detection of faked identities is a major problem in security. Current memory-detection techniques cannot be used as they require prior knowledge of the respondent’s true identity. Here, we report a novel technique for detecting faked identities based on the use of unexpected questions that may be used to check the respondent identity without any prior autobiographical information. While truth-tellers respond automatically to unexpected questions, liars have to “build” and verify their responses. This lack of automaticity is reflected in the mouse movements used to record the responses as well as in the number of errors. Responses to unexpected questions are compared to responses to expected and control questions (i.e., questions to which a liar also must respond truthfully). Parameters that encode mouse movement were analyzed using machine learning classifiers and the results indicate that the mouse trajectories and errors on unexpected questions efficiently distinguish liars from truth-tellers. Furthermore, we showed that liars may be identified also when they are responding truthfully. Unexpected questions combined with the analysis of mouse movement may efficiently spot participants with faked identities without the need for any prior information on the examinee.
One of the more useful features of masscan is the “–banners” check, which connects to the TCP port, sends some request, and gets a basic response back. However, since masscan has it’s own TCP stack, it’ll interfere with the operating system’s TCP stack if they are sharing the same IPv4 address. The operating system will reply with a RST packet before the TCP connection can be established.
The way to fix this is to use the built-in packet-filtering firewall to block those packets in the operating-system TCP/IP stack. The masscan program still sees everything before the packet-filter, but the operating system can’t see anything after the packet-filter.
Note that we are talking about the “packet-filter” firewall feature here. Remember that macOS, like most operating systems these days, has two separate firewalls: an application firewall and a packet-filter firewall. The application firewall is the one you see in System Settings labeled “Firewall”, and it controls things based upon the application’s identity rather than by which ports it uses. This is normally “on” by default. The packet-filter is normally “off” by default and is of little use to normal users.
Also note that macOS changed packet-filters around version 10.10.5 (“Yosemite”, October 2014). The older one is known as “ipfw“, which was the default firewall for FreeBSD (much of macOS is based on FreeBSD). The replacement is known as PF, which comes from OpenBSD. Whereas you used to use the old “ipfw” command on the command line, you now use the “pfctl” command, as well as the “/etc/pf.conf” configuration file.
What we need to filter is the source port of the packets that masscan will send, so that when replies are received, they won’t reach the operating-system stack, and just go to masscan instead. To do this, we need find a range of ports that won’t conflict with the operating system. Namely, when the operating system creates outgoing connections, it randomly chooses a source port within a certain range. We want to use masscan to use source ports in a different range.
To figure out the range macOS uses, we run the following command:
On my laptop, which is probably the default for macOS, I get the following range. Sniffing with Wireshark confirms this is the range used for source ports for outgoing connections.
So this means I shouldn’t use source ports anywhere in the range 49152 to 65535. On my laptop, I’ve decided to use for masscan the ports 40000 to 41023. The range masscan uses must be a power of 2, so here I’m using 1024 (two to the tenth power).
To configure masscan, I can either type the parameter “–source-port 40000-41023” every time I run the program, or I can add the following line to /etc/masscan/masscan.conf. Remember that by default, masscan will look in that configuration file for any configuration parameters, so you don’t have to keep retyping them on the command line.
source-port = 40000-41023
Next, I need to add the following firewall rule to the bottom of /etc/pf.conf:
block in proto tcp from any to any port 40000 >< 41024
However, we aren’t done yet. By default, the packet-filter firewall is off on some versions of macOS. Therefore, every time you reboot your computer, you need to enable it. The simple way to do this is on the command line run:
pfctl -e
Or, if that doesn’t work, try:
pfctl -E
If the firewall is already running, then you’ll need to load the file explicitly (or reboot):
AWS Identity and Access Management (IAM) now makes it easier for you to control access to your AWS resources by using the AWS organization of IAM principals (users and roles). For some services, you grant permissions using resource-based policies to specify the accounts and principals that can access the resource and what actions they can perform on it. Now, you can use a new condition key, aws:PrincipalOrgID, in these policies to require all principals accessing the resource to be from an account in the organization. For example, let’s say you have an Amazon S3 bucket policy and you want to restrict access to only principals from AWS accounts inside of your organization. To accomplish this, you can define the aws:PrincipalOrgID condition and set the value to your organization ID in the bucket policy. Your organization ID is what sets the access control on the S3 bucket. Additionally, when you use this condition, policy permissions apply when you add new accounts to this organization without requiring an update to the policy.
In this post, I walk through the details of the new condition and show you how to restrict access to only principals in your organization using S3.
Condition concepts
Before I introduce the new condition, let’s review the condition element of an IAM policy. A condition is an optional IAM policy element you can use to specify special circumstances under which the policy grants or denies permission. A condition includes a condition key, operator, and value for the condition. There are two types of conditions: service-specific conditions and global conditions. Service-specific conditions are specific to certain actions in an AWS service. For example, the condition key ec2:InstanceType supports specific EC2 actions. Global conditions support all actions across all AWS services.
Now that I’ve reviewed the condition element in an IAM policy, let me introduce the new condition.
AWS:PrincipalOrgID Condition Key
You can use this condition key to apply a filter to the Principal element of a resource-based policy. You can use any string operator, such as StringLike, with this condition and specify the AWS organization ID for as its value.
Condition key
Description
Operator(s)
Value
aws:PrincipalOrgID
Validates if the principal accessing the resource belongs to an account in your organization.
Example: Restrict access to only principals from my organization
Let’s consider an example where I want to give specific IAM principals in my organization direct access to my S3 bucket, 2018-Financial-Data, that contains sensitive financial information. I have two accounts in my AWS organization with multiple account IDs, and only some IAM users from these accounts need access to this financial report.
To grant this access, I author a resource-based policy for my S3 bucket as shown below. In this policy, I list the individuals who I want to grant access. For the sake of this example, let’s say that while doing so, I accidentally specify an incorrect account ID. This means a user named Steve, who is not in an account in my organization, can now access my financial report. To require the principal account to be in my organization, I add a condition to my policy using the global condition key aws:PrincipalOrgID. This condition requires that only principals from accounts in my organization can access the S3 bucket. This means that although Steve is one of the principals in the policy, he can’t access the financial report because the account that he is a member of doesn’t belong to my organization.
In the policy above, I specify the principals that I grant access to using the principal element of the statement. Next, I add s3:GetObject as the action and 2018-Financial-Data/* as the resource to grant read access to my S3 bucket. Finally, I add the new condition key aws:PrincipalOrgID and specify my organization ID in the condition element of the statement to make sure only the principals from the accounts in my organization can access this bucket.
Summary
You can now use the aws:PrincipalOrgID condition key in your resource-based policies to more easily restrict access to IAM principals from accounts in your AWS organization. For more information about this global condition key and policy examples using aws:PrincipalOrgID, read the IAM documentation.
If you have comments about this post, submit them in the Comments section below. If you have questions about or suggestions for this solution, start a new thread on the IAM forum or contact AWS Support.
Want more AWS Security news? Follow us on Twitter.
Since our last System and Organization Control (SOC) audit, our service and compliance teams have been working to increase the number of AWS Services in scope prioritized based on customer requests. Today, we’re happy to report 11 services are newly SOC compliant, which is a 21 percent increase in the last six months.
With the addition of the following 11 new services, you can now select from a total of 62 SOC-compliant services. To see the full list, go to our Services in Scope by Compliance Program page:
Our latest SOC 1, 2, and 3 reports covering the period from October 1, 2017 to March 31, 2018 are now available. The SOC 1 and 2 reports are available on-demand through AWS Artifact by logging into the AWS Management Console. The SOC 3 report can be downloaded here.
Finally, prospective customers can read our SOC 1 and 2 reports by reaching out to AWS Compliance.
Want more AWS Security news? Follow us on Twitter.
This blog post was co-authored by Ujjwal Ratan, a senior AI/ML solutions architect on the global life sciences team.
Healthcare data is generated at an ever-increasing rate and is predicted to reach 35 zettabytes by 2020. Being able to cost-effectively and securely manage this data whether for patient care, research or legal reasons is increasingly important for healthcare providers.
Healthcare providers must have the ability to ingest, store and protect large volumes of data including clinical, genomic, device, financial, supply chain, and claims. AWS is well-suited to this data deluge with a wide variety of ingestion, storage and security services (e.g. AWS Direct Connect, Amazon Kinesis Streams, Amazon S3, Amazon Macie) for customers to handle their healthcare data. In a recent Healthcare IT News article, healthcare thought-leader, John Halamka, noted, “I predict that five years from now none of us will have datacenters. We’re going to go out to the cloud to find EHRs, clinical decision support, analytics.”
I realize simply storing this data is challenging enough. Magnifying the problem is the fact that healthcare data is increasingly attractive to cyber attackers, making security a top priority. According to Mariya Yao in her Forbes column, it is estimated that individual medical records can be worth hundreds or even thousands of dollars on the black market.
In this first of a 2-part post, I will address the value that AWS can bring to customers for ingesting, storing and protecting provider’s healthcare data. I will describe key components of any cloud-based healthcare workload and the services AWS provides to meet these requirements. In part 2 of this post we will dive deep into the AWS services used for advanced analytics, artificial intelligence and machine learning.
The data tsunami is upon us
So where is this data coming from? In addition to the ubiquitous electronic health record (EHR), the sources of this data include:
genomic sequencers
devices such as MRIs, x-rays and ultrasounds
sensors and wearables for patients
medical equipment telemetry
mobile applications
Additional sources of data come from non-clinical, operational systems such as:
human resources
finance
supply chain
claims and billing
Data from these sources can be structured (e.g., claims data) as well as unstructured (e.g., clinician notes). Some data comes across in streams such as that taken from patient monitors, while some comes in batch form. Still other data comes in near-real time such as HL7 messages. All of this data has retention policies dictating how long it must be stored. Much of this data is stored in perpetuity as many systems in use today have no purge mechanism. AWS has services to manage all these data types as well as their retention, security and access policies.
Imaging is a significant contributor to this data tsunami. Increasing demand for early-stage diagnoses along with aging populations drive increasing demand for images from CT, PET, MRI, ultrasound, digital pathology, X-ray and fluoroscopy. For example, a thin-slice CT image can be hundreds of megabytes. Increasing demand and strict retention policies make storage costly.
Due to the plummeting cost of gene sequencing, molecular diagnostics (including liquid biopsy) is a large contributor to this data deluge. Many predict that as the value of molecular testing becomes more identifiable, the reimbursement models will change and it will increasingly become the standard of care. According to the Washington Post article “Sequencing the Genome Creates so Much Data We Don’t Know What to do with It,”
“Some researchers predict that up to one billion people will have their genome sequenced by 2025 generating up to 40 exabytes of data per year.”
Although genomics is primarily used for oncology diagnostics today, it’s also used for other purposes, pharmacogenomics — used to understand how an individual will metabolize a medication.
Reference Architecture
It is increasingly challenging for the typical hospital, clinic or physician practice to securely store, process and manage this data without cloud adoption.
Amazon has a variety of ingestion techniques depending on the nature of the data including size, frequency and structure. AWS Snowball and AWS Snowmachine are appropriate for extremely-large, secure data transfers whether one time or episodic. AWS Glue is a fully-managed ETL service for securely moving data from on-premise to AWS and Amazon Kinesis can be used for ingesting streaming data.
Amazon S3, Amazon S3 IA, and Amazon Glacier are economical, data-storage services with a pay-as-you-go pricing model that expand (or shrink) with the customer’s requirements.
The above architecture has four distinct components – ingestion, storage, security, and analytics. In this post I will dive deeper into the first three components, namely ingestion, storage and security. In part 2, I will look at how to use AWS’ analytics services to draw value on, and optimize, your healthcare data.
Ingestion
A typical provider data center will consist of many systems with varied datasets. AWS provides multiple tools and services to effectively and securely connect to these data sources and ingest data in various formats. The customers can choose from a range of services and use them in accordance with the use case.
For use cases involving one-time (or periodic), very large data migrations into AWS, customers can take advantage of AWS Snowball devices. These devices come in two sizes, 50 TB and 80 TB and can be combined together to create a petabyte scale data transfer solution.
The devices are easy to connect and load and they are shipped to AWS avoiding the network bottlenecks associated with such large-scale data migrations. The devices are extremely secure supporting 256-bit encryption and come in a tamper-resistant enclosure. AWS Snowball imports data in Amazon S3 which can then interface with other AWS compute services to process that data in a scalable manner.
For use cases involving a need to store a portion of datasets on premises for active use and offload the rest on AWS, the Amazon storage gateway service can be used. The service allows you to seamlessly integrate on premises applications via standard storage protocols like iSCSI or NFS mounted on a gateway appliance. It supports a file interface, a volume interface and a tape interface which can be utilized for a range of use cases like disaster recovery, backup and archiving, cloud bursting, storage tiering and migration.
The AWS Storage Gateway appliance can use the AWS Direct Connect service to establish a dedicated network connection from the on premises data center to AWS.
Specific Industry Use Cases
By using the AWS proposed reference architecture for disaster recovery, healthcare providers can ensure their data assets are securely stored on the cloud and are easily accessible in the event of a disaster. The “AWS Disaster Recovery” whitepaper includes details on options available to customers based on their desired recovery time objective (RTO) and recovery point objective (RPO).
AWS is an ideal destination for offloading large volumes of less-frequently-accessed data. These datasets are rarely used in active compute operations but are exceedingly important to retain for reasons like compliance. By storing these datasets on AWS, customers can take advantage of the highly-durable platform to securely store their data and also retrieve them easily when they need to. For more details on how AWS enables customers to run back and archival use cases on AWS, please refer to the following set of whitepapers.
A healthcare provider may have a variety of databases spread throughout the hospital system supporting critical applications such as EHR, PACS, finance and many more. These datasets often need to be aggregated to derive information and calculate metrics to optimize business processes. AWS Glue is a fully-managed Extract, Transform and Load (ETL) service that can read data from a JDBC-enabled, on-premise database and transfer the datasets into AWS services like Amazon S3, Amazon Redshift and Amazon RDS. This allows customers to create transformation workflows that integrate smaller datasets from multiple sources and aggregates them on AWS.
Healthcare providers deal with a variety of streaming datasets which often have to be analyzed in near real time. These datasets come from a variety of sources such as sensors, messaging buses and social media, and often do not adhere to an industry standard. The Amazon Kinesis suite of services, that includes Amazon Kinesis Streams, Amazon Kinesis Firehose, and Amazon Kinesis Analytics, are the ideal set of services to accomplish the task of deriving value from streaming data.
Example: Using AWS Glue to de-identify and ingest healthcare data into S3 Let’s consider a scenario in which a provider maintains patient records in a database they want to ingest into S3. The provider also wants to de-identify the data by stripping personally- identifiable attributes and store the non-identifiable information in an S3 bucket. This bucket is different from the one that contains identifiable information. Doing this allows the healthcare provider to separate sensitive information with more restrictions set up via S3 bucket policies.
To ingest records into S3, we create a Glue job that reads from the source database using a Glue connection. The connection is also used by a Glue crawler to populate the Glue data catalog with the schema of the source database. We will use the Glue development endpoint and a zeppelin notebook server on EC2 to develop and execute the job.
Step 1: Import the necessary libraries and also set a glue context which is a wrapper on the spark context:
Step 2: Create a dataframe from the source data. I call the dataframe “readmissionsdata”. Here is what the schema would look like:
Step 3: Now select the columns that contains indentifiable information and store it in a new dataframe. Call the new dataframe “phi”.
Step 4: Non-PHI columns are stored in a separate dataframe. Call this dataframe “nonphi”.
Step 5: Write the two dataframes into two separate S3 buckets
Once successfully executed, the PHI and non-PHI attributes are stored in two separate files in two separate buckets that can be individually maintained.
Storage
In 2016, 327 healthcare providers reported a protected health information (PHI) breach, affecting 16.4m patient records[1]. There have been 342 data breaches reported in 2017 — involving 3.2 million patient records.[2]
To date, AWS has released 51 HIPAA-eligible services to help customers address security challenges and is in the process of making many more services HIPAA-eligible. These HIPAA-eligible services (along with all other AWS services) help customers build solutions that comply with HIPAA security and auditing requirements. A catalogue of HIPAA-enabled services can be found at AWS HIPAA-eligible services. It is important to note that AWS manages physical and logical access controls for the AWS boundary. However, the overall security of your workloads is a shared responsibility, where you are responsible for controlling user access to content on your AWS accounts.
AWS storage services allow you to store data efficiently while maintaining high durability and scalability. By using Amazon S3 as the central storage layer, you can take advantage of the Amazon S3 storage management features to get operational metrics on your data sets and transition them between various storage classes to save costs. By tagging objects on Amazon S3, you can build a governance layer on Amazon S3 to grant role based access to objects using Amazon IAM and Amazon S3 bucket policies.
To learn more about the Amazon S3 storage management features, see the following link.
Security
In the example above, we are storing the PHI information in a bucket named “phi.” Now, we want to protect this information to make sure its encrypted, does not have unauthorized access, and all access requests to the data are logged.
Encryption: S3 provides settings to enable default encryption on a bucket. This ensures any object in the bucket is encrypted by default.
Logging: S3 provides object level logging that can be used to capture all API calls to the object. The API calls are logged in cloudtrail for easy access and consolidation. Moreover, it also supports events to proactively alert customers of read and write operations.
Access control: Customers can use S3 bucket policies and IAM policies to restrict access to the phi bucket. It can also put a restriction to enforce multi-factor authentication on the bucket. For example, the following policy enforces multi-factor authentication on the phi bucket:
In Part 1 of this blog, we detailed the ingestion, storage, security and management of healthcare data on AWS. Stay tuned for part two where we are going to dive deep into optimizing the data for analytics and machine learning.
In this tutorial from The MagPi issue 68, Steve Martin takes us through the process of house-building in Minecraft Pi. Get your copy of The MagPi in stores now, or download it as a free PDF here.
Writing programs that create things in Minecraft is not only a great way to learn how to code, but it also means that you have a program that you can run again and again to make as many copies of your Minecraft design as you want. You never need to worry about your creation being destroyed by your brother or sister ever again — simply rerun your program and get it back! Whilst it might take a little longer to write the program than to build one house, once it’s finished you can build as many houses as you want.
Co-ordinates in Minecraft
Let’s start with a review of the coordinate system that Minecraft uses to know where to place blocks. If you are already familiar with this, you can skip to the next section. Otherwise, read on.
Plan view of our house design
Minecraft shows us a three-dimensional (3D) view of the world. Imagine that the room you are in is the Minecraft world and you want to describe your location within that room. You can do so with three numbers, as follows:
How far across the room are you? As you move from side to side, you change this number. We can consider this value to be our X coordinate.
How high off the ground are you? If you are upstairs, or if you jump, this value increases. We can consider this value to be our Y coordinate.
How far into the room are you? As you walk forwards or backwards, you change this number. We can consider this value to be our Z coordinate.
You might have done graphs in school with X going across the page and Y going up the page. Coordinates in Minecraft are very similar, except that we have an extra value, Z, for our third dimension. Don’t worry if this still seems a little confusing: once we start to build our house, you will see how these three dimensions work in Minecraft.
Designing our house
It is a good idea to start with a rough design for our house. This will help us to work out the values for the coordinates when we are adding doors and windows to our house. You don’t have to plan every detail of your house right away. It is always fun to enhance it once you have got the basic design written. The image above shows the plan view of the house design that we will be creating in this tutorial. Note that because this is a plan view, it only shows the X and Z co-ordinates; we can’t see how high anything is. Hopefully, you can imagine the house extending up from the screen.
We will build our house close to where the Minecraft player is standing. This a good idea when creating something in Minecraft with Python, as it saves us from having to walk around the Minecraft world to try to find our creation.
Starting our program
Type in the code as you work through this tutorial. You can use any editor you like; we would suggest either Python 3 (IDLE) or Thonny Python IDE, both of which you can find on the Raspberry Pi menu under Programming. Start by selecting the File menu and creating a new file. Save the file with a name of your choice; it must end with .py so that the Raspberry Pi knows that it is a Python program.
It is important to enter the code exactly as it is shown in the listing. Pay particular attention to both the spelling and capitalisation (upper- or lower-case letters) used. You may find that when you run your program the first time, it doesn’t work. This is very common and just means there’s a small error somewhere. The error message will give you a clue about where the error is.
It is good practice to start all of your Python programs with the first line shown in our listing. All other lines that start with a # are comments. These are ignored by Python, but they are a good way to remind us what the program is doing.
The two lines starting with from tell Python about the Minecraft API; this is a code library that our program will be using to talk to Minecraft. The line starting mc = creates a connection between our Python program and the game. Then we get the player’s location broken down into three variables: x, y, and z.
Building the shell of our house
To help us build our house, we define three variables that specify its width, height, and depth. Defining these variables makes it easy for us to change the size of our house later; it also makes the code easier to understand when we are setting the co-ordinates of the Minecraft bricks. For now, we suggest that you use the same values that we have; you can go back and change them once the house is complete and you want to alter its design.
It’s now time to start placing some bricks. We create the shell of our house with just two lines of code! These lines of code each use the setBlocks command to create a complete block of bricks. This function takes the following arguments:
setBlocks(x1, y1, z1, x2, y2, z2, block-id, data)
x1, y1, and z1 are the coordinates of one corner of the block of bricks that we want to create; x1, y1, and z1 are the coordinates of the other corner. The block-id is the type of block that we want to use. Some blocks require another value called data; we will see this being used later, but you can ignore it for now.
We have to work out the values that we need to use in place of x1, y1, z1, x1, y1, z1 for our walls. Note that what we want is a larger outer block made of bricks and that is filled with a slightly smaller block of air blocks. Yes, in Minecraft even air is actually just another type of block.
Once you have typed in the two lines that create the shell of your house, you almost ready to run your program. Before doing so, you must have Minecraft running and displaying the contents of your world. Do not have a world loaded with things that you have created, as they may get destroyed by the house that we are building. Go to a clear area in the Minecraft world before running the program. When you run your program, check for any errors in the ‘console’ window and fix them, repeatedly running the code again until you’ve corrected all the errors.
You should see a block of bricks now, as shown above. You may have to turn the player around in the Minecraft world before you can see your house.
Adding the floor and door
Now, let’s make our house a bit more interesting! Add the lines for the floor and door. Note that the floor extends beyond the boundary of the wall of the house; can you see how we achieve this?
Hint: look closely at how we calculate the x and z attributes as compared to when we created the house shell above. Also note that we use a value of y-1 to create the floor below our feet.
Minecraft doors are two blocks high, so we have to create them in two parts. This is where we have to use the data argument. A value of 0 is used for the lower half of the door, and a value of 8 is used for the upper half (the part with the windows in it). These values will create an open door. If we add 4 to each of these values, a closed door will be created.
Before you run your program again, move to a new location in Minecraft to build the house away from the previous one. Then run it to check that the floor and door are created; you will need to fix any errors again. Even if your program runs without errors, check that the floor and door are positioned correctly. If they aren’t, then you will need to check the arguments so setBlock and setBlocks are exactly as shown in the listing.
Adding windows
Hopefully you will agree that your house is beginning to take shape! Now let’s add some windows. Looking at the plan for our house, we can see that there is a window on each side; see if you can follow along. Add the four lines of code, one for each window.
Now you can move to yet another location and run the program again; you should have a window on each side of the house. Our house is starting to look pretty good!
Adding a roof
The final stage is to add a roof to the house. To do this we are going to use wooden stairs. We will do this inside a loop so that if you change the width of your house, more layers are added to the roof. Enter the rest of the code. Be careful with the indentation: I recommend using spaces and avoiding the use of tabs. After the if statement, you need to indent the code even further. Each indentation level needs four spaces, so below the line with if on it, you will need eight spaces.
Since some of these code lines are lengthy and indented a lot, you may well find that the text wraps around as you reach the right-hand side of your editor window — don’t worry about this. You will have to be careful to get those indents right, however.
Now move somewhere new in your world and run the complete program. Iron out any last bugs, then admire your house! Does it look how you expect? Can you make it better?
Customising your house
Now you can start to customise your house. It is a good idea to use Save As in the menu to save a new version of your program. Then you can keep different designs, or refer back to your previous program if you get to a point where you don’t understand why your new one doesn’t work.
Consider these changes:
Change the size of your house. Are you able also to move the door and windows so they stay in proportion?
Change the materials used for the house. An ice house placed in an area of snow would look really cool!
Add a back door to your house. Or make the front door a double-width door!
We hope that you have enjoyed writing this program to build a house. Now you can easily add a house to your Minecraft world whenever you want to by simply running this program.
You may also enjoy Martin O’Hanlon’s and David Whale’s Adventures in Minecraft, and the Hacking and Making in Minecraft MagPi Essentials guide, which you can download for free or buy in print here.
In this post, I’ll show you how to create a sample dataset for Amazon Macie, and how you can use Amazon Macie to implement data-centric compliance and security analytics in your Amazon S3 environment. I’ll also dive into the different kinds of credentials, document types, and PII detections supported by Macie. First, I’ll walk through creating a “getting started” sample set of artificial, generated data that you can use to test Macie capabilities and start building your own policies and alerts.
Create a realistic data sample set in S3
I’ll use amazon-macie-activity-generator, which we call “AMG” for short, a sample application developed by AWS that generates realistic content and accesses your test account to create the data. AMG uses AWS CloudFormation, AWS Lambda, and Python’s excellent Faker library to create a data set with artificial—but realistic—data classifications and access patterns to help test some of the features and capabilities of Macie. AMG is released under Amazon Software License 1.0, and we’ll accept pull requests on our GitHub repository and monitor any issues that are opened so we can try to fix bugs and consider new feature requests.
The following diagram shows a high level architecture overview of the components that will be created in your AWS account for AMG. For additional detail about these components and their relationships, review the CloudFormation setup script.
Depending on the data types specified in your JSON configuration template (details below), AMG will periodically generate artificial documents for the specified S3 target with a PutObject action. By default, the CloudFormation stack uses a configuration file that instructs AMG to create a new, private S3 bucket that can only be accessed by authorized AWS users/roles in the same account as the bucket. All the S3 objects with fake data in this bucket have a private ACL and inherit the bucket’s access control configuration. All generated objects feature the header in the example below, and AMG supports all fake data providers offered by https://faker.readthedocs.io/en/latest/index.html, as well as a few of AMG‘s own custom fake data providers requested by our customers: aws_creds, slack_creds, github_creds, facebook_creds, linux_shadow, rsa, linux_passwd, dsa, ec, pgp, cert, itin, swift_code, and cve.
# Sample Report - No identification of actual persons or places is # intended or should be inferred
74323 Julie Field Lake Joshuamouth, OR 30055-3905 1-196-191-4438x974 53001 Paul Union New John, HI 94740 Mastercard Amanda Wells 5135725008183484 09/26 CVV: 550
354-70-6172 242 George Plaza East Lawrencefurt, VA 37287-7620 GB73WAUS0628038988364 587 Silva Village Pearsonburgh, NM 11616-7231 LDNM1948227117807 American Express Brett Garza 347965534580275 05/20 CID: 4758
599.335.2742 JCB 15 digit Michael Arias 210069190253121 03/27 CVC: 861
Create your amazon-macie-activity-generator CloudFormation stack
You can deploy AMG in your AWS account by using either these methods:
Log in to the AWS Console in a region supported by Amazon Macie, which currently includes US East (N. Virginia), US West (Oregon).
Select the One-click CloudFormation launch stack, or launch CloudFormation using the template above.
Read our terms, select the Acknowledgement box, and then select Create.
Creating the data takes a few minutes, and you can periodically refresh CloudWatch to track progress.
Add the new sample data to Macie
Now, I’ll log into the Macie console and add the newly created sample data buckets for analysis by Macie.
Note: If you don’t explicitly specify a bucket for S3 targets in CloudFormation, AMG will use the S3 bucket that’s created by default for the stack, which will be printed out in the CloudFormation stack’s output.
To add buckets for data classification, follow these steps:
Log in to Amazon Macie.
Select Integrations, and then select Services.
Select your account, and then select Details from the Amazon S3 card.
Select your newly created buckets for Full classification, including existing data.
For additional details on configuring Macie, refer to our getting started documentation.
Macie classifies all historical and newly created data in the buckets created by AMG, and the data will be available in the Macie console as it’s classified. Typically, you can expect the data in the sample set to be classified within 60 minutes of the time it was selected for analysis.
Classifying objects with Macie
To see the objects in your test sample set, in Macie, open the Research tab, and then select the S3 Objects index. We’ll use the regular expression search capability in Macie to find any objects written to buckets that start with “amazon-macie-activity-generator-defaults3bucket”. To search for this, type the following text into the Macie search box and select the magnifying glass icon.
From here, you can see a nice breakdown of the kinds of objects that have been classified by Macie, as well as the object-specific details. Create an advanced search using Lucene Query Syntax, and save it as an alert to be matched against any newly created data.
Analyzing accesses to your test data
In addition to classifying data, Macie tracks all control plane and data plane accesses to your content using CloudTrail. To see accesses to your generated environment (created periodically by AMG to mimic user activity), on the Macie navigation bar, select Research, select the CloudTrail data index, and then use the following search to identify our generated role activity:
From this search, you can dive into the user activity (IAM users, assumed roles, federated users, and so on), which is summarized in 5-minute aggregations (user sessions). For example, in the screen shot you can see that one of our AMG-generated users listed objects one time (ListObjects) and wrote 56 objects to S3 (PutObject) during a 5-minute period.
Macie alerts
Macie features both predictive (machine learning-based) and basic (rule-based) alerts, including alerts on unencrypted credentials being uploaded to S3 (because this activity might not follow compliance best practices), risky activity such as data exfiltration, and user-defined alerts that are based on saved searches. To see alerts that have been generated based on AMG‘s activity, on the Macie navigation bar, select Alerts.
AMG will continue to run, periodically uploading content to the specified S3 buckets. To stop AMG, delete the AMG CloudFormation stack and associated resources here.
What are the costs?
Macie has a free tier enabling up to 1GB of content to be analyzed per month at no cost to you. By default, AMG will write approximately 10MB of objects to Amazon S3 per day, and you will incur charges for data classification after crossing the 1GB monthly free tier. Running continuously, AMG will generate about 310MB of content per month (10MB/day x 31 days), which will stay below the free tier. Any data use above 1GB will be billed at the Macie public price of $5/GB. For more detail, see the Macie pricing documentation.
If you have feedback about this blog post, submit comments in the Comments section below. If you have questions about this blog post, start a new thread on the Amazon Macie forum or contact AWS Support.
A couple of weekends ago, we celebrated our sixth birthday by coordinating more than 100 simultaneous Raspberry Jam events around the world. The Big Birthday Weekend was a huge success: our fantastic community organised Jams in 40 countries, covering six continents!
We sent the Jams special birthday kits to help them celebrate in style, and a video message featuring a thank you from Philip and Eben:
To celebrate the Raspberry Pi’s sixth birthday, we coordinated Raspberry Jams all over the world to take place over the Raspberry Jam Big Birthday Weekend, 3-4 March 2018. A massive thank you to everyone who ran an event and attended.
The Raspberry Jam photo booth
I put together code for a Pi-powered photo booth which overlaid the Big Birthday Weekend logo onto photos and (optionally) tweeted them. We included an arcade button in the Jam kits so they could build one — and it seemed to be quite popular. Some Jams put great effort into housing their photo booth:
If you want to try out the photo booth software yourself, find the code on GitHub.
The great Raspberry Jam bake-off
Traditionally, in the UK, people have a cake on their birthday. And we had a few! We saw (and tasted) a great selection of Pi-themed cakes and other baked goods throughout the weekend:
Raspberry Jams everywhere
We always say that every Jam is different, but there’s a common and recognisable theme amongst them. It was great to see so many different venues around the world filling up with like-minded Pi enthusiasts, Raspberry Jam–branded banners, and Raspberry Pi balloons!
Thank you so much to all the attendees of the Ikana Jam in Krakow past Saturday! We shared fun experiences, some of them… also painful 😉 A big thank you to @Raspberry_Pi for these global celebrations! And a big thank you to @hubraum for their hospitality! #PiParty #rjam
We also had a super successful set of wearables workshops using @adafruit Circuit Playground Express boards and conductive thread at today’s @Raspberry_Pi Jam! Very popular! #PiParty
Learning how to scare the zombies in case of an apocalypse- it worked on our young learners #PiParty @worksopcollege @Raspberry_Pi https://t.co/pntEm57TJl
Being one of the two places in Kenya where the #PiParty took place, it was an amazing time spending the day with this team and getting to learn and have fun. @TaitaTavetaUni and @Raspberry_Pi thank you for your support. @TTUTechlady @mictecttu ch
The Philly & Pi #PiParty event with @Bresslergroup and @TechGirlzorg was awesome! The Scratch and Pi workshop was amazing! It was overall a great day of fun and tech!!! Thank you everyone who came out!
Thanks everyone who came out to the @Raspberry_Pi Big Birthday Jam! Special thanks to @PBFerrell @estefanniegg @pcsforme @pandafulmanda @colnels @bquentin3 couldn’t’ve put on this amazing community event without you guys!
Así terminamos el #Raspberry Jam Big Birthday Weekend #Bogota 2018 #PiParty de #RaspberryJamBogota 2018 @Raspberry_Pi Nos vemos el 7 de marzo en #ArduinoDayBogota 2018 y #RaspberryJamBogota 2018
Happy 6th birthday, @Raspberry_Pi! Greetings all the way from CEBU,PH! #PiParty #IoTCebu Thanks @CebuXGeeks X Ramos for these awesome pics. #Fablab #UPCebu
ラズパイ、6才のお誕生日会スタート in Tokyo PCNブースで、いろいろ展示とhttps://t.co/L6E7KgyNHFとIchigoJamつないだ、こどもIoTハッカソンmini体験やってます at 東京蒲田駅近 https://t.co/yHEuqXHvqe #piparty #pipartytokyo #rjam #opendataday
Personally, I managed to get to three Jams over the weekend: two run by the same people who put on the first two Jams to ever take place, and also one brand-new one! The Preston Raspberry Jam team, who usually run their event on a Monday evening, wanted to do something extra special for the birthday, so they came up with the idea of putting on a Raspberry Jam Sandwich — on the Friday and Monday around the weekend! This meant I was able to visit them on Friday, then attend the Manchester Raspberry Jam on Saturday, and finally drop by the new Jam at Worksop College on my way home on Sunday.
I’m at my first Raspberry Jam #PiParty event of the big birthday weekend! @PrestonRJam has been running for nearly 6 years and is a great place to start the celebrations!
Thanks to everyone who came to our Jam and everyone who helped out. @phoenixtogether thanks for amazing cake & hosting. Ademir you’re so cool. It was awesome to meet Craig Morley from @Raspberry_Pi too. #PiParty
Great #PiParty today at the @cotswoldjam with bloody delicious cake and lots of raspberry goodness. Great to see @ClareSutcliffe @martinohanlon playing on my new pi powered arcade build:-)
It’s @Raspberry_Pi 6th birthday and we’re celebrating by taking part in @amsterjam__! Happy Birthday Raspberry Pi, we’re so happy to be a part of the family! #PiParty
For more Jammy birthday goodness, check out the PiParty hashtag on Twitter!
The Jam makers!
A lot of preparation went into each Jam, and we really appreciate all the hard work the Jam makers put in to making these events happen, on the Big Birthday Weekend and all year round. Thanks also to all the teams that sent us a group photo:
Lots of the Jams that took place were brand-new events, so we hope to see them continue throughout 2018 and beyond, growing the Raspberry Pi community around the world and giving more people, particularly youths, the opportunity to learn digital making skills.
So many wonderful people in the @Raspberry_Pi community. Thanks to everyone at #PottonPiAndPints for a great afternoon and for everything you do to help young people learn digital making. #PiParty
Special thanks to ModMyPi for shipping the special Raspberry Jam kits all over the world!
Don’t forget to check out our Jam page to find an event near you! This is also where you can find free resources to help you get a new Jam started, and download free starter projects made especially for Jam activities. These projects are available in English, Français, Français Canadien, Nederlands, Deutsch, Italiano, and 日本語. If you’d like to help us translate more content into these and other languages, please get in touch!
PS Some of the UK Jams were postponed due to heavy snowfall, so you may find there’s a belated sixth-birthday Jam coming up where you live!
Amazon S3 provides comprehensive security and compliance capabilities that meet even the most stringent regulatory requirements. It gives you flexibility in the way you manage data for cost optimization, access control, and compliance. However, because the service is flexible, a user could accidentally configure buckets in a manner that is not secure. For example, let’s say you uploaded files to an Amazon S3 bucket with public read permissions, even though you intended only to share this file with a colleague or a partner. Although this might have accomplished your task to share the file internally, the file is now available to anyone on the internet, even without authentication.
In this blog post, we show you how to prevent your Amazon S3 buckets and objects from allowing public access. We discuss how to secure data in Amazon S3 with a defense-in-depth approach, where multiple security controls are put in place to help prevent data leakage. This approach helps prevent you from allowing public access to confidential information, such as personally identifiable information (PII) or protected health information (PHI).
Preventing your Amazon S3 buckets and objects from allowing public access
Every call to an Amazon S3 service becomes a REST API request. When your request is transformed via a REST call, the permissions are converted into parameters included in the HTTP header or as URL parameters. The Amazon S3 bucket policy allows or denies access to the Amazon S3 bucket or Amazon S3 objects based on policy statements, and then evaluates conditions based on those parameters. To learn more, see Using Bucket Policies and User Policies.
With this in mind, let’s say multiple AWS Identity and Access Management (IAM) users at Example Corp. have access to an Amazon S3 bucket and the objects in the bucket. Example Corp. wants to share the objects among its IAM users, while at the same time preventing the objects from being made available publicly.
To demonstrate how to do this, we start by creating an Amazon S3 bucket named examplebucket. After creating this bucket, we must apply the following bucket policy. This policy denies any uploaded object (PutObject) with the attribute x-amz-acl having the values public-read, public-read-write, or authenticated-read. This means authenticated users cannot upload objects to the bucket if the objects have public permissions.
“Deny any Amazon S3 request to PutObject or PutObjectAcl in the bucket examplebucket when the request includes one of the following access control lists (ACLs): public-read, public-read-write, or authenticated-read.”
Remember that IAM policies are evaluated not in a first-match-and-exit model. Instead, IAM evaluates first if there is an explicit Deny. If there is not, IAM continues to evaluate if you have an explicit Allow and then you have an implicit Deny.
The above policy creates an explicit Deny. Even when any authenticated user tries to upload (PutObject) an object with public read or write permissions, such as public-read or public-read-write or authenticated-read, the action will be denied. To understand how S3 Access Permissions work, you must understand what Access Control Lists (ACL) and Grants are. You can find the documentation here.
Now let’s continue our bucket policy explanation by examining the next statement.
This statement is very similar to the first statement, except that instead of checking the ACLs, we are checking specific user groups’ grants that represent the following groups:
AuthenticatedUsers group. Represented by http://acs.amazonaws.com/groups/global/AuthenticatedUsers, this group represents all AWS accounts. Access permissions to this group allow any AWS account to access the resource. However, all requests must be signed (authenticated).
AllUsers group. Represented by http://acs.amazonaws.com/groups/global/AllUsers, access permissions to this group allow anyone on the internet access to the resource. The requests can be signed (authenticated) or unsigned (anonymous). Unsigned requests omit the Authentication header in the request.
Now that you know how to deny object uploads with permissions that would make the object public, you just have two statement policies that prevent users from changing the bucket permissions (Denying s3:PutBucketACL from ACL and Denying s3:PutBucketACL from Grants).
Below is how we’re preventing users from changing the bucket permisssions.
As you can see above, the statement is very similar to the Object statements, except that now we use s3:PutBucketAcl instead of s3:PutObjectAcl, the Resource is just the bucket ARN, and the objects have the “/*” in the end of the ARN.
In this section, we showed how to prevent IAM users from accidently uploading Amazon S3 objects with public permissions to buckets. In the next section, we show you how to enforce multiple layers of security controls, such as encryption of data at rest and in transit while serving traffic from Amazon S3.
Securing data on Amazon S3 with defense-in-depth
Let’s say that Example Corp. wants to serve files securely from Amazon S3 to its users with the following requirements:
The data must be encrypted at rest and during transit.
The data must be accessible only by a limited set of public IP addresses.
All requests for data should be handled only by Amazon CloudFront (which is a content delivery network) instead of being directly available from an Amazon S3 URL. If you’re using an Amazon S3 bucket as the origin for a CloudFront distribution, you can grant public permission to read the objects in your bucket. This allows anyone to access your objects either through CloudFront or the Amazon S3 URL. CloudFront doesn’t expose Amazon S3 URLs, but your users still might have access to those URLs if your application serves any objects directly from Amazon S3, or if anyone gives out direct links to specific objects in Amazon S3.
A domain name is required to consume the content. Custom SSL certificate support lets you deliver content over HTTPS by using your own domain name and your own SSL certificate. This gives visitors to your website the security benefits of CloudFront over an SSL connection that uses your own domain name, in addition to lower latency and higher reliability.
To represent defense-in-depth visually, the following diagram contains several Amazon S3 objects (A) in a single Amazon S3 bucket (B). You can encrypt these objects on the server side or the client side. You also can configure the bucket policy such that objects are accessible only through CloudFront, which you can accomplish through an origin access identity (C). You then can configure CloudFront to deliver content only over HTTPS in addition to using your own domain name (D).
Defense-in-depth requirement 1: Data must be encrypted at rest and during transit
Let’s start with the objects themselves. Amazon S3 objects—files in this case—can range from zero bytes to multiple terabytes in size (see service limits for the latest information). Each Amazon S3 bucket includes a collection of objects, and the objects can be uploaded via the Amazon S3 console, AWS CLI, or AWS API.
If you choose to use server-side encryption, Amazon S3 encrypts your objects before saving them on disks in AWS data centers. To encrypt an object at the time of upload, you need to add the x-amz-server-side-encryption header to the request to tell Amazon S3 to encrypt the object using Amazon S3 managed keys (SSE-S3), AWS KMS managed keys (SSE-KMS), or customer-provided keys (SSE-C). There are two possible values for the x-amz-server-side-encryption header: AES256, which tells Amazon S3 to use Amazon S3 managed keys, and aws:kms, which tells Amazon S3 to use AWS KMS managed keys.
The following code example shows a Put request using SSE-S3.
PUT /example-object HTTP/1.1
Host: myBucket.s3.amazonaws.com
Date: Wed, 8 Jun 2016 17:50:00 GMT
Authorization: authorization string
Content-Type: text/plain
Content-Length: 11434
x-amz-meta-author: Janet
Expect: 100-continue
x-amz-server-side-encryption: AES256
[11434 bytes of object data]
If you choose to use client-side encryption, you can encrypt data on the client side and upload the encrypted data to Amazon S3. In this case, you manage the encryption process, the encryption keys, and related tools. You encrypt data on the client side by using AWS KMS managed keys or a customer-supplied, client-side master key.
Defense-in-depth requirement 2: Data must be accessible only by a limited set of public IP addresses
At the Amazon S3 bucket level, you can configure permissions through a bucket policy. For example, you can limit access to the objects in a bucket by IP address range or specific IP addresses. Alternatively, you can make the objects accessible only through HTTPS.
The following bucket policy allows access to Amazon S3 objects only through HTTPS (the policy was generated with the AWS Policy Generator). Here the bucket policy explicitly denies ("Effect": "Deny") all read access ("Action": "s3:GetObject") from anybody who browses ("Principal": "*") to Amazon S3 objects within an Amazon S3 bucket if they are not accessed through HTTPS ("aws:SecureTransport": "false").
Defense-in-depth requirement 3: Data must not be publicly accessible directly from an Amazon S3 URL
Next, configure Amazon CloudFront to serve traffic from within the bucket. The use of CloudFront serves several purposes:
CloudFront is a content delivery network that acts as a cache to serve static files quickly to clients.
Depending on the number of requests, the cost of delivery is less than if objects were served directly via Amazon S3.
Objects served through CloudFront can be limited to specific countries.
Access to these Amazon S3 objects is available only through CloudFront. We do this by creating an origin access identity (OAI) for CloudFront and granting access to objects in the respective Amazon S3 bucket only to that OAI. As a result, access to Amazon S3 objects from the internet is possible only through CloudFront; all other means of accessing the objects—such as through an Amazon S3 URL—are denied. CloudFront acts not only as a content distribution network, but also as a host that denies access based on geographic restrictions. You apply these restrictions by updating your CloudFront web distribution and adding a whitelist that contains only a specific country’s name (let’s say Liechtenstein). Alternatively, you could add a blacklist that contains every country except that country. Learn more about how to use CloudFront geographic restriction to whitelist or blacklist a country to restrict or allow users in specific locations from accessing web content in the AWS Support Knowledge Center.
Defense-in-depth requirement 4: A domain name is required to consume the content
To serve content from CloudFront, you must use a domain name in the URLs for objects on your webpages or in your web application. The domain name can be either of the following:
The domain name that CloudFront automatically assigns when you create a distribution, such as d111111abcdef8.cloudfront.net
Your own domain name, such as example.com
For example, you might use one of the following URLs to return the file image.jpg:
You use the same URL format whether you store the content in Amazon S3 buckets or at a custom origin, like one of your own web servers.
Instead of using the default domain name that CloudFront assigns for you when you create a distribution, you can add an alternate domain name that’s easier to work with, like example.com. By setting up your own domain name with CloudFront, you can use a URL like this for objects in your distribution: http://example.com/images/image.jpg.
Let’s say that you already have a domain name hosted on Amazon Route 53. You would like to serve traffic from the domain name, request an SSL certificate, and add this to your CloudFront web distribution. The SSL offloading occurs in CloudFront by serving traffic securely from each CloudFront location. You also can configure CloudFront to deliver your content over HTTPS by using your custom domain name and your own SSL certificate. Serving web content through CloudFront reduces response from the origin as requests are redirected to the nearest edge location. This results in faster download times than if the visitor had requested the content from a data center that is located farther away.
Summary
In this post, we demonstrated how you can apply policies to Amazon S3 buckets so that only users with appropriate permissions are allowed to access the buckets. Anonymous users (with public-read/public-read-write permissions) and authenticated users without the appropriate permissions are prevented from accessing the buckets.
We also examined how to secure access to objects in Amazon S3 buckets. The objects in Amazon S3 buckets can be encrypted at rest and during transit. Doing so helps provide end-to-end security from the source (in this case, Amazon S3) to your users.
If you have feedback about this blog post, submit comments in the “Comments” section below. If you have questions about this blog post, start a new thread on the Amazon S3 forum or contact AWS Support.
This video presents the project MoveLens: a voice controlled glasses with magnifying lens. It was the my entry for the Voice Activated context on unstructables. Check the step by step guide at Voice Controlled Glasses With Magnifying Lens. Source code: https://github.com/pichiliani/MoveLens Step by Step guide: https://www.instructables.com/id/Voice-Controlled-Glasses-With-Magnifying-Lens/
It’s a kind of magnification
We’ve all been there – that moment when you need another pair of hands to complete a task. And while these glasses may not hold all the answers, they’re a perfect addition to any hobbyist’s arsenal.
Introducing Mauro Pichilliani’s voice-activated glasses: a pair of frames with magnification lenses that can flip up and down in response to a voice command, depending on the task at hand. No more needing to put down your tools in order to put magnifying glasses on. No more trying to re-position a magnifying glass with the back of your left wrist, or getting grease all over your lenses.
As Mauro explains in his tutorial for the glasses:
Many professionals work for many hours looking at very small areas, such as surgeons, watchmakers, jewellery designers and so on. Most of the time these professionals use some kind of magnification glasses that helps them to see better the area they are working with and other tiny items used on the job. The devices that had magnifications lens on a form factor of a glass usually allow the professional to move the lens out of their eye sight, i.e. put aside the lens. However, in some scenarios touching the lens or the glass rim to move away the lens can contaminate the fingers. Also, it is cumbersome and can break the concentration of the professional.
Voice-controlled magnification glasses
Using a Raspberry Pi Zero W, a servo motor, a microphone, and the IBM Watson speech-to-text service, Mauro built a pair of glasses that lets users control the position of the magnification lenses with voice commands.
The glasses Mauro modified, before he started work on them; you have to move the lenses with your hands, like it’s October 2015
Mauro started by dismantling a pair of standard magnification glasses in order to modify the lens supports to allow them to move freely. He drilled a hole in one of the lens supports to provide a place to attach the servo, and used lollipop sticks and hot glue to fix the lenses relative to one another, so they would both move together under the control of the servo. Then, he set up a Raspberry Pi Zero, installing Raspbian and software to use a USB microphone; after connecting the servo to the Pi Zero’s GPIO pins, he set up the Watson speech-to-text service.
Finally, he wrote the code to bring the project together. Two Python scripts direct the servo to raise and lower the lenses, and a Node.js script captures audio from the microphone, passes it on to Watson, checks for an “up” or “down” command, and calls the appropriate Python script as required.
Your turn
You can follow the tutorial on the Instructables website, where Mauro entered the glasses into the Instructables Voice Activated Challenge. And if you’d like to take your first steps into digital making using the Raspberry Pi, take a look at our free online projects.
— E. Harding🇸🇾, друг народа (anti-Russia=block) (@Enopoletus) March 1, 2018
The question is about a blog post that claims Tor privately tips off the government about vulnerabilities, using as proof a “vulnerability” from October 2007 that wasn’t made public until 2011.
The tl;dr is that it’s bunk. There was no vulnerability, it was a feature request. The details were already public. There was no spy agency involved, but the agency that does Voice of America, and which tries to protect activists under foreign repressive regimes.
Discussion
The issue is that Tor traffic looks like Tor traffic, making it easy to block/censor, or worse, identify users. Over the years, Tor has added features to make it look more and more like normal traffic, like the encrypted traffic used by Facebook, Google, and Apple. Tors improves this bit-by-bit over time, but short of actually piggybacking on website traffic, it will always leave some telltale signature.
An example showing how we can distinguish Tor traffic is the packet below, from the latest version of the Tor server:
Had this been Google or Facebook, the names would be something like “www.google.com” or “facebook.com”. Or, had this been a normal “self-signed” certificate, the names would still be recognizable. But Tor creates randomized names, with letters and numbers, making it distinctive. It’s hard to automate detection of this, because it’s only probably Tor (other self-signed certificates look like this, too), which means you’ll have occasional “false-positives”. But still, if you compare this to the pattern of traffic, you can reliably detect that Tor is happening on your network.
This has always been a known issue, since the earliest days. Google the search term “detect tor traffic”, and set your advanced search dates to before 2007, and you’ll see lots of discussion about this, such as this post for writing intrusion-detection signatures for Tor.
Among the things you’ll find is this presentation from 2006 where its creator (Roger Dingledine) talks about how Tor can be identified on the network with its unique network fingerprint. For a “vulnerability” they supposedly kept private until 2011, they were awfully darn public about it.
The above blogpost claims Tor kept this vulnerability secret until 2011 by citing this message. It’s because Levine doesn’t understand the terminology and is just blindly searching for an exact match for “TLS normalization”. Here’s an earlier proposed change for the long term goal of to “make our connection handshake look closer to a regular HTTPS [TLS] connection”, from February 2007. Here is another proposal from October 2007 on changing TLS certificates, from days after the email discussion (after they shipped the feature, presumably).
What we see here is here is a known problem from the very beginning of the project, a long term effort to fix that problem, and a slow dribble of features added over time to preserve backwards compatibility.
Now let’s talk about the original train of emails cited in the blogpost. It’s hard to see the full context here, but it sounds like BBG made a feature request to make Tor look even more like normal TLS, which is hinted with the phrase “make our funders happy”. Of course the people giving Tor money are going to ask for improvements, and of course Tor would in turn discuss those improvements with the donor before implementing them. It’s common in project management: somebody sends you a feature request, you then send the proposal back to them to verify what you are building is what they asked for.
As for the subsequent salacious paragraph about “secrecy”, that too is normal. When improving a problem, you don’t want to talk about the details until after you have a fix. But note that this is largely more for PR than anything else. The details on how to detect Tor are available to anybody who looks for them — they just aren’t readily accessible to the layman. For example, Tenable Networks announced the previous month exactly this ability to detect Tor’s traffic, because any techy wanting to would’ve found the secrets how to. Indeed, Teneble’s announcement may have been the impetus for BBG’s request to Tor: “can you fix it so that this new Tenable feature no longer works”.
To be clear, there are zero secret “vulnerability details” here that some secret spy agency could use to detect Tor. They were already known, and in the Teneble product, and within the grasp of any techy who wanted to discover them. A spy agency could just buy Teneble, or copy it, instead of going through this intricate conspiracy.
Conclusion
The issue isn’t a “vulnerability”. Tor traffic is recognizable on the network, and over time, they make it less and less recognizable. Eventually they’ll just piggyback on true HTTPS and convince CloudFlare to host ingress nodes, or something, making it completely undetectable. In the meanwhile, it leaves behind fingerprints, as I showed above.
What we see in the email exchanges is the normal interaction of a donor asking for a feature, not a private “tip off”. It’s likely the donor is the one who tipped off Tor, pointing out Tenable’s product to detect Tor.
Whatever secrets Tor could have tipped off to the “secret spy agency” were no more than what Tenable was already doing in a shipping product.
Update: People are trying to make it look like Voice of America is some sort of intelligence agency. That’s a conspiracy theory. It’s not a member of the American intelligence community. You’d have to come up with a solid reason explaining why the United States is hiding VoA’s membership in the intelligence community, or you’d have to believe that everything in the U.S. government is really just some arm of the C.I.A.
The Free Software Foundation has announced the availability of its 2016 annual report. “The Annual Report reviews the Foundation’s activities, accomplishments, and financial picture from October 1, 2015 to September 30, 2016. It is the result of a full external financial audit, along with a focused study of program results.” It may lack punctuality, but it makes up for it in glitz.
If you’re going to commit an illegal act, it’s best not to discuss it in e-mail. It’s also best to Google tech instructions rather than asking someone else to do it:
One new detail from the indictment, however, points to just how unsophisticated Manafort seems to have been. Here’s the relevant passage from the indictment. I’ve bolded the most important bits:
Manafort and Gates made numerous false and fraudulent representations to secure the loans. For example, Manafort provided the bank with doctored [profit and loss statements] for [Davis Manafort Inc.] for both 2015 and 2016, overstating its income by millions of dollars. The doctored 2015 DMI P&L submitted to Lender D was the same false statement previously submitted to Lender C, which overstated DMI’s income by more than $4 million. The doctored 2016 DMI P&L was inflated by Manafort by more than $3.5 million. To create the false 2016 P&L, on or about October 21, 2016, Manafort emailed Gates a .pdf version of the real 2016 DMI P&L, which showed a loss of more than $600,000. Gates converted that .pdf into a “Word” document so that it could be edited, which Gates sent back to Manafort. Manafort altered that “Word” document by adding more than $3.5 million in income. He then sent this falsified P&L to Gates and asked that the “Word” document be converted back to a .pdf, which Gates did and returned to Manafort. Manafort then sent the falsified 2016 DMI P&L .pdf to Lender D.
So here’s the essence of what went wrong for Manafort and Gates, according to Mueller’s investigation: Manafort allegedly wanted to falsify his company’s income, but he couldn’t figure out how to edit the PDF. He therefore had Gates turn it into a Microsoft Word document for him, which led the two to bounce the documents back-and-forth over email. As attorney and blogger Susan Simpson notes on Twitter, Manafort’s inability to complete a basic task on his own seems to have effectively “created an incriminating paper trail.”
If there’s a lesson here, it’s that the Internet constantly generates data about what people are doing on it, and that data is all potential evidence. The FBI is 100% wrong that they’re going dark; it’s really the golden age of surveillance, and the FBI’s panic is really just its own lack of technical sophistication.
I’m aware this is going to be a very niche topic. Electronically signing PDFs is far from a mainstream usecase. However, I’ll write it for two reasons – first, I think it will be very useful for those few who actually need it, and second, I think it will become more and more common as the eIDAS regulation gain popularity – it basically says that electronic signatures are recognized everywhere in Europe (now, it’s not exactly true, because of some boring legal details, but anyway).
So, what is the usecase – first, you have to electronically sign the PDF with an a digital signature (the legal term is “electronic signature”, so I’ll use them interchangeably, although they don’t fully match – e.g. any electronic data applied to other data can be seen as an electronic signature, where a digital signature is the PKI-based signature).
Second, you may want to actually display the signature on the pages, rather than have the PDF reader recognize it and show it in some side-panel. Why is that? Because people are used to seeing signatures on pages and some may insist on having the signature visible (true story – I’ve got a comment that a detached signature “is not a REAL electronic signature, because it’s not visible on the page”).
Now, notice that I wrote “pages”, on “page”. Yes, an electronic document doesn’t have pages – it’s a stream of bytes. So having the signature just on the last page is okay. But, again, people are used to signing all pages, so they’d prefer the electronic signature to be visible on all pages.
And that makes the task tricky – PDF is good with having a digital signature box on the last page, but having multiple such boxes doesn’t work well. Therefore one has to add other types of annotations that look like a signature box and when clicked open the signature panel (just like an actual signature box).
I have to introduce here DSS – a wonderful set of components by the European Commission that can be used to sign and validate all sorts of electronic signatures. It’s open source, you can use at any way you like. Deploy the demo application, use only the libraries, whatever. It includes the signing functionality out of the box – just check the PAdESService or the PDFBoxSignatureService. It even includes the option to visualize the signature once (on a particular page).
However, it doesn’t have the option to show “stamps” (images) on multiple pages. Which is why I forked it and implemented the functionality. Most of my changes are in the PDFBoxSignatureService in the loadAndStampDocument(..) method. If you want to use that functionality you can just build a jar from my fork and use it (by passing the appropriate SignatureImageParameters to PAdESSErvice.sign(..) to define how the signature will look like).
Why is this needed in the first place? Because when a document is signed, you cannot modify it anymore, as you will change the hash. However, PDFs have incremental updates which allow appending to the document and thus having a newer version without modifying anything in the original version. That way the signature is still valid (the originally signed content is not modified), but new stuff is added. In our case, this new stuff is some “annotations”, which represent an image and a clickable area that opens the signature panel (in Adobe Reader at least). And while they are added before the signature box is added, if there are more than one signer, then the 2nd signer’s annotations are added after the first signature.
Sadly, PDFBox doesn’t support that out of the box. Well, it almost does – the piece of code below looks hacky, and it took a while to figure what exactly should be called and when, but it works with just a single reflection call:
for (PDPage page : pdDocument.getPages()) {
// reset existing annotations (needed in order to have the stamps added)
page.setAnnotations(null);
}
// reset document outline (needed in order to have the stamps added)
pdDocument.getDocumentCatalog().setDocumentOutline(null);
List<PDAnnotation> annotations = addStamps(pdDocument, parameters);
setDocumentId(parameters, pdDocument);
ByteArrayOutputStream baos = new ByteArrayOutputStream();
try (COSWriter writer = new COSWriter(baos, new RandomAccessBuffer(pdfBytes))) {
// force-add the annotations (wouldn't be saved in incremental updates otherwise)
annotations.forEach(ann -> addObjectToWrite(writer, ann.getCOSObject()));
// technically the same as saveIncremental but with more control
writer.write(pdDocument);
}
pdDocument.close();
pdDocument = PDDocument.load(baos.toByteArray());
...
}
private void addObjectToWrite(COSWriter writer, COSDictionary cosObject) {
// the COSWriter does not expose the addObjectToWrite method, so we need reflection to add the annotations
try {
Method method = writer.getClass().getDeclaredMethod("addObjectToWrite", COSBase.class);
method.setAccessible(true);
method.invoke(writer, cosObject);
} catch (Exception ex) {
throw new RuntimeException(ex);
}
}
What it does is – loads the original PDF, clears some internal catalogs, adds the annotations (images) to all pages, and then “force-add the annotations” because they “wouldn’t be saved in incremental updates otherwise”. I hope PDFBox make this a little more straightforward, but for the time being this works, and it doesn’t invalidate the existing signatures.
I hope that this posts introduces you to:
the existence of legally binding electronic signatures
the existence of the DSS utilities
the PAdES standard for PDF signing
how to place more than just one signature box in a PDF document
And I hope this article becomes more and more popular over time, as more and more businesses realize they could make use of electronic signatures.
At this year’s FOSDEM in Brussels, Jan Tobias Mühlberg gave a talk on the latest work on Sancus, a project that was originally presented at the USENIX Security Symposium in 2013. The project is a fully open-source hardware platform to support “trusted computing” and other security functionality. It is designed to be used for internet of things (IoT) devices, automotive applications, critical infrastructure, and other embedded devices where trusted code is expected to be run.
The Amazon RDS team launched nearly 80 features in 2017. Some of them were covered in this blog, others on the AWS Database Blog, and the rest in What’s New or Forum posts. To wrap up my week, I thought it would be worthwhile to give you an organized recap. So here we go!
Using Amazon Cloud Directory, you can build flexible, cloud-native directories for organizing hierarchies of data along multiple dimensions. And now, you can search more efficiently by searching across only a subset of objects in your directory. For example, instead of searching through all of the employees in a company directory built using Cloud Directory, you can choose to search only full-time employees or contractors.
To search across such a subset of objects, you must first create a facet-based index. A facet is a set of attributes defined in a schema that is associated with a directory object. Using facets, you can create different object types in your directory. For instance, you can create different facets for full-time employees and contractors in a schema and then create full-time employee objects and contractor objects. You then can create an index of all the objects that include a specific facet and search those objects more efficiently.
In this blog post, I show how you can create a facet-based index in Cloud Directory to more efficiently search for objects in your directory.
Scenario: Searching a company directory for a specific employee type
Let’s say a company called AnyCompany wants to be able to efficiently search in Cloud Directory for information about its full-time employees and contractors. To do this, AnyCompany must create a company directory using Cloud Directory. (If AnyCompany already had a company directory using Cloud Directory, they could use that directory instead.) AnyCompany starts by creating DirectorySchema, which is a schema that includes three facets: FullTimeEmployeeFacet, ManagerFacet, and ContractorFacet.
The following diagram is a visual representation of AnyCompany’s company directory, and it includes full-time employees and contractors in a reporting hierarchy. The full-time employees are shown in blue nodes and the contractors are shown in green nodes. The directory’s three facets are shown as they correspond to full-time employees, managers, and contractors.
To more efficiently search your directory, follow these steps:
Create a facet-based index that includes the facets you want to use when searching.
Populate the index with the appropriate employee objects.
List all the objects in the index.
List objects in the index that include a specific facet.
1. Create a facet-based index that includes the facets you want to use when searching
The following code example creates a facet-based index of the employee objects in the directory. Cloud Directory currently supports only simple indexes, which means that an index object can only store one type of value, such as a facet.
// Create an index
// <region> indicates an AWS Region value such as “us-east-1”
// <accountId> indicates your AWS account ID
// <directoryId> indicates your Cloud Directory ID
// The schemaArn points to the specific schema, which is DirectorySchema
String schemaArn = "arn:aws:clouddirectory:<region>:<accountId>:directory/<directoryId>/schema/DirectorySchema/1.0" ;
// I define attributes that I want to use for indexing. In this case, I use “facets” to define an
// attribute for indexing. This is a hard-coded value that is defined by Cloud Directory.
AttributeKey indexAttributeKey = new AttributeKey()
.withSchemaArn(schemaArn)
.withFacetName("facets")
.withName("facets") ;
List<AttributeKey> orderedIndexedAttributeList = new ArrayList<AttributeKey>() ;
orderedIndexedAttributeList.add(indexAttributeKey) ;
// The directoryArn points to the specific directory that I am working on
String directoryArn = "arn:aws:clouddirectory:<region>:<accountId>:directory/<directoryId>" ;
// I create the index request and pass in the directoryArn and my attribute list.
// Because I am defining the facetIndex at the root of my directory, my parentReference is the root of the directory
// For LinkName, I have defined “MyFacetIndex,” as shown in the diagram
ObjectReference dirRoot = new ObjectReference().withSelector("/"); // Directory root
CreateIndexRequest createRequest = new CreateIndexRequest()
.withDirectoryArn(directoryArn)
.withOrderedIndexedAttributeList(orderedIndexedAttributeList)
.withIsUnique(false)
.withParentReference(dirRoot) // Attach to directory root
.withLinkName("MyFacetIndex") ;
// I assign the indexed object to facetIndex
CreateIndexResult facetIndexResult = cloudDirectoryClient.createIndex(createRequest) ;
ObjectReference facetIndex = new ObjectReference().withSelector(facetIndexResult.getObjectIdentifier());
2. Populate the index with the appropriate employee objects
Next, I add all the objects that I want to include in the index. The following code example adds objects to facetIndex.
// I assume userObj1 is “Tim”. The following code adds “Tim” to the index.
// Create an index attach request with the directory, facet, and object details
AttachToIndexRequest indexAttachRequest = new AttachToIndexRequest()
.withDirectoryArn(directoryArn)
.withIndexReference(facetIndex)
.withTargetReference(userObj1) ;
// Add the object to the index
cloudDirectoryClient.attachToIndex(indexAttachRequest) ;
// You can follow the same code pattern to add other full-time employee and contractor objects to the index.
3. List all the objects in the index
Now, I can query my directory efficiently for the set of objects I have in facetIndex. The following code example returns all the objects in your index.
// List all objects in the facet-based index
ListIndexResult listResults = cloudDirectoryClient.listIndex(new ListIndexRequest()
.withDirectoryArn(directoryArn)
.withIndexReference(facetIndex)) ;
4. List objects in the index that include a specific facet
I can add a filter for retrieving subsets of objects in the index that contain a specific facet. The following code example shows how to add a filter to the query so that only objects that contain the facet FullTimeEmployeeFacet are returned.
// I choose the specific facet I will use for filtering my query and get all objects that contain this facet in them.
String filterString = "DirectorySchema/1.0/FullTimeEmployeeFacet" ;
TypedAttributeValue filterStringValue = new TypedAttributeValue().withStringValue(filterString);
// I define the filter range and mention both the start mode and end mode as inclusive because I will query for a specific facet
ObjectAttributeRange objectAttributeRange = new ObjectAttributeRange()
.withAttributeKey(new AttributeKey()
.withFacetName("facets")
.withName("facets")
.withSchemaArn(schemaArn))
.withRange(new TypedAttributeValueRange()
.withStartMode(RangeMode.INCLUSIVE)
.withStartValue(filterStringValue)
.withEndMode(RangeMode.INCLUSIVE)
.withEndValue(filterStringValue)) ; // Query for objects with FullTimeEmployeeFacet which is defined in filterString
// List the index results
ListIndexResult filteredResults = cloudDirectoryClient.listIndex(new ListIndexRequest()
.withDirectoryArn(directoryArn)
.withIndexReference(facetIndex)
.withRangesOnIndexedValues(objectAttributeRange)) ;
Using this subset of objects, I can now search for a specific employee without searching across all the objects in my directory.
Summary
You can use facet-based indexing to search your directory more efficiently by searching across only a subset of objects in of your directory. For more information about this feature, see Indexing and Search.
If you have comments about this blog post, submit them in the “Comments” section below. If you have questions about implementing the solution in this blog post, start a new thread in the Directory Service forum or contact AWS Support.
It seems like only yesterday that we crossed the 350 petabyte mark. It was actually June 2017, but boy have we been growing since. In October 2017 we crossed 400 petabytes. Today, we’re proud to announce we’ve crossed the 500 petabyte mark. That’s a very healthy clip, see for yourself!
Whether you have 50 GB, 500 GB or are just an avid blog reader, thank you for being on this incredible journey with us through the years.
…we’re literally moving at 1,000,000 files per hour.
We’re extremely proud of our track record. Throughout these 11 years we’ve striven to be the simplest, fastest, and most affordable online backup (and now cloud storage) solution available. We’re not just focusing on data ingress, but also adhering to our original goal of making sure that “no one ever loses data again.” How quickly are we restoring data? On average, we’re literally moving at 1,000,000 files per hour.
Even after all these years, one of the most frequent questions asked is, “How has Backblaze maintained such affordable pricing, particularly when the industry continues to move away from unlimited data plans?”
The cloud storage industry is very competitive, with cloud sync, storage, and backup providers leaving the unlimited market every single day: OneDrive, Amazon Cloud Storage, and most recently CrashPlan. Other providers either have tiered pricing (iDrive), or charge almost double or even triple for all the features we provide for our unlimited backup service (Carbonite). So how do we do it?
The answer comes down to our relentless pursuit of lowering costs. Our open-source Backblaze Storage Pods comprise our Backblaze Vaults, and the less expensive and more performant our Storage Pods are, the better the service that we can provide. This all directly translates into the service and pricing we can offer you.
A key part of our service is to be as open as possible with our costs and structure. After all, you are entrusting us with some of your most valuable assets. Still, it is very difficult to find an apples to apples comparison to what our competitors are doing. For example, we can gain some insight from a 2011 interview with Carbonite’s CEO, who gave an interview in which he said Carbonite’s cost of storing a petabyte was $250,000. At the time, our cost to store a petabyte was $76,481 (more on that calculation can be found here and here). If Backblaze’s fundamental cost to store data is one-third that of Carbonite’s, it makes sense that Carbonite’s cost to its customers would be more than Backblaze’s. Today, Backblaze backup is $50/year and Carbonite’s equivalent service is $149.99.
Our continued focus on reducing costs has allowed us to maintain a healthy business. And after accepting customer data for almost 10 years, we sincerely want to thank you all for giving us your trust, and allowing us to protect your important data and memories for you. Here’s to the next 500 petabytes; they’ll be here before we know it.
Update 2/5/18
Since publishing this post, we have posted the latest in our series of Hard Drive Stats, in which we summarize the performance of the hard drives we used in our data centers in 2017 and previously.
Kaspersky Labs is reporting on a new piece of sophisticated malware:
We observed many web landing pages that mimic the sites of mobile operators and which are used to spread the Android implants. These domains have been registered by the attackers since 2015. According to our telemetry, that was the year the distribution campaign was at its most active. The activities continue: the most recently observed domain was registered on October 31, 2017. Based on our KSN statistics, there are several infected individuals, exclusively in Italy.
Moreover, as we dived deeper into the investigation, we discovered several spyware tools for Windows that form an implant for exfiltrating sensitive data on a targeted machine. The version we found was built at the beginning of 2017, and at the moment we are not sure whether this implant has been used in the wild.
It seems to be Italian. Ars Technica speculates that it is related to Hacking Team:
That’s not to say the malware is perfect. The various versions examined by Kaspersky Lab contained several artifacts that provide valuable clues about the people who may have developed and maintained the code. Traces include the domain name h3g.co, which was registered by Italian IT firm Negg International. Negg officials didn’t respond to an email requesting comment for this post. The malware may be filling a void left after the epic hack in 2015 of Hacking Team, another Italy-based developer of spyware.
We came to see Toby and Brian Bahn, physics teacher for Dover High School and leader of the Dover Robotics club, so they could tell us about the inner workings of the Tic-Tac-Toe Robot project, and how the Raspberry Pi fit within it. Check out our video for Toby’s explanation of the build and the software controlling it.
Toby’s original robotic arm prototype used a weight to direct the pen on and off the paper. He later replaced this with a servo motor.
Toby documented the prototyping process for the robot on the Dover Robotics blog. Head over there to hear more about the highs and lows of building a robotic arm from scratch, and about how Toby learned to integrate a Raspberry Pi for both software and hardware control.
The finished build is a tic-tac-toe beast, besting everyone who dares to challenge it to a game.
And in case you’re wondering: no, none of the Raspberry Pi team were able to beat the Tic-Tac-Toe Robot when we played against it.
Your turn
We always love seeing Raspberry Pis being used in schools to teach coding and digital making, whether in the classroom or during after-school activities such as the Dover Robotics club and our own Code Clubs and CoderDojos. If you are part of a coding or robotics club, we’d love to hear your story! So make sure to share your experiences and projects in the comments below, or via our social media accounts.
By continuing to use the site, you agree to the use of cookies. more information
The cookie settings on this website are set to "allow cookies" to give you the best browsing experience possible. If you continue to use this website without changing your cookie settings or you click "Accept" below then you are consenting to this.