Елон Мъск споделя в Twitter идеята да създаде нова медия – “crowdsourced site”, наречен Pravda – който според Мъск би позволил на обществеността “да оцени истината във всяка статия и доверието към всеки журналист, редактор и публикация.” Дори репортери са открили заявление за вписване на търговско дружество Pravda Corp.
Мнозина правят паралел между Мъск и Тръмп – заради определянето на негативни новини за тях като фалшиви и отрицателната оценка за новинарските медии като цяло.
Pravda е много, много лоша идея, се казва в коментар на The Verge: Мъск, както и Тръмп, може да не харесва начина, по който го представят – или като жертва, или като провал – но вие не можете да законодателствате достоверността, нито да я подлагате на гласуване – и да очаквате някакъв друг резултат, освен дистопия. Неинформираното общество е общество, което може да бъде подведено да вярва в каквото и да е – че земята е плоска и че човек не е кацал на Луната – и това е много опасно нещо в ръцете на мотивиран субект. Особено ако се съчетае с ехо-ефекта на платформите, може да подкопае доверието във всеки източник, който си позволява да влезе в конфликт с този субект.
Голямата тема не е Мъск, голямата тема е манипулирането на общественото мнение – и възможното противодействие.
Обществена оценка какво е вярно е на една крачка от обществената оценка какво е правилно да се направи – и обяснява резултатите от някои избори и референдуми.
The Intercept has a long article on Japan’s equivalent of the NSA: the Directorate for Signals Intelligence. Interesting, but nothing really surprising.
The directorate has a history that dates back to the 1950s; its role is to eavesdrop on communications. But its operations remain so highly classified that the Japanese government has disclosed little about its work even the location of its headquarters. Most Japanese officials, except for a select few of the prime minister’s inner circle, are kept in the dark about the directorate’s activities, which are regulated by a limited legal framework and not subject to any independent oversight.
Now, a new investigation by the Japanese broadcaster NHK — produced in collaboration with The Intercept — reveals for the first time details about the inner workings of Japan’s opaque spy community. Based on classified documents and interviews with current and former officials familiar with the agency’s intelligence work, the investigation shines light on a previously undisclosed internet surveillance program and a spy hub in the south of Japan that is used to monitor phone calls and emails passing across communications satellites.
The article includes some new documents from the Snowden archive.
Attention, case modders: take a look at the Brutus 2, an extremely snazzy computer case with a partly transparent, animated side panel that’s powered by a Pi. Daniel Otto and Carsten Lehman have a current crowdfunder for the case; their video is in German, but the looks of the build speak for themselves. There are some truly gorgeous effects here.
Vorbestellungen ab sofort auf https://www.startnext.com/brutus2 Weitere Infos zu uns auf: https://3nb.de https://www.facebook.com/3nb.de https://www.instagram.com/3nb.de Über 3nb: – GbR aus Leipzig, gegründet 2017 – wir kommen aus den Bereichen Elektronik und Informatik – erstes Produkt: der Brutus One ein Gaming PC mit transparentem Display in der Seite Kurzinfo Brutus 2: – Markencomputergehäuse für Gaming- /Casemoddingszene – Besonderheit: animiertes Seitenfenster angesteuert mit einem Raspberry Pi – Vorteile von unserem Case: o Case ist einzeln lieferbar und nicht nur als komplett-PC o kein Leistungsverbrauch der Grafikkarte dank integriertem Raspberry Pi o bessere Darstellung von Texten und Grafiken durch unscharfen Hintergrund
What’s case modding?
Case modding just means modifying your computer or gaming console’s case, and it’s very popular in the gaming community. Some mods are functional, while others improve the way the case looks. Lots of dedicated gamers don’t only want a powerful computer, they also want it to look amazing — at home, or at LAN parties and games tournaments.
The Brutus 2 case
The Brutus 2 case is made by Daniel and Carsten’s startup, 3nb electronics, and it’s a product that is officially Powered by Raspberry Pi. Its standout feature is the semi-transparent TFT screen, which lets you play any video clip you choose while keeping your gaming hardware on display. It looks incredibly cool. All the graphics for the case’s screen are handled by a Raspberry Pi, so it doesn’t use any of your main PC’s GPU power and your gaming won’t suffer.
To use Brutus 2, you just need to run a small desktop application on your PC to choose what you want to display on the case. A number of neat animations are included, and you can upload your own if you want.
So far, the app only runs on Windows, but 3nb electronics are planning to make the code open-source, so you can modify it for other operating systems, or to display other file types. This is true to the spirit of the case modding and Raspberry Pi communities, who love adapting, retrofitting, and overhauling projects and code to fit their needs.
Daniel and Carsten say that one of their campaign’s stretch goals is to implement more functionality in the Brutus 2 app. So in the future, the case could also show things like CPU temperature, gaming stats, and in-game messages. Of course, there’s nothing stopping you from integrating features like that yourself.
If you have any questions about the case, you can post them directly to Daniel and Carsten here.
The crowdfunding campaign
The Brutus 2 campaign on Startnext is currently halfway to its first funding goal of €10000, with over three weeks to go until it closes. If you’re quick, you still be may be able to snatch one of the early-bird offers. And if your whole guild NEEDS this, that’s OK — there are discounts for bulk orders.
Earlier this month, the Pentagon stopped selling phones made by the Chinese companies ZTE and Huawei on military bases because they might be used to spy on their users.
It’s a legitimate fear, and perhaps a prudent action. But it’s just one instance of the much larger issue of securing our supply chains.
All of our computerized systems are deeply international, and we have no choice but to trust the companies and governments that touch those systems. And while we can ban a few specific products, services or companies, no country can isolate itself from potential foreign interference.
In this specific case, the Pentagon is concerned that the Chinese government demanded that ZTE and Huawei add “backdoors” to their phones that could be surreptitiously turned on by government spies or cause them to fail during some future political conflict. This tampering is possible because the software in these phones is incredibly complex. It’s relatively easy for programmers to hide these capabilities, and correspondingly difficult to detect them.
This isn’t the first time the United States has taken action against foreign software suspected to contain hidden features that can be used against us. Last December, President Trump signed into law a bill banning software from the Russian company Kaspersky from being used within the US government. In 2012, the focus was on Chinese-made Internet routers. Then, the House Intelligence Committee concluded: “Based on available classified and unclassified information, Huawei and ZTE cannot be trusted to be free of foreign state influence and thus pose a security threat to the United States and to our systems.”
Nor is the United States the only country worried about these threats. In 2014, China reportedly banned antivirus products from both Kaspersky and the US company Symantec, based on similar fears. In 2017, the Indian government identified 42 smartphone apps that China subverted. Back in 1997, the Israeli company Check Point was dogged by rumors that its government added backdoors into its products; other of that country’s tech companies have been suspected of the same thing. Even al-Qaeda was concerned; ten years ago, a sympathizer released the encryption software Mujahedeen Secrets, claimed to be free of Western influence and backdoors. If a country doesn’t trust another country, then it can’t trust that country’s computer products.
But this trust isn’t limited to the country where the company is based. We have to trust the country where the software is written — and the countries where all the components are manufactured. In 2016, researchers discovered that many different models of cheap Android phones were sending information back to China. The phones might be American-made, but the software was from China. In 2016, researchers demonstrated an even more devious technique, where a backdoor could be added at the computer chip level in the factory that made the chips without the knowledge of, and undetectable by, the engineers who designed the chips in the first place. Pretty much every US technology company manufactures its hardware in countries such as Malaysia, Indonesia, China and Taiwan.
We also have to trust the programmers. Today’s large software programs are written by teams of hundreds of programmers scattered around the globe. Backdoors, put there by we-have-no-idea-who, have been discovered in Juniper firewalls and D-Link routers, both of which are US companies. In 2003, someone almost slipped a very clever backdoor into Linux. Think of how many countries’ citizens are writing software for Apple or Microsoft or Google.
We can go even farther down the rabbit hole. We have to trust the distribution systems for our hardware and software. Documents disclosed by Edward Snowden showed the National Security Agency installing backdoors into Cisco routers being shipped to the Syrian telephone company. There are fake apps in the Google Play store that eavesdrop on you. Russian hackers subverted the update mechanism of a popular brand of Ukrainian accounting software to spread the NotPetya malware.
I could go on. Supply-chain security is an incredibly complex problem. US-only design and manufacturing isn’t an option; the tech world is far too internationally interdependent for that. We can’t trust anyone, yet we have no choice but to trust everyone. Our phones, computers, software and cloud systems are touched by citizens of dozens of different countries, any one of whom could subvert them at the demand of their government. And just as Russia is penetrating the US power grid so they have that capability in the event of hostilities, many countries are almost certainly doing the same thing at the consumer level.
We don’t know whether the risk of Huawei and ZTE equipment is great enough to warrant the ban. We don’t know what classified intelligence the United States has, and what it implies. But we do know that this is just a minor fix for a much larger problem. It’s doubtful that this ban will have any real effect. Members of the military, and everyone else, can still buy the phones. They just can’t buy them on US military bases. And while the US might block the occasional merger or acquisition, or ban the occasional hardware or software product, we’re largely ignoring that larger issue. Solving it borders on somewhere between incredibly expensive and realistically impossible.
Perhaps someday, global norms and international treaties will render this sort of device-level tampering off-limits. But until then, all we can do is hope that this particular arms race doesn’t get too far out of control.
In this article from The MagPi issue 69, David Crookes explains how Daniel Berrangé took an old Kodak Brownie from the 1950s and turned it into a quirky digital camera. Get your copy of The MagPi magazine in stores now, or download it as a free PDF here.
The Kodak Box Brownie
When Kodak unveiled its Box Brownie in 1900, it did so with the slogan ‘You press the button, we do the rest.’ The words referred to the ease-of-use of what was the world’s first mass-produced camera. But it could equally apply to Daniel Berrangé’s philosophy when modifying it for the 21st century. “I wanted to use the Box Brownie’s shutter button to trigger image capture, and make it simple to use,” he tells us.
Daniel’s project grew from a previous effort in which he placed a pinhole webcam inside a ladies’ powder compact case. “The Box Brownie project is essentially a repeat of that design but with a normal lens instead of a pinhole, a real camera case, and improved software to enable a shutter button. Ideally, it would look unchanged from when it was shooting film.”
At first, Daniel looked for a cheap webcam, intending to spend no more than the price of a Pi Zero. This didn’t work out too well. “The low-light performance of the webcam was not sufficient to make a pinhole camera so I just decided to make a ‘normal’ digital camera instead,” he reveals. To that end, he began removing some internal components from the Box Brownie. “With the original lens removed, the task was to position the webcam’s electronic light sensor (the CCD) and lens as close to the front of the camera as possible,” Daniel explains. “In the end, the CCD was about 15 mm away from the front aperture of the camera, giving a field of view that was approximately the same as the unmodified camera would achieve.”
It was then time for him to insert the Raspberry Pi, upon which was a custom ‘init’ binary that loads a couple of kernel modules to run the webcam, mount the microSD file system, and launch the application binary. Here, Daniel found he was in luck. “I’d noticed that the size of a 620 film spool (63 mm) was effectively the same as the width of a Raspberry Pi Zero (65 mm), so it could be held in place between the film spool grips,” he recalls. “It was almost as if it was designed with this in mind.”
In order to operate the camera, Daniel had to work on the shutter button. “The Box Brownie’s shutter button is entirely mechanical, driven by a handful of levers and springs,” Daniel explains. “First, the Pi Zero needs to know when the shutter button is pressed and second, the physical shutter has to be open while the webcam is capturing the image. Rather than try to synchronise image capture with the fraction of a second that the physical shutter is open, a bit of electrical tape was used on the shutter mechanism to keep it permanently open.”
Daniel made use of the Pi Zero’s GPIO pins to detect the pressing of the shutter button. It determines if each pin is at 0 or 5 volts. “My thought was that I could set a GPIO pin high to 5 V, and then use the action of the shutter button to short it to ground, and detect this change in level from software.”
This initially involved using a pair of bare wires and some conductive paint, although the paint was later replaced by a piece of tinfoil. But with the button pressed, the GPIO pin level goes to zero and the device constantly captures still images until the button is released. All that’s left to do is smile and take the perfect snap.
The Internet of Things (IoT) has precipitated to an influx of connected devices and data that can be mined to gain useful business insights. If you own an IoT device, you might want the data to be uploaded seamlessly from your connected devices to the cloud so that you can make use of cloud storage and the processing power to perform sophisticated analysis of data. To upload the data to the AWS Cloud, devices must pass authentication and authorization checks performed by the respective AWS services. The standard way of authenticating AWS requests is the Signature Version 4 algorithm that requires the caller to have an access key ID and secret access key. Consequently, you need to hardcode the access key ID and the secret access key on your devices. Alternatively, you can use the built-in X.509 certificate as the unique device identity to authenticate AWS requests.
AWS IoT has introduced the credentials provider feature that allows a caller to authenticate AWS requests by having an X.509 certificate. The credentials provider authenticates a caller using an X.509 certificate, and vends a temporary, limited-privilege security token. The token can be used to sign and authenticate any AWS request. Thus, the credentials provider relieves you from having to manage and periodically refresh the access key ID and secret access key remotely on your devices.
In the process of retrieving a security token, you use AWS IoT to create a thing (a representation of a specific device or logical entity), register a certificate, and create AWS IoT policies. You also configure an AWS Identity and Access Management (IAM) role and attach appropriate IAM policies to the role so that the credentials provider can assume the role on your behalf. You also make an HTTP-over-Transport Layer Security (TLS) mutual authentication request to the credentials provider that uses your preconfigured thing, certificate, policies, and IAM role to authenticate and authorize the request, and obtain a security token on your behalf. You can then use the token to sign any AWS request using Signature Version 4.
In this blog post, I explain the AWS IoT credentials provider design and then demonstrate the end-to-end process of retrieving a security token from AWS IoT and using the token to write a temperature and humidity record to a specific Amazon DynamoDB table.
Note: This post assumes you are familiar with AWS IoT and IAM to perform steps using the AWS CLI and OpenSSL. Make sure you are running the latest version of the AWS CLI.
Overview of the credentials provider workflow
The following numbered diagram illustrates the credentials provider workflow. The diagram is followed by explanations of the steps.
To explain the steps of the workflow as illustrated in the preceding diagram:
The AWS IoT device uses the AWS SDK or custom client to make an HTTPS request to the credentials provider for a security token. The request includes the device X.509 certificate for authentication.
The credentials provider forwards the request to the AWS IoT authentication and authorization module to verify the certificate and the permission to request the security token.
If the certificate is valid and has permission to request a security token, the AWS IoT authentication and authorization module returns success. Otherwise, it returns failure, which goes back to the device with the appropriate exception.
The requested service invokes IAM to validate the signature and authorize the request against access policies attached to the preconfigured IAM role.
If IAM validates the signature successfully and authorizes the request, the request goes through.
In another solution, you could configure an AWS Lambda rule that ingests your device data and sends it to another AWS service. However, in applications that require the uploading of large files such as videos or aggregated telemetry to the AWS Cloud, you may want your devices to be able to authenticate and send data directly to the AWS service of your choice. The credentials provider enables you to do that.
Outline of the steps to retrieve and use security token
Perform the following steps as part of this solution:
Create an AWS IoT thing: Start by creating a thing that corresponds to your home thermostat in the AWS IoT thing registry database. This allows you to authenticate the request as a thing and use thing attributes as policy variables in AWS IoT and IAM policies.
Register a certificate: Create and register a certificate with AWS IoT, and attach it to the thing for successful device authentication.
Create and configure an IAM role: Create an IAM role to be assumed by the service on behalf of your device. I illustrate how to configure a trust policy and an access policy so that AWS IoT has permission to assume the role, and the token has necessary permission to make requests to DynamoDB.
Create a role alias: Create a role alias in AWS IoT. A role alias is an alternate data model pointing to an IAM role. The credentials provider request must include a role alias name to indicate which IAM role to assume for obtaining a security token from AWS STS. You may update the role alias on the server to point to a different IAM role and thus make your device obtain a security token with different permissions.
Attach a policy: Create an authorization policy with AWS IoT and attach it to the certificate to control which device can assume which role aliases.
Request a security token: Make an HTTPS request to the credentials provider and retrieve a security token and use it to sign a DynamoDB request with Signature Version 4.
Use the security token to sign a request: Use the retrieved token to sign a request to DynamoDB and successfully write a temperature and humidity record from your home thermostat in a specific table. Thus, starting with an X.509 certificate on your home thermostat, you can successfully upload your thermostat record to DynamoDB and use it for further analysis. Before the availability of the credentials provider, you could not do this.
Deploy the solution
1. Create an AWS IoT thing
Register your home thermostat in the AWS IoT thing registry database by creating a thing type and a thing. You can use the AWS CLI with the following command to create a thing type. The thing type allows you to store description and configuration information that is common to a set of things.
Now, you need to have a Certificate Authority (CA) certificate, sign a device certificate using the CA certificate, and register both certificates with AWS IoT before your device can authenticate to AWS IoT. If you do not already have a CA certificate, you can use OpenSSL to create a CA certificate, as described in Use Your Own Certificate. To register your CA certificate with AWS IoT, follow the steps on Registering Your CA Certificate.
You then have to create a device certificate signed by the CA certificate and register it with AWS IoT, which you can do by following the steps on Creating a Device Certificate Using Your CA Certificate. Save the certificate and the corresponding key pair; you will use them when you request a security token later. Also, remember the password you provide when you create the certificate.
Run the following command in the AWS CLI to attach the device certificate to your thing so that you can use thing attributes in policy variables.
If the attach-thing-principal command succeeds, the output is empty.
3. Configure an IAM role
Next, configure an IAM role in your AWS account that will be assumed by the credentials provider on behalf of your device. You are required to associate two policies with the role: a trust policy that controls who can assume the role, and an access policy that controls which actions can be performed on which resources by assuming the role.
The following trust policy grants the credentials provider permission to assume the role. Put it in a text document and save the document with the name, trustpolicyforiot.json.
The following access policy allows DynamoDB operations on the table that has the same name as the thing name that you created in Step 1, MyHomeThermostat, by using credentials-iot:ThingName as a policy variable. I explain after Step 5 about using thing attributes as policy variables. Put the following policy in a text document and save the document with the name, accesspolicyfordynamodb.json.
Finally, run the following command in the AWS CLI to attach the access policy to your role.
aws iam attach-role-policy --role-name dynamodb-access-role --policy-arn arn:aws:iam::<your_aws_account_id>:policy/accesspolicyfordynamodb
If the attach-role-policy command succeeds, the output is empty.
Configure the PassRole permissions
The IAM role that you have created must be passed to AWS IoT to create a role alias, as described in Step 4. The user who performs the operation requires iam:PassRole permission to authorize this action. You also should add permission for the iam:GetRole action to allow the user to retrieve information about the specified role. Create the following policy to grant iam:PassRole and iam:GetRole permissions. Name this policy, passrolepermission.json.
Now, run the following command to attach the policy to the user.
aws iam attach-user-policy --policy-arn arn:aws:iam::<your_aws_account_id>:policy/passrolepermission --user-name <user_name>
If the attach-user-policy command succeeds, the output is empty.
4. Create a role alias
Now that you have configured the IAM role, you will create a role alias with AWS IoT. You must provide the following pieces of information when creating a role alias:
RoleAlias: This is the primary key of the role alias data model and hence a mandatory attribute. It is a string; the minimum length is 1 character, and the maximum length is 128 characters.
RoleArn: This is the Amazon Resource Name (ARN) of the IAM role you have created. This is also a mandatory attribute.
CredentialDurationSeconds: This is an optional attribute specifying the validity (in seconds) of the security token. The minimum value is 900 seconds (15 minutes), and the maximum value is 3,600 seconds (60 minutes); the default value is 3,600 seconds, if not specified.
Run the following command in the AWS CLI to create a role alias. Use the credentials of the user to whom you have given the iam:PassRole permission.
You created and registered a certificate with AWS IoT earlier for successful authentication of your device. Now, you need to create and attach a policy to the certificate to authorize the request for the security token.
Let’s say you want to allow a thing to get credentials for the role alias, Thermostat-dynamodb-access-role-alias, with thing owner Alice, thing type thermostat, and the thing attached to a principal. The following policy, with thing attributes as policy variables, achieves these requirements. After this step, I explain more about using thing attributes as policy variables. Put the policy in a text document, and save it with the name, alicethermostatpolicy.json.
If the attach-policy command succeeds, the output is empty.
You have completed all the necessary steps to request an AWS security token from the credentials provider!
Using thing attributes as policy variables
Before I show how to request a security token, I want to explain more about how to use thing attributes as policy variables and the advantage of using them. As a prerequisite, a device must provide a thing name in the credentials provider request.
Thing substitution variables in AWS IoT policies
AWS IoT Simplified Permission Management allows you to associate a connection with a specific thing, and allow the thing name, thing type, and other thing attributes to be available as substitution variables in AWS IoT policies. You can write a generic AWS IoT policy as in alicethermostatpolicy.json in Step 5, attach it to multiple certificates, and authorize the connection as a thing. For example, you could attach alicethermostatpolicy.json to certificates corresponding to each of the thermostats you have that you want to assume the role alias, Thermostat-dynamodb-access-role-alias, and allow operations only on the table with the name that matches the thing name. For more information, see the full list of thing policy variables.
Thing substitution variables in IAM policies
You also can use the following three substitution variables in the IAM role’s access policy (I used credentials-iot:ThingName in accesspolicyfordynamodb.json in Step 3):
When the device provides the thing name in the request, the credentials provider fetches these three variables from the database and adds them as context variables to the security token. When the device uses the token to access DynamoDB, the variables in the role’s access policy are replaced with the corresponding values in the security token. Note that you also can use credentials-iot:AwsCertificateId as a policy variable; AWS IoT returns certificateId during registration.
6. Request a security token
Make an HTTPS request to the credentials provider to fetch a security token. You have to supply the following information:
Certificate and key pair: Because this is an HTTP request over TLS mutual authentication, you have to provide the certificate and the corresponding key pair to your client while making the request. Use the same certificate and key pair that you used during certificate registration with AWS IoT.
RoleAlias: Provide the role alias (in this example, Thermostat-dynamodb-access-role-alias) to be assumed in the request.
ThingName: Provide the thing name that you created earlier in the AWS IoT thing registry database. This is passed as a header with the name, x-amzn-iot-thingname. Note that the thing name is mandatory only if you have thing attributes as policy variables in AWS IoT or IAM policies.
Run the following command in the AWS CLI to obtain your AWS account-specific endpoint for the credentials provider. See the DescribeEndpoint API documentation for further details.
Note that if you are on Mac OS X, you need to export your certificate to a .pfx or .p12 file before you can pass it in the https request. Use OpenSSL with the following command to convert the device certificate from .pem to .pfx format. Remember the password because you will need it subsequently in a curl command.
Create a DynamoDB table called MyHomeThermostat in your AWS account. You will have to choose the hash (partition key) and the range (sort key) while creating the table to uniquely identify a record. Make the hash the serial_number of the thermostat and the range the timestamp of the record. Create a text file with the following JSON to put a temperature and humidity record in the table. Name the file, item.json.
You can use the accessKeyId, secretAccessKey, and sessionToken retrieved from the output of the curl command to sign a request that writes the temperature and humidity record to the DynamoDB table. Use the following commands to accomplish this.
In this blog post, I demonstrated how to retrieve a security token by using an X.509 certificate and then writing an item to a DynamoDB table by using the security token. Similarly, you could run applications on surveillance cameras or sensor devices that exchange the X.509 certificate for an AWS security token and use the token to upload video streams to Amazon Kinesis or telemetry data to Amazon CloudWatch.
If you have comments about this blog post, submit them in the “Comments” section below. If you have questions about or issues implementing this solution, start a new thread on the AWS IoT forum.
Many of today’s discussions around blockchain technology remind me of the classic Shimmer Floor Wax skit. According to Dan Aykroyd, Shimmer is a dessert topping. Gilda Radner claims that it is a floor wax, and Chevy Chase settles the debate and reveals that it actually is both! Some of the people that I talk to see blockchains as the foundation of a new monetary system and a way to facilitate international payments. Others see blockchains as a distributed ledger and immutable data source that can be applied to logistics, supply chain, land registration, crowdfunding, and other use cases. Either way, it is clear that there are a lot of intriguing possibilities and we are working to help our customers use this technology more effectively.
We are launching AWS Blockchain Templates today. These templates will let you launch an Ethereum (either public or private) or Hyperledger Fabric (private) network in a matter of minutes and with just a few clicks. The templates create and configure all of the AWS resources needed to get you going in a robust and scalable fashion.
Launching a Private Ethereum Network The Ethereum template offers two launch options. The ecs option creates an Amazon ECS cluster within a Virtual Private Cloud (VPC) and launches a set of Docker images in the cluster. The docker-local option also runs within a VPC, and launches the Docker images on EC2 instances. The template supports Ethereum mining, the EthStats and EthExplorer status pages, and a set of nodes that implement and respond to the Ethereum RPC protocol. Both options create and make use of a DynamoDB table for service discovery, along with Application Load Balancers for the status pages.
Here are the AWS Blockchain Templates for Ethereum:
Learn more: http://rpf.io/ Subscribe to our YouTube channel: http://rpf.io/ytsub Help us reach a wider audience by translating our video content: http://rpf.io/yttranslate Buy a Raspberry Pi from one of our Approved Resellers: http://rpf.io/ytproducts Find out more about the Raspberry Pi Foundation: Raspberry Pi http://rpf.io/ytrpi Code Club UK http://rpf.io/ytccuk Code Club International http://rpf.io/ytcci CoderDojo http://rpf.io/ytcd Check out our free online training courses: http://rpf.io/ytfl Find your local Raspberry Jam event: http://rpf.io/ytjam Work through our free online projects: http://rpf.io/ytprojects Do you have a question about your Raspberry Pi?
Fantastic collections and where to find them
Large, impressive statues are truly a sight to be seen. Take for example the 2.4m Hoa Hakananai’a at the British Museum. Its tall stature looms over you as you read its plaque to learn of the statue’s journey from Easter Island to the UK under the care of Captain Cook in 1774, and you can’t help but wonder at how it made it here in one piece.
But unless you live near a big city where museums are plentiful, you’re unlikely to see the likes of Hoa Hakananai’a in person. Instead, you have to content yourself with online photos or videos of world-famous artefacts.
And that only accounts for the objects that are on display: conservators estimate that only approximately 5 to 10% of museums’ overall collections are actually on show across the globe. The rest is boxed up in storage, inaccessible to the public due to risk of damage, or simply due to lack of space.
Museum in a Box
Museum in a Box aims to “put museum collections and expert knowledge into your hand, wherever you are in the world,” through modern maker practices such as 3D printing and digital making. With the help of the ‘Scan the World’ movement, an “ambitious initiative whose mission is to archive objects of cultural significance using 3D scanning technologies”, the Museum in a Box team has been able to print small, handheld replicas of some of the world’s most recognisable statues and sculptures.
Each 3D print gets NFC tags so it can initiate audio playback from a Raspberry Pi that sits snugly within the laser-cut housing of a ‘brain box’. Thus the print can talk directly to us through the magic of wireless technology, replacing the dense, dry text of a museum plaque with engaging speech.
The Museum in a Box team headed by CEO George Oates (featured in the video above) makes use of these 3D-printed figures alongside original artefacts, postcards, and more to bridge the gap between large, crowded, distant museums and local schools. Modeled after the museum handling collections that used to be sent to schools, Museum in a Box is a cheaper, more accessible alternative. Moreover, it not only allows for hands-on learning, but also encourages children to get directly involved by hacking its technology! With NFC technology readily available to the public, students can curate their own collections about their local area, record their own messages, and send their own box-sized museums on to schools in other towns or countries. In this way, Museum in a Box enables students to explore, and expand the reach of, their own histories.
With the technology perfected and interest in the project ever-growing, Museum in a Box has a busy year ahead. Supporting the new ‘Unstacked’ learning initiative, the team will soon be delivering ten boxes to the Smithsonian Libraries. The team has curated two collections specifically for this: an exploration into Asia-Pacific America experiences of migration to the USA throughout the 20th century, and a look into the history of science.
The team will also be making a box for the British Museum to support their Iraq Scheme initiative, and another box will be heading to the V&A to support their See Red programme. While primarily installed in the Lansbury Micro Museum, the box will also take to the road to visit the local Spotlight high school.
Museum in a Box at Raspberry Fields
Lastly, by far the most exciting thing the Museum in a Box team will be doing this year — in our opinion at least — is showcasing at Raspberry Fields! This is our brand-new festival of digital making that’s taking place on 30 June and 1 July 2018 here in Cambridge, UK. Find more information about it and get your ticket here.
American Public Television was like many organizations that have been around for a while. They were entrenched using an older technology — in their case, tape storage and distribution — that once met their needs but was limiting their productivity and preventing them from effectively collaborating with their many media partners. APT’s VP of Technology knew that he needed to move into the future and embrace cloud storage to keep APT ahead of the game.
Since 1961, American Public Television (APT) has been a leading distributor of groundbreaking, high-quality, top-rated programming to the nation’s public television stations. Gerry Field is the Vice President of Technology at APT and is responsible for delivering their extensive program catalog to 350+ public television stations nationwide.
In the time since Gerry joined APT in 2007, the industry has been in digital overdrive. During that time APT has continued to acquire and distribute the best in public television programming to their technically diverse subscribers.
This created two challenges for Gerry. First, new technology and format proliferation were driving dramatic increases in digital storage. Second, many of APT’s subscribers struggled to keep up with the rapidly changing industry. While some subscribers had state-of-the-art satellite systems to receive programming, others had to wait for the post office to drop off programs recorded on tape weeks earlier. With no slowdown on the horizon of innovation in the industry, Gerry knew that his storage and distribution systems would reach a crossroads in no time at all.
Living the tape paradigm
The digital media industry is only a few years removed from its film, and later videotape, roots. Tape was the input and the output of the industry for many years. As a consequence, the tools and workflows used by the industry were built and designed to work with tape. Over time, the “file” slowly replaced the tape as the object to be captured, edited, stored and distributed. Trouble was, many of the systems and more importantly workflows were based on processing tape, and these have proven to be hard to change.
At APT, Gerry realized the limits of the tape paradigm and began looking for technologies and solutions that enabled workflows based on file and object based storage and distribution.
Thinking file based storage and distribution
For data (digital media) storage, APT, like everyone else, started by installing onsite storage servers. As the amount of digital data grew, more storage was added. In addition, APT was expanding its distribution footprint by creating or partnering with distribution channels such as CreateTV and APT Worldwide. This dramatically increased the number of programming formats and the amount of data that had to be stored. As a consequence, updating, maintaining, and managing the APT storage systems was becoming a major challenge and a major resource hog.
Knowing that his in-house storage system was only going to cost more time and money, Gerry decided it was time to look at cloud storage. But that wasn’t the only reason he looked at the cloud. While most people consider cloud storage as just a place to back up and archive files, Gerry was envisioning how the ubiquity of the cloud could help solve his distribution challenges. The trouble was the price of cloud storage from vendors like Amazon S3 and Microsoft Azure was a non-starter, especially for a non-profit. Then Gerry came across Backblaze. B2 Cloud Storage service met all of his performance requirements, and at $0.005/GB/month for storage and $0.01/GB for downloads it was nearly 75% less than S3 or Azure.
Gerry did the math and found that he could economically incorporate B2 Cloud Storage into his IT portfolio, using it for both program submission and for active storage and archiving of the APT programs. In addition, B2 now gives him the foundation necessary to receive and distribute programming content over the Internet. This is especially useful for organizations that can’t conveniently access satellite distribution systems. Not to mention downloading from the cloud is much faster than sending a tape through the mail.
Adding B2 Cloud Storage to their infrastructure has helped American Public Television address two key challenges. First, they now have “unlimited” storage in the cloud without having to add any hardware. In addition, with B2, they only pay for the storage they use. That means they don’t have to buy storage upfront trying to match the maximum amount of storage they’ll ever need. Second, by using B2 as a distribution source for their programming APT subscribers, especially the smaller and remote ones, can get content faster and more reliably without having to perform costly upgrades to their infrastructure.
The road ahead
As APT gets used to their file based infrastructure and workflow, there are a number of cost saving and income generating ideas they are pondering which are now worth considering. Here are a few:
Program Submissions — New content can be uploaded from anywhere using a web browser, an Internet connection, and a login. For example, a producer in Cambodia can upload their film to B2. From there the film is downloaded to an in-house system where it is processed and transcoded using compute. The finished film is added to the APT catalog and added to B2. Once there, the program is instantly available for subscribers to order and download.
“The affordability and performance of Backblaze B2 is what allowed us to make the B2 cloud part of the APT data storage and distribution strategy into the future.” — Gerry Field
Easier Previews — At any time, work in process or finished programs can be made available for download from the B2 cloud. One place this could be useful is where a subscriber needs to review a program to comply with local policies and practices before airing. In the old system, each “one-off” was a time consuming manual process.
Instant Subscriptions — There are many organizations such as schools and businesses that want to use just one episode of a desired show. With an e-commerce based website, current or even archived programming kept in B2 could be available to download or stream for a minimal charge.
At APT there were multiple technologies needed to make their file-based infrastructure work, but as Gerry notes, having an affordable, trustworthy, cloud storage service like B2 is one of the critical building blocks needed to make everything work together.
In this post, I’ll show you how to create a sample dataset for Amazon Macie, and how you can use Amazon Macie to implement data-centric compliance and security analytics in your Amazon S3 environment. I’ll also dive into the different kinds of credentials, document types, and PII detections supported by Macie. First, I’ll walk through creating a “getting started” sample set of artificial, generated data that you can use to test Macie capabilities and start building your own policies and alerts.
Create a realistic data sample set in S3
I’ll use amazon-macie-activity-generator, which we call “AMG” for short, a sample application developed by AWS that generates realistic content and accesses your test account to create the data. AMG uses AWS CloudFormation, AWS Lambda, and Python’s excellent Faker library to create a data set with artificial—but realistic—data classifications and access patterns to help test some of the features and capabilities of Macie. AMG is released under Amazon Software License 1.0, and we’ll accept pull requests on our GitHub repository and monitor any issues that are opened so we can try to fix bugs and consider new feature requests.
The following diagram shows a high level architecture overview of the components that will be created in your AWS account for AMG. For additional detail about these components and their relationships, review the CloudFormation setup script.
Depending on the data types specified in your JSON configuration template (details below), AMG will periodically generate artificial documents for the specified S3 target with a PutObject action. By default, the CloudFormation stack uses a configuration file that instructs AMG to create a new, private S3 bucket that can only be accessed by authorized AWS users/roles in the same account as the bucket. All the S3 objects with fake data in this bucket have a private ACL and inherit the bucket’s access control configuration. All generated objects feature the header in the example below, and AMG supports all fake data providers offered by https://faker.readthedocs.io/en/latest/index.html, as well as a few of AMG‘s own custom fake data providers requested by our customers: aws_creds, slack_creds, github_creds, facebook_creds, linux_shadow, rsa, linux_passwd, dsa, ec, pgp, cert, itin, swift_code, and cve.
# Sample Report - No identification of actual persons or places is # intended or should be inferred
74323 Julie Field Lake Joshuamouth, OR 30055-3905 1-196-191-4438x974 53001 Paul Union New John, HI 94740 Mastercard Amanda Wells 5135725008183484 09/26 CVV: 550
354-70-6172 242 George Plaza East Lawrencefurt, VA 37287-7620 GB73WAUS0628038988364 587 Silva Village Pearsonburgh, NM 11616-7231 LDNM1948227117807 American Express Brett Garza 347965534580275 05/20 CID: 4758
599.335.2742 JCB 15 digit Michael Arias 210069190253121 03/27 CVC: 861
Create your amazon-macie-activity-generator CloudFormation stack
You can deploy AMG in your AWS account by using either these methods:
Log in to the AWS Console in a region supported by Amazon Macie, which currently includes US East (N. Virginia), US West (Oregon).
Select the One-click CloudFormation launch stack, or launch CloudFormation using the template above.
Read our terms, select the Acknowledgement box, and then select Create.
Creating the data takes a few minutes, and you can periodically refresh CloudWatch to track progress.
Add the new sample data to Macie
Now, I’ll log into the Macie console and add the newly created sample data buckets for analysis by Macie.
Note: If you don’t explicitly specify a bucket for S3 targets in CloudFormation, AMG will use the S3 bucket that’s created by default for the stack, which will be printed out in the CloudFormation stack’s output.
To add buckets for data classification, follow these steps:
Log in to Amazon Macie.
Select Integrations, and then select Services.
Select your account, and then select Details from the Amazon S3 card.
Select your newly created buckets for Full classification, including existing data.
For additional details on configuring Macie, refer to our getting started documentation.
Macie classifies all historical and newly created data in the buckets created by AMG, and the data will be available in the Macie console as it’s classified. Typically, you can expect the data in the sample set to be classified within 60 minutes of the time it was selected for analysis.
Classifying objects with Macie
To see the objects in your test sample set, in Macie, open the Research tab, and then select the S3 Objects index. We’ll use the regular expression search capability in Macie to find any objects written to buckets that start with “amazon-macie-activity-generator-defaults3bucket”. To search for this, type the following text into the Macie search box and select the magnifying glass icon.
From here, you can see a nice breakdown of the kinds of objects that have been classified by Macie, as well as the object-specific details. Create an advanced search using Lucene Query Syntax, and save it as an alert to be matched against any newly created data.
Analyzing accesses to your test data
In addition to classifying data, Macie tracks all control plane and data plane accesses to your content using CloudTrail. To see accesses to your generated environment (created periodically by AMG to mimic user activity), on the Macie navigation bar, select Research, select the CloudTrail data index, and then use the following search to identify our generated role activity:
From this search, you can dive into the user activity (IAM users, assumed roles, federated users, and so on), which is summarized in 5-minute aggregations (user sessions). For example, in the screen shot you can see that one of our AMG-generated users listed objects one time (ListObjects) and wrote 56 objects to S3 (PutObject) during a 5-minute period.
Macie features both predictive (machine learning-based) and basic (rule-based) alerts, including alerts on unencrypted credentials being uploaded to S3 (because this activity might not follow compliance best practices), risky activity such as data exfiltration, and user-defined alerts that are based on saved searches. To see alerts that have been generated based on AMG‘s activity, on the Macie navigation bar, select Alerts.
AMG will continue to run, periodically uploading content to the specified S3 buckets. To stop AMG, delete the AMG CloudFormation stack and associated resources here.
What are the costs?
Macie has a free tier enabling up to 1GB of content to be analyzed per month at no cost to you. By default, AMG will write approximately 10MB of objects to Amazon S3 per day, and you will incur charges for data classification after crossing the 1GB monthly free tier. Running continuously, AMG will generate about 310MB of content per month (10MB/day x 31 days), which will stay below the free tier. Any data use above 1GB will be billed at the Macie public price of $5/GB. For more detail, see the Macie pricing documentation.
If you have feedback about this blog post, submit comments in the Comments section below. If you have questions about this blog post, start a new thread on the Amazon Macie forum or contact AWS Support.
We love Mugsy, the Raspberry Pi coffee robot that has smashed its crowdfunding goal within days! Our latest YouTube video shows our catch-up with Mugsy and its creator Matthew Oswald at Maker Faire New York last year.
Labelled ‘the world’s first hackable, customisable, dead simple, robotic coffee maker’, Mugsy allows you to take control of every aspect of the coffee-making process: from grind size and water temperature, to brew and bloom time. Feeling lazy instead? Read in your beans’ barcode via an onboard scanner, and it will automatically use the best settings for your brew.
Looking to start your day with your favourite coffee straight out of bed? Send the robot a text, email, or tweet, and it will notify you when your coffee is ready!
Learning through product development
“Initially, I used [Mugsy] as a way to teach myself hardware design,” explained Matthew at his Editor’s Choice–winning Maker Faire stand. “I really wanted to hold something tangible in my hands. By using the Raspberry Pi and just being curious, anytime I wanted to use a new technology, I would try to pull back [and ask] ‘How can I integrate this into Mugsy?’”
By exploring his passions and using Mugsy as his guinea pig, Matthew created a project that not only solves a problem — how to make amazing coffee at home — but also brings him one step closer to ‘making things’ for a living. “I used to dream about this stuff when I was a kid, and I used to say ‘I’m never going to be able to do something like that.’” he admitted. But now, with open-source devices like the Raspberry Pi so readily available, he “can see the end of the road”: making his passion his livelihood.
With only a few days left on the Kickstarter campaign, Mugsy has reached its goal and then some. It’s available for backing from $150 if you provide your own Raspberry Pi 3, or from $175 with a Pi included — check it out today!
John Oliver covered bitcoin/cryptocurrencies last night. I thought I’d describe a bunch of things he gets wrong.
How Bitcoin works
Nowhere in the show does it describe what Bitcoin is and how it works.
Discussions should always start with Satoshi Nakamoto’s original paper. The thing Satoshi points out is that there is an important cost to normal transactions, namely, the entire legal system designed to protect you against fraud, such as the way you can reverse the transactions on your credit card if it gets stolen. The point of Bitcoin is that there is no way to reverse a charge. A transaction is done via cryptography: to transfer money to me, you decrypt it with your secret key and encrypt it with mine, handing ownership over to me with no third party involved that can reverse the transaction, and essentially no overhead.
All the rest of the stuff, like the decentralized blockchain and mining, is all about making that work.
Bitcoin crazies forget about the original genesis of Bitcoin. For example, they talk about adding features to stop fraud, reversing transactions, and having a central authority that manages that. This misses the point, because the existing electronic banking system already does that, and does a better job at it than cryptocurrencies ever can. If you want to mock cryptocurrencies, talk about the “DAO”, which did exactly that — and collapsed in a big fraudulent scheme where insiders made money and outsiders didn’t.
Sticking to Satoshi’s original ideas are a lot better than trying to repeat how the crazy fringe activists define Bitcoin.
How does any money have value?
Oliver’s answer is currencies have value because people agree that they have value, like how they agree a Beanie Baby is worth $15,000.
This is wrong. A better way of asking the question why the value of money changes. The dollar has been losing roughly 2% of its value each year for decades. This is called “inflation”, as the dollar loses value, it takes more dollars to buy things, which means the price of things (in dollars) goes up, and employers have to pay us more dollars so that we can buy the same amount of things.
The reason the value of the dollar changes is largely because the Federal Reserve manages the supply of dollars, using the same law of Supply and Demand. As you know, if a supply decreases (like oil), then the price goes up, or if the supply of something increases, the price goes down. The Fed manages money the same way: when prices rise (the dollar is worth less), the Fed reduces the supply of dollars, causing it to be worth more. Conversely, if prices fall (or don’t rise fast enough), the Fed increases supply, so that the dollar is worth less.
The reason money follows the law of Supply and Demand is because people use money, they consume it like they do other goods and services, like gasoline, tax preparation, food, dance lessons, and so forth. It’s not like a fine art painting, a stamp collection or a Beanie Baby — money is a product. It’s just that people have a hard time thinking of it as a consumer product since, in their experience, money is what they use to buy consumer products. But it’s a symmetric operation: when you buy gasoline with dollars, you are actually selling dollars in exchange for gasoline. That you call one side in this transaction “money” and the other “goods” is purely arbitrary, you call gasoline money and dollars the good that is being bought and sold for gasoline.
The reason dollars is a product is because trying to use gasoline as money is a pain in the neck. Storing it and exchanging it is difficult. Goods like this do become money, such as famously how prisons often use cigarettes as a medium of exchange, even for non-smokers, but it has to be a good that is fungible, storable, and easily exchanged. Dollars are the most fungible, the most storable, and the easiest exchanged, so has the most value as “money”. Sure, the mechanic can fix the farmers car for three chickens instead, but most of the time, both parties in the transaction would rather exchange the same value using dollars than chickens.
So the value of dollars is not like the value of Beanie Babies, which people might buy for $15,000, which changes purely on the whims of investors. Instead, a dollar is like gasoline, which obey the law of Supply and Demand.
This brings us back to the question of where Bitcoin gets its value. While Bitcoin is indeed used like dollars to buy things, that’s only a tiny use of the currency, so therefore it’s value isn’t determined by Supply and Demand. Instead, the value of Bitcoin is a lot like Beanie Babies, obeying the laws of investments. So in this respect, Oliver is right about where the value of Bitcoin comes, but wrong about where the value of dollars comes from.
Why Bitcoin conference didn’t take Bitcoin
John Oliver points out the irony of a Bitcoin conference that stopped accepting payments in Bitcoin for tickets.
The biggest reason for this is because Bitcoin has become so popular that transaction fees have gone up. Instead of being proof of failure, it’s proof of popularity. What John Oliver is saying is the old joke that nobody goes to that popular restaurant anymore because it’s too crowded and you can’t get a reservation.
Moreover, the point of Bitcoin is not to replace everyday currencies for everyday transactions. If you read Satoshi Nakamoto’s whitepaper, it’s only goal is to replace certain types of transactions, like purely electronic transactions where electronic goods and services are being exchanged. Where real-life goods/services are being exchanged, existing currencies work just fine. It’s only the crazy activists who claim Bitcoin will eventually replace real world currencies — the saner people see it co-existing with real-world currencies, each with a different value to consumers.
Turning a McNugget back into a chicken
John Oliver uses the metaphor of turning a that while you can process a chicken into McNuggets, you can’t reverse the process. It’s a funny metaphor.
But it’s not clear what the heck this metaphor is trying explain. That’s not a metaphor for the blockchain, but a metaphor for a “cryptographic hash”, where each block is a chicken, and the McNugget is the signature for the block (well, the block plus the signature of the last block, forming a chain).
Even then that metaphor as problems. The McNugget produced from each chicken must be unique to that chicken, for the metaphor to accurately describe a cryptographic hash. You can therefore identify the original chicken simply by looking at the McNugget. A slight change in the original chicken, like losing a feather, results in a completely different McNugget. Thus, nuggets can be used to tell if the original chicken has changed.
This then leads to the key property of the blockchain, it is unalterable. You can’t go back and change any of the blocks of data, because the fingerprints, the nuggets, will also change, and break the nugget chain.
The point is that while John Oliver is laughing at a silly metaphor to explain the blockchain becuase he totally misses the point of the metaphor.
Oliver rightly says “don’t worry if you don’t understand it — most people don’t”, but that includes the big companies that John Oliver name. Some companies do get it, and are producing reasonable things (like JP Morgan, by all accounts), but some don’t. IBM and other big consultancies are charging companies millions of dollars to consult with them on block chain products where nobody involved, the customer or the consultancy, actually understand any of it. That doesn’t stop them from happily charging customers on one side and happily spending money on the other.
Thus, rather than Oliver explaining the problem, he’s just being part of the problem. His explanation of blockchain left you dumber than before.
John Oliver mocks the Brave ICO ($35 million in 30 seconds), claiming it’s all driven by YouTube personalities and people who aren’t looking at the fundamentals.
And while this is true, most ICOs are bunk, the Brave ICO actually had a business model behind it. Brave is a Chrome-like web-browser whose distinguishing feature is that it protects your privacy from advertisers. If you don’t use Brave or a browser with an ad block extension, you have no idea how bad things are for you. However, this presents a problem for websites that fund themselves via advertisements, which is most of them, because visitors no longer see ads. Brave has a fix for this. Most people wouldn’t mind supporting the websites they visit often, like the New York Times. That’s where the Brave ICO “token” comes in: it’s not simply stock in Brave, but a token for micropayments to websites. Users buy tokens, then use them for micropayments to websites like New York Times. The New York Times then sells the tokens back to the market for dollars. The buying and selling of tokens happens without a centralized middleman.
How to you make money from Bitcoin?
The last part of the show is dedicated to describing all the scam out there, advising people to be careful, and to be “responsible”. This is garbage.
It’s like my simple two step process to making lots of money via Bitcoin: (1) buy when the price is low, and (2) sell when the price is high. My advice is correct, of course, but useless. Same as “be careful” and “invest responsibly”.
The truth about investing in cryptocurrencies is “don’t”. The only responsible way to invest is to buy low-overhead market index funds and hold for retirement. No, you won’t get super rich doing this, but anything other than this is irresponsible gambling.
It’s a hard lesson to learn, because everyone is telling you the opposite. The entire channel CNBC is devoted to day traders, who buy and sell stocks at a high rate based on the same principle as a ponzi scheme, basing their judgment not on the fundamentals (like long term dividends) but animal spirits of whatever stock is hot or cold at the moment. This is the same reason people buy or sell Bitcoin, not because they can describe the fundamental value, but because they believe in a bigger fool down the road who will buy it for even more.
For things like Bitcoin, the trick to making money is to have bought it over 7 years ago when it was essentially worthless, except to nerds who were into that sort of thing. It’s the same tick to making a lot of money in Magic: The Gathering trading cards, which nerds bought decades ago which are worth a ton of money now. Or, to have bought Apple stock back in 2009 when the iPhone was new, when nerds could understand the potential of real Internet access and apps that Wall Street could not.
That was my strategy: be a nerd, who gets into things. I’ve made a good amount of money on all these things because as a nerd, I was into Magic: The Gathering, Bitcoin, and the iPhone before anybody else was, and bought in at the point where these things were essentially valueless.
At this point with cryptocurrencies, with the non-nerds now flooding the market, there little chance of making it rich. The lottery is probably a better bet. Instead, if you want to make money, become a nerd, obsess about a thing, understand a thing when its new, and cash out once the rest of the market figures it out. That might be Brave, for example, but buy into it because you’ve spent the last year studying the browser advertisement ecosystem, the market’s willingness to pay for content, and how their Basic Attention Token delivers value to websites — not because you want in on the ICO craze.
John Oliver spends 25 minutes explaining Bitcoin, Cryptocurrencies, and the Blockchain to you. Sure, it’s funny, but it leaves you worse off than when it started. It admits they “simplify” the explanation, but they simplified it so much to the point where they removed all useful information.
One of the challenges faced by our customers—especially those in highly regulated industries—is balancing the need for security with flexibility. In this post, we cover how to enable multi-tenancy and increase security by using EMRFS (EMR File System) authorization, the Amazon S3 storage-level authorization on Amazon EMR.
Amazon EMR is an easy, fast, and scalable analytics platform enabling large-scale data processing. EMRFS authorization provides Amazon S3 storage-level authorization by configuring EMRFS with multiple IAM roles. With this functionality enabled, different users and groups can share the same cluster and assume their own IAM roles respectively.
Simply put, on Amazon EMR, we can now have an Amazon EC2 role per user assumed at run time instead of one general EC2 role at the cluster level. When the user is trying to access Amazon S3 resources, Amazon EMR evaluates against a predefined mappings list in EMRFS configurations and picks up the right role for the user.
In this post, we will discuss what EMRFS authorization is (Amazon S3 storage-level access control) and show how to configure the role mappings with detailed examples. You will then have the desired permissions in a multi-tenant environment. We also demo Amazon S3 access from HDFS command line, Apache Hive on Hue, and Apache Spark.
EMRFS authorization for Amazon S3
There are two prerequisites for using this feature:
Users must be authenticated, because EMRFS needs to map the current user/group/prefix to a predefined user/group/prefix. There are several authentication options. In this post, we launch a Kerberos-enabled cluster that manages the Key Distribution Center (KDC) on the master node, and enable a one-way trust from the KDC to a Microsoft Active Directory domain.
The application must support accessing Amazon S3 via Applications that have their own S3FileSystem APIs (for example, Presto) are not supported at this time.
EMRFS supports three types of mapping entries: user, group, and Amazon S3 prefix. Let’s use an example to show how this works.
Assume that you have the following three identities in your organization, and they are defined in the Active Directory:
To enable all these groups and users to share the EMR cluster, you need to define the following IAM roles:
In this case, you create a separate Amazon EC2 role that doesn’t give any permission to Amazon S3. Let’s call the role the base role (the EC2 role attached to the EMR cluster), which in this example is named EMR_EC2_RestrictedRole. Then, you define all the Amazon S3 permissions for each specific user or group in their own roles. The restricted role serves as the fallback role when the user doesn’t belong to any user/group, nor does the user try to access any listed Amazon S3 prefixes defined on the list.
Important: For all other roles, like emrfs_auth_group_role_data_eng, you need to add the base role (EMR_EC2_RestrictedRole) as the trusted entity so that it can assume other roles. See the following example:
This role grants all Amazon S3 permissions to the emrfs-auth-data-science-bucket-demo bucket and all the objects in it. Similarly, the policy for the role emrfs_auth_group_role_data_eng is shown below:
To configure EMRFS authorization, you use EMR security configuration. Here is the configuration we use in this post
Consider the following scenario.
First, the admin user admin1 tries to log in and run a command to access Amazon S3 data through EMRFS. The first role emrfs_auth_user_role_admin_user on the mapping list, which is a user role, is mapped and picked up. Then admin1 has access to the Amazon S3 locations that are defined in this role.
Then a user from the data engineer group (grp_data_engineering) tries to access a data bucket to run some jobs. When EMRFS sees that the user is a member of the grp_data_engineering group, the group role emrfs_auth_group_role_data_eng is assumed, and the user has proper access to Amazon S3 that is defined in the emrfs_auth_group_role_data_eng role.
Next, the third user comes, who is not an admin and doesn’t belong to any of the groups. After failing evaluation of the top three entries, EMRFS evaluates whether the user is trying to access a certain Amazon S3 prefix defined in the last mapping entry. This type of mapping entry is called the prefix type. If the user is trying to access s3://emrfs-auth-default-bucket-demo/, then the prefix mapping is in effect, and the prefix role emrfs_auth_prefix_role_default_s3_prefix is assumed.
If the user is not trying to access any of the Amazon S3 paths that are defined on the list—which means it failed the evaluation of all the entries—it only has the permissions defined in the EMR_EC2RestrictedRole. This role is assumed by the EC2 instances in the cluster.
In this process, all the mappings defined are evaluated in the defined order, and the first role that is mapped is assumed, and the rest of the list is skipped.
Setting up an EMR cluster and mapping Active Directory users and groups
Now that we know how EMRFS authorization role mapping works, the next thing we need to think about is how we can use this feature in an easy and manageable way.
Active Directory setup
Many customers manage their users and groups using Microsoft Active Directory or other tools like OpenLDAP. In this post, we create the Active Directory on an Amazon EC2 instance running Windows Server and create the users and groups we will be using in the example below. After setting up Active Directory, we use the Amazon EMR Kerberos auto-join capability to establish a one-way trust from the KDC running on the EMR master node to the Active Directory domain on the EC2 instance. You can use your own directory services as long as it talks to the LDAP (Lightweight Directory Access Protocol).
After configuring Active Directory, you can create all the users and groups using the Active Directory tools and add users to appropriate groups. In this example, we created users like admin1, dataeng1, datascientist1, grp_data_engineering, and grp_data_science, and then add the users to the right groups.
Join the EMR cluster to an Active Directory domain
For clusters with Kerberos, Amazon EMR now supports automated Active Directory domain joins. You can use the security configuration to configure the one-way trust from the KDC to the Active Directory domain. You also configure the EMRFS role mappings in the same security configuration.
The following is an example of the EMR security configuration with a trusted Active Directory domain EMRKRB.TEST.COM and the EMRFS role mappings as we discussed earlier:
The EMRFS role mapping configuration is shown in this example:
We will also provide an example AWS CLI command that you can run.
Launching the EMR cluster and running the tests
Now you have configured Kerberos and EMRFS authorization for Amazon S3.
Additionally, you need to configure Hue with Active Directory using the Amazon EMR configuration API in order to log in using the AD users created before. The following is an example of Hue AD configuration.
Note: In the preceding configuration JSON file, change the values as required before pasting it into the software setting section in the Amazon EMR console.
Now let’s use this configuration and the security configuration you created before to launch the cluster.
In the Amazon EMR console, choose Create cluster. Then choose Go to advanced options. On the Step1: Software and Steps page, under Edit software settings (optional), paste the configuration in the box.
The rest of the setup is the same as an ordinary cluster setup, except in the Security Options section. In Step 4: Security, under Permissions, choose Custom, and then choose the RestrictedRole that you created before.
Choose the appropriate subnets (these should meet the base requirement in order for a successful Active Directory join—see the Amazon EMR Management Guide for more details), and choose the appropriate security groups to make sure it talks to the Active Directory. Choose a key so that you can log in and configure the cluster.
Most importantly, choose the security configuration that you created earlier to enable Kerberos and EMRFS authorization for Amazon S3.
You can use the following AWS CLI command to create a cluster.
It successfully returns the listing results. Next we will test Apache Hive and then Apache Spark.
To run jobs successfully, you need to create a home directory for every user in HDFS for staging data under /user/<username>. Users can configure a step to create a home directory at cluster launch time for every user who has access to the cluster. In this example, you use Hue since Hue will create the home directory in HDFS for the user at the first login. Here Hue also needs to be integrated with the same Active Directory as explained in the example configuration described earlier.
First, log in to Hue as a data engineer user, and open a Hive Notebook in Hue. Then run a query to create a new table pointing to the data engineer bucket, s3://emrfs-auth-data-engineering-bucket-demo/table1_data_eng/.
You can see that the table was created successfully. Now try to create another table pointing to the data science group’s bucket, where the data engineer group doesn’t have access.
It failed and threw an Amazon S3 Access Denied error.
Now insert one line of data into the successfully create table.
Next, log out, switch to a data science group user, and create another table, test2_datasci_tb.
The creation is successful.
The last task is to test Spark (it requires the user directory, but Hue created one in the previous step).
Now let’s come back to the command line and run some Spark commands.
Login to the master node using the datascientist1 user:
Start the SparkSQL interactive shell by typing spark-sql, and run the show tables command. It should list the tables that you created using Hive.
As a data science group user, try select on both tables. You will find that you can only select the table defined in the location that your group has access to.
EMRFS authorization for Amazon S3 enables you to have multiple roles on the same cluster, providing flexibility to configure a shared cluster for different teams to achieve better efficiency. The Active Directory integration and group mapping make it much easier for you to manage your users and groups, and provides better auditability in a multi-tenant environment.
If you’re an educator from the UK, chances are you’ve heard of Bett. For everyone else: Bett stands for British Education Technology Tradeshow. It’s the El Dorado of edtech, where every street is adorned with interactive whiteboards, VR headsets, and new technologies for the classroom. Every year since 2014, the Raspberry Pi Foundation has been going to the event hosted in the ExCeL London to chat to thousands of lovely educators about our free programmes and resources.
On a mission
Our setup this year consisted of four pods (imagine tables on steroids) in the STEAM village, and the mission of our highly trained team of education agents was to establish a new world record for Highest number of teachers talked to in a four-day period. I’m only half-joking.
Educators with a mission
The best thing about being at Bett is meeting the educators who use our free content and training materials. It’s easy to get wrapped up in the everyday tasks of the office without stopping to ask: “Hey, have we asked our users what they want recently?” Events like Bett help us to connect with our audience, creating some lovely moments for both sides. We had plenty of Hello World authors visit us, including Gary Stager, co-author of Invent to Learn, a must-read for any computing educator. More than 700 people signed up for a digital subscription, we had numerous lovely conversations about our content and about ideas for new articles, and we met many new authors expressing an interest in writing for us in the future.
We also talked to lots of Raspberry Pi Certified Educators who we’d trained in our free Picademy programme — new dates in Belfast and Dublin now! — and who are now doing exciting and innovative things in their local areas. For example, Chris Snowden came to tell us about the great digital making outreach work he has been doing with the Eureka! museum in Yorkshire.
Raspberry Pi Certified Educator Chris Snowden
Digital making for kids
The other best thing about being at Bett is running workshops for young learners and seeing the delight on their faces when they accomplish something they believed to be impossible only five minutes ago. On the Saturday, we ran a massive Raspberry Jam/Code Club where over 250 children, parents, and curious onlookers got stuck into some of our computing activities. We were super happy to find out that we’d won the Bett Kids’ Choice Award for Best Hands-on Experience — a fantastic end to a busy four days. With Bett over for another year, our tired and happy ‘rebel alliance’ from across the Foundation still had the energy to pose for a group photo.
Beginning in April 2013, Backblaze has recorded and saved daily hard drive statistics from the drives in our data centers. Each entry consists of the date, manufacturer, model, serial number, status (operational or failed), and all of the SMART attributes reported by that drive. As of the end of 2017, there are about 88 million entries totaling 23 GB of data. You can download this data from our website if you want to do your own research, but for starters here’s what we found.
At the end of 2017 we had 93,240 spinning hard drives. Of that number, there were 1,935 boot drives and 91,305 data drives. This post looks at the hard drive statistics of the data drives we monitor. We’ll review the stats for Q4 2017, all of 2017, and the lifetime statistics for all of the drives Backblaze has used in our cloud storage data centers since we started keeping track. Along the way we’ll share observations and insights on the data presented and we look forward to you doing the same in the comments.
Hard Drive Reliability Statistics for Q4 2017
At the end of Q4 2017 Backblaze was monitoring 91,305 hard drives used to store data. For our evaluation we remove from consideration those drives which were used for testing purposes and those drive models for which we did not have at least 45 drives (read why after the chart). This leaves us with 91,243 hard drives. The table below is for the period of Q4 2017.
A few things to remember when viewing this chart:
The failure rate listed is for just Q4 2017. If a drive model has a failure rate of 0%, it means there were no drive failures of that model during Q4 2017.
There were 62 drives (91,305 minus 91,243) that were not included in the list above because we did not have at least 45 of a given drive model. The most common reason we would have fewer than 45 drives of one model is that we needed to replace a failed drive and we had to purchase a different model as a replacement because the original model was no longer available. We use 45 drives of the same model as the minimum number to qualify for reporting quarterly, yearly, and lifetime drive statistics.
Quarterly failure rates can be volatile, especially for models that have a small number of drives and/or a small number of drive days. For example, the Seagate 4 TB drive, model ST4000DM005, has a annualized failure rate of 29.08%, but that is based on only 1,255 drive days and 1 (one) drive failure.
AFR stands for Annualized Failure Rate, which is the projected failure rate for a year based on the data from this quarter only.
Bulking Up and Adding On Storage
Looking back over 2017, we not only added new drives, we “bulked up” by swapping out functional and smaller 2, 3, and 4TB drives with larger 8, 10, and 12TB drives. The changes in drive quantity by quarter are shown in the chart below:
For 2017 we added 25,746 new drives, and lost 6,442 drives to retirement for a net of 19,304 drives. When you look at storage space, we added 230 petabytes and retired 19 petabytes, netting us an additional 211 petabytes of storage in our data center in 2017.
2017 Hard Drive Failure Stats
Below are the lifetime hard drive failure statistics for the hard drive models that were operational at the end of Q4 2017. As with the quarterly results above, we have removed any non-production drives and any models that had fewer than 45 drives.
The chart above gives us the lifetime view of the various drive models in our data center. The Q4 2017 chart at the beginning of the post gives us a snapshot of the most recent quarter of the same models.
Let’s take a look at the same models over time, in our case over the past 3 years (2015 through 2017), by looking at the annual failure rates for each of those years.
The failure rate for each year is calculated for just that year. In looking at the results the following observations can be made:
The failure rates for both of the 6 TB models, Seagate and WDC, have decreased over the years while the number of drives has stayed fairly consistent from year to year.
While it looks like the failure rates for the 3 TB WDC drives have also decreased, you’ll notice that we migrated out nearly 1,000 of these WDC drives in 2017. While the remaining 180 WDC 3 TB drives are performing very well, decreasing the data set that dramatically makes trend analysis suspect.
The Toshiba 5 TB model and the HGST 8 TB model had zero failures over the last year. That’s impressive, but with only 45 drives in use for each model, not statistically useful.
The HGST/Hitachi 4 TB models delivered sub 1.0% failure rates for each of the three years. Amazing.
A Few More Numbers
To save you countless hours of looking, we’ve culled through the data to uncover the following tidbits regarding our ever changing hard drive farm.
116,833 — The number of hard drives for which we have data from April 2013 through the end of December 2017. Currently there are 91,305 drives (data drives) in operation. This means 25,528 drives have either failed or been removed from service due for some other reason — typically migration.
29,844 — The number of hard drives that were installed in 2017. This includes new drives, migrations, and failure replacements.
81.76 — The number of hard drives that were installed each day in 2017. This includes new drives, migrations, and failure replacements.
95,638 — The number of drives installed since we started keeping records in April 2013 through the end of December 2017.
55.41 — The average number of hard drives installed per day from April 2013 to the end of December 2017. The installations can be new drives, migration replacements, or failure replacements.
1,508 — The number of hard drives that were replaced as failed in 2017.
4.13 — The average number of hard drives that have failed each day in 2017.
6,795 — The number of hard drives that have failed from April 2013 until the end of December 2017.
3.94 — The average number of hard drives that have failed each day from April 2013 until the end of December 2017.
Can’t Get Enough Hard Drive Stats?
We’ll be presenting the webinar “Backblaze Hard Drive Stats for 2017” on Thursday February 9, 2017 at 10:00 Pacific time. The webinar will dig deeper into the quarterly, yearly, and lifetime hard drive stats and include the annual and lifetime stats by drive size and manufacturer. You will need to subscribe to the Backblaze BrightTALK channel to view the webinar. Sign up today.
As a reminder, the complete data set used to create the information used in this review is available on our Hard Drive Test Data page. You can download and use this data for free for your own purpose. All we ask are three things: 1) you cite Backblaze as the source if you use the data, 2) you accept that you are solely responsible for how you use the data, and 3) you do not sell this data to anyone — it is free.
Good luck and let us know if you find anything interesting.
For over a decade, civil libertarians have been fighting government mass surveillance of innocent Americans over the Internet. We’ve just lost an important battle. On January 18, President Trumpsigned the renewal of Section 702, domestic mass surveillance became effectively a permanent part of US law.
Section 702 was initially passed in 2008, as an amendment to the Foreign Intelligence Surveillance Act of 1978. As the title of that law says, it was billed as a way for the NSA to spy on non-Americans located outside the United States. It was supposed to be an efficiency and cost-saving measure: the NSA was already permitted to tap communications cables located outside the country, and it was already permitted to tap communications cables from one foreign country to another that passed through the United States. Section 702 allowed it to tap those cables from inside the United States, where it was easier. It also allowed the NSA to request surveillance data directly from Internet companies under a program called PRISM.
The problem is that this authority also gave the NSA the ability to collect foreign communications and data in a way that inherently and intentionally also swept up Americans’ communications as well, without a warrant. Other law enforcement agencies are allowed to ask the NSA to search those communications, give their contents to the FBI and other agencies and then lie about their origins in court.
In 1978, after Watergate had revealed the Nixon administration’s abuses of power, we erected a wall between intelligence and law enforcement that prevented precisely this kind of sharing of surveillance data under any authority less restrictive than the Fourth Amendment. Weakening that wall is incredibly dangerous, and the NSA should never have been given this authority in the first place.
Arguably, it never was. The NSA had been doing this type of surveillance illegally for years, something that was first made public in 2006. Section 702 was secretly used as a way to paper over that illegal collection, but nothing in the text of the later amendment gives the NSA this authority. We didn’t know that the NSA was using this law as the statutory basis for this surveillance until Edward Snowden showed us in 2013.
Civil libertarians have been battling this law in both Congress and the courts ever since it was proposed, and the NSA’s domestic surveillance activities even longer. What this most recent vote tells me is that we’ve lost that fight.
Section 702 was passed under George W. Bush in 2008, reauthorized under Barack Obama in 2012, and now reauthorized again under Trump. In all three cases, congressional support was bipartisan. It has survived multiple lawsuits by the Electronic Frontier Foundation, the ACLU, and others. It has survived the revelations by Snowden that it was being used far more extensively than Congress or the public believed, and numerous public reports of violations of the law. It has even survived Trump’s belief that he was being personally spied on by the intelligence community, as well as any congressional fears that Trump could abuse the authority in the coming years. And though this extension lasts only six years, it’s inconceivable to me that it will ever be repealed at this point.
So what do we do? If we can’t fight this particular statutory authority, where’s the new front on surveillance? There are, it turns out, reasonable modifications that target surveillance more generally, and not in terms of any particular statutory authority. We need to look at US surveillance law more generally.
First, we need to strengthen the minimization procedures to limit incidental collection. Since the Internet was developed, all the world’s communications travel around in a single global network. It’s impossible to collect only foreign communications, because they’re invariably mixed in with domestic communications. This is called “incidental” collection, but that’s a misleading name. It’s collected knowingly, and searched regularly. The intelligence community needs much stronger restrictions on which American communications channels it can access without a court order, and rules that require they delete the data if they inadvertently collect it. More importantly, “collection” is defined as the point the NSA takes a copy of the communications, and not later when they search their databases.
Second, we need to limit how other law enforcement agencies can use incidentally collected information. Today, those agencies can query a database of incidental collection on Americans. The NSA can legally pass information to those other agencies. This has to stop. Data collected by the NSA under its foreign surveillance authority should not be used as a vehicle for domestic surveillance.
The most recent reauthorization modified this lightly, forcing the FBI to obtain a court order when querying the 702 data for a criminal investigation. There are still exceptions and loopholes, though.
Third, we need to end what’s called “parallel construction.” Today, when a law enforcement agency uses evidence found in this NSA database to arrest someone, it doesn’t have to disclose that fact in court. It can reconstruct the evidence in some other manner once it knows about it, and then pretend it learned of it that way. This right to lie to the judge and the defense is corrosive to liberty, and it must end.
Pressure to reform the NSA will probably first come from Europe. Already, European Union courts have pointed to warrantless NSA surveillance as a reason to keep Europeans’ data out of US hands. Right now, there is a fragile agreement between the EU and the United States – called “Privacy Shield” — that requires Americans to maintain certain safeguards for international data flows. NSA surveillance goes against that, and it’s only a matter of time before EU courts start ruling this way. That’ll have significant effects on both government and corporate surveillance of Europeans and, by extension, the entire world.
Further pressure will come from the increased surveillance coming from the Internet of Things. When your home, car, and body are awash in sensors, privacy from both governments and corporations will become increasingly important. Sooner or later, society will reach a tipping point where it’s all too much. When that happens, we’re going to see significant pushback against surveillance of all kinds. That’s when we’ll get new laws that revise all government authorities in this area: a clean sweep for a new world, one with new norms and new fears.
It’s possible that a federal court will rule on Section 702. Although there have been many lawsuits challenging the legality of what the NSA is doing and the constitutionality of the 702 program, no court has ever ruled on those questions. The Bush and Obama administrations successfully argued that defendants don’t have legal standing to sue. That is, they have no right to sue because they don’t know they’re being targeted. If any of the lawsuits can get past that, things might change dramatically.
Meanwhile, much of this is the responsibility of the tech sector. This problem exists primarily because Internet companies collect and retain so much personal data and allow it to be sent across the network with minimal security. Since the government has abdicated its responsibility to protect our privacy and security, these companies need to step up: Minimize data collection. Don’t save data longer than absolutely necessary. Encrypt what has to be saved. Well-designed Internet services will safeguard users, regardless of government surveillance authority.
For the rest of us concerned about this, it’s important not to give up hope. Everything we do to keep the issue in the public eye – and not just when the authority comes up for reauthorization again in 2024 — hastens the day when we will reaffirm our rights to privacy in the digital age.
On his blog, embedded developer Karim Yaghmour has written about his ten-day trip to Shenzen, China, which is known as the “Silicon Valley of hardware”. His lengthy trip report covers much that would be of use to others who are thinking of making the trip, but also serves as an interesting travelogue even for those who are likely to never go. “The map didn’t disappoint and I was able to find a large number of kiosks selling some of the items I was interested in. Obviously many kiosks also had items that I had seen on Amazon or elsewhere as well. I was mostly focusing on things I hadn’t seen before. After a few hours of walking floors upon floors of shops, I was ready to start focusing on other aspects of my research: hard to source and/or evaluate components, tools and expanding my knowledge of what was available in the hardware space. Hint: TEGES’ [The Essential Guide to Electronics in Shenzhen] advice about having comfortable shoes and comfortable clothing is completely warranted.
Finding tools was relatively easy. TEGES indicates the building and floor to go to, and you’ll find most anything you can think of from rework stations, to pick-and-place machines, and including things like oscilloscopes, stereo microscopes, multimeters, screwdrivers, etc. In the process I saw some tools which I couldn’t immediately figure out the purpose for, but later found out their uses on some other visits. Satisfied with a first glance at the tools, I set out to look for one specific component I was having a hard time with. That proved a lot more difficult than anticipated. Actually I should qualify that. It was trivial to find tons of it, just not something that matched exactly what I needed. I used TEGES to identify one part of the market that seemed most likely to have what I was looking for, but again, I could find lots of it, just not what I needed.”
Днес мои клиенти ми звъннаха, че компютърът не им позволявал да си свалят новата версия на една програма от НАП. Когато стигнах на място, установих следното:
1. Въпросната програма е Справка от чл. 73 от ЗДДФЛ, версия 6.0 2. „Не може да бъде свалена“, понеже Windows Defender открива в нея вирус – Trojan:Win32/Azden.A!cl – и я блокира. 3. Сайтът на НАП, към който те се свързват, е истинският. Линкът е http://www.nap.bg/document?id=4311
Липсата на време не ми позволи да седна и да анализирам файловете в пакета ръчно, или дори да ги проверя с друг антивирус. Затова не зная дали реално съдържат вирус, или е фалшив позитив на Windows Defender.
Както едното, така и другото се е случвало преди. Надявам се да е фалшива тревога – поне един друг продукт, Xeoma, бива идентифициран погрешно от WD като този вирус. Ако обаче е реална заплаха, е неприятна. Вирусът е доста „модерен“ – събира и изпраща на стопаните си много подробна информация за компютъра и потребителите му, ъпдейтва се автоматично, сваля от Интернет и инсталира още допълнителни вирусни възможности, и позволява отдалечено командване на компютъра. Затова е разумно в този случай да се заложи на предпазливостта.
Свързах се веднага с НАП и ги предупредих за ситуацията. Единствената реакция (упорито повтаряна всеки път, когато се опитвах да обясня, че е възможно положението да е опасно), беше да им пратя е-майл и принтстрийн на съобщението, което получавам. За всеки случай им пратих описание на проблема – току-виж го прочете и някой, който различава компютър от прахосмукачка.
Моят съвет към всички е – задръжте мъничко с инсталирането на тази версия. Изчакайте, докато се разбере дали наистина съдържа вирус, или е фалшива тревога. НАП вероятно скоро ще обявят нещата и в двата случая – елементарна отговорност е да го направят.
This is a guest post by Yukinori Koide, an the head of development for the Newspass department at Gunosy.
Gunosy is a news curation application that covers a wide range of topics, such as entertainment, sports, politics, and gourmet news. The application has been installed more than 20 million times.
Gunosy aims to provide people with the content they want without the stress of dealing with a large influx of information. We analyze user attributes, such as gender and age, and past activity logs like click-through rate (CTR). We combine this information with article attributes to provide trending, personalized news articles to users.
Users need fresh and personalized news. There are two constraints to consider when delivering appropriate articles:
Time: Articles have freshness—that is, they lose value over time. New articles need to reach users as soon as possible.
Frequency (volume): Only a limited number of articles can be shown. It’s unreasonable to display all articles in the application, and users can’t read all of them anyway.
To deliver fresh articles with a high probability that the user is interested in them, it’s necessary to include not only past user activity logs and some feature values of articles, but also the most recent (real-time) user activity logs.
We optimize the delivery of articles with these two steps.
Personalization: Deliver articles based on each user’s attributes, past activity logs, and feature values of each article—to account for each user’s interests.
Trends analysis/identification: Optimize delivering articles using recent (real-time) user activity logs—to incorporate the latest trends from all users.
Optimizing the delivery of articles is always a cold start. Initially, we deliver articles based on past logs. We then use real-time data to optimize as quickly as possible. In addition, news has a short freshness time. Specifically, day-old news is past news, and even the news that is three hours old is past news. Therefore, shortening the time between step 1 and step 2 is important.
To tackle this issue, we chose AWS for processing streaming data because of its fully managed services, cost-effectiveness, and so on.
The following diagrams depict the architecture for optimizing article delivery by processing real-time user activity logs
There are three processing flows:
Process real-time user activity logs.
Store and process all user-based and article-based logs.
Execute ad hoc or heavy queries.
In this post, I focus on the first processing flow and explain how it works.
Process real-time user activity logs
The following are the steps for processing user activity logs in real time using Kinesis Data Streams and Kinesis Data Analytics.
The Fluentd server sends the following user activity logs to Kinesis Data Streams:
b. Insert the joined source stream and application reference data source into the temporary stream.
CREATE OR REPLACE PUMP "TMP_PUMP" AS
INSERT INTO "TMP_SQL_STREAM"
R.GENDER, R.SEGMENT_ID, S.ARTICLE_ID, S.ACTION
FROM "SOURCE_SQL_STREAM_001" S
LEFT JOIN "REFERENCE_DATA_SOURCE" R
ON S.USER_ID = R.USER_ID;
c. Define the destination stream named DESTINATION_SQL_STREAM.
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
TIME TIMESTAMP, GENDER VARCHAR(32), SEGMENT_ID INTEGER, ARTICLE_ID INTEGER,
IMPRESSION INTEGER, CLICK INTEGER
d. Insert the processed temporary stream, using a tumbling window, into the destination stream per minute.
CREATE OR REPLACE PUMP "STREAM_PUMP" AS
INSERT INTO "DESTINATION_SQL_STREAM"
ROW_TIME AS TIME,
GENDER, SEGMENT_ID, ARTICLE_ID,
SUM(CASE ACTION WHEN 'impression' THEN 1 ELSE 0 END) AS IMPRESSION,
SUM(CASE ACTION WHEN 'click' THEN 1 ELSE 0 END) AS CLICK
GENDER, SEGMENT_ID, ARTICLE_ID,
FLOOR("TMP_SQL_STREAM".ROWTIME TO MINUTE);
Batch servers get results from Amazon ES every minute. They then optimize delivering articles with other data sources using a proprietary optimization algorithm.
How to connect a stream to another stream in another AWS Region
When we built the solution, Kinesis Data Analytics was not available in the Asia Pacific (Tokyo) Region, so we used the US West (Oregon) Region. The following shows how we connected a data stream to another data stream in the other Region.
There is no need to continue containing all components in a single AWS Region, unless you have a situation where a response difference at the millisecond level is critical to the service.
The solution provides benefits for both our company and for our users. Benefits for the company are cost savings—including development costs, operational costs, and infrastructure costs—and reducing delivery time. Users can now find articles of interest more quickly. The solution can process more than 500,000 records per minute, and it enables fast and personalized news curating for our users.
In this post, I showed you how we optimize trending user activities to personalize news using Amazon Kinesis Data Firehose, Amazon Kinesis Data Analytics, and related AWS services in Gunosy.
AWS gives us a quick and economical solution and a good experience.
If you have questions or suggestions, please comment below.
Yukinori Koide is the head of development for the Newspass department at Gunosy. He is working on standardization of provisioning and deployment flow, promoting the utilization of serverless and containers for machine learning and AI services. His favorite AWS services are DynamoDB, Lambda, Kinesis, and ECS.
Akihiro Tsukada is a start-up solutions architect with AWS. He supports start-up companies in Japan technically at many levels, ranging from seed to later-stage.
Yuta Ishii is a solutions architect with AWS. He works with our customers to provide architectural guidance for building media & entertainment services, helping them improve the value of their services when using AWS.
Use the Join button above to receive notification of new posts in this series.
Running out of cash is one of the quickest ways for a startup to go out of business. When you are starting a company the question of where to get cash is usually the top priority, but managing cash flow is critical for every stage in the lifecycle of a company. As a primarily bootstrapped but capital-intensive business, managing cash flow at Backblaze was and still is a key element of our success and requires continued focus. Let’s look at what we learned over the years.
Raising Your Initial Funding
When starting a tech business in Silicon Valley, the default assumption is that you will immediately try to raise venture funding. There are certainly many advantages to raising funding — not the least of which is that you don’t need to be cash-flow positive since you have cash in the bank and the expectation is that you will have a “burn rate,” i.e. you’ll be spending more than you make.
Note: While you’re not expected to be cash-flow positive, that doesn’t mean you don’t have to worry about cash. Cash-flow management will determine your burn rate. Whether you can get to cash-flow breakeven or need to raise another round of funding is a direct byproduct of your cash flow management.
Also, raising funding takes time (most successful fundraising cycles take 3-6 months start-to-finish), and time at a startup is in short supply. Constantly trying to raise funding can take away from product development and pursuing growth opportunities. If you’re not successful in raising funding, you then have to either shut down or find an alternate method of funding the business.
Sources of Funding
Depending on the stage of the company, type of company, and other factors, you may have access to different sources of funding. Let’s list a number of them:
Sales — the best kind of funding. It is non-dilutive, doesn’t have to be paid back, and is a direct metric of the success of your company.
Pre-Sales — some customers may be willing to pay you for a product in beta, a test, or pre-pay for a product they’ll receive when finished. Pre-Sales income also is great because it shares the characteristics of cash from sales, but you get the cash early. It also can be a good sign that the product you’re building fills a market need. We started charging for Backblaze computer backup while it was still in private beta, which allowed us to not only collect cash from customers, but also test the billing experience and users’ real desire for the service.
Services — if you’re a service company and customers are paying you for that, great. You can effectively scale for the number of hours available in a day. As demand grows, you can add more employees to increase the total number of billable hours.
Note: If you’re a product company and customers are paying you to consult, that can provide much needed cash, and could provide feedback toward the right product. However, it can also distract from your core business, send you down a path where you’re building a product for a single customer, and addict you to a path that prevents you from building a scalable business.
Yourself — you likely are putting your time into the business, and deferring salary in the process. You may also put your own cash into the business either as an investment or a loan.
Angels — angels are ideal as early investors since they are used to investing in businesses with little to no traction. AngelList is a good place to find them, though finding people you’re connected with through someone that knows you well is best.
Crowdfunding — a component of the JOBS Act permitted entrepreneurs to raise money from nearly anyone since May 2016. The SEC imposes limits on both investors and the companies. This article goes into some depth on the options and sites available.
VCs — VCs are ideal for companies that need to raise at least a few million dollars and intend to build a business that will be worth over $1 billion.
Friends & Family — F&F are often the first people to give you money because they are investing in you. It’s great to have some early supporters, but it also can be risky to take money from people who aren’t used to the risks. The key advice here is to only take money from people who won’t mind losing it. If someone is talking about using their children’s college funds or borrowing from their 401k, say ‘no thank you’ — even if they’re sure they want to loan you money.
Bank Loans — a variety of loan types exist, but most either require the company to have been operational for a couple years, be able to borrow against money the company has or is making, or be able to get a personal guarantee from the founders whereby their own credit is on the line. Fundera provides a good overview of loan options and can help secure some, but most will not be an option for a brand new startup.
Government — in some areas there is the potential for government grants to facilitate research. The SBIR program facilitates some such grants.
At Backblaze, we used a number of these options:
We loaned a cumulative total of a couple hundred thousand dollars to the company and invested our time by going without a salary for a year and a half.
After a year and a half, we raised $370k from 11 angels. All of them were either people whom we knew personally or were a strong recommendation from a mutual friend.
After a couple years we were able to get equipment leases whereby the Storage Pods and hard drives were used as collateral to secure the lease on them.
Ater five years we raised $5m from TMT Investments to add to the balance sheet and invest in growth.
The variety and quantity of sources we used is by no means uncommon.
GAAP vs. Cash
Most companies start tracking financials based on cash, and as they scale they switch to GAAP (Generally Accepted Accounting Principles). Cash is easier to track — we got paid $XXXX and spent $YYY — and as often mentioned, is required for the business to stay alive. GAAP has more subtlety and complexity, but provides a clearer picture of how the business is really doing. Backblaze was on a ‘cash’ system for the first few years, then switched to GAAP. For this post, I’m going to focus on things that help cash flow, not GAAP profitability.
Stages of Cash Flow Management
In a pure service business (e.g. solo proprietor law firm), you may have no expenses other than your time, so this stage doesn’t exist. However, in a product business there is a period of time where you are building the product and have nothing to sell. You have zero cash coming in, but have cash going out. Your cash-flow is completely negative and you need funds to cover that.
Starting to see cash come in from customers is thrilling. I initially had our system set up to email me with every $5 payment we received. You’re making sales, but not covering expenses.
But it takes a lot of $5 payments to pay for servers and salaries, so for a while expenses are likely to outstrip sales. Getting to ramen-profitable is a critical stage where sales cover the business expenses and are “paying enough for the founders to eat ramen.” This extends the runway for a business, but is not completely sustainable, since presumably the founders can’t (or won’t) live forever on a subsistence salary.
This is the ultimate stage whereby the business is truly profitable, including paying everyone market-rate salaries. A business at this stage is self-sustaining. (Of course, market shifts and plenty of other challenges can kill the business, but cash-flow issues alone will not.)
Note, I’m using the word ‘profitable’ here to mean this is still on a cash-basis.
Backblaze was in the all-spend stage for just over a year, during which time we built the service and hadn’t yet made the service available to customers. Backblaze was in the sales-generating stage for nearly another year before the company was barely ramen-profitable where sales were covering the company expenses and paying the founders minimum wage. (I say ‘barely’ since minimum wage in the SF Bay Area is arguably never subsistence.) It took almost three more years before the company was business-profitable, paying everyone including the founders market-rate.
Cash Flow Forecasting
When raising funding it’s helpful to think of milestones reached. You don’t necessarily need enough cash on day one to last for the next 100 years of the company. Some good milestones to consider are how much cash you need to prove there is a market need, prove you can build a product to meet that need, or get to ramen-profitable.
Two things to consider:
1) Unit Economics (COGS)
If your product is 100% software, this may not be relevant. Once software is built it costs effectively nothing to deliver the product to one customer or one million customers. However, in most businesses there is some incremental cost to provide the product. If you’re selling a hardware device, perhaps you sell it for $100 but it costs you $50 to make it. This is called “COGS” (Cost of Goods Sold).
Many products rely on cloud services where the costs scale with growth. That model works great, but it’s still important to understand what the costs are for the cloud service you use per unit of product you sell.
Support is often done by the founders early-on in a business, but that is another real cost to factor in and estimate on a per-user basis. Taking all of the per unit costs combined, you may charge $10/month/user for your service, but if it costs you $7/month/user in cloud services, you’re only netting $3/month/user.
2) Operating Expenses (OpEx)
These are expenses that don’t scale with the number of product units you sell. Typically this includes research & development, sales & marketing, and general & administrative expenses. Presumably there is a certain level of these functions required to build the product, market it, sell it, and run the organization. You can choose to invest or cut back on these, but you’ll still make the same amount per product unit.
Incremental Net Profit Per Unit
If you’ve calculated your COGS and your unit economics are “upside down,” where the amount you charge is less than that it costs you to provide your service, it’s worth thinking hard about how that’s going to change over time. If it will not change, there is no scale that will make the business work. Presuming you do make money on each unit of product you sell — what is sometimes referred to as “Contribution Margin” — consider how many of those product units you need to sell to cover your operating expenses as described above.
Calculating Your Profit
The math on getting to ramen-profitable is simple:
(Number of Product Units Sold x Contribution Margin) - Operating Expenses = Profit
If your operating expenses include subsistence salaries for the founders and profit > $0, you’re ramen-profitable.
Improving Cash Flow
Having access to sources of cash, whether from selling to customers or other methods, is excellent. But needing less cash gives you more choices and allows you to either dilute less, owe less, or invest more.
There are two ways to improve cash flow:
1) Collect More Cash
The best way to collect more cash is to provide more value to your customers and as a result have them pay you more. Additional features/products/services can allow this. However, you can also collect more cash by changing how you charge for your product. If you have a subscription, changing from charging monthly to yearly dramatically improves your cash flow. If you have a product that customers use up, selling a year’s supply instead of selling them one-by-one can help.
2) Spend Less Cash
Reducing COGS is a fantastic way to spend less cash in a scalable way. If you can do this without harming the product or customer experience, you win. There are a myriad of ways to also reduce operating expenses, including taking sub-market salaries, using your home instead of renting office space, staying focused on your core product, etc.
Ultimately, collecting more and spending less cash dramatically simplifies the process of getting to ramen-profitable and later to business-profitable.
Be Careful (Why GAAP Matters)
A word of caution: while running out of cash will put you out of business immediately, overextending yourself will likely put you out of business not much later. GAAP shows how a business is really doing; cash doesn’t. If you only focus on cash, it is possible to commit yourself to both delivering products and repaying loans in the future in an unsustainable fashion. If you’re taking out loans, watch the total balance and monthly payments you’re committing to. If you’re asking customers for pre-payment, make sure you believe you can deliver on what they’ve paid for.
There are numerous challenges to building a business, and ensuring you have enough cash is amongst the most important. Having the cash to keep going lets you keep working on all of the other challenges. The frameworks above were critical for maintaining Backblaze’s cash flow and cash balance. Hopefully you can take some of the lessons we learned and apply them to your business. Let us know what works for you in the comments below.
The cookie settings on this website are set to "allow cookies" to give you the best browsing experience possible. If you continue to use this website without changing your cookie settings or you click "Accept" below then you are consenting to this.