Tag Archives: map

[$] The NOVA filesystem

Post Syndicated from jake original https://lwn.net/Articles/754505/rss

At the 2018 Linux Storage, Filesystem, and Memory-Management Summit, Andiry
Xu presented the NOVA filesystem, which he
is trying to get into the upstream kernel. Unlike existing kernel
filesystems, NOVA exclusively targets non-volatile main memory (NVMM)
rather than
traditional block devices (disks or SSDs). In fact, it does not use the
kernel’s block layer at all and instead uses persistent memory mapped
directly into the kernel address space.

A Peek Behind the Mail Curtain

Post Syndicated from marcelatoath original https://yahooeng.tumblr.com/post/174023151641

USE IMAP TO ACCESS SOME UNIQUE FEATURES

By Libby Lin, Principal Product Manager

Well, we actually won’t show you how we create the magic in our big OATH consumer mail factory. But nevertheless we wanted to share how interested developers could leverage some of our unique features we offer for our Yahoo and AOL Mail customers.

To drive experiences like our travel and shopping smart views or message threading, we tag qualified mails with something we call DECOS and THREADID. While we will not indulge in explaining how exactly we use them internally, we wanted to share how they can be used and accessed through IMAP.

So let’s just look at a sample IMAP command chain. We’ll just assume that you are familiar with the IMAP protocol at this point and you know how to properly talk to an IMAP server.

So here’s how you would retrieve DECO and THREADIDs for specific messages:

1. CONNECT

   openssl s_client -crlf -connect imap.mail.yahoo.com:993

2. LOGIN

   a login username password

   a OK LOGIN completed

3. LIST FOLDERS

   a list “” “*”

   * LIST (\Junk \HasNoChildren) “/” “Bulk Mail”

   * LIST (\Archive \HasNoChildren) “/” “Archive”

   * LIST (\Drafts \HasNoChildren) “/” “Draft”

   * LIST (\HasNoChildren) “/” “Inbox”

   * LIST (\HasNoChildren) “/” “Notes”

   * LIST (\Sent \HasNoChildren) “/” “Sent”

   * LIST (\Trash \HasChildren) “/” “Trash”

   * LIST (\HasNoChildren) “/” “Trash/l2”

   * LIST (\HasChildren) “/” “test level 1”

   * LIST (\HasNoChildren) “/” “test level 1/nestedfolder”

   * LIST (\HasNoChildren) “/” “test level 1/test level 2”

   * LIST (\HasNoChildren) “/” “&T2BZfXso-”

   * LIST (\HasNoChildren) “/” “&gQKAqk7WWr12hA-”

   a OK LIST completed

4.SELECT FOLDER

   a select inbox

   * 94 EXISTS

   * 0 RECENT

   * OK [UIDVALIDITY 1453335194] UIDs valid

   * OK [UIDNEXT 40213] Predicted next UID

   * FLAGS (\Answered \Deleted \Draft \Flagged \Seen $Forwarded $Junk $NotJunk)

   * OK [PERMANENTFLAGS (\Answered \Deleted \Draft \Flagged \Seen $Forwarded $Junk $NotJunk)] Permanent flags

   * OK [HIGHESTMODSEQ 205]

   a OK [READ-WRITE] SELECT completed; now in selected state

5. SEARCH FOR UID

   a uid search 1:*

   * SEARCH 1 2 3 4 11 12 14 23 24 75 76 77 78 114 120 121 124 128 129 130 132 133 134 135 136 137 138 40139 40140 40141 40142 40143 40144 40145 40146 40147 40148     40149 40150 40151 40152 40153 40154 40155 40156 40157 40158 40159 40160 40161 40162 40163 40164 40165 40166 40167 40168 40172 40173 40174 40175 40176     40177 40178 40179 40182 40183 40184 40185 40186 40187 40188 40190 40191 40192 40193 40194 40195 40196 40197 40198 40199 40200 40201 40202 40203 40204     40205 40206 40207 40208 40209 40211 40212

   a OK UID SEARCH completed

6. FETCH DECOS BASED ON UID

   a uid fetch 40212 (X-MSG-DECOS X-MSG-ID X-MSG-THREADID)

   * 94 FETCH (UID 40212 X-MSG-THREADID “108” X-MSG-ID “ACfIowseFt7xWtj0og0L2G0T1wM” X-MSG-DECOS (“FTI” “F1” “EML”))

   a OK UID FETCH completed

EC2 Instance Update – C5 Instances with Local NVMe Storage (C5d)

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/ec2-instance-update-c5-instances-with-local-nvme-storage-c5d/

As you can see from my EC2 Instance History post, we add new instance types on a regular and frequent basis. Driven by increasingly powerful processors and designed to address an ever-widening set of use cases, the size and diversity of this list reflects the equally diverse group of EC2 customers!

Near the bottom of that list you will find the new compute-intensive C5 instances. With a 25% to 50% improvement in price-performance over the C4 instances, the C5 instances are designed for applications like batch and log processing, distributed and or real-time analytics, high-performance computing (HPC), ad serving, highly scalable multiplayer gaming, and video encoding. Some of these applications can benefit from access to high-speed, ultra-low latency local storage. For example, video encoding, image manipulation, and other forms of media processing often necessitates large amounts of I/O to temporary storage. While the input and output files are valuable assets and are typically stored as Amazon Simple Storage Service (S3) objects, the intermediate files are expendable. Similarly, batch and log processing runs in a race-to-idle model, flushing volatile data to disk as fast as possible in order to make full use of compute resources.

New C5d Instances with Local Storage
In order to meet this need, we are introducing C5 instances equipped with local NVMe storage. Available for immediate use in 5 regions, these instances are a great fit for the applications that I described above, as well as others that you will undoubtedly dream up! Here are the specs:

Instance Name vCPUs RAM Local Storage EBS Bandwidth Network Bandwidth
c5d.large 2 4 GiB 1 x 50 GB NVMe SSD Up to 2.25 Gbps Up to 10 Gbps
c5d.xlarge 4 8 GiB 1 x 100 GB NVMe SSD Up to 2.25 Gbps Up to 10 Gbps
c5d.2xlarge 8 16 GiB 1 x 225 GB NVMe SSD Up to 2.25 Gbps Up to 10 Gbps
c5d.4xlarge 16 32 GiB 1 x 450 GB NVMe SSD 2.25 Gbps Up to 10 Gbps
c5d.9xlarge 36 72 GiB 1 x 900 GB NVMe SSD 4.5 Gbps 10 Gbps
c5d.18xlarge 72 144 GiB 2 x 900 GB NVMe SSD 9 Gbps 25 Gbps

Other than the addition of local storage, the C5 and C5d share the same specs. Both are powered by 3.0 GHz Intel Xeon Platinum 8000-series processors, optimized for EC2 and with full control over C-states on the two largest sizes, giving you the ability to run two cores at up to 3.5 GHz using Intel Turbo Boost Technology.

You can use any AMI that includes drivers for the Elastic Network Adapter (ENA) and NVMe; this includes the latest Amazon Linux, Microsoft Windows (Server 2008 R2, Server 2012, Server 2012 R2 and Server 2016), Ubuntu, RHEL, SUSE, and CentOS AMIs.

Here are a couple of things to keep in mind about the local NVMe storage:

Naming – You don’t have to specify a block device mapping in your AMI or during the instance launch; the local storage will show up as one or more devices (/dev/nvme*1 on Linux) after the guest operating system has booted.

Encryption – Each local NVMe device is hardware encrypted using the XTS-AES-256 block cipher and a unique key. Each key is destroyed when the instance is stopped or terminated.

Lifetime – Local NVMe devices have the same lifetime as the instance they are attached to, and do not stick around after the instance has been stopped or terminated.

Available Now
C5d instances are available in On-Demand, Reserved Instance, and Spot form in the US East (N. Virginia), US West (Oregon), EU (Ireland), US East (Ohio), and Canada (Central) Regions. Prices vary by Region, and are just a bit higher than for the equivalent C5 instances.

Jeff;

PS – We will be adding local NVMe storage to other EC2 instance types in the months to come, so stay tuned!

Practice Makes Perfect: Testing Campaigns Before You Send Them

Post Syndicated from Zach Barbitta original https://aws.amazon.com/blogs/messaging-and-targeting/practice-makes-perfect-testing-campaigns-before-you-send-them/

In an article we posted to Medium in February, we talked about how to determine the best time to engage your customers by using Amazon Pinpoint’s built-in session heat map. The session heat map allows you to find the times that your customers are most likely to use your app. In this post, we continued on the topic of best practices—specifically, how to appropriately test a campaign before going live.

In this post, we’ll talk about the old adage “practice makes perfect,” and how it applies to the campaigns you send using Amazon Pinpoint. Let’s take a scenario many of our customers encounter daily: creating a campaign to engage users by sending a push notification.

As you can see from the preceding screenshot, the segment we plan to target has nearly 1.7M recipients, which is a lot! Of course, before we got to this step, we already put several best practices into practice. For example, we determined the best time to engage our audience, scheduled the message based on recipients’ local time zones, performed A/B/N testing, measured lift using a hold-out group, and personalized the content for maximum effectiveness. Now that we’re ready to send the notification, we should test the message before we send it to all of the recipients in our segment. The reason for testing the message is pretty straightforward: we want to make sure every detail of the message is accurate before we send it to all 1,687,575 customers.

Fortunately, Amazon Pinpoint makes it easy to test your messages—in fact, you don’t even have to leave the campaign wizard in order to do so. In step 3 of the campaign wizard, below the message editor, there’s a button labelled Test campaign.

When you choose the Test campaign button, you have three options: you can send the test message to a segment of 100 endpoints or less, or to a set of specific endpoint IDs (up to 10), or to a set of specific device tokens (up to 10), as shown in the following image.

In our case, we’ve already created a segment of internal recipients who will test our message. On the Test Campaign window, under Send a test message to, we choose A segment. Then, in the drop-down menu, we select our test segment, and then choose Send test message.

Because we’re sending the test message to a segment, Amazon Pinpoint automatically creates a new campaign dedicated to this test. This process executes a test campaign, complete with message analytics, which allows you to perform end-to-end testing as if you sent the message to your production audience. To see the analytics for your test campaign, go to the Campaigns tab, and then choose the campaign (the name of the campaign contains the word “test”, followed by four random characters, followed by the name of the campaign).

After you complete a successful test, you’re ready to launch your campaign. As a final check, the Review & Launch screen includes a reminder that indicates whether or not you’ve tested the campaign, as shown in the following image.

There are several other ways you can use this feature. For example, you could use it for troubleshooting a campaign, or for iterating on existing campaigns. To learn more about testing campaigns, see the Amazon Pinpoint User Guide.

Securing Your Cryptocurrency

Post Syndicated from Roderick Bauer original https://www.backblaze.com/blog/backing-up-your-cryptocurrency/

Securing Your Cryptocurrency

In our blog post on Tuesday, Cryptocurrency Security Challenges, we wrote about the two primary challenges faced by anyone interested in safely and profitably participating in the cryptocurrency economy: 1) make sure you’re dealing with reputable and ethical companies and services, and, 2) keep your cryptocurrency holdings safe and secure.

In this post, we’re going to focus on how to make sure you don’t lose any of your cryptocurrency holdings through accident, theft, or carelessness. You do that by backing up the keys needed to sell or trade your currencies.

$34 Billion in Lost Value

Of the 16.4 million bitcoins said to be in circulation in the middle of 2017, close to 3.8 million may have been lost because their owners no longer are able to claim their holdings. Based on today’s valuation, that could total as much as $34 billion dollars in lost value. And that’s just bitcoins. There are now over 1,500 different cryptocurrencies, and we don’t know how many of those have been misplaced or lost.



Now that some cryptocurrencies have reached (at least for now) staggering heights in value, it’s likely that owners will be more careful in keeping track of the keys needed to use their cryptocurrencies. For the ones already lost, however, the owners have been separated from their currencies just as surely as if they had thrown Benjamin Franklins and Grover Clevelands over the railing of a ship.

The Basics of Securing Your Cryptocurrencies

In our previous post, we reviewed how cryptocurrency keys work, and the common ways owners can keep track of them. A cryptocurrency owner needs two keys to use their currencies: a public key that can be shared with others is used to receive currency, and a private key that must be kept secure is used to spend or trade currency.

Many wallets and applications allow the user to require extra security to access them, such as a password, or iris, face, or thumb print scan. If one of these options is available in your wallets, take advantage of it. Beyond that, it’s essential to back up your wallet, either using the backup feature built into some applications and wallets, or manually backing up the data used by the wallet. When backing up, it’s a good idea to back up the entire wallet, as some wallets require additional private data to operate that might not be apparent.

No matter which backup method you use, it is important to back up often and have multiple backups, preferable in different locations. As with any valuable data, a 3-2-1 backup strategy is good to follow, which ensures that you’ll have a good backup copy if anything goes wrong with one or more copies of your data.

One more caveat, don’t reuse passwords. This applies to all of your accounts, but is especially important for something as critical as your finances. Don’t ever use the same password for more than one account. If security is breached on one of your accounts, someone could connect your name or ID with other accounts, and will attempt to use the password there, as well. Consider using a password manager such as LastPass or 1Password, which make creating and using complex and unique passwords easy no matter where you’re trying to sign in.

Approaches to Backing Up Your Cryptocurrency Keys

There are numerous ways to be sure your keys are backed up. Let’s take them one by one.

1. Automatic backups using a backup program

If you’re using a wallet program on your computer, for example, Bitcoin Core, it will store your keys, along with other information, in a file. For Bitcoin Core, that file is wallet.dat. Other currencies will use the same or a different file name and some give you the option to select a name for the wallet file.

To back up the wallet.dat or other wallet file, you might need to tell your backup program to explicitly back up that file. Users of Backblaze Backup don’t have to worry about configuring this, since by default, Backblaze Backup will back up all data files. You should determine where your particular cryptocurrency, wallet, or application stores your keys, and make sure the necessary file(s) are backed up if your backup program requires you to select which files are included in the backup.

Backblaze B2 is an option for those interested in low-cost and high security cloud storage of their cryptocurrency keys. Backblaze B2 supports 2-factor verification for account access, works with a number of apps that support automatic backups with encryption, error-recovery, and versioning, and offers an API and command-line interface (CLI), as well. The first 10GB of storage is free, which could be all one needs to store encrypted cryptocurrency keys.

2. Backing up by exporting keys to a file

Apps and wallets will let you export your keys from your app or wallet to a file. Once exported, your keys can be stored on a local drive, USB thumb drive, DAS, NAS, or in the cloud with any cloud storage or sync service you wish. Encrypting the file is strongly encouraged — more on that later. If you use 1Password or LastPass, or other secure notes program, you also could store your keys there.

3. Backing up by saving a mnemonic recovery seed

A mnemonic phrase, mnemonic recovery phrase, or mnemonic seed is a list of words that stores all the information needed to recover a cryptocurrency wallet. Many wallets will have the option to generate a mnemonic backup phrase, which can be written down on paper. If the user’s computer no longer works or their hard drive becomes corrupted, they can download the same wallet software again and use the mnemonic recovery phrase to restore their keys.

The phrase can be used by anyone to recover the keys, so it must be kept safe. Mnemonic phrases are an excellent way of backing up and storing cryptocurrency and so they are used by almost all wallets.

A mnemonic recovery seed is represented by a group of easy to remember words. For example:

eye female unfair moon genius pipe nuclear width dizzy forum cricket know expire purse laptop scale identify cube pause crucial day cigar noise receive

The above words represent the following seed:

0a5b25e1dab6039d22cd57469744499863962daba9d2844243fec 9c0313c1448d1a0b2cd9e230a78775556f9b514a8be45802c2808e fd449a20234e9262dfa69

These words have certain properties:

  • The first four letters are enough to unambiguously identify the word.
  • Similar words are avoided (such as: build and built).

Bitcoin and most other cryptocurrencies such as Litecoin, Ethereum, and others use mnemonic seeds that are 12 to 24 words long. Other currencies might use different length seeds.

4. Physical backups — Paper, Metal

Some cryptocurrency holders believe that their backup, or even all their cryptocurrency account information, should be stored entirely separately from the internet to avoid any risk of their information being compromised through hacks, exploits, or leaks. This type of storage is called “cold storage.” One method of cold storage involves printing out the keys to a piece of paper and then erasing any record of the keys from all computer systems. The keys can be entered into a program from the paper when needed, or scanned from a QR code printed on the paper.

Printed public and private keys

Printed public and private keys

Some who go to extremes suggest separating the mnemonic needed to access an account into individual pieces of paper and storing those pieces in different locations in the home or office, or even different geographical locations. Some say this is a bad idea since it could be possible to reconstruct the mnemonic from one or more pieces. How diligent you wish to be in protecting these codes is up to you.

Mnemonic recovery phrase booklet

Mnemonic recovery phrase booklet

There’s another option that could make you the envy of your friends. That’s the CryptoSteel wallet, which is a stainless steel metal case that comes with more than 250 stainless steel letter tiles engraved on each side. Codes and passwords are assembled manually from the supplied part-randomized set of tiles. Users are able to store up to 96 characters worth of confidential information. Cryptosteel claims to be fireproof, waterproof, and shock-proof.

image of a Cryptosteel cold storage device

Cryptosteel cold wallet

Of course, if you leave your Cryptosteel wallet in the pocket of a pair of ripped jeans that gets thrown out by the housekeeper, as happened to the character Russ Hanneman on the TV show Silicon Valley in last Sunday’s episode, then you’re out of luck. That fictional billionaire investor lost a USB drive with $300 million in cryptocoins. Let’s hope that doesn’t happen to you.

Encryption & Security

Whether you store your keys on your computer, an external disk, a USB drive, DAS, NAS, or in the cloud, you want to make sure that no one else can use those keys. The best way to handle that is to encrypt the backup.

With Backblaze Backup for Windows and Macintosh, your backups are encrypted in transmission to the cloud and on the backup server. Users have the option to add an additional level of security by adding a Personal Encryption Key (PEK), which secures their private key. Your cryptocurrency backup files are secure in the cloud. Using our web or mobile interface, previous versions of files can be accessed, as well.

Our object storage cloud offering, Backblaze B2, can be used with a variety of applications for Windows, Macintosh, and Linux. With B2, cryptocurrency users can choose whichever method of encryption they wish to use on their local computers and then upload their encrypted currency keys to the cloud. Depending on the client used, versioning and life-cycle rules can be applied to the stored files.

Other backup programs and systems provide some or all of these capabilities, as well. If you are backing up to a local drive, it is a good idea to encrypt the local backup, which is an option in some backup programs.

Address Security

Some experts recommend using a different address for each cryptocurrency transaction. Since the address is not the same as your wallet, this means that you are not creating a new wallet, but simply using a new identifier for people sending you cryptocurrency. Creating a new address is usually as easy as clicking a button in the wallet.

One of the chief advantages of using a different address for each transaction is anonymity. Each time you use an address, you put more information into the public ledger (blockchain) about where the currency came from or where it went. That means that over time, using the same address repeatedly could mean that someone could map your relationships, transactions, and incoming funds. The more you use that address, the more information someone can learn about you. For more on this topic, refer to Address reuse.

Note that a downside of using a paper wallet with a single key pair (type-0 non-deterministic wallet) is that it has the vulnerabilities listed above. Each transaction using that paper wallet will add to the public record of transactions associated with that address. Newer wallets, i.e. “deterministic” or those using mnemonic code words support multiple addresses and are now recommended.

There are other approaches to keeping your cryptocurrency transaction secure. Here are a couple of them.

Multi-signature

Multi-signature refers to requiring more than one key to authorize a transaction, much like requiring more than one key to open a safe. It is generally used to divide up responsibility for possession of cryptocurrency. Standard transactions could be called “single-signature transactions” because transfers require only one signature — from the owner of the private key associated with the currency address (public key). Some wallets and apps can be configured to require more than one signature, which means that a group of people, businesses, or other entities all must agree to trade in the cryptocurrencies.

Deep Cold Storage

Deep cold storage ensures the entire transaction process happens in an offline environment. There are typically three elements to deep cold storage.

First, the wallet and private key are generated offline, and the signing of transactions happens on a system not connected to the internet in any manner. This ensures it’s never exposed to a potentially compromised system or connection.

Second, details are secured with encryption to ensure that even if the wallet file ends up in the wrong hands, the information is protected.

Third, storage of the encrypted wallet file or paper wallet is generally at a location or facility that has restricted access, such as a safety deposit box at a bank.

Deep cold storage is used to safeguard a large individual cryptocurrency portfolio held for the long term, or for trustees holding cryptocurrency on behalf of others, and is possibly the safest method to ensure a crypto investment remains secure.

Keep Your Software Up to Date

You should always make sure that you are using the latest version of your app or wallet software, which includes important stability and security fixes. Installing updates for all other software on your computer or mobile device is also important to keep your wallet environment safer.

One Last Thing: Think About Your Testament

Your cryptocurrency funds can be lost forever if you don’t have a backup plan for your peers and family. If the location of your wallets or your passwords is not known by anyone when you are gone, there is no hope that your funds will ever be recovered. Taking a bit of time on these matters can make a huge difference.

To the Moon*

Are you comfortable with how you’re managing and backing up your cryptocurrency wallets and keys? Do you have a suggestion for keeping your cryptocurrencies safe that we missed above? Please let us know in the comments.


*To the Moon — Crypto slang for a currency that reaches an optimistic price projection.

The post Securing Your Cryptocurrency appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Bell/TSN Letter to University Connects Site-Blocking Support to Students’ Futures

Post Syndicated from Andy original https://torrentfreak.com/bell-tsn-letter-to-university-connects-site-blocking-support-to-students-futures-180510/

In January, a coalition of Canadian companies called on local telecoms regulator CRTC to implement a website-blocking regime in Canada.

The coalition, Fairplay Canada, is a collection of organizations and companies with ties to the entertainment industries and includes Bell, Cineplex, Directors Guild of Canada, Maple Leaf Sports and Entertainment, Movie Theatre Association of Canada, and Rogers Media. Its stated aim is to address Canada’s online piracy problems.

While CTRC reviews FairPlay Canada’s plans, the coalition has been seeking to drum up support for the blocking regime, encouraging a diverse range of supporters to send submissions endorsing the project. Of course, building a united front among like-minded groups is nothing out of the ordinary but a situation just uncovered by Canadian law Professor Micheal Geist, one of the most vocal opponents of the proposed scheme, is bound to raise eyebrows.

Geist discovered a submission by Brian Hutchings, who works as Vice-President, Administration at Brock University in Ontario. Dated March 22, 2018, it notes that one of the university’s most sought-after programs is Sports Management, which helps Brock’s students to become “the lifeblood” of Canada’s sport and entertainment industries.

“Our University is deeply alarmed at how piracy is eroding an industry that employs so many of our co-op students and graduates. Piracy is a serious, pervasive threat that steals creativity, undermines investment in content development and threatens the survival of an industry that is also part of our national identity,” the submission reads.

“Brock ardently supports the FairPlay Canada coalition of more than 25 organizations involved in every aspect of Canada’s film, TV, radio, sports entertainment and music industries. Specifically, we support the coalition’s request that the CRTC introduce rules that would disable access in Canada to the most egregious piracy sites, similar to measures that have been taken in the UK, France and Australia. We are committed to assist the members of the coalition and the CRTC in eliminating the theft of digital content.”

The letter leaves no doubt that Brock University as a whole stands side-by-side with Fairplay Canada but according to a subsequent submission signed by Michelle Webber, President, Brock University Faculty Association (BUFA), nothing could be further from the truth.

Noting that BUFA unanimously supports the position of the Canadian Association of University Teachers which opposes the FairPlay proposal, Webber adds that BUFA stands in opposition to the submission by Brian Hutchings on behalf of Brock University.

“Vice President Hutching’s intervention was undertaken without consultation with the wider Brock University community, including faculty, librarians, and Senate; therefore, his submission should not be seen as indicative of the views of Brock University as a whole.”

BUFA goes on to stress the importance of an open Internet to researchers and educators while raising concerns that the blocking proposals could threaten the principles of net neutrality in Canada.

While the undermining of Hutching’s position is embarrassing enough, via access to information laws Geist has also been able to reveal the chain of events that prompted the Vice-President to write a letter of support on behalf of the whole university.

It began with an email sent by former Brock professor Cheri Bradish to Mark Milliere, TSN’s Senior Vice President and General Manager, with Hutchings copied in. The idea was to connect the pair, with the suggestion that supporting the site-blocking plan would help to mitigate the threat to “future work options” for students.

What followed was a direct email from Mark Milliere to Brian Hutchings, in which the former laid out the contributions his company makes to the university, while again suggesting that support for site-blocking would be in the long-term interests of students seeking employment in the industry.

On March 23, Milliere wrote to Hutchings again, thanking him for “a terrific letter” and stating that “If you need anything from TSN, just ask.”

This isn’t the first time that Bell has asked those beholden to the company to support its site-blocking plans.

Back in February it was revealed that the company had asked its own employees to participate in the site-blocking submission process, without necessarily revealing their affiliations with the company.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN reviews, discounts, offers and coupons.

Creating a 1.3 Million vCPU Grid on AWS using EC2 Spot Instances and TIBCO GridServer

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/creating-a-1-3-million-vcpu-grid-on-aws-using-ec2-spot-instances-and-tibco-gridserver/

Many of my colleagues are fortunate to be able to spend a good part of their day sitting down with and listening to our customers, doing their best to understand ways that we can better meet their business and technology needs. This information is treated with extreme care and is used to drive the roadmap for new services and new features.

AWS customers in the financial services industry (often abbreviated as FSI) are looking ahead to the Fundamental Review of Trading Book (FRTB) regulations that will come in to effect between 2019 and 2021. Among other things, these regulations mandate a new approach to the “value at risk” calculations that each financial institution must perform in the four hour time window after trading ends in New York and begins in Tokyo. Today, our customers report this mission-critical calculation consumes on the order of 200,000 vCPUs, growing to between 400K and 800K vCPUs in order to meet the FRTB regulations. While there’s still some debate about the magnitude and frequency with which they’ll need to run this expanded calculation, the overall direction is clear.

Building a Big Grid
In order to make sure that we are ready to help our FSI customers meet these new regulations, we worked with TIBCO to set up and run a proof of concept grid in the AWS Cloud. The periodic nature of the calculation, along with the amount of processing power and storage needed to run it to completion within four hours, make it a great fit for an environment where a vast amount of cost-effective compute power is available on an on-demand basis.

Our customers are already using the TIBCO GridServer on-premises and want to use it in the cloud. This product is designed to run grids at enterprise scale. It runs apps in a virtualized fashion, and accepts requests for resources, dynamically provisioning them on an as-needed basis. The cloud version supports Amazon Linux as well as the PostgreSQL-compatible edition of Amazon Aurora.

Working together with TIBCO, we set out to create a grid that was substantially larger than the current high-end prediction of 800K vCPUs, adding a 50% safety factor and then rounding up to reach 1.3 million vCPUs (5x the size of the largest on-premises grid). With that target in mind, the account limits were raised as follows:

  • Spot Instance Limit – 120,000
  • EBS Volume Limit – 120,000
  • EBS Capacity Limit – 2 PB

If you plan to create a grid of this size, you should also bring your friendly local AWS Solutions Architect into the loop as early as possible. They will review your plans, provide you with architecture guidance, and help you to schedule your run.

Running the Grid
We hit the Go button and launched the grid, watching as it bid for and obtained Spot Instances, each of which booted, initialized, and joined the grid within two minutes. The test workload used the Strata open source analytics & market risk library from OpenGamma and was set up with their assistance.

The grid grew to 61,299 Spot Instances (1.3 million vCPUs drawn from 34 instance types spanning 3 generations of EC2 hardware) as planned, with just 1,937 instances reclaimed and automatically replaced during the run, and cost $30,000 per hour to run, at an average hourly cost of $0.078 per vCPU. If the same instances had been used in On-Demand form, the hourly cost to run the grid would have been approximately $93,000.

Despite the scale of the grid, prices for the EC2 instances did not move during the bidding process. This is due to the overall size of the AWS Cloud and the smooth price change model that we launched late last year.

To give you a sense of the compute power, we computed that this grid would have taken the #1 position on the TOP 500 supercomputer list in November 2007 by a considerable margin, and the #2 position in June 2008. Today, it would occupy position #360 on the list.

I hope that you enjoyed this AWS success story, and that it gives you an idea of the scale that you can achieve in the cloud!

Jeff;

[$] A mapping layer for filesystems

Post Syndicated from jake original https://lwn.net/Articles/753650/rss

In a plenary session on the second day of the Linux Storage, Filesystem,
and Memory-Management Summit (LSFMM), Dave Chinner described his ideas for
a virtual block address-space layer. It would allow “space accounting to be
shared and managed at various layers in the storage stack”. One of the
targets for this work is for filesystems on thin-provisioned devices, where
the filesystem
is larger than the storage devices holding it (and administrators are
expected to add storage as needed); in current systems, running out of
space causes huge problems for filesystems and users because the filesystem
cannot communicate that error in a usable fashion.

[$] Shared memory mappings for devices

Post Syndicated from jake original https://lwn.net/Articles/753481/rss

In a short filesystem-only discussion at the 2018 Linux Storage,
Filesystem, and Memory-Management Summit (LSFMM), Jérôme Glisse wanted to
talk about some (more) changes to support GPUs, FPGAs, and RDMA devices.
In other talks at LSFMM, he discussed
changes to struct page
in support of these kinds of devices, but here he was looking to discuss
other changes
to support mapping a device’s memory into multiple processes. It should be
noted that I had a hard time following the discussion in this session, so
there may be significant gaps in the article.

Analyze data in Amazon DynamoDB using Amazon SageMaker for real-time prediction

Post Syndicated from YongSeong Lee original https://aws.amazon.com/blogs/big-data/analyze-data-in-amazon-dynamodb-using-amazon-sagemaker-for-real-time-prediction/

Many companies across the globe use Amazon DynamoDB to store and query historical user-interaction data. DynamoDB is a fast NoSQL database used by applications that need consistent, single-digit millisecond latency.

Often, customers want to turn their valuable data in DynamoDB into insights by analyzing a copy of their table stored in Amazon S3. Doing this separates their analytical queries from their low-latency critical paths. This data can be the primary source for understanding customers’ past behavior, predicting future behavior, and generating downstream business value. Customers often turn to DynamoDB because of its great scalability and high availability. After a successful launch, many customers want to use the data in DynamoDB to predict future behaviors or provide personalized recommendations.

DynamoDB is a good fit for low-latency reads and writes, but it’s not practical to scan all data in a DynamoDB database to train a model. In this post, I demonstrate how you can use DynamoDB table data copied to Amazon S3 by AWS Data Pipeline to predict customer behavior. I also demonstrate how you can use this data to provide personalized recommendations for customers using Amazon SageMaker. You can also run ad hoc queries using Amazon Athena against the data. DynamoDB recently released on-demand backups to create full table backups with no performance impact. However, it’s not suitable for our purposes in this post, so I chose AWS Data Pipeline instead to create managed backups are accessible from other services.

To do this, I describe how to read the DynamoDB backup file format in Data Pipeline. I also describe how to convert the objects in S3 to a CSV format that Amazon SageMaker can read. In addition, I show how to schedule regular exports and transformations using Data Pipeline. The sample data used in this post is from Bank Marketing Data Set of UCI.

The solution that I describe provides the following benefits:

  • Separates analytical queries from production traffic on your DynamoDB table, preserving your DynamoDB read capacity units (RCUs) for important production requests
  • Automatically updates your model to get real-time predictions
  • Optimizes for performance (so it doesn’t compete with DynamoDB RCUs after the export) and for cost (using data you already have)
  • Makes it easier for developers of all skill levels to use Amazon SageMaker

All code and data set in this post are available in this .zip file.

Solution architecture

The following diagram shows the overall architecture of the solution.

The steps that data follows through the architecture are as follows:

  1. Data Pipeline regularly copies the full contents of a DynamoDB table as JSON into an S3
  2. Exported JSON files are converted to comma-separated value (CSV) format to use as a data source for Amazon SageMaker.
  3. Amazon SageMaker renews the model artifact and update the endpoint.
  4. The converted CSV is available for ad hoc queries with Amazon Athena.
  5. Data Pipeline controls this flow and repeats the cycle based on the schedule defined by customer requirements.

Building the auto-updating model

This section discusses details about how to read the DynamoDB exported data in Data Pipeline and build automated workflows for real-time prediction with a regularly updated model.

Download sample scripts and data

Before you begin, take the following steps:

  1. Download sample scripts in this .zip file.
  2. Unzip the src.zip file.
  3. Find the automation_script.sh file and edit it for your environment. For example, you need to replace 's3://<your bucket>/<datasource path>/' with your own S3 path to the data source for Amazon ML. In the script, the text enclosed by angle brackets—< and >—should be replaced with your own path.
  4. Upload the json-serde-1.3.6-SNAPSHOT-jar-with-dependencies.jar file to your S3 path so that the ADD jar command in Apache Hive can refer to it.

For this solution, the banking.csv  should be imported into a DynamoDB table.

Export a DynamoDB table

To export the DynamoDB table to S3, open the Data Pipeline console and choose the Export DynamoDB table to S3 template. In this template, Data Pipeline creates an Amazon EMR cluster and performs an export in the EMRActivity activity. Set proper intervals for backups according to your business requirements.

One core node(m3.xlarge) provides the default capacity for the EMR cluster and should be suitable for the solution in this post. Leave the option to resize the cluster before running enabled in the TableBackupActivity activity to let Data Pipeline scale the cluster to match the table size. The process of converting to CSV format and renewing models happens in this EMR cluster.

For a more in-depth look at how to export data from DynamoDB, see Export Data from DynamoDB in the Data Pipeline documentation.

Add the script to an existing pipeline

After you export your DynamoDB table, you add an additional EMR step to EMRActivity by following these steps:

  1. Open the Data Pipeline console and choose the ID for the pipeline that you want to add the script to.
  2. For Actions, choose Edit.
  3. In the editing console, choose the Activities category and add an EMR step using the custom script downloaded in the previous section, as shown below.

Paste the following command into the new step after the data ­­upload step:

s3://#{myDDBRegion}.elasticmapreduce/libs/script-runner/script-runner.jar,s3://<your bucket name>/automation_script.sh,#{output.directoryPath},#{myDDBRegion}

The element #{output.directoryPath} references the S3 path where the data pipeline exports DynamoDB data as JSON. The path should be passed to the script as an argument.

The bash script has two goals, converting data formats and renewing the Amazon SageMaker model. Subsequent sections discuss the contents of the automation script.

Automation script: Convert JSON data to CSV with Hive

We use Apache Hive to transform the data into a new format. The Hive QL script to create an external table and transform the data is included in the custom script that you added to the Data Pipeline definition.

When you run the Hive scripts, do so with the -e option. Also, define the Hive table with the 'org.openx.data.jsonserde.JsonSerDe' row format to parse and read JSON format. The SQL creates a Hive EXTERNAL table, and it reads the DynamoDB backup data on the S3 path passed to it by Data Pipeline.

Note: You should create the table with the “EXTERNAL” keyword to avoid the backup data being accidentally deleted from S3 if you drop the table.

The full automation script for converting follows. Add your own bucket name and data source path in the highlighted areas.

#!/bin/bash
hive -e "
ADD jar s3://<your bucket name>/json-serde-1.3.6-SNAPSHOT-jar-with-dependencies.jar ; 
DROP TABLE IF EXISTS blog_backup_data ;
CREATE EXTERNAL TABLE blog_backup_data (
 customer_id map<string,string>,
 age map<string,string>, job map<string,string>, 
 marital map<string,string>,education map<string,string>, 
 default map<string,string>, housing map<string,string>,
 loan map<string,string>, contact map<string,string>, 
 month map<string,string>, day_of_week map<string,string>, 
 duration map<string,string>, campaign map<string,string>,
 pdays map<string,string>, previous map<string,string>, 
 poutcome map<string,string>, emp_var_rate map<string,string>, 
 cons_price_idx map<string,string>, cons_conf_idx map<string,string>,
 euribor3m map<string,string>, nr_employed map<string,string>, 
 y map<string,string> ) 
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' 
LOCATION '$1/';

INSERT OVERWRITE DIRECTORY 's3://<your bucket name>/<datasource path>/' 
SELECT concat( customer_id['s'],',', 
 age['n'],',', job['s'],',', 
 marital['s'],',', education['s'],',', default['s'],',', 
 housing['s'],',', loan['s'],',', contact['s'],',', 
 month['s'],',', day_of_week['s'],',', duration['n'],',', 
 campaign['n'],',',pdays['n'],',',previous['n'],',', 
 poutcome['s'],',', emp_var_rate['n'],',', cons_price_idx['n'],',',
 cons_conf_idx['n'],',', euribor3m['n'],',', nr_employed['n'],',', y['n'] ) 
FROM blog_backup_data
WHERE customer_id['s'] > 0 ; 

After creating an external table, you need to read data. You then use the INSERT OVERWRITE DIRECTORY ~ SELECT command to write CSV data to the S3 path that you designated as the data source for Amazon SageMaker.

Depending on your requirements, you can eliminate or process the columns in the SELECT clause in this step to optimize data analysis. For example, you might remove some columns that have unpredictable correlations with the target value because keeping the wrong columns might expose your model to “overfitting” during the training. In this post, customer_id  columns is removed. Overfitting can make your prediction weak. More information about overfitting can be found in the topic Model Fit: Underfitting vs. Overfitting in the Amazon ML documentation.

Automation script: Renew the Amazon SageMaker model

After the CSV data is replaced and ready to use, create a new model artifact for Amazon SageMaker with the updated dataset on S3.  For renewing model artifact, you must create a new training job.  Training jobs can be run using the AWS SDK ( for example, Amazon SageMaker boto3 ) or the Amazon SageMaker Python SDK that can be installed with “pip install sagemaker” command as well as the AWS CLI for Amazon SageMaker described in this post.

In addition, consider how to smoothly renew your existing model without service impact, because your model is called by applications in real time. To do this, you need to create a new endpoint configuration first and update a current endpoint with the endpoint configuration that is just created.

#!/bin/bash
## Define variable 
REGION=$2
DTTIME=`date +%Y-%m-%d-%H-%M-%S`
ROLE="<your AmazonSageMaker-ExecutionRole>" 


# Select containers image based on region.  
case "$REGION" in
"us-west-2" )
    IMAGE="174872318107.dkr.ecr.us-west-2.amazonaws.com/linear-learner:latest"
    ;;
"us-east-1" )
    IMAGE="382416733822.dkr.ecr.us-east-1.amazonaws.com/linear-learner:latest" 
    ;;
"us-east-2" )
    IMAGE="404615174143.dkr.ecr.us-east-2.amazonaws.com/linear-learner:latest" 
    ;;
"eu-west-1" )
    IMAGE="438346466558.dkr.ecr.eu-west-1.amazonaws.com/linear-learner:latest" 
    ;;
 *)
    echo "Invalid Region Name"
    exit 1 ;  
esac

# Start training job and creating model artifact 
TRAINING_JOB_NAME=TRAIN-${DTTIME} 
S3OUTPUT="s3://<your bucket name>/model/" 
INSTANCETYPE="ml.m4.xlarge"
INSTANCECOUNT=1
VOLUMESIZE=5 
aws sagemaker create-training-job --training-job-name ${TRAINING_JOB_NAME} --region ${REGION}  --algorithm-specification TrainingImage=${IMAGE},TrainingInputMode=File --role-arn ${ROLE}  --input-data-config '[{ "ChannelName": "train", "DataSource": { "S3DataSource": { "S3DataType": "S3Prefix", "S3Uri": "s3://<your bucket name>/<datasource path>/", "S3DataDistributionType": "FullyReplicated" } }, "ContentType": "text/csv", "CompressionType": "None" , "RecordWrapperType": "None"  }]'  --output-data-config S3OutputPath=${S3OUTPUT} --resource-config  InstanceType=${INSTANCETYPE},InstanceCount=${INSTANCECOUNT},VolumeSizeInGB=${VOLUMESIZE} --stopping-condition MaxRuntimeInSeconds=120 --hyper-parameters feature_dim=20,predictor_type=binary_classifier  

# Wait until job completed 
aws sagemaker wait training-job-completed-or-stopped --training-job-name ${TRAINING_JOB_NAME}  --region ${REGION}

# Get newly created model artifact and create model
MODELARTIFACT=`aws sagemaker describe-training-job --training-job-name ${TRAINING_JOB_NAME} --region ${REGION}  --query 'ModelArtifacts.S3ModelArtifacts' --output text `
MODELNAME=MODEL-${DTTIME}
aws sagemaker create-model --region ${REGION} --model-name ${MODELNAME}  --primary-container Image=${IMAGE},ModelDataUrl=${MODELARTIFACT}  --execution-role-arn ${ROLE}

# create a new endpoint configuration 
CONFIGNAME=CONFIG-${DTTIME}
aws sagemaker  create-endpoint-config --region ${REGION} --endpoint-config-name ${CONFIGNAME}  --production-variants  VariantName=Users,ModelName=${MODELNAME},InitialInstanceCount=1,InstanceType=ml.m4.xlarge

# create or update the endpoint
STATUS=`aws sagemaker describe-endpoint --endpoint-name  ServiceEndpoint --query 'EndpointStatus' --output text --region ${REGION} `
if [[ $STATUS -ne "InService" ]] ;
then
    aws sagemaker  create-endpoint --endpoint-name  ServiceEndpoint  --endpoint-config-name ${CONFIGNAME} --region ${REGION}    
else
    aws sagemaker  update-endpoint --endpoint-name  ServiceEndpoint  --endpoint-config-name ${CONFIGNAME} --region ${REGION}
fi

Grant permission

Before you execute the script, you must grant proper permission to Data Pipeline. Data Pipeline uses the DataPipelineDefaultResourceRole role by default. I added the following policy to DataPipelineDefaultResourceRole to allow Data Pipeline to create, delete, and update the Amazon SageMaker model and data source in the script.

{
 "Version": "2012-10-17",
 "Statement": [
 {
 "Effect": "Allow",
 "Action": [
 "sagemaker:CreateTrainingJob",
 "sagemaker:DescribeTrainingJob",
 "sagemaker:CreateModel",
 "sagemaker:CreateEndpointConfig",
 "sagemaker:DescribeEndpoint",
 "sagemaker:CreateEndpoint",
 "sagemaker:UpdateEndpoint",
 "iam:PassRole"
 ],
 "Resource": "*"
 }
 ]
}

Use real-time prediction

After you deploy a model into production using Amazon SageMaker hosting services, your client applications use this API to get inferences from the model hosted at the specified endpoint. This approach is useful for interactive web, mobile, or desktop applications.

Following, I provide a simple Python code example that queries against Amazon SageMaker endpoint URL with its name (“ServiceEndpoint”) and then uses them for real-time prediction.

=== Python sample for real-time prediction ===

#!/usr/bin/env python
import boto3
import json 

client = boto3.client('sagemaker-runtime', region_name ='<your region>' )
new_customer_info = '34,10,2,4,1,2,1,1,6,3,190,1,3,4,3,-1.7,94.055,-39.8,0.715,4991.6'
response = client.invoke_endpoint(
    EndpointName='ServiceEndpoint',
    Body=new_customer_info, 
    ContentType='text/csv'
)
result = json.loads(response['Body'].read().decode())
print(result)
--- output(response) ---
{u'predictions': [{u'score': 0.7528127431869507, u'predicted_label': 1.0}]}

Solution summary

The solution takes the following steps:

  1. Data Pipeline exports DynamoDB table data into S3. The original JSON data should be kept to recover the table in the rare event that this is needed. Data Pipeline then converts JSON to CSV so that Amazon SageMaker can read the data.Note: You should select only meaningful attributes when you convert CSV. For example, if you judge that the “campaign” attribute is not correlated, you can eliminate this attribute from the CSV.
  2. Train the Amazon SageMaker model with the new data source.
  3. When a new customer comes to your site, you can judge how likely it is for this customer to subscribe to your new product based on “predictedScores” provided by Amazon SageMaker.
  4. If the new user subscribes your new product, your application must update the attribute “y” to the value 1 (for yes). This updated data is provided for the next model renewal as a new data source. It serves to improve the accuracy of your prediction. With each new entry, your application can become smarter and deliver better predictions.

Running ad hoc queries using Amazon Athena

Amazon Athena is a serverless query service that makes it easy to analyze large amounts of data stored in Amazon S3 using standard SQL. Athena is useful for examining data and collecting statistics or informative summaries about data. You can also use the powerful analytic functions of Presto, as described in the topic Aggregate Functions of Presto in the Presto documentation.

With the Data Pipeline scheduled activity, recent CSV data is always located in S3 so that you can run ad hoc queries against the data using Amazon Athena. I show this with example SQL statements following. For an in-depth description of this process, see the post Interactive SQL Queries for Data in Amazon S3 on the AWS News Blog. 

Creating an Amazon Athena table and running it

Simply, you can create an EXTERNAL table for the CSV data on S3 in Amazon Athena Management Console.

=== Table Creation ===
CREATE EXTERNAL TABLE datasource (
 age int, 
 job string, 
 marital string , 
 education string, 
 default string, 
 housing string, 
 loan string, 
 contact string, 
 month string, 
 day_of_week string, 
 duration int, 
 campaign int, 
 pdays int , 
 previous int , 
 poutcome string, 
 emp_var_rate double, 
 cons_price_idx double,
 cons_conf_idx double, 
 euribor3m double, 
 nr_employed double, 
 y int 
)
ROW FORMAT DELIMITED 
FIELDS TERMINATED BY ',' ESCAPED BY '\\' LINES TERMINATED BY '\n' 
LOCATION 's3://<your bucket name>/<datasource path>/';

The following query calculates the correlation coefficient between the target attribute and other attributes using Amazon Athena.

=== Sample Query ===

SELECT corr(age,y) AS correlation_age_and_target, 
 corr(duration,y) AS correlation_duration_and_target, 
 corr(campaign,y) AS correlation_campaign_and_target,
 corr(contact,y) AS correlation_contact_and_target
FROM ( SELECT age , duration , campaign , y , 
 CASE WHEN contact = 'telephone' THEN 1 ELSE 0 END AS contact 
 FROM datasource 
 ) datasource ;

Conclusion

In this post, I introduce an example of how to analyze data in DynamoDB by using table data in Amazon S3 to optimize DynamoDB table read capacity. You can then use the analyzed data as a new data source to train an Amazon SageMaker model for accurate real-time prediction. In addition, you can run ad hoc queries against the data on S3 using Amazon Athena. I also present how to automate these procedures by using Data Pipeline.

You can adapt this example to your specific use case at hand, and hopefully this post helps you accelerate your development. You can find more examples and use cases for Amazon SageMaker in the video AWS 2017: Introducing Amazon SageMaker on the AWS website.

 


Additional Reading

If you found this post useful, be sure to check out Serving Real-Time Machine Learning Predictions on Amazon EMR and Analyzing Data in S3 using Amazon Athena.

 


About the Author

Yong Seong Lee is a Cloud Support Engineer for AWS Big Data Services. He is interested in every technology related to data/databases and helping customers who have difficulties in using AWS services. His motto is “Enjoy life, be curious and have maximum experience.”

 

 

[$] Zone-lock and mmap_sem scalability

Post Syndicated from corbet original https://lwn.net/Articles/753269/rss

The memory-management subsystem is a central point that handles all of the
system’s memory, so it is naturally subject to scalability problems as
systems grow larger. Two sessions during the memory-management track of
the 2018 Linux Storage, Filesystem, and Memory-Management Summit looked at
specific contention points: the zone locks and the mmap_sem
semaphore.

EC2 Fleet – Manage Thousands of On-Demand and Spot Instances with One Request

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/ec2-fleet-manage-thousands-of-on-demand-and-spot-instances-with-one-request/

EC2 Spot Fleets are really cool. You can launch a fleet of Spot Instances that spans EC2 instance types and Availability Zones without having to write custom code to discover capacity or monitor prices. You can set the target capacity (the size of the fleet) in units that are meaningful to your application and have Spot Fleet create and then maintain the fleet on your behalf. Our customers are creating Spot Fleets of all sizes. For example, one financial service customer runs Monte Carlo simulations across 10 different EC2 instance types. They routinely make requests for hundreds of thousands of vCPUs and count on Spot Fleet to give them access to massive amounts of capacity at the best possible price.

EC2 Fleet
Today we are extending and generalizing the set-it-and-forget-it model that we pioneered in Spot Fleet with EC2 Fleet, a new building block that gives you the ability to create fleets that are composed of a combination of EC2 On-Demand, Reserved, and Spot Instances with a single API call. You tell us what you need, capacity and instance-wise, and we’ll handle all the heavy lifting. We will launch, manage, monitor and scale instances as needed, without the need for scaffolding code.

You can specify the capacity of your fleet in terms of instances, vCPUs, or application-oriented units, and also indicate how much of the capacity should be fulfilled by Spot Instances. The application-oriented units allow you to specify the relative power of each EC2 instance type in a way that directly maps to the needs of your application. All three capacity specification options (instances, vCPUs, and application-oriented units) are known as weights.

I think you’ll find a number ways this feature makes managing a fleet of instances easier, and believe that you will also appreciate the team’s near-term feature roadmap of interest (more on that in a bit).

Using EC2 Fleet
There are a number of ways that you can use this feature, whether you’re running a stateless web service, a big data cluster or a continuous integration pipeline. Today I’m going to describe how you can use EC2 Fleet for genomic processing, but this is similar to workloads like risk analysis, log processing or image rendering. Modern DNA sequencers can produce multiple terabytes of raw data each day, to process that data into meaningful information in a timely fashion you need lots of processing power. I’ll be showing you how to deploy a “grid” of worker nodes that can quickly crunch through secondary analysis tasks in parallel.

Projects in genomics can use the elasticity EC2 provides to experiment and try out new pipelines on hundreds or even thousands of servers. With EC2 you can access as many cores as you need and only pay for what you use. Prior to today, you would need to use the RunInstances API or an Auto Scaling group for the On-Demand & Reserved Instance portion of your grid. To get the best price performance you’d also create and manage a Spot Fleet or multiple Spot Auto Scaling groups with different instance types if you wanted to add Spot Instances to turbo-boost your secondary analysis. Finally, to automate scaling decisions across multiple APIs and Auto Scaling groups you would need to write Lambda functions that periodically assess your grid’s progress & backlog, as well as current Spot prices – modifying your Auto Scaling Groups and Spot Fleets accordingly.

You can now replace all of this with a single EC2 Fleet, analyzing genomes at scale for as little as $1 per analysis. In my grid, each step in in the pipeline requires 1 vCPU and 4 GiB of memory, a perfect match for M4 and M5 instances with 4 GiB of memory per vCPU. I will create a fleet using M4 and M5 instances with weights that correspond to the number of vCPUs on each instance:

  • m4.16xlarge – 64 vCPUs, weight = 64
  • m5.24xlarge – 96 vCPUs, weight = 96

This is expressed in a template that looks like this:

"Overrides": [
{
  "InstanceType": "m4.16xlarge",
  "WeightedCapacity": 64,
},
{
  "InstanceType": "m5.24xlarge",
  "WeightedCapacity": 96,
},
]

By default, EC2 Fleet will select the most cost effective combination of instance types and Availability Zones (both specified in the template) using the current prices for the Spot Instances and public prices for the On-Demand Instances (if you specify instances for which you have matching RIs, your discounts will apply). The default mode takes weights into account to get the instances that have the lowest price per unit. So for my grid, fleet will find the instance that offers the lowest price per vCPU.

Now I can request capacity in terms of vCPUs, knowing EC2 Fleet will select the lowest cost option using only the instance types I’ve defined as acceptable. Also, I can specify how many vCPUs I want to launch using On-Demand or Reserved Instance capacity and how many vCPUs should be launched using Spot Instance capacity:

"TargetCapacitySpecification": {
	"TotalTargetCapacity": 2880,
	"OnDemandTargetCapacity": 960,
	"SpotTargetCapacity": 1920,
	"DefaultTargetCapacityType": "Spot"
}

The above means that I want a total of 2880 vCPUs, with 960 vCPUs fulfilled using On-Demand and 1920 using Spot. The On-Demand price per vCPU is lower for m5.24xlarge than the On-Demand price per vCPU for m4.16xlarge, so EC2 Fleet will launch 10 m5.24xlarge instances to fulfill 960 vCPUs. Based on current Spot pricing (again, on a per-vCPU basis), EC2 Fleet will choose to launch 30 m4.16xlarge instances or 20 m5.24xlarges, delivering 1920 vCPUs either way.

Putting it all together, I have a single file (fl1.json) that describes my fleet:

    "LaunchTemplateConfigs": [
        {
            "LaunchTemplateSpecification": {
                "LaunchTemplateId": "lt-0e8c754449b27161c",
                "Version": "1"
            }
        "Overrides": [
        {
          "InstanceType": "m4.16xlarge",
          "WeightedCapacity": 64,
        },
        {
          "InstanceType": "m5.24xlarge",
          "WeightedCapacity": 96,
        },
      ]
        }
    ],
    "TargetCapacitySpecification": {
        "TotalTargetCapacity": 2880,
        "OnDemandTargetCapacity": 960,
        "SpotTargetCapacity": 1920,
        "DefaultTargetCapacityType": "Spot"
    }
}

I can launch my fleet with a single command:

$ aws ec2 create-fleet --cli-input-json file://home/ec2-user/fl1.json
{
    "FleetId":"fleet-838cf4e5-fded-4f68-acb5-8c47ee1b248a"
}

My entire fleet is created within seconds and was built using 10 m5.24xlarge On-Demand Instances and 30 m4.16xlarge Spot Instances, since the current Spot price was 1.5¢ per vCPU for m4.16xlarge and 1.6¢ per vCPU for m5.24xlarge.

Now lets imagine my grid has crunched through its backlog and no longer needs the additional Spot Instances. I can then modify the size of my fleet by changing the target capacity in my fleet specification, like this:

{         
    "TotalTargetCapacity": 960,
}

Since 960 was equal to the amount of On-Demand vCPUs I had requested, when I describe my fleet I will see all of my capacity being delivered using On-Demand capacity:

"TargetCapacitySpecification": {
	"TotalTargetCapacity": 960,
	"OnDemandTargetCapacity": 960,
	"SpotTargetCapacity": 0,
	"DefaultTargetCapacityType": "Spot"
}

When I no longer need my fleet I can delete it and terminate the instances in it like this:

$ aws ec2 delete-fleets --fleet-id fleet-838cf4e5-fded-4f68-acb5-8c47ee1b248a \
  --terminate-instances   
{
    "UnsuccessfulFleetDletetions": [],
    "SuccessfulFleetDeletions": [
        {
            "CurrentFleetState": "deleted_terminating",
            "PreviousFleetState": "active",
            "FleetId": "fleet-838cf4e5-fded-4f68-acb5-8c47ee1b248a"
        }
    ]
}

Earlier I described how RI discounts apply when EC2 Fleet launches instances for which you have matching RIs, so you might be wondering how else RI customers benefit from EC2 Fleet. Let’s say that I own regional RIs for M4 instances. In my EC2 Fleet I would remove m5.24xlarge and specify m4.10xlarge and m4.16xlarge. Then when EC2 Fleet creates the grid, it will quickly find M4 capacity across the sizes and AZs I’ve specified, and my RI discounts apply automatically to this usage.

In the Works
We plan to connect EC2 Fleet and EC2 Auto Scaling groups. This will let you create a single fleet that mixed instance types and Spot, Reserved and On-Demand, while also taking advantage of EC2 Auto Scaling features such as health checks and lifecycle hooks. This integration will also bring EC2 Fleet functionality to services such as Amazon ECS, Amazon EKS, and AWS Batch that build on and make use of EC2 Auto Scaling for fleet management.

Available Now
You can create and make use of EC2 Fleets today in all public AWS Regions!

Jeff;

[$] The LRU lock and mmap_sem

Post Syndicated from corbet original https://lwn.net/Articles/753058/rss

The kernel’s memory-management subsystem has to manage a great deal of
concurrency; that leads to an ongoing series of locking challenges that
sometimes seem intractable. Two recurring locking issues — the LRU locks
and the mmap_sem lock — were the topic of sessions held during the
memory-management track of the 2018 Linux Storage, Filesystem, and
Memory-Management Summit. In both cases, it quickly became clear that,
while some interesting ideas are being pursued, easy
solutions are not on offer.

timeShift(GrafanaBuzz, 1w) Issue 42

Post Syndicated from Blogs on Grafana Labs Blog original https://grafana.com/blog/2018/04/27/timeshiftgrafanabuzz-1w-issue-42/

Welcome to TimeShift Grafana v5.1 Stable is available! Two of the biggest new features include a native data source for MSSQL Server and heatmap support for Prometheus. Download the latest release and checkout other new features and fixes below.
Heading to KubeCon + CloudNativeCon Europe 2018 in Copenhagen, Denmark, May 2-4? Come by our booth and say hi! Also don’t miss Tom Wilkie’s talk on Prometheus Monitoring Mixins: Using Jsonnet to Package Together Dashboards, Alerts and Exporters, and Goutham Veeramanchaneni’s talks: TSDB: The Engine behind Prometheus and TSDB: The Past, Present and the Future Latest Release We received a lot of great suggestions, bug reports and pull requests from our amazing community – Thank you all!

[$] Repurposing page->mapping

Post Syndicated from corbet original https://lwn.net/Articles/752564/rss

The page structure is one of the
most complex in the kernel due to the need to cram the maximum amount of
information into as little space as possible. Each field is so heavily
overloaded that developers prefer to avoid making changes to struct
page
if they can avoid
it. That didn’t deter Jérôme Glisse from proposing a significant change
during two plenary sessions at
the 2018 Linux Storage, Filesystem, and Memory-Management Summit, though.
There are some interesting benefits on offer, but getting there will not be
a simple task.

Grafana v5.1 Released

Post Syndicated from Blogs on Grafana Labs Blog original https://grafana.com/blog/2018/04/26/grafana-v5.1-released/

v5.1 Stable Release

The recent 5.0 major release contained a lot of new features so the Grafana 5.1 release is focused on smoothing out the rough edges and iterating over some of the new features.

Download Grafana 5.1 Now

Release Highlights

There are two new features included, Heatmap Support for Prometheus and a new core data source for Microsoft SQL Server.

Another highlight is the revamp of the Grafana docker container that makes it easier to run and control but be aware there is a breaking change to file permissions that will affect existing containers with data volumes.

We got tons of useful improvement suggestions, bug reports and Pull Requests from our amazing community. Thank you all! See the full changelog for more details.

Improved Scrolling Experience

In Grafana v5.0 we introduced a new scrollbar component. Unfortunately this introduced a lot of issues and in some scenarios removed
the native scrolling functionality. Grafana v5.1 ships with a native scrollbar for all pages together with a scrollbar component for
the dashboard grid and panels that does not override the native scrolling functionality. We hope that these changes and improvements should
make the Grafana user experience much better!

Improved Docker Image

Grafana v5.1 brings an improved official docker image which should make it easier to run and use the Grafana docker image and at the same time give more control to the user how to use/run it.

We have switched the id of the grafana user running Grafana inside a docker container. Unfortunately this means that files created prior to 5.1 will not have the correct permissions for later versions and thereby introduces a breaking change. We made this change so that it would be easier for you to control what user Grafana is executed as.

Please read the updated documentation which includes migration instructions and more information.

Heatmap Support for Prometheus

The Prometheus datasource now supports transforming Prometheus histograms to the heatmap panel. The Prometheus histogram is a powerful feature, and we’re
really happy to finally allow our users to render those as heatmaps. The Heatmap panel documentation
contains more information on how to use it.

Another improvement is that the Prometheus query editor now supports autocomplete for template variables. More information in the Prometheus data source documentation.

Microsoft SQL Server

Grafana v5.1 now ships with a built-in Microsoft SQL Server (MSSQL) data source plugin that allows you to query and visualize data from any
Microsoft SQL Server 2005 or newer, including Microsoft Azure SQL Database. Do you have metric or log data in MSSQL? You can now visualize
that data and define alert rules on it as with any of Grafana’s other core datasources.

The using Microsoft SQL Server in Grafana documentation has more detailed information on how to get started.

Adding New Panels to Dashboards

The control for adding new panels to dashboards now includes panel search and it is also now possible to copy and paste panels between dashboards.

By copying a panel in a dashboard it will be displayed in the Paste tab. When you switch to a new dashboard you can paste the
copied panel.

Align Zero-Line for Right and Left Y-axes

The feature request to align the zero-line for right and left Y-axes on the Graph panel is more than 3 years old. It has finally been implemented – more information in the Graph panel documentation.

Other Highlights

  • Table Panel: New enhancements includes support for mapping a numeric value/range to text and additional units. More information in the Table panel documentation.
  • New variable interpolation syntax: We now support a new option for rendering variables that gives the user full control of how the value(s) should be rendered. More details in the in the Variables documentation.
  • Improved workflow for provisioned dashboards. More details here.

Changelog

Checkout the CHANGELOG.md file for a complete list
of new features, changes, and bug fixes.

Continued: the answers to your questions for Eben Upton

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/eben-q-a-2/

Last week, we shared the first half of our Q&A with Raspberry Pi Trading CEO and Raspberry Pi creator Eben Upton. Today we follow up with all your other questions, including your expectations for a Raspberry Pi 4, Eben’s dream add-ons, and whether we really could go smaller than the Zero.

Live Q&A with Eben Upton, creator of the Raspberry Pi

Get your questions to us now using #AskRaspberryPi on Twitter

With internet security becoming more necessary, will there be automated versions of VPN on an SD card?

There are already third-party tools which turn your Raspberry Pi into a VPN endpoint. Would we do it ourselves? Like the power button, it’s one of those cases where there are a million things we could do and so it’s more efficient to let the community get on with it.

Just to give a counterexample, while we don’t generally invest in optimising for particular use cases, we did invest a bunch of money into optimising Kodi to run well on Raspberry Pi, because we found that very large numbers of people were using it. So, if we find that we get half a million people a year using a Raspberry Pi as a VPN endpoint, then we’ll probably invest money into optimising it and feature it on the website as we’ve done with Kodi. But I don’t think we’re there today.

Have you ever seen any Pis running and doing important jobs in the wild, and if so, how does it feel?

It’s amazing how often you see them driving displays, for example in radio and TV studios. Of course, it feels great. There’s something wonderful about the geographic spread as well. The Raspberry Pi desktop is quite distinctive, both in its previous incarnation with the grey background and logo, and the current one where we have Greg Annandale’s road picture.

The PIXEL desktop on Raspberry Pi

And so it’s funny when you see it in places. Somebody sent me a video of them teaching in a classroom in rural Pakistan and in the background was Greg’s picture.

Raspberry Pi 4!?!

There will be a Raspberry Pi 4, obviously. We get asked about it a lot. I’m sticking to the guidance that I gave people that they shouldn’t expect to see a Raspberry Pi 4 this year. To some extent, the opportunity to do the 3B+ was a surprise: we were surprised that we’ve been able to get 200MHz more clock speed, triple the wireless and wired throughput, and better thermals, and still stick to the $35 price point.

We’re up against the wall from a silicon perspective; we’re at the end of what you can do with the 40nm process. It’s not that you couldn’t clock the processor faster, or put a larger processor which can execute more instructions per clock in there, it’s simply about the energy consumption and the fact that you can’t dissipate the heat. So we’ve got to go to a smaller process node and that’s an order of magnitude more challenging from an engineering perspective. There’s more effort, more risk, more cost, and all of those things are challenging.

With 3B+ out of the way, we’re going to start looking at this now. For the first six months or so we’re going to be figuring out exactly what people want from a Raspberry Pi 4. We’re listening to people’s comments about what they’d like to see in a new Raspberry Pi, and I’m hoping by early autumn we should have an idea of what we want to put in it and a strategy for how we might achieve that.

Could you go smaller than the Zero?

The challenge with Zero as that we’re periphery-limited. If you run your hand around the unit, there is no edge of that board that doesn’t have something there. So the question is: “If you want to go smaller than Zero, what feature are you willing to throw out?”

It’s a single-sided board, so you could certainly halve the PCB area if you fold the circuitry and use both sides, though you’d have to lose something. You could give up some GPIO and go back to 26 pins like the first Raspberry Pi. You could give up the camera connector, you could go to micro HDMI from mini HDMI. You could remove the SD card and just do USB boot. I’m inventing a product live on air! But really, you could get down to two thirds and lose a bunch of GPIO – it’s hard to imagine you could get to half the size.

What’s the one feature that you wish you could outfit on the Raspberry Pi that isn’t cost effective at this time? Your dream feature.

Well, more memory. There are obviously technical reasons why we don’t have more memory on there, but there are also market reasons. People ask “why doesn’t the Raspberry Pi have more memory?”, and my response is typically “go and Google ‘DRAM price’”. We’re used to the price of memory going down. And currently, we’re going through a phase where this has turned around and memory is getting more expensive again.

Machine learning would be interesting. There are machine learning accelerators which would be interesting to put on a piece of hardware. But again, they are not going to be used by everyone, so according to our method of pricing what we might add to a board, machine learning gets treated like a $50 chip. But that would be lovely to do.

Which citizen science projects using the Pi have most caught your attention?

I like the wildlife camera projects. We live out in the countryside in a little village, and we’re conscious of being surrounded by nature but we don’t see a lot of it on a day-to-day basis. So I like the nature cam projects, though, to my everlasting shame, I haven’t set one up yet. There’s a range of them, from very professional products to people taking a Raspberry Pi and a camera and putting them in a plastic box. So those are good fun.

Raspberry Shake seismometer

The Raspberry Shake seismometer

And there’s Meteor Pi from the Cambridge Science Centre, that’s a lot of fun. And the seismometer Raspberry Shake – that sort of thing is really nice. We missed the recent South Wales earthquake; perhaps we should set one up at our Californian office.

How does it feel to go to bed every day knowing you’ve changed the world for the better in such a massive way?

What feels really good is that when we started this in 2006 nobody else was talking about it, but now we’re part of a very broad movement.

We were in a really bad way: we’d seen a collapse in the number of applicants applying to study Computer Science at Cambridge and elsewhere. In our view, this reflected a move away from seeing technology as ‘a thing you do’ to seeing it as a ‘thing that you have done to you’. It is problematic from the point of view of the economy, industry, and academia, but most importantly it damages the life prospects of individual children, particularly those from disadvantaged backgrounds. The great thing about STEM subjects is that you can’t fake being good at them. There are a lot of industries where your Dad can get you a job based on who he knows and then you can kind of muddle along. But if your dad gets you a job building bridges and you suck at it, after the first or second bridge falls down, then you probably aren’t going to be building bridges anymore. So access to STEM education can be a great driver of social mobility.

By the time we were launching the Raspberry Pi in 2012, there was this wonderful movement going on. Code Club, for example, and CoderDojo came along. Lots of different ways of trying to solve the same problem. What feels really, really good is that we’ve been able to do this as part of an enormous community. And some parts of that community became part of the Raspberry Pi Foundation – we merged with Code Club, we merged with CoderDojo, and we continue to work alongside a lot of these other organisations. So in the two seconds it takes me to fall asleep after my face hits the pillow, that’s what I think about.

We’re currently advertising a Programme Manager role in New Delhi, India. Did you ever think that Raspberry Pi would be advertising a role like this when you were bringing together the Foundation?

No, I didn’t.

But if you told me we were going to be hiring somewhere, India probably would have been top of my list because there’s a massive IT industry in India. When we think about our interaction with emerging markets, India, in a lot of ways, is the poster child for how we would like it to work. There have already been some wonderful deployments of Raspberry Pi, for example in Kerala, without our direct involvement. And we think we’ve got something that’s useful for the Indian market. We have a product, we have clubs, we have teacher training. And we have a body of experience in how to teach people, so we have a physical commercial product as well as a charitable offering that we think are a good fit.

It’s going to be massive.

What is your favourite BBC type-in listing?

There was a game called Codename: Druid. There is a famous game called Codename: Droid which was the sequel to Stryker’s Run, which was an awesome, awesome game. And there was a type-in game called Codename: Druid, which was at the bottom end of what you would consider a commercial game.

codename druid

And I remember typing that in. And what was really cool about it was that the next month, the guy who wrote it did another article that talks about the memory map and which operating system functions used which bits of memory. So if you weren’t going to do disc access, which bits of memory could you trample on and know the operating system would survive.

babbage versus bugs Raspberry Pi annual

See the full listing for Babbage versus Bugs in the Raspberry Pi 2018 Annual

I still like type-in listings. The Raspberry Pi 2018 Annual has a type-in listing that I wrote for a Babbage versus Bugs game. I will say that’s not the last type-in listing you will see from me in the next twelve months. And if you download the PDF, you could probably copy and paste it into your favourite text editor to save yourself some time.

The post Continued: the answers to your questions for Eben Upton appeared first on Raspberry Pi.