Tag Archives: programming

Wanted: Sales Engineer

Post Syndicated from Yev original https://www.backblaze.com/blog/wanted-sales-engineer/

At inception, Backblaze was a consumer company. Thousands upon thousands of individuals came to our website and gave us $5/mo to keep their data safe. But, we didn’t sell business solutions. It took us years before we had a sales team. In the last couple of years, we’ve released products that businesses of all sizes love: Backblaze B2 Cloud Storage and Backblaze for Business Computer Backup. Those businesses want to integrate Backblaze deeply into their infrastructure, so it’s time to hire our first Sales Engineer!

Company Description:
Founded in 2007, Backblaze started with a mission to make backup software elegant and provide complete peace of mind. Over the course of almost a decade, we have become a pioneer in robust, scalable, low-cost cloud backup. Recently, we launched B2 – robust and reliable object storage at just $0.005/GB/month. Part of our differentiation is being able to offer the lowest price of any of the big players while still being profitable.

We’ve managed to nurture a team-oriented culture with amazingly low turnover. We value our people and their families. Don’t forget to check out our “About Us” page to learn more about the people and some of our perks.

We have built a profitable, high growth business. While we love our investors, we have maintained control over the business. That means our corporate goals are simple – grow sustainably and profitably.

Some Backblaze Perks:

  • Competitive healthcare plans
  • Competitive compensation and 401k
  • All employees receive Option grants
  • Unlimited vacation days
  • Strong coffee
  • Fully stocked Micro kitchen
  • Catered breakfast and lunches
  • Awesome people who work on awesome projects
  • Childcare bonus
  • Normal work hours
  • Get to bring your pets into the office
  • San Mateo Office – located near Caltrain and Highways 101 & 280.

Backblaze B2 cloud storage is a building block for almost any computing service that requires storage. Customers need our help integrating B2 into everything from iOS apps to Docker containers. Some customers integrate directly with the API using the programming language of their choice; others want to solve a specific problem using ready-made software that is already integrated with B2.

At the same time, our computer backup product is deepening its integration into enterprise IT systems. We are commonly asked how to set Windows policies, integrate with Active Directory, and install the client via remote management tools.

We are looking for a sales engineer who can help our customers navigate the integration of Backblaze into their technical environments.

Are you 1/2” deep into many different technologies, and unafraid to dive deeper?

Can you confidently talk with customers about their technology, even if you have to look up all the acronyms right after the call?

Are you excited to set up complicated software in a lab and write knowledge base articles about your work?

Then Backblaze is the place for you!

Enough about Backblaze already, what’s in it for me?
In this role, you will be given the opportunity to learn about the technologies that drive innovation today: diverse technologies that customers are using day in and day out. More importantly, you’ll learn how to learn new technologies.

Just as an example, in the past 12 months, we’ve had the opportunity to learn and become experts in these diverse technologies:

  • Setting up VM servers for lab environments, both on-premises and using cloud services.
  • Creating an automatically “resetting” demo environment for the sales team.
  • Setting up Microsoft Domain Controllers with Active Directory and AD Federation Services.
  • Learning the basics of OAuth and web single sign-on (SSO).
  • Archiving video workflows from camera to media asset management systems.
  • Uploading and downloading files from JavaScript by enabling CORS.
  • Installing and monitoring online backup installations using RMM tools, like JAMF.
  • Tape (LTO) systems. (Yes – people still use tape for storage!)

How can I know if I’ll succeed in this role?

You have:

  • Confidence. Be able to ask customers questions about their environments and convey to them your technical acumen.
  • Curiosity. Always want to learn about customers’ situations, how they got there and what problems they are trying to solve.
  • Organization. You’ll work with customers, integration partners, and Backblaze team members on projects of various lengths. You can context switch and either have a great memory or keep copious notes. Your checklists have their own checklists.

You are versed in:

  • The fundamentals of Windows, Linux and Mac OS X operating systems. You shouldn’t be afraid to use a command line.
  • Building, installing, integrating and configuring applications on any operating system.
  • Debugging failures – reading logs, monitoring usage, and searching Google effectively to fix problems excites you.
  • The basics of TCP/IP networking and the HTTP protocol.
  • Novice development skills in any programming/scripting language, with a basic understanding of data structures and program flow.

Your background contains:

  • Bachelor’s degree in computer science or the equivalent.
  • 2+ years of experience as a pre- or post-sales engineer.

The right extra credit:
There are literally hundreds of previous experiences you could have had that would make you perfect for this job. Some experiences that we know would be helpful for us are below, but make sure you tell us your stories!

  • Experience using or programming against Amazon S3.
  • Experience with large on-prem storage – NAS, SAN, Object. And backing up data on such storage with tools like Veeam, Veritas and others.
  • Experience with photo or video media. Media archiving is a key market for Backblaze B2.
  • Programming Arduinos to automatically feed your dog.
  • Experience programming against web or REST APIs. (Point us towards your projects, if they are open source and available to link to.)
  • Experience with sales tools like Salesforce.
  • 3D print door stops.
  • Experience with Windows Servers, Active Directory, Group policies and the like.

What’s it like working with the Sales team?

    The Backblaze sales team collaborates. We help each other out by sharing ideas, templates, and our customers’ experiences. When we talk about our accomplishments, there is no “I did this,” only “we”. We are truly a team.

    We are honest with each other and our customers and communicate openly. We aim to have fun by embracing crazy ideas and creative solutions. We try to think not outside the box, but with no boxes at all. Customers are the driving force behind the success of the company and we care deeply about their success.

    If this all sounds like you:

    1. Send an email to [email protected] with the position in the subject line.
    2. Tell us a bit about your Sales Engineering experience.
    3. Include your resume.

    The post Wanted: Sales Engineer appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

    I am Beemo, a little living boy: Adventure Time prop build

    Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/adventure-time-bmo/

    Bob Herzberg, BMO builder and blogger at BYOBMO.com, fills us in on the whys and hows and even the Pen Wards of creating interactive Adventure Time BMO props with the Raspberry Pi.

    A Conversation With BMO

    A conversation with BMO showing off some voice recognition capabilities. There is no interaction driving BMO’s responses other than voice commands. There is a small microphone inside BMO (right behind the blue dot), and the voice commands are processed by the Google Voice API over WiFi.

    Finding BMO

    My first BMO began as a cosplay prop for my daughter. She and her friends are huge fans of Adventure Time and made their costumes for Princess Bubblegum, Marceline, and Finn. It was my job to come up with a BMO.


    Bob as Banana Guard, daughter Laura as Princess Bubblegum, and son Steven as Finn

    I wanted something electronic, and also interactive if possible. And it had to run on battery power. There was only one option that I found that would work: the Raspberry Pi.

    Building a living little boy

    BMO’s basic internals consist of the Raspberry Pi, an 8” HDMI monitor, and a USB battery pack. The body is made from laser-cut MDF wood, which I sanded, sealed, and painted. I added 3D-printed arms and legs along with some vinyl lettering to complete the look. There is also a small wireless keyboard that works as a remote control.

    Adventure Time BMO prop

    To make the front panel button function, I created a custom PCB, mounted laser-cut acrylic buttons on it, and connected it to the Pi’s IO header.


    Custom-made PCBs control BMO’s gaming buttons and USB input.

    The USB jack is extended with another custom PCB, which gives BMO USB ports on the front panel. His battery life is an impressive 8 hours of continuous use.

    The main brain game frame

    Most of BMO’s personality comes from custom animations that my daughter created and that were then turned into MP4 video files. The animations are triggered by the remote keyboard. Some versions of BMO have an internal microphone, and the Google Voice API is used to translate the user’s voice and map it to an appropriate response, so it’s possible to have a conversation with BMO.

    The final components of BMO

    The Raspberry Pi Camera Module was also put to use. Some BMOs have a servo that can pop up a camera, called GoMO, which takes pictures. Although some people mistake it for ghost detecting equipment, BMO just likes taking nice pictures.

    Who wants to play video games?

    Playing games on BMO is as simple as loading one of the emulators supported by Raspbian.

    BMO connected to SNES controllers

    I’m partial to the Atari 800 emulator, since I used to write games for that platform when I was just starting to learn programming. The front-panel USB ports are used for connecting gamepads, or his front-panel buttons and D-Pad can be used.

    Adventure time

    BMO has been a lot of fun to bring to conventions. He makes it to ComicCon San Diego each year and has been as far away as DragonCon in Atlanta, where he finally got to meet the voice of BMO, Niki Yang.


    BMO’s back panel, autographed by Niki Yang

    One day, I received an email from the producer of Adventure Time, Kelly Crews, with a very special request. Kelly was looking for a birthday present for the show’s creator, Pendleton Ward. It was either luck or coincidence that I was just finishing up the latest version of BMO. Niki Yang added some custom greetings just for Pen.

    BMO Wishes Pendleton Ward a Happy Birthday!

    Happy birthday to Pendleton Ward, the creator of, well, you know what. We were asked to build Pen his very own BMO and with help from Niki Yang and the Adventure Time crew here is the result.

    We added a few more items inside, including a 3D-printed heart, a medal, and a certificate, all of which come from the famous Be More episode that explains BMO’s origins.

    Back of Adventure Time BMO prop
    Adventure Time BMO prop

    BMO was quite a challenge to create. Fabricating the enclosure required several different techniques and materials. Fortunately, bringing him to life was quite simple once he had a Raspberry Pi inside!

    Find out more

    Be sure to follow Bob’s adventures with BMO at the Build Your Own BMO blog. And if you’ve built your own prop from television or film using a Raspberry Pi, be sure to share it with us in the comments below or on our social media channels.

     

    All images c/o Bob and Laura Herzberg

    The post I am Beemo, a little living boy: Adventure Time prop build appeared first on Raspberry Pi.

    Musician’s White Noise YouTube Video Hit With Copyright Complaints

    Post Syndicated from Andy original https://torrentfreak.com/musicians-white-noise-youtube-video-hit-with-copyright-complaints-180105/

    When people upload original content to YouTube, there should be no problem with getting paid for that content, should it attract enough interest from the public.

    Those who upload infringing content get a much less easy ride, with their uploads getting flagged for abuse, potentially putting their accounts at risk.

    That’s what’s happened to Australia-based music technologist Sebastian Tomczak, who uploaded a completely non-infringing work to YouTube and now faces five separate copyright complaints.

    “I teach and work in a music department at a University here in Australia. I’ve got a PhD in chiptune, and my main research interests are various intersections of music / sound / tech e.g. arduino programming and DIY stuff, modular synthesis, digital production, sound design for games, etc,” Tomczak informs TF.

    “I started blogging about music around a decade ago or so, mainly to write about stuff I was interested in, researching or doing. At the time this would have been physical interaction, music controller design, sound design and composition involving computers.”

    One of Tomczak’s videos was a masterpiece entitled “10 Hours of Low Level White Noise” which features – wait for it – ten hours of low-level white noise.

    “The white noise video was part of a number of videos I put online at the time. I was interested in listening to continuous sounds of various types, and how our perception of these kinds of sounds and our attention changes over longer periods – e.g. distracted, focused, sleeping, waking, working etc,” Tomczak says.

    White noise is the sound created when all different frequencies are combined together into a kind of audio mush that’s a little baffling and yet soothing in the right circumstances. Some people use it to fall asleep a little easier, others to distract their attention away from irritating sounds in the environment, like an aircon system or fan, for example.

    The white noise made by Tomczak and presented in his video was all his own work.

    “I ‘created’ and uploaded the video in question. The video was created by generating a noise waveform of 10 hours length using the freeware software Audacity and the built-in noise generator. The resulting 10-hour audio file was then imported into ScreenFlow, where the text was added and then rendered as one 10-hour video file,” he explains.

    This morning, however, Tomczak received a complaint from YouTube after a copyright holder claimed that it had the rights to his composition. When he checked his YouTube account, yet more complaints greeted him. In fact, since July 2015, when the video was first uploaded, a total of five copyright complaints had been filed against Tomczak’s composition.

    As seen from the image below, posted by Tomczak to his Twitter account, the five complaints came from four copyright holders, with one feeling the need to file two separate complaints while citing two different works.

    The complaints against Tomczak’s white noise

    One company involved – Catapult Distribution – says that Tomczak’s composition infringes on the copyrights of “White Noise Sleep Therapy”, a client selling the title “Majestic Ocean Waves”. It also manages to do the same for the company’s “Soothing Baby Sleep” title. The other complaints come from Merlin Symphonic Distribution and Dig Dis for similar works.

    Under normal circumstances, Tomczak’s account could have been disabled by YouTube for so many infringements, but in all cases the copyright holders chose to monetize the musician’s ‘infringement’ instead, via the site’s Content ID system. In other words, even though Tomczak created the video entirely through his own efforts, the copyright holders are now taking all the revenue. It’s a situation that Tomczak will now dispute with YouTube.

    “I’ve had quite a few copyright claims against me, usually based on cases where I’ve made long mixes of work, or longer pieces. Usually I don’t take them too seriously,” he explains.

    “In any of the cases where I think a given claim would be an issue, I would dispute it by saying I could either prove that I have made the work, have the original materials that generated the work, or could show enough of the components included in the work to prove originality. This has always been successful for me and I hope it will be in this case as well.”

    Sadly, this isn’t the only problem Tomczak’s had with YouTube’s copyright complaints system. A while back the musician was asked to take part in a video for his workplace but things didn’t go well.

    “I was asked to participate in a video for my workplace and the production team asked if they could use my music and I said ‘no problem’. A month later, the video was uploaded to one of our work channels, and then YouTube generated a copyright claim against me for my own music from the work channel,” he reveals.

    Tomczak says that to him, automated copyright claims are largely an annoyance, but that if he were making enough money from YouTube, the system would be detrimental in the long run. He feels it’s something that YouTube should adjust, to ensure that false claims aren’t filed against uploads like his.

    While he tries to sort out this mess with YouTube, there is some good news. Other videos of his including “10 Hours of a Perfect Fifth“, “The First 106 Fifths Derived from a 3/2 Ratio” and “Hour-Long Octave Shift” all remain copyright-complaint free.

    For now……

    Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

    Dish Network Files Two Lawsuits Against Pirate IPTV Providers

    Post Syndicated from Andy original https://torrentfreak.com/dish-network-files-two-lawsuits-against-pirate-iptv-providers-180103/

    In broad terms, there are two types of unauthorized online streaming of live TV. The first is via open-access websites where users can view for free. The second features premium services to which viewers are required to subscribe.

    Usually available for a few dollars, euros, or pounds per month, the latter are gaining traction all around the world. Service levels are relatively high and the majority of illicit packages offer a dazzling array of programming, often putting official providers in the shade.

    For this reason, commercial IPTV providers are considered a huge threat to broadcasters’ business models, since they offer a broadly comparable and accessible service at a much cheaper price. This is driving companies such as US giant Dish Network to court, seeking relief.

    Following on from a lawsuit filed last year against Kodi add-on ZemTV and TVAddons.ag, Dish has just filed two more lawsuits targeting a pair of unauthorized pirate IPTV services.

    Filed in Maryland and Texas respectively, the actions are broadly similar, with the former targeting a provider known as Spider-TV.

    The suit, filed against Dima Furniture Inc. and Mohammad Yusif (individually and collectively doing business as Spider-TV), claims that the defendants are “capturing broadcasts of television channels exclusively licensed to DISH and are unlawfully retransmitting these channels over the Internet to their customers throughout the United States, 24 hours per day, 7 days per week.”

    Dish claims that the defendants profit from the scheme by selling set-top boxes along with subscriptions, charging around $199 per device loaded with 13 months of service.

    Dima Furniture is a Maryland corporation, registered at Takoma Park, Maryland 20912, an address that is listed on the Spider-TV website. The connection between the defendants is further supported by FCC references which identify Spider devices in the market. Mohammad Yusif is claimed to be the president, executive director, general manager, and sole shareholder of Dima Furniture.

    Dish describes itself as the fourth largest pay-television provider in the United States, delivering copyrighted programming to millions of subscribers nationwide by means of satellite delivery and over-the-top services. Dish has acquired the rights to do this, the defendants have not, the broadcaster states.

    “Defendants capture live broadcast signals of the Protected Channels, transcode these signals into a format useful for streaming over the Internet, transfer the transcoded content to one or more servers provided, controlled, and maintained by Defendants, and then transmit the Protected Channels to users of the Service through OTT delivery, including users in the United States,” the lawsuit reads.

    It’s claimed that in July 2015, Yusif registered Spider-TV as a trade name of Dima Furniture with the Department of Assessments and Taxation Charter Division, describing the business as “Television Channel Installation”. Since then, the defendants have been illegally retransmitting Dish channels to customers in the United States.

    The overall offer from Spider-TV appears to be considerable, with a claimed 1,300 channels from major regions including the US, Canada, UK, Europe, Middle East, and Africa.

    Importantly, Dish states that the defendants know their activities are illegal, since the provider has sent at least 32 infringement notices since January 20, 2017 demanding an end to the unauthorized retransmission of its channels. It went on to send even more to the defendants’ ISPs.

    “DISH and Networks sent at least thirty-three additional notices requesting the removal of infringing content to Internet service providers associated with the Service from February 16, 2017 to the filing of this Complaint. Upon information and belief, at least some of these notices were forwarded to Defendants,” the lawsuit reads.

    But while Dish says that the takedowns responded to by the ISPs were initially successful, the defendants took evasive action by transmitting the targeted channels from other locations.

    Describing the defendants’ actions as “willful, malicious, intentional [and] purposeful”, Dish is suing for Direct Copyright Infringement, demanding a permanent injunction preventing the promotion and provision of the service plus statutory damages of $150,000 per registered work. The final amount isn’t specified but the numbers are potentially enormous. In addition, Dish demands attorneys’ fees, costs, and the seizure of all infringing articles.

    The second lawsuit, filed in Texas, is broadly similar. It targets Mo’ Ayad Al Zayed Trading Est., and Mo’ Ayad Fawzi Al Zayed (individually and collectively doing business as Tiger International Company), and Shenzhen Tiger Star Electronical Co., Ltd, otherwise known as Shenzhen Tiger Star.

    Dish claims that these defendants also illegally capture and retransmit channels to customers in the United States. IPTV boxes costing up to $179 including one year’s service are the method of delivery.

    In common with the Maryland case, Dish says it sent almost two dozen takedown notices to ISPs utilized by the defendants. These were also countered by the unauthorized service retransmitting Dish channels from other servers.

    The biggest difference between the Maryland and Texas cases is that while Yusif/Spider/Dima Furniture are said to be in the US, Zayed is said to reside in Amman, Jordan, and Tiger Star is registered in Shenzhen, China. However, since the unauthorized service is targeted at customers in Texas, Dish states that the Texas court has jurisdiction.

    Again, Dish is suing for Direct Infringement, demanding damages, costs, and a permanent injunction.

    The complaints can be found here and here.

    Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

    Supporting Conservancy Makes a Difference

    Post Syndicated from Bradley M. Kuhn original http://ebb.org/bkuhn/blog/2017/12/31/donate-conservancy.html

    Earlier this year, in
    February, I wrote a blog post encouraging people to donate
    to where I
    work, Software Freedom Conservancy. I’ve not otherwise blogged too much
    this year. It’s been a rough year for many reasons, and while I
    personally and Conservancy in general have accomplished some very
    important work this year, I’m reminded as always that more resources do
    make things easier.

    I understand the urge, given how bad the larger political crises have
    gotten, to want to give to charities other than those related to software
    freedom. There are important causes out there that have become more urgent
    this year. Here are three issues that have become shockingly more acute
    this year:

    • making sure the USA keeps its commitment to immigrants to allow them
      to make a new life here just like my own ancestors did,
    • assuring that the great national nature reserves are maintained and
      left pristine for generations to come,
    • assuring that we have zero tolerance for abusive behavior —
      particularly by those in power against people who come to them for help and
      job opportunities.

    These are just three of the many issues this year that I’ve seen get worse,
    not better. I am glad that I know and support people who work on these
    issues, and I urge everyone to work on these issues, too.

    Nevertheless, as I plan my primary donations this year, I’m again, as I
    always do, giving to the FSF and my own employer, Software Freedom
    Conservancy. The reason is simple: software freedom is still an essential
    cause and it is frankly one that most people don’t understand (yet). I
    wrote almost two years ago about the phenomenon I dubbed Kuhn’s Paradox.
    Simply put: it keeps getting more and more difficult to avoid proprietary
    software in a normal day’s tasks, even while the number of lines of code
    licensed freely gets larger every day.

    As long as that paradox remains true, I see software freedom as urgent. I
    know that we’re losing ground on so many other causes, too. But those of
    you who read my blog are some of the few people in the world that
    understand that software freedom is under threat and needs the urgent work
    that the very few software-freedom-related organizations,
    like the FSF and Software Freedom Conservancy, are doing. I hope you’ll
    donate now to both of them. For
    my part, I gave $120 myself to FSF as part of the monthly Associate
    Membership program, and in a few minutes, I’m going to give $400 to
    Conservancy. I’ll be frank: if you work in technology in an industrialized
    country, I’m quite sure you can afford that level of money, and I suspect
    those amounts are less than most of you spent on technology equipment
    and/or network connectivity charges this year. Make a difference for us
    and give to the cause of software freedom at least as much as you’re giving
    to large technology companies.

    Finally, a good reason to give to smaller charities like FSF and
    Conservancy is that your donation makes a bigger difference. I do think
    bigger organizations, such as (to pick an example of an organization I used
    to give to) my local NPR station, do important work. However, I was
    listening this week to my local NPR station, and they said their goal
    for that day was to raise $50,000. For Conservancy, that’s closer
    to a goal we have for entire fundraising season, which for this year was
    $75,000. The thing is: NPR is an important part of USA society, but it’s
    one that nearly everyone understands. So few people understand the threats
    looming from proprietary software, and they may not understand at all until
    it’s too late — when all their devices are locked down, DRM is
    fully ubiquitous, and no one is allowed to tinker with the software on
    their devices and learn the wonderful art of computer programming. We are
    at real risk of reaching that dystopia before 90% of the world’s
    population understands the threat!

    Thus, giving to organizations in the area of software freedom is just
    going to have a bigger and more immediate impact than more general causes
    that more easily connect with people. You’re giving to prevent a future
    that not everyone understands yet, and making an impact on our
    work to help explain the dangers to the larger population.

    Instrumenting Web Apps Using AWS X-Ray

    Post Syndicated from Bharath Kumar original https://aws.amazon.com/blogs/devops/instrumenting-web-apps-using-aws-x-ray/

    This post was written by James Bowman, Software Development Engineer, AWS X-Ray

    AWS X-Ray helps developers analyze and debug distributed applications and underlying services in production. You can identify and analyze root-causes of performance issues and errors, understand customer impact, and extract statistical aggregations (such as histograms) for optimization.

    In this blog post, I will provide a step-by-step walkthrough for enabling X-Ray tracing in the Go programming language. You can use these steps to add X-Ray tracing to any distributed application.

    Revel: A web framework for the Go language

    This section will assist you with designing a guestbook application. Skip to the “Integrating with AWS X-Ray” section below if you already have a Go language application.

    Revel is a web framework for the Go language. It facilitates the rapid development of web applications by providing a predefined framework for controllers, views, routes, filters, and more.

    To get started with Revel, run revel new github.com/jamesdbowman/guestbook. A project base is then copied to $GOPATH/src/github.com/jamesdbowman/guestbook.

    $ tree -L 2
    .
    ├── README.md
    ├── app
    │ ├── controllers
    │ ├── init.go
    │ ├── routes
    │ ├── tmp
    │ └── views
    ├── conf
    │ ├── app.conf
    │ └── routes
    ├── messages
    │ └── sample.en
    ├── public
    │ ├── css
    │ ├── fonts
    │ ├── img
    │ └── js
    └── tests
    └── apptest.go

    Writing a guestbook application

    A basic guestbook application can consist of just two routes: one to sign the guestbook and another to list all entries.
    Let’s set up these routes by adding a Book controller, which can be routed to by modifying ./conf/routes.

    ./app/controllers/book.go:
    package controllers
    
    import (
        "math/rand"
        "time"
    
        "github.com/aws/aws-sdk-go/aws"
        "github.com/aws/aws-sdk-go/aws/endpoints"
        "github.com/aws/aws-sdk-go/aws/session"
        "github.com/aws/aws-sdk-go/service/dynamodb"
        "github.com/aws/aws-sdk-go/service/dynamodb/dynamodbattribute"
        "github.com/revel/revel"
    )
    
    const TABLE_NAME = "guestbook"
    const SUCCESS = "Success.\n"
    const DAY = 86400
    
    var letters = []rune("ABCDEFGHIJKLMNOPQRSTUVWXYZ")
    
    func init() {
        rand.Seed(time.Now().UnixNano())
    }
    
    // randString returns a random string of len n, used for DynamoDB Hash key.
    func randString(n int) string {
        b := make([]rune, n)
        for i := range b {
            b[i] = letters[rand.Intn(len(letters))]
        }
        return string(b)
    }
    
    // Book controls interactions with the guestbook.
    type Book struct {
        *revel.Controller
        ddbClient *dynamodb.DynamoDB
    }
    
    // Signature represents a user's signature.
    type Signature struct {
        Message string
        Epoch   int64
        ID      string
    }
    
    // ddb returns the controller's DynamoDB client, instantiating a new client if necessary.
    func (c Book) ddb() *dynamodb.DynamoDB {
        if c.ddbClient == nil {
            sess := session.Must(session.NewSession(&aws.Config{
                Region: aws.String(endpoints.UsWest2RegionID),
            }))
            c.ddbClient = dynamodb.New(sess)
        }
        return c.ddbClient
    }
    
    // Sign allows users to sign the book.
    // The message is to be passed as application/json typed content, listed under the "message" top level key.
    func (c Book) Sign() revel.Result {
        var s Signature
    
        err := c.Params.BindJSON(&s)
        if err != nil {
            return c.RenderError(err)
        }
        now := time.Now()
        s.Epoch = now.Unix()
        s.ID = randString(20)
    
        item, err := dynamodbattribute.MarshalMap(s)
        if err != nil {
            return c.RenderError(err)
        }
    
        putItemInput := &dynamodb.PutItemInput{
            TableName: aws.String(TABLE_NAME),
            Item:      item,
        }
        _, err = c.ddb().PutItem(putItemInput)
        if err != nil {
            return c.RenderError(err)
        }
    
        return c.RenderText(SUCCESS)
    }
    
    // List allows users to list all signatures in the book.
    func (c Book) List() revel.Result {
        scanInput := &dynamodb.ScanInput{
            TableName: aws.String(TABLE_NAME),
            Limit:     aws.Int64(100),
        }
        res, err := c.ddb().Scan(scanInput)
        if err != nil {
            return c.RenderError(err)
        }
    
        messages := make([]string, 0)
        for _, v := range res.Items {
            messages = append(messages, *(v["Message"].S))
        }
        return c.RenderJSON(messages)
    }
    

    ./conf/routes:
    POST /sign Book.Sign
    GET /list Book.List

    Creating the resources and testing

    For the purposes of this blog post, the application will be run and tested locally. We will store and retrieve messages from an Amazon DynamoDB table. Use the following AWS CLI command to create the guestbook table:

    aws dynamodb create-table --region us-west-2 --table-name "guestbook" --attribute-definitions AttributeName=ID,AttributeType=S AttributeName=Epoch,AttributeType=N --key-schema AttributeName=ID,KeyType=HASH AttributeName=Epoch,KeyType=RANGE --provisioned-throughput ReadCapacityUnits=5,WriteCapacityUnits=5

    Now, let’s test our sign and list routes. If everything is working correctly, the following result appears:

    $ curl -d '{"message":"Hello from cURL!"}' -H "Content-Type: application/json" http://localhost:9000/book/sign
    Success.
    $ curl http://localhost:9000/book/list
    [
      "Hello from cURL!"
    ]%
    

    Integrating with AWS X-Ray

    Download and run the AWS X-Ray daemon

    The AWS SDKs emit trace segments over UDP on port 2000. (This port can be configured.) In order for the trace segments to make it to the X-Ray service, the daemon must listen on this port and batch the segments in calls to the PutTraceSegments API.
    For information about downloading and running the X-Ray daemon, see the AWS X-Ray Developer Guide.
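
    The SDK sends segments to 127.0.0.1:2000 by default, so no extra configuration is needed when the daemon runs locally with its default settings. As a minimal sketch (an assumption of this example is that the daemon is reachable at the address shown; the SDK also honors the AWS_XRAY_DAEMON_ADDRESS environment variable), you can point the SDK at a specific daemon address explicitly:

    import "github.com/aws/aws-xray-sdk-go/xray"
    
    func init() {
        // Assumption: the daemon is listening on this local UDP address.
        // Change the value if you run the daemon on another host or port.
        xray.Configure(xray.Config{DaemonAddr: "127.0.0.1:2000"})
    }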

    Installing the AWS X-Ray SDK for Go

    To download the SDK from GitHub, run go get -u github.com/aws/aws-xray-sdk-go/... (the trailing /... fetches all subpackages). The SDK will appear in the $GOPATH.

    Enabling the incoming request filter

    The first step to instrumenting an application with AWS X-Ray is to enable the generation of trace segments on incoming requests. The SDK conveniently provides an implementation of http.Handler which does exactly that. To ensure incoming web requests travel through this handler, we can modify app/init.go, adding a custom function to be run on application start.

    import (
        "github.com/aws/aws-xray-sdk-go/xray"
        "github.com/revel/revel"
    )
    
    ...
    
    func init() {
      ...
        revel.OnAppStart(installXRayHandler)
    }
    
    func installXRayHandler() {
        revel.Server.Handler = xray.Handler(xray.NewFixedSegmentNamer("GuestbookApp"), revel.Server.Handler)
    }
    

    The application will now emit a segment for each incoming web request. The service graph appears:

    You can customize the name of the segment to make it more descriptive by providing an alternate implementation of SegmentNamer to xray.Handler. For example, you can use xray.NewDynamicSegmentNamer(fallback, pattern) in place of the fixed namer. This namer will use the host name from the incoming web request (if it matches pattern) as the segment name. This is often useful when you are trying to separate different instances of the same application.
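
    As a sketch, the installXRayHandler function shown earlier could swap in the dynamic namer like this (the "*.example.com" pattern is only an illustrative placeholder, not something from the guestbook application):

    func installXRayHandler() {
        // Name segments after the request's host when it matches the pattern,
        // falling back to "GuestbookApp" otherwise.
        namer := xray.NewDynamicSegmentNamer("GuestbookApp", "*.example.com")
        revel.Server.Handler = xray.Handler(namer, revel.Server.Handler)
    }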

    In addition, HTTP-centric information such as method and URL is collected in the segment’s http subsection:

    "http": {
        "request": {
            "url": "/book/list",
            "method": "GET",
            "user_agent": "curl/7.54.0",
            "client_ip": "::1"
        },
        "response": {
            "status": 200
        }
    },
    

    Instrumenting outbound calls

    To provide detailed performance metrics for distributed applications, the AWS X-Ray SDK needs to measure the time it takes to make outbound requests. Trace context is passed to downstream services using the X-Amzn-Trace-Id header. To draw a detailed and accurate representation of a distributed application, outbound call instrumentation is required.

    AWS SDK calls

    The AWS X-Ray SDK for Go provides a one-line AWS client wrapper that enables the collection of detailed per-call metrics for any AWS client. We can modify the DynamoDB client instantiation to include this line:

    // ddb returns the controller's DynamoDB client, instantiating a new client if necessary.
    func (c Book) ddb() *dynamodb.DynamoDB {
        if c.ddbClient == nil {
            sess := session.Must(session.NewSession(&aws.Config{
                Region: aws.String(endpoints.UsWest2RegionID),
            }))
            c.ddbClient = dynamodb.New(sess)
            xray.AWS(c.ddbClient.Client) // add subsegment-generating X-Ray handlers to this client
        }
        return c.ddbClient
    }
    

    We also need to ensure that the segment generated by our xray.Handler is passed to these AWS calls so that the X-Ray SDK knows to which segment these generated subsegments belong. In Go, the context.Context object is passed throughout the call path to achieve this goal. (In most other languages, some variant of ThreadLocal is used.) AWS clients provide a *WithContext method variant for each AWS operation, which we need to switch to:

    _, err = c.ddb().PutItemWithContext(c.Request.Context(), putItemInput)
        res, err := c.ddb().ScanWithContext(c.Request.Context(), scanInput)
    

    We now see much more detail in the Timeline view of the trace for the sign and list operations:

    We can use this detail to help diagnose throttling on our DynamoDB table. In the following screenshot, the purple in the DynamoDB service graph node indicates that our table is underprovisioned. The red in the GuestbookApp node indicates that the application is throwing faults due to this throttling.

    HTTP calls

    Although the guestbook application does not make any non-AWS outbound HTTP calls in its current state, there is a similar one-liner to wrap HTTP clients that make outbound requests. xray.Client(c *http.Client) wraps an existing http.Client (or nil if you want to use a default HTTP client). For example:

    resp, err := ctxhttp.Get(ctx, xray.Client(nil), "https://aws.amazon.com/")
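
    As a further sketch under the same assumptions (the segment-bearing context comes from the xray.Handler-instrumented request, and the URL, timeout, and helper name are arbitrary examples), the wrapper also works with an http.Client you configure yourself:

    import (
        "context"
        "net/http"
        "time"
    
        "github.com/aws/aws-xray-sdk-go/xray"
    )
    
    // fetchHomepage is a hypothetical helper; ctx should carry the segment opened by
    // xray.Handler (for example, c.Request.Context() inside a Revel controller).
    func fetchHomepage(ctx context.Context) (*http.Response, error) {
        client := xray.Client(&http.Client{Timeout: 10 * time.Second})
        req, err := http.NewRequest("GET", "https://aws.amazon.com/", nil)
        if err != nil {
            return nil, err
        }
        return client.Do(req.WithContext(ctx))
    }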

    Instrumenting local operations

    X-Ray can also assist in measuring the performance of local compute operations. To see this in action, let’s create a custom subsegment inside the randString method:

    
    // randString returns a random string of len n, used for DynamoDB Hash key.
    // It now takes a context so the subsegment can be attached to the current trace
    // (this requires importing "context" in book.go).
    func randString(ctx context.Context, n int) string {
        var s string // declared outside the closure so the result is visible to the return below
        xray.Capture(ctx, "randString", func(innerCtx context.Context) error {
            b := make([]rune, n)
            for i := range b {
                b[i] = letters[rand.Intn(len(letters))]
            }
            s = string(b)
            return nil // xray.Capture expects the closure to return an error
        })
        return s
    }
    
    // we'll also need to change the callsite
    
    s.ID = randString(c.Request.Context(), 20)
    

    Summary

    By now, you are an expert on how to instrument X-Ray for your Go applications. Instrumenting X-Ray with your applications is an easy way to analyze and debug performance issues and understand customer impact. Please feel free to give any feedback or comments below.

    For more information about advanced configuration of the AWS X-Ray SDK for Go, see the AWS X-Ray SDK for Go in the AWS X-Ray Developer Guide and the aws/aws-xray-sdk-go GitHub repository.

    For more information about some of the advanced X-Ray features such as histograms, annotations, and filter expressions, see the Analyzing Performance for Amazon Rekognition Apps Written on AWS Lambda Using AWS X-Ray blog post.

    Amazon Linux 2 – Modern, Stable, and Enterprise-Friendly

    Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/amazon-linux-2-modern-stable-and-enterprise-friendly/

    I’m getting ready to wrap up my work for the year, cleaning up my inbox and catching up on a few recent AWS launches that happened at and shortly after AWS re:Invent.

    Last week we launched Amazon Linux 2. This is a modern version of Linux, designed to meet the security, stability, and productivity needs of enterprise environments while giving you timely access to new tools and features. It also includes all of the things that made the Amazon Linux AMI popular, including AWS integration, cloud-init, a secure default configuration, regular security updates, and AWS Support. From that base, we have added many new features including:

    Long-Term Support – You can use Amazon Linux 2 in situations where you want to stick with a single major version of Linux for an extended period of time, perhaps to avoid re-qualifying your applications too frequently. This build (2017.12) is a candidate for LTS status; the final determination will be made based on feedback in the Amazon Linux Discussion Forum. Long-term support for the Amazon Linux 2 LTS build will include security updates, bug fixes, user-space Application Binary Interface (ABI), and user-space Application Programming Interface (API) compatibility for 5 years.

    Extras Library – You can now get fast access to fresh, new functionality while keeping your base OS image stable and lightweight. The Amazon Linux Extras Library eliminates the age-old tradeoff between OS stability and access to fresh software. It contains open source databases, languages, and more, each packaged together with any needed dependencies.

    Tuned Kernel – You have access to the latest 4.9 LTS kernel, with support for the latest EC2 features and tuned to run efficiently in AWS and other virtualized environments.

    Systemd – Amazon Linux 2 includes the systemd init system, designed to provide better boot performance and increased control over individual services and groups of interdependent services. For example, you can indicate that Service B must be started only after Service A is fully started, or that Service C should start on a change in network connection status.

    Wide Availability – Amazon Linux 2 is available in all AWS Regions in AMI and Docker image form. Virtual machine images for Hyper-V, KVM, VirtualBox, and VMware are also available. You can build and test your applications on your laptop or in your own data center and then deploy them to AWS.

    Launching an Instance
    You can launch an instance in all of the usual ways – AWS Management Console, AWS Command Line Interface (CLI), AWS Tools for Windows PowerShell, RunInstances, and via an AWS CloudFormation template. I’ll use the Console:

    I’m interested in the Extras Library; here’s how I see which topics (lists of packages) are available:

    As you can see, the library includes languages, editors, and web tools that receive frequent updates. Each topic contains all of the dependencies that are needed to install the package on Amazon Linux 2. For example, the Rust topic includes the cmake build system for Rust, cargo for Rust package maintenance, and the LLVM-based compiler toolchain for Rust.

    Here’s how I install a topic (Emacs 25.3):

    SNS Updates
    Many AWS customers use the Amazon Linux AMIs as a starting point for their own AMIs. If you do this and would like to kick off your build process whenever a new AMI is released, you can subscribe to an SNS topic:

    You can be notified by email, invoke an AWS Lambda function, and so forth.

    Available Now
    Amazon Linux 2 is available now and you can start using it in the cloud and on-premises today! To learn more, read the Amazon Linux 2 LTS Candidate (2017.12) Release Notes.

    Jeff;

     

    MQTT 5: Is it time to upgrade to MQTT 5 yet?

    Post Syndicated from The HiveMQ Team original https://www.hivemq.com/blog/mqtt-5-time-to-upgrade-yet/


    Is it time to upgrade to MQTT 5 yet?

    Welcome to this week’s blog post! After last week’s Introduction to MQTT 5, many readers wondered when the successor to MQTT 3.1.1 will be ready for prime time and can be used in future and existing projects.

    Before we try to answer the question in more detail, we’d love to hear your thoughts about upgrading to MQTT 5. We have prepared a small survey below. Let us know what your MQTT 5 upgrade plans are!

    The MQTT 5 OASIS Standard

    As of late December 2017, the MQTT 5 specification is not yet available as an official “Committee Specification”. In other words: MQTT 5 has not been officially released yet. The foundation for every implementation is that the Technical Committee at OASIS officially releases the standard.

    The good news: Although no official version of the standard is available yet, fundamental changes to the current state of the specification are not expected.
    The Public Review phase of the “Committee Specification Draft 2” finished without any major comments or issues. We at HiveMQ expect the MQTT 5 standard to be released in very late December 2017 or January 2018.

    Current state of client libraries

    To start using MQTT 5, you need two participants: An MQTT 5 client library implementation in your programming language(s) of choice and an MQTT 5 broker implementation (like HiveMQ). If both components support the new standard, you are good to go and can use the new version in your projects.

    When it comes to MQTT libraries, Eclipse Paho is the one-stop shop for MQTT clients in most programming languages. A recent Paho mailing list entry stated that Paho plans to release MQTT 5 client libraries by the end of June 2018 for the following programming languages:

    • C (+ embedded C)
    • Java
    • Go
    • C++

    If you’re feeling adventurous, at least the Java Paho client already has preliminary MQTT 5 support available. You can play around with the API and get a feel for the upcoming Paho version. Just build the library from source and test it, but be aware that this is not safe for production use.

    There is also a very basic test broker implementation available at Eclipse Paho which can be used for playing around. This is of course only for very basic tests and does not support all MQTT 5 features yet. If you’re planning to write your own library, this may be a good tool to test your implementation against.

    There are other individual MQTT library projects that may be worth checking out. As of December 2017, most of these libraries have not yet published an MQTT 5 roadmap.

    HiveMQ and MQTT 5

    Having a client that is ready for MQTT 5 is not enough to use the new version of the protocol. The counterpart, the MQTT broker, also needs to fully support the new protocol version. At the time of writing, no broker is MQTT 5 ready yet.

    HiveMQ was the first broker to fully support version 3.1.1 of MQTT, and of course here at HiveMQ we are committed to giving our customers the advantage of the new features of version 5 of the protocol as soon as possible and viable.

    We are going to provide an Early Access version of the upcoming HiveMQ generation with MQTT 5 support by May/June 2018. If you’re a library developer or want to go live with the new protocol version as soon as possible: The Early Access version is for you. Add yourself to the Early Access Notification List and we’ll notify you when the Early Access version is available.

    We expect to release the upcoming HiveMQ generation in the third quarter of 2018 with full support of ALL MQTT 5 features at scale in an interoperable way with previous MQTT versions.

    When is MQTT 5 ready for prime time?

    MQTT is typically used in mission critical environments where it’s not acceptable that parts of the infrastructure, broker or client, are unreliable or have some rough edges. So it’s typically not advisable to be the very first to try out new things in a critical production environment.

    Here at HiveMQ we expect that the first users will go live to production in late Q3 (September) 2018 and in the subsequent months. After the releases of the Paho library in June and the HiveMQ Early Access version, the adoption of MQTT 5 is expected to increase rapidly.

    So, is MQTT 5 ready for prime time yet (as of December 2017)? No.

    Will the new version of the protocol be suitable for production environments in the second half of 2018: Yes, definitely.

    Upcoming topics in this series

    We will continue this blog post series in January after the European Christmas Holidays. To kick off the technical part of the series, we will take a look at the foundational changes of the MQTT protocol. After that, we will release one blog post per week that thoroughly reviews and inspects one new feature in detail, together with best practices and fun trivia.

    If you want us to send the next and all upcoming articles directly into your inbox, just use the newsletter subscribe form below.

    P.S. Don’t forget to let us know if MQTT 5 is of interest for you by participating in this quick poll.

    Have an awesome week,
    The HiveMQ Team

    How to Easily Apply Amazon Cloud Directory Schema Changes with In-Place Schema Upgrades

    Post Syndicated from Mahendra Chheda original https://aws.amazon.com/blogs/security/how-to-easily-apply-amazon-cloud-directory-schema-changes-with-in-place-schema-upgrades/

    Now, Amazon Cloud Directory makes it easier for you to apply schema changes across your directories with in-place schema upgrades. Your directory now remains available while Cloud Directory applies backward-compatible schema changes such as the addition of new fields. Without migrating data between directories or applying code changes to your applications, you can upgrade your schemas. You also can view the history of your schema changes in Cloud Directory by using version identifiers, which help you track and audit schema versions across directories. If you have multiple instances of a directory with the same schema, you can view the version history of schema changes to manage your directory fleet and ensure that all directories are running with the same schema version.

    In this blog post, I demonstrate how to perform an in-place schema upgrade and use schema versions in Cloud Directory. I add additional attributes to an existing facet and add a new facet to a schema. I then publish the new schema and apply it to running directories, upgrading the schema in place. I also show how to view the version history of a directory schema, which helps me to ensure my directory fleet is running the same version of the schema and has the correct history of schema changes applied to it.

    Note: I share Java code examples in this post. I assume that you are familiar with the AWS SDK and can use Java-based code to build a Cloud Directory code example. You can apply the concepts I cover in this post to other programming languages such as Python and Ruby.

    Cloud Directory fundamentals

    I will start by covering a few Cloud Directory fundamentals. If you are already familiar with the concepts behind Cloud Directory facets, schemas, and schema lifecycles, you can skip to the next section.

    Facets: Groups of attributes. You use facets to define object types. For example, you can define a device schema by adding facets such as computers, phones, and tablets. A computer facet can track attributes such as serial number, make, and model. You can then use the facets to create computer objects, phone objects, and tablet objects in the directory to which the schema applies.

    Schemas: Collections of facets. Schemas define which types of objects can be created in a directory (such as users, devices, and organizations) and enforce validation of data for each object class. All data within a directory must conform to the applied schema. As a result, the schema definition is essentially a blueprint to construct a directory with an applied schema.

    Schema lifecycle: The four distinct states of a schema: Development, Published, Applied, and Deleted. Schemas in the Published and Applied states have version identifiers and cannot be changed. Schemas in the Applied state are used by directories for validation as applications insert or update data. You can change schemas in the Development state as many times as you need them to. In-place schema upgrades allow you to apply schema changes to an existing Applied schema in a production directory without the need to export and import the data populated in the directory.

    How to add attributes to a computer inventory application schema and perform an in-place schema upgrade

    To demonstrate how to set up schema versioning and perform an in-place schema upgrade, I will use an example of a computer inventory application that uses Cloud Directory to store relationship data. Let’s say that at my company, AnyCompany, we use this computer inventory application to track all computers we give to our employees for work use. I previously created a ComputerSchema and assigned its version identifier as 1. This schema contains one facet called ComputerInfo that includes attributes for SerialNumber, Make, and Model, as shown in the following schema details.

    Schema: ComputerSchema
    Version: 1
    
    Facet: ComputerInfo
    Attribute: SerialNumber, type: Integer
    Attribute: Make, type: String
    Attribute: Model, type: String

    AnyCompany has offices in Seattle, Portland, and San Francisco. I have deployed the computer inventory application for each of these three locations. As shown in the lower left part of the following diagram, ComputerSchema is in the Published state with a version of 1. The Published schema is applied to SeattleDirectory, PortlandDirectory, and SanFranciscoDirectory for AnyCompany’s three locations. Implementing separate directories for different geographic locations when you don’t have any queries that cross location boundaries is a good data partitioning strategy and gives your application better response times with lower latency.

    Diagram of ComputerSchema in Published state and applied to three directories

    Legend for the diagrams in this post

    The following code example creates the schema in the Development state by using a JSON file, publishes the schema, and then creates directories for the Seattle, Portland, and San Francisco locations. For this example, I assume the schema has been defined in the JSON file. The createSchema API creates a schema Amazon Resource Name (ARN) with the name defined in the variable, SCHEMA_NAME. I can use the putSchemaFromJson API to add specific schema definitions from the JSON file.

    // The utility method to get valid Cloud Directory schema JSON
    String validJson = getJsonFile("ComputerSchema_version_1.json")
    
    String SCHEMA_NAME = "ComputerSchema";
    
    String developmentSchemaArn = client.createSchema(new CreateSchemaRequest()
            .withName(SCHEMA_NAME))
            .getSchemaArn();
    
    // Put the schema document in the Development schema
    PutSchemaFromJsonResult result = client.putSchemaFromJson(new PutSchemaFromJsonRequest()
            .withSchemaArn(developmentSchemaArn)
            .withDocument(validJson));
    

    The following code example takes the schema that is currently in the Development state and publishes the schema, changing its state to Published.

    String SCHEMA_VERSION = "1";
    String publishedSchemaArn = client.publishSchema(
            new PublishSchemaRequest()
            .withDevelopmentSchemaArn(developmentSchemaArn)
            .withVersion(SCHEMA_VERSION))
            .getPublishedSchemaArn();
    
    // Our Published schema ARN is as follows
    // arn:aws:clouddirectory:us-west-2:XXXXXXXXXXXX:schema/published/ComputerSchema/1

    The following code example creates a directory named SeattleDirectory and applies the published schema. The createDirectory API call creates a directory by using the published schema provided in the API parameters. Note that Cloud Directory stores a version of the schema in the directory in the Applied state. I will use similar code to create directories for PortlandDirectory and SanFranciscoDirectory.

    String DIRECTORY_NAME = "SeattleDirectory"; 
    
    CreateDirectoryResult directory = client.createDirectory(
            new CreateDirectoryRequest()
            .withName(DIRECTORY_NAME)
            .withSchemaArn(publishedSchemaArn));
    
    String directoryArn = directory.getDirectoryArn();
    String appliedSchemaArn = directory.getAppliedSchemaArn();
    
    // This code section can be reused to create directories for Portland and San Francisco locations with the appropriate directory names
    
    // Our directory ARN is as follows 
    // arn:aws:clouddirectory:us-west-2:XXXXXXXXXXXX:directory/XX_DIRECTORY_GUID_XX
    
    // Our applied schema ARN is as follows 
    // arn:aws:clouddirectory:us-west-2:XXXXXXXXXXXX:directory/XX_DIRECTORY_GUID_XX/schema/ComputerSchema/1
    

    Revising a schema

    Now let’s say my company, AnyCompany, wants to add more information for computers and to track which employees have been assigned a computer for work use. I modify the schema to add two attributes to the ComputerInfo facet: Description and OSVersion (operating system version). I make Description optional because it is not important for me to track this attribute for the computer objects I create. I make OSVersion mandatory because it is critical for me to track it for all computer objects so that I can make changes such as applying security patches or making upgrades. Because I make OSVersion mandatory, I must provide a default value that Cloud Directory will apply to objects that were created before the schema revision, in order to handle backward compatibility. Note that you can replace the value in any object with a different value.

    I also add a new facet to track computer assignment information, shown in the following updated schema as the ComputerAssignment facet. This facet tracks these additional attributes: Name (the name of the person to whom the computer is assigned), EMail (the email address of the assignee), Department, and CostCenter (the department's cost center). Note that Cloud Directory refers to the previously available version identifier as the Major Version. Because I can now add a minor version to a schema, I also denote the changed schema as Minor Version A.

    Schema: ComputerSchema
    Major Version: 1
    Minor Version: A 
    
    Facet: ComputerInfo
    Attribute: SerialNumber, type: Integer 
    Attribute: Make, type: String
    Attribute: Model, type: String
    Attribute: Description, type: String, required: NOT_REQUIRED
    Attribute: OSVersion, type: String, required: REQUIRED_ALWAYS, default: "Windows 7"
    
    Facet: ComputerAssignment
    Attribute: Name, type: String
    Attribute: EMail, type: String
    Attribute: Department, type: String
    Attribute: CostCenter, type: Integer

    The following diagram shows the changes that were made when I added another facet to the schema and attributes to the existing facet. The highlighted area of the diagram (bottom left) shows that the schema changes were published.

    Diagram showing that schema changes were published

    The following code example revises the existing Development schema by adding the new attributes to the ComputerInfo facet and by adding the ComputerAssignment facet. I use a new JSON file for the schema revision, and for the purposes of this example, I am assuming the JSON file has the full schema including planned revisions.

    // The utility method to get a valid CloudDirectory schema JSON
    String schemaJson = getJsonFile("ComputerSchema_version_1_A.json");
    
    // Put the schema document in the Development schema
    PutSchemaFromJsonResult result = client.putSchemaFromJson(
            new PutSchemaFromJsonRequest()
            .withSchemaArn(developmentSchemaArn)
            .withDocument(schemaJson));

    Upgrading the Published schema

    The following code example performs an in-place schema upgrade of the Published schema with schema revisions (it adds new attributes to the existing facet and another facet to the schema). The upgradePublishedSchema API upgrades the Published schema with backward-compatible changes from the Development schema.

    // From an earlier code example, I know the publishedSchemaArn has this value: "arn:aws:clouddirectory:us-west-2:XXXXXXXXXXXX:schema/published/ComputerSchema/1"
    
    // Upgrade publishedSchemaArn to minorVersion A. The Development schema must be backward compatible with 
    // the existing publishedSchemaArn. 
    
    String minorVersion = "A";
    
    UpgradePublishedSchemaResult upgradePublishedSchemaResult = client.upgradePublishedSchema(new UpgradePublishedSchemaRequest()
            .withDevelopmentSchemaArn(developmentSchemaArn)
            .withPublishedSchemaArn(publishedSchemaArn)
            .withMinorVersion(minorVersion));
    
    String upgradedPublishedSchemaArn = upgradePublishedSchemaResult.getUpgradedSchemaArn();
    
    // The Published schema ARN after the upgrade shows a minor version as follows 
    // arn:aws:clouddirectory:us-west-2:XXXXXXXXXXXX:schema/published/ComputerSchema/1/A

    Upgrading the Applied schema

    The following diagram shows the in-place schema upgrade for the SeattleDirectory directory. I am performing the schema upgrade so that I can reflect the new schemas in all three directories. As a reminder, I added new attributes to the ComputerInfo facet and also added the ComputerAssignment facet. After the schema and directory upgrade, I can create objects for the ComputerInfo and ComputerAssignment facets in the SeattleDirectory. Any objects that were created with the old facet definition for ComputerInfo will now use the default values for any additional attributes defined in the new schema.

    Diagram of the in-place schema upgrade for the SeattleDirectory directory

    I use the following code example to perform an in-place upgrade of the SeattleDirectory to a Major Version of 1 and a Minor Version of A. Note that you should change a Major Version identifier in a schema to make backward-incompatible changes such as changing the data type of an existing attribute or dropping a mandatory attribute from your schema. Backward-incompatible changes require directory data migration from a previous version to the new version. You should change a Minor Version identifier in a schema to make backward-compatible upgrades such as adding additional attributes or adding facets, which in turn may contain one or more attributes. The upgradeAppliedSchema API lets me upgrade an existing directory with a different version of a schema.

    // This upgrades ComputerSchema version 1 of the Applied schema in SeattleDirectory to Major Version 1 and Minor Version A
    // The schema must be backward compatible or the API will fail with IncompatibleSchemaException
    
    UpgradeAppliedSchemaResult upgradeAppliedSchemaResult = client.upgradeAppliedSchema(new UpgradeAppliedSchemaRequest()
            .withDirectoryArn(directoryArn)
            .withPublishedSchemaArn(upgradedPublishedSchemaArn));
    
    String upgradedAppliedSchemaArn = upgradeAppliedSchemaResult.getUpgradedSchemaArn();
    
    // The Applied schema ARN after the in-place schema upgrade will appear as follows
    // arn:aws:clouddirectory:us-west-2:XXXXXXXXXXXX:directory/XX_DIRECTORY_GUID_XX/schema/ComputerSchema/1
    
    // This code section can be reused to upgrade directories for the Portland and San Francisco locations with the appropriate directory ARN

    Note: Cloud Directory has excluded returning the Minor Version identifier in the Applied schema ARN for backward compatibility and to enable the application to work across older and newer versions of the directory.

    The following diagram shows the changes that are made when I perform an in-place schema upgrade in the two remaining directories, PortlandDirectory and SanFranciscoDirectory. I make these calls sequentially, upgrading PortlandDirectory first and then upgrading SanFranciscoDirectory. I use the same code example that I used earlier to upgrade SeattleDirectory. Now, all my directories are running the most current version of the schema. Also, I made these schema changes without having to migrate data and while maintaining my application’s high availability.

    Diagram showing the changes that are made with an in-place schema upgrade in the two remaining directories

    Schema revision history

    I can now view the schema revision history for any of AnyCompany’s directories by using the listAppliedSchemaArns API. Cloud Directory maintains the five most recent versions of applied schema changes. Similarly, to inspect the current Minor Version that was applied to my schema, I use the getAppliedSchemaVersion API. The listAppliedSchemaArns API returns the schema ARNs based on my schema filter as defined in withSchemaArn.

    I use the following code example to query an Applied schema for its version history.

    // This returns the five most recent Minor Versions associated with a Major Version
    ListAppliedSchemaArnsResult listAppliedSchemaArnsResult = client.listAppliedSchemaArns(new ListAppliedSchemaArnsRequest()
            .withDirectoryArn(directoryArn)
            .withSchemaArn(upgradedAppliedSchemaArn));
    
    // Note: The listAppliedSchemaArns API without the SchemaArn filter returns all the Major Versions in a directory

    The listAppliedSchemaArns API returns the two ARNs as shown in the following output.

    arn:aws:clouddirectory:us-west-2:XXXXXXXXXXXX:directory/XX_DIRECTORY_GUID_XX/schema/ComputerSchema/1
    arn:aws:clouddirectory:us-west-2:XXXXXXXXXXXX:directory/XX_DIRECTORY_GUID_XX/schema/ComputerSchema/1/A

    The following code example queries an Applied schema for current Minor Version by using the getAppliedSchemaVersion API.

    // This returns the current Applied schema's Minor Version ARN 
    
    GetAppliedSchemaVersionResult getAppliedSchemaVersionResult = client.getAppliedSchemaVersion(new GetAppliedSchemaVersionRequest()
            .withSchemaArn(upgradedAppliedSchemaArn));

    The getAppliedSchemaVersion API returns the current Applied schema ARN with a Minor Version, as shown in the following output.

    arn:aws:clouddirectory:us-west-2:XXXXXXXXXXXX:directory/XX_DIRECTORY_GUID_XX/schema/ComputerSchema/1/A

    If you have a lot of directories, schema revision API calls can help you audit your directory fleet and ensure that all directories are running the same version of a schema. Such auditing can help you ensure high integrity of directories across your fleet.

    Summary

    You can use in-place schema upgrades to make changes to your directory schema as you evolve your data set to match the needs of your application. An in-place schema upgrade allows you to maintain high availability for your directory and applications while the upgrade takes place. For more information about in-place schema upgrades, see the in-place schema upgrade documentation.

    If you have comments about this blog post, submit them in the “Comments” section below. If you have questions about implementing the solution in this post, start a new thread in the Directory Service forum or contact AWS Support.

    – Mahendra

     

    AWS Contributes to Milestone 1.0 Release and Adds Model Serving Capability for Apache MXNet

    Post Syndicated from Ana Visneski original https://aws.amazon.com/blogs/aws/aws-contributes-to-milestone-1-0-release-and-adds-model-serving-capability-for-apache-mxnet/

    Post by Dr. Matt Wood

    Today AWS announced contributions to the milestone 1.0 release of the Apache MXNet deep learning engine including the introduction of a new model-serving capability for MXNet. The new capabilities in MXNet provide the following benefits to users:

    1) MXNet is easier to use: The model server for MXNet is a new capability introduced by AWS, and it packages, runs, and serves deep learning models in seconds with just a few lines of code, making them accessible over the internet via an API endpoint and thus easy to integrate into applications. The 1.0 release also includes an advanced indexing capability that enables users to perform matrix operations in a more intuitive manner.

    • Model Serving enables setup of an API endpoint for prediction: It saves developers time and effort by condensing the task of setting up an API endpoint for running and integrating prediction functionality into an application to just a few lines of code. It bridges the barrier between Python-based deep learning frameworks and production systems through a Docker container-based deployment model.
    • Advanced indexing for array operations in MXNet: It is now more intuitive for developers to leverage the powerful array operations in MXNet. They can use the advanced indexing capability by applying their existing knowledge of NumPy/SciPy arrays. For example, an MXNet NDArray or a NumPy ndarray can be used as an index, e.g. a[mx.nd.array([1, 2], dtype='int32')]. A short sketch of this follows the list.
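    As a rough illustration of the advanced indexing described above (the array values here are placeholders of my own, not from the announcement, and assume MXNet 1.0+ is installed):

    # Advanced indexing sketch: index an NDArray with another NDArray, NumPy-style
    import mxnet as mx

    a = mx.nd.arange(12).reshape((3, 4))

    # Index with an MXNet NDArray of integer indices
    rows = mx.nd.array([1, 2], dtype='int32')
    print(a[rows].asnumpy())      # rows 1 and 2 of a

    # A plain Python list of indices works too
    print(a[[0, 2]].asnumpy())    # rows 0 and 2 of a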

    2) MXNet is faster: The 1.0 release includes implementations of cutting-edge features that optimize the performance of training and inference. Gradient compression enables users to train models up to five times faster by reducing communication bandwidth between compute nodes without loss in convergence rate or accuracy. For speech recognition acoustic modeling, such as the models behind Alexa, this feature can reduce network bandwidth by up to three orders of magnitude during training. With the support of the NVIDIA Collective Communication Library (NCCL), users can train a model 20% faster on multi-GPU systems.

    • Optimize network bandwidth with gradient compression: In distributed training, each machine must communicate frequently with others to update the weight vectors and thereby collectively build a single model, which leads to high network traffic. The gradient compression algorithm enables users to train models up to five times faster by compressing the model changes communicated by each instance.
    • Optimize training performance by taking advantage of NCCL: NCCL implements multi-GPU and multi-node collective communication primitives that are performance-optimized for NVIDIA GPUs. NCCL provides communication routines that are optimized to achieve high bandwidth over the interconnect between multiple GPUs. MXNet supports NCCL to train models about 20% faster on multi-GPU systems.

    3) MXNet provides easy interoperability: MXNet now includes a tool for converting neural network code written with the Caffe framework to MXNet code, making it easier for users to take advantage of MXNet’s scalability and performance.

    • Migrate Caffe models to MXNet: It is now possible to easily migrate Caffe code to MXNet, using the new source code translation tool for converting Caffe code to MXNet code.

    MXNet has helped developers and researchers make progress with everything from language translation to autonomous vehicles and behavioral biometric security. We are excited to see the broad base of users that are building production artificial intelligence applications powered by neural network models developed and trained with MXNet. For example, the autonomous driving company TuSimple recently piloted a self-driving truck on a 200-mile journey from Yuma, Arizona to San Diego, California using MXNet. This release also includes a full-featured and performance-optimized version of the Gluon programming interface. The ease of use associated with it, combined with the extensive set of tutorials, has led to significant adoption among developers new to deep learning. The flexibility of the interface has driven interest within the research community, especially in the natural language processing domain.

    Getting started with MXNet
    Getting started with MXNet is simple. To learn more about the Gluon interface and deep learning, you can reference this comprehensive set of tutorials, which covers everything from an introduction to deep learning to how to implement cutting-edge neural network models. If you’re a contributor to a machine learning framework, check out the interface specs on GitHub.

    To get started with the Model Server for Apache MXNet, install the library with the following command:

    $ pip install mxnet-model-server

    The Model Server library has a Model Zoo with 10 pre-trained deep learning models, including the SqueezeNet 1.1 object classification model. You can start serving the SqueezeNet model with just the following command:

    $ mxnet-model-server \
      --models squeezenet=https://s3.amazonaws.com/model-server/models/squeezenet_v1.1/squeezenet_v1.1.model \
      --service dms/model_service/mxnet_vision_service.py

    Learn more about the Model Server and view the source code, reference examples, and tutorials here: https://github.com/awslabs/mxnet-model-server/

    -Dr. Matt Wood

    Diehl: Reflecting on Haskell in 2017

    Post Syndicated from ris original https://lwn.net/Articles/740627/rss

    Stephen Diehl looks back
    at what happened in Haskell during the past year.
    Haskell has had a great year and 2017 was defined by vast quantities of new code, including 14,000 new Haskell projects on GitHub. The amount of writing this year was voluminous and my list of interesting work is eight times as large as last year. At least seven new companies came into existence and many existing firms unexpectedly dropped large open source Haskell projects into the public sphere. Driven by a lot of software catastrophes, the intersection of security, software correctness and formal methods has become quite an active area of investment and research across both industry and academia. It's really never been an easier and more exciting time to be programming professionally in the world's most advanced (yet usable) statically typed language.

    GPIO expander: access a Pi’s GPIO pins on your PC/Mac

    Post Syndicated from Gordon Hollingworth original https://www.raspberrypi.org/blog/gpio-expander/

    Use the GPIO pins of a Raspberry Pi Zero while running Debian Stretch on a PC or Mac with our new GPIO expander software! With this tool, you can easily access a Pi Zero’s GPIO pins from your x86 laptop without using SSH, and you can also take advantage of your x86 computer’s processing power in your physical computing projects.

    A Raspberry Pi zero connected to a laptop - GPIO expander

    What is this magic?

    Running our x86 Stretch distribution on a PC or Mac, whether installed on the hard drive or as a live image, is a great way of taking advantage of a well controlled and simple Linux distribution without the need for a Raspberry Pi.

    The downside of not using a Pi, however, is that there aren’t any GPIO pins with which your Scratch or Python programs could communicate. This is a shame, because it means you are limited in your physical computing projects.

    I was thinking about this while playing around with the Pi Zero’s USB booting capabilities, having seen people employ the Linux gadget USB mode to use the Pi Zero as an Ethernet device. It struck me that, using the udev subsystem, we could create a simple GUI application that automatically pops up when you plug a Pi Zero into your computer’s USB port. Then the Pi Zero could be programmed to turn into an Ethernet-connected computer running pigpio to provide you with remote GPIO pins.

    So we went ahead and built this GPIO expander application, and your PC or Mac can now have GPIO pins which are accessible through Scratch or the GPIO Zero Python library. Note that you can only use this tool to access the Pi Zero.

    You can also install the application on the Raspberry Pi. Theoretically, you could connect a number of Pi Zeros to a single Pi and (without a USB hub) use a maximum of 140 pins! But I’ve not tested this — one for you, I think…

    Making the GPIO expander work

    If you’re using a PC or Mac and you haven’t set up x86 Debian Stretch yet, you’ll need to do that first. An easy way to do it is to download a copy of the Stretch release from this page and image it onto a USB stick. Boot from the USB stick (on most computers, you just need to press F10 during booting and select the stick when asked), and then run Stretch directly from the USB key. You can also install it to the hard drive, but be aware that installing it will overwrite anything that was on your hard drive before.

    Whether on a Mac, PC, or Pi, boot through to the Stretch desktop, open a terminal window, and install the GPIO expander application:

    sudo apt install usbbootgui

    Next, plug in your Raspberry Pi Zero (don’t insert an SD card), and after a few seconds the GUI will appear.

    A screenshot of the GPIO expander GUI

    The Raspberry Pi USB programming GUI

    Select GPIO expansion board and click OK. The Pi Zero will now be programmed as a locally connected Ethernet port (if you run ifconfig, you’ll see the new interface usb0 coming up).

    What’s really cool about this is that your plugged-in Pi Zero is now running pigpio, which allows you to control its GPIOs through the network interface.

    With Scratch 2

    To utilise the pins with Scratch 2, just click on the start bar and select Programming > Scratch 2.

    In Scratch, click on More Blocks, select Add an Extension, and then click Pi GPIO.

    Two new blocks will be added: the first is used to set the output pin, the second is used to get the pin value (it is true if the pin is read high).

    This is a simple application using a Pibrella I had hanging around:

    A screenshot of a Scratch 2 program - GPIO expander

    With Python

    This is a Python example using the GPIO Zero library to flash an LED:

    $ export GPIOZERO_PIN_FACTORY=pigpio
    $ export PIGPIO_ADDR=fe80::1%usb0
    $ python3
    >>> from gpiozero import LED
    >>> led = LED(17)
    >>> led.blink()
    A Raspberry Pi zero connected to a laptop - GPIO expander

    The pinout command line tool is your friend

    Note that in the code above the IP address of the Pi Zero is an IPv6 address and is shortened to fe80::1%usb0, where usb0 is the network interface created by the first Pi Zero.

    With pigs directly

    Another option you have is to use the pigpio library and the pigs application and redirect the output to the Pi Zero network port running IPv6. To do this, you'll first need to set an environment variable for the redirection:

    $ export PIGPIO_ADDR=fe80::1%usb0
    $ pigs bc2 0x8000
    $ pigs bs2 0x8000

    With the commands above, you should be able to flash the LED on the Pi Zero.

    The secret sauce

    I know there’ll be some people out there who would be interested in how we put this together. And I’m sure many people are interested in the ‘buildroot’ we created to run on the Pi Zero — after all, there are lots of things you can create if you’ve got a Pi Zero on the end of a piece of IPv6 string! For a closer look, find the build scripts for the GPIO expander here and the source code for the USB boot GUI here.

    And be sure to share your projects built with the GPIO expander by tagging us on social media or posting links in the comments!

    The post GPIO expander: access a Pi’s GPIO pins on your PC/Mac appeared first on Raspberry Pi.

    AWS Cloud9 – Cloud Developer Environments

    Post Syndicated from Randall Hunt original https://aws.amazon.com/blogs/aws/aws-cloud9-cloud-developer-environments/

    One of the first things you learn when you start programming is that, just like any craftsperson, your tools matter. Notepad.exe isn't going to cut it. A powerful editor and testing pipeline supercharge your productivity. I still remember learning to use Vim for the first time and being able to zip around systems and complex programs. Do you remember how hard it was to set up all your compilers and dependencies on a new machine? How many cycles have you wasted matching versions, tinkering with configs, and then writing documentation to onboard a new developer to a project?

    Today we’re launching AWS Cloud9, an Integrated Development Environment (IDE) for writing, running, and debugging code, all from your web browser. Cloud9 comes prepackaged with essential tools for many popular programming languages (Javascript, Python, PHP, etc.) so you don’t have to tinker with installing various compilers and toolchains. Cloud9 also provides a seamless experience for working with serverless applications allowing you to quickly switch between local and remote testing or debugging. Based on the popular open source Ace Editor and c9.io IDE (which we acquired last year), AWS Cloud9 is designed to make collaborative cloud development easy with extremely powerful pair programming features. There are more features than I could ever cover in this post but to give a quick breakdown I’ll break the IDE into 3 components: The editor, the AWS integrations, and the collaboration.

    Editing


    The Ace Editor at the core of Cloud9 is what lets you write code quickly, easily, and beautifully. It follows a UNIX philosophy of doing one thing and doing it well: writing code.

    It has all the typical IDE features you would expect: live syntax checking, auto-indent, auto-completion, code folding, split panes, version control integration, multiple cursors and selections, and it also has a few unique features I want to highlight. First of all, it’s fast, even for large (100000+ line) files. There’s no lag or other issues while typing. It has over two dozen themes built-in (solarized!) and you can bring all of your favorite themes from Sublime Text or TextMate as well. It has built-in support for 40+ language modes and customizable run configurations for your projects. Most importantly though, it has Vim mode (or emacs if your fingers work that way). It also has a keybinding editor that allows you to bend the editor to your will.

    The editor supports powerful keyboard navigation and commands (similar to Sublime Text or vim plugins like ctrlp). On a Mac, with ⌘+P you can open any file in your environment with fuzzy search. With ⌘+. you can open up the command pane, which allows you to invoke any of the editor commands by typing its name. It also helpfully displays the keybindings for a command in the pane; for instance, to open a terminal you can press ⌥+T. Oh, did I mention there’s a terminal? It ships with the AWS CLI preconfigured for access to your resources.

    The environment also comes with pre-installed debugging tools for many popular languages – but you’re not limited to what’s already installed. It’s easy to add in new programs and define new run configurations.

    The editor is just one, admittedly important, component in an IDE though. I want to show you some other compelling features.

    AWS Integrations

    The AWS Cloud9 IDE is the first IDE I’ve used that is truly “cloud native”. The service is provided at no additional charge, and you are only charged for the underlying compute and storage resources. When you create an environment, you’re prompted for either an instance type and an auto-hibernate time, or SSH access to a machine of your choice.

    If you’re running in AWS the auto-hibernate feature will stop your instance shortly after you stop using your IDE. This can be a huge cost savings over running a more permanent developer desktop. You can also launch it within a VPC to give it secure access to your development resources. If you want to run Cloud9 outside of AWS, or on an existing instance, you can provide SSH access to the service which it will use to create an environment on the external machine. Your environment is provisioned with automatic and secure access to your AWS account so you don’t have to worry about copying credentials around. Let me say that again: you can run this anywhere.
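    If you prefer to script environment creation, here is a rough boto3 sketch of the same idea; the environment name, instance type, and idle timeout below are placeholders of my own rather than values from this post:

    # Sketch: create an EC2-backed Cloud9 environment that auto-hibernates when idle
    import boto3

    cloud9 = boto3.client('cloud9', region_name='us-west-2')

    response = cloud9.create_environment_ec2(
        name='my-dev-environment',            # placeholder name
        description='Scratch environment for serverless work',
        instanceType='t2.micro',              # placeholder instance type
        automaticStopTimeMinutes=30,          # auto-hibernate after 30 idle minutes
    )
    print(response['environmentId'])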

    Serverless Development with AWS Cloud9

    I spend a lot of time on Twitch developing serverless applications. I have hundreds of lambda functions and APIs deployed. Cloud9 makes working with every single one of these functions delightful. Let me show you how it works.


    If you look in the top right side of the editor you’ll see an AWS Resources tab. Opening this you can see all of the lambda functions in your region (you can see functions in other regions by adjusting your region preferences in the AWS preference pane).

    You can import these remote functions to your local workspace just by double-clicking them. This allows you to edit, test, and debug your serverless applications all locally. You can create new applications and functions easily as well. If you click the Lambda icon in the top right of the pane you’ll be prompted to create a new lambda function and Cloud9 will automatically create a Serverless Application Model template for you as well. The IDE ships with support for the popular SAM local tool pre-installed. This is what I use in most of my local testing and serverless development. Since you have a terminal, it’s easy to install additional tools and use other serverless frameworks.

     

    Launching an Environment from AWS CodeStar

    With AWS CodeStar you can easily provision an end-to-end continuous delivery toolchain for development on AWS. CodeStar provides a unified experience for building, testing, deploying, and managing applications using the AWS CodeCommit, CodeBuild, CodePipeline, and CodeDeploy suite of services. Now, with a few simple clicks you can provision a Cloud9 environment to develop your application. Your environment will be pre-configured with the code for your CodeStar application already checked out and git credentials already configured.

    You can easily share this environment with your coworkers which leads me to another extremely useful set of features.

    Collaboration

    One of the many things that sets AWS Cloud9 apart from other editors are the rich collaboration tools. You can invite an IAM user to your environment with a few clicks.

    You can see what files they’re working on, where their cursors are, and even share a terminal. The chat feature is useful as well.

    Things to Know

    • There are no additional charges for this service beyond the underlying compute and storage.
    • c9.io continues to run for existing users. You can continue to use all the features of c9.io and add new team members if you have a team account. In the future, we will provide tools for easy migration of your c9.io workspaces to AWS Cloud9.
    • AWS Cloud9 is available in the US West (Oregon), US East (Ohio), US East (N.Virginia), EU (Ireland), and Asia Pacific (Singapore) regions.

    I can’t wait to see what you build with AWS Cloud9!

    Randall

    New- AWS IoT Device Management

    Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/aws-iot-device-management/

    AWS IoT and AWS Greengrass give you a solid foundation and programming environment for your IoT devices and applications.

    The nature of IoT means that an at-scale device deployment often encompasses millions or even tens of millions of devices deployed at hundreds or thousands of locations. At that scale, treating each device individually is impossible. You need to be able to set up, monitor, update, and eventually retire devices in bulk, collective fashion while also retaining the flexibility to accommodate varying deployment configurations, device models, and so forth.

    New AWS IoT Device Management
    Today we are launching AWS IoT Device Management to help address this challenge. It will help you through each phase of the device lifecycle, from manufacturing to retirement. Here’s what you get:

    Onboarding – Starting with devices in their as-manufactured state, you can control the provisioning workflow. You can use IoT Device Management templates to quickly onboard entire fleets of devices with a few clicks. The templates can include information about device certificates and access policies.

    Organization – In order to deal with massive numbers of devices, AWS IoT Device Management extends the existing IoT Device Registry and allows you to create a hierarchical model of your fleet and to set policies on a hierarchical basis. You can drill-down through the hierarchy in order to locate individual devices. You can also query your fleet on attributes such as device type or firmware version.

    Monitoring – Telemetry from the devices is used to gather real-time connection, authentication, and status metrics, which are published to Amazon CloudWatch. You can examine the metrics and locate outliers for further investigation. IoT Device Management lets you configure the log level for each device group, and you can also publish change events for the Registry and Jobs for monitoring purposes.

    Remote Management – AWS IoT Device Management lets you remotely manage your devices. You can push new software and firmware to them, reset to factory defaults, reboot, and set up bulk updates at the desired velocity.
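    For readers who prefer an API-level view of the grouping and job ideas above, here is a rough boto3 sketch; every group name, thing name, and job document below is a placeholder of my own, not something from this walkthrough, and it assumes the thing already exists in your registry:

    # Sketch: hierarchical thing groups plus a continuous job targeting a group
    import json
    import boto3

    iot = boto3.client('iot', region_name='us-west-2')

    # Organization: build a small hierarchy and add an existing device to it
    iot.create_thing_group(thingGroupName='all-gauges')
    iot.create_thing_group(thingGroupName='colorado', parentGroupName='all-gauges')
    iot.add_thing_to_thing_group(thingGroupName='colorado',
                                 thingName='pressure-gauge-0001')

    # Remote management: create a continuous job that targets the group
    group_arn = iot.describe_thing_group(thingGroupName='colorado')['thingGroupArn']
    iot.create_job(
        jobId='update-firmware-001',
        targets=[group_arn],
        document=json.dumps({'operation': 'update-firmware', 'version': '1.2'}),
        targetSelection='CONTINUOUS',   # 'SNAPSHOT' runs the job once
    )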

    Exploring AWS IoT Device Management
    The AWS IoT Device Management Console took me on a tour and pointed out how to access each of the features of the service:

    I already have a large set of devices (pressure gauges):

    These gauges were created using the new template-driven bulk registration feature. Here’s how I create a template:

    The gauges are organized into groups (by US state in this case):

    Here are the gauges in Colorado:

    AWS IoT group policies allow you to control access to specific IoT resources and actions for all members of a group. The policies are structured very much like IAM policies, and can be created in the console:

    Jobs are used to selectively update devices. Here’s how I create one:

    As indicated by the Job type above, jobs can run either once or continuously. Here’s how I choose the devices to be updated:

    I can create custom authorizers that make use of a Lambda function:

    I’ve shown you a medium-sized subset of AWS IoT Device Management in this post. Check it out for yourself to learn more!

    Jeff;

     

    Object models

    Post Syndicated from Eevee original https://eev.ee/blog/2017/11/28/object-models/

    Anonymous asks, with dollars:

    More about programming languages!

    Well then!

    I’ve written before about what I think objects are: state and behavior, which in practice mostly means method calls.

    I suspect that the popular impression of what objects are, and also how they should work, comes from whatever C++ and Java happen to do. From that point of view, the whole post above is probably nonsense. If the baseline notion of “object” is a rigid definition woven tightly into the design of two massively popular languages, then it doesn’t even make sense to talk about what “object” should mean — it does mean the features of those languages, and cannot possibly mean anything else.

    I think that’s a shame! It piles a lot of baggage onto a fairly simple idea. Polymorphism, for example, has nothing to do with objects — it’s an escape hatch for static type systems. Inheritance isn’t the only way to reuse code between objects, but it’s the easiest and fastest one, so it’s what we get. Frankly, it’s much closer to a speed tradeoff than a fundamental part of the concept.

    We could do with more experimentation around how objects work, but that’s impossible in the languages most commonly thought of as object-oriented.

    Here, then, is a (very) brief run through the inner workings of objects in four very dynamic languages. I don’t think I really appreciated objects until I’d spent some time with Python, and I hope this can help someone else whet their own appetite.

    Python 3

    Of the four languages I’m going to touch on, Python will look the most familiar to the Java and C++ crowd. For starters, it actually has a class construct.

    class Vector:
        def __init__(self, x, y):
            self.x = x
            self.y = y
    
        def __neg__(self):
            return Vector(-self.x, -self.y)
    
        def __truediv__(self, denom):
            return Vector(self.x / denom, self.y / denom)
    
        @property
        def magnitude(self):
            return (self.x ** 2 + self.y ** 2) ** 0.5
    
        def normalized(self):
            return self / self.magnitude
    

    The __init__ method is an initializer, which is like a constructor but named differently (because the object already exists in a usable form by the time the initializer is called). Operator overloading is done by implementing methods with other special __dunder__ names. Properties can be created with @property, where the @ is syntax for applying a wrapper function to a function as it’s defined. You can do inheritance, even multiply:

    class Foo(A, B, C):
        def bar(self, x, y, z):
            # do some stuff
            super().bar(x, y, z)
    

    Cool, a very traditional object model.

    Except… for some details.

    Some details

    For one, Python objects don’t have a fixed layout. Code both inside and outside the class can add or remove whatever attributes they want from whatever object they want. The underlying storage is just a dict, Python’s mapping type. (Or, rather, something like one. Also, it’s possible to change this, which will probably be the case for everything I say here.)

    If you create some attributes at the class level, you’ll start to get a peek behind the curtains:

    class Foo:
        values = []
    
        def add_value(self, value):
            self.values.append(value)
    
    a = Foo()
    b = Foo()
    a.add_value('a')
    print(a.values)  # ['a']
    b.add_value('b')
    print(b.values)  # ['a', 'b']
    

    The [] assigned to values isn’t a default assigned to each object. In fact, the individual objects don’t know about it at all! You can use vars(a) to get at the underlying storage dict, and you won’t see a values entry in there anywhere.

    Instead, values lives on the class, which is a value (and thus an object) in its own right. When Python is asked for self.values, it checks to see if self has a values attribute; in this case, it doesn’t, so Python keeps going and asks the class for one.

    Python’s object model is secretly prototypical — a class acts as a prototype, as a shared set of fallback values, for its objects.

    In fact, this is also how method calls work! They aren’t syntactically special at all, which you can see by separating the attribute lookup from the call.

    print("abc".startswith("a"))  # True
    meth = "abc".startswith
    print(meth("a"))  # True
    

    Reading obj.method looks for a method attribute; if there isn’t one on obj, Python checks the class. Here, it finds one: it’s a function from the class body.

    Ah, but wait! In the code I just showed, meth seems to “know” the object it came from, so it can’t just be a plain function. If you inspect the resulting value, it claims to be a “bound method” or “built-in method” rather than a function, too. Something funny is going on here, and that funny something is the descriptor protocol.

    Descriptors

    Python allows attributes to implement their own custom behavior when read from or written to. Such an attribute is called a descriptor. I’ve written about them before, but here’s a quick overview.

    If Python looks up an attribute, finds it in a class, and the value it gets has a __get__ method… then instead of using that value, Python will use the return value of its __get__ method.

    The @property decorator works this way. The magnitude property in my original example was shorthand for doing this:

    class MagnitudeDescriptor:
        def __get__(self, instance, owner):
            if instance is None:
                return self
            return (instance.x ** 2 + instance.y ** 2) ** 0.5
    
    class Vector:
        def __init__(self, x, y):
            self.x = x
            self.y = y
    
        magnitude = MagnitudeDescriptor()
    

    When you ask for somevec.magnitude, Python checks somevec but doesn’t find magnitude, so it consults the class instead. The class does have a magnitude, and it’s a value with a __get__ method, so Python calls that method and somevec.magnitude evaluates to its return value. (The instance is None check is because __get__ is called even if you get the descriptor directly from the class via Vector.magnitude. A descriptor intended to work on instances can’t do anything useful in that case, so the convention is to return the descriptor itself.)

    You can also intercept attempts to write to or delete an attribute, and do absolutely whatever you want instead. But note that, similar to operator overloading in Python, the descriptor must be on a class; you can’t just slap one on an arbitrary object and have it work.
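    As a quick illustration of intercepting writes (the class and attribute names here are made up for this sketch, and __set_name__ needs Python 3.6+), a descriptor with a __set__ method can validate assignments:

    class NonNegative:
        # this only works because the descriptor lives on the class
        def __set_name__(self, owner, name):
            self.name = name

        def __get__(self, instance, owner):
            if instance is None:
                return self
            return instance.__dict__.get(self.name, 0)

        def __set__(self, instance, value):
            if value < 0:
                raise ValueError(self.name + " must be non-negative")
            instance.__dict__[self.name] = value

    class Account:
        balance = NonNegative()

    acct = Account()
    acct.balance = 10       # goes through NonNegative.__set__
    # acct.balance = -5     # would raise ValueError
    print(acct.balance)     # 10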

    This brings me right around to how “bound methods” actually work. Functions are descriptors! The function type implements __get__, and when a function is retrieved from a class via an instance, that __get__ bundles the function and the instance together into a tiny bound method object. It’s essentially:

    class FunctionType:
        def __get__(self, instance, owner):
            if instance is None:
                return self
            return functools.partial(self, instance)
    

    The self passed as the first argument to methods is not special or magical in any way. It’s built out of a few simple pieces that are also readily accessible to Python code.

    Note also that because obj.method() is just an attribute lookup and a call, Python doesn’t actually care whether method is a method on the class or just some callable thing on the object. You won’t get the auto-self behavior if it’s on the object, but otherwise there’s no difference.

    More attribute access, and the interesting part

    Descriptors are one of several ways to customize attribute access. Classes can implement __getattr__ to intervene when an attribute isn’t found on an object; __setattr__ and __delattr__ to intervene when any attribute is set or deleted; and __getattribute__ to implement unconditional attribute access. (That last one is a fantastic way to create accidental recursion, since any attribute access you do within __getattribute__ will of course call __getattribute__ again.)
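    For example, __getattr__ makes delegation almost free. A tiny sketch of my own, not from the post:

    class Proxy:
        def __init__(self, wrapped):
            self._wrapped = wrapped

        def __getattr__(self, name):
            # only called when normal lookup (object, then class) fails
            print("delegating", name)
            return getattr(self._wrapped, name)

    p = Proxy([1, 2, 3])
    p.append(4)            # delegating append
    print(p._wrapped)      # [1, 2, 3, 4]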

    Here’s what I really love about Python. It might seem like a magical special case that descriptors only work on classes, but it really isn’t. You could implement exactly the same behavior yourself, in pure Python, using only the things I’ve just told you about. Classes are themselves objects, remember, and they are instances of type, so the reason descriptors only work on classes is that type effectively does this:

    class type:
        def __getattribute__(self, name):
            value = super().__getattribute__(name)
            # like all op overloads, __get__ must be on the type, not the instance
            ty = type(value)
            if hasattr(ty, '__get__'):
                # it's a descriptor!  this is a class access so there is no instance
                return ty.__get__(value, None, self)
            else:
                return value
    

    You can even trivially prove to yourself that this is what’s going on by skipping over type’s behavior:

    class Descriptor:
        def __get__(self, instance, owner):
            print('called!')
    
    class Foo:
        bar = Descriptor()
    
    Foo.bar  # called!
    type.__getattribute__(Foo, 'bar')  # called!
    object.__getattribute__(Foo, 'bar')  # ...
    

    And that’s not all! The mysterious super function, used to exhaustively traverse superclass method calls even in the face of diamond inheritance, can also be expressed in pure Python using these primitives. You could write your own superclass calling convention and use it exactly the same way as super.

    This is one of the things I really like about Python. Very little of it is truly magical; virtually everything about the object model exists in the types rather than the language, which means virtually everything can be customized in pure Python.

    Class creation and metaclasses

    A very brief word on all of this stuff, since I could talk forever about Python and I have three other languages to get to.

    The class block itself is fairly interesting. It looks like this:

    class Name(*bases, **kwargs):
        # code
    

    I’ve said several times that classes are objects, and in fact the class block is one big pile of syntactic sugar for calling type(...) with some arguments to create a new type object.

    The Python documentation has a remarkably detailed description of this process, but the gist is:

    • Python determines the type of the new class — the metaclass — by looking for a metaclass keyword argument. If there isn’t one, Python uses the “lowest” type among the provided base classes. (If you’re not doing anything special, that’ll just be type, since every class inherits from object and object is an instance of type.)

    • Python executes the class body. It gets its own local scope, and any assignments or method definitions go into that scope.

    • Python now calls type(name, bases, attrs, **kwargs). The name is whatever was right after class; the bases are positional arguments; and attrs is the class body’s local scope. (This is how methods and other class attributes end up on the class.) The brand new type is then assigned to Name. A small sketch of this call follows the list.
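    To make that last step concrete, here is a small sketch (all names are mine) of creating a class by calling type directly, which is roughly what a class block desugars to:

    def area(self):
        return self.w * self.h

    attrs = {'w': 1, 'h': 2, 'area': area}
    Rect = type('Rect', (object,), attrs)    # same thing a class block does

    r = Rect()
    print(r.area())        # 2
    print(type(Rect))      # <class 'type'>, i.e. the metaclass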

    Of course, you can mess with most of this. You can implement __prepare__ on a metaclass, for example, to use a custom mapping as storage for the local scope — including any reads, which allows for some interesting shenanigans. The only part you can’t really implement in pure Python is the scoping bit, which has a couple extra rules that make sense for classes. (In particular, functions defined within a class block don’t close over the class body; that would be nonsense.)

    Object creation

    Finally, there’s what actually happens when you create an object — including a class, which remember is just an invocation of type(...).

    Calling Foo(...) is implemented as, well, a call. Any type can implement calls with the __call__ special method, and you’ll find that type itself does so. It looks something like this:

    # oh, a fun wrinkle that's hard to express in pure python: type is a class, so
    # it's an instance of itself
    class type:
        def __call__(self, *args, **kwargs):
            # remember, here 'self' is a CLASS, an instance of type.
            # __new__ is a true constructor: object.__new__ allocates storage
            # for a new blank object
            instance = self.__new__(self, *args, **kwargs)
            # you can return whatever you want from __new__ (!), and __init__
            # is only called on it if it's of the right type
            if isinstance(instance, self):
                instance.__init__(*args, **kwargs)
            return instance
    

    Again, you can trivially confirm this by asking any type for its __call__ method. Assuming that type doesn’t implement __call__ itself, you’ll get back a bound version of type’s implementation.

    >>> list.__call__
    <method-wrapper '__call__' of type object at 0x7fafb831a400>
    

    You can thus implement __call__ in your own metaclass to completely change how subclasses are created — including skipping the creation altogether, if you like.

    And… there’s a bunch of stuff I haven’t even touched on.

    The Python philosophy

    Python offers something that, on the surface, looks like a “traditional” class/object model. Under the hood, it acts more like a prototypical system, where failed attribute lookups simply defer to a superclass or metaclass.

    The language also goes to almost superhuman lengths to expose all of its moving parts. Even the prototypical behavior is an implementation of __getattribute__ somewhere, which you are free to completely replace in your own types. Proxying and delegation are easy.

    Also very nice is that these features “bundle” well, by which I mean a library author can do all manner of convoluted hijinks, and a consumer of that library doesn’t have to see any of it or understand how it works. You only need to inherit from a particular class (which has a metaclass) or use some descriptor as a decorator; you don’t even have to learn any new syntax.

    This meshes well with Python culture, which is pretty big on the principle of least surprise. These super-advanced features tend to be tightly confined to single simple features (like “makes a weak attribute”) or cordoned off with DSLs (e.g., defining a form/struct/database table with a class body). In particular, I’ve never seen a metaclass in the wild implement its own __call__.

    I have mixed feelings about that. It’s probably a good thing overall that the Python world shows such restraint, but I wonder if there are some very interesting possibilities we’re missing out on. I implemented a metaclass __call__ myself, just once, in an entity/component system that strove to minimize fuss when communicating between components. It never saw the light of day, but I enjoyed seeing some new things Python could do with the same relatively simple syntax. I wouldn’t mind seeing, say, an object model based on composition (with no inheritance) built atop Python’s primitives.

    Lua

    Lua doesn’t have an object model. Instead, it gives you a handful of very small primitives for building your own object model. This is pretty typical of Lua — it’s a very powerful language, but has been carefully constructed to be very small at the same time. I’ve never encountered anything else quite like it, and “but it starts indexing at 1!” really doesn’t do it justice.

    The best way to demonstrate how objects work in Lua is to build some from scratch. We need two key features. The first is metatables, which bear a passing resemblance to Python’s metaclasses.

    Tables and metatables

    The table is Lua’s mapping type and its primary data structure. Keys can be any value other than nil. Lists are implemented as tables whose keys are consecutive integers starting from 1. Nothing terribly surprising. The dot operator is sugar for indexing with a string key.

    local t = { a = 1, b = 2 }
    print(t['a'])  -- 1
    print(t.b)  -- 2
    t.c = 3
    print(t['c'])  -- 3
    

    A metatable is a table that can be associated with another value (usually another table) to change its behavior. For example, operator overloading is implemented by assigning a function to a special key in a metatable.

    local t = { a = 1, b = 2 }
    --print(t + 0)  -- error: attempt to perform arithmetic on a table value
    
    local mt = {
        __add = function(left, right)
            return 12
        end,
    }
    setmetatable(t, mt)
    print(t + 0)  -- 12
    

    Now, the interesting part: one of the special keys is __index, which is consulted when the base table is indexed by a key it doesn’t contain. Here’s a table that claims every key maps to itself.

    local t = {}
    local mt = {
        __index = function(table, key)
            return key
        end,
    }
    setmetatable(t, mt)
    print(t.foo)  -- foo
    print(t.bar)  -- bar
    print(t[3])  -- 3
    

    __index doesn’t have to be a function, either. It can be yet another table, in which case that table is simply indexed with the key. If the key still doesn’t exist and that table has a metatable with an __index, the process repeats.

    With this, it’s easy to have several unrelated tables that act as a single table. Call the base table an object, fill the __index table with functions and call it a class, and you have half of an object system. You can even get prototypical inheritance by chaining __indexes together.

    At this point things are a little confusing, since we have at least three tables going on, so here’s a diagram. Keep in mind that Lua doesn’t actually have anything called an “object”, “class”, or “method” — those are just convenient nicknames for a particular structure we might build with Lua’s primitives.

                        ╔═══════════╗        ...
                        ║ metatable ║         ║
                        ╟───────────╢   ┌─────╨───────────────────────┐
                        ║ __index   ╫───┤ lookup table ("superclass") │
                        ╚═══╦═══════╝   ├─────────────────────────────┤
      ╔═══════════╗         ║           │ some other method           ┼─── function() ... end
      ║ metatable ║         ║           └─────────────────────────────┘
      ╟───────────╢   ┌─────╨──────────────────┐
      ║ __index   ╫───┤ lookup table ("class") │
      ╚═══╦═══════╝   ├────────────────────────┤
          ║           │ some method            ┼─── function() ... end
          ║           └────────────────────────┘
    ┌─────╨─────────────────┐
    │ base table ("object") │
    └───────────────────────┘
    

    Note that a metatable is not the same as a class; it defines behavior, not methods. Conversely, if you try to use a class directly as a metatable, it will probably not do much. (This is pretty different from e.g. Python, where operator overloads are just methods with funny names. One nice thing about the Lua approach is that you can keep interface-like functionality separate from methods, and avoid clogging up arbitrary objects’ namespaces. You could even use a dummy table as a key and completely avoid name collisions.)

    Anyway, code!

    local class = {
        foo = function(a)
            print("foo got", a)
        end,
    }
    local mt = { __index = class }
    -- setmetatable returns its first argument, so this is nice shorthand
    local obj1 = setmetatable({}, mt)
    local obj2 = setmetatable({}, mt)
    obj1.foo(7)  -- foo got 7
    obj2.foo(9)  -- foo got 9
    

    Wait, wait, hang on. Didn’t I call these methods? How do they get at the object? Maybe Lua has a magical this variable?

    Methods, sort of

    Not quite, but this is where the other key feature comes in: method-call syntax. It’s the lightest touch of sugar, just enough to have method invocation.

    -- note the colon!
    a:b(c, d, ...)
    
    -- exactly equivalent to this
    -- (except that `a` is only evaluated once)
    a.b(a, c, d, ...)
    
    -- which of course is really this
    a["b"](a, c, d, ...)
    

    Now we can write methods that actually do something.

    local class = {
        bar = function(self)
            print("our score is", self.score)
        end,
    }
    local mt = { __index = class }
    local obj1 = setmetatable({ score = 13 }, mt)
    local obj2 = setmetatable({ score = 25 }, mt)
    obj1:bar()  -- our score is 13
    obj2:bar()  -- our score is 25
    

    And that’s all you need. Much like Python, methods and data live in the same namespace, and Lua doesn’t care whether obj:method() finds a function on obj or gets one from the metatable’s __index. Unlike Python, the function will be passed self either way, because self comes from the use of : rather than from the lookup behavior.

    (Aside: strictly speaking, any Lua value can have a metatable — and if you try to index a non-table, Lua will always consult the metatable’s __index. Strings all have the string library as a metatable, so you can call methods on them: try ("%s %s"):format(1, 2). I don’t think Lua lets user code set the metatable for non-tables, so this isn’t that interesting, but if you’re writing Lua bindings from C then you can wrap your pointers in metatables to give them methods implemented in C.)

    Bringing it all together

    Of course, writing all this stuff every time is a little tedious and error-prone, so instead you might want to wrap it all up inside a little function. No problem.

    local function make_object(body)
        -- create a metatable
        local mt = { __index = body }
        -- create a base table to serve as the object itself
        local obj = setmetatable({}, mt)
        -- and, done
        return obj
    end
    
    -- you can leave off parens if you're only passing in a table or string literal
    local Dog = {
        -- this acts as a "default" value; if obj.barks is missing, __index will
        -- kick in and find this value on the class.  but if obj.barks is assigned
        -- to, it'll go in the object and shadow the value here.
        barks = 0,
    
        bark = function(self)
            self.barks = self.barks + 1
            print("woof!")
        end,
    }
    
    local mydog = make_object(Dog)
    mydog:bark()  -- woof!
    mydog:bark()  -- woof!
    mydog:bark()  -- woof!
    print(mydog.barks)  -- 3
    print(Dog.barks)  -- 0
    

    It works, but it’s fairly barebones. The nice thing is that you can extend it pretty much however you want. I won’t reproduce an entire serious object system here — lord knows there are enough of them floating around — but the implementation I have for my LÖVE games lets me do this:

    local Animal = Object:extend{
        cries = 0,
    }
    
    -- called automatically by Object
    function Animal:init()
        print("whoops i couldn't think of anything interesting to put here")
    end
    
    -- this is just nice syntax for adding a first argument called 'self', then
    -- assigning this function to Animal.cry
    function Animal:cry()
        self.cries = self.cries + 1
    end
    
    local Cat = Animal:extend{}
    
    function Cat:cry()
        print("meow!")
        Cat.__super.cry(self)
    end
    
    local cat = Cat()
    cat:cry()  -- meow!
    cat:cry()  -- meow!
    print(cat.cries)  -- 2
    

    When I say you can extend it however you want, I mean that. I could’ve implemented Python (2)-style super(Cat, self):cry() syntax; I just never got around to it. I could even make it work with multiple inheritance if I really wanted to — or I could go the complete opposite direction and only implement composition. I could implement descriptors, customizing the behavior of individual table keys. I could add pretty decent syntax for composition/proxying. I am trying very hard to end this section now.

    The Lua philosophy

    Lua’s philosophy is to… not have a philosophy? It gives you the bare minimum to make objects work, and you can do absolutely whatever you want from there. Lua does have something resembling prototypical inheritance, but it’s not so much a first-class feature as an emergent property of some very simple tools. And since you can make __index be a function, you could avoid the prototypical behavior and do something different entirely.
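
    For instance, a tiny made-up sketch of that last point, with __index as a function instead of a lookup table:

    local mt = {
        -- called only when a key is missing from the table itself
        __index = function(t, key)
            return "I made up a value for " .. tostring(key)
        end,
    }
    local obj = setmetatable({ real = 1 }, mt)
    print(obj.real)      -- 1
    print(obj.whatever)  -- I made up a value for whatever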

    The very severe downside, of course, is that you have to find or build your own object system — which can get pretty confusing very quickly, what with the multiple small moving parts. Third-party code may also have its own object system with subtly different behavior. (Though, in my experience, third-party code tries very hard to avoid needing an object system at all.)

    It’s hard to say what the Lua “culture” is like, since Lua is an embedded language that’s often a little different in each environment. I imagine it has a thousand millicultures, instead. I can say that the tedium of building my own object model has led me into something very “traditional”, with prototypical inheritance and whatnot. It’s partly what I’m used to, but it’s also just really dang easy to get working.

    Likewise, while I love properties in Python and use them all the dang time, I’ve yet to use a single one in Lua. They wouldn’t be particularly hard to add to my object model, but having to add them myself (or shop around for an object model with them and also port all my code to use it) adds a huge amount of friction. I’ve thought about designing an interesting ECS with custom object behavior, too, but… is it really worth the effort? For all the power and flexibility Lua offers, the cost is that by the time I have something working at all, I’m too exhausted to actually use any of it.

    JavaScript

    JavaScript is notable for being preposterously heavily used, yet not having a class block.

    Well. Okay. Yes. It has one now. It didn’t for a very long time, and even the one it has now is sugar.

    Here’s a vector class again:

    class Vector {
        constructor(x, y) {
            this.x = x;
            this.y = y;
        }
    
        get magnitude() {
            return Math.sqrt(this.x * this.x + this.y * this.y);
        }
    
        dot(other) {
            return this.x * other.x + this.y * other.y;
        }
    }
    

    In “classic” JavaScript, this would be written as:

    function Vector(x, y) {
        this.x = x;
        this.y = y;
    }
    
    Object.defineProperty(Vector.prototype, 'magnitude', {
        configurable: true,
        enumerable: true,
        get: function() {
            return Math.sqrt(this.x * this.x + this.y * this.y);
        },
    });
    
    
    Vector.prototype.dot = function(other) {
        return this.x * other.x + this.y * other.y;
    };
    

    Hm, yes. I can see why they added class.

    The JavaScript model

    In JavaScript, a new type is defined in terms of a function, which is its constructor.

    Right away we get into trouble here. There is a very big difference between these two invocations, which I actually completely forgot about just now after spending four hours writing about Python and Lua:

    let vec = Vector(3, 4);
    let vec = new Vector(3, 4);
    

    The first calls the function Vector. It assigns some properties to this, which here is going to be window, so now you have a global x and y. It then returns nothing, so vec is undefined.

    The second calls Vector with this set to a new empty object, then evaluates to that object. The result is what you’d actually expect.

    (You can detect this situation with the strange new.target expression, but I have never once remembered to do so.)

    From here, we have true, honest-to-god, first-class prototypical inheritance. The word “prototype” is even right there. When you write this:

    vec.dot(vec2)
    

    JavaScript will look for dot on vec and (presumably) not find it. It then consults vec’s prototype, an object you can see for yourself by using Object.getPrototypeOf(). Since vec is a Vector, its prototype is Vector.prototype.

    I stress that Vector.prototype is not the prototype for Vector. It’s the prototype for instances of Vector.

    (I say “instance”, but the true type of vec here is still just object. If you want to find Vector, it’s automatically assigned to the constructor property of its own prototype, so it’s available as vec.constructor.)

    Of course, Vector.prototype can itself have a prototype, in which case the process would continue if dot were not found. A common (and, arguably, very bad) way to simulate single inheritance is to set Class.prototype to an instance of a superclass to get the prototype right, then tack on the methods for Class. Nowadays we can do Object.create(Superclass.prototype).
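
    Here is a sketch of what that looks like in practice; the types are mine and purely illustrative:

    function Animal(name) {
        this.name = name;
    }
    Animal.prototype.speak = function() {
        return this.name + " makes a noise";
    };

    function Dog(name) {
        // run the "superclass" constructor against the new object
        Animal.call(this, name);
    }
    // a fresh object whose own prototype is Animal.prototype,
    // without constructing a throwaway Animal first
    Dog.prototype = Object.create(Animal.prototype);
    Dog.prototype.constructor = Dog;

    let rex = new Dog("Rex");
    console.log(rex.speak());            // Rex makes a noise
    console.log(rex instanceof Animal);  // true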

    Now that I’ve been through Python and Lua, though, this isn’t particularly surprising. I kinda spoiled it.

    I suppose one difference in JavaScript is that you can tack arbitrary attributes directly onto Vector all you like, and they will remain invisible to instances since they aren’t in the prototype chain. This is kind of backwards from Lua, where you can squirrel stuff away in the metatable.

    Another difference is that every single object in JavaScript has a bunch of properties already tacked on — the ones in Object.prototype. Every object (and by “object” I mean any mapping) has a prototype, and that prototype defaults to Object.prototype, and it has a bunch of ancient junk like isPrototypeOf.

    (Nit: it’s possible to explicitly create an object with no prototype via Object.create(null).)

    Like Lua, and unlike Python, JavaScript doesn’t distinguish between keys found on an object and keys found via a prototype. Properties can be defined on prototypes with Object.defineProperty(), but that works just as well directly on an object, too. JavaScript doesn’t have a lot of operator overloading, but some things like Symbol.iterator also work on both objects and prototypes.
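
    For example (a toy of my own), Symbol.iterator is equally happy living directly on a single object or on a shared prototype:

    // directly on one object
    let pair = {
        *[Symbol.iterator]() { yield 1; yield 2; },
    };
    console.log([...pair]);  // [1, 2]

    // or on a prototype, shared by every instance
    function Range(n) { this.n = n; }
    Range.prototype[Symbol.iterator] = function*() {
        for (let i = 0; i < this.n; i++) yield i;
    };
    console.log([...new Range(3)]);  // [0, 1, 2]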

    About this

    You may, at this point, be wondering what this is. Unlike Lua and Python (and the last language below), this is a special built-in value — a context value, invisibly passed for every function call.

    It’s determined by where the function came from. If the function was the result of an attribute lookup, then this is set to the object containing that attribute. Otherwise, this is set to the global object, window. (You can also set this to whatever you want via the call method on functions.)

    This decision is made lexically, i.e. from the literal source code as written. There are no Python-style bound methods. In other words:

    // this = obj
    obj.method()
    // this = window
    let meth = obj.method
    meth()
    

    Also, because this is reassigned on every function call, it cannot be meaningfully closed over, which makes using closures within methods incredibly annoying. The old approach was to assign this to some other regular name like self (which got syntax highlighting since it’s also a built-in name in browsers); then we got Function.bind, which produced a callable thing with a fixed context value, which was kind of nice; and now finally we have arrow functions, which explicitly close over the current this when they’re defined and don’t change it when called. Phew.
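
    A sketch of all three workarounds side by side, using a made-up class and a timer as the callback that would otherwise lose this:

    class Counter {
        constructor() {
            this.count = 0;
        }

        startOldSchool() {
            var self = this;  // stash `this` under a regular name
            setInterval(function() { self.count++; }, 1000);
        }

        startWithBind() {
            // produce a copy of the function with `this` fixed in place
            setInterval(function() { this.count++; }.bind(this), 1000);
        }

        startWithArrow() {
            // arrow functions close over the `this` in effect when they're defined
            setInterval(() => { this.count++; }, 1000);
        }
    }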

    Class syntax

    I already showed class syntax, and it’s really just one big macro for doing all the prototype stuff The Right Way. It even prevents you from calling the type without new. The underlying model is exactly the same, and you can inspect all the parts.

    class Vector { ... }
    
    console.log(Vector.prototype);  // { dot: ..., magnitude: ..., ... }
    let vec = new Vector(3, 4);
    console.log(Object.getPrototypeOf(vec));  // same as Vector.prototype
    
    // i don't know why you would subclass vector but let's roll with it
    class Vectest extends Vector { ... }
    
    console.log(Vectest.prototype);  // { ... }
    console.log(Object.getPrototypeOf(Vectest.prototype))  // same as Vector.prototype
    

    Alas, class syntax has a couple shortcomings. You can’t use the class block to assign arbitrary data to either the type object or the prototype — apparently it was deemed too confusing that mutations would be shared among instances. Which… is… how prototypes work. How Python works. How JavaScript itself, one of the most popular languages of all time, has worked for twenty-two years. Argh.

    You can still do whatever assignment you want outside of the class block, of course. It’s just a little ugly, and not something I’d think to look for with a sugary class.

    A more subtle result of this behavior is that a class block isn’t quite the same syntax as an object literal. The check for data isn’t a runtime thing; class Foo { x: 3 } fails to parse. So JavaScript now has two largely but not entirely identical styles of key/value block.

    Attribute access

    Here’s where things start to come apart at the seams, just a little bit.

    JavaScript doesn’t really have an attribute protocol. Instead, it has two… extension points, I suppose.

    One is Object.defineProperty, seen above. For common cases, there’s also the get syntax inside an object literal, which does the same thing. But unlike Python’s @property, these aren’t wrappers around some simple primitives; they are the primitives. JavaScript is the only language of these four to have “property that runs code on access” as a completely separate first-class concept.
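
    For example, the magnitude property from earlier can live on a plain object literal, no class involved:

    let vec = {
        x: 3,
        y: 4,
        // the same primitive as Object.defineProperty's `get`, just nicer to type
        get magnitude() {
            return Math.sqrt(this.x * this.x + this.y * this.y);
        },
    };
    console.log(vec.magnitude);  // 5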

    If you want to intercept arbitrary attribute access (and some kinds of operators), there’s a completely different primitive: the Proxy type. It doesn’t add interception to your existing object in place; instead, it produces a wrapper object that supports interception and defers to the wrapped object by default.

    It’s cool to see composition used in this way, but also, extremely weird. If you want to make your own type that overloads in or calling, you have to return a Proxy that wraps your own type, rather than actually returning your own type. And (unlike the other three languages in this post) you can’t return a different type from a constructor, so you have to throw that away and produce objects only from a factory. And instanceof would be broken, but you can at least fix that with Symbol.hasInstance — which is really operator overloading, implemented in yet another completely different way.
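
    Here is a rough sketch of that factory-plus-Proxy dance; everything in it is made up for illustration:

    function makeBag(values) {
        return new Proxy(values, {
            // intercept attribute access
            get(target, key) {
                return key in target ? target[key] : "<missing: " + String(key) + ">";
            },
            // this trap is what the `in` operator consults
            has(target, key) {
                return true;  // pretend every key exists
            },
        });
    }

    let bag = makeBag({ x: 1 });
    console.log(bag.x);         // 1
    console.log(bag.anything);  // <missing: anything>
    console.log("y" in bag);    // true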

    I know the design here is a result of legacy and speed — if any object could intercept all attribute access, then all attribute access would be slowed down everywhere. Fair enough. It still leaves the surface area of the language a bit… bumpy?

    The JavaScript philosophy

    It’s a little hard to tell. The original idea of prototypes was interesting, but it was hidden behind some very awkward syntax. Since then, we’ve gotten a bunch of extra features awkwardly bolted on to reflect the wildly varied things the built-in types and DOM API were already doing. We have class syntax, but it’s been explicitly designed to avoid exposing the prototype parts of the model.

    I admit I don’t do a lot of heavy JavaScript, so I might just be overlooking it, but I’ve seen virtually no code that makes use of any of the recent advances in object capabilities. Forget about custom iterators or overloading call; I can’t remember seeing any JavaScript in the wild that even uses properties yet. I don’t know if everyone’s waiting for sufficient browser support, nobody knows about them, or nobody cares.

    The model has advanced recently, but I suspect JavaScript is still shackled to its legacy of “something about prototypes, I don’t really get it, just copy the other code that’s there” as an object model. Alas! Prototypes are so good. Hopefully class syntax will make it a bit more accessible, as it has in Python.

    Perl 5

    Perl 5 also doesn’t have an object system and expects you to build your own. But where Lua gives you two simple, powerful tools for building one, Perl 5 feels more like a puzzle with half the pieces missing. Clearly they were going for something, but they only gave you half of it.

    In brief, a Perl object is a reference that has been blessed with a package.

    I need to explain a few things. Honestly, one of the biggest problems with the original Perl object setup was how many strange corners and unique jargon you had to understand just to get off the ground.

    (If you want to try running any of this code, you should stick a use v5.26; as the first line. Perl is very big on backwards compatibility, so you need to opt into breaking changes, and even the mundane say builtin is behind a feature gate.)

    References

    A reference in Perl is sort of like a pointer, but its main use is very different. See, Perl has the strange property that its data structures try very hard to spill their contents all over the place. Despite having dedicated syntax for arrays — @foo is an array variable, distinct from the single scalar variable $foo — it’s actually impossible to nest arrays.

    my @foo = (1, 2, 3, 4);
    my @bar = (@foo, @foo);
    # @bar is now a flat list of eight items: 1, 2, 3, 4, 1, 2, 3, 4
    

    The idea, I guess, is that an array is not one thing. It’s not a container, which happens to hold multiple things; it is multiple things. Anywhere that expects a single value, such as an array element, cannot contain an array, because an array fundamentally is not a single value.

    And so we have “references”, which are a form of indirection, but also have the nice property that they’re single values. They add containment around arrays, and in general they make working with most of Perl’s primitive types much more sensible. A reference to a variable can be taken with the \ operator, or you can use [ ... ] and { ... } to directly create references to anonymous arrays or hashes.

    my @foo = (1, 2, 3, 4);
    my @bar = (\@foo, \@foo);
    # @bar is now a nested list of two items: [1, 2, 3, 4], [1, 2, 3, 4]
    

    (Incidentally, this is the sole reason I initially abandoned Perl for Python. Non-trivial software kinda requires nesting a lot of data structures, so you end up with references everywhere, and the syntax for going back and forth between a reference and its contents is tedious and ugly.)

    A Perl object must be a reference. Perl doesn’t care what kind of reference — it’s usually a hash reference, since hashes are a convenient place to store arbitrary properties, but it could just as well be a reference to an array, a scalar, or even a sub (i.e. function) or filehandle.

    I’m getting a little ahead of myself. First, the other half: blessing and packages.

    Packages and blessing

    Perl packages are just namespaces. A package looks like this:

    package Foo::Bar;
    
    sub quux {
        say "hi from quux!";
    }
    
    # now Foo::Bar::quux() can be called from anywhere
    

    Nothing shocking, right? It’s just a named container. A lot of the details are kind of weird, like how a package exists in some liminal quasi-value space, but the basic idea is a Bag Of Stuff.

    The final piece is “blessing,” which is Perl’s funny name for binding a package to a reference. A very basic class might look like this:

    package Vector;
    
    # the name 'new' is convention, not special
    sub new {
        # perl argument passing is weird, don't ask
        my ($class, $x, $y) = @_;
    
        # create the object itself -- here, unusually, an array reference makes sense
        my $self = [ $x, $y ];
    
        # associate the package with that reference
        # note that $class here is just the regular string, 'Vector'
        bless $self, $class;
    
        return $self;
    }
    
    sub x {
        my ($self) = @_;
        return $self->[0];
    }
    
    sub y {
        my ($self) = @_;
        return $self->[1];
    }
    
    sub magnitude {
        my ($self) = @_;
        return sqrt($self->x ** 2 + $self->y ** 2);
    }
    
    # switch back to the "default" package
    package main;
    
    # -> is method call syntax, which passes the invocant as the first argument;
    # for a package, that's just the package name
    my $vec = Vector->new(3, 4);
    say $vec->magnitude;  # 5
    

    A few things of note here. First, $self->[0] has nothing to do with objects; it’s normal syntax for getting the value at index 0 out of an array reference called $self. (Most classes are based on hashrefs and would use $self->{value} instead.) A blessed reference is still a reference and can be treated like one.

    In general, -> is Perl’s dereferencey operator, but its exact behavior depends on what follows. If it’s followed by brackets, then it’ll apply the brackets to the thing in the reference: ->{} to index a hash reference, ->[] to index an array reference, and ->() to call a function reference.

    But if -> is followed by an identifier, then it’s a method call. For packages, that means calling a function in the package and passing the package name as the first argument. For objects — blessed references — that means calling a function in the associated package and passing the object as the first argument.
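
    A quick sketch of the difference, with made-up data (this assumes the Vector package from above and the use v5.26 mentioned earlier):

    my $hashref  = { colour => 'red' };
    my $arrayref = [ 10, 20, 30 ];
    my $subref   = sub { return "called with @_" };

    say $hashref->{colour};   # red              (brackets: dereference)
    say $arrayref->[1];       # 20
    say $subref->(1, 2);      # called with 1 2

    my $vec = Vector->new(3, 4);   # identifier: method call, invocant is the package name
    say $vec->magnitude;           # 5, invocant is the blessed reference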

    This is a little weird! A blessed reference is a superposition of two things: its normal reference behavior, and some completely orthogonal object behavior. Also, object behavior has no notion of methods vs data; it only knows about methods. Perl lets you omit parentheses in a lot of places, including when calling a method with no arguments, so $vec->magnitude is really $vec->magnitude().

    Perl’s blessing bears some similarities to Lua’s metatables, but ultimately Perl is much closer to Ruby’s “message passing” approach than the above three languages’ approaches of “get me something and maybe it’ll be callable”. (But this is no surprise — Ruby is a spiritual successor to Perl 5.)

    All of this leads to one little wrinkle: how do you actually expose data? Above, I had to write x and y methods. Am I supposed to do that for every single attribute on my type?

    Yes! But don’t worry, there are third-party modules to help with this incredibly fundamental task. Take Class::Accessor::Fast, so named because it’s faster than Class::Accessor:

    package Foo;
    use base qw(Class::Accessor::Fast);
    __PACKAGE__->mk_accessors(qw(fred wilma barney));
    

    (__PACKAGE__ is the lexical name of the current package; qw(...) is a list literal that splits its contents on whitespace.)

    This assumes you’re using a hashref with keys of the same names as the attributes. $obj->fred will return the fred key from your hashref, and $obj->fred(4) will change it to 4.

    You also, somewhat bizarrely, have to inherit from Class::Accessor::Fast. Speaking of which,

    Inheritance

    Inheritance is done by populating the package-global @ISA array with some number of (string) names of parent packages. Most code instead opts to write use base ...;, which does the same thing. Or, more commonly, use parent ...;, which… also… does the same thing.
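
    As a sketch, all three spellings of “Cat inherits from Animal”, with package names of my own:

    package Cat;

    # the modern spelling; -norequire skips loading Animal.pm from disk, which is
    # handy when both packages live in the same file
    use parent -norequire, 'Animal';

    # older spellings that do the same thing:
    # use base 'Animal';
    # our @ISA = ('Animal');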

    Every package implicitly inherits from UNIVERSAL, which can be freely modified by Perl code.

    A method can call its superclass method with the SUPER:: pseudo-package:

    sub foo {
        my ($self) = @_;
        $self->SUPER::foo;
    }
    

    However, this does a depth-first search, which means it almost certainly does the wrong thing when faced with multiple inheritance. For a while the accepted solution involved a third-party module, but Perl eventually grew an alternative you have to opt into: C3, which may be more familiar to you as the order Python uses.

    use mro 'c3';
    
    sub foo {
        my ($self) = @_;
        $self->next::method;
    }
    

    Offhand, I’m not actually sure how next::method works, seeing as it was originally implemented in pure Perl code. I suspect it involves peeking at the caller’s stack frame. If so, then this is a very different style of customizability from e.g. Python — the MRO was never intended to be pluggable, and the use of a special pseudo-package means it isn’t really, but someone was determined enough to make it happen anyway.

    Operator overloading and whatnot

    Operator overloading looks a little weird, though really it’s pretty standard Perl.

    package MyClass;
    
    use overload '+' => \&_add;
    
    sub _add {
        my ($self, $other, $swap) = @_;
        ...
    }
    

    use overload here is a pragma, where “pragma” means “regular-ass module that does some wizardry when imported”.

    \&_add is how you get a reference to the _add sub so you can pass it to the overload module. If you just said &_add or _add, that would call it.

    And that’s it; you just pass a map of operators to functions to this built-in module. No worry about name clashes or pollution, which is pretty nice. You don’t even have to give references to functions that live in the package, if you don’t want them to clog your namespace; you could put them in another package, or even inline them anonymously.

    One especially interesting thing is that Perl lets you overload every operator. Perl has a lot of operators. It considers some math builtins like sqrt and trig functions to be operators, or at least operator-y enough that you can overload them. You can also overload the “file text” operators, such as -e $path to test whether a file exists. You can overload conversions, including implicit conversion to a regex. And most fascinating to me, you can overload dereferencing — that is, the thing Perl does when you say $hashref->{key} to get at the underlying hash. So a single object could pretend to be references of multiple different types, including a subref to implement callability. Neat.

    Somewhat related: you can overload basic operators (indexing, etc.) on basic types (not references!) with the tie function, which is designed completely differently and looks for methods with fixed names. Go figure.

    You can intercept calls to nonexistent methods by implementing a function called AUTOLOAD, within which the $AUTOLOAD global will contain the name of the method being called. Originally this feature was, I think, intended for loading binary components or large libraries on-the-fly only when needed, hence the name. Offhand I’m not sure I ever saw it used the way __getattr__ is used in Python.

    Is there a way to intercept all method calls? I don’t think so, but it is Perl, so I must be forgetting something.

    Actually no one does this any more

    Like a decade ago, a council of elder sages sat down and put together a whole whizbang system that covers all of it: Moose.

    package Vector;
    use Moose;
    
    has x => (is => 'rw', isa => 'Int');
    has y => (is => 'rw', isa => 'Int');
    
    sub magnitude {
        my ($self) = @_;
        return sqrt($self->x ** 2 + $self->y ** 2);
    }
    

    Moose has its own way to do pretty much everything, and it’s all built on the same primitives. Moose also adds metaclasses, somehow, despite that the underlying model doesn’t actually support them? I’m not entirely sure how they managed that, but I do remember doing some class introspection with Moose and it was much nicer than the built-in way.

    (If you’re wondering, the built-in way begins with looking at the hash called %Vector::. No, that’s not a typo.)

    I really cannot stress enough just how much stuff Moose does, but I don’t want to delve into it here since Moose itself is not actually the language model.

    The Perl philosophy

    I hope you can see what I meant with what I first said about Perl, now. It has multiple inheritance with an MRO, but uses the wrong one by default. It has extensive operator overloading, which looks nothing like how inheritance works, and also some of it uses a totally different mechanism with special method names instead. It only understands methods, not data, leaving you to figure out accessors by hand.

    There’s 70% of an object system here with a clear general design it was gunning for, but none of the pieces really look anything like each other. It’s weird, in a distinctly Perl way.

    The result is certainly flexible, at least! It’s especially cool that you can use whatever kind of reference you want for storage, though even as I say that, I acknowledge it’s no different from simply subclassing list or something in Python. It feels different in Perl, but maybe only because it looks so different.

    I haven’t written much Perl in a long time, so I don’t know what the community is like any more. Moose was already ubiquitous when I left, which you’d think would let me say “the community mostly focuses on the stuff Moose can do” — but even a decade ago, Moose could already do far more than I had ever seen done by hand in Perl. It’s always made a big deal out of roles (read: interfaces), for instance, despite that I’d never seen anyone care about them in Perl before Moose came along. Maybe their presence in Moose has made them more popular? Who knows.

    Also, I wrote Perl seriously, but in the intervening years I’ve only encountered people who only ever used Perl for one-offs. Maybe it’ll come as a surprise to a lot of readers that Perl has an object model at all.

    End

    Well, that was fun! I hope any of that made sense.

    Special mention goes to Rust, which doesn’t have an object model you can fiddle with at runtime, but does do things a little differently.

    It’s been really interesting thinking about how tiny differences make a huge impact on what people do in practice. Take the choice of storage in Perl versus Python. Perl’s massively common URI class uses a string as the storage, nothing else; I haven’t seen anything like that in Python aside from markupsafe, which is specifically designed as a string type. I would guess this is partly because Perl makes you choose — using a hashref is an obvious default, but you have to make that choice one way or the other. In Python (especially 3), inheriting from object and getting dict-based storage is the obvious thing to do; the ability to use another type isn’t quite so obvious, and doing it “right” involves a tiny bit of extra work.

    Or, consider that Lua could have descriptors, but the extra bit of work (especially design work) has been enough of an impediment that I’ve never implemented them. I don’t think the object implementations I’ve looked at have included them, either. Super weird!

    In that light, it’s only natural that objects would be so strongly associated with the features Java and C++ attach to them. I think that makes it all the more important to play around! Look at what Moose has done. No, really, you should bear in mind my description of how Perl does stuff and flip through the Moose documentation. It’s amazing what they’ve built.

    Introducing AWS AppSync – Build data-driven apps with real-time and off-line capabilities

    Post Syndicated from Tara Walker original https://aws.amazon.com/blogs/aws/introducing-amazon-appsync/

    In this day and age, it is almost impossible to do without our mobile devices and the applications that help make our lives easier. As our dependency on our mobile phones grows, the mobile application market has exploded with millions of apps vying for our attention. For mobile developers, this means that we must ensure that we build applications that provide the quality, real-time experiences that app users desire.  Therefore, it has become essential that mobile applications are developed to include features such as multi-user data synchronization, offline network support, and data discovery, just to name a few.  According to several articles I read recently about mobile development trends in publications like InfoQ, DZone, and the mobile development blog AlleviateTech, one of the key elements in delivering the aforementioned capabilities is cloud-driven mobile applications.  It seems that this is especially true as it relates to mobile data synchronization and data storage.

    That being the case, it is a perfect time for me to announce a new service for building innovative mobile applications that are driven by data-intensive services in the cloud: AWS AppSync. AWS AppSync is a fully managed serverless GraphQL service for real-time data queries, synchronization, communications and offline programming features. For those not familiar, let me briefly share some information about the open GraphQL specification. GraphQL is a responsive data query language and server-side runtime for querying data sources that allow for real-time data retrieval and dynamic query execution. You can use GraphQL to build a responsive API for use when building client applications. GraphQL works at the application layer and provides a type system for defining schemas. These schemas serve as specifications to define how operations should be performed on the data and how the data should be structured when retrieved. Additionally, GraphQL has a declarative coding model which is supported by many client libraries and frameworks including React, React Native, iOS, and Android.

    Now the power of the GraphQL open standard query language is being brought to you in a rich managed service with AWS AppSync.  With AppSync developers can simplify the retrieval and manipulation of data across multiple data sources with ease, allowing them to quickly prototype, build and create robust, collaborative, multi-user applications. AppSync keeps data updated when devices are connected, but enables developers to build solutions that work offline by caching data locally and synchronizing local data when connections become available.

    Let’s discuss some key concepts of AWS AppSync and how the service works.

    AppSync Concepts

    • AWS AppSync Client: a service client that defines operations, wraps the authorization details of requests, and manages offline logic.
    • Data Source: the data storage system or a trigger housing data.
    • Identity: a set of credentials with permissions and identification context provided with requests to the GraphQL Proxy.
    • GraphQL Proxy: the GraphQL engine component for processing and mapping requests, handling conflict resolution, and managing Fine Grained Access Control.
    • Operation: one of three GraphQL operations supported in AppSync:
      • Query: a read-only fetch call to the data.
      • Mutation: a write of the data followed by a fetch.
      • Subscription: a long-lived connection that receives data in response to events.
    • Action: a notification to connected subscribers from a GraphQL subscription.
    • Resolver: a function using request and response mapping templates that converts and executes a payload against a data source.

    How It Works

    A schema is created to define the types and capabilities of the desired GraphQL API and tied to a Resolver function.  The schema can be created to mirror existing data sources, or AWS AppSync can create tables automatically based on the schema definition. Developers can also use GraphQL features for data discovery without having knowledge of the backend data sources. After a schema definition is established, an AWS AppSync client can be configured with an operation request, like a Query operation. The client submits the operation request to the GraphQL Proxy along with an identity context and credentials. The GraphQL Proxy passes this request to the Resolver, which maps and executes the request payload against pre-configured AWS data services like an Amazon DynamoDB table, an AWS Lambda function, or a search capability using Amazon Elasticsearch. The Resolver executes calls to one or all of these services within a single network call, minimizing CPU cycles and bandwidth needs, and returns the response to the client. Additionally, the client application can change data requirements in code on demand, and the AppSync GraphQL API will dynamically map requests for data accordingly, allowing prototyping and faster development.

    In order to take a quick peek at the service, I’ll go to the AWS AppSync console. I’ll click the Create API button to get started.

     

    When the Create new API screen opens, I’ll give my new API a name, TarasTestApp, and since I am just exploring the new service I will select the Sample schema option.  You may notice from the informational dialog box on the screen that in using the sample schema, AWS AppSync will automatically create the DynamoDB tables and the IAM roles for me. It will also deploy the TarasTestApp API on my behalf.  After reviewing the sample schema provided by the console, I’ll click the Create button to create my test API.

    After the TarasTestApp API has been created and the associated AWS resources provisioned on my behalf, I can make updates to the schema, data source, or connect my data source(s) to a resolver. I can also integrate my GraphQL API into an iOS, Android, Web, or React Native application by cloning the sample repo from GitHub and downloading the accompanying GraphQL schema.  These application samples are great to help get you started, and they are pre-configured to function in offline scenarios.

    If I select the Schema menu option on the console, I can update and view the TarasTestApp GraphQL API schema.


    Additionally, if I select the Data Sources menu option in the console, I can see the existing data sources.  Within this screen, I can update, delete, or add data sources if I so desire.

    Next, I will select the Query menu option which takes me to the console tool for writing and testing queries. Since I chose the sample schema and the AWS AppSync service did most of the heavy lifting for me, I’ll try a query against my new GraphQL API.

    I’ll use a mutation to add data for the event type in my schema. Since this is a mutation and it first writes data and then does a read of the data, I want the query to return values for name and where.
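
    The exact field names come from whatever sample schema the console generated for me, so treat the following as a rough sketch of the general shape rather than something to paste in verbatim:

    mutation AddEvent {
      createEvent(
        name: "AWS AppSync test event"
        when: "2017-11-30"
        where: "Las Vegas"
        description: "Trying out the TarasTestApp API"
      ) {
        id
        name
        where
      }
    }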

    If I go to the DynamoDB table created for the event type in the schema, I will see that the values from my query have been successfully written into the table. Now that was a pretty simple task to write and retrieve data based on a GraphQL API schema from a data source, don’t you think?


     Summary

    AWS AppSync is currently in Public Preview and you can sign up today. It supports development for iOS, Android, and JavaScript applications. You can take advantage of this managed GraphQL service by going to the AWS AppSync console, or learn more by reading the tutorial in the AWS documentation for the service or checking out our AWS AppSync Developer Guide.

    Tara

     

    ACE and CAP Shut Down Aussie Pirate IPTV Operation

    Post Syndicated from Andy original https://torrentfreak.com/ace-and-cap-shut-down-aussie-pirate-iptv-operation-171128/

    Instead of companies like the MPAA, Amazon, Netflix, CBS, HBO, BBC, Sky, Foxtel, and Village Roadshow tackling piracy completely solo, this year they teamed up to form the Alliance for Creativity and Entertainment (ACE).

    This massive collaboration of 30 companies represents a new front in the fight against piracy, with global players publicly cooperating to tackle the phenomenon in all its forms.

    The same is true of CASBAA’s Coalition Against Piracy (CAP), a separate anti-piracy collective which to some extent shares the same members as ACE but with a sharper focus on Asia.

    This morning the groups announced the results of a joint investigation in Australia which targeted a large supplier of illicit IPTV devices. These small set-top boxes, which come in several forms, are often configured to receive programming from unauthorized sources. In this particular case, they came pre-loaded to play pirated movies, television shows, sports programming, plus other content.

    The Melbourne-based company targeted by ACE and CAP allegedly sold these devices in Asia for many years. The company demanded AUS$400 (US$305) per IPTV unit and bundled each with a year’s subscription to pirated TV channels and on-demand movies from the US, EU, India and South East Asia markets.

    In the past, companies operating in these areas have often been met with overwhelming force including criminal action, but ACE and CAP appear to have reached an agreement with the company and its owner, even going as far as keeping their names out of the press.

    In return, the company has agreed to measures which will prevent people who have already invested in these boxes from accessing ACE and CAP content going forward. That is likely to result in a whole bunch of irritated customers.

    “The film and television industry has made significant investments to provide audiences with access to creative content how, where, and when they want it,” says ACE spokesperson Zoe Thorogood.

    “ACE and CAP members initiated this investigation as part of a comprehensive global approach to protect the legal marketplace for creative content, reduce online piracy, and bolster a creative economy that supports millions of workers. This latest action was part of a series of global actions to address the growth of illegal and unsafe piracy devices and apps.”

    Neil Gane, General Manager of the CASBAA Coalition Against Piracy (CAP), also weighed in with what are now becoming industry-standard warnings of losses to content makers and supposed risks to consumers.

    “These little black boxes are now beginning to dominate the piracy ecosystem, causing significant damage to all sectors of the content industry, from producers to telecommunication platforms,” Gane said.

    “They also pose a risk to consumers who face a well-documented increase in exposure to malware. The surge in availability of these illicit streaming devices is an international issue that requires a coordinated effort between industry and government. This will be the first of many disruption and enforcement initiatives on which CAP, ACE, and other industry associations will be collaborating together.”

    In September, TF revealed the secret agreement behind the ACE initiative, noting how the group’s founding members are required to commit $5m each annually to the project. The remaining 21 companies on the coalition’s Executive Committee put in $200,000 each.

    While today’s IPTV announcement was very public, ACE has already been flexing its muscles behind the scenes. Earlier this month we reported on several cases where UK-based Kodi addon developers were approached by the anti-piracy group and warned to shut down – or else.

    While all complied, each was warned not to reveal the terms of their agreement with ACE. This means that the legal basis for its threats remains shrouded in mystery. That being said, it’s likely that several European Court of Justice decisions earlier in the year played a key role.

    Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

    Using Amazon Redshift Spectrum, Amazon Athena, and AWS Glue with Node.js in Production

    Post Syndicated from Rafi Ton original https://aws.amazon.com/blogs/big-data/using-amazon-redshift-spectrum-amazon-athena-and-aws-glue-with-node-js-in-production/

    This is a guest post by Rafi Ton, founder and CEO of NUVIAD. NUVIAD is, in their own words, “a mobile marketing platform providing professional marketers, agencies and local businesses state of the art tools to promote their products and services through hyper targeting, big data analytics and advanced machine learning tools.”

    At NUVIAD, we’ve been using Amazon Redshift as our main data warehouse solution for more than 3 years.

    We store massive amounts of ad transaction data that our users and partners analyze to determine ad campaign strategies. When running real-time bidding (RTB) campaigns in large scale, data freshness is critical so that our users can respond rapidly to changes in campaign performance. We chose Amazon Redshift because of its simplicity, scalability, performance, and ability to load new data in near real time.

    Over the past three years, our customer base grew significantly and so did our data. We saw our Amazon Redshift cluster grow from three nodes to 65 nodes. To balance cost and analytics performance, we looked for a way to store large amounts of less-frequently analyzed data at a lower cost. Yet, we still wanted to have the data immediately available for user queries and to meet their expectations for fast performance. We turned to Amazon Redshift Spectrum.

    In this post, I explain the reasons why we extended Amazon Redshift with Redshift Spectrum as our modern data warehouse. I cover how our data growth and the need to balance cost and performance led us to adopt Redshift Spectrum. I also share key performance metrics in our environment, and discuss the additional AWS services that provide a scalable and fast environment, with data available for immediate querying by our growing user base.

    Amazon Redshift as our foundation

    The ability to provide fresh, up-to-the-minute data to our customers and partners was always a main goal with our platform. We saw other solutions provide data that was a few hours old, but this was not good enough for us. We insisted on providing the freshest data possible. For us, that meant loading Amazon Redshift in frequent micro batches and allowing our customers to query Amazon Redshift directly to get results in near real time.

    The benefits were immediately evident. Our customers could see how their campaigns performed faster than with other solutions, and react sooner to the ever-changing media supply pricing and availability. They were very happy.

    However, this approach required Amazon Redshift to store a lot of data for long periods, and our data grew substantially. In our peak, we maintained a cluster running 65 DC1.large nodes. The impact on our Amazon Redshift cluster was evident, and we saw our CPU utilization grow to 90%.

    Why we extended Amazon Redshift to Redshift Spectrum

    Redshift Spectrum gives us the ability to run SQL queries using the powerful Amazon Redshift query engine against data stored in Amazon S3, without needing to load the data. With Redshift Spectrum, we store data where we want, at the cost that we want. We have the data available for analytics when our users need it with the performance they expect.

    Seamless scalability, high performance, and unlimited concurrency

    Scaling Redshift Spectrum is a simple process. First, it allows us to leverage Amazon S3 as the storage engine and get practically unlimited data capacity.

    Second, if we need more compute power, we can leverage Redshift Spectrum’s distributed compute engine over thousands of nodes to provide superior performance – perfect for complex queries running against massive amounts of data.

    Third, all Redshift Spectrum clusters access the same data catalog so that we don’t have to worry about data migration at all, making scaling effortless and seamless.

    Lastly, since Redshift Spectrum distributes queries across potentially thousands of nodes, they are not affected by other queries, providing much more stable performance and unlimited concurrency.

    Keeping it SQL

    Redshift Spectrum uses the same query engine as Amazon Redshift. This means that we did not need to change our BI tools or query syntax, whether we used complex queries across a single table or joins across multiple tables.

    An interesting capability introduced recently is the ability to create a view that spans both Amazon Redshift and Redshift Spectrum external tables. With this feature, you can query frequently accessed data in your Amazon Redshift cluster and less-frequently accessed data in Amazon S3, using a single view.
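
    As a sketch (the schema and table names here are mine, not from our system), such a view must be created WITH NO SCHEMA BINDING because it references an external table:

    CREATE VIEW analytics.all_clicks AS
        SELECT user_id, ts, billing FROM public.recent_clicks      -- a regular Amazon Redshift table
        UNION ALL
        SELECT user_id, ts, billing FROM spectrum.archived_clicks  -- an external table on S3
    WITH NO SCHEMA BINDING;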

    Leveraging Parquet for higher performance

    Parquet is a columnar data format that provides superior performance and allows Redshift Spectrum (or Amazon Athena) to scan significantly less data. With less I/O, queries run faster and we pay less per query. You can read all about Parquet at https://parquet.apache.org/ or https://en.wikipedia.org/wiki/Apache_Parquet.

    Lower cost

    From a cost perspective, we pay standard rates for our data in Amazon S3, and only small amounts per query to analyze data with Redshift Spectrum. Using the Parquet format, we can significantly reduce the amount of data scanned. Our costs are now lower, and our users get fast results even for large complex queries.

    What we learned about Amazon Redshift vs. Redshift Spectrum performance

    When we first started looking at Redshift Spectrum, we wanted to put it to the test. We wanted to know how it would compare to Amazon Redshift, so we looked at two key questions:

    1. What is the performance difference between Amazon Redshift and Redshift Spectrum on simple and complex queries?
    2. Does the data format impact performance?

    During the migration phase, we had our dataset stored in Amazon Redshift and S3 as CSV/GZIP and as Parquet file formats. We tested three configurations:

    • Amazon Redshift cluster with 28 DC1.large nodes
    • Redshift Spectrum using CSV/GZIP
    • Redshift Spectrum using Parquet

    We performed benchmarks for simple and complex queries on one month’s worth of data. We tested how much time it took to perform the query, and how consistent the results were when running the same query multiple times. The data we used for the tests was already partitioned by date and hour. Properly partitioning the data improves performance significantly and reduces query times.

    Simple query

    First, we tested a simple query aggregating billing data across a month:

    SELECT 
      user_id, 
      count(*) AS impressions, 
      SUM(billing)::decimal /1000000 AS billing 
    FROM <table_name> 
    WHERE 
      date >= '2017-08-01' AND 
      date <= '2017-08-31'  
    GROUP BY 
      user_id;

    We ran the same query seven times and measured the response times:

    Execution Time (seconds)
              Amazon Redshift   Redshift Spectrum CSV   Redshift Spectrum Parquet
    Run #1    39.65             45.11                   11.92
    Run #2    15.26             43.13                   12.05
    Run #3    15.27             46.47                   13.38
    Run #4    21.22             51.02                   12.74
    Run #5    17.27             43.35                   11.76
    Run #6    16.67             44.23                   13.67
    Run #7    25.37             40.39                   12.75
    Average   21.53             44.82                   12.61

    For simple queries, Amazon Redshift performed better than Redshift Spectrum, as we thought, because the data is local to Amazon Redshift.

    What was surprising was that using Parquet data format in Redshift Spectrum significantly beat ‘traditional’ Amazon Redshift performance. For our queries, using Parquet data format with Redshift Spectrum delivered an average 40% performance gain over traditional Amazon Redshift. Furthermore, Redshift Spectrum showed high consistency in execution time with a smaller difference between the slowest run and the fastest run.

    Comparing the amount of data scanned when using CSV/GZIP and Parquet, the difference was also significant:

    Data Scanned (GB)
    CSV (Gzip) 135.49
    Parquet 2.83

    Because we pay only for the data scanned by Redshift Spectrum, the cost saving of using Parquet is evident and substantial.

    Complex query

    Next, we compared the same three configurations with a complex query.

    Execution Time (seconds)
              Amazon Redshift   Redshift Spectrum CSV   Redshift Spectrum Parquet
    Run #1    329.80            84.20                   42.40
    Run #2    167.60            65.30                   35.10
    Run #3    165.20            62.20                   23.90
    Run #4    273.90            74.90                   55.90
    Run #5    167.70            69.00                   58.40
    Average   220.84            71.12                   43.14

    This time, Redshift Spectrum using Parquet cut the average query time by 80% compared to traditional Amazon Redshift!

    Bottom line: For complex queries, Redshift Spectrum provided a 67% performance gain over Amazon Redshift. Using the Parquet data format, Redshift Spectrum delivered an 80% performance improvement over Amazon Redshift. For us, this was substantial.

    Optimizing the data structure for different workloads

    Because the cost of S3 is relatively inexpensive and we pay only for the data scanned by each query, we believe that it makes sense to keep our data in different formats for different workloads and different analytics engines. It is important to note that we can have any number of tables pointing to the same data on S3. It all depends on how we partition the data and update the table partitions.

    Data permutations

    For example, we have a process that runs every minute and generates statistics for the last minute of data collected. With Amazon Redshift, this would be done by running the query on the table with something as follows:

    SELECT 
      user, 
      COUNT(*) 
    FROM 
      events_table 
    WHERE 
      ts BETWEEN '2017-08-01 14:00:00' AND '2017-08-01 14:00:59' 
    GROUP BY 
      user;

    (Assuming ‘ts’ is your column storing the time stamp for each event.)

    With Redshift Spectrum, we pay for the data scanned in each query. If the data is partitioned by the minute instead of the hour, a query looking at one minute would be 1/60th the cost. If we use a temporary table that points only to the data of the last minute, we save that unnecessary cost.
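
    For example, a rough sketch of that idea (bucket, prefix, and column names are mine): an external table whose location points only at the latest minute's prefix, so a query scans just that slice:

    CREATE EXTERNAL TABLE spectrum.events_last_minute (
        user_id varchar(50),
        ts bigint
    )
    STORED AS PARQUET
    LOCATION 's3://my-events-bucket/minute=2017-08-01-14-00/';

    -- this now scans (and bills for) only that single minute of data
    SELECT user_id, COUNT(*)
    FROM spectrum.events_last_minute
    GROUP BY user_id;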

    Creating Parquet data efficiently

    On the average, we have 800 instances that process our traffic. Each instance sends events that are eventually loaded into Amazon Redshift. When we started three years ago, we would offload data from each server to S3 and then perform a periodic copy command from S3 to Amazon Redshift.

    Recently, Amazon Kinesis Firehose added the capability to offload data directly to Amazon Redshift. While this is now a viable option, we kept the same collection process that worked flawlessly and efficiently for three years.

    This changed, however, when we incorporated Redshift Spectrum. With Redshift Spectrum, we needed to find a way to:

    • Collect the event data from the instances.
    • Save the data in Parquet format.
    • Partition the data effectively.

    To accomplish this, we save the data as CSV and then transform it to Parquet. The most effective method to generate the Parquet files is to:

    1. Send the data in one-minute intervals from the instances to Kinesis Firehose with an S3 temporary bucket as the destination.
    2. Aggregate hourly data and convert it to Parquet using AWS Lambda and AWS Glue.
    3. Add the Parquet data to S3 by updating the table partitions (a sketch of this step follows the list).
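
    Step 3 boils down to one ALTER TABLE statement per new partition. A rough sketch, using the table defined in the next section and an S3 path layout of my own:

    ALTER TABLE spectrum.blog_clicks
    ADD PARTITION (date='2017-08-01', hour=14)
    LOCATION 's3://nuviad-temp/blog/clicks/date=2017-08-01/hour=14/';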

    With this new process, we had to give more attention to validating the data before we sent it to Kinesis Firehose, because a single corrupted record in a partition fails queries on that partition.

    Data validation

    To store our click data in a table, we considered the following SQL create table command:

    create external TABLE spectrum.blog_clicks (
        user_id varchar(50),
        campaign_id varchar(50),
        os varchar(50),
        ua varchar(255),
        ts bigint,
        billing float
    )
    partitioned by (date date, hour smallint)  
    stored as parquet
    location 's3://nuviad-temp/blog/clicks/';

    The above statement defines a new external table (all Redshift Spectrum tables are external tables) with a few attributes. We stored ‘ts’ as a Unix time stamp and not as Timestamp, and billing data is stored as float and not decimal (more on that later). We also said that the data is partitioned by date and hour, and then stored as Parquet on S3.

    First, we need to get the table definitions. This can be achieved by running the following query:

    SELECT 
      * 
    FROM 
      svv_external_columns 
    WHERE 
      tablename = 'blog_clicks';

    This query lists all the columns in the table with their respective definitions:

    schemaname tablename columnname external_type columnnum part_key
    spectrum blog_clicks user_id varchar(50) 1 0
    spectrum blog_clicks campaign_id varchar(50) 2 0
    spectrum blog_clicks os varchar(50) 3 0
    spectrum blog_clicks ua varchar(255) 4 0
    spectrum blog_clicks ts bigint 5 0
    spectrum blog_clicks billing double 6 0
    spectrum blog_clicks date date 7 1
    spectrum blog_clicks hour smallint 8 2

    Now we can use this data to create a validation schema for our data:

    const rtb_request_schema = {
        "name": "clicks",
        "items": {
            "user_id": {
                "type": "string",
                "max_length": 100
            },
            "campaign_id": {
                "type": "string",
                "max_length": 50
            },
            "os": {
                "type": "string",
                "max_length": 50            
            },
            "ua": {
                "type": "string",
                "max_length": 255            
            },
            "ts": {
                "type": "integer",
                "min_value": 0,
                "max_value": 9999999999999
            },
            "billing": {
                "type": "float",
                "min_value": 0,
                "max_value": 9999999999999
            }
        }
    };

    Next, we create a function that uses this schema to validate data:

    function valueIsValid(value, schema) {
        if (schema.type == 'string') {
            return (typeof value == 'string' && value.length <= schema.max_length);
        }
        else if (schema.type == 'integer') {
            return (typeof value == 'number' && value >= schema.min_value && value <= schema.max_value);
        }
        else if (schema.type == 'float' || schema.type == 'double') {
            return (typeof value == 'number' && value >= schema.min_value && value <= schema.max_value);
        }
        else if (schema.type == 'boolean') {
            return typeof value == 'boolean';
        }
        else if (schema.type == 'timestamp') {
            return (new Date(value)).getTime() > 0;
        }
        else {
            return true;
        }
    }
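    To make the flow concrete, here is a minimal sketch of how a full record could be checked against this schema before being sent on. The recordIsValid helper and the sample record are illustrative assumptions, not our exact production code; the resulting validated flag and item array feed the Kinesis Firehose call shown in the next section:

    // Sketch: validate every field of one event record against the schema above.
    // The sample record and field order are illustrative assumptions.
    function recordIsValid(record, request_schema) {
        return Object.keys(request_schema.items).every(field =>
            valueIsValid(record[field], request_schema.items[field]));
    }

    const record = {
        user_id: 'user-123',
        campaign_id: 'campaign-456',
        os: 'iOS',
        ua: 'Mozilla/5.0',
        ts: 1501596000,
        billing: 0.0015
    };

    let validated = recordIsValid(record, rtb_request_schema);
    let item = Object.keys(rtb_request_schema.items).map(field => record[field]);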

    Near real-time data loading with Kinesis Firehose

    On Kinesis Firehose, we created a new delivery stream to handle the events as follows:

    Delivery stream name: events
    Source: Direct PUT
    S3 bucket: nuviad-events
    S3 prefix: rtb/
    IAM role: firehose_delivery_role_1
    Data transformation: Disabled
    Source record backup: Disabled
    S3 buffer size (MB): 100
    S3 buffer interval (sec): 60
    S3 Compression: GZIP
    S3 Encryption: No Encryption
    Status: ACTIVE
    Error logging: Enabled

    This delivery stream buffers event data for one minute, or until 100 MB accumulates (whichever comes first), and writes it to the S3 bucket as a GZIP-compressed CSV file. Once a record has been validated, we can safely send it to the Kinesis Firehose API:

    if (validated) {
        let itemString = item.join('|') + '\n'; // build a pipe-delimited CSV line with a trailing newline
    
        let params = {
            DeliveryStreamName: 'events',
            Record: {
                Data: itemString
            }
        };
    
        firehose.putRecord(params, function(err, data) {
            if (err) {
                console.error(err, err.stack);        
            }
            else {
                // Continue to your next step 
            }
        });
    }

    Now, we have a single CSV file representing one minute of event data stored in S3. The files are named automatically by Kinesis Firehose by adding a UTC time prefix in the format YYYY/MM/DD/HH before writing objects to S3. Because we use the date and hour as partitions, we need to change the file naming and location to fit our Redshift Spectrum schema.

    Automating data distribution using AWS Lambda

    We created a simple Lambda function triggered by an S3 put event that copies the file to a different location (or locations), while renaming it to fit our data structure and processing flow. As mentioned before, the files generated by Kinesis Firehose are structured in a pre-defined hierarchy, such as:

    s3://your-bucket/your-prefix/2017/08/01/20/events-4-2017-08-01-20-06-06-536f5c40-6893-4ee4-907d-81e4d3b09455.gz

    All we need to do is parse the object name and restructure it as we see fit. In our case, we did the following (the event is an object received in the Lambda function with all the data about the object written to S3):

    /*
        Object key structure in the event object:
        your-prefix/2017/08/01/20/events-4-2017-08-01-20-06-06-536f5c40-6893-4ee4-907d-81e4d3b09455.gz
    */

    let key_parts = event.Records[0].s3.object.key.split('/');

    let event_type = key_parts[0];
    let date = key_parts[1] + '-' + key_parts[2] + '-' + key_parts[3];
    let hour = key_parts[4];
    if (hour.indexOf('0') == 0) {
        hour = parseInt(hour, 10) + ''; // strip the leading zero
    }

    // File name format: <stream name>-<version>-YYYY-MM-DD-HH-MM-SS-<uuid>.gz
    let parts1 = key_parts[5].split('-');
    let minute = parts1[6]; // index 6 holds the minute (the 'events' stream name contains no dash)
    if (minute.indexOf('0') == 0) {
        minute = parseInt(minute, 10) + ''; // strip the leading zero
    }

    Now, we can redistribute the file to the two destinations we need—one for the minute processing task and the other for hourly aggregation:

        copyObjectToHourlyFolder(event, date, hour, minute)
            .then(copyObjectToMinuteFolder.bind(null, event, date, hour, minute))
            .then(addPartitionToSpectrum.bind(null, event, date, hour, minute))
            .then(deleteOldMinuteObjects.bind(null, event))
            .then(deleteStreamObject.bind(null, event))        
            .then(result => {
                callback(null, { message: 'done' });            
            })
            .catch(err => {
                console.error(err);
                callback(null, { message: err });            
            }); 

    Kinesis Firehose stores the data in a temporary folder. We copy the object to another folder that holds the data for the last processed minute. This folder is connected to a small Redshift Spectrum table where the data is being processed without needing to scan a much larger dataset. We also copy the data to a folder that holds the data for the entire hour, to be later aggregated and converted to Parquet.
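    As an illustration, here is a minimal sketch of what one of these copy helpers could look like, using the S3 copyObject API. The destination prefix layout is an assumption for illustration, not our exact production code:

    // Sketch of a copy helper used in the promise chain above.
    // The destination prefix ('events-minute/...') is an illustrative assumption.
    const AWS = require('aws-sdk');
    const s3 = new AWS.S3();

    function copyObjectToMinuteFolder(event, date, hour, minute) {
        const srcBucket = event.Records[0].s3.bucket.name;
        const srcKey = event.Records[0].s3.object.key;
        const fileName = srcKey.split('/').pop();
        const destKey = 'events-minute/' + date + '/' + hour + '/' + minute + '/' + fileName;

        return s3.copyObject({
            Bucket: srcBucket,                    // copy within the same bucket
            CopySource: srcBucket + '/' + srcKey, // source object
            Key: destKey                          // minute-level destination
        }).promise();
    }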

    Because we partition the data by date and hour, we create a new partition on the Redshift Spectrum table whenever the processed minute is the first minute of the hour (that is, minute 0), by running the following:

    ALTER TABLE 
      spectrum.events 
    ADD PARTITION
      (date='2017-08-01', hour=0) 
      LOCATION 's3://nuviad-temp/events/2017-08-01/0/';

    After the data is processed and added to the table, we delete the processed data from the temporary Kinesis Firehose storage and from the minute storage folder.

    Migrating CSV to Parquet using AWS Glue and Amazon EMR

    The simplest way we found to run an hourly job converting our CSV data to Parquet is using Lambda and AWS Glue (and thanks to the awesome AWS Big Data team for their help with this).

    Creating AWS Glue jobs

    What this simple AWS Glue script does:

    • Gets parameters for the job, date, and hour to be processed
    • Creates a Spark EMR context allowing us to run Spark code
    • Reads CSV data into a DataFrame
    • Writes the data as Parquet to the destination S3 bucket
    • Adds or modifies the Redshift Spectrum / Amazon Athena table partition for the table

    Here is the script:

    import sys
    from awsglue.transforms import *
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext
    from awsglue.context import GlueContext
    from awsglue.job import Job
    import boto3
    
    ## @params: [JOB_NAME]
    args = getResolvedOptions(sys.argv, ['JOB_NAME','day_partition_key', 'hour_partition_key', 'day_partition_value', 'hour_partition_value' ])
    
    #day_partition_key = "partition_0"
    #hour_partition_key = "partition_1"
    #day_partition_value = "2017-08-01"
    #hour_partition_value = "0"
    
    day_partition_key = args['day_partition_key']
    hour_partition_key = args['hour_partition_key']
    day_partition_value = args['day_partition_value']
    hour_partition_value = args['hour_partition_value']
    
    print("Running for " + day_partition_value + "/" + hour_partition_value)
    
    sc = SparkContext()
    glueContext = GlueContext(sc)
    spark = glueContext.spark_session
    job = Job(glueContext)
    job.init(args['JOB_NAME'], args)
    
    df = spark.read.option("delimiter","|").csv("s3://nuviad-temp/events/"+day_partition_value+"/"+hour_partition_value)
    df.registerTempTable("data")
    
    df1 = spark.sql("select _c0 as user_id, _c1 as campaign_id, _c2 as os, _c3 as ua, cast(_c4 as bigint) as ts, cast(_c5 as double) as billing from data")
    
    df1.repartition(1).write.mode("overwrite").parquet("s3://nuviad-temp/parquet/"+day_partition_value+"/hour="+hour_partition_value)
    
    client = boto3.client('athena', region_name='us-east-1')
    
    response = client.start_query_execution(
        QueryString='alter table parquet_events add if not exists partition(' + day_partition_key + '=\'' + day_partition_value + '\',' + hour_partition_key + '=' + hour_partition_value + ')  location \'s3://nuviad-temp/parquet/' + day_partition_value + '/hour=' + hour_partition_value + '\'' ,
        QueryExecutionContext={
            'Database': 'spectrumdb'
        },
        ResultConfiguration={
            'OutputLocation': 's3://nuviad-temp/convertresults'
        }
    )
    
    response = client.start_query_execution(
        QueryString='alter table parquet_events partition(' + day_partition_key + '=\'' + day_partition_value + '\',' + hour_partition_key + '=' + hour_partition_value + ') set location \'s3://nuviad-temp/parquet/' + day_partition_value + '/hour=' + hour_partition_value + '\'' ,
        QueryExecutionContext={
            'Database': 'spectrumdb'
        },
        ResultConfiguration={
            'OutputLocation': 's3://nuviad-temp/convertresults'
        }
    )
    
    job.commit()

    Note: Because Redshift Spectrum and Athena both use the AWS Glue Data Catalog, we could use the Athena client to add the partition to the table.

    Here are a few words about float, decimal, and double. Using decimal proved to be more challenging than we expected, as it seems that Redshift Spectrum and Spark use them differently. Whenever we used decimal in Redshift Spectrum and in Spark, we kept getting errors, such as:

    S3 Query Exception (Fetch). Task failed due to an internal error. File 'https://s3-external-1.amazonaws.com/nuviad-temp/events/2017-08-01/hour=2/part-00017-48ae5b6b-906e-4875-8cde-bc36c0c6d0ca.c000.snappy.parquet has an incompatible Parquet schema for column 's3://nuviad-events/events.lat'. Column type: DECIMAL(18, 8), Parquet schema:\noptional float lat [i:4 d:1 r:0]\n (https://s3-external-1.amazonaws.com/nuviad-temp/events/2017-08-01/hour=2/part-00017-48ae5b6b-906e-4875-8cde-bc36c0c6d0ca.c000.snappy.parq

    We had to experiment with a few floating-point formats until we found that the only combination that worked was to define the column as double in the Spark code and float in Spectrum. This is the reason you see billing defined as float in Spectrum and double in the Spark code.

    Creating a Lambda function to trigger conversion

    Next, we created a simple Lambda function to trigger the AWS Glue script hourly, using a few lines of Python:

    import boto3
    import json
    from datetime import datetime, timedelta
     
    client = boto3.client('glue')
     
    def lambda_handler(event, context):
        last_hour_date_time = datetime.now() - timedelta(hours = 1)
        day_partition_value = last_hour_date_time.strftime("%Y-%m-%d") 
        hour_partition_value = last_hour_date_time.strftime("%-H") 
        response = client.start_job_run(
        JobName='convertEventsParquetHourly',
        Arguments={
             '--day_partition_key': 'date',
             '--hour_partition_key': 'hour',
             '--day_partition_value': day_partition_value,
             '--hour_partition_value': hour_partition_value
             }
        )

    Using Amazon CloudWatch Events, we trigger this function hourly. It starts the AWS Glue job named ‘convertEventsParquetHourly’ for the previous hour, passing the partition keys and values to process.

    Redshift Spectrum and Node.js

    Our development stack is based on Node.js, which is well-suited for high-speed, light servers that need to process a huge number of transactions. However, a few limitations of the Node.js environment required us to create workarounds and use other tools to complete the process.

    Node.js and Parquet

    The lack of Parquet modules for Node.js required us to implement an AWS Glue/Amazon EMR process to effectively migrate data from CSV to Parquet. We would rather save directly to Parquet, but we couldn’t find an effective way to do it.

    One interesting project in the works is the development of a Parquet NPM package by Marc Vertes called node-parquet (https://www.npmjs.com/package/node-parquet). It is not production-ready yet, but we think it is well worth following the progress of this package.

    Timestamp data type

    According to the Parquet documentation, Timestamp data are stored in Parquet as 64-bit integers. However, JavaScript does not support 64-bit integers, because the native number type is a 64-bit double, giving only 53 bits of integer range.

    The result is that you cannot store a Timestamp correctly in Parquet using Node.js. The solution is to store the Timestamp as a string and cast the type to Timestamp in the query. Using this method, we did not witness any performance degradation whatsoever.
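    For example, here is a minimal sketch of this approach (the helper below is illustrative): format the event time as a string on the Node.js side, define the column as varchar in the external table, and cast it back to Timestamp in the query:

    // Sketch: store the event time as a 'YYYY-MM-DD HH:MM:SS' string instead of
    // a 64-bit integer, since JavaScript numbers give only 53 bits of integer range.
    function toSqlTimestamp(date) {
        // '2017-08-01T14:00:59.123Z' -> '2017-08-01 14:00:59'
        return date.toISOString().replace('T', ' ').substring(0, 19);
    }

    let tsString = toSqlTimestamp(new Date());
    // The column is defined as varchar in the external table, and queries cast it back, e.g.:
    //   ... WHERE CAST(ts AS timestamp) BETWEEN '2017-08-01 14:00:00' AND '2017-08-01 14:00:59'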

    Lessons learned

    You can benefit from our trial-and-error experience.

    Lesson #1: Data validation is critical

    As mentioned earlier, a single corrupt entry in a partition can fail queries running against this partition, especially when using Parquet, which is harder to edit than a simple CSV file. Make sure that you validate your data before scanning it with Redshift Spectrum.

    Lesson #2: Structure and partition data effectively

    One of the biggest benefits of using Redshift Spectrum (or Athena for that matter) is that you don’t need to keep nodes up and running all the time. You pay only for the queries you perform and only for the data scanned per query.

    Keeping different permutations of your data for different queries makes a lot of sense in this case. For example, you can partition your data by date and hour to run time-based queries, and also have another set partitioned by user_id and date to run user-based queries. This results in faster and more efficient performance of your data warehouse.

    Storing data in the right format

    Use Parquet whenever you can. The benefits of Parquet are substantial: faster performance, less data to scan, and a much more efficient columnar format. However, it is not supported out-of-the-box by Kinesis Firehose, so you need to implement your own ETL. AWS Glue is a great option.

    Creating small tables for frequent tasks

    When we started using Redshift Spectrum, we saw our Amazon Redshift costs jump by hundreds of dollars per day. Then we realized that we were unnecessarily scanning a full day’s worth of data every minute. Take advantage of the ability to define multiple tables on the same S3 bucket or folder, and create temporary and small tables for frequent queries.

    Lesson #3: Combine Athena and Redshift Spectrum for optimal performance

    Moving to Redshift Spectrum also allowed us to take advantage of Athena as both use the AWS Glue Data Catalog. Run fast and simple queries using Athena while taking advantage of the advanced Amazon Redshift query engine for complex queries using Redshift Spectrum.
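    For example, a quick ad-hoc aggregation can be fired at the same catalog table from Node.js through Athena, while heavier joins keep running on the Amazon Redshift cluster via Redshift Spectrum. This is a minimal sketch; the database and table come from the earlier AWS Glue example, and the output location is an assumption:

    // Sketch: run a simple ad-hoc query through Athena against the same
    // AWS Glue Data Catalog table used by Redshift Spectrum.
    // The output location below is an illustrative assumption.
    const AWS = require('aws-sdk');
    const athena = new AWS.Athena({ region: 'us-east-1' });

    athena.startQueryExecution({
        QueryString: 'SELECT "date", "hour", COUNT(*) FROM parquet_events GROUP BY "date", "hour";',
        QueryExecutionContext: { Database: 'spectrumdb' },
        ResultConfiguration: { OutputLocation: 's3://nuviad-temp/athena-adhoc-results/' }
    }).promise()
        .then(data => console.log('Query started:', data.QueryExecutionId))
        .catch(err => console.error(err));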

    Redshift Spectrum excels when running complex queries. It can push many compute-intensive tasks, such as predicate filtering and aggregation, down to the Redshift Spectrum layer, so that queries use much less of your cluster’s processing capacity.

    Lesson #4: Sort your Parquet data within the partition

    We achieved another performance improvement by sorting data within the partition using sortWithinPartitions(sort_field). For example:

    df.repartition(1).sortWithinPartitions("campaign_id")…

    Conclusion

    We have been extremely pleased with Amazon Redshift as our core data warehouse for over three years. But as our client base and volume of data grew substantially, we extended Amazon Redshift to take advantage of the scalability, performance, and cost benefits of Redshift Spectrum.

    Redshift Spectrum lets us scale to virtually unlimited storage, scale compute transparently, and deliver super-fast results for our users. With Redshift Spectrum, we store data where we want at the cost we want, and have the data available for analytics when our users need it with the performance they expect.


    About the Author

    With 7 years of experience in the AdTech industry and 15 years in leading technology companies, Rafi Ton is the founder and CEO of NUVIAD. He enjoys exploring new technologies and putting them to use in cutting-edge products and services that generate real money in the real world. As an experienced entrepreneur, Rafi believes in practical programming and the fast adoption of new technologies to achieve a significant market advantage.