AWS Marketplace Now Offers Professional Services

Post Syndicated from Marcia Villalba original https://aws.amazon.com/blogs/aws/aws-marketplace-now-offers-professional-services/

Now with AWS Marketplace, customers can find and buy not only third-party software but also the professional services needed to support the full lifecycle of those products, including planning, deployment, and support. This simplifies the software supply chain, including tasks like managing provider relationships and procurement processes, and consolidates billing and invoices in one place.

Until today, customers have used AWS Marketplace for buying software and then used a separate process for contracting professional services. Many customers need extra professional services when they purchase third-party software, like premium support, implementation, or training. The additional effort of supporting different procurement processes impacts customers’ project timelines and adds a lot of complexity to the customer’s organization.

Last year we announced AWS IQ, a service that helps you engage with AWS Certified third-party experts for AWS project work. This year we want to go one step further and help you find professional services for all those third-party software solutions you currently buy from AWS Marketplace.

For the Buyers
Buyers can now discover professional services from multiple trusted sellers in AWS Marketplace, manage the invoices and payments for software and services together, and reduce procurement time, accelerating the process from months to days.

This new feature allows buyers to choose from a selection of professional services, such as assessments, implementation, premium support, managed services, and training, offered by consulting partners, managed service providers, and independent software vendors.

To get started finding and buying professional services, first you need to find the right service for you. If you are looking for a professional service associated with a particular piece of software, you can search for the software using the search tool in AWS Marketplace, and the related professional services will appear in the search results. Use the delivery method filter to narrow the results to just professional services.

Screenshot of searching for professional services

After you find the service you are looking for, you can visit the service details page and learn more about the listing. If you want to buy the service, just click Continue.

Screenshot of service page

That opens the request service form, where you can connect with the seller and request the service. The seller receives a notification and can then contact you to agree on the scope of the work, including deliverables, milestones, pricing, payment schedules, and service terms.

Screenshot of request service form

Once you agree with the seller on all the specific details of the contract, the seller sends you a private offer. The offer page then shows the private offer details instead of the request service form. You can review the pricing, payment schedule, and contract terms, and create the contract.

Screenshot of private offer

The service subscription starts after you review and accept the private offer on AWS Marketplace. You will receive an invoice from AWS Marketplace, and you can track your subscriptions in the buyer management console. Service purchases are itemized on your AWS invoice, simplifying payments and cost management.

For the Sellers
This new feature of AWS Marketplace enables you, the seller, to grow your business and reach new customers by listing your professional service offerings. You can list professional services as individual products or alongside existing software products in AWS Marketplace, using pricing, payment schedules, and service terms that are independent of the software.

In AWS Marketplace, you create your seller page, where your information as a seller is displayed to potential buyers.

Public professional service listings are discoverable by search and visible in your seller profile. You will receive customer requests for each of the services listed. Agree with the customer on the details of the service contract and then send a private offer to them.

Screenshot for creating a professional service

AWS Marketplace invoices and collects payments from the customers and distributes the funds to your bank account after the customers pay. AWS Marketplace also offers seller reports, updated daily, to help you understand how your business is doing.

Availability
To learn more about buying and selling professional services in AWS Marketplace, visit the AWS Marketplace service page.

Marcia

Managed Entitlements in AWS License Manager Streamlines License Tracking and Distribution for Customers and ISVs

Post Syndicated from Harunobu Kameda original https://aws.amazon.com/blogs/aws/managed-entitlements-for-aws-license-manager-streamlines-license-management-for-customers-and-isvs/

AWS License Manager is a service that helps you easily manage software licenses from vendors such as Microsoft, SAP, Oracle, and IBM across your Amazon Web Services (AWS) and on-premises environments. You can define rules based on your licensing agreements to prevent license violations, such as using more licenses than are available, and configure those rules either to help prevent violations or to notify you of breaches. AWS License Manager also offers automated discovery of bring-your-own-license (BYOL) usage that keeps you informed of all software installations and uninstallations across your environment and alerts you of licensing violations.

License Manager can manage licenses purchased in AWS Marketplace, a curated digital catalog where you can easily find, purchase, deploy, and manage third-party software, data, and services to build solutions and run your business. The Marketplace offers thousands of software listings from independent software vendors (ISVs) in popular categories such as security, networking, storage, machine learning, business intelligence, database, and DevOps.

Managed entitlements for AWS License Manager
Starting today, you can use managed entitlements, a new feature of AWS License Manager that lets you distribute licenses across the accounts in your AWS Organizations, quickly automate software deployments, and track licenses – all from a single, central account. Previously, each of your users would have to independently accept licensing terms and subscribe through their own individual AWS accounts. As your business grows and scales, this becomes increasingly inefficient.

Customers can use managed entitlements to manage more than 8,000 listings available for purchase from more than 1,600 vendors in AWS Marketplace. Today, AWS License Manager automates license entitlement distribution for Amazon Machine Image, container, and machine learning products purchased in the Marketplace across a variety of solutions.

How It Works
Managed entitlements provides built-in controls that allow only authorized users and workloads to consume a license within vendor-defined limits. This new license management mechanism also eliminates the need for ISVs to maintain their own licensing systems and conduct costly audits.

overview

Each time a customer purchases licenses from AWS Marketplace or a supported ISV, the license is activated based on AWS IAM credentials, and the details are registered to License Manager.

list of granted license

Administrators distribute licenses to AWS accounts. They can manage a list of grants for each license.

list of grants
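
As a rough sketch of what this distribution step can look like programmatically, the snippet below uses boto3 to grant a purchased license to a member account. The account IDs, license ARN, and allowed operations are placeholders, and the parameter names reflect our reading of the License Manager CreateGrant API, so treat this as an assumption-laden illustration rather than a verified recipe.

```python
# Hypothetical sketch: distribute a Marketplace license to a member account
# using AWS License Manager managed entitlements (boto3). Account IDs, ARNs,
# and allowed operations below are placeholders/assumptions.
import uuid

import boto3

license_manager = boto3.client("license-manager", region_name="us-east-1")

# License ARN as it appears in the central (purchasing) account.
license_arn = "arn:aws:license-manager::111122223333:license:l-example1234567890"

# Grant the license to a member account of the organization.
response = license_manager.create_grant(
    ClientToken=str(uuid.uuid4()),
    GrantName="analytics-team-grant",
    LicenseArn=license_arn,
    Principals=["arn:aws:iam::444455556666:root"],  # grantee account
    HomeRegion="us-east-1",
    AllowedOperations=["CheckoutLicense", "CheckInLicense", "ExtendConsumptionLicense"],
)
print("Created grant:", response["GrantArn"], response["Status"])

# In the grantee account, the distributed license would show up via:
# license_manager.list_received_licenses()
```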

Benefits for ISVs
AWS License Manager managed entitlements provides several benefits to ISVs by simplifying automatic license creation and distribution as part of their transactional workflow. License entitlements can be distributed to end users with or without AWS accounts. Managed entitlements streamlines upgrades and renewals by removing expensive license audits, and gives customers a self-service tool with built-in license tracking capabilities. There are no fees for this feature.

Managed entitlements provides the ability to distribute licenses to end users who do not have AWS accounts. Using AWS License Manager, the ISV creates a unique long-term token that identifies the customer and shares it with them. When the software is launched, the customer enters the token to activate the license. The software exchanges the long-term customer token for a short-term token that is passed to the API, and the license setup is completed. For on-premises workloads that are not connected to the Internet, ISVs can generate a host-specific license file that customers can use to run the software on that host.
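
The token exchange described above maps roughly onto the License Manager CreateToken and GetAccessToken operations. The sketch below is our loose interpretation of that flow, not an exact ISV integration: all identifiers are placeholders, and the exact choreography (in particular how the short-term token authenticates subsequent license checkouts) is simplified, so defer to the License Manager documentation for the real integration.

```python
# Loose sketch of the token-based activation flow described above; all
# identifiers are placeholders, and the wiring of the short-term token into
# later calls is deliberately omitted. Treat this as an approximation only.
import boto3

lm = boto3.client("license-manager", region_name="us-east-1")

# (ISV side) Create a long-lived token bound to the customer's license.
# The token is then shared with the customer out of band.
created = lm.create_token(
    LicenseArn="arn:aws:license-manager::111122223333:license:l-example1234567890",
    ClientToken="unique-idempotency-token-123",
)
long_term_token = created["Token"]

# (Inside the ISV software, at launch) The customer enters the long-lived
# token, and the software exchanges it for a short-lived access token.
access_token = lm.get_access_token(Token=long_term_token)["AccessToken"]

# The short-lived access token is what the software then presents to the
# License Manager API (e.g. a CheckoutLicense call) to complete activation;
# that step is elided here.
print("Activation token obtained:", bool(access_token))
```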

Now Available
This new enhancement to AWS License Manager is available today in the US East (N. Virginia), US West (Oregon), and Europe (Ireland) Regions, with other AWS Regions coming soon.

Licenses purchased on AWS Marketplace are automatically created in AWS License Manager and no special steps are required to use managed entitlements. For more details about the new feature, see the managed entitlement pages on AWS Marketplace, and the documentation. For ISVs to use this new feature, please visit our getting started guide.

Get started with AWS License Manager and the new managed entitlements feature today.

– Kame

Linux Foundation 2020 annual report

Post Syndicated from corbet original https://lwn.net/Articles/838871/rss

The Linux Foundation has published a glossy report of its activities for 2020. “2020 has been a year of challenges for the Linux Foundation (‘LF’) and our hosted communities. During this pandemic, we’ve all seen our daily lives and those of many of our colleagues, friends, and family around the world completely changed. Too many in our community also grieved over the loss of family and friends.

It was uplifting to see LF members join the fight against COVID-19. Our members worldwide contributed technical resources for scientific researchers, offered assistance to struggling families and individuals, contributed to national and international efforts, and some even came together to create open source projects under LF Public Health to help countries deal with the pandemic.”

Security updates for Thursday

Post Syndicated from jake original https://lwn.net/Articles/838870/rss

Security updates have been issued by Mageia (cimg, pngcheck, poppler, tor, and xdg-utils), openSUSE (mariadb), Red Hat (go-toolset-1.14-golang), and Ubuntu (linux, linux-aws, linux-aws-hwe, linux-azure, linux-azure-4.15, linux-gcp, linux-gcp-4.15, linux-gke-4.15, linux-hwe, linux-kvm, linux-oem, linux-oracle, linux-raspi2, linux-snapdragon).

Diversity and inclusion in computing education — new research seminars

Post Syndicated from Sue Sentance original https://www.raspberrypi.org/blog/diversity-inclusion-computing-education-research-seminars/

At the Raspberry Pi Foundation, we host a free online research seminar once a month to explore a wide variety of topics in the area of digital and computing education. This year, we’ve hosted eleven seminars — you can (re)discover slides and recordings on our website.

A classroom of young learners and a teacher at laptops

Now we’re getting ready for new seminars in 2021! In the coming months, our seminars are going to focus on diversity and inclusion in computing education. This topic is extremely important, as we want to make sure that computing is accessible to all, that we understand how to actively remove barriers to participation for learners, and that we understand how to teach computing in an inclusive way. 

We are delighted to announce that these seminars focusing on diversity and inclusion will be co-hosted by the Royal Academy of Engineering. The Royal Academy of Engineering is harnessing the power of engineering to build a sustainable society and an inclusive economy that works for everyone.

Royal Academy of Engineering logo

We’re very excited to be partnering with the Academy because of our shared interest in ensuring that computing and engineering are inclusive and accessible to all.

Our upcoming seminars

The seminars take place on the first Tuesday of the month at 17:00–18:30 GMT / 12:00–13:30 EST / 9:00–10:30 PST / 18:00–19:30 CET.

  • 5 January 2021: Peter Kemp (King’s College London) and Billy Wong (University of Reading) will be looking at computing education in England, particularly GCSE computer science, and how it is accessed by groups typically underrepresented in computing.
  • 2 February 2021: Professor Tia Madkins (University of Texas at Austin), Nicol R. Howard (University of Redlands), and Shomari Jones (Bellevue School District) will be talking about equity-focused teaching in K–12 computer science. Find out more.
  • 2 March 2021: Dr Jakita O. Thomas (Auburn University, Alabama) will be talking about her research on supporting computational algorithmic thinking in the context of intersectional computing.
  • April 2021: event to be confirmed
  • 4 May 2021: Dr Cecily Morrison (Microsoft Research) will be speaking about her work on physical programming for people with visual impairments.

Join the seminars

We’d love to welcome you to these seminars so we can learn and discuss together. To get access, simply sign up with your name and email address.

Once you’ve signed up, we’ll email you the seminar meeting link and instructions for joining. If you attended our seminars in the past, the link remains the same.


How Grab is Blazing Through the Super App Bazel Migration

Post Syndicated from Grab Tech original https://engineering.grab.com/how-grab-is-blazing-through-the-super-app-bazel-migration

Introduction

At Grab, we build a seamless user experience that addresses more and more of the daily lifestyle needs of people across South East Asia. We’re proud of our Grab rides, payments, and delivery services, and want to provide a unified experience across these offerings.

Here are a couple of examples of what Grab does for millions of people across South East Asia every day:

Grab Service Offerings

The Grab Passenger application reached super app status more than a year ago and continues to provide hundreds of life-changing use cases in dozens of areas for millions of users.

Big product scale brings with it even bigger technical challenges. Here are a couple of dimensions that can give you a sense of the scale we’re working with.

Engineering and product structure

Technical and product teams work in close collaboration to outserve our customers. These teams are combined into dedicated groups to form Tech Families and focus on similar use cases and areas.

Grab consists of many Tech Families who work on food, payments, transport, and other services, which are supported by hundreds of engineers. The diverse landscape makes the development process complicated and requires the industry’s best practices and approaches.

Codebase scale overview

The Passenger Applications (Android and iOS) contain more than 2.5 million lines of code each, and they keep growing. We have 1000+ modules in the Android app and 700+ targets in the iOS app. Hundreds of commits are merged by all the mobile engineers on a daily basis.

To maintain the health of the codebase and product stability, we run 40K+ unit tests on Android and 30K+ unit tests on iOS, as well as thousands of UI tests and hundreds of end-to-end tests on both platforms.

Build time challenges

The described complexity and scale do not come without challenges. A huge codebase pushes the build process to the extreme, challenging the efficiency of the build systems and hardware used to compile the super app and creating out-of-the-ordinary challenges to be addressed.

Local build time

Local build time (the build on an engineer’s laptop) is one of the most obvious challenges. As more code goes into the application binary, the build system requires more time to compile it.

ADR local build time

The Android ecosystem provides a great out-of-the-box tool to build your project called Gradle. It’s flexible and user friendly, and provides huge capabilities for a reasonable cost. But is this always true? It appears not to be, for multiple reasons. Let’s unpack these reasons below.

Gradle performs well for medium-sized projects with, say, 1 million lines of code. Once the code surpasses that 1 million mark (or so), Gradle stops giving engineers a reasonable build time for the flexibility it offers. And that’s exactly what we observed in our Android application.

At some point in time, the Android local build became ridiculously long. We even encountered cases where engineers’ laptops simply failed to build the project due to hardware resource limits. Clean builds took hours, and incremental builds easily hit dozens of minutes.

iOS local build time

Xcode behaved a bit better compared to Gradle. The Xcode build cache was somewhat bearable for incremental builds and didn’t exceed a couple of minutes. Clean builds still took dozens of minutes, though. When Xcode failed to provide a valid cache, engineers had to rerun everything as a clean build, which killed the experience entirely.

CI pipeline time

Each time an engineer submits a Merge Request (MR), our CI kicks in, running a wide variety of jobs to ensure the commit is valid and doesn’t introduce regressions to the master branch. The feedback loop time is critical here as well, and the pipeline time tends to skyrocket alongside codebase growth. We found ourselves on a trend where the feedback loop took hours, which again broke the engineering experience and prevented us from delivering the world’s best features to our customers.

As mentioned, we have a large number of unit tests (30K-40K+) and UI tests (700+) that we run on the pre-merge pipeline. This adds up to hours of execution time before we can actually allow MRs to land on the master branch.

The number of daily commits, which is in the hundreds, adds another stone to the basket of challenges.

All of this clearly indicated an area for improvement: we were missing opportunities in terms of engineering productivity.

The extra mile

The biggest question for us to answer was how to put all this scale into a reasonable experience with minimal engineering idle time and fast feedback loop.

Build time critical path optimization

The most reasonable thing to do was to pay attention to the utilization of the hardware resources and make the build process optimal.

This literally boiled down to the simplest approach:

  1. Decouple building blocks
  2. Make building blocks as small as possible

This approach is valid for any build system and applies to both iOS and Android. The first thing we focused on was understanding what our build graph looked like, how dependencies were distributed, and which blocks were bottlenecks.

Given the scale of the apps, it is practically impossible to manage the dependency tree manually, so we created a tool to help us.

Critical path overview

We introduced the Critical Path concept:

The critical path is the longest (time) chain of sequential dependencies, which must be built one after the other.

Critical Path
Critical Path build

Even with an infinite number of parallel processors/cores, the total build time cannot be less than the critical path time.

We implemented a tool that parses the dependency trees (for both Android and iOS), aggregates module/target build times, and calculates the critical path.
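
To make the concept concrete, here is a small illustrative sketch (not Grab’s internal tool) of how a critical path can be computed from a module dependency graph annotated with per-module build times; the module names and timings below are made up.

```python
# Minimal sketch (not Grab's internal tool) of critical path computation:
# the longest chain of sequential dependencies, weighted by build time.
from graphlib import TopologicalSorter  # Python 3.9+

# module -> build time in seconds (illustrative numbers)
build_time = {"app": 120, "payments": 300, "network": 90, "core": 60}

# module -> modules it depends on (built before it)
deps = {
    "app": {"payments", "network"},
    "payments": {"core"},
    "network": {"core"},
    "core": set(),
}

def critical_path(deps, build_time):
    best = {}       # module -> duration of the longest chain ending at it
    best_prev = {}  # module -> predecessor on that chain
    for module in TopologicalSorter(deps).static_order():
        longest_dep = max(deps[module], key=lambda d: best[d], default=None)
        best[module] = build_time[module] + (best[longest_dep] if longest_dep else 0)
        best_prev[module] = longest_dep
    # Walk back from the module with the largest accumulated time.
    end = max(best, key=best.get)
    path, node = [], end
    while node is not None:
        path.append(node)
        node = best_prev[node]
    return list(reversed(path)), best[end]

path, total = critical_path(deps, build_time)
print(path, total)  # ['core', 'payments', 'app'] 480
```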

The concept of the critical path introduced a number of action items, which we prioritized:

  • The critical path must be as short as possible.
  • Any huge module/target on the critical path must be split into smaller modules/targets.
  • Depend on interfaces/bridges rather than implementations to shorten the critical path.
  • The presence of other teams’ implementation modules/targets in the critical path of the given team is a red flag.
Stack representation of the Critical Path build time

Project’s scale factor

To implement these conceptually easy action items, we ran a Grab-wide program. The program impacted almost every mobile team at Grab and involved 200+ engineers to some degree. The whole implementation took 6 months to complete.

During this period, we assigned engineers who were responsible for reviewing the changes, providing support to engineers across Grab, and monitoring the results.

Results

Even though the overall plan seemed good on paper, the results were modest – they merely flattened the build time curve against the upward trend introduced by the growth of the codebase. The estimated impact was almost the same for both platforms and gave us about a 7%-10% cut in CI and local build time.

Open source plan

The critical path tool proved effective at illustrating a project’s bottlenecks in its dependency tree. It is now widely used by mobile teams at Grab to analyze their dependencies and to cut out or limit unnecessary impact on their respective scope.

We are currently considering open-sourcing the tool, as we’d like to hear feedback from external teams and see what can be built on top of it. We’ll provide more details on this in future posts.

Remote build

Another pillar of the build process is the hardware the build runs on. The solution is really straightforward – put more muscle behind your build to make it run faster.

Clearly, our engineers’ laptops could not be considered fast enough. For a fast enough build, we were looking at something with 20+ cores and ~200 GB of RAM. No desktop or laptop computer reaches those numbers at a reasonable price; we had hit a hardware bottleneck. Further parallelization of the build process didn’t give any significant improvement, as all the build tasks were simply queueing and waiting for resources to be released. That’s where cloud computing came into the picture, with a huge variety of options ready to be used.

ADR mainframer

We took advantage of the Mainframer tool. When a build must run, the code diff is pushed to the remote executor, compiled there, and the generated artifacts are pushed back to the local machine. An engineer still benefits from indexing, debugging, and other features available in the IDE.
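
As a rough illustration of the push-build-pull cycle (this is not the actual Mainframer implementation; the host name, project path, and Gradle task are placeholders):

```python
# Rough sketch of the push-build-pull cycle described above; this is not
# the actual Mainframer implementation, and the host name, project path,
# and Gradle task are illustrative placeholders.
import subprocess

REMOTE = "android-builder.example.internal"  # hypothetical remote executor
PROJECT = "passenger-app"                    # directory under the remote home

def remote_build(gradle_task: str = "assembleDebug") -> None:
    # 1. Push local sources to the remote executor (skip local build outputs).
    subprocess.run(
        ["rsync", "-az", "--delete", "--exclude", "build/",
         "./", f"{REMOTE}:{PROJECT}/"],
        check=True,
    )
    # 2. Compile remotely, where the extra cores and RAM live.
    subprocess.run(
        ["ssh", REMOTE, f"cd {PROJECT} && ./gradlew {gradle_task}"],
        check=True,
    )
    # 3. Pull generated artifacts back so the local IDE can install/debug them.
    subprocess.run(
        ["rsync", "-az", f"{REMOTE}:{PROJECT}/app/build/outputs/",
         "app/build/outputs/"],
        check=True,
    )

if __name__ == "__main__":
    remote_build()
```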

To make the infrastructure mature enough, we introduced Kubernetes-based autoscaling driven by load. Currently, we have a stable infrastructure that accommodates 100+ Android engineers and scales up and down to save costs.

This strategy gave us a 40-50% improvement in local build time. In the extreme case, Android builds finished 2x faster.

iOS

Given the success of the Android remote build infrastructure, we immediately turned our attention to the iOS builds. It was an obvious move for us – we wanted the same infrastructure for iOS builds. The idea looked good on paper and was proven with the Android infrastructure, but the reality was a bit different for our iOS builds.

Our very first roadblock was that Xcode is not that flexible, and the process of delegating the build to a remote machine is far more complicated than on Android. We tackled a series of blockers such as running indexing on a remote machine, sending and consuming build artifacts, and even running the remote build itself.

The reality was that remote builds were absolutely possible for iOS. There were minor tradeoffs impacting the engineering experience alongside obvious gains from utilizing cloud computing resources. But the problem is that, legally, iOS builds may only be built on Apple machines.

Even with the most powerful Apple hardware – a Mac Pro – the specs are still not ideal and are unfortunately not optimized for the build process. A 24-core, 194 GB RAM Mac Pro could give about a 2x improvement in build time, but when it had to run 3 builds simultaneously for different users, build efficiency immediately dropped back to the baseline.

Android remote machines with the same specs are capable of running up to 8 simultaneous builds. This allowed us to accommodate up to 30-35 engineers per machine, whereas the iOS infrastructure would have required keeping the balance at 5-6 engineers per machine. This solution didn’t seem scalable at all, so we abandoned the idea of remote builds for iOS at that point.

Test impact analysis

The other battlefront was the CI pipeline time. Our dependency tree optimizations, complemented by comparably powerful hardware, played a good part in achieving a reasonable build time on CI.

CI validations also include the execution of unit and UI tests and can easily take 50%-60% of the pipeline time. The problem was getting worse as the number of tests constantly grew; we were facing enormous test execution times in the near future. We could mitigate the problem with a brute-force approach – throwing in more runners and sharding tests – but that wouldn’t make finance executives happy.

So the time for smart solutions came again. It’s a known fact that the simpler solution is more likely to be correct. The simplest solution was to stop running ALL tests. The idea was to run only those tests that were impacted by the codebase change introduced in the given MR.

Behind this simple idea, we found a huge impact. Once Test Impact Analysis was applied to the pre-merge pipelines, we managed to cut the total number of executed tests by up to 90% without any impact on codebase quality or application stability. As a result, we cut the pipeline time for both platforms by more than 30%.
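
To illustrate the idea (this is a simplified sketch, not Grab’s actual implementation), a test module only needs to run when a changed module appears in its dependency closure:

```python
# Simplified sketch of test impact analysis (not Grab's implementation):
# run a test module only if a changed module is in its dependency closure.

# module -> modules it depends on (illustrative graph)
deps = {
    "app": {"payments", "network"},
    "payments": {"core"},
    "network": {"core"},
    "core": set(),
    "payments-tests": {"payments"},
    "network-tests": {"network"},
}

def dependency_closure(module, deps):
    """All modules reachable from `module`, including itself."""
    seen, stack = set(), [module]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(deps.get(current, ()))
    return seen

def impacted_tests(changed_modules, deps):
    test_modules = [m for m in deps if m.endswith("-tests")]
    return [
        t for t in test_modules
        if dependency_closure(t, deps) & set(changed_modules)
    ]

# An MR touching only `network` skips the payments test suite entirely.
print(impacted_tests({"network"}, deps))  # ['network-tests']
```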

Today, Test Impact Analysis is coupled to our codebase. We are looking to invest some effort to make it available for open sourcing, and we are excited to be on this path.

The end of the Native Build Systems

One might say that our journey was long and we won the battle for the build time.

Today, we have hit the limits of the native build systems’ efficiency and of the hardware for both Android and iOS. And it’s clear to us that in our current setup, we would not be able to scale up while delivering a high-quality engineering experience.

Let’s move to Bazel

To introduce another big improvement to the build time, we needed to make some ground-level changes. And this time, we focused on the build system itself.

Native build systems are designed to work well for small and medium-sized projects, however they have not been as successful in large scale projects such as the Grab Passenger applications.

With these assumptions, we considered our options and found the Bazel build system to be a strong contender. A deep comparison of build systems showed that Bazel promised better results in almost all key areas:

  • Bazel enables remote builds out of the box
  • Bazel provides sustainable cache capabilities (local and remote). This cache can be reused across all consumers: local builds and CI builds
  • Bazel was designed with large codebases as a cornerstone requirement
  • The majority of the tooling may be reused across multiple platforms

Ways of adopting

On paper, Bazel looked awesome. All our playground investigations showed positive results:

  • Cache worked great
  • Incremental builds were incredibly fast

But the effort to shift to this new build system was huge. We made sure we foresaw all possible pitfalls and impediments. It took us about 5 months to estimate the impact and put together a sustainable proof of concept that reflected the majority of our use cases.

Migration limitations

After those 5 months of investigation, we had an endless list of incompatible features and major blockers to address. These blockers touched even obvious things such as indexing and the jump-to-definition IDE feature, which we used to take for granted.

But the biggest challenge was the need to keep the pace of product releases. Stopping product development, even for a day, was not an option. The way out appeared to be a hybrid build concept: we figured out how to marry the native and Bazel build systems so they could live together in harmony. This gave us the chance to start migrating target by target and project by project, moving from the bottom to the top of the dependency graph.

This approach was a valid enabler; however, we were still faced with the challenge of our apps’ scale. A codebase of over 2.5 million LOC cannot be migrated overnight. The initial estimation was based on manually migrating the whole codebase, which would have required investing dozens of person-months.

Team capacity limitations

Multiple teams immediately pushed back on this approach, questioning its priority and raising concerns about the impact on their own product roadmaps.

We were left with little choice. On one hand, we had pressingly long build times; on the other hand, we were asking teams for a huge effort. We clearly needed buy-in from all of our stakeholders to push things forward.

Getting buy-in

To get the buy-in we needed, stakeholders were grouped and addressed separately, and we defined key factors for each group.

Key factors

C-level stakeholders:

  • Impact. The migration impact must be significant – at least a 40% decrease on the build time.
  • Costs. Migration costs must be paid back in a reasonable time, with the positive impact extending into the future.
  • Engineering experience. The user experience must not be compromised. All tools and features engineers used must be available during migration and even after.

Engineers:

  • Engineering experience. Similar to the criteria established in the C-level factors.
  • Early adopters engagement. A common core of experience must be created across the mobile engineering community to support other engineers in the later stages.
  • Education. Awareness campaigns must be in place. We planned and conducted a series of tech talks and workshops to raise awareness among engineers and flatten the learning curve, and we wrote hundreds of pages of documentation and guidelines.

Product teams:

  • No roadmap impact. Migration must not affect the product roadmap.
  • Minimize the engineering effort. Migration must not increase the efforts from engineering.

Migration automation (separate talks)

The biggest concern for the majority of the stakeholders appeared to be the estimated migration effort, which impacted the cost, the product roadmap, and the engineering experience. It became evident that we needed to streamline the process and reduce the effort for migration.

Fortunately, the actual migration process was routine in nature, so we had opportunities for automation. We investigated ideas on automating the whole migration process.

The tools we’ve created

We found that it’s relatively easy to create a bunch of tools that read the native project structure and create an equivalent Bazel setup. This was a game changer.
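
As a toy illustration of what such tooling can look like (the real tool is far more involved and handles many more attributes and rule types), a script can walk per-module metadata parsed from the native project and emit an equivalent BUILD file for each module; the module names, paths, and rule choice below are illustrative only.

```python
# Toy sketch of native-to-Bazel migration tooling (the real tool handles
# many more cases); it turns per-module metadata into android_library
# targets. Module names, paths, and rule choice are illustrative.
from pathlib import Path

# Imagine this was parsed from the native (Gradle) project structure.
modules = {
    "core": {"deps": []},
    "network": {"deps": ["core"]},
    "payments": {"deps": ["core", "network"]},
}

BUILD_TEMPLATE = """\
android_library(
    name = "{name}",
    srcs = glob(["src/main/java/**/*.java"]),
    visibility = ["//visibility:public"],
    deps = [{deps}],
)
"""

def emit_build_files(modules, root="out"):
    for name, meta in modules.items():
        deps = ", ".join(f'"//{d}"' for d in meta["deps"])
        build_dir = Path(root) / name
        build_dir.mkdir(parents=True, exist_ok=True)
        (build_dir / "BUILD.bazel").write_text(
            BUILD_TEMPLATE.format(name=name, deps=deps)
        )

if __name__ == "__main__":
    emit_build_files(modules)
```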

Things moved pretty smoothly for both Android and iOS projects. We managed to roll out tooling to migrate the codebase in a single click/command (well, with some exceptions for now – stay tuned for another blog post on this). With this tooling, combined with the hybrid build concept, we addressed all the key buy-in factors:

  • Migration cost dropped by at least 50%.
  • Fewer engineers are required for the actual migration. There is no need to engage the wider engineering community, as a small group of people can manage the whole process.
  • There is no longer an impact on the product roadmap.

Where do we stand today

When we were in the middle of the actual migration, we decided to take a pragmatic path and migrate our applications in phases to ensure everything was under control and that there were no unforeseen issues.

The hybrid build time is improving alongside our migration progress, with a linear dependency on the amount of migrated code. The figures look positive, and we are confident of achieving our goal of decreasing build time by at least 40%.

Plans to open source

We plan to open source the automated migration tooling we’ve created. We are a bit further along on the Android side, decoupling it from our applications’ implementation details, and plan to open source it in the near future.

The iOS tooling is a bit behind, and we expect it to be available for open-sourcing by the end of Q1’2021.

Is it worth it all?

Bazel is not a silver bullet for your build time or your project. There are a lot of edge cases you’ll never know about until they punch you straight in the face.

It’s far from an industry standard, and you might find it difficult to hire engineers with Bazel knowledge. It has a steep learning curve as well. It’s absolutely overhead for small to medium-sized projects, but it’s undeniably essential once you start playing in the high league of super apps.

If you were to ask whether we’d go down this path again, the answer would come fast – yes, without any doubt.


Authored by Sergii Grechukha on behalf of the Passenger App team at Grab. Special thanks to Madushan Gamage, Mikhail Zinov, Nguyen Van Minh, Mihai Costiug, Arunkumar Sampathkumar, Maryna Shaposhnikova, Pavlo Stavytskyi, Michael Goletto, Nico Liu, and Omar Gawish for their contributions.


Join us

Grab is more than just the leading ride-hailing and mobile payments platform in Southeast Asia. We use data and technology to improve everything from transportation to payments and financial services across a region of more than 620 million people. We aspire to unlock the true potential of Southeast Asia and look for like-minded individuals to join us on this ride.

If you share our vision of driving South East Asia forward, apply to join our team today.

[$] Python structural pattern matching morphs again

Post Syndicated from jake original https://lwn.net/Articles/838600/rss

A way to specify multiply branched conditionals in the Python language—akin to the C switch statement—has been a longtime feature request. Over the years, various proposals have been mooted, but none has ever crossed the finish line and made it into the language. A highly ambitious proposal that would solve the multi-branch-conditional problem (and quite a bit more) has been discussed—dissected, perhaps—in the Python community over the last six months or so. We have covered some of the discussion in August and September, but the ground has shifted once again so it is time to see where things stand.
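
For readers who have not followed the proposal, the sketch below shows the general shape of the syntax from the PEP 634/636 drafts under discussion in late 2020; the details were still in flux at the time, so treat it as illustrative.

```python
# Illustrative shape of the proposed match statement (per the PEP 634/636
# drafts under discussion in late 2020; details were still in flux).
def handle(command: str) -> str:
    match command.split():
        case ["go", direction]:
            return f"moving {direction}"
        case ["drop", *items]:
            return f"dropping {', '.join(items)}"
        case ["look"] | ["examine"]:
            return "looking around"
        case _:
            return "unknown command"

print(handle("go north"))           # moving north
print(handle("drop sword shield"))  # dropping sword, shield
```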

High Quality Asphere Manufacturing from Edmund Optics

Post Syndicated from IEEE Spectrum Recent Content full text original https://spectrum.ieee.org/webinar/high-quality-asphere-manufacturing-from-edmund-optics

Edmund Optics’ asphere experts Amy Frantz and Oleg Leonov, and moderator Lars Sandström, Precision Optics Senior Business Line Manager, present the benefits of using aspheres in optical system design and the factors that need to be taken into account during the design process. These key manufacturability considerations will significantly reduce asphere lead time and cost if considered early enough in the design process.

At the conclusion of this webinar, participants will have a strong understanding of:

  • Benefits of using aspheres in optics system design
  • Challenges of asphere manufacturing
  • Key factors on manufacturable aspheres

Australian Team Running Its Own DARPA-Style Cave Challenge to Test Robots

Post Syndicated from Evan Ackerman original https://spectrum.ieee.org/automaton/robotics/robotics-hardware/darpa-subt-team-csiro-data-61

Although the in-person Systems Track event of the DARPA SubT Challenge was cancelled because of the global pandemic, the Systems Track teams still have to prepare for the Final Event in 2021, which will include a cave component. Systems Track teams have been on their own to find cave environments to test in, and many of them are running their own DARPA-style competitions to test their software and hardware.

We’ll be posting a series of interviews exploring where and how the teams are making this happen, and today we’re featuring Team CSIRO Data 61, based in Brisbane, Australia.

GitHub Availability Report: November 2020

Post Syndicated from Keith Ballinger original https://github.blog/2020-12-02-availability-report-november-2020/

Introduction

In November, we experienced two incidents resulting in significant impact and degraded state of availability for issues, pull requests, and GitHub Actions services.

November 2 12:00 UTC (lasting 32 minutes)

The SSL certificate for *.githubassets.com expired, impacting web requests for GitHub.com UI and services. There was an auto-generated issue indicating the certificate was within 30 days of expiration, but it was not addressed in time. Impact was reported, and the on-call engineer remediated it promptly.

We are using this occurrence to evaluate our current processes, as well as our tooling and automation, within this area to reduce the likelihood of such instances in the future.
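
As a generic illustration of the kind of automation that can help here (this is not GitHub’s actual tooling; the hostname and threshold are examples mirroring the 30-day window mentioned above), a certificate-expiry check can be as small as:

```python
# Generic sketch of an automated certificate-expiry check (not GitHub's
# actual tooling); hostname and threshold below are illustrative.
import datetime
import socket
import ssl

def days_until_expiry(hostname: str, port: int = 443) -> int:
    context = ssl.create_default_context()
    with socket.create_connection((hostname, port), timeout=10) as sock:
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            cert = tls.getpeercert()
    expires = datetime.datetime.utcfromtimestamp(
        ssl.cert_time_to_seconds(cert["notAfter"])
    )
    return (expires - datetime.datetime.utcnow()).days

if __name__ == "__main__":
    remaining = days_until_expiry("github.githubassets.com")
    if remaining < 30:
        print(f"WARNING: certificate expires in {remaining} days")
    else:
        print(f"Certificate has {remaining} days remaining")
```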

November 27 16:04 UTC (lasting one hour and one minute)

Our service monitors detected abnormal levels of replication lag within one of our MySQL clusters affecting the GitHub Actions service.

Due to the recency of this incident, we are still investigating the contributing factors and will provide a more detailed update in next month’s report.

In summary

We place great importance in the reliability of our services along with the trust that our users place in us every day. We’ll continue to keep you updated on the progress we’re making to ensure this. To learn more about what we’re working on, visit the GitHub engineering blog.

re:Invent 2020 Liveblog: Partner Keynote

Post Syndicated from AWS News Blog Team original https://aws.amazon.com/blogs/aws/reinvent-2020-liveblog-partner-keynote/

Join us Thursday, Dec. 3, from 7:45-9:30 a.m., for the AWS Partner Keynote with Doug Yeum, head of Worldwide Channels and Alliances; Sandy Carter, vice president, Global Public Sector Partners and Programs; and Dave McCann, vice president, AWS Migration, Marketplace, and Control Services. Developer Advocates Steve Roberts and Martin Beeby will be liveblogging all the announcements and discussion.

See you at 7:45 a.m. (PST) Thursday!

— Steve & Martin

 


Polling Is Too Hard—for Humans

Post Syndicated from Steven Cherry original https://spectrum.ieee.org/podcast/artificial-intelligence/machine-learning/polling-is-too-hardfor-humans

Steven Cherry Hi, this is Steven Cherry, for Radio Spectrum.

The Literary Digest, a now-defunct magazine, was founded in 1890. It offered—despite what you’d expect from its name—condensed versions of news-analysis and opinion pieces. By the mid-1920s, it had over a million subscribers. Some measure of its fame and popularity stemmed from accurately predicting every presidential election from 1916 to 1932, based on polls it conducted of its ever-growing readership.

Then came 1936. The Digest predicted that Kansas Governor Alf Landon would win in a landslide over the incumbent, Franklin Delano Roosevelt. Landon in fact captured only 38 percent of the vote. Roosevelt won 46 of the U.S.’s 48 states, the biggest landslide in presidential history. The magazine never recovered from its gaffe and folded two years later.

The Chicago Tribune did recover from its 1948 gaffe, one of the most famous newspaper headlines of all time, “Dewey Defeats Truman”—a headline that by the way was corrected in the second edition that election night to read “Democrats Make Sweep of State Offices,” and by the final edition, “Early Dewey Lead Narrow; Douglas, Stevenson Win,” referring to candidates that year for Senator and Governor. The Senator, Paul Douglas, by the way, was no relation to an earlier Senator from Illinois a century ago, Stephen Douglas.

The Literary Digest’s error was due, famously, to the way it conducted its polls— its readership, even though a million strong, was woefully unrepresentative of the nation’s voters as a whole.

The Tribune’s gaffe was in part due to a printer’s strike that forced the paper to settle on a first-edition banner headline hours earlier than it otherwise would have, but it made the guess with great confidence in part because the unanimous consensus of the polling that year that had Dewey ahead, despite his running one of the most lackluster, risk-averse campaigns of all time.

Polls have been making mistakes ever since, and it’s always, fundamentally, the same mistake. They’re based on representative samples of the electorate that aren’t sufficiently representative.

After the election of 2016, in which the polling was not only wrong but itself might have inspired decisions that affected the outcome—where the Clinton campaign shepherded its resources; whether James Comey would hold a press conference—pollsters looked inward, re-weighted various variables, assured us that the errors of 2016 had been identified and addressed, and then proceeded to systematically mis-predict the 2020 presidential election much as they had four years earlier.

After a century of often-wrong results, it would be reasonable to conclude that polling is just too difficult for humans to get right.

But what about software? Amazon, Netflix, and Google do a remarkable job of predicting consumer sentiment, preferences, and behavior. Could artificial intelligence predict voter sentiment, preferences, and behavior?

Well, it’s not as if they haven’t tried. And results in 2020 were mixed. One system predicted Biden’s lead in the popular vote to be large, but his electoral college margin small—not quite the actual outcome. Another system was even further from the mark, giving Biden wins in Florida, Texas, and Ohio—adding up to a wildly off-base electoral margin.

One system, though, did remarkably well. As a headline in Fortune magazine put it the morning of election day, “The polls are wrong. The U.S. presidential race is a near dead heat, this AI ‘sentiment analysis’ tool says.” The AI tool predicted a popular vote of 50.2 percent for Biden, only about one-sixth of one percent from the actual total, and 47.3 percent for Trump, off by a mere one-tenth of one percent.

The AI company that Fortune magazine referred to is called Expert.ai, and its Chief Technology Officer, Marco Varone, is my guest today.

Marco, welcome to the podcast.

Marco Varone Hi everybody.

Steven Cherry Marco, AI-based speech recognition has been pretty good for 20 years, AI has been getting better and better at fraud detection for 25 years. AI beat the reigning chess champion back in 1997. Why has it taken so long to apply AI to polling, which is, after all … well, even in 2017, was a $20.1 billion industry, which is about $20 billion more than chess.

Marco Varone Well, there are two reasons for this. The first one, that if you wanted to apply artificial intelligence to this kind of problem, you need to have the capability of understanding language in a pretty specific, deep, and nuanced way. And it is something that, frankly, for many, many years was very difficult and required a lot of investment and a lot of work in trying to go deeper than the traditional shallow understanding of text. So this was one element.

The second element is that, as you have seen in this particular case, polls, on average, are working still pretty well. But there are particular events, in particular a situation where there is a clear gap between what has been predicted and the final result. And there is a tendency to say, okay, on average, the results are not so bad. So don’t change too much because we can make it with good results, without the big changes that are always requiring investment, modification, and a complex process.

I would say that it’s a combination of the technology that needed to become better and better in understanding the capability of really extracting insights and small nuances from any kind of communication and the fact that for other types of polls, the current situation is not so bad.

The fact that now there is a growing amount of information that you can easily analyze, because it is everywhere, in every social network, every communication, every blog and comment, made it a bit easier to say, okay, now we have better technology, and even in specific situations we can have access to a huge amount of data. So let’s try it. And this is what we did. And I believe this will become a major trend in the future.

Steven Cherry Every AI system needs data; Expert.ai uses social posts. How does it work?

Marco Varone Well, the social posts are, I would say, the most valuable kind of content that you can analyze in a situation like this, because on one side, it is a type of content that we know. When I say we know, it means that we have used this type of content for many other projects. It is normally the kind of content that we analyze for our traditional customers, looking for the actual comments and opinions about products, services, and particular events. Social content is easy to get—up to a point; with the recent scandals, it’s becoming a bit more difficult to have access to a huge amount of social data, which in the past was a bit simpler—and also it is something where you can find really every kind of person, every kind of expression, and every kind of discussion.

So it’s easier to analyze this content, to extract a big amount of insight, a big amount of information, and to tune … to create reasonably solid models that can be tested in a sort of realtime—there is a continuous stream of social content. There is an infinite number of topics that are discussed. And so you have the opportunity to have something that is plentiful, that is cheap, but has a big [?] expression and where you can really tune your models and tune your algorithms in a much faster and more cost-effective way than with other types of content.

Steven Cherry So that sort of thing requires something to count as a ground truth. What is your ground truth here?

Marco Varone Very, very, very good point … a very good question. The key point is that from the start, we decided to invest a lot of money and a lot of effort in creating a sort of representation of knowledge that we have stored in a big knowledge graph that was crafted manually, initially.

So we created this knowledge representation that is a sort of representation of the world’s knowledge, in a reduced form, and of the language and the way that you express this knowledge. And we created this solid foundation manually, so we have been able to build on a very solid and very structured foundation. On top of this foundation, it was possible, as I mentioned, to add new knowledge by working on and analyzing a big amount of data; social data is an example, but there are many other types of data that we use to enrich our knowledge. And so we are not influenced, like many other approaches, by a bias that you can pick up from extracting knowledge only from data.

So it’s the start of a two-tier system where we have this solid ground-truth foundation—the knowledge and information that expert linguists and people with a deep understanding of things created. On top of that, we can add all the information that we can extract more or less automatically from different types of data. We believe that this was a huge investment that we made over the years, but it is paying big dividends and also giving us the possibility of understanding language and communication at a deeper level than with other approaches.

Steven Cherry And are you using only data from Twitter or from other social media as well?

Marco Varone No, no, we try to use as much social media as possible; the limitation sometimes is that Twitter is much easier and faster for getting access to a bigger amount of information. For other social sources it is sometimes not that easy, because you can have issues in accessing the content, or you have a very limited amount of information that you can download, or it is expensive—or some sources you cannot really control automatically. So Twitter becomes the first choice for the reason that it is easier to get a big volume. And if you are ready to pay, you can have access to the full Twitter firehose.

Steven Cherry The universe of people who post on social networks would seem to be skewed in any number of ways. Some people post more than the average. Some people don’t post much at all. People are sometimes more extreme in their views. How do you go from social media sentiment to voter sentiment? How do you avoid the Literary Digest problem?

Marco Varone Probably the most relevant element is our huge experience. We started to analyze big amounts of textual data many, many years ago, and we were forced to really find a way of managing and balancing and avoiding this kind of noise, or duplicated information, or extra spurious information—[it] can really impact the capability of our solution to extract the real insights.

So I think that experience—a lot of experience in doing this for many, many years—is the second secret element of our recipe for being able to do this kind of analysis. And I would add that you should also consider that we have done it several times: we started to analyze political content, things linked to political elections, a few years ago. So we also had this generic experience and a specific experience in finding how to tune the different parameters, how to set the different algorithms to try to minimize these kinds of noisy elements. You can’t remove them completely. It is impossible.

But for example, when we analyzed the social content for the Brexit referendum in the UK, and we were able to guess—one of the few able to do this—the real result of it, we learned a lot of lessons and we were able to improve our capability. Clearly, this means that there is not a single formula that is good for every kind of analysis.

Steven Cherry It’s sort of a commonplace that people express more extreme views on the Internet than they do in face-to-face encounters. The results from 2016 and 2020—and the Brexit result as well—suggests that the opposite may be the case. People’s voting reflects truly-held extreme views, while the polling reflects a sort of face-to-face façade.

Marco Varone Yes, I must admit that we had a small advantage in this—compared with many other companies and probably many other players that tried to guess the result of this election or of Brexit—being based where our technology is. Here in Italy, we saw this kind of situation happening much sooner than we have seen it happening in other countries. So in Italy, even many years ago, we had the strange situation where people, when they were polled for an interview, would say, “Oh, no, I think that is too extreme. I will never vote for this. I will vote for this other candidate or the other party.” But in the end, when the elections were over, you saw that, oh, this is not what really happened in the secrecy of the vote.

So I would say that this is a small secret, a small advantage that we had over many other people who tried to guess this result: we created this kind of technology and implementation in Italy, where this split between stated positions and the actual vote was happening before we saw it elsewhere. Now it’s very common. It is happening not only in the U.S., but also in other countries; it was happening in Italy before. So we have been able to understand it sooner and to adjust and balance our parameters accordingly.

Steven Cherry That’s so interesting. People have, of course, compared the Trump administration to the Berlusconi administration, but I didn’t realize that the comparison went back all the way to their initial candidacies. So in effect, the shy voter theory—especially the shy-Trump voter theory—is basically correct and people express themselves more authentically online.

Marco Varone Correct. This is what we are seeing, again and again. And it is something that I believe is not only happening in the political environment, but there it’s somehow stronger than in other places. As I told you, we are applying our artificial intelligence solution in many different fields, analyzing the feedback from customers of telco companies, banks, insurance companies. And you see that when you look at, for example, the content of the e-mails, or let me say the official communication that is exchanged between the customer and the company, everything is a bit smoother, more natural. The tone is under control. And then when you see the same kind of problem discussed in social content, everything is stronger. People are really trying to give a much stronger opinion, saying, I’ll never buy this kind of service, or I had big problems with this company.

And so, again, this is something that we have seen also in other spaces. In the political situation, I believe it is even stronger because they are really not buying something like when you are interacting with a company, but you are trying to give your small contribution to the future of your country or your state or your local government. So probably there are even stronger sentiments and feelings for people. And in the social situation, they are really free because you are not really identified—normally you can be recognized, but in many cases you are not linked to the specific person doing that. So I believe that that is the strongest place where there is this, “Okay, I really wanted to say what I think, and this is the only place where I will tell this, because the risk of having a sort of a negative result is smaller.”

Steven Cherry Yeah. So not to belabor the point, but it does seem important. It’s commonly thought that the Internet goads people into holding more extreme positions than they really do, but the reality is that it instead frees them to express themselves more honestly.

A 2015 article in Nature argued that public opinion has become more extreme over time, and then the article looks at some of the possible causes. I’m wondering if you have seen that in your work and is it possible that standard polling techniques simply have not caught up with that?

Marco Varone Yes, I think that we can confirm we have seen this kind of change. We have been applying our solution to social content for a good number of years—I would say not exactly from the start, because you need a minimum amount of data, but for a good number of years. And I can confirm, yes, it’s something that we have seen happening. I don’t know exactly if it is also linked to the fact that people who are more vocal on social content are also part of the new generation of people who are younger, who have been able to use these kinds of channels of communication actively more or less from the start. I think that there are different elements to this, but for sure I can confirm it.

And in different countries, we have seen some significant variation. For example, you would expect that here in Italy it’s super-strong, because the Italian people, for example, are considered very … They don’t fear expressing their opinion, but I would say that in the U.S. and also in the U.K. we are seeing [it] even stronger. So it’s happening in all the countries where we are operating, and there are some countries where it’s even stronger than others. You will not be surprised that, for example, when you analyze social content in Germany, everything is somehow more under control, exactly as you would expect. So sometimes there are surprises; in other situations, things are more or less as you expect.

Steven Cherry I mentioned earlier Amazon, Netflix and Google. Are there similarities between what you’re doing here and what recommendation engines do?

Marco Varone There are elements in common and there are significant differences. The element in common is that they are also using their capability for analyzing textual content to extract elements for the recommendation, but they are also using a lot of other information. For us, when you analyze something, more or less the only information we can get access to is really the tweets, the posts, the articles, and other similar things. But Amazon—or Netflix—has access to a lot of other information. So on Amazon, you have the clicks, you have the history of the customer, you have the different paths that have been followed in navigating the site. They have historical information. So they have a much richer set of data, and the text part is only somehow a complement to it. So there are elements in common and differences. And the other difference is that all these companies have a very shallow capability of understanding what is really written—in a comment, in a post, in a tweet—they tend to work more or less at a keyword level. Okay, this is a negative keyword; this is a positive keyword. With our artificial intelligence, we can go deeper than that. So we can get the emotion, the feeling—we can disambiguate small differences in the expression of the person much better, because we can go to a deeper level of understanding. It is not like a person; a person is still better at understanding all the nuances. But it’s something that can add more value and allows us to compensate—up to a point—for the fact that we don’t have access to this huge set of other data that these big companies easily have, because they track and they log everything.

Steven Cherry I’m not sure humans always do better. You know, one of my complaints about the movie rating site Rotten Tomatoes is they take reviews by film reviewers and assess whether the review was a generally positive or generally negative review. It’s incredibly simplistic. And yet, in my opinion, they often get it wrong. I’d love to see a sentiment analysis software attack the movie-rating problem. Speaking of which, polling is more of a way to show off your company’s capabilities, yes? Your main business involves applications in industries like insurance and banking and publishing?

Marco Varone Correct. Absolutely. We decided to do this from time to time, as I said, to apply our technology and our solutions to this specific problem, not because we want to become a competitor of the companies doing these polls, but because we think it is a very good way to show the capability and the power of our technology and our solutions, applied to a problem that is easily understood by everybody.

Normally what we do is apply this kind of approach, for example, to analyzing the interactions between our clients and their customers, or to analyzing large amounts of social content to identify trends, patterns, and emerging elements, which can be emerging technologies or management challenges.

Some of our customers are also in the intelligence space: police forces, national security, intelligence agencies. They use our AI platform to try to recognize possible threats and to help investigators and analysts find the information they are looking for in a much faster and more structured way. Finally, I will say that our historical market is publishing. Publishers are always searching for ways to enrich the content they publish with additional metadata, so that people reading and navigating the knowledge can really slice and dice the information across many dimensions, or can focus on specific topics, a specific place, or a specific type of event.

Steven Cherry Returning to polling, the Pew Research Center is just one of many polling organizations that looked inward after 2020 and as far as I can tell, concluded that it needed to do still better sampling and weighting of voters. In other words, they just need to do a better job of what they had been doing. Do you think they could ever succeed at that or are they just on a failed path and they really need to start doing something more like what you’re doing?

Marco Varone I think that they are on a failed path and that they need to really merge the two approaches. I believe that, for the future, they need to keep the good part of what they did for many, many years, because there is still a lot of value in that. But they are obliged to add this additional dimension, because only by working with these two approaches together can you find something that gives a good result, and I would say a good prediction, in the majority of situations, even in these extreme events that are becoming more and more common. This is part of how the world is changing.

So we think that they need to look at the kind of artificial intelligence technologies that we and other companies are making available, because they cannot continue as they are. This is not a problem of tuning the existing formulas. They should not discard them; that would be a big mistake. But for sure, in my opinion, they need to blend the two things and spend the time to balance this combined model, because, again, if you just merge the two approaches without spending time on balancing, the result would be even worse than what they have now.

Steven Cherry Well, Marco, I think that’s a very natural human need to predict the future, to help us plan accordingly, and a very natural cultural need to understand where our fellow citizens stand and feel and think about the important issues that face us. Polling tries to meet those needs. And if it’s been on the wrong path these many years, I hope there’s a right path and hopefully you’re pointing the way to it. Thanks for your work and for joining us today.

Marco Varone Thank you. It was a pleasure.

Steven Cherry We’ve been speaking with Marco Varone, CTO of Expert.ai, about polling, prediction, social media, and natural language processing.

Radio Spectrum is brought to you by IEEE Spectrum, the member magazine of the Institute of Electrical and Electronics Engineers, a professional organization dedicated to advancing technology for the benefit of humanity.

This interview was recorded November 24, 2020. Our theme music is by Chad Crouch.

You can subscribe to Radio Spectrum on the Spectrum website, Spotify, Apple Podcasts, or wherever you get your podcasts. You can sign up for alerts or for our upcoming newsletter. And we welcome your feedback on the web or in social media.

For Radio Spectrum, I’m Steven Cherry.

Note: Transcripts are created for the convenience of our readers and listeners. The authoritative record of IEEE Spectrum’s audio programming is the audio version.

We welcome your comments on Twitter (@RadioSpectrum1 and @IEEESpectrum) and Facebook.

Impressive iPhone Exploit

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2020/12/impressive-iphone-exploit.html

This is a scarily impressive vulnerability:

Earlier this year, Apple patched one of the most breathtaking iPhone vulnerabilities ever: a memory corruption bug in the iOS kernel that gave attackers remote access to the entire device—over Wi-Fi, with no user interaction required at all. Oh, and exploits were wormable—meaning radio-proximity exploits could spread from one nearby device to another, once again, with no user interaction needed.

[…]

Beer’s attack worked by exploiting a buffer overflow bug in a driver for AWDL, an Apple-proprietary mesh networking protocol that makes things like AirDrop work. Because drivers reside in the kernel—one of the most privileged parts of any operating system—the AWDL flaw had the potential for serious hacks. And because AWDL parses Wi-Fi packets, exploits can be transmitted over the air, with no indication that anything is amiss.

[…]

Beer developed several different exploits. The most advanced one installs an implant that has full access to the user’s personal data, including emails, photos, messages, and passwords and crypto keys stored in the keychain. The attack uses a laptop, a Raspberry Pi, and some off-the-shelf Wi-Fi adapters. It takes about two minutes to install the prototype implant, but Beer said that with more work a better written exploit could deliver it in a “handful of seconds.” Exploits work only on devices that are within Wi-Fi range of the attacker.

There is no evidence that this vulnerability was ever used in the wild.

EDITED TO ADD: Slashdot thread.

Building a scalable streaming data processor with Amazon Kinesis Data Streams on AWS Fargate

Post Syndicated from Florian Mair original https://aws.amazon.com/blogs/big-data/building-a-scalable-streaming-data-processor-with-amazon-kinesis-data-streams-on-aws-fargate/

Data is ubiquitous in businesses today, and the volume and speed of incoming data are constantly increasing. To derive insights from data, it’s essential to deliver it to a data lake or a data store and analyze it. Real-time or near-real-time data delivery can be cost prohibitive; therefore, an efficient architecture is key for processing, and it becomes more essential with growing data volume and velocity.

In this post, we show you how to build a scalable producer and consumer application for Amazon Kinesis Data Streams running on AWS Fargate. Kinesis Data Streams is a fully managed and scalable data stream that enables you to ingest, buffer, and process data in real time. AWS Fargate is a serverless compute engine for containers that works with AWS container orchestration services like Amazon Elastic Container Service (Amazon ECS), which allows us to easily run, scale, and secure containerized applications.

This solution also uses the Amazon Kinesis Producer Library (KPL) and Amazon Kinesis Client Library (KCL) to ingest data into the stream and to process it. KPL helps you optimize shard utilization in your data stream by specifying settings for aggregation and batching as data is being produced into your data stream. KCL helps you write robust and scalable consumers that can keep up with fluctuating data volumes being sent to your data stream.

The sample code for this post is available in a GitHub repo, which also includes an AWS CloudFormation template to get you started.

What is data streaming?

Before we look into the details of data streaming architectures, let’s get started with a brief overview of data streaming. Streaming data is data that is generated continuously by a large number of sources that transmit the data records simultaneously in small packages. You can use data streaming for many use cases, such as log processing, clickstream analysis, device geo-location, social media data processing, and financial trading.

A data streaming application consists of two layers: the storage layer and the processing layer. As stream storage, AWS offers the managed services Kinesis Data Streams and Amazon Managed Streaming for Apache Kafka (Amazon MSK), but you can also run stream storages like Apache Kafka or Apache Flume on Amazon Elastic Compute Cloud (Amazon EC2) or Amazon EMR. The processing layer consumes the data from the storage layer and runs computations on that data. This could be an Apache Flink application running fully managed on Amazon Kinesis Analytics for Apache Flink, an application running stream processing frameworks like Apache Spark Streaming and Apache Storm or a custom application using the Kinesis API or KCL. For this post, we use Kinesis Data Streams as the storage layer and the containerized KCL application on AWS Fargate as the processing layer.

Streaming data processing architecture

This section gives a brief introduction to the solution’s architecture, as shown in the following diagram.

The architecture consists of four components:

  • Producer group (data ingestion)
  • Stream storage
  • Consumer group (stream processing)
  • Kinesis Data Streams auto scaling

Data ingestion

For ingesting data into the data stream, you use the KPL, which aggregates, compresses, and batches data records to make the ingestion more efficient. In this architecture, the KPL increased the per-shard throughput up to 100 times, compared to ingesting the records with the PutRecord API (more on this in the Monitoring your stream and applications section). This is because the records are smaller than 1 KB each and the example code uses the KPL to buffer and send a collection of records in one HTTP request.

Record buffering can consume enough memory to crash the producer application; therefore, we recommend handling back-pressure. A sample on handling back-pressure is available in the KPL GitHub repo.

Not every use case is suited for using the KPL for ingestion. Due to batching and aggregation, the KPL has to buffer records, and therefore introduces some additional per-record latency. For a large number of small producers (such as mobile applications), you should use the PutRecords API to batch records or implement a proxy that handles aggregation and batching.
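
To make that alternative concrete, the following is a minimal sketch of batching records with the PutRecords API using the AWS SDK for Python (boto3). The stream name and Region are assumptions for illustration, and each call accepts at most 500 records.

import json
import uuid

import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")

def send_batch(records, stream_name="kinesis-data-processor"):
    """Send up to 500 small records in a single PutRecords call."""
    entries = [
        {
            # Each entry carries its own payload and partition key.
            "Data": json.dumps({"data": record}).encode("utf-8"),
            # Random partition keys spread records evenly across shards.
            "PartitionKey": str(uuid.uuid4()),
        }
        for record in records
    ]
    response = kinesis.put_records(StreamName=stream_name, Records=entries)
    # Throttled records are reported per entry rather than raised,
    # so return them to the caller for a retry.
    return [
        entries[i]
        for i, result in enumerate(response["Records"])
        if "ErrorCode" in result
    ]

# Example: send 100 small test records and collect anything throttled.
failed_entries = send_batch([f"test record {i}" for i in range(100)])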

In this post, you set up a simple HTTP endpoint that receives data records and processes them using the KPL. The producer application runs in a Docker container, which is orchestrated by Amazon ECS on AWS Fargate. A target tracking scaling policy manages the number of parallel running data ingestion containers. It adjusts the number of running containers so you maintain an average CPU utilization of 65%.

Stream storage: Kinesis Data Streams

As mentioned earlier, you can run a variety of streaming platforms on AWS. However, for the data processor in this post, you use Kinesis Data Streams. Kinesis Data Streams is a data store where the data is held for 24 hours and configurable up to 1 year. Kinesis Data Streams is designed to be highly available and redundant by storing data across three Availability Zones in the specified Region.

The stream consists of one or more shards, which are uniquely identified sequences of data records in a stream. One shard supports a maximum of 2 MB/s in reads (up to five read transactions per second) and 1 MB/s in writes (up to 1,000 records per second). Consumers with Dedicated Throughput (Enhanced Fan-Out) support up to 2 MB/s of data egress per consumer and shard.
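
To make those limits concrete, here is a back-of-the-envelope sizing example; the workload numbers are invented purely for illustration.

import math

# Hypothetical workload: 4,000 records/s averaging 1 KB each, read by
# two consumers that share the standard 2 MB/s per-shard read limit.
records_per_second = 4000
avg_record_kb = 1
consumers = 2

write_mb_per_s = records_per_second * avg_record_kb / 1024  # ~3.9 MB/s

shards_for_writes = max(
    math.ceil(write_mb_per_s / 1),         # 1 MB/s write limit per shard
    math.ceil(records_per_second / 1000),  # 1,000 records/s limit per shard
)
shards_for_reads = math.ceil(write_mb_per_s * consumers / 2)  # 2 MB/s read limit

print(max(shards_for_writes, shards_for_reads))  # 4 shards for this workload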

Each record written to Kinesis Data Streams has a partition key, which is used to group data by shard. In this example, the data stream starts with five shards. You use randomly generated partition keys for the records because records don’t have to be in a specific shard. Kinesis Data Streams assigns a sequence number to each data record, which is unique within the partition key. Sequence numbers generally increase over time, so you can identify which record was written to the stream before or after another.

Stream processing: KCL application on AWS Fargate

This post shows you how to use custom consumers—specifically, enhanced fan-out consumers—using the KCL. Enhanced fan-out consumers have a dedicated throughput of 2 MB/s and use a push model instead of pull to get data. Records are pushed to the consumer from the Kinesis Data Streams shards using HTTP/2 Server Push, which also reduces the latency for record processing. If you have more than one instance of a consumer, each instance has a 2 MB/s fan-out pipe to each shard independent from any other consumers. You can use enhanced fan-out consumers with the AWS SDK or the KCL.

For the producer application, this example uses the KPL, which aggregates and batches records. For the consumer to be able to process these records, the application needs to deaggregate the records. To do this, you can use the KCL or the Kinesis Producer Library Deaggregation Modules for AWS Lambda (with support for Java, Node.js, Python, and Go). The KCL is a Java library but also supports other languages via a MultiLangDaemon. The MultiLangDaemon uses STDIN and STDOUT to communicate with the record processor, so be aware of logging limitations. For this sample application, you use enhanced fan-out consumers with the KCL for Python 2.0.1.
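
For orientation, a minimal record processor for the MultiLangDaemon, based on the amazon_kclpy v2 sample application, looks roughly like the following sketch. The method and attribute names follow that sample and may differ slightly between library versions, and the log file path is an assumption.

import logging

from amazon_kclpy import kcl
from amazon_kclpy.v2 import processor

# Log to a file, never to STDOUT: STDOUT is reserved for the protocol
# between this process and the MultiLangDaemon.
logging.basicConfig(filename="app/logs/record_processor.log", level=logging.INFO)
log = logging.getLogger("record_processor")

class RecordProcessor(processor.RecordProcessorBase):
    def initialize(self, initialize_input):
        log.info("Record processor initialized")

    def process_records(self, process_records_input):
        for record in process_records_input.records:
            # The KCL deaggregates KPL user records before delivering them.
            log.info("Received record: %s", record.binary_data)
        # Checkpoint so a restarted worker resumes after this batch.
        process_records_input.checkpointer.checkpoint()

    def shutdown(self, shutdown_input):
        if shutdown_input.reason == "TERMINATE":
            shutdown_input.checkpointer.checkpoint()

    def shutdown_requested(self, shutdown_requested_input):
        log.info("Shutdown requested")

if __name__ == "__main__":
    kcl.KCLProcess(RecordProcessor()).run()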

Due to the STDOUT limitation, the record processor logs data records to a file that is written to the container logs and published to Amazon CloudWatch. If you create your own record processor, make sure it handles exceptions, otherwise records may be skipped.

The KCL creates an Amazon DynamoDB table to keep track of consumer progress. For example, if your stream has four shards and you have one consumer instance, your instance runs a separate record processor for each shard. If the consumer scales to two instances, the KCL rebalances the record processors and runs two record processors on each instance. For more information, see Using the Kinesis Client Library.

A target tracking scaling policy manages the number of parallel running data processor containers. It adjusts the number of running containers to maintain an average CPU utilization of 65%.

Container configuration

The base layer of the container is Amazon Linux 2 with Python 3 and Java 8. Although you use KCL for Python, you need Java because the record processor communicates with the MultiLangDaemon of the KCL.

During the Docker image build, the Python library for the KCL (version 2.0.1 of amazon_kclpy) is installed, and the sample application (release 2.0.1) from the KCL for Python GitHub repo is cloned. This allows you to use helper tools (samples/amazon_kclpy_helper.py) so you can focus on developing the record processor. The KCL is configured via a properties file (record_processor.properties).

For logging, you have to distinguish between logging of the MultiLangDaemon and the record processor. The logging configuration for the MultiLangDaemon is specified in logback.xml, whereas the record processor has its own logger. The record processor logs to a file and not to STDOUT, because the MultiLangDaemon uses STDOUT for communication, therefore the Daemon would throw an unrecognized messages error.

Logs written to a file (app/logs/record_processor.log) are attached to container logs by a subprocess that runs in the container entry point script (run.sh). The starting script also runs set_properties_py, which uses environment variables to set the AWS Region, stream name, and application name dynamically. If you want to also change other properties, you can extend this script.
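
The actual startup logic ships with the sample code in the repo; purely to illustrate the pattern, a hypothetical version of that environment-driven configuration step could look like the following sketch. The property keys and default values here are assumptions, not the repository’s exact contents.

# generate_properties.py (illustrative only)
import os

properties = {
    "executableName": "record_processor.py",
    "streamName": os.environ.get("STREAM_NAME", "kinesis-data-processor"),
    "applicationName": os.environ.get("APPLICATION_NAME", "DataProcessorConsumer"),
    "regionName": os.environ.get("AWS_REGION", "us-east-1"),
    "AWSCredentialsProvider": "DefaultAWSCredentialsProviderChain",
}

# Write the key=value pairs the MultiLangDaemon reads at startup.
with open("record_processor.properties", "w") as config_file:
    for key, value in properties.items():
        config_file.write(f"{key}={value}\n")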

The container gets its permissions (such as to read from Kinesis Data Streams and write to DynamoDB) by assuming the role ECSTaskConsumerRole01. This sample deployment uses 2 vCPU and 4 GB memory to run the container.

Kinesis capacity management

When changes in the rate of data flow occur, you may have to increase or decrease the capacity. With Kinesis Data Streams, you can have one or more hot shards as a result of unevenly distributed partition keys, very similar to a hot key in a database. This means that a certain shard receives more traffic than others, and if it’s overloaded, it produces a ProvisionedThroughputExceededException (enable detailed monitoring to see that metric on shard level).

You need to split these hot shards to increase throughput, and merge cold shards to increase efficiency. For this post, you use random partition keys (and therefore random shard assignment) for the records, so we don’t dive deeper into splitting and merging specific shards. Instead, we show how to increase and decrease throughput capacity for the whole stream. For more information about scaling on a shard level, see Strategies for Resharding.

You can build your own scaling application utilizing the UpdateShardCount, SplitShard, and MergeShards APIs, or use the custom resource scaling solution as described in Scale Amazon Kinesis Data Streams with AWS Application Auto Scaling, or the Amazon Kinesis Scaling Utils. The Application Auto Scaling solution is an event-driven scaling architecture based on CloudWatch alarms, and the Scaling Utils is a Docker container that constantly monitors your data stream. Application Auto Scaling manages the number of shards for scaling, whereas the Kinesis Scaling Utils additionally handles shard keyspace allocations, hot shard splitting, and cold shard merging. For this solution, you use the Kinesis Scaling Utils and deploy it on Amazon ECS. You can also deploy it on AWS Elastic Beanstalk as a container or on an Apache Tomcat platform.
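
If you prefer to script stream-wide scaling yourself instead of deploying one of those solutions, a minimal sketch using the UpdateShardCount API could look like the following; the stream name is a placeholder.

import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")

def scale_stream(stream_name, target_shards):
    # UNIFORM_SCALING lets Kinesis split and merge shards so the keyspace
    # stays evenly distributed; per-shard surgery would use SplitShard
    # and MergeShards instead.
    kinesis.update_shard_count(
        StreamName=stream_name,
        TargetShardCount=target_shards,
        ScalingType="UNIFORM_SCALING",
    )
    # The stream transitions to UPDATING; wait until it is ACTIVE again
    # before issuing further scaling calls.
    kinesis.get_waiter("stream_exists").wait(StreamName=stream_name)

scale_stream("kinesis-data-processor", target_shards=10)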

Prerequisites

For this walkthrough, you must have an AWS account.

Solution overview

In this post, we walk through the following steps:

  1. Deploying the CloudFormation template.
  2. Sending records to Kinesis Data Streams.
  3. Monitoring your stream and applications.

Deploying the CloudFormation template

Deploy the CloudFormation stack by choosing Launch Stack:

The template launches in the US East (N. Virginia) Region by default. To launch it in a different Region, use the Region selector in the console navigation bar. The following Regions are supported:

  • US East (Ohio)
  • US West (N. California)
  • US West (Oregon)
  • Asia Pacific (Singapore)
  • Asia Pacific (Sydney)
  • Europe (Frankfurt)
  • Europe (Ireland)

Alternatively, you can download the CloudFormation template and deploy it manually. When asked to provide an IPv4 CIDR range, enter the CIDR range that can send records to your application. You can change it later on by adapting the security groups inbound rule for the Application Load Balancer.

Sending records to Kinesis Data Streams

You have several options to send records to Kinesis Data Streams. You can do it from the CLI or any API client that can send REST requests, or use a load testing solution like Distributed Load Testing on AWS or Artillery. With load testing, additional charges for requests occur; as a guideline, 10,000 requests per second for 10 minutes generate an AWS bill of less than $5.00. To do a POST request via curl, run the following command and replace ALB_ENDPOINT with the DNS record of your Application Load Balancer. You can find it on the CloudFormation stack’s Outputs tab. Ensure you have a JSON element “data”. Otherwise, the application can’t process the record.

curl --location --request POST '<ALB_ENDPOINT>' --header 'Content-Type: application/json' --data-raw '{"data":" This is a testing record"}'

Your Application Load Balancer is the entry point for your data records, so all traffic has to pass through it. Application Load Balancers automatically scale to the appropriate size based on traffic by adding or removing different sized load balancer nodes.

Monitoring your stream and applications

The CloudFormation template creates a CloudWatch dashboard. You can find it on the CloudWatch console or by choosing the link on the stack’s Outputs tab on the CloudFormation console. The following screenshot shows the dashboard.

This dashboard shows metrics for the producer, consumer, and stream. The metric Consumer Behind Latest gives you the offset between current time and when the last record was written to the stream. An increase in this metric means that your consumer application can’t keep up with the rate records are ingested. For more information, see Consumer Record Processing Falling Behind.

The dashboard also shows you the average CPU utilization for the consumer and producer applications, the number of PutRecords API calls to ingest data into Kinesis Data Streams, and how many user records are ingested.

Without using the KPL, you would see one PutRecord equals one user record, but in our architecture, you should see a significantly higher number of user records than PutRecords. The ratio between UserRecords and PutRecords operations strongly depends on KPL configuration parameters. For example, if you increase the value of RecordMaxBufferedTime, data records are buffered longer at the producer, more records can be aggregated, but the latency for ingestion is increased.

All three applications (including the Kinesis Data Streams scaler) publish logs to their respective log group (for example, ecs/kinesis-data-processor-producer) in CloudWatch. You can either check the CloudWatch logs of the Auto Scaling Application or the data stream metrics to see the scaling behavior of Kinesis Data Streams.

Cleaning up

To avoid additional cost, ensure that the provisioned resources are decommissioned. To do that, delete the images in the Amazon Elastic Container Registry (Amazon ECR) repository, the CloudFormation stack, and any remaining resources that the CloudFormation stack didn’t automatically delete. Additionally, delete the DynamoDB table DataProcessorConsumer, which the KCL created.

Conclusion

In this post, you saw how to run the KCL for Python on AWS Fargate to consume data from Kinesis Data Streams. The post also showed you how to scale the data production layer (KPL), data storage layer (Kinesis Data Streams), and the stream processing layer (KCL). You can build your own data streaming solution by deploying the sample code from the GitHub repo. To get started with Kinesis Data Streams, see Getting Started with Amazon Kinesis Data Streams.


About the Author

Florian Mair is a Solutions Architect at AWS. He is a technologist who helps customers in Germany succeed and innovate by solving business challenges using AWS Cloud services. Besides working as a Solutions Architect, Florian is a passionate mountaineer and has climbed some of the highest mountains across Europe.

Certificates from Let’s Encrypt (R3 active)

Post Syndicated from ris original https://lwn.net/Articles/838811/rss

Let’s Encrypt has announced that, as of today, the TLS certificates issued by the Let’s Encrypt certificate authority are using a new intermediate certificate. “While LE will start using their new _roots_ next year, the change today is using a _variant_ of their “R3” certificate which is cross-signed from IdenTrust, rather than chaining back to their “ISRG Root X1”.

This will affect you if you’re using DANE, TLSA records in DNS, signed by DNSSEC, to advertise properties of the certificate chain which remote systems should expect to see.”

IEEE President’s Column: Amid Global Uncertainty, IEEE Steps Up

Post Syndicated from Toshio Fukuda original https://spectrum.ieee.org/the-institute/ieee-member-news/ieee-presidents-column-amid-global-uncertainty-ieee-steps-up


THE INSTITUTE As 2020 draws to a close, I look back on my year as IEEE president and marvel at what have been 12 world-changing, paradigm-shifting months. Throughout this period one thing became quite apparent: IEEE is more than just our technical conferences, publications, and standards. IEEE is a vibrant, engaged, international community growing every year and contributing more diverse, insightful, and essential work than ever before. This year our community has come together in new ways, faced the challenges of a global pandemic, and emerged even stronger.

The year demonstrated the impact that professional engineers and technologists have had on society. We have witnessed amazing engineering developments and important medical and technological breakthroughs. We have stayed connected and engaged, leveraging computing and communications to allow critical work to continue while keeping individuals and families safe. The challenges and changes we have witnessed in local communities, across nations, and around the world confirm that the work of professional engineers, technologists, educators, young professionals, and students preparing for technical careers will continue to be in high demand and have a great impact.

I would be remiss if I didn’t thank all of you who proudly call yourselves IEEE members. Your enthusiasm in being members, in joining together in virtual communities, participating in our webinars, writing for our journals, and moving our professions forward is greatly appreciated. We will continue to look for ways to improve IEEE’s products, our communications, and our advocacy. We will also continue to engage the public, policymakers, and the news media about the important work that you and your colleagues do each and every day.

I want to acknowledge the many volunteer leaders who have served on the boards of our major committees, sections, societies, and councils and thank all who agreed to dedicate their time to the work being done by our regions, chapters, and branches. As a whole, there were thousands of volunteers at all levels this year who said “yes” when asked to serve. The profession owes all of you a debt of gratitude for your efforts.

Finally, I would like to recognize our professional staff from around the world and thank them for their efforts in continuing to successfully meet the challenge of supporting our mission and our members while working under unique circumstances.

NEW WAYS OF CONNECTING

Despite the loss of face-to-face opportunities and interaction, this year IEEE became more vital than ever. IEEE operations not only continued but also intensified to meet the increased need for access to technical resources; the need for swifter dissemination of pandemic-related papers; a seamless transition to online platforms for conferences and events; and, perhaps most importantly, embracing new ways to connect and communicate.

Our membership has remained strong, our resources were in high demand, and we worked to increase the public’s understanding of the key roles that our members play in society around the globe.

In response to the pandemic’s challenges, we learned even more ways to use technology to work smarter and to reach wider audiences by engaging them how and where it worked best for them. For example, the Region 10 Students, Young Professionals, Women in Engineering, and Life Members virtual congress held this fall attracted more than 10,000 online participants. Instead of the conventional full-day model, the congress was held for shortened periods of time over 16 days. This provided more flexibility for members to participate from numerous time zones and still fulfill their own professional and personal commitments.

The International Conference on Intelligent Robots and Systems pivoted to an on-demand conference, moving beyond constraints of time and space to provide a platform to view prerecorded videos of the more than 1,400 technical talks and 60 tutorials. This enabled participants to easily access content, anywhere, anytime, and with any device.

Personally, by participating in so many virtual events, I have been able to attend more IEEE activities and engage with more IEEE members around the world than ever before, despite the pandemic.

It has been quite an interesting year to be IEEE president. Navigating the post-COVID meeting and conference environment will require adaptability, flexibility, and innovation. IEEE has a great opportunity to develop new models for virtual and hybrid events that provide participants all the benefits of IEEE’s cutting-edge technical content.

EDUCATION EVOLUTION

Another promising development for IEEE is the ongoing evolution of its role within the field of continuing professional education and lifelong learning. It is imperative that IEEE be one of the driving forces within the area of professional development—taking advantage of the latest online platforms and our unique worldwide volunteer community, which can provide a local-content perspective from almost anywhere on the planet. Throughout 2020, we dedicated time, energy, and expertise to this important topic.

Action plans have been developed to create IEEE Academies on artificial intelligence, the Internet of Things, and the smart grid. The IEEE Academies will provide members new and unique value, as they will be able to take training with a more thorough learning pathway. They will also combine resources such as eLearning courses, webinar recordings, videos, and articles about a key subject of interest together with new materials and take learners through a guided journey that better ties these concepts and materials together. Our volunteer educators-in-chief are building these educational products that IEEE can offer to raise the caliber of professionals in these fields.

In this year of unprecedented challenges and uncertainty, I’ve had the remarkable opportunity to witness IEEE’s mission—advancing technology for the benefit of humanity—in action by our members, who are making significant improvements throughout society. That, in my opinion, is one of the primary benefits of being part of a global community such as IEEE. Together, IEEE members have changed our world—and we will continue to do so every day.

A future of promise and possibility lies ahead for IEEE. We will continue to build that future together. I thank you for helping us progress during this extraordinary year.

Share your thoughts with me at [email protected].

This article appears in the December 2020 print issue as “Amid Global Uncertainty, IEEE Steps Up.”

Controlling data lake access across multiple AWS accounts using AWS Lake Formation

Post Syndicated from Rafael Suguiura original https://aws.amazon.com/blogs/big-data/controlling-data-lake-access-across-multiple-aws-accounts-using-aws-lake-formation/

When deploying data lakes on AWS, you can use multiple AWS accounts to better separate different projects or lines of business. In this post, we see how the AWS Lake Formation cross-account capabilities simplify securing and managing distributed data lakes across multiple accounts through a centralized approach, providing fine-grained access control to the AWS Glue Data Catalog and Amazon Simple Storage Service (Amazon S3) locations.

Use case

Keeping each business unit’s resources, such as compute and storage, in its own AWS account allows for easier cost allocation and permissions governance. On the other hand, centralizing your Data Catalog into a single account with Lake Formation removes the overhead of managing multiple catalogs in isolated data silos, simplifying management and improving data availability.

For this post, we use the example of a company with two separate teams:

  • The Analytics team is responsible for data ingestion, validation, and cleansing. After processing the incoming data, they store it on Amazon S3 and use Lake Formation for the Data Catalog, in a primary AWS account.
  • The Business Analyst team is responsible for generating reports and extracting insight from such data. They use Amazon Athena running in a secondary AWS account.

When a secondary account needs to access data, the data lake administrator uses Lake Formation to share data across accounts, avoiding data duplication and silos, and reducing complexity. Data can be shared at the database or table level, and the administrator can define which tables and columns each analyst has access to, establishing centralized and granular access control. The following diagram illustrates this architecture.

Architecture overview

We provide two AWS CloudFormation templates to set up the required infrastructure for this data lake use case. Each template deploys the resources in one of the accounts (primary and secondary).

In the primary account, the CloudFormation template loads the sample data in the S3 bucket. For this post, we use the publicly available dataset of historic taxi trips collected in New York City in the month of June 2020, in CSV format. The dataset is available from the New York City Taxi & Limousine Commission, via Registry of Open Data on AWS, and contains information on the geolocation and collected fares of individual taxi trips.

The template also creates a Data Catalog configuration by crawling the bucket using an AWS Glue crawler, and updating the Lake Formation Data Catalog on the primary account.

Prerequisites

To follow along with this post, you must have two AWS accounts (primary and secondary), with AWS Identity and Access Management (IAM) administrator access.

Deploying the CloudFormation templates

To get started, launch the first CloudFormation template in the primary account:

After that, deploy the second template in the secondary account:

You now have the deployment as depicted in the following architecture, and are ready to set up Lake Formation with cross-account access.

Setting up Lake Formation in the primary account

Now that you have the basic infrastructure provisioned by the template, we can dive deeper into the steps required for Lake Formation configuration. First sign in to the primary account on the AWS Management Console, using the existing IAM administrator role and account.

Assigning a role to our data lake

Lake Formation administrators are IAM users or roles that can grant and delegate Lake Formation permissions on data locations, databases, and tables. The CloudFormation template created an IAM role with the proper IAM permissions, named LakeFormationPrimaryAdmin. Now we need to assign it to our data lake:

  1. On the Lake Formation console, in the Welcome to Lake Formation pop-up window, choose Add administrators.
    1. If the pop-up doesn’t appear, in the navigation pane, under Permissions, choose Admins and database creators.
    2. Under Data Lake Administrators, choose Grant.
  2. For IAM users and roles, choose LakeFormationPrimaryAdmin.
  3. Choose Save.

After we assign the Lake Formation administrator, we can assume this role and start managing our data lake.

  1. On the console, choose your user name and choose Switch Roles.

  1. Enter your primary account number and the role LakeFormationPrimaryAdmin.
  2. Choose Switch Role.

For detailed instructions on changing your role, see Switching to a role (console).

Adding the Amazon S3 location as a storage layer

Now you’re the Lake Formation administrator. For Lake Formation to implement access control on the data lake, we need to include the Amazon S3 location as a storage layer. Let’s register our existing S3 bucket that contains sample data.

  1. On the Lake Formation console, in the navigation pane, under Register and Ingest, choose Data lake locations.
  2. For Amazon S3 path, choose Browse.
  3. Choose the S3 bucket in the primary account, referenced in the CloudFormation template outputs as S3BucketPrimary.
  4. Choose Register location.
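
If you later automate this step, the equivalent call through the Lake Formation API is roughly the following sketch; the bucket ARN stands in for the S3BucketPrimary output, and the Region is an assumption.

import boto3

lakeformation = boto3.client("lakeformation", region_name="us-east-1")

# Register the bucket with the service-linked role so Lake Formation
# can vend temporary credentials for this location.
lakeformation.register_resource(
    ResourceArn="arn:aws:s3:::<S3BucketPrimary>",
    UseServiceLinkedRole=True,
)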

Configuring access control

When you deployed the CloudFormation template, an AWS Glue crawler populated the Data Catalog with a database and a table pointing to our S3 bucket. By default, Lake Formation adds IAMAllowedPrincipals permissions, which aren’t compatible with cross-account sharing. We must disable them on our database and table. For this post, we use Lake Formation access control in conjunction with IAM. For more information, see Change Data Catalog Settings.

  1. On the Lake Formation console, in the navigation pane, under Data Catalog, choose Databases.
  2. Choose gluedatabaseprimary.
  3. Choose Edit.
  4. Deselect Use only IAM access control for new tables in this database.
  5. Choose Save.

  1. On the database details page, on the Actions menu, choose Revoke.
  2. For IAM users and roles, choose IAMAllowedPrincipals.
  3. For Database permissions, select Super.

  1. Choose Revoke.
  2. On the database details page, choose View Tables.
  3. Select the table that starts with lf_table.
  4. On the Actions menu, choose Revoke.
  5. For IAM users and roles, choose IAMAllowedPrincipals.
  6. For Database permissions, select Super.
  7. Choose Revoke.

You can now see the metadata and Amazon S3 data location in the table details. The CloudFormation template ran an AWS Glue crawler that populated the table.
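
The same revocations can be scripted. The sketch below uses the Lake Formation API, where the console’s Super permission corresponds to ALL; the generated table suffix and the Region are placeholders.

import boto3

lakeformation = boto3.client("lakeformation", region_name="us-east-1")
iam_allowed = {"DataLakePrincipalIdentifier": "IAM_ALLOWED_PRINCIPALS"}

# Revoke the default Super (ALL) permission on the database...
lakeformation.revoke_permissions(
    Principal=iam_allowed,
    Resource={"Database": {"Name": "gluedatabaseprimary"}},
    Permissions=["ALL"],
)

# ...and on the crawled table.
lakeformation.revoke_permissions(
    Principal=iam_allowed,
    Resource={"Table": {"DatabaseName": "gluedatabaseprimary", "Name": "lf_table_<string>"}},
    Permissions=["ALL"],
)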

Granting permissions

Now we’re ready to grant permissions to the Business Analyst users. Because they’re in a separate AWS account, we need to share the database across accounts.

  1. On the Lake Formation console, under Data Catalog¸ choose Databases.
  2. Select our database.
  3. On the Actions menu, choose Grant.
  4. Select External account.
  5. For AWS account ID or AWS organization ID, enter the secondary account number.
  6. For Table, choose All tables.
  7. For Table permissions, select Select.
  8. For Grantable permissions, select Select.

Grantable permissions are required to allow the principal to pass this grant to other users and roles. For our use case, the secondary account LakeFormationAdministrator grants access to the secondary account BusinessAnalyst. If this permission is revoked on the primary account in the future, all access granted to BusinessAnalyst and LakeFormationAdministrator on the secondary account is also revoked.

For this post, we share the database with a single account. Lake Formation also allows sharing with an AWS organization.

  1. Choose Grant.
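
For reference, a roughly equivalent cross-account grant through the Lake Formation API uses a table wildcard to cover every table in the database; the secondary account ID and the Region are placeholders in this sketch.

import boto3

lakeformation = boto3.client("lakeformation", region_name="us-east-1")

lakeformation.grant_permissions(
    Principal={"DataLakePrincipalIdentifier": "<secondary-account-id>"},
    Resource={
        "Table": {
            "DatabaseName": "gluedatabaseprimary",
            "TableWildcard": {},  # every table in the database
        }
    },
    Permissions=["SELECT"],
    # Grantable permissions let the secondary account's admin re-grant
    # SELECT to its own analysts.
    PermissionsWithGrantOption=["SELECT"],
)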

Sharing specific tables across accounts

Optionally, instead of sharing the whole database, you can share specific tables across accounts. You don’t need to share the database to share a table underneath it.

  1. On the Lake Formation console, under Data Catalog, choose Tables.
  2. Select the table that starts with lf_table.
  3. On the Actions menu, choose Grant.
  4. Select External account.
  5. For AWS account ID or AWS organization ID, enter the secondary account number.

You can also choose specific columns to share with the secondary account. For this post, we share five columns.

  1. For Columns, choose Include columns.
  2. For Include columns, choose the following columns
    1. vendorid
    2. lpep_pickup_datetime
    3. lp_dropoff_taketime
    4. store_and_forward_flag
    5. ratecodeid
  3. For Table permissions, select Select.
  4. For Grantable permissions, select Select.
  5. Choose Grant.
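
Column-level sharing follows the same pattern with a TableWithColumns resource; the following sketch lists the five columns chosen above, with the account ID, Region, and table suffix as placeholders.

import boto3

lakeformation = boto3.client("lakeformation", region_name="us-east-1")

lakeformation.grant_permissions(
    Principal={"DataLakePrincipalIdentifier": "<secondary-account-id>"},
    Resource={
        "TableWithColumns": {
            "DatabaseName": "gluedatabaseprimary",
            "Name": "lf_table_<string>",
            # Only these columns become visible to the secondary account.
            "ColumnNames": [
                "vendorid",
                "lpep_pickup_datetime",
                "lp_dropoff_taketime",
                "store_and_forward_flag",
                "ratecodeid",
            ],
        }
    },
    Permissions=["SELECT"],
    PermissionsWithGrantOption=["SELECT"],
)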

Setting up Lake Formation in the secondary account

Now that the primary account setup is complete, let’s configure the secondary account. We access the resource share and create appropriate resource links, pointing to the databases or tables in the primary account. This allows the data lake administrator to grant proper access to the Business Analyst team, who queries the data through Athena. The following diagram illustrates this architecture.

Assigning a role to our data lake

Similar to the primary account, we need to assign an IAM role as the Lake Formation administrator. To better differentiate the roles, this one is named LakeFormationSecondaryAdmin.

  1. On the Lake Formation console, under Permissions, choose Admins and database creators.
  2. Under Data Lake Administrators, choose Grant.
  3. In the pop-up window, choose LakeFormationSecondaryAdmin.
  4. Choose Save.
  5. On the console, switch to the LakeFormationSecondaryAdmin role.

Sharing resources

Lake Formation shares resources (databases and tables) by using AWS Resource Access Manager (AWS RAM). AWS RAM provides a streamlined way to share resources across AWS accounts and also integrates with AWS Organizations. If both primary and secondary accounts are in the same organization with resource sharing enabled, resource shares are accepted automatically and you can skip this step. If not, complete the following steps:

  1. On the AWS RAM console, in the navigation pane, under Shared with me, choose Resource shares.
  2. Choose the Lake Formation share.
  3. Choose Accept resource share.

The resource status switches to Active.
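
If you want to automate the acceptance instead, a sketch using the AWS RAM API, run with the secondary account’s credentials, could look like this; the Region is an assumption.

import boto3

ram = boto3.client("ram", region_name="us-east-1")

# Find invitations shared with this (secondary) account and accept the
# pending Lake Formation share.
invitations = ram.get_resource_share_invitations()["resourceShareInvitations"]
for invitation in invitations:
    if invitation["status"] == "PENDING":
        ram.accept_resource_share_invitation(
            resourceShareInvitationArn=invitation["resourceShareInvitationArn"]
        )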

Creating a resource link

With the share accepted, we can create a resource link in the secondary account. Resource links are Data Catalog virtual objects that link to a shared database or table. The resource link lives in your account, while the object it references can live in another account.

  1. On the Lake Formation console, under Data Catalog, choose Databases.
  2. Choose Create database.
  3. Select Resource link.
  4. For Resource link name, enter a name, such as lf-primary-database-rl.
  5. For Shared database, choose gluedatabaseprimary.

The shared database’s owner ID is populated automatically.

  1. Choose Create.

You can use this resource link the same way you use database or table references in Lake Formation. The following screenshot shows the resource link listed on the Databases page.
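
Resource links can also be created programmatically through the AWS Glue API; in the following sketch, the primary account ID is a placeholder.

import boto3

glue = boto3.client("glue", region_name="us-east-1")

# A resource link is a Data Catalog database entry that points at the
# shared database in the primary account.
glue.create_database(
    DatabaseInput={
        "Name": "lf-primary-database-rl",
        "TargetDatabase": {
            "CatalogId": "<primary-account-id>",
            "DatabaseName": "gluedatabaseprimary",
        },
    }
)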

Granting permissions

As the creator of the resource link, at this point only you (IAM role LakeFormationSecondaryAdmin) can view and access this object in the Data Catalog. To grant visibility on the resource link to our Business Analyst users (IAM role LakeFormationSecondaryAnalyst), we need to grant them describe permissions.

  1. On the Lake Formation console, navigate to the database details page.
  2. On the Actions menu, choose Grant.
  3. For IAM users and roles, choose LakeFormationSecondaryAnalyst.
  4. For Resource Link permissions, select Describe and deselect Super.
  5. Choose Grant.

Granting permissions on a resource link doesn’t grant permissions on the target (linked) database or table, so let’s do it now. For our use case, the analysts only need SQL SELECT capabilities, and only to the specific columns of the table.

  1. In the navigation pane, under Data Catalog, choose Databases.
  2. Select lf-primary-database-rl.
  3. On the Actions menu, choose Grant on Target.
  4. In the Grant permissions dialog box, choose My account.
  5. For IAM users and roles, choose LakeFormationSecondaryAnalyst.
  6. Choose the table that starts with lf_table.
  7. Under Columns, select Include Columns and select the first five columns.
  8. For Table permissions, select Select.
  9. Choose Grant.

Accessing the data

With all the Lake Formation grants in place, the users are ready to access the data at the proper level.

  1. In the secondary account, switch to the role LakeFormationSecondaryAnalyst.
  2. On the Athena console, choose Get Started.
  3. On the selection bar, under Workgroup, choose LakeFormationCrossAccount.
  4. Choose Switch workgroup.

The screen refreshes; make sure you are in the right workgroup.

To use Lake Formation cross-account access, you don’t need a separate Athena workgroup. For this post, the CloudFormation template created one to simplify deployment with the proper Athena configuration.

  1. For Data source, choose AwsDataCatalog.
  2. For Database, choose lf-primary-database-rl.
  3. For Tables, choose lf_table_<string>.
  4. On the menu, choose Preview table.

  1. Choose Run query.

You now have a data analyst on the secondary account with access to an S3 bucket in the primary account. The analyst only has access to the five columns we specified earlier.
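
You can run the same preview query outside the console as well. The sketch below uses the Athena API with the resource link and the workgroup created by the template; the Region and the generated table suffix are placeholders, and the workgroup is assumed to have a query result location configured.

import time

import boto3

athena = boto3.client("athena", region_name="us-east-1")

query = athena.start_query_execution(
    QueryString='SELECT * FROM "lf-primary-database-rl"."lf_table_<string>" LIMIT 10;',
    WorkGroup="LakeFormationCrossAccount",
)
query_id = query["QueryExecutionId"]

# Poll until the query finishes, then print the column-filtered rows.
while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    for row in athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]:
        print([col.get("VarCharValue") for col in row["Data"]])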

Data access that is granted by Lake Formation cross-account access is logged in the secondary account AWS CloudTrail log file, and Lake Formation copies the event to the primary account’s log file. For more information and examples of logging messages, see Cross-Account CloudTrail Logging.

Cleaning up

To avoid incurring future charges, delete the CloudFormation templates after you finish testing the solution.

Conclusion

In this post, we went through the process of configuring Lake Formation to share AWS Glue Data Catalog metadata information across AWS accounts.

Large enterprises typically use multiple AWS accounts, and many of those accounts might need access to a data lake managed by a single AWS account. AWS Lake Formation with cross-account access set up enables you to run queries and jobs that can join and query tables across multiple accounts.


About the Authors

Rafael Suguiura is a Principal Solutions Architect at Amazon Web Services. He guides some of the world’s largest financial services companies in their cloud journey. When the weather is nice, he enjoys cycling and finding new hiking trails—and when it’s not, he catches up with sci-fi books, TV series, and video games.


Himanish Kushary is a Senior Big Data Architect at Amazon Web Services. He helps customers across multiple domains build scalable big data analytics platforms. He enjoys playing video games, and watching good movies and TV series.

UX Design – A Competitive Advantage for Robotics Companies

Post Syndicated from Rocos original https://spectrum.ieee.org/robotics/robotics-software/ux-design-a-competitive-advantage-for-robotics-companies

UX design differentiates great products from average ones and helps you stand out from the rest. Now, in the age of robotics, your teams may be creating products with no blueprint to build from.

Find out why a UX-led approach to the design and build of your products is critical, learn from the masters how to get it right the first time, speed up your time to market, and deliver products users love.


