Tag Archives: artificial intelligence

Being Naughty to See Who Was Nice: Machine Learning Attacks on Santa’s List

Post Syndicated from Erick Galinkin original https://blog.rapid7.com/2022/01/14/being-naughty-to-see-who-was-nice-machine-learning-attacks-on-santas-list/

Being Naughty to See Who Was Nice: Machine Learning Attacks on Santa’s List

Editor’s note: We had planned to publish our Hacky Holidays blog series throughout December 2021 – but then Log4Shell happened, and we dropped everything to focus on this major vulnerability that impacted the entire cybersecurity community worldwide. Now that it’s 2022, we’re feeling in need of some holiday cheer, and we hope you’re still in the spirit of the season, too. Throughout January, we’ll be publishing Hacky Holidays content (with a few tweaks, of course) to give the new year a festive start. So, grab an eggnog latte, line up the carols on Spotify, and let’s pick up where we left off.

Santa’s task of making the nice and naughty list has gotten a lot harder over time. According to estimates, there are around 2.2 billion children in the world. That’s a lot of children to make a list of, much less check it twice! So like many organizations with big data problems, Santa has turned to machine learning to help him solve the issue and built a classifier using historical naughty and nice lists. This makes it easy to let the algorithm decide whether they’ll be getting the gifts they’ve asked for or a lump of coal.

Being Naughty to See Who Was Nice: Machine Learning Attacks on Santa’s List

Santa’s lists have long been a jealously guarded secret. After all, being on the naughty list can turn one into a social pariah. Thus, Santa has very carefully protected his training data — it’s locked up tight. Santa has, however, made his model’s API available to anyone who wants it. That way, a parent can check whether their child is on the nice or naughty list.

Santa, being a just and equitable person, has already asked his data elves to tackle issues of algorithmic bias. Unfortunately, these data elves have overlooked some issues in machine learning security. Specifically, the issues of membership inference and model inversion.

Membership inference attacks

Membership inference is a class of machine learning attacks that allows a naughty attacker to query a model and ask, in effect, “Was this example in your training data?” Using the techniques of Salem et al. or a tool like PrivacyRaven, an attacker can train a model that figures out whether or not a model has seen an example before.

Being Naughty to See Who Was Nice: Machine Learning Attacks on Santa’s List

From a technical perspective, we know that there is some amount of memorization in models, and so when they make their predictions, they are more likely to be confident on items that they have seen before — in some ways, “memorizing” examples that have already been seen. We can then create a dataset for our “shadow” model — a model that approximates Santa’s nice/naughty system, trained on data that we’ve collected and labeled ourselves.

We can then take the training data and label the outputs of this model with a “True” value — it was in the training dataset. Then, we can run some additional data through the model for inference and collect the outputs and label it with a “False” value — it was not in the training dataset. It doesn’t matter if these in-training and out-of-training data points are nice or naughty — just that we know if they were in the “shadow” training dataset or not. Using this “shadow” dataset, we train a simple model to answer the yes or no question: “Was this in the training data?” Then, we can turn our naughty algorithm against Santa’s model — “Dear Santa, was this in your training dataset?” This lets us take real inputs to Santa’s model and find out if the model was trained on that data — effectively letting us de-anonymize the historical nice and naughty lists!

Model inversion

Now being able to take some inputs and de-anonymize them is fun, but what if we could get the model to just tell us all its secrets? That’s where model inversion comes in! Fredrikson et al. proposed model inversion in 2015 and really opened up the realm of possibilities for extracting data from models. Model inversion seeks to take a model and, as the name implies, turn the output we can see into the training inputs. Today, extracting data from models has been done at scale by the likes of Carlini et al., who have managed to extract data from large language models like GPT-2.

Being Naughty to See Who Was Nice: Machine Learning Attacks on Santa’s List

In model inversion, we aim to extract memorized training data from the model. This is easier with generative models than with classifiers, but a classifier can be used as part of a larger model called a Generative Adversarial Network (GAN). We then sample the generator, requesting text or images from the model. Then, we use the membership inference attack mentioned above to identify outputs that are more likely to belong to the training set. We can iterate this process over and over to generate progressively more training set-like outputs. In time, this will provide us with memorized training data.

Note that model inversion is a much heavier lift than membership inference and can’t be done against all models all the time — but for models like Santa’s, where the training data is so sensitive, it’s worth considering how much we might expose! To date, model inversion has only been conducted in lab settings on models for text generation and image classification, so whether or not it could work on a binary classifier like Santa’s list remains an open question.

Mitigating model mayhem

Now, if you’re on the other side of this equation and want to help Santa secure his models, there are a few things we can do. First and foremost, we want to log, log, log! In order to carry out the attacks, the model — or a very good approximation — needs to be available to the attacker. If you see a suspicious number of queries, you can filter IP addresses or rate limit. Additionally, limiting the return values to merely “naughty” or “nice” instead of returning the probabilities can make both attacks more difficult.

For extremely sensitive applications, the use of differential privacy or optimizing with DPSGD can also make it much more difficult for attackers to carry out their attacks, but be aware that these techniques come with some accuracy loss. As a result, you may end up with some nice children on the naughty list and a naughty hacker on your nice list.

Santa making his list into a model will save him a whole lot of time, but if he’s not careful about how the model can be queried, it could also lead to some less-than-jolly times for his data. Membership inference and model inversion are two types of privacy-related attacks that models like this may be susceptible to. As a best practice, Santa should:

  • Log information about queries like:
    • IP address
    • Input value
    • Output value
    • Time
  • Consider differentially private model training
  • Limit API access
  • Limit the information returned from the model to label-only

NEVER MISS A BLOG

Get the latest stories, expertise, and news about security today.

More Hacky Holidays blogs

How can AI-based analysis help educators support students?

Post Syndicated from Henna Gorsia original https://www.raspberrypi.org/blog/ai-sytems-in-education-learner-support-research-seminar/

We are hosting a series of free research seminars about how to teach artificial intelligence (AI) and data science to young people, in partnership with The Alan Turing Institute.

In the fifth seminar of this series, we heard from Rose Luckin, Professor of Learner Centred Design at the University College London (UCL) Knowledge Lab. Rose is Founder of EDUCATE Ventures Research Ltd., a London consultancy service working with start-ups, researchers, and educators to develop evidence-based educational technology.

Rose Luckin.
Rose Luckin, UCL

Based on her experience at EDUCATE, Rose spoke about how AI-based analysis could help educators gain a deeper understanding of their students, and how educators could work with AI systems to provide better learning resources to their students. This provided us with a different angle to the first four seminars in our current series, where we’ve been thinking about how young people learn to understand AI systems.

Rose Luckin's definition of AI: technology capable of actions and behaviours "requiring intelligence when done by humans".
Rose’s definition of artificial intelligence for this presentation.

Education and AI systems

AI systems have the potential to impact education in a number of different ways, which Rose distilled into three areas: 

  1. Using AI in education to tackle some of the big educational challenges
  2. Educating teachers about AI so that they can use it safely and effectively 
  3. Changing education so that we focus on human intelligence and prepare people for an AI world

It is clear that the three areas are interconnected, meaning developments in one area will affect the others. Rose’s focus during the seminar was the second area: educating people about AI.

Rose Luckin's definition of the three intersections of education and artificial intelligence, see text in list above.

What can AI systems do in education? 

Through giving examples of existing AI-based systems used for education, Rose described what in particular it is about AI systems that can be useful in an education setting. The first point she raised was that AI systems can adapt based on learning from data. Her main example was the AI-based platform ENSKILLS, which detects the user’s level of competency with spoken English through the user’s interactions with a virtual character, and gradually adapts the character to the user’s level. Other examples of adaptive AI systems for education include Carnegie Learning and Century Intelligent Learning.

We know that AI systems can respond to different forms of data. Rose introduced the example of OyaLabs to demonstrate how AI systems can gather and process real-time sensory data. This is an app that parents can use in a young child’s room to monitor the child’s interactions with others. The app analyses the data it gathers and produces advice for parents on how they can support their child’s language development.

AI system creators can also combine adaptivity and real-time sensory data processing  in their systems. One example Rosa gave of this was SimSensei from the University of Southern California. This is a simulated coach, which a student can interact with and which gathers real-time data about how the student is speaking, including their tone, speed of speech, and facial expressions. The system adapts its coaching advice based on these interactions and on what it learns from interactions with other students.

Getting ready for AI systems in education

For the remainder of her presentation, Rose focused on the framework she is involved in developing, as part of the EDUCATE service, to support organisations to prepare for implementing AI systems, including educators within these organisations. The aim of this ETHICAI framework is to enable organisations and educators to understand:

  • What AI systems are capable of doing
  • The strengths and weaknesses of AI systems
  • How data is used by AI systems to learn
The EDUCATE consultancy service's seven-part AI readiness framework, see test below for list.

Rose described the seven steps of the framework as:

  1. Educate, enthuse, excite – about building an AI mindset within your community 
  2. Tailor and Hone – the particular challenges you want to focus on
  3. Identify – identify (wisely), collate and …
  4. Collect – new data relevant to your focus
  5. Apply – AI techniques to the relevant data you have brought together
  6. Learn – understand what the data is telling you about your focus and return to step 5 until you are AI ready
  7. Iterate

She then went on to demonstrate how the framework is applied using the example of online teaching. Online teaching has been a key part of education throughout the coronavirus pandemic; AI systems could be used to analyse datasets generated during online teaching sessions, in order to make decisions for and recommendations to educators.

The first step of the ETHICAI framework is educate, enthuse, excite. In Rose’s example, this step consisted of choosing online teaching as a scenario, because it is very pertinent to a teacher’s practice. The second step is to tailor and hone in on particular challenges that are to be the focus, capitalising on what AI systems can do. In Rose’s example, the challenge is assessing the quality of online lessons in a way that would be useful to educators. The third step of the framework is to identify what data is required to perform this quality assessment.

Examples of data to be fed into an AI system for education, see text.

The fourth step is the collection of new data relevant to the focus of the project. The aim is to gain an increased understanding of what happens in online learning across thousands of schools. Walking through the online learning example, Rose suggested we might be able to collect the following types of data:

  • Log data
  • Audio data
  • Performance data
  • Video data, which includes eye-movement data
  • Historical data from tests and interviews
  • Behavioural data from surveying teachers and parents about how they felt about online learning

It is important to consider the ethical implications of gathering all this data about students, something that was a recurrent theme in both Rose’s presentation and the Q&A at the end.

Step five of the ETHICAI framework focuses on applying AI techniques to the relevant data to combine and process it. The figure below shows that in preparation, the various data sets need to be collated, cleaned, organised, and transformed.

Presentation slide showing that data for an AI system needs to be collated, cleaned, organised, and transformed.

From the correctly prepared data, interaction profiles can be produced in order to put characteristics from different lessons into groups/profiles. Rose described how cluster analysis using a combination of both AI and human intelligence could be used to sort lessons into groups based on common features.

The sixth step in Rose’s example focused on what may be learned from analysing collected data linked to the particular challenge of online teaching and learning. Rose said that applying an AI system to students’ behavioural data could, for example, give indications about students’ focus and confidence, and make or recommend interventions to educators accordingly.

Presentation slide showing example graphs of results produced by an AI system in education.

Where might we take applications of AI systems in education in the future?

Rose described that AI systems can possess some types of intelligence humans have or can develop: interdisciplinary academic intelligence, meta-knowing intelligence, and potentially social intelligence. However, there are types such as meta-contextual intelligence and perceived self-efficacy that AI systems are not able to demonstrate in the way humans can.

The seven types of human intelligence as defined by Rose Luckin: interdisciplinary academic knowledge, meta-knowing intelligence, social intelligence, metacognitive intelligence, meta-subjective intelligence, meta-contextual knowledge, perceived self-efficacy.

The use of AI systems in education can cause ethical issues. As an example, Rose pointed out the use of virtual glasses to identify when students need help, even if they do not realise it themselves. A system like this could help educators with assessing who in their class needs more help, and could link this back to student performance. However, using such a system like this has obvious ethical implications, and some of these were the focus of the Q&A that followed Rose’s presentation.

It’s clear that, in the education domain as in all other domains, both positive and negative outcomes of integrating AI are possible. In a recent paper written by Wayne Holmes (also from the UCL Knowledge Lab) and co-authors, ‘Ethics of AI in Education: Towards a Community Wide Framework’ [1], the authors suggest that the interpretation of data, consent and privacy, data management, surveillance, and power relations are all ethical issues that should be taken into consideration. Finding consensus for a practical ethical framework or set of principles, with all stakeholders, at the very start of an AI-related project is the only way to ensure ethics are built into the project and the AI system itself from the ground up.

Two boys at laptops in a classroom.

Ethical issues of AI systems more broadly, and how to involve young people in discussions of AI ethics, were the focus of our seminar with Dr Mhairi Aitken back in September. You can revisit the seminar recording, presentation slides, and summary blog post.

I really enjoyed both the focus and content of Rose’s talk: educators understanding how AI systems may be applied to education in order to help them make more informed decisions about how to best support their students. This is an important factor to consider in the context of the bigger picture of what young people should be learning about AI. The work that Rose and her colleagues are doing also makes an important contribution to translating research into practical models that teachers can use.

Join our next free seminars

You may still have time to sign up for our Tuesday 11 January seminar, today at 17:00–18:30 GMT, where we will welcome Dave Touretzky and Fred Martin, founders of the influential AI4K12 framework, which identifies the five big ideas of AI and how they can be integrated into education.

Next month, on 1 February at 17:00–18:30 GMT, Tara Chklovski (CEO of Technovation) will give a presentation called Teaching youth to use AI to tackle the Sustainable Development Goals at our seminar series.

If you want to join any of our seminars, click the button below to sign up and we will send you information on how to join. We look forward to seeing you there!

You’ll always find our schedule of upcoming seminars on this page. For previous seminars, you can visit our past seminars and recordings page.

The post How can AI-based analysis help educators support students? appeared first on Raspberry Pi.

Snapshots from the history of AI, plus AI education resources

Post Syndicated from Janina Ander original https://www.raspberrypi.org/blog/machine-learning-education-snapshots-history-ai-hello-world-12/

In Hello World issue 12, our free magazine for computing educators, George Boukeas, DevOps Engineer for the Astro Pi Challenge here at the Foundation, introduces big moments in the history of artificial intelligence (AI) to share with your learners:

The story of artificial intelligence (AI) is a story about humans trying to understand what makes them human. Some of the episodes in this story are fascinating. These could help your learners catch a glimpse of what this field is about and, with luck, compel them to investigate further.                   

The imitation game

In 1950, Alan Turing published a philosophical essay titled Computing Machinery and Intelligence, which started with the words: “I propose to consider the question: Can machines think?” Yet Turing did not attempt to define what it means to think. Instead, he suggested a game as a proxy for answering the question: the imitation game. In modern terms, you can imagine a human interrogator chatting online with another human and a machine. If the interrogator does not successfully determine which of the other two is the human and which is the machine, then the question has been answered: this is a machine that can think.

A statue of Alan Turing on a park bench in Manchester.
The Alan Turing Memorial in Manchester

This imitation game is now a fiercely debated benchmark of artificial intelligence called the Turing test. Notice the shift in focus that Turing suggests: thinking is to be identified in terms of external behaviour, not in terms of any internal processes. Humans are still the yardstick for intelligence, but there is no requirement that a machine should think the way humans do, as long as it behaves in a way that suggests some sort of thinking to humans.

In his essay, Turing also discusses learning machines. Instead of building highly complex programs that would prescribe every aspect of a machine’s behaviour, we could build simpler programs that would prescribe mechanisms for learning, and then train the machine to learn the desired behaviour. Turing’s text provides an excellent metaphor that could be used in class to describe the essence of machine learning: “Instead of trying to produce a programme to simulate the adult mind, why not rather try to produce one which simulates the child’s? If this were then subjected to an appropriate course of education one would obtain the adult brain. We have thus divided our problem into two parts: the child-programme and the education process.”

A chess board with two pieces of each colour left.
Chess was among the games that early AI researchers like Alan Turing developed algorithms for.

It is remarkable how Turing even describes approaches that have since been evolved into established machine learning methods: evolution (genetic algorithms), punishments and rewards (reinforcement learning), randomness (Monte Carlo tree search). He even forecasts the main issue with some forms of machine learning: opacity. “An important feature of a learning machine is that its teacher will often be very largely ignorant of quite what is going on inside, although he may still be able to some extent to predict his pupil’s behaviour.”

The evolution of a definition

The term ‘artificial intelligence’ was coined in 1956, at an event called the Dartmouth workshop. It was a gathering of the field’s founders, researchers who would later have a huge impact, including John McCarthy, Claude Shannon, Marvin Minsky, Herbert Simon, Allen Newell, Arthur Samuel, Ray Solomonoff, and W.S. McCulloch.   

Go has vastly more possible moves than chess, and was thought to remain out of the reach of AI for longer than it did.

The simple and ambitious definition for artificial intelligence, included in the proposal for the workshop, is illuminating: ‘making a machine behave in ways that would be called intelligent if a human were so behaving’. These pioneers were making the assumption that ‘every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it’. This assumption turned out to be patently false and led to unrealistic expectations and forecasts. Fifty years later, McCarthy himself stated that ‘it was harder than we thought’.

Modern definitions of intelligence are of distinctly different flavour than the original one: ‘Intelligence is the quality that enables an entity to function appropriately and with foresight in its environment’ (Nilsson). Some even speak of rationality, rather than intelligence: ‘doing the right thing, given what it knows’ (Russell and Norvig).

A computer screen showing a complicated graph.
The amount of training data AI developers have access to has skyrocketed in the past decade.

Read the whole of this brief history of AI in Hello World #12

In the full article, which you can read in the free PDF copy of the issue, George looks at:

  • Early advances researchers made from the 1950s onwards while developing games algorithms, e.g. for chess.
  • The 1997 moment when Deep Blue, a purpose-built IBM computer, beating chess world champion Garry Kasparov using a search approach.
  • The 2011 moment when Watson, another IBM computer system, beating two human Jeopardy! champions using multiple techniques to answer questions posed in natural language.
  • The principles behind artificial neural networks, which have been around for decades and are now underlying many AI/machine learning breakthroughs because of the growth in computing power and availability of vast datasets for training.
  • The 2017 moment when AlphaGo, an artificial neural network–based computer program by Alphabet’s DeepMind, beating Ke Jie, the world’s top-ranked Go player at the time.
Stacks of server hardware behind metal fencing in a data centre.
Machine learning systems need vast amounts of training data, the collection and storage of which has only become technically possible in the last decade.

More on machine learning and AI education in Hello World #12

In your free PDF of Hello World issue 12, you’ll also find:

  • An interview with University of Cambridge statistician David Spiegelhalter, whose work shaped some of the foundations of AI, and who shares his thoughts on data science in schools and the limits of AI 
  • An introduction to Popbots, an innovative project by MIT to open AI to the youngest learners
  • An article by Ken Kahn, researcher in the Department of Education at the University of Oxford, on using the block-based Snap! language to introduce your learners to natural language processing
  • Unplugged and online machine learning activities for learners age 7 to 16 in the regular ‘Lesson plans’ section
  • And lots of other relevant articles

You can also read many of these articles online on the Hello World website.

Find more resources for AI and data science education

In Hello World issue 16, the focus is on all things data science and data literacy for your learners. As always, you can download a free copy of the issue. And on our Hello World podcast, we chat with practicing computing educators about how they bring AI, AI ethics, machine learning, and data science to the young people they teach.

If you want a practical introduction to the basics of machine learning and how to use it, take our free online course.

Drawing of a machine learning ars rover trying to decide whether it is seeing an alien or a rock.

There are still many open questions about what good AI and data science education looks like for young people. To learn more, you can watch our panel discussion about the topic, and join our monthly seminar series to hear insights from computing education researchers around the world.

We are also collating a growing list of educational resources about these topics based on our research seminars, seminar participants’ recommendations, and our own work. Find the resource list here.

The post Snapshots from the history of AI, plus AI education resources appeared first on Raspberry Pi.

New AWS Scholarship Program Helps Underrepresented and Underserved Students Prep for Careers in AI and ML

Post Syndicated from Antje Barth original https://aws.amazon.com/blogs/aws/new-aws-scholarship-program-helps-underrepresented-and-underserved-students-prep-for-careers-in-ai-and-ml/

As a woman working in information technology (IT) for many years, it has always been close to my heart to challenge long-standing gender stereotypes and inspire more young learners to consider a career in tech. With artificial intelligence (AI) and machine learning (ML) defining the future of technology, this future also depends on diverse representation.

The World Economic Forum estimates that technological advances and automation will create 97 million new technology jobs by 2025, including in the field of AI and ML. Yet, according to their research, women make up just 32% of AI jobs globally. The Pew Research Center found that Black and Hispanic workers in the U.S. comprise just 9% and 8% of workers in the science, technology, engineering, and mathematics (STEM) careers respectively.

At Amazon, we believe that technology should be built in a way that’s inclusive, diverse, and equitable. We already offer STEM programs to enable underrepresented groups to build or advance their technical skills and open up new career possibilities. Programs such as We Power Tech, Amazon Future Engineer, AWS Girls’ Tech Day, and AWS GetIT aim to build a future of tech that is inclusive, diverse, and accessible.

Today, I am excited to announce the launch of the AWS AI & ML Scholarship program in collaboration with Intel and Udacity, designed to prepare underrepresented and underserved students globally for careers in ML.

AWS AI & ML Scholarship Program Debuts with AWS DeepRacer Student
The AWS AI & ML Scholarship program is launching as part of the all-new AWS DeepRacer Student service and Student League. This is a new student division of the popular AWS DeepRacer program, a cloud-based 3D car racing simulator that provides a fun way to learn about ML and reinforcement learning (RL). Through DeepRacer Student, you have access to free online trainings to learn the ML and RL basics. You also have access to 10 hours of model training and 5 GB of storage per month to participate in the DeepRacer Student League, a global autonomous racing competition exclusively for AWS AI & ML students.

As part of DeepRacer Student, the AWS AI & ML Scholarship program in collaboration with Intel and Udacity is geared toward underserved and underrepresented high school and college students globally who are at least 16 years old. These are students who may have faced financial barriers growing up or are part of underrepresented groups, including women, people with disabilities, LGBTQ+ persons, as well as people of color. Students in these groups who take part in the DeepRacer Student League are eligible for a chance to win one or two of 2,500 annual scholarships from Udacity, an online learning platform focused on technology skills.

How to Qualify for the AWS AI & ML Scholarship
AWS DeepRacer Student League LogoTo be considered for a scholarship, you must successfully finish all AWS DeepRacer Student learning modules and achieve a score of at least 80% on all course quizzes, reach a certain lap time performance with your DeepRacer car in the Student League, and submit an essay.

Each year, 2,000 students will win a scholarship to the AI Programming with Python Udacity Nanodegree program. Udacity Nanodegrees are massive open online courses (MOOCs) designed to bridge the gap between learning and career goals. This aims to equip students with programming and ML fundamentals to solve real world problems with ML. The top 500 participants in this first Nanodegree will be eligible to join a second customized Nanodegree program curated specifically for AWS AI & ML Scholarship program recipients.

Coaching and Mentorship Opportunities for Scholarship Recipients
Scholarship recipients not only get access to the educational content, but also receive up to 85 hours of support from Udacity instructors to make sure that students successfully learn and progress through the Nanodegree. Instructors and students meet weekly in small groups, review the content for the week, answer questions, and work on a case study as a group exercise.

The top 500 participants who advance into the second Nanodegree will receive 12 months of mentorship from tenured technology leaders at Amazon and Intel to help prepare for a tech career.

All scholarship recipients will be given exclusive access to Ask-Me-Anythings (AMAs), fireside chats, and office hours with AI/ML professionals and diversity experts from Amazon, Intel, and AWS collaborators such as Girls in Tech. These will help familiarize students with different job functions in the AI/ML field.

Enroll via AWS DeepRacer Student
To enroll in the AWS AI & ML Scholarship program, first sign up at the AWS DeepRacer Student service with a valid email address. Note that this student player account is separate from an AWS account and doesn’t require you to provide any billing or credit card information.

AWS DeepRacer Student Sign Up

You will receive a verification code via email. Once you have verified your email address, log in to the DeepRacer Student service and complete the sign-up process.

AWS DeepRacer Student Complete Sign Up

The form will ask for some additional information, including your school, major, and planned year of graduation. You must also self-certify that you are a student enrolled in either high school, university, or community college.

AWS DeepRacer Student Complete Sign Up with Personal Information

Next, enroll in the AWS AI & ML Scholarship program by selecting the corresponding checkbox. The scholarship qualification will start on March 1, 2022.

AWS DeepRacer Student Opt In To Scholarship

Welcome to AWS DeepRacer Student! Start learning the fundamentals of ML through the provided online trainings. You can also participate in the pre-season DeepRacer Student League before the qualifying races start on March 1, 2022.

AWS DeepRacer Student Dashboard

Available Now
Start preparing for the AWS AI & ML Scholarship program by signing up for AWS DeepRacer Student, available globally today, and opt in to the scholarship program. Start learning with learning modules, train your first AWS DeepRacer model, and see how it performs in AWS DeepRacer Student League Pre-season happening now. Join the AWS Slack community to connect with experts and ask questions.

Sign up for AWS DeepRacer Student and the AWS AI & ML Scholarship program today.

Antje

Now in Preview – Amazon SageMaker Studio Lab, a Free Service to Learn and Experiment with ML

Post Syndicated from Antje Barth original https://aws.amazon.com/blogs/aws/now-in-preview-amazon-sagemaker-studio-lab-a-free-service-to-learn-and-experiment-with-ml/

Our mission at AWS is to make machine learning (ML) more accessible. Through many conversations over the past years, I learned about barriers that many ML beginners face. Existing ML environments are often too complex for beginners, or too limited to support modern ML experimentation. Beginners want to quickly start learning and not worry about spinning up infrastructure, configuring services, or implementing billing alarms to avoid going over budget. This emphasizes another barrier for many people: the need to provide billing and credit card information at sign-up.

What if you could have a predictable and controlled environment for hosting your Jupyter notebooks in which you can’t accidentally run up a big bill? One that doesn’t require billing and credit card information at all at sign-up?

Today, I am extremely happy to announce the public preview of Amazon SageMaker Studio Lab, a free service that enables anyone to learn and experiment with ML without needing an AWS account, credit card, or cloud configuration knowledge.

At AWS, we believe technology has the power to solve the world’s most pressing issues. And, we proudly support the new and innovative ways that our customers are using these technologies to deliver social impacts.

This is why I am also excited to announce the launch of the AWS Disaster Response Hackathon using Amazon SageMaker Studio Lab. The hackathon, starting today and running through February 7, 2022, is a great way to start learning ML while doing good in the world. I will share more details on how to get involved at the end of the post.

Getting Started with Amazon SageMaker Studio Lab
Studio Lab is based on open-source JupyterLab and gives you free access to AWS compute resources to quickly start learning and experimenting with ML. Studio Lab is also simple to set up. In fact, the only configuration you have to do is one click to choose whether you need a CPU or GPU instance for your project. Let me show you.

The first step is to request a free Studio Lab account here.

Amazon SageMaker Studio Lab

When your account request is approved, you will receive an email with a link to the Studio Lab account registration page. You can now create your account with your approved email address and set a password and your username. This account is separate from an AWS account and doesn’t require you to provide any billing information.

Amazon SageMaker Studio Lab - Create Account

Once you have created your account and verified your email address, you can sign in to Studio Lab. Now, you can select the compute type for your project. You can choose between 12 hours of CPU or 4 hours of GPU per user session, with an unlimited number of user sessions available to you. Furthermore, you get a minimum of 15 GB of persistent storage per project. When your session expires, Studio Lab will take a snapshot of your environment. This enables you to pick up right where you left off. Let’s select CPU for this demo, and choose Start runtime.

Amazon SageMaker Studio Lab - Select Compute

Once the instance is running, select Open project to go to your free Studio Lab environment and start building. No additional configuration is required.

Amazon SageMaker Studio Lab - Open Project

Amazon SageMaker Studio Lab Environment

Customize your environment
Studio Lab comes with a Python base image to get you started. The image only has a few libraries pre-installed to save the available space for the frameworks and libraries that you actually need.

Amazon SageMaker Studio Lab - Select Kernel

You can customize the Conda environment and install additional packages using the %conda install <package> or %pip install <package> command right from within your notebook. You can also create entirely new, custom Conda environments, or install open-source JupyterLab and Jupyter Server extensions. For detailed instructions, see the Studio Lab documentation.

GitHub integration
Studio Lab is tightly integrated with GitHub and offers full support for the Git command line. This lets you easily clone, copy, and save your projects. Moreover, you can add an Open in Studio Lab badge to the README.md file or notebooks in your public GitHub repo to share your work with others.

Open in Amazon SageMaker Studio Lab Badge

This will let everyone open and view the notebook in Studio Lab. If they have a Studio Lab account, then they can also run the notebook. Add the following markdown to the top of your README.md file or notebook to add the Open in Studio Lab badge:

[![Open In Studio Lab](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/org/repo/blob/master/path/to/notebook.ipynb)

Replace org, repo, path and the notebook filename with those for your repo. Then, when you click the Open in Studio Lab badge, it will preview the notebook in Studio Lab. If your repo is private within a GitHub account or organization and you would like other people to use it, then you must additionally install the Amazon SageMaker GitHub App at the GitHub account or organization level.

Amazon SageMaker Studio Lab Notebook Preview

If you have a Studio Lab account, you can click Copy to project and choose to either copy just the notebook or to clone the entire repo into your Studio Lab account. Moreover, Studio Lab can check if the repository contains a Conda environment file and build the custom Conda environment for you.

Learn the Fundamentals of ML
If you are new to ML, then Studio Lab provides access to free, educational content to get you started. Dive into Deep Learning (D2L) is a free interactive book that teaches the ideas, the math, and the code behind ML and DL. The AWS Machine Learning University (MLU) gives you access to the same ML courses used to train Amazon’s own developers on ML. Hugging Face is a large open source community and a hub for pre-trained deep learning (DL) models. This is mainly aimed at natural language processing. In just a few clicks, you can import the relevant notebooks from D2L, MLU, and Hugging Face into your Studio Lab environment.

Join the AWS Disaster Response Hackathon using Amazon SageMaker Studio Lab
The frequency and severity of natural disasters are increasing. This year alone, we have seen significant wildfires across the Western United States and in countries like Greece and Turkey; major floods across Europe; and Hurricane Ida’s impact to the coast of Louisiana. In response, governments, businesses, nonprofits, and international organizations are placing more emphasis on disaster preparedness and response than ever before.

AWS Disaster Response Hackathon

Through the AWS Disaster Response Hackathon offering a total of $54,000 USD in prices, we hope to stimulate ways of applying ML to solve pressing challenges in natural disaster preparedness and response.

Join the hackathon today, start building, and don’t forget to submit your project before February 7, 2022. This hackathon is also an attempt to set the Guinness World Record for the “largest machine learning competition.”

Join the Preview
You can request a free Amazon SageMaker Studio Lab account starting today. The number of new account registrations will be limited to ensure a high quality of experience for all customers. You can find sample notebooks in the Studio Lab GitHub repository. Give it a try and let us know your feedback.

Request a free Amazon SageMaker Studio Lab account.

Antje

New – Introducing SageMaker Training Compiler

Post Syndicated from Sean M. Tracey original https://aws.amazon.com/blogs/aws/new-introducing-sagemaker-training-compiler/

An image explaining the benefits of using Amazon SageMaker Training CompilerToday, we’re pleased to announce Amazon SageMaker Training Compiler, a new Amazon SageMaker capability that can accelerate the training of deep learning (DL) models by up to 50%.

As DL models grow in complexity, so too does the time it can take to optimize and train them. For example, it can take 25,000 GPU-hours to train popular natural language processing (NLP) model “RoBERTa“. Although there are techniques and optimizations that customers can apply to reduce the time it can take to train a model, these also take time to implement and require a rare skillset. This can impede innovation and progress in the wider adoption of artificial intelligence (AI).

How has this been done to date?
Typically, there are three ways to speed up training:

  1. Using more powerful, individual machines to process the calculations
  2. Distributing compute across a cluster of GPU instances to train the model in parallel
  3. Optimizing model code to run more efficiently on GPUs by utilizing less memory and compute.

In practice, optimizing machine learning (ML) code is difficult, time-consuming, and a rare skill set to acquire. Data scientists typically write their training code in a Python-based ML framework, such as TensorFlow or PyTorch, relying on ML frameworks to convert their Python code into mathematical functions that can run on GPUs, commonly known as kernels. However, this translation from the Python code of a user is often inefficient because ML frameworks use pre-built, generic GPU kernels, instead of creating kernels specific to the code and model of the user.

It can take even the most skilled GPU programmers months to create custom kernels for each new model and optimize them. We built SageMaker Training Compiler to solve this problem.

Today’s launch lets SageMaker Training Compiler automatically compile your Python training code and generate GPU kernels specifically for your model. Consequently, the training code will use less memory and compute, and therefore train faster. For example, when fine-tuning Hugging Face’s GPT-2 model, SageMaker Training Compiler reduced training time from nearly 3 hours to 90 minutes.

Automatically Optimizing Deep Learning Models
So, how have we achieved this acceleration? SageMaker Training Compiler accelerates training jobs by converting DL models from their high-level language representation to hardware-optimized instructions that train faster than jobs with off-the-shelf frameworks. Under the hood, SageMaker Training Compiler makes incremental optimizations beyond what the native PyTorch and TensorFlow frameworks offer to maximize compute and memory utilization on SageMaker GPU instances.

More specifically, SageMaker Training Compiler uses graph-level optimization (operator fusion, memory planning, and algebraic simplification), data flow-level optimizations (layout transformation, common sub-expression elimination), and back-end optimizations (memory latency hiding, loop oriented optimizations) to produce an optimized model that efficiently uses hardware resources. As a result, training is accelerated by up to 50%, and the returned model is the same as if SageMaker Training Compiler had not been used.

But how do you use SageMaker Training Compiler with your models? It can be as simple as adding two lines of code!

SageMaker Training Compiler Code Changes

The shortened training times mean that customers gain more time for innovating and deploying their newly-trained models at a reduced cost and a greater ability to experiment with larger models and more data.

Getting the most from SageMaker Training Compiler
Although many DL models can benefit from SageMaker Training Compiler, larger models with longer training will realize the greatest time and cost savings. For example, training time and costs fell by 30% on a long-running RoBERTa-base fine-tuning exercise.

Jorge Lopez Grisman, a Senior Data Scientist at Quantum Health – an organization on a mission to “make healthcare navigation smarter, simpler, and more cost-effective for everyone” – said:

“Iterating with NLP models can be a challenge because of their size: long training times bog down workflows and high costs can discourage our team from trying larger models that might offer better performance. Amazon SageMaker Training Compiler is exciting because it has the potential to alleviate these frictions. Achieving a speedup with SageMaker Training Compiler is a real win for our team that will make us more agile and innovative moving forward.”

Further Resources
To learn more about how Amazon SageMaker Training Compiler can benefit you, you can visit our page here. And to get started see our technical documentation here.

New – Create and Manage EMR Clusters and Spark Jobs with Amazon SageMaker Studio

Post Syndicated from Sean M. Tracey original https://aws.amazon.com/blogs/aws/new-create-and-manage-emr-clusters-and-spark-jobs-with-amazon-sagemaker-studio/

Today, we’re very excited to offer three new enhancements to our Amazon SageMaker Studio service.

As of now, users of SageMaker Studio can create, terminate, manage, discover, and connect to Amazon EMR clusters running within a single AWS account and in shared accounts across an organization—all directly from SageMaker Studio. Furthermore, SageMaker Studio Notebook users can able to utilize SparkUI to monitor and debug Spark jobs running on an Amazon EMR cluster—directly from the SageMaker Studio Notebooks!

The story so far…
Before today, SageMaker Studio users had some ability to find and connect with EMR clusters, provided that they were running in the same account as SageMaker Studio. While useful in many circumstances, if a cluster did not exist that would suit the requirements of the model or analysis being run, then data scientists would have to leave their development environment and manually configure a cluster that suited their needs. As well as being disruptive to workflow of data scientists, there are no guarantees that the data scientists would have either the permissions or depth of knowledge required to provision a cluster that would enable them to continue with their work. Additionally, being restricted to creating and managing clusters in a single account could become prohibitive in organizations working across many AWS accounts.

What’s new?
Data scientists can:

  • Discover, manage, create, terminate, and connect to Amazon EMR clusters from within SageMaker Studio
  • Utilize “templates” – a new way to configure and provision clusters for your workload needs with support from seasoned DevOps practitioners
  • Connect to, debug, and monitor Spark jobs running on an Amazon EMR cluster from within a SageMaker Studio Notebook

Creating, Connecting to, and Managing EMR Clusters

Connecting to an EMR Cluster from a SageMaker Studio Notebook

With the ability to connect to and manage EMR clusters from within SageMaker Studio, data scientists no longer have to leave their familiar environment to create, configure and provision the EMR clusters where they run their workloads.

Introducing Templates
A template is a collection of off-the-shelf cluster configurations optimized for numerous workloads. Templates can be created and managed by DevOps administrators and made available through the AWS Service Catalog to data scientists within SageMaker Studio. This lets them quickly spin up a cluster to meet their needs, all while safe in the knowledge that a trusted DevOps admin has correctly configured a cluster per the project’s requirements. Furthermore, this lets data scientists get on with the work they do best, and it gives DevOps administrators within these teams greater ability to manage the types of provisioned infrastructure.

Managing EMR Clusters from within SageMaker Studio Notebooks

Directly Connect to and monitor Spark Jobs
Finally, to make the job of data scientists even simpler, we’ve built the ability to connect to, debug, and monitor Spark jobs running on an Amazon EMR cluster from within a SageMaker Studio Notebook. Before now, to access the monitoring UI of a Spark Job, one needed to configure secure tunnels and web proxies to gain direct access to currently executing jobs, adding friction to the workflow of a data scientist trying to observe and debug their workloads. Now, with these new features, users will have one-click access directly from the interface that they already know. This enables them to build and put their workloads to work, rather than spending time on configuring infrastructure and workloads.

Connecting to a Spark Job from within a SageMaker Studio Notebook

These new features let data scientists can use a simple, consistent UI to provision and manage infrastructure as needed without ever having to leave SageMaker Studio or dive into the minutiae of the provisioning of such hardware – Moreover, they won’t have to spend time configuring proxies and SSH tunnels to debug and monitor ongoing Spark jobs.

Find out more
These features are generally available in all AWS Regions where SageMaker Studio is available, and there are no additional charges to use this capability. For complete information on pricing and regional availability, please refer to the SageMaker Studio pricing page .

To learn more, see our documentation.

Announcing Amazon SageMaker Ground Truth Plus

Post Syndicated from Sean M. Tracey original https://aws.amazon.com/blogs/aws/announcing-amazon-sagemaker-ground-truth-plus/

Today, we’re pleased to announce the latest service in the Amazon SageMaker suite that will make labeling datasets easier than ever before. Ground Truth Plus is a turn-key service that uses an expert workforce to deliver high-quality training datasets fast, and reduces costs by up to 40 percent.

The Challenges of Machine Learning Model Creation
One of the biggest challenges in building and training machine learning (ML) models is sourcing enough high-quality, labeled data at scale to feed into and train those models so that they can make an accurate prediction.

On the face of it, labeling data might seem like a fairly straightforward task…

  • Step 1: Get data
  • Step 2: Label it

…but this is far from the reality.

Even before you have labelers begin annotations, you need a custom labeling workflow and user interface specific to your project so that you get a high-quality dataset. This relies on a combination of robust tooling and skilled workers, and the effort spent can be significant.

Once the data labeling workflow and user interface has been constructed, a workforce to use those systems must be organized and trained – and this is all before a single point of data has been labeled!

Finally, once the labeling systems have been built, the workflows designed, and the workforce trained and deployed, the process of passing data through that system must be monitored and checked to ensure a consistent, high-quality output. After enough data has been passed through and labeled by the system, you have arrived at the point you’ve been trying to get to all along: you finally have enough data to train the ML model.

Each of these steps represents a significant investment in time, costs, and energy. You could be spending these resources building ML models instead of labeling and managing data, and using Ground Truth Plus can help free you up to do just that.

Introducing Amazon SageMaker Ground Truth Plus
Amazon SageMaker Ground Truth Plus enables you to easily create high-quality training datasets without having to build labeling applications and manage the labeling workforce on your own. Which means you don’t even need to have deep ML expertise or extensive knowledge of workflow design and quality management. You simply provide data along with labeling requirements and Ground Truth Plus sets up the data labeling workflows and manages them on your behalf in accordance with your requirements.

For example, if you need medical experts to label radiology images, you can specify that in the guidelines you provide to Ground Truth Plus. The service will then automatically select labelers trained in radiology to label your data, and from there an expert workforce that is trained on a variety of ML tasks will start labeling the data. Ground Truth Plus brings ML-powered automation to data labeling, which increases the quality of the output dataset and decreases the data labeling costs.

Amazon SageMaker Ground Truth Plus uses a multi-step labeling workflow including ML techniques for active learning, pre-labeling, and machine validation. This reduces the time required to label datasets for a variety of use cases including computer vision and natural language processing. Finally, Ground Truth Plus provides transparency into data labeling operations and quality management through interactive dashboards and user interfaces. This lets you monitor the progress of training datasets across multiple projects, track project metrics such as daily throughput, inspect labels for quality, and provide feedback on the labeled data.

How Does It Work?
A SageMaker Ground Truth Plus screenshot showing the request formFirst, let’s head to the new Ground Truth Plus console and fill out a form outlining the requirements for the data labeling project. Following that, our team of AWS Experts will schedule a call to discuss your data labeling project.

After the call, you simply upload data in an Amazon Simple Storage Service (Amazon S3) bucket for labeling.

Once the data has been uploaded, our experts will set-up the data labeling workflow per your requirements and create a team of labelers with the expertise necessary to label your data effectively. This helps make sure that you have the best people possible working on your projects.

These expert labelers use the Ground Truth Plus tools we’ve built to label these datasets quickly and effectively.

Initially, labelers will annotate the data you’ve uploaded, much like the following example image that we’ve uploaded from the CBCL StreetScenes dataset. However, as the labelers start to submit examples of labeled data, something cool begins happening: our ML systems kick in and start to pre-label the images on behalf of the expert workforce!

An example of the raw dataset used to demonstrate Amazon SageMaker Ground Truth Plus functionality

As more and more data is labeled by the expert workforce, the ML model becomes better at pre-labeling those images. This means that there’s less need for a human to spend as much time creating each individual label for every object of interest in a dataset. Less time spent on labeling means lower costs for you, and it also means a quicker turnaround in creating a dataset that can be used for training a model – all without sacrificing quality.

A screenshot showing one of the labelling interfaces for SageMaker Ground Truth Plus

As the process continues, these ML models will also start to highlight potential areas of interest that the labeling workforce may have missed or incorrectly labeled through machine validation (indicated below by the purple box). Once an area of interest has been highlighted, a human labeler can view and either confirm or delete the suggestion that the model has made. This iteratively improves the pre-labeling and machine validation stages, further reducing the time needed by a labeler to manually label the data, and ensures a high-quality output throughout the process.

A screenshot showing one of the labelling interfaces filed in my a machine learning model for SageMaker Ground Truth Plus

While this is all going on, you can monitor the progress and output of the project using the Ground Truth Plus Project Portal. Within this portal, you can track the amount of data labeled on a day-by-day basis, and make sure that the project is progressing at an acceptable rate.

A screenshot showing the metrics dashboard enabling users to track the progress of their labelling jobs in SageMaker Ground Truth Plus

With each batch of images uploaded and labeled, you can decide whether to accept them or send them back for relabeling if something has been missed.

Finally, when the labeling process has completed, you can retrieve the labeled data from a secure S3 bucket and get to the business of training models.

Find out more
Today, Amazon SageMaker Ground Truth Plus is available in the N. Virginia (us-east-1) region.

To learn more:

New – Amazon DevOps Guru for RDS to Detect, Diagnose, and Resolve Amazon Aurora-Related Issues using ML

Post Syndicated from Marcia Villalba original https://aws.amazon.com/blogs/aws/new-amazon-devops-guru-for-rds-to-detect-diagnose-and-resolve-amazon-aurora-related-issues-using-ml/

Today we are announcing Amazon DevOps Guru for RDS, a new capability for Amazon DevOps Guru. It allows developers to easily detect, diagnose, and resolve performance and operational issues in Amazon Aurora.

Hundreds of thousands of customers nowadays are using Amazon Aurora because it is highly available, scalable, and durable. But as applications grow in size and complexity, it becomes more challenging for these customers to detect and resolve operational and performance issues quickly.

During last year’s re:Invent, we announced DevOps Guru, a service that uses machine learning (ML) to automatically detect and alert customers of application issues, including database problems. Today we are announcing DevOps Guru for RDS to help developers using Amazon Aurora databases to detect, diagnose, and resolve database performance issues fast and at scale. Now developers will have enough information to determine the exact cause for a database performance issue. This launch will save developers and engineers many hours of work trying to uncover and remediate the performance-related database issues.

DevOps Guru for RDS uses ML to automatically identify and analyze a wide range of performance-related database issues, such as over-utilization of host resources, database bottlenecks, or misbehavior of SQL queries. It also recommends solutions to remediate the issues it finds. To use this capability, you don’t need to be a database or ML expert.

When an issue is detected, DevOps Guru for RDS displays the finding in the DevOps Guru console and sends notifications using Amazon EventBridge or Amazon Simple Notification Service (SNS). This allows developers to automatically manage and take real-time action on the issues.

How DevOps Guru for RDS Works
DevOps Guru for RDS uses anomaly detection on the database load (DB load) performance metric to detect issues. DB load is measured in units of Average Active Sessions (AAS). DB load measures the level of activity in your database, making it a great metric to understand the health of your database. If the DB load is high, this can result in performance issues. This metric can be compared to the number of virtual CPUs (vCPUs), and if the DB load is higher than that number, issues can arise.

The most useful dimensions for this metric are the wait events and the top SQL. The wait event describes what the system conditions that are currently running SQLs are waiting on. The most common reasons why a statement is waiting is that it is waiting for the CPU, waiting for a read or write, or waiting for a locked resource. The top SQL dimension shows which queries are contributing the most to DB load.

The following image is an example of a finding that DevOps Guru for RDS reported. The graph shows that from the AAS, most of them were waiting for access to a table or for CPU.

Example of anomaly detection
If you continue scrolling on the DevOps Guru for RDS analysis page, you can discover the cause for the problem and some recommendations to fix it. In this particular example, two problems were detected: high-load wait events and CPU capacity exceeded.

DevOps Guru for RDS looks more in-depth into these problems. First, it looks at the high-load wait events, where there were 27 AAS for the IO and CPU wait types, which is 99 percent of the total DB load.

Second, it tells us that the running tasks exceeded six processes. This database only has two vCPUs, and the recommended number of running processes should be a maximum of four (2x vCPUs). DevOps Guru for RDS also makes recommendations to fix these issues.

Recommendations

In another anomaly, the graph shows that there was a high load of wait events, and one SQL query was found to require further investigation. You can even see the exact SQL query if you click on the SQL digest IDs. The insight’s analysis and recommendation section is full of information on how to investigate further and fix the issue. You can get a lot of detailed information by clicking on the wait event, for example, on the wait event wait/io/table/sql/handler or in the View troubleshooting doc link.

Analysis and recommendations

Get started with DevOps Guru for RDS
To get started with this new capability of DevOps Guru, make sure that Performance Insights is enabled for your Amazon Aurora DB instances. It supports Amazon Aurora with MySQL- and PostgreSQL-compatibility. For instructions on how to enable Performance Insights, see Enabling and disabling Performance Insights.

The next step is to enable DevOps Guru to start monitoring your AWS resources. You can specify the resources you want to be covered by DevOps Guru.

If you are already using DevOps Guru, whenever there is a new insight for an Amazon Aurora database resource, you will see it in the console.

To see the detailed database analysis, navigate to the Insight page and select the new View analysis button under the DB load aggregated metric. That button will take you to the detailed analysis by DevOps Guru for RDS.

View analysis

Pricing and Availability
DevOps Guru for RDS is offered to customers at no additional charge, as part of the existing price that DevOps Guru charges customers for RDS resources.

DevOps Guru for RDS is available in all Regions where DevOps Guru is available, US East (Ohio), US East (N. Virginia), US West (Oregon), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), Europe (Frankfurt), Europe (Ireland), and Europe (Stockholm).

Learn more about DevOps Guru for RDS and check out the talk at AWS re:Invent “Automatically detect and resolve performance issues with Amazon DevOps Guru for RDS” (Session Id 15877).

Marcia

Announcing Amazon SageMaker Canvas – a Visual, No Code Machine Learning Capability for Business Analysts

Post Syndicated from Alex Casalboni original https://aws.amazon.com/blogs/aws/announcing-amazon-sagemaker-canvas-a-visual-no-code-machine-learning-capability-for-business-analysts/

As an organization facing business problems and dealing with data on a daily basis, the ability to build systems that can predict business outcomes becomes very important. This ability lets you solve problems and move faster by automating slow processes and embedding intelligence in your IT systems.

But how do you make sure that all teams and individual decision makers in the organization are empowered to create these machine learning (ML) systems at scale, and without depending on other data science and data engineering teams? As a business user or data analyst, you’d like to build and use prediction systems based on the data that you analyze and process every day, without having to learn about hundreds of algorithms, training parameters, evaluation metrics, and deployment best practices.

Today, I’m excited to announce the general availability of Amazon SageMaker Canvas, a new visual, no code capability that allows business analysts to build ML models and generate accurate predictions without writing code or requiring ML expertise. Its intuitive user interface lets you browse and access disparate data sources in the cloud or on-premises, combine datasets with the click of a button, train accurate models, and then generate new predictions once new data is available.

SageMaker Canvas leverages the same technology as Amazon SageMaker to automatically clean and combine your data, create hundreds of models under the hood, select the best performing one, and generate new individual or batch predictions. It supports multiple problem types such as binary classification, multi-class classification, numerical regression, and time series forecasting. These problem types let you address business-critical use cases, such as fraud detection, churn reduction, and inventory optimization, without writing a single line of code.

SageMaker Canvas in Action
Imagine that I’m an e-commerce manager who needs to predict whether or not a product will be shipped on time. The datasets at my disposal consist of a product catalog and the historical shipping dataset, both in CSV format.

First, I enter the SageMaker Canvas application where all of my models and datasets are created and inspected.

I select Import, and upload two CSV files: ProductData.csv and ShippingData.csv. I have 120 products and 10,000 shipping records.

I could also fetch data from Amazon Simple Storage Service (Amazon S3) or connect to other cloud or on-premises data sources, such as Amazon Redshift or Snowflake. For this use case, I prefer to upload 1.6 MB of data directly from my computer.

Before confirming the import, I have a chance to preview the two datasets, their columns, and their respective values. For example, each product has a ComputerBrand, ScreenSize, and PackageWeight. In addition to useful columns such as ShippingOrigin, OrderDate, and ShippingPriority, each record in the shipping dataset also contains OnTimeDelivery, which is either On Time or Late. This column will be used by SageMaker Canvas to generate a prediction model based on historical data.

After a few seconds of processing, the datasets are ready, and I decide to join them to create a single dataset containing both product and shipping information. This is an optional step that often lets you increase the precision of a prediction model.

Now I can simply drag and drop the two datasets: SageMaker Canvas will automatically identify the shared ProductId column and apply an Inner Join transformation.

The join preview lets me visualize the resulting columns, identify missing or invalid values, and optionally deselect unwanted columns.

I select Save joined data and provide a new name for this joined dataset, which now includes 16 columns and 10,000 records.

Next, I want to create a model and start by selecting New model in the Models section on the left menu. I call it On Time Prediction Model.

The first step is selecting a dataset.

I select a target column that my model will predict: OnTimeDelivery.

SageMaker Canvas shows me the value distribution and already recommends the most appropriate model type: two categories classification.

Before proceeding with the model training, I have the option to generate an analysis report. This analysis gives me two very important pieces of information: the estimated accuracy and the impact of each column.

The estimated accuracy of 99.9% gives me confidence, but then I notice that the highest impact is provided by the ActualShippingDays column. Unfortunately, this column is not available in advance and I can’t use it for my predictions. So I deselect it and run the analysis again.

The new estimated accuracy is 94.2%, which is still pretty high. The most impactful columns are ShippingPriority, YShippingDistance, XShippingDistance, and Carrier. This is great because all of this information is available in advance and can be used for a prediction. On the other hand, product-related columns, such PackageWeight and ScreenSize, have very small impacts on the prediction. This means that in the future I could simplify the overall process by feeding only shipping information into the training and prediction phases.

I’m happy with the analysis insights. Therefore, I decide to proceed and build a prediction model by selecting the Standard build option.

Now I can go for a walk, attend a few productive meetings, or simply spend some time with family. SageMaker Canvas is doing all of the work for me, training hundreds of models behind the scenes. It will select the best performing one, so that I can start generating accurate predictions in a couple of hours. Of course, the training duration will vary depending on the dataset size and problem type.

After about an hour and a half, the model is ready and the console lets me analyze its accuracy and the column impacts visually. I’m also happy to see that the model predicts the correct value 95.8% of the time, which is even higher than the estimated accuracy.

Optionally, I could also inspect advanced metrics such as Precision, Recall, F1 Score, and so on. These metrics help me understand how the model is performing and what kind of false positives and false negatives I can expect from this model.

From here, I could share the model into Amazon SageMaker Studio or continue using the Canvas UI to generate new predictions.

I decide to continue with the intuitive UI and select Predict. Now I can work with individual records or with a dataset for batch predictions.

When selecting Single prediction, SageMaker Canvas simplifies my life and lets me start from an existing record. I modify the column values and get immediate feedback on the prediction and the corresponding feature importance.

This quick feedback loop and intuitive UI allows me to use the ML model without having to write custom code. In case I decide to integrate the model into an automated production system, the Amazon SageMaker Studio integration lets me share the model easily with other data scientists in my team.

Generally Available Today
SageMaker Canvas is generally available in US East (Ohio), US East (N. Virginia), US West (Oregon), Europe (Frankfurt), and Europe (Ireland). You can start using it with your local datasets, as well as data already stored on Amazon S3, Amazon Redshift, or Snowflake. With just a few clicks, you’ll prepare and join your datasets, analyze estimated accuracy, verify which columns are impactful, train the best performing model, and generate new individual or batch predictions. We’re excited to hear your feedback and help you solve even more business problems with ML.

Alex

How do we develop AI education in schools? A panel discussion

Post Syndicated from Sue Sentance original https://www.raspberrypi.org/blog/ai-education-schools-panel-uk-policy/

AI is a broad and rapidly developing field of technology. Our goal is to make sure all young people have the skills, knowledge, and confidence to use and create AI systems. So what should AI education in schools look like?

To hear a range of insights into this, we organised a panel discussion as part of our seminar series on AI and data science education, which we co-host with The Alan Turing Institute. Here our panel chair Tabitha Goldstaub, Co-founder of CogX and Chair of the UK government’s AI Council, summarises the event. You can also watch the recording below.

As part of the Raspberry Pi Foundation’s monthly AI education seminar series, I was delighted to chair a special panel session to broaden the range of perspectives on the subject. The members of the panel were:

  • Chris Philp, UK Minister for Tech and the Digital Economy
  • Philip Colligan, CEO of the Raspberry Pi Foundation 
  • Danielle Belgrave, Research Scientist, DeepMind
  • Caitlin Glover, A level student, Sandon School, Chelmsford
  • Alice Ashby, student, University of Brighton

The session explored the UK government’s commitment in the recently published UK National AI Strategy stating that “the [UK] government will continue to ensure programmes that engage children with AI concepts are accessible and reach the widest demographic.” We discussed what it will take to make this a reality, and how we will ensure young people have a seat at the table.

Two teenage girls do coding during a computer science lesson.

Why AI education for young people?

It was clear that the Minister felt it is very important for young people to understand AI. He said, “The government takes the view that AI is going to be one of the foundation stones of our future prosperity and our future growth. It’s an enabling technology that’s going to have almost universal applicability across our entire economy, and that is why it’s so important that the United Kingdom leads the world in this area. Young people are the country’s future, so nothing is complete without them being at the heart of it.”

A teacher watches two female learners code in Code Club session in the classroom.

Our panelist Caitlin Glover, an A level student at Sandon School, reiterated this from her perspective as a young person. She told us that her passion for AI started initially because she wanted to help neurodiverse young people like herself. Her idea was to start a company that would build AI-powered products to help neurodiverse students.

What careers will AI education lead to?

A theme of the Foundation’s seminar series so far has been how learning about AI early may impact young people’s career choices. Our panelist Alice Ashby, who studies Computer Science and AI at Brighton University, told us about her own process of deciding on her course of study. She pointed to the fact that terms such as machine learning, natural language processing, self-driving cars, chatbots, and many others are currently all under the umbrella of artificial intelligence, but they’re all very different. Alice thinks it’s hard for young people to know whether it’s the right decision to study something that’s still so ambiguous.

A young person codes at a Raspberry Pi computer.

When I asked Alice what gave her the courage to take a leap of faith with her university course, she said, “I didn’t know it was the right move for me, honestly. I took a gamble, I knew I wanted to be in computer science, but I wanted to spice it up.” The AI ecosystem is very lucky that people like Alice choose to enter the field even without being taught what precisely it comprises.

We also heard from Danielle Belgrave, a Research Scientist at DeepMind with a remarkable career in AI for healthcare. Danielle explained that she was lucky to have had a Mathematics teacher who encouraged her to work in statistics for healthcare. She said she wanted to ensure she could use her technical skills and her love for math to make an impact on society, and to really help make the world a better place. Danielle works with biologists, mathematicians, philosophers, and ethicists as well as with data scientists and AI researchers at DeepMind. One possibility she suggested for improving young people’s understanding of what roles are available was industry mentorship. Linking people who work in the field of AI with school students was an idea that Caitlin was eager to confirm as very useful for young people her age.

We need investment in AI education in school

The AI Council’s Roadmap stresses how important it is to not only teach the skills needed to foster a pool of people who are able to research and build AI, but also to ensure that every child leaves school with the necessary AI and data literacy to be able to become engaged, informed, and empowered users of the technology. During the panel, the Minister, Chris Philp, spoke about the fact that people don’t have to be technical experts to come up with brilliant ideas, and that we need more people to be able to think creatively and have the confidence to adopt AI, and that this starts in schools. 

A class of primary school students do coding at laptops.

Caitlin is a perfect example of a young person who has been inspired about AI while in school. But sadly, among young people and especially girls, she’s in the minority by choosing to take computer science, which meant she had the chance to hear about AI in the classroom. But even for young people who choose computer science in school, at the moment AI isn’t in the national Computing curriculum or part of GCSE computer science, so much of their learning currently takes place outside of the classroom. Caitlin added that she had had to go out of her way to find information about AI; the majority of her peers are not even aware of opportunities that may be out there. She suggested that we ensure AI is taught across all subjects, so that every learner sees how it can make their favourite subject even more magical and thinks “AI’s cool!”.

A primary school boy codes at a laptop with the help of an educator.

Philip Colligan, the CEO here at the Foundation, also described how AI could be integrated into existing subjects including maths, geography, biology, and citizenship classes. Danielle thoroughly agreed and made the very good point that teaching this way across the school would help prepare young people for the world of work in AI, where cross-disciplinary science is so important. She reminded us that AI is not one single discipline. Instead, many different skill sets are needed, including engineering new AI systems, integrating AI systems into products, researching problems to be addressed through AI, or investigating AI’s societal impacts and how humans interact with AI systems.

On hearing about this multitude of different skills, our discussion turned to the teachers who are responsible for imparting this knowledge, and to the challenges they face. 

The challenge of AI education for teachers

When we shifted the focus of the discussion to teachers, Philip said: “If we really want to equip every young person with the knowledge and skills to thrive in a world that shaped by these technologies, then we have to find ways to evolve the curriculum and support teachers to develop the skills and confidence to teach that curriculum.”

Teenage students and a teacher do coding during a computer science lesson.

I asked the Minister what he thought needed to happen to ensure we achieved data and AI literacy for all young people. He said, “We need to work across government, but also across business and society more widely as well.” He went on to explain how important it was that the Department for Education (DfE) gets the support to make the changes needed, and that he and the Office for AI were ready to help.

Philip explained that the Raspberry Pi Foundation is one of the organisations in the consortium running the National Centre for Computing Education (NCCE), which is funded by the DfE in England. Through the NCCE, the Foundation has already supported thousands of teachers to develop their subject knowledge and pedagogy around computer science.

A recent study recognises that the investment made by the DfE in England is the most comprehensive effort globally to implement the computing curriculum, so we are starting from a good base. But Philip made it clear that now we need to expand this investment to cover AI.

Young people engaging with AI out of school

Philip described how brilliant it is to witness young people who choose to get creative with new technologies. As an example, he shared that the Foundation is seeing more and more young people employ machine learning in the European Astro Pi Challenge, where participants run experiments using Raspberry Pi computers on board the International Space Station. 

Three teenage boys do coding at a shared computer during a computer science lesson.

Philip also explained that, in the Foundation’s non-formal CoderDojo club network and its Coolest Projects tech showcase events, young people build their dream AI products supported by volunteers and mentors. Among these have been autonomous recycling robots and AI anti-collision alarms for bicycles. Like Caitlin with her company idea, this shows that young people are ready and eager to engage and create with AI.

We closed out the panel by going back to a point raised by Mhairi Aitken, who presented at the Foundation’s research seminar in September. Mhairi, an Alan Turing Institute ethics fellow, argues that children don’t just need to learn about AI, but that they should actually shape the direction of AI. All our panelists agreed on this point, and we discussed what it would take for young people to have a seat at the table.

A Black boy uses a Raspberry Pi computer at school.

Alice advised that we start by looking at our existing systems for engaging young people, such as Youth Parliament, student unions, and school groups. She also suggested adding young people to the AI Council, which I’m going to look into right away! Caitlin agreed and added that it would be great to make these forums virtual, so that young people from all over the country could participate.

The panel session was full of insight and felt very positive. Although the challenge of ensuring we have a data- and AI-literate generation of young people is tough, it’s clear that if we include them in finding the solution, we are in for a bright future. 

What’s next for AI education at the Raspberry Pi Foundation?

In the coming months, our goal at the Foundation is to increase our understanding of the concepts underlying AI education and how to teach them in an age-appropriate way. To that end, we will start to conduct a series of small AI education research projects, which will involve gathering the perspectives of a variety of stakeholders, including young people. We’ll make more information available on our research pages soon.

In the meantime, you can sign up for our upcoming research seminars on AI and data science education, and peruse the collection of related resources we’ve put together.

The post How do we develop AI education in schools? A panel discussion appeared first on Raspberry Pi.

New – Amazon EC2 G5g Instances Powered by AWS Graviton2 Processors and NVIDIA T4G Tensor Core GPUs

Post Syndicated from Channy Yun original https://aws.amazon.com/blogs/aws/new-amazon-ec2-g5g-instances-powered-by-aws-graviton2-processors-and-nvidia-t4g-tensor-core-gpus/

AWS Graviton2 processors are custom-designed by AWS to enable the best price performance in Amazon EC2. Thousands of customers are realizing significant price performance benefits for a wide variety of workloads with Graviton2-based instances.

Today, we are announcing the general availability of Amazon EC2 G5g instances that extend Graviton2 price-performance benefits to GPU-based workloads including graphics applications and machine learning inference. In addition to Graviton2 processors, G5g instances feature NVIDIA T4G Tensor Core GPUs to provide the best price performance for Android game streaming, with up to 25 Gbps of networking bandwidth and 19 Gbps of EBS bandwidth.

These instances provide up to 30 percent lower cost per stream per hour for Android game streaming than x86-based GPU instances. G5g instances are also ideal for machine learning developers who are looking for cost-effective inference, have ML models that are sensitive to CPU performance, and leverage NVIDIA’s AI libraries.

G5g instances are available in the six sizes as shown below.

Instance Name vCPUs Memory (GB) NVIDIA T4G Tensor Core GPU GPU Memory (GB) EBS Bandwidth (Gbps) Network Bandwidth (Gbps)
g5g.xlarge 4 8 1 16 Up to 3.5 Up to 10
g5g.2xlarge 8 16 1 16 Up to 3.5 Up to 10
g5g.4xlarge 16 32 1 16 Up to 3.5 Up to 10
g5g.8xlarge 32 64 1 16 9 12
g5g.16xlarge 64 128 2 32 19 25
g5g.metal 64 128 2 32 19 25

These instances are a great fit for many interesting types of workloads. Here are a few examples:

  • Streaming Android gaming—With G5g instances, Android game developers can build natively on Arm-based GPU instances without the need for cross-compilation or emulation on x86-based instances. They can encode the rendered graphics and stream the game over the network to a mobile device. This helps simplify development efforts and time and lowers the cost per stream per hour by up to 30 percent.
  • ML Inference —G5g instances are also ideal for machine learning developers who are looking for cost-effective inference, have ML models that are sensitive to CPU performance, and leverage NVIDIA’s AI If you don’t have any dependencies on NVIDIA software, you may use Inf1 instances, which deliver up to 70 percent lower cost-per-inference than G4dn instances.
  • Graphics rendering—G5g instances are the most cost-effective option for customers with rendering workloads and dependencies on NVIDIA libraries. These instances also support rendering applications and use cases that leverage industry-standard APIs such as OpenGL and Vulkan.
  • Autonomous Vehicle Simulations—Several of our customers are designing and simulating autonomous vehicles that include multiple real-time sensors. They can use ray tracing to simulate sensor input in real time.

The instances are compatible with a very long list of graphical and machine learning libraries on Linux, including NVENC, NVDEC, nvJPEG, OpenGL, Vulkan, CUDA, CuDNN, CuBLAS, and TensorRT.

Available Now
The new G5g instances are available now, and you can start using them today in the US East (N. Virginia), US West (Oregon), and Asia-Pacific (Seoul, Singapore and Tokyo) Regions in On-Demand, Spot, Savings Plan, and Reserved Instance form. To learn more, see the EC2 pricing page.

G5g instances are available now in AWS Deep Learning AMIs with NVIDIA drivers and popular ML frameworks, Amazon Elastic Container Service (Amazon ECS), or Amazon Elastic Kubernetes Service (Amazon EKS) clusters for containerized ML applications.

You can send feedback to the AWS forum for Amazon EC2 or through your usual AWS Support contacts.

Channy

Amazon CodeGuru Reviewer Introduces Secrets Detector to Identify Hardcoded Secrets and Secure Them with AWS Secrets Manager

Post Syndicated from Alex Casalboni original https://aws.amazon.com/blogs/aws/codeguru-reviewer-secrets-detector-identify-hardcoded-secrets/

Amazon CodeGuru helps you improve code quality and automate code reviews by scanning and profiling your Java and Python applications. CodeGuru Reviewer can detect potential defects and bugs in your code. For example, it suggests improvements regarding security vulnerabilities, resource leaks, concurrency issues, incorrect input validation, and deviation from AWS best practices.

One of the most well-known security practices is the centralization and governance of secrets, such as passwords, API keys, and credentials in general. As many other developers facing a strict deadline, I’ve often taken shortcuts when managing and consuming secrets in my code, using plaintext environment variables or hard-coding static secrets during local development, and then inadvertently commit them. Of course, I’ve always regretted it and wished there was an automated way to detect and secure these secrets across all my repositories.

Today, I’m happy to announce the new Amazon CodeGuru Reviewer Secrets Detector, an automated tool that helps developers detect secrets in source code or configuration files, such as passwords, API keys, SSH keys, and access tokens.

These new detectors use machine learning (ML) to identify hardcoded secrets as part of your code review process, ultimately helping you to ensure that all new code doesn’t contain hardcoded secrets before being merged and deployed. In addition to Java and Python code, secrets detectors also scan configuration and documentation files. CodeGuru Reviewer suggests remediation steps to secure your secrets with AWS Secrets Manager, a managed service that lets you securely and automatically store, rotate, manage, and retrieve credentials, API keys, and all sorts of secrets.

This new functionality is included as part of the CodeGuru Reviewer service at no additional cost and supports the most common API providers, such as AWS, Atlassian, Datadog, Databricks, GitHub, Hubspot, Mailchimp, Salesforce, SendGrid, Shopify, Slack, Stripe, Tableau, Telegram, and Twilio. Check out the full list here.

Secrets Detectors in Action
First, I select CodeGuru from the AWS Secrets Manager console. This new flow lets me associate a new repository and run a full repository analysis with the goal of identifying hardcoded secrets.

Associating a new repository only takes a few seconds. I connect my GitHub account, and then select a repository named hawkcd, which contains a few Java, C#, JavaScript, and configuration files.

A few minutes later, my full repository is successfully associated and the full scan is completed. I could also have a look at a demo repository analysis called DemoFullRepositoryAnalysisSecrets. You’ll find this demo in the CodeGuru console, under Full repository analysis, in your AWS Account.

I select the repository analysis and find 42 recommendations, including one recommendation for a hardcoded secret (you can filter recommendations by Type=Secrets). CodeGuru Reviewer identified a hardcoded AWS Access Key ID in a .travis.yml file.

The recommendation highlights the importance of storing these secrets securely, provides a link to learn more about the issue, and suggests rotating the identified secret to make sure that it can’t be reused by malicious actors in the future.

CodeGuru Reviewer lets me jump to the exact file and line of code where the secret appears, so that I can dive deeper, understand the context, verify the file history, and take action quickly.

Last but not least, the recommendation includes a Protect your credential button that lets me jump quickly to the AWS Secrets Manager console and create a new secret with the proper name and value.

I’m going to remove the plaintext secret from my source code and update my application to fetch the secret value from AWS Secrets Manager. In many cases, you can keep the current configuration structure and use existing parameters to store the secret’s name instead of the secret’s value.

Once the secret is securely stored, AWS Secrets Manager also provides me with code snippets that fetch my new secret in many programming languages using the AWS SDKs. These snippets let me save time and include the necessary SDK call, as well as the error handling, decryption, and decoding logic.

I’ve showed you how to run a full repository analysis, and of course the same analysis can be performed continuously on every new pull request to help you prevent hardcoded secrets and other issues from being introduced in the future.

Available Today with CodeGuru Reviewer
CodeGuru Reviewer Secrets Detector is available in all regions where CodeGuru Reviewer is available, at no additional cost.

If you’re new to CodeGuru Reviewer, you can try it for free for 90 days with repositories up to 100,000 lines of code. Connecting your repositories and starting a full scan takes only a couple of minutes, whether your code is hosted on AWS CodeCommit, BitBucket, or GitHub. If you’re using GitHub, check out the GitHub Actions integration as well.

You can learn more about Secrets Detector in the technical documentation.

Alex

The machine learning effect: Magic boxes and computational thinking 2.0

Post Syndicated from Jane Waite original https://www.raspberrypi.org/blog/machine-learning-education-school-computational-thinking-2-0-research-seminar/

How does teaching children and young people about machine learning (ML) differ from teaching them about other aspects of computing? Professor Matti Tedre and Dr Henriikka Vartiainen from the University of Eastern Finland shared some answers at our latest research seminar.

Three smiling young learners in a computing classroom.
We need to determine how to teach young people about machine learning, and what teachers need to know to help their learners form correct mental models.

Their presentation, titled ‘ML education for K-12: emerging trajectories’, had a profound impact on my thinking about how we teach computational thinking and programming. For this blog post, I have simplified some of the complexity associated with machine learning for the benefit of readers who are new to the topic.

a 3D-rendered grey box.
Machine learning is not magic — what needs to change in computing education to make sure learners don’t see ML systems as magic boxes?

Our seminars on teaching AI, ML, and data science

We’re currently partnering with The Alan Turing Institute to host a series of free research seminars about how to teach artificial intelligence (AI) and data science to young people.

The seminar with Matti and Henriikka, the third one of the series, was very well attended. Over 100 participants from San Francisco to Rajasthan, including teachers, researchers, and industry professionals, contributed to a lively and thought-provoking discussion.

Representing a large interdisciplinary team of researchers, Matti and Henriikka have been working on how to teach AI and machine learning for more than three years, which in this new area of study is a long time. So far, the Finnish team has written over a dozen academic papers based on their pilot studies with kindergarten-, primary-, and secondary-aged learners.

Current teaching in schools: classical rule-driven programming

Matti and Henriikka started by giving an overview of classical programming and how it is currently taught in schools. Classical programming can be described as rule-driven. Example features of classical computer programs and programming languages are:

  • A classical language has a strict syntax, and a limited set of commands that can only be used in a predetermined way
  • A classical language is deterministic, meaning we can guarantee what will happen when each line of code is run
  • A classical program is executed in a strict, step-wise order following a known set of rules

When we teach this type of programming, we show learners how to use a deductive problem solving approach or workflow: defining the task, designing a possible solution, and implementing the solution by writing a stepwise program that is then run on a computer. We encourage learners to avoid using trial and error to write programs. Instead, as they develop and test a program, we ask them to trace it line by line in order to predict what will happen when each line is run (glass-box testing).

A list of features of rule-driven computer programming, also included in the text.
The features of classical (rule-driven) programming approaches as taught in computer science education (CSE) (Tedre & Vartiainen, 2021).

Classical programming underpins the current view of computational thinking (CT). Our speakers called this version of CT ‘CT 1.0’. So what’s the alternative Matti and Henriikka presented, and how does it affect what computational thinking is or may become?

Machine learning (data-driven) models and new computational thinking (CT 2.0) 

Rule-based programming languages are not being eradicated. Instead, software systems are being augmented through the addition of machine learning (data-driven) elements. Many of today’s successful software products, such as search engines, image classifiers, and speech recognition programs, combine rule-driven software and data-driven models. However, the workflows for these two approaches to solving problems through computing are very different.

A table comparing problem solving workflows using computational thinking 1.0 versus computational thinking 2.0, info also included in the text.
Problem solving is very different depending on whether a rule-driven computational thinking (CT 1.0) approach or a data-driven computational thinking (CT 2.0) approach is used (Tedre & Vartiainen,2021).

Significantly, while in rule-based programming (and CT 1.0), the focus is on solving problems by creating algorithms, in data-driven approaches, the problem solving workflow is all about the data. To highlight the profound impact this shift in focus has on teaching and learning computing, Matti introduced us to a new version of computational thinking for machine learning, CT 2.0, which is detailed in a forthcoming research paper.

Because of the focus on data rather than algorithms, developing a machine learning model is not at all like developing a classical rule-driven program. In classical programming, programs can be traced, and we can predict what will happen when they run. But in data-driven development, there is no flow of rules, and no absolutely right or wrong answer.

A table comparing conceptual differences between computational thinking 1.0 versus computational thinking 2.0, info also included in the text.
There are major differences between rule-driven computational thinking (CT 1.0) and data-driven computational thinking (CT 2.0), which impact what computing education needs to take into account (Tedre & Vartiainen,2021).

Machine learning models are created iteratively using training data and must be cross-validated with test data. A tiny change in the data provided can make a model useless. We rarely know exactly why the output of an ML model is as it is, and we cannot explain each individual decision that the model might have made. When evaluating a machine learning system, we can only say how well it works based on statistical confidence and efficiency. 

Machine learning education must cover ethical and societal implications 

The ethical and societal implications of computer science have always been important for students to understand. But machine learning models open up a whole new set of topics for teachers and students to consider, because of these models’ reliance on large datasets, the difficulty of explaining their decisions, and their usefulness for automating very complex processes. This includes privacy, surveillance, diversity, bias, job losses, misinformation, accountability, democracy, and veracity, to name but a few.

I see the shift in problem solving approach as a chance to strengthen the teaching of computing in general, because it opens up opportunities to teach about systems, uncertainty, data, and society.

Jane Waite

Teaching machine learning: the challenges of magic boxes and new mental models

For teaching classical rule-driven programming, much time and effort has been put into researching learners’ understanding of what a program will do when it is run. This kind of understanding is called a learner’s mental model or notional machine. An approach teachers often use to help students develop a useful mental model of a program is to hide the detail of how the program works and only gradually reveal its complexity. This approach is described with the metaphor of hiding the detail of elements of the program in a box. 

Data-driven models in machine learning systems are highly complex and make little sense to humans. Therefore, they may appear like magic boxes to students. This view needs to be banished. Machine learning is not magic. We have just not figured out yet how to explain the detail of data-driven models in a way that allows learners to form useful mental models.

An example of a representation of a machine learning model in TensorFlow, an online machine learning tool (Tedre & Vartiainen,2021).

Some existing ML tools aim to help learners form mental models of ML, for example through visual representations of how a neural network works (see Figure 2). But these explanations are still very complex. Clearly, we need to find new ways to help learners of all ages form useful mental models of machine learning, so that teachers can explain to them how machine learning systems work and banish the view that machine learning is magic.

Some tools and teaching approaches for ML education

Matti and Henriikka’s team piloted different tools and pedagogical approaches with different age groups of learners. In terms of tools, since large amounts of data are needed for machine learning projects, our presenters suggested that tools that enable lots of data to be easily collected are ideal for teaching activities. Media-rich education tools provide an opportunity to capture still images, movements, sounds, or sense other inputs and then use these as data in machine learning teaching activities. For example, to create a machine learning–based rock-paper-scissors game, students can take photographs of their hands to train a machine learning model using Google Teachable Machine.

Photos of hands are used to train a machine learning model as part of a project to create a rock-paper-scissors game.
Photos of hands are used to train a Teachable Machine machine learning model as part of a project to create a rock-paper-scissors game (Tedre & Vartiainen, 2021).

Similar to tools that teach classic programming to novice students (e.g. Scratch), some of the new classroom tools for teaching machine learning have a drag-and-drop interface (e.g. Cognimates). Using such tools means that in lessons, there can be less focus on one of the more complex aspects of learning to program, learning programming language syntax. However, not all machine learning education products include drag-and-drop interaction, some instead have their own complex languages (e.g. Wolfram Programming Lab), which are less attractive to teachers and learners. In their pilot studies, the Finnish team found that drag-and-drop machine learning tools appeared to work well with students of all ages.

The different pedagogical approaches the Finnish research team used in their pilot studies included an exploratory approach with preschool children, who investigated machine learning recognition of happy or sad faces; and a project-based approach with older students, who co-created machine learning apps with web-based tools such as Teachable Machine and Learn Machine Learning (built by the research team), supported by machine learning experts.

Example of a middle school (age 8 to 11) student’s pen and paper design for a machine learning app that recognises different instruments and chords.
Example of a middle school (age 8 to 11) student’s design for a machine learning app that recognises different instruments and chords (Tedre & Vartiainen, 2021).

What impact these pedagogies have on students’ long-term mental models about machine learning has yet to be researched. If you want to find out more about the classroom pilot studies, the academic paper is a very accessible read.

My take-aways: new opportunities, new research questions

We all learned a tremendous amount from Matti and Henriikka and their perspectives on this important topic. Our seminar participants asked them many questions about the pedagogies and practicalities of teaching machine learning in class, and raised concerns about squeezing more into an already packed computing curriculum.

For me, the most significant take-away from the seminar was the need to shift focus from algorithms to data and from CT 1.0 to CT 2.0. Learning how to best teach classical rule-driven programming has been a long journey that we have not yet completed. We are forming an understanding of what concepts learners need to be taught, the progression of learning, key mental models, pedagogical options, and assessment approaches. For teaching data-driven development, we need to do the same.  

The question of how we make sure teachers have the necessary understanding is key.

Jane Waite

I see the shift in problem solving approach as a chance to strengthen the teaching of computing in general, because it opens up opportunities to teach about systems, uncertainty, data, and society. I think it will help us raise awareness about design, context, creativity, and student agency. But I worry about how we will introduce this shift. In my view, there is a considerable risk that we will be sucked into open-ended, project-based learning, with busy and fun but shallow learning experiences that result in restricted conceptual development for students.

I also worry about how we can best help teachers build up the knowledge and experience to support their students. In the Q&A after the seminar, I asked Matti and Henriikka about the role of their team’s machine learning experts in their pilot studies. It seemed to me that without them, the pilot lessons would not have worked, as the participating teachers and students would not have had the vocabulary to talk about the process and would not have known what was doable given the available time, tools, and student knowledge.

The question of how we make sure teachers have the necessary understanding is key. Many existing professional development resources for teachers wanting to learn about ML seem to imply that teachers will all need a PhD in statistics and neural network optimisation to engage with machine learning education. This is misleading. But teachers do need to understand the machine learning concepts that their students need to learn about, and I think we don’t yet know exactly what these concepts are. 

In summary, clearly more research is needed. There are fundamental questions still to be answered about what, when, and how we teach data-driven approaches to software systems development and how this impacts what we teach about classical, rule-based programming. But to me, that is exciting, and I am very much looking forward to the journey ahead.

Join our next free seminar

To find out what others recommend about teaching AI and ML, catch up on last month’s seminar with Professor Carsten Schulte and colleagues on centring data instead of code in the teaching of AI.

We have another four seminars in our monthly series on AI, machine learning, and data science education. Find out more about them on this page, and catch up on past seminar blogs and recordings here.

At our next seminar on Tuesday 7 December at 17:00–18:30 GMT, we will welcome Professor Rose Luckin from University College London. She will be presenting on what it is about AI that makes it useful for teachers and learners.

We look forward to meeting you there!

PS You can build your understanding of machine learning by joining our latest free online course, where you’ll learn foundational concepts and train your own ML model!

The post The machine learning effect: Magic boxes and computational thinking 2.0 appeared first on Raspberry Pi.

Announcing Fully Managed RStudio on Amazon SageMaker for Data Scientists

Post Syndicated from Channy Yun original https://aws.amazon.com/blogs/aws/announcing-fully-managed-rstudio-on-amazon-sagemaker-for-data-scientists/

Two years ago, we introduced Amazon SageMaker Studio, the industry’s first fully integrated development environment (IDE) for machine learning (ML). Amazon SageMaker Studio provides a single, web-based visual interface where you can perform all ML development steps, improving data science team productivity by up to 10 times

Many data scientists love the R project, an open-source ecosystem with more than 18,000 packages that is not just a programming language but is also an interactive environment for doing data science. RStudio is one of the most popular IDE among R developers for ML and data science projects. RStudio provides open-source tools for R and enterprise-ready professional software for data science teams to develop and share their work in the organization. But, building, securing, scaling and maintaining RStudio yourself is tedious and cumbersome.

Today, in collaboration with RStudio PBC, we are excited to announce the general availability of RStudio on Amazon SageMaker, the industry’s first fully managed RStudio Workbench IDE in the cloud. You can now bring your current RStudio license to easily migrate your self-managed RStudio environments to Amazon SageMaker in just a few simple steps. If you’d like to read more about this exciting collaboration, check out this blog from RStudio PBC.

With RStudio on Amazon SageMaker, administrators can have a simple experience to migrate their RStudio environments to integrate into Amazon SageMaker and bring existing RStudio licenses to manage through AWS License Manager. They can onboard both R and Python developers to the same Amazon SageMaker domain using AWS Single Sign-On (SSO) or AWS Identity and Access Management (IAM) and take it as a centralized place to configure both RStudio and Amzon SageMaker Studio.

So, data scientists have a freedom of choice between programming languages and coding interfaces to switch between RStudio and Amazon SageMaker Studio notebooks. All of their work, including code, datasets, repositories, and other artifacts are synchronized between the two environments through the underlying Amazon EFS storage.

Getting Started with RStudio on SageMaker
You now can launch the familiar RStudio Workbench with a simple click from Amazon SageMaker. Before getting started, your administrator needs to buy an appropriate license from RStudio PBC for end-users, set up your granted licenses in AWS License Manager, and create an Amazon SageMaker domain and user profile to launch RStudio on Amazon SageMaker. To learn all the administrator jobs, including managing licenses and monitoring usages, see a blog post of the setting up process, or Manage RStudio on Amazon SageMaker in the AWS documentation.

Once the required setup process is completed, you can open the RStudio Workbench from the new Launch app drop-down list in the created user list and select RStudio.

You will immediately see the RStudio Workbench home page and a list of sessions, projects, and published content on the home page. To create a new session, select the New Session button on the page, select a desired instance in the Instance Type dropdown list, and choose Start Session.

When you choose a compute instance type for a lightweight analysis that can be powered by two vCPU and four GiB memory, you can use a default ml.t3.medium instance. For a complex and large-scale ML modeling, you can choose a large instance with desired compute and memory from a wide array of ML instances available on Amazon SageMaker.

In a few minutes, your session will be ready for development in RStudio Workbench. When you launch your RStudio session, the Base R image serves as the basis of your instance. This Docker image includes R v4.0, AWS tools such as awscli, sagemaker, boto3 Python packages, and reticulate package for the interoperability between Python and R.

Managing R Packages and Publishing your Analysis
Along with the RStudio Workbench, RStudio Connect and RStudio Package Manager are the most used products of RStudio.

RStudio Connect is designed to allow data scientists to publish insights and dashboard and web applications from RStudio Workbench easily. RStudio Package Manager centrally manages the package repository for your organization so that data scientists can securely install packages faster while ensuring project reproducibility and repeatability.

Your administrator, for example, can create a repository and subscribe it to the built-in source named cran in RStudio Package Manager.

$ rspm sync --wait # Initiate a sync
$ rspm create repo --name=prod-cran --description='Access CRAN packages' # Create a repository:
$ rspm subscribe --repo=prod-cran --source=cran # Subscribe the repository to the cran source

When these steps are completed, you can use the prod-cran repository in the web interface of RStudio Package Manager.

Now, you can configure this repository to install and manage your packages in RStudio Workbench. You can also configure RStudio Connect to publish insights, dashboard and web applications from RStudio Workbench via RStudio Connect so that your collaborators can easily consume your work.

For example, you run the analysis inline to create an R Markdown that can be published to your collaborators. You can preview the slides while writing codes with the Preview button and publish it with the Publish icon in your RStudio session.

You can also publish Shiny application easy to create interactive web interfaces, or Python-based content such as Streamlit to the RStudio Connect instance.

To learn more, see Host RStudio Connect and Package Manager for ML development in RStudio on Amazon SageMaker written by my colleagues, Michael Hsieh, Chayan Panda, and Farooq Sabir on the AWS Machine Learning Blog.

Integrating training jobs with Amazon SageMaker
One of the benefits of using RStudio on Amazon SageMaker is the integration of Amazon SageMaker features. Your RStudio and Jupyter Notebook instances of Amazon SageMaker allow you to share the same Amazon EFS file system. You can import R codes written in Jupyter Notebook or use the same files in both Jupyter Notebook and RStudio without having to move your files between the two.

For example, you can run an R sample code including importing libraries, creating an Amazon SageMaker session, getting the IAM role, and importing and visualizing sample data. And then, it stores data on the S3 bucket, and triggers a training task with an XGBoost model by specifying the training container and defining an Amazon SageMaker Estimator. To learn more, see R sample codes in Amazon SageMaker.

# Import reticulate, readr and sagemaker libraries
library(reticulate)
library(readr)
sagemaker <- import('sagemaker')

# Create a sagemaker session
session <- sagemaker$Session()

# Get execution role
role_arn <- sagemaker$get_execution_role()

# Read a csv file from UCI public repository
data_file <- 'http://archive.ics.uci.edu/ml/machine-learning-databases/abalone/abalone.data'

# Copy data to a dataframe, rename columns, and show dataframe head
data_csv <- read_csv(file = data_file, col_names = FALSE, col_types = cols())
names(data_csv) <- c('sex', 'length', 'diameter', 'height', 'whole_weight', 'shucked_weight', 'viscera_weight', 'shell_weight', 'rings')
head(data_csv)

# Visualize data have height equal to 0
library(ggplot2)
options(repr.plot.width = 5, repr.plot.height = 4) 
ggplot(abalone, aes(x = height, y = rings, color = sex, alpha=0.5)) + geom_point() + geom_jitter()

# Upload data to Amazon S3 bucket
s3_train <- session$upload_data(path = data_csv,
                                bucket = my_s3_bucket, 
                                key_prefix = 'r_hello_world_demo/data')
s3_path = paste('s3://',bucket,'/r_hello_world_demo/data/abalone.csv',sep = '')

# Train a XGBoost model, specify the training containers, and define an Amazon SageMaker Estimator
container <- sagemaker$image_uris$retrieve(framework='xgboost', 
                                           region= session$boto_region_name, 
										   version='latest')							
estimator <- sagemaker$estimator$Estimator(image_uri = container,
                                           role = role_arn,
                                           train_instance_count = 1L,
                                           train_instance_type = 'ml.m5.4xlarge',
                                           train_volume_size = 30L,
                                           train_max_run = 3600L,
                                           input_mode = 'File',
                                           output_path = s3_path)

Now Available
RStudio on Amazon SageMaker is available in all AWS Regions where both Amazon SageMaker Studio and AWS License Manager are available. You can bring your own license of RStudio on Amazon SageMaker and pay for the underlying compute and storage resources within Amazon SageMaker or other AWS services, based on your usage.

To get started with RStudio on Amazon SageMaker, you can use AWS Free Tier. You can use 250 hours of ml.t3.medium instance on Amazon SageMaker Studio per month for the first two months. To learn more, see Amazon SageMaker Pricing page.

Give it a try, and please send us feedback either in the AWS forum for Amazon SageMaker or through your usual AWS support contacts.

Channy

Detect Python and Java code security vulnerabilities with Amazon CodeGuru Reviewer

Post Syndicated from Haider Naqvi original https://aws.amazon.com/blogs/devops/detect-python-and-java-code-security-vulnerabilities-with-codeguru-reviewer/

with Aaron Friedman (Principal PM-T for xGuru services)

Amazon CodeGuru is a developer tool that uses machine learning and automated reasoning to catch hard to find defects and security vulnerabilities in application code. The purpose of this blog is to show how new CodeGuru Reviewer features help improve the security posture of your Python applications and highlight some of the specific categories of code vulnerabilities that CodeGuru Reviewer can detect. We will also cover newly expanded security capabilities for Java applications.

Amazon CodeGuru Reviewer can detect code vulnerabilities and provide actionable recommendations across dozens of the most common and impactful categories of code security issues (as classified by industry-recognized standards, Open Web Application Security, OWASP , “top ten” and Common Weakness Enumeration, CWE. The following are some of the most severe code vulnerabilities that CodeGuru Reviewer can now help you detect and prevent:

  • Injection weaknesses typically appears in data-rich applications. Every year, hundreds of web servers are compromised using SQL Injection. An attacker can use this method to bypass access control and read or modify application data.
  • Path Traversal security issues occur when an application does not properly handle special elements within a provided path name. An attacker can use it to overwrite or delete critical files and expose sensitive data.
  • Null Pointer Dereference issues can occur due to simple programming errors, race conditions, and others. Null Pointer Dereference can cause availability issues in rare conditions. Attackers can use it to read and modify memory.
  • Weak or broken cryptography is a risk that may compromise the confidentiality and integrity of sensitive data.

Security vulnerabilities present in source code can result in application downtime, leaked data, lost revenue, and lost customer trust. Best practices and peer code reviews aren’t sufficient to prevent these issues. You need a systematic way of detecting and preventing vulnerabilities from being deployed to production. CodeGuru Reviewer Security Detectors can provide a scalable approach to DevSecOps, a mechanism that employs automation to address security issues early in the software development lifecycle. Security detectors automate the detection of hard-to-find security vulnerabilities in Java and now Python applications, and provide actionable recommendations to developers.

By baking security mechanisms into each step of the process, DevSecOps enables the development of secure software without sacrificing speed. However, false positive issues raised by Static Application Security Testing (SAST) tools often must be manually triaged effectively and work against this value. CodeGuru uses techniques from automated reasoning, and specifically, precise data-flow analysis, to enhance the precision of code analysis. CodeGuru therefore reports fewer false positives.

Many customers are already embracing open-source code analysis tools in their DevSecOps practices. However, integrating such software into a pipeline requires a heavy up front lift, ongoing maintenance, and patience to configure. Furthering the utility of new security detectors in Amazon CodeGuru Reviewer, this update adds integrations with Bandit  and Infer, two widely-adopted open-source code analysis tools. In Java code bases, CodeGuru Reviewer now provide recommendations from Infer that detect null pointer dereferences, thread safety violations and improper use of synchronization locks. And in Python code, the service detects instances of SQL injection, path traversal attacks, weak cryptography, or the use of compromised libraries. Security issues found and recommendations generated by these tools are shown in the console, in pull requests comments, or through CI/CD integrations, alongside code recommendations generated by CodeGuru’s code quality and security detectors. Let’s dive deep and review some examples of code vulnerabilities that CodeGuru Reviewer can help detect.

Injection (Python)

Amazon CodeGuru Reviewer can help detect the most common injection vulnerabilities including SQL, XML, OS command, and LDAP types. For example, SQL injection occurs when SQL queries are constructed through string formatting. An attacker could manipulate the program inputs to modify the intent of the SQL query. The following python statement executes a SQL query constructed through string formatting and can be an attack vector:

import sqlite3
from flask import request

def removing_product():
    productId = request.args.get('productId')
    str = 'DELETE FROM products WHERE productID = ' + productId
    return str

def sql_injection():
    connection = psycopg2.connect("dbname=test user=postgres")
    cur = db.cursor()
    query = removing_product()
    cur.execute(query)

CodeGuru will flag a potential SQL injection using Bandit security detector, will make the following recommendation:

>> We detected a SQL command that might use unsanitized input. 
This can result in an SQL injection. To increase the security of your code, 
sanitize inputs before using them to form a query string.

To avoid this, the user should correct the code to use a parameter sanitization mechanism that guards against SQL injection as done below:

import sqlite3
from flask import request

def removing_product():
    productId = sanitize_input(request.args.get('productId'))
    str = 'DELETE FROM products WHERE productID = ' + productId
    return str

def sql_injection():
    connection = psycopg2.connect("dbname=test user=postgres")
    cur = db.cursor()
    query = removing_product()
    cur.execute(query)

In the above corrected code, user supplied sanitize_input method will take care of sanitizing user inputs.

Path Traversal (Python)

When applications use user input to create a path to read or write local files, an attacker can manipulate the input to overwrite or delete critical files or expose sensitive data. These critical files might include source code, sensitive, or application configuration information.

@app.route('/someurl')
def path_traversal():
    file_name = request.args["file"]
    f = open("./{}".format(file_name))
    f.close()

In above example, file name is directly passed to an open API without checking or filtering its content.

CodeGuru’s recommendation:

>> Potentially untrusted inputs are used to access a file path.
To protect your code from a path traversal attack, verify that your inputs are
sanitized.

In response, the developer should sanitize data before using it for creating/opening file.

@app.route('/someurl')
def path_traversal():
file_name = sanitize_data(request.args["file"])
f = open("./{}".format(file_name))
f.close()

In this modified code, input data file_name has been clean/filtered by sanitized_data api call.

Null Pointer Dereference (Java)

Infer detectors are a new addition that complement CodeGuru Reviewer native Java Security Detectors. Infer detectors, based on the Facebook Infer static analyzer, include rules to detect null pointer dereferences, thread safety violations, and improper use of synchronization locks. In particular, the null-pointer-dereference rule detects paths in the code that lead to null pointer exceptions in Java. Null pointer dereference is a very common pitfall in Java and is considered one of 25 most dangerous software weaknesses.

The Infer null-pointer-dereference rule guards against unexpected null pointer exceptions by detecting locations in the code where pointers that could be null are dereferenced. CodeGuru augments the Infer analyzer with knowledge about the AWS APIs, which allows the security detectors to catch potential null pointer exceptions when using AWS APIs.

For example, the AWS DynamoDBMapper class provides a convenient abstraction for mapping Amazon DynamoDB tables to Java objects. However, developers should be aware that DynamoDB Mapper load operations can return a null pointer if the object was not found in the table. The following code snippet updates a record in a catalog using a DynamoDB Mapper:

DynamoDBMapper mapper = new DynamoDBMapper(client);
// Retrieve the item.
CatalogItem itemRetrieved = mapper.load(
CatalogItem.class, 601);
// Update the item.
itemRetrieved.setISBN("622-2222222222");
itemRetrieved.setBookAuthors(
new HashSet<String>(Arrays.asList(
"Author1", "Author3")));
mapper.save(itemRetrieved);

CodeGuru will protect against a potential null dereference by making the following recommendation:

object `itemRetrieved` last assigned on line 88 could be null
and is dereferenced at line 90.

In response, the developer should add a null check to prevent the null pointer dereference from occurring.

DynamoDBMapper mapper = new DynamoDBMapper(client);
// Retrieve the item.
CatalogItem itemRetrieved = mapper.load(CatalogItem.class, 601);
// Update the item.
if (itemRetrieved != null) {
itemRetrieved.setISBN("622-2222222222");
itemRetrieved.setBookAuthors(
new HashSet<String>(Arrays.asList(
"Author1","Author3")));
mapper.save(itemRetrieved);
} else {
throw new CatalogItemNotFoundException();
}

Weak or broken cryptography  (Python)

Python security detectors support popular frameworks along with built-in APIs such as cryptography, pycryptodome etc. to identify ciphers related vulnerability. As suggested in CWE-327 , the use of a non-standard/inadequate key length algorithm is dangerous because attacker may be able to break the algorithm and compromise whatever data has been protected. In this example, `PBKDF2` is used with a weak algorithm and may lead to cryptographic vulnerabilities.
from Crypto.Protocol.KDF import PBKDF2
from Crypto.Hash import SHA1

def risky_crypto_algorithm(password):
salt = get_random_bytes(16)
keys = PBKDF2(password, salt, 64, count=1000000,
hmac_hash_module=SHA1)

SHA1 is used to create a PBKDF2, however, it is insecure hence not recommended for PBKDF2. CodeGuru’s identifies the issue and makes the following recommendation:

>> The `PBKDF2` function is using a weak algorithm which might
lead to cryptographic vulnerabilities. We recommend that you use the
`SHA224`, `SHA256`, `SHA384`,`SHA512/224`, `SHA512/256`, `BLAKE2s`,
`BLAKE2b`, `SHAKE128`, `SHAKE256` algorithms.

In response, the developer should use the correct SHA algorithm to protect against potential cipher attacks.

from Crypto.Protocol.KDF import PBKDF2
from Crypto.Hash import SHA512

def risky_crypto_algorithm(password):
salt = get_random_bytes(16)
keys = PBKDF2(password, salt, 64, count=1000000,
hmac_hash_module=SHA512)

This modified example uses high strength SHA512 algorithm.

Conclusion

This post reviewed Amazon CodeGuru Reviewer security detectors and how they automatically check your code for vulnerabilities and provide actionable recommendations in code reviews. We covered new capabilities for detecting issues in Python applications, as well as additional security features from Bandit and Infer. Together CodeGuru Reviewer’s security features provide a scalable approach for customers embracing DevSecOps, a mechanism that requires automation to address security issues earlier in the software development lifecycle. CodeGuru automates detection and helps prevent hard-to-find security vulnerabilities, accelerating DevSecOps processes for application development workflow.

You can get started from the CodeGuru console by running a full repository scan or integrating CodeGuru Reviewer with your supported CI/CD pipeline. Code analysis from Infer and Bandit is included as part of the standard CodeGuru Reviewer service.

For more information about automating code reviews and application profiling with Amazon CodeGuru, check out the AWS DevOps Blog. For more details on how to get started, visit the documentation.

Learn the fundamentals of AI and machine learning with our free online course

Post Syndicated from Michael Conterio original https://www.raspberrypi.org/blog/fundamentals-ai-machine-learning-free-online-course/

Join our free online course Introduction to Machine Learning and AI to discover the fundamentals of machine learning and learn to train your own machine learning models using free online tools.

Drawing of a machine learning robot helping a human identify spam at a computer.

Although artificial intelligence (AI) was once the province of science fiction, these days you’re very likely to hear the term in relation to new technologies, whether that’s facial recognition, medical diagnostic tools, or self-driving cars, which use AI systems to make decisions or predictions.

By the end of this free online course, you will have an appreciation for what goes into machine learning and artificial intelligence systems — and why you should think carefully about what comes out.

Machine learning — a brief overview

You’ll also often hear about AI systems that use machine learning (ML). Very simply, we can say that programs created using ML are ‘trained’ on large collections of data to ‘learn’ to produce more accurate outputs over time. One rather funny application you might have heard of is the ‘muffin or chihuahua?’ image recognition task.

Drawing of a machine learning ars rover trying to decide whether it is seeing an alien or a rock.

More precisely, we would say that a ML algorithm builds a model, based on large collections of data (the training data), without being explicitly programmed to do so. The model is ‘finished’ when it makes predictions or decisions with an acceptable level of accuracy. (For example, it rarely mistakes a muffin for a chihuahua in a photo.) It is then considered to be able to make predictions or decisions using new data in the real world.

It’s important to understand AI and ML — especially for educators

But how does all this actually work? If you don’t know, it’s hard to judge what the impacts of these technologies might be, and how we can be sure they benefit everyone — an important discussion that needs to involve people from across all of society. Not knowing can also be a barrier to using AI, whether that’s for a hobby, as part of your job, or to help your community solve a problem.

some things that machine learning and AI systems can be built into: streetlamps, waste collecting vehicles, cars, traffic lights.

For teachers and educators it’s particularly important to have a good foundational knowledge of AI and ML, as they need to teach their learners what the young people need to know about these technologies and how they impact their lives. (We’ve also got a free seminar series about teaching these topics.)

To help you understand the fundamentals of AI and ML, we’ve put together a free online course: Introduction to Machine Learning and AI. Over four weeks in two hours per week, you’ll learn how machine learning can be used to solve problems, without going too deeply into the mathematical details. You’ll also get to grips with the different ways that machines ‘learn’, and you will try out online tools such as Machine Learning for Kids and Teachable Machine to design and train your own machine learning programs.

What types of problems and tasks are AI systems used for?

As well as finding out how these AI systems work, you’ll look at the different types of tasks that they can help us address. One of these is classification — working out which group (or groups) something fits in, such as distinguishing between positive and negative product reviews, identifying an animal (or a muffin) in an image, or spotting potential medical problems in patient data.

You’ll also learn about other types of tasks ML programs are used for, such as regression (predicting a numerical value from a continuous range) and knowledge organisation (spotting links between different pieces of data or clusters of similar data). Towards the end of the course you’ll dive into one of the hottest topics in AI today: neural networks, which are ML models whose design is inspired by networks of brain cells (neurons).

drawing of a small machine learning neural network.

Before an ML program can be trained, you need to collect data to train it with. During the course you’ll see how tools from statistics and data science are important for ML — but also how ethical issues can arise both when data is collected and when the outputs of an ML program are used.

By the end of the course, you will have an appreciation for what goes into machine learning and artificial intelligence systems — and why you should think carefully about what comes out.

Sign up to the course today, for free

The Introduction to Machine Learning and AI course is open for you to sign up to now. Sign-ups will pause after 12 December. Once you sign up, you’ll have access for six weeks. During this time you’ll be able to interact with your fellow learners, and before 25 October, you’ll also benefit from the support of our expert facilitators. So what are you waiting for?

Share your views as part of our research

As part of our research on computing education, we would like to find out about educators’ views on machine learning. Before you start the course, we will ask you to complete a short survey. As a thank you for helping us with our research, you will be offered the chance to take part in a prize draw for a £50 book token!

Learn more about AI, its impacts, and teaching learners about them

To develop your computing knowledge and skills, you might also want to:

If you are a teacher in England, you can develop your teaching skills through the National Centre for Computing Education, which will give you free upgrades for our courses (including Introduction to Machine Learning and AI) so you’ll receive certificates and unlimited access.

The post Learn the fundamentals of AI and machine learning with our free online course appeared first on Raspberry Pi.

Should we teach AI and ML differently to other areas of computer science? A challenge

Post Syndicated from Sue Sentance original https://www.raspberrypi.org/blog/research-seminar-data-centric-ai-ml-teaching-in-school/

Between September 2021 and March 2022, we’re partnering with The Alan Turing Institute to host a series of free research seminars about how to teach AI and data science to young people.

In the second seminar of the series, we were excited to hear from Professor Carsten Schulte, Yannik Fleischer, and Lukas Höper from the University of Paderborn, Germany, who presented on the topic of teaching AI and machine learning (ML) from a data-centric perspective. Their talk raised the question of whether and how AI and ML should be taught differently from other themes in the computer science curriculum at school.

Machine behaviour — a new field of study?

The rationale behind the speakers’ work is a concept they call hybrid interaction system, referring to the way that humans and machines interact. To explain this concept, Carsten referred to an 2019 article published in Nature by Iyad Rahwan and colleagues: Machine hehaviour. The article’s authors propose that the study of AI agents (complex and simple algorithms that make decisions) should be a separate, cross-disciplinary field of study, because of the ubiquity and complexity of AI systems, and because these systems can have both beneficial and detrimental impacts on humanity, which can be difficult to evaluate. (Our previous seminar by Mhairi Aitken highlighted some of these impacts.) The authors state that to study this field, we need to draw on scientific practices from across different fields, as shown below:

Machine behaviour as a field sits at the intersection of AI engineering and behavioural science. Quantitative evidence from machine behaviour studies feeds into the study of the impact of technology, which in turn feeds questions and practices into engineering and behavioural science.
The interdisciplinarity of machine behaviour. (Image taken from Rahwan et al [1])

In establishing their argument, the authors compare the study of animal behaviour and machine behaviour, citing that both fields consider aspects such as mechanism, development, evolution and function. They describe how part of this proposed machine behaviour field may focus on studying individual machines’ behaviour, while collective machines and what they call ‘hybrid human-machine behaviour’ can also be studied. By focusing on the complexities of the interactions between machines and humans, we can think both about machines shaping human behaviour and humans shaping machine behaviour, and a sort of ‘co-behaviour’ as they work together. Thus, the authors conclude that machine behaviour is an interdisciplinary area that we should study in a different way to computer science.

Carsten and his team said that, as educators, we will need to draw on the parameters and frameworks of this machine behaviour field to be able to effectively teach AI and machine learning in school. They argue that our approach should be centred on data, rather than on code. I believe this is a challenge to those of us developing tools and resources to support young people, and that we should be open to these ideas as we forge ahead in our work in this area.

Ideas or artefacts?

In the interpretation of computational thinking popularised in 2006 by Jeanette Wing, she introduces computational thinking as being about ‘ideas, not artefacts’. When we, the computing education community, started to think about computational thinking, we moved from focusing on specific technology — and how to understand and use it — to the ideas or principles underlying the domain. The challenge now is: have we gone too far in that direction?

Carsten argued that, if we are to understand machine behaviour, and in particular, human-machine co-behaviour, which he refers to as the hybrid interaction system, then we need to be studying   artefacts as well as ideas.

Throughout the seminar, the speakers reminded us to keep in mind artefacts, issues of bias, the role of data, and potential implications for the way we teach.

Studying machine learning: a different focus

In addition, Carsten highlighted a number of differences between learning ML and learning other areas of computer science, including traditional programming:

  1. The process of problem-solving is different. Traditionally, we might try to understand the problem, derive a solution in terms of an algorithm, then understand the solution. In ML, the data shapes the model, and we do not need a deep understanding of either the problem or the solution.
  2. Our tolerance of inaccuracy is different. Traditionally, we teach young people to design programs that lead to an accurate solution. However, the nature of ML means that there will be an error rate, which we strive to minimise. 
  3. The role of code is different. Rather than the code doing the work as in traditional programming, the code is only a small part of a real-world ML system. 

These differences imply that our teaching should adapt too.

A graphic demonstrating that in machine learning as compared to other areas of computer science, the process of problem-solving, tolerance of inaccuracy, and role of code is different.
Click to enlarge.

ProDaBi: a programme for teaching AI, data science, and ML in secondary school

In Germany, education is devolved to state governments. Although computer science (known as informatics) was only last year introduced as a mandatory subject in lower secondary schools in North Rhine-Westphalia, where Paderborn is located, it has been taught at the upper secondary levels for many years. ProDaBi is a project that researchers have been running at Paderborn University since 2017, with the aim of developing a secondary school curriculum around data science, AI, and ML.

The ProDaBi curriculum includes:

  • Two modules for 11- to 12-year-olds covering decision trees and data awareness (ethical aspects), introduced this year
  • A short course for 13-year-olds covering aspects of artificial intelligence, through the game Hexapawn
  • A set of modules for 14- to 15-year-olds, covering data science, data exploration, decision trees, neural networks, and data awareness (ethical aspects), using Jupyter notebooks
  • A project-based course for 18-year-olds, including the above topics at a more advanced level, using Codap and Jupyter notebooks to develop practical skills through projects; this course has been running the longest and is currently in its fourth iteration

Although the ProDaBi project site is in German, an English translation is available.

Learning modules developed as part of the ProDaBi project.
Modules developed as part of the ProDaBi project

Our speakers described example activities from three of the modules:

  • Hexapawn, a two-player game inspired by the work of Donald Michie in 1961. The purpose of this activity is to support learners in reflecting on the way the machine learns. Children can then relate the activity to the behavior of AI agents such as autonomous cars. An English version of the activity is available. 
  • Data cards, a series of activities to teach about decision trees. The cards are designed in a ‘Top Trumps’ style, and based on food items, with unplugged and digital elements. 
  • Data awareness, a module focusing on the amount of data an individual can generate as they move through a city, in this case through the mobile phone network. Children are encouraged to reflect on personal data in the context of the interaction between the human and data-driven artefact, and how their view of the world influences their interpretation of the data that they are given.

Questioning how we should teach AI and ML at school

There was a lot to digest in this seminar: challenging ideas and some new concepts, for me anyway. An important takeaway for me was how much we do not yet know about the concepts and skills we should be teaching in school around AI and ML, and about the approaches that we should be using to teach them effectively. Research such as that being carried out in Paderborn, demonstrating a data-centric approach, can really augment our understanding, and I’m looking forward to following the work of Carsten and his team.

Carsten and colleagues ended with this summary and discussion point for the audience:

“‘AI education’ requires developing an adequate picture of the hybrid interaction system — a kind of data-driven, emergent ecosystem which needs to be made explicitly to understand the transformative role as well as the technological basics of these artificial intelligence tools and how they are related to data science.”

You can catch up on the seminar, including the Q&A with Carsten and his colleagues, here:

Join our next seminar

This seminar really extended our thinking about AI education, and we look forward to introducing new perspectives from different researchers each month. At our next seminar on Tuesday 2 November at 17:00–18:30 BST / 12:00–13:30 EDT / 9:00–10:30 PDT / 18:00–19:30 CEST, we will welcome Professor Matti Tedre and Henriikka Vartiainen (University of Eastern Finland). The two Finnish researchers will talk about emerging trajectories in ML education for K-12. We look forward to meeting you there.

Carsten and their colleagues are also running a series of seminars on AI and data science: you can find out about these on their registration page.

You can increase your own understanding of machine learning by joining our latest free online course!


[1] Rahwan, I., Cebrian, M., Obradovich, N., Bongard, J., Bonnefon, J. F., Breazeal, C., … & Wellman, M. (2019). Machine behaviour. Nature, 568(7753), 477-486.

The post Should we teach AI and ML differently to other areas of computer science? A challenge appeared first on Raspberry Pi.

Scaling Ad Verification with Machine Learning and AWS Inferentia

Post Syndicated from Julien Simon original https://aws.amazon.com/blogs/aws/scaling-ad-verification-with-machine-learning-and-aws-inferentia/

Amazon Advertising helps companies build their brand and connect with shoppers, through ads shown both within and beyond Amazon’s store, including websites, apps, and streaming TV content in more than 15 countries. Businesses or brands of all sizes including registered sellers, vendors, book vendors, Kindle Direct Publishing (KDP) authors, app developers, and agencies on Amazon marketplaces can upload their own ad creatives, which can include images, video, audio, and of course products sold on Amazon. To promote an accurate, safe, and pleasant shopping experience, these ads must comply with content guidelines.

Here’s a simple example. Can you figure out why two of the following ads would not be compliant?

Amazon Ads

The ad in the center doesn’t feature the product in context. It also shows the same product multiple times. The ad on the right looks much better, but it contains text, which is not allowed for this ad format.

New ad creatives come in many sizes, shapes, and languages, and at very large scale. Assuming it would even be possible, verifying them manually would be a complex, slow, and error-prone process. Machine learning (ML) to the rescue!

Using Machine Learning to Verify Ad Creatives
Each ad must be evaluated against many rules, which no single model could reasonably learn. In fact, it takes many models to check ad properties, for example:

  • Media-specific models that analyze images, video, audio, and text that describe the advertised products.
  • Content-specific models that detect headlines, text, backgrounds, and objects.
  • Language-specific models that validate syntax and grammar, and flag unapproved language.

Some of these capabilities are readily available in AWS AI services. For example, Amazon Advertising teams use Amazon Rekognition to extract metadata information from images and videos.

Other capabilities require custom models trained on in-house datasets. For this purpose, Amazon teams labeled large ad datasets with Amazon SageMaker Ground Truth, using a combination of manual labeling, and automatic labeling with active learning. Using these datasets, teams then used Amazon SageMaker to train models, and deploy them automatically on real-time prediction endpoints with the AWS Cloud Development Kit (AWS CDK) and Amazon SageMaker Pipelines.

When a business uploads a new ad, relevant models are invoked simultaneously to process specific ad components, extract signals, and output a quality score. All scores are then consolidated, and sent to a final model that predicts whether the ad should be manually reviewed.

Thanks to this process, most new ads can be verified and published automatically, which means businesses can quickly promote their brand and products, and Amazon can maintain a high-quality shopping experience.

However, faced with a growing number of more complex models, Amazon Advertising teams started to look for a solution that could increase prediction throughput while reducing costs. They found it in AWS Inferentia.

What is AWS Inferentia?
Available in Amazon EC2 Inf1 instances, AWS Inferentia is a custom chip built by AWS to accelerate ML inference workloads, and optimize their cost. Each AWS Inferentia chip contains four NeuronCores. Each NeuronCore implements a high-performance systolic array matrix multiply engine, which massively speeds up typical deep learning operations such as convolution and transformers. NeuronCores are also equipped with a large on-chip cache, which helps to cut down on external memory accesses, reduce latency, and increase throughput.

Thanks to AWS Neuron, a software development kit for ML inference, AWS Inferentia can be used natively from ML frameworks like TensorFlow, PyTorch, and Apache MXNet. It consists of a compiler, runtime, and profiling tools that enable you to run high-performance and low latency inference. For many trained models, compilation is a one-liner with the Neuron SDK, not requiring any additional application code changes. The result is a high performance inference deployment, that can easily scale while keeping costs under control. You’ll find many examples in the Neuron documentation. Alternatively, thanks to Amazon SageMaker Neo, you can also compile models directly in SageMaker.

Scaling Ad Verification with AWS Inferentia
Amazon Advertising teams started compiling their models for Inferentia, and deploying them on SageMaker endpoints powered by Inf1 instances. They compared the Inf1 endpoints to the GPU endpoints they had been using so far. They found that large deep learning models like BERT run more effectively on Inferentia, which decreases latency by 30%, and reduces costs by 71%. A few months ago, ML teams working on Amazon Alexa came to the same conclusions.

What about prediction quality? GPU models are typically trained with single-precision floating-point data (FP32). Inferentia uses the shorter FP16, BF16, and INT8 data types, which can create slight differences in predicted output. Running both GPU and Inferentia models in parallel, teams analyzed probability distributions, tweaked prediction thresholds for their Inferentia models, and made sure that these models would predict ads just like GPU models did. You can learn more about these techniques in the Performance Tuning section of the documentation.

With these final adjustments out of the way, the Amazon Advertising teams started phasing out GPU models. All text data is now predicted on Inferentia, and the migration of computer vision pipelines is in progress.

AWS Customers Are Successful with AWS Inferentia
In addition to Amazon teams, customers also report very nice results on scaling and optimizing their ML workloads with Inferentia.

Binghui Ouyang, Senior Data Scientist at Autodesk: “Autodesk is advancing the cognitive technology of our AI-powered virtual assistant, Autodesk Virtual Agent (AVA) by using Inferentia. AVA answers over 100,000 customer questions per month by applying natural language understanding (NLU) and deep learning techniques to extract the context, intent, and meaning behind inquiries. Piloting Inferentia, we are able to obtain a 4.9x higher throughput over G4dn for our NLU models, and look forward to running more workloads on the Inferentia-based Inf1 instances.

Paul Fryzel, Principal Engineer, AI Infrastructure at Condé Nast: “Condé Nast’s global portfolio encompasses over 20 leading media brands, including Wired, Vogue, and Vanity Fair. Within a few weeks, our team was able to integrate our recommendation engine with AWS Inferentia chips. This union enables multiple runtime optimizations for state-of-the-art natural language models on SageMaker’s Inf1 instances. As a result, we observed a 72% reduction in cost than the previously deployed GPU instances.”

Getting Started
You can get started with Inferentia and Inf1 instances today, either on Amazon SageMaker or with the Neuron SDK. This self-paced workshop walks you through both options.

Give it a try, and let us know what you think. As always, we look forward to your feedback. You can send it through your usual AWS Support contacts, post it on the AWS Forum for SageMaker, or on the Neuron SDK Github repository.

– Julien

What’s a kangaroo?! AI ethics lessons for and from the younger generation

Post Syndicated from Sue Sentance original https://www.raspberrypi.org/blog/ai-ethics-lessons-education-children-research/

Between September 2021 and March 2022, we’re partnering with The Alan Turing Institute to host speakers from the UK, Finland, Germany, and the USA presenting a series of free research seminars about AI and data science education for young people. These rapidly developing technologies have a huge and growing impact on our lives, so it’s important for young people to understand them both from a technical and a societal perspective, and for educators to learn how to best support them to gain this understanding.

Mhairi Aitken.

In our first seminar we were beyond delighted to hear from Dr Mhairi Aitken, Ethics Fellow at The Alan Turing Institute. Mhairi is a sociologist whose research examines social and ethical dimensions of digital innovation, particularly relating to uses of data and AI. You can catch up on her full presentation and the Q&A with her in the video below.

Why we need AI ethics

The increased use of AI in society and industry is bringing some amazing benefits. In healthcare for example, AI can facilitate early diagnosis of life-threatening conditions and provide more accurate surgery through robotics. AI technology is also already being used in housing, financial services, social services, retail, and marketing. Concerns have been raised about the ethical implications of some aspects of these technologies, and Mhairi gave examples of a number of controversies to introduce us to the topic.

“Ethics considers not what we can do but rather what we should do — and what we should not do.”

Mhairi Aitken

One such controversy in England took place during the coronavirus pandemic, when an AI system was used to make decisions about school grades awarded to students. The system’s algorithm drew on grades awarded in previous years to other students of a school to upgrade or downgrade grades given by teachers; this was seen as deeply unfair and raised public consciousness of the real-life impact that AI decision-making systems can have.

An AI system was used in England last year to make decisions about school grades awarded to students — this was seen as deeply unfair.

Another high-profile controversy was caused by biased machine learning-based facial recognition systems and explored in Shalini Kantayya’s documentary Coded Bias. Such facial recognition systems have been shown to be much better at recognising a white male face than a black female one, demonstrating the inequitable impact of the technology.

What should AI be used for?

There is a clear need to consider both the positive and negative impacts of AI in society. Mhairi stressed that using AI effectively and ethically is not just about mitigating negative impacts but also about maximising benefits. She told us that bringing ethics into the discussion means that we start to move on from what AI applications can do to what they should and should not do. To outline how ethics can be applied to AI, Mhairi first outlined four key ethical principles:

  • Beneficence (do good)
  • Nonmaleficence (do no harm)
  • Autonomy
  • Justice

Mhairi shared a number of concrete questions that ethics raise about new technologies including AI: 

  • How do we ensure the benefits of new technologies are experienced equitably across society?
  • Do AI systems lead to discriminatory practices and outcomes?
  • Do new forms of data collection and monitoring threaten individuals’ privacy?
  • Do new forms of monitoring lead to a Big Brother society?
  • To what extent are individuals in control of the ways they interact with AI technologies or how these technologies impact their lives?
  • How can we protect against unjust outcomes, ensuring AI technologies do not exacerbate existing inequalities or reinforce prejudices?
  • How do we ensure diverse perspectives and interests are reflected in the design, development, and deployment of AI systems? 

Who gets to inform AI systems? The kangaroo metaphor

To mitigate negative impacts and maximise benefits of an AI system in practice, it’s crucial to consider the context in which the system is developed and used. Mhairi illustrated this point using the story of an autonomous vehicle, a self-driving car, developed in Sweden in 2017. It had been thoroughly safety-tested in the country, including tests of its ability to recognise wild animals that may cross its path, for example elk and moose. However, when the car was used in Australia, it was not able to recognise kangaroos that hopped into the road! Because the system had not been tested with kangaroos during its development, it did not know what they were. As a result, the self-driving car’s safety and reliability significantly decreased when it was taken out of the context in which it had been developed, jeopardising people and kangaroos.

A parent kangaroo with a young kangaroo in its pouch stands on grass.
Mitigating negative impacts and maximising benefits of AI systems requires actively involving the perspectives of groups that may be affected by the system — ‘kangoroos’ in Mhairi’s metaphor.

Mhairi used the kangaroo example as a metaphor to illustrate ethical issues around AI: the creators of an AI system make certain assumptions about what an AI system needs to know and how it needs to operate; these assumptions always reflect the positions, perspectives, and biases of the people and organisations that develop and train the system. Therefore, AI creators need to include metaphorical ‘kangaroos’ in the design and development of an AI system to ensure that their perspectives inform the system. Mhairi highlighted children as an important group of ‘kangaroos’. 

AI in children’s lives

AI may have far-reaching consequences in children’s lives, where it’s being used for decision-making around access to resources and support. Mhairi explained the impact that AI systems are already having on young people’s lives through these systems’ deployment in children’s education, in apps that children use, and in children’s lives as consumers.

A young child sits at a table using a tablet.
AI systems are already having an impact on children’s lives.

Children can be taught not only that AI impacts their lives, but also that it can get things wrong and that it reflects human interests and biases. However, Mhairi was keen to emphasise that we need to find out what children know and want to know before we make assumptions about what they should be taught. Moreover, engaging children in discussions about AI is not only about them learning about AI, it’s also about ethical practice: what can people making decisions about AI learn from children by listening to their views and perspectives?

AI research that listens to children

UNICEF, the United Nations Children’s Fund, has expressed concerns about the impact of new AI technologies used on children and young people. They have developed the UNICEF Requirements for Child-Centred AI.

Unicef Requirements for Child-Centred AI: Support childrenʼs development and well-being. Ensure inclusion of and for children. Prioritise fairness and non-discrimination for children. Protect childrenʼs data and privacy. Ensure safety for children. Provide transparency, explainability, and accountability for children. Empower governments and businesses with knowledge of AI and childrenʼs rights. Prepare children for present and future developments in AI. Create an enabling environment for child-centred AI. Engage in digital cooperation.
UNICEF’s requirements for child-centred AI, as presented by Mhairi. Click to enlarge.

Together with UNICEF, Mhairi and her colleagues working on the Ethics Theme in the Public Policy Programme at The Alan Turing Institute are engaged in new research to pilot UNICEF’s Child-Centred Requirements for AI, and to examine how these impact public sector uses of AI. A key aspect of this research is to hear from children themselves and to develop approaches to engage children to inform future ethical practices relating to AI in the public sector. The researchers hope to find out how we can best engage children and ensure that their voices are at the heart of the discussion about AI and ethics.

We all learned a tremendous amount from Mhairi and her work on this important topic. After her presentation, we had a lively discussion where many of the participants relayed the conversations they had had about AI ethics and shared their own concerns and experiences and many links to resources. The Q&A with Mhairi is included in the video recording.

What we love about our research seminars is that everyone attending can share their thoughts, and as a result we learn so much from attendees as well as from our speakers!

It’s impossible to cover more than a tiny fraction of the seminar here, so I do urge you to take the time to watch the seminar recording. You can also catch up on our previous seminars through our blogs and videos.

Join our next seminar

We have six more seminars in our free series on AI, machine learning, and data science education, taking place every first Tuesday of the month. At our next seminar on Tuesday 5 October at 17:00–18:30 BST / 12:00–13:30 EDT / 9:00–10:30 PDT / 18:00–19:30 CEST, we will welcome Professor Carsten Schulte, Yannik Fleischer, and Lukas Höper from the University of Paderborn, Germany, who will be presenting on the topic of teaching AI and machine learning (ML) from a data-centric perspective (find out more here). Their talk will raise the questions of whether and how AI and ML should be taught differently from other themes in the computer science curriculum at school.

Sign up now and we’ll send you the link to join on the day of the seminar — don’t forget to put the date in your diary.

I look forward to meeting you there!

In the meantime, we’re offering a brand-new, free online course that introduces machine learning with a practical focus — ideal for educators and anyone interested in exploring AI technology for the first time.

The post What’s a kangaroo?! AI ethics lessons for and from the younger generation appeared first on Raspberry Pi.