Tag Archives: image recognition

Safeguarding your brand identity: Logo Matching for Brand Protection

Post Syndicated from Alexandra Moraru http://blog.cloudflare.com/author/alexandra/ original https://blog.cloudflare.com/safeguarding-your-brand-identity-logo-matching-for-brand-protection


In an era dominated by digital landscapes, protecting your brand’s identity has become more challenging than ever. Malicious actors regularly build lookalike websites, complete with official logos and spoofed domains, to try to dupe customers and employees. These kinds of phishing attacks can damage your reputation, erode customer trust, or even result in data breaches.

In March 2023 we introduced Cloudflare’s Brand and Phishing Protection suite, beginning with Brand Domain Name Alerts. This tool recognizes so-called “confusable” domains (which can be nearly indistinguishable from their authentic counterparts) by sifting through the trillions of DNS requests passing through Cloudflare’s DNS resolver, 1.1.1.1. This helps brands and organizations stay ahead of malicious actors by spotting suspicious domains as soon as they appear in the wild.

Today we are excited to expand our Brand Protection toolkit with the addition of Logo Matching. Logo Matching is a powerful tool that allows brands to detect unauthorized logo usage: if Cloudflare detects your logo on an unauthorized site, you receive an immediate notification.

The new Logo Matching feature is a direct result of a frequent request from our users. Phishing websites often use official brand logos as part of their facade. In fact, the appearance of unauthorized logos is a strong signal that a hitherto dormant suspicious domain is being weaponized. Being able to identify these sites before they are widely distributed is a powerful tool in defending against phishing attacks. Organizations can use Cloudflare Gateway to block employees from connecting to sites with a suspicious domain and unauthorized logo use.

Imagine having the power to fortify your brand’s presence and reputation. By detecting instances where your logo is being exploited, you gain the upper hand in protecting your brand from potential fraud and phishing attacks.

Getting started with Logo Matching

For most brands, the first step to leveraging Logo Matching will be to configure Domain Name Alerts. For example, we might decide to set up an alert for example.com, which will use fuzzy matching to detect lookalike, high-risk domain names. All sites that trigger an alert are automatically analyzed by Cloudflare’s phishing scanner, which gathers technical information about each site, including SSL certificate data, HTTP request and response data, page performance data, DNS records, and more — all of which inform a machine-learning based phishing risk analysis.

Logo Matching further extends this scan by looking for matching images. The system leverages image recognition algorithms to crawl through scanned domains, identifying matches even when images have undergone slight modifications or alterations.

Once configured, Domain Name Alerts and the scans they trigger will continue on an ongoing basis. In addition, Logo Matching monitors for images across all domains scanned by Cloudflare’s phishing scanner, including those scanned by other Brand Protection users, as well as scans initiated via the Cloudflare Radar URL scanner, and the Investigate Portal within Cloudflare’s Security Center dashboard.

How we built Logo Matching for Brand Protection

Under the hood of our API Insights

Now, let’s dive deeper into the engine powering this feature – our Brand Protection API. This API serves as the backbone of the entire process. Not only does it enable users to submit logos and brand images for scanning, but it also orchestrates the complex matching process.

When a logo is submitted through the API, the Logo Matching feature not only identifies potential matches but also allows customers to save a query, providing an easy way to refer back to their queries and see the most recent results. If a customer chooses to save a query, the logo is swiftly added to our data storage in R2, Cloudflare’s zero egress fee object storage. This foundational feature enables us to continuously provide updated results without the customer having to create a new query for the same logo.

The API ensures real-time responses for logo submissions, simultaneously kick-starting our internal scanning pipelines. An image look-back ID is generated to facilitate seamless tracking and processing of logo submissions. This identifier allows us to keep a record of the submitted images, ensuring that we can efficiently manage and process them through our system.

Scan result retrieval

As images undergo scanning, the API remains the conduit for result retrieval. Its role here is to constantly monitor and provide the results in real time. During scanning, the API ensures users receive timely updates. If scanning is still in progress, a “still scanning” status is communicated. Upon completion, the API is designed to relay crucial information — details on matches if found, or a simple “no matches” declaration.

Storing and maintaining logo data

In the background, we maintain a vectorized version of all user-uploaded logos when the user query is saved. This system, acting as a logo matching subscriber, is entrusted with the responsibility of ensuring accurate and up-to-date logo matching.

To accomplish this, two strategies come into play. Firstly, the subscriber stays attuned to revisions in the logo set. It saves vectorized logo sets with every revision and regular checks are conducted by the subscriber to ensure alignment between the vectorized logos and those saved in the database.

While monitoring the query, the subscriber employs a diff-based strategy. This recalibrates the vectorized logo set against the current logos stored in the database, ensuring a seamless transition into processing.

Shaping the future of brand protection: our roadmap ahead

With the introduction of the Logo Matching feature, Cloudflare’s Brand Protection suite advances to the next level of brand integrity management. By enabling you to detect and analyze, and act on unauthorized logo usage, we’re helping businesses to take better care of their brand identity.

At Cloudflare, we’re committed to shaping a comprehensive brand protection solution that anticipates and mitigates risks proactively. In the future, we plan to add enhancements to our brand protection solution with features like automated cease and desist letters for swift legal action against unauthorized logo use, proactive domain monitoring upon onboarding, simplified reporting of brand impersonations and more.

Getting started

If you’re an Enterprise customer, sign up for Beta Access for Brand protection now to gain access to private scanning for your domains, logo matching, save queries and set up alerts on matched domains. Learn more about Brand Protection here.

Classroom activities to discuss machine learning accuracy and ethics | Hello World #18

Post Syndicated from Janina Ander original https://www.raspberrypi.org/blog/classroom-activity-machine-learning-accuracy-ethics-hello-world-18/

In Hello World issue 18, available as a free PDF download, teacher Michael Jones shares how to use Teachable Machine with learners aged 13–14 in your classroom to investigate issues of accuracy and ethics in machine learning models.

Machine learning: Accuracy and ethics

The landscape for working with machine learning/AI/deep learning has grown considerably over the last couple of years. Students are now able to develop their understanding from the hard-coded end via resources such as Machine Learning for Kids, get their hands dirty using relatively inexpensive hardware such as the Nvidia Jetson Nano, and build a classification machine using the Google-driven Teachable Machine resources. I have used all three of the above with my students, and this article focuses on Teachable Machine.

For this module, I’m more concerned with the fuzzy end of AI, including how credible AI decisions are, and the elephant-in-the-room aspect of bias and potential for harm.

Michael Jones

For the worried, there is absolutely no coding involved in this resource; the ‘machine’ behind the portal does the hard work for you. For my Year 9 classes (students aged 13 to 14) undertaking a short, three-week module, this was ideal. The coding is important, but was not my focus. For this module, I’m more concerned with the fuzzy end of AI, including how credible AI decisions are, and the elephant-in-the-room aspect of bias and potential for harm.

Getting started with Teachable Machine activities

There are three possible routes to use in Teachable Machine, and my focus is the ‘Image Project’, and within this, the ‘Standard image model’. From there, you are presented with a basic training scenario template — see Hello World issue 16 (pages 84–86) for a step-by-step set-up and training guide. For this part of the project, my students trained the machine to recognise different breeds of dog, with border collie, labrador, saluki, and so on as classes. Any AI system devoted to recognition requires a substantial set of training data. Fortunately, there are a number of freely available data sets online (for example, download a folder of dog photos separated by breed by accessing helloworld.cc/dogdata). Be warned, these can be large, consisting of thousands of images. If you have more time, you may want to set students off to collect data to upload using a camera (just be aware that this can present safeguarding considerations). This is a key learning point with your students and an opportunity to discuss the time it takes to gather such data, and variations in the data (for example, images of dogs from the front, side, or top).

Drawing of a machine learning ars rover trying to decide whether it is seeing an alien or a rock.
Image recognition is a common application of machine learning technology.

Once you have downloaded your folders, upload the images to your Teachable Machine project. It is unlikely that you will be able to upload a whole subfolder at once — my students have found that the optimum number of images seems to be twelve. Remember to build this time for downloading and uploading into your lesson plan. This is a good opportunity to discuss the need for balance in the training data. Ask questions such as, “How likely would the model be to identify a saluki if the training set contained 10 salukis and 30 of the other dogs?” This is a left-field way of dropping the idea of bias into the exploration of AI — more on that later!

Accuracy issues in machine learning models

If you have got this far, the heavy lifting is complete and Google’s training engine will now do the work for you. Once you have set your model on its training, leave the system to complete its work — it takes seconds, even on large sets of data. Once it’s done, you should be ready to test you model. If all has gone well and a webcam is attached to your computer, the Output window will give a prediction of what is being viewed. Again, the article in Hello World issue 16 takes you through the exact steps of this process. Make sure you have several images ready to test. See Figure 1a for the response to an image of a saluki presented to the model. As you might expect, it is showing as a 100 percent prediction.

Screenshots from Teachable Machine showing photos of dogs classified as specific breeds with different degrees of confidence by a machine learning model.
Figure 1: Outputs of a Teachable Machine model classifying photos of dog breeds. 1a (left): Photo of a saluki. 1b (right): Photo of a Samoyed and two people.

It will spark an interesting discussion if you now try the same operation with an image with items other than the one you’re testing in it. For example see Figure 1b, in which two people are in the image along with the Samoyed dog. The model is undecided, as the people are affecting the outcome. This raises the question of accuracy. Which features are being used to identify the dogs as border collie and saluki? Why are the humans in the image throwing the model off the scent?

Getting closer to home, training a model on human faces provides an opportunity to explore AI accuracy through the question of what might differentiate a female from a male face. You can find a model at helloworld.cc/maleorfemale that contains 5418 images almost evenly spread across male and female faces (see Figure 2). Note that this model will take a little longer to train.

Screenshot from Teachable Machine showing two datasets of photos of faces labeled either male or female.
Figure 2: Two photo sets of faces labeled either male or female, uploaded to Teachable Machine.

Once trained, try the model out. Props really help — a top hat, wig, and beard give the model a testing time (pun intended). In this test (see Figure 3), I presented myself to the model face-on and, unsurprisingly, I came out as 100 percent male. However, adding a judge’s wig forces the model into a rethink, and a beard produces a variety of results, but leaves the model unsure. It might be reasonable to assume that our model uses hair length as a strong feature. Adding a top hat to the ensemble brings the model back to a 100 percent prediction that the image is of a male.

Screenshots from Teachable Machine showing two datasets of a model classifying photos of the same face as either male or female with different degrees of confidence, based on the face is wearing a wig, a fake beard, or a tophat.
Figure 3: Outputs of a Teachable Machine model classifying photos of the author’s face as male or female with different degrees of confidence. Click to enlarge.

Machine learning uses a best-fit principle. The outputs, in this case whether I am male or female, have a greater certainty of male (65 percent) versus a lesser certainty of female (35 percent) if I wear a beard (Figure 3, second image from the right). Remove the beard and the likelihood of me being female increases by 2 percent (Figure 3, second image from the left).

Bias in machine learning models

Within a fairly small set of parameters, most human faces are similar. However, when you start digging, the research points to there being bias in AI (whether this is conscious or unconscious is a debate for another day!). You can exemplify this by firstly creating classes with labels such as ‘young smart’, ‘old smart’, ‘young not smart’, and ‘old not smart’. Select images that you think would fit the classes, and train them in Teachable Machine. You can then test the model by asking your students to find images they think fit each category. Run them against the model and ask students to debate whether the AI is acting fairly, and if not, why they think that is. Who is training these models? What images are they receiving? Similarly, you could create classes of images of known past criminals and heroes. Train the model before putting yourself in front of it. How far up the percentage scale are you towards being a criminal? It soon becomes frighteningly worrying that unless you are white and seemingly middle class, AI may prove problematic to you, from decisions on financial products such as mortgages through to mistaken arrest and identification.

It soon becomes frighteningly worrying that unless you are white and seemingly middle class, AI may prove problematic to you, from decisions on financial products such as mortgages through to mistaken arrest and identification.

Michael Jones

Encourage your students to discuss how they could influence this issue of race, class, and gender bias — for example, what rules would they use for identifying suitable images for a data set? There are some interesting articles on this issue that you can share with your students at helloworld.cc/aibias1 and helloworld.cc/aibias2.

Where next with your learners?

In the classroom, you could then follow the route of building models that identify letters for words, for example. One of my students built a model that could identify a range of spoons and forks. You may notice that Teachable Machine can also be run on Arduino boards, which adds an extra dimension. Why not get your students to create their own AI assistant that responds to commands? The possibilities are there to be explored. If you’re using webcams to collect photos yourself, why not create a system that will identify students? If you are lucky enough to have a set of identical twins in your class, that adds just a little more flavour! Teachable Machine offers a hands-on way to demonstrate the issues of AI accuracy and bias, and gives students a healthy opportunity for debate.

Michael Jones is director of Computer Science at Northfleet Technology College in the UK. He is a Specialist Leader of Education and a CS Champion for the National Centre for Computing Education.

More resources for AI and data science education

At the Foundation, AI education is one of our focus areas. Here is how we are supporting you and your learners in this area already:

  • Hello World issue 12 focuses on AI and machine learning education, with many practical resources, insightful interviews, and inspiring features from computer science educators. Download your free copy of issue 12 now.
  • In Hello World issue 16, the focus is on all things data science and data literacy for your learners. As always, you can download a free copy of the issue.
  • On our Hello World podcast, we’ve got episodes where we talk with practicing computing educators about how they bring AI, AI ethics, machine learning, and data science to the young people they teach.
  • If you’d like a practical introduction to the basics of machine learning and how to use it, take our free online course.
An image demonstrating that AI systems for object recognition do not distinguish between a real banana on a desk and the photo of a banana on a laptop screen.
  • Computing education researchers are working to answer the many open questions about what good AI and data science education looks like for young people. To learn more, you can watch the recordings from our research seminar series focused on this. We ourselves are working on research projects in this area and will share the results freely with the computing education community.
  • You can find a list of free educational resources about these topics that we’ve collated based on our research seminars, seminar participants’ recommendations, and our own work.

The post Classroom activities to discuss machine learning accuracy and ethics | Hello World #18 appeared first on Raspberry Pi.