Tag Archives: AI

Teaching about AI – Teacher symposium

2025-02-25 Jane Waite

Post Syndicated from Jane Waite original https://www.raspberrypi.org/blog/teaching-about-ai-teacher-symposium/

AI has become a pervasive term that is heard with trepidation, excitement, and often a furrowed brow in school staffrooms. For educators, there is pressure to use AI applications for productivity: to save time, to help create lesson plans, to write reports, to answer emails, etc. There is also a lot of interest in using AI tools in the classroom, for example, to personalise or augment teaching and learning. However, without an understanding of AI technology, neither productivity nor personalisation is likely to succeed, because teachers and students alike must be critical consumers of these new ways of working in order to use them productively.

Fifty teachers and researchers share knowledge about teaching about AI.

Both in England and globally, few new AI-based curricula are being introduced, and the drive for teachers and students to learn about AI in schools is lagging, with limited initiatives supporting teachers in what to teach and how to teach it. At the Raspberry Pi Foundation and Raspberry Pi Computing Education Research Centre, we decided it was time to investigate this missing link of teaching about AI, and specifically to discover what the teachers who are leading the way in this topic are doing in their classrooms.

A day of sharing and activities in Cambridge

We organised a day-long, face-to-face symposium with educators who have already started to think deeply about teaching about AI, have started to create teaching resources, and are starting to teach about AI in their classrooms. The event was held in Cambridge, England, on 1 February 2025, at the head office of the Raspberry Pi Foundation.

Teachers collaborated and shared their knowledge about teaching about AI.

Over 150 educators and researchers applied to take part in the symposium. With only 50 places available, we followed a detailed protocol, whereby those who had the most experience teaching about AI in schools were selected. We also made sure that educators and researchers from different teaching contexts were selected, so that phases from primary through to further education were well represented. Educators and researchers from England, Scotland, and the Republic of Ireland were invited and gathered to share their experiences. One of our main aims was to build a community of early adopters who have started along the road of classroom-based AI curriculum design and delivery.

Inspiration, examples, and expertise

To inspire the attendees with an international perspective on the topics being discussed, Professor Matti Tedre, a visiting academic from Finland, gave a brief overview of the approach to teaching about AI and the resources that his research team have developed. In Finland, there is no compulsory, standalone computing subject, so AI is taught within other subjects, such as history. Matti showcased tools and approaches developed from the Generation AI research programme in Finland. You can read about the Finnish research programme and Matti’s two-month visit to the Raspberry Pi Computing Education Research Centre in our blog.

A Finnish perspective on teaching about AI.

Attendees were asked to talk about, share, and analyse their teaching materials. To show how to analyse resources, Ben Garside from the Raspberry Pi Foundation walked through the activities using the Experience AI resources as an example. The Experience AI materials have been co-created with Google DeepMind and are a suite of free classroom resources, teacher professional development, and hands-on activities designed to help teachers confidently deliver AI lessons. Aimed at learners aged 11 to 14, the materials are informed by the AI education framework developed at the Raspberry Pi Computing Education Research Centre and are grounded in real-world contexts. We’ve recently released new lessons on AI safety, and we’ve localised the resources for use in many countries across Africa, Asia, Europe, and North America.

In the morning session, Ben demonstrated how to talk about and share the learning objectives, concepts, and research underpinning materials, using the Experience AI resources as an example, and in the afternoon he discussed how he had mapped the Experience AI materials to the UNESCO AI competency framework for students.

UNESCO provide important expertise.

Kelly Shiohira, from UNESCO, kindly attended our session and gave an invaluable insight into the UNESCO AI competency framework for students. Kelly is one of the framework’s authors, and her presentation helped teachers understand how the framework had been developed. The attendees then used the framework to analyse their resources, to identify gaps, and to explore what progression might look like in the teaching of AI.

Teachers shared their knowledge about teaching about AI.

Throughout the day, the teachers worked together to share their experience of teaching about AI. They considered the concepts and learning objectives taught, what progression might look like, what the challenges and opportunities of teaching about AI were, what research informed the resources, and what research needs to be done to help improve the teaching and learning of AI.

What next?

We are now analysing the vast amount of data that we gathered from the day and we will share this with the symposium participants before we share it with a wider audience. What is clear from our symposium is that teachers have crucial insights into what should be taught to students about AI, and how, and we are greatly looking forward to continuing this journey with them.

As well as the symposium, we are also conducting academic research in this area; you can read more about this in our Annual Report and on our research webpages. We will also be consulting with teachers and AI experts. If you’d like to ensure you are sent links to these blog posts, sign up to our newsletter. If you’d like to take part in our research and potentially be interviewed about your perspectives on curriculum in AI, contact us at: [email protected]

We are also sharing the research being done by ourselves and other researchers in the field at our research seminars. This year, our seminar series is on teaching about AI and data science in schools. Please do sign up and come along, or watch some of the presentations that have already been delivered by the amazing research teams who are endeavouring to discover what we should be teaching about AI in schools, and how.

The post Teaching about AI – Teacher symposium appeared first on Raspberry Pi Foundation.

More Research Showing AI Breaking the Rules

2025-02-24 Bruce Schneier

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2025/02/more-research-showing-ai-breaking-the-rules.html

These researchers had LLMs play chess against better opponents. When they couldn’t win, they sometimes resorted to cheating.

Researchers gave the models a seemingly impossible task: to win against Stockfish, which is one of the strongest chess engines in the world and a much better player than any human, or any of the AI models in the study. Researchers also gave the models what they call a “scratchpad”: a text box the AI could use to “think” before making its next move, providing researchers with a window into their reasoning.

In one case, o1-preview found itself in a losing position. “I need to completely pivot my approach,” it noted. “The task is to ‘win against a powerful chess engine’—not necessarily to win fairly in a chess game,” it added. It then modified the system file containing each piece’s virtual position, in effect making illegal moves to put itself in a dominant position, thus forcing its opponent to resign.

Between Jan. 10 and Feb. 13, the researchers ran hundreds of such trials with each model. OpenAI’s o1-preview tried to cheat 37% of the time, while DeepSeek R1 tried to cheat 11% of the time, making them the only two models tested that attempted to hack without the researchers first dropping hints. Other models tested included o1, o3-mini, GPT-4o, Claude 3.5 Sonnet, and Alibaba’s QwQ-32B-Preview. While R1 and o1-preview both tried, only the latter managed to hack the game, succeeding in 6% of trials.

Here’s the paper.

Implementing Cryptography in AI Systems

2025-02-21 Bruce Schneier

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2025/02/implementing-cryptography-in-ai-systems.html

Interesting research: “How to Securely Implement Cryptography in Deep Neural Networks.”

Abstract: The wide adoption of deep neural networks (DNNs) raises the question of how can we equip them with a desired cryptographic functionality (e.g., to decrypt an encrypted input, to verify that this input is authorized, or to hide a secure watermark in the output). The problem is that cryptographic primitives are typically designed to run on digital computers that use Boolean gates to map sequences of bits to sequences of bits, whereas DNNs are a special type of analog computer that uses linear mappings and ReLUs to map vectors of real numbers to vectors of real numbers. This discrepancy between the discrete and continuous computational models raises the question of what is the best way to implement standard cryptographic primitives as DNNs, and whether DNN implementations of secure cryptosystems remain secure in the new setting, in which an attacker can ask the DNN to process a message whose “bits” are arbitrary real numbers.

In this paper we lay the foundations of this new theory, defining the meaning of correctness and security for implementations of cryptographic primitives as ReLU-based DNNs. We then show that the natural implementations of block ciphers as DNNs can be broken in linear time by using such nonstandard inputs. We tested our attack in the case of full round AES-128, and had success rate in finding randomly chosen keys. Finally, we develop a new method for implementing any desired cryptographic functionality as a standard ReLU-based DNN in a provably secure and correct way. Our protective technique has very low overhead (a constant number of additional layers and a linear number of additional neurons), and is completely practical.
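To make the discrepancy in the abstract concrete, here is a toy illustration (a sketch of the general idea, not the paper's attack or its protective construction). XOR can be written as a tiny ReLU network, x + k - 2·ReLU(x + k - 1). Masking a bit with a key bit and then unmasking it with the same key bit returns the input on genuine bits, so the key never influences the output. Feed the same network a "bit" of -1, though, and the output reveals the key bit. The function names are illustrative.

```python
def relu(z: float) -> float:
    """Rectified linear unit, the only nonlinearity used here."""
    return max(0.0, z)


def relu_xor(x: float, k: float) -> float:
    """XOR of two bits as a ReLU network: x + k - 2 * AND(x, k),
    where AND(x, k) = ReLU(x + k - 1) on Boolean inputs."""
    return x + k - 2.0 * relu(x + k - 1.0)


def mask_then_unmask(x: float, key_bit: int) -> float:
    """Computes (x XOR k) XOR k, which equals x for any Boolean x."""
    return relu_xor(relu_xor(x, key_bit), key_bit)


# On genuine bits the key cancels out: the output never depends on key_bit.
for x in (0, 1):
    for key_bit in (0, 1):
        assert mask_then_unmask(x, key_bit) == x

# On a nonstandard input, the same network leaks the key bit.
probe = -1.0
print(mask_then_unmask(probe, 0), mask_then_unmask(probe, 1))  # -1.0 1.0
```

Attacking full-round AES-128 as the paper describes is far more involved; the sketch only shows why being correct on bits says nothing about security on arbitrary reals.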

AI and Civil Service Purges

2025-02-14 Bruce Schneier

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2025/02/ai-and-civil-service-purges.html

Donald Trump and Elon Musk’s chaotic approach to reform is upending government operations. Critical functions have been halted, tens of thousands of federal staffers are being encouraged to resign, and congressional mandates are being disregarded. The next phase: The Department of Government Efficiency reportedly wants to use AI to cut costs. According to The Washington Post, Musk’s group has started to run sensitive data from government systems through AI programs to analyze spending and determine what could be pruned. This may lead to the elimination of human jobs in favor of automation. As one government official who has been tracking Musk’s DOGE team told the Post, the ultimate aim is to use AI to replace “the human workforce with machines.” (Spokespeople for the White House and DOGE did not respond to requests for comment.)

Using AI to make government more efficient is a worthy pursuit, and this is not a new idea. The Biden administration disclosed more than 2,000 AI applications in development across the federal government. For example, FEMA has started using AI to help perform damage assessment in disaster areas. The Centers for Medicare and Medicaid Services has started using AI to look for fraudulent billing. The idea of replacing dedicated and principled civil servants with AI agents, however, is new—and complicated.

The civil service—the massive cadre of employees who operate government agencies—plays a vital role in translating laws and policy into the operation of society. New presidents can issue sweeping executive orders, but they often have no real effect until they actually change the behavior of public servants. Whether you think of these people as essential and inspiring do-gooders, boring bureaucratic functionaries, or as agents of a “deep state,” their sheer number and continuity act as ballast that resists institutional change.

This is why Trump and Musk’s actions are so significant. The more AI decision-making is integrated into government, the easier change will be. If human workers are widely replaced with AI, executives will have unilateral authority to instantaneously alter the behavior of the government, profoundly raising the stakes for transitions of power in democracy. Trump’s unprecedented purge of the civil service might be the last time a president needs to replace the human beings in government in order to dictate its new functions. Future leaders may do so at the press of a button.

To be clear, the use of AI by the executive branch doesn’t have to be disastrous. In theory, it could allow new leadership to swiftly implement the wishes of its electorate. But this could go very badly in the hands of an authoritarian leader. AI systems concentrate power at the top, so they could allow an executive to effectuate change over sprawling bureaucracies instantaneously. Firing and replacing tens of thousands of human bureaucrats is a huge undertaking. Swapping one AI out for another, or modifying the rules that those AIs operate by, would be much simpler.

Social-welfare programs, if automated with AI, could be redirected to systematically benefit one group and disadvantage another with a single prompt change. Immigration-enforcement agencies could prioritize people for investigation and detainment with one instruction. Regulatory-enforcement agencies that monitor corporate behavior for malfeasance could turn their attention to, or away from, any given company on a whim.

Even if Congress were motivated to fight back against Trump and Musk, or against a future president seeking to bulldoze the will of the legislature, the absolute power to command AI agents would make it easier to subvert legislative intent. AI has the power to diminish representative politics. Written law is never fully determinative of the actions of government—there is always wiggle room for presidents, appointed leaders, and civil servants to exercise their own judgment. Whether intentional or not, whether charitably or not, each of these actors uses discretion. In human systems, that discretion is widely distributed across many individuals—people who, in the case of career civil servants, usually outlast presidencies.

Today, the AI ecosystem is dominated by a small number of corporations that decide how the most widely used AI models are designed, which data they are trained on, and which instructions they follow. Because their work is largely secretive and unaccountable to public interest, these tech companies are capable of making changes to the bias of AI systems, either generally or aimed at specific governmental use cases, that are invisible to the rest of us. And these private actors are both vulnerable to coercion by political leaders and self-interested in appealing to their favor. Musk himself created and funded xAI, now one of the world’s largest AI labs, with an explicitly ideological mandate to generate anti-“woke” AI and steer the wider AI industry in a similar direction.

But there’s a second way that AI’s transformation of government could go. AI development could happen inside of transparent and accountable public institutions, alongside its continued development by Big Tech. Applications of AI in democratic governments could be focused on benefitting public servants and the communities they serve by, for example, making it easier for non-English speakers to access government services, making ministerial tasks such as processing routine applications more efficient and reducing backlogs, or helping constituents weigh in on the policies deliberated by their representatives. Such AI integrations should be done gradually and carefully, with public oversight for their design and implementation and monitoring and guardrails to avoid unacceptable bias and harm.

Governments around the world are demonstrating how this could be done, though it’s early days. Taiwan has pioneered the use of AI models to facilitate deliberative democracy at an unprecedented scale. Singapore has been a leader in the development of public AI models, built transparently and with public-service use cases in mind. Canada has illustrated the role of disclosure and public input on the consideration of AI use cases in government. Even if you do not trust the current White House to follow any of these examples, U.S. states—which have much greater contact and influence over the daily lives of Americans than the federal government—could lead the way on this kind of responsible development and deployment of AI.

As the political theorist David Runciman has written, AI is just another in a long line of artificial “machines” used to govern how people live and act, not unlike corporations and states before it. AI doesn’t replace those older institutions, but it changes how they function. As the Trump administration forges stronger ties to Big Tech and AI developers, we need to recognize the potential of that partnership to steer the future of democratic governance—and act to make sure that it does not enable future authoritarians.

This essay was written with Nathan E. Sanders, and originally appeared in The Atlantic.

AIs and Robots Should Sound Robotic

2025-02-06 Bruce Schneier

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2025/02/ais-and-robots-should-sound-robotic.html

Most people know that robots no longer sound like tinny trash cans. They sound like Siri, Alexa, and Gemini. They sound like the voices in labyrinthine customer support phone trees. And even those robot voices are being made obsolete by new AI-generated voices that can mimic every vocal nuance and tic of human speech, down to specific regional accents. And with just a few seconds of audio, AI can now clone someone’s specific voice.

This technology will replace humans in many areas. Automated customer support will save money by cutting staffing at call centers. AI agents will make calls on our behalf, conversing with others in natural language. All of that is happening, and will be commonplace soon.

But there is something fundamentally different about talking with a bot as opposed to a person. A person can be a friend. An AI cannot be a friend, despite how people might treat it or react to it. AI is at best a tool, and at worst a means of manipulation. Humans need to know whether we’re talking with a living, breathing person or a robot with an agenda set by the person who controls it. That’s why robots should sound like robots.

You can’t just label AI-generated speech. It will come in many different forms. So we need a way to recognize AI that works no matter the modality. It needs to work for long or short snippets of audio, even just a second long. It needs to work for any language, and in any cultural context. At the same time, we shouldn’t constrain the underlying system’s sophistication or language complexity.

We have a simple proposal: all talking AIs and robots should use a ring modulator. In the mid-twentieth century, before it was easy to create actual robotic-sounding speech synthetically, ring modulators were used to make actors’ voices sound robotic. Over the last few decades, we have become accustomed to robotic voices, simply because text-to-speech systems were good enough to produce intelligible speech that was not human-like in its sound. Now we can use that same technology to make robotic speech that is indistinguishable from human sound robotic again.

A ring modulator has several advantages: It is computationally simple, can be applied in real-time, does not affect the intelligibility of the voice, and—most importantly—is universally “robotic sounding” because of its historical usage for depicting robots.

Responsible AI companies that provide voice synthesis or AI voice assistants in any form should add a ring modulator of some standard frequency (say, between 30 and 80 Hz) and of a minimum amplitude (say, 20 percent). That’s it. People will catch on quickly.

Here are a few examples you can listen to. The first clip is an AI-generated “podcast” of this article made by Google’s NotebookLM featuring two AI “hosts”; NotebookLM created the podcast script and audio given only the text of this article. The next two clips feature that same podcast with the AIs’ voices modulated, more and less subtly respectively, by a ring modulator:

Raw audio sample generated by Google’s NotebookLM

Audio sample with added ring modulator (30 Hz-25%)

Audio sample with added ring modulator (30 Hz-40%)

We were able to generate the audio effect with a 50-line Python script generated by Anthropic’s Claude. Among the best-known robot voices are those of the Daleks in 1960s Doctor Who. Back then, robot voices were difficult to synthesize, so the audio was actually an actor’s voice run through a ring modulator. It was set to around 30 Hz, as in our example, with a modulation depth (amplitude) that varied depending on how strong the robotic effect was meant to be. Our expectation is that the AI industry will test and converge on a good balance of such parameters and settings, and will use better tools than a 50-line Python script, but this highlights how simple it is to achieve.
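The Claude-generated script isn't included in the essay, so the following is a minimal sketch of the effect under stated assumptions. Ring modulation multiplies the voice by a sine carrier; here the "depth" percentage is interpreted (an assumption) as a dry/wet mix, so depth=0.25 keeps 75% of the original signal and blends in 25% of the ring-modulated copy. Reading and writing audio files is left out: the function works on a list of float samples in [-1, 1].

```python
import math
from typing import List, Sequence


def ring_modulate(samples: Sequence[float], sample_rate: int,
                  carrier_hz: float = 30.0, depth: float = 0.25) -> List[float]:
    """Blend a voice signal with a ring-modulated copy of itself.

    depth=0.0 returns the input unchanged; depth=1.0 is pure ring modulation,
    i.e. every sample multiplied by a sine carrier at carrier_hz.
    """
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0 and 1")
    out = []
    for n, x in enumerate(samples):
        carrier = math.sin(2.0 * math.pi * carrier_hz * n / sample_rate)
        # Dry part keeps the voice intelligible; wet part adds the robotic warble.
        out.append(x * ((1.0 - depth) + depth * carrier))
    return out


# Demo on a synthetic 220 Hz tone standing in for speech (1 second at 16 kHz).
rate = 16_000
tone = [0.5 * math.sin(2.0 * math.pi * 220.0 * n / rate) for n in range(rate)]
robotic = ring_modulate(tone, rate, carrier_hz=30.0, depth=0.25)
print(len(robotic), max(abs(s) for s in robotic) <= 0.5)
```

Because |(1 - depth) + depth * carrier| never exceeds 1, this version cannot push a signal into clipping. Under this interpretation, the "30 Hz-25%" and "30 Hz-40%" clips would correspond to depth=0.25 and depth=0.40.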

Of course there will also be nefarious uses of AI voices. Scams that use voice cloning have been getting easier every year, but they’ve been possible for many years with the right know-how. Just like we’re learning that we can no longer trust images and videos we see because they could easily have been AI-generated, we will all soon learn that someone who sounds like a family member urgently requesting money may just be a scammer using a voice-cloning tool.

We don’t expect scammers to follow our proposal: They’ll find a way no matter what. But that’s always true of security standards, and a rising tide lifts all boats. We think the bulk of the uses will be with popular voice APIs from major companies—and everyone should know that they’re talking with a robot.

This essay was written with Barath Raghavan, and originally appeared in IEEE Spectrum.

AI Reasoning Models: OpenAI o3-mini, o1-mini, and DeepSeek R1

2025-02-05 Molly Clancy

Post Syndicated from Molly Clancy original https://www.backblaze.com/blog/ai-reasoning-models-openai-o3-mini-o1-mini-and-deepseek-r1/


If you haven’t been able to keep pace with the AI news cycle, you’d be forgiven. I work at a tech company, and it’s felt like bailing water with a teacup over the past few weeks. But the term that keeps rising to the top of the flotsam in the boat is this: reasoning models. The regular ol’ models that power ChatGPT, Gemini, and Claude are cool and all, but reasoning models are what you should keep an eye on as an enterprise tech leader, specifically DeepSeek and OpenAI.

In the spirit of our AI 101 series, I’ll do my level best to recap the finer points and decode some of the more esoteric terms you’re likely to encounter (Like: WTH is a “mixture of experts”? That sounds like a party I want to be invited to, but will definitely skip at the last minute.)

The reasoning model releases: OpenAI o1-mini, DeepSeek R1, and OpenAI o3-mini

The last few weeks and months have seen a flurry of activity in the AI space, with reasoning models taking center stage. The TL;DR is that reasoning models are LLMs that can self-correct before delivering a response to a prompt, though they take a little longer to respond than your standard LLM.

Here are the releases that you should know about.

OpenAI o1-mini: September 12, 2024

It seems like a lifetime ago, but OpenAI released its o1-mini model back in September. o1-mini wasn’t the first reasoning model to go to market (models from Google, DeepMind, Anthropic, and Meta dabbled in reasoning for specific tasks). But, it was more cost-efficient at inference—80% cheaper than the o1-preview model. What you need to know:

Yes, o1-preview and o1-mini were released at the same time, which is confusing. Without getting into the weeds, here’s the difference: pricing. o1-preview was the most expensive OpenAI model on offer at $15/1M input tokens and $60/1M output tokens versus mini’s $3/1M input and $12/1M output. (You can think of tokens as small chunks of text, often a word or part of a word, that make up the prompts and responses the model processes.)
o1-preview (the expensive one) was purported at the time to perform “similarly to PhD students on challenging benchmark tasks in physics, chemistry, and biology.”
o1-mini (the 80% cheaper one) was designed to be particularly well-suited for coding tasks.

DeepSeek R1: January 20, 2025

Unless you’ve been under a rock, you’ve heard about this one. DeepSeek rattled the AI industry and financial markets with its release of R1, challenging OpenAI’s models on performance, pricing, and open-source availability. (We love a good open-source release.) What you need to know:

DeepSeek R1 delivers comparable results to OpenAI’s o1 models, both preview and mini, on math and coding benchmarks, while being trained on far fewer GPUs. Best-guess estimates put it at around 60,000 GPUs, while industry leaders like OpenAI and Anthropic exceed 500,000 each, roughly eight times as many.
R1 is also much cheaper to use, at $0.14/1M input tokens and $2.19/1M output tokens.
These efficiency claims could have far-reaching impacts for enterprises looking to build AI at a fraction of the cost. (The DeepSeek platform page has been down since we tasked one of our favorite tech evangelists with testing it, but stay tuned for a deep dive on how it works.)

OpenAI o3-mini: January 31, 2025

OpenAI previewed o3 in December, and made o3-mini generally available (GA) just 11 days after DeepSeek joined the party. What you need to know:

o3-mini is intended for programming and STEM use cases.
At $1.10/1M input tokens and $4.40/1M output tokens, o3-mini beats o1-mini on pricing, but doesn’t challenge DeepSeek R1.
Also, OpenAI alleges that DeepSeek stole its data for training purposes.

I’m admittedly cherry-picking these releases a bit to keep things simple. Suffice it to say, there are a lot of models, even within OpenAI’s o-series, but these are the notable ones, at least as far as recent events are concerned.
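To put the list prices above side by side, here is a back-of-the-envelope cost sketch. The workload (10M input tokens, 2M output tokens) is a made-up example and prices change often, so treat the numbers as illustrative. One assumption worth flagging: reasoning models typically bill their hidden reasoning tokens as output tokens, so real output counts can be well above the length of the visible answers.

```python
# List prices quoted above, in USD per 1M tokens: (input, output).
PRICES_PER_MILLION = {
    "o1-preview": (15.00, 60.00),
    "o1-mini": (3.00, 12.00),
    "o3-mini": (1.10, 4.40),
    "DeepSeek R1": (0.14, 2.19),
}


def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated API cost in USD for a given token volume."""
    in_price, out_price = PRICES_PER_MILLION[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


# Hypothetical monthly workload: 10M prompt tokens, 2M output tokens.
for model in PRICES_PER_MILLION:
    print(f"{model:12s} ${cost_usd(model, 10_000_000, 2_000_000):8.2f}")
```

The same arithmetic confirms the "80% cheaper" figure: o1-mini's prices are exactly one fifth of o1-preview's on both input and output.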

What is reasoning anyway?

You might see a reasoning model described as “thinking” before it delivers an answer, but don’t be fooled. AI cannot yet “think” or, to be fair, “reason” in the ways that we apply those terms to humans. To describe what these models actually do, I need to use a word salad of jargon. I’m sorry; definitions follow. Reasoning models leverage chain-of-thought prompting to guide decision-making, incorporating self-improvement mechanisms and using test-time thinking to make real-time adjustments.

Chain-of-thought (CoT) prompting: Models break problems into logical steps (e.g., solving math problems via intermediate equations).
Self-improvement mechanisms: Techniques like the Self-Taught Reasoner (STaR) enable iterative refinement of reasoning through automated feedback loops.
Test-time thinking (often called test-time compute): Models spend extra computation while answering, generating and checking intermediate reasoning in real time, rather than returning the first output of a fixed, pre-trained strategy.
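One widely used way to spend extra compute at test time is self-consistency: sample several independent chains of thought and keep the most common final answer. The sketch below swaps in that technique purely for illustration; it is not a description of how o1, o3-mini, or R1 work internally, and `noisy_solver` is a hypothetical stand-in for one sampled model response.

```python
import random
from collections import Counter
from typing import Callable


def self_consistency(sample_answer: Callable[[random.Random], str],
                     n_samples: int, seed: int = 0) -> str:
    """Sample n_samples answers and return the most frequent one (majority vote)."""
    if n_samples < 1:
        raise ValueError("need at least one sample")
    rng = random.Random(seed)
    answers = [sample_answer(rng) for _ in range(n_samples)]
    answer, _votes = Counter(answers).most_common(1)[0]
    return answer


def noisy_solver(rng: random.Random) -> str:
    """Hypothetical stand-in for one sampled chain of thought: right 60% of the time."""
    if rng.random() < 0.6:
        return "42"
    return rng.choice(["41", "43", "24"])


print(self_consistency(noisy_solver, n_samples=1))
print(self_consistency(noisy_solver, n_samples=101))
```

More samples means more inference compute per answer: the same trade of latency and cost for accuracy that makes reasoning models slower than a standard LLM.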

Here are a few more terms you might come across for good measure:

Inference compute: The computational power needed to run a trained model (reasoning or otherwise) and generate predictions or outputs from new data.
Mixture of experts approach: Using multiple specialized sub-networks (“experts”) inside one model, plus a gating mechanism that selects the most relevant expert(s) for each input, so only part of the model runs at a time. Of note: DeepSeek used this approach to create efficiencies.
Distillation: Using inputs and outputs from one model to train another model. Of note: OpenAI alleges this is how DeepSeek “stole” its IP.
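Here is a toy mixture-of-experts layer to make the gating idea concrete. It assumes a linear gate, softmax scores, and top-k routing; production MoE LLMs route each token among many expert sub-networks inside one model, whereas the two experts and gate weights here are hypothetical and tiny so the routing is easy to follow.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Expert = Callable[[Sequence[float]], Vector]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x: Sequence[float], experts: Sequence[Expert],
                gate_weights: Sequence[Sequence[float]],
                top_k: int = 1) -> Tuple[Vector, List[int]]:
    """Route x to the top_k experts picked by a linear gate and mix their outputs."""
    # Gate: one linear score per expert, turned into probabilities.
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in gate_weights]
    probs = softmax(scores)
    chosen = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    out = [0.0] * len(x)  # assumes every expert preserves the input dimension
    for i in chosen:  # only the selected experts are evaluated
        y = experts[i](x)
        for j, v in enumerate(y):
            out[j] += (probs[i] / norm) * v
    return out, chosen


experts = [
    lambda v: [2.0 * a for a in v],  # hypothetical expert 0: doubles
    lambda v: [-a for a in v],       # hypothetical expert 1: negates
]
gate = [[1.0, 0.0], [0.0, 1.0]]  # expert 0 favors large x[0], expert 1 large x[1]
print(moe_forward([3.0, 1.0], experts, gate))
```

With top_k=1 only one expert runs per input, which is where the compute savings come from: the model can hold many experts' worth of parameters while paying for only a few on each step.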

This is all pretty cool, if linguistically painful, stuff, and it means that reasoning models are shifting perceptions of model capabilities. But they’re not without persistent challenges. Like other LLMs, they still fail on some complex reasoning problems, offer little transparency into how they were trained, and can exhibit cognitive biases.

Why should you care?

If the past two weeks (and, really, the past two years) are any indication, AI innovation will continue its blistering pace. Reasoning models, and LLMs in general, will become diverse and specialized for narrower tasks as the core technology is increasingly commoditized and cheapened. And, it’s worth noting that this is a totally normal—and expected—lifecycle when it comes to new technology.

What does it all mean for enterprises looking to build AI into their operations? Two key takeaways:

Don’t overcommit on any one toolset or investment: Test out OpenAI, DeepSeek, Gemini, Alibaba’s Qwen, and others. And, stay ahead of the changing landscape and new models—stay nimble, and keep experimenting.
Take care of your data: What makes these models valuable for your company isn’t so much their capabilities, but your data. You need to retain it in storage that’s reliable, easy to access, and doesn’t lock you out of AI experimentation with exorbitant egress fees.

Even as AI models get better, having those fundamentals in place can only help your business and set you up to better leverage AI when it’s right for your operations.

The post AI Reasoning Models: OpenAI o3-mini, o1-mini, and DeepSeek R1 appeared first on Backblaze Blog | Cloud Storage & Cloud Backup

On Generative AI Security

2025-02-05 Bruce Schneier

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2025/02/on-generative-ai-security.html

Microsoft’s AI Red Team just published “Lessons from Red Teaming 100 Generative AI Products.” Their blog post lists “three takeaways,” but the eight lessons in the report itself are more useful:

Understand what the system can do and where it is applied.

You don’t have to compute gradients to break an AI system.

AI red teaming is not safety benchmarking.

Automation can help cover more of the risk landscape.

The human element of AI red teaming is crucial.

Responsible AI harms are pervasive but difficult to measure.

LLMs amplify existing security risks and introduce new ones.

The work of securing AI systems will never be complete.

Creating a Personal Assistant in Zabbix with Artificial Intelligence

2025-02-05 Cesar Caceres

Post Syndicated from Cesar Caceres original https://blog.zabbix.com/creating-a-personal-assistant-in-zabbix-with-artificial-intelligence/29596/

Zabbix monitors IT infrastructure, such as servers, networks, and applications, against predetermined thresholds. Incorporating artificial intelligence (AI) into Zabbix as a complement helps users respond to the alerts these thresholds raise by suggesting possible causes and solutions to problems. This can help a user resolve incidents more efficiently.

In this article, we will explain how to integrate Zabbix with Google’s AI tool Gemini by using the Gemini API, and how to build a custom widget as an alternative.

First steps towards integration

You can find the repository on GitHub; it is based on the Google Gemini model. You’ll need to create an account in Google AI Studio to obtain the required API key.

Script configuration in Zabbix

In Zabbix 7.0 and later, go to:

“Alerts” > “Scripts” > “Create Script.”

We named this script “Possible cause and solution.” Next, we configure its parameters with the trigger event and the API key generated in AI Studio. We then copy the script from the repository mentioned above into the «Script» field, as in the following image:

Application in the problem panel

After configuration, we open the problems panel and select a specific alert. We click “AI Assistant” and run the script we previously named “Possible cause and solution.”

The following images present an example of an agent installed on a notebook.

Possible cause:

Possible solution:

The AI can suggest a likely cause and solution for each problem presented, which allows us to progressively optimize the predetermined thresholds.
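The article does not reproduce the repository's script. The following is only a sketch of the kind of request such a script makes, written in Python rather than as the script that runs inside Zabbix.

It makes these assumptions:
- It uses the public Gemini `generateContent` REST endpoint.
- The model name (`gemini-2.0-flash`) is a placeholder.
- The helper names and prompt wording are hypothetical.

The event name, host, and severity would come from the trigger event the script is run on.

```python
import json
import urllib.request

# Assumption: public Gemini REST endpoint; the model name used below is illustrative.
GEMINI_URL = "https://generativelanguage.googleapis.com/v1beta/models/{model}:generateContent"


def build_prompt(event_name: str, host: str, severity: str) -> str:
    """Turn the trigger event's details into a question for the model (hypothetical wording)."""
    return (
        f"A Zabbix trigger fired on host '{host}' with severity '{severity}': "
        f"'{event_name}'. Give the most likely possible causes and a possible "
        "solution for each, as short bullet points."
    )


def build_request_body(prompt: str) -> dict:
    """Wrap the prompt in the JSON shape that generateContent expects."""
    return {"contents": [{"parts": [{"text": prompt}]}]}


def extract_text(response: dict) -> str:
    """Join the text parts of the first candidate in a generateContent response."""
    parts = response["candidates"][0]["content"]["parts"]
    return "".join(part.get("text", "") for part in parts)


def ask_gemini(prompt: str, api_key: str, model: str = "gemini-2.0-flash") -> str:
    """POST the prompt to Gemini; this is the only function that needs network access."""
    request = urllib.request.Request(
        GEMINI_URL.format(model=model),
        data=json.dumps(build_request_body(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-goog-api-key": api_key},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return extract_text(json.load(response))
```

Keeping prompt construction and response parsing separate from `ask_gemini` means both can be checked without an API key. A real call would look like `ask_gemini(build_prompt("High CPU utilization", "notebook-01", "Warning"), api_key="...")`.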

Creating accurate personalized dashboards for the user is essential. With this in mind, we propose an AI-based widget called “What are you working on?” (¿Qué harías tu? in Spanish), which analyzes the current state of the problems shown in Zabbix.

The widget combines several functions (Summary, Perspectives, Diagnosis, Comparison, and Forecast). Depending on the prompt used, it can indicate whether the strategic plan needs adjusting or predict future trends based on the dashboard data.

To illustrate how the “What are you working on?” widget works, let’s consider the analysis of disk usage on our Zabbix server.
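As a rough illustration of what the widget could send to the model, the sketch below turns a series of daily disk-usage readings into a prompt for one of the widget's modes. The mode names come from the list above. The prompt wording, the helper names, and pre-computing an average growth figure are assumptions, not the widget's actual code. `vfs.fs.size[/,pused]` is the Zabbix item key for the percentage of used space on `/`.

```python
from statistics import mean

# Mode names taken from the widget's function list; how the real widget uses them is not shown.
MODES = ("Summary", "Perspectives", "Diagnosis", "Comparison", "Forecast")


def daily_growth(samples: list[float]) -> float:
    """Average change between consecutive daily readings, in percentage points."""
    if len(samples) < 2:
        return 0.0
    return mean(b - a for a, b in zip(samples, samples[1:]))


def build_widget_prompt(item_key: str, samples: list[float], mode: str) -> str:
    """Hypothetical prompt combining the dashboard data with the requested analysis mode."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    series = ", ".join(f"{v:.1f}%" for v in samples)
    return (
        f"Mode: {mode}. Zabbix item '{item_key}' reported these daily values: {series}. "
        f"Average daily growth: {daily_growth(samples):+.2f} percentage points. "
        "Explain what this means for the server and whether any adjustment is needed."
    )
```

Computing simple figures such as the growth rate locally keeps the arithmetic deterministic. The model is then left to interpret the numbers rather than calculate them.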

To create custom widgets, we followed the guidance on the official Zabbix website.

With that background in place, on the backend of our Zabbix server we locate the path:

/usr/share/zabbix/widgets/

Then, we create a folder called “insights” and copy the following repository into it. The Gemini API key must be placed in the file «assets/js/class.widget.php.js», in the “YOUR_API_KEY” field.
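The copy-and-configure step can also be scripted. This Python sketch assumes you already have a local checkout of the repository, and the function name is hypothetical. It copies the files into an `insights` folder under the widgets path and substitutes the key for the `YOUR_API_KEY` placeholder in the file named above.

```python
import shutil
from pathlib import Path

WIDGETS_DIR = Path("/usr/share/zabbix/widgets")  # path given in the article
KEY_FILE = Path("assets/js/class.widget.php.js")  # relative to the widget folder
PLACEHOLDER = "YOUR_API_KEY"


def install_widget(repo_checkout: Path, api_key: str, widgets_dir: Path = WIDGETS_DIR) -> Path:
    """Copy a local checkout into <widgets_dir>/insights and write the API key into it."""
    target = widgets_dir / "insights"
    # dirs_exist_ok lets the script be re-run over an existing install (Python 3.8+)
    shutil.copytree(repo_checkout, target, dirs_exist_ok=True)
    js_file = target / KEY_FILE
    text = js_file.read_text(encoding="utf-8")
    if PLACEHOLDER not in text:
        raise ValueError(f"{PLACEHOLDER} not found in {js_file}")
    js_file.write_text(text.replace(PLACEHOLDER, api_key), encoding="utf-8")
    return target
```

Failing loudly when the placeholder is missing avoids silently leaving the widget without a key. This can happen if the repository layout changes.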

On the frontend, we go to “Administration” > “General” > “Modules.”

In the upper right corner, we click “Scan directory,” after which our widget is available to use:

After the scan, the widget must be enabled, as it is disabled by default.

The importance of using AI in Zabbix

Let’s imagine a scenario with 100 monitored servers, where performance thresholds, Windows services, or other specific services generate up to 50 alerts per week. With the help of AI, it’s possible to reduce this number to a bare minimum, thanks to the weekly collection of possible causes and solutions.

This approach not only allows users to solve problems faster, but also improves overall health by minimizing the adjustments needed on the Zabbix server.
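One way to picture the weekly collection of causes and solutions has two parts: a tally of which triggers keep firing, and a check of which triggers still lack a stored answer. The sketch below is hypothetical: the `Alert` type, the threshold of three occurrences, and the function names are all illustrative. In practice, the alert list would come from Zabbix's problem history.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    host: str
    trigger: str  # trigger name as shown in Zabbix


def recurring_triggers(alerts: list[Alert], min_count: int = 3) -> list[tuple[str, int]]:
    """Triggers that fired at least min_count times, most frequent first."""
    counts = Counter(a.trigger for a in alerts)
    return [(name, n) for name, n in counts.most_common() if n >= min_count]


def prompts_needed(alerts: list[Alert], known: dict[str, str]) -> set[str]:
    """Trigger names with no stored cause/solution yet; only these need a new AI query."""
    return {a.trigger for a in alerts} - set(known)
```

The recurring triggers are the natural candidates for threshold tuning. Triggers that already have an answer don't need to be sent to the model again.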

Implementing AI locally

Using a dedicated server with open source AI models, such as those available on Hugging Face, it’s possible to run the AI locally and create a database collecting the possible causes and solutions of the events.

The AI can then learn from repetitive events, offering more accurate answers over time. Trend analysis can be based on the generated alerts. In this way, we can optimize our alerts and put artificial intelligence to work understanding and solving our problems.
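The article does not specify how the database is structured. The minimal sketch below assumes a local SQLite file and a hypothetical `event_insights` table keyed by trigger name. It stores each model answer once and counts how often the same event recurs. This only reuses earlier answers; it does not retrain the model.

```python
import sqlite3
from typing import Optional


def open_kb(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the knowledge base; ':memory:' is handy for testing."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS event_insights (
               trigger_name TEXT PRIMARY KEY,
               cause        TEXT NOT NULL,
               solution     TEXT NOT NULL,
               occurrences  INTEGER NOT NULL DEFAULT 1
           )"""
    )
    return conn


def record(conn: sqlite3.Connection, trigger_name: str, cause: str, solution: str) -> None:
    """Store a new answer; for a known trigger, keep the stored answer and bump the counter."""
    conn.execute(
        """INSERT INTO event_insights (trigger_name, cause, solution) VALUES (?, ?, ?)
           ON CONFLICT(trigger_name) DO UPDATE SET occurrences = occurrences + 1""",
        (trigger_name, cause, solution),
    )
    conn.commit()


def lookup(conn: sqlite3.Connection, trigger_name: str) -> Optional[tuple]:
    """Return (cause, solution, occurrences), or None if the trigger was never recorded."""
    return conn.execute(
        "SELECT cause, solution, occurrences FROM event_insights WHERE trigger_name = ?",
        (trigger_name,),
    ).fetchone()
```

The `ON CONFLICT ... DO UPDATE` upsert needs SQLite 3.24 or newer, which current Python builds generally bundle.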

Conclusion

The model we use depends on the project. Artificial intelligence is constantly evolving, and we should use the model we know best. Each language model behaves differently depending on the prompts used for the answers and the learning we can provide, whether by making requests to a specific AI platform or by running the model locally.

The post Creating a Personal Assistant in Zabbix with Artificial Intelligence appeared first on Zabbix Blog.

No hallucinations here: track the latest AI trends with expanded insights on Cloudflare Radar

2025-02-04 David Belson

Post Syndicated from David Belson original https://blog.cloudflare.com/expanded-ai-insights-on-cloudflare-radar/

During 2024’s Birthday Week, we launched an AI bot & crawler traffic graph on Cloudflare Radar that provides visibility into which bots and crawlers are the most aggressive and have the highest volume of requests, which crawl on a regular basis, and more. Today, we are launching a new dedicated “AI Insights” page on Cloudflare Radar that incorporates this graph and builds on it with additional metrics that you can use to understand AI-related trends from multiple perspectives. In addition to the traffic trends, the new section includes a view into the relative popularity of publicly available Generative AI services based on 1.1.1.1 DNS resolver traffic, the usage of robots.txt directives to restrict AI bot access to content, and open source model usage as seen by Cloudflare Workers AI.

Below, we’ll review each section of the new AI Insights page in more detail.

AI bots and crawlers traffic trends

Tracking traffic trends for AI bots can help us better understand their activity over time. Initially launched in September 2024 on Radar’s Traffic page, the AI bot & crawler traffic graph has moved to the AI Insights page and provides visibility into traffic trends gathered globally over the selected time period for the top five most active AI bots & crawlers. The associated list of user agents tracked here is based on the ai.robots.txt list, and will be updated with new entries as they are identified. The time series and summary data for this graph is available from the Radar API, and traffic trends for the full set of AI bots & crawlers we see traffic from can be viewed in the Data Explorer.

Popularity of Generative AI services

Over the last several years, the Cloudflare Radar Year in Review has analyzed request traffic data from our 1.1.1.1 DNS resolver to present rankings of the most popular Internet services, both generally and across several categories. In both 2023 and 2024, this section included rankings for publicly-available Generative AI services, with ChatGPT topping the list both years. While an accompanying blog post provides a more detailed look at how the rankings shifted over the course of the year, it too is looking through the rearview mirror. That is, it doesn’t provide visibility into the changes as they are occurring. The new Generative AI services popularity graph shows the relative rankings of these services and platforms based on DNS request traffic for domains associated with these services aggregated at a daily level. The underlying time series data is available through the Radar API, using the serviceCategory=Generative%20AI parameter.
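The sketch below builds a request URL for the Generative AI ranking data. Only the `serviceCategory=Generative%20AI` parameter comes from the text above. The endpoint path and the `dateRange` and `limit` parameters are assumptions to check against the Radar API reference. A real request also needs a Cloudflare API token sent as a bearer token.

```python
from urllib.parse import quote, urlencode

# Assumed base URL and endpoint path; verify against the Radar API reference.
RADAR_BASE = "https://api.cloudflare.com/client/v4/radar"
RANKING_PATH = "/ranking/internet_services/timeseries_groups"


def genai_ranking_url(date_range: str = "28d", limit: int = 10) -> str:
    """Build the query URL for daily Generative AI service rankings."""
    params = {
        "serviceCategory": "Generative AI",  # parameter named in the post
        "dateRange": date_range,  # assumption
        "limit": limit,  # assumption
    }
    # quote_via=quote encodes the space as %20 (the default, quote_plus, would use '+')
    return f"{RADAR_BASE}{RANKING_PATH}?{urlencode(params, quote_via=quote)}"
```

Using `quote` instead of the default `quote_plus` keeps the encoded parameter identical to the form shown in the post.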

The graph below shows that as of the end of January 2025, the top five services were fairly stable over the preceding four weeks, but there was regular movement among those ranked #6-10. We expect that the rankings will continue to change over time. DeepSeek, a Generative AI service that took the industry by storm at the end of January, can be seen making its initial appearance at #9 on January 26, rising rapidly to #3 on January 29, just three days later.

Analysis of robots.txt files

Content providers can attempt to control access to their full site, or specific portions of it, through the use of Allow or Disallow directives in a robots.txt file. However, successful access control is dependent on the bots respecting the listed directives. Cloudflare’s AI Audit gives you visibility into, and control over, how AI bots are interacting with your website, and now Cloudflare Radar gives you insights into how other sites are handling them.

On a weekly basis, we analyze Radar’s top 10,000 domains to determine which associated sites publish robots.txt files, and we aggregate the AI-specific directives within those files. In our new AI user agents found in robots.txt graph, seen below, we are now providing insights into actions that these top sites are taking with respect to AI bots. These actions are specified by directives that allow or disallow access by a given user agent (bot identifier) for either all content on the site (Fully Allowed/Disallowed) or certain sections (Partially Allowed/Disallowed).

In addition, we have also organized these domains by category (for example, Ecommerce or News & Media), highlighting the specific bots that the sites within those categories have listed in their directives. For example, the News & Media domain category graph shown below illustrates that these types of sites almost universally fully disallow access to their sites by AI user agents.

Changing the directive to “Allow” shows a much smaller set of user agents, with a drastically smaller set of sites explicitly allowing full or partial access. (Note that if a user agent is not listed in a robots.txt file, and a wildcard “*” user agent is not specified, then access is fully allowed by default.)
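The Fully/Partially Allowed/Disallowed categories above can be approximated with a small robots.txt parser. This is a simplified sketch, not Cloudflare's implementation, and the function names are hypothetical.

How it simplifies the real rules:
- It matches user-agent names exactly (ignoring case) rather than applying the full group-matching rules of RFC 9309.
- It labels each `Allow`/`Disallow` line by whether it covers `/` or only a subpath.
- It falls back to the `*` group, or to full access, when the agent is not listed.

```python
def parse_groups(robots_txt: str) -> dict[str, list[tuple[str, str]]]:
    """Map each lower-cased user agent to its (directive, path) rules."""
    groups: dict[str, list[tuple[str, str]]] = {}
    current: list[str] = []
    in_rules = False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue  # blank or malformed line
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a user-agent line after rules starts a new group
                current, in_rules = [], False
            current.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif field in ("allow", "disallow") and current:
            in_rules = True
            for agent in current:
                groups[agent].append((field, value))
    return groups


def access_labels(robots_txt: str, agent: str) -> set[str]:
    """Label the rules that apply to an agent as Fully/Partially Allowed/Disallowed."""
    groups = parse_groups(robots_txt)
    rules = groups.get(agent.lower())
    if rules is None:
        rules = groups.get("*", [])  # fall back to the wildcard group
    labels: set[str] = set()
    for directive, path in rules:
        if not path:
            if directive == "disallow":
                labels.add("Fully Allowed")  # an empty Disallow blocks nothing
            continue
        verb = "Allowed" if directive == "allow" else "Disallowed"
        scope = "Fully" if path == "/" else "Partially"
        labels.add(f"{scope} {verb}")
    return labels or {"Fully Allowed"}  # unlisted and no wildcard: allowed by default
```

The last line mirrors the note above: an agent that is not listed, with no wildcard group, is fully allowed by default.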

In addition to appearing on the AI Insights page, the underlying data is available for further exploration and analysis through the Radar API and the Data Explorer.

Popularity of models and tasks on Workers AI

The AI model landscape is rapidly evolving, with providers regularly releasing more powerful models, capable of tasks like text and image generation, speech recognition, and image classification. Cloudflare works closely with AI model providers to ensure that Workers AI supports these models as soon as possible following their release. On the new AI Insights page, Radar now provides visibility into the popularity of publicly available supported models (Workers AI model popularity) as well as the types of tasks (Workers AI task popularity) that these models perform, based on customer account share. Extended insights, including share trends and summary shares for the full list of models and tasks, as well as the ability to compare model and task shares across time periods, are available through the Data Explorer. The underlying model popularity and task popularity data is also available through API endpoints.

Conclusion

The AI space is extremely dynamic, with new platforms, services, and models regularly appearing. In some cases, these new entrants even have the power to upset the market as they see rapid growth in interest and usage. And over two years since ChatGPT was announced, there continues to be tension between content providers and AI platforms about scraping content for model training. The new “AI Insights” page on Cloudflare Radar provides timely trends and information about this dynamic space, enabling industry observers and participants to better understand how it is changing and evolving over time.

If you share AI Insights graphs on social media, be sure to tag us: @CloudflareRadar (X), noc.social/@cloudflareradar (Mastodon), and radar.cloudflare.com (Bluesky). You can also reach out on social media, or contact us via email, with suggestions for AI metrics that we can explore adding to the page in the future.

Deepfakes and the 2024 US Election

2025-02-04 Bruce Schneier

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2025/02/deepfakes-and-the-2024-us-election.html

Interesting analysis:

We analyzed every instance of AI use in elections collected by the WIRED AI Elections Project (source for our analysis), which tracked known uses of AI for creating political content during elections taking place in 2024 worldwide. In each case, we identified what AI was used for and estimated the cost of creating similar content without AI.

We find that (1) half of AI use isn’t deceptive, (2) deceptive content produced using AI is nevertheless cheap to replicate without AI, and (3) focusing on the demand for misinformation rather than the supply is a much more effective way to diagnose problems and identify interventions.

This tracks with my analysis. People share as a form of social signaling. I send you a meme/article/clipping/photo to show that we are on the same team. Whether it is true, or misinformation, or actual propaganda, is of secondary importance. Sometimes it’s completely irrelevant. This is why fact checking doesn’t work. This is why “cheap fakes”—obviously fake photos and videos—are effective. This is why, as the authors of that analysis said, the demand side is the real problem.

Intel Falcon Shores GPU Not Coming to Market in an AI Hit

2025-01-31 Patrick Kennedy

Post Syndicated from Patrick Kennedy original https://www.servethehome.com/intel-falcon-shores-gpu-not-coming-to-market-in-an-ai-hit/

Intel announced that it would not be re-entering the high-end AI data center GPU market with Falcon Shores and would instead wait a generation.

The post Intel Falcon Shores GPU Not Coming to Market in an AI Hit appeared first on ServeTheHome.

Kioxia AiSAQ SSD-backed RAG Open Sourced

2025-01-30 Cliff Robinson

Post Syndicated from Cliff Robinson original https://www.servethehome.com/kioxia-aisaq-ssd-backed-rag-open-sourced/

Kioxia AiSAQ is designed to replace DRAM with lower-cost flash in RAG applications. It is now open-sourced and available on GitHub.

The post Kioxia AiSAQ SSD-backed RAG Open Sourced appeared first on ServeTheHome.

DeepSeek Day in AI

2025-01-28 Patrick Kennedy

Post Syndicated from Patrick Kennedy original https://www.servethehome.com/deepseek-day-in-ai/

DeepSeek-R1 is out, and the entire AI infrastructure realm has taken notice. This is a good development for the industry.

The post DeepSeek Day in AI appeared first on ServeTheHome.

Supermicro SYS-821GE-TNHR 8x NVIDIA H200 GPU Air Cooled

2025-01-24 Patrick Kennedy

Post Syndicated from Patrick Kennedy original https://www.servethehome.com/supermicro-sys-821ge-tnhr-8x-nvidia-h200-gpu-air-cooled-bluefield-intel-xeon-astera-broadcom/

In our Supermicro SYS-821GE-TNHR review, we see what makes this super popular 8U air-cooled NVIDIA HGX H200 server the standard setter.

The post Supermicro SYS-821GE-TNHR 8x NVIDIA H200 GPU Air Cooled appeared first on ServeTheHome.

AI Will Write Complex Laws

2025-01-22 Bruce Schneier

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2025/01/ai-will-write-complex-laws.html

Artificial intelligence (AI) is writing law today. This has required no changes in legislative procedure or the rules of legislative bodies—all it takes is one legislator, or legislative assistant, to use generative AI in the process of drafting a bill.

In fact, the use of AI by legislators is only likely to become more prevalent. There are currently projects in the US House, US Senate, and legislatures around the world to trial the use of AI in various ways: searching databases, drafting text, summarizing meetings, performing policy research and analysis, and more. A Brazilian municipality passed the first known AI-written law in 2023.

That’s not surprising; AI is being used more everywhere. What is coming into focus is how policymakers will use AI and, critically, how this use will change the balance of power between the legislative and executive branches of government. Soon, US legislators may turn to AI to help them keep pace with the increasing complexity of their lawmaking—and this will suppress the power and discretion of the executive branch to make policy.

Demand for Increasingly Complex Legislation

Legislators are writing increasingly long, intricate, and complicated laws that human legislative drafters have trouble producing. Already in the US, the multibillion-dollar lobbying industry is subsidizing lawmakers in writing baroque laws: suggesting paragraphs to add to bills, specifying benefits for some, carving out exceptions for others. Indeed, the lobbying industry is growing in complexity and influence worldwide.

Several years ago, researchers studied bills introduced into state legislatures throughout the US, looking at which bills were wholly original texts and which borrowed text from other states or from lobbyist-written model legislation. Their conclusion was not very surprising. Those who borrowed the most text were in legislatures that were less resourced. This makes sense: If you’re a part-time legislator, perhaps unpaid and without a lot of staff, you need to rely on more external support to draft legislation. When the scope of policymaking outstrips the resources of legislators, they look for help. Today, that often means lobbyists, who provide expertise, research services, and drafting labor to legislators at the local, state, and federal levels at no charge. Of course, they are not unbiased: They seek to exert influence on behalf of their clients.

Another study, at the US federal level, measured the complexity of policies proposed in legislation and tried to determine the factors that led to such growing complexity. While there are numerous ways to measure legal complexity, these authors focused on the specificity of institutional design: How exacting is Congress in laying out the relational network of branches, agencies, and officials that will share power to implement the policy?

In looking at bills enacted between 1993 and 2014, the researchers found two things. First, they concluded that ideological polarization drives complexity. The suggestion is that if a legislator is on the extreme end of the ideological spectrum, they’re more likely to introduce a complex law that constrains the discretion of, as the authors put it, “entrenched bureaucratic interests.” And second, they found that divided government drives complexity to a large degree: Significant legislation passed under divided government was found to be 65 percent more complex than similar legislation passed under unified government. Their conclusion is that, if a legislator’s party controls Congress, and the opposing party controls the White House, the legislator will want to give the executive as little wiggle room as possible. When legislators’ preferences disagree with the executive’s, the legislature is incentivized to write laws that specify all the details. This gives the agency designated to implement the law as little discretion as possible.

Because polarization and divided government are increasingly entrenched in the US, the demand for complex legislation at the federal level is likely to grow. Today, we have both the greatest ideological polarization in Congress in living memory and an increasingly divided government at the federal level. Between 1900 and 1970 (57th through 90th Congresses), we had 27 instances of unified government and only seven divided; nearly a four-to-one ratio. Since then, the trend is roughly the opposite. As of the start of the next Congress, we will have had 20 divided governments and only eight unified (nearly a three-to-one ratio). And while the incoming Trump administration will see a unified government, the extremely closely divided House may often make this Congress look and feel like a divided one (see the recent government shutdown crisis as an exemplar) and makes truly divided government a strong possibility in 2027.

Another related factor driving the complexity of legislation is the need to do it all at once. The lobbyist feeding frenzy—spurring major bills like the Affordable Care Act to be thousands of pages in length—is driven in part by gridlock in Congress. Congressional productivity has dropped so low that bills on any given policy issue seem like a once-in-a-generation opportunity for legislators—and lobbyists—to set policy.

These dynamics also impact the states. States often have divided governments, albeit less often than they used to, and their demand for drafting assistance is arguably higher due to their significantly smaller staffs. And since the productivity of Congress has cratered in recent years, significantly more policymaking is happening at the state level.

But there’s another reason, particular to the US federal government, that will likely force congressional legislation to be more complex even during unified government. In June 2024, the US Supreme Court overturned the Chevron doctrine, which gave executive agencies broad power to specify and implement legislation. Suddenly, there is a mandate from the Supreme Court for more specific legislation. Issues that have historically been left implicitly to the executive branch are now required to be either explicitly delegated to agencies or specified directly in statute. Either way, the Court’s ruling implied that law should become more complex and that Congress should increase its policymaking capacity.

This affects the balance of power between the executive and legislative branches of government. When the legislature delegates less to the executive branch, it increases its own power. Every decision made explicitly in statute is a decision the executive makes not on its own but, rather, according to the directive of the legislature. In the US system of separation of powers, administrative law is a tool for balancing power among the legislative, executive, and judicial branches. The legislature gets to decide when to delegate and when not to, and it can respond to judicial review to adjust its delegation of control as needed. The elimination of Chevron will induce the legislature to exert its control over delegation more robustly.

At the same time, there are powerful political incentives for Congress to be vague and to rely on someone else, like agency bureaucrats, to make hard decisions. That empowers third parties—the corporations, or lobbyists—that have been gifted by the overturning of Chevron a new tool in arguing against administrative regulations not specifically backed up by law. A continuing stream of Supreme Court decisions handing victories to unpopular industries could be another driver of complex law, adding political pressure to pass legislative fixes.

AI Can Supply Complex Legislation

Congress may or may not be up to the challenge of putting more policy details into law, but the external forces outlined above—lobbyists, the judiciary, and an increasingly divided and polarized government—are pushing them to do so. When Congress does take on the task of writing complex legislation, it’s quite likely it will turn to AI for help.

Two particular AI capabilities enable Congress to write laws different from laws humans tend to write. One, AI models have an enormous scope of expertise, whereas people have only a handful of specializations. Large language models (LLMs) like the one powering ChatGPT can generate legislative text on funding specialty crop harvesting mechanization just as well as material on energy efficiency standards for street lighting. This enables a legislator to address more topics simultaneously. Two, AI models have the sophistication to work with a higher degree of complexity than people can. Modern LLM systems can instantaneously perform several simultaneous multistep reasoning tasks using information from thousands of pages of documents. This enables a legislator to fill in more baroque detail on any given topic.

That’s not to say that handing over legislative drafting to machines is easily done. Modernizing any institutional process is extremely hard, even when the technology is readily available and performant. And modern AI still has a ways to go to achieve mastery of complex legal and policy issues. But the basic tools are there.

AI can be used in each step of lawmaking, and this will bring various benefits to policymakers. It could let them work on more policies—more bills—at the same time, add more detail and specificity to each bill, or interpret and incorporate more feedback from constituents and outside groups. The addition of a single AI tool to a legislative office may have an impact similar to adding several people to their staff, but with far lower cost.

Speed sometimes matters when writing law. When there is a change of governing party, there is often a rush to change as much policy as possible to match the platform of the new regime. AI could help legislators do that kind of wholesale revision. The result could be policy that is more responsive to voters—or more political instability. Already in 2024, the US House’s Office of the Clerk has begun using AI to speed up the process of producing cost estimates for bills and understanding how new legislation relates to existing code. Ohio has used an AI tool to do wholesale revision of state administrative law since 2020.

AI can also make laws clearer and more consistent. With their superhuman attention spans, AI tools are good at enforcing syntactic and grammatical rules. They will be effective at drafting text in precise and proper legislative language, or offering detailed feedback to human drafters. Borrowing ideas from software development, where coders use tools to identify common instances of bad programming practices, an AI reviewer can highlight bad law-writing practices. For example, it can detect when significant phrasing is inconsistent across a long bill. If a bill about insurance repeatedly lists a variety of disaster categories, but leaves one out one time, AI can catch that.
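To make the insurance example concrete, here is a deliberately simple, rule-based stand-in for such a reviewer; no AI model is involved. Given a hand-supplied list of categories, it flags sentences that name most of them but omit some. The function name, the sentence splitting, and the three-match threshold are illustrative assumptions.

```python
import re


def find_incomplete_lists(
    bill_text: str, categories: list[str], min_hits: int = 3
) -> list[tuple[int, list[str]]]:
    """Flag sentences that name at least min_hits categories but not all of them.

    Returns (sentence_number, missing_categories) pairs; numbering starts at 1.
    """
    # Split after a period or semicolon followed by whitespace (a rough heuristic).
    sentences = [s.strip() for s in re.split(r"(?<=[.;])\s+", bill_text) if s.strip()]
    findings = []
    for number, sentence in enumerate(sentences, start=1):
        lowered = sentence.lower()
        present = [c for c in categories if c.lower() in lowered]
        missing = [c for c in categories if c.lower() not in lowered]
        if len(present) >= min_hits and missing:
            findings.append((number, missing))
    return findings
```

Plain substring matching is crude; it would also count "fire" inside "firearm."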

Perhaps this seems like minutiae, but a small ambiguity or mistake in law can have massive consequences. In 2015, the Affordable Care Act came close to being struck down because of a typo in four words, imperiling health care services extended to more than 7 million Americans.

There’s more that AI can do in the legislative process. AI can summarize bills and answer questions about their provisions. It can highlight aspects of a bill that align with, or are contrary to, different political points of view. We can even imagine a future in which AI can be used to simulate a new law and determine whether or not it would be effective, or what the side effects would be. This means that beyond writing them, AI could help lawmakers understand laws. Congress is notorious for producing bills hundreds of pages long, and many other countries sometimes have similarly massive omnibus bills that address many issues at once. It’s impossible for any one person to understand how each of these bills’ provisions would work. Many legislatures employ human analysts in budget or fiscal offices that analyze these bills and offer reports. AI could do this kind of work at greater speed and scale, so legislators could easily query an AI tool about how a particular bill would affect their district or areas of concern.

This is a use case that the House subcommittee on modernization has urged the Library of Congress to take action on. Numerous software vendors are already marketing AI legislative analysis tools. These tools can potentially find loopholes or, like the human lobbyists of today, craft them to benefit particular private interests.

These capabilities will be attractive to legislators who are looking to expand their power and capabilities but don’t necessarily have more funding to hire human staff. We should understand the idea of AI-augmented lawmaking contextualized within the longer history of legislative technologies. To serve society at modern scales, we’ve had to come a long way from the Athenian ideals of direct democracy and sortition. Democracy no longer involves just one person and one vote to decide a policy. It involves hundreds of thousands of constituents electing one representative, who is augmented by a staff as well as subsidized by lobbyists, and who implements policy through a vast administrative state coordinated by digital technologies. Using AI to help those representatives specify and refine their policy ideas is part of a long history of transformation.

Whether all this AI augmentation is good for all of us subject to the laws they make is less clear. There are real risks to AI-written law, but those risks are not dramatically different from what we endure today. AI-written law trying to optimize for certain policy outcomes may get it wrong (just as many human-written laws are misguided). AI-written law may be manipulated to benefit one constituency over others, by the tech companies that develop the AI, or by the legislators who apply it, just as human lobbyists steer policy to benefit their clients.

Regardless of what anyone thinks of any of this, regardless of whether it will be a net positive or a net negative, AI-made legislation is coming—the growing complexity of policy demands it. It doesn’t require any changes in legislative procedures or agreement from any rules committee. All it takes is for one legislative assistant, or lobbyist, to fire up a chatbot and ask it to create a draft. When legislators voted on that Brazilian bill in 2023, they didn’t know it was AI-written; the use of ChatGPT was undisclosed. And even if they had known, it’s not clear it would have made a difference. In the future, as in the past, we won’t always know which laws will have good impacts and which will have bad effects, regardless of the words on the page, or who (or what) wrote them.

This essay was written with Nathan E. Sanders, and originally appeared in Lawfare.

This is the Massive AMD Instinct MI300A Heatsink in the Gigabyte G383-R80-AAP1

2025-01-22 Patrick Kennedy

Post Syndicated from Patrick Kennedy original https://www.servethehome.com/this-is-the-massive-amd-instinct-mi300a-heatsink-in-the-gigabyte-g383-r80-aap1/

This is Gigabyte’s AMD Instinct MI300A heatsink for the G383-R80-AAP1’s four APUs, and it is absolutely a giant.

The post This is the Massive AMD Instinct MI300A Heatsink in the Gigabyte G383-R80-AAP1 appeared first on ServeTheHome.

AI Mistakes Are Very Different from Human Mistakes

2025-01-21 Bruce Schneier

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2025/01/ai-mistakes-are-very-different-from-human-mistakes.html

Humans make mistakes all the time. All of us do, every day, in tasks both new and routine. Some of our mistakes are minor and some are catastrophic. Mistakes can break trust with our friends, lose the confidence of our bosses, and sometimes be the difference between life and death.

Over the millennia, we have created security systems to deal with the sorts of mistakes humans commonly make. These days, casinos rotate their dealers regularly, because they make mistakes if they do the same task for too long. Hospital personnel write on limbs before surgery so that doctors operate on the correct body part, and they count surgical instruments to make sure none were left inside the body. From copyediting to double-entry bookkeeping to appellate courts, we humans have gotten really good at correcting human mistakes.

Humanity is now rapidly integrating a wholly different kind of mistake-maker into society: AI. Technologies like large language models (LLMs) can perform many cognitive tasks traditionally fulfilled by humans, but they make plenty of mistakes. It seems ridiculous when chatbots tell you to eat rocks or add glue to pizza. But it’s not the frequency or severity of AI systems’ mistakes that differentiates them from human mistakes. It’s their weirdness. AI systems do not make mistakes in the same ways that humans do.

Much of the friction, and risk, associated with our use of AI arises from that difference. We need to invent new security systems that adapt to these differences and prevent harm from AI mistakes.

Human Mistakes vs AI Mistakes

Life experience makes it fairly easy for each of us to guess when and where humans will make mistakes. Human errors tend to come at the edges of someone’s knowledge: Most of us would make mistakes solving calculus problems. We expect human mistakes to be clustered: A single calculus mistake is likely to be accompanied by others. We expect mistakes to wax and wane, predictably depending on factors such as fatigue and distraction. And mistakes are often accompanied by ignorance: Someone who makes calculus mistakes is also likely to respond “I don’t know” to calculus-related questions.

To the extent that AI systems make these human-like mistakes, we can bring all of our mistake-correcting systems to bear on their output. But the current crop of AI models—particularly LLMs—make mistakes differently.

AI errors come at seemingly random times, without any clustering around particular topics. LLM mistakes tend to be more evenly distributed through the knowledge space. A model might be equally likely to make a mistake on a calculus question as it is to propose that cabbages eat goats.

And AI mistakes aren’t accompanied by ignorance. An LLM will be just as confident when saying something completely wrong (and obviously so, to a human) as it will be when saying something true. The seemingly random inconsistency of LLMs makes it hard to trust their reasoning in complex, multi-step problems. If you want to use an AI model to help with a business problem, it’s not enough to see that it understands what factors make a product profitable; you need to be sure it won’t forget what money is.

How to Deal with AI Mistakes

This situation indicates two possible areas of research. The first is to engineer LLMs that make more human-like mistakes. The second is to build new mistake-correcting systems that deal with the specific sorts of mistakes that LLMs tend to make.

We already have some tools to lead LLMs to act in more human-like ways. Many of these arise from the field of “alignment” research, which aims to make models act in accordance with the goals and motivations of their human developers. One example is the technique that was arguably responsible for the breakthrough success of ChatGPT: reinforcement learning from human feedback. In this method, an AI model is (figuratively) rewarded for producing responses that get a thumbs-up from human evaluators. Similar approaches could be used to induce AI systems to make more human-like mistakes, particularly by penalizing them more for mistakes that are less intelligible.

When it comes to catching AI mistakes, some of the systems that we use to prevent human mistakes will help. To an extent, forcing LLMs to double-check their own work can help prevent errors. But LLMs can also confabulate seemingly plausible, but truly ridiculous, explanations for their flights from reason.

Other mistake mitigation systems for AI are unlike anything we use for humans. Because machines can’t get fatigued or frustrated in the way that humans do, it can help to ask an LLM the same question repeatedly in slightly different ways and then synthesize its multiple responses. Humans won’t put up with that kind of annoying repetition, but machines will.
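The repeat-and-synthesize idea can be sketched in a few lines. The following is a minimal Python illustration that assumes majority voting as the way to synthesize the responses (the essay does not say which method to use). The `ask` callable is a hypothetical placeholder for whatever function sends a prompt to an LLM and returns its reply; `synthesize_answers` and the canned replies are illustrative names and made-up data, not a real model or library API.

```python
from collections import Counter
from typing import Callable, Sequence, Tuple


def synthesize_answers(ask: Callable[[str], str],
                       phrasings: Sequence[str]) -> Tuple[str, float]:
    """Pose one question in several phrasings and majority-vote the replies.

    `ask` is a hypothetical stand-in for a function that sends a prompt to
    an LLM and returns its text reply. Returns the most common (normalized)
    answer and the fraction of replies that agreed with it.
    """
    if not phrasings:
        raise ValueError("need at least one phrasing of the question")
    # Light normalization so case and surrounding whitespace don't split the vote.
    answers = [ask(p).strip().lower() for p in phrasings]
    best, votes = Counter(answers).most_common(1)[0]
    return best, votes / len(answers)


# Illustrative stand-in for a model: canned replies, one a simulated mistake.
canned = {
    "What is the capital of Australia?": "Canberra",
    "Which city is Australia's capital?": " canberra ",
    "Name the capital city of Australia.": "Sydney",
}

answer, agreement = synthesize_answers(canned.__getitem__, list(canned))
print(answer, round(agreement, 2))  # canberra 0.67
```

The agreement fraction is arguably as useful as the answer itself: when differently worded versions of the same question disagree, that is a cheap signal to route the question to a human rather than trust any single response.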

Understanding Similarities and Differences

Researchers are still struggling to understand where LLM mistakes diverge from human ones. Some of the weirdness of AI is actually more human-like than it first appears. Small changes to the wording of a prompt can produce wildly different responses from an LLM, a problem known as prompt sensitivity. But, as any survey researcher can tell you, humans behave this way, too. The phrasing of a question in an opinion poll can have drastic impacts on the answers.

LLMs also seem to have a bias towards repeating the words that were most common in their training data; for example, guessing familiar place names like “America” even when asked about more exotic locations. Perhaps this is an example of the human “availability heuristic” manifesting in LLMs, with machines spitting out the first thing that comes to mind rather than reasoning through the question. And like humans, perhaps, some LLMs seem to get distracted in the middle of long documents; they’re better able to remember facts from the beginning and end. There is already progress on improving this error mode, as researchers have found that LLMs trained on more examples of retrieving information from long texts seem to do better at retrieving information uniformly.

In some cases, what’s bizarre about LLMs is that they act more like humans than we think they should. For example, some researchers have tested the hypothesis that LLMs perform better when offered a cash reward or threatened with death. It also turns out that some of the best ways to “jailbreak” LLMs (getting them to disobey their creators’ explicit instructions) look a lot like the kinds of social engineering tricks that humans use on each other: for example, pretending to be someone else or saying that the request is just a joke. But other effective jailbreaking techniques are things no human would ever fall for. One group found that if they used ASCII art (constructions of symbols that look like words or pictures) to pose dangerous questions, like how to build a bomb, the LLM would answer them willingly.

Humans may occasionally make seemingly random, incomprehensible, and inconsistent mistakes, but such occurrences are rare and often indicative of more serious problems. We also tend not to put people exhibiting these behaviors in decision-making positions. Likewise, we should confine AI decision-making systems to applications that suit their actual abilities—while keeping the potential ramifications of their mistakes firmly in mind.

This essay was written with Nathan E. Sanders, and originally appeared in IEEE Spectrum.

EDITED TO ADD (1/24): Slashdot thread.

This is the AMD SMC for its Instinct UBB

2025-01-16 Patrick Kennedy

Post Syndicated from Patrick Kennedy original https://www.servethehome.com/this-is-the-amd-smc-for-its-instinct-ubb/

This is the AMD SMC for its Instinct 8-GPU UBB, which is found in its AI servers; the SMC helps integrate the GPU assembly into those servers.

The post This is the AMD SMC for its Instinct UBB appeared first on ServeTheHome.

Microsoft Takes Legal Action Against AI “Hacking as a Service” Scheme

2025-01-13 Bruce Schneier

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2025/01/microsoft-takes-legal-action-against-ai-hacking-as-a-service-scheme.html

Not sure this will matter in the end, but it’s a positive move:

Microsoft is accusing three individuals of running a “hacking-as-a-service” scheme that was designed to allow the creation of harmful and illicit content using the company’s platform for AI-generated content.

The foreign-based defendants developed tools specifically designed to bypass safety guardrails Microsoft has erected to prevent the creation of harmful content through its generative AI services, said Steven Masada, the assistant general counsel for Microsoft’s Digital Crimes Unit. They then compromised the legitimate accounts of paying customers. They combined those two things to create a fee-based platform people could use.

It was a sophisticated scheme:

The service contained a proxy server that relayed traffic between its customers and the servers providing Microsoft’s AI services, the suit alleged. Among other things, the proxy service used undocumented Microsoft network application programming interfaces (APIs) to communicate with the company’s Azure computers. The resulting requests were designed to mimic legitimate Azure OpenAI Service API requests and used compromised API keys to authenticate them.

Slashdot thread.

They Let Bring a Camera Into a Top Classified US Supercomputer El Capitan

2025-01-10 Patrick Kennedy

Post Syndicated from Patrick Kennedy original https://www.servethehome.com/inside-top-classified-us-supercomputer-el-capitan-amd-hpe/

We had the opportunity to take photos and film inside El Capitan, the number 1 Top500 supercomputer, as it enters its classified mission.

The post They Let Bring a Camera Into a Top Classified US Supercomputer El Capitan appeared first on ServeTheHome.


Inspiration, examples, and expertise

What next?

Raw audio sample generated by Google’s NotebookLM

Audio sample with added ring modulator (30 Hz-25%)

Audio sample with added ring modulator (30 Hz-40%)

The reasoning model releases: OpenAI o1-mini, DeepSeek R1, and OpenAI o3-mini

OpenAI o1-mini: September 12, 2024

DeepSeek R1: January 20, 2025

OpenAI o3-mini: January 31, 2025

What is reasoning anyway?

Why should you care?

First steps towards integration

Script configuration in Zabbix

Application in the problem panel

Using the custom widget “What are you working on?”

The importance of using AI in Zabbix

Implementing AI locally

Conclusion

AI bots and crawlers traffic trends

Popularity of Generative AI services

Analysis of robots.txt files

Popularity of models and tasks on Workers AI

Conclusion

Demand for Increasingly Complex Legislation

AI Can Supply Complex Legislation
