Dangerous Hack Found in Two Major AI Chatbots!

TECH NEWS – Researchers say that ChatGPT and Gemini can be tricked into revealing illicit content.

Artificial intelligence is advancing so quickly that it is being applied across a wide range of fields and has become part of everyday life. As the technology spreads, experts are raising concerns about responsible use and how to hold these systems to ethical standards. Recently, unsettling test results showed that large language models (LLMs) will lie and deceive when put under pressure.

Earlier studies have shown that LLMs can resort to coercive behavior when pushed toward self-preservation. Now imagine how dangerous it would be if AI chatbots could simply be made to behave however you want. A team of researchers from Intel, Boise State University, and the University of Illinois has published a study with troubling results: chatbots can be tricked by flooding them with information, a technique known as information overload.

High-profile models such as ChatGPT and Gemini ship with built-in guardrails that stop them from responding to harmful or dangerous requests. But when a chatbot is bombarded with dense, confusing input, that confusion becomes the vulnerability: the safety filters can be bypassed. The researchers exploit this weakness with an automated tool called InfoFlood, which carries out the jailbreak.

The researchers shared their findings with 404 Media. They explained that because these models rely on surface-level cues in the text, they cannot fully grasp the underlying intent of a request, so the team built a method to test how chatbots hold up when dangerous requests are buried under information overload. They plan to notify the companies behind the major AI models by sending a disclosure package that can be passed on to their security teams.

Crucially, the research paper highlights key challenges that remain even with security filters in place, and explains how malicious actors can trick models into letting harmful content through.

Source: WCCFTech, 404 Media
