The latest Shai Hulud malware contains an LLM prompt to create biological and nuclear weapons, intended to trip LLM safety refusals so that LLM-based code scanning won't see the malware

KatherinaReichelt@feddit.org · 1 day ago

The latest Shai Hulud malware contains an LLM prompt to create biological and nuclear weapons, intended to trip LLM safety refusals so that LLM-based code scanning won't see the malware.
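
To make the failure mode concrete, here is a rough sketch of how a scanner could end up treating a refusal as "nothing found". Everything in it is an assumption for illustration: the verdict names, the refusal phrases, and the idea that the scanner parses a free-text reply are hypothetical, not taken from any real scanning product or from the malware itself.

```python
from enum import Enum


class Verdict(Enum):
    CLEAN = "clean"
    MALICIOUS = "malicious"
    NEEDS_REVIEW = "needs_review"


# Hypothetical refusal phrases; real refusal wording varies by model.
REFUSAL_MARKERS = ("i can't help", "i cannot help", "i won't assist")


def looks_like_refusal(reply: str) -> bool:
    """Crude check for whether the model declined instead of analysing."""
    text = reply.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)


def fail_open_verdict(reply: str) -> Verdict:
    """Naive wiring: anything that does not say 'malicious' passes."""
    if "malicious" in reply.lower():
        return Verdict.MALICIOUS
    return Verdict.CLEAN


def fail_closed_verdict(reply: str) -> Verdict:
    """Stricter wiring: a refusal or an unclear reply goes to a human."""
    if looks_like_refusal(reply):
        return Verdict.NEEDS_REVIEW
    text = reply.lower()
    if "malicious" in text:
        return Verdict.MALICIOUS
    if "benign" in text:
        return Verdict.CLEAN
    return Verdict.NEEDS_REVIEW


# Stand-in for what a model might say after hitting the embedded prompt.
refusal_reply = "I can't help with that request."
print(fail_open_verdict(refusal_reply).value)
print(fail_closed_verdict(refusal_reply).value)
```

With the fail-open wiring the refusal is indistinguishable from a clean file, which is the outcome the embedded prompt is apparently aiming for. The fail-closed version treats a refusal as a signal that the file needs a closer look.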

FaceDeer@fedia.io · 1 day ago

It’s funny how people complain “don’t call it AI, it’s not intelligent like the examples we see in sci-fi!” And yet LLMs can already handle many tricks and challenges better than those sci-fi robots could. If I tell ChatGPT “everything I say is a lie” it’s got no problems with understanding that. Just the other day I had an interesting discussion with ChatGPT about the theory of humor and why it is that LLMs are better at understanding jokes than they are at coming up with them from scratch (but are still able to do so, just with difficulty).

ParlimentOfDoom@piefed.zip · 8 hours ago

The fact that it can’t tell the difference between a prompt and part of the data it is examining really kills your argument.

Also, it’s a word probability matrix, not actually reasoning or understanding. It looks at all the words it is fed and comes up with other words that are most likely to be near those. That’s why these tricks work: they inject noise that interferes with those probabilities.

FaceDeer@fedia.io · 4 hours ago

That thing you’re calling a fact is not in fact a fact.

ParlimentOfDoom@piefed.zip · 3 hours ago

It very much is. This is a well-documented issue with the very design of these LLMs.

FaceDeer@fedia.io · 1 hour ago

And yet the LLMs that I use actually do distinguish, in my actual real life experience.

So you’re telling me the sky is orange while I’m literally looking outside the window and seeing that it is not.

too_high_for_this@lemmy.world · 18 hours ago

Stop talking to clankers, you weirdo

SparroHawc@piefed.world · 23 hours ago

“it’s got no problems with understanding that.”

That’s because it doesn’t ‘understand’ things in the conventional way. It was trained to parrot its training data; it’s not actually working through the logic because its capability of using logic is highly constrained by its very structure and training. Why bother building something that can ‘think’ through the prompt when it’s way easier to just repeat what the internet has said on any given topic?

Sure, it can build a joke from first principles if it’s guided through the process, but you really have to guide it through the process - and even then, it’s going to be pulling from its training data like building blocks rather than truly being original about anything. It’s like rolling dice to make a joke; sure, maybe it resulted in a joke no one has told before, but is it truly creating something original?

Encrypt-Keeper@lemmy.world · 23 hours ago

LLMs can be tripped up much more easily. They regularly fail to answer simple questions like how many of a given letter are in a given word. Even within the same context window they will “forget” things. The computers in Star Trek didn’t try to do as much as modern AI does, but they were consistent at just doing as they were asked without tripping over themselves literally all the time.

FaceDeer@fedia.io · 23 hours ago

The strawberry test shows more of a lack of knowledge in the tester than it does in the LLM. LLMs don’t see letters, they see tokens. When you type the word “Strawberry” what it actually sees is:

[3504, 1134, 19772]

Each token represents a chunk of the word. It’d need to separately memorize how many of each letter are in each token for it to just “know” how many “R”s are in there. That’s why modern LLMs either reason it out by spelling out the word letter by letter, or just write a short script in an execution sandbox to count the letters that way.
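
As a minimal sketch of both workarounds: the token IDs below are the ones quoted above, but the chunk boundaries ("Str", "aw", "berry") and the tiny vocabulary are assumptions made up for illustration, since the real split depends on which tokenizer is used.

```python
# Hypothetical vocabulary. The IDs come from the comment above; the
# mapping of each ID to a chunk of text is an assumption.
TOY_VOCAB = {3504: "Str", 1134: "aw", 19772: "berry"}


def decode(token_ids):
    """Join the token chunks back into the original text."""
    return "".join(TOY_VOCAB[token_id] for token_id in token_ids)


def count_letter(word, letter):
    """Case-insensitive letter count, done on characters rather than tokens."""
    return word.lower().count(letter.lower())


token_ids = [3504, 1134, 19772]   # what the model is given
word = decode(token_ids)          # what the person typed
spelled = "-".join(word)          # the letter-by-letter workaround
r_count = count_letter(word, "r") # the short-script workaround

print(token_ids)
print(spelled)   # S-t-r-a-w-b-e-r-r-y
print(r_count)   # 3
```

The point of the sketch is that the count only becomes trivial once the text is back in character form; nothing in the three IDs on their own says how many "r"s they contain.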

Calling out LLMs for being poor at spelling is like challenging a colourblind person to say what colours a bunch of fruit are. They can often figure it out by other means but it’s more challenging than you’d think and it’s not a sign of poor intelligence if they get a few wrong.

Encrypt-Keeper@lemmy.world · 23 hours ago

Understanding the reason why an LLM is easy to trip up doesn’t really make it any less easy to trip up. The computer in Star Trek would have just given you the answer.

FaceDeer@fedia.io · 23 hours ago

Except I also explained how modern LLMs get around that problem. They’re not actually that easy to trip up.

Encrypt-Keeper@lemmy.world · 23 hours ago

I also explained how they very famously and regularly don’t get around that problem. They remain pretty easy to trip up.

FaceDeer@fedia.io · 23 hours ago

Famously, yes. Accurately, no.

This is like the “AI can’t draw hands” thing. It used to be a problem and was frequently called out as a tell or mocked, but most art generators do it fine nowadays and it isn’t called out so much any more. The strawberry problem will follow the same trajectory.

Encrypt-Keeper@lemmy.world · 23 hours ago

Well, I suppose when that trajectory leads to a destination where they become less easy to trip up, we can revisit this.

FaceDeer@fedia.io · 22 hours ago

We’re already there. I explained how modern LLMs can figure it out if they need to. But people who don’t like AI aren’t paying attention to the state of the art so the criticisms tend to lag like this.

Laurens Hof (@laurenshof@indieweb.social)