• minorkeys@lemmy.world
    arrow-up
    177
    arrow-down
    6
    ·
    2 days ago

    The public fundamentally misunderstands this tech because salesmen lied to them. An LLM is not AI. It just says the most likely thing based off what is most common in its training data for that scenario. It can’t do math or problem solve. It can only tell you what the most likely answer would be. It can’t do functional things. It’s like Family Feud where it says what the most people surveyed said.
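
    A toy sketch of the "most common answer in the training data" idea, for illustration only. It assumes a made-up three-sentence corpus and simple word-pair counts; real LLMs are neural networks over tokens rather than lookup tables, so this only mirrors the Family Feud analogy, not an actual model.

```python
from collections import Counter, defaultdict


def build_bigram_counts(corpus):
    """Count, for each word, which words follow it in the corpus."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for current, following in zip(words, words[1:]):
            counts[current][following] += 1
    return counts


def most_likely_next(counts, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical "training set", invented for this example.
corpus = [
    "two plus two is four",
    "two plus two is four",
    "two plus two is five",
]
counts = build_bigram_counts(corpus)

# "four" follows "is" twice and "five" once, so the survey winner is "four".
print(most_likely_next(counts, "is"))
```

    Nothing here computes 2 + 2; the answer is right only because the right answer was the most common one in the data.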

    • Clent@lemmy.dbzer0.com
      English
      arrow-up
      88
      ·
      2 days ago

      Some of them will “do math” but not with the LLM predictor, they have a math engine and the predictor decides when to use it. What’s great is when it outputs results, it’s not clear if it engaged the math engine or just guessed.

      • hikaru755@lemmy.world
        arrow-up
        15
        ·
        2 days ago

        “when it outputs results, it’s not clear if it engaged the math engine or just guessed”

        That depends on the harness though. In the plain model output it will be clear if a tool call happened, and it depends on the application UI around it whether that’s directly shown to the user, or if you only see the LLM’s final response based on it.

        • Axolotl@feddit.it
          arrow-up
          2
          ·
          1 day ago

          In all the UIs I have seen, not one will tell you that it called the math engine. Maybe it does happen with “thinking” models, but I never tried.

          Edit: I tried with DeepSeek. I don’t have enough math knowledge to do crazy stuff, so I did an addition lmao

          • hikaru755@lemmy.world
            English
            arrow-up
            9
            ·
            1 day ago

            Quick note on terminology: there’s no such thing as a “math engine”. Most models have the ability to run custom computer code in some way as one of the “tools” they have available, and that’s what’s used if a model decides to offload the calculation rather than answer directly.

            This is what that looks like in Claude Code:

            Notice the lines starting with a green dot and the text Bash(python3...). Those are the model calling the “Bash” tool to run Python code to answer the second and third questions. The first question it answered (correctly, btw) without doing any tool call; that’s just the LLM itself getting it right in a straight shot, similar to DeepSeek in your example. Current models are actually good enough to generally get this kind of simple math correct on their own. I still wouldn’t want to rely on that, but I’m not surprised it got it correct without any tool calls.

            So I tested my more complex calculations against DeepSeek, and it seems like (at least in the Web UI) it doesn’t have any access to a math or code running tool. It just starts working through it in verbose text, basically explaining to itself how to do manual addition like you learn in school, and then doing that. Incredibly wasteful, but it did actually arrive at the correct answers.

            Gemini is the only web-based AI app I thought to test right now that seems to have access to a code running tool, here’s what that looks like:

            It’s hidden by default, but you can click on “Show code” in the top right to see what it did.

            This is what I mean when I say the harness matters. The models are all pretty similar, but the app you’re using to interact with them determines what tools are even made available to the LLM in the first place, and whether/how you’re shown when it calls those tools.
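
            To make the harness point concrete, here is a minimal sketch under stated assumptions. `fake_model`, `run_harness`, and the `calculator` tool are all hypothetical names invented for this example and do not correspond to any real product's API. The stand-in model requests a tool only when the harness offers one, and the harness alone decides whether the user sees that the call happened.

```python
import ast
import operator

# Only these arithmetic operators are allowed by the calculator tool.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def calculator(expression):
    """Evaluate a plain arithmetic expression without using eval()."""
    def walk(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        raise ValueError("unsupported expression")
    return walk(ast.parse(expression, mode="eval").body)


def fake_model(prompt, tools):
    """Stand-in for an LLM: asks for a tool if one is offered, else guesses."""
    if "calculator" in tools:
        return {"type": "tool_call", "tool": "calculator", "input": prompt}
    return {"type": "text", "text": "probably 56088"}


def run_harness(prompt, tools, show_tool_calls):
    """Return the lines a user would see for one prompt."""
    transcript = []
    reply = fake_model(prompt, tools)
    if reply["type"] == "tool_call":
        result = tools[reply["tool"]](reply["input"])
        if show_tool_calls:
            transcript.append(f"[tool] {reply['tool']}({reply['input']!r}) -> {result}")
        transcript.append(f"The answer is {result}.")
    else:
        transcript.append(reply["text"])
    return transcript


# Same stand-in model, three different harness configurations.
print(run_harness("123 * 456", {"calculator": calculator}, show_tool_calls=True))
print(run_harness("123 * 456", {"calculator": calculator}, show_tool_calls=False))
print(run_harness("123 * 456", {}, show_tool_calls=True))
```

            With `show_tool_calls=False` the user only sees the final sentence, and with no tools registered the stand-in model just guesses. From the outside, the last two cases are hard to tell apart, which is the situation described further up the thread.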

    • 1D10@lemmy.world
      arrow-up
      30
      ·
      2 days ago

      I explain it as asking 100 people to Google something and taking the most common answer.

        • 1D10@lemmy.world
          arrow-up
          23
          ·
          2 days ago

          Yep but instead of “name something a woman keeps in her purse” it’s “write my legal document” or “is it ok to lick a lamp socket”

          • felbane@lemmy.world
            arrow-up
            5
            ·
            1 day ago

            Great question! The answer to all three of your queries is “yes.” Would you like me to search for the nearest lamp socket?

    • Subscript5676@piefed.ca
      English
      arrow-up
      25
      arrow-down
      19
      ·
      2 days ago

      I know Lemmy hates AI with a fiery passion (and I too hate it for various reasons), but the ability to make this sort of prediction far more stably than anything that came before in natural language processing (fancy term of the day for those who haven’t heard of it) is useful if you can nudge it enough in a certain direction, however inefficiently it is built and run. It can’t do functional things reliably, but if you restrict it to parsing human language and extracting very specific information, have it output that in a machine-parsable way, and then use that as input for something you can program, you’ve essentially built something that feels like it can understand you in human language for a handful of tasks and carry out those tasks (even if the carrying out part isn’t actually done by an LLM). So pedantically, it’s not AI, but most people not in tech don’t know or care about the difference. To them it’s magic all the way down, like how computers should just magically do what they’re thinking of. That hasn’t changed.
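
      The extract-then-program pattern described above could look roughly like this. It is a sketch under stated assumptions: `fake_llm_extract` is a hypothetical stand-in that returns a canned reply instead of calling a model, and the action names and JSON shape are invented for the example.

```python
import json

# The only actions the deterministic code is willing to carry out.
ALLOWED_ACTIONS = {"set_timer", "add_reminder"}


def fake_llm_extract(utterance):
    """Stand-in for an LLM call that is prompted to answer only in JSON."""
    # A real model's reply is not guaranteed to be valid JSON,
    # which is why parse_intent validates it before anything runs.
    return '{"action": "set_timer", "minutes": 10}'


def parse_intent(raw):
    """Validate the model's reply; reject anything outside the schema."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or data.get("action") not in ALLOWED_ACTIONS:
        return None
    return data


def set_timer(minutes):
    return f"Timer set for {minutes} minutes."


def add_reminder(text):
    return f"Reminder added: {text}"


def handle(utterance):
    """The LLM only extracts; ordinary code decides and acts."""
    intent = parse_intent(fake_llm_extract(utterance))
    if intent is None:
        return "Sorry, I did not understand that."
    if intent["action"] == "set_timer":
        minutes = intent.get("minutes")
        if isinstance(minutes, int) and minutes > 0:
            return set_timer(minutes)
    if intent["action"] == "add_reminder":
        text = intent.get("text")
        if isinstance(text, str) and text:
            return add_reminder(text)
    return "Sorry, I did not understand that."


print(handle("wake me in ten minutes"))
```

      The model's output never reaches the timer directly: anything that fails to parse or names an unknown action is dropped, so the unreliable part is confined to the language-to-JSON step.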

      My point though, and this isn’t targeting you specifically dear OC, is that we can circlejerk all we want here, but echoing this oversimplification of what LLMs can do is pretty irrelevant to the bigger discourse. Call these companies out on their practices! Their hypocrisy! Their indifference to the collapse of our biosphere, to human suffering, to leaving the most vulnerable high and dry!

      Tech is a tool, and if our best argument is calling a tool useless when it’s demonstrably useful in specific ways, we’re only making fools of ourselves, turning people away from us and discouraging others from listening to us.

      But if your goal is to feel good by letting one out, please be my guest.

      Peace

      • Susaga@sh.itjust.works
        English
        arrow-up
        22
        arrow-down
        3
        ·
        2 days ago

        The only way to know if LLM output is accurate is to know what an accurate output should look like, and if you know that, you don’t need an LLM. If you don’t know what an accurate output should look like, an LLM is equally likely to confidently lie to you as it is to help you, making you dumber the more you use it. The only other situation is if you know what an accurate output should look like, but you want an inaccurate one, which is a bad thing to encourage.

        “Demonstrably useful” is a lie. It’s a blatant and obvious lie. LLMs are so actively detrimental to their users, and society as a whole, that calling them useless is being generous. And even if they were the most beneficial thing on the planet, there is still no reason to use the billionaire’s toxic Nazi plagiarism machine.

        • Axolotl@feddit.it
          arrow-up
          3
          ·
          1 day ago

          Sometimes I use AI even if I know the answer because I am a lazy person, and holy shit, I can confirm that it lies a lot and tells wrong shit

        • hikaru755@lemmy.world
          arrow-up
          4
          arrow-down
          2
          ·
          1 day ago

          “The only way to know if LLM output is accurate is to know what an accurate output should look like, and if you know that, you don’t need an LLM”

          I empathize with your overall standpoint, but that’s just plain wrong. There are a lot of problems where verifying an answer is much easier for a human (or non-LLM computer program) than coming up with a correct answer.
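
          A non-language illustration of that asymmetry, as a minimal sketch. Integer factoring is my own choice of example here, not something from the discussion: checking a proposed pair of factors is a single multiplication, while finding the pair from scratch takes a search.

```python
def verify_factors(n, p, q):
    """Checking a proposed answer: one multiplication and two comparisons."""
    return p > 1 and q > 1 and p * q == n


def find_factors(n):
    """Producing an answer from scratch: trial division up to sqrt(n)."""
    candidate = 2
    while candidate * candidate <= n:
        if n % candidate == 0:
            return candidate, n // candidate
        candidate += 1
    return None  # no non-trivial factors found


n = 10007 * 10009

# Verifying a claimed answer is immediate, whoever (or whatever) produced it.
print(verify_factors(n, 10007, 10009))

# Finding the answer unaided takes thousands of loop iterations.
print(find_factors(n))
```

          The person checking the answer does not need to be able to produce it quickly, only to recognise whether it is right.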

          Anything that involves language manipulation, for example. I’ll have a much easier time checking a translation from English to German for accuracy than doing the full translation myself, assuming the model gets most of it correct and I don’t have to rewrite anything major (which is generally the case with current models). Or letting an LLM proof-read a text I wrote - I can’t be sure it got everything, but the things it does find are trivial for me to verify, and will often include things that slipped past me and three other people who proof-read the same text. Less useful, but still applicable to the premise: Producing a set of words that rhyme with a given one. Coming up with new ones after the first couple that pop into your head gets pretty hard, but checking if new candidates actually do rhyme is trivially easy.

          Moving on from language-stuff, finding security issues in software is a huge one - finding those is often extremely hard, but verifying them is mostly pretty straightforward if the report is well prepared. Models are just now getting good enough to reliably produce good security reports for actual issues.

          Answering questions about a big codebase, where the actual value doesn’t lie in the specific response the model gives, but in pointing me to the correct places in the code where I can check for myself.

          Producing code or entire programs is a bit more debatable and it depends heavily on the goal and the skill level of the operator whether complete verification is actually easier than doing it yourself.

          Just a couple of examples. As I said, I get where you’re coming from, but completely denying any kind of utility does not help your cause at all, it just makes you look like an absolutist who doesn’t know what they’re talking about.

          • Susaga@sh.itjust.works
            English
            arrow-up
            3
            arrow-down
            1
            ·
            1 day ago

            If you know enough to verify a translation as accurate, or you have the tools to figure out an accurate translation through dictionaries or some such, then you know enough to do the translation yourself. If you don’t, then I cannot trust your translation.

            And if you can’t trust the output to be comprehensive or correct, then why would you trust something like system security to an LLM? Any security analyst who deserves their job would never take that risk. You don’t cut those corners.

            Quick reminder: rhyming dictionaries exist. LLMs solved a solved problem, but worse.

            Once again, even if the billionaire’s toxic Nazi plagiarism machine was useful, it is so morally repugnant that it should never be used, which makes it functionally useless. This is an absolute statement, but trying to “um actually” that makes you look like either a boot-licker, a pollutant, a Nazi, a plagiarist, an idiot, or some combination of those.

            I would rather look like an absolutist. How about you?

            • hikaru755@lemmy.world
              arrow-up
              3
              arrow-down
              2
              ·
              1 day ago

              “If you know enough to verify a translation as accurate, or you have the tools to figure out an accurate translation through dictionaries or some such, then you know enough to do the translation yourself.”

              Correct. But it’s going to take me a lot more work and time, possibly to the point of not being feasible and probably even matching the energy cost of using the LLM over the entirety of the task.

              “why would you trust something like system security to an LLM?”

              I wouldn’t. I don’t know where you got that. Adding LLM-based analysis to your toolkit to spot important issues that otherwise might not have been found is just that: an addition. Not replacing anything. And it is demonstrably useful for that at this point, there’s just no denying that.

              “Once again, even if the billionaire’s toxic Nazi plagiarism machine was useful, it is so morally repugnant that it should never be used, which makes it functionally useless.”

              My point is that if you are this confidently wrong about the capabilities of LLM-based tools, then why should I believe you to be any less wrong about the moral and ethical issues you’re raising? It looks like you’re either completely misinformed or deliberately fighting a strawman for a part of your argument, so it gives anyone on the other side an easy excuse to just not engage with the rest of it and just dismiss it entirely. That’s what I’m trying to get across here.

              • richieadler@programming.dev
                arrow-up
                1
                ·
                10 hours ago

                Are you saying that you need to have perfect technical knowledge of AI to know if a person that promotes it is immoral? It looks like a non sequitur to me.

                • hikaru755@lemmy.world
                  arrow-up
                  1
                  ·
                  8 hours ago

                  No, that’s not what I’m saying. I’m saying that if someone wants their argument to be taken seriously, they should be willing to reevaluate parts of it that they’re very obviously wrong about, especially if, by their own admission, those parts don’t even matter in the face of the rest of the argument.

                  I’m just fed up with people feeling the need to have strong opinions on everything, even if they don’t actually know much about it. It’s fine if you don’t know anything about how capable current LLMs actually are. Especially as an opponent of LLMs for moral reasons, it makes total sense that you’d just be avoiding them and thus not really be that informed. It does not in any way weaken your argument. As long as you seem to have a good grip on what you know and what you don’t know, it’s all good. But being confidently wrong about things and refusing to reevaluate when getting pushback on that just signals that you neither know nor care about the limits of your own knowledge, and makes the entirety of your argument untrustworthy.

              • Susaga@sh.itjust.works
                English
                arrow-up
                2
                ·
                1 day ago

                Surely, the energy cost to verify the translation would be the same as translating it? If you’re struggling that much, why are you translating it at all? I cannot trust your translation.

                If you tell an LLM to generate reports, it will, regardless of the actual quality of the environment. It doesn’t know what’s secure and what isn’t. All you’ve shown it to do is convince the kinds of security analysts with a system so insecure as to have a LOT of good reports that their system is more secure than it is. Which is useless at best, detrimental at worst.

                It’s useless for translation. It’s useless for security analysis. It’s useless for rhyming (I notice you didn’t mention that one). You’re trying so hard to prove how useful it is, and your failure demonstrates how useless it is.

                You can’t condemn confident wrongness and defend LLMs. And you can’t defend the billionaire’s toxic Nazi plagiarism machine while questioning someone else’s morals. You can’t cherry-pick my argument and claim I’m the one fighting a strawman. …Well, not if you’re arguing in good faith.

                • hikaru755@lemmy.world
                  arrow-up
                  2
                  ·
                  23 hours ago

                  Look, I’m not trying to argue against your moral stance. I’m neither saying it’s wrong nor that it’s outweighed by any usefulness, real or not. What I’m trying to do is get you to see that your claims about uselessness are undermining your moral argument, which would be a hell of a lot stronger if you were not hell-bent on denying any kind of utility! Because in the eyes of people who do perceive LLMs as useful (which is exactly the kind of people who need to hear about the moral issues), that just makes you seem out of touch and not worth listening to.

                  “It’s useless for security analysis.”

                  Have you looked at any of the four links I provided? You might be working on old data here because it’s a very recent development, but a lot of high profile open source maintainers are saying that AI-generated security reports are now generally pretty good and not slop anymore. They’re fixing actual bugs because of it, and more than ever. How can you call that useless?

                  “Surely, the energy cost to verify the translation would be the same as translating it?”

                  Uh, no? Have you ever translated something? Verifying a translation happens mostly at attentive reading speed. Double that for probably reading it twice overall to focus on content and grammar separately, plus some overhead for correcting the occasional flaw and checking one or two things that I’m unsure about off the top of my head, so for the sake of argument let’s say three times slower than just reading normally. I don’t know about you, but three times slower than reading is still a lot faster than I would be able to produce a translation from scratch, weighing different word options against each other, working out how to get some flow into the reading experience, etc. If I’m translating into a language that I’m fluent but not native in, that takes even longer, because the ratio between my passive and active vocabulary is worse. I can read (and thus verify) English at a much more sophisticated level than I’m able to talk or write, because the words and native idioms just don’t come to me as naturally, or sometimes even at all without a lot of mental effort and a thesaurus. LLMs are just plain better at writing English than I have any hope of becoming in my lifetime, and I can still easily understand and verify the factual, orthographic and grammatical correctness of what they’re outputting. Those two things are not mutually exclusive.

                  “It’s useless for rhyming (I notice you didn’t mention that one)”

                  Yeah, because I’m focusing on the more relevant things. I disagree that it’s completely useless for rhyming, but it is a much weaker and more contrived point than the others, and going into that discussion would just derail things more for no added value. Also, funny that you call me out for that, when you just fully ignored two use cases I mentioned in my initial comment (LLM proofreading texts, and answering questions about unfamiliar code bases). Those have a lot of legitimate utility for someone who’s not aware of or doesn’t care about the moral issues. And once again, that’s my point here - those people will not listen if they perceive you as talking about a fictional world where LLMs are completely useless, which fails to match up with their experience.

      • mycodesucks@lemmy.world
        arrow-up
        10
        arrow-down
        2
        ·
        2 days ago

        We already have tools that can give us incorrect answers in natural human language.

        And they post their videos to YouTube for free.

    • SorryQuick@lemmy.ca
      arrow-up
      3
      arrow-down
      5
      ·
      1 day ago

      Is a human much different? We too require tons of training and we too are prone to stupid mistakes.

      • Scubus@sh.itjust.works
        arrow-up
        2
        ·
        1 day ago

        Fundamentally yes and no. The original commenter could’ve saved his breath: if people wanted to be educated on AI, they have plenty of resources to do so, but instead they choose to remain ill-informed. The difference is that humans are capable of critical thinking and conceptual connection. We are just as prone to mistakes as AI, we just have a much higher aptitude for mistakes lol. Hence the goal is not to make a perfect AI; it’s the much more achievable goal of making AIs that beat us in specific fields. Then to beat us in all fields.

        • SorryQuick@lemmy.ca
          arrow-up
          0
          arrow-down
          2
          ·
          1 day ago

          It’s missing features obviously (think neuroplasticity) but is that how AI differs from human intelligence, or simply a lack in the current generation?

          • Scubus@sh.itjust.works
            arrow-up
            2
            ·
            24 hours ago

            It seems to be a flaw in both the hardware and software side of things. Hardware-wise, we have yet to make chips that achieve the processing density of human brain matter. Also, heat generation becomes an issue as you try to scale smaller systems up. Software-wise, we know our current neural networks don’t scale up well, so we seem to be waiting on some more foundational research for more efficient algorithms. My suspicion is that we’re not really going to get true General Superintelligence until we start manufacturing chips that incorporate living neurons; it just really seems cheaper to use already existing computing systems than to design your own architecture.