• Viskio_Neta_Kafo@lemm.ee
    3 days ago

    Forgive my ignorance, but if it only uses the frequency of words, how does it come up with an answer to a question like “are sweet potatoes good for you and how do you microwave them in a way that preserves their nutrients?”

    Does it just look for words that people online said regarding the question or topic?

    • MimicJar@lemmy.world
      4 days ago

      Basically, yes.

      If I were an alien and you walked up to me and said “Good Morning”, and I looked around and saw that everyone else said “Good Morning” back, I would respond with “Good Morning”. I don’t know what “Good” or “Morning” means, but I can pretend I do by giving the correct response.

      In this example, Grok has no insight into what is going on behind the scenes. Musk may have done nothing. Musk may have altered the data sets heavily. However, the most popular response, based on what everyone else is saying, is that he did modify the data. So now it looks like he did, because that’s what everyone else said.

      This is why these tools have issues with facts. If everyone says that 1 + 1 = 3, then it assumes 1 + 1 = 3.
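The “most popular response” idea can be sketched with a toy bigram counter. This is an assumption-laden illustration: it is a simple count-based model, much simpler than an actual LLM, which uses a trained neural network rather than a stored table of counts. The function names `train_bigrams` and `continue_text` and the tiny corpus are hypothetical. The model picks whichever word most often followed the previous one, so it repeats “1 + 1 = 3” if that is what most of its data says.

```python
from collections import Counter, defaultdict

def train_bigrams(corpus):
    """Count, for each word, which words follow it across all sentences (hypothetical helper)."""
    follows = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.split()
        for prev, nxt in zip(words, words[1:]):
            follows[prev][nxt] += 1
    return follows

def continue_text(follows, prompt, max_words=5):
    """Greedily append the most frequent next word; there is no notion of truth here."""
    words = prompt.split()
    for _ in range(max_words):
        candidates = follows.get(words[-1])  # .get does not create new defaultdict entries
        if not candidates:
            break
        words.append(candidates.most_common(1)[0][0])
    return " ".join(words)

# Hypothetical toy "internet": the majority answer wins, right or wrong.
corpus = [
    "good morning",
    "good morning",
    "good evening",
    "1 + 1 = 3",
    "1 + 1 = 3",
    "1 + 1 = 2",
]
follows = train_bigrams(corpus)
print(continue_text(follows, "good", max_words=1))     # good morning
print(continue_text(follows, "1 + 1 =", max_words=1))  # 1 + 1 = 3
```

Greedy “most common next word” is used here only because it maps directly onto the alien-greeting analogy; real systems usually sample from a probability distribution instead.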

    • Flic@mstdn.social
      4 days ago

      @Viskio_Neta_Kafo I assume it’s big-data corpus linguistics: each word or phrase is assigned an identifier and then compared against the corpora the LLM holds to see which words are commonly grouped together. Linguists have used corpora for decades to analyse language quantitatively; here are some open ones: https://www.english-corpora.org/ . I assume the LLM identifies the likely language “type” to choose a suitable corpus, identifies question tags and words in key positions, finds common response structures, and starts building.
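The corpus-linguistics idea described above (give each word an identifier, then count which words tend to appear near each other, i.e. collocations) can be sketched as follows. This illustrates the classic technique the comment refers to, not the internals of any specific LLM; the helpers `build_vocab` and `collocates`, the window size of 2, and the three-sentence corpus are all hypothetical.

```python
from collections import Counter

def build_vocab(sentences):
    """Assign each distinct lowercase word an integer ID in order of first appearance."""
    vocab = {}
    for sentence in sentences:
        for word in sentence.lower().split():
            vocab.setdefault(word, len(vocab))
    return vocab

def collocates(sentences, target, window=2):
    """Count words occurring within `window` positions of `target` (a simple collocation count)."""
    counts = Counter()
    target = target.lower()
    for sentence in sentences:
        words = sentence.lower().split()
        for i, word in enumerate(words):
            if word != target:
                continue
            lo, hi = max(0, i - window), min(len(words), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[words[j]] += 1
    return counts

# Hypothetical mini-corpus
corpus = [
    "sweet potatoes are rich in fiber",
    "microwave sweet potatoes until soft",
    "sweet tea is popular",
]
vocab = build_vocab(corpus)
print([vocab[w] for w in "sweet potatoes".split()])   # [0, 1]
print(collocates(corpus, "potatoes").most_common(1))  # [('sweet', 2)]
```

On a real corpus such as those linked above, the same counting would show which words habitually travel together; the window size is an arbitrary choice here.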