• Flic@mstdn.social
    4 days ago

    @Viskio_Neta_Kafo I assume it’s big-data corpus linguistics: each word or phrase is assigned an identifier and then compared against the corpora the LLM holds to see which words are commonly grouped together. Linguists have used corpora for decades to analyse language quantitatively; here are some open ones: https://www.english-corpora.org/. I assume the LLM identifies the likely language “type” to choose a good corpus, identifies question tags and words in key positions, finds common response structures, and starts building from there.
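
To make the "assign identifiers, then see which words are commonly grouped" idea concrete, here is a toy sketch in Python. It only illustrates the corpus-counting guess described above and is not a claim about how any real LLM works internally. The three-sentence corpus and the helper names (`build_vocab`, `count_bigrams`, `most_likely_next`) are made up for illustration.

```python
from collections import Counter

# Toy corpus, invented for this example (not taken from any real corpus).
corpus = [
    "how are you today",
    "how are you doing",
    "how is the weather today",
]


def build_vocab(sentences):
    """Assign each distinct word an integer identifier, in order of first appearance."""
    vocab = {}
    for sentence in sentences:
        for word in sentence.split():
            vocab.setdefault(word, len(vocab))
    return vocab


def count_bigrams(sentences, vocab):
    """Count how often one word id is immediately followed by another."""
    counts = Counter()
    for sentence in sentences:
        ids = [vocab[w] for w in sentence.split()]
        counts.update(zip(ids, ids[1:]))
    return counts


def most_likely_next(word, vocab, counts):
    """Return the word that most often follows `word` in the corpus, or None."""
    if word not in vocab:
        return None
    wid = vocab[word]
    # Keep only the pairs that start with the given word.
    followers = {b: n for (a, b), n in counts.items() if a == wid}
    if not followers:
        return None
    best = max(followers, key=followers.get)
    id_to_word = {i: w for w, i in vocab.items()}
    return id_to_word[best]


vocab = build_vocab(corpus)
counts = count_bigrams(corpus, vocab)
```

Calling `most_likely_next("how", vocab, counts)` looks up the word that follows "how" most often in the toy corpus. Real corpus tools work on millions of words and use richer statistics than raw pair counts, but the basic move of counting which words appear together is the same.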