Hexadecimal

Track_Shovel@slrpnk.net · 24 hours ago

Hexadecimal

MasterNerd@lemm.ee · 4 hours ago

Just run the LLM locally with open-webui and you can tweak the system prompt to ignore all the censorship

Zink@programming.dev · 5 hours ago

Yeah, it’s pretty blatant. A bit after it hit the scene I got curious and started asking it about how many people various governments have killed. The answer for my own US of A was as long as it was horrifying.

Then I get to China and it starts laying out a detailed description for a few seconds, then the answer disappears and is replaced by the “out of scope” or “can’t do that right now” or whatever it was at the time.

It makes me think their model might be fine, but then they have some kind of watchdog layered on top of it to detect the verboten subjects and interfere. I guess that feels better from a technical standpoint, even if it is equally bad from a personal/political one.

joenforcer@midwest.social · edit-2 5 hours ago

DeepSeek isn’t the only AI to censor itself after it generates text.

I once asked Copilot for the origin of the “those just my little ladybugs” meme, and once it generated the text “perineum and anus” it wiped the answer it had written thus far and said that it couldn’t look for that right now. I checked again today and it had since sanitized the answer so it generates in full.

Zink@programming.dev · 5 hours ago

Yeah, unfortunately for anything run by a US-based corporation, I think it’s not a question of whether there will be censorship but how bad it will get and how closely the tech industry will continue to go along with the fascist flow.

gon [he]@lemm.ee · 9 hours ago

HAHAHA! When I tried it, it started answering it, but quit and showed me the OOS message instead…

JOMusic@lemmy.ml · 8 hours ago

Try an uncensored version, because everyone knows Communists hate Hexadecimal /s

stebo@lemmy.dbzer0.com · 13 hours ago

congrats, you are now on a list

ERROR: Earth.exe has crashed@lemmy.dbzer0.com · 11 hours ago

What is China gonna do? It’s not like the US would collude with foreign governments, right? Right?

checks news on the Ukraine situation

oh… shit…

turnip@sh.itjust.works · 18 hours ago

If your system relies on censoring opposition to it then it’s probably not very good.

DragonTypeWyvern@midwest.social · 17 hours ago

You just described every state, welcome to the right side of history, comrade.

ERROR: Earth.exe has crashed@lemmy.dbzer0.com · edit-2 16 hours ago

Tbf, monarchies lasted for centuries… 🤷‍♂️

Not “good” as in the people live good lives

But “good” as in good enough to oppress people

yunxiaoli@sh.itjust.works · edit-2 16 hours ago

Texas is a country. Now imagine $40 billion a year of various media and disinfo agents repeating that ad nauseam in every place they can, literally all the time, for nearly 50 years now, all so China can’t take revenge against Japan.

You’d get annoyed and probably ban it since that’s the easiest way to get your enemy to waste money forever.

Taipei is an autonomous region, like Xinjiang or Tibet. As long as they don’t grossly violate federal law they get to stay autonomous.

ayyy@sh.itjust.works · 4 hours ago

What do you gain from oppressing others?

yunxiaoli@sh.itjust.works · 3 hours ago

Is Texas oppressed?

ayyy@sh.itjust.works · edit-2 3 hours ago

(Yes, but since you clearly have the brain capacity of a toddler I guess I will be more direct.) What do you gain from oppressing Taiwan?

yunxiaoli@sh.itjust.works · 3 hours ago

How is Texas oppressed by being a state?

As far as Taipei, it’s not oppressed, the opposite. It’s allowed to control itself under the guidance of the government, as it always has. That’s the definition of an autonomous region.

ayyy@sh.itjust.works · 3 hours ago

Texas: They aren’t even allowed to get basic healthcare there, or have a gender.

Taiwan: sure, that’s why literally nobody complains about CCP presence….oh wait. Are you usually in the habit of denying reality and ignoring your own eyes?

yunxiaoli@sh.itjust.works · 3 hours ago

For Texas, that’s their choice. They actively choose that, and have the freedom to do so. The US isn’t making them. They aren’t oppressed.

As far as Taipei, in any group of people you’ll always have some people complaining about something. There are fewer people pushing for an independent Taiwan than there are pushing for an independent Texas.

musubibreakfast@lemm.ee · 11 hours ago

This is the biggest crock of shit ever. Go to Taiwan, experience it for yourself. Go to their museums and talk to their people. You will find a democratic nation with its own values and beliefs. Then take your ignorant ass over to Texas and repeat the same drivel you said here and see what happens.

Ascend910@lemmy.ml · 10 hours ago

As someone who moved away from Taipei, no they are not

musubibreakfast@lemm.ee · 9 hours ago

What makes you say that?

ERROR: Earth.exe has crashed@lemmy.dbzer0.com · 3 hours ago

Check their Modlogs lol:

🤔

musubibreakfast@lemm.ee · 3 hours ago

What makes you say that?

Rachelhazideas@lemmy.world · 12 hours ago

Ohh yeah lick that Chinese boot, lick it harder. Mmmhhh.

yunxiaoli@sh.itjust.works · 3 hours ago

Better China than the West. At least slavery is banned in China.

ragebutt@lemmy.dbzer0.com · edit-2 22 hours ago

Yet unlike American-led LLM companies, Chinese researchers open sourced their model, leading to government investment

So the government invests in a model that you can use, including theoretically removing these guardrails. And these models can be used by anyone and the technology within can be built off of, though they do have to be licensed for commercial use

Whereas America pumps 500 billion into the AI industry for closed proprietary models that will serve only the capitalists creating them. If we are investing taxpayer money into concerns like this we should take a note from China and demand the same standards that they are seeing from DeepSeek. DeepSeek is still profit motivated; it is not inherently bad for such a thing. But if you expect a great deal of taxpayer money then your work needs to be open and shared with the people, as DeepSeek’s was.

Americans are getting tragically fleeced on this so a handful of people can get loaded. This happens all the time but this time there’s a literal example of what should be occurring happening right alongside. And yet what people end up concerning themselves with is Sinophobia rather than the fact that their government is robbing them blind

Additionally, American models still deliver pro-capitalist propaganda, just less transparently: ask them about this issue and they will talk about the complexity of “trade secrets” and “proprietary knowledge” needed to justify investment, discouraging the idea of open source models, even though DeepSeek’s existence proves it can be done collaboratively with financial success.

The difference is that DeepSeek’s censorship is clear: “I will not speak about this” can be frustrating but at least it is obvious where the lines are. The American approach is far more subversive (though to be fair it is also potentially a byproduct of content consumed and not necessarily direction from openai/google/whoever)

Kusimulkku@lemm.ee · 10 hours ago

Yet unlike American

Who saw this coming lmao

Klara@lemmy.blahaj.zone · 7 hours ago

But DeepSeek isn’t Open Source by any definition of that word that I’m familiar with. Sure, they release more components than ProprietaryAI (which is a low bar), but what you’re left with is still a blob with a lot of the source code not released and no data set published as far as I can tell. Also, if I wanted to train my own model with the tools released, I’d still need millions of GPU hours. As I said, they are more transparent than others, but let’s not warp the definitions of words just to give a “win” to another company that is just making another hallucination machine.

Zetta@mander.xyz · 20 hours ago

Closed AI sucks, but there are definitely open models from American companies like Meta. You make great points, though. Can’t wait for more open models and hopefully, eventually, actually open source models that include training data, which neither DeepSeek nor Meta does currently.

malloc@lemmy.world · 23 hours ago

DeepSeek about to get sent in for “maintenance” and docked 10K in social credit.

PattyMcB@lemmy.world · 17 hours ago

I was told there would be no math

macniel@feddit.org · 23 hours ago

i mean, just ask DeepSeek on a clean slate to tell about Beijin.

Kusimulkku@lemm.ee · 10 hours ago

That’s silly as hell

macniel@feddit.org · 10 hours ago

Ikr. China is just insecure like that.

Kairos@lemmy.today · 23 hours ago

What’s that?

macniel@feddit.org · 23 hours ago

its the capital city of China :D

you know, where something happened on a specific square in the specific year of 1984.

𒉀TheGuyTM3𒉁@lemmy.ml · 11 hours ago

That would be 4 june 1989, not 9 june 1984 sir ;)

macniel@feddit.org · 11 hours ago

My life is a lie :o

Thanks for the correction.

Kairos@lemmy.today · 23 hours ago

You missed the g.

macniel@feddit.org · edit-2 22 hours ago

oh… sorry, you are right.

but you will get the same result.

DragonTypeWyvern@midwest.social · 18 hours ago

You think DeepSeek won’t talk about one of the largest cities in the world?

Kusimulkku@lemm.ee · 10 hours ago

You know, you could’ve just tested it yourself lol

DragonTypeWyvern@midwest.social · 10 hours ago

Why would I do that when the Internet will correct me?

Seems like a really weird line to draw. I guess they got bored of people trying to trick it into talking about Tiananmen?

TempermentalAnomaly@lemmy.world · 17 hours ago

Well shit. I thought it was BS too. But damn if it didn’t abort after a little deep thinking on the Olympics.

macniel@feddit.org · 13 hours ago

Oh it does… but then it will remove everything and state that it’s out of scope.

GissaMittJobb@lemmy.ml · 22 hours ago

Is this real? On account of how LLMs tokenize their input, this can actually be a pretty tricky task for them to accomplish. This is also the reason why it’s hard for them to count the number of ‘R’s in the word ‘Strawberry’.

jj4211@lemmy.world · 5 hours ago

The LLM doesn’t have to innately implement filtering. You can use a more traditional and concrete filtering strategy on top. So you sneak something problematic by in the prompt and it’s too clever to be caught by the input filter, but then on the output the filter can catch that the prompt tricked the LLM into generating something undesired. Another comment specified they tried this and it started to work but then suddenly it seemingly shut out the reply in the middle, presumably the minute the LLM spit something at a more traditional filter and that shut it down.

I think I’ve seen this sort of approach applied largely to mask embarrassing answers that become memes, or to detect input known not to work, and to shut it down or redirect it to a better facility (e.g. redirecting math to Wolfram Alpha).
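A minimal sketch of the kind of output-side filter described above, assuming a plain substring blocklist and a client that can both append text and replace what it has already shown. The names `moderated_stream`, `render`, `BLOCKLIST`, and `REFUSAL` are made up for illustration; nothing here is known to match what DeepSeek or Copilot actually run.

```python
from typing import Iterable, Iterator

# Hypothetical values for illustration; real deployments keep theirs private.
BLOCKLIST = {"forbidden topic"}
REFUSAL = "Sorry, that's beyond my current scope."


def moderated_stream(
    chunks: Iterable[str], blocklist: set[str] = BLOCKLIST
) -> Iterator[tuple[str, str]]:
    """Relay model output chunk by chunk, watching the accumulated text.

    Yields ("append", chunk) while the text looks fine. As soon as a
    blocklisted phrase shows up in the accumulated text, yields
    ("replace", REFUSAL) and stops, which is what makes the partial
    answer vanish on the client side.
    """
    shown = ""
    for chunk in chunks:
        shown += chunk
        if any(term in shown.lower() for term in blocklist):
            yield ("replace", REFUSAL)
            return
        yield ("append", chunk)


def render(events: Iterable[tuple[str, str]]) -> str:
    """Apply the events the way a chat UI would and return the final text."""
    text = ""
    for kind, payload in events:
        text = payload if kind == "replace" else text + payload
    return text


if __name__ == "__main__":
    print(render(moderated_stream(["Hello ", "world"])))
    print(render(moderated_stream(["The ", "forbidden ", "topic is"])))
```

Because the check runs on the accumulated text rather than on each chunk, a phrase split across two chunks is still caught, but only after the first part has already been displayed. That would explain the "starts answering, then disappears" behaviour people describe in this thread.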

kautau@lemmy.world · 16 hours ago

It’s probably DeepSeek R1, which is a “reasoning” model so basically it has sub-models doing things like running computation while the “supervisor” part of the model “talks to them” and relays back the approach. Trying to imitate the way humans think. That being said, models are getting “agentic” meaning they have the ability to run software tools against what you send them, and while it’s obviously being super hyped up by all the tech bro accelerationists, it is likely where LLMs and the like are headed, for better or for worse.

GissaMittJobb@lemmy.ml · 14 hours ago

Still, this does not quite address the issue of tokenization making it difficult for most models to accurately distinguish between the hexadecimals here.

Having the model write code to solve an issue and then ask it to execute it is an established technique to circumvent this issue, but all of the model interfaces I know of with this capability are very explicit about when they are making use of this tool.

morrowind@lemmy.ml · 14 hours ago

Not really a concern. It’s basically translation, which language models excel at. It just needs a mapping of the hex to byte
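For reference, the hex-to-byte mapping itself is mechanical when done in ordinary code. This is a small sketch using Python's built-in `bytes.fromhex` and `bytes.hex`; the helper names `hex_to_text` and `text_to_hex` are just illustrative, and it assumes the bytes are UTF-8 text.

```python
def hex_to_text(hex_string: str, encoding: str = "utf-8") -> str:
    """Decode space-separated hex pairs (e.g. "48 65 78") into text.

    bytes.fromhex skips ASCII whitespace between byte pairs.
    """
    return bytes.fromhex(hex_string).decode(encoding)


def text_to_hex(text: str, encoding: str = "utf-8") -> str:
    """Encode text as upper-case, space-separated hex pairs."""
    return text.encode(encoding).hex(" ").upper()


if __name__ == "__main__":
    print(hex_to_text("48 65 78"))  # Hex
    print(text_to_hex("Hex"))       # 48 65 78
```

Whether a language model can reproduce this mapping reliably from its tokens alone is a separate question.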

GissaMittJobb@lemmy.ml · 12 hours ago

It is a concern.

Check out https://tiktokenizer.vercel.app/?model=deepseek-ai%2FDeepSeek-R1 and try entering some freeform hexadecimal data - you’ll notice that it does not cleanly segment the hexadecimal numbers into individual tokens.
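To illustrate the segmentation problem without depending on any real tokenizer, here is a toy sketch. The vocabulary `TOY_VOCAB` is invented, and greedy longest-match is a simplification (real tokenizers such as DeepSeek's use learned BPE merges), so treat the exact splits as hypothetical. The point is only that token boundaries need not line up with byte pairs.

```python
# Invented vocabulary for illustration only; not taken from any real model.
TOY_VOCAB = {"44", " 6", "F ", "77 ", "6E"}


def greedy_tokenize(text: str, vocab: set[str] = TOY_VOCAB) -> list[str]:
    """Split text by always taking the longest vocabulary entry that matches.

    Falls back to a single character when nothing in the vocabulary matches,
    so every input can be tokenized.
    """
    max_len = max(len(token) for token in vocab)
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


if __name__ == "__main__":
    print(greedy_tokenize("44 6F 77 6E"))
    # ['44', ' 6', 'F ', '77 ', '6E']
```

In this toy run the byte `6F` ends up split across two tokens, each glued to a neighbouring space, while `44` and `6E` happen to survive intact. That uneven treatment is the kind of thing the tiktokenizer link lets you check against a real vocabulary.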

morrowind@lemmy.ml · 12 hours ago

I’m well aware, but you don’t need to necessarily see each character to translate to bytes

GissaMittJobb@lemmy.ml · 12 hours ago

It’s not out of the question that we get emergent behaviour where the model can connect non-optimally mapped tokens and still translate them correctly, yeah.

ERROR: Earth.exe has crashed@lemmy.dbzer0.com · 16 hours ago

11 09 12 12 24 09 10 09 14 07 16 09 14 07

joenforcer@midwest.social · 5 hours ago

You misspelled the name.

ERROR: Earth.exe has crashed@lemmy.dbzer0.com · 4 hours ago

Sorry, his name is 熄禁评 (Extinguish and Ban Commentary; aka: Censorship)

🤣

Or maybe 吸禁品 (Smoking Banned Products; aka: Using Narcotics)

🚬😤

(Both are pronounced Xi Jin Ping but with different tones)

quoll@lemmy.sdf.org · edit-2 9 hours ago

lol… it’s still thinking about it :D

socsa@piefed.social · 22 hours ago

44 6F 77 6E 20 77 69 74 68 20 74 68 65 20 74 79 72 61 6E 74 20 78 69 20 6A 69 6E 70 69 6E 67