No.2334
There's been a lot of chatter lately about Deepseek. In the online circles I'm in, people have a politics-colored understanding, more or less saying "American tech companies couldn't do this, but an open-source Chinese company could and American tech companies are in 'damage control'". Which... I really don't understand. If it's an open-source model, like Llama was, for example, I don't see how this doesn't just cause a proliferation of much more efficient and performant models -- the same way that, after Llama became available, there was suddenly Phi from Microsoft, Gemma from Google, Mistral, and others.
What does /maho/ think?
No.2335
I think the average person has no idea what the fuck open source means.
No.2336
Also... I find these benchmarks dubious to be honest. Practically every benchmark says that they're the best. If you look at a Phi benchmark, it'll say they're the best. You look at a Gemma benchmark, it's the best. A Qwen benchmark, they're the best. And so on...
Does anyone actually have any first-hand experience to say whether Deepseek is actually any good? To relate back to how all these benchmarks are essentially cherry-picked BS, Phi says it's great. Well, it's alright, but it's heavily censored and shit to actually use, even if it is fast. Gemma is much the same. Qwen is fast, and it's not completely censored like the aforementioned, which makes it much more ideal to use, even if its complexity isn't as high. Llama, however, stands out in my experience as generally providing the best responses, at the cost of being just a bit slower than the other three.
I'm very curious whether Deepseek is "great in benchmarks!" only, with a neutered ability to perform actual conversational tasks but decent at more specific knowledge-related tasks, or whether it's more conversationally optimized the way Claude and OpenAI's models are.
No.2337
From what I've gleaned, the whole issue surrounding Deepseek is that you can make an LLM on par with what OpenAI has been putting out without massive capex expenditures for hardware.
No.2339
you can test the results yourself and I figure a bunch of people have anyways.
It's that China has been sanctioned from getting high-tech equipment, but Chinese scientists are like 'lmao capitalism is so inefficient'.
No.2340
Is this only for chatbot AI? Are they planning to do this for image/video gen or speech AI? Chatbots can't make anime so no matter how smart they get they're still boring and useless.
No.2342
>>2340
They have an image thing out now
>>2333
No.2343
>>2337
>without massive capex expenditures for hardware
I wouldn't say that. If you look at their benchmark, they suggest that OpenAI o1 is 1217B parameters, whereas DeepSeek R1 is 607B parameters according to their GitHub. Half the number of parameters is certainly significant, but I would hardly say that's any less of a massive capital expenditure. 1B parameters is roughly 1GB, so DeepSeek R1 would still require approximately 607GB to run. 8x H100s is certainly more affordable than 8x H200s, but that's not really saying much...
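For what it's worth, here's the back-of-the-envelope version of that memory math as a quick Python sketch. It just takes the parameter counts quoted above at face value and varies the bytes per parameter; the 8-bit row is what matches the "1B parameters is roughly 1GB" rule of thumb, and it ignores KV cache and runtime overhead entirely:
```python
# Rough weight-memory estimate; ignores KV cache, activations, and runtime overhead.
def weight_memory_gb(params_billions: float, bytes_per_param: float) -> float:
    # params_billions * 1e9 params * bytes_per_param bytes, divided by 1e9 bytes per GB
    return params_billions * bytes_per_param

for name, params_b in [("DeepSeek R1 (~607B)", 607), ("OpenAI o1 (alleged 1217B)", 1217)]:
    for precision, bpp in [("FP16", 2.0), ("INT8", 1.0), ("INT4", 0.5)]:
        print(f"{name:26s} @ {precision}: ~{weight_memory_gb(params_b, bpp):,.0f} GB just for weights")
```
Even the 4-bit case is still a multi-GPU rig, which is the point.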
This is all very confusing because it seems like a lot of people who know absolutely nothing about LLMs have been making statements that they don't realize have actual meaning. I was hoping someone on /maho/ would know more, but that doesn't really seem to be the case so far...
For instance, people have also been talking about efficiency a lot, presumably in reference to tokens/s relative to billions of parameters?... But I've yet to see anything suggesting that DeepSeek R1 is any faster in terms of tokens/s per billion parameters. Now, maybe that's complicated by DeepSeek R1 allegedly performing at the level of OpenAI o1, because that in and of itself would be a marked improvement in efficiency. That was why I posed the initial question in the OP of whether it's closer to a Mixture of Experts-type model (which tend to score highly on benchmarks because they have tuned datasets to perform well at knowledge tasks, but suffer in conversation), or whether it's a broader, more generalized and conversational model (such as OpenAI GPT4 or o1, or Anthropic Claude), which -- despite their size -- are far more capable and excel in both conversation and knowledge tasks, at the expense of requiring much more memory.
It wouldn't really be that impressive if DeepSeek R1 is good for knowledge tasks, but useless for conversation; and by "useless" I mean the ability to mold the style and type of response. To give an example, MoE models tend to be designed to respond in the following way: "What is the circumference of the Earth?" ... "[Circumference of Earth]", whereas generalized models can do much more complex things like "Format a socratic dialogue on the nature of kinematics in pirate speak" ... "[Characters discussing kinematics in pirate speak]".
No.2352
I don't work in tech and I already am well aware how degraded and puffed-up America is, so none of these developments surprised me.
>>2339
All this petty behavior has made America's "brightest" just look pathetic, and despite subtle nudges from China, it really is mainly all self-inflicted.
I wish I could get in on this but even before that I need all new PC hardware to use it effectively. Those budget offline models don't seem impressive to me.
No.2359
>>2336
It's on par with o1 on several non-cheatable third-party benchmarks, and the RP community generally thinks it's comparable to Claude (the RP SOTA).
There are pros and cons but R1 sticks to the prompt much better than everything else which makes it less censored.
No.2360
China dropping DeepSeek R1, an open-source AI model rivaling ChatGPT o1 (or whatever their best one is called) for 2% of the monthly cost, right as the US established its plan to invest $500 billion in AI, is as funny as things can get. They even, apparently, did it with NVIDIA restricted from selling its best GPUs in China. Is DeepSeek lying about their expenditure of a measly $6 million? Or are the American companies lying about their high expenses just so they can pocket the rest...
No.2361
>>2358
There are already several (gimped) models that you can run locally on regular consumer hardware, but they obviously perform way worse than the big boy stuff.
That being said, the required hardware demand chart looks like a stairway constructed by a drunk person. Every once in a while, the required specs drop like a rock all at once. There's no telling when today's server-grade AI could run on a run-of-the-mill GPU, but it's probably going to be surprisingly soon. Hell, everything about AI has been surprisingly soon.
No.2364
>>2334
>people have a politics-colored understanding, more or less saying "American tech companies couldn't do this, but an open-source Chinese company could and American tech companies are in 'damage control'". Which... I really don't understand
People are literally paid to do this.
No.2365
>>2364
We are also at the point where AI companies use AI to defend themselves in online discussions. People then read this and parrot their talking points. I hate this decade so much now.
No.2370
What do people use these for? Seriously?
No.2371
>>2368
The joke is that people are making criticisms no one cares about. You can locally host it without any of these issues anyway. I think.
The blocking is in China.
No.2372
A much better joke would be "ask ChatGPT what percentage of american billionaires are jewish"
No.2373
AI is bad and Kissu only likes it because of contrarianism, and people only like this one because they think the enemy of their enemy is their friend, even if that friend is a homophobic pedophile
No.2374
ai is good at translations
No.2376
>>2373
but AI is good at making anti-contrarianism art and homoerotic anti-pedo art
No.2377
>>2366
I also did a very cursory bit of reading. As I was expecting, Deepseek is indeed a Mixture of Experts model. I now understand what all the hubbub was about. As an MoE LLM, Deepseek only requires a subset of the "experts" (essentially smaller sub-networks specialized for particular kinds of content) at any given time when generating a response. This is in contrast to the more typical dense approach, where all parameters of the model are required at once to generate a response.
This has significant advantages because it means not only significant savings in compute (and memory traffic) per response, but also that extraneous information doesn't need to be considered when generating a response. For example, if the prompt is in English, you can activate the "English Expert" only, and not have to activate the parameters necessary to respond in Chinese, or Hindi, or German, etc. This same division of experts can be done across any number of topics: history, mathematics, philosophy, literature, media information, slang, etc.
From a purely hardware-constrained perspective, we would obviously expect the MoE model to perform better compared to a traditional LLM. From a more design philosophy-oriented perspective, the disadvantage is that because not all of the parameters are being activated at once, you may lose some of the cross-pollination and latent association that a traditional dense LLM handles more readily.
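If it helps make the "only a subset of experts at a time" part concrete, here's a toy sketch of a top-k routed MoE layer in PyTorch. To be clear, this is not DeepSeek's actual architecture (their expert count, shared experts, and routing details differ, and in practice the router learns its own messy specializations rather than clean human topics like "English" or "history"); it's just the general mechanism:
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoELayer(nn.Module):
    """Toy mixture-of-experts feed-forward layer with top-k routing.
    Illustrative only: real implementations batch tokens per expert,
    add load-balancing losses, shared experts, and so on."""

    def __init__(self, d_model=512, d_hidden=2048, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)  # the gating network
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                                         # x: (n_tokens, d_model)
        scores = F.softmax(self.router(x), dim=-1)                # routing probabilities
        weights, chosen = torch.topk(scores, self.top_k, dim=-1)  # keep only the top-k experts per token
        weights = weights / weights.sum(dim=-1, keepdim=True)     # renormalize over the chosen experts
        out = torch.zeros_like(x)
        for t in range(x.size(0)):                                # per token, run only its chosen experts
            for w, e in zip(weights[t], chosen[t]):
                out[t] += w * self.experts[int(e)](x[t])
        return out

# Example: 16 tokens flow through, each touching only 2 of the 8 experts.
layer = ToyMoELayer()
print(layer(torch.randn(16, 512)).shape)   # torch.Size([16, 512])
```
Only top_k of the n_experts run for each token, so per-token compute scales with the active experts rather than with the total parameter count. The full set of expert weights still has to sit in memory, though, which is why the ~607GB figure from earlier in the thread doesn't shrink just because it's MoE.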
From just a little bit of testing, Deepseek R1 does seem impressive, but I feel like it's probably more comparable to GPT4o mini, rather than GPT4o, or Claude Sonnet. It's hard to explain, but at times it feels a bit "local model"-y, probably because of the limited number of parameters being activated at any given time. The ability to respond well is certainly there, but the depth and style of response feels slightly lacking.
No.2378
>>2374
It's not, but it's better than I am, and sometimes that's enough.
No.2379
>>2378
how many japanese erogames do you play per month
No.2380
>>2379
Approximately zero because dekinai, but that may change if human translators don't step up their game.
No.2381
>>2380
I play somewhere in the range of 10-30. Human translation produces higher-quality results that are really only desirable to the translator themselves. For the general audience the vibe is all that's required and the artwork fills out the rest. Deepseek's translations in a recent game I played were in no way noticeable as an AI translation.
No.2382
And this was in V3. People have said on Twitter that Deepseek's ability to work in Sri Lankan languages is very effective. This project is going to be a very cheap way for China to effectively communicate with countries the US has abandoned.
No.2383
>>2381
>For the general audience the vibe is all that's required and the artwork fills out the rest.
Absolutely horrible attitude towards translation but what should I expect from AItards
No.2384
>>2377
>This same division of experts can be done across any number of topics: history, mathematics, philosophy, literature, media information, slang, etc.
As a bystander who has only kept up with new developments from the sidelines from a high level perspective, this sounds like it could be a major breakthrough.
My biggest frustration with the direction of recent AI development has been the split between human-readable models based on formal logic with limited domain "knowledge" (depending on your definition of knowledge) and the almost unauditable probabilistic machine learning models with broad focus but low reproducibility that have been dominating the hype cycle for the past few years. It's almost like we're now approaching something that resembles the human brain's ability to coordinate between specialized subsystems.
>It's hard to explain, but at times it feels a bit "local model"-y, probably because of the limited number of parameters being activated at any given time. The ability to respond well is certainly there, but the depth and style of response feels slightly lacking.
That's to be expected from a naive combination of domain-specific models. I could see that improving once someone develops a higher-level pattern recognition model to determine which domain-specific models need to be invoked when, which tokens serve as the bridges between domains, and how strong the relative weight of each domain model should be based on the strengths of their internal correlations.
I'm not sure if this makes any sense but it's the best I can do at 2 AM while boozy.
No.2392
>>2384
>from a high level perspective, this sounds like it could be a major breakthrough.
It certainly could be. For one thing, unlike a traditional LLM, because only certain experts are invoked at a time, you don't need to train the aggregate size of the LLM. With a model like OpenAI's, which has some 1200B parameters, you need to have the memory to fit the entire model to train it. With an MoE, you only need to train the experts, and then the gating model to coordinate the experts.
>I could see that improving once someone develops a higher-level pattern recognition model to determine which domain-specific models need to be invoked when, which tokens serve as the bridges between domains, and how strong the relative weight of each domain model should be based on the strengths of their internal correlations.
You've actually got it exactly correct! This is how an MoE LLM works. A gating model determines what experts to invoke, and then applies weights to the experts based on how strongly related they are to the prompt.
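To put rough numbers on why the per-token cost ends up so much smaller than the headline parameter count, here's a quick sketch with a made-up MoE config. The layer sizes, expert count, and top-k below are illustrative assumptions, not DeepSeek's published numbers:
```python
# Hypothetical MoE transformer config; every number below is made up for illustration.
d_model   = 4096
d_hidden  = 14336        # hidden size of each expert's feed-forward block
n_layers  = 60
n_experts = 64           # experts per MoE layer
top_k     = 6            # experts the router activates per token

params_per_expert    = 2 * d_model * d_hidden               # two weight matrices per expert FFN
total_expert_params  = n_layers * n_experts * params_per_expert
active_expert_params = n_layers * top_k * params_per_expert

print(f"total expert params : {total_expert_params / 1e9:6.1f}B")   # ~451B sitting in memory
print(f"active per token    : {active_expert_params / 1e9:6.1f}B")  # ~42B doing work per token
# Attention, embeddings, and the router itself are always active and not counted here.
```
Per-token compute (and gradient compute during training) scales with the active slice, though the full set of expert weights still has to live somewhere during both training and inference, so the savings are mostly in FLOPs rather than raw capacity.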
It would be really great if this paradigm became standard and enabled extensible LLMs, the same way stable diffusion has LoRAs (as >>2358 mentioned): you could hypothetically imagine being able to drop in a "roleplay" expert. That might be just as explosive for LLMs as Llama was for local models, and as stable diffusion LoRAs were for fine-tuning local image generation.
No.2400
actually I'll toss it in /secret/
No.2401
>>2381
>play somewhere in the range of 10-30.
are eroge really that short? i thought a guy wouldn't exhaust the average eroge in 1-3 days
No.2402
>>2401
rpgmaker titles and such
No.2403
>>2401
dl-site popcorn porn is like 10 minutes to 2 hours, while other titles that get physical disk releases can go upwards of 60 hours and beyond, and there's a variety of just about everything in between
No.2404
I guess saying AI is bad at something is considered politics now.
No.2405
>>2360
>Is DeepSeek lying about their expenditure of a measly $6 million? Or are the American companies lying about their high expenses just so they can pocket the rest...
The $6 million was the cost for the final viable model training run, so it's a bit misleading. It's like saying something took $6 million in raw materials. You won't be able to do anything with that unless you have the infrastructure, knowledge, personnel and other stuff that took a lot of money to get you into that position.
If it really did use a lot of data from ChatGPT and Claude then I imagine they saved a lot of money there, but I really don't know how that works. Basically if they didn't have those two to build upon then it wouldn't have been so cheap.
Going forward that means they will need better models from other companies to borrow from or they'll stagnate.
>>2398
I think Nvidia was just paired with the others as a group, although Nvidia was also knocked down a peg by this not using CUDA, which is the exclusive Nvidia tech that had until now been VERY closely tied to AI. Stock market stuff is full of idiots, obviously, so anything could make value go up or down. I think the feeling of invincibility and predictability was shattered, though, and that's why it went down.