
/maho/ - Magical Circuitboards

Advanced technology is indistinguishable from magic


File:Deppseek.png (44.92 KB,750x750)

 No.2334

There's been a lot of chatter lately about Deepseek. In the online circles I'm in, people have a politics-colored understanding, more or less saying "American tech companies couldn't do this, but an opensource Chinese company could and American tech companies are in 'damage control'". Which... I really don't understand. If it's an open source model, like Llama was, for example, I don't see how this doesn't just lead to a proliferation of much more efficient and performant models -- the same way that after Llama became available, suddenly there were Phi from Microsoft, Gemma from Google, Mistral, and others.

What does /maho/ think?

 No.2335

I think the average person has no idea what the fuck open source means.

 No.2336

File:Deepseek R1 Benchmarks.jpg (759.1 KB,4702x2787)

Also... I find these benchmarks dubious to be honest. Practically every benchmark says that they're the best. If you look at a Phi benchmark, it'll say they're the best. You look at a Gemma benchmark, it's the best. A Qwen benchmark, they're the best. And so on...

Does anyone actually have any first-hand experience to say whether Deepseek is actually any good? To relate back to how all these benchmarks are essentially cherry picked BS, Phi says it's great. Well, it's alright, but it's heavily censored and shit to actually use, even if it is fast. Gemma is much the same. Qwen is fast, and it's not completely censored like the aforementioned, which makes it much more ideal to use, even if its complexity isn't as high. Llama, however, stands out in my experience as generally providing the best responses, at the cost of being just a bit slower than the other three.

I'm very curious whether Deepseek is "great in benchmarks!" only, with a neutered ability to perform actual conversational tasks but decent at more specific knowledge-related tasks, or whether it's more conversationally optimized the way Claude and OpenAI's models are.

 No.2337

From what I've gleaned, the whole issue surrounding Deepseek is that you can make an LLM on par with what OpenAI has been putting out without massive capex expenditures for hardware.

 No.2339

You can test the results yourself, and I figure a bunch of people have anyway.

It's that China has been sanctioned from getting high-tech equipment, but Chinese scientists are like 'lmao capitalism is so inefficient'

 No.2340

Is this only for chatbot AI? Are they planning to do this for image/video gen or speech AI? Chatbots can't make anime so no matter how smart they get they're still boring and useless.

 No.2342

>>2340
They have an image thing out now >>2333

 No.2343

>>2337
>without massive capex expenditures for hardware
I wouldn't say that. If you look at their benchmark, they suggest that OpenAI o1 is 1217B parameters, whereas DeepSeek R1 is 607B parameters according to their GitHub. Half the number of parameters is certainly significant, but I would hardly say that's any less of a massive capital expenditure. 1B parameters is roughly 1GB, so DeepSeek R1 would still require approximately 607GB to run. 8x H100s is certainly more affordable than 8x H200s, but that's not really saying much...
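If you want to sanity-check that arithmetic, here's the kind of back-of-envelope estimate I mean (a rough sketch assuming about 1 byte per parameter, i.e. 8-bit weights; FP16 would double it, and it ignores the KV cache and activations entirely):

# crude VRAM estimate: billions of parameters * bytes per weight ~= GB
def vram_gb(params_b, bytes_per_param=1.0):
    return params_b * bytes_per_param

print(vram_gb(607))     # ~607 GB at 8-bit
print(vram_gb(607, 2))  # ~1214 GB at FP16
print(vram_gb(1217))    # ~1217 GB at 8-bit for the alleged o1 size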

This is all very confusing because it seems like a lot of people who know absolutely nothing about LLMs have been making statements that they don't realize have actual meaning. I was hoping someone on /maho/ would know more, but that doesn't really seem to be the case so far...

For instance, people have also been talking about efficiency a lot, presumably in reference to tokens/s relative to billions of parameters? But I've yet to see anything suggesting that DeepSeek R1 is any faster in terms of tokens/s per billion parameters. Now, maybe that's complicated by DeepSeek R1 allegedly performing at the level of OpenAI o1, because that in and of itself would be a marked improvement in efficiency. That was why I posed the initial question of whether it's closer to a Mixture of Experts-type model (which tend to score highly on benchmarks because they have tuned datasets to perform well at knowledge tasks, but suffer in conversation), or whether it's a broader, more generalized and conversational model (such as OpenAI GPT4 or o1, or Anthropic Claude), which, despite their size, are far more capable and excel in both conversation and knowledge tasks, at the expense of requiring much more memory.

It wouldn't really be that impressive if DeepSeek R1 is good for knowledge tasks but useless for conversation; by "useless" I mean lacking the ability to mold the style and type of response. To give an example, MoE models tend to be designed to respond in the following way: "What is the circumference of the Earth?" ... "[Circumference of Earth]", whereas generalized models can do much more complex things like "Format a Socratic dialogue on the nature of kinematics in pirate speak" ... "[Characters discussing kinematics in pirate speak]".

 No.2352

I don't work in tech and I already am well aware how degraded and puffed-up America is, so none of these developments surprised me.

>>2339
All this petty behavior has made America's "brightest" just look pathetic, and despite subtle nudges from China, it really is mainly all self-inflicted.
I wish I could get in on this but even before that I need all new PC hardware to use it effectively. Those budget offline models don't seem impressive to me.

 No.2358

File:SAKAMOTO.DAYS.S01E01.The.L….jpg (284.57 KB,1920x1080)

I've kind of lost interest in text AI. The good models (the online ones) are where your data is being collected, which is very bad for people that want to do RP stuff. The solution to getting around that is to use the reverse proxy stuff for some privacy, which is probably quite illegal, although the law would prosecute the purveyor of stolen keys instead of the audience. Of course they're also censored, so you use jailbreaking, which is a ToS violation, which again points to "piracy" being the only real option.
So, what if you want to try it locally? Well, you need tens of thousands of dollars to get a few hundred GB of VRAM for models that might be comparable to the online ones. Sounds like Deepseek would be one of these.
It sucks and I've lost motivation in following it. When the text AI stuff was new-ish people were theorizing about making LORAs to introduce new datasets of worlds and characters and it seemed so exciting. I just don't feel that any more. If you want innovation (like everything created for Stable Diffusion) it can't be a rich people only thing. But I don't know if it will ever change. Consumer GPUs having 200GB of VRAM or existing high-parameter LLMs getting down to 10GB of VRAM both seem far, far, FAR away.

Anyway, a free open source Chinese model performing well is great news. I think it's quite possible that it wouldn't be so close if all the American AI stuff didn't have layers of censorship via hidden injected prompts that neuter its capabilities. Due to their monopolistic status they've been able to get away with it, but wouldn't it be nice if competition forced them to change?
People will shove politics into everything these days since outrage politics drive clicks like nothing else; make it a team game like USA vs. China and the money from ads will just come flying in.

 No.2359

>>2336
It's on par with o1 on several non-cheatable third-party benchmarks, and the RP community generally thinks it's comparable to Claude (the RP SOTA).
There are pros and cons, but R1 sticks to the prompt much better than everything else, which makes it less censored.

 No.2360

China dropping DeepSeek R1, an open source AI model rivaling ChatGPT o1 (or whatever their best one is called) at 2% of the monthly cost, right as the US announced its plan to invest $500 billion in AI, is as funny as things can get. They even, apparently, did it with the US restricting NVIDIA's best GPUs from being sold in China. Is DeepSeek lying about their expenditure of a measly $6 million? or are the american companies lying about their high expenses just so they can pocket the rest...

 No.2361

>>2358
There are already several (gimped) models that you can run locally on regular consumer hardware, but they obviously perform way worse than the big boy stuff.
That being said, the required hardware chart looks like a stairway constructed by a drunk person. Every once in a while the required specs drop like a rock all at once. There's no telling when the server-grade AI of today will run on a run-of-the-mill GPU, but it's probably going to be surprisingly soon. Hell, everything about AI has been surprisingly soon.

 No.2362

File:9e5dbd6c01.jpg (408.12 KB,2053x1231)

yeah, it's probably better than GPT at translations

 No.2363

>>2362
goon freak

 No.2364

>>2334
>people have a politics-colored understanding, more or less saying "American tech companies couldn't do this, but an opensource Chinese company could and American tech companies are in 'damage control'". Which... I really don't understand

People are literally paid to do this.

 No.2365

>>2364
We are also at the point where AI companies use AI to defend themselves in online discussions. People then read this and parrot their talking points. I hate this decade so much now.

 No.2366

File:[MoyaiSubs] Mewkledreamy -….jpg (340.27 KB,1920x1080)

I did a little bit of reading on it.

-Open weights and not open source. This means people have the model, but not the data used to train it or information on how they did it.
-They did it without CUDA, and in fact CUDA is not required to generate quickly, which is pretty huge. As I've said before on kissu, it's the reason nvidia has been dominating AI stuff.
-In addition to the CUDA thing, it just seems much more memory efficient, which is going to be a huge boon for local generation, but I still don't know the actual size of the model. I should probably look that up soon.
-Its data is believed to be largely synthetic, from significant prompting of chatgpt and claude. I'm not sure how they did this since everything I've read says AI training on AI makes for terrible results.

 No.2370

What do people use these for? Seriously?

 No.2371

>>2368
It's the joke that people are making criticisms no one cares about. You can locally host it without any of these issues anyway. I think.
The blocking is in China.

 No.2372

A much better joke would be "ask ChatGPT what percentage of american billionaires are jewish"

 No.2373

AI is bad and Kissu only likes it because of contrarianism, and people only like this one because they think the enemy of their enemy is their friend, even if that friend is a homophobic pedophile

 No.2374

ai is good at translations

 No.2375

File:__suzukaze_aoba_and_takimo….jpg (384.63 KB,850x1081)

>>2373
Hey, you know, if all you need to do with your friend is introduce them to some good yuri to show them the way then I don't think they're so irredeemable.

 No.2376

>>2373
but AI is good at making anti-contrarianism art and homoerotic anti-pedo art

 No.2377

File:Screenshot 2025-01-28 at 1….png (15.92 KB,990x211)

>>2366
I also did a very cursory little bit of reading. As I was expecting, Deepseek is indeed a Mixture of Experts model. I now understand what all the hubbub was about. As a MoE LLM, Deepseek only requires a subset of the "experts" (essentially smaller specialized sub-networks covering a particular slice of the data) at any given time when generating a response. This is in contrast to the more typical approach where all parameters of the model are required at once to generate a response.

This has significant advantages: not only does it mean significant memory savings, it also means that extraneous information doesn't need to be considered when generating a response. For example, if the prompt is in English, you can activate the "English Expert" only, and not have to activate the parameters necessary to respond in Chinese, or Hindi, or German, etc. This same division of experts can be done across any number of topics: history, mathematics, philosophy, literature, media information, slang, etc.
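To make the routing idea concrete, here's a toy sketch of a MoE layer in plain PyTorch. This is only my own illustration of the general technique, not DeepSeek's actual architecture; the layer size, expert count, and top-2 routing are all made-up numbers.

import torch
import torch.nn as nn

class ToyMoELayer(nn.Module):
    # A gate scores every expert, but only the top-k experts actually run,
    # so most of the layer's parameters sit idle for any given input.
    def __init__(self, dim=512, n_experts=8, k=2):
        super().__init__()
        self.gate = nn.Linear(dim, n_experts)
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_experts))
        self.k = k

    def forward(self, x):                          # x: (batch, dim)
        scores = self.gate(x)                      # (batch, n_experts)
        weights, idx = scores.topk(self.k, dim=-1)
        weights = weights.softmax(dim=-1)          # normalize over the chosen experts only
        outputs = []
        for b in range(x.size(0)):                 # naive per-sample loop for clarity
            y = torch.zeros_like(x[b])
            for slot in range(self.k):
                e = int(idx[b, slot])
                y = y + weights[b, slot] * self.experts[e](x[b])
            outputs.append(y)
        return torch.stack(outputs)

# ToyMoELayer()(torch.randn(4, 512)) runs only 2 of the 8 experts per sample,
# so most of the expert parameters never do any work for that input.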

From a purely hardware-constrained perspective, we would obviously expect the MoE model to perform better compared to a traditional LLM. From a more design philosophy-oriented approach, the disadvantage is that because not all of the parameters are being activated at once, you may lose some of the cross-pollination and latent association that a traditional LLM would excel more readily at.

From just a little bit of testing, Deepseek R1 does seem impressive, but I feel like it's probably more comparable to GPT4o mini, rather than GPT4o, or Claude Sonnet. It's hard to explain, but at times it feels a bit "local model"-y, probably because of the limited number of parameters being activated at any given time. The ability to respond well is certainly there, but the depth and style of response feels slightly lacking.

 No.2378

>>2374
It's not, but it's better than I am, and sometimes that's enough.

 No.2379

>>2378
how many japanese erogames do you play per month

 No.2380

>>2379
Approximately zero because dekinai, but that may change if human translators don't step up their game.

 No.2381

>>2380
I play somewhere in the range of 10-30. Human translation produces higher quality results, but that extra quality is really only desirable to the translator themselves. For the general audience the vibe is all that's required and the artwork fills out the rest. Deepseek's translations in a recent game I played were in no way noticeable as an AI translation.

 No.2382

And this was in V3. People have said on Twitter that Deepseek's ability to work in Sri Lankan languages is very effective. This project is going to be a very cheap way for China to effectively communicate with countries the US has abandoned.

 No.2383

>>2381
>For the general audience the vibe is all that's required and the artwork fills out the rest.

Absolutely horrible attitude towards translation but what should I expect from AItards

 No.2384

>>2377
>This same division of experts can be done across any number of topics: history, mathematics, philosophy, literature, media information, slang, etc.
As a bystander who has only kept up with new developments from the sidelines from a high level perspective, this sounds like it could be a major breakthrough.
My biggest frustration with the direction of recent AI development has been the split between human-readable models based on formal logic with limited domain "knowledge" (depending on your definition of knowledge) and the almost unauditable probabilistic machine learning models with broad focus but low reproducibility that have been dominating the hype cycle for the past few years. It's almost like we're now approaching something that resembles the human brain's ability to coordinate between specialized subsystems.
>It's hard to explain, but at times it feels a bit "local model"-y, probably because of the limited number of parameters being activated at any given time. The ability to respond well is certainly there, but the depth and style of response feels slightly lacking.
That's to be expected from a naive combination of domain-specific models. I could see that improving once someone develops a higher-level pattern recognition model to determine which domain-specific models need to be invoked when, which tokens serve as the bridges between domains, and how strong the relative weight of each domain model should be based on the strengths of their internal correlations.
I'm not sure if this makes any sense but it's the best I can do at 2 AM while boozy.

 No.2392

>>2384
>from a high level perspective, this sounds like it could be a major breakthrough.
It certainly could be. For one thing, unlike a traditional LLM, because only certain experts are invoked at a time, you don't need to train the aggregate size of the LLM all at once. With a model like OpenAI's, which supposedly has some 1200B parameters, you need to have the memory to fit the entire model to train it. With an MoE, you only need to train the experts, and then the gating model to coordinate the experts.
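To put toy numbers on it (these are completely invented, picked only so they add up to the 607B figure quoted earlier in the thread; the real expert layout is different and I haven't checked it), the gap between total and active parameters looks like this:

# invented layout: a 47B shared trunk plus 8 experts of 70B each, top-2 routing
shared_b, n_experts, per_expert_b, k = 47, 8, 70, 2
total_b = shared_b + n_experts * per_expert_b   # parameters you have to store: 607B
active_b = shared_b + k * per_expert_b          # parameters one token touches: 187B
print(total_b, active_b)                        # 607 187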

>I could see that improving once someone develops a higher-level pattern recognition model to determine which domain-specific models need to be invoked when, which tokens serve as the bridges between domains, and how strong the relative weight of each domain model should be based on the strengths of their internal correlations.
You've actually got it exactly correct! This is how a MoE LLM works. A gating model determines what experts to invoke, and then applies weights to the experts based on how strongly related they are to the prompt.

It would be really great if this paradigm becomes standard and enables extensible LLMs. The same way stable diffusion has LORAs (as >>2358 mentioned), you could hypothetically imagine being able to drop in a "roleplay" expert. That might be just as explosive for LLMs as llama was for local models, and as stable diffusion LORAs were for fine-tuning local image generation.

 No.2398

File:1470790822385.png (570.83 KB,800x640)

So uh, does anyone know why this is bad for NVIDIA selling GPUs, if more efficient training means you can probably do even more with stronger hardware? Or is it because of diminishing returns or something if training becomes too efficient?

 No.2399

File:Medalist.S01E03.Taiyaki.an….jpg (270.22 KB,1920x1080)

We specifically avoided naming the board /g/ to keep away the baggage people expect from it.
It's time to delete a bunch of posts because people can't talk about Deepseek AI in the Deepseek AI thread. Someone reported it asking for the posts to be moved, but there's no value in a /gpol/ thread here. The expected environment on kissu is that if people make a thread about something, they can have a thread about it without people interjecting with opinions about countries or politicians or whatever.
Yes, your opinion about Country A or B is very important, but no one asked for it. OP (and others in the thread) want to talk about AI.

 No.2400

actually I'll toss it in /secret/

 No.2401

>>2381
>play somewhere in the range of 10-30.
are eroge really that short? i thought a guy wouldn't exhaust the average eroge in 1-3 days

 No.2402

>>2401
rpgmaker titles and such

 No.2403

>>2401
dl-site popcorn porn is like 10 minutes to 2 hours, while other titles that get physical disk releases can go upwards of 60 hours and beyond, and there's a variety of just about everything in between

 No.2404

I guess saying AI is bad at something is considered politics now.

 No.2405

File:[SubsPlease] Kinomi Master….jpg (267.09 KB,1920x1080)

>>2360
>Is DeepSeek lying about their expenditure of a measly $6 million? or are the american companies lying about their high expenses just so they can pocket the rest...
The $6 million was the cost for the final viable model training run, so it's a bit misleading. It's like saying something took $6 million in raw materials. You won't be able to do anything with that unless you have the infrastructure, knowledge, personnel and other stuff that took a lot of money to get you into that position.
If it really did use a lot of data from ChatGPT and Claude then I imagine they saved a lot of money there, but I really don't know how that works. Basically if they didn't have those two to build upon then it wouldn't have been so cheap.
Going forward that means they will need better models from other companies to borrow from or they'll stagnate.

>>2398
I think nvidia was just paired with the others as a group, although nvidia was also knocked down a peg by this not using CUDA which is the exclusive nvidia tech that had until now been VERY closely tied to AI. Stock market stuff is full of idiots, obviously, so anything could make value go up or down. I think the feeling of invincibility and predictability was shattered, though, and that's why it went down.



