this post was submitted on 25 Aug 2023
22 points (100.0% liked)

LocalLLaMA

3427 readers

Welcome to LocalLLaMA! Here we discuss running and developing machine learning models at home. Let's explore cutting-edge open-source neural network technology together.

Get support from the community! Ask questions, share prompts, discuss benchmarks, and get hyped about the latest and greatest model releases! Enjoy talking about our awesome hobby.

As ambassadors of the self-hosting machine learning community, we strive to support each other and share our enthusiasm in a positive, constructive way.

Rules:

Rule 1 - No harassment or personal character attacks on community members. I.e. no name-calling, no generalizing about entire groups of people that make up our community, no baseless personal insults.

Rule 2 - No comparing artificial intelligence/machine learning models to cryptocurrency. I.e. no comparing the usefulness of models to that of NFTs, no claiming the resource usage required to train a model is anything close to that of maintaining a blockchain or mining crypto, no implying it's just a fad/bubble that will leave people with nothing of value when it bursts.

Rule 3 - No comparing artificial intelligence/machine learning to simple text prediction algorithms. I.e. statements such as "LLMs are basically just simple text predictors like what your phone keyboard autocorrect uses, and they're still using the same algorithms from <over 10 years ago>."

Rule 4 - No implying that models are devoid of purpose or potential for enriching people's lives.

founded 2 years ago

Is it just memory bandwidth? Or is it that AMD isn't supported well enough by PyTorch for most products? Or some combination of those?

[–] [email protected] 4 points 2 years ago* (last edited 2 years ago) (2 children)

I've gotten LLaMA running locally via CLBlast on an AMD GPU, while using the CPU simultaneously (basically an APU execution pathway).
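
For anyone curious what that kind of setup looks like in practice, here's a minimal sketch using llama-cpp-python (the Python bindings for llama.cpp) with partial GPU offload. This is not necessarily what the commenter ran; the model path, quantization, and layer count are hypothetical, and it assumes the package was built against CLBlast (e.g. via `CMAKE_ARGS="-DLLAMA_CLBLAST=on" pip install llama-cpp-python`).

```python
# Minimal sketch: split a model between an AMD GPU (via CLBlast) and the CPU.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-7b.Q4_K_M.gguf",  # hypothetical model file
    n_gpu_layers=20,  # layers offloaded to the GPU; remaining layers run on the CPU
    n_ctx=2048,       # context window
)

out = llm("Explain why memory bandwidth matters for LLM inference:", max_tokens=128)
print(out["choices"][0]["text"])
```

Tuning `n_gpu_layers` up or down is how you trade GPU VRAM usage against CPU work, which is the crux of the APU-style split described above.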

AMD is seriously slacking when it comes to machine learning. The hardware is uber-powerful, but, as everyone complains, the software just isn't there.

ROCm doesn't even work on Windows, FFS.

You can run models on almost anything, but token generation is extremely slow. Like, you might be waiting upwards of 5 minutes for a response, at something like 0.2-0.6 tokens per second, which is abysmal when a coherent reply needs a minimum of around 100 tokens.
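
For a rough sense of what those rates mean in practice, here's the back-of-envelope math (illustrative only):

```python
# Back-of-envelope: time to generate a ~100-token reply at the quoted rates
for rate in (0.2, 0.6):  # tokens per second
    seconds = 100 / rate
    print(f"{rate} tok/s -> {seconds:.0f} s (~{seconds / 60:.1f} min)")
# 0.2 tok/s -> 500 s (~8.3 min); 0.6 tok/s -> 167 s (~2.8 min)
```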

[–] [email protected] 3 points 2 years ago

Isn't Windows for gaming and weird proprietary applications like Photoshop?

[–] [email protected] 2 points 2 years ago (1 children)

If you're using llama.cpp, some ROCm stuff recently got merged in. It works pretty well, at least on my 6600. I believe there were instructions in the pull for getting it working on Windows.

[–] [email protected] 2 points 2 years ago

Thank you so much! I'll be sure to check that out / get it updated.