Llamas on Pixel 7s: https://github.com/rupeshs/alpaca.cpp/tree/linux-android-build-support (I know, I know, it's not over 13B yet, just sharing progress)
There are people who successfully run 30B LLaMA on a consumer PC, and even 65B (though that is extremely slow).
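For a rough sense of why that works, here's a back-of-the-envelope sketch (my own assumed numbers, not from the comment above): approximate weight memory for each LLaMA size at fp16 versus the ~4-bit quantization that llama.cpp-style ports use. That's roughly why 30B fits in a beefy consumer PC's RAM and 65B is possible but painful.

```python
# Approximate LLaMA parameter counts, in billions (published figures).
PARAMS_BILLIONS = {"7B": 6.7, "13B": 13.0, "30B": 32.5, "65B": 65.2}

for name, b in PARAMS_BILLIONS.items():
    fp16_gb = b * 2   # 2 bytes per weight
    q4_gb = b * 0.5   # ~4 bits per weight, ignoring small per-block overheads
    print(f"{name}: ~{fp16_gb:.0f} GB at fp16, ~{q4_gb:.1f} GB at ~4-bit")
```

This ignores KV cache and activation memory, but the weights dominate: ~16 GB for 30B at 4-bit fits consumer hardware, while 65B at ~33 GB starts swapping on most machines, hence the slowness.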
@ValeryCherepanov By "run on a single GPU" I mean the weights + one full input vector can fit on a consumer GPU at once. Otherwise the question would be meaningless: you can always split up matrices into smaller blocks and run the computation sequentially.
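To make the "otherwise meaningless" point concrete, here's a toy numpy sketch (my own illustration, nothing from the comment) of applying a weight matrix in row-blocks, so only a small slice would ever need to be resident on the device at once:

```python
import numpy as np

def blocked_matvec(W, x, block_rows=1024):
    """Compute W @ x while only ever touching `block_rows` rows of W at a time."""
    out = np.empty(W.shape[0], dtype=W.dtype)
    for start in range(0, W.shape[0], block_rows):
        stop = min(start + block_rows, W.shape[0])
        # In a real offloading setup, this block would be copied host -> GPU here.
        out[start:stop] = W[start:stop] @ x
    return out

rng = np.random.default_rng(0)
W = rng.standard_normal((4096, 4096), dtype=np.float32)
x = rng.standard_normal(4096, dtype=np.float32)
assert np.allclose(blocked_matvec(W, x), W @ x, atol=1e-4)
```

Since any model can be run this way given enough time and host memory, "fits on a single GPU" is only a meaningful criterion if it excludes this kind of sequential streaming.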
This is now extremely close to being resolved by LLaMA (LLaMA 13B does not actually beat GPT-3 on every measured benchmark, but it comes very close). 72% is way too low, though, so I guess whoever reads this comment first can collect some free mana in expectation.
FLAN-T5 3B can very likely resolve this now, but I suspect it will be a while before anyone actually bothers to run it on all of the benchmarks.