I compiled and ran the C++ version of the Gemma LLM (2 billion parameters) on my own laptop. The CPU was maxed out, but after a long time it still had not produced a single word of output.😃


Running Gemma.cpp on a laptop
CPU is maxed out when running Gemma.cpp on a laptop

After pulling the latest code, recompiling, and running it again, it works. Great.


Running Gemma.cpp
The prompt was “Variational Autoencoders (VAE) are currently applied in various scenarios, list some.” It produced 476 tokens of output in 4.44 seconds.
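To put that run in perspective, here is a minimal sketch that turns the two reported numbers into an average throughput. The helper name `tokens_per_second` is my own and not part of Gemma.cpp, and the calculation assumes the 4.44 seconds covers the whole generation.

```python
def tokens_per_second(num_tokens: int, elapsed_seconds: float) -> float:
    """Return the average generation throughput in tokens per second."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return num_tokens / elapsed_seconds


# Figures reported for this run: 476 tokens in 4.44 seconds.
rate = tokens_per_second(476, 4.44)
print(f"{rate:.1f} tokens/s")  # prints "107.2 tokens/s"
```

This is only an average over the whole run, so it says nothing about how long the first token took to appear.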


