Abstract: This work responds to growing concerns around data privacy, cloud dependence, and hardware constraints by implementing quantized large language models (LLMs) that can run on both standard CPUs and ...