LMQuickBench: Building, Testing, and Publishing a Local LLM Benchmarking Tool
Why Build LMQuickBench?
While tools like LM Studio make it easy to run LLMs locally, I wanted a way to measure and compare model performance. Specifically, I needed a tool that could:
- Quickly measure response latency
- Track token usage
- Calculate output speed in tokens per second
- Test both short prompts and complex coding tasks
With these goals in mind, I created LMQuickBench, now available on PyPI. You can install it with:
pip install lmquickbench
How I Built LMQuickBench
Building LMQuickBench was a rewarding experience. Here’s how I approached it:
- Python CLI Tool: I used Python and the `requests` library to interact with LM Studio's local server API.
- Flexible CLI Arguments: I added options like `--prompt`, `--promptfile`, `--max_tokens`, and `--server_url` for flexibility.
- Clean Project Structure: I structured the project with `setup.py` and `pyproject.toml` for easy packaging.
- PyPI Publishing: I packaged the tool and published it to PyPI for public use.
- GitHub Actions: I set up CI/CD workflows to automatically test the tool on every push and pull request.
One key decision was to allow a dynamic server URL (`--server_url`) to keep LMQuickBench future-proof for different deployment scenarios.
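To give a feel for the core loop, here is a minimal sketch of a timed request against LM Studio's OpenAI-compatible endpoint. This is illustrative rather than LMQuickBench's actual source: the function name, the one-prompt-per-line file format, and the `"local-model"` placeholder are my assumptions.

```python
import argparse
import time

import requests


def benchmark_prompt(prompt: str, server_url: str, max_tokens: int) -> None:
    """Time one request against the local server and report throughput."""
    start = time.perf_counter()
    response = requests.post(
        f"{server_url}/v1/chat/completions",  # LM Studio's OpenAI-compatible endpoint
        json={
            "model": "local-model",  # placeholder; LM Studio serves whichever model is loaded
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": max_tokens,
        },
        timeout=300,
    )
    response.raise_for_status()
    latency = time.perf_counter() - start
    usage = response.json()["usage"]  # token counts per the OpenAI-compatible response schema
    tokens = usage["completion_tokens"]
    print(f"Latency: {latency:.2f} sec, Tokens: {tokens}, Tokens/sec: {tokens / latency:.2f}")


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Benchmark prompts against a local LLM server.")
    parser.add_argument("--prompt", help="a single prompt to benchmark")
    parser.add_argument("--promptfile", help="file of prompts to benchmark as a batch")
    parser.add_argument("--max_tokens", type=int, default=512)
    parser.add_argument("--server_url", default="http://localhost:1234")
    args = parser.parse_args()

    if args.promptfile:  # assumed format: one prompt per line
        with open(args.promptfile) as f:
            prompts = [line.strip() for line in f if line.strip()]
    else:
        prompts = [args.prompt]

    for p in prompts:
        benchmark_prompt(p, args.server_url, args.max_tokens)
```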
Key Features
LMQuickBench is designed to be simple yet powerful. Here are its key features:
| Feature | Description |
|---|---|
| Single or batch benchmarking | Test individual prompts or batches of prompts. |
| Dynamic server URL | Connect to any compatible LLM server. |
| Lightweight | Only depends on `requests`. |
| Expandable | CSV output and dashboard visualization planned. |
Quick Demo
Here’s how you can use LMQuickBench to benchmark your local LLMs:
Benchmark a Single Prompt
lmquickbench --prompt "Explain recursion in computer science." --max_tokens 512
Batch Test Prompts from a File
lmquickbench --promptfile prompts/prompts_coding.txt --max_tokens 512
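The batch file is assumed here to be plain text with one prompt per line (my assumption; see the repo for the exact format), so `prompts/prompts_coding.txt` might look like:

```text
Write a Python function that checks whether a given string is a palindrome.
Write a Python class called Stack that implements push, pop, and peek operations.
Write a Python function to calculate the factorial of a number recursively.
```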
Example Output:
Testing prompt: Explain recursion.
Model: qwen2.5-coder-14b-instruct, Latency: 4.56 sec, Tokens: 133, Tokens/sec: 29.14
Output: Recursion is a method where the solution to a problem depends on solutions to smaller instances of the same problem...
Performance Comparison
To demonstrate LMQuickBench’s capabilities, I tested several models with different prompts. Below is a summary of the results:
| Prompt | Model | Latency (sec) | Tokens | Tokens/sec | Output Summary |
|---|---|---|---|---|---|
| Write a Python function that checks whether a given string is a palindrome. | qwen2.5-coder-14b-instruct | 2.95 | 58 | 19.68 | Provided a Python function to check if a string is a palindrome. |
| Write a Python class called Stack that implements push, pop, and peek operations. | qwen2.5-coder-14b-instruct | 7.53 | 300 | 39.87 | Provided a Python class `Stack` with `push`, `pop`, and `peek` methods. |
| Write a Python function to calculate the factorial of a number recursively. | qwen2.5-coder-14b-instruct | 7.52 | 297 | 39.51 | Provided a recursive Python function to calculate the factorial of a number. |
| Write a Python function that reverses a given string. | qwen2.5-coder-14b-instruct | 4.46 | 148 | 33.21 | Provided a Python function to reverse a string using slicing. |
| Write a Python function that sums all elements in a list. | qwen2.5-coder-14b-instruct | 5.83 | 215 | 36.86 | Provided a Python function to sum all elements in a list using `sum()`. |
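Tokens/sec in the table is simply completion tokens divided by latency: for the `Stack` prompt, 300 tokens / 7.53 sec ≈ 39.8 tokens/sec, which matches the reported 39.87 once latency rounding is accounted for.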
Summary (averaged over the five prompts above):
- Average Latency: 5.66 sec
- Average Tokens/sec: 33.83
This table highlights how LMQuickBench can provide clear, actionable insights into model performance.
Publishing to PyPI
Publishing LMQuickBench to PyPI was a major milestone for me. It’s my first pip-installable open-source project, and it’s now publicly available for anyone to use.
To install LMQuickBench, simply run:
pip install lmquickbench
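What turns the package into a `lmquickbench` shell command is a console-script entry point. A minimal `setup.py` along these lines would do it (a sketch, not the project's actual file; the `lmquickbench.cli:main` module path is my guess):

```python
from setuptools import find_packages, setup

setup(
    name="lmquickbench",
    version="0.1.0",  # illustrative version number
    packages=find_packages(),
    install_requires=["requests"],  # the tool's single runtime dependency
    entry_points={
        # exposes a `lmquickbench` command that calls main() in a hypothetical cli module
        "console_scripts": ["lmquickbench=lmquickbench.cli:main"],
    },
)
```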
Future Plans
I’m excited about the future of LMQuickBench. Here are some features I’m planning to add:
- CSV/JSON Export: Save benchmarking results for further analysis (a possible shape is sketched after this list).
- Streamlit Dashboard: Visualize results in an interactive dashboard.
- System Monitoring: Track CPU and RAM usage during benchmarking.
- Stress Testing: Auto-scale prompt generation to test model limits.
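For the CSV export, the standard-library `csv` module would be enough. This sketch uses hypothetical column names and the result values from the demo output above:

```python
import csv

# One record per benchmarked prompt; field names are hypothetical.
results = [
    {"prompt": "Explain recursion.", "model": "qwen2.5-coder-14b-instruct",
     "latency_sec": 4.56, "tokens": 133, "tokens_per_sec": 29.14},
]

with open("benchmark_results.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=list(results[0].keys()))
    writer.writeheader()      # header row from the dict keys
    writer.writerows(results)
```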
Conclusion
Building LMQuickBench taught me a lot about structuring Python projects, publishing to PyPI, and setting up CI/CD workflows. It’s been a rewarding journey, and I’m thrilled to share this tool with the community.
If you’re running LLMs locally, give LMQuickBench a try! Feedback and contributions are always welcome.
👉 GitHub Repo: https://github.com/gordonyfg/LMQuickBench
👉 PyPI Install:
pip install lmquickbench
Let’s keep pushing the boundaries of local LLM capabilities together 🚀.