As AI increasingly moves from the cloud to on-device, how, exactly, is one supposed to know whether a given new laptop will run a GenAI-powered app faster than rival off-the-shelf laptops — or desktops or all-in-ones, for that matter? Knowing could mean the difference between waiting a few seconds for an image to generate versus a few minutes — and as they say, time is money.
MLCommons, the industry group behind a number of AI-related hardware benchmarking standards, wants to make it easier to comparison shop with the launch of performance benchmarks targeted at “client systems” — i.e. consumer PCs.
Today, MLCommons announced the formation of a new working group, MLPerf Client, whose goal is establishing AI benchmarks for desktops, laptops and workstations running Windows, Linux and other popular operating systems. MLCommons promises that the benchmarks will be “scenario-driven,” focusing on real end-user use cases and “grounded in feedback from the community.”
To that end, MLPerf Client’s first benchmark will focus on text-generating models, specifically Meta’s Llama 2, which MLCommons executive director David Kanter notes has already been incorporated into MLCommons’ other benchmarking suites for datacenter hardware. Meta has also worked extensively with Qualcomm and Microsoft to optimize Llama 2 for Windows, much to the benefit of Windows-running devices.
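For a rough sense of what a client-side text-generation benchmark actually measures, the core metric is usually throughput — tokens produced per second on the device under test. The sketch below is purely illustrative, not MLPerf Client's methodology (which hasn't been published); the `dummy_generate` stand-in and all names are hypothetical, with a real model runtime slotted in where the dummy is.

```python
import time

def tokens_per_second(generate, prompt, n_tokens=32):
    """Time a text-generation callable and report token throughput.

    `generate` is any function taking (prompt, n_tokens) and returning
    a list of tokens -- a hypothetical stand-in for a real local model.
    """
    start = time.perf_counter()
    tokens = generate(prompt, n_tokens)
    elapsed = time.perf_counter() - start
    return len(tokens) / elapsed

# Dummy "model" for illustration: emits one placeholder token per ~1 ms,
# standing in for an on-device LLM such as a quantized Llama 2.
def dummy_generate(prompt, n_tokens):
    out = []
    for _ in range(n_tokens):
        time.sleep(0.001)  # simulate per-token inference latency
        out.append("tok")
    return out

rate = tokens_per_second(dummy_generate, "Hello", n_tokens=32)
print(f"{rate:.0f} tokens/sec")
```

A real scenario-driven benchmark would layer on warm-up runs, fixed prompts, and repeated trials, but the headline number shoppers would compare is essentially this ratio.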
“The time is ripe to bring MLPerf to client systems, as AI is becoming an expected part of computing everywhere,” Kanter said in a press release. “We look forward to teaming up with our members to bring the excellence of MLPerf into client systems and drive new capabilities for the broader community.”
Members of the MLPerf Client working group include AMD, Arm, Asus, Dell, Intel, Lenovo, Microsoft, Nvidia and Qualcomm — but not Apple, notably.
Apple isn’t an MLCommons member, either, and a Microsoft engineering director (Yannis Minadakis) co-chairs the MLPerf Client group — which makes the company’s absence not entirely surprising. The disappointing outcome, though, is that whatever AI benchmarks MLPerf Client conjures up won’t be tested across Apple devices — at least not in the near term.
Still, this writer’s curious to see what sort of benchmarks and tooling emerge from MLPerf Client — macOS-supporting or no. Assuming GenAI is here to stay — and there’s no indication that the bubble’s about to burst anytime soon — I wouldn’t be surprised to see AI benchmarks play an increasingly important role in device buying decisions.
In my best-case scenario, the MLPerf Client benchmarks will be akin to the many PC build comparison tools online, giving an indication of the AI app performance one can expect from a particular machine. Perhaps they’ll even expand to cover phones and tablets in the future, given Qualcomm’s and Arm’s participation (both are heavily invested in the mobile device ecosystem). It’s clearly early days — but here’s hoping.