| Domain | Rating | Notes |
|---|---|---|
| Frontend | A | React/Vue/CSS generation is accurate; output is concise and usable as-is |
| Backend | A | API, database query, and middleware logic is clear; JSON output had zero errors |
| Mobile | A | SwiftUI/Flutter component code is high quality with complete structure |
| Desktop | B | Electron/PyQt output is workable; UI details need manual adjustment |
| Database and Data Engineering | A | SQL generation is accurate; Python data-processing pipelines are clean |
| Embedded / IoT | B | C/Arduino code compiles; hardware domain knowledge is still required |
| Cloud Native and Infrastructure | B | Docker/K8s YAML is correct; complex orchestration needs human review |
| AI / ML | A | PyTorch/TensorFlow code is accurate; transformer implementations are complete |
| Game Dev | B | Pygame/Unity C# basics work; complex game logic is weaker |
| Security | C | Encryption/decryption implementations work; relying on AI for security audits is not recommended |
| Systems | B | C/Rust memory-management code is useful as a reference; needs human review |
| Emerging | A | WebAssembly/Solidity smart contract output is well-structured |

2,859 code generation tasks covering frontend, backend, mobile, and desktop. The comparison targets output quality, not speed.

| Model | Hardware | Avg Lines | Quality | Latency |
|---|---|---|---|---|
| STORM 32B | DGX | 37 | Clean and precise | 19.9s |
| DeepSeek V3 | Cloud | 43 | Verbose | 2.7s |
| Kimi | Cloud | 40 | Balanced | 4.9s |
| Mac M4 14B | Local | 38 | Clean | 9.0s |

Streaming performance test of STORM AI (DGX, 30 concurrent requests) using EvalScope:

| Metric | Measured Value | Meaning | Assessment |
|---|---|---|---|
| TTFT (Time To First Token) | 1,427ms (P50), ~2,373ms (P99) | Time from sending the request to receiving the first token | ✅ Good: first response under 1.5 seconds at P50 |
| TPOT (Time Per Output Token) | ~85ms (P50), ~91ms (P99) | Average time to generate each token, excluding the first | ✅ Smooth: no noticeable stutter during generation |
| Output Throughput | 307 tok/s | Output tokens generated per second by the system | ✅ High: still 300+ tok/s at 30 concurrent |
| Req Throughput (QPS) | 1.20 req/s | Requests completed per second | ✅ Stable: 30 users online simultaneously |
| Avg Latency | 24.0s | Total time for a complete response | ✅ Acceptable: normal range for a 32B model; Agent scenarios tolerate it well |
| Success Rate | 100% (60/60) | No failures at 30 concurrent | ✅ Perfect: core promise |

60/60 success · QPS 1.20 · TTFT 1.4s

70/70 timeout · system resources exhausted

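
The 30-concurrency figures in the metrics table can be cross-checked against each other with simple arithmetic. The sketch below is only a consistency check, not part of the EvalScope output: the helper names are hypothetical, Little's law is applied on the assumption that the system was in steady state, and mixing a P50 TTFT/TPOT with a mean latency is an approximation.

```python
# Figures copied from the metrics table (DGX, 30 concurrent requests).
CONCURRENCY = 30
TTFT_P50_S = 1.427         # time to first token, P50
TPOT_P50_S = 0.085         # time per output token, P50
OUTPUT_TOK_PER_S = 307.0   # system-wide output throughput
REQ_PER_S = 1.20           # completed requests per second
AVG_LATENCY_S = 24.0       # mean end-to-end latency


def tokens_per_request(output_tok_per_s: float, req_per_s: float) -> float:
    """Average output tokens per completed request."""
    return output_tok_per_s / req_per_s


def in_flight_requests(req_per_s: float, avg_latency_s: float) -> float:
    """Little's law (L = lambda * W): mean number of requests in the system.

    Assumes steady state, which the source does not explicitly confirm.
    """
    return req_per_s * avg_latency_s


def estimated_latency(ttft_s: float, tpot_s: float, n_tokens: float) -> float:
    """Rough end-to-end latency: first token plus the remaining tokens."""
    return ttft_s + tpot_s * (n_tokens - 1)


tokens = tokens_per_request(OUTPUT_TOK_PER_S, REQ_PER_S)
in_flight = in_flight_requests(REQ_PER_S, AVG_LATENCY_S)
latency = estimated_latency(TTFT_P50_S, TPOT_P50_S, tokens)

print(f"tokens per request : {tokens:.1f}")
print(f"requests in flight : {in_flight:.1f} (configured: {CONCURRENCY})")
print(f"estimated latency  : {latency:.1f}s (measured: {AVG_LATENCY_S}s)")
```

Under these assumptions the numbers hang together: roughly 256 output tokens per request, about 28.8 requests in flight against a configured concurrency of 30, and an estimated latency of about 23.1s against the measured 24.0s.
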
| Machine | Model | Safe Limit (concurrent requests) | Breaking Point |
|---|---|---|---|
| DGX Spark | 32B-AWQ | 30 | 35 (crash) |
| Mac Mini M4 | 14B | 18 | 20 (74s latency) |
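
One way a client or gateway could stay under the safe limits in the table is to cap in-flight requests with a semaphore. This is a minimal sketch, not a description of how STORM AI itself is deployed: `ConcurrencyGate`, `SAFE_LIMITS`, and `fake_inference` are hypothetical names, and the inference call is simulated with a short sleep rather than a real vLLM or Ollama request.

```python
import asyncio

# Safe limits copied from the table above; the dictionary itself is illustrative.
SAFE_LIMITS = {
    ("DGX Spark", "32B-AWQ"): 30,
    ("Mac Mini M4", "14B"): 18,
}


class ConcurrencyGate:
    """Caps the number of requests running at once; the rest wait in line."""

    def __init__(self, limit: int) -> None:
        self._sem = asyncio.Semaphore(limit)
        self.in_flight = 0
        self.peak = 0  # highest concurrency actually observed

    async def run(self, make_coro):
        async with self._sem:
            self.in_flight += 1
            self.peak = max(self.peak, self.in_flight)
            try:
                return await make_coro()
            finally:
                self.in_flight -= 1


async def fake_inference(request_id: int) -> int:
    """Stand-in for a real inference call (assumption: replace as needed)."""
    await asyncio.sleep(0.01)
    return request_id


async def demo(n_requests: int = 70):
    gate = ConcurrencyGate(SAFE_LIMITS[("DGX Spark", "32B-AWQ")])
    results = await asyncio.gather(
        *(gate.run(lambda i=i: fake_inference(i)) for i in range(n_requests))
    )
    return gate.peak, results


peak, results = asyncio.run(demo())
print(f"completed {len(results)} requests, peak concurrency {peak}")
```

With 70 simulated requests submitted at once, the gate lets at most 30 run concurrently and queues the remainder, trading extra wait time for staying below the measured breaking point.
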

Test tool: ModelScope EvalScope · Inference engine: vLLM / Ollama

Nanjing Storm Engine Technology Co., Ltd. (南京暴风引擎科技有限公司) · stormengine.cloud

⚠ Internal company reliability test data; for reference only.