本次测试基于 MetaComputing AI PC with Framework Laptop 13,系统为 Debian GNU/Linux 12.11(bookworm)aarch64,内核版本 Linux 6.6.89。所有测试均在系统刚启动、无额外后台负载的情况下进行,结果更偏向理论性能上限,实际日常使用中会受功耗策略、温控与负载情况影响。

基础信息

系统信息

roma@debian ~> fastfetch
        _,met$$$$$gg.          roma@debian
     ,g$$$$$$$$$$$$$$$P.       -----------
   ,g$$P""       """Y$$.".     OS: Debian GNU/Linux 12.11 (bookworm) aarch64
  ,$$P'              `$$$.     Host: MC FML13V04 Board (1.0)
',$$P       ,ggs.     `$$b:    Kernel: Linux 6.6.89-cix-build-generic
`d$$'     ,$P"'   .    $$$     Uptime: 11 mins
 $$P      d$'     ,    $$P     Packages: 2632 (dpkg)
 $$:      $$.   -    ,d$$'     Shell: fish 3.6.0
 $$;      Y$b._   _,d$P'       Display (BOE095F): 2256x1504 @ 2x in 13", 60 Hz [Built-in]
 Y$$.    `.`"Y$$$$P"'          DE: GNOME 43.9
 `$$b      "-.__               WM: Mutter (Wayland)
  `Y$$b                        WM Theme: Adwaita
   `Y$$.                       Theme: Adwaita [GTK2/3/4]
     `$$b.                     Icons: Adwaita [GTK2/3/4]
       `Y$$b.                  Font: Cantarell (11pt) [GTK2/3/4]
         `"Y$b._               Cursor: Adwaita (24px)
             `""""             Terminal: /dev/pts/0
                               CPU: Cortex-A720*2 + Cortex-A520*4 + Cortex-A720*6 (12) @ 2.60 GHz
                               GPU: Mali-G720-Immortalis [Integrated]
                               Memory: 7.74 GiB / 30.94 GiB (25%)
                               Swap: 0 B / 976.00 MiB (0%)
                               Disk (/): 21.29 GiB / 466.95 GiB (5%) - ext4
                               Local IP (wlan0): 192.168.50.38/24
                               Battery (ABC): 95% [AC Connected]
                               Locale: en_US.UTF-8

基准测试结果

系统响应、开关机测试

  • 系统启动时间测试:系统从内核加载到进入图形界面总耗时 约 9.7 秒,其中 userspace 仅 6.1 秒 即达到 graphical.target。

roma@debian ~ > systemd-analyze

Startup finished in 3.587s (kernel) + 6.134s (userspace) = 9.722s
graphical.target reached after 6.075s in userspace.
  • 系统关闭时间测试:关机时间仅需约 46.45毫秒(实际断电时间)。这几乎是“瞬间”完成,体现了系统进程管理的效率和底层固件响应的迅速。用户完全无需等待。

roma@debian ~> time sudo shutdown -h now

Broadcast message from root@debian on pts/2 (Wed 2025-12-17 14:00:28 CST):

The system will power off now!


________________________________________________________
Executed in   46.45 millis    fish           external
   usr time    3.12 millis  993.00 micros    2.13 millis
   sys time    5.48 millis    0.00 micros    5.48 millis
  • 系统响应时间测试

roma@debian ~> time ls
Desktop/  Documents/  Downloads/  Geekbench-6.2.2-LinuxARMPreview/  Music/  Pictures/  Public/  Templates/  tinymembench/  top500-benchmark/  Videos/

________________________________________________________
Executed in    6.07 millis    fish           external
   usr time    0.84 millis  844.00 micros    0.00 millis
   sys time    5.29 millis  696.00 micros    4.60 millis

CPU

  • Geekbench 6

  • Geekbench 6跑分对比图(部分数据来源于网络(Bilibili、Youtube))

  • Coremark:单核性能已经明显超过传统嵌入式 ARM多;核扩展性良好,线程数拉高后仍能线性增长

  • 单核测试:24832.38(单位:Iterations/Sec)

    roma@debian ~/coremark (main)> make compile XCFLAGS="-DMULTITHREAD=1 -DUSE_PTHREAD"
    roma@debian ~/coremark (main)> ./coremark.exe
    2K performance run parameters for coremark.
    CoreMark Size    : 666
    Total ticks      : 12081
    Total time (secs): 12.081000
    Iterations/Sec   : 24832.381425
    Iterations       : 300000
    Compiler version : GCC12.2.0
    Compiler flags   : -O2 -DMULTITHREAD=1 -DUSE_PTHREAD  -lrt -lpthread
    Memory location  : Please put data memory location here
                            (e.g. code in flash, data on heap etc)
    seedcrc          : 0xe9f5
    [0]crclist       : 0xe714
    [0]crcmatrix     : 0x1fd7
    [0]crcstate      : 0x8e3a
    [0]crcfinal      : 0xcc42
    Correct operation validated. See README.md for run and reporting rules.
    CoreMark 1.0 : 24832.381425 / GCC12.2.0 -O2 -DMULTITHREAD=1 -DUSE_PTHREAD  -lrt -lpthread / Heap
    
    • 多核测试:186003.26(单位:Iterations/Sec)

roma@debian ~/coremark (main)> make compile XCFLAGS="-DMULTITHREAD=64 -DUSE_PTHREAD"
roma@debian ~/coremark (main)> ./coremark.exe
2K performance run parameters for coremark.
CoreMark Size    : 666
Total ticks      : 103224
Total time (secs): 103.224000
Iterations/Sec   : 186003.255057
Iterations       : 19200000
Compiler version : GCC12.2.0
Compiler flags   : -O2 -DMULTITHREAD=64 -DUSE_PTHREAD  -lrt -lpthread
Parallel PThreads : 64
Memory location  : Please put data memory location here
                        (e.g. code in flash, data on heap etc)
seedcrc          : 0xe9f5

Correct operation validated. See README.md for run and reporting rules.
CoreMark 1.0 : 186003.255057 / GCC12.2.0 -O2 -DMULTITHREAD=64 -DUSE_PTHREAD  -lrt -lpthread / Heap / 64:PThreads

GPU

  • Glmark2测试结果:得分 3845,这个分数需要结合场景和分辨率来看。在 800x600 窗口化下跑出近 5000 FPS 的单项测试,以及最终 3845 的总分,表明 Mali-G720 Immortalis 的性能远超以往的 ARM 集成显卡(如 Mali-G78/G710)。

roma@debian ~/coremark (main)> make compile XCFLAGS="-DMULTITHREAD=64 -DUSE_PTHREAD"
roma@debian ~/coremark (main)> ./coremark.exe
2K performance run parameters for coremark.
CoreMark Size    : 666
Total ticks      : 103224
Total time (secs): 103.224000
Iterations/Sec   : 186003.255057
Iterations       : 19200000
Compiler version : GCC12.2.0
Compiler flags   : -O2 -DMULTITHREAD=64 -DUSE_PTHREAD  -lrt -lpthread
Parallel PThreads : 64
Memory location  : Please put data memory location here
                        (e.g. code in flash, data on heap etc)
seedcrc          : 0xe9f5

Correct operation validated. See README.md for run and reporting rules.
CoreMark 1.0 : 186003.255057 / GCC12.2.0 -O2 -DMULTITHREAD=64 -DUSE_PTHREAD  -lrt -lpthread / Heap / 64:PThreads

Memory

  • LPDDR5(32G)

  • 高带宽: 多种复制 (copy) 操作速度达到 14-16 GB/s,而填充 (fill) 操作更是飙升至 35-41 GB/s (如 standard memset41.25 GB/s)。这明显是 LPDDR5 或 LPDDR5X 内存的特征,提供了充沛的数据吞吐能力,对 CPU 多核性能、GPU 纹理填充、大数据处理至关重要。

  • 低延迟: 内存延迟测试 (Memory latency test) 结果优秀。在常见应用使用的大小范围内(几MB到几十MB),额外延迟普遍在 几十纳秒 (ns) 级别。即使在最大的 64MB 块测试中,单/双随机读取延迟也仅为 194.1 ns / 228.1 ns (普通页) 和 175.6 ns / 210.6 ns (大页)。这确保了系统的快速响应。

tinymembench v0.4.9 (simple benchmark for memory throughput and latency)

==========================================================================
== Memory bandwidth tests                                               ==
==                                                                      ==
== Note 1: 1MB = 1000000 bytes                                          ==
== Note 2: Results for 'copy' tests show how many bytes can be          ==
==         copied per second (adding together read and writen           ==
==         bytes would have provided twice higher numbers)              ==
== Note 3: 2-pass copy means that we are using a small temporary buffer ==
==         to first fetch data into it, and only then write it to the   ==
==         destination (source -> L1 cache, L1 cache -> destination)    ==
== Note 4: If sample standard deviation exceeds 0.1%, it is shown in    ==
==         brackets                                                     ==
==========================================================================

 C copy backwards                                     :  14892.4 MB/s (1.0%)
 C copy backwards (32 byte blocks)                    :  15205.4 MB/s (0.4%)
 C copy backwards (64 byte blocks)                    :  15140.4 MB/s (0.3%)
 C copy                                               :  14994.8 MB/s (0.3%)
 C copy prefetched (32 bytes step)                    :  14132.7 MB/s (0.2%)
 C copy prefetched (64 bytes step)                    :  14301.2 MB/s (0.8%)
 C 2-pass copy                                        :  14294.2 MB/s
 C 2-pass copy prefetched (32 bytes step)             :  13718.6 MB/s
 C 2-pass copy prefetched (64 bytes step)             :  14677.6 MB/s (0.3%)
 C fill                                               :  36013.7 MB/s (2.1%)
 C fill (shuffle within 16 byte blocks)               :  35850.4 MB/s (2.0%)
 C fill (shuffle within 32 byte blocks)               :  35094.1 MB/s (0.5%)
 C fill (shuffle within 64 byte blocks)               :  35493.6 MB/s (0.4%)
 ---
 standard memcpy                                      :  16168.9 MB/s (1.9%)
 standard memset                                      :  41249.7 MB/s (5.9%)
 ---
 NEON LDP/STP copy                                    :  16664.6 MB/s (1.0%)
 NEON LDP/STP copy pldl2strm (32 bytes step)          :  15130.8 MB/s (0.4%)
 NEON LDP/STP copy pldl2strm (64 bytes step)          :  15379.7 MB/s (0.2%)
 NEON LDP/STP copy pldl1keep (32 bytes step)          :  15705.0 MB/s
 NEON LDP/STP copy pldl1keep (64 bytes step)          :  16216.1 MB/s (0.4%)
 NEON LD1/ST1 copy                                    :  16102.1 MB/s (0.2%)
 NEON STP fill                                        :  37033.2 MB/s (1.9%)
 NEON STNP fill                                       :  34190.7 MB/s (2.6%)
 ARM LDP/STP copy                                     :  16468.4 MB/s (1.9%)
 ARM STP fill                                         :  39243.4 MB/s (2.7%)
 ARM STNP fill                                        :  33605.0 MB/s (1.6%)

==========================================================================
== Framebuffer read tests.                                              ==
==                                                                      ==
== Many ARM devices use a part of the system memory as the framebuffer, ==
== typically mapped as uncached but with write-combining enabled.       ==
== Writes to such framebuffers are quite fast, but reads are much       ==
== slower and very sensitive to the alignment and the selection of      ==
== CPU instructions which are used for accessing memory.                ==
==                                                                      ==
== Many x86 systems allocate the framebuffer in the GPU memory,         ==
== accessible for the CPU via a relatively slow PCI-E bus. Moreover,    ==
== PCI-E is asymmetric and handles reads a lot worse than writes.       ==
==                                                                      ==
== If uncached framebuffer reads are reasonably fast (at least 100 MB/s ==
== or preferably >300 MB/s), then using the shadow framebuffer layer    ==
== is not necessary in Xorg DDX drivers, resulting in a nice overall    ==
== performance improvement. For example, the xf86-video-fbturbo DDX     ==
== uses this trick.                                                     ==
==========================================================================

 NEON LDP/STP copy (from framebuffer)                 :   2235.3 MB/s
 NEON LDP/STP 2-pass copy (from framebuffer)          :   1515.1 MB/s
 NEON LD1/ST1 copy (from framebuffer)                 :   2231.8 MB/s
 NEON LD1/ST1 2-pass copy (from framebuffer)          :   1525.6 MB/s
 ARM LDP/STP copy (from framebuffer)                  :   1911.1 MB/s
 ARM LDP/STP 2-pass copy (from framebuffer)           :   1175.7 MB/s

==========================================================================
== Memory latency test                                                  ==
==                                                                      ==
== Average time is measured for random memory accesses in the buffers   ==
== of different sizes. The larger is the buffer, the more significant   ==
== are relative contributions of TLB, L1/L2 cache misses and SDRAM      ==
== accesses. For extremely large buffer sizes we are expecting to see   ==
== page table walk with several requests to SDRAM for almost every      ==
== memory access (though 64MiB is not nearly large enough to experience ==
== this effect to its fullest).                                         ==
==                                                                      ==
== Note 1: All the numbers are representing extra time, which needs to  ==
==         be added to L1 cache latency. The cycle timings for L1 cache ==
==         latency can be usually found in the processor documentation. ==
== Note 2: Dual random read means that we are simultaneously performing ==
==         two independent memory accesses at a time. In the case if    ==
==         the memory subsystem can't handle multiple outstanding       ==
==         requests, dual random read has the same timings as two       ==
==         single reads performed one after another.                    ==
==========================================================================

block size : single random read / dual random read, [MADV_NOHUGEPAGE]
      1024 :    0.0 ns          /     0.0 ns
      2048 :    0.0 ns          /     0.0 ns
      4096 :    0.0 ns          /     0.0 ns
      8192 :    0.0 ns          /     0.0 ns
     16384 :    0.0 ns          /     0.0 ns
     32768 :    0.0 ns          /     0.0 ns
     65536 :    0.0 ns          /     0.0 ns
    131072 :    1.0 ns          /     1.5 ns
    262144 :    2.0 ns          /     2.8 ns
    524288 :    3.0 ns          /     3.9 ns
   1048576 :   19.3 ns          /    27.6 ns
   2097152 :   27.1 ns          /    34.1 ns
   4194304 :   31.9 ns          /    36.2 ns
   8388608 :   37.2 ns          /    37.3 ns
  16777216 :   52.8 ns          /    50.5 ns
  33554432 :  128.8 ns          /   168.6 ns
  67108864 :  194.1 ns          /   228.1 ns

block size : single random read / dual random read, [MADV_HUGEPAGE]
      1024 :    0.0 ns          /     0.0 ns
      2048 :    0.0 ns          /     0.0 ns
      4096 :    0.0 ns          /     0.0 ns
      8192 :    0.0 ns          /     0.0 ns
     16384 :    0.0 ns          /     0.0 ns
     32768 :    0.0 ns          /     0.0 ns
     65536 :    0.0 ns          /     0.0 ns
    131072 :    1.0 ns          /     1.5 ns
    262144 :    1.5 ns          /     2.1 ns
    524288 :    1.8 ns          /     2.4 ns
   1048576 :   17.5 ns          /    26.1 ns
   2097152 :   25.2 ns          /    32.3 ns
   4194304 :   28.9 ns          /    34.3 ns
   8388608 :   31.1 ns          /    35.0 ns
  16777216 :   41.8 ns          /    47.3 ns
  33554432 :  123.6 ns          /   163.9 ns
  67108864 :  175.6 ns          /   210.6 ns

Disk

  • ZHITAI TiPlus7100 512GB ,iozone 基准测试

  • 顺序读写 (1M): 表现强劲,读取 ~4.9 GB/s,写入 ~4.6 GB/s。这达到了 PCIe 4.0 x4 NVMe SSD 的主流高性能水平,确保了大文件传输、视频加载、系统启动/休眠的速度。

  • 4K 随机读写: 读取 ~74.5 MB/s,写入 ~203.7 MB/s。写入性能不错,读取性能相对普通。这反映了 SSD 控制器和闪存颗粒的特性。对于日常使用(系统操作、程序加载)影响不大,但对于高并发随机读密集的数据库类应用会是瓶颈。

基准

结果(MB/s)

iozone 4K 随机写入

203.7

iozone 4K 随机读取

74.5

iozone 4K 顺序写入

141.8

iozone 4K 顺序读取

195.5

iozone 1M 随机写入

4,677.2

iozone 1M 随机读取

4,240.2

iozone 1M 顺序写入

4,575.3

iozone 1M 顺序读取

4,911.3

WLAN(Wi-Fi 6 AX210 160MHz)

  • 测试结果受路由器和网络环境影响较大。单向上传 (TX) 达到 355 Mbps 和下载 (RX) 215 Mbps,表明 Intel AX210 网卡在良好的 Wi-Fi 6/6E (160MHz) 环境下能提供接近有线千兆网的无线速度。同时收发时性能有所下降是正常现象,但吞吐量总和 (278 Mbps) 依然可观,满足高清视频流、大文件同步、在线协作等需求绰绰有余。

测试项

结果

iperf3 发

355 Mbits/sec

iperf3 收

215 Mbits/sec

iperf3 同时收发

上传 (TX):161 Mbits/sec

下载 (RX):117 Mbits/sec

总结

MetaComputing AI PC + Framework Laptop 13 的组合,是一台极具特色和前瞻性的设备。它在多核CPU性能、系统响应速度、内存带宽和集成GPU性能上带来了远超传统ARM笔记本甚至部分X86轻薄本的惊喜表现。模块化设计赋予了它长久的生命力。

Logo

有“AI”的1024 = 2048,欢迎大家加入2048 AI社区

更多推荐