mpeg标准8x8DCT变换以及定点化实现

Mpeg1/2标准的DCT变换与H264的整数DCT不太一样，会有小数运算。硬件实现时会做定点化处理，因此也会产生误差，误差主要体现在某些数值小数部分在0.5左右。比如小数运算时数值为4.4999，四舍五入最终像素值为4；如果定点化结果为4.50001，实际上与小数运算差异不大，但最终像素值却相差1。1.DCT简介DCT和DCT反变换可用如下公式表示：其中，Xij是图像块 X 中第 i 行第 j

qq62

3320人浏览 · 2021-01-04 15:53:27

qq62 · 2021-01-04 15:53:27 发布

1.DCT简介

DCT和DCT反变换可用如下公式表示：
dct变换公式
其中，Xij是图像块 X 中第 i 行第 j 列图像或残差值，Ymn 是变换结果矩阵 Y 相应频率点上的 DCT 系数。可以用矩阵表示
$Y=AXA^T$
$X=A^TYA$

Mpeg1/2标准的DCT是8x8大小的。
令 $a=1/8a=\sqrt{1/8}$ ,
$b=12cos⁡(π8)b=\frac{1}{2}\cos{\left(\frac{\pi}{8}\right)}$ ,
$c=12cos⁡(3π8)c=\frac{1}{2}\cos{\left(\frac{3\pi}{8}\right)}$ ,
$d=12cos⁡(π16)d=\frac{1}{2}\cos{\left(\frac{\pi}{16}\right)}$ ,
$e=12cos⁡(316π)e=\frac{1}{2}\cos{\left(\frac{3}{16}\pi\right)}$ ,
$f=12cos⁡516πf=\frac{1}{2}\cos{\frac{5}{16}\pi}$ ,
$g=12cos⁡716πg=\frac{1}{2}\cos{\frac{7}{16}\pi}$ ,
则矩阵A可以如下表示
$(aaaaaaaadefg−g−f−e−dbc−c−b−b−ccbe−g−d−ffdg−ea−a−aaa−a−aaf−dge−e−gd−fc−bb−c−cb−bcg−fe−dd−ef−g)\begin{pmatrix} a & a & a & a& a& a& a& a \\ d & e&f &g&-g&-f&-e&-d \\ b&c&-c&-b&-b&-c&c&b \\ e&-g&-d&-f&f&d&g&-e \\ a& -a&-a&a&a&-a&-a&a \\ f&-d&g&e&-e&-g&d&-f \\ c&-b&b&-c&-c&b&-b&c\\ g&-f&e&-d&d&-e&f&-g \end{pmatrix}$
矩阵运算可以使用蝶形变换加速，以8x8反变换为例，分析 $A^TY$ 的第一列，
其中间结果用acc0-acc7表示，如下
$(acc0acc1acc2acc3)=(abacac−a−ba−c−aba−ba−c)(x0x2x4x6)\begin{pmatrix} acc0 \\ acc1 \\ acc 2 \\ acc3 \\ \end{pmatrix} =\begin{pmatrix} a&b& a& c\\ a&c&-a&-b \\ a&-c&-a&b\\ a&-b&a&-c \end{pmatrix}\begin{pmatrix} x0\\x2 \\ x4\\ x6 \end{pmatrix}$

$(acc4acc5acc6acc7)=(defge−g−d−ff−dgeg−fe−d)(x1x3x5x7)\begin{pmatrix} acc4 \\ acc5 \\ acc 6 \\ acc7 \\ \end{pmatrix} =\begin{pmatrix} d&e& f& g\\ e&-g&-d&-f \\ f&-d&g&e\\ g&-f&e&-d \end{pmatrix}\begin{pmatrix} x1\\x3 \\ x5\\ x7 \end{pmatrix}$
最终结果为
$y 0 = a c c 0 + a c c 4$
$y 1 = a c c 1 + a c c 5$
$y 2 = a c c 2 + a c c 6$
$y 3 = a c c 3 + a c c 7$
$y 4 = a c c 3 - a c c 7$
$y 5 = a c c 2 - a c c 6$
$y 6 = a c c 1 - a c c 5$
$y 7 = a c c 0 - a c c 4$

DCT反变换的几种python实现

按照公式直接计算

def normal_dct(block):
    A = np.zeros((8, 8))  # 生成0矩阵
    shape = src.shape[1]  # 获取维数
    for i in range(8):
        for j in range(8):
            if (i == 0):
                x = sqrt(1 / shape)
            else:
                x = sqrt(2 / shape)
            A[i][j] = x * cos(pi * (j + 0.5) * i / shape)  
    return np.dot(np.dot(A.T, src), A)

使用scipy的dct函数

from scipy.fftpack import dct, idct
def idct2(block):
    return idct(idct(block.T, norm = 'ortho').T, norm = 'ortho')

2.定点化实现

定点主要是把小数转成整型数据范围，通过整形加减以及移位操作替换小数运算。DCT运算过程中矩阵 $A$ 中的参数都是小数，需要转为整数进行运算。
小数可以用2的负数次幂相加来表示，以 $a=1/8a=\sqrt{1/8}$ 为例，可以表示为 $2^{-2} + 2^{-4} + 2^{-5} + 2^{-7} + 2^{-9}$ 。实现过程中以16个元素的数组来表示，每个数组元素所在数组下标对应2的幂次，则a的表项为a_table = [0,0,1,0,1,1,0,1,0,1,0,0,0,0,0,0]。

# a = sqrt(1/8) = 0.35355 = 2^-2 + 2^-4 + 2^-5 + 2^-7 + 2^-9
a_table = [0,0,1,0,1,1,0,1,0,1,0,0,0,0,0,0]
# b = (1/2*cos(PI/8)) = 0.46194 = 2^-1 - 2^-5 - 2^-7 + 2^-10
b_table = [0,1,0,0,0,-1,0,-1,0,0,1,0,0,0,0,0]
# c = (1/2*cos(PI*3/8)) = 0.19134 = 2^-3 + 2^-4 + 2^-8 - 2^-14
c_table = [0,0,0,1,1,0,0,0,1,0,0,0,0,0,-1,0]
# d = 1/2*cos(PI/16) = 0.49039 = 2^-1 - 2^-7 - 2^-9 + 2^-13
d_table = [0,1,0,0,0,0,0,-1,0,-1,0,0,0,1,0,0]
# e = 1/2*cos(PI*3/16) = 0.41573 = 2^-2 + 2^-3 + 2^-5 + 2^-7 + 2^-9 - 2^-12
e_table = [0,0,1,1,0,1,0,1,0,1,0,0,-1,0,0,0]
# f = 1/2*cos(PI*5/16) = 0.27779 = 2^-2 + 2^-5 - 2^-8 + 2^-11
f_table = [0,0,1,0,0,1,0,0,-1,0,0,1,0,0,0,0]
# g = 1/2*cos(PI*7/16) = 0.09755 = 2^-4 + 2^-5 + 2^-8 - 2^-13
g_table = [0,0,0,0,1,1,0,0,1,0,0,0,0,-1,0,0]
# 定点化IDCT
def fixpoint_idct(block):
    input = block.reshape(64, 1)
    idct_data1 = [0 for i in range(64)]
    idct_data2 = [0 for i in range(64)]
    tmp_data1 = idct_data1
    tmp_data2 = idct_data2

    for i in range(64):
        tmp_data1[i] = input[i] << 10
    #两次蝶形变换
    for k in range(2):
        for hor in range(8):
            a_x0 = b_x2 = a_x4 = c_x6 = 0
            c_x2 = b_x6 = 0
            d_x1 = e_x3 = f_x5 = g_x7 = 0
            e_x1 = g_x3 = d_x5 = f_x7 = 0
            f_x1 = d_x3 = g_x5 = e_x7 = 0
            g_x1 = f_x3 = e_x5 = d_x7 = 0
            i = hor << 3

            x0 = tmp_data1[i]
            x1 = tmp_data1[i+1]
            x2 = tmp_data1[i+2]
            x3 = tmp_data1[i+3]
            x4 = tmp_data1[i+4]
            x5 = tmp_data1[i+5]
            x6 = tmp_data1[i+6]
            x7 = tmp_data1[i+7]
			#遍历长度为16的查找表
            for j in range(16):
                a_x0 += (x0 >> j) * a_table[j]
                b_x2 += (x2 >> j) * b_table[j]
                a_x4 += (x4 >> j) * a_table[j]
                c_x6 += (x6 >> j) * c_table[j]
                c_x2 += (x2 >> j) * c_table[j]
                b_x6 += (x6 >> j) * b_table[j]

                d_x1 += (x1 >> j) * d_table[j]
                e_x1 += (x1 >> j) * e_table[j]
                f_x1 += (x1 >> j) * f_table[j]
                g_x1 += (x1 >> j) * g_table[j]

                d_x3 += (x3 >> j) * d_table[j]
                e_x3 += (x3 >> j) * e_table[j]
                f_x3 += (x3 >> j) * f_table[j]
                g_x3 += (x3 >> j) * g_table[j]
                d_x5 += (x5 >> j) * d_table[j]
                e_x5 += (x5 >> j) * e_table[j]
                f_x5 += (x5 >> j) * f_table[j]
                g_x5 += (x5 >> j) * g_table[j]
                d_x7 += (x7 >> j) * d_table[j]
                e_x7 += (x7 >> j) * e_table[j]
                f_x7 += (x7 >> j) * f_table[j]
                g_x7 += (x7 >> j) * g_table[j]

            acc0 = a_x0 + b_x2 + a_x4 + c_x6
            acc1 = a_x0 + c_x2 - a_x4 - b_x6
            acc2 = a_x0 - c_x2 - a_x4 + b_x6
            acc3 = a_x0 - b_x2 + a_x4 - c_x6
            acc4 = d_x1 + e_x3 + f_x5 + g_x7
            acc5 = e_x1 - g_x3 - d_x5 - f_x7
            acc6 = f_x1 - d_x3 + g_x5 + e_x7
            acc7 = g_x1 - f_x3 + e_x5 - d_x7

            tmp_data2[hor] = acc0 + acc4
            tmp_data2[(1<<3)+hor] = acc1 + acc5
            tmp_data2[(2<<3)+hor] = acc2 + acc6
            tmp_data2[(3<<3)+hor] = acc3 + acc7
            tmp_data2[(4<<3)+hor] = acc3 - acc7
            tmp_data2[(5<<3)+hor] = acc2 - acc6
            tmp_data2[(6<<3)+hor] = acc1 - acc5
            tmp_data2[(7<<3)+hor] = acc0 - acc4

        if k == 0:
            tmp_data1 = idct_data2
            tmp_data2 = idct_data1

    for i in range(64):
        tmp_data2[i] = (tmp_data2[i] + 512) >> 10
        if tmp_data2[i] > 255:
            tmp_data2[i] = 255
        if tmp_data2[i] < -256:
            tmp_data2[i] = -256

    return np.asarray(tmp_data2).reshape(8, 8)

2048 AI社区

有“AI”的1024 = 2048，欢迎大家加入2048 AI社区

更多推荐

当AI开始写代码，我们终于想起了自己是谁

历史绕了一个圈：2000年，资源稀缺逼你做工程师；2015年，资本扩张哄你做码农；2025年，责任回归，AI强迫你必须做回工程师。当AI开始写代码，我们终于想起了自己是谁——不是打字员，而是导演；不是搬砖工，而是指挥官。工程师的"签字权"回来了，系统思维和逻辑能力成了最保命的硬通货。

2048 AI社区

《160分钟踩坑实录：Streamlit + Ollama 聊天机器人环境配置全解析》

本文介绍了如何使用Python+Streamlit+Ollama搭建本地智能聊天机器人。主要内容包括：1) 核心代码实现，通过Ollama客户端调用本地部署的deepseek-r1:1.5b模型；2) 详细解决了开发过程中遇到的三大问题：虚拟环境激活、PowerShell脚本执行策略和包路径问题；3) 提供标准化操作流程，包括首次设置执行策略、激活虚拟环境和启动服务等步骤。该方案实现了基于本地私有