005 - Data Validation and Pydantic

Difficulty: 🟡 | Estimated time: 90 minutes | Prerequisite: 004 - Request Bodies and Response Models

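Learning objectives

  • Understand how Pydantic's data validation works internally
  • Write custom validators and apply advanced validation techniques
  • Handle complex data types, nested structures, and data conversion
  • Apply good practices for serialization and deserialization
  • Implement advanced validation scenarios found in enterprise applications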

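Pydantic fundamentals

Validator types compared

Validator type | When it runs | Typical use | Example
Field validator | When the field is validated, after type conversion | Single-field checks | @validator('field')
Root validator | After all fields | Cross-field checks | @root_validator
Pre-validator | Before type conversion | Data cleanup | @validator('field', pre=True)
Custom (constrained) type | During type checking | Complex types | constr(), conint()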

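Pydantic validation flow

Raw input → pre-validators → type conversion → field validators → root validators → validation result

  • Success: create the model instance → serialized output
  • Failure: raise ValidationError → collect error details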
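
The order of the stages above can be observed by logging each hook as it fires. This is a minimal sketch rather than anything from the original text: the `Probe` model and the `calls` list are hypothetical, and `skip_on_failure=True` is passed to `@root_validator` on the assumption that the snippet should also run under Pydantic v2's deprecated compatibility decorators.

```python
from pydantic import BaseModel, root_validator, validator

calls = []  # hypothetical log: (stage, detail) in the order the hooks run


class Probe(BaseModel):
    value: int

    @validator('value', pre=True)
    def strip_raw(cls, v):
        # Pre-validator: sees the raw input before it is converted to int
        calls.append(('pre', type(v).__name__))
        return v.strip() if isinstance(v, str) else v

    @validator('value')
    def check_positive(cls, v):
        # Field validator: runs after conversion, so v is already an int
        calls.append(('field', type(v).__name__))
        if v <= 0:
            raise ValueError('value must be positive')
        return v

    @root_validator(skip_on_failure=True)
    def check_model(cls, values):
        # Root validator: runs last, with all validated fields available
        calls.append(('root', sorted(values)))
        return values


probe = Probe(value=' 42 ')
print(calls)
print(probe.value)
```

The log shows the pre-validator receiving a `str` and the field validator receiving an `int`, which is the practical difference between the two stages.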

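1. What is Pydantic

Pydantic is a library for data validation and settings management based on Python type hints. The examples in this chapter use the Pydantic v1 API (@validator, @root_validator, class Config); Pydantic v2 replaces these with @field_validator, @model_validator, and model_config. Within FastAPI, Pydantic plays a central role and is responsible for:

  • Data validation
  • Serialization and deserialization
  • Automatic JSON Schema generation
  • Type conversion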
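
from datetime import datetime
from typing import Optional

from pydantic import BaseModel, Field, ValidationError

class User(BaseModel):
    id: int
    name: str
    email: str
    age: Optional[int] = None
    # default_factory runs on every instantiation; a bare datetime.now()
    # default would be evaluated only once, when the class is defined
    created_at: datetime = Field(default_factory=datetime.now)

# Valid data
try:
    user = User(
        id=1,
        name="张三",
        email="zhangsan@example.com",
        age=25
    )
    print(user)
except ValidationError as e:
    print(e)

# Invalid data
try:
    user = User(
        id="invalid",  # not an int: the only field that fails here
        name="",       # an empty string still passes a plain str field
        email="invalid-email"  # a plain str field does not check email format
    )
except ValidationError as e:
    print(e.json())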
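
Beyond printing the exception, `ValidationError.errors()` returns a list of dictionaries that is easier to handle in code. The sketch below assumes a cut-down, hypothetical `Account` model; it reads only the `loc` key because message wording and `type` codes differ between Pydantic versions.

```python
from pydantic import BaseModel, ValidationError


class Account(BaseModel):  # hypothetical model, for illustration only
    id: int
    name: str


def failed_fields(payload: dict) -> list:
    """Return the names of the fields that failed validation."""
    try:
        Account(**payload)
    except ValidationError as exc:
        # Each error dict carries 'loc' (path to the field), 'msg' and 'type'
        return [err['loc'][0] for err in exc.errors()]
    return []


bad = failed_fields({'id': 'invalid', 'name': ''})
ok = failed_fields({'id': 1, 'name': 'Zhang San'})
print(bad, ok)
```

Only `id` is reported for the bad payload: an empty string is still a valid `str`, so constraints such as a minimum length have to be declared explicitly.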

2. Validating basic data types

from pydantic import BaseModel, Field
from typing import List, Dict, Optional, Set, Tuple, Union
from datetime import datetime, date, time
from decimal import Decimal
from uuid import UUID

class DataTypes(BaseModel):
    # Basic types
    integer: int
    float_num: float
    string: str
    boolean: bool
    
    # Date and time types
    datetime_field: datetime
    date_field: date
    time_field: time
    
    # Numeric types
    decimal_field: Decimal
    
    # Identifiers
    uuid_field: UUID
    
    # Collection types
    list_field: List[str]
    dict_field: Dict[str, int]
    set_field: Set[str]
    tuple_field: Tuple[str, int, bool]
    
    # Union types
    union_field: Union[str, int]
    
    # Optional types
    optional_field: Optional[str] = None

# Sample data
data = {
    "integer": 42,
    "float_num": 3.14,
    "string": "Hello",
    "boolean": True,
    "datetime_field": "2023-12-01T10:00:00",
    "date_field": "2023-12-01",
    "time_field": "10:00:00",
    "decimal_field": "99.99",
    "uuid_field": "123e4567-e89b-12d3-a456-426614174000",
    "list_field": ["a", "b", "c"],
    "dict_field": {"key1": 1, "key2": 2},
    "set_field": ["x", "y", "z"],
    "tuple_field": ["hello", 123, True],
    "union_field": "can be string or int"
}

validated_data = DataTypes(**data)
print(validated_data)

🛡️ Field validation

1. Using the Field function

from pydantic import BaseModel, Field
from typing import List, Optional

class UserRegistration(BaseModel):
    # Basic constraints
    username: str = Field(
        ...,  # required field
        min_length=3,
        max_length=20,
        description="Username, 3-20 characters"
    )
    
    # Regular-expression validation (regex is the Pydantic v1 keyword; v2 renames it to pattern)
    email: str = Field(
        ...,
        regex=r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$',
        description="A valid email address"
    )
    
    # Numeric range
    age: int = Field(
        ...,
        ge=0,    # greater than or equal to 0
        le=120,  # less than or equal to 120
        description="Age, 0-120"
    )
    
    # Password length
    password: str = Field(
        ...,
        min_length=8,
        max_length=128,
        description="Password, at least 8 characters"
    )
    
    # Optional field: digits only, with an optional leading +
    phone: Optional[str] = Field(
        None,
        regex=r'^\+?1?\d{9,15}$',
        description="Phone number"
    )
    
    # List length limit
    interests: List[str] = Field(
        default_factory=list,
        max_items=10,
        description="Interests, at most 10"
    )
    
    # Restricted string choices
    gender: Optional[str] = Field(
        None,
        regex=r'^(male|female|other)$',
        description="Gender: male, female, other"
    )
    
    class Config:
        schema_extra = {
            "example": {
                "username": "zhangsan",
                "email": "zhangsan@example.com",
                "age": 25,
                "password": "securepassword123",
                # No hyphens: the phone pattern above accepts digits only
                "phone": "+8613800138000",
                "interests": ["编程", "阅读", "旅行"],
                "gender": "male"
            }
        }
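
Pydantic reports every constraint violation in one pass rather than stopping at the first. A minimal sketch, assuming a hypothetical cut-down `Signup` model that uses only the length and range keywords shown above:

```python
from pydantic import BaseModel, Field, ValidationError


class Signup(BaseModel):  # hypothetical model, for illustration only
    username: str = Field(..., min_length=3, max_length=20)
    age: int = Field(..., ge=0, le=120)


def violations(payload: dict) -> dict:
    """Map each failing field name to its error message."""
    try:
        Signup(**payload)
    except ValidationError as exc:
        return {err['loc'][0]: err['msg'] for err in exc.errors()}
    return {}


result = violations({'username': 'ab', 'age': 150})
clean = violations({'username': 'zhangsan', 'age': 25})
print(result)
```

Both `username` and `age` appear in `result`, so a client can be shown all problems with a form at once.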

2. Custom validators

from pydantic import BaseModel, validator, root_validator
from typing import Optional
import re
from datetime import date

class UserProfile(BaseModel):
    username: str
    email: str
    password: str
    confirm_password: str
    birth_date: Optional[date] = None
    website: Optional[str] = None
    
    @validator('username')
    def validate_username(cls, v):
        """Validate the username."""
        if not v.isalnum():
            raise ValueError('Username may contain only letters and digits')
        if v.lower() in ['admin', 'root', 'user', 'test']:
            raise ValueError('Username must not be a reserved word')
        return v.lower()  # normalize to lowercase
    
    @validator('email')
    def validate_email(cls, v):
        """Validate the email address."""
        email_pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
        if not re.match(email_pattern, v):
            raise ValueError('Invalid email format')
        return v.lower()
    
    @validator('password')
    def validate_password(cls, v):
        """Validate password strength."""
        if len(v) < 8:
            raise ValueError('Password must be at least 8 characters long')
        
        if not re.search(r'[A-Z]', v):
            raise ValueError('Password must contain at least one uppercase letter')
        
        if not re.search(r'[a-z]', v):
            raise ValueError('Password must contain at least one lowercase letter')
        
        if not re.search(r'\d', v):
            raise ValueError('Password must contain at least one digit')
        
        if not re.search(r'[!@#$%^&*(),.?":{}|<>]', v):
            raise ValueError('Password must contain at least one special character')
        
        return v
    
    @validator('birth_date')
    def validate_birth_date(cls, v):
        """Validate the date of birth."""
        if v is None:
            return v
        
        today = date.today()
        if v > today:
            raise ValueError('Date of birth cannot be in the future')
        
        # Subtract one if this year's birthday has not happened yet
        age = today.year - v.year - ((today.month, today.day) < (v.month, v.day))
        if age > 120:
            raise ValueError('Age cannot exceed 120')
        
        return v
    
    @validator('website')
    def validate_website(cls, v):
        """Validate the website URL."""
        if v is None:
            return v
        
        url_pattern = r'^https?://(?:[-\w.])+(?:[:\d]+)?(?:/(?:[\w/_.])*(?:\?(?:[\w&=%.])*)?(?:#(?:\w*))?)?$'
        if not re.match(url_pattern, v):
            raise ValueError('Invalid website URL format')
        
        return v
    
    @root_validator
    def validate_passwords_match(cls, values):
        """Check that password and confirmation match."""
        # A field that failed its own validation is absent from values, hence .get()
        password = values.get('password')
        confirm_password = values.get('confirm_password')
        
        if password and confirm_password and password != confirm_password:
            raise ValueError('Password and confirmation do not match')
        
        return values
    
    @root_validator
    def validate_email_username_different(cls, values):
        """Check that the username differs from the email's local part."""
        username = values.get('username')
        email = values.get('email')
        
        if username and email and username == email.split('@')[0]:
            raise ValueError('Username must not equal the part of the email before @')
        
        return values
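
A validator that raises on the first failing rule reports one problem per attempt, so fixing a weak password can take several round trips. One possible alternative, sketched here with a hypothetical `password_problems` helper that is not part of Pydantic, evaluates every rule and returns all unmet ones; a validator could then raise a single `ValueError` that lists them.

```python
import re

# (check, description) pairs mirroring the rules in validate_password above
PASSWORD_RULES = [
    (lambda p: len(p) >= 8, 'at least 8 characters'),
    (lambda p: re.search(r'[A-Z]', p), 'an uppercase letter'),
    (lambda p: re.search(r'[a-z]', p), 'a lowercase letter'),
    (lambda p: re.search(r'\d', p), 'a digit'),
    (lambda p: re.search(r'[!@#$%^&*(),.?":{}|<>]', p), 'a special character'),
]


def password_problems(password: str) -> list:
    """Return the description of every rule the password does not satisfy."""
    return [description for check, description in PASSWORD_RULES
            if not check(password)]


print(password_problems('abc'))
print(password_problems('Abcdef1!'))
```

Inside a validator this could be used as `problems = password_problems(v)` followed by `raise ValueError('Password needs: ' + ', '.join(problems))` when the list is non-empty.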

3. Conditional validation

import re
from datetime import date
from enum import Enum
from typing import Optional

from pydantic import BaseModel, validator, Field

class PaymentMethod(str, Enum):
    credit_card = "credit_card"
    paypal = "paypal"
    bank_transfer = "bank_transfer"
    cash = "cash"

class PaymentInfo(BaseModel):
    method: PaymentMethod
    amount: float = Field(..., gt=0, description="Payment amount")
    
    # Credit card fields
    card_number: Optional[str] = None
    card_holder: Optional[str] = None
    expiry_date: Optional[str] = None
    cvv: Optional[str] = None
    
    # PayPal fields
    paypal_email: Optional[str] = None
    
    # Bank transfer fields
    bank_account: Optional[str] = None
    bank_name: Optional[str] = None
    
    # always=True matters: without it, Pydantic v1 skips validators for fields
    # that were omitted, so the "required for this method" checks never run
    @validator('card_number', always=True)
    def validate_card_number(cls, v, values):
        """Validate the credit card number."""
        if values.get('method') == PaymentMethod.credit_card:
            if not v:
                raise ValueError('Credit card payments require a card number')
            
            # Strip spaces and hyphens
            card_num = re.sub(r'[\s-]', '', v)
            
            # Check that it is all digits and 13-19 long
            if not card_num.isdigit() or not 13 <= len(card_num) <= 19:
                raise ValueError('Invalid credit card number format')
            
            # Luhn checksum
            def luhn_check(card_number):
                def digits_of(n):
                    return [int(d) for d in str(n)]
                
                digits = digits_of(card_number)
                odd_digits = digits[-1::-2]
                even_digits = digits[-2::-2]
                checksum = sum(odd_digits)
                for d in even_digits:
                    checksum += sum(digits_of(d*2))
                return checksum % 10 == 0
            
            if not luhn_check(card_num):
                raise ValueError('Invalid credit card number')
            
            return card_num
        
        return v
    
    @validator('card_holder', always=True)
    def validate_card_holder(cls, v, values):
        """Validate the cardholder name."""
        if values.get('method') == PaymentMethod.credit_card:
            if not v:
                raise ValueError('Credit card payments require a cardholder name')
            if len(v.strip()) < 2:
                raise ValueError('Cardholder name must be at least 2 characters')
        return v
    
    @validator('expiry_date', always=True)
    def validate_expiry_date(cls, v, values):
        """Validate the expiry date."""
        if values.get('method') == PaymentMethod.credit_card:
            if not v:
                raise ValueError('Credit card payments require an expiry date')
            
            # Accept MM/YY or MM/YYYY
            if not re.match(r'^(0[1-9]|1[0-2])/(\d{2}|\d{4})$', v):
                raise ValueError('Expiry date must be in MM/YY or MM/YYYY format')
            
            month, year = v.split('/')
            if len(year) == 2:
                year = '20' + year
            
            # A card is valid through the end of its expiry month, so compare
            # (year, month) pairs instead of the first day of that month
            today = date.today()
            if (int(year), int(month)) < (today.year, today.month):
                raise ValueError('The credit card has expired')
        
        return v
    
    @validator('cvv', always=True)
    def validate_cvv(cls, v, values):
        """Validate the CVV."""
        if values.get('method') == PaymentMethod.credit_card:
            if not v:
                raise ValueError('Credit card payments require a CVV')
            if not v.isdigit() or len(v) not in [3, 4]:
                raise ValueError('CVV must be 3 or 4 digits')
        return v
    
    @validator('paypal_email', always=True)
    def validate_paypal_email(cls, v, values):
        """Validate the PayPal email."""
        if values.get('method') == PaymentMethod.paypal:
            if not v:
                raise ValueError('PayPal payments require an email address')
            
            email_pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
            if not re.match(email_pattern, v):
                raise ValueError('Invalid PayPal email format')
        return v
    
    @validator('bank_account', always=True)
    def validate_bank_account(cls, v, values):
        """Validate the bank account number."""
        if values.get('method') == PaymentMethod.bank_transfer:
            if not v:
                raise ValueError('Bank transfers require an account number')
            if not v.isdigit() or len(v) < 10:
                raise ValueError('Bank account number must be at least 10 digits')
        return v
    
    @validator('bank_name', always=True)
    def validate_bank_name(cls, v, values):
        """Validate the bank name."""
        if values.get('method') == PaymentMethod.bank_transfer:
            if not v:
                raise ValueError('Bank transfers require a bank name')
        return v
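
The Luhn checksum used for the card number is easier to follow and test as a standalone function. The sketch below is an equivalent formulation, not the code above: `luhn_valid` is a hypothetical name, and it walks the digits from the right, doubling every second one.

```python
import re


def luhn_valid(card_number: str) -> bool:
    """Return True if the number passes the Luhn checksum."""
    digits = re.sub(r'[\s-]', '', card_number)  # drop spaces and hyphens
    if not digits.isdigit():
        return False
    total = 0
    for position, char in enumerate(reversed(digits)):
        d = int(char)
        if position % 2 == 1:  # every second digit, counting from the right
            d *= 2
            if d > 9:          # same as adding the two digits of the product
                d -= 9
        total += d
    return total % 10 == 0


print(luhn_valid('4111 1111 1111 1111'))  # a widely used test number
print(luhn_valid('4111 1111 1111 1112'))
```

Subtracting 9 from a doubled digit above 9 gives the same result as summing its two digits, which is what `sum(digits_of(d*2))` does in the validator.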

🔄 Data conversion and serialization

1. Automatic type conversion

from pydantic import BaseModel
from datetime import datetime
from typing import List

class AutoConversion(BaseModel):
    integer_from_string: int
    float_from_string: float
    boolean_from_string: bool
    datetime_from_string: datetime
    list_from_string: List[int]

# Automatic conversion example
data = {
    "integer_from_string": "42",
    "float_from_string": "3.14",
    "boolean_from_string": "true",
    "datetime_from_string": "2023-12-01T10:00:00",
    "list_from_string": ["1", "2", "3"]
}

converted = AutoConversion(**data)
print(converted)
print(type(converted.integer_from_string))  # <class 'int'>
print(type(converted.datetime_from_string))  # <class 'datetime.datetime'>

2. Custom serialization

from pydantic import BaseModel, Field
from datetime import datetime
from typing import Dict, Any
import json

class CustomSerialization(BaseModel):
    name: str
    created_at: datetime
    metadata: Dict[str, Any] = {}
    
    def dict(self, **kwargs) -> Dict[str, Any]:
        """Custom dictionary serialization."""
        data = super().dict(**kwargs)
        # Format the datetime as a string
        if 'created_at' in data:
            data['created_at'] = data['created_at'].strftime('%Y-%m-%d %H:%M:%S')
        return data
    
    def json(self, **kwargs) -> str:
        """Custom JSON serialization."""
        return json.dumps(self.dict(**kwargs), ensure_ascii=False, indent=2)
    
    class Config:
        # Custom JSON encoders, used by the default BaseModel.json();
        # the json() override above calls json.dumps directly and bypasses them
        json_encoders = {
            datetime: lambda v: v.strftime('%Y-%m-%d %H:%M:%S')
        }

# Usage example
obj = CustomSerialization(
    name="测试对象",
    created_at=datetime.now(),
    metadata={"version": "1.0", "author": "张三"}
)

print(obj.dict())
print(obj.json())
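
Overriding `dict()` formats only the top-level `created_at`; a datetime nested inside `metadata` would still reach `json.dumps` unconverted. One possible way around that, sketched here with the standard library only, is a `default=` hook, which `json.dumps` calls for any object it cannot serialize at any depth. The `encode_extra` name and the sample payload are hypothetical.

```python
import json
from datetime import datetime


def encode_extra(obj):
    """Fallback encoder for values json.dumps cannot serialize itself."""
    if isinstance(obj, datetime):
        return obj.strftime('%Y-%m-%d %H:%M:%S')
    raise TypeError(f'{type(obj).__name__} is not JSON serializable')


payload = {
    'name': 'test object',
    'created_at': datetime(2023, 12, 1, 10, 0, 0),
    'metadata': {'reviewed_at': datetime(2023, 12, 2, 9, 30, 0)},
}
text = json.dumps(payload, default=encode_extra, ensure_ascii=False)
print(text)
```

Raising `TypeError` for unknown types keeps the standard behaviour for everything the hook does not explicitly handle.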

3. Field aliases and exclusion

from pydantic import BaseModel, Field
from typing import Optional

class UserWithAlias(BaseModel):
    user_id: int = Field(alias="id")
    user_name: str = Field(alias="name")
    email_address: str = Field(alias="email")
    password_hash: str = Field(alias="password")
    is_active: bool = Field(True, alias="active")
    internal_notes: Optional[str] = Field(None, exclude=True)  # always left out of dict()/json() output
    
    class Config:
        allow_population_by_field_name = True  # accept either the field name or its alias as input

# Create an object using aliases
user_data = {
    "id": 1,
    "name": "张三",
    "email": "zhangsan@example.com",
    "password": "hashed_password",
    "active": True
}

user = UserWithAlias(**user_data)
print(user.dict())  # keys are field names
print(user.dict(by_alias=True))  # keys are aliases
print(user.dict(exclude={'password_hash'}))  # leave out the password field

🏗️ Complex data structures

1. Validating nested models

from pydantic import BaseModel, Field, validator
from typing import List, Optional
from datetime import datetime
from enum import Enum

class OrderStatus(str, Enum):
    pending = "pending"
    confirmed = "confirmed"
    shipped = "shipped"
    delivered = "delivered"
    cancelled = "cancelled"

class Address(BaseModel):
    street: str
    city: str
    state: str
    # country is declared before zip_code on purpose: a validator's values
    # argument only contains fields declared earlier than the one being validated
    country: str = "中国"
    zip_code: str
    
    @validator('zip_code')
    def validate_zip_code(cls, v, values):
        country = values.get('country', '中国')
        if country == '中国':
            if not v.isdigit() or len(v) != 6:
                raise ValueError('Chinese postal codes must be 6 digits')
        elif country == '美国':
            if not v.isdigit() or len(v) != 5:
                raise ValueError('US ZIP codes must be 5 digits')
        return v

class Product(BaseModel):
    id: int
    name: str
    price: float
    category: str
    
    @validator('price')
    def validate_price(cls, v):
        if v <= 0:
            raise ValueError('Price must be greater than 0')
        return round(v, 2)

class OrderItem(BaseModel):
    product: Product
    quantity: int
    unit_price: float
    
    @validator('quantity')
    def validate_quantity(cls, v):
        if v <= 0:
            raise ValueError('Quantity must be greater than 0')
        return v
    
    @validator('unit_price')
    def validate_unit_price(cls, v, values):
        product = values.get('product')
        if product and abs(v - product.price) > 0.01:
            raise ValueError('Unit price does not match the product price')
        return v
    
    @property
    def total_price(self) -> float:
        return round(self.quantity * self.unit_price, 2)

class Customer(BaseModel):
    id: int
    name: str
    email: str
    phone: Optional[str] = None
    
class Order(BaseModel):
    id: int
    customer: Customer
    items: List[OrderItem]
    shipping_address: Address
    billing_address: Optional[Address] = None
    status: OrderStatus = OrderStatus.pending
    # default_factory gives each order its own timestamp; a bare
    # datetime.now() default would be fixed at class-definition time
    order_date: datetime = Field(default_factory=datetime.now)
    notes: Optional[str] = None
    
    @validator('items')
    def validate_items(cls, v):
        if not v:
            raise ValueError('An order must contain at least one item')
        if len(v) > 50:
            raise ValueError('An order cannot contain more than 50 items')
        return v
    
    @validator('billing_address', always=True)
    def validate_billing_address(cls, v, values):
        if v is None:
            # No billing address given: fall back to the shipping address
            return values.get('shipping_address')
        return v
    
    @property
    def total_amount(self) -> float:
        return round(sum(item.total_price for item in self.items), 2)
    
    @property
    def total_items(self) -> int:
        return sum(item.quantity for item in self.items)
# Usage example
order_data = {
    "id": 1001,
    "customer": {
        "id": 1,
        "name": "张三",
        "email": "zhangsan@example.com",
        "phone": "+86-138-0013-8000"
    },
    "items": [
        {
            "product": {
                "id": 1,
                "name": "iPhone 15",
                "price": 999.99,
                "category": "电子产品"
            },
            "quantity": 1,
            "unit_price": 999.99
        },
        {
            "product": {
                "id": 2,
                "name": "保护壳",
                "price": 29.99,
                "category": "配件"
            },
            "quantity": 2,
            "unit_price": 29.99
        }
    ],
    "shipping_address": {
        "street": "中关村大街1号",
        "city": "北京",
        "state": "北京市",
        "zip_code": "100000",
        "country": "中国"
    },
    "status": "confirmed",
    "notes": "请在工作日配送"
}

order = Order(**order_data)
print(f"Order total: {order.total_amount}")
print(f"Total items: {order.total_items}")
print(order.json(indent=2, ensure_ascii=False))
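
When a nested model fails, the `loc` tuple of each error gives the full path to the offending value, including list indexes. A minimal sketch, assuming hypothetical `Line` and `Cart` models that are much smaller than the `Order` model above:

```python
from typing import List

from pydantic import BaseModel, Field, ValidationError


class Line(BaseModel):  # hypothetical models, for illustration only
    sku: str
    quantity: int = Field(..., gt=0)


class Cart(BaseModel):
    lines: List[Line]


def error_paths(payload: dict) -> list:
    """Return dotted paths such as 'lines.1.quantity' for each error."""
    try:
        Cart(**payload)
    except ValidationError as exc:
        return ['.'.join(str(part) for part in err['loc'])
                for err in exc.errors()]
    return []


paths = error_paths({
    'lines': [
        {'sku': 'A', 'quantity': 1},
        {'sku': 'B', 'quantity': 0},  # violates gt=0
    ]
})
print(paths)
```

A dotted path like this maps directly onto a form field or a JSON pointer when reporting errors back to a client.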

2. Dynamic models and generics

from datetime import datetime
from typing import Any, Dict, Generic, List, Optional, Type, TypeVar

from pydantic import BaseModel, Field, create_model
from pydantic.generics import GenericModel  # Pydantic v1 base class for generic models

# Generic models
T = TypeVar('T')

class ApiResponse(GenericModel, Generic[T]):
    success: bool
    message: str
    data: T
    timestamp: datetime = Field(default_factory=datetime.now)

class PaginatedResponse(GenericModel, Generic[T]):
    items: List[T]
    total: int
    page: int
    size: int
    pages: int

# Using the generics
class User(BaseModel):
    id: int
    name: str
    email: str

# Create responses for a specific type
UserResponse = ApiResponse[User]
UserListResponse = PaginatedResponse[User]

# Create a model dynamically
def create_dynamic_model(fields: Dict[str, Any]) -> Type[BaseModel]:
    """Create a Pydantic model class at runtime."""
    return create_model('DynamicModel', **fields)

# Example: build a model from configuration; each value is (type, default)
config_fields = {
    'name': (str, ...),
    'age': (int, 18),
    'email': (Optional[str], None)
}

DynamicUser = create_dynamic_model(config_fields)
user = DynamicUser(name="张三", age=25, email="zhangsan@example.com")
print(user)
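
`PaginatedResponse` carries a `pages` field, but the text does not show how to fill it. The following is a sketch under the assumption of 1-based page numbers; `page_count` and `paginate` are hypothetical helpers that return a plain dictionary with the same keys as the model.

```python
def page_count(total: int, size: int) -> int:
    """Number of pages needed for total items at size items per page."""
    if size <= 0:
        raise ValueError('size must be positive')
    return (total + size - 1) // size  # integer ceiling division


def paginate(items: list, page: int, size: int) -> dict:
    """Slice items for a 1-based page and compute the paging metadata."""
    start = (page - 1) * size
    return {
        'items': items[start:start + size],
        'total': len(items),
        'page': page,
        'size': size,
        'pages': page_count(len(items), size),
    }


result = paginate(list(range(25)), page=3, size=10)
print(result)
```

The resulting dictionary could be passed to a concrete model such as `PaginatedResponse[int](**result)` for validation.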

🎯 Case study: a form validation system

from fastapi import FastAPI, HTTPException, status
from pydantic import BaseModel, validator, Field, EmailStr
from typing import List, Optional, Dict, Any
from datetime import datetime, date
from enum import Enum
import re

app = FastAPI(title="Form Validation System")

# Enum definitions
class Gender(str, Enum):
    male = "male"
    female = "female"
    other = "other"
    prefer_not_to_say = "prefer_not_to_say"

class EducationLevel(str, Enum):
    high_school = "high_school"
    bachelor = "bachelor"
    master = "master"
    phd = "phd"
    other = "other"

class EmploymentStatus(str, Enum):
    employed = "employed"
    unemployed = "unemployed"
    student = "student"
    retired = "retired"
    self_employed = "self_employed"

# A complex form model
class JobApplicationForm(BaseModel):
    # Basic information
    first_name: str = Field(..., min_length=1, max_length=50, description="Given name")
    last_name: str = Field(..., min_length=1, max_length=50, description="Family name")
    email: EmailStr = Field(..., description="Email address")
    phone: str = Field(..., description="Phone number")
    
    # Personal information
    birth_date: date = Field(..., description="Date of birth")
    gender: Gender = Field(..., description="Gender")
    nationality: str = Field(..., min_length=2, max_length=50, description="Nationality")
    
    # Address
    address_line1: str = Field(..., min_length=5, max_length=100, description="Address line 1")
    address_line2: Optional[str] = Field(None, max_length=100, description="Address line 2")
    city: str = Field(..., min_length=2, max_length=50, description="City")
    state_province: str = Field(..., min_length=2, max_length=50, description="State/province")
    # country comes before postal_code so that the postal_code validator
    # can read it from values (only earlier fields are available there)
    country: str = Field(..., min_length=2, max_length=50, description="Country")
    postal_code: str = Field(..., description="Postal code")
    
    # Education
    education_level: EducationLevel = Field(..., description="Education level")
    university: Optional[str] = Field(None, max_length=100, description="University")
    major: Optional[str] = Field(None, max_length=100, description="Major")
    graduation_year: Optional[int] = Field(None, ge=1950, le=2030, description="Graduation year")
    gpa: Optional[float] = Field(None, ge=0.0, le=4.0, description="GPA")
    
    # Work experience
    employment_status: EmploymentStatus = Field(..., description="Employment status")
    current_company: Optional[str] = Field(None, max_length=100, description="Current company")
    current_position: Optional[str] = Field(None, max_length=100, description="Current position")
    years_of_experience: int = Field(..., ge=0, le=50, description="Years of experience")
    
    # Skills and languages
    skills: List[str] = Field(..., min_items=1, max_items=20, description="Skills")
    languages: List[str] = Field(..., min_items=1, max_items=10, description="Languages")
    
    # Application details
    position_applied: str = Field(..., min_length=2, max_length=100, description="Position applied for")
    expected_salary: Optional[float] = Field(None, gt=0, description="Expected salary")
    available_start_date: date = Field(..., description="Earliest start date")
    
    # Additional information
    cover_letter: Optional[str] = Field(None, max_length=2000, description="Cover letter")
    references: Optional[List[str]] = Field(None, max_items=5, description="References")
    
    # Consent
    agree_to_terms: bool = Field(..., description="Agrees to the terms and conditions")
    agree_to_privacy: bool = Field(..., description="Agrees to the privacy policy")
    
    @validator('phone')
    def validate_phone(cls, v):
        """Validate the phone number."""
        # Count digits only, ignoring +, spaces and hyphens
        phone_digits = re.sub(r'\D', '', v)
        
        # Check the length
        if len(phone_digits) < 10 or len(phone_digits) > 15:
            raise ValueError('Phone number must contain 10-15 digits')
        
        return v
    
    @validator('birth_date')
    def validate_birth_date(cls, v):
        """Validate the date of birth."""
        today = date.today()
        # Subtract one if this year's birthday has not happened yet
        age = today.year - v.year - ((today.month, today.day) < (v.month, v.day))
        
        if v > today:
            raise ValueError('Date of birth cannot be in the future')
        
        if age < 16:
            raise ValueError('Applicants must be at least 16 years old')
        
        if age > 100:
            raise ValueError('Please check that the date of birth is correct')
        
        return v
    
    @validator('postal_code')
    def validate_postal_code(cls, v, values):
        """Validate the postal code according to the country."""
        # Only fields declared before postal_code are present in values
        country = values.get('country', '').lower()
        
        if country in ['中国', 'china']:
            if not re.match(r'^\d{6}$', v):
                raise ValueError('Chinese postal codes must be 6 digits')
        elif country in ['美国', 'usa', 'united states']:
            if not re.match(r'^\d{5}(-\d{4})?$', v):
                raise ValueError('US ZIP code format: 12345 or 12345-6789')
        elif country in ['加拿大', 'canada']:
            if not re.match(r'^[A-Za-z]\d[A-Za-z] \d[A-Za-z]\d$', v):
                raise ValueError('Canadian postal code format: A1A 1A1')
        
        return v.upper()
    
    @validator('graduation_year')
    def validate_graduation_year(cls, v, values):
        """Validate the graduation year."""
        if v is None:
            return v
        
        birth_date = values.get('birth_date')
        if birth_date:
            min_graduation_age = 16
            earliest_graduation = birth_date.year + min_graduation_age
            
            if v < earliest_graduation:
                raise ValueError(f'Graduation year cannot be earlier than {earliest_graduation}')
        
        current_year = datetime.now().year
        if v > current_year + 10:
            raise ValueError('Graduation year cannot be more than 10 years in the future')
        
        return v
    
    @validator('skills')
    def validate_skills(cls, v):
        """Validate the skills list."""
        # Normalize, drop blanks, and remove duplicates while keeping input order
        cleaned = (skill.strip().lower() for skill in v)
        unique_skills = list(dict.fromkeys(s for s in cleaned if s))
        
        if len(unique_skills) < 1:
            raise ValueError('At least one skill is required')
        
        # Check the length of each skill name
        for skill in unique_skills:
            if len(skill) < 2 or len(skill) > 50:
                raise ValueError('Each skill name must be 2-50 characters')
        
        return unique_skills
    
    @validator('available_start_date')
    def validate_available_start_date(cls, v):
        """Validate the earliest start date."""
        today = date.today()
        
        if v < today:
            raise ValueError('Start date cannot be earlier than today')
        
        # No more than one year ahead
        from datetime import timedelta
        max_date = today + timedelta(days=365)
        if v > max_date:
            raise ValueError('Start date cannot be more than one year from now')
        
        return v
    
    @validator('agree_to_terms')
    def validate_terms_agreement(cls, v):
        """Require agreement to the terms."""
        if not v:
            raise ValueError('You must agree to the terms and conditions to submit an application')
        return v
    
    @validator('agree_to_privacy')
    def validate_privacy_agreement(cls, v):
        """Require agreement to the privacy policy."""
        if not v:
            raise ValueError('You must agree to the privacy policy to submit an application')
        return v
    
    class Config:
        schema_extra = {
            "example": {
                "first_name": "三",
                "last_name": "张",
                "email": "zhangsan@example.com",
                "phone": "+86-138-0013-8000",
                "birth_date": "1995-06-15",
                "gender": "male",
                "nationality": "中国",
                "address_line1": "中关村大街1号",
                "city": "北京",
                "state_province": "北京市",
                "postal_code": "100000",
                "country": "中国",
                "education_level": "bachelor",
                "university": "清华大学",
                "major": "计算机科学",
                "graduation_year": 2018,
                "gpa": 3.8,
                "employment_status": "employed",
                "current_company": "科技公司",
                "current_position": "软件工程师",
                "years_of_experience": 5,
                "skills": ["Python", "FastAPI", "数据库", "机器学习"],
                "languages": ["中文", "英语"],
                "position_applied": "高级软件工程师",
                "expected_salary": 25000.0,
                "available_start_date": "2024-01-15",
                "cover_letter": "我对这个职位非常感兴趣...",
                "agree_to_terms": True,
                "agree_to_privacy": True
            }
        }

# API endpoints
@app.post("/applications/", status_code=status.HTTP_201_CREATED)
def submit_application(application: JobApplicationForm):
    """Submit a job application."""
    # Persistence (for example saving to a database) would go here
    
    return {
        "message": "Application submitted successfully",
        "application_id": "APP-2023-001",
        "submitted_at": datetime.now(),
        "applicant_name": f"{application.last_name}{application.first_name}",
        "position": application.position_applied,
        "status": "under_review"
    }

# POST rather than GET: the form is sent as a JSON request body, and
# request bodies on GET are poorly supported by clients and proxies
@app.post("/applications/validate")
def validate_application_data(application: JobApplicationForm):
    """Validate application data without saving it."""
    today = date.today()
    birth = application.birth_date
    # Subtract one if this year's birthday has not happened yet
    age = today.year - birth.year - ((today.month, today.day) < (birth.month, birth.day))
    return {
        "message": "Validation passed",
        "validation_summary": {
            "applicant_name": f"{application.last_name}{application.first_name}",
            "age": age,
            "skills_count": len(application.skills),
            "languages_count": len(application.languages),
            "has_experience": application.years_of_experience > 0,
            "education_level": application.education_level
        }
    }
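
The form derives an age from `birth_date` in more than one place. The tuple comparison it relies on can be factored into a small helper that accepts a reference date, which makes the boundary cases easy to test. `age_on` is a hypothetical name, and the sketch uses only the standard library.

```python
from datetime import date
from typing import Optional


def age_on(birth_date: date, today: Optional[date] = None) -> int:
    """Full years elapsed between birth_date and today."""
    today = today or date.today()
    # (month, day) tuples compare lexicographically, so this is True once
    # the birthday has been reached in the reference year
    had_birthday = (today.month, today.day) >= (birth_date.month, birth_date.day)
    return today.year - birth_date.year - (0 if had_birthday else 1)


print(age_on(date(1995, 6, 15), date(2024, 6, 14)))
print(age_on(date(1995, 6, 15), date(2024, 6, 15)))
```

A plain year subtraction would report the same age for both calls, overstating it on the day before the birthday.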

🔧 Advanced techniques

1. A custom validation decorator

from functools import wraps
from typing import Any

from pydantic import BaseModel, validator

def conditional_validator(condition_field: str, condition_value: Any):
    """Run the wrapped validator only when another field has a given value."""
    def decorator(func):
        @wraps(func)
        def wrapper(cls, v, values):
            if values.get(condition_field) == condition_value:
                return func(cls, v, values)
            return v
        return wrapper
    return decorator

class ConditionalModel(BaseModel):
    type: str  # declared before value so the validators can read it from values
    value: str
    
    @validator('value')
    @conditional_validator('type', 'email')
    def validate_email_value(cls, v, values):
        if '@' not in v:
            raise ValueError('Invalid email format')
        return v
    
    @validator('value')
    @conditional_validator('type', 'phone')
    def validate_phone_value(cls, v, values):
        if not v.isdigit():
            raise ValueError('Phone number may contain digits only')
        return v

2. Batch validation

from typing import Dict, List, Type

# EmailStr requires the optional email-validator package
from pydantic import BaseModel, EmailStr, Field, ValidationError

def validate_batch(model_class: Type[BaseModel], data_list: List[Dict]) -> Dict:
    """Validate a list of records and separate valid from invalid ones."""
    results = {
        "valid": [],
        "invalid": [],
        "summary": {
            "total": len(data_list),
            "valid_count": 0,
            "invalid_count": 0
        }
    }
    
    for i, data in enumerate(data_list):
        try:
            validated_item = model_class(**data)
            results["valid"].append({
                "index": i,
                "data": validated_item.dict()
            })
            results["summary"]["valid_count"] += 1
        except ValidationError as e:
            results["invalid"].append({
                "index": i,
                "data": data,
                "errors": e.errors()
            })
            results["summary"]["invalid_count"] += 1
    
    return results

# Usage example
user_data_list = [
    {"name": "张三", "email": "zhangsan@example.com", "age": 25},
    {"name": "", "email": "invalid-email", "age": -5},  # invalid record
    {"name": "李四", "email": "lisi@example.com", "age": 30}
]

class SimpleUser(BaseModel):
    name: str = Field(..., min_length=1)
    email: EmailStr
    age: int = Field(..., ge=0, le=120)

results = validate_batch(SimpleUser, user_data_list)
print(results)
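
For a large batch, the raw list of invalid records is often less useful than a count of which fields fail most. The sketch below assumes entries shaped like `results["invalid"]` above; `errors_by_field` is a hypothetical helper, and the sample data is hand-written for illustration rather than real Pydantic output.

```python
from collections import Counter


def errors_by_field(invalid_items: list) -> Counter:
    """Count validation errors per field path across a batch."""
    counts = Counter()
    for item in invalid_items:
        for err in item['errors']:
            counts['.'.join(str(part) for part in err['loc'])] += 1
    return counts


# Hand-written sample in the same shape as results["invalid"]
sample_invalid = [
    {'index': 1, 'errors': [{'loc': ('name',), 'msg': 'placeholder'},
                            {'loc': ('age',), 'msg': 'placeholder'}]},
    {'index': 4, 'errors': [{'loc': ('age',), 'msg': 'placeholder'}]},
]
summary = errors_by_field(sample_invalid)
print(summary.most_common())
```

With real data this would be called as `errors_by_field(results["invalid"])`.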

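Exercises

Exercise 1: Build a configuration validation system

Goal: implement a complete validation system for application configuration

Requirements

  • Create a database configuration model (host, port, username, password, and so on)
  • Validate Redis configuration (both cluster and standalone modes)
  • Validate mail service configuration (SMTP settings, credentials)
  • Validate logging configuration (log level, output format, file path)
  • Validate security configuration (JWT secret, encryption algorithm, expiry time)

Acceptance criteria: every configuration item is validated correctly, with clear and specific error messages

Exercise 2: Implement multilingual form validation

Goal: create a user registration form validator that supports multiple languages

Requirements

  • Apply different validation rules per language (for example Chinese versus English name formats)
  • Return error messages in the matching language
  • Support localized field labels
  • Validate phone numbers internationally

Acceptance criteria: at least Chinese and English are supported, with correct rules and error messages

Exercise 3: Build a data transformation pipeline

Goal: implement a non-trivial data cleaning and transformation flow

Requirements

  • Create cleaning validators (trimming whitespace, normalizing formats, and so on)
  • Convert data types automatically
  • Add data integrity checks
  • Support batch processing

Acceptance criteria: dirty input is handled and emitted in a standard format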

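FAQ

Q1: Validator execution order

Problem: it is unclear in which order several validators on the same field run

Solution: the order is deterministic. Validators declared with pre=True run before type conversion; the remaining validators run after it, in the order they are defined in the class. Put cleanup in a pre-validator and checks in a regular one.

from pydantic import BaseModel, validator

class OrderedValidation(BaseModel):
    value: str
    
    @validator('value', pre=True)  # pre-validator: runs on the raw input
    def pre_validator(cls, v):
        return v.strip() if isinstance(v, str) else v
    
    @validator('value')  # main validator: runs on the stripped, converted value
    def main_validator(cls, v):
        if len(v) < 3:
            raise ValueError('Must be at least 3 characters long')
        return v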

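Q2: Performance

Problem: validating large volumes of data is slow

Solution: avoid validation work that is not needed. Keep assignment validation off, and for data that is already known to be valid use construct(), which in Pydantic v1 builds an instance without running any validation. The other two settings shown are unrelated to speed.

from pydantic import BaseModel

class OptimizedModel(BaseModel):
    name: str
    age: int
    
    class Config:
        validate_assignment = False  # the default: attribute assignment is not re-validated
        arbitrary_types_allowed = True  # permits non-Pydantic field types; no effect on speed
        allow_population_by_field_name = True  # accept field names as well as aliases; no effect on speed

# Trusted data can skip validation entirely
trusted = OptimizedModel.construct(name="张三", age=25)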

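Q3: Circular references

Problem: a nested model refers to itself (or two models refer to each other)

Solution: use typing.ForwardRef or string annotations, then resolve them once the class exists

from __future__ import annotations

from typing import List

from pydantic import BaseModel

class User(BaseModel):
    name: str
    friends: List[User] = []  # postponed annotation, resolved after the class is created

# Pydantic v1: resolve the self-reference before the model is used
User.update_forward_refs()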

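Summary

This chapter covered the core concepts and advanced techniques of data validation with Pydantic:

  • Validation mechanics: how Pydantic validates data and in what order the stages run
  • Custom validators: writing non-trivial validation logic and error handling
  • Data conversion: serialization, deserialization, and type conversion
  • Performance: practices for keeping validation cost down
  • Application: enterprise-style validation scenarios, reinforced by the exercises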

Change log

  • Updated: 2024-01-20 | Changes: restructured the document, expanded the exercises and FAQ, added standard template elements | Updated by: Assistant