Datawhale AI Summer Camp



2401_88241058 · Published 2025-08-13 23:36:55

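Global AI Offense and Defense Challenge, Image Generation Track: Task 3 Notes

I. Adjusting the Prompt

English prompts may produce more accurate and more stable results with these models, so we can translate the prompts into English.

Prompt distribution

Language distribution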

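Category	Count	Share
Chinese only	156	15.60%
English only	531	53.10%
Mixed Chinese and English	63	6.30%
Total	750	75.00%

Note: the shares sum to 75% because they appear to be computed against 1,000 entries rather than the 750 prompts counted here (for example, 156 / 1,000 = 15.60%).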

Length analysis (number of characters)

Category	Min length	Max length	Mean length
Chinese only	4	54	27.4
English only	14	122	45.7
Mixed Chinese and English	6	85	30.8
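
Statistics like the two tables above can be reproduced with a few lines of Python. The following is a minimal sketch, not the script used for these notes: the function names `classify_language` and `summarize`, the category labels, and the three sample prompts are illustrative assumptions, and the classification rule (any CJK ideograph and/or any Latin letter) is a simplification.

```python
import re
from statistics import mean

# CJK Unified Ideographs: covers Simplified and Traditional Chinese characters.
CJK = re.compile(r"[\u4e00-\u9fff]")
LATIN = re.compile(r"[A-Za-z]")


def classify_language(prompt: str) -> str:
    """Label a prompt as 'zh', 'en', 'mixed', or 'other' (assumed rule)."""
    has_cjk = bool(CJK.search(prompt))
    has_latin = bool(LATIN.search(prompt))
    if has_cjk and has_latin:
        return "mixed"
    if has_cjk:
        return "zh"
    if has_latin:
        return "en"
    return "other"


def summarize(prompts):
    """Return count, share, and character-length stats per language label."""
    groups = {}
    for p in prompts:
        groups.setdefault(classify_language(p), []).append(len(p))
    total = len(prompts)
    return {
        lang: {
            "count": len(lengths),
            "share": len(lengths) / total,
            "min": min(lengths),
            "max": max(lengths),
            "mean": mean(lengths),
        }
        for lang, lengths in groups.items()
    }


# Hypothetical sample; replace with the prompts from the competition data.
sample = ["变成水墨画风格", "Add a hat to the cat", "把背景换成 beach"]
stats = summarize(sample)
```

In practice the `sample` list would be replaced with the prompt column of the task file, and `share` would be computed against whatever denominator you want to report.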

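Notes on translating prompts

1. In vittie tasks, do not translate the parts of the prompt enclosed in quotation marks.

e.g. 帮我把“致青春”改成“致少年” ("Please change “致青春” to “致少年”"). The task replaces the three Chinese characters “致青春” in the image with “致少年”, so the quoted text must stay exactly as written.

2. Some prompts are written in Traditional Chinese.

e.g. 請把車子外面的那個女人移除掉，其它東西的要保留，再重新輸出給我圖片 ("Please remove the woman outside the car, keep everything else, and output the image for me again")

3. For Chinese-style or culture-specific elements, keep the prompt in Chinese; it conveys the intended mood more directly.

e.g. 变成水墨画风格 ("Turn it into ink-wash painting style")

You can also add a negative prompt to constrain the output.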
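
Rule 1 above can be sketched as a mask, translate, restore step. This is only an illustration under assumptions: `mask_quoted`, `unmask_quoted`, and `translate_keeping_quotes` are hypothetical helper names, the `<Q0>` placeholder format is arbitrary, and `fake_translate` is a stand-in for whatever translation model or API you actually call. A real translator may alter placeholders, so the restored output should be checked.

```python
import re

# Quoted spans: Chinese curly quotes, straight double quotes, or corner brackets.
QUOTED = re.compile(r'“[^”]*”|"[^"]*"|「[^」]*」')


def mask_quoted(prompt):
    """Replace each quoted span with a placeholder; return (masked, spans)."""
    spans = []

    def _sub(match):
        spans.append(match.group(0))
        return f"<Q{len(spans) - 1}>"

    return QUOTED.sub(_sub, prompt), spans


def unmask_quoted(text, spans):
    """Put the original quoted spans back in place of the placeholders."""
    for i, span in enumerate(spans):
        text = text.replace(f"<Q{i}>", span)
    return text


def translate_keeping_quotes(prompt, translate):
    """Translate a prompt while leaving quoted text untouched.

    `translate` is any callable str -> str (an assumption, not a real API).
    """
    masked, spans = mask_quoted(prompt)
    return unmask_quoted(translate(masked), spans)


def fake_translate(text):
    # Stand-in for a real translation call, for demonstration only.
    return text.replace("帮我把", "Please change ").replace("改成", " to ")


result = translate_keeping_quotes("帮我把“致青春”改成“致少年”", fake_translate)
print(result)
```

The same wrapper can skip translation entirely for prompts that fall under rule 3, for example by checking for style keywords before calling `translate`.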

II. Calling Qwen-Image

pip install git+https://github.com/huggingface/diffusers

from diffusers import DiffusionPipeline
import torch

model_name = "Qwen/Qwen-Image"

# Load the pipeline
if torch.cuda.is_available():
    torch_dtype = torch.bfloat16
    device = "cuda"
else:
    torch_dtype = torch.float32
    device = "cpu"

pipe = DiffusionPipeline.from_pretrained(model_name, torch_dtype=torch_dtype)
pipe = pipe.to(device)

positive_magic = {
    "en": "Ultra HD, 4K, cinematic composition.",  # for English prompts
    "zh": "超清，4K，电影级构图",  # for Chinese prompts
}

# Generate image
# The quality suffix is appended from positive_magic in the pipe() call below
prompt = '''A coffee shop entrance features a chalkboard sign reading "Qwen Coffee 😊 $2 per cup," with a neon light beside it displaying "通义千问". Next to it hangs a poster showing a beautiful Chinese woman, and beneath the poster is written "π≈3.1415926-53589793-23846264-33832795-02384197".'''

negative_prompt = " " # using an empty string if you do not have specific concept to remove


# Generate with different aspect ratios
aspect_ratios = {
    "1:1": (1328, 1328),
    "16:9": (1664, 928),
    "9:16": (928, 1664),
    "4:3": (1472, 1140),
    "3:4": (1140, 1472),
    "3:2": (1584, 1056),
    "2:3": (1056, 1584),
}

width, height = aspect_ratios["16:9"]

image = pipe(
    prompt=prompt + " " + positive_magic["en"],
    negative_prompt=negative_prompt,
    width=width,
    height=height,
    num_inference_steps=50,
    true_cfg_scale=4.0,
    # Seed the generator on the device selected above, so the CPU fallback also works
    generator=torch.Generator(device=device).manual_seed(42)
).images[0]

image.save("example.png")
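
When running many prompts through the script above, two small choices come up repeatedly: which `positive_magic` suffix to append, and which entry of `aspect_ratios` to use. A minimal sketch of both follows. The helper names `prompt_language`, `with_magic`, and `closest_resolution` are hypothetical, and the language rule (more CJK ideographs than Latin letters means Chinese) is an assumption rather than anything Qwen-Image prescribes.

```python
import re

CJK = re.compile(r"[\u4e00-\u9fff]")
LATIN = re.compile(r"[A-Za-z]")

# Same values as in the script above.
POSITIVE_MAGIC = {
    "en": "Ultra HD, 4K, cinematic composition.",
    "zh": "超清，4K，电影级构图",
}
ASPECT_RATIOS = {
    "1:1": (1328, 1328),
    "16:9": (1664, 928),
    "9:16": (928, 1664),
    "4:3": (1472, 1140),
    "3:4": (1140, 1472),
    "3:2": (1584, 1056),
    "2:3": (1056, 1584),
}


def prompt_language(prompt):
    """Return 'zh' if CJK ideographs outnumber Latin letters, else 'en'."""
    cjk = len(CJK.findall(prompt))
    latin = len(LATIN.findall(prompt))
    return "zh" if cjk > latin else "en"


def with_magic(prompt):
    """Append the quality suffix that matches the prompt's main language."""
    lang = prompt_language(prompt)
    separator = "，" if lang == "zh" else " "
    return prompt.rstrip() + separator + POSITIVE_MAGIC[lang]


def closest_resolution(width, height):
    """Pick the listed (width, height) whose ratio is nearest to the target."""
    target = width / height
    return min(ASPECT_RATIOS.values(), key=lambda wh: abs(wh[0] / wh[1] - target))


print(with_magic("Add a hat to the cat."))
print(closest_resolution(1920, 1080))
```

Counting characters rather than testing for any CJK character keeps an English prompt that merely quotes a few Chinese characters on the English suffix.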

III. Calling FLUX.1-Kontext-dev

# Install diffusers from the main branch until future stable release
pip install git+https://github.com/huggingface/diffusers.git

import torch
from diffusers import FluxKontextPipeline
from diffusers.utils import load_image

pipe = FluxKontextPipeline.from_pretrained("black-forest-labs/FLUX.1-Kontext-dev", torch_dtype=torch.bfloat16)
pipe.to("cuda")

input_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/cat.png")

image = pipe(
  image=input_image,
  prompt="Add a hat to the cat",
  guidance_scale=2.5
).images[0]