基于轻量级Transformer+跨模态注意力融合的电商商品图文匹配算法
许幸
眉山药科职业学院,四川眉山,620000
摘要:针对电商场景下商品图文匹配存在的特征对齐精度低、通用模型算力消耗大及属性不一致校验难等问题,提出一种基于轻量级Transformer与跨模态属性注意力融合的图文匹配算法。该算法首先通过轻量化改造的ITransformer与TTransformer分别提取商品图像视觉特征与文本语义特征,引入动态通道剪枝(剪枝率40%)与特征维度压缩技术(压缩比r=4)降低参数量;随后设计电商专属的跨模态属性注意力模块(AGA),将文本属性词作为Query与图像空间特征进行深度对齐;最后通过余弦相似度实现匹配判定。在FashionGen与自采电商数据集上的实验表明,本算法Top-1准确率达88.7%,模型参数量仅为38.6M,推理速度达185 FPS,在保证匹配精度的前提下实现了显著的轻量化,适配电商平台实时推荐与违规校验场景。
关键词:电商商品匹配;轻量级Transformer;跨模态融合;属性注意力;通道剪枝
E-commerce Product Image-Text Matching Algorithm Based on Lightweight Transformer and Cross-modal Attention Fusion
Xing Xu
Meishan College of Chinese Medicine, Meishan, Sichuan, 620000, China
Abstract: To address the problems of low feature-alignment accuracy, the heavy computational cost of general-purpose models, and the difficulty of attribute-inconsistency verification in e-commerce product image-text matching, we propose an image-text matching algorithm based on a lightweight Transformer and cross-modal attribute attention fusion. The algorithm first extracts the visual features of product images and the semantic features of product text through the lightweight ITransformer and TTransformer, respectively, and introduces dynamic channel pruning (pruning rate 40%) and feature-dimension compression (compression ratio r=4) to reduce the number of parameters. Subsequently, an e-commerce-specific cross-modal attribute attention module (AGA) is designed, which uses text attribute words as Queries and deeply aligns them with image spatial features. Finally, cosine similarity is used for the matching decision. Experiments on FashionGen and a self-collected e-commerce dataset show that the algorithm reaches a Top-1 accuracy of 88.7% with a model size of only 38.6M parameters and an inference speed of 185 FPS; it achieves significant lightweighting while preserving matching accuracy, and is suitable for real-time recommendation and violation-verification scenarios on e-commerce platforms.
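The paper gives no implementation details for the AGA module here; the following is a minimal, framework-free sketch of the idea the abstract describes — text attribute embeddings acting as Queries in cross-attention over image patch features, followed by a cosine-similarity matching decision. All names, shapes, and the 0.5 threshold are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attribute_guided_attention(attr_q, img_kv, d):
    """Attribute embeddings (m, d) attend over image patch features (n, d).

    Each attribute word acts as a Query; image patches supply Keys/Values,
    yielding one attribute-aligned visual feature per attribute.
    """
    scores = attr_q @ img_kv.T / np.sqrt(d)   # (m, n) alignment scores
    weights = softmax(scores, axis=-1)        # attention over all patches
    return weights @ img_kv                   # (m, d) aligned features

def cosine_match(text_feat, img_feat, threshold=0.5):
    """Cosine similarity between pooled text and image features; thresholded match flag."""
    sim = float(text_feat @ img_feat
                / (np.linalg.norm(text_feat) * np.linalg.norm(img_feat) + 1e-8))
    return sim, sim >= threshold

rng = np.random.default_rng(0)
d = 64
attr = rng.standard_normal((3, d))       # e.g. embeddings for "red", "cotton", "long-sleeve"
patches = rng.standard_normal((49, d))   # a 7x7 grid of image patch features
aligned = attribute_guided_attention(attr, patches, d)
sim, matched = cosine_match(aligned.mean(axis=0), patches.mean(axis=0))
```

In a trained model the attribute and patch embeddings would come from the text and image encoders, and the Query/Key/Value projections would be learned; the random tensors above only exercise the shapes and the flow of the computation.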
Keywords: E-commerce Product Matching; Lightweight Transformer; Cross-modal Fusion; Attribute Attention; Channel Pruning
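The two lightweighting steps named in the abstract — dynamic channel pruning at a 40% rate and feature-dimension compression with ratio r=4 — can be sketched as follows. This assumes magnitude-based (L1) channel importance and a linear projection for compression; the paper may use different criteria, and all names and shapes here are illustrative.

```python
import numpy as np

def prune_channels(feat, channel_importance, prune_rate=0.40):
    """Keep only the top (1 - prune_rate) fraction of channels by importance score."""
    c = feat.shape[-1]
    keep = int(round(c * (1 - prune_rate)))
    idx = np.argsort(channel_importance)[::-1][:keep]  # most important channels first
    return feat[..., np.sort(idx)]                     # preserve channel order

def compress_dim(feat, r=4, seed=0):
    """Project features from d to d // r dimensions with a linear map.

    In a real model this projection is learned; a seeded random matrix
    stands in for it here.
    """
    d = feat.shape[-1]
    W = np.random.default_rng(seed).standard_normal((d, d // r)) / np.sqrt(d)
    return feat @ W

x = np.random.default_rng(1).standard_normal((49, 320))   # 49 patches, 320 channels
l1 = np.abs(np.random.default_rng(2).standard_normal(320))  # stand-in L1 importance scores
pruned = prune_channels(x, l1)           # 320 -> 192 channels (40% pruned)
compressed = compress_dim(pruned, r=4)   # 192 -> 48 dimensions (ratio r=4)
```

Composing the two steps shows where the parameter savings come from: downstream layers operate on 48-dimensional features instead of 320, shrinking their weight matrices accordingly.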
利益冲突
Conflict of Interest
作者声明不存在任何利益冲突。
The author declares no conflict of interest.