Abstract: The suitability of artificial intelligence technologies, particularly large language models (LLMs), for literary translation has gradually become a focal point of academic attention and debate. Taking the English translations of Shen Congwen's Biancheng (Border Town) as a case study, this study constructs a “1-to-14” Chinese-English parallel corpus and employs the BLEU, TER, chrF++, and BERTScore metrics to compare the literary translation performance of different LLMs against a traditional neural machine translation (NMT) baseline. The findings reveal that the DeepSeek model outperforms traditional NMT and the other LLMs in Chinese-to-English literary translation quality, while the effect of prompt engineering in literary translation is not as pronounced as in non-literary domains. The study provides empirical evidence for applying LLM-based machine translation to literary texts, fosters interdisciplinary integration between AI and translation studies, and points to future work on expanding text types and exploring prompt strategies tailored to specific literary translations.
[1]Agrawal,S.,C.Zhou,M.Lewis et al.In-context examples selection for machine translation[A].In Findings of the Association for Computational Linguistics:ACL 2023[C].Toronto:Association for Computational Linguistics,2023.
[2]Cho,K.,B.van Merriënboer,C.Gulcehre et al.Learning phrase representations using RNN encoder-decoder for statistical machine translation[A].In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing(EMNLP)[C].Doha:Association for Computational Linguistics,2014.
[3]Gao,Y.,R.Wang&F.Hou.How to design translation prompts for ChatGPT:An empirical study[Z].arXiv,2023.DOI:10.48550/arXiv.2304.02182.
[4]Guerreiro,N.M.,R.Rei,D.van Stigt et al.xCOMET:Transparent machine translation evaluation through fine-grained error detection[J].Transactions of the Association for Computational Linguistics,2024(12):979-995.
[5]Jiao,W.,W.Wang,J.Huang et al.Is ChatGPT a good translator?A preliminary study[Z].arXiv,2023.DOI:10.48550/arXiv.2301.08745.
[6]Jiao,W.,W.Wang,J.Huang et al.Is ChatGPT a good translator?Yes with GPT-4 as the engine[Z].arXiv,2023.DOI:10.48550/arXiv.2301.08745.
[7]Juraska,J.,M.Finkelstein,D.Deutsch et al.MetricX-23:The Google submission to the WMT 2023 metrics shared task[A].In Proceedings of the Eighth Conference on Machine Translation[C].Singapore:Association for Computational Linguistics,2023.
[8]Kocmi,T.&C.Federmann.GEMBA-MQM:Detecting translation quality error spans with GPT-4[Z].arXiv,2023.DOI:10.48550/arXiv.2310.13988.
[9]Lee,S.,J.Lee,H.Moon et al.A survey on evaluation metrics for machine translation[J].Mathematics,2023.DOI:10.3390/math11041006.
[10]Li,Y.,Y.Yin,J.Li et al.Prompt-driven neural machine translation[A].In Findings of the Association for Computational Linguistics:ACL 2022[C].Dublin:Association for Computational Linguistics,2022.
[11]Liu,P.,W.Yuan,J.Fu et al.Pre-train,prompt,and predict:A systematic survey of prompting methods in natural language processing[J].ACM Computing Surveys,2023(9):1-35.
[12]Liu,S.&H.Cao.The effectiveness of ChatGPT in translating chunky construction texts in Chinese political discourse[J].Journal of Electrical Systems,2024(2):1684-1698.
[13]Liu,Y.,J.Gu,N.Goyal et al.Multilingual denoising pre-training for neural machine translation[J].Transactions of the Association for Computational Linguistics,2020(8):726-742.
[14]Lu,H.,H.Huang,D.Zhang et al.Chain-of-dictionary prompting elicits translation in large language models[Z].arXiv,2024.DOI:10.48550/arXiv.2305.06575.
[15]Papineni,K.,S.Roukos,T.Ward et al.BLEU:A method for automatic evaluation of machine translation[A].In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics(ACL)[C].Philadelphia:Association for Computational Linguistics,2002.
[16]Peng,K.,L.Ding,Q.Zhong et al.Towards making the most of ChatGPT for machine translation[Z].arXiv,2023.DOI:10.48550/arXiv.2305.06575.
[17]Popović,M.chrF:Character n-gram F-score for automatic MT evaluation[A].In Proceedings of the Tenth Workshop on Statistical Machine Translation[C].Lisbon:Association for Computational Linguistics,2015.
[18]Popović,M.chrF++:Words helping character n-grams[A].In Proceedings of the Second Conference on Machine Translation[C].Copenhagen:Association for Computational Linguistics,2017.
[19]Robinson,N.R.,P.Ogayo,D.R.Mortensen&G.Neubig.ChatGPT MT:Competitive for high-(but not low-)resource languages[Z].arXiv,2023.DOI:10.48550/arXiv.2309.07423.
[20]Snover,M.,B.Dorr,R.Schwartz et al.A study of translation edit rate with targeted human annotation[A].In Proceedings of the 7th Conference of the Association for Machine Translation in the Americas:Technical Papers[C].Cambridge:Association for Machine Translation in the Americas,2006.
[21]Son,J.&B.Kim.Translation performance from the user's perspective of large language models and neural machine translation systems[J].Information,2023(10):574.
[22]Vaswani,A.,N.Shazeer,N.Parmar et al.Attention is all you need[A].In Advances in Neural Information Processing Systems[C].New York:Neural Information Processing Systems,2017.
[23]Velásquez-Henao,J.D.,C.J.Franco-Cardona&L.Cadavid-Higuita.Prompt engineering:A methodology for optimizing interactions with AI-Language models in the field of engineering[J].DYNA,2023(230):9-17.
[24]Wu,M.,J.Xu,Y.Yuan et al.(Perhaps)Beyond human translation:Harnessing multi-agent collaboration for translating ultra-long literary texts[Z].arXiv,2024.DOI:10.48550/arXiv.2405.11804.
[25]Wu,Y.&G.Hu.Exploring prompt engineering with GPT language models for document-level machine translation:Insights and findings[A].In Proceedings of the Eighth Conference on Machine Translation[C].Singapore:Association for Computational Linguistics,2023:166-169.
[26]Xu,H.,Y.J.Kim,A.Sharaf&H.H.Awadalla.A paradigm shift in machine translation:Boosting translation performance of large language models[Z].arXiv,2024.DOI:10.48550/arXiv.2309.11674.
[27]Zhang,T.,V.Kishore,F.Wu et al.BERTScore:Evaluating text generation with BERT[Z].arXiv,2019.DOI:10.48550/arXiv.1904.09675.
[28]Zoph,B.,D.Yuret,J.May&K.Knight.Transfer learning for low-resource neural machine translation[Z].arXiv,2016.DOI:10.48550/arXiv.1604.02201.
[29]曹智泉,穆永誉,肖桐,等.预训练神经机器翻译研究进展分析[J].中文信息学报,2024(6):1-23.
[30]程珊,张勇.自贸协定翻译“三合”及信息熵解析:以NAFTA翻译为例[J].上海翻译,2024(1):43-49.
[31]耿芳,胡健.人工智能辅助译后编辑新方向——基于ChatGPT的翻译实例研究[J].中国外语,2023(3):41-47.
[32]刘磊,梁茂成.LingAlign:基于跨语言句向量的多语种句对齐方法研究[J].数据分析与知识发现,2024(3):1-11.
[33]刘世界.涉海翻译中的机器翻译应用效能:基于BLEU、chrF++和BERTScore指标的综合评估[J].中国海洋大学学报(社会科学版),2024(2):21-31.
[34]沈从文.沈从文全集(第8卷·小说)[M].太原:北岳文艺出版社,2002.
[35]沈梦菲,黄伟.寻找机器翻译痕迹——神经机器翻译文本的句法特征研究[J].外语教学与研究,2024(3):429-441;480-481.
[36]王和私,马柯昕.人工智能翻译应用的对比研究——以生物医学文本为例[J].中国科技翻译,2023(3):23-26.
[37]王金铨,文秋芳.国内外机器自动评分系统评述——兼论对中国学生翻译自动评分系统的启示[J].外语界,2010(1):75-81;91.
[38]王立非,林旭.国外翻译技术研究进展及其对翻译教学的启示[J].外语教学理论与实践,2023(6):66-77;41.
[39]王子云,毛毳.ChatGPT译文质量的评估与提升——以陶瓷类文本汉英翻译为例[J].山东陶瓷,2023(4):20-27.
[40]文旭,田亚灵.ChatGPT应用于中国特色话语翻译的有效性研究[J].上海翻译,2024(2):27-34;94-95.
[41]杨锋昌.ChatGPT对译员的思考与启示——以越南语法律翻译为例[J].中国科技翻译,2023(3):27-30;4.
[42]于蕾.ChatGPT翻译的词汇多样性和句法复杂度研究[J].外语教学与研究,2024(2):297-307;321.
[43]袁毓林.如何测试ChatGPT的语义理解与常识推理水平?——兼谈大语言模型时代语言学的挑战与机会[J].语言战略研究,2024a(1):49-63.
[44]袁毓林.ChatGPT等大模型的语言处理机制及其理论蕴涵[J].外国语,2024b(4):2-14.
[45]赵雪,赵志枭,孙凤兰,等.面向语言文学领域的大语言模型性能评测研究[J].外语电化教学,2023(6):57-65;114.
[46]赵志枭,胡蝶,刘畅,等.人文社科领域中文通用大模型性能评测[J].图书情报工作,2024(13):132-143.
(2) Four human translations: Green Jade and Green Jade (co-translated by Emily Hahn [项美丽] and Shao Xunmei [邵洵美], 1936), The Frontier City (co-translated by Ching Ti [金隄] and Robert Payne [白英], 1947), The Border Town and Other Stories (translated by Gladys Yang [戴乃迭], 1981), and Border Town: A Novel (translated by Jeffrey C. Kinkley [金介甫], 2009), used as the references for computing the BLEU, chrF++, TER, and BERTScore metrics (a scoring sketch follows these notes).
(3) Each aligned unit (row) of the original 1-to-4 human-translation parallel corpus contains the source text and four human translations (HT1, HT2, HT3, HT4). Some source passages, however, were left untranslated by individual translators, and such omissions are filled with a special placeholder, e.g. "****". Applying a 30-character length condition removes these omission rows, ensuring that the machine translation in every aligned unit can be scored against all four human reference translations (see the filtering sketch after these notes).
(4) BP, the brevity penalty, imposes a cost when the machine (candidate) translation is shorter than the human (reference) translation. As the formula reproduced after these notes shows, a longer candidate (c > r) is not penalized, whereas for c ≤ r a shorter candidate drives BP below 1 and lowers the BLEU score accordingly.
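To make note (2) concrete, here is a minimal Python sketch, assuming the sacrebleu and bert-score packages, of how one machine translation could be scored against the four human reference translations; the example sentences and the max-over-references handling of BERTScore are illustrative assumptions rather than the paper's actual pipeline.

```python
# Hedged sketch (not the paper's actual tooling): scoring machine output against
# the four human translations of note (2) with sacrebleu and bert-score.
# The example sentences below are invented placeholders.
import sacrebleu
from bert_score import score as bert_score

hypotheses = ["The ferry lay below the white pagoda beside a small stream."]  # machine translation
references = [  # one list per human reference translation (HT1-HT4)
    ["By the white pagoda a small stream ran past the ferry."],
    ["A small stream flowed past the ferry below the white pagoda."],
    ["Below the white pagoda, a stream ran alongside the ferry."],
    ["Near the white pagoda the ferry sat beside a small stream."],
]

bleu = sacrebleu.corpus_bleu(hypotheses, references)                # BLEU with 4 references
chrf = sacrebleu.corpus_chrf(hypotheses, references, word_order=2)  # word_order=2 gives chrF++
ter  = sacrebleu.corpus_ter(hypotheses, references)                 # TER (lower is better)

# One simple multi-reference strategy for BERTScore: score against each human
# translation separately and keep the best F1.
best_f1 = 0.0
for ref in references:
    _, _, f1 = bert_score(hypotheses, ref, lang="en", verbose=False)
    best_f1 = max(best_f1, f1.mean().item())

print(f"BLEU={bleu.score:.2f} chrF++={chrf.score:.2f} TER={ter.score:.2f} BERTScore-F1={best_f1:.4f}")
```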
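Note (3)'s filtering step might look roughly like the sketch below; the tab-separated five-column layout, the file names, and the reading of the 30-character condition as a per-reference minimum length are assumptions made only for illustration.

```python
# Hedged sketch of the filtering described in note (3): drop aligned units in which
# any human translation is the "****" omission placeholder or falls below the
# 30-character threshold, so every kept row offers four usable references.
# File names and the five-column layout (source + HT1-HT4) are assumptions.
MIN_CHARS = 30

def keep_row(columns):
    _source, *human_refs = columns
    return all(ref.strip() != "****" and len(ref.strip()) >= MIN_CHARS
               for ref in human_refs)

with open("biancheng_1to4.tsv", encoding="utf-8") as fin, \
     open("biancheng_1to4.filtered.tsv", "w", encoding="utf-8") as fout:
    for line in fin:
        columns = line.rstrip("\n").split("\t")
        if len(columns) == 5 and keep_row(columns):
            fout.write(line)
```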
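For note (4), the standard definitions from Papineni et al. (2002) are, with c the candidate (machine translation) length, r the effective reference length, p_n the modified n-gram precisions, and w_n their weights:

```latex
% Brevity penalty and corpus-level BLEU (Papineni et al., 2002)
\mathrm{BP} =
\begin{cases}
1,             & c > r \\
e^{\,1 - r/c}, & c \le r
\end{cases}
\qquad
\mathrm{BLEU} = \mathrm{BP}\cdot\exp\!\Big(\sum_{n=1}^{N} w_n \log p_n\Big)
```

BP therefore equals 1 whenever the candidate is at least as long as the reference (c ≥ r) and shrinks exponentially as the candidate gets shorter, which is exactly the penalty note (4) describes.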
Basic information:
DOI:10.13564/j.cnki.issn.1672-9382.2025.04.008
CLC number: I046; H315.9
Citation:
[1]张曙康,赵朝永.大语言模型之于文学翻译的适切性研究——基于多指标评估的《边城》多模型译文质量对比[J].中国外语,2025,22(04):85-95.DOI:10.13564/j.cnki.issn.1672-9382.2025.04.008.
Funding:
An interim output of the Shanghai AI-Empowered Research Program special project "Computational Linguistics" (No. 2024A101001).