Key Technologies of Natural Language Digital Watermarking (自然语言数字水印的关键技术) - Details - Xi'an Jiaotong University Institutional Repository

Author：

He Lu (何路)

Indexed by：

Dissertation Database

Abstract：

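Digital watermarking is an important means of protecting digital content, and over the past 20 years or so it has advanced considerably for images, audio, and video. When researchers apply watermarking techniques to text composed of natural language, however, they run into great difficulty. First, text files made of natural language differ in nature from image, audio, and video files made of digital signals, so signal-processing methods cannot be used to analyze language. Approaches that treat the format or glyphs of a text as an image, and thereby reduce text watermarking to image watermarking, therefore struggle to guarantee robustness. Second, although redundancy has been found in natural language through synonym substitution, syntactic transformation, and similar means, the resulting capacity is still limited. Moreover, because natural language processing technology is itself imperfect, rewriting a text on the basis of its analysis inevitably introduces rewriting errors caused by analysis errors, which harms the imperceptibility of the watermark. For the same reason, the robustness and imperceptibility of natural language watermarks cannot be evaluated with the methods used for image watermarks. Yet very little evaluation work exists on the robustness and imperceptibility of natural language watermarking; what exists consists of researchers assessing their own algorithms, is not comprehensive, and lacks systematic theoretical support. This thesis therefore takes the robustness and imperceptibility of natural language watermarking as its object of study and applies the results to the design of new algorithms. The main research content and results are as follows.

1. Adversary model for natural language watermarking. The possible attacks are analyzed in light of the characteristics of natural language, and a concise adversary model is proposed in which only two attacks, the replacement attack and the summary attack, need to be designed to simulate the effects of the various attacks possible in practice. An automatic attack tool for natural language watermarking was designed and implemented; it can be used to evaluate the robustness of natural language watermarking systems. The relationship between attack strength and the use value of the carrier is studied quantitatively. Experimental data show that with a compression ratio of 0.9 and an attack strength of no more than 0.2, the use value of the carrier is not significantly affected.

2. Robustness of natural language watermarking. Based on the adversary model and the characteristics of natural language, a mathematical model of the relationship between attack strength and bit error rate is established, and its correctness is verified experimentally. Theoretical analysis and experimental data show that existing natural language watermarks are far less robust than expected. Coding algorithms that are hard to classify for general analysis are analyzed one by one. Using an n-dimensional hypersphere as a model, it is proved that under the summary attack a spread-spectrum vector watermark cannot balance the false alarm rate against the miss rate. In a replacement attack that randomly selects one sentence from each part-of-speech sequence, experiments show that attacking only a small number of sentences is enough to destroy watermarks based on the statistical properties of part-of-speech tag strings. Based on the possible effects of the replacement attack on zero-watermarks, two replacement strategies are proposed, the synchronization attack and the birthday attack; both theory and experiment show that combining the two destroys the zero-watermark information in the carrier with almost 100% certainty while rewriting less than 0.4% of the text. These models and attacks provide important guidance for evaluating existing natural language watermark coding algorithms and for developing new ones.

3. Evaluation of the imperceptibility of natural language watermarking. Drawing on psycholinguistics, the thesis analyzes how strongly the errors introduced when natural language processing techniques rewrite the carrier affect human perception, and proposes an imperceptibility evaluation method for natural language watermarking. An automatic imperceptibility evaluation system was designed and implemented on this basis. Experiments show that the method correctly reflects human perception.

4. Design of natural language watermarking. The thesis proposes using cover units to express, in a uniform way, the results that different natural language processing techniques produce when analyzing the carrier. On this basis a general embedding and extraction algorithm is designed and implemented that can pair any number of carrier manipulation techniques with one watermark coding algorithm. On the one hand, this effectively alleviates the capacity problem of natural language watermarking. On the other hand, it lets researchers design watermark coding and carrier manipulation separately and so deal more effectively with robustness and imperceptibility. For watermark coding, the thesis proposes an algorithm that constructs the watermark code from frequencies in the hash domain of cover units. Under attack, the probability that this algorithm produces a bit error is almost zero. A compromise scheme is then proposed that can be tuned between robustness and the amount of rewriting. For carrier manipulation, because natural language processing has low accuracy in deep analysis, the thesis uses only part-of-speech sequences to represent sentence patterns and, drawing on linguistic research, establishes transformation relations between sentence patterns. The transformation relations are filtered according to whether different transformations play similar pragmatic roles, which keeps transformations consistent across contexts. Experimental comparison with typical existing algorithms shows that this method significantly reduces rewriting errors and improves the imperceptibility of the watermark.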
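The relationship between attack strength and bit error rate mentioned in item 2 of the abstract can be made concrete with a toy simulation. The sketch below is not the thesis's model, which this record does not give. It assumes, purely for illustration, that each cover unit carries one bit taken from a keyed hash and that a replacement attack rewrites each unit independently with probability equal to the attack strength. The names `unit_bit`, `replacement_attack`, `bit_error_rate`, and `simulate`, the key, and the synthetic vocabulary are all hypothetical.

```python
import hashlib
import random


def unit_bit(unit: str, key: str) -> int:
    """Map a cover unit to one bit via a keyed hash (assumed mapping, not the thesis's)."""
    digest = hashlib.sha256(f"{key}|{unit}".encode("utf-8")).digest()
    return digest[0] & 1


def replacement_attack(units, strength, vocabulary, rng):
    """Rewrite each unit independently with probability `strength`."""
    return [rng.choice(vocabulary) if rng.random() < strength else u for u in units]


def bit_error_rate(original, attacked, key):
    """Fraction of positions whose hash bit changed after the attack."""
    flipped = sum(unit_bit(a, key) != unit_bit(b, key) for a, b in zip(original, attacked))
    return flipped / len(original)


def simulate(strength, n_units=20000, vocab_size=5000, key="demo-key", seed=0):
    """Build a synthetic carrier, attack it, and measure the bit error rate."""
    rng = random.Random(seed)
    vocabulary = [f"w{i}" for i in range(vocab_size)]  # synthetic stand-in for real units
    original = [rng.choice(vocabulary) for _ in range(n_units)]
    attacked = replacement_attack(original, strength, vocabulary, rng)
    return bit_error_rate(original, attacked, key)


if __name__ == "__main__":
    for s in (0.0, 0.1, 0.2, 0.4):
        print(f"attack strength {s:.1f} -> bit error rate {simulate(s):.3f}")
```

Under these assumptions a rewritten unit lands on the opposite bit about half the time, so the measured bit error rate grows roughly as half the attack strength. A one-bit-per-unit code of this kind degrades steadily as the attacker rewrites more of the text, which is the sort of fragility the robustness analysis is concerned with.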

Keyword：

natural language digital watermarking; natural language processing; text watermarking; information hiding; zero-watermarking

Author Community：

[ 1 ] School of Electronic and Information Engineering, Xi'an Jiaotong University


Translated Abstract

Watermarking is an important way to protect the copyright of digital content. Over the past 20 years, image, audio, and video watermarking have developed considerably. However, image and audio watermarking techniques can hardly be applied directly to natural language watermarking, for two reasons. First, natural language differs in nature from images and audio, so signal-processing methods cannot be applied to it; treating the format of a text or the strokes of its characters as an image does not achieve the required robustness. Second, although redundancy can be obtained through synonym substitution and syntactic transformation, the capacity is still limited. Furthermore, because natural language processing technology is still immature, analysis errors inevitably cause rewriting errors, which harm imperceptibility. For the same reason, the robustness and imperceptibility of natural language watermarking cannot be evaluated with the methods used for image watermarking. However, there is little research on the robustness and imperceptibility of natural language watermarking, and the existing evaluations are neither comprehensive nor theoretically grounded. This thesis therefore studies the robustness and imperceptibility of natural language watermarking and applies the results to the design of new algorithms. The main work and contributions are as follows:

1. Research on an adversary model for natural language watermarking. Based on the characteristics of natural language, an adversary model is proposed that can simulate all kinds of real attacks using only the replacement attack and the summary attack. An automatic attack tool is designed and developed for evaluating the robustness of natural language watermarking systems. The relationship between attack strength and value-in-use is investigated. The experimental results show that when the compression ratio is 0.9 and the attack strength is no more than 0.2, the summary attack does not significantly affect the value of the carrier.

2. Research on the robustness of natural language watermarking. Based on the adversary model and the characteristics of natural language, the relationship between attack strength and bit error rate is investigated and theoretical models are proposed. These models have been verified by experiments. The experimental results show that existing natural language watermarking schemes are not as robust as researchers expected. Algorithms that cannot be classified are analyzed one by one. Using an n-dimensional hypersphere model, it is proved that under the summary attack the spread-spectrum vector watermark cannot satisfy the false positive rate and the false negative rate at the same time. Performing the replacement attack on one sentence chosen from each class of part-of-speech tag sequence shows that the part-of-speech tag sequence watermark can be destroyed by a few attacks. Based on the influence of the replacement attack on zero-watermarking, the synchronization attack and the birthday attack are proposed. Both theory and experiment prove that combining the two attacks almost certainly destroys the zero-watermark at a rewrite ratio of less than 0.4%.

3. Research on a method for evaluating the imperceptibility of natural language watermarking. Drawing on psycholinguistics, the influence of rewritten text on human perception is analyzed and a perceptual model of natural language watermarking is proposed. Based on the perceptual model, an automatic evaluation system is designed for evaluating the imperceptibility of natural language watermarking. The experimental results show that this evaluation method is consistent with human perception.

4. Research on natural language watermarking design. The concept of the cover unit is proposed, which hides the details of the different natural language processing technologies. Based on the cover unit, a general embedding and extraction algorithm is designed that combines an arbitrary number of carrier manipulation technologies with one watermark coding algorithm. As a consequence, on the one hand, capacity is improved; on the other hand, robustness and imperceptibility can be treated separately. For watermark coding, a hash-domain algorithm is proposed that uses the hash values of cover units. The bit error rate of this coding algorithm is almost 0. Furthermore, a balanced algorithm is proposed that can be adjusted between robustness and rewrite ratio. For cover manipulation, because deep analysis by natural language processing technology has low accuracy, part-of-speech tag sequences are used to describe the transformation relations between sentence patterns, and the transformations are filtered by measuring the pragmatic similarity of sentence patterns to guarantee their consistency. The experimental results show that this method reduces errors markedly and thus improves the imperceptibility of the watermark.
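The hash-domain coding idea in item 4 can be illustrated with a small frequency-voting sketch. The thesis's actual construction is not given in this record, so everything below is an assumption made for illustration: cover units are plain strings, each unit votes for one bit position and one bit value through a keyed hash, and embedding picks, for each unit, an equivalent rewrite whose vote agrees with the target bits. `vote`, `embed`, `extract`, `demo`, and the `rewrites` table are hypothetical names.

```python
import hashlib
import random


def vote(unit: str, key: str, n_bits: int):
    """Return the (bit position, bit value) that a cover unit votes for."""
    digest = hashlib.sha256(f"{key}|{unit}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % (2 * n_bits)
    return bucket // 2, bucket % 2


def embed(units, rewrites, bits, key):
    """For each unit, keep it or swap in a rewrite so that its vote matches `bits`."""
    marked = []
    for unit in units:
        chosen = unit  # fall back to the original when no candidate agrees
        for cand in [unit] + rewrites.get(unit, []):
            pos, value = vote(cand, key, len(bits))
            if bits[pos] == value:
                chosen = cand
                break
        marked.append(chosen)
    return marked


def extract(units, key, n_bits):
    """Read each bit as the more frequent of its two vote values (ties read as 0)."""
    ones = [0] * n_bits
    zeros = [0] * n_bits
    for unit in units:
        pos, value = vote(unit, key, n_bits)
        if value:
            ones[pos] += 1
        else:
            zeros[pos] += 1
    return [1 if ones[i] > zeros[i] else 0 for i in range(n_bits)]


def demo(strength=0.2, seed=1):
    """Embed 4 bits in 400 synthetic units, rewrite a fraction at random, extract."""
    rng = random.Random(seed)
    key, bits = "demo-key", [1, 0, 1, 1]
    units = [f"unit{i}" for i in range(400)]
    # Hypothetical table of meaning-preserving rewrites for each unit.
    rewrites = {u: [f"{u}/alt{j}" for j in range(3)] for u in units}
    marked = embed(units, rewrites, bits, key)
    # Simulated replacement attack: swap a fraction of units for unrelated ones.
    attacked = [f"noise{rng.randrange(10**6)}" if rng.random() < strength else u
                for u in marked]
    return bits, extract(marked, key, len(bits)), extract(attacked, key, len(bits))


if __name__ == "__main__":
    print(demo())
```

With enough cover units per bit, a modest fraction of random rewrites leaves each majority unchanged, which gives some intuition for why a frequency-based code can keep the bit error probability very low. The sketch ignores imperceptibility and does not model the adjustable trade-off between robustness and rewrite ratio that the abstract mentions.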


Basic Info ：

Degree：Doctor of Engineering

Mentor：Gui Xiaolin (桂晓林)

Year： 2013

Language： Chinese

WoS CC Cited Count： 0

Affiliated Colleges：

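Faculty of Electronic and Information Engineering (formerly the School of Electronic and Information Engineering); records not attributed to a specific unit within the faculty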
