Translated Abstract
Watermarking is an important means of protecting the copyright of digital content. Over the past 20 years, image, audio, and video watermarking have matured. However, image and audio watermarking techniques can hardly be applied directly to natural language watermarking, for several reasons. First, natural language differs fundamentally from images and audio, so signal processing methods cannot be applied to it, and treating the text format or character strokes as an image does not achieve the required robustness. Second, although synonym substitution and syntactic transformation provide some redundancy space, the capacity remains limited. Furthermore, because natural language processing technology is still immature, analysis errors inevitably cause rewriting errors, which harm imperceptibility. For the same reason, the robustness and imperceptibility of natural language watermarking cannot be evaluated with the methods used for image watermarking. Yet there has been little research on the robustness and imperceptibility of natural language watermarking, and the existing evaluations are neither comprehensive nor theoretically grounded. This thesis therefore studies the robustness and imperceptibility of natural language watermarking and then designs a new algorithm. The main work and contributions are as follows:
1. Research on an adversary model for natural language watermarking. Based on the characteristics of natural language, an adversary model is proposed that simulates the various kinds of real attacks by means of the replace attack and the summary attack. An automatic attack tool is designed and developed to evaluate the robustness of natural language watermarking systems. The relationship between attack strength and value-in-use is investigated. The experimental results show that when the compression ratio is 0.9 and the attack strength is less than 0.2, the summary attack does not significantly affect the value of the carrier.
2. Research on the robustness of natural language watermarking. Based on the adversary model and the characteristics of natural language, the relationship between attack strength and bit error is investigated, and theoretical models are proposed and verified by experiments. The experimental results show that existing natural language watermarking schemes are not as robust as researchers expected. Algorithms that cannot be classified are analyzed one by one. Under the summary attack, an n-dimensional hypersphere model is used to prove that spread-spectrum vector watermarking cannot satisfy the false positive rate and false negative rate requirements at the same time. A replace attack on one sentence chosen from each class of part-of-speech tag sequence shows that part-of-speech tag sequence watermarking can be destroyed by a few attacks. Based on the influence of the replace attack on zero watermarking, the synchronization attack and the birthday attack are proposed. Both theory and experiment show that combining the two attacks almost certainly destroys the zero watermark at a rewrite ratio of less than 0.4%.
3. Research on a method for evaluating the imperceptibility of natural language watermarking. Drawing on psycholinguistics, the effects of rewritten text on human perception are analyzed, and a perceptual model of natural language watermarking is proposed. Based on this model, an automatic system is designed for evaluating the imperceptibility of natural language watermarking. The experimental results show that this evaluation method is consistent with human perception.
4. Research on natural language watermarking design. The concept of a cover unit is proposed, which hides the details of the different natural language processing technologies.
Based on the cover unit, a general embedding and extraction algorithm is designed that combines an arbitrary number of carrier manipulation techniques with a watermark coding algorithm. As a result, capacity is improved, and robustness and imperceptibility can be handled separately. For watermark coding, a hash-domain algorithm is proposed that uses the hash value of the cover unit. The bit error ratio of this coding algorithm is almost 0. In addition, a balance algorithm is proposed that can trade off robustness against rewrite ratio. For cover manipulation, because deep analysis in natural language processing has low accuracy, part-of-speech tag sequences are used to describe the transformation relations between sentence patterns, and transformations are filtered by measuring the pragmatic similarity of sentence patterns to guarantee their consistency. The experimental results show that this method markedly reduces errors and thus improves the imperceptibility of the watermark.
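To make the hash-domain idea concrete, the following is a minimal sketch and not the algorithm from the thesis. The abstract does not specify the hash function, the form of a cover unit, or the balance algorithm, so everything here is an assumption for illustration: a cover unit is taken to be a single word drawn from a small hypothetical synonym table (`SYNONYM_SETS`), its hidden bit is the lowest bit of the first SHA-256 byte, and the only carrier manipulation is synonym substitution. The helper names `hash_bit`, `usable_sets`, `embed`, and `extract` are likewise hypothetical.

```python
import hashlib

# Hypothetical synonym table: each set is one group of interchangeable words.
# A real system would obtain such groups from an NLP resource.
SYNONYM_SETS = [
    {"big", "large", "huge"},
    {"quick", "fast", "rapid"},
    {"begin", "start", "commence"},
    {"show", "display", "exhibit"},
]


def hash_bit(word: str) -> int:
    """Bit carried by a cover unit: lowest bit of the first SHA-256 byte (assumed)."""
    return hashlib.sha256(word.encode("utf-8")).digest()[0] & 1


def usable_sets():
    """Keep only groups whose members can express both bit values."""
    return [s for s in SYNONYM_SETS if {hash_bit(w) for w in s} == {0, 1}]


def build_lookup():
    """Map every word of a usable group to its group."""
    return {w: s for s in usable_sets() for w in s}


def embed(tokens, bits):
    """Rewrite cover units so that their hash bits spell out `bits` in order."""
    lookup = build_lookup()
    out = list(tokens)
    pending = list(bits)
    for i, tok in enumerate(out):
        if not pending:
            break
        group = lookup.get(tok)
        if group is None:
            continue  # not a cover unit, leave untouched
        bit = pending.pop(0)
        if hash_bit(tok) != bit:
            # Substitute only when needed, which keeps the rewrite ratio low.
            out[i] = min(w for w in group if hash_bit(w) == bit)
    if pending:
        raise ValueError("not enough cover units for the payload")
    return out


def extract(tokens, n_bits):
    """Read the hash bits of the first `n_bits` cover units."""
    lookup = build_lookup()
    return [hash_bit(t) for t in tokens if t in lookup][:n_bits]
```

A usage example under the same assumptions: tokenize a text, call `embed(tokens, [1, 0])`, and recover the payload with `extract(marked_tokens, 2)`. Because extraction depends only on the hash of whichever variant is present, the extractor needs no knowledge of the original text, and other carrier manipulations could be plugged in behind the same cover-unit interface.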