Injecting Descriptive Meta-information into Pre-trained Language Models with Hypernetworks
(3-minute introduction)
Wenying Duan (Nanchang University, China), Xiaoxi He (ETH Zürich, Switzerland), Zimu Zhou (Singapore Management University, Singapore), Hong Rao (Nanchang University, China), Lothar Thiele (ETH Zürich, Switzerland)

Pre-trained language models have been widely adopted as backbones in various natural language processing tasks. However, existing pre-trained language models ignore the descriptive meta-information in the text, such as the distinction between the title and the main body, which leads them to assign too much attention weight to insignificant text. In this paper, we propose a hypernetwork-based architecture to model the descriptive meta-information and integrate it into pre-trained language models. Evaluations on three natural language processing tasks show that our method notably improves the performance of pre-trained language models and achieves state-of-the-art results on keyphrase extraction.
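
To make the general idea of a hypernetwork concrete, the following is a minimal sketch, not the architecture proposed in the paper, whose details the abstract does not give. It assumes that each piece of text carries a small meta-information embedding (here a hypothetical one-hot tag for "title" or "body") and that a generator maps this embedding to the weights of a linear layer applied to a token's hidden state. The class name `HyperLinear`, the dimensions, the residual connection, and the random initialisation are all illustrative assumptions.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class HyperLinear:
    """Hypothetical hypernetwork layer (illustration only).

    A generator maps a meta-information embedding to the weights of a
    hidden_dim x hidden_dim linear layer, so the transformation applied
    to a token depends on the meta-information attached to it.
    """

    def __init__(self, meta_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        # Generator parameters: one row per generated weight entry.
        self.generator = [
            [rng.uniform(-0.1, 0.1) for _ in range(meta_dim)]
            for _ in range(hidden_dim * hidden_dim)
        ]

    def generate_weights(self, meta_embedding):
        """Produce the layer's weight matrix from the meta embedding."""
        flat = matvec(self.generator, meta_embedding)
        d = self.hidden_dim
        return [flat[i * d:(i + 1) * d] for i in range(d)]

    def forward(self, hidden_state, meta_embedding):
        """Apply the generated layer with a residual connection (assumed)."""
        weights = self.generate_weights(meta_embedding)
        delta = matvec(weights, hidden_state)
        return [h + d for h, d in zip(hidden_state, delta)]


# Hypothetical one-hot meta-information tags.
META = {"title": [1.0, 0.0], "body": [0.0, 1.0]}

layer = HyperLinear(meta_dim=2, hidden_dim=3)
hidden = [0.5, -1.0, 2.0]  # stand-in for a token's hidden state

out_title = layer.forward(hidden, META["title"])
out_body = layer.forward(hidden, META["body"])
out_no_meta = layer.forward(hidden, [0.0, 0.0])
```

In this sketch the same hidden state is transformed differently depending on whether it is tagged as title or body text, and an all-zero meta embedding leaves the hidden state unchanged. In a real system the hidden states would come from the pre-trained language model and the generator would be trained jointly with the downstream task.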