:::info
Author:
(1) Mingda Chen.
:::
Table of Links
2.1 Self-Supervised Language Pretraining
2.2 Naturally-Occurring Data Structures
2.3 Sentence Variational Autoencoder
3 IMPROVING SELF-SUPERVISION FOR LANGUAGE PRETRAINING
3.1 Improving Language Representation Learning via Sentence Ordering Prediction
3.2 Improving In-Context Few-Shot Learning via Self-Supervised Training
4 LEARNING SEMANTIC KNOWLEDGE FROM WIKIPEDIA
4.1 Learning Entity Representations from Hyperlinks
4.2 Learning Discourse-Aware Sentence Representations from Document Structures
4.3 Learning Concept Hierarchies from Document Categories
5 DISENTANGLING LATENT REPRESENTATIONS FOR INTERPRETABILITY AND CONTROLLABILITY
5.1 Disentangling Semantics and Syntax in Sentence Representations
5.2 Controllable Paraphrase Generation with a Syntactic Exemplar
6 TAILORING TEXTUAL RESOURCES FOR EVALUATION TASKS
6.1 Long-Form Data-to-Text Generation
6.2 Long-Form Text Summarization
6.3 Story Generation with Constraints
APPENDIX A – APPENDIX TO CHAPTER 3
APPENDIX B – APPENDIX TO CHAPTER 6
6.4 Summary
In this chapter, we showed that naturally-occurring textual resources can be tailored to build datasets for long-form data-to-text generation, long-form text summarization, and story generation with constraints. For each dataset, we conducted experiments to characterize the challenges it poses. We also proposed new automatic or human-evaluation metrics, as well as models, for these tasks to promote research in these directions.
:::info
This paper is available on arXiv under a CC 4.0 license.
:::