| Dataset | Domain | Task-setting | Paper | Paper URL | Dataset URL |
| --- | --- | --- | --- | --- | --- |
| LCSTS | weibo | task-single | LCSTS: A Large Scale Chinese Short Text Summarization Dataset | https://arxiv.org/pdf/1506.05865.pdf | http://icrc.hitsz.edu.cn/Article/show/139.html |
| XSum | news | task-single | Don't Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization | https://arxiv.org/pdf/1808.08745.pdf | https://github.com/EdinburghNLP/XSum/tree/master/XSum-Dataset |
| RottenTomatoes | movie | task-multi | Neural Network-Based Abstract Generation for Opinions and Arguments | https://arxiv.org/pdf/1606.02785.pdf | http://www.ccs.neu.edu/home/luwang |
| Reddit TIFU | social-media | task-single | Abstractive Summarization of Reddit Posts with Multi-level Memory Networks | https://arxiv.org/pdf/1811.00783.pdf | https://drive.google.com/uc?id=1ffWfITKFMJeqjT8loC8aiCLRNJpc_XnF&export=download |
| BIGPATENT | patent | task-longtext, task-single | BIGPATENT: A Large-Scale Dataset for Abstractive and Coherent Summarization | https://arxiv.org/pdf/1906.03741.pdf | https://evasharma.github.io/bigpatent |
| CNNDM | news | task-single | Abstractive Text Summarization Using Sequence-to-Sequence RNNs and Beyond | https://arxiv.org/pdf/1602.06023.pdf | https://github.com/abisee/cnn-dailymail |
| arXiv | scientific-paper | task-longtext, task-single | A Discourse-Aware Attention Model for Abstractive Summarization of Long Documents | https://arxiv.org/pdf/1804.05685.pdf | https://github.com/armancohan/long-summarization |
| PubMed | scientific-paper | task-longtext, task-single | A Discourse-Aware Attention Model for Abstractive Summarization of Long Documents | https://arxiv.org/pdf/1804.05685.pdf | https://github.com/armancohan/long-summarization |
| Newsroom | news | task-single | Newsroom: A Dataset of 1.3 Million Summaries with Diverse Extractive Strategies | https://arxiv.org/pdf/1804.11283.pdf | https://summari.es |
| BillSum | legislation | task-single | BillSum: A Corpus for Automatic Summarization of US Legislation | https://arxiv.org/pdf/1910.00523.pdf | https://github.com/FiscalNote/BillSum |
| AESLC | email | task-single | This Email Could Save Your Life: Introducing the Task of Email Subject Line Generation | https://www.aclweb.org/anthology/P19-1043.pdf | https://github.com/ryanzhumich/AESLC |
| SAMSum | dialogue | task-single | SAMSum Corpus: A Human-annotated Dialogue Dataset for Abstractive Summarization | https://arxiv.org/pdf/1911.12237.pdf | https://arxiv.org/src/1911.12237v2/anc |
| Global Voices | multilingual | task-single, multilingual | Global Voices: Crossing Borders in Automatic News Summarization | https://www.aclweb.org/anthology/D19-5411.pdf | |
| WikiSum | wiki | task-multi | Generating Wikipedia by Summarizing Long Sequences | https://arxiv.org/pdf/1801.10198.pdf | https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/data_generators/wikisum |
| ScisummNet | scientific-paper | task-single | ScisummNet: A Large Annotated Corpus and Content-Impact Models for Scientific Paper Summarization with Citation Networks | https://arxiv.org/pdf/1909.01716.pdf | https://cs.stanford.edu/~myasu/projects/scisumm_net |
| Multi-News | news | task-multi | Multi-News: a Large-Scale Multi-Document Summarization Dataset and Abstractive Hierarchical Model | https://arxiv.org/pdf/1906.01749.pdf | https://github.com/Alex-Fabbri/Multi-News |
| Auto-hMDS | multilingual | task-multi | Auto-hMDS: Automatic Construction of a Large Heterogeneous Multilingual Multi-Document Summarization Corpus | http://www.lrec-conf.org/proceedings/lrec2018/pdf/1018.pdf | https://github.com/AIPHES/auto-hMDS |
| WikiHow | wiki | task-single | WikiHow: A Large Scale Text Summarization Dataset | https://arxiv.org/pdf/1810.09305.pdf | https://github.com/mahnazkoupaee/WikiHow-Dataset |
| DisputeDiscussions | debate | task-single | Understanding and Detecting Supporting Arguments of Diverse Types | https://arxiv.org/pdf/1705.00045.pdf | http://www.ccs.neu.edu/home/luwang/data.html |
| Debatepedia | debate | task-question | Diversity driven Attention Model for Query-based Abstractive Summarization | https://arxiv.org/pdf/1704.08300v1.pdf | https://github.com/PrekshaNema25/DiverstiyBasedAttentionMechanism |
| Funcom | code | task-code | Recommendations for Datasets for Source Code Summarization | https://arxiv.org/pdf/1904.02660.pdf | http://leclair.tech/data/funcom/ |
| TalkSumm | scientific-paper | task-multi | TalkSumm: A Dataset and Scalable Annotation Method for Scientific Paper Summarization Based on Conference Talks | https://arxiv.org/pdf/1906.01351.pdf | https://github.com/levguy/talksumm |
| Multi-Aspect CNN/DM | news | task-aspect | Inducing Document Structure for Aspect-based Summarization | https://www.aclweb.org/anthology/P19-1630.pdf | |
| proto-summ | court-judgment | task-single | How to Write Summaries with Patterns? Learning towards Abstractive Summarization through Prototype Editing | https://arxiv.org/pdf/1909.08837.pdf | https://github.com/gsh199449/proto-summ |
| PeerRead | peer-review | task-single | A Dataset of Peer Reviews (PeerRead): Collection, Insights and NLP Applications | https://arxiv.org/pdf/1804.09635.pdf | https://github.com/allenai/PeerRead/tree/master/data |
| AMI | meeting | task-multimodal | The AMI meeting corpus: a pre-announcement | https://www.m-iti.org/uploads/MLMI2005_CaAsBoEtAl.pdf | http://groups.inf.ed.ac.uk/ami/download |
| BookSum | book | task-longtext | Explorations in Automatic Book Summarization | https://www.aclweb.org/anthology/D07-1040.pdf | http://lit.csci.unt.edu/index.php/Downloads |
| MScript | movie-script | task-single | Movie Script Summarization as Graph-based Scene Extraction | https://www.aclweb.org/anthology/N15-1113.pdf | |
| Summarizing Opinions | product-review | task-opinion | Summarizing Opinions: Aspect Extraction Meets Sentiment Prediction and They Are Both Weakly Supervised | https://arxiv.org/pdf/1808.08858v1.pdf | https://github.com/stangelid/oposum |