| Dataset | Domain | Task setting | Paper | Paper URL | Dataset URL |
| --- | --- | --- | --- | --- | --- |
| LCSTS | weibo | task-single | LCSTS: A Large Scale Chinese Short Text Summarization Dataset | https://arxiv.org/pdf/1506.05865.pdf | http://icrc.hitsz.edu.cn/Article/show/139.html |
| XSum | news | task-single | Don't Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization | https://arxiv.org/pdf/1808.08745.pdf | https://github.com/EdinburghNLP/XSum/tree/master/XSum-Dataset |
| RottenTomatoes | movie | task-multi | Neural Network-Based Abstract Generation for Opinions and Arguments | https://arxiv.org/pdf/1606.02785.pdf | http://www.ccs.neu.edu/home/luwang |
| Reddit TIFU | social-media | task-single | Abstractive Summarization of Reddit Posts with Multi-level Memory Networks | https://arxiv.org/pdf/1811.00783.pdf | https://drive.google.com/uc?id=1ffWfITKFMJeqjT8loC8aiCLRNJpc_XnF&export=download |
| BIGPATENT | patent | task-longtext, task-single | BIGPATENT: A Large-Scale Dataset for Abstractive and Coherent Summarization | https://arxiv.org/pdf/1906.03741.pdf | https://evasharma.github.io/bigpatent |
| CNNDM | news | task-single | Abstractive Text Summarization Using Sequence-to-Sequence RNNs and Beyond | https://arxiv.org/pdf/1602.06023.pdf | https://github.com/abisee/cnn-dailymail |
| arXiv | scientific-paper | task-longtext, task-single | A Discourse-Aware Attention Model for Abstractive Summarization of Long Documents | https://arxiv.org/pdf/1804.05685.pdf | https://github.com/armancohan/long-summarization |
| PubMed | scientific-paper | task-longtext, task-single | A Discourse-Aware Attention Model for Abstractive Summarization of Long Documents | https://arxiv.org/pdf/1804.05685.pdf | https://github.com/armancohan/long-summarization |
| Newsroom | news | task-single | Newsroom: A Dataset of 1.3 Million Summaries with Diverse Extractive Strategies | https://arxiv.org/pdf/1804.11283.pdf | https://summari.es |
| BillSum | legislation | task-single | BillSum: A Corpus for Automatic Summarization of US Legislation | https://arxiv.org/pdf/1910.00523.pdf | https://github.com/FiscalNote/BillSum |
| AESLC | email | task-single | This Email Could Save Your Life: Introducing the Task of Email Subject Line Generation | https://www.aclweb.org/anthology/P19-1043.pdf | https://github.com/ryanzhumich/AESLC |
| SAMSum | dialogue | task-single | SAMSum Corpus: A Human-annotated Dialogue Dataset for Abstractive Summarization | https://arxiv.org/pdf/1911.12237.pdf | https://arxiv.org/src/1911.12237v2/anc |
| Global Voices | multilingual | task-single, multilingual | Global Voices: Crossing Borders in Automatic News Summarization | https://www.aclweb.org/anthology/D19-5411.pdf |  |
| WikiSum | wiki | task-multi | Generating Wikipedia by Summarizing Long Sequences | https://arxiv.org/pdf/1801.10198.pdf | https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/data_generators/wikisum |
| ScisummNet | scientific-paper | task-single | ScisummNet: A Large Annotated Corpus and Content-Impact Models for Scientific Paper Summarization with Citation Networks | https://arxiv.org/pdf/1909.01716.pdf | https://cs.stanford.edu/~myasu/projects/scisumm_net |
| Multi-News | news | task-multi | Multi-News: a Large-Scale Multi-Document Summarization Dataset and Abstractive Hierarchical Model | https://arxiv.org/pdf/1906.01749.pdf | https://github.com/Alex-Fabbri/Multi-News |
| Auto-hMDS | multilingual | task-multi | Auto-hMDS: Automatic Construction of a Large Heterogeneous Multilingual Multi-Document Summarization Corpus | http://www.lrec-conf.org/proceedings/lrec2018/pdf/1018.pdf | https://github.com/AIPHES/auto-hMDS |
| WikiHow | wiki | task-single | WikiHow: A Large Scale Text Summarization Dataset | https://arxiv.org/pdf/1810.09305.pdf | https://github.com/mahnazkoupaee/WikiHow-Dataset |
| DisputeDiscussions | debate | task-single | Understanding and Detecting Supporting Arguments of Diverse Types | https://arxiv.org/pdf/1705.00045.pdf | http://www.ccs.neu.edu/home/luwang/data.html |
| Debatepedia | debate | task-question | Diversity driven Attention Model for Query-based Abstractive Summarization | https://arxiv.org/pdf/1704.08300v1.pdf | https://github.com/PrekshaNema25/DiverstiyBasedAttentionMechanism |
| Funcom | code | task-code | Recommendations for Datasets for Source Code Summarization | https://arxiv.org/pdf/1904.02660.pdf | http://leclair.tech/data/funcom/ |
| TalkSumm | scientific-paper | task-multi | TalkSumm: A Dataset and Scalable Annotation Method for Scientific Paper Summarization Based on Conference Talks | https://arxiv.org/pdf/1906.01351.pdf | https://github.com/levguy/talksumm |
| Multi-Aspect CNN/DM | news | task-aspect | Inducing Document Structure for Aspect-based Summarization | https://www.aclweb.org/anthology/P19-1630.pdf |  |
| proto-summ | court-judgment | task-single | How to Write Summaries with Patterns? Learning towards Abstractive Summarization through Prototype Editing | https://arxiv.org/pdf/1909.08837.pdf | https://github.com/gsh199449/proto-summ |
| PeerRead | peer-review | task-single | A Dataset of Peer Reviews (PeerRead): Collection, Insights and NLP Applications | https://arxiv.org/pdf/1804.09635.pdf | https://github.com/allenai/PeerRead/tree/master/data |
| AMI | meeting | task-multimodal | The AMI meeting corpus: a pre-announcement | https://www.m-iti.org/uploads/MLMI2005_CaAsBoEtAl.pdf | http://groups.inf.ed.ac.uk/ami/download |
| BookSum | book | task-longtext | Explorations in Automatic Book Summarization | https://www.aclweb.org/anthology/D07-1040.pdf | http://lit.csci.unt.edu/index.php/Downloads |
| MScript | movie-script | task-single | Movie Script Summarization as Graph-based Scene Extraction | https://www.aclweb.org/anthology/N15-1113.pdf |  |
| Summarizing Opinions | product-review | task-opinion | Summarizing Opinions: Aspect Extraction Meets Sentiment Prediction and They Are Both Weakly Supervised | https://arxiv.org/pdf/1808.08858v1.pdf | https://github.com/stangelid/oposum |