{"data":{"id":"10.48550/arxiv.2111.10952","type":"dois","attributes":{"doi":"10.48550/arxiv.2111.10952","prefix":"10.48550","suffix":"arxiv.2111.10952","identifiers":[{"identifier":"2111.10952","identifierType":"arXiv"}],"alternateIdentifiers":[{"alternateIdentifierType":"arXiv","alternateIdentifier":"2111.10952"}],"creators":[{"name":"Aribandi, Vamsi","nameType":"Personal","givenName":"Vamsi","familyName":"Aribandi","affiliation":[],"nameIdentifiers":[]},{"name":"Tay, Yi","nameType":"Personal","givenName":"Yi","familyName":"Tay","affiliation":[],"nameIdentifiers":[]},{"name":"Schuster, Tal","nameType":"Personal","givenName":"Tal","familyName":"Schuster","affiliation":[],"nameIdentifiers":[]},{"name":"Rao, Jinfeng","nameType":"Personal","givenName":"Jinfeng","familyName":"Rao","affiliation":[],"nameIdentifiers":[]},{"name":"Zheng, Huaixiu Steven","nameType":"Personal","givenName":"Huaixiu Steven","familyName":"Zheng","affiliation":[],"nameIdentifiers":[]},{"name":"Mehta, Sanket Vaibhav","nameType":"Personal","givenName":"Sanket Vaibhav","familyName":"Mehta","affiliation":[],"nameIdentifiers":[]},{"name":"Zhuang, Honglei","nameType":"Personal","givenName":"Honglei","familyName":"Zhuang","affiliation":[],"nameIdentifiers":[]},{"name":"Tran, Vinh Q.","nameType":"Personal","givenName":"Vinh Q.","familyName":"Tran","affiliation":[],"nameIdentifiers":[]},{"name":"Bahri, Dara","nameType":"Personal","givenName":"Dara","familyName":"Bahri","affiliation":[],"nameIdentifiers":[]},{"name":"Ni, Jianmo","nameType":"Personal","givenName":"Jianmo","familyName":"Ni","affiliation":[],"nameIdentifiers":[]},{"name":"Gupta, Jai","nameType":"Personal","givenName":"Jai","familyName":"Gupta","affiliation":[],"nameIdentifiers":[]},{"name":"Hui, Kai","nameType":"Personal","givenName":"Kai","familyName":"Hui","affiliation":[],"nameIdentifiers":[]},{"name":"Ruder, Sebastian","nameType":"Personal","givenName":"Sebastian","familyName":"Ruder","affiliation":[],"nameIdentifiers":[]},{"name":"Metzler, Donald","nameType":"Personal","givenName":"Donald","familyName":"Metzler","affiliation":[],"nameIdentifiers":[]}],"titles":[{"title":"ExT5: Towards Extreme Multi-Task Scaling for Transfer Learning"}],"publisher":"arXiv","container":{},"publicationYear":2021,"subjects":[{"lang":"en","subject":"Computation and Language (cs.CL)","subjectScheme":"arXiv"},{"lang":"en","subject":"Machine Learning (cs.LG)","subjectScheme":"arXiv"},{"subject":"FOS: Computer and information sciences","subjectScheme":"Fields of Science and Technology (FOS)"},{"subject":"FOS: Computer and information sciences","schemeUri":"http://www.oecd.org/science/inno/38235147.pdf","subjectScheme":"Fields of Science and Technology (FOS)"}],"contributors":[],"dates":[{"date":"2021-11-22T02:34:46Z","dateType":"Submitted","dateInformation":"v1"},{"date":"2021-11-23T01:26:15Z","dateType":"Updated","dateInformation":"v1"},{"date":"2022-01-29T07:41:54Z","dateType":"Submitted","dateInformation":"v2"},{"date":"2022-02-01T01:09:52Z","dateType":"Updated","dateInformation":"v2"},{"date":"2021-11","dateType":"Available","dateInformation":"v1"},{"date":"2021","dateType":"Issued"}],"language":null,"types":{"ris":"GEN","bibtex":"misc","citeproc":"article","schemaOrg":"CreativeWork","resourceType":"Article","resourceTypeGeneral":"Preprint"},"relatedIdentifiers":[],"relatedItems":[],"sizes":[],"formats":[],"version":"2","rightsList":[{"rights":"Creative Commons Attribution 4.0 International","rightsUri":"https://creativecommons.org/licenses/by/4.0/legalcode","schemeUri":"https://spdx.org/licenses/","rightsIdentifier":"cc-by-4.0","rightsIdentifierScheme":"SPDX"}],"descriptions":[{"description":"Despite the recent success of multi-task learning and transfer learning for natural language processing (NLP), few works have systematically studied the effect of scaling up the number of tasks during pre-training. Towards this goal, this paper introduces ExMix (Extreme Mixture): a massive collection of 107 supervised NLP tasks across diverse domains and task-families. Using ExMix, we study the effect of multi-task pre-training at the largest scale to date, and analyze co-training transfer amongst common families of tasks. Through this analysis, we show that manually curating an ideal set of tasks for multi-task pre-training is not straightforward, and that multi-task scaling can vastly improve models on its own. Finally, we propose ExT5: a model pre-trained using a multi-task objective of self-supervised span denoising and supervised ExMix. Via extensive experiments, we show that ExT5 outperforms strong T5 baselines on SuperGLUE, GEM, Rainbow, Closed-Book QA tasks, and several tasks outside of ExMix. ExT5 also significantly improves sample efficiency while pre-training.","descriptionType":"Abstract"},{"description":"ICLR 2022; see https://youtu.be/FbRcbM4T-50 for a video overview of the paper","descriptionType":"Other"}],"geoLocations":[],"fundingReferences":[],"xml":"PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0idXRmLTgiPz4KPHJlc291cmNlIHhtbG5zPSJodHRwOi8vZGF0YWNpdGUub3JnL3NjaGVtYS9rZXJuZWwtNCIgeG1sbnM6eHNpPSJodHRwOi8vd3d3LnczLm9yZy8yMDAxL1hNTFNjaGVtYS1pbnN0YW5jZSIgeHNpOnNjaGVtYUxvY2F0aW9uPSJodHRwOi8vZGF0YWNpdGUub3JnL3NjaGVtYS9rZXJuZWwtNCBodHRwOi8vc2NoZW1hLmRhdGFjaXRlLm9yZy9tZXRhL2tlcm5lbC00LjMvbWV0YWRhdGEueHNkIj4KICA8aWRlbnRpZmllciBpZGVudGlmaWVyVHlwZT0iRE9JIj4xMC40ODU1MC9BUlhJVi4yMTExLjEwOTUyPC9pZGVudGlmaWVyPgogIDxhbHRlcm5hdGVJZGVudGlmaWVycz4KICAgIDxhbHRlcm5hdGVJZGVudGlmaWVyIGFsdGVybmF0ZUlkZW50aWZpZXJUeXBlPSJhclhpdiI+MjExMS4xMDk1MjwvYWx0ZXJuYXRlSWRlbnRpZmllcj4KICA8L2FsdGVybmF0ZUlkZW50aWZpZXJzPgogIDxjcmVhdG9ycz4KICAgIDxjcmVhdG9yPgogICAgICA8Y3JlYXRvck5hbWUgbmFtZVR5cGU9IlBlcnNvbmFsIj5BcmliYW5kaSwgVmFtc2k8L2NyZWF0b3JOYW1lPgogICAgICA8Z2l2ZW5OYW1lPlZhbXNpPC9naXZlbk5hbWU+CiAgICAgIDxmYW1pbHlOYW1lPkFyaWJhbmRpPC9mYW1pbHlOYW1lPgogICAgPC9jcmVhdG9yPgogICAgPGNyZWF0b3I+CiAgICAgIDxjcmVhdG9yTmFtZSBuYW1lVHlwZT0iUGVyc29uYWwiPlRheSwgWWk8L2NyZWF0b3JOYW1lPgogICAgICA8Z2l2ZW5OYW1lPllpPC9naXZlbk5hbWU+CiAgICAgIDxmYW1pbHlOYW1lPlRheTwvZmFtaWx5TmFtZT4KICAgIDwvY3JlYXRvcj4KICAgIDxjcmVhdG9yPgogICAgICA8Y3JlYXRvck5hbWUgbmFtZVR5cGU9IlBlcnNvbmFsIj5TY2h1c3RlciwgVGFsPC9jcmVhdG9yTmFtZT4KICAgICAgPGdpdmVuTmFtZT5UYWw8L2dpdmVuTmFtZT4KICAgICAgPGZhbWlseU5hbWU+U2NodXN0ZXI8L2ZhbWlseU5hbWU+CiAgICA8L2NyZWF0b3I+CiAgICA8Y3JlYXRvcj4KICAgICAgPGNyZWF0b3JOYW1lIG5hbWVUeXBlPSJQZXJzb25hbCI+UmFvLCBKaW5mZW5nPC9jcmVhdG9yTmFtZT4KICAgICAgPGdpdmVuTmFtZT5KaW5mZW5nPC9naXZlbk5hbWU+CiAgICAgIDxmYW1pbHlOYW1lPlJhbzwvZmFtaWx5TmFtZT4KICAgIDwvY3JlYXRvcj4KICAgIDxjcmVhdG9yPgogICAgICA8Y3JlYXRvck5hbWUgbmFtZVR5cGU9IlBlcnNvbmFsIj5aaGVuZywgSHVhaXhpdSBTdGV2ZW48L2NyZWF0b3JOYW1lPgogICAgICA8Z2l2ZW5OYW1lPkh1YWl4aXUgU3RldmVuPC9naXZlbk5hbWU+CiAgICAgIDxmYW1pbHlOYW1lPlpoZW5nPC9mYW1pbHlOYW1lPgogICAgPC9jcmVhdG9yPgogICAgPGNyZWF0b3I+CiAgICAgIDxjcmVhdG9yTmFtZSBuYW1lVHlwZT0iUGVyc29uYWwiPk1laHRhLCBTYW5rZXQgVmFpYmhhdjwvY3JlYXRvck5hbWU+CiAgICAgIDxnaXZlbk5hbWU+U2Fua2V0IFZhaWJoYXY8L2dpdmVuTmFtZT4KICAgICAgPGZhbWlseU5hbWU+TWVodGE8L2ZhbWlseU5hbWU+CiAgICA8L2NyZWF0b3I+CiAgICA8Y3JlYXRvcj4KICAgICAgPGNyZWF0b3JOYW1lIG5hbWVUeXBlPSJQZXJzb25hbCI+Wmh1YW5nLCBIb25nbGVpPC9jcmVhdG9yTmFtZT4KICAgICAgPGdpdmVuTmFtZT5Ib25nbGVpPC9naXZlbk5hbWU+CiAgICAgIDxmYW1pbHlOYW1lPlpodWFuZzwvZmFtaWx5TmFtZT4KICAgIDwvY3JlYXRvcj4KICAgIDxjcmVhdG9yPgogICAgICA8Y3JlYXRvck5hbWUgbmFtZVR5cGU9IlBlcnNvbmFsIj5UcmFuLCBWaW5oIFEuPC9jcmVhdG9yTmFtZT4KICAgICAgPGdpdmVuTmFtZT5WaW5oIFEuPC9naXZlbk5hbWU+CiAgICAgIDxmYW1pbHlOYW1lPlRyYW48L2ZhbWlseU5hbWU+CiAgICA8L2NyZWF0b3I+CiAgICA8Y3JlYXRvcj4KICAgICAgPGNyZWF0b3JOYW1lIG5hbWVUeXBlPSJQZXJzb25hbCI+QmFocmksIERhcmE8L2NyZWF0b3JOYW1lPgogICAgICA8Z2l2ZW5OYW1lPkRhcmE8L2dpdmVuTmFtZT4KICAgICAgPGZhbWlseU5hbWU+QmFocmk8L2ZhbWlseU5hbWU+CiAgICA8L2NyZWF0b3I+CiAgICA8Y3JlYXRvcj4KICAgICAgPGNyZWF0b3JOYW1lIG5hbWVUeXBlPSJQZXJzb25hbCI+TmksIEppYW5tbzwvY3JlYXRvck5hbWU+CiAgICAgIDxnaXZlbk5hbWU+Smlhbm1vPC9naXZlbk5hbWU+CiAgICAgIDxmYW1pbHlOYW1lPk5pPC9mYW1pbHlOYW1lPgogICAgPC9jcmVhdG9yPgogICAgPGNyZWF0b3I+CiAgICAgIDxjcmVhdG9yTmFtZSBuYW1lVHlwZT0iUGVyc29uYWwiPkd1cHRhLCBKYWk8L2NyZWF0b3JOYW1lPgogICAgICA8Z2l2ZW5OYW1lPkphaTwvZ2l2ZW5OYW1lPgogICAgICA8ZmFtaWx5TmFtZT5HdXB0YTwvZmFtaWx5TmFtZT4KICAgIDwvY3JlYXRvcj4KICAgIDxjcmVhdG9yPgogICAgICA8Y3JlYXRvck5hbWUgbmFtZVR5cGU9IlBlcnNvbmFsIj5IdWksIEthaTwvY3JlYXRvck5hbWU+CiAgICAgIDxnaXZlbk5hbWU+S2FpPC9naXZlbk5hbWU+CiAgICAgIDxmYW1pbHlOYW1lPkh1aTwvZmFtaWx5TmFtZT4KICAgIDwvY3JlYXRvcj4KICAgIDxjcmVhdG9yPgogICAgICA8Y3JlYXRvck5hbWUgbmFtZVR5cGU9IlBlcnNvbmFsIj5SdWRlciwgU2ViYXN0aWFuPC9jcmVhdG9yTmFtZT4KICAgICAgPGdpdmVuTmFtZT5TZWJhc3RpYW48L2dpdmVuTmFtZT4KICAgICAgPGZhbWlseU5hbWU+UnVkZXI8L2ZhbWlseU5hbWU+CiAgICA8L2NyZWF0b3I+CiAgICA8Y3JlYXRvcj4KICAgICAgPGNyZWF0b3JOYW1lIG5hbWVUeXBlPSJQZXJzb25hbCI+TWV0emxlciwgRG9uYWxkPC9jcmVhdG9yTmFtZT4KICAgICAgPGdpdmVuTmFtZT5Eb25hbGQ8L2dpdmVuTmFtZT4KICAgICAgPGZhbWlseU5hbWU+TWV0emxlcjwvZmFtaWx5TmFtZT4KICAgIDwvY3JlYXRvcj4KICA8L2NyZWF0b3JzPgogIDx0aXRsZXM+CiAgICA8dGl0bGU+RXhUNTogVG93YXJkcyBFeHRyZW1lIE11bHRpLVRhc2sgU2NhbGluZyBmb3IgVHJhbnNmZXIgTGVhcm5pbmc8L3RpdGxlPgogIDwvdGl0bGVzPgogIDxwdWJsaXNoZXI+YXJYaXY8L3B1Ymxpc2hlcj4KICA8cHVibGljYXRpb25ZZWFyPjIwMjE8L3B1YmxpY2F0aW9uWWVhcj4KICA8c3ViamVjdHM+CiAgICA8c3ViamVjdCB4bWw6bGFuZz0iZW4iIHN1YmplY3RTY2hlbWU9ImFyWGl2Ij5Db21wdXRhdGlvbiBhbmQgTGFuZ3VhZ2UgKGNzLkNMKTwvc3ViamVjdD4KICAgIDxzdWJqZWN0IHhtbDpsYW5nPSJlbiIgc3ViamVjdFNjaGVtZT0iYXJYaXYiPk1hY2hpbmUgTGVhcm5pbmcgKGNzLkxHKTwvc3ViamVjdD4KICAgIDxzdWJqZWN0IHN1YmplY3RTY2hlbWU9IkZpZWxkcyBvZiBTY2llbmNlIGFuZCBUZWNobm9sb2d5IChGT1MpIj5GT1M6IENvbXB1dGVyIGFuZCBpbmZvcm1hdGlvbiBzY2llbmNlczwvc3ViamVjdD4KICA8L3N1YmplY3RzPgogIDxkYXRlcz4KICAgIDxkYXRlIGRhdGVUeXBlPSJTdWJtaXR0ZWQiIGRhdGVJbmZvcm1hdGlvbj0idjEiPjIwMjEtMTEtMjJUMDI6MzQ6NDZaPC9kYXRlPgogICAgPGRhdGUgZGF0ZVR5cGU9IlVwZGF0ZWQiIGRhdGVJbmZvcm1hdGlvbj0idjEiPjIwMjEtMTEtMjNUMDE6MjY6MTVaPC9kYXRlPgogICAgPGRhdGUgZGF0ZVR5cGU9IlN1Ym1pdHRlZCIgZGF0ZUluZm9ybWF0aW9uPSJ2MiI+MjAyMi0wMS0yOVQwNzo0MTo1NFo8L2RhdGU+CiAgICA8ZGF0ZSBkYXRlVHlwZT0iVXBkYXRlZCIgZGF0ZUluZm9ybWF0aW9uPSJ2MiI+MjAyMi0wMi0wMVQwMTowOTo1Mlo8L2RhdGU+CiAgICA8ZGF0ZSBkYXRlVHlwZT0iQXZhaWxhYmxlIiBkYXRlSW5mb3JtYXRpb249InYxIj4yMDIxLTExPC9kYXRlPgogIDwvZGF0ZXM+CiAgPHJlc291cmNlVHlwZSByZXNvdXJjZVR5cGVHZW5lcmFsPSJQcmVwcmludCI+QXJ0aWNsZTwvcmVzb3VyY2VUeXBlPgogIDx2ZXJzaW9uPjI8L3ZlcnNpb24+CiAgPHJpZ2h0c0xpc3Q+CiAgICA8cmlnaHRzIHJpZ2h0c1VSST0iaHR0cDovL2NyZWF0aXZlY29tbW9ucy5vcmcvbGljZW5zZXMvYnkvNC4wLyIgcmlnaHRzSWRlbnRpZmllclNjaGVtZT0iU1BEWCIgcmlnaHRzSWRlbnRpZmllcj0iQ0MtQlktNC4wIj5DcmVhdGl2ZSBDb21tb25zIEF0dHJpYnV0aW9uIDQuMCBJbnRlcm5hdGlvbmFsPC9yaWdodHM+CiAgPC9yaWdodHNMaXN0PgogIDxkZXNjcmlwdGlvbnM+CiAgICA8ZGVzY3JpcHRpb24gZGVzY3JpcHRpb25UeXBlPSJBYnN0cmFjdCI+RGVzcGl0ZSB0aGUgcmVjZW50IHN1Y2Nlc3Mgb2YgbXVsdGktdGFzayBsZWFybmluZyBhbmQgdHJhbnNmZXIgbGVhcm5pbmcgZm9yIG5hdHVyYWwgbGFuZ3VhZ2UgcHJvY2Vzc2luZyAoTkxQKSwgZmV3IHdvcmtzIGhhdmUgc3lzdGVtYXRpY2FsbHkgc3R1ZGllZCB0aGUgZWZmZWN0IG9mIHNjYWxpbmcgdXAgdGhlIG51bWJlciBvZiB0YXNrcyBkdXJpbmcgcHJlLXRyYWluaW5nLiBUb3dhcmRzIHRoaXMgZ29hbCwgdGhpcyBwYXBlciBpbnRyb2R1Y2VzIEV4TWl4IChFeHRyZW1lIE1peHR1cmUpOiBhIG1hc3NpdmUgY29sbGVjdGlvbiBvZiAxMDcgc3VwZXJ2aXNlZCBOTFAgdGFza3MgYWNyb3NzIGRpdmVyc2UgZG9tYWlucyBhbmQgdGFzay1mYW1pbGllcy4gVXNpbmcgRXhNaXgsIHdlIHN0dWR5IHRoZSBlZmZlY3Qgb2YgbXVsdGktdGFzayBwcmUtdHJhaW5pbmcgYXQgdGhlIGxhcmdlc3Qgc2NhbGUgdG8gZGF0ZSwgYW5kIGFuYWx5emUgY28tdHJhaW5pbmcgdHJhbnNmZXIgYW1vbmdzdCBjb21tb24gZmFtaWxpZXMgb2YgdGFza3MuIFRocm91Z2ggdGhpcyBhbmFseXNpcywgd2Ugc2hvdyB0aGF0IG1hbnVhbGx5IGN1cmF0aW5nIGFuIGlkZWFsIHNldCBvZiB0YXNrcyBmb3IgbXVsdGktdGFzayBwcmUtdHJhaW5pbmcgaXMgbm90IHN0cmFpZ2h0Zm9yd2FyZCwgYW5kIHRoYXQgbXVsdGktdGFzayBzY2FsaW5nIGNhbiB2YXN0bHkgaW1wcm92ZSBtb2RlbHMgb24gaXRzIG93bi4gRmluYWxseSwgd2UgcHJvcG9zZSBFeFQ1OiBhIG1vZGVsIHByZS10cmFpbmVkIHVzaW5nIGEgbXVsdGktdGFzayBvYmplY3RpdmUgb2Ygc2VsZi1zdXBlcnZpc2VkIHNwYW4gZGVub2lzaW5nIGFuZCBzdXBlcnZpc2VkIEV4TWl4LiBWaWEgZXh0ZW5zaXZlIGV4cGVyaW1lbnRzLCB3ZSBzaG93IHRoYXQgRXhUNSBvdXRwZXJmb3JtcyBzdHJvbmcgVDUgYmFzZWxpbmVzIG9uIFN1cGVyR0xVRSwgR0VNLCBSYWluYm93LCBDbG9zZWQtQm9vayBRQSB0YXNrcywgYW5kIHNldmVyYWwgdGFza3Mgb3V0c2lkZSBvZiBFeE1peC4gRXhUNSBhbHNvIHNpZ25pZmljYW50bHkgaW1wcm92ZXMgc2FtcGxlIGVmZmljaWVuY3kgd2hpbGUgcHJlLXRyYWluaW5nLjwvZGVzY3JpcHRpb24+CiAgICA8ZGVzY3JpcHRpb24gZGVzY3JpcHRpb25UeXBlPSJPdGhlciI+SUNMUiAyMDIyOyBzZWUgaHR0cHM6Ly95b3V0dS5iZS9GYlJjYk00VC01MCBmb3IgYSB2aWRlbyBvdmVydmlldyBvZiB0aGUgcGFwZXI8L2Rlc2NyaXB0aW9uPgogIDwvZGVzY3JpcHRpb25zPgo8L3Jlc291cmNlPg==","url":"https://arxiv.org/abs/2111.10952","contentUrl":null,"metadataVersion":0,"schemaVersion":"http://datacite.org/schema/kernel-4","source":"mds","isActive":true,"state":"findable","reason":null,"viewCount":0,"viewsOverTime":[],"downloadCount":0,"downloadsOverTime":[],"referenceCount":0,"citationCount":0,"citationsOverTime":[],"partCount":0,"partOfCount":0,"versionCount":0,"versionOfCount":0,"created":"2022-02-20T07:40:59.000Z","registered":"2022-02-20T07:41:00.000Z","published":"2021","updated":"2022-02-20T07:41:00.000Z"},"relationships":{"client":{"data":{"id":"arxiv.content","type":"clients"}},"provider":{"data":{"id":"arxiv","type":"providers"}},"media":{"data":{"id":"10.48550/arxiv.2111.10952","type":"media"}},"references":{"data":[]},"citations":{"data":[]},"parts":{"data":[]},"partOf":{"data":[]},"versions":{"data":[]},"versionOf":{"data":[]}}}}