{"data":{"id":"10.48550/arxiv.2303.03846","type":"dois","attributes":{"doi":"10.48550/arxiv.2303.03846","prefix":"10.48550","suffix":"arxiv.2303.03846","identifiers":[{"identifier":"2303.03846","identifierType":"arXiv"}],"alternateIdentifiers":[{"alternateIdentifierType":"arXiv","alternateIdentifier":"2303.03846"}],"creators":[{"name":"Wei, Jerry","nameType":"Personal","givenName":"Jerry","familyName":"Wei","affiliation":[],"nameIdentifiers":[]},{"name":"Wei, Jason","nameType":"Personal","givenName":"Jason","familyName":"Wei","affiliation":[],"nameIdentifiers":[]},{"name":"Tay, Yi","nameType":"Personal","givenName":"Yi","familyName":"Tay","affiliation":[],"nameIdentifiers":[]},{"name":"Tran, Dustin","nameType":"Personal","givenName":"Dustin","familyName":"Tran","affiliation":[],"nameIdentifiers":[]},{"name":"Webson, Albert","nameType":"Personal","givenName":"Albert","familyName":"Webson","affiliation":[],"nameIdentifiers":[]},{"name":"Lu, Yifeng","nameType":"Personal","givenName":"Yifeng","familyName":"Lu","affiliation":[],"nameIdentifiers":[]},{"name":"Chen, Xinyun","nameType":"Personal","givenName":"Xinyun","familyName":"Chen","affiliation":[],"nameIdentifiers":[]},{"name":"Liu, Hanxiao","nameType":"Personal","givenName":"Hanxiao","familyName":"Liu","affiliation":[],"nameIdentifiers":[]},{"name":"Huang, Da","nameType":"Personal","givenName":"Da","familyName":"Huang","affiliation":[],"nameIdentifiers":[]},{"name":"Zhou, Denny","nameType":"Personal","givenName":"Denny","familyName":"Zhou","affiliation":[],"nameIdentifiers":[]},{"name":"Ma, Tengyu","nameType":"Personal","givenName":"Tengyu","familyName":"Ma","affiliation":[],"nameIdentifiers":[]}],"titles":[{"title":"Larger language models do in-context learning differently"}],"publisher":"arXiv","container":{},"publicationYear":2023,"subjects":[{"lang":"en","subject":"Computation and Language (cs.CL)","subjectScheme":"arXiv"},{"subject":"FOS: Computer and information sciences","subjectScheme":"Fields of Science and Technology (FOS)"},{"subject":"FOS: Computer and information sciences","schemeUri":"http://www.oecd.org/science/inno/38235147.pdf","subjectScheme":"Fields of Science and Technology (FOS)"}],"contributors":[],"dates":[{"date":"2023-03-07T12:24:17Z","dateType":"Submitted","dateInformation":"v1"},{"date":"2023-03-08T01:18:53Z","dateType":"Updated","dateInformation":"v1"},{"date":"2023-03-08T07:37:43Z","dateType":"Submitted","dateInformation":"v2"},{"date":"2023-03-09T01:09:01Z","dateType":"Updated","dateInformation":"v2"},{"date":"2023-03","dateType":"Available","dateInformation":"v1"},{"date":"2023","dateType":"Issued"}],"language":null,"types":{"ris":"GEN","bibtex":"misc","citeproc":"article","schemaOrg":"CreativeWork","resourceType":"Article","resourceTypeGeneral":"Preprint"},"relatedIdentifiers":[],"relatedItems":[],"sizes":[],"formats":[],"version":"2","rightsList":[{"rights":"Creative Commons Attribution 4.0 International","rightsUri":"https://creativecommons.org/licenses/by/4.0/legalcode","schemeUri":"https://spdx.org/licenses/","rightsIdentifier":"cc-by-4.0","rightsIdentifierScheme":"SPDX"}],"descriptions":[{"description":"We study how in-context learning (ICL) in language models is affected by semantic priors versus input-label mappings. We investigate two setups-ICL with flipped labels and ICL with semantically-unrelated labels-across various model families (GPT-3, InstructGPT, Codex, PaLM, and Flan-PaLM). First, experiments on ICL with flipped labels show that overriding semantic priors is an emergent ability of model scale. While small language models ignore flipped labels presented in-context and thus rely primarily on semantic priors from pretraining, large models can override semantic priors when presented with in-context exemplars that contradict priors, despite the stronger semantic priors that larger models may hold. We next study semantically-unrelated label ICL (SUL-ICL), in which labels are semantically unrelated to their inputs (e.g., foo/bar instead of negative/positive), thereby forcing language models to learn the input-label mappings shown in in-context exemplars in order to perform the task. The ability to do SUL-ICL also emerges primarily with scale, and large-enough language models can even perform linear classification in a SUL-ICL setting. Finally, we evaluate instruction-tuned models and find that instruction tuning strengthens both the use of semantic priors and the capacity to learn input-label mappings, but more of the former.","descriptionType":"Abstract"}],"geoLocations":[],"fundingReferences":[],"xml":"PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0idXRmLTgiPz4KPHJlc291cmNlIHhtbG5zPSJodHRwOi8vZGF0YWNpdGUub3JnL3NjaGVtYS9rZXJuZWwtNCIgeG1sbnM6eHNpPSJodHRwOi8vd3d3LnczLm9yZy8yMDAxL1hNTFNjaGVtYS1pbnN0YW5jZSIgeHNpOnNjaGVtYUxvY2F0aW9uPSJodHRwOi8vZGF0YWNpdGUub3JnL3NjaGVtYS9rZXJuZWwtNCBodHRwOi8vc2NoZW1hLmRhdGFjaXRlLm9yZy9tZXRhL2tlcm5lbC00LjMvbWV0YWRhdGEueHNkIj4KICA8aWRlbnRpZmllciBpZGVudGlmaWVyVHlwZT0iRE9JIj4xMC40ODU1MC9BUlhJVi4yMzAzLjAzODQ2PC9pZGVudGlmaWVyPgogIDxhbHRlcm5hdGVJZGVudGlmaWVycz4KICAgIDxhbHRlcm5hdGVJZGVudGlmaWVyIGFsdGVybmF0ZUlkZW50aWZpZXJUeXBlPSJhclhpdiI+MjMwMy4wMzg0NjwvYWx0ZXJuYXRlSWRlbnRpZmllcj4KICA8L2FsdGVybmF0ZUlkZW50aWZpZXJzPgogIDxjcmVhdG9ycz4KICAgIDxjcmVhdG9yPgogICAgICA8Y3JlYXRvck5hbWUgbmFtZVR5cGU9IlBlcnNvbmFsIj5XZWksIEplcnJ5PC9jcmVhdG9yTmFtZT4KICAgICAgPGdpdmVuTmFtZT5KZXJyeTwvZ2l2ZW5OYW1lPgogICAgICA8ZmFtaWx5TmFtZT5XZWk8L2ZhbWlseU5hbWU+CiAgICA8L2NyZWF0b3I+CiAgICA8Y3JlYXRvcj4KICAgICAgPGNyZWF0b3JOYW1lIG5hbWVUeXBlPSJQZXJzb25hbCI+V2VpLCBKYXNvbjwvY3JlYXRvck5hbWU+CiAgICAgIDxnaXZlbk5hbWU+SmFzb248L2dpdmVuTmFtZT4KICAgICAgPGZhbWlseU5hbWU+V2VpPC9mYW1pbHlOYW1lPgogICAgPC9jcmVhdG9yPgogICAgPGNyZWF0b3I+CiAgICAgIDxjcmVhdG9yTmFtZSBuYW1lVHlwZT0iUGVyc29uYWwiPlRheSwgWWk8L2NyZWF0b3JOYW1lPgogICAgICA8Z2l2ZW5OYW1lPllpPC9naXZlbk5hbWU+CiAgICAgIDxmYW1pbHlOYW1lPlRheTwvZmFtaWx5TmFtZT4KICAgIDwvY3JlYXRvcj4KICAgIDxjcmVhdG9yPgogICAgICA8Y3JlYXRvck5hbWUgbmFtZVR5cGU9IlBlcnNvbmFsIj5UcmFuLCBEdXN0aW48L2NyZWF0b3JOYW1lPgogICAgICA8Z2l2ZW5OYW1lPkR1c3RpbjwvZ2l2ZW5OYW1lPgogICAgICA8ZmFtaWx5TmFtZT5UcmFuPC9mYW1pbHlOYW1lPgogICAgPC9jcmVhdG9yPgogICAgPGNyZWF0b3I+CiAgICAgIDxjcmVhdG9yTmFtZSBuYW1lVHlwZT0iUGVyc29uYWwiPldlYnNvbiwgQWxiZXJ0PC9jcmVhdG9yTmFtZT4KICAgICAgPGdpdmVuTmFtZT5BbGJlcnQ8L2dpdmVuTmFtZT4KICAgICAgPGZhbWlseU5hbWU+V2Vic29uPC9mYW1pbHlOYW1lPgogICAgPC9jcmVhdG9yPgogICAgPGNyZWF0b3I+CiAgICAgIDxjcmVhdG9yTmFtZSBuYW1lVHlwZT0iUGVyc29uYWwiPkx1LCBZaWZlbmc8L2NyZWF0b3JOYW1lPgogICAgICA8Z2l2ZW5OYW1lPllpZmVuZzwvZ2l2ZW5OYW1lPgogICAgICA8ZmFtaWx5TmFtZT5MdTwvZmFtaWx5TmFtZT4KICAgIDwvY3JlYXRvcj4KICAgIDxjcmVhdG9yPgogICAgICA8Y3JlYXRvck5hbWUgbmFtZVR5cGU9IlBlcnNvbmFsIj5DaGVuLCBYaW55dW48L2NyZWF0b3JOYW1lPgogICAgICA8Z2l2ZW5OYW1lPlhpbnl1bjwvZ2l2ZW5OYW1lPgogICAgICA8ZmFtaWx5TmFtZT5DaGVuPC9mYW1pbHlOYW1lPgogICAgPC9jcmVhdG9yPgogICAgPGNyZWF0b3I+CiAgICAgIDxjcmVhdG9yTmFtZSBuYW1lVHlwZT0iUGVyc29uYWwiPkxpdSwgSGFueGlhbzwvY3JlYXRvck5hbWU+CiAgICAgIDxnaXZlbk5hbWU+SGFueGlhbzwvZ2l2ZW5OYW1lPgogICAgICA8ZmFtaWx5TmFtZT5MaXU8L2ZhbWlseU5hbWU+CiAgICA8L2NyZWF0b3I+CiAgICA8Y3JlYXRvcj4KICAgICAgPGNyZWF0b3JOYW1lIG5hbWVUeXBlPSJQZXJzb25hbCI+SHVhbmcsIERhPC9jcmVhdG9yTmFtZT4KICAgICAgPGdpdmVuTmFtZT5EYTwvZ2l2ZW5OYW1lPgogICAgICA8ZmFtaWx5TmFtZT5IdWFuZzwvZmFtaWx5TmFtZT4KICAgIDwvY3JlYXRvcj4KICAgIDxjcmVhdG9yPgogICAgICA8Y3JlYXRvck5hbWUgbmFtZVR5cGU9IlBlcnNvbmFsIj5aaG91LCBEZW5ueTwvY3JlYXRvck5hbWU+CiAgICAgIDxnaXZlbk5hbWU+RGVubnk8L2dpdmVuTmFtZT4KICAgICAgPGZhbWlseU5hbWU+WmhvdTwvZmFtaWx5TmFtZT4KICAgIDwvY3JlYXRvcj4KICAgIDxjcmVhdG9yPgogICAgICA8Y3JlYXRvck5hbWUgbmFtZVR5cGU9IlBlcnNvbmFsIj5NYSwgVGVuZ3l1PC9jcmVhdG9yTmFtZT4KICAgICAgPGdpdmVuTmFtZT5UZW5neXU8L2dpdmVuTmFtZT4KICAgICAgPGZhbWlseU5hbWU+TWE8L2ZhbWlseU5hbWU+CiAgICA8L2NyZWF0b3I+CiAgPC9jcmVhdG9ycz4KICA8dGl0bGVzPgogICAgPHRpdGxlPkxhcmdlciBsYW5ndWFnZSBtb2RlbHMgZG8gaW4tY29udGV4dCBsZWFybmluZyBkaWZmZXJlbnRseTwvdGl0bGU+CiAgPC90aXRsZXM+CiAgPHB1Ymxpc2hlcj5hclhpdjwvcHVibGlzaGVyPgogIDxwdWJsaWNhdGlvblllYXI+MjAyMzwvcHVibGljYXRpb25ZZWFyPgogIDxzdWJqZWN0cz4KICAgIDxzdWJqZWN0IHhtbDpsYW5nPSJlbiIgc3ViamVjdFNjaGVtZT0iYXJYaXYiPkNvbXB1dGF0aW9uIGFuZCBMYW5ndWFnZSAoY3MuQ0wpPC9zdWJqZWN0PgogICAgPHN1YmplY3Qgc3ViamVjdFNjaGVtZT0iRmllbGRzIG9mIFNjaWVuY2UgYW5kIFRlY2hub2xvZ3kgKEZPUykiPkZPUzogQ29tcHV0ZXIgYW5kIGluZm9ybWF0aW9uIHNjaWVuY2VzPC9zdWJqZWN0PgogIDwvc3ViamVjdHM+CiAgPGRhdGVzPgogICAgPGRhdGUgZGF0ZVR5cGU9IlN1Ym1pdHRlZCIgZGF0ZUluZm9ybWF0aW9uPSJ2MSI+MjAyMy0wMy0wN1QxMjoyNDoxN1o8L2RhdGU+CiAgICA8ZGF0ZSBkYXRlVHlwZT0iVXBkYXRlZCIgZGF0ZUluZm9ybWF0aW9uPSJ2MSI+MjAyMy0wMy0wOFQwMToxODo1M1o8L2RhdGU+CiAgICA8ZGF0ZSBkYXRlVHlwZT0iU3VibWl0dGVkIiBkYXRlSW5mb3JtYXRpb249InYyIj4yMDIzLTAzLTA4VDA3OjM3OjQzWjwvZGF0ZT4KICAgIDxkYXRlIGRhdGVUeXBlPSJVcGRhdGVkIiBkYXRlSW5mb3JtYXRpb249InYyIj4yMDIzLTAzLTA5VDAxOjA5OjAxWjwvZGF0ZT4KICAgIDxkYXRlIGRhdGVUeXBlPSJBdmFpbGFibGUiIGRhdGVJbmZvcm1hdGlvbj0idjEiPjIwMjMtMDM8L2RhdGU+CiAgPC9kYXRlcz4KICA8cmVzb3VyY2VUeXBlIHJlc291cmNlVHlwZUdlbmVyYWw9IlByZXByaW50Ij5BcnRpY2xlPC9yZXNvdXJjZVR5cGU+CiAgPHZlcnNpb24+MjwvdmVyc2lvbj4KICA8cmlnaHRzTGlzdD4KICAgIDxyaWdodHMgcmlnaHRzVVJJPSJodHRwOi8vY3JlYXRpdmVjb21tb25zLm9yZy9saWNlbnNlcy9ieS80LjAvIiByaWdodHNJZGVudGlmaWVyU2NoZW1lPSJTUERYIiByaWdodHNJZGVudGlmaWVyPSJDQy1CWS00LjAiPkNyZWF0aXZlIENvbW1vbnMgQXR0cmlidXRpb24gNC4wIEludGVybmF0aW9uYWw8L3JpZ2h0cz4KICA8L3JpZ2h0c0xpc3Q+CiAgPGRlc2NyaXB0aW9ucz4KICAgIDxkZXNjcmlwdGlvbiBkZXNjcmlwdGlvblR5cGU9IkFic3RyYWN0Ij5XZSBzdHVkeSBob3cgaW4tY29udGV4dCBsZWFybmluZyAoSUNMKSBpbiBsYW5ndWFnZSBtb2RlbHMgaXMgYWZmZWN0ZWQgYnkgc2VtYW50aWMgcHJpb3JzIHZlcnN1cyBpbnB1dC1sYWJlbCBtYXBwaW5ncy4gV2UgaW52ZXN0aWdhdGUgdHdvIHNldHVwcy1JQ0wgd2l0aCBmbGlwcGVkIGxhYmVscyBhbmQgSUNMIHdpdGggc2VtYW50aWNhbGx5LXVucmVsYXRlZCBsYWJlbHMtYWNyb3NzIHZhcmlvdXMgbW9kZWwgZmFtaWxpZXMgKEdQVC0zLCBJbnN0cnVjdEdQVCwgQ29kZXgsIFBhTE0sIGFuZCBGbGFuLVBhTE0pLiBGaXJzdCwgZXhwZXJpbWVudHMgb24gSUNMIHdpdGggZmxpcHBlZCBsYWJlbHMgc2hvdyB0aGF0IG92ZXJyaWRpbmcgc2VtYW50aWMgcHJpb3JzIGlzIGFuIGVtZXJnZW50IGFiaWxpdHkgb2YgbW9kZWwgc2NhbGUuIFdoaWxlIHNtYWxsIGxhbmd1YWdlIG1vZGVscyBpZ25vcmUgZmxpcHBlZCBsYWJlbHMgcHJlc2VudGVkIGluLWNvbnRleHQgYW5kIHRodXMgcmVseSBwcmltYXJpbHkgb24gc2VtYW50aWMgcHJpb3JzIGZyb20gcHJldHJhaW5pbmcsIGxhcmdlIG1vZGVscyBjYW4gb3ZlcnJpZGUgc2VtYW50aWMgcHJpb3JzIHdoZW4gcHJlc2VudGVkIHdpdGggaW4tY29udGV4dCBleGVtcGxhcnMgdGhhdCBjb250cmFkaWN0IHByaW9ycywgZGVzcGl0ZSB0aGUgc3Ryb25nZXIgc2VtYW50aWMgcHJpb3JzIHRoYXQgbGFyZ2VyIG1vZGVscyBtYXkgaG9sZC4gV2UgbmV4dCBzdHVkeSBzZW1hbnRpY2FsbHktdW5yZWxhdGVkIGxhYmVsIElDTCAoU1VMLUlDTCksIGluIHdoaWNoIGxhYmVscyBhcmUgc2VtYW50aWNhbGx5IHVucmVsYXRlZCB0byB0aGVpciBpbnB1dHMgKGUuZy4sIGZvby9iYXIgaW5zdGVhZCBvZiBuZWdhdGl2ZS9wb3NpdGl2ZSksIHRoZXJlYnkgZm9yY2luZyBsYW5ndWFnZSBtb2RlbHMgdG8gbGVhcm4gdGhlIGlucHV0LWxhYmVsIG1hcHBpbmdzIHNob3duIGluIGluLWNvbnRleHQgZXhlbXBsYXJzIGluIG9yZGVyIHRvIHBlcmZvcm0gdGhlIHRhc2suIFRoZSBhYmlsaXR5IHRvIGRvIFNVTC1JQ0wgYWxzbyBlbWVyZ2VzIHByaW1hcmlseSB3aXRoIHNjYWxlLCBhbmQgbGFyZ2UtZW5vdWdoIGxhbmd1YWdlIG1vZGVscyBjYW4gZXZlbiBwZXJmb3JtIGxpbmVhciBjbGFzc2lmaWNhdGlvbiBpbiBhIFNVTC1JQ0wgc2V0dGluZy4gRmluYWxseSwgd2UgZXZhbHVhdGUgaW5zdHJ1Y3Rpb24tdHVuZWQgbW9kZWxzIGFuZCBmaW5kIHRoYXQgaW5zdHJ1Y3Rpb24gdHVuaW5nIHN0cmVuZ3RoZW5zIGJvdGggdGhlIHVzZSBvZiBzZW1hbnRpYyBwcmlvcnMgYW5kIHRoZSBjYXBhY2l0eSB0byBsZWFybiBpbnB1dC1sYWJlbCBtYXBwaW5ncywgYnV0IG1vcmUgb2YgdGhlIGZvcm1lci48L2Rlc2NyaXB0aW9uPgogIDwvZGVzY3JpcHRpb25zPgo8L3Jlc291cmNlPg==","url":"https://arxiv.org/abs/2303.03846","contentUrl":null,"metadataVersion":1,"schemaVersion":"http://datacite.org/schema/kernel-4","source":"mds","isActive":true,"state":"findable","reason":null,"viewCount":0,"viewsOverTime":[],"downloadCount":0,"downloadsOverTime":[],"referenceCount":0,"citationCount":1,"citationsOverTime":[{"year":"2024","total":1}],"partCount":0,"partOfCount":0,"versionCount":0,"versionOfCount":0,"created":"2023-03-08T02:10:41.000Z","registered":"2023-03-08T02:10:41.000Z","published":"2023","updated":"2024-08-29T08:52:59.000Z"},"relationships":{"client":{"data":{"id":"arxiv.content","type":"clients"}},"provider":{"data":{"id":"arxiv","type":"providers"}},"media":{"data":{"id":"10.48550/arxiv.2303.03846","type":"media"}},"references":{"data":[]},"citations":{"data":[{"id":"10.4230/lipics.cp.2024.20","type":"dois"}]},"parts":{"data":[]},"partOf":{"data":[]},"versions":{"data":[]},"versionOf":{"data":[]}}}}