{"data":{"id":"10.48550/arxiv.2106.03400","type":"dois","attributes":{"doi":"10.48550/arxiv.2106.03400","prefix":"10.48550","suffix":"arxiv.2106.03400","identifiers":[{"identifier":"2106.03400","identifierType":"arXiv"}],"alternateIdentifiers":[{"alternateIdentifierType":"arXiv","alternateIdentifier":"2106.03400"}],"creators":[{"name":"Yang, Yiqin","nameType":"Personal","givenName":"Yiqin","familyName":"Yang","affiliation":[],"nameIdentifiers":[]},{"name":"Ma, Xiaoteng","nameType":"Personal","givenName":"Xiaoteng","familyName":"Ma","affiliation":[],"nameIdentifiers":[]},{"name":"Li, Chenghao","nameType":"Personal","givenName":"Chenghao","familyName":"Li","affiliation":[],"nameIdentifiers":[]},{"name":"Zheng, Zewu","nameType":"Personal","givenName":"Zewu","familyName":"Zheng","affiliation":[],"nameIdentifiers":[]},{"name":"Zhang, Qiyuan","nameType":"Personal","givenName":"Qiyuan","familyName":"Zhang","affiliation":[],"nameIdentifiers":[]},{"name":"Huang, Gao","nameType":"Personal","givenName":"Gao","familyName":"Huang","affiliation":[],"nameIdentifiers":[]},{"name":"Yang, Jun","nameType":"Personal","givenName":"Jun","familyName":"Yang","affiliation":[],"nameIdentifiers":[]},{"name":"Zhao, Qianchuan","nameType":"Personal","givenName":"Qianchuan","familyName":"Zhao","affiliation":[],"nameIdentifiers":[]}],"titles":[{"title":"Believe What You See: Implicit Constraint Approach for Offline Multi-Agent Reinforcement Learning"}],"publisher":"arXiv","container":{},"publicationYear":2021,"subjects":[{"lang":"en","subject":"Artificial Intelligence (cs.AI)","subjectScheme":"arXiv"},{"subject":"FOS: Computer and information sciences","subjectScheme":"Fields of Science and Technology (FOS)"},{"subject":"FOS: Computer and information sciences","schemeUri":"http://www.oecd.org/science/inno/38235147.pdf","subjectScheme":"Fields of Science and Technology (FOS)"}],"contributors":[],"dates":[{"date":"2021-06-07T08:02:31Z","dateType":"Submitted","dateInformation":"v1"},{"date":"2021-06-08T00:36:16Z","dateType":"Updated","dateInformation":"v1"},{"date":"2021-10-26T10:50:50Z","dateType":"Submitted","dateInformation":"v2"},{"date":"2021-10-27T00:26:40Z","dateType":"Updated","dateInformation":"v2"},{"date":"2021-06","dateType":"Available","dateInformation":"v1"},{"date":"2021","dateType":"Issued"}],"language":null,"types":{"ris":"GEN","bibtex":"misc","citeproc":"article","schemaOrg":"CreativeWork","resourceType":"Article","resourceTypeGeneral":"Preprint"},"relatedIdentifiers":[],"relatedItems":[],"sizes":[],"formats":[],"version":"2","rightsList":[{"rights":"Creative Commons Attribution Non Commercial Share Alike 4.0 International","rightsUri":"https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode","schemeUri":"https://spdx.org/licenses/","rightsIdentifier":"cc-by-nc-sa-4.0","rightsIdentifierScheme":"SPDX"}],"descriptions":[{"description":"Learning from datasets without interaction with environments (Offline Learning) is an essential step to apply Reinforcement Learning (RL) algorithms in real-world scenarios. However, compared with the single-agent counterpart, offline multi-agent RL introduces more agents with the larger state and action space, which is more challenging but attracts little attention. We demonstrate current offline RL algorithms are ineffective in multi-agent systems due to the accumulated extrapolation error. In this paper, we propose a novel offline RL algorithm, named Implicit Constraint Q-learning (ICQ), which effectively alleviates the extrapolation error by only trusting the state-action pairs given in the dataset for value estimation. Moreover, we extend ICQ to multi-agent tasks by decomposing the joint-policy under the implicit constraint. Experimental results demonstrate that the extrapolation error is successfully controlled within a reasonable range and insensitive to the number of agents. We further show that ICQ achieves the state-of-the-art performance in the challenging multi-agent offline tasks (StarCraft II). Our code is public online at https://github.com/YiqinYang/ICQ.","descriptionType":"Abstract"},{"description":"Accepted by NeurIPS2021. The first two authors contributed equally to the work","descriptionType":"Other"}],"geoLocations":[],"fundingReferences":[],"xml":"PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0idXRmLTgiPz4KPHJlc291cmNlIHhtbG5zPSJodHRwOi8vZGF0YWNpdGUub3JnL3NjaGVtYS9rZXJuZWwtNCIgeG1sbnM6eHNpPSJodHRwOi8vd3d3LnczLm9yZy8yMDAxL1hNTFNjaGVtYS1pbnN0YW5jZSIgeHNpOnNjaGVtYUxvY2F0aW9uPSJodHRwOi8vZGF0YWNpdGUub3JnL3NjaGVtYS9rZXJuZWwtNCBodHRwOi8vc2NoZW1hLmRhdGFjaXRlLm9yZy9tZXRhL2tlcm5lbC00LjMvbWV0YWRhdGEueHNkIj4KICA8aWRlbnRpZmllciBpZGVudGlmaWVyVHlwZT0iRE9JIj4xMC40ODU1MC9BUlhJVi4yMTA2LjAzNDAwPC9pZGVudGlmaWVyPgogIDxhbHRlcm5hdGVJZGVudGlmaWVycz4KICAgIDxhbHRlcm5hdGVJZGVudGlmaWVyIGFsdGVybmF0ZUlkZW50aWZpZXJUeXBlPSJhclhpdiI+MjEwNi4wMzQwMDwvYWx0ZXJuYXRlSWRlbnRpZmllcj4KICA8L2FsdGVybmF0ZUlkZW50aWZpZXJzPgogIDxjcmVhdG9ycz4KICAgIDxjcmVhdG9yPgogICAgICA8Y3JlYXRvck5hbWUgbmFtZVR5cGU9IlBlcnNvbmFsIj5ZYW5nLCBZaXFpbjwvY3JlYXRvck5hbWU+CiAgICAgIDxnaXZlbk5hbWU+WWlxaW48L2dpdmVuTmFtZT4KICAgICAgPGZhbWlseU5hbWU+WWFuZzwvZmFtaWx5TmFtZT4KICAgIDwvY3JlYXRvcj4KICAgIDxjcmVhdG9yPgogICAgICA8Y3JlYXRvck5hbWUgbmFtZVR5cGU9IlBlcnNvbmFsIj5NYSwgWGlhb3Rlbmc8L2NyZWF0b3JOYW1lPgogICAgICA8Z2l2ZW5OYW1lPlhpYW90ZW5nPC9naXZlbk5hbWU+CiAgICAgIDxmYW1pbHlOYW1lPk1hPC9mYW1pbHlOYW1lPgogICAgPC9jcmVhdG9yPgogICAgPGNyZWF0b3I+CiAgICAgIDxjcmVhdG9yTmFtZSBuYW1lVHlwZT0iUGVyc29uYWwiPkxpLCBDaGVuZ2hhbzwvY3JlYXRvck5hbWU+CiAgICAgIDxnaXZlbk5hbWU+Q2hlbmdoYW88L2dpdmVuTmFtZT4KICAgICAgPGZhbWlseU5hbWU+TGk8L2ZhbWlseU5hbWU+CiAgICA8L2NyZWF0b3I+CiAgICA8Y3JlYXRvcj4KICAgICAgPGNyZWF0b3JOYW1lIG5hbWVUeXBlPSJQZXJzb25hbCI+WmhlbmcsIFpld3U8L2NyZWF0b3JOYW1lPgogICAgICA8Z2l2ZW5OYW1lPlpld3U8L2dpdmVuTmFtZT4KICAgICAgPGZhbWlseU5hbWU+Wmhlbmc8L2ZhbWlseU5hbWU+CiAgICA8L2NyZWF0b3I+CiAgICA8Y3JlYXRvcj4KICAgICAgPGNyZWF0b3JOYW1lIG5hbWVUeXBlPSJQZXJzb25hbCI+WmhhbmcsIFFpeXVhbjwvY3JlYXRvck5hbWU+CiAgICAgIDxnaXZlbk5hbWU+UWl5dWFuPC9naXZlbk5hbWU+CiAgICAgIDxmYW1pbHlOYW1lPlpoYW5nPC9mYW1pbHlOYW1lPgogICAgPC9jcmVhdG9yPgogICAgPGNyZWF0b3I+CiAgICAgIDxjcmVhdG9yTmFtZSBuYW1lVHlwZT0iUGVyc29uYWwiPkh1YW5nLCBHYW88L2NyZWF0b3JOYW1lPgogICAgICA8Z2l2ZW5OYW1lPkdhbzwvZ2l2ZW5OYW1lPgogICAgICA8ZmFtaWx5TmFtZT5IdWFuZzwvZmFtaWx5TmFtZT4KICAgIDwvY3JlYXRvcj4KICAgIDxjcmVhdG9yPgogICAgICA8Y3JlYXRvck5hbWUgbmFtZVR5cGU9IlBlcnNvbmFsIj5ZYW5nLCBKdW48L2NyZWF0b3JOYW1lPgogICAgICA8Z2l2ZW5OYW1lPkp1bjwvZ2l2ZW5OYW1lPgogICAgICA8ZmFtaWx5TmFtZT5ZYW5nPC9mYW1pbHlOYW1lPgogICAgPC9jcmVhdG9yPgogICAgPGNyZWF0b3I+CiAgICAgIDxjcmVhdG9yTmFtZSBuYW1lVHlwZT0iUGVyc29uYWwiPlpoYW8sIFFpYW5jaHVhbjwvY3JlYXRvck5hbWU+CiAgICAgIDxnaXZlbk5hbWU+UWlhbmNodWFuPC9naXZlbk5hbWU+CiAgICAgIDxmYW1pbHlOYW1lPlpoYW88L2ZhbWlseU5hbWU+CiAgICA8L2NyZWF0b3I+CiAgPC9jcmVhdG9ycz4KICA8dGl0bGVzPgogICAgPHRpdGxlPkJlbGlldmUgV2hhdCBZb3UgU2VlOiBJbXBsaWNpdCBDb25zdHJhaW50IEFwcHJvYWNoIGZvciBPZmZsaW5lIE11bHRpLUFnZW50IFJlaW5mb3JjZW1lbnQgTGVhcm5pbmc8L3RpdGxlPgogIDwvdGl0bGVzPgogIDxwdWJsaXNoZXI+YXJYaXY8L3B1Ymxpc2hlcj4KICA8cHVibGljYXRpb25ZZWFyPjIwMjE8L3B1YmxpY2F0aW9uWWVhcj4KICA8c3ViamVjdHM+CiAgICA8c3ViamVjdCB4bWw6bGFuZz0iZW4iIHN1YmplY3RTY2hlbWU9ImFyWGl2Ij5BcnRpZmljaWFsIEludGVsbGlnZW5jZSAoY3MuQUkpPC9zdWJqZWN0PgogICAgPHN1YmplY3Qgc3ViamVjdFNjaGVtZT0iRmllbGRzIG9mIFNjaWVuY2UgYW5kIFRlY2hub2xvZ3kgKEZPUykiPkZPUzogQ29tcHV0ZXIgYW5kIGluZm9ybWF0aW9uIHNjaWVuY2VzPC9zdWJqZWN0PgogIDwvc3ViamVjdHM+CiAgPGRhdGVzPgogICAgPGRhdGUgZGF0ZVR5cGU9IlN1Ym1pdHRlZCIgZGF0ZUluZm9ybWF0aW9uPSJ2MSI+MjAyMS0wNi0wN1QwODowMjozMVo8L2RhdGU+CiAgICA8ZGF0ZSBkYXRlVHlwZT0iVXBkYXRlZCIgZGF0ZUluZm9ybWF0aW9uPSJ2MSI+MjAyMS0wNi0wOFQwMDozNjoxNlo8L2RhdGU+CiAgICA8ZGF0ZSBkYXRlVHlwZT0iU3VibWl0dGVkIiBkYXRlSW5mb3JtYXRpb249InYyIj4yMDIxLTEwLTI2VDEwOjUwOjUwWjwvZGF0ZT4KICAgIDxkYXRlIGRhdGVUeXBlPSJVcGRhdGVkIiBkYXRlSW5mb3JtYXRpb249InYyIj4yMDIxLTEwLTI3VDAwOjI2OjQwWjwvZGF0ZT4KICAgIDxkYXRlIGRhdGVUeXBlPSJBdmFpbGFibGUiIGRhdGVJbmZvcm1hdGlvbj0idjEiPjIwMjEtMDY8L2RhdGU+CiAgPC9kYXRlcz4KICA8cmVzb3VyY2VUeXBlIHJlc291cmNlVHlwZUdlbmVyYWw9IlByZXByaW50Ij5BcnRpY2xlPC9yZXNvdXJjZVR5cGU+CiAgPHZlcnNpb24+MjwvdmVyc2lvbj4KICA8cmlnaHRzTGlzdD4KICAgIDxyaWdodHMgcmlnaHRzVVJJPSJodHRwOi8vY3JlYXRpdmVjb21tb25zLm9yZy9saWNlbnNlcy9ieS1uYy1zYS80LjAvIiByaWdodHNJZGVudGlmaWVyU2NoZW1lPSJTUERYIiByaWdodHNJZGVudGlmaWVyPSJDQy1CWS1OQy1TQS00LjAiPkNyZWF0aXZlIENvbW1vbnMgQXR0cmlidXRpb24gTm9uIENvbW1lcmNpYWwgU2hhcmUgQWxpa2UgNC4wIEludGVybmF0aW9uYWw8L3JpZ2h0cz4KICA8L3JpZ2h0c0xpc3Q+CiAgPGRlc2NyaXB0aW9ucz4KICAgIDxkZXNjcmlwdGlvbiBkZXNjcmlwdGlvblR5cGU9IkFic3RyYWN0Ij5MZWFybmluZyBmcm9tIGRhdGFzZXRzIHdpdGhvdXQgaW50ZXJhY3Rpb24gd2l0aCBlbnZpcm9ubWVudHMgKE9mZmxpbmUgTGVhcm5pbmcpIGlzIGFuIGVzc2VudGlhbCBzdGVwIHRvIGFwcGx5IFJlaW5mb3JjZW1lbnQgTGVhcm5pbmcgKFJMKSBhbGdvcml0aG1zIGluIHJlYWwtd29ybGQgc2NlbmFyaW9zLiBIb3dldmVyLCBjb21wYXJlZCB3aXRoIHRoZSBzaW5nbGUtYWdlbnQgY291bnRlcnBhcnQsIG9mZmxpbmUgbXVsdGktYWdlbnQgUkwgaW50cm9kdWNlcyBtb3JlIGFnZW50cyB3aXRoIHRoZSBsYXJnZXIgc3RhdGUgYW5kIGFjdGlvbiBzcGFjZSwgd2hpY2ggaXMgbW9yZSBjaGFsbGVuZ2luZyBidXQgYXR0cmFjdHMgbGl0dGxlIGF0dGVudGlvbi4gV2UgZGVtb25zdHJhdGUgY3VycmVudCBvZmZsaW5lIFJMIGFsZ29yaXRobXMgYXJlIGluZWZmZWN0aXZlIGluIG11bHRpLWFnZW50IHN5c3RlbXMgZHVlIHRvIHRoZSBhY2N1bXVsYXRlZCBleHRyYXBvbGF0aW9uIGVycm9yLiBJbiB0aGlzIHBhcGVyLCB3ZSBwcm9wb3NlIGEgbm92ZWwgb2ZmbGluZSBSTCBhbGdvcml0aG0sIG5hbWVkIEltcGxpY2l0IENvbnN0cmFpbnQgUS1sZWFybmluZyAoSUNRKSwgd2hpY2ggZWZmZWN0aXZlbHkgYWxsZXZpYXRlcyB0aGUgZXh0cmFwb2xhdGlvbiBlcnJvciBieSBvbmx5IHRydXN0aW5nIHRoZSBzdGF0ZS1hY3Rpb24gcGFpcnMgZ2l2ZW4gaW4gdGhlIGRhdGFzZXQgZm9yIHZhbHVlIGVzdGltYXRpb24uIE1vcmVvdmVyLCB3ZSBleHRlbmQgSUNRIHRvIG11bHRpLWFnZW50IHRhc2tzIGJ5IGRlY29tcG9zaW5nIHRoZSBqb2ludC1wb2xpY3kgdW5kZXIgdGhlIGltcGxpY2l0IGNvbnN0cmFpbnQuIEV4cGVyaW1lbnRhbCByZXN1bHRzIGRlbW9uc3RyYXRlIHRoYXQgdGhlIGV4dHJhcG9sYXRpb24gZXJyb3IgaXMgc3VjY2Vzc2Z1bGx5IGNvbnRyb2xsZWQgd2l0aGluIGEgcmVhc29uYWJsZSByYW5nZSBhbmQgaW5zZW5zaXRpdmUgdG8gdGhlIG51bWJlciBvZiBhZ2VudHMuIFdlIGZ1cnRoZXIgc2hvdyB0aGF0IElDUSBhY2hpZXZlcyB0aGUgc3RhdGUtb2YtdGhlLWFydCBwZXJmb3JtYW5jZSBpbiB0aGUgY2hhbGxlbmdpbmcgbXVsdGktYWdlbnQgb2ZmbGluZSB0YXNrcyAoU3RhckNyYWZ0IElJKS4gT3VyIGNvZGUgaXMgcHVibGljIG9ubGluZSBhdCBodHRwczovL2dpdGh1Yi5jb20vWWlxaW5ZYW5nL0lDUS48L2Rlc2NyaXB0aW9uPgogICAgPGRlc2NyaXB0aW9uIGRlc2NyaXB0aW9uVHlwZT0iT3RoZXIiPkFjY2VwdGVkIGJ5IE5ldXJJUFMyMDIxLiBUaGUgZmlyc3QgdHdvIGF1dGhvcnMgY29udHJpYnV0ZWQgZXF1YWxseSB0byB0aGUgd29yazwvZGVzY3JpcHRpb24+CiAgPC9kZXNjcmlwdGlvbnM+CjwvcmVzb3VyY2U+","url":"https://arxiv.org/abs/2106.03400","contentUrl":null,"metadataVersion":0,"schemaVersion":"http://datacite.org/schema/kernel-4","source":"mds","isActive":true,"state":"findable","reason":null,"viewCount":0,"viewsOverTime":[],"downloadCount":0,"downloadsOverTime":[],"referenceCount":0,"citationCount":0,"citationsOverTime":[],"partCount":0,"partOfCount":0,"versionCount":0,"versionOfCount":0,"created":"2022-02-23T14:04:07.000Z","registered":"2022-02-23T14:04:08.000Z","published":"2021","updated":"2022-02-23T14:04:08.000Z"},"relationships":{"client":{"data":{"id":"arxiv.content","type":"clients"}},"provider":{"data":{"id":"arxiv","type":"providers"}},"media":{"data":{"id":"10.48550/arxiv.2106.03400","type":"media"}},"references":{"data":[]},"citations":{"data":[]},"parts":{"data":[]},"partOf":{"data":[]},"versions":{"data":[]},"versionOf":{"data":[]}}}}