{"data":{"id":"10.48550/arxiv.2102.10697","type":"dois","attributes":{"doi":"10.48550/arxiv.2102.10697","prefix":"10.48550","suffix":"arxiv.2102.10697","identifiers":[{"identifier":"2102.10697","identifierType":"arXiv"}],"alternateIdentifiers":[{"alternateIdentifierType":"arXiv","alternateIdentifier":"2102.10697"}],"creators":[{"name":"Fajcik, Martin","nameType":"Personal","givenName":"Martin","familyName":"Fajcik","affiliation":[],"nameIdentifiers":[]},{"name":"Docekal, Martin","nameType":"Personal","givenName":"Martin","familyName":"Docekal","affiliation":[],"nameIdentifiers":[]},{"name":"Ondrej, Karel","nameType":"Personal","givenName":"Karel","familyName":"Ondrej","affiliation":[],"nameIdentifiers":[]},{"name":"Smrz, Pavel","nameType":"Personal","givenName":"Pavel","familyName":"Smrz","affiliation":[],"nameIdentifiers":[]}],"titles":[{"title":"Pruning the Index Contents for Memory Efficient Open-Domain QA"}],"publisher":"arXiv","container":{},"publicationYear":2021,"subjects":[{"lang":"en","subject":"Computation and Language (cs.CL)","subjectScheme":"arXiv"},{"lang":"en","subject":"Artificial Intelligence (cs.AI)","subjectScheme":"arXiv"},{"lang":"en","subject":"Machine Learning (cs.LG)","subjectScheme":"arXiv"},{"subject":"FOS: Computer and information sciences","subjectScheme":"Fields of Science and Technology (FOS)"},{"subject":"FOS: Computer and information sciences","schemeUri":"http://www.oecd.org/science/inno/38235147.pdf","subjectScheme":"Fields of Science and Technology (FOS)"}],"contributors":[],"dates":[{"date":"2021-02-21T21:56:38Z","dateType":"Submitted","dateInformation":"v1"},{"date":"2021-02-23T01:29:20Z","dateType":"Updated","dateInformation":"v1"},{"date":"2021-04-09T19:02:54Z","dateType":"Submitted","dateInformation":"v2"},{"date":"2021-04-13T00:01:36Z","dateType":"Updated","dateInformation":"v2"},{"date":"2021-02","dateType":"Available","dateInformation":"v1"},{"date":"2021","dateType":"Issued"}],"language":null,"types":{"ris":"GEN","bibtex":"misc","citeproc":"article","schemaOrg":"CreativeWork","resourceType":"Article","resourceTypeGeneral":"Preprint"},"relatedIdentifiers":[],"relatedItems":[],"sizes":[],"formats":[],"version":"2","rightsList":[{"rights":"Creative Commons Attribution 4.0 International","rightsUri":"https://creativecommons.org/licenses/by/4.0/legalcode","schemeUri":"https://spdx.org/licenses/","rightsIdentifier":"cc-by-4.0","rightsIdentifierScheme":"SPDX"}],"descriptions":[{"description":"This work presents a novel pipeline that demonstrates what is achievable with a combined effort of state-of-the-art approaches. Specifically, it proposes the novel R2-D2 (Rank twice, reaD twice) pipeline composed of retriever, passage reranker, extractive reader, generative reader and a simple way to combine them. Furthermore, previous work often comes with a massive index of external documents that scales in the order of tens of GiB. This work presents a simple approach for pruning the contents of a massive index such that the open-domain QA system altogether with index, OS, and library components fits into 6GiB docker image while retaining only 8% of original index contents and losing only 3% EM accuracy.","descriptionType":"Abstract"},{"description":"v2 - added connection between pruner and DPR, results on TriviaQA, new reranker, results with HN-DPR checkpoint and additional analyses","descriptionType":"Other"}],"geoLocations":[],"fundingReferences":[],"xml":"PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0idXRmLTgiPz4KPHJlc291cmNlIHhtbG5zPSJodHRwOi8vZGF0YWNpdGUub3JnL3NjaGVtYS9rZXJuZWwtNCIgeG1sbnM6eHNpPSJodHRwOi8vd3d3LnczLm9yZy8yMDAxL1hNTFNjaGVtYS1pbnN0YW5jZSIgeHNpOnNjaGVtYUxvY2F0aW9uPSJodHRwOi8vZGF0YWNpdGUub3JnL3NjaGVtYS9rZXJuZWwtNCBodHRwOi8vc2NoZW1hLmRhdGFjaXRlLm9yZy9tZXRhL2tlcm5lbC00LjMvbWV0YWRhdGEueHNkIj4KICA8aWRlbnRpZmllciBpZGVudGlmaWVyVHlwZT0iRE9JIj4xMC40ODU1MC9BUlhJVi4yMTAyLjEwNjk3PC9pZGVudGlmaWVyPgogIDxhbHRlcm5hdGVJZGVudGlmaWVycz4KICAgIDxhbHRlcm5hdGVJZGVudGlmaWVyIGFsdGVybmF0ZUlkZW50aWZpZXJUeXBlPSJhclhpdiI+MjEwMi4xMDY5NzwvYWx0ZXJuYXRlSWRlbnRpZmllcj4KICA8L2FsdGVybmF0ZUlkZW50aWZpZXJzPgogIDxjcmVhdG9ycz4KICAgIDxjcmVhdG9yPgogICAgICA8Y3JlYXRvck5hbWUgbmFtZVR5cGU9IlBlcnNvbmFsIj5GYWpjaWssIE1hcnRpbjwvY3JlYXRvck5hbWU+CiAgICAgIDxnaXZlbk5hbWU+TWFydGluPC9naXZlbk5hbWU+CiAgICAgIDxmYW1pbHlOYW1lPkZhamNpazwvZmFtaWx5TmFtZT4KICAgIDwvY3JlYXRvcj4KICAgIDxjcmVhdG9yPgogICAgICA8Y3JlYXRvck5hbWUgbmFtZVR5cGU9IlBlcnNvbmFsIj5Eb2Nla2FsLCBNYXJ0aW48L2NyZWF0b3JOYW1lPgogICAgICA8Z2l2ZW5OYW1lPk1hcnRpbjwvZ2l2ZW5OYW1lPgogICAgICA8ZmFtaWx5TmFtZT5Eb2Nla2FsPC9mYW1pbHlOYW1lPgogICAgPC9jcmVhdG9yPgogICAgPGNyZWF0b3I+CiAgICAgIDxjcmVhdG9yTmFtZSBuYW1lVHlwZT0iUGVyc29uYWwiPk9uZHJlaiwgS2FyZWw8L2NyZWF0b3JOYW1lPgogICAgICA8Z2l2ZW5OYW1lPkthcmVsPC9naXZlbk5hbWU+CiAgICAgIDxmYW1pbHlOYW1lPk9uZHJlajwvZmFtaWx5TmFtZT4KICAgIDwvY3JlYXRvcj4KICAgIDxjcmVhdG9yPgogICAgICA8Y3JlYXRvck5hbWUgbmFtZVR5cGU9IlBlcnNvbmFsIj5TbXJ6LCBQYXZlbDwvY3JlYXRvck5hbWU+CiAgICAgIDxnaXZlbk5hbWU+UGF2ZWw8L2dpdmVuTmFtZT4KICAgICAgPGZhbWlseU5hbWU+U21yejwvZmFtaWx5TmFtZT4KICAgIDwvY3JlYXRvcj4KICA8L2NyZWF0b3JzPgogIDx0aXRsZXM+CiAgICA8dGl0bGU+UHJ1bmluZyB0aGUgSW5kZXggQ29udGVudHMgZm9yIE1lbW9yeSBFZmZpY2llbnQgT3Blbi1Eb21haW4gUUE8L3RpdGxlPgogIDwvdGl0bGVzPgogIDxwdWJsaXNoZXI+YXJYaXY8L3B1Ymxpc2hlcj4KICA8cHVibGljYXRpb25ZZWFyPjIwMjE8L3B1YmxpY2F0aW9uWWVhcj4KICA8c3ViamVjdHM+CiAgICA8c3ViamVjdCB4bWw6bGFuZz0iZW4iIHN1YmplY3RTY2hlbWU9ImFyWGl2Ij5Db21wdXRhdGlvbiBhbmQgTGFuZ3VhZ2UgKGNzLkNMKTwvc3ViamVjdD4KICAgIDxzdWJqZWN0IHhtbDpsYW5nPSJlbiIgc3ViamVjdFNjaGVtZT0iYXJYaXYiPkFydGlmaWNpYWwgSW50ZWxsaWdlbmNlIChjcy5BSSk8L3N1YmplY3Q+CiAgICA8c3ViamVjdCB4bWw6bGFuZz0iZW4iIHN1YmplY3RTY2hlbWU9ImFyWGl2Ij5NYWNoaW5lIExlYXJuaW5nIChjcy5MRyk8L3N1YmplY3Q+CiAgICA8c3ViamVjdCBzdWJqZWN0U2NoZW1lPSJGaWVsZHMgb2YgU2NpZW5jZSBhbmQgVGVjaG5vbG9neSAoRk9TKSI+Rk9TOiBDb21wdXRlciBhbmQgaW5mb3JtYXRpb24gc2NpZW5jZXM8L3N1YmplY3Q+CiAgPC9zdWJqZWN0cz4KICA8ZGF0ZXM+CiAgICA8ZGF0ZSBkYXRlVHlwZT0iU3VibWl0dGVkIiBkYXRlSW5mb3JtYXRpb249InYxIj4yMDIxLTAyLTIxVDIxOjU2OjM4WjwvZGF0ZT4KICAgIDxkYXRlIGRhdGVUeXBlPSJVcGRhdGVkIiBkYXRlSW5mb3JtYXRpb249InYxIj4yMDIxLTAyLTIzVDAxOjI5OjIwWjwvZGF0ZT4KICAgIDxkYXRlIGRhdGVUeXBlPSJTdWJtaXR0ZWQiIGRhdGVJbmZvcm1hdGlvbj0idjIiPjIwMjEtMDQtMDlUMTk6MDI6NTRaPC9kYXRlPgogICAgPGRhdGUgZGF0ZVR5cGU9IlVwZGF0ZWQiIGRhdGVJbmZvcm1hdGlvbj0idjIiPjIwMjEtMDQtMTNUMDA6MDE6MzZaPC9kYXRlPgogICAgPGRhdGUgZGF0ZVR5cGU9IkF2YWlsYWJsZSIgZGF0ZUluZm9ybWF0aW9uPSJ2MSI+MjAyMS0wMjwvZGF0ZT4KICA8L2RhdGVzPgogIDxyZXNvdXJjZVR5cGUgcmVzb3VyY2VUeXBlR2VuZXJhbD0iUHJlcHJpbnQiPkFydGljbGU8L3Jlc291cmNlVHlwZT4KICA8dmVyc2lvbj4yPC92ZXJzaW9uPgogIDxyaWdodHNMaXN0PgogICAgPHJpZ2h0cyByaWdodHNVUkk9Imh0dHA6Ly9jcmVhdGl2ZWNvbW1vbnMub3JnL2xpY2Vuc2VzL2J5LzQuMC8iIHJpZ2h0c0lkZW50aWZpZXJTY2hlbWU9IlNQRFgiIHJpZ2h0c0lkZW50aWZpZXI9IkNDLUJZLTQuMCI+Q3JlYXRpdmUgQ29tbW9ucyBBdHRyaWJ1dGlvbiA0LjAgSW50ZXJuYXRpb25hbDwvcmlnaHRzPgogIDwvcmlnaHRzTGlzdD4KICA8ZGVzY3JpcHRpb25zPgogICAgPGRlc2NyaXB0aW9uIGRlc2NyaXB0aW9uVHlwZT0iQWJzdHJhY3QiPlRoaXMgd29yayBwcmVzZW50cyBhIG5vdmVsIHBpcGVsaW5lIHRoYXQgZGVtb25zdHJhdGVzIHdoYXQgaXMgYWNoaWV2YWJsZSB3aXRoIGEgY29tYmluZWQgZWZmb3J0IG9mIHN0YXRlLW9mLXRoZS1hcnQgYXBwcm9hY2hlcy4gU3BlY2lmaWNhbGx5LCBpdCBwcm9wb3NlcyB0aGUgbm92ZWwgUjItRDIgKFJhbmsgdHdpY2UsIHJlYUQgdHdpY2UpIHBpcGVsaW5lIGNvbXBvc2VkIG9mIHJldHJpZXZlciwgcGFzc2FnZSByZXJhbmtlciwgZXh0cmFjdGl2ZSByZWFkZXIsIGdlbmVyYXRpdmUgcmVhZGVyIGFuZCBhIHNpbXBsZSB3YXkgdG8gY29tYmluZSB0aGVtLiBGdXJ0aGVybW9yZSwgcHJldmlvdXMgd29yayBvZnRlbiBjb21lcyB3aXRoIGEgbWFzc2l2ZSBpbmRleCBvZiBleHRlcm5hbCBkb2N1bWVudHMgdGhhdCBzY2FsZXMgaW4gdGhlIG9yZGVyIG9mIHRlbnMgb2YgR2lCLiBUaGlzIHdvcmsgcHJlc2VudHMgYSBzaW1wbGUgYXBwcm9hY2ggZm9yIHBydW5pbmcgdGhlIGNvbnRlbnRzIG9mIGEgbWFzc2l2ZSBpbmRleCBzdWNoIHRoYXQgdGhlIG9wZW4tZG9tYWluIFFBIHN5c3RlbSBhbHRvZ2V0aGVyIHdpdGggaW5kZXgsIE9TLCBhbmQgbGlicmFyeSBjb21wb25lbnRzIGZpdHMgaW50byA2R2lCIGRvY2tlciBpbWFnZSB3aGlsZSByZXRhaW5pbmcgb25seSA4JSBvZiBvcmlnaW5hbCBpbmRleCBjb250ZW50cyBhbmQgbG9zaW5nIG9ubHkgMyUgRU0gYWNjdXJhY3kuPC9kZXNjcmlwdGlvbj4KICAgIDxkZXNjcmlwdGlvbiBkZXNjcmlwdGlvblR5cGU9Ik90aGVyIj52MiAtIGFkZGVkIGNvbm5lY3Rpb24gYmV0d2VlbiBwcnVuZXIgYW5kIERQUiwgcmVzdWx0cyBvbiBUcml2aWFRQSwgbmV3IHJlcmFua2VyLCByZXN1bHRzIHdpdGggSE4tRFBSIGNoZWNrcG9pbnQgYW5kIGFkZGl0aW9uYWwgYW5hbHlzZXM8L2Rlc2NyaXB0aW9uPgogIDwvZGVzY3JpcHRpb25zPgo8L3Jlc291cmNlPg==","url":"https://arxiv.org/abs/2102.10697","contentUrl":null,"metadataVersion":0,"schemaVersion":"http://datacite.org/schema/kernel-4","source":"mds","isActive":true,"state":"findable","reason":null,"viewCount":0,"viewsOverTime":[],"downloadCount":0,"downloadsOverTime":[],"referenceCount":0,"citationCount":0,"citationsOverTime":[],"partCount":0,"partOfCount":0,"versionCount":0,"versionOfCount":0,"created":"2022-02-23T13:34:42.000Z","registered":"2022-02-23T13:34:43.000Z","published":"2021","updated":"2022-02-23T13:34:43.000Z"},"relationships":{"client":{"data":{"id":"arxiv.content","type":"clients"}},"provider":{"data":{"id":"arxiv","type":"providers"}},"media":{"data":{"id":"10.48550/arxiv.2102.10697","type":"media"}},"references":{"data":[]},"citations":{"data":[]},"parts":{"data":[]},"partOf":{"data":[]},"versions":{"data":[]},"versionOf":{"data":[]}}}}