{"data":{"id":"10.48550/arxiv.2006.11648","type":"dois","attributes":{"doi":"10.48550/arxiv.2006.11648","prefix":"10.48550","suffix":"arxiv.2006.11648","identifiers":[{"identifier":"2006.11648","identifierType":"arXiv"}],"alternateIdentifiers":[{"alternateIdentifierType":"arXiv","alternateIdentifier":"2006.11648"}],"creators":[{"name":"Brand, Jan van den","nameType":"Personal","givenName":"Jan van den","familyName":"Brand","affiliation":[],"nameIdentifiers":[]},{"name":"Peng, Binghui","nameType":"Personal","givenName":"Binghui","familyName":"Peng","affiliation":[],"nameIdentifiers":[]},{"name":"Song, Zhao","nameType":"Personal","givenName":"Zhao","familyName":"Song","affiliation":[],"nameIdentifiers":[]},{"name":"Weinstein, Omri","nameType":"Personal","givenName":"Omri","familyName":"Weinstein","affiliation":[],"nameIdentifiers":[]}],"titles":[{"title":"Training (Overparametrized) Neural Networks in Near-Linear Time"}],"publisher":"arXiv","container":{},"publicationYear":2020,"subjects":[{"lang":"en","subject":"Machine Learning (cs.LG)","subjectScheme":"arXiv"},{"lang":"en","subject":"Data Structures and Algorithms (cs.DS)","subjectScheme":"arXiv"},{"lang":"en","subject":"Machine Learning (stat.ML)","subjectScheme":"arXiv"},{"subject":"FOS: Computer and information sciences","subjectScheme":"Fields of Science and Technology (FOS)"},{"subject":"FOS: Computer and information sciences","schemeUri":"http://www.oecd.org/science/inno/38235147.pdf","subjectScheme":"Fields of Science and Technology (FOS)"}],"contributors":[],"dates":[{"date":"2020-06-20T20:26:14Z","dateType":"Submitted","dateInformation":"v1"},{"date":"2020-06-23T00:15:10Z","dateType":"Updated","dateInformation":"v1"},{"date":"2020-12-09T02:02:10Z","dateType":"Submitted","dateInformation":"v2"},{"date":"2020-12-10T01:08:14Z","dateType":"Updated","dateInformation":"v2"},{"date":"2020-06","dateType":"Available","dateInformation":"v1"},{"date":"2020","dateType":"Issued"}],"language":null,"types":{"ris":"GEN","bibtex":"misc","citeproc":"article","schemaOrg":"CreativeWork","resourceType":"Article","resourceTypeGeneral":"Preprint"},"relatedIdentifiers":[],"relatedItems":[],"sizes":[],"formats":[],"version":"2","rightsList":[{"rights":"Creative Commons Attribution Share Alike 4.0 International","rightsUri":"https://creativecommons.org/licenses/by-sa/4.0/legalcode","schemeUri":"https://spdx.org/licenses/","rightsIdentifier":"cc-by-sa-4.0","rightsIdentifierScheme":"SPDX"}],"descriptions":[{"description":"The slow convergence rate and pathological curvature issues of first-order gradient methods for training deep neural networks, initiated an ongoing effort for developing faster $\\mathit{second}$-$\\mathit{order}$ optimization algorithms beyond SGD, without compromising the generalization error. Despite their remarkable convergence rate ($\\mathit{independent}$ of the training batch size $n$), second-order algorithms incur a daunting slowdown in the $\\mathit{cost}$ $\\mathit{per}$ $\\mathit{iteration}$ (inverting the Hessian matrix of the loss function), which renders them impractical. Very recently, this computational overhead was mitigated by the works of [ZMG19,CGH+19}, yielding an $O(mn^2)$-time second-order algorithm for training two-layer overparametrized neural networks of polynomial width $m$. We show how to speed up the algorithm of [CGH+19], achieving an $\\tilde{O}(mn)$-time backpropagation algorithm for training (mildly overparametrized) ReLU networks, which is near-linear in the dimension ($mn$) of the full gradient (Jacobian) matrix. The centerpiece of our algorithm is to reformulate the Gauss-Newton iteration as an $\\ell_2$-regression problem, and then use a Fast-JL type dimension reduction to $\\mathit{precondition}$ the underlying Gram matrix in time independent of $M$, allowing to find a sufficiently good approximate solution via $\\mathit{first}$-$\\mathit{order}$ conjugate gradient. Our result provides a proof-of-concept that advanced machinery from randomized linear algebra -- which led to recent breakthroughs in $\\mathit{convex}$ $\\mathit{optimization}$ (ERM, LPs, Regression) -- can be carried over to the realm of deep learning as well.","descriptionType":"Abstract"}],"geoLocations":[],"fundingReferences":[],"xml":"PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0idXRmLTgiPz4KPHJlc291cmNlIHhtbG5zPSJodHRwOi8vZGF0YWNpdGUub3JnL3NjaGVtYS9rZXJuZWwtNCIgeG1sbnM6eHNpPSJodHRwOi8vd3d3LnczLm9yZy8yMDAxL1hNTFNjaGVtYS1pbnN0YW5jZSIgeHNpOnNjaGVtYUxvY2F0aW9uPSJodHRwOi8vZGF0YWNpdGUub3JnL3NjaGVtYS9rZXJuZWwtNCBodHRwOi8vc2NoZW1hLmRhdGFjaXRlLm9yZy9tZXRhL2tlcm5lbC00LjMvbWV0YWRhdGEueHNkIj4KICA8aWRlbnRpZmllciBpZGVudGlmaWVyVHlwZT0iRE9JIj4xMC40ODU1MC9BUlhJVi4yMDA2LjExNjQ4PC9pZGVudGlmaWVyPgogIDxhbHRlcm5hdGVJZGVudGlmaWVycz4KICAgIDxhbHRlcm5hdGVJZGVudGlmaWVyIGFsdGVybmF0ZUlkZW50aWZpZXJUeXBlPSJhclhpdiI+MjAwNi4xMTY0ODwvYWx0ZXJuYXRlSWRlbnRpZmllcj4KICA8L2FsdGVybmF0ZUlkZW50aWZpZXJzPgogIDxjcmVhdG9ycz4KICAgIDxjcmVhdG9yPgogICAgICA8Y3JlYXRvck5hbWUgbmFtZVR5cGU9IlBlcnNvbmFsIj5CcmFuZCwgSmFuIHZhbiBkZW48L2NyZWF0b3JOYW1lPgogICAgICA8Z2l2ZW5OYW1lPkphbiB2YW4gZGVuPC9naXZlbk5hbWU+CiAgICAgIDxmYW1pbHlOYW1lPkJyYW5kPC9mYW1pbHlOYW1lPgogICAgPC9jcmVhdG9yPgogICAgPGNyZWF0b3I+CiAgICAgIDxjcmVhdG9yTmFtZSBuYW1lVHlwZT0iUGVyc29uYWwiPlBlbmcsIEJpbmdodWk8L2NyZWF0b3JOYW1lPgogICAgICA8Z2l2ZW5OYW1lPkJpbmdodWk8L2dpdmVuTmFtZT4KICAgICAgPGZhbWlseU5hbWU+UGVuZzwvZmFtaWx5TmFtZT4KICAgIDwvY3JlYXRvcj4KICAgIDxjcmVhdG9yPgogICAgICA8Y3JlYXRvck5hbWUgbmFtZVR5cGU9IlBlcnNvbmFsIj5Tb25nLCBaaGFvPC9jcmVhdG9yTmFtZT4KICAgICAgPGdpdmVuTmFtZT5aaGFvPC9naXZlbk5hbWU+CiAgICAgIDxmYW1pbHlOYW1lPlNvbmc8L2ZhbWlseU5hbWU+CiAgICA8L2NyZWF0b3I+CiAgICA8Y3JlYXRvcj4KICAgICAgPGNyZWF0b3JOYW1lIG5hbWVUeXBlPSJQZXJzb25hbCI+V2VpbnN0ZWluLCBPbXJpPC9jcmVhdG9yTmFtZT4KICAgICAgPGdpdmVuTmFtZT5PbXJpPC9naXZlbk5hbWU+CiAgICAgIDxmYW1pbHlOYW1lPldlaW5zdGVpbjwvZmFtaWx5TmFtZT4KICAgIDwvY3JlYXRvcj4KICA8L2NyZWF0b3JzPgogIDx0aXRsZXM+CiAgICA8dGl0bGU+VHJhaW5pbmcgKE92ZXJwYXJhbWV0cml6ZWQpIE5ldXJhbCBOZXR3b3JrcyBpbiBOZWFyLUxpbmVhciBUaW1lPC90aXRsZT4KICA8L3RpdGxlcz4KICA8cHVibGlzaGVyPmFyWGl2PC9wdWJsaXNoZXI+CiAgPHB1YmxpY2F0aW9uWWVhcj4yMDIwPC9wdWJsaWNhdGlvblllYXI+CiAgPHN1YmplY3RzPgogICAgPHN1YmplY3QgeG1sOmxhbmc9ImVuIiBzdWJqZWN0U2NoZW1lPSJhclhpdiI+TWFjaGluZSBMZWFybmluZyAoY3MuTEcpPC9zdWJqZWN0PgogICAgPHN1YmplY3QgeG1sOmxhbmc9ImVuIiBzdWJqZWN0U2NoZW1lPSJhclhpdiI+RGF0YSBTdHJ1Y3R1cmVzIGFuZCBBbGdvcml0aG1zIChjcy5EUyk8L3N1YmplY3Q+CiAgICA8c3ViamVjdCB4bWw6bGFuZz0iZW4iIHN1YmplY3RTY2hlbWU9ImFyWGl2Ij5NYWNoaW5lIExlYXJuaW5nIChzdGF0Lk1MKTwvc3ViamVjdD4KICAgIDxzdWJqZWN0IHN1YmplY3RTY2hlbWU9IkZpZWxkcyBvZiBTY2llbmNlIGFuZCBUZWNobm9sb2d5IChGT1MpIj5GT1M6IENvbXB1dGVyIGFuZCBpbmZvcm1hdGlvbiBzY2llbmNlczwvc3ViamVjdD4KICA8L3N1YmplY3RzPgogIDxkYXRlcz4KICAgIDxkYXRlIGRhdGVUeXBlPSJTdWJtaXR0ZWQiIGRhdGVJbmZvcm1hdGlvbj0idjEiPjIwMjAtMDYtMjBUMjA6MjY6MTRaPC9kYXRlPgogICAgPGRhdGUgZGF0ZVR5cGU9IlVwZGF0ZWQiIGRhdGVJbmZvcm1hdGlvbj0idjEiPjIwMjAtMDYtMjNUMDA6MTU6MTBaPC9kYXRlPgogICAgPGRhdGUgZGF0ZVR5cGU9IlN1Ym1pdHRlZCIgZGF0ZUluZm9ybWF0aW9uPSJ2MiI+MjAyMC0xMi0wOVQwMjowMjoxMFo8L2RhdGU+CiAgICA8ZGF0ZSBkYXRlVHlwZT0iVXBkYXRlZCIgZGF0ZUluZm9ybWF0aW9uPSJ2MiI+MjAyMC0xMi0xMFQwMTowODoxNFo8L2RhdGU+CiAgICA8ZGF0ZSBkYXRlVHlwZT0iQXZhaWxhYmxlIiBkYXRlSW5mb3JtYXRpb249InYxIj4yMDIwLTA2PC9kYXRlPgogIDwvZGF0ZXM+CiAgPHJlc291cmNlVHlwZSByZXNvdXJjZVR5cGVHZW5lcmFsPSJQcmVwcmludCI+QXJ0aWNsZTwvcmVzb3VyY2VUeXBlPgogIDx2ZXJzaW9uPjI8L3ZlcnNpb24+CiAgPHJpZ2h0c0xpc3Q+CiAgICA8cmlnaHRzIHJpZ2h0c1VSST0iaHR0cDovL2NyZWF0aXZlY29tbW9ucy5vcmcvbGljZW5zZXMvYnktc2EvNC4wLyIgcmlnaHRzSWRlbnRpZmllclNjaGVtZT0iU1BEWCIgcmlnaHRzSWRlbnRpZmllcj0iQ0MtQlktU0EtNC4wIj5DcmVhdGl2ZSBDb21tb25zIEF0dHJpYnV0aW9uIFNoYXJlIEFsaWtlIDQuMCBJbnRlcm5hdGlvbmFsPC9yaWdodHM+CiAgPC9yaWdodHNMaXN0PgogIDxkZXNjcmlwdGlvbnM+CiAgICA8ZGVzY3JpcHRpb24gZGVzY3JpcHRpb25UeXBlPSJBYnN0cmFjdCI+VGhlIHNsb3cgY29udmVyZ2VuY2UgcmF0ZSBhbmQgcGF0aG9sb2dpY2FsIGN1cnZhdHVyZSBpc3N1ZXMgb2YgZmlyc3Qtb3JkZXIgZ3JhZGllbnQgbWV0aG9kcyBmb3IgdHJhaW5pbmcgZGVlcCBuZXVyYWwgbmV0d29ya3MsIGluaXRpYXRlZCBhbiBvbmdvaW5nIGVmZm9ydCBmb3IgZGV2ZWxvcGluZyBmYXN0ZXIgJFxtYXRoaXR7c2Vjb25kfSQtJFxtYXRoaXR7b3JkZXJ9JCBvcHRpbWl6YXRpb24gYWxnb3JpdGhtcyBiZXlvbmQgU0dELCB3aXRob3V0IGNvbXByb21pc2luZyB0aGUgZ2VuZXJhbGl6YXRpb24gZXJyb3IuIERlc3BpdGUgdGhlaXIgcmVtYXJrYWJsZSBjb252ZXJnZW5jZSByYXRlICgkXG1hdGhpdHtpbmRlcGVuZGVudH0kIG9mIHRoZSB0cmFpbmluZyBiYXRjaCBzaXplICRuJCksIHNlY29uZC1vcmRlciBhbGdvcml0aG1zIGluY3VyIGEgZGF1bnRpbmcgc2xvd2Rvd24gaW4gdGhlICRcbWF0aGl0e2Nvc3R9JCAkXG1hdGhpdHtwZXJ9JCAkXG1hdGhpdHtpdGVyYXRpb259JCAoaW52ZXJ0aW5nIHRoZSBIZXNzaWFuIG1hdHJpeCBvZiB0aGUgbG9zcyBmdW5jdGlvbiksIHdoaWNoIHJlbmRlcnMgdGhlbSBpbXByYWN0aWNhbC4gVmVyeSByZWNlbnRseSwgdGhpcyBjb21wdXRhdGlvbmFsIG92ZXJoZWFkIHdhcyBtaXRpZ2F0ZWQgYnkgdGhlIHdvcmtzIG9mIFtaTUcxOSxDR0grMTl9LCB5aWVsZGluZyBhbiAkTyhtbl4yKSQtdGltZSBzZWNvbmQtb3JkZXIgYWxnb3JpdGhtIGZvciB0cmFpbmluZyB0d28tbGF5ZXIgb3ZlcnBhcmFtZXRyaXplZCBuZXVyYWwgbmV0d29ya3Mgb2YgcG9seW5vbWlhbCB3aWR0aCAkbSQuCiAgV2Ugc2hvdyBob3cgdG8gc3BlZWQgdXAgdGhlIGFsZ29yaXRobSBvZiBbQ0dIKzE5XSwgYWNoaWV2aW5nIGFuICRcdGlsZGV7T30obW4pJC10aW1lIGJhY2twcm9wYWdhdGlvbiBhbGdvcml0aG0gZm9yIHRyYWluaW5nIChtaWxkbHkgb3ZlcnBhcmFtZXRyaXplZCkgUmVMVSBuZXR3b3Jrcywgd2hpY2ggaXMgbmVhci1saW5lYXIgaW4gdGhlIGRpbWVuc2lvbiAoJG1uJCkgb2YgdGhlIGZ1bGwgZ3JhZGllbnQgKEphY29iaWFuKSBtYXRyaXguIFRoZSBjZW50ZXJwaWVjZSBvZiBvdXIgYWxnb3JpdGhtIGlzIHRvIHJlZm9ybXVsYXRlIHRoZSBHYXVzcy1OZXd0b24gaXRlcmF0aW9uIGFzIGFuICRcZWxsXzIkLXJlZ3Jlc3Npb24gcHJvYmxlbSwgYW5kIHRoZW4gdXNlIGEgRmFzdC1KTCB0eXBlIGRpbWVuc2lvbiByZWR1Y3Rpb24gdG8gJFxtYXRoaXR7cHJlY29uZGl0aW9ufSQgdGhlIHVuZGVybHlpbmcgR3JhbSBtYXRyaXggaW4gdGltZSBpbmRlcGVuZGVudCBvZiAkTSQsIGFsbG93aW5nIHRvIGZpbmQgYSBzdWZmaWNpZW50bHkgZ29vZCBhcHByb3hpbWF0ZSBzb2x1dGlvbiB2aWEgJFxtYXRoaXR7Zmlyc3R9JC0kXG1hdGhpdHtvcmRlcn0kIGNvbmp1Z2F0ZSBncmFkaWVudC4gT3VyIHJlc3VsdCBwcm92aWRlcyBhIHByb29mLW9mLWNvbmNlcHQgdGhhdCBhZHZhbmNlZCBtYWNoaW5lcnkgZnJvbSByYW5kb21pemVkIGxpbmVhciBhbGdlYnJhIC0tIHdoaWNoIGxlZCB0byByZWNlbnQgYnJlYWt0aHJvdWdocyBpbiAkXG1hdGhpdHtjb252ZXh9JCAkXG1hdGhpdHtvcHRpbWl6YXRpb259JCAoRVJNLCBMUHMsIFJlZ3Jlc3Npb24pIC0tIGNhbiBiZSBjYXJyaWVkIG92ZXIgdG8gdGhlIHJlYWxtIG9mIGRlZXAgbGVhcm5pbmcgYXMgd2VsbC48L2Rlc2NyaXB0aW9uPgogIDwvZGVzY3JpcHRpb25zPgo8L3Jlc291cmNlPg==","url":"https://arxiv.org/abs/2006.11648","contentUrl":null,"metadataVersion":0,"schemaVersion":"http://datacite.org/schema/kernel-4","source":"mds","isActive":true,"state":"findable","reason":null,"viewCount":0,"viewsOverTime":[],"downloadCount":0,"downloadsOverTime":[],"referenceCount":0,"citationCount":0,"citationsOverTime":[],"partCount":0,"partOfCount":0,"versionCount":0,"versionOfCount":0,"created":"2022-02-25T07:41:11.000Z","registered":"2022-02-25T07:41:12.000Z","published":"2020","updated":"2022-02-25T07:41:12.000Z"},"relationships":{"client":{"data":{"id":"arxiv.content","type":"clients"}},"provider":{"data":{"id":"arxiv","type":"providers"}},"media":{"data":{"id":"10.48550/arxiv.2006.11648","type":"media"}},"references":{"data":[]},"citations":{"data":[]},"parts":{"data":[]},"partOf":{"data":[]},"versions":{"data":[]},"versionOf":{"data":[]}}}}