{"data":{"id":"10.48550/arxiv.2010.01637","type":"dois","attributes":{"doi":"10.48550/arxiv.2010.01637","prefix":"10.48550","suffix":"arxiv.2010.01637","identifiers":[{"identifier":"2010.01637","identifierType":"arXiv"}],"alternateIdentifiers":[{"alternateIdentifierType":"arXiv","alternateIdentifier":"2010.01637"}],"creators":[{"name":"Wang, Jun-Kun","nameType":"Personal","givenName":"Jun-Kun","familyName":"Wang","affiliation":[],"nameIdentifiers":[]},{"name":"Abernethy, Jacob","nameType":"Personal","givenName":"Jacob","familyName":"Abernethy","affiliation":[],"nameIdentifiers":[]}],"titles":[{"title":"Understanding How Over-Parametrization Leads to Acceleration: A case of learning a single teacher neuron"}],"publisher":"arXiv","container":{},"publicationYear":2020,"subjects":[{"lang":"en","subject":"Machine Learning (cs.LG)","subjectScheme":"arXiv"},{"lang":"en","subject":"Machine Learning (stat.ML)","subjectScheme":"arXiv"},{"subject":"FOS: Computer and information sciences","subjectScheme":"Fields of Science and Technology (FOS)"},{"subject":"FOS: Computer and information sciences","schemeUri":"http://www.oecd.org/science/inno/38235147.pdf","subjectScheme":"Fields of Science and Technology (FOS)"}],"contributors":[],"dates":[{"date":"2020-10-04T17:27:44Z","dateType":"Submitted","dateInformation":"v1"},{"date":"2020-10-06T00:22:38Z","dateType":"Updated","dateInformation":"v1"},{"date":"2021-05-25T18:41:22Z","dateType":"Submitted","dateInformation":"v2"},{"date":"2021-05-27T00:01:21Z","dateType":"Updated","dateInformation":"v2"},{"date":"2021-09-27T18:59:35Z","dateType":"Submitted","dateInformation":"v3"},{"date":"2021-09-29T00:02:07Z","dateType":"Updated","dateInformation":"v3"},{"date":"2020-10","dateType":"Available","dateInformation":"v1"},{"date":"2020","dateType":"Issued"}],"language":null,"types":{"ris":"GEN","bibtex":"misc","citeproc":"article","schemaOrg":"CreativeWork","resourceType":"Article","resourceTypeGeneral":"Preprint"},"relatedIdentifiers":[],"relatedItems":[],"sizes":[],"formats":[],"version":"3","rightsList":[{"rights":"arXiv.org perpetual, non-exclusive license","rightsUri":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/"}],"descriptions":[{"description":"Over-parametrization has become a popular technique in deep learning. It is observed that by over-parametrization, a larger neural network needs a fewer training iterations than a smaller one to achieve a certain level of performance -- namely, over-parametrization leads to acceleration in optimization. However, despite that over-parametrization is widely used nowadays, little theory is available to explain the acceleration due to over-parametrization. In this paper, we propose understanding it by studying a simple problem first. Specifically, we consider the setting that there is a single teacher neuron with quadratic activation, where over-parametrization is realized by having multiple student neurons learn the data generated from the teacher neuron. We provably show that over-parametrization helps the iterate generated by gradient descent to enter the neighborhood of a global optimal solution that achieves zero testing error faster. On the other hand, we also point out an issue regarding the necessity of over-parametrization and study how the scaling of the output neurons affects the convergence time.","descriptionType":"Abstract"},{"description":"Accepted at ACML 2021","descriptionType":"Other"}],"geoLocations":[],"fundingReferences":[],"xml":"PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0idXRmLTgiPz4KPHJlc291cmNlIHhtbG5zPSJodHRwOi8vZGF0YWNpdGUub3JnL3NjaGVtYS9rZXJuZWwtNCIgeG1sbnM6eHNpPSJodHRwOi8vd3d3LnczLm9yZy8yMDAxL1hNTFNjaGVtYS1pbnN0YW5jZSIgeHNpOnNjaGVtYUxvY2F0aW9uPSJodHRwOi8vZGF0YWNpdGUub3JnL3NjaGVtYS9rZXJuZWwtNCBodHRwOi8vc2NoZW1hLmRhdGFjaXRlLm9yZy9tZXRhL2tlcm5lbC00LjMvbWV0YWRhdGEueHNkIj4KICA8aWRlbnRpZmllciBpZGVudGlmaWVyVHlwZT0iRE9JIj4xMC40ODU1MC9BUlhJVi4yMDEwLjAxNjM3PC9pZGVudGlmaWVyPgogIDxhbHRlcm5hdGVJZGVudGlmaWVycz4KICAgIDxhbHRlcm5hdGVJZGVudGlmaWVyIGFsdGVybmF0ZUlkZW50aWZpZXJUeXBlPSJhclhpdiI+MjAxMC4wMTYzNzwvYWx0ZXJuYXRlSWRlbnRpZmllcj4KICA8L2FsdGVybmF0ZUlkZW50aWZpZXJzPgogIDxjcmVhdG9ycz4KICAgIDxjcmVhdG9yPgogICAgICA8Y3JlYXRvck5hbWUgbmFtZVR5cGU9IlBlcnNvbmFsIj5XYW5nLCBKdW4tS3VuPC9jcmVhdG9yTmFtZT4KICAgICAgPGdpdmVuTmFtZT5KdW4tS3VuPC9naXZlbk5hbWU+CiAgICAgIDxmYW1pbHlOYW1lPldhbmc8L2ZhbWlseU5hbWU+CiAgICA8L2NyZWF0b3I+CiAgICA8Y3JlYXRvcj4KICAgICAgPGNyZWF0b3JOYW1lIG5hbWVUeXBlPSJQZXJzb25hbCI+QWJlcm5ldGh5LCBKYWNvYjwvY3JlYXRvck5hbWU+CiAgICAgIDxnaXZlbk5hbWU+SmFjb2I8L2dpdmVuTmFtZT4KICAgICAgPGZhbWlseU5hbWU+QWJlcm5ldGh5PC9mYW1pbHlOYW1lPgogICAgPC9jcmVhdG9yPgogIDwvY3JlYXRvcnM+CiAgPHRpdGxlcz4KICAgIDx0aXRsZT5VbmRlcnN0YW5kaW5nIEhvdyBPdmVyLVBhcmFtZXRyaXphdGlvbiBMZWFkcyB0byBBY2NlbGVyYXRpb246IEEgY2FzZSBvZiBsZWFybmluZyBhIHNpbmdsZSB0ZWFjaGVyIG5ldXJvbjwvdGl0bGU+CiAgPC90aXRsZXM+CiAgPHB1Ymxpc2hlcj5hclhpdjwvcHVibGlzaGVyPgogIDxwdWJsaWNhdGlvblllYXI+MjAyMDwvcHVibGljYXRpb25ZZWFyPgogIDxzdWJqZWN0cz4KICAgIDxzdWJqZWN0IHhtbDpsYW5nPSJlbiIgc3ViamVjdFNjaGVtZT0iYXJYaXYiPk1hY2hpbmUgTGVhcm5pbmcgKGNzLkxHKTwvc3ViamVjdD4KICAgIDxzdWJqZWN0IHhtbDpsYW5nPSJlbiIgc3ViamVjdFNjaGVtZT0iYXJYaXYiPk1hY2hpbmUgTGVhcm5pbmcgKHN0YXQuTUwpPC9zdWJqZWN0PgogICAgPHN1YmplY3Qgc3ViamVjdFNjaGVtZT0iRmllbGRzIG9mIFNjaWVuY2UgYW5kIFRlY2hub2xvZ3kgKEZPUykiPkZPUzogQ29tcHV0ZXIgYW5kIGluZm9ybWF0aW9uIHNjaWVuY2VzPC9zdWJqZWN0PgogIDwvc3ViamVjdHM+CiAgPGRhdGVzPgogICAgPGRhdGUgZGF0ZVR5cGU9IlN1Ym1pdHRlZCIgZGF0ZUluZm9ybWF0aW9uPSJ2MSI+MjAyMC0xMC0wNFQxNzoyNzo0NFo8L2RhdGU+CiAgICA8ZGF0ZSBkYXRlVHlwZT0iVXBkYXRlZCIgZGF0ZUluZm9ybWF0aW9uPSJ2MSI+MjAyMC0xMC0wNlQwMDoyMjozOFo8L2RhdGU+CiAgICA8ZGF0ZSBkYXRlVHlwZT0iU3VibWl0dGVkIiBkYXRlSW5mb3JtYXRpb249InYyIj4yMDIxLTA1LTI1VDE4OjQxOjIyWjwvZGF0ZT4KICAgIDxkYXRlIGRhdGVUeXBlPSJVcGRhdGVkIiBkYXRlSW5mb3JtYXRpb249InYyIj4yMDIxLTA1LTI3VDAwOjAxOjIxWjwvZGF0ZT4KICAgIDxkYXRlIGRhdGVUeXBlPSJTdWJtaXR0ZWQiIGRhdGVJbmZvcm1hdGlvbj0idjMiPjIwMjEtMDktMjdUMTg6NTk6MzVaPC9kYXRlPgogICAgPGRhdGUgZGF0ZVR5cGU9IlVwZGF0ZWQiIGRhdGVJbmZvcm1hdGlvbj0idjMiPjIwMjEtMDktMjlUMDA6MDI6MDdaPC9kYXRlPgogICAgPGRhdGUgZGF0ZVR5cGU9IkF2YWlsYWJsZSIgZGF0ZUluZm9ybWF0aW9uPSJ2MSI+MjAyMC0xMDwvZGF0ZT4KICA8L2RhdGVzPgogIDxyZXNvdXJjZVR5cGUgcmVzb3VyY2VUeXBlR2VuZXJhbD0iUHJlcHJpbnQiPkFydGljbGU8L3Jlc291cmNlVHlwZT4KICA8dmVyc2lvbj4zPC92ZXJzaW9uPgogIDxyaWdodHNMaXN0PgogICAgPHJpZ2h0cyByaWdodHNVUkk9Imh0dHA6Ly9hcnhpdi5vcmcvbGljZW5zZXMvbm9uZXhjbHVzaXZlLWRpc3RyaWIvMS4wLyI+YXJYaXYub3JnIHBlcnBldHVhbCwgbm9uLWV4Y2x1c2l2ZSBsaWNlbnNlPC9yaWdodHM+CiAgPC9yaWdodHNMaXN0PgogIDxkZXNjcmlwdGlvbnM+CiAgICA8ZGVzY3JpcHRpb24gZGVzY3JpcHRpb25UeXBlPSJBYnN0cmFjdCI+T3Zlci1wYXJhbWV0cml6YXRpb24gaGFzIGJlY29tZSBhIHBvcHVsYXIgdGVjaG5pcXVlIGluIGRlZXAgbGVhcm5pbmcuIEl0IGlzIG9ic2VydmVkIHRoYXQgYnkgb3Zlci1wYXJhbWV0cml6YXRpb24sIGEgbGFyZ2VyIG5ldXJhbCBuZXR3b3JrIG5lZWRzIGEgZmV3ZXIgdHJhaW5pbmcgaXRlcmF0aW9ucyB0aGFuIGEgc21hbGxlciBvbmUgdG8gYWNoaWV2ZSBhIGNlcnRhaW4gbGV2ZWwgb2YgcGVyZm9ybWFuY2UgLS0gbmFtZWx5LCBvdmVyLXBhcmFtZXRyaXphdGlvbiBsZWFkcyB0byBhY2NlbGVyYXRpb24gaW4gb3B0aW1pemF0aW9uLiBIb3dldmVyLCBkZXNwaXRlIHRoYXQgb3Zlci1wYXJhbWV0cml6YXRpb24gaXMgd2lkZWx5IHVzZWQgbm93YWRheXMsIGxpdHRsZSB0aGVvcnkgaXMgYXZhaWxhYmxlIHRvIGV4cGxhaW4gdGhlIGFjY2VsZXJhdGlvbiBkdWUgdG8gb3Zlci1wYXJhbWV0cml6YXRpb24uIEluIHRoaXMgcGFwZXIsIHdlIHByb3Bvc2UgdW5kZXJzdGFuZGluZyBpdCBieSBzdHVkeWluZyBhIHNpbXBsZSBwcm9ibGVtIGZpcnN0LiBTcGVjaWZpY2FsbHksIHdlIGNvbnNpZGVyIHRoZSBzZXR0aW5nIHRoYXQgdGhlcmUgaXMgYSBzaW5nbGUgdGVhY2hlciBuZXVyb24gd2l0aCBxdWFkcmF0aWMgYWN0aXZhdGlvbiwgd2hlcmUgb3Zlci1wYXJhbWV0cml6YXRpb24gaXMgcmVhbGl6ZWQgYnkgaGF2aW5nIG11bHRpcGxlIHN0dWRlbnQgbmV1cm9ucyBsZWFybiB0aGUgZGF0YSBnZW5lcmF0ZWQgZnJvbSB0aGUgdGVhY2hlciBuZXVyb24uIFdlIHByb3ZhYmx5IHNob3cgdGhhdCBvdmVyLXBhcmFtZXRyaXphdGlvbiBoZWxwcyB0aGUgaXRlcmF0ZSBnZW5lcmF0ZWQgYnkgZ3JhZGllbnQgZGVzY2VudCB0byBlbnRlciB0aGUgbmVpZ2hib3Job29kIG9mIGEgZ2xvYmFsIG9wdGltYWwgc29sdXRpb24gdGhhdCBhY2hpZXZlcyB6ZXJvIHRlc3RpbmcgZXJyb3IgZmFzdGVyLiBPbiB0aGUgb3RoZXIgaGFuZCwgd2UgYWxzbyBwb2ludCBvdXQgYW4gaXNzdWUgcmVnYXJkaW5nIHRoZSBuZWNlc3NpdHkgb2Ygb3Zlci1wYXJhbWV0cml6YXRpb24gYW5kIHN0dWR5IGhvdyB0aGUgc2NhbGluZyBvZiB0aGUgb3V0cHV0IG5ldXJvbnMgYWZmZWN0cyB0aGUgY29udmVyZ2VuY2UgdGltZS48L2Rlc2NyaXB0aW9uPgogICAgPGRlc2NyaXB0aW9uIGRlc2NyaXB0aW9uVHlwZT0iT3RoZXIiPkFjY2VwdGVkIGF0IEFDTUwgMjAyMTwvZGVzY3JpcHRpb24+CiAgPC9kZXNjcmlwdGlvbnM+CjwvcmVzb3VyY2U+","url":"https://arxiv.org/abs/2010.01637","contentUrl":null,"metadataVersion":0,"schemaVersion":"http://datacite.org/schema/kernel-4","source":"mds","isActive":true,"state":"findable","reason":null,"viewCount":0,"viewsOverTime":[],"downloadCount":0,"downloadsOverTime":[],"referenceCount":0,"citationCount":0,"citationsOverTime":[],"partCount":0,"partOfCount":0,"versionCount":0,"versionOfCount":0,"created":"2022-02-24T14:59:01.000Z","registered":"2022-02-24T14:59:02.000Z","published":"2020","updated":"2022-02-24T14:59:02.000Z"},"relationships":{"client":{"data":{"id":"arxiv.content","type":"clients"}},"provider":{"data":{"id":"arxiv","type":"providers"}},"media":{"data":{"id":"10.48550/arxiv.2010.01637","type":"media"}},"references":{"data":[]},"citations":{"data":[]},"parts":{"data":[]},"partOf":{"data":[]},"versions":{"data":[]},"versionOf":{"data":[]}}}}