The Anisotropic Noise in Stochastic Gradient Descent: Its Behavior of Escaping from Sharp Minima and Regularization Effects
Zhanxing Zhu, Jingfeng Wu, Bing Yu, Lei Wu
International Conference on Machine Learning, pp. 7654-7663, 2019. Cited by 175.

Tangent-Normal Adversarial Regularization for Semi-supervised Learning
Zhanxing Zhu, Jingfeng Wu, Bing Yu, Jinwen Ma
arXiv preprint, 2018. Cited by 29.

On the Noisy Gradient Descent that Generalizes as SGD
Jingfeng Wu, Wenqing Hu, Haoyi Xiong, Jun Huan
arXiv preprint, 2019. Cited by 57.

Obtaining Adjustable Regularization for Free via Iterate Averaging
Vladimir Braverman, Lin Yang, Jingfeng Wu
International Conference on Machine Learning 1, pp. 10344-10354, 2020.

Lifelong Learning with Sketched Structural Regularization
Vladimir Braverman, Soheil Kolouri, Praveen K. Pilly, Jingfeng Wu
arXiv preprint, 2021. Cited by 8.

The Benefits of Implicit Regularization from SGD in Least Squares Problems
Difan Zou, Jingfeng Wu, Vladimir Braverman, Quanquan Gu
Advances in Neural Information Processing Systems 34, pp. 5456-5468, 2021. Cited by 15.

Benign Overfitting of Constant-Stepsize SGD for Linear Regression
Difan Zou, Jingfeng Wu, Vladimir Braverman, Quanquan Gu
Conference on Learning Theory (COLT), 2021. Cited by 58.

Direction Matters: On the Implicit Bias of Stochastic Gradient Descent with Moderate Learning Rate
Jingfeng Wu, Difan Zou, Vladimir Braverman, Quanquan Gu
International Conference on Learning Representations, 2021. Cited by 32.

Last Iterate Risk Bounds of SGD with Decaying Stepsize for Overparameterized Linear Regression
Jingfeng Wu, Difan Zou, Vladimir Braverman, Quanquan Gu
International Conference on Machine Learning, pp. 24280-24314, 2022. Cited by 23.

How Many Pretraining Tasks Are Needed for In-Context Learning of Linear Regression?
Jingfeng Wu, Difan Zou, Zixiang Chen, Vladimir Braverman
arXiv preprint arXiv:2310.08391, 2023. Cited by 17.

The Power and Limitation of Pretraining-Finetuning for Linear Regression under Covariate Shift
Jingfeng Wu, Difan Zou, Vladimir Braverman, Quanquan Gu
Advances in Neural Information Processing Systems 35, pp. 33041-33053, 2022. Cited by 12.

Finite-Sample Analysis of Learning High-Dimensional Single ReLU Neuron
Jingfeng Wu, Difan Zou, Zixiang Chen, Vladimir Braverman
International Conference on Machine Learning, 2023. Cited by 6.

Risk Bounds of Multi-Pass SGD for Least Squares in the Interpolation Regime
Difan Zou, Jingfeng Wu, Vladimir Braverman, Quanquan Gu
Advances in Neural Information Processing Systems 35, pp. 12909-12920, 2022. Cited by 4.

Risk Bounds of Accelerated SGD for Overparameterized Linear Regression
Xuheng Li, Yihe Deng, Jingfeng Wu, Dongruo Zhou
arXiv preprint arXiv:2311.14222, 2023.

A Collective AI via Lifelong Learning and Sharing at the Edge
Andrea Soltoggio, Eseoghene Ben-Iwhiwhu, Vladimir Braverman, Eric Eaton
Nature Machine Intelligence 6 (3), pp. 251-264, 2024. Cited by 1.

Large Stepsize Gradient Descent for Logistic Loss: Non-Monotonicity of the Loss Improves Optimization Efficiency
Jingfeng Wu, Peter L. Bartlett, Matus Telgarsky, Bin Yu
arXiv preprint arXiv:2402.15926, 2024. Cited by 1.

In-Context Learning of a Linear Transformer Block: Benefits of the MLP Component and One-Step GD Initialization
Ruiqi Zhang, Jingfeng Wu, Peter L. Bartlett
arXiv preprint arXiv:2402.14951, 2024. Cited by 1.

Implicit Bias of Gradient Descent for Logistic Regression at the Edge of Stability
Jingfeng Wu, Vladimir Braverman, Jason D. Lee
Advances in Neural Information Processing Systems 36, 2024. Cited by 8.

Private Federated Frequency Estimation: Adapting to the Hardness of the Instance
Jingfeng Wu, Wennan Zhu, Peter Kairouz, Vladimir Braverman
Advances in Neural Information Processing Systems 36, 2024.