In 2020 I took a course on Optimization Theory taught by Professor Bahman Gharesifard.
The course gave a rigorous foundation in the field of optimization theory. One of the topics
we covered was convergence-rate proofs for learning algorithms. I became really interested
in the topic and decided to do my project on convergence rates for stochastic gradient descent (SGD).
My project centered on reproducing a proof from the paper Making Gradient Descent Optimal
for Strongly Convex Stochastic Optimization by Rakhlin, Shamir, and Sridharan. Specifically, I reproduce a proof
of how to recover an optimal O(1/T) convergence rate for SGD. I used concepts, theorems, and
propositions learned in Optimization Theory to motivate and reproduce the proof. Additionally,
I wrote a simple neural network to demonstrate the practical advantage of SGD over slower algorithms like full-batch gradient descent (GD).
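To give a feel for the setting the proof studies, here is a minimal sketch of SGD on a strongly convex problem with the standard 1/(λt) step size. The data, the one-dimensional objective f(w) = (1/2n) Σ (w − xᵢ)², and the step-size constant are all illustrative choices of mine, not taken from the paper or my project code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative data: f(w) = (1/2n) * sum_i (w - x_i)^2 is 1-strongly convex,
# and its exact minimizer is the sample mean.
x = rng.normal(loc=3.0, scale=1.0, size=1000)
w_star = x.mean()

def sgd(T, lam=1.0):
    """Run T steps of SGD with step size 1/(lam * t)."""
    w = 0.0
    for t in range(1, T + 1):
        i = rng.integers(len(x))
        grad = w - x[i]          # stochastic gradient: uses one sample, not the full dataset
        w -= grad / (lam * t)    # the classic step-size schedule for strongly convex SGD
    return w

for T in (100, 10_000):
    print(f"T={T:6d}  |w_T - w*| = {abs(sgd(T) - w_star):.4f}")
```

Each SGD step touches a single sample, while a GD step would sum gradients over all n samples; that per-step cost gap is what the neural-network experiment illustrates at larger scale.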
This is the direct link to the PDF of the paper: