Polynomial Escape-Time from Saddle Points in Distributed Non-Convex Optimization | IEEE Conference Publication | IEEE Xplore

Abstract:

The diffusion strategy for distributed learning from streaming data employs local stochastic gradient updates along with an exchange of iterates over neighborhoods. In this work, we establish that agents cluster around a network centroid in the mean-fourth sense and proceed to study the dynamics of this point. We establish expected descent in non-convex environments in the large-gradient regime and introduce a short-term model to examine the dynamics over finite-time horizons. Using this model, we establish that the diffusion strategy is able to escape from strict saddle points in O(1/μ) iterations, where μ denotes the step-size; it is also able to return approximately second-order stationary points in a polynomial number of iterations. Relative to prior works on polynomial escape from saddle points, most of which focus on centralized perturbed or stochastic gradient descent, our approach requires less restrictive conditions on the gradient noise process.
Date of Conference: 15-18 December 2019
Date Added to IEEE Xplore: 05 March 2020
Conference Location: Le Gosier, Guadeloupe
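The abstract's two central claims, that agents cluster around a network centroid and that the diffusion strategy escapes strict saddle points, can be observed numerically. The Python sketch below is an illustration under stated assumptions, not the authors' code or analysis: every agent shares the hypothetical cost $J(w) = \tfrac{1}{2}w_1^2 - \tfrac{1}{2}w_2^2 + \tfrac{1}{4}w_2^4$, whose origin is a strict saddle point (Hessian $\mathrm{diag}(1,-1)$) and whose minimizers are $(0,\pm 1)$; gradient noise is assumed to be i.i.d. Gaussian; and agents sit on a ring where each averages itself and its two neighbors with weight $1/3$. Because that combination matrix is doubly stochastic, the centroid is simply the average of the iterates. The names `grad_J`, `ring_combination_matrix` and `diffusion_escape` are hypothetical.

```python
import numpy as np


def grad_J(W):
    """Row-wise gradient of the hypothetical cost
    J(w) = w1^2/2 - w2^2/2 + w2^4/4 (strict saddle at the origin)."""
    w1, w2 = W[:, 0], W[:, 1]
    return np.column_stack((w1, -w2 + w2 ** 3))


def ring_combination_matrix(N):
    """Hypothetical ring topology: a_{lk} = 1/3 for l in {k-1, k, k+1}.
    Symmetric, hence doubly stochastic (requires N >= 3)."""
    A = np.zeros((N, N))
    for k in range(N):
        for l in (k - 1, k, k + 1):
            A[l % N, k] = 1.0 / 3.0
    return A


def diffusion_escape(N=10, mu=0.01, sigma=1.0, iters=2000, radius=0.5, seed=0):
    """Run the adapt/combine recursion starting at the saddle point; return the
    final centroid, the first iteration at which |centroid_2| exceeds `radius`
    (or None), and the largest distance of any agent from the centroid."""
    rng = np.random.default_rng(seed)
    A = ring_combination_matrix(N)
    W = np.zeros((N, 2))              # all agents start at the saddle point
    w_c = W.mean(axis=0)
    escape_iter = None
    for i in range(1, iters + 1):
        # adapt: local gradient step corrupted by assumed Gaussian noise
        noisy_grad = grad_J(W) + sigma * rng.standard_normal((N, 2))
        Phi = W - mu * noisy_grad
        # combine: w_{k,i} = sum_l a_{lk} phi_{l,i}, i.e. W = A^T Phi
        W = A.T @ Phi
        w_c = W.mean(axis=0)          # centroid (uniform weights for this A)
        if escape_iter is None and abs(w_c[1]) > radius:
            escape_iter = i
    spread = np.max(np.linalg.norm(W - w_c, axis=1))
    return w_c, escape_iter, spread


w_c, escape_iter, spread = diffusion_escape()
print("final centroid:", w_c)            # near (0, 1) or (0, -1)
print("escape iteration:", escape_iter)  # compare with 1/mu = 100
print("max distance to centroid:", spread)
```

Since the gradient vanishes at the origin, the network would stay at the saddle without noise; it is the gradient noise, amplified along the negative-curvature direction $w_2$, that drives the escape. Rerunning with a different `mu` shows how the reported escape iteration changes with the step-size.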

1. Introduction

We consider a network of $N$ agents. Each agent $k$ is equipped with a local, stochastic cost of the form $J_k(w) \triangleq \mathbb{E}\, Q(w;\boldsymbol{x}_k)$, where $w$ denotes a parameter vector and $\boldsymbol{x}_k$ denotes random data. We consider a global optimization problem of the form:

\begin{equation*} \min_{w}J(w),\quad \mathrm{where}\ J(w)\triangleq \sum_{k=1}^{N}p_{k}J_{k}(w) \tag{1} \end{equation*}

where the weights $p_k$ are a function of the graph topology and will be specified further below in (4). Solutions to such problems can be pursued through a variety of distributed algorithms, including those of the consensus and diffusion type [3]–[9]. We study the diffusion strategy because of its proven performance advantages in adaptive environments, in response to streaming data and drifting conditions [4], [10]. The strategy takes the form:

\begin{align*} \phi_{k,i} &= w_{k,i-1}-\mu\widehat{\nabla J}_{k}(w_{k,i-1}) \tag{2a}\\ w_{k,i} &= \sum_{\ell=1}^{N}a_{\ell k}\phi_{\ell,i} \tag{2b} \end{align*}

where $\mu > 0$ denotes the step-size, $\widehat{\nabla J}_k(\cdot)$ denotes a stochastic approximation of the gradient $\nabla J_k(\cdot)$, and $a_{\ell k}$ is the weight that agent $k$ assigns to the intermediate iterate $\phi_{\ell,i}$ received from agent $\ell$.
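The weights $p_k$ in (1) and the combination weights $a_{\ell k}$ in (2b) are linked, and a minimal Python sketch can make the link visible. It rests on assumptions that are ours rather than the paper's, since (4) is not reproduced here: $A = [a_{\ell k}]$ is left-stochastic and primitive, and $p$ is its Perron vector, $Ap = p$ with $\mathbb{1}^{\mathsf T} p = 1$, a common choice in the diffusion literature. Because $\sum_k a_{\ell k} p_k = p_\ell$, the weighted centroid $w_{c,i} = \sum_k p_k w_{k,i}$ satisfies $w_{c,i} = \sum_\ell p_\ell \phi_{\ell,i}$. With the hypothetical quadratic costs $J_k(w) = \tfrac{1}{2}\|w - \theta_k\|^2$ and exact gradients, this reduces to $w_{c,i} = (1-\mu)\,w_{c,i-1} + \mu\, w^\star$ with $w^\star = \sum_k p_k \theta_k$, the minimizer of (1), so the centroid converges to $w^\star$ even though individual agents need not coincide with it. The function names and the random topology are hypothetical.

```python
import numpy as np


def left_stochastic_ring(N, rng):
    """Hypothetical ring topology with self-loops and random positive weights,
    normalized so that every column sums to one: sum_l a_{lk} = 1."""
    A = np.zeros((N, N))
    for k in range(N):
        for l in (k - 1, k, k + 1):
            A[l % N, k] = rng.uniform(0.5, 1.5)
    return A / A.sum(axis=0, keepdims=True)


def perron_vector(A):
    """Return p with A p = p, p > 0 and sum(p) = 1."""
    vals, vecs = np.linalg.eig(A)
    p = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    return p / p.sum()


def diffusion(A, thetas, mu=0.1, iters=500, sigma=0.0, seed=0):
    """Run (2a)-(2b) with hypothetical local costs J_k(w) = ||w - theta_k||^2 / 2.
    sigma > 0 replaces the exact gradient with a noisy (stochastic) one."""
    rng = np.random.default_rng(seed)
    N, M = thetas.shape
    W = np.zeros((N, M))
    for _ in range(iters):
        grad_hat = (W - thetas) + sigma * rng.standard_normal((N, M))
        Phi = W - mu * grad_hat       # (2a) adapt
        W = A.T @ Phi                 # (2b) combine: w_k = sum_l a_{lk} phi_l
    return W


rng = np.random.default_rng(1)
N = 6
A = left_stochastic_ring(N, rng)
p = perron_vector(A)
thetas = rng.standard_normal((N, 2))
w_star = p @ thetas          # minimizer of (1) for these quadratic costs
W = diffusion(A, thetas)
print("centroid error:", np.abs(p @ W - w_star).max())  # close to machine precision
print("agent spread:", np.abs(W - w_star).max())
```

Setting `sigma` to a positive value turns the exact gradients into stochastic ones, as in (2a); the centroid then fluctuates around $w^\star$ instead of converging to it exactly.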

