
Inexact proximal \(\epsilon \)-subgradient methods for composite convex optimization problems


Abstract

We present two approximate versions of the proximal subgradient method for minimizing the sum of two convex functions (not necessarily differentiable). At each iteration, the algorithms require inexact evaluations of the proximal operator as well as approximate subgradients of the functions (namely, \(\epsilon \)-subgradients). The two methods use different error criteria for approximating the proximal operators. We analyze the convergence and the rate of convergence of these methods under various stepsize rules, including both diminishing and constant stepsizes. For the case where one of the functions is smooth, we propose an inexact accelerated version of the proximal gradient method and prove that it achieves the optimal convergence rate for the function values. Moreover, we report numerical experiments comparing our algorithm with similar recent ones.
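To fix ideas, the basic proximal subgradient step underlying both methods can be sketched as follows; this is a schematic iteration only, not the paper's exact recursion, which in addition allows the proximal operator itself to be evaluated only inexactly, under error criteria given in the paper. With stepsize \(\alpha _k>0\) and an \(\epsilon _k\)-subgradient \(u^k\in \partial _{\epsilon _k}f(x^k)\),

$$\begin{aligned} x^{k+1} \approx \mathrm{prox}_{\alpha _k g}\big (x^k-\alpha _k u^k\big ), \qquad \mathrm{prox}_{\alpha g}(y):=\mathrm{argmin}_{x\in \mathbb {R}^n}\left\{ g(x)+\dfrac{1}{2\alpha }||x-y||^2\right\} , \end{aligned}$$

where the \(\approx \) indicates the inexact evaluation of the proximal operator.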


Notes

  1. In [9] and [8], incremental proximal subgradient methods are in fact considered for problem (1), in the case where the objective function is the sum of a large number of convex functions.

  2. We stress that, for the IPGM, \(y^k=x^{k-1}-\alpha _k\nabla f(x^{k-1})\) at each iteration k.

References

  1. Alber, Y.I., Iusem, A.N., Solodov, M.V.: On the projected subgradient method for nonsmooth convex optimization in a Hilbert space. Math. Program. 81(1), 23–35 (1998)

  2. Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)

  3. Beck, A., Teboulle, M.: Fast gradient-based algorithms for constrained total variation image denoising and deblurring. IEEE Trans. Image Process. 18, 2419–2434 (2009)

  4. Bello Cruz, J.Y.: On proximal subgradient splitting method for minimizing the sum of two nonsmooth convex functions. Set-Valued Var. Anal. 25(2), 245–263 (2017)

  5. Bello Cruz, J.Y., Díaz Millán, R.: A direct splitting method for nonsmooth variational inequalities. J. Optim. Theory Appl. 161(3), 728–737 (2014)

  6. Bello Cruz, J.Y., Díaz Millán, R.: A relaxed-projection splitting algorithm for variational inequalities in Hilbert spaces. J. Glob. Optim. 65(3), 597–614 (2016)

  7. Bertsekas, D.P.: Convex Optimization Theory. Athena Scientific, Belmont (2009)

  8. Bertsekas, D.P.: Incremental Gradient, Subgradient, and Proximal Methods for Convex Optimization: A Survey. CoRR arXiv:1507.01030 (2015)

  9. Bertsekas, D.P.: Incremental proximal methods for large scale convex optimization. Math. Program. Ser. B 129(2), 163–195 (2011)

  10. Birgin, E.G., Martínez, J.M., Raydan, M.: Inexact spectral projected gradient methods on convex sets. IMA J. Numer. Anal. 23(4), 539–559 (2003)

  11. Boţ, R.I., Csetnek, E.R.: An inertial forward–backward–forward primal–dual splitting algorithm for solving monotone inclusion problems. Numer. Algorithms 71(3), 519–540 (2016)

  12. Brøndsted, A., Rockafellar, R.T.: On the subdifferentiability of convex functions. Proc. Am. Math. Soc. 16, 605–611 (1965)

  13. Burachik, R.S., Martínez-Legaz, J.E., Rezaie, M., Théra, M.: An additive subfamily of enlargements of a maximally monotone operator. Set-Valued Var. Anal. 23(4), 643–665 (2015)

  14. Chambolle, A.: An algorithm for total variation minimization and applications. J. Math. Imaging Vis. 20, 89–97 (2004)

  15. Combettes, P.L.: Quasi-Fejérian analysis of some optimization algorithms. In: Inherently Parallel Algorithms in Feasibility and Optimization and Their Applications. Stud. Comput. Math. 8, 115–152 (2001)

  16. Díaz Millán, R.: Two algorithms for solving systems of inclusion problems. Numer. Algorithms 78(4), 1111–1127 (2018)

  17. Díaz Millán, R.: On several algorithms for variational inequality and inclusion problems. PhD thesis, Institute of Mathematics and Statistics (IME), Federal University of Goiás, Goiânia (2015)

  18. Ermoliev, Y.M.: On the method of generalized stochastic gradients and quasi-Fejér sequences. Cybernetics 5, 208–220 (1969)

  19. Guo, X.L., Zhao, C.J., Li, Z.W.: On generalized \(\epsilon \)-subdifferential and radial epiderivative of set-valued mappings. Optim. Lett. 8(5), 1707–1720 (2014)

  20. Helou, E.S., Simões, L.E.A.: \(\epsilon \)-subgradient algorithms for bilevel convex optimization. Inverse Probl. 33, 5 (2017)

  21. Hiriart-Urruty, J.-B., Lemaréchal, C.: Convex Analysis and Minimization Algorithms II. Grundlehren der Mathematischen Wissenschaften, vol. 306. Springer, Berlin (1993)

  22. Li, G., Pong, T.K.: Global convergence of splitting methods for nonconvex composite optimization. SIAM J. Optim. 25(4), 2434–2460 (2015)

  23. Liang, J., Fadili, J., Peyré, G.: Local linear convergence analysis of primal-dual splitting methods. Optimization 67, 821–853 (2018)

  24. Lions, P.L., Mercier, B.: Splitting algorithms for the sum of two nonlinear operators. SIAM J. Numer. Anal. 16, 964–979 (1979)

  25. Minty, G.J.: Monotone (nonlinear) operators in Hilbert space. Duke Math. J. 29, 341–346 (1962)

  26. Monteiro, R.D.C., Svaiter, B.F.: An accelerated hybrid proximal extragradient method for convex optimization and its implications to second-order methods. SIAM J. Optim. 23(2), 1092–1125 (2013)

  27. Moreau, J.-J.: Proximité et dualité dans un espace hilbertien. Bull. Soc. Math. France 93, 273–299 (1965)

  28. Nedic, A., Bertsekas, D.: Incremental subgradient methods for nondifferentiable optimization. SIAM J. Optim. 12, 109–138 (2001)

  29. Nesterov, Y.: A method of solving a convex programming problem with convergence rate \(O(1/k^2)\). Sov. Math. Dokl. 27, 372–376 (1983)

  30. Nesterov, Y.: Gradient methods for minimizing composite functions. Math. Program. Ser. B 140(1), 125–161 (2013)

  31. Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course. Applied Optimization, vol. 87. Kluwer, Boston (2004)

  32. Passty, G.B.: Ergodic convergence to a zero of the sum of monotone operators in Hilbert space. J. Math. Anal. Appl. 72, 383–390 (1979)

  33. Polyak, B.T.: Introduction to Optimization. Translations Series in Mathematics and Engineering. Optimization Software, Inc., Publications Division, New York (1987)

  34. Rockafellar, R.T.: Monotone operators and the proximal point algorithm. SIAM J. Control Optim. 14(5), 877–898 (1976)

  35. Schmidt, M., Le Roux, N., Bach, F.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: NIPS'11—25th Annual Conference on Neural Information Processing Systems, Granada, Spain (2011)

  36. Shor, N.Z.: Minimization Methods for Non-differentiable Functions. Springer Series in Computational Mathematics. Springer, Berlin (1985)

  37. Simonetto, A., Jamali-Rad, H.: Primal recovery from consensus-based dual decomposition for distributed convex optimization. J. Optim. Theory Appl. 168(1), 172–197 (2016)

  38. Solodov, M.V., Svaiter, B.F.: A hybrid approximate extragradient-proximal point algorithm using the enlargement of a maximal monotone operator. Set-Valued Anal. 7(4), 323–345 (1999)

  39. Solodov, M.V., Svaiter, B.F.: A unified framework for some inexact proximal point algorithms. Numer. Funct. Anal. Optim. 22(7–8), 1013–1035 (2001)

  40. Teboulle, M.: A simplified view of first order methods for optimization. Math. Program. 170, 67–96 (2018). https://doi.org/10.1007/s10107-018-1284-2

  41. Villa, S., Salzo, S., Baldassarre, L., Verri, A.: Accelerated and inexact forward–backward algorithms. SIAM J. Optim. 23(3), 1607–1633 (2013)


Acknowledgements

This work was partially completed while M.P.M. was supported by a CAPES post-doctoral fellowship at the University of Campinas. M.P.M. is very grateful to IMECC at the University of Campinas and especially to Professor Sandra Augusta Santos for the warm hospitality.

Author information


Corresponding author

Correspondence to R. Díaz Millán.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Proof of Proposition 3

We first define, for all \(k\ge 0\),

$$\begin{aligned} \eta _k : = \min _{x\in \mathbb {R}^n}\left\{ t_k\mathfrak {L}_k(x)+\dfrac{1}{2}||x-x^0||^2\right\} , \end{aligned}$$
(46)

and prove that \(\eta _{k+1}-\eta _k\ge t_{k+1}(f+g)(\overline{x}^{k+1}) - t_k(f+g)(\overline{x}^k)\). To do this, we observe that

$$\begin{aligned} \begin{aligned} \eta _{k+1} =&t_{k+1}\mathfrak {L}_{k+1}(x^{k+1}) + \dfrac{1}{2}||x^{k+1}-x^0||^2\\ =&\beta _{k+1} \ell _{k+1}(x^{k+1}) + t_{k}\mathfrak {L}_{k}(x^{k+1}) + \dfrac{1}{2}||x^{k+1}-x^0||^2, \end{aligned} \end{aligned}$$
(47)

where the first equality above follows from the last relation in (39), and the second equality is a consequence of the definition of \(\mathfrak {L}_k\).

Next, we note that the definition of \(\eta _k\), together with the fact that the function in the minimization problem in (46) is quadratic, implies that

$$\begin{aligned}t_{k}\mathfrak {L}_{k}(x^{k+1}) + \dfrac{1}{2}||x^{k+1}-x^0||^2 = \eta _{k} + \dfrac{1}{2}||x^{k+1}-x^{k}||^2.\end{aligned}$$
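To verify this identity: under the construction used here (and assuming, consistently with (39) and (47), that the minimum in (46) is attained at \(x^k\)), the function \(q_k(x):=t_k\mathfrak {L}_k(x)+\dfrac{1}{2}||x-x^0||^2\) is a quadratic with Hessian equal to the identity, minimizer \(x^k\) and minimum value \(\eta _k\). Its exact expansion around \(x^k\) therefore gives

$$\begin{aligned} q_k(x) = \eta _k + \dfrac{1}{2}||x-x^{k}||^2 \quad \text {for all } x, \end{aligned}$$

and evaluating at \(x=x^{k+1}\) yields the displayed relation.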

Therefore, combining this latter relation with (47) we obtain

$$\begin{aligned} \begin{aligned} \eta _{k+1} =&\beta _{k+1}\ell _{k+1}(x^{k+1}) + \eta _{k} + \dfrac{1}{2}||x^{k+1}-x^{k}||^2\\ =&\eta _{k} - t_{k}(f+g)(\overline{x}^{k}) + \beta _{k+1}\ell _{k+1}(x^{k+1}) + t_{k}(f+g)(\overline{x}^{k})\\&+ \dfrac{1}{2}||x^{k+1}-x^{k}||^2\\ \ge&\eta _{k} - t_{k}(f+g)(\overline{x}^{k}) + \beta _{k+1}\ell _{k+1}(x^{k+1}) + t_{k}\ell _{k+1}(\overline{x}^{k})\\&+ \dfrac{1}{2}||x^{k+1}-x^{k}||^2, \end{aligned} \end{aligned}$$
(48)

where the inequality above is due to the first relation in (39).

Now, from the definition of \(t_{k+1}\) and because \(\ell _{k+1}\) is affine, we have

$$\begin{aligned} \beta _{k+1}\ell _{k+1}(x^{k+1}) + t_{k}\ell _{k+1}(\overline{x}^{k}) = t_{k+1}\ell _{k+1}\left( \dfrac{\beta _{k+1}}{t_{k+1}}x^{k+1} + \dfrac{t_k}{t_{k+1}}\overline{x}^k\right) . \end{aligned}$$
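Explicitly, writing the affine function as \(\ell _{k+1}(x)=\langle a,x\rangle +b\) and assuming (as the recursion for \(t_k\) used below suggests) that \(t_{k+1}=t_k+\beta _{k+1}\), the identity follows from

$$\begin{aligned} \beta _{k+1}\ell _{k+1}(x^{k+1}) + t_{k}\ell _{k+1}(\overline{x}^{k}) =&\ \langle a,\beta _{k+1}x^{k+1}+t_k\overline{x}^k\rangle + (\beta _{k+1}+t_k)b\\ =&\ t_{k+1}\left( \left\langle a,\dfrac{\beta _{k+1}}{t_{k+1}}x^{k+1} + \dfrac{t_k}{t_{k+1}}\overline{x}^k\right\rangle + b\right) . \end{aligned}$$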

Moreover, denoting \(x = \dfrac{\beta _{k+1}}{t_{k+1}}x^{k+1} + \dfrac{t_k}{t_{k+1}}\overline{x}^k\) and using the definition of \(\tilde{x}^{k+1}\) in step 1 of Algorithm 3, we have \(x^{k+1} - x^k = \dfrac{t_{k+1}}{\beta _{k+1}}(x-\tilde{x}^{k+1})\). Therefore, combining these relations with Eq. (48) we obtain

$$\begin{aligned} \begin{aligned} \eta _{k+1} \ge&\eta _k -t_k(f+g)(\overline{x}^k) + t_{k+1}\ell _{k+1}(x) + \dfrac{t_{k+1}^2}{2\beta _{k+1}^2}||x-\tilde{x}^{k+1}||^2\\ =&\eta _k - t_k(f+g)(\overline{x}^k) + t_{k+1}\left( \ell _{k+1}(x) + \dfrac{1}{2\alpha (1-\sigma ^2)}||x-\tilde{x}^{k+1}||^2\right) , \end{aligned} \end{aligned}$$

where the equality above uses the definition of \(\beta _{k+1}\). The claim then follows from (40) and the assumption that \(\sigma ^2<1/2\).
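The equality can be checked directly: assuming that \(\beta _{k+1}\) satisfies \(\beta _{k+1}^2=\alpha (1-\sigma ^2)t_{k+1}\) (the exact definition of \(\beta _{k+1}\) is given in the main text; this relation is the property used here), one has

$$\begin{aligned} \dfrac{t_{k+1}^2}{2\beta _{k+1}^2} = \dfrac{t_{k+1}^2}{2\alpha (1-\sigma ^2)t_{k+1}} = \dfrac{t_{k+1}}{2\alpha (1-\sigma ^2)}. \end{aligned}$$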

Now, we observe that since the sequence \((\eta _k-t_k(f+g)(\overline{x}^k))_{k\in \mathbb {N}}\) is non-decreasing, we have, for all \(k\ge 1\), that

$$\begin{aligned} \eta _k - t_k(f+g)(\overline{x}^k) \ge \eta _0 - t_0(f+g)(\overline{x}^0) = 0. \end{aligned}$$

Hence, using (46) we deduce the second inequality in (42).
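A sketch of this deduction, assuming, as in standard estimate-sequence arguments, that \(\mathfrak {L}_k\) minorizes \(f+g\) up to the error terms handled in the main text: for any \(x\),

$$\begin{aligned} t_k(f+g)(\overline{x}^k) \le \eta _k \le t_k\mathfrak {L}_k(x) + \dfrac{1}{2}||x-x^0||^2 \le t_k(f+g)(x) + \dfrac{1}{2}||x-x^0||^2, \end{aligned}$$

which, after dividing by \(t_k\), bounds the function-value gap by \(||x-x^0||^2/(2t_k)\); the precise form of (42) is stated in the main text.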

To prove the first inequality in (42), we note that from the definitions of \(t_k\) and \(\beta _k\) it follows that

$$\begin{aligned} t_k \ge t_{k-1} + \dfrac{\alpha (1-\sigma ^2)}{2} + \sqrt{\alpha (1-\sigma ^2)t_{k-1}} \ge \left( \sqrt{t_{k-1}} +\dfrac{1}{2}\sqrt{\alpha (1-\sigma ^2)} \right) ^2. \end{aligned}$$

Thus, we conclude by using \(\alpha =\sigma ^2/L\) and an induction argument. \(\square \)
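For completeness, a sketch of the induction behind the last claim (assuming \(t_0\ge 0\); the exact constant in (42) is as stated in the main text): taking square roots in the last display gives \(\sqrt{t_k}\ge \sqrt{t_{k-1}}+\dfrac{1}{2}\sqrt{\alpha (1-\sigma ^2)}\), so by induction

$$\begin{aligned} \sqrt{t_k}\ge \sqrt{t_0}+\dfrac{k}{2}\sqrt{\alpha (1-\sigma ^2)} \quad \Longrightarrow \quad t_k\ge \dfrac{k^2}{4}\,\alpha (1-\sigma ^2) = \dfrac{k^2\sigma ^2(1-\sigma ^2)}{4L}, \end{aligned}$$

where the last equality uses \(\alpha =\sigma ^2/L\); this exhibits the \(k^2\) growth of \(t_k\) behind the optimal rate.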


About this article


Cite this article

Millán, R.D., Machado, M.P. Inexact proximal \(\epsilon \)-subgradient methods for composite convex optimization problems. J Glob Optim 75, 1029–1060 (2019). https://doi.org/10.1007/s10898-019-00808-8
