Title:

Randomized iterative methods for linear systems : momentum, inexactness and gossip

In the era of big data, one of the key challenges is the development of novel optimization algorithms that can accommodate vast amounts of data while at the same time satisfying constraints and limitations of the problem under study. The need to solve optimization problems is ubiquitous in essentially all quantitative areas of human endeavour, including industry and science. In the last decade there has been a surge in the demand from practitioners, in fields such as machine learning, computer vision, artificial intelligence, signal processing and data science, for new methods able to cope with these new large scale problems. In this thesis we are focusing on the design, complexity analysis and efficient implementations of such algorithms. In particular, we are interested in the development of randomized first order iterative methods for solving large scale linear systems, stochastic quadratic optimization problems and the distributed average consensus problem. In Chapter 2, we study several classes of stochastic optimization algorithms enriched with heavy ball momentum. Among the methods studied are: stochastic gradient descent, stochastic Newton, stochastic proximal point and stochastic dual subspace ascent. This is the first time momentum variants of several of these methods are studied. We choose to perform our analysis in a setting in which all of the above methods are equivalent: convex quadratic problems. We prove global nonasymptotic linear convergence rates for all methods and various measures of success, including primal function values, primal iterates, and dual function values. We also show that the primal iterates converge at an accelerated linear rate in a somewhat weaker sense. This is the first time a linear rate is shown for the stochastic heavy ball method (i.e., stochastic gradient descent method with momentum). Under somewhat weaker conditions, we establish a sublinear convergence rate for Cesaro averages of primal iterates. Moreover, we propose a novel concept, which we call stochastic momentum, aimed at decreasing the cost of performing the momentum step. We prove linear convergence of several stochastic methods with stochastic momentum, and show that in some sparse data regimes and for sufficiently small momentum parameters, these methods enjoy better overall complexity than methods with deterministic momentum. Finally, we perform extensive numerical testing on artificial and real datasets. In Chapter 3, we present a convergence rate analysis of inexact variants of stochastic gradient descent, stochastic Newton, stochastic proximal point and stochastic subspace ascent. A common feature of these methods is that in their update rule a certain subproblem needs to be solved exactly. We relax this requirement by allowing for the subproblem to be solved inexactly. In particular, we propose and analyze inexact randomized iterative methods for solving three closely related problems: a convex stochastic quadratic optimization problem, a best approximation problem and its dual { a concave quadratic maximization problem. We provide iteration complexity results under several assumptions on the inexactness error. Inexact variants of many popular and some more exotic methods, including randomized block Kaczmarz, randomized Gaussian Kaczmarz and randomized block coordinate descent, can be cast as special cases. Finally, we present numerical experiments which demonstrate the benefits of allowing inexactness. When the data describing a given optimization problem is big enough, it becomes impossible to store it on a single machine. In such situations, it is usually preferable to distribute the data among the nodes of a cluster or a supercomputer. In one such setting the nodes cooperate to minimize the sum (or average) of private functions (convex or nonconvex) stored at the nodes. Among the most popular protocols for solving this problem in a decentralized fashion (communication is allowed only between neighbours) are randomized gossip algorithms. In Chapter 4 we propose a new approach for the design and analysis of randomized gossip algorithms which can be used to solve the distributed average consensus problem, a fundamental problem in distributed computing, where each node of a network initially holds a number or vector, and the aim is to calculate the average of these objects by communicating only with its neighbours (connected nodes). The new approach consists in establishing new connections to recent literature on randomized iterative methods for solving largescale linear systems. Our general framework recovers a comprehensive array of wellknown gossip protocols as special cases and allow for the development of block and arbitrary sampling variants of all of these methods. In addition, we present novel and provably accelerated randomized gossip protocols where in each step all nodes of the network update their values using their own information but only a subset of them exchange messages. The accelerated protocols are the first randomized gossip algorithms that converge to consensus with a provably accelerated linear rate. The theoretical results are validated via computational testing on typical wireless sensor network topologies. Finally, in Chapter 5, we move towards a different direction and present the first randomized gossip algorithms for solving the average consensus problem while at the same time protecting the private values stored at the nodes as these may be sensitive. In particular, we develop and analyze three privacy preserving variants of the randomized pairwise gossip algorithm ("randomly pick an edge of the network and then replace the values stored at vertices of this edge by their average") first proposed by Boyd et al. [16] for solving the average consensus problem. The randomized methods we propose are all dual in nature. That is, they are designed to solve the dual of the best approximation optimization formulation of the average consensus problem. We call our three privacy preservation techniques "Binary Oracle", "ε Gap Oracle" and "Controlled Noise Insertion". We give iteration complexity bounds for the proposed privacy preserving randomized gossip protocols and perform extensive numerical experiments.
