
Zh. Vychisl. Mat. Mat. Fiz., 2015 Volume 55, Number 3, Pages 355–371 (Mi zvmmf10164)


On efficient randomized algorithms for finding the PageRank vector

A. V. Gasnikov^{a,b}, D. Yu. Dmitriev^{b,a}

a Moscow Institute of Physics and Technology, Institutskii per. 9, Dolgoprudnyi, Moscow oblast, 141700, Russia
b Institute for Information Transmission Problems, Russian Academy of Sciences, Bolshoi Karetnyi per. 19, str. 1, Moscow, 127051, Russia

Abstract: Two randomized methods are considered for finding the PageRank vector; in other words, the solution of the system $\mathbf{p}^{\mathrm{T}}=\mathbf{p}^{\mathrm{T}}P$ with a stochastic $n\times n$ matrix $P$, where $n\sim 10^7$–$10^9$, is sought (in the class of probability distributions) to accuracy $\varepsilon$, where $\varepsilon\gg n^{-1}$. Thus, brute-force multiplication of $P$ by a column vector is ruled out when these objects are dense. The first method is based on the idea of Markov chain Monte Carlo algorithms. This approach is efficient when the iterative process $\mathbf{p}_{t+1}^{\mathrm{T}}=\mathbf{p}_t^{\mathrm{T}}P$ quickly reaches a steady state. Additionally, it takes into account another specific feature of $P$: within each row, the nonzero off-diagonal elements of $P$ are equal (this property is used to organize a random walk over the graph associated with $P$). Based on modern concentration-of-measure inequalities, new bounds for the running time of this method are presented that take into account the specific features of $P$. In the second method, the search for the ranking vector is reduced to finding an equilibrium of the antagonistic matrix game
$$ \min_{\mathbf{p}\in S_n(1)}\max_{\mathbf{u}\in S_n(1)}\langle \mathbf{u}, (P^{\mathrm{T}}-I)\mathbf{p}\rangle, $$
where $S_n(1)$ is the unit simplex in $\mathbb{R}^n$ and $I$ is the identity matrix. The resulting problem is solved by applying a slightly modified Grigoriadis–Khachiyan algorithm (1995). This technique, like the Nazin–Polyak method (2009), is a randomized version of Nemirovski's mirror descent method. The difference is that, in the Grigoriadis–Khachiyan algorithm, the randomization is applied when the gradient is projected onto the simplex rather than when the stochastic gradient is computed. For sparse matrices $P$, the proposed method yields noticeably better results.
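
To illustrate the first (Markov chain Monte Carlo) approach described in the abstract, the following is a minimal Python sketch that estimates the PageRank vector from the visit frequencies of a single long random walk with uniform restarts. The function name pagerank_random_walk, the out_links adjacency representation, the damping parameter, and the single-walk visit-frequency estimator are illustrative assumptions; they are not the specific estimator or the running-time bounds analyzed in the paper.

```python
import numpy as np

def pagerank_random_walk(out_links, n, damping=0.85, num_steps=1_000_000, rng=None):
    """Estimate the PageRank vector by the empirical visit frequencies
    of one long random walk with uniform restarts (teleportation).

    out_links[i] is the list of pages that page i links to; every outgoing
    link is followed with equal probability, matching the row structure of P
    noted in the abstract (equal nonzero off-diagonal entries within a row).
    """
    rng = np.random.default_rng() if rng is None else rng
    visits = np.zeros(n)
    state = rng.integers(n)                       # start at a uniformly random page
    for _ in range(num_steps):
        visits[state] += 1
        if out_links[state] and rng.random() < damping:
            state = rng.choice(out_links[state])  # follow a uniformly chosen link
        else:
            state = rng.integers(n)               # teleport to a random page
    return visits / num_steps                     # approximates p, sums to 1

# Toy 4-node graph: 0 -> {1, 2}, 1 -> {2}, 2 -> {0}, 3 -> {2}
links = [[1, 2], [2], [0], [2]]
print(pagerank_random_walk(links, n=4, num_steps=200_000))
```

Each step touches only the out-links of the current page, so the cost per step does not depend on $n$; this is the sparsity structure that makes random-walk estimation attractive when direct multiplication of $P$ by a vector is infeasible.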

Key words: mirror descent method, Markov chain Monte Carlo, stochastic optimization, randomization, PageRank.

UDC: 519.62

MSC: Primary 68W20; Secondary 90C15, 90C47, 90C90

Received: 03.09.2014

DOI: 10.7868/S0044466915030060


English version: Computational Mathematics and Mathematical Physics, 2015, 55:3, 349–365


