R. Colini-Baldeschi, B. de Keijzer, S. Leonardi, and S. Turchetta
Approximately Efficient Double Auctions with Strong Budget Balance
Proc. 27th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2016), 2016
Mechanism design for one-sided markets is an area of extensive research in economics and, for more than a decade, in computer science as well. Two-sided markets, on the other hand, have not received the same attention despite their numerous applications to web advertisement, stock exchange, and frequency spectrum allocation. This work studies double auctions, in which unit-demand buyers and unit-supply sellers act strategically.
An ideal goal in double auction design is to maximize the social welfare of buyers and sellers with individually rational (IR), incentive compatible (IC) and strongly budget-balanced (SBB) mechanisms. The first two properties are standard. SBB requires that the payments charged to the buyers are entirely handed over to the sellers. This property is crucial in all contexts that do not allow the auctioneer to retain a share of the buyers' payments or to subsidize the market.
Unfortunately, this goal is known to be unachievable even for the special case of bilateral trade, where there is only one buyer and one seller. Therefore, in subsequent papers, meaningful trade-offs between these requirements have been investigated.
Our main contribution is the first IR, IC and SBB mechanism that provides an O(1)-approximation to the optimal social welfare. This result holds for any number of buyers and sellers with arbitrary, independent distributions. Moreover, our result continues to hold when there is an additional matroid constraint on the sets of buyers who may get allocated an item. To prove our main result, we devise an extension of sequential posted price mechanisms to two-sided markets. In addition to this, we improve the best-known approximation bounds for the bilateral trade problem.
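To make the strong-budget-balance requirement concrete, here is a minimal posted-price sketch for the bilateral-trade special case (one buyer, one seller). It is not the paper's mechanism; the uniform valuations and the fixed price p = 0.5 are illustrative assumptions. Trade happens only when both sides weakly gain, and the buyer's payment goes entirely to the seller, so the mechanism is IR, truthful, and SBB by construction.

```python
import random

def posted_price_bilateral_trade(buyer_value, seller_cost, p):
    """Trade at the posted price p iff both sides weakly gain.

    The buyer pays p and the seller receives exactly p, so the mechanism is
    strongly budget balanced; since p does not depend on the reports,
    truthful reporting is a dominant strategy and participation is IR.
    """
    if buyer_value >= p >= seller_cost:
        return buyer_value - p, p - seller_cost   # (buyer utility, seller utility)
    return 0.0, 0.0

# Illustrative experiment with uniform valuations and a fixed price p = 0.5.
random.seed(0)
total = 0.0
for _ in range(10_000):
    v, c = random.random(), random.random()
    bu, su = posted_price_bilateral_trade(v, c, 0.5)
    total += bu + su
print("average gains from trade:", total / 10_000)
```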
P. Brach, M. Cygan, J. Lacki, and P. Sankowski
Algorithmic Complexity of Power Law Networks
Proc. 27th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2016), 2016
We define a deterministic condition for checking whether a graph has a power law degree distribution and experimentally validate it on real-world networks. This definition allows us to derive interesting properties of power law networks. We observe that for exponents of the degree distribution in the range [1, 2] such networks exhibit the double power law phenomenon that has been observed for several real-world networks. Moreover, we give a novel theoretical explanation of why many algorithms run faster on real-world data than what is predicted by algorithmic worst-case analysis. We show how to exploit the power law degree distribution to design faster algorithms for a number of classic problems including transitive closure, maximum matching, determinant, PageRank, matrix inverse, counting triangles and finding maximum clique. In contrast to previous average-case analyses, we believe that this is the first "waterproof" argument that explains why many real-world networks are easier.
R. Colini-Baldeschi, S. Leonardi, M. Henzinger, and M. Starnberger
On Multiple Keyword Sponsored Search Auctions with Budgets
ACM Transactions on Economics and Computation, Volume 4, Number 1, 2015
We study multiple keyword sponsored search auctions with budgets. Each keyword has multiple ad slots with a click-through rate. The bidders have additive valuations, which are linear in the click-through rates, and budgets, which restrict their overall payments. Additionally, the number of slots per keyword assigned to a bidder is bounded.
We show the following results: (1) We give the first mechanism for multiple keywords, where click-through rates differ among slots. Our mechanism is incentive compatible in expectation, individually rational in expectation, and Pareto optimal. (2) We study the combinatorial setting, where each bidder is only interested in a subset of the keywords. We give an incentive compatible, individually rational, Pareto-optimal, and deterministic mechanism for identical click-through rates. (3) We give an impossibility result for incentive compatible, individually rational, Pareto-optimal, and deterministic mechanisms for bidders with diminishing marginal valuations.
V. Auletta, D. Ferraioli, F. Pasquale, P. Penna, and G. Persiano
Logit Dynamics with Concurrent Updates for Local Interaction Potential Games
Algorithmica, Volume 73, Number 3, 2015
Logit choice dynamics constitute a family of randomized best response dynamics based on the logit choice function (McFadden in Frontiers in econometrics. Academic Press, New York, 1974) that models players with limited rationality and knowledge. In this paper we study the all-logit dynamics [also known as simultaneous learning (Alós-Ferrer and Netzer in Games Econ Behav 68(2):413--427, 2010)], where at each time step all players concurrently update their strategies according to the logit choice function. In the well-studied (one-)logit dynamics (Blume in Games Econ Behav 5(3):387--424, 1993), by contrast, at each step only one randomly chosen player is allowed to update. We study properties of the all-logit dynamics in the context of local interaction potential games, a class of games that has been used to model complex social phenomena (Montanari and Saberi 2009; Peyton in The economy as a complex evolving system. Oxford University Press, Oxford, 2003) and physical systems (Levin et al. in Probab Theory Relat Fields 146(1--2):223--265, 2010; Martinelli in Lectures on probability theory and statistics. Springer, Berlin, 1999). In a local interaction potential game, players are the vertices of a social graph whose edges are two-player potential games. Each player picks one strategy to be played in all the games she is involved in, and her payoff is the sum of the payoffs from each of these games. We prove that local interaction potential games characterize the class of games for which the all-logit dynamics is reversible. We then compare the stationary behavior of the one-logit and all-logit dynamics. Specifically, we look at the expected value of a notable class of observables, which we call decomposable observables. We prove that the difference between the expected values of the observables at stationarity for the two dynamics depends only on the rationality level β and on the distance of the social graph from a bipartite graph. In particular, if the social graph is bipartite then decomposable observables have the same expected value. Finally, we show that the mixing time of the all-logit dynamics exhibits the same twofold behavior that has been highlighted for the one-logit dynamics: for some games it depends exponentially on the rationality level β, whereas for other games it can be upper bounded by a function independent of β.
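A minimal sketch of one concurrent (all-logit) update step, for readers who want the dynamics in executable form. The two-strategy coordination payoffs on each edge and the 4-cycle social graph are our own toy choices, not taken from the paper.

```python
import math
import random

def all_logit_step(graph, profile, beta):
    """One concurrent (all-logit) update on a local interaction game.

    Each vertex plays strategy +1 or -1; on each edge both endpoints get
    payoff 1 if they coordinate and 0 otherwise (a simple potential game).
    Every player simultaneously resamples its strategy with probability
    proportional to exp(beta * payoff), given the *current* profile.
    """
    new_profile = {}
    for v, neighbors in graph.items():
        payoffs = {s: sum(1 for u in neighbors if profile[u] == s) for s in (+1, -1)}
        weights = {s: math.exp(beta * payoffs[s]) for s in (+1, -1)}
        z = weights[+1] + weights[-1]
        new_profile[v] = +1 if random.random() < weights[+1] / z else -1
    return new_profile

# Illustrative run on a 4-cycle.
random.seed(1)
graph = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
profile = {v: random.choice([+1, -1]) for v in graph}
for _ in range(20):
    profile = all_logit_step(graph, profile, beta=2.0)
print(profile)
```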
D. Ferraioli, and P. Penna
Imperfect Best-Response Mechanisms
Theory of Computing Systems, Volume 57, Number 3, 2015
Best-response mechanisms, introduced by Nisan et al. (2011), provide a unifying framework for studying various distributed protocols in which the participants are instructed to repeatedly best respond to each other's strategies. Two fundamental features of these mechanisms are convergence and incentive compatibility. This work investigates convergence and incentive compatibility conditions of such mechanisms when players are not guaranteed to always best respond but rather play an imperfect best-response strategy. That is, at every time step every player deviates from the prescribed best-response strategy according to some probability parameter. The results explain to what extent convergence and incentive compatibility depend on the assumption that players never make mistakes, and how robust such protocols are to "noise" or "mistakes".
A. Gupta, J. Könemann, S. Leonardi, R. Ravi, and G. Schäfer
Efficient Cost-Sharing Mechanisms for Prize-Collecting Problems
Mathematical Programming, Volume 152, Number 1, 2015
We consider the problem of designing efficient mechanisms to share the cost of providing some service to a set of self-interested customers. In this paper, we mainly focus on cost functions that are induced by prize-collecting optimization problems. Such cost functions arise naturally whenever customers can be served in two different ways: either by being part of a common service solution or by being served individually. One of our main contributions is a general lifting technique that allows us to extend the social cost approximation guarantee of a Moulin mechanism for the respective non-prize-collecting problem to its prize-collecting counterpart. Our lifting technique also suggests a generic design template to derive Moulin mechanisms for prize-collecting problems. The approach is particularly suited for cost-sharing methods that are based on primal-dual algorithms. We illustrate the applicability of our approach by deriving Moulin mechanisms for prize-collecting variants of submodular cost-sharing, facility location and Steiner forest problems. All our mechanisms are essentially best possible with respect to budget balance and social cost approximation guarantees. Finally, we show that the Moulin mechanism by Könemann et al. (SIAM J. Comput. 37(5):1319--1341, 2008) for the Steiner forest problem is O(log³ k)-approximate. Our approach adds a novel methodological contribution to existing techniques by showing that such a result can be proved by embedding the graph distances into random hierarchically separated trees.
A. Anagnostopoulos, L. Becchetti, B. de Keijzer, and G. Schäfer
Inefficiency of Games with Social Context [pdf]
Theory of Computing Systems, Volume 57, Number 3, 2015
The study of other-regarding player behavior such as altruism and spite in games has recently received quite some attention in the algorithmic game theory literature. Already for very simple models, it has been shown that altruistic behavior can actually be harmful for society in the sense that the price of anarchy may increase as the players become more altruistic. In this paper, we study the severity of this phenomenon for more realistic settings in which there is a complex underlying social structure, causing the players to direct their altruistic and spiteful behavior in a refined player-specific sense (depending, for example, on friendships that exist among the players). Our findings show that the increase in the price of anarchy is modest for congestion games and minsum scheduling games, whereas it might be drastic for generalized second price auctions.
B. de Keijzer, G. Schäfer, and O. Telelis
The Strong Price of Anarchy of Linear Bottleneck Congestion Games
Theory of Computing Systems, Volume 57, Number 2, 2015
We study the inefficiency of equilibrium outcomes in Bottleneck Congestion games. These games model situations in which strategic players compete for a limited number of facilities. Each player allocates his weight to a (feasible) subset of the facilities with the goal to minimize the maximum (weight-dependent) latency that he experiences on any of these facilities. We analyze the (strong) Price of Anarchy of these games for a natural load balancing social cost objective, i.e., minimize the maximum latency of a facility. In our studies, we focus on Bottleneck Congestion games with linear latency functions. These games still constitute a rich class of games and generalize, for example, Load Balancing games with identical or uniformly related machines (with or without restricted assignments). We derive upper and asymptotically matching lower bounds on the (strong) Price of Anarchy of these games. We also derive more refined bounds for several special cases of these games, including the cases of identical player weights, identical latency functions and symmetric strategy sets. Further, we provide lower bounds on the Price of Anarchy for k-strong equilibria.
A. Anagnostopoulos, L. Becchetti, I. Bordino, S. Leonardi, I. Mele, and P. Sankowski
Stochastic Query Covering for Fast Approximate Document Retrieval [pdf]
ACM Transactions on Information Systems, Volume 33, Number 3, 2015
We design algorithms that, given a collection of documents and a distribution over user queries, return a small subset of the document collection in such a way that we can efficiently provide high quality answers to user queries using only the selected subset. This approach has applications when space is a constraint or when the query-processing time increases significantly with the size of the collection. We study our algorithms through the lens of stochastic analysis and we prove that, even though they use only a small fraction of the entire collection, they can provide answers to most user queries, achieving a performance close to the optimal. To complement our theoretical findings, we experimentally show the versatility of our approach by considering two important cases in the context of web search: In the first case, we favor the retrieval of documents that are very relevant to the query, whereas in the second case we aim for document diversification. Both the theoretical and the experimental analysis provide strong evidence of the potential value of query covering in diverse application scenarios.
A. Anagnostopoulos, and M. Sorella
Learning a Macroscopic Model of Cultural Dynamics [pdf]
Proc. 15th International Conference on Data Mining (ICDM 2015), 2015
A fundamental open question that has been studied by sociologists since the 1970s, and that has recently started being addressed by the computer-science community, is understanding the role that influence and selection play in shaping the evolution of socio-cultural systems. Quantifying these forces in real settings is still a big challenge, especially in the large-scale case in which the entire social network between the users may not be known, and only longitudinal data in terms of masses of cultural groups (e.g., political affiliation, product adoption, market share, cultural tastes) may be available. We propose an influence and selection model encompassing an explicit characterization of the feature space for the different cultural groups, in the form of a natural equation-based macroscopic model, following the approach of Kempe et al. [EC 2013]. Our main goal is to estimate edge influence strengths and selection parameters from an observed time series. To evaluate our approach experimentally, we perform learning on real datasets from Last.FM and Wikipedia.
A. Bessi, F. Petroni, M. D. Vicario, F. Zollo, A. Anagnostopoulos, A. Scala, G. Caldarelli, and W. Quattrociocchi
Viral Misinformation: The Role of Homophily and Polarization [pdf]
Proc. 24th International World Wide Web Conference, Companion Volume (WWW 2015, Web Science Track), 2015
No abstract available.
A. Anagnostopoulos, D. Ferraioli, and S. Leonardi
Competitive Influence in Social Networks: Convergence, Submodularity, and Competition Effects [pdf]
Proc. 2015 International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2015), 2015
In the last 10 years, a vast amount of scientific literature has studied the problem of influence maximization. Yet, only very recently have scientists started considering the more realistic case in which competing entities try to expand their market and maximize their share via viral marketing. Goyal and Kearns [STOC 2012] present a model for the diffusion of two competing alternatives in a social network, which consists of two phases: an activation phase, in which nodes choose whether to adopt either of the two alternatives or neither of them, and a selection phase, in which adopting nodes choose which of the two alternatives to adopt.
In this work we consider this two-phase model, composing some of the best-known dynamics (threshold, voter, and logit models), and we ask the following questions: (1) How is the stationary distribution of the composition of these dynamics related to those of the single composing dynamics? (2) Does the number of adopters of one of the alternatives increase in a monotone and submodular way with respect to the set of initial adopters of that alternative? (3) To what extent does the competition among alternatives affect the total number of agents adopting one of the alternatives?
A. Anagnostopoulos, L. Becchetti, A. Fazzone, I. Mele, and M. Riondato
The Importance of Being Expert: Efficient Max-Finding in Crowdsourcing [pdf]
Proc. 2015 ACM SIGMOD International Conference on Management of Data (SIGMOD 2015), 2015
Crowdsourcing is a computational paradigm whose distinctive feature is the involvement of human workers in key steps of the computation. It is used successfully to address problems that would be hard or impossible to solve for machines. As we highlight in this work, the exclusive use of nonexpert individuals may prove ineffective in some cases, especially when the task at hand or the need for accurate solutions demands some degree of specialization to avoid excessive uncertainty and inconsistency in the answers. We address this limitation by proposing an approach that combines the wisdom of the crowd with the educated opinion of experts. We present a computational model for crowdsourcing that envisions two classes of workers with different expertise levels. One of its distinctive features is the adoption of the threshold error model, whose roots are in psychometrics and which we extend from previous theoretical work. Our computational model allows us to evaluate the performance of crowdsourcing algorithms with respect to accuracy and cost. We use our model to develop and analyze an algorithm for approximating the best, in a broad sense, of a set of elements. The algorithm uses naïve and expert workers to find an element that is a constant-factor approximation to the best. We prove upper and lower bounds on the number of comparisons needed to solve this problem, showing that our algorithm uses expert and naïve workers optimally up to a constant factor. Finally, we evaluate our algorithm on real and synthetic datasets using the CrowdFlower crowdsourcing platform, showing that our approach is also effective in practice.
L. Becchetti, A. Clementi, E. Natale, F. Pasquale, and G. Posta
Self-Stabilizing Repeated Balls-Into-Bins
Proc. 27th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA 2015), 2015
We study the following synchronous process that we call repeated balls-into-bins. The process is started by assigning n balls to n bins in an arbitrary way. Then, in every subsequent round, one ball is chosen according to some fixed strategy (random, FIFO, etc.) from each non-empty bin, and re-assigned to one of the n bins uniformly at random. This process corresponds to a non-reversible Markov chain and our aim is to study its self-stabilization properties with respect to the maximum (bin) load and some related performance measures.
We call a configuration (i.e., a state) legitimate if its maximum load is O(log n). We first prove that, starting from any legitimate configuration, the process only takes on legitimate configurations over a period of length bounded by any polynomial in n, with high probability (w.h.p.). Further, we prove that, starting from any configuration, the process converges to a legitimate configuration in linear time, w.h.p. This implies that the process is self-stabilizing w.h.p. and, moreover, that every ball traverses all bins within O(n log² n) rounds, w.h.p.
The latter result can also be interpreted as an almost tight bound on the cover time for the problem of parallel resource assignment in the complete graph.
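A small simulation sketch of the repeated balls-into-bins process as described above; starting the process from the all-balls-in-one-bin configuration is an illustrative (adversarial) choice, not something prescribed by the paper.

```python
import random

def repeated_balls_into_bins(n, rounds, seed=0):
    """Simulate the repeated balls-into-bins process and track the max load.

    Start from an arbitrary assignment of n balls to n bins (here the
    adversarial all-in-one-bin configuration); in every round each
    non-empty bin sends one ball to a uniformly random bin.
    """
    rng = random.Random(seed)
    load = [0] * n
    load[0] = n                                  # worst-case start
    max_loads = []
    for _ in range(rounds):
        moving = sum(1 for x in load if x > 0)   # one ball leaves each non-empty bin
        load = [x - 1 if x > 0 else 0 for x in load]
        for _ in range(moving):
            load[rng.randrange(n)] += 1
        max_loads.append(max(load))
    return max_loads

print(repeated_balls_into_bins(n=1000, rounds=100)[-5:])
```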
J. R. Correa, J. Jong, B. de Keijzer, and M. Uetz
The Curse of Sequentiality in Routing Games
Proc. 11th Conference on Web and Internet Economics (WINE 2015), 2015
In the "The curse of simultaneity", Paes Leme et al. show that there are interesting classes of games for which sequential decision making and corresponding subgame perfect equilibria avoid worst case Nash equilibria, resulting in substantial improvements for the price of anarchy. This is called the sequential price of anarchy. A handful of papers have lately analysed it for various problems, yet one of the most interesting open problems was to pin down its value for linear atomic routing (also: network congestion) games, where the price of anarchy equals 5/2. The main contribution of this paper is the surprising result that the sequential price of anarchy is unbounded even for linear symmetric routing games, thereby showing that sequentiality can be arbitrarily worse than simultaneity for this class of games. Complementing this result we solve an open problem in the area by establishing that the (regular) price of anarchy for linear symmetric routing games equals 5/2. Additionally, we prove that in these games, even with two players, computing the outcome of a subgame perfect equilibrium is NP-hard.
M. Adamczyk, A. Borodin, D. Ferraioli, B. de Keijzer, and S. Leonardi
Sequential Posted Price Mechanisms with Correlated Valuations
Proc. 11th Conference on Web and Internet Economics (WINE 2015), 2015
We study the revenue performance of sequential posted price mechanisms and some natural extensions, for a general setting where the valuations of the buyers are drawn from a correlated distribution. Sequential posted price mechanisms are conceptually simple mechanisms that work by proposing a “take-it-or-leave-it” offer to each buyer. We apply sequential posted price mechanisms to single-parameter multi-unit settings in which each buyer demands only one item and the mechanism can assign the service to at most k of the buyers. For standard sequential posted price mechanisms, we prove that when the valuation distribution has finite support, no sequential posted price mechanism can extract a constant fraction of the optimal expected revenue, even with unlimited supply. We extend this result to the case of a continuous valuation distribution when various standard assumptions hold simultaneously. In fact, it turns out that the best fraction of the optimal revenue that is extractable by a sequential posted price mechanism is proportional to the ratio of the highest and lowest possible valuation. We prove that for two simple generalizations of these mechanisms, a better revenue performance can be achieved: if the sequential posted price mechanism has for each buyer the option of either proposing an offer or asking the buyer for its valuation, then an Ω(1/max{1, d}) fraction of the optimal revenue can be extracted, where d denotes the “degree of dependence” of the valuations, ranging from complete independence (d = 0) to arbitrary dependence (d = n−1). When we generalize the sequential posted price mechanisms further, such that the mechanism has the ability to make a take-it-or-leave-it offer to the i-th buyer that depends on the valuations of all buyers except i, we prove that a constant fraction (2−√e)/4 (approximately 0.088) of the optimal revenue can always be extracted.
L. Becchetti, A. Clementi, E. Natale, F. Pasquale, and R. Silvestri
Plurality Consensus in the Gossip Model
Proc. 26th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2015), 2015
We study Plurality Consensus in the GOSSIP Model over a network of n anonymous agents. Each agent supports an initial opinion or color. We assume that at the onset, the number of agents supporting the plurality color exceeds that of the agents supporting any other color by a sufficiently-large bias, though the initial plurality itself might be very far from absolute majority. The goal is to provide a protocol that, with high probability, brings the system into the configuration in which all agents support the (initial) plurality color.
We consider the Undecided-State Dynamics, a well-known protocol which uses just one more state (the undecided one) than those necessary to store colors.
We show that the speed of convergence of this protocol depends on the initial color configuration as a whole, not just on the gap between the plurality and the second largest color community. This dependence is best captured by the monochromatic distance, a novel notion we introduce, measuring the distance of the initial color configuration from the closest monochromatic one. In the complete graph, we prove that, for a wide range of the input parameters, this dynamics converges in a number of rounds that is logarithmic in the number of nodes and linear in the aforementioned monochromatic distance. We prove that this upper bound is almost tight in the strong sense: starting from any color configuration, the convergence time is at least linear in its monochromatic distance. Finally, we adapt the protocol above to obtain a fast, random walk-based protocol for plurality consensus on regular expanders. All our bounds hold with high probability.
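A hedged sketch of the Undecided-State Dynamics in the synchronous pull (gossip) setting described above; the specific initial color configuration and the round cap are illustrative choices.

```python
import random
from collections import Counter

UNDECIDED = None

def undecided_state_step(opinions, rng):
    """One synchronous round of the Undecided-State Dynamics (pull model).

    Every agent samples one uniformly random agent: a colored agent that
    sees a different color becomes undecided, an undecided agent that sees
    a color adopts it, and nothing changes otherwise.
    """
    n = len(opinions)
    new = list(opinions)
    for i in range(n):
        sampled = opinions[rng.randrange(n)]
        if opinions[i] is UNDECIDED:
            if sampled is not UNDECIDED:
                new[i] = sampled
        elif sampled is not UNDECIDED and sampled != opinions[i]:
            new[i] = UNDECIDED
    return new

# Illustrative run: n = 1000 agents, plurality color 0 with a clear bias.
rng = random.Random(42)
opinions = [0] * 400 + [1] * 320 + [2] * 280
for rounds in range(1, 300):
    opinions = undecided_state_step(opinions, rng)
    if len(set(opinions)) == 1 and opinions[0] is not UNDECIDED:
        break
print(rounds, Counter(opinions))
```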
S. Lattanzi, S. Leonardi, V. Mirrokni, and I. Razenshteyn
Robust Hierarchical K-Center Clustering
Proc. 6th ACM International Conference on Innovations on Theoretical Computer Science (ITCS 2015), 2015
L. Becchetti, A. Clementi, F. Pasquale, G. Resta, P. Santi, and R. Silvestri
Flooding Time in Opportunistic Networks Under Power Law and Exponential Intercontact Times
IEEE Transactions on Parallel and Distributed Systems, Volume 25, Number 9, 2014
Performance bounds for opportunistic networks have been derived in a number of recent papers for several key quantities, such as the expected delivery time of a unicast message, or the flooding time (a measure of how fast information spreads). However, to the best of our knowledge, none of the existing results is derived under a mobility model which is able to reproduce the power law+exponential tail dichotomy of the pairwise node inter-contact time distribution which has been observed in traces of several real opportunistic networks.
The contributions of this paper are two-fold: first, we present a simple pairwise contact model--called the Home-MEG model--for opportunistic networks based on the observation made in previous work that pairs of nodes in the network tend to meet in very few, selected locations (home locations); this contact model is shown to be able to faithfully reproduce the power law+exponential tail dichotomy of inter-contact time. Second, we use the Home-MEG model to analyze flooding time in opportunistic networks, presenting asymptotic bounds on flooding time that assume different initial conditions for the existence of opportunistic links.
Finally, our bounds provide some analytical evidence that the speed of information spreading in opportunistic networks can be much faster than that predicted by simple geometric mobility models.
F. Grandoni, P. Krysta, S. Leonardi, and C. Ventre
Utilitarian Mechanism Design for Multiobjective Optimization
SIAM Journal on Computing, Volume 43, Number 4, 2014
In a classic optimization problem, the complete input data is assumed to be known to the algorithm. This assumption may not be true anymore in optimization problems motivated by the Internet where part of the input data is private knowledge of independent selfish agents. The goal of algorithmic mechanism design is to provide (in polynomial time) a solution to the optimization problem and a set of incentives for the agents such that disclosing the input data is a dominant strategy for the agents. In the case of NP-hard problems, the solution computed should also be a good approximation of the optimum. In this paper we focus on mechanism design for multiobjective optimization problems. In this setting we are given a main objective function and a set of secondary objectives which are modeled via budget constraints. Multiobjective optimization is a natural setting for mechanism design as many economical choices ask for a compromise between different, partially conflicting goals. The main contribution of this paper is showing that two of the main tools for the design of approximation algorithms for multiobjective optimization problems, namely, approximate Pareto sets and Lagrangian relaxation, can lead to truthful approximation schemes. By exploiting the method of approximate Pareto sets, we devise truthful deterministic and randomized multicriteria fully polynomial-time approximation schemes (FPTASs) for multiobjective optimization problems whose exact version admits a pseudopolynomial-time algorithm, as, for instance, the multibudgeted versions of minimum spanning tree, shortest path, maximum (perfect) matching, and matroid intersection. Our construction also applies to multidimensional knapsack and multiunit combinatorial auctions. Our FPTASs compute a (1+ε)-approximate solution violating each budget constraint by a factor (1+ε). When feasible solutions induce an independence system, i.e., when subsets of feasible solutions are feasible as well, we present a PTAS (not violating any constraint), which combines the approach above with a novel monotone way to guess the heaviest elements in the optimum solution. Finally, we present a universally truthful Las Vegas PTAS for minimum spanning tree with a single budget constraint, where one wants to compute a minimum cost spanning tree whose length is at most a given value L. This result is based on the Lagrangian relaxation method, in combination with our monotone guessing step and with a random perturbation step (ensuring low expected running time). This result can be derandomized in the case of integral lengths. All the mentioned results match the best known approximation ratios, which are, however, obtained by nontruthful algorithms.
P. Chen, B. de Keijzer, D. Kempe, and G. Schäfer
Altruism and Its Impact on the Price of Anarchy
ACM Transactions on Economics and Computation, Volume 2, Number 4, 2014
We study the inefficiency of equilibria for congestion games when players are (partially) altruistic. We model altruistic behavior by assuming that player i's perceived cost is a convex combination of (1 − αi) times his direct cost and αi times the social cost. Tuning the parameters αi allows smooth interpolation between purely selfish and purely altruistic behavior. Within this framework, we study primarily altruistic extensions of (atomic and nonatomic) congestion games, but also obtain some results on fair cost-sharing games and valid utility games.
We derive (tight) bounds on the price of anarchy of these games for several solution concepts. To this end, we suitably adapt the smoothness notion introduced by Roughgarden and show that it captures the essential properties to determine the robust price of anarchy of these games. Our bounds show that for atomic congestion games and cost-sharing games, the robust price of anarchy gets worse with increasing altruism, while for valid utility games, it remains constant and is not affected by altruism.
However, the increase in the price of anarchy is not a universal phenomenon: For general nonatomic congestion games with uniform altruism, the price of anarchy improves with increasing altruism. For atomic and nonatomic symmetric singleton congestion games, we derive bounds on the pure price of anarchy that improve as the average level of altruism increases. (For atomic games, we only derive such bounds when cost functions are linear.) Since the bounds are also strictly lower than the robust price of anarchy, these games exhibit natural examples in which pure Nash equilibria are more efficient than more permissive notions of equilibrium.
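As a worked illustration of the perceived-cost model, the toy two-player singleton congestion game below is our own example (not from the paper); it simply evaluates the convex combination (1 − αi)·(direct cost) + αi·(social cost) for a few altruism levels.

```python
def perceived_cost(direct_cost, social_cost, alpha):
    """Player i's perceived cost: (1 - alpha) * direct cost + alpha * social cost."""
    return (1 - alpha) * direct_cost + alpha * social_cost

# Toy singleton congestion game: two players, two identical links whose
# latency equals their load; a player's direct cost is its link's latency.
def costs(strategies):                      # strategies[i] is the link of player i
    load = [strategies.count(0), strategies.count(1)]
    direct = [load[s] for s in strategies]
    return direct, sum(direct)

for alpha in (0.0, 0.5, 1.0):
    # Player 0's perceived cost when both players share link 0 vs. when they split.
    shared_direct, shared_social = costs([0, 0])
    split_direct, split_social = costs([0, 1])
    print(alpha,
          perceived_cost(shared_direct[0], shared_social, alpha),
          perceived_cost(split_direct[0], split_social, alpha))
```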
B. de Keijzer, T. B. Klos, and Y. Zhang
Finding Optimal Solutions for Voting Game Design Problems
Journal of Artificial Intelligence Research, Volume 50, 2014
In many circumstances where multiple agents need to make a joint decision, voting is used to aggregate the agents' preferences. Each agent's vote carries a weight, and if the sum of the weights of the agents in favor of some outcome is larger than or equal to a given quota, then this outcome is decided upon. The distribution of weights leads to a certain distribution of power. Several `power indices' have been proposed to measure such power. In the so-called inverse problem, we are given a target distribution of power, and are asked to come up with a game in the form of a quota, plus an assignment of weights to the players whose power distribution is as close as possible to the target distribution (according to some specified distance measure).
Here we study solution approaches for the larger class of voting game design (VGD) problems, one of which is the inverse problem. In the general VGD problem, the goal is to find a voting game (with a given number of players) that optimizes some function over these games. In the inverse problem, for example, we look for a weighted voting game that minimizes the distance between the distribution of power among the players and a given target distribution of power (according to a given distance measure). Our goal is to find algorithms that solve voting game design problems exactly, and we approach this goal by enumerating all games in the class of games of interest.
We first present a doubly exponential algorithm for enumerating the set of simple games. We then improve on this algorithm for the class of weighted voting games and obtain a quadratically exponential (i.e., 2^{O(n²)}) algorithm for enumerating them. We show that this improved algorithm runs in output-polynomial time, making it the fastest possible enumeration algorithm up to a polynomial factor. Finally, we propose an exact anytime algorithm that runs in exponential time for the power index weighted voting game design problem (the `inverse problem'). We implement this algorithm to find a weighted voting game with a normalized Banzhaf power distribution closest to a target power index, and perform experiments to obtain some insights about the set of weighted voting games. We remark that our algorithm is applicable to optimizing any exponential-time computable function; the distance of the normalized Banzhaf index to a target power index is merely taken as an example.
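For readers who want the objects above in executable form, here is a brute-force sketch that computes the normalized Banzhaf power distribution of a weighted voting game by enumerating all coalitions; it is exponential time and purely illustrative, not the anytime algorithm of the paper.

```python
from itertools import combinations

def normalized_banzhaf(weights, quota):
    """Normalized Banzhaf power index of the weighted voting game [quota; weights].

    Exponential-time enumeration over all coalitions: player i is critical
    in a winning coalition S if removing i makes S losing.
    """
    n = len(weights)
    raw = [0] * n
    for r in range(n + 1):
        for coalition in combinations(range(n), r):
            total = sum(weights[i] for i in coalition)
            if total >= quota:
                for i in coalition:
                    if total - weights[i] < quota:
                        raw[i] += 1
    s = sum(raw)
    return [x / s for x in raw] if s else raw

# Classic example: [quota 51; weights 49, 49, 2] -- all three players turn out
# to be equally powerful despite the very unequal weights.
print(normalized_banzhaf([49, 49, 2], 51))   # [1/3, 1/3, 1/3]
```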
P. Gawrychowski, A. Jeż, and Ł. Jeż
Validating the Knuth-Morris-Pratt Failure Function, Fast and Online
Theory of Computing Systems, Volume 54, Number 2, 2014
Let π′_w denote the failure function of the Knuth-Morris-Pratt algorithm for a word w. In this paper we study the following problem: given an integer array A′[1..n], is there a word w over an arbitrary alphabet Σ such that A′[i] = π′_w[i] for all i? Moreover, what is the minimum cardinality of Σ required? We give an elementary and self-contained O(n log n) time algorithm for this problem, thus improving the previously known solution (Duval et al. in Conference in honor of Donald E. Knuth, 2007), which had no polynomial time bound. Using both deeper combinatorial insight into the structure of π′ and advanced algorithmic tools, we further improve the running time to O(n).
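A small reference sketch related to the problem statement: the standard (weak) KMP failure function, together with a brute-force validity check over a fixed alphabet. The paper studies the strong failure function π′ and gives an O(n) validation algorithm; the exponential-time check below is only a naive baseline, not the paper's method.

```python
from itertools import product

def failure_function(w):
    """Standard (weak) KMP failure function: pi[i] is the length of the longest
    proper border of w[:i+1]. Shown for reference; the paper works with the
    related strong variant pi'."""
    pi = [0] * len(w)
    k = 0
    for i in range(1, len(w)):
        while k > 0 and w[i] != w[k]:
            k = pi[k - 1]
        if w[i] == w[k]:
            k += 1
        pi[i] = k
    return pi

def is_valid_failure_array(a, alphabet_size=2):
    """Brute-force validator (exponential time, tiny inputs only): does some
    word over a fixed alphabet have `a` as its failure function?"""
    n = len(a)
    return any(failure_function(w) == list(a)
               for w in product(range(alphabet_size), repeat=n))

print(failure_function("ababaca"))             # [0, 0, 1, 2, 3, 0, 1]
print(is_valid_failure_array([0, 0, 1, 2, 3])) # True, e.g. realized by "ababa"
print(is_valid_failure_array([0, 2, 0]))       # False
```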
L. Becchetti, L. Bergamini, U. M. Colesanti, L. Filipponi, G. Persiano, and A. Vitaletti
A Lightweight Privacy Preserving SMS-Based Recommendation System for Mobile Users
Knowledge and Information Systems, Volume 40, Number 1, 2014
In this paper, we propose a fully decentralized approach for recommending new contacts in the social network of mobile phone users. With respect to existing solutions, our approach is characterized by some distinguishing features. In particular, the application we propose does not assume any centralized coordination: It transparently collects and processes user information that is accessible in any mobile phone, such as the log of calls, the list of contacts or the inbox/outbox of short messages and exchanges it with other users. This information is used to recommend new friendships to other users. Furthermore, the information needed to perform recommendation is collected and exchanged between users in a privacy preserving way. Finally, information necessary to implement the application is exchanged transparently and opportunistically, by using the residual space in standard short messages occasionally exchanged between users. As a consequence, we do not ask users to change their habits in using SMS.
L. Becchetti, A. Clementi, E. Natale, F. Pasquale, R. Silvestri, and L. Trevisan
Simple Dynamics for Plurality Consensus
Proc. 26th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA 2014), 2014
We study a Majority Consensus process in which each of n anonymous agents of a communication network supports an initial opinion (a color chosen from a finite set [k]) and, at every time step, he can revise his color according to a random sample of neighbors. It is assumed that the initial color configuration has a sufficiently large bias s towards a fixed majority color, that is, the number of nodes supporting the majority color exceeds the number of nodes supporting any other color by s additional nodes. The goal (of the agents) is to let the process converge to the stable configuration where all nodes support the majority color. We consider a basic model in which the network is a clique and the update rule (called here the 3-majority dynamics) of the process is that each agent looks at the colors of three random neighbors and then applies the majority rule (breaking ties uniformly).
We prove that with high probability the process converges in time logarithmic in the number of nodes and linear in the number of colors. A natural question is whether looking at more (than three) random neighbors can significantly speed up the process. We provide a negative answer to this question: in particular, we show that samples of polylogarithmic size can speed up the process by a polylogarithmic factor only.
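A minimal simulation sketch of the 3-majority dynamics on the clique as described above; the initial configuration with k = 3 colors and a modest bias is an illustrative choice.

```python
import random
from collections import Counter

def three_majority_step(colors, rng):
    """One synchronous round of the 3-majority dynamics on the clique:
    every agent samples three agents uniformly at random and adopts the
    majority color of the sample, breaking ties uniformly at random."""
    n = len(colors)
    new = []
    for _ in range(n):
        sample = [colors[rng.randrange(n)] for _ in range(3)]
        counts = Counter(sample)
        top = max(counts.values())
        new.append(rng.choice([c for c, cnt in counts.items() if cnt == top]))
    return new

# Illustrative run: k = 3 colors with a modest bias towards color 0.
rng = random.Random(7)
colors = [0] * 380 + [1] * 320 + [2] * 300
for rounds in range(1, 500):
    colors = three_majority_step(colors, rng)
    if len(set(colors)) == 1:
        break
print(rounds, Counter(colors))
```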
A. Epasto, J. Feldman, S. Lattanzi, S. Leonardi, and V. Mirrokni
Reduce and Aggregate: Similarity Ranking in Multi-Categorical Bipartite Graphs
Proc. 23rd International World Wide Web Conference (WWW 2014), 2014
We study the problem of computing similarity rankings in large-scale multi-categorical bipartite graphs, where the two sides of the graph represent actors and items, and the items are partitioned into an arbitrary set of categories. The problem has several real-world applications, including identifying competing advertisers and suggesting related queries in an online advertising system or finding users with similar interests and suggesting content to them. In these settings, we are interested in computing on-the-fly rankings of similar actors, given an actor and an arbitrary subset of categories of interest. Two main challenges arise: First, the bipartite graphs are huge and often lopsided (e.g. the system might receive billions of queries while presenting only millions of advertisers). Second, the sheer number of possible combinations of categories prevents the pre-computation of the results for all of them. We present a novel algorithmic framework that addresses both issues for the computation of several graph-theoretical similarity measures, including # common neighbors, and Personalized PageRank. We show how to tackle the imbalance in the graphs to speed up the computation and provide efficient real-time algorithms for computing rankings for an arbitrary subset of categories. Finally, we show experimentally the accuracy of our approach with real-world data, using both public graphs and a very large dataset from Google AdWords.
R. Colini-Baldeschi, S. Leonardi, P. Sankowski, and Q. Zhang
Revenue Maximizing Envy-Free Fixed-Price Auctions with Budgets
Proc. 10th Conference on Web and Internet Economics (WINE 2014), 2014
Traditional incentive-compatible auctions [6,16] for selling multiple goods to unconstrained and budgeted bidders can discriminate between bidders by selling identical goods at different prices. For this reason, Feldman et al. [7] dropped incentive compatibility and turned the attention to revenue-maximizing envy-free item-pricing allocations for budgeted bidders. Envy-free allocations were suggested by classical papers [9,15]. The key property of such allocations is that no one envies the allocation and the price charged to anyone else. In this paper we consider this classical notion of envy-freeness and study fixed-price mechanisms which use nondiscriminatory uniform prices for all goods. Feldman et al. [7] gave an item-pricing mechanism that obtains 1/2 of the revenue obtained from any envy-free fixed-price mechanism for identical goods. We improve over this result by presenting an FPTAS for the problem that returns a (1−ε)-approximation of the revenue obtained by any envy-free fixed-price mechanism for any ε>0 and runs in polynomial time in the number of bidders n and 1/ε, even for exponential supply of goods m. Next, we consider the case of budgeted bidders with matching-type preferences on the set of goods, i.e., the valuation of each bidder for each item is either vi or 0. In this more general case, we prove that it is impossible to approximate the optimum revenue within O(min(n,m)^{1/2−ε}) for any ε>0 unless P=NP. On the positive side, we are able to extend the FPTAS for identical goods to budgeted bidders in the case of a constant number of different types of goods. Our FPTAS also gives a constant approximation with respect to the general envy-free auction.
S. Lattanzi, and S. Leonardi
Efficient Computation of the Weighted Clustering Coefficient
Proc. 11th Workshop on Algorithms and Models for the Web Graph (WAW 2014), 2014
The clustering coefficient of an unweighted network has been extensively used to quantify how tightly connected the neighborhood of a node is, and it has been widely adopted for assessing the quality of nodes in a social network. The computation of the clustering coefficient is challenging since it requires counting the number of triangles in the graph. Several recent works proposed efficient sampling, streaming and MapReduce algorithms that allow this computational bottleneck to be overcome. As a matter of fact, the intensity of the interaction between nodes, which is usually represented with weights on the edges of the graph, is also an important measure of the statistical cohesiveness of a network. Recently, various notions of weighted clustering coefficient have been proposed, but all those techniques are hard to implement on large-scale graphs. In this work we show how standard sampling techniques can be used to obtain efficient estimators for the most commonly used measures of weighted clustering coefficient. Furthermore, we also propose a novel graph-theoretic notion of clustering coefficient in weighted networks.
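As a concrete instance of the sampling techniques referred to above, here is a standard wedge-sampling estimator for the (unweighted) global clustering coefficient; the weighted estimators of the paper follow the same sampling template but reweight wedges, which this sketch does not attempt to reproduce.

```python
import random

def wedge_sampling_clustering_coefficient(adj, samples, seed=0):
    """Estimate the global clustering coefficient by sampling wedges.

    A wedge is a path u - v - w centred at v; the clustering coefficient is
    the fraction of wedges that are closed (u and w adjacent). Wedge centres
    are sampled proportionally to deg(v) * (deg(v) - 1) / 2.
    """
    rng = random.Random(seed)
    nodes = [v for v in adj if len(adj[v]) >= 2]
    weights = [len(adj[v]) * (len(adj[v]) - 1) // 2 for v in nodes]
    closed = 0
    for _ in range(samples):
        v = rng.choices(nodes, weights=weights)[0]
        u, w = rng.sample(sorted(adj[v]), 2)
        closed += w in adj[u]
    return closed / samples

# Tiny example: a triangle {0,1,2} plus a pendant edge 2-3 (true value 3/5).
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(wedge_sampling_clustering_coefficient(adj, samples=20_000))
```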
H. Aziz, and B. de Keijzer
Shapley Meets Shapley
Proc. 31st International Symposium on Theoretical Aspects of Computer Science (STACS 2014), 2014
This paper concerns the analysis of the Shapley value in matching games. Matching games constitute a fundamental class of cooperative games which help understand and model auctions and assignments. In a matching game, the value of a coalition of vertices is the weight of the maximum size matching in the subgraph induced by the coalition. The Shapley value is one of the most important solution concepts in cooperative game theory. After establishing some general insights, we show that the Shapley value of matching games can be computed in polynomial time for some special cases: graphs with maximum degree two, and graphs that have a small modular decomposition into cliques or cocliques (complete k-partite graphs are a notable special case of this). The latter result extends to various other well-known classes of graph-based cooperative games. We continue by showing that computing the Shapley value of unweighted matching games is #P-complete in general. Finally, a fully polynomial-time randomized approximation scheme (FPRAS) is presented. This FPRAS can be considered the best positive result conceivable, in view of the #P-completeness result.
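A brute-force sketch of the Shapley value of a tiny unweighted matching game, obtained by enumerating all player orderings; this exponential-time computation is only meant to make the definitions concrete and is unrelated to the polynomial-time cases and the FPRAS of the paper.

```python
from itertools import permutations

def max_matching_size(vertices, edges):
    """Size of a maximum matching in the subgraph induced by `vertices`
    (brute force, fine for the tiny coalitions used here)."""
    usable = [e for e in edges if e[0] in vertices and e[1] in vertices]
    best = 0
    def grow(chosen, used, start):
        nonlocal best
        best = max(best, len(chosen))
        for i in range(start, len(usable)):
            a, b = usable[i]
            if a not in used and b not in used:
                grow(chosen + [usable[i]], used | {a, b}, i + 1)
    grow([], set(), 0)
    return best

def shapley_value(players, edges):
    """Exact Shapley value of an unweighted matching game, by averaging
    marginal contributions over all player orderings."""
    value = {p: 0.0 for p in players}
    perms = list(permutations(players))
    for order in perms:
        coalition, prev = set(), 0
        for p in order:
            coalition.add(p)
            cur = max_matching_size(coalition, edges)
            value[p] += cur - prev
            prev = cur
    return {p: v / len(perms) for p, v in value.items()}

# Path a - b - c: the middle vertex is needed in every matched edge,
# so it gets Shapley value 2/3 while a and c get 1/6 each.
print(shapley_value(["a", "b", "c"], [("a", "b"), ("b", "c")]))
```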
M. Adamczyk, P. Sankowski, and Q. Zhang
Efficiency of Truthful and Symmetric Mechanisms in One-Sided Matching
Proc. 7th International Symposium on Algorithmic Game Theory (SAGT 2014), 2014
We study the efficiency (in terms of social welfare) of truthful and symmetric mechanisms in one-sided matching problems with dichotomous preferences and normalized von Neumann-Morgenstern preferences. We are particularly interested in the well-known Random Serial Dictatorship mechanism. For dichotomous preferences, we first show that truthful, symmetric and optimal mechanisms exist if intractable mechanisms are allowed. We then provide a connection to online bipartite matching. Using this connection, it is possible to design truthful, symmetric and tractable mechanisms that extract a (1−1/e) fraction, or even 69%, of the maximum social welfare. Furthermore, we show that Random Serial Dictatorship always returns an assignment in which the expected social welfare is at least a third of the maximum social welfare. For normalized von Neumann-Morgenstern preferences, we show that Random Serial Dictatorship always returns an assignment in which the expected social welfare is at least ν(opt)²/(en), where ν(opt) is the maximum social welfare and n is the number of both agents and items. On the hardness side, we show that no truthful mechanism can achieve a social welfare better than O(ν(opt)²/n).
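A minimal sketch of the Random Serial Dictatorship mechanism discussed above; the preference profile is an illustrative example.

```python
import random

def random_serial_dictatorship(preferences, rng):
    """Random Serial Dictatorship: draw a uniformly random order of the agents
    and let each agent, in turn, pick its favourite remaining item.

    `preferences[a]` is agent a's ranking of items, best first.
    """
    agents = list(preferences)
    rng.shuffle(agents)
    remaining = set().union(*preferences.values())
    assignment = {}
    for a in agents:
        for item in preferences[a]:
            if item in remaining:
                assignment[a] = item
                remaining.remove(item)
                break
    return assignment

# Illustrative instance with three agents and three items.
rng = random.Random(3)
prefs = {"a1": ["x", "y", "z"], "a2": ["x", "z", "y"], "a3": ["y", "x", "z"]}
print(random_serial_dictatorship(prefs, rng))
```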
M. Adamczyk, M. Sviridenko, and J. Ward
Submodular Stochastic Probing on Matroids
Proc. 31st International Symposium on Theoretical Aspects of Computer Science (STACS 2014), 2014
In a stochastic probing problem we are given a universe E, where each element e ∈ E is active independently with probability p_e ∈ [0,1], and only a probe of e can tell us whether it is active or not. On this universe we execute a process that probes elements one by one: if a probed element is active, then we have to include it in the solution, which we gradually construct. Throughout the process we need to obey inner constraints on the set of elements taken into the solution, and outer constraints on the set of all probed elements. This abstract model was presented by Gupta and Nagarajan (IPCO '13), and provides a unified view of a number of problems. Thus far all the results in this general framework pertain only to the case in which we are maximizing a linear objective function of the successfully probed elements. In this paper we generalize the stochastic probing problem by considering a monotone submodular objective function. For any T ∈ (0,1], we give a (1 − e^{−T})/(T(k_in + k_out) + 1)-approximation for the case in which we are given k_in ≥ 0 matroids as inner constraints and k_out ≥ 1 matroids as outer constraints. In the case that k = k_in + k_out > 1, we show that the optimal value for T is given by T = −1 − 1/k − W_{−1}(−e^{−1−1/k}) ≈ √(2/k) − 1/(3k), where W is the Lambert W function.
There are two main ingredients behind this result. The first is a previously unpublished stronger bound on the continuous greedy algorithm due to Vondrák. The second is a rounding procedure that also allows us to obtain an improved 1/(k_in + k_out)-approximation for linear objective functions.
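To make the choice of T concrete, a small numeric evaluation of the closed-form expression (as reconstructed above) against its approximation, using SciPy's Lambert W; treat it as a sketch under that reading of the formula.

```python
import math
from scipy.special import lambertw

def optimal_T(k):
    """Closed-form T = -1 - 1/k - W_{-1}(-e^{-1-1/k}) for k = k_in + k_out > 1,
    together with the approximation sqrt(2/k) - 1/(3k)."""
    exact = -1 - 1 / k - lambertw(-math.exp(-1 - 1 / k), -1).real
    return exact, math.sqrt(2 / k) - 1 / (3 * k)

for k in (2, 4, 8, 16):
    print(k, optimal_T(k))
```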
D. Ferraioli, L. Gourvès, and J. Monnot
On Regular and Approximately Fair Allocations of Indivisible Goods
Proc. 2014 International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2014), 2014
An active stream of research is devoted to the design of polynomial approximation algorithms for the fair allocation of indivisible goods. Central to this field is the MaxMin Allocation problem, for which there is a significant gap between known approximation and inapproximability results. Closing this gap is a stimulating challenge.
To this end, we consider a regular version of MaxMin Allocation where each agent must receive exactly k goods, for a given integer k. We call this problem k-division. The analysis of this problem allows us to highlight two interesting features of the classical MaxMin Allocation problem. First, we show a close connection of the problem with matroid theory. This connection provides us with an exact algorithm for a special case of k-division and a 1/k-approximation algorithm for general inputs. Moreover, we show that the difficulty of MaxMin Allocation may be captured by an apparently simpler problem, namely the k-division problem in which an agent's utility for a single good can only take one value out of three.
P. Rozenshtein, A. Anagnostopoulos, A. Gionis, and N. Tatti
Event Detection in Activity Networks [pdf]
Proc. 20th International Conference on Knowledge Discovery and Data Mining (KDD 2014), 2014
With the fast growth of smart devices and social networks, a lot of computing systems collect data that record different types of activities. An important computational challenge is to analyze these data, extract patterns, and understand activity trends. We consider the problem of mining activity networks to identify interesting events, such as a big concert or a demonstration in a city, or a trending keyword in a user community in a social network.
We define an event to be a subset of nodes in the network that are close to each other and have high activity levels. We formalize the problem of event detection using two graph-theoretic formulations. The first one captures the compactness of an event using the sum of distances among all pairs of the event nodes. We show that this formulation can be mapped to the MaxCut problem, and thus, it can be solved by applying standard semidefinite programming techniques. The second formulation captures compactness using a minimum-distance tree. This formulation leads to the prize-collecting Steiner-tree problem, which we solve by adapting existing approximation algorithms. For the two problems we introduce, we also propose efficient and effective greedy approaches and we prove performance guarantees for one of them. We experiment with the proposed algorithms on real datasets from a public bicycling system and a geolocation-enabled social network dataset collected from Twitter. The results show that our methods are able to detect meaningful events.
A. Anagnostopoulos, F. Grandoni, S. Leonardi, and A. Wiese
A Mazing 2+ε Approximation for Unsplittable Flow on a Path [pdf]
Proc. 25th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2014), 2014
We study the unsplittable flow on a path problem (UFP), which arises naturally in many applications such as bandwidth allocation, job scheduling, and caching. Here we are given a path with non-negative edge capacities and a set of tasks, which are characterized by a subpath, a demand, and a profit. The goal is to find the most profitable subset of tasks whose total demand does not violate the edge capacities. Not surprisingly, this problem has received a lot of attention in the research community.
If the demand of each task is at most a small enough fraction δ of the capacity along its subpath (δ-small tasks), then it has been known for a long time [Chekuri et al., ICALP 2003] how to compute a solution of value arbitrarily close to the optimum via LP rounding. However, much remains unknown for the complementary case, that is, when the demand of each task is at least some fraction δ>0 of the smallest capacity of its subpath (δ-large tasks). For this setting a constant factor approximation, improving on an earlier logarithmic approximation, was found only recently [Bonsma et al., FOCS 2011].
In this paper we present a PTAS for δ-large tasks, for any constant δ>0. Key to this result is a complex geometrically inspired dynamic program. Each task is represented as a segment underneath the capacity curve, and we identify a proper maze-like structure so that each corridor of the maze is crossed by only O(1) tasks in the optimal solution. The maze has a tree topology, which guides our dynamic program. Our result implies a 2+ε approximation for UFP, for any constant ε>0, improving on the previously best 7+ε approximation by Bonsma et al. We remark that our improved approximation algorithm matches the best known approximation ratio for the considerably easier special case of uniform edge capacities.
M. Bieńkowski, J. Byrka, M. Chrobak, Ł. Jeż, D. Nogneng, and J. Sgall
Better Approximation Bounds for the Joint Replenishment Problem
Proc. 25th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2014), 2014
The Joint Replenishment Problem (JRP) deals with optimizing shipments of goods from a supplier to retailers through a shared warehouse. Each shipment involves transporting goods from the supplier to the warehouse, at a fixed cost C, followed by a redistribution of these goods from the warehouse to the retailers that ordered them, where transporting goods to a retailer ρ has a fixed cost c_ρ. In addition, we incur waiting costs for each order, possibly an arbitrary non-decreasing function of time, different for each order. The objective is to minimize the overall cost of satisfying all orders, namely the sum of all shipping and waiting costs.
JRP has been well studied in Operations Research and, more recently, in the area of approximation algorithms. For arbitrary waiting cost functions, the best known approximation ratio is 1.8. This ratio can be reduced to ≈1.574 for the JRP-D model, where there is no cost for waiting but orders have deadlines. As for hardness results, it is known that the problem is APX-hard and that the natural linear program for JRP has integrality gap at least 1.245. Both results hold even for JRP-D. In the online scenario, the best lower and upper bounds on the competitive ratio are 2.64 and 3, respectively. The lower bound of 2.64 applies even to the restricted version of JRP, denoted JRP-L, where the waiting cost function is linear.
We provide several new approximation results for JRP. In the offline case, we give an algorithm with ratio ≈1.791, breaking the barrier of 1.8. We also show that the integrality gap of the linear program for JRP-L is at least 12/11 ≈1.09. In the online case, we show a lower bound of ≈2.754 on the competitive ratio for JRP-L (and thus JRP as well), improving the previous bound of 2.64. We also study the online version of JRP-D, for which we prove that the optimal competitive ratio is 2.
F. Grandoni, A. Gupta, S. Leonardi, P. Miettinen, P. Sankowski, and M. Singh
Set Covering with Our Eyes Closed
SIAM Journal on Computing, Volume 42, Number 3, 2013
Given a universe U of n elements and a weighted collection C of m subsets of U, the universal set cover problem is to a-priori map each element u∈U to a set S(u)∈C containing u, so that X⊆U is covered by S(X)=∪u∈X S(u). The aim is to find a mapping such that the cost of S(X) is as close as possible to the optimal set-cover cost for X. (Such problems are also called oblivious or a-priori optimization problems.) Unfortunately, for every universal mapping, the cost of S(X) can be Ω(√n) times larger than optimal if the set X is adversarially chosen.
In this paper we study the performance on average, when X is a set of randomly chosen elements from the universe: we show how to efficiently find a universal map whose expected cost is O(log mn) times the expected optimal cost. In fact, we give a slightly improved analysis and show that this is the best possible. We generalize these ideas to weighted set cover and show similar guarantees for (non-metric) facility location, where we have to balance the facility opening cost with the cost of connecting clients to the facilities. We show applications of our results to universal multi-cut and disc-covering problems, and show how all these universal mappings give us stochastic online algorithms with the same competitive factors.
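A hedged sketch of one natural universal mapping in the spirit of the paper: run greedy weighted set cover on the full universe and map each element to the set that first covers it. Whether this matches the paper's construction exactly is an assumption; the instance is illustrative.

```python
def greedy_universal_mapping(universe, sets, cost):
    """A-priori map element -> set, built by greedy weighted set cover on the
    full universe: repeatedly pick the set with the smallest cost per newly
    covered element and assign those elements to it."""
    uncovered = set(universe)
    mapping = {}
    while uncovered:
        best = min((name for name in sets if sets[name] & uncovered),
                   key=lambda name: cost[name] / len(sets[name] & uncovered))
        for u in sets[best] & uncovered:
            mapping[u] = best
        uncovered -= sets[best]
    return mapping

def universal_cost(mapping, cost, X):
    """Cost of serving a request set X: pay for every distinct set used."""
    return sum(cost[s] for s in {mapping[u] for u in X})

# Illustrative instance.
universe = range(6)
sets = {"A": {0, 1, 2, 3}, "B": {3, 4}, "C": {4, 5}, "D": {0, 5}}
cost = {"A": 2.0, "B": 1.0, "C": 1.0, "D": 1.5}
m = greedy_universal_mapping(universe, sets, cost)
print(m, universal_cost(m, cost, {0, 4}))
```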
Ł. Jeż
A Universal Randomized Packet Scheduling Algorithm
Algorithmica, Volume 67, Number 4, 2013
We give a memoryless scale-invariant randomized algorithm ReMix for Packet Scheduling that is e/(e−1)-competitive against an adaptive adversary. ReMix unifies most previously known randomized algorithms, and its general analysis yields improved performance guarantees for several restricted variants, including the s-bounded instances. In particular, ReMix attains the optimum competitive ratio of 4/3 on 2-bounded instances.
Our results are applicable to a more general problem, called Item Collection, in which only the relative order between packets’ deadlines is known. ReMix is the optimal memoryless randomized algorithm against an adaptive adversary for that problem.
M. Bienkowski, M. Chrobak, C. Dürr, M. Hurand, A. Jeż, Ł. Jeż, and G. Stachowiak
Collecting Weighted Items From a Dynamic Queue
Algorithmica, Volume 65, Number 1, 2013
We consider online competitive algorithms for the problem of collecting weighted items from a dynamic queue S. The content of S varies over time. An update to S can occur between any two consecutive time steps, and it consists in deleting any number of items at the front of S and inserting other items into arbitrary locations in S. At each time step we are allowed to collect one item in S. The objective is to maximize the total weight of collected items. This is a generalization of bounded-delay packet scheduling (also known as buffer management). We present several upper and lower bounds on the competitive ratio for the general case and for some restricted variants of this problem.
M. Bienkowski, M. Chrobak, C. Dürr, M. Hurand, A. Jeż, Ł. Jeż, and G. Stachowiak
A φ-Competitive Algorithm for Collecting Items with Increasing Weights From a Dynamic Queue
Theoretical Computer Science, Volume 475, 2013
The bounded-delay packet scheduling (or buffer management) problem is to schedule transmissions of packets arriving in a buffer of a network link. Each packet has a deadline and a weight associated with it. The objective is to maximize the weight of packets that are transmitted before their deadlines, assuming that only one packet can be transmitted in one time step. Online packet scheduling algorithms have been extensively studied. It is known that no online algorithm can achieve a competitive ratio better than φ≈1.618 (the golden ratio), while the currently best upper bound on the competitive ratio is 2√2-1≈1.828. Closing the gap between these bounds remains a major open problem.
The above-mentioned lower bound of φ uses instances where item weights increase exponentially over time. In fact, all lower bounds for various versions of buffer management problems involve instances of this type. In this paper, we design an online algorithm for packet scheduling with competitive ratio φ when packet weights are increasing, thus matching this lower bound. Our algorithm applies, in fact, to a much more general version of packet scheduling, where only the relative order of the deadlines is known, not their exact values.
M. Chrobak, Ł. Jeż, and J. Sgall
Better Bounds for Incremental Frequency Allocation in Bipartite Graphs
Theoretical Computer Science, Volume 514, 2013
We study frequency allocation in wireless networks. A wireless network is modeled by an undirected graph, with vertices corresponding to cells. In each vertex, we have a certain number of requests, and each of those requests must be assigned a different frequency. Edges represent conflicts between cells, meaning that frequencies in adjacent vertices must be different as well. The objective is to minimize the total number of used frequencies.
The offline version of the problem is known to be NP-hard. In the incremental version, requests for frequencies arrive over time and the algorithm is required to assign a frequency to a request as soon as it arrives. Competitive incremental algorithms have been studied for several classes of graphs. For paths, the optimal (asymptotic) ratio is known to be 4/3, while for hexagonal-cell graphs it is between 1.5 and 1.9126. For ξ-colorable graphs, the ratio of (ξ+1)/2 can be achieved.
In this paper, we prove nearly tight bounds on the asymptotic competitive ratio for bipartite graphs, showing that it is between 1.428 and 1.433. This improves the previous lower bound of 4/3 and upper bound of 1.5. Our proofs are based on reducing the incremental problem to a purely combinatorial (equivalent) problem of constructing set families with certain intersection properties.
Ł. Jeż, J. Schwartz, J. Sgall, and J. Békési
Lower Bounds for Online Makespan Minimization on a Small Number of Related Machines
Journal of Scheduling, Volume 16, Number 5, 2013
In online makespan minimization, the jobs, each characterized by its processing time, arrive one by one, and each has to be assigned to one of m uniformly related machines. The goal is to minimize the length of the schedule. We prove new combinatorial lower bounds for m=4 and m=5, and computer-assisted lower bounds for m≤11.
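For context only, the sketch below simulates the standard online greedy rule for uniformly related machines (assign each arriving job to the machine on which it finishes earliest). The paper's contribution is lower bounds that limit what any online algorithm, including this one, can achieve; the code is not from the paper, and the instance is assumed.

```python
# Context sketch (not from the paper): classic online greedy for uniformly
# related machines; each job goes to the machine that completes it earliest.

def online_greedy(speeds, jobs):
    loads = [0.0] * len(speeds)               # total processing assigned so far
    for p in jobs:                            # jobs arrive one by one
        i = min(range(len(speeds)),
                key=lambda m: (loads[m] + p) / speeds[m])
        loads[i] += p
    return max(loads[m] / speeds[m] for m in range(len(speeds)))  # makespan

print(online_greedy([1.0, 2.0, 4.0], [3, 1, 4, 1, 5, 9, 2, 6]))
```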
L. Becchetti, V. Bonifaci, M. Dirnberger, A. Karrenbauer, and K. Mehlhorn
Physarum Can Compute Shortest Paths: Convergence Proofs and Complexity Bounds [pdf]
Proc. 40th International Colloquium on Automata, Languages and Programming (ICALP 2013), 2013
Physarum polycephalum is a slime mold that is apparently able to solve shortest path problems. A mathematical model for the slime’s behavior in the form of a coupled system of differential equations was proposed by Tero, Kobayashi and Nakagaki [TKN07]. We prove that a discretization of the model (Euler integration) computes a (1+ε)-approximation of the shortest path in O(mL(log n + log L)/ε³) iterations, with arithmetic on numbers of O(log(nL/ε)) bits; here, n and m are the number of nodes and edges of the graph, respectively, and L is the largest length of an edge. We also obtain two results for a directed Physarum model proposed by Ito et al. [IJNT11]: convergence in the general, nonuniform case and convergence and complexity bounds for the discretization of the uniform case.
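A minimal sketch of an Euler discretization of the Tero-Kobayashi-Nakagaki dynamics, under simplifying assumptions (a dense Laplacian solve, and a toy step size and iteration count chosen here for illustration); it shows the kind of discretized process the paper analyses, not the paper's bounds.

```python
# Illustrative Euler discretization of the Physarum dynamics (a sketch, not
# the paper's exact analysis). Conductances x_e evolve as x_e' = |q_e| - x_e,
# where q is the unit source-sink electrical flow induced by conductances x_e / L_e.
import numpy as np

def physarum(n, edges, lengths, s, t, h=0.1, steps=2000):
    x = np.ones(len(edges))                       # initial conductances
    for _ in range(steps):
        c = x / lengths                           # edge conductances x_e / L_e
        L = np.zeros((n, n))                      # graph Laplacian
        for k, (u, v) in enumerate(edges):
            L[u, u] += c[k]; L[v, v] += c[k]
            L[u, v] -= c[k]; L[v, u] -= c[k]
        b = np.zeros(n); b[s], b[t] = 1.0, -1.0   # unit flow from s to t
        p = np.linalg.lstsq(L, b, rcond=None)[0]  # node potentials
        q = np.array([c[k] * (p[u] - p[v]) for k, (u, v) in enumerate(edges)])
        x = x + h * (np.abs(q) - x)               # Euler step
    return x

# Two s-t paths of lengths 3 and 2: mass should concentrate on the shorter one.
edges = [(0, 1), (1, 3), (0, 2), (2, 3)]
lengths = np.array([1.0, 2.0, 1.0, 1.0])
print(np.round(physarum(4, edges, lengths, 0, 3), 3))
```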
A. Anagnostopoulos, L. Becchetti, B. de Keijzer, and G. Schäfer
Inefficiency of Games with Social Context [pdf]
Proc. 6th International Symposium on Algorithmic Game Theory (SAGT 2013), 2013
The study of other-regarding player behavior such as altruism and spite in games has recently received quite some attention in the algorithmic game theory literature. Already for very simple models, it has been shown that altruistic behavior can actually be harmful for society in the sense that the price of anarchy may increase as the players become more altruistic. In this paper, we study the severity of this phenomenon for more realistic settings in which there is a complex underlying social structure, causing the players to direct their altruistic and spiteful behavior in a refined player-specific sense (depending, for example, on friendships that exist among the players). Our findings show that the increase in the price of anarchy is modest for congestion games and minsum scheduling games, whereas it is drastic for generalized second price auctions.
A. Anagnostopoulos, F. Grandoni, S. Leonardi, and A. Wiese
Constant Integrality Gap LP Formulations of Unsplittable Flow on a Path [pdf]
Proc. 16th Conference on Integer Programming and Combinatorial Optimization (IPCO 2013), 2013
The Unsplittable Flow Problem on a Path (UFP) is a core problem in many important settings such as network flows, bandwidth allocation, resource constraint scheduling, and interval packing. We are given a path with capacities on the edges and a set of tasks, each task having a demand, a profit, a source and a destination vertex on the path. The goal is to compute a subset of tasks of maximum profit that does not violate the edge capacities.
In practical applications generic approaches such as integer programming (IP) methods are desirable. Unfortunately, no IP-formulation is known for the problem whose LP-relaxation has an integrality gap that is provably constant. For the unweighted case, we show that adding a few constraints to the standard LP of the problem is sufficient to make the integrality gap drop from Ω(n) to O(1). This positively answers an open question in [Chekuri et al., APPROX 2009].
For the general (weighted) case, we present an extended formulation with integrality gap bounded by 7+ε. This matches the best known approximation factor for the problem [Bonsma et al., FOCS 2011]. This result crucially exploits a technique for embedding dynamic programs into linear programs. We believe that this method could be useful to strengthen LP-formulations for other problems as well and might eventually speed up computations due to stronger problem formulations.
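For illustration, the sketch below solves the standard LP relaxation of UFP on a toy path instance with scipy; the instance is assumed, and the paper's added constraints and extended formulation, which are what bound the integrality gap, are not reproduced here.

```python
# Sketch of the standard LP relaxation of UFP on a path (toy instance assumed).
from scipy.optimize import linprog

cap = [2.0, 2.0, 2.0]                       # capacities of edges 0..2 of the path
# tasks: (demand, profit, first_edge, last_edge), using edges [first, last]
tasks = [(2.0, 3.0, 0, 1), (1.0, 2.0, 1, 2), (1.0, 1.5, 0, 2)]

c = [-p for (_, p, _, _) in tasks]          # maximize profit = minimize -profit
A, b = [], []
for e in range(len(cap)):                   # one capacity constraint per edge
    A.append([d if (f <= e <= l) else 0.0 for (d, _, f, l) in tasks])
    b.append(cap[e])

res = linprog(c, A_ub=A, b_ub=b, bounds=[(0, 1)] * len(tasks))
print(res.x, -res.fun)                      # fractional selection and its profit
```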
S. Bhattacharya, E. Koutsoupias, J. Kulkarni, S. Leonardi, T. Roughgarden, and X. Xu
Near-Optimal Multi-Unit Auctions with Ordered Bidders
Proc. 14th ACM Conference on Electronic Commerce (EC 2013), 2013
We construct prior-free auctions with constant-factor approximation guarantees with ordered bidders, in both unlimited and limited supply settings. We compare the expected revenue of our auctions on a bid vector to the monotone price benchmark, the maximum revenue that can be obtained from a bid vector using supply-respecting prices that are nonincreasing in the bidder ordering and bounded above by the second-highest bid. As a consequence, our auctions are simultaneously near-optimal in a wide range of Bayesian multi-unit environments.
M. Pilipczuk, M. Pilipczuk, P. Sankowski, and E. J. van Leeuwen
Subexponential-Time Parameterized Algorithm for Steiner Tree on Planar Graphs
Proc. 30th International Symposium on Theoretical Aspects of Computer Science (STACS 2013), 2013
The well-known bidimensionality theory provides a method for designing fast, subexponential-time parameterized algorithms for a vast number of NP-hard problems on sparse graph classes such as planar graphs, bounded genus graphs, or, more generally, graphs with a fixed excluded minor. However, in order to apply the bidimensionality framework the considered problem needs to fulfill a special density property. Some well-known problems do not have this property, unfortunately, with probably the most prominent and important example being the Steiner Tree problem. Hence the question whether a subexponential-time parameterized algorithm for Steiner Tree on planar graphs exists has remained open. In this paper, we answer this question positively and develop an algorithm running in O(2^(O((k log k)^(2/3))) · n) time and polynomial space, where k is the size of the Steiner tree and n is the number of vertices of the graph. Our algorithm does not rely on tools from bidimensionality theory or graph minors theory, apart from Baker’s classical approach. Instead, we introduce new tools and concepts to the study of the parameterized complexity of problems on sparse graphs.
M. Cygan, and Ł. Jeż
Online Knapsack Revisited
Proc. 11th Workshop on Approximation and Online Algorithms (WAOA 2013), 2013
We investigate the online variant of the Multiple Knapsack problem: an algorithm is to pack items, of arbitrary sizes and profits, in k knapsacks (bins) without exceeding the capacity of any bin. We study two objective functions: the sum and the maximum of profits over all bins. Both have been studied before in restricted variants of our problem: the sum in Dual Bin Packing [1], and the maximum in Removable Knapsack [7,8]. Following these, we study two variants, depending on whether the algorithm is allowed to remove (forever) items from its bins or not, and two special cases where the profit of an item is a function of its size, in addition to the general setting.
We study both deterministic and randomized algorithms; for the latter, we consider both the oblivious and the adaptive adversary model. We classify each variant as either admitting O(1)-competitive algorithms or not. We develop simple O(1)-competitive algorithms for some cases of the max-objective variant that were previously believed to admit none, because only 1-bin deterministic algorithms had been considered for them before.
M. Bieńkowski, J. Byrka, M. Chrobak, Ł. Jeż, J. Sgall, and G. Stachowiak
Online Control Message Aggregation in Chain Networks
Proc. 13th Algorithms and Data Structures Symposium (WADS 2013), 2013
In the Control Message Aggregation (CMA) problem, control packets are generated over time at the nodes of a tree T and need to be transmitted to the root of T. To optimize the overall cost, these transmissions can be delayed and different packets can be aggregated, that is, a single transmission can include all packets from a subtree rooted at the root of T. The cost of this transmission is then equal to the total edge length of this subtree, independently of the number of packets that are sent. A sequence of transmissions that transmits all packets is called a schedule. The objective is to compute a schedule with minimum cost, where the cost of a schedule is the sum of all the transmission costs and delay costs of all packets. The problem is known to be NP-hard, even for trees of depth 2. In the online scenario, it is an open problem whether a constant-competitive algorithm exists.
We address the special case of the problem when T is a chain network. For the online case, we present a 5-competitive algorithm and give a lower bound of 2+φ≈3.618, improving the previous bounds of 8 and 2, respectively. Furthermore, for the offline version, we give a polynomial-time algorithm that computes the optimum solution.
C. Dürr, Ł. Jeż, and O. C. Vásquez
Mechanism Design for Aggregating Energy Consumption and Quality of Service in Speed Scaling Scheduling
Proc. 9th Conference on Web and Internet Economics (WINE 2013), 2013
We consider a strategic game, where players submit jobs to a machine that executes all jobs in a way that minimizes energy while respecting the jobs' deadlines. The energy consumption is then charged to the players in some way. Each player wants to minimize the sum of that charge and of their job’s deadline multiplied by a priority weight. Two charging schemes are studied: the proportional cost share, which does not always admit pure Nash equilibria, and the marginal cost share, which always does, at the price of overcharging by a constant factor.
A. Anagnostopoulos, A. Dasgupta, and R. Kumar
A Constant-Factor Approximation Algorithm for Co-Clustering [pdf]
Theory of Computing, Volume 8, Number 26, 2012
Co-clustering is the simultaneous partitioning of the rows and columns of a matrix such that the blocks induced by the row/column partitions are good clusters. Motivated by several applications in text mining, market-basket analysis, and bioinformatics, this problem has attracted a lot of attention in the past few years. Unfortunately, to date, most of the algorithmic work on this problem has been heuristic in nature.
In this work we obtain the first approximation algorithms for the co-clustering problem. Our algorithms are simple and provide constant-factor approximations to the optimum. We also show that co-clustering is NP-hard, thereby complementing our algorithmic result.
L. Becchetti, L. Filipponi, and A. Vitaletti
Privacy Support in People-Centric Sensing [pdf]
Journal of Communications, Volume 7, Number 8, 2012
In this paper we present an approach to support privacy in people-centric sensing. In particular, we propose a technique that allows a central authority to select a subset of users whose past positions provide a good coverage of a given area of interest, without explicitly georeferencing them. To achieve this goal, we propose an efficient algorithm to solve the well-known, NP-complete Set Cover problem that does not require explicit knowledge of the sets, but only uses their compact, privacy-preserving representations or sketches. We perform a thorough experimental analysis to evaluate the performance of the proposed technique and its sensitivity to a few key parameters using public data from real applications. Experimental results support the effectiveness of the proposed approach to efficiently produce accurate environmental or social maps, at the same time preserving users' privacy.
L. Becchetti, U. Colesanti, A. Marchetti-Spaccamela, and A. Vitaletti
Fully Decentralized Recommendations in Pervasive Systems: Models and Experimental Analysis
International Journal of Engineering Intelligent Systems for Electrical Engineering and Communications, Volume 20, Number 3, 2012
In this paper we investigate the use of fully decentralized recommendation techniques for pervasive systems of small devices with limited or no communication and computational capabilities. We consider a reference scenario where items are distributed in a shop (e.g. a bookstore or a supermarket) and are tagged with computational devices of limited (or no) communication and computational capabilities (such as RFIDs). We do not assume any cooperation among users and, more importantly, we assume that every node has only a partial view of the past history of the system, determined by users' visit patterns; information exchange among nodes is mediated by the collective and unpredictable navigation of users. These limitations are consistent with standard systems and current technology. We propose recommendation algorithms that rely on the computation and maintenance of simple statistics about users' visit patterns, do not require any explicit interaction among users, and can be easily and efficiently implemented. We report on extensive simulations that show the effectiveness of our profiling heuristics. Finally, we test our recommendation algorithms on real, publicly available datasets.
A. Kesselman, and S. Leonardi
Game-Theoretic Analysis of Internet Switching with Selfish Users
Theoretical Computer Science, Volume 452, 2012
We consider the problem of Internet switching, where traffic is generated by selfish users. We study a packetized (TCP-like) traffic model, which is more accurate than the widely used fluid model. We assume that routers have First-In-First-Out (FIFO) buffers of bounded capacity managed by the drop-tail policy. The utility of each user depends on its goodput and the congestion level. Since selfish users try to maximize their own utility disregarding the system objectives, we study Nash equilibria that correspond to a steady state of the system. We quantify the degradation in the network performance called the price of anarchy resulting from such selfish behavior. We show that for a single bottleneck buffer, the price of anarchy is proportional to the number of users. Then we propose a simple modification of the Random Early Detection (RED) drop policy, which reduces the price of anarchy to a constant. We demonstrate that a Nash equilibrium can be reached if all users deploy TCP Vegas as their transport protocol under the drop-tail policy. We also consider some natural extensions of our model including the case of multiple Quality of Service (QoS) requirements, routing on parallel links and general networks with multiple bottlenecks.
A. Anagnostopoulos, L. Becchetti, C. Castillo, A. Gionis, and S. Leonardi
Online Team Formation in Social Networks [pdf]
Proc. 21st International World Wide Web Conference (WWW 2012), 2012
We study the problem of online team formation. We consider a setting in which people possess different skills and compatibility among potential team members is modeled by a social network. A sequence of tasks arrives in an online fashion, and each task requires a specific set of skills. The goal is to form a new team upon arrival of each task, so that (i) each team possesses all skills required by the task, (ii) each team has small communication overhead, and (iii) the workload of performing the tasks is balanced among people in the fairest possible way.
We propose efficient algorithms that address all these requirements: our algorithms form teams that always satisfy the required skills, provide approximation guarantees with respect to team communication overhead, and they are online-competitive with respect to load balancing. Experiments performed on collaboration networks among film actors and scientists confirm that our algorithms are successful at balancing these conflicting requirements.
This is the first paper that simultaneously addresses all these aspects. Previous work has either focused on minimizing coordination for a single task or balancing the workload neglecting coordination costs.
A. Anagnostopoulos, R. Kumar, M. Mahdian, E. Upfal, and F. Vandin
Algorithms on Evolving Graphs [pdf]
Proc. 3rd ACM International Conference on Innovations on Theoretical Computer Science (ITCS 2012), 2012
Motivated by applications that concern graphs that are evolving and massive in nature, we define a new general framework for computing with such graphs. In our framework, the graph changes over time and an algorithm can only track these changes by explicitly probing the graph. This framework captures the inherent tradeoff between the complexity of maintaining an up-to-date view of the graph and the quality of results computed with the available view. We apply this framework to two classical graph connectivity problems, namely, path connectivity and minimum spanning trees, and obtain efficient algorithms.
L. Becchetti, L. Bergamini, F. Ficarola, and A. Vitaletti
Population Protocols on Real Social Networks [pdf]
Proc. 5th Workshop on Social Network Systems (SNS 2012), 2012
In this paper we present an experimental analysis to assess the performance of Population Protocols, a well-known fully decentralized computational model, on a real, evolving social network.
S. Leonardi, and T. Roughgarden
Prior-Free Auctions with Ordered Bidders
Proc. 44th ACM Symposium on Theory of Computing (STOC 2012), 2012
Prior-free auctions are robust auctions that assume no distribution over bidders' valuations and provide worst-case (input-by-input) approximation guarantees. In contrast to previous work on this topic, we pursue good prior-free auctions with non-identical bidders.
Prior-free auctions can approximate meaningful benchmarks for non-identical bidders only when "sufficient qualitative information" about the bidder asymmetry is publicly known. We consider digital goods auctions where there is a total ordering of the bidders that is known to the seller, where earlier bidders are in some sense thought to have higher valuations. We use the framework of Hartline and Roughgarden (STOC '08) to define an appropriate revenue benchmark: the maximum revenue that can be obtained from a bid vector using prices that are nonincreasing in the bidder ordering and bounded above by the second-highest bid. This monotone-price benchmark is always as large as the well-known fixed-price benchmark F(2), so designing prior-free auctions with good approximation guarantees is only harder. By design, an auction that approximates the monotone-price benchmark satisfies a very strong guarantee: it is, in particular, simultaneously near-optimal for essentially every Bayesian environment in which bidders' valuation distributions have nonincreasing monopoly prices, or in which the distribution of each bidder stochastically dominates that of the next. Of course, even if there is no distribution over bidders' valuations, such an auction still provides a quantifiable input-by-input performance guarantee.
In this paper, we design a simple prior-free auction for digital goods with ordered bidders, the Random Price Restriction (RPR) auction. We prove that its expected revenue on every bid profile b is Ω(M(b)/log*n), where M denotes the monotone-price benchmark and log*n denotes the number of times that the log2 operator can be applied to n before the result drops below a fixed constant.
M. Feldman, A. Fiat, S. Leonardi, and P. Sankowski
Revenue Maximizing Envy-Free Multi-Unit Auctions with Budgets
Proc. 13th ACM Conference on Electronic Commerce (EC 2012), 2012
We study envy-free (EF) mechanisms for multi-unit auctions with budgeted agents that approximately maximize revenue. In an EF auction, prices are set so that every bidder receives a bundle that maximizes her utility amongst all bundles. We show that the problem of computing a revenue-maximizing EF auction is NP-hard, even for the case of identical items and additive valuations (up to the budget). The main result of our paper is a novel EF auction that runs in polynomial time and provides a 1/2-approximation with respect to the revenue-maximizing EF auction. A slight variant of our mechanism produces an allocation and pricing that is more restrictive (so-called item pricing) and gives a 1/2-approximation to the optimal revenue within this more restrictive class.
R. Colini-Baldeschi, M. Henzinger, S. Leonardi, and M. Starnberger
On Multiple Keyword Sponsored Search Auctions with Budgets
Proc. 39th International Colloquium on Automata, Languages and Programming (ICALP 2012), 2012
We study multiple keyword sponsored search auctions with budgets. Each keyword has multiple ad slots with a click-through rate. The bidders have additive valuations, which are linear in the click-through rates, and budgets, which are restricting their overall payments. Additionally, the number of slots per keyword assigned to a bidder is bounded.
We show the following results: (1) We give the first mechanism for multiple keywords, where click-through rates differ among slots. Our mechanism is incentive compatible in expectation, individually rational in expectation, and Pareto optimal. (2) We study the combinatorial setting, where each bidder is only interested in a subset of the keywords. We give an incentive compatible, individually rational, Pareto optimal, and deterministic mechanism for identical click-through rates. (3) We give an impossibility result for incentive compatible, individually rational, Pareto optimal, and deterministic mechanisms for bidders with diminishing marginal valuations.
I. Mele, F. Bonchi, and A. Gionis
The Early-Adopter Graph and Its Application to Web-Page Recommendation [pdf]
Proc. 21st ACM International Conference on Information and Knowledge Management (CIKM 2012), 2012
In this paper we present a novel graph-based data abstraction for modeling the browsing behavior of web users. The objective is to identify users who discover interesting pages before others. We call these users early adopters. By tracking the browsing activity of early adopters we can identify new interesting pages early, and recommend these pages to similar users. We focus on news and blog pages, which are more dynamic in nature and more appropriate for recommendation.
Our proposed model is called the early-adopter graph. In this graph, nodes represent users and a directed arc from user u to user v expresses the fact that u and v visit similar pages and, in particular, that user u tends to visit those pages before user v. The weight of the arc is the degree to which the temporal rule "u visits a page before v" holds.
Based on the early-adopter graph, we build a recommendation system for news and blog pages, which outperforms other off-the-shelf recommendation systems based on collaborative filtering.
A. Anagnostopoulos, A. Z. Broder, E. Gabrilovich, V. Josifovski, and L. Riedel
Web Page Summarization for Just-in-Time Contextual Advertising [pdf]
ACM Transactions on Intelligent Systems and Technology, Volume 3, Number 1, 2011
Contextual advertising is a type of Web advertising, which, given the URL of a Web page, aims to embed into the page the most relevant textual ads available. For static pages that are displayed repeatedly, the matching of ads can be based on prior analysis of their entire content; however, often ads need to be matched to new or dynamically created pages that cannot be processed ahead of time. Analyzing the entire content of such pages on-the-fly entails prohibitive communication and latency costs. To solve the three-horned dilemma of either low relevance or high latency or high load, we propose to use text summarization techniques paired with external knowledge (exogenous to the page) to craft short page summaries in real time.
Empirical evaluation proves that matching ads on the basis of such summaries does not sacrifice relevance, and is competitive with matching based on the entire page content. Specifically, we found that analyzing a carefully selected 6% fraction of the page text can sacrifice only 1%-3% in ad relevance. Furthermore, our summaries are fully compatible with the standard JavaScript mechanisms used for ad placement: they can be produced at ad-display time by simple additions to the usual script, and they only add 500-600 bytes to the usual request. We also compared our summarization approach, which is based on structural properties of the HTML content of the page, with a more principled one based on one of the standard text summarization tools (MEAD), and found their performance to be comparable.
A. Anagnostopoulos, R. Kumar, M. Mahdian, and E. Upfal
Sorting and Selection on Dynamic Data [pdf]
Theoretical Computer Science, Volume 412, Number 24, 2011
We formulate and study a new computational model for dynamic data. In this model, the data changes gradually and the goal of an algorithm is to compute the solution to some problem on the data at each time step, under the constraint that it only has limited access to the data each time. As the data is constantly changing and the algorithm might be unaware of these changes, it cannot be expected to always output the exact right solution; we are interested in algorithms that guarantee to output an approximate solution. In particular, we focus on the fundamental problems of sorting and selection, where the true ordering of the elements changes slowly. We provide algorithms with performance close to the optimal in expectation and with high probability.
L. Becchetti, U. Colesanti, A. Marchetti-Spaccamela, and A. Vitaletti
Recommending Items in Pervasive Scenarios: Models and Experimental Analysis [pdf]
Knowledge and Information Systems, Volume 28, Number 3, 2011
In this paper, we propose and investigate the effectiveness of fully decentralized, collaborative filtering techniques. These are particularly interesting for use in pervasive systems of small devices with limited communication and computational capabilities. In particular, we assume that items are tagged with smart tags (such as passive RFIDs), storing aggregate information about the visiting patterns of users that interacted with them in the past. Users access and modify information stored in smart tags transparently, by smart reader devices that are already available on commercial mobile phones. Smart readers use private information about previous behavior of the user and aggregate information retrieved from smart tags to recommend new items that are more likely to meet user expectations. Note that we do not assume any transmission capabilities between smart tags: Information exchange among them is mediated by users' collective and unpredictable navigation patterns. Our algorithms do not require any explicit interaction among users and can be easily and efficiently implemented. We analyze their theoretical behavior and assess their performance in practice, by simulation on both synthetic and real, publicly available data sets. We also compare the performance of our fully decentralized solutions with that of state-of-the-art centralized strategies.
L. Becchetti, I. Bordino, S. Leonardi, and A. Rosen
Fully Decentralized Computation of Aggregates over Data Streams [pdf]
ACM SIGKDD Explorations Newsletter, Volume 12, Number 2, 2011
In several emerging applications, data is collected in massive streams at several distributed points of observation. A basic and challenging task is to allow every node to monitor a neighbourhood of interest by issuing continuous aggregate queries on the streams observed in its vicinity. This class of algorithms is fully decentralized and diffusive in nature: collecting all data at few central nodes of the network is unfeasible in networks of low capability devices or in the presence of massive data sets.
The main difficulty in designing diffusive algorithms is to cope with duplicate detections. These arise both from the observation of the same event at several nodes of the network and from the receipt of the same aggregated information along multiple paths of diffusion. In this paper, we consider fully decentralized algorithms that answer locally continuous aggregate queries on the number of distinct events, the total number of events, and the second frequency moment in the scenario outlined above. The proposed algorithms use sublinear space at every node, in the worst case or under realistic distributions. We also propose strategies that minimize the communication needed to update the aggregates when new events are observed. We experimentally evaluate the efficiency and accuracy of our algorithms on realistic simulated scenarios.
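One generic way to make diffusive aggregation insensitive to duplicates is to exchange idempotent, order-invariant summaries, for example Flajolet-Martin style bitmaps merged by bitwise OR; the sketch below illustrates this general idea only and is not necessarily the exact summary used in the paper.

```python
# Generic duplicate-insensitive distinct-count sketch (Flajolet-Martin style),
# shown to illustrate why diffusive aggregation can tolerate receiving the
# same information along multiple paths; not the paper's exact algorithm.
import hashlib

K = 32                                       # sketch width in bits

def h(event):
    return int(hashlib.sha1(str(event).encode()).hexdigest(), 16)

def rho(x):                                  # position of least-significant 1-bit
    r = 0
    while x & 1 == 0 and r < K - 1:
        x >>= 1; r += 1
    return r

def add(sketch, event):
    return sketch | (1 << rho(h(event)))     # idempotent: duplicates change nothing

def merge(s1, s2):
    return s1 | s2                           # order- and duplicate-insensitive

def estimate(sketch):                        # rough FM estimate of distinct count
    r = 0
    while sketch & (1 << r):
        r += 1
    return 2 ** r / 0.77351

s_a = s_b = 0
for e in range(1, 200):   s_a = add(s_a, e)  # node A observes events 1..199
for e in range(150, 400): s_b = add(s_b, e)  # node B observes events 150..399
# crude estimate of the ~399 distinct events; a single bitmap is noisy,
# real systems average many sketches, but duplicates are ignored by design
print(estimate(merge(s_a, s_b)))
```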
L. Becchetti, I. Chatzigiannakis, and Y. Giannakopoulos
Streaming Techniques and Data Aggregation in Networks of Tiny Artefacts [pdf]
Computer Science Review, Volume 5, Number 1, 2011
In emerging pervasive scenarios, data is collected by sensing devices in streams that occur at several distributed points of observation. The size of the data typically far exceeds the storage and computational capabilities of the tiny devices that have to collect and process them. A general and challenging task is to allow (some of) the nodes of a pervasive network to collectively perform monitoring of a neighbourhood of interest by issuing continuous aggregate queries on the streams observed in its vicinity. This class of algorithms is fully decentralized and diffusive in nature: collecting all the data at a few central nodes of the network is unfeasible in networks of low capability devices or in the presence of massive data sets. Two main problems arise in this scenario: (i) the intrinsic complexity of maintaining statistics over a data stream whose size greatly exceeds the capabilities of the device that performs the computation; (ii) composing the partial outcomes computed at different points of observation into an accurate, global statistic over a neighbourhood of interest, which entails coping with several problems, last but not least the receipt of duplicate information along multiple paths of diffusion.
Streaming techniques have emerged as powerful tools to achieve the general goals described above, in the first place because they assume a computational model in which computational and storage resources are assumed to be far exceeded by the amount of data on which computation occurs. In this contribution, we review the main streaming techniques and provide a classification of the computational problems and the applications they effectively address, with an emphasis on decentralized scenarios, which are of particular interest in pervasive networks.
A. Anagnostopoulos, G. Brova, and E. Terzi
Peer and Authority Pressure in Information-Propagation Models [pdf]
Proc. 22nd European Conference on Machine Learning and 15th European Conference on Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD 2011), 2011
Existing models of information diffusion assume that peer influence is the main reason for the observed propagation patterns. In this paper, we examine the role of authority pressure on the observed information cascades. We model this intuition by characterizing some nodes in the network as "authority" nodes. These are nodes that can influence a large number of peers, while they themselves cannot be influenced by peers. We propose a model that associates with every item two parameters that quantify the impact of the peer and the authority pressure on the item's propagation. Given a network and the observed diffusion patterns of the item, we learn these parameters from the data and characterize the item as peer- or authority-propagated. We also develop a randomization test that evaluates the statistical significance of our findings and makes our item characterization robust to noise. Our experiments with real data from online media and scientific-collaboration networks indicate that there is a strong signal of authority pressure in these networks.
A. Anagnostopoulos, L. Becchetti, I. Mele, S. Leonardi, and P. Sankowski
Stochastic Query Covering [pdf]
Proc. 4th ACM International Conference on Web Search and Data Mining (WSDM 2011), 2011
In this paper we introduce the problem of query covering as a means to efficiently cache query results. The general idea is to populate the cache with documents that contribute to the result pages of a large number of queries, as opposed to caching the top documents for each query. It turns out that the problem is hard and solving it requires knowledge of the structure of the queries and the results space, as well as knowledge of the input query distribution. We formulate the problem under the framework of stochastic optimization; theoretically it can be seen as a stochastic universal version of set multicover. While the problem is NP-hard to solve exactly, we show that for any distribution it can be approximated using a simple greedy approach. Our theoretical findings are complemented by experimental activity on real datasets, showing the feasibility and potential interest of query-covering approaches in practice.
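As a hedged illustration of a greedy approach of this flavour (the toy query distribution is assumed, and this is not claimed to be the paper's exact algorithm or objective), the following sketch fills a cache by repeatedly adding the document with the largest marginal expected query coverage.

```python
# Hedged sketch of a greedy cache-filling rule: pick documents with the largest
# marginal expected query coverage (toy data; guarantees not reproduced here).

# query -> (probability, set of documents that answer it)  (assumed instance)
queries = {
    "q1": (0.5, {"d1", "d2"}),
    "q2": (0.3, {"d2", "d3"}),
    "q3": (0.2, {"d4"}),
}
budget = 2                                   # cache size in documents

def expected_coverage(cache):
    return sum(p for (p, docs) in queries.values() if docs & cache)

cache = set()
candidates = set().union(*(docs for (_, docs) in queries.values()))
while len(cache) < budget:
    best = max(candidates - cache,
               key=lambda d: expected_coverage(cache | {d}))
    cache.add(best)
print(cache, expected_coverage(cache))       # e.g. {'d2', 'd4'} covering 1.0
```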
A. Fiat, S. Leonardi, J. Saia, and P. Sankowski
Single Valued Combinatorial Auctions with Budgets
Proc. 12th ACM Conference on Electronic Commerce (EC 2011), 2011
We consider budget-constrained combinatorial auctions where each bidder has a private value for each of the items in some subset of the items and an overall budget constraint. Such auctions capture adword auctions, where advertisers offer a bid for those adwords that (hopefully) target their intended audience, and advertisers also have budgets. It is known that even if all items are identical and all budgets are public it is not possible to be truthful and efficient. Our main result is a novel auction that runs in polynomial time, is incentive compatible, and ensures Pareto-optimality. The auction is incentive compatible with respect to the private valuations, whereas the budgets and the sets of interest are assumed to be public knowledge. This extends the result of Dobzinski, Lavi and Nisan (FOCS 2008) for auctions of multiple identical items with budgets to single-valued combinatorial auctions, and addresses one of the basic challenges in auctioning web ads (see Nisan et al., 2009, on Google's auctions for TV ads).
L. Becchetti, P. Boldi, C. Castillo, and A. Gionis
Efficient Algorithms for Large-Scale Local Triangle Counting [pdf]
ACM Transactions on Knowledge Discovery From Data, Volume 4, Number 3, 2010
In this article, we study the problem of approximate local triangle counting in large graphs. Namely, given a large graph G=(V,E) we want to estimate as accurately as possible the number of triangles incident to every node v∈V in the graph. We consider the question both for undirected and directed graphs. The problem of computing the global number of triangles in a graph has been considered before, but to our knowledge this is the first contribution that addresses the problem of approximate local triangle counting with a focus on the efficiency issues arising in massive graphs and that also considers the directed case. The distribution of the local number of triangles and the related local clustering coefficient can be used in many interesting applications. For example, we show that the measures we compute can help detect the presence of spamming activity in large-scale Web graphs, as well as to provide useful features for content quality assessment in social networks.
For computing the local number of triangles (undirected and directed), we propose two approximation algorithms, which are based on the idea of min-wise independent permutations [Broder et al. 1998]. Our algorithms operate in a semi-streaming fashion, using O(|V|) space in main memory and performing O(log|V|) sequential scans over the edges of the graph. The first algorithm we describe in this article also uses O(|E|) space of external memory during computation, while the second algorithm uses only main memory. We present the theoretical analysis as well as experimental results on large graphs, demonstrating the practical efficiency of our approach.
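A toy, in-memory sketch of the min-wise hashing idea behind this approach: the number of triangles through an edge (u,v) equals |N(u)∩N(v)|, and intersection sizes can be estimated from min-hash signatures of the neighbour sets. The paper's algorithms additionally work in a semi-streaming fashion over the edge list, which this sketch does not reproduce; the hash family and parameters below are illustrative assumptions.

```python
# Hedged sketch: approximate local triangle counts via min-hash signatures
# of neighbour sets (toy, in-memory version; not the semi-streaming algorithm).
import random

def signatures(neighbors, k=100, seed=0):
    rng = random.Random(seed)
    hashes = [(rng.randrange(1, 1 << 31), rng.randrange(0, 1 << 31)) for _ in range(k)]
    P = (1 << 31) - 1                                  # Mersenne prime modulus
    return {v: [min(((a * u + b) % P) for u in N) for (a, b) in hashes]
            for v, N in neighbors.items()}

def est_intersection(sig_u, sig_v, size_u, size_v):
    j = sum(x == y for x, y in zip(sig_u, sig_v)) / len(sig_u)   # Jaccard estimate
    return j * (size_u + size_v) / (1 + j)                       # |A∩B| from Jaccard

def local_triangles(neighbors):
    sig = signatures(neighbors)
    out = {}
    for v, N in neighbors.items():
        t = sum(est_intersection(sig[v], sig[u], len(N), len(neighbors[u])) for u in N)
        out[v] = t / 2                                  # each triangle counted twice at v
    return out

# toy graph: a 4-clique, so every node is incident to 3 triangles
nbrs = {1: {2, 3, 4}, 2: {1, 3, 4}, 3: {1, 2, 4}, 4: {1, 2, 3}}
print(local_triangles(nbrs))
```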
A. Anagnostopoulos, L. Becchetti, C. Castillo, A. Gionis, and S. Leonardi
Power in Unity: Forming Teams in Large-Scale Community Systems [pdf]
Proc. 19th ACM International Conference on Information and Knowledge Management (CIKM 2010), 2010
The Internet has enabled the collaboration of groups at a scale that was unseen before. A key problem for large collaboration groups is to be able to allocate tasks effectively. An effective task-assignment method should consider both how fit teams are for each job and how fair the assignment is to team members, in the sense that no one should be overloaded or unfairly singled out. The assignment has to be done automatically or semi-automatically, given that it is difficult and time-consuming to keep track of the skills and the workload of each person. Obviously, the method to do this assignment must also be computationally efficient.
In this paper we present a general framework for task-assignment problems. We provide a formal treatment on how to represent teams and tasks. We propose alternative functions for measuring the fitness of a team performing a task and we discuss desirable properties of those functions. Then we focus on one class of task-assignment problems, we characterize the complexity of the problem, and we provide algorithms with provable approximation guarantees, as well as lower bounds. We also present experimental results that show that our methods are useful in practice in several application scenarios.
A. Anagnostopoulos, F. Grandoni, S. Leonardi, and P. Sankowski
Online Network Design with Outliers [pdf]
Proc. 37th International Colloquium on Automata, Languages and Programming (ICALP 2010), 2010
In a classical online network design problem, traffic requirements are gradually revealed to an algorithm. Each time a new request arrives, the algorithm has to satisfy it by augmenting the network under construction in a proper way (with no possibility of recovery). In this paper we study a natural generalization of the problems above, where a fraction of the requests (the outliers) can be disregarded. Now, each time a request arrives, the algorithm first decides whether to satisfy it or not, and only in the first case it acts accordingly; in the end at least k out of t requests must be selected. We cast three classical network design problems into this framework, the Online Steiner Tree with Outliers, the Online TSP with Outliers, and the Online Facility Location with Outliers.
We focus on the known distribution model, where terminals are independently sampled from a given distribution. For all the above problems, we present bicriteria online algorithms that, for any constant ε>0, select at least (1-ε)k terminals with high probability and pay in expectation O(log²n) times more than the expected cost of the optimal offline solution (selecting k terminals). These upper bounds are complemented by inapproximability results.
A. Anagnostopoulos, C. Dombry, N. Guillotin-Plantard, I. Kontoyiannis, and E. Upfal
Stochastic Analysis of the k-Server Problem on the Circle [pdf]
Proc. 21st International Meeting on Probabilistic, Combinatorial and Asymptotic Methods for the Analysis of Algorithms (AofA 2010), 2010
We consider a stochastic version of the k-server problem in which k servers move on a circle to satisfy stochastically generated requests. The requests are independent and identically distributed according to an arbitrary distribution on a circle, which is either discrete or continuous. The cost of serving a request is the distance that a server needs to move to reach the request. The goal is to minimize the steady-state expected cost induced by the requests. We study the performance of a greedy strategy, focusing, in particular, on its convergence properties and the interplay between the discrete and continuous versions of the process.
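A short simulation of the greedy strategy on the circle; the uniform request distribution and the toy setup below are assumptions made for illustration, whereas the paper allows arbitrary request distributions and studies the process analytically.

```python
# Simple simulation of the greedy strategy: each request on the circle is
# served by the nearest server (toy setup, uniform requests assumed).
import random

def circle_dist(a, b):
    d = abs(a - b) % 1.0
    return min(d, 1.0 - d)

def greedy_cost(k=3, steps=100000, seed=1):
    rng = random.Random(seed)
    servers = [i / k for i in range(k)]              # initial positions on [0,1)
    total = 0.0
    for _ in range(steps):
        r = rng.random()                             # uniform request on the circle
        i = min(range(k), key=lambda j: circle_dist(servers[j], r))
        total += circle_dist(servers[i], r)
        servers[i] = r                               # greedy: move the nearest server
    return total / steps                             # empirical steady-state cost

print(greedy_cost())
```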
A. Anagnostopoulos, L. Becchetti, C. Castillo, and A. Gionis
An Optimization Framework for Query Recommendation [pdf]
Proc. 3rd ACM International Conference on Web Search and Data Mining (WSDM 2010), 2010
Query recommendation is an integral part of modern search engines. The goal of query recommendation is to facilitate users while searching for information. Query recommendation also allows users to explore concepts related to their information needs.
In this paper, we present a formal treatment of the problem of query recommendation. In our framework we model the querying behavior of users by a probabilistic reformulation graph, or query-flow graph [Boldi et al. CIKM 2008]. A sequence of queries submitted by a user can be seen as a path on this graph. Assigning score values to queries allows us to define suitable utility functions and to consider the expected utility achieved by a reformulation path on the query-flow graph. Providing recommendations can be seen as adding shortcuts in the query-flow graph that "nudge" the reformulation paths of users, in such a way that users are more likely to follow paths with larger expected utility.
We discuss in detail the most important questions that arise in the proposed framework. In particular, we provide examples of meaningful utility functions to optimize, we discuss how to estimate the effect of recommendations on the reformulation probabilities, we address the complexity of the optimization problems that we consider, we suggest efficient algorithmic solutions, and we validate our models and algorithms with extensive experimentation. Our techniques can be applied to other scenarios where user behavior can be modeled as a Markov process.
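As a toy illustration of the expected-utility computation on a query-flow graph (the graph, the utilities, and the terminal "stop" state below are assumed), the following sketch runs value iteration; in this view a recommendation acts as a shortcut that changes transition probabilities and hence these values.

```python
# Hedged sketch: expected utility accumulated along a reformulation path on a
# toy query-flow graph, computed by value iteration (illustrative data only).

# transition probabilities, including a terminal 'stop' state (assumed)
P = {
    "q1": {"q2": 0.6, "stop": 0.4},
    "q2": {"q3": 0.5, "stop": 0.5},
    "q3": {"stop": 1.0},
}
u = {"q1": 0.2, "q2": 0.5, "q3": 1.0}        # per-query utility scores (assumed)

def expected_utility(P, u, iters=100):
    V = {q: 0.0 for q in P}
    V["stop"] = 0.0
    for _ in range(iters):
        V = {**{"stop": 0.0},
             **{q: u[q] + sum(p * V[t] for t, p in P[q].items()) for q in P}}
    return V

V = expected_utility(P, u)
print(V["q1"])    # utility a user starting at q1 is expected to accumulate
# A recommendation is a "shortcut": raising P["q1"]["q3"] would raise V["q1"].
```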
E. Baglioni, L. Becchetti, L. Bergamini, U. Colesanti, L. Filipponi, A. Vitaletti, and G. Persiano
A Lightweight Privacy Preserving SMS-Based Recommendation System for Mobile Users [pdf]
Proc. 4th ACM International Conference on Recommender Systems (RecSys 2010), 2010
In this paper we propose a fully decentralized approach for recommending new contacts in the social network of mobile phone users. With respect to existing solutions, our approach is characterized by some distinguishing features. In particular, the application we propose does not assume any centralized coordination: it transparently collects and processes user information that is accessible in any mobile phone, such as the log of calls, the list of contacts or the inbox/outbox of short messages and exchanges it with other users. This information is used to recommend new friendships to other users. Furthermore, the information needed to perform recommendation is collected and exchanged between users in a privacy preserving way. Finally, information necessary to implement the application is exchanged transparently and opportunistically, by using the residual space in standard short messages occasionally exchanged between users. As a consequence, we do not ask users to change their habits in using SMS.
F. Grandoni, P. Krysta, S. Leonardi, and C. Ventre
Utilitarian Mechanism Design for Multi-Objective Optimization
Proc. 21st Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2010), 2010
In a classic optimization problem the complete input data is known to the algorithm. This assumption may not be true anymore in optimization problems motivated by the Internet where part of the input data is private knowledge of independent selfish agents. The goal of algorithmic mechanism design is to provide (in polynomial time) a solution to the optimization problem and a set of incentives for the agents such that disclosing the input data is a dominant strategy for the agents. In case of NP-hard problems, the solution computed should also be a good approximation of the optimum.
In this paper we focus on mechanism design for multi-objective optimization problems, where we are given the main objective function, and a set of secondary objectives which are modeled via budget constraints. Multi-objective optimization is a natural setting for mechanism design as many economical choices ask for a compromise between different, partially conflicting, goals. Our main contribution is showing that two of the main tools for the design of approximation algorithms for multi-objective optimization problems, namely approximate Pareto curves and Lagrangian relaxation, can lead to truthful approximation schemes.
By exploiting the method of approximate Pareto curves, we devise truthful FPTASs for multi-objective optimization problems whose exact version admits a pseudo-polynomial-time algorithm, as for instance the multi-budgeted versions of minimum spanning tree, shortest path, maximum (perfect) matching, and matroid intersection. Our technique applies also to multi-dimensional knapsack and multi-unit combinatorial auctions. Our FPTASs compute a (1+ε)-approximate solution violating each budget constraint by a factor (1+ε). For a relevant sub-class of the mentioned problems we also present a PTAS (not violating any constraint), which combines the approach above with a novel monotone way to guess the heaviest elements in the optimum solution.
Finally we present a universally truthful Las Vegas PTAS for minimum spanning tree with a single budget constraint. This result is based on the Lagrangian relaxation method, in combination with our monotone guessing step and a random perturbation step (ensuring low expected running time in a way similar to the smoothed analysis of algorithms). All the mentioned results match the best known approximation ratios, which however are obtained by non-truthful algorithms.
M. Vlachos, A. Anagnostopoulos, O. Verscheure, and P. S. Yu
Online Pairing of VoIP Conversations [pdf]
The VLDB Journal, Volume 18, Number 1, 2009
This paper answers the following question: given a multiplicity of evolving 1-way conversations, can a machine or an algorithm discern the conversational pairs in an online fashion, without understanding the content of the communications? Our analysis indicates that this is possible, and can be achieved just by exploiting the temporal dynamics inherent in a conversation. We also show that our findings are applicable for anonymous and encrypted conversations over VoIP networks. We achieve this by exploiting the aperiodic inter-departure time of VoIP packets, hence trivializing each VoIP stream into a binary time-series, indicating the voice activity of each stream. We propose effective techniques that progressively pair conversing parties with high accuracy and in a limited amount of time. Our findings are verified empirically on a dataset consisting of 1000 conversations. We obtain very high pairing accuracy that reaches 97% after 5 minutes of voice conversations. Using a modeling approach we also demonstrate analytically that our result can be extended over an unlimited number of conversations.
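To illustrate the core intuition only (not the paper's progressive online procedure), the sketch below pairs streams by how complementary their binary voice-activity series are, on synthetic data; the scoring rule and data are assumptions.

```python
# Hedged sketch: two halves of the same conversation tend to have complementary
# (anti-correlated) voice activity, so a simple score over binary activity
# series can suggest pairings (synthetic data; not the paper's algorithm).
import itertools, random

def score(a, b):
    # fraction of time exactly one side is speaking (higher = more likely a pair)
    return sum(x != y for x, y in zip(a, b)) / len(a)

def pair_streams(streams):
    best = {}
    for i, j in itertools.combinations(range(len(streams)), 2):
        s = score(streams[i], streams[j])
        for k, other in ((i, j), (j, i)):
            if s > best.get(k, (None, -1))[1]:
                best[k] = (other, s)
    return {k: v[0] for k, v in best.items()}        # each stream's best candidate

# toy data: streams 0/1 form one conversation (alternating turns), 2/3 another
random.seed(0)
turn = [random.random() < 0.5 for _ in range(500)]
turn2 = [random.random() < 0.5 for _ in range(500)]
streams = [turn, [not x for x in turn], turn2, [not x for x in turn2]]
print(pair_streams(streams))                         # expect {0: 1, 1: 0, 2: 3, 3: 2}
```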
L. Becchetti, A. Marchetti-Spaccamela, A. Vitaletti, P. Korteweg, M. Skutella, and L. Stougie
Latency-Constrained Aggregation in Sensor Networks [pdf]
ACM Transactions on Algorithms, Volume 6, Number 1, 2009
A sensor network consists of sensing devices which may exchange data through wireless communication; sensor networks are highly energy constrained since they are usually battery operated. Data aggregation is a possible way to save energy consumption: nodes may delay data in order to aggregate them into a single packet before forwarding them towards some central node (sink). However, many applications impose constraints on the maximum delay of data; this translates into latency constraints for data arriving at the sink.
We study the problem of data aggregation to minimize maximum energy consumption under latency constraints on sensed data delivery, and we assume unique communication paths that form an intree rooted at the sink. We prove that the offline problem is strongly NP-hard and we design a 2-approximation algorithm. The latter uses a novel rounding technique.
Almost all real-life sensor networks are managed online by simple distributed algorithms in the nodes. In this context we consider both the case in which sensor nodes are synchronized and the case in which they are not. We assess the performance of the algorithm by competitive analysis. We also provide lower bounds for the models we consider, in some cases showing optimality of the algorithms we propose. Most of our results also hold when minimizing the total energy consumption of all nodes.
A. Anagnostopoulos, R. Kumar, M. Mahdian, and E. Upfal
Sort Me if You Can: How to Sort Dynamic Data [pdf]
Proc. 36th International Colloquium on Automata, Languages and Programming (ICALP 2009), 2009
We formulate and study a new computational model for dynamic data. In this model the data changes gradually and the goal of an algorithm is to compute the solution to some problem on the data at each time step, under the constraint that it only has a limited access to the data each time. As the data is constantly changing and the algorithm might be unaware of these changes, it cannot be expected to always output the exact right solution; we are interested in algorithms that guarantee to output an approximate solution. In particular, we focus on the fundamental problems of sorting and selection, where the true ordering of the elements changes slowly. We provide algorithms with performance close to the optimal in expectation and with high probability.
L. Becchetti, and E. Koutsoupias
Competitive Analysis of Aggregate Max in Windowed Streaming [pdf]
Proc. 36th International Colloquium on Automata, Languages and Programming (ICALP 2009), 2009
We consider the problem of maintaining a fixed number k of items observed over a data stream, so as to optimize the maximum value over a fixed number n of recent observations. Unlike previous approaches, we use the competitive analysis framework and compare the performance of the online streaming algorithm against an optimal adversary that knows the entire sequence in advance. We consider the problem of maximizing the aggregate max, i.e., the sum of the values of the largest items in the algorithm's memory over the entire sequence. For this problem, we prove an asymptotically tight competitive ratio, achieved by a simple heuristic, called partition-greedy, that performs stream updates efficiently and has almost optimal performance. In contrast, we prove that the problem of maximizing, for every time t, the value maintained by the online algorithm in memory, is considerably harder: in particular, we show a tight competitive ratio that depends on the maximum value of the stream. We further prove negative results for the closely related problem of maintaining the aggregate minimum and for the generalized version of the aggregate max problem in which every item comes with an individual window.
L. Becchetti, C. Castillo, D. Donato, R. Baeza-Yates, and S. Leonardi
Link Analysis for Web Spam Detection [pdf]
ACM Transactions on the Web, Volume 2, Number 1, 2008
We propose link-based techniques for automating the detection of Web spam, a term referring to pages which use deceptive techniques to obtain undeservedly high scores in search engines. The issue of Web spam is widespread and difficult to solve, mostly due to the large size of the Web which means that, in practice, many algorithms are infeasible.
We perform a statistical analysis of a large collection of Web pages. In particular, we compute statistics of the links in the vicinity of every Web page applying rank propagation and probabilistic counting over the entire Web graph in a scalable way. We build several automatic web spam classifiers using different techniques. This paper presents a study of the performance of each of these classifiers alone, as well as their combined performance.
Based on these results we propose spam detection techniques which only consider the link structure of the Web, regardless of page contents. These statistical features are used to build a classifier that is tested over a large collection of Web link spam. After ten-fold cross-validation, our best classifiers have a performance comparable to that of state-of-the-art spam classifiers that use content attributes, and orthogonal to their methods.
D. Donato, S. Leonardi, S. Millozzi, and P. Tsaparas
Mining the Inner Structure of the Web Graph
Journal of Physics A: Mathematical and Theoretical, Volume 41, 2008
Despite being the sum of decentralized and uncoordinated efforts by heterogeneous groups and individuals, the World Wide Web exhibits a well defined structure, characterized by several interesting properties. This structure was clearly revealed by Broder et al. (Computer Networks 33, 2000) who presented the evocative bow-tie picture of the Web. Although the bow-tie structure is a relatively clear abstraction of the macroscopic picture of the Web, it is quite uninformative with respect to the inner details of the Web graph. In this paper, we mine the inner structure of the Web graph. We present a series of measurements on the Web, which offer a better understanding of the individual components of the bow-tie. In the process, we develop algorithmic techniques for performing these measurements. We discover that the scale-free properties permeate all the components of the bow-tie which exhibit the same macroscopic properties as the Web graph itself. However, close inspection reveals that their inner structure is quite distinct. We show that the Web graph does not exhibit self similarity within its components, and we propose a possible alternative picture for the Web graph, as it emerges from our experiments.
J. Könemann, S. Leonardi, G. Schäfer, and S. Zwam
A Group-Strategyproof Cost Sharing Mechanism for the Steiner Forest Game
SIAM Journal on Computing, Volume 37, Number 5, 2008
We consider a game-theoretical variant of the Steiner forest problem in which each player j, out of a set of k players, strives to connect his terminal pair (sj,tj) of vertices in an undirected, edge-weighted graph G. In this paper we show that a natural adaptation of the primal-dual Steiner forest algorithm of Agrawal, Klein, and Ravi [SIAM Journal on Computing, 24(3):445-456, 1995] yields a 2-budget balanced and cross-monotonic cost sharing method for this game.
We also present a negative result, arguing that no cross-monotonic cost sharing method can achieve a budget balance factor of less than 2 for the Steiner tree game. This shows that our result is tight.
Our algorithm gives rise to a new linear programming relaxation for the Steiner forest problem which we term the lifted-cut relaxation. We show that this new relaxation is stronger than the standard undirected cut relaxation for the Steiner forest problem.
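For reference, one common formalization of the two properties invoked above (notation is ours, and conventions for budget balance differ slightly across papers):

```latex
% Cross-monotonicity: a player's share cannot increase when the set of
% served players grows.
\[
  \xi(j, S) \;\ge\; \xi(j, T) \qquad \text{for all } j \in S \subseteq T .
\]
% beta-budget balance (one common convention): the collected shares recover
% at least a 1/beta fraction of the cost c(S) of the constructed solution
% and never exceed the cost opt(S) of an optimal solution.
\[
  \frac{1}{\beta}\, c(S) \;\le\; \sum_{j \in S} \xi(j, S) \;\le\; \mathrm{opt}(S).
\]
```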
L. Becchetti, P. Boldi, C. Castillo, and A. Gionis
Efficient Semi-Streaming Algorithms for Local Triangle Counting in Massive Graphs [pdf]
Proc. 14th International Conference on Knowledge Discovery and Data Mining (KDD 2008), 2008
In this paper we study the problem of local triangle counting in large graphs. Namely, given a large graph G=(V,E) we want to estimate as accurately as possible the number of triangles incident to every node v∈V in the graph. The problem of computing the global number of triangles in a graph has been considered before, but to our knowledge this is the first paper that addresses the problem of local triangle counting with a focus on the efficiency issues arising in massive graphs. The distribution of the local number of triangles and the related local clustering coefficient can be used in many interesting applications. For example, we show that the measures we compute can help to detect the presence of spamming activity in large-scale Web graphs, as well as to provide useful features to assess content quality in social networks.
For computing the local number of triangles we propose two approximation algorithms, which are based on the idea of min-wise independent permutations (Broder et al. 1998). Our algorithms operate in a semi-streaming fashion, using O(|V|) space in main memory and performing O(log|V|) sequential scans over the edges of the graph. The first algorithm we describe in this paper also uses O(|E|) space in external memory during computation, while the second algorithm uses only main memory. We present the theoretical analysis as well as experimental results in massive graphs demonstrating the practical efficiency of our approach.
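A hedged, offline sketch of the min-wise-hashing idea (not the paper's semi-streaming algorithm): estimate the Jaccard similarity of the neighborhoods of each pair of adjacent nodes via min-hash signatures, convert it into an intersection size, and sum over incident edges; all identifiers are ours.

```python
import random

def local_triangles_minhash(adj, num_hashes=64, seed=0):
    """Hedged sketch: estimate |N(u) ∩ N(v)| for every edge (u, v) via the
    Jaccard similarity of the two neighborhoods, then sum over the edges
    incident to each node and halve (each triangle is counted twice)."""
    rng = random.Random(seed)
    nodes = list(adj)
    # one random priority per node per hash function (implicit random permutations)
    priorities = [{v: rng.random() for v in nodes} for _ in range(num_hashes)]
    # min-hash signature of each neighborhood
    signatures = {
        v: [min(pr[u] for u in adj[v]) if adj[v] else float("inf") for pr in priorities]
        for v in nodes
    }

    triangles = {v: 0.0 for v in nodes}
    for v in nodes:
        for u in adj[v]:
            if u <= v:                      # handle each undirected edge once
                continue
            matches = sum(1 for i in range(num_hashes)
                          if signatures[u][i] == signatures[v][i])
            jac = matches / num_hashes      # Jaccard estimate
            inter = (jac * (len(adj[u]) + len(adj[v])) / (1 + jac)
                     if jac < 1 else min(len(adj[u]), len(adj[v])))
            triangles[u] += inter
            triangles[v] += inter
    return {v: t / 2 for v, t in triangles.items()}

# Toy example: in a 4-clique every node is incident to 3 triangles.
clique = {i: [j for j in range(4) if j != i] for i in range(4)}
print(local_triangles_minhash(clique))
```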
I. Bordino, D. Donato, A. Gionis, and S. Leonardi
Mining Large Networks with Subgraph Counting
Proc. 8th IEEE International Conference on Data Mining (ICDM 2008), 2008
The problem of mining frequent patterns in networks has many applications, including analysis of complex networks, clustering of graphs, finding communities in social networks, and indexing of graphical and biological databases. Despite this wealth of applications, the current state of the art lacks algorithmic tools for counting the number of subgraphs contained in a large network.
In this paper we develop data-stream algorithms that approximate the number of all subgraphs of three and four vertices in directed and undirected networks. We use the frequency of occurrence of all subgraphs, together with their statistical significance, to characterize different kinds of networks: we achieve very good precision in clustering networks with similar structure. The significance of our method is supported by the fact that such high precision cannot be achieved when performing clustering based on simpler topological properties, such as degree, assortativity, and eigenvector distributions. We have also tested our techniques using swap randomization.
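To illustrate how subgraph frequencies can characterize networks, the following toy, non-streaming sketch (our own simplification, not the paper's data-stream algorithm) builds a normalized 3-node connected-subgraph profile and compares two networks by cosine similarity.

```python
from itertools import combinations
from math import sqrt

def triad_profile(adj):
    """Illustrative brute force: count connected 3-node subgraphs of an
    undirected graph, split into open wedges and triangles, and return the
    normalized frequency vector used to compare networks."""
    wedges = triangles = 0
    for u, v, w in combinations(adj, 3):
        edges = (v in adj[u]) + (w in adj[u]) + (w in adj[v])
        if edges == 3:
            triangles += 1
        elif edges == 2:
            wedges += 1
    total = wedges + triangles
    return (wedges / total, triangles / total) if total else (0.0, 0.0)

def cosine(p, q):
    dot = sum(a * b for a, b in zip(p, q))
    return dot / (sqrt(sum(a * a for a in p)) * sqrt(sum(b * b for b in q)))

# Two toy networks: a triangle-rich clique vs. a star (wedges only).
clique = {i: {j for j in range(5) if j != i} for i in range(5)}
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
print(triad_profile(clique), triad_profile(star),
      cosine(triad_profile(clique), triad_profile(star)))
```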
D. Donato, S. Leonardi, and M. Paniccia
Combining Transitive Trust and Negative Opinions for Better Reputation Management in Social Networks
Proc. 2nd ACM Workshop on Social Network Mining and Analysis (SNA-KDD 2008), 2008
Reputation management is a crucial task in Peer-to-Peer networks, social networks and other decentralized distributed systems. In this work we investigate the role of users' negative opinions in order to devise fully decentralized reputation management mechanisms. Our study is motivated by the limitations of methods proposed in the literature that are based on the idea of propagating positive opinions, most notably EigenTrust [9], a cornerstone method for reputation management in decentralized systems. EigenTrust makes use of a transitive definition of trust: a peer tends to trust those peers who have a high reputation in the opinion of trustworthy peers. While EigenTrust has been shown to be effective against a number of threat attacks from coalitions of malicious players, it does not properly address more sophisticated threat attacks.
In this paper we propose a new approach to the design of fully decentralized reputation mechanisms that combine negative and positive opinions expressed by peers to reach a global consensus on trust and distrust values for each peer of the network. We show how these strategies outperform EigenTrust in terms of number of successful transactions against a large set of sophisticated threat attacks posed by coalitions of malicious peers. We also discuss a clustering method that detects most of the malicious peers with high precision.
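For context, a minimal sketch of the textbook EigenTrust iteration that this work builds on (the paper's combination with negative opinions is not shown; parameter names are ours):

```python
import numpy as np

def eigentrust(local_trust, pretrusted, alpha=0.15, iters=50):
    """Textbook EigenTrust: normalize each peer's positive local trust scores
    row-wise, then run a damped power iteration toward the pre-trusted
    distribution.  The global trust vector is the fixed point of
    t = (1 - alpha) * C^T t + alpha * p."""
    C = np.maximum(np.asarray(local_trust, dtype=float), 0.0)
    p = np.asarray(pretrusted, dtype=float)
    n = C.shape[0]
    for i in range(n):
        s = C[i].sum()
        # peers with no positive opinions default to the pre-trusted distribution
        C[i] = C[i] / s if s > 0 else p
    t = p.copy()
    for _ in range(iters):
        t = (1 - alpha) * (C.T @ t) + alpha * p
    return t / t.sum()

# Toy example: 4 peers, peer 3 receives little positive trust.
local = [[0, 5, 3, 0], [4, 0, 4, 1], [5, 5, 0, 0], [1, 0, 0, 0]]
p = np.array([0.25, 0.25, 0.25, 0.25])   # uniform pre-trust
print(eigentrust(local, p))
```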
D. Donato, L. Laura, S. Leonardi, and S. Millozzi
The Web as a Graph: How Far We Are
ACM Transactions on Internet Technology, Volume 7, Number 1, 2007
In this article we present an experimental study of the properties of webgraphs. We study a large crawl from 2001 of 200M pages and about 1.4 billion edges, made available by the WebBase project at Stanford, as well as several synthetic ones generated according to various models proposed recently. We investigate several topological properties of such graphs, including the number of bipartite cores and strongly connected components, the distribution of degrees and PageRank values and some correlations; we present a comparison study of the models against these measures. Our findings are that (i) the WebBase sample differs slightly from the (older) samples studied in the literature, and (ii) despite the fact that these models do not catch all of its properties, they do exhibit some peculiar behaviors not found, for example, in the models from classical random graph theory. Moreover, we developed a software library able to generate and measure massive graphs in secondary memory; this library is publicly available under the GPL license. We discuss its implementation and some computational issues related to secondary memory graph algorithms.
D. Donato, M. Paniccia, M. Selis, C. Castillo, G. Cortese, and S. Leonardi
New Metrics for Reputation Management in P2P Networks [pdf]
Proc. 3rd International Workshop on Adversarial Information Retrieval on the Web (AIRWeb 2007), 2007
In this work we study the effectiveness of mechanisms for decentralized reputation management in P2P networks. Our starting point is EigenTrust, an algorithm designed for reputation management in file sharing applications over P2P networks. EigenTrust has proved very effective against three different natural attacks from malicious coalitions, while it performs poorly on a particular attack organized by two different kinds of malicious peers. We propose various metrics of reputation based on ideas recently introduced for detecting and demoting Web spam. We combine these metrics with the original EigenTrust approach. Our mechanisms are more effective than EigenTrust alone for detecting malicious peers and reducing the number of inauthentic downloads, not only for all the cases previously addressed but also for more sophisticated attacks.
S. Leonardi, and P. Sankowski
Network Formation Games with Local Coalitions
Proc. 26th Annual ACM Symposium on Principles of Distributed Computing (PODC 2007), 2007
The quality of Nash equilibria in network formation games has recently been analyzed in the case of uncoordinated players. In this paper we study how the price of anarchy of network formation games with Shapley cost allocation is affected by allowing locally coordinated coalitions of players. In a distributed setting not all users can communicate and form coalitions; at the very least, they have to know that the others exist. Here, we assume that users can form a coalition when they share a resource, i.e., in our case a group of users that share an edge can form a coalition. We show that this assumption is strong enough to decrease the price of anarchy from Θ(k) to Θ(log k) in the single-terminal undirected case, where every vertex is associated with a player and k is the number of players. In the directed or multi-terminal case, by contrast, local communication does not necessarily lead to a better price of anarchy. We additionally show that in the directed case the price of stability increases from Θ(log k) to Θ(k).
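For concreteness, the Shapley cost allocation underlying these games assigns each player an equal share of every edge it uses (standard formulation for such network design games; notation is ours):

```latex
% Shapley (fair-share) cost allocation: every player whose chosen path uses
% edge e pays an equal fraction of its cost, so player i's cost under a
% strategy profile S is
\[
  \mathrm{cost}_i(S) \;=\; \sum_{e \in P_i(S)} \frac{c_e}{x_e(S)},
\]
% where P_i(S) is the path chosen by player i and x_e(S) is the number of
% players whose paths use edge e.
```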
L. Buriol, G. Frahling, S. Leonardi, and C. Sohler
Estimating Clustering Indexes in Data Streams
Proc. 15th Annual European Symposium on Algorithms (ESA 2007), 2007
We present random sampling algorithms that, with probability at least 1-δ, compute a (1±ε)-approximation of the clustering coefficient and of the number of bipartite clique subgraphs of a graph given as an incidence stream of edges. The space used by our algorithm to estimate the clustering coefficient is inversely related to the clustering coefficient of the network itself. The space used by our algorithm to compute the number of K3,3 bipartite cliques is proportional to the ratio between the number of K1,3 and K3,3 subgraphs in the graph.
Since the space complexity depends only on the structure of the input graph and not on the number of nodes, our algorithms scale very well with increasing graph size. Therefore they provide a basic tool to analyze the structure of dense clusters in large graphs and have many applications in the discovery of web communities, the analysis of the structure of large social networks and the probing of frequent patterns in large graphs.
We implemented both algorithms and evaluated their performance on networks from different application domains and of different sizes; the largest instance is a webgraph consisting of more than 135 million nodes and 1 billion edges. Both algorithms compute accurate results in reasonable time on the tested instances.
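A hedged, offline simplification of the sampling idea (not the paper's one-pass streaming algorithm): sample random wedges, i.e., length-2 paths, and report the fraction that are closed, which estimates the global clustering (transitivity) coefficient; identifiers are ours.

```python
import random

def estimate_transitivity(adj, samples=10000, seed=1):
    """Sample wedges with the centre chosen proportionally to the number of
    wedges it participates in, then check whether the two endpoints are
    adjacent; the closed fraction estimates 3*triangles / wedges."""
    rng = random.Random(seed)
    nodes = [v for v in adj if len(adj[v]) >= 2]
    weights = [len(adj[v]) * (len(adj[v]) - 1) / 2 for v in nodes]
    closed = 0
    for _ in range(samples):
        v = rng.choices(nodes, weights=weights, k=1)[0]
        u, w = rng.sample(list(adj[v]), 2)       # two distinct neighbours of v
        if w in adj[u]:
            closed += 1
    return closed / samples

# Toy example: a triangle with one pendant edge has transitivity 3/5.
toy = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(estimate_transitivity(toy))
```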
A. Capocci, V. Servedio, F. Colaiori, L. Buriol, D. Donato, S. Leonardi, and G. Caldarelli
Preferential Attachment in the Growth of Social Networks: The Internet Encyclopedia Wikipedia
Physical Review E, Volume 74, Number 3, 2006
We present an analysis of the statistical properties and growth of the free on-line encyclopedia Wikipedia. By describing topics by vertices and hyperlinks between them as edges, we can represent this encyclopedia as a directed graph. The topological properties of this graph are in close analogy with those of the World Wide Web, despite the very different growth mechanism. In particular, we measure a scale-invariant distribution of the in- and out-degrees, and we are able to reproduce these features by means of a simple statistical model. As a major consequence, Wikipedia growth can be described by local rules such as the preferential attachment mechanism, even though users, who are responsible for its evolution, can act globally on the network.
C. Castillo, D. Donato, L. Becchetti, P. Boldi, S. Leonardi, M. Santini, and S. Vigna
A Reference Collection for Web Spam
ACM SIGIR Forum, Volume 40, Number 2, 2006
We describe the WEBSPAM-UK2006 collection, a large set of Web pages that have been manually annotated with labels indicating whether or not the hosts contain Web spam. This is the first publicly available Web spam collection that includes page contents and links, and that has been labelled by a large and diverse set of judges.
D. Donato, L. Laura, S. Leonardi, U. Meyer, S. Millozzi, and J. Sibeyn
Algorithms and Experiments for the Webgraph
Journal of Graph Algorithms and Applications, Volume 10, Number 2, 2006
In this paper we present an experimental study of the properties of web graphs. We study a large crawl from 2001 of 200M pages and about 1.4 billion edges made available by the WebBase project at Stanford [19], and synthetic graphs obtained by the large scale simulation of stochastic graph models for the Webgraph. This work has required the development and the use of external and semi-external algorithms for computing properties of massive graphs, and for the large scale simulation of stochastic graph models. We report our experimental findings on the topological properties of such graphs, describe the algorithmic tools developed within this project and report the experiments on their time performance.
D. Donato, S. Leonardi, and P. Tsaparas
Stability and Similarity of Link Analysis Ranking Algorithms
Internet Mathematics, Volume 3, Number 4, 2006
Recently, there has been a surge of research activity in the area of link analysis ranking, where hyperlink structures are used to determine the relative authority of webpages. One of the seminal works in this area is that of Kleinberg [Kleinberg 98], who proposed the HITS algorithm. In this paper, we undertake a theoretical analysis of the properties of the HITS algorithm on a broad class of random graphs. Working within the framework of Borodin et al. [Borodin et al. 05], we prove that, under some assumptions, on this class (a) the HITS algorithm is stable with high probability and (b) the HITS algorithm is similar to the InDegree heuristic that assigns to each node weight proportional to the number of incoming links. We demonstrate that our results go through for the case that the expected in-degrees of the graph follow a power law distribution. We also study experimentally the similarity between HITS and InDegree, and we investigate the general conditions under which the two algorithms are similar.
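For reference, the HITS iteration analysed in the paper and the InDegree heuristic it is compared against can be sketched as follows (standard formulations; variable names are ours):

```python
import numpy as np

def hits(A, iters=100):
    """Standard HITS power iteration: A[i, j] = 1 if page i links to page j.
    Authority and hub scores converge to the principal eigenvectors of
    A^T A and A A^T respectively."""
    n = A.shape[0]
    a = np.ones(n)
    h = np.ones(n)
    for _ in range(iters):
        a = A.T @ h
        a /= np.linalg.norm(a)
        h = A @ a
        h /= np.linalg.norm(h)
    return a, h

def indegree(A):
    """The InDegree heuristic: weight each node by its number of incoming links."""
    return A.sum(axis=0)

# Toy graph: nodes 0, 1 and 3 all point to node 2.
A = np.array([[0, 0, 1, 0],
              [0, 0, 1, 0],
              [0, 0, 0, 0],
              [0, 0, 1, 0]], dtype=float)
a, h = hits(A)
print(np.round(a, 3), indegree(A))   # both rankings put node 2 on top
```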
L. Becchetti, C. Castillo, D. Donato, S. Leonardi, and R. Baeza-Yates
Link-Based Characterization and Detection of Web Spam
Proc. 2nd International Workshop on Adversarial Information Retrieval on the Web (AIRWeb 2006), 2006
We perform a statistical analysis of a large collection of Web pages, focusing on spam detection. We study several metrics such as degree correlations, number of neighbors, rank propagation through links, TrustRank and others to build several automatic Web spam classifiers. This paper presents a study of the performance of each of these classifiers alone, as well as their combined performance. Using this approach we are able to detect 80.4% of the Web spam in our sample, with a false positive rate of only 1.1%.
L. Buriol, C. Castillo, D. Donato, S. Leonardi, and S. Millozzi
Temporal Analysis of the Wikigraph
Proc. 2006 IEEE/WIC/ACM International Conference on Web Intelligence (WI 2006), 2006
Wikipedia is an online encyclopedia, available in more than 100 languages and comprising over 1 million articles in its English version. If we consider each Wikipedia article as a node and each hyperlink between articles as an arc, we have a "Wikigraph", a graph that represents the link structure of Wikipedia. The Wikigraph differs from other Web graphs studied in the literature by the fact that there are explicit timestamps associated with each node's events. This allows us to do a detailed analysis of the evolution of Wikipedia over time. In the first part of this study we characterize this evolution in terms of users, editions and articles; in the second part, we depict the temporal evolution of several topological properties of the Wikigraph. The insights obtained from the Wikigraph can be applied to large Web graphs for which temporal data is usually not available.
L. Becchetti, C. Castillo, D. Donato, S. Leonardi, and R. Baeza-Yates
Using Rank Propagation and Probabilistic Counting for Link-Based Spam Detection
Proc. 2006 Workshop on Web Mining and Web Usage Analysis (WebKDD 2006), 2006
This paper describes a technique for automating the detection of Web link spam, that is, groups of pages that are linked together with the sole purpose of obtaining an undeservedly high score in search engines. The problem of Web spam is widespread and difficult to solve, mostly due to the large size of web collections, which makes many algorithms infeasible in practice.
L. Buriol, G. Frahling, S. Leonardi, A. Marchetti-Spaccamela, and C. Sohler
Counting Triangles in Data Streams
Proc. 25th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS 2006), 2006
We present two space-bounded random sampling algorithms that compute an approximation of the number of triangles in an undirected graph given as a stream of edges. Our first algorithm does not make any assumptions on the order of edges in the stream. It uses space that is inversely related to the ratio between the number of triangles and the number of triples with at least one edge in the induced subgraph, and constant expected update time per edge. Our second algorithm is designed for incidence streams (all edges incident to the same vertex appear consecutively). It uses space that is inversely related to the ratio between the number of triangles and the number of length-2 paths in the graph, and expected update time O(log|V|(1+s|V|/|E|)), where s is the space requirement of the algorithm. These results significantly improve over previous work [20, 8]. Since the space complexity depends only on the structure of the input graph and not on the number of nodes, our algorithms scale very well with increasing graph size and so they provide a basic tool to analyze the structure of large graphs. They have many applications, for example, in the discovery of Web communities, the computation of clustering and transitivity coefficients, and the discovery of frequent patterns in large graphs.
We have implemented both algorithms and evaluated their performance on networks from different application domains. The sizes of the considered graphs varied from about 8,000 nodes and 40,000 edges to 135 million nodes and more than 1 billion edges. For both algorithms we ran experiments with parameter s=1,000, 10,000, 100,000, 1,000,000 to evaluate running time and approximation guarantee. Both algorithms appear to be time efficient for these sample sizes. The approximation quality of the first algorithm varied significantly, and even for s=1,000,000 we had more than 10% deviation for more than half of the instances. The second algorithm performed much better, and even for s=10,000 we had an average deviation of less than 6% (taken over all but the largest instance, for which we could not compute the number of triangles exactly).
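A simplified reading of the first algorithm's sampling idea (our own sketch under assumptions, not the paper's exact procedure): each of s estimators reservoir-samples one edge plus a uniformly random third node, and succeeds only if both edges closing that triangle appear later in the stream, so a triangle is caught only via its first edge in stream order.

```python
import random

def streaming_triangles(edge_stream, nodes, s=1000, seed=7):
    """Simplified one-pass sketch: for each estimator keep a reservoir-sampled
    edge (a, b) and a random third node v, and record which of the closing
    edges (a, v), (b, v) are seen afterwards.  The estimate
    m*(n-2)*successes/s is unbiased for the number of triangles."""
    rng = random.Random(seed)
    nodes = list(nodes)
    n = len(nodes)
    est = [{"edge": None, "v": None, "seen": set()} for _ in range(s)]
    m = 0
    for (a, b) in edge_stream:
        m += 1
        for e in est:
            if rng.random() < 1.0 / m:                 # reservoir-sample this edge
                v = rng.choice(nodes)
                while v == a or v == b:
                    v = rng.choice(nodes)
                e["edge"], e["v"], e["seen"] = (a, b), v, set()
            elif e["edge"] is not None:
                x, y = e["edge"]
                if {a, b} == {x, e["v"]} or {a, b} == {y, e["v"]}:
                    e["seen"].add(frozenset((a, b)))   # closing edge observed
    successes = sum(1 for e in est if len(e["seen"]) == 2)
    return m * (n - 2) * successes / s

# Toy example: a 5-clique contains 10 triangles.
clique_edges = [(i, j) for i in range(5) for j in range(i + 1, 5)]
print(streaming_triangles(clique_edges, range(5), s=2000))
```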
L. Becchetti, C. Castillo, D. Donato, and A. Fazzone
A Comparison of Sampling Techniques for Web Graph Characterization
Proc. 2006 ACM Workshop on Link Analysis: Dynamics and Static of Large Networks (LinKDD 2006), 2006
We present a detailed statistical analysis of the characteristics of partial Web graphs obtained by sub-sampling a large collection of Web pages.
We show that in general the macroscopic properties of the Web are better represented by a shallow exploration of a large number of sites than by a deep exploration of a limited set of sites. We also describe and quantify the bias induced by the different sampling strategies, and show that it can be significant even if the sample covers a large fraction of the collection.