The tools that we develop are based on the design of various models and algorithms, which we subject to theoretical and experimental analysis. Depending on the application area and on the requirements, we design stochastic, approximation, or streaming algorithms.
Streaming models are often appropriate for computing quantities that involve large datasets, such as large networks. We have developed streaming, sampling-based techniques for computing clustering coefficients, counting triangles in large graphs, and related quantities, by performing one or a few passes over the data.
We have developed approximation algorithms for the co-clustering problem, which has applications in text mining, market-basket analysis, and bioinformatics.
Given the uncertain nature of the input, we often resort to tools from the stochastic analysis of algorithms. In practice we often deal with massive data that also change continually over time. To model such scenarios, we have introduced and analyzed the model of dynamic or evolving data, and we have shown how to maintain statistics of the data over time using a limited number of queries. We have also designed algorithms for web-mining problems, such as caching and query recommendation, which we analyze under probabilistic assumptions on users' search behavior.
P. Brach, M. Cygan, J. Lacki, and P. Sankowski
Algorithmic Complexity of Power Law Networks
Proc. 27th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2016), 2016
We define a deterministic condition for checking whether a graph has a power law degree distribution and experimentally validate it on real-world networks. This definition allows us to derive interesting properties of power law networks. We observe that for exponents of the degree distribution in the range [1, 2] such networks exhibit the double power law phenomenon that has been observed for several real-world networks. Moreover, we give a novel theoretical explanation of why many algorithms run faster on real-world data than predicted by algorithmic worst-case analysis. We show how to exploit the power law degree distribution to design faster algorithms for a number of classic problems, including transitive closure, maximum matching, determinant, PageRank, matrix inverse, counting triangles, and finding a maximum clique. In contrast to previous average-case analyses, we believe that this is the first “waterproof” argument that explains why many real-world networks are easier.
V. Auletta, D. Ferraioli, F. Pasquale, P. Penna, and G. Persiano
Logit Dynamics with Concurrent Updates for Local Interaction Potential Games
Algorithmica, Volume 73, Number 3, 2015
Logit choice dynamics constitute a family of randomized best response dynamics based on the logit choice function (McFadden in Frontiers in econometrics. Academic Press, New York, 1974) that models players with limited rationality and knowledge. In this paper we study the all-logit dynamics [also known as simultaneous learning (Alós-Ferrer and Netzer in Games Econ Behav 68(2):413–427, 2010)], where at each time step all players concurrently update their strategies according to the logit choice function. In the well-studied (one-)logit dynamics (Blume in Games Econ Behav 5(3):387–424, 1993), by contrast, at each step only one randomly chosen player is allowed to update. We study properties of the all-logit dynamics in the context of local interaction potential games, a class of games that has been used to model complex social phenomena (Montanari and Saberi 2009; Peyton in The economy as a complex evolving system. Oxford University Press, Oxford, 2003) and physical systems (Levin et al. in Probab Theory Relat Fields 146(1–2):223–265, 2010; Martinelli in Lectures on probability theory and statistics. Springer, Berlin, 1999). In a local interaction potential game players are the vertices of a social graph whose edges are two-player potential games. Each player picks one strategy to be played for all the games she is involved in, and the payoff of the player is the sum of the payoffs from each of the games. We prove that local interaction potential games characterize the class of games for which the all-logit dynamics is reversible. We then compare the stationary behavior of the one-logit and all-logit dynamics. Specifically, we look at the expected value of a notable class of observables, which we call decomposable observables. We prove that the difference between the expected values of the observables at stationarity for the two dynamics depends only on the rationality level β and on the distance of the social graph from a bipartite graph. In particular, if the social graph is bipartite then decomposable observables have the same expected value. Finally, we show that the mixing time of the all-logit dynamics has the same twofold behavior that has been highlighted in the case of the one-logit dynamics: for some games it depends exponentially on the rationality level β, whereas for other games it can be upper bounded by a function independent of β.
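As a concrete illustration of these dynamics (a minimal simulation sketch, not code from the paper), the following models a toy local interaction potential game in which the edge games are two-player coordination games on a ring; the graph, payoffs, and the rationality level β are illustrative choices.

```python
import math
import random

# Toy local interaction potential game: players are the vertices of a ring;
# each edge is a 2-player coordination game paying 1 if the endpoints agree.
# All parameters below (n, beta, number of rounds) are illustrative.

def all_logit_step(strategies, neighbors, beta):
    """One round of the all-logit dynamics: every player concurrently draws
    a new strategy from the logit choice distribution, evaluated against the
    *current* strategies of its neighbors."""
    new = {}
    for v in strategies:
        # Payoff of each candidate strategy = number of agreeing neighbors.
        payoffs = {s: sum(1 for u in neighbors[v] if strategies[u] == s)
                   for s in (0, 1)}
        weights = {s: math.exp(beta * p) for s, p in payoffs.items()}
        r = random.random() * (weights[0] + weights[1])
        new[v] = 0 if r < weights[0] else 1
    return new

def simulate(n=20, beta=1.0, rounds=50, seed=0):
    random.seed(seed)
    neighbors = {v: [(v - 1) % n, (v + 1) % n] for v in range(n)}  # ring
    strategies = {v: random.randint(0, 1) for v in range(n)}
    for _ in range(rounds):
        strategies = all_logit_step(strategies, neighbors, beta)
    # Potential of the profile = number of agreeing edges (each counted once).
    potential = sum(1 for v in range(n) if strategies[v] == strategies[(v + 1) % n])
    return strategies, potential

if __name__ == "__main__":
    final, phi = simulate()
    print("final strategies:", [final[v] for v in sorted(final)], "agreeing edges:", phi)
```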
A. Anagnostopoulos, L. Becchetti, A. Fazzone, I. Mele, and M. Riondato
The Importance of Being Expert: Efficient Max-Finding in Crowdsourcing [pdf]
Proc. 2015 ACM SIGMOD International Conference on Management of Data (SIGMOD 2015), 2015
Crowdsourcing is a computational paradigm whose distinctive feature is the involvement of human workers in key steps of the computation. It is used successfully to address problems that would be hard or impossible to solve for machines. As we highlight in this work, the exclusive use of nonexpert individuals may prove ineffective in some cases, especially when the task at hand or the need for accurate solutions demands some degree of specialization to avoid excessive uncertainty and inconsistency in the answers. We address this limitation by proposing an approach that combines the wisdom of the crowd with the educated opinion of experts. We present a computational model for crowdsourcing that envisions two classes of workers with different expertise levels. One of its distinctive features is the adoption of the threshold error model, whose roots are in psychometrics and which we extend from previous theoretical work. Our computational model allows us to evaluate the performance of crowdsourcing algorithms with respect to accuracy and cost. We use our model to develop and analyze an algorithm for approximating the best, in a broad sense, of a set of elements. The algorithm uses naïve and expert workers to find an element that is a constant-factor approximation to the best. We prove upper and lower bounds on the number of comparisons needed to solve this problem, showing that our algorithm uses expert and naïve workers optimally up to a constant factor. Finally, we evaluate our algorithm on real and synthetic datasets using the CrowdFlower crowdsourcing platform, showing that our approach is also effective in practice.
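Purely as an illustration of the two-tier idea under a threshold error model (a toy sketch with assumed thresholds and parameters, not the algorithm analyzed in the paper), one can first prune candidates with cheap, error-prone comparisons and then let more accurate comparisons decide among the few finalists:

```python
import random

def make_comparator(threshold, rng):
    """Threshold error model (illustrative): if two values differ by more than
    `threshold`, the comparison is always correct; otherwise the answer is a
    coin flip."""
    def compare(x, y):
        if abs(x - y) > threshold:
            return x if x > y else y
        return x if rng.random() < 0.5 else y
    return compare

def noisy_tournament(items, compare):
    """Single-elimination tournament with a (possibly noisy) comparator."""
    pool = list(items)
    while len(pool) > 1:
        nxt = [compare(pool[i], pool[i + 1]) for i in range(0, len(pool) - 1, 2)]
        if len(pool) % 2 == 1:
            nxt.append(pool[-1])
        pool = nxt
    return pool[0]

def two_tier_max(items, naive_cmp, expert_cmp, repeats=5):
    """Run several cheap naive tournaments, collect their winners, and let
    expert comparisons pick among the (few) finalists."""
    winners = {noisy_tournament(items, naive_cmp) for _ in range(repeats)}
    return noisy_tournament(list(winners), expert_cmp)

if __name__ == "__main__":
    rng = random.Random(1)
    values = [rng.random() for _ in range(200)]
    naive = make_comparator(threshold=0.05, rng=rng)    # error-prone workers
    expert = make_comparator(threshold=0.005, rng=rng)  # more accurate workers
    print("true max:", max(values), "returned:", two_tier_max(values, naive, expert))
```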
L. Becchetti, A. Clementi, E. Natale, F. Pasquale, and G. Posta
Self-Stabilizing Repeated Balls-Into-Bins
Proc. 27th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA 2015), 2015
We study the following synchronous process that we call repeated balls-into-bins. The process is started by assigning n balls to n bins in an arbitrary way. Then, in every subsequent round, one ball is chosen according to some fixed strategy (random, FIFO, etc.) from each non-empty bin, and re-assigned to one of the n bins uniformly at random. This process corresponds to a non-reversible Markov chain and our aim is to study its self-stabilization properties with respect to the maximum (bin) load and some related performance measures.
We define a configuration (i.e., a state) to be legitimate if its maximum load is O(log n). We first prove that, starting from any legitimate configuration, the process only takes on legitimate configurations over a period whose length is bounded by any polynomial in n, with high probability (w.h.p.). Further, we prove that, starting from any configuration, the process converges to a legitimate configuration in linear time, w.h.p. This implies that the process is self-stabilizing w.h.p. and, moreover, that every ball traverses all bins in O(n log² n) rounds, w.h.p.
The latter result can also be interpreted as an almost tight bound on the cover time for the problem of parallel resource assignment in the complete graph.
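A direct simulation of the process described above is straightforward; the sketch below (with an arbitrary worst-case-looking initial assignment and illustrative parameters) tracks how quickly the maximum load drops to the logarithmic regime.

```python
import random
from collections import deque

def repeated_balls_into_bins(n, rounds, seed=0):
    """Simulate the repeated balls-into-bins process: start from an arbitrary
    assignment (here: all n balls in bin 0), then in every round each
    non-empty bin releases one ball (FIFO here), which is re-assigned to a
    uniformly random bin. Returns the maximum load seen in each round."""
    rng = random.Random(seed)
    bins = [deque() for _ in range(n)]
    for ball in range(n):
        bins[0].append(ball)          # adversarial-looking start: one full bin
    max_loads = []
    for _ in range(rounds):
        moving = [b.popleft() for b in bins if b]   # one ball per non-empty bin
        for ball in moving:
            bins[rng.randrange(n)].append(ball)     # re-assign uniformly at random
        max_loads.append(max(len(b) for b in bins))
    return max_loads

if __name__ == "__main__":
    loads = repeated_balls_into_bins(n=1000, rounds=200)
    print("max load after 1, 50, 200 rounds:", loads[0], loads[49], loads[-1])
```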
L. Becchetti, A. Clementi, E. Natale, F. Pasquale, and R. Silvestri
Plurality Consensus in the Gossip Model
Proc. 26th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2015), 2015
We study Plurality Consensus in the GOSSIP Model over a network of n anonymous agents. Each agent supports an initial opinion or color. We assume that at the onset, the number of agents supporting the plurality color exceeds that of the agents supporting any other color by a sufficiently large bias, though the initial plurality itself might be very far from absolute majority. The goal is to provide a protocol that, with high probability, brings the system into the configuration in which all agents support the (initial) plurality color.
We consider the Undecided-State Dynamics, a well-known protocol which uses just one more state (the undecided one) than those necessary to store colors.
We show that the speed of convergence of this protocol depends on the initial color configuration as a whole, not just on the gap between the plurality and the second largest color community. This dependence is best captured by the monochromatic distance, a novel notion we introduce, measuring the distance of the initial color configuration from the closest monochromatic one. In the complete graph, we prove that, for a wide range of the input parameters, this dynamics converges in a number of rounds that is logarithmic in the number of nodes and linear with respect to the aforementioned monochromatic distance. We prove that this upper bound is almost tight in the strong sense: starting from any color configuration, the convergence time is linear in the monochromatic distance. Finally, we adapt the protocol above to obtain a fast, random walk-based protocol for plurality consensus on regular expanders. All our bounds hold with high probability.
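For concreteness, the following is a minimal simulation sketch of the undecided-state dynamics in the uniform PULL model on the complete graph; the initial color counts and the convergence check are illustrative assumptions.

```python
import random
from collections import Counter

UNDECIDED = None

def undecided_state_round(colors, rng):
    """One synchronous round of the undecided-state dynamics in the uniform
    PULL model on the complete graph: every node samples one node u.a.r. and
    updates based on the colors held at the start of the round."""
    n = len(colors)
    new = list(colors)
    for u in range(n):
        cu, cv = colors[u], colors[rng.randrange(n)]
        if cu is UNDECIDED and cv is not UNDECIDED:
            new[u] = cv                 # undecided node adopts the sampled color
        elif cu is not UNDECIDED and cv is not UNDECIDED and cu != cv:
            new[u] = UNDECIDED          # clash of colors: become undecided
        # otherwise: keep the current state
    return new

def simulate(counts=(4000, 3500, 2500), seed=0, max_rounds=1000):
    rng = random.Random(seed)
    colors = [c for c, cnt in enumerate(counts) for _ in range(cnt)]
    rng.shuffle(colors)
    n = len(colors)
    for t in range(max_rounds):
        colors = undecided_state_round(colors, rng)
        tally = Counter(c for c in colors if c is not UNDECIDED)
        if len(tally) == 1 and sum(tally.values()) == n:
            return t + 1, tally.most_common(1)[0][0]
    return None, None

if __name__ == "__main__":
    rounds, winner = simulate()
    print("converged after", rounds, "rounds to color", winner)
```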
S. Lattanzi, S. Leonardi, V. Mirrokni, and I. Razenshteyn
Robust Hierarchical K-Center Clustering
Proc. 6th ACM International Conference on Innovations in Theoretical Computer Science (ITCS 2015), 2015
L. Becchetti, A. Clementi, F. Pasquale, G. Resta, P. Santi, and R. Silvestri
Flooding Time in Opportunistic Networks Under Power Law and Exponential Intercontact Times
IEEE Transactions on Parallel and Distributed Systems, Volume 25, Number 9, 2014
Performance bounds for opportunistic networks have been derived in a number of recent papers for several key quantities, such as the expected delivery time of a unicast message, or the flooding time (a measure of how fast information spreads). However, to the best of our knowledge, none of the existing results is derived under a mobility model which is able to reproduce the power law+exponential tail dichotomy of the pairwise node inter-contact time distribution which has been observed in traces of several real opportunistic networks.
The contributions of this paper are two-fold: first, we present a simple pairwise contact model--called the Home-MEG model--for opportunistic networks based on the observation made in previous work that pairs of nodes in the network tend to meet in very few, selected locations (home locations); this contact model is shown to be able to faithfully reproduce the power law+exponential tail dichotomy of inter-contact time. Second, we use the Home-MEG model to analyze flooding time in opportunistic networks, presenting asymptotic bounds on flooding time that assume different initial conditions for the existence of opportunistic links.
Finally, our bounds provide analytical evidence that the speed of information spreading in opportunistic networks can be much faster than that predicted by simple geometric mobility models.
F. Grandoni, P. Krysta, S. Leonardi, and C. Ventre
Utilitarian Mechanism Design for Multiobjective Optimization
SIAM Journal on Computing, Volume 43, Number 4, 2014
In a classic optimization problem, the complete input data is assumed to be known to the algorithm. This assumption may not be true anymore in optimization problems motivated by the Internet, where part of the input data is private knowledge of independent selfish agents. The goal of algorithmic mechanism design is to provide (in polynomial time) a solution to the optimization problem and a set of incentives for the agents such that disclosing the input data is a dominant strategy for the agents. In the case of NP-hard problems, the solution computed should also be a good approximation of the optimum. In this paper we focus on mechanism design for multiobjective optimization problems. In this setting we are given a main objective function and a set of secondary objectives which are modeled via budget constraints. Multiobjective optimization is a natural setting for mechanism design, as many economic choices ask for a compromise between different, partially conflicting goals. The main contribution of this paper is showing that two of the main tools for the design of approximation algorithms for multiobjective optimization problems, namely, approximate Pareto sets and Lagrangian relaxation, can lead to truthful approximation schemes. By exploiting the method of approximate Pareto sets, we devise truthful deterministic and randomized multicriteria fully polynomial-time approximation schemes (FPTASs) for multiobjective optimization problems whose exact version admits a pseudopolynomial-time algorithm, as, for instance, the multibudgeted versions of minimum spanning tree, shortest path, maximum (perfect) matching, and matroid intersection. Our construction also applies to multidimensional knapsack and multiunit combinatorial auctions. Our FPTASs compute a (1+ε)-approximate solution violating each budget constraint by a factor (1+ε). When feasible solutions induce an independence system, i.e., when subsets of feasible solutions are feasible as well, we present a PTAS (not violating any constraint), which combines the approach above with a novel monotone way to guess the heaviest elements in the optimum solution. Finally, we present a universally truthful Las Vegas PTAS for minimum spanning tree with a single budget constraint, where one wants to compute a minimum cost spanning tree whose length is at most a given value L. This result is based on the Lagrangian relaxation method, in combination with our monotone guessing step and with a random perturbation step (ensuring low expected running time). This result can be derandomized in the case of integral lengths. All the mentioned results match the best known approximation ratios, which are, however, obtained by nontruthful algorithms.
P. Gawrychowski, A. Jeż, and Ł. Jeż
Validating the Knuth-Morris-Pratt Failure Function, Fast and Online
Theory of Computing Systems, Volume 54, Number 2, 2014
Let π′_w denote the failure function of the Knuth-Morris-Pratt algorithm for a word w. In this paper we study the following problem: given an integer array A′[1..n], is there a word w over an arbitrary alphabet Σ such that A′[i] = π′_w[i] for all i? Moreover, what is the minimum cardinality of Σ required? We give an elementary and self-contained O(n log n) time algorithm for this problem, thus improving the previously known solution (Duval et al. in Conference in honor of Donald E. Knuth, 2007), which had no polynomial time bound. Using both deeper combinatorial insight into the structure of π′ and advanced algorithmic tools, we further improve the running time to O(n).
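For reference, the failure function itself is easy to compute from a given word; a minimal implementation is sketched below (the validation problem studied in the paper is the inverse one: deciding whether a given array can arise this way, and over how small an alphabet).

```python
def kmp_failure(w):
    """Failure (border) function of the Knuth-Morris-Pratt algorithm:
    pi[i] = length of the longest proper prefix of w[:i+1] that is also
    a suffix of it."""
    n = len(w)
    pi = [0] * n
    k = 0
    for i in range(1, n):
        while k > 0 and w[i] != w[k]:
            k = pi[k - 1]          # fall back to the next shorter border
        if w[i] == w[k]:
            k += 1
        pi[i] = k
    return pi

if __name__ == "__main__":
    print(kmp_failure("abacaba"))   # [0, 0, 1, 0, 1, 2, 3]
```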
L. Becchetti, A. Clementi, E. Natale, F. Pasquale, R. Silvestri, and L. Trevisan
Simple Dynamics for Plurality Consensus
Proc. 26th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA 2014), 2014
We study a Majority Consensus process in which each of n anonymous agents of a communication network supports an initial opinion (a color chosen from a finite set [k]) and, at every time step, he can revise his color according to a random sample of neighbors. It is assumed that the initial color configuration has a sufficiently large bias s towards a fixed majority color, that is, the number of nodes supporting the majority color exceeds the number of nodes supporting any other color by s additional nodes. The goal (of the agents) is to let the process converge to the stable configuration where all nodes support the majority color. We consider a basic model in which the network is a clique and the update rule (called here the 3-majority dynamics) of the process is that each agent looks at the colors of three random neighbors and then applies the majority rule (breaking ties uniformly).
We prove that with high probability the process converges in time logarithmic in the number of nodes and linear in the number of colors. A natural question is whether looking at more (than three) random neighbors can significantly speed up the process. We provide a negative answer to this question: in particular, we show that samples of polylogarithmic size can speed up the process by a polylogarithmic factor only.
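The dynamics is simple enough to simulate directly; the sketch below implements the 3-majority update on the clique as described (sampling with replacement and breaking ties by picking one of the three samples uniformly; the initial color counts are illustrative).

```python
import random
from collections import Counter

def three_majority_round(colors, rng):
    """One synchronous round of the 3-majority dynamics on the clique:
    every node looks at the colors of three nodes chosen u.a.r. (with
    replacement) and adopts the majority among them, breaking ties by
    picking one of the three samples uniformly at random."""
    n = len(colors)
    new = [0] * n
    for u in range(n):
        sample = [colors[rng.randrange(n)] for _ in range(3)]
        color, cnt = Counter(sample).most_common(1)[0]
        new[u] = color if cnt >= 2 else rng.choice(sample)
    return new

def simulate(counts=(3600, 3300, 3100), seed=0, max_rounds=10000):
    rng = random.Random(seed)
    colors = [c for c, cnt in enumerate(counts) for _ in range(cnt)]
    rng.shuffle(colors)
    for t in range(max_rounds):
        colors = three_majority_round(colors, rng)
        if len(set(colors)) == 1:
            return t + 1, colors[0]
    return None, None

if __name__ == "__main__":
    rounds, winner = simulate()
    print("consensus on color", winner, "after", rounds, "rounds")
```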
M. Adamczyk, P. Sankowski, and Q. Zhang
Efficiency of Truthful and Symmetric Mechanisms in One-Sided Matching
Proc. 7th International Symposium on Algorithmic Game Theory (SAGT 2014), 2014
We study the efficiency (in terms of social welfare) of truthful and symmetric mechanisms in one-sided matching problems with dichotomous preferences and normalized von Neumann-Morgenstern preferences. We are particularly interested in the well-known Random Serial Dictatorship mechanism. For dichotomous preferences, we first show that truthful, symmetric and optimal mechanisms exist if intractable mechanisms are allowed. We then provide a connection to online bipartite matching. Using this connection, it is possible to design truthful, symmetric and tractable mechanisms that extract a (1−1/e)-fraction, or even 69%, of the maximum social welfare. Furthermore, we show that Random Serial Dictatorship always returns an assignment in which the expected social welfare is at least a third of the maximum social welfare. For normalized von Neumann-Morgenstern preferences, we show that Random Serial Dictatorship always returns an assignment in which the expected social welfare is at least ν(opt)²/(en), where ν(opt) is the maximum social welfare and n is the number of both agents and items. On the hardness side, we show that no truthful mechanism can achieve a social welfare better than O(ν(opt)²/n).
M. Adamczyk, M. Sviridenko, and J. Ward
Submodular Stochastic Probing on Matroids
Proc. 31st International Symposium on Theoretical Aspects of Computer Science (STACS 2014), 2014
In a stochastic probing problem we are given a universe E, where each element e ∈ E is active independently with probability p_e ∈ [0,1], and only a probe of e can tell us whether it is active or not. On this universe we execute a process that probes elements one by one: if a probed element is active, then we have to include it in the solution, which we gradually construct. Throughout the process we need to obey inner constraints on the set of elements taken into the solution, and outer constraints on the set of all probed elements. This abstract model was presented by Gupta and Nagarajan (IPCO '13), and provides a unified view of a number of problems. Thus far all the results in this general framework pertain only to the case in which we are maximizing a linear objective function of the successfully probed elements. In this paper we generalize the stochastic probing problem by considering a monotone submodular objective function. For any T ∈ (0,1], we give a (1−e^(−T))/(T(k_in+k_out)+1)-approximation for the case in which we are given k_in ≥ 0 matroids as inner constraints and k_out ≥ 1 matroids as outer constraints. In the case that k = k_in + k_out > 1, we show that the optimal value for T is given by T = −1 − 1/k − W₋₁(−e^(−1−1/k)) ≈ √(2/k) − 1/3, where W is the Lambert W function.
There are two main ingredients behind this result. First is a previously unpublished stronger bound on the continuous greedy algorithm due to Vondrák. Second is a rounding procedure that also allows us to obtain an improved 1/(k_in+k_out)-approximation for linear objective functions.
A. Anagnostopoulos, F. Grandoni, S. Leonardi, and A. Wiese
A Mazing 2+ε Approximation for Unsplittable Flow on a Path [pdf]
Proc. 25th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2014), 2014
We study the unsplittable flow on a path problem (UFP), which arises naturally in many applications such as bandwidth allocation, job scheduling, and caching. Here we are given a path with non-negative edge capacities and a set of tasks, which are characterized by a subpath, a demand, and a profit. The goal is to find the most profitable subset of tasks whose total demand does not violate the edge capacities. Not surprisingly, this problem has received a lot of attention in the research community.
If the demand of each task is at most a small enough fraction δ of the capacity along its subpath (δ-small tasks), then it has been known for a long time [Chekuri et al., ICALP 2003] how to compute a solution of value arbitrarily close to the optimum via LP rounding. However, much remains unknown for the complementary case, that is, when the demand of each task is at least some fraction δ>0 of the smallest capacity of its subpath (δ-large tasks). For this setting a constant factor approximation, improving on an earlier logarithmic approximation, was found only recently [Bonsma et al., FOCS 2011].
In this paper we present a PTAS for δ-large tasks, for any constant δ>0. Key to this result is a complex geometrically inspired dynamic program. Each task is represented as a segment underneath the capacity curve, and we identify a proper maze-like structure so that each corridor of the maze is crossed by only O(1) tasks in the optimal solution. The maze has a tree topology, which guides our dynamic program. Our result implies a 2+ε approximation for UFP, for any constant ε>0, improving on the previously best 7+ε approximation by Bonsma et al. We remark that our improved approximation algorithm matches the best known approximation ratio for the considerably easier special case of uniform edge capacities.
M. Bieńkowski, J. Byrka, M. Chrobak, Ł. Jeż, D. Nogneng, and J. Sgall
Better Approximation Bounds for the Joint Replenishment Problem
Proc. 25th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2014), 2014
The Joint Replenishment Problem (JRP) deals with optimizing shipments of goods from a supplier to retailers through a shared warehouse. Each shipment involves transporting goods from the supplier to the warehouse, at a fixed cost C, followed by a redistribution of these goods from the warehouse to the retailers that ordered them, where transporting goods to a retailer ρ has a fixed cost c_ρ. In addition, we incur waiting costs for each order, possibly an arbitrary non-decreasing function of time, different for each order. The objective is to minimize the overall cost of satisfying all orders, namely the sum of all shipping and waiting costs.
JRP has been well studied in Operations Research and, more recently, in the area of approximation algorithms. For arbitrary waiting cost functions, the best known approximation ratio is 1.8. This ratio can be reduced to ≈1.574 for the JRP-D model, where there is no cost for waiting but orders have deadlines. As for hardness results, it is known that the problem is APX-hard and that the natural linear program for JRP has integrality gap at least 1.245. Both results hold even for JRP-D. In the online scenario, the best lower and upper bounds on the competitive ratio are 2.64 and 3, respectively. The lower bound of 2.64 applies even to the restricted version of JRP, denoted JRP-L, where the waiting cost function is linear.
We provide several new approximation results for JRP. In the offline case, we give an algorithm with ratio ≈1.791, breaking the barrier of 1.8. We also show that the integrality gap of the linear program for JRP-L is at least 12/11 ≈1.09. In the online case, we show a lower bound of ≈2.754 on the competitive ratio for JRP-L (and thus JRP as well), improving the previous bound of 2.64. We also study the online version of JRP-D, for which we prove that the optimal competitive ratio is 2.
F. Grandoni, A. Gupta, S. Leonardi, P. Miettinen, P. Sankowski, and M. Singh
Set Covering with Our Eyes Closed
SIAM Journal on Computing, Volume 42, Number 3, 2013
Given a universe U of n elements and a weighted collection C of m subsets of U, the universal set cover problem is to a priori map each element u ∈ U to a set S(u) ∈ C containing u, so that any X ⊆ U is covered by S(X) = ∪_{u∈X} S(u). The aim is to find a mapping such that the cost of S(X) is as close as possible to the optimal set-cover cost for X. (Such problems are also called oblivious or a priori optimization problems.) Unfortunately, for every universal mapping, the cost of S(X) can be Ω(√n) times larger than optimal if the set X is adversarially chosen.
In this paper we study the performance on average, when X is a set of randomly chosen elements from the universe: we show how to efficiently find a universal map whose expected cost is O(log mn) times the expected optimal cost. In fact, we give a slightly improved analysis and show that this is the best possible. We generalize these ideas to weighted set cover and show similar guarantees for (non-metric) facility location, where we have to balance the facility opening cost with the cost of connecting clients to the facilities. We show applications of our results to universal multi-cut and disc-covering problems, and show how all these universal mappings give us stochastic online algorithms with the same competitive factors.
Ł. Jeż
A Universal Randomized Packet Scheduling Algorithm
Algorithmica, Volume 67, Number 4, 2013
We give a memoryless scale-invariant randomized algorithm ReMix for Packet Scheduling that is e/(e−1)-competitive against an adaptive adversary. ReMix unifies most of the previously known randomized algorithms, and its general analysis yields improved performance guarantees for several restricted variants, including the s-bounded instances. In particular, ReMix attains the optimal competitive ratio of 4/3 on 2-bounded instances.
Our results are applicable to a more general problem, called Item Collection, in which only the relative order between packets’ deadlines is known. ReMix is the optimal memoryless randomized algorithm against adaptive adversary for that problem.
M. Bienkowski, M. Chrobak, C. Dürr, M. Hurand, A. Jeż, Ł. Jeż, and G. Stachowiak
Collecting Weighted Items From a Dynamic Queue
Algorithmica, Volume 65, Number 1, 2013
We consider online competitive algorithms for the problem of collecting weighted items from a dynamic queue S. The content of S varies over time. An update to S can occur between any two consecutive time steps, and it consists in deleting any number of items at the front of S and inserting other items into arbitrary locations in S. At each time step we are allowed to collect one item in S. The objective is to maximize the total weight of collected items. This is a generalization of bounded-delay packet scheduling (also known as buffer management). We present several upper and lower bounds on the competitive ratio for the general case and for some restricted variants of this problem.
M. Bienkowski, M. Chrobak, C. Dürr, M. Hurand, A. Jeż, Ł. Jeż, and G. Stachowiak
A φ-Competitive Algorithm for Collecting Items with Increasing Weights From a Dynamic Queue
Theoretical Computer Science, Volume 475, 2013
The bounded-delay packet scheduling (or buffer management) problem is to schedule transmissions of packets arriving in a buffer of a network link. Each packet has a deadline and a weight associated with it. The objective is to maximize the weight of packets that are transmitted before their deadlines, assuming that only one packet can be transmitted in one time step. Online packet scheduling algorithms have been extensively studied. It is known that no online algorithm can achieve a competitive ratio better than φ≈1.618 (the golden ratio), while the currently best upper bound on the competitive ratio is 2√2−1≈1.828. Closing the gap between these bounds remains a major open problem.
The above mentioned lower bound of φ uses instances where item weights increase exponentially over time. In fact, all lower bounds for various versions of buffer management problems involve instances of this type. In this paper, we design an online algorithm for packet scheduling with competitive ratio φ when packet weights are increasing, thus matching this lower bound. Our algorithm applies, in fact, to a much more general version of packet scheduling, where only the relative order of the deadlines is known, not their exact values.
M. Chrobak, Ł. Jeż, and J. Sgall
Better Bounds for Incremental Frequency Allocation in Bipartite Graphs
Theoretical Computer Science, Volume 514, 2013
We study frequency allocation in wireless networks. A wireless network is modeled by an undirected graph, with vertices corresponding to cells. In each vertex, we have a certain number of requests, and each of those requests must be assigned a different frequency. Edges represent conflicts between cells, meaning that frequencies in adjacent vertices must be different as well. The objective is to minimize the total number of used frequencies.
The offline version of the problem is known to be NP-hard. In the incremental version, requests for frequencies arrive over time and the algorithm is required to assign a frequency to a request as soon as it arrives. Competitive incremental algorithms have been studied for several classes of graphs. For paths, the optimal (asymptotic) ratio is known to be 4/3, while for hexagonal-cell graphs it is between 1.5 and 1.9126. For ξ-colorable graphs, the ratio of (ξ+1)/2 can be achieved.
In this paper, we prove nearly tight bounds on the asymptotic competitive ratio for bipartite graphs, showing that it is between 1.428 and 1.433. This improves the previous lower bound of 4/3 and upper bound of 1.5. Our proofs are based on reducing the incremental problem to a purely combinatorial (equivalent) problem of constructing set families with certain intersection properties.
Ł. Jeż, J. Schwartz, J. Sgall, and J. Békési
Lower Bounds for Online Makespan Minimization on a Small Number of Related Machines
Journal of Scheduling, Volume 16, Number 5, 2013
In online makespan minimization, jobs, each characterized by its processing time, arrive one by one, and each has to be assigned to one of the m uniformly related machines. The goal is to minimize the length of the schedule. We prove new combinatorial lower bounds for m=4 and m=5, and computer-assisted lower bounds for m≤11.
L. Becchetti, V. Bonifaci, M. Dirnberger, A. Karrenbauer, and K. Mehlhorn
Physarum Can Compute Shortest Paths: Convergence Proofs and Complexity Bounds [pdf]
Proc. 40th International Colloquium on Automata, Languages and Programming (ICALP 2013), 2013
Physarum polycephalum is a slime mold that is apparently able to solve shortest path problems. A mathematical model for the slime's behavior in the form of a coupled system of differential equations was proposed by Tero, Kobayashi and Nakagaki [TKN07]. We prove that a discretization of the model (Euler integration) computes a (1+ε)-approximation of the shortest path in O(mL(log n + log L)/ε³) iterations, with arithmetic on numbers of O(log(nL/ε)) bits; here, n and m are the number of nodes and edges of the graph, respectively, and L is the largest length of an edge. We also obtain two results for a directed Physarum model proposed by Ito et al. [IJNT11]: convergence in the general, nonuniform case and convergence and complexity bounds for the discretization of the uniform case.
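For intuition, a rough sketch of the discretization in question (an Euler step of the Tero-Kobayashi-Nakagaki dynamics dD_e/dt = |Q_e| − D_e, not the paper's code) is shown below on a toy graph: at every step the unit s-t electrical flow is obtained by solving the graph Laplacian system induced by the current conductances, and the conductances along the shortest path grow while the others decay. The example graph and step size are arbitrary.

```python
import numpy as np

def physarum_step(num_nodes, edges, lengths, D, s, t, h=0.1):
    """One Euler step of dD_e/dt = |Q_e| - D_e, where Q is the unit s-t
    electrical flow for edge conductances D_e / L_e."""
    L = np.zeros((num_nodes, num_nodes))
    for (u, v), l, d in zip(edges, lengths, D):
        c = d / l
        L[u, u] += c; L[v, v] += c
        L[u, v] -= c; L[v, u] -= c
    b = np.zeros(num_nodes)
    b[s], b[t] = 1.0, -1.0
    keep = [i for i in range(num_nodes) if i != t]   # ground node t
    p = np.zeros(num_nodes)
    p[keep] = np.linalg.solve(L[np.ix_(keep, keep)], b[keep])
    Q = np.array([(D[i] / lengths[i]) * (p[u] - p[v])
                  for i, (u, v) in enumerate(edges)])
    return D + h * (np.abs(Q) - D)

if __name__ == "__main__":
    # Two s-t routes: a short path 0-1-2 (total length 2) and a direct edge of length 3.
    edges = [(0, 1), (1, 2), (0, 2)]
    lengths = np.array([1.0, 1.0, 3.0])
    D = np.ones(len(edges))
    for _ in range(500):
        D = physarum_step(3, edges, lengths, D, s=0, t=2)
    print("conductances (short-path edges should dominate):", np.round(D, 3))
```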
A. Anagnostopoulos, F. Grandoni, S. Leonardi, and A. Wiese
Constant Integrality Gap LP Formulations of Unsplittable Flow on a Path [pdf]
Proc. 16th Conference on Integer Programming and Combinatorial Optimization (IPCO 2013), 2013
The Unsplittable Flow Problem on a Path (UFP) is a core problem in many important settings such as network flows, bandwidth allocation, resource-constrained scheduling, and interval packing. We are given a path with capacities on the edges and a set of tasks, each task having a demand, a profit, and a source and a destination vertex on the path. The goal is to compute a subset of tasks of maximum profit that does not violate the edge capacities.
In practical applications generic approaches such as integer programming (IP) methods are desirable. Unfortunately, no IP-formulation is known for the problem whose LP-relaxation has an integrality gap that is provably constant. For the unweighted case, we show that adding a few constraints to the standard LP of the problem is sufficient to make the integrality gap drop from Ω(n) to O(1). This positively answers an open question in [Chekuri et al., APPROX 2009].
For the general (weighted) case, we present an extended formulation with integrality gap bounded by 7+ε. This matches the best known approximation factor for the problem [Bonsma et al., FOCS 2011]. This result exploits crucially a technique for embedding dynamic programs into linear programs. We believe that this method could be useful to strengthen LP-formulations for other problems as well and might eventually speed up computations due to stronger problem formulations.
M. Pilipczuk, M. Pilipczuk, P. Sankowski, and E. J. van Leeuwen
Subexponential-Time Parameterized Algorithm for Steiner Tree on Planar Graphs
Proc. 30th International Symposium on Theoretical Aspects of Computer Science (STACS 2013), 2013
The well-known bidimensionality theory provides a method for designing fast, subexponential-time parameterized algorithms for a vast number of NP-hard problems on sparse graph classes such as planar graphs, bounded genus graphs, or, more generally, graphs with a fixed excluded minor. However, in order to apply the bidimensionality framework the considered problem needs to fulfill a special density property. Some well-known problems do not have this property, unfortunately, with probably the most prominent and important example being the Steiner Tree problem. Hence the question whether a subexponential-time parameterized algorithm for Steiner Tree on planar graphs exists has remained open. In this paper, we answer this question positively and develop an algorithm running in O(2^(O((k log k)^(2/3))) n) time and polynomial space, where k is the size of the Steiner tree and n is the number of vertices of the graph. Our algorithm does not rely on tools from bidimensionality theory or graph minors theory, apart from Baker's classical approach. Instead, we introduce new tools and concepts to the study of the parameterized complexity of problems on sparse graphs.
M. Cygan and Ł. Jeż
Online Knapsack Revisited
Proc. 11th Workshop on Approximation and Online Algorithms (WAOA 2013), 2013
We investigate the online variant of the Multiple Knapsack problem: an algorithm is to pack items, of arbitrary sizes and profits, in k knapsacks (bins) without exceeding the capacity of any bin. We study two objective functions: the sum and the maximum of profits over all bins. Both have been studied before in restricted variants of our problem: the sum in Dual Bin Packing [1], and the maximum in Removable Knapsack [7,8]. Following these, we study two variants, depending on whether the algorithm is allowed to remove (forever) items from its bins or not, and two special cases where the profit of an item is a function of its size, in addition to the general setting.
We study both deterministic and randomized algorithms; for the latter, we consider both the oblivious and the adaptive adversary model. We classify each variant as either admitting O(1)-competitive algorithms or not. We develop simple O(1)-competitive algorithms for some cases of the max-objective variant believed to be infeasible because only 1-bin deterministic algorithms were considered for them before.
M. Bieńkowski, J. Byrka, M. Chrobak, Ł. Jeż, J. Sgall, and G. Stachowiak
Online Control Message Aggregation in Chain Networks
Proc. 13th International Symposium on Algorithms and Data Structures (WADS 2013), 2013
In the Control Message Aggregation (CMA) problem, control packets are generated over time at the nodes of a tree T and need to be transmitted to the root of T. To optimize the overall cost, these transmissions can be delayed and different packets can be aggregated, that is, a single transmission can include all packets from a subtree rooted at the root of T. The cost of this transmission is then equal to the total edge length of this subtree, independently of the number of packets that are sent. A sequence of transmissions that transmits all packets is called a schedule. The objective is to compute a schedule with minimum cost, where the cost of a schedule is the sum of all the transmission costs and delay costs of all packets. The problem is known to be NP-hard, even for trees of depth 2. In the online scenario, it is an open problem whether a constant-competitive algorithm exists.
We address the special case of the problem when T is a chain network. For the online case, we present a 5-competitive algorithm and give a lower bound of 2+φ≈3.618, improving the previous bounds of 8 and 2, respectively. Furthermore, for the offline version, we give a polynomial-time algorithm that computes the optimum solution.
C. Dürr, Ł. Jeż, and O. C. Vásquez
Mechanism Design for Aggregating Energy Consumption and Quality of Service in Speed Scaling Scheduling
Proc. 9th Conference on Web and Internet Economics (WINE 2013), 2013
We consider a strategic game, where players submit jobs to a machine that executes all jobs in a way that minimizes energy while respecting the jobs' deadlines. The energy consumption is then charged to the players in some way. Each player wants to minimize the sum of that charge and of their job’s deadline multiplied by a priority weight. Two charging schemes are studied, the proportional cost share which does not always admit pure Nash equilibria, and the marginal cost share, which does always admit pure Nash equilibria, at the price of overcharging by a constant factor.
A. Anagnostopoulos, A. Dasgupta, and R. Kumar
A Constant-Factor Approximation Algorithm for Co-Clustering [pdf]
Theory of Computing, Volume 8, Number 26, 2012
Co-clustering is the simultaneous partitioning of the rows and columns of a matrix such that the blocks induced by the row/column partitions are good clusters. Motivated by several applications in text mining, market-basket analysis, and bioinformatics, this problem has attracted a lot of attention in the past few years. Unfortunately, to date, most of the algorithmic work on this problem has been heuristic in nature.
In this work we obtain the first approximation algorithms for the co-clustering problem. Our algorithms are simple and provide constant-factor approximations to the optimum. We also show that co-clustering is NP-hard, thereby complementing our algorithmic result.
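One simple way to build a co-clustering, shown below purely as an illustration (not the paper's algorithm), is to cluster the rows and the columns separately and combine the two partitions into blocks; here plain k-means serves as the one-sided clustering subroutine on a synthetic matrix with planted block structure.

```python
import numpy as np

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means with farthest-point initialization (enough for
    an illustration)."""
    rng = np.random.default_rng(seed)
    centers = [points[rng.integers(len(points))]]
    for _ in range(k - 1):
        d = np.min([((points - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(points[int(d.argmax())])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels

def co_cluster(A, k_rows, k_cols):
    """Illustrative co-clustering: cluster rows and, independently, columns;
    the Cartesian product of the two partitions induces the blocks, each
    summarized here by its mean value."""
    row_labels = kmeans(A, k_rows)
    col_labels = kmeans(A.T, k_cols)
    means = np.zeros((k_rows, k_cols))
    for r in range(k_rows):
        for c in range(k_cols):
            block = A[np.ix_(row_labels == r, col_labels == c)]
            means[r, c] = block.mean() if block.size else 0.0
    return row_labels, col_labels, means

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic matrix with a planted 2x2 block structure plus noise.
    blocks = np.array([[5.0, 0.0], [0.0, 5.0]])
    A = blocks[np.repeat([0, 1], 20)][:, np.repeat([0, 1], 15)]
    A = A + 0.1 * rng.standard_normal(A.shape)
    rows, cols, means = co_cluster(A, 2, 2)
    print("block means:\n", np.round(means, 2))
```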
A. Anagnostopoulos, R. Kumar, M. Mahdian, E. Upfal, and F. Vandin
Algorithms on Evolving Graphs [pdf]
Proc. 3rd ACM International Conference on Innovations in Theoretical Computer Science (ITCS 2012), 2012
Motivated by applications that concern graphs that are evolving and massive in nature, we define a new general framework for computing with such graphs. In our framework, the graph changes over time and an algorithm can only track these changes by explicitly probing the graph. This framework captures the inherent tradeoff between the complexity of maintaining an up-to-date view of the graph and the quality of results computed with the available view. We apply this framework to two classical graph connectivity problems, namely, path connectivity and minimum spanning trees, and obtain efficient algorithms.
A. Anagnostopoulos, R. Kumar, M. Mahdian, and E. Upfal
Sorting and Selection on Dynamic Data [pdf]
Theoretical Computer Science, Volume 412, Number 24, 2011
We formulate and study a new computational model for dynamic data. In this model, the data changes gradually and the goal of an algorithm is to compute the solution to some problem on the data at each time step, under the constraint that it only has limited access to the data each time. As the data is constantly changing and the algorithm might be unaware of these changes, it cannot be expected to always output the exact right solution; we are interested in algorithms that guarantee to output an approximate solution. In particular, we focus on the fundamental problems of sorting and selection, where the true ordering of the elements changes slowly. We provide algorithms with performance close to the optimal in expectation and with high probability.
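To make the model concrete, the toy simulation below (an illustrative local-repair strategy under an assumed drift model of random adjacent swaps, not necessarily the algorithm analyzed in the paper) maintains an estimated order using one comparison per step and tracks its Kendall-tau distance from the true, slowly changing order.

```python
import random

def simulate(n=200, steps=20000, probes_per_step=1, seed=0):
    """Toy dynamic-data sorting simulation: the true ranking drifts by one
    random adjacent swap per step, while the algorithm may issue only
    `probes_per_step` comparisons (answered w.r.t. the *current* true order)
    and repairs its estimate locally. Illustrative strategy only."""
    rng = random.Random(seed)
    true_order = list(range(n))                 # position -> element
    estimate = list(range(n))
    rank = {e: i for i, e in enumerate(true_order)}

    def compare(a, b):                          # oracle w.r.t. the current true order
        return rank[a] < rank[b]

    snapshots = []
    for t in range(steps):
        # Nature: swap one random adjacent pair in the true order.
        i = rng.randrange(n - 1)
        a, b = true_order[i], true_order[i + 1]
        true_order[i], true_order[i + 1] = b, a
        rank[a], rank[b] = rank[b], rank[a]
        # Algorithm: probe random adjacent pairs of its estimate and repair.
        for _ in range(probes_per_step):
            j = rng.randrange(n - 1)
            x, y = estimate[j], estimate[j + 1]
            if not compare(x, y):
                estimate[j], estimate[j + 1] = y, x
        if t % 5000 == 0:                       # Kendall-tau distance snapshot
            snapshots.append(sum(1 for p in range(n) for q in range(p + 1, n)
                                 if compare(estimate[q], estimate[p])))
    return snapshots

if __name__ == "__main__":
    print("Kendall-tau distance snapshots:", simulate())
```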
L. Becchetti, I. Bordino, S. Leonardi, and A. Rosen
Fully Decentralized Computation of Aggregates over Data Streams [pdf]
ACM SIGKDD Explorations Newsletter, Volume 12, Number 2, 2011
In several emerging applications, data is collected in massive streams at several distributed points of observation. A basic and challenging task is to allow every node to monitor a neighbourhood of interest by issuing continuous aggregate queries on the streams observed in its vicinity. This class of algorithms is fully decentralized and diffusive in nature: collecting all data at a few central nodes of the network is infeasible in networks of low-capability devices or in the presence of massive data sets.
The main difficulty in designing diffusive algorithms is to cope with duplicate detections. These arise both from the observation of the same event at several nodes of the network and from the receipt of the same aggregated information along multiple paths of diffusion. In this paper, we consider fully decentralized algorithms that answer locally continuous aggregate queries on the number of distinct events, the total number of events, and the second frequency moment in the scenario outlined above. The proposed algorithms use sublinear space at every node, either in the worst case or on realistic distributions. We also propose strategies that minimize the communication needed to update the aggregates when new events are observed. We experimentally evaluate the efficiency and accuracy of our algorithms on realistic simulated scenarios.
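A standard building block for duplicate-insensitive, diffusive aggregation of distinct counts is a sketch whose update and merge operations are idempotent; the toy Flajolet-Martin-style counter below (an illustration of the general technique, not the paper's data structure) can be merged along arbitrary diffusion paths without double counting.

```python
import hashlib
import random

class FMSketch:
    """Toy Flajolet-Martin distinct-count sketch. Because `add` and `merge`
    only take bitwise ORs, re-observing the same event or merging copies
    received along multiple paths never inflates the registers."""
    def __init__(self, num_registers=64):
        self.registers = [0] * num_registers

    def _hash(self, item, reg):
        h = hashlib.blake2b(f"{reg}:{item}".encode(), digest_size=8).digest()
        return int.from_bytes(h, "big")

    def add(self, item):
        for reg in range(len(self.registers)):
            x = self._hash(item, reg)
            lsb = (x & -x).bit_length() - 1 if x else 63   # lowest set bit index
            self.registers[reg] |= (1 << lsb)

    def merge(self, other):
        for i, r in enumerate(other.registers):
            self.registers[i] |= r

    def estimate(self):
        # Average position of the lowest unset bit, then 2^R / 0.77351.
        total = 0
        for r in self.registers:
            b = 0
            while r & (1 << b):
                b += 1
            total += b
        return (2 ** (total / len(self.registers))) / 0.77351

if __name__ == "__main__":
    rng = random.Random(0)
    events = [f"event-{rng.randrange(5000)}" for _ in range(20000)]  # many duplicates
    node_a, node_b = FMSketch(), FMSketch()
    for e in events[:12000]:
        node_a.add(e)
    for e in events[8000:]:       # overlapping observations at a second node
        node_b.add(e)
    node_a.merge(node_b)          # duplicate-insensitive aggregation
    print("true distinct:", len(set(events)), "estimate:", round(node_a.estimate()))
```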
A. Anagnostopoulos, L. Becchetti, I. Mele, S. Leonardi, and P. Sankowski
Stochastic Query Covering [pdf]
Proc. 4th ACM International Conference on Web Search and Data Mining (WSDM 2011), 2011
In this paper we introduce the problem of query covering as a means to efficiently cache query results. The general idea is to populate the cache with documents that contribute to the result pages of a large number of queries, as opposed to caching the top documents for each query. It turns out that the problem is hard and solving it requires knowledge of the structure of the queries and of the results space, as well as knowledge of the input query distribution. We formulate the problem under the framework of stochastic optimization; theoretically it can be seen as a stochastic universal version of set multicover. While the problem is NP-hard to solve exactly, we show that for any distribution it can be approximated using a simple greedy approach. Our theoretical findings are complemented by an experimental evaluation on real datasets, showing the feasibility and potential interest of query-covering approaches in practice.
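A minimal sketch of the greedy idea in this stochastic setting (illustrative code with assumed data structures and a simplified covering requirement of one cached result per query, not the paper's algorithm): sample queries from the query distribution and repeatedly cache the document that covers the most still-uncovered sampled queries.

```python
import random
from collections import defaultdict

def greedy_query_cover(results, query_dist, cache_size, num_samples=5000, seed=0):
    """Illustrative greedy for stochastic query covering.
    `results[q]` is the set of documents relevant to query q; a sampled query
    counts as covered once at least one of its result documents is cached.
    Queries are sampled i.i.d. from `query_dist` (query -> weight)."""
    rng = random.Random(seed)
    queries, weights = zip(*query_dist.items())
    sample = rng.choices(queries, weights=weights, k=num_samples)

    covers = defaultdict(set)            # doc -> indices of sampled queries it covers
    for i, q in enumerate(sample):
        for d in results[q]:
            covers[d].add(i)

    cache, covered = set(), set()
    for _ in range(cache_size):
        best = max(covers, key=lambda d: len(covers[d] - covered), default=None)
        if best is None or not (covers[best] - covered):
            break
        cache.add(best)
        covered |= covers.pop(best)
    return cache, len(covered) / num_samples

if __name__ == "__main__":
    rng = random.Random(1)
    docs = [f"doc{i}" for i in range(500)]
    results = {f"q{i}": set(rng.sample(docs, 10)) for i in range(200)}
    dist = {q: 1.0 / (i + 1) for i, q in enumerate(results)}   # Zipf-like popularity
    cache, frac = greedy_query_cover(results, dist, cache_size=50)
    print(f"cached {len(cache)} docs; fraction of sampled queries covered: {frac:.2f}")
```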
A. Anagnostopoulos, C. Dombry, N. Guillotin-Plantard, I. Kontoyiannis, and E. Upfal
Stochastic Analysis of the k-Server Problem on the Circle [pdf]
Proc. 21st International Meeting on Probabilistic, Combinatorial and Asymptotic Methods for the Analysis of Algorithms (AofA 2010), 2010
We consider a stochastic version of the k-server problem in which k servers move on a circle to satisfy stochastically generated requests. The requests are independent and identically distributed according to an arbitrary distribution on a circle, which is either discrete or continuous. The cost of serving a request is the distance that a server needs to move to reach the request. The goal is to minimize the steady-state expected cost induced by the requests. We study the performance of a greedy strategy, focusing, in particular, on its convergence properties and the interplay between the discrete and continuous versions of the process.
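The greedy strategy is easy to simulate; the sketch below estimates its steady-state expected cost on the unit circle, with the uniform request distribution chosen purely as an example.

```python
import random

def circle_dist(a, b):
    """Distance between two points on the unit circle [0, 1)."""
    d = abs(a - b)
    return min(d, 1.0 - d)

def greedy_k_server(k=5, requests=100000, seed=0):
    """Simulate the greedy strategy for the stochastic k-server problem on
    the circle: each request is drawn i.i.d. (here: uniformly, an arbitrary
    choice for the example) and is served by the closest server, which then
    moves to the request point. Returns the average per-request cost."""
    rng = random.Random(seed)
    servers = [rng.random() for _ in range(k)]
    total = 0.0
    for _ in range(requests):
        r = rng.random()                           # i.i.d. request on the circle
        i = min(range(k), key=lambda j: circle_dist(servers[j], r))
        total += circle_dist(servers[i], r)
        servers[i] = r                             # greedy: move the closest server
    return total / requests

if __name__ == "__main__":
    print("average steady-state cost (uniform requests):", round(greedy_k_server(), 4))
```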
A. Anagnostopoulos, R. Kumar, M. Mahdian, and E. Upfal
Sort Me if You Can: How to Sort Dynamic Data [pdf]
Proc. 36th International Colloquium on Automata, Languages and Programming (ICALP 2009), 2009
We formulate and study a new computational model for dynamic data. In this model the data changes gradually and the goal of an algorithm is to compute the solution to some problem on the data at each time step, under the constraint that it only has a limited access to the data each time. As the data is constantly changing and the algorithm might be unaware of these changes, it cannot be expected to always output the exact right solution; we are interested in algorithms that guarantee to output an approximate solution. In particular, we focus on the fundamental problems of sorting and selection, where the true ordering of the elements changes slowly. We provide algorithms with performance close to the optimal in expectation and with high probability.
L. Becchetti and E. Koutsoupias
Competitive Analysis of Aggregate Max in Windowed Streaming [pdf]
Proc. 36th International Colloquium on Automata, Languages and Programming (ICALP 2009), 2009
We consider the problem of maintaining a fixed number k of items observed over a data stream, so as to optimize the maximum value over a fixed number n of recent observations. Unlike previous approaches, we use the competitive analysis framework and compare the performance of the online streaming algorithm against an optimal adversary that knows the entire sequence in advance. We consider the problem of maximizing the aggregate max, i.e., the sum of the values of the largest items in the algorithm's memory over the entire sequence. For this problem, we prove an asymptotically tight competitive ratio, achieved by a simple heuristic, called partition-greedy, that performs stream updates efficiently and has almost optimal performance. In contrast, we prove that the problem of maximizing, for every time t, the value maintained by the online algorithm in memory, is considerably harder: in particular, we show a tight competitive ratio that depends on the maximum value of the stream. We further prove negative results for the closely related problem of maintaining the aggregate minimum and for the generalized version of the aggregate max problem in which every item comes with an individual window.
L. Becchetti, P. Boldi, C. Castillo, and A. Gionis
Efficient Semi-Streaming Algorithms for Local Triangle Counting in Massive Graphs [pdf]
Proc. 14th International Conference on Knowledge Discovery and Data Mining (KDD 2008), 2008
In this paper we study the problem of local triangle counting in large graphs. Namely, given a large graph G=(V,E) we want to estimate as accurately as possible the number of triangles incident to every node v∈V in the graph. The problem of computing the global number of triangles in a graph has been considered before, but to our knowledge this is the first paper that addresses the problem of local triangle counting with a focus on the efficiency issues arising in massive graphs. The distribution of the local number of triangles and the related local clustering coefficient can be used in many interesting applications. For example, we show that the measures we compute can help to detect the presence of spamming activity in large-scale Web graphs, as well as to provide useful features to assess content quality in social networks.
For computing the local number of triangles we propose two approximation algorithms, which are based on the idea of min-wise independent permutations (Broder et al. 1998). Our algorithms operate in a semi-streaming fashion, using O(|V|) space in main memory and performing O(log|V|) sequential scans over the edges of the graph. The first algorithm we describe in this paper also uses O(|E|) space in external memory during computation, while the second algorithm uses only main memory. We present the theoretical analysis as well as experimental results in massive graphs demonstrating the practical efficiency of our approach.
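The core idea can be illustrated compactly: estimate, for every edge, the size of the intersection of its endpoints' neighborhoods from min-wise signatures, and sum over the edges incident to each node. The sketch below does this in memory on a small random graph (an illustration of the idea, not the semi-streaming implementation from the paper).

```python
import random
from collections import defaultdict

def minhash_signature(s, hash_seeds):
    """Min-wise signature of a set under a family of (seeded) hash functions."""
    return [min(hash((seed, x)) for x in s) for seed in hash_seeds]

def estimated_intersection(sig_a, sig_b, size_a, size_b):
    """Estimate |A ∩ B| via the Jaccard estimate J ≈ fraction of agreeing
    signature positions and the identity |A ∩ B| = J(|A| + |B|)/(1 + J)."""
    j = sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)
    return j * (size_a + size_b) / (1.0 + j)

def local_triangles(adj, num_hashes=200, seed=0):
    """Approximate, per node, the number of triangles it belongs to, using
    min-wise signatures of neighborhoods (illustration only)."""
    rng = random.Random(seed)
    hash_seeds = [rng.random() for _ in range(num_hashes)]
    sig = {v: minhash_signature(nbrs, hash_seeds) for v, nbrs in adj.items()}
    est = defaultdict(float)
    for v, nbrs in adj.items():
        for u in nbrs:
            est[v] += estimated_intersection(sig[v], sig[u], len(adj[v]), len(adj[u]))
        est[v] /= 2.0        # each triangle at v is counted via two of its edges
    return est

if __name__ == "__main__":
    rng = random.Random(1)
    n, p = 300, 0.05                      # illustrative G(n, p) random graph
    adj = defaultdict(set)
    for a in range(n):
        for b in range(a + 1, n):
            if rng.random() < p:
                adj[a].add(b)
                adj[b].add(a)
    exact = {v: sum(len(adj[v] & adj[u]) for u in adj[v]) / 2 for v in adj}
    approx = local_triangles(adj)
    v = max(exact, key=exact.get)
    print("node", v, "exact:", exact[v], "estimated:", round(approx[v], 1))
```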
L. Buriol, G. Frahling, S. Leonardi, and C. Sohler
Estimating Clustering Indexes in Data Streams
Proc. 15th Annual European Symposium on Algorithms (ESA 2007), 2007
We present random sampling algorithms that with probability at least 1−δ compute a (1±ε)-approximation of the clustering coefficient and of the number of bipartite clique subgraphs of a graph given as an incidence stream of edges. The space used by our algorithm to estimate the clustering coefficient is inversely related to the clustering coefficient of the network itself. The space used by our algorithm to compute the number of bipartite cliques K₃,₃ is proportional to the ratio between the number of K₁,₃ and K₃,₃ subgraphs in the graph.
Since the space complexity depends only on the structure of the input graph and not on the number of nodes, our algorithms scale very well with increasing graph size. Therefore they provide a basic tool to analyze the structure of dense clusters in large graphs and have many applications in the discovery of web communities, the analysis of the structure of large social networks and the probing of frequent patterns in large graphs.
We implemented both algorithms and evaluated their performance on networks from different application domains and of different sizes; the largest instance is a web graph consisting of more than 135 million nodes and 1 billion edges. Both algorithms compute accurate results in reasonable time on the tested instances.
D. Donato, S. Leonardi, and P. Tsaparas
Stability and Similarity of Link Analysis Ranking Algorithms
Internet Mathematics, Volume 3, Number 4, 2006
Recently, there has been a surge of research activity in the area of link analysis ranking, where hyperlink structures are used to determine the relative authority of webpages. One of the seminal works in this area is that of Kleinberg [Kleinberg 98], who proposed the HITS algorithm. In this paper, we undertake a theoretical analysis of the properties of the HITS algorithm on a broad class of random graphs. Working within the framework of Borodin et al. [Borodin et al. 05], we prove that, under some assumptions on this class, (a) the HITS algorithm is stable with high probability and (b) the HITS algorithm is similar to the InDegree heuristic that assigns to each node a weight proportional to the number of incoming links. We demonstrate that our results go through for the case in which the expected in-degrees of the graph follow a power law distribution. We also study experimentally the similarity between HITS and InDegree, and we investigate the general conditions under which the two algorithms are similar.
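To experiment with the similarity question, one can run power-iteration HITS and compare the resulting authority vector with plain in-degrees; the sketch below does this on a random directed graph with heterogeneous in-degree propensities (an illustrative setup, not the random graph model analyzed in the paper).

```python
import numpy as np

def hits(adj, iters=50):
    """Power-iteration HITS: hub/authority scores for the adjacency matrix
    `adj` (adj[i, j] = 1 if there is a link i -> j)."""
    n = adj.shape[0]
    auth = np.ones(n)
    hub = np.ones(n)
    for _ in range(iters):
        auth = adj.T @ hub
        auth /= np.linalg.norm(auth)
        hub = adj @ auth
        hub /= np.linalg.norm(hub)
    return auth, hub

def indegree(adj):
    """The InDegree heuristic: weight of a node = number of incoming links."""
    return adj.sum(axis=0).astype(float)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 400
    # Each node j receives links with its own probability (heavy-tailed propensities).
    popularity = rng.pareto(2.0, size=n) + 1.0
    probs = np.minimum(1.0, 0.02 * popularity)
    adj = (rng.random((n, n)) < probs[None, :]).astype(float)
    np.fill_diagonal(adj, 0.0)

    auth, _ = hits(adj)
    ind = indegree(adj)
    cos = float(auth @ ind / (np.linalg.norm(auth) * np.linalg.norm(ind)))
    print("cosine similarity between HITS authorities and InDegree:", round(cos, 3))
```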
L. Buriol, G. Frahling, S. Leonardi, A. Marchetti-Spaccamela, and C. Sohler
Counting Triangles in Data Streams
Proc. 25th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS 2006), 2006
We present two space-bounded random sampling algorithms that compute an approximation of the number of triangles in an undirected graph given as a stream of edges. Our first algorithm does not make any assumptions on the order of edges in the stream. It uses space that is inversely related to the ratio between the number of triangles and the number of triples with at least one edge in the induced subgraph, and constant expected update time per edge. Our second algorithm is designed for incidence streams (all edges incident to the same vertex appear consecutively). It uses space that is inversely related to the ratio between the number of triangles and the number of length-2 paths in the graph, and expected update time O(log|V|(1+s|V|/|E|)), where s is the space requirement of the algorithm. These results significantly improve over previous work [20, 8]. Since the space complexity depends only on the structure of the input graph and not on the number of nodes, our algorithms scale very well with increasing graph size, and so they provide a basic tool to analyze the structure of large graphs. They have many applications, for example, in the discovery of Web communities, the computation of clustering and transitivity coefficients, and the discovery of frequent patterns in large graphs.
We have implemented both algorithms and evaluated their performance on networks from different application domains. The sizes of the considered graphs varied from about 8,000 nodes and 40,000 edges to 135 million nodes and more than 1 billion edges. For both algorithms we ran experiments with the parameter s set to 1,000, 10,000, 100,000, and 1,000,000 to evaluate running time and approximation guarantee. Both algorithms appear to be time efficient for these sample sizes. The approximation quality of the first algorithm varied significantly, and even for s=1,000,000 we had more than 10% deviation on more than half of the instances. The second algorithm performed much better: even for s=10,000 the average deviation was less than 6% (taken over all but the largest instance, for which we could not compute the number of triangles exactly).
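The sampling idea behind such estimators can be illustrated without the streaming machinery: pick a uniformly random edge and a uniformly random third node, test whether they form a triangle, and rescale. The sketch below does this over an in-memory edge list (an illustration of the sampling logic, not a one-pass streaming implementation).

```python
import random
from itertools import combinations

def sample_triangle_estimate(edges, num_nodes, samples=200000, seed=0):
    """Illustrative sampling estimator for the number of triangles T:
    pick a uniformly random edge (a, b) and a uniformly random third node v;
    the probability that {a, b, v} is a triangle is 3T / (|E| * (|V| - 2)),
    so rescaling the empirical frequency estimates T."""
    rng = random.Random(seed)
    adjacency = set()
    for a, b in edges:
        adjacency.add((a, b))
        adjacency.add((b, a))
    hits = 0
    for _ in range(samples):
        a, b = edges[rng.randrange(len(edges))]
        v = rng.randrange(num_nodes)
        while v == a or v == b:                  # pick v outside the sampled edge
            v = rng.randrange(num_nodes)
        if (a, v) in adjacency and (b, v) in adjacency:
            hits += 1
    return (hits / samples) * len(edges) * (num_nodes - 2) / 3.0

if __name__ == "__main__":
    rng = random.Random(1)
    n, p = 400, 0.05                              # illustrative G(n, p) random graph
    edges = [(a, b) for a, b in combinations(range(n), 2) if rng.random() < p]
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    exact = sum(1 for a, b in edges for v in adj[a] & adj[b] if v > b)
    print("exact triangles:", exact, "estimate:", round(sample_triangle_estimate(edges, n)))
```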