Friday, October 11, 2019

Paper Critique: “Airavat: Security and Privacy for MapReduce”

1. (10%) State the problem the paper is trying to solve.

This paper demonstrates how Airavat, a MapReduce-based system for distributed computations, provides end-to-end confidentiality, integrity, and privacy guarantees against data leakage by combining mandatory access control with differential privacy.

2. (20%) State the main contribution of the paper: solving a new problem, proposing a new algorithm, or presenting a new evaluation (analysis). If a new problem, why was the problem important? Is the problem still important today? Will the problem be important tomorrow? If a new algorithm or new evaluation (analysis), what are the improvements over previous algorithms or evaluations? How do they come up with the new algorithm or evaluation?

The main contribution is that Airavat builds on mandatory access control (MAC) and differential privacy to ensure that untrusted MapReduce computations on sensitive data do not leak private information, providing confidentiality, integrity, and privacy guarantees. The goal is to prevent malicious computation providers from violating the privacy policies a data provider imposes on its data, i.e., from leaking information about individual items in the data. The system is implemented as a modification to MapReduce and the Java virtual machine, and it runs on top of SELinux.

3. (15%) Summarize the (at most) 3 key main ideas (each in 1 sentence.)

(1) It is the first work to add MAC and differential privacy to MapReduce. (2) It proposes a new framework for privacy-preserving MapReduce computations. (3) It confines untrusted code.

4. (30%) Critique the main contribution.

a. Rate the significance of the paper on a scale of 5 (breakthrough), 4 (significant contribution), 3 (modest contribution), 2 (incremental contribution), 1 (no contribution or negative contribution). Explain your rating in a sentence or two.

This system provides security and privacy guarantees for distributed computations on sensitive data at the endpoints. However, the data can still be leaked in the cloud: multiple machines are involved in the computation, and a malicious worker can send intermediate data to an outside system, which threatens the privacy of the input data. Even short of that, temporary data is stored on the workers and can be fetched even after the computation is done.

b. Rate how convincing the methodology is: how do the authors justify the solution approach or evaluation? Do the authors use arguments, analyses, experiments, simulations, or a combination of them? Do the claims and conclusions follow from the arguments, analyses or experiments? Are the assumptions realistic (at the time of the research)? Are the assumptions still valid today? Are the experiments well designed? Are there different experiments that would be more convincing? Are there other alternatives the authors should have considered? (And, of course, is the paper free of methodological errors.)

As the authors state on page 3, “We aim to prevent malicious computation providers from violating the privacy policy of the data provider(s) by leaking information about individual data items.” They use a differential privacy mechanism to ensure this. One interesting defense against data leakage is that the mapper must declare a range for its output values. It also seems that the larger your dataset is, the more privacy you have, because removing a single user affects less of the output. Their reported accuracy with the added noise was close to 100%, so this seems like a viable way to protect the privacy of the input data. A minimal sketch of this noise-calibration idea appears below.
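The following is a small Python sketch of the idea, not Airavat's actual implementation (which modifies Java MapReduce and the JVM); the function names, the use of a simple sum query, and the parameters are illustrative assumptions. It shows how a mapper-declared output range bounds the influence of any one record, which lets the system calibrate Laplace noise to that range.

    import random

    def laplace_noise(scale):
        # The difference of two independent exponentials with mean `scale`
        # is a Laplace(0, scale) sample.
        return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

    def noisy_sum(values, declared_min, declared_max, epsilon):
        # Clamp each mapper output into its declared range so that no single
        # record can move the result outside the promised bounds.
        clamped = [min(max(v, declared_min), declared_max) for v in values]
        # Sensitivity of the sum: changing one clamped record shifts the
        # total by at most the width of the declared range.
        sensitivity = declared_max - declared_min
        # Laplace noise scaled to sensitivity / epsilon gives epsilon-
        # differential privacy for this single query.
        return sum(clamped) + laplace_noise(sensitivity / epsilon)

    # Larger inputs keep more utility: the noise magnitude does not depend on
    # the dataset size, so its relative effect shrinks as the data grows.
    data = [random.uniform(0.0, 10.0) for _ in range(100000)]
    print(noisy_sum(data, declared_min=0.0, declared_max=10.0, epsilon=0.5))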
c. What is the most important limitation of the approach?

As the authors mention, a single computation provider could exhaust the privacy budget of a dataset, using more than its fair share and leaving nothing for other computation providers; a toy budget-accounting sketch illustrating this appears at the end of this critique. Also, while there is some estimation of effective parameters, a large number of parameters must be set for Airavat to work properly. This increases the probability of misconfigurations, or of configurations that severely limit the computations that can be performed on the data.

5. (15%) What lessons should researchers and builders take away from this work. What (if any) questions does this work leave open?

The current implementation of Airavat supports both trusted and untrusted mappers, but reducers must be trusted. The authors also modified the JVM to make mapper invocations independent (using invocation numbers to distinguish the current mapper from previous ones), and they modified the reducers to provide differential privacy. From the data provider's perspective, several privacy parameters must be supplied, such as the privacy group and the privacy budget.

6. (10%) Propose your improvement on the same problem.

I have no suggested improvements.
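To make the budget-exhaustion limitation from 4c concrete, here is a toy Python sketch of per-dataset epsilon accounting. It only illustrates the concern, it is not code from Airavat, and the class and method names are hypothetical.

    class PrivacyBudget:
        # Toy per-dataset epsilon ledger shared by all computation providers.
        def __init__(self, total_epsilon):
            self.remaining = total_epsilon

        def charge(self, provider, epsilon):
            # Refuse any query that would overspend the shared budget.
            if epsilon > self.remaining:
                raise RuntimeError(
                    "budget exhausted: %s requested %.2f, only %.2f left"
                    % (provider, epsilon, self.remaining))
            self.remaining -= epsilon
            return self.remaining

    budget = PrivacyBudget(total_epsilon=1.0)
    budget.charge("provider_A", 0.9)      # one greedy provider spends almost everything
    try:
        budget.charge("provider_B", 0.2)  # an honest provider is now locked out
    except RuntimeError as exc:
        print(exc)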
