TY - BOOK AU - Buyya,Rajkumar AU - Calheiros,Rodrigo N. AU - Dastjerdi,Amir Vahid TI - Big data: principles and paradigms SN - 9780128053942 AV - QA76.9.B45 B5565 2016 U1 - 005.7 23 PY - 2016/// CY - Cambridge, MA PB - Elsevier/Morgan Kaufmann KW - Big data KW - fast N1 - Includes bibliographical references and index; Machine generated contents note; ch. 1; BDA = ML + CC --; 1.1; Introduction --; 1.2; A Historical Review of Big Data --; 1.2.1; The Origin of Big Data --; 1.2.2; Debates of Big Data Implication --; 1.3; Historical Interpretation of Big Data --; 1.3.1; Methodology for Defining Big Data --; 1.3.2; Different Attributes of Definitions --; 1.3.3; Summary of 7 Types Definitions of Big Data --; 1.3.4; Motivations Behind the Definitions --; 1.4; Defining Big Data From 3Vs to 32Vs --; 1.4.1; Data Domain --; 1.4.2; Business Intelligent (BI) Domain --; 1.4.3; Statistics Domain --; 1.4.4; 32 Vs Definition and Big Data Venn Diagram --; 1.5; Big Data Analytics and Machine Learning --; 1.5.1; Big Data Analytics --; 1.5.2; Machine Learning --; 1.6; Big Data Analytics and Cloud Computing --; 1.7; Hadoop, HDFS, MapReduce, Spark, and Flink --; 1.7.1; Google File System (GFS) and HDFS --; 1.7.2; MapReduce --; 1.7.3; The Origin of the Hadoop Project --; 1.7.4; Spark and Spark Stack --; 1.7.5; Flink and Other Data Process Engines --; 1.7.6; Summary of Hadoop and Its Ecosystems --; 1.8; ML + CC -> BDA and Guidelines --; 1.9; Conclusion --; References --; ch. 
2; Real-Time Analytics --; 2.1; Introduction --; 2.2; Computing Abstractions for Real-Time Analytics --; 2.3; Characteristics of Real-Time Systems --; 2.3.1; Low Latency --; 2.3.2; High Availability --; 2.3.3; Horizontal Scalability --; 2.4; Real-Time Processing for Big Data -- Concepts and Platforms --; 2.4.1; Event --; 2.4.2; Event Processing --; 2.4.3; Event Stream Processing and Data Stream Processing --; 2.4.4; Complex Event Processing --; 2.4.5; Event Type --; 2.4.6; Event Pattern --; 2.5; Data Stream Processing Platforms --; 2.5.1; Spark --; 2.5.2; Storm --; 2.5.3; Kafka --; 2.5.4; Flume --; 2.5.5; Amazon Kinesis --; 2.6; Data Stream Analytics Platforms --; 2.6.1; Query-Based EPSs --; 2.6.2; Rule-Oriented EPSs --; 2.6.3; Programmatic EPSs --; 2.7; Data Analysis and Analytic Techniques --; 2.7.1; Data Analysis in General --; 2.7.2; Data Analysis for Stream Applications --; 2.8; Finance Domain Requirements and a Case Study --; 2.8.1; Real-Time Analytics in Finance Domain --; 2.8.2; Selected Scenarios --; 2.8.3; CEP Application as a Case Study --; 2.9; Future Research Challenges --; References --; ch. 3; Big Data Analytics for Social Media --; 3.1; Introduction --; 3.2; NLP and Its Applications --; 3.2.1; Language Detection --; 3.2.2; Named Entity Recognition --; 3.3; Text Mining --; 3.3.1; Sentiment Analysis --; 3.3.2; Trending Topics --; 3.3.3; Recommender Systems --; 3.4; Anomaly Detection --; Acknowledgments --; References --; ch. 
4; Deep Learning and Its Parallelization --; 4.1; Introduction --; 4.1.1; Application Background --; 4.1.2; Performance Demands for Deep Learning --; 4.1.3; Existing Parallel Frameworks of Deep Learning --; 4.2; Concepts and Categories of Deep Learning --; 4.2.1; Deep Learning --; 4.2.2; Mainstream Deep Learning Models --; 4.3; Parallel Optimization for Deep Learning --; 4.3.1; Convolutional Architecture for Fast Feature Embedding --; 4.3.2; DistBelief --; 4.3.3; Deep Learning Based on Multi-GPUs --; 4.4; Discussions --; 4.4.1; Grand Challenges of Deep Learning in Big Data --; 4.4.2; Future Directions --; References --; ch. 5; Characterization and Traversal of Large Real-World Networks --; 5.1; Introduction --; 5.2; Background --; 5.3; Characterization and Measurement --; 5.4; Efficient Complex Network Traversal --; 5.4.1; HPC Traversal of Large Networks --; 5.4.2; Algorithms for Accelerating AS-BFS on GPU --; 5.4.3; Performance Study of AS-BFS on GPU's --; 5.5; k-Core-Based Partitioning for Heterogeneous Graph Processing --; 5.5.1; Graph Partitioning for Heterogeneous Computing --; 5.5.2; k-Core-Based Complex-Network Unbalanced Bisection --; 5.6; Future Directions --; 5.7; Conclusions --; Acknowledgments --; References --; ch. 6; Database Techniques for Big Data --; 6.1; Introduction --; 6.2; Background --; 6.2.1; Navigational Data Models --; 6.2.2; Relational Data Models --; 6.3; NoSQL Movement --; 6.4; NoSQL Solutions for Big Data Management --; 6.5; NoSQL Data Models --; 6.5.1; Key-Value Stores --; 6.5.2; Column-Based Stores --; 6.5.3; Graph-Based Stores --; 6.5.4; Document-Based Stores --; 6.6; Future Directions --; 6.7; Conclusions --; References --; ch. 
7; Resource Management in Big Data Processing Systems --; 7.1; Introduction --; 7.2; Types of Resource Management --; 7.2.1; CPU and Memory Resource Management --; 7.2.2; Storage Resource Management --; 7.2.3; Network Resource Management --; 7.3; Big Data Processing Systems and Platforms --; 7.3.1; Hadoop --; 7.3.2; Dryad --; 7.3.3; Pregel --; 7.3.4; Storm --; 7.3.5; Spark --; 7.3.6; Summary --; 7.4; Single-Resource Management in the Cloud --; 7.4.1; Desired Resource Allocation Properties --; 7.4.2; Problems for Existing Fairness Policies --; 7.4.3; Long-Term Resource Allocation Policy --; 7.4.4; Experimental Evaluation --; 7.5; Multiresource Management in the Cloud --; 7.5.1; Resource Allocation Model --; 7.5.2; Multiresource Fair Sharing Issues --; 7.5.3; Reciprocal Resource Fairness --; 7.5.4; Experimental Evaluation --; 7.6; Related Work on Resource Management --; 7.6.1; Resource Utilization Optimization --; 7.6.2; Power and Energy Cost Saving Optimization --; 7.6.3; Monetary Cost Optimization --; 7.6.4; Fairness Optimization --; 7.7; Open Problems --; 7.7.1; SLA Guarantee for Applications --; 7.7.2; Various Computation Models and Systems --; 7.7.3; Exploiting Emerging Hardware --; 7.8; Summary --; References --; ch. 8; Local Resource Consumption Shaping: A Case for MapReduce --; 8.1; Introduction --; 8.2; Motivation --; 8.2.1; Pitfalls of Fair Resource Sharing --; 8.3; Local Resource Shaper --; 8.3.1; Design Philosophy --; 8.3.2; Splitter --; 8.3.3; The Interleave MapReduce Scheduler --; 8.4; Evaluation --; 8.4.1; Experiments With Hadoop 1.x --; 8.4.2; Experiments With Hadoop 2.x --; 8.5; Related Work --; 8.6; Conclusions --; Appendix CPU Utilization With Different Slot Configurations and LRS --; References --; ch. 
9; System Optimization for Big Data Processing --; 9.1; Introduction --; 9.2; Basic Framework of the Hadoop Ecosystem --; 9.3; Parallel Computation Framework: MapReduce --; 9.3.1; Improvements of MapReduce Framework --; 9.3.2; Optimization for Task Scheduling and Load Balancing of MapReduce --; 9.4; Job Scheduling of Hadoop --; 9.4.1; Built-In Scheduling Algorithms of Hadoop --; 9.4.2; Improvement of the Hadoop Job Scheduling Algorithm --; 9.4.3; Improvement of the Hadoop Job Management Framework --; 9.5; Performance Optimization of HDFS --; 9.5.1; Small File Performance Optimization --; 9.5.2; HDFS Security Optimization --; 9.6; Performance Optimization of HBase --; 9.6.1; HBase Framework, Storage, and Application Optimization --; 9.6.2; Load Balancing of HBase --; 9.6.3; Optimization of HBase Configuration --; 9.7; Performance Enhancement of Hadoop System --; 9.7.1; Efficiency Optimization of Hadoop --; 9.7.2; Availability Optimization of Hadoop --; 9.8; Conclusions and Future Directions --; References --; ch. 10; Packing Algorithms for Big Data Replay on Multicore --; 10.1; Introduction --; 10.2; Performance Bottlenecks --; 10.2.1; Hadoop/MapReduce Performance Bottlenecks --; 10.2.2; Performance Bottlenecks Under Parallel Loads --; 10.2.3; Parameter Spaces for Storage and Shared Memory --; 10.2.4; Main Storage Performance --; 10.2.5; Shared Memory Performance --; 10.3; The Big Data Replay Method --; 10.3.1; The Replay Method --; 10.3.2; Jobs as Sketches on a Timeline --; 10.3.3; Performance Bottlenecks Under Replay --; 10.4; Packing Algorithms --; 10.4.1; Shared Memory Performance Tricks --; 10.4.2; Big Data Replay at Scale --; 10.4.3; Practical Packing Models --; 10.5; Performance Analysis --; 10.5.1; Hotspot Distributions --; 10.5.2; Modeling Methodology --; 10.5.3; Processing Overhead Versus Bottlenecks --; 10.5.4; Control Grain for Drop Versus Drag Models --; 10.6; Summary and Future Directions --; References --; ch. 
11; Spatial Privacy Challenges in Social Networks --; 11.1; Introduction --; 11.2; Background --; 11.3; Spatial Aspects of Social Networks --; 11.4; Cloud-Based Big Data Infrastructure --; 11.5; Spatial Privacy Case Studies --; 11.6; Conclusions --; Acknowledgments --; References --; ch. 12; Security and Privacy in Big Data --; 12.1; Introduction --; 12.2; Secure Queries Over Encrypted Big Data --; 12.2.1; System Model --; 12.2.2; Threat Model and Attack Model --; 12.2.3; Secure Query Scheme in Clouds --; 12.2.4; Security Definition of Index-Based Secure Query Techniques --; 12.2.5; Implementations of Index-Based Secure Query Techniques --; 12.3; Other Big Data Security --; 12.3.1; Digital Watermarking --; 12.3.2; Self-Adaptive Risk Access Control --; 12.4; Privacy on Correlated Big Data --; 12.4.1; Correlated Data in Big Data --; 12.4.2; Anonymity --; 12.4.3; Differential Privacy --; 12.5; Future Directions --; 12.6; Conclusions --; References --; ch. 13; Location Inferring in Internet of Things and Big Data --; 13.1; Introduction --; 13.2; Device-Based Sensing Using Big Data --; 13.2.1; Introduction --; 13.2.2; Approach Overview --; 13.2.3; Trajectories Matching --; 13.2.4; Establishing the Mapping Between Floor Plan and RSS Readings --; 13.2.5; User Localization --; 13.2.6; Graph Matching Based Tracking --; 13.2.7; Evaluation --; 13.3; Device-Free Sensing Using Big Data --; 13.3.1; Customer Behavior Identification --; 13.3.2; Human Object Estimation --; 13.4; Conclusion --; Acknowledgements --; References --; ch. 
14; A Framework for Mining Thai Public Opinions --; 14.1; Introduction --; 14.2; XDOM --; 14.2.1; Data Sources --; 14.2.2; DOM System Architecture --; 14.2.3; MapReduce Framework --; 14.2.4; Sentiment Analysis --; 14.2.5; Clustering-Based Summarization Framework --; 14.2.6; Influencer Analysis --; 14.2.7; AsKDOM: Mobile Application --; 14.3; Implementation --; 14.3.1; Server --; 14.3.2; Core Service --; 14.3.3; I/O --; 14.4; Validation --; 14.4.1; Validation Parameter --; 14.4.2; Validation method --; 14.4.3; Validation results --; 14.5; Case Studies --; 14.5.1; Political Opinion: #prayforthailand --; 14.5.2; Bangkok Traffic Congestion Ranking --; 14.6; Summary and Conclusions --; Acknowledgments --; References --; ch. 15; A Case Study in Big Data Analytics: Exploring Twitter Sentiment Analysis and the Weather --; 15.1; Background --; 15.2; Big Data System Components --; 15.2.1; System Back-End Architecture --; 15.2.2; System Front-End Architecture --; 15.2.3; Software Stack --; 15.3; Machine-Learning Methodology --; 15.3.1; Tweets Sentiment Analysis --; 15.3.2; Weather and Emotion Correlation Analysis --; 15.4; System Implementation --; 15.4.1; Home Page --; 15.4.2; Sentiment Pages --; 15.4.3; Weather Pages --; 15.5; Key Findings --; 15.5.1; Time Series --; 15.5.2; Analysis with Hourly Weather Data --; 15.5.3; Analysis with Daily Weather Data --; 15.5.4; DBSCAN Cluster Algorithm --; 15.5.5; Straightforward Weather Impact on Emotion --; 15.6; Summary and Conclusions --; Acknowledgments --; References --; ch. 
16; Dynamic Uncertainty-Based Analytics for Caching Performance Improvements in Mobile Broadband Wireless Networks --; 16.1; Introduction --; 16.1.1; Big Data Concerns --; 16.1.2; Key Focus Areas --; 16.2; Background --; 16.2.1; Cellular Network and VoD --; 16.2.2; Markov Processes --; 16.3; Related Work --; 16.4; VoD Architecture --; 16.5; Overview --; 16.6; Data Generation --; 16.7; Edge and Core Components --; 16.8; INCA Caching Algorithm --; 16.9; QoE Estimation --; 16.10; Theoretical Framework --; 16.11; Experiments and Results --; 16.11.1; Cache Hits With Nu, Nc, Nm and k --; 16.11.2; QoE Impact With Prefetch Bandwidth --; 16.11.3; User Satisfaction With Prefetch Bandwidth --; 16.12; Synthetic Dataset --; 16.12.1; INCA Hit Gain --; 16.12.2; QoE Performance --; 16.12.3; Satisfied Users --; 16.13; Conclusions and Future Directions --; References --; ch. 17; Big Data Analytics on a Smart Grid: Mining PMU Data for Event and Anomaly Detection --; 17.1; Introduction --; 17.2; Smart Grid With PMUs and PDCs --; 17.3; Improving Traditional Workflow --; 17.4; Characterizing Normal Operation --; 17.5; Identifying Unusual Phenomena --; 17.6; Identifying Known Events --; 17.7; Related Efforts --; 17.8; Conclusion and Future Directions --; Acknowledgments --; References --; ch. 18; eScience and Big Data Workflows in Clouds: A Taxonomy and Survey --; 18.1; Introduction --; 18.2; Background --; 18.2.1; History --; 18.2.2; Grid-Based eScience --; 18.2.3; Cloud Computing --; 18.3; Taxonomy and Review of eScience Services in the Cloud --; 18.3.1; Infrastructure --; 18.3.2; Ownership --; 18.3.3; Application --; 18.3.4; Processing Tools --; 18.3.5; Storage --; 18.3.6; Security --; 18.3.7; Service Models --; 18.3.8; Collaboration --; 18.4; Resource Provisioning for eScience Workflows in Clouds --; 18.4.1; Motivation --; 18.4.2; Our Solution --; 18.5; Open Problems --; 18.6; Summary --; References UR - https://www.sciencedirect.com/book/9780128053942/big-data ER -