Big data : principles and paradigms / edited by Rajkumar Buyya, The University of Melbourne and Manjrasoft Pty Ltd, Australia, Rodrigo N. Calheiros, The University of Melbourne, Australia, Amir Vahid Dastjerdi, The University of Melbourne, Australia.
Material type: Text
Publisher: Cambridge, MA : Elsevier/Morgan Kaufmann, [2016]
Description: xxv, 468 pages : illustrations ; 24 cm
Content type: text
Media type: unmediated
Carrier type: volume
ISBN:
- 9780128053942
- 0128053941
DDC classification: 005.7 23
LOC classification: QA76.9.B45 B5565 2016
Includes bibliographical references and index.
Machine generated contents note:
ch. 1 BDA = ML + CC -- 1.1. Introduction -- 1.2. A Historical Review of Big Data -- 1.2.1. The Origin of Big Data -- 1.2.2. Debates of Big Data Implication -- 1.3. Historical Interpretation of Big Data -- 1.3.1. Methodology for Defining Big Data -- 1.3.2. Different Attributes of Definitions -- 1.3.3. Summary of 7 Types Definitions of Big Data -- 1.3.4. Motivations Behind the Definitions -- 1.4. Defining Big Data From 3Vs to 3²Vs -- 1.4.1. Data Domain -- 1.4.2. Business Intelligent (BI) Domain -- 1.4.3. Statistics Domain -- 1.4.4. 3² Vs Definition and Big Data Venn Diagram -- 1.5. Big Data Analytics and Machine Learning -- 1.5.1. Big Data Analytics -- 1.5.2. Machine Learning -- 1.6. Big Data Analytics and Cloud Computing -- 1.7. Hadoop, HDFS, MapReduce, Spark, and Flink -- 1.7.1. Google File System (GFS) and HDFS -- 1.7.2. MapReduce -- 1.7.3. The Origin of the Hadoop Project -- 1.7.4. Spark and Spark Stack -- 1.7.5. Flink and Other Data Process Engines -- 1.7.6. Summary of Hadoop and Its Ecosystems -- 1.8. ML + CC -> BDA and Guidelines -- 1.9. Conclusion -- References
ch. 2 Real-Time Analytics -- 2.1. Introduction -- 2.2. Computing Abstractions for Real-Time Analytics -- 2.3. Characteristics of Real-Time Systems -- 2.3.1. Low Latency -- 2.3.2. High Availability -- 2.3.3. Horizontal Scalability -- 2.4. Real-Time Processing for Big Data: Concepts and Platforms -- 2.4.1. Event -- 2.4.2. Event Processing -- 2.4.3. Event Stream Processing and Data Stream Processing -- 2.4.4. Complex Event Processing -- 2.4.5. Event Type -- 2.4.6. Event Pattern -- 2.5. Data Stream Processing Platforms -- 2.5.1. Spark -- 2.5.2. Storm -- 2.5.3. Kafka -- 2.5.4. Flume -- 2.5.5. Amazon Kinesis -- 2.6. Data Stream Analytics Platforms -- 2.6.1. Query-Based EPSs -- 2.6.2. Rule-Oriented EPSs -- 2.6.3. Programmatic EPSs -- 2.7. Data Analysis and Analytic Techniques -- 2.7.1. Data Analysis in General -- 2.7.2. Data Analysis for Stream Applications -- 2.8. Finance Domain Requirements and a Case Study -- 2.8.1. Real-Time Analytics in Finance Domain -- 2.8.2. Selected Scenarios -- 2.8.3. CEP Application as a Case Study -- 2.9. Future Research Challenges -- References
ch. 3 Big Data Analytics for Social Media -- 3.1. Introduction -- 3.2. NLP and Its Applications -- 3.2.1. Language Detection -- 3.2.2. Named Entity Recognition -- 3.3. Text Mining -- 3.3.1. Sentiment Analysis -- 3.3.2. Trending Topics -- 3.3.3. Recommender Systems -- 3.4. Anomaly Detection -- Acknowledgments -- References
ch. 4 Deep Learning and Its Parallelization -- 4.1. Introduction -- 4.1.1. Application Background -- 4.1.2. Performance Demands for Deep Learning -- 4.1.3. Existing Parallel Frameworks of Deep Learning -- 4.2. Concepts and Categories of Deep Learning -- 4.2.1. Deep Learning -- 4.2.2. Mainstream Deep Learning Models -- 4.3. Parallel Optimization for Deep Learning -- 4.3.1. Convolutional Architecture for Fast Feature Embedding -- 4.3.2. DistBelief -- 4.3.3. Deep Learning Based on Multi-GPUs -- 4.4. Discussions -- 4.4.1. Grand Challenges of Deep Learning in Big Data -- 4.4.2. Future Directions -- References
ch. 5 Characterization and Traversal of Large Real-World Networks -- 5.1. Introduction -- 5.2. Background -- 5.3. Characterization and Measurement -- 5.4. Efficient Complex Network Traversal -- 5.4.1. HPC Traversal of Large Networks -- 5.4.2. Algorithms for Accelerating AS-BFS on GPU -- 5.4.3. Performance Study of AS-BFS on GPU's -- 5.5. k-Core-Based Partitioning for Heterogeneous Graph Processing -- 5.5.1. Graph Partitioning for Heterogeneous Computing -- 5.5.2. k-Core-Based Complex-Network Unbalanced Bisection -- 5.6. Future Directions -- 5.7. Conclusions -- Acknowledgments -- References
ch. 6 Database Techniques for Big Data -- 6.1. Introduction -- 6.2. Background -- 6.2.1. Navigational Data Models -- 6.2.2. Relational Data Models -- 6.3. NoSQL Movement -- 6.4. NoSQL Solutions for Big Data Management -- 6.5. NoSQL Data Models -- 6.5.1. Key-Value Stores -- 6.5.2. Column-Based Stores -- 6.5.3. Graph-Based Stores -- 6.5.4. Document-Based Stores -- 6.6. Future Directions -- 6.7. Conclusions -- References
ch. 7 Resource Management in Big Data Processing Systems -- 7.1. Introduction -- 7.2. Types of Resource Management -- 7.2.1. CPU and Memory Resource Management -- 7.2.2. Storage Resource Management -- 7.2.3. Network Resource Management -- 7.3. Big Data Processing Systems and Platforms -- 7.3.1. Hadoop -- 7.3.2. Dryad -- 7.3.3. Pregel -- 7.3.4. Storm -- 7.3.5. Spark -- 7.3.6. Summary -- 7.4. Single-Resource Management in the Cloud -- 7.4.1. Desired Resource Allocation Properties -- 7.4.2. Problems for Existing Fairness Policies -- 7.4.3. Long-Term Resource Allocation Policy -- 7.4.4. Experimental Evaluation -- 7.5. Multiresource Management in the Cloud -- 7.5.1. Resource Allocation Model -- 7.5.2. Multiresource Fair Sharing Issues -- 7.5.3. Reciprocal Resource Fairness -- 7.5.4. Experimental Evaluation -- 7.6. Related Work on Resource Management -- 7.6.1. Resource Utilization Optimization -- 7.6.2. Power and Energy Cost Saving Optimization -- 7.6.3. Monetary Cost Optimization -- 7.6.4. Fairness Optimization -- 7.7. Open Problems -- 7.7.1. SLA Guarantee for Applications -- 7.7.2. Various Computation Models and Systems -- 7.7.3. Exploiting Emerging Hardware -- 7.8. Summary -- References
ch. 8 Local Resource Consumption Shaping: A Case for MapReduce -- 8.1. Introduction -- 8.2. Motivation -- 8.2.1. Pitfalls of Fair Resource Sharing -- 8.3. Local Resource Shaper -- 8.3.1. Design Philosophy -- 8.3.2. Splitter -- 8.3.3. The Interleave MapReduce Scheduler -- 8.4. Evaluation -- 8.4.1. Experiments With Hadoop 1.x -- 8.4.2. Experiments With Hadoop 2.x -- 8.5. Related Work -- 8.6. Conclusions -- Appendix: CPU Utilization With Different Slot Configurations and LRS -- References
ch. 9 System Optimization for Big Data Processing -- 9.1. Introduction -- 9.2. Basic Framework of the Hadoop Ecosystem -- 9.3. Parallel Computation Framework: MapReduce -- 9.3.1. Improvements of MapReduce Framework -- 9.3.2. Optimization for Task Scheduling and Load Balancing of MapReduce -- 9.4. Job Scheduling of Hadoop -- 9.4.1. Built-In Scheduling Algorithms of Hadoop -- 9.4.2. Improvement of the Hadoop Job Scheduling Algorithm -- 9.4.3. Improvement of the Hadoop Job Management Framework -- 9.5. Performance Optimization of HDFS -- 9.5.1. Small File Performance Optimization -- 9.5.2. HDFS Security Optimization -- 9.6. Performance Optimization of HBase -- 9.6.1. HBase Framework, Storage, and Application Optimization -- 9.6.2. Load Balancing of HBase -- 9.6.3. Optimization of HBase Configuration -- 9.7. Performance Enhancement of Hadoop System -- 9.7.1. Efficiency Optimization of Hadoop -- 9.7.2. Availability Optimization of Hadoop -- 9.8. Conclusions and Future Directions -- References
ch. 10 Packing Algorithms for Big Data Replay on Multicore -- 10.1. Introduction -- 10.2. Performance Bottlenecks -- 10.2.1. Hadoop/MapReduce Performance Bottlenecks -- 10.2.2. Performance Bottlenecks Under Parallel Loads -- 10.2.3. Parameter Spaces for Storage and Shared Memory -- 10.2.4. Main Storage Performance -- 10.2.5. Shared Memory Performance -- 10.3. The Big Data Replay Method -- 10.3.1. The Replay Method -- 10.3.2. Jobs as Sketches on a Timeline -- 10.3.3. Performance Bottlenecks Under Replay -- 10.4. Packing Algorithms -- 10.4.1. Shared Memory Performance Tricks -- 10.4.2. Big Data Replay at Scale -- 10.4.3. Practical Packing Models -- 10.5. Performance Analysis -- 10.5.1. Hotspot Distributions -- 10.5.2. Modeling Methodology -- 10.5.3. Processing Overhead Versus Bottlenecks -- 10.5.4. Control Grain for Drop Versus Drag Models -- 10.6. Summary and Future Directions -- References
ch. 11 Spatial Privacy Challenges in Social Networks -- 11.1. Introduction -- 11.2. Background -- 11.3. Spatial Aspects of Social Networks -- 11.4. Cloud-Based Big Data Infrastructure -- 11.5. Spatial Privacy Case Studies -- 11.6. Conclusions -- Acknowledgments -- References
ch. 12 Security and Privacy in Big Data -- 12.1. Introduction -- 12.2. Secure Queries Over Encrypted Big Data -- 12.2.1. System Model -- 12.2.2. Threat Model and Attack Model -- 12.2.3. Secure Query Scheme in Clouds -- 12.2.4. Security Definition of Index-Based Secure Query Techniques -- 12.2.5. Implementations of Index-Based Secure Query Techniques -- 12.3. Other Big Data Security -- 12.3.1. Digital Watermarking -- 12.3.2. Self-Adaptive Risk Access Control -- 12.4. Privacy on Correlated Big Data -- 12.4.1. Correlated Data in Big Data -- 12.4.2. Anonymity -- 12.4.3. Differential Privacy -- 12.5. Future Directions -- 12.6. Conclusions -- References
ch. 13 Location Inferring in Internet of Things and Big Data -- 13.1. Introduction -- 13.2. Device-Based Sensing Using Big Data -- 13.2.1. Introduction -- 13.2.2. Approach Overview -- 13.2.3. Trajectories Matching -- 13.2.4. Establishing the Mapping Between Floor Plan and RSS Readings -- 13.2.5. User Localization -- 13.2.6. Graph Matching Based Tracking -- 13.2.7. Evaluation -- 13.3. Device-Free Sensing Using Big Data -- 13.3.1. Customer Behavior Identification -- 13.3.2. Human Object Estimation -- 13.4. Conclusion -- Acknowledgements -- References
ch. 14 A Framework for Mining Thai Public Opinions -- 14.1. Introduction -- 14.2. XDOM -- 14.2.1. Data Sources -- 14.2.2. DOM System Architecture -- 14.2.3. MapReduce Framework -- 14.2.4. Sentiment Analysis -- 14.2.5. Clustering-Based Summarization Framework -- 14.2.6. Influencer Analysis -- 14.2.7. AsKDOM: Mobile Application -- 14.3. Implementation -- 14.3.1. Server -- 14.3.2. Core Service -- 14.3.3. I/O -- 14.4. Validation -- 14.4.1. Validation Parameter -- 14.4.2. Validation method -- 14.4.3. Validation results -- 14.5. Case Studies -- 14.5.1. Political Opinion: #prayforthailand -- 14.5.2. Bangkok Traffic Congestion Ranking -- 14.6. Summary and Conclusions -- Acknowledgments -- References
ch. 15 A Case Study in Big Data Analytics: Exploring Twitter Sentiment Analysis and the Weather -- 15.1. Background -- 15.2. Big Data System Components -- 15.2.1. System Back-End Architecture -- 15.2.2. System Front-End Architecture -- 15.2.3. Software Stack -- 15.3. Machine-Learning Methodology -- 15.3.1. Tweets Sentiment Analysis -- 15.3.2. Weather and Emotion Correlation Analysis -- 15.4. System Implementation -- 15.4.1. Home Page -- 15.4.2. Sentiment Pages -- 15.4.3. Weather Pages -- 15.5. Key Findings -- 15.5.1. Time Series -- 15.5.2. Analysis with Hourly Weather Data -- 15.5.3. Analysis with Daily Weather Data -- 15.5.4. DBSCAN Cluster Algorithm -- 15.5.5. Straightforward Weather Impact on Emotion -- 15.6. Summary and Conclusions -- Acknowledgments -- References
ch. 16 Dynamic Uncertainty-Based Analytics for Caching Performance Improvements in Mobile Broadband Wireless Networks -- 16.1. Introduction -- 16.1.1. Big Data Concerns -- 16.1.2. Key Focus Areas -- 16.2. Background -- 16.2.1. Cellular Network and VoD -- 16.2.2. Markov Processes -- 16.3. Related Work -- 16.4. VoD Architecture -- 16.5. Overview -- 16.6. Data Generation -- 16.7. Edge and Core Components -- 16.8. INCA Caching Algorithm -- 16.9. QoE Estimation -- 16.10. Theoretical Framework -- 16.11. Experiments and Results -- 16.11.1. Cache Hits With Nu, Nc, Nm and k -- 16.11.2. QoE Impact With Prefetch Bandwidth -- 16.11.3. User Satisfaction With Prefetch Bandwidth -- 16.12. Synthetic Dataset -- 16.12.1. INCA Hit Gain -- 16.12.2. QoE Performance -- 16.12.3. Satisfied Users -- 16.13. Conclusions and Future Directions -- References
ch. 17 Big Data Analytics on a Smart Grid: Mining PMU Data for Event and Anomaly Detection -- 17.1. Introduction -- 17.2. Smart Grid With PMUs and PDCs -- 17.3. Improving Traditional Workflow -- 17.4. Characterizing Normal Operation -- 17.5. Identifying Unusual Phenomena -- 17.6. Identifying Known Events -- 17.7. Related Efforts -- 17.8. Conclusion and Future Directions -- Acknowledgments -- References
ch. 18 eScience and Big Data Workflows in Clouds: A Taxonomy and Survey -- 18.1. Introduction -- 18.2. Background -- 18.2.1. History -- 18.2.2. Grid-Based eScience -- 18.2.3. Cloud Computing -- 18.3. Taxonomy and Review of eScience Services in the Cloud -- 18.3.1. Infrastructure -- 18.3.2. Ownership -- 18.3.3. Application -- 18.3.4. Processing Tools -- 18.3.5. Storage -- 18.3.6. Security -- 18.3.7. Service Models -- 18.3.8. Collaboration -- 18.4. Resource Provisioning for eScience Workflows in Clouds -- 18.4.1. Motivation -- 18.4.2. Our Solution -- 18.5. Open Problems -- 18.6. Summary -- References.