
Big data : principles and paradigms /

Big data : principles and paradigms / edited by Rajkumar Buyya, The University of Melbourne and Manjrasoft Pty Ltd, Australia, Rodrigo N. Calheiros, The University of Melbourne, Australia, Amir Vahid Dastjerdi, The University of Melbourne, Australia. - xxv, 468 pages : illustrations ; 24 cm

Includes bibliographical references and index.

Machine generated contents note: BDA = ML + CC -- Introduction -- A Historical Review of Big Data -- The Origin of Big Data -- Debates of Big Data Implication -- Historical Interpretation of Big Data -- Methodology for Defining Big Data -- Different Attributes of Definitions -- Summary of 7 Types Definitions of Big Data -- Motivations Behind the Definitions -- Defining Big Data From 3Vs to 32Vs -- Data Domain -- Business Intelligent (BI) Domain -- Statistics Domain -- 32 Vs Definition and Big Data Venn Diagram -- Big Data Analytics and Machine Learning -- Big Data Analytics -- Machine Learning -- Big Data Analytics and Cloud Computing -- Hadoop, HDFS, MapReduce, Spark, and Flink -- Google File System (GFS) and HDFS -- MapReduce -- The Origin of the Hadoop Project -- Spark and Spark Stack -- Flink and Other Data Process Engines -- Summary of Hadoop and Its Ecosystems -- ML + CC -> BDA and Guidelines -- Conclusion -- References -- Real-Time Analytics -- Introduction -- Computing Abstractions for Real-Time Analytics -- Characteristics of Real-Time Systems -- Low Latency -- High Availability -- Horizontal Scalability -- Real-Time Processing for Big Data -- Concepts and Platforms -- Event -- Event Processing -- Event Stream Processing and Data Stream Processing -- Complex Event Processing -- Event Type -- Event Pattern -- Data Stream Processing Platforms -- Spark -- Storm -- Kafka -- Flume -- Amazon Kinesis -- Data Stream Analytics Platforms -- Query-Based EPSs -- Rule-Oriented EPSs -- Programmatic EPSs -- Data Analysis and Analytic Techniques -- Data Analysis in General -- Data Analysis for Stream Applications -- Finance Domain Requirements and a Case Study -- Real-Time Analytics in Finance Domain -- Selected Scenarios -- CEP Application as a Case Study -- Future Research Challenges -- References -- Big Data Analytics for Social Media -- Introduction -- NLP and Its Applications -- Language Detection -- Named Entity Recognition -- Text Mining -- Sentiment Analysis -- 
Trending Topics -- Recommender Systems -- Anomaly Detection -- Acknowledgments -- References -- Deep Learning and Its Parallelization -- Introduction -- Application Background -- Performance Demands for Deep Learning -- Existing Parallel Frameworks of Deep Learning -- Concepts and Categories of Deep Learning -- Deep Learning -- Mainstream Deep Learning Models -- Parallel Optimization for Deep Learning -- Convolutional Architecture for Fast Feature Embedding -- DistBelief -- Deep Learning Based on Multi-GPUs -- Discussions -- Grand Challenges of Deep Learning in Big Data -- Future Directions -- References -- Characterization and Traversal of Large Real-World Networks -- Introduction -- Background -- Characterization and Measurement -- Efficient Complex Network Traversal -- HPC Traversal of Large Networks -- Algorithms for Accelerating AS-BFS on GPU -- Performance Study of AS-BFS on GPU's -- k-Core-Based Partitioning for Heterogeneous Graph Processing -- Graph Partitioning for Heterogeneous Computing -- k-Core-Based Complex-Network Unbalanced Bisection -- Future Directions -- Conclusions -- Acknowledgments -- References -- Database Techniques for Big Data -- Introduction -- Background -- Navigational Data Models -- Relational Data Models -- NoSQL Movement -- NoSQL Solutions for Big Data Management -- NoSQL Data Models -- Key-Value Stores -- Column-Based Stores -- Graph-Based Stores -- Document-Based Stores -- Future Directions -- Conclusions -- References -- Resource Management in Big Data Processing Systems -- Introduction -- Types of Resource Management -- CPU and Memory Resource Management -- Storage Resource Management -- Network Resource Management -- Big Data Processing Systems and Platforms -- Hadoop -- Dryad -- Pregel -- Storm -- Spark -- Summary -- Single-Resource Management in the Cloud -- Desired Resource Allocation Properties -- Problems for Existing Fairness Policies -- Long-Term Resource Allocation Policy -- Experimental Evaluation -- Multiresource 
Management in the Cloud -- Resource Allocation Model -- Multiresource Fair Sharing Issues -- Reciprocal Resource Fairness -- Experimental Evaluation -- Related Work on Resource Management -- Resource Utilization Optimization -- Power and Energy Cost Saving Optimization -- Monetary Cost Optimization -- Fairness Optimization -- Open Problems -- SLA Guarantee for Applications -- Various Computation Models and Systems -- Exploiting Emerging Hardware -- Summary -- References -- Local Resource Consumption Shaping: A Case for MapReduce -- Introduction -- Motivation -- Pitfalls of Fair Resource Sharing -- Local Resource Shaper -- Design Philosophy -- Splitter -- The Interleave MapReduce Scheduler -- Evaluation -- Experiments With Hadoop 1.x -- Experiments With Hadoop 2.x -- Related Work -- Conclusions -- Appendix CPU Utilization With Different Slot Configurations and LRS -- References -- System Optimization for Big Data Processing -- Introduction -- Basic Framework of the Hadoop Ecosystem -- Parallel Computation Framework: MapReduce -- Improvements of MapReduce Framework -- Optimization for Task Scheduling and Load Balancing of MapReduce -- Job Scheduling of Hadoop -- Built-In Scheduling Algorithms of Hadoop -- Improvement of the Hadoop Job Scheduling Algorithm -- Improvement of the Hadoop Job Management Framework -- Performance Optimization of HDFS -- Small File Performance Optimization -- HDFS Security Optimization -- Performance Optimization of HBase -- HBase Framework, Storage, and Application Optimization -- Load Balancing of HBase -- Optimization of HBase Configuration -- Performance Enhancement of Hadoop System -- Efficiency Optimization of Hadoop -- Availability Optimization of Hadoop -- Conclusions and Future Directions -- References -- Packing Algorithms for Big Data Replay on Multicore -- Introduction -- Performance Bottlenecks -- Hadoop/MapReduce Performance Bottlenecks -- Performance Bottlenecks Under Parallel Loads -- Parameter Spaces for Storage and Shared 
Memory -- Main Storage Performance -- Shared Memory Performance -- The Big Data Replay Method -- The Replay Method -- Jobs as Sketches on a Timeline -- Performance Bottlenecks Under Replay -- Packing Algorithms -- Shared Memory Performance Tricks -- Big Data Replay at Scale -- Practical Packing Models -- Performance Analysis -- Hotspot Distributions -- Modeling Methodology -- Processing Overhead Versus Bottlenecks -- Control Grain for Drop Versus Drag Models -- Summary and Future Directions -- References -- Spatial Privacy Challenges in Social Networks -- Introduction -- Background -- Spatial Aspects of Social Networks -- Cloud-Based Big Data Infrastructure -- Spatial Privacy Case Studies -- Conclusions -- Acknowledgments -- References -- Security and Privacy in Big Data -- Introduction -- Secure Queries Over Encrypted Big Data -- System Model -- Threat Model and Attack Model -- Secure Query Scheme in Clouds -- Security Definition of Index-Based Secure Query Techniques -- Implementations of Index-Based Secure Query Techniques -- Other Big Data Security -- Digital Watermarking -- Self-Adaptive Risk Access Control -- Privacy on Correlated Big Data -- Correlated Data in Big Data -- Anonymity -- Differential Privacy -- Future Directions -- Conclusions -- References -- Location Inferring in Internet of Things and Big Data -- Introduction -- Device-Based Sensing Using Big Data -- Introduction -- Approach Overview -- Trajectories Matching -- Establishing the Mapping Between Floor Plan and RSS Readings -- User Localization -- Graph Matching Based Tracking -- Evaluation -- Device-Free Sensing Using Big Data -- Customer Behavior Identification -- Human Object Estimation
Note continued: Conclusion -- Acknowledgements -- References -- A Framework for Mining Thai Public Opinions -- Introduction -- XDOM -- Data Sources -- DOM System Architecture -- MapReduce Framework -- Sentiment Analysis -- Clustering-Based Summarization Framework -- Influencer Analysis -- AsKDOM: Mobile Application -- Implementation -- Server -- Core Service -- I/O -- Validation -- Validation Parameter -- Validation method -- Validation results -- Case Studies -- Political Opinion: #prayforthailand -- Bangkok Traffic Congestion Ranking -- Summary and Conclusions -- Acknowledgments -- References -- A Case Study in Big Data Analytics: Exploring Twitter Sentiment Analysis and the Weather -- Background -- Big Data System Components -- System Back-End Architecture -- System Front-End Architecture -- Software Stack -- Machine-Learning Methodology -- Tweets Sentiment Analysis -- Weather and Emotion Correlation Analysis -- System Implementation -- Home Page -- Sentiment Pages -- Weather Pages -- Key Findings -- Time Series -- Analysis with Hourly Weather Data -- Analysis with Daily Weather Data -- DBSCAN Cluster Algorithm -- Straightforward Weather Impact on Emotion -- Summary and Conclusions -- Acknowledgments -- References -- Dynamic Uncertainty-Based Analytics for Caching Performance Improvements in Mobile Broadband Wireless Networks -- Introduction -- Big Data Concerns -- Key Focus Areas -- Background -- Cellular Network and VoD -- Markov Processes -- Related Work -- VoD Architecture -- Overview -- Data Generation -- Edge and Core Components -- INCA Caching Algorithm -- QoE Estimation -- Theoretical Framework -- Experiments and Results -- Cache Hits With Nu, Nc, Nm and k -- QoE Impact With Prefetch Bandwidth -- User Satisfaction With Prefetch Bandwidth -- Synthetic Dataset -- INCA Hit Gain -- QoE Performance -- Satisfied Users -- Conclusions and Future Directions -- References -- Big Data Analytics on a Smart Grid: Mining PMU Data for Event and Anomaly Detection -- 
Introduction -- Smart Grid With PMUs and PDCs -- Improving Traditional Workflow -- Characterizing Normal Operation -- Identifying Unusual Phenomena -- Identifying Known Events -- Related Efforts -- Conclusion and Future Directions -- Acknowledgments -- References -- eScience and Big Data Workflows in Clouds: A Taxonomy and Survey -- Introduction -- Background -- History -- Grid-Based eScience -- Cloud Computing -- Taxonomy and Review of eScience Services in the Cloud -- Infrastructure -- Ownership -- Application -- Processing Tools -- Storage -- Security -- Service Models -- Collaboration -- Resource Provisioning for eScience Workflows in Clouds -- Motivation -- Our Solution -- Open Problems -- Summary -- References.

ISBN: 9780128053942 electronic bk 0128053941

LCCN: 2016933655

Subjects--Topical Terms:
Big data.

LC Class. No.: QA76.9.B45 / B5565 2016

Dewey Class. No.: 005.7