Welcome to Central Library, SUST

Big data : principles and paradigms / edited by Rajkumar Buyya, The University of Melbourne and Manjrasoft Pty Ltd, Australia, Rodrigo N. Calheiros, The University of Melbourne, Australia, Amir Vahid Dastjerdi, The University of Melbourne, Australia.

Material type: Text
Publisher: Cambridge, MA : Elsevier/Morgan Kaufmann, [2016]
Description: xxv, 468 pages : illustrations ; 24 cm
Content type:
  • text
Media type:
  • unmediated
Carrier type:
  • volume
ISBN:
  • 9780128053942
  • 0128053941
DDC classification:
  • 005.7 23
LOC classification:
  • QA76.9.B45 B5565 2016
Contents:
  • ch. 1 BDA = ML + CC -- 1.1. Introduction -- 1.2. A Historical Review of Big Data -- 1.2.1. The Origin of Big Data -- 1.2.2. Debates of Big Data Implication -- 1.3. Historical Interpretation of Big Data -- 1.3.1. Methodology for Defining Big Data -- 1.3.2. Different Attributes of Definitions -- 1.3.3. Summary of 7 Types of Definitions of Big Data -- 1.3.4. Motivations Behind the Definitions -- 1.4. Defining Big Data From 3Vs to 32Vs -- 1.4.1. Data Domain -- 1.4.2. Business Intelligence (BI) Domain -- 1.4.3. Statistics Domain -- 1.4.4. 32 Vs Definition and Big Data Venn Diagram -- 1.5. Big Data Analytics and Machine Learning -- 1.5.1. Big Data Analytics -- 1.5.2. Machine Learning -- 1.6. Big Data Analytics and Cloud Computing -- 1.7. Hadoop, HDFS, MapReduce, Spark, and Flink -- 1.7.1. Google File System (GFS) and HDFS -- 1.7.2. MapReduce -- 1.7.3. The Origin of the Hadoop Project -- 1.7.4. Spark and Spark Stack -- 1.7.5. Flink and Other Data Process Engines -- 1.7.6. Summary of Hadoop and Its Ecosystems -- 1.8. ML + CC -> BDA and Guidelines -- 1.9. Conclusion -- References
  • ch. 2 Real-Time Analytics -- 2.1. Introduction -- 2.2. Computing Abstractions for Real-Time Analytics -- 2.3. Characteristics of Real-Time Systems -- 2.3.1. Low Latency -- 2.3.2. High Availability -- 2.3.3. Horizontal Scalability -- 2.4. Real-Time Processing for Big Data: Concepts and Platforms -- 2.4.1. Event -- 2.4.2. Event Processing -- 2.4.3. Event Stream Processing and Data Stream Processing -- 2.4.4. Complex Event Processing -- 2.4.5. Event Type -- 2.4.6. Event Pattern -- 2.5. Data Stream Processing Platforms -- 2.5.1. Spark -- 2.5.2. Storm -- 2.5.3. Kafka -- 2.5.4. Flume -- 2.5.5. Amazon Kinesis -- 2.6. Data Stream Analytics Platforms -- 2.6.1. Query-Based EPSs -- 2.6.2. Rule-Oriented EPSs -- 2.6.3. Programmatic EPSs -- 2.7. Data Analysis and Analytic Techniques -- 2.7.1. Data Analysis in General -- 2.7.2. Data Analysis for Stream Applications -- 2.8. Finance Domain Requirements and a Case Study -- 2.8.1. Real-Time Analytics in Finance Domain -- 2.8.2. Selected Scenarios -- 2.8.3. CEP Application as a Case Study -- 2.9. Future Research Challenges -- References
  • ch. 3 Big Data Analytics for Social Media -- 3.1. Introduction -- 3.2. NLP and Its Applications -- 3.2.1. Language Detection -- 3.2.2. Named Entity Recognition -- 3.3. Text Mining -- 3.3.1. Sentiment Analysis -- 3.3.2. Trending Topics -- 3.3.3. Recommender Systems -- 3.4. Anomaly Detection -- Acknowledgments -- References
  • ch. 4 Deep Learning and Its Parallelization -- 4.1. Introduction -- 4.1.1. Application Background -- 4.1.2. Performance Demands for Deep Learning -- 4.1.3. Existing Parallel Frameworks of Deep Learning -- 4.2. Concepts and Categories of Deep Learning -- 4.2.1. Deep Learning -- 4.2.2. Mainstream Deep Learning Models -- 4.3. Parallel Optimization for Deep Learning -- 4.3.1. Convolutional Architecture for Fast Feature Embedding -- 4.3.2. DistBelief -- 4.3.3. Deep Learning Based on Multi-GPUs -- 4.4. Discussions -- 4.4.1. Grand Challenges of Deep Learning in Big Data -- 4.4.2. Future Directions -- References
  • ch. 5 Characterization and Traversal of Large Real-World Networks -- 5.1. Introduction -- 5.2. Background -- 5.3. Characterization and Measurement -- 5.4. Efficient Complex Network Traversal -- 5.4.1. HPC Traversal of Large Networks -- 5.4.2. Algorithms for Accelerating AS-BFS on GPU -- 5.4.3. Performance Study of AS-BFS on GPUs -- 5.5. k-Core-Based Partitioning for Heterogeneous Graph Processing -- 5.5.1. Graph Partitioning for Heterogeneous Computing -- 5.5.2. k-Core-Based Complex-Network Unbalanced Bisection -- 5.6. Future Directions -- 5.7. Conclusions -- Acknowledgments -- References
  • ch. 6 Database Techniques for Big Data -- 6.1. Introduction -- 6.2. Background -- 6.2.1. Navigational Data Models -- 6.2.2. Relational Data Models -- 6.3. NoSQL Movement -- 6.4. NoSQL Solutions for Big Data Management -- 6.5. NoSQL Data Models -- 6.5.1. Key-Value Stores -- 6.5.2. Column-Based Stores -- 6.5.3. Graph-Based Stores -- 6.5.4. Document-Based Stores -- 6.6. Future Directions -- 6.7. Conclusions -- References
  • ch. 7 Resource Management in Big Data Processing Systems -- 7.1. Introduction -- 7.2. Types of Resource Management -- 7.2.1. CPU and Memory Resource Management -- 7.2.2. Storage Resource Management -- 7.2.3. Network Resource Management -- 7.3. Big Data Processing Systems and Platforms -- 7.3.1. Hadoop -- 7.3.2. Dryad -- 7.3.3. Pregel -- 7.3.4. Storm -- 7.3.5. Spark -- 7.3.6. Summary -- 7.4. Single-Resource Management in the Cloud -- 7.4.1. Desired Resource Allocation Properties -- 7.4.2. Problems for Existing Fairness Policies -- 7.4.3. Long-Term Resource Allocation Policy -- 7.4.4. Experimental Evaluation -- 7.5. Multiresource Management in the Cloud -- 7.5.1. Resource Allocation Model -- 7.5.2. Multiresource Fair Sharing Issues -- 7.5.3. Reciprocal Resource Fairness -- 7.5.4. Experimental Evaluation -- 7.6. Related Work on Resource Management -- 7.6.1. Resource Utilization Optimization -- 7.6.2. Power and Energy Cost Saving Optimization -- 7.6.3. Monetary Cost Optimization -- 7.6.4. Fairness Optimization -- 7.7. Open Problems -- 7.7.1. SLA Guarantee for Applications -- 7.7.2. Various Computation Models and Systems -- 7.7.3. Exploiting Emerging Hardware -- 7.8. Summary -- References
  • ch. 8 Local Resource Consumption Shaping: A Case for MapReduce -- 8.1. Introduction -- 8.2. Motivation -- 8.2.1. Pitfalls of Fair Resource Sharing -- 8.3. Local Resource Shaper -- 8.3.1. Design Philosophy -- 8.3.2. Splitter -- 8.3.3. The Interleave MapReduce Scheduler -- 8.4. Evaluation -- 8.4.1. Experiments With Hadoop 1.x -- 8.4.2. Experiments With Hadoop 2.x -- 8.5. Related Work -- 8.6. Conclusions -- Appendix: CPU Utilization With Different Slot Configurations and LRS -- References
  • ch. 9 System Optimization for Big Data Processing -- 9.1. Introduction -- 9.2. Basic Framework of the Hadoop Ecosystem -- 9.3. Parallel Computation Framework: MapReduce -- 9.3.1. Improvements of MapReduce Framework -- 9.3.2. Optimization for Task Scheduling and Load Balancing of MapReduce -- 9.4. Job Scheduling of Hadoop -- 9.4.1. Built-In Scheduling Algorithms of Hadoop -- 9.4.2. Improvement of the Hadoop Job Scheduling Algorithm -- 9.4.3. Improvement of the Hadoop Job Management Framework -- 9.5. Performance Optimization of HDFS -- 9.5.1. Small File Performance Optimization -- 9.5.2. HDFS Security Optimization -- 9.6. Performance Optimization of HBase -- 9.6.1. HBase Framework, Storage, and Application Optimization -- 9.6.2. Load Balancing of HBase -- 9.6.3. Optimization of HBase Configuration -- 9.7. Performance Enhancement of Hadoop System -- 9.7.1. Efficiency Optimization of Hadoop -- 9.7.2. Availability Optimization of Hadoop -- 9.8. Conclusions and Future Directions -- References
  • ch. 10 Packing Algorithms for Big Data Replay on Multicore -- 10.1. Introduction -- 10.2. Performance Bottlenecks -- 10.2.1. Hadoop/MapReduce Performance Bottlenecks -- 10.2.2. Performance Bottlenecks Under Parallel Loads -- 10.2.3. Parameter Spaces for Storage and Shared Memory -- 10.2.4. Main Storage Performance -- 10.2.5. Shared Memory Performance -- 10.3. The Big Data Replay Method -- 10.3.1. The Replay Method -- 10.3.2. Jobs as Sketches on a Timeline -- 10.3.3. Performance Bottlenecks Under Replay -- 10.4. Packing Algorithms -- 10.4.1. Shared Memory Performance Tricks -- 10.4.2. Big Data Replay at Scale -- 10.4.3. Practical Packing Models -- 10.5. Performance Analysis -- 10.5.1. Hotspot Distributions -- 10.5.2. Modeling Methodology -- 10.5.3. Processing Overhead Versus Bottlenecks -- 10.5.4. Control Grain for Drop Versus Drag Models -- 10.6. Summary and Future Directions -- References
  • ch. 11 Spatial Privacy Challenges in Social Networks -- 11.1. Introduction -- 11.2. Background -- 11.3. Spatial Aspects of Social Networks -- 11.4. Cloud-Based Big Data Infrastructure -- 11.5. Spatial Privacy Case Studies -- 11.6. Conclusions -- Acknowledgments -- References
  • ch. 12 Security and Privacy in Big Data -- 12.1. Introduction -- 12.2. Secure Queries Over Encrypted Big Data -- 12.2.1. System Model -- 12.2.2. Threat Model and Attack Model -- 12.2.3. Secure Query Scheme in Clouds -- 12.2.4. Security Definition of Index-Based Secure Query Techniques -- 12.2.5. Implementations of Index-Based Secure Query Techniques -- 12.3. Other Big Data Security -- 12.3.1. Digital Watermarking -- 12.3.2. Self-Adaptive Risk Access Control -- 12.4. Privacy on Correlated Big Data -- 12.4.1. Correlated Data in Big Data -- 12.4.2. Anonymity -- 12.4.3. Differential Privacy -- 12.5. Future Directions -- 12.6. Conclusions -- References
  • ch. 13 Location Inferring in Internet of Things and Big Data -- 13.1. Introduction -- 13.2. Device-Based Sensing Using Big Data -- 13.2.1. Introduction -- 13.2.2. Approach Overview -- 13.2.3. Trajectories Matching -- 13.2.4. Establishing the Mapping Between Floor Plan and RSS Readings -- 13.2.5. User Localization -- 13.2.6. Graph Matching Based Tracking -- 13.2.7. Evaluation -- 13.3. Device-Free Sensing Using Big Data -- 13.3.1. Customer Behavior Identification -- 13.3.2. Human Object Estimation -- 13.4. Conclusion -- Acknowledgments -- References
  • ch. 14 A Framework for Mining Thai Public Opinions -- 14.1. Introduction -- 14.2. XDOM -- 14.2.1. Data Sources -- 14.2.2. DOM System Architecture -- 14.2.3. MapReduce Framework -- 14.2.4. Sentiment Analysis -- 14.2.5. Clustering-Based Summarization Framework -- 14.2.6. Influencer Analysis -- 14.2.7. AsKDOM: Mobile Application -- 14.3. Implementation -- 14.3.1. Server -- 14.3.2. Core Service -- 14.3.3. I/O -- 14.4. Validation -- 14.4.1. Validation Parameter -- 14.4.2. Validation Method -- 14.4.3. Validation Results -- 14.5. Case Studies -- 14.5.1. Political Opinion: #prayforthailand -- 14.5.2. Bangkok Traffic Congestion Ranking -- 14.6. Summary and Conclusions -- Acknowledgments -- References
  • ch. 15 A Case Study in Big Data Analytics: Exploring Twitter Sentiment Analysis and the Weather -- 15.1. Background -- 15.2. Big Data System Components -- 15.2.1. System Back-End Architecture -- 15.2.2. System Front-End Architecture -- 15.2.3. Software Stack -- 15.3. Machine-Learning Methodology -- 15.3.1. Tweets Sentiment Analysis -- 15.3.2. Weather and Emotion Correlation Analysis -- 15.4. System Implementation -- 15.4.1. Home Page -- 15.4.2. Sentiment Pages -- 15.4.3. Weather Pages -- 15.5. Key Findings -- 15.5.1. Time Series -- 15.5.2. Analysis With Hourly Weather Data -- 15.5.3. Analysis With Daily Weather Data -- 15.5.4. DBSCAN Cluster Algorithm -- 15.5.5. Straightforward Weather Impact on Emotion -- 15.6. Summary and Conclusions -- Acknowledgments -- References
  • ch. 16 Dynamic Uncertainty-Based Analytics for Caching Performance Improvements in Mobile Broadband Wireless Networks -- 16.1. Introduction -- 16.1.1. Big Data Concerns -- 16.1.2. Key Focus Areas -- 16.2. Background -- 16.2.1. Cellular Network and VoD -- 16.2.2. Markov Processes -- 16.3. Related Work -- 16.4. VoD Architecture -- 16.5. Overview -- 16.6. Data Generation -- 16.7. Edge and Core Components -- 16.8. INCA Caching Algorithm -- 16.9. QoE Estimation -- 16.10. Theoretical Framework -- 16.11. Experiments and Results -- 16.11.1. Cache Hits With Nu, Nc, Nm and k -- 16.11.2. QoE Impact With Prefetch Bandwidth -- 16.11.3. User Satisfaction With Prefetch Bandwidth -- 16.12. Synthetic Dataset -- 16.12.1. INCA Hit Gain -- 16.12.2. QoE Performance -- 16.12.3. Satisfied Users -- 16.13. Conclusions and Future Directions -- References
  • ch. 17 Big Data Analytics on a Smart Grid: Mining PMU Data for Event and Anomaly Detection -- 17.1. Introduction -- 17.2. Smart Grid With PMUs and PDCs -- 17.3. Improving Traditional Workflow -- 17.4. Characterizing Normal Operation -- 17.5. Identifying Unusual Phenomena -- 17.6. Identifying Known Events -- 17.7. Related Efforts -- 17.8. Conclusion and Future Directions -- Acknowledgments -- References
  • ch. 18 eScience and Big Data Workflows in Clouds: A Taxonomy and Survey -- 18.1. Introduction -- 18.2. Background -- 18.2.1. History -- 18.2.2. Grid-Based eScience -- 18.2.3. Cloud Computing -- 18.3. Taxonomy and Review of eScience Services in the Cloud -- 18.3.1. Infrastructure -- 18.3.2. Ownership -- 18.3.3. Application -- 18.3.4. Processing Tools -- 18.3.5. Storage -- 18.3.6. Security -- 18.3.7. Service Models -- 18.3.8. Collaboration -- 18.4. Resource Provisioning for eScience Workflows in Clouds -- 18.4.1. Motivation -- 18.4.2. Our Solution -- 18.5. Open Problems -- 18.6. Summary -- References.
No physical items for this record

Includes bibliographical references and index.
