| Foreword | |
| Preface | |
| Acknowledgments | |
| Preliminaries | |
| Introduction | |
| What Is Data Mining? | |
| Where Is Data Mining Used? | |
| The Origins of Data Mining | |
| The Rapid Growth of Data Mining | |
| Why Are There So Many Different Methods? | |
| Terminology and Notation | |
| Road Maps to Thi... | |
| Overview of the Data Mining Process | |
| Introduction | |
| Core Ideas in Data Mining | |
| Supervised and Unsupervised Learning | |
| The Steps in Data Mining | |
| Preliminary Steps | |
| Building a Model: Example with Linear Regression | |
| Using Excel for Data Mining | |
| Problems | |
| Data Exploration and Dimension Reduction | |
| Data Visualization | |
| Uses of Data Visualization | |
| Data Examples | |
| Boston Housing Data | |
| Ridership on Amtrak Trains | |
| Basic Charts: bar charts, line graphs, and scatterplots | |
| Distribution Plots | |
| Heatmaps: visualizing correlations and missing values | |
| Multidimensional Visualization | |
| Adding Variables: color, hue, size, shape, multiple panels, animation | |
| Manipulations: rescaling, aggregation and hierarchies, zooming and panning, filtering | |
| Reference: trend line and labels | |
| Scaling up: large datasets | |
| Multivariate plot: parallel coordinates plot | |
| Interactive visualization | |
| Specialized Visualizations | |
| Visualizing networked data | |
| Visualizing hierarchical data: treemaps | |
| Visualizing geographical data: maps | |
| Summary of major visualizations and operations, according to data mining goal | |
| Prediction | |
| Classification | |
| Time series forecasting | |
| Unsupervised learning | |
| Problems | |
| Dimension Reduction | |
| Introduction | |
| Practical Considerations | |
| House Prices in Boston | |
| Data Summaries | |
| Correlation Analysis | |
| Reducing the Number of Categories in Categorical Variables | |
| Converting a Categorical Variable to a Numerical Variable | |
| Principal Components Analysis | |
| Breakfast Cereals | |
| Principal Components | |
| Normalizing the Data | |
| Using Principal Components for Classification and Prediction | |
| Dimension Reduction Using Regression Models | |
| Dimension Reduction Using Classification and Regression Trees | |
| Problems | |
| Performance Evaluation | |
| Evaluating Classification and Predictive Performance | |
| Introduction | |
| Judging Classification Performance | |
| Benchmark: The Naive Rule | |
| Class Separation | |
| The Classification Matrix | |
| Using the Validation Data | |
| Accuracy Measures | |
| Cutoff for Classification | |
| Performance in Unequal Importance of Classes | |
| Asymmetric Misclassification Costs | |
| Oversampling and Asymmetric Costs | |
| Classification Using a Triage Strategy | |
| Evaluating Predictive Performance | |
| Benchmark: The Average | |
| Prediction Accuracy Measures | |
| Problems | |
| Prediction and Classification Methods | |
| Multiple Linear Regression | |
| Introduction | |
| Explanatory vs. Predictive Modeling | |
| Estimating the Regression Equation and Prediction | |
| Example: Predicting the Price of Used Toyota Corolla Automobiles | |
| Variable Selection in Linear Regression | |
| Reducing the Number of Predictors | |
| How to Reduce the Number of Predictors | |
| Problems | |
| k-Nearest Neighbors (k-NN) | |
| The k-NN Classifier | |
| Determining Neighbors | |
| Classification Rule | |
| Example: Riding Mowers | |
| Choosing k | |
| Setting the Cutoff Value | |
| k-NN with More Than 2 Classes | |
| k-NN for a Numerical Response | |
| Advantages and Shortcomings of k-NN Algorithms | |
| Problems | |
| Naive Bayes | |
| Introduction | |
| Predicting Fraudulent Financial Reporting | |
| The Practical Difficulty with the Complete (Exact) Bayes Procedure | |
| The Solution: Naive Bayes | |
| Predicting Fraudulent Financial Reports, 2 Predictors | |
| Predicting Delayed Flights | |
| Advantages and Shortcomings of the naive Bayes Classifier | |
| Problems | |
| Classification and Regression Trees | |
| Introduction | |
| Classification Trees | |
| Recursive Partitioning | |
| Riding Mowers | |
| Measures of Impurity | |
| Evaluating the Performance of a Classification Tree | |
| Acceptance of Personal Loan | |
| Avoiding Overfitting | |
| Stopping Tree Growth: CHAID | |
| Pruning the Tree | |
| Classification Rules from Trees | |
| Classification Trees for More Than 2 Classes | |
| Regression Trees | |
| Prediction | |
| Measuring Impurity | |
| Evaluating Performance | |
| Advantages, Weaknesses, and Extensions | |
| Problems | |
| Logistic Regression | |
| Introduction | |
| The Logistic Regression Model | |
| Example: Acceptance of Personal Loan | |
| Model with a Single Predictor | |
| Estimating the Logistic Model from Data: Computing Parameter Estimates | |
| Interpreting Results in Terms of Odds | |
| Evaluating Classification Performance | |
| Variable Selection | |
| Example of Complete Analysis: Predicting Delayed Flights | |
| Data Preprocessing | |
| Model Fitting and Estimation | |
| Model Interpretation | |
| Model Performance | |
| Variable Selection | |
| Appendix: Logistic Regression for Profiling | |
| Appendix B: Evaluating Goodness of Fit | |
| Appendix C: Logistic Regression for More Than Two Classes | |
| Problems | |
| Neural Nets | |
| Introduction | |
| Concept and Structure of a Neural Network | |
| Fitting a Network to Data | |
| Tiny Dataset | |
| Computing Output of Nodes | |
| Preprocessing the Data | |
| Training the Model | |
| Classifying Accident Severity | |
| Avoiding overfitting | |
| Using the Output for Prediction and Classification | |
| Required User Input | |
| Exploring the Relationship Between Predictors and Response | |
| Advantages and Weaknesses of Neural Networks | |
| Problems | |
| Discriminant Analysis | |
| Introduction | |
| Riding Mowers | |
| Personal Loan Acceptance | |
| Distance of an Observation from a Class | |
| Fisher's Linear Classification Functions | |
| Classification Performance of Discriminant Analysis | |
| Prior Probabilities | |
| Unequal Misclassification Costs | |
| Classifying More Than Two Classes | |
| Medical Dispatch to Accident Scenes | |
| Advantages and Weaknesses | |
| Problems | |
| Mining Relationships Among Records | |
| Association Rules | |
| Introduction | |
| Discovering Association Rules in Transaction Databases | |
| Synthetic Data on Purchases of Phone Faceplates | |
| Generating Candidate Rules | |
| The Apriori Algorithm | |
| Selecting Strong Rules | |
| Support and Confidence | |
| Lift Ratio | |
| Data Format | |
| The Process of Rule Selection | |
| Interpreting the Results | |
| Statistical Significance of Rules | |
| Rules for Similar Book Purchases | |
| Summary | |
| Problems | |
| Cluster Analysis | |
| Introduction | |
| Example: Public Utilities | |
| Measuring Distance Between Two Records | |
| Euclidean Distance | |
| Normalizing Numerical Measurements | |
| Other Distance Measures for Numerical Data | |
| Distance Measures for Categorical Data | |
| Distance Measures for Mixed Data | |
| Measuring Distance Between Two Clusters | |
| Hierarchical (Agglomerative) Clustering | |
| Minimum Distance (Single Linkage) | |
| Maximum Distance (Complete Linkage) | |
| Average Distance (Average Linkage) | |
| Dendrograms: Displaying Clustering Process and Results | |
| Validating Clusters | |
| Limitations of Hierarchical Clustering | |
| Nonhierarchical Clustering: The k-Means Algorithm | |
| Initial Partition into k Clusters | |
| Problems | |
| Forecasting Time Series | |
| Handling Time Series | |
| Introduction | |
| Explanatory vs. Predictive Modeling | |
| Popular Forecasting Methods in Business | |
| Combining Methods | |
| Time Series Components | |
| Example: Ridership on Amtrak Trains | |
| Data Partitioning | |
| Problems | |
| Regression-Based Forecasting | |
| A Model with Trend | |
| Linear Trend | |
| Exponential Trend | |
| Polynomial Trend | |
| A Model with Seasonality | |
| A Model with Trend and Seasonality | |
| Autocorrelation and ARIMA Models | |
| Computing Autocorrelation | |
| Improving Forecasts by Integrating Autocorrelation Information | |
| Evaluating Predictability | |
| Problems | |
| Smoothing Methods | |
| Introduction | |
| Moving Average | |
| Centered Moving Average for Visualization | |
| Trailing Moving Average for Forecasting | |
| Choosing Window Width | |
| Simple Exponential Smoothing | |
| Choosing Smoothing Parameter | |
| Relation Between Moving Average and Simple Exponential Smoothing | |
| Advanced Exponential Smoothing | |
| Series with a trend | |
| Series with a trend and seasonality | |
| Series with seasonality | |
| Problems | |
| Cases | |
| Cases | |
| Charles Book Club | |
| German Credit | |
| Tayko Software Cataloger | |
| Segmenting Consumers of Bath Soap | |
| Direct-Mail Fundraising | |
| Catalog Cross-Selling | |
| Predicting Bankruptcy | |
| Time Series Case: Forecasting Public Transportation Demand | |
| References | |
| Index | |
| Table of Contents provided by Publisher. All Rights Reserved. |
NITIN R. PATEL, PhD, is Chairman and cofounder of Cytel, Inc., based in Cambridge, Massachusetts. A Fellow of the American Statistical Association, Dr. Patel has also served as a Visiting Professor at the Massachusetts Institute of Technology for over ten years.
PETER C. BRUCE is President and owner of statistics.com, the leading provider of online education in statistics.