Category: Statistics
-
Raising Abstraction Level in SAS
In the following code, an attempt is made to raise the abstraction level available to the SAS programmer by providing list manipulation macros…
-
The Need For MapReduce and NoSQL
The Need for MapReduce. Relational Database Management Systems have been in use since the 1970s. They provide the SQL language interface. They are good at needle-in-the-haystack problems: finding small results from big datasets. They provide a number of advantages: a declarative query language, schemas, logical data independence, database indexing, optimizations through use…
-
Introduction to Hadoop
Apache Hadoop is a software framework for distributed processing of very large datasets. It provides a distributed storage system, the Hadoop Distributed File System (HDFS), and a processing component, MapReduce. The system is designed to recover from hardware failures in some of the nodes that make up the distributed cluster. Hadoop is…
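The map, shuffle, and reduce steps behind the MapReduce model can be sketched in a few lines. This is a single-process illustration in plain Python, not Hadoop's API; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical and chosen only for this example.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Group all emitted values by key, as the framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine the values for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run the three phases in sequence on an in-memory list of documents."""
    mapped = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data", "big cluster"])
print(counts)
```

In a real cluster the map and reduce calls would run on different nodes and the shuffle would move data over the network; here everything runs in one process to keep the data flow visible.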
-
Difference Equations
1. Difference Equations 1.1. Introduction Time series analysis deals with a series of random variables. 1.2. First Order Difference Equations We will study time-indexed random variables $y_t$. Let $y_t$ be a linear function of $y_{t-1}$ and $w_t$: $y_t = \phi y_{t-1} + w_t$ (1). Equation 1 is a linear first-order difference equation. It is a first-order difference equation because $y_t$ only depends on $y_{t-1}$ and…
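A linear first-order difference equation can be iterated forward directly. The sketch below assumes the standard form y_t = phi * y_{t-1} + w_t; the symbol names `phi`, `y_init`, and `w`, and the function name, are assumptions made for illustration, since the excerpt does not show the original notation.

```python
def iterate_difference_equation(phi, y_init, w):
    """Iterate y_t = phi * y_{t-1} + w_t, starting from y_init (assumed form)."""
    path = []
    y_prev = y_init
    for w_t in w:
        y_t = phi * y_prev + w_t  # each value depends only on the previous one
        path.append(y_t)
        y_prev = y_t
    return path


# With no input (w_t = 0) and |phi| < 1, the initial value decays geometrically.
path = iterate_difference_equation(phi=0.5, y_init=1.0, w=[0.0, 0.0, 0.0])
print(path)
```

Setting every `w_t` to zero isolates the effect of the initial condition, which is why the path halves at each step in this example.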
-
Misconceptions About P Values and The History of Hypothesis Testing
1. Introduction The misconceptions surrounding p values are an example where knowing the history of the field and the mathematical and philosophical principles behind it can greatly help in understanding. The classical statistical testing of today is a hybrid of the approaches taken by R. A. Fisher on the one hand and Jerzy Neyman and…
-
Ordinary Least Squares Under Standard Assumptions
Suppose that a scalar $y_t$ is related to a vector $x_t$ and a disturbance term $u_t$ according to the regression model $y_t = x_t'\beta + u_t$. In this article, we will study the estimation and hypothesis testing of $\beta$ when $x_t$ is deterministic and $u_t$ is i.i.d. Gaussian. 1. The Algebra of Linear Regression Given a sample of T values of $y_t$ and the vector $x_t$, the…
-
Confidence Interval Interpretation
In frequentist statistics (which is the one used by journals and academia), $\theta$ is a fixed quantity, not a random variable. Hence, a confidence interval is not a probability statement about $\theta$. A 95 percent confidence interval does not mean that the interval would capture the true value 95 percent of the time. This statement would…
-
Hypothesis Testing and p-values
1. Introduction Hypothesis testing is a method of inference. Definition 1 A hypothesis is a statement about a population parameter. Definition 2 Null and Alternate Hypothesis: We partition the parameter space $\Theta$ into two disjoint sets $\Theta_0$ and $\Theta_1$ and we wish to test: $H_0: \theta \in \Theta_0$ versus $H_1: \theta \in \Theta_1$. We call $H_0$ the null hypothesis and $H_1$ the alternate hypothesis. Definition 3 Rejection Region: Let…
-
Parametric Inference
There are two methods of estimating $\theta$. 1. Method of Moments It is a method of generating parametric estimators. These estimators are not optimal but they are easy to compute. They are also used to generate starting values for other numerical parametric estimation methods. Definition 1 Moments and Sample Moments: Suppose that the parameter $\theta$ has…
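As a rough illustration of the method of moments, the sketch below matches the first two sample moments to those of a Normal distribution. The choice of the Normal model and the function name `normal_method_of_moments` are assumptions for this example; the excerpt does not specify a model.

```python
def normal_method_of_moments(data):
    """Estimate (mu, sigma^2) of a Normal model by matching the first two moments.

    Population moments: E[X] = mu and E[X^2] = sigma^2 + mu^2.
    Setting them equal to the sample moments and solving gives the estimators below.
    """
    n = len(data)
    first_moment = sum(data) / n                   # sample mean
    second_moment = sum(x * x for x in data) / n   # sample mean of squares
    mu_hat = first_moment
    sigma2_hat = second_moment - first_moment ** 2
    return mu_hat, sigma2_hat


mu_hat, sigma2_hat = normal_method_of_moments([1.0, 2.0, 3.0, 4.0])
print(mu_hat, sigma2_hat)
```

The variance estimate divides by n rather than n - 1, which is what moment matching yields; such values are often good enough as starting points for a numerical estimation routine.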
-
Introduction to Statistical Inference
1. Introduction We assume that the data we are looking at comes from a probability distribution with some unknown parameters that control the exact shape of the distribution. Definition 1 Statistical Inference: It is the process of using given data to infer the properties of the distribution (for example the values of the parameters) which…