Tuesday, October 20, 2015

The R Language: Vectors Everywhere.

When learning the R language, one of the data types you will hear about is called a vector.  In typical scientific and mathematical language a vector is generally defined as having a magnitude and a direction. This is commonplace in fields such as geometry and higher-level mathematics, including the interactions and motions of particles in 2- or 3-dimensional spaces.

In languages like Java and C++, Vectors are expandable collections of a specific object type. These Vectors can make a program's memory use more flexible, since the size of the collection does not need to be known in advance. In R, everything is built from vectors; even scalar values are simply vectors of length 1. The basic types are even called the "atomic vectors": logical (boolean), integer, double, character (string), complex, and raw. Below you can see the atomic vectors in action.

> a <- 1+2i
> b <- as.integer(1)
> c <- 1.0
> d <- TRUE
> e <- 'Testing'
> f <- charToRaw('A')
> typeof(a)
[1] "complex"
> typeof(b)
[1] "integer"
> typeof(c)
[1] "double"
> typeof(d)
[1] "logical"
> typeof(e)
[1] "character"
> typeof(f)
[1] "raw"
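
To make the "scalars are vectors of length 1" point concrete, here is a minimal sketch. The variable name x is purely illustrative and not part of the session above.

```r
# A "scalar" in R is just an atomic vector with one element.
x <- 42

len       <- length(x)            # number of elements: 1
is_vec    <- is.vector(x)         # TRUE: x is a vector
first     <- x[1]                 # indexing works, and starts at 1
same_as_c <- identical(x, c(42))  # TRUE: c(42) builds the same object

print(len)
print(is_vec)
print(first)
print(same_as_c)
```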

To see the true power of R, however, you need to organize information into larger structures. Vectors can hold more than a single element, but all elements must be of the same basic type. If an existing vector can't accept a new element as-is, R will convert (coerce) the existing elements to a type that can hold everything. In the example below, the vector starts as a logical type with a length of 0 and ends up as a character type.

> vec <- vector()
> typeof(vec)
[1] "logical"
> vec[1] <- 1
> typeof(vec)
[1] "double"
> vec[2] <- 'a'
> typeof(vec)
[1] "character"
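
The same conversion happens when a vector is built in one step with c(). The sketch below assumes the usual coercion order for the common atomic types (logical, then integer, then double, then character); the variable names are illustrative only.

```r
# Mixing types in c() coerces every element to the most general type present.
v1 <- c(TRUE, 2L)    # logical + integer   -> integer
v2 <- c(1L, 2.5)     # integer + double    -> double
v3 <- c(1.5, "a")    # double  + character -> character

print(typeof(v1))    # "integer"
print(typeof(v2))    # "double"
print(typeof(v3))    # "character"

# The numeric value is stored as text once a string is present.
print(v3)            # "1.5" "a"
```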

Vectors and other collection types are accessed with a 1-based index. The other built-in types available are Lists, Matrices, and Data Frames. Lists are simple, and unlike vectors they don't require all of their elements to be of the same type. They also have the advantage of allowing key-value access.
> ll <- list()
> ll[1] <- 1
> ll[2] <- 2
> ll[3] <- 'a'
> ll['foo'] <- 'bar'
> ll
[[1]]
[1] 1

[[2]]
[1] 2

[[3]]
[1] "a"

$foo
[1] "bar"

> ll$foo      # Name based access with $ operator
[1] "bar"
> ll[['foo']] # Named access method with double bracket
[1] "bar"
> ll[[4]]      # Index Based Access
[1] "bar"
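
One detail worth seeing is the difference between single and double brackets on a list. This is a small sketch that rebuilds an equivalent list in one call to list() rather than by repeated assignment; the names single and double_br are illustrative.

```r
# Same contents as the list above, built in a single call.
ll <- list(1, 2, "a", foo = "bar")

single    <- ll["foo"]    # single bracket: a sub-list of length 1
double_br <- ll[["foo"]]  # double bracket: the element itself

print(typeof(single))     # "list"
print(typeof(double_br))  # "character"

# Unnamed elements get an empty string as their name.
print(names(ll))          # "" "" "" "foo"
```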

Unlike vectors, members of a list are accessed directly with the [[]] double bracket. If a member is named, it can be accessed either by its index or by its associated name. A matrix behaves like its mathematical counterpart and requires that all elements be of the same type. In fact, many matrix operations, such as finding the determinant, are part of the base installation.
> test <- c(1,2,3,4)
> testMat <- matrix(test, nrow=2, byrow = TRUE)
> testMat
     [,1] [,2]
[1,]    1    2
[2,]    3    4
> testMat[2,1]
[1] 3
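
As a quick sketch of the base matrix operations mentioned above, the following uses det(), t(), solve(), and the %*% operator on the same 2x2 matrix. The result variable names are illustrative.

```r
testMat <- matrix(c(1, 2, 3, 4), nrow = 2, byrow = TRUE)

d       <- det(testMat)        # determinant: 1*4 - 2*3, about -2
tr      <- t(testMat)          # transpose
inv     <- solve(testMat)      # matrix inverse
product <- testMat %*% inv     # matrix product; close to the identity

print(d)
print(tr)
print(round(product, 10))      # rounding hides floating-point noise
```

Note that * on two matrices multiplies element by element, while %*% performs true matrix multiplication.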

Data Frames are the last thing we will talk about in this post. They act much like one would expect a table from a database to act. Each row can contain heterogeneous data, but all data in a column must be of the same type. Information can be accessed in a number of ways, which provide column vectors, column slices, and row slices. Column vectors provide only the values in a particular column, and can be useful for aggregate functions such as averages and sums. Column slices keep the name associated with the specific column, and finally row slices provide all rows which meet a particular criterion.
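
Here is a minimal sketch of those three access styles. The data frame, its column names (name, age), and its values are made up for illustration.

```r
# A small, hypothetical data frame: one character column, one numeric column.
df <- data.frame(
  name = c("Ann", "Bob", "Cid"),
  age  = c(31, 45, 27),
  stringsAsFactors = FALSE
)

ages    <- df$age             # column vector: a plain numeric vector
avg_age <- mean(ages)         # aggregate over the column
age_col <- df["age"]          # column slice: a data frame that keeps the name
older   <- df[df$age > 30, ]  # row slice: all rows meeting the criterion

print(avg_age)
print(class(age_col))         # "data.frame"
print(older)
```

The trailing comma in df[df$age > 30, ] matters: the part before the comma selects rows, and leaving the part after it empty keeps every column.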

I hope this has been helpful in learning R.

