This post summarizes the Computing for Data Analysis course, along with some of my own notes from the lectures.
Data Types and Basic Operations
basic or ‘atomic’ classes
- character
- numeric
- integer
- complex
- logical (TRUE/FALSE)
The most basic object is a vector
Numbers
- numbers are generally treated as numeric objects (double-precision real numbers)
- 1 is a numeric object; 1L is an integer
- NaN represents an undefined value, e.g. 0/0
- Inf (infinity): 1/0 = Inf and 1/Inf = 0
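The number rules above can be checked at the console. This is only a minimal sketch; the variable names `x` and `y` are made up for illustration.

```r
x <- 1        # a plain number is stored as numeric (double)
y <- 1L       # the L suffix asks for an integer
class(x)      # "numeric"
class(y)      # "integer"

0 / 0         # NaN, an undefined value
1 / 0         # Inf
1 / Inf       # 0
is.nan(0 / 0) # TRUE
```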
Attributes of R objects
- names, dimnames
- dimensions
- class
- length
- other user-defined attributes
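A short sketch of how these attributes can be inspected and set. The vector `v` and the attribute name "units" are hypothetical, chosen only for the example.

```r
v <- c(a = 1, b = 2, c = 3)  # a named numeric vector

names(v)    # "a" "b" "c"
length(v)   # 3
class(v)    # "numeric"

# attach a user-defined attribute (the name "units" is an assumption)
attr(v, "units") <- "cm"
attributes(v)  # a list that now holds both names and units
```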
Matrices
- vectors with a dimension attribute. The dimension attribute is itself an integer vector
> m <- matrix(nrow = 2, ncol = 3)
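To see that a matrix really is a vector plus a dimension attribute, one possible check is to build the same matrix two ways. This is a sketch, not part of the course material.

```r
m <- matrix(1:6, nrow = 2, ncol = 3)  # filled column by column
dim(m)     # 2 3
m[1, 2]    # 3, the first element of the second column

# the same result by adding a dim attribute to a plain vector
v <- 1:6
dim(v) <- c(2, 3)
all(m == v)  # TRUE
```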
List
> x <- list(1, "a", TRUE, 1 + 4i)
NA
> is.na(NaN)
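A small sketch of how the two tests differ: is.na() flags both NA and NaN, while is.nan() flags only NaN. The vector `x` is made up for illustration.

```r
x <- c(1, NA, NaN, 4)

is.na(x)   # FALSE  TRUE  TRUE FALSE  (NaN counts as NA)
is.nan(x)  # FALSE FALSE  TRUE FALSE  (NA is not NaN)
```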
Data Frames
- data frames are usually created with read.table() or read.csv()
- a data frame can be converted to a matrix with data.matrix()
> x <- data.frame(foo = 1:4, bar = c(T, T, F, F))
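Continuing with the same data frame, a rough sketch of a few common checks and of the conversion to a matrix:

```r
x <- data.frame(foo = 1:4, bar = c(TRUE, TRUE, FALSE, FALSE))

nrow(x)            # 4
ncol(x)            # 2
sapply(x, class)   # each column keeps its own class

# data.matrix() coerces every column to a number,
# so the logical column becomes 1/0
m <- data.matrix(x)
is.matrix(m)       # TRUE
```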
Data Type and Basic Operations
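Subsetting lists
- [ always returns an object of the same class as the original
- [[ extracts a single element of a list or data frame; the returned object is not necessarily a list or data frame
- $ extracts an element of a list or data frame by name; it behaves much like [[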
> x <- list(foo = 1:4, bar = 0.6)
> x <- list(foo = 1:4, bar = 0.6, baz = "hello")
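The three operators can be compared on the list defined above. This is a sketch; the variable `name` is introduced only to show that [[ accepts a computed index while $ does not.

```r
x <- list(foo = 1:4, bar = 0.6, baz = "hello")

x[1]          # still a list, holding the element foo
x[[1]]        # the integer vector 1 2 3 4
x$bar         # 0.6
x[["bar"]]    # 0.6
x[c(1, 3)]    # [ can pull out several elements at once

name <- "foo"
x[[name]]     # works: the index is evaluated first
x$name        # NULL: $ looks for an element literally called "name"
```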
Subsetting Nested Elements of a List
- [[ can take an integer sequence
> x <- list(a = list(10, 12, 14), b = c(3.14, 2.81))
> x1 = 1:4; x2 = 1:4 + 10; x3 = 1:4 + 20; x4 = 1:4 + 30
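For the nested list `x` defined above, passing an integer sequence to [[ walks down one level per index. A minimal sketch:

```r
x <- list(a = list(10, 12, 14), b = c(3.14, 2.81))

x[[c(1, 3)]]   # 14: first element of x, then its third element
x[[1]][[3]]    # 14: the same thing written step by step
x[[c(2, 1)]]   # 3.14: second element of x, then its first element
```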
Removing NA values
> airquality[1:6, ]
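One common way to drop missing values, sketched here under the assumption that the built-in airquality dataset is available (it ships with R's datasets package). The names `bad` and `good` are illustrative.

```r
# for a single vector: build a logical mask with is.na()
x <- c(1, 2, NA, 4, NA, 5)
bad <- is.na(x)
x[!bad]    # 1 2 4 5

# for a data frame: keep only rows with no NA in any column
good <- complete.cases(airquality)
clean <- airquality[good, ]
head(clean)
```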
Vectorized Operations
> x <- matrix(1:4, 2, 2); y <- matrix(rep(10, 4), 2, 2)
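With the two matrices above, the difference between element-wise and true matrix multiplication can be sketched as follows. The vectors `a` and `b` are extra examples, not from the course.

```r
x <- matrix(1:4, 2, 2); y <- matrix(rep(10, 4), 2, 2)

x * y      # element-wise product: 10 30 / 20 40
x / y      # element-wise division
x %*% y    # true matrix multiplication: 40 40 / 60 60

# ordinary vectors are vectorized too, so no loop is needed
a <- 1:4; b <- 6:9
a + b      # 7 9 11 13
a > 2      # FALSE FALSE TRUE TRUE
```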
Reading and Writing Data
reading
- read.table, read.csv
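- readLines, reads lines from a text file
- source, reads R code files (the inverse of dump)
- dget, reads R code files (the inverse of dput)
- load, reads saved workspaces
- unserialize, reads a single R object in binary form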
writing
- write.table
- writeLines
- dump
- dput
- save
- serialize
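The reading and writing functions come in pairs. A rough sketch of two of those pairs, using temporary files so nothing is left behind; the object and file names are made up for the example.

```r
# dput / dget: write an object as R code, then read it back
y <- data.frame(a = 1:3, b = c("x", "y", "z"))
f <- tempfile(fileext = ".R")
dput(y, file = f)
y2 <- dget(f)
all.equal(y, y2)   # TRUE

# writeLines / readLines: plain text, one element per line
g <- tempfile(fileext = ".txt")
writeLines(c("first line", "second line"), g)
readLines(g)       # "first line" "second line"
```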
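When reading a large file with read.table, first estimate roughly whether the data will fit in memory. Specifying colClasses
makes the read faster. If the file contains no comments, set comment.char = "", which also speeds things up.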
> # read the first 100 rows to check the column types
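A sketch of the whole trick, assuming a whitespace-separated file with a header row. The sample file is generated here only so the example is self-contained; `path`, the column names, and the sizes in the memory estimate are all hypothetical.

```r
# build a small sample file (stand-in for the real large file)
path <- tempfile(fileext = ".txt")
sample_data <- data.frame(id = 1:500,
                          value = rnorm(500),
                          label = sample(letters, 500, replace = TRUE))
write.table(sample_data, path, row.names = FALSE)

# step 1: read only the first 100 rows and record each column's class
initial <- read.table(path, header = TRUE, nrows = 100)
classes <- sapply(initial, class)

# step 2: read the full file with the classes given up front
full <- read.table(path, header = TRUE,
                   colClasses = classes, comment.char = "")

# rough memory estimate for an all-numeric table: 8 bytes per value
n_rows <- 1500000; n_cols <- 120   # hypothetical size
n_rows * n_cols * 8 / 2^30         # about 1.34 GiB of raw data
```

The estimate covers only the raw values, so it is worth leaving extra headroom for the overhead of reading the file in.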