
The terminology is weird. I'm not an R expert, but here's how I think of it:

vector: this one is clear from the name; it's a homogeneous sequence (with very aggressive type conversion). A sequence of strings, a sequence of numerics, etc. One thing worth knowing is that there are no scalar types, so the value 1 is itself a length-one vector: identical(c(1), 1) is TRUE. Also, the empty call c() returns NULL, so is.null(c()) is TRUE. Weird.
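Here is a small sketch of those three points (coercion, no scalars, c() being NULL) that you can paste into an R console. The variable name x is just for illustration.

```r
# Mixing types in c() silently coerces everything to the most general type.
x <- c(1, "a", TRUE)
class(x)            # "character"

# A bare number is already a length-one vector.
identical(c(1), 1)  # TRUE
length(1)           # 1
is.vector(1)        # TRUE

# Calling c() with no arguments returns NULL.
is.null(c())        # TRUE
length(NULL)        # 0
```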

list: the name is confusing, but I think of it basically like a dict in Python. The syntax is even the same: list(a=1, b=2) vs dict(a=1, b=2). A list is also an ordered sequence, so you can index it by position, but I never use them that way. Lists are for ad hoc composite types: if I want to return 2 values from a function, I return a list() of them. Lists are not the same thing as environments, but you can convert between them easily (list2env(), as.list()), and environments are also similar to Python's dicts.
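A rough sketch of how I use lists, assuming nothing beyond base R. The names p, r and the helper min_max are made up for this example.

```r
# A named list used like a small record or dict.
p <- list(name = "a", value = 1)
p$name        # "a"
p[["value"]]  # 1

# Returning two values from a function by wrapping them in a list.
# min_max is a hypothetical helper written for this example.
min_max <- function(v) list(lo = min(v), hi = max(v))
r <- min_max(c(3, 1, 2))
r$lo          # 1
r$hi          # 3

# Lists are also ordered, so positional indexing works.
p[[1]]        # "a"

# Lists and environments are different types, but conversion is one call.
e <- list2env(p)
is.environment(e)        # TRUE
get("value", envir = e)  # 1
```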

data frame: This is the core type AFAICT; it is basically a collection of named column vectors of the same length, e.g. data.frame(name=c("a", "b", "c"), value=c(1,2,3)). This seems pretty intuitive. A row can mix types (like a tuple in a DB relation), but each column has a single type, since a column is a vector.
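To make the row/column distinction concrete, here is the same data frame in a short sketch. One caveat I'm aware of: before R 4.0.0 the name column would have been converted to a factor by default, so I only show the class of the numeric column.

```r
df <- data.frame(name = c("a", "b", "c"), value = c(1, 2, 3))

# Each column is an ordinary vector with a single type.
df$value         # 1 2 3
class(df$value)  # "numeric"

# All columns have the same length, which is the number of rows.
nrow(df)         # 3
ncol(df)         # 2

# A row is a one-row data frame that can mix types across columns.
row2 <- df[2, ]
row2$value       # 2
```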

matrix: I don't use these too much, but it basically seems like a homogeneous type like vector, except you specify the dimensions.

array: I don't use this, but the R documentation says "A 2-dimensional array is the same thing as a matrix". So I think what I described above is really an "array", and a matrix is the special 2D case. Yes, the names are bad. I tend to think of a matrix as having an arbitrary number of dimensions (e.g. in MATLAB).
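A quick sketch of the matrix/array relationship as I understand it, using only base R. The variable names m, a and v are illustrative.

```r
# A matrix is a homogeneous vector plus two dimensions.
m <- matrix(1:6, nrow = 2, ncol = 3)
dim(m)        # 2 3
is.matrix(m)  # TRUE
is.array(m)   # TRUE: a matrix is a 2-dimensional array

# An array generalizes this to any number of dimensions.
a <- array(1:24, dim = c(2, 3, 4))
dim(a)        # 2 3 4
is.matrix(a)  # FALSE: three dimensions, so not a matrix

# Giving a plain vector a dim attribute turns it into a matrix.
v <- 1:6
dim(v) <- c(2, 3)
is.matrix(v)  # TRUE
```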

I think where it gets confusing is that there are all these arbitrary conversions. And you can use things in more than the prescribed ways, so you might stumble across code that uses them wrong. But after a fair amount of R programming, that is my mental model, whether right or wrong :)

I think a lot of the mess comes from the fact that dealing with real data is just messy. R takes the mess and makes the common case convenient, and people like that. But it's like Perl in that it's a "Do what I mean" language that tries to guess a lot, rather than a "Do what I say" language like Python. And when it guesses your intent wrong it can leave you very frustrated, as with Perl.



Hi chubot,

Two things:

1) A data.frame is in fact a list of vectors of the same length "compacted" together.

2) I find the types very "sensible" for a person doing statistics. But I guess (almost) everything makes sense once you get used to it...
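Point 1 is easy to check from the console. A minimal sketch, using the same example data frame as in the parent comment:

```r
df <- data.frame(name = c("a", "b", "c"), value = c(1, 2, 3))

# Underneath, a data frame is a list with one element per column.
is.list(df)         # TRUE
typeof(df)          # "list"
length(df)          # 2, the number of columns
names(df)           # "name" "value"

# List-style access returns the column vector.
df[["value"]]       # 1 2 3

# Going the other way: a list of equal-length vectors becomes a data frame.
df2 <- as.data.frame(list(x = 1:3, y = c(10, 20, 30)))
is.data.frame(df2)  # TRUE
nrow(df2)           # 3
```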



