Finding out (fast) the classes of data.frame vectors

Sometimes it’s useful to write down the classes of the vectors inside your data.frame objects, for documentation and so that other people can use them.

I’ve searched for a quick way to find out all the classes of vectors inside a data.frame.

Since I found no reference to such a function or procedure, I made one up.

I’d like to hear what people think about the following use of the “class” function on data.frames.

A simple call:

> library(rpart) # comes with R
> data(kyphosis) # comes with rpart
> class(kyphosis)
[1] "data.frame"

Trying to use the “apply” function to find out the classes of the columns in the data.frame yields the following unwanted result:

> apply(kyphosis,2,class)
Kyphosis         Age      Number       Start 
"character" "character" "character" "character" 

For some reason, the apply function returns “character” for every column regardless of its actual content (any ideas why?).
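One way to see what is going on is to inspect what apply() actually receives. Base R's apply() converts a data.frame with as.matrix() before iterating, and a matrix holds a single type. The sketch below uses a small made-up data.frame (df, built inline so rpart is not needed) that only mimics the shape of kyphosis; the values are hypothetical.

```r
# Hypothetical stand-in for kyphosis: one factor column plus integer columns.
df <- data.frame(
  Kyphosis = factor(c("absent", "present", "absent")),
  Age      = c(71L, 158L, 128L),
  Number   = c(3L, 3L, 4L)
)

# apply() first turns a data.frame into a matrix, as this call does.
m <- as.matrix(df)
typeof(m)        # "character": the factor column forces a character matrix
class(m[1, ])    # a row of that matrix is a plain character vector

# So every column apply() hands to class() is already character.
apply(df, 2, class)
```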

Anyhow, after some thought I’ve come up with the following function:

> allClass <- function(x) {unlist(lapply(unclass(x),class))}
> allClass(kyphosis)
Kyphosis       Age    Number     Start 
 "factor" "integer" "integer" "integer"

Compact, fast and quite useful. Of course, the control flow needs more work to handle other classes and to recognize when x is not a data.frame.
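As a sketch of what that extra work might look like, the hypothetical allClass2() below (the name and the "/" separator are illustrative choices, not from the post) rejects non-data.frame input and uses vapply() so that each column yields exactly one string. The test data is made up; the POSIXct column is included because date-time columns carry two classes, which makes the unlist(lapply(...)) version return more entries than there are columns.

```r
# Hypothetical, more defensive variant of allClass().
allClass2 <- function(x) {
  # Refuse anything that is not a data.frame.
  if (!is.data.frame(x)) {
    stop("x must be a data.frame, not ", paste(class(x), collapse = "/"))
  }
  # vapply() enforces one string per column; columns with several
  # classes are collapsed into a single "/"-separated string.
  vapply(x, function(col) paste(class(col), collapse = "/"), character(1))
}

# Made-up data with a two-class (POSIXct/POSIXt) column.
df <- data.frame(
  Kyphosis = factor(c("absent", "present")),
  Age      = c(71L, 158L),
  When     = as.POSIXct(c("2010-01-01", "2010-06-01"), tz = "UTC")
)

allClass2(df)
# By contrast, unlist(lapply(unclass(df), class)) gives four entries,
# splitting When into When1 = "POSIXct" and When2 = "POSIXt".
unlist(lapply(unclass(df), class))
```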

Comments are welcome.


12 thoughts on “Finding out (fast) the classes of data.frame vectors”

  1. Very nice! I’m going to repost this to my own blog and link back to yours if you don’t mind. Thanks!

  2. How about sapply(kyphosis, class)?
    sapply applies the function to each element, whether the input is a list or a vector.

  3. Well, the str() function will tell you a bunch of stuff, including the types of the columns, but I like your one-liner. Simple and elegant, and reflects a nice understanding of how data.frames are constructed.

  4. I believe that apply needs to construct an array, and arrays must be of a homogeneous data type, so it coerces everything to a common representation before calling class.

  5. Or:
    sapply(dataframe, class)

    apply transforms the data.frame into a matrix, so if your data.frame contains factor or character…

  6. > data(kyphosis, package = "rpart")
    > sapply(kyphosis, class)
     Kyphosis       Age    Number     Start
     "factor" "integer" "integer" "integer"

  7. Sweet post Aviad, keep going! 🙂

    I’ll add the two options that came to mind:
    1) str() (Which many people probably know)
    2) ls.str() (Which I was first informed about by Romain Francis)

    Cheers 🙂
    Tal

  8. apply converts its argument to a matrix (or array), which has to be of a uniform type. Only ‘character’ can hold all the values in the data; hence that is what the implicit conversion gives you.

    The usual answer is to remember that a data.frame is also a list (is.list(kyphosis) == TRUE) and use lapply(kyphosis, class), which gives you the results you expect.

    The sapply solution suggested is essentially the same: sapply is a convenience wrapper around lapply.
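The commenters' suggestions can be checked side by side. A minimal sketch on a made-up four-column data.frame standing in for kyphosis (the values are hypothetical; with rpart installed you could use data(kyphosis, package = "rpart") instead):

```r
# Hypothetical stand-in data with the same column types as kyphosis.
df <- data.frame(
  Kyphosis = factor(c("absent", "present", "absent")),
  Age      = c(71L, 158L, 128L),
  Number   = c(3L, 3L, 4L),
  Start    = c(5L, 14L, 5L)
)

is.list(df)   # TRUE: a data.frame is a list of columns

# lapply() returns a list; sapply() simplifies it to a named vector,
# which is what unlist(lapply(...)) does by hand.
lapply(df, class)
sapply(df, class)
identical(sapply(df, class), unlist(lapply(df, class)))

# str() reports each column's type together with a preview of its values.
str(df)
```

The identical() check holds here because every column has exactly one class. When the class vectors have different lengths (for example a POSIXct column among single-class columns), sapply() cannot simplify and returns a list instead.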
