Sometimes it’s useful to write down the classes of the vectors (columns) inside your data.frame objects, both for documentation and for other people who will use the data.

I’ve searched for a quick way to find out all the classes of vectors inside a data.frame.

Since I’ve found no reference for such a function/process I made one up.

I’d like to hear what people have to say about the following use of the “class” function on data.frames.

A simple call:

> library(rpart) # comes with R
> data(kyphosis) # comes with rpart
> class(kyphosis)
[1] "data.frame"

Trying to use the “apply” function to find out the classes of the columns in the data.frame yields the following unwanted result:

> apply(kyphosis,2,class)
Kyphosis Age Number Start
"character" "character" "character" "character"

For some reason the apply function returns “character” on all vectors regardless of their true content (any ideas why?).

Anyhow, after some thought I’ve come up with the following function :

> allClass <- function(x) {unlist(lapply(unclass(x),class))}
> allClass(kyphosis)
Kyphosis Age Number Start
"factor" "integer" "integer" "integer"

Compact, fast and quite useful. Of course, the function needs more work to handle other classes and to recognize when x is not a data.frame.
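One way that extra checking might look is sketched below. The name allClass2, the error message and the "/" separator are all hypothetical choices made for illustration; they are not part of the original function. The sketch stops on input that is not a data.frame and collapses multi-class columns (such as ordered factors) into a single string, so the result is always a named character vector.

```r
# Hypothetical variant of allClass(); name, message and separator are
# illustrative assumptions, not part of the original post.
allClass2 <- function(x) {
  if (!is.data.frame(x)) {
    stop("x must be a data.frame, got an object of class: ",
         paste(class(x), collapse = "/"))
  }
  # A column can carry several classes (e.g. c("ordered", "factor")),
  # so collapse them into one string per column.
  vapply(x, function(col) paste(class(col), collapse = "/"), character(1))
}

# Small made-up data.frame to exercise the function
df <- data.frame(a = 1:3, b = c("x", "y", "z"), stringsAsFactors = TRUE)
df$c <- factor(c("lo", "hi", "lo"), levels = c("lo", "hi"), ordered = TRUE)

result <- allClass2(df)
print(result)
```

Using vapply with character(1) instead of unlist(lapply(...)) means the function fails loudly if a column ever produces something other than a single string, rather than silently returning a vector of unexpected length.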

Comments are welcome.


Very nice! I’m going to repost this to my own blog and link back to yours if you don’t mind. Thanks!

Of course I don’t mind, you’re welcome.

How about sapply(kyphosis, class)?

sapply normally applies the right function to either list or vector.

Well, the str() function will tell you a bunch of stuff, including the types of the columns, but I like your one-liner. Simple and elegant, and reflects a nice understanding of how data.frames are constructed.
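For comparison, here is a minimal sketch of the str() route on a small made-up data.frame (the column names are illustrative only). str() prints its summary rather than returning it, so capture.output() is used here to keep the lines for inspection.

```r
# Made-up example data; column names are illustrative assumptions.
df <- data.frame(
  id    = 1:3,
  grp   = factor(c("a", "b", "a")),
  score = c(1.5, 2, 3.5)
)

# str() prints one line per column, including its type
str(df)

# str() returns NULL invisibly, so capture the printed lines if they
# are needed programmatically
str_lines <- capture.output(str(df))
length(str_lines)
```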

I believe that apply needs to construct an array, and arrays need to be of a homogeneous data type, so it’s coercing everything to a common representation before calling class.

Or:

sapply(dataframe, class)

apply transforms the data.frame into a matrix, so if your data.frame contains factor or character…

Thanks.

> data(kyphosis, package="rpart")
> sapply(kyphosis, class)
Kyphosis Age Number Start
"factor" "integer" "integer" "integer"

Thank you, that’s much neater.

Sweet post Aviad, keep going! 🙂

I’ll add the two options that came to mind:

1) str() (Which many people probably know)

2) ls.str() (Which I was first informed about by Romain Francis)

Cheers 🙂

Tal

apply converts its argument to a matrix (or array), which has to be of a uniform type. Only ‘character’ can hold all the values in the data; hence that is what the implicit conversion gives you.
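The coercion can be seen directly by calling as.matrix() on a small made-up data.frame (the column names below are illustrative only). A minimal sketch:

```r
# Mixed columns: one integer, one character
mixed <- data.frame(n = 1:3, s = c("a", "b", "c"))
m <- as.matrix(mixed)
typeof(m)                 # the whole matrix is stored as character
mixed_classes <- apply(mixed, 2, class)
print(mixed_classes)      # every column reports "character"

# All-numeric columns: the common type is numeric instead, so even
# here the integer column no longer reports "integer"
nums <- data.frame(x = 1:3, y = c(1.5, 2.5, 3.5))
num_classes <- apply(nums, 2, class)
print(num_classes)
```

So apply() reports the class of the coerced matrix columns, not of the original data.frame columns, even when no character data is involved.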

The usual answer is to remember that a data.frame is also a list (is.list(kyphosis) is TRUE) and use lapply(kyphosis, class), which gives you the results you expect, returned as a list.

The sapply solution suggested is essentially the same: sapply is a convenience wrapper around lapply.
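One caveat worth sketching: sapply only simplifies to a character vector when every column has exactly one class. With a made-up data.frame containing a date-time column (the names below are illustrative), it falls back to a list, while vapply can be used to force one string per column. Taking only the first class is one possible choice here, not the only one.

```r
# Made-up data: a POSIXct column has two classes, "POSIXct" and "POSIXt"
df <- data.frame(
  id   = 1:2,
  when = as.POSIXct(c("2020-01-01", "2020-01-02"), tz = "UTC")
)

as_list <- lapply(df, class)   # always a list
as_s    <- sapply(df, class)   # also a list here, since lengths differ
is.list(as_s)

# Force a character vector by keeping only the first class of each column
first_class <- vapply(df, function(col) class(col)[1], character(1))
print(first_class)
```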

Thanks, your answer really sums it up