Category: R

Count number of occurrences of a character in a string

I was looking for a faster way to count the occurrences of a character in a string in R, and I found this post with the following solution:

countCharOccurrences <- function(char, s) {
  s2 <- gsub(char, "", s)
  return(nchar(s) - nchar(s2))
}
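A caveat with this version: gsub() treats `char` as a regular expression unless `fixed = TRUE` is given, so a metacharacter such as "." matches every character. A quick self-contained check (the string `s` is just an arbitrary example):

```r
s <- "a.b.c"
# regex mode: "." matches any character, so every character is removed
nchar(s) - nchar(gsub(".", "", s))                # 5
# literal mode: only the two dots are removed
nchar(s) - nchar(gsub(".", "", s, fixed = TRUE))  # 2
```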

I don't see any contact information there to reach the author and suggest my solution, so I'll post it here:


countCharOccurrences2 <- function(char, s) { length(strsplit(s, char, fixed=TRUE)[[1]])-1 }

This benchmark shows mine is roughly four times faster (by both mean and median):


library(microbenchmark)
microbenchmark(countCharOccurrences(":","2:2:00"), countCharOccurrences2(":","2:2:00"), times=10000L)

Unit: microseconds
                                expr    min     lq      mean median     uq      max neval
 countCharOccurrences(":", "2:2:00") 13.866 15.326 16.138550 15.690 16.056 1807.277 10000
countCharOccurrences2(":", "2:2:00")  2.190  3.284  3.940256  4.014  4.380   25.178 10000
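Speed aside, the two versions do not always agree. strsplit() drops a trailing empty piece (its documentation notes that a match at the end of a string gives the same output as if the match were removed), so a string that ends in the counted character is undercounted by one. A self-contained check, with both function bodies inlined and an arbitrary example string:

```r
s <- "2:2:"  # ends with the character being counted
via_gsub     <- nchar(s) - nchar(gsub(":", "", s))
via_strsplit <- length(strsplit(s, ":", fixed = TRUE)[[1]]) - 1
# via_gsub is 2 (correct); via_strsplit is 1 because the trailing ":" is lost
c(via_gsub = via_gsub, via_strsplit = via_strsplit)
```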

In practice I will pass the string as the first argument, and hard-coding fixed = TRUE isn't ideal; instead, ... should pass extra arguments through. Apart from that, it still feels like quite an awkward way to do it. What is the proper way?
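One base R option along those lines is sketched below, with the string first and `...` forwarded. `count_char` is a hypothetical name; extra arguments such as `fixed = TRUE` are passed straight to gregexpr(), so without them `char` is treated as a regular expression. It is vectorized over `s`, and unlike the strsplit() version it also counts a match at the very end of the string. I have not benchmarked it against the versions above.

```r
# Hypothetical helper: string first, extra arguments (e.g. fixed = TRUE) go to gregexpr()
count_char <- function(s, char, ...) {
  # gregexpr() returns -1 when there is no match; regmatches() then yields
  # character(0) for that element, so lengths() reports 0
  lengths(regmatches(s, gregexpr(char, s, ...)))
}

count_char("2:2:00", ":", fixed = TRUE)                     # 2
count_char(c("1:00", "1:23:00", "100"), ":", fixed = TRUE)  # 1 2 0
```

If the stringr package is available, `stringr::str_count(s, stringr::fixed(char))` covers the same ground.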

Also, I needed the count to know which format to apply when converting a string to a time: sometimes the timestamp looks like "1:00" and sometimes like "1:23:00", depending on whether it is longer than an hour. I count the ":" characters and apply either hms() or ms() from the lubridate package. There should be a better way to do this, right?
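Instead of branching between hms() and ms(), one option is to normalize the strings first: prepend a zero hour field to every element that has a single colon, so all elements share the H:M:S shape. `pad_to_hms` is a hypothetical helper, and the sketch assumes every input is either M:SS or H:MM:SS.

```r
# Hypothetical helper: bring "M:SS" and "H:MM:SS" strings to a common "H:M:S" shape
pad_to_hms <- function(x) {
  # number of ":" in each element (vectorized)
  n_colons <- nchar(x) - nchar(gsub(":", "", x, fixed = TRUE))
  # a single colon means minutes:seconds, so add a zero-hour field in front
  ifelse(n_colons == 1, paste0("0:", x), x)
}

pad_to_hms(c("1:00", "1:23:00"))  # "0:1:00" "1:23:00"
```

The padded vector can then be parsed with a single lubridate::hms() call rather than choosing hms() or ms() per element.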