Partial string matching in R



I am trying to remove 'bad' email addresses from a CSV. I have a column of emails such as "abd@no.com", "123@none.com", "@", or "a". There is a wide range of email formats, and I want to try to find and remove them all.

My initial idea is to look strictly at the end of the email string, the "@..." part. I would also look at the length of the string: if the email has a length of 1 or 2, it is not valid.

If I have a list of bad emails, I want to generate a new list of emails with the bad ones replaced by NA.

Below is the code I have so far. It does not work: it looks for exact matches on the pattern, not at the end of the string.

    email_clean <- function(email, invalid = NA)
    {
        email <- trimws(email)  # remove whitespace
        email[nchar(email) %in% c(1, 2)] <- invalid
        bad_email <- c("\\@no.com", "\\@none.com", "\\@email.com", "\\@noemail.com")
        pattern = paste0("(?i)\\b", paste0(bad_email, collapse = "\\b|\\b"), "\\b")
        emails <- gsub(pattern, "", sapply(csv_file$email, as.character))
        email
    }

    cleaned_email <- email_clean(csv_file$email)

Thank you for the help!

Your function is pretty close. Note a few tweaks:

    email_clean <- function(email, invalid = NA) {
        email <- trimws(email)  # remove whitespace
        email[nchar(email) %in% c(1, 2)] <- invalid
        bad_email <- c("\\@no.com", "\\@none.com", "\\@email.com", "\\@noemail.com")
        pattern = paste0("(?i)\\b", paste0(bad_email, collapse = "\\b|\\b"), "\\b")
        # work on the `email` argument (not csv_file$email) and assign back to it;
        # with an NA replacement, gsub() returns NA for every element that matches
        email <- gsub(pattern, invalid, sapply(email, as.character))
        # sapply() names the result after the inputs, so drop the names
        unname(email)
    }

    emails <- c("pierre@gmail.com", "hi@none.com", "@", "a")
    email_clean(emails)
    # [1] "pierre@gmail.com" NA                 NA
    # [4] NA
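Since the question asks specifically about matching at the end of the string, here is a minimal sketch of a stricter variant. It is not the function from the answer above: the name `email_clean_by_domain` and the `bad_domains` and `min_chars` parameters are hypothetical, introduced only for illustration. Instead of word boundaries, it compares everything after the last "@" against the list of bad domains, so an address like "bob@none.com.au" is kept. The `\\b` pattern would most likely flag that address, because a word boundary also occurs before the second dot.

```r
# Hypothetical helper (an assumption, not part of the original answer):
# replaces an address with NA when it is too short or when its domain,
# i.e. everything after the last "@", is exactly one of `bad_domains`.
email_clean_by_domain <- function(email,
                                  bad_domains = c("no.com", "none.com",
                                                  "email.com", "noemail.com"),
                                  min_chars = 3) {
  email <- trimws(as.character(email))  # remove surrounding whitespace

  # Strip everything up to and including the last "@" (".*" is greedy).
  # Strings without an "@" are returned unchanged by sub().
  domain <- tolower(sub("^.*@", "", email))

  bad <- is.na(email) |
    nchar(email) < min_chars |
    domain %in% tolower(bad_domains)

  email[bad] <- NA_character_
  email
}

emails <- c("pierre@gmail.com", "hi@none.com", "@", "a", "bob@none.com.au")
cleaned <- email_clean_by_domain(emails)
print(cleaned)
# expected: "pierre@gmail.com" NA NA NA "bob@none.com.au"
```

Comparing whole domains with `%in%` also avoids having to escape the dots in the domain names, which would otherwise match any character in a regular expression.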
