Using rvest to scrape match scores from Cricbuzz in R


I am scraping the Cricbuzz live scores page to get match details, using SelectorGadget to find the CSS selectors. Here is what I have done so far:

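    library(rvest)

    # Download the page and parse it (note: the httr function is GET, in capitals)
    crickbuzz <- read_html(httr::GET("http://www.cricbuzz.com/cricket-match/live-scores"))

    # Select the date nodes and try to read their text
    matches_dates <- crickbuzz %>%
      html_nodes(".schedule-date:nth-child(1)") %>%
      html_text()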

I have fetched the matches, scores, and venues, but I am having difficulty fetching the dates. The code above gives me the result below:

    > matches_dates
    "   -     " "   -     " "   "       "   "       "   "       "   "   "  "           "   "       "   "       "   "       "   -     " "   -     " "   -     "

So I am getting 21 elements, which is right because there are 21 matches, but I am not getting the date text.

I then looked at what html_nodes() returns, and it gives:

    {xml_nodeset (21)}
    [1] <span class="schedule-date" timestamp="1452132000000" format="mmm dd'">        </span>
    [2] <span class="schedule-date" timestamp="1452132000000" format="mmm dd'">        </span>
    [3] <span class="schedule-date" timestamp="1452132000000" format="mmm dd'">        </span>
    ... and so on

This means the tag itself contains no text, only whitespace. How can I get the dates?

You need to extract the dates from the timestamp attribute instead of the node text:

    library(rvest)

    crickbuzz <- read_html(httr::GET("http://www.cricbuzz.com/cricket-match/live-scores"))

    # Read the timestamp attribute of each date node rather than its (empty) text
    matches_dates <- crickbuzz %>%
      html_nodes(".schedule-date:nth-child(1)") %>%
      html_attr("timestamp")

    matches_dates
     [1] "1452268800000" "1452132000000" "1452247200000" "1452242400000" "1452327000000" "1452290400000" "1452310200000" "1452310200000" "1452310200000"
    [10] "1452310200000" "1452324600000" "1452324600000" "1452324600000" "1452324600000" "1452324600000" "1452150000000" "1452153600000" "1452153600000"

    # These values are Unix timestamps in milliseconds. If you need to convert them
    # to a date-time format, follow the answers to this question:
    # http://stackoverflow.com/questions/13456241/convert-unix-epoch-to-date-object-in-r
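As a minimal sketch of that conversion, the snippet below uses three of the values from the output above so that it runs without hitting the site. The name match_times and the choice of UTC as the time zone are assumptions for illustration; the page itself does not say which time zone it intends.

```r
# Three timestamps copied from the output above (milliseconds since the Unix epoch)
matches_dates <- c("1452268800000", "1452132000000", "1452247200000")

# Convert the strings to numbers, scale milliseconds down to seconds,
# and build POSIXct date-times. The UTC time zone is an assumption.
match_times <- as.POSIXct(as.numeric(matches_dates) / 1000,
                          origin = "1970-01-01", tz = "UTC")

# Print the full date-times, then a short month/day label
# (month abbreviations from %b depend on the locale)
print(format(match_times, "%Y-%m-%d %H:%M:%S"))
print(format(match_times, "%b %d"))
```

Dividing by 1000 matters here: as.POSIXct() expects seconds, so passing the raw 13-digit values would give dates tens of thousands of years in the future.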
