Using rvest to scrape match scores from Cricbuzz in R
I am scraping the Cricbuzz live-scores page to get match details, using SelectorGadget to find the CSS selectors. Here is what I have done so far:
library(rvest)

crickbuzz <- read_html(httr::GET("http://www.cricbuzz.com/cricket-match/live-scores"))
matches_dates <- crickbuzz %>%
  html_nodes(".schedule-date:nth-child(1)") %>%
  html_text()
I have fetched the matches, scores, and venues, but I am having difficulty fetching the dates. The code above gives the following result:
> matches_dates
" - " " - " " " " " " " " " " " " " " " " " " - " " - " " - "
So I am getting 21 elements, which is right because there are 21 matches, but I am not getting the text.
Then I looked at what html_nodes() returns, and it gives:
{xml_nodeset (21)}
[1] <span class="schedule-date" timestamp="1452132000000" format="mmm dd'"> </span>
[2] <span class="schedule-date" timestamp="1452132000000" format="mmm dd'"> </span>
[3] <span class="schedule-date" timestamp="1452132000000" format="mmm dd'"> </span>
...
So the span tags contain no text. How can I get the dates?
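The empty result can be reproduced offline. The sketch below parses a static copy of one span from the nodeset printed above (hard-coded here as an assumption, so the live page is not fetched) and shows that the node has no text, only attributes. rvest does not execute JavaScript, so presumably the visible date is filled in from the timestamp attribute by a script running in the browser.

```r
library(rvest)

# Static copy of one node from the html_nodes() output above (assumption:
# it matches what the live page served); no network access is needed.
snippet <- '<div><span class="schedule-date" timestamp="1452132000000" format="mmm dd\'"> </span></div>'

page <- read_html(snippet)
node <- html_nodes(page, ".schedule-date")

# The span holds at most whitespace, so html_text() has nothing useful to return.
print(html_text(node))

# The date information lives in the attributes instead.
print(html_attrs(node))
```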
You need to extract the date from the timestamp attribute:
library(rvest)

crickbuzz <- read_html(httr::GET("http://www.cricbuzz.com/cricket-match/live-scores"))
matches_dates <- crickbuzz %>%
  html_nodes(".schedule-date:nth-child(1)") %>%
  html_attr("timestamp")
matches_dates
 [1] "1452268800000" "1452132000000" "1452247200000" "1452242400000" "1452327000000" "1452290400000" "1452310200000" "1452310200000" "1452310200000"
[10] "1452310200000" "1452324600000" "1452324600000" "1452324600000" "1452324600000" "1452324600000" "1452150000000" "1452153600000" "1452153600000"

These values are Unix timestamps in milliseconds. If you need to convert them to a date-time format, follow the answer to this question: http://stackoverflow.com/questions/13456241/convert-unix-epoch-to-date-object-in-r
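For reference, here is a minimal base-R sketch of that conversion, assuming the values are milliseconds since 1970-01-01 00:00:00 UTC. The helper name to_datetime is hypothetical, the three input strings are copied from the output above, and displaying the times in UTC is an assumption; pass a different tz for local times.

```r
# A few of the timestamp strings returned by html_attr("timestamp") above.
matches_dates <- c("1452268800000", "1452132000000", "1452150000000")

# Hypothetical helper: milliseconds since 1970-01-01 00:00:00 UTC -> POSIXct.
# The time zone is an assumption; UTC is used unless another tz is passed.
to_datetime <- function(ms, tz = "UTC") {
  as.POSIXct(as.numeric(ms) / 1000, origin = "1970-01-01", tz = tz)
}

match_times <- to_datetime(matches_dates)
print(match_times)

# Calendar date only, e.g. for grouping matches by day.
match_days <- as.Date(match_times)
print(match_days)

# The page's format="mmm dd'" attribute suggests a month-day display;
# "%b %d" is the closest strftime pattern (month names depend on the locale).
print(format(match_times, "%b %d"))
```

Dividing by 1000 is the key step: the 13-digit values are milliseconds, while as.POSIXct() expects seconds. With the live page, the same helper would be applied to the full vector produced by the code above.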