hive - Hadoop cluster requirements: software/hardware


Hi, I am trying to set up a Hadoop environment. In short, the problem I am trying to solve involves billions of XML files, each a few MB in size. I want to extract the relevant information from them using Hive and do some analytic work on that information. I know this is a trivial problem in the Hadoop world, but if the Hadoop solution works for me, the size and number of files I am dealing with will increase in a geometric progression.

I did some research, referring to various books such as "Hadoop: The Definitive Guide" and "Hadoop in Action", as well as resources and documents from Yahoo and Hortonworks. I am still not able to figure out the hardware/software specifications for establishing a Hadoop environment. In the resources I have referred to so far, I found these more or less standard solutions:

  1. NameNode/JobTracker (2 x 1 Gb/s Ethernet, 16 GB of RAM, 4 x CPU, 100 GB disk)
  2. DataNode (2 x 1 Gb/s Ethernet, 8 GB of RAM, 4 x CPU, multiple disks with a total
    of 500+ GB)

But if anyone can give some suggestions, that would be great.

First, I would suggest you consider what you need more of: processing with some storage, or the opposite, and select the hardware with that in view. Your case sounds like more processing than storage.
I would specify the standard hardware for Hadoop a bit differently:
NameNode: high-quality disks in a mirror, 16 GB HDD.
Data nodes: 16-24 GB RAM, dual quad-core or dual 6-core CPUs, 4 to 6 SATA drives of 1, 2, or 3 TB each.
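
To get a feeling for how many data nodes of this shape the workload implies, a rough back-of-the-envelope sketch can help. Everything numeric here is an assumption for illustration, not something from the question: 1 billion files of 2 MB each, the HDFS default replication factor of 3, and 25% of each disk kept free for intermediate job output. The function name `datanodes_needed` is hypothetical.

```python
import math

def datanodes_needed(num_files, avg_file_mb, drives_per_node, drive_tb,
                     replication=3, usable_fraction=0.75):
    """Estimate the number of data nodes needed just to store the input.

    replication=3 is the HDFS default; usable_fraction is an assumed
    share of raw disk left for HDFS after reserving temp space.
    """
    raw_tb = num_files * avg_file_mb / 1_000_000      # MB -> TB (decimal units)
    stored_tb = raw_tb * replication                  # every block is kept 3 times
    usable_per_node_tb = drives_per_node * drive_tb * usable_fraction
    return math.ceil(stored_tb / usable_per_node_tb)

# Assumed workload: 1 billion XML files of 2 MB, nodes with 4 x 2 TB drives.
print(datanodes_needed(1_000_000_000, 2, drives_per_node=4, drive_tb=2))
```

This only covers storage. If the extracted information is much smaller than the raw XML and the raw files do not have to stay in HDFS, the processing side (CPU, memory, number of drives) will probably drive the node count instead.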

I would also consider the 10 Gbit option. I think that if it does not add more than 15% to the cluster price, it makes sense. The 15% comes from a rough estimate that shipping data from mappers to reducers takes about 15% of job time.
In your case I would be more willing to sacrifice disk sizes to save money, but not CPU, memory, or the number of drives.
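
The 15% rule of thumb can be checked with a small Amdahl's-law style calculation. This is only a sketch under strong assumptions: that the shuffle phase is purely network-bound, that its time shrinks in proportion to link speed, and that the comparison that matters is throughput gained versus money added. The function names are hypothetical.

```python
def max_speedup(shuffle_fraction):
    """Upper bound on job speedup if shuffle time dropped to zero."""
    return 1.0 / (1.0 - shuffle_fraction)

def faster_network_pays_off(shuffle_fraction, network_speedup, extra_cost_fraction):
    """True if the throughput gain exceeds the relative price increase.

    Assumes only the shuffle part of the job gets faster, and that it
    scales linearly with network speed (an optimistic simplification).
    """
    new_time = (1.0 - shuffle_fraction) + shuffle_fraction / network_speedup
    throughput_gain = 1.0 / new_time
    return throughput_gain > 1.0 + extra_cost_fraction

# Shuffle is ~15% of job time; 10 Gbit is assumed to be 10x faster than 1 Gbit.
print(faster_network_pays_off(0.15, 10, extra_cost_fraction=0.15))
print(faster_network_pays_off(0.15, 10, extra_cost_fraction=0.20))
```

Under these assumptions the faster network roughly breaks even at a 15% price increase and stops paying off at 20%, which is in line with the estimate above. Measuring the real shuffle share of your own jobs would give a better threshold.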

