How to add, remove, move, copy, create, and find nodes

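(1) Creating new nodes

createDocumentFragment()    // create a DOM fragment

createElement()   // create an element with the given tag name

createTextNode()   // create a text node

(2) Adding, removing, replacing, inserting

appendChild()

removeChild()

replaceChild()

insertBefore()

(3) Finding

getElementsByTagName()    // by tag name

getElementsByName()    // by the value of the element's name attribute

getElementById()    // by the element's id, which should be unique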

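1. Creating element nodes

document.createElement() creates an element. It takes one argument, the tag name of the element to create, and returns the new element node.

var div = document.createElement("div"); // create a div element

div.id = "myDiv"; // set the div's id

div.className = "box"; // set the div's class

A newly created element is not part of the document tree until it is added to it.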

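2. Adding element nodes

appendChild() adds a node to the end of the parent's childNodes list and returns the node that was added.

var ul = document.getElementById("myList"); // get the ul

var li = document.createElement("li"); // create an li

li.innerHTML = "Item 4"; // put text inside the li

ul.appendChild(li); // add the li at the end of the ul's child nodes

appendChild() also accepts a node that is already in the document. In that case the node is moved from its current position to the new one.

var ul = document.getElementById("myList"); // get the ul

ul.appendChild(ul.firstChild); // move the ul's first child node (which may be a whitespace text node) to the end of its child nodes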
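One caveat with the move above: firstChild returns the first child node of any type, so if the list markup has whitespace before the first li, the node that gets moved is a text node rather than an item. A minimal sketch, assuming the intent is to move the first element; the helper name moveFirstItemToEnd is made up for illustration and is not part of the DOM API:

```javascript
// Hypothetical helper (not from the original article): moves the first
// element child of `list` to the end of its children.
function moveFirstItemToEnd(list) {
  // firstElementChild skips text and comment nodes, unlike firstChild.
  var first = list.firstElementChild;
  if (!first) {
    return null; // the list has no element children, nothing to move
  }
  // Appending a node that is already in the tree moves it; no copy is made.
  return list.appendChild(first);
}

// Browser usage, assuming an element with id "myList" exists on the page.
if (typeof document !== "undefined") {
  var myList = document.getElementById("myList");
  if (myList) {
    moveFirstItemToEnd(myList);
  }
}
```

The helper returns the moved node, mirroring the return value of appendChild().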

insertBefore() is the method to use when the node should go at a specific position rather than at the end. It takes two arguments, the node to insert and the reference node, and returns the inserted node.

var ul = document.getElementById("myList"); // get the ul

var li = document.createElement("li"); // create an li

li.innerHTML = "Item 4"; // put text inside the li

ul.insertBefore(li, ul.firstChild); // insert the li before the ul's first child node

var ul = document.getElementById("myList"); // get the ul

var li = document.createElement("li"); // create an li

li.innerHTML = "Item 4"; // put text inside the li

ul.insertBefore(li, ul.lastChild); // insert the li before the ul's last child node (text nodes included)

var ul = document.getElementById("myList"); // get the ul

var li = document.createElement("li"); // create an li

li.innerHTML = "Item 4"; // put text inside the li

var lis = ul.getElementsByTagName("li"); // get the collection of all li elements inside the ul

ul.insertBefore(li, lis[1]); // insert the li before the second li in the ul
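The last example looks up the reference node with getElementsByTagName("li"), which also returns the items of any nested lists, and insertBefore() throws when the reference node is not a direct child of the parent. A small sketch that sidesteps this by indexing into children instead; insertAtIndex is a hypothetical helper name, not a DOM method:

```javascript
// Hypothetical helper: inserts `node` so that it becomes the element child
// at position `index` of `parent`.
function insertAtIndex(parent, node, index) {
  // `children` holds only direct element children, so the reference node is
  // always a child of `parent`.
  var reference = parent.children[index] || null;
  // With a null reference, insertBefore behaves like appendChild.
  return parent.insertBefore(node, reference);
}

// Browser usage, assuming an element with id "myList" exists on the page.
if (typeof document !== "undefined") {
  var list = document.getElementById("myList");
  if (list) {
    var item = document.createElement("li");
    item.textContent = "Item 4";
    insertAtIndex(list, item, 1); // becomes the second item
  }
}
```

An index past the end of the list simply appends the node.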

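3. Removing element nodes

removeChild() removes a node. It takes one argument, the node to remove, and returns the removed node. Note that the removed node is still owned by the document, but it no longer has a position in the document tree.

var ul = document.getElementById("myList"); // get the ul

var fromFirstChild = ul.removeChild(ul.firstChild); // remove the ul's first child node

var ul = document.getElementById("myList"); // get the ul

var lis = ul.getElementsByTagName("li"); // get the collection of all li elements inside the ul

ul.removeChild(lis[0]); // remove the first li; unlike ul.firstChild above, this does not depend on whether the browser treats leading whitespace as a text node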

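4. Replacing element nodes

replaceChild() replaces a node. It takes two arguments, the node to insert and the node to be replaced, and returns the replaced node.

var ul = document.getElementById("myList"); // get the ul

var li = document.createElement("li"); // create the replacement li

var fromFirstChild = ul.replaceChild(li, ul.firstChild); // replace the ul's first child node with the new li

var ul = document.getElementById("myList"); // get the ul

var li = document.createElement("li"); // create an li

li.innerHTML = "Item 4"; // put text inside the li

var lis = ul.getElementsByTagName("li"); // get the collection of all li elements inside the ul

var returnNode = ul.replaceChild(li, lis[1]); // replace the original second li with the new li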

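5. Copying nodes

cloneNode() copies a node. It takes a boolean argument: true means a deep copy (the node and all of its descendants), false means a shallow copy (the node itself, without its children).

var ul = document.getElementById("myList"); // get the ul

var deepList = ul.cloneNode(true); // deep copy

var shallowList = ul.cloneNode(false); // shallow copy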

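How to add a node to Hadoop

The steps I actually followed to add a node:

1. Set up the environment on the new slave first: ssh, the JDK, and copies of the relevant config, lib, bin, and so on.

2. Add the new datanode's host entry on the cluster's namenode and on the other datanodes.

3. Add the new datanode's IP to conf/slaves on the master.

4. Restart the cluster and check that the new datanode shows up in it.

5. Run bin/start-balancer.sh. This can take a long time.

Notes:

1. Without balancing, the cluster will store new data on the new node, which lowers MapReduce efficiency.

2. bin/start-balancer.sh can also be run with a parameter, for example -threshold 5.

threshold is the balancing threshold. The default is 10%; a lower value leaves the nodes more evenly balanced but takes longer.

3. The balancer can also run on a cluster that has MapReduce jobs running. The default dfs.balance.bandwidthPerSec is low, 1 MB/s. When no MapReduce jobs are running, raising this setting shortens the balancing time.

Other notes:

1. Make sure the firewall on the slave is turned off.

2. Make sure the new slave's IP has been added to /etc/hosts on the master and the other slaves, and conversely that the IPs of the master and the other slaves have been added to /etc/hosts on the new slave.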
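To make the threshold note concrete: as I understand the balancer, a datanode counts as balanced when its utilization differs from the overall cluster utilization by no more than the threshold, in percentage points. The sketch below only models that check; the function name and the sample numbers are invented for illustration and nothing here calls Hadoop:

```javascript
// Returns the names of nodes whose utilization differs from the cluster-wide
// utilization by more than `thresholdPercent` percentage points.
function nodesOutsideThreshold(nodes, thresholdPercent) {
  var used = 0;
  var capacity = 0;
  nodes.forEach(function (n) {
    used += n.used;
    capacity += n.capacity;
  });
  var clusterPct = (used / capacity) * 100;
  return nodes
    .filter(function (n) {
      var nodePct = (n.used / n.capacity) * 100;
      return Math.abs(nodePct - clusterPct) > thresholdPercent;
    })
    .map(function (n) { return n.name; });
}

// Made-up cluster: three older datanodes plus a freshly added, empty one.
// Cluster utilization is 140 / 400 = 35%.
var sampleCluster = [
  { name: "dn1", used: 60, capacity: 100 },
  { name: "dn2", used: 38, capacity: 100 },
  { name: "dn3", used: 42, capacity: 100 },
  { name: "dn4", used: 0, capacity: 100 }
];

console.log(nodesOutsideThreshold(sampleCluster, 10).join(",")); // dn1,dn4
console.log(nodesOutsideThreshold(sampleCluster, 5).join(","));  // dn1,dn3,dn4
```

With the lower threshold more nodes count as unbalanced, which is why -threshold 5 moves more data and runs longer than the default.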

Number of mappers and reducers

URL: http://wiki.apache.org/hadoop/HowManyMapsAndReduces

HowManyMapsAndReduces

Partitioning your job into maps and reduces

Picking the appropriate size for the tasks for your job can radically change the performance of Hadoop. Increasing the number of tasks increases the framework overhead, but increases load balancing and lowers the cost of failures. At one extreme is the 1 map/1 reduce case where nothing is distributed. The other extreme is to have 1,000,000 maps/ 1,000,000 reduces where the framework runs out of resources for the overhead.

Number of Maps

The number of maps is usually driven by the number of DFS blocks in the input files, although that causes people to adjust their DFS block size to adjust the number of maps. The right level of parallelism for maps seems to be around 10-100 maps/node, although we have taken it up to 300 or so for very cpu-light map tasks. Task setup takes a while, so it is best if the maps take at least a minute to execute.

Actually controlling the number of maps is subtle. The mapred.map.tasks parameter is just a hint to the InputFormat for the number of maps. The default InputFormat behavior is to split the total number of bytes into the right number of fragments. However, in the default case the DFS block size of the input files is treated as an upper bound for input splits. A lower bound on the split size can be set via mapred.min.split.size. Thus, if you expect 10TB of input data and have 128MB DFS blocks, you'll end up with 82k maps, unless your mapred.map.tasks is even larger. Ultimately the InputFormat determines the number of maps.
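The 82k figure can be reproduced with a little arithmetic. The sketch below assumes the split size is chosen as max(minSplit, min(total / hint, blockSize)), which is how I understand the default file-based InputFormat to combine the three settings; it also treats the input as one large file, so real jobs with many files will differ. The function name is hypothetical:

```javascript
// Rough estimate of the number of map tasks for a given input size.
// Assumption: splitSize = max(minSplit, min(total / hint, blockSize)).
function estimateMapCount(totalBytes, blockBytes, mapTasksHint, minSplitBytes) {
  var goalSize = totalBytes / Math.max(mapTasksHint, 1);
  var splitSize = Math.max(minSplitBytes, Math.min(goalSize, blockBytes));
  return Math.ceil(totalBytes / splitSize);
}

var MB = 1024 * 1024;
var TB = 1024 * 1024 * MB;

// 10TB of input with 128MB blocks and no meaningful hint: one map per block.
console.log(estimateMapCount(10 * TB, 128 * MB, 1, 1)); // 81920, about 82k

// A hint larger than the block count shrinks the splits and adds maps.
console.log(estimateMapCount(10 * TB, 128 * MB, 163840, 1)); // 163840

// Raising the minimum split size above the block size reduces the maps.
console.log(estimateMapCount(10 * TB, 128 * MB, 1, 256 * MB)); // 40960
```

A hint smaller than the block count has no effect in this model, which matches the remark that mapred.map.tasks is only a hint.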

The number of map tasks can also be increased manually using the JobConf's conf.setNumMapTasks(int num). This can be used to increase the number of map tasks, but will not set the number below that which Hadoop determines via splitting the input data.

Number of Reduces

The right number of reduces seems to be 0.95 or 1.75 * (nodes * mapred.tasktracker.tasks.maximum). At 0.95 all of the reduces can launch immediately and start transferring map outputs as the maps finish. At 1.75 the faster nodes will finish their first round of reduces and launch a second round of reduces doing a much better job of load balancing.
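As a worked example of the rule of thumb, the sketch below computes both suggestions for a given cluster. The function name is made up, the results are rounded down to whole tasks (my choice, the wiki text does not say how to round), and the per-node slot count stands for the tasktracker maximum named above:

```javascript
// Suggested reduce counts from the 0.95 / 1.75 rule of thumb.
// Integer arithmetic (x * 95 / 100) avoids floating-point surprises.
function suggestedReduceCounts(nodes, reduceSlotsPerNode) {
  var slots = nodes * reduceSlotsPerNode;
  return {
    singleWave: Math.floor((slots * 95) / 100), // all reduces start at once
    twoWaves: Math.floor((slots * 175) / 100)   // faster nodes run a second round
  };
}

// Example: 10 nodes with 2 reduce slots each, 20 slots in total.
var counts = suggestedReduceCounts(10, 2);
console.log(counts.singleWave, counts.twoWaves); // 19 35
```

The value would then be passed to conf.setNumReduceTasks(int num) as described further down in the quoted page.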

Currently the number of reduces is limited to roughly 1000 by the buffer size for the output files (io.buffer.size * 2 * numReduces << heapSize). This will be fixed at some point, but until it is, it provides a pretty firm upper bound.

The number of reduces also controls the number of output files in the output directory, but usually that is not important because the next map/reduce step will split them into even smaller splits for the maps.

The number of reduce tasks can also be increased in the same way as the map tasks, via JobConf's conf.setNumReduceTasks(int num).

My own understanding:

Setting the number of mappers: it depends on the input files and on the file splits. The upper bound on the split size is dfs.block.size, the lower bound can be set via mapred.min.split.size, and in the end the InputFormat decides.

A good recommendation:

The right number of reduces seems to be 0.95 or 1.75 multiplied by (<no. of nodes> * mapred.tasktracker.reduce.tasks.maximum). Increasing the number of reduces increases the framework overhead, but increases load balancing and lowers the cost of failures.

<property>

<name>mapred.tasktracker.reduce.tasks.maximum</name>

<value>2</value>

<description>The maximum number of reduce tasks that will be run

simultaneously by a task tracker.

</description>

</property>

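Adding a new disk to a single node

1. On the node that gets the new disk, edit dfs.data.dir and list the old and new directories separated by commas.

2. Restart DFS.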

Syncing the Hadoop code

hadoop-env.sh

# host:path where hadoop code should be rsync'd from. Unset by default.

# export HADOOP_MASTER=master:/home/$USER/src/hadoop

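Merging small HDFS files from the command line

hadoop fs -getmerge <src> <dest>

Here <src> is an HDFS path and <dest> is a file on the local file system.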

Restarting reduce jobs

Introduced recovery of jobs when JobTracker restarts. This facility is off by default.

Introduced config parameters "mapred.jobtracker.restart.recover", "mapred.jobtracker.job.history.block.size", and "mapred.jobtracker.job.history.buffer.size".

I have not verified this yet.

Problems with IO write operations

0-1246359584298, infoPort=50075, ipcPort=50020):Got exception while serving blk_-5911099437886836280_1292 to /172.16.100.165:

java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/172.16.100.165:50010 remote=/172.16.100.165:50930]

at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:185)

at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)

at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)

at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:293)

at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:387)

at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:179)

at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:94)

at java.lang.Thread.run(Thread.java:619)

It seems there are many reasons that it can time out; the example given in HADOOP-3831 is a slow reading client.

Workaround: try setting dfs.datanode.socket.write.timeout=0 in hadoop-site.xml.

My understanding is that this issue should be fixed in Hadoop 0.19.1 so that we should leave the standard timeout. However until then this can help resolve issues like the one you're seeing.

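How to decommission nodes in HDFS

The dfsadmin help text in the current version does not explain this clearly, and a bug has already been filed. The correct procedure is:

1. Point dfs.hosts at the current slaves file, using the full path. Note that the hostnames in the list must be the full names, that is, what uname -n returns.

2. Put the full names of the slaves to be decommissioned in another file, for example slaves.ex, and point the dfs.hosts.exclude parameter at the full path of that file.

3. Run bin/hadoop dfsadmin -refreshNodes.

4. In the web UI or in the output of bin/hadoop dfsadmin -report, the nodes being decommissioned show the state Decommission in progress until the data that needs to be re-replicated has been copied.

5. Once that has finished, remove the decommissioned nodes from slaves (that is, from the file dfs.hosts points to).

Incidentally, the -refreshNodes command has three other uses:

1. Adding allowed nodes to the list (add the hostnames to dfs.hosts).

2. Removing a node outright, without re-replicating its data (remove the hostname from dfs.hosts).

3. The reverse of decommissioning: stopping the decommission of nodes that are listed in both the exclude file and dfs.hosts and are currently being decommissioned, which turns Decommission in progress nodes back to Normal (called in service in the web UI).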
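The interplay of the two files can be summarized as a small decision function. This is only a model of the behavior described in the steps above, assuming the include list (the file dfs.hosts points to) is non-empty; the function name and the returned labels are invented for illustration and nothing here talks to a namenode:

```javascript
// Classifies a host from the include list (dfs.hosts) and the exclude list
// (dfs.hosts.exclude) as they stand when -refreshNodes is run.
// Assumption: the include list is non-empty, as in the procedure above.
function classifyHost(host, includeList, excludeList) {
  var included = includeList.indexOf(host) !== -1;
  var excluded = excludeList.indexOf(host) !== -1;
  if (!included) {
    return "rejected"; // dropped outright, no re-replication first
  }
  return excluded ? "decommissioning" : "in service";
}

// Hypothetical hostnames, for illustration only.
var include = ["dn1.example.com", "dn2.example.com"];
var exclude = ["dn2.example.com"];

console.log(classifyHost("dn1.example.com", include, exclude)); // in service
console.log(classifyHost("dn2.example.com", include, exclude)); // decommissioning
console.log(classifyHost("dn3.example.com", include, exclude)); // rejected
```

Removing dn2.example.com from the exclude list and refreshing again corresponds to the third use above: the host goes back to being in service.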

