Problem:
Some Elasticsearch shards are stuck in the UNASSIGNED state.
Solution:
Reference
First, check the shard status:
[appadmin@plm_logmgt_es1 ~]$ curl -s "http://localhost:9200/_cat/shards"
tbg_prd_prc_arc 1 p STARTED 3022779 1.3gb 192.168.0.7 esnode1
tbg_prd_prc_arc 4 p STARTED 3100536 1.4gb 192.168.0.7 esnode1
tbg_prd_prc_arc 0 p STARTED 3886180 1.7gb 192.168.0.7 esnode1
tbg_prd_bgms 2 p STARTED 10493792 2.7gb 192.168.0.7 esnode1
tbg_prd_bgms 2 r UNASSIGNED
tbg_prd_bgms 0 p STARTED 8882000 2.3gb 192.168.0.7 esnode1
tbg_prd_bgms 0 r STARTED 8882000 2.3gb 192.168.0.9 esnode3
tbg_prd_sm 1 p STARTED 41107 11.6mb 192.168.0.7 esnode1
tbg_prd_sm 1 r STARTED 41107 11.5mb 192.168.0.9 esnode3
tbg_prd_sm 4 p STARTED 41430 11.6mb 192.168.0.7 esnode1
tbg_prd_sm 4 r STARTED 41430 11.7mb 192.168.0.9 esnode3
tbg_prd_sm 3 p STARTED 39404 10.9mb 192.168.0.9 esnode3
tbg_prd_sm 3 r UNASSIGNED
tbg_prd_sm 2 p STARTED 40211 11.2mb 192.168.0.7 esnode1
tbg_prd_sm 2 r STARTED 40211 11.2mb 192.168.0.9 esnode3
tbg_prd_sm 0 p STARTED 39159 10.7mb 192.168.0.7 esnode1
tbg_prd_sm 0 r STARTED 39159 10.7mb 192.168.0.9 esnode3
tbg_prd_apache 1 p STARTED 2493881 937.3mb 192.168.0.7 esnode1
tbg_prd_apache 1 r UNASSIGNED
tbg_prd_apache 4 r STARTED 2550738 957mb 192.168.0.7 esnode1
tbg_prd_apache 4 p STARTED 2550738 957mb 192.168.0.9 esnode3
tbg_prd_apache 3 p UNASSIGNED
tbg_prd_apache 3 r UNASSIGNED
tbg_prd_apache 2 p STARTED 2964898 1gb 192.168.0.7 esnode1
tbg_prd_apache 2 r STARTED 2964898 1gb 192.168.0.9 esnode3
tbg_prd_apache 0 p STARTED 2700325 1016.8mb 192.168.0.7 esnode1
tbg_prd_apache 0 r STARTED 2700325 1016.9mb 192.168.0.9 esnode3
tbg_qas_smgc 1 p STARTED 11316 6mb 192.168.0.7 esnode1
tbg_qas_smgc 1 r STARTED 11316 6mb 192.168.0.9 esnode3
tbg_qas_smgc 4 p STARTED 11068 5.8mb 192.168.0.7 esnode1
tbg_qas_smgc 4 r STARTED 11068 5.8mb 192.168.0.9 esnode3
tbg_qas_smgc 3 p STARTED 11380 6mb 192.168.0.7 esnode1
tbg_qas_smgc 3 r STARTED 11380 6mb 192.168.0.9 esnode3
tbg_qas_smgc 2 p STARTED 11399 6mb 192.168.0.7 esnode1
tbg_qas_smgc 2 r STARTED 11399 6mb 192.168.0.9 esnode3
tbg_qas_smgc 0 p STARTED 11252 5.9mb 192.168.0.7 esnode1
tbg_qas_smgc 0 r UNASSIGNED
tbg_qas_esi_adp 1 p STARTED 14909 3.7mb 192.168.0.7 esnode1
tbg_qas_esi_adp 1 r STARTED 14909 3.7mb 192.168.0.9 esnode3
tbg_qas_esi_adp 4 p STARTED 14975 3.7mb 192.168.0.7 esnode1
tbg_qas_esi_adp 4 r UNASSIGNED
tbg_qas_esi_adp 3 r STARTED 13821 3.4mb 192.168.0.8 esnode2
tbg_qas_esi_adp 3 p STARTED 13821 3.4mb 192.168.0.9 esnode3
tbg_prd_esi_adp 1 p STARTED 7770291 1.7gb 192.168.0.7 esnode1
tbg_prd_esi_adp 1 r UNASSIGNED
tbg_prd_esi_adp 4 p STARTED 7726400 1.7gb 192.168.0.7 esnode1
tbg_prd_esi_adp 0 p STARTED 4668127 1gb 192.168.0.7 esnode1
tbg_prd_esi_adp 0 r UNASSIGNED
tbg_qas_le_adp 1 p STARTED 31731 8.2mb 192.168.0.7 esnode1
tbg_qas_le_adp 2 p STARTED 30471 7.8mb 192.168.0.7 esnode1
tbg_qas_le_adp 2 r UNASSIGNED
tbg_qas_le_adp 0 p UNASSIGNED
tbg_qas_le_adp 0 r UNASSIGNED
tbg_qas_msgc 1 p STARTED 228672 124.2mb 192.168.0.7 esnode1
tbg_qas_msgc 1 r STARTED 228672 124.2mb 192.168.0.9 esnode3
tbg_qas_msgc 2 r STARTED 227977 124mb 192.168.0.9 esnode3
tbg_qas_msgc 0 p STARTED 228148 124mb 192.168.0.7 esnode1
tbg_qas_msgc 0 r UNASSIGNED
.kibana 0 p STARTED 30 49.2kb 192.168.0.7 esnode1
.kibana 0 r STARTED 30 49.2kb 192.168.0.9 esnode3
tbg_qas_prc_arc 1 p STARTED 9109 4.6mb 192.168.0.7 esnode1
tbg_qas_prc_arc 3 p STARTED 8463 4.2mb 192.168.0.9 esnode3
tbg_qas_prc_arc 3 r UNASSIGNED
tbg_qas_prc_arc 2 p STARTED 8853 4.5mb 192.168.0.7 esnode1
tbg_qas_prc_arc 2 r STARTED 8853 4.4mb 192.168.0.9 esnode3
tbg_qas_prc_arc 0 p STARTED 8528 4.2mb 192.168.0.7 esnode1
tbg_qas_prc_arc 0 r UNASSIGNED
tbg_prd_smgc 1 p STARTED 3269 1.9mb 192.168.0.7 esnode1
tbg_prd_smgc 1 r STARTED 3269 1.9mb 192.168.0.9 esnode3
tbg_prd_smgc 2 p STARTED 3232 1.9mb 192.168.0.7 esnode1
tbg_prd_smgc 2 r UNASSIGNED
tbg_prd_smgc 0 p STARTED 3436 2mb 192.168.0.9 esnode3
tbg_prd_smgc 0 r UNASSIGNED
tbg_prd_msgc 1 p STARTED 281124 167.7mb 192.168.0.7 esnode1
tbg_prd_msgc 1 r STARTED 281124 167.7mb 192.168.0.9 esnode3
tbg_prd_le_adp 2 p STARTED 14916851 3.5gb 192.168.0.7 esnode1
tbg_prd_le_adp 2 r UNASSIGNED
tbg_prd_le_adp 0 r STARTED 12598197 2.9gb 192.168.0.7 esnode1
tbg_prd_le_adp 0 p STARTED 12598197 2.9gb 192.168.0.9 esnode3
(a lot of output omitted here)
[appadmin@plm_logmgt_es1 ~]$ curl -s "http://localhost:9200/_cat/shards"|grep UNASSIGNED
tbg_prd_bgms 2 r UNASSIGNED
tbg_prd_sm 3 r UNASSIGNED
tbg_prd_apache 1 r UNASSIGNED
tbg_prd_apache 3 p UNASSIGNED
tbg_prd_apache 3 r UNASSIGNED
tbg_qas_smgc 0 r UNASSIGNED
tbg_qas_esi_adp 4 r UNASSIGNED
tbg_prd_esi_adp 1 r UNASSIGNED
tbg_prd_esi_adp 0 r UNASSIGNED
tbg_qas_le_adp 2 r UNASSIGNED
tbg_qas_le_adp 0 p UNASSIGNED
tbg_qas_le_adp 0 r UNASSIGNED
tbg_qas_msgc 0 r UNASSIGNED
tbg_qas_prc_arc 3 r UNASSIGNED
tbg_qas_prc_arc 0 r UNASSIGNED
tbg_prd_smgc 2 r UNASSIGNED
tbg_prd_smgc 0 r UNASSIGNED
tbg_prd_le_adp 2 r UNASSIGNED
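With a listing this long, counting the unassigned copies per index shows how serious the problem is: an unassigned primary (p) means part of that index's data is unavailable and cluster health is red, while unassigned replicas (r) only cost redundancy (yellow). A minimal sketch, assuming the default _cat/shards column order (index, shard, prirep, state, ...); summarize_unassigned is a hypothetical helper name:

```shell
#!/bin/bash
# summarize_unassigned (hypothetical helper): reads _cat/shards output on stdin
# and prints "index unassigned_primaries unassigned_replicas" per affected index.
# Assumes the default column order: index shard prirep state ...
summarize_unassigned() {
  awk '$4 == "UNASSIGNED" && $3 == "p" { pri[$1]++; seen[$1] = 1 }
       $4 == "UNASSIGNED" && $3 == "r" { rep[$1]++; seen[$1] = 1 }
       END { for (i in seen) printf "%s %d %d\n", i, pri[i] + 0, rep[i] + 0 }' |
    LC_ALL=C sort
}
```

Usage: curl -s "http://localhost:9200/_cat/shards" | summarize_unassigned. In the grep output above, tbg_prd_apache shard 3 and tbg_qas_le_adp shard 0 are the only unassigned primaries; everything else is a missing replica.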
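Fix: use the cluster reroute API to allocate the unassigned shard to a node explicitly. "node" takes a node ID, as in this example, or a node name: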
curl -XPOST 'localhost:9200/_cluster/reroute' -d '{ "commands" : [ { "allocate" : { "index" : "tbg_qas_ms", "shard" : 1, "node" : "1ar-iyZKR8Svz1medQpRjA", "allow_primary" : true } } ] }'|python -m json.tool
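Writing this JSON by hand is easy to get wrong, quoting in particular. A minimal sketch of a helper that prints a well-formed body for the same ES 2.x allocate command; reroute_body is a hypothetical name. allow_primary defaults to false here: it is not needed for replicas, and per the 2.x reroute documentation, allow_primary=true on an unassigned primary allocates a new empty primary, discarding that shard's data.

```shell
#!/bin/bash
# reroute_body (hypothetical helper): prints the JSON body for an ES 2.x
# "allocate" reroute command.
# Args: index shard node [allow_primary: true|false, default false]
reroute_body() {
  local index=$1 shard=$2 node=$3 allow_primary=${4:-false}
  # allow_primary must be a JSON boolean, not a quoted string.
  case $allow_primary in
    true|false) ;;
    *) echo "allow_primary must be true or false" >&2; return 1 ;;
  esac
  printf '{"commands":[{"allocate":{"index":"%s","shard":%d,"node":"%s","allow_primary":%s}}]}\n' \
    "$index" "$shard" "$node" "$allow_primary"
}
```

Usage, with the reroute API's dry_run flag so the command is only simulated first:
curl -XPOST 'localhost:9200/_cluster/reroute?dry_run' -d "$(reroute_body tbg_qas_ms 1 1ar-iyZKR8Svz1medQpRjA)" | python -m json.tool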
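A script found online automates this for every unassigned shard, but it failed when I tested it, and I did not find out why. It is still useful as a reference.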
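#!/bin/bash
# Allocate every UNASSIGNED shard listed by _cat/shards (ES 2.x "allocate" command).
# NODE must name an existing node (name or ID); the online version hard-coded
# 'Master', which only works if a node is actually called that.
NODE=${NODE:?set NODE to a target node name or ID}
ES=http://localhost:9200

# Read the shard table once and match the state column exactly. The online
# version used "grep $index", which also matches longer index names
# (tbg_prd_sm matches tbg_prd_smgc lines), so it also tried shards that were assigned.
curl -s "$ES/_cat/shards?h=index,shard,prirep,state" |
  awk '$4 == "UNASSIGNED" {print $1, $2}' | sort -u |
  while read -r index shard; do
    echo "$index $shard"
    # The body must be valid JSON; the online version left the index name
    # unquoted. Warning: allow_primary=true on an unassigned primary allocates
    # a new EMPTY primary, so that shard's data is lost.
    curl -s -XPOST "$ES/_cluster/reroute" -d "{
      \"commands\": [ {
        \"allocate\": {
          \"index\": \"$index\",
          \"shard\": $shard,
          \"node\": \"$NODE\",
          \"allow_primary\": true
        }
      } ]
    }"
    echo
    sleep 5
  done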
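Whichever version of the script is used, one fixed target node cannot receive every replica: Elasticsearch will not place two copies of the same shard on one node, and in the listing above most primaries sit on esnode1. A minimal sketch that reads _cat/shards output on stdin and picks the first candidate node holding no copy of a given shard; pick_node is a hypothetical helper, the candidate list is yours to supply (for example from _cat/nodes), and it assumes the node name is the last column of every assigned row, as it is for STARTED rows in the default output. Other deciders, such as the disk check, can still refuse the chosen node.

```shell
#!/bin/bash
# pick_node (hypothetical helper): reads _cat/shards output on stdin and prints
# the first candidate node that holds no copy of the given shard.
# Args: index shard candidate_node...
pick_node() {
  local index=$1 shard=$2 holders n
  shift 2
  # Exact index/shard match; the node name is assumed to be the last column.
  holders=$(awk -v i="$index" -v s="$shard" \
    '$1 == i && $2 == s && $4 != "UNASSIGNED" {print $NF}')
  for n in "$@"; do
    if ! grep -qxF -- "$n" <<<"$holders"; then
      echo "$n"
      return 0
    fi
  done
  return 1   # every candidate already holds a copy
}
```

For example, curl -s "http://localhost:9200/_cat/shards" | pick_node tbg_prd_bgms 2 esnode1 esnode2 esnode3 prints esnode2 for the listing above, since only esnode1 holds shard 2 of tbg_prd_bgms.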
Another useful command, which returns node information such as node IDs and names:
curl -s http://10.120.31.145:9200/_nodes/process?pretty
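The reroute "node" field accepts a node ID (such as 1ar-iyZKR8Svz1medQpRjA above) or a node name, and the _nodes response is keyed by node ID with each node's name inside. A minimal sketch that prints the ID-to-name mapping; node_names is a hypothetical helper, and it assumes python3 is installed (used only to parse the JSON).

```shell
#!/bin/bash
# node_names (hypothetical helper): reads a _nodes response on stdin and prints
# "node_id node_name" for each node, sorted by ID.
node_names() {
  python3 -c 'import json, sys
nodes = json.load(sys.stdin)["nodes"]
for node_id in sorted(nodes):
    print(node_id, nodes[node_id]["name"])'
}
```

Usage: curl -s http://10.120.31.145:9200/_nodes/process | node_names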
Update, 2016-10-26:
The reroute command failed with this error:
[appadmin@hadoop4 ~]$ curl -XPOST 'localhost:9200/_cluster/reroute' -d '{
> "commands" : [ {
> "allocate" : {
> "index" : "ebg_qas_apache",
> "shard" : 0,
> "node" : "esnode0",
> "allow_primary" : true
> }
> }
> ]
> }'|python -m json.tool
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1824  100  1567  100   257   7020   1151 --:--:-- --:--:-- --:--:--  7058
{
    "error": {
        "reason": "[allocate] allocation of [ebg_qas_apache][0] on node {esnode0}{d7a9nTZqTaSx4iowTY2BDA}{192.168.0.15}{192.168.0.15:9300}{master=true} is not allowed, reason: [YES(allocation disabling is ignored)][YES(node passes include/exclude/require filters)][NO(more than allowed [85.0%] used disk on node, free: [2.947365288039714%])][YES(target node version [2.3.0] is same or newer than source node version [2.3.0])][YES(shard not primary or relocation disabled)][YES(below shard recovery limit of [2])][YES(allocation disabling is ignored)][YES(no allocation awareness enabled)][YES(shard is not allocated to same node or host)][YES(primary is already active)][YES(total shard limit disabled: [index: -1, cluster: -1] <= 0)]",
        "root_cause": [
            {
                "reason": "[allocate] allocation of [ebg_qas_apache][0] on node {esnode0}{d7a9nTZqTaSx4iowTY2BDA}{192.168.0.15}{192.168.0.15:9300}{master=true} is not allowed, reason: [YES(allocation disabling is ignored)][YES(node passes include/exclude/require filters)][NO(more than allowed [85.0%] used disk on node, free: [2.947365288039714%])][YES(target node version [2.3.0] is same or newer than source node version [2.3.0])][YES(shard not primary or relocation disabled)][YES(below shard recovery limit of [2])][YES(allocation disabling is ignored)][YES(no allocation awareness enabled)][YES(shard is not allocated to same node or host)][YES(primary is already active)][YES(total shard limit disabled: [index: -1, cluster: -1] <= 0)]",
                "type": "illegal_argument_exception"
            }
        ],
        "type": "illegal_argument_exception"
    },
    "status": 400
}
[appadmin@hadoop4 ~]$
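Each bracketed YES/NO in the reason is one allocation decider's verdict, and the allocation is refused if any of them says NO, so with a reason this long it is quickest to pull out only the NO entries. A minimal sketch; refusing_deciders is a hypothetical helper, and it assumes the text inside NO(...) contains no closing parenthesis, which holds for this message.

```shell
#!/bin/bash
# refusing_deciders (hypothetical helper): reads a reroute error on stdin and
# prints each distinct NO(...) decision, i.e. the checks that blocked allocation.
# The reason appears twice (error and root_cause), hence sort -u.
refusing_deciders() {
  grep -o 'NO([^)]*)' | sort -u
}
```

Pipe the reroute response into it (for example ... | python -m json.tool | refusing_deciders). For the error above it prints only the disk check: NO(more than allowed [85.0%] used disk on node, free: [2.947365288039714%]).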
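The only NO in the decision list is the disk check: the target node esnode0 had more than the allowed 85% of its disk in use (only about 2.9% free), so that was the cause. After disk space was added, the reroute succeeded.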
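The 85% in that message is the default of the cluster.routing.allocation.disk.watermark.low setting: Elasticsearch will not allocate shards to a node whose disk usage is above the low watermark. To find such nodes before rerouting, a minimal sketch that reads "node percent" lines, as produced by _cat/allocation?h=node,disk.percent (column names assumed from the _cat/allocation docs; _cat/allocation?help lists them for your version), and prints every node at or above a threshold; nodes_over is a hypothetical helper.

```shell
#!/bin/bash
# nodes_over (hypothetical helper): reads "node disk_percent" lines on stdin and
# prints the nodes at or above the threshold (default 85, the default low
# watermark). Lines without a percentage are skipped.
nodes_over() {
  local threshold=${1:-85}
  awk -v t="$threshold" 'NF >= 2 && $2 + 0 >= t + 0 { print $1, $2 "%" }'
}
```

Usage: curl -s 'localhost:9200/_cat/allocation?h=node,disk.percent' | nodes_over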