elasticsearch

search: to search
elastic: flexible, elastic
Purpose:
a search engine: full-text indexing and search, plus document storage

Overview:

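Elasticsearch is a distributed, scalable, real-time search and analytics engine built on top of the full-text search library Apache Lucene. Besides full-text search, it can also:
1. Act as a distributed, real-time document store in which every field is indexed and therefore searchable
2. Serve as a distributed search engine with real-time analytics
3. Scale out to hundreds of servers (horizontal scaling is very easy) and handle petabytes of structured or unstructured data
4. Expose a REST API (an interface through which software talks to software) and work out of the box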

Storage units:
1YB=1024ZB
1ZB=1024EB
1EB=1024PB
1PB=1024TB
1TB=1024GB
1GB=1024MB
1MB=1024KB
1KB=1024B
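Each unit above is 1024 (2^10) times the previous one. A small Bash sketch that prints the byte count of each unit; unit_bytes is a hypothetical helper name, and it stops at EB because 1 ZB (2^70 bytes) no longer fits in Bash's signed 64-bit integer arithmetic.

```shell
#!/usr/bin/env bash
# Print the number of bytes in a storage unit by multiplying by 1024 per step.
# Stops at EB: 1 EB = 2^60 bytes still fits in a signed 64-bit integer, 1 ZB = 2^70 does not.
unit_bytes() {
  local target="$1" bytes=1 u
  for u in KB MB GB TB PB EB; do
    bytes=$(( bytes * 1024 ))
    if [ "$u" = "$target" ]; then
      echo "$bytes"
      return 0
    fi
  done
  echo "unsupported unit: $target" >&2
  return 1
}

# Print the whole table, e.g. "1PB = 1125899906842624 B".
for u in KB MB GB TB PB EB; do
  echo "1$u = $(unit_bytes "$u") B"
done
```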

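Basic concepts:

Elasticsearch is a document-oriented database: each record is a document.
1. Index
An index is a collection of documents with similar characteristics. For example, you can have one index for customer data, another for a product catalog, and another for order data. An index is identified by a name (which must be all lowercase), and this name is used to refer to the index when indexing, searching, updating, and deleting the documents in it. A single cluster can hold as many indices as you need. (An index can be thought of as a database in a relational system, each one storing a different kind of data.)
2. Document
An Elasticsearch document is a JSON document stored in an index. Each document has a type and an ID, and the ID is unique. A document as returned by a GET request looks like this (JSON format):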

{
   "_index": "packtpub",
   "_type": "elk",
   "_id": "1",
   "_version": 1,
   "found": true,
   "_source": {
        "book_name": "learning elk",
        "book_author": "鲁迅"
    }
}
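A sketch of how a document like the one above could be written and read back over the REST API. It assumes an Elasticsearch 6.x node reachable at ES_URL (defaulting here to the first node of the cluster built later in these notes, 192.168.10.30:9200) and reuses the index, type and ID from the example; put_book and get_book are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Assumption: an Elasticsearch 6.x node listens at ES_URL.
ES_URL="${ES_URL:-http://192.168.10.30:9200}"

# Store the book as document ID 1 of type "elk" in index "packtpub".
put_book() {
  curl -s -XPUT "$ES_URL/packtpub/elk/1" \
    -H 'Content-Type: application/json' \
    -d '{"book_name": "learning elk", "book_author": "鲁迅"}'
}

# Read it back; the response wraps the stored JSON in _source,
# next to _index, _type, _id, _version and found.
get_book() {
  curl -s -XGET "$ES_URL/packtpub/elk/1?pretty"
}
```

Usage: run put_book, then get_book. If the index does not exist yet, ES 6.x creates it on the first write with a dynamically inferred mapping.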

3. Field
The basic unit inside a document, a key-value pair (for example "book_name": "learning elk").

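4. Type
A type is a logical category within an index. For example, in an index named wealth, documents could be grouped by city or by climate type; such a grouping is called a type. (Since Elasticsearch 6.0 a new index can hold only one type, and types are deprecated in later versions.)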

5. Mapping
A mapping defines each field of a document and its data type, such as string, integer, float, double, date, and so on.
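A sketch of an explicit mapping for the example index, using ES 6.x syntax where mappings are keyed by the type name. The fields price and published, the helper names, and the ES_URL default are assumptions for illustration. Creating an index that already exists fails, so this has to run before any document is written to packtpub.

```shell
#!/usr/bin/env bash
# Assumption: an Elasticsearch 6.x node listens at ES_URL.
ES_URL="${ES_URL:-http://192.168.10.30:9200}"

# Create index "packtpub" with an explicit mapping for type "elk".
# price and published are hypothetical fields showing other data types.
create_packtpub_index() {
  curl -s -XPUT "$ES_URL/packtpub" -H 'Content-Type: application/json' -d '{
    "mappings": {
      "elk": {
        "properties": {
          "book_name":   { "type": "text" },
          "book_author": { "type": "keyword" },
          "price":       { "type": "float" },
          "published":   { "type": "date" }
        }
      }
    }
  }'
}

# Show the mapping that ES actually stored.
show_packtpub_mapping() {
  curl -s -XGET "$ES_URL/packtpub/_mapping?pretty"
}
```

Design note: text fields are analyzed for full-text search, while keyword fields are stored as a single exact value, which suits filtering and aggregating on an author name.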

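6. Shard
A single machine cannot store huge amounts of data, so ES can split the data of one index into multiple shards spread across several servers. Shards allow horizontal scaling: more data can be stored, and searches and analytics run on multiple servers in parallel, which increases throughput and performance.
(An index can be thought of as a database in SQL.)

Primary shards do the actual work; replica shards are backups, and a replica is an exact copy of its primary.
Primary shard: shard / primary shard
Replica shard: replica shard
A primary shard and its replica are never placed on the same node.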
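The number of primary and replica shards is set per index. A sketch, assuming an ES 6.x node at ES_URL and a hypothetical index named demo: it is created with 3 primaries and 1 replica each, and _cat/shards then shows where every copy landed (prirep is p for primary, r for replica). The number of primaries is fixed at creation time, while the number of replicas can be changed later.

```shell
#!/usr/bin/env bash
# Assumption: an Elasticsearch 6.x node listens at ES_URL; "demo" is a hypothetical index.
ES_URL="${ES_URL:-http://192.168.10.30:9200}"

# 3 primary shards, each with 1 replica: 6 shard copies in total.
create_demo_index() {
  curl -s -XPUT "$ES_URL/demo" -H 'Content-Type: application/json' -d '{
    "settings": { "number_of_shards": 3, "number_of_replicas": 1 }
  }'
}

# One row per shard copy, including the node it is allocated to.
show_demo_shards() {
  curl -s "$ES_URL/_cat/shards/demo?v"
}

# Change the replica count of an existing index, e.g. set_demo_replicas 2.
set_demo_replicas() {
  curl -s -XPUT "$ES_URL/demo/_settings" -H 'Content-Type: application/json' \
    -d "{\"index\": {\"number_of_replicas\": $1}}"
}
```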

7. Analysis (tokenization)
Splitting a piece of text into terms according to certain rules.
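The _analyze API shows how an analyzer splits text into terms. A sketch, assuming an ES 6.x node at ES_URL; analyze is a hypothetical helper and builds the JSON body naively, so the text must not contain double quotes or backslashes.

```shell
#!/usr/bin/env bash
# Assumption: an Elasticsearch 6.x node listens at ES_URL.
ES_URL="${ES_URL:-http://192.168.10.30:9200}"

# analyze ANALYZER TEXT: print the tokens ANALYZER produces for TEXT.
# No JSON escaping is done, so TEXT must not contain quotes or backslashes.
analyze() {
  local analyzer="$1" text="$2"
  curl -s -XPOST "$ES_URL/_analyze?pretty" -H 'Content-Type: application/json' \
    -d "{\"analyzer\": \"$analyzer\", \"text\": \"$text\"}"
}
```

Usage: analyze standard "Learning ELK stack" should list the lowercased tokens learning, elk and stack.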

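8. Primary shard (shard / primary shard) and replica shard
A replica shard usually resides on a different node from its primary shard, so that for failover and load balancing, requests can be served by more than one copy.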

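9. Cluster
A cluster is a collection of nodes that store index data. Elasticsearch provides horizontal scalability for storing data in the cluster. Each cluster is identified by a cluster name, and nodes join together by specifying the same cluster name.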

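10. Node
A node is a single running Elasticsearch instance that belongs to a cluster. By default, every Elasticsearch node joins the cluster named "elasticsearch". Each node can use its own elasticsearch.yml, so nodes can have different settings for memory and resource allocation.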

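Cluster node types:

1. Data node
Data nodes index documents and run searches on the indexed documents. Adding more data nodes is the recommended way to improve performance or scale the cluster. A node becomes a data node through these settings in elasticsearch.yml: node.master: false and node.data: true (node.data controls whether it is a data node).

2. Master node
The master node manages the cluster. For large clusters, three dedicated master-eligible nodes are recommended (one active master and two standbys); they act only as masters and neither store indices nor run searches. Declare a master node in elasticsearch.yml with node.master: true (eligible to be elected master) and node.data: false.

3. Routing node, also called load balancer node
These nodes take neither the master nor the data role; they only balance load, routing search requests or forwarding documents to the appropriate nodes for indexing. This is useful for high-volume search or indexing. Settings: node.master: false and node.data: false.

4. Ingest node
Ingest nodes can run multiple preprocessing pipelines that modify incoming documents.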
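The role settings above can be generated per node type. A sketch with a hypothetical helper, node_role_settings; the node.ingest lines (an ES 6.x setting that defaults to true) are an addition that makes each role dedicated, for example a pure routing (coordinating-only) node has all three roles switched off.

```shell
#!/usr/bin/env bash
# Print the elasticsearch.yml role lines (ES 6.x settings) for one node type.
node_role_settings() {
  case "$1" in
    data)    printf 'node.master: false\nnode.data: true\nnode.ingest: false\n' ;;
    master)  printf 'node.master: true\nnode.data: false\nnode.ingest: false\n' ;;
    routing) printf 'node.master: false\nnode.data: false\nnode.ingest: false\n' ;;
    ingest)  printf 'node.master: false\nnode.data: false\nnode.ingest: true\n' ;;
    *)       echo "unknown role: $1" >&2; return 1 ;;
  esac
}
```

Usage: node_role_settings master >> /usr/local/es/config/elasticsearch.yml appends the master-only settings (remove any existing node.master or node.data lines first so each setting appears only once).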

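Zen Discovery communication

By default, an ES process binds to its loopback address and scans local ports 9300 to 9305, trying to contact other ES processes started on those ports, and they form a cluster automatically. If the listen address is changed to a non-loopback address, ES contacts the nodes listed in the configuration file (discovery.zen.ping.unicast.hosts) and forms a cluster with the ES nodes running there.

Zen Discovery scanning:
By default it scans the local ports above; otherwise it contacts the hosts specified in the configuration file.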

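Master election

As shown in the figure below:

Figure legend:
Active nodes: the nodes that are currently alive

Candidate (temporary) nodes: 1. data nodes, 2. master nodes. Only master-eligible nodes take part in the election; data-only nodes cannot form a cluster on their own.

Cluster state: records which node is the master and its status (the master must keep this information up to date).
The decision box below it ("is it the local node?") asks whether the elected master is this node. If not, the node sends a request to join the cluster. If it is, it waits until the number of joining nodes reaches the required number, which completes the election; it then declares itself master and publishes the cluster state. Note that electing a temporary master is only the first step: master is a cluster-level role, so the node only becomes the real master once the other nodes have joined.

Nodes automatically group together to form the cluster.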

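Split brain

Because of a network or other failure, a cluster can be split into two or more groups, each with several nodes and its own master, so the original cluster ends up with multiple masters. The master maintains the cluster state and allocates shards, so having more than one master can corrupt data.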
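In ES 6.x, split brain is prevented by discovery.zen.minimum_master_nodes: a master can only be elected when at least that many master-eligible nodes see each other. Setting it to a quorum, floor(N / 2) + 1, means at most one side of a partition can elect a master. With the three master-eligible nodes configured in the cluster setup, that gives 2. quorum is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Quorum of N master-eligible nodes: floor(N / 2) + 1 (Bash integer division floors for positive N).
quorum() {
  echo $(( $1 / 2 + 1 ))
}

for n in 1 2 3 4 5; do
  echo "$n master-eligible nodes -> discovery.zen.minimum_master_nodes: $(quorum "$n")"
done
```

With 2 master-eligible nodes the quorum is also 2, so losing either one blocks elections; this is one reason three master-eligible nodes are recommended.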

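Fault tolerance (cluster health status):

Status	Meaning
green	All primary and replica shards are available
yellow	All primary shards are available, but some replica shards are not
red	Some primary shards are unavailable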

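How does the cluster recover after the master node goes down?
The moment of failure
The instant the master node goes down, the primary shards on that node are lost and no longer active, so not all primary shards in the cluster are active any more (the status turns red).

Recovery steps (the failed node should be restarted promptly):
1. Master election: ES automatically elects another node as master to take over its duties.
2. The new master promotes one replica of each lost primary shard to primary. The cluster status becomes yellow, because all primary shards are active again but N replica shards are missing.
3. Restart the node. The new master copies the missing replicas to it; the node reuses the shard data it already has and only syncs the changes made while it was down. The cluster status turns green.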

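How does the cluster recover after a data node goes down?
When the node holding primary shard P3 goes down, the replica of P3 is promoted to primary, and the cluster then automatically creates new replicas for the copies that were lost (here P3 and P2).
Replica shards are stored crosswise: a replica is never placed on the same node as its primary.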
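To watch these steps on a live cluster, list every shard copy with its role and state before and after stopping a node. A sketch, assuming an ES 6.x node at ES_URL; the helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Assumption: an Elasticsearch 6.x node listens at ES_URL.
ES_URL="${ES_URL:-http://192.168.10.30:9200}"

# One row per shard copy: prirep is p (primary) or r (replica);
# state is e.g. STARTED, INITIALIZING, RELOCATING or UNASSIGNED.
show_shards() {
  curl -s "$ES_URL/_cat/shards?v&h=index,shard,prirep,state,node"
}

# Ask the master why the first unassigned shard it finds is not allocated
# (returns an error when no shard is unassigned).
explain_unassigned() {
  curl -s "$ES_URL/_cluster/allocation/explain?pretty"
}
```

Usage: run show_shards, stop one node, and run it again; a former r row for the lost primary should now show p, and new replica rows appear as INITIALIZING until they reach STARTED.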

Note: in production, each node usually has a single dedicated role: it is either a pure data node or a pure master node.

Elasticsearch cluster setup

Environment:

Host	IP
node-1	192.168.10.30
node-2	192.168.10.40
node-3	192.168.10.50

Do the following on all three hosts:

[root@localhost ~]# vim /etc/security/limits.conf
# append at the end
* hard nofile 819200
* soft nofile 819200
* soft nproc 2048
* hard nproc 4096
[root@localhost ~]# vim /etc/sysctl.conf 
# append at the end
vm.max_map_count = 655360
[root@localhost ~]# sysctl -p
vm.max_map_count = 655360

[root@localhost ~]# vim /etc/selinux/config
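# change to
SELINUX=disabled
[root@localhost ~]# systemctl stop firewalld
[root@localhost ~]# systemctl disable firewalld  # keep it off after the reboot below
[root@localhost ~]# setenforce 0

[root@localhost ~]# reboot  # reboot the host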

[root@localhost ~]# ulimit -n
819200
[root@localhost ~]# sysctl -p
vm.max_map_count = 655360
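Before starting ES it can help to confirm that the limits actually took effect for the es user. A sketch, assuming the minimums enforced by Elasticsearch 6.x bootstrap checks when bound to a non-loopback address: at least 65536 open files, 4096 threads, and vm.max_map_count of 262144. check_min and check_all are hypothetical helper names. The soft nproc value of 2048 set above is below 4096; on some distributions a file under /etc/security/limits.d overrides it, so it is worth measuring rather than assuming.

```shell
#!/usr/bin/env bash
# check_min NAME ACTUAL MIN: print OK/FAIL and return non-zero when ACTUAL < MIN.
check_min() {
  local name="$1" actual="$2" min="$3"
  if [ "$actual" = "unlimited" ] || [ "$actual" -ge "$min" ]; then
    echo "OK   $name=$actual (>= $min)"
  else
    echo "FAIL $name=$actual (need >= $min)"
    return 1
  fi
}

# Run as the es user, since the ulimit values are per user/session.
check_all() {
  local rc=0
  check_min "open files (ulimit -n)" "$(ulimit -n)" 65536 || rc=1
  check_min "max user processes (ulimit -u)" "$(ulimit -u)" 4096 || rc=1
  check_min "vm.max_map_count" "$(cat /proc/sys/vm/max_map_count)" 262144 || rc=1
  return "$rc"
}
```

Usage: save it as a file, source it from a shell opened with su es, and run check_all; any FAIL line points at the setting to revisit.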

Install ES

[root@localhost ~]# groupadd es
[root@localhost ~]# useradd es -g es
[root@localhost ~]# tar -zxf elasticsearch-6.3.2.tar.gz 
[root@localhost ~]# mv elasticsearch-6.3.2 /usr/local/es
[root@localhost ~]# mkdir -p /es/{data,logs}
[root@localhost ~]# chown -R es:es /es/
[root@localhost ~]# chown -R es:es /usr/local/es/

First host:

[root@localhost ~]# vim /usr/local/es/config/elasticsearch.yml
# uncomment
cluster.name: my-application  # cluster name
node.name: node-1  # node name
# add
node.master: true  # this node is master-eligible and also a data node
node.data: true
# change
path.data: /es/data  # data directory
path.logs: /es/logs  # log directory
network.host: 192.168.10.30  # listen address
http.port: 9200  # HTTP port
# add
transport.tcp.port: 9300  # transport port used for communication between nodes
# uncomment and change
discovery.zen.ping.unicast.hosts: ["192.168.10.30", "192.168.10.40", "192.168.10.50"]  # cluster nodes
discovery.zen.minimum_master_nodes: 2  # master-eligible nodes required to elect a master (quorum of 3)
# add
discovery.zen.ping_timeout: 120s
client.transport.ping_timeout: 60s  # client connection ping timeout
# uncomment gateway.recover_after_nodes and add the lines below it
gateway.recover_after_nodes: 3  # do not start recovery until 3 nodes have joined
gateway.recover_after_time: 5m  # how long to wait before starting recovery
gateway.recover_after_data_nodes: 2  # also require at least 2 data nodes

Copy the configuration file to the other two hosts:

[root@localhost ~]# scp /usr/local/es/config/elasticsearch.yml root@192.168.10.40:/usr/local/es/config/elasticsearch.yml
[root@localhost ~]# scp /usr/local/es/config/elasticsearch.yml root@192.168.10.50:/usr/local/es/config/elasticsearch.yml

Second host:

[root@localhost ~]# vim /usr/local/es/config/elasticsearch.yml
node.name: node-2
network.host: 192.168.10.40

Third host:

[root@localhost ~]# vim /usr/local/es/config/elasticsearch.yml
node.name: node-3
network.host: 192.168.10.50
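Instead of editing by hand on the second and third hosts, the two lines could be rewritten with sed. A sketch, assuming GNU sed and a copied file with exactly one node.name line and one network.host line at the start of a line (as configured on the first host); set_node_identity is a hypothetical helper name. Any trailing comment on those two lines is replaced as well.

```shell
#!/usr/bin/env bash
# set_node_identity FILE NAME IP: rewrite node.name and network.host in an elasticsearch.yml.
set_node_identity() {
  local file="$1" name="$2" ip="$3"
  sed -i \
    -e "s/^node\.name:.*/node.name: $name/" \
    -e "s/^network\.host:.*/network.host: $ip/" \
    "$file"
}
```

Usage on the second host: set_node_identity /usr/local/es/config/elasticsearch.yml node-2 192.168.10.40 (and node-3 / 192.168.10.50 on the third).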

Start ES on all three hosts (as the es user, since Elasticsearch refuses to run as root)

[root@localhost ~]# su es
[es@localhost root]$ /usr/local/es/bin/elasticsearch
...
[2021-01-26T10:48:36,242][INFO ][o.e.n.Node               ] [node-1] started  # this line means the node started successfully
...
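Running in the foreground ties ES to the terminal. A sketch of starting it in the background instead, using the -d (daemonize) and -p (PID file) options of the 6.x startup script; the PID file path /es/es.pid and the helper names are assumptions (the /es directory is owned by es, so the es user can write there).

```shell
#!/usr/bin/env bash
# Start Elasticsearch as the es user in the background and record its PID.
start_es() {
  su es -c '/usr/local/es/bin/elasticsearch -d -p /es/es.pid'
}

# Stop it with SIGTERM (kill's default signal), which lets the node shut down cleanly.
stop_es() {
  kill "$(cat /es/es.pid)"
}
```

With -d the startup log no longer appears in the terminal; the "started" line can instead be found in the log files under /es/logs.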

On the first host, open another terminal and check the cluster status:

[root@localhost ~]# curl -XGET http://192.168.10.30:9200/_cat/master  # show the master node
4MFNOyx9QmW5EBPvnUUBjQ 192.168.10.50 192.168.10.50 node-3

[root@localhost ~]# curl -XGET http://192.168.10.30:9200/_cat/nodes  # list the cluster members
192.168.10.50 20 96 1 0.08 0.04 0.05 mdi * node-3
192.168.10.30 27 95 0 0.03 0.04 0.05 mdi - node-1
192.168.10.40 19 96 0 0.04 0.04 0.05 mdi - node-2
[root@localhost ~]# curl -XGET http://192.168.10.30:9200/_cluster/health?pretty  # show cluster health
{
  "cluster_name" : "my-application",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 3,
  "number_of_data_nodes" : 3,
  "active_primary_shards" : 0,
  "active_shards" : 0,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}
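For scripts, the status field is usually all that matters, and _cluster/health can wait server-side with the wait_for_status and timeout parameters. A sketch, assuming an ES 6.x node at ES_URL; health_status and wait_for_green are hypothetical helpers, and the extraction uses grep/sed rather than a JSON parser, so it only handles the flat "status" field.

```shell
#!/usr/bin/env bash
# Assumption: an Elasticsearch 6.x node listens at ES_URL.
ES_URL="${ES_URL:-http://192.168.10.30:9200}"

# Read a _cluster/health JSON response on stdin and print green, yellow or red.
health_status() {
  grep -o '"status" *: *"[a-z]*"' | head -n 1 | sed 's/.*"\([a-z]*\)"$/\1/'
}

# Wait until the cluster is green or the timeout (default 60s) expires, then print the status.
# On timeout the response still reports the current status, with timed_out set to true.
wait_for_green() {
  curl -s "$ES_URL/_cluster/health?wait_for_status=green&timeout=${1:-60s}" | health_status
}
```

Usage: [ "$(wait_for_green 120s)" = green ] && echo "cluster is green".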

Analyzer (IK Chinese tokenizer plugin)

On all three hosts, in another terminal:

[root@localhost ~]# mkdir /usr/local/es/plugins/ik
[root@localhost ~]# unzip elasticsearch-analysis-ik-6.3.2.zip -d /usr/local/es/plugins/ik/
[root@localhost ~]# chown -R es:es /usr/local/es/

Then restart ES (stop it with Ctrl+C and run the start command again).
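After the restart, it is worth confirming that every node loaded the plugin and comparing the two analyzers IK provides, ik_max_word (finest-grained split) and ik_smart (coarser split). A sketch, assuming an ES 6.x node at ES_URL; the helper names are hypothetical and the text is inserted into the JSON body without escaping.

```shell
#!/usr/bin/env bash
# Assumption: an Elasticsearch 6.x node listens at ES_URL.
ES_URL="${ES_URL:-http://192.168.10.30:9200}"

# One row per node and plugin; each of the three nodes should list the IK plugin.
list_plugins() {
  curl -s "$ES_URL/_cat/plugins?v"
}

# Show the tokens from both IK analyzers for the same text (no quotes in TEXT).
ik_tokens() {
  local text="$1" a
  for a in ik_max_word ik_smart; do
    echo "== $a"
    curl -s -XPOST "$ES_URL/_analyze?pretty" -H 'Content-Type: application/json' \
      -d "{\"analyzer\": \"$a\", \"text\": \"$text\"}"
  done
}
```

Usage: ik_tokens "二手手机真便宜".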

On the first host:
1. Create an index

[root@localhost ~]# curl -XPUT http://192.168.10.30:9200/ershou
{"acknowledged":true,"shards_acknowledged":true,"index":"ershou"}

2. Add a mapping, which specifies the analyzer to use

[root@localhost ~]# curl -XPOST http://192.168.10.30:9200/ershou/shouji/_mapping -H 'Content-Type:application/json' -d' 
> {
> "properties":{
> "content":{
> "type":"text",
> "analyzer":"ik_max_word",
> "search_analyzer":"ik_max_word"
> }
> }
> }'
{"acknowledged":true}

3. Add data

[root@localhost ~]# curl -XPOST http://192.168.10.30:9200/ershou/shouji/1 -H 'Content-Type:application/json' -d'{"content":"二手 手机真便宜"}'
{"_index":"ershou","_type":"shouji","_id":"1","_version":1,"result":"created","_shards":{"total":2,"successful":2,"failed":0},"_seq_no":0,"_primary_term":1}

[root@localhost ~]# curl -XPOST http://192.168.10.30:9200/ershou/shouji/2 -H 'Content-Type:application/json' -d'{"content":"二手 电脑真便宜"}'
{"_index":"ershou","_type":"shouji","_id":"2","_version":1,"result":"created","_shards":{"total":2,"successful":2,"failed":0},"_seq_no":0,"_primary_term":1}

4. Test

[root@localhost ~]# curl -XPOST http://192.168.10.30:9200/ershou/shouji/_search -H 'Content-Type:application/json' -d'{                                     
> "query":{"match":{"content":"二手"}},
> "highlight":{
> "pre_tags":["<tag1>","<tag2>"],
> "post_tags":["</tag1>","</tag2>"],
> "fields":{
> "content":{}
> }
> } 
> }'
{"took":222,"timed_out":false,"_shards":{"total":5,"successful":5,"skipped":0,"failed":0},"hits":{"total":2,"max_score":0.8630463,"hits":[{"_index":"ershou","_type":"shouji","_id":"2","_score":0.8630463,"_source":{"content":"二手 电脑真便宜"},"highlight":{"content":["<tag1>二</tag1><tag1>手</tag1> 电脑真便宜"]}},{"_index":"ershou","_type":"shouji","_id":"1","_score":0.8630463,"_source":{"content":"二手 手机真便宜"},"highlight":{"content":["<tag1>二</tag1><tag1>手</tag1> 手机真便宜"]}}]}}
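The search above can be wrapped in a helper that builds the match-plus-highlight body for any field and keyword. A sketch, assuming an ES 6.x node at ES_URL and the ershou/shouji index and type from the steps above; match_body and search_ershou are hypothetical names, and no JSON escaping is done, so keywords must not contain quotes or backslashes.

```shell
#!/usr/bin/env bash
# Assumption: an Elasticsearch 6.x node listens at ES_URL.
ES_URL="${ES_URL:-http://192.168.10.30:9200}"

# match_body FIELD KEYWORD: print a match query that highlights FIELD with <tag1> ... </tag1>.
match_body() {
  local field="$1" keyword="$2"
  printf '{"query":{"match":{"%s":"%s"}},"highlight":{"pre_tags":["<tag1>"],"post_tags":["</tag1>"],"fields":{"%s":{}}}}' \
    "$field" "$keyword" "$field"
}

# search_ershou KEYWORD: run the query against the content field of ershou/shouji.
search_ershou() {
  curl -s -XPOST "$ES_URL/ershou/shouji/_search?pretty" -H 'Content-Type: application/json' \
    -d "$(match_body content "$1")"
}
```

Usage: search_ershou 二手 repeats the test query above, with a single pair of highlight tags.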

head plugin (web GUI)

On all three hosts:

[root@localhost ~]# unzip elasticsearch-head-master.zip
[root@localhost ~]# tar -zxf node-v10.6.0-linux-x64.tar.gz
[root@localhost ~]# mv node-v10.6.0-linux-x64 /usr/local/node
[root@localhost ~]# echo 'PATH=$PATH:/usr/local/node/bin' >> /etc/profile
[root@localhost ~]# source /etc/profile
[root@localhost ~]# node -v
v10.6.0
[root@localhost ~]# npm -v
6.1.0

Install the plugin and modify the configuration

[root@localhost ~]# npm install -g grunt --registry=https://registry.npm.taobao.org
[root@localhost ~]# npm install -g cnpm --registry=https://registry.npm.taobao.org
[root@localhost ~]# mv elasticsearch-head-master /usr/local/es/head
[root@localhost ~]# cd /usr/local/es/head/
[root@localhost head]# cnpm install
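[root@localhost head]# vim Gruntfile.js 
# add to connect.server.options (around line 93)
hostname: '0.0.0.0',
[root@localhost head]# vim /usr/local/es/config/elasticsearch.yml
# append at the end
http.cors.enabled: true  # enable cross-origin (CORS) requests
http.cors.allow-origin: "*"  # allow requests from any origin
[root@localhost head]# grunt server  # start the head server; this blocks the terminal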

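Then restart the ES cluster.

Open another terminal and check the cluster status.
Note: this may take a while; continue only once the status has turned green.

[root@localhost ~]# curl -XGET http://192.168.10.30:9200/_cluster/health?pretty
{
  "cluster_name" : "my-application",
  "status" : "green",    # cluster status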
  "timed_out" : false,
  "number_of_nodes" : 3,
  "number_of_data_nodes" : 3,
  "active_primary_shards" : 5,
  "active_shards" : 10,
  "relocating_shards" : 2,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 4,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 39,
  "active_shards_percent_as_number" : 100.0
}

Visit http://192.168.10.30:9100 to check that the web UI works.
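If the page loads but cannot connect to the cluster, check that ES is answering cross-origin requests, and consider running the head server in the background instead of blocking a terminal. A sketch with hypothetical helper names and an assumed log path /tmp/head.log; with http.cors.enabled set, a request that carries an Origin header should get an Access-Control-Allow-Origin header back.

```shell
#!/usr/bin/env bash
# Run grunt server from the head directory in a background subshell, surviving terminal close.
# The cd happens inside that subshell, so the caller's working directory is unchanged.
start_head() {
  cd /usr/local/es/head && nohup grunt server > /tmp/head.log 2>&1 &
}

# Send a request with an Origin header and print the CORS response header, if any.
check_cors() {
  curl -s -o /dev/null -D - -H 'Origin: http://192.168.10.30:9100' http://192.168.10.30:9200/ \
    | grep -i '^access-control-allow-origin'
}
```

If check_cors prints nothing, the two http.cors lines are probably missing or the node was not restarted after they were added.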

Common ES commands:

1. List all available _cat endpoints: _cat
[root@localhost ~]# curl http://192.168.10.30:9200/_cat

2. Show verbose output with column headers: ?v
[root@localhost ~]# curl http://192.168.10.30:9200/_cat/master?v

3. List the columns that can be displayed: ?help
[root@localhost ~]# curl http://192.168.10.30:9200/_cat/master?help

4. Choose which columns to output: ?h
[root@localhost ~]# curl http://192.168.10.30:9200/_cat/master?h=ip,node

5. List all indices
[root@localhost ~]# curl http://192.168.10.30:9200/_cat/indices?v

6. Create an index
[root@localhost ~]# curl -XPUT http://192.168.10.30:9200/kgctest1?pretty

7. Close an index