Chapter 1: Getting Started
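Elasticsearch is a real-time, distributed search and analytics engine, often described as a combination of three capabilities: full-text search, structured search, and analytics.
Elasticsearch is built on top of Apache Lucene, a full-text search engine library.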
1. Installing and Running Elasticsearch
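1.1 Installing Elasticsearch
1.1.1 Downloading Elasticsearch
- Get the latest release from the official Elastic site: https://www.elastic.co/cn/downloads/elasticsearch
- Download from the Huawei Cloud mirror in China: https://repo.huaweicloud.com/elasticsearch/
- Install with Docker: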
docker pull docker.elastic.co/elasticsearch/elasticsearch:7.8.0
1.1.2 Starting Elasticsearch
On Linux:
./bin/elasticsearch
On Windows:
.\bin\elasticsearch.bat
Docker:
docker run -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:7.8.0
Docker Compose:
Create a docker-compose.yml file:
version: '2.2'
services:
  es01:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.8.0
    container_name: es01
    environment:
      - node.name=es01
      - cluster.name=es-docker-cluster
      - discovery.seed_hosts=es02,es03
      - cluster.initial_master_nodes=es01,es02,es03
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - data01:/usr/share/elasticsearch/data
    ports:
      - 9200:9200
    networks:
      - elastic
  es02:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.8.0
    container_name: es02
    environment:
      - node.name=es02
      - cluster.name=es-docker-cluster
      - discovery.seed_hosts=es01,es03
      - cluster.initial_master_nodes=es01,es02,es03
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - data02:/usr/share/elasticsearch/data
    networks:
      - elastic
  es03:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.8.0
    container_name: es03
    environment:
      - node.name=es03
      - cluster.name=es-docker-cluster
      - discovery.seed_hosts=es01,es02
      - cluster.initial_master_nodes=es01,es02,es03
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - data03:/usr/share/elasticsearch/data
    networks:
      - elastic
volumes:
  data01:
    driver: local
  data02:
    driver: local
  data03:
    driver: local
networks:
  elastic:
    driver: bridge
Start the cluster:
docker-compose up
1.1.3 Verifying the Installation
Run curl http://localhost:9200/?pretty; you should get a response similar to the following:
{
"name" : "DESKTOP-6N3N71M",
"cluster_name" : "elasticsearch",
"cluster_uuid" : "-oFT3sGkRAK9RqgTFMYvSQ",
"version" : {
"number" : "7.8.0",
"build_flavor" : "default",
"build_type" : "zip",
"build_hash" : "757314695644ea9a1dc2fecd26d1a43856725e65",
"build_date" : "2020-06-14T19:35:50.234439Z",
"build_snapshot" : false,
"lucene_version" : "8.5.1",
"minimum_wire_compatibility_version" : "6.8.0",
"minimum_index_compatibility_version" : "6.0.0-beta1"
},
"tagline" : "You Know, for Search"
}
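The same check can be scripted. A minimal Python sketch, assuming a local node on http://localhost:9200 with security disabled (as in the setup above); `fetch_root_info` and `summarize` are hypothetical helper names used only for illustration. The demo runs offline on an excerpt of the sample response shown above.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local node with security disabled


def fetch_root_info(base_url: str = ES_URL) -> dict:
    """GET the root endpoint and decode the JSON body (needs a running node)."""
    with urllib.request.urlopen(base_url + "/") as resp:
        return json.load(resp)


def summarize(info: dict) -> str:
    """Extract the fields most useful for a quick sanity check."""
    version = info["version"]
    return (f'{info["cluster_name"]}: Elasticsearch {version["number"]}, '
            f'Lucene {version["lucene_version"]}')


# Offline demo on an excerpt of the sample response above; with a node running,
# use summarize(fetch_root_info()) instead.
sample = json.loads(
    '{"cluster_name": "elasticsearch",'
    ' "version": {"number": "7.8.0", "lucene_version": "8.5.1"}}'
)
print(summarize(sample))  # elasticsearch: Elasticsearch 7.8.0, Lucene 8.5.1
```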
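1.2 Installing Kibana
Kibana is an open-source analytics and visualization platform designed to work with Elasticsearch. You use it to interact with the data stored in Elasticsearch indices, run advanced data analysis with little effort, and visualize the data in a variety of charts, tables, and maps.
1.2.1 Downloading Kibana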
- Official site: https://www.elastic.co/cn/downloads/kibana
- Huawei Cloud mirror: https://repo.huaweicloud.com/kibana/
- On Debian/Ubuntu with apt-get:
  - Import the GPG key:
    wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
  - Add the 7.x repository:
    echo "deb https://artifacts.elastic.co/packages/7.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elastic-7.x.list
  - Install Kibana (its version should match Elasticsearch, so pin it, e.g. kibana=7.8.0):
    sudo apt-get update && sudo apt-get install kibana=7.8.0
  - Start Kibana at boot (SysV init):
    sudo update-rc.d kibana defaults 95 10
  - Or register it as a systemd service (optional):
    sudo /bin/systemctl daemon-reload
    sudo /bin/systemctl enable kibana.service
1.2.2 Starting Kibana
Run bin/kibana from the installation directory, then open http://localhost:5601 in a browser to reach the Kibana home page.
1.3 Installing Sense
Recent versions no longer require installing Sense; use Dev Tools instead: http://localhost:5601/app/kibana#/dev_tools/console.
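2. Interacting with Elasticsearch
2.1 Java API
Elasticsearch ships with two built-in Java clients:
- Node client:
  - Joins the local cluster as a non-data node. It stores no data itself, but it knows which node in the cluster holds which data and can forward each request to the correct node.
- Transport client:
  - A lightweight client that forwards requests to a remote cluster. It does not join the cluster itself; it forwards requests to nodes in the cluster.
Both clients come from older releases: the node client is no longer available in 7.x, and the transport client is deprecated in 7.x in favor of the Java High Level REST Client.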
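2.2 RESTful API with JSON over HTTP
Any client can communicate with Elasticsearch through the RESTful API on port 9200.
The request format is: curl -X<VERB> '<PROTOCOL>://<HOST>:<PORT>/<PATH>?<QUERY_STRING>' -d '<BODY>'
Parameters:
Variable | Description |
---|---|
VERB | HTTP method, such as GET, POST, PUT, HEAD, or DELETE |
PROTOCOL | Protocol, such as http or https |
HOST | Host name of any node in the ES cluster |
PORT | Port of the ES HTTP service, 9200 by default |
PATH | API endpoint path (for example, _count returns the number of documents in the cluster) |
QUERY_STRING | Optional query-string parameters (for example, ?pretty pretty-prints the JSON response) |
BODY | Request body in JSON format |
You can test these requests with the Dev Tools console in Kibana.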
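To make the template concrete, here is a small Python sketch that assembles the curl command from the variables in the table. The function name `build_curl` is hypothetical; it adds a `Content-Type: application/json` header whenever a body is present, because Elasticsearch 6.0 and later reject request bodies without an explicit content type. It only builds the command string and does not run it.

```python
import json
import shlex
from typing import Optional


def build_curl(verb: str, path: str, query_string: str = "", body: Optional[dict] = None,
               protocol: str = "http", host: str = "localhost", port: int = 9200) -> str:
    """Assemble: curl -X<VERB> '<PROTOCOL>://<HOST>:<PORT>/<PATH>?<QUERY_STRING>' -d '<BODY>'."""
    url = f"{protocol}://{host}:{port}/{path.lstrip('/')}"
    if query_string:
        url += "?" + query_string
    parts = ["curl", "-X" + verb.upper(), shlex.quote(url)]
    if body is not None:
        # ES 6.0+ requires an explicit content type for request bodies
        parts += ["-H", shlex.quote("Content-Type: application/json"),
                  "-d", shlex.quote(json.dumps(body))]
    return " ".join(parts)


print(build_curl("GET", "_count", "pretty"))
print(build_curl("GET", "megacorp/_search", body={"query": {"match_all": {}}}))
```

`shlex.quote` only adds quotes where the shell needs them, so the printed commands can be pasted into a terminal as-is.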
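3. Document Oriented
Elasticsearch is document oriented: it stores entire objects, or documents, and indexes the contents of each document so that they can be searched.
4. Usage
Note: mapping types (the _type field) have been phased out starting with ES 6.0. The examples below still name the type employee; the usage is the same either way, and in 7.x the typeless form uses _doc in place of the type (for example, PUT /megacorp/_doc/1).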
4.1 Creating Documents
Insert a few documents into ES:
PUT /megacorp/employee/1
{
"first_name": "John",
"last_name": "Smith",
"age": 25,
"about": "I love to go rock climbing",
"interests": ["sports","music"]
}
PUT /megacorp/employee/2
{
"first_name" : "Jane",
"last_name" : "Smith",
"age" : 32,
"about" : "I like to collect rock albums",
"interests": [ "music" ]
}
PUT /megacorp/employee/3
{
"first_name" : "Douglas",
"last_name" : "Fir",
"age" : 35,
"about": "I like to build cabinets",
"interests": [ "forestry" ]
}
- megacorp: the index name, stored in the _index field
- employee: the type name, stored in the _type field
- 1: the document ID, stored in the _id field
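The three PUT requests above all follow the pattern PUT /<_index>/<_type>/<_id>. A minimal Python sketch that generates them from plain data; `index_requests` is a hypothetical helper that only builds (method, path, body) triples without sending anything, and its `doc_type` parameter defaults to `employee` to mirror this chapter (pass `_doc` for the typeless 7.x form).

```python
import json

employees = {
    1: {"first_name": "John", "last_name": "Smith", "age": 25,
        "about": "I love to go rock climbing", "interests": ["sports", "music"]},
    2: {"first_name": "Jane", "last_name": "Smith", "age": 32,
        "about": "I like to collect rock albums", "interests": ["music"]},
    3: {"first_name": "Douglas", "last_name": "Fir", "age": 35,
        "about": "I like to build cabinets", "interests": ["forestry"]},
}


def index_requests(index: str, docs: dict, doc_type: str = "employee"):
    """Yield one (method, path, JSON body) triple per document ID."""
    for doc_id, source in docs.items():
        yield "PUT", f"/{index}/{doc_type}/{doc_id}", json.dumps(source)


for method, path, body in index_requests("megacorp", employees):
    print(method, path)
    print(body)
```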
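4.2 Retrieving Documents
Retrieve a single document by ID; the original JSON document is returned:
GET /megacorp/employee/1
Search all employees; the matching documents are returned in the hits array:
GET /megacorp/employee/_search
Lightweight (query-string) search: the condition follows ?q=:
GET /megacorp/employee/_search?q=last_name:Smith
Query DSL search: the same condition expressed as a JSON request body:
GET /megacorp/employee/_search
{
  "query": {
    "match": {
      "last_name": "Smith"
    }
  }
}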
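The two searches above express the same condition in different ways. A small Python sketch that builds both forms without sending anything; it assumes the `q` value is URL-encoded when generated programmatically, which is why `urlencode` turns the `:` into `%3A`.

```python
import json
from urllib.parse import urlencode

# Lightweight search: the condition travels in the URL
query_string_url = "/megacorp/employee/_search?" + urlencode({"q": "last_name:Smith"})

# Query DSL search: the same condition as a JSON request body
dsl_body = {"query": {"match": {"last_name": "Smith"}}}

print(query_string_url)      # /megacorp/employee/_search?q=last_name%3ASmith
print(json.dumps(dsl_body))
```

The query-string form is handy for quick checks from a terminal; the DSL form scales to the compound queries that follow.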
Compound query: find employees whose last name is Smith and who are older than 30:
GET /megacorp/employee/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "last_name": "smith" } }
      ],
      "filter": [
        { "range": { "age": { "gt": 30 } } }
      ]
    }
  }
}
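In a `bool` query, `must` clauses contribute to the relevance score while `filter` clauses only include or exclude documents. A hedged Python sketch of a hypothetical `bool_query` helper that reproduces the request body above; the clause names are Elasticsearch's own, the helper is only illustrative.

```python
import json


def bool_query(must=None, filter_=None) -> dict:
    """Build a bool query: `must` clauses affect _score, `filter` clauses only filter."""
    clauses = {}
    if must:
        clauses["must"] = must
    if filter_:
        clauses["filter"] = filter_
    return {"query": {"bool": clauses}}


body = bool_query(
    must=[{"match": {"last_name": "smith"}}],
    filter_=[{"range": {"age": {"gt": 30}}}],  # strictly older than 30
)
print(json.dumps(body, indent=2))
```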
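Full-text search: use a match query to search the about field for "rock climbing":
GET /megacorp/employee/_search
{
  "query": {
    "match": {
      "about": "rock climbing"
    }
  }
}
Each hit includes a _score, its relevance score: the higher the value, the better the match. This is fundamentally different from a traditional relational database, where a record either matches or does not.
Phrase search: use match_phrase to match an exact sequence of words (a phrase) rather than the individual words:
GET /megacorp/employee/_search
{
  "query": {
    "match_phrase": {
      "about": "rock climbing"
    }
  }
}
Highlighted search: use the highlight parameter to specify the fields to highlight. Each hit in the response gains a highlight section containing the matching fragments of about, with the matched words wrapped in <em></em>:
GET /megacorp/employee/_search
{
  "query": {
    "match_phrase": {
      "about": "rock climbing"
    }
  },
  "highlight": {
    "fields": {
      "about": {}
    }
  }
}
Aggregation: summarize the employees' interests:
GET /megacorp/employee/_search
{
  "aggs": {
    "all_interests": {
      "terms": {
        "field": "interests",
        "size": 10
      }
    }
  }
}
This request failed during testing; see "Aggregation query error" under Known Issues below.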
Aggregation with a query: summarize the interests of employees whose last name is Smith:
GET /megacorp/_search
{
  "query": {
    "match": {
      "last_name": "smith"
    }
  },
  "aggs": {
    "all_interests": {
      "terms": {
        "field": "interests",
        "size": 10
      }
    }
  }
}
Hierarchical (nested) aggregation: compute the average age of the employees who share each interest:
GET /megacorp/_search
{
  "aggs": {
    "all_interests": {
      "terms": {
        "field": "interests",
        "size": 10
      },
      "aggs": {
        "avg_age": {
          "avg": { "field": "age" }
        }
      }
    }
  }
}
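Sub-aggregations are nested under an `aggs` key inside the bucket aggregation they refine. A Python sketch of a hypothetical helper `interest_age_aggs` that produces the hierarchical body above and can optionally restrict it with a query, combining the two aggregation patterns. Like the plain aggregation, these requests assume `interests` can be aggregated (see the fix under Known Issues).

```python
import json


def interest_age_aggs(size: int = 10, query=None) -> dict:
    """Terms buckets on `interests`, each with a nested avg of `age`.

    The aggregation names all_interests / avg_age mirror the examples above.
    """
    body = {
        "aggs": {
            "all_interests": {
                "terms": {"field": "interests", "size": size},
                "aggs": {"avg_age": {"avg": {"field": "age"}}},
            }
        }
    }
    if query is not None:
        body["query"] = query  # only documents matching the query are aggregated
    return body


print(json.dumps(interest_age_aggs(), indent=2))
print(json.dumps(interest_age_aggs(query={"match": {"last_name": "smith"}})))
```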
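4.3 Deleting Documents
Use the same request as in 4.2 but replace GET with DELETE to delete the document with the given ID, e.g. DELETE /megacorp/employee/1.
4.4 Updating Documents
Replace GET with PUT and send the complete new document as the body, as in 4.1. Elasticsearch replaces the stored document in its entirety and increments its _version; fields left out of the body are not kept.
4.5 Checking Whether a Document Exists
Replace GET with HEAD to check whether the document with the given ID exists. Elasticsearch returns no body, only the status code: 200 if the document exists, 404 if it does not.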
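Sections 4.3 to 4.5 differ from 4.2 only in the HTTP method (plus a body for PUT). A Python sketch using the standard library's `urllib.request.Request`, assuming a local node on port 9200 without authentication; building the request objects does not contact the server, so only `send` (a hypothetical helper) needs a running node. The age change in the update body is made up for illustration.

```python
import json
import urllib.error
import urllib.request

BASE = "http://localhost:9200"  # assumption: local node without authentication
DOC_PATH = "/megacorp/employee/1"


def make_request(method, path, body=None):
    """Build (without sending) a request; only the HTTP method and optional body differ."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(BASE + path, data=data, headers=headers, method=method)


def send(req):
    """Send to a running node and return the HTTP status code (hypothetical helper)."""
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status
    except urllib.error.HTTPError as err:  # e.g. 404 when HEAD finds no document
        return err.code


# PUT replaces the whole document, so send every field (age 25 -> 26 is a made-up change)
update = make_request("PUT", DOC_PATH, {
    "first_name": "John", "last_name": "Smith", "age": 26,
    "about": "I love to go rock climbing", "interests": ["sports", "music"],
})
exists = make_request("HEAD", DOC_PATH)
delete = make_request("DELETE", DOC_PATH)

for req in (update, exists, delete):
    print(req.get_method(), req.full_url)
```

With a node running, `send(exists)` should return 200 before `send(delete)` and 404 after it.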
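5. Distributed Features
Elasticsearch can scale out to hundreds (or even thousands) of server nodes and handle petabytes of data, and it is designed to hide the complexity of distribution.
Here are some of the operations it performs automatically in the background:
- Partitioning documents into different containers, or shards, which can be stored on a single node or spread across multiple nodes
- Balancing these shards across the nodes in the cluster to spread the indexing and search load
- Replicating each shard to provide data redundancy, preventing data loss caused by hardware failure
- Routing a request from any node in the cluster to the nodes that hold the relevant data
- Seamlessly integrating new nodes as the cluster grows, and redistributing shards to recover from node loss
The book includes supplementary chapters on these distributed features, such as cluster scaling and failover (life inside a cluster), document storage (distributed document store), distributed search execution, and partition and shard internals. These chapters are optional: you can use Elasticsearch without reading them, but they will give you a more complete understanding of ES.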
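The first background operation listed above, deciding which shard a document lands on, follows a rule that the Elasticsearch documentation states as `shard_num = hash(_routing) % num_primary_shards`, where `_routing` defaults to the document's `_id`. The Python sketch below illustrates that rule only: it uses `zlib.crc32` as a stand-in for the Murmur3 hash Elasticsearch actually uses, so its shard numbers will not match a real cluster.

```python
import zlib


def route(doc_id: str, num_primary_shards: int) -> int:
    """shard_num = hash(_routing) % num_primary_shards.

    zlib.crc32 is only a stand-in: real Elasticsearch hashes _routing with Murmur3,
    so these shard numbers will not match an actual cluster.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


# With default routing, _routing is the document _id
for doc_id in ("1", "2", "3"):
    print(f"doc {doc_id}: shard {route(doc_id, 5)} of 5, shard {route(doc_id, 3)} of 3")
```

Changing the divisor sends the same ID to a different shard, which is why the number of primary shards is fixed when an index is created.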
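Known Issues
ES startup failure 1
- Error message:
  Error occurred during initialization of VM
  Could not reserve enough space for 2097152KB object heap
- Cause: the JVM could not reserve 2 GB for the object heap because not enough memory is available.
- Solution 1: edit the JVM configuration in config/jvm.options and change the heap settings to -Xms512m -Xmx512m.
- Solution 2: pass the heap size as a startup parameter:
  ES_JAVA_OPTS="-Xms512m -Xmx512m" ./bin/elasticsearch
- Solution 3: if neither works, check the Windows environment variables. If ES was installed before and set up as a service, remove the old configuration.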
ES startup warning
- Warning message:
  future versions of Elasticsearch will require Java 11;
- Solution: the warning only recommends Java 11; the current Java version is still supported, so no action is needed.
ES startup failure 2
- Error message:
  org.elasticsearch.bootstrap.StartupException: ElasticsearchException[X-Pack is not supported and Machine Learning is not available for [windows-x86]; you can use the other X-Pack features (unsupported) by setting xpack.ml.enabled: false in elasticsearch.yml]
- Cause: X-Pack Machine Learning is not supported on 32-bit Windows (windows-x86).
- Solution: edit config/elasticsearch.yml and add the line xpack.ml.enabled: false
Aggregation query error
Request:
curl -X GET 'localhost:9200/megacorp/employee/_search' -H "Content-Type: application/json" -d '{"aggs": {"all_interests": {"terms": {"field": "interests","size": 10}}}}'
Error message:
{ "error" : { "root_cause" : [ { "type" : "illegal_argument_exception", "reason" : "Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [interests] in order to load field data by uninverting the inverted index. Note that this can use significant memory." } ], "type" : "search_phase_execution_exception", "reason" : "all shards failed", "phase" : "query", "grouped" : true, "failed_shards" : [ { "shard" : 0, "index" : "megacorp", "node" : "BbhnQSaTTHyA8KDYnk2IJg", "reason" : { "type" : "illegal_argument_exception", "reason" : "Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [interests] in order to load field data by uninverting the inverted index. Note that this can use significant memory." } } ], "caused_by" : { "type" : "illegal_argument_exception", "reason" : "Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [interests] in order to load field data by uninverting the inverted index. Note that this can use significant memory.", "caused_by" : { "type" : "illegal_argument_exception", "reason" : "Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [interests] in order to load field data by uninverting the inverted index. Note that this can use significant memory." } } }, "status" : 400 }
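Cause: interests is a text field. Text fields are not optimized for operations that need per-document field data, such as aggregations and sorting, so ES disables these operations on them by default.
Solution: enable fielddata on the field to be aggregated (as the error message warns, this can use significant memory):
curl -X PUT "localhost:9200/megacorp/_mapping?pretty" -H "Content-Type: application/json" -d '{"properties": {"interests": {"type": "text","fielddata": true}}}'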
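The error message also suggests a second fix: aggregate on a keyword field instead of enabling fielddata. With default dynamic mapping in Elasticsearch 7.x, a string field such as `interests` is typically mapped as `text` plus a `keyword` sub-field named `interests.keyword`, so the original aggregation can simply point at that sub-field. This assumes the default dynamic mapping; check GET /megacorp/_mapping first. A Python sketch of both request bodies:

```python
import json

# Option 1 (used above): enable fielddata on the text field; can use significant heap
fielddata_mapping = {"properties": {"interests": {"type": "text", "fielddata": True}}}

# Option 2: aggregate on the keyword sub-field created by default dynamic mapping
keyword_aggs = {"aggs": {"all_interests": {"terms": {"field": "interests.keyword", "size": 10}}}}

print(json.dumps(fielddata_mapping))
print(json.dumps(keyword_aggs))
```

The two options can bucket differently: fielddata on a text field aggregates the analyzed (for example, lowercased) tokens, while the keyword sub-field aggregates the exact original values.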