
Rasa NLU

What is Rasa?

The Rasa Stack is a set of open source machine learning tools for developers to create contextual AI assistants and chatbots:
Core = a chatbot framework with machine learning-based dialogue management
NLU = a library for natural language understanding with intent classification and entity extraction
NLU and Core are independent. You can use NLU without Core, and vice versa. We recommend using both.

[Image: https://raw.githubusercontent.com/wangyizhen/article_images/master/1548487939788.png]

NLU understands the user’s message based on your previous training data:

Intent classification: Interpreting meaning based on predefined intents (Example: Please send the confirmation to amy@example.com is a provide_email intent with 93% confidence)
Entity extraction: Recognizing structured data (Example: amy@example.com is an email)
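As a rough illustration of these two outputs, the result of parsing the example message can be pictured as a dictionary holding one intent and a list of entities. The field names below are modeled on the JSON returned by Rasa NLU releases of this period, but they are an assumption here, and `summarize` is a hypothetical helper written only for this sketch.

```python
# Illustrative shape of an NLU result for the example message.
# Field names are assumed for illustration; they are not taken from the article.
text = "Please send the confirmation to amy@example.com"
value = "amy@example.com"
start = text.index(value)  # character offset where the entity begins

result = {
    "text": text,
    "intent": {"name": "provide_email", "confidence": 0.93},
    "entities": [
        {"entity": "email", "value": value, "start": start, "end": start + len(value)},
    ],
}


def summarize(parsed):
    """Return (intent name, confidence, {entity type: value}) from a parse result."""
    intent = parsed["intent"]
    entities = {e["entity"]: e["value"] for e in parsed["entities"]}
    return intent["name"], intent["confidence"], entities


print(summarize(result))
```

Keeping character offsets next to each entity value lets downstream code point back into the original message text.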

Core decides what happens next in the conversation. Its machine learning-based dialogue management predicts the next best action based on the input from NLU, the conversation history, and your training data. (Example: Core has a confidence of 87% that ask_primary_change is the next best action to confirm with the user if they want to change their primary contact information.)

Rasa training process

Prepare your NLU Training Data
Define your Machine Learning Model
Train your Machine Learning NLU model
Use your model

There are two ways to use your model: directly from Python, or by starting an HTTP server.
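For the HTTP route, a client only has to POST the message text to the running server. The sketch below builds such a request with the standard library without sending it. The address `localhost:5000`, the `/parse` path, and the `q` field are assumptions based on the Rasa NLU server of this period and may differ in your version; `build_parse_request` is a hypothetical helper.

```python
import json
import urllib.request

# Assumed address of a locally started NLU server; adjust to your setup.
SERVER_URL = "http://localhost:5000/parse"


def build_parse_request(text, url=SERVER_URL):
    """Build (but do not send) a POST request asking the server to parse `text`."""
    payload = json.dumps({"q": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_parse_request("Please send the confirmation to amy@example.com")
print(request.get_method(), request.full_url)

# To send it against a running server:
#   with urllib.request.urlopen(request) as response:
#       parsed = json.loads(response.read().decode("utf-8"))
```

Building the request separately from sending it makes the payload easy to inspect before a server is available.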

pipeline

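A pipeline defines the order in which data flows between components. Components depend on one another, and if any component's dependency is not satisfied, the pipeline fails. (At startup, Rasa NLU checks whether every component's dependencies are satisfied; if they are not, it stops and prints a message describing the problem.) A pipeline has the following characteristics: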

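The order of components is critical. For example, an NER component needs the tokenization result from an earlier component, so a tokenizer must appear before it in the pipeline.
Components are interchangeable. Several components may provide the same output, such as tokenization; for Chinese, you could choose the Tsinghua tokenizer or the Peking University tokenizer.
Some components are mutually exclusive. Tokenizers are one example: the tokenization result cannot be provided by two components at once, or the output becomes inconsistent.
Some components can be used together. For example, rule-based and embedding-based feature extractors can run in the same pipeline.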
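The ordering and exclusivity rules above can be sketched as a small check over `provides` and `requires` lists. This is a minimal illustration, not Rasa NLU's own validation code: `validate_pipeline` is a hypothetical function, and the three component entries reuse names and values from the component table later in this article.

```python
# Each entry is (name, provides, requires), using values from the component table.
SPACY_PIPELINE = [
    ("nlp_spacy", ["spacy_doc", "spacy_nlp"], []),
    ("tokenizer_spacy", ["tokens"], ["spacy_doc"]),
    ("ner_crf", ["entities"], ["tokens"]),
]


def validate_pipeline(pipeline):
    """Return a list of problems; an empty list means every dependency is met."""
    problems = []
    available = set()  # everything provided by components seen so far
    tokenizers = []
    for name, provides, requires in pipeline:
        for needed in requires:
            if needed not in available:
                problems.append(
                    f"{name} requires '{needed}', which no earlier component provides"
                )
        if "tokens" in provides:
            tokenizers.append(name)
        available.update(provides)
    if len(tokenizers) > 1:
        problems.append("more than one tokenizer: " + ", ".join(tokenizers))
    return problems


print(validate_pipeline(SPACY_PIPELINE))
print(validate_pipeline(list(reversed(SPACY_PIPELINE))))
```

Walking the list once and tracking what has been provided so far is enough to catch both a missing dependency and a dependency that appears too late in the order.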

An NLU application usually involves two tasks: named entity recognition and intent recognition. To accomplish them, a typical Rasa NLU pipeline follows this pattern:
[Image: https://raw.githubusercontent.com/wangyizhen/article_images/master/1550722559704.png]

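Initialization components: load model files and provide framework support for later components, for example by initializing spaCy and MITIE.
Tokenization components: split the text into a sequence of tokens, which is the basic input for later, higher-level NLP tasks.
Feature extraction: extract text features from the token sequence, usually with word embeddings. Several feature extractors can be used together, and regex-based feature extraction may be combined with them.
NER components: recognize named entities in the text using the features provided by earlier components.
Intent classification: classify the text into an intent according to its meaning; also called intent recognition.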

components

Component Lifecycle

[Image: https://raw.githubusercontent.com/wangyizhen/article_images/master/1550805027539.png]

Component Structure
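- property
  - name = ""  # the component's name
  - provides = []  # what the current component computes for later components
  - requires = []  # what the current component needs earlier components to provide
  - defaults = {}  # the component's default parameters; can be overridden by parameters in the pipeline
- __init__
- create
  Initializes the component before training.
- train
  Trains the component. If the component needs no training, this can be left unimplemented.
- persist
  Saves the component's model to local disk for later use. It need not be implemented if there is nothing to save.
- load
  Loads what was saved to local disk. It need not be implemented if nothing was saved.
- process
  Runs the component: reads the data it needs from the message, computes its result, and writes the result back to the message.
- required_packages
  Specifies which Python packages must be installed to use this component.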
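To make the structure concrete, here is a standalone sketch of a component that follows the outline above. It deliberately does not import Rasa NLU: `Message` is a minimal stand-in for the message object, `KeywordIntentClassifier` is a hypothetical component, and the method signatures are simplified assumptions rather than the library's actual ones.

```python
import json
import os
import tempfile


class Message:
    """Minimal stand-in for the message object passed between components."""

    def __init__(self, text):
        self.text = text
        self.data = {}

    def get(self, key, default=None):
        return self.data.get(key, default)

    def set(self, key, value):
        self.data[key] = value


class KeywordIntentClassifier:
    """Hypothetical component that labels a message by keyword lookup."""

    name = "intent_classifier_keyword_demo"
    provides = ["intent"]
    requires = ["tokens"]
    defaults = {"fallback_intent": "unknown"}

    def __init__(self, component_config=None):
        # Values from the pipeline configuration override the defaults.
        self.config = {**self.defaults, **(component_config or {})}
        self.keywords = {}

    @classmethod
    def required_packages(cls):
        return []  # this sketch needs only the standard library

    @classmethod
    def create(cls, component_config=None):
        return cls(component_config)

    def train(self, examples):
        # examples: iterable of (tokens, intent) pairs; first intent seen per token wins
        for tokens, intent in examples:
            for token in tokens:
                self.keywords.setdefault(token.lower(), intent)

    def process(self, message):
        intent = self.config["fallback_intent"]
        for token in message.get("tokens", []):
            if token.lower() in self.keywords:
                intent = self.keywords[token.lower()]
                break
        message.set("intent", intent)

    def persist(self, model_dir):
        path = os.path.join(model_dir, self.name + ".json")
        with open(path, "w", encoding="utf-8") as f:
            json.dump(self.keywords, f)

    @classmethod
    def load(cls, model_dir, component_config=None):
        component = cls(component_config)
        path = os.path.join(model_dir, cls.name + ".json")
        with open(path, encoding="utf-8") as f:
            component.keywords = json.load(f)
        return component


config = {"fallback_intent": "out_of_scope"}
classifier = KeywordIntentClassifier.create(config)
classifier.train([(["hello", "there"], "greet"), (["bye"], "goodbye")])

with tempfile.TemporaryDirectory() as model_dir:
    classifier.persist(model_dir)
    restored = KeywordIntentClassifier.load(model_dir, config)

message = Message("Hello Rasa")
message.set("tokens", message.text.split())  # stand-in for a tokenizer component
restored.process(message)
print(message.get("intent"))
```

The usage at the end walks through the lifecycle in order: create, train, persist, load, then process on a message whose tokens were supplied by an earlier step.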

Entity Extraction

Component	Provides	Requires
nlp_spacy	["spacy_doc", "spacy_nlp"]
nlp_mitie	["mitie_feature_extractor", "mitie_file"]
tokenizer_jieba	tokens
tokenizer_mitie	tokens
tokenizer_spacy	tokens	spacy_doc
tokenizer_whitespace	tokens
ner_crf	entities	tokens
ner_synonyms	entities
intent_featurizer_spacy	text_features	spacy_doc
intent_featurizer_mitie	text_features	["tokens", "mitie_feature_extractor"]
intent_entity_featurizer_regex	text_features	spacy_doc

Intent Classification
Custom Component

Reference
