Author: om prakash sharma

Coauthor(s): Parul Mishra

Status: Work In Progress

Conversational agents are now used for many purposes, and deep learning algorithms are what make these chatbots distinctive.
Many companies hope to build bots with deep learning techniques that hold natural conversations indistinguishable from human ones, and many claim to be using NLP and deep learning to make this possible.
In this chat system I use a retrieval-based model: a repository of predefined responses, plus a heuristic that picks an appropriate response based on the input and context.
Such a system does not generate any new text; it only picks an appropriate response from a fixed set of data, as chosen by the algorithm.
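
The retrieval idea above can be sketched in a few lines. This is only an illustration: the word-overlap heuristic, the function names `overlap_score` and `pick_response`, and the candidate sentences are assumptions made for the example, not the scoring method used in this project (which is the neural model described under "How it works").

```python
def tokens(text):
    """Lowercase, split on whitespace, and strip simple punctuation."""
    return {word.strip(".,!?") for word in text.lower().split()} - {""}

def overlap_score(context, candidate):
    """Stand-in heuristic: Jaccard overlap between the two word sets."""
    a, b = tokens(context), tokens(candidate)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

def pick_response(context, candidates, score_fn=overlap_score):
    """Return the candidate with the highest score; no new text is generated."""
    return max(candidates, key=lambda candidate: score_fn(context, candidate))

# Hypothetical fixed repository of responses.
candidates = [
    "Try restarting the network manager.",
    "The forecast shows rain tomorrow.",
    "The stock closed higher today.",
]
best = pick_response("what is the weather forecast for tomorrow", candidates)
```

Swapping `score_fn` for a learned scoring function keeps the selection step unchanged, which is the appeal of the retrieval-based design.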

The system is RETRIEVAL-BASED, but it extends the closed domain to multiple fields: IT help desk, weather forecasting, stock market analysis, and sentiment analysis advice. Multiple adapters can be embedded to extend its features.

How it works
The deep learning model I use for this project is called a Dual Encoder LSTM network.
This type of network is just one of many that could be applied to this problem. I chose the Dual Encoder because it has been shown to perform decently on the Ubuntu Dialog Corpus data set, which I am using.

It works as follows:
Both the context and the response text are split by words, and each word is embedded into a vector.
The word embeddings are initialized with Stanford’s GloVe vectors and are fine-tuned during training.
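
A minimal sketch of the split-and-embed step, using numpy only. The helper names `build_vocab` and `embed`, the 50-dimensional size, and the two sample sentences are assumptions for illustration; the embedding matrix here is random, where the real system would load pretrained GloVe vectors and fine-tune them.

```python
import numpy as np

def build_vocab(texts):
    """Map each distinct whitespace-separated word to an integer id."""
    vocab = {"<unk>": 0}  # id 0 is reserved for unknown words
    for text in texts:
        for word in text.lower().split():
            vocab.setdefault(word, len(vocab))
    return vocab

def embed(text, vocab, embedding_matrix):
    """Return a (num_words, dim) array holding one vector per word."""
    ids = [vocab.get(word, 0) for word in text.lower().split()]
    return embedding_matrix[ids]

rng = np.random.default_rng(0)
context = "my wifi keeps dropping"
response = "try restarting the network manager"
vocab = build_vocab([context, response])

# Random stand-in for GloVe vectors (assumption: 50 dimensions).
embedding_matrix = rng.normal(scale=0.1, size=(len(vocab), 50))

context_vectors = embed(context, vocab, embedding_matrix)
response_vectors = embed(response, vocab, embedding_matrix)
```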

Both the embedded context and response are fed into the same Recurrent Neural Network word-by-word.
The RNN generates a vector representation that, loosely speaking, captures the “meaning” of the context and response.
We can choose how large these vectors should be, but let’s say we pick 256 dimensions.
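
To make the shared-encoder idea concrete, here is a small numpy sketch of an LSTM that reads word vectors one at a time and returns its final hidden state. The class name `TinyLSTMEncoder`, the random weights, and the 50-dimensional inputs are assumptions for illustration; the actual project relies on TensorFlow's LSTM implementation and trained weights.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class TinyLSTMEncoder:
    """Minimal untrained LSTM; the final hidden state is the sequence encoding."""

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        # One weight matrix covering all four gates: input, forget, output, candidate.
        self.W = rng.normal(scale=0.1, size=(input_dim + hidden_dim, 4 * hidden_dim))
        self.b = np.zeros(4 * hidden_dim)
        self.hidden_dim = hidden_dim

    def encode(self, word_vectors):
        hidden = np.zeros(self.hidden_dim)
        cell = np.zeros(self.hidden_dim)
        for x in word_vectors:  # word-by-word, as described above
            gates = np.concatenate([x, hidden]) @ self.W + self.b
            i, f, o, g = np.split(gates, 4)
            cell = sigmoid(f) * cell + sigmoid(i) * np.tanh(g)
            hidden = sigmoid(o) * np.tanh(cell)
        return hidden

rng = np.random.default_rng(1)
encoder = TinyLSTMEncoder(input_dim=50, hidden_dim=256)

# Stand-ins for an embedded context (4 words) and response (5 words).
context_words = rng.normal(size=(4, 50))
response_words = rng.normal(size=(5, 50))

# The same encoder (same weights) is used for both inputs.
context_vec = encoder.encode(context_words)
response_vec = encoder.encode(response_words)
```

Both outputs have 256 dimensions regardless of how many words went in, which is what lets the context and response be compared directly.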

Call the context vector c and the response vector r. I multiply c with a matrix M to “predict” a response r’.
If c is a 256-dimensional vector, then M is a 256×256 matrix, and the result is another 256-dimensional vector, which we can interpret as a generated response. The matrix M is learned during training.

I measure the similarity of the predicted response r’ and the actual response r by taking the dot product of these two vectors.
A large dot product means the vectors are similar and that the response should receive a high score.
I then apply a sigmoid function to convert that score into a probability.
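
The scoring steps above can be sketched with numpy as follows. The function name `match_probability` and the random values for c, r, and M are assumptions for illustration; in the trained model, c and r come from the encoder and M is learned.

```python
import numpy as np

def match_probability(c, r, M):
    """Score a (context, response) pair as sigmoid((c M) . r)."""
    r_pred = c @ M            # predicted response r'
    score = r_pred @ r        # dot product of r' with the actual response r
    return 1.0 / (1.0 + np.exp(-score))  # sigmoid turns the score into a probability

rng = np.random.default_rng(0)
dim = 256
c = rng.normal(scale=0.1, size=dim)          # stand-in context encoding
r = rng.normal(scale=0.1, size=dim)          # stand-in response encoding
M = rng.normal(scale=0.1, size=(dim, dim))   # stand-in for the learned matrix

probability = match_probability(c, r, M)
```

At retrieval time, the same function would be applied to every candidate response for a given context, and the candidate with the highest probability would be returned.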

Tools: numpy, pandas, TF Learn, and TensorFlow. Data set: Ubuntu Dialog Corpus.
