After our last meeting (held every Thursday), I went through several approaches to template expansion (paraphrase generation).
We should differentiate between Template Expansion (what we're focusing on) and Template Generation, i.e. creating completely new questions for which we would also need to build the corresponding SPARQL query.
The expected output would be a paraphrase of the original template.
Paraphrases are sequences that convey the same meaning using different wording. Paraphrasing exists at different granularity levels, such as the lexical, phrasal, and sentential levels.
Although most of the papers don't publish their source code, I've made a list of those whose implementation is open-sourced:
| Title | Paper | Code source and implementation |
|---|---|---|
| Neural Syntactic Preordering for Controlled Paraphrase Generation | https://arxiv.org/pdf/2005.02013v1.pdf | https://github.com/tagoyal/sow-reap-paraphrasing |
| Decomposable Neural Paraphrase Generation | https://www.aclweb.org/anthology/P19-1332.pdf | —— |
| Syntax-guided Controlled Generation of Paraphrases | https://arxiv.org/pdf/2005.08417v1.pdf | https://github.com/malllabiisc/SGCP |
| Paraphrase Generation with Latent Bag of Words | https://arxiv.org/pdf/2001.01941v1.pdf | https://github.com/FranxYao/dgm_latent_bow |
Among all of these, the T5 model (proposed by my mentor Tommaso) has the clearest implementation instructions and seems to match our needs, so we will start our Template Expansion task with this model.
Figure 1: A new question template will be generated by our paraphrase model and, hopefully, matched with the same query template.
We should first of all ensure that the paraphrase of a question template has the same meaning as the original one. This can be done with the Universal Sentence Encoder, which lets us compute sentence-level semantic similarity scores.
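The semantic check boils down to a cosine similarity over the two sentence embeddings. Here is a minimal pure-Python sketch of that score; the embedding step itself (via `tensorflow_hub`) is only indicated in comments, since it requires downloading the USE model:

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# In practice, u and v would be 512-dimensional sentence embeddings from the
# Universal Sentence Encoder, e.g. (assuming tensorflow_hub is installed):
#   embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")
#   u, v = embed([original_question, paraphrase]).numpy()
```

A score close to 1 means the paraphrase is semantically close to the original template.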
Here are some results of the preliminary experiments performed with the pre-trained T5 model, scored with cosine similarity:
When is the birth date of XYZ ?
0: When did the birth date of XYZ begin?
1: Is XYZ born on June 8th?
2: What is the year XYZ?
3: What is the date of birth of XYZ?
4: When is XYZ birth date?
5: When did you date your XYZ birth?
6: What was XYZ & when was his birthday?
7: When was XYZ born?
8: What is birth date of xyz?
We can say that paraphrase #7 is the best one because it is closest to people's everyday language. Additionally, it contains the conversion from a nominal predicate (birth date) to a verbal predicate (was born); interestingly, though, its cosine similarity is relatively low among all candidates (0.7896207).
Consequently, I think we should have a second criterion to evaluate the quality of paraphrasing. With this second metric, we should be able to ensure that the syntactic similarity is low, as we expect very different question structures. Note that we are not looking for synonyms, as those will be handled by replacing internal with global Word Embeddings.
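As a sketch, the two criteria could be combined into a simple acceptance filter: keep a paraphrase only if it is semantically close to the template but syntactically different from it. The threshold values below are placeholder assumptions, not tuned numbers:

```python
def keep_paraphrase(semantic_sim, edit_distance, max_len,
                    min_semantic=0.75, min_syntactic=0.3):
    """Accept a paraphrase if its embedding similarity to the template is
    high while its (length-normalised) edit distance is also high.
    semantic_sim: cosine similarity of the two sentence embeddings;
    edit_distance: token-level edit distance to the template;
    max_len: token count of the longer of the two questions.
    The 0.75 / 0.3 defaults are hypothetical and would need tuning."""
    syntactic_diff = edit_distance / max(max_len, 1)
    return semantic_sim >= min_semantic and syntactic_diff >= min_syntactic
```

For example, a candidate with similarity 0.9 that differs in half of its tokens would pass, while a near-verbatim copy of the template would be rejected despite its high similarity.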
This difference in syntax could be evaluated with the Levenshtein distance, or other possible metrics. My mentor Tommaso proposed a part-of-speech (POS) tagging tool.
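A minimal Levenshtein distance, as one candidate for the syntactic metric; it works over any sequences, so we could apply it at the character level or, as sketched in the comment, at the token level:

```python
def levenshtein(a, b):
    """Edit distance between two sequences (characters or tokens)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                   # deletion
                           cur[j - 1] + 1,                # insertion
                           prev[j - 1] + (ca != cb)))     # substitution
        prev = cur
    return prev[-1]

# Token-level comparison of two question templates, e.g.:
#   levenshtein("When is the birth date of XYZ ?".split(),
#               "When was XYZ born ?".split())
```

A higher distance indicates a more different question structure, which is what we want from a good paraphrase.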
A POS tagging tool provides linguistic annotations as token attributes, so it may help us detect "nominal to verbal" changes, or the opposite.
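As a first approximation, such a check could operate on (lemma, POS) pairs like those spaCy exposes as `token.lemma_` and `token.pos_`, flagging lemmas whose category changes between template and paraphrase. Note the limitation, hedged in the second test below: pairs like "birth"/"born" lemmatise to different forms, so a real detector would also need a derivational lookup.

```python
def pos_profile(tagged):
    """Map each lemma to the set of coarse POS tags it occurs with,
    given (lemma, POS) pairs such as spaCy's (token.lemma_, token.pos_)."""
    profile = {}
    for lemma, pos in tagged:
        profile.setdefault(lemma, set()).add(pos)
    return profile

def category_shifts(tagged_a, tagged_b):
    """Lemmas shared by both questions whose POS category differs,
    e.g. a noun in the template surfacing as a verb in the paraphrase.
    Only catches shifts where the lemma itself is unchanged."""
    pa, pb = pos_profile(tagged_a), pos_profile(tagged_b)
    return {lemma for lemma in pa.keys() & pb.keys() if pa[lemma] != pb[lemma]}
```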
But most importantly, improving the overall F-score on the QALD benchmark (Question Answering over Linked Data) will be our final goal.
As we will focus on this final metric on the QALD benchmark, setting up a benchmark and a baseline will be necessary in the coming weeks.
In addition, in order to build this benchmark, we will also need to create our own pipeline in the coming weeks. I will not create a pipeline from scratch, but will instead use my predecessor Anand's work as the base and add my Template Expansion pipeline on top of it.