Basic CNN model from "Applying Deep Learning To Answer Selection: A Study And An Open Task".
RNN, to find the most similar question in a large QA dataset.
The dataset now includes 10,898 articles, 17,794 tweets, and 13,757 crowdsourced question-answer pairs.
Yahoo Language Data: This page features manually curated QA datasets from Yahoo Answers.
Maluuba News QA Dataset.
Paraphrase identification is a well-studied task in NLP (Das and Smith, 2009; Chang et al., 2010; He et al., 2015; Wang et al., 2016, inter alia).
Quora Question Pairs: the first dataset release from Quora, containing duplicate / semantic similarity labels.
In each track, the task was defined such that the systems were to retrieve small snippets of text that contained an answer for open-domain, closed-class questions.
Several websites provide forums for asking open-ended questions, such as Yahoo Answers, Quora, and numerous Reddit forums, or subreddits.
Config description: The Stanford Question Answering Dataset is a question-answering dataset consisting of question-paragraph pairs, where one of the sentences in the paragraph (drawn from Wikipedia) contains the answer to the corresponding question (written by an annotator).
We compare HBAM with other state-of-the-art language models such as Bidirectional Encoder Representations from Transformers (BERT) and the Manhattan LSTM model (MaLSTM).
CMU Q/A Dataset.
TWEETQA is a social media-focused question answering dataset.
There are many ships and boats on the oceans, and it is impossible to manually keep track of what everyone is doing.
Dataset: Speech Emotion Recognition Dataset.
However, the test set is typically a randomly selected subset of all the data collected, so it follows the same distribution as the training and development sets. As a result, performance on the test set tends to overestimate how well the models perform on data from other distributions.
OpenBookQA is a new kind of question-answering dataset modeled after open book exams for assessing human understanding of a subject.
We focus on the subreddit Explain Like I'm Five (ELI5), where users are encouraged to provide answers that are comprehensible to a five-year-old. ELI5 is appealing …
Insurance-QA deep learning model.
NarrativeQA is a dataset constructed to encourage deeper understanding of language.
Our hypothesis is that by training on a large corpus for a similar medical task, we can embed medical knowledge into the model.
The Quora dataset contains nearly 70,000 medical-related entries, of which we randomly pick 10,000 as the train/dev/test dataset.
Over 100 million people visit Quora every month, so it's no surprise that many people ask similarly worded questions.
I build a model based on Facebook AI's RoBERTa base to classify questions on Quora as sincere or insincere.
Model; average Eval_accuracy over three runs; range of change:
BERT baseline model: 0.7686 (-0.0073, +0.0057)
HDBA model: 0.8146 (-0.0082, +0.0098)
Bi-LSTM + Attention model: 0.8043 (-0.0103, +0.0062)
QA systems.
Besides interactions, the latter enables users to label the questions with topic tags that highlight the key points conveyed in the questions.
Human evaluation indicates that the paraphrases generated by our system are well-formed.
Version 1.2 released August 23, 2013 (same data as 1.1, but now released under GFDL and CC BY-SA 3.0).
The dataset consists of 113,000 Wikipedia-based QA pairs.
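The random selection of 10,000 entries from a pool of roughly 70,000 medical-related Quora entries, mentioned above, could be sketched as follows. This is only an illustration under stated assumptions: the source does not give the split proportions or a random seed, so the 80/10/10 ratio, the seed value, and the helper name sample_and_split are all hypothetical, and integer ids stand in for the real entries.

```python
import random


def sample_and_split(items, sample_size=10_000, ratios=(0.8, 0.1, 0.1), seed=0):
    """Draw a random sample without replacement and cut it into train/dev/test.

    The ratios and seed are assumptions for illustration; the source only
    says that 10,000 entries are picked at random for train/dev/test.
    """
    rng = random.Random(seed)  # local generator, so the split is reproducible
    sample = rng.sample(list(items), sample_size)  # already in random order

    n_train = int(sample_size * ratios[0])
    n_dev = int(sample_size * ratios[1])

    train = sample[:n_train]
    dev = sample[n_train:n_train + n_dev]
    test = sample[n_train + n_dev:]  # whatever remains goes to the test set
    return train, dev, test


# Placeholder pool: integer ids standing in for ~70,000 medical-related entries.
pool = range(70_000)
train, dev, test = sample_and_split(pool)
print(len(train), len(dev), len(test))
```

Sampling without replacement first and slicing afterwards keeps the three subsets disjoint, which is the property a train/dev/test split needs.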