Theme: Structured representations for knowledge discovery
Evaluation of schema matching using graph and text-based methods
What is schema matching? We start with a query text, say a report on the 2016 Masters Tournament, and a set of other texts we have studied well, which we'll call schemas: `diving competition`, `practice makes perfect`, `driving a car`, and so on. These are idealized event sequences selected from a schema library (i.e., our dataset, Torquestra; see the paper). Which of these would you guess our query, the Masters Tournament, is most similar to?
Basically, this is search: identify the schemas most similar to the query. A classic similarity metric like TF-IDF is good at finding texts that share tokens. In our experiments, we also test graph-based methods: we first generate a causal graph for a text, build graph embeddings using a graph attention network (GAT) (Veličković et al., 2018), encode the query and all candidates, measure pairwise distances (standardized Euclidean works well), and examine the top-k most similar schemas.
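To make the setup concrete, here is a minimal retrieval sketch in Python (not our exact code): a TF-IDF baseline ranked by cosine similarity, and an embedding-based matcher ranked by standardized Euclidean distance. The function names and input shapes are illustrative assumptions; the embeddings could come from the GAT over causal graphs or from any text encoder.

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def tfidf_topk(query: str, candidates: list[str], k: int = 4) -> np.ndarray:
    """Token-overlap baseline: rank candidates by TF-IDF cosine similarity."""
    vec = TfidfVectorizer()
    X = vec.fit_transform([query] + candidates)   # row 0 is the query
    sims = cosine_similarity(X[0], X[1:]).ravel()
    return np.argsort(-sims)[:k]                  # indices of the top-k candidates

def embedding_topk(q_emb: np.ndarray, cand_embs: np.ndarray, k: int = 4) -> np.ndarray:
    """Embedding-based matching: rank by standardized Euclidean distance
    (feature variances are estimated from the pooled embeddings by default)."""
    d = cdist(q_emb[None, :], cand_embs, metric="seuclidean").ravel()
    return np.argsort(d)[:k]                      # smaller distance = more similar
```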
Now, to work. In the sample below you will find 50 queries, each paired with four (4) candidates from our schema library. Your task, should you choose to accept it, is to decide whether the query is similar to any of the candidates!
Important! For the first 25 queries (GAT), we match graph to graph, so each query text is followed by its query graph, and each of the four (4) candidates is shown as a text followed by its graph. For the second 25 (TF-IDF), we match text to text, so no graphs are needed.
Also remember that in the GAT condition the query graph is always automatically generated, while the candidate graphs come from our manually curated schema library.
Instructions: Look at each query and each of the four (4) candidates that follow. Check the boxes of those you find similar!
Similar means: the words are similar, the meaning is similar, the causal structure is similar...
For additional guidelines and examples, see the link above! Navigate between items using the "Go to matched set" dropdown!
As you check boxes, the Mean Average Precision column in the table below will update! The other values are fixed.
Data and model specifics used to generate the query causal graphs, embed them, and perform matching:
Data: 3.5k Wikipedia texts labeled with event types and topics
Generative model: GPT2-XL-distill-high, fine-tuned for 16 epochs with block_size=500
GAT (self-supervised); dropout=0.5; hidden_channels=256; 2 conv layers (a model sketch follows)
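For concreteness, here is a minimal sketch of a graph encoder under the hyperparameters listed above (2 GATConv layers, hidden_channels=256, dropout=0.5), written with PyTorch Geometric. The input feature dimension, the ELU activation, and mean pooling are assumptions, and the self-supervised training objective is omitted.

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import GATConv, global_mean_pool

class GraphEncoder(torch.nn.Module):
    """Two-layer GAT that pools node features into one embedding per graph."""

    def __init__(self, in_channels: int, hidden_channels: int = 256, dropout: float = 0.5):
        super().__init__()
        self.conv1 = GATConv(in_channels, hidden_channels)
        self.conv2 = GATConv(hidden_channels, hidden_channels)
        self.dropout = dropout

    def forward(self, x, edge_index, batch):
        x = F.elu(self.conv1(x, edge_index))
        x = F.dropout(x, p=self.dropout, training=self.training)
        x = self.conv2(x, edge_index)
        return global_mean_pool(x, batch)  # one embedding vector per graph
```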
Graph metrics and evaluation
Evaluation: mean average precision (MAP); a sketch of the computation follows the table.
| Method         | Mean # nodes | Mean # edges | Mean degree | Mean density | Mean event purity | Mean avg. precision |
|----------------|--------------|--------------|-------------|--------------|-------------------|---------------------|
| GAT            | 11.73        | 7.45         | 1.29        | 0.08         | 0.00              | 0.00                |
| FEATHER        | 12.08        | 8.51         | 1.51        | 0.09         | 0.00              | 0.00                |
| TF-IDF         | 10.65        | 6.62         | 1.26        | 0.10         | 0.00              | 0.00                |
| Text embedding | 11.01        | 7.21         | 1.32        | 0.10         | 0.00              | 0.00                |
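For reference, here is a minimal sketch of how mean average precision can be computed in this setup, treating each checked box as a binary relevance judgment over a ranked candidate list; the function and variable names are illustrative:

```python
def average_precision(ranked_relevant: list[bool]) -> float:
    """AP for one query: ranked_relevant[i] is True if the i-th ranked
    candidate was judged similar (i.e., its box was checked)."""
    hits, precisions = 0, []
    for rank, relevant in enumerate(ranked_relevant, start=1):
        if relevant:
            hits += 1
            precisions.append(hits / rank)  # precision at each relevant hit
    return sum(precisions) / hits if hits else 0.0

def mean_average_precision(all_queries: list[list[bool]]) -> float:
    """MAP: the mean of per-query AP over all queries (50 in this sample)."""
    return sum(average_precision(q) for q in all_queries) / len(all_queries)
```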