Relation Extraction & Knowledge Graphs

Lecture 11 Interactive Learning Companion · RE · Distant Supervision · Open IE · KG-aware Extraction

Lecture 11 Topic Explorer

Explore each core topic from Lecture 11. Select a category to see definitions, methods, and annotated examples.

Lecture 11 in context: Lecture 10 (Semantic Table Interpretation) showed how to populate a knowledge graph from semi-structured tables. Today we cover the unstructured-text case: extracting triples from free-running natural language.

Relation Extraction Sandbox

Try Open IE-style extraction. Paste a sentence with two entities marked using [E1]…[/E1] and [E2]…[/E2] tags. The parser detects entity types, candidate verbs between them, and proposes a triple.

Reality check: this is a toy parser. Production RE systems use neural encoders (BERT and other transformers) trained on large annotated or distantly labelled corpora. The shape of the task is the same, though: identify the entities, look at the words and dependencies between them, and predict a relation label.
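As a rough illustration of the sandbox idea, the sketch below pulls out the two tagged mentions and treats the words between them as the relation phrase. It is not the page's actual parser: the function name `propose_triple` is hypothetical, it assumes [E1] appears before [E2], and it does no entity typing.

```python
import re

# Tag format taken from the sandbox description: [E1]...[/E1] and [E2]...[/E2].
# Assumption: the E1 mention precedes the E2 mention in the sentence.
TAGGED = re.compile(r"\[E1\](.+?)\[/E1\](.*?)\[E2\](.+?)\[/E2\]")

def propose_triple(sentence):
    """Return (subject, relation phrase, object), or None if the tags are missing."""
    match = TAGGED.search(sentence)
    if match is None:
        return None
    subject, between, obj = (part.strip() for part in match.groups())
    # Use the words between the two mentions as the surface relation phrase.
    relation = " ".join(between.split())
    return (subject, relation or None, obj)

print(propose_triple("[E1]Marie Curie[/E1] was born in [E2]Warsaw[/E2]."))
```

A real system would replace the "words in between" heuristic with a learned classifier, but the input and output have the same shape.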

RE Method Walkthrough

Step through how different RE methods process the same sentence. Compare how supervised, distantly supervised, Open IE, and KG-aware approaches arrive at a triple.


Key Insight: The same sentence can yield different triples depending on the method. Supervised RE picks from a fixed schema; Open IE keeps the surface verb; distant supervision inherits whatever label the KG already had; KG-aware RE uses the existing graph as a prior.

RE Methods Lab

Explore the major RE methods and their trade-offs.

RE Methods at a Glance

| Method | Supervision | What it does | Typical use |
|---|---|---|---|
| Feature-based | Supervised | Hand-crafted features → SVM / MaxEnt classifier | Small, well-annotated corpora |
| Tree kernels | Supervised | Similarity over parse-tree fragments | Pre-deep-learning era benchmarks |
| CNN / PCNN | Sup. / Distant | Convolutional encoder over word + position embeddings | Distantly-supervised corpora (NYT) |
| BiLSTM + attention | Supervised | Sequence model with selective attention over mentions | Long-range context, document-level RE |
| BERT / transformers | Supervised | Entity-marker tokens + [CLS] classification | Modern strong baseline (TACRED, DocRED) |
| Open IE | Unsupervised | Surface-form (subj; verb; obj) tuples, no fixed schema | Broad-coverage extraction over web text |
| Distant supervision | Weak | Use KB triples as automatic labels (Mintz 2009) | Scaling to large unlabeled corpora |
| Multi-instance | Weak | Bag-level labels with sentence-level attention | Reducing distant-supervision noise |
| KG-aware RE | Hybrid | Text encoder + KG embedding (Möller & Usbeck 2025) | Imbalanced / zero-shot RE over Wikidata |
| Few-shot LLM | Prompted | In-context examples; LLM generates triples | New domains with little labelled data |
| Generative RE | Seq2Seq | REBEL / GenIE / KnowGL: text → triples directly | End-to-end KG construction |

Common RE Pitfalls

Distant supervision noise: the elephant in the room

NO_RELATION is a class, and it's the most important one

Sentence-level vs document-level: the recall gap

Direction matters: (Microsoft, acquired, GitHub) ≠ (GitHub, acquired, Microsoft)

Held-out KB metrics over-estimate model precision

LLM RE is fast, but watch for hallucination

Where Relations Hide in Text

Surface patterns that signal a relation between two entities, i.e. what an RE model has to learn:

Verb between entities

"[Marie Curie] was born in [Warsaw]": the easiest case; the verb phrase is the relation cue.

Appositive / parenthesis

"[Apple] (founded by [Steve Jobs])…": the relation sits in a side clause, with no main verb.

Cross-sentence

"[Curie] won two Nobel prizes. She studied in [Paris].": needs coreference resolution ("She" → Curie) plus multi-hop reasoning across sentences.

Implicit / world knowledge

"[Obama] left the [White House] in 2017": implies "was president of"; the surface text alone is ambiguous.

RE Challenge

Test your knowledge of RE, distant supervision, Open IE, and KG-aware extraction. 10 points per correct answer!


Lecture 11 Quiz

12 questions covering RE foundations, supervised methods, distant supervision, Open IE, and KG construction. You can review answers before finishing.

1. In the Information Extraction pipeline, where does Relation Extraction sit?
Before NER
Between Entity Linking and Event Extraction
After Event Extraction
RE replaces all other IE stages

Lecture 11 Cheat Sheet

Key concepts, formulas, and patterns for relation extraction and KG construction. Keep this open during revision.

RE Task Definition

Standard RE input/output:
    Input:  sentence s, entities (e1, e2)
    Output: relation r in R ∪ {NONE}
    # R = predefined relation schema
    # (s, r, o) triples are the same shape as RDF, which is why RE feeds KGs

IE pipeline (textbook ch. 10 sections):
    NER  → Entity Linking → RE   → Events
    10.1 → 10.2           → 10.3 → 10.4
    # Modern joint models collapse stages

Supervised RE Features

Classical feature categories:
    Lexical:   bag-of-words between e1 and e2, head words of each entity
    Syntactic: POS tags, dependency path, parse-tree fragments
    Semantic:  entity types, WordNet hypernyms, SRL roles, gazetteer hits
    KB:        prior triples, type info

Neural entity-marker pattern (BERT-style):
    "[E1] Steve Jobs [/E1] founded [E2] Apple [/E2] in 1976."
    # Encode → pool [E1],[E2] → MLP → relation
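The entity-marker input can be produced with a few lines of string handling. The sketch below is one possible way to do it, assuming character-offset spans that neither overlap nor touch; the helper name `add_entity_markers` is hypothetical, and the encoding and pooling steps are left out.

```python
def add_entity_markers(text, e1_span, e2_span):
    """Wrap two character spans (start, end) in [E1]/[E2] marker tokens.

    Assumption: the spans do not overlap and are not directly adjacent.
    """
    inserts = [
        (e1_span[0], "[E1] "), (e1_span[1], " [/E1]"),
        (e2_span[0], "[E2] "), (e2_span[1], " [/E2]"),
    ]
    # Insert from right to left so earlier offsets stay valid.
    for offset, marker in sorted(inserts, key=lambda item: item[0], reverse=True):
        text = text[:offset] + marker + text[offset:]
    return text

sentence = "Steve Jobs founded Apple in 1976."
marked = add_entity_markers(sentence, (0, 10), (19, 24))
print(marked)  # [E1] Steve Jobs [/E1] founded [E2] Apple [/E2] in 1976.
```

In a transformer setup the four markers would normally be registered as special tokens so the tokenizer does not split them.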

Distant Supervision

Mintz et al. (2009) assumption:
    If (e1, r, e2) ∈ KB, then EVERY sentence mentioning both e1 and e2 expresses r.
    # Strong → free labels
    # Wrong → produces noisy data

Multi-instance relaxation (Riedel 2010):
    For a bag of sentences {s1..sk} mentioning (e1, e2): AT LEAST ONE expresses r.
    # PCNN + selective attention (Lin 2016)
    # RL/GAN denoising (Narasimhan, Wu 2017)
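To make the two assumptions concrete, here is a minimal sketch that labels a toy corpus from a toy KB and groups the matches into bags. The KB fact, the sentences, the relation name `founder_of`, and the plain substring matching are all illustrative assumptions; a real pipeline would use entity linking rather than string containment.

```python
from collections import defaultdict

# Toy KB and corpus; every fact and sentence here is illustrative.
KB = {("Steve Jobs", "Apple"): "founder_of"}
SENTENCES = [
    "Steve Jobs co-founded Apple in 1976.",
    "Steve Jobs returned to Apple in 1997.",  # mentions both entities, does not express founder_of
    "Marie Curie was born in Warsaw.",
]

def distant_label(sentences, kb):
    """Mintz-style labelling: every sentence containing both entities inherits the KB relation."""
    bags = defaultdict(list)
    for sentence in sentences:
        for (e1, e2), relation in kb.items():
            if e1 in sentence and e2 in sentence:
                bags[(e1, relation, e2)].append(sentence)
    return dict(bags)

bags = distant_label(SENTENCES, KB)
for triple, bag in bags.items():
    print(triple, len(bag))
```

Under the Mintz-style assumption both sentences in the bag become positive training examples, so the second one is label noise. Under the multi-instance relaxation the bag as a whole carries the label, and the model only has to find at least one sentence in it that supports the relation.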

Open Information Extraction

Open IE produces schema-free tuples:
    "Steve Jobs co-founded Apple in 1976."
    → (Steve Jobs; co-founded; Apple)
    → (Steve Jobs; co-founded Apple in; 1976)
    # No fixed relation set
    # Surface verb IS the relation

Tools & trade-offs:
    TextRunner, ReVerb, OLLIE, Stanford OpenIE, MinIE
    ✓ Domain-independent, scalable
    ✗ Same relation surfaces many ways
    ✗ Needs canonicalisation for a KG
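The canonicalisation step can be pictured as a lookup from surface phrases to schema properties. The sketch below uses a hand-written table; the property names (`founded_by`, `place_of_birth`) and the swap flag are hypothetical, and real systems typically learn or cluster these mappings rather than listing them by hand.

```python
# Hypothetical mapping from normalised surface phrases to schema properties.
# The boolean says whether subject and object must be swapped to fit the property.
PHRASE_TO_PROPERTY = {
    "co-founded": ("founded_by", True),
    "founded": ("founded_by", True),
    "was founded by": ("founded_by", False),
    "was born in": ("place_of_birth", False),
}

def canonicalise(open_ie_tuple):
    """Map an Open IE (subject; phrase; object) tuple onto a schema triple, or None."""
    subject, phrase, obj = open_ie_tuple
    entry = PHRASE_TO_PROPERTY.get(phrase.lower().strip())
    if entry is None:
        return None  # unmapped phrase: leave for manual review
    prop, swap = entry
    return (obj, prop, subject) if swap else (subject, prop, obj)

print(canonicalise(("Steve Jobs", "co-founded", "Apple")))
print(canonicalise(("Apple", "was founded by", "Steve Jobs")))
```

Both surface forms collapse to the same triple, which is the point of canonicalisation. The swap flag also shows why relation direction has to be tracked explicitly.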

RE Evaluation Metrics

Standard sentence-level metrics:
    P  = TP / (TP + FP)     # precision
    R  = TP / (TP + FN)     # recall
    F1 = 2·P·R / (P + R)
    # Often ignore the NO_RELATION class in macro-averaging
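The formulas above can be computed directly from gold and predicted labels. The sketch below uses one common convention, micro-averaging over the positive classes only, so that NO_RELATION never counts as a true positive. That choice of averaging is an assumption of this example, and the label names are illustrative.

```python
def micro_prf(gold, predicted, negative="NO_RELATION"):
    """Micro-averaged precision, recall and F1 over all classes except `negative`."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if p != negative and p == g)
    fp = sum(1 for g, p in pairs if p != negative and p != g)
    fn = sum(1 for g, p in pairs if g != negative and p != g)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

gold = ["founder_of", "NO_RELATION", "born_in", "born_in"]
pred = ["founder_of", "born_in", "NO_RELATION", "born_in"]
precision, recall, f1 = micro_prf(gold, pred)
print(round(precision, 3), round(recall, 3), round(f1, 3))
```

Note that predicting the wrong positive label for a positive gold label counts as both a false positive and a false negative under this scheme.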
Distant-supervision evaluation:
    Held-out KG facts: split KB triples into train / test, evaluate predictions vs held-out test set.
    Manual P@N: rank predictions, hand-check top N; most honest for facts NEW to the KG.
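Manual P@N is simple to compute once the top-ranked predictions have been judged by hand. A minimal sketch, assuming predictions arrive as (triple, confidence) pairs and judgements as a dictionary of booleans; the triples and scores below are made up for illustration.

```python
def precision_at_n(scored_predictions, judgements, n):
    """Fraction of the n highest-confidence predictions that were judged correct."""
    ranked = sorted(scored_predictions, key=lambda item: item[1], reverse=True)[:n]
    if not ranked:
        return 0.0
    # A prediction with no manual judgement is counted as incorrect.
    correct = sum(1 for triple, _ in ranked if judgements.get(triple, False))
    return correct / len(ranked)

predictions = [
    (("Apple", "founded_by", "Steve Jobs"), 0.9),
    (("Apple", "founded_by", "Warsaw"), 0.8),
    (("Marie Curie", "place_of_birth", "Warsaw"), 0.4),
]
judgements = {
    ("Apple", "founded_by", "Steve Jobs"): True,
    ("Apple", "founded_by", "Warsaw"): False,
    ("Marie Curie", "place_of_birth", "Warsaw"): True,
}
print(precision_at_n(predictions, judgements, 2))
```

Counting unjudged predictions as wrong is a conservative choice; an alternative is to require a judgement for every triple in the top N.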

KG-aware RE (Möller & Usbeck 2025)

Architecture sketch:
    h_text = BERT(sentence)
    h_kg   = NeuralBellmanFord(KG, e1, e2)
    logits = MLP([h_text ; h_kg])
    # concatenate or attend over both

When KG signal helps most:
    ✓ Imbalanced labels (long-tail rels)
    ✓ Zero-shot relations via topology
    ✓ Disambiguating ambiguous text cues
    ✗ Helps less when entities are not yet in the KG (cold start)
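The fusion step in the architecture sketch can be illustrated without any deep-learning library. The code below is not the authors' implementation: the two input vectors are placeholders standing in for the BERT and Neural Bellman-Ford outputs, the weights are made up, and a single linear layer replaces the MLP to keep the example short.

```python
import math

def linear(weights, bias, vector):
    """One dense layer: a row of weights per output class."""
    return [sum(w * x for w, x in zip(row, vector)) + b for row, b in zip(weights, bias)]

def softmax(logits):
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]

def fuse_and_classify(h_text, h_kg, weights, bias):
    fused = h_text + h_kg  # list concatenation plays the role of [h_text ; h_kg]
    return softmax(linear(weights, bias, fused))

# Placeholder representations and weights (illustrative values only).
h_text = [1.0, 0.0]
h_kg = [0.0, 1.0]
weights = [[1.0, 0.0, 0.0, 1.0],   # class 0 draws on both the text and the KG signal
           [0.0, 1.0, 1.0, 0.0]]   # class 1
bias = [0.0, 0.0]
probs = fuse_and_classify(h_text, h_kg, weights, bias)
print([round(p, 3) for p in probs])
```

Concatenation is the simplest fusion choice; the "attend over both" variant mentioned in the sketch would replace the `fused` line with an attention-weighted combination.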

KG Construction Pipeline

From text to a validated KG:
    1. NER: find entity mentions
    2. EL: link to KG IDs (Q-IDs)
    3. RE: predict relations
    4. Canonicalise: map verbs → props
    5. Fuse: reconcile across sources
    6. Validate: SHACL / ShEx checks
    7. Commit: write to live graph

KG completion vs extension:
    Completion: predict missing edges among entities ALREADY in the KG (link prediction; pure graph task)
    Extension: add NEW facts (and maybe new entities) by extracting from external text; needs RE

Best Practices

Train, evaluate, deploy effectively:
    ✓ Always include a NO_RELATION class
    ✓ Use entity markers for transformer RE
    ✓ Stratify splits; beware entity leakage
    ✓ Validate triples with SHACL before committing to the KG
    ✓ Track provenance (sentence + model + confidence) for every triple
    ✗ Don't trust held-out KG metrics; the KG itself is incomplete
    ✗ Don't ignore the long tail; most relations have few examples
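Two of these practices, provenance tracking and validation before commit, can be sketched together. The record type, the constraint table, the entity types and the confidence threshold below are all hypothetical, and the domain/range check is a plain-Python stand-in for what SHACL or ShEx shapes would express, not an implementation of either.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ExtractedTriple:
    subject: str
    predicate: str
    obj: str
    sentence: str      # provenance: source sentence
    model: str         # provenance: which extractor produced the triple
    confidence: float  # provenance: model score in [0, 1]

# Hypothetical domain/range constraints: predicate -> (subject type, object type).
CONSTRAINTS = {"founded_by": ("Organisation", "Person")}
ENTITY_TYPES = {"Apple": "Organisation", "Steve Jobs": "Person"}

def is_valid(triple, min_confidence=0.5):
    """Accept a triple only if its predicate is known, its types fit, and it is confident enough."""
    expected = CONSTRAINTS.get(triple.predicate)
    if expected is None or triple.confidence < min_confidence:
        return False
    actual = (ENTITY_TYPES.get(triple.subject), ENTITY_TYPES.get(triple.obj))
    return actual == expected

source = "Steve Jobs co-founded Apple in 1976."
good = ExtractedTriple("Apple", "founded_by", "Steve Jobs", source, "toy-extractor", 0.92)
reversed_triple = ExtractedTriple("Steve Jobs", "founded_by", "Apple", source, "toy-extractor", 0.92)
print(is_valid(good), is_valid(reversed_triple))
```

The type check catches the reversed triple, so a direction error is rejected before it reaches the graph, and the stored sentence and model name make any accepted triple traceable later.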