Relation Extraction & Knowledge Graphs — Interactive Learning Companion

🔍 Lecture 11 Topic Explorer

Explore each core topic from Lecture 11. Select a category to see definitions, methods, and annotated examples.

📚 Lecture 11 in context: Lecture 10 (Semantic Table Interpretation) showed how to populate a knowledge graph from semi-structured tables. Today we cover the unstructured-text case: extracting triples from running natural-language text.

🧪 Relation Extraction Sandbox

Try Open IE-style extraction. Paste a sentence with two entities marked using [E1]…[/E1] and [E2]…[/E2] tags. The parser detects the entity types and the candidate verbs between the two entities, then proposes a triple.

⚠️ Reality check: This is a toy parser. Production RE uses neural encoders (BERT and other transformers) trained on large labelled corpora. But the shape of the task is the same: identify entities, look at the words/dependencies between them, and predict a relation label.
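To make the shape of the task concrete, here is a minimal sketch of such a toy parser. It is not the sandbox's actual code: the function name `propose_triple` and the regular expression are assumptions, and it skips entity typing entirely, treating whatever text sits between the two tagged entities as the relation phrase.

```python
import re

# Assumed tag layout: [E1]...[/E1] appears before [E2]...[/E2] in the sentence.
TAGGED = re.compile(r"\[E1\](.+?)\[/E1\](.*?)\[E2\](.+?)\[/E2\]", re.DOTALL)


def propose_triple(sentence):
    """Return (e1, relation_phrase, e2), or None if no usable tagged pair is found."""
    match = TAGGED.search(sentence)
    if match is None:
        return None
    e1, between, e2 = (part.strip() for part in match.groups())
    # Toy heuristic: the words between the entities are the relation cue.
    relation = " ".join(between.split())
    if not relation:
        return None
    return (e1, relation, e2)


print(propose_triple("[E1]Marie Curie[/E1] was born in [E2]Warsaw[/E2]."))
# ('Marie Curie', 'was born in', 'Warsaw')
```

This only handles the easiest pattern (a verb phrase between the entities). Appositive, cross-sentence, and implicit relations would all defeat it.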

✅ RE Method Walkthrough

Step through how different RE methods process the same sentence. Compare how supervised, distantly supervised, Open IE, and KG-aware approaches arrive at a triple.

💡 Key Insight: The same sentence can yield different triples depending on the method. Supervised RE picks from a fixed schema; Open IE keeps the surface verb; distant supervision inherits whatever label the KG already had; KG-aware RE uses the existing graph as a prior.

🧱 RE Methods Lab

Explore the major RE methods and their trade-offs.

RE Methods at a Glance

Method	Supervision	What it does	Typical use
Feature-based	Supervised	Hand-crafted features → SVM / MaxEnt classifier	Small, well-annotated corpora
Tree kernels	Supervised	Similarity over parse-tree fragments	Pre-deep-learning era benchmarks
CNN / PCNN	Supervised / Distant	Convolutional encoder over word + position embeddings	Distantly-supervised corpora (NYT)
BiLSTM + attention	Supervised	Sequence model with selective attention over mentions	Long-range context, document-level RE
BERT / transformers	Supervised	Entity-marker tokens + [CLS] classification	Modern strong baseline (TACRED, DocRED)
Open IE	Unsupervised	Surface-form (subj; verb; obj) tuples, no fixed schema	Broad-coverage extraction over web text
Distant supervision	Weak	Use KB triples as automatic labels (Mintz 2009)	Scaling to large unlabeled corpora
Multi-instance	Weak	Bag-level labels with sentence-level attention	Reducing distant-supervision noise
KG-aware RE	Hybrid	Text encoder + KG embedding (Möller & Usbeck 2025)	Imbalanced / zero-shot RE over Wikidata
Few-shot LLM	Prompted	In-context examples; LLM generates triples	New domains with little labelled data
Generative RE	Seq2Seq	REBEL / GenIE / KnowGL — text → triples directly	End-to-end KG construction

⚠️ Common RE Pitfalls

⚠️ Distant supervision noise — the elephant in the room

🎭 NO_RELATION is a class — and it's the most important one

🔁 Sentence-level vs document-level — the recall gap

🔄 Direction matters — (Microsoft, acquired, GitHub) ≠ (GitHub, acquired, Microsoft)

📊 Held-out KB metrics over-estimate model precision

🤖 LLM RE is fast — but watch for hallucination

🔗 Where Relations Hide in Text

Surface patterns that signal a relation between two entities — what an RE model has to learn:

Verb between entities

"[Marie Curie] was born in [Warsaw]" — easiest case; the verb phrase is the relation cue.

Appositive / parenthesis

"[Apple] (founded by [Steve Jobs])…" — relation in a side clause, no main verb.

Cross-sentence

"[Curie] won two Nobel prizes. She studied in [Paris]." — needs coreference + multi-hop.

Implicit / world knowledge

"[Obama] left the [White House] in 2017" — implies "was president of"; surface text alone is ambiguous.

🎮 RE Challenge

Test your knowledge of RE, distant supervision, Open IE, and KG-aware extraction. 10 points per correct answer!

📝 Lecture 11 Quiz

12 questions covering RE foundations, supervised methods, distant supervision, Open IE, and KG construction. You can review answers before finishing.

1. In the Information Extraction pipeline, where does Relation Extraction sit?

Before NER

Between Entity Linking and Event Extraction

After Event Extraction

RE replaces all other IE stages

📄 Lecture 11 Cheat Sheet

Key concepts, formulas, and patterns for relation extraction and KG construction. Keep this open during revision.

🔗 RE Task Definition

Standard RE input/output:

Input:  sentence s, entities (e1, e2)
Output: relation r ∈ R ∪ {NO_RELATION}

# R = predefined relation schema
# (e1, r, e2) has the same shape as an
# RDF (subject, predicate, object)
# triple, which is why RE feeds KGs

IE pipeline (textbook ch. 10 sections):

NER  → Entity Linking → RE → Events
10.1 →     10.2      → 10.3 →  10.4

# Modern joint models collapse stages

🎯 Supervised RE Features

Classical feature categories:

Lexical:   bag-of-words between e1,e2
           head words of each entity
Syntactic: POS tags, dependency path
           parse-tree fragments
Semantic:  entity types, WordNet hypernyms,
           SRL roles, gazetteer hits
KB:        prior triples, type info

Neural entity-marker pattern (BERT-style):

"[E1] Steve Jobs [/E1] founded
 [E2] Apple [/E2] in 1976."

# Encode → pool [E1],[E2] → MLP → relation
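The marker-insertion step of this pattern can be sketched in plain Python. The helper name `add_entity_markers` and the span convention (token indices, end exclusive, non-overlapping and non-empty spans) are assumptions for illustration. The encoder itself is left out.

```python
def add_entity_markers(tokens, e1_span, e2_span):
    """Wrap two token spans (start, end), end exclusive, in [E1]/[E2] marker tokens.

    Assumes the spans are non-empty and do not overlap.
    """
    opens = {e1_span[0]: "[E1]", e2_span[0]: "[E2]"}
    closes = {e1_span[1]: "[/E1]", e2_span[1]: "[/E2]"}
    out = []
    for i, token in enumerate(tokens):
        if i in closes:   # close a span that ended just before this token
            out.append(closes[i])
        if i in opens:    # open a span that starts at this token
            out.append(opens[i])
        out.append(token)
    if len(tokens) in closes:  # a span that runs to the end of the sentence
        out.append(closes[len(tokens)])
    return out


tokens = "Steve Jobs founded Apple in 1976 .".split()
print(" ".join(add_entity_markers(tokens, (0, 2), (3, 4))))
# [E1] Steve Jobs [/E1] founded [E2] Apple [/E2] in 1976 .
```

With a real transformer encoder the four markers would normally be registered as extra vocabulary items so the tokenizer does not split them. That step depends on the library and is not shown here.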

📚 Distant Supervision

Mintz et al. (2009) assumption:

If (e1, r, e2) ∈ KB,
then EVERY sentence mentioning
both e1 and e2 expresses r.

# Strong assumption → free labels
# Often wrong → noisy labels
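A minimal sketch of the labelling rule, assuming a toy KB stored as a dict from entity pairs to a single relation. The relation name `founder_of` and the function `distant_labels` are hypothetical. The second sentence shows where the noise comes from: it mentions both entities but does not express the KB relation.

```python
def distant_labels(kb, sentences):
    """Label each (text, e1, e2) with the KB relation for (e1, e2), if any.

    kb: dict mapping (e1, e2) -> relation name (one relation per pair, a simplification).
    """
    labelled = []
    for text, e1, e2 in sentences:
        relation = kb.get((e1, e2), "NO_RELATION")
        labelled.append((text, e1, e2, relation))
    return labelled


kb = {("Steve Jobs", "Apple"): "founder_of"}
sentences = [
    ("Steve Jobs founded Apple in 1976.", "Steve Jobs", "Apple"),
    ("Steve Jobs resigned from Apple in 1985.", "Steve Jobs", "Apple"),  # noisy label
]
for row in distant_labels(kb, sentences):
    print(row[3], "<-", row[0])
```

Both sentences receive `founder_of`, although only the first one actually states it.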

Multi-instance relaxation (Riedel 2010):

For a bag of sentences {s1..sk}
mentioning (e1,e2):
  AT LEAST ONE expresses r.

# PCNN + selective attention (Lin 2016)
# RL/GAN denoising (Narasimhan, Wu 2017)
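The at-least-one relaxation can be sketched as grouping sentences into bags per entity pair and scoring each bag by its best sentence. The per-sentence scores below are made-up stand-ins for a sentence classifier's probability of relation r. Max-pooling is used here as the simplest reading of "at least one" and is not the learned selective attention of the PCNN work.

```python
from collections import defaultdict


def build_bags(examples):
    """Group (text, e1, e2) examples into bags keyed by the entity pair."""
    bags = defaultdict(list)
    for text, e1, e2 in examples:
        bags[(e1, e2)].append(text)
    return dict(bags)


def bag_score(sentence_scores):
    """At-least-one: the bag is as confident as its most confident sentence."""
    return max(sentence_scores)


examples = [
    ("Steve Jobs founded Apple in 1976.", "Steve Jobs", "Apple"),
    ("Steve Jobs resigned from Apple in 1985.", "Steve Jobs", "Apple"),
]
bags = build_bags(examples)
# Hypothetical sentence-level scores for the relation "founder_of".
scores = {("Steve Jobs", "Apple"): [0.92, 0.08]}
for pair in bags:
    print(pair, bag_score(scores[pair]))
```

The noisy second sentence no longer hurts: the bag label is supported by the first sentence alone.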

🌐 Open Information Extraction

Open IE produces schema-free tuples:

"Steve Jobs co-founded Apple in 1976."
  → (Steve Jobs; co-founded; Apple)
  → (Steve Jobs; co-founded Apple in;
     1976)

# No fixed relation set
# Surface verb IS the relation

Tools & trade-offs:

TextRunner, ReVerb, OLLIE,
Stanford OpenIE, MinIE

✓ Domain-independent, scalable
✗ Same relation surfaces many ways
✗ Needs canonicalisation for a KG
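The canonicalisation problem can be illustrated with a lookup table from surface phrases to schema properties. The table contents and the property names (`founder_of`, `place_of_birth`) are hypothetical. Real systems typically cluster or learn this mapping rather than hand-writing it.

```python
# Hypothetical mapping from Open IE relation phrases to canonical properties.
CANONICAL = {
    "co-founded": "founder_of",
    "founded": "founder_of",
    "established": "founder_of",
    "was born in": "place_of_birth",
}


def canonicalise(triple):
    """Map an Open IE (subject; phrase; object) tuple to a schema triple, or None."""
    subject, phrase, obj = triple
    prop = CANONICAL.get(phrase.strip().lower())
    if prop is None:
        return None  # unknown surface form: leave it out of the KG
    return (subject, prop, obj)


print(canonicalise(("Steve Jobs", "co-founded", "Apple")))
print(canonicalise(("Steve Jobs", "co-founded Apple in", "1976")))
```

The first tuple maps to `founder_of`. The second has no entry in the table and is dropped, which is the coverage cost of a fixed mapping.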

📊 RE Evaluation Metrics

Standard sentence-level metrics:

P = TP / (TP + FP)   # precision
R = TP / (TP + FN)   # recall
F1 = 2·P·R / (P + R)

# NO_RELATION is usually excluded
# when averaging over classes
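These formulas can be sketched as a small scorer. This version is micro-averaged over the positive classes and skips NO_RELATION, which is one common convention. The function name `micro_prf` and the label strings are assumptions.

```python
def micro_prf(gold, pred, negative="NO_RELATION"):
    """Micro-averaged precision, recall and F1 over all classes except `negative`."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == p and g != negative)
    # A predicted relation that does not match the gold label is a false positive.
    fp = sum(1 for g, p in pairs if p != negative and p != g)
    # A gold relation that was not predicted correctly is a false negative.
    fn = sum(1 for g, p in pairs if g != negative and p != g)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold = ["born_in", "NO_RELATION", "founder_of", "founder_of"]
pred = ["born_in", "founder_of", "NO_RELATION", "founder_of"]
print(micro_prf(gold, pred))
```

Note that predicting the wrong relation for a positive example counts as both a false positive and a false negative under this scheme.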

Distant-supervision evaluation:

Held-out KG facts: split KB triples
  into train / test, evaluate
  predictions vs held-out test set.

Manual P@N: rank predictions, hand-
  check top N — most honest for
  facts NEW to the KG.
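Manual P@N is simple to compute once the judgements exist. In this sketch the triples are placeholders and the `judgements` dict stands in for a human annotator's verdicts. Both are assumptions.

```python
def precision_at_n(scored_triples, judgements, n):
    """Fraction of the n highest-confidence predictions judged correct.

    scored_triples: list of (triple, confidence).
    judgements: dict mapping triple -> True/False from manual checking.
    """
    ranked = sorted(scored_triples, key=lambda item: item[1], reverse=True)
    top = [triple for triple, _ in ranked[:n]]
    if not top:
        return 0.0
    return sum(1 for triple in top if judgements[triple]) / len(top)


predictions = [(("A", "r1", "B"), 0.9), (("C", "r2", "D"), 0.4), (("E", "r1", "F"), 0.8)]
judgements = {("A", "r1", "B"): True, ("E", "r1", "F"): False, ("C", "r2", "D"): True}
print(precision_at_n(predictions, judgements, 2))  # 0.5
```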

🧠 KG-aware RE (Möller & Usbeck 2025)

Architecture sketch:

h_text = BERT(sentence)
h_kg   = NeuralBellmanFord(KG, e1, e2)

logits = MLP([h_text ; h_kg])
# concatenate or attend over both
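The fusion step can be sketched with plain Python lists. Everything here is a stand-in: the random vectors replace the BERT and graph encoders, a single linear layer replaces the MLP, and the dimensions are arbitrary. It only shows the concatenate-then-classify shape, not the published architecture.

```python
import math
import random


def linear(x, weights, bias):
    """One dense layer: weights has one row per output relation."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_and_classify(h_text, h_kg, weights, bias):
    fused = h_text + h_kg  # list concatenation, i.e. [h_text ; h_kg]
    return softmax(linear(fused, weights, bias))


rng = random.Random(0)
h_text = [rng.uniform(-1, 1) for _ in range(4)]  # stand-in for the text encoder output
h_kg = [rng.uniform(-1, 1) for _ in range(3)]    # stand-in for the KG path encoding
num_relations = 5
weights = [[rng.uniform(-1, 1) for _ in range(7)] for _ in range(num_relations)]
bias = [0.0] * num_relations

probs = fuse_and_classify(h_text, h_kg, weights, bias)
print(len(probs), round(sum(probs), 6))
```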

When KG signal helps most:

✓ Imbalanced labels (long-tail rels)
✓ Zero-shot relations via topology
✓ Disambiguating ambiguous text cues
✗ Helps less when entities are
  not yet in the KG (cold start)

🏗️ KG Construction Pipeline

From text to a validated KG:

1. NER  — find entity mentions
2. EL   — link to KG IDs (Q-IDs)
3. RE   — predict relations
4. Canonicalise — map verbs → props
5. Fuse — reconcile across sources
6. Validate — SHACL / ShEx checks
7. Commit — write to live graph
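Steps 6 and 7 can be sketched as a type check followed by a guarded write. The dict-based constraints are a hand-rolled stand-in for SHACL / ShEx shapes, and the names `Candidate`, `CONSTRAINTS`, `ENTITY_TYPES`, and the 0.5 confidence threshold are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    subject: str
    prop: str
    obj: str
    sentence: str      # provenance: where the triple came from
    model: str         # provenance: which extractor produced it
    confidence: float


# Hypothetical constraints: property -> (expected subject type, expected object type).
CONSTRAINTS = {"place_of_birth": ("Person", "Place")}
ENTITY_TYPES = {"Marie Curie": "Person", "Warsaw": "Place", "Apple": "Organisation"}


def is_valid(candidate, min_confidence=0.5):
    expected = CONSTRAINTS.get(candidate.prop)
    if expected is None or candidate.confidence < min_confidence:
        return False
    actual = (ENTITY_TYPES.get(candidate.subject), ENTITY_TYPES.get(candidate.obj))
    return actual == expected


def commit(candidates, graph):
    """Write valid triples to `graph` (dict: triple -> list of provenance records)."""
    for c in candidates:
        if is_valid(c):
            triple = (c.subject, c.prop, c.obj)
            graph.setdefault(triple, []).append((c.sentence, c.model, c.confidence))
    return graph


good = Candidate("Marie Curie", "place_of_birth", "Warsaw",
                 "Marie Curie was born in Warsaw.", "toy-model", 0.9)
bad = Candidate("Apple", "place_of_birth", "Warsaw",
                "Apple opened an office in Warsaw.", "toy-model", 0.9)
print(commit([good, bad], {}))
```

Only the first candidate reaches the graph. The second fails the subject-type check because an organisation cannot have a place of birth under these constraints.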

KG completion vs extension:

Completion: predict missing edges
  among entities ALREADY in the KG
  (link prediction; pure graph task)

Extension:  add NEW facts (and maybe
  new entities) by extracting from
  external text — needs RE

🏆 Best Practices

Train, evaluate, deploy effectively:

✓ Always include a NO_RELATION class
✓ Use entity markers for transformer RE
✓ Stratify splits — beware entity leakage
✓ Validate triples with SHACL before
  committing to the KG
✓ Track provenance (sentence + model
  + confidence) for every triple
✗ Don't trust held-out KG metrics —
  the KG itself is incomplete
✗ Don't ignore the long tail —
  most relations have few examples
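One way to guard against entity leakage is to split by entity pair rather than by sentence, so no pair appears in both train and test. A minimal sketch, assuming each example is a dict with `e1` and `e2` keys. It does not stratify by relation and does not stop a single entity from appearing on both sides with different partners.

```python
import random
from collections import defaultdict


def split_by_entity_pair(examples, test_fraction=0.2, seed=0):
    """Split so that every (e1, e2) pair lands entirely in train or entirely in test."""
    groups = defaultdict(list)
    for example in examples:
        groups[(example["e1"], example["e2"])].append(example)
    pairs = sorted(groups)                 # deterministic order before shuffling
    random.Random(seed).shuffle(pairs)
    n_test = max(1, round(len(pairs) * test_fraction))
    test_pairs, train_pairs = pairs[:n_test], pairs[n_test:]
    train = [ex for pair in train_pairs for ex in groups[pair]]
    test = [ex for pair in test_pairs for ex in groups[pair]]
    return train, test


data = [{"e1": f"E{i % 5}", "e2": f"F{i % 5}", "text": f"sentence {i}"} for i in range(20)]
train, test = split_by_entity_pair(data)
print(len(train), len(test))
```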