How to build an NLP labelling interface in JavaScript

… and why you may not want to.

If you want to build your own ML models that can understand text, you'll need an annotation interface to help you train the model. In this post we'll talk you through how you can build a good annotation interface in JS and discuss some of the surprising challenges that crop up.

Information extraction is a common use case for machine learning and NLP. An ML model can find the sections of a document where particular items occur and extract them.

To train an ML model that can take text and extract information, we must first manually label a training set for the system to learn from.

Labelling training data can be tedious and expensive. At Humanloop, our models reduce the amount of annotation required by using active learning, but it's still necessary to do some labelling by hand.

Most annotation tools focus on overly simple tasks

Most text annotation interfaces focus on simple Named Entity Recognition (NER), and aren't well suited to more challenging tasks. Real-world information extraction problems often go beyond simple NER, for example:

  • identifying and extracting clauses inside legal documents
  • finding displays of positive and negative behaviors in interview responses
  • identifying behaviors in social media posts
  • extracting attributes of real estate listings - the size of the kitchen, who is selling the property, etc.

These tasks are all examples of more complex sequence extraction problems. They typically have:

  • Annotations that vary widely in length. From just a few characters (potentially causing label clashes) to multiple paragraphs (which have their own challenges that we’ll discuss in a moment).
  • Large numbers of possible class labels. This presents a UX challenge: how best to show more labels than can fit on screen whilst keeping each class visually distinct.
  • A very high density of annotations, which must be handled without their labels overlapping or becoming unreadable.
  • Users that need to collaborate with each other to rapidly reach consensus. This is vital to achieve high quality data.

Machine learning is a "garbage in - garbage out" technology, so very high data quality is incredibly important. If the data is inconsistently labelled, it'll dramatically reduce the performance of any ML model trained on that data.

Many existing tools aren't well suited to collecting high quality text annotations in more complex settings. So let's make something that can.

There are subtle requirements to a truly excellent labelling interface

As a JavaScript engineer, an NER text annotation interface seems very simple: a series of inline text elements, showing the classified and unclassified parts of a document in order. The classified text needs a label name attached to it.

Simple right? Let's think through some other requirements before we build.

  • The document must not move while the user is annotating - many NER text annotation interfaces ignore this requirement, with poor results: inserting label names next to annotations makes all the subsequent text reflow to fit the new label. Users read as they label, and it's frustrating to have a document jump around while you're reading it.
  • The document must be able to be read naturally. Labels between words [feature] prevent [hassle] the users [identity] from reading [activity] the sentence.
  • Our interface should handle long annotations. Because text flow is controlled by a user’s window size, almost any multi-word annotations may cover multiple lines. Labels must be positioned correctly and consistently, even though an annotation may start at the very end of one line of text and finish at the very beginning of another line.
  • Our interface should handle short annotations. Even if they’re close together and have long label names.
  • Labels need to be readable and distinct between classes. Black text on a dark blue background won't cut it. Nor will asking users to assign readable colors to every one of their classes - people's time is valuable.
  • Users will need to collaborate. Annotators will need to be able to discuss data points and classes they’re unsure about with each other and with project managers.

If you've been following closely, a particular realisation may be dawning on you:

Unlike traditional web development, where content and user interface are separate from each other, a good text annotation interface is more akin to Google Docs or Office 365 - a document-focused, creation (via labelling) and collaboration environment.

Let's get to work. Here are some of the key considerations:

1. Label positioning

As we've mentioned, text cannot move about while the user is annotating. Simple NER tools often insert labels between words, which causes reflow.

A NER interface that puts labels in between words ... 
… adding new labels makes the entire document shift. As documents become longer this becomes increasingly annoying to users.

One naive approach to label positioning would be to replace the inline elements used for multi-line annotations (like span) with block elements (like p), because blocks are easier to label. While this guarantees label positioning, it also means all subsequent text will move and reflow onto new lines. As we've mentioned, users don't want a document moving when they're reading it.

Instead, we want labels to be in a consistent place, in relation to the annotated text, regardless of the length of the annotation.

A good solution to this is the Rect (rectangles) API, which allows you to determine where a specific element is drawn on screen. It's unlikely you've used Rect before (unless you've written a span tagging interface or played around with text selection) so let's explain.

If you have an inline element that, due to the size available, flows across two lines:

then you'll need two rectangles. The Rect API will tell you the coordinates of those two rectangles.

Now for some sad news:

Rects are always determined relative to the viewport, whereas you'll want to position the labels relative to the document element.

This means you'll have to write code that converts between two coordinate systems - the viewport's, and that of the label's positioned ancestor in the DOM tree.

// Return coordinates that can be used to absolutely position
// a label inside a 'relative' positioned parent
export function getLabelCoordinates(spanElement) {
  // rects are relative to viewport
  // See https://developer.mozilla.org/en-US/docs/Web/API/Element/getClientRects
  const rects = spanElement.getClientRects();
  // So we can position our new label relative to its positioned ancestor,
  // rather than the viewport, find out the position of the positioned ancestor
  // so we can subtract it!
  const positionedAncestor = spanElement.offsetParent;
  // A hidden element has no rects and no offsetParent, so there is nothing to measure
  if (rects.length === 0 || !positionedAncestor) {
    return null;
  }
  // An annotation that wraps has one rect per line; the label goes under the last one
  const lastRect = rects[rects.length - 1];
  const positionedAncestorRect = positionedAncestor.getBoundingClientRect();
  const top = Math.round(lastRect.bottom - positionedAncestorRect.top);
  // This starts the label diagonally underneath the last rectangle (we'll slide it back in Annotation.jsx)
  const left = Math.round(
    lastRect.left - positionedAncestorRect.left + lastRect.width
  );
  const lastRectWidth = Math.round(lastRect.width);
  return {
    top,
    left,
    lastRectWidth,
  };
}


We can use these to ensure that labels consistently appear at the end of the annotated text, regardless of where the text itself is positioned. We'll also need to run this code on events that reflow the text, like font loads, window resizes and other UI changes, to ensure the labels are always perfectly positioned.

The same document as above, with a slightly wider window. Using the 'Rects' API ensures that labels are always correctly positioned regardless of where the associated text is drawn on screen.
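
The re-measuring on reflow described above could be wired up roughly as follows. This is a minimal sketch, not a definitive implementation: watchForReflow and repositionLabels are hypothetical names, and env stands in for the browser window so that the function can also be exercised outside a browser. It batches window resizes and font loads into at most one re-measure per animation frame.

```javascript
// Sketch: re-run label positioning when something may have reflowed the text.
// `repositionLabels` is a hypothetical callback that calls getLabelCoordinates
// for every annotation and applies the results.
// `env` is assumed to look like the browser `window`.
function watchForReflow(repositionLabels, env = globalThis) {
  let pending = false;

  const schedule = () => {
    if (pending) return; // a re-measure is already queued for the next frame
    pending = true;
    env.requestAnimationFrame(() => {
      pending = false;
      repositionLabels();
    });
  };

  // Window resizes change where lines wrap.
  env.addEventListener("resize", schedule);

  // Web fonts change glyph widths once they have finished loading.
  if (env.document && env.document.fonts) {
    env.document.fonts.ready.then(schedule);
  }

  // Return a cleanup function for when the document view goes away.
  return () => env.removeEventListener("resize", schedule);
}
```

Other UI changes that move text (a sidebar opening, for example) would need to call the same scheduling logic; how you hook those in depends on your framework.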

2. Highlight colors

Making distinct label colors

We need to highlight our annotations, but highlight colors must ensure that text can still be read against the background.

The ideal colors here are pastel shades - they'll vary in hue, but are consistently light in tone. Thankfully, CSS hsla() colors let us vary the hue while holding saturation and lightness at values that keep the text readable.

The following code:

  • Converts every string into a number
  • Takes the remainder after dividing that number by 360 to get a hue (on a 360 degree color wheel), so the same string always gives the same hue
  • Uses the result for the hue part of an hsla() CSS color.

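// Convert a string to a hue between 0 and 359
export function stringToHueNumber(string) {
  // Treat the string's UTF-8 bytes as one big base-256 number and take it
  // modulo 360, reducing at every step so the value always stays small.
  // TextEncoder is available in browsers and Node (Buffer is Node-only).
  const bytes = new TextEncoder().encode(string);
  let hue = 0;
  for (const byte of bytes) {
    hue = (hue * 256 + byte) % 360;
  }
  return hue;
}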

// Given a string, return a consistent pastel color (the same string always gets the same hue)
export function stringToHSLA(string) {
  const pastelColor = stringToHueNumber(string);
  const BACKGROUND_SATURATION = 94;
  const BACKGROUND_LIGHTNESS = 86;
  const TEXT_SATURATION = 100;
  const TEXT_LIGHTNESS = 21;
  return {
    backgroundColor: `hsla(${pastelColor}deg, ${BACKGROUND_SATURATION}%, ${BACKGROUND_LIGHTNESS}%, 1)`,
    color: `hsla(${pastelColor}deg, ${TEXT_SATURATION}%, ${TEXT_LIGHTNESS}%, 1)`,
  };
}

The results ensure that highlight colors are always readable, and that the same color can be consistently determined from a class name.

We can handle arbitrary labels by varying the hue based on the label's name, keeping other color values consistent to ensure the text is still readable.
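
To check the readability claim rather than take it on trust, one option is to compute the contrast ratio between the text color and the highlight color for every possible hue. The sketch below uses the WCAG 2.x relative luminance and contrast ratio formulas with the saturation and lightness values from stringToHSLA above. The function names are hypothetical, and comparing against WCAG's 4.5:1 threshold for normal text is our assumption about what counts as "readable".

```javascript
// Convert an HSL color (h in degrees, s and l as percentages)
// to sRGB channels in the range [0, 1].
function hslToRgb(h, s, l) {
  const sat = s / 100;
  const light = l / 100;
  const chroma = (1 - Math.abs(2 * light - 1)) * sat;
  const hPrime = (((h % 360) + 360) % 360) / 60; // which sixth of the wheel
  const x = chroma * (1 - Math.abs((hPrime % 2) - 1));
  const m = light - chroma / 2;
  const sectors = [
    [chroma, x, 0], [x, chroma, 0], [0, chroma, x],
    [0, x, chroma], [x, 0, chroma], [chroma, 0, x],
  ];
  return sectors[Math.floor(hPrime)].map((channel) => channel + m);
}

// WCAG 2.x relative luminance of an sRGB color with channels in [0, 1].
function relativeLuminance([r, g, b]) {
  const linear = (c) =>
    c <= 0.03928 ? c / 12.92 : ((c + 0.055) / 1.055) ** 2.4;
  return 0.2126 * linear(r) + 0.7152 * linear(g) + 0.0722 * linear(b);
}

// Contrast ratio between label text and its highlight for one hue,
// using the same saturation/lightness constants as stringToHSLA.
function labelContrast(hue) {
  const background = relativeLuminance(hslToRgb(hue, 94, 86));
  const text = relativeLuminance(hslToRgb(hue, 100, 21));
  const lighter = Math.max(background, text);
  const darker = Math.min(background, text);
  return (lighter + 0.05) / (darker + 0.05);
}

// Worst case across the whole color wheel, in whole-degree steps.
function worstLabelContrast() {
  let worst = Infinity;
  for (let hue = 0; hue < 360; hue += 1) {
    worst = Math.min(worst, labelContrast(hue));
  }
  return worst;
}
```

If worstLabelContrast() stays at or above 4.5, every generated label clears that threshold, whatever the class name. If you change the saturation or lightness constants, re-run the check.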

3. Typography - every character is a click target

My own first impression of text annotation is that it should look like high quality reading-focused typography. Think Substack, Medium or the New York Times. Variable width fonts are typically considered better for reading.

However, reading isn't the only activity going on here - each character in an annotation interface is also a click target for the mouse. Sometimes words are a single character. Users may also need to select characters within words in some cases, for example in medical data or in languages with compound words.

Microsoft mentions:

Make click targets at least 16x16 pixels so that they can be easily clicked

A variable width font won't meet those requirements:

Thin characters like i and l are difficult to select

A monospace font and more relaxed word spacing will:

Even thin characters like i and l are easily selectable

A little CSS typography and we've fixed it:

article {
  word-spacing: 0.5em;
  font-size: 1.125rem;
  line-height: 1.75rem;
  font-family: ui-monospace, SFMono-Regular, Menlo, Monaco, Consolas,
    'Liberation Mono', 'Courier New', monospace;
}

4. Handling label contention

When you have many small spans of text annotated near each other (a common example is postal addresses), you'll get labels overrunning each other.

So, when text shorter than a particular threshold is labelled:

  • We'll use a smaller font for these labels, and use the extra space to split words over two lines
  • We'll use an ellipsis '...' to truncate the label after that

Now for some bad news.

CSS (officially) can't yet truncate multiple lines.

However, the CSS line clamp property (-webkit-line-clamp) provides a way to do exactly this, truncating text after a fixed number of lines (in our case, two).

.clamped {
  overflow: hidden;
  display: -webkit-box;
  -webkit-box-orient: vertical;
  -webkit-line-clamp: 2;
}

We can use this to produce labels that avoid overlapping when used on small spans of text.

When smaller spans of text are labelled, we can split words and use the CSS line clamp spec to avoid labels clashing.
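
Deciding when a label gets the compact treatment could look something like the sketch below. Everything here is an assumption for illustration: the function name, the CSS class names (other than clamped, from the CSS above) and the per-character width estimate are hypothetical and would need tuning to your own typography. It takes the lastRectWidth value returned by getLabelCoordinates earlier.

```javascript
// Assumed average width, in pixels, of one character in the label font.
// Hypothetical value: measure your own font rather than trusting this number.
const APPROX_LABEL_CHAR_WIDTH_PX = 7;

// Pick CSS classes for a label based on whether its name fits
// under the annotated text it belongs to.
function getLabelClassName(labelName, lastRectWidth) {
  const estimatedLabelWidth = labelName.length * APPROX_LABEL_CHAR_WIDTH_PX;

  // The label fits under the annotated text: render it normally.
  if (estimatedLabelWidth <= lastRectWidth) {
    return "label";
  }

  // Otherwise switch to the smaller font and the two-line clamp,
  // so a neighbouring label isn't overrun.
  return "label label--compact clamped";
}
```

For the compact style to wrap inside a word, the hypothetical label--compact class would also need to set a maximum width and allow breaks within words.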

5. The rest

To really make the annotation interface sing there are a bunch of auxiliary features that it's important to get right:

  • Maintain focus on selections: Our interactive span tagging UI will take focus away from the selection, which normally hides it. We'll need to replicate the HTML selection on the page while the user picks a label. This is fairly easily done though - get the selection anchor and focus (where the selection was started and where it finished - keeping in mind users can select backwards!), and calculate the character offsets. Then get the range of each selection, get the rectangles for each range, and draw elements in the same places (converting the coordinates using the same trick we used for labels above).
Since the label selector input is the active element, we'll need to create our own selection that still displays
  • Taxonomy Editing: Working out the right classes for a machine learning problem is highly iterative. Sometimes annotators will think of new classes while they're editing documents. We should be able to add and remove classes right from the labelling interface - and if we're removing them we'll need to warn users how many annotations are affected. Of course not every user needs the ability to edit labels, so we'll need to control that.
  • Collaboration Tools: Sometimes annotators aren't sure about a data point. Perhaps the existing labels don't describe the data point well, perhaps the data point itself has some kind of noise or other distraction. We'll need to record when and why users can't make a decision.
  • Model Predictions: It may be desirable to show predictions from an ML model to users - to allow them to quickly accept/reject the predictions:
Underlines suggest possible labels to annotators to help them move faster
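
The selection-replication step from the first bullet above can be sketched as follows. This is a minimal sketch under assumptions, not a definitive implementation: the function names are hypothetical, container is assumed to be the 'relative' positioned element that holds the document text, and the selection is assumed to lie entirely inside it. Note that a Range always runs from start to end, so backwards selections need no special handling once we have the range.

```javascript
// Convert viewport-relative rects into coordinates relative to a
// positioned container (the same trick used for label positioning).
function toContainerCoordinates(rects, containerRect) {
  return Array.from(rects, (rect) => ({
    top: Math.round(rect.top - containerRect.top),
    left: Math.round(rect.left - containerRect.left),
    width: Math.round(rect.width),
    height: Math.round(rect.height),
  }));
}

// Browser-only: describe the current text selection inside `container`
// as character offsets plus the rectangles needed to redraw it.
function describeSelection(container) {
  const selection = window.getSelection();
  if (!selection || selection.rangeCount === 0 || selection.isCollapsed) {
    return null; // nothing is selected
  }
  const range = selection.getRangeAt(0);

  // Count the characters between the start of the container and the
  // start of the selection to get a document-level character offset.
  const prefix = document.createRange();
  prefix.selectNodeContents(container);
  prefix.setEnd(range.startContainer, range.startOffset);
  const start = prefix.toString().length;
  const end = start + range.toString().length;

  // One rect per line (or line fragment) that the selection covers.
  const rects = toContainerCoordinates(
    range.getClientRects(),
    container.getBoundingClientRect()
  );
  return { start, end, rects };
}
```

Each entry in rects can then be drawn as an absolutely positioned element inside container, which keeps the selection visible while the label selector input has focus.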

Getting the details right is hard

We hope that was a useful guide to some of the many aspects of building a text annotation UI.

It should also give you an idea of some of the considerations - avoiding reflows during editing, handling long annotations, large numbers of classes, consistent labelling, readability, and collaboration - that you should keep in mind when building your own span tagging interface, or looking at existing solutions.

At Humanloop we've spent considerable time working with real customers on a text annotation interface that works well with complex documents and large datasets. Our active learning platform dramatically reduces the amount of annotation required. Interested? Book a demo.