displaCy: Dependency parse tree visualization with CSS

My latest project started out as a web development job but quickly turned into something much bigger: displaCy, a visualiser for dependency parse trees (grammatical structure) using the spaCy API. It's super lightweight and built entirely with CSS, some basic HTML and a bit of JavaScript. This post is about the development process and how to make the most of CSS to visualise data in a fun and flexible way. (And don't worry, I'll also try to explain a bit of that computational linguistics stuff.)

What is spaCy?

spaCy is a library for Natural Language Processing (NLP). The basics are actually quite simple: You can feed spaCy all kinds of texts and it will tell you a lot about those texts in basically no time. For example, it can parse a sentence and tell you everything you want to know about its grammatical structure. This may sound quite abstract and academic (I still have vivid memories of having to draw a bunch of parse trees in uni) and maybe the significance of this is not instantly obvious. But think about it this way:

Computers don’t understand text. This is unfortunate, because that’s what the web almost entirely consists of. We want to recommend people text based on other text they liked. We want to shorten text to display it on a mobile screen. We want to aggregate it, link it, filter it, categorise it, generate it and correct it.

So basically, once your computer is able to accurately tag the parts of speech (i.e. knows what's a verb, a noun, an adjective and so on) and the dependencies (i.e. knows that noun A is subject of verb B), you can use this data and build really cool things, like apps, web services or advanced analytics tools that actually, more or less, understand language. If you want to know more about the theory behind this and how to make a computer learn, check out this Visual Introduction to Machine Learning.

displaCy in a nutshell

What started as a little test to find the best way to present the dependency parse tree examples on the spaCy website quickly turned into its own project. Most other existing visualisation tools create more or less static images or SVGs and end up more and more confusing the longer the sentence gets. Plus, the output usually looks like your average academic textbook or – no offence – a Flash application from the 90s. displaCy is different and uses only CSS, HTML and JavaScript.

What makes displaCy special

  • lightweight and incredibly small: the gzipped core is less than 3kb
  • HTML, CSS & JS only: 100% client-side if necessary. If spaCy runs locally, it can be used without an internet connection and the current DOM can be copied for a static HTML file
  • cross-browser compatible: can be accessed in any (modern) browser and even looks decent on smartphones
  • web standards compatible: produces no weird or invalid markup, doesn't mess with anything and can easily be extended

How it works

The idea behind it is this: In a dependency parse tree, you need to be able to connect all words in all possible ways to describe every possible relation between them. For a sentence with five words, it would look something like this:

For 5 words, we need 4 short arrows of length 1 to connect them to their direct neighbours, 3 arrows of length 2 to connect them to their second next neighbour and so on. Or a little more abstract: For a sentence of n words, we need n - 1 arrows to connect all words to their neighbours – this is what I call level 1. On level 2, an arrow spans over 2 arrows on level 1 to connect the words to their second next neighbours. In total, we can potentially have up to n - 1 levels. Level 1 contains n - 1 arrows, level 2 n - 2, level 3 n - 3 and so on.
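The counting is easy to double-check in a few lines of JavaScript. This is only a sketch: the helper name arrowsPerLevel is made up for illustration and is not part of displaCy.

```javascript
// Hypothetical helper (not part of displaCy): number of possible
// arrows on each level for a sentence of n words.
function arrowsPerLevel(n) {
  const levels = [];
  // Levels run from 1 to n - 1; level k holds n - k arrows.
  for (let level = 1; level < n; level++) {
    levels.push({ level, arrows: n - level });
  }
  return levels;
}

const levels = arrowsPerLevel(5);
console.log(levels.map((l) => l.arrows)); // [ 4, 3, 2, 1 ]

// Total number of possible arrows: n * (n - 1) / 2
const total = levels.reduce((sum, l) => sum + l.arrows, 0);
console.log(total); // 10
```

So a five-word sentence needs room for at most 10 arrows spread over 4 levels.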

To display the example above, I put together this simple HTML structure:

<div class="arrows">
  <div class="level level1">
    <span class="arrow"></span>
    <span class="arrow"></span>
    <span class="arrow"></span>
    <span class="arrow"></span>
  </div>

  <div class="level level2">
    <span class="arrow"></span>
    <span class="arrow"></span>
    <span class="arrow"></span>
  </div>

  <div class="level level3">
    <span class="arrow"></span>
    <span class="arrow"></span>
  </div>

  <div class="level level4">
    <span class="arrow"></span>
  </div>
</div>

<div class="words">
  <span>word one</span>
  <span>word two</span>
  <span>word three</span>
  <span>word four</span>
  <span>word five</span>
</div>

The API

In order to visualise a sentence, we need to feed it to the spaCy API, which can be set up to return pretty much anything we need. So Matt set it up to return JSON-formatted data that contains the words and arrows in separate objects. Using JavaScript, those can be turned into HTML markup similar to the example above. If an arrow is used, it gets the class left or right to style it and set the direction. If it's unused, it's set to display: none. That way, we are able to keep a flexible structure and calculate size and positions using only CSS.
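To give a rough idea of that step, here is a minimal sketch that turns parse data into the markup structure shown earlier. It is not the actual displaCy code: the shape of the parse object (words, arrows, level, slot, dir, label) and the function names are assumptions made up for this example, and the real API response may look quite different.

```javascript
// ASSUMPTION: this data shape is invented for illustration; the real
// spaCy API output may look different.
const parse = {
  words: [{ text: "She" }, { text: "likes" }, { text: "cats" }],
  // level = distance between the connected words,
  // slot = position of the arrow on that level (1-based)
  arrows: [
    { level: 1, slot: 1, dir: "left", label: "nsubj" },
    { level: 1, slot: 2, dir: "right", label: "dobj" },
  ],
};

// Escape text before putting it into markup.
function escapeHtml(s) {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/"/g, "&quot;");
}

// Hypothetical renderer: builds every possible arrow slot and hides
// the ones the parse does not use.
function renderMarkup({ words, arrows }) {
  const n = words.length;
  let html = '<div class="arrows">';
  for (let level = 1; level < n; level++) {
    html += `<div class="level level${level}">`;
    for (let slot = 1; slot <= n - level; slot++) {
      const used = arrows.find((a) => a.level === level && a.slot === slot);
      html += used
        ? `<span class="arrow ${used.dir}" title="${escapeHtml(used.label)}"></span>`
        : '<span class="arrow" style="display: none"></span>';
    }
    html += "</div>";
  }
  html += '</div><div class="words">';
  for (const w of words) html += `<span>${escapeHtml(w.text)}</span>`;
  return html + "</div>";
}

console.log(renderMarkup(parse));
```

Because every slot is always rendered, the :nth-child positions stay stable no matter which arrows a sentence actually uses.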

I'm skipping the detailed explanation of the JavaScript code because it really is quite simple and mostly a bunch of loops iterating over arrows and words, calculating things here and there and adding stuff to the DOM (I might make a separate post about this later). The CSS is where the real magic happens.

The basic CSS

It all starts with the integral part of the whole thing: the arrow. I knew it was possible to draw all kinds of shapes with CSS so I gave it a quick go on Codepen and it came together surprisingly fast. The arrow needs only one HTML element, <div class="arrow"> and the CSS pseudo-elements :before and :after. See the example here:

The :before pseudo-element is used for the arc and is essentially a circle (border-radius: 50%) with a 250px diameter and a black outline. Since its parent .arrow is only half its height and set to overflow: hidden, it’s "cut in half" and ends up looking like a half circle.

For the arrow head, I used the :after pseudo-element and a simple upside-down CSS triangle shape. The element itself is 0px wide and 0px high, so basically nonexistent – and that’s the trick. Only its borders are visible: the solid top border (border-top: 20px solid black) gives the triangle its height of 20px, and because the element has no width, the transparent borders on the left and right cut the top border down to a point at the bottom. To attach the arrow head to the arc, it’s set to position: absolute, bottom: 0 and right: 0.

Here's a dynamic SCSS version of two arrows (left and right) that shows how easy it is to change an arrow's direction. To play around with it, simply change the variables at the top of the SCSS code. It's not perfect but it shows another trick: The arc doesn't have the full length but $length - ($head/2 - $thickness) – if you take half of the arrow head's width and the thickness of the arc off, the arc aligns perfectly in the middle of the arrow.
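To see what that expression works out to, here it is as a tiny JavaScript function. The function name and the sample values (a 250px arrow, a 20px wide head and a 2px arc) are assumptions for illustration; only the formula itself comes from the SCSS.

```javascript
// Mirrors the SCSS expression $length - ($head / 2 - $thickness).
// All values are in pixels; the sample numbers below are assumptions.
function arcWidth(length, head, thickness) {
  return length - (head / 2 - thickness);
}

console.log(arcWidth(250, 20, 2)); // 242
```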

The complex CSS

Based on the HTML markup above, I came up with the first draft to connect the arrows and words. I wrote this in SCSS which was a tremendous help in maintaining a neat structure and keeping track of the numbers and calculations. I have no idea how I managed to survive before CSS preprocessors. To view the final compiled CSS, click "View compiled" in the SCSS tab. (And even if you're not interested in this particular use of the CSS arrows – you can make pretty cool patterns with them, too.)

To make this happen in CSS, all levels have an absolute position and are stacked on top of each other. In order to start the first arrow in the middle of the first word, all levels need a left offset of half of the word's width, i.e. 100% / n / 2. The second child of each level – :nth-child(2) – starts in the middle of word 2, so it gets an additional offset of the word's width, 100% / n. The third child is moved even further along to word 3, so :nth-child(3) receives an offset of 100% / n * 2. The list goes on.
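Worked through for five words, the numbers look like this. The helper below is a hypothetical sketch for checking the arithmetic, not code from displaCy.

```javascript
// Hypothetical helper: left offsets in percent for a sentence of n words.
function arrowOffsets(n) {
  const wordWidth = 100 / n;  // each word takes up 100% / n
  const base = wordWidth / 2; // every level starts mid-word: 100% / n / 2
  const children = [];
  // The k-th arrow of a level is shifted by (k - 1) word widths.
  for (let k = 1; k < n; k++) {
    children.push(wordWidth * (k - 1));
  }
  return { base, children };
}

const { base, children } = arrowOffsets(5);
console.log(base);     // 10
console.log(children); // [ 0, 20, 40, 60 ]
```

So with five words, every level is pushed in by 10%, and the second, third and fourth arrows on a level move a further 20%, 40% and 60%.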

Automating the logic in SCSS

To show how this can be automated, here’s a quick SCSS loop:

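$words: 5; // Number of words
$height: 200px; // Static value
$width: 100% / $words; // Width of one word

@for $i from 1 through $words {
    // Level $i: higher levels are taller and hold wider arrows
    .level#{$i} {
        height: $height * $i;

        .arrow {
            width: #{$width * $i};
        }
    }

    // The $i-th arrow on any level starts ($i - 1) word widths in
    .level .arrow:nth-child(#{$i}) {
        left: #{$width * ($i - 1)};
    }
}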

This little loop creates a class for each level (named level1, level2 and so on) and adds the respective level height and arrow width. It then adds a declaration for every possible arrow position. Note the $i - 1: the first child stays where it is, so the shifting starts with the second child, which is moved by 100% / n * (2 - 1) = 100% / n.

This is obviously just a little drafted example of the basic principle. It took a bit of fiddling and calculating to find the perfect ratios for the arrows to expand and add dynamic paddings for each level to avoid arrow overlapping. A crucial ingredient was the CSS3 calc() function, which can calculate sizes from different units like px and percent (even though it’s supported by pretty much all modern browsers, it’s still fairly underrated). Another helpful strategy was to take advantage of valid native HTML attributes like title, which I added to both arrows and words. This not only makes it incredibly easy to add tags to the arcs via content: attr(title) in CSS, it also adds semantic value.

The main calculations are now done directly in JavaScript and then added to the DOM to avoid redundant markup and make it even more compact and lightweight.
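As a rough idea of what computing such styles in JavaScript could look like, here is a small sketch that mixes percentages and pixels with calc(). The function name and the padding scheme (a fixed number of pixels per level) are assumptions invented for this example; displaCy's actual ratios are not shown in this post.

```javascript
// ASSUMPTION: the padding scheme below is made up for illustration and
// is not displaCy's real calculation.
function arrowStyle(words, level, slot, paddingPx = 10) {
  const wordWidth = 100 / words;   // width of one word in percent
  const inset = paddingPx * level; // pixel padding grows with the level
  // calc() lets the browser combine the percent and pixel parts.
  const width = `calc(${wordWidth * level}% - ${inset * 2}px)`;
  const left = `calc(${wordWidth * (slot - 1)}% + ${inset}px)`;
  return `width: ${width}; left: ${left};`;
}

console.log(arrowStyle(5, 2, 3));
// width: calc(40% - 40px); left: calc(40% + 20px);
```

The resulting string could then be applied to an arrow element, for example via setAttribute("style", ...), so that no per-sentence rules are needed in the stylesheet.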

Using and customising displaCy

Even without any additional CSS, displaCy renders the arrows and words correctly and looks pretty decent. But since it outputs plain HTML with a bunch of descriptive classes and titles, it can be styled in a lot of different ways to highlight various features. The demo comes with a few different themes to show that it is possible to make a dependency parse tree look fun and stylish.

Example: All words have several classes like w-noun, w-verb or w-ent (named entities) which can be used to style the word or its container. In combination with the title, this means that we’re able to style almost anything using CSS selectors and pseudo-elements only. Here's a left-to-right arrow for a nominal subject on level 1 with a red arc and a blue arrow head:

.level1 .arrow.right[title="nsubj"]:before {
  border-color: red;      /* arc */
}
.level1 .arrow.right[title="nsubj"]:after {
  border-top-color: blue; /* arrow head */
}

Interactive features and future plans

Aside from the standard view that simply displays a full sentence, we've also added a bunch of interactive features to show what's going on behind the scenes and visualise how computers "understand" language. The "Step through" mode lets you walk through the annotation process step by step and shows each action in order. The "Annotate" mode lets you annotate the sentence manually – even using your keyboard. This could be incredibly useful for companies that need to annotate text efficiently and require an adaptable solution that doesn't suck.

displaCy is still only a demo but we're planning on implementing new features and making it available as an official addon. If you are interested in spaCy and displaCy, you can follow the development on the spaCy blog.
