Some Heading

This is a simple HTML article.

Text can be nested in HTML tags. Runs of whitespace are collapsed, but we try to keep line breaks (\n). Sentences can span multiple
lines.
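The whitespace rule above could be sketched roughly as follows. This is a minimal illustration, not the article's actual implementation; the function name `collapse_whitespace` is hypothetical, and it assumes that "whitespace" means spaces, tabs and similar horizontal characters, while `\n` is the only line break to preserve.

```python
import re


def collapse_whitespace(text: str) -> str:
    """Collapse runs of spaces and tabs to one space, but keep line breaks."""
    lines = text.split("\n")
    # Collapse horizontal whitespace inside each line, then trim the line's edges.
    cleaned = [re.sub(r"[ \t\r\f\v]+", " ", line).strip() for line in lines]
    return "\n".join(cleaned)
```

Splitting on `\n` first keeps every line break in place, so a sentence that spans two lines still spans two lines after cleaning.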
We should be able to split sentences that contain: 1. multiple dots and other punctuation 2. lists and other things ... but still form just one sentence.
However, this sentence splitting does not have to be perfect, e.g. detecting some artifacts as sentences is still acceptable. With the TextRank algorithm and other plausibility checks (e.g. POS checking with spaCy) we should be able to filter these out.
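As a rough sketch of the kind of splitting and filtering described above, the following uses only the standard library. All names here (`ABBREVIATIONS`, `split_sentences`, `looks_like_sentence`) are hypothetical, the abbreviation list is an assumption, and the word-count check is only a crude stand-in for a real plausibility check such as POS tagging; it is not TextRank and does not use spaCy.

```python
import re

# Assumed list of abbreviations that should not end a sentence.
ABBREVIATIONS = {"e.g.", "i.e.", "etc.", "vs."}


def split_sentences(text: str) -> list[str]:
    """Naively split text into sentences; sentences may span several lines."""
    text = " ".join(text.split())  # join lines and collapse whitespace
    sentences = []
    start = 0
    # A boundary is end punctuation, whitespace, then an uppercase letter.
    for match in re.finditer(r"[.!?]+\s+(?=[A-Z])", text):
        candidate = text[start:match.end()].strip()
        last_token = candidate.split()[-1].lower()
        # Skip abbreviations ("e.g.") and list numbering ("1.").
        if last_token in ABBREVIATIONS or re.fullmatch(r"\d+\.", last_token):
            continue
        sentences.append(candidate)
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


def looks_like_sentence(candidate: str, min_words: int = 3) -> bool:
    """Crude plausibility check: require a minimum number of alphabetic words."""
    return len(re.findall(r"[A-Za-z]+", candidate)) >= min_words
```

Because a boundary requires an uppercase letter after the punctuation, list markers such as "1. multiple" and an ellipsis followed by a lowercase word do not split the sentence. Short artifacts that slip through, such as leftover link text, can then be dropped with `[s for s in split_sentences(text) if looks_like_sentence(s)]`.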
Here's a list of things, let's see how this splits: