When a document name is translated to a URL path segment, the following rules are used:

  • Control characters (ASCII / Unicode / ISO/IEC 8859-1 codes below 0x1f and between 0x80 and 0x9f, such as tab, carriage return, line feed, end-of-text and bell) are removed;
  • A non-breaking space is converted into a regular space;
  • Multiple spaces are merged into a single space;
  • Spaces are converted to hyphens;
  • All common printable characters (ASCII / Unicode / ISO/IEC 8859-1 codes in the range 0x21 to 0x7f) are used as-is, unless table 1 specifies a replacement;
  • All letters from the Unicode Latin-1 Supplement block are converted to their lowercase base letter, e.g. 0x00c0 (Latin Capital Letter A with grave) is converted to "a";
  • All letters from the Unicode Latin Extended-A block are converted to their lowercase base letter, e.g. 0x0100 (Latin Capital Letter A with macron) is converted to "a";
  • Other special characters (ASCII / Unicode / ISO/IEC 8859-1 codes outside the range 0x21 to 0x7f) that are not listed in table 2 or table 3 are converted to lowercase;
  • Leading spaces and ending spaces are removed.

All rules and translation tables are applied, not just the first matching rule. So if one rule converts a character into a space, and another rule converts spaces to hyphens, then the character ends up as a hyphen.
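
The rules above can be sketched in Python as follows. This is a minimal illustration, not the actual implementation: the names to_url_segment, translate_char, TABLE_1 and TABLE_2 are hypothetical, the two dictionaries hold only a few rows of the tables below, the order in which the rules are applied is an assumption, and Unicode NFKD decomposition is used as an approximation of "converted to their lowercase base letter".

```python
import re
import unicodedata

# Excerpts only (hypothetical names); the full mappings are in tables 1 and 2.
TABLE_1 = {
    "!": "", "?": "", ",": "", "(": "", ")": "",
    "$": "usd", "@": "-at-",
    "*": " ", "+": " ", ":": " ", ";": " ",
    "/": "-", "=": "-", "|": "-", "~": "-",
}
TABLE_2 = {"£": "gbp", "¥": "yen", "ß": "ss", "æ": "ae", "Ø": "o", "ø": "o"}


def is_control(ch):
    # Assumption: 0x1f itself is treated as a control character too.
    code = ord(ch)
    return code <= 0x1F or 0x80 <= code <= 0x9F


def to_base_letter(ch):
    # Approximation: decompose, drop combining marks, then lowercase.
    decomposed = unicodedata.normalize("NFKD", ch)
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    return base.lower()


def translate_char(ch):
    if is_control(ch):
        return ""                      # control characters are removed
    if ch == "\u00a0":
        return " "                     # non-breaking space -> regular space
    if ch in TABLE_1:
        return TABLE_1[ch]
    if ch in TABLE_2:
        return TABLE_2[ch]
    code = ord(ch)
    if ch == " " or 0x21 <= code <= 0x7F:
        return ch                      # common printable characters: as-is
    if 0x00C0 <= code <= 0x017F and ch.isalpha():
        return to_base_letter(ch)      # Latin-1 Supplement / Latin Extended-A
    return ch.lower()                  # other special characters


def to_url_segment(name):
    text = "".join(translate_char(ch) for ch in name)
    text = re.sub(r" +", " ", text)    # merge multiple spaces
    # Assumption: trim before hyphenating, so edge spaces do not become hyphens.
    text = text.strip(" ").rstrip(".").rstrip(" ")
    return text.replace(" ", "-")      # spaces become hyphens


print(to_url_segment("  Café  Crème\t"))  # Cafe-Creme
print(to_url_segment("Price: $5"))        # Price-usd5
```

In the second call the colon first becomes a space, the resulting double space is merged, and the remaining space becomes a hyphen, which is the chaining of rules described above. Plain ASCII letters keep their case because the rules state they are used as-is.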

The following translation tables for printable characters are used:

table 1: special handling of some regular printable characters

  Character   Translation
  !           removed
  "           removed
  #           removed
  $           usd
  %           removed
  &           removed
  '           removed
  (           removed
  )           removed
  *           converted into space
  +           converted into space
  ,           removed
  -           converted into hyphen
  .           removed at end; otherwise not changed
  /           converted into hyphen
  :           converted into space
  ;           converted into space
  <           removed
  =           converted into hyphen
  >           removed
  ?           removed
  @           -at-
  {           removed
  |           converted into hyphen
  }           removed
  ~           converted into hyphen

table 2: ISO 8859-1 special characters

  Character            Translation
  ¡                    removed
  ¢                    ct
  £                    gbp
  ¤                    removed
  ¥                    yen
  ¦                    -
  §                    removed
  ¨                    removed
  ©                    removed
  ª                    removed
  «                    removed
  ¬                    removed
  0xad (soft hyphen)   -
  ®                    removed
  ¯                    -
  °                    removed
  ±                    -
  ²                    removed
  ³                    removed
  ´                    removed
  µ                    removed
  ¶                    removed
  ·                    removed
  ¸                    removed
  ¹                    removed
  º                    removed
  »                    removed
  ¼                    removed
  ½                    removed
  ¾                    removed
  Ð                    d
  Ø                    o
  Ù                    u
  Ú                    u
  Û                    u
  Ü                    u
  Ý                    y
  Þ                    y
  ß                    ss
  à                    a
  á                    a
  â                    a
  ã                    a
  ä                    a
  å                    a
  æ                    ae
  ç                    c
  è                    e
  é                    e
  ê                    e
  ë                    e
  ì                    i
  í                    i
  î                    i
  ï                    i
  ð                    d
  ñ                    n
  ò                    o
  ó                    o
  ô                    o
  õ                    o
  ö                    o
  ÷                    removed
  ø                    o
  ù                    u
  ú                    u
  û                    u
  ü                    u
  ý                    y
  þ                    y
  ÿ                    y

table 3: translation of Unicode characters with UTF-8 encodings above 0xc200

  Code (hex)   Translation
  c2a1         removed
  c2a2         ct
  c2a3         gbp
  c2a4         removed
  c2a5         yen
  c2a6         removed
  c2a7         removed
  c2a8         removed
  c2a9         removed
  c2aa         removed
  c2ab         removed
  c2ac         removed
  c2ad         -
  c2ae         removed
  c2af         -
  c2b0         removed
  c2b1         removed
  c2b2         removed
  c2b3         removed
  c2b4         removed
  c2b5         removed
  c2b6         removed
  c2b7         removed
  c2b8         removed
  c2b9         removed
  c2ba         removed
  c2bb         removed
  c2bc         removed
  c2bd         removed
  c2be         removed
  c2bf         removed
  c380         a
  c381         a
  c382         a
  c383         a
  c384         a
  c385         a
  c386         ae
  c387         c
  c388         e
  c389         e
  c38a         e
  c38b         e
  c38c         i
  c38d         i
  c38e         i
  c38f         i
  c390         d
  c391         n
  c392         o
  c393         o
  c394         o
  c395         o
  c396         o
  c397         x
  c398         o
  c399         u
  c39a         u
  c39b         u
  c39c         u
  c39d         y
  c39e         y
  c39f         ss
  c3a0         a
  c3a1         a
  c3a2         a
  c3a3         a
  c3a4         a
  c3a5         a
  c3a6         ae
  c3a7         c
  c3a8         e
  c3a9         e
  c3aa         e
  c3ab         e
  c3ac         i
  c3ad         i
  c3ae         i
  c3af         i
  c3b0         d
  c3b1         n
  c3b2         o
  c3b3         o
  c3b4         o
  c3b5         o
  c3b6         o
  c3b7         removed
  c3b8         o
  c3b9         u
  c3ba         u
  c3bb         u
  c3bc         u
  c3bd         y
  c3be         y
  c3bf         y
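
The hexadecimal codes in table 3 match the two-byte UTF-8 encodings of the corresponding characters, e.g. c2a3 for "£" and c39f for "ß". The short Python sketch below shows how a character could be looked up by that code. The names utf8_hex, TABLE_3_EXCERPT and translate_by_utf8 are hypothetical, and the dictionary holds only three rows of the table.

```python
def utf8_hex(ch):
    # Hexadecimal form of the character's UTF-8 encoding, e.g. "£" -> "c2a3".
    return ch.encode("utf-8").hex()


# Excerpt only (hypothetical name); the full mapping is table 3.
TABLE_3_EXCERPT = {"c2a3": "gbp", "c39f": "ss", "c3a6": "ae"}


def translate_by_utf8(ch):
    # Returns None when the character is not in the excerpt.
    return TABLE_3_EXCERPT.get(utf8_hex(ch))


print(utf8_hex("£"))           # c2a3
print(translate_by_utf8("ß"))  # ss
```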