The retirement of HTML

  1. HTML is a technology that emerged in the early 1990s, and the current version (4.01) dates back to 1999. Lately, HTML5 has been a big buzzword, but I must say it's overrated and definitely not a new silver bullet. HTML5 defines some new tags, but that's basically it. The vocabulary is still nearly as limited as in the current version of HTML, and vocabulary is only one of HTML's limitations. The time has come for something new.

    The basic function of HTML was to represent text (on the internet). An HTML page defined a document, such as a standalone text or a part of a bigger whole, such as a chapter. The document consisted of a body section for its content, and a header section for meta-data. Additionally, HTML supported hyperlinks and various data structures such as lists and tables.

    Soon, the desire to change appearance emerged. To address this, CSS was born. CSS enabled modifications of how a browser would render the HTML pages, and all was well. Finally we could rid ourselves of ugly borders around an image hyperlink.

    But let's get back to the problem. Eventually, static HTML pages just weren't good enough. We wanted interactive elements and dynamic manipulation of the DOM. Much of this technology did not achieve widespread use before the dawn of AJAX. Once AJAX became the new hype, content on the web was no longer merely documents. Content would still consist of text, of course, but it could also be portals, intranets, applications or even games. These are problems that just don't fit well in the HTML document model. Today, we are using a mix of technologies that doesn't work well with what we want to make, causing a lot of overhead and complexity. Creating a dynamic web page requires a lot of work, and the result is hard to debug.

    I think the first issue we need to solve is the current lack of separation. The web has become home to many things, which can primarily be distilled into two: data and presentation. HTML pages contain both data and presentation structures. CSS can only describe presentation, but it usually does not act alone. Indeed, it's not unusual that significant amounts of HTML code are required only to achieve certain UI requirements. The HTML vocabulary is also quite insufficient, and there's often data duplication. These are problems that should be solved.

    Let's first address data content. In one of my previous posts, I wrote about the semantic web and RDFa. The idea behind the semantic web is that all data on the web should be defined in such a way that machines can read it and connect it to other data. HTML is insufficient for this. For example, a machine cannot determine whether an HTML list represents a menu or a bullet list within a text. We need data sources with semantics. RDF is usually published as separate files (RDF/XML, for example) that describe the data. That duplication is usually not very desirable. RDFa offers a "shortcut" by allowing us to add semantics within an HTML page. Now it's time to retire HTML, however, and invent something better.

    Let's assume we have two data sources. One of them is a queryable source of news on a fictional site, mysite.com. The other represents a menu. In this case, both data sources are in XML format, but they could hypothetically be in any format. Examples of what their contents could be are given below. For the sake of simplicity, namespaces are not used.

    <newsitems>
      <newsitem>
        <name>Newsflash</name>
        <text>Something amazing has happened!</text>
        <time>2010-10-10</time>
        <keywords>
          <keyword>News</keyword>
          <keyword>Flash</keyword>
          <keyword>Amazing</keyword>
        </keywords>
      </newsitem>
      <newsitem>
        <name>Old news</name>
        <text>Something bad happened!</text>
        <time>2010-10-09</time>
        <keywords>
          <keyword>Bad</keyword>
        </keywords>
      </newsitem>
    </newsitems>
    <menuitems>
      <menuitem>
        <name>Home</name>
        <url>http://www.mysite.com</url>
      </menuitem>
      <menuitem>
        <name>About</name>
        <url>http://www.mysite.com/about</url>
      </menuitem>
    </menuitems>

    Having taken the data out of the picture, what we have left is presentation. There is really no need for a "page" anymore. Basically, we must have a way to describe a user interface, and the ability to map elements to data sources. CSS and the DOM are adequate for this purpose, given a couple of extensions.

    First of all we need a way to bind data sources. The base reference to a data source would be a URL and optional parameters. Additionally, there should be a way to define a selection of data within the given data source. Let us think of the data as objects and collections of objects. It would make sense to use programming-like syntax for this. Let's use "." to refer to the current object and "[" and "]" to handle collections. Below are some example references for the news data source. Note that we will have to access the "newsitems" element before we can access its children. We don't really have any interest in the wrapper object, but we need to access it.

    Description         Reference
    Default             ".newsitem"
    First               ".newsitem[0]"
    All                 ".newsitem[0..]"
    First 5             ".newsitem[0..4]"
    First 5 reversed    ".newsitem[4..0]"

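    To make the reference syntax a bit more concrete, here is a minimal Python sketch of how such references might be resolved against the news data. The names resolve, select and STEP are hypothetical, and the range semantics (inclusive bounds, a reversed range when the first index is larger) are my reading of the examples above, not a specification.

```python
import re
import xml.etree.ElementTree as ET

# One step of a reference: ".name", ".name[i]", ".name[a..]" or ".name[a..b]".
STEP = re.compile(r"\.(\w+)(?:\[(\d+)(?:(\.\.)(\d*))?\])?")

def select(items, start, dots, end):
    """Apply the optional bracket part of a step to a list of elements."""
    if start is None:              # ".name": no brackets, keep every match
        return items
    a = int(start)
    if dots is None:               # ".name[i]": a single element
        return items[a:a + 1]
    if end == "":                  # ".name[a..]": from index a to the end
        return items[a:]
    b = int(end)
    if a <= b:                     # ".name[a..b]": inclusive range (assumed)
        return items[a:b + 1]
    return items[b:a + 1][::-1]    # first index larger: same range, reversed

def resolve(current, reference):
    """Resolve a reference relative to one element; returns a list of elements."""
    nodes, pos = [current], 0
    while pos < len(reference):
        m = STEP.match(reference, pos)
        if m is None:
            raise ValueError(f"bad reference: {reference!r}")
        name, start, dots, end = m.groups()
        children = [c for n in nodes for c in n.findall(name)]
        nodes = select(children, start, dots, end)
        pos = m.end()
    return nodes

# A shortened copy of the news data source from this post.
NEWS = """<newsitems>
  <newsitem><name>Newsflash</name>
    <keywords><keyword>News</keyword><keyword>Flash</keyword></keywords></newsitem>
  <newsitem><name>Old news</name>
    <keywords><keyword>Bad</keyword></keywords></newsitem>
</newsitems>"""

root = ET.fromstring(NEWS)  # the "newsitems" wrapper object
names = [e.findtext("name") for e in resolve(root, ".newsitem[4..0]")]
```

    With only two news items in the source, ".newsitem[4..0]" simply yields both of them in reverse order. A real implementation would also have to decide what happens with out-of-range indexes; here they are silently clipped.
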
    Let's introduce two new keywords: "data" and "dataset". The "data" keyword refers to an object or a collection of objects needed to generate a UI element's nested contents. The "dataset" keyword is also such a reference, but without binding it to the nested content.

    Let's define a simple user interface. It should consist of a banner at the top, the main menu of the site and a list of the latest news. The menu and the list are bound to our two data sources. Below is a hypothetical "template" using HTML- and CSS-like vocabulary in a JSON-like notation (strict JSON would require quoted keys and no comments). Notice how nested content bound to a data source using the data keyword is only defined once, but rendered several times.

    {
    
      type        : "document",
      dataset     : "/data/news?i=2.newsitems",
      title       : "Latest news @ Mysite.com",
      description : "The latest news from Mysite.com",
      keywords    : ".newsitem[0..].keywords", //Concatenation
      content     : [
        
        {
          id          : "banner",
          type        : "block",
          width       : "100%",
          text-align  : "center",
          content     : "Mysite.com brings you the latest news"
        },
        
        {
          id          : "mainmenu",
          type        : "orderedlist",
          data        : "/data/mainmenu.menuitems.menuitem",
          content     : [
            {
              type        : "listitem",
              display     : "inline-block",
              content     : [
                {
                  type        : "anchor",
                  href        : ".url",
                  title       : ".name",
                  content     : ".name"
                }
              ]
            }
          ]
        },
        
        {
          id          : "newslist",
          type        : "block",
          data        : ".newsitem",
          content     : [
            {
              type        : "block",
              content     : [
                {
                  type        : "header",
                  content     : ".name"
                },
                {
                  type        : "paragraph",
                  content     : ".text"
                }
              ]
            }
          ]
        }
      
      ]
    }

    Alternatively, we could define it in XML, which in this case actually turns out less verbose. The above template would look something like this:

    <document dataset="/data/news?i=2.newsitems" title="Latest news @ Mysite.com" description="The latest news from Mysite.com" keywords=".newsitem[0..].keywords">
    
      <block id="banner" width="100%" text-align="center">Mysite.com brings you the latest news</block>
      
      <orderedlist id="mainmenu" data="/data/mainmenu.menuitems.menuitem">
        <listitem display="inline-block">
          <anchor href=".url" title=".name" content=".name" />
        </listitem>
      </orderedlist>
      
      <block id="newslist" data=".newsitem">
        <block>
          <header content=".name" />
          <paragraph content=".text" />
        </block>
      </block>
      
    </document>

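    How a browser might turn such a template into a UI tree can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the function names (load, lookup, split_binding, expand), the rule that a binding is split into URL and reference at its first ".", the idea that "dataset" only changes the current object while "data" repeats the nested content, and the shortened template and data. Bracketed references are left out to keep it short.

```python
import xml.etree.ElementTree as ET

def load(xml_text):
    """Wrap a data source so its root can be reached as ".rootname"."""
    wrapper = ET.Element("source")
    wrapper.append(ET.fromstring(xml_text))
    return wrapper

def lookup(obj, reference):
    """Follow a bracket-free reference such as ".menuitems.menuitem"."""
    nodes = [obj]
    for name in reference.strip(".").split("."):
        nodes = [c for n in nodes for c in n.findall(name)]
    return nodes

def split_binding(binding):
    """Split "/url.ref" into (url, ".ref"); a bare ".ref" has no URL part.
    Assumption: the URL itself contains no "." character."""
    url, dot, ref = binding.partition(".")
    return (url or None), dot + ref

def expand(node, current, sources):
    """Return a copy of a template node with its bindings resolved."""
    out = ET.Element(node.tag)
    out.text = node.text
    attrs = dict(node.attrib)
    objects = [current]
    for key in ("dataset", "data"):
        if key in attrs:
            url, ref = split_binding(attrs.pop(key))
            found = lookup(sources[url] if url else current, ref)
            if key == "data":
                objects = found            # nested content repeats per object
            elif found:
                current = found[0]         # dataset only changes the context
                objects = [current]
    for key, value in attrs.items():
        if value.startswith("."):          # a reference into the current object
            value = "".join(e.text or "" for e in lookup(current, value))
        if key == "content":
            out.text = value
        else:
            out.set(key, value)
    for obj in objects:
        for child in node:
            out.append(expand(child, obj, sources))
    return out

NEWS = """<newsitems>
  <newsitem><name>Newsflash</name></newsitem>
  <newsitem><name>Old news</name></newsitem>
</newsitems>"""
MENU = """<menuitems>
  <menuitem><name>Home</name><url>http://www.mysite.com</url></menuitem>
  <menuitem><name>About</name><url>http://www.mysite.com/about</url></menuitem>
</menuitems>"""
TEMPLATE = """<document dataset="/data/news?i=2.newsitems">
  <orderedlist id="mainmenu" data="/data/mainmenu.menuitems.menuitem">
    <listitem><anchor href=".url" content=".name" /></listitem>
  </orderedlist>
  <block id="newslist" data=".newsitem">
    <header content=".name" />
  </block>
</document>"""

sources = {"/data/news?i=2": load(NEWS), "/data/mainmenu": load(MENU)}
ui = expand(ET.fromstring(TEMPLATE), None, sources)
```

    The resulting tree contains one listitem per menu item and one header per news item, even though each was written only once in the template. In a real browser the sources would of course be fetched from their URLs instead of being passed in as a dictionary.
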
    Let's also consider making our own vocabulary through namespaces and classes. That way we can isolate specific styles in separate files. Our template then becomes a lot cleaner and more controller-like, binding data into a structure. Here's a simplification of the XML code above:

    <document dataset="/data/news?i=2.newsitems" title="Latest news @ Mysite.com" description="The latest news from Mysite.com" keywords=".newsitem[0..].keywords">
    
      <mysite:banner>Mysite.com brings you the latest news</mysite:banner>
      
      <mysite:mainmenu data="/data/mainmenu.menuitems.menuitem">
        <mysite:mainmenuitem>
          <anchor href=".url" title=".name" content=".name" />
        </mysite:mainmenuitem>
      </mysite:mainmenu>
      
      <mysite:newslist data=".newsitem">
        <block>
          <header content=".name" />
          <paragraph content=".text" />
        </block>
      </mysite:newslist>
      
    </document>

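    One way to think about such a custom vocabulary is as a lookup table that maps each custom name back to a base type plus its styles. The Python sketch below assumes exactly that; the VOCABULARY table and the expand_vocabulary function are hypothetical, the templates are represented as plain dictionaries, and the property values are copied from the longer template earlier in this post.

```python
# Hypothetical vocabulary file: each custom name maps to a base type plus
# the presentation properties used in the longer template.
VOCABULARY = {
    "mysite:banner": {"id": "banner", "type": "block",
                      "width": "100%", "text-align": "center"},
    "mysite:mainmenu": {"id": "mainmenu", "type": "orderedlist"},
    "mysite:mainmenuitem": {"type": "listitem", "display": "inline-block"},
    "mysite:newslist": {"id": "newslist", "type": "block"},
}

def expand_vocabulary(element):
    """Replace a custom type by its base type and merge in its styles.
    Properties set on the element itself win over the vocabulary defaults."""
    base = VOCABULARY.get(element["type"], {})
    expanded = {**base, **element}
    expanded["type"] = base.get("type", element["type"])
    if isinstance(element.get("content"), list):
        expanded["content"] = [expand_vocabulary(c) for c in element["content"]]
    return expanded

# The menu from the simplified template, written as a dictionary.
short = {
    "type": "mysite:mainmenu",
    "data": "/data/mainmenu.menuitems.menuitem",
    "content": [
        {"type": "mysite:mainmenuitem", "content": [
            {"type": "anchor", "href": ".url", "title": ".name", "content": ".name"},
        ]},
    ],
}
full = expand_vocabulary(short)
```

    After expansion, the menu is an "orderedlist" with id "mainmenu" again, and its item is a "listitem" displayed as "inline-block", just like in the first template. The data binding is untouched, which is the point: styles live in the vocabulary, bindings live in the template.
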
    This is starting to look like real UI "programming". Data have been moved into separate data sources and can thus be part of the semantic web, without having to program complex relationships between data and UI. The UI can now be defined by more meaningful, presentation-oriented templates. The templates can be made by designers, and only the addition of data bindings is needed to make them work. These templates are not so different from server-side templates, but unlike them, they do not involve any server processing at all. The UI is generated in the browser. Also, since the browser controls the data retrieval process, developers do not need to write scripts for dynamic content through methods like AJAX or Comet, but can instead rely on the browser to perform such tasks.

    It's time to rid ourselves of HTML and look to the future of the web.