OpenSourceVarsity


An Introduction To HTML5

The HTML5 Doctype Element

NOTE: This material assumes competence in HTML4 coding.

Before entering into the world of code associated with an HTML5 webpage template, here is a very brief introduction to the syntax and semantics of HTML5. The good news is that HTML5 is designed to be backwards compatible with both HTML4 and XHTML1. The syntax of HTML5 is largely the same as that of HTML4; the differences lie mainly in the new elements, attributes and APIs that HTML5 adds.

Because HTML5 is designed to be backwards compatible, most HTML4 documents can effectively also be HTML5 documents. The syntactical rules of the two languages are largely unchanged, meaning that if your HTML4 documents conform to the syntactical rules of that language, they will, for the most part, also conform syntactically to HTML5.

Hence, in order to convert an HTML4 document to an HTML5 document, essentially only a single change needs to be carried out: the HTML5 doctype declaration must replace the HTML4 doctype declaration in use. (Strictly speaking, elements that HTML5 has made obsolete, such as font and center, are no longer conforming, but Browsers continue to render them.) The HTML5 specification states that an HTML5 document requires a DOCTYPE to ensure that all Browsers render the document in their standards mode. The DOCTYPE declaration has no other purpose.

The doctype declaration for any HTML5 document is simply <!DOCTYPE html>

Simply place this doctype declaration at the very start of any HTML4 document, and Browsers will render the document in standards mode.
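As a minimal sketch, an HTML5 document needs little more than the doctype on its first line, a root element, a head containing a title, and a body. The title and paragraph text below are placeholders.

```html
<!DOCTYPE html>
<html lang="en">
  <head>
    <!-- Character encoding declaration -->
    <meta charset="utf-8">
    <title>My First HTML5 Page</title>
  </head>
  <body>
    <p>Hello, HTML5.</p>
  </body>
</html>
```

Because the doctype is the very first thing in the file, Browsers render this page in standards mode.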

Declaring Character Encodings in HTML5

When one visualizes how textual content will be displayed on a webpage in a Browser, one very likely thinks of the characters and symbols that appear on the computer's VDU (screen) when the Browser renders the HTML code of a web page.

Computers, however, do not deal directly with the characters and symbols that human beings understand. Internally, every character and symbol is stored as a sequence of bytes, and a character encoding defines which byte sequence stands for which character.

There are many such character encodings that a computer can use when required. Historically, different encodings were used for different languages, such as Russian, Chinese, English and so on. Broadly speaking, a character encoding gives the computer a way to map between what you see on its screen (i.e. what is rendered as web page content) and the byte patterns that it stores in memory and on its hard disk and uses to render the web page.

In reality, it's a lot more complicated than that. The very same character or symbol might appear in more than one encoding, but each encoding may use a completely different sequence of bytes to store it in memory or on the hard disk. For example, the character é is stored as the single byte E9 in the Windows-1252 encoding, but as the two bytes C3 A9 in UTF-8.

Hence, if the computer is given a sequence of bytes and told that it is text, it needs to know exactly which encoding to use to decode those bytes correctly and display characters that make sense to the human being sitting in front of the VDU.

So now the question arises: how does your Browser determine the character encoding of the web page content that the Web server delivers to it? If the Browser gets the character encoding wrong, garbled content is quite likely to be displayed on the VDU.

You may well have seen the following information contained within the HTTP header:
Content-Type: text/html; charset=utf-8

Briefly, this is the Web server's way of saying that it thinks it's sending you an HTML document and that it thinks the document uses UTF-8 encoding. Unfortunately, many web page creators have no control over their Web server, hence HTML4 provides a way for page creators to specify the character encoding within the HTML4 document itself.

It's quite possible you have seen the following codespec as well:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

This is the way that the webpage creator informs the Browser that they authored the HTML document using the UTF-8 character encoding. Both of the techniques explained above ( i.e. HTTP headers sent by a Web server and the meta tag attributes ) still work in HTML5.

The HTTP header is the preferred method.

The HTTP header always overrides the <meta> tag if present, but because not everyone can set HTTP headers, the <meta> tag and its attributes are still around.

In HTML5 the syntax is simply: <meta charset="utf-8" />

The rule of thumb is as follows. Coders who create web pages in languages written entirely in the Roman character set (such as English) can often get away without worrying about character encodings, because the basic Roman letters, digits and punctuation (the ASCII characters) are stored as the same bytes in most common encodings, including UTF-8. Ideally, however, all coders should clearly state which character encoding the web page was saved in, either via the Web server's HTTP header or via the meta tag within the HTML document.
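The HTML5 specification requires the <meta charset> declaration to appear within the first 1024 bytes of the document, so a common habit is to make it the very first child of <head>. A sketch, with placeholder title and text:

```html
<!DOCTYPE html>
<html lang="en">
  <head>
    <!-- First child of head, well inside the first 1024 bytes -->
    <meta charset="utf-8">
    <title>Encoding Example</title>
  </head>
  <body>
    <!-- Non-Roman characters display correctly once the encoding is known -->
    <p>Grüße aus München. Привет. 你好.</p>
  </body>
</html>
```

Note that the declaration only describes the bytes; it does not convert them. The file itself must really be saved as UTF-8 by the editor, or the Browser will still display garbled text.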

The Root Element

The HTML codespec interpreted by a Browser to produce a webpage is actually a series of nested elements, and the entire structure of a webpage resembles a tree. Some elements in the HTML codespec are siblings; these can be visualized as multiple branches that extend from the same tree trunk.

Other HTML elements are children of other HTML elements. Elements that have no children are called leaf nodes.

The outermost element, which is the ancestor of all the other elements in the HTML codespec, is called the root element. The root element of HTML codespec is always <html>.
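The tree described above can be read directly off the indentation of a small document. In this sketch the comments label each element's place in the tree; the page content itself is placeholder text.

```html
<!DOCTYPE html>
<html lang="en">            <!-- root element: ancestor of every other element -->
  <head>                    <!-- child of html, sibling of body -->
    <meta charset="utf-8">  <!-- leaf node: meta can never have children -->
    <title>Tree Example</title>
  </head>
  <body>                    <!-- child of html, sibling of head -->
    <h1>Heading</h1>
    <p>A paragraph with <em>emphasis</em>.</p>  <!-- p is the parent of em -->
  </body>
</html>
```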

In many HTML codespecs written in the XHTML style, the root element is coded as follows:
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">

This code snippet is perfectly okay; it is valid HTML5. However, parts of it are no longer necessary in HTML5, and a few bytes can be saved by removing them.

Let's take a quick look at the following HTML codespec:
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">

The first attribute of the opening <html> tag is xmlns. This attribute is drawn from XHTML 1.0. Its value indicates that all the elements contained within the codespec are in the XHTML namespace, http://www.w3.org/1999/xhtml.

Elements in an HTML5 document are always in this namespace, so there is no longer a need to declare it explicitly. In documents served as text/html, the HTML5 codespec works exactly the same in all Browsers whether this attribute is present or not.

Dropping the xmlns attribute leaves the root element with only the following attributes:
<html lang="en" xml:lang="en">

The two attributes, lang and xml:lang, both define the language of the HTML web page. The xml:lang attribute is a remnant of XHTML, and only the lang attribute has any effect in HTML5. Hence the xml:lang attribute can be dropped; if you do choose to keep it, ensure that it contains the same value as the lang attribute.

That leaves the HTML5 root element as: <html lang="en">

The <head> Element

The first child of the root element is usually the <head> element. The <head> element in HTML codespec holds metadata, i.e. important technical information about the web page.

The <head> element itself has not changed in any way in HTML5; the page's metadata is still placed between the <head></head> tags.

Here is a small example:

Code Snippet 1

<head>
  <meta charset="utf-8" />
  <title>My HTML5 Template</title>
  <link rel="stylesheet" href="style-original.css" type="text/css" />
  <link rel="shortcut icon" href="/favicon.ico" />
</head>

HTML5 breaks link relations into two categories:

1. Links to external resources.
These point to resources that are external to the HTML codespec but are used to augment the current HTML document in some way. The Browser generally fetches and processes them automatically.

2. Hyperlinks to other HTML documents.
These point to other documents that the user may choose to navigate to.

In Code Snippet 1, both link elements point to external resources: rel="stylesheet" loads a style sheet, and rel="shortcut icon" loads the site's icon (the favicon).
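To make the two categories concrete, here is a sketch of a <head> containing both kinds of link, plus an ordinary hyperlink in the body. The file names (style-original.css, /favicon.ico, /feed.xml, about.html) are placeholders. rel="icon" is the current spelling of rel="shortcut icon"; the specification still accepts the older "shortcut" prefix for historical reasons.

```html
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title>Link Relations</title>
    <!-- External resources: fetched and used by the Browser automatically -->
    <link rel="stylesheet" href="style-original.css">
    <link rel="icon" href="/favicon.ico">
    <!-- Hyperlink: points to another document (this site's RSS feed) -->
    <link rel="alternate" type="application/rss+xml" href="/feed.xml" title="Site feed">
  </head>
  <body>
    <!-- The most familiar hyperlink: the user chooses whether to follow it -->
    <p><a href="about.html">About this site</a></p>
  </body>
</html>
```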

Consider the first link relation in Code Snippet 1:
<link rel="stylesheet" href="style-original.css" type="text/css" />
This is perhaps the most frequently used link relation in HTML.

<link rel="stylesheet"> is used to point to CSS rules stored in an external file, in this case named style-original.css.

In HTML5 the type attribute can be dropped. There really is only one style sheet language for the web, CSS, so text/css is always taken as the default value of the type attribute and need not be declared explicitly. This works in all Browsers.

Hence in HTML5 a link to an external style sheet can be coded as follows:
<link rel="stylesheet" href="style-original.css" />

The Structural And Semantic Aspects Of HTML5

With most of the basic housekeeping aspects of an HTML5 document now out of the way, it is time to get involved with the structural and semantic aspects of HTML5.

The header element

A very common section of most web page layouts is a header section. This section typically contains one or more headings for the web page.

Often web page developers using HTML4 or XHTML1 would place such content within:
<div class="header"> Web Page header content placed here </div>

Because this is such a common practice, HTML5 has defined an element expressly for this purpose, i.e. the header element ( NOTE: please do not confuse this with the head element).

The HTML5 header element is intended to define the start and end of a webpage's header section. Multiple blocks of page content can be placed in the header section, such as:

  • A company Logo and Name
  • A web site search sub system
  • A website Log In / Register sub system

And so on.
Just remember that having a header section in a web page is completely optional.

What is interesting is that HTML5 header elements are not restricted to being used only at the top of a webpage.

The HTML5 header element can be used within page content as well, for example, to hold Blog post titles. The header element may contain h1, h2 and other such heading elements, but this is not mandatory; it may contain almost any kind of content. The only restriction is that header elements may not contain header or footer elements nested within them.

As mentioned earlier, the header element is not mandatory, and HTML5 documents are not required to have one. For example, if the web page simply begins with an h1 element, there is no need to enclose the h1 element within a header element.
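A sketch of both points above, as a fragment of a page's <body>: the page header holds non-heading content (a logo and a site search sub system), while a section that simply begins with a heading needs no header at all. The image file, search URL and text are placeholders.

```html
<body>
  <!-- A header need not hold only headings: logo, name and search box -->
  <header>
    <img src="logo.png" alt="Company logo">
    <h1>Company Name</h1>
    <form action="/search">
      <input type="search" name="q" aria-label="Search this site">
      <button>Search</button>
    </form>
  </header>

  <!-- This section simply begins with a heading, so no header is needed -->
  <section>
    <h2>Latest News</h2>
    <p>Section content goes here.</p>
  </section>
</body>
```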

The hgroup Element

In addition to the header element, HTML5 also has an hgroup element. Whereas the header element may contain most kinds of content (except header and footer elements), the hgroup element may contain only headings. The hgroup element helps the sectioning (or outlining) algorithm of HTML5 work correctly.

This material will not explore the sectioning (or outlining) algorithm of HTML5 in depth, because it is more relevant to Browser developers than to web page developers. To touch on the subject briefly: HTML5 has an explicit outline model as well as an implicit one, while HTML has traditionally had only an implicit outline model.

One could visualize the outline as the table of contents of a book: typically, heading levels define where new sections start and end.

Such an outline is implicit, because headings imply the beginning of a new section. HTML5 goes one step further by also having an explicit outline model: wherever a new section of a web page is required, use the section element, which marks the start and end of that specific section.

A very common piece of header codespec is marked up as follows:

<header>
	<h1>Ivan Bayross</h1>
	<h2>Technical Writer, Mentor and Friend</h2>
</header>

In this case, the codespec contained within the header section is implicitly part of the outline: the h1 element creates an implicit section, and the h2 element an implicit subsection.

In situations like this, if the h2 is only a subtitle and only the main heading should be part of the web page outline, enclose the whole group of headings within the hgroup element. Only the highest-ranked heading in the group (here, the h1) then becomes part of the web page outline when the Browser interprets the HTML5 codespec.

Now the code snippet will be as follows:

<header>
	<hgroup>
		<h1>Ivan Bayross</h1>
		<h2>Technical Writer, Mentor and Friend</h2>
	</hgroup>
</header>

Happily, this is not something that one has to pay a great deal of attention to right now, simply because at present no Browser has actually implemented HTML5's outlining algorithm. However, to future-proof the HTML5 markup, one can group headings using the hgroup element.

The nav Element

HTML5 defines a nav element for marking up the site navigation codespec of a web page. According to the HTML5 specification, the nav element represents a section of a page that links to other pages or to parts within the page: a section with navigation links.

To use the nav element, first identify the codespec of the web page used for navigation. Then enclose these navigational links (which are very commonly marked up as a <ul> list of <li> elements) within a nav element.

In practice, the nav element is used mainly to enclose major blocks of navigation links: links to content within the web page itself, or to other pages within the same website.

Hence, by this reading, links on a web page that point to resources on other websites (i.e. other web pages, images, .pdf files and so on) are not usually treated as navigational links, and generally need not be enclosed within the nav element.

In reality, not every link or group of links on a web page must be enclosed within the HTML5 nav element. For example, nav is most likely not the most appropriate element for the links to privacy policies and terms and conditions found in the footer of a web page. However, links that point to contact details and other significant parts of a site can be enclosed within the nav element.

<nav>
	<ul>
		<li><a href="#">Menu Link 1</a></li>
		<li><a href="#">Menu Link 2</a></li>
		<li><a href="#">Menu Link 3</a></li>
	</ul>
</nav>
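A sketch of the distinction drawn above, as a fragment of a page's <body>: the main site menu goes inside nav, while the minor legal links in the footer are left outside it. All page paths are placeholders.

```html
<body>
  <header>
    <!-- Major site navigation: a good fit for nav -->
    <nav>
      <ul>
        <li><a href="/">Home</a></li>
        <li><a href="/articles/">Articles</a></li>
        <li><a href="/contact/">Contact</a></li>
      </ul>
    </nav>
  </header>

  <p>Page content goes here.</p>

  <footer>
    <!-- Minor links: nav is usually not needed here -->
    <p>
      <a href="/privacy/">Privacy Policy</a> |
      <a href="/terms/">Terms and Conditions</a>
    </p>
  </footer>
</body>
```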

The footer Element

Just as many HTML web pages have header sections, another common web page section is the footer section.

Until now, HTML coders have typically marked up footer content as follows:
<div class="footer"> Web Page footer content placed here </div>

Just as HTML5 has defined the header element, it also defines a footer element, which can be used instead of placing web page footer content within a div whose class="footer", as shown below:

<footer> Web Page footer content placed here </footer>

Just as with the header element, there may be more than one footer element in a web page. Each specific section of the web page may have its own footer.

If this idea seems a little strange, consider the example of a Blog post. Each Blog post may have a header containing a title, author information and so on, as well as a footer which might hold the post's date and time stamp, the author's signature and perhaps various "share this" style links.

The header and footer elements of HTML5 are just perfect for marking up this kind of web page content.
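A sketch of that Blog post pattern: the post is an article with its own header and footer, separate from any page-level header and footer. The title, author, date and share link are placeholders.

```html
<article>
  <header>
    <h2>Getting Started With HTML5</h2>
    <p>By A. Writer</p>
  </header>

  <p>Post content goes here.</p>

  <!-- This footer belongs to the post, not to the whole page -->
  <footer>
    <p>Posted on 1 May 2016 at 10:00</p>
    <p><a href="#share">Share this post</a></p>
  </footer>
</article>
```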

While the above elements broadly describe the structure of an HTML5 document, HTML5 also defines specific elements that describe the webpage content area. The webpage content area in an HTML5 document holds content such as body text, links, images, videos, podcasts and so on.

In the next section of this material we will take a quick look at the HTML5 elements that are used to markup the structure of the webpage content in the HTML5 document.

Sectioning In HTML5

Take a close look at the contents of almost any webpage and you will find it organized into logical sections. For example, if the web page represents a newspaper, it will almost certainly contain sections such as sports, news, business, articles and so on, simply because these are the sections that most newspapers contain.

Similarly, if the webpage is part of an E-Book, it could contain chapters, chapter headings, paragraph headings, paragraphs (what is often termed body text), appendices and so on.

Although a ton of webpage content is written as sections of information, neither HTML4 nor XHTML1 ever provided any explicit markup elements that could be used to create discrete web page sections.

One may ask, what about the HTML div element? Well, a div is just a generic container for web page content. It can be visualized as a catch-all bucket whose content must still be marked up using a host of other HTML4 or XHTML elements. The div element has no explicit meaning of its own.

Wrapping web page content within a div element is great for managing code complexity: it gives programmers a way to group web page content and style it using CSS. However, there are no specific semantics associated with a div.

All web page content sections in an HTML4 document are implicit. Headings begin new sections of a document.

This matters little to most sighted web users, but it is a very important feature for vision-impaired site visitors who use screen readers. These site visitors frequently rely upon the headings in a web page as a means of navigating through it.

HTML5 introduces an explicit sectioning model, with a number of new structural elements (some of which have already been touched on) that can be used to section web page content.

Here are the new sectioning HTML5 elements with brief descriptions from the HTML5 specification:

The section Element

The HTML5 section element represents a generic section of a web page or web application: a thematic grouping of content, typically with a heading.

The article Element

The HTML5 article element represents a self-contained block of web page content in an HTML5 web page or web application. The article element is intended to be independently distributable or reusable, for example in data syndication.

The article element can be used to enclose:

  • A forum post
  • A magazine or newspaper feature story
  • A blog entry
  • A user-submitted comment
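Articles can also be nested. The HTML5 specification describes nested article elements as articles related to the outer one, and gives user-submitted comments on a blog entry as an example. A sketch, with placeholder content:

```html
<article>
  <h2>Why I Switched to HTML5</h2>
  <p>Blog entry text goes here.</p>

  <section>
    <h3>Comments</h3>
    <!-- Each comment is self-contained, so each is an article of its own -->
    <article>
      <p>Great post, thanks!</p>
      <footer>Posted by Reader One</footer>
    </article>
    <article>
      <p>Very helpful introduction.</p>
      <footer>Posted by Reader Two</footer>
    </article>
  </section>
</article>
```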

Diagram 1 – Describes a standard webpage

The <section> Element

The HTML5 section element is used for grouping related content within the body of any web page. The section element can be nested, i.e. <section></section> may contain other <section></section> elements.

Some web developers ask: where does this leave the div element? Well, a div no longer needs to be used to segregate web page content into thematic groups. The web page content placed within a single <div></div> can instead be segregated using <section></section>.

Further segregation of the same web page content can be achieved by nesting <section></section> within another <section></section>. Hence <div></div> elements can often be converted to <section></section> in a web page crafted using HTML5. The div element remains the right choice, though, when a wrapper is needed purely for styling or scripting; the HTML5 specification explicitly states that section is not a generic container.
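A sketch of that conversion, assuming each original div grouped content under its own heading. The class names are hypothetical.

```html
<!-- HTML4 style: generic containers with no meaning of their own -->
<div class="chapter">
  <h2>Chapter 1</h2>
  <div class="part">
    <h3>Part 1.1</h3>
    <p>Body text.</p>
  </div>
</div>

<!-- HTML5 style: the same content grouped thematically with nested sections -->
<section>
  <h2>Chapter 1</h2>
  <section>
    <h3>Part 1.1</h3>
    <p>Body text.</p>
  </section>
</section>
```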

Think of a very common web page design pattern.
The web page will have:

  • Header content
  • Main page content

With two adjacent columns:

  • One column which could be devoted to affiliate links, displayed as affiliate banners.
  • The other column could be used to display web site navigation links, a blogroll and a login sub-system.

All of this is followed by webpage footer content, as shown in diagram 1.

The <article> Element

The HTML5 article element has great potential to cause confusion. To clarify, the HTML5 article element is a special kind of section. The HTML5 section element is quite generic, simply marking the start and end of related web page content.

Blog posts are a great example of the correct use of the article element.

Should the web page belong to a newspaper, for example, the article element is ideal for segregating and grouping the individual news articles.

To decide when to use <article></article> and when to use <section></section>, use <article></article> to enclose any block of web page information that can stand separately and independently of the rest of the web page content.

Sometimes, however, a Blogger writes a series of Blog posts in such a way that each post's content depends on the one immediately above or below it. This kind of web page content is ideally enclosed using <section></section>.

Anyway, if you are ever puzzled about whether to use <section></section> or <article></article> to enclose specific web page content (please do not overthink this), simply use <section></section> and all should work just fine.

The <aside> Element

The HTML5 aside element represents a section of page content which is tangentially related to the main content of the web page. The web page content enclosed within the aside element can be considered separate from the main page content. In the print world, such content is often printed in a sidebar on the page.

I think the HTML5 element aside is poorly named, but that’s the element name that the HTML5 developers have chosen so I guess we will just have to live with it.

Enclose web page content that is tangentially related to the rest of the web page content within the <aside></aside> elements in the web page.
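A sketch of two common uses: an aside inside an article for a tangential note, and a page-level aside used as a sidebar. All headings, text and links are placeholders.

```html
<article>
  <h2>A History of the Web</h2>
  <p>Main article text goes here.</p>

  <!-- Tangentially related: removing it would not harm the article -->
  <aside>
    <h3>Did you know?</h3>
    <p>A short related fact goes here.</p>
  </aside>
</article>

<!-- Page-level aside: sidebar content such as a blogroll -->
<aside>
  <h3>Blogroll</h3>
  <ul>
    <li><a href="#">A friend's blog</a></li>
  </ul>
</aside>
```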

Do remember that there are few hard and fast rules associated with using these HTML5 elements, so simply use them with common sense, and the HTML5 codespec will be interpreted correctly by most HTML5 compliant Browsers.

Now let’s take a look at a simple HTML5 template code as shown in code block 2.

Code block 2:

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>HTML 5 Template</title>
  <meta name="description" content="">
  <meta name="author" content="Ivan Bayross">

  <!-- Linking to template_css.css. All bootstrap overrides declared here -->
  <link href="template_css.css" rel="stylesheet">

  <!--[if lt IE 9]>
    <script src="http://html5shiv.googlecode.com/svn/trunk/html5.js"></script>
  <![endif]-->
</head>

<body>
  <header></header>

  <nav>
    <ul>
      <li><a href="#">Link one</a></li>
      <li><a href="#">Link two</a></li>
      <li><a href="#">Link three</a></li>
    </ul>
  </nav>

  <div class="pageContent">
    <article class="articleContent">
      <section></section>
      <section></section>
    </article>

    <aside class="sidebar1"></aside> <!-- end .sidebar1 -->
    <aside class="sidebar2"></aside> <!-- end .sidebar2 -->
  </div> <!-- end .pageContent -->

  <footer>
    <address></address>
  </footer>
</body>
</html>

May 5, 2016
Design by Ivan Bayross and Meher Bala © 2016 All Rights Reserved