Document Structure

The required skeleton of every HTML page and what each part does.

Every HTML file you write starts the same way. Not because it is a convention or a style choice, but because browsers require a specific structure to parse and render a page correctly. Skip parts of it and the browser falls back to guessing, and its guesses are often wrong.

This lesson covers what that structure is, why each piece exists, and what breaks without it.

The DOCTYPE declaration

The first line of every HTML file must be:

<!DOCTYPE html>

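The DOCTYPE (Document Type Declaration) tells the browser to render the page in standards mode, following the current HTML and CSS specifications. Older DOCTYPEs named a specific HTML version. This short form was introduced with HTML5, and it is the only one you need today.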

What happens without it

Without a DOCTYPE, the browser switches to quirks mode. Quirks mode is a compatibility fallback that mimics the behaviour of browsers from the late 1990s, before they implemented the HTML and CSS standards consistently. Browsers still support it to avoid breaking old websites.

In quirks mode:

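  • Sizing can work differently. Old versions of Internet Explorer use the Internet Explorer 5 box model in quirks mode, where width includes padding and border. Current browsers keep the standard box model but still size some elements, such as table cells and boxes with percentage heights, by the old rules.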
  • Some CSS properties behave differently or are ignored.
  • Vertical centering, table layouts, and percentages can all break in unexpected ways.
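One quirk that is easy to observe yourself is the handling of lengths written without a unit. The page below is a minimal sketch (the class name and text are made up for illustration). As written, with the DOCTYPE in place, the browser rejects the width declaration and the box spans the full width. Delete the first line and reload, and the same rule should be read as 200px.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8" />
  <title>Quirks mode demo</title>
  <style>
    /* "width: 200" has no unit. Standards mode treats the declaration as
       invalid and ignores it. Quirks mode reads it as 200px. */
    .box { width: 200; background: lightblue; }
  </style>
</head>
<body>
  <div class="box">Delete the DOCTYPE and reload to see this box shrink.</div>
</body>
</html>
```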

You can tell whether a page loaded in quirks mode by opening the browser developer tools, going to the Console tab, and typing document.compatMode. If it returns "BackCompat", the page is in quirks mode. If it returns "CSS1Compat", it is in standards mode.
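The same check can be built into a test page so the mode is printed on screen. This is a small sketch, and the id "mode" and the wording of the message are arbitrary choices. Removing the first line should change the output from standards mode to quirks mode.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8" />
  <title>Rendering mode check</title>
</head>
<body>
  <p id="mode"></p>
  <script>
    // document.compatMode is "CSS1Compat" in standards mode
    // and "BackCompat" in quirks mode.
    var label = document.compatMode === "CSS1Compat" ? "standards mode" : "quirks mode";
    document.getElementById("mode").textContent = "This page is in " + label + ".";
  </script>
</body>
</html>
```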

A page that looks fine during development can render incorrectly in other browsers because of a missing DOCTYPE. Always write it.

<!-- Wrong: no DOCTYPE -->
<html>
<head><title>My Page</title></head>
<body><p>Hello</p></body>
</html>

<!-- Correct -->
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8" />
  <title>My Page</title>
</head>
<body>
  <p>Hello</p>
</body>
</html>

The <html> element and the lang attribute

The <html> element is the root of the document. Every other element is a descendant of it. It wraps the entire page.

The lang attribute tells the browser and assistive technology what language the page content is written in.

<html lang="en">

Why lang matters

Screen readers (software that reads the page aloud for blind or low-vision users) use the lang attribute to pick the right pronunciation engine. An English text-to-speech engine reading French will mispronounce most of the words, and text in a script the engine does not support, such as Arabic, may not be read intelligibly at all. Note that lang affects pronunciation only. Text direction is set separately with the dir attribute.

Without lang, a screen reader typically falls back to the user's default language. If a French user visits your English site, their screen reader will attempt to read English words with a French pronunciation engine. The result is often incomprehensible.

<!-- Wrong: missing lang — screen readers guess -->
<html>

<!-- Wrong: wrong lang code — French reader on an English page -->
<html lang="fr">

<!-- Correct: English page -->
<html lang="en">

<!-- Correct: French page -->
<html lang="fr">

<!-- Correct: Spanish page -->
<html lang="es">

Language codes follow the BCP 47 standard. The two-letter codes (en, fr, es, de, ja) cover most cases. Regional variants use a subtag: en-US for American English, en-GB for British English, pt-BR for Brazilian Portuguese.
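As an illustration of a regional subtag in context, here is a sketch of a complete page written in Brazilian Portuguese. The title and paragraph text are invented for the example.

```html
<!DOCTYPE html>
<html lang="pt-BR">
<head>
  <meta charset="UTF-8" />
  <title>Bem-vindo - Site de Exemplo</title>
</head>
<body>
  <p>Olá! Esta página está escrita em português do Brasil.</p>
</body>
</html>
```

If you are unsure which regional variant applies, the plain two-letter code (here, pt) is still a valid choice.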

You can also set lang on individual elements to mark a word or phrase in a different language than the rest of the page:

<html lang="en">
<body>
  <p>The French greeting <span lang="fr">bonjour</span> means "good day".</p>
</body>
</html>

The <head> element

The <head> element contains metadata: information about the page that is not displayed to the user. The browser uses this metadata to render the page correctly and to describe the page to search engines, social media previews, and other tools.

Nothing inside <head> appears on the page itself.

Character encoding: <meta charset="UTF-8">

A character set (charset) defines how the characters in your HTML file are encoded as bytes. UTF-8 is an encoding that can represent every Unicode character, which covers Latin letters, accented letters, Arabic script, Chinese characters, and emoji.

Without this meta tag, the browser guesses the encoding. For ASCII-only content, it often guesses correctly. The moment your content includes accented characters, curly quotes, or anything outside basic English, the guess can be wrong.

Here is what happens with missing or incorrect charset on a page containing French text:

<!-- What you wrote -->
Ça va ?

<!-- What the browser renders when it decodes the UTF-8 bytes as Windows-1252 -->
Ã‡a va ?

The characters are stored correctly in the file, but the browser interprets the bytes with the wrong decoder, producing garbage. The fix is always the same: declare UTF-8 explicitly, and save the file as UTF-8.

<!-- Wrong: no charset declaration -->
<head>
  <title>My Page</title>
</head>

<!-- Correct -->
<head>
  <meta charset="UTF-8" />
  <title>My Page</title>
</head>

The charset meta tag must appear within the first 1024 bytes of the document. Put it first inside <head>, before any other tags.
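A sketch of the recommended order follows. The title text is a made-up example, chosen because it contains a non-ASCII character that the browser can only decode reliably once it knows the encoding.

```html
<head>
  <!-- First, so the encoding is known before any non-ASCII text is read -->
  <meta charset="UTF-8" />
  <meta name="viewport" content="width=device-width, initial-scale=1.0" />
  <title>Café Menu - Example Site</title>
</head>
```

Keeping the tag at the top also protects against long comments or inline scripts pushing it past the 1024-byte limit.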

Viewport: <meta name="viewport">

The viewport meta tag controls how the browser scales your page on mobile devices.

<meta name="viewport" content="width=device-width, initial-scale=1.0" />

width=device-width tells the browser to lay the page out at the device's screen width, measured in CSS pixels, rather than at a fixed default. initial-scale=1.0 means no zoom is applied when the page first loads.

Without this tag, mobile browsers use a default viewport width of around 980 pixels and then scale the entire page down to fit the screen. A page designed for desktop will appear tiny and users will need to pinch-zoom to read anything.

Checking the viewport behaviour without this tag is straightforward: open a page in Chrome, press F12 to open developer tools, click the "Toggle device toolbar" icon (the phone/tablet icon), and reload. Without the viewport meta tag, the page renders as a shrunken desktop version.
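To put a number on the difference, a test page can print the width the browser is laying out against. This is a sketch only, and the id "width" is an arbitrary name. In a phone-sized device toolbar preset it should report a few hundred pixels with the viewport tag, and roughly 980 with the tag removed.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8" />
  <meta name="viewport" content="width=device-width, initial-scale=1.0" />
  <title>Viewport check</title>
</head>
<body>
  <p id="width"></p>
  <script>
    // clientWidth of the root element is the layout width in CSS pixels.
    var width = document.documentElement.clientWidth;
    document.getElementById("width").textContent =
      "Layout viewport: " + width + "px wide";
  </script>
</body>
</html>
```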

<!-- Wrong: no viewport — page renders at ~980px on mobile, looks zoomed out -->
<head>
  <meta charset="UTF-8" />
  <title>My Page</title>
</head>

<!-- Correct -->
<head>
  <meta charset="UTF-8" />
  <meta name="viewport" content="width=device-width, initial-scale=1.0" />
  <title>My Page</title>
</head>

The page title: <title>

The <title> element sets the name displayed in:

  • The browser tab
  • The browser's bookmark when a user saves the page
  • Search engine results pages (as the blue clickable link)
  • Screen reader announcements when the page loads

A missing title means the browser tab shows the URL or file name, bookmarks are saved under the URL instead of a readable name, and search engines have to generate their own link text. A generic title like "Home" is only slightly better than nothing.

<!-- Wrong: no title -->
<head>
  <meta charset="UTF-8" />
  <meta name="viewport" content="width=device-width, initial-scale=1.0" />
</head>

<!-- Wrong: generic title -->
<title>Page</title>

<!-- Correct: specific, descriptive -->
<title>Checkout - SAE Academy</title>

A good title describes the current page first, then the site name. This matters because tabs truncate text from the right. "SAE Academy - Checkout" becomes "SAE Academy -..." when the tab is narrow. "Checkout - SAE Academy" becomes "Checkout -...", which is more useful.
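Applied across a site, the pattern might look like the following. Each line belongs to a different page, and the page names other than Checkout are hypothetical.

```html
<!-- Page name first, site name second. One title per page. -->
<title>Document Structure - SAE Academy</title>
<title>Checkout - SAE Academy</title>
<title>Order History - SAE Academy</title>
```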

The <body> element

The <body> element contains all visible content: headings, paragraphs, images, forms, everything the user sees and interacts with.

<body>
  <h1>Welcome to SAE Academy</h1>
  <p>This is the first paragraph of the page.</p>
</body>

Visible content placed outside of <body> (but inside <html>) is invalid HTML. Browsers are forgiving about this and will often place the content in the body anyway, but you should not rely on that behaviour.
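As a sketch of that forgiveness, the page below has a paragraph after the closing </body> tag. The paragraph text is invented for the example. If you inspect the page in the Elements panel of the developer tools, the paragraph should appear inside <body>, after the heading.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8" />
  <title>Stray content</title>
</head>
<body>
  <h1>Welcome to SAE Academy</h1>
</body>
<!-- Invalid: this paragraph comes after </body>. The parser moves it into the body. -->
<p>This paragraph was written outside the body.</p>
</html>
```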

The browser's forgiveness is not a safety net

Browsers are deliberately lenient with malformed HTML. If you omit <head>, <body>, or even <html>, modern browsers will insert them automatically. If you forget a closing tag, browsers often add it where it seems to belong.
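To make the automatic insertion concrete, here is a deliberately stripped-down file, with invented title and text:

```html
<!DOCTYPE html>
<title>Minimal page</title>
<p>Hello
```

The browser builds the same tree as if the file had been written like this:

```html
<!DOCTYPE html>
<html>
<head>
  <title>Minimal page</title>
</head>
<body>
  <p>Hello</p>
</body>
</html>
```

The parser can supply the missing elements, but it cannot supply the lang attribute, the charset, or the viewport settings. Those still have to come from you.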

This is by design. The web would break if one typo crashed a page for millions of users. But the browser's repair decisions are not always what you intend. An implicit close can put elements in the wrong nesting. An auto-inserted <tbody> sits between <table> and its rows, so a CSS selector such as table > tr never matches.
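Both kinds of repair can be seen in one small page. This is a sketch with made-up content. Inspect the result in the Elements panel to compare what was written with what the browser built.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8" />
  <title>Parser repair demo</title>
  <style>
    /* Never matches: the parser inserts <tbody> between <table> and <tr>. */
    table > tr { background: yellow; }
    /* Matches the rows as the browser actually builds them. */
    table > tbody > tr { outline: 1px solid gray; }
  </style>
</head>
<body>
  <table>
    <tr><td>Row written without tbody</td></tr>
  </table>

  <!-- Written as a <div> inside a <p>. The parser closes the <p> before the
       <div>, so they become siblings, and the stray </p> creates an empty <p>. -->
  <p>Intro text
    <div>Boxed note</div>
  </p>
</body>
</html>
```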

Writing the structure explicitly is not optional extra work. It makes your intent clear and avoids relying on the browser to do the right thing.

The complete skeleton

Every HTML page you write should start with this structure:

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8" />
  <meta name="viewport" content="width=device-width, initial-scale=1.0" />
  <title>Page Title - Site Name</title>
</head>
<body>

</body>
</html>

All visible content goes inside <body>. All metadata goes inside <head>. The DOCTYPE comes first, and <html> wraps everything after it.

Common mistakes

Missing DOCTYPE. The page enters quirks mode. Sizing and layout follow legacy rules and no longer match what your CSS describes.

Missing or wrong lang attribute. Screen readers use the wrong pronunciation engine. Users who rely on text-to-speech hear garbled output.

Missing charset. Special characters render as garbage. UTF-8 decoding errors are invisible during development if your test content is ASCII-only, then appear in production when real content is used.

Missing viewport meta. The page is unreadable on mobile without pinch-zooming. This affects every visitor on a phone.

Missing or generic <title>. The browser tab shows the URL. Search engines have no author-supplied text for the result link. Bookmarks have no readable name.

Putting visible content in <head>. The parser ends the head at the first visible element and moves that element, along with everything after it, into <body>. Metadata that follows can end up in the wrong place.
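The last mistake is easiest to understand with an example. In this sketch (heading and title text are invented), an <h1> has slipped into the head ahead of the title:

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8" />
  <!-- Invalid: visible content inside head -->
  <h1>Welcome</h1>
  <title>Home - Example Site</title>
</head>
<body>
  <p>First paragraph.</p>
</body>
</html>
```

When the parser reaches the <h1>, it closes the head and opens the body on its own. The resulting tree has only the charset meta tag in <head>, while the <h1>, the <title>, and the paragraph all sit inside <body>.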

Exercise

Build a complete HTML document from scratch. Fill in the <body> with a heading and a paragraph about yourself or any topic you choose.

The preview panel renders your HTML directly. If your page structure is correct, you will see your heading and paragraph styled with the provided CSS. If you see nothing, check that you have a <body> element and that your content is inside it.
