Multi-Head Attention: Full Input Projection, Not Slicing
A critical clarification about how multi-head attention works in Transformers: Each attention head receives the entire input embedding, not a slice of it.
The Common Misconception
Many explanations suggest that with 768 dimensions and 12 heads, each head gets a different 64-dimensional "slice" (dims 1-64, 65-128, etc.). This is incorrect.
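The difference is easiest to see in a minimal sketch, assuming PyTorch and hypothetical tensor names (x, W_q0):

```python
# Illustrative contrast (hypothetical tensors): slicing vs. projecting.
import torch

d_model, d_head = 768, 64
x = torch.randn(10, d_model)          # 10 tokens, each a full 768-dim embedding

# Misconception: head 0 sees only the first 64 coordinates of x.
sliced = x[:, :d_head]                # (10, 64) -- NOT what Transformers do

# Reality: head 0 sees all 768 dims, projected through its own learned matrix.
W_q0 = torch.randn(d_model, d_head)   # per-head projection, shape (768, 64)
q0 = x @ W_q0                         # (10, 64) -- every output dim mixes all 768 inputs
```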
What Actually Happens
- Full input to each head: Every attention head receives the complete 768-dimensional embedding
- Unique projections: Each head has its own learned Q, K, V weight matrices that project the full input down to 64 dimensions
- Parallel processing: All heads compute attention simultaneously in their own 64-dim spaces
- Concatenation: The 12 outputs (12 × 64 = 768) are concatenated
- Final mixing: A learned linear transformation combines information across all heads (see the sketch after this list)
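These steps correspond to the following minimal self-attention sketch, assuming PyTorch and the common fused-projection layout; the class and variable names are illustrative, not taken from any particular library:

```python
# Minimal multi-head self-attention sketch (assumed d_model=768, n_heads=12).
# Each head projects the FULL 768-dim input; nothing is sliced beforehand.
import math
import torch
import torch.nn as nn

class MultiHeadSelfAttention(nn.Module):
    def __init__(self, d_model=768, n_heads=12):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads          # 64 dims per head
        # One fused matrix per Q/K/V; conceptually 12 separate (768 x 64) projections
        self.w_q = nn.Linear(d_model, d_model)
        self.w_k = nn.Linear(d_model, d_model)
        self.w_v = nn.Linear(d_model, d_model)
        self.w_o = nn.Linear(d_model, d_model)    # final mixing across heads

    def forward(self, x):                         # x: (batch, seq, 768)
        b, t, d = x.shape

        # Project the *entire* input, then reshape into 12 heads of 64 dims each
        def split(proj):
            return proj(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        q, k, v = split(self.w_q), split(self.w_k), split(self.w_v)   # (b, 12, t, 64)

        # Parallel scaled dot-product attention in each head's 64-dim space
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_head)     # (b, 12, t, t)
        attn = scores.softmax(dim=-1)
        out = attn @ v                                                 # (b, 12, t, 64)

        # Concatenate the 12 heads (12 x 64 = 768), then mix with the output projection
        out = out.transpose(1, 2).contiguous().view(b, t, d)
        return self.w_o(out)
```

Note that each fused nn.Linear(d_model, d_model) is mathematically equivalent to 12 separate 768 × 64 per-head matrices: the split into heads happens after the full input has been projected, never before.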
The Key Insight
Think of it as 12 different experts examining the same patient: they all see the complete picture, but each learns to focus on different diagnostic patterns through its own projection matrices. The "different parts of the feature space" refers to different learned representations, not different input slices.
This design achieves both computational efficiency (parallel 64-dim operations) and representational richness (12 different learned perspectives on the same data).
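One way to see the efficiency claim, under the assumed 768-dim / 12-head configuration, is that the per-head projections cost exactly as many parameters as a single full-width projection would:

```python
# Parameter-count check (assumed d_model=768, n_heads=12): the 12 narrow
# per-head projections together cost no more than one full-width projection.
d_model, n_heads = 768, 12
d_head = d_model // n_heads              # 64
per_head = d_model * d_head              # 768 * 64 = 49,152 weights per head (for Q, say)
all_heads = n_heads * per_head           # 12 * 49,152 = 589,824
assert all_heads == d_model * d_model    # same as a single 768 x 768 matrix
```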