From 5a991f1e0e9714100e7319bbcda72e7eda713904 Mon Sep 17 00:00:00 2001
From: pab <pablillocea@gmail.com>
Date: Tue, 2 Jun 2026 12:25:13 +0200
Subject: [PATCH] working on explanation

---
 data/processed/proportions_CA_table.md | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/data/processed/proportions_CA_table.md b/data/processed/proportions_CA_table.md
index ce2c331..fe528d9 100644
--- a/data/processed/proportions_CA_table.md
+++ b/data/processed/proportions_CA_table.md
@@ -21,9 +21,15 @@ Every other column is a variable used in the CA. Variables are grouped into eigh
 | `migration`  | 2             | Inmigrations, outmigrations.                                                                    |
 | `demography` | 2             | Number of retirees, number of localities. Pooled into one block so block normalisation can run. |
 
+#### A note on the demography block
+
+Pooling *retirees* and *localities* is partly a technical workaround. Block normalisation needs at least two variables in a block; with only one, the rescaling collapses every row to the same constant and the variable carries no information.
+
+Both variables describe **how the population is distributed**: retirees say something about *who lives there* (ageing concentration), while localities say something about *where they live* (how many separate settlements the municipality has). The contrast the block ends up encoding, retirees relative to localities, is in effect "people-per-settlement vs spread-thin-across-settlements". A municipality with many retirees per locality reads as an ageing population concentrated in a few settlements; one with many localities per retiree reads as a population spread thinly across small ones. That contrast lines up with the urban–rural gradient the analysis is built to detect.
+
 ### Supplementary blocks (2)
 
-| Block         | Variables (n) | Content                                                                                                                                                                                       |
+| Block         | Variables (n) | What it captures                                                                                                                                                                              |
 | ------------- | ------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 | `provision` | up to 33      | Counts of educational institution units by type (preschool, primary, secondary, adult, HE) × {total, public, private}. Some columns with no observations anywhere are dropped automatically. |
 | `opinion`   | 9             | Survey-based satisfaction with preschool, elementary school and high school (bad / mid / good shares).                                                                                        |
@@ -34,7 +40,7 @@ The pipeline lives in `src/municipalities/04-proportions_CA.R`. The table you se
 
 ### 1. Backfilled 2022 cross-section
 
-For every variable, the value used is the 2022 figure if available; otherwise the most recent earlier figure for that municipality (priority: 2020-2021 window, then census-closest, then discontinued-last). This mirrors the sampling logic of `01-sampling.R`.
+For every variable, the value used is the 2022 figure if available; otherwise the most recent earlier figure for that municipality (priority: 2020-2021 window, then census-closest, then discontinued-last).
 
 ### 2. Block normalisation
 
@@ -43,7 +49,7 @@ Inside each *active* block (and the opinion block), every municipality's row is
 - every municipality contributes the same row mass to the CA (no size effect), and
 - every block contributes the same total weight (no block dominates because its raw counts are bigger).
 
-So the value in, say, the `Upper secondary` column for Upplands Väsby (435.2) reads as "435.2 of 1000 within the education block for that municipality" — i.e. ~43.5% of the municipality's educated population sits at upper-secondary level.
+So the value in, say, the `Upper secondary` column for Upplands Väsby (435.2) reads as "435.2 of 1000 within the education block for that municipality", i.e. about 43.5% of the municipality's educated population has attained upper-secondary level.
 
 ### 3. Provision rescaled as a per-capita rate
 
@@ -51,8 +57,8 @@ The supplementary `provision` columns are *not* block-normalised. Instead each c
 
 ### 4. Renaming, drop-empty
 
-Columns are renamed to short readable labels (see `proportions_CA_table_columns.csv` for the mapping). Any column that is zero in every municipality (e.g. an institution type with no private units anywhere) is dropped — it carries no information and would break the CA's supplementary projection.
+Columns are renamed to short readable labels (see `proportions_CA_table_columns.csv` for the mapping). Any column that is zero in every municipality (e.g. an institution type with no private units anywhere) is dropped.
 
 ## How to read a row
 
-A row is a municipality's *profile* across all blocks. Within each active block the values sum to 1000 and can be read as per-mille shares; provision values are rates per 100 000 inhabitants; opinion values are also normalised to a per-1000 share within their preschool/elementary/highschool triples. Across blocks the values aren't comparable as raw numbers — that's the whole point of block normalisation: each block is comparable to *itself* across municipalities, not to the other blocks.
+A row is a municipality's *profile* across all blocks. Within each active block the values sum to 1000 and can be read as per-mille shares; provision values are rates per 100 000 inhabitants; opinion values are also normalised to a per-1000 share within their preschool/elementary/high-school triples. Across blocks the values aren't comparable as raw numbers; each block is comparable to *itself* across municipalities, not to the other blocks.