From 5a991f1e0e9714100e7319bbcda72e7eda713904 Mon Sep 17 00:00:00 2001
From: pab
Date: Tue, 2 Jun 2026 12:25:13 +0200
Subject: [PATCH] working on explanation

---
 data/processed/proportions_CA_table.md | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/data/processed/proportions_CA_table.md b/data/processed/proportions_CA_table.md
index ce2c331..fe528d9 100644
--- a/data/processed/proportions_CA_table.md
+++ b/data/processed/proportions_CA_table.md
@@ -21,9 +21,15 @@ Every other column is a variable used in the CA. Variables are grouped into eigh
 | `migration` | 2 | Inmigrations, outmigrations. |
 | `demography` | 2 | Number of retirees, number of localities. Pooled into one block so block normalisation can run. |
 
+#### A note on the demography block
+
+Pooling *retirees* and *localities* is partly a technical workaround. Block normalisation needs at least two variables in a block, otherwise the rescaling collapses every row to the same constant and the variable carries no information.
+
+Both variables describe **how the population is distributed**: retirees say something about *who lives there* (ageing concentration), localities say something about *where they live* (how many separate settlements the municipality has). The contrast the block ends up encoding—retirees relative to localities—is in effect "people-per-settlement vs spread-thin-across-settlements". A municipality with many retirees per locality reads as an ageing population concentrated in a few settlements; one with many localities per retiree reads as a population spread thinly across small ones. That contrast lines up with the urban–rural gradient the analysis is built to detect.
+
 ### Supplementary blocks (2)
 
-| Block | Variables (n) | Content |
+| Block | Variables (n) | What it captures |
 | ------------- | ------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 | `provision` | up to 33 | Counts of educational institution units by type (preschool, primary, secondary, adult, HE) × {total, public, private}. Some columns with no observations anywhere are dropped automatically. |
 | `opinion` | 9 | Survey-based satisfaction with preschool, elementary school and high school (bad / mid / good shares). |
@@ -34,7 +40,7 @@ The pipeline lives in `src/municipalities/04-proportions_CA.R`. The table you se
 
 ### 1. Backfilled 2022 cross-section
 
-For every variable, the value used is the 2022 figure if available; otherwise the most recent earlier figure for that municipality (priority: 2020-2021 window, then census-closest, then discontinued-last). This mirrors the sampling logic of `01-sampling.R`.
+For every variable, the value used is the 2022 figure if available; otherwise the most recent earlier figure for that municipality (priority: 2020-2021 window, then census-closest, then discontinued-last).
 
 ### 2. Block normalisation
 
@@ -43,7 +49,7 @@ Inside each *active* block (and the opinion block), every municipality's row is
 - every municipality contributes the same row mass to the CA (no size effect), and
 - every block contributes the same total weight (no block dominates because its raw counts are bigger).
 
-So the value in, say, the `Upper secondary` column for Upplands Väsby (435.2) reads as "435.2 of 1000 within the education block for that municipality" — i.e. ~43.5% of the municipality's educated population sits at upper-secondary level.
+So the value in, say, the `Upper secondary` column for Upplands Väsby (435.2) reads as "435.2 of 1000 within the education block for that municipality"; i.e. ~43.5% of the municipality's educated population has attained upper-secondary level.
 
 ### 3. Provision rescaled as a per-capita rate
 
@@ -51,8 +57,8 @@ The supplementary `provision` columns are *not* block-normalised. Instead each c
 
 ### 4. Renaming, drop-empty
 
-Columns are renamed to short readable labels (see `proportions_CA_table_columns.csv` for the mapping). Any column that is zero in every municipality (e.g. an institution type with no private units anywhere) is dropped — it carries no information and would break the CA's supplementary projection.
+Columns are renamed to short readable labels (see `proportions_CA_table_columns.csv` for the mapping). Any column that is zero in every municipality (e.g. an institution type with no private units anywhere) is dropped.
 
 ## How to read a row
 
-A row is a municipality's *profile* across all blocks. Within each active block the values sum to 1000 and can be read as per-mille shares; provision values are rates per 100 000 inhabitants; opinion values are also normalised to a per-1000 share within their preschool/elementary/highschool triples. Across blocks the values aren't comparable as raw numbers — that's the whole point of block normalisation: each block is comparable to *itself* across municipalities, not to the other blocks.
+A row is a municipality's *profile* across all blocks. Within each active block the values sum to 1000 and can be read as per-mille shares; provision values are rates per 100 000 inhabitants; opinion values are also normalised to a per-1000 share within their preschool/elementary/highschool triples. Across blocks the values aren't comparable as raw numbers; each block is comparable to *itself* across municipalities, not to the other blocks.