ruralitic-qrm/data/processed/proportions_CA_table.md
2026-06-02 12:25:13 +02:00

proportions_CA_table.csv

The input table to the proportions_CA correspondence analysis (Swedish municipalities, 2022 cross-section), with one row per municipality and one column per variable. Values have already been normalised (see "How it was built" below). The companion file proportions_CA_table_columns.csv lists, for every column, its role (active or supplementary) and its block.

Rows

290 Swedish municipalities, identified by code (4-digit SCB code, zero-padded) and municipality name.

Columns

Apart from the code and name columns, every column is a variable used in the CA. Variables are grouped into eight thematic blocks. Six are active (they shape the CA axes); two are supplementary (projected onto the axes for interpretation but not used to build them).

Active blocks (6)

| Block | Variables (n) | Content |
|---|---|---|
| education | 4 | Four levels of educational attainment, primary/lower-secondary through post-graduate. |
| employment | 16 | 16 activity sectors (NACE-style), agriculture through arts & recreation. |
| housing | 3 | Rented, tenant-owned, and owner-occupied dwellings (sum across building types). |
| workplace | 3 | Commuters in, commuters out, working and living in the same municipality. |
| migration | 2 | In-migrations, out-migrations. |
| demography | 2 | Number of retirees, number of localities. Pooled into one block so block normalisation can run. |

A note on the demography block

Pooling retirees and localities is partly a technical workaround. Block normalisation needs at least two variables in a block; with only one, each value is divided by itself, so the rescaling collapses every row to the same constant and the variable carries no information.
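The collapse described above can be seen in a minimal Python sketch. It is an illustration only, not the R pipeline's code; the function name `normalise_block` and all counts are hypothetical.

```python
BLOCK_TOTAL = 1000.0  # the constant each block is rescaled to


def normalise_block(row):
    """Rescale one municipality's values within a block so they sum to BLOCK_TOTAL."""
    total = sum(row)
    if total == 0:
        return [0.0 for _ in row]
    return [BLOCK_TOTAL * value / total for value in row]


# Hypothetical retiree counts for three municipalities, treated as a one-variable block.
retirees_only = [[1200.0], [45000.0], [300.0]]
single = [normalise_block(row) for row in retirees_only]
# Every row is now [1000.0]: the variable no longer distinguishes municipalities.

# The same retirees pooled with hypothetical locality counts.
pooled_raw = [[1200.0, 12.0], [45000.0, 5.0], [300.0, 20.0]]
pooled = [normalise_block(row) for row in pooled_raw]
# Rows still sum to 1000, but the retiree share now differs between municipalities.
```

With a second variable in the block, the rescaled values keep the ratio between the two variables, which is the contrast the block contributes to the CA.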

Both variables describe how the population is distributed: retirees say something about who lives there (ageing concentration), localities say something about where they live (how many separate settlements the municipality has). The contrast the block ends up encoding, retirees relative to localities, is in effect "people-per-settlement vs spread-thin-across-settlements". A municipality with many retirees per locality reads as an ageing population concentrated in a few settlements; one with many localities per retiree reads as a population spread thinly across small ones. That contrast lines up with the urban-rural gradient the analysis is built to detect.

Supplementary blocks (2)

| Block | Variables (n) | What it captures |
|---|---|---|
| provision | up to 33 | Counts of educational institution units by type (preschool, primary, secondary, adult, HE) × {total, public, private}. Some columns with no observations anywhere are dropped automatically. |
| opinion | 9 | Survey-based satisfaction with preschool, elementary school and high school (bad / mid / good shares). |

How it was built

The pipeline lives in src/municipalities/04-proportions_CA.R. The table you see in this CSV is the result of four steps.

1. Backfilled 2022 cross-section

For every variable, the value used is the 2022 figure if available; otherwise the most recent earlier figure for that municipality (priority: 2020-2021 window, then census-closest, then discontinued-last).
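A simplified Python sketch of the backfill rule follows. It is an illustration, not the pipeline's R code: it takes the 2022 value if present and otherwise the most recent earlier year, and it deliberately collapses the finer priority ordering (2020-2021 window, census-closest, discontinued-last) into "latest available year". The name `backfill` and the sample figures are hypothetical.

```python
TARGET_YEAR = 2022


def backfill(series, target_year=TARGET_YEAR):
    """Pick the value for target_year, else the most recent earlier year.

    series maps year -> value, with None marking a missing observation.
    Returns (year_used, value), or None when nothing usable exists.
    """
    candidates = [
        year for year, value in series.items()
        if year <= target_year and value is not None
    ]
    if not candidates:
        return None
    year_used = max(candidates)
    return year_used, series[year_used]


# Hypothetical series for one municipality and one variable: 2022 is missing.
observed = {2019: 410.0, 2021: 432.0, 2022: None}
chosen = backfill(observed)  # falls back to the 2021 figure
```

Returning the year alongside the value makes it possible to audit which cells in the cross-section are genuine 2022 figures and which were backfilled.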

2. Block normalisation

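Inside each active block, every municipality's row is rescaled so the block total is the same constant (1000). The opinion block is rescaled the same way, but within each of its preschool / elementary school / high school triples rather than across the whole block. After this step: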

  • every municipality contributes the same row mass to the CA (no size effect), and
  • every block contributes the same total weight (no block dominates because its raw counts are bigger).

So the value in, say, the Upper secondary column for Upplands Väsby (435.2) reads as "435.2 of 1000 within the education block for that municipality"; that is, about 43.5% of the people counted in the municipality's education block fall in the upper-secondary attainment category.
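The rescaling can be sketched in Python as below. This is an illustration under stated assumptions, not the pipeline's R code: the column names, the function `normalise_blocks`, and the raw counts are all hypothetical, and the education counts were invented so that the upper-secondary share comes out at 435.2 per 1000, matching the reading explained above.

```python
BLOCK_TOTAL = 1000.0


def normalise_blocks(raw_row, blocks):
    """Rescale one municipality's raw values so every block sums to BLOCK_TOTAL.

    raw_row maps column -> raw value; blocks maps block name -> list of columns.
    """
    shares = {}
    for columns in blocks.values():
        total = sum(raw_row[col] for col in columns)
        for col in columns:
            shares[col] = BLOCK_TOTAL * raw_row[col] / total if total else 0.0
    return shares


# Hypothetical block layout and invented raw counts (not real municipal data).
blocks = {
    "education": ["primary", "upper_secondary", "post_secondary", "post_graduate"],
    "migration": ["in_migrations", "out_migrations"],
}
raw_row = {
    "primary": 2000, "upper_secondary": 4352,
    "post_secondary": 3000, "post_graduate": 648,
    "in_migrations": 900, "out_migrations": 600,
}

shares = normalise_blocks(raw_row, blocks)
row_total = sum(shares.values())  # 1000 per block, so 2000 for these two blocks
```

Because every block sums to the same constant, the row total depends only on the number of blocks, which is why all municipalities end up with the same row mass.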

3. Provision rescaled as a per-capita rate

The supplementary provision columns are not block-normalised. Instead each count is divided by the raw (pre-normalisation) row sum of the education block, which serves as a population proxy, and multiplied by 100 000, giving an "institutions per 100 000 inhabitants" rate. Block normalisation here would assign artificially high "shares" to rare institution types in small municipalities that happen to host one, throwing their supplementary projection far outside the active cloud.
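A small Python sketch contrasts the two options. It is illustrative only and assumes the education row sum is taken before normalisation (afterwards it would be 1000 for every municipality and could not act as a population proxy). The function names and all counts are hypothetical.

```python
def provision_rate(count, education_raw_sum, per=100_000):
    """Institutions per `per` inhabitants, using the raw education row sum as population proxy."""
    if education_raw_sum <= 0:
        raise ValueError("population proxy must be positive")
    return per * count / education_raw_sum


def block_share(count, block_counts, total=1000.0):
    """What block normalisation would give instead: the count as a per-1000 share of its block."""
    return total * count / sum(block_counts)


# Hypothetical small municipality: one unit of a rare type among 4 institutions, proxy 2 500.
small_rate = provision_rate(1, 2_500)      # 40.0 per 100 000
small_share = block_share(1, [1, 3])       # 250.0 per 1000

# Hypothetical large municipality: 12 units of the same type among 400, proxy 600 000.
large_rate = provision_rate(12, 600_000)   # 2.0 per 100 000
large_share = block_share(12, [12, 388])   # 30.0 per 1000
```

In this invented case the share for the small municipality is a quarter of its whole block, which is the kind of extreme profile the per-capita rate is meant to avoid.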

4. Renaming, drop-empty

Columns are renamed to short readable labels (see proportions_CA_table_columns.csv for the mapping). Any column that is zero in every municipality (e.g. an institution type with no private units anywhere) is dropped.
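The drop-empty and renaming step can be sketched in Python as follows. This is an illustration rather than the pipeline's R code; the function name, the original column names, and the label mapping are hypothetical stand-ins for what proportions_CA_table_columns.csv records.

```python
def rename_and_drop_empty(rows, labels):
    """Drop columns that are zero in every row, then apply short labels.

    rows is a list of {column: value} dicts sharing the same keys;
    labels maps original column name -> short label (unmapped names are kept).
    """
    columns = list(rows[0]) if rows else []
    keep = [col for col in columns if any(row[col] != 0 for row in rows)]
    return [{labels.get(col, col): row[col] for col in keep} for row in rows]


# Hypothetical provision columns for two municipalities.
rows = [
    {"he_units_total": 2.0, "he_units_private": 0.0, "preschool_units_total": 31.0},
    {"he_units_total": 0.0, "he_units_private": 0.0, "preschool_units_total": 8.0},
]
labels = {"he_units_total": "HE total", "preschool_units_total": "Preschool total"}

cleaned = rename_and_drop_empty(rows, labels)
# "he_units_private" is zero everywhere and is removed; the rest get short labels.
```

A column is kept as soon as one municipality has a non-zero value, so a zero in an individual row is still meaningful in the output.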

How to read a row

A row is a municipality's profile across all blocks. Within each active block the values sum to 1000 and can be read as per-mille shares; provision values are rates per 100 000 inhabitants; opinion values are also normalised to a per-1000 share within each preschool / elementary school / high school triple. Values are not comparable across blocks as raw numbers: each block is comparable to itself across municipalities, not to the other blocks.
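One way to sanity-check this reading is to sum each active block per row and flag any that does not come to 1000. The Python sketch below is illustrative: the `(role, block)` layout mirrors what the companion columns file is described as containing, but the exact field names, the column names, the values, and the rounding tolerance are assumptions.

```python
def off_total_blocks(row, column_info, total=1000.0, tol=0.5):
    """Return {block: sum} for active blocks whose row sum is not within tol of total.

    row maps column -> value; column_info maps column -> (role, block).
    The tolerance allows for rounding in the stored values (an assumption).
    """
    sums = {}
    for column, (role, block) in column_info.items():
        if role == "active":
            sums[block] = sums.get(block, 0.0) + row[column]
    return {block: s for block, s in sums.items() if abs(s - total) > tol}


# Hypothetical column metadata and one invented row.
column_info = {
    "Primary": ("active", "education"),
    "Upper secondary": ("active", "education"),
    "In-migrations": ("active", "migration"),
    "Out-migrations": ("active", "migration"),
    "Preschool total": ("supplementary", "provision"),
}
row = {
    "Primary": 564.8, "Upper secondary": 435.2,
    "In-migrations": 590.0, "Out-migrations": 400.0,
    "Preschool total": 41.7,
}

problems = off_total_blocks(row, column_info)  # only the migration block is flagged
```

Supplementary columns are skipped on purpose, since provision values are rates rather than shares of a block total.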