1. General
This page contains a collection of EUROMOD modelling conventions. It gathers agreed rules and is, as such, partly forward-looking, with some rules yet to be implemented in EUROMOD. All guidelines are compulsory except recommendations, which are denoted with [REC].
- The term “base-year simulation” refers to the case in which the year of policy rules matches the input data income reference period (e.g. 2019 policy rules and SILC 2020). The purpose of base-year simulation is to provide information about the actual income distribution in the most accurate way. Thus, aggregate figures of simulated components should match as closely as possible the official statistics. However, this should be done without any calibrations of the sort that would distort the effect of simulated changes. The aim is to create the best starting point for simulating changes, not to reproduce base year statistics as an end in itself.
- The term “target-year simulation” refers to the case in which the year of policy rules does not match the input data income reference period (e.g. 2020-2022 policy rules and SILC 2020 with 2019 incomes). The purpose of target-year simulation is to provide accurate policy simulations based on certain assumptions about growth in (market) incomes rather than to derive the actual income distribution. Thus, aggregate figures of simulated and non-simulated components will not necessarily match official statistics. Nevertheless, one needs to compare the results against official figures to check that the simulation and uprating (see Section 4) move in a sensible direction, and to be able to explain any large variation in EUROMOD estimates. Furthermore, the Country Report discussion of the macrovalidation results should explain, where possible, the main reasons why external and EUROMOD estimates diverge (e.g. falling unemployment not captured in the target-year EUROMOD estimates). See Section 14 for more on validation.
- The term “baseline simulations” refers to the best-possible combination (best match) between policy rules year and input data (e.g. 2019-2022 policy rules and SILC 2020, in case SILC 2020 is the most recent data available).
- The EU official country acronyms are to be used (e.g. in tax unit and input/output file names etc). The list can be found at http://publications.europa.eu/code/en/en-370100.htm.
2. Input datasets
- Input datasets are kept in (tabulated) text format.
- The files based on EU-SILC data are named CC_SILCyear_x# where CC is the country acronym, SILCyear refers to the SILC data collection year, x is a letter identifying the SILC database source (a, b, …) and # is the version number. For example, ES_2015_b1.txt.
- The files based on EU-SILC and HBS data are named CC_SILCyear_x#_HBSyear_COICOPversion_y$, where HBSyear refers to the HBS collection year, COICOPversion is the two-digit year of the COICOP version used in the data, y is a letter identifying the expenditure database source (e for EU-HBS, n for national HBS, a for admin data) and $ is the version number of the matched EUROMOD dataset. For example, ES_2015_b1_2015_03_e2.txt.
- Hypothetical household datasets generated using the Hypothetical Household Tool (HHoT) are named CC_year_hhot, where CC is the country acronym and year refers to the policy year. For example, ES_2015_hhot.txt.
- All variables in input data must be documented in a Data Requirement Document (DRD), most importantly how they have been derived from the original source and what they contain. See also the information requirements in the DRD template.
- Every dataset must include all compulsory variables (see the list in the DRD template). Where no required information is available, a variable with zero values needs to be created.
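Because these file-naming patterns are fixed, they can be checked mechanically. The Python sketch below is a hypothetical validator (the function `parse_input_filename` is ours, not part of EUROMOD); it assumes that country acronyms are two upper-case letters, that version numbers are one or more digits, and that the expenditure source letter is one of e, n or a as listed above.

```python
import re

# Hypothetical validator for the input-file naming patterns described above.
SILC_RE = re.compile(
    r"^(?P<cc>[A-Z]{2})_(?P<silc_year>\d{4})_(?P<source>[a-z])(?P<version>\d+)"
    r"(?:_(?P<hbs_year>\d{4})_(?P<coicop>\d{2})_(?P<exp_source>[ena])(?P<exp_version>\d+))?"
    r"\.txt$"
)
HHOT_RE = re.compile(r"^(?P<cc>[A-Z]{2})_(?P<year>\d{4})_hhot\.txt$")


def parse_input_filename(name):
    """Return the name's components as a dict, or None if no pattern matches."""
    for kind, pattern in (("hhot", HHOT_RE), ("silc", SILC_RE)):
        match = pattern.match(name)
        if match:
            # Drop the optional HBS groups when they are absent.
            parts = {k: v for k, v in match.groupdict().items() if v is not None}
            if kind == "silc" and "hbs_year" in parts:
                kind = "silc_hbs"
            parts["kind"] = kind
            return parts
    return None


print(parse_input_filename("ES_2015_b1.txt"))
print(parse_input_filename("ES_2015_b1_2015_03_e2.txt"))
print(parse_input_filename("ES_2015_hhot.txt"))
```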
- Reference units:
- Data is provided at the individual level (with people grouped into households).
- All income variables must be provided with gross values, i.e. before deduction of employee and self-employed social insurance contributions and any taxes but excluding employer social insurance contributions. Where these are not available they must be imputed. If both gross and net values are recorded in the data, such values must be checked to see if these are reliable (e.g. gross > net; ratio between gross and net according to income source, personal characteristics and tax-benefit rules).
- Income and expenditure data must be expressed in monthly terms, i.e. divided by 12 if originally recorded in annual terms regardless of the actual number of months of receipt. Where the latter is known this information needs to be retained in the input dataset for improving the simulation accuracy of monthly-based policies (e.g., social insurance contributions and unemployment benefit). This does not apply to (monetary) asset variables as these reflect the stock of resources not flow.
- Monetary variables need to be presented in the national currency (if other than the euro). Convert monetary variables using the same exchange rate as was used originally to convert the national currency into euros (e.g., PX010 in SILC). See Section 5 for rules concerning exchange rate for policy parameters and output.
- Sample adjustments:
- Observations with zero (or negative) household weight need to be dropped from the sample. If observations with positive weights have been dropped (e.g. because of missing income information), the weights must be recalibrated.
- Children who are born after the income reference period must be dropped from the sample (for example, those born in 2006 in EU-SILC 2006). Instead, an indicator variable is used to record the number of such children in each household.
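A minimal sketch of the two sample adjustments above, assuming person records carry the hypothetical fields `idhh`, `idperson`, `dwt` (household weight) and `birth_year`. Weight recalibration is not shown, since the conventions do not prescribe a method.

```python
from collections import Counter


def apply_sample_adjustments(people, income_year):
    """Drop non-positive-weight records and children born after the income year.

    people: list of dicts with the hypothetical keys 'idhh', 'idperson',
    'dwt' and 'birth_year'.
    Returns (kept_people, unborn_counts), where unborn_counts maps idhh to the
    number of dropped children, to be stored in an indicator variable.
    """
    positive = [p for p in people if p["dwt"] > 0]
    unborn = Counter(p["idhh"] for p in positive if p["birth_year"] > income_year)
    kept = [p for p in positive if p["birth_year"] <= income_year]
    return kept, dict(unborn)


people = [
    {"idhh": 1, "idperson": 101, "dwt": 100.0, "birth_year": 1980},
    {"idhh": 1, "idperson": 102, "dwt": 100.0, "birth_year": 2006},
    {"idhh": 2, "idperson": 201, "dwt": 0.0, "birth_year": 1970},
]
# EU-SILC 2006 with 2005 incomes: the child born in 2006 is dropped and counted.
kept, unborn = apply_sample_adjustments(people, income_year=2005)
```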
- EUROMOD operates only with personal level variables:
- Any monetary variable at the household level in the original dataset (typically, capital income, family allowances, social exclusion benefits, inter-household transfers, taxes and social contributions) must be assigned to one person in the household. Components of market income can also be divided between several persons where it makes sense. When the choice of person(s) is not obvious, the value should be assigned to the person whose age is closest to 45 (either below or above). If more than one person qualifies, the value goes to the one who appears first in the household (by idperson).
- Capital and property income must be shared equally between the oldest household member and his/her partner (the underlying assumption being that in three-generation households it would probably be the oldest couple who retain this kind of income).
- Any monetary variable related to housing at the household level in the original dataset (housing allowances, imputed rent, housing cost) must be assigned to the person(s) responsible for the accommodation. If there are two such persons then the amount is shared equally between them.
- Any non-monetary variable at the household level in the original dataset should be assigned to all the persons in the household. (Such variables are explicitly marked in the variable configuration file).
- Children with no parents (“loose children”) are not to be assigned to other adults in the household if there are any. However, one should check how common these cases are.
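The default assignment rules above can be sketched as follows. The sketch assumes that "appears first in the household (by idperson)" means the lowest idperson, and it represents partnerships with a hypothetical `partner_of` mapping; neither reflects EUROMOD's internal data structures.

```python
def default_recipient(members):
    """Person receiving a household-level amount when the choice is not obvious.

    members: list of (idperson, age) tuples. Picks the person whose age is
    closest to 45; ties go to the lowest idperson (an assumption).
    """
    return min(members, key=lambda m: (abs(m[1] - 45), m[0]))[0]


def share_capital_income(members, partner_of, amount):
    """Split capital/property income equally between the oldest member and partner.

    partner_of: hypothetical dict of idperson -> partner's idperson.
    If the oldest member has no partner, they receive the whole amount.
    """
    oldest = min(members, key=lambda m: (-m[1], m[0]))[0]
    partner = partner_of.get(oldest)
    if partner is None:
        return {oldest: amount}
    return {oldest: amount / 2, partner: amount / 2}


household = [(1, 70), (2, 68), (3, 44), (4, 46), (5, 12)]
print(default_recipient(household))  # ages 44 and 46 tie; lowest idperson wins
print(share_capital_income(household, {1: 2, 2: 1}, 300.0))
```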
- Other imputations:
- Categorical variables have to be created with categories, not as dummy variables.
- Missing values are not allowed. For non-applicable (N/A), -1 or 0 can be used.
- Negative values of benefit and tax variables can optionally be recoded as zero in the input dataset (as they likely represent back payments and hence are not related to the income reference period in question), except for the variable tad (originally hy145n in EU-SILC), which by definition is meant to record such amounts.
- Negative values of market income variables (e.g. self-employment and financial income) are to be included in the input dataset without any adjustment, as they can be legitimate values for a given income reference period. However, these should be recoded as zero in the policy neg_cc in EUROMOD.
- Age must be recorded as at the end of the income reference period.
- Where there is no information about financial capital in the original data source (e.g. SILC), this needs to be imputed using average interest rates in the income reference year for each type of investment income recorded in the data. If such detailed information is not available then, by default, financial capital should be imputed using the bank interest rate on deposits from households with an agreed maturity of over one year provided by the ECB Data Portal.
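The imputation amounts to capitalising the recorded income flow at an annual interest rate. In this sketch the rate passed in is illustrative, not an ECB figure, and treating non-positive income as zero capital is our assumption rather than a convention.

```python
def impute_financial_capital(monthly_income, annual_rate):
    """Impute a financial capital stock from a monthly investment-income flow.

    Income variables are stored in monthly terms (annual / 12), so the annual
    flow is recovered first and then divided by the annual interest rate.
    Assumption of this sketch: non-positive income implies zero capital.
    """
    if annual_rate <= 0:
        raise ValueError("annual_rate must be positive")
    annual_income = monthly_income * 12
    return max(annual_income, 0.0) / annual_rate


# Illustrative only: 10 per month of interest income at a 2% annual rate
print(impute_financial_capital(10.0, 0.02))
```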
- Level of detail:
- All cash incomes available in the original dataset need to be retained, including information on taxes and benefits that are to be simulated in the model (where this is reliable). Capital gains and other lump-sum incomes (e.g. lottery winnings, severance pay) must be clearly separated from other incomes (e.g. dividends, interests etc) if possible.
- Income data should be as detailed as possible. Only the same type of income can be aggregated (e.g. earnings from the main and a secondary job, but not two unemployment benefits where one is means-tested and the other is not), provided that the more detailed information is not needed by the model and is unlikely to be used. Incomes can be aggregated up to class 2 (see Section 3), e.g. yem.
- The same monetary information cannot be entered in more than one variable of the same level of detail. If all different components (i.e. commonly rooted variables) are known then their sum must be equal to the root (with the exception of variables reporting duration).
- Where both detailed and aggregated variables are included (e.g. yem and yem*) the exact relationship must be documented.
- [REC] Housing expenditure should be as detailed as possible, at least with rent expenses separated from others (e.g. electricity, water etc.).
- Original ID variables (renamed to origid*) are always retained so that the EUROMOD input dataset can be linked to the original data source (e.g., UDB or national SILC).
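The rule that fully known components must add up to their root can be checked person by person. The `structure` mapping below is a hypothetical example based on the poa00/poacm pair described in Section 3; a real check would derive it from the variable list and leave out duration variables. The tolerance is likewise an assumption.

```python
def check_component_sums(person, structure, tol=0.01):
    """Return the roots whose fully known components do not add up to the root.

    person: dict of variable name -> monthly amount (None if unknown).
    structure: dict mapping a root variable to its list of components.
    Roots with any unknown component are skipped, because the rule only
    applies when all components are known.
    """
    failures = []
    for root, components in structure.items():
        values = [person.get(c) for c in components]
        if person.get(root) is None or any(v is None for v in values):
            continue
        if abs(sum(values) - person[root]) > tol:
            failures.append(root)
    return failures


structure = {"poa": ["poa00", "poacm"]}  # illustrative root/component mapping
print(check_component_sums({"poa": 500.0, "poa00": 400.0, "poacm": 90.0}, structure))
```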
3. Variable naming convention
- Variables must be named following a naming convention: a variable's name is built by joining a list of acronyms together in a predetermined order. For a list of variables and acronyms currently used in EUROMOD, see the model (Administration Tools > Variables).
- There are two classes of acronyms ordered hierarchically:
- Class 1: one character that identifies the type of variable (asset, labour market, demographic, system, y for (cash) market income, expenditure, benefit, pension, taxes and contributions, in-kind income); id variables are an exception, as they have a two-letter class 1 acronym.
- Class 2: two-character acronyms specific to each variable type (assets, demographic, etc.). Each Class 2 acronym has a unique meaning within each variable type (Class 1), but the same acronym may mean different things across variable types (e.g. AG stands for age among demographic variables and for agriculture among assets, taxes, market income and benefits). The Class 2 acronyms are listed in ordered groups.
- For example, employment income is named yem: y for market income + em for employment.
- All variable names must always begin with a class 1 acronym followed by at least one class 2 acronym.
- The order of acronyms in the variable name must follow the order of the groups in which these acronyms are included. This prevents the same variable from having different names because the acronyms are used in different orders. For example, a housing benefit complement for pensioners must be named bhopecm instead of bhocmpe (b - benefit, ho - housing, pe - pensioner, cm - complement).
- [REC] Avoid using more than one acronym from the same class 2 group. If more than one acronym is used from the same group, they should be ordered alphabetically.
- Intermediate Class 2 groups can be omitted, i.e. acronyms from a previous group are not compulsory.
- [REC] Avoid using more than five class 2 acronyms together (this would result in a name with more than 11 characters). Variable names typically include one to three class 2 acronyms.
- Each acronym should add relevant or useful information. In the case of monetary variables, each additional acronym should represent an additional level of detail or specificity, and the variable is assumed to be a component (part) of the commonly rooted, less detailed variables. For example, bunctcm is a component of bunct, which is a component of bun. (Note that main components can be distinguished from complements using the acronym 00. For example, poa00 is the main (“basic”) old-age pension and poacm its complement.)
- Before creating a new variable, it must be checked whether that name (or something similar) is already defined in EUROMOD. It is strongly recommended to use the existing variables whenever possible. For example, if there is already a variable that measures the time worked in months, avoid creating a new one that measures that in weeks or years.
- When new categorical variables are created, the full list of categories, types or statuses must be documented.
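Because class 1 acronyms are one character long and class 2 acronyms two, a variable name can be split mechanically. This sketch ignores id variables (with their two-letter class 1 acronym) and suffixes such as _s, and relies on the convention that a component's name extends its root's name; the function names are hypothetical.

```python
import warnings


def split_variable_name(name):
    """Split a (non-id) variable name into its class 1 and class 2 acronyms."""
    # A valid name is 1 + 2k characters long with k >= 1.
    if len(name) < 3 or len(name) % 2 == 0:
        raise ValueError(f"{name!r} is not one class 1 + whole class 2 acronyms")
    head, rest = name[0], name[1:]
    acronyms = [rest[i:i + 2] for i in range(0, len(rest), 2)]
    if len(acronyms) > 5:
        warnings.warn(f"[REC] {name!r} uses more than five class 2 acronyms")
    return head, acronyms


def is_component_of(child, parent):
    """True if child is a more detailed, commonly rooted variable of parent."""
    return child != parent and child.startswith(parent)


print(split_variable_name("bhopecm"))
print(is_component_of("bunctcm", "bunct"), is_component_of("bunct", "bun"))
```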
4. Uprating (non-simulated) monetary variables
- Where income reference period and policy year do not match, monetary variables need to be updated to the policy year. This is done by using relevant uprating factors in the model.
- Table 1 provides the guidelines for deriving uprating factors for market incomes as well as pensions, non-simulated benefits, and other sources of income/expenditure, following the purpose of target-year simulations (see Section 1). These guidelines set out the minimum standard that country models need to fulfil. The sources used for the derivation of uprating factors, as well as any departures from the guidelines, should be thoroughly documented in the Country Reports.
- An uprating factor equal to the Harmonised Index of Consumer Prices is centrally included and named $HICP (the time series can be found on the Eurostat website, indicator prc_hicp_aind); projections of HICP published by DG ECFIN can be found here (see indicator ZCPIH). Developers and NTs should not change the HICP uprating factor. All other uprating factor names should start with the prefix “$f_”.
- If for the same pension/benefit two uprating factors are available calculated using information on 1) the indexation rules and 2) the growth in the average amount, the default uprating factor should be based on 1). However, a switch policy called UAA_cc (Uprating by Average Adjustment) should also enable the use of the uprating factor based on 2) – for switchable policies, see also Section 8.
- Uprating of earnings:
- Uprating factors reflecting the evolution of average hourly wages by sector of economic activity are centrally included in the Uprating Indices table and named $f_hourly_wage_lindi_xx. The time series used to construct them can be found here and here (Eurostat series NAMA_10_A64 and NAMA_10_A64_E, respectively). For the most recent policy year, DG ECFIN's economic forecasts of nominal compensation per employee were used to populate the series. The relevant statistics are available here.
- In case national sources provide better and/or more granular information than the above-mentioned Eurostat sources, they can be used instead.
- Uprating of pensions and other benefits:
- Detailed description of the policy changes regarding pensions needs to be included in the Country Report. Although pensions are largely not simulated in EUROMOD, they are a very important income source and we ought to know if there have been changes to the system and the extent to which they are captured in our simulations/uprating, if at all.
- Uprating factors for public pensions and other (non-simulated) benefits should reflect statutory and discretionary indexation rules (if such rules exist) as well as changes in policies and exclude changes in the composition of recipients which could affect the dynamics of average amounts. Uprating factors should follow the rules which were applied in practice and should be comprehensive, capturing both regular annual indexation (according to the statutory rules or common practice) as well as additional ad hoc policy changes (cuts, revisions/recalculations, ad hoc increases etc.). As such, calculations could be complex and involve differentiated percentage rates or even lump-sum amounts.
- In addition to the main uprating factor for public pensions (i.e. the default option in the model), provide information on the growth in average public pensions. Growth in average amounts would capture, in addition to the policy changes, changes in the composition of recipients.
| Income component | Uprating factors (by priority) | Comments |
|---|---|---|
| Income from employment | 1) Growth in average hourly wages by sector of economic activity centrally included in the models | In case national sources provide better and/or more granular and/or more timely information than the uprating factors added centrally, the former can be used instead of the latter. |
| Income from self-employment | 1) Growth in average income from self-employment (full-time equivalent if possible) | It is advised not to use information on aggregated self-employment income from national accounts because of likely large compositional changes and income fluctuations. However, if information on average self-employment income is available it can be used for the uprating factor and could be preferred over the minimum standard guidelines. |
| Private transfers | 1) Same as for income from employment | |
| Private pensions | 1) Annual indexation rules | In case private pensions are a minor source of income and average amounts or indexation rules are hard to obtain, discuss other options with the developer. |
| Income from capital | 1) Growth in average income from capital | Ideally, different sources of income from capital (dividends, interest income, profit from unincorporated business) should be distinguished and updated separately using corresponding sources of information. If this is not feasible, then the second best approach is to apply the uprating factor of the dominant income component to all income from capital. |
| Expenditures | CPI | Note that for example, some tax allowances/deductions might be linked to earnings, in which case uprating by growth in average full-time equivalent earnings would be a more appropriate uprating factor. Also, if the model includes data on income and expenditure for any component (e.g. rent on property paid and received) then the uprating indices should be the same for both. |
| Public pensions | 1) Annual indexation and policy changes | If option 1) is used in the model, provide also growth in average pensions. |
| Contributory benefits | 1) Annual indexation and policy changes | |
| Non-contributory benefits | 1) Annual indexation and policy changes | |
| Components that are used only for micro/macro-validation in the base year | Uprating factor of 1 | These are components recorded in the data that the model also simulates, so the recorded values do not enter the simulations or any income list, e.g. tin (income tax), tscee (employee SIC), tscer (employer SIC), yds (disposable income). |
- The model constructs uprating factors for each input dataset from the raw data series, taking the relevant year as the base period. Table 2 sets out a hypothetical example of the raw data series for a non-simulated maternity benefit.
| Index | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 | Original source and comments |
|---|---|---|---|---|---|---|---|---|---|
| benefit amount (EUR/month) | 339 | 353 | 369 | 389 | 400 | 410 | 415 | 415 | National Statistical Office (link) |
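Assuming the factor is simply the ratio of the series value in the policy year to its value in the base year, the Table 2 series gives the following for a dataset with 2010 incomes used for 2014 policy rules. The function name is hypothetical.

```python
# Raw series from Table 2 (hypothetical maternity benefit, EUR/month).
BENEFIT_AMOUNT = {2007: 339, 2008: 353, 2009: 369, 2010: 389,
                  2011: 400, 2012: 410, 2013: 415, 2014: 415}


def uprating_factor(series, base_year, policy_year):
    """Ratio of the series value in the policy year to that in the base year."""
    return series[policy_year] / series[base_year]


# Dataset with 2010 incomes used for 2014 policy rules: 415 / 389
factor = uprating_factor(BENEFIT_AMOUNT, base_year=2010, policy_year=2014)
print(round(factor, 4))  # 1.0668
```

When the policy year equals the base year, the factor is 1, as in a base-year simulation.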
5. System and database configuration
- Each EUROMOD input database (e.g. 2020) can be used, by default, to simulate the best-match policy system (e.g. 2019) and the three following years (2020/21/22). This default flags that system/data combinations with relatively large year gaps can lead to poor-quality simulation results. Users can still change this configuration manually and run whatever data/system combination they need.
- For each policy year, the (single) best combination/match with available input datasets – baseline simulation – is denoted by ‘best’ (in Country Tools > Databases). For a given policy year, the best dataset is (by priority):
- (i) the input dataset whose income reference period matches the policy year, e.g. EU-SILC 2012 (with incomes of 2011) for 2011 policy rules;
- (ii) the latest input dataset whose income reference period precedes the policy year, e.g. if EU-SILC 2008 and 2010 are both available then EU-SILC 2008 should be used for 2008 policy rules;
- (iii) the earliest input dataset whose income reference period follows the policy year, e.g. if only EU-SILC 2010 is available then this should be used for 2008 policy rules.
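The priority rules and the default system range can be expressed as short functions. Datasets are keyed by their income reference year (e.g. 2019 for EU-SILC 2020); the dict-based representation and function names are a hypothetical simplification of what is configured in Country Tools > Databases.

```python
def best_dataset(policy_year, datasets):
    """Return the 'best' dataset for a policy year, or None if there are none.

    datasets: dict of dataset name -> income reference year.
    Priority: (i) income year equals the policy year; (ii) the latest income
    year before it; (iii) the earliest income year after it.
    """
    exact = [name for name, year in datasets.items() if year == policy_year]
    if exact:
        return exact[0]
    earlier = {name: year for name, year in datasets.items() if year < policy_year}
    if earlier:
        return max(earlier, key=earlier.get)
    later = {name: year for name, year in datasets.items() if year > policy_year}
    if later:
        return min(later, key=later.get)
    return None


def default_systems(income_year):
    """Policy years a dataset can simulate by default: best match plus three following."""
    return [income_year + k for k in range(4)]


print(best_dataset(2008, {"SILC2008": 2007, "SILC2010": 2009}))
print(default_systems(2019))  # e.g. EU-SILC 2020 with 2019 incomes
```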
- HHoT datasets (see Section 2) are configured in a separate tab from other (micro) datasets. The following standardised specifications apply:
- The currency is set to national.
- Income and data year are the same as the year of the dataset. For example, 2015 for the dataset ES_2015_hhot.txt.
- The default combination of HHoT data and policy system is the policy system with the same year as the dataset, which should be set to “x”. Other combinations should be set to “n/a”. For example, the systems ES_2007 to ES_2014 are set to “n/a” for ES_2015_hhot.txt, while the system ES_2015 is set to “x”. Subsequent systems are again set to “n/a”.
- All (allowed) policy-dataset combinations must run without producing errors or warnings; however, the checks of the validity of the results (see Section 14) are generally limited to the baseline simulations.
- Any dataset or policy system (or part of one) that is not publicly available (e.g. any system under construction or systems constructed for a specific project) must be defined as private.
- Exchange rates from the euro to the national currency should be those of June 30 in the corresponding policy year. They are managed centrally and should not be changed manually by developers or NTs.
- Monetary policy parameters should be in the national currency. The choice is flexible for countries having policy years implemented before and after the euro adoption.
- Output data must be in the national currency (by default). In the case of euro-zone countries use the euro also for the systems preceding the euro adoption.
- All policy systems must also have training data defined.
- Each policy system and each individual policy/function/parameter has a unique ID; these IDs are critical for automated merging of different versions of the model. To ensure that IDs remain constant (for baseline policy systems), only use the add system/policy/function/parameter feature for genuinely new elements, not for revising existing ones.
6. Scope of policies
- Policies are simulated as of June 30 in the corresponding policy year and on an accrual not cash basis (i.e. independent of when actual payments are made). For example, final annual income tax liability is simulated, not just current withheld tax payments. Substantial within-year policy changes can optionally be simulated using an extension (FYA_cc, see Section 8).
- Policies to be simulated (at least partly) are:
- social insurance contributions: employer, employee, self-employed and credited contributions;
- personal income taxes (excluding taxation of capital gains);
- property, wealth and other personal direct taxes if possible;
- cash benefits (including unemployment and parental benefits);
- minimum wage.
- Policies are to be simulated as detailed as possible given the underlying dataset. For example, all components of SICs are to be simulated and retained in separate variables (e.g. health insurance contributions by employers).
- Regional differences are to be simulated as far as possible.
- In general, policies are implemented assuming full benefit take-up and no tax evasion. However, whenever the macro-validation exercise shows that results produced by the model deviate substantially from comparable external sources, non-take-up and tax evasion should be modelled (if possible). In such cases, the baseline will be based on the parameterisation accounting for non-take-up and evasion (see also Section 8). Both versions of the model (with and without the adjustment for non-take-up and tax evasion) must be validated. Where non-take-up is modelled, random assignment is preferred over the data eligibility approach.
7. Order and structure of policies
- Policy simulations should begin with the following sequence:
o default values (SetDefault_cc),
o uprating factors (Uprate_cc),
o uprating in bands (Uprate_bands_cc) – if applicable,
o constant definitions (ConstDef_cc),
o initialisation of variables (InitVars_cc) – if applicable,
o standard income list definitions (ILSDef_cc),
o standard income lists for UDB EU-SILC definitions (ILSUDBDef_cc),
o country-specific income list definitions (ILDef_cc),
o tax unit definitions (TUDef_cc),
o generation of random numbers (Random_cc) – if applicable,
o switchable policies (see Section 8) – to keep them outside the METR loop,
o minimum wage (yem_cc), and
o recoding negative values of income (neg_cc) – see Section 2 for further details.
- If pensions are uprated in bands (even only between two years and uprated as usual throughout the rest of the period), this should be implemented in a separate policy called Uprate_bands_cc, which should be placed right after (or as close as possible to) the policy Uprate_cc.
- [REC] If market incomes or non-simulated benefits (apart from pensions) are uprated in bands, this could be implemented either in Uprate_cc or in a separate policy (similar to Uprate_bands_cc).
- Simulation of a policy system (i.e. policy spine) should end with the following two policies: individual-level standard output (output_std_cc) and household-level standard output (output_std_hh_cc). Unlike the former, the latter policy output_std_hh_cc should always be switched off.
- [REC] Each independent instrument should be modelled in a separate policy module, e.g. each family benefit in its own module. On the other hand, a given policy component should be implemented within the same policy module across policy systems and not be split into different policy modules by years (e.g. when there are large structural changes). The same logic applies to regional policies: a policy module should not be split when different systems exist for different regions.
- The policy name should be the main output variable name of the policy (without _s) followed by _cc (e.g. tscee_be). In the comments column, the instrument's title must start with DEF, SWITCH, TAX, BEN or SIC, and must give the name of the instrument both in English and, in parentheses, in the native language.
- [REC] It is strongly recommended to store as many policy parameters as possible as constants, especially those used repeatedly throughout the model (e.g. minimum wage). It is also recommended to harmonise constant names using specific naming conventions.
- Constants used in more than one policy must be defined in a separate constant definition policy (ConstDef_cc).
- All monetary values with a reference period (including those on a monthly basis) must have the period defined, e.g. #d (daily), #w (weekly), #m (monthly), #q (quarterly), #y (yearly). Monetary stock parameters must be denoted with #c (capital). This allows the model to distinguish monetary parameters from non-monetary ones. (See EUROMOD help for a complete list of period notations and further details.)
- To store intermediate results in the baseline systems, define temporary variables using the prefix i_ in the variable name (the preferred option) or use pre-defined variables (sinXX_s, but not stmXX_s).
- Whenever joint taxation is applied, the simulated tax must be allocated between the members of the assessment unit in proportion to their taxable income/tax base, in the relevant policy (where the income tax is simulated).
- Each simulated benefit must be assigned to the most likely recipient within the assessment unit (often the head).
- Where (short-term) benefits are adjusted with the number of months in receipt, this should be modelled as the last step in the relevant policy (if possible). Where social insurance contributions contain fixed amount elements, these should be adjusted with the number of months in work.
- Where adjustments such as for benefit take-up or tax evasion rely on random numbers, these should be generated in the policy Random_cc for the whole population at the household level, assigning the same number to everyone in the household to ensure consistency with reform scenarios (see for example BE). The assumption is that decisions about benefit take-up or tax evasion are taken at the household rather than the individual level.
- If a policy for initialising variables exists, it should use the harmonised name InitVars_cc.
- Make sure that the Marginal Effective Tax Rate (METR) and Net Replacement Rate (NRR) Add-ons work with the policy system and training data, making adjustments for country specifics if necessary. For the calculation of METRs and NRRs, benefit take-up and tax compliance adjustments should be switched off.
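Three of the conventions above can be illustrated in Python: reading a period-suffixed parameter, splitting a jointly assessed tax in proportion to the tax base, and drawing one random number per household. This is a sketch, not EUROMOD's implementation; the function names are hypothetical, the parser only handles plain numbers (not formulas), and the equal split when no member has a positive tax base is an assumption.

```python
import random
import re

# Period notations listed above; "c" marks a monetary stock parameter.
PERIODS = {"d": "daily", "w": "weekly", "m": "monthly",
           "q": "quarterly", "y": "yearly", "c": "capital"}
# Simplified: a plain number followed by one period notation, e.g. "500#m".
MONETARY_RE = re.compile(r"^(-?\d+(?:\.\d+)?)#([dwmqyc])$")


def parse_parameter(text):
    """Return (value, period); period is None for non-monetary parameters."""
    match = MONETARY_RE.match(text.strip())
    if match:
        return float(match.group(1)), PERIODS[match.group(2)]
    return float(text), None


def allocate_joint_tax(total_tax, tax_bases):
    """Split a jointly assessed tax in proportion to members' (non-negative) tax bases."""
    total_base = sum(tax_bases.values())
    if total_base <= 0:
        # Assumption of this sketch: split equally when no member has a positive base.
        share = total_tax / len(tax_bases)
        return {member: share for member in tax_bases}
    return {member: total_tax * base / total_base
            for member, base in tax_bases.items()}


def household_random_numbers(household_of, seed):
    """Draw one uniform number per household and assign it to every member.

    household_of: dict of idperson -> idhh. Households are visited in sorted
    order so that the same seed always gives the same draws.
    """
    rng = random.Random(seed)
    draws = {hh: rng.random() for hh in sorted(set(household_of.values()))}
    return {person: draws[hh] for person, hh in household_of.items()}


print(parse_parameter("500#m"))
print(allocate_joint_tax(300.0, {1: 2000.0, 2: 1000.0}))
numbers = household_random_numbers({101: 1, 102: 1, 201: 2}, seed=42)
takes_up = {person: x < 0.85 for person, x in numbers.items()}  # illustrative 85% take-up
```

Because every member of a household shares one draw, a take-up decision such as `x < rate` is made for the household as a whole, matching the assumption stated above.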
8. Extensions (switchable policies)
- Adjustments for e.g. benefit take-up and/or tax evasion must be done by adding extensions (switchable policies) such that the policy system runs with these either switched on or off without requiring any further modifications (rather than having two separate systems and/or policies).
- There are two types of extensions: global extensions and country-specific extensions. Global extensions are those that are available for all countries and should be used uniformly across countries, while country-specific extensions can be specifically designed for each country.
- The most commonly used global policy extensions are listed in Table 3, together with the implications of switching them on/off and their baseline values.
| Global extension | Set to OFF | Set to ON | Baseline value |
|---|---|---|---|
| BTA_cc | Full benefit take-up | Non-take up correction applied | Country-specific |
| TCA_cc | Full tax compliance | Incomplete tax compliance | Country-specific |
| UAA_cc | Respective benefits or pensions are uprated by the indexation rule | Respective benefits or pensions are uprated by the growth in the average nominal amount | Off |
| FYA_cc | Policies as of June 30 | Annual policies (reflecting policy changes over the year) | Country-specific |
| PBE_cc | Parental leave benefits variables taken from the input data | Respective simulated variables used | Country-specific |
| MWA_cc | Minimum wage policy switched off | Minimum wage policy switched on | Off |
- [REC] BTA_cc and TCA_cc extensions should be activated in combination with training data where this is sensible, e.g. if the adjustment is implemented using a random adjustment (say 15% random non-take up).
- The extension Uprating by Average Adjustment (UAA_cc) should be implemented in the following way:
- In the function Uprate in the policy Uprate_cc, one should mark entries for which the uprating factor is based on the growth in the average benefit/pension as switched-on elements of the extension (“add to, switched on”); and mark entries for which the uprating factor equals the indexation rule as switched-off elements of the extension (“add to, switched off”). Thus, in the baseline, when the extension UAA_cc is off, the respective benefit/pension will be uprated by the indexation rule. When the switch policy is on, then the respective benefit/pension will be uprated by the growth in the average amount.
- If pensions are uprated in bands in the policy Uprate_bands_cc, the module should be marked as a switched-off element of the extension (“add to, switched off”). Furthermore, in the function Uprate in the policy Uprate_cc, the uprating factor should be set to 1 conditional on the UAA_cc extension being switched off. Thus, in the baseline, when the extension UAA_cc is off, the respective pension will be uprated by a factor of 1 in Uprate_cc and then uprated in bands in Uprate_bands_cc. When the extension is on, then Uprate_bands_cc will be turned off and the respective pension will be uprated by the growth in the average amount.
- The use of the Full Year Adjustments (FYA_cc) extension is highly encouraged in order to account for important policy changes that occur throughout the year. In particular:
- Policies in place on June 30 that were not in force for the whole year can be accounted for through the FYA_cc extension. In this case, the extension would reduce the duration of the relevant policy to the months it was in force.
- Policies not in place on June 30 can be simulated through the FYA_cc extension. Models can include policies that are not yet implemented in a country, as long as they are officially announced and approved by the authorities.
- One-off policies should be included as part of the baseline simulations (and not as part of the FYA_cc extension) irrespective of the date of their implementation.
- HHoT datasets (see Section 2): Switches for hypothetical household datasets need to be defined such that take-up adjustments, tax compliance adjustments and random assignments are not taken into account.
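The UAA_cc switching logic described above can be illustrated with a minimal Python sketch. It is not EUROMOD code: the function name, its parameters and the band structure (a list of `(upper_limit, factor)` pairs, with a pension's band chosen by its amount) are hypothetical stand-ins for the Uprate_cc and Uprate_bands_cc policies.

```python
def uprate_pension(amount, uaa_on, indexation_factor, avg_growth_factor, bands=None):
    """Hypothetical sketch of how UAA_cc changes the uprating of one pension.

    bands: optional list of (upper_limit, factor) pairs sorted by upper_limit,
    standing in for Uprate_bands_cc; the last upper_limit should be float("inf").
    None means the pension is not uprated in bands.
    """
    if uaa_on:
        # Extension on: Uprate_bands_cc is switched off and Uprate_cc applies
        # the growth in the average amount.
        return amount * avg_growth_factor
    if bands is None:
        # Extension off, no bands: Uprate_cc applies the indexation rule.
        return amount * indexation_factor
    # Extension off with bands: Uprate_cc applies a factor of 1 ...
    amount = amount * 1.0
    # ... and Uprate_bands_cc then applies the factor of the band the amount falls in.
    for upper_limit, factor in bands:
        if amount <= upper_limit:
            return amount * factor
    raise ValueError("bands must cover all amounts (use float('inf') as the last limit)")
```

Setting the factor to 1 in Uprate_cc when the extension is off keeps the two policies from uprating the same pension twice.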
9. Standard assumptions for simulating specific policies
- Minimum wage
- Minimum wage must be simulated for all countries (even for those where it does not exist) using an extension (MWA_cc, see Section 8).
- The simulation replaces observed employment income with the minimum wage, adjusted for the number of months receiving employment income and the number of working hours (ignoring actual hours above the standard hours, i.e. overtime), whenever the former is lower than the latter. The extension is set to OFF by default in the baseline.
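A minimal sketch of the MWA_cc rule for one person follows. It assumes that employment income is a monthly average over the 12-month reference period (so the minimum wage floor is scaled by months worked / 12) and that the hours adjustment is proportional to hours capped at the standard full-time hours; all names are hypothetical rather than EUROMOD variables.

```python
def apply_minimum_wage(yem, months_employed, hours, mw_monthly, standard_hours):
    """Hypothetical sketch of the MWA_cc minimum wage rule for one person.

    yem: observed monthly employment income (assumed average over the 12-month period)
    months_employed: months with employment income in the reference period (0-12)
    hours: usual weekly working hours
    mw_monthly: full-time monthly minimum wage
    standard_hours: standard full-time weekly hours
    """
    if months_employed <= 0 or hours <= 0:
        # No employment observed: nothing to adjust.
        return yem
    # Overtime is ignored: hours above the standard are capped.
    hours_share = min(hours, standard_hours) / standard_hours
    # Spread over 12 months so the floor is comparable with yem.
    floor = mw_monthly * hours_share * months_employed / 12
    # Observed income is replaced only when it is below the adjusted minimum wage.
    return max(yem, floor)
```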
- Unemployment (insurance) benefit
- Unemployment benefits (UB) must be simulated for all countries. The policy should be switched ON if the simulation results are satisfactory, that is, if the simulated results in the base year match the recorded amounts and/or administrative records well in terms of aggregate and average amounts. If not, the policy should be set to OFF. In either case, the implementation of UB should be updated annually and, if possible, the UB replacement rates should be checked.
- Unemployment benefit policy should be clearly marked as ‘PART SIMULATED’ (since eligibility is taken from the data). This should be marked in the comments fields next to the policy itself and next to the function restricting the simulation. If any other elements (duration, amounts, ceiling etc.) are taken directly from the data when simulating unemployment benefits, this should also be recorded as partial simulation in the comments section.
- Three groups of ‘potential’ recipients of UB are distinguished in simulations and treated differently:
- Individuals currently in receipt of UB in the data (bun > 0) should be eligible for receiving the simulated UB.
- Individuals with unemployment spells and not in receipt (lunmy_s > 0 & bun = 0) should be included into simulations, but their eligibility to UB should be restricted by setting the number of months worked in the qualifying period to 0 (liwmy_s = 0 if bun = 0).
- Employed individuals with no unemployment spells are potential new unemployed and might be turned into such for specific applications (e.g. modelling employment transitions for labour market adjustments or calculating replacement rates). Previously these individuals were identified as (ils_earns != 0 & bunct = 0); now they are included in simulations through the condition lnu > 0. Variable lnu (default is 0) is created within the Labour Market Adjustments (LMA) or Net Replacement Rates (NRR) Add-Ons and serves as an identifier for those who undergo transition into unemployment.
- Given that the data available in most countries is insufficient to fully or accurately simulate UB, the following imputations and assumptions will be used.
- (a) Eligibility
- Qualifying period: most countries require a minimum number of contributions (or months in work) over a period of time, usually about 12 months of contributions in the 18 to 36 months preceding unemployment. The minimum required number of months in work and the total number of months to be checked should be recorded as constants, e.g. $UB_QperMin and $UB_QperTot. Note that the information available in the EU-SILC is the number of months spent in full-time/part-time work during the income reference period.
- unemployed and in receipt: use the observed value for the income reference period and assume it is representative of the whole qualifying period (i.e. liwmy*$UB_QperTot/12), but not exceeding the total number of months worked (liwwh). As individuals in this group are eligible, assume that the number of contributions is at least equal to the minimum legal requirement. Where the self-employed are not eligible for UB at all, use yemmy instead of liwmy.
- unemployed and not in receipt: set to 0.
- new unemployed: equal to liwmy_a (default is 0). The variable is calculated in the LMA or NRR Add-on in the same way as for the first group (unemployed and in receipt), but it is not restricted to be at least equal to the legal requirement.
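The qualifying-period imputation for the three groups can be sketched as follows. The group labels and function name are hypothetical; the sketch also assumes that the legal minimum floor is applied after the liwwh cap, so that observed recipients always meet the requirement.

```python
def ub_qualifying_months(group, liwmy, liwwh, qper_min, qper_tot, liwmy_a=0):
    """Hypothetical sketch of imputed months worked in the UB qualifying period.

    group: "receiving", "not_receiving" or "new" (illustrative labels)
    liwmy: months in work in the reference period (pass yemmy instead where the
           self-employed are not eligible for UB at all)
    liwwh: total months worked over the career
    qper_min, qper_tot: values of $UB_QperMin and $UB_QperTot
    """
    if group == "receiving":
        # Extrapolate reference-period months to the whole qualifying window ...
        months = liwmy * qper_tot / 12
        # ... capped by total months worked ...
        months = min(months, liwwh)
        # ... but at least the legal minimum, since these people are observed receiving UB.
        return max(months, qper_min)
    if group == "not_receiving":
        return 0
    if group == "new":
        # liwmy_a is prepared by the LMA/NRR Add-on; no floor at the legal minimum.
        return liwmy_a
    raise ValueError(f"unknown group: {group}")
```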
- (b) Benefit amount
- Previous contribution: in many countries the benefit amount depends on the amount of previous contributions (or on previous net/gross earnings). Since previous contributions/earnings are not available in the EU-SILC, assume previous earnings are equal to those predicted by the wage equation (yivwg derived in the Stata do-files) and, if needed, contributions calculated on such earnings (which can be calculated in EUROMOD by applying SIC rules).
- unemployed and in receipt
- option (a): impute the previous wage based on the estimation of a wage equation (yivwg), assuming full-time employment as defined in the country (e.g. yivwg*40*52/12 for a monthly amount with a 40-hour full-time week)
- option (b): previous wage obtained by inverting the UB rules (yempv)
- unemployed and not in receipt: set to 0
- new unemployed: set to be equal to yempv_a (default is 0). The variable is calculated in the LMA or NRR Add-on based on the actual observed earnings (e.g. yem).
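The previous-wage assumptions can be summarised in a small sketch. The function, group labels and `method` switch are hypothetical; `weekly_hours=40` mirrors the 40-hour example and should be replaced by the country's full-time definition, with yivwg treated as an hourly wage as the formula above implies.

```python
def ub_previous_wage(group, method="wage_equation", yivwg=0.0, yempv=0.0,
                     yempv_a=0.0, weekly_hours=40.0):
    """Hypothetical sketch of the monthly previous wage used as the UB base."""
    if group == "receiving":
        if method == "wage_equation":
            # Option (a): hourly predicted wage converted to a full-time monthly amount.
            return yivwg * weekly_hours * 52 / 12
        if method == "inverted_rules":
            # Option (b): previous wage recovered by inverting the UB rules.
            return yempv
        raise ValueError(f"unknown method: {method}")
    if group == "not_receiving":
        return 0.0
    if group == "new":
        # Prepared by the LMA/NRR Add-on from observed earnings (default 0).
        return yempv_a
    raise ValueError(f"unknown group: {group}")
```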
- (c) Benefit duration
- Number of previous contributions: in some countries the maximum duration of the benefit depends on the total number of contributions. In the EU-SILC there is a proxy variable: number of years spent in paid work (PL200).
- all groups: assume total number of months making contributions = PL200 * 12 (i.e. liwwh).
- unemployed and in receipt: the duration of UB must be limited to the number of months in receipt of unemployment benefit (EUROMOD variable bunmy).
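The duration rules can be sketched as a lookup of the legal maximum from total contribution months (liwwh = PL200 * 12), capped by observed months of receipt (bunmy) for current recipients. The step schedule of `(min_contribution_months, max_benefit_months)` pairs is a hypothetical example, not any country's rules, and assumes maximum durations do not fall as contributions rise.

```python
def ub_months_paid(liwwh, bunmy, receiving, duration_schedule):
    """Hypothetical sketch of simulated UB duration in months.

    duration_schedule: list of (min_contribution_months, max_benefit_months) steps.
    """
    max_duration = 0
    # Take the highest step whose contribution threshold is met.
    for min_months, benefit_months in sorted(duration_schedule):
        if liwwh >= min_months:
            max_duration = benefit_months
    if receiving:
        # Current recipients: limit to the observed months in receipt.
        return min(max_duration, bunmy)
    return max_duration
```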
- (d) Compatibility with the Add-ons (LMA and NRR)
- If the UB policy is not switched ON in the baseline, it should be set to OFF.
- Constants required for calculations of the qualifying period ($UB_QperMin, $UB_QperTot) should be defined in ConstDef_cc
- Variables used in the Add-ons should be set to 0 as default in SetDefault_cc. These include:
- lnu – a dummy variable for new unemployed
- liwmy_a – number of months in work in the year preceding unemployment (for new unemployed)
- yempv_a – monthly gross wage in the year preceding unemployment (for new unemployed)
- yem_a (previously yem00) – monthly gross wage for new employed
- lhw_a (previously lhw00) – average weekly working hours for new employed
- There might be other country specific variables, e.g. liwmy01_a and liwmy02_a in case of PT. They should be treated in the same way.
- Parental leave benefits
- Parental Leave Benefits (PB) are simulated for all countries starting from 2015. The policies should be switched ON if simulations are available for all years and the results are satisfactory (see Section 9.2). If not, the policies should be switched OFF in the baseline and be part of the PBE_cc extension. In either case, they should be updated annually.
- All elements of parental leave benefits (eligibility conditions, durations and amounts) should be simulated in the model [REC]. However, if the fully simulated policy cannot be used in baselines, partial simulation can be used instead (i.e. eligibility, duration or other elements are taken from the data). In that case, policies should be clearly marked as ‘PART SIMULATED’ in the comments fields next to the function restricting the simulation.
- Given that the data available in most countries is insufficient to fully or accurately simulate PB, the following imputations and assumptions should be used.
- (a) Eligibility
- Previous contribution history is assumed to be proportional to the observed months in work and out of work during the reference period, i.e. past 12 months.
- (b) Benefit amount
- Previous contribution: if the benefit amount depends on the amount of previous contributions (or on previous net/gross earnings), assume previous earnings are equal to those predicted by wage equation (yivwg derived in the Stata do-files, recalculated in monthly terms) or the current wage, whichever is higher.
- If multiple options for benefit rates or replacement rates are available, a default (if defined in the benefit rules) or the most common rate (according to external statistics) is assumed for all potential recipients. If administrative statistics on the distribution of options in the target population are available, recipients can be randomly assigned to the different options.
- (c) Benefit duration
- If multiple options for benefit duration and replacement rate are available in legislation, a default (if defined in the benefit rules) or the most popular option (according to external statistics) is assumed for all potential recipients.
- Benefit duration should take into account the birth month of the child (dmb).
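The PB assumptions above can be sketched as three small helpers. All names are hypothetical; the monthly conversion of yivwg assumes a 40-hour week (as in the UB convention), and the birth-month adjustment assumes the benefit starts in the child's birth month (dmb, 1-12) within a calendar-year income reference period.

```python
def pb_earnings_base(yivwg_hourly, current_monthly_wage, weekly_hours=40.0):
    """(b) Previous earnings: predicted wage in monthly terms or current wage, whichever is higher."""
    predicted_monthly = yivwg_hourly * weekly_hours * 52 / 12
    return max(predicted_monthly, current_monthly_wage)


def pb_contribution_months(liwmy, window_months):
    """(a) Contribution history proportional to observed months in work in the last 12 months."""
    return liwmy * window_months / 12


def pb_months_in_year(dmb, duration_months):
    """(c) Months of benefit falling in the reference year, assuming it starts in month dmb."""
    return min(duration_months, 12 - dmb + 1)
```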
10. Tax units
- The name of each tax unit must start with a prefix “tu_”.
- Tax units used in more than one policy must be defined in the tax unit sheet(s).
- For the purpose of the METR Add-on, individual and household units must be defined (i.e. tu_individual_cc and tu_household_cc, respectively).
11. Income lists
- The name of each standard income list must start with a prefix “ils_” and all country-specific income lists must start with a prefix “il_”.
- All income lists must have a description (next to where they are defined).
- All standard income lists (see below) are compulsory and must be defined in the standard income list policy (ILSDef_cc). Exceptionally, standard income lists for UDB EU-SILC definitions are stored in a separate policy (ILSUDBDef_cc).
- All country-specific income lists used in more than one policy must be defined in the income list policy (ILDef_cc), others which are relevant only for a single policy can be defined in the corresponding policies.
- Make sure that no variable is double counted in income lists, e.g. by including a benefit component both on its own and within an aggregated variable, or by including both the data and the simulated version of the same variable.
- [REC] Income lists should include detailed incomes rather than their aggregates (here referring to variables, not other income lists). For example, if there are two unemployment benefits and both need to be included, it is better to include each component separately rather than their aggregate.
12. Definitions of standard income lists
- 12.1 General
- Earnings (ils_earns) – labour earnings (e.g. employment and self-employment income)
- Original income (ils_origy) – i.e. market income with the following components:
- Earnings (+)
- Income from capital, e.g. dividends and interests (+)
- Income from occupational and private pensions (+)
- Income from property (+)
- Income received by children (+)
- Regular inter-household cash transfer received (+)
- Regular inter-household cash transfer paid (-)
- Original income and replacement incomes (ils_origrepy) – the latter referring to contributory benefits intended to provide income during specific life-cycle contingencies (e.g. old age, maternity, unemployment, disability, invalidity). Contributory benefits require the payment of contributions, by the protected persons or by other parties on their behalf, in order to secure individual entitlement to benefits, while this is not the case for non-contributory benefits.
- Employer social insurance contributions (ils_sicer) – including (employer) payroll taxes
- Credited social insurance contributions (ils_sicct) – contributions paid by government or social security institution on benefits (if these are simulated)
- Employee social insurance contributions (ils_sicee)
- Self-employed social insurance contributions (ils_sicse)
- Other social insurance contributions (ils_sicot) – contributions paid by individuals but not directly linked to employment or self-employment (e.g. SIC due on benefits and paid by the benefit recipients, health contributions paid by general population)
- Total SIC considered for disposable income (ils_sicdy) constructed as ils_sicdy = ils_sicee + ils_sicse + ils_sicot
- Public pensions (ils_pen) - contributory benefits related to old age, survivors’, disability and early retirement as well as integral non-contributory elements (e.g. minimum pensions)
- Means-tested benefits (ils_benmt) – social benefits, which are explicitly or implicitly conditional on the beneficiary's income and/or wealth falling below a specified level
- Non means-tested benefits (ils_bennt)
- Total benefits (ils_ben) constructed as ils_ben = ils_pen + ils_benmt + ils_bennt
- Simulated benefits (ils_bensim)
- Income taxes (ils_taxin)
- Wealth taxes (ils_taxwl)
- Total taxes (ils_tax) constructed as ils_tax = ils_taxin + ils_taxwl
- Simulated taxes (ils_taxsim)
- Tax bases (ils_base_t*) – a separate income list for each simulated tax instrument, denoting incomes which enter its tax base (e.g. ils_base_tin, ils_base_tcr). The income list refers to gross taxable income (i.e. before subtracting any tax credits, tax allowances, deductions or other expenses) and excludes incomes which are only relevant for the progression clause (where this exists), i.e. only relevant to determine the tax bracket of the income tax schedule.
- Disposable income (ils_dispy) constructed as ils_dispy = ils_origy + ils_ben - ils_sicdy - ils_tax
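The arithmetic linking the general standard income lists can be illustrated with a short Python sketch. It only mirrors the formulas above for one unit held in a plain dict (a hypothetical input format); in EUROMOD itself these lists are defined in ILSDef_cc. Employer (ils_sicer) and credited (ils_sicct) contributions do not enter ils_sicdy and so are not deducted.

```python
def standard_aggregates(p):
    """Hypothetical sketch: derive aggregate standard income lists from their components."""
    out = dict(p)
    # SIC deducted from disposable income: employee, self-employed and other SIC only.
    out["ils_sicdy"] = p["ils_sicee"] + p["ils_sicse"] + p["ils_sicot"]
    out["ils_ben"] = p["ils_pen"] + p["ils_benmt"] + p["ils_bennt"]
    out["ils_tax"] = p["ils_taxin"] + p["ils_taxwl"]
    out["ils_dispy"] = p["ils_origy"] + out["ils_ben"] - out["ils_sicdy"] - out["ils_tax"]
    return out
```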
- 12.2 Aggregate benefits by function
- Benefits by function following Eurostat definitions (ESSPROS & SILC). These income lists should contain all benefits available in EUROMOD (taken from data or simulated) that are included in the EM standard definition of disposable income (ils_dispy):
- Child-birth related benefits (ils_b1_bcb) – benefits related to the cost of pregnancy, childbirth and adoption.
- Family benefits (ils_b1_bfa) – benefits related to the cost of pregnancy, childbirth and adoption (i.e. ils_b1_bcb), bringing up children and caring for other family members (classification corresponding to SILC variable HY050G);
- Education benefits (ils_b1_bed) - grants, scholarships and other education help received by students (classification corresponding to SILC variable PY140G);
- Old-age benefits (ils_b1_boa) – income maintenance and support in connection with old age (classification corresponding to SILC variable PY100G);
- Survivor benefits (ils_b1_bsu) – income maintenance and support in connection with the death of a family member (classification corresponding to SILC variable PY110G);
- Disability benefits (ils_b1_bdi) – income maintenance and support (except health care) in connection with the inability of physically or mentally disabled people to engage in economic and social activities (classification corresponding to SILC variable PY130G);
- Unemployment benefits (ils_b1_bun) – income maintenance and support in cash or kind in connection with unemployment (classification corresponding to SILC variable PY090G);
- Health/sickness benefits (ils_b1_bhl) – income maintenance and support in connection with physical or mental illness, excluding disability; health care intended to maintain, restore or improve the health of the people protected irrespective of the origin of the disorder (classification corresponding to SILC variable PY120G);
- Housing benefits (ils_b1_bho) – help towards the cost of housing (classification corresponding to SILC variable HY070G);
- Social assistance/exclusion benefits (ils_b1_bsa) – benefits (except health care) specifically intended to combat social exclusion where they are not covered by one of the other functions (classification corresponding to SILC variable HY060G).
- In-work benefits (ils_b1_bwk) – benefits intended to supplement the income of low-paid individuals.
- The sum of all ils_b1* income lists, excluding ils_b1_bcb (which is already included in ils_b1_bfa) to avoid double counting, must equal total benefits (ils_ben).
- Benefit by function (semi-aggregated categories):
- Family and education benefits (ils_b2_bfaed) constructed as ils_b2_bfaed = ils_b1_bfa + ils_b1_bed
- Pensions, disability and health benefits (ils_b2_penhl) constructed as ils_b2_penhl = ils_b1_boa + ils_b1_bsu + ils_b1_bhl + ils_b1_bdi
- Social assistance and housing benefits (ils_b2_bsaho) constructed as ils_b2_bsaho = ils_b1_bsa + ils_b1_bho
- Unemployment and in-work benefits (ils_b2_bunwk) constructed as ils_b2_bunwk = ils_b1_bun + ils_b1_bwk
- The sum of ils_b2* income lists must equal total benefits (ils_ben): ils_ben = ils_b2_bunwk + ils_b2_bfaed + ils_b2_penhl + ils_b2_bsaho
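The consistency rules for benefits by function lend themselves to an automated check. The sketch below is a hypothetical helper operating on one unit's income list values in a dict; it verifies that the ils_b1* lists (without ils_b1_bcb) sum to ils_ben, that each ils_b2* list matches its definition, and that the ils_b2* lists sum to ils_ben.

```python
B1_FUNCTIONS = [
    "ils_b1_bfa", "ils_b1_bed", "ils_b1_boa", "ils_b1_bsu", "ils_b1_bdi",
    "ils_b1_bun", "ils_b1_bhl", "ils_b1_bho", "ils_b1_bsa", "ils_b1_bwk",
]  # ils_b1_bcb is deliberately left out: it is already part of ils_b1_bfa

B2_DEFINITIONS = {
    "ils_b2_bfaed": ["ils_b1_bfa", "ils_b1_bed"],
    "ils_b2_penhl": ["ils_b1_boa", "ils_b1_bsu", "ils_b1_bhl", "ils_b1_bdi"],
    "ils_b2_bsaho": ["ils_b1_bsa", "ils_b1_bho"],
    "ils_b2_bunwk": ["ils_b1_bun", "ils_b1_bwk"],
}


def benefit_function_errors(p, tol=1e-6):
    """Return a list of failed consistency checks (an empty list means consistent)."""
    errors = []
    b1_total = sum(p[name] for name in B1_FUNCTIONS)
    if abs(b1_total - p["ils_ben"]) > tol:
        errors.append(f"sum of ils_b1* = {b1_total}, ils_ben = {p['ils_ben']}")
    for b2_name, parts in B2_DEFINITIONS.items():
        expected = sum(p[name] for name in parts)
        if abs(p[b2_name] - expected) > tol:
            errors.append(f"{b2_name} = {p[b2_name]}, expected {expected}")
    b2_total = sum(p[name] for name in B2_DEFINITIONS)
    if abs(b2_total - p["ils_ben"]) > tol:
        errors.append(f"sum of ils_b2* = {b2_total}, ils_ben = {p['ils_ben']}")
    return errors
```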
- 12.3 UDB EU-SILC definitions
- Income lists corresponding to UDB EU-SILC definitions of market income, tax and benefit variables. NB! These lists may include variables not part of EUROMOD standard concept of disposable income (ils_dispy) and vice versa. They should not include variables simulated in EUROMOD if they are not accounted for in the original UDB EU-SILC.
- Employment income (ils_udb_yem)
- Self-employment income (ils_udb_yse)
- Investment income (ils_udb_yiy)
- Pensions from individual private plans (ils_udb_ypp)
- Income from rental of a property or land (ils_udb_ypr)
- Income received by people aged under 16 (ils_udb_yot)
- Regular inter-household cash transfers received (ils_udb_ypt)
- Regular inter-household cash transfers paid (ils_udb_xmp)
- Company car (ils_udb_kfbcc)
- Regular taxes on wealth (ils_udb_tpr)
- Tax on income and social contributions (ils_udb_tis)
- Family benefits (ils_udb_bfa)
- Education benefits (ils_udb_bed)
- Old-age benefits (ils_udb_boa)
- Survivor benefits (ils_udb_bsu)
- Disability benefits (ils_udb_bdi)
- Unemployment benefits (ils_udb_bun)
- Health/sickness benefits (ils_udb_bhl)
- Housing benefits (ils_udb_bho)
- Social assistance/exclusion benefits (ils_udb_bsa)
- Disposable income (ils_udb_yds) – the sum of all ils_udb* income lists, where ils_udb_xmp, ils_udb_tpr and ils_udb_tis are included with a negative sign.
- Various standard income lists and their structure are summarised in Figure 1.
Figure 1. Structure of standard income lists

13. Output
- Results must be outputted at the individual level by default.
- Income and expenditure data are automatically outputted in monthly terms.
- Output (at the individual level) needs to include:
- all income variables from data which are not simulated in the model (including fringe benefits),
- all simulated income variables,
- all socio-demographic variables,
- all income lists,
- yds: household disposable income as reported in original data (e.g. HY020) entirely attributed to one member of the household,
- [REC] identifiers for all tax units.
- Make sure that output policies do not include any variable (or income list) twice, nor any temporary or intermediate variables.
14. Validation
- Micro validation:
- Check eligibilities and the amounts of taxes and benefits simulated by the model (case-by-case validation for a selection of particular households)
- Compare simulated values against data recorded values in the same survey on case-by-case basis
- Check descriptive statistics for outcome variables (min, max, mean)
- Check that results for some basic indicators (e.g., average tax rate) make sense for all observations in the sample
- [REC] Check budget constraint graphs for different types of hypothetical households
- Macro validation:
- Compare the sum of each income component and the number of recipients (both original incomes and tax-benefit instruments) with external statistics; simulated values can also be compared against recorded values in the same survey. For simulated instruments, this should answer the question of how close EUROMOD estimates are to the external figures, given the following factors:
- quality of external statistics
- survey quality, i.e. how representative market/non-simulated incomes and population structure are, measurement errors etc.
- quality of data imputations (e.g. net-to-gross imputations, income splitting, replacing missing values)
- simulation quality, i.e. accuracy of tax-benefit rules
- key modelling choices and assumptions (e.g. adjustments for benefit non-take up and tax evasion, the June 30 rule etc.)
- In general, the focus should be on relative differences, e.g. how one instrument compares to another, whether the bias has an expected sign and whether trends over years are in the expected direction.
- Compare inequality measures (Gini, S80/S20) and poverty measures with Eurostat statistics based on SILC or with external statistics (if relevant). This comparison should address the question of how far EUROMOD estimates differ because of differences in the underlying methodology. The following sources of bias are justified (but must, of course, be acknowledged):
- sample adjustments (e.g. removing children born after the income reference period from the sample)
- differences in the concept of disposable income
- data imputations, i.e. the quality of data imputations factor listed above
- differences between data and simulated values, i.e. the simulation quality and key modelling choices factors listed above combined.
- Assess the distributional effects of annual policy changes using the Policy Effects Tool (PET).
- [REC] Produce budget constraint graphs for different household types with the hypothetical data.
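For the macro validation steps, a small sketch of the basic quantities may help: the ratio of a EUROMOD aggregate to its external benchmark, and weighted Gini and S80/S20 computed from the piecewise-linear Lorenz curve. Eurostat's published indicators are computed on equivalised household disposable income with survey weights; this generic estimator is illustrative only and its details (e.g. treatment of ties or negative incomes) may differ from Eurostat's. All function names are hypothetical.

```python
def coverage_ratio(simulated_total, external_total):
    """EUROMOD aggregate relative to an external benchmark (1.0 means an exact match)."""
    return simulated_total / external_total


def _lorenz_points(incomes, weights):
    """Cumulative population weight and cumulative income, sorted by income."""
    cum_w, cum_y = [0.0], [0.0]
    for y, w in sorted(zip(incomes, weights)):
        cum_w.append(cum_w[-1] + w)
        cum_y.append(cum_y[-1] + w * y)
    return cum_w, cum_y


def gini(incomes, weights):
    """Weighted Gini as 1 - 2 * (area under the piecewise-linear Lorenz curve)."""
    cum_w, cum_y = _lorenz_points(incomes, weights)
    total_w, total_y = cum_w[-1], cum_y[-1]
    area = 0.0
    for i in range(1, len(cum_w)):
        dp = (cum_w[i] - cum_w[i - 1]) / total_w
        area += dp * (cum_y[i] + cum_y[i - 1]) / (2 * total_y)
    return 1 - 2 * area


def _income_below_share(p, cum_w, cum_y):
    """Total income held by the poorest share p of the weighted population."""
    target = p * cum_w[-1]
    for i in range(1, len(cum_w)):
        if cum_w[i] >= target:
            span = cum_w[i] - cum_w[i - 1]
            frac = 0.0 if span == 0 else (target - cum_w[i - 1]) / span
            return cum_y[i - 1] + frac * (cum_y[i] - cum_y[i - 1])
    return cum_y[-1]


def s80_s20(incomes, weights):
    """Income of the top 20% divided by income of the bottom 20%."""
    cum_w, cum_y = _lorenz_points(incomes, weights)
    bottom = _income_below_share(0.2, cum_w, cum_y)
    top = cum_y[-1] - _income_below_share(0.8, cum_w, cum_y)
    return top / bottom
```

Interpolating within an observation splits its weight across a quantile boundary, so the quintile shares stay exact even when survey weights do not divide evenly into fifths.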