Arnout van Delden ([email protected]), Reinder Banning, Arjen de Boer and Jeroen Pannekoek
Analysing whether sample survey data
can be replaced by administrative data
Outline
1. Understanding fitness for use
2. Conceptual differences
3. Numerical differences
4. Discussion
1. Understanding fitness for use
Concepts admin. data:
• Numerous rules
• Differ by type of industry
Case study:
• 2011 new production system
• Levels and growth rates
• Can VAT be used for turnover?
• 324 “base cells” for publication
1. Fitness for use: group the base cells
Taking decisions
Group | Target vs. administrative variable
Control | No conceptual differences
Accept | Conceptual differences and small numerical differences
Adjust | Conceptual differences and substantial systematic numerical differences
Reject | Conceptual differences and substantial non-systematic numerical differences
How to assign the base cells to the groups?
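The decision table above can be read as a small rule. The sketch below is only an illustration: the function name and the three boolean inputs are hypothetical, and the slides do not say here how "small" or "systematic" are to be measured.

```python
def assign_group(conceptual_diff: bool,
                 numerical_diff_small: bool,
                 diff_systematic: bool) -> str:
    """Map the three yes/no judgements from the decision table to a group.

    All three inputs are hypothetical flags; deciding them is the hard part
    and is not covered by this sketch.
    """
    if not conceptual_diff:
        # No conceptual differences: the base cell serves as Control.
        return "Control"
    if numerical_diff_small:
        # Conceptual differences, but small numerical differences.
        return "Accept"
    # Substantial numerical differences: systematic ones can be adjusted for.
    return "Adjust" if diff_systematic else "Reject"
```

For example, a base cell with conceptual differences and substantial but systematic numerical differences would land in Adjust.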
2. Conceptual differences; find Control
Base cells | Unique (set of) rules | Expected effect
85 | No regulation | VAT = T
64 | Foreign services not charged from 2010 | VAT < T
35 | International trade regulations, correctly derived | VAT ≈ T
18 | Subcontractors shift VAT payment to main contractor; foreign turnover not charged | VAT ≈ T
17 | Derogation: certain economic activities not charged | VAT ≪ T
16 | Subcontractors shift VAT payment to main contractor | VAT ≈ T
89 | 21 other sets of rules (not specified) |
324 | Total |
3. Numerical differences: the data
Yearly turnover: 2009, 2010
• SBS and VAT
• Linked at micro level
• Units exist whole year
• Extremely small units excluded
[Figure: Hotels and similar accommodation]
3. Numerical data: the model
Linear regression:
$y_{ki}^{t} = \alpha_k + d\alpha_k\,\delta_{ki}^{t} + (\beta_k + d\beta_k\,\delta_{ki}^{t})\,x_{ki}^{t} + \varepsilon_{ki}^{t}$
SBS ($y$) and VAT ($x$) for base cell ($k$), unit ($i$), year ($t$) and year dummy ($\delta_{ki}^{t}$)
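Written out per value of the year dummy, the regression model gives one line per year. The split below assumes the dummy is 0 for 2009 and 1 for 2010, which the later column headings "Slope (2009)" and "Change of slope? (2010)" suggest but the slides do not state explicitly.

```latex
\begin{align*}
\delta_{ki}^{t} = 0:&\quad y_{ki}^{t} = \alpha_k + \beta_k\,x_{ki}^{t} + \varepsilon_{ki}^{t} \\
\delta_{ki}^{t} = 1:&\quad y_{ki}^{t} = (\alpha_k + d\alpha_k) + (\beta_k + d\beta_k)\,x_{ki}^{t} + \varepsilon_{ki}^{t}
\end{align*}
```

Under that assumption, $\beta_k$ is the slope in the reference year and $d\beta_k$ the change in slope in the other year; likewise $\alpha_k$ and $d\alpha_k$ for the intercept.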
Regression weights
– calibration weights (sample to population)
– weighted residuals (heteroscedasticity)
– M-estimator (Huber weights against outliers)
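As a rough illustration of the Huber weights mentioned in the last bullet, the sketch below down-weights observations whose scaled residual exceeds a tuning constant. The function name, the `scale` argument and the default constant 1.345 (a common textbook choice) are assumptions; the slides do not say which constant or scale estimate was used, nor how the three kinds of weights are combined.

```python
def huber_weights(residuals, scale, c=1.345):
    """Huber weights: 1 for small scaled residuals, c/|u| for large ones.

    residuals: regression residuals for one base cell
    scale:     a robust estimate of residual spread (assumed given, > 0)
    c:         tuning constant (assumption; not taken from the slides)
    """
    weights = []
    for r in residuals:
        u = abs(r) / scale          # scaled absolute residual
        weights.append(1.0 if u <= c else c / u)
    return weights
```

In an iteratively reweighted fit, such weights would be recomputed from the residuals after each weighted least-squares step until they stabilise.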
3. Numerical data: indicators for grouping
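Indicator | Description
$R_k^2 = 1 - \frac{SS(w)_{k,\mathrm{res}}}{SS(w)_{k,\mathrm{tot}}}$ | Coefficient of determination, with regression weights $w$
$M_k^{\hat{y},y} = \frac{\sum_{i,t} d_{ki}^{t}\,\lvert \hat{y}_{ki}^{t} - y_{ki}^{t} \rvert}{\sum_{i,t} d_{ki}^{t}\,(\hat{y}_{ki}^{t} + y_{ki}^{t})}$ | MAPE: mean absolute percentage error, with calibration weights $d$
$\alpha_k,\ d\alpha_k,\ \beta_k,\ d\beta_k$ | Size and p-values of regression coefficients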
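The first two indicators can be sketched in a few lines of Python for a single base cell. This is a hedged reading of the slide, not the authors' code: the function names are hypothetical, the weighted sums of squares are assumed to use a weighted mean, and the absolute value in the numerator of the MAPE-type indicator is inferred from its name.

```python
def weighted_r2(y, y_hat, w):
    """1 - SS_res / SS_tot, both weighted by the regression weights w."""
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)   # weighted mean
    ss_res = sum(wi * (yi - fi) ** 2 for wi, yi, fi in zip(w, y, y_hat))
    ss_tot = sum(wi * (yi - y_bar) ** 2 for wi, yi in zip(w, y))
    return 1.0 - ss_res / ss_tot


def relative_abs_error(y, y_hat, d):
    """Weighted absolute differences divided by weighted sums (y_hat + y).

    d are the calibration weights; the absolute value is an assumption.
    """
    num = sum(di * abs(fi - yi) for di, yi, fi in zip(d, y, y_hat))
    den = sum(di * (fi + yi) for di, yi, fi in zip(d, y, y_hat))
    return num / den
```

Here `y` holds the SBS turnover values, `y_hat` the values predicted from VAT, and both functions expect lists of equal length for one base cell.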
Indicators for Reject
$R_k^2$: 20 poorest base cells
• Sales partly not charged (19)
• International trade (1)
[Figure: $R_k^2$, with the 95% range of the Control group marked; labelled base cell: Sea and coastal passenger water transport]
Indicators for group Accept & Adjust
[Figure: slope 2009, with the 95% range of the Control group marked; labelled base cell: Import of new passenger motor vehicles]
Conceptual and numerical result in line?
Adjust? Expected effect: VAT < T
Base cell | Number of points | Slope (2009) | Change of slope? (2010) | Regulation
45112 | 1742 | 1.36 | -0.01 | Margin
45402 | 31 | 1.34 | NA | Margin
45194 | 42 | 1.17 | 0.05 | Margin
45111 | 55 | 1.16 | -0.03 | Margin
45191X | 210 | 1.08 | -0.04 | Margin
47641 | 59 | 1.02 | 0.09 | Different moment, Margin
47790 | 88 | 0.99 | 1.86 | Margin
45320 | 35 | 0.94 | 0.09 | Margin
4. Discussion
Main findings
– Use outlier-robust regression and indicators
– The control group is also not error-free (deviations from the 1:1 relation)
– We could not use the significance of the regression coefficients
– Instead, we used the 95% range from the control group
– We achieved a rough grouping by re-using existing data
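One way to read the "95% range from the control group" is as the interval between the 2.5th and 97.5th percentiles of an indicator over the Control base cells. The sketch below follows that reading; it is an assumption, since the slides do not say how the range was computed, and the function names are hypothetical.

```python
import statistics


def control_range(control_values):
    """Return (low, high): 2.5th and 97.5th percentiles of the control values."""
    # n=40 yields cut points at 2.5%, 5%, ..., 97.5%; keep the outer two.
    cuts = statistics.quantiles(control_values, n=40, method="inclusive")
    return cuts[0], cuts[-1]


def outside_control_range(value, control_values):
    """True if an indicator value falls outside the control group's 95% range."""
    low, high = control_range(control_values)
    return value < low or value > high
```

A base cell whose slope or $R_k^2$ falls outside this range would then be a candidate for Adjust or Reject rather than Accept.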
Discussion points
– For some base cells no decision was possible: conceptual and numerical results disagree
– Limitation: the approach requires the presence of a control group