iv6

Thursday, 4 February 2010

Station errors in CRUTEM3

My attempts to reproduce the uncertainty estimates of the CRUTEM3 dataset, following the methods of [Brohan et al. 2006], have run into a strange problem. My own estimates of station errors are consistently smaller than the values in the CRUTEM3_station_error file, but only for grid-boxes with more than one station. If there is only one station in the grid-box, the estimates match perfectly.

Let's look at the 5-degree grid-box with its bottom-left corner at 40N 145E. It contains two stations: NEMURO (474200) and YUZHNO-KURIL'SK (321650). Prior to 1947, only one station, Nemuro, provides data for the grid-box. The station record published by the British Met Office lists standard deviations of temperature data over the period from 1941 to 1990.

SD1 = 1.4 1.7 1.0 1.0 1.1 1.3 1.8 1.5 0.9 0.8 1.1 1.5

Following [Brohan et al. 2006], we can calculate the normal error for this station (all 30 years for the normals are available) as SD / sqrt(30):

NE1 = 0.26 0.31 0.18 0.18 0.20 0.24 0.33 0.27 0.16 0.15 0.20 0.27

To calculate the station error, we also need the measurement uncertainty and the homogenization adjustment uncertainty. As suggested in the referenced work, measurement uncertainty = 0.03 C and homogenization adjustment uncertainty = 0.4 C. Therefore, the station error is:

SE1 = sqrt(0.03^2 + 0.4^2 + NE1^2)
SE1 = 0.48 0.51 0.44 0.44 0.45 0.47 0.52 0.49 0.43 0.43 0.45 0.49
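
For anyone who wants to check the single-station arithmetic, here is a minimal Python sketch of the two steps above. The helper name station_error and its parameter names are my own invention for illustration; the inputs are the SD1 values and the 0.03 C and 0.4 C figures quoted above, and the results are rounded to two decimals as in the post.

```python
import math

# Standard deviations for Nemuro, copied from the SD1 line above.
SD1 = [1.4, 1.7, 1.0, 1.0, 1.1, 1.3, 1.8, 1.5, 0.9, 0.8, 1.1, 1.5]


def station_error(sd, n_years=30, measurement=0.03, homogenization=0.4):
    """Hypothetical helper: combine the three error terms in quadrature.

    The normal error is taken as sd / sqrt(n_years), as described above.
    """
    normal_error = sd / math.sqrt(n_years)
    return math.sqrt(measurement**2 + homogenization**2 + normal_error**2)


# Normal error and station error for each of the 12 values, rounded to 2 decimals.
NE1 = [round(sd / math.sqrt(30), 2) for sd in SD1]
SE1 = [round(station_error(sd), 2) for sd in SD1]

print(NE1)
print(SE1)
```

The same function applied to another station's standard deviations should give that station's error, assuming all 30 years of normals are available for it as well.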

These are exactly the same values as in CRUTEM3_station_error prior to 1947, when there is only one station in the grid-box. No problems so far.

However, in 1947 another station appears: Yuzhno-Kurilsk (321650). Its standard deviations of temperature data are:

SD2 = 1.3 1.6 1.1 0.9 1.1 1.3 1.6 1.5 1.0 0.8 1.1 1.4

Normal error:
NE2 = 0.24 0.29 0.20 0.16 0.20 0.24 0.29 0.27 0.18 0.15 0.20 0.26

Station error:
SE2 = sqrt(0.03^2 + 0.4^2 + NE2^2)
SE2 = 0.47 0.50 0.45 0.43 0.45 0.47 0.50 0.49 0.44 0.43 0.45 0.48

[Brohan et al. 2006] says: "The grid-box anomaly is the mean of the n station anomalies in that grid box, so the grid-box station uncertainty is the root mean square of the station errors, multiplied by 1/sqrt(n)." There are two stations, so n=2. The root mean square of the station errors is:

RMS = sqrt((SE1^2 + SE2^2)/2)
RMS = 0.475 0.505 0.445 0.435 0.450 0.470 0.510 0.490 0.435 0.430 0.450 0.485

Multiplied by 1/sqrt(2):

0.336 0.357 0.315 0.308 0.318 0.332 0.361 0.346 0.308 0.304 0.318 0.343
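
The combination step, as I read the quoted sentence, can be sketched in Python like this. The function name grid_box_error is hypothetical, and the inputs are the two-decimal SE1 and SE2 values listed above (so any rounding in them carries through, exactly as in my hand calculation).

```python
import math

# Station errors for the two stations, copied from the SE1 and SE2 lines above.
SE1 = [0.48, 0.51, 0.44, 0.44, 0.45, 0.47, 0.52, 0.49, 0.43, 0.43, 0.45, 0.49]
SE2 = [0.47, 0.50, 0.45, 0.43, 0.45, 0.47, 0.50, 0.49, 0.44, 0.43, 0.45, 0.48]


def grid_box_error(station_errors):
    """Hypothetical helper: RMS of the station errors times 1/sqrt(n)."""
    n = len(station_errors)
    rms = math.sqrt(sum(e**2 for e in station_errors) / n)
    return rms / math.sqrt(n)


# One grid-box value per column, rounded to 3 decimals as in the post.
GRID = [round(grid_box_error([a, b]), 3) for a, b in zip(SE1, SE2)]
print(GRID)
```

With a single station the function returns that station's error unchanged, which is consistent with the single-station case matching the published file.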

And CRUTEM3_station_error after 1947 lists:

0.376 0.414 0.343 0.333 0.347 0.370 0.421 0.394 0.333 0.319 0.347 0.388

They are clearly correlated with my estimates but consistently larger, and I cannot explain why. This is not limited to this particular grid-box, and rounding errors do not explain it.
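
To put numbers on the gap, a small Python sketch that tabulates the difference and the ratio between the published values and my estimates for each column. The variable names are mine; the two lists are copied from the rows above. It only describes the discrepancy and does not explain it.

```python
# My grid-box estimates and the published CRUTEM3_station_error values,
# copied from the two rows above.
ESTIMATE = [0.336, 0.357, 0.315, 0.308, 0.318, 0.332,
            0.361, 0.346, 0.308, 0.304, 0.318, 0.343]
PUBLISHED = [0.376, 0.414, 0.343, 0.333, 0.347, 0.370,
             0.421, 0.394, 0.333, 0.319, 0.347, 0.388]

# Column-by-column difference and ratio (published relative to estimate).
DIFF = [p - e for p, e in zip(PUBLISHED, ESTIMATE)]
RATIO = [p / e for p, e in zip(PUBLISHED, ESTIMATE)]

for column, (d, r) in enumerate(zip(DIFF, RATIO), start=1):
    print(f"{column:2d}  diff={d:.3f}  ratio={r:.3f}")
```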

I believe I calculated the station errors for the individual stations correctly, since they match those in CRUTEM3_station_error when there is only one station in the grid-box. But I also cannot see what error I might have made in combining them.

Thoughts?