sinefit documentation

sinefit fits a least-squares estimate of a sinusoid to time series data that have a periodicity of 1 year.

For related functions, see sineval and sinefit_bootstrap documentation.

Back to Climate Data Tools Contents

Contents

Syntax

ft = sinefit(t,y)
ft = sinefit(...,'weight',weights)
ft = sinefit(...,'terms',TermOption)
[ft,rmse] = sinefit(...)

Description

ft = sinefit(t,y) fits a sinusoid with a periodicity of 1 year to input data y collected at times t. Input times can be in datenum, datetime, or datestr format, and do not need to be sampled at regular intervals. The output ft contains the coefficients of the terms in the calculation described below.

ft = sinefit(...,'weight',w) applies weighting to each of the observations y. For example, if formal errors err are associated with y, you might let w = 1./err.^2. By default, w = ones(size(y)).

ft = sinefit(...,'terms',TermOption) specifies which terms are calculated in the sinusoid fit. TermOption can be 2, 3, 4, or 5:

[ft,rmse] = sinefit(...) returns the root-mean-square error of the residuals of y after removing the sinusoidal fit. This is one measure of how well the sinusoid fits the data, but for a more in-depth understanding of the uncertainties, including uncertainties in the timing, see sinefit_bootstrap.

Example 1a: Fit a sinusoid to toy data

In this example we'll make up some data with known parameters, then use sinefit to fit a sinusoid to the data. Let's assume you have 100 measurements taken between Jan 1, 2005 and today. Use sineval to generate a sinudoid of 75 unit amplitude, having a maximum value on day 123 of each year (July 5), and a long-term linear trend of -2.2 units per year. Note that in sineval we'll also have to enter a somewhat strange value of 5e3, which is just the y-intercept at year zero, but is not very meaningful to us in the 21st century.

To make it more of a challenge for sinefit, we'll also add gaussian noise that has a standard deviation of 36 units. (This is an appreciable amount of noise relative to the 75 unit amplitude of the sinusoid.)

% 100 random dates between January 1, 2005 and today:
t = datenum('jan 1, 2005') + (now-datenum('jan 1, 2005'))*rand(100,1);

% Sinusoid of amplitude 75; max on day 123 (July 5); trend -2.2 units/year:
sine_parameters = [75 123 5e3 -2.2];
err = 36*randn(size(t)); % random error
y = sineval([75 123 5e3 -2.2],t) + err;  % sinusoid + noise

% Plot the 100 data points:
figure(1)
plot(t,y,'o')
axis tight % removes white space
box off    % removes frame
datetick('x','keeplimits') % formats x date labels

Believe it or not, that dataset is sinusoidal. It's easier to see the sinusoidal nature if we use doy to plot all the data as a function of day of year:

figure(2)
plot(doy(t),y,'o')

axis tight
box off
xlabel 'day of year'

In this toy dataset, we know we should have a sinusoid with an amplitude of 75 units, a maximum value on day 123, and a long term trend of -2.2 units per year. And since we intentionally added 36 units of gaussian noise, we know the sinusoid should match the "measurements" to about an rms error of 36 units.

[ft,ft_error] = sinefit(t,y,'terms',4)
ft =
   1.0e+03 *
    0.0864    0.1186    7.6751   -0.0035
ft_error =
   33.9327

The numbers above tell us that sinefit has determined our 100 noisy datapoints are characterized by a sinusoid with an amplitude of ft(1) units, which we see is close to the prescribed value of 75, a maximum value near the 123rd day of the year, and a linear trend of about -2.2 units per year. The optional second output of the sinefit function tells us the sinusoid matches the "measurements" to a 1-sigma uncertainty given by ft_error, which is in line with the prescribed value of gaussian noise.

Here is the sinusoid fit to the data:

% A daily array of times from the first datapoint to the last:
t_daily = min(t):max(t);

% The fit sinusoid at daily resolution:
y_daily = sineval(ft,t_daily);

figure(1) % goes back to the first figure
hold on
plot(t_daily,y_daily)

Example 1b: Specifying weights

Suppose you know the formal errors associated with each observation y. Some measurements have higher uncertainties associated with them than others. From the err vector we used above to prescribe noise, we'll weight each observation y like this:

w = 1./err.^2;

figure
scatter(doy(t),y,25,w,'filled')
axis tight
xlabel 'day of year'
cb = colorbar;
ylabel(cb,'weight')
cmocean('amp')
caxis([0 0.01])

The plot above shows that we give more weight to the values with lower errors associated with them. Now use sinefit with the specified weights:

ft = sinefit(t,y,'weight',w,'terms',4)
ft =
   1.0e+03 *
    0.0764    0.1233    5.1464   -0.0023

Comparing to the unweighted solution, you'll see that this version produces a slightly more accurate coefficients.

Example 1c: More uncertainty quantification

For a more in-depth analysis of uncertainty in the sinusoid fit, check out sinefit_bootstrap, which can provide 1-sigma levels of uncertainty for each parameter like this (give it a few seconds):

ftb = sinefit_bootstrap(t,y,'terms',4);

% 1 sigma uncertainty for each parameter:
std(ftb)
ans =
   1.0e+03 *
    0.0055    0.0029    1.6112    0.0008

That tells us sinefit should be accurate to 1-sigma levels of about 5 for the amplitude (compare ft(1) to the prescribed amplitude to prove this to yourself); timing should be accurate within about 4 days; and the trend should be accurate within about 1 unit per year.

Example 2: Fit a sinusoid to real sea ice data

Let's take a look at some real sea ice data from the NSIDC. I've packaged up a time series of sea ice extent in .mat format, and included in this File Exchange upload so you can follow along:

% Load sample data:
load seaice_extent

% Plot raw data:
figure
plot(t,extent_N)
hold on
axis tight
box off
ylabel 'NH sea ice extent (10^6 km^2)'

Quite clearly that data has some periodicity to it at the 1/yr frequency. How much does Northern Hemisphere sea ice extent vary at subannual timescales, when does it typically reach a maximum value, and how much is sea ice changing in terms of long-term trend?

% Fit a sinusoid:
ft = sinefit(t,extent_N,'terms',4)
ft =
    4.4079   66.8394  125.2987   -0.0569

Similar to Example 1, the output of sinefit tells us Arctic sea ice tends to vary by about 4.41 million square kilometers each year, it reaches a maximum around day 66 (March 7) (see note below), and northern hemisphere sea ice has decreased by 60,000 square kilometers per year since 1978.

Here's the sinusoid fit plotted on top of the raw data:

hold on
plot(t,sineval(ft,t))
legend('raw data','fit by sinefit')

And zoom in a bit to see the structure:

xlim([datetime(1995,1,1) datetime(2000,1,1)])

Note to users

One brief note related to the all the parameters estimated by sinefit: These parameters describe the best-fit sinusoid, but that does not necessarily mean they fully describe the behavior of the underlying data itself. For example, in terms of climatological average, Northern Hemisphere sea ice extent actually typically reaches a maximum around day 71 (March 12), whereas sinefit says the maximum value of the best-fit sinusoid occurs on day 66 (March 7). That's because the true behavior of sea ice extent is more complex than a simple sinusoid. In your work, be sure to consider the difference between true behavior and the 1/yr frequency component of the true behavior.

Example 3: Data cube

This functionality is still in beta, but here's the code I'm using to test it:

load pacific_sst

ft = sinefit(t,sst,'terms',3);

figure
subplot(1,3,1)
imagescn(lon,lat,ft(:,:,1))
title 'sinusoid amplitude'
cmocean amp
cb = colorbar;
ylabel(cb,'seasonal magnitude \circC')
axis image

subplot(1,3,2)
imagescn(lon,lat,ft(:,:,2))
title 'sinusoid phase'
cmocean phase
caxis([1 365])
cb = colorbar;
ylabel(cb,'month of max temperature')
cbdate(cb,'mmm')
axis image

subplot(1,3,3)
imagescn(lon,lat,ft(:,:,3))
title 'mean temperature'
cmocean thermal
cb = colorbar;
ylabel(cb,'mean temperature')
axis image

Nothing too surprising above: Shallow waters have more temperature variability throughout the year than deep waters, and everything near the equator is has almost no seasonality. The middle panel shows us that the warmest waters occur in August or September in the northern hemisphere, or in February or March in the southern hemisphere. The third panel is just that constant offset which is effectively the mean sea surface temperature.

Author Info

This function is part of the Climate Data Toolbox for Matlab. The sinefit, sineval, and sinefit_bootstrap functions as well as the supporting documentation were written by Chad A. Greene of the University of Texas Institute for Geophysics.