
Day 2 Review: Solutions

 

ACF plot

Open the tute1.csv file and plot the ACF of the Sales column. Then select all of the characteristics that Sales exhibits.

import pandas as pd

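# Option 1: parse the Date column and use it as the index while reading
df = pd.read_csv('tute1.csv', index_col='Date', parse_dates=True)
# Option 2: read first, then convert Date and set it as the index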
df = pd.read_csv('tute1.csv')
df.Date = pd.to_datetime(df.Date)
df.set_index('Date', inplace=True)
df.head()

:::note[output]

             Sales  AdBudget    GDP
Date
1981-03-01  1020.2     659.2  251.8
1981-06-01   889.2     589.0  290.9
1981-09-01   795.0     512.5  290.8
1981-12-01  1003.9     614.1  292.4
1982-03-01  1057.7     647.2  279.1

:::

from statsmodels.tsa.stattools import acf
acf(df.Sales)

:::note[output]

array([ 1.        ,  0.00246107, -0.69909131, -0.02667314,  0.77281964,
       -0.02377194, -0.6577847 , -0.0630775 ,  0.70622827, -0.04020876,
       -0.65691554, -0.04214139,  0.68034473, -0.05075486, -0.63756589,
       -0.005513  ,  0.62785989, -0.06791843, -0.5809052 , -0.01483006,
        0.62228202])

:::
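To make the numbers above less of a black box, here is a minimal sketch of the sample autocorrelation written out by hand. It uses the biased estimator (every lag is divided by the full sum of squares), which to my understanding is what `acf` computes with its default `adjusted=False`. The helper name `sample_acf` and the toy series are illustrative assumptions, not part of the exercise data.

```python
import numpy as np

def sample_acf(y, nlags):
    """Sample autocorrelation r_0..r_nlags (biased estimator)."""
    y = np.asarray(y, dtype=float)
    d = y - y.mean()                      # deviations from the mean
    denom = np.sum(d * d)                 # lag-0 sum of squares
    return np.array([np.sum(d[k:] * d[:len(d) - k]) / denom
                     for k in range(nlags + 1)])

# Hypothetical quarterly-style series: the pattern 1, 0, -1, 0 repeated 25 times
y = np.tile([1.0, 0.0, -1.0, 0.0], 25)
r = sample_acf(y, nlags=8)
print(np.round(r, 2))
```

The toy series shows the same signature as Sales: near zero at odd lags, strongly negative at lag 2, and strongly positive at lag 4.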

df.Sales.plot()

:::note[output]

<Axes: xlabel='Date'>

:::

from statsmodels.graphics.tsaplots import plot_acf
plot_acf(df.Sales);

:::note[output]

:::
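The plot can also be read off numerically. The sketch below reuses the values printed by `acf(df.Sales)` and compares them with a flat band of ±1.96/√n. That band is a simplification: `plot_acf` draws bands that widen with the lag, so treat it as a rough guide only. The sample size `n = 100` is an assumption based on 25 years of quarterly data (1981 to 2005).

```python
import numpy as np

# ACF values copied from the acf(df.Sales) output (lags 0..20)
r = np.array([1.0, 0.00246107, -0.69909131, -0.02667314, 0.77281964,
              -0.02377194, -0.6577847, -0.0630775, 0.70622827, -0.04020876,
              -0.65691554, -0.04214139, 0.68034473, -0.05075486, -0.63756589,
              -0.005513, 0.62785989, -0.06791843, -0.5809052, -0.01483006,
              0.62228202])
n = 100                        # assumed number of observations
bound = 1.96 / np.sqrt(n)      # approximate 95% band under white noise

# Lags whose autocorrelation lies outside the band
significant = [k for k in range(1, len(r)) if abs(r[k]) > bound]
# Lag with the largest positive autocorrelation
best_lag = int(np.argmax(r[1:])) + 1

print(significant)
print(best_lag)
```

Only the even lags fall outside the band, and the largest positive value sits at lag 4, with further peaks at 8, 12, 16 and 20. That repeating pattern is what seasonality with a period of 4 looks like in an ACF.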

 

Seasonal differencing

What period seems appropriate for seasonally differencing the Sales series above?

plot_acf(df.Sales.diff(4).dropna());

:::note[output]

:::
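As a reminder of what `diff(4)` does, it subtracts from each value the value four steps earlier, so a pattern that repeats every four quarters cancels out. The following sketch uses a made-up series (a fixed quarterly pattern plus a linear trend), not the tute1 data.

```python
import numpy as np
import pandas as pd

# Hypothetical series: fixed quarterly pattern plus a trend of +1 per quarter
pattern = np.tile([10.0, 20.0, 30.0, 40.0], 5)
trend = np.arange(20, dtype=float)
s = pd.Series(pattern + trend)

d4 = s.diff(4)            # y_t - y_{t-4}
manual = s - s.shift(4)   # the same operation written out

print(d4.dropna().unique())
```

The seasonal pattern disappears and only the trend contribution over four steps remains. The first four entries are NaN because they have no value four steps back, which is why the cell above calls `dropna()` before plotting.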

 

ADF

from statsmodels.tsa.stattools import adfuller
adfuller(df.Sales)

:::note[output]

(-3.2627546696298033,
 0.016627676807431355,
 9,
 90,
 {'1%': -3.505190196159122,
  '5%': -2.894232085048011,
  '10%': -2.5842101234567902},
 948.799716692216)

:::

p ≈ 0.017 < 0.05

Reject the null hypothesis (the series has a unit root, i.e. it is non-stationary) -> stationary
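The same conclusion can be checked against the critical values in the returned tuple. This is a small sketch that only reuses the numbers printed above; the variable names are illustrative.

```python
# Values copied from the adfuller(df.Sales) output
stat = -3.2627546696298033
pvalue = 0.016627676807431355
crit = {'1%': -3.505190196159122,
        '5%': -2.894232085048011,
        '10%': -2.5842101234567902}

# ADF is a left-tailed test: the unit-root null is rejected when the
# statistic is more negative than the critical value.
rejected = {level: stat < value for level, value in crit.items()}
print(rejected)
```

The null is rejected at the 5% and 10% levels but not at 1%, which is consistent with a p-value between 0.01 and 0.05.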

 

KPSS

from statsmodels.tsa.stattools import kpss
kpss(df.Sales)

:::note[output]

(0.30554407533260924,
 0.1,
 19,
 {'10%': 0.347, '5%': 0.463, '2.5%': 0.574, '1%': 0.739})

:::

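p ≥ 0.1 > 0.05 (0.1 is the largest p-value kpss reports, so the true value may be higher)

Fail to reject the null hypothesis (stationary around a constant level, since kpss defaults to regression='c') -> level-stationary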
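ADF and KPSS have opposite null hypotheses, so it helps to read them side by side. The sketch below reuses the KPSS numbers printed above and takes the ADF decision at the 5% level from the previous section. The `combine` helper is a hypothetical convenience, not a statsmodels function.

```python
# Values copied from the kpss(df.Sales) output
stat = 0.30554407533260924
crit = {'10%': 0.347, '5%': 0.463, '2.5%': 0.574, '1%': 0.739}

# KPSS is right-tailed: the stationarity null is rejected when the
# statistic exceeds the critical value.
rejected = {level: stat > value for level, value in crit.items()}

def combine(adf_rejects_unit_root, kpss_rejects_stationarity):
    """Summarize the two tests (hypothetical helper)."""
    if adf_rejects_unit_root and not kpss_rejects_stationarity:
        return "tests agree: stationary"
    if not adf_rejects_unit_root and kpss_rejects_stationarity:
        return "tests agree: non-stationary"
    return "tests disagree: inspect further"

adf_rejects = True  # from the ADF section: p ≈ 0.017 < 0.05
verdict = combine(adf_rejects, rejected['5%'])
print(rejected)
print(verdict)
```

The KPSS statistic is below every critical value, so stationarity is not rejected at any listed level, and both tests point the same way.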

 

Forecasting with the seasonal naive method

future = pd.date_range('2006-03-01', periods=20, freq='QS-MAR')
future

:::note[output]

DatetimeIndex(['2006-03-01', '2006-06-01', '2006-09-01', '2006-12-01',
               '2007-03-01', '2007-06-01', '2007-09-01', '2007-12-01',
               '2008-03-01', '2008-06-01', '2008-09-01', '2008-12-01',
               '2009-03-01', '2009-06-01', '2009-09-01', '2009-12-01',
               '2010-03-01', '2010-06-01', '2010-09-01', '2010-12-01'],
              dtype='datetime64[ns]', freq='QS-MAR')

:::

import numpy as np
# Seasonal naive forecast: repeat the last 4 quarters 5 times to fill the 20 future quarters
pred = pd.DataFrame({'Sales': np.tile(df.Sales.iloc[-4:], 5)}, index=future)
pred

:::note[output]

             Sales
2006-03-01  1112.5
2006-06-01   997.4
2006-09-01   826.8
2006-12-01   992.6
2007-03-01  1112.5
2007-06-01   997.4
2007-09-01   826.8
2007-12-01   992.6
2008-03-01  1112.5
2008-06-01   997.4
2008-09-01   826.8
2008-12-01   992.6
2009-03-01  1112.5
2009-06-01   997.4
2009-09-01   826.8
2009-12-01   992.6
2010-03-01  1112.5
2010-06-01   997.4
2010-09-01   826.8
2010-12-01   992.6

:::

df.tail()

:::note[output]

             Sales  AdBudget    GDP
Date
2004-12-01  1018.7     634.9  284.0
2005-03-01  1112.5     663.1  270.9
2005-06-01   997.4     583.3  294.7
2005-09-01   826.8     508.6  292.2
2005-12-01   992.6     634.2  255.1

:::

df.Sales.plot()
pred.Sales.plot()

:::note[output]

<Axes: xlabel='Date'>

:::
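The steps above can be wrapped into a small reusable function. This is a sketch under stated assumptions: `seasonal_naive` is a hypothetical helper, it expects a Series with a regular `DatetimeIndex`, and it trims the repeated season so that the horizon does not have to be a multiple of the period. The four input values are the last four Sales observations shown in `df.tail()`.

```python
import numpy as np
import pandas as pd

def seasonal_naive(series, period, horizon):
    """Repeat the last `period` observations to cover `horizon` future steps."""
    last_season = series.iloc[-period:].to_numpy()
    reps = int(np.ceil(horizon / period))          # full repetitions needed
    values = np.tile(last_season, reps)[:horizon]  # trim to the horizon
    # Continue the series' own frequency to build the future index
    freq = series.index.freq
    if freq is None:
        freq = pd.infer_freq(series.index)
    future = pd.date_range(series.index[-1], periods=horizon + 1, freq=freq)[1:]
    return pd.Series(values, index=future, name=series.name)

# Last four observations of Sales (2005-03 to 2005-12)
idx = pd.date_range("2005-03-01", periods=4, freq="QS-MAR")
sales = pd.Series([1112.5, 997.4, 826.8, 992.6], index=idx, name="Sales")

forecast = seasonal_naive(sales, period=4, horizon=6)
print(forecast)
```

With a horizon of 6, the forecast covers 2006-03-01 through 2007-06-01 and starts repeating the 2005 values from the fifth step onward.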
