Sorting
This section shows how to sort data in pandas.
import pandas as pd
df = pd.read_excel('census.xlsx')
Ascending sort
Sorting values in increasing order, such as 1, 2, 3, 4, ..., is called an ascending sort. The example below sorts the rows in ascending order of the age column.
df.sort_values('age')
| | age | workclass | fnlwgt | education | education_num | marital_status | occupation | relationship | race | sex | capital_gain | capital_loss | hours_per_week | native_country | income |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 12318 | 17 | Private | 127366 | 11th | 7 | Never-married | Sales | Own-child | White | Female | 0 | 0 | 8 | United-States | <=50K |
| 6312 | 17 | Private | 132755 | 11th | 7 | Never-married | Sales | Own-child | White | Male | 0 | 0 | 15 | United-States | <=50K |
| 30927 | 17 | Private | 108470 | 11th | 7 | Never-married | Other-service | Own-child | Black | Male | 0 | 0 | 17 | United-States | <=50K |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 5104 | 90 | Private | 52386 | Some-college | 10 | Never-married | Other-service | Not-in-family | Asian-Pac-Islander | Male | 0 | 0 | 35 | United-States | <=50K |
| 8963 | 90 | ? | 77053 | HS-grad | 9 | Widowed | ? | Not-in-family | White | Female | 0 | 4356 | 40 | United-States | <=50K |
| 10210 | 90 | Self-emp-not-inc | 282095 | Some-college | 10 | Married-civ-spouse | Farming-fishing | Husband | White | Male | 0 | 0 | 40 | United-States | <=50K |
32561 rows × 15 columns
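The leftmost column of the output above is the original row index, which is why its values appear out of order after sorting. The following is a minimal sketch on a small made-up DataFrame (the name `toy` and its values are assumptions for illustration, not rows from census.xlsx) showing that `sort_values` returns a new, sorted DataFrame, keeps the original index labels, and leaves the source DataFrame untouched.

```python
import pandas as pd

# Hypothetical toy data standing in for census.xlsx
toy = pd.DataFrame({"age": [39, 17, 90, 25]})

# sort_values returns a new DataFrame; toy itself is not modified
sorted_toy = toy.sort_values("age")

print(sorted_toy["age"].tolist())   # [17, 25, 39, 90]
print(sorted_toy.index.tolist())    # [1, 3, 0, 2]  (original labels travel with the rows)
print(toy["age"].tolist())          # [39, 17, 90, 25]

# If a fresh 0..n-1 index is wanted, reset it after sorting
reset = sorted_toy.reset_index(drop=True)
print(reset.index.tolist())         # [0, 1, 2, 3]
```

To keep the sorted result, assign it to a variable as done here with `sorted_toy`.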
You can also sort by several columns. The example below sorts by the age column first and then, among rows with the same age, by the fnlwgt column.
df.sort_values(['age', 'fnlwgt'])
| | age | workclass | fnlwgt | education | education_num | marital_status | occupation | relationship | race | sex | capital_gain | capital_loss | hours_per_week | native_country | income |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 18593 | 17 | Private | 19752 | 11th | 7 | Never-married | Other-service | Own-child | Black | Female | 0 | 0 | 25 | United-States | <=50K |
| 31959 | 17 | Private | 24090 | HS-grad | 9 | Never-married | Exec-managerial | Own-child | White | Female | 0 | 0 | 35 | United-States | <=50K |
| 21200 | 17 | Private | 25051 | 10th | 6 | Never-married | Other-service | Own-child | White | Male | 0 | 0 | 16 | United-States | <=50K |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 4070 | 90 | Private | 313986 | 11th | 7 | Never-married | Handlers-cleaners | Own-child | White | Male | 0 | 0 | 40 | United-States | <=50K |
| 6624 | 90 | Private | 313986 | 11th | 7 | Married-civ-spouse | Craft-repair | Husband | White | Male | 0 | 0 | 40 | United-States | <=50K |
| 31696 | 90 | ? | 313986 | HS-grad | 9 | Married-civ-spouse | ? | Husband | White | Male | 0 | 0 | 40 | United-States | >50K |
32561 rows × 15 columns
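To see how the second column acts as a tie-breaker, here is a small sketch on made-up data (the DataFrame `toy` and its values are assumptions for illustration only). Rows are ordered by age, and rows that share the same age are then ordered by fnlwgt.

```python
import pandas as pd

# Hypothetical toy data with repeated ages
toy = pd.DataFrame({
    "age":    [17, 90, 17, 90],
    "fnlwgt": [300, 100, 200, 50],
})

# age is the primary key; fnlwgt only decides the order within equal ages
result = toy.sort_values(["age", "fnlwgt"])

print(result["age"].tolist())      # [17, 17, 90, 90]
print(result["fnlwgt"].tolist())   # [200, 300, 50, 100]
print(result.index.tolist())       # [2, 0, 3, 1]
```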
Descending sort
Sorting values in decreasing order, such as 9, 8, 7, ..., is called a descending sort. To sort in descending order, pass ascending=False.
df.sort_values('age', ascending=False)
| | age | workclass | fnlwgt | education | education_num | marital_status | occupation | relationship | race | sex | capital_gain | capital_loss | hours_per_week | native_country | income |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5406 | 90 | Private | 51744 | Masters | 14 | Never-married | Exec-managerial | Not-in-family | Black | Male | 0 | 0 | 50 | United-States | >50K |
| 6624 | 90 | Private | 313986 | 11th | 7 | Married-civ-spouse | Craft-repair | Husband | White | Male | 0 | 0 | 40 | United-States | <=50K |
| 20610 | 90 | Private | 206667 | Masters | 14 | Married-civ-spouse | Prof-specialty | Wife | White | Female | 0 | 0 | 40 | United-States | >50K |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 19190 | 17 | Private | 29571 | 12th | 8 | Never-married | Handlers-cleaners | Own-child | White | Male | 0 | 0 | 15 | United-States | <=50K |
| 19206 | 17 | Private | 183066 | 10th | 6 | Never-married | Other-service | Own-child | White | Female | 0 | 0 | 25 | United-States | <=50K |
| 24954 | 17 | Private | 160968 | 11th | 7 | Never-married | Adm-clerical | Own-child | White | Male | 0 | 0 | 16 | United-States | <=50K |
32561 rows × 15 columns
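A common use of a descending sort is picking the rows with the largest values. The sketch below uses a small made-up DataFrame (`toy` and its values are assumptions, not census data) and combines `ascending=False` with `head` to take the two oldest rows.

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({
    "age":       [39, 17, 90, 25],
    "workclass": ["State-gov", "Private", "Private", "Self-emp-inc"],
})

# Largest age first
oldest_first = toy.sort_values("age", ascending=False)
print(oldest_first["age"].tolist())   # [90, 39, 25, 17]

# The first two rows of the sorted result are the two oldest people
top2 = oldest_first.head(2)
print(top2["age"].tolist())           # [90, 39]
```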
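Mixing ascending and descending order
When sorting by several columns, you can choose ascending or descending order for each column separately. Pass a list to ascending with one value per sort column, in the same order as the columns: True sorts that column in ascending order and False sorts it in descending order. The example below sorts age in ascending order and fnlwgt in descending order.
df.sort_values(
    ['age', 'fnlwgt'],
    ascending=[
        True,   # sort age in ascending order
        False   # sort fnlwgt in descending order
    ])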
| | age | workclass | fnlwgt | education | education_num | marital_status | occupation | relationship | race | sex | capital_gain | capital_loss | hours_per_week | native_country | income |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 23373 | 17 | ? | 806316 | 11th | 7 | Never-married | ? | Own-child | White | Female | 0 | 0 | 20 | United-States | <=50K |
| 27167 | 17 | Private | 721712 | 10th | 6 | Never-married | Other-service | Own-child | White | Male | 0 | 0 | 15 | United-States | <=50K |
| 7663 | 17 | ? | 659273 | 11th | 7 | Never-married | ? | Own-child | Black | Female | 0 | 0 | 40 | Trinadad&Tobago | <=50K |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 8973 | 90 | Private | 46786 | Bachelors | 13 | Married-civ-spouse | Sales | Husband | White | Male | 9386 | 0 | 15 | United-States | >50K |
| 11996 | 90 | Private | 40388 | Bachelors | 13 | Never-married | Exec-managerial | Not-in-family | White | Male | 0 | 0 | 55 | United-States | <=50K |
| 11731 | 90 | ? | 39824 | HS-grad | 9 | Widowed | ? | Not-in-family | White | Male | 401 | 0 | 4 | United-States | <=50K |
32561 rows × 15 columns
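The per-column behavior is easier to verify on a few rows. The following sketch uses made-up data (`toy` and its values are assumptions for illustration) to sort age ascending and fnlwgt descending, and also shows that the ascending list is expected to have one entry per sort column: a list of a different length raises a ValueError.

```python
import pandas as pd

# Hypothetical toy data with repeated ages
toy = pd.DataFrame({
    "age":    [17, 90, 17, 90],
    "fnlwgt": [300, 100, 200, 50],
})

# age ascending; within each age, fnlwgt descending
mixed = toy.sort_values(["age", "fnlwgt"], ascending=[True, False])

print(mixed["age"].tolist())      # [17, 17, 90, 90]
print(mixed["fnlwgt"].tolist())   # [300, 200, 100, 50]

# One flag for two columns does not match, so pandas rejects it
error_raised = False
try:
    toy.sort_values(["age", "fnlwgt"], ascending=[True])
except ValueError:
    error_raised = True
print(error_raised)               # True
```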