[pandas-basic] Sorting :: 마인드스케일

This post shows how to sort a DataFrame in pandas.

import pandas as pd

df = pd.read_excel('census.xlsx')

Ascending sort

Ascending order means sorting values from smallest to largest, such as 1, 2, 3, 4, .... The example below sorts the rows in ascending order by the age column.

df.sort_values('age')

	age	workclass	fnlwgt	education	education_num	marital_status	occupation	relationship	race	sex	capital_gain	capital_loss	hours_per_week	native_country	income
12318	17	Private	127366	11th	7	Never-married	Sales	Own-child	White	Female	0	0	8	United-States	<=50K
6312	17	Private	132755	11th	7	Never-married	Sales	Own-child	White	Male	0	0	15	United-States	<=50K
30927	17	Private	108470	11th	7	Never-married	Other-service	Own-child	Black	Male	0	0	17	United-States	<=50K
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
5104	90	Private	52386	Some-college	10	Never-married	Other-service	Not-in-family	Asian-Pac-Islander	Male	0	0	35	United-States	<=50K
8963	90	?	77053	HS-grad	9	Widowed	?	Not-in-family	White	Female	0	4356	40	United-States	<=50K
10210	90	Self-emp-not-inc	282095	Some-college	10	Married-civ-spouse	Farming-fishing	Husband	White	Male	0	0	40	United-States	<=50K

32561 rows × 15 columns
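The leftmost column in the output above holds the original row labels, which is why they appear shuffled after sorting. Here is a minimal sketch of that behavior. It uses a small made-up DataFrame (the name toy and its values are assumptions for illustration, not rows taken from census.xlsx) so it runs without the Excel file.

```python
import pandas as pd

# Hypothetical toy data standing in for census.xlsx
toy = pd.DataFrame({
    'age': [39, 17, 90, 25],
    'fnlwgt': [77516, 127366, 52386, 226802],
})

# sort_values returns a new, sorted DataFrame
sorted_toy = toy.sort_values('age')

print(sorted_toy.index.tolist())  # original row labels, now reordered
print(toy['age'].tolist())        # the original frame is left unchanged

# reset_index(drop=True) renumbers the rows 0, 1, 2, ...
renumbered = sorted_toy.reset_index(drop=True)
print(renumbered)
```

To keep the sorted result, assign it to a variable as above (or back to the same name), since sort_values does not modify the DataFrame in place by default.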

You can also sort by more than one column. The example below sorts by the age column first, and then by the fnlwgt column for rows that have the same age.

df.sort_values(['age', 'fnlwgt'])

	age	workclass	fnlwgt	education	education_num	marital_status	occupation	relationship	race	sex	capital_gain	capital_loss	hours_per_week	native_country	income
18593	17	Private	19752	11th	7	Never-married	Other-service	Own-child	Black	Female	0	0	25	United-States	<=50K
31959	17	Private	24090	HS-grad	9	Never-married	Exec-managerial	Own-child	White	Female	0	0	35	United-States	<=50K
21200	17	Private	25051	10th	6	Never-married	Other-service	Own-child	White	Male	0	0	16	United-States	<=50K
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
4070	90	Private	313986	11th	7	Never-married	Handlers-cleaners	Own-child	White	Male	0	0	40	United-States	<=50K
6624	90	Private	313986	11th	7	Married-civ-spouse	Craft-repair	Husband	White	Male	0	0	40	United-States	<=50K
31696	90	?	313986	HS-grad	9	Married-civ-spouse	?	Husband	White	Male	0	0	40	United-States	>50K

32561 rows × 15 columns
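The second column only matters when rows tie on the first one. A minimal sketch on made-up data (toy and its values are assumptions for illustration) makes the tie-breaking easier to see than in the full census table.

```python
import pandas as pd

# Hypothetical toy data: two rows with age 17 and two rows with age 90
toy = pd.DataFrame({
    'age': [17, 90, 17, 90],
    'fnlwgt': [24090, 313986, 19752, 52386],
})

# Sort by age first; rows with the same age are then ordered by fnlwgt
by_age_then_weight = toy.sort_values(['age', 'fnlwgt'])
print(by_age_then_weight)
```

Within each age group the rows come out ordered by fnlwgt, while the age groups themselves stay in ascending order.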

Descending sort

Descending order means sorting values from largest to smallest, such as 9, 8, 7, .... To sort in descending order, add ascending=False.

df.sort_values('age', ascending=False)

	age	workclass	fnlwgt	education	education_num	marital_status	occupation	relationship	race	sex	capital_gain	capital_loss	hours_per_week	native_country	income
5406	90	Private	51744	Masters	14	Never-married	Exec-managerial	Not-in-family	Black	Male	0	0	50	United-States	>50K
6624	90	Private	313986	11th	7	Married-civ-spouse	Craft-repair	Husband	White	Male	0	0	40	United-States	<=50K
20610	90	Private	206667	Masters	14	Married-civ-spouse	Prof-specialty	Wife	White	Female	0	0	40	United-States	>50K
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
19190	17	Private	29571	12th	8	Never-married	Handlers-cleaners	Own-child	White	Male	0	0	15	United-States	<=50K
19206	17	Private	183066	10th	6	Never-married	Other-service	Own-child	White	Female	0	0	25	United-States	<=50K
24954	17	Private	160968	11th	7	Never-married	Adm-clerical	Own-child	White	Male	0	0	16	United-States	<=50K

32561 rows × 15 columns
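In the output above, many rows share the same age, and their relative order is not something the default sort promises to preserve. If that order matters, one option is to request a stable sort with kind='stable'. The sketch below uses made-up data (toy and its values are assumptions for illustration).

```python
import pandas as pd

# Hypothetical toy data with two rows tied at age 90
toy = pd.DataFrame({
    'age': [90, 17, 90],
    'fnlwgt': [51744, 29571, 313986],
})

# Descending sort; kind='stable' keeps tied rows in their original order
oldest_first = toy.sort_values('age', ascending=False, kind='stable')
print(oldest_first)
```

Here the two rows with age 90 keep the order they had in toy, followed by the row with age 17.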

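Mixing ascending and descending order

When sorting by multiple columns, you can choose ascending or descending order for each column separately. Pass ascending a list of booleans in the same order as the sort columns: True sorts that column in ascending order, and False sorts it in descending order. The example below sorts age in ascending order and fnlwgt in descending order.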

df.sort_values(
    ['age', 'fnlwgt'],
    ascending=[
        True,  # sort age in ascending order
        False  # sort fnlwgt in descending order
    ])

	age	workclass	fnlwgt	education	education_num	marital_status	occupation	relationship	race	sex	capital_gain	capital_loss	hours_per_week	native_country	income
23373	17	?	806316	11th	7	Never-married	?	Own-child	White	Female	0	0	20	United-States	<=50K
27167	17	Private	721712	10th	6	Never-married	Other-service	Own-child	White	Male	0	0	15	United-States	<=50K
7663	17	?	659273	11th	7	Never-married	?	Own-child	Black	Female	0	0	40	Trinadad&Tobago	<=50K
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
8973	90	Private	46786	Bachelors	13	Married-civ-spouse	Sales	Husband	White	Male	9386	0	15	United-States	>50K
11996	90	Private	40388	Bachelors	13	Never-married	Exec-managerial	Not-in-family	White	Male	0	0	55	United-States	<=50K
11731	90	?	39824	HS-grad	9	Widowed	?	Not-in-family	White	Male	401	0	4	United-States	<=50K

32561 rows × 15 columns
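The list passed to ascending has to line up one-to-one with the list of sort columns. The sketch below, on made-up data (toy and its values are assumptions for illustration), shows a mixed sort and what happens when the two lists have different lengths.

```python
import pandas as pd

# Hypothetical toy data: two rows per age
toy = pd.DataFrame({
    'age': [17, 17, 90, 90],
    'fnlwgt': [19752, 806316, 39824, 313986],
})

# age ascending, fnlwgt descending within each age
mixed = toy.sort_values(['age', 'fnlwgt'], ascending=[True, False])
print(mixed)

# A list whose length differs from the number of sort columns is rejected
length_mismatch_rejected = False
try:
    toy.sort_values(['age', 'fnlwgt'], ascending=[True])
except ValueError as err:
    length_mismatch_rejected = True
    print('ValueError:', err)
```

A single boolean such as ascending=False is still accepted with several columns; it then applies to all of them.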