Selecting with True/False

Let's look at another way to select rows by condition in pandas.

import pandas as pd
df = pd.read_excel('census.xlsx')

Selecting with True/False

In pandas, a comparison like the one below compares every value in age with 40 and produces a Series of True and False values.

df['age'] > 40

0        False
1         True
2        False
         ...  
32558     True
32559    False
32560     True
Name: age, Length: 32561, dtype: bool

Placing a True/False Series inside [] selects the rows where the condition is True.

df[df['age'] > 40]

	age	workclass	fnlwgt	education	education_num	marital_status	occupation	relationship	race	sex	capital_gain	capital_loss	hours_per_week	native_country	income
1	50	Self-emp-not-inc	83311	Bachelors	13	Married-civ-spouse	Exec-managerial	Husband	White	Male	0	0	13	United-States	<=50K
3	53	Private	234721	11th	7	Married-civ-spouse	Handlers-cleaners	Husband	Black	Male	0	0	40	United-States	<=50K
6	49	Private	160187	9th	5	Married-spouse-absent	Other-service	Not-in-family	Black	Female	0	0	16	Jamaica	<=50K
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
32554	53	Private	321865	Masters	14	Married-civ-spouse	Exec-managerial	Husband	White	Male	0	0	40	United-States	>50K
32558	58	Private	151910	HS-grad	9	Widowed	Adm-clerical	Unmarried	White	Female	0	0	40	United-States	<=50K
32560	52	Self-emp-inc	287927	HS-grad	9	Married-civ-spouse	Exec-managerial	Wife	White	Female	15024	0	40	United-States	>50K

13443 rows × 15 columns
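The same two steps (build a True/False Series, then index with it) can be tried without the census file. The sketch below assumes a small hypothetical DataFrame named small that only stands in for the real data; the variable names mask, older, and n_older are illustrative.

```python
import pandas as pd

# Hypothetical miniature stand-in for the census data (not the real file).
small = pd.DataFrame({
    "age": [39, 50, 38, 53, 28],
    "sex": ["Male", "Male", "Male", "Male", "Female"],
})

# The comparison yields a boolean Series aligned with the rows of `small`.
mask = small["age"] > 40

# Indexing with the mask keeps only the rows where it is True.
older = small[mask]

# Summing a boolean Series counts its True values.
n_older = int(mask.sum())

print(mask.tolist())  # [False, True, False, True, False]
print(older)
print(n_older)        # 2
```

Note that the selected rows keep their original index labels (1 and 3 here), just as in the census output above.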

This approach tends to make code more verbose, and on large DataFrames it can also run slower than query. Use query where possible.
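For comparison, here is a minimal sketch showing that query selects the same rows as a True/False Series. It assumes a small hypothetical DataFrame named small; it illustrates equivalence only and makes no claim about speed.

```python
import pandas as pd

# Hypothetical miniature stand-in for the census data.
small = pd.DataFrame({"age": [39, 50, 38, 53, 28]})

# Selection with a True/False Series.
by_mask = small[small["age"] > 40]

# The same condition written as a query string.
by_query = small.query("age > 40")

# Both results contain the same rows with the same index labels.
same = by_mask.equals(by_query)
print(same)  # True
```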

and

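When both conditions must be true, use the & operator. Note that & has higher precedence than comparison operators such as ==, so wrap each comparison in parentheses to make sure it is evaluated first.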

The expression below is True only where age is greater than 40 and sex is Male.

(df['age'] > 40) & (df['sex'] == 'Male')

0        False
1         True
2        False
         ...  
32558    False
32559    False
32560    False
Length: 32561, dtype: bool

The following selects the rows where age is greater than 40 and sex is Male.

df[(df['age'] > 40) & (df['sex'] == 'Male')]

	age	workclass	fnlwgt	education	education_num	marital_status	occupation	relationship	race	sex	capital_gain	capital_loss	hours_per_week	native_country	income
1	50	Self-emp-not-inc	83311	Bachelors	13	Married-civ-spouse	Exec-managerial	Husband	White	Male	0	0	13	United-States	<=50K
3	53	Private	234721	11th	7	Married-civ-spouse	Handlers-cleaners	Husband	Black	Male	0	0	40	United-States	<=50K
7	52	Self-emp-not-inc	209642	HS-grad	9	Married-civ-spouse	Exec-managerial	Husband	White	Male	0	0	45	United-States	>50K
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
32550	43	Self-emp-not-inc	27242	Some-college	10	Married-civ-spouse	Craft-repair	Husband	White	Male	0	0	50	United-States	<=50K
32552	43	Private	84661	Assoc-voc	11	Married-civ-spouse	Sales	Husband	White	Male	0	0	45	United-States	<=50K
32554	53	Private	321865	Masters	14	Married-civ-spouse	Exec-managerial	Husband	White	Male	0	0	40	United-States	>50K

9497 rows × 15 columns
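To see why the parentheses around each comparison matter, the sketch below drops them on purpose. It assumes a small hypothetical DataFrame named small with two numeric columns. Without parentheses, Python groups the expression as age > (40 & hours_per_week) == 40, a chained comparison that asks for the truth value of a whole Series; in the pandas versions I am aware of that raises a ValueError.

```python
import pandas as pd

# Hypothetical miniature stand-in for the census data.
small = pd.DataFrame({
    "age": [39, 50, 38, 53, 28],
    "hours_per_week": [40, 13, 40, 40, 40],
})

# Parenthesized: each comparison is evaluated first, then combined with &.
both = (small["age"] > 40) & (small["hours_per_week"] == 40)

# Unparenthesized: & binds tighter than > and ==, so Python reads this as
#   small["age"] > (40 & small["hours_per_week"]) == 40
try:
    small["age"] > 40 & small["hours_per_week"] == 40
    raised = False
except ValueError:
    # The truth value of a Series is ambiguous.
    raised = True

print(both.tolist())  # [False, False, False, True, False]
print(raised)
```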

or

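When it is enough for either condition to be true, use |. The | symbol is the long vertical bar located between the Backspace and Enter keys on many keyboards. Like &, | has higher precedence than comparison operators such as ==, so wrap each comparison in parentheses to make sure it is evaluated first.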

The expression below is True only where relationship is Husband or Wife.

(df['relationship'] == 'Husband') | (df['relationship'] == 'Wife')

0        False
1         True
2        False
         ...  
32558    False
32559    False
32560     True
Name: relationship, Length: 32561, dtype: bool

The following selects the rows where relationship is Husband or Wife.

df[(df['relationship'] == 'Husband') | (df['relationship'] == 'Wife')]

	age	workclass	fnlwgt	education	education_num	marital_status	occupation	relationship	race	sex	capital_gain	capital_loss	hours_per_week	native_country	income
1	50	Self-emp-not-inc	83311	Bachelors	13	Married-civ-spouse	Exec-managerial	Husband	White	Male	0	0	13	United-States	<=50K
3	53	Private	234721	11th	7	Married-civ-spouse	Handlers-cleaners	Husband	Black	Male	0	0	40	United-States	<=50K
4	28	Private	338409	Bachelors	13	Married-civ-spouse	Prof-specialty	Wife	Black	Female	0	0	40	Cuba	<=50K
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
32556	27	Private	257302	Assoc-acdm	12	Married-civ-spouse	Tech-support	Wife	White	Female	0	0	38	United-States	<=50K
32557	40	Private	154374	HS-grad	9	Married-civ-spouse	Machine-op-inspct	Husband	White	Male	0	0	40	United-States	>50K
32560	52	Self-emp-inc	287927	HS-grad	9	Married-civ-spouse	Exec-managerial	Wife	White	Female	15024	0	40	United-States	>50K

14761 rows × 15 columns
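A common slip is to reach for Python's or keyword instead of |. The sketch below, using a small hypothetical DataFrame named small, shows that | combines the two Series element by element, while or tries to reduce a whole Series to a single True/False value and fails.

```python
import pandas as pd

# Hypothetical miniature stand-in for the census data.
small = pd.DataFrame({
    "relationship": ["Not-in-family", "Husband", "Wife", "Own-child"],
})

is_husband = small["relationship"] == "Husband"
is_wife = small["relationship"] == "Wife"

# Element-wise "or": True where at least one condition holds.
either = is_husband | is_wife

# The `or` keyword needs one truth value per operand, which a Series cannot give.
try:
    is_husband or is_wife
    or_failed = False
except ValueError:
    or_failed = True

print(either.tolist())  # [False, True, True, False]
print(or_failed)        # True
```

The same applies to and versus &, and to not versus ~.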

not

To find the opposite case, use ~. The following selects every row except those where age is less than 30 and race is Black.

df[~((df['age'] < 30) & (df['race'] == "Black"))]

	age	workclass	fnlwgt	education	education_num	marital_status	occupation	relationship	race	sex	capital_gain	capital_loss	hours_per_week	native_country	income
0	39	State-gov	77516	Bachelors	13	Never-married	Adm-clerical	Not-in-family	White	Male	2174	0	40	United-States	<=50K
1	50	Self-emp-not-inc	83311	Bachelors	13	Married-civ-spouse	Exec-managerial	Husband	White	Male	0	0	13	United-States	<=50K
2	38	Private	215646	HS-grad	9	Divorced	Handlers-cleaners	Not-in-family	White	Male	0	0	40	United-States	<=50K
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
32558	58	Private	151910	HS-grad	9	Widowed	Adm-clerical	Unmarried	White	Female	0	0	40	United-States	<=50K
32559	22	Private	201490	HS-grad	9	Never-married	Adm-clerical	Own-child	White	Male	0	0	20	United-States	<=50K
32560	52	Self-emp-inc	287927	HS-grad	9	Married-civ-spouse	Exec-managerial	Wife	White	Female	15024	0	40	United-States	>50K

31603 rows × 15 columns
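Negating a combined condition follows the usual logic rule (De Morgan): not (A and B) is the same as (not A) or (not B). The sketch below checks this on a small hypothetical DataFrame named small. Because ~ is a unary operator that binds more tightly than &, the outer parentheses around the whole & expression are what make ~ apply to the combined condition.

```python
import pandas as pd

# Hypothetical miniature stand-in for the census data.
small = pd.DataFrame({
    "age": [25, 25, 45, 45],
    "race": ["Black", "White", "Black", "White"],
})

young = small["age"] < 30
black = small["race"] == "Black"

# Negate the combined condition.
negated = ~(young & black)

# Equivalent form: negate each condition and combine with |.
de_morgan = (~young) | (~black)

same = bool((negated == de_morgan).all())
kept = small[negated]

print(negated.tolist())  # [False, True, True, True]
print(same)              # True
```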

Membership

To check whether a value belongs to a set of values, use the isin method.

The expression below is True only where relationship is one of Husband or Wife.

df['relationship'].isin(['Husband', 'Wife'])

0        False
1         True
2        False
         ...  
32558    False
32559    False
32560     True
Name: relationship, Length: 32561, dtype: bool

The following selects only the rows where relationship is one of Husband or Wife.

df[df['relationship'].isin(['Husband', 'Wife'])]

	age	workclass	fnlwgt	education	education_num	marital_status	occupation	relationship	race	sex	capital_gain	capital_loss	hours_per_week	native_country	income
1	50	Self-emp-not-inc	83311	Bachelors	13	Married-civ-spouse	Exec-managerial	Husband	White	Male	0	0	13	United-States	<=50K
3	53	Private	234721	11th	7	Married-civ-spouse	Handlers-cleaners	Husband	Black	Male	0	0	40	United-States	<=50K
4	28	Private	338409	Bachelors	13	Married-civ-spouse	Prof-specialty	Wife	Black	Female	0	0	40	Cuba	<=50K
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
32556	27	Private	257302	Assoc-acdm	12	Married-civ-spouse	Tech-support	Wife	White	Female	0	0	38	United-States	<=50K
32557	40	Private	154374	HS-grad	9	Married-civ-spouse	Machine-op-inspct	Husband	White	Male	0	0	40	United-States	>50K
32560	52	Self-emp-inc	287927	HS-grad	9	Married-civ-spouse	Exec-managerial	Wife	White	Female	15024	0	40	United-States	>50K

14761 rows × 15 columns
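The row count above matches the | version of the same condition, which suggests the two forms are interchangeable here. The sketch below checks that on a small hypothetical DataFrame named small, and also combines isin with ~ to select the rows that are not in the list.

```python
import pandas as pd

# Hypothetical miniature stand-in for the census data.
small = pd.DataFrame({
    "relationship": ["Not-in-family", "Husband", "Wife", "Own-child"],
})

# Membership test against a list of accepted values.
by_isin = small["relationship"].isin(["Husband", "Wife"])

# The same condition spelled out with == and |.
by_or = (small["relationship"] == "Husband") | (small["relationship"] == "Wife")

same = bool((by_isin == by_or).all())

# ~ flips the mask, keeping rows whose value is NOT in the list.
others = small[~by_isin]

print(by_isin.tolist())                 # [False, True, True, False]
print(same)                             # True
print(others["relationship"].tolist())  # ['Not-in-family', 'Own-child']
```

With more than two or three accepted values, isin usually stays shorter than a chain of == comparisons joined by |.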