logo

[pandas-basic] 조건 검색

판다스에서 조건으로 행을 선택하는 방법을 알아보겠습니다.

import pandas as pd
df = pd.read_excel('census.xlsx')
 

쿼리

query 메소드에 코드를 문자열 형태로 입력하면, 그 코드가 참인 행을 선택해줍니다. 코드에서 열 이름은 변수처럼 사용하면 됩니다. 아래는 age 열의 값이 40보다 큰 행만 선택합니다.

df.query('age > 40')
ageworkclassfnlwgteducationeducation_nummarital_statusoccupationrelationshipracesexcapital_gaincapital_losshours_per_weeknative_countryincome
150Self-emp-not-inc83311Bachelors13Married-civ-spouseExec-managerialHusbandWhiteMale0013United-States<=50K
353Private23472111th7Married-civ-spouseHandlers-cleanersHusbandBlackMale0040United-States<=50K
649Private1601879th5Married-spouse-absentOther-serviceNot-in-familyBlackFemale0016Jamaica<=50K
................................................
3255453Private321865Masters14Married-civ-spouseExec-managerialHusbandWhiteMale0040United-States>50K
3255858Private151910HS-grad9WidowedAdm-clericalUnmarriedWhiteFemale0040United-States<=50K
3256052Self-emp-inc287927HS-grad9Married-civ-spouseExec-managerialWifeWhiteFemale15024040United-States>50K

13443 rows × 15 columns

 

and

두 가지 조건이 동시에 참이어야 하는 경우에는 and를 사용합니다. 아래는 age가 40이 넘고 sexMale인 경우를 찾습니다.

df.query('age > 40 and sex == "Male"')
ageworkclassfnlwgteducationeducation_nummarital_statusoccupationrelationshipracesexcapital_gaincapital_losshours_per_weeknative_countryincome
150Self-emp-not-inc83311Bachelors13Married-civ-spouseExec-managerialHusbandWhiteMale0013United-States<=50K
353Private23472111th7Married-civ-spouseHandlers-cleanersHusbandBlackMale0040United-States<=50K
752Self-emp-not-inc209642HS-grad9Married-civ-spouseExec-managerialHusbandWhiteMale0045United-States>50K
................................................
3255043Self-emp-not-inc27242Some-college10Married-civ-spouseCraft-repairHusbandWhiteMale0050United-States<=50K
3255243Private84661Assoc-voc11Married-civ-spouseSalesHusbandWhiteMale0045United-States<=50K
3255453Private321865Masters14Married-civ-spouseExec-managerialHusbandWhiteMale0040United-States>50K

9497 rows × 15 columns

 

or

둘 중에 하나라도 참이어야 하는 경우에는 or를 사용합니다. 아래는 relationshipHusband이거나 또는 Wife인 경우를 찾습니다.

df.query('relationship == "Husband" or relationship == "Wife"')
ageworkclassfnlwgteducationeducation_nummarital_statusoccupationrelationshipracesexcapital_gaincapital_losshours_per_weeknative_countryincome
150Self-emp-not-inc83311Bachelors13Married-civ-spouseExec-managerialHusbandWhiteMale0013United-States<=50K
353Private23472111th7Married-civ-spouseHandlers-cleanersHusbandBlackMale0040United-States<=50K
428Private338409Bachelors13Married-civ-spouseProf-specialtyWifeBlackFemale0040Cuba<=50K
................................................
3255627Private257302Assoc-acdm12Married-civ-spouseTech-supportWifeWhiteFemale0038United-States<=50K
3255740Private154374HS-grad9Married-civ-spouseMachine-op-inspctHusbandWhiteMale0040United-States>50K
3256052Self-emp-inc287927HS-grad9Married-civ-spouseExec-managerialWifeWhiteFemale15024040United-States>50K

14761 rows × 15 columns

 

not

반대되는 경우를 찾으려면 not을 사용합니다. 아래는 age가 30보다 작고 raceBlack이 경우가 아닌 행를 찾습니다.

df.query('not (age < 30 and race == "Black")')
ageworkclassfnlwgteducationeducation_nummarital_statusoccupationrelationshipracesexcapital_gaincapital_losshours_per_weeknative_countryincome
039State-gov77516Bachelors13Never-marriedAdm-clericalNot-in-familyWhiteMale2174040United-States<=50K
150Self-emp-not-inc83311Bachelors13Married-civ-spouseExec-managerialHusbandWhiteMale0013United-States<=50K
238Private215646HS-grad9DivorcedHandlers-cleanersNot-in-familyWhiteMale0040United-States<=50K
................................................
3255858Private151910HS-grad9WidowedAdm-clericalUnmarriedWhiteFemale0040United-States<=50K
3255922Private201490HS-grad9Never-marriedAdm-clericalOwn-childWhiteMale0020United-States<=50K
3256052Self-emp-inc287927HS-grad9Married-civ-spouseExec-managerialWifeWhiteFemale15024040United-States>50K

31603 rows × 15 columns

 

in

파이썬에서 in 연산자는 포함 관계를 나타냅니다. 아래는 relationship의 값이 Husband 또는 Wife에 포함되는 경우를 뜻합니다. or와 비슷하지만 비교할 대상이 많을 경우에는 포함관계의 형태로 나타내면 코드가 간결합니다.

df.query('relationship in ["Husband", "Wife"]')
ageworkclassfnlwgteducationeducation_nummarital_statusoccupationrelationshipracesexcapital_gaincapital_losshours_per_weeknative_countryincome
150Self-emp-not-inc83311Bachelors13Married-civ-spouseExec-managerialHusbandWhiteMale0013United-States<=50K
353Private23472111th7Married-civ-spouseHandlers-cleanersHusbandBlackMale0040United-States<=50K
428Private338409Bachelors13Married-civ-spouseProf-specialtyWifeBlackFemale0040Cuba<=50K
................................................
3255627Private257302Assoc-acdm12Married-civ-spouseTech-supportWifeWhiteFemale0038United-States<=50K
3255740Private154374HS-grad9Married-civ-spouseMachine-op-inspctHusbandWhiteMale0040United-States>50K
3256052Self-emp-inc287927HS-grad9Married-civ-spouseExec-managerialWifeWhiteFemale15024040United-States>50K

14761 rows × 15 columns

반대로 포함되지 않음을 나타낼 때는 not in을 씁니다.

df.query('relationship not in ["Husband", "Wife"]')
ageworkclassfnlwgteducationeducation_nummarital_statusoccupationrelationshipracesexcapital_gaincapital_losshours_per_weeknative_countryincome
039State-gov77516Bachelors13Never-marriedAdm-clericalNot-in-familyWhiteMale2174040United-States<=50K
238Private215646HS-grad9DivorcedHandlers-cleanersNot-in-familyWhiteMale0040United-States<=50K
649Private1601879th5Married-spouse-absentOther-serviceNot-in-familyBlackFemale0016Jamaica<=50K
................................................
3255522Private310152Some-college10Never-marriedProtective-servNot-in-familyWhiteMale0040United-States<=50K
3255858Private151910HS-grad9WidowedAdm-clericalUnmarriedWhiteFemale0040United-States<=50K
3255922Private201490HS-grad9Never-marriedAdm-clericalOwn-childWhiteMale0020United-States<=50K

17800 rows × 15 columns

Previous
인덱스 설정