Topics

Basic Filtering

Label vs position selection: Use .loc for label-based indexing, .iloc for positional indexing.

  • .loc: Index labels or boolean masks
  • .iloc: Integer positions
df.loc['2025-01-01']           # select rows with index label "2025-01-01"
df.loc[df['value'] > 0, 'id']  # filter rows by condition, get 'id'
df.iloc[0:5, [1, 3]]           # first 5 rows, columns at 1 and 3

For single-value access, .at (label-based) and .iat (position-based) are faster than .loc/.iloc.
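
A minimal sketch, reusing the date-indexed frame with a 'value' column from the examples above:

df.at['2025-01-01', 'value']          # scalar read by row label and column label
df.iat[0, 1]                          # scalar read by row position and column position
df.at['2025-01-01', 'value'] = 99.0   # .at / .iat also do fast scalar writes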

Boolean filtering: Boolean masks select rows of a DataFrame. Combine conditions with & (AND) and | (OR), and wrap each condition in parentheses (the bitwise operators bind more tightly than comparisons).

mask = (df['age'] > 30) & (df['income'] < 50_000)
subset = df[mask]
subset = df[df['category'] != 'X']

.isin performs membership checks; .between filters a numeric range:

df[df['country'].isin(['USA', 'Canada'])]
df[df['score'].between(0, 100)]    # numeric range filter (inclusive)

Query strings: Filter with DataFrame.query(expr), which evaluates a pandas expression string against the columns. It is concise and can use the numexpr engine for faster evaluation (when numexpr is installed).

df.query('age > 30 and income < 50000')
min_val = 5
df.query('value >= @min_val or category == "A"')  # use @ for external vars

Tip

query/eval can be faster than chained boolean indexing on large frames because they avoid building large intermediate boolean arrays.

Filtering with eval: df.eval(expr) evaluates an expression string against the DataFrame's columns (referenced by name) and can return a boolean mask for filtering. Vectorized evaluation improves performance for complex expressions.

mask = df.eval('sin(a) > 0')   # columns referenced by name; math functions like sin are supported
df[mask]                       # fast filtering

Multi-index slicing: Use .loc with tuples or pd.IndexSlice to slice a MultiIndexed DataFrame by partial levels (see the sketch after this list).

  • Example: df.loc[('StoreA', slice('2025-01','2025-02')), :] selects data for StoreA in Jan-Feb 2025
  • Use df.xs() (cross-section) to select along a level by label: df.xs('StoreA', level='store')
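
A minimal sketch, assuming a frame indexed by ('store', 'date') levels where the 'date' level is a DatetimeIndex (so partial strings like '2025-01' slice by month); the store labels are illustrative:

import pandas as pd

idx = pd.IndexSlice
df = df.sort_index()                                  # partial-level slicing needs a sorted MultiIndex
df.loc[('StoreA', slice('2025-01', '2025-02')), :]    # tuple + slice on the date level
df.loc[idx[:, '2025-01':'2025-02'], :]                # IndexSlice: all stores, Jan-Feb 2025
df.xs('StoreA', level='store')                        # cross-section; drops the 'store' level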

Select and Assign

Conditional assignment: Use np.where for vectorized if-else logic. It chooses elements from two options based on a boolean condition array:

df['sign'] = np.where(df['balance'] >= 0, 'Positive', 'Negative')

For multiple conditions, use np.select:

conditions = [df['score'] >= 90, df['score'] >= 60]
choices = ['A', 'B']
df['grade'] = np.select(conditions, choices, default='C')

These are much faster than iterating or using apply row by row.
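
For contrast, the same grading logic written as a row-by-row apply (shown only to illustrate the slower pattern; grade is a hypothetical helper):

def grade(row):                        # called once per row in Python
    if row['score'] >= 90:
        return 'A'
    elif row['score'] >= 60:
        return 'B'
    return 'C'

df['grade'] = df.apply(grade, axis=1)  # same result as np.select, but a Python-level loop over rows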

DataFrame.where and mask: df.where(cond, other) keeps original values where cond is True and uses other where False. df.mask(cond, other) does the inverse (replaces where condition is True). This is useful for bulk value setting:

df['adjusted'] = df['value'].where(df['value'] >= 0, 0)  # set negatives to 0
# equivalent using mask:
df['adjusted'] = df['value'].mask(df['value'] < 0, 0)

Tip

You can also fill entire DataFrame cells conditionally (e.g., replace all negative entries with NaN: df.mask(df < 0, np.nan)).
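
A minimal sketch on a small, made-up numeric frame:

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, -2, 3], 'b': [-4, 5, -6]})
df.mask(df < 0, np.nan)    # negatives become NaN, everything else is kept
df.where(df >= 0)          # equivalent: keep non-negatives, NaN elsewhere (other defaults to NaN)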