[Python] Filter a pandas df: per group, keep only non-null rows if we have them, else keep a...

Discussion in 'Python' started by Stack, December 1, 2025.

    Stack, Participating Member

    Hopefully the title is reasonably intuitive; edits welcome. Say I have this dataframe:

    import pandas as pd

    df = pd.DataFrame({'x': ['A', 'B', 'B', 'C', 'C', 'C', 'D', 'D'],
                       'y': [None, None, 1, 2, 3, 4, None, None]})

    x y
    0 A NaN
    1 B NaN
    2 B 1.0
    3 C 2.0
    4 C 3.0
    5 C 4.0
    6 D NaN
    7 D NaN


    Per grouping variable, x in this case, I want to keep:

    • only the rows where y is not null, if the group has any non-null values
    • a single row to represent the group, if all of its y values are null

    That is: keep A (its only row, which is null), only the non-null row of B, all of C, and one row for D.
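Both rules hinge on one group-level fact: how many non-null y values each group has. A minimal sketch of checking that, rebuilding the example frame so the snippet runs on its own; it relies on `count()` skipping NaN, so a count of 0 marks an all-null group. The variable names are just illustrative:

```python
import pandas as pd

df = pd.DataFrame({'x': ['A', 'B', 'B', 'C', 'C', 'C', 'D', 'D'],
                   'y': [None, None, 1, 2, 3, 4, None, None]})

# count() ignores NaN, so this is the number of non-null y values per group
non_null_per_group = df.groupby('x')['y'].count()
print(non_null_per_group.to_dict())  # {'A': 0, 'B': 1, 'C': 3, 'D': 0}

# Groups with no usable y at all: these need a single placeholder row
all_null_groups = non_null_per_group[non_null_per_group == 0].index.tolist()
print(all_null_groups)  # ['A', 'D']
```

So B and C fall under the first rule, and A and D under the second.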

    Here is one approach:

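    pd.concat([
        # groups with at least one non-null y: keep only the non-null rows
        df.groupby('x').filter(lambda g: g['y'].notna().any()).dropna(subset=['y']),
        # groups where every y is null: keep a single row per group
        df.groupby('x').filter(lambda g: g['y'].isna().all()).drop_duplicates(subset='x')
    ])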

    x y
    2 B 1.0
    3 C 2.0
    4 C 3.0
    5 C 4.0
    0 A NaN
    6 D NaN


    I could also drop the NAs and then merge with the unique values of x to bring back any groups that are no longer represented:

    df.loc[df['y'].notna()].merge(df[['x']].drop_duplicates(),
                                  on='x', how='outer')

    x y
    0 A NaN
    1 B 1.0
    2 C 2.0
    3 C 3.0
    4 C 4.0
    5 D NaN
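The same "drop NAs, then bring back missing groups" idea can also be done without a merge, which keeps the original index instead of renumbering the rows. A sketch, assuming the example frame above, that uses `isin()` to find the groups that disappeared; `non_null` and `missing` are illustrative names:

```python
import pandas as pd

df = pd.DataFrame({'x': ['A', 'B', 'B', 'C', 'C', 'C', 'D', 'D'],
                   'y': [None, None, 1, 2, 3, 4, None, None]})

# Keep every row that has a real y value
non_null = df.dropna(subset=['y'])

# Groups that vanished entirely after dropping NAs: keep only their first row
missing = df[~df['x'].isin(non_null['x'])].drop_duplicates(subset='x')

# Put both pieces together and restore the original row order
result = pd.concat([non_null, missing]).sort_index()
print(result)
```

The result keeps rows 0, 2, 3, 4, 5 and 6 under their original labels.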


    Is there something more elegant than this? I tried to come up with some kind of all-in-one filter() but struck out.
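One possible single-pass alternative, offered as a sketch rather than a definitive answer, is to skip filter() and build one boolean mask: keep a row if its y is non-null, or if its group has no non-null y at all and it is that group's first row. It uses `transform('any')` to broadcast the per-group flag back to every row and `duplicated()` to pick the first row of each group:

```python
import pandas as pd

df = pd.DataFrame({'x': ['A', 'B', 'B', 'C', 'C', 'C', 'D', 'D'],
                   'y': [None, None, 1, 2, 3, 4, None, None]})

valid = df['y'].notna()
# Broadcast "does this group have any non-null y?" back to every row
group_has_valid = valid.groupby(df['x']).transform('any')
# True only for the first row of each x value
first_in_group = ~df.duplicated(subset='x')

# Non-null rows, plus one placeholder row for each all-null group
result = df[valid | (~group_has_valid & first_in_group)]
print(result)
```

Unlike the concat version, this returns the rows in their original order with their original index (0, 2, 3, 4, 5, 6), and it never materialises intermediate sub-frames.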
