

[Python] Merging Many-to-Many with Pandas DataFrames

Discussion in 'Python' started by Stack, January 27, 2025.


    I have to match Payment Slips with Transaction data. The two data sets are not directly linked: there is no shared ID between them.

    I came up with a Block_ID, a calculated value I can use to match them as closely as possible.

    Unfortunately, Block_ID values may repeat, so a Transaction may match one Payment Slip while other Payment Slips with the same Block_ID remain unmatched.

    This was solved manually by using the following approach (pseudo python code):

    known_slips = set()

    def find_payment_slip(transaction):
        return (
            select(payment_slip_table)
            .where(block_column == transaction.block_id)
            .where(id_column.not_in(known_slips))
            .order_by(block_column, id_column)
            .limit(1)
        )

    def process_transaction():
        transactions = load_transactions()  # Not really important how the data gets loaded
        for transaction in transactions:
            slip = find_payment_slip(transaction)
            if slip:
                known_slips.add(slip.id)
                transaction.slip_id = slip.id
                transaction.update()



    While this approach works, I know it is inefficient.

    I wanted to try using a pandas DataFrame merge to accomplish this, but from what I have read I would have to perform a cross join and then filter.

    That approach also seems wasteful, as I would end up with a Cartesian product of tables that easily run into the thousands of rows.
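    To illustrate the blow-up: when the join key is not unique on either side, pandas' merge produces one row per matching pair, so the intermediate frame grows multiplicatively per key. A minimal sketch with toy data (the column names here are illustrative assumptions, not the poster's real schema):

```python
import pandas as pd

# Toy data; real column names will differ
slips = pd.DataFrame({"slip_id": [10, 11, 12], "block_id": ["A", "A", "B"]})
txns = pd.DataFrame({"txn_id": [1, 2, 3], "block_id": ["A", "A", "B"]})

# A plain merge on a non-unique key is effectively a per-key cross join:
# 2 transactions x 2 slips in block "A", plus 1 x 1 in block "B" -> 5 rows
pairs = txns.merge(slips, on="block_id")
```

    With real tables in the thousands of rows per repeated Block_ID, this intermediate result is exactly the waste being described.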

    I realize there may be other approaches using pandas, but I am not familiar enough with the library.

    Is there a way to solve this problem with pandas?
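    One way to avoid the Cartesian product entirely, assuming the same ordering as the manual query (block, then id), is to number the duplicates within each Block_ID on both sides with groupby(...).cumcount() and then merge on the pair (Block_ID, rank). Each slip is paired with at most one transaction, which reproduces the greedy one-slip-per-transaction behavior. A sketch with hypothetical column names:

```python
import pandas as pd

slips = pd.DataFrame({"slip_id": [10, 11, 12], "block_id": ["A", "A", "B"]})
txns = pd.DataFrame({"txn_id": [1, 2, 3], "block_id": ["A", "A", "B"]})

# Sort to make the pairing deterministic (mirrors ORDER BY block, id)
slips = slips.sort_values(["block_id", "slip_id"])
txns = txns.sort_values(["block_id", "txn_id"])

# Number the duplicates within each block: 0, 1, 2, ... on both sides
slips["rank"] = slips.groupby("block_id").cumcount()
txns["rank"] = txns.groupby("block_id").cumcount()

# Merging on (block_id, rank) pairs each slip with at most one transaction
matched = txns.merge(slips, on=["block_id", "rank"], how="left").drop(columns="rank")
```

    The left merge keeps every transaction; transactions whose block has more transactions than slips simply get a missing slip_id, matching the "remain unmatched" case in the original approach.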

