I have to match payment slips with transaction data. The two data sets are not directly linked; there is no shared ID between them. I came up with Block_ID, a calculated value I can use to match them as closely as possible. Unfortunately, Block_ID values can repeat, so several transactions may match the same payment slip while the other payment slips with the same Block_ID remain unmatched.

I solved this manually with the following approach (pseudo Python code):

```python
known_slips = set()  # IDs of slips already claimed by an earlier transaction

def find_payment_slip(transaction):
    # Take the first unclaimed slip whose Block_ID matches the transaction.
    return (
        select(payment_slip_table)
        .where(block_column == transaction.block_id)
        .where(id_column.not_in(known_slips))
        .order_by(block_column, id_column)
        .limit(1)
    )

def process_transactions():
    transactions = load_transactions()  # not really important how the data gets loaded
    for transaction in transactions:
        slip = find_payment_slip(transaction)
        if slip:
            known_slips.add(slip.id)
            transaction.slip_id = slip.id
            transaction.update()
```

While this approach works, I know it is inefficient: it issues one query per transaction. I wanted to try a pandas DataFrame merge instead, but from what I have read I would need to perform a cross-apply and then filter. That also seems wasteful, since I would end up with a cartesian product of tables that easily run into the thousands of rows. I realize there may be other approaches in pandas, but I am not familiar enough with the library. Is there a way to solve this problem with pandas?
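For concreteness, here is roughly what I mean by the cross-apply-and-filter approach. This is an untested sketch with made-up column names (slip_id, tx_id, block_id); the filter ranks rows within each block on both sides so that each slip is consumed at most once:

```python
import pandas as pd

# Toy frames standing in for the two tables; the columns are invented.
slips = pd.DataFrame({"slip_id": [10, 11, 12], "block_id": ["A", "A", "B"]})
txs = pd.DataFrame({"tx_id": [1, 2, 3], "block_id": ["A", "A", "B"]})

# Cross-apply step: every transaction is paired with every slip sharing
# its Block_ID -- this is the cartesian blow-up I am worried about.
pairs = txs.merge(slips, on="block_id", how="inner")

# Filter step: rank transactions and slips within each block and keep
# only the pairs whose ranks agree, so the n-th transaction of a block
# takes the n-th slip and no slip is matched twice.
pairs["tx_rank"] = pairs.groupby("block_id")["tx_id"].rank(method="dense")
pairs["slip_rank"] = pairs.groupby("block_id")["slip_id"].rank(method="dense")
matched = pairs[pairs["tx_rank"] == pairs["slip_rank"]]
print(matched[["tx_id", "slip_id", "block_id"]])
```

Even in this toy example the intermediate pairs frame is larger than either input, which is exactly my concern at real table sizes.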