
[Python] Polars lazyframe update() silently failing in a serverless Cloud Function (OOM error)

Discussion in 'Python' started by Stack, December 1, 2025.

  1. Stack

    Stack, Participating Member

    I am trying to apply changes from one dataframe (source file: a 7 MB CSV) to a much larger dataframe (source file: an approx. 3 GB CSV), i.e. update existing rows with matching IDs while also adding the rows whose ID does not yet exist in the larger dataframe. I believe the correct way to do this is the Polars update() method with the how strategy set to "full".

    Unfortunately, this works fine when testing on my local machine, but it silently fails in a Cloud Function environment, even with the container configured with 8 GB of RAM.

    I am using scan_csv() with infer_schema=False to get LazyFrames (all columns as strings) of the two datasets before calling update(). I also tried logging intermediate results with describe(): the stats for each source dataset are logged just fine, but execution never gets past update() to log describe() of the resulting dataframe:

    import logging

    import polars as pl

    logger = logging.getLogger(__name__)

    # large_file_path and small_file_path are defined elsewhere
    large_df = pl.scan_csv(large_file_path, infer_schema=False)
    small_df = pl.scan_csv(small_file_path, infer_schema=False)

    logger.info(f'LARGE: {large_df.describe()}')  # Logs are visible for this
    logger.info(f'SMALL: {small_df.describe()}')  # Logs are visible for this
    merged_df = large_df.update(small_df, how='full', on='id')  # Results in OOM in the Cloud Function log
    logger.info(f'MERGED: {merged_df.describe()}')  # Never reaches this line


    Am I doing anything wrong or inefficient here?
