Assign the nearest research stage to each record in a dataset.
Args:
    dataset (pd.DataFrame): The records to assign to research stages.
    population (pd.DataFrame): Population data with the columns participant_id, cohort, research_stage, and research_stage_date.
    max_days (int, optional): The maximum number of days allowed between a record's collection date and the research stage date. Defaults to 60.
    stages (List[str], optional): The research stage types to consider. Defaults to ['visit'].
    agg (Union[str, None], optional): The aggregation function applied when multiple rows are assigned to the same research stage. The rows are already sorted by distance from the research stage date, so 'first' keeps the closest row and 'last' the farthest. One of 'first', 'last', 'mean', 'min', 'max', or None to skip aggregation. Defaults to 'first'.
Returns:
    pd.DataFrame: The dataset with the nearest research stage assigned to each record.
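A minimal pandas sketch of the matching described above, under stated assumptions; it is not the library's actual implementation. The function name `assign_nearest_research_stage` and the `collection_date` column name are hypothetical, and a stage type such as 'visit' is assumed to appear as a substring of `research_stage` (for example "00_00_visit"). The sketch merges records with the population on `participant_id`, drops candidate stages farther than `max_days`, keeps each record's closest stage, and then applies `agg` to records that share a stage.

```python
import re
from typing import List, Optional

import pandas as pd

_AGGS = {"first", "last", "mean", "min", "max"}


def assign_nearest_research_stage(
    dataset: pd.DataFrame,
    population: pd.DataFrame,
    max_days: int = 60,
    stages: Optional[List[str]] = None,
    agg: Optional[str] = "first",
    date_col: str = "collection_date",  # hypothetical name of the collection-date column
) -> pd.DataFrame:
    if agg is not None and agg not in _AGGS:
        raise ValueError(f"unsupported agg: {agg!r}")
    stages = ["visit"] if stages is None else stages

    # Assumption: a stage type is a substring of research_stage (e.g. "00_00_visit").
    pattern = "|".join(re.escape(s) for s in stages)
    pop = population[population["research_stage"].str.contains(pattern, regex=True, na=False)]

    # Number the records so each one can be matched to its own nearest stage.
    data = dataset.reset_index(drop=True).rename_axis("_row").reset_index()
    merged = data.merge(
        pop[["participant_id", "cohort", "research_stage", "research_stage_date"]],
        on="participant_id",
    )

    # Absolute distance in whole days; drop candidates beyond max_days.
    merged["days_from_stage"] = (merged[date_col] - merged["research_stage_date"]).abs().dt.days
    merged = merged[merged["days_from_stage"] <= max_days]

    # Keep only the closest stage for each record (stable sort keeps ties in merge order).
    nearest = (
        merged.sort_values(["_row", "days_from_stage"], kind="stable")
        .drop_duplicates("_row", keep="first")
        .drop(columns="_row")
    )
    if agg is None:
        return nearest.reset_index(drop=True)

    # Several records may share one stage: order them by distance, then aggregate.
    grouped = nearest.sort_values("days_from_stage", kind="stable").groupby(
        ["participant_id", "cohort", "research_stage"], as_index=False, dropna=False
    )
    if agg == "mean":
        return grouped.mean(numeric_only=True)
    # first/last return the first/last non-null value of each column within a group.
    return getattr(grouped, agg)()
```

Sorting by `days_from_stage` before grouping is what makes 'first' mean closest and 'last' mean farthest. Note that pandas `first`/`last` pick the first or last non-null value per column, so with missing values a single output row can combine fields from different records.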