
This assumes your data is already sorted by topic, student, and then level. If it is not, sort it first.
    import numpy as np
    import pandas as pd

    # Generate the reply_count for each valid combination by comparing the current row and the row above.
    # Note: this relies on df having a default 0..n-1 integer index, so that x.name - 1 is the previous row.
    count_list = df.apply(
        lambda x: [df.loc[x.name - 1].student if x.name > 0 else np.nan, x.student, x.level > 1],
        axis=1,
    ).tolist()

    # Create a count DataFrame using the count_list data.
    df_count = pd.DataFrame(columns=['st_source', 'st_dest', 'reply_count'], data=count_list)

    # Aggregate and sum all counts belonging to a source-dest pair, then remove rows with the same source and dest.
    df_count = df_count.groupby(['st_source', 'st_dest']).sum().astype(int).reset_index()[lambda x: x.st_source != x.st_dest]
    print(df_count)

    Out[218]:
      st_source st_dest  reply_count
    1         a       b            4
    2         b       a            2
    3         b       c            1
    4         c       a            1
    5         c       b            1
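The same row-versus-previous-row comparison can also be written without a row-wise `apply`, using `shift`. The following is only a sketch, not the original answer's method: the sample `df` is made up for illustration (the real input data is not shown here), and it assumes the columns are named `topic`, `student`, and `level` and that the rows are already sorted as described above.

```python
import pandas as pd

# Hypothetical sample data, already sorted; not the data from the question.
df = pd.DataFrame({
    "topic":   [1, 1, 1, 2, 2],
    "student": ["a", "b", "a", "c", "b"],
    "level":   [1, 2, 3, 1, 2],
})

# Pair each row's student with the student on the row above.
# A row counts as a reply only when its level is greater than 1.
pairs = pd.DataFrame({
    "st_source": df["student"].shift(1),
    "st_dest": df["student"],
    "reply_count": (df["level"] > 1).astype(int),
})

# Drop the first row (it has no row above), then sum the counts per source-dest pair.
df_count = (
    pairs.dropna(subset=["st_source"])
         .groupby(["st_source", "st_dest"], as_index=False)["reply_count"]
         .sum()
)

# Remove pairs where source and dest are the same student.
df_count = df_count[df_count["st_source"] != df_count["st_dest"]].reset_index(drop=True)
print(df_count)
```

As in the `apply` version, a row that starts a new topic (level 1) still forms a pair with the row above it, but it contributes 0 to that pair's `reply_count`.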