Pandas: How to Modify Other Columns Based on the Values in One Column (Three Methods)

Z.ink · Published 2020-03-10 11:38:43

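Let's walk through a simple example with the data below:

For rows where star_rating is between 3 and 4 (inclusive), subtract 0.25 from positive.

For rows where star_rating is less than 3, subtract 0.3 from positive.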

	star_rating	positive
0	5	0.98072
1	1	0.737101
2	5	0.945672
3	2	0.729632
4	5	0.99853
5	3	0.408589
6	1	0.650988
7	1	0.666691
8	5	0.899953
9	4	0.895248
10	4	0.609864
11	3	0.614354
12	4	0.892443
13	3	0.648455
14	4	0.880974
15	5	0.998756
16	3	0.046396
17	4	0.882441
18	1	0.509702
19	5	0.959157
20	1	0.640282
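
To try the methods that follow, the table can be rebuilt as a DataFrame. This is only a minimal sketch: the name `dff` is taken from the snippets later in the post, the values are copied from the table above, and how the author actually loaded the data is not stated.

```python
import pandas as pd

# Values copied from the table above; `dff` is the name the later snippets use.
dff = pd.DataFrame({
    "star_rating": [5, 1, 5, 2, 5, 3, 1, 1, 5, 4, 4, 3, 4, 3, 4, 5, 3, 4, 1, 5, 1],
    "positive": [
        0.98072, 0.737101, 0.945672, 0.729632, 0.99853, 0.408589, 0.650988,
        0.666691, 0.899953, 0.895248, 0.609864, 0.614354, 0.892443, 0.648455,
        0.880974, 0.998756, 0.046396, 0.882441, 0.509702, 0.959157, 0.640282,
    ],
})

print(dff.shape)  # 21 rows, 2 columns
```

Each method changes `positive`, so it may be worth starting from `dff.copy()` when comparing them side by side.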

Here are several ways to do it, some more cumbersome and some simpler:

Method 1 (boolean slicing with loc):

mid = dff["star_rating"].between(3, 4)  # 3 <= star_rating <= 4 (inclusive)
low = dff["star_rating"] < 3
dff.loc[mid, "positive"] -= 0.25  # plain subtraction; results may go negative
dff.loc[low, "positive"] -= 0.3

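Method 2 (apply + lambda):

def to_cal(x, y):
    # x is the row's star_rating, y is its positive value
    if 3 <= x <= 4:
        y -= 0.25
    elif x < 3:
        y -= 0.3
    return y

# apply returns a new Series, so assign it back to update the column
dff["positive"] = dff.apply(lambda row: to_cal(row["star_rating"], row["positive"]), axis=1)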

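Method 3 (slicing + apply):

def fun(a):
    a = a.copy()  # work on a copy of the row rather than mutating the input
    if a["star_rating"] < 3:
        a.loc["positive"] = a.loc["positive"] - 0.3
    elif 3 <= a["star_rating"] <= 4:
        a.loc["positive"] = a.loc["positive"] - 0.25
    return a

# fun returns the whole row, so apply rebuilds a full DataFrame
data = dff.loc[:].apply(fun, axis=1)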
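
A quick way to confirm that a mask-based update and a row-wise apply implement the same rule is to run both on a small sample and compare the results. The sketch below is illustrative only: the four-row `sample` frame and the helper name `adjust` are hypothetical, chosen to hit each branch of the rule (below 3, exactly 3, exactly 4, above 4).

```python
import pandas as pd

# Hypothetical sample covering every branch of the rule.
sample = pd.DataFrame({"star_rating": [1, 3, 4, 5],
                       "positive": [0.7, 0.5, 0.9, 0.95]})

def adjust(rating, value):
    # Rule from the post: -0.25 for ratings 3..4, -0.3 for ratings below 3.
    if 3 <= rating <= 4:
        return value - 0.25
    if rating < 3:
        return value - 0.3
    return value

# Approach A: boolean masks with .loc
by_mask = sample.copy()
by_mask.loc[by_mask["star_rating"].between(3, 4), "positive"] -= 0.25
by_mask.loc[by_mask["star_rating"] < 3, "positive"] -= 0.3

# Approach B: row-wise apply, assigned back to the column
by_apply = sample.copy()
by_apply["positive"] = by_apply.apply(
    lambda row: adjust(row["star_rating"], row["positive"]), axis=1
)

# Raises AssertionError if the two approaches disagree beyond float tolerance.
pd.testing.assert_series_equal(by_mask["positive"], by_apply["positive"])
print(by_mask)
```

On large frames the mask-based form is usually preferable, since it avoids calling a Python function once per row.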