I am currently trying to count repeated values in a column of a CSV file and return the value to another CSV column in a python.
For example, my CSV file :
KeyID GeneralID
145258 KL456
145259 BG486
145260 HJ789
145261 KL456
What I want to achieve is to count how many data have the same GeneralID
and insert it into a new CSV column. For example,
KeyID Total_GeneralID
145258 2
145259 1
145260 1
145261 2
I have tried to split each column using the split method but it didn't work so well.
My code :
case_id_list_data = []
with open[file_path_1, "rU"] as g:
for line in g:
case_id_list_data.append[line.split['\t']]
#print case_id_list_data[0][0] #the result is dissatisfying
#I'm stuck here..
Stephen Rauch♦
45.7k30 gold badges105 silver badges126 bronze badges
asked Apr 25, 2017 at 5:21
And if you are adverse to pandas and want to stay with the standard library:
Code:
import csv
from collections import Counter
with open['file1', 'rU'] as f:
reader = csv.reader[f, delimiter='\t']
header = next[reader]
lines = [line for line in reader]
counts = Counter[[l[1] for l in lines]]
new_lines = [l + [str[counts[l[1]]]] for l in lines]
with open['file2', 'wb'] as f:
writer = csv.writer[f, delimiter='\t']
writer.writerow[header + ['Total_GeneralID']]
writer.writerows[new_lines]
Results:
KeyID GeneralID Total_GeneralID
145258 KL456 2
145259 BG486 1
145260 HJ789 1
145261 KL456 2
answered Apr 25, 2017 at 5:42
Stephen Rauch♦Stephen Rauch
45.7k30 gold badges105 silver badges126 bronze badges
2
You have to divide the task in three steps: 1. Read CSV file 2. Generate new column's value 3. Add value to the file back import csv import fileinput import sys
# 1. Read CSV file
# This is opening CSV and reading value from it.
with open["dev.csv"] as filein:
reader = csv.reader[filein, skipinitialspace = True]
xs, ys = zip[*reader]
result=["Total_GeneralID"]
# 2. Generate new column's value
# This loop is for counting the "GeneralID" element.
for i in range[1,len[ys],1]:
result.append[ys.count[ys[i]]]
# 3. Add value to the file back
# This loop is for writing new column
for ind,line in enumerate[fileinput.input["dev.csv",inplace=True]]:
sys.stdout.write["{} {}, {}\n".format["",line.rstrip[],result[ind]]]
I haven't use temp file or any high level module like panda or anything.
answered Apr 25, 2017 at 6:39
Raj DamaniRaj Damani
7756 silver badges19 bronze badges
2
import pandas as pd
#read your csv to a dataframe
df = pd.read_csv['file_path_1']
#generate the Total_GeneralID by counting the values in the GeneralID column and extract the occurrance for the current row.
df['Total_GeneralID'] = df.GeneralID.apply[lambda x: df.GeneralID.value_counts[][x]]
df = df[['KeyID','Total_GeneralID']]
Out[442]:
KeyID Total_GeneralID
0 145258 2
1 145259 1
2 145260 1
3 145261 2
answered Apr 25, 2017 at 5:29
Allen QinAllen Qin
18.8k6 gold badges48 silver badges62 bronze badges
You can use
pandas
library:
- first
read_csv
- get counts of values in column
GeneralID
byvalue_counts
,rename
by output column join
to originalDataFrame
import pandas as pd
df = pd.read_csv['file']
s = df['GeneralID'].value_counts[].rename['Total_GeneralID']
df = df.join[s, on='GeneralID']
print [df]
KeyID GeneralID Total_GeneralID
0 145258 KL456 2
1 145259 BG486 1
2 145260 HJ789 1
3 145261 KL456 2
answered Apr 25, 2017 at 5:37
jezraeljezrael
756k83 gold badges1203 silver badges1139 bronze badges
Use csv.reader instead of split[] method. Its easier.
Thanks
answered Apr 25, 2017 at 6:48
Vikram SharmaVikram Sharma
1992 gold badges2 silver badges13 bronze badges