First of all, as people say, the CSV format looks simple, but it can be quite nontrivial, especially once strings come into play. monkut already gave you two solutions: the cleaned-up version of your code, and one more that uses the csv library. I'll give yet another option: no libraries, but plenty of idiomatic code to chew on, which gives you averages for all columns at once.
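To see why strings complicate things, here is a small sketch comparing a naive str.split with the standard-library csv module. The row is made up for illustration; its first field is quoted because it contains a comma.

```python
import csv
import io

line = '"Smith, Jr.",42\n'  # hypothetical row: a quoted field that contains a comma

# Splitting on commas ignores the quotes and breaks the first field in two.
naive = line.rstrip("\n").split(",")

# csv.reader understands quoting, so the comma inside the quotes is kept.
parsed = next(csv.reader(io.StringIO(line)))

print(naive)   # ['"Smith', ' Jr."', '42']
print(parsed)  # ['Smith, Jr.', '42']
```

The no-library approach below is fine as long as every field is a plain number.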
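def get_averages(path):
    # Assumes every line holds only comma-separated numbers (no header row).
    with open(path) as file:
        lines = file.readlines()  # the with-block closes the file afterwards
    # Parse each line into a list of floats.
    rows_of_numbers = [list(map(float, line.split(','))) for line in lines]
    # zip(*rows) transposes rows into columns; sum each column.
    sums = map(sum, zip(*rows_of_numbers))
    # Divide each column sum by the number of rows.
    averages = [sum_item / len(lines) for sum_item in sums]
    return averages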
Things to note:
- In your code, f is a file object. You try to close it after you have already returned the value, so that line will never be reached: nothing executes after a return has been processed, unless you have a try...finally construct or a with construct (like the one I am using, which closes the file automatically).
- map(f, l), or the equivalent [f(x) for x in l], applies function f to each element of l. In Python 3, map returns a lazy iterator rather than a list, so wrap it in list() when you need an actual list.
- f(*l) "unpacks" the list l before the function is called, passing each element to f as a separate argument. That is why zip(*rows_of_numbers) pairs the first values of every row, then the second values, and so on: it turns rows into columns.
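A quick sketch of these three points on hypothetical numbers (the rows, the temporary file, and the first_line helper are illustrative only, not part of the solution above):

```python
import os
import tempfile

rows = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]  # hypothetical data: two rows, three columns

# map() is lazy in Python 3, so list() is needed to materialize it;
# the list comprehension builds the same list directly.
row_sums_map = list(map(sum, rows))
row_sums_comp = [sum(row) for row in rows]
print(row_sums_map, row_sums_comp)  # [6.0, 15.0] [6.0, 15.0]

# zip(*rows) is the same call as zip(rows[0], rows[1]), so rows become columns.
columns = list(zip(*rows))
print(columns)  # [(1.0, 4.0), (2.0, 5.0), (3.0, 6.0)]


def first_line(path):
    # Returning from inside the with-block still closes the file on the way out.
    with open(path) as f:
        return f.readline(), f  # f is returned only so we can inspect it afterwards


# Write a tiny temporary file to try it on.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write("1,2,3\n")
    path = tmp.name

line, handle = first_line(path)
print(repr(line), handle.closed)  # '1,2,3\n' True
os.remove(path)
```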
Pandas is a powerful Python package that can be used to perform statistical analysis. In this guide, you’ll see how to use Pandas to calculate stats from an imported CSV file.
The Example
To demonstrate how to calculate stats from an imported CSV file, let’s review a simple example with the following dataset:
Name | Salary | Country
Dan | 40000 | USA
Elizabeth | 32000 | Brazil
Jon | 45000 | Italy
Maria | 54000 | USA
Mark | 72000 | USA
Bill | 62000 | Brazil
Jess | 92000 | Italy
Julia | 55000 | USA
Jeff | 35000 | Italy
Ben | 48000 | Brazil
Step 1: Copy the Dataset into a CSV file
To begin, you’ll need to copy the above dataset into a CSV file. Save it under the name stats, so the full file name is stats.csv.
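If you’d rather generate the file than copy it by hand, here is a minimal sketch using Python’s built-in csv module. It assumes that writing stats.csv into the current working directory is acceptable; change the file name to another path if needed.

```python
import csv

# The same rows as the table above, header first.
rows = [
    ("Name", "Salary", "Country"),
    ("Dan", 40000, "USA"),
    ("Elizabeth", 32000, "Brazil"),
    ("Jon", 45000, "Italy"),
    ("Maria", 54000, "USA"),
    ("Mark", 72000, "USA"),
    ("Bill", 62000, "Brazil"),
    ("Jess", 92000, "Italy"),
    ("Julia", 55000, "USA"),
    ("Jeff", 35000, "Italy"),
    ("Ben", 48000, "Brazil"),
]

# newline="" lets the csv module control line endings, as its docs recommend.
with open("stats.csv", "w", newline="") as f:
    csv.writer(f).writerows(rows)
```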
Step 2: Import the CSV File into Python
Next, you’ll need to import the CSV file into Python using this template:
import pandas as pd
df = pd.read_csv(r'Path where the CSV file is stored\File name.csv')
print(df)
Here is an example of a path where the CSV file is stored:
C:\Users\Ron\Desktop\stats.csv
So the complete code to import the stats CSV file is captured below (note that you’ll need to modify the path to reflect the location where the CSV file is stored on your computer):
import pandas as pd
df = pd.read_csv(r'C:\Users\Ron\Desktop\stats.csv')
print(df)
Once you run the code in Python (adjusted to your path), you’ll get the following DataFrame:
Name Salary Country
0 Dan 40000 USA
1 Elizabeth 32000 Brazil
2 Jon 45000 Italy
3 Maria 54000 USA
4 Mark 72000 USA
5 Bill 62000 Brazil
6 Jess 92000 Italy
7 Julia 55000 USA
8 Jeff 35000 Italy
9 Ben 48000 Brazil
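For a quick experiment without a file on disk, pd.read_csv also accepts a file-like object. The sketch below embeds the same dataset in a string and wraps it in io.StringIO; it should produce the same DataFrame as the file-based version.

```python
import io

import pandas as pd

# The example dataset as CSV text, so no path is needed.
csv_text = """Name,Salary,Country
Dan,40000,USA
Elizabeth,32000,Brazil
Jon,45000,Italy
Maria,54000,USA
Mark,72000,USA
Bill,62000,Brazil
Jess,92000,Italy
Julia,55000,USA
Jeff,35000,Italy
Ben,48000,Brazil
"""

df = pd.read_csv(io.StringIO(csv_text))
print(df)
print(df.shape)  # (10, 3)
```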
Step 3: Use Pandas to Calculate Stats from an Imported CSV File
For the final step, the goal is to calculate the following statistics using the Pandas package:
- Mean salary
- Total sum of salaries
- Maximum salary
- Minimum salary
- Count of salaries
- Median salary
- Standard deviation of salaries
- Variance of salaries
In addition, we’ll do some grouping calculations:
- Sum of salaries, grouped by the Country column
- Count of salaries, grouped by the Country column
Once you’re ready, run the code below to calculate the stats from the imported CSV file using Pandas. As indicated earlier, you’ll need to change the path in the pd.read_csv line to reflect the location where the CSV file is stored on your computer.
import pandas as pd
df = pd.read_csv(r'C:\Users\Ron\Desktop\stats.csv')

# block 1 - simple stats
mean1 = df['Salary'].mean()
sum1 = df['Salary'].sum()
max1 = df['Salary'].max()
min1 = df['Salary'].min()
count1 = df['Salary'].count()
median1 = df['Salary'].median()
std1 = df['Salary'].std()
var1 = df['Salary'].var()

# block 2 - group by
# numeric_only=True sums only the numeric column (Salary); pandas 2.0+ no longer drops the text column Name automatically
groupby_sum1 = df.groupby(['Country']).sum(numeric_only=True)
groupby_count1 = df.groupby(['Country']).count()

# print block 1
print('Mean salary: ' + str(mean1))
print('Sum of salaries: ' + str(sum1))
print('Max salary: ' + str(max1))
print('Min salary: ' + str(min1))
print('Count of salaries: ' + str(count1))
print('Median salary: ' + str(median1))
print('Std of salaries: ' + str(std1))
print('Var of salaries: ' + str(var1))

# print block 2
print('Sum of values, grouped by the Country: ' + str(groupby_sum1))
print('Count of values, grouped by the Country: ' + str(groupby_count1))
After you run the code in Python, you’ll get the following results:
Mean salary: 53500.0
Sum of salaries: 535000
Max salary: 92000
Min salary: 32000
Count of salaries: 10
Median salary: 51000.0
Std of salaries: 18222.391598128816
Var of salaries: 332055555.5555556
Sum of values, grouped by the Country:          Salary
Country
Brazil   142000
Italy    172000
USA      221000
Count of values, grouped by the Country:          Name  Salary
Country
Brazil      3       3
Italy       3       3
USA         4       4
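As an alternative to computing block 1 one statistic at a time, a Series can compute several reductions in a single Series.agg call. The sketch below rebuilds the Salary column inline (so it runs without the CSV file) and also checks that pandas’ std uses the sample definition (dividing by n - 1) by default, which is why it matches Python’s statistics.stdev.

```python
import math
import statistics

import pandas as pd

# The Salary column from the example dataset, kept as a plain list too.
values = [40000, 32000, 45000, 54000, 72000, 62000, 92000, 55000, 35000, 48000]
salaries = pd.Series(values, name="Salary")

# One call instead of eight; each name is a standard pandas reduction.
summary = salaries.agg(["mean", "sum", "max", "min", "count", "median", "std", "var"])
print(summary)

# pandas' std() defaults to ddof=1 (sample standard deviation), like statistics.stdev.
print(math.isclose(salaries.std(), statistics.stdev(values)))  # True
```

For the population versions instead, pass ddof=0, e.g. salaries.std(ddof=0).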
You just saw how to calculate simple stats using Pandas. You may also want to check the Pandas documentation to learn more about the power of this great library!