I have a list of dictionaries which I need to aggregate in Python:
data = [{"startDate": 123, "endDate": 456, "campaignName": "abc", "campaignCfid": 789, "budgetImpressions": 10},
{"startDate": 123, "endDate": 456, "campaignName": "abc", "campaignCfid": 789, "budgetImpressions": 50},
{"startDate": 456, "endDate": 789, "campaignName": "def", "campaignCfid": 123, "budgetImpressions": 80}]
and I'm looking to sum budgetImpressions across entries that share the same campaignName.
So the final result should be:
data = [{"startDate": 123, "endDate": 456, "campaignName": "abc", "campaignCfid": 789, "budgetImpressions": 60},
{"startDate": 456, "endDate": 789, "campaignName": "def", "campaignCfid": 123, "budgetImpressions": 80}]
Note every entry with a certain campaignName will always have the same corresponding campaignCfid, startDate and endDate.
Can this be done in Python? I've tried using itertools without much success. Would it be a better approach to use Pandas?
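One way to do this with only the standard library is sketched below. It relies on the assumption stated above that campaignName determines the other fields, so it keeps one merged dictionary per campaignName and adds budgetImpressions into it. The helper name sum_impressions is made up for this illustration.

```python
def sum_impressions(rows):
    """Sum budgetImpressions per campaignName (hypothetical helper)."""
    totals = {}  # campaignName -> merged row; dicts keep insertion order (Python 3.7+)
    for row in rows:
        name = row["campaignName"]
        if name not in totals:
            # Copy the row so the input dictionaries are not modified.
            totals[name] = dict(row)
        else:
            totals[name]["budgetImpressions"] += row["budgetImpressions"]
    return list(totals.values())


data = [
    {"startDate": 123, "endDate": 456, "campaignName": "abc", "campaignCfid": 789, "budgetImpressions": 10},
    {"startDate": 123, "endDate": 456, "campaignName": "abc", "campaignCfid": 789, "budgetImpressions": 50},
    {"startDate": 456, "endDate": 789, "campaignName": "def", "campaignCfid": 123, "budgetImpressions": 80},
]

result = sum_impressions(data)
print(result)
```

This makes a single pass and does not require the input to be sorted, which is why it may be simpler than itertools for this particular task.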
The itertools module in Python provides efficient tools for looping over lists, tuples and dictionaries. This tutorial applies the itertools.groupby function to group a list of dictionaries by a particular key.
To illustrate how this works, we will look at a list of students' info (in dictionaries) and try to group these by the “class” key such…
Grouping a list of dictionaries by a particular key in Python can be done with the itertools.groupby() function.
itertools.groupby()
This function computes a key for each element of the iterable. It yields each key together with an iterator over the items grouped under that key.
Syntax: itertools.groupby(iterable, key_func)
Parameters:
- iterable: Iterable can be of any kind (list, tuple, dictionary).
- key_func: A function that calculates keys for each element present in iterable.
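Return type: It returns consecutive keys and groups from the iterable. If the key function is not specified or is None, key defaults to an identity function and returns the element unchanged. Because only consecutive elements with the same key are grouped together, the iterable usually needs to be sorted by the same key function first.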
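Since groupby only groups consecutive elements, the order of the input matters. The short sketch below uses a made-up list of letters to show the difference between grouping unsorted and sorted input.

```python
from itertools import groupby

# Illustrative data, not taken from the tutorial.
letters = ["a", "a", "b", "a"]

# Unsorted: the trailing "a" starts a second, separate "a" group.
unsorted_groups = [(key, list(group)) for key, group in groupby(letters)]
print(unsorted_groups)
# [('a', ['a', 'a']), ('b', ['b']), ('a', ['a'])]

# Sorted first: every key appears exactly once.
sorted_groups = [(key, list(group)) for key, group in groupby(sorted(letters))]
print(sorted_groups)
# [('a', ['a', 'a', 'a']), ('b', ['b'])]
```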
Let’s look at some examples.
Example 1: Suppose we have a list of dictionaries, each holding an employee and a company.
INFO = [ {'employee': 'XYZ_1', 'company': 'ABC_1'}, {'employee': 'XYZ_2', 'company': 'ABC_2'}, {'employee': 'XYZ_3', 'company': 'ABC_3'}, {'employee': 'XYZ_4', 'company': 'ABC_3'}, {'employee': 'XYZ_5', 'company': 'ABC_2'}, {'employee': 'XYZ_6', 'company': 'ABC_3'}, {'employee': 'XYZ_7', 'company': 'ABC_1'}, {'employee': 'XYZ_8', 'company': 'ABC_2'}, {'employee': 'XYZ_9', 'company': 'ABC_1'} ]
Now we need to display all the data grouped by the ‘company’ key.
Code:
from itertools import groupby

INFO = [
    {'employee': 'XYZ_1', 'company': 'ABC_1'},
    {'employee': 'XYZ_2', 'company': 'ABC_2'},
    {'employee': 'XYZ_3', 'company': 'ABC_3'},
    {'employee': 'XYZ_4', 'company': 'ABC_3'},
    {'employee': 'XYZ_5', 'company': 'ABC_2'},
    {'employee': 'XYZ_6', 'company': 'ABC_3'},
    {'employee': 'XYZ_7', 'company': 'ABC_1'},
    {'employee': 'XYZ_8', 'company': 'ABC_2'},
    {'employee': 'XYZ_9', 'company': 'ABC_1'},
]


# Key function: the value each dictionary is grouped by.
def key_func(k):
    return k['company']


# groupby only groups consecutive items, so sort by the same key first.
INFO = sorted(INFO, key=key_func)

for key, value in groupby(INFO, key_func):
    print(key)
    print(list(value))
Output:
ABC_1
[{'employee': 'XYZ_1', 'company': 'ABC_1'}, {'employee': 'XYZ_7', 'company': 'ABC_1'}, {'employee': 'XYZ_9', 'company': 'ABC_1'}]
ABC_2
[{'employee': 'XYZ_2', 'company': 'ABC_2'}, {'employee': 'XYZ_5', 'company': 'ABC_2'}, {'employee': 'XYZ_8', 'company': 'ABC_2'}]
ABC_3
[{'employee': 'XYZ_3', 'company': 'ABC_3'}, {'employee': 'XYZ_4', 'company': 'ABC_3'}, {'employee': 'XYZ_6', 'company': 'ABC_3'}]
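If the groups are needed later rather than just printed, they can be collected into a dictionary. The sketch below is one possible way to do that, using a shortened copy of the INFO list; the names by_company and employees_by_company are illustrative only.

```python
from itertools import groupby
from operator import itemgetter

# Shortened copy of the INFO list, for illustration.
INFO = [
    {'employee': 'XYZ_1', 'company': 'ABC_1'},
    {'employee': 'XYZ_2', 'company': 'ABC_2'},
    {'employee': 'XYZ_3', 'company': 'ABC_3'},
    {'employee': 'XYZ_4', 'company': 'ABC_3'},
]

by_company = itemgetter('company')

# Map each company to the list of its employee names.
employees_by_company = {
    company: [row['employee'] for row in rows]
    for company, rows in groupby(sorted(INFO, key=by_company), key=by_company)
}
print(employees_by_company)
# {'ABC_1': ['XYZ_1'], 'ABC_2': ['XYZ_2'], 'ABC_3': ['XYZ_3', 'XYZ_4']}
```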
Example 2: Suppose we have a list of dictionaries holding student marks and grades.
students = [ {'mark': '65','grade': 'C'}, {'mark': '86','grade': 'A'}, {'mark': '73','grade': 'B'}, {'mark': '49','grade': 'D'}, {'mark': '91','grade': 'A'}, {'mark': '79','grade': 'B'} ]
Now we need to display all the data grouped by the ‘grade’ key.
Code:
from itertools import groupby
from operator import itemgetter

students = [
    {'mark': '65', 'grade': 'C'},
    {'mark': '86', 'grade': 'A'},
    {'mark': '73', 'grade': 'B'},
    {'mark': '49', 'grade': 'D'},
    {'mark': '91', 'grade': 'A'},
    {'mark': '79', 'grade': 'B'},
]

# Sort by grade so that equal grades sit next to each other.
students = sorted(students, key=itemgetter('grade'))

# itemgetter('grade') plays the role of the key function.
for key, value in groupby(students, key=itemgetter('grade')):
    print(key)
    for k in value:
        print(k)
Output:
A
{'mark': '86', 'grade': 'A'}
{'mark': '91', 'grade': 'A'}
B
{'mark': '73', 'grade': 'B'}
{'mark': '79', 'grade': 'B'}
C
{'mark': '65', 'grade': 'C'}
D
{'mark': '49', 'grade': 'D'}
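The same sort-then-group pattern can be applied to the campaign data from the question. The following is a minimal sketch, assuming (as the question states) that every entry with a given campaignName shares the same startDate, endDate and campaignCfid, so those fields can be copied from the first row of each group. The names by_name and aggregated are illustrative.

```python
from itertools import groupby
from operator import itemgetter

data = [
    {"startDate": 123, "endDate": 456, "campaignName": "abc", "campaignCfid": 789, "budgetImpressions": 10},
    {"startDate": 123, "endDate": 456, "campaignName": "abc", "campaignCfid": 789, "budgetImpressions": 50},
    {"startDate": 456, "endDate": 789, "campaignName": "def", "campaignCfid": 123, "budgetImpressions": 80},
]

by_name = itemgetter("campaignName")

aggregated = []
for name, rows in groupby(sorted(data, key=by_name), key=by_name):
    rows = list(rows)       # a group iterator can only be consumed once
    merged = dict(rows[0])  # copy the shared fields from the first row
    merged["budgetImpressions"] = sum(r["budgetImpressions"] for r in rows)
    aggregated.append(merged)

print(aggregated)
```

For the sample data this yields one entry for "abc" with budgetImpressions 60 and one for "def" with 80. The sort makes the approach O(n log n); it is only needed because groupby looks at consecutive items.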