I have a large RDF file containing about 600,000 records of the Wikidata taxonomy. From this file I am only interested in subclassOf relations (the predicate), so I ignore all other statements and keep only the "subclassOf" statements. The statements look like: a is a subclassOf b, b is a subclassOf c. In other words, c is a parent of b and b is a parent of a, and any parent can have many children. I want to build a hierarchical tree using this taxonomy.

I have checked the thread "Recursively creating a tree hierarchy without using class/object" and it almost solved my problem. However, with that approach I get the tree as a nested dictionary, which I want to convert into a tree data structure. Following is what I have tried:
data = [['a','x'], ['b','x'], ['c','x'], ['x','y'], ['t','y'], ['y','p'], ['p','q']]
roots = set()
mapping = {}
for child, parent in data:
    childitem = mapping.get(child, None)
    if childitem is None:
        # first time this child is seen: give it an empty subtree
        childitem = {}
        mapping[child] = childitem
    else:
        # it has a parent, so it can no longer be a root
        roots.discard(child)
    parentitem = mapping.get(parent, None)
    if parentitem is None:
        # a new parent is a root until it shows up as a child
        mapping[parent] = {child: childitem}
        roots.add(parent)
    else:
        parentitem[child] = childitem
for root in roots:
    print(mapping[root])
tree = { id : mapping[id] for id in roots }
print(tree)
Output of tree looks as below:
{'q': {'p': {'y': {'t': {}, 'x': {'c': {}, 'b': {}, 'a': {}}}}}}
I want to convert this dictionary to a tree. So, for example, when I say print(mapping['y']), it should give me Node y, i.e.
q
├── p
└── y
Currently, mapping['y'] gives me the subtree rooted at y. I think there is some easy solution for this, but I am not able to work it out. I have also found https://gist.github.com/hrldcpr/2012250 for converting a dictionary into a tree, but I am not sure how to use it in my case. Alternatively, if anyone knows how to build a tree directly from the RDF data that I have given above, that would be most welcome. Python's anytree API will probably solve my issue.
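One possible way to get node objects instead of nested dictionaries is sketched below in plain Python. This is not the anytree API: Node, build_tree, and render are hypothetical names introduced only for illustration. The sketch assumes every child has at most one parent and that the data contains no cycles, which holds for the sample data but may not hold for the full Wikidata taxonomy.

```python
class Node:
    """A minimal tree node (hypothetical helper, not part of any library)."""

    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = []

    def path(self):
        """Return the node names from the root down to this node."""
        node, names = self, []
        while node is not None:  # assumes no cycles in the data
            names.append(node.name)
            node = node.parent
        return names[::-1]

    def __repr__(self):
        return "Node(%r)" % "/".join(self.path())


def build_tree(pairs):
    """Build nodes from (child, parent) pairs; assumes one parent per child."""
    nodes = {}
    for child, parent in pairs:
        child_node = nodes.setdefault(child, Node(child))
        parent_node = nodes.setdefault(parent, Node(parent))
        child_node.parent = parent_node
        parent_node.children.append(child_node)
    roots = [n for n in nodes.values() if n.parent is None]
    return nodes, roots


def render(node, indent=0):
    """Return the subtree under `node` as indented lines of text."""
    lines = ["    " * indent + node.name]
    for child in node.children:
        lines.extend(render(child, indent + 1))
    return lines


data = [['a', 'x'], ['b', 'x'], ['c', 'x'], ['x', 'y'],
        ['t', 'y'], ['y', 'p'], ['p', 'q']]
nodes, roots = build_tree(data)
print(nodes['y'])                   # Node('q/p/y')
print("\n".join(render(roots[0])))  # indented listing of the whole tree
```

Here nodes plays the role of mapping in the attempt above: nodes['y'] is a single node that knows its parent and its children, so both the path up to the root and the subtree below it can be reached from it.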
Given a nested dictionary, the task is to convert it into a flattened dictionary in which each key is the path of nested keys joined by '_'.
Given below are a few methods to solve this task.
Method #1: Using Naive Approach
Python3
def flatten_dict(dd, separator='_', prefix=''):
    return { prefix + separator + k if prefix else k : v
             for kk, vv in dd.items()
             for k, v in flatten_dict(vv, separator, kk).items()
             } if isinstance(dd, dict) else { prefix : dd }

ini_dict = {'geeks': {'Geeks': {'for': 7}},
            'for': {'geeks': {'Geeks': 3}},
            'Geeks': {'for': {'for': 1, 'geeks': 4}}}

print("initial_dictionary", str(ini_dict))
print("final_dictionary", str(flatten_dict(ini_dict)))
Output:
initial_dictionary {'geeks': {'Geeks': {'for': 7}}, 'Geeks': {'for': {'geeks': 4, 'for': 1}}, 'for': {'geeks': {'Geeks': 3}}}
final_dictionary {'Geeks_for_for': 1, 'geeks_Geeks_for': 7, 'for_geeks_Geeks': 3, 'Geeks_for_geeks': 4}
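The separator parameter of Method #1 is never exercised in the example above. The short sketch below repeats the same function and calls it with a different separator; the choice of '.' is only an illustration.

```python
def flatten_dict(dd, separator='_', prefix=''):
    # Same one-expression approach as Method #1, repeated so the block runs alone.
    return {prefix + separator + k if prefix else k: v
            for kk, vv in dd.items()
            for k, v in flatten_dict(vv, separator, kk).items()
            } if isinstance(dd, dict) else {prefix: dd}


ini_dict = {'geeks': {'Geeks': {'for': 7}},
            'for': {'geeks': {'Geeks': 3}},
            'Geeks': {'for': {'for': 1, 'geeks': 4}}}

# Join the nested keys with '.' instead of the default '_'.
dotted = flatten_dict(ini_dict, separator='.')
print(dotted)
```

The separator is passed down through every recursive call, so all levels of the key path use the same joining string.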
Method #2: Using MutableMapping
Python3
# MutableMapping lives in collections.abc; the old collections alias
# was removed in Python 3.10.
from collections.abc import MutableMapping

def convert_flatten(d, parent_key='', sep='_'):
    items = []
    for k, v in d.items():
        new_key = parent_key + sep + k if parent_key else k
        if isinstance(v, MutableMapping):
            # nested mapping: flatten it with the key path built so far
            items.extend(convert_flatten(v, new_key, sep=sep).items())
        else:
            items.append((new_key, v))
    return dict(items)

ini_dict = {'geeks': {'Geeks': {'for': 7}},
            'for': {'geeks': {'Geeks': 3}},
            'Geeks': {'for': {'for': 1, 'geeks': 4}}}

print("initial_dictionary", str(ini_dict))
print("final_dictionary", str(convert_flatten(ini_dict)))
Output:
initial_dictionary {'Geeks': {'for': {'for': 1, 'geeks': 4}}, 'for': {'geeks': {'Geeks': 3}}, 'geeks': {'Geeks': {'for': 7}}}
final_dictionary {'Geeks_for_geeks': 4, 'for_geeks_Geeks': 3, 'geeks_Geeks_for': 7, 'Geeks_for_for': 1}
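Method #2 builds keys by string concatenation, so it can raise a TypeError when a nested key is not a string (an integer key, for example). The sketch below is one possible variant that converts each key with str() first; convert_flatten_str and the mixed sample dictionary are hypothetical names used only for illustration.

```python
from collections.abc import MutableMapping


def convert_flatten_str(d, parent_key='', sep='_'):
    """Variant of Method #2 that stringifies keys before joining them."""
    items = []
    for k, v in d.items():
        key = str(k)  # assumption: str(k) is an acceptable name for the key
        new_key = parent_key + sep + key if parent_key else key
        if isinstance(v, MutableMapping):
            items.extend(convert_flatten_str(v, new_key, sep=sep).items())
        else:
            items.append((new_key, v))
    return dict(items)


mixed = {'a': {1: 'x', 2: {'b': 'y'}}}
flat_mixed = convert_flatten_str(mixed)
print(flat_mixed)  # {'a_1': 'x', 'a_2_b': 'y'}
```

One trade-off of this choice is that keys such as 1 and '1' at the same level would collapse into the same flattened key.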
Method #3: Using Python Generators
Python3
my_map = {"a": 1,
          "b": {"c": 2,
                "d": 3,
                "e": {"f": 4,
                      6: "a",
                      5: {"g": 6},
                      "l": [1, "two"]}}}

ini_dict = {'geeks': {'Geeks': {'for': 7}},
            'for': {'geeks': {'Geeks': 3}},
            'Geeks': {'for': {'for': 1, 'geeks': 4}}}

def flatten_dict(pyobj, keystring=''):
    if type(pyobj) == dict:
        keystring = keystring + '_' if keystring else keystring
        for k in pyobj:
            yield from flatten_dict(pyobj[k], keystring + str(k))
    else:
        yield keystring, pyobj

print("Input : %s\nOutput : %s\n\n" %
      (my_map, { k:v for k,v in flatten_dict(my_map) }))

print("Input : %s\nOutput : %s\n\n" %
      (ini_dict, { k:v for k,v in flatten_dict(ini_dict) }))
Output:
initial_dictionary {'for': {'geeks': {'Geeks': 3}}, 'geeks': {'Geeks': {'for': 7}}, 'Geeks': {'for': {'for': 1, 'geeks': 4}}}
final_dictionary {'Geeks_for_geeks': 4, 'for_geeks_Geeks': 3, 'Geeks_for_for': 1, 'geeks_Geeks_for': 7}
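Because Method #3 is a generator, the flattened pairs can also be consumed one at a time instead of being collected into a dictionary. The sketch below repeats the generator (using isinstance rather than a type comparison, which is a small change from the version above) and shows two ways of reading it lazily; the condition v < 4 is an arbitrary example.

```python
def flatten_dict(pyobj, keystring=''):
    # Yield (flattened_key, value) pairs for every non-dict leaf.
    if isinstance(pyobj, dict):
        keystring = keystring + '_' if keystring else keystring
        for k in pyobj:
            yield from flatten_dict(pyobj[k], keystring + str(k))
    else:
        yield keystring, pyobj


ini_dict = {'geeks': {'Geeks': {'for': 7}},
            'for': {'geeks': {'Geeks': 3}},
            'Geeks': {'for': {'for': 1, 'geeks': 4}}}

# Pull only the first pair; the rest of the dictionary is not visited yet.
pairs = flatten_dict(ini_dict)
first = next(pairs)
print(first)  # ('geeks_Geeks_for', 7)

# Stop at the first leaf whose value is below 4.
match = next((k for k, v in flatten_dict(ini_dict) if v < 4), None)
print(match)  # for_geeks_Geeks
```

This can matter for very large nested structures, where building the full flattened dictionary up front may be unnecessary.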