save_heterograph
- gli.io.save_heterograph(name: str, edge: Dict[Tuple[str, str, str], ndarray], num_nodes_dict: Dict[str, int] | None = None, node_attrs: Dict[str, List[Attribute]] | None = None, edge_attrs: Dict[Tuple[str, str, str], List[Attribute]] | None = None, graph_node_list: Dict[str, spmatrix] | None = None, graph_edge_list: Dict[Tuple[str, str, str], spmatrix] | None = None, graph_attrs: List[Attribute] | None = None, description: str = '', citation: str = '', save_dir: str = '.')
Save heterogeneous graph information to metadata.json and NumPy data files.
- Parameters:
name (str) – The name of the graph dataset.
edge (Dict[Tuple[str, str, str], array]) – Each key is a tuple (src_node_type, edge_type, dst_node_type), and each value is a 2D NumPy array with shape (num_edges, 2). Each row is an edge in the format (src_id, dst_id). Node IDs in each node group should be indexed separately from 0.
num_nodes_dict (Dict[str, int], optional) – The number of nodes in each node group. If None, it will be inferred from edge.
node_attrs (Dict[str, List[Attribute]], optional) – The node attributes. The key is the node group name and the value is a list of Attribute, defaults to None.
edge_attrs (Dict[Tuple[str, str, str], List[Attribute]], optional) – The edge attributes. The key is a tuple of (src_node_type, edge_type, dst_node_type) and the value is a list of Attribute, default to None.
graph_node_list (Dict[str, spmatrix], optional) – A dictionary that maps the node group name to a sparse matrix of shape (num_graphs, num_nodes_in_group). Each row corresponds to a graph and each column corresponds to a node in that node group. The value of the element (i, j) is 1 if node j is in graph i, otherwise 0. If not specified, the graph will be considered as a single graph, defaults to None.
graph_edge_list (Dict[Tuple[str, str, str], spmatrix], optional) – A dictionary that maps the edge group to a sparse matrix of shape (num_graphs, num_edges_in_group). Each row corresponds to a graph and each column corresponds to an edge in that edge group. The value of the element (i, j) is 1 if edge j is in graph i, otherwise 0. If not specified, the graph will be considered as a single graph, defaults to None.
graph_attrs (List[Attribute], optional) – The graph attributes, defaults to None.
description (str) – The description of the graph dataset, defaults to “”.
citation (str) – The citation of the graph dataset, defaults to “”. Contributors are strongly encouraged to provide a citation.
save_dir (str) – The directory to save the graph dataset, defaults to “.”.
- Returns:
The dictionary of the content in metadata.json.
- Return type:
dict
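The edge format and the per-group node counts described in the parameters can be illustrated with NumPy alone. The sketch below is not gli's implementation: infer_num_nodes is a hypothetical helper, and it assumes that an inferred count is simply the largest ID seen in a group plus one, which may differ from how gli infers num_nodes_dict internally.

```python
import numpy as np


def infer_num_nodes(edge):
    """Hypothetical helper: smallest per-group node count consistent with the edges.

    Assumes node IDs in each group start at 0, so the count is max ID + 1.
    """
    counts = {}
    for (src_type, _edge_type, dst_type), pairs in edge.items():
        pairs = np.asarray(pairs)
        if pairs.ndim != 2 or pairs.shape[1] != 2:
            raise ValueError("each edge array must have shape (num_edges, 2)")
        if pairs.shape[0] == 0:
            continue  # an empty edge group tells us nothing about node counts
        # Column 0 holds source IDs, column 1 holds destination IDs.
        counts[src_type] = max(counts.get(src_type, 0), int(pairs[:, 0].max()) + 1)
        counts[dst_type] = max(counts.get(dst_type, 0), int(pairs[:, 1].max()) + 1)
    return counts


edge = {
    ("user", "click", "item"): np.array([[0, 0], [0, 1], [1, 2], [2, 3]]),
    ("user", "is_friend", "user"): np.array([[0, 1], [2, 1]]),
}
inferred = infer_num_nodes(edge)
print(inferred)  # {'user': 3, 'item': 4}
```

Under this assumption, a node that never appears in any edge is not counted, which is why passing num_nodes_dict explicitly is useful when a group contains isolated nodes.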
Warning
Currently gli only supports saving a single heterograph dataset, so the parameters graph_node_list and graph_edge_list are essentially redundant. They are kept only for future extension.
Note
Node IDs for each node group should be indexed separately from 0. For example, consider a heterogeneous graph with two node groups “user” and “item”. If there are 3 users and 5 items, the node IDs for “user” should be 0, 1, 2 and the node IDs for “item” should be 0, 1, 2, 3, 4. gli will internally assign each node a global ID that is unique across all nodes. Users can access the global node IDs of a graph g through the g.node_map member.
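To make the relationship between per-group IDs and a global ID concrete, here is a minimal sketch of one possible scheme: stacking the node groups in dictionary order and offsetting each group by the sizes of the groups before it. This is an assumption for illustration only; build_offsets and to_global are hypothetical names, and the actual layout of g.node_map in gli may differ.

```python
def build_offsets(num_nodes_dict):
    """Hypothetical helper: starting global ID for each node group.

    Groups are stacked in the order they appear in num_nodes_dict.
    """
    offsets = {}
    start = 0
    for group, count in num_nodes_dict.items():
        offsets[group] = start
        start += count
    return offsets


def to_global(offsets, group, local_id):
    """Convert a per-group (local) node ID to a global ID under this scheme."""
    return offsets[group] + local_id


offsets = build_offsets({"user": 3, "item": 5})
print(offsets)                        # {'user': 0, 'item': 3}
print(to_global(offsets, "user", 2))  # 2
print(to_global(offsets, "item", 0))  # 3
print(to_global(offsets, "item", 4))  # 7
```

With 3 users and 5 items, the eight global IDs 0 through 7 are each used exactly once, which is the uniqueness property the note describes.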
Example
import numpy as np
from numpy.random import randn
from scipy.sparse import random as sparse_random
from gli.io import save_heterograph, Attribute

node_groups = ["user", "item"]
edge_groups = [("user", "click", "item"),
               ("user", "purchase", "item"),
               ("user", "is_friend", "user")]
# Create a sample graph with 3 user nodes and 4+1 item nodes.
edge = {
    edge_groups[0]: np.array([[0, 0], [0, 1], [1, 2], [2, 3]]),
    edge_groups[1]: np.array([[0, 1], [1, 2]]),
    edge_groups[2]: np.array([[0, 1], [2, 1]])
}

node_attrs = {
    node_groups[0]: [
        Attribute("UserDenseFeature", randn(3, 5),
                  "Dense user features."),
        Attribute("UserSparseFeature", sparse_random(3, 500),
                  "Sparse user features."),
    ],
    node_groups[1]: [
        Attribute("ItemDenseFeature", randn(5, 5),
                  "Dense item features.")
    ]
}

edge_attrs = {
    edge_groups[0]: [
        Attribute("ClickTime", randn(4, 1), "Click time.")
    ],
    edge_groups[1]: [
        Attribute("PurchaseTime", randn(2, 1), "Purchase time.")
    ],
    edge_groups[2]: [
        Attribute("SparseFriendFeature", sparse_random(2, 500),
                  "Sparse friend features."),
        Attribute("DenseFriendFeature", randn(2, 5),
                  "Dense friend features.")
    ]
}

num_nodes_dict = {
    node_groups[0]: 3,
    node_groups[1]: 5  # more than the actual number of items in the edges
}

# Save the graph dataset.
save_heterograph(name="example_hetero_dataset",
                 edge=edge,
                 num_nodes_dict=num_nodes_dict,
                 node_attrs=node_attrs,
                 edge_attrs=edge_attrs,
                 description="An example heterograph dataset.")
The metadata.json will look like the following:
{
  "description": "An example heterograph dataset.",
  "citation": "",
  "data": {
    "Node": {
      "user": {
        "UserDenseFeature": {
          "description": "Dense user features.",
          "type": "float",
          "format": "Tensor",
          "file": "example_hetero_dataset__heterograph__aab19db19513942e161ace237aea63b4.npz",
          "key": "Node_user_UserDenseFeature"
        },
        "UserSparseFeature": {
          "description": "Sparse user features.",
          "type": "float",
          "format": "SparseTensor",
          "file": "example_hetero_dataset__heterograph__Node_user_UserSparseFeature__30209d631dcc4ae3813d3c360f9c42dd.sparse.npz"
        },
        "_ID": {
          "description": "",
          "type": "int",
          "format": "Tensor",
          "file": "example_hetero_dataset__heterograph__aab19db19513942e161ace237aea63b4.npz",
          "key": "Node_user__ID"
        }
      },
      "item": {
        "ItemDenseFeature": {
          "description": "Dense item features.",
          "type": "float",
          "format": "Tensor",
          "file": "example_hetero_dataset__heterograph__aab19db19513942e161ace237aea63b4.npz",
          "key": "Node_item_ItemDenseFeature"
        },
        "_ID": {
          "description": "",
          "type": "int",
          "format": "Tensor",
          "file": "example_hetero_dataset__heterograph__aab19db19513942e161ace237aea63b4.npz",
          "key": "Node_item__ID"
        }
      }
    },
    "Edge": {
      "user_click_item": {
        "ClickTime": {
          "description": "Click time.",
          "type": "float",
          "format": "Tensor",
          "file": "example_hetero_dataset__heterograph__aab19db19513942e161ace237aea63b4.npz",
          "key": "Edge_user_click_item_ClickTime"
        },
        "_ID": {
          "description": "",
          "type": "int",
          "format": "Tensor",
          "file": "example_hetero_dataset__heterograph__aab19db19513942e161ace237aea63b4.npz",
          "key": "Edge_user_click_item__ID"
        },
        "_Edge": {
          "description": "",
          "type": "int",
          "format": "Tensor",
          "file": "example_hetero_dataset__heterograph__aab19db19513942e161ace237aea63b4.npz",
          "key": "Edge_user_click_item__Edge"
        }
      },
      "user_purchase_item": {
        "PurchaseTime": {
          "description": "Purchase time.",
          "type": "float",
          "format": "Tensor",
          "file": "example_hetero_dataset__heterograph__aab19db19513942e161ace237aea63b4.npz",
          "key": "Edge_user_purchase_item_PurchaseTime"
        },
        "_ID": {
          "description": "",
          "type": "int",
          "format": "Tensor",
          "file": "example_hetero_dataset__heterograph__aab19db19513942e161ace237aea63b4.npz",
          "key": "Edge_user_purchase_item__ID"
        },
        "_Edge": {
          "description": "",
          "type": "int",
          "format": "Tensor",
          "file": "example_hetero_dataset__heterograph__aab19db19513942e161ace237aea63b4.npz",
          "key": "Edge_user_purchase_item__Edge"
        }
      },
      "user_is_friend_user": {
        "SparseFriendFeature": {
          "description": "Sparse friend features.",
          "type": "float",
          "format": "SparseTensor",
          "file": "example_hetero_dataset__heterograph__Edge_user_is_friend_user_SparseFriendFeature__fc3b5ebfe3efe6ac35e116c02d388ac6.sparse.npz"
        },
        "DenseFriendFeature": {
          "description": "Dense friend features.",
          "type": "float",
          "format": "Tensor",
          "file": "example_hetero_dataset__heterograph__aab19db19513942e161ace237aea63b4.npz",
          "key": "Edge_user_is_friend_user_DenseFriendFeature"
        },
        "_ID": {
          "description": "",
          "type": "int",
          "format": "Tensor",
          "file": "example_hetero_dataset__heterograph__aab19db19513942e161ace237aea63b4.npz",
          "key": "Edge_user_is_friend_user__ID"
        },
        "_Edge": {
          "description": "",
          "type": "int",
          "format": "Tensor",
          "file": "example_hetero_dataset__heterograph__aab19db19513942e161ace237aea63b4.npz",
          "key": "Edge_user_is_friend_user__Edge"
        }
      }
    },
    "Graph": {
      "_NodeList": {
        "file": "example_hetero_dataset__heterograph__Graph_NodeList__752140b0bd5669a2580f06dda6a70ced.sparse.npz"
      }
    }
  },
  "is_heterogeneous": true
}