save_heterograph

gli.io.save_heterograph(name: str, edge: Dict[Tuple[str, str, str], ndarray], num_nodes_dict: Dict[str, int] | None = None, node_attrs: Dict[str, List[Attribute]] | None = None, edge_attrs: Dict[Tuple[str, str, str], List[Attribute]] | None = None, graph_node_list: Dict[str, spmatrix] | None = None, graph_edge_list: Dict[Tuple[str, str, str], spmatrix] | None = None, graph_attrs: List[Attribute] | None = None, description: str = '', citation: str = '', save_dir: str = '.')

Save heterogeneous graph information to metadata.json and NumPy data files.

Parameters:
  • name (str) – The name of the graph dataset.

  • edge (Dict[Tuple[str, str, str], array]) – A dictionary of edges. Each key is a tuple of (src_node_type, edge_type, dst_node_type), and each value is a 2D numpy array with shape (num_edges, 2). Each row is an edge in the format (src_id, dst_id). Each node group should be indexed separately from 0.

  • num_nodes_dict (Dict[str, int], optional) – The number of nodes in each node group. If None, it will be inferred from edge.

  • node_attrs (Dict[str, List[Attribute]], optional) – The node attributes. The key is the node group name and the value is a list of Attribute, defaults to None.

  • edge_attrs (Dict[Tuple[str, str, str], List[Attribute]], optional) – The edge attributes. The key is a tuple of (src_node_type, edge_type, dst_node_type) and the value is a list of Attribute, defaults to None.

  • graph_node_list (Dict[str, spmatrix], optional) – A dictionary that maps the node group name to a sparse matrix of shape (num_graphs, num_nodes_in_group). Each row corresponds to a graph and each column corresponds to a node in that node group. The element (i, j) is 1 if node j is in graph i, otherwise 0. If not specified, the dataset is treated as a single graph. Defaults to None.

  • graph_edge_list (Dict[Tuple[str, str, str], spmatrix], optional) – A dictionary that maps the edge group to a sparse matrix of shape (num_graphs, num_edges_in_group). Each row corresponds to a graph and each column corresponds to an edge in that edge group. The element (i, j) is 1 if edge j is in graph i, otherwise 0. If not specified, the dataset is treated as a single graph. Defaults to None.

  • graph_attrs (List[Attribute], optional) – The graph attributes, defaults to None.

  • description (str) – The description of the graph dataset, defaults to “”.

  • citation (str) – The citation of the graph dataset, defaults to “”. Contributors are strongly encouraged to provide a citation.

  • save_dir (str) – The directory to save the graph dataset, defaults to “.”.
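
When num_nodes_dict is None, the node counts are inferred from edge. The sketch below shows one plausible way to do that, taking the largest node ID seen for each node type and adding 1. The helper infer_num_nodes is hypothetical and written only for illustration; it is not part of gli, and gli's own inference may differ in its details.

```python
import numpy as np


def infer_num_nodes(edge):
    """Hypothetical helper: per-type node count = largest node ID seen + 1."""
    num_nodes = {}
    for (src_type, _, dst_type), pairs in edge.items():
        pairs = np.asarray(pairs)
        if pairs.size == 0:
            continue  # an empty edge group tells us nothing about node counts
        # Column 0 holds source IDs, column 1 holds destination IDs.
        for node_type, ids in ((src_type, pairs[:, 0]), (dst_type, pairs[:, 1])):
            count = int(ids.max()) + 1
            num_nodes[node_type] = max(num_nodes.get(node_type, 0), count)
    return num_nodes


edge = {
    ("user", "click", "item"): np.array([[0, 0], [0, 1], [1, 2], [2, 3]]),
    ("user", "is_friend", "user"): np.array([[0, 1], [2, 1]]),
}
inferred = infer_num_nodes(edge)
print(inferred)  # {'user': 3, 'item': 4}
```

Under this assumption, a node that has the highest ID in its group but no incident edge would not be counted, which is a reason to pass num_nodes_dict explicitly when the dataset contains isolated nodes.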

Returns:

The dictionary of the content in metadata.json.

Return type:

dict

Warning

Currently, gli only supports saving a dataset that contains a single heterograph, so the parameters graph_node_list and graph_edge_list are essentially redundant. They are kept only for future extension.
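
To make the membership-matrix format of graph_node_list and graph_edge_list concrete, here is a minimal sketch that builds such a matrix with scipy.sparse. The helper membership_matrix is hypothetical (not part of gli), and it assumes each node or edge belongs to exactly one graph, which is narrower than what the format itself allows.

```python
import numpy as np
from scipy.sparse import csr_matrix


def membership_matrix(graph_of_item, num_graphs):
    """Hypothetical helper: entry (i, j) is 1 if item j belongs to graph i."""
    graph_of_item = np.asarray(graph_of_item)
    cols = np.arange(graph_of_item.size)  # one column per node (or edge)
    data = np.ones(graph_of_item.size, dtype=np.int8)
    return csr_matrix((data, (graph_of_item, cols)),
                      shape=(num_graphs, graph_of_item.size))


# Two graphs over 3 "user" nodes: users 0 and 1 in graph 0, user 2 in graph 1.
user_membership = membership_matrix([0, 0, 1], num_graphs=2)
print(user_membership.toarray())

# Single-graph case: one row with every entry set to 1.
single_graph = membership_matrix(np.zeros(5, dtype=int), num_graphs=1)
print(single_graph.toarray())
```

For a single heterograph, the one-row matrix carries no extra information, which is consistent with the parameters being redundant for now.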

Note

Node IDs for each node group should be indexed separately from 0. For example, consider a heterogeneous graph with two node groups “user” and “item”. If there are 3 users and 5 items, the node IDs for “user” should be 0, 1, 2 and the node IDs for “item” should be 0, 1, 2, 3, 4. gli internally assigns each node a global ID that is unique across all nodes. Users can access the global node IDs of a graph g through its g.node_map member.
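
The relationship between per-group IDs and global IDs can be illustrated with a small sketch. It assumes the node groups are laid out one after another in dictionary order, so each group gets a starting offset; this layout and the helper to_global are assumptions made for illustration, and the actual contents of g.node_map may be organized differently.

```python
import numpy as np

num_nodes_dict = {"user": 3, "item": 5}

# Assumed layout: groups are concatenated in dictionary order.
offsets = {}
start = 0
for node_type, count in num_nodes_dict.items():
    offsets[node_type] = start
    start += count


def to_global(node_type, local_ids):
    """Hypothetical helper: shift per-group IDs by the group's offset."""
    return np.asarray(local_ids) + offsets[node_type]


user_global = to_global("user", [0, 1, 2])
item_global = to_global("item", [0, 1, 2, 3, 4])
print(user_global)  # [0 1 2]
print(item_global)  # [3 4 5 6 7]
```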

Example

import numpy as np
from numpy.random import randn
from scipy.sparse import random as sparse_random
from gli.io import save_heterograph, Attribute

node_groups = ["user", "item"]
edge_groups = [("user", "click", "item"), ("user", "purchase", "item"),
               ("user", "is_friend", "user")]
# Create a sample graph with 3 user nodes and 5 item nodes
# (4 items appear in the edges, 1 item is isolated).
edge = {
    edge_groups[0]: np.array([[0, 0], [0, 1], [1, 2], [2, 3]]),
    edge_groups[1]: np.array([[0, 1], [1, 2]]),
    edge_groups[2]: np.array([[0, 1], [2, 1]])
}

node_attrs = {
    node_groups[0]: [
        Attribute("UserDenseFeature", randn(3, 5),
                  "Dense user features."),
        Attribute("UserSparseFeature", sparse_random(3, 500),
                  "Sparse user features."),
    ],
    node_groups[1]: [
        Attribute("ItemDenseFeature", randn(5, 5),
                  "Dense item features.")
    ]
}

edge_attrs = {
    edge_groups[0]: [
        Attribute("ClickTime", randn(4, 1), "Click time.")
    ],
    edge_groups[1]: [
        Attribute("PurchaseTime", randn(2, 1), "Purchase time.")
    ],
    edge_groups[2]: [
        Attribute("SparseFriendFeature", sparse_random(2, 500),
                  "Sparse friend features."),
        Attribute("DenseFriendFeature", randn(2, 5),
                  "Dense friend features.")
    ]
}

num_nodes_dict = {
    node_groups[0]: 3,
    # more than the actual number of items in the edges
    node_groups[1]: 5
}

# Save the graph dataset.
save_heterograph(name="example_hetero_dataset",
                 edge=edge,
                 num_nodes_dict=num_nodes_dict,
                 node_attrs=node_attrs,
                 edge_attrs=edge_attrs,
                 description="An example heterograph dataset.")

The metadata.json will look like the following:

{
    "description": "An example heterograph dataset.",
    "citation": "",
    "data": {
        "Node": {
            "user": {
                "UserDenseFeature": {
                    "description": "Dense user features.",
                    "type": "float",
                    "format": "Tensor",
                    "file": "example_hetero_dataset__heterograph__aab19db19513942e161ace237aea63b4.npz",
                    "key": "Node_user_UserDenseFeature"
                },
                "UserSparseFeature": {
                    "description": "Sparse user features.",
                    "type": "float",
                    "format": "SparseTensor",
                    "file": "example_hetero_dataset__heterograph__Node_user_UserSparseFeature__30209d631dcc4ae3813d3c360f9c42dd.sparse.npz"
                },
                "_ID": {
                    "description": "",
                    "type": "int",
                    "format": "Tensor",
                    "file": "example_hetero_dataset__heterograph__aab19db19513942e161ace237aea63b4.npz",
                    "key": "Node_user__ID"
                }
            },
            "item": {
                "ItemDenseFeature": {
                    "description": "Dense item features.",
                    "type": "float",
                    "format": "Tensor",
                    "file": "example_hetero_dataset__heterograph__aab19db19513942e161ace237aea63b4.npz",
                    "key": "Node_item_ItemDenseFeature"
                },
                "_ID": {
                    "description": "",
                    "type": "int",
                    "format": "Tensor",
                    "file": "example_hetero_dataset__heterograph__aab19db19513942e161ace237aea63b4.npz",
                    "key": "Node_item__ID"
                }
            }
        },
        "Edge": {
            "user_click_item": {
                "ClickTime": {
                    "description": "Click time.",
                    "type": "float",
                    "format": "Tensor",
                    "file": "example_hetero_dataset__heterograph__aab19db19513942e161ace237aea63b4.npz",
                    "key": "Edge_user_click_item_ClickTime"
                },
                "_ID": {
                    "description": "",
                    "type": "int",
                    "format": "Tensor",
                    "file": "example_hetero_dataset__heterograph__aab19db19513942e161ace237aea63b4.npz",
                    "key": "Edge_user_click_item__ID"
                },
                "_Edge": {
                    "description": "",
                    "type": "int",
                    "format": "Tensor",
                    "file": "example_hetero_dataset__heterograph__aab19db19513942e161ace237aea63b4.npz",
                    "key": "Edge_user_click_item__Edge"
                }
            },
            "user_purchase_item": {
                "PurchaseTime": {
                    "description": "Purchase time.",
                    "type": "float",
                    "format": "Tensor",
                    "file": "example_hetero_dataset__heterograph__aab19db19513942e161ace237aea63b4.npz",
                    "key": "Edge_user_purchase_item_PurchaseTime"
                },
                "_ID": {
                    "description": "",
                    "type": "int",
                    "format": "Tensor",
                    "file": "example_hetero_dataset__heterograph__aab19db19513942e161ace237aea63b4.npz",
                    "key": "Edge_user_purchase_item__ID"
                },
                "_Edge": {
                    "description": "",
                    "type": "int",
                    "format": "Tensor",
                    "file": "example_hetero_dataset__heterograph__aab19db19513942e161ace237aea63b4.npz",
                    "key": "Edge_user_purchase_item__Edge"
                }
            },
            "user_is_friend_user": {
                "SparseFriendFeature": {
                    "description": "Sparse friend features.",
                    "type": "float",
                    "format": "SparseTensor",
                    "file": "example_hetero_dataset__heterograph__Edge_user_is_friend_user_SparseFriendFeature__fc3b5ebfe3efe6ac35e116c02d388ac6.sparse.npz"
                },
                "DenseFriendFeature": {
                    "description": "Dense friend features.",
                    "type": "float",
                    "format": "Tensor",
                    "file": "example_hetero_dataset__heterograph__aab19db19513942e161ace237aea63b4.npz",
                    "key": "Edge_user_is_friend_user_DenseFriendFeature"
                },
                "_ID": {
                    "description": "",
                    "type": "int",
                    "format": "Tensor",
                    "file": "example_hetero_dataset__heterograph__aab19db19513942e161ace237aea63b4.npz",
                    "key": "Edge_user_is_friend_user__ID"
                },
                "_Edge": {
                    "description": "",
                    "type": "int",
                    "format": "Tensor",
                    "file": "example_hetero_dataset__heterograph__aab19db19513942e161ace237aea63b4.npz",
                    "key": "Edge_user_is_friend_user__Edge"
                }
            }
        },
        "Graph": {
            "_NodeList": {
                "file": "example_hetero_dataset__heterograph__Graph_NodeList__752140b0bd5669a2580f06dda6a70ced.sparse.npz"
            }
        }
    },
    "is_heterogeneous": true
}
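
Because dense tensors share one .npz file while each sparse tensor gets its own .sparse.npz file, it can be handy to list the distinct data files a metadata dictionary refers to. The following is a minimal sketch using only the standard library; collect_files is a hypothetical helper, and the shortened file names in the excerpt are placeholders rather than real output.

```python
import json

# Abbreviated excerpt in the same shape as the metadata above.
# File names are shortened placeholders for readability.
metadata_text = """
{
    "data": {
        "Node": {
            "user": {
                "UserDenseFeature": {"format": "Tensor", "file": "dense.npz",
                                     "key": "Node_user_UserDenseFeature"},
                "UserSparseFeature": {"format": "SparseTensor",
                                      "file": "user_sparse.sparse.npz"}
            }
        },
        "Graph": {
            "_NodeList": {"file": "node_list.sparse.npz"}
        }
    },
    "is_heterogeneous": true
}
"""


def collect_files(node, found=None):
    """Hypothetical helper: gather every "file" value in a nested dict."""
    if found is None:
        found = set()
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "file" and isinstance(value, str):
                found.add(value)
            else:
                collect_files(value, found)
    return found


metadata = json.loads(metadata_text)
files = sorted(collect_files(metadata))
print(files)  # ['dense.npz', 'node_list.sparse.npz', 'user_sparse.sparse.npz']
```

The same helper could be applied to the dictionary returned by save_heterograph, since that return value is described as the content of metadata.json.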