Tutorial
===============================

Data Loading API
~~~~~~~~~~~~~~~~

Welcome to the tutorial! Let's start with the data loading API, which
assembles a dataset from the given graph(s) and task.

To load a dataset from the remote data repository, simply use the
:func:`gli.dataloading.get_gli_dataset` function:

.. code:: python

    >>> import gli
    >>> dataset = gli.get_gli_dataset(dataset="cora", task="NodeClassification", device="cpu")
    >>> dataset
    Dataset("CORA dataset. NodeClassification", num_graphs=1, save_path=/Users/jimmy/.dgl/CORA dataset. NodeClassification)

The above code loads the Cora dataset with the
:class:`gli.task.NodeClassificationTask` that is predefined in the GLI
repository. :func:`gli.dataloading.get_gli_dataset` essentially does three
things:

1. Load the requested graph(s).
2. Load the requested task configuration.
3. Combine them to return a dataset instance.

Alternatively, one can do the same thing step by step, using the following
functions provided by GLI:

1. :func:`gli.dataloading.get_gli_graph`, :func:`gli.graph.read_gli_graph`.
2. :func:`gli.dataloading.get_gli_task`, :func:`gli.task.read_gli_task`.
3. :func:`gli.dataloading.combine_graph_and_task`.

Specifically, functions whose names start with ``get`` download data from
the remote repository, while functions whose names start with ``read`` read
data files from local directories.

GLI adopts the graph classes of DGL. Therefore,
:func:`gli.dataloading.get_gli_graph` returns a ``DGLGraph`` instance, or a
list of ``DGLGraph`` instances if the dataset contains multiple graphs. GLI
also provides class implementations for various tasks (e.g.,
:class:`gli.task.NodeClassificationTask`,
:class:`gli.task.LinkPredictionTask`), and
:func:`gli.dataloading.get_gli_task` returns a :class:`gli.task.GLITask`
object. One can then call :func:`gli.dataloading.combine_graph_and_task` to
assemble the corresponding dataset (e.g.,
:class:`gli.dataset.NodeClassificationDataset`,
:class:`gli.dataset.LinkPredictionDataset`).

.. code:: python

    >>> import gli
    >>> g = gli.get_gli_graph(dataset="cora", device="cpu", verbose=False)
    >>> g
    Graph(num_nodes=2708, num_edges=10556,
          ndata_schemes={'NodeFeature': Scheme(shape=(1433,), dtype=torch.float32), 'NodeLabel': Scheme(shape=(), dtype=torch.int64)}
          edata_schemes={})
    >>> task = gli.get_gli_task(dataset="cora", task="NodeClassification", verbose=False)
    >>> task
    >>> dataset = gli.combine_graph_and_task(g, task)
    >>> dataset
    Dataset("CORA dataset. NodeClassification", num_graphs=1, save_path=/Users/jimmy/.dgl/CORA dataset. NodeClassification)

The returned dataset inherits from ``DGLDataset``. Therefore, it can be
incorporated into DGL's infrastructure seamlessly:

.. code:: python

    >>> import dgl
    >>> type(dataset)
    >>> isinstance(dataset, dgl.data.DGLDataset)
    True

Example
~~~~~~~

Next, let's walk through a full example of data loading and training on a
GLI dataset. First, import all required modules.

.. code:: python

    import gli
    import torch
    from torch import nn
    import torch.nn.functional as F
    from dgl.nn.pytorch import GraphConv
    from gli.utils import to_dense

Then, load the Cora dataset for the node classification task.

.. code:: python

    data = gli.dataloading.get_gli_dataset("cora", "NodeClassification")
    g = data[0]
    g = to_dense(g)  # convert sparse features to dense tensors

    features = g.ndata["NodeFeature"]
    labels = g.ndata["NodeLabel"]
    train_mask = g.ndata["train_mask"]
    val_mask = g.ndata["val_mask"]
    test_mask = g.ndata["test_mask"]

    in_feats = features.shape[1]
    n_classes = data.num_labels

Since the Cora dataset contains sparse features, we convert them to dense
tensors with ``to_dense`` for later computation. We then define the
evaluation functions as below.

.. code:: python

    def accuracy(logits, labels):
        """Calculate accuracy."""
        # The predicted class is the index of the largest logit.
        _, indices = torch.max(logits, dim=1)
        correct = torch.sum(indices == labels)
        return correct.item() * 1.0 / len(labels)


    def evaluate(model, features, labels, mask, eval_func):
        """Evaluate model on the nodes selected by mask."""
        model.eval()
        with torch.no_grad():
            logits = model(features)
            logits = logits[mask]
            labels = labels[mask]
            return eval_func(logits, labels)

Next, we define a GCN model and start training.

.. code:: python

    class GCN(nn.Module):
        """GCN network."""

        def __init__(self, g, in_feats, n_hidden, n_classes, n_layers,
                     activation, dropout):
            """Initiate model."""
            super().__init__()
            self.g = g
            self.layers = nn.ModuleList()
            # input layer
            self.layers.append(
                GraphConv(in_feats, n_hidden, activation=activation))
            # hidden layers
            for _ in range(n_layers - 2):
                self.layers.append(
                    GraphConv(n_hidden, n_hidden, activation=activation))
            # output layer
            self.layers.append(GraphConv(n_hidden, n_classes))
            self.dropout = nn.Dropout(p=dropout)

        def forward(self, features):
            """Forward."""
            h = features
            for i, layer in enumerate(self.layers):
                # Apply dropout before every layer except the first.
                if i != 0:
                    h = self.dropout(h)
                h = layer(self.g, h)
            return h


    model = GCN(g=g,
                in_feats=in_feats,
                n_hidden=8,
                n_classes=n_classes,
                n_layers=2,
                activation=F.relu,
                dropout=.6)
    optimizer = torch.optim.AdamW(model.parameters(),
                                  lr=.01,
                                  weight_decay=.001)
    eval_func = accuracy
    loss_fcn = nn.CrossEntropyLoss()

    for epoch in range(200):
        model.train()

        # forward
        logits = model(features)
        loss = loss_fcn(logits[train_mask], labels[train_mask])

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # Training accuracy uses the logits from the forward pass above
        # (dropout active); validation accuracy is computed in eval mode.
        train_acc = eval_func(logits[train_mask], labels[train_mask])
        val_acc = evaluate(model, features, labels, val_mask, eval_func)
        print(f"Epoch {epoch:05d} | Loss {loss.item():.4f} |"
              f"TrainAcc {train_acc:.4f} | ValAcc {val_acc:.4f}")

    test_acc = evaluate(model, features, labels, test_mask, eval_func)
    print(f"Test Accuracy: {test_acc:.4f}")

Output:

.. code:: text

    Epoch 00000 | Loss 1.9454 |TrainAcc 0.1429 | ValAcc 0.3180
    Epoch 00001 | Loss 1.9375 |TrainAcc 0.2500 | ValAcc 0.3580
    Epoch 00002 | Loss 1.9318 |TrainAcc 0.3286 | ValAcc 0.3940
    Epoch 00003 | Loss 1.9242 |TrainAcc 0.3357 | ValAcc 0.4100
    Epoch 00004 | Loss 1.9138 |TrainAcc 0.4214 | ValAcc 0.4420
    Epoch 00005 | Loss 1.9039 |TrainAcc 0.5143 | ValAcc 0.4720
    Epoch 00006 | Loss 1.9002 |TrainAcc 0.4143 | ValAcc 0.4740
    Epoch 00007 | Loss 1.8891 |TrainAcc 0.4643 | ValAcc 0.4660
    Epoch 00008 | Loss 1.8787 |TrainAcc 0.5071 | ValAcc 0.4760
    Epoch 00009 | Loss 1.8733 |TrainAcc 0.4286 | ValAcc 0.5020
    Epoch 00010 | Loss 1.8581 |TrainAcc 0.5857 | ValAcc 0.5280
    ...
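
To make the masking step in ``evaluate`` concrete, the following is a minimal sketch that reproduces the same idea (select the masked nodes, take the arg-max class, compare with the labels) using plain Python lists instead of tensors, so it runs without GLI, DGL, or PyTorch. The helper name ``masked_accuracy`` and the toy numbers are illustrative assumptions and are not part of the GLI API or the Cora data.

```python
def masked_accuracy(logits, labels, mask):
    """Accuracy over the nodes where mask is True (pure-Python stand-in)."""
    # Keep only the rows selected by the boolean mask,
    # mirroring what logits[mask] and labels[mask] do on tensors.
    selected = [(row, y) for row, y, keep in zip(logits, labels, mask) if keep]
    if not selected:
        raise ValueError("mask selects no nodes")
    # The predicted class of a node is the index of its largest score.
    correct = sum(1 for row, y in selected if row.index(max(row)) == y)
    return correct / len(selected)


# Toy data: 4 nodes, 3 classes (made-up numbers, not taken from Cora).
logits = [
    [2.0, 0.1, -1.0],  # predicts class 0
    [0.2, 0.3, 1.5],   # predicts class 2
    [0.0, 3.0, 0.5],   # predicts class 1
    [1.0, 0.9, 0.8],   # predicts class 0
]
labels = [0, 2, 0, 1]
val_mask = [True, True, True, False]  # hypothetical validation split

val_acc = masked_accuracy(logits, labels, val_mask)
print(f"ValAcc {val_acc:.4f}")
```

Only the three masked nodes count toward the score here, which is why the training, validation, and test accuracies in the example above can differ even though they come from the same model and the same graph.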