Garfield.data.initialize_dataloaders

Garfield.data.initialize_dataloaders(node_masked_data: Data, edge_train_data: Data | None = None, edge_val_data: Data | None = None, edge_batch_size: int | None = 64, node_batch_size: int = 64, n_direct_neighbors: int = -1, n_hops: int = 1, shuffle: bool = True, edges_directed: bool = False, neg_edge_sampling_ratio: float = 1.0) → dict

Initialize edge-level and node-level training and validation dataloaders.

Parameters:
  • node_masked_data – PyG Data object with node-level split masks.

  • edge_train_data – PyG Data object containing the edge-level training set.

  • edge_val_data – PyG Data object containing the edge-level validation set.

  • edge_batch_size – Batch size for the edge-level dataloaders.

  • node_batch_size – Batch size for the node-level dataloaders.

  • n_direct_neighbors – Number of direct neighbors of the current batch nodes to sample and include in the batch. Defaults to -1, which includes all direct neighbors.

  • n_hops – Number of neighbor hops (levels) used when sampling neighbors to include in the current batch. For example, 2 includes not only the sampled direct neighbors of the current batch nodes but also sampled neighbors of those direct neighbors.

  • shuffle – If True, shuffle the dataloaders.

  • edges_directed – If False, edges are treated as undirected: each edge is represented by two symmetric edge index pairs, and both pairs are included in the same edge-level batch.

  • neg_edge_sampling_ratio – Ratio of sampled negative edges to positive edges. Negative sampling is currently approximate, i.e. the sampled negative edges may contain false negatives (node pairs that are in fact connected).
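
The edges_directed and neg_edge_sampling_ratio descriptions above can be illustrated with a small pure-Python sketch. It is not Garfield's or PyG's implementation: the helper sample_negative_edges and the toy graph are hypothetical, and the sketch assumes that "approximate" means random node pairs are drawn without being checked against the known edges.

```python
import random


def sample_negative_edges(pos_edges, num_nodes, ratio=1.0, seed=0):
    """Hypothetical helper: draw random node pairs as negative edges.

    Pairs are NOT checked against the positive edge set, so a sampled
    "negative" may actually be a real edge (a false negative).
    """
    rng = random.Random(seed)
    n_neg = int(round(ratio * len(pos_edges)))
    negatives = []
    while len(negatives) < n_neg:
        u = rng.randrange(num_nodes)
        v = rng.randrange(num_nodes)
        if u != v:  # only self-loops are rejected
            negatives.append((u, v))
    return negatives


# Toy undirected graph with 3 edges; with undirected edges, each edge
# corresponds to two symmetric edge index pairs, (u, v) and (v, u).
undirected_edges = [(0, 1), (1, 2), (2, 3)]
edge_index_pairs = undirected_edges + [(v, u) for u, v in undirected_edges]

# A ratio of 2.0 yields twice as many negatives as positive edges.
neg = sample_negative_edges(undirected_edges, num_nodes=4, ratio=2.0)

# Any sampled pair that is also a real edge is a false negative.
known_pairs = set(edge_index_pairs)
false_negatives = [edge for edge in neg if edge in known_pairs]
```

On a graph this small and dense, false negatives are likely; on large sparse graphs they become rare, which is the usual reason an unchecked sampling scheme is considered acceptable.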

Returns:

Dictionary containing training and validation PyG LinkNeighborLoader (for edge reconstruction) and NeighborLoader (for gene expression reconstruction) objects.

Return type:

dict
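
Example:

The interaction of n_direct_neighbors and n_hops can be sketched in plain Python. This is a minimal illustration of hop-wise neighbor sampling, not the PyG NeighborLoader code: the function sample_neighborhood and the adjacency dictionary are hypothetical, and the sketch assumes the same neighbor cap is applied at every hop.

```python
import random


def sample_neighborhood(adj, seed_nodes, n_direct_neighbors=-1, n_hops=1, seed=0):
    """Hypothetical helper: collect the nodes that end up in one batch.

    adj maps each node to a list of its neighbors. A cap of -1 keeps
    all neighbors; otherwise at most n_direct_neighbors are sampled
    per node at each hop (an assumption of this sketch).
    """
    rng = random.Random(seed)
    visited = set(seed_nodes)
    frontier = list(seed_nodes)
    for _ in range(n_hops):
        next_frontier = []
        for node in frontier:
            neighbors = adj.get(node, [])
            if n_direct_neighbors != -1 and len(neighbors) > n_direct_neighbors:
                neighbors = rng.sample(neighbors, n_direct_neighbors)
            for neighbor in neighbors:
                if neighbor not in visited:
                    visited.add(neighbor)
                    next_frontier.append(neighbor)
        # The next hop expands only the nodes added in this hop.
        frontier = next_frontier
    return visited


# Toy graph: node 0 connects to 1 and 2, which connect to 3 and 4.
adj = {0: [1, 2], 1: [0, 3], 2: [0, 4], 3: [1], 4: [2]}

one_hop = sample_neighborhood(adj, [0], n_hops=1)  # node 0 plus its direct neighbors
two_hop = sample_neighborhood(adj, [0], n_hops=2)  # also neighbors of those neighbors
capped = sample_neighborhood(adj, [0], n_direct_neighbors=1, n_hops=1)  # one sampled neighbor
```

Larger n_hops values grow the batch quickly, so a finite n_direct_neighbors is a common way to bound memory use per batch.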