Garfield.data.initialize_dataloaders
- Garfield.data.initialize_dataloaders(node_masked_data: Data, edge_train_data: Data | None = None, edge_val_data: Data | None = None, edge_batch_size: int | None = 64, node_batch_size: int = 64, n_direct_neighbors: int = -1, n_hops: int = 1, shuffle: bool = True, edges_directed: bool = False, neg_edge_sampling_ratio: float = 1.0) → dict
Initialize edge-level and node-level training and validation dataloaders.
- Parameters:
node_masked_data – PyG Data object with node-level split masks.
edge_train_data – PyG Data object containing the edge-level training set.
edge_val_data – PyG Data object containing the edge-level validation set.
edge_batch_size – Batch size for the edge-level dataloaders.
node_batch_size – Batch size for the node-level dataloaders.
n_direct_neighbors – Number of direct neighbors of the current batch nodes to sample and include in the batch. Defaults to -1, which includes all direct neighbors.
n_hops – Number of neighbor-sampling hops (levels) used to build the current batch. For example, 2 includes not only the sampled direct neighbors of the batch nodes but also sampled neighbors of those direct neighbors.
shuffle – If True, shuffle the dataloaders.
edges_directed – If False, edges are treated as undirected: each edge is represented by two symmetric edge index pairs, and both pairs are included in the same edge-level batch.
neg_edge_sampling_ratio – Ratio of negative to positive edges sampled for edge reconstruction. Negative sampling is currently approximate, i.e. sampled negative edges may include false negatives (node pairs that are in fact connected).
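To illustrate how n_direct_neighbors and n_hops interact, here is a minimal pure-Python sketch of hop-wise neighbor sampling on a toy adjacency list. It is not Garfield's implementation: the helper name sample_subgraph_nodes and the adjacency format are hypothetical, and it assumes the two parameters map to a per-hop fan-out list [n_direct_neighbors] * n_hops, similar to the num_neighbors argument of PyG's NeighborLoader, where -1 means "all neighbors".

```python
import random


def sample_subgraph_nodes(adj, seed_nodes, n_direct_neighbors=-1, n_hops=1, rng=None):
    """Return the set of nodes reached by hop-wise neighbor sampling.

    adj: dict mapping node -> list of neighbor nodes (hypothetical toy format).
    n_direct_neighbors: neighbors sampled per node at each hop; -1 keeps all.
    n_hops: number of sampling hops starting from the seed (batch) nodes.
    """
    if rng is None:
        rng = random.Random(0)
    # Assumed mapping to a per-hop fan-out list, mirroring PyG's num_neighbors.
    num_neighbors = [n_direct_neighbors] * n_hops
    included = set(seed_nodes)
    frontier = list(seed_nodes)
    for fan_out in num_neighbors:
        next_frontier = []
        for node in frontier:
            neighbors = adj.get(node, [])
            if fan_out == -1 or fan_out >= len(neighbors):
                sampled = list(neighbors)  # keep every direct neighbor
            else:
                sampled = rng.sample(neighbors, fan_out)  # sample without replacement
            for nb in sampled:
                if nb not in included:
                    included.add(nb)
                    next_frontier.append(nb)
        # Only newly reached nodes are expanded at the next hop.
        frontier = next_frontier
    return included


adj = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1, 4], 4: [3]}
print(sorted(sample_subgraph_nodes(adj, [0])))            # [0, 1, 2]
print(sorted(sample_subgraph_nodes(adj, [0], n_hops=2)))  # [0, 1, 2, 3]
```

With the default n_direct_neighbors=-1, raising n_hops from 1 to 2 pulls node 3 into the batch as a neighbor of a neighbor; a positive n_direct_neighbors caps how many neighbors each node contributes per hop.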
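The effect of edges_directed=False on edge-level batches can be pictured with the sketch below, which keeps both index pairs (u, v) and (v, u) of each edge in the same batch. The helper name batch_edge_pairs is hypothetical, and counting edge_batch_size in whole edges (so an undirected batch holds up to 2 * edge_batch_size index pairs) is an assumption for illustration, not a statement about Garfield's internals.

```python
def batch_edge_pairs(edges, edge_batch_size, edges_directed=False):
    """Split an edge list into batches of edge index pairs.

    edges: list of (u, v) tuples, one per edge.
    If edges_directed is False, each edge contributes both (u, v) and (v, u),
    and the two symmetric pairs are always placed in the same batch.
    """
    batches = []
    for start in range(0, len(edges), edge_batch_size):
        chunk = edges[start:start + edge_batch_size]
        pairs = []
        for u, v in chunk:
            pairs.append((u, v))
            if not edges_directed:
                pairs.append((v, u))  # symmetric counterpart stays in this batch
        batches.append(pairs)
    return batches


edges = [(0, 1), (1, 3), (3, 4)]
print(batch_edge_pairs(edges, edge_batch_size=2))
# [[(0, 1), (1, 0), (1, 3), (3, 1)], [(3, 4), (4, 3)]]
```

Keeping both directions together means a model scoring edges never sees one direction of an undirected edge in training while the other lands in a different batch.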
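The false-negative caveat on neg_edge_sampling_ratio can be seen in a simple form of approximate negative sampling: candidate negatives are drawn as uniformly random node pairs without checking them against the observed edges, which is cheap but lets some real edges slip in as "negatives". This is an illustrative assumption about what "approximate" means here; the helper names sample_negative_edges and count_false_negatives are hypothetical. PyG's LinkNeighborLoader exposes a comparable neg_sampling_ratio argument.

```python
import random


def sample_negative_edges(num_nodes, pos_edges, neg_edge_sampling_ratio=1.0, rng=None):
    """Draw random node pairs as negative edges WITHOUT filtering observed edges."""
    if rng is None:
        rng = random.Random(0)
    # Number of negatives relative to positives (rounded down in this sketch).
    num_neg = int(neg_edge_sampling_ratio * len(pos_edges))
    negatives = []
    for _ in range(num_neg):
        u = rng.randrange(num_nodes)
        v = rng.randrange(num_nodes)
        # No membership check against pos_edges: this pair may be a false negative.
        negatives.append((u, v))
    return negatives


def count_false_negatives(negatives, pos_edges, edges_directed=False):
    """Count sampled 'negatives' that are actually observed edges."""
    observed = set(pos_edges)
    if not edges_directed:
        observed |= {(v, u) for u, v in pos_edges}  # undirected: both directions exist
    return sum(1 for e in negatives if e in observed)


pos = [(0, 1), (1, 2), (2, 3)]
neg = sample_negative_edges(num_nodes=4, pos_edges=pos, neg_edge_sampling_ratio=2.0)
print(len(neg), "negatives,", count_false_negatives(neg, pos), "of them false")
```

Exact sampling would reject pairs found in the observed edge set; skipping that check trades a small label noise on dense graphs for speed on large ones.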
- Returns:
Dictionary containing training and validation PyG LinkNeighborLoader (for edge reconstruction) and NeighborLoader (for gene expression reconstruction) objects.
- Return type:
dict