Garfield.data.prepare_data

Garfield.data.prepare_data(adata, label_name: str | None = None, used_pca_feat: bool = False, adj_key: str = 'connectivities', edge_label_adj_key: str = 'edge_label_spatial_connectivities', edge_val_ratio: float = 0.1, edge_test_ratio: float = 0.0, node_val_ratio: float = 0.1, node_test_ratio: float = 0.0) → dict

Prepares the dataset for training and evaluation by applying node-level and edge-level splits, and returns the processed data as a dictionary.

Parameters:

adata (AnnData) – An AnnData object containing gene expression or other relevant data.
label_name (str, optional) – The name of the label to use for node classification or regression tasks. Default is None.
adj_key (str, optional) – Key in the AnnData object that corresponds to the adjacency matrix for graph construction. Default is “connectivities”.
edge_label_adj_key (str, optional) – Key in the AnnData object that corresponds to the adjacency matrix used for edge-label reconstruction tasks. Default is “edge_label_spatial_connectivities”.
edge_val_ratio (float, optional) – Proportion of edges to use for validation in the edge-level split. Default is 0.1.
edge_test_ratio (float, optional) – Proportion of edges to use for testing in the edge-level split. Default is 0.0.
node_val_ratio (float, optional) – Proportion of nodes to use for validation in the node-level split. Default is 0.1.
node_test_ratio (float, optional) – Proportion of nodes to use for testing in the node-level split. Default is 0.0.
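
The four ratio parameters each reserve a fraction of the edges or nodes for validation or testing, and the remainder is used for training. The sketch below illustrates that arithmetic only. It is not Garfield's implementation: the helper name `split_indices`, the shuffling, the seed, and the round-down behaviour are all assumptions made for illustration.

```python
import random


def split_indices(n_items, val_ratio=0.1, test_ratio=0.0, seed=0):
    """Shuffle range(n_items) and partition it into train/val/test index lists.

    Hypothetical helper: it mirrors the documented default ratios, but the
    shuffling and rounding rules are assumptions, not Garfield's behaviour.
    """
    if val_ratio < 0 or test_ratio < 0 or val_ratio + test_ratio >= 1.0:
        raise ValueError("ratios must be non-negative and sum to less than 1")

    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility

    n_val = int(n_items * val_ratio)    # assumption: round down
    n_test = int(n_items * test_ratio)  # assumption: round down

    val = indices[:n_val]
    test = indices[n_val:n_val + n_test]
    train = indices[n_val + n_test:]    # everything left over
    return train, val, test


# With the documented defaults (0.1 validation, 0.0 test) and 1000 items:
train, val, test = split_indices(1000)
print(len(train), len(val), len(test))  # prints "900 100 0"
```

With a test ratio of 0.0, as in the defaults, the test partition is simply empty.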

Returns:

A dictionary containing the following keys:
- “edge_train_data”: Training data for edge-level tasks.
- “edge_val_data”: Validation data for edge-level tasks.
- “edge_test_data”: Testing data for edge-level tasks.
- “node_masked_data”: Data with nodes masked for validation and testing in node-level tasks.

Return type:

dict
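
A possible way to call the function and confirm that the result carries the four keys listed under Returns is sketched below. The wrapper `prepare_and_check` and the helper `missing_keys` are hypothetical names introduced here, and the import path is inferred from the documented name `Garfield.data.prepare_data`. The import is deferred so the helpers can be loaded without Garfield installed.

```python
# Keys listed in the Returns section of this page.
EXPECTED_KEYS = (
    "edge_train_data",
    "edge_val_data",
    "edge_test_data",
    "node_masked_data",
)


def missing_keys(data):
    """Return the documented keys that are absent from `data` (hypothetical helper)."""
    return [key for key in EXPECTED_KEYS if key not in data]


def prepare_and_check(adata, **kwargs):
    """Call prepare_data and fail early if a documented key is missing.

    Hypothetical wrapper: `kwargs` are forwarded unchanged, so any parameter
    from the documented signature (e.g. edge_val_ratio) can be passed.
    """
    # Deferred import; path assumed from the documented qualified name.
    from Garfield.data import prepare_data

    data = prepare_data(adata, **kwargs)
    absent = missing_keys(data)
    if absent:
        raise KeyError(f"prepare_data result is missing keys: {absent}")
    return data
```

Assuming `adata` already holds the adjacency matrices under the keys named by `adj_key` and `edge_label_adj_key`, a call might look like `data = prepare_and_check(adata, edge_val_ratio=0.1, node_val_ratio=0.1)`, after which `data["edge_train_data"]` and `data["node_masked_data"]` can be passed on to training.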