This repository contains the code and data for the paper “Hypergraph-Based High-Order Correlation Analysis for Large-Scale Long-Tailed Data.”
- HSMOTE: lightweight dual encoder–decoder (`models.HSMOTE`) trained with BCE reconstruction and negative sampling on the node–hyperedge incidence.
- Tail oversampling via synthesis: interpolation in embedding space plus Bernoulli edge-membership sampling over class-aware candidate sets (`synthesize_vertices_bernoulli`), with Top-M per-class popular edges, candidate capping, and class-specific degree targets.
- Global structure aggregation: build a row-normalized Top-k PPR matrix on the item–user bipartite graph, keep the item→item mass, and fine-tune `encoder_node + LinearHead` with fixed `P`.
- Inference-time diffusion: 3-step lightweight diffusion (`utils.diffuse_3steps`) to smooth representations before classification; a minimal sketch follows this list.
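That diffusion step is small enough to sketch in full. The version below is a minimal illustration only: the signature and the mixing coefficient `alpha` are assumptions, not the actual `utils.diffuse_3steps`, but the core update (blending each node with its `P_hat`-weighted neighborhood average for three rounds) matches the description above.

```python
import torch

def diffuse_3steps(Z: torch.Tensor, P_hat: torch.Tensor, alpha: float = 0.5) -> torch.Tensor:
    """Smooth embeddings Z with 3 rounds of lazy propagation over sparse P_hat.

    Each round: Z <- (1 - alpha) * Z + alpha * (P_hat @ Z),
    i.e. blend every node with the weighted average of its PPR neighbors.
    """
    for _ in range(3):
        Z = (1.0 - alpha) * Z + alpha * torch.sparse.mm(P_hat, Z)
    return Z
```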
Repository layout:

```
repo/
├── config.py      # Hyperparameters & paths (edit to your data locations)
├── main.py        # End-to-end: load → hypergraph → HSMOTE → synth → PPR → finetune → eval
├── models.py      # HSMOTE and LinearHead
├── load_data.py   # Loading/alignment, subgraph builder, BCE w/ negative sampling, helpers
├── utils.py       # Training utils, metrics, PPR builders, diffusion, memmap, etc.
├── ppr.py         # PPR
├── data/          # Raw features/labels/user JSON
├── cache/         # Runtime caches
└── README.md
```
`config.py` points to your files (update paths as needed):

```python
# Example fields — adapt to your setup
data_dir = "Datasets/subset"
asins_pkl = "asins_**.pkl"
feat_pkl = "features_**.pkl"
lab_pkl = "labels_**.pkl"
user_json = "user_products_1000.json"
```
Expected formats:

- `features_**.pkl`: `{'features': float32 [N, d], 'asins': List[str]}`
- `labels_**.pkl`: `{'labels': int64 [N]}`
- `asins_**.pkl`: several schemas are supported; `load_idx2node_from_asins_pkl` auto-parses and aligns them.
- `user_products_1000.json`: user–item interactions, e.g.:

```json
[
  {"user": "u1", "items": [{"prefix": "P", "asin": "A1"}, {"prefix": "P", "asin": "A2"}]},
  {"user": "u2", "items": [{"asin": "B3"}]}
]
```
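For a quick sanity check of these formats, a minimal loader could look like the sketch below. The concrete filenames stand in for the `**` placeholders and are hypothetical; the repo's own `load_and_align` does the real alignment work.

```python
import json
import pickle

import numpy as np

# Hypothetical concrete filenames; substitute whatever your "**" suffix is.
with open("data/features_subset.pkl", "rb") as f:
    feats = pickle.load(f)   # {'features': float32 [N, d], 'asins': List[str]}
with open("data/labels_subset.pkl", "rb") as f:
    labs = pickle.load(f)    # {'labels': int64 [N]}

X = np.asarray(feats["features"], dtype=np.float32)
y = np.asarray(labs["labels"], dtype=np.int64)
assert X.shape[0] == y.shape[0] == len(feats["asins"])  # rows must stay aligned

# User-item interactions: one record per user; each item carries at least "asin".
with open("data/user_products_1000.json") as f:
    users = json.load(f)
user2items = {u["user"]: [it["asin"] for it in u["items"]] for u in users}
```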
Hypergraph construction & cache:

- Datasets download: https://drive.google.com/file/d/1v8nXKoIrd7bmfGZyW6N3_WT0dZUASDrw/view?usp=sharing
- `load_hypergraph` builds CSR arrays for item→user (`offsets_i`, `indices_users`) and user→item (`offsets_u`, `indices_items`); see the sketch after this list.
- A stable cache key (derived from `kept_asin2row`) is used to write `.csr-<hash>.npz`, `.users.json`, and `.meta.json` for fast reloads.
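For orientation, here is a minimal sketch of those four CSR arrays and the cache-key idea. All of this is illustrative: the real `load_hypergraph` also handles ASIN alignment and filtering, and the hashing scheme below is an assumption, not the repo's exact key.

```python
import hashlib

import numpy as np

def build_csr_pair(n_items: int, n_users: int, pairs):
    """Build item->user and user->item CSR arrays from (item, user) index pairs."""
    pairs = np.asarray(pairs, dtype=np.int64)
    # item -> user: neighbors of item i live in indices_users[offsets_i[i]:offsets_i[i+1]]
    order = np.argsort(pairs[:, 0], kind="stable")
    indices_users = pairs[order, 1]
    offsets_i = np.zeros(n_items + 1, dtype=np.int64)
    np.add.at(offsets_i, pairs[:, 0] + 1, 1)   # per-item degree counts
    offsets_i = np.cumsum(offsets_i)
    # user -> item: the mirrored layout
    order = np.argsort(pairs[:, 1], kind="stable")
    indices_items = pairs[order, 0]
    offsets_u = np.zeros(n_users + 1, dtype=np.int64)
    np.add.at(offsets_u, pairs[:, 1] + 1, 1)
    offsets_u = np.cumsum(offsets_u)
    return offsets_i, indices_users, offsets_u, indices_items

def cache_key(kept_asin2row: dict) -> str:
    """Stable hash over the kept-ASIN mapping, used to name .csr-<hash>.npz."""
    blob = "\n".join(f"{a}\t{r}" for a, r in sorted(kept_asin2row.items()))
    return hashlib.sha1(blob.encode("utf-8")).hexdigest()[:12]
```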
Run:

```bash
python main.py
```
Pipeline in `main.py`:

- Load & align: `load_and_align` aligns features/labels to `idx2node` → `X, y`.
- Build hypergraph: `load_hypergraph` → CSR `H`; compute full edge features `Xe_full`.
- Split: `stratified_split_ratio` per class.
- Train HSMOTE: `train_hsmote_on_tails` on tail classes via batch subgraphs (BCE + MSE).
- Synthesize: `compute_ze_full_memmap` encodes edge embeddings `Ze` (memmap); `synthesize_vertices_bernoulli` creates new tail nodes and merges them into the dataset (see the sketch after this list).
- Build PPR: `build_P_items_topk_bipartite` on the bipartite graph; keep the item→item rows and re-normalize → `P_hat` (sketched below).
- Fine-tune & evaluate: `finetune_with_P_fixed` (fixed `P_hat`) on train/val; inference uses 3-step diffusion.
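The synthesis step can be pictured as follows. This is a rough, self-contained sketch of the interpolate-then-Bernoulli idea behind `synthesize_vertices_bernoulli`; the class-aware candidate construction (Top-M popular edges, candidate capping) is simplified to a precomputed score vector, and all names here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def synthesize_one(z_a: np.ndarray, z_b: np.ndarray,
                   edge_scores: np.ndarray, max_edges: int):
    """Create one synthetic tail node from two same-class anchors.

    z_a, z_b:     embeddings of two real tail-class nodes
    edge_scores:  membership probabilities over a class-aware candidate
                  edge set (e.g. decoder scores passed through a sigmoid)
    max_edges:    class-specific degree target / cap
    """
    lam = rng.uniform()                                  # random mixing weight
    z_new = lam * z_a + (1.0 - lam) * z_b                # interpolation in embedding space
    keep = rng.random(edge_scores.shape) < edge_scores   # Bernoulli membership draws
    edges = np.flatnonzero(keep)
    if edges.size > max_edges:                           # enforce the degree cap,
        top = np.argsort(edge_scores[edges])[::-1][:max_edges]
        edges = edges[top]                               # keeping highest-scoring edges
    return z_new, edges
```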
Console logs include `[HSMOTE]`, `[PPR]`, `[FT]` (fine-tuning), and final `[TEST]` metrics.
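The `[PPR]` stage corresponds to building `P_hat`. Below is a dense, small-scale sketch of the idea (a truncated PPR series over two-hop item→user→item transitions, top-k sparsification, row re-normalization); the repo's `build_P_items_topk_bipartite` works sparsely and in chunks, controlled by `ppr_alpha`, `ppr_eps`, `ppr_topk`, and `ppr_chunk`.

```python
import numpy as np

def build_P_items_topk(A_iu: np.ndarray, alpha: float = 0.15,
                       iters: int = 30, topk: int = 8) -> np.ndarray:
    """Row-normalized top-k item->item PPR over an item-user bipartite graph.

    A_iu is a dense (n_items, n_users) incidence matrix. Two bipartite hops
    (item -> user -> item) make one step of the item-level random walk; the
    loop accumulates the truncated series alpha * sum_t (1 - alpha)^t T^t.
    """
    W_iu = A_iu / np.maximum(A_iu.sum(1, keepdims=True), 1)      # item -> user
    W_ui = A_iu.T / np.maximum(A_iu.T.sum(1, keepdims=True), 1)  # user -> item
    T = W_iu @ W_ui                                              # item -> item step
    n = T.shape[0]
    P, R = alpha * np.eye(n), np.eye(n)
    for _ in range(iters):
        R = (1.0 - alpha) * (R @ T)
        P += alpha * R
    # Keep only the top-k mass per row, then re-normalize rows -> P_hat.
    drop = np.argsort(P, axis=1)[:, :-topk]
    np.put_along_axis(P, drop, 0.0, axis=1)
    return P / np.maximum(P.sum(1, keepdims=True), 1e-12)
```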
Adjust in `config.py`:

- Model: `d_in`, `d_hid`, `d_embed`
- HSMOTE pretraining: `lr_hsmote`, `wd_hsmote`, `beta_mse`, `neg_per_pos`, `hsmote_batches`, `per_class_train`, `log_every`
- Tail threshold: `tail_threshold` (relative to the largest class; see the example after this list)
- Synthesis: `target_tail_ratio`, `topM`, `cand_cap_per_syn`, `max_edges_per_syn`, `ze_score_chunk`, `block_new_nodes`
- PPR: `ppr_alpha`, `ppr_eps`, `ppr_topk`, `ppr_chunk`
- Fine-tuning: `lr_enc`, `wd_enc`, `lr_head`, `wd_head`, `ft_epochs`, `precls_patience`, `bs_seeds`
- Inference: `infer_alpha`, `infer_steps`, `encode_bs`
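As a concrete example of how two of these knobs might interact: the snippet below assumes `tail_threshold` marks a class as a tail when its count falls below that fraction of the largest class (per the description above), and assumes `target_tail_ratio` sets the post-synthesis size target. The second assumption in particular is illustrative, not taken from the code.

```python
import numpy as np

# Hypothetical values; the real fields live in config.py.
tail_threshold, target_tail_ratio = 0.1, 0.3
y_train = np.array([0] * 500 + [1] * 30 + [2] * 12)   # toy label vector

counts = np.bincount(y_train)            # per-class sample counts
head = counts.max()                      # size of the largest class
tail_classes = np.flatnonzero(counts < tail_threshold * head)
# Assumed semantics: top each tail class up toward a fraction of the head class.
n_synth = {int(c): max(0, int(target_tail_ratio * head) - int(counts[c]))
           for c in tail_classes}
print(tail_classes, n_synth)             # [1 2] {1: 120, 2: 138}
```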
- Maintainer: Xiangmin Han / Tsinghua University
- Email: [email protected]