docs/baseline.md: 47 additions & 5 deletions
@@ -4,25 +4,33 @@ From the paper to be released soon. Below, you can see the baselines for the `To
One can observe that the smaller datasets (`Zinc12k` and `Tox21`) benefit from adding another unrelated task (`QM9`), where the labels are computed from DFT simulations.

+**NEW baselines added 2023/09/18**: Multitask baselines have been added for GatedGCN and MPNN++ (sum aggregator) using 3 random seeds. They achieve the best performance by a significant margin on Zinc12k and Tox21, at the cost of slightly lower performance on QM9.
+
| Dataset | Model | MAE ↓ | Pearson ↑ | R² ↑ | MAE ↓ | Pearson ↑ | R² ↑ |
@@ -88,6 +96,40 @@ This is not surprising as they contain two orders of magnitude more datapoints a
|| GIN | 0.1873 ± 0.0033 |**0.1701 ± 0.0142**|
|| GINE | 0.1883 ± 0.0039 |**0.1771 ± 0.0010**|
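
The `mean ± std` cells in the tables above summarize one metric across several random seeds. As a minimal sketch of how such a cell could be produced, the snippet below uses the sample standard deviation. That choice, the helper names `summarize_seeds` and `format_cell`, and the example scores are all assumptions made for illustration; they are not taken from the benchmark code.

```python
import statistics


def summarize_seeds(values):
    """Return (mean, std) for a metric measured once per random seed.

    Assumption: the spread is the sample standard deviation (n - 1 in the
    denominator). With a single seed there is no spread, so 0.0 is returned.
    """
    mean = statistics.fmean(values)
    std = statistics.stdev(values) if len(values) > 1 else 0.0
    return mean, std


def format_cell(values, digits=4):
    """Format per-seed values as a 'mean ± std' table cell."""
    mean, std = summarize_seeds(values)
    return f"{mean:.{digits}f} ± {std:.{digits}f}"


# Hypothetical per-seed scores, made up for illustration only.
example_scores = [0.19, 0.18, 0.20]
print(format_cell(example_scores))
```

With a single seed the helper reports a spread of zero, which is why single-seed results are best read as preliminary.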

+## NEW: Largemix improved sweep - 2023/08-18
+
+Unsatisfied with the prior results, we ran a Bayesian search over a broader set of parameters, restricted to the more expressive models, namely GINE, GatedGCN and MPNN++. We further increased the number of parameters to 10M due to evidence of underfitting. We evaluate only the multitask setting.
+
+We observe a significant improvement over all tasks, with a very notable R² increase of +0.53 (0.27 → 0.80) relative to the previous best node-level property prediction on PCQM4M_N4.
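
For readers less familiar with the three metrics reported in the tables (MAE, Pearson, R²), here is a minimal pure-Python sketch using their standard textbook definitions. It is not the benchmark's own evaluation code, whose implementation is not shown here; the function names and the toy data are hypothetical.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error (lower is better)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson(y_true, y_pred):
    """Pearson correlation coefficient (higher is better)."""
    mean_t = sum(y_true) / len(y_true)
    mean_p = sum(y_pred) / len(y_pred)
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (higher is better)."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy data, made up for illustration only.
targets = [1.0, 2.0, 3.0, 4.0]
predictions = [1.5, 2.0, 2.5, 4.0]
print(mae(targets, predictions))
print(pearson(targets, predictions))
print(r2_score(targets, predictions))
```

Under this definition, an R² of 0.80 means the residual sum of squares is 20% of the total variance of the targets, whereas 0.27 leaves 73% unexplained.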
+
+The results below are reported for a single seed. More seeds of the same models are currently running.