Commit d5aee30

2 parents 0ded7df + 976b4db

3 files changed (+16 −23 lines)

docs/getting_started/api.md

Lines changed: 7 additions & 15 deletions
@@ -40,9 +40,7 @@ The cost for each API request is calculated as:
 api_cost = (num_train_rows + num_test_rows) * num_cols * n_estimators
 ```
 
-Where `n_estimators` is by default:
-- 4 for classification tasks
-- 8 for regression tasks
+Where `n_estimators` is by default 4 for classification tasks and 8 for regression tasks.
 
 ### Monitoring Usage
 
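The cost formula in the hunk above can be sketched as a small helper. `estimate_api_cost` is a hypothetical name, not part of `tabpfn-client`; only the formula and the default `n_estimators` values (4 for classification, 8 for regression) come from the docs.

```python
# Illustrative sketch of the documented API cost formula; the helper
# itself is hypothetical and not part of the tabpfn-client package.
def estimate_api_cost(num_train_rows: int, num_test_rows: int,
                      num_cols: int, task: str = "classification") -> int:
    # Documented defaults: 4 estimators for classification, 8 for regression.
    n_estimators = 4 if task == "classification" else 8
    return (num_train_rows + num_test_rows) * num_cols * n_estimators

print(estimate_api_cost(1000, 200, 10))                      # → 48000
print(estimate_api_cost(1000, 200, 10, task="regression"))   # → 96000
```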
@@ -56,8 +54,6 @@ Track your API usage through response headers:
 
 ## Current Limitations
 
-### Data Privacy and Security
-
 !!! warning "Important Data Guidelines"
 - Do NOT upload any Personally Identifiable Information (PII)
 - Do NOT upload any sensitive or confidential data
@@ -68,17 +64,13 @@ Track your API usage through response headers:
 ### Size Limitations
 
 1. Maximum total cells per request must be below 100,000:
-```python
-max_cells = (num_train_rows + num_test_rows) * num_cols
 ```
-
-2. For regression with full output (`return_full_output=True`), the number of test samples must be below 500:
-```python
-if task == 'regression' and return_full_output and num_test_samples > 500:
-    raise ValueError("Cannot return full output for regression with >500 test samples")
+(num_train_rows + num_test_rows) * num_cols < 100,000
 ```
 
-These limits will be increased in future releases.
+2. For regression with full output turned on (`return_full_output=True`), the number of test samples must be below 500.
+
+These limits will be relaxed in future releases.
 
 ### Managing User Data
 
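The two size limits in the hunk above can be combined into a pre-flight check. `check_request_limits` is a hypothetical helper, not part of `tabpfn-client`; the `> 500` comparison follows the check in the deleted snippet.

```python
# Illustrative pre-flight validation of the documented size limits;
# the function and argument names are hypothetical.
def check_request_limits(num_train_rows: int, num_test_rows: int, num_cols: int,
                         task: str = "classification",
                         return_full_output: bool = False) -> None:
    # Limit 1: total cells must stay below 100,000.
    total_cells = (num_train_rows + num_test_rows) * num_cols
    if total_cells >= 100_000:
        raise ValueError(f"Total cells ({total_cells}) must be below 100,000")
    # Limit 2: full regression output is capped at 500 test samples.
    if task == "regression" and return_full_output and num_test_rows > 500:
        raise ValueError("Cannot return full output for regression with >500 test samples")

check_request_limits(5000, 1000, 10)  # 60,000 cells: within limits, no error
```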
@@ -99,11 +91,11 @@ The API uses standard HTTP status codes:
 | 400 | Invalid request |
 | 429 | Rate limit exceeded |
 
-Example error response:
+Example response when the limit is reached:
 ```json
 {
   "error": "API_LIMIT_REACHED",
   "message": "Usage limit exceeded",
   "next_available_at": "2024-01-07 00:00:00"
 }
-```
+```
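A client could react to the documented error payload roughly as follows. The field names (`error`, `message`, `next_available_at`) come from the example above; the handling code itself is an illustrative sketch.

```python
import json

# The payload below reproduces the documented example response.
body = '''{
  "error": "API_LIMIT_REACHED",
  "message": "Usage limit exceeded",
  "next_available_at": "2024-01-07 00:00:00"
}'''

payload = json.loads(body)
if payload.get("error") == "API_LIMIT_REACHED":
    # next_available_at tells the client when to retry.
    message = f"Rate limited; retry after {payload['next_available_at']}"
print(message)
```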

docs/getting_started/install.md

Lines changed: 2 additions & 2 deletions
@@ -1,4 +1,4 @@
-You can access our models through our API (https://github.com/automl/tabpfn-client) or via our user interface built on top of the API (https://www.ux.priorlabs.ai/).
+You can access our models through our API (https://github.com/automl/tabpfn-client), via our user interface built on top of the API (https://www.ux.priorlabs.ai/), or locally.
 
 === "Python API Client (No GPU, Online)"
 
@@ -28,4 +28,4 @@ You can access our models through our API (https://github.com/automl/tabpfn-clie
 !!! warning
 R support is currently under development.
 You can find a work in progress at [TabPFN R](https://github.com/robintibor/R-tabpfn).
-Looking for contributors!
+Looking for contributors!

docs/getting_started/intended_use.md

Lines changed: 7 additions & 6 deletions
@@ -3,15 +3,17 @@
 !!! note
 For a simple example getting started with classification see [classification tutorial](../tutorials/classification.md).
 
-We provide a comprehensive demo notebook that guides through installation and functionalities at [Interactive Colab Tutorial (with GPU usage)](https://tinyurl.com/tabpfn-colab-local) and [Interactive Colab Tutorial (without GPU usage)](https://tinyurl.com/tabpfn-colab-online).
+We provide two comprehensive demo notebooks that guide you through installation and functionality: one [colab tutorial using the cloud](https://tinyurl.com/tabpfn-colab-online) and one [colab tutorial using the local GPU](https://tinyurl.com/tabpfn-colab-local).
 
 ### When to use TabPFN
 
-TabPFN excels in handling small to medium-sized datasets with up to 10,000 samples and 500 features. For larger datasets, approaches such as CatBoost, XGB, or AutoGluon are likely to outperform TabPFN.
+TabPFN excels in handling small to medium-sized datasets with up to 10,000 samples and 500 features. For larger datasets, methods such as CatBoost, XGBoost, or AutoGluon are likely to outperform TabPFN.
 
 ### Intended Use of TabPFN
 
-While TabPFN provides a powerful drop-in replacement for traditional tabular data models, achieving top performance on real-world problems often requires domain expertise and the ingenuity of data scientists. Data scientists should continue to apply their skills in feature engineering, data cleaning, and problem framing to get the most out of TabPFN.
+TabPFN is intended as a powerful drop-in replacement for traditional tabular data prediction tools where top performance and fast training matter.
+It still requires data scientists to prepare the data using their domain knowledge.
+Data scientists will see benefits from feature engineering, data cleaning, and problem framing to get the most out of TabPFN.
 
 ### Limitations of TabPFN
 
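The dataset-size guidance in the hunk above can be expressed as a simple check. `tabpfn_suitable` is a hypothetical helper name; the 10,000-sample and 500-feature thresholds are the ones stated in the docs.

```python
# Illustrative suitability check based on the documented guidance:
# TabPFN targets datasets up to 10,000 samples and 500 features.
def tabpfn_suitable(n_samples: int, n_features: int) -> bool:
    return n_samples <= 10_000 and n_features <= 500

print(tabpfn_suitable(8_000, 100))   # → True
print(tabpfn_suitable(50_000, 100))  # → False: consider CatBoost, XGBoost, or AutoGluon
```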
@@ -21,7 +23,7 @@ While TabPFN provides a powerful drop-in replacement for traditional tabular dat
 
 ### Computational and Time Requirements
 
-TabPFN is computationally efficient and can run on consumer hardware for most datasets. Training on a new dataset is recommended to run on a GPU as this speeds it up significantly. However, TabPFN is not optimized for real-time inference tasks.
+TabPFN is computationally efficient and can run inference on consumer hardware for most datasets. Training on a new dataset is best run on a GPU, as this speeds it up significantly. TabPFN is not optimized for real-time inference tasks, but V2 predicts much faster than V1.
 
 ### Data Preparation
 
@@ -33,5 +35,4 @@ TabPFN's predictions come with uncertainty estimates, allowing you to assess the
 
 ### Hyperparameter Tuning
 
-TabPFN provides strong performance out-of-the-box without extensive hyperparameter tuning. If you have additional computational resources, you can further optimize TabPFN's performance using random hyperparameter tuning or the Post-Hoc Ensembling (PHE) technique.
-
+TabPFN provides strong performance out-of-the-box without extensive hyperparameter tuning. If you have additional computational resources, you can automatically tune its hyperparameters using [post-hoc ensembling](https://github.com/PriorLabs/tabpfn-extensions/tree/main/src/tabpfn_extensions/post_hoc_ensembles) or [random tuning](https://github.com/PriorLabs/tabpfn-extensions/tree/main/src/tabpfn_extensions/hpo).
