
Commit a293ceb

Merge pull request #112 from mistralai/doc/v0.0.66
Update docs to v0.0.66
2 parents 8d58430 + a419d49 commit a293ceb

6 files changed: +26 -26 lines changed

docs/capabilities/embeddings.mdx

Lines changed: 1 addition & 1 deletion
@@ -13,7 +13,7 @@ Embeddings are vectorial representations of text that capture the semantic meani
 <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
 </a>
 
-## Mistral Embeddings API
+## Mistral Embed API
 To generate text embeddings using Mistral AI's embeddings API, we can make a request to the API endpoint and specify the embedding model `mistral-embed`, along with providing a list of input texts. The API will then return the corresponding embeddings as numerical vectors, which can be used for further analysis or processing in NLP applications.
 
 ```python
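The renamed section documents the embeddings endpoint described in the paragraph above. As a minimal sketch of such a request (assuming the public REST endpoint `https://api.mistral.ai/v1/embeddings` and a `MISTRAL_API_KEY` environment variable, neither of which is part of this diff):

```python
import os
import requests

# Hedged sketch: request text embeddings from the `mistral-embed` model.
# Endpoint URL and response shape are assumptions based on the public API.
resp = requests.post(
    "https://api.mistral.ai/v1/embeddings",
    headers={"Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}"},
    json={
        "model": "mistral-embed",
        "input": ["Embed this sentence.", "As well as this one."],
    },
    timeout=30,
)
resp.raise_for_status()
# Each item in `data` carries an embedding vector (1024 dimensions per the docs).
embeddings = [item["embedding"] for item in resp.json()["data"]]
print(len(embeddings), len(embeddings[0]))
```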

docs/capabilities/function-calling.mdx

Lines changed: 1 addition & 1 deletion
@@ -17,7 +17,7 @@ Currently, function calling is available for the following models:
 - Mistral Small
 - Mistral Large
 - Mixtral 8x22B
-- Mistral NeMo
+- Mistral Nemo
 
 
 ### Four steps
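The page goes on to walk through the four steps of function calling, which this hunk does not show. Purely as an illustration of what a tool-call request to one of the models listed above looks like, here is a minimal sketch assuming the public chat completions REST endpoint and a hypothetical `get_weather` tool (neither appears in this diff):

```python
import os
import requests

# Hedged sketch: ask a function-calling-capable model (e.g. Mistral Nemo) to pick a tool.
# The `get_weather` tool is a hypothetical example, not part of the documentation diff.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = requests.post(
    "https://api.mistral.ai/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}"},
    json={
        "model": "open-mistral-nemo",
        "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
        "tools": tools,
        "tool_choice": "auto",
    },
    timeout=30,
)
resp.raise_for_status()
# If the model decides to call the tool, the arguments arrive as a JSON string.
print(resp.json()["choices"][0]["message"].get("tool_calls"))
```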

docs/getting-started/Open-weight-models.mdx

Lines changed: 5 additions & 5 deletions
@@ -1,13 +1,13 @@
 ---
 id: open_weight_models
-title: Open-weight models
+title: Apache 2.0 models
 sidebar_position: 1.4
 ---
 
 We open-source both pre-trained models and instruction-tuned models. These models are not tuned for safety as we want to empower users to test and refine moderation based on their use cases. For safer models, follow our [guardrailing tutorial](/capabilities/guardrailing).
 
 ## License
-- Mistral 7B, Mixtral 8x7B, Mixtral 8x22B, Codestral Mamba, Mathstral, and Mistral NeMo are under [Apache 2 License](https://choosealicense.com/licenses/apache-2.0/), which permits their use without any constraints.
+- Mistral 7B, Mixtral 8x7B, Mixtral 8x22B, Codestral Mamba, Mathstral, and Mistral Nemo are under [Apache 2 License](https://choosealicense.com/licenses/apache-2.0/), which permits their use without any constraints.
 - Codestral is under [Mistral AI Non-Production (MNPL) License](https://mistral.ai/licences/MNPL-0.1.md).
 - Mistral Large is under [Mistral Research License](https://mistral.ai/licenses/MRL-0.1.md).
 
@@ -29,8 +29,8 @@ We open-source both pre-trained models and instruction-tuned models. These model
 | Codestral-22B-v0.1 | [Hugging Face](https://huggingface.co/mistralai/Codestral-22B-v0.1) <br/> [raw_weights](https://models.mistralcdn.com/codestral-22b-v0-1/codestral-22B-v0.1.tar) (md5sum: `1ea95d474a1d374b1d1b20a8e0159de3`) | - 32768 vocabulary size <br/> - Supports v3 Tokenizer |
 | Codestral-Mamba-7B-v0.1 | [Hugging Face](https://huggingface.co/mistralai/mamba-codestral-7B-v0.1) <br/> [raw_weights](https://models.mistralcdn.com/codestral-mamba-7b-v0-1/codestral-mamba-7B-v0.1.tar) (md5sum: `d3993e4024d1395910c55db0d11db163`) | - 32768 vocabulary size <br/> - Supports v3 Tokenizer |
 | Mathstral-7B-v0.1 | [Hugging Face](https://huggingface.co/mistralai/mathstral-7B-v0.1) <br/> [raw_weights](https://models.mistralcdn.com/mathstral-7b-v0-1/mathstral-7B-v0.1.tar) (md5sum: `5f05443e94489c261462794b1016f10b`) | - 32768 vocabulary size <br/> - Supports v3 Tokenizer |
-| Mistral-NeMo-Base-2407 | [Hugging Face](https://huggingface.co/mistralai/Mistral-Nemo-Base-2407) <br/> [raw_weights](https://models.mistralcdn.com/mistral-nemo-2407/mistral-nemo-base-2407.tar) (md5sum: `c5d079ac4b55fc1ae35f51f0a3c0eb83`) | - 131k vocabulary size <br/> - Supports tekken.json tokenizer |
-| Mistral-NeMo-Instruct-2407 | [Hugging Face](https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407) <br/> [raw_weights](https://models.mistralcdn.com/mistral-nemo-2407/mistral-nemo-instruct-2407.tar) (md5sum: `296fbdf911cb88e6f0be74cd04827fe7`) | - 131k vocabulary size <br/> - Supports tekken.json tokenizer <br/> - Supports function calling |
+| Mistral-Nemo-Base-2407 | [Hugging Face](https://huggingface.co/mistralai/Mistral-Nemo-Base-2407) <br/> [raw_weights](https://models.mistralcdn.com/mistral-nemo-2407/mistral-nemo-base-2407.tar) (md5sum: `c5d079ac4b55fc1ae35f51f0a3c0eb83`) | - 131k vocabulary size <br/> - Supports tekken.json tokenizer |
+| Mistral-Nemo-Instruct-2407 | [Hugging Face](https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407) <br/> [raw_weights](https://models.mistralcdn.com/mistral-nemo-2407/mistral-nemo-instruct-2407.tar) (md5sum: `296fbdf911cb88e6f0be74cd04827fe7`) | - 131k vocabulary size <br/> - Supports tekken.json tokenizer <br/> - Supports function calling |
 | Mistral-Large-Instruct-2407 | [Hugging Face](https://huggingface.co/mistralai/Mistral-Large-Instruct-2407) <br/> [raw_weights](https://models.mistralcdn.com/mistral-large-2407/mistral-large-instruct-2407.tar) (md5sum: `fc602155f9e39151fba81fcaab2fa7c4`)| - 32768 vocabulary size <br/> - Supports v3 Tokenizer <br/> - Supports function calling |
 
 
@@ -44,7 +44,7 @@ We open-source both pre-trained models and instruction-tuned models. These model
 | Codestral-22B-v0.1 | 22.2B | 22.2B | 60 |
 | Codestral-Mamba-7B-v0.1 | 7.3B | 7.3B | 16 |
 | Mathstral-7B-v0.1 | 7.3B | 7.3B | 16 |
-| Mistral-NeMo-Instruct-2407 | 12B | 12B | 28 - bf16 <br/> 16 - fp8 |
+| Mistral-Nemo-Instruct-2407 | 12B | 12B | 28 - bf16 <br/> 16 - fp8 |
 | Mistral-Large-Instruct-2407 | 123B | 123B | 228 |
 
 ## How to run?
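Before running any of the downloads listed above, the published md5sums can be checked locally. A minimal sketch, assuming the Mistral-Nemo-Instruct-2407 raw-weights tarball was saved as `mistral-nemo-instruct-2407.tar` (the local filename is an assumption; the checksum comes from the table above):

```python
import hashlib
from pathlib import Path

# Hedged sketch: verify a downloaded raw-weights tarball against the md5sum
# published in the download table above.
EXPECTED_MD5 = "296fbdf911cb88e6f0be74cd04827fe7"  # Mistral-Nemo-Instruct-2407
path = Path("mistral-nemo-instruct-2407.tar")

md5 = hashlib.md5()
with path.open("rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):  # read in 1 MiB chunks
        md5.update(chunk)

if md5.hexdigest() == EXPECTED_MD5:
    print("Checksum OK")
else:
    raise SystemExit(f"Checksum mismatch: {md5.hexdigest()}")
```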

docs/getting-started/changelog.mdx

Lines changed: 1 addition & 1 deletion
@@ -11,7 +11,7 @@ July 24, 2024
 - We added fine-tuning support for Codestral, Mistral Nemo and Mistral Large. Now the model choices for fine-tuning are `open-mistral-7b` (v0.3), `mistral-small-latest` (`mistral-small-2402`), `codestral-latest` (`codestral-2405`), `open-mistral-nemo` and , `mistral-large-latest` (`mistral-large-2407`)
 
 July 18, 2024
-- We released Mistral NeMo (`open-mistral-nemo`).
+- We released Mistral Nemo (`open-mistral-nemo`).
 
 July 16, 2024
 - We released Codestral Mamba (`open-codestral-mamba`) and Mathstral.

docs/getting-started/introduction.mdx

Lines changed: 1 addition & 1 deletion
@@ -19,7 +19,7 @@ We release state-of-the-art generalist models, specialized models, and research
 
 ### Specialized models
 - Codestral, our cutting-edge language model for coding released [May 2024](https://mistral.ai/news/codestral/)
-- Mistral Embeddings, our state-of-the-art semantic for extracting representation of text extracts
+- Mistral Embed, our state-of-the-art semantic for extracting representation of text extracts
 
 ### Research models
 - Mistral 7b, our first dense model released [September 2023](https://mistral.ai/news/announcing-mistral-7b/)

docs/getting-started/models.mdx

Lines changed: 17 additions & 17 deletions
@@ -10,27 +10,27 @@ Mistral provides three types of models: state-of-the-art generalist models, spec
 
 - **State-of-the-art generalist models**
 
-| Model | Available Open-weight|Available via API| Description | Max Tokens| API Endpoints|
-|--------------------|:--------------------:|:--------------------:|:--------------------:|:--------------------:|:--------------------:|
-| Mistral Large |:heavy_check_mark: <br/> [Mistral Research License](https://mistral.ai/licenses/MRL-0.1.md)| :heavy_check_mark: |Our flagship model with state-of-the-art reasoning, knowledge, and coding capabilities. It's ideal for complex tasks that require large reasoning capabilities or are highly specialized (Synthetic Text Generation, Code Generation, RAG, or Agents). Learn more on our [blog post](https://mistral.ai/news/mistral-large-2407/)| 128k | `mistral-large-latest`|
-| Mistral NeMo | :heavy_check_mark: <br/> Apache2 | :heavy_check_mark: | A 12B model built with the partnership with Nvidia. It is easy to use and a drop-in replacement in any system using Mistral 7B that it supersedes. Learn more on our [blog post](https://mistral.ai/news/mistral-nemo/) | 128k | `open-mistral-nemo`|
+| Model | Weight availability|Available via API| Description | Max Tokens| API Endpoints|Version|
+|--------------------|:--------------------:|:--------------------:|:--------------------:|:--------------------:|:--------------------:|:--------------------:|
+| Mistral Large |:heavy_check_mark: <br/> [Mistral Research License](https://mistral.ai/licenses/MRL-0.1.md)| :heavy_check_mark: |Our flagship model with state-of-the-art reasoning, knowledge, and coding capabilities. It's ideal for complex tasks that require large reasoning capabilities or are highly specialized (Synthetic Text Generation, Code Generation, RAG, or Agents). Learn more on our [blog post](https://mistral.ai/news/mistral-large-2407/)| 128k | `mistral-large-latest`| 24.07|
+| Mistral Nemo | :heavy_check_mark: <br/> Apache2 | :heavy_check_mark: | A 12B model built with the partnership with Nvidia. It is easy to use and a drop-in replacement in any system using Mistral 7B that it supersedes. Learn more on our [blog post](https://mistral.ai/news/mistral-nemo/) | 128k | `open-mistral-nemo`| 24.07|
 
 - **Specialized models**
 
-| Model | Available Open-weight|Available via API| Description | Max Tokens| API Endpoints|
-|--------------------|:--------------------:|:--------------------:|:--------------------:|:--------------------:|:--------------------:|
-| Codestral |:heavy_check_mark: <br/> [Mistral AI Non-Production License](https://mistral.ai/licenses/MNPL-0.1.md)|:heavy_check_mark: | A cutting-edge generative model that has been specifically designed and optimized for code generation tasks, including fill-in-the-middle and code completion. Learn more on our [blog post](https://mistral.ai/news/codestral/) | 32k | `codestral-latest`|
-| Mistral Embeddings ||:heavy_check_mark: | A model that converts text into numerical vectors of embeddings in 1024 dimensions. Embedding models enable retrieval and retrieval-augmented generation applications. It achieves a retrieval score of 55.26 on MTEB | 8k | `mistral-embed`|
+| Model | Weight availability|Available via API| Description | Max Tokens| API Endpoints|Version|
+|--------------------|:--------------------:|:--------------------:|:--------------------:|:--------------------:|:--------------------:|:--------------------:|
+| Codestral |:heavy_check_mark: <br/> [Mistral AI Non-Production License](https://mistral.ai/licenses/MNPL-0.1.md)|:heavy_check_mark: | A cutting-edge generative model that has been specifically designed and optimized for code generation tasks, including fill-in-the-middle and code completion. Learn more on our [blog post](https://mistral.ai/news/codestral/) | 32k | `codestral-latest`| 24.05|
+| Mistral Embed ||:heavy_check_mark: | A model that converts text into numerical vectors of embeddings in 1024 dimensions. Embedding models enable retrieval and retrieval-augmented generation applications. It achieves a retrieval score of 55.26 on MTEB | 8k | `mistral-embed`| 23.12|
 
 - **Research models**
 
-| Model | Available Open-weight|Available via API| Description | Max Tokens| API Endpoints|
-|--------------------|:--------------------:|:--------------------:|:--------------------:|:--------------------:|:--------------------:|
-| Mistral 7B | :heavy_check_mark: <br/> Apache2 |:heavy_check_mark: |The first dense model released by Mistral AI, perfect for experimentation, customization, and quick iteration. At the time of the release, it matched the capabilities of models up to 30B parameters. Learn more on our [blog post](https://mistral.ai/news/announcing-mistral-7b/)| 32k | `open-mistral-7b`|
-| Mixtral 8x7B |:heavy_check_mark: <br/> Apache2 | :heavy_check_mark: |A sparse mixture of experts model. As such, it leverages up to 45B parameters but only uses about 12B during inference, leading to better inference throughput at the cost of more vRAM. Learn more on the dedicated [blog post](https://mistral.ai/news/mixtral-of-experts/)| 32k | `open-mixtral-8x7b`|
-| Mixtral 8x22B |:heavy_check_mark: <br/> Apache2 | :heavy_check_mark: |A bigger sparse mixture of experts model. As such, it leverages up to 141B parameters but only uses about 39B during inference, leading to better inference throughput at the cost of more vRAM. Learn more on the dedicated [blog post](https://mistral.ai/news/mixtral-8x22b/)| 64k | `open-mixtral-8x22b`|
-| Mathstral | :heavy_check_mark: <br/> Apache2 | | A math-specific 7B model designed for math reasoning and scientific tasks. Learn more on our [blog post](https://mistral.ai/news/mathstral/) | 32k | NA|
-| Codestral Mamba | :heavy_check_mark: <br/> Apache2 | :heavy_check_mark: | A Mamba 2 language model specialized in code generation. Learn more on our [blog post](https://mistral.ai/news/codestral-mamba/) | 256k | `open-codestral-mamba`|
+| Model | Weight availability|Available via API| Description | Max Tokens| API Endpoints|Version|
+|--------------------|:--------------------:|:--------------------:|:--------------------:|:--------------------:|:--------------------:|:--------------------:|
+| Mistral 7B | :heavy_check_mark: <br/> Apache2 |:heavy_check_mark: |The first dense model released by Mistral AI, perfect for experimentation, customization, and quick iteration. At the time of the release, it matched the capabilities of models up to 30B parameters. Learn more on our [blog post](https://mistral.ai/news/announcing-mistral-7b/)| 32k | `open-mistral-7b`| v0.3|
+| Mixtral 8x7B |:heavy_check_mark: <br/> Apache2 | :heavy_check_mark: |A sparse mixture of experts model. As such, it leverages up to 45B parameters but only uses about 12B during inference, leading to better inference throughput at the cost of more vRAM. Learn more on the dedicated [blog post](https://mistral.ai/news/mixtral-of-experts/)| 32k | `open-mixtral-8x7b`| v0.1|
+| Mixtral 8x22B |:heavy_check_mark: <br/> Apache2 | :heavy_check_mark: |A bigger sparse mixture of experts model. As such, it leverages up to 141B parameters but only uses about 39B during inference, leading to better inference throughput at the cost of more vRAM. Learn more on the dedicated [blog post](https://mistral.ai/news/mixtral-8x22b/)| 64k | `open-mixtral-8x22b`| v0.1|
+| Mathstral | :heavy_check_mark: <br/> Apache2 | | A math-specific 7B model designed for math reasoning and scientific tasks. Learn more on our [blog post](https://mistral.ai/news/mathstral/) | 32k | NA| v0.1|
+| Codestral Mamba | :heavy_check_mark: <br/> Apache2 | :heavy_check_mark: | A Mamba 2 language model specialized in code generation. Learn more on our [blog post](https://mistral.ai/news/codestral-mamba/) | 256k | `open-codestral-mamba`| v0.1|
 
 ## Pricing
 
@@ -67,7 +67,7 @@ It can be used for complex multilingual reasoning tasks, including text understa
 - [Codestral](https://mistral.ai/news/codestral/): as a 22B model, Codestral sets a new standard on the performance/latency space for code generation compared to previous models used for coding.
 - [Codestral-Mamba](https://mistral.ai/news/codestral-mamba/): we have trained this model with advanced code and reasoning capabilities, enabling the model to have a strong performance on par with SOTA transformer-based models.
 - [Mathstral](https://mistral.ai/news/mathstral/): Mathstral stands on the shoulders of Mistral 7B and specialises in STEM subjects. It achieves state-of-the-art reasoning capacities in its size category across various industry-standard benchmarks.
-- [Mistral NeMo](https://mistral.ai/news/mistral-nemo/): Mistral NeMo's reasoning, world knowledge, and coding performance are state-of-the-art in its size category. As it relies on standard architecture, Mistral NeMo is easy to use and a drop-in replacement in any system using Mistral 7B that it supersedes.
+- [Mistral Nemo](https://mistral.ai/news/mistral-nemo/): Mistral Nemo's reasoning, world knowledge, and coding performance are state-of-the-art in its size category. As it relies on standard architecture, Mistral Nemo is easy to use and a drop-in replacement in any system using Mistral 7B that it supersedes.
 
 
 ## Picking a model
@@ -86,7 +86,7 @@ Today, Mistral models are behind many LLM applications at scale. Here is a brief
 When selecting a model, it is essential to evaluate the performance, and cost trade-offs. Depending on what’s most important for your application, your choice may differ significantly. Note that the models will be updated over time, the information we share below only reflect the current state of the models.
 
 In general, the larger the model, the better the performance. For instance, when looking at the popular benchmark MMLU (Massive Multitask Language Understanding), the performance ranking of Mistral’s models is as follows:
-- Mistral Large (84.0%) > Mistral 8x22B (77.8%) > Mistral Small (72.2%) > Mixtral 8x7B (70.6%) > Mistral NeMo (68%) > Mistral 7B (62.5%).
+- Mistral Large (84.0%) > Mistral 8x22B (77.8%) > Mistral Small (72.2%) > Mixtral 8x7B (70.6%) > Mistral Nemo (68%) > Mistral 7B (62.5%).
 
 Notably, Mistral Large is currently outperforming all other four models across almost all benchmarks.
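The model tables earlier in this file identify each API-served model by an endpoint name such as `open-mistral-nemo` or `mistral-large-latest`. As a hedged sketch of querying one of those endpoints (the REST URL and payload shape are assumptions based on the public chat completions API, not something this diff specifies):

```python
import os
import requests

# Hedged sketch: send a chat completion request to one of the endpoint names
# listed in the tables above. URL and payload shape are assumptions.
resp = requests.post(
    "https://api.mistral.ai/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}"},
    json={
        "model": "open-mistral-nemo",  # or "mistral-large-latest", etc.
        "messages": [{"role": "user", "content": "Summarize MMLU in one sentence."}],
        "max_tokens": 128,
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```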
