Add token view and model family selection #7
base: main
Conversation
Pull Request Overview
This PR adds a token view feature for visual token highlighting and simplifies model selection by grouping models into families rather than individual models. It also adds support for Google's Gemini model family.
- Introduces a Token View feature that highlights tokens with alternating colors for better visualization (a minimal sketch follows this list)
- Simplifies model selection by organizing models into families (OpenAI, Anthropic, Google) instead of individual models
- Adds support for Google Gemini model family with appropriate tokenizer mapping
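For context on how the alternating highlight typically works, below is a minimal sketch built on VS Code's decoration API. The colors, the `highlightTokens` name, and the offset bookkeeping are assumptions for illustration, not the PR's actual code; it presumes a `tokens` array and `decode` helper like those in the diffs below.

```js
const vscode = require('vscode');

// Two alternating backgrounds; the colors here are illustrative choices.
const evenType = vscode.window.createTextEditorDecorationType({
  backgroundColor: 'rgba(255, 165, 0, 0.25)',
});
const oddType = vscode.window.createTextEditorDecorationType({
  backgroundColor: 'rgba(0, 128, 255, 0.25)',
});

// Assumed helper: walk the tokens, compute each token's text span,
// and assign spans alternately to the two decoration types.
function highlightTokens(editor, tokens, decode) {
  const doc = editor.document;
  const even = [];
  const odd = [];
  let offset = 0;
  tokens.forEach((t, i) => {
    const piece = decode(t); // token text, per this PR's decode helpers
    const range = new vscode.Range(
      doc.positionAt(offset),
      doc.positionAt(offset + piece.length)
    );
    (i % 2 === 0 ? even : odd).push(range);
    offset += piece.length;
  });
  editor.setDecorations(evenType, even);
  editor.setDecorations(oddType, odd);
}
```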
Reviewed Changes
Copilot reviewed 4 out of 5 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| src/extension.js | Implements token highlighting functionality and reorganizes model selection from individual models to families |
| package.json | Updates version to 1.3.0 |
| README.md | Documents new Token View feature and model family selection changes |
| CHANGELOG.md | Records version 1.3.0 changes |
```js
'gpt-4o': 'gpt-4o',
'gpt-4': 'gpt-4',
'gpt-3.5': 'gpt-3.5-turbo',
'claude-4*': 'claude-3.7*',
```
Copilot AI · Aug 2, 2025
The mapping for 'claude-4*' points to 'claude-3.7*' instead of a corresponding Claude-4 model. This appears to be incorrect and could cause confusion about which model is actually being used for tokenization.
Suggested change:
```diff
- 'claude-4*': 'claude-3.7*',
+ 'claude-4*': 'claude-4*',
```
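As a side note on how wildcard keys like `'claude-4*'` are usually consumed: the lookup is a prefix match over the map. A small sketch under that assumption; `TOKENIZER_MAP` and `resolveTokenizer` are hypothetical names, not identifiers from this PR.

```js
// Hypothetical shape of the family → tokenizer map from the diff above.
const TOKENIZER_MAP = {
  'gpt-4o': 'gpt-4o',
  'gpt-4': 'gpt-4',
  'gpt-3.5': 'gpt-3.5-turbo',
  'claude-4*': 'claude-4*', // with the fix suggested above
};

// Resolve a concrete model name: exact match for plain keys,
// prefix match for keys ending in '*'.
function resolveTokenizer(model) {
  for (const [pattern, target] of Object.entries(TOKENIZER_MAP)) {
    const matches = pattern.endsWith('*')
      ? model.startsWith(pattern.slice(0, -1))
      : model === pattern;
    if (matches) return target;
  }
  return null; // no family matched
}
```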
```js
'claude-4*': 'claude-3.7*',
'claude-3.7*': 'claude-3.7*',
'claude-3.5*': 'claude-3.5*',
'gemini*': 'cl100k_base'
```
Copilot AI · Aug 2, 2025
The Gemini model family is mapped to 'cl100k_base' which is an OpenAI tokenizer, not a Google tokenizer. This will produce incorrect token counts for Gemini models and should use an appropriate Google tokenizer or approximation method.
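If exact Gemini counts matter, one option is Google's own token-counting endpoint, with a cruder local approximation as a fallback. A sketch assuming the `@google/generative-ai` SDK; the 4-characters-per-token heuristic in `approxTokens` is a rough rule of thumb, not a documented constant.

```js
const { GoogleGenerativeAI } = require('@google/generative-ai');

// Accurate count via Google's countTokens endpoint
// (needs an API key and a network round trip).
async function geminiTokenCount(text, apiKey) {
  const genAI = new GoogleGenerativeAI(apiKey);
  const model = genAI.getGenerativeModel({ model: 'gemini-1.5-flash' });
  const { totalTokens } = await model.countTokens(text);
  return totalTokens;
}

// Offline fallback: rough character-based approximation (assumed heuristic).
function approxTokens(text) {
  return Math.ceil(text.length / 4);
}
```

Note that `countTokens` returns only a total, so the Token View's per-token highlighting would still need a local tokenizer or approximation for token boundaries.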
```diff
  currentModel = model;

- if (currentProvider === 'openai') {
+ if (currentProvider === 'openai' || currentProvider === 'google') {
```
Copilot AI · Aug 2, 2025
Google models are being treated the same as OpenAI models for tokenization, but they should use different tokenizers. Google models will incorrectly use OpenAI's encoding_for_model function instead of an appropriate Google tokenizer.
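One way to avoid that is an explicit branch per provider rather than folding Google into the OpenAI path. A sketch, assuming the `tiktoken` package's `encoding_for_model` for the OpenAI branch; `encodeForProvider` is a hypothetical helper, not the extension's actual function.

```js
const { encoding_for_model } = require('tiktoken');

// Route each provider to its own tokenizer instead of funneling
// Google models through OpenAI encodings.
function encodeForProvider(provider, model, text) {
  if (provider === 'openai') {
    const enc = encoding_for_model(model); // e.g. 'gpt-4o'
    const tokens = Array.from(enc.encode(text));
    enc.free();
    return tokens;
  }
  if (provider === 'google') {
    // No local Gemini tokenizer here; defer to the countTokens
    // sketch above, or an approximation, for counts.
    return null;
  }
  return null; // Anthropic is handled via @anthropic-ai/tokenizer elsewhere
}
```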
```js
const { getTokenizer } = require('@anthropic-ai/tokenizer');
const tok = getTokenizer();
tokens = Array.from(tok.encode(text, 'all'));
decode = t => Buffer.from(tok.decode_single_token_bytes(t)).toString();
tok.free();
```
Copilot AI · Aug 2, 2025
The Anthropic tokenizer is being required and initialized on every token highlighting call. This should be moved to module-level or cached to avoid repeated imports and initializations.
Suggested change:
```diff
- const { getTokenizer } = require('@anthropic-ai/tokenizer');
- const tok = getTokenizer();
- tokens = Array.from(tok.encode(text, 'all'));
- decode = t => Buffer.from(tok.decode_single_token_bytes(t)).toString();
- tok.free();
+ if (!anthropicTokenizer) {
+   anthropicTokenizer = getTokenizer();
+ }
+ tokens = Array.from(anthropicTokenizer.encode(text, 'all'));
+ decode = t => Buffer.from(anthropicTokenizer.decode_single_token_bytes(t)).toString();
```
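In a VS Code extension, that cache naturally lives at module scope, with cleanup deferred to deactivation. A sketch of that lifecycle under the same assumptions as the suggestion above; `deactivate` is the standard extension hook.

```js
const { getTokenizer } = require('@anthropic-ai/tokenizer');

let anthropicTokenizer = null; // module-level cache, created lazily

function getAnthropicTokenizer() {
  if (!anthropicTokenizer) {
    anthropicTokenizer = getTokenizer();
  }
  return anthropicTokenizer;
}

// Standard VS Code deactivate hook: release the tokenizer exactly once,
// instead of freeing it on every highlighting call.
function deactivate() {
  if (anthropicTokenizer) {
    anthropicTokenizer.free();
    anthropicTokenizer = null;
  }
}
```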
```js
const tok = getTokenizer();
tokens = Array.from(tok.encode(text, 'all'));
decode = t => Buffer.from(tok.decode_single_token_bytes(t)).toString();
tok.free();
```
Copilot AI · Aug 2, 2025
The tokenizer is freed immediately after use, but the decode function created on line 142 still references it. This will cause errors when the decode function is called later in the loop.
Suggested change:
```diff
- tok.free();
```
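If the tokenizer must still be freed eagerly, an alternative fix is to materialize all decoded strings before calling `free()`, so no closure outlives the instance. A sketch along those lines, with variable names mirroring the diff:

```js
const tok = getTokenizer();
const tokens = Array.from(tok.encode(text, 'all'));
// Decode every token up front so nothing references tok after free().
const pieces = tokens.map(
  t => Buffer.from(tok.decode_single_token_bytes(t)).toString()
);
tok.free(); // safe: the highlighting loop reads pieces[i], not tok
```

The highlighting loop would then index into `pieces` by position instead of calling `decode(t)` after the free.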
Summary
Testing
- `npm run lint`
- `npm test` (fails: runs but produces no output)