Conversation

@BedirT BedirT commented Aug 2, 2025

Summary

  • enable alternating two-color token highlight via new Token View option
  • reduce model selection to model families and add Gemini family
  • document new feature in README and changelog
  • bump version to 1.3.0

Testing

  • npm run lint
  • npm test (fails: the command runs but produces no output)

@BedirT BedirT requested a review from Copilot August 2, 2025 21:22
@BedirT BedirT linked an issue Aug 2, 2025 that may be closed by this pull request
Copilot AI left a comment

Pull Request Overview

This PR adds a token view feature for visual token highlighting and simplifies model selection by grouping models into families rather than individual models. It also adds support for Google's Gemini model family.

  • Introduces a Token View feature that highlights tokens with alternating colors for better visualization
  • Simplifies model selection by organizing models into families (OpenAI, Anthropic, Google) instead of individual models
  • Adds support for Google Gemini model family with appropriate tokenizer mapping

Reviewed Changes

Copilot reviewed 4 out of 5 changed files in this pull request and generated 5 comments.

File             | Description
src/extension.js | Implements token highlighting functionality and reorganizes model selection from individual models to families
package.json     | Updates version to 1.3.0
README.md        | Documents new Token View feature and model family selection changes
CHANGELOG.md     | Records version 1.3.0 changes

'gpt-4o': 'gpt-4o',
'gpt-4': 'gpt-4',
'gpt-3.5': 'gpt-3.5-turbo',
'claude-4*': 'claude-3.7*',
Copilot AI Aug 2, 2025

The mapping for 'claude-4*' points to 'claude-3.7*' instead of a corresponding Claude-4 model. This appears to be incorrect and could cause confusion about which model is actually being used for tokenization.

Suggested change
'claude-4*': 'claude-3.7*',
'claude-4*': 'claude-4*',
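
The wildcard keys in the mapping above imply a pattern-matching resolution step. A minimal sketch of how such a family pattern could resolve to a tokenizer key — the helper name `resolveTokenizerModel` and the lookup logic are illustrative assumptions, not the extension's actual implementation:

```javascript
// Illustrative sketch: resolve a model-family pattern such as 'claude-4*'
// to a concrete tokenizer key. Exact matches win; otherwise wildcard
// prefixes are tried. Table entries mirror the diff above.
const FAMILY_TO_TOKENIZER = {
  'gpt-4o': 'gpt-4o',
  'gpt-4': 'gpt-4',
  'gpt-3.5': 'gpt-3.5-turbo',
  'claude-4*': 'claude-4*',
};

function resolveTokenizerModel(model) {
  if (FAMILY_TO_TOKENIZER[model]) return FAMILY_TO_TOKENIZER[model];
  for (const [pattern, target] of Object.entries(FAMILY_TO_TOKENIZER)) {
    if (pattern.endsWith('*') && model.startsWith(pattern.slice(0, -1))) {
      return target;
    }
  }
  return null; // unknown family; caller falls back to a default
}
```

Checking exact matches before wildcards keeps 'gpt-4o' from being swallowed by a hypothetical 'gpt-4*' pattern.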

'claude-4*': 'claude-3.7*',
'claude-3.7*': 'claude-3.7*',
'claude-3.5*': 'claude-3.5*',
'gemini*': 'cl100k_base'
Copilot AI Aug 2, 2025

The Gemini model family is mapped to 'cl100k_base' which is an OpenAI tokenizer, not a Google tokenizer. This will produce incorrect token counts for Gemini models and should use an appropriate Google tokenizer or approximation method.
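
Since no local Gemini tokenizer ships with the extension, one possible fallback — an assumption for illustration, not what this PR implements — is a character-based heuristic (roughly 4 characters per token for English-like text), clearly labeled as approximate in the UI:

```javascript
// Illustrative fallback only: approximate Gemini token counts with a
// character heuristic (~4 characters per token for English-like text).
// This is an assumption for the sketch, not Google's tokenizer; exact
// counts require Google's countTokens API.
function approximateGeminiTokens(text) {
  if (!text) return 0;
  return Math.ceil(text.length / 4);
}
```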

currentModel = model;

if (currentProvider === 'openai') {
if (currentProvider === 'openai' || currentProvider === 'google') {
Copilot AI Aug 2, 2025

Google models are being treated the same as OpenAI models for tokenization, but they should use different tokenizers. Google models will incorrectly use OpenAI's encoding_for_model function instead of an appropriate Google tokenizer.
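
Rather than merging the branches, a per-provider dispatch would keep the strategies separate. A hypothetical sketch — the strategy names and structure are placeholders, not the extension's code:

```javascript
// Hypothetical dispatch sketch: choose a tokenization strategy per
// provider instead of funnelling Google through OpenAI's
// encoding_for_model. Strategy names here are illustrative.
function tokenizerStrategy(provider) {
  switch (provider) {
    case 'openai':
      return 'tiktoken';       // OpenAI models: tiktoken encodings
    case 'anthropic':
      return 'anthropic';      // Claude models: @anthropic-ai/tokenizer
    case 'google':
    default:
      return 'approximation';  // no local Gemini tokenizer available
  }
}
```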

Comment on lines +139 to +143
const { getTokenizer } = require('@anthropic-ai/tokenizer');
const tok = getTokenizer();
tokens = Array.from(tok.encode(text, 'all'));
decode = t => Buffer.from(tok.decode_single_token_bytes(t)).toString();
tok.free();
Copilot AI Aug 2, 2025

The Anthropic tokenizer is being required and initialized on every token highlighting call. This should be moved to module-level or cached to avoid repeated imports and initializations.

Suggested change
const { getTokenizer } = require('@anthropic-ai/tokenizer');
const tok = getTokenizer();
tokens = Array.from(tok.encode(text, 'all'));
decode = t => Buffer.from(tok.decode_single_token_bytes(t)).toString();
tok.free();
if (!anthropicTokenizer) {
    const { getTokenizer } = require('@anthropic-ai/tokenizer');
    anthropicTokenizer = getTokenizer();
}
tokens = Array.from(anthropicTokenizer.encode(text, 'all'));
decode = t => Buffer.from(anthropicTokenizer.decode_single_token_bytes(t)).toString();
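
The caching the review suggests is a standard lazy-initialization pattern. A self-contained sketch with a stub factory standing in for `getTokenizer()` — the counter is instrumentation for the sketch only, not part of the fix:

```javascript
// Generic lazy-initialization sketch of the suggested caching.
// `createTokenizer` is a stand-in stub; in the extension it would be
// getTokenizer() from @anthropic-ai/tokenizer, created at most once.
let cachedTokenizer = null;
let initCount = 0; // instrumentation for the sketch only

function createTokenizer() {
  initCount += 1;
  // Stub tokenizer: encodes a string as its character codes.
  return { encode: text => Array.from(text, c => c.charCodeAt(0)) };
}

function getCachedTokenizer() {
  if (!cachedTokenizer) {
    cachedTokenizer = createTokenizer();
  }
  return cachedTokenizer;
}
```

Repeated calls reuse the same instance, so the expensive construction runs once per session instead of once per highlight pass.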

const tok = getTokenizer();
tokens = Array.from(tok.encode(text, 'all'));
decode = t => Buffer.from(tok.decode_single_token_bytes(t)).toString();
tok.free();
Copilot AI Aug 2, 2025

The tokenizer is freed immediately after use, but the decode function created on line 142 still references it. This will cause errors when the decode function is called later in the loop.

Suggested change
tok.free();
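
The hazard being flagged — a closure outliving the freed tokenizer — can be demonstrated with a stub. This sketch is illustrative (the stub mimics a WASM-backed tokenizer that throws once freed); the fix is simply to call `free()` only after every decode has run:

```javascript
// Sketch of the use-after-free hazard: the decode closure still holds
// the tokenizer, so free() must run only after all decode calls.
// makeStubTokenizer mimics a native-backed tokenizer for illustration.
function makeStubTokenizer() {
  let freed = false;
  return {
    decode(t) {
      if (freed) throw new Error('tokenizer used after free');
      return String.fromCharCode(t);
    },
    free() { freed = true; },
  };
}

function decodeAll(tokens) {
  const tok = makeStubTokenizer();
  const decode = t => tok.decode(t);
  const out = tokens.map(decode); // all decoding happens here...
  tok.free();                     // ...and free() only afterwards
  return out;
}
```

Calling `tok.free()` before the `map` would make every `decode` throw; deferring it, as above, is safe.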

Development

Successfully merging this pull request may close these issues.

Highlight tokens