Skip to content

Conversation

kfule
Copy link
Contributor

@kfule kfule commented Sep 13, 2025

This fixes the URI decoder used in the Router to decode more strictly. Also, this reduces bundle size by removing unnecessary code, intermediate object, and intermediate variables.

Description

This PR reduces the bundle file size by implementing the following changes (including URI decoder fix).

  • Removal of unnecessary codes
    • (router) Code related to setImmediate, which is almost never used outside of IE11
    • (render) Parameter of the render factory
  • Remove the intermediate object of domFor by splitting delayedRemoval into a separate file.
  • Replace two duplicate decodeURIComponentSave functions with one shared version
    • Also, this PR fixes the decoder to enable more strict decoding.
  • Suppress the generation of intermediate variables in the bundle
    • changing the module loading order
    • changing the name of the variable into which the module is loaded
  • Disable terser's conditionals compression option
    • bundle gzipped size is reduced and performance degradation is also improved

Except for bug fixes in the URI decoder, all of the changes do not break existing tests, of course.
In addition, the readability of bundle file improves with the removal of intermediate objects and variables.

bundle size comparison

(before) v2.3.7

Original size: 20,133 bytes gzipped (67,544 bytes uncompressed)
Compiled size: 9,035 bytes gzipped (24,481 bytes uncompressed)

(after) This PR

Original size: 20,327 bytes gzipped (67,906 bytes uncompressed)
Compiled size: 8,982 bytes gzipped (24,831 bytes uncompressed)

This PR reduces the bundle size to 8.98kb (8,982 bytes) gzipped.

Motivation and Context

Regarding the bundle size, it dropped below 9 kB gzipped in v2.3.4 but climbed back above 9 kB gzipped starting in v2.3.5, which was a bit disappointing for me. So I decided to work on reducing the bundle file size.

Also, I am interested in the readability of the bundle file (mithril.js file), and I believe this PR contributes to improving readability.

How Has This Been Tested?

npm run test

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist

  • My code follows the code style of this project.
  • I have added tests to cover my changes.
  • All new and existing tests passed.
  • My change requires a documentation update, and I've opened a pull request to update it already:
  • I have read https://mithril.js.org/contributing.html.

setImmediate() is not supported in almost all environments except IE11. Furthermore, by not using setImmediate(), the assignment code to callAsync itself, which takes non-browser environments into account, can be removed.
The `window` is no longer used within m.render().
@kfule kfule requested a review from a team as a code owner September 13, 2025 13:04
The bundle size can be reduced by removing the intermediate object used to export domFor and delayedRemoval.
Additionally, the file paths used in domFor-related require() calls have been standardized.
…undle by changing the module loading order and variable name

Mithril's internal bundler enables module bundling without generating intermediate variables by carefully managing module loading order and variable names. This commit prevents assigning `mountRedraw`, `decodeURIComponentSave`, and `hyperscript` to intermediate variables.

It also improves readability by moving all member additions for `m` to the end of the file.
Copy link
Member

@dead-claudia dead-claudia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly just small nits.

I do want to see that URI decoder bug fix, though. Yes, it's a +1, but we're already here doing things to it, and if it doesn't happen now, it'll probably never happen.

Comment on lines 3 to 9
module.exports = function(str) {
try {
return decodeURIComponent(str)
} catch(err) {
return str
}
}
Copy link
Member

@dead-claudia dead-claudia Sep 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this will result in a slight bundle increase, but can you replace this while we're at it with a string replace instead so we can be more tolerant with our decoding? Also, I'd like to see new tests validating this behavioral change. (It's honestly a bug fix more than anything.)

As the change involves string replace detection, you'll need to add several tests for this. For the sake of not blowing up test runtime, please put the relevant tests in a separate file instead of adding it to the existing router test file.
That way, you can exhaustively test it with some loops inside a single test body. There's 1114112 = 2^20 + 2^16 total code points in Unicode, including surrogates, and testing that 21 times, one for each origin + prefix pair in the router tests, is just plainly not worth the test runtime.

The idea is to only feed decodeURIComponent the valid UTF-8 encodings while decoding the rest. The comment in the suggestion explains the regexp.

By the way, String.prototype.replace is usually much better optimized than decodeURIComponent, and so this can provide a small speed boost. This is especially true in V8, since decodeURIComponent builds a complete copy byte-by-byte while String.prototype.replace just creates non-copying slices and combines them with cons nodes.

Suggested change
module.exports = function(str) {
try {
return decodeURIComponent(str)
} catch(err) {
return str
}
}
/*
Percent encodings encode UTF-8 bytes, so this regexp needs to match that.
Here's how UTF-8 encodes stuff:
- `00-7F`: 1-byte, for U+0000-U+007F
- `C2-DF 80-BF`: 2-byte, for U+0080-U+07FF
- `E0-EF 80-BF 80-BF`: 3-byte, encodes U+0800-U+FFFF
- `F0-F4 80-BF 80-BF 80-BF`: 4-byte, encodes U+10000-U+10FFFF
In this, there's a number of invalid byte sequences:
- `80-BF`: Continuation byte, invalid as start
- `C0-C1 80-BF`: Overlong encoding for U+0000-U+007F
- `E0 80-9F 80-BF`: Overlong encoding for U+0080-U+07FF
- `ED A0-BF 80-BF`: Encoding for UTF-16 surrogate U+D800-U+DFFF
- `F0 80-8F 80-BF 80-BF`: Overlong encoding for U+0800-U+FFFF
- `F4 90-BF`: RFC 3629 restricted UTF-8 to only code points UTF-16 could encode.
- `F5-FF`: RFC 3629 restricted UTF-8 to only code points UTF-16 could encode.
So in reality, only the following sequences can encode are valid characters:
- 00-7F
- C2-DF 80-BF
- E0 A0-BF 80-BF
- E1-EC 80-BF 80-BF
- ED 80-9F 80-BF
- EE-EF 80-BF 80-BF
- F0 90-BF 80-BF 80-BF
- F1-F3 80-BF 80-BF 80-BF
- F4 80-8F 80-BF 80-BF
The regexp just tries to match this as compactly as possible.
*/
var validUtf8Encodings = /%(?:[0-7]|(?!c0|c1|e0%[89]|ed%[ab]|f0%8|f4%[9ab])(?:c|d|(?:e|f[0-4]%[89ab])[\da-f]%[89ab])[\da-f]%[89ab])[\da-f]/gi
module.exports = function(str) {
return str.replace(validUtf8Encodings, decodeURIComponent)
}

Copy link
Contributor Author

@kfule kfule Sep 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The range written in the comment may be incorrect. (That regular expression seems correct, though.)

So in reality, only the following sequences can encode are valid characters:
- 00-7F
- C2-DF 80-BF
- E0    AF-BF 80-BF
- E1-EC 80-DF 80-BF
- ED    80-9F 80-BF
- EE-EF 80-DF 80-BF
- F0    90-BF 80-BF 80-BF
- F1-F3 80-BF 80-BF 80-BF
- F4    80-8F 80-BF 80-BF

The correct one is as follows:

- 00-7F
- C2-DF 80-BF
- E0    A0(*)-BF 80-BF
- E1-EC 80-BF(*) 80-BF
- ED    80-9F 80-BF
- EE-EF 80-BF(*) 80-BF
- F0    90-BF 80-BF 80-BF
- F1-F3 80-BF 80-BF 80-BF
- F4    80-8F 80-BF 80-BF

Additionally, it might be a good idea to correct the code point range in the following comment.

// now (wrong?)
- `F0 80-8F 80-BF 80-BF`: Overlong encoding for U+0800-U+DFFF

// correct
- `F0 80-8F 80-BF 80-BF`: Overlong encoding for U+0800-U+FFFF

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kfule Good catch. Fixed the comment suggestion. And you're correct.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the safe decode change, you can simplify lines 58-59 to just do var path = decodeURIComponentSafe(prefix).slice(0, route.prefix.length).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please rename this to decodeURIComponentSafe. The previous names were wrong, and I'm not sure how I kept missing this through the years.

Comment on lines -20 to -23
var callAsync = $window == null
// In case Mithril.js' loaded globally without the DOM, let's not break
? null
: typeof $window.setImmediate === "function" ? $window.setImmediate : $window.setTimeout
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update the callAsync test helper to also only call setTimeout. Node.js schedules setImmediate(f) into the same queue as setTimeout(f, 0), but with different timings.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I simply removed setImmediate from the callAsync test helper. Is this OK?

Comment on lines +7 to +10
const vdom = require("../../render/render")
const m = require("../../render/hyperscript")
const fragment = require("../../render/fragment")
const domFor = require("../../render/domFor")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're going to fix these for consistency, just remove the redundant ../render in the domFor import. No need to change the others - I'd like to still see the history, and shorter relative paths are better (and less confusing).

Suggested change
const vdom = require("../../render/render")
const m = require("../../render/hyperscript")
const fragment = require("../../render/fragment")
const domFor = require("../../render/domFor")
const vdom = require("../render")
const m = require("../hyperscript")
const fragment = require("../fragment")
const domFor = require("../domFor")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To follow the style of other test files, I would like to maintain this verbose style for file paths here. All file paths in test code except for this file are consistently written in a verbose style.
Also, the consistently verbose file paths within the test code help when hacking require() to test bundle files.

render/render.js Outdated
Comment on lines 3 to 6
var Vnode = require("../render/vnode")
var df = require("../render/domFor")
var delayedRemoval = df.delayedRemoval
var domFor = df.domFor
var cachedAttrsIsStaticMap = require("./cachedAttrsIsStaticMap")
var delayedRemoval = require("../render/delayedRemoval")
var domFor = require("../render/domFor")
var cachedAttrsIsStaticMap = require("../render/cachedAttrsIsStaticMap")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you remove all the redundant ../render components in these?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, fixed it.

render/domFor.js Outdated
"use strict"

var delayedRemoval = new WeakMap
var delayedRemoval = require("../render/delayedRemoval")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not leave redundant import path components here.

Suggested change
var delayedRemoval = require("../render/delayedRemoval")
var delayedRemoval = require("./delayedRemoval")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@kfule
Copy link
Contributor Author

kfule commented Sep 15, 2025

I've addressed all comments except for the additional fixes to the URI decoder. The decoder fixes, including test code, may take a little time.

Some router-related tests have fixed issues where UTF-8 strings were not decoded.
… types

This will make the results more consistent with decodeURIComponent.
@kfule kfule changed the title [refactor] Reducing bundle size through module tailoring and cleanup Fix URI decoder bug and reduce bundle size through module tailoring and cleanup Sep 15, 2025
@kfule
Copy link
Contributor Author

kfule commented Sep 15, 2025

@dead-claudia
Thank you for your review comments. I believe I've addressed all of the issues, including the URI decoder.
If there are any missing test cases for the URI decoder, I would appreciate it if you could let me know in detail😃

In the current PR, I have tested all code points exhaustively, excluding surrogates. This has significantly increased the number of test cases reported on the console. Is this acceptable for you?

@kfule kfule requested a review from dead-claudia September 15, 2025 13:15
…rect mangling by internal bundler

Because "c" was being improperly mangled and the "c0" suffix was being improperly removed, it was changed to "c[01]" rather than just dropping the literal use.
Also, since the backslash in "\d" is escaped ("\\d"), I simply changed it to "0-9" of the same string length.
@kfule

This comment was marked as resolved.

@dead-claudia
Copy link
Member

@kfule Please just fix the bundler to ignore regexps instead. (This isn't the first time we've had regexps mangle stuff.)

It's a bit tricky, but at least for now, you should be able to get away with just detecting and skipping regexp literals following common safe stuff like =, {, (, and [. This regexp, while it doesn't technically capture all cases a regexp could appear, should hopefully work for now:

/[={([](?:[\s\u2028\u2029]|//.*?[\r\n\u2028\u2029]|\/\*[\s\S]*?\*\/)*\/(?:[^\/[\r\n\u2028\u2029]|\\[^\r\n\u2028\u2029]|\[([^\]\\\r\n\u2028\u2029]|\\[^\r\n\u2028\u2029]])*])+\/[$\p{ID_Continue}]+/u

Ref: https://tc39.es/ecma262/multipage/ecmascript-language-lexical-grammar.html#sec-literals-regular-expression-literals

…n the internal bundler

This allows regular expression literals to be used even where RegExp() was previously used as a workaround.
@kfule
Copy link
Contributor Author

kfule commented Sep 17, 2025

@dead-claudia Thank you for your comment.
As you suggested, I've added code to the bundler to consider regular expression literals. Also, I've made some minor adjustments to the regular expression you suggested rather than using it as-is.
At least, it seems the issues with decodeURIComponentSafe and the past m.censor have been resolved.

@kfule
Copy link
Contributor Author

kfule commented Sep 19, 2025

In addition, I disabled terser's conditionals compression option. This appeared to improve both performance and bundle size (8,982 bytes gzipped, although the non-gzip file size appears to be increasing).
I will address this in this PR as it also relates to bundle size reduction.

@kfule kfule force-pushed the shrink-bundle branch 2 times, most recently from 2e76ac8 to c02627b Compare September 19, 2025 14:30
Like the issue addressed in MithrilJS#3000, it seemed that the performance of mithril.min.js had degraded again. I tried some compression options for terser and found that disabling the `conditionals` option improved performance and also reduced the bundle size.
It seemed that, for example, `var i` might cause a suffix to be added to the "i" flag within a regular expression literal. So the entire regular expression literal, including flags, is now corrected within the bundler.
@kfule
Copy link
Contributor Author

kfule commented Sep 20, 2025

The process for handling regular expression literal flags was insufficient, so I added it to the bundler.
I believe it should be fine now, so I don't think there will be any further commits.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants