-
-
Notifications
You must be signed in to change notification settings - Fork 931
Fix URI decoder bug and reduce bundle size through module tailoring and cleanup #3050
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
setImmediate() is not supported in almost all environments except IE11. Furthermore, by not using setImmediate(), the assignment code to callAsync itself, which takes non-browser environments into account, can be removed.
The `window` is no longer used within m.render().
The bundle size can be reduced by removing the intermediate object used to export domFor and delayedRemoval. Additionally, the file paths used in domFor-related require() calls have been standardized.
…ith one shared version
…undle by changing the module loading order and variable name Mithril's internal bundler enables module bundling without generating intermediate variables by carefully managing module loading order and variable names. This commit prevents assigning `mountRedraw`, `decodeURIComponentSave`, and `hyperscript` to intermediate variables. It also improves readability by moving all member additions for `m` to the end of the file.
7a41a6c
to
4e252ae
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly just small nits.
I do want to see that URI decoder bug fix, though. Yes, it's a +1, but we're already here doing things to it, and if it doesn't happen now, it'll probably never happen.
util/decodeURIComponentSave.js
Outdated
module.exports = function(str) { | ||
try { | ||
return decodeURIComponent(str) | ||
} catch(err) { | ||
return str | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this will result in a slight bundle increase, but can you replace this while we're at it with a string replace instead so we can be more tolerant with our decoding? Also, I'd like to see new tests validating this behavioral change. (It's honestly a bug fix more than anything.)
As the change involves string replace detection, you'll need to add several tests for this. For the sake of not blowing up test runtime, please put the relevant tests in a separate file instead of adding it to the existing router test file.
That way, you can exhaustively test it with some loops inside a single test body. There's 1114112 = 2^20 + 2^16 total code points in Unicode, including surrogates, and testing that 21 times, one for each origin + prefix pair in the router tests, is just plainly not worth the test runtime.
The idea is to only feed decodeURIComponent
the valid UTF-8 encodings while decoding the rest. The comment in the suggestion explains the regexp.
By the way, String.prototype.replace
is usually much better optimized than decodeURIComponent
, and so this can provide a small speed boost. This is especially true in V8, since decodeURIComponent
builds a complete copy byte-by-byte while String.prototype.replace
just creates non-copying slices and combines them with cons nodes.
module.exports = function(str) { | |
try { | |
return decodeURIComponent(str) | |
} catch(err) { | |
return str | |
} | |
} | |
/* | |
Percent encodings encode UTF-8 bytes, so this regexp needs to match that. | |
Here's how UTF-8 encodes stuff: | |
- `00-7F`: 1-byte, for U+0000-U+007F | |
- `C2-DF 80-BF`: 2-byte, for U+0080-U+07FF | |
- `E0-EF 80-BF 80-BF`: 3-byte, encodes U+0800-U+FFFF | |
- `F0-F4 80-BF 80-BF 80-BF`: 4-byte, encodes U+10000-U+10FFFF | |
In this, there's a number of invalid byte sequences: | |
- `80-BF`: Continuation byte, invalid as start | |
- `C0-C1 80-BF`: Overlong encoding for U+0000-U+007F | |
- `E0 80-9F 80-BF`: Overlong encoding for U+0080-U+07FF | |
- `ED A0-BF 80-BF`: Encoding for UTF-16 surrogate U+D800-U+DFFF | |
- `F0 80-8F 80-BF 80-BF`: Overlong encoding for U+0800-U+FFFF | |
- `F4 90-BF`: RFC 3629 restricted UTF-8 to only code points UTF-16 could encode. | |
- `F5-FF`: RFC 3629 restricted UTF-8 to only code points UTF-16 could encode. | |
So in reality, only the following sequences can encode are valid characters: | |
- 00-7F | |
- C2-DF 80-BF | |
- E0 A0-BF 80-BF | |
- E1-EC 80-BF 80-BF | |
- ED 80-9F 80-BF | |
- EE-EF 80-BF 80-BF | |
- F0 90-BF 80-BF 80-BF | |
- F1-F3 80-BF 80-BF 80-BF | |
- F4 80-8F 80-BF 80-BF | |
The regexp just tries to match this as compactly as possible. | |
*/ | |
var validUtf8Encodings = /%(?:[0-7]|(?!c0|c1|e0%[89]|ed%[ab]|f0%8|f4%[9ab])(?:c|d|(?:e|f[0-4]%[89ab])[\da-f]%[89ab])[\da-f]%[89ab])[\da-f]/gi | |
module.exports = function(str) { | |
return str.replace(validUtf8Encodings, decodeURIComponent) | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The range written in the comment may be incorrect. (That regular expression seems correct, though.)
So in reality, only the following sequences can encode are valid characters:
- 00-7F
- C2-DF 80-BF
- E0 AF-BF 80-BF
- E1-EC 80-DF 80-BF
- ED 80-9F 80-BF
- EE-EF 80-DF 80-BF
- F0 90-BF 80-BF 80-BF
- F1-F3 80-BF 80-BF 80-BF
- F4 80-8F 80-BF 80-BF
The correct one is as follows:
- 00-7F
- C2-DF 80-BF
- E0 A0(*)-BF 80-BF
- E1-EC 80-BF(*) 80-BF
- ED 80-9F 80-BF
- EE-EF 80-BF(*) 80-BF
- F0 90-BF 80-BF 80-BF
- F1-F3 80-BF 80-BF 80-BF
- F4 80-8F 80-BF 80-BF
Additionally, it might be a good idea to correct the code point range in the following comment.
// now (wrong?)
- `F0 80-8F 80-BF 80-BF`: Overlong encoding for U+0800-U+DFFF
// correct
- `F0 80-8F 80-BF 80-BF`: Overlong encoding for U+0800-U+FFFF
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kfule Good catch. Fixed the comment suggestion. And you're correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the safe decode change, you can simplify lines 58-59 to just do var path = decodeURIComponentSafe(prefix).slice(0, route.prefix.length)
.
util/decodeURIComponentSave.js
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please rename this to decodeURIComponentSafe
. The previous names were wrong, and I'm not sure how I kept missing this through the years.
var callAsync = $window == null | ||
// In case Mithril.js' loaded globally without the DOM, let's not break | ||
? null | ||
: typeof $window.setImmediate === "function" ? $window.setImmediate : $window.setTimeout |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please update the callAsync
test helper to also only call setTimeout
. Node.js schedules setImmediate(f)
into the same queue as setTimeout(f, 0)
, but with different timings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I simply removed setImmediate
from the callAsync
test helper. Is this OK?
const vdom = require("../../render/render") | ||
const m = require("../../render/hyperscript") | ||
const fragment = require("../../render/fragment") | ||
const domFor = require("../../render/domFor") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you're going to fix these for consistency, just remove the redundant ../render
in the domFor
import. No need to change the others - I'd like to still see the history, and shorter relative paths are better (and less confusing).
const vdom = require("../../render/render") | |
const m = require("../../render/hyperscript") | |
const fragment = require("../../render/fragment") | |
const domFor = require("../../render/domFor") | |
const vdom = require("../render") | |
const m = require("../hyperscript") | |
const fragment = require("../fragment") | |
const domFor = require("../domFor") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To follow the style of other test files, I would like to maintain this verbose style for file paths here. All file paths in test code except for this file are consistently written in a verbose style.
Also, the consistently verbose file paths within the test code help when hacking require() to test bundle files.
render/render.js
Outdated
var Vnode = require("../render/vnode") | ||
var df = require("../render/domFor") | ||
var delayedRemoval = df.delayedRemoval | ||
var domFor = df.domFor | ||
var cachedAttrsIsStaticMap = require("./cachedAttrsIsStaticMap") | ||
var delayedRemoval = require("../render/delayedRemoval") | ||
var domFor = require("../render/domFor") | ||
var cachedAttrsIsStaticMap = require("../render/cachedAttrsIsStaticMap") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you remove all the redundant ../render
components in these?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, fixed it.
render/domFor.js
Outdated
"use strict" | ||
|
||
var delayedRemoval = new WeakMap | ||
var delayedRemoval = require("../render/delayedRemoval") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's not leave redundant import path components here.
var delayedRemoval = require("../render/delayedRemoval") | |
var delayedRemoval = require("./delayedRemoval") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
I've addressed all comments except for the additional fixes to the URI decoder. The decoder fixes, including test code, may take a little time. |
Some router-related tests have fixed issues where UTF-8 strings were not decoded.
… types This will make the results more consistent with decodeURIComponent.
@dead-claudia In the current PR, I have tested all code points exhaustively, excluding surrogates. This has significantly increased the number of test cases reported on the console. Is this acceptable for you? |
…rect mangling by internal bundler Because "c" was being improperly mangled and the "c0" suffix was being improperly removed, it was changed to "c[01]" rather than just dropping the literal use. Also, since the backslash in "\d" is escaped ("\\d"), I simply changed it to "0-9" of the same string length.
This comment was marked as resolved.
This comment was marked as resolved.
@kfule Please just fix the bundler to ignore regexps instead. (This isn't the first time we've had regexps mangle stuff.) It's a bit tricky, but at least for now, you should be able to get away with just detecting and skipping regexp literals following common safe stuff like
|
…n the internal bundler This allows regular expression literals to be used even where RegExp() was previously used as a workaround.
@dead-claudia Thank you for your comment. |
In addition, I disabled terser's |
2e76ac8
to
c02627b
Compare
Like the issue addressed in MithrilJS#3000, it seemed that the performance of mithril.min.js had degraded again. I tried some compression options for terser and found that disabling the `conditionals` option improved performance and also reduced the bundle size.
c02627b
to
a0a3b46
Compare
It seemed that, for example, `var i` might cause a suffix to be added to the "i" flag within a regular expression literal. So the entire regular expression literal, including flags, is now corrected within the bundler.
The process for handling regular expression literal flags was insufficient, so I added it to the bundler. |
This fixes the URI decoder used in the Router to decode more strictly. Also, this reduces bundle size by removing unnecessary code, intermediate object, and intermediate variables.
Description
This PR reduces the bundle file size by implementing the following changes (including URI decoder fix).
setImmediate
, which is almost never used outside of IE11domFor
by splittingdelayedRemoval
into a separate file.decodeURIComponentSave
functions with one shared versionconditionals
compression optionExcept for bug fixes in the URI decoder, all of the changes do not break existing tests, of course.
In addition, the readability of bundle file improves with the removal of intermediate objects and variables.
bundle size comparison
(before) v2.3.7
(after) This PR
This PR reduces the bundle size to 8.98kb (8,982 bytes) gzipped.
Motivation and Context
Regarding the bundle size, it dropped below 9 kB gzipped in v2.3.4 but climbed back above 9 kB gzipped starting in v2.3.5, which was a bit disappointing for me. So I decided to work on reducing the bundle file size.
Also, I am interested in the readability of the bundle file (mithril.js file), and I believe this PR contributes to improving readability.
How Has This Been Tested?
npm run test
Types of changes
Checklist