Commit 58ca30e

feat: added trailingPeriod option (so "example.com." only matches "example.com" and not "example.com.")
1 parent 3af18d6 commit 58ca30e

3 files changed, +58 -19 lines changed

README.md

Lines changed: 22 additions & 15 deletions
@@ -19,7 +19,7 @@
 * [Node](#node)
 * [Browser](#browser)
 * [Options](#options)
-* [Tips](#tips)
+* [Quick tips and migration from url-regex](#quick-tips-and-migration-from-url-regex)
 * [Contributors](#contributors)
 * [License](#license)

@@ -100,25 +100,32 @@ Assuming you are using [browserify][], [webpack][], [rollup][], or another bundl
100100

101101
## Options
102102

103-
| Property | Type | Default Value | Description | |
104-
| -------------- | ------- | ------------------------------------------------------------ | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | - |
105-
| `exact` | Boolean | `false` | Only match an exact String. Useful with `regex.test(str)` to check if a String is a URL. We set this to `false` by default in order to match String values such as `github.com` (as opposed to requiring a protocol or `www` subdomain). We feel this closely more resembles real-world intended usage of this package. | |
106-
| `strict` | Boolean | `false` | Force URL's to start with a valid protocol or `www` if set to `true`. If `true`, then it will allow any TLD as long as it is a minimum of 2 valid characters. If it is `false`, then it will match the TLD against the list of valid TLD's using [tlds](https://github.com/stephenmathieson/node-tlds#readme). | |
107-
| `auth` | Boolean | `false` | Match against Basic Authentication headers. We set this to `false` by default since [it was deprecated in Chromium](https://bugs.chromium.org/p/chromium/issues/detail?id=82250#c7), and otherwise it leaves the user with unwanted URL matches (more closely resembles real-world intended usage of this package by having it set to `false` by default too). | |
108-
| `localhost` | Boolean | `true` | Allows localhost in the URL hostname portion. See the [test/test.js](test/test.js) for more insight into the localhost test and how it will return a value which may be unwanted. A pull request would be considered to resolve the "pic.jp" vs. "pic.jpg" issue. | |
109-
| `parens` | Boolean | `false` | Match against Markdown-style trailing parenthesis. We set this to `false` because it should be up to the user to parse for Markdown URL's. | |
110-
| `apostrophe` | Boolean | `false` | Match against apostrophes. We set this to `false` because we don't want the String `background: url('http://example.com/pic.jpg');` to result in `http://example.com/pic.jpg'`. See this [issue](https://github.com/kevva/url-regex/pull/55) for more information. | |
111-
| `ipv4` | Boolean | `true` | Match against IPv4 URL's. | |
112-
| `ipv6` | Boolean | `true` | Match against IPv6 URL's. | |
113-
| `tlds` | Array | [tlds](https://github.com/stephenmathieson/node-tlds#readme) | Match against a specific list of tlds, or the default list provided by [tlds](https://github.com/stephenmathieson/node-tlds#readme). | |
114-
| `returnString` | Boolean | `false` | Return the RegExp as a String instead of a `RegExp` (useful for custom logic, such as we did with [Spam Scanner][spam-scanner]). | |
103+
| Property | Type | Default Value | Description | |
104+
| ---------------- | ------- | ------------------------------------------------------------ | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | - |
105+
| `exact` | Boolean | `false` | Only match an exact String. Useful with `regex.test(str)` to check if a String is a URL. We set this to `false` by default in order to match String values such as `github.com` (as opposed to requiring a protocol or `www` subdomain). We feel this closely more resembles real-world intended usage of this package. | |
106+
| `strict` | Boolean | `false` | Force URL's to start with a valid protocol or `www` if set to `true`. If `true`, then it will allow any TLD as long as it is a minimum of 2 valid characters. If it is `false`, then it will match the TLD against the list of valid TLD's using [tlds](https://github.com/stephenmathieson/node-tlds#readme). | |
107+
| `auth` | Boolean | `false` | Match against Basic Authentication headers. We set this to `false` by default since [it was deprecated in Chromium](https://bugs.chromium.org/p/chromium/issues/detail?id=82250#c7), and otherwise it leaves the user with unwanted URL matches (more closely resembles real-world intended usage of this package by having it set to `false` by default too). | |
108+
| `localhost` | Boolean | `true` | Allows localhost in the URL hostname portion. See the [test/test.js](test/test.js) for more insight into the localhost test and how it will return a value which may be unwanted. A pull request would be considered to resolve the "pic.jp" vs. "pic.jpg" issue. | |
109+
| `parens` | Boolean | `false` | Match against Markdown-style trailing parenthesis. We set this to `false` because it should be up to the user to parse for Markdown URL's. | |
110+
| `apostrophes` | Boolean | `false` | Match against apostrophes. We set this to `false` because we don't want the String `background: url('http://example.com/pic.jpg');` to result in `http://example.com/pic.jpg'`. See this [issue](https://github.com/kevva/url-regex/pull/55) for more information. | |
111+
| `trailingPeriod` | Boolean | `false` | Match against trailing periods. We set this to `false` by default since real-world behavior would want `example.com` versus `example.com.` as the match (this is different than [url-regex][] where it matches the trailing period in that package). | |
112+
| `ipv4` | Boolean | `true` | Match against IPv4 URL's. | |
113+
| `ipv6` | Boolean | `true` | Match against IPv6 URL's. | |
114+
| `tlds` | Array | [tlds](https://github.com/stephenmathieson/node-tlds#readme) | Match against a specific list of tlds, or the default list provided by [tlds](https://github.com/stephenmathieson/node-tlds#readme). | |
115+
| `returnString` | Boolean | `false` | Return the RegExp as a String instead of a `RegExp` (useful for custom logic, such as we did with [Spam Scanner][spam-scanner]). | |
115116

116117

117-
## Tips
118+
## Quick tips and migration from url-regex
118119

119120
You must override the default and set `strict: true` if you do not wish to match `github.com` by itself (though `www.github.com` will work if `strict: false`).
120121

-Unlike the deprecated and unmaintained package [url-regex][], we set `strict` and `auth` to `false` by default, so if you want to match that package's behavior out of the box, you will need to set these option values to `true`. Also note that we added `parens` and `ipv6` options, setting `parens` to `false` and `ipv6` to `true`, therefore you will need to set `parens` to `true` and `ipv6` to `false` if you wish to match [url-regex][] behavior. Lastly, we added an `apostrophe` option, which we set to `false` by default, but you should set to `true` if you wish to mirror [url-regex][] default behavior.
+Unlike the deprecated and unmaintained package [url-regex][], we do a few things differently:
+
+* We set `strict` to `false` by default ([url-regex][] had this set to `true`).
+* We added an `auth` option, which is set to `false` by default ([url-regex][] matches against Basic Authentication, as if this were set to `true`; however, this behavior is deprecated in Chromium).
+* We added `parens` and `ipv6` options, which are set to `false` and `true` by default, respectively ([url-regex][] behaved as if `parens` were `true`, and `ipv6` was non-existent, i.e. effectively `false`).
+* We added an `apostrophes` option, which is set to `false` by default ([url-regex][] had this set to `true`).
+* We added a `trailingPeriod` option, which is set to `false` by default (which means matches won't contain trailing periods, whereas [url-regex][] had this set to `true`).
 
 
 ## Contributors
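
The migration notes in the README diff above can be condensed into a single options object. This is only a sketch: the variable name `urlRegexCompatOptions` is hypothetical, the values are taken from the README text, and the key `apostrophes` follows the spelling used in `src/index.js` rather than any confirmed public API documentation.

```javascript
// Hypothetical options object that mirrors url-regex defaults,
// as described by the README migration notes in this commit.
const urlRegexCompatOptions = {
  strict: true,         // require a protocol or `www` prefix
  auth: true,           // match Basic Authentication segments
  parens: true,         // match trailing parentheses
  ipv6: false,          // url-regex had no IPv6 support
  apostrophes: true,    // match apostrophes
  trailingPeriod: true  // keep a trailing period in the match
};

console.log(urlRegexCompatOptions);
```

Presumably this object would be passed to the exported function (called `urlRegex` in `test/test.js`), for example `urlRegex(urlRegexCompatOptions)`.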

src/index.js

Lines changed: 3 additions & 1 deletion
@@ -15,6 +15,7 @@ module.exports = (options) => {
     localhost: true,
     parens: false,
     apostrophes: false,
+    trailingPeriod: false,
     ipv4: true,
     ipv6: true,
     tlds,
@@ -35,7 +36,8 @@ module.exports = (options) => {
     options.strict
       ? '(?:[a-z\\u00a1-\\uffff]{2,})'
       : `(?:${options.tlds.sort((a, b) => b.length - a.length).join('|')})`
-  })\\.?`;
+  })${options.trailingPeriod ? '\\.?' : ''}`;
+
   const port = '(?::\\d{2,5})?';
   // Not accept closing parenthesis
   // <https://github.com/kevva/url-regex/pull/35>
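
To make the effect of the changed line easier to see, here is a minimal sketch of how the TLD fragment might be assembled. The function name `buildTldFragment`, the two-entry default TLD list, and the opening `(?:\\.` of the group are assumptions for illustration; only the `strict` branch, the sort, and the `trailingPeriod` ternary are taken from the diff.

```javascript
// Hypothetical, simplified stand-in for the TLD fragment built in src/index.js.
function buildTldFragment({
  strict = false,
  tlds = ['com', 'net'],
  trailingPeriod = false
} = {}) {
  const tld = strict
    ? '(?:[a-z\\u00a1-\\uffff]{2,})'
    : // Copy before sorting so the caller's array is not reordered;
      // longer TLDs come first so they win the alternation.
      `(?:${[...tlds].sort((a, b) => b.length - a.length).join('|')})`;
  // The optional trailing dot is appended only when trailingPeriod is enabled.
  return `(?:\\.${tld})${trailingPeriod ? '\\.?' : ''}`;
}

console.log(buildTldFragment({ trailingPeriod: false })); // (?:\.(?:com|net))
console.log(buildTldFragment({ trailingPeriod: true }));  // (?:\.(?:com|net))\.?
```

Before this commit the `\\.?` suffix was unconditional, so a trailing period was always eligible to be part of the match.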

test/test.js

Lines changed: 33 additions & 3 deletions
@@ -61,7 +61,14 @@ const fixtures = [
 ];
 for (const x of fixtures) {
   test(`match exact URLs: ${x}`, (t) => {
-    t.true(urlRegex({ exact: true, auth: true, parens: true }).test(x));
+    t.true(
+      urlRegex({
+        exact: true,
+        auth: true,
+        parens: true,
+        trailingPeriod: true
+      }).test(x)
+    );
   });
 }
 
@@ -203,7 +210,14 @@ for (const x of [
   '➡.ws/䨹'
 ]) {
   test(`match using list of TLDs: ${x}`, (t) => {
-    t.true(urlRegex({ exact: true, auth: true, parens: true }).test(x));
+    t.true(
+      urlRegex({
+        exact: true,
+        auth: true,
+        parens: true,
+        trailingPeriod: true
+      }).test(x)
+    );
   });
 }
 
@@ -356,7 +370,8 @@ test('match using explicit list of TLDs', (t) => {
         exact: true,
         auth: true,
         parens: true,
-        tlds: ['com', 'ws', 'de', 'net', 'mp', 'bar', 'onion', 'education']
+        tlds: ['com', 'ws', 'de', 'net', 'mp', 'bar', 'onion', 'education'],
+        trailingPeriod: true
       }).test(x)
     );
   }
@@ -487,3 +502,18 @@ test('localhost', (t) => {
     ['pic.jp']
   );
 });
+
+test('trailing period', (t) => {
+  t.deepEqual(
+    'background example.com. foobar.com'.match(
+      urlRegex({ trailingPeriod: true })
+    ),
+    ['example.com.', 'foobar.com']
+  );
+  t.deepEqual(
+    'background example.com. foobar.com'.match(
+      urlRegex({ trailingPeriod: false })
+    ),
+    ['example.com', 'foobar.com']
+  );
+});
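
The matching behavior that the new `trailing period` test checks can be reproduced without the package. The sketch below uses a deliberately simplified, hypothetical `simpleUrlRegex` that only knows the `com` and `net` TLDs and plain ASCII host labels; it is not the library's pattern, only an illustration of the optional trailing dot.

```javascript
// Deliberately simplified stand-in for the real matcher (assumption:
// ASCII host labels and only the "com" and "net" TLDs).
function simpleUrlRegex({ trailingPeriod = false } = {}) {
  const host = '[a-z0-9-]+\\.(?:com|net)';
  // Append an optional literal dot only when trailingPeriod is enabled.
  return new RegExp(host + (trailingPeriod ? '\\.?' : ''), 'gi');
}

const text = 'background example.com. foobar.com';
const withPeriod = text.match(simpleUrlRegex({ trailingPeriod: true }));
const withoutPeriod = text.match(simpleUrlRegex({ trailingPeriod: false }));

console.log(withPeriod);    // [ 'example.com.', 'foobar.com' ]
console.log(withoutPeriod); // [ 'example.com', 'foobar.com' ]
```

With the option off, the sentence-ending period after `example.com` stays out of the match, which is the default the commit introduces.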
