Sentence Splitters: no sentence break in between two words with no punctuation #62

dhruvil410 · 2021-03-19T10:45:47Z

Fix #60
An alternative fix is to replace every \n with a space as soon as the input arrives, i.e. add the line sentences = replace(sentences, r"\n" => Base.SubstitutionString(" ")) at the start of rulebased_split_sentences(sentences). The committed code could also be extended to handle characters other than alphanumerics.
Which approach is the better fix? Any other suggestions are welcome.
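
For reference, a minimal sketch of what that replacement line does on its own. The helper name normalize_newlines is hypothetical and not part of WordTokenizers.jl. Because the replacement contains no capture-group references, a plain " " should behave the same as Base.SubstitutionString(" ") here.

```julia
# Hypothetical helper (not part of WordTokenizers.jl): replace every newline
# with a single space before the text is handed to the sentence splitter.
normalize_newlines(text::AbstractString) = replace(text, r"\n" => " ")

text = "This is a sentence\nThis is another one."
println(normalize_newlines(text))  # prints: This is a sentence This is another one.
```

Note that this rewrites every newline unconditionally, so any caller relying on newlines in the input would see different output.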

triztian · 2021-04-05T19:59:36Z

I think adding tests would make this fix more robust. Also, since it would change the function's output, maybe make it an optional keyword argument, so that those who need the new behavior enable it explicitly rather than having it change all of a sudden.

For example, by updating rulebased_split_sentences:

WordTokenizers.jl/src/sentences/sentence_splitting.jl

Line 1 in d181905

function rulebased_split_sentences(sentences)

So that it can be called like this:

rulebased_split_sentences(sentence, collapse_newlines=true)

That way, runs of multiple newlines are reduced to a single newline and single newlines are removed.
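
A rough sketch of how such an opt-in keyword could behave, shown as a standalone preprocessing step rather than as the actual change to rulebased_split_sentences. The function name preprocess_newlines is hypothetical, and one detail is an assumption: a single newline is turned into a space (following the earlier replace suggestion) instead of being deleted outright, so that the neighbouring words do not fuse.

```julia
# Hypothetical sketch, not the code from this PR.
# With collapse_newlines=false (the default) the text is returned unchanged,
# so existing callers keep the current behavior.
function preprocess_newlines(text::AbstractString; collapse_newlines::Bool=false)
    collapse_newlines || return text
    # Each run of newlines is matched once: a run of length 1 becomes a space
    # (assumption, see above), a longer run becomes exactly one newline.
    return replace(text, r"\n+" => run -> length(run) == 1 ? " " : "\n")
end

println(repr(preprocess_newlines("a\nb\n\n\nc")))                          # "a\nb\n\n\nc"
println(repr(preprocess_newlines("a\nb\n\n\nc"; collapse_newlines=true)))  # "a b\nc"
```

Keeping the default at false means the output only changes for callers who pass the keyword explicitly.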

dhruvil410 · 2021-04-10T13:01:14Z

I'm not familiar with the checks. Why didn't the code pass them?

fix JuliaText#60

d181905

made optional collapse_newlines and added test for that

8e48cec
