
Add function to turn html to text #708

@TobiasNx

Description


In the context of OERSI (see https://gitlab.com/oersi/oersi-etl/-/issues/360), we need a Fix function that strips all HTML tags from a text and resolves HTML-encoded special characters (entities such as &amp;) into plain characters.

One possible implementation:

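    // Sketch of a Fix method enum constant; requires import org.jsoup.Jsoup (jsoup dependency).
    html_to_text {
        @Override
        public void apply(final Metafix metafix, final Record record, final List<String> params, final Map<String, String> options) {
            // Parse the field value as HTML and keep only its text (whitespace as in the original):
            // tags are dropped and entities such as &amp; are decoded by the parser.
            record.transform(params.get(0), s -> Jsoup.parse(s).wholeText());
        }
    },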

Based on the approach suggested here: https://stackoverflow.com/questions/3607965/how-to-convert-html-text-to-plain-text
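The jsoup one-liner above adds jsoup as a dependency. For comparison only, here is a JDK-only sketch, assuming a dependency-free variant is of interest: it swaps jsoup for the Swing HTML parser (javax.swing.text.html.parser.ParserDelegator) and collects only character data, so tags are dropped and entities come back decoded. The class HtmlToText and the method htmlToText are hypothetical names, not part of Metafix.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

/** Hypothetical JDK-only helper (not Metafix API): drops HTML tags and decodes entities without jsoup. */
public class HtmlToText {

    public static String htmlToText(final String html) {
        final StringBuilder text = new StringBuilder();
        final HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(final char[] data, final int pos) {
                // Only character data arrives here; tags go to the other callbacks
                // (left as no-ops), and entities such as &amp; are already decoded.
                text.append(data);
            }
        };
        try {
            // true = ignore any charset declared inside the markup.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (final IOException e) {
            // Reading from a StringReader should not fail; rethrow unchecked so the
            // method fits a String -> String operator.
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(final String[] args) {
        System.out.println(htmlToText("<p>Tom &amp; Jerry <b>episode</b></p>"));
    }
}
```

Because the helper throws no checked exception, it could in principle be passed as HtmlToText::htmlToText to record.transform in place of the jsoup lambda. The trade-off: the Swing parser is built on an HTML 3.2 DTD and its whitespace handling may differ from wholeText(), so jsoup is likely the more robust choice for real-world markup.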

Not sure whether we should wait for #706.

Metadata

Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    Status

    Ready

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests