javascript - Strip all HTML from string, except <mark> tags

I have a large string in JavaScript and need to strip the HTML from it, except for specific tags.

I'm using:

var nohtml = /(<([^>]+)>)/ig;

This strips all HTML. Is there something I can add to the regex so that it ignores <mark> tags while doing this?
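For reference, here is a minimal sketch of how a regex like this is typically applied with String.prototype.replace. The sample string is made up; note that the <mark> tags are removed along with everything else:

```javascript
// The question's regex: matches any "<...>" run, i.e. every tag.
var nohtml = /(<([^>]+)>)/ig;

var input = "<p>Hello <mark>world</mark></p>"; // hypothetical sample
var output = input.replace(nohtml, "");

console.log(output); // "Hello world" (the <mark> tags are gone too)
```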

As mentioned in the comments above, regex isn't the right tool for parsing HTML. That being said, one way is to use a negative lookahead for the tags you want to keep:

var nohtml = /(?!(<ul|<\/ul>))(<([^>]+)>)/ig;

In this example, the tag being kept is "ul".

So for your specific case:

var nohtml = /(?!(<mark|<\/mark>))(<([^>]+)>)/ig;
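A minimal sketch of applying that regex with String.prototype.replace, using a hypothetical sample string. The negative lookahead is tested at each `<`, so any tag starting with `<mark` (including `<mark class="...">`) or exactly `</mark>` is skipped, while every other tag is removed:

```javascript
// Negative lookahead: refuse to match at a "<" that begins "<mark" or "</mark>".
var nohtml = /(?!(<mark|<\/mark>))(<([^>]+)>)/ig;

var input = '<div class="result"><p>Found <mark>foo</mark> in <b>bar</b></p></div>';
var output = input.replace(nohtml, "");

console.log(output); // "Found <mark>foo</mark> in bar"
```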

You can see it working in this fiddle: https://jsfiddle.net/0xgs0u9m/
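If more than one tag needs to survive, the same lookahead can be built from a list of tag names. The helper below (`stripHtmlExcept` is a hypothetical name) is only a sketch: it assumes tag names are plain letters and digits, and it adds `\b` after the name so that, unlike the `<mark` pattern, a made-up `<marker>` tag is stripped rather than kept:

```javascript
// Hypothetical helper: strip every tag except those named in `keep`.
// Assumes tag names contain only letters/digits (no regex metacharacters).
function stripHtmlExcept(html, keep) {
  // "</?" covers opening and closing tags; "\\b" stops "mark" matching "marker".
  var pattern = new RegExp("(?!</?(?:" + keep.join("|") + ")\\b)<[^>]+>", "gi");
  return html.replace(pattern, "");
}

var sample = "<p>a <mark>b</mark> <marker>c</marker> <ul><li>d</li></ul></p>";

console.log(stripHtmlExcept(sample, ["mark"]));       // "a <mark>b</mark> c d"
console.log(stripHtmlExcept(sample, ["mark", "ul"])); // "a <mark>b</mark> c <ul>d</ul>"
```

The `i` flag makes the kept names case-insensitive, so `<MARK>` survives as well.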

You may instead want to consider using an HTML parser from npm:

https://www.npmjs.com/package/htmlparser
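One concrete reason a parser is safer: the `[^>]+` part of these regexes stops at the first `>`, even when that `>` sits inside a quoted attribute value, which is legal HTML. A small sketch with a made-up input:

```javascript
var nohtml = /(?!(<mark|<\/mark>))(<([^>]+)>)/ig;

// A ">" inside a quoted attribute value is valid HTML...
var input = '<a title="1 > 0">link</a>';

// ...but the regex ends the tag at that ">", leaking the rest of the attribute.
var output = input.replace(nohtml, "");

console.log(output); // ' 0">link'
```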

From the package's example:

var handler = new Tautologistics.NodeHtmlParser.DefaultHandler(function (error, dom) {
    if (error) {
        // ...handle errors...
    } else {
        // ...parsing done, do something with dom...
    }
});
var parser = new Tautologistics.NodeHtmlParser.Parser(handler);
parser.parseComplete(document.body.innerHTML);
alert(JSON.stringify(handler.dom, null, 2));

For a sample input containing some text, a script element and a comment, handler.dom looks like:

[ { raw: 'xyz ', data: 'xyz ', type: 'text' }
, { raw: 'script language= javascript'
  , data: 'script language= javascript'
  , type: 'script'
  , name: 'script'
  , attribs: { language: 'javascript' }
  , children:
     [ { raw: 'var foo = \'<bar>\';<'
       , data: 'var foo = \'<bar>\';<'
       , type: 'text'
       }
     ]
  }
, { raw: '<!-- waah! -- '
  , data: '<!-- waah! -- '
  , type: 'comment'
  }
]
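Once you have a node array like that, keeping only <mark> means walking it yourself. This is a sketch under assumptions: it relies on the `type`, `name`, `data` and `children` fields shown above, and assumes (not shown above) that ordinary elements have `type: 'tag'`. `keepMarks` is a hypothetical name, and the input is a hand-built stand-in, not real parser output. Attributes on <mark> are dropped, and text is emitted exactly as in `data`, without entity decoding:

```javascript
// Hypothetical helper: rebuild a string from htmlparser-style nodes,
// keeping <mark> elements and only the text content of everything else.
function keepMarks(nodes) {
  var out = "";
  (nodes || []).forEach(function (node) {
    if (node.type === "text") {
      out += node.data; // raw text, entities left as-is
    } else if (node.type === "tag") {
      var inner = keepMarks(node.children);
      out += node.name.toLowerCase() === "mark"
        ? "<mark>" + inner + "</mark>" // attributes deliberately dropped
        : inner;                       // unwrap every other element
    }
    // Any other node type ("script", "comment", ...) is skipped entirely.
  });
  return out;
}

// Hand-built stand-in for handler.dom (not real parser output).
var dom = [
  { type: "tag", name: "p", children: [
    { type: "text", data: "Found " },
    { type: "tag", name: "mark", children: [{ type: "text", data: "foo" }] },
    { type: "text", data: " here" }
  ]},
  { type: "script", name: "script", children: [{ type: "text", data: "var x = 1;" }] },
  { type: "comment", data: " note " }
];

console.log(keepMarks(dom)); // "Found <mark>foo</mark> here"
```

Unlike the regex, this never splits a tag at a `>` inside an attribute, because the parser has already decided where each tag ends.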
