javascript - Strip all HTML from string, except <mark> tags


I have a large string in JavaScript from which I need to strip the HTML, except for specific tags.

I'm using:

var nohtml = /(<([^>]+)>)/ig; 

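Right now this strips all HTML. Is there something I can add to the regex so that it ignores mark tags while doing this?

As mentioned in the comments above, regex isn't the right tool to use for parsing HTML. That being said, one way is to use a negative lookahead for the tags you want to keep: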

var nohtml = /(?!(<ul|<\/ul>))(<([^>]+)>)/ig; 

In this example, the tag being kept is "ul".

So for your specific case:

var nohtml = /(?!(<mark|<\/mark>))(<([^>]+)>)/ig; 

You can see it working here in this fiddle: https://jsfiddle.net/0xgs0u9m/
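As a rough sketch of how the pattern above might be applied with String.prototype.replace: the helper names and the sample string below are made up for illustration. The second helper is a stricter variant that is not part of the original answer. It is included because a plain "<mark" prefix check would also keep the opening tag of something like <marker>.

```javascript
// Hypothetical helper: remove every tag except <mark> and </mark>.
function stripExceptMark(html) {
  // Same pattern as above: the negative lookahead refuses to start a match
  // at "<mark" or "</mark>", so those tags are left in place.
  var nohtml = /(?!(<mark|<\/mark>))(<([^>]+)>)/ig;
  return html.replace(nohtml, '');
}

// Stricter variant (an assumption, not from the original answer): the \b
// word boundary stops tags that merely begin with "mark", such as <marker>,
// from being kept.
function stripExceptMarkStrict(html) {
  return html.replace(/<(?!\/?mark\b)[^>]+>/gi, '');
}

var sample = '<p>Hello <mark>world</mark> and <b>friends</b></p>';
console.log(stripExceptMark(sample));
// prints: Hello <mark>world</mark> and friends
```

Both versions still treat the input as plain text, so a ">" inside an attribute value or a "<" inside a script body can confuse them.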

You may instead want to consider using an HTML parser from npm:

https://www.npmjs.com/package/htmlparser

From the package's example:

var handler = new Tautologistics.NodeHtmlParser.DefaultHandler(function (error, dom) {
    if (error) {
        // [...do errors...]
    } else {
        // [...parsing done, something...]
    }
});
// Feed the page's markup to the parser; the handler collects the resulting DOM.
var parser = new Tautologistics.NodeHtmlParser.Parser(handler);
parser.parseComplete(document.body.innerHTML);
alert(JSON.stringify(handler.dom, null, 2));

This results in:

[ { raw: 'xyz ', data: 'xyz ', type: 'text' }
, { raw: 'script language= javascript'
  , data: 'script language= javascript'
  , type: 'script'
  , name: 'script'
  , attribs: { language: 'javascript' }
  , children:
     [ { raw: 'var foo = \'<bar>\';<'
       , data: 'var foo = \'<bar>\';<'
       , type: 'text'
       }
     ]
  }
, { raw: '<!-- waah! -- '
  , data: '<!-- waah! -- '
  , type: 'comment'
  }
]
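
Once the markup is parsed into a tree like the one above, keeping only mark tags becomes a tree walk rather than a regex problem. The following is a minimal sketch under stated assumptions: nodes are assumed to have the shape shown in the output (type, data, name, children), ordinary elements are assumed to be reported with type 'tag', and keepOnlyMark plus the hand-written sample tree are hypothetical, not part of the library.

```javascript
// Rebuild a string from a parsed node list, keeping text and <mark> elements only.
// Assumes each node looks like the output above: { type, data, name, children },
// and that ordinary elements are reported with type 'tag' (an assumption).
function keepOnlyMark(nodes) {
  var out = '';
  (nodes || []).forEach(function (node) {
    if (node.type === 'text') {
      out += node.data;
    } else if (node.type === 'tag') {
      var inner = keepOnlyMark(node.children);
      // Re-emit the wrapper only for <mark>; other tags contribute just their contents.
      out += node.name.toLowerCase() === 'mark' ? '<mark>' + inner + '</mark>' : inner;
    }
    // Every other node type (comment, script, and so on) is dropped entirely.
  });
  return out;
}

// Hand-written sample in the assumed shape (not real parser output).
var dom = [
  { type: 'tag', name: 'p', children: [
    { type: 'text', data: 'Hello ' },
    { type: 'tag', name: 'mark', children: [{ type: 'text', data: 'world' }] },
    { type: 'text', data: '!' }
  ] },
  { type: 'comment', data: ' waah! ' }
];
console.log(keepOnlyMark(dom));
// prints: Hello <mark>world</mark>!
```

This sketch discards any attributes on the mark tag and, unlike the regex, also drops the contents of script nodes instead of leaving them behind as text.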
