JavaScript - Strip all HTML from a string, except <mark> tags
I have a large string in JavaScript and need to strip the HTML from it, except for some specific tags.
I'm currently using:
var nohtml = /(<([^>]+)>)/ig;
Right now this strips all HTML. Is there something I can add to the regex so that it ignores <mark> tags while doing this?
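For reference, here is how a regex like the one above is typically applied with `String.prototype.replace`. The sample string is made up for illustration. Because the pattern matches every `<...>` run, `<mark>` is removed along with everything else:

```javascript
// Regex from the question: matches any "<...>" run, i.e. every tag.
const nohtml = /(<([^>]+)>)/ig;

// Hypothetical sample input.
const input = 'Some <b>bold</b> and <mark>highlighted</mark> text.';

// replace() with a global regex removes every match, so <mark> goes too.
const stripped = input.replace(nohtml, '');

console.log(stripped); // "Some bold and highlighted text."
```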
As mentioned in the comments above, regex isn't the right tool for parsing HTML. That being said, one way is to use a negative lookahead for the tags you want to keep:
var nohtml = /(?!(<ul|<\/ul>))(<([^>]+)>)/ig;
In this example, the tag being kept is "ul".
So for your specific case:
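The lookahead pattern above can be generalised to any list of tags to keep. This is a minimal sketch; `stripTagsExcept` is a hypothetical helper name, and it assumes plain alphanumeric tag names (no regex metacharacters). For `['ul']` it builds exactly the regex shown above:

```javascript
// Hypothetical helper (not from the answer): builds the same kind of
// negative-lookahead regex for any list of tag names to keep.
// Assumes tag names are plain alphanumerics such as 'ul' or 'mark'.
function stripTagsExcept(html, keep) {
  // For each kept tag, refuse to start a match at "<tag" or "</tag>".
  const alternatives = keep.flatMap((name) => ['<' + name, '<\\/' + name + '>']);
  const re = new RegExp('(?!(' + alternatives.join('|') + '))(<([^>]+)>)', 'ig');
  return html.replace(re, '');
}

const list = '<div><ul><li>one</li><li>two</li></ul></div>';
console.log(stripTagsExcept(list, ['ul'])); // "<ul>onetwo</ul>"
```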
var nohtml = /(?!(<mark|<\/mark>))(<([^>]+)>)/ig;
You can see it working in this fiddle: https://jsfiddle.net/0xgs0u9m/
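A quick check of the mark-specific regex on a made-up string. One caveat worth knowing: the lookahead only tests the `<mark` prefix, so a tag such as `<marker>` is also kept (while `</marker>` is stripped). Adding a word boundary `\b` after the tag name is one way to tighten it; the `strict` variant below is a suggested alternative, not part of the original answer:

```javascript
// The answer's regex for this case: keep <mark ...> and </mark>, strip the rest.
const nohtml = /(?!(<mark|<\/mark>))(<([^>]+)>)/ig;

// Hypothetical input containing both <mark> and a look-alike <marker> tag.
const html = '<p>Find <mark>this</mark> and <marker>that</marker></p>';
console.log(html.replace(nohtml, ''));
// "Find <mark>this</mark> and <marker>that"

// Suggested tighter variant: \b stops "mark" from matching the start of "marker".
const strict = /<(?!\/?mark\b)[^>]+>/ig;
console.log(html.replace(strict, ''));
// "Find <mark>this</mark> and that"
```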
You may instead want to consider using an HTML parser from npm:
https://www.npmjs.com/package/htmlparser
From the package's example:
var handler = new Tautologistics.NodeHtmlParser.DefaultHandler(function (error, dom) {
    if (error) {
        // ...handle errors...
    } else {
        // ...parsing done, do something with dom...
    }
});
var parser = new Tautologistics.NodeHtmlParser.Parser(handler);
parser.parseComplete(document.body.innerHTML);
alert(JSON.stringify(handler.dom, null, 2));
This results in a structure like:
[ { raw: 'xyz ', data: 'xyz ', type: 'text' }
, { raw: 'script language= javascript'
  , data: 'script language= javascript'
  , type: 'script'
  , name: 'script'
  , attribs: { language: 'javascript' }
  , children:
     [ { raw: 'var foo = \'<bar>\';<'
       , data: 'var foo = \'<bar>\';<'
       , type: 'text'
       }
     ]
  }
, { raw: '<!-- waah! -- '
  , data: '<!-- waah! -- '
  , type: 'comment'
  }
]
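Once you have a DOM array like the one above, keeping only <mark> tags becomes a tree walk rather than a regex. This is a minimal sketch under stated assumptions: element nodes are assumed to use `type: 'tag'` (only text, script and comment nodes appear in the output above), and text `data` is assumed to still be raw HTML text. `keepMarkOnly` and the sample `dom` are hypothetical and hand-built, not real parser output:

```javascript
// Hypothetical helper: serialises a DOM array shaped like the output above
// (objects with type, data, name, attribs, children), keeping only <mark>.
// ASSUMPTION: element nodes have type 'tag'; text `data` is raw HTML text.
function keepMarkOnly(nodes) {
  let out = '';
  for (const node of nodes) {
    if (node.type === 'text') {
      out += node.data;
    } else if (node.type === 'tag') {
      const inner = keepMarkOnly(node.children || []);
      if (node.name.toLowerCase() === 'mark') {
        // Rebuild the opening tag, escaping double quotes in attribute values.
        const attrs = Object.entries(node.attribs || {})
          .map(([k, v]) => ` ${k}="${String(v).replace(/"/g, '&quot;')}"`)
          .join('');
        out += `<mark${attrs}>${inner}</mark>`;
      } else {
        out += inner; // drop the tag itself but keep its text content
      }
    }
    // script, style, comment and any other node types are dropped entirely.
  }
  return out;
}

// Hand-built sample in the same shape as the parser output.
const dom = [
  { type: 'text', data: 'Find ' },
  { type: 'tag', name: 'mark', attribs: { class: 'hit' },
    children: [{ type: 'text', data: 'this' }] },
  { type: 'text', data: ' in ' },
  { type: 'tag', name: 'b', children: [{ type: 'text', data: 'bold' }] },
  { type: 'comment', data: ' note ' },
];
console.log(keepMarkOnly(dom)); // 'Find <mark class="hit">this</mark> in bold'
```

Dropping script nodes (rather than keeping their text children) is a deliberate choice here, since script contents are rarely wanted as visible text.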