C# 4.0 - Removing HTML Tags using XmlDocument in C# 4.0


I have the code below, which tries to delete the HTML elements whose tag names are passed in.

    // Sample HTML string
    string inputString = "<img class=\"imgright\" title=\"zürich, switzerland\" src=\"test.png\" alt=\"switzerland\" width=\"44\" height=\"44\"/> <p class=\"first\">zurich</p> <p class=\"second\">test</p> <p class=\"first\">testing</p> <img class=\"imgright\" title=\"zürich, switzerland\" src=\"1.png\" alt=\"switzerland\" width=\"44\" height=\"44\"/> <a href=\"test.aspx\">hello</a>";

    string[] htmlTags = new string[] { "a", "img", "link:componentlink" };

    // Throws "There are multiple root elements."
    string removedTagsHtml = RemoveHtmlTags(inputString, htmlTags);

    public static string RemoveHtmlTags(string inputString, string[] htmlTags)
    {
        string strResult = string.Empty;
        foreach (string htmlTag in htmlTags)
        {
            XmlDocument xDoc = new XmlDocument();
            xDoc.LoadXml(inputString);
            XmlNamespaceManager xMan = new XmlNamespaceManager(xDoc.NameTable);
            xMan.AddNamespace("xs", xDoc.DocumentElement.NamespaceURI);

            XmlNode xNode = xDoc.SelectSingleNode("xs:" + htmlTag + "", xMan);
            xDoc.RemoveAll();
            xDoc.AppendChild(xNode);
            string seeOutputHere = xDoc.OuterXml;
        }
        return strResult;
    }

The function throws the error "There are multiple root elements."

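Even if you fix the "multiple root elements" problem (see "LINQ to XML - Load XML fragments from file" for one example), general-case HTML is still not valid XML: tags such as <br> may be left unclosed and attribute values may be unquoted, so an XML parser will reject it.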
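One way to get past the "multiple root elements" error is to wrap the fragment in a synthetic root element before loading it. The following is a minimal sketch, assuming the input is a well-formed XML fragment (every tag closed, every attribute quoted, no undeclared namespace prefixes). The names `FragmentTagRemover` and `RemoveTags` are hypothetical and do not come from the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class FragmentTagRemover
{
    // Removes every element whose name appears in tagNames.
    // Assumption: 'fragment' is well-formed XML apart from lacking a single root.
    public static string RemoveTags(string fragment, string[] tagNames)
    {
        var doc = new XmlDocument();

        // Wrap the fragment so the parser sees exactly one root element.
        doc.LoadXml("<root>" + fragment + "</root>");

        var names = new HashSet<string>(tagNames, StringComparer.OrdinalIgnoreCase);

        // Collect the matches first, then remove them, so the node list
        // is not modified while it is being enumerated.
        var toRemove = new List<XmlNode>();
        foreach (XmlNode node in doc.DocumentElement.SelectNodes(".//*"))
        {
            if (names.Contains(node.Name))
            {
                toRemove.Add(node);
            }
        }

        foreach (XmlNode node in toRemove)
        {
            node.ParentNode.RemoveChild(node);
        }

        // InnerXml drops the synthetic <root> wrapper again.
        return doc.DocumentElement.InnerXml;
    }
}
```

Calling `FragmentTagRemover.RemoveTags(inputString, new[] { "a", "img" })` on a well-formed fragment should return the markup without the `a` and `img` elements. Real-world HTML will still make `LoadXml` throw, which is the limitation described above.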

For HTML processing, you should use HtmlAgilityPack.
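As a rough sketch of what that could look like, the snippet below removes elements by tag name with HtmlAgilityPack. It assumes the HtmlAgilityPack package is referenced in the project; the class and method names (`HtmlTagRemover`, `RemoveTags`) are hypothetical and only illustrate the approach.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class HtmlTagRemover
{
    // Removes every element whose tag name appears in tagNames.
    // Unlike XmlDocument, the parser accepts fragments and malformed HTML.
    public static string RemoveTags(string html, string[] tagNames)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var names = new HashSet<string>(tagNames, StringComparer.OrdinalIgnoreCase);

        // ToList() takes a snapshot so nodes can be removed safely afterwards.
        List<HtmlNode> toRemove = doc.DocumentNode
            .Descendants()
            .Where(node => names.Contains(node.Name))
            .ToList();

        foreach (HtmlNode node in toRemove)
        {
            node.Remove();
        }

        return doc.DocumentNode.OuterHtml;
    }
}
```

With the sample input from the question, `HtmlTagRemover.RemoveTags(inputString, htmlTags)` would be expected to leave only the `p` elements and the whitespace between them, without requiring a single root element.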

