xml parsing - DOMDocument in PHP

I have started reading the documentation and examples on DOM in order to crawl and parse a document.

For example, I have the part of a document shown below:

    <div id="showcontent">
    <table>
    <tr>
        <td>
         crap
        </td>
    </tr>
    <tr>
          <td width="172" valign="top"><a href="link"><img height="91" border="0" width="172" class="" src="img"></a></td>
          <td width="10">&nbsp;</td>
          <td valign="top"><table cellspacing="0" cellpadding="0" border="0">
              <tbody><tr>
                <td height="30"><a class="px11" href="link">title</a><a><br>
                    <span class="px10"></span>
                </a></td>
              </tr>
              <tr>
                <td><img height="1" width="580" src="crap"></td>
              </tr>
              <tr>
                <td align="right">
                    <a href="link"><img height="16" border="0" width="65" src="/buy"></a>
                </td>
              </tr>
              <tr>
                <td valign="top" class="px10">
                    <p style="width: 500px;">description.</p>
                </td>
              </tr>
          </tbody></table></td>
        </tr>
    <tr>
        <td> crap
        </td>
    </tr>
    <tr>
        <td>
         crap
        </td>
    </tr>
    </table>
    </div>

I'm trying to use the following code to get the tr tags and analyze whether there is crap or information inside them:

    $dom = new DOMDocument();
    @$dom->loadHTML($html);

    $xpath = new DOMXPath($dom);

    $tags = $xpath->query('.//div[@id="showcontent"]');
    foreach ($tags as $tag) {
        $string = "";
        // nodeValue returns only the concatenated text content, without any tags
        $string = trim($tag->nodeValue);
        if (strlen($string) > 3) {
            echo $string;
            echo '<br>';
        }
    }

However, I'm getting a stripped string without tags, for example:

crap  crap title description

But I want to get:

    <tr>
       <td>crap</td>
    </tr>
    <tr>
       <a href="link">title</a>
    </tr>

How do I keep the HTML nodes (tags)?

If you want to work with DOM, you have to understand the concept. Everything in a DOM document, including the DOMDocument itself, is a node.

The DOMDocument is a hierarchical tree structure of nodes. It starts with a root node. That root node can have child nodes, and these child nodes can have child nodes of their own. Everything in a DOMDocument is a node type of some sort, be it elements, attributes or text content.

              HTML                               Legend:
             /    \                              UPPERCASE = DOMElement
           HEAD  BODY                            lowercase = DOMAttr
          /          \                           "Quoted"  = DOMText
        TITLE        DIV - class - "header"
         |             \
    "The Title"        H1
                        |
               "Welcome to Nodeville"

The diagram above shows a DOMDocument with some nodes. There is a root element (HTML) with two children (HEAD and BODY). The connecting lines are called axes. If you follow the axis down to the TITLE element, you will see that it has one DOMText leaf. This is important because it illustrates an often overlooked thing:

    <title>The Title</title>

is not one, but two nodes: a DOMElement with a DOMText child. Likewise, this

    <div class="header">

is really three nodes: the DOMElement with a DOMAttr holding a DOMText. Because all of these inherit their properties and methods from DOMNode, it is essential that you familiarize yourself with the DOMNode class.
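To make the node counting above concrete, here is a minimal sketch that loads a document like the one in the diagram and prints the class of each node involved. The `describeNode()` helper is hypothetical and only exists for this illustration; the markup is an assumed reconstruction of the diagram.

```php
<?php
// Hypothetical helper for illustration: labels a node as "Class(nodeName)".
function describeNode(DOMNode $node): string
{
    return get_class($node) . '(' . $node->nodeName . ')';
}

$dom = new DOMDocument();
// Assumed markup, modelled on the diagram.
$dom->loadHTML(
    '<html><head><title>The Title</title></head>'
    . '<body><div class="header"><h1>Welcome to Nodeville</h1></div></body></html>'
);

// <title>The Title</title>: an element node plus a text node child.
$title = $dom->getElementsByTagName('title')->item(0);
echo describeNode($title), "\n";
echo describeNode($title->firstChild), "\n";

// <div class="header">: an element node, an attribute node, and the text it holds.
$div   = $dom->getElementsByTagName('div')->item(0);
$class = $div->getAttributeNode('class');
echo describeNode($div), "\n";
echo describeNode($class), "\n";
echo describeNode($class->firstChild), "\n";
```

Every object printed here extends DOMNode, which is why properties such as `nodeName`, `parentNode` and `firstChild` are available on all of them.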

In practice, this means the div you fetched is linked to all the other nodes in the document. You could go all the way up to the root element or down to the leaves at any time. It's all there. You just have to query or traverse the document for the wanted information.
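As a rough sketch of that linkage, the following walks from a fetched div up to the root element and down to a leaf. The `ancestorPath()` helper and the shortened markup are assumptions made for the example, not part of the question's code.

```php
<?php
// Hypothetical helper: element names from the document root down to $node.
function ancestorPath(DOMNode $node): array
{
    $names = [];
    // parentNode of the root element is the DOMDocument, which ends the loop.
    for ($current = $node; $current instanceof DOMElement; $current = $current->parentNode) {
        array_unshift($names, $current->nodeName);
    }
    return $names;
}

$dom = new DOMDocument();
// Assumed, shortened version of the markup from the question.
$dom->loadHTML(
    '<html><body><div id="showcontent"><table><tr><td>crap</td></tr></table></div></body></html>'
);

$xpath = new DOMXPath($dom);
$div   = $xpath->query('//div[@id="showcontent"]')->item(0);

// Upwards: the div still knows its ancestors and its owning document.
echo implode(' > ', ancestorPath($div)), "\n";   // html > body > div
echo $div->ownerDocument->isSameNode($dom) ? "same document\n" : "other document\n";

// Downwards: descendants are reachable from the same node.
$cell = $div->getElementsByTagName('td')->item(0);
echo $cell->textContent, "\n";
```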

Whether you do that by iterating the childNodes of the div, or by using getElementsByTagName() or XPath, is up to you. You just have to understand that you are not working with raw HTML, but with nodes representing that entire HTML document.
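The three approaches can be sketched side by side as follows. The markup is an assumed, simplified stand-in for the question's document, and the variable names are only illustrative.

```php
<?php
$dom = new DOMDocument();
// Assumed, simplified markup: one link nested in a p, one directly in the div.
$dom->loadHTML(
    '<html><body><div id="showcontent">'
    . '<p><a href="link">title</a></p><a href="link">buy</a>'
    . '</div></body></html>'
);

$xpath = new DOMXPath($dom);
$div   = $xpath->query('//div[@id="showcontent"]')->item(0);

// 1. Iterating childNodes visits direct children only.
$childNames = [];
foreach ($div->childNodes as $child) {
    $childNames[] = $child->nodeName;
}

// 2. getElementsByTagName() finds matching descendants at any depth.
$linksByTagName = $div->getElementsByTagName('a')->length;

// 3. XPath, evaluated relative to the div via the second argument of query().
$linksByXPath = $xpath->query('.//a', $div)->length;

echo implode(', ', $childNames), "\n";
echo $linksByTagName, ' ', $linksByXPath, "\n";
```

Note that childNodes also returns text nodes (including whitespace between tags) when the source markup contains them, so real-world loops usually check the node type first.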

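If you need help with extracting specific information from the document, you need to clarify what information you want to fetch from it. For instance, you could ask how to fetch all the links from the table, and the answer would be something like: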

    $div = $dom->getElementById('showcontent');
    foreach ($div->getElementsByTagName('a') as $link)
    {
        // saveXML() with a node argument outputs that node including its tags
        echo $dom->saveXML($link);
    }

But unless you are more specific, we can only guess which nodes might be relevant.

If you need more examples and code snippets on how to work with DOM, browse through previous answers to related questions:
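Guessing that the goal is to keep the markup of each tr, one possible sketch is to pass every row node to `saveHTML()`, which serialises a node together with its tags. The `rowsAsHtml()` helper is hypothetical, the sample markup is shortened, and the sketch assumes the id contains no quote characters.

```php
<?php
// Hypothetical helper: returns the markup of every tr below the element with the given id.
// Assumes $id contains no double quotes, since it is placed directly into the XPath.
function rowsAsHtml(string $html, string $id): array
{
    $dom = new DOMDocument();
    // Collect parser warnings internally instead of silencing them with @.
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $rows  = [];
    foreach ($xpath->query('//div[@id="' . $id . '"]//tr') as $tr) {
        // saveHTML() with a node argument outputs that node including its tags.
        $rows[] = $dom->saveHTML($tr);
    }
    return $rows;
}

// Assumed, shortened version of the markup from the question.
$html = '<div id="showcontent"><table>'
      . '<tr><td>crap</td></tr>'
      . '<tr><td><a href="link">title</a></td></tr>'
      . '</table></div>';

$rows = rowsAsHtml($html, 'showcontent');
foreach ($rows as $row) {
    echo $row, "\n";
}
```

With nested tables, as in the original document, `//tr` also matches the inner rows, so an outer row's markup will contain the inner ones. A more specific XPath would be needed to select only one level.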

https://stackoverflow.com/search?q=user%3a208809+dom

By now, there should be a snippet for every basic to medium use case you might have with DOM.

Abdulmateen
