I have this sample code:
<?php
$adr = 'http://www.proxynova.com/proxy-server-list/country-gb/';
$c = file_get_contents($adr);
if ($c) {
    $regexp = '#<td>(.*?):(\d{1,4})</td>#';
    $matches = array();
    preg_match_all($regexp, $c, $matches);
    print_r($matches);
    if (count($matches) > 0) {
        // $matches[1] holds the captured IPs, $matches[2] the captured ports
        foreach ($matches[0] as $k => $m) {
            $port = intval($matches[2][$k]);
            $ip = trim($matches[1][$k]);
        }
    }
}
I am using $regexp = '#<td>(.*?):(\d{1,4})</td>#';
The page data includes the IP and port, but the result is empty. How can I fix this?
You can see the IP addresses in the browser, but in the page source they are scrambled, so you need to decode them first:
function decode($str)
{
    // Replace the letter groups with digits, then convert the resulting integer to a dotted IP
    return long2ip(strtr($str, array(
        'fgh' => 2,
        'iop' => 1,
        'ray' => 0,
    )));
}
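To make the decoding step concrete, here is a small worked example. The scrambled string is made up for illustration (it is not taken from the site), and the sketch assumes 64-bit PHP so that the intermediate integer fits. The letter groups are mapped back to digits, and the resulting number is the IP address in integer form.

```php
<?php
// Same idea as the decode() function above, with string replacements and an explicit int cast.
function decode(string $str): string
{
    $digits = strtr($str, array(
        'fgh' => '2',
        'iop' => '1',
        'ray' => '0',
    ));
    return long2ip((int) $digits);
}

// Hypothetical scrambled value: after substitution it becomes "3221225985".
$scrambled = '3fghfghiopfghfgh5985';
echo decode($scrambled), PHP_EOL; // prints 192.0.2.1
```

The substitution gives 3221225985, which is 192 * 2^24 + 0 * 2^16 + 2 * 2^8 + 1, hence 192.0.2.1.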
Then use DOMDocument to parse the page instead of a regular expression.
The full solution looks like this:
$doc = new DOMDocument;
libxml_use_internal_errors(true); // suppress warnings about malformed HTML
$doc->loadHTML(file_get_contents('http://www.proxynova.com/proxy-server-list/country-gb/'));
$xp = new DOMXPath($doc);
foreach ($xp->query('//table[@id="tbl_proxy_list"]//tr') as $row) {
    $ip = $xp->query('./td/span[@class="row_proxy_ip"]/script', $row);
    $port = $xp->query('./td/span[@class="row_proxy_port"]/a', $row);
    if ($ip->length && $port->length) {
        // The IP is written by an inline script as decode("...")
        if (preg_match('/decode\("([^"]+)"\)/', $ip->item(0)->textContent, $matches)) {
            echo decode($matches[1]) . ':' . $port->item(0)->textContent, PHP_EOL;
        }
    }
}
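The same approach can be tried offline, without fetching the page. The sketch below runs the XPath queries against a small HTML fragment. The fragment is an assumption modelled on the queries above, not a copy of the real page, and extractProxies() is a hypothetical helper name. It also shows why the original regular expression finds nothing: the cell contains a script call, not a plain "ip:port" string.

```php
<?php
// Decoder from the answer, with string replacements and an explicit int cast (assumes 64-bit PHP).
function decode(string $str): string
{
    return long2ip((int) strtr($str, array('fgh' => '2', 'iop' => '1', 'ray' => '0')));
}

// Hypothetical helper: returns "ip:port" strings found in the given HTML.
function extractProxies(string $html): array
{
    $doc = new DOMDocument;
    libxml_use_internal_errors(true); // suppress warnings about malformed HTML
    $doc->loadHTML($html);
    $xp = new DOMXPath($doc);

    $proxies = array();
    foreach ($xp->query('//table[@id="tbl_proxy_list"]//tr') as $row) {
        $ip = $xp->query('./td/span[@class="row_proxy_ip"]/script', $row);
        $port = $xp->query('./td/span[@class="row_proxy_port"]/a', $row);
        if ($ip->length && $port->length
            && preg_match('/decode\("([^"]+)"\)/', $ip->item(0)->textContent, $m)) {
            $proxies[] = decode($m[1]) . ':' . trim($port->item(0)->textContent);
        }
    }
    return $proxies;
}

// Assumed markup; the real page structure may differ.
$html = <<<HTML
<table id="tbl_proxy_list">
  <tr>
    <td><span class="row_proxy_ip"><script>document.write(decode("3fghfghiopfghfgh5985"));</script></span></td>
    <td><span class="row_proxy_port"><a href="#">8080</a></span></td>
  </tr>
</table>
HTML;

print_r(extractProxies($html));

// The regular expression from the question matches nothing on this markup.
var_dump(preg_match_all('#<td>(.*?):(\d{1,4})</td>#', $html));
```

Returning an array instead of echoing inside the loop makes the result easy to check or reuse.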