writing a screen scraper


Posted on 16th Feb 2014 07:03 pm by admin

Hello,

I'm writing a screen scraper application and want to be able to get absolute addresses for images from relative links.

So a link like this:

Code:
<img src="../e-commerce_in_a_box_small.jpg" alt="E-Commerce" width="100" height="134" border="0" />

might link to http://www.myointernational.com/furniture/e-commerce_in_a_box_small.jpg

If I am analysing a web address, I understand that the pseudo code would be something like this:

Code:
<?php

$string='<img src="../../e-commerce_in_a_box_small.jpg" alt="E-Commerce" width="100" height="134" border="0" />';
// we need to find the system root and replace each ../ with REAL values.

$url='http://www.myointernational.com/test_dir/';
$number_of_them=0;
if($string contains '../'){
    $number_of_them=count(the number of '../' in $string);
}
// start from the page URL and climb one level per '../'
$tmp_url=$url;
$i=1;
while($i<=$number_of_them){
    $tmp_url=go up one level from $tmp_url;
    $i++;
}
// finally, append the file name to $tmp_url
?>
<img src="<?php echo $tmp_url;?>" alt="E-Commerce" width="100" height="134" border="0" />
How would I go about finding the code to make the pseudo code work?
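
One possible way to turn the pseudo code above into working PHP is sketched below. It is a minimal sketch, not a full RFC 3986 resolver: it assumes PHP 7 or later, an absolute http/https base URL, and a relative link with no query string or fragment. The function name resolve_relative_url and the /furniture/chairs/ page URL are made up for illustration and do not come from the original post. Instead of counting the ../ occurrences up front, it splits the combined path into segments and drops one directory for every ".." it meets.

```php
<?php
// Hypothetical helper (name is an assumption, not from the original post):
// resolve a relative link against the absolute URL of the page it was found on.
function resolve_relative_url(string $base, string $relative): string
{
    // Already absolute (starts with a scheme such as http://): return unchanged.
    if (preg_match('#^[a-z][a-z0-9+.-]*://#i', $relative)) {
        return $relative;
    }

    $parts = parse_url($base);
    $root  = $parts['scheme'] . '://' . $parts['host']
           . (isset($parts['port']) ? ':' . $parts['port'] : '');

    if (substr($relative, 0, 1) === '/') {
        // Root-relative link: resolve from the site root.
        $path = $relative;
    } else {
        // Keep the base path up to and including its last "/",
        // which drops any file name such as index.html.
        $basePath = $parts['path'] ?? '/';
        $dir      = substr($basePath, 0, strrpos($basePath, '/') + 1);
        $path     = $dir . $relative;
    }

    // Walk the segments: ".." removes the previous directory,
    // "." and empty segments are skipped.
    $out = [];
    foreach (explode('/', $path) as $segment) {
        if ($segment === '..') {
            array_pop($out); // at the root this is a no-op, so we never go above it
        } elseif ($segment !== '.' && $segment !== '') {
            $out[] = $segment;
        }
    }

    return $root . '/' . implode('/', $out);
}

// Values from the question.
$url = 'http://www.myointernational.com/test_dir/';
$src = '../../e-commerce_in_a_box_small.jpg';
echo resolve_relative_url($url, $src), "\n";
// prints http://www.myointernational.com/e-commerce_in_a_box_small.jpg

// Hypothetical page URL one level below /furniture/ (an assumption for illustration).
echo resolve_relative_url('http://www.myointernational.com/furniture/chairs/', '../e-commerce_in_a_box_small.jpg'), "\n";
// prints http://www.myointernational.com/furniture/e-commerce_in_a_box_small.jpg
```

Note that with the values from the question the link climbs two levels from a page that is only one level deep, so the result is clamped at the site root, which is also what browsers do. To get the src value out of the <img> tag in the first place, PHP's DOMDocument (loadHTML, then getAttribute('src') on each img element) is generally safer than string matching.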
