Writing a screen scraper


Posted on 16th Feb 2014 07:03 pm by admin

Hello,

I'm writing a screen scraper application and want to be able to get absolute addresses for images from relative links.

So a link like this:

Code: <img src="../e-commerce_in_a_box_small.jpg" alt="E-Commerce" width="100" height="134" border="0" />

might link to http://www.myointernational.com/furniture/e-commerce_in_a_box_small.jpg

If I am analysing a web address, I understand that the pseudo code would be something like this:

Code: <?php
// Pseudo code only, not runnable PHP.
$string='<img src="../../e-commerce_in_a_box_small.jpg" alt="E-Commerce" width="100" height="134" border="0" />';
// We need to find the site root and replace each ../ with REAL values.

$url='http://www.myointernational.com/test_dir/';
$number_of_them=0;
if($string contains '../'){
    $number_of_them=count(the number of them);
}
// Start from the page URL and climb one directory per ../
$tmp_url=$url;
$i=1;
while($i<=$number_of_them){
    $tmp_url=go up one level from $tmp_url;
    $i++;
}
?>
<img src="<?php echo $tmp_url;?>" alt="E-Commerce" width="100" height="134" border="0" />
How would I go about finding the code to make the pseudo code work?
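
One possible way to make the pseudo code above work is sketched below. It is only a minimal sketch, not a definitive implementation: the function name resolve_url() is made up for illustration, the page URL http://www.myointernational.com/furniture/chairs/ in the first usage line is an assumed example (the post does not say which page the image was found on), and PHP 7 or later is assumed. Instead of counting the ../ parts, it joins the relative path onto the page's directory and then walks the segments, dropping one directory for every "..". Protocol-relative links (//host/...) and query strings are not handled, and a trailing slash on a directory result is dropped.

```php
<?php
// Hypothetical helper (not from the original post): turn a relative src
// into an absolute URL, given the URL of the page it was found on.
function resolve_url(string $base, string $relative): string
{
    // Nothing to resolve, or the link is already absolute (has a scheme).
    if ($relative === '') {
        return $base;
    }
    if (parse_url($relative, PHP_URL_SCHEME)) {
        return $relative;
    }

    // Rebuild "scheme://host[:port]" from the page URL.
    $parts = parse_url($base);
    $root  = $parts['scheme'] . '://' . $parts['host'];
    if (isset($parts['port'])) {
        $root .= ':' . $parts['port'];
    }

    if (strpos($relative, '/') === 0) {
        // Root-relative link: ignore the page's directory.
        $path = $relative;
    } else {
        // Strip any file name from the page path, keeping the directory.
        $dir  = preg_replace('#/[^/]*$#', '/', $parts['path'] ?? '/');
        $path = $dir . $relative;
    }

    // Walk the segments: "." is skipped, ".." removes the previous one.
    // array_pop() on an empty array is harmless, so extra ".." stop at the root.
    $out = [];
    foreach (explode('/', $path) as $segment) {
        if ($segment === '' || $segment === '.') {
            continue;
        }
        if ($segment === '..') {
            array_pop($out);
            continue;
        }
        $out[] = $segment;
    }

    return $root . '/' . implode('/', $out);
}

// Usage with an assumed page URL, then with the values from the pseudo code.
echo resolve_url('http://www.myointernational.com/furniture/chairs/', '../e-commerce_in_a_box_small.jpg'), "\n";
echo resolve_url('http://www.myointernational.com/test_dir/', '../../e-commerce_in_a_box_small.jpg'), "\n";
```

With the values from the pseudo code, the second call has more ../ parts than there are directories in the page URL, so the result stops at the site root. The returned string can then be echoed into the src attribute in place of $tmp_url.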

