writing a screen scraper


Posted on 16th Feb 2014 07:03 pm by admin

Hello,

I'm writing a screen scraper application and want to be able to get absolute addresses for images from relative links.

So a link like this: Code: <img src="../e-commerce_in_a_box_small.jpg" alt="E-Commerce" width="100" height="134" border="0" /> might link to http://www.myointernational.com/furniture/e-commerce_in_a_box_small.jpg

If I am analysing a web address, I understand that the pseudo code would be something like this:Code: <?php

$string='<img src="../../e-commerce_in_a_box_small.jpg" alt="E-Commerce" width="100" height="134" border="0" />';
// we need to find the system root and replace the ../ with REAL values.

$url='http://www.myointernational.com/test_dir/';
if($string contains '../'){
$number_of_them=count(the number of them);
}
$i=1
while($i<=$number_of_them){
$tmp_url=go up one level from the $url;
$i++;
}
?>
<img src="<?php echo $tmp_url;?>" alt="E-Commerce" width="100" height="134" border="0" />
How would I go about finding the code to make the pseudo code work?

No comments posted yet

Your Answer:

Login to answer
252 Like 22 Dislike
Previous forums Next forums
Other forums

dropdown box help - open php files to textarea
Hi, I am using tinymce to edit content located in several php files. The code I attached works but i

Using two $_POST Function / Switch () statements, second does not work.
Hi all. I’m new to php and am having a problem getting $_POST Function / switch () to work. I

a very simple php header question (sorry!)
Firstly...I do apologise if this annoys anyone....a header error

I'm do not know php &

Not showing whole name with mail () script
I sent up a simple mail form with the PHP mail() script. One problem is when it sends an email with

Retrieving the 25 most recently added entries from all tables in a MySQL databas
Hello,

The code below works great. It creates a table that shows the 25 most recently added t

My XSRF Prevention code isn't working
First of all, thanks for the generous help you guys have given me in the past on this forum.
Seco

Get last modified date of web page
Hai All,

In php how can i get last modified date of a give web page . I have tried to g

Encrypt php code?
Is it possible to encrypt php code in files,
so that it displays a load of unreadable characters

pageination not working right... coping images over 4 pages
Code: <?php //This code will obtain the required page number from the $_GET array. Note that

simplexml_load_file and rss problem
Hi,

I have a problem parsing an rss feed using simplexml_load_file - this is strange as i hav

Sign up to write
Sign up now if you have flare of writing..
Login   |   Register
Follow Us
Indyaspeak @ Facebook Indyaspeak @ Twitter Indyaspeak @ Pinterest RSS



Play Free Quiz and Win Cash