Writing a screen scraper


Posted on 16th Feb 2014 07:03 pm by admin

Hello,

I'm writing a screen scraper application and want to be able to get absolute addresses for images from relative links.

So a link like this:

Code: <img src="../e-commerce_in_a_box_small.jpg" alt="E-Commerce" width="100" height="134" border="0" />

might point to http://www.myointernational.com/furniture/e-commerce_in_a_box_small.jpg, depending on the address of the page it appears on.

If I am analysing a web address, I understand that the pseudo code would be something like this:

Code:
<?php

$string='<img src="../../e-commerce_in_a_box_small.jpg" alt="E-Commerce" width="100" height="134" border="0" />';
// we need to find the site root and replace each ../ with REAL values.

$url='http://www.myointernational.com/test_dir/';
$number_of_them=0;
if($string contains '../'){
$number_of_them=count(the number of them);
}
// start from the page's own directory and climb one level per ../
$tmp_url=$url;
$i=1;
while($i<=$number_of_them){
$tmp_url=go up one level from the $tmp_url;
$i++;
}
?>
<img src="<?php echo $tmp_url;?>e-commerce_in_a_box_small.jpg" alt="E-Commerce" width="100" height="134" border="0" />

How would I go about writing real code to make the pseudo code work?
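
For what it's worth, here is a minimal sketch of one way the pseudo code above could be turned into working PHP. It is not the only approach and makes several assumptions: the function name resolve_relative_url() is made up for this example, it needs PHP 7 or later, the base URL is assumed to be a well-formed http(s) address with a host, and the relative link is assumed to carry no query string or fragment. Protocol-relative links (starting with //) are not handled, and a trailing slash on the result is dropped. Instead of counting the ../ parts, it splits the combined path on / and removes one directory for every .. it meets.

```php
<?php
// Hypothetical helper (not from the original post): resolve a relative
// link such as "../img.jpg" against the URL of the page it was found on.
function resolve_relative_url(string $base, string $relative): string
{
    // A link that already has a scheme (http:, https:, ...) is absolute.
    if (parse_url($relative, PHP_URL_SCHEME) !== null) {
        return $relative;
    }

    // Assumption: $base is well formed and contains a scheme and a host.
    $parts = parse_url($base);
    $root  = $parts['scheme'] . '://' . $parts['host']
           . (isset($parts['port']) ? ':' . $parts['port'] : '');

    if (substr($relative, 0, 1) === '/') {
        // Root-relative link: start from the site root.
        $path = $relative;
    } else {
        // Keep the directory part of the base path, drop any file name.
        $basePath = $parts['path'] ?? '/';
        $dir      = substr($basePath, 0, strrpos($basePath, '/') + 1);
        $path     = $dir . $relative;
    }

    // Walk the path: ".." removes the previous directory, "." and
    // empty segments are skipped. Going above the root is ignored.
    $segments = [];
    foreach (explode('/', $path) as $segment) {
        if ($segment === '..') {
            array_pop($segments);
        } elseif ($segment !== '.' && $segment !== '') {
            $segments[] = $segment;
        }
    }

    return $root . '/' . implode('/', $segments);
}

// Usage with the values from the question.
$string = '<img src="../../e-commerce_in_a_box_small.jpg" alt="E-Commerce" width="100" height="134" border="0" />';
$url    = 'http://www.myointernational.com/test_dir/';

// Simple pattern for a double-quoted src attribute (an assumption about the markup).
if (preg_match('/<img[^>]+src="([^"]+)"/i', $string, $match)) {
    echo resolve_relative_url($url, $match[1]), "\n";
    // prints http://www.myointernational.com/e-commerce_in_a_box_small.jpg
}
```

Because http://www.myointernational.com/test_dir/ is only one level deep, the second ../ has nowhere to go and is ignored, which is also how browsers treat it. For anything beyond a quick scraper, a proper HTML parser is probably a safer way to pull out the src values than a regular expression.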
