writing a screen scraper


Posted on 16th Feb 2014 07:03 pm by admin

Hello,

I'm writing a screen scraper application and want to be able to get absolute addresses for images from relative links.

So a link like this: Code: <img src="../e-commerce_in_a_box_small.jpg" alt="E-Commerce" width="100" height="134" border="0" /> might link to http://www.myointernational.com/furniture/e-commerce_in_a_box_small.jpg

If I am analysing a web address, I understand that the pseudo code would be something like this:Code: <?php

$string='<img src="../../e-commerce_in_a_box_small.jpg" alt="E-Commerce" width="100" height="134" border="0" />';
// we need to find the system root and replace the ../ with REAL values.

$url='http://www.myointernational.com/test_dir/';
if($string contains '../'){
$number_of_them=count(the number of them);
}
$i=1
while($i<=$number_of_them){
$tmp_url=go up one level from the $url;
$i++;
}
?>
<img src="<?php echo $tmp_url;?>" alt="E-Commerce" width="100" height="134" border="0" />
How would I go about finding the code to make the pseudo code work?

No comments posted yet

Your Answer:

Login to answer
252 Like 22 Dislike
Previous forums Next forums
Other forums

Wierd echo error?
Hi, i got the most wierd php error ever and i don't know why..
Code: echo "<t

What's best way to get a user's Word doc converted to simple html and images?
Hi all,

I was just wondering if anybody has any experience of this.
Basically, I'm buildin

Looping Problem
I've got a client that has a database with about 200 events at any given time. I'm trying to loop t

Update Myspace status with CURL
Logging in:

Code: <?php

class Myspace
{
function login($username, $pa

PHP If Else statement for breadcrumb
Hi

I am trying to use a PHP if else statement to display a breadcrumb link on wordpress

smart reading from a text file
Hello there fellow coders, i was wondering if one of you wouldnt mind helping me with this problem i

Date Format
Hi there,

I have a date format like this right now:
Sat, 17 Oct 2009 17:04:00

I ne

RTF fomatting to email content
Im trying to sen an email with content is picked up from a rtf-file (file_get_contents('*.rtf'). Mai

Target costs on Process orders not calculating
Hi All,

We have released standard costs for all the materials. We have also done Goods r

Slow data retrieval which requires improvement..please help
I am working on a Help Desk Ticketing system and have a page called MY TICKETS which shows all ticke

Sign up to write
Sign up now if you have flare of writing..
Login   |   Register
Follow Us
Indyaspeak @ Facebook Indyaspeak @ Twitter Indyaspeak @ Pinterest RSS



Play Free Quiz and Win Cash