writing a screen scraper


Posted on 16th Feb 2014 07:03 pm by admin

Hello,

I'm writing a screen scraper application and want to be able to get absolute addresses for images from relative links.

So a link like this: Code: <img src="../e-commerce_in_a_box_small.jpg" alt="E-Commerce" width="100" height="134" border="0" /> might link to http://www.myointernational.com/furniture/e-commerce_in_a_box_small.jpg

If I am analysing a web address, I understand that the pseudo code would be something like this:Code: <?php

$string='<img src="../../e-commerce_in_a_box_small.jpg" alt="E-Commerce" width="100" height="134" border="0" />';
// we need to find the system root and replace the ../ with REAL values.

$url='http://www.myointernational.com/test_dir/';
if($string contains '../'){
$number_of_them=count(the number of them);
}
$i=1
while($i<=$number_of_them){
$tmp_url=go up one level from the $url;
$i++;
}
?>
<img src="<?php echo $tmp_url;?>" alt="E-Commerce" width="100" height="134" border="0" />
How would I go about finding the code to make the pseudo code work

No comments posted yet

Your Answer:

Login to answer
132 Like 46 Dislike
Previous forums Next forums
Other forums

Files in current folder. Should be an easy fix.
Never mind. I've asked about this before and just found my answer. Anyway to delete this?

How to Detect it is public_html or httpdocs?
How to check whether it is cPanel or Plesk?

If it found public_html perform <?php incl

IP question
ive got 2 ip addresses both global from same user how would i detect if they are local to each other

Having problemswith multithreading and prime numbers
I have an assignment when I'm suppose to do the following:

Write a multithreaded Java, Pt

Converting from one format to another
"I have got a date in DD/MM/YYYY but I need it in MM-DD-YYYY. Help!"

Somebody hacked into my site and changed coding >>> URGENT HELP NEEDED <<<
I am not that much into programming , but somebody is hacking to my site and injecting some kind of

Help on Order Entry Form/System Where is best to begin.
I have a dilema and a very short amount of time at this point and I'm looking for some help on decid

Ball movement
I want to move a ball from one point to another and it should hit a group of balls at the other end

Rounding a number queried from a database
I know that to display a rounded number you just do echo "round($number)";. But how would

UDP server recvfrom() always returns -1? :(
Hello all,
I am getting a very strange error in my code :( I am writing a server application in C

Sign up to write
Sign up now if you have flare of writing..
Login   |   Register
Follow Us
Indyaspeak @ Facebook Indyaspeak @ Twitter Indyaspeak @ Pinterest RSS



Play Free Quiz and Win Cash