writing a screen scraper


Posted on 16th Feb 2014 07:03 pm by admin

Hello,

I'm writing a screen scraper application and want to be able to get absolute addresses for images from relative links.

So a link like this: Code: <img src="../e-commerce_in_a_box_small.jpg" alt="E-Commerce" width="100" height="134" border="0" /> might link to http://www.myointernational.com/furniture/e-commerce_in_a_box_small.jpg

If I am analysing a web address, I understand that the pseudo code would be something like this:Code: <?php

$string='<img src="../../e-commerce_in_a_box_small.jpg" alt="E-Commerce" width="100" height="134" border="0" />';
// we need to find the system root and replace the ../ with REAL values.

$url='http://www.myointernational.com/test_dir/';
if($string contains '../'){
$number_of_them=count(the number of them);
}
$i=1
while($i<=$number_of_them){
$tmp_url=go up one level from the $url;
$i++;
}
?>
<img src="<?php echo $tmp_url;?>" alt="E-Commerce" width="100" height="134" border="0" />
How would I go about finding the code to make the pseudo code work?

No comments posted yet

Your Answer:

Login to answer
252 Like 22 Dislike
Previous forums Next forums
Other forums

Session problem?!?
Hello All! I am very new to the php world but I am working on fixing things that a previous programm

Code Review - SQL and Insertion Attacks (Warning: Not for Newbs)
Hey guys,

Its been a while, I know. Use to love coming here to answer peoples questions, but

Help with parsing this html
Hi,
I've got some html i just need a couple of strings from.. argh, it's freaking me out. I've t

Secure pages Sessions vs. Cookies & session_destroy() help
Im new here and new to PHP, I hope you can help me with some questions.

Im writing my web ap

Alternate messaging
I have 4 strings in MySQL db1

$string1 : Hello
$string2 : Hi
$string3 : Great
$strin

Warning: Cannot modify header information - headers already sent by (output sta
Warning: Cannot modify header information - headers already sent by (output started at /home/praylif

Code error with Index.php
Error: Parse error: syntax error, unexpected T_STRING, expecting ',' or ';' in /home/runevid/public_

SCO Unix
I know this might not be the place to ask, but, can anyone tell me if SCO Unix comes with PHP built

Tournament Brackets (Double Elimination)?
Is making a double elimination tournament style bracket system capable of being done in php?

difference between datetimes
($row['totime']-$row['fromtime'])/60

this is giving me 0

example of totime and fromtim

Sign up to write
Sign up now if you have flare of writing..
Login   |   Register
Follow Us
Indyaspeak @ Facebook Indyaspeak @ Twitter Indyaspeak @ Pinterest RSS



Play Free Quiz and Win Cash