writing a screen scraper


Posted on 16th Feb 2014 07:03 pm by admin

Hello,

I'm writing a screen scraper application and want to be able to get absolute addresses for images from relative links.

Did you know?Explore Trending and Topic pages for more stories like this.
So a link like this: Code: <img src="../e-commerce_in_a_box_small.jpg" alt="E-Commerce" width="100" height="134" border="0" /> might link to http://www.myointernational.com/furniture/e-commerce_in_a_box_small.jpg

If I am analysing a web address, I understand that the pseudo code would be something like this:Code: <?php

$string='<img src="../../e-commerce_in_a_box_small.jpg" alt="E-Commerce" width="100" height="134" border="0" />';
// we need to find the system root and replace the ../ with REAL values.

$url='http://www.myointernational.com/test_dir/';
if($string contains '../'){
$number_of_them=count(the number of them);
}
$i=1
while($i<=$number_of_them){
$tmp_url=go up one level from the $url;
$i++;
}
?>
<img src="<?php echo $tmp_url;?>" alt="E-Commerce" width="100" height="134" border="0" />
How would I go about finding the code to make the pseudo code work?
No comments posted yet

Your Answer:

Login to answer
252 Like 22 Dislike
Previous forums Next forums
Other forums

Can we convert non uni code system into unicode
Hi All,

Presently i am using non-unicode system and the sap version is 4.7.
Can i c

PHP Captcha Error help - replace the "die" command
Hi All,

I am after a bit of help with a Captcha spam protection box.

The site gave me

Search function
I am looking for some guidance from the experts.

I am trying to create a search function. It

Why is this function returning a false value when it shouldn't be??
This is in an include file. I want it to check a value in an html form and see if it's just white s

Weird Problem with curl while sending data
I am facing a weird problem with sending data to a site via curl.

If i send the value by this

ORA-00932: inconsistent datatypes: expected - got CLOB
SO : windows xp
database : XE

there is a table (transito) with two fields of clob

Click counter to ignore traffic from search bots
I have a click counter on my site that...well, counts the number of clicks a link gets on the frontp

Probably a simple error...
I'm getting the error -- Parse error: syntax error, unexpected '{' in /home/content/c/s/t/csteffen24

Loop Through Date Range
Hi guys,
I have date range as parameter like 01/JAN/2009 TO 16/JAN/2009 now i want to loop thro

Reduce redundancies in switch functions?
Hello all! I somewhat new to PHP, and was wondering if anyone could give some suggestions on a swit

Sign up to write
Sign up now if you have flare of writing..
Login   |   Register
Follow Us
Indyaspeak @ Facebook Indyaspeak @ Twitter Indyaspeak @ Pinterest RSS



Play Free Quiz and Win Cash