Writing a screen scraper


Posted on 16th Feb 2014 07:03 pm by admin

Hello,

I'm writing a screen scraper application and want to be able to get absolute addresses for images from relative links.

So a link like this:

<img src="../e-commerce_in_a_box_small.jpg" alt="E-Commerce" width="100" height="134" border="0" />

might point to http://www.myointernational.com/furniture/e-commerce_in_a_box_small.jpg, depending on which page the tag appears on.
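Where a relative link ends up depends on the URL of the page it appears on. Below is a minimal sketch of resolving one against the other, loosely following the merge and dot-segment-removal steps of RFC 3986 section 5.2. The function name `resolve_url` and the page URL `http://www.myointernational.com/furniture/catalogue/index.html` are hypothetical. The sketch assumes PHP 7 or later and does not treat query strings or fragments specially.

```php
<?php
// Minimal sketch, assuming PHP 7+: resolve a link found on a page against
// that page's URL, loosely following RFC 3986 section 5.2. Query strings and
// fragments are not treated specially.
function resolve_url(string $base, string $rel): string
{
    // Already absolute (it has a scheme such as http:)? Use it as-is.
    if (parse_url($rel, PHP_URL_SCHEME) !== null) {
        return $rel;
    }

    $b = parse_url($base);
    $origin = $b['scheme'] . '://' . $b['host']
        . (isset($b['port']) ? ':' . $b['port'] : '');

    if (strncmp($rel, '//', 2) === 0) {
        // Protocol-relative link: only the scheme comes from the base.
        return $b['scheme'] . ':' . $rel;
    }

    if (strncmp($rel, '/', 1) === 0) {
        // Root-relative link: replaces the whole base path.
        $path = $rel;
    } else {
        // Relative link: drop everything after the last '/' of the base
        // path (the file name, if any), then append the link.
        $basePath = $b['path'] ?? '/';
        $path = substr($basePath, 0, strrpos($basePath, '/') + 1) . $rel;
    }

    // Remove '.' and '..' segments; a '..' above the root is discarded.
    $out = [];
    $seg = '';
    foreach (explode('/', $path) as $seg) {
        if ($seg === '..') {
            array_pop($out);
        } elseif ($seg !== '.') {
            $out[] = $seg;
        }
    }
    // $seg still holds the last segment: a path ending in '.' or '..'
    // names a directory, so keep a trailing '/'.
    if ($seg === '.' || $seg === '..') {
        $out[] = '';
    }

    return $origin . '/' . ltrim(implode('/', $out), '/');
}

// Hypothetical page URL on which the <img> tag was found.
$page = 'http://www.myointernational.com/furniture/catalogue/index.html';
echo resolve_url($page, '../e-commerce_in_a_box_small.jpg'), "\n";
// prints http://www.myointernational.com/furniture/e-commerce_in_a_box_small.jpg
```

Working through the path segments, rather than counting `../` and trimming the URL, also copes with `./`, root-relative (`/images/x.jpg`) and protocol-relative (`//cdn.example.com/x.jpg`) links.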

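If I am analysing a web address, I understand that the logic would be something like this:

<?php
$string = '<img src="../../e-commerce_in_a_box_small.jpg" alt="E-Commerce" width="100" height="134" border="0" />';
// We need to find the site root and replace each ../ with REAL path segments.

$url = 'http://www.myointernational.com/test_dir/'; // a directory, so it ends in '/'

// Pull the src value out of the tag.
preg_match('/src="([^"]*)"/', $string, $match);
$src = $match[1];

// Count the leading ../ segments and strip them off the src.
$number_of_them = 0;
while (strncmp($src, '../', 3) === 0) {
    $number_of_them++;
    $src = substr($src, 3);
}

// Go up one level from $url for each ../ found.
$parts = parse_url($url);
$dirs = array_filter(explode('/', $parts['path'] ?? '/'), 'strlen');
for ($i = 1; $i <= $number_of_them; $i++) {
    array_pop($dirs);
}
$tmp_url = $parts['scheme'] . '://' . $parts['host'] . '/' . implode('/', array_merge($dirs, [$src]));
?>
<img src="<?php echo htmlspecialchars($tmp_url); ?>" alt="E-Commerce" width="100" height="134" border="0" />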
How would I go about finding the right functions to do each of these steps?
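Pulling the `src` value out of a tag with string matching breaks easily on single quotes, extra whitespace or unusual attribute order. Below is a minimal sketch that uses PHP's DOM extension instead to collect every image address from a scraped page. It assumes PHP 7 or later with the DOM extension enabled, as it is in most builds. The function name `extract_img_srcs` is hypothetical.

```php
<?php
// Minimal sketch: list the src attribute of every <img> in a scraped page,
// assuming PHP 7+ with the DOM extension enabled.
function extract_img_srcs(string $html): array
{
    if ($html === '') {
        return []; // loadHTML() refuses empty input
    }

    $doc = new DOMDocument();
    // Scraped pages are rarely valid HTML; collect libxml warnings quietly
    // instead of printing them, then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $srcs = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        if ($img->hasAttribute('src')) {
            $srcs[] = $img->getAttribute('src');
        }
    }
    return $srcs;
}

$string = '<img src="../../e-commerce_in_a_box_small.jpg" alt="E-Commerce" width="100" height="134" border="0" />';
print_r(extract_img_srcs($string));
```

Each value it returns is still exactly as written in the page. A relative one still has to be resolved against the URL of the page it came from.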