Writing a screen scraper


Posted on 16th Feb 2014 07:03 pm by admin

Hello,

I'm writing a screen scraper application and want to be able to get absolute addresses for images from relative links.

So a link like this:

Code:
<img src="../e-commerce_in_a_box_small.jpg" alt="E-Commerce" width="100" height="134" border="0" />

might link to http://www.myointernational.com/furniture/e-commerce_in_a_box_small.jpg

If I am analysing a web address, I understand that the pseudo code would be something like this:

Code:
<?php

$string='<img src="../../e-commerce_in_a_box_small.jpg" alt="E-Commerce" width="100" height="134" border="0" />';
// we need to find the system root and replace the ../ with REAL values.

$url='http://www.myointernational.com/test_dir/';
$number_of_them=0;
if($string contains '../'){
$number_of_them=count(the number of them);
}
// start from the page URL and climb one directory per '../'
$tmp_url=$url;
$i=1;
while($i<=$number_of_them){
$tmp_url=go up one level from the $tmp_url;
$i++;
}
?>
<img src="<?php echo $tmp_url;?>" alt="E-Commerce" width="100" height="134" border="0" />

How would I go about finding the code to make the pseudo code work?
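
One possible way to turn the pseudo code into working PHP is sketched below. Rather than counting the '../' occurrences, it joins the relative path onto the directory of the page URL and then walks the path segments, dropping one directory for every '..'. The function name resolve_relative_url() is made up for this example, and the second page URL (furniture/chairs/index.php) is a hypothetical address chosen only to show a page with a file name. This is a minimal sketch that assumes an http(s) base URL with a host; it ignores query strings, fragments and protocol-relative (//host/...) links.

```php
<?php
// Hypothetical helper: resolve a relative src against the URL of the page it was found on.
function resolve_relative_url($base, $relative)
{
    // Already absolute (starts with a scheme such as http://): nothing to do.
    if (preg_match('#^[a-z][a-z0-9+.-]*://#i', $relative)) {
        return $relative;
    }

    // Split the page URL into its parts with the built-in parse_url().
    $parts = parse_url($base);
    $root  = $parts['scheme'] . '://' . $parts['host'];
    if (isset($parts['port'])) {
        $root .= ':' . $parts['port'];
    }
    $path = isset($parts['path']) ? $parts['path'] : '/';

    if ($relative !== '' && $relative[0] === '/') {
        // Root-relative link: it replaces the whole path.
        $combined = $relative;
    } else {
        // Keep everything up to the last slash, i.e. the directory of the page.
        $dir      = substr($path, 0, strrpos($path, '/') + 1);
        $combined = $dir . $relative;
    }

    // Walk the segments: '..' goes up one level, '.' and empty segments are skipped.
    $segments = array();
    foreach (explode('/', $combined) as $segment) {
        if ($segment === '..') {
            array_pop($segments); // at the site root this does nothing
        } elseif ($segment !== '.' && $segment !== '') {
            $segments[] = $segment;
        }
    }

    return $root . '/' . implode('/', $segments);
}

// The example from the question: two levels up from /test_dir/ stops at the site root.
echo resolve_relative_url(
    'http://www.myointernational.com/test_dir/',
    '../../e-commerce_in_a_box_small.jpg'
), "\n";
// prints http://www.myointernational.com/e-commerce_in_a_box_small.jpg

// Hypothetical page URL with a file name: one level up from /furniture/chairs/.
echo resolve_relative_url(
    'http://www.myointernational.com/furniture/chairs/index.php',
    '../e-commerce_in_a_box_small.jpg'
), "\n";
// prints http://www.myointernational.com/furniture/e-commerce_in_a_box_small.jpg
```

The result can then be echoed into the src attribute in place of $tmp_url. Note that the full file name is part of the returned address, so the image tag gets a complete URL rather than just the directory.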
