Writing a screen scraper


Posted on 16th Feb 2014 07:03 pm by admin

Hello,

I'm writing a screen scraper application and want to be able to get absolute addresses for images from relative links.

So an image tag like this:
Code: <img src="../e-commerce_in_a_box_small.jpg" alt="E-Commerce" width="100" height="134" border="0" />
might resolve to http://www.myointernational.com/furniture/e-commerce_in_a_box_small.jpg, depending on the address of the page it appears on.

If I am analysing a web address, I understand that the pseudo code would be something like this:
Code: <?php

$string='<img src="../../e-commerce_in_a_box_small.jpg" alt="E-Commerce" width="100" height="134" border="0" />';
// we need to find the site root and replace the ../ with REAL values.

$url='http://www.myointernational.com/test_dir/';
$number_of_them=0;
if($string contains '../'){
    $number_of_them=count(the number of them);
}
$tmp_url=$url; // start from the directory of the page being scraped
$i=1;
while($i<=$number_of_them){
    $tmp_url=go up one level from the $tmp_url;
    $i++;
}
?>
<img src="<?php echo $tmp_url;?>e-commerce_in_a_box_small.jpg" alt="E-Commerce" width="100" height="134" border="0" />
How would I go about finding the code to make the pseudo code work?
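
One possible way to turn the pseudo code into working PHP is sketched below. Rather than counting the ../ parts, it joins the relative path onto the directory of the page URL and then walks the path segments, dropping one level for every "..". The function name resolve_url is made up for this example (it is not a PHP built-in). The sketch assumes the base URL has a scheme and a host, and it does not handle query strings, fragments or trailing slashes in the relative link, so treat it as a starting point rather than a complete resolver.

```php
<?php
// Hypothetical helper (name chosen for this example, not a PHP built-in):
// resolves a relative src/href against the URL of the page it was found on.
function resolve_url(string $base, string $relative): string
{
    // A link that already has a scheme (http:, https:, ...) is absolute.
    if (is_string(parse_url($relative, PHP_URL_SCHEME))) {
        return $relative;
    }

    // Assumption: $base is a full URL with at least a scheme and a host.
    $parts  = parse_url($base);
    $origin = $parts['scheme'] . '://' . $parts['host']
            . (isset($parts['port']) ? ':' . $parts['port'] : '');

    // Protocol-relative link such as //cdn.example.com/a.jpg
    if (strpos($relative, '//') === 0) {
        return $parts['scheme'] . ':' . $relative;
    }

    if (strpos($relative, '/') === 0) {
        // Root-relative link: ignore the directory of the base page.
        $path = $relative;
    } else {
        // Keep everything up to and including the last "/" of the base path.
        $basePath = $parts['path'] ?? '/';
        $path = substr($basePath, 0, strrpos($basePath, '/') + 1) . $relative;
    }

    // Walk the segments: ".." removes the previous one, "." and "" are skipped.
    $resolved = [];
    foreach (explode('/', $path) as $segment) {
        if ($segment === '..') {
            array_pop($resolved); // already at the site root: nothing to remove
        } elseif ($segment !== '.' && $segment !== '') {
            $resolved[] = $segment;
        }
    }

    return $origin . '/' . implode('/', $resolved);
}

$url = 'http://www.myointernational.com/test_dir/';
echo resolve_url($url, '../../e-commerce_in_a_box_small.jpg'), "\n";
// prints http://www.myointernational.com/e-commerce_in_a_box_small.jpg
```

Note that the second ../ in the example cannot go above the site root, so it is simply ignored. To get the src values out of the scraped HTML in the first place, PHP's DOMDocument (loadHTML() followed by getElementsByTagName('img') and getAttribute('src')) is usually more reliable than string matching.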
