Best way to cross-match large datasets


Posted on 16th Feb 2014 07:03 pm by admin

Hi,

I'm running a script in which I cross-match about 200,000 data sets with each other. Each data set consists of 8 parameters, and for each data set I want to count all the data sets that have similar or identical parameters.

Right now, I am doing the matching via a MySQL query, which I'm calling about 200,000 times. The problem is that running a query this way is extremely expensive: it takes up to 2 hours until the script is done. So I am wondering if there is a better method to cross-match data sets, and if some of you could help me find a better solution.

While researching, I found out that arrays may be a faster alternative to queries. So far, I have identified 3 possible ways of cross-matching:

1. Nested foreach () loops
foreach ($array as $ar1)
    foreach ($array as $ar2)
        if ($ar1[0] == $ar2[0]) ….

2. Using array_map with a callback function, so that I would have only one "hand-coded" loop
foreach ($array as $arr)
    if ($arr[0] == $parameter) ….

3. array_walk, where I could save one "hand-coded" loop as well.
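
To make the comparison concrete, the three variants could be sketched roughly as follows. This is only an illustration: the function names (countNested, countWithMap, countWithWalk) and the sample rows are made up, the rows are shortened to 3 parameters, and only the first parameter is compared, as in the snippets above.

```php
<?php
// Hypothetical sample data: each row stands for one data set
// (shortened to 3 parameters here instead of 8).
$array = [
    [1, 'a', 10],
    [1, 'b', 20],
    [2, 'a', 10],
];

// 1. Nested foreach loops: compare every row with every other row.
function countNested(array $array): array
{
    $counts = [];
    foreach ($array as $i => $ar1) {
        $counts[$i] = 0;
        foreach ($array as $j => $ar2) {
            if ($i !== $j && $ar1[0] == $ar2[0]) {
                $counts[$i]++;
            }
        }
    }
    return $counts;
}

// 2. array_map with a callback: the outer loop is hidden inside array_map,
//    but the callback still scans the whole array for each row.
function countWithMap(array $array): array
{
    return array_map(function (array $ar1) use ($array): int {
        $matches = 0;
        foreach ($array as $arr) {
            if ($arr[0] == $ar1[0]) {
                $matches++;
            }
        }
        return $matches - 1; // do not count the row itself
    }, $array);
}

// 3. array_walk: same idea, with the results collected by reference.
function countWithWalk(array $array): array
{
    $counts = [];
    array_walk($array, function (array $ar1, $i) use ($array, &$counts): void {
        $counts[$i] = 0;
        foreach ($array as $j => $ar2) {
            if ($i !== $j && $ar1[0] == $ar2[0]) {
                $counts[$i]++;
            }
        }
    });
    return $counts;
}
```

Written this way, all three variants still compare every row with every other row, so for 200,000 rows each of them performs on the order of 200,000 x 200,000 comparisons. The difference is mainly in who writes the outer loop, not in how much work is done.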

Theoretically, which would be the best/fastest way to go about it? Can anyone tell me what the technical difference between those 3 ways is? And which one is the better approach, or whether there are other alternatives to them?
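
One possible alternative to comparing every pair is sketched below. It assumes that "the same parameters" means exact equality on all parameters: the idea is to tally each parameter combination in one pass using an array keyed by the combination, then read each row's count from that tally. The function name countExactMatches and the sample rows are hypothetical. "Similar" (non-exact) matching is not covered and would need a different key.

```php
<?php
// Hypothetical helper: for each row, count how many OTHER rows have
// exactly the same parameter values. Two passes over the data, no N x N loop.
function countExactMatches(array $rows): array
{
    // Pass 1: tally how often each parameter combination occurs.
    $tally = [];
    foreach ($rows as $row) {
        // One string key per combination. Note: json_encode treats the
        // integer 1 and the string "1" as different values.
        $key = json_encode(array_values($row));
        $tally[$key] = ($tally[$key] ?? 0) + 1;
    }

    // Pass 2: look up each row's group size and subtract the row itself.
    $counts = [];
    foreach ($rows as $i => $row) {
        $key = json_encode(array_values($row));
        $counts[$i] = $tally[$key] - 1;
    }
    return $counts;
}

// Made-up sample rows with 8 parameters each.
$rows = [
    [1, 2, 3, 4, 5, 6, 7, 8],
    [1, 2, 3, 4, 5, 6, 7, 8],
    [9, 2, 3, 4, 5, 6, 7, 8],
    [1, 2, 3, 4, 5, 6, 7, 8],
];
$matchCounts = countExactMatches($rows);
```

Here each row is visited twice instead of being compared against every other row, so the work grows with the number of rows rather than with its square. If the parameters come back from MySQL as strings in some places and numbers in others, they would need to be cast to one type before building the key.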

I am thankful for any advice that helps me reduce execution time!
