

Copying and clouds

by David Woolls, CEO, CFL Software.


In the last two years, CFL Software has been asked to do everything from checking whether UCAS applicants are plagiarising to stopping music reviewers copying themselves on the Slicethepie website. The checks run in real time, often with tens of thousands of records to process per day. That is not a simple task, so CFL are considering the use of cloud computing to help. We asked David Woolls for an overview of their latest work.

The problems that CFL’s customers face have a common theme: normal search methods don’t work, because the users don’t know what the question is, and straightforward pattern matching isn’t an option, because changed sentences need to be identified. We are asked for help because we have a strong background in collusion and plagiarism detection, but we now face the challenge of scaling up the methodology and making the programs work in real time.

CFL has helped the Universities and Colleges Admissions Service (UCAS) by revising our base program, Copycatch Investigator, to run 24/7 on a Sun T2000. The effect has been striking. By the second year, there had been a 26% drop in the number of applicants falling into the most serious copying category. No false positives have been reported, and no successful appeal has been lodged. With the advent of cloud computing, such a program could run as a service, with the cloud handling the peaks in demand that occur at application deadlines.

Slicethepie is a music discovery and review website that pays users who submit music reviews. Some users were abusing this system by pasting the same review for each song they listened to. We placed a clause-level checking system inside the Flash player used to play Slicethepie’s music. Now a review is accepted only if it passes tests for self-copying, relevance and brevity (among other factors). Client-side monitoring allows 10,000 reviews a day to be handled with minimal background checking and a 99% clean database. The most notable side effect is that the quality of all the reviews has improved, even though the original problem was affecting only about 3%.
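
To make the idea of a clause-level self-copying test concrete, here is a minimal sketch in Python. It is an illustration only, not CFL's checker (which ran inside the Flash player): the function names, the punctuation-based clause splitting and the `max_ratio` threshold are all assumptions made for this example.

```python
import re


def clauses(text):
    """Split text into normalised clauses (assumed rule: break on punctuation)."""
    result = []
    for part in re.split(r"[.!?;:,]+", text.lower()):
        words = re.findall(r"[a-z0-9']+", part)
        if words:
            result.append(" ".join(words))
    return result


def self_copy_ratio(new_review, earlier_reviews):
    """Fraction of clauses in new_review already used in the user's earlier reviews."""
    seen = set()
    for review in earlier_reviews:
        seen.update(clauses(review))
    new = clauses(new_review)
    if not new:
        return 0.0
    return sum(1 for clause in new if clause in seen) / len(new)


def accept_review(new_review, earlier_reviews, max_ratio=0.5):
    """Hypothetical acceptance rule: reject when too many clauses are reused."""
    return self_copy_ratio(new_review, earlier_reviews) <= max_ratio


earlier = ["Great beat, catchy chorus. Loved it!"]
print(accept_review("Great beat, catchy chorus. Loved it!", earlier))
print(accept_review("The guitar solo drags, but the vocals are strong.", earlier))
```

A pasted review reuses every clause and is rejected, while a freshly written one shares none and is accepted. A real system would also need the relevance and brevity tests mentioned above, which this sketch does not attempt.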

We are also developing a standalone Contextual Query search engine. The fuzzy search and specially designed indexing allow close comparison, at clause level, of complete documents of different lengths with a number of user-customisable parameters. This could be readily scaled to index specific areas of the web that are of interest to a particular customer: a concept that was confirmed at the Cloudscape event in Brussels. Grid technology will be required to handle the large volumes, and cloud technologies to deliver the service. CFL are exploring a strategic partnership with the UK National Grid Service to meet education needs, and looking into an enterprise-level offering of the software as a service in the future.
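
As a rough illustration of indexed, fuzzy clause-level comparison, the Python sketch below builds an inverted index over the clauses of one document and scores clauses of a second document against it by word overlap. It does not describe the Contextual Query engine itself: the Jaccard similarity measure, the function names and the `min_similarity` parameter are assumptions chosen for this example.

```python
import re
from collections import defaultdict


def word_sets(text):
    """Return one set of words per clause (assumed rule: break on punctuation)."""
    sets = []
    for part in re.split(r"[.!?;:,]+", text.lower()):
        words = frozenset(re.findall(r"[a-z0-9']+", part))
        if words:
            sets.append(words)
    return sets


def build_index(clause_sets):
    """Inverted index: word -> positions of the clauses that contain it."""
    index = defaultdict(set)
    for position, words in enumerate(clause_sets):
        for word in words:
            index[word].add(position)
    return index


def similar_clauses(doc_a, doc_b, min_similarity=0.6):
    """List (clause in A, clause in B, score) for pairs at or above the threshold."""
    a_sets = word_sets(doc_a)
    b_sets = word_sets(doc_b)
    index = build_index(a_sets)
    matches = []
    for j, b_words in enumerate(b_sets):
        # Only clauses sharing at least one word can score above zero.
        candidates = set()
        for word in b_words:
            candidates |= index.get(word, set())
        for i in sorted(candidates):
            a_words = a_sets[i]
            score = len(a_words & b_words) / len(a_words | b_words)
            if score >= min_similarity:
                matches.append((i, j, round(score, 2)))
    return matches


doc_a = "The results were collected over three weeks. We then analysed them."
doc_b = "Intro text here. The results were gathered over three weeks, and checked twice."
print(similar_clauses(doc_a, doc_b))
```

Here the changed clause ("collected" replaced by "gathered") is still matched, even though the two documents differ in length and the clause sits at a different position in each. Raising `min_similarity` makes the comparison stricter.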

http://www.copycatchgold.com

This page (revision-7) was last changed on 23-Feb-2009 13:23 by SimonHettrick

© The University of Southampton on behalf of OMII-UK. All Rights Reserved.