Thread

Posted on Sun Apr 22 19:09:11 2007 by nigelhorne
report_all dedup
It would be useful if the report_all() method had a dedup boolean option. If set to true, only one report would be sent to each ISP.
Direct Responses: 4970
Posted on Mon Apr 23 23:39:11 2007 by adamowski in response to 4960
Re: report_all dedup
Doesn't the report_one() method do that already?
Direct Responses: 4971
Posted on Tue Apr 24 00:17:11 2007 by nigelhorne in response to 4970
Re: report_all dedup
According to the man page, report_one() has no options.
Direct Responses: 4972
Posted on Tue Apr 24 00:34:57 2007 by adamowski in response to 4971
Re: report_all dedup
But what difference would there be between running report_one() and report_all(dedup => 1)?
Direct Responses: 4973
Posted on Tue Apr 24 00:43:31 2007 by nigelhorne in response to 4972
Re: report_all dedup
report_one() reports one and only one spam. report_all(dedup => 1) reports lots of spams, but if any are found to be from the same ISP, that ISP receives only one email.
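
To make the requested behaviour concrete, here is a minimal sketch in Python. It is not the module's API: the `group_by_isp` function and the `spam_id` and `isp` fields are hypothetical names, and it assumes the abuse address for each spam is already known.

```python
from collections import defaultdict


def group_by_isp(reports):
    """Collect spam ids per ISP address so each ISP gets a single email.

    `reports` is a list of dicts with hypothetical keys "spam_id" and "isp".
    """
    grouped = defaultdict(list)
    for report in reports:
        grouped[report["isp"]].append(report["spam_id"])
    return dict(grouped)


# Three spams, two of which trace back to the same ISP.
reports = [
    {"spam_id": "a1", "isp": "abuse@example.net"},
    {"spam_id": "b2", "isp": "abuse@example.org"},
    {"spam_id": "c3", "isp": "abuse@example.net"},
]
outgoing = group_by_isp(reports)
```

Each key in `outgoing` would correspond to one email, listing every spam attributed to that ISP.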
Direct Responses: 5054
Posted on Sat May 5 03:37:42 2007 by adamowski in response to 4973
Re: report_all dedup

That is doable, but it is quite a lot of work and requires more parsing of SpamCop-generated pages. That would make the module more dependent on the layout and content of those pages, and more vulnerable to any changes SpamCop makes to them.

So let's ask ourselves, is it really useful enough to justify the changes?

The duplicates can be of 2 types AFAIK:

1) The exact same message (many copies of one instance of the message - all having identical body and headers, especially Message ID) - removing such duplicates makes sense, but they occur only if you accidentally submit one message many times.

2) Many instances of the same message (to be exact, many messages with the same body, but differing in their Message IDs) - they actually should be reported as separate spams, even if they come from the same ISP, to allow for more accurate spam traffic statistics for this ISP.

It seems a good idea to implement this functionality for problem 1), but it will be a considerable amount of work. Problem 2) isn't actually a problem.
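
For illustration only, here is a minimal Python sketch of filtering type 1) duplicates by Message ID. It assumes the raw message text is available locally before submission, which is not how the module works with SpamCop pages, and `drop_exact_duplicates` is a hypothetical name. Type 2) messages have different Message IDs, so they pass through untouched.

```python
from email import message_from_string


def drop_exact_duplicates(raw_messages):
    """Keep the first copy of each message, keyed on its Message-ID header."""
    seen = set()
    unique = []
    for raw in raw_messages:
        msg_id = message_from_string(raw)["Message-ID"]
        # A message without a Message-ID cannot be compared this way, so keep it.
        if msg_id is not None:
            if msg_id in seen:
                continue
            seen.add(msg_id)
        unique.append(raw)
    return unique


# The first two are the same message submitted twice (type 1);
# the third has the same body but its own Message-ID (type 2).
first = "Message-ID: <1@example.com>\nSubject: offer\n\nBuy now\n"
second = "Message-ID: <2@example.com>\nSubject: offer\n\nBuy now\n"
kept = drop_exact_duplicates([first, first, second])
```

Here `kept` holds two messages: one copy of the accidental resubmission, plus the separate instance with its own Message ID.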
