Add more simple spam filters

Dear MATLAB answers team,
What about rejecting all new threads whose titles contain Korean characters and the typical term "seven seven dot"? I'm aware that this request is naive and that spammers will find another gap. But at least the current flood would be stopped.
Editors need 4 mouse clicks to report a message as spam and delete it. It then takes about 15 seconds until this procedure finishes in my browser. Afterwards a new view of the forum's main page appears, so I have to click a 5th time to close it as well.
It would be more efficient if the opinion of 2 editors were enough to temporarily exclude a specific user from posting until an admin decides how to treat this person.
Unfortunately the forum is not usable at the moment.
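
As a rough illustration of the title filter proposed above, here is a minimal Python sketch. It assumes a title is suspect only when it contains both Hangul characters and a blocked phrase. The names `title_is_suspect` and `BLOCKED_PHRASES` are hypothetical, and the phrase is taken literally from the post rather than from an actual spam sample. It says nothing about how the MATLAB Answers filter is really implemented.

```python
import re

# Hangul Jamo, Hangul Compatibility Jamo, and Hangul Syllables blocks.
HANGUL = re.compile(r"[\u1100-\u11FF\u3130-\u318F\uAC00-\uD7A3]")

# Hypothetical blocklist: the exact spam phrase is not quoted in the thread.
BLOCKED_PHRASES = ("seven seven dot",)


def title_is_suspect(title):
    """Return True if the title has Hangul characters AND a blocked phrase."""
    lowered = title.casefold()
    has_hangul = HANGUL.search(title) is not None
    has_phrase = any(phrase in lowered for phrase in BLOCKED_PHRASES)
    return has_hangul and has_phrase
```

Rejecting on Hangul alone would also block legitimate Korean-language questions, which is why the sketch requires both conditions.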

5 Comments

Maybe it would also be a good idea to introduce captchas?
Prevention is better than treatment.
@Ilham Hardy: Captchas are a small additional obstacle. A lot of other simple tricks have been mentioned before, e.g. a limit on the number of new questions per hour and day for new users.
Editors should be able to disable an account, pending review by Randy and friends, who can delete it if necessary. Disabling an account would stop further spam and other postings, at least from that account. If the spammers had to create a new account every few minutes in order to keep posting spam, that would likely be enough of an obstacle to prevent it.
All the latest spam appeared to be coming from many different accounts.

That’s a usual tactic to keep one deleted account from interfering with ongoing spamming. However, being able to deactivate an account to prevent further posting from it would eventually cover all accounts posting spam. I suspect it would become so inconvenient to keep setting up new accounts that the spamming would stop.

Account deactivation would be a separate privilege. It could be revoked by Randy and friends if abused, but I doubt it would. I’ve never seen any evidence of abuse of any other privilege.


 Accepted Answer

Image Analyst on 26 Mar 2015
In my opinion, captchas should only be required for initial account registration. If I can possibly avoid it with better spam filters, then I don't want to have captchas imposed on me. I post so much that it would be burdensome and punishing for me. Why should I have to be punished because of the spammers? Only force captchas on me if there is no way to have reliable spam filters. I know there are many spam authors now, many more than before, so I don't know if they have some automated account creation bot. It's been so long that I don't recall whether there is a captcha for account creation or not. If not, there should be. I don't mind doing it once, but I don't want to deal with captchas 300 times per month.
I agree that having editors disable/suspend accounts for review by Randy, John, or Kevin is a good solution. And if there are captchas, only a small, reasonable number of spam posts would be left for us editors to handle. It might also be good if we could add our own spam filter, at least until Randy reviews it, so that we can have it automatically filter out all messages with a certain term (web site) in them.
I think in the spam attack, a day ago, they lowered the spam threshold so much it pulled a few hundred legitimate messages (mostly old ones) into the spam quarantine. Those messages eventually disappeared so I don't know if they got deleted or put back.
I wrote a bot to delete spam from the quarantine but since it takes about 15 seconds per spam, it's not workable if there are thousands in there. I know Randy and crew have a way of mass deleting them, instead of one at a time like we need to do. I'd like that power to help.
The daily limit on new accounts is also a good idea. Or perhaps just a captcha on questions, answers, and comments posted by new accounts, but not by old accounts or accounts with more than, say, 10 reputation points.

4 Comments

Exactly.
We do not need a new interface for blocking authors: if two people have marked different threads of an author as spam while deleting them, the author should be blocked and the threads moved to quarantine. In yesterday's attack the bots used each account for several dozen threads, so this would reduce the number of interactions considerably.
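
The two-editor rule described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the function `flag_as_spam`, the in-memory `_flags` store, and the editor and author names are hypothetical, and a real forum would keep this state in its database.

```python
from collections import defaultdict

REQUIRED_EDITORS = 2  # threshold taken from the proposal above

# author -> set of (editor, thread_id) spam flags (hypothetical in-memory store)
_flags = defaultdict(set)


def flag_as_spam(author, editor, thread_id):
    """Record one spam flag; return True if the author should now be blocked.

    The author is blocked once at least two different editors have flagged
    at least two different threads by that author.
    """
    _flags[author].add((editor, thread_id))
    editors = {e for e, _ in _flags[author]}
    threads = {t for _, t in _flags[author]}
    return len(editors) >= REQUIRED_EDITORS and len(threads) >= REQUIRED_EDITORS
```

Because the check piggybacks on the existing "report as spam" action, editors would not need an extra click or a separate blocking interface.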
Can attackers hide their IP? If not, TMW could block the forum's output for these IPs, so that they can no longer check their success or run the bots. Even smarter: if the spammers are served a version of the forum in which their threads have not been removed, they cannot tell whether they need to adjust their methods.
Additional anti-spam tools/capabilities for editors would be very welcome.
This spam attack exploited a new vector that our filters missed, obviously. We are taking steps to update our spam filter to prevent similar deluges. In addition, we are assessing options in other areas of the site including account creation. As Adam points out above, these spam messages came from hundreds of valid accounts that affected most of the MATLAB Central applications.
This is a good discussion and is helping to inform our decisions for catching and preventing spam in the future.
Higher reputation could be used to avoid needing captchas for posting, but even there, it would be a disincentive for people to answer or post. I'd rather not see that.


More Answers (7)

Chad Greene on 26 Mar 2015
How about anyone with >50 reputation points can put questions into quarantine and anyone with >500 points can quarantine users? Those values might need to be adjusted, but given how committed someone needs to be to accrue points on this forum, the thresholds could probably be quite low.
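
The tiered privileges suggested here amount to a simple threshold lookup. A minimal sketch, assuming the two example thresholds from the post (which, as noted, might need adjusting); the action names and the function `allowed_actions` are hypothetical:

```python
# Example thresholds from the post; both are tunable assumptions.
QUARANTINE_QUESTION_MIN = 50
QUARANTINE_USER_MIN = 500


def allowed_actions(reputation):
    """Return the set of anti-spam actions a user with this reputation may take."""
    actions = set()
    if reputation > QUARANTINE_QUESTION_MIN:
        actions.add("quarantine_question")
    if reputation > QUARANTINE_USER_MIN:
        actions.add("quarantine_user")
    return actions
```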
Guillaume on 13 Apr 2015
A more effective spam filter is certainly required as the forum is completely flooded this morning. Over 400 spam posts, some of which are even marked as having an accepted answer by the OP while having no answer.
As it is, the forum is unusable.
per isakson on 18 Jun 2016
Edited: per isakson on 20 Jun 2016
I've deleted several spam posts over the last couple of days. Most (all?) of them contained body text that was copied from a legitimate question.
  • Wouldn't it be possible to let me block the sender for a day?
  • They all contained a telephone(?) number in the title. No legitimate question does that.
A day later: I've now deleted another few batches of spam of the same kind. Please give me a feature that deletes all spam from a specific spammer and requires only three clicks.

14 Comments

There is a checkbox to delete all the spams rather than one at a time. As far as banning a single author for a day, there is a 10 post per day limit on new accounts, and by the time you've noticed their spam, they've already hit their 10 post limit, so banning them wouldn't have any additional effect on that day. As far as what MathWorks does about disabling their account to prevent them from doing it the next day, I have no idea. But then they seem quite content to just create a brand new account to get 10 more spam posts, so I don't know if banning them would help, though it's not a bad idea. Some places have a "3 strikes and you're out" policy against felons (life in prison after your third major crime), so maybe that would be good here. Couldn't hurt anyway.
Walter Roberson on 26 Jun 2016
Edited: Walter Roberson on 26 Jun 2016
IA, currently between roughly 01:15 and 02:15 Eastern, they post spam over a period, and often I am active while they are posting. I would be able to stop some of the public spam postings as they are happening if I could quarantine by user.
Also, even if they have exhausted one particular name, it is a nuisance to go through the ones that made it into the public and open them one by one and go through the delete cycle. It is not infrequently 40-ish posts that have to be gone through individually. The checkbox to delete all the spams is only for the ones that were automatically quarantined, not the ones that made it into public.
per isakson on 7 Jul 2016
Edited: per isakson on 7 Jul 2016
More numbers
I still think it would make sense
  • to let me delete all questions from a selected spammer
  • to block questions with long whole numbers (e.g. nine digits) in the title
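
A check for long whole numbers in a title can be sketched in a few lines of Python. This is a minimal illustration, assuming that "long" means nine or more consecutive digits as suggested above; the name `title_has_long_number` is hypothetical.

```python
import re

# Nine or more consecutive digits, per the "e.g. nine digits" suggestion.
LONG_NUMBER = re.compile(r"\d{9,}")


def title_has_long_number(title):
    """Return True if the title contains a run of nine or more digits."""
    return LONG_NUMBER.search(title) is not None
```

Phone numbers written with spaces or hyphens between digit groups would slip past this simple pattern, so it is a first obstacle rather than a complete filter.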
Per, in a recent meeting with some of the Answers team, several of us identified being able to quarantine users as our top priority.
per isakson on 9 Jul 2016
Edited: per isakson on 9 Jul 2016
Hope they do something!
"Constant dripping wears the stone". I've never seen a long whole number in the title of a real question.
And again
An hour later
Please,
  • automatically delete all posts with a phone number in the title.
  • provide a button to delete all posts from a spammer
Five minutes later


Jan on 12 Apr 2015
Edited: Jan on 12 Apr 2015
You got it!
Today several spammers with the known fingerprint have been excluded. This took only a few minutes, so I guess they were recognized automatically.
Thanks to TMW for this tedious but successful fight.
But the spam flood is going on. What about adding proactive filters that reject messages before they are posted in the forum? For example: Korean characters in the title, more than 3 questions per hour, or very low entropy due to repeated text lines.
The attacks look very similar and are of very poor quality. Most likely they are coming from the same person, and based on the IPs used, identification should be possible.
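
To make the low-entropy criterion above concrete, here is a minimal Python sketch that computes the Shannon entropy of a message over its distinct lines. It is one possible reading of "entropy due to repeated text lines", not a description of any existing filter; the function name `line_entropy` and any cut-off value applied to its result are assumptions.

```python
import math
from collections import Counter


def line_entropy(text):
    """Shannon entropy in bits over the distribution of non-empty lines.

    A message that repeats one line over and over scores 0.0, while a
    message with n distinct lines scores log2(n).
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        return 0.0
    total = len(lines)
    counts = Counter(lines)
    return sum((c / total) * math.log2(total / c) for c in counts.values())
```

A filter could hold back long messages whose score falls below some small threshold. The threshold would have to be tuned on real posts, since very short legitimate questions also score low.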

13 Comments

What do you mean "excluded"? Do you mean the accounts got banned, or their spam got thrown into the spam quarantine?
I know some spammers, that I've let the Mathworks know about, still post regularly, such as Steve John who lists his professional interests as "Google Adsense". There are 2 or 3 others whose names I see regularly.
I doubt any of the spam deletions this morning were automatic.
I deleted about 100 myself (checking ‘Report this question as spam’ each time), and I know others must have been active at the same time. I’ve deleted a few since.
Editors need to be able to temporarily disable accounts posting frequent, unquestionable spam, pending permanent action by TMW.
I only saw one, so most must have happened before I woke up. I guess they must have found a new "vector", as David said, to get around the spam filter. And I wholeheartedly agree with the idea of having editors suspend accounts. And I also support captchas for accounts with few reputation points (say, fewer than 5), since spammers won't ever get reputation points (unless two accounts are in cahoots accepting/voting for each other, but then we'd also have the suspension ability to take care of those).
It looked like several dozen spam messages vanished in less than 10 seconds. This made me think of an automatic filter. Recognizing these messages is trivial and TMW has enough skills to create a pattern matching filter.
But you both seem to be convinced that it was the manual work of the editors? I admit that I participated a little bit. Some statements by TMW sound as if there are active automatic filters protecting the forum. But the current kind of spam is so easy to detect that a 4-year-old child could recognize it without being able to read printed characters!
Captchas waste the time of legitimate users. Rejecting messages with Korean characters and a lot of text with vanishing entropy is much cheaper for the programmers, the servers, and the valid users.
@Image Analyst: I meant real world names. If "Google Adsense" is the professional interest of Steve John, even Google should be interested in his activities.
I know they do have automatic filters - all of the ones in the spam quarantine got there from the automatic filter, not from us who just delete them immediately. Now whether the automatic filter pre-filters them before allowing them into the forum, or if it scans them after they're in (meaning they might be in there for a short time before the filter notices them), I don't know.
I suspect several people were working simultaneously this morning to delete them.
My favourite tactic is to open each as a tab in Firefox (this works quickly), several at a time, then highlight each of the tabs and mark that post for deletion. When the TMW server catches up, I go back and close the tabs of the deleted spam and open the next set. I can usually keep up with them this way, and sometimes get ahead of them.
Still, being able immediately to go to the account the spam is posted from and inactivate it until TMW had the opportunity to review it and delete the account would make this a lot easier.
In today's early morning hours I deleted about 150 of those spam items myself. There was certainly nothing automatic about that. If Mathworks is interested in continuing this Answers forum, it is essential that they devise much more effective methods of preventing these spam attacks. The spam insertions are of such a primitive nature, surely there are effective methods of detecting and eliminating them in an automatic manner. Mathworks cannot continue to depend on volunteer efforts. There have been times in the past when spam items arrived at a far faster rate than I could delete them working at top speed and not waiting to check the "Report this question as spam" box. I feel that enough of my available time is spent answering questions on the forum. It is unreasonable to expect the additional burden on us of crudely eliminating spam entries one-entry-at-a-time, hour after hour.
Randy told me that there are even more that we never see in the forum or the quarantine. He told me that there are literally thousands every day that just get deleted outright before we ever get a chance to see them. Clearly we're all in agreement that suspending accounts is the way to go. I bet we could suspend them much faster than it would take to create them. And there should also be a check to make sure that no one can make more than, say, two or three (to account for legitimate misspellings) accounts per hour or day from a specific IP address to make sure they don't have some automated way of creating new accounts.
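
The per-IP limit on new accounts suggested above is a standard sliding-window rate limit. A minimal sketch, assuming a limit of three accounts per hour as in the example numbers; the class `SlidingWindowLimiter` is hypothetical, and timestamps are passed in explicitly as seconds so the logic is easy to follow:

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_events per key within any window_seconds span."""

    def __init__(self, max_events, window_seconds):
        self.max_events = max_events
        self.window_seconds = window_seconds
        self._events = defaultdict(deque)  # key -> timestamps of allowed events

    def allow(self, key, now):
        """Return True and record the event if the key is under its limit."""
        timestamps = self._events[key]
        # Drop events that have fallen out of the window.
        while timestamps and now - timestamps[0] >= self.window_seconds:
            timestamps.popleft()
        if len(timestamps) >= self.max_events:
            return False
        timestamps.append(now)
        return True
```

The same limiter could be keyed by account name instead of IP address to cap the number of questions per hour from a new user.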
I use exactly the same tactic that Star does. I open several posts as separate tabs, then go to the first one, delete it. While that tab is in delete mode, I swap to another tab, delete that next one too. I can do them pretty quickly that way, maybe one deletion every 10 seconds if I am on a roll and my connection is running fast. Even so, I could not come close to keeping up on a bad day.
First of all, thanks a lot to those cleaning up all this spam. Really.
I agree that something more proactive should be done, though. The forum is very often impossible to use. Legitimate questions are "pushed away" by a flood of meaningless stuff.
I like Jan Simon's suggestions about Korean characters in the title (and perhaps body) of the message and message entropy. They should be easy to implement.
Thanks a lot again for everything. Francesco
It takes less time to post a spam message than to delete it. This seems to be a design mistake.
I think this guy is improving his technique. Check this post.
It's gone now. What was it? I saw one where the spam was all in a scanned image - a photo - in Urdu.


Jan on 17 Apr 2015
What a pity! The next attack with almost identical looking messages: The same nonsense, the same entropy, the same character sets, the same keywords, the same frequencies, the same slow interaction with the forum's interface when I try to remove the junk.
This is not efficient anymore.
[off topic] I'm going to have a pleasant spring. I'll come back to this forum at the beginning of May and see whether the problem has been solved by then. Kind regards and good luck.
Walter Roberson on 21 May 2015
The current tactic of the postings I see is that they grab the first few lines of a recent posting and use that as the body of a Question, with the "guru" / "black magic" spam payload in the question title.
Walter Roberson on 21 May 2015
My experience recently is that each message for which I click "Yes this IS spam" in the quarantine requires 16 to 19 seconds to delete, plus some seconds after that to retrieve and paint the current list of Questions. I use the launch-a-tab-per-message technique that others mentioned previously. If I am processing multiple deletes in this manner, then between 1/4 and 2/3 of the deletions fail with a "Something went wrong" page, in which case the postings remain in the list undeleted. The behaviour is much the same for postings that did not make it into the quarantine, on which I hit Delete and tick Flag as Spam.
Because the failure risk seems to rise the more deletions I am processing at the same time (speculation: obtaining a lock on a resource is timing out), there is an effective limit on how many of these tabs I can be running simultaneously, pretty much having to wait until the last of them finishes deletion until I can fire off a new batch, to keep the simultaneous actions down to the point the system can handle.
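
For anyone scripting the deletions, the batching strategy described above (keep only a few deletions in flight, since more seem to fail) can be sketched as follows. This is an illustration only: `delete_post` is a hypothetical callable standing in for whatever performs one deletion and reports success, and the retry loop is an addition not mentioned in the post.

```python
from concurrent.futures import ThreadPoolExecutor


def delete_all(post_ids, delete_post, max_parallel=2, max_attempts=3):
    """Delete posts with at most max_parallel deletions running at once.

    delete_post(post_id) is a hypothetical callable returning True on success.
    Returns the list of post ids that still failed after max_attempts tries.
    """
    post_ids = list(post_ids)

    def attempt(post_id):
        # Retry a failed deletion a few times before giving up on it.
        for _ in range(max_attempts):
            if delete_post(post_id):
                return True
        return False

    # The pool size caps how many deletions are in flight simultaneously.
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        results = list(pool.map(attempt, post_ids))
    return [pid for pid, ok in zip(post_ids, results) if not ok]
```

Keeping `max_parallel` small trades throughput for a lower failure rate, which matches the observation that failures rise with the number of simultaneous deletions.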

Asked:

Jan
on 26 Mar 2015

Commented:

on 12 Jul 2016
