Thursday, 12 June 2014

A position on bot-generated articles

At the beginning of this week, a journalist from the Wall Street Journal asked me for my opinion on bot-generated articles and on the activity of Lsj and his bot Lsjbot in the Swedish Wikipedia, as an addition to the answers she got from Lennart Guldbranson, a well-known Wikipedian from Sweden. The following is what I answered:
You asked me to answer some questions concerning bot-generated content, since I may have a more critical opinion on Lsjbot and bot-generated content than Lennart. To state one thing first: I am not active in the Swedish Wikipedia, so my view is more general on the topic than focused on sv.wikipedia.org.

To introduce myself: I am mostly active in the German Wikipedia, where you can find my profile and work at https://de.wikipedia.org/wiki/Benutzer:Achim_Raschka - there are also some drop-down boxes where you can find a list of articles written by me. As you can see there, my focus is on zoology and especially on animal species - one of the main areas Lsjbot has been working in.
To come to your questions:
Concerning Lsj:
I do not know Lsj (the user who started producing bot-generated content via his bot Lsjbot) personally, and the only work of his I know is that on bot-generated content in several Wikipedias. I am sure he is doing his best to bring Wikipedia forward and that, for him, starting this project is a step towards making Wikipedia better. So from my side I am not in opposition to him as a person, even when I am criticizing him on this particular topic.
Concerning bot-generated articles:
In this case I have a clear position: I am mostly against the production of bot-generated stubs, and especially against the type of content on species that Lsjbot has produced in very high numbers in the Swedish and other Wikipedias. In my opinion this type of content may push up quantities if you are counting articles, but it is not useful for readers or for the evolution of a Wikipedia (from my point of view).

For readers, an article like e.g. Yungasia_tricolor (random article) does not really help if someone is searching for information. This article only conveys more or less correct taxonomic notes on a species name - it does not help if you want to know something about the species: what it looks like, where it lives and how it lives. So if someone really searches for this particular species of leafhopper (which can only be expected of experts in entomology), he will not find any useful information on it - the information given is no better than no information. I would expect that an author would at least tell me where I can find it (Brazil, as you can read in the easy-to-find first description by Zanol 1991) - the main argument, that this article is better than none, does not count for me.
For quality reasons I prefer to have fewer good articles rather than millions of non-articles. To compare: Rüppellfuchs was the work of weeks for me and may be an extreme, but even articles like Rot-Weißes_Riesengleithörnchen (2 hours of work at most) show what the goal should be - compare them with the Swedish one, Petaurista_alborufus (and please compare the number of subspecies, which is 0 in sv and 6 in my article).

For browsing users who find this article at random it is the same - the article is boring to read and 99% of other random articles are the same (try slumpartikel).
For authors: It is often claimed that if there is a short article (stub), new users will come and expand it - my experience tells another story. Expanding an existing stub is unattractive, since most users try to find niches where they can start articles from scratch. Articles that do not yet exist are the best way to persuade authors to bring in their knowledge by starting a new article, and even if the result is of the same quality as bot content, it is worth more, since it is the start of a potential new author. The best way to discourage potential authors is to present them with a field of thousands of pseudo-articles that always have the same structure, where their own work will get lost among all the others and will not be found by anyone. I think: yes, you can have a 1.5-million-article Wikipedia if you use bots, but this will lead to the decline or even death of user activity in the areas you try to fill this way. If the German WP were populated with those stubs, I don't think I would be interested in contributing my knowledge to it.
As a compromise: I think it would be a good idea to use Lsjbot and others to fill in datasets in the Wikidata project and to offer those data to authors, for example when they start an article, so that they can choose whether they want to use them. This could increase the quality of Wikidata as a database without flooding the Wikipedias and could be a valuable addition. The only fields where I could imagine boticles are areas where nothing exists beyond database entries (e.g. galaxies), or some geographical additions - but there, too, I see more problems than positive effects.

This has become a bit longer than I expected. Sorry for my not-so-good English, but I think one can understand my points. If there are any open questions, please don't hesitate to contact me.
Best regards,
Achim

Comments:

AndreasP said…

Even good old Denis Diderot had a clear opinion on the topic:

http://de.wikipedia.org/wiki/Aguaxima

Marcus Cyron said…

+1