When I began this project we had 3,622 tags, of which more than two-thirds (2,666) were singletons, meaning tags used only once. In sorting through our existing list to see what’s what, I have already pared it down to about 3,560.
I make a living working with databases and am somewhat fanatical about having a clean dataset. The banes of my existence are duplicates and ambiguous or incomplete information. Keeping database size down is also a concern, because size can affect speed and performance. The first phase of the Tag project will be to clean up duplicates and ambiguous info. In the second phase we’ll look at the singletons and see where we can consolidate tags to keep the numbers and size down.
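For anyone curious how the singleton count can be worked out, here is a rough sketch in Python. It assumes the tags have been exported as a mapping from essay id to the tags on that essay; the sample data and the name `singleton_tags` are made up for illustration and are not part of Soapblox.

```python
from collections import Counter

def singleton_tags(essay_tags):
    """Return the tags that appear on exactly one essay, sorted."""
    # set(tags) guards against the same tag being listed twice on one essay.
    counts = Counter(tag for tags in essay_tags.values() for tag in set(tags))
    return sorted(tag for tag, n in counts.items() if n == 1)

# Hypothetical sample data: essay id -> tags on that essay.
essay_tags = {
    1: ["Iraq", "FISA"],
    2: ["Iraq", "Open Thread"],
    3: ["Iraq", "dogs"],
}

singles = singleton_tags(essay_tags)
all_tags = {tag for tags in essay_tags.values() for tag in tags}
share = len(singles) / len(all_tags)  # fraction of tags that are singletons
```

In this toy data three of the four tags are singletons, which is close to the proportion in our real list.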
Before we go any further, meet our Official Tag Team mascots:
I did some research at the Daily Kos tag cleanup project. They are trying to pare 40,000 tags down to a list of 5,000. They have a nice Standard Tags page with 1,500 of the most popular tags. I hope to have something similar available here. In the meantime, use theirs as a reference. I relied on their expertise to make decisions about how to handle issues with our tags on things such as punctuation, abbreviations and proper names.
Thus, I have drafted the following rules:
None of these are set in stone. They are guidelines, more than rules. And there are always exceptions.
1. Every essay should have at least one tag.
There is a Diary Topic pick list in the Essay editor. If nothing else, use this as your first tag – or if none apply, put something else in the tag field. I will put a note in the “rules” to alert people to choose a topic. This list could still use some refinement.
Exception: I know of at least one member here who does not want their essays tagged. If you don’t want tags, the solution is to use the “No Tags” tag, which also ensures that no one else adds them inadvertently.
2. Tags should be single words, first & last names, or short phrases.
3. The maximum length of a tag is 45 characters. There is probably a limit to the total number of tags you can fit in the box but I’m not sure what it is.
4. Proper names.
Use first and last names.
Use initials only to disambiguate individuals, e.g. George W. Bush and George H.W. Bush
Use periods after initials or a prefix/suffix, e.g. Robert F. Kennedy Jr.
No nicknames – unless that is the most common usage, e.g. Dick Cheney instead of Richard Cheney.
Spelling variations – if there are multiple variants, check the existing tags and select the most frequently used form. e.g. Al-Qaeda / al-Qaida
Don’t include titles or ranks with names, e.g. use John Conyers and Wesley Clark, not Rep. John Conyers or Gen. Wesley Clark.
5. Use the plural form of nouns, e.g. dogs, movies, candidates.
6. Punctuation –
Do not use periods, hyphens, quotes, question marks, exclamation points or any other non-alphanumeric characters.
Remove extra spaces between words.
Exceptions, where punctuation is kept:
Congressional bills and resolutions, e.g. H.R. 1221, H.Res 333 and S. 2323
Proper names with initials or Jr.
Congressional districts, e.g. CA-42
U.S. and U.N. (for United States and United Nations)
7. Abbreviations –
Avoid abbreviations or acronyms unless it is a commonly used reference, e.g. ACLU, FISA.
If you use acronyms, make sure the full name is also spelled out in a tag. (Be aware that the character limit is 45.)
Do not abbreviate states.
8. Duplication –
To avoid duplicate tags, check the current tag list. Here is the tag list in Alphabetical Order. Right now it is fairly quick to load this list, and the fewer tags we have, the quicker it stays. When you are on the page, you can search it in your browser by pressing Ctrl+F to Find. Sorry this is a clunky way to search but that’s the best we can do right now.
9. Profanity? Preferable that it not be put in tags but it’s not like we would ban people for it.
10. Recommended and Promoted tags? I am ambivalent about these. However, some essays here are already being tagged as such. I will leave this up for discussion.
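Some of the guidelines above are mechanical enough to check automatically. The sketch below is only an illustration, in Python, of how rules 3 and 6 might be tested against a single tag. The function name `check_tag` is made up, nothing like it exists in Soapblox, and it only raises warnings: a person still has to decide whether the punctuation is one of the allowed exceptions.

```python
MAX_TAG_LENGTH = 45  # rule 3

def check_tag(tag):
    """Return a list of guideline warnings for one tag (empty if none)."""
    if not tag.strip():
        return ["empty tag"]
    warnings = []
    if len(tag) > MAX_TAG_LENGTH:
        warnings.append("longer than 45 characters")
    # Rule 6: leading, trailing or repeated spaces count as extra spaces.
    if tag != " ".join(tag.split()):
        warnings.append("extra spaces")
    # Rule 6: anything that is not a letter, digit or space gets flagged,
    # because the exceptions (initials, bills, districts) need a human eye.
    if any(not (ch.isalnum() or ch.isspace()) for ch in tag):
        warnings.append("punctuation: confirm it is a listed exception")
    return warnings
```

For example, `check_tag("dogs")` returns an empty list, while `check_tag("George W. Bush")` returns the punctuation warning, which in that case is fine because initials are an exception.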
What is the point of all this?
Things we can do with tags…
Soapblox has a couple of built-in bloxes that make use of the tags: HotTags, Subjects, and Feeds. In the right column (on the front page) these are sections called Hot Tags, Topics, and Action Alert. The Hot Tags show the top tags in use over a given period of time (the number of tags and time length are set by the admin). I dislike the Hot Tags because they overemphasize the Pony Parties and Open Threads. Those are not the most important things happening here. Curiously, while the Pony Party is supposed to be an Open Thread, none of the PPs use the Open Thread tag. But this is a meta discussion for another time.
One alternative to the HotTags is the Topics blox. This is a much better index of our most popular tags. Interestingly, it is the same list of topics that I have added to the drop-down list in the Essay editor. Do people like these tags? Should there be more/fewer? As another alternative, it is possible for us to ditch the Hot Tags and Topics and create our own custom Tags blox – with whatever subjects we want.
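To make the Hot Tags idea concrete, here is a small Python sketch of counting the top tags over a period. This is my own illustration and not how Soapblox actually does it. The `exclude` parameter is an assumption, added to show how a custom blox could leave out the Pony Party, and the essays and dates are invented.

```python
from collections import Counter
from datetime import date, timedelta

def hot_tags(essays, today, days, limit, exclude=()):
    """Return the `limit` most-used tags on essays from the last `days` days."""
    cutoff = today - timedelta(days=days)
    counts = Counter()
    for posted, tags in essays:
        if cutoff <= posted <= today:
            # Count each tag once per essay, skipping excluded tags.
            counts.update(set(tags) - set(exclude))
    # Most-used first; ties are broken alphabetically so the order is stable.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [tag for tag, _ in ranked[:limit]]

# Hypothetical essays: (date posted, tags).
essays = [
    (date(2008, 6, 1), ["Pony Party"]),
    (date(2008, 6, 2), ["Pony Party", "Iraq"]),
    (date(2008, 6, 3), ["Iraq", "FISA"]),
    (date(2008, 6, 3), ["Pony Party"]),
    (date(2008, 5, 1), ["FISA"]),
]

top = hot_tags(essays, today=date(2008, 6, 3), days=7, limit=2)
custom = hot_tags(essays, date(2008, 6, 3), 7, 2, exclude={"Pony Party"})
```

With this data `top` is led by Pony Party, while `custom` shows Iraq and FISA instead.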
The Action Alert blox is a feed of the most recent essays on our site that are tagged “Action”. As you can see, I have tagged this Essay with Action – so that it will stay visible for our tag crew. We could create other mini-feeds for Environment, Elections, or whatever, from our own site or pulling feeds from other sites. For example, the ACLU has an Action Alert feed too. I could put their feed right below the Take Action feed for our site. Unfortunately, they don’t seem to keep their alerts up-to-date; the last posted was in 2007. Note – for the feeds we control how many articles are listed. I have the Action list set to 4 right now.
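A tag feed like this boils down to "filter by tag, sort newest first, keep the first few". Here is a minimal Python sketch of that logic, assuming each essay is a record with a title, a posting date and a list of tags. The field names and the function `tag_feed` are hypothetical, not the Soapblox internals.

```python
def tag_feed(essays, tag, limit=4):
    """Return titles of the most recent essays carrying `tag`, newest first."""
    matching = [essay for essay in essays if tag in essay["tags"]]
    # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
    matching.sort(key=lambda essay: essay["posted"], reverse=True)
    return [essay["title"] for essay in matching[:limit]]

# Hypothetical sample essays.
essays = [
    {"title": "Call your senator", "posted": "2008-06-01", "tags": ["Action", "FISA"]},
    {"title": "Pony Party", "posted": "2008-06-02", "tags": ["Pony Party"]},
    {"title": "Tag cleanup", "posted": "2008-06-03", "tags": ["Action", "Meta"]},
]

feed = tag_feed(essays, "Action")
```

The same function would serve an Environment or Elections mini-feed by passing a different tag, and `limit` plays the role of the article count I have set to 4.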
The dharmazine. This is notlightnessofbeing’s project. It is a very cool way to aggregate our topics into more of a news format, allowing one to scan the headlines at-a-glance. As we get more organized, and nlob gets some breathing room from settling into his new place, this “zine” can be revisited.
Widgets – there are a lot of sexy widgets out there that can be hooked up to Docudharma feeds and placed on our site or shared around the web. nlob and News Corpse know a lot more about this than I do. It looks like the wave of the future to me.
Speaking of the future – Soapblox will soon be open source. We should be able to construct a Tag Search function (which is not available right now) and do lots of cool things we haven’t even imagined yet.
Tag Cleanup Project
Tags may be modified or corrected (based on the rules above) for misspellings, duplicates, ambiguous abbreviations, incomplete names, consolidation of topics, incorrect punctuation, or wordiness.
I have identified sets of tags based on: abbreviations, plurals, groupings, proper names, spelling errors, multiple words, and others that I have never heard of or don’t know what they mean. For the latter, I just want someone to verify they are valid and spelled correctly.
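For the punctuation and capitalization duplicates, a simple grouping trick does most of the work. The Python sketch below is only an illustration of the idea, with made-up function names: it reduces each tag to its lowercase letters and digits and groups tags that end up identical. It will not catch spelling variants such as Al-Qaeda / al-Qaida, or singular versus plural forms, so those still need a human pass.

```python
from collections import defaultdict

def match_key(tag):
    """Reduce a tag to lowercase letters and digits for comparison."""
    return "".join(ch for ch in tag.lower() if ch.isalnum())

def likely_duplicates(tags):
    """Group tags that differ only in case, spacing or punctuation."""
    groups = defaultdict(list)
    for tag in tags:
        groups[match_key(tag)].append(tag)
    return sorted(sorted(group) for group in groups.values() if len(group) > 1)

# Hypothetical sample of the tag list.
tags = ["Al-Qaeda", "al Qaeda", "FISA", "George W. Bush", "George W Bush", "dogs"]
groups = likely_duplicates(tags)
```

Each group that comes back is a candidate for merging into one standard tag under the rules above.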
Clean-up lists will be posted in the comments below. For those who want to help out, reply to the list that you want to work on. Come back and post another comment when you are finished. When all that is done I can re-analyze the list to work on consolidating the one-of-a-kind tags where possible. Going forward, just keep an eye on tags in the essays you read and make corrections or suggestions where warranted.
The best way to find the errant tags is to use the alphabetized tag list and scroll through.
Click on the tag and it will bring up all the essays with that tag.
Go into the Essay, click the Add/Edit tags button.
Make whatever corrections are needed.
Click the Add/Edit tags button again to save the changes.
Click the back button a couple times to return to the Essay list or go all the way back to the alphabetized tag list.