
MicroID submitted as IETF Internet-Draft

Peter Saint-Andre has done the hard work and gotten the MicroID spec submitted to the IETF as an Internet-Draft. This is great news and hopefully will generate some additional conversation and visibility around MicroID.

This draft expires on February 8, 2008, after which it may continue down the road to RFC.

Peter’s announcement on the MicroID blog:

By popular demand, we have submitted the MicroID specification as an Internet-Draft. Eventually this effort may result in publication of an Informational RFC defining the technology. Please send feedback to the mailing list and we’ll update or clarify the spec accordingly.
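
For the curious, here’s a minimal Ruby sketch of how I understand a MicroID to be computed from the draft – hash the identity URI and the page URI separately, concatenate the hex digests, and hash again. The “mailto+http:sha1:” prefix and the example URIs are my own reading and made-up values, so check the I-D itself before relying on this:

require 'digest/sha1'

# Hash each URI on its own, concatenate the two hex digests, hash the result.
# (Both URIs are made up for illustration.)
def microid(identity_uri, page_uri)
  inner = Digest::SHA1.hexdigest(identity_uri) + Digest::SHA1.hexdigest(page_uri)
  "mailto+http:sha1:" + Digest::SHA1.hexdigest(inner)
end

puts microid("mailto:user@example.com", "http://example.com/")
# => mailto+http:sha1:<40 hex characters>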


Googlonymous not so Anonymous

I was pointed yesterday to a new site I’d not seen before named Googlonymous (registered July 15, 2007). There are three items on the main page – a search box, a paragraph of foreboding text, and an embedded flash video.

I’ve included a screen shot and the text below.

[Screenshot of the Googlonymous homepage – googlonymous.jpg]

When you make a search on Google, your ip address, the time, and what you searched for is stored in their database forever and this information can be used in a court of law against you. Google will willingly allow authorities to consult their database, they already did as you can see in the video below. When you search on Google through Googlonymous, it is Googlonymous that goes on Google and does the search for you, the only ip address that Google will see, is the ip address of the server of Googlonymous. Googlonymous does not keep any record who searched for what. So this way, it is completely impossible to retrieve your identity. You can search for whatever you want without a care in the world, 100% anonymously.
Click play on the video below to see a fascinating documentary showing the dangers of searching on Google.

The idea behind the site is apparently to inform the public about how our surveillance culture is quickly outstripping our awareness, and then to empower them to avoid being tracked by one of our favorite technologies today, Google’s Search.

However, ironically, the embedded video on that site – a copy of CNBC’s report entitled “Big Brother, Big Business” – is itself streamed from the very company that the site is trying to help us circumvent.

The video is (currently) hosted at Google Video.

Enterprising engineers at Google could, if they wanted, quite easily cross-reference your access of Googlonymous and Google Video from the same IP at the same time. It seems that rather than take on the cost of streaming the video themselves, the owners of Googlonymous have fallen victim to the very lure of convenience and price mentioned in the CNBC report they’re publicizing.
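
Just to make the mechanics concrete, here’s a toy Ruby sketch – the log format and entries are entirely made up – of how trivially two access logs join on IP address within a small time window:

WINDOW = 60 # seconds

# Hypothetical log entries: [ip, unix_timestamp, resource]
search_hits = [["10.0.0.5", 1186500000, "proxied Google search"]]
video_hits  = [["10.0.0.5", 1186500030, "Big Brother, Big Business stream"]]

search_hits.each do |ip, t, what|
  video_hits.each do |v_ip, v_t, v_what|
    if ip == v_ip && (t - v_t).abs <= WINDOW
      puts "#{ip}: '#{what}' and '#{v_what}' within #{WINDOW} seconds"
    end
  end
end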

When the price is right, we give up some of our privacy and therefore a bit of our liberty. This is not really news – the only reason it’s notable today is the irony.

“Search Google anonymously” and at the same time “stream video from Google”.

Not so anonymous.

The tools of the information age are shiny and neat – but they come with a price for all their magic.


BarCampRDU – Expertise Location

Another successful BarCampRDU this past Saturday. Fred did a great job organizing the organizers and making it all run smoothly. Red Hat hosted again this year and again, to rave reviews. Pictures and Posts.

I was in charge of the big schedule board again. We had it up much faster this year with fewer tape failures. Technique is very important. And having 12 hands.

I learned how to play Bughouse in the first session. Two chess boards, four players, two chess clocks – and it turns you a bit nuts in less than 10 minutes – which proved just enough time for me to recover before the next hour.

I hosted the next session in the Bughouse room on Expertise Location and had a very engaging discussion around the problems of figuring out “who knows what” and how to keep track of that when you’re trying to hire or place people on teams.

I lured them in with an explanation of my thesis work around Contextual Authority Tagging and asked for input from the “real world”. I heard lots of encouraging comments about how my work meshes nicely with the movement in today’s knowledge management circles away from documenting our knowledge in files (separating the knowledge from the person who knows it) and toward documenting the people, their work, and simply keeping track of who knows what.

The group agreed that my ideas around tagging others’ knowledge are related to the 360° interview process and the Johari window and its concept of a “blind spot”.

“Everything is pointers.” The overwhelming consensus was that the real way people figure things out is by asking other people, and moving up the chain of expertise until the answer is uncovered. If Bill (who knows about X) doesn’t know the answers himself, he’ll point you to Dave. If Dave doesn’t know, he points you to the next person. This is how we solve problems and if I can help companies do that in a more efficient, documented, trackable way – then everyone agreed I’ve got a very marketable project – as soon as I write it all down, show that it works, and then defend it and get out of school.

The most interesting comment to come from the day’s talk was about a “persistent gap” that may prove to exist between what a person thinks they know about and what the group around them thinks the person knows about. Identifying if and when that happens would be a very interesting application of this technique and something I hadn’t really considered before. I’ve been working under the very straightforward assumption that there will be convergence between the three “lists” of terms/tags in my experiment (a quick sketch of the comparison follows the list):
– What I think I know
– What they think I know
– What I think they think I know
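
A toy Ruby sketch of that comparison, with made-up tag lists – simple set differences surface both sides of any gap:

what_i_think_i_know    = %w[tagging folksonomy ruby identity]
what_they_think_i_know = %w[tagging identity chess]

# Tags I claim that the group doesn't see, and vice versa.
claimed_but_unseen = what_i_think_i_know - what_they_think_i_know
seen_but_unclaimed = what_they_think_i_know - what_i_think_i_know

puts "I claim but they don't see: #{claimed_but_unseen.join(', ')}"
puts "They see but I don't claim: #{seen_but_unclaimed.join(', ')}"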

We’ll see.

The current plan gives me a year to write down what those who have come before me have already done (called the Literature Review) and a year to prove and then write down my own work (called the Dissertation).

Then of course, I’ll have to be a part of that “real world”. Hmmm…


Clean and store your raw tags like Flickr

Today I was working through how a new application would handle tags and realized that I strongly believe Flickr has the most robust method of storing and querying tags. I think they do it well and wanted to follow their lead.

The main reason I feel they’ve got the best system is how they handle their ‘raw’ tags and their ‘clean’ tags.

When a photo is tagged at Flickr, the tag itself is saved in two different formats – raw and clean. If you were to tag a photo with “St. Patrick’s Day”, that’s what would remain in your list of tags, visible on screen. But Flickr also encodes and cleans that tag down to “stpatricksday”. This is a subtle but powerful model.

It keeps the original tagger happy (“I know how I want to tag things, darn it”) and it makes the tags more functional in terms of finding things later (both for the tagger, and everyone else). The clean tag is what is used in URLs, in the tag clouds, and wherever aggregation is important for statistics. The tags “N.Y.C.” and “NYC” and “nyc” are all ‘cleaned’ down to the same thing (“nyc”), so when a query comes in for nyc, all three original photos would be presented in the results.

I wanted that cleaning function for myself. I looked everywhere today and couldn’t find it detailed in any one place beyond the Flickr API pages themselves.

raw: The ‘raw’ version of the tag – as entered by the user. This version can contain spaces and punctuation.

tag-body: The ‘clean’ version of the tag – as processed by Flickr. This version is used for constructing urls.

Flickr-Like Tag Cleaning Regular Expression

Please let me know if you find any errors, or if this gets out of date. I’ll try to keep it current over time.

The function can be described fairly simply – “remove spaces and punctuation, then lowercase”. I wrote a one-line Ruby regular expression to do this:

clean_tag = raw_tag.gsub(/[\s"!@#\$\%^&*():\-_+=\'\/.;`<>\[\]?\\]/,"").downcase

The rest of my test code is included below. As far as I could test today, these are the punctuation characters that Flickr scrubs from your raw tags:

#!/usr/local/bin/ruby -w

require 'cgi'

# removes whitespace
# downcases A-Z
# removes 27 different punctuation characters
  # quotation marks
  # exclamation point
  # at symbol
  # pound sign
  # dollar sign
  # percent sign
  # caret
  # ampersand
  # asterisk
  # open parenthesis
  # close parenthesis
  # colon
  # hyphen
  # underscore
  # plus sign
  # equals sign
  # apostrophe
  # forward slash
  # period
  # semicolon
  # backtick
  # open angle bracket
  # close angle bracket
  # open square bracket
  # close square bracket
  # question mark
  # backslash
# does not affect other characters (you should safely CGI.escape these)
  # curly brackets
  # tilde
  # pipe
  # british pound
  # euro symbol
  # chinese characters
def clean_tag(raw_tag)
  clean_tag = raw_tag.gsub(/[\s"!@#\$\%^&*():\-_+=\'\/.;`<>\[\]?\\]/,"").downcase
end

tags = [
  # should remove the offending characters
  "\\"double\\" quotes",          # quotation marks                     doublequotes
  "!excited!iam!",              # exclamation point                   excitediam
  "test@example.com",           # at symbol                           testexamplecom
  "pound#it",                   # pound sign                          poundit
  "$ave on everyThing",         # dollar sign                         aveoneverything
  "i feel 30% better",          # percent sign                        ifeel30better
  "carats^aretasty",            # carat                               caratsaretasty
  "and&this&and&that",          # ampersand                           andthisandthat
  "maris*61",                   # asterisk                            maris61
  "i think (maybe)",            # open and close parentheses          ithinkmaybe
  "F:ooBar",                    # colon                               foobar
  "hyphen-ated",                # hyphen                              hyphenated
  "under_my_score",             # underscore                          undermyscore
  "1+1=2",                      # plus and equals                     112
  "Saint Patrick's Day",        # apostrophe                          saintpatricksday
  "/leaning/forward/ish",       # forward slash                       leaningforwardish
  "Mrs. Jones",                 # period                              mrsjones
  "semi;automatic;parsing",     # semicolon                           semiautomaticparsing
  "back`tick`here",             # backtick                            backtickhere
  "open<and>close",             # open and close angle brackets       openandclose
  "don't[be]square",            # open and close square brackets      dontbesquare
  "you?sure",                   # question mark                       yousure
  "back\\\\slash",                # backslash                           backslash
  # should only encode the rest of these
  "crab|vs|pipe",               # pipe                                crab%7Cvs%7Cpipe
  "東京",                         # chinese characters                  %E6%9D%B1%E4%BA%AC
  "£",                          # british pound                       %C2%A3
  "nice {curly} brackets",      # curly brackets                      nice%7Bcurly%7Dbrackets
  "Mötley Crüe",                # umlauts                             m%C3%B6tleycr%C3%BCe
  "Tōkyō"                       # long o                              t%C5%8Dky%C5%8D
].each do |t|
  print t
  print "\\n\\tcleaned  -->  "
  print clean_tag(t)
  print "\\n\\tescaped  -->  "
  print CGI.escape(clean_tag(t))
  print "\\n"
end


Your Personal Data and whether Google knows all

Google knows a lot about each of us. If you’re doing anything online these days, you’ll be hard-pressed to do it without Google having a hand in a part of it.

Recently, James Thomas decided not to use Google’s products at all for two weeks and quickly realized it made the Internet quite hard to use. They’re everywhere – he had to go out of his way to force his computer not to look up or visit google.com. Not exactly an option for the vast majority of users. (He did also note that the Internet was faster…)

Google is moving into the ISP space (over 100,000 organizations already), the photo hosting space, the email space, the calendar space, the higher education space, the analytics space, more banner ads, and now, into the RSS space itself with yesterday’s purchase of FeedBurner, the premier RSS serving tool. Most high-powered RSS sites I see are pushed out over FeedBurner’s network – their statistics and their republishing of your serialized datastream in multiple formats are first rate. For $100M, and a promise of a couple of years’ future employment for the owners, Google now has insight into that side of our collective data behavior as well. They’ve got the readership side figured out with Google Reader, and now the serving side is known to them via this deal. How tidy.

I’m starting to sense a shift in my own dealings with the King of Search. I go out of my way to avoid Google Groups and Gmail. I don’t use Google Docs or Google Apps for my domain, even though they’re arguably easier and more functional than most other setups available today. I avoid having them know all the feeds I’m reading (Reader) and the things I’m searching for (I log out of my Google account before searching). Since these services all sport a unified login now, I can only assume that Google could not plausibly deny that aggregation is possible across all their (growing) properties.

And I trust Google. I do.

But who I don’t trust is everyone else. I don’t trust that Google will never be driven by the government to hand over certain records, or that they can prevent a data breach forever. They are a very high-value target.

With regards to Google knowing too much, Fred has a paragraph that’s worth quoting in his post from yesterday afternoon…

Anonymity is the ultimate irony of the internet. The medium is so clouded in the perception of anonymity, it can fundamentally change human behavior. Of course, the reality is that the internet is the most sophisticated data mining tool ever invented. Compared to any offline action, you are less anonymous when you are using the internet. The nature of our revelations in this false anonymous context could lead a CEO to believe that they really could uncover the “true” persona of an individual, hence being able to accurately answer these very personal questions. In fact, this may be partially true; however, what we’d have to give up to get this benefit is almost always too much.

All that said – I truly want my stuff to be online and available to me. I want global access to what is mine – and to be secure in the fact that it’s redundantly backed up and ‘safe’ from the bad guys. I think that is the way of the future. A personal repository of my stuff with nuanced access given to those who need it when they need it. Jon Udell posted something along the lines of what I want earlier this week… Hosted Lifebits…

Grade 11

You’re applying to colleges. You publish your essay into your space, then syndicate it to the common application service. The essay points to supporting evidence — your e-portfolio, recommendations — which are also (to a reasonable degree of assurance) permanently recorded in your space.

College sophomore

You visit the clinic and are diagnosed with mononucleosis. You’ve authorized the clinic to store your medical records in your space. This comes in handy a couple of years later, when you’ve transferred to another school, and their clinic needs to refer to your health history.

Working professional

You use your blog to narrate the key events and accomplishments in your professional life, and to articulate your public agenda. All this is, of course, published in your space where you are confident (to the level of assurance you can reasonably afford) that it will be reliably available for your whole life, and even beyond.

I think we are well on our way to giving up too much. There will always be a wide spectrum that defines how we live our lives, but more and more, we are choosing to give up our personal information for the sake of convenience in the very short term. This is a dangerous precedent and, though I’m certainly not the first to say it, I’d rather not be the one who jumps first. Have we really gotten to the point where giving up all our privacy is the right answer? Posting everything online?

So it dawned on him: If being candid about his flights could clear his name, why not be open about everything? “I’ve discovered that the best way to protect your privacy is to give it away,” he says, grinning as he sips his venti Black Eye. Elahi relishes upending the received wisdom about surveillance. The government monitors your movements, but it gets things wrong. You can monitor yourself much more accurately. Plus, no ambitious agent is going to score a big intelligence triumph by snooping into your movements when there’s a Web page broadcasting the Big Mac you ate four minutes ago in Boise, Idaho. “It’s economics,” he says. “I flood the market.”

It seems so wrong… Is this just paranoia on my part?

Update: Fred did it again today – went and posted something relevant – Your Private Twitters Aren’t:

If you’ve been Twittering privately for the past few months, I’ve got some bad news. As reported by Meish, the Twitter API does not enforce privacy ACL’s, meaning all of your private Twitters are available to the public. To check this out for yourself, visit http://twittervision.com/username, and you’ll be able to see private Twitter streams.

I must note that it appears that not all accounts are affected by this problem. It’s impossible to calculate the breadth of this breach, or what it will do to Twitter as a company, but it illustrates a greater problem with the internet. What if your Gmail, or Google History, or Facebook/Myspace account leaked? Or what if the government swept up your information in a national security letter, only to have your information posted in court documents? Think it can’t happen, or that these well-meaning companies can even control it? Just ask people at Enron how they feel.


OpenIDs at LiveJournal leaking auth info

Joseph Petviashvili (krotty), creator of the Skype-based Bitchun Society, writes today about his discovery that LiveJournal is leaking his auth info via the checkid_immediate feature in OpenID. I haven’t seen any other discussion of this. Can anyone confirm?

open id from livejournal is not safe

If you are logged in to livejournal, that information can be shared with third parties without your consent through OpenID. Right now livejournal.ru and kommersant.ru are doing it.

Have not found a way to disable it, they are using http://www.livejournal.com/openid/server.bml?openid.mode=checkid_immediate and livejournal is giving out my auth info without asking…
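
To make the mechanics concrete, here’s a hypothetical Ruby sketch of the sort of checkid_immediate URL a third-party page could request behind the scenes. The parameter names are standard OpenID 1.1; the identity and return_to values are made up:

require 'cgi'

params = {
  "openid.mode"      => "checkid_immediate",
  "openid.identity"  => "http://exampleuser.livejournal.com/",
  "openid.return_to" => "http://third-party.example/collect"
}
query = params.map { |k, v| "#{k}=#{CGI.escape(v)}" }.join("&")
puts "http://www.livejournal.com/openid/server.bml?" + query

# In "immediate" mode the server answers without prompting the user –
# if you're logged in, the redirect back to return_to can reveal that.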


Transparency trumps credentialism

Larry Sanger has been given a bigger stage. Edge has published his latest essay entitled “Who Says We Know: On the New Politics of Knowledge”. In it he argues against “dabblerism” – a word he made up to help him define his opponents’ position of anti-credentialism. Sanger is a credentialist. He wants credentials to buy a bigger seat at the table – he thinks it’s owed to the experts.

I agree with Larry Sanger about expertise mattering when compiling ideas and opinions about a subject. I’ve said as much before – Democracy is for opinion, not for knowledge. But I strongly disagree with Larry Sanger about how those experts shall be identified and whether their expertise itself should be a proxy for facts that should stand on their own. Facts should be sourced and they should be able to hold their ground on their own terms. If it is true that 97% of credentialed experts agree on view A, then the job of an encyclopedia is to publish the statistic directly following the discussion of what view A is. Whether an expert is the one who picked the particular turn of phrase is inconsequential.

Sanger also conveniently ignores the passage of time as a contributing factor for Truth. Wikipedia is not a snapshot. It is not a bound book shipped across the country and sold door to door. It does not come with a year proudly stamped on its spine – declaring at first how new and relevant, and then, almost immediately, how dated and quaint the information inside truly is.

Wikipedia allows the best knowledge of the time to be condensed and parsed, argued and sourced – in plain sight. As this knowledge changes, as the facts move and shift because of new discoveries and developments, the Wikipedia changes with it. If experts happen to arrive with new information, and source it well, the Wikipedia can be convinced to publish the new information. If the experts cannot source it, cannot convince the skeptics and the masses that the new facts are indeed facts, then they are sent packing – same as everyone else – to keep digging. This is not to say the masses should have all the power, it’s that if an individual truly feels they can move the discussion forward, they have to bring the evidence – whether they be expert or not.

This is the way it should be.

Because someone comes with credentials, they are not necessarily to be believed. Opinion is where we should defer and perhaps listen to experts. They have knowledge and expertise. They have experience and judgement tested through trial and error and the passage of time. Presumably they’ve even been challenged by other experts, both professionally and at lunch, and so they should be listened to and considered. But how much deference we pay to the experts should be a personal decision. The argument remains that there is no objective truth – and we are each making up our minds as to what we believe. We each use experts as proxy. We should not be told who the experts are – we should be allowed to choose ourselves – and that has to be done on a personal level.

Finally, experts are—albeit fallibly—the best-suited to articulate what expert opinion is. It is for the most part experts who create the resources that fact-checkers use to check facts. This makes their direct input in an encyclopedia invaluable.

Yes, exactly. And I think we’d be hard pressed to find anyone to argue with that. What is at issue is Sanger’s assessment of what follows:

To exclude the public is to put readers at the mercy of wrongheaded intellectual fads; and to exclude experts, or to fail to give them a special role in an encyclopedia project, is to risk getting expert opinion wrong.

It does not follow. Why does allowing experts a spot at the table specifically mean the head of the table? And nowhere still is the process for determining the expertise of the expert defined. What’s the term limit for head of the table? How often are the midterm elections held? Is there only one table?

Here’s a little dilemma. Wikipedia pooh-poohs the need for expert guidance; but how, then, does it propose to establish its own reliability? It can do so either by reference to something external to itself or else something internal, such as a poll of its own contributors. If it chooses something external to itself—such as the oft-cited Nature report—then it is conceding the authority of experts. In that case, who is it who says “we know”? Experts, at least partially: their view is still treated as the touchstone of Wikipedia’s reliability. And if it concedes the authority of experts that far, why not bring those experts on board in an official capacity, and do a better job?

This is not a strong argument. Wikipedia stands on citations from other sources, credentialed sources, sources written by experts. This is not under debate. Wikipedia takes great pride in pointing to others and showing broad consistencies where it finds them – and inconsistencies if and when it finds them. Experts are not needed for this job.

The reliability of Wikipedia is in its transparency. A full audit of edit history and personality and language is available at the click of a button. This is the main reason experts should not be given a big chair at the table of Wikipedia. They are not needed – because the knowledge compiled in Wikipedia is not original research. It is simply a compendium of the very world in which it exists. Its job is to document – and that does not require expert opinion.


ClaimID users needed for APM documentary

We just heard from American Public Media’s American RadioWorks – they want to interview some claimID users for an upcoming documentary about online identity.

What fun.

The wonderful folks at American Public Media’s American RadioWorks are looking for ClaimID users to appear in an hourlong documentary about online identity and self-marketing. This is a great chance to tell your story about online identity, as I know many of you have thought about this extensively.

As teens get older and apply for college admission or employment, online identities can be cast in a new light. American RadioWorks seeks individuals, roughly under the age of 35, for inclusion in a radio documentary. ARW is looking for those who have online identities (profiles, blogs, etc.) and who are using ClaimID. ARW prefers to start speaking with individuals before they begin using ClaimID, in order to follow them throughout their process of using the service.


Two ASIST Posters and VCU Technology Days

A good week for hearing back about things.

Both short papers I submitted to ASIST were accepted.

Fred and I submitted a claimID write-up with the title “Self-Representation of Online Identity in Collected Hyperlinks”.

Additionally, my first attempt at writing down my thoughts about the use of social tags over time was accepted with the current title “Tag Decay: A View Into Aging Folksonomies”.

I’m very excited about both of them – and look forward to feedback. These two topics, in their own way, are presenting themselves as the structure beneath my upcoming dissertation research – Contextual Authority Tagging.

The second bit of news this week concerned my talk at the VCU Technology Days next week in Richmond, Va. I’ve been given the keynote slot at 12:30pm on Wednesday to speak about “Online Identity Management”. Additionally, I’ll be there on Thursday to field questions about claimID specifically. Please drop in if you’re nearby.

It’s funny how weeks go by without a feeling of tangible progress. And then there are weeks like this one.


Expressivity vs. Uniformity – social tagging and controlled vocabularies – ASIST Panel

In an immediate follow-up to last week’s panel… here’s another.

What: EXPRESSIVITY VS. UNIFORMITY: Are controlled vocabularies dead, and if not, should they be?
When: 1:00 to 2:00pm April 2nd, 2007
Where: Pleasants Family Room in Wilson Library at UNC-CH
Who: Led by Dr. Stephanie Haas, with panelists Dr. Gary Marchionini, Terrell Russell, Tim Shearer, Christiane Voisin, and Lynn Whitener
Presented by: ASIS&T-UNC

Controlled vocabularies, nomenclatures, and LC or MeSH subject headings have a long history in LIS. They make classification, categorization, aggregation, sorting, and other operations easier. But with the rise of folksonomy, recommenders, improved natural language processing techniques, and other technologies, are they needed any more, or are they just stifling the creativity of our expression?

This panel was on Monday and the week’s flown by since. I wanted to post my comments and see if any extra controversy could be kicked up after the fact.

For the most part – we didn’t disagree very much. The two (social tagging and controlled vocabularies) seem like different ends of a spectrum and should be able to work together… we’re only at the beginning and we need better tools.

Here’s what I said:

I think nuance and a spectrum of understanding are too hard for most people for most things. People want clean lines – they want black and white. If it’s beyond my area of interest or expertise – just give me the answer already! So I think there will always be a place for controlled vocabularies wrought by experts and combed over time. People want the ‘right’ answer.

It’s a simple (read: impossibly complex) question of how high the bar of ‘good enough’ needs to be. And it’s different for every information problem. Each person looking for information has their own biases, their own history, their own level of expertise, and will use different words/queries accordingly. As they continue their search, they will themselves become more sophisticated and use more in-group or official terminology. That doesn’t make any of the words they used to get that far incorrect. It just means that all valid paths to the ‘right’ information are valuable.

Likewise, we know that there’s value in having a fixed set of words – for aggregation and analysis, as well as the sense that you’re getting everything the database has to offer.

However, I think we’re entering a new time where many more voices are being heard and recorded – and through all this noise and messiness, we’ll still be able to extract a remarkable order.

What’s up for debate is how dumb that order will look and how much information it will actually provide… Will it truly be the lowest common denominator? I think Gary’s right in that the vast majority of information objects are not worthy of our human attention/time. We’ve got automatic classification, fulltext retrieval, etc. As we move forward, there’s just too much of it. We need to focus our attention on the things that deserve our human attention.

These computers, they’re very good at counting things, you know…
