Archive for July 2nd, 2008

Twittering images

Wednesday, July 2nd, 2008

Bob and I took a couple of hours off from our regular work this afternoon to try to integrate pictures and ink into a Twitter client.

To recap from an earlier post, what I want to do is:

* Display thumbnails of flickr referenced photos where feasible
* Resolve tinyurls and the like so I can see what the links are actually pointing to
* Display thumbnails of web pages that are linked to where feasible
* Support ink drawings in the client which are then posted to flickr (or similar)

As I see it now, I don’t care if Twitter supports ink and photos directly. Adding these features to the clients is more than passable.

With only a couple hours at hand we didn’t get very far. One shortcut was to use the WPF-based open source project Witty. For the most part that got the basics going.

Then with a couple small adjustments we got images displaying as shown here:

wittypictures.png

Now when someone that I’m following (or myself I guess) points to an image directly or to a flickr image, a thumbnail of the image is displayed. This works even if the person uses a tinyurl or similar url shortening service.

This turned out not to be too difficult. The approach we took was to change the low level custom control TweetTextBlock and derive it from RichTextBox rather than TextBlock. This gives lots more flexibility over what can be contained in the rendered tweet (since it can display a FlowDocument) as well as providing selection and copy to clipboard.
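To make the link handling concrete, here is a minimal sketch of the detection step: pull the URLs out of a tweet, expand the shortened ones, and keep the ones worth thumbnailing. It is written in Python for brevity and is not the code we added to Witty. The shortener list, the function names, and the `resolve_short_url` callback are all assumptions made for illustration; a real client would have the callback follow the HTTP redirect.

```python
import re
from urllib.parse import urlparse

# Illustrative list of shortener hosts; an assumption, not an exhaustive set.
SHORTENER_HOSTS = {"tinyurl.com", "is.gd", "snurl.com"}
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".gif")
URL_PATTERN = re.compile(r"https?://\S+")


def find_urls(tweet_text):
    """Return every http(s) URL in the tweet, minus trailing punctuation."""
    return [u.rstrip(".,;:!?)") for u in URL_PATTERN.findall(tweet_text)]


def expand(url, resolve_short_url):
    """Expand a shortened URL via the supplied resolver; pass others through."""
    host = urlparse(url).netloc.lower()
    if host in SHORTENER_HOSTS:
        return resolve_short_url(url)
    return url


def classify(url):
    """Label a URL as 'image' (direct link), 'flickr' (photo page), or 'page'."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if parsed.path.lower().endswith(IMAGE_EXTENSIONS):
        return "image"
    if host == "flickr.com" or host.endswith(".flickr.com"):
        return "flickr"
    return "page"


def thumbnail_candidates(tweet_text, resolve_short_url):
    """Return (expanded_url, kind) pairs for links that can get a thumbnail."""
    results = []
    for url in find_urls(tweet_text):
        expanded = expand(url, resolve_short_url)
        kind = classify(expanded)
        if kind != "page":
            results.append((expanded, kind))
    return results
```

Passing the resolver in as a parameter keeps the network call out of the parsing logic, so the classification can be exercised with a plain dictionary lookup standing in for the redirect.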

Thumbnails of web pages were another matter. I tried using the new WebBrowser object in .NET 3.5, but I can’t get it to render anything to an image object. I guess I’ll need to ask around.

Unfortunately, the web page thumbnail problem took up most of the time so I didn’t get a chance to integrate in the ink yet. I’ve done this before with a Silverlight app–posting the ink to flickr–so I don’t think it’ll be too hard, I just need a couple hours of free time. Maybe this will be a good thing to do this holiday weekend.

As for now…it’s time to do the dishes.

Microsoft acquires Powerset

Wednesday, July 2nd, 2008

Yesterday, Microsoft announced that they would be acquiring Powerset, a semantic analysis company. The idea is a good one: bring in linguistic specialists to help improve search and advertising targeting.

Don Dodge explains this further on his blog.

I do have a couple comments for Don on linguistics, searching and ads.

If you want to improve “search, portals, and advertising,” you don’t need to go so far as thorough semantic analysis. Just take a look at TechMeme or to a degree my permutation, thredr.

Both of these sites go a step beyond page rank and try to cluster and rank content from a limited set of sources. It’s a simple idea: the quality that comes out goes up if quality goes in. Now, leveraging trusted sources will get you only so far. There is more clustering work to do and TechMeme does a pretty good job here.
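As a rough illustration of the "quality in" idea, the sketch below keeps only posts from a vetted source list, groups them by the URL they link to, and ranks each linked story by how many distinct trusted sources point at it. This is not how TechMeme or thredr actually work; the post structure (`source`, `links`) and the function name are hypothetical, and shared links are just one simple clustering signal.

```python
from collections import defaultdict


def rank_stories(posts, trusted_sources):
    """Rank linked URLs by the number of distinct trusted sources citing them.

    posts: iterable of dicts with hypothetical keys 'source' and 'links'.
    trusted_sources: set of source names that have been vetted by a person.
    """
    sources_by_link = defaultdict(set)
    for post in posts:
        if post["source"] not in trusted_sources:
            continue  # quality in: ignore anything outside the vetted list
        for link in post["links"]:
            sources_by_link[link].add(post["source"])

    # Most-cited stories first; ties broken alphabetically for stable output.
    ranked = sorted(
        sources_by_link.items(),
        key=lambda item: (-len(item[1]), item[0]),
    )
    return [(link, sorted(sources)) for link, sources in ranked]
```

Counting distinct sources rather than raw links means one prolific blog cannot push a story to the top on its own.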

Notice how Powerset is able to leverage trusted sources too. At the time Powerset went public with its search, Michael Arrington pointed out how good Powerset’s search results were so a few of us took the search challenge. And what did we find? That in large part by restricting a search to Wikipedia pages on non-semantic aware search engines we could return the same results. Limiting the search across all of Wikipedia’s permutations was the key to the search. Powerset just does this because that’s all it indexes.

Now this doesn’t mean that context isn’t a good thing. Knowing when a post is a review or commentary is worthwhile and should help it when matching against content. However, for high quality content you’d be surprised how often context is self-contained or easily extractable.

I do suggest that Microsoft take a step back and realize that there’s quite a bit of low hanging fruit here courtesy of some human input.

For instance, I’ve long advocated to Microsoft that they should focus some of their efforts on delivering quality ads to none other than their MVP community. First, the sites and blogs that the MVP community members manage have already been vetted. It’s like TechMeme’s and thredr’s blog list. I can guarantee you that few if any of the sites these people run want ads outside of their domains, yet most of the ad services are just as happy to deliver ads from eBay or NextTag and the like. There’s economic incentive to do this unfortunately. These poorly targeted ads, though, junk up the sites and lower the overall quality of the content. Is it any wonder that TechCrunch et al. use focused advertising and not Google’s or Microsoft’s ad service? Nope.

One other point here. A computer algorithm isn’t going to be the ultimate answer. Remember, people write the programs. In fact, the code is doing editorializing itself; it’s just that it’s automated. Further, a great search or ad algorithm is going to require constant adjustments. That will be its value. It will have scale plus freshness. Take either of these away and it’ll be less interesting.

Oh, and another issue that most online sites run into: A big company can’t buy an ad on a little site. A large enterprise won’t have the processes to cut a payment to an unknown vendor. If you don’t believe me, ask around. Plain and simple they have to go through intermediaries, which only creates inefficiencies. It’s a simple fact. Creating an ad purchasing infrastructure that lets the biggest buyers purchase from the smallest sites and the smallest buyers purchase from the largest sites is the key here. I can’t explain it any simpler than this. It’s not rocket science.

As you may know, I’ve been advocating for a while indexing content in ways that leverage more than text. I’ll dig up a link when I get a chance. But till then, it doesn’t take much brainstorming to realize that there’s great value in what electronic devices can sense and record, and leveraging this information can be quite useful, more so than text in many instances I believe.

Finally, enterprise search. Is there a partially naive approach to enterprise search like there is to content clustering of blogs? Could be. Inside an organization there’s quite a bit of standardization, even among its loose collection of information. Some of it will be in databases already with pseudo-meaningful column headings. But there’s also a bunch of information that’s easily mineable in chart and column data that needs to be “reverse generated.” In other words, the data was once structured inside a computer and then generated as output, yet to a typical search engine it is as flat as all other words. It’s actually not, but most indexing extraction tools don’t know any better. Is linguistics the solution? Not here. There are much simpler and more direct approaches available.
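Here is a minimal sketch of what “reverse generating” can look like for the simplest case, a fixed-width text report. It assumes the first non-blank line is a header of single-word column names, and that every value sits between the start of its own header word and the start of the next one. Both are assumptions for illustration; real reports need more care, and the function names are made up.

```python
import re


def column_spans(header_line):
    """Derive (name, start, end) slices from where each header word begins."""
    starts = [match.start() for match in re.finditer(r"\S+", header_line)]
    names = header_line.split()
    ends = starts[1:] + [None]  # the last column runs to the end of the line
    return list(zip(names, starts, ends))


def to_number(text):
    """Convert to int or float when possible; otherwise keep the string."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text


def parse_report(report_text):
    """Turn a fixed-width report back into a list of typed row dictionaries."""
    lines = [line for line in report_text.splitlines() if line.strip()]
    spans = column_spans(lines[0])
    rows = []
    for line in lines[1:]:
        rows.append(
            {name: to_number(line[start:end].strip()) for name, start, end in spans}
        )
    return rows
```

Once the rows are typed and keyed by column name, they can be filtered, summed, or loaded into a database rather than indexed as a flat bag of words.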

I worked at a company a while back that did quite well turning computer generated content back into queryable content. You’d be surprised how simple and valuable this can be. Yeah, you could try to hook all of your disparate databases together so the CEO/CFO can ask this or that question of the enterprise, but it’s actually far easier to analyze the computer generated content. Go figure. So my advice to everyone is to look for this type of enterprise content first and analyze it. I bet you’ll be surprised how valuable this lost content will be. Just think about it: there was a reason someone purchased this content from the outside or generated the content internally. It has value. It’s not like all the other words.