Archive for the ‘Searching’ Category

Microsoft acquires Powerset

Wednesday, July 2nd, 2008

Yesterday, Microsoft announced that they would be acquiring Powerset–a semantic analysis company. The idea is a good one. Bring in linguistic specialists to help improve search and advertising targeting.

Don Dodge explains this further on his blog.

I do have a couple comments for Don on linguistics, searching and ads.

If you want to improve “search, portals, and advertising,” you don’t need to go so far as thorough semantic analysis. Just take a look at TechMeme or to a degree my permutation, thredr.

Both of these sites go a step beyond page rank and try to cluster and rank limited sourced content. It’s a simple idea: The quality goes up if the quality goes in. Now, leveraging trusted sources will get you only so far. There is more clustering work to do and TechMeme does a pretty good job here.

Notice how Powerset is able to leverage trusted sources too. At the time Powerset went public with its search, Michael Arrington pointed out how good Powerset’s search results were so a few of us took the search challenge. And what did we find? That in large part by restricting a search to Wikipedia pages on non-semantic aware search engines we could return the same results. Limiting the search across all of Wikipedia’s permutations was the key to the search. Powerset just does this because that’s all it indexes.
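As a minimal sketch of that trick, here is one way to build a Wikipedia-only query for a general search engine. The "site:" operator and the base URL are assumptions for illustration; the point is only that the restriction happens in the query, not in any semantic layer.

```python
from urllib.parse import urlencode


def wikipedia_only_query(question, base_url="https://www.google.com/search"):
    """Build a search URL whose results are limited to Wikipedia pages."""
    # Assumption: the engine honors the common "site:" operator.
    restricted = f"{question} site:wikipedia.org"
    return f"{base_url}?{urlencode({'q': restricted})}"


print(wikipedia_only_query("when did earthquakes hit tokyo"))
```

The same question goes to an engine with no semantic awareness; only the pool of documents it may draw from has changed.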

Now this doesn’t mean that context isn’t a good thing. Knowing when a post is a review or commentary is worthwhile and should help it when matching against content. However, for high quality content you’d be surprised how often context is self-contained or easily extractable.

I do suggest that Microsoft take a step back and realize that there’s quite a bit of low hanging fruit here courtesy of some human input.

For instance, I’ve long advocated to Microsoft that they should focus some of their efforts on delivering quality ads to none other than their MVP community. First, the sites and blogs that the MVP community members manage have already been vetted. It’s like TechMeme’s and thredr’s blog list. I can guarantee you that few if any of the sites these people run want ads outside of their domains, yet most of the ad services are just as happy to deliver ads from eBay or NextTag and the like. There’s economic incentive to do this unfortunately. These poorly targeted ads though junk up the sites and lower the overall content quality. Is there any wonder that TechCrunch, et al use focused advertising and not Google or Microsoft’s ad service? Nope.

One other point here. A computer algorithm isn’t going to be the ultimate answer. Remember, people write the programs. In fact, the code is doing editorializing itself; it’s just that it’s automated. Further, a great search or ad algorithm is going to require constant adjustments. That will be its value. It will have scale plus freshness. Take either of these away and it’ll be less interesting.

Oh, and another issue that most online sites run into: A big company can’t buy an ad on a little site. A large enterprise won’t have the processes to cut a payment to an unknown vendor. If you don’t believe me, ask around. Plain and simple they have to go through intermediaries, which only creates inefficiencies. It’s a simple fact. Creating an ad purchasing infrastructure that enables the biggest buyers to buy from the smallest sites and the smallest buyers to buy from the largest sites is the key here. I can’t explain it any simpler than this. It’s not rocket science.

As you may know, I’ve been advocating for a while indexing content in ways that leverage more than text. I’ll dig up a link when I get a chance. But till then, it doesn’t take much brainstorming to realize that there’s great value out there that electronic devices can sense and record and leveraging this information can be quite useful–more so than text in many instances I believe.

Finally, enterprise search. Is there a partially naive approach to enterprise search like there is to content clustering of blogs? Could be. Inside an organization there’s quite a bit of standardization–even among its loose collection of information. Some of it will be in databases already with pseudo meaningful column headings. But there’s also a bunch of information that’s easily mineable in chart and column data that needs to be “reverse generated.” In other words the data was once in a computer and generated yet its value to a typical search engine is as flat as all other words. It’s actually not, but most indexing extraction tools don’t know any better. Is linguistics the solution? Not here. There are much simpler and more direct approaches available.

I worked at a company a while back that did quite well turning computer generated content back into queryable content. You’d be surprised how simple and valuable this can be. Yeah, you could try to hook all of your disparate databases together so the CEO/CFO can ask this or that question of the enterprise, but it’s actually far easier to analyze the computer generated content. Go figure. So my advice to everyone is to look for this type of enterprise content first and analyze it. I bet you’ll be surprised how valuable this lost content will be. Just think about it, there was a reason someone purchased this content from the outside or generated the content internally. It has value. It’s not like all the other words.
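To make “reverse generating” concrete, here is a rough sketch that turns a column-style text report back into records you can query. The report layout, the column names, and the figures are all made up for illustration; real reports would need their own parsing rules.

```python
import re

# Hypothetical machine-generated report; the data is invented sample data.
REPORT = """\
Region      Quarter   Revenue
----------  -------  --------
North       Q1         120000
North       Q2         135500
South       Q1          98000
"""


def parse_report(text):
    """Turn a column-aligned text report into a list of dict records."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # Drop ruler lines that contain nothing but dashes and spaces.
    lines = [ln for ln in lines if set(ln.strip()) - {"-", " "}]
    # Assumption: columns are separated by two or more spaces.
    names = [n.lower() for n in re.split(r"\s{2,}", lines[0].strip())]
    records = []
    for ln in lines[1:]:
        values = re.split(r"\s{2,}", ln.strip())
        values = [int(v) if v.isdigit() else v for v in values]
        records.append(dict(zip(names, values)))
    return records


records = parse_report(REPORT)
north_total = sum(r["revenue"] for r in records if r["region"] == "North")
print(north_total)
```

Once the rows are records again, “what was North’s revenue?” is a filter and a sum rather than a keyword match against flat text.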

Online shopping tools aren’t worth that much

Wednesday, June 25th, 2008

There’s no doubt that the web has changed and will continue to change commerce. However, the current notion that I’m going to go to sites to search for a product to buy has a fundamental flaw. A flaw that makes shopping more inefficient than it needs to be.

Here’s the problem. Most of the time I don’t think people are trying to buy one thing. Yet all of the online shopping tools have this in mind. Here’s a single line text field and type what you’re looking for. It’s wrong.

Shopping–the real kind of shopping that I want my computer to help me with–is an optimization problem more akin to scheduling than single-text entry searching.

Of course, the challenge here is to design a shopping “scheduling” service that’s no more difficult to use than say searching for something online.

Here’s one of several problems that the current shopping search tools bubble up: They overemphasize price. This is not a good thing for the store, nor do I think it’s a good thing for the consumer. Price at any cost is not always a good thing. What about product delivery times? Ability to return a product? And so on.

It makes complete sense, therefore, for shopping tools to help me focus on more than price. This is particularly true if I need to purchase more than one thing.

Now Amazon has done OK with its integrated customer reviews and “you might also be interested in XYZ” suggestions, however, this isn’t what I think most people want–unless they are truly wanting to buy only one thing. Now stores can leverage getting you in the door and into their walled garden with one good product price and then try to upsell you, but as a consumer that’s not what I want.

Put another way, I think most of the shopping search sites should focus more and more on shopping as a goal. I want this type of product to solve this type of problem. I may think I know what I want, but maybe I’m not right. Maybe I can be persuaded otherwise. Help me out.

And then let’s say I understand what I need better, then help me get what I need in an optimized way. Can I purchase everything locally in one trip to a single store within 10 miles of where I’m at? Can I “purchase” from one place that then manages the delivery of my items so I get them all on the same day so I don’t have to worry about packages arriving over three days? Or what about sales and coupons and best times to buy either historically or based on current price opportunities?
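Here is a small sketch of what treating shopping as an optimization problem might look like: given a whole shopping list, pick the combination of stores with the lowest total once per-store shipping is counted. The store names, prices, and shipping fees are invented, and a real service would also weigh delivery dates, return policies, and distance.

```python
from itertools import combinations

# Hypothetical catalog: every name and number here is made up.
STORES = {
    "StoreA": {"shipping": 8.0, "prices": {"stylus": 20.0, "case": 30.0, "dock": 90.0}},
    "StoreB": {"shipping": 5.0, "prices": {"stylus": 15.0, "case": 35.0}},
    "StoreC": {"shipping": 12.0, "prices": {"dock": 70.0, "case": 28.0}},
}


def best_plan(shopping_list, stores):
    """Return (total, {item: store}) with the lowest total, or None."""
    best = None
    names = sorted(stores)
    # Try every combination of stores; fine for a handful of stores.
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            plan, total = {}, 0.0
            for item in shopping_list:
                offers = [(stores[s]["prices"][item], s)
                          for s in subset if item in stores[s]["prices"]]
                if not offers:
                    break  # this combination cannot cover the list
                price, store = min(offers)
                plan[item] = store
                total += price
            else:
                # Shipping is paid once per store actually used.
                total += sum(stores[s]["shipping"] for s in set(plan.values()))
                if best is None or total < best[0]:
                    best = (total, plan)
    return best


print(best_plan(["stylus", "case", "dock"], STORES))
```

With this sample data, the cheapest single store is not the cheapest plan overall, which is exactly what a one-item price search never shows you.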

On the flip side to all of this, I think most shopping search sites need to rethink how they facilitate grazing. Why aren’t more sites about images than text? Shouldn’t the images be the lead and the text secondary? Sure for the computer it’s the other way around, but it shouldn’t be for the human. Just look at any printed catalog.

Lastly, I think there’s another whole opportunity that’s way underserved for shopping on the go. When I have my shopping list, tell me where everything is in the store. When I’m looking for something point me to where I might find it. When I’m stuck and can’t find something in a store, give me an online chat person if nothing else–it would be faster than trying to track down someone and asking them where something is.

Anyway, lots to do. That’s for sure.

Search test: Google, Live, and Powerset. The winner is….

Sunday, May 11th, 2008

After TechCrunch’s comments the other day about how terrific the new natural language aware PowerSet.com search would be I was eager to check it out. I was going to sign up for the beta and then I decided to wait for the launch. I didn’t have to wait long. It’s up now. Check it out.

I admit I am quite skeptical about the Powerset venture. The “core” of the product may be from Xerox Parc, but I’ve seen lots of people try to throw technology at search and see it come up short. My skepticism was telling me that this was to be another case.

Powerset is trying to leverage natural language processing to improve the quality of search. Rather than go for indexing tons of web pages they decided to focus on the semantics and what they could glean from Wikipedia (in one case). Half of this makes sense. The focusing on Wikipedia part. I’m completely guessing about the NLP side and from this part I’m guessing they focused too much on the NL and not enough on flat out the semantics regardless of any technique.

Anyway, so now that Powerset has launched I decided to do a 20 second test and I think many people will be surprised at the results, but not in the way you might think.

Here’s what I did. I searched for the difference between Tablet PCs and UMPCs:

powersetsearch1.PNG

The results were much like I expected. It’s hard to tell if any of the results targeted the query I gave.

Of course, I was being a bit unfair with my question. So I split it up into two parts. First, I asked what a Tablet PC is:

whatisatabletpcpowerset.PNG

And then “What is a UMPC?”:

whatisaumpc.PNG

It’s just my opinion, but neither set of results is that good. And what’s with the semantic summary at the top of the query results? What does it mean for UMPC features to be “system and low.” And worse, what does it mean for UMPC to “takes” “flight.” I can guess, knowing what I know about the market, but why am I guessing? I am performing the query supposedly because I don’t know the answer.

I’m not surprised by the poor quality of these results though, because Wikipedia has a small draw and being community driven it’s going to have a disproportionate voice that doesn’t “get” Microsoft’s efforts. And as a complete guess I wasn’t surprised to see the UMPC and Tablet PC to fall into this category.

So unimpressed with my 20-second Powerset search I decided to try Google. I asked it “What is the difference between a Tablet PC and a UMPC?” Not too bad. From what I see the third link is to an article entitled “How to buy a UMPC or Tablet PC”. Hmmm. That might give me a pretty good description of the differences I presume.

whatisthedifferencebetweentabletpcandumpcgoogle.PNG

Not completely satisfied though, I decided to try Live. For the same query, here are its results:

whatisthedifferencebetweentabletpcandumpclive.PNG

My. My. Look at this. The first link is to a forum post that is titled: “What is the difference between Convertable and Hybrid tablets.” Kind of close in terms of it being a comparison, but actually the link is of mediocre quality and a bit off target. I’m looking for a comparison between Tablet PCs and UMPCs.

The second link is another so-so match. It’s titled “So what’s the difference between the Samsung Q1 and Q1B?” Both of these are UMPCs. Not quite right.

Link 3 is getting warmer though–at least the title is more suggestive: “Define the Ultra-Mobile PC.” However, if you follow the link to Gottabemobile, it’s more about what UMPCs are and their differences with low-cost PCs, such as the Eee PC.

Scanning down the page though, you’ll see several articles with titles including the phrase “what is the difference between…”. Although none of them are exact matches, this does suggest that Live Search is placing greater sorted emphasis on content that also contains mention of at least UMPC or Tablet PC. Not bad. In fact the bottom two links on the page are “Difference between a MID vs UMPC” and “What is a UMPC.” If you read through this article, sure enough it compares in bits and pieces UMPCs to Tablet PCs. To me, although the article is biased towards talking about a UMPC, it’s hands down a winner.

(Note: The Live query results aren’t very good if plural keywords are used, such as Tablet PCs and UMPCs. So the stemming logic in Live isn’t so hot. Not terrible. Just not as good as it could be.)

Yahoo search with the same query gives a valiant effort too by strongly matching against the phrase “What is the difference between…” However I don’t see articles that are strong matches, although I do see mentions of Tablet PCs and UMPCs which lead me to believe that with a little digging I might figure it out.

whatisthedifferencebetweentabletpcandumpcyahoo.PNG

I also tried Ask.com, but the results weren’t that good though there is a sidebar in which you supposedly could narrow the search by clicking on “Definition of a Tablet PC” and “What is the Tablet PC used for.” Of course, the narrowing list doesn’t mention anything about UMPCs. So I left this out.

whatisthedifferencebetweentabletpcandumpcyahoo1.PNG

So, my conclusion after a couple of 20-second queries is this: I’ll stick with Google and Live with Yahoo in third place. Powerset? Well, maybe if I just want to search Wikipedia.

Was this a fair test? No. I really need to do more tests. However, it is in a domain I know something about and I’d expect any search engine to handle well. It’s not that obtuse a topic. Now, maybe I’m using the Powerset search engine “wrong” and another form of queries would do well. I’ll be watching out for the experiences of other bloggers.

An aside: Michael Arrington gives kudos to Powerset for returning good results for the query “when did earthquakes hit tokyo” and suggests that people try Google to see how good Powerset is. Well, he’s right. The results from Powerset return the first hit with “The special wards of Tokyo are as follows: ….Tokyo was hit by powerful earthquakes in 1703, 1782, 1812, 1855, and 1923.” A very good match for earthquakes in the last few hundred years. The results from Google aren’t that spectacular. However, if you search for “wikipedia when did earthquakes hit tokyo,” you’ll be surprised. The third hit to “Dogpile” has the phrase…”Tokyo was hit by powerful earthquakes in 1703, 1782, 1812, 1855 and 1923. The 1923 earthquake , with an estimated.” This is the exact same phrase Powerset returned.

tokyoearthquakesgoogle.PNG

Yes, Google could tweak their results to take into account language more. You can see that in how Live and Yahoo appeared to have good results with my earlier queries. But is this a tweak to Google or a $100M business?

Update: Danny Sullivan does a much better job of explaining the potential value of Powerset. I don’t agree with him about the value of the semantic summaries (their value applies when you already know the meaning behind the sparse words) and in terms of the outline I think he’s right, it looks like there’s potential there. However, this means that the content will have to be contained within Powerset. That may work under Wikipedia’s license, but not other content. So I’m confused how far this is going to go. Now if Powerset wants to leave it at being a better host for Wikipedia content, that’s one thing. But a general search engine? That’s another.

Live Search getting better?

Tuesday, May 6th, 2008

I actually used Live Search last night. Really! That’s the first time Live Search has outpaced Google for me. Is this a trend? I’m going to watch more closely.

Google tries a SearchMash in Flash, but what about ink support?

Friday, October 26th, 2007

Google made public an experimental “SearchMash” that’s built with Flash. Its main focus appears to be to experiment with how search results are displayed and interacted with. Flash gives the dev team a flexible framework to work within–a more expressive framework than let’s say JavaScript.

OK, all of this is fine, but what would have been really cool is if the team had integrated ink support on their input frame. They could have done something akin to my SearchTIP. Or if they had gotten really creative they could have included math support, like in the MathTIP.

Live.com search gets a refresh

Wednesday, September 26th, 2007

Todd Bishop highlights the most recent changes to Live Search.

I’m not seeing the changes that Todd mentions, so I’ve updated my post for now. Hmmm.

SearchEngineLand has more on the changes.

How Google could support handwritten queries

Friday, September 21st, 2007

Over the last couple weeks I’ve been talking more about the idea of handwriting search engine queries–and particularly about leveraging handwriting to ease the process of entering complicated text queries, such as math problems.

You can try out the Silverlight app at www.TabletPCPost.com/search and www.TabletPCPost.com/math for yourself if you haven’t done so yet. It lets you handwrite queries within the browser (using a stylus or a mouse or other pointing device) and then recognizes your handwriting on a remote server and then sends off the recognition results to Google when you’re ready.

Supporting handwriting in a search engine may seem esoteric, but the idea gets really interesting when you look how it could enable a wider variety of queries, such as math or physics or chemistry problems. Here the idea is to facilitate more “query types” that often are seeking more help or particular answers than “search” per se. By supporting these additional query types, the search engine expands its notion of being an authority source. If you want to know about xyz, go here is the idea.

Anyway, so here is a mockup of one way that I could see handwritten queries integrated into Google or Live or whatever search engine. One way is to offer a landing page which is 100% designed for ink. Another is to provide an integrated approach which does not get in the way of the majority of text users.

So let’s say at Google.com or the academic search version that Google provided a small pen icon like this:

GooglePen.png

Tap on the pen button and the default text field could collapse and an alternate inkable surface appears:

GoogleSearchTIPPanel.png

The user then could handwrite their query in the panel with the recognized text being returned from a remote server:

GoogleSearchTIPQuery.png

Once the query is properly formulated, the user presses the Search button and gets their search results.

To get back to the default text field they could refresh the page or tap on the text icon.

Handwritten queries make even more sense where it can be tedious and error prone to enter the queries textually. Take math problems, for instance:

MathRecoAndSilverlight.png

And now think about an answer engine that shelves your handwritten “queries” for access later. Makes quite a bit of sense with ink since the queries can become very complicated. And think about other problem types. Imagine a query that returns interactive plots. Formatted and publishable diagrams. And on and on. Can you also imagine how this could be a nice entry point for a partner infrastructure? I can. I’m getting ahead of myself.

Of course, there’s no doubt that integrating in a Silverlight or Flash panel to a heavily used query page is going to increase the page loads for more people than will actually use it. But then again, as more and more answer services are integrated into the search engine this ratio could change. Initially non-integrated handwritten queries probably make the most sense, followed by certain query pages–such as those for students, engineers, or other specific markets.

If you’d like to comment on this idea you can do so on my other blog here.

Math recognition and search engine answers

Wednesday, September 19th, 2007

The other day I talked about how search engines ought to support more extensive problem type answers. Why? Because they build audience and authority.

One such example is with math problems. Already, your favorite search engine probably is able to calculate the answer to simple math problems. Type “2+2” into one of the leading search engines and it’ll probably realize you want an answer to the problem rather than a literal search and the engine will most likely return “4.”
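For illustration, here is a minimal sketch of how an engine might decide between “answer this” and “search for this.” It is an assumption about one possible approach, not a description of how any real search engine does it: if the query parses as plain arithmetic, compute it; otherwise return nothing and fall back to a normal search.

```python
import ast
import operator

# Only these operators count as "plain arithmetic" in this sketch.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def try_answer(query):
    """Return a number if the query is plain arithmetic, else None."""
    try:
        tree = ast.parse(query.strip(), mode="eval")
    except (SyntaxError, ValueError):
        return None  # not an expression at all, so treat it as a search

    def evaluate(node):
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](evaluate(node.left), evaluate(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -evaluate(node.operand)
        raise ValueError("not plain arithmetic")

    try:
        return evaluate(tree.body)
    except (ValueError, ZeroDivisionError):
        return None


print(try_answer("2+2"))             # 4
print(try_answer("tablet pc umpc"))  # None
```

Walking the parsed expression instead of calling eval keeps the sketch from executing arbitrary input typed into a query box.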

What I’d like to see is this notion taken even further. Of course, entering complicated math problems textually, for instance, can get very painful. One alternative is to support handwritten input of equations as I’m showing here.

I’ve posted a simple demo app, written with Silverlight and a web service, that can take handwritten math equations, recognize them, format the problems sufficiently for Google, and then pass them off to Google to be solved.

Here’s a video of the app in action:

And a screenshot for reference:

MathRecoAndSilverlight.png

Of course, you might ask why doesn’t the recognizer solve the problem too? Why involve Google or some other “answer engine”? Well, it could. But the idea here is to emphasize in the demo that I don’t think it’s just any old webservice that is needed–it’s one of authority. One that you go to whenever you’re looking for an answer–particularly answers that you are occasionally looking for.

I’d like to see the search engines become answer engines for a whole host of problems. Math being one problem type.

Anyway, you can try out the app at http://www.TabletPCPost.com/math.

A few things to note about this demo app. The equation recognizer is based on one that was written by someone else. I don’t know the person’s name, but I think the original code was called MathInk or something like that. I believe it dates back to the early days of the Tablet PC. (Update: It was called MathsInk.)

The recognizer does have some quirks and can get rather slow with large equations. It’s also fairly limited in its grammar. It does support the square root symbol, sin, cos, tan, and log. You can also use a vertically oriented numerator and denominator.

A couple tips:

* If you’re having trouble getting “sin” to be recognized, try not dotting the letter “i”.
* Don’t use parentheses around the values that immediately follow sin, cos, tan, and log. The recognizer will work with parentheses, but it often misrecognizes them. So write “log10” rather than “log(10)”.
* The multiplication symbol is an “x” rather than an “*”
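The tips above hint at the kind of cleanup needed between the recognizer and the search engine. Here is a rough sketch of such a formatting step. It is not the demo’s actual code: the function name and the two rules are assumptions based only on the tips, and the exact syntax a given engine accepts is not something this sketch verifies.

```python
import re


def to_query(recognized):
    """Rewrite recognizer output into a more conventional expression."""
    # Wrap the number that directly follows sin/cos/tan/log in parentheses,
    # so "log10" becomes "log(10)".
    text = re.sub(r"\b(sin|cos|tan|log)\s*(\d+(?:\.\d+)?)", r"\1(\2)", recognized)
    # Treat an "x" written between two operands as multiplication.
    text = re.sub(r"(?<=[\d)])\s*x\s*(?=[\d(a-z])", "*", text)
    return text


print(to_query("log10 x 2"))   # log(10)*2
print(to_query("3 x sin45"))   # 3*sin(45)
```

Doing this on the server side lets the person with the pen keep writing math the way they would on paper.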

I’ve also noticed a couple performance bugs, but since this is one of those “I threw this together in under a day demos,” I’m afraid I haven’t had a chance to clean them up. I hope you don’t mind and that it doesn’t cloud the idea I’m trying to get across too much.

Should search engines give all the answers?

Friday, September 7th, 2007

Ever since Google first put in the ability to get simple math answers (type in 4+2 and Google will return 6) to their search engine, I’ve been in on-again-off-again debates with friends if this is a good idea. I think so. Strongly. Many others think returning calculated answers like this is off the mark. That’s not what search engines should do. I disagree. I think that search engines can be quite useful not simply as search engines per se, but rather as answer engines. Searching is great. Getting answers is terrific too. Both are needed.

In fact, I’d like to see Google take the idea further. I’d like to be able to enter even more complicated math expressions and have Google (or substitute your favorite search engine) return the answer or maybe solution or maybe a pretty formatted version of it that you can paste into a Word document and so on.

Of course, entering a complicated math problem in a search engine’s edit field isn’t that easy. That’s where inkable queries come in.

For those that are new to my obsession with ink, here are the basics: Using Silverlight or some comparable browser-compatible technology the goal is to enable people to handwrite queries that can be recognized either locally or remotely and then sent off to an answer engine. Handwritten queries can be much more expressive in

Here’s a mock up of a simple math problem written into an inkable panel:

GoogleMathMockup500.png

You can play around with a live version of this SearchTIP over at TabletPCPost.com/search. In this case, no complicated math recognition is supported, but you can handwrite simple problems, such as 5*3.541, and get an answer from Google.

What an answer engine should return for a math problem can be tricky. Not everyone wants the same thing. Maybe someone wants to just simplify an expression. Or pretty print it. Or create a textual representation. Or plot it. Or see how the problem is solved. Lots of possibilities for extensibility here.

All of this could be integrated into Google search proper or probably better yet initially hosted somewhere like math.google.com.

The idea doesn’t have to begin and end with math problems, however.

What about physics.google.com or chemistry.google.com? Imagine sketching out a physics problem in Google and asking it what various forces are or the volume of something or ask it to animate the diagram and create a YouTube video of the animation or what about having Google/Live Search/etc save all your handwritten problems. You get the idea. Yes, there are desktop apps that do much of this–and even some online ones now. But why not channel the traffic through Google or whomever in one place? Have a question? Go to XYZ to get the answer.

Anyway, that’s my take on it. What’s yours? You can leave a comment over on my other blog here.

Scoble shakes up the search conversation

Tuesday, August 28th, 2007

The other day Robert Scoble posted a couple videos on his thoughts about the future of search. Namely he argues that search has become so polluted with poor quality links that users are ready to look elsewhere. He asserts that the winners will be services that leverage social graphs to improve the search quality. His list includes TechMeme, Facebook, and Mahalo as likely winners.

I’m not so ready to place my bet on these companies–outside of TechMeme, which has no other search service that’s even close to providing a “searchable” view into the blogosphere–but I do agree with Robert’s suggestion that social graphs, or authority, can help improve search.

Yesterday he posted links to a variety of responses–most strongly attack his assertion that search is in for a change.

I agree with Robert in that search needs improving. Searching blogs is one such example. The results are often poor. Blogs can contain a lot of worthwhile information, about personal experiences at work, play, in terms of health care, or eyewitness accounts of breaking news. I’ve pretty much settled on using blogsearch.google.com for most of my blog searching, but it is riddled with garbage–much of it spam blogs. I don’t quite get it why it’s so difficult to clean this up. TechMeme gives me a glance at what a pool of bloggers is talking about, but if you want to check on things outside of their conversational pool you have to go elsewhere. Currently there aren’t many good “elsewhere” choices.

One point that Robert hints at is whether search would be much better if, at its most fundamental level, it was built with a social graph in mind. I don’t mean strapped on. I mean at the DNA level of the search engine. I’d argue search would be much different. But this wouldn’t mean it couldn’t be gamed. I bet it would be. I’m also not sure if it would give me always what I’m looking for.

To me, the better place to create a search engine with a different DNA is to focus on what’s new or what has changed. When I’m looking for authoritative archived responses that’s one thing. However, much of the rest of the time I’m looking for what’s breaking right now or what’s been learned right now. It’s a bit of a dichotomy, I know. Two extremes. That’s why search can get so muddled I think when it’s put together.

Robert is seeking more authority. I’m seeking better access to the latest information, organized by time–either most recent back or from original posting on a topic to most recent, or yes, maybe even filtered by authority.
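As a toy sketch of the orderings described above, here is a function that sorts posts by time in either direction, with an optional authority cutoff. The sample posts, the field names, and the idea of a single numeric authority score are all assumptions for illustration.

```python
from datetime import datetime

# Hypothetical posts; titles, times, and scores are invented.
POSTS = [
    {"title": "Original post", "posted": datetime(2007, 8, 26, 9, 0), "authority": 0.9},
    {"title": "Spammy echo", "posted": datetime(2007, 8, 27, 14, 30), "authority": 0.1},
    {"title": "Thoughtful reply", "posted": datetime(2007, 8, 28, 8, 15), "authority": 0.7},
]


def timeline(posts, newest_first=True, min_authority=0.0):
    """Order posts by time, optionally dropping low-authority sources."""
    kept = [p for p in posts if p["authority"] >= min_authority]
    return sorted(kept, key=lambda p: p["posted"], reverse=newest_first)


# Most recent back:
print([p["title"] for p in timeline(POSTS)])
# From the original posting forward, filtered by authority:
print([p["title"] for p in timeline(POSTS, newest_first=False, min_authority=0.5)])
```

Time is the primary key here and authority is only a filter, which is the reverse of how a relevance-ranked engine usually treats the two.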

Either way, I’d argue that at the core–at the DNA level–a successful search engine competitor should try something different. Robert puts forth leveraging social graphs. I’d build an infrastructure around deltas. Somewhere, sometime, I’m guessing someone is going to figure out a twist on search and when they do, we’ll start changing, slowly, our focus on search per se, to something else. With all the gaming of search engines today, I’m ready. Over time I think Robert’s correct, others will too.

That being said Google could lead the way in this transition. There might be a trend in the direction of creating a cell phone optimized search that’s even better at getting to what you want–in large part because the screen real estate and bandwidth are more limited than on a desktop. Live Search has a mobile version that’s thinking this way too. Who knows, maybe down the road we’ll grab for our cell phones to do searches rather than a notebook or desktop because the large screen versions have too much extra “stuff” and the cell phones get me what I want, faster.

Reporting blog spam

Tuesday, August 7th, 2007

I often search through blogsearch.google.com for the latest blog posts on a topic. It can be frustrating at times though. Sometimes the amount of blog spam is overwhelming. It can be so out of control that I think Google needs some help. A button that says “This link is spam” would be helpful. I’ll put my name and online reputation behind my spam votes too.

From what I see it doesn’t appear that it would be too difficult to come up with an algorithm to prune out these spam posts. Some would get through no doubt. And the spammers would probably come up with a workaround. But up to this point it looks like Google isn’t winning, which is frustrating.

Google please clean up BlogSearch

Tuesday, July 24th, 2007

Google’s BlogSearch is one of my top ways of searching the blogosphere. The service does have its problems though. It lists lots of junk blogs scattered throughout its results.

Try searching for something “spam friendly” yet useful, like “cancer.” Yes, there are links to news items. Yes, there are some blogs. However, there are quite a few spam sites. I’d rather not link to them to show you what I mean, but I’ll post here some of the URLs (portions of them) and you’ll get the idea:

http://www.interestrate.getinfohub.com/interest-let-mortgage-only/….
http://www.edu.mississippidebtconsolidation.net/
http://delmararguello.myweblog.com/2007/07/25/viagra….

Sometimes the spam noise isn’t that bad. It depends on the search. However, it seems like Google could be doing a much better job here.

I don’t mind if Google has a “View Spam” option, but by default I’d like to turn it off–even if it slightly hides too many sites.

I’m sure this is a cat and mouse game with the spammers, but it’s a battle worth fighting. If a blog consistently links to affiliate sites, it gets tagged. If a blog continually has text excerpts to other articles and product links otherwise, call it spam. Use too many “spam-typical-keywords” in your blog titles, and sorry the blog gets marked as spam. And so on.
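Two of those rules are simple enough to sketch: flag a blog when a large share of its posts link to known affiliate sites, or when a large share of its titles are stuffed with spam-typical keywords. The thresholds, the keyword list, and the affiliate domain below are placeholders, and the excerpt-copying rule is left out because it needs the source articles to compare against.

```python
import re
from urllib.parse import urlparse


def title_keyword_hits(title, spam_keywords):
    """Count how many distinct spam-typical keywords appear in a title."""
    words = set(re.findall(r"[a-z]+", title.lower()))
    return len(words & spam_keywords)


def looks_like_spam(posts, affiliate_domains, spam_keywords,
                    threshold=0.5, max_keywords=2):
    """Flag a blog when too many of its posts trip either rule."""
    if not posts:
        return False
    affiliate_posts = sum(
        1 for post in posts
        if any(urlparse(url).hostname in affiliate_domains for url in post["links"])
    )
    keyword_posts = sum(
        1 for post in posts
        if title_keyword_hits(post["title"], spam_keywords) >= max_keywords
    )
    return max(affiliate_posts, keyword_posts) / len(posts) >= threshold


# Hypothetical inputs for illustration only.
KEYWORDS = {"mortgage", "debt", "consolidation", "refinance", "viagra"}
AFFILIATES = {"shop.example.net"}
blog = [
    {"title": "Cheap mortgage debt consolidation", "links": ["http://shop.example.net/buy?id=1"]},
    {"title": "Low interest mortgage refinance", "links": ["http://shop.example.net/buy?id=2"]},
]
print(looks_like_spam(blog, AFFILIATES, KEYWORDS))
```

Scoring the blog as a whole rather than a single post matches the “consistently” and “continually” wording: one affiliate link shouldn’t condemn an otherwise normal blog.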

From what I’ve seen, this will knock out some small reseller blogs which often copy/paste text and then add links to their products to sell, but so be it. They probably should be in another category anyway.

Will Ask’s recent search changes convince you to switch?

Tuesday, June 5th, 2007

Ask has done a nice job updating their search pages.

Despite the improvements they’ve made, I’m not ready to switch. Why? I’ve become completely hooked on the feature in Google’s BlogSearch and News Search that lets me quickly restrict searches to the last hour, last day, previous week, or previous month. I use it all the time. It’s one of my favorite ways to watch what’s going on right now and how big a particular topic is at a given time.

So far Google combines the most features I find useful when searching. Do I want more? Absolutely. Search is not done. But, for how I search, Google is still in the lead.

Search engine economics encourage “spam” advertisers

Monday, May 28th, 2007

Dare Obasanjo talks about how search engines are in some ways encouraging companies to squat on domain names.

In the short term, it appears that Google (and the others that provide web advertising channels) have as much incentive as the website publishers themselves to encourage more ads–no matter where they appear. Of course, if the ads don’t perform over time, the advertisers will pull back their advertising in the long run. The ad platform providers don’t necessarily want this.

I wonder if there is research, though, on whether the ads on “incorrect” websites produce better search results than those where the user was directed to the correct site. (If so, the advertisers won’t necessarily mind where the ads were actually hosted.) It seems like for a certain class of search queries and ads this might actually be the case. For instance, if a person is searching for information on product X, but mistypes a URL or query and winds up on a spam site with an ad for product X, that user may well click on the product X ad on the page. Why not? The ad is right there. It’s the quickest way to get to what the user wants.

If enough users select ads this way, then it’s not only the “incorrect” website publishers and Google that are benefiting, but also the advertiser and at least some significant portion of users. Where would the incentive to change this come from, except maybe from the “correct” publishers that are losing traffic? The challenge here is that the publishers don’t provide the money; the advertisers do. So I’m not sure the publishers have enough sway to address the problem.

There is another area where the spam problem is getting out of control: the search engines focused on blogs. Google’s BlogSearch, for instance, has quite a few “spam” content providers that place simple links to other blog content, or copy other blog content in whole or in part, and fill the pages with ads. It doesn’t seem right. I’m guessing this trend will end once (if ever) the advertisers come to the opinion that these spam blogs are less efficient for them. One could make the case, for instance, that these misdirected searchers are clicking on ad links because the searches themselves have failed. A better search engine would improve the search experience so more users could get to the desired results without clicking on an ad. If so, the advertisers will save money. Of course, this might mean that the ad platform providers will earn less, but I don’t think so in the long run.

Fortunately, at least for blog search, the spam sites degrade the search experience so much that the more spam blogs get through, the more likely it is that users will pick another search engine, which will lead to an obvious drop in revenue for the search engine. That’s incentive enough to address the problem.

Should there be more videos in search results?

Sunday, May 20th, 2007

Robert Scoble: “Microsoft and Yahoo still could beat Google if they got more “live” and more video sources.”