Archive for the ‘Misc’ Category

Should smart photos and smart text editors lead the semantic web?

Sunday, March 30th, 2008

There has been on and off again chatter about the semantic web. I can appreciate the goal of making the information on the Internet more searchable, processable and valuable.

Exactly how we get there is anyone’s guess.

The classic approach is to focus on text content. That makes sense, because that’s where the most value is on the web up to this point. However, with the explosive growth of digital cameras, live video feeds pounding at the door, and ever smarter cameras, as I outlined in an earlier post, this all may be changing.

Here’s the deal: Automatic semantic interpretation of text is a tough problem. And human-based tagging of text is a pain. It’ll only get us so far. What we need are algorithm-friendly tools that will ease the growth of the semantic web.

As I pointed out in the last post, one of the tricks we need to employ is leveraging sensed data. The thing is that for the most part, text is written by a human and only consists of text. Photos and video streams come from devices and as such potentially also have augmenting sensory information. There might be local and global positioning information, there might be depth maps that go beyond the images themselves, and so on. Combining this information with a priori knowledge as I described in the post linked to above, you could make some rather good inferences about what’s in the images or at least what their context might be.

I think that leveraging the collective world of a priori knowledge, plus sensory information that can “index” into it, would give the semantic web the most scalable and powerful results for the near term.

In fact, it could change the whole search game. Assume for instance that you’re searching for information on some new gadget. Text searches work well. But all text being equal it can be a bit tricky to find the best match based on the text. Search engines use authority and other measures to guess at what to return as search results. But assume that a writer of an article took and posted a photo auto-tagged with the product name, taken by him or herself, taken from the conference where the product was announced, and from within a private press area in the conference? Now it might be a bad judgement, but this may be the closest thing to a primary reporting source based on the image in the article, not just the text. As such, it probably ought to rank higher than other articles–no matter how authoritative they might be in other respects.

Now text does give us useful information. Analyzing the words, sentences, quotes (essentially social links), text format (short declarative, essay, Q&A, bulleted, etc), temporal context, and the like can give us clues about the meaning or context. But I also see potential information beyond what can be analytically extracted from the static text itself. For instance, editors could pay more attention to what we’re writing. For instance, when you’re typing all text is equal. But if you’re going back time and again editing a particular paragraph or sentence, that’s pointing out something to the program. It may be useful. It may not. Or what about collaborative edits? From your coworker? From your boss? From an anonymous online editor within a Wiki? Looking at these deltas the editor may be able to infer what’s important. After all, you’re probably putting more time into the key points, than minor ones. This may be a bad guess, but it points out that how we type may contain quite useful information. Think about it: A movie about the US’s Declaration of Independence doesn’t focus primarily on the words of the document itself, but rather the struggles over key words and phrases in the document as it was written. The edits.
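A minimal sketch of the "watch the edits" idea, assuming the editor keeps snapshots of the document as lists of paragraphs and that paragraphs keep their positions between snapshots. The function names and the snapshot format are hypothetical; a real editor would need to track paragraphs that move, split, or merge.

```python
from collections import Counter


def count_paragraph_edits(revisions):
    """Count how often each paragraph position changed between snapshots.

    `revisions` is a list of snapshots; each snapshot is a list of
    paragraph strings (assumed to stay in the same order).
    """
    counts = Counter()
    for before, after in zip(revisions, revisions[1:]):
        for index, (old, new) in enumerate(zip(before, after)):
            if old != new:
                counts[index] += 1
    return counts


def most_reworked(revisions):
    """Return paragraph positions, most frequently edited first."""
    return [index for index, _ in count_paragraph_edits(revisions).most_common()]


revisions = [
    ["Intro.", "Key claim v1.", "Aside."],
    ["Intro.", "Key claim v2.", "Aside."],
    ["Intro.", "Key claim v3.", "Aside, tweaked."],
]
print(most_reworked(revisions))
```

Edit counts are only a hint. As noted above, they may be a bad guess about importance, so they would be one signal among several.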

There’s one other area where semantic processing may be relatively easy and that’s with processing computer generated content–for the most part. (Think databases at this point.) Column names and table names in databases often mean something. An app searching the web, scanning computer generated data ought to be able to leverage these and the databases themselves. With developers coalescing around a common language, or subsets of languages, it’ll ease interpretation of the results later. In some cases, the database-hosted information will be most important to interpretation. In some cases, the human-focused web pages will. In some cases, it’ll be the intersection, union, or non-overlapping nature of the information.

No matter what techniques actually make up the semantic web, my guess is that they will be incremental and will probably gain popularity and value because of some additional changes in how we do things. Might this be with smarter, sensory-based cameras? Dunno, but that’s where my guess is now.

Image recognition problem solved? The solution is easy to see.

Sunday, March 30th, 2008

There’s another round of bloggers talking this morning about image recognition–this time because tagging startup tagcow.com has entered the mix. Tagcow wants to help you tag images using some as of yet undisclosed processes. However it is done, photographer Thomas Hawk is impressed with the service. Michael Arrington suspects that humans are behind the magical process. Could be. Image recognition is tough–no matter how much startup passion you apply to it.

My stomach churns every time I hear about another image reco startup. Why? Because I think they’re essentially starting at the wrong end of the problem. For most image recognition, you don’t want to start with the image, you want to start before you’ve taken the image. Using whatever hardware or software combination you can, you want to be able to sense directly or infer directly at the time that the image is captured and tag the photo based on this data. If you’re taking a photo of people, let the camera tag the general area where the people are in the image. The camera at least has the potential of detecting the people (via motion or IR sensing). This is actually quite doable. Not perfect, but doable for many standup shots.

How might this work? The cameras need more sensing built in and open access to this information.

Yes, cameras already include fairly sophisticated sensing. They can adjust image capture based on distance measurements or light measurements or guesses about horizons and objects moving in the image and so on. This is a good start. But it puts the pressure on the camera companies to do all the work. As people want to do more and more electronically with their images, however, as you can see with tagging, the camera companies can’t keep up. One result is that people start dreaming up businesses to try to address the problems that the cameras aren’t solving. Unfortunately they are trying to solve a problem late in the pipeline, which only makes their work quite challenging, and quite often is a waste of money.

The better solution? Build cameras that are open platforms–both in terms of software and hardware. You need to be able to add sensors focused on your tasks at hand. You need to be able to tweak the camera’s software not only to improve the photo quality, but to target the tagging you need for the way you take photos. Many of the best techniques–whatever they are–eventually will make their way into the cameras themselves–but for the early adopters and trend setters, there’s not usually going to be enough there.

So what kind of hardware and software am I suggesting? I’d like to see hardware and software solutions that directly sense or infer the tags about the photos I’m taking at the time–or at least based on the sensed information of the image at the time.

If there is any image processing to do, processing image sequences yields better data than that which you can get from analyzing a single frame. You can see motion. You can average out noise. You can build confidence measures over time. You can try to build context from frame to frame. Working with one frame is tough–at times even for a human.
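To make the sequence argument concrete, here is a small sketch, assuming grayscale frames stored as plain lists of rows. The function names and the threshold value are illustrative only. Averaging several frames of a static scene pulls noisy pixel values toward their true level, and differencing two frames gives a crude motion mask.

```python
def average_frames(frames):
    """Average equally sized grayscale frames pixel by pixel."""
    if not frames:
        raise ValueError("need at least one frame")
    height, width = len(frames[0]), len(frames[0][0])
    count = len(frames)
    return [
        [sum(frame[y][x] for frame in frames) / count for x in range(width)]
        for y in range(height)
    ]


def motion_mask(previous, current, threshold=20):
    """Mark pixels whose brightness changed by more than `threshold`."""
    return [
        [abs(cur - prev) > threshold for prev, cur in zip(prev_row, cur_row)]
        for prev_row, cur_row in zip(previous, current)
    ]


# Three tiny 1x2 frames: the left pixel is noisy, the right one jumps.
frames = [[[10, 100]], [[14, 100]], [[12, 160]]]
print(average_frames(frames))
print(motion_mask(frames[1], frames[2]))
```

Neither step is possible with a single frame, which is the point of the paragraph above.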

OK, so you’re shaking your head insisting that there’s no way all this hardware and software can be supported in a camera. Even if it were available today, you’d weigh down the cameras or eat up all the power. Quite possibly. But there’s nothing forcing everything to be within the camera itself. The key is to build cameras with open communication and enable a market of companion devices and services.

What kind of communication am I suggesting? You want all the data being collected by the camera sent to the companion device–in real time. You want access to all the controls within the camera from the companion unit. In essence, you want to be able to process the images using whatever it takes and then turn around and tell the camera to adjust the image this way or that way before and after the image is taken and then tag this or that part of the image based on sensed information. So at its most basic level you want a real-time video stream out from the camera and a control path back (possibly including processed image(s) and possibly additional sensed EXIF data). Alternatively, you want to have open extensibility within the cameras themselves. If you want to add a gyro sensor, you should be able to do so.

So what kind of sensors and data am I envisioning that cameras collect? Some simple ones: GPS (for global positioning); some new ones: camera orientation for location orientation (including inclination, compass heading, elevation, etc), light conditions, distance measurements using time of flight or whatever technique, etc. The trick here is that if there’s anything you want to know about the image, try to sense it directly rather than try to guess about it later in software. Likewise, whatever you can sense directly, try to build up processes that leverage this information the most, because it’s probably the most reliable and consistent.

But sensors will get you only so far. And here comes the next big step. Cameras (or the processing of images) need to leverage as much a priori knowledge about their surroundings as possible. If you’re taking a picture that intersects the GPS point latitude 37.74611 and longitude -119.53194, then you’re probably taking a picture of Yosemite’s Half Dome. If you’re at this location and your elevation is 8,836 feet then you’re at the top of Half Dome. Now let’s place the elevation at 30,000 feet. Now you might assume the Half Dome photo is taken from a plane. Three different interpretive tags. All useful. Essentially you’re leveraging “pre-tagging” or “a priori tags” of information.
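The Half Dome example can be sketched in a few lines. This is only an illustration, assuming a single hard-coded landmark record built from the coordinates and elevation quoted above. The radius, the tolerances, and the tag wording are made-up parameters, and the sketch uses the camera's own position rather than the direction it is pointing.

```python
import math

EARTH_RADIUS_M = 6_371_000

# Landmark record built from the figures in the post; a real service
# would hold many of these.
HALF_DOME = {"name": "Half Dome", "lat": 37.74611, "lon": -119.53194,
             "summit_ft": 8836}


def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def a_priori_tags(lat, lon, elevation_ft, landmark=HALF_DOME,
                  radius_m=200, summit_tolerance_ft=50,
                  aerial_margin_ft=1000):
    """Return tags implied by position and elevation alone."""
    if distance_m(lat, lon, landmark["lat"], landmark["lon"]) > radius_m:
        return []
    tags = [landmark["name"]]
    if abs(elevation_ft - landmark["summit_ft"]) <= summit_tolerance_ft:
        tags.append("at the summit")
    elif elevation_ft > landmark["summit_ft"] + aerial_margin_ft:
        tags.append("taken from the air")
    return tags


print(a_priori_tags(37.74611, -119.53194, 8836))
print(a_priori_tags(37.74611, -119.53194, 30000))
```

No image analysis is involved at all. The tags come entirely from sensed data plus a lookup table.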

This pre-tagging notion can go even further. Think about it. There can be a priori-tag services for sporting events, for graduations, for conferences and even showroom floors, for the national parks, and on and on. Imagine a service that the camera or post-processing of the camera location/orientation data can leverage to automatically tag the photo. Some of the tags could be entered by a Mahalo-like service, some by community efforts, some by the organizers of events. The point is: Why are all 10,000 people attending a basketball game expected to tag their own photos of the game, when we all already know they were there and the main context of the location?

Why are we not leveraging a priori knowledge that such and such location is of Robert Scoble’s house (notice the implication of time)? Or the beach? Or going further–my kitchen–or my backyard–or a booth at a conference–or a particular display area of a booth at a conference–or with the right local positioning information a particular gadget within the display area of a booth at a conference? It all depends on the collected sensed data from the camera. Some of these tags are easier to come by than others, but there’s lots of low-hanging a priori fruit.

Maybe such a service is provided by flickr, maybe by Live Search, maybe by the camera companies themselves, maybe by a Photoshop plugin, maybe all of the above. No doubt this would be a massive service on par with Google Earth or Virtual Earth, but can you imagine??? Now this is where the VCs should be putting their tagging money.

Can a tagging service help me find all pictures of my dog? Probably not. It may not even be able to tell a dog from a cat or a person (although maybe someone will figure that out too), but with the right information you may be able to leverage a priori tags to help in the search. You might have to think differently about searches–kind of like how we all have adjusted to searching the “Google” way, if you will. For instance, to find all pictures of my dog I might think in terms of where he was and when. Was he in the backyard when I took a picture of him? Was he inside my house? This would yield a much smaller set of images that someone could quickly scan through.

This doesn’t help with tagging the names of people in the photos either. True. Maybe the human is best for this. But there are some possibilities. Maybe tags could be shared and cross-referenced so if two people took the same photo with intersecting rays at nearly the same time and both include people and one is tagged, then maybe the photo from the other person could be auto-tagged–maybe not at the level of faces, but of the image itself. Again, this would depend on additional sensory information collected at the time a photo is taken.
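Here is a rough sketch of the intersecting-rays test, assuming each photo records a position in local meters (x east, y north), a compass heading in degrees, and a timestamp in seconds. The record layout, the function names, and the 10-second window are assumptions for illustration, and the sketch ignores field of view and distance to the subject.

```python
import math


def ray_intersection(origin_a, heading_a_deg, origin_b, heading_b_deg):
    """Return the point where two view rays cross, or None.

    Headings are compass degrees (0 = north, 90 = east).
    """
    ax, ay = origin_a
    bx, by = origin_b
    dax, day = (math.sin(math.radians(heading_a_deg)),
                math.cos(math.radians(heading_a_deg)))
    dbx, dby = (math.sin(math.radians(heading_b_deg)),
                math.cos(math.radians(heading_b_deg)))
    denom = dax * dby - day * dbx
    if abs(denom) < 1e-9:  # parallel views never cross
        return None
    t = ((bx - ax) * dby - (by - ay) * dbx) / denom
    s = ((bx - ax) * day - (by - ay) * dax) / denom
    if t < 0 or s < 0:  # crossing point is behind one of the cameras
        return None
    return (ax + t * dax, ay + t * day)


def shareable_tags(tagged, untagged, max_seconds=10):
    """Offer the tagged photo's tags if both photos likely show the same scene."""
    if abs(tagged["time"] - untagged["time"]) > max_seconds:
        return []
    point = ray_intersection(tagged["position"], tagged["heading"],
                             untagged["position"], untagged["heading"])
    return list(tagged["tags"]) if point is not None else []


mine = {"position": (0, 0), "heading": 45, "time": 100, "tags": ["team photo"]}
yours = {"position": (10, 0), "heading": 315, "time": 103, "tags": []}
print(shareable_tags(mine, yours))
```

Two cameras ten meters apart, both angled inward, cross about five meters ahead of each other, so the tag is offered. Cameras pointing away from each other get nothing.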

Anyway, lots of possibilities here. Lots of market potential. My guess is that Google has the right mindset to do it, but I wouldn’t count out Microsoft or Yahoo. Who knows.

At the Scott Guthrie Show

Tuesday, March 11th, 2008

I’m at a local developer event today, which is showcasing Silverlight 2. The event is being put on by Microsoft and a local developer group. Most of the people here appear to be .Net developers in one way or another. My guesstimate is that there are 400 people here, plus or minus.

Will it be interesting enough to Twitter? Hmm. It depends on whether Scott Guthrie, who will be giving the “keynote,” will audition with his latest Vegas act. :-)

Scott Guthrie talk on March 11 in Scottsdale

Sunday, March 9th, 2008

If you’re a developer that lives in the Phoenix area, you ought to check out Scott Guthrie’s talk this Tuesday, March 11. He will be speaking at the Scottsdale Center for the Arts. You can register here for free.

From the registration site, the schedule is:

7:00 - Doors open for sponsors
8:00 - Doors open to Welcome / Please be seated
Scott Guthrie - Microsoft Silverlight 2.0 Intro (New from 1.0 and 1.1)
Eugene Osovet - Consuming Web Services with Microsoft Silverlight
Ben Waggoner - Encoding VIDEO for Microsoft Silverlight
Scott Guthrie - ASP.NET Model View Controller (MVC)
Remy Pairault - Serving Applications with Microsoft Silverlight Streaming
Give-A-Ways, Give-A-Ways, Give-A-Ways
And boy do we have Give-A-Ways.

I’ll be there with my Tablet PC and maybe UMPC. I hope to be broadcasting it live on UStream.tv if all goes well.

Google bar crashes in IE8 beta and other beta thoughts

Wednesday, March 5th, 2008

Right after the Mix08 keynote this morning I downloaded the IE8 beta. I’m typing this blog post in the beta browser right now. My thoughts? Don’t download the browser unless you’re really into hard-hat development. There are some new features (activities and slices) that I wanted to try out, so I decided to take the plunge. For me it made sense. But there are plenty of rough edges.

I won’t go into the issues here, because this is a beta build after all. However, I will say that I had to turn off/hide the Google toolbar because when I visited some pages (such as this blog’s WordPress editing tool page) Google would consistently crash IE–or at least that’s what the error dialog said. Things work OK without the Google bar.

Again, I wouldn’t recommend the IE8 beta except if you have a need for it (developer/designer), but I’m glad I took the plunge. I can see already several improvements in the layout engine.

UMPC and Tablet PC make it into Mix08 keynote

Wednesday, March 5th, 2008

I wondered if Microsoft would forget about the UMPC and Tablet–it didn’t. In the Mix08 keynote today, during an Aston-Martin demo, they used a Samsung UMPC running a demo app written in WPF. And in the demo after that they had a casting-director demo an app running on a Tablet PC. And best yet? The demoer walked around with the Tablet! Finally. Finally. Finally. I tried to take a screenshot, but the video stream failed right at that point. (Flash or Silverlight or Air is leaking memory like mad. My system is almost dead. Copy/Paste is working intermittently. Closing apps to try to make it all the way through the keynote.)

It’s good to see that the UMPC and Tablet PC didn’t get overlooked. Maybe next year we’ll see multi-touch and Silverlight or maybe even one of the key speakers using a Tablet/UMPC during their talk. :-)

astonumpc.jpg

Online/Offline–give me better browser experiences

Sunday, March 2nd, 2008

Robert Scoble contemplates the value of offline technologies–in particular, how they might fit in with the technology battles between Google, Adobe, and Microsoft.

I think Robert has it basically right: for those of us who switch computers a lot (2 or more times a year) and own more than one device (2, 3 or more), maintaining a “desktop” metaphor on them is simply too overwhelming.

It’s too much work keeping the data in sync. It’s too much work (and too expensive) to keep the licensing straight–especially when you consider that you’re often paying for multiple installs when you’re just one person using the app on one or possibly two devices at a time. And who wants to keep installing stuff over and over again? I can’t believe all the time spent when I hear of people re-installing the OS or this or that application. What a waste. More and more people are going to be running into this situation too as the number of Internet-connected devices grows.

So Robert’s correct. A good product/service today is going to be one that can reach out to the top tier devices and run well on them. While writing thredr, for instance, I intentionally biased the design towards smaller displays–because that’s where more and more of my online time is going. I think this is a trend that more and more people are going to be following too–especially as the browsers in these devices improve.

To me, actually, it’s not just about online/offline. A good technology is going to 1) work well across a range of devices, 2) be easier and less expensive to operate than other solutions by at least a significant factor, and 3) operate inside the browser when it makes sense and outside of the browser when it makes sense.

I also think that browsers have become so important in how people use their devices that they need to be kicked up a notch in terms of the capabilities they provide. There should be better editing and spell checking built in. There should be better graphics support (where “ink” can be rendered everywhere as vectors, for instance). And there should be a storage metaphor that works when I’m online or offline. A cross-platform, Air-like technology is OK, but the first, bigger challenge is to improve the browser technologies–across the board. We’re not even close to the way they should be.

I want a Tesla

Tuesday, February 19th, 2008

Robert Scoble just had a ride in the first production Tesla–a high performance electric car. You can watch it here on a recording he made with his Nokia N95 cell phone (http://qik.com/video/22264). Robert’s reaction as the car starts off in the parking lot says it all.

You can also watch Jason Calacanis’ video recording as he follows–or tries to follow–the Tesla (http://qik.com/video/22262). At about the five and eight minute marks you can really see the Tesla’s acceleration as it pulls away from the Corvette Jason was driving.

Very cool.

If you only have 10 minutes, watch Robert’s video. He rides along with Elon Musk, the chairman and main investor of Tesla, and asks some great questions.

Robert blogs about his experience of riding in the Tesla here.

Why video should be a native data type

Monday, February 18th, 2008

I want to revisit an idea I blogged about awhile back. I’d like to see video sharing/broadcasting/recording become an integral part of the OS experience–not just for computers, but for cameras, cell phones, and other digital devices.

photoofdisplay.png

A bit of background first. I was at a conference awhile back when I decided to take a picture of a session listing that was being displayed on some monitors. Simple enough right? Well sometimes the little things spawn ideas–and this one did.

As I was adjusting my position left and right to get the display in the field of view of my digital camera as well as to minimize glare, I realized I was going the long way around to capture something that was already digital. What was I doing? And it wasn’t just me. There were others standing next to me doing much the same. Silly, I realized.

I thought: Why can’t I receive a live, digital broadcast of what’s on the display, right within my WiFi-based camera? Why am I capturing “over the air,” if you will, rather than going direct?

The more I thought about this, the more I began to see that our graphics chips are throwing away a lot of opportunities for digital sharing of their content.

broadcastvideo.png

Now it’s true that there are apps, such as SharedView, VNC, and the like which are designed to share the desktop, but what if the broadcasting experience was provided as a standard in the OS? Camtasia, WebcamMax/Superwebcam, and on and on would essentially be built in with a complementary broadcasting and recording feature built into computers, MIDs, digital cameras, cell phones, and the like.

First, back to the camera capture issue. Record directly is my mantra. In this digital world, there’s no reason to go over the visual spectrum.

So let’s say I want to capture what’s on my friend’s computer. Right now you have to adjust all over the place to get the lighting just right–all along trying to avoid seeing yourself in the reflection. When you’re recording on the go, this is silly.

What if instead, the person could (for instance), right click on their desktop and select from the context menu “Share Desktop” (or window or region or whatever). With this single click the OS would then appear as a thumbnail overlay on my camera (computer or whatever) which I could then select and record. I could record picture in picture or record to the whole frame, capture a single frame, capture a sequence of frames, or….or….or. Lots of possibilities here.

As I mentioned earlier, there are desktop sharing apps today, but what I’m advocating is that they become more “video” like with embedded content/command-and-control signals–with a two way option. With a common standard–not just a desktop standard–all manner of devices and apps could record the content–directly.

And once the content can be broadcast digitally, there’s the whole world that you can broadcast to. Imagine.

So what I’d like to see is an open sharing and broadcasting standard that makes its way into connected devices.

Flash is already starting to show the value of a “video” standard on the Internet. Now we just need to have the recording part opened up. The reasons for locking down the content are holding back a natural evolution of devices that can share, broadcast and record live, interactive streams. It’s not just the major studios that want to “broadcast.” In fact, I’d argue that they are a small subset of all broadcasting that would take place.

So imagine you’re at your next conference and someone is projecting a demo playing on their desktop up on one of several large screens. No more do you have to get just the right seat to get just the right shot of what the person is showing. Instead, they just have to share their desktop and broadcast its contents live directly to your camera or OneNote or whatever. Same goes for the doctor showing you your ultrasound of your unborn child or your MRI. Yes, you can share the files, but you can also share the playback experience with all the interaction…the sound….the movement of the pointer…and the content itself–all to your cell phone, MID, or laptop.

Conference pondering

Friday, February 15th, 2008

It’s time to think about which conferences to attend, if any. TechEd? Mix? SXSW? BIL? PDC? And I’m missing a half-dozen others I’d really like to attend.

I’m going to have to start thinking a bit creatively here, however. These conferences are too expensive for my wallet this go round.

Mix is $1295. TechEd is a whopping $1,995. SXSW isn’t too bad at under $1000 depending on what you’re interested in. But it doesn’t take much to realize that this all adds up.

Top of my list is Mix and PDC, though I’ll probably forgo Mix. Why? Even though I’ve been doing Silverlight development and I’m a great proponent of the approach and Mix is the conference to attend in this regard–the reality is I’m an outsider. Check out http://www.TabletPCPost.com/search and you’ll see what I mean. This isn’t what Mix is about. So whereas I’d benefit from a couple sessions, I can’t justify the expense. Maybe sitting in the hallways is once again where I belong.

In terms of SXSW, this probably would be a better bet–especially in terms of meeting new people. How can it be that I’m a fish out of water for Mix, but not SXSW? Well, sometimes this fish wants to swim upstream and if I’m going to push myself, I’d rather go big. Again, though, maybe I belong in the hallway rather than as an attendee. (All this makes me wonder also if Mix really shouldn’t be at SXSW???)

TechEd also has me thinking. Here’s the thing. In the past I spent most of my time at TechEd talking with Microsoft employees about various topics I was keenly interested in or stuck on. Further, I spent quite a bit of time hanging around the Tablet/UMPC area in the Microsoft booth. (No idea if there will be something similar this year.) Truth is I rarely went to any of the sessions. So $2000 sounds like too much. Unfortunately the way TechEd is set up, there really aren’t any hallways to hang around in, so that’s probably a no go. And further, are there going to be sessions or people discussing topics I’m really interested in? It looks unlikely. So should I scratch TechEd off my list? I’m leaning that way.

PDC is another matter. If I had to pick one conference to attend, this would probably be it. The issue though is that I’m sure I’m going to be missing out on a lot of great new stuff if I wait till PDC (October 2008) and that’s what has me concerned. On the Microsoft tech side of things, Microsoft has locked down so much info in hopes of being more Apple-like that I’m reluctant to plop down several thousand dollars to take the risk that they might actually show something worthwhile. And did I point out how expensive all this Microsoft stuff is getting? If someone has Apple envy, they should be checking out Macworld and how much it costs.

So this brings me to BIL. Now here’s something I can afford :-) and it’s sure to attract some interesting people. Hmmm. Maybe someone will UStream it?

Taking a look at Google Forms and using it on a Tablet PC

Thursday, February 7th, 2008

The recently announced Google Forms has me dreaming about ink integration with the browser again. Forms and ink go together.

My long standing point of view has been that ink should be more tightly integrated into the browser. The browser is the platform for many. It should be designed as such.

One approach would be to support an ink overlay mode. Yes, the TIP can be used, but this is Vista only and cognitively every time the TIP pops up it screams: “This is not your program running. This is another program. You must juggle two programs at once.” 

And besides if the idea is to leverage the TIP and its correction UI, it needs a lot more flexibility and the ability to integrate with the DOM. It needs to become more transparent. Literally and figuratively.

Here’s a simple form I just created on my Tablet–using the TIP 100% for all input, which was easy enough:

formcreated.PNG

This form is TIP friendly in IE. You can select values, enter information into the edit fields and so on. But I want more.

After using the form a bit I came back to some ideas that I think would help a lot. For instance, I’d like to see some metadata for the <input> tags that can guide the recognition. For instance, is the form expecting a number? An email address? Or maybe even a signature? :-) (Read that picture.)
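As a rough sketch of how such metadata could guide recognition, assume the handwriting recognizer returns its guesses best-first and the form field carries a hint such as "number" or "email". The hint names, the patterns, and the function below are hypothetical and are not part of any existing TIP or browser API.

```python
import re

# Hypothetical hint names mapped to simple validity patterns.
HINT_PATTERNS = {
    "number": re.compile(r"^-?\d+(\.\d+)?$"),
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
}


def pick_candidate(candidates, hint=None):
    """Return the best-ranked recognition candidate that fits the hint.

    `candidates` is the recognizer's guess list, best first. With no
    known hint, the top guess is returned unchanged.
    """
    pattern = HINT_PATTERNS.get(hint)
    if pattern is None:
        return candidates[0] if candidates else None
    for candidate in candidates:
        if pattern.match(candidate):
            return candidate
    return None


# Ink for "100" is easily misread as letters; the hint settles it.
print(pick_candidate(["l0O", "100", "1OO"], hint="number"))
```

A field with no usable match returns None, which the form could treat as a cue to ask for the input again.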

One nice, Tablet-friendly twist would be to support edit boxes with larger fonts, restricted input, and an inkable surface that “talks” with the input item. It might look something like a classic ink form field:

formcreated2.PNG

The recognition could be supported within the browser (if on a Windows Vista machine) or on a remote server on non-supported devices. To work well, keyboard input would still need to be supported, but that’s not too big of a deal.

I have an ink layer that runs under Firefox which can do some of this. Maybe I need to think about proxying the Google Forms and building a browser-based form that really supports ink.

CloudBook to get touch

Thursday, February 7th, 2008

LaptopMag has an interview with Paul C. Kim, Director of Marketing at Everex, in which he says that the CloudBook will include touch later this year:

“45 to 60 days after the first version of the Cloudbook hits stores in February, Everex plans on releasing a touch screen version of the CloudBook to developers. They anticipate a consumer version hitting the market in Q3.”

I’m guessing we’re talking about resistive touch. This probably makes the most sense from a cost standpoint. I sure would like to see an integrated capacitive/active digitizer though on one of these small devices. That would be awesome.

Unboxing videos continue to be popular

Thursday, February 7th, 2008

What is it about taking things out of a box that’s so compelling? :-)

Looks like today people can’t get enough unboxing going on. As the Tablet PC/UMPC micro-conversation tracker shows, there’s a popular YouTube video on unboxing a WiBrain UMPC:

tabletpc20080207.PNG

and not to be left out, there’s an unboxing video making its mark among those discussing Apple-related topics:

apple20080207.PNG

(Can you tell I’m having too much fun with the Vista Snipping Tool and iPhoneTester.com?)

Added publication date to each “article”

Wednesday, February 6th, 2008

In order to make it more obvious which articles are new and which have been around for awhile, I changed the micro-conversation tracker so that it prefixes each item with the pubDate in the RSS feed.

Here’s what it looks like for the Microsoft Developers feed:

addedpubdate.png
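For anyone wanting to do something similar, here is a minimal sketch using only Python’s standard library. It is not the tracker’s actual code; the function name, the date format, and the sample feed are made up for illustration.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime


def titles_with_pub_date(rss_xml):
    """Return each RSS item title prefixed with its pubDate (YYYY-MM-DD)."""
    root = ET.fromstring(rss_xml)
    labeled = []
    for item in root.iter("item"):
        title = item.findtext("title", default="(untitled)")
        pub_date = item.findtext("pubDate")
        if pub_date:
            # RSS 2.0 dates use the RFC 822 style, e.g.
            # "Wed, 06 Feb 2008 10:00:00 GMT".
            day = parsedate_to_datetime(pub_date).strftime("%Y-%m-%d")
            labeled.append(f"{day} {title}")
        else:
            labeled.append(title)
    return labeled


SAMPLE_FEED = """<rss version="2.0"><channel>
  <item><title>Silverlight 2 notes</title>
        <pubDate>Wed, 06 Feb 2008 10:00:00 GMT</pubDate></item>
  <item><title>Undated item</title></item>
</channel></rss>"""

for line in titles_with_pub_date(SAMPLE_FEED):
    print(line)
```

Items without a pubDate are passed through unchanged rather than dropped.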

An e-Budget on a Tablet PC

Monday, February 4th, 2008

George Bush presented his administration’s 2009 Budget to Congress on a Dell Latitude Tablet PC today.

You can watch a video of his presentation on Yahoo News, where he holds up the Tablet PC.

budgetonatabletpc.PNG

Click here to watch the video

It begins with Bush saying, “Submitted the budget today to Congress. It’s on a laptop notebook–an e-Budget. Saves paper. Saves trees. Saves money.”

Couldn’t agree more. Tablets for everyone! Or is that “Laptop Notebooks” for everyone! :-)

Gottabemobile has some shots of the presentation too.

Can’t get enough photos of George Bush with a Tablet PC? Yahoo has eight more here. Trade and collect ‘em.