Wednesday, 17 June 2009

Why I think Opera Unite is missing the point

I watched the PR fuss over Opera Unite with some interest but I do think that they have missed the point. This posting by Chris Messina raises some valid points about the use of Opera as a proxy but I think that this covers only some of the weaknesses.

Over the years, I have seen proposals for servers running on individual personal computers and on mobile phones and they consistently look to me like a solution looking for a problem.

Technically, it is easy to create or configure an HTTP server on most devices. On a PC, a server like Apache is not hard (for a techie) to install and configure although I don't see average users trying it. It is not hard to port Apache to smaller platforms but I have also seen (or created) custom HTTP servers developed for more embedded platforms. Creating a basic HTTP server in C/C++ or Python (or other languages) is easy. All of the clever options that 'real' web servers support are more challenging but probably not required for smaller devices.
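
As a rough illustration of how little is needed, here is a minimal sketch using only Python's standard library. The handler name, the message and the port are illustrative choices of mine, and it leaves out everything a 'real' server would add (TLS, virtual hosts, access control and so on).

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Answer every GET with a short plain-text page."""

    def do_GET(self):
        body = b"Hello from a very small HTTP server\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Stay quiet; the default implementation logs each request to stderr.
        pass


def make_server(port=8000):
    """Bind to localhost only; port 0 asks the OS for any free port."""
    return HTTPServer(("127.0.0.1", port), HelloHandler)


if __name__ == "__main__":
    make_server().serve_forever()
```

Run as a script, it serves on http://127.0.0.1:8000/ until interrupted.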

What is clearly more challenging is deployment and configuration for average users, but this ties into the use cases. Given the assumption that it is practical to create an HTTP server on a range of devices, the question arises of what to use it for and how the address is made accessible. I think that both of these are real problems. I think that Unite addresses the deployment and configuration challenges but still does not come up with good use cases.

Opera try to handle publishing addresses by means of their own proxies. The comments in the blog above show some of the problems with this but I don't see a good alternative. Unlike standard web servers that have relatively stable IP addresses and that are hooked into the DNS system to make URLs usable, personal computers and mobile devices normally do not have fixed IP addresses. They commonly use DHCP to get a transient IP address. This means that it is difficult for a remote client to know what address to connect to. Also, PCs and mobile devices are not necessarily on-line the whole time (but more of this below). In practice, a personal or mobile HTTP server would need to get an address and then push its address to some central addressing server so that clients could find the address and tell if the server is online.
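
The central addressing idea can be sketched in a few lines, assuming an in-memory registry and a time-to-live after which a silent device is treated as offline. The class name, method names and TTL value are hypothetical, and the sketch says nothing about how Opera's proxies actually work.

```python
import time


class PresenceRegistry:
    """In-memory stand-in for a central addressing server (hypothetical)."""

    def __init__(self, ttl_seconds=60.0):
        self.ttl_seconds = ttl_seconds
        self._entries = {}  # name -> (address, time of last announcement)

    def announce(self, name, address, now=None):
        """Called by a device each time it obtains or renews an address."""
        self._entries[name] = (address, time.time() if now is None else now)

    def lookup(self, name, now=None):
        """Return the last announced address, or None if the device looks offline."""
        now = time.time() if now is None else now
        entry = self._entries.get(name)
        if entry is None:
            return None
        address, seen = entry
        if now - seen > self.ttl_seconds:
            return None  # no recent announcement, so assume it is offline
        return address
```

A device would call announce() whenever DHCP hands it a new address and at regular intervals after that; a client calls lookup() before trying to connect.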

When we come to consider what should be published, I think that Opera are just wrong in their assumptions. I think that it is daft to try to publish static content such as images or other media files from a PC or mobile device compared to using a hosting service. If a user wants to self-publish then they have to consider issues such as backups and access restrictions. I seriously doubt that most users will want to do this. Why not just push the content to Flickr, Facebook or one of the groups services? Even if you don't fully trust the hosting services' backups they will provide basic access control and the content will be available approximately 24/7. If you want to retain control over the content then set up a cheap hosted web site. If you want to exercise access control then consider email or set up a proper site! If these files are made available from your home PC then you will have to effectively upload them whenever they are accessed - most home accounts are not great at uploading so this looks like a bad idea to me.

If you do want to publish any static content then surely you want the content to be available most of the time. I don't know about you but my laptop and smartphone are not connected 24/7. They are offline or turned off for substantial parts of each day.

The Fridge and Lounge services in Unite look a bit me-too to me. I could knock up such a service on a hosted site very quickly and they would be available when my home PC was turned off. I haven't looked but I would be very surprised if this type of service was not cheaply available with various hosting services. I guess that Unite makes it available for free and with less work than setting up a hosting service - you trade increased convenience for the price of going through Opera's servers. As an alternative, try setting up a closed Yahoo group and you can get many of the benefits.

I do see value in a local HTTP server but for more specialised purposes:

  • If a device (maybe embedded or mobile) wants to make its status or other dynamic information available then HTTP is a good enough protocol to use as it is well understood. I could see the value in a service whereby such devices ran a server and then informed a central service that they were available for information retrieval. This is a pretty closed use case.
  • Specialised HTTP servers such as synchronisation servers are useful but they will tend to run when required and have specialised methods for making contact with a client (you can tell that I used to work on OMA Datasync).

So, although I like the idea of running a local HTTP server on a PC or mobile device (it is quite fun to consider the porting issues), I still think it is a solution looking for a problem. The Opera site mentions future developments and the ability for developers to work with it but, heck, if I want to create web services, I will play with a hosted service. I can use Django (just one of many) to build something and all that Unite gives me is some convenience at the cost of being locked into Unite. I would love to be wrong but I don't think I am.

P.S.
Having mentioned Chris Messina's post above, I subsequently came across a response from Lawrence Eng here.

I should add that I am not criticising Opera for not going open source
and I am not commenting on the technical implementation - I have not
spent the time to look at that so I am prepared to assume that it is
fine. My view is that the whole concept is a solution looking for a
problem. Some of the blog comments make the point that it is easy
for naive users to use and this may turn out to be sufficient.
Alternatively, somebody may come up with the killer use case that I
have overlooked. Until then, I will remain interested but sceptical.

Sunday, 17 May 2009

Further thoughts on implementing non DBMS storage

Since my last post I spent a couple of evenings putting together a pure Python storage class to store arbitrary chunks of data in a way inspired by Haystack. This proved quite easy in Python (unsurprisingly). It would also be quite easy to implement in C/C++ for performance but I am not convinced that is necessary and a pure Python implementation has the virtue of being simpler to deploy.

The next stage is to integrate it into Django so that it can be used as a model field and to extend the admin application and generic views to handle it. This should be almost as easy so I am now reading the nice tutorial on creating new model fields along with the FileField source code.

Interestingly, I came across this story via Reddit about drawbacks of CouchDB.
http://blog.woobling.org/2009/05/why-i-dont-use-couchdb.html
I had already decided that I wasn't keen to play with CouchDB at present because I think some things (such as authentication and real regular data) are more efficiently done in a normal SQL DBMS.

I am now brooding about creating a CMS with a combination of my storage classes and search but I am trying to work out if it is any different to a Wiki with search facilities.

Monday, 11 May 2009

Some thoughts on a text storage system for Django inspired by Facebook's Haystack

I have been looking at Django lately as I wanted a project to play with and I thought that I might extend it in some way. In some ways, Django has been a disappointment as it is quite mature and very powerful so the opportunities for tinkering that I wanted are not really present. This makes it great for real web developers but less great for me personally ;-)

However, one aspect that kept nagging at me was the use of DBMS fields for large quantities of text and for searching (this is not only a feature of Django, of course). When I used real DBMS (back in the nineties) we had to be very careful about optimising column sizes and we would not have dreamt of storing whole blog entries in a database, let alone larger bodies of text. I know that hardware has gotten cheaper and DBMS have improved but this still feels to me like using the wrong tool - as if the designers used a DBMS for everything because it was the tool that they knew best.

I was browsing for search software and found examples such as Xapian and Sphinx. Again, I had used a similar tool in the late nineties when building an in-house knowledge database but the world has moved on significantly since then. I installed Xapian on my Linux laptop and was shocked by how easy and quick it was to feed in text and search on it. I fed in a Robert E Howard novel using the Python binding and didn't notice it go in. A search was also extremely fast. This provoked the thought of extending Django with Models that include searchable fields. It should be possible to simply tag model fields as searchable and have a Django extension automatically index them with Xapian (or another search tool - the choice would be transparent) and then extend the generic views to include searches. This feels like a project that would be technically interesting and that would extend Django in a style that is consistent with Django.
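
To make the 'tag fields as searchable' idea concrete, here is a rough sketch. It deliberately avoids both Django and Xapian: TinyIndex is a toy in-memory inverted index standing in for the real search tool, BlogEntry is a plain class standing in for a model, and the searchable_fields attribute is a hypothetical convention of mine rather than anything Django provides.

```python
import re
from collections import defaultdict


class TinyIndex:
    """Toy in-memory inverted index standing in for Xapian or Sphinx."""

    def __init__(self):
        self._postings = defaultdict(set)  # word -> ids of objects containing it

    def add(self, obj_id, text):
        for word in re.findall(r"\w+", text.lower()):
            self._postings[word].add(obj_id)

    def search(self, query):
        """Return the ids of objects containing every word in the query."""
        words = re.findall(r"\w+", query.lower())
        if not words:
            return set()
        result = set(self._postings.get(words[0], set()))
        for word in words[1:]:
            result &= self._postings.get(word, set())
        return result


class BlogEntry:
    """Plain class standing in for a Django model (hypothetical)."""

    searchable_fields = ("title", "body")  # the 'tag' marking fields to index

    def __init__(self, obj_id, title, body, author):
        self.id = obj_id
        self.title = title
        self.body = body
        self.author = author  # not tagged, so never indexed


def index_object(index, obj):
    """Feed only the fields tagged as searchable into the index."""
    for name in obj.searchable_fields:
        index.add(obj.id, getattr(obj, name))
```

Because the index is only touched through add() and search(), swapping in a real search library should stay invisible to the model code, which is the transparency I am after.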

One issue that occurred to me when thinking about Xapian was the sequencing of index updates. I am wary of indexing as part of the data creation in a web server. If multiple users are creating data simultaneously then Xapian would not like having too many simultaneous updates. As Django is deployed with a range of web servers (or certainly with a range of interfaces to web servers) I would be reluctant to try to implement locking across multiple requests. My current proposed solution is to create an internal message whenever an item is added that needs indexing and then to pick them up and deal with them in a batch from a specialised request. Django provides such a messaging system so it is not necessary to add another table for the purpose and the indexing requests could be kicked off by a cron job or manually by a sysadmin. The use of a single batched indexing system allows more efficient indexing sessions.
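
A minimal sketch of the batching idea, assuming a simple in-process queue in place of Django's messaging and caller-supplied hooks in place of the real storage and search calls. IndexQueue, run_index_batch, load_text and index_text are all hypothetical names.

```python
from collections import deque


class IndexQueue:
    """Pending index requests; a real deployment would persist these."""

    def __init__(self):
        self._pending = deque()

    def request_indexing(self, obj_id):
        """Cheap call made from the web request that created the object."""
        self._pending.append(obj_id)

    def drain(self, limit=100):
        """Remove and return up to `limit` pending ids, oldest first."""
        batch = []
        while self._pending and len(batch) < limit:
            batch.append(self._pending.popleft())
        return batch


def run_index_batch(queue, load_text, index_text, limit=100):
    """Single writer: meant to be run from cron or an admin-only request.

    load_text(obj_id) and index_text(obj_id, text) are caller-supplied
    hooks (hypothetical); the second is where the search tool is called.
    """
    batch = queue.drain(limit)
    for obj_id in batch:
        index_text(obj_id, load_text(obj_id))
    return len(batch)
```

The point of the design is that web requests only ever append to the queue, so the search index has exactly one writer and no locking across requests is needed.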

While considering non relational DBMS storage systems (I browsed through articles on BigTable and CouchDB), I came across another story that I noticed on Reddit about Haystack - the storage system used by Facebook to store photographs. Apparently, Facebook uses MySQL extensively but the sheer volume of photo data made it impractical for photo storage. Haystack is used to store photos in a very efficient manner - the Haystacks contain meta data pointing to the photo data; all data is appended to files so multiple reads can take place while appending takes place; the data is supposed to be structured so that it can be retrieved with minimal disk seeks.

Putting these ideas together, I thought of a text version of Haystack. Each text object can be appended to a filing set and the database can just contain its meta data (file name, offset and size). This makes the relational database smaller and so more efficient. The text can be indexed for searching as it gets saved and it should be possible to extend the generic views to handle this transparently. Other ideas from Haystack and CouchDB that can be applied include versioning and never deleting content. I would probably also have a configurable maximum size for a storage file and have the system just keep extending the system with new storage files - I have a bias against unreasonably large individual files for backup and management reasons.
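
A rough sketch of the storage side of this, assuming UTF-8 text, a single writer and a made-up file naming scheme. TextStore and its method names are hypothetical, and versioning and index integration are left out.

```python
import os


class TextStore:
    """Append-only storage files; callers keep (file name, offset, size)."""

    def __init__(self, directory, max_file_size=64 * 1024 * 1024):
        self.directory = directory
        self.max_file_size = max_file_size
        self._file_number = 0
        os.makedirs(directory, exist_ok=True)

    def _current_path(self):
        return os.path.join(self.directory, "store-%06d.dat" % self._file_number)

    def append(self, text):
        """Append text and return the meta data a database row would hold."""
        data = text.encode("utf-8")
        path = self._current_path()
        # Move on to a new storage file rather than let one grow past the limit.
        while os.path.exists(path) and os.path.getsize(path) + len(data) > self.max_file_size:
            self._file_number += 1
            path = self._current_path()
        with open(path, "ab") as f:
            f.seek(0, os.SEEK_END)
            offset = f.tell()
            f.write(data)
        return (os.path.basename(path), offset, len(data))

    def read(self, file_name, offset, size):
        """Fetch one object with a single seek and a single read."""
        with open(os.path.join(self.directory, file_name), "rb") as f:
            f.seek(offset)
            return f.read(size).decode("utf-8")
```

Nothing is ever overwritten or deleted, so a new version of a text is just another append with a new meta data row. The offset and size are counted in bytes, not characters.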

If I get a few hours, I should be able to prototype this and feed in a range of literature from project Gutenberg as a test set.

Sunday, 3 May 2009

Use of the Browser as a (cross-platform) UI

I was considering some mobile application development and was debating which UI library to use. Some thinking came up with the idea of using the browser as the UI with a local HTTP server. This is really suitable for creating PC local applications rather than mobile apps at this time but more powerful mobile phones may change this.

An obvious attraction is that it helps the separation of content from presentation (as long as you do not over-indulge in Javascript, let alone do anything daft such as using Java Applets or Flash).

Browser as UI

With modern browsers and straightforward applications, a perfectly good UI can be built. It is not really possible to create a very complex UI but I claim that most applications do not need a very complex UI - just how dynamic and immersive does a PIM application need to be?

Applications such as Gmail and online office packages demonstrate what can be done.


HTTP server accessing local functions

Using an HTTP server for the 'back-end' functionality means that any language can be used - there really is no need for anything in common with the UI. My personal preference is Python so I am going to experiment with Django.

Using HTTP as the interface can be seen as overkill or inefficient. Rather than reading data directly from the UI and modifying it directly, everything has to be serialised and transferred over an internal socket. This is true but the overhead is probably negligible compared with the actual application functionality. The real question is whether it provides a convenient programming environment.

Most web development is concerned with preventing insecure access but if the local HTTP server is the application engine then it needs full access to the local machine - not really a problem.

The real issue is deployment of the server - no normal user is going to install and configure Apache and MySQL just for one application. The good news is that the server does not need to be scaleable as it will only be serving one user. This means that lighter-weight servers such as Django's development server or one custom-built from Python's simple HTTP server classes can be adequate.
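
As a sketch of how small such a single-user server can be, the following uses only Python's standard library to list a local directory in the browser. The handler, the page layout and the choice of a directory listing as the 'application' are illustrative assumptions; a real application would put its own functionality behind the handler.

```python
import html
import os
import webbrowser
from http.server import BaseHTTPRequestHandler, HTTPServer


def render_page(directory):
    """Build the whole UI server-side: an HTML list of the directory's files."""
    items = "".join(
        "<li>%s</li>" % html.escape(name) for name in sorted(os.listdir(directory))
    )
    page = (
        '<!DOCTYPE html><html><head><meta charset="utf-8">'
        "<title>Local files</title></head>"
        "<body><h1>Files</h1><ul>%s</ul></body></html>" % items
    )
    return page.encode("utf-8")


class AppHandler(BaseHTTPRequestHandler):
    app_directory = "."  # overridden per server in make_app_server()

    def do_GET(self):
        if self.path == "/":
            status, ctype, body = 200, "text/html; charset=utf-8", render_page(self.app_directory)
        else:
            status, ctype, body = 404, "text/plain; charset=utf-8", b"Not found\n"
        self.send_response(status)
        self.send_header("Content-Type", ctype)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # one local user, so skip the per-request logging


def make_app_server(port=0, directory="."):
    """Listen on localhost only; port 0 lets the OS pick a free port."""
    handler = type("BoundHandler", (AppHandler,), {"app_directory": directory})
    return HTTPServer(("127.0.0.1", port), handler)


if __name__ == "__main__":
    server = make_app_server()
    # The socket is already listening, so the browser can be opened first.
    webbrowser.open("http://127.0.0.1:%d/" % server.server_address[1])
    server.serve_forever()
```

Binding to 127.0.0.1 keeps the server private to the local machine, which matters because the application engine has full access to it.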

Thursday, 22 November 2007

Challenging your own views

Yesterday, I read this interview on The Register with Adam Curtis about how mainstream media are failing to take a lead by paying too much attention to bloggers.

It is quite long (as these things go) but it provoked several trains of thought in me that were not directly the subject of the piece.

One of the themes in the article is the existence of different groups (of bloggers) with fixed viewpoints who refuse to listen to any opposing or alternative viewpoints. The article takes this as a reason why the mainstream media (such as the BBC) choose to tread a middle path between opposing groups rather than trying to form their own view.

My personal lesson from this is the weakness of being too blinkered in one's views. If you hold strong views then it is very easy to keep reinforcing them but in a technical and management arena it is useful to challenge your own views occasionally. I am in some ways an archetypical 'Guardian' reader and I can be comfortable reading newspaper articles and columnists that agree with me but too much of that breeds complacency so I make a point of reading 'The Independent' quite often to get some opinions that I don't automatically agree with. I have tried reading newspapers with views further away from mine and it is much more difficult to critically assess their views - get too far to the right and I just get angry...

In technical areas, I often take part in discussions with engineers putting forward views with which I disagree. In some cases, their view is not that controversial but it is simply not what I wanted to hear because I came to the discussion with a proposed solution of my own. In other cases, I really do disagree with their view but I make a point of hearing them out and trying to see the value in it, despite my biases. This can require some emotional effort to be patient and objective, particularly when we are working under time pressure or when the engineer presenting his views is doing so clumsily, but I find enough occasions when there is value in the other point of view to keep making the effort.

Wednesday, 24 October 2007

Why do Managers behave like idiots and why are Engineers so obstructive?

All engineers have anecdotes about when one or other manager behaves in an idiotic manner, making decisions that are obviously stupid. Often, the more senior the manager the more frequent the bad decisions and the more serious the outcome.

Alongside this, all managers know that getting engineers to do something new or different can be frustrating to the point of apoplexy. Engineers seem to have an innate desire (and skill) for finding objections and reasons why a proposal will not work.

I suggest that both of these problems are caused by the different environments and needs of managers and engineers. My starting point is to acknowledge that both managers and engineers tend to be highly intelligent and highly motivated – neither group is stupid or deliberately obstructive.

Managers have two problems that are relevant. The first is the scope of information that they need to handle. The second is the effect of friction in any organisation.

By information scope, I mean the range of information for which a manager is responsible. Even a quite junior manager is likely to be responsible for a number of projects and the more senior the manager the wider the breadth of the organisation that they are supposed to know all about. This means that a manager cannot possibly know everything about all the projects etc. under their control. This does not stop more senior managers expecting instant answers to random and detailed questions. When necessary, a manager can learn all about one area (and engineers may be surprised at the speed at which a good manager can filter and absorb information) but they cannot learn everything and maintain that knowledge.

The information scope problem means that a manager will tend to have to make decisions based on partial information. The more senior managers get their information from their subordinates so their information is second-hand as well as partial.

The concept of friction comes from Clausewitz and refers in this case to all the factors that make even the simplest organisational task difficult. Clausewitz ascribed friction to the physical danger of war and the physical challenges. Most organisations do not have this level of danger but communications difficulties, conflicting priorities and natural resistance to change all make it unexpectedly difficult to carry out projects or to make changes. This means that any successful manager has to be a highly motivated problem solver. One aspect of such behaviour is an ability not to be put off by minor obstacles. If a manager stops to re-think the plan whenever somebody objects then they will never succeed. In contrast, if they ignore or override objections they may hit problems but they have a better overall chance of success. Of course, the really good managers have to decide which objections are serious and which can be safely ignored.

One side effect of the information scope and friction problems is that managers have to be able to look at the big picture without getting bogged down in minor details. Oh, and they are commonly under time pressure which can make them impatient with unnecessary details.

Turning to obstructive engineers, we can see that a good engineer needs to care about detail and needs to learn everything relevant about their subject or immediate problem. As engineers tend to work directly on their problem, they may be less affected by organisational friction (apart from the friction in getting budget or resources) – they tend to be hands-on.

When engineers discuss technical problems, they need to have a shared knowledge base that can be detailed. Therefore, they may have to provide a large amount of background information in order to present a problem or ask a question.

If an engineer ignores an inconvenient problem for reasons of expediency then the problem is unlikely to go away. It is more likely to return when least expected but bigger and uglier than before. Therefore, engineers tend to focus on problems as they find them and are uncomfortable moving on until they are convinced that the problem is solved (or at least can be solved for a reasonable cost).

To a busy manager, engineers are people who keep bringing up problems, describe them in a long-winded way and need constant ‘motivation’ to keep them moving forwards.

To a conscientious engineer, managers are ignorant people who don’t have the patience to understand what is really going on and who prefer to deal with problems by ignoring them.

Tuesday, 12 June 2007

Thoughts on interviewing software engineers

Over the weekend I came across this blog entry.
http://blog.pmarca.com/2007/06/how_to_hire_the.html

I do a lot of interviewing and I am very aware of how difficult it is to make an interview effective so I am always interested to read about interview techniques for software engineers.

Mostly, these focus on problem solving but I distrust these. I do have a collection of interview problems, some based firmly in software engineering and others more abstract, but I have tried them all myself and I don't believe that the ability to solve arbitrary problems under interview conditions is a good indicator of ability.

To be more accurate, I don't believe that the inability to solve one particular problem under interview conditions shows that an engineer is incompetent in general.