Its Computer Science Time: August 2012

Tuesday, August 21, 2012

Generality is Correctness

At work I came across a situation where the length (and other properties) of a data structure was hard-coded in many similar functions across a project, where each function acts on a different instance of the structure. At first I was reluctant to implement my own version of the function in this way. Lacking a convenient way to abstract this function I'm okay with reimplementing it in my own code, but it is possible for the length to be calculated at run-time instead of hard-coded.

I reasoned that all the functions be implemented by calculating the length, as it would make them more generic. The function would work correctly if the sizes changed, or if it were using in a different situation then the one originally intended. In this sense the more generic the function the fewer properties it has, and the more likely the features it does have will be correct. Being very generic should make a function easy to understand (there are fewer things to be aware of) but should also restrict its behavior to exclude extra information (the length used in that function is part of its behavior not described by its type) or corner cases. An more extreme example is Parametric Polymorphism, where there are even derivable laws that such functions must satisfy, but it seems like this concept occurs in all levels of design.

In the end we are hard-coding all of these sizes. This is also a form of correctness, as all function do one and only one thing, and can be used in only one way. The main reason for doing it this way is that we can show that throughout the project the sizes are used in a consistent way and that nowhere does the size get misused. This means that we will not see a bug that a size is calculated incorrectly and ends up overwriting memory, getting memory access exceptions, or taking a huge amount of time to process a structure that appears larger than it is. On the other hand, these functions are specific to a single use, and it is difficult to make global changes to them if we needed to. In this particular case that is okay, but it makes maintenance more difficult.

The main thing I've learned here is that the tradeoffs where I work are not the ones I'm used to. We do not build small, reusable components, and there are not many mechanisms for abstraction or specification. Documentation becomes extremely important, and there are often subtle bugs (in my code included) involving memory management.

C Macros and Let Polymorphism

We use macros for constants quite a lot at work, and it occurred to me that since a macro is replaced at a token level (before parsing and certainly before type checking) and since C will do coercive casting on literals, macros are somewhat like a limited form of Let Polymorphism. Obviously they not the same thing, but in practice you sometimes get the advantages of Let Polymorphism in that you get the same expression at several types.

Wednesday, August 8, 2012

Blind Crowdsourcing of Captchas

I feel like someone must have thought of this before, but it seems like one way to break captchas would be to create a service that many people want, but needs to be protected against scripted use. Give your users captchas from other sites that you want to break, and they will gladly solve them. You then simply forward the answers along to a script that is doing something nefarious- whatever is it captchas are protecting against. Distributed organic computing for solving the captcha problem.