One of the time-honored traditions of open source is reverse engineering: working out how someone else accomplished a goal and replicating it. Reverse engineering is an important tool in your toolbox, and its importance has been growing. But it always involves some legal risk. So how do you make reverse engineering as effective as possible while managing that risk?
Reverse engineering helps you understand how a product works so you can create something similar without infringing its copyright. Clean room procedures are designed to ensure that you don't copy the original code, which reduces your exposure to copyright claims.
A "Classic" Formal Reverse Engineering Process
There are two types of reverse engineering: white box and black box. White box reverse engineering involves examining the original code, while black box reverse engineering looks only at how the software behaves. To keep white box reverse engineering on the right side of the law, you should follow a process with separate teams and controlled communication between them.
Here are the steps in the process:
Make Teams: Divide your developers into "dirty" and "clean" teams. The dirty team studies the competitor's code and writes a specification. The clean team then builds the new software from that specification. If a particular project is especially risky, you can insulate the teams further by adding an intermediate "evaluation" team that reviews the specification to make sure it contains no expressive content.
Create a Specification: The dirty team writes a description of the software's functions that excludes every creative element of the original code, including non-literal elements such as the code's structure and organization. If there is a separate evaluation team, its job is to review the specification and confirm that it covers only the functional aspects of the code.
Develop the New Software: The clean team writes the new software based on the specification. Its members should have no contact with the dirty team or the original code, and they should not be former employees of the company that made the original software.
Test the New Software: Check whether the new software works as intended. If needed, adjust the specification or the code to fix any issues, routing any specification changes through the same controlled channel between the teams.
A Modern Reverse Engineering Process: Test-Driven Development
One newer approach to reverse engineering is to adhere to a rigorous system of test-driven development. Test-driven development (TDD) is an iterative software development technique in which test cases covering the desired improvement or new functionality are written first. Code is written only to pass previously written test suites.
In test-driven reverse engineering, the specification (dirty) team gives the development (clean) team test suites instead of written specifications. The development team writes code to pass those test suites, and the resulting software is evaluated automatically by running the tests.
Rigorous test-driven development is effective for reverse engineering for three reasons. First, tests intentionally treat the code as opaque: they check only that functions produce the proper outputs for given inputs, and they do not depend on the internal structure of the code except where that structure has a functional aspect. This is by design; one frequently cited benefit of test-driven development is the ability to refactor the code (incidentally removing copyrightable similarity) without affecting its functionality. Therefore, test-driven development suites effectively screen all copyrightable expression from the development team.
Second, test-driven development provides an easily verifiable "clean" communication and evaluation channel between the clean and dirty teams. Maintaining a traditional outside evaluation team consumes limited resources. With test-driven reverse engineering, however, the test-running software itself acts as the neutral evaluation team. The tests are written in functional terms and must be interpreted unambiguously by the test runner. Because the tests are functional, they clean all communication between the teams: a requirement can't be passed along unless it is expressible in functional, testable terms.
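A minimal sketch of this handoff, in Python, might look like the following. Everything in it is hypothetical: the function being reimplemented (`parse_duration`, which converts strings such as `"1h30m"` into seconds), the recorded cases, and the helper `check_parse_duration` are illustrations, not part of any real product. The dirty team contributes only a list of inputs paired with the outputs observed from the original product, and the clean team sees that list and nothing else. A real project would more likely express the cases in a framework such as `unittest` or pytest, but plain functions keep the example self-contained.

```python
import re

# --- Written by the dirty (specification) team ---
# Hypothetical cases: each records an input and the output observed from
# the original product, and says nothing about how it is computed.
DURATION_CASES = [
    ("90s", 90),
    ("2m", 120),
    ("1h30m", 5400),
    ("1h0m5s", 3605),
    ("", 0),
]

def check_parse_duration(impl):
    """Return a list of failure messages for the given implementation."""
    failures = []
    for text, expected in DURATION_CASES:
        actual = impl(text)
        if actual != expected:
            failures.append(f"{text!r}: expected {expected}, got {actual}")
    return failures

# --- Written by the clean (development) team, seeing only the cases above ---
_UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}

def parse_duration(text):
    """Convert strings like '1h30m' into a number of seconds."""
    total = 0
    for amount, unit in re.findall(r"(\d+)([hms])", text):
        total += int(amount) * _UNIT_SECONDS[unit]
    return total

if __name__ == "__main__":
    problems = check_parse_duration(parse_duration)
    print("all cases pass" if not problems else "\n".join(problems))
```

Running the file prints `all cases pass`. If the clean team's code diverged from a recorded case, the failure messages would identify the input that broke without revealing anything about how the original computes its answer.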
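The point about refactoring can be illustrated with a small, hypothetical example. Luhn check-digit validation is used here only as a stand-in for some functionality being reimplemented, and the case list and function names are invented for illustration. The two implementations are structured differently (an explicit loop versus slicing plus a lookup table), yet the same behavior-only cases accept both, because the cases never refer to loops, helpers, or variable names.

```python
# Behavior-only cases: each pairs an input with the expected result.
# Nothing here refers to loops, helper functions, or variable names.
LUHN_CASES = [
    ("79927398713", True),
    ("79927398710", False),
    ("0", True),
    ("18", True),
    ("19", False),
]

def run_cases(impl):
    """Return the inputs for which `impl` disagrees with the expected result."""
    return [number for number, expected in LUHN_CASES if impl(number) != expected]

# Implementation A: an explicit loop over the digits, right to left.
def luhn_valid_loop(number):
    total = 0
    for position, ch in enumerate(reversed(number)):
        digit = int(ch)
        if position % 2 == 1:  # double every second digit from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0

# Implementation B: same behavior, different structure (slicing plus a
# lookup table holding the digit sum of 2*d for d = 0..9).
_DOUBLED_DIGIT_SUM = [0, 2, 4, 6, 8, 1, 3, 5, 7, 9]

def luhn_valid_table(number):
    digits = [int(ch) for ch in reversed(number)]
    undoubled = sum(digits[0::2])
    doubled = sum(_DOUBLED_DIGIT_SUM[d] for d in digits[1::2])
    return (undoubled + doubled) % 10 == 0

if __name__ == "__main__":
    for impl in (luhn_valid_loop, luhn_valid_table):
        print(impl.__name__, "failures:", run_cases(impl))
```

Both implementations report an empty failure list, so either structure would satisfy a test-only specification.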
Third, test-driven reverse engineering provides a tighter testing loop. Outside testers aren't needed to verify that the newly written code works the same way as the existing product; the tests themselves provide instant feedback because they can be run against both the old and the new code.
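A sketch of that tighter loop, under stated assumptions: `legacy_slugify` is a hypothetical stand-in for the original product, which in practice would be exercised only through its public interface (for example, by invoking the shipped program) rather than read. It is written in Python here purely so the sketch runs. `clean_slugify` plays the role of the clean team's reimplementation. The same suite runs against both, which confirms that the recorded cases really describe the old behavior and that the new code reproduces it. The hypothetical `differential` helper goes one step further and compares the two implementations on randomly generated inputs, a technique often called differential testing.

```python
import random
import re
import string

# Hypothetical stand-in for the ORIGINAL product. In a real project this
# would be a black box driven through its public interface; the clean
# team never reads its source.
_ALNUM = set(string.ascii_lowercase + string.digits)

def legacy_slugify(text):
    out, last_was_dash = [], False
    for ch in text.lower():
        if ch in _ALNUM:
            out.append(ch)
            last_was_dash = False
        elif out and not last_was_dash:
            out.append("-")
            last_was_dash = True
    return "".join(out).rstrip("-")

# Hypothetical clean-room reimplementation, written only against the tests.
def clean_slugify(text):
    return "-".join(re.findall(r"[a-z0-9]+", text.lower()))

# Shared behavioral suite: input -> expected output.
CASES = [
    ("Hello, World!", "hello-world"),
    ("  Leading spaces", "leading-spaces"),
    ("a--b", "a-b"),
    ("", ""),
]

def run_suite(impl):
    """Return (input, expected, actual) for every case the implementation fails."""
    failures = []
    for text, expected in CASES:
        actual = impl(text)
        if actual != expected:
            failures.append((text, expected, actual))
    return failures

def random_inputs(seed, count, alphabet="aZ9 -_!é"):
    """Generate reproducible random strings to probe both implementations."""
    rng = random.Random(seed)
    return ["".join(rng.choice(alphabet) for _ in range(rng.randint(0, 12)))
            for _ in range(count)]

def differential(impl_a, impl_b, inputs):
    """Return every input on which the two implementations disagree."""
    return [text for text in inputs if impl_a(text) != impl_b(text)]

if __name__ == "__main__":
    for name, impl in [("legacy", legacy_slugify), ("clean", clean_slugify)]:
        print(name, "suite failures:", run_suite(impl))
    samples = random_inputs(seed=0, count=1000)
    print("mismatches:", differential(legacy_slugify, clean_slugify, samples))
```

Because the suite itself is checked against the old code first, a case that misdescribes the original product shows up as a legacy failure before it can mislead the clean team.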