Minimize API visibility

When building libraries, there is always the need to differentiate between API designed for end-users, and API designed for our internal use. We call API public when we intend for it to be used by end-users, and we call it internal when it is intended for our use, and is not intended for end users. Java, as with most object-oriented languages, has many useful concepts, mostly centered around the visibility modifiers public, protected, private, and package-private (denoted in Java by not specifying any visibility modifier). This works well for many cases, but in Java it falls down when API spans multiple packages because there is no 'library-internal' or 'module-internal' concept, like there is in other languages. Despite this, our goal as Java API designers is always to have the highest degree possible of our internal API not leak to the end-user of our library, where leaking means one of the following is occurring:

  • We have an intentionally-public type that exposes a field / constructor / method as public or protected, because it must be accessed by another type, but that it is not our intention that end-users make use of. We will refer to this going forward as 'leaking methods'.
  • We have an intentionally-public type that exposes a constructor or method parameter or return type that is not intended to be part of our public surface area. We will refer to this going forward as 'leaking types'.

One of the core API characteristics I presented earlier on this website was the concept of evolvability. I discuss this further in the design for extensibility best practice, but in this best practice we will dig into how we can limit our API surface area by being more conscientious towards our delineation of public and internal APIs whenever possible. In the course of my API design experience, I have achieved this largely by putting internal types into a separate package that is clearly documented as internal API. When I was working on the JDK itself, packages starting with com.sun.* were typically accepted as internal API. More recently in my role at Microsoft working on the Azure SDKs for Java, I introduced the convention of having an implementation package (or sub-package) for much the same thing. Tooling was configured such that these internal API packages were excluded from JavaDoc, not validated against breaking API change rules, and specific tooling was written that ensured types inside the implementation package did not leak out through our public APIs.

The visibility modifiers in Java are unfortunately very coarse-grained for what we need as API designers, and they fail to properly support all circumstances. This best practice attempts to provide more clarity for those circumstances, to provide a framework for determining how best to present an API that simultaneously provides all required APIs to the end-user whilst not providing any API that is intended for internal use only.

Dealing with Leaking Methods

Leaking methods is when an intentionally-public type has API on it that is not intended for use by end-users. This is commonly the case when a method (or constructor or field, but for brevity henceforth I will just say 'method' to mean all of the above) is made public or protected because it must be accessed by some internal code, commonly to set some property or configuration, before being handed to the user. Unfortunately, because this is a publicly-accessible API, the recipient of the type has similar freedoms to call this API and perhaps put the type into an inconsistent or illegal state. For example:

public class User {

    public String getName() { ... }
    public void setName() { ... }

    public int getScore() { ... }

    // this should not be visible to the user but due to design choices in our
    // API, it is, and the results could be catastrophic if the user calls it.
    public void resetScore() { ... }
}

There are a variety of ways that this problem can be tackled. Which one should be chosen depends a lot on the specific context of the API, so firstly we will enumerate the options and how they are implemented, and subsequently, we will dive into how we might land on the best choice of these options for a particular scenario.

Accepting Leaking Methods

The first option is to accept that we might just be best served to have leaking methods in our public API. The solutions outlined later in this document might be worse than the disease, as it were. Taking this approach requires little effort, but consideration should be given to how to lessen the probability and impact of the user calling these methods:

  • JavaDoc: It is always good practice to clearly document how an API is to be used. In the case of leaking methods in an API, documenting that it is not intended for end-use and that it is internal-only is the minimum that should be done.
  • Method naming: Prefixing internal methods with something unattractive such as impl_<methodName> can help encourage developers to steer clear of these particular methods, particularly if this is a pattern that can be established throughout the library prior to naming being set in stone. Unfortunately, naming like this can also become a giant sign to end-users about API that they might want to take a look at with extra interest, because it gives them super powers not intended for their use.
  • Deprecation: Marking the method as deprecated may discourage use further, especially if the user has their build configured to fail if deprecated functionality is being used. This is not commonly-used and is generally not recommended.
  • 'Call-once' semantics: If the API is only to be called once by the library before it is given to the user, at which point the API should not be called again, consider implementing a 'call-once' semantic on these leaking methods. This would would mean that users, upon calling the method, would receive a runtime exception or a no-op, rather than modifying the state as they expected. Enabling this kind of 'call-once' semantic is as easy as checking if a particular field is already set, or, if null is an allowed value, to check a separate boolean field to see if the method has already been called.

Introduce an Interface

If leaking methods is not acceptable, another choice is to introduce an interface that represents the type, with only the public API visible on it. This works well for returned model types, where the backing implementation of the interface is instantiated by the library and provided to the user, whereupon they are interacting with the interface only. It does not work well in situations where the user is required to provide an instance of a model type (because they are unable to instantiate an interface directly), nor does it work well in cases where the model type is round-tripping with a service and therefore is both user and service instantiated.

There are patterns that enable instantiating an interface without leaking the implementation. Most notably, the factory pattern allows the user to be unaware of the implementing class of a given interface, and only be presented with the interface API. In Java, we could collapse the factory pattern down into the interface itself, using a static factory method:

public interface ModelType {
    void setFoo(Foo foo);
    Foo getFoo();

    // static API allows the user to create an
    // instance of the internal type
    static ModelType newInstance() {
        return new ModelTypeImpl();
    }
}

// internal type pushed into an implementation package
public class ModelTypeImpl implements ModelType {
    public void setFoo(Foo foo) { ... }
    public Foo getFoo() { ... }

    // dangerous operation we don't want as part of our public API
    public void impl_formatServer() { ... }
}

By doing this, the user would never need to know of the existence of the ModelTypeImpl class or the dangerous functionality that it encapsulates. All public API would return or accept as a parameter the ModelType type, and users would proceed to instantiate an unknown implementation of ModelType by calling ModelType.newInstance().

In this approach, the internal type would not form part of the public API surface area and would be banished to an appropriate place within the implementation package. This would mean that the interface becomes the only public API and would take the place of the internal type in the package hierarchy.

There are challenges to consider with this approach:

  • Establishing this as a convention would require considerable documentation, as a user would not instinctively think to check the interface itself for a means of instantiating it.
  • Care must continue to be taken based on the knowledge that a user is still free to cast the interface type to instead be the backing internal type, at which point all API that is public on the internal type becomes accessible once again (so, please, don't have a impl_formatServer() method!).
  • Extra caution must be taken to ensure that the interface has had consideration given to any future evolution of the interface, as interface evolution presents its own set of challenges.
    • Historically, it was impossible to evolve an interface, however more recently with the introduction of default methods) in JDK 8 it has become possible to introduce new API into an interface in a backward-compatible way.
    • Beyond this, JDK releases prior to JDK 15 had no ability to prevent who could implement an interface. This means that end-users can also implement an interface, which further degrades our ability to support interface evolution. Fortunately, sealed classes (also known as JEP 360) are available today as a preview language feature in JDK 15, and will become fully supported in a future release. Sealed classes are covered later in this document.

Introduce Helper Types

Another option to consider is introducing helper types. I've personally not seen this pattern in the wild all that often, but it was something I experienced a lot when I was working on the JDK when I worked at Sun Microsystems / Oracle. Wrapping your head around this pattern, which I will dub the 'Helper Pattern' henceforth, is not the simplest of tasks, as there is quite a considerable amount of overhead in achieving the final result. I will firstly outline the steps required to implement this pattern, and then discuss the pros and cons.

The pattern starts by introducing a FooHelper type within your internal API space (i.e. within your implementation package). This class will take a form similar to the following:

// The helper class is named using the pattern <public-type-Name>Helper,
// so in this case we know this is a helper class for a `Foo` type in our public API
// surface
public final class FooHelper {
    // The FooAccessor interface is implemented within the Foo type, so that it can
    // access the non-public API within the Foo type.
    private static FooAccessor fooAccessor;

    // We don't want people to create instances of this class directly
    private FooHelper() { }

    public interface FooAccessor {
        // All APIs on FooAccessor map to API on the Foo type
        // that we want to access but we don't want to have as public API
        void doNonPublicTask(Bar bar);
        Baz computeExpensiveValue();
    }

    // setAccessor is called by the public type, in this case the `Foo` class will 
    // set this
    public static void setAccessor(final FooAccessor newAccessor) {
        if (fooAccessor != null) {
            throw new IllegalStateException();
        }

        fooAccessor = newAccessor;
    }

    public static FooAccessor getAccessor() {
        if (fooAccessor == null) {
            throw new IllegalStateException();
        }

        return fooAccessor;
    }

    // The following two methods are the API we want to access on Foo that we do
    // not want to have as public API. Note that it is static API, and that it takes
    // as its first argument the Foo on which we want to perform the actual
    // operation. There should be a 1:1 mapping between these methods and the API on
    // FooAccessor.
    public static void doNonPublicTask(Foo foo, Bar bar) {
        fooAccessor.doNonPublicTask(bar);
    }

    public static Baz computeExpensiveValue(Foo foo) {
        return fooAccessor.computeExpensiveValue();
    }
}

There is a lot of code here, and it still might not be overly clear how this helps anything! The next step is to introduce some additional code into our public Foo type, as shown below:

public class Foo {

    static {
        // In the static code block, we are configuring `FooHelper` as to how it can
        // access the private or package-private API inside `Foo`. By doing it here
        // the non-public API can be exposed to `FooHelper`, and from which it can 
        // be made available to other types elsewhere.
        FooHelper.setAccessor(new FooHelper.FooAccessor() {
            @Override
            public void doNonPublicTask(Bar bar) {
                foo.doNonPublicTask(bar);
            }

            @Override
            public Baz computeExpensiveValue() {
                return foo.computeExpensiveValue();
            }
        });
    }

    // the actual API can be made private or package-private,
    // and no longer needs to be public API
    void doNonPublicTask(Bar bar) {
        // impl goes here
    }

    Baz computeExpensiveValue() {
        // impl goes here
    }
}

After all of this code, we have now reached a point where we can use it! Anywhere in our code where we have access to the implementation package where the FooHelper type resides, we can now write code to FooHelper.doNonPublicTask(foo, bar) or FooHelper.computeExpensiveValue(foo).

The upside of this approach is that this API no longer needs to be public on the Foo type. The downside is all the extra machinery that is required to set up this pattern. Additionally, this pattern is not friendly across module boundaries, assuming that the location of the FooHelper type is not part of the public API scope and is therefore not exported to other modules. Concessions can be made in the module-info.java file to conditionally export the package containing the helper types to other friendly modules.

Choosing the Right Pattern

As should be obvious from the choices presented above, there do exist several possible patterns we could use - but it begs the question 'which pattern is right for me?' There is no 'one size fits all' answer to this, but we can at least identify some questions that we can ask ourselves that might guide us somewhat:

  • How much 'traffic' does the API get? Is it really in the face of the user on their critical path, or is it in some esoteric type on the fringes of the client library? High visibility API is less amenable to simply being ignored and accepted 'as-is, where-is'. Conversely, API we expect will rarely be used, or where the volume of API on the particular type is particularly low, might be an acceptable place for leaving the undesired API on the public type.
  • How likely is the scope of the API to grow over time? If we decide to introduce an interface, we need quite a high degree of confidence that the API will either not continue to grow over time, or that if it does, we can satisfactorily introduce a default method that means other implementations of the interface do not break. If we lack clarity and / or confidence in this, an interface-based approach may be setting us up for failure and the Helper Pattern may make more sense.
  • What is the scale of the undesirable public API over the entire client library? The Helper Pattern approach requires a helper for each type and therefore can require a considerable amount of engineering effort, as well as a non-trivial amount of maintenance effort over time as more non-public API is added that we want to make available via the Helper.

Having said all of this, the final and most critical point to make is: we are not required to choose one pattern for our entire client library! We can, and should, use all three of the approaches outlined above in a single client library as appropriate.

Dealing with Leaking Types

We leak a type when we introduce intentionally-public API that returns an internal type, or accepts as a parameter into it an internal type. In other words, we have a public API exposing something within our implementation package. The result of this is that users are now presented with API that may not meet our expectations in terms of API design, quality, documentation, and so on. Even worse, because we want to fully support Java modules, users may be unable to use this internal type, as it will not be exported from our module-info file. Finally, it is important to understand that leaking a single type may feel inconsequential, but this can often be the tip of an iceberg, as this type leaks other types, which themselves leak more types, until the entire internal API area is unintentionally being exposed to the user.

Fortunately, in the Azure SDK for Java build tooling we have checks that help to prevent this. Both APIView and our CheckStyle build step look for types that are within the implementation package being exposed through API on types that are not within the implementation package.

What should we do when we identify leaking types? The answer is obvious: stop the leak! How we go about this is very context-specific, and there are not many options available to us as API developers. Possible general solutions include:

  • Remove the public API that is leaking the internal type. This may be acceptable if the public API is of marginal utility, but is generally not a viable solution assuming that all API is already designed and determined to be integral to the overall client library API surface area.
  • Decide to make the internal type a public API and move it out of the implementation package. This can sometimes be dangerous for the simple reason that code originally destined for an implementation package is typically not given the same scrutiny from an API design perspective, and therefore it is not always the case that it can simply be copy / pasted from implementation to an appropriate public API package. It should be expected that this is given extra scrutiny: API design must be considered, documentation must be written, and confidence must be gained that no additional leaking is occurring.
  • Decide to introduce a wrapper type or interface that becomes part of the public API scope. Use this in lieu of the internal API type. This is primarily useful for method return types, where the instantiation of the wrapper or interface can be left to the API implementation. If the user is forced to use a wrapper or interface type, we start to arrive at similar issues as discussed earlier in terms of making it obvious to users how they go about instantiating this type.

If the leaking internal type is appearing within a method parameter type, one additional consideration is to remove the internal type argument from the public method parameter list, and instead take other parameters that can then be reconstituted within the implementation of the API to create the internal type (and therefore transparently to the user).

Conclusion

In this best practice we have discussed the ways in which we can fulfill our commitment on being restrained in our API. API design would be much easier if we could just make everything public and accessible by everyone, but that would make for an absolutely terrible user experience. In this best practice we therefore looked for ways that enable us to be careful that we only expose to users what is intended for their use. Any thing else that is available on our APIs that is not intended for their use exists to serve us as API designers, and that should be seen as a failing on our part. Whilst it is not always possible to completely eradicate these internal APIs from public scope, we should always treat them as a code smell that need to clear an extraordinarily high bar to be allowed to remain in our public API scope.