Integrating with service-oriented architectures

In this page we will share certain best practices you can employ in your applications using Log4j Core to integrate them with service-oriented architectures. While doing so, we will also try to share guides on some popular scenarios.

Motivation

Most modern software is deployed in service-oriented architectures. This is a very broad domain and can be realized in an amazingly large number of ways. Nevertheless, they all redefine the notion of an application:

Deployed in multiple instances
Situated in multiple locations; either in the same rack, or in different data centers located in different continents
Hosted by multiple platforms; hardware, virtual machine, container, etc.
Polyglot; a product of multiple programming languages
Scaled on demand; instances come and go in time

Naturally, logging systems also evolved to accommodate these needs. In particular, the old practice of "monoliths writing logs to files rotated daily" has changed in two major angles:

Application delivers logs differently: Applications no longer write logs to files, but encode them structurally, and deliver them to an external system centrally managed. Most of the time this is a proxy (a library, a sidecar container, etc.) that takes care of discovering the log storage system and determining the right external service to forward the logs to.
Platform stores logs differently: There is no longer /var/log/tomcat/catalina.out combining all logs of a monolith. Instead, the software runs in multiple instances, each is implemented in a different language, and instances get scaled (i.e., new ones get started, old ones get stopped) on demand. To accommodate this, logs are persisted on a central storage system (Elasticsearch, Google Cloud Logging, etc.) that allows advanced navigation and filtering capabilities.

Log4j Core not only adapts to this evolution, but also strives to provide the best in the class support for that. We will explore how to integrate Log4j with service-oriented architectures.

Best practices

Independent of the service-oriented architecture you choose, there are certain best practices we strongly encourage you to follow:

Encode logs using a structured layout

We can’t emphasize it enough to not use anything, but a structured layout to deliver your logs to an external system. We recommend JSON Template Layout for this purpose:

JSON Template Layout provides full customizability and contains several predefined layouts for popular log storage services.
JSON is accepted by every log storage service.
JSON is supported by logging frameworks in other languages. This makes it possible to agree on a common log format with non-Java applications.

Use a proxy for writing logs

Most of the time it is not a good idea to write to the log storage system directly, but instead delegate that task to a proxy. This design decouples applications' log target and the log storage system and, as a result, effectively enables each to evolve independently and reliably (i.e., without downtime). For instance, this will allow the log storage system to scale or migrate to a new environment while proxies take care of necessary buffering and routing.

This proxy can appear in many forms, for instance:

Console can act as a proxy. Logs written to console can be consumed by an external service. For example, The Twelve-Factor App and Kubernetes Logging Architecture recommends this approach.
A library can act as proxy. It can tap into the logging API and forward it to an external service. For instance, Datadog’s Java Log Collector uses this mechanism.
An external service can act as a proxy, which applications can write logs to. For example, you can write to Logstash, a Kubernetes logging agent sidecar, or a Redis queue over a socket.

What to use as a proxy depends on your deployment environment. You should consult to your colleagues if there is already an established logging proxy convention. Otherwise, we strongly encourage you to establish one in collaboration with your system administrators and architects.

Configure your appender correctly

Once you decide on the log proxy to use, the choice of appender pretty much becomes self-evident. Nevertheless, there are some tips we recommend you to practice:

For writing to console, use a Console Appender and make sure to configure its direct attribute to true for the maximum efficiency.
For writing to an external service, use a Socket Appender and make sure to configure the protocol and layout’s null termination (e.g., see the nullEventDelimiterEnabled configuration attribute of JSON Template Layout) appropriately.

Avoid writing to files

As explained in Motivation, in a service-oriented architecture, log files are

Difficult to maintain – writable volumes must be mounted to the runtime (container, VM, etc.), rotated, and monitored for excessive usage
Difficult to use – multiple files need to be manually combined while troubleshooting, no central navigation point
Difficult to interoperate – each application needs to be individually configured to produce the same structured log output to enable interleaving of logs from multiple sources while troubleshooting distributed issues

In short, we don’t recommend writing logs to files.

Separate logging configuration from the application

We strongly advise you to separate the logging configuration from the application and couple them in an environment-specific way. This will allow you to

Address environment-specific configurations (e.g., logging verbosity needs of test and production can be different)
Ensure Log4j configuration changes applies to all affected Log4j-using software without the need to manually update their Log4j configuration one by one

How to implement this separation pretty much depends on your setup. We will share some recommended approaches to give you an idea:

Choosing configuration files during deployment

Environment-specific Log4j configuration files (log4j2-common.xml, log4j2-local.xml, log4j2-test.xml, log4j2-prod.xml, etc.) can be provided in one of following ways:

Shipped with your software (i.e., accessible in the classpath)
Served from an HTTP server
A combination of the first two

Depending on the deployment environment, you can selectively activate a subset of them using the log4j2.configurationFile configuration property.

Spring Boot allows you to configure the underlying logging system. Just like any other Spring Boot configuration, logging-related configuration also can be provided in multiple files split by profiles matching the environment: application-common.yaml, application-local.yaml, etc. Spring Boot’s Externalized Configuration System will automatically load these files depending on the active profile(s).

Mounting configuration files during deployment

Many service-oriented deployment architectures offer solutions for environment-specific configuration storage; Kubernetes' ConfigMap, HashiCorp’s Consul, etc. You can leverage these to store environment-specific Log4j configurations and mount them to the associated runtime (container, VM, etc.) at deployment.

Log4j Core can poll configuration files for changes (see the monitorInterval attribute) and reconfigure the associated logger context. You can leverage this mechanism to dynamically update the Log4j configuration at runtime.

You need to be careful with this mechanism to not shoot yourself in the foot. Imagine publishing an incorrect log4j2.xml and rendering the logging setup of your entire cluster useless in seconds. Coupling the configuration with the application at deployment and gradually deploying new configurations is a more reliable approach.

Guides

In this section, we will share guides on some popular integration scenarios.

Docker

See Log4j Docker for Docker-specific Log4j features, e.g., Docker Lookup. We also strongly advise you to check the extensive logging integration offered by Docker containers.

Kubernetes

Log4j Kubernetes (containing Kubernetes Lookup) is distributed as a part of Fabric8’s Kubernetes Client, refer to its website for details.

Elasticsearch & Logstash

Elasticsearch, Logstash, and Kibana (aka. ELK Stack) is probably the most popular logging system solution. In this setup,

Elasticsearch is used for log storage
Logstash is used for transformation and ingestion to Elasticsearch from multiple sources (file, socket, etc.)
Kibana is used as a web-based UI to query Elasticsearch

To begin with, JSON is the de facto messaging format used across the entire Elastic platform. Hence, as stated earlier, we strongly advise you to configure a structured encoding, i.e., JSON Template Layout.

Logstash as a proxy

While using ELK stack, there are numerous ways you can write your application logs to Elasticsearch. We advise you to always employ a proxy while doing so. In particular, we recommend you to use Logstash for this purpose. In a modern software stack, the shape and accessibility of log varies greatly: some write to files (be it legacy or new systems), some doesn’t provide a structured encoding, etc. Logstash excels at ingesting from a wide range of sources, transforming them into the desired format, and writing them to Elasticsearch.

While setting up Logstash, we recommend you to use TCP input plugin in combination with Elasticsearch output plugin to accept logs over a TCP socket and write them to Elasticsearch:

An example logstash.conf snippet for accepting JSON-encoded log events over TCP and writing them to Elasticsearch

input {
  tcp { (1)
    port => 12345 (2)
    codec => "json" (3)
  }
}

output {

  # stdout { codec => rubydebug } (4)

  # Modify the hosts value to reflect where Elasticsearch is installed.
  elasticsearch { (5)
    hosts => ["http://localhost:9200/"] (6)
    index => "app-%{application}-%{+YYYYMMdd}" (7)
  }

}

text

1	Using TCP input plugin to accept logs from
2	Setting the port Logstash will bind to accept TCP connections to 12345 – adapt the `port` to your setup
3	Setting the payload encoding to JSON
4	Uncomment this while troubleshooting your Logstash configuration
5	Using Elasticsearch output plugin to write logs to Elasticsearch
6	The list of Elasticsearch hosts to connect to
7	The name of the Elasticsearch index to write to

Refer to the official documentation for details on configuring a Logstash pipeline.

For the sake of completeness, see the following Log4j configuration to write to the TCP socket Logstash accepts input from:

XML
JSON
YAML
Properties

Snippet from an example log4j2.xml

<Socket name="SOCKET" host="localhost" port="12345">
  <JsonTemplateLayout/>
</Socket>

xml

Snippet from an example log4j2.json

"Socket": {
  "name": "SOCKET",
  "host": "localhost",
  "port": 12345,
  "JsonTemplateLayout": {}
}

json

Snippet from an example log4j2.yaml

Socket:
  name: "SOCKET"
  host: "localhost"
  port: 12345
  JsonTemplateLayout: {}

xml

Snippet from an example log4j2.properties

appender.0.type = Socket
appender.0.name = SOCKET
appender.0.host = localhost
appender.0.port = 12345
appender.0.layout.type = JsonTemplateLayout

xml

We don’t recommend writing logs to files. If this is a necessity in your logging setup for some reason, we recommend you to check Filebeat. It is a data shipper agent for forwarding logs to Logstash, Elasticsearch, etc.