Cloud-Native Forum Platform

Architected and engineered a fault-tolerant, cloud-native distributed forum platform built on Kotlin/Spring to demonstrate real-world patterns for scalable backend systems. The system employed Microservices, CQRS (Command Query Responsibility Segregation), and Event Sourcing (AXON Framework) to ensure high data consistency and horizontal scaling. Implemented Kubernetes orchestration, automated CI/CD (GitHub Actions), and production-grade features like centralized logging, full observability (Prometheus, Grafana, Kibana), and OAuth 2.0/RBAC identity management. The architecture featured advanced distributed patterns, including Replication, Load Balancing, Circuit Breakers, Service Discovery, and Service Registry.

  • University
  • Kotlin
  • Microservices
  • Distributed Systems
  • Spring
  • AXON Framework
  • Kubernetes
  • CQRS
  • Event Sourcing
  • Replication
  • OAuth 2.0
  • Observability
  • Infrastructure as Code
  • Github Actions
  • Google Cloud
2025-12-27 19:25
11 min read
ERSMS-24L

ERSMS#

Building scalable systems is one thing. Building them in a distributed, fault-tolerant, production-like environment - as a student project - is something entirely different. This project began as an ambitious challenge: create a modern forum platform that doesn’t just work, but actually demonstrates the same architectural principles used by today’s cloud-native companies.

What started as a simple idea quickly grew into a fully distributed ecosystem built around microservices, CQRS, event sourcing, observability, and Kubernetes orchestration. The goal wasn’t just to ship features - it was to understand how real systems behave under load, fail gracefully, recover automatically, and scale without friction.

Project Overview#

The platform is designed as a collection of independent microservices communicating through asynchronous messaging. Every major action-creating posts, registering users, updating threads-is captured as an event, stored reliably, and streamed to other services in real time. This event-driven design not only improves performance but also makes the system transparent and easy to audit.

On the read side, a dedicated query service maintains optimized materialized views. This is the CQRS pattern in action: commands and queries split into separate models, each specialized for its own task. It makes the platform feel responsive and ensures consistent performance even as the dataset grows.

Cloud-Native Foundations#

From the beginning, the goal was to treat infrastructure as a first-class part of the project. All services run in Kubernetes, benefiting from containerized deployments, automatic restarts, horizontal scaling, and declarative configuration. The environment mirrors real-world DevOps setups, making the system resistant to failure and easy to extend.

Security is handled using OAuth2, enabling safe authentication flows and clear separation between public and protected endpoints. Combined with API gateways and strict service boundaries, the platform follows production-grade security practices.

Source: https://docs.axoniq.io/reference-guide/architecture-overview
Source: https://docs.axoniq.io/reference-guide/architecture-overview

Observability & Operations#

A distributed system is only as good as its observability. That’s why the project includes a full monitoring stack: logs, metrics, dashboards, and request traces. This allowed us to track bottlenecks, detect failures, and watch services interact in real time. Debugging becomes not just easier-but almost enjoyable-when the system explains itself through dashboards and traces.

ELK Stack for gathering logs
ELK Stack for gathering logs
Gathering Statistics
Gathering Statistics

Automated CI/CD pipelines close the loop, ensuring consistent builds, tests, and deployments. Any change in the codebase flows through the pipeline and lands safely in the cluster without manual intervention.

Github Actions
Github Actions

Project in details#

Topic#

Forum-like application:

  • Users can create new threads, where they become a moderator.
  • Users create posts under multiple threads
  • Users can vote a post up or down.
  • Users can edit and delete posts they’ve created.
  • Moderators can remove posts from threads they moderate.
  • Moderators can ban users from posting under a thread they moderate.
  • Moderators can appoint and remove new moderators for threads they moderate.
  • Administrators have the same power as moderators on all threads.
  • Administrators can ban users from posting under every thread.

Microservices#

  • Business logic
  • Database (DBaaS)
  • Gateway
  • CQRS Event handler

Cache#

  • The posts microservice caches user account names, for which the accounts microservice is the authoritative data source.
  • The threads microservice caches first post content, for which the posts microservice is the authoritative data source.

Control version system usage, like git#

We are using github.

Commit Messages
Commit Messages

Automation in infrastructure build and and setup#

Github Actions
Github Actions
Github Actions - Build & Publish
Github Actions - Build & Publish

As a part of CI, we have prepared three pipelines:

  • Build - run every time when someone creates a pull requests and updates it
  • Build & Publish - responsible for building all java libraries, frontend and docker images, then publish all artifactas to the Github repository. Runs after merging pull request into main.
    Github Packages
    Github Packages
  • Update Gradle Wrapper - responsible for updating Gradle when a new version is released. Runs every week.

We have Dependabot which is responsible for updating all libraries. If it detects that a newer verison is available, it creates a pull request with the update. Runs every week.

Dependabot
Dependabot

At least three types of microservices#

  • Business logic
    • Accounts Service
    • Threads Service
    • Posts Service
  • Database (DBaaS)
    • MongoDB
    • Postgres (Keycloak)
  • Security Management
    • Keycloak - open-source identity and access managment solution
  • Gateway
    • Gateway Service
  • UI Server
    • Nginx - web server delivering static site content (HTML, CSS, and JavaScript files).
  • CQRS Event handler (Message Broker)
    • Axon Server - stores all events and monitors whether the events will be executed by service.

Monitoring of the service, collecting all the logs#

Monitoring:

  • Prometheus
    • Node Exporter - running on all of the nodes
    • Prometheus - server that
  • Grafana
    • statistics vizualization
Gathering Statistics
Gathering Statistics
Prometheus Dashboard
Prometheus Dashboard
Grafana Dashboard
Grafana Dashboard

Log aggregation:

  • File beat
    • running on all of the nodes
    • it is responsible for collecting logs from services running on a node
    • it sends the logs to the main logstash service
  • Log Stash
    • is responsible for log transformation, enrichement and then sending the result to the elasticsearch
  • Elasticsearch
    • database that stores all of the logs
    • allows for fast queries
  • Kibana
    • gui created for easy queries on elasticsearch
ELK Stack for gathering logs
ELK Stack for gathering logs

Elasticsearch:

{
"name": "elasticsearch-es-default-0",
"cluster_name": "elasticsearch",
"cluster_uuid": "qWXtWZ2qTiq5d2TbPFy3kQ",
"version": {
"number": "8.13.0",
"build_flavor": "default",
"build_type": "docker",
"build_hash": "09df99393193b2c53d92899662a8b8b3c55b45cd",
"build_date": "2024-03-22T03:35:46.757803203Z",
"build_snapshot": false,
"lucene_version": "9.10.0",
"minimum_wire_compatibility_version": "7.17.0",
"minimum_index_compatibility_version": "7.0.0"
},
"tagline": "You Know, for Search"
}

Kibana:

Kibana dashboard
Kibana dashboard

Local mirror#

  • The posts microservice caches user account names, for which the accounts microservice is the authoritative data source.
  • The threads microservice caches first post content, for which the posts microservice is the authoritative data source.

When an object is created or updated, to all listening services is sent a messeage in an event. Each service update their own data caches with received information.

To obtain authorative data, we must contact to the service that stores the data. We do this using a “query” message.

Source: https://docs.axoniq.io/reference-guide/architecture-overview
Source: https://docs.axoniq.io/reference-guide/architecture-overview

Authorization, session management and identities#

Authentication demo: Google Drive

To allow both the administrator login and identities federated via OAuth - we decided to integrate with Keycloak - an open source identity and access management solution that offers features like single sign-on (SSO), user federation, and social login, delegating the security responsibilities to production ready software reducing the need for custom security implementations.

Keycloak - Github integration
Keycloak - Github integration

After signing in, a JWT token is generated. Later the token is validated inside the backend services. The JWT token is validated against Keycloak JWT set containing the certificate used to sign the JWT token.

This approach could be safer if the certificate was manually copied e.g. during the deployment. In the used approach it is assumed that the Keycloak service is trusted.

Keycloak - roles
Keycloak - roles

Keycloak maps each user to authorization scopes which are later verified using Spring Security:

@PreAuthorize("hasAnyAuthority('${Scopes.USER.READ}')")

Keycloak uses OpenID Connect (OIDC) which is an identity layer built on top of OAuth 2.0. It is used for authentication, allowing clients to verify the identity of the end-user based on the authentication performed by an authorization server.

(Not yet implemented) When creating a new account (e.g. using social login) an event is sent to the backend to also create a new account inside the Accounts service. It is going to be implemented as a custom authentication flow.

User interface GUI#

Creating new thread:

Forum post
Forum post

Risk analysis with threat model of the service#

  • Broken Access Control
    • only registred users have access to our service
    • Keycloak enforces role-based access control (RBAC) to ensure proper authorization. It provides fine-grained access control policies for resource protection.
  • Cryptographic Failures
    • we enforce communitcation with tls 1.2+
  • Injection
    • we validate values and types of request, such as checking if UID is UID, if emails are in the correct format
    • insted of raw SQL, we use ORAM, JPA, Spring Data
    • on the frontend, we use security attributes to prevent XSS, such as InnterText and TextContent
    • Keycloak uses secure serialization and deserialization libraries and practices. It validates and sanitizes input during deserialization to prevent object injection attacks.
  • Insecure Design
    • Each component is isolated and only has access to required informations
  • Security Misconfiguration
    • Keycloak offers easy-to-configure security settings and provides default secure configurations. It regularly updates and patches security vulnerabilities, reducing the risk of misconfiguration.
  • Vulnerable and Outdated Components
    • we are using DependaBot which make pull request with update of an component
  • Identification and Authentication Failures
    • we enforce logging through other social portals (SSO), and we do not store passwords except for administratiors, who are manually created
  • Software and Data Integrity Failures
    • we use a dedicated repository and Docker itself ensures the consistency of runing images.
  • Security Logging and Monitoring Failures
    • logs are saved, and access to them is only for authenticated administrators
    • we do not store any passwords in logs
  • Server-Side Request Forgery
    • an app doesn’t make any request on behalf of the user and SSRF is not possible

Every service has own self signed certificate.

A certificates are created for this domain:

*.ersms-forum.svc.cluster.local

We only trust addresses from this domain.

Services that neeed another service have been configured to trust certificates of each service - their certificates have been added to the “Trust Store”.

Eg. MongoDB has a certificate added and services that use this datebase have it added as trusted.

Stress on reliability of the service#

Every service has two replicas in case one stops woking.

We are using Goolgle services for increased security and the option to select better machines.

Pods are located in different physical locations.

Grafana reliability stats
Grafana reliability stats

Technologies#

  • Kotlin w/ Spring & Axon
  • Github Actions - building, testing & publishing
  • Kubernetes

API Gateway

Accounts Service

Threads Service

Posts Service

Event Handler (Axon Server)

Event Store

Message Queue

Database

Architecture#

Use Cases#

  • use case
  • story boards
  • scenarios

User

Moderator

Administrator

Login

Registration

Post creation

Post modification

Commenting

Comment modification

Logical View#

  • end user / customer
  • class
  • object
  • package
  • composite
  • state machine

USER_EVENTS

User Query Service

USER_VIEW

string

id

string

name

string

email

User Query Service

Development View#

  • software engineer
  • manager
  • component

Axon Server

Event Store

User Events Store

Post Events Store

Reaction Events Store

Message Queue

Database

Reaction database

Handled events token store

Reaction Read Model - aggregate projection

Post database

Handled events token store

Post Read Model - aggregate projection

Users database

Handled events token store

User Read Model - aggregate projection

Reaction Service

Reaction Command Service

Reaction Query Service

Post Service

Post Command Service

Post Query Service

User Service

User Command Service

User Query Service

Gateway

Process View#

  • Solution architect
  • Sequence
  • Communication
  • Activity
  • Timing
  • Interaction overview

Micro-service main component communication

  • command handling
Event HandlersEvent Store / Message QueueAggregateControllerEvent HandlersEvent Store / Message QueueAggregateControllerUserCommand with modification requestCommand routed via Command GatewayFetch all the events with given aggregate idRecreate the aggregate state based on those eventsHandle provided command, it may generate eventsHandle new events in appropriate event sourcing handlers (modify aggregate state)Store generated valid eventsPass events to other event handlersNo response (Async) or return generated aggregate identifierReturn resultUser
  • query service updating the state
RepositoryProjectorToken StoreQueue ServiceEvent Store / Message QueueRepositoryProjectorToken StoreQueue ServiceEvent Store / Message QueueService has event handlers for given typeThis instance is responsible for handling event with that aggregate idPass events to all the registered event handlersFetch current state of View from databaseCurrent ViewUpdate view based on info from event (in case of failure service, events may be loaded multiple times)Save updated projection in databaseSuccessfull saveEvent handledSuccessfully handled event with given id
  • querying the query service
RepositoryProjectorControllerRepositoryProjectorControllerUserQuery requestSend query via query gateway to all the registered QueryHandlersFetch appropriate projection using provided queryReturn projectionReturn resultReturn resultUser

Psychical View#

  • Software architect
  • Deployment

Almost deployment view

Axon Server

Database

Reaction Service

Post Service

User Service

Event Store

Command

Query

Result

Reaction database

Handled events token store

Reaction Read Model - aggregate projection

Post database

Handled events token store

Post Read Model - aggregate projection

Users database

Handled events token store

User Read Model - aggregate projection

UI

Gateway

User Command Service

User Query Service

Post Command Service

Post Query Service

Reaction Command Service

Reaction Query Service

User Events Store

Post Events Store

Reaction Events Store

Message Queue