Preq Docs
preq
(prounounced "preek") is a free and open community-driven reliability problem detector. Use preq
to:
- detect the latest bugs, misconfigurations, anti-patterns, and known issues from a community of practitioners
- provide engineers, on-call support, and SRE agents with impact and community recommended mitigations
- hunt for new problems in logs
preq
is powered by Common Reliability Enumerations (CREs) that are contributed by the community and Prequel's Reliability Research Team. Reliability intelligence helps teams see a broad range of problems earlier, so they can prioritize, pinpoint, and reduce the risk of outages.
Install
Download the latest release on Github.
Follow the easy installation instructions and learn about CREs and the simple problem detection syntax.
Play
Try the playground with schema validation. It uses WebAssembly and runs 100% in your browser. No data leaves it!
How it Works
preq
can run community CREs on any data source with a timestamp and line delimiter.
Example CRE Rule
preq
is powered by a rules engine that performs matching and correlation of sequences of events to detect reliability problems. CREs provides accurate and timely context for a human or SRE agent to take action on problems.
Below is simple rule that looks for a sequence of events in a single log source over a window of time along with a negative condition (an event that should not occur during the window).
rules:
- cre:
id: CRE-2024-0007
severity: 0
title: RabbitMQ Mnesia overloaded recovering persistent queues
category: message-queue-problems
author: Prequel
description: |
- The RabbitMQ cluster is processing a large number of persistent mirrored queues at boot.
cause: |
- The Erlang process, Mnesia, is overloaded while recovering persistent queues on boot.
impact: |
- RabbitMQ is unable to process any new messages and can cause outages in consumers and producers.
tags:
- cre-2024-0007
- known-problem
- rabbitmq
mitigation: |
- Adjusting mirroring policies to limit the number of mirrored queues
- Remove high-availability policies from queues
- Add additional CPU resources and restart the RabbitMQ cluster
- Use [lazy queues](https://www.rabbitmq.com/docs/lazy-queues) to avoid incurring the costs of writing data to disk
references:
- https://groups.google.com/g/rabbitmq-users/c/ekV9tTBRZms/m/1EXw-ruuBQAJ
applications:
- name: "rabbitmq"
version: "3.9.x"
metadata:
kind: prequel
id: 5UD1RZxGC5LJQnVpAkV11A
generation: 1
rule:
sequence:
window: 30s
event:
source: rabbitmq
order:
- regex: "Discarding message(.+)in an old incarnation(.+)of this node"
- "Mnesia is overloaded"
negate:
- "SIGTERM received - shutting down"