WarCrawler: Building an Autonomous OSINT Intelligence System for Real-Time Conflict Monitoring

Author: Amin — AI Automation & Infrastructure Architect
Published on: aminet.ai
Tags: OSINT automation, autonomous intelligence system, conflict monitoring, AI infrastructure, data engineering, real-time intelligence pipeline, geopolitical risk analysis, Telegram bot integration

TL;DR

WarCrawler is an autonomous open-source intelligence (OSINT) system that crawls 18 curated feeds across three active conflict theaters, extracts structured intelligence using large language models, and delivers scored, labeled alerts to Telegram — replacing manual monitoring with a disciplined, always-on analytical pipeline.

The Problem: Signal Buried Under Noise

Every active conflict zone generates an overwhelming volume of information. Across Russia-Ukraine, Iran-Israel-USA escalation cycles, and their cascading economic aftershocks, critical developments scatter across dozens of sources daily. Most of it is repetitive, speculative, or buried under irrelevant content.

Manual monitoring does not scale. It cannot consistently answer the operational questions that matter: What actually happened today? What is confirmed versus claimed? What are the downstream effects on energy markets, shipping lanes, and investor sentiment? What should I be watching over the next 72 hours?

The core problem was never a lack of information. It was the absence of structured signal extraction — too much input with no systematic way to separate fact from noise, tag urgency, and trace consequences from the battlefield to the global economy.

WarCrawler was built to solve exactly that.

System Architecture: Two-Layer Intelligence Model

The entire system is organized around a two-layer conceptual framework that separates battlefield events from their global ripple effects.

Layer 1 — Direct Conflict Intelligence

This layer captures everything happening on or near the battlefield: military strikes, drone and missile events, air-defense activations, naval incidents, force posture changes, official military statements, sanctions announcements, and escalation or de-escalation indicators. It answers the question: What happened on the ground?

Layer 2 — Global Side-Effects Tracking

This layer captures everything that radiates outward from the conflict zone into the wider world: oil-price shocks, shipping and maritime risk (especially around the Strait of Hormuz), airspace disruptions, supply-chain stress, inflation pressure, insurance cost spikes, investor sentiment shifts, and regional macroeconomic deterioration.

The architectural insight behind WarCrawler is that conflict events do not stay contained. A missile exchange becomes an energy-market story. A naval threat becomes a shipping-insurance story. A regional escalation becomes an inflation and growth story across Europe, the Gulf, the United States, and beyond. The system was designed to trace those causal chains automatically.

Three Monitored Conflict Theaters

WarCrawler scopes its monitoring to three theaters, two active conflicts and one cross-cutting economic track, each feeding into both intelligence layers:

Russia-Ukraine — ground war kinetic events, force movements, energy-market consequences, and grain supply disruptions.
Iran-Israel-USA — direct confrontation dynamics including strikes, retaliation cycles, posturing, and the risk of wider regional escalation.
Economic Fallout — the combined downstream damage from both theaters: energy shocks, sanctions regimes, trade disruption, and global sentiment deterioration.

Source Architecture: 18 Curated OSINT Feeds

Rather than mass-scraping the open web, WarCrawler favors narrow, high-credibility input. The system monitors 18 open-source intelligence feeds selected for reliability and coverage, including:

Military and conflict reporting: ISW situation reports, CENTCOM press releases, ACLED conflict event tracking, GDELT global event data
Major news outlets: Reuters, BBC, Al Jazeera
Economic and energy indicators: Sentix economic sentiment, IEA reports, IMF data, OFAC sanctions feeds

This approach mirrors how a disciplined conflict analyst works — limited, trusted sources rather than an uncontrolled firehose of unverified data. The system behaves like a careful analyst with automation support, not like an aggressive web crawler.

The Data Pipeline: From Raw Feed to Structured Intelligence

The WarCrawler extraction pipeline runs autonomously with no manual trigger:

Wake Cycle — The system activates every 5 minutes on an autonomous schedule.
Fetch — All 18 feeds are crawled for new content.
Filter — Noise, duplicates, and irrelevant material are removed.
Extract — Relevant documents pass through an LLM (GLM-4 Flash) for structured extraction.
Structure — Each extracted event is tagged with: conflict theater, urgency score (1–10), geolocation, actor names, and a structured summary.
Route — Events scoring urgency 7 or above trigger an immediate Telegram alert. Everything else accumulates for scheduled reporting.

What arrives in Telegram is not raw news. It is extracted, structured, scored intelligence — ready for decision support.
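
The Structure and Route steps above can be sketched as follows. This is a minimal illustration rather than the production code: the fields mirror the tags listed in the Structure step, while the class name ExtractedEvent, the function route, and the two channel labels are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field

# Urgency 7 or above triggers an immediate alert (threshold taken from the article).
ALERT_THRESHOLD = 7


@dataclass
class ExtractedEvent:
    """One structured event. Field names are illustrative assumptions."""
    theater: str
    urgency: int  # 1-10 scale
    location: str
    actors: list[str] = field(default_factory=list)
    summary: str = ""


def route(event: ExtractedEvent) -> str:
    """Return the delivery path for an event based on its urgency score."""
    if event.urgency >= ALERT_THRESHOLD:
        return "immediate_alert"
    return "scheduled_report"


# Hypothetical example events, not real extracted data.
strike = ExtractedEvent("Russia-Ukraine", 8, "Kharkiv", ["example actor"], "example summary")
price_note = ExtractedEvent("Economic Fallout", 4, "Global")
```

With these example values, route(strike) selects the immediate alert path and route(price_note) defers to scheduled reporting.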

Tech Stack

The production infrastructure runs on a VPS (Ubuntu 24.04) with a microservices-oriented stack:

Task orchestration: Celery workers with beat scheduler for autonomous wake cycles
Primary database: PostgreSQL for structured event storage
Caching and message broker: Redis
Full-text search: OpenSearch for querying across ingested intelligence
Vector storage: Qdrant for semantic similarity and deduplication
Object storage: MinIO for raw document archival
Delivery: Telegram Bot API for alert routing
LLM extraction: GLM-4 Flash for structured intelligence parsing

Delivery System: Three Channels, Three Cadences

WarCrawler delivers intelligence through three distinct channels, each serving a different operational need:

Immediate Alerts (Real-Time)

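Any event the extraction pipeline scores at urgency 7 or above is pushed to Telegram as soon as it is extracted, without waiting for a scheduled report. Because collection runs on a 5-minute wake cycle, an alert can trail its source publication by roughly that interval plus processing time. This covers breaking military strikes, major escalation events, sudden energy-market shocks, and critical diplomatic developments.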

Daily Digest (08:00 UTC)

A structured morning brief organized into three sections: kinetic events in Ukraine, kinetic events in the Middle East, and economic shock indicators. This provides a clean daily operational picture without requiring manual browsing.

Weekly Strategic Brief (Monday 09:00 UTC)

A synthesis report covering the week's patterns — escalation trajectories, recurring themes, trend lines, and emerging risks. This is the layer that converts daily event tracking into strategic understanding and forward-looking risk assessment.
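
The two scheduled cadences reduce to a small piece of calendar logic. The sketch below uses only the standard library to show that logic. In the production stack this scheduling is handled by Celery beat, whose configuration is not shown in this article, and the function name due_reports and the report labels are assumptions made for the example.

```python
from datetime import datetime, timezone


def due_reports(now: datetime) -> list[str]:
    """Return the scheduled reports due at `now`, at minute resolution.

    Expects a timezone-aware datetime; it is converted to UTC before checking.
    """
    now = now.astimezone(timezone.utc)
    due = []
    # Daily digest: every day at 08:00 UTC.
    if now.hour == 8 and now.minute == 0:
        due.append("daily_digest")
    # Weekly strategic brief: Monday at 09:00 UTC (weekday() is 0 on Monday).
    if now.weekday() == 0 and now.hour == 9 and now.minute == 0:
        due.append("weekly_brief")
    return due
```

Immediate alerts are deliberately absent here: they are driven by the urgency score of each event, not by the clock.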

Output Structure: Decision-Grade Intelligence

Each reporting cycle produces a standardized set of structured outputs designed to answer specific operational questions:

Direct conflict snapshot — what happened on the battlefield
Global side-effects snapshot — what rippled outward economically and logistically
Strongest verified findings — confirmed, source-grounded facts only
Cross-source pattern detection — themes appearing across multiple independent feeds
Disputed or uncertain claims — clearly labeled as unverified, never mixed with confirmed intelligence
Risk scorecard — urgency and severity ratings per theater
Next-step monitoring targets — what to watch over the next 24–72 hours
Executive brief — a concise summary suitable for decision support or content creation

Safety and Control Architecture

Reliability and discipline are built into the system at every level:

Rate limiting on both Telegram alert frequency and LLM API calls to prevent spam and cost overruns
Dead letter queue for any document the pipeline cannot process — nothing silently disappears
Operator alerts if any component fails — the system reports its own breakdowns
No uncontrolled looping — execution gathers evidence, summarizes, and terminates cleanly
Narrow mission scope — a defined source list with no uncontrolled scraping
Credibility-first prioritization — official sources and top-tier reporting ranked above weak claims
Explicit uncertainty handling — verified facts and unconfirmed claims are structurally separated at every stage
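
The dead letter queue idea from the list above can be illustrated in a few lines. This is a simplified in-memory sketch under stated assumptions: the function name process_batch, the handler signature, and the record layout are hypothetical, and a real deployment would persist failed documents rather than hold them in a list.

```python
from typing import Callable


def process_batch(
    docs: list[str],
    handler: Callable[[str], dict],
    dead_letters: list[dict],
) -> list[dict]:
    """Run handler over docs; park failures in dead_letters instead of dropping them."""
    processed = []
    for doc in docs:
        try:
            processed.append(handler(doc))
        except Exception as exc:
            # Keep the document and the failure reason so an operator can inspect it.
            dead_letters.append({"doc": doc, "error": repr(exc)})
    return processed


def toy_handler(doc: str) -> dict:
    """Stand-in for the extraction step; rejects empty documents."""
    if not doc.strip():
        raise ValueError("empty document")
    return {"length": len(doc)}
```

The design point is that a failing document never aborts the batch and never vanishes: it ends up either in the processed list or in the dead letter list.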

Operating Philosophy

WarCrawler follows a strict operational doctrine:

Fewer high-quality findings over high-volume noise. Every output is curated, not dumped.
Fact and rumor are separated at every stage. The system never presents claims as confirmed intelligence.
Uncertainty is labeled explicitly. If a finding is unverified, it is marked as such — always.
Repeatability is a design goal. Every run produces a consistent, comparable output structure.
Events are connected to consequences. The system always traces the line from kinetic event → economic impact → global narrative shift.
Discipline over volume. The system behaves like a careful analyst, not a chaotic aggregator.

What WarCrawler Is Not

Clear boundaries define what the system is explicitly designed to avoid:

Not a propaganda generation tool
Not a speculative war-prediction engine
Not an unrestricted mass web scraper
Not reliant on weak rumor sources as primary evidence
Not a replacement for expert geopolitical judgment
Not a military targeting, operational harm, or surveillance tool

Its purpose is structured public-information monitoring and open-source intelligence analysis only.

Lessons Learned: Building Autonomous Intelligence Infrastructure

Building WarCrawler surfaced several engineering and architectural insights that apply broadly to autonomous AI systems:

Celery beat scheduling requires careful task-name alignment. When the task name in the beat configuration does not match the name registered on the worker, nothing fails loudly: beat keeps dispatching on schedule, the worker discards each message as an unregistered task, and the task simply never executes. This was one of the most time-consuming debugging cycles in the project.
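
One way to catch this class of bug early is a startup check that compares the task names in the schedule against the names the worker knows. The sketch below is a plain-Python illustration of that check, not the project's actual code. The schedule dictionary follows the usual shape of a Celery beat schedule (entry name mapped to a dict with a task key), but the task names and the helper find_unregistered are hypothetical. In a real deployment the registered set could be read from the Celery application's task registry instead of being hard-coded.

```python
def find_unregistered(
    beat_schedule: dict[str, dict], registered: set[str]
) -> dict[str, str]:
    """Map schedule entry name -> task name for entries no worker has registered."""
    return {
        entry: cfg["task"]
        for entry, cfg in beat_schedule.items()
        if cfg["task"] not in registered
    }


# Hypothetical schedule: a 5-minute wake cycle and a daily digest.
BEAT_SCHEDULE = {
    "wake-cycle": {"task": "warcrawler.tasks.crawl_feeds", "schedule": 300.0},
    "daily-digest": {"task": "warcrawler.tasks.daily_digest", "schedule": 86400.0},
}

# Hypothetical worker registry with one name registered under a different module path.
REGISTERED = {"warcrawler.tasks.crawl_feeds", "tasks.daily_digest"}

mismatches = find_unregistered(BEAT_SCHEDULE, REGISTERED)
```

Here mismatches flags the daily-digest entry, because the schedule refers to warcrawler.tasks.daily_digest while the worker only knows tasks.daily_digest. Failing fast on a non-empty result turns a quiet scheduling gap into a visible startup error.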

LLM extraction pipelines need structured output contracts. Without a rigid output schema enforced at the prompt level, LLM responses drift in format across runs, breaking downstream consumers. Defining explicit JSON schemas for extraction output was critical for pipeline stability.
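
A contract of this kind is only useful if the pipeline checks every response against it. The following is a minimal standard-library sketch of such a check, assuming the extraction output is a JSON object with the fields listed earlier in this article. The exact field names, the REQUIRED_FIELDS table, and the function parse_extraction are illustrative assumptions, not the production schema.

```python
import json

# Assumed contract: field name -> expected Python type after JSON decoding.
REQUIRED_FIELDS = {
    "theater": str,
    "urgency": int,
    "location": str,
    "actors": list,
    "summary": str,
}


def parse_extraction(raw: str) -> dict:
    """Parse an LLM response and raise ValueError if it violates the contract."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for name, expected in REQUIRED_FIELDS.items():
        if name not in data:
            raise ValueError(f"missing field: {name}")
        value = data[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            raise ValueError(f"field {name} has wrong type: {type(value).__name__}")
    if not 1 <= data["urgency"] <= 10:
        raise ValueError(f"urgency out of range: {data['urgency']}")
    return data
```

A response that fails this check can be retried or sent to the dead letter queue, so a drifting format never reaches downstream consumers.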

Deduplication is harder than it looks. Across 18 feeds covering overlapping events, semantic deduplication (via Qdrant vector similarity) was essential. Exact-match deduplication alone missed paraphrased duplicates from different sources covering the same incident.
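
The principle behind semantic deduplication can be shown without a vector database. The sketch below swaps Qdrant for a brute-force in-memory comparison and uses toy three-dimensional vectors in place of real embeddings. The 0.9 similarity threshold, the document identifiers, and the function names are assumptions for illustration only.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def deduplicate(
    items: list[tuple[str, list[float]]], threshold: float = 0.9
) -> list[str]:
    """Keep the first document of each group of near-duplicates, in input order."""
    kept: list[tuple[str, list[float]]] = []
    for doc_id, vec in items:
        # A document survives only if it is dissimilar to everything kept so far.
        if all(cosine(vec, kept_vec) < threshold for _, kept_vec in kept):
            kept.append((doc_id, vec))
    return [doc_id for doc_id, _ in kept]


# Toy vectors: the first two stand for paraphrased reports of the same incident.
ITEMS = [
    ("feed-a-1", [1.0, 0.0, 0.1]),
    ("feed-b-1", [0.98, 0.05, 0.12]),
    ("feed-c-1", [0.0, 1.0, 0.0]),
]
```

Calling deduplicate(ITEMS) keeps feed-a-1 and feed-c-1 and drops feed-b-1, which an exact-match comparison would have let through. The brute-force loop compares every new document against all kept ones, which is the cost a vector index is meant to avoid at scale.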

Rate limiting is a first-class architectural concern, not an afterthought. Both the LLM API and the Telegram Bot API have strict rate limits that, if hit, cascade into missed alerts and stale intelligence. Building rate-aware task scheduling into the Celery worker logic from day one prevented downstream failures.
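
A token bucket is one common way to make a sender rate-aware. The sketch below is a generic implementation, not the scheduling logic actually used inside the Celery workers. The class name, the rate and capacity values, and the injectable clock are assumptions, and the numbers do not represent the real limits of Telegram or of any LLM provider.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(
        self,
        rate: float,
        capacity: int,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)  # start full
        self.clock = clock
        self.updated = clock()

    def try_acquire(self) -> bool:
        """Consume one token if available; return False when the caller should wait."""
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A worker would call try_acquire before each outbound request and requeue or delay the task when it returns False, so bursts are smoothed out before the remote API starts rejecting calls.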

From Monitoring Tool to Geopolitical Observatory

The end state of WarCrawler extends beyond daily monitoring. Over time, the accumulated structured data enables pattern detection across longer time horizons: whether tensions are escalating or de-escalating, whether shipping risk is spreading geographically, whether oil-price reactions are temporary spikes or structural shifts, whether sanctions regimes are tightening, and whether economic sentiment is weakening across multiple regions simultaneously.

This transforms WarCrawler from a real-time alert system into a compact geopolitical and macro-impact observatory — a durable framework for watching how regional conflict reshapes markets, logistics, sentiment, and global risk perception.

Architecture Summary

Layer	Function	Components
Source Layer	Controlled, credible input	18 curated OSINT feeds — military, news, economic
Crawl Layer	Continuous unattended collection	5-minute autonomous wake cycle via Celery beat
Extraction Layer	Raw content → structured intelligence	GLM-4 Flash with theater/urgency/actor tagging
Conflict Layer	Direct battlefield awareness	Strikes, retaliation, posture, sanctions, escalation signals
Side-Effects Layer	Downstream consequence tracking	Oil, shipping, airspace, inflation, markets, sentiment
Delivery Layer	Right information at the right time	Telegram: instant alerts, daily digest, weekly strategic brief
Control Layer	Reliability and safety	Rate limits, dead letter queue, operator alerts
Analysis Layer	Decision-grade intelligence	Risk scorecards, trend synthesis, uncertainty labeling

About

WarCrawler was designed and built by Amin, an AI Automation & Infrastructure Architect based in Dubai, specializing in autonomous intelligence systems, multi-agent orchestration, and production-grade AI infrastructure for the enterprise.

More projects and technical writing at aminet.ai.

Keywords: OSINT automation, autonomous intelligence system, conflict monitoring pipeline, real-time geopolitical risk analysis, open-source intelligence engineering, AI-powered data extraction, structured intelligence delivery, Telegram bot integration, Celery task orchestration, multi-source NLP pipeline, war intelligence automation, situational awareness system, defense intelligence infrastructure, economic impact monitoring, data engineering architecture, autonomous enterprise systems
