WarCrawler: Building an Autonomous OSINT Intelligence System for Real-Time Conflict Monitoring
Author: Amin — AI Automation & Infrastructure Architect
Published on: aminet.ai
Tags: OSINT automation, autonomous intelligence system, conflict monitoring, AI infrastructure, data engineering, real-time intelligence pipeline, geopolitical risk analysis, Telegram bot integration
TL;DR
WarCrawler is an autonomous open-source intelligence (OSINT) system that crawls 18 curated feeds across three active conflict theaters, extracts structured intelligence using large language models, and delivers scored, labeled alerts to Telegram — replacing manual monitoring with a disciplined, always-on analytical pipeline.
The Problem: Signal Buried Under Noise
Every active conflict zone generates an overwhelming volume of information. Between the Russia-Ukraine war, the Iran-Israel-USA escalation cycles, and their cascading economic aftershocks, critical developments are scattered across dozens of sources every day, and most of what gets published is repetitive, speculative, or irrelevant.
Manual monitoring does not scale. It cannot consistently answer the operational questions that matter: What actually happened today? What is confirmed versus claimed? What are the downstream effects on energy markets, shipping lanes, and investor sentiment? What should I be watching over the next 72 hours?
The core problem was never a lack of information. It was the absence of structured signal extraction — too much input with no systematic way to separate fact from noise, tag urgency, and trace consequences from the battlefield to the global economy.
WarCrawler was built to solve exactly that.
System Architecture: Two-Layer Intelligence Model
The entire system is organized around a two-layer conceptual framework that separates battlefield events from their global ripple effects.
Layer 1 — Direct Conflict Intelligence
This layer captures everything happening on or near the battlefield: military strikes, drone and missile events, air-defense activations, naval incidents, force posture changes, official military statements, sanctions announcements, and escalation or de-escalation indicators. It answers the question: What happened on the ground?
Layer 2 — Global Side-Effects Tracking
This layer captures everything that radiates outward from the conflict zone into the wider world: oil-price shocks, shipping and maritime risk (especially around the Strait of Hormuz), airspace disruptions, supply-chain stress, inflation pressure, insurance cost spikes, investor sentiment shifts, and regional macroeconomic deterioration.
The architectural insight behind WarCrawler is that conflict events do not stay contained. A missile exchange becomes an energy-market story. A naval threat becomes a shipping-insurance story. A regional escalation becomes an inflation and growth story across Europe, the Gulf, the United States, and beyond. The system was designed to trace those causal chains automatically.
Three Monitored Conflict Theaters
WarCrawler scopes its monitoring to three active theaters, each feeding into both intelligence layers:
- Russia-Ukraine — ground war kinetic events, force movements, energy-market consequences, and grain supply disruptions.
- Iran-Israel-USA — direct confrontation dynamics including strikes, retaliation cycles, posturing, and the risk of wider regional escalation.
- Economic Fallout — the combined downstream damage from both theaters: energy shocks, sanctions regimes, trade disruption, and global sentiment deterioration.
Source Architecture: 18 Curated OSINT Feeds
Rather than mass-scraping the open web, WarCrawler operates on a philosophy of narrow but high-credibility input. The system monitors 18 selected open-source intelligence feeds chosen for reliability and coverage:
- Military and conflict reporting: ISW situation reports, CENTCOM press releases, ACLED conflict event tracking, GDELT global event data
- Major news outlets: Reuters, BBC, Al Jazeera
- Economic and energy indicators: Sentix economic sentiment, IEA reports, IMF data, OFAC sanctions feeds
This approach mirrors how a disciplined conflict analyst works — limited, trusted sources rather than an uncontrolled firehose of unverified data. The system behaves like a careful analyst with automation support, not like an aggressive web crawler.
The Data Pipeline: From Raw Feed to Structured Intelligence
The WarCrawler extraction pipeline runs autonomously with no manual trigger:
- Wake Cycle — The system activates every 5 minutes on an autonomous schedule.
- Fetch — All 18 feeds are crawled for new content.
- Filter — Noise, duplicates, and irrelevant material are removed.
- Extract — Relevant documents pass through an LLM (GLM-4 Flash) for structured extraction.
- Structure — Each extracted event is tagged with: conflict theater, urgency score (1–10), geolocation, actor names, and a structured summary.
- Route — Events scoring urgency 7 or above trigger an immediate Telegram alert. Everything else accumulates for scheduled reporting.
What arrives in Telegram is not raw news. It is extracted, structured, scored intelligence — ready for decision support.
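The Structure and Route steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `ExtractedEvent`, the field names, and the `route` function are hypothetical and not taken from the WarCrawler codebase. Only the tagged attributes and the urgency threshold of 7 come from the description above.

```python
from dataclasses import dataclass

# Threshold from the pipeline description: urgency 7 or above alerts immediately.
IMMEDIATE_ALERT_THRESHOLD = 7


@dataclass(frozen=True)
class ExtractedEvent:
    """One structured event produced by the Extract and Structure steps (hypothetical shape)."""
    theater: str              # e.g. "russia-ukraine"
    urgency: int              # 1 to 10
    location: str
    actors: tuple[str, ...]
    summary: str

    def __post_init__(self) -> None:
        # Reject scores outside the documented 1 to 10 range.
        if not 1 <= self.urgency <= 10:
            raise ValueError(f"urgency must be between 1 and 10, got {self.urgency}")


def route(event: ExtractedEvent) -> str:
    """Return the delivery path for an event: 'immediate' or 'scheduled'."""
    if event.urgency >= IMMEDIATE_ALERT_THRESHOLD:
        return "immediate"
    return "scheduled"
```

Keeping the threshold in one named constant means the alerting rule can be tuned without touching the routing logic.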
Tech Stack
The production infrastructure runs on a VPS (Ubuntu 24.04) with a microservices-oriented stack:
- Task orchestration: Celery workers with beat scheduler for autonomous wake cycles
- Primary database: PostgreSQL for structured event storage
- Caching and message broker: Redis
- Full-text search: OpenSearch for querying across ingested intelligence
- Vector storage: Qdrant for semantic similarity and deduplication
- Object storage: MinIO for raw document archival
- Delivery: Telegram Bot API for alert routing
- LLM extraction: GLM-4 Flash for structured intelligence parsing
Delivery System: Three Channels, Three Cadences
WarCrawler delivers intelligence through three distinct channels, each serving a different operational need:
Immediate Alerts (Real-Time)
Any event that the extraction pipeline scores at urgency 7 or above is pushed to Telegram as soon as it is extracted, without waiting for a scheduled report. This covers breaking military strikes, major escalation events, sudden energy-market shocks, and critical diplomatic developments.
Daily Digest (08:00 UTC)
A structured morning brief organized into three sections: kinetic events in Ukraine, kinetic events in the Middle East, and economic shock indicators. This provides a clean daily operational picture without requiring manual browsing.
Weekly Strategic Brief (Monday 09:00 UTC)
A synthesis report covering the week's patterns — escalation trajectories, recurring themes, trend lines, and emerging risks. This is the layer that converts daily event tracking into strategic understanding and forward-looking risk assessment.
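The two scheduled cadences above reduce to a pair of simple timing rules. The sketch below expresses them as a plain function so the rules are easy to read and test. The function name `reports_due` and the report labels are hypothetical. In a Celery deployment the same cadence would more naturally live in the beat schedule as cron-style entries, which this sketch does not attempt to reproduce.

```python
from datetime import datetime, timezone


def reports_due(now: datetime) -> list[str]:
    """Return the scheduled reports due at this minute, evaluated in UTC."""
    if now.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    now = now.astimezone(timezone.utc)

    due = []
    # Daily digest: every day at 08:00 UTC.
    if now.hour == 8 and now.minute == 0:
        due.append("daily_digest")
    # Weekly strategic brief: Monday at 09:00 UTC (weekday() returns 0 for Monday).
    if now.weekday() == 0 and now.hour == 9 and now.minute == 0:
        due.append("weekly_brief")
    return due
```

Converting to UTC before comparing keeps the schedule stable regardless of the server's local timezone.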
Output Structure: Decision-Grade Intelligence
Each reporting cycle produces a standardized set of structured outputs designed to answer specific operational questions:
- Direct conflict snapshot — what happened on the battlefield
- Global side-effects snapshot — what rippled outward economically and logistically
- Strongest verified findings — confirmed, source-grounded facts only
- Cross-source pattern detection — themes appearing across multiple independent feeds
- Disputed or uncertain claims — clearly labeled as unverified, never mixed with confirmed intelligence
- Risk scorecard — urgency and severity ratings per theater
- Next-step monitoring targets — what to watch over the next 24–72 hours
- Executive brief — a concise summary suitable for decision support or content creation
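Of the outputs listed above, cross-source pattern detection is the most mechanical to illustrate. The following is a minimal sketch, assuming each finding has already been reduced to a `(theme, source)` pair. The function name, the input shape, and the default of two sources are assumptions made for illustration, not details confirmed by the system's design.

```python
from collections import defaultdict


def cross_source_themes(findings, min_sources=2):
    """Return {theme: sorted source names} for themes seen in at least min_sources distinct feeds."""
    sources_by_theme = defaultdict(set)
    for theme, source in findings:
        # A set ignores repeat reports from the same feed.
        sources_by_theme[theme].add(source)
    return {
        theme: sorted(sources)
        for theme, sources in sources_by_theme.items()
        if len(sources) >= min_sources
    }
```

Counting distinct feeds rather than raw mentions prevents one prolific source from manufacturing a "pattern" on its own.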
Safety and Control Architecture
Reliability and discipline are built into the system at every level:
- Rate limiting on both Telegram alert frequency and LLM API calls to prevent spam and cost overruns
- Dead letter queue for any document the pipeline cannot process — nothing silently disappears
- Operator alerts if any component fails — the system reports its own breakdowns
- No uncontrolled looping — execution gathers evidence, summarizes, and terminates cleanly
- Narrow mission scope — a defined source list with no uncontrolled scraping
- Credibility-first prioritization — official sources and top-tier reporting ranked above weak claims
- Explicit uncertainty handling — verified facts and unconfirmed claims are structurally separated at every stage
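The dead letter queue in the list above follows a common pattern: catch the failure, record the document together with the error, and keep going. Here is a minimal in-memory sketch of that idea. The function name `process_batch` and the record layout are hypothetical, and a production system would persist the dead letters rather than hold them in a list.

```python
def process_batch(documents, handler, dead_letters):
    """Run handler on each document; park failures in dead_letters instead of dropping them."""
    processed = []
    for doc in documents:
        try:
            processed.append(handler(doc))
        except Exception as exc:  # broad on purpose: every failure must be recorded
            dead_letters.append({"document": doc, "error": repr(exc)})
    return processed
```

Because one bad document no longer aborts the batch, the remaining documents are still processed, and the stored error makes the failure inspectable later.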
Operating Philosophy
WarCrawler follows a strict operational doctrine:
- Fewer high-quality findings over high-volume noise. Every output is curated, not dumped.
- Fact and rumor are separated at every stage. The system never presents claims as confirmed intelligence.
- Uncertainty is labeled explicitly. If a finding is unverified, it is marked as such — always.
- Repeatability is a design goal. Every run produces a consistent, comparable output structure.
- Events are connected to consequences. The system always traces the line from kinetic event → economic impact → global narrative shift.
- Discipline over volume. The system behaves like a careful analyst, not a chaotic aggregator.
What WarCrawler Is Not
Clear boundaries define what the system is explicitly designed to avoid:
- Not a propaganda generation tool
- Not a speculative war-prediction engine
- Not an unrestricted mass web scraper
- Not reliant on weak rumor sources as primary evidence
- Not a replacement for expert geopolitical judgment
- Not a military targeting, operational harm, or surveillance tool
Its purpose is structured public-information monitoring and open-source intelligence analysis only.
Lessons Learned: Building Autonomous Intelligence Infrastructure
Building WarCrawler surfaced several engineering and architectural insights that apply broadly to autonomous AI systems:
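Celery beat scheduling requires careful task-name alignment. When the task name in the beat configuration does not match the name under which the worker registered the task, nothing visibly breaks: beat keeps publishing the task on schedule, and the worker discards each message as an unregistered task, so the job simply never executes. Tracking this down was one of the most time-consuming debugging cycles in the project.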
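A cheap guard against this class of bug is to compare the scheduled task names with the registered ones at startup. The sketch below works on plain data so it runs without Celery installed. It assumes the schedule has the usual `beat_schedule` shape (entry name mapped to a dict with a `"task"` key). The function name and the example task names are hypothetical.

```python
def find_unregistered_tasks(beat_schedule: dict, registered: set[str]) -> list[str]:
    """Return the schedule entries whose 'task' name is not registered on the worker."""
    return sorted(
        entry_name
        for entry_name, entry in beat_schedule.items()
        if entry["task"] not in registered
    )
```

In a real deployment the registered names would likely come from the Celery application's task registry, and a non-empty result could fail a startup check or trigger an operator alert.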
LLM extraction pipelines need structured output contracts. Without a rigid output schema enforced at the prompt level, LLM responses drift in format across runs, breaking downstream consumers. Defining explicit JSON schemas for extraction output was critical for pipeline stability.
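To make the idea of an output contract concrete, here is a minimal sketch that validates a single LLM response using only the standard library. The field names mirror the attributes tagged in the Structure step, but the exact keys, the `REQUIRED_FIELDS` table, and the function name are assumptions for illustration rather than the schema WarCrawler actually uses.

```python
import json

# Hypothetical contract: field name -> required Python type after JSON parsing.
REQUIRED_FIELDS = {
    "theater": str,
    "urgency": int,
    "location": str,
    "actors": list,
    "summary": str,
}


def parse_extraction(raw: str) -> dict:
    """Parse one LLM response and reject anything that breaks the contract."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on non-JSON text
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for name, expected in REQUIRED_FIELDS.items():
        if name not in data:
            raise ValueError(f"missing field: {name}")
        # bool is a subclass of int in Python, so reject booleans explicitly.
        if isinstance(data[name], bool) or not isinstance(data[name], expected):
            raise ValueError(f"field {name} should be of type {expected.__name__}")
    if not 1 <= data["urgency"] <= 10:
        raise ValueError("urgency must be between 1 and 10")
    return data
```

Validating at the pipeline boundary means a malformed response fails loudly at one point instead of corrupting downstream consumers.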
Deduplication is harder than it looks. Across 18 feeds covering overlapping events, semantic deduplication (via Qdrant vector similarity) was essential. Exact-match deduplication alone missed paraphrased duplicates from different sources covering the same incident.
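The principle behind semantic deduplication can be shown without a vector database. The sketch below compares embedding vectors by cosine similarity and keeps only the first document of each near-duplicate cluster. It is a stand-in for the idea, not for Qdrant itself: the function names, the toy three-dimensional vectors, and the 0.9 threshold are all assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def deduplicate(items, threshold=0.9):
    """Keep the first document of each near-duplicate cluster.

    items is a list of (doc_id, embedding) pairs in arrival order.
    """
    kept = []
    for doc_id, vector in items:
        # Keep the document only if it is not too similar to anything already kept.
        if all(cosine_similarity(vector, kept_vec) < threshold for _, kept_vec in kept):
            kept.append((doc_id, vector))
    return [doc_id for doc_id, _ in kept]
```

This version compares every new document against every kept one, which is fine for a sketch. A vector store replaces that linear scan with an indexed nearest-neighbor search.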
Rate limiting is a first-class architectural concern, not an afterthought. Both the LLM API and the Telegram Bot API have strict rate limits that, if hit, cascade into missed alerts and stale intelligence. Building rate-aware task scheduling into the Celery worker logic from day one prevented downstream failures.
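One common way to make task logic rate-aware is a token bucket. The sketch below is a generic, single-process version with an injectable clock so it can be tested deterministically. It is not WarCrawler's actual limiter: the class name, parameters, and in-memory state are assumptions, and a multi-worker deployment would need shared state (for example in Redis).

```python
import time


class TokenBucket:
    """Allow bursts of up to `capacity` calls, refilled at `rate` tokens per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)  # start with a full bucket
        self.updated = clock()

    def try_acquire(self):
        """Take one token if available; return False so the caller can retry later."""
        now = self.clock()
        elapsed = now - self.updated
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A caller that receives `False` would typically reschedule the work with a delay rather than block, which keeps workers free for other tasks.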
From Monitoring Tool to Geopolitical Observatory
The end state of WarCrawler extends beyond daily monitoring. Over time, the accumulated structured data enables pattern detection across longer time horizons: whether tensions are escalating or de-escalating, whether shipping risk is spreading geographically, whether oil-price reactions are temporary spikes or structural shifts, whether sanctions regimes are tightening, and whether economic sentiment is weakening across multiple regions simultaneously.
This transforms WarCrawler from a real-time alert system into a compact geopolitical and macro-impact observatory — a durable framework for watching how regional conflict reshapes markets, logistics, sentiment, and global risk perception.
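As a rough illustration of the kind of longer-horizon question raised above, the sketch below classifies a series of daily urgency scores as escalating, de-escalating, or stable by comparing the most recent window with the one before it. Everything here is hypothetical: the function name, the seven-day window, and the 0.5 tolerance are placeholder choices, and the source does not describe the actual trend method.

```python
from statistics import mean


def escalation_trend(daily_scores, window=7, tolerance=0.5):
    """Compare the mean of the latest window with the mean of the window before it."""
    if len(daily_scores) < 2 * window:
        return "insufficient data"
    recent = mean(daily_scores[-window:])
    previous = mean(daily_scores[-2 * window:-window])
    delta = recent - previous
    if delta > tolerance:
        return "escalating"
    if delta < -tolerance:
        return "de-escalating"
    return "stable"
```

The tolerance band keeps small day-to-day fluctuations from being reported as a change in direction.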
Architecture Summary
| Layer | Function | Components |
|---|---|---|
| Source Layer | Controlled, credible input | 18 curated OSINT feeds — military, news, economic |
| Crawl Layer | Continuous unattended collection | 5-minute autonomous wake cycle via Celery beat |
| Extraction Layer | Raw content → structured intelligence | GLM-4 Flash with theater/urgency/actor tagging |
| Conflict Layer | Direct battlefield awareness | Strikes, retaliation, posture, sanctions, escalation signals |
| Side-Effects Layer | Downstream consequence tracking | Oil, shipping, airspace, inflation, markets, sentiment |
| Delivery Layer | Right information at the right time | Telegram: instant alerts, daily digest, weekly strategic brief |
| Control Layer | Reliability and safety | Rate limits, dead letter queue, operator alerts |
| Analysis Layer | Decision-grade intelligence | Risk scorecards, trend synthesis, uncertainty labeling |
About
WarCrawler was designed and built by Amin, an AI Automation & Infrastructure Architect based in Dubai, specializing in autonomous intelligence systems, multi-agent orchestration, and production-grade AI infrastructure for the enterprise.
More projects and technical writing at aminet.ai.
Keywords: OSINT automation, autonomous intelligence system, conflict monitoring pipeline, real-time geopolitical risk analysis, open-source intelligence engineering, AI-powered data extraction, structured intelligence delivery, Telegram bot integration, Celery task orchestration, multi-source NLP pipeline, war intelligence automation, situational awareness system, defense intelligence infrastructure, economic impact monitoring, data engineering architecture, autonomous enterprise systems