Complete digest system
273
README.md
@@ -1,138 +1,231 @@
# Zpravobot Digest Bot
# Zprávobot AI Digest

Automatic daily digest of Mastodon posts using Claude AI.
Automatic daily digest system for Mastodon bots, powered by Claude AI.

## Structure
## 🎯 What it does

```
/app/data/zpravobot-digest/
├── export-daily.sh # Export posts from the DB to CSV
├── digest-bot.py # Main script (Claude + Mastodon)
├── run-digest.sh # Wrapper with config
├── config.env.example # Configuration template
└── README.md
```
Three times a day, the system:
1. Loads yesterday's posts from the CSV export
2. Automatically categorizes them by topic (🌍 Politics, 🏒 Sport, 🎬 Culture...)
3. Analyzes them with Claude AI
4. Publishes a 2-toot thread to Mastodon
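
The categorization step can be pictured as a keyword lookup. The sketch below is only an illustration: `TOPIC_KEYWORDS` and `categorize` are hypothetical names, and the keyword lists are shortened assumptions rather than the bot's real ones.

```ruby
# Hypothetical keyword table; the bot's real keyword lists are longer.
TOPIC_KEYWORDS = {
  '🏒 Sport'    => /hokej|fotbal|nhl/,
  '🎬 Kultura'  => /film|koncert|festival/,
  '🌍 Politika' => /vláda|parlament|ministr/
}.freeze

# Groups posts by the first topic whose pattern matches the post text.
# Returns { topic => [posts] }, most populated topic first.
# Posts that match no topic are left out.
def categorize(posts)
  grouped = Hash.new { |hash, key| hash[key] = [] }
  posts.each do |post|
    text = post['text'].to_s.downcase
    topic, = TOPIC_KEYWORDS.find { |_, pattern| text.match?(pattern) }
    grouped[topic] << post if topic
  end
  grouped.sort_by { |_, group| -group.size }.to_h
end
```

Because the first matching pattern wins, the order of the table decides where an ambiguous post lands.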

## Installation
### Three bots with different styles

### 1. Clone the repo
| Bot | Time | Style | Purpose |
|-----|------|-------|---------|
| @zpravobot | 7:30 | Neutral | Morning news overview |
| @pozitivni | 12:00 | Positive | Midday motivation |
| @sarkasticky | 19:00 | Sarcastic | Evening commentary |

## 📋 Requirements

- Ruby 3.0+
- mastodon-api gem
- PostgreSQL with Mastodon data
- Claude API key
- 3 Mastodon bot tokens

## 🚀 Installation (Cloudron)

### 1. Prepare the environment

In the Mastodon terminal (Cloudron):
```bash
cd /app/data
git clone https://gitea.tvoje-domena.cz/user/zpravobot-digest.git
git clone https://gitea.vhsky.cz/user/zpravobot-digest.git
cd zpravobot-digest
```

### 2. Configuration
### 2. Install the Ruby gem
```bash
export GEM_HOME=$HOME/.gem
export PATH=$GEM_HOME/bin:$PATH
gem install mastodon-api --user-install
```

Verify the installation:
```bash
ruby -e "require 'mastodon'; puts 'OK'"
```

### 3. Configuration
```bash
cp config.env.example config.env
nano config.env
```

Fill in:

- `ANTHROPIC_API_KEY` - Claude API token
- `TOKEN_ZPRAVOBOT` - Mastodon token for @zpravobot
- `TOKEN_POZITIVNI` - Mastodon token for @pozitivni
- `TOKEN_SARKASTICKY` - Mastodon token for @sarkasticky

### 3. Permissions

Fill in the tokens:
```bash
chmod +x export-daily.sh run-digest.sh digest-bot.py
export ANTHROPIC_API_KEY="sk-ant-api03-..."
export ZPRAVOBOT_TOKEN="token-zde"
export POZITIVNI_TOKEN="token-zde"
export SARKASTICKY_TOKEN="token-zde"
```

**How to create Mastodon tokens:**
1. Log in as the bot account
2. Settings → Development → New application
3. Scopes: `read:statuses` + `write:statuses`
4. Copy "Your access token"
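
Before the first run it can help to confirm that the API key and all three tokens are set. This is a minimal sketch that assumes the `*_TOKEN` variable names; `REQUIRED_VARS` and `missing_vars` are hypothetical helpers, not part of the repository.

```ruby
# Variable names assumed from the *_TOKEN naming used in config.env.
REQUIRED_VARS = %w[
  ANTHROPIC_API_KEY
  ZPRAVOBOT_TOKEN
  POZITIVNI_TOKEN
  SARKASTICKY_TOKEN
].freeze

# Returns the names of required variables that are unset or blank.
# Accepts any Hash-like object so it can be checked without touching ENV.
def missing_vars(env = ENV)
  REQUIRED_VARS.select { |name| env[name].to_s.strip.empty? }
end
```

Calling `missing_vars` after `source config.env` should return an empty array when the configuration is complete.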

### 4. Executable permissions
```bash
chmod +x export-daily.sh publish_digest.rb run-digest.sh
chmod 600 config.env
```

### 4. Test
## 🧪 Testing

### Dry run (no publishing)
```bash
./export-daily.sh # Export CSV
./run-digest.sh zpravobot # Test the digest
source config.env
./run-digest.sh zpravobot --dry-run
./run-digest.sh pozitivni --dry-run
./run-digest.sh sarkasticky --dry-run
```

## Usage

### Manual run

### Live test (publishes for real)
```bash
./run-digest.sh zpravobot # Neutral digest
./run-digest.sh pozitivni # Positive digest
./run-digest.sh sarkasticky # Sarcastic digest
./run-digest.sh zpravobot
```

### Automation (Cloudron Cron)
Check on Mastodon that the thread was published.

## ⏰ Automation (Cron)

In Cloudron UI → Mastodon app → Cron tab:
```
0 6 * * * /app/data/zpravobot-digest/export-daily.sh
0 7 * * * /app/data/zpravobot-digest/run-digest.sh zpravobot
30 7 * * * /app/data/zpravobot-digest/run-digest.sh zpravobot
0 12 * * * /app/data/zpravobot-digest/run-digest.sh pozitivni
0 18 * * * /app/data/zpravobot-digest/run-digest.sh sarkasticky
0 19 * * * /app/data/zpravobot-digest/run-digest.sh sarkasticky
```

## Output

### CSV export

- **Location:** `/app/data/posts-latest.csv`
- **Format:** `id,created_at,text,uri,url,account_id`
- **Range:** Last 48 hours
- **Archive:** `/app/data/archive/posts-YYYY-MM-DD.csv` (7 days)
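
Since the export covers 48 hours, a consumer has to pick out the rows for one day. The sketch below assumes the column layout listed above; `posts_for_date` and `same_day?` are hypothetical names used only for illustration.

```ruby
require 'csv'
require 'time'

# True when the timestamp falls on the given YYYY-MM-DD date.
# Missing or unparseable timestamps count as "not that day".
def same_day?(timestamp, date)
  Time.parse(timestamp.to_s).strftime('%Y-%m-%d') == date
rescue ArgumentError
  false
end

# Parses CSV text with a header row and keeps only rows from the given date.
def posts_for_date(csv_text, date)
  rows = CSV.parse(csv_text, headers: true).map(&:to_h)
  rows.select { |row| same_day?(row['created_at'], date) }
end
```

Comparing on the formatted date string keeps the filter independent of the time of day at which the export ran.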

### Digest

- 2-toot thread (summary + links)
- Published to the corresponding bot account
- Style follows the bot's personality
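
A thread is simply a second status that replies to the first. The sketch below shows the two JSON bodies, assuming the `status`, `visibility` and `in_reply_to_id` fields of Mastodon's `POST /api/v1/statuses` endpoint; `status_payload` is a hypothetical helper and the ID is a made-up example.

```ruby
require 'json'

# Builds the request body for one status.
# Pass in_reply_to_id to attach the status to an earlier one (a thread).
def status_payload(text, in_reply_to_id: nil)
  payload = { status: text, visibility: 'public' }
  payload[:in_reply_to_id] = in_reply_to_id if in_reply_to_id
  payload
end

first  = status_payload('Summary toot')
second = status_payload('Links toot', in_reply_to_id: '123') # ID is illustrative
puts JSON.generate(first)
puts JSON.generate(second)
```

The ID for the second payload is only known once the server has answered the first request, so the two posts have to be sent in sequence.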

## How it works

1. **Export (6:00):** SQL → CSV export from PostgreSQL
2. **Digest (7:00/12:00/18:00):**
   - Loads the CSV
   - Sends the data to the Claude API
   - Claude analyzes the topics
   - Publishes a 2-toot thread to Mastodon

## File structure
**Schedule:**
- 6:00 - Export posts from the database
- 7:30 - Neutral digest (@zpravobot)
- 12:00 - Positive news (@pozitivni)
- 19:00 - Sarcastic commentary (@sarkasticky)

## 📁 File structure
```
/app/data/
├── zpravobot-digest/ # Git repo
│   ├── export-daily.sh
│   ├── digest-bot.py
│   ├── run-digest.sh
│   ├── config.env # Gitignored!
│   └── README.md
├── posts-latest.csv # Daily export
├── archive/ # 7-day history
│   └── posts-YYYY-MM-DD.csv
├── zpravobot-digest/
│   ├── export-daily.sh # CSV export from PostgreSQL
│   ├── publish_digest.rb # Main Ruby script
│   ├── run-digest.sh # Wrapper (loads config)
│   ├── config.env # Tokens (gitignored!)
│   └── config.env.example # Template
├── posts-latest.csv # Daily export (2 days of posts)
├── archive/
│   └── posts-YYYY-MM-DD.csv # 7-day history
└── logs/
    └── export.log
    └── export.log # Export logs
```

## Requirements
## 🔧 Manual usage

- Python 3.x (available in Cloudron Mastodon)
- Mastodon instance (zpravobot.news)
- Claude API access
- 3× Mastodon bot accounts with tokens
### Publish a digest
```bash
source config.env
./run-digest.sh zpravobot # Neutral
./run-digest.sh pozitivni # Positive
./run-digest.sh sarkasticky # Sarcastic
```

## Security
### Use a specific date
```bash
./run-digest.sh zpravobot --date=2026-01-05 --dry-run
```
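
The two flags above can be parsed with Ruby's standard `OptionParser`. The following is a sketch, not the script's actual parser: `parse_digest_args` is a hypothetical helper, and the strict date validation is an assumption added for illustration.

```ruby
require 'optparse'
require 'date'

# Parses --dry-run and --date=YYYY-MM-DD from an argument list.
# An invalid date raises OptionParser::InvalidArgument.
def parse_digest_args(argv)
  options = { dry_run: false, date: nil }
  parser = OptionParser.new do |opts|
    opts.on('--dry-run') { options[:dry_run] = true }
    opts.on('--date DATE', String) do |value|
      begin
        parsed = Date.strptime(value, '%Y-%m-%d')
        options[:date] = parsed.strftime('%Y-%m-%d')
      rescue ArgumentError # Date::Error is a subclass of ArgumentError
        raise OptionParser::InvalidArgument, value
      end
    end
  end
  parser.parse(argv) # non-destructive: argv itself is left untouched
  options
end
```

Rejecting a malformed date at parse time avoids a run that silently matches no posts.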

- ⚠️ `config.env` contains sensitive tokens → chmod 600
- ⚠️ Do not commit `config.env` to Git (it is in .gitignore)
- ✅ DB access only for the export script
- ✅ The digest script reads only the CSV (no DB access)
### Export CSV
```bash
./export-daily.sh
```

## TODO
## 🎨 Features

- [ ] Prompt optimization for Claude
- [ ] Error handling in digest-bot.py
- [ ] Notifications on failure
- [ ] Web dashboard for statistics
- ✅ **Automatic topic categorization** (Politics, Sport, Culture...)
- ✅ **Claude AI analysis** with a fallback when the API fails
- ✅ **Style filtering** - the positive bot filters out negative news
- ✅ **2-toot threads** - summary + links
- ✅ **URL extraction** from posts
- ✅ **Error handling** and logging
- ✅ **Dry-run mode** for testing
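
The fallback idea can be sketched as follows: try to read the model's reply as JSON, and use a neutral default when that fails. `parse_analysis` is a hypothetical helper; the default's keys mirror the JSON shape the prompt asks for.

```ruby
require 'json'

# Parses the model's text reply as JSON, tolerating Markdown code fences.
# Falls back to a neutral default built from the known topics when the
# reply is not valid JSON.
def parse_analysis(raw_text, fallback_topics)
  cleaned = raw_text.to_s.gsub(/```(?:json)?/, '').strip
  JSON.parse(cleaned)
rescue JSON::ParserError
  {
    'main_topics' => fallback_topics.first(3),
    'sentiment' => 'neutral',
    'notable_events' => []
  }
end
```

With this shape a malformed reply degrades the digest's analysis without stopping the run.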

## Author
## 📊 Monitoring

A colleague + Claude
### Check today's runs
```bash
# In the export log
tail -50 /app/data/logs/export.log

# Verify the CSV
ls -lh /app/data/posts-latest.csv
wc -l /app/data/posts-latest.csv
```

### Check published posts

Visit:
- https://zpravobot.news/@zpravobot
- https://zpravobot.news/@pozitivni
- https://zpravobot.news/@sarkasticky

## 🐛 Troubleshooting

### "CSV file not found"
```bash
# Check that the export ran
ls -la /app/data/posts-latest.csv

# Run it manually
./export-daily.sh
```

### "Missing token"
```bash
# Check the environment
source config.env
echo $ZPRAVOBOT_TOKEN
```

### "The access token is invalid"

The token has expired or is invalid. Generate a new one in Mastodon → Settings → Development.

### Ruby gem error
```bash
# Reinstall the gem
export GEM_HOME=$HOME/.gem
export PATH=$GEM_HOME/bin:$PATH
gem install mastodon-api --user-install
```

## 💰 Costs

- **Claude API**: ~$3/month (3 requests/day)
- **Infrastructure**: $0 (runs on the Mastodon server)

## 🔒 Security

- ✅ No direct DB access (uses the CSV export)
- ✅ Tokens in `config.env` (gitignored)
- ✅ Read-only access to data
- ✅ Minimal permissions

## 📝 License

Open source - created for the Zprávobot.news community.

## 🙏 Credits

- **Zprávobot.news** - Czech/Slovak Mastodon news service
- **Anthropic Claude** - AI analysis
- **Mastodon** - Decentralized social network

---

**Version:** 1.0.0
**Updated:** January 2026

@@ -1,14 +1,13 @@
export ANTHROPIC_API_KEY="sk-ant-xxx..."
export TOKEN_ZPRAVOBOT="your-token-here"
export TOKEN_POZITIVNI="your-token-here"
export TOKEN_SARKASTICKY="your-token-here"
```
cat > config.env.example << 'EOF'
# Ruby gem setup
export GEM_HOME=$HOME/.gem
export PATH=$GEM_HOME/bin:$PATH

## .gitignore
```
config.env
*.csv
logs/
archive/
__pycache__/
*.pyc
# API Keys
export ANTHROPIC_API_KEY="your-claude-api-key-here"

# Mastodon Bot Tokens
export ZPRAVOBOT_TOKEN="your-zpravobot-token-here"
export POZITIVNI_TOKEN="your-pozitivni-token-here"
export SARKASTICKY_TOKEN="your-sarkasticky-token-here"
EOF

115
digest-bot.py
@@ -1,115 +0,0 @@
#!/usr/bin/env python3
import csv
import requests
import os
import sys
import json

# Config
CSV_PATH = '/app/data/posts-latest.csv'
CLAUDE_API = 'https://api.anthropic.com/v1/messages'
CLAUDE_KEY = os.getenv('ANTHROPIC_API_KEY')
MASTODON_URL = 'https://zpravobot.news'

# Bot selection
bot_name = sys.argv[1] if len(sys.argv) > 1 else 'zpravobot'
TOKENS = {
    'zpravobot': os.getenv('TOKEN_ZPRAVOBOT'),
    'pozitivni': os.getenv('TOKEN_POZITIVNI'),
    'sarkasticky': os.getenv('TOKEN_SARKASTICKY')
}

if bot_name not in TOKENS:
    print(f"❌ Unknown bot: {bot_name}")
    sys.exit(1)

TOKEN = TOKENS[bot_name]

if not TOKEN or not CLAUDE_KEY:
    print(f"❌ Missing env variables")
    sys.exit(1)

# 1. Load posts
try:
    with open(CSV_PATH, 'r', encoding='utf-8') as f:
        posts = list(csv.DictReader(f))
    print(f"📊 Loaded {len(posts)} posts for @{bot_name}")
except Exception as e:
    print(f"❌ Error loading CSV: {e}")
    sys.exit(1)

# 2. Prepare data for Claude
posts_sample = posts[:500] # Limit to 500 posts
posts_json = json.dumps(posts_sample, ensure_ascii=False)

# 3. Claude API
print("🤖 Calling Claude API...")
try:
    response = requests.post(
        CLAUDE_API,
        headers={
            'x-api-key': CLAUDE_KEY,
            'anthropic-version': '2023-06-01',
            'content-type': 'application/json'
        },
        json={
            'model': 'claude-sonnet-4-20250514',
            'max_tokens': 4000,
            'messages': [{
                'role': 'user',
                'content': f'Vytvoř denní digest pro bot @{bot_name}. Data: {posts_json[:10000]}'
            }]
        },
        timeout=60
    )

    if response.status_code != 200:
        print(f"❌ Claude API error: {response.text}")
        sys.exit(1)

    digest = response.json()['content'][0]['text']
    print(f"✅ Claude response: {len(digest)} chars")
except Exception as e:
    print(f"❌ Claude API exception: {e}")
    sys.exit(1)

# 4. Split to 2 toots (500 chars limit)
toot1 = digest[:500]
toot2 = digest[500:1000] if len(digest) > 500 else None

# 5. Publish toot 1
print("📤 Publishing toot 1...")
try:
    r1 = requests.post(
        f'{MASTODON_URL}/api/v1/statuses',
        headers={'Authorization': f'Bearer {TOKEN}'},
        json={'status': toot1}
    )

    if r1.status_code not in [200, 201]:
        print(f"❌ Mastodon error: {r1.text}")
        sys.exit(1)

    toot1_id = r1.json()['id']
    print(f"✅ Toot 1 published: {toot1_id}")
except Exception as e:
    print(f"❌ Mastodon exception: {e}")
    sys.exit(1)

# 6. Publish toot 2 (if exists)
if toot2:
    print("📤 Publishing toot 2...")
    try:
        r2 = requests.post(
            f'{MASTODON_URL}/api/v1/statuses',
            headers={'Authorization': f'Bearer {TOKEN}'},
            json={
                'status': toot2,
                'in_reply_to_id': toot1_id
            }
        )
        print(f"✅ Toot 2 published (thread)")
    except Exception as e:
        print(f"⚠️ Toot 2 failed: {e}")

print(f"✅ Done! Published to @{bot_name}")
@@ -1,10 +1,11 @@
cat >export-daily.sh <<'EOF'
#!/bin/bash
DATE=$(date +%Y-%m-%d)
LOG="/app/data/logs/export.log"

mkdir -p /app/data/logs /app/data/archive

echo "[$(date)] Starting export..." >>"$LOG"
echo "[$(date)] Starting export..." >> "$LOG"

PGPASSWORD=${CLOUDRON_POSTGRESQL_PASSWORD} psql \
  -h ${CLOUDRON_POSTGRESQL_HOST} \
@@ -18,10 +19,13 @@ PGPASSWORD=${CLOUDRON_POSTGRESQL_PASSWORD} psql \
  AND deleted_at IS NULL
  AND created_at > NOW() - INTERVAL '2 days'
  ORDER BY created_at DESC
) TO STDOUT WITH CSV HEADER" >/app/data/posts-latest.csv
) TO STDOUT WITH CSV HEADER" > /app/data/posts-latest.csv

cp /app/data/posts-latest.csv "/app/data/archive/posts-$DATE.csv"
find /app/data/archive -name "posts-*.csv" -mtime +7 -delete

LINES=$(wc -l </app/data/posts-latest.csv)
echo "[$(date)] Exported $LINES posts" >>"$LOG"
LINES=$(wc -l < /app/data/posts-latest.csv)
echo "[$(date)] Exported $LINES posts" >> "$LOG"
EOF

chmod +x export-daily.sh

542
publish_digest.rb
Normal file
@@ -0,0 +1,542 @@
#!/usr/bin/env ruby
# -*- coding: utf-8 -*-
#
# Zprávobot.news - AI Daily Digest Publisher
# Version: 1.0.1 (Cloudron - Direct HTTP)
#
# Generates and publishes daily digest posts to Mastodon bots:
# - @zpravobot (7:30) - neutral overview
# - @pozitivni (12:00) - positive news
# - @sarkasticky (19:00) - sarcastic commentary

require 'csv'
require 'json'
require 'time'
require 'net/http'
require 'uri'
require 'optparse'

# ==========================================
# CONFIGURATION
# ==========================================

MASTODON_URL = 'https://zpravobot.news'
CSV_PATH = '/app/data/posts-latest.csv'
ANTHROPIC_API_URL = 'https://api.anthropic.com/v1/messages'

BOTS = {
  'zpravobot' => {
    token: ENV['ZPRAVOBOT_TOKEN'],
    style: 'neutral',
    time_slot: 'morning',
    hashtags: '#zpravobot #trendydne'
  },
  'pozitivni' => {
    token: ENV['POZITIVNI_TOKEN'],
    style: 'positive',
    time_slot: 'noon',
    hashtags: '#dobréZprávy #zpravobot'
  },
  'sarkasticky' => {
    token: ENV['SARKASTICKY_TOKEN'],
    style: 'sarcastic',
    time_slot: 'evening',
    hashtags: '#realita #zpravobot'
  }
}

# ==========================================
# COMMAND LINE PARSING
# ==========================================

options = {}
OptionParser.new do |opts|
  opts.banner = "Usage: publish_digest.rb [options]"

  opts.on("--bot BOT", String, "Bot name (zpravobot, pozitivni, sarkasticky)") do |b|
    options[:bot] = b
  end

  opts.on("--dry-run", "Test mode - don't actually publish") do
    options[:dry_run] = true
  end

  opts.on("--date DATE", String, "Process specific date (YYYY-MM-DD)") do |d|
    options[:date] = d
  end

  opts.on("-h", "--help", "Show this help") do
    puts opts
    exit
  end
end.parse!

bot_name = options[:bot]

unless bot_name && BOTS.key?(bot_name)
  puts "❌ ERROR: Invalid bot name. Use: zpravobot, pozitivni, or sarkasticky"
  exit 1
end

config = BOTS[bot_name]

# Validate environment
unless config[:token]
  puts "❌ ERROR: Missing token for @#{bot_name}"
  puts " Set environment variable: #{bot_name.upcase}_TOKEN"
  exit 1
end

unless ENV['ANTHROPIC_API_KEY']
  puts "❌ ERROR: Missing ANTHROPIC_API_KEY"
  exit 1
end

# ==========================================
# UTILITIES
# ==========================================

def log(message)
  timestamp = Time.now.strftime('%Y-%m-%d %H:%M:%S')
  puts "[#{timestamp}] #{message}"
end

def extract_url(text)
  text[/https?:\/\/[^\s<>"]+/]
end

# ==========================================
# DATA LOADING
# ==========================================

def load_posts_from_csv(date = nil)
  target_date = date || (Time.now - 86400).strftime('%Y-%m-%d')

  unless File.exist?(CSV_PATH)
    log "❌ CSV file not found: #{CSV_PATH}"
    exit 1
  end

  posts = []

  CSV.foreach(CSV_PATH, headers: true, encoding: 'utf-8') do |row|
    begin
      created = Time.parse(row['created_at'])

      if created.strftime('%Y-%m-%d') == target_date
        posts << {
          # Empty CSV fields parse as nil; later steps call String methods on the text
          'text' => row['text'].to_s,
          'url' => row['url'] || '',
          'created_at' => row['created_at']
        }
      end
    rescue StandardError
      # Skip problematic rows
      next
    end
  end

  log "📊 Loaded #{posts.size} posts from #{target_date}"

  if posts.empty?
    log "⚠️ No posts found for #{target_date}"
    exit 1
  end

  posts
end

# ==========================================
# TOPIC EXTRACTION
# ==========================================

def extract_topics(posts)
  topics = Hash.new { |h, k| h[k] = [] }

  posts.each do |post|
    text = post['text'].downcase

    # Add URL to post if not present
    post['extracted_url'] = extract_url(post['text']) || post['url']

    # Categorize by topic
    if text.match?(/trump|venezuela|maduro|grónsko|greenland|usa|bílý dům/)
      topics['🌍 Zahraniční politika'] << post
    elsif text.match?(/hokej|extraliga|nhl|ms u20/)
      topics['🏒 Hokej'] << post
    elsif text.match?(/fotbal|chelsea|liga|gól|penalty/)
      topics['⚽ Fotbal'] << post
    elsif text.match?(/film|seriál|stranger things|hudba|koncert|festival|netflix/)
      topics['🎬 Kultura'] << post
    elsif text.match?(/počasí|teplota|mráz|sníh|déšť/)
      topics['❄️ Počasí'] << post
    elsif text.match?(/politika|parlament|vláda|ministr/)
      topics['🏛️ Politika'] << post
    elsif text.match?(/ekonomika|koruna|inflace|mzdy|ceny/)
      topics['💼 Ekonomika'] << post
    end
  end

  # Sort by post count
  topics = topics.sort_by { |_, group| -group.size }.to_h

  log "🔍 Found #{topics.size} topics:"
  topics.each { |topic, group| log " #{topic}: #{group.size} posts" }

  topics
end

# ==========================================
# CONTENT FILTERING BY STYLE
# ==========================================

def filter_topics_by_style(topics, style)
  case style
  when 'neutral'
    topics

  when 'positive'
    positive_topics = {}

    topics.each do |topic, posts|
      next if topic.include?('Politika') || topic.include?('Zahraniční')

      positive_posts = posts.select do |post|
        text = post['text'].downcase
        has_positive = text.match?(/úspěch|vítěz|rekord|festival|koncert|ocenění|talent/)
        no_negative = !text.match?(/nehoda|smrt|tragédie|havárie|konflikt|krize/)
        has_positive && no_negative
      end

      positive_topics[topic] = positive_posts unless positive_posts.empty?
    end

    log "💚 Filtered to #{positive_topics.size} positive topics"
    positive_topics

  when 'sarcastic'
    sarcastic_topics = {}

    topics.each do |topic, posts|
      if topic.include?('Zahraniční') || topic.include?('Politika')
        sarcastic_topics[topic] = posts
      end
    end

    if sarcastic_topics.size < 3
      topics.each do |topic, posts|
        break if sarcastic_topics.size >= 5
        sarcastic_topics[topic] = posts unless sarcastic_topics.key?(topic)
      end
    end

    log "😏 Selected #{sarcastic_topics.size} topics for sarcasm"
    sarcastic_topics

  else
    topics
  end
end

# ==========================================
# CLAUDE API ANALYSIS
# ==========================================

def analyze_with_claude(posts, topics)
  log "🤖 Analyzing with Claude API..."

  topic_summary = topics.map { |topic, group| "#{topic}: #{group.size}" }.join(', ')
  sample_texts = posts[0..49].map { |p| p['text'][0..150] }

  prompt = <<~PROMPT
    Analyzuj #{posts.size} českých/slovenských zpráv z Mastodon instance Zprávobot.news.

    Témata: #{topic_summary}

    Ukázka textů:
    #{sample_texts[0..9].join("\n---\n")}

    Vrať POUZE JSON (žádný markdown):
    {
      "main_topics": ["téma1", "téma2", "téma3"],
      "sentiment": "neutral|positive|negative",
      "notable_events": ["událost1", "událost2"]
    }
  PROMPT

  uri = URI(ANTHROPIC_API_URL)
  request = Net::HTTP::Post.new(uri)
  request['anthropic-version'] = '2023-06-01'
  request['content-type'] = 'application/json'
  request['x-api-key'] = ENV['ANTHROPIC_API_KEY']

  request.body = {
    model: 'claude-sonnet-4-20250514',
    max_tokens: 1000,
    messages: [
      { role: 'user', content: prompt }
    ]
  }.to_json

  response = Net::HTTP.start(uri.hostname, uri.port, use_ssl: true) do |http|
    http.request(request)
  end

  if response.code != '200'
    log "⚠️ Claude API error: #{response.code}"
    return default_analysis(topics)
  end

  data = JSON.parse(response.body)
  text = data['content'][0]['text']

  analysis = JSON.parse(text.gsub(/```json|```/, '').strip)
  log "✅ Claude analysis complete"
  analysis

rescue => e
  log "⚠️ Claude API error: #{e.message}"
  default_analysis(topics)
end

def default_analysis(topics)
  {
    'main_topics' => topics.keys[0..2],
    'sentiment' => 'neutral',
    'notable_events' => []
  }
end

# ==========================================
# TOOT GENERATION
# ==========================================

def generate_summary_toot(posts_count, topics, style, hashtags)
  date = (Time.now - 86400).strftime('%d.%m.%Y')

  topic_lines = topics.keys[0..4].map do |topic|
    count = topics[topic].size
    "#{topic} (#{count}#{style == 'sarcastic' ? '×' : ' postů'})"
  end

  case style
  when 'neutral'
    summary = <<~TOOT
      📊 TRENDY DNE (#{date})

      Zpracováno #{posts_count} postů:

      #{topic_lines.join("\n")}

      #{hashtags}

      👇 Odkazy na vybrané články
    TOOT

  when 'positive'
    summary = <<~TOOT
      ☀️ DOBRÉ ZPRÁVY DNE (#{date})

      Z dnešních #{posts_count} zpráv vybrané momenty:

      #{topic_lines[0..3].join("\n")}

      #{hashtags}

      👇 Inspirace na čtení
    TOOT

  when 'sarcastic'
    summary = <<~TOOT
      😏 DNEŠNÍ REALITA (#{date})

      #{posts_count} postů = co se stalo?

      #{topic_lines[0..3].join("\n")}

      #{hashtags}

      👇 Důkazy zmaru
    TOOT
  end

  if summary.length > 500
    summary = summary[0..496] + "..."
  end

  summary.strip
end

def generate_links_toot(topics, style)
  links = []
  max_topics = 5
  max_links_per_topic = 2

  topics.keys[0...max_topics].each do |topic|
    posts = topics[topic]
    links << "\n#{topic}:"

    selected = []
    selected << posts[0] if posts[0]
    selected << posts[posts.size / 2] if posts.size > 1

    selected[0...max_links_per_topic].each do |post|
      title = post['text'].split("\n")[0][0..50].strip
      title = title.gsub(/\s+/, ' ')

      url = post['extracted_url']
      next unless url && !url.empty?

      short_url = url.gsub(/https?:\/\//, '')
      short_url = short_url[0..37] + '...' if short_url.length > 40

      links << "• #{title}..."
      links << " 🔗 #{short_url}"
    end
  end

  case style
  when 'neutral'
    header = "📌 VYBRANÉ ČLÁNKY DNE:"
    footer = "\n#články #zprávy"

  when 'positive'
    header = "💚 POZITIVNÍ PŘÍBĚHY DNE:"
    footer = "\n💙 Máte skvělý den!\n#inspirace"

  when 'sarcastic'
    header = "🤡 \"BREAKING NEWS\" DNE:"
    footer = "\n🙃 Zítra: repeat\n#sarkasmus"
  end

  toot = header + links.join("\n") + footer

  if toot.length > 500
    truncated_links = links[0..(links.size * 2 / 3)]
    toot = header + truncated_links.join("\n") + footer

    if toot.length > 500
      toot = toot[0..496] + "..."
    end
  end

  toot.strip
end

# ==========================================
# MASTODON PUBLISHING (DIRECT HTTP)
# ==========================================

def publish_thread(bot_name, summary_toot, links_toot, dry_run: false)
  config = BOTS[bot_name]

  log "📤 Publishing thread for @#{bot_name}..."

  if dry_run
    log "🧪 DRY RUN MODE - Not actually publishing"
    log "\n--- TOOT 1/2 (#{summary_toot.length} chars) ---"
    log summary_toot
    log "\n--- TOOT 2/2 (#{links_toot.length} chars) ---"
    log links_toot
    log "\n✅ Dry run complete"
    return [nil, nil]
  end

  # Publish toot 1
  uri = URI("#{MASTODON_URL}/api/v1/statuses")
  request = Net::HTTP::Post.new(uri)
  request['Authorization'] = "Bearer #{config[:token]}"
  request['Content-Type'] = 'application/json'
  request.body = { status: summary_toot, visibility: 'public' }.to_json

  response = Net::HTTP.start(uri.hostname, uri.port, use_ssl: true) do |http|
    http.request(request)
  end

  unless response.code == '200'
    log "❌ ERROR: #{response.body}"
    exit 1
  end

  toot1_data = JSON.parse(response.body)
  toot1_url = toot1_data['url']
  toot1_id = toot1_data['id']
  log "✅ Toot 1/2 published: #{toot1_url}"

  # Publish toot 2 as reply
  request2 = Net::HTTP::Post.new(uri)
  request2['Authorization'] = "Bearer #{config[:token]}"
  request2['Content-Type'] = 'application/json'
  request2.body = {
    status: links_toot,
    in_reply_to_id: toot1_id,
    visibility: 'public'
  }.to_json

  response2 = Net::HTTP.start(uri.hostname, uri.port, use_ssl: true) do |http|
    http.request(request2)
  end

  # Check the reply as well, so a failed second toot is not logged as a success
  unless response2.code == '200'
    log "❌ ERROR publishing toot 2/2: #{response2.body}"
    exit 1
  end

  log "✅ Toot 2/2 published (thread)"

  [toot1_data, JSON.parse(response2.body)]

rescue => e
  log "❌ ERROR publishing thread: #{e.message}"
  exit 1
end

# ==========================================
# MAIN EXECUTION
# ==========================================

def main(bot_name, options = {})
  log "🚀 Starting Daily Digest for @#{bot_name}"
  log "=" * 60

  config = BOTS[bot_name]

  posts = load_posts_from_csv(options[:date])

  log "\n🔍 Extracting topics..."
  all_topics = extract_topics(posts)

  topics = filter_topics_by_style(all_topics, config[:style])

  if topics.empty?
    log "⚠️ No suitable topics found for style: #{config[:style]}"
    exit 1
  end

  log "\n🤖 Analyzing with Claude..."
  analysis = analyze_with_claude(posts, topics)

  log "\n📝 Generating content..."
  summary = generate_summary_toot(posts.size, topics, config[:style], config[:hashtags])
  links = generate_links_toot(topics, config[:style])

  log " Summary: #{summary.length} chars"
  log " Links: #{links.length} chars"

  log "\n📤 Publishing to Mastodon..."
  toot1, toot2 = publish_thread(bot_name, summary, links, dry_run: options[:dry_run])

  log "\n" + "=" * 60
  log "✅ Digest complete for @#{bot_name}"

  unless options[:dry_run]
    log "🔗 Thread: #{toot1['url']}" if toot1
  end
end

# Run main
begin
  main(bot_name, options)
rescue Interrupt
  log "\n⚠️ Interrupted by user"
  exit 130
rescue => e
  log "❌ FATAL ERROR: #{e.message}"
  log " #{e.backtrace[0..4].join("\n ")}"
  exit 1
end
@@ -1,3 +1,7 @@
cat >run-digest.sh <<'EOF'
#!/bin/bash
source /app/data/zpravobot-digest/config.env
python3 /app/data/zpravobot-digest/digest-bot.py "$@"
source /app/data/config.env
ruby /app/data/publish_digest.rb --bot="$1" "${@:2}"
EOF

chmod +x run-digest.sh