scheduled hacker-news content scraping with cron-triggered workflows
Automatically fetches top stories from the Hacker News API on a fixed daily schedule (23:30 UTC): a Cloudflare Workers cron trigger starts a Cloudflare Workflows instance that does the scraping. The scraper extracts article metadata (title, URL, score, comment count) and stores raw content in Cloudflare KV for downstream processing. Workflow steps retry with exponential backoff (configurable per step in the WorkflowEntrypoint pattern), so transient failures are handled without manual intervention.
Unique: Combines a native Workers cron trigger with Workflows' built-in exponential backoff and Durable Objects-backed state, eliminating the need for external schedulers (cron.io, APScheduler) or message queues. The result of each completed step is persisted automatically, so a workflow interrupted by a failure or worker restart resumes from its last completed step.
vs alternatives: Simpler than Lambda + EventBridge or Airflow because scheduling, retry logic, and state persistence are native to the Cloudflare Workers platform, reducing operational overhead.
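A minimal sketch of the fetch step follows, assuming the public Hacker News Firebase API (topstories.json for the ranked IDs and item/{id}.json per story, where the comment count is the descendants field). The fetch function is injected so the logic can run outside Workers; StoryMeta, FetchLike and fetchTopStories are hypothetical names, not the project's code.

```typescript
// Sketch only: assumes the public Hacker News Firebase API.
const HN_API = "https://hacker-news.firebaseio.com/v0";

// Fields kept from each Hacker News story item.
export interface StoryMeta {
  id: number;
  title: string;
  url: string;
  score: number;
  comments: number; // HN exposes the comment count as `descendants`
}

// Minimal fetch-like signature; the global `fetch` satisfies it.
type FetchLike = (url: string) => Promise<{ ok: boolean; json(): Promise<unknown> }>;

async function getJson<T>(fetchFn: FetchLike, url: string): Promise<T> {
  const res = await fetchFn(url);
  // Throwing lets the caller's retry policy (e.g. a Workflow step) retry the call.
  if (!res.ok) throw new Error(`HTTP error fetching ${url}`);
  return (await res.json()) as T;
}

// Fetches the first `limit` top-story IDs, then their items in parallel.
// Missing items (null) and non-story items such as job posts are skipped.
export async function fetchTopStories(limit: number, fetchFn: FetchLike): Promise<StoryMeta[]> {
  const ids = await getJson<number[]>(fetchFn, `${HN_API}/topstories.json`);
  const items = await Promise.all(
    ids.slice(0, limit).map((id) =>
      getJson<Record<string, unknown> | null>(fetchFn, `${HN_API}/item/${id}.json`),
    ),
  );
  return items
    .filter((item): item is Record<string, unknown> => item !== null && item.type === "story")
    .map((item) => ({
      id: Number(item.id),
      title: String(item.title ?? ""),
      // Text posts (e.g. "Ask HN") have no `url`; link to the HN thread instead.
      url: typeof item.url === "string" ? item.url : `https://news.ycombinator.com/item?id=${item.id}`,
      score: Number(item.score ?? 0),
      comments: Number(item.descendants ?? 0),
    }));
}
```

In a deployment along the lines described above, a wrangler cron such as "30 23 * * *" would fire the Worker's scheduled handler, which creates a Workflow instance; inside the workflow, wrapping the call as step.do("fetch top stories", { retries: { limit: 5, delay: "10 seconds", backoff: "exponential" } }, () => fetchTopStories(10, fetch)) is one way to get exponential backoff. The retry values are illustrative.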
dual-host podcast script generation with ai-powered summarization and dialogue synthesis
Converts scraped Hacker News articles into Chinese-language podcast scripts by calling the AI SDK's generateText function with a model created through @ai-sdk/openai-compatible, which makes the LLM backend configurable (OpenAI, Anthropic, or any other API that exposes an OpenAI-compatible endpoint). The system generates structured dialogue between two hosts discussing each article, including summaries, key insights, and conversational transitions. Prompt engineering enforces consistent speaker roles and Chinese-language output, and a fallback path handles API failures.
Unique: Uses the @ai-sdk/openai-compatible abstraction layer to support multiple LLM providers (OpenAI, Anthropic, Ollama) through OpenAI-compatible endpoints with identical code paths, enabling cost optimization and provider switching without code changes. Generates structured dialogue with explicit speaker roles rather than monolithic summaries.
vs alternatives: More flexible than hardcoded OpenAI integration because it abstracts provider differences; more cost-effective than single-provider solutions because it allows switching to cheaper models (e.g., Ollama locally) without refactoring.
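The prompt and parsing side of script generation can be sketched as below. The speaker labels HostA/HostB, the prompt wording and the "Speaker: text" line format are assumptions for illustration; the parser accepts both the ASCII colon and the full-width colon (：) that Chinese model output commonly uses. The generateText wiring appears only as a comment because it needs the real packages and an API key.

```typescript
// Hypothetical speaker labels; the project's actual labels are not specified.
export const SPEAKERS = ["HostA", "HostB"] as const;
export type Speaker = (typeof SPEAKERS)[number];

export interface DialogueSegment {
  speaker: Speaker;
  text: string;
}

// Illustrative system prompt that pins the line format, roles and language.
export function buildSystemPrompt(): string {
  return [
    "You write a podcast script in which two hosts discuss Hacker News stories.",
    `Start every line with "${SPEAKERS[0]}:" or "${SPEAKERS[1]}:" followed by that host's words.`,
    "Write all dialogue in Simplified Chinese and do not add stage directions.",
  ].join("\n");
}

// Matches "Speaker: text" with an ASCII or full-width colon.
const LINE_PATTERN = new RegExp(`^(${SPEAKERS.join("|")})\\s*[:：]\\s*(.*)$`);

// Splits model output into speaker-tagged segments. Lines without a known
// speaker prefix are appended to the previous segment (no separator, since
// Chinese text is not space-delimited); lines before the first speaker are dropped.
export function parseDialogue(raw: string): DialogueSegment[] {
  const segments: DialogueSegment[] = [];
  for (const line of raw.split(/\r?\n/)) {
    const trimmed = line.trim();
    if (trimmed === "") continue;
    const match = LINE_PATTERN.exec(trimmed);
    if (match) {
      segments.push({ speaker: match[1] as Speaker, text: match[2] });
    } else if (segments.length > 0) {
      segments[segments.length - 1].text += trimmed;
    }
  }
  return segments.filter((segment) => segment.text.length > 0);
}

// Wiring sketch (not executed here; needs the `ai` and `@ai-sdk/openai-compatible` packages):
//   const provider = createOpenAICompatible({ name: "llm", baseURL, apiKey });
//   const { text } = await generateText({
//     model: provider(modelId), system: buildSystemPrompt(), prompt: articleSummaries,
//   });
//   const segments = parseDialogue(text);
```

Parsing into explicit segments is what lets the later TTS stage pick a voice per speaker; a fallback for API failures would wrap the generateText call rather than the parser.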
image lightbox and media gallery for episode cover art and related visuals
Implements a lightbox component that displays episode cover art and related images in a modal overlay with keyboard navigation (arrow keys to move between images, Escape to close). Images are lazy-loaded from Cloudflare R2 through Cloudflare's CDN and shown at full resolution with zoom and pan. Clicking episode cover art or a related image opens the lightbox, and on mobile it supports swipe gestures for navigation.
Unique: Implements a custom lightbox component without external libraries, reducing bundle size and integrating directly with images served from Cloudflare R2. Supports both keyboard and touch navigation for accessibility across devices.
vs alternatives: Lighter than Lightbox.js or PhotoSwipe because it is custom-built for this project; more accessible than plain image links because it includes keyboard navigation and ARIA labels.
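Keyboard and swipe handling can be kept DOM-free with a small state reducer, as in this hypothetical sketch; the state shape, event names, wrap-around behavior and 50 px swipe threshold are assumptions rather than the project's implementation.

```typescript
// Hypothetical lightbox state; a pure reducer keeps navigation testable without a DOM.
export interface LightboxState {
  open: boolean;
  index: number; // image currently shown
  count: number; // total images in the gallery
}

export type LightboxEvent =
  | { type: "open"; index: number }
  | { type: "key"; key: string } // KeyboardEvent.key values
  | { type: "swipe"; deltaX: number }; // touchend clientX minus touchstart clientX

// Minimum horizontal travel in pixels before a touch counts as a swipe (assumed).
const SWIPE_THRESHOLD = 50;

export function lightboxReducer(state: LightboxState, event: LightboxEvent): LightboxState {
  if (state.count === 0) return state;
  // Move by +1/-1, wrapping around at both ends of the gallery.
  const go = (delta: number): LightboxState => ({
    ...state,
    index: (state.index + delta + state.count) % state.count,
  });
  switch (event.type) {
    case "open":
      return { ...state, open: true, index: Math.min(Math.max(event.index, 0), state.count - 1) };
    case "key":
      if (!state.open) return state;
      if (event.key === "Escape") return { ...state, open: false };
      if (event.key === "ArrowRight") return go(1);
      if (event.key === "ArrowLeft") return go(-1);
      return state;
    case "swipe":
      if (!state.open || Math.abs(event.deltaX) < SWIPE_THRESHOLD) return state;
      // Swiping left (negative deltaX) advances; swiping right goes back.
      return go(event.deltaX < 0 ? 1 : -1);
    default:
      return state;
  }
}
```

A keydown listener would pass event.key to lightboxReducer and touchstart/touchend would supply deltaX; the overlay element would carry role="dialog" and aria-modal="true" alongside its ARIA labels.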
environment-based configuration management for multi-environment deployment
Manages application configuration (API keys, provider selection, feature flags) through environment variables loaded from .env files and Cloudflare Workers secrets. Supports separate configurations for development (local), staging, and production environments without code changes. Configuration is described by TypeScript types, which catch misuse at compile time; because those types are erased at runtime, required settings are also checked when the worker starts, so missing or invalid values fail fast instead of causing errors mid-run. Optional settings fall back to defaults (e.g., the TTS provider defaults to Edge TTS if not specified).
Unique: Pairs TypeScript type definitions (compile-time checking) with startup checks, catching missing or invalid settings before they cause failures. Supports both .env files (development) and Cloudflare Workers secrets (production) with identical code paths.
vs alternatives: Safer than reading raw environment-variable strings ad hoc because every setting is accessed through a typed configuration object; simpler than external config services (Consul, etcd) because configuration is native to Cloudflare Workers.
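Because TypeScript types disappear at runtime, a startup check has to read the values explicitly. The sketch below assumes hypothetical variable names (LLM_API_KEY, LLM_BASE_URL, LLM_MODEL, TTS_PROVIDER) and a plain string map for env; the project's real keys may differ.

```typescript
// Hypothetical configuration shape; the real project's keys may differ.
export type TtsProviderName = "edge" | "minimax" | "murf";

export interface Config {
  llmApiKey: string;
  llmBaseUrl: string;
  llmModel: string;
  ttsProvider: TtsProviderName;
}

const TTS_PROVIDERS: readonly TtsProviderName[] = ["edge", "minimax", "murf"];
const REQUIRED_KEYS = ["LLM_API_KEY", "LLM_BASE_URL", "LLM_MODEL"] as const;

// `env` holds raw string values: Worker bindings/secrets in production,
// or values loaded from a local .env file during development.
export function loadConfig(env: Record<string, string | undefined>): Config {
  // Types are erased at runtime, so required keys are checked explicitly.
  const missing = REQUIRED_KEYS.filter((key) => !env[key]);
  if (missing.length > 0) {
    throw new Error(`Missing required configuration: ${missing.join(", ")}`);
  }

  // Optional setting: fall back to Edge TTS when unset or empty.
  const tts = (env.TTS_PROVIDER || "edge").toLowerCase();
  if (!(TTS_PROVIDERS as readonly string[]).includes(tts)) {
    throw new Error(`Unsupported TTS_PROVIDER "${tts}" (expected ${TTS_PROVIDERS.join(", ")})`);
  }

  return {
    llmApiKey: env.LLM_API_KEY as string,
    llmBaseUrl: env.LLM_BASE_URL as string,
    llmModel: env.LLM_MODEL as string,
    ttsProvider: tts as TtsProviderName,
  };
}
```

In a Worker the same function could be called with the env bindings object (cast to the map type), and locally with values loaded from the .env file, which is what keeps the code path identical across environments.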
multi-provider text-to-speech conversion with configurable voice synthesis
Converts podcast scripts into audio using pluggable TTS providers: Edge TTS (free, Microsoft-backed), Minimax HTTP API (Chinese-optimized), and Murf HTTP API (high-quality voices). Each provider is abstracted behind a common interface that accepts speaker-tagged script segments and returns per-speaker audio buffers. The system selects providers based on configuration and handles provider-specific audio format conversions (MP3, WAV, etc.) transparently.
Unique: Abstracts three distinct TTS providers (Edge TTS, Minimax, Murf) behind a unified interface, allowing runtime provider selection and fallback without code changes. Handles provider-specific quirks (API formats, audio codecs, language support) transparently in adapter classes.
vs alternatives: More flexible than single-provider TTS (e.g., Google Cloud TTS only) because it enables cost optimization (free Edge TTS for testing, premium Minimax for production) and avoids vendor lock-in; better Chinese support than generic English-first TTS services.
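One way to express the common interface with ordered provider fallback is sketched below. TtsProvider, synthesizeScript and the fallback policy are assumptions; the providers in the test are stubs, not real Edge TTS, Minimax or Murf clients.

```typescript
// Hypothetical common interface; real adapters would wrap Edge TTS, Minimax or Murf.
export interface ScriptSegment {
  speaker: string;
  text: string;
}

export interface TtsProvider {
  readonly name: string;
  // Returns encoded audio (e.g. MP3 bytes) for one segment; an adapter would
  // map the segment's speaker to a provider-specific voice.
  synthesize(segment: ScriptSegment): Promise<Uint8Array>;
}

export interface SynthesizedSegment extends ScriptSegment {
  provider: string;
  audio: Uint8Array;
}

// Synthesizes segments in script order. For each segment, providers are tried
// in the configured order and the first success wins; if all fail, the
// collected errors are reported together.
export async function synthesizeScript(
  segments: ScriptSegment[],
  providers: TtsProvider[],
): Promise<SynthesizedSegment[]> {
  if (providers.length === 0) throw new Error("No TTS providers configured");
  const results: SynthesizedSegment[] = [];
  for (const segment of segments) {
    const errors: string[] = [];
    let done = false;
    for (const provider of providers) {
      try {
        const audio = await provider.synthesize(segment);
        results.push({ ...segment, provider: provider.name, audio });
        done = true;
        break;
      } catch (err) {
        errors.push(`${provider.name}: ${err instanceof Error ? err.message : String(err)}`);
      }
    }
    if (!done) throw new Error(`All TTS providers failed: ${errors.join("; ")}`);
  }
  return results;
}
```

Segments are synthesized sequentially to preserve script order and to stay within provider rate limits; a parallel version would need to reorder results afterwards.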
ffmpeg-based audio merging and mp3 encoding in browser runtime
Merges per-speaker audio segments into a single podcast episode using FFmpeg.js, a WebAssembly build of FFmpeg. It runs entirely in a browser-style JavaScript runtime on Cloudflare (no native FFmpeg binary required), concatenating speaker audio buffers with silence padding between segments and encoding the final output as MP3. It also normalizes the audio format (sample rate, channels) and embeds metadata (ID3 tags with episode title, artist, date).
Unique: Runs FFmpeg.js (WebAssembly-compiled FFmpeg) on Cloudflare to merge audio without external services or infrastructure, eliminating the need for Lambda layers, ECS tasks, or dedicated audio processing servers.
vs alternatives: Simpler than AWS Lambda + FFmpeg layer because no infrastructure provisioning is needed; cheaper than Mux or Cloudinary because no per-minute billing; more deterministic than shell-based FFmpeg because behavior is identical across all worker instances.
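Whatever WebAssembly build is used, the merge comes down to an FFmpeg argument list. This hypothetical helper builds one from standard FFmpeg pieces: aformat to normalize sample rate and channel layout, apad with pad_dur for the silence gap, the concat filter to join segments, libmp3lame for MP3 encoding, and -metadata for ID3 tags. The 0.5 s gap, 44.1 kHz mono format and 128k bitrate are assumed defaults, and whether a given WebAssembly build ships libmp3lame and these filters should be checked.

```typescript
// Hypothetical helper; option and filter names are standard FFmpeg.
export interface EpisodeTags {
  title: string;
  artist: string;
  date: string; // e.g. "2024-05-14"
}

export function buildMergeArgs(
  inputs: string[], // paths of per-segment files in FFmpeg's (virtual) filesystem
  output: string,
  tags: EpisodeTags,
  gapSeconds = 0.5, // silence inserted between segments (assumed value)
  sampleRate = 44100,
): string[] {
  if (inputs.length === 0) throw new Error("No input segments");
  // Per segment: force a common sample rate and mono layout, then append
  // silence to every segment except the last.
  const chains = inputs.map((_, i) => {
    const pad = i < inputs.length - 1 ? `,apad=pad_dur=${gapSeconds}` : "";
    return `[${i}:a]aformat=sample_rates=${sampleRate}:channel_layouts=mono${pad}[a${i}]`;
  });
  const labels = inputs.map((_, i) => `[a${i}]`).join("");
  // Join all normalized segments into one audio stream labeled [out].
  const graph = `${chains.join(";")};${labels}concat=n=${inputs.length}:v=0:a=1[out]`;
  return [
    ...inputs.flatMap((path) => ["-i", path]),
    "-filter_complex", graph,
    "-map", "[out]",
    "-c:a", "libmp3lame",
    "-b:a", "128k",
    "-id3v2_version", "3",
    "-metadata", `title=${tags.title}`,
    "-metadata", `artist=${tags.artist}`,
    "-metadata", `date=${tags.date}`,
    output,
  ];
}
```

With ffmpeg.wasm, for example, the segments would be written with writeFile, the array passed to exec, and the MP3 read back with readFile. Because the arguments travel as an array, titles containing spaces need no shell quoting.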
cloudflare kv and r2 storage with automatic episode persistence and retrieval
Stores generated podcast episodes in a two-tier storage system: Cloudflare KV holds episode metadata (title, date, summary, speaker names) as JSON documents with TTL-based expiration, while Cloudflare R2 (S3-compatible object storage) persists the final MP3 audio files behind public CDN URLs. Because KV reads are cached at Cloudflare's edge, repeated metadata lookups stay fast. Metadata keys follow a date-based naming scheme (YYYY-MM-DD), and since KV lists keys in lexicographic order, that scheme gives chronological pagination and retrieval without extra indexing.
Unique: Combines Cloudflare KV (fast metadata access) and R2 (durable audio storage) within a single Cloudflare account, eliminating the need for an external database or a separate S3 bucket. Uses date-based key naming (YYYY-MM-DD) to enable efficient pagination and chronological episode discovery without secondary indexes.
vs alternatives: Potentially cheaper than DynamoDB + S3 because R2 charges no egress fees for serving audio (KV and R2 still bill per operation beyond their free tiers); faster than PostgreSQL for metadata lookups because KV is globally distributed; simpler than managing separate databases because both metadata and audio live in the same Cloudflare account.
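The date-keyed layout can be sketched against a minimal local interface that mirrors the Workers KV get, put (with expirationTtl in seconds) and list (with prefix and cursor) methods. EpisodeMeta, the audioKey layout and the helper names are hypothetical.

```typescript
// Minimal subset of the Workers KV API, declared locally so the sketch is self-contained.
export interface KVLike {
  get(key: string): Promise<string | null>;
  put(key: string, value: string, options?: { expirationTtl?: number }): Promise<void>;
  list(options: { prefix?: string; limit?: number; cursor?: string }): Promise<{
    keys: { name: string }[];
    list_complete: boolean;
    cursor?: string;
  }>;
}

// Hypothetical metadata record; audioKey points at the MP3 object in R2.
export interface EpisodeMeta {
  date: string; // YYYY-MM-DD, doubles as the KV key
  title: string;
  summary: string;
  audioKey: string; // e.g. "episodes/2024-05-14.mp3" (assumed layout)
}

// UTC calendar date, zero-padded, so string order equals chronological order.
export function episodeKey(d: Date): string {
  return d.toISOString().slice(0, 10);
}

export async function saveEpisode(kv: KVLike, meta: EpisodeMeta, ttlSeconds?: number): Promise<void> {
  // Omit the TTL to keep the record indefinitely.
  await kv.put(meta.date, JSON.stringify(meta), ttlSeconds ? { expirationTtl: ttlSeconds } : undefined);
}

export async function getEpisode(kv: KVLike, date: string): Promise<EpisodeMeta | null> {
  const raw = await kv.get(date);
  return raw === null ? null : (JSON.parse(raw) as EpisodeMeta);
}

// KV returns keys in lexicographic order, so a prefix such as "2024-05"
// selects one month, oldest first. Follows the cursor until the list is complete.
export async function listEpisodeDates(kv: KVLike, prefix: string): Promise<string[]> {
  const dates: string[] = [];
  let cursor: string | undefined = undefined;
  do {
    const page = await kv.list({ prefix, cursor });
    dates.push(...page.keys.map((key) => key.name));
    cursor = page.list_complete ? undefined : page.cursor;
  } while (cursor);
  return dates;
}
```

The audio itself would go to R2 separately, for instance env.BUCKET.put(meta.audioKey, mp3Bytes, { httpMetadata: { contentType: "audio/mpeg" } }), where BUCKET is an assumed binding name. KV lists only in ascending order, so a newest-first page needs the listed keys reversed.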
rss feed generation with podcast-compatible metadata and multi-platform distribution
Generates a standards-compliant RSS 2.0 feed with podcast-specific extensions (iTunes, Podtrac, Spotify) that enables distribution to Apple Podcasts, Spotify, YouTube, and 小宇宙 (Xiaoyuzhou, a Chinese podcast platform). The feed is generated dynamically from KV metadata on each request and includes each episode's title, description, audio URL, publication date, and cover art. Caching headers (ETag, Cache-Control) let clients and CDN caches reuse an unchanged feed, reducing regeneration overhead, and RSS validation is used to ensure compatibility with podcast aggregators.
Unique: Dynamically generates RSS feeds from Cloudflare KV metadata on each request rather than pre-generating static files, enabling real-time episode updates without rebuild cycles. Includes platform-specific metadata extensions (iTunes, Podtrac, Spotify) in a single feed to support simultaneous distribution to multiple podcast platforms.
vs alternatives: More flexible than static RSS generation because episodes are published immediately without rebuild; simpler than external RSS services (Transistor, Podbean) because feed generation is native to the worker; supports more platforms than generic RSS because it includes iTunes, Spotify, and Chinese-specific extensions.
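A feed builder along these lines is sketched below, covering the RSS 2.0 core plus the iTunes namespace; the Podtrac and Spotify extensions are omitted, and FeedInfo, FeedEpisode, buildRss and etagFor are hypothetical names. Dates use Date.toUTCString(), which yields the RFC 822-style format RSS expects, and etagFor derives a weak ETag from an FNV-1a hash of the XML.

```typescript
// Hypothetical feed model; a real feed would add more itunes:* tags (category, explicit, ...).
export interface FeedInfo {
  title: string;
  link: string;
  description: string;
  coverUrl: string;
  language: string; // e.g. "zh-cn"
}

export interface FeedEpisode {
  title: string;
  description: string;
  audioUrl: string;
  audioBytes: number; // enclosure length in bytes
  pubDate: Date;
  guid: string;
}

// Escapes the five XML special characters for text and attribute values.
export function xmlEscape(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

export function buildRss(feed: FeedInfo, episodes: FeedEpisode[]): string {
  const items = episodes
    .map(
      (ep) => `    <item>
      <title>${xmlEscape(ep.title)}</title>
      <description>${xmlEscape(ep.description)}</description>
      <guid isPermaLink="false">${xmlEscape(ep.guid)}</guid>
      <pubDate>${ep.pubDate.toUTCString()}</pubDate>
      <enclosure url="${xmlEscape(ep.audioUrl)}" length="${ep.audioBytes}" type="audio/mpeg"/>
    </item>`,
    )
    .join("\n");
  return `<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd">
  <channel>
    <title>${xmlEscape(feed.title)}</title>
    <link>${xmlEscape(feed.link)}</link>
    <description>${xmlEscape(feed.description)}</description>
    <language>${xmlEscape(feed.language)}</language>
    <itunes:image href="${xmlEscape(feed.coverUrl)}"/>
${items}
  </channel>
</rss>`;
}

// Weak ETag from a 32-bit FNV-1a hash over the feed's UTF-16 code units.
// Not cryptographic; it only needs to change when the feed text changes.
export function etagFor(body: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < body.length; i++) {
    hash ^= body.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return `W/"${hash.toString(16)}"`;
}
```

A Worker handler could compare etagFor(xml) with the request's If-None-Match header and return a 304 when they match, otherwise send the XML with Content-Type: application/rss+xml and a Cache-Control max-age of its choosing.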