App Store - Coming Soon

LivesLived Journal Transcription

Turn shoeboxes of letters, postcards, and journals into searchable, narrated, geographically mapped living archives.

An iOS archival intelligence app that combines multimodal AI, knowledge graphs, guided scanning flows, and multi-provider text-to-speech to transform fragile handwritten collections into interactive digital memory systems.

Elevator Pitch

LivesLived is an iOS app that turns shoeboxes of old letters, postcards, and journals into searchable, narrated, geographically mapped living archives using multimodal AI, knowledge graphs, and multi-provider text-to-speech.

Story

Most scanner tools stop at image capture. LivesLived is designed as a full archival workflow: intelligent scan acquisition, structured extraction, entity + location reasoning, interactive transcripts, and podcast-quality narration from historical documents.

Focus

Applied multimodal AI for family archives, historical research, and narrative preservation on iOS.

Product Type: iOS App

Technical Highlights

  • Custom capture stack with manual, voice, and auto-scan modes tuned for historical paper.
  • Schema-constrained multimodal AI extraction for reliable orientation, content, and metadata parsing.
  • Entity graph pipeline with deduplication, relationship mapping, and context-aware geocoding.
  • Interactive reader, transcript, and chat interfaces linked back to source image regions.
  • Audio Studio with script generation, multi-provider voice synthesis, and resumable chunk processing.
  • App Store-ready support/legal surface plus scalable cloud export paths.

The Problem

Millions of families hold fragile handwritten records that are functionally inaccessible: faded postcards, five-year diaries, envelopes with smudged postmarks, and annotations that traditional OCR cannot read reliably.

Existing scanner apps produce flat image libraries. They do not reconstruct relationships between people and places, they rarely handle historical cursive, and they do not create a coherent, explorable archive from hundreds of pages.

LivesLived solves this by treating historical paper as structured, connected data rather than static media.

What Makes It Exciting

Capture Engine Built for Real Archives

The app opens to Archives with collection thumbnails, page counts, and dates. Scanning supports manual shutter capture, voice-triggered capture, and auto-scan for stable document detection.

Voice commands run through on-device recognition and are tuned to avoid common iOS audio-session routing failures. Auto-scan uses real-time rectangle detection, smoothing, stability thresholds, and cooldown windows to reduce jitter and duplicate captures.

  • Manual mode: instant shutter sound with preloaded audio assets for low-latency feedback.
  • Voice mode: command vocabulary such as "snap" and "capture" with partial-transcript matching.
  • Auto-scan mode: rectangle detection, frame smoothing, stability gating, and timed cooldown (sketched after this list).
  • Guided workflows for letters, postcards, journals, and photographs with context labels per step.
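
A minimal sketch of the auto-scan gating loop, assuming normalized document rectangles arrive once per frame; the class name, smoothing factor, and thresholds here are illustrative rather than the shipping values:

```swift
import Foundation
import CoreGraphics

// Illustrative gate deciding when auto-scan may fire: exponential smoothing,
// a stability threshold measured in consecutive frames, and a cooldown window.
final class AutoScanGate {
    private var smoothedRect: CGRect?
    private var stableFrames = 0
    private var lastCapture = Date.distantPast

    private let smoothing: CGFloat = 0.25      // low-pass factor to damp jitter
    private let maxDrift = 0.02                // normalized center movement still "stable"
    private let requiredStableFrames = 12      // ~0.4 s at 30 fps
    private let cooldown: TimeInterval = 2.0   // suppresses duplicate captures

    /// Feed one detected document rectangle (normalized coordinates) per frame.
    /// Returns true when a capture should fire.
    func process(_ detected: CGRect?, now: Date = Date()) -> Bool {
        guard let rect = detected else {
            smoothedRect = nil                 // lost the document: reset
            stableFrames = 0
            return false
        }
        let previous = smoothedRect
        let smoothed = previous.map { old in
            CGRect(x: old.minX + (rect.minX - old.minX) * smoothing,
                   y: old.minY + (rect.minY - old.minY) * smoothing,
                   width: old.width + (rect.width - old.width) * smoothing,
                   height: old.height + (rect.height - old.height) * smoothing)
        } ?? rect
        smoothedRect = smoothed

        // Stability gate: the smoothed center must stop moving for N frames.
        let drift = previous.map { hypot(smoothed.midX - $0.midX, smoothed.midY - $0.midY) }
            ?? .greatestFiniteMagnitude
        stableFrames = drift < maxDrift ? stableFrames + 1 : 0

        // Cooldown gate: enforce a quiet window between auto-captures.
        guard stableFrames >= requiredStableFrames,
              now.timeIntervalSince(lastCapture) >= cooldown else { return false }
        lastCapture = now
        stableFrames = 0
        return true
    }
}
```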

Structured AI Pipeline, Not Fragile OCR Cleanup

Captured pages are processed through Gemini models using strict JSON schemas. The output is typed and predictable, avoiding brittle regex post-processing.

The extraction layer separates printed and handwritten text, infers orientation metadata, parses date context, and captures artifact-level details such as stamps, postmarks, and address blocks.

  • Per-page orientation: 0, 90, 180, 270 degrees with metadata-based correction.
  • Postcard-specific extraction: stamp regions, postmark date/location, sender and recipient blocks.
  • Journal entry segmentation: date inference across multi-year diaries with region bounding boxes.
  • Automatic translation for non-English pages with clean TTS handoff.
  • Parallel and sequential processing modes for different provider rate limits.
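
As a sketch of what "typed and predictable" means here, assume a Codable shape like the following for one page; the field names are hypothetical, but the pattern is a single typed decode instead of regex cleanup over free-form text:

```swift
import Foundation

// Hypothetical shape of one schema-constrained extraction response.
struct PageExtraction: Codable {
    enum Orientation: Int, Codable { case up = 0, right = 90, down = 180, left = 270 }

    struct Region: Codable {
        let text: String
        let isHandwritten: Bool        // printed vs. handwritten separation
        let boundingBox: [Double]      // normalized [x, y, width, height]
    }

    struct PostcardDetails: Codable {
        let postmarkDate: String?
        let postmarkLocation: String?
        let sender: String?
        let recipient: String?
        let stampBoundingBox: [Double]?
    }

    let orientation: Orientation       // drives metadata-based rotation
    let inferredDate: String?          // e.g. resolved within a five-year diary
    let language: String               // gates the translation + TTS handoff
    let regions: [Region]
    let postcard: PostcardDetails?     // present only for postcard artifacts
}

// Parsing the raw model output is then one typed call:
func decodeExtraction(_ raw: Data) throws -> PageExtraction {
    try JSONDecoder().decode(PageExtraction.self, from: raw)
}
```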

Collection Detail Experience

Each processed collection opens into an operator-grade workspace: Artifacts, Pages, Chat, and Entities. Instead of browsing isolated files, users navigate a cohesive archive with status visibility across processing and audio generation.

The reading interfaces are purpose-built by medium: page zoom for letters, 3D front/back transitions for postcards, and mini/full players for narrated playback.

  • Artifacts tab: grouped by type with status badges for AI, scripts, and audio readiness.
  • Pages tab: OCR-region overlays, grouping/ungrouping, and drag reorder workflows.
  • Interactive transcripts: tap text to highlight source regions on scanned images (see the sketch after this list).
  • Chat tab: collection-aware prompts for summary, names/dates, action items, and quiz generation.
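
A minimal sketch of the tap-to-highlight link, assuming each OCR region records the character range it contributed to the transcript (the type and field names are hypothetical):

```swift
import CoreGraphics

// Hypothetical link between transcript text and scanned-image regions:
// each region keeps the character range it contributed to the transcript,
// so a tap in the text can light up the matching area on the page image.
struct TranscriptRegion {
    let characterRange: Range<Int>   // offsets into the transcript string
    let imageRect: CGRect            // normalized rect on the scanned page
}

func region(at tappedOffset: Int, in regions: [TranscriptRegion]) -> TranscriptRegion? {
    regions.first { $0.characterRange.contains(tappedOffset) }
}
```

A tap handler converts the tapped text position to a character offset, calls region(at:in:), and draws the returned rect over the page image.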

Entity Graph + Map Intelligence

The entity system extracts PERSON, PLACE, THING, and IDEA nodes per page, then runs a merge pass to deduplicate aliases across the entire archive.

Location resolution uses context hints so ambiguous mentions map correctly, then persists geocoded coordinates for future sessions (a sketch follows the list below).

  • 3D SceneKit graph with force-directed layout and interactive node exploration.
  • MapKit visualization with color-coded entities and postcard route overlays.
  • Occurrence matching with character offsets for precise, clickable text references.
  • Stamps gallery generated from extracted postcard stamp regions.
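
A sketch of that context-aware lookup using CLGeocoder, with the hint folded into the query and results cached for reuse (the class name and persistence detail are assumptions):

```swift
import CoreLocation

// Hypothetical context-aware lookup: an ambiguous mention ("Springfield")
// is geocoded together with a hint from the surrounding archive, such as a
// postmark's country, and the coordinate is cached for later sessions.
final class ArchiveGeocoder {
    private let geocoder = CLGeocoder()
    private var cache: [String: CLLocationCoordinate2D] = [:]  // would be persisted in the app

    func resolve(place: String, contextHint: String?) async throws -> CLLocationCoordinate2D? {
        let query = [place, contextHint].compactMap { $0 }.joined(separator: ", ")
        if let hit = cache[query] { return hit }

        // CLGeocoder throttles rapid requests, so callers should serialize lookups.
        let placemarks = try await geocoder.geocodeAddressString(query)
        guard let coordinate = placemarks.first?.location?.coordinate else { return nil }
        cache[query] = coordinate
        return coordinate
    }
}
```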

Audio Studio and Voice Blueprinting

LivesLived includes a full audio production workflow, not just playback. It generates narration scripts that preserve the writer's voice while cleaning up OCR artifacts and formatting text for spoken flow.

Users can choose from multiple TTS providers and define reusable Voice Blueprints that shape identity, cadence, exclusions, and pronunciation.

  • Provider support: Gemini, OpenAI, Groq, and Apple system voices.
  • Chunked generation pipeline with provider-aware limits and resume support (sketched after this list).
  • Background-safe generation and robust partial-progress recovery.
  • Optional SoundCloud publishing via OAuth 2.0 with PKCE.
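
A sketch of the resumable chunk idea (the splitter, type names, and persistence hooks are assumptions; the real pipeline would also vary chunk size per provider):

```swift
import Foundation

// Hypothetical resumable chunk pipeline for long narration scripts.
// Scripts are split under a provider's character limit; finished chunk
// indices are recorded so an interrupted run restarts at the first gap.
struct ChunkedNarrationJob {
    let chunks: [String]
    private(set) var completed: Set<Int> = []   // persisted alongside the collection

    init(script: String, providerLimit: Int) {
        // Split on sentence boundaries, packing greedily up to the limit.
        var current = ""
        var result: [String] = []
        for sentence in script.split(separator: ".", omittingEmptySubsequences: true) {
            let piece = sentence.trimmingCharacters(in: .whitespaces) + "."
            if current.count + piece.count + 1 > providerLimit, !current.isEmpty {
                result.append(current)
                current = piece
            } else {
                current = current.isEmpty ? piece : current + " " + piece
            }
        }
        if !current.isEmpty { result.append(current) }
        chunks = result
    }

    /// Synthesize remaining chunks; `synthesize` wraps whichever provider is selected.
    mutating func run(synthesize: (String) async throws -> Data,
                      save: (Int, Data) throws -> Void) async throws {
        for (index, text) in chunks.enumerated() where !completed.contains(index) {
            let audio = try await synthesize(text)  // provider-aware call
            try save(index, audio)                  // write partial progress first
            completed.insert(index)                 // then mark resumable state
        }
    }
}
```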

Cloud Upload and Developer Controls

Collections can be exported to cloud integrations via multipart uploads with metadata and MIME-aware file detection.
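
A sketch of such an export, assuming a standard form-data endpoint; the field names are hypothetical, but the MIME detection via UniformTypeIdentifiers is the relevant pattern:

```swift
import Foundation
import UniformTypeIdentifiers

// Hypothetical multipart export. MIME types are derived from each file's
// extension via UniformTypeIdentifiers rather than a hardcoded table.
func uploadCollection(files: [URL], metadata: [String: String],
                      to endpoint: URL) async throws {
    let boundary = "Boundary-\(UUID().uuidString)"
    var body = Data()

    for (key, value) in metadata {
        body.append(Data("--\(boundary)\r\nContent-Disposition: form-data; name=\"\(key)\"\r\n\r\n\(value)\r\n".utf8))
    }
    for file in files {
        let mime = UTType(filenameExtension: file.pathExtension)?.preferredMIMEType
            ?? "application/octet-stream"   // fallback for unknown extensions
        body.append(Data("--\(boundary)\r\nContent-Disposition: form-data; name=\"files\"; filename=\"\(file.lastPathComponent)\"\r\nContent-Type: \(mime)\r\n\r\n".utf8))
        body.append(try Data(contentsOf: file))
        body.append(Data("\r\n".utf8))
    }
    body.append(Data("--\(boundary)--\r\n".utf8))

    var request = URLRequest(url: endpoint)
    request.httpMethod = "POST"
    request.setValue("multipart/form-data; boundary=\(boundary)", forHTTPHeaderField: "Content-Type")
    let (_, response) = try await URLSession.shared.upload(for: request, from: body)
    guard let status = (response as? HTTPURLResponse)?.statusCode,
          (200..<300).contains(status) else {
        throw URLError(.badServerResponse)
    }
}
```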

Power-user settings expose model toggles, batching controls, scanning modes, sequential processing for strict rate limits, and raw-response debugging.

  • Model switching between Gemini variants.
  • Configurable batch size and processing strategy.
  • Admin mode for raw structured response inspection.
  • Diagnostics tooling for live audio-session state capture.

Technical Stack

  • SwiftUI + SwiftData
  • AVFoundation
  • Vision + VisionKit
  • Speech Framework
  • SceneKit
  • MapKit + CoreLocation
  • MediaPlayer
  • Firebase AI / Vertex AI
  • Custom URL scheme for OAuth callbacks
  • Background audio processing modes

The Vision

A grandmother's WWII letters, a collector's postcards from 1920s Europe, or a researcher's decades of handwritten journals can become searchable, narrated, and connected digital memory systems that stay alive across generations.
