Author: Smin Rana

    Building AI‑Native iOS Features: On‑device LLMs with Core ML and MLX

    Ship a semantic search feature that feels instant, works offline, preserves privacy, and raises engagement — built natively on iOS with Core ML and MLX. Success looks like faster findability, longer sessions, and users trusting the app with their notes, docs, or content.

    What Success Looks Like

    • Latency: <100ms for typical queries on modern iPhones
    • Privacy: zero network calls for core interactions; clear user consent
    • Engagement: +20–30% more successful searches; +10–15% longer sessions
    • Reliability: graceful degradation when indexing is interrupted; safe cancellation

    The User Problem

    Users don’t remember exact words — they remember ideas. Literal search makes them feel clumsy and slow. We want the app to understand meaning: “winter boarding checklist,” “Swift actor pattern,” “sound design notes” — even if phrased differently.

    Our Constraints

    • On‑device by default (ANE/GPU/CPU), no per‑keystroke backend calls
    • Battery‑aware and memory‑bounded; performance budgets per screen
    • Simple, testable architecture; no mystery schedulers or hidden queues

    The Plan

    1. Represent meaning with embeddings
    2. Make retrieval fast with a local index
    3. Keep it private and instant with aggressive caching
    4. Respect energy and memory budgets
    5. Tell a clear UX story with tight feedback loops

    Prototyping on Mac with MLX

    We began on Apple Silicon, where iteration speed wins. MLX let us test small transformer‑based embedding models, tune dimensions, and measure throughput across real text (notes, code snippets, short docs). We weren’t chasing leaderboard scores — we optimized for consistency and speed in our domain.

    Core ML Conversion for iOS

    Once our embedding model behaved well, we converted it to Core ML (coremltools) and compiled it into .mlmodelc assets. This brought ANE acceleration and stable APIs. We wrapped the model in a boring interface: “give me a string, I’ll return a float vector.” No surprises.

    A Boring, Reliable Wrapper

    import CoreML
    
    final class TextEmbeddingModel {
        enum ModelError: Error { case outputMissing }
        private let model: MLModel
    
        init() {
            // The compiled model ships in the app bundle; if it is missing, that is a build error,
            // so failing fast here is deliberate.
            let url = Bundle.main.url(forResource: "TextEmbed", withExtension: "mlmodelc")!
            model = try! MLModel(contentsOf: url)
        }
    
        func embed(_ text: String) throws -> [Float] {
            let input = try MLDictionaryFeatureProvider(dictionary: ["text": text])
            let output = try model.prediction(from: input)
            guard let arr = output.featureValue(for: "embedding")?.multiArrayValue else {
                throw ModelError.outputMissing
            }
            // Copy the MLMultiArray into a plain [Float] so callers never touch Core ML types.
            var result = [Float](repeating: 0, count: arr.count)
            for i in 0..<arr.count { result[i] = arr[i].floatValue }
            return result
        }
    }

    Pipeline: Normalize, Cache, Index

    The model is a component; the pipeline is the feature. We normalized inputs, cached aggressively, and isolated mutation with actors.

    struct EmbeddingResult: Sendable { let vector: [Float]; let key: String }
    
    actor EmbeddingCache {
        private var store: [String: [Float]] = [:]
        func get(_ key: String) -> [Float]? { store[key] }
        func put(_ key: String, _ vector: [Float]) { store[key] = vector }
    }
    
    struct TextPreprocessor {
        static func normalize(_ s: String) -> String { s.lowercased().trimmingCharacters(in: .whitespacesAndNewlines) }
        static func key(for s: String) -> String { String(s.hashValue) } // replace with stable hash
    }
    
    actor EmbeddingService {
        private let cache = EmbeddingCache()
        private let model = TextEmbeddingModel()
    
        func embed(_ text: String) async throws -> EmbeddingResult {
            let clean = TextPreprocessor.normalize(text)
            let key = TextPreprocessor.key(for: clean)
            if let cached = await cache.get(key) { return EmbeddingResult(vector: cached, key: key) }
            let vector = try model.embed(clean)
            await cache.put(key, vector)
            return EmbeddingResult(vector: vector, key: key)
        }
    }
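
    As the comment above notes, hashValue is randomized per process, so it cannot key a cache that outlives a launch. A stable key is a few lines with CryptoKit; stableKey(for:) below is an illustrative drop-in for TextPreprocessor.key(for:), not the app's exact implementation.

    import CryptoKit
    import Foundation
    
    // SHA-256 over the normalized text, hex-encoded: the same input yields the same key
    // across launches, so persisted cache entries stay addressable.
    func stableKey(for normalizedText: String) -> String {
        let digest = SHA256.hash(data: Data(normalizedText.utf8))
        return digest.map { String(format: "%02x", $0) }.joined()
    }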

    Local Retrieval with Cosine Similarity

    Cosine similarity is simple and effective for semantic search. We kept writes serialized and reads fast.

    actor VectorIndex {
        struct Item: Sendable { let id: String; let vector: [Float] }
        private var items: [Item] = []
    
        func upsert(_ item: Item) {
            if let idx = items.firstIndex(where: { $0.id == item.id }) { items[idx] = item } else { items.append(item) }
        }
    
        func topK(query: [Float], k: Int = 10) -> [Item] {
            let scored = items.map { ($0, cosine($0.vector, query)) }
            return scored.sorted(by: { $0.1 > $1.1 }).prefix(k).map { $0.0 }
        }
    
        private func cosine(_ a: [Float], _ b: [Float]) -> Float {
            var dot: Float = 0, na: Float = 0, nb: Float = 0
            for i in 0..<min(a.count, b.count) { dot += a[i]*b[i]; na += a[i]*a[i]; nb += b[i]*b[i] }
            return dot / (sqrt(na) * sqrt(nb) + 1e-6)
        }
    }
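
    As the index grows, the scalar loop above becomes the hot path. Accelerate can compute the same score with a few vectorized calls; the sketch below assumes equal-length, non-empty vectors and is a straightforward substitute for the private cosine helper.

    import Accelerate
    
    // Same cosine similarity, vectorized with vDSP; assumes a.count == b.count.
    func cosineAccelerated(_ a: [Float], _ b: [Float]) -> Float {
        let dot = vDSP.dot(a, b)
        let normA = vDSP.sumOfSquares(a).squareRoot()
        let normB = vDSP.sumOfSquares(b).squareRoot()
        return dot / (normA * normB + 1e-6)
    }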

    UX: Instant Feedback, Honest Ranking

    Users type; results update. We debounced input, embedded the query, fetched top matches locally, and updated the UI — no waiting on a network round‑trip. We surfaced “why” explanations next to results (“matched concepts: winter boarding, checklist”).

    import Combine
    
    // Minimal display model for this article; the real app maps index hits to richer metadata.
    struct Doc: Identifiable { let id: String; let title: String }
    
    @MainActor
    final class SearchViewModel: ObservableObject {
        @Published var query: String = ""
        @Published var results: [Doc] = []
        private let embedder = EmbeddingService()
        private let index = VectorIndex()
        private var searchTask: Task<Void, Never>?
    
        func search(_ q: String) {
            query = q
            searchTask?.cancel() // a new keystroke supersedes the in-flight search
            searchTask = Task { [weak self] in
                guard let self else { return }
                do {
                    let emb = try await self.embedder.embed(q)
                    guard !Task.isCancelled else { return }
                    let local = await self.index.topK(query: emb.vector, k: 20)
                    // Placeholder mapping; look up real titles by id in the app.
                    self.results = local.map { Doc(id: $0.id, title: $0.id) }
                } catch {
                    // Handle gracefully: keep the previous results and log the error.
                }
            }
        }
    }
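
    On the SwiftUI side, .task(id:) gives a lightweight debounce: editing the query cancels the previous task, and a short sleep swallows intermediate keystrokes. SearchView below is a sketch, not the shipping screen.

    import SwiftUI
    
    struct SearchView: View {
        @StateObject private var viewModel = SearchViewModel()
        @State private var text = ""
    
        var body: some View {
            NavigationStack {
                List(viewModel.results) { doc in
                    Text(doc.title)
                }
                .searchable(text: $text)
                .navigationTitle("Search")
            }
            .task(id: text) {
                // Debounce: a new keystroke cancels this task and starts a fresh one.
                try? await Task.sleep(nanoseconds: 200_000_000)
                guard !Task.isCancelled else { return }
                viewModel.search(text)
            }
        }
    }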

    Budgets and Profiling (ANE/Metal)

    We set measurable budgets and stuck to them:

    • Concurrency: 4 tasks for search, 6 for background indexing (see the sketch after this list)
    • Memory: cap vector lengths; compress idle caches
    • Energy: no long‑running work triggered by typing
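
    Holding the indexing cap is mostly a matter of refilling a task group. The sketch below reuses EmbeddingService and VectorIndex from earlier; indexInBackground and the (id, text) input shape are our own names, shown as one way to keep at most six embedding tasks in flight.

    func indexInBackground(_ documents: [(id: String, text: String)],
                           embedder: EmbeddingService,
                           index: VectorIndex,
                           maxConcurrent: Int = 6) async throws {
        try await withThrowingTaskGroup(of: Void.self) { group in
            var pending = documents[...]
    
            // Start at most maxConcurrent embedding tasks up front.
            for doc in pending.prefix(maxConcurrent) {
                group.addTask {
                    let result = try await embedder.embed(doc.text)
                    await index.upsert(.init(id: doc.id, vector: result.vector))
                }
            }
            pending = pending.dropFirst(maxConcurrent)
    
            // Refill as each task finishes so the cap holds for the whole batch.
            while try await group.next() != nil {
                guard let doc = pending.first else { continue }
                pending = pending.dropFirst()
                group.addTask {
                    let result = try await embedder.embed(doc.text)
                    await index.upsert(.init(id: doc.id, vector: result.vector))
                }
            }
        }
    }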

    We used Instruments — Energy, Time Profiler, Allocations, Concurrency — and added signposts around embedding and retrieval.

    import os
    
    let log = OSLog(subsystem: "com.app", category: "ai")
    let sp = OSSignposter(logHandle: log)
    
    func signposted<T>(_ name: StaticString, _ op: () async throws -> T) async rethrows -> T {
        let state = sp.beginInterval(name)
        defer { sp.endInterval(name, state) }
        return try await op()
    }

    Guardrails and Trust (Consent, Accessibility, Explainability)

    We treated AI like a respectful assistant:

    • Transparent consent and on‑device defaults
    • Clear controls to pause/stop generation
    • Constrained prompts and output lengths
    • Simple “why” explanations to reduce surprise

    Persistence and Resilience

    We persisted embeddings and outputs with lightweight indexing, batched writes, and versioned caches. When the model changed, we invalidated cleanly and rebuilt in the background. Checkpoints let long jobs resume.
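
    A sketch of the versioning idea: persist vectors alongside the model version that produced them, and treat a mismatch as a full invalidation. EmbeddingStore, the JSON encoding, and the function names are illustrative, not the app's actual storage format.

    import Foundation
    
    struct EmbeddingStore: Codable {
        let modelVersion: String
        var vectors: [String: [Float]]   // cache key -> embedding
    }
    
    func loadStore(from url: URL, currentVersion: String) -> EmbeddingStore {
        if let data = try? Data(contentsOf: url),
           let store = try? JSONDecoder().decode(EmbeddingStore.self, from: data),
           store.modelVersion == currentVersion {
            return store
        }
        // Missing, corrupt, or produced by an older model: start fresh and rebuild in the background.
        return EmbeddingStore(modelVersion: currentVersion, vectors: [:])
    }
    
    func saveStore(_ store: EmbeddingStore, to url: URL) throws {
        try JSONEncoder().encode(store).write(to: url, options: .atomic)
    }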

    App Intents (Shortcuts)

    We exposed quick actions and Shortcuts so users could jump directly to “ideas about audio” or “notes on actors,” making the feature feel native beyond the app.
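
    A sketch of one such entry point using the App Intents framework; SearchNotesIntent and the dialog wiring are illustrative, and the real intent would call the shared embedding service and index.

    import AppIntents
    
    struct SearchNotesIntent: AppIntent {
        static var title: LocalizedStringResource = "Search Notes"
    
        @Parameter(title: "Query")
        var query: String
    
        func perform() async throws -> some IntentResult & ProvidesDialog {
            // Illustrative: embed the query and hit the local index here.
            let matchCount = 0
            return .result(dialog: "Found \(matchCount) notes for \(query)")
        }
    }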

    Keeping the Binary Lean

    We shipped a base embedding model, downloaded larger variants on demand, and audited assets ruthlessly. Smaller apps install more, start faster, and crash less.
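
    One way to handle the on-demand part: ship only the base model and, when a larger variant is needed, download the .mlmodel and compile it on device with MLModel.compileModel(at:). The URL handling and file names below are assumptions, not the app's actual asset pipeline.

    import CoreML
    import Foundation
    
    // Download a larger variant on demand, compile it to .mlmodelc, and cache the result.
    func loadDownloadedModel(from remoteURL: URL) async throws -> MLModel {
        let caches = try FileManager.default.url(for: .cachesDirectory, in: .userDomainMask,
                                                 appropriateFor: nil, create: true)
        let compiledURL = caches.appendingPathComponent("TextEmbedLarge.mlmodelc")
    
        if !FileManager.default.fileExists(atPath: compiledURL.path) {
            let (tempURL, _) = try await URLSession.shared.download(from: remoteURL)
            // Give the temp file a .mlmodel extension so Core ML recognizes it.
            let modelURL = tempURL.deletingPathExtension().appendingPathExtension("mlmodel")
            try FileManager.default.moveItem(at: tempURL, to: modelURL)
    
            let compiled = try await MLModel.compileModel(at: modelURL)
            try? FileManager.default.removeItem(at: compiledURL)   // clear any stale copy
            try FileManager.default.moveItem(at: compiled, to: compiledURL)
        }
        return try MLModel(contentsOf: compiledURL)
    }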

    Testing and CI/CD

    We tested actor‑isolated caches and indexes, verified cancellation, used fixtures for embeddings, and avoided sleeps. In CI, we staged model assets and gated releases with end‑to‑end tests. Budgets were checked on physical devices before TestFlight.
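
    For example, exercising the actor-isolated cache needs nothing more than async/await; a sketch with XCTest:

    import XCTest
    
    final class EmbeddingCacheTests: XCTestCase {
        func testCacheRoundTrip() async {
            let cache = EmbeddingCache()
            let vector: [Float] = [0.1, 0.2, 0.3]
    
            // Miss before insert, hit after; actor isolation makes this safe without locks or sleeps.
            let miss = await cache.get("k")
            XCTAssertNil(miss)
    
            await cache.put("k", vector)
            let hit = await cache.get("k")
            XCTAssertEqual(hit, vector)
        }
    }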

    Results

    • Latency consistently under 100ms for typical queries
    • Dramatic increase in successful searches and longer sessions
    • Fewer support tickets about “can’t find my note”
    • Positive reviews citing speed and trust (“works offline, feels instant”)

    Lessons

    • The model is not the feature; the pipeline is
    • Ownership and isolation prevent heisenbugs and copy storms
    • Budgets make performance a product choice, not luck
    • On‑device by default earns trust and word‑of‑mouth

    Implementation Checklist

    • [ ] Define objective and metrics (latency, privacy, engagement)
    • [ ] Prototype embeddings with MLX on Mac (tune dimensions/tokenization)
    • [ ] Convert to Core ML (.mlmodelc) and wrap a stable API
    • [ ] Build pipeline: normalize, cache, index
    • [ ] Implement local retrieval and “why” explanations
    • [ ] Set concurrency/memory/energy budgets; add signposts; profile on device
    • [ ] Persist vectors; batch writes; version caches; checkpoints
    • [ ] Integrate App Intents (Shortcuts) for quick actions
    • [ ] Keep binary lean; stage assets; test on physical devices
    • [ ] Monitor results; iterate

    FAQs

    • What is on‑device AI for iOS?
      • Running models locally on iPhone/iPad using Core ML/Metal/ANE, keeping latency low and data private.
    • Core ML vs MLX — which should I use?
      • Use MLX on Mac for rapid prototyping and custom layers; convert to Core ML for production iOS deployment with ANE acceleration.
    • Can iPhones run LLMs?
      • Yes, small distilled models are practical for templated generation, short summaries, and classification with rationale.
    • How do I keep battery usage low?
      • Cap concurrency, use ANE where available, measure with Instruments, avoid long tasks on user input.
    • How do I ensure privacy?
      • Avoid per‑keystroke network calls; keep embeddings and retrieval on device; offer opt‑in for remote expansion.
    • How do I tune search quality?
      • Normalize inputs, cache aggressively, and tune embedding dimensions/tokenization for your domain; surface “why” explanations.

    Where This Goes Next

    We’ve reused the pipeline to power intent suggestions, lightweight categorization, and short previews. The same embedding cache and index became a platform inside the app. Small, reliable pieces compound.

    Case Study 5: Dev Tool MVP

    I built a CLI tool intended to standardize local development setup across microservices. The promise: one command—dev bootstrap—that discovers services, generates .env files, and starts containers via Docker Compose. In demos, it was magical. In real teams, it broke in 40% of setups due to bespoke scripts, Compose version drift, OS differences, and odd edge cases. The MVP automated too much, too early, and eroded trust.

    This article explains what I built, why it failed, and how I would rebuild the MVP around a clear compatibility contract and a validator-first workflow that earns trust before automating.

    The Context: Diverse Stacks, Fragile Automation

    Microservice repos evolve organically. Teams glue together language-specific tools, local caches, custom scripts, and different container setups. A tool that tries to own the entire “bootstrap and run” flow without a shared contract is brittle.

    What I Built (MVP Scope)

    • Discovery: Scan repos for services via file patterns.
    • Env Generation: Infer env keys from docker-compose.yml and sample .env.example files; produce unified .env.
    • Compose Orchestration: Start all services locally with one command.
    • Opinionated Defaults: Assume standard port ranges and common service names.
    • Metrics: Time to first run, number of successful bootstraps per team.

    Launch and Early Results

    • Solo demos worked spectacularly.
    • Team pilots revealed fragility: custom scripts, non-standard Compose naming, and OS-specific quirks caused frequent failures.
    • Trust dropped quickly; teams reverted to their known scripts.

    Why It Failed: Over-Automation Without a Contract

    I tried to automate the whole workflow without agreeing on a small, stable contract that teams could satisfy. Without a shared “dev.json” or similar spec, guessing env keys and start commands led to errors. Reliability suffered, and with dev tools, reliability is the MVP.

    Root causes:

    • Inference Errors: Guessing configurations from heterogeneous repos is error-prone.
    • Hidden Assumptions: Opinionated defaults clashed with local reality.
    • No Validation Step: Users couldn’t see or fix mismatches before automation ran.

    The MVP I Should Have Built: Validate and Guide

    Start with a minimal compatibility contract and a validator that helps teams conform incrementally.

    • Contract: Each service exposes a dev.json containing ports, env keys, and start command.
    • Validator CLI: dev validate checks conformance, explains gaps, and suggests fixes.
    • Linter: Provide a linter for dev.json with clear error messages.
    • Guided Setup: Generate .env from dev.json and start one service at a time.
    • Telemetry: Track validation pass rate, categories of errors, and time to first successful run.

    How It Would Work (Still MVP)

    • Step 1: Teams add dev.json to each service with minimal fields.
    • Step 2: Run dev validate; fix issues based on actionable messages.
    • Step 3: Use dev env to generate environment files deterministically.
    • Step 4: Start one service with dev run service-a; expand to orchestration only after a high pass rate.

    This builds trust by making the tool predictable and by exposing mismatches early.

    Technical Shape

    • Schema: dev.json with fields { name, port, env: [KEY], start: "cmd" } (example after this list).
    • Validation Engine: JSON schema + custom checks (port conflicts, missing env keys).
    • Compose Adapter: Optional; reads from dev.json to generate Compose fragments rather than infer from arbitrary files.
    • Cross-Platform Tests: Simple checks for OS differences (path separators, shell commands).
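
    For concreteness, a dev.json conforming to that schema might look like the following; the values are illustrative.

    {
      "name": "billing-service",
      "port": 8081,
      "env": ["DATABASE_URL", "STRIPE_API_KEY"],
      "start": "npm run dev"
    }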

    Measuring Trust

    • Validation Pass Rate: Percentage of services passing dev validate.
    • First Successful Run: Time from install to one service running.
    • Error Categories: Distribution helps prioritize adapters and docs.
    • Rollback Incidents: Track how often teams abandon the tool mid-setup.

    Onboarding and Documentation

    • Quick Start: Create dev.json with a template; run dev validate.
    • Troubleshooting: Clear guides for common errors with copy-paste fixes.
    • Contracts Over Recipes: Emphasize the compatibility contract and why it exists.

    Personal Reflections

    I wanted the “it just works” moment so much that I skipped the steps that make “it just works” possible: a shared spec and a validator. Dev teams reward predictability over magic; trust is the currency.

    Counterfactual Outcomes

    With a validator-first MVP:

    • Validation pass rate climbs from ~40% to ~80% in two months.
    • Time to first successful run drops significantly.
    • Teams adopt the tool gradually, and orchestration becomes feasible.

    Iteration Path

    • Add adapters for common stacks (Node, Python, Go).
    • Introduce a dev doctor command that diagnoses OS and toolchain issues.
    • Expand the contract only as needed; resist auto-inference beyond the spec.

    Closing Thought

    For dev tools, the smallest viable product is a trust-building tool: define a minimal contract, validate it, and guide teams to conformance. Automate only after reliability is demonstrated. Magic is delightful, but trust is what sticks.

    Case Study 4: Consumer Health MVP

    The product was a habit-building app focused on sleep: wind-down routines, gentle alarms, and a simple educational library. The launch was exciting—we onboarded ~500 users via two TikTok creators. Engagement was strong in the first week thanks to streaks and badges. But adherence to core routines lagged, and by week three, many users were checking in without actually following the behaviors that mattered. The MVP drove taps, not change.

    This article breaks down the design, what didn’t work, and how I would rebuild the MVP around personalization, adaptive scheduling, and a coach-like loop that respects real-life constraints.

    The Context: Sleep Behaviors Are Constraint-Driven

    People’s lives shape their sleep more than motivation alone. Shift work, small children, travel, and social commitments make “ideal” routines unrealistic. The MVP assumed generic routines suited most people, which backfired. Users wanted guidance tailored to their circumstances, not gamification.

    What I Built (MVP Scope)

    • Routines: Wind-down steps (dim lights, screen off, breathing exercises), and a gentle wake alarm.
    • Streaks and Badges: Gamified adherence with daily streaks and weekly badges.
    • Educational Library: Short articles on sleep hygiene.
    • Reminders: Fixed-time prompts for wind-down and bedtime.
    • Metrics: Daily check-ins, streak length, weekly summaries.

    Launch and Early Signals

    • Activation was strong: ~70% completed the first wind-down routine.
    • Streaks increased check-ins but not adherence to the core behavior (e.g., screens off by 10 pm consistently).
    • Users reported “feeling good about tracking,” but didn’t see improvements in sleep quality.

    Key complaints:

    • “My schedule varies; the app nags me at the wrong times.”
    • “Badges don’t help when my kid wakes up at 3am.”
    • “Travel breaks my streak, and then I stop caring.”

    Why It Failed: Motivation Without Personalization

    I gamified behavior without modeling constraints. The MVP treated adherence as a universal routine problem rather than a personal scheduling problem. Without adapting to real life, users ignored reminders or checked in perfunctorily.

    Root causes:

    • Generic routines: Assumed one-size-fits-most.
    • Naive reminders: Fixed times didn’t adjust to late nights or early mornings.
    • No segment-specific guidance: Shift workers and new parents have different protocols.

    The MVP I Should Have Built: Personalization First, Then Motivation

    Start with one segment and tailor deeply. For example, shift workers. Build protocols specific to circadian challenges:

    • Protocols: Light exposure timing, nap rules, caffeine cutoffs aligned to shift patterns.
    • Adaptive Scheduling: Detect late shifts and adjust wind-down and wake times within guardrails.
    • Key Habit Metric: Track one behavior that matters (e.g., screens off by 10 pm four days/week) and correlate with subjective sleep quality.
    • Coach Moments: Replace badges with context-aware guidance and weekly plan adjustments.

    How It Would Work (Still MVP)

    • Onboarding: Ask about shift schedule or parenting constraints; pick a protocol.
    • Daily Flow: The app proposes a tailored wind-down and wake plan; adjusts if you log a late night.
    • Feedback Loop: Weekly review suggests a small adjustment (e.g., move wind-down earlier by 15 minutes) and explains why.
    • Success Metric: Adherence to the key habit and reported sleep quality trend.

    Technical Shape

    • Scheduling Engine: Rule-based adjustments (if late night logged, push wake by 30 minutes; enforce max shift), sketched below.
    • Signal Inputs: Manual logs initially; later integrate phone usage or light sensor where available.
    • Content System: Protocol-specific modules rather than generic tips.
    • Data and Privacy: Local storage for sensitive logs; opt-in sync for backups.
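
    The scheduling rule above, as a sketch: SleepPlan, adjustForLateNight, and the 90-minute cap are illustrative names and numbers, not a committed design.

    import Foundation
    
    struct SleepPlan {
        var windDown: Date
        var wake: Date
    }
    
    // If a late night is logged, push the plan back by 30 minutes,
    // but never drift more than maxShift from the protocol's baseline.
    func adjustForLateNight(_ plan: SleepPlan, baseline: SleepPlan,
                            maxShift: TimeInterval = 90 * 60) -> SleepPlan {
        let step: TimeInterval = 30 * 60
        let currentDrift = plan.wake.timeIntervalSince(baseline.wake)
        let shift = min(step, max(0, maxShift - currentDrift))   // enforce the cap
        return SleepPlan(windDown: plan.windDown.addingTimeInterval(shift),
                         wake: plan.wake.addingTimeInterval(shift))
    }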

    Measuring What Matters

    • Adherence Rate: Percentage of days the key habit is followed.
    • Quality Trend: Subjective sleep quality over time.
    • Adjustment Efficacy: Whether weekly plan changes improve adherence.
    • Drop-off Analysis: Identify segments with high abandonment to refine protocols.

    Personal Reflections

    I leaned on gamification because it’s easy to ship and feel good about. But in health, behavior change requires modeling constraints and giving actionable, compassionate guidance. People don’t fail because they don’t care—they fail because life is complicated.

    Counterfactual Outcomes

    With a tailored MVP for shift workers:

    • Adherence to the one key habit increases from ~35% to ~60%.
    • Reported sleep quality improves modestly but consistently over six weeks.
    • Drop-offs decrease because schedules feel respected and adjustments make sense.

    Even small improvements mean real value, because they’re sustainable.

    Iteration Path

    • Add segments: New parents, frequent travelers.
    • Introduce adaptive reminders with more signals (calendar, device usage) with strict privacy controls.
    • Layer gentle motivation (streaks) only after personalization works.
    • Explore “coach check-ins” via chat prompts for accountability.

    Closing Thought

    Health MVPs shouldn’t start with gamification. Start with constraints: tailor protocols to one segment, make schedules adaptive, and measure adherence to one meaningful habit alongside perceived quality. Motivation supports behavior; personalization enables it.
