# PDF-Native Architecture

> Vector-first PDF ingestion. No in-browser raster conversion. No tainted-canvas re-uploads.

**Canonical URL:** https://mybluegrid.com/knowledge-hub/pdf-native-architecture
**Last verified:** 2026-05-04T00:00:00Z

## Summary

BlueGrid reads PDF vector data directly. Raster conversion happens only when a downstream AI tool requires a pixel image, and then it runs on the Cloud Run worker, not in the browser. This avoids the browser-side memory and CPU overhead that forces raster-first tools to cap file size or page count.

## Ingestion Path

1. **Storage download as Blob.** Plan PDFs are pulled from Supabase Storage with `.download()`, which returns a `Blob` whose bytes are read into an `ArrayBuffer` for parsing. Signed URLs are not used for canvas-bound rendering; this prevents tainted-canvas errors that would otherwise force a re-upload.
2. **PDF.js parse.** Vector pages, text layers, and metadata are parsed lazily. Pages are decoded only when the viewer requests them.
3. **Annotation bake (when needed).** User annotations are upserted back into the source PDF before AI tool execution so the worker sees what the estimator sees.
4. **Coordinate mapping.** A constant scale of 72 px per inch (`PX_PER_INCH`) is enforced. Because PDF user space also measures 72 units per inch, AI-returned bounding boxes map back to the original PDF user-space coordinates without ambiguity.
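Step 1 can be sketched as follows, assuming a bucket client whose `download(path)` resolves to the `{ data, error }` shape that supabase-js storage returns. The `BinaryBody`, `DownloadResult`, `BucketClient`, and `loadPlanBytes` names are hypothetical and exist only for illustration; a real `Blob` satisfies `BinaryBody`.

```typescript
// Hypothetical structural types mirroring the { data, error } result of a
// storage download. A browser Blob satisfies BinaryBody.
interface BinaryBody {
  arrayBuffer(): Promise<ArrayBuffer>;
}
interface DownloadResult {
  data: BinaryBody | null;
  error: { message: string } | null;
}
interface BucketClient {
  download(path: string): Promise<DownloadResult>;
}

// Pull the plan bytes in-process so the PDF parser receives raw data
// instead of a URL it would have to fetch itself.
async function loadPlanBytes(bucket: BucketClient, path: string): Promise<Uint8Array> {
  const { data, error } = await bucket.download(path);
  if (error) {
    throw new Error(`download failed for ${path}: ${error.message}`);
  }
  if (!data) {
    throw new Error(`no data returned for ${path}`);
  }
  return new Uint8Array(await data.arrayBuffer());
}
```

The resulting bytes can then be handed to PDF.js as `getDocument({ data: bytes })`, so the renderer never loads the plan from a URL.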
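The lazy decoding in step 2 can be illustrated with a memoizing page cache. This is a pattern sketch under stated assumptions, not BlueGrid's implementation: `LazyPageCache` and its injected `load` callback are hypothetical, and in a PDF.js viewer the callback would typically wrap `PDFDocumentProxy.getPage(pageNumber)`.

```typescript
// Hypothetical loader: decodes one page (1-based) on demand.
type PageLoader<T> = (pageNumber: number) => Promise<T>;

// Starts at most one decode per page; repeat requests share the same promise.
class LazyPageCache<T> {
  private readonly pages = new Map<number, Promise<T>>();
  private loads = 0;

  constructor(
    private readonly numPages: number,
    private readonly load: PageLoader<T>,
  ) {}

  get(pageNumber: number): Promise<T> {
    if (!Number.isInteger(pageNumber) || pageNumber < 1 || pageNumber > this.numPages) {
      return Promise.reject(new RangeError(`page ${pageNumber} is outside 1..${this.numPages}`));
    }
    const cached = this.pages.get(pageNumber);
    if (cached !== undefined) return cached;

    this.loads += 1;
    // Evict failed decodes so a later request can retry the page.
    const pending = this.load(pageNumber).catch((err: unknown) => {
      this.pages.delete(pageNumber);
      throw err;
    });
    this.pages.set(pageNumber, pending);
    return pending;
  }

  // Number of decode calls started so far.
  get decodedCount(): number {
    return this.loads;
  }
}
```

In this sketch, opening a 500-page set and viewing two sheets starts exactly two decodes; untouched pages cost nothing until the viewer asks for them.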
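PDF user space defaults to 1 unit = 1/72 inch with the origin at the bottom-left of the page, while raster images put the origin at the top-left. Rendering at exactly 72 px per inch makes the scale factor 1, so mapping a box back only requires flipping the y-axis. The sketch below assumes the AI tool returns `{ x, y, width, height }` in pixels measured from the top-left of an unrotated page whose lower-left corner sits at (0, 0); `PixelBox`, `PdfRect`, and `pixelBoxToPdfRect` are hypothetical names.

```typescript
const PX_PER_INCH = 72; // render scale enforced for AI tools
const PDF_UNITS_PER_INCH = 72; // default PDF user-space unit is 1/72 inch

// Box as returned by an AI tool: pixels, origin at the top-left, y grows downward.
interface PixelBox { x: number; y: number; width: number; height: number; }
// Rectangle in PDF user space: origin at the bottom-left, y grows upward.
interface PdfRect { x0: number; y0: number; x1: number; y1: number; }

function pixelBoxToPdfRect(
  box: PixelBox,
  pageHeightPt: number,
  pxPerInch: number = PX_PER_INCH,
): PdfRect {
  const scale = PDF_UNITS_PER_INCH / pxPerInch; // exactly 1 at 72 px per inch
  return {
    x0: box.x * scale,
    x1: (box.x + box.width) * scale,
    // Flip the y-axis: the top edge in pixels becomes the larger PDF y value.
    y1: pageHeightPt - box.y * scale,
    y0: pageHeightPt - (box.y + box.height) * scale,
  };
}
```

For example, on a 36 × 24 in sheet (2592 × 1728 pt) rendered at 72 px per inch, a pixel box at (100, 200) sized 50 × 30 maps to x 100 to 150 and y 1498 to 1528 in PDF user space. Any other render scale would need the extra `scale` factor on every coordinate, which is the ambiguity a fixed 72 px per inch removes.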

## File Capacity

- **Per-file ceiling:** 150 MB.
- **Page count:** no hard limit. Extractions have been verified on 130+ MB plan sets.
- **Peak browser RAM at the ceiling:** ~875 MB. See [Benchmarks](https://mybluegrid.com/benchmarks).

## Why This Beats Raster-First Tools

Raster-first competitors decode the full PDF to images at upload. On a 130 MB plan set this produces several gigabytes of pixel data and either:

- Crashes the browser tab (in-tab tools), or
- Forces a multi-minute server-side conversion before any work begins (server-side raster tools).

BlueGrid bypasses both failure modes by keeping the PDF as vector data until the moment a worker needs pixels for a specific shard.
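A back-of-envelope calculation shows where the gigabytes come from. The sheet size (36 × 24 in), the 150 DPI render resolution, 4 bytes per RGBA pixel, and the 50-sheet set are illustrative assumptions, not measurements of any specific tool or plan set.

```typescript
// Bytes needed to hold one page as an uncompressed bitmap.
function rasterBytes(widthIn: number, heightIn: number, dpi: number, bytesPerPixel = 4): number {
  const widthPx = Math.round(widthIn * dpi);
  const heightPx = Math.round(heightIn * dpi);
  return widthPx * heightPx * bytesPerPixel;
}

// Total for a set of identical sheets, in decimal gigabytes (1 GB = 1e9 bytes).
function planSetRasterGB(sheets: number, widthIn: number, heightIn: number, dpi: number): number {
  return (sheets * rasterBytes(widthIn, heightIn, dpi)) / 1e9;
}
```

Under these assumptions `rasterBytes(36, 24, 150)` is 5400 × 3600 × 4 = 77,760,000 bytes (about 78 MB per sheet), and `planSetRasterGB(50, 36, 24, 150)` is about 3.9 GB, all of which an in-tab raster tool would have to hold in memory.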

## Security Posture

- All PDF buckets are private. Access is only via `.download()` and signed URLs, never public URLs.
- Annotation upserts are RLS-scoped to the project owner and explicit project shares.
- AI worker uploads use short-lived signed URLs; workers do not have persistent storage credentials.

## Related

- [Parallel Shard Processing](https://mybluegrid.com/knowledge-hub/parallel-shard-processing.md)
- [Deep Export & Intel Mode](https://mybluegrid.com/knowledge-hub/deep-export-and-intel-mode.md)
- [Capabilities](https://mybluegrid.com/capabilities.md)