📁

Upload Dataset

CSV files only · Parsed entirely in browser

📊

Drag & drop your CSV dataset

Browse file or drop it here

.csv UTF-8 Headers required No size limit

📋

Dataset Summary

Column	Type	Missing	Unique Values	Sample Values

👥

Cohort Analysis

Compare metrics across user segments

Group By (Category)

Measure (Numeric)

Chart Type

Cohort	Count	Mean	Min	Max	vs. Overall Mean

Technical Documentation

How the Agent Works

A single HTML file that performs automated exploratory data analysis, churn detection, hypothesis generation and natural language Q&A. All analysis runs client-side with no backend; only the Q&A feature calls an external API (Claude).

Processing Pipeline

From CSV upload to intelligence output in five deterministic steps.

📁

Step 1

File Ingestion

PapaParse reads and parses the CSV in the browser. Nothing is uploaded at this stage; only the Ask the Agent feature sends any data off the device.

🔍

Step 2

Column Analysis

Each column is typed as numeric, categorical, binary, or ID. Nulls, unique counts and sample values are extracted.

📐

Step 3

Statistical Compute

Mean, median, std dev, Pearson correlation and frequency distributions are computed in pure JS.

🧠

Step 4

Pattern Detection

Churn drivers, correlations, skew, cohort gaps and outliers are detected using rule-based heuristics.

✦

Step 5

Insight Generation

Natural language insights, recommendations and chart captions are composed from the computed patterns.
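Step 2 of the pipeline above could be sketched as follows. This is a minimal illustration, not the tool's actual analyseColumns() code: the function name classifyColumn, the "all values distinct means ID" rule and the three-sample cap are assumptions introduced here.

```javascript
// Hypothetical sketch of per-column typing. The rules and thresholds are
// assumptions for illustration, not values taken from the tool.
function classifyColumn(values) {
  const isMissing = (v) => v === null || v === undefined || v === "";
  const present = values.filter((v) => !isMissing(v));
  const unique = new Set(present.map(String));
  const allDistinct = unique.size === present.length && present.length > 2;

  let type;
  if (unique.size === 2) {
    // Exactly two distinct values (yes/no, 0/1, ...) is treated as binary.
    type = "binary";
  } else if (
    present.length > 0 &&
    present.every((v) => typeof v === "number" && Number.isFinite(v))
  ) {
    // Assumed rule: all-distinct integers look like row identifiers.
    type = allDistinct && present.every(Number.isInteger) ? "id" : "numeric";
  } else {
    // Assumed rule: all-distinct strings are treated as identifiers too.
    type = allDistinct ? "id" : "categorical";
  }

  return {
    type,
    missing: values.length - present.length,
    unique: unique.size,
    samples: [...unique].slice(0, 3), // first few distinct values, as strings
  };
}
```

Checking for binary first means a 0/1 column is reported as binary rather than numeric, which matches how the churn detector later looks for binary columns.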

Data Flow

How raw CSV rows become structured intelligence.

⟶ End-to-End Data Flow

CSV File

User upload or drag-drop

→

PapaParse

Header detection · Dynamic typing · Null handling

→

analyseColumns()

Type classification · Column metadata extraction

→

STATE object

rawData · headers · columnMeta · numericCols · catCols

Statistical Engine

mean · median · stdDev · pearson · histogram · frequency

→

Pattern Detectors

detectChurnColumn · generateChartInsight · generateHypotheses

→

Render Functions

renderIntelligenceSummary · renderChurnDrivers · renderCharts · renderInsights

→

Dashboard UI

Chart.js visualisations · Insight boxes · Churn driver cards

Ask the Agent (Claude API)

Column summary + cohort stats + 30-row sample → sent as context with every question · Multi-turn conversation history maintained client-side · Structured JSON response parsed and rendered as answer cards

⟵

buildQueryContext()

Compiles data summary without sending full dataset — token-efficient context window
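The context compilation described above might look roughly like this. It is a sketch under stated assumptions: the field names follow the STATE object listed earlier, but the shape of columnMeta entries, the output layout and the helper groupMeans are hypothetical, and CSV values are not escaped.

```javascript
// Hypothetical helper: mean of numCol per distinct value of catCol.
function groupMeans(rows, catCol, numCol) {
  const groups = new Map();
  for (const row of rows) {
    const key = row[catCol];
    const val = row[numCol];
    if (key === null || key === undefined || !Number.isFinite(val)) continue;
    const g = groups.get(key) || { sum: 0, n: 0 };
    g.sum += val;
    g.n += 1;
    groups.set(key, g);
  }
  return [...groups].map(([k, g]) => `${k}=${(g.sum / g.n).toFixed(2)}`).join(", ");
}

// Sketch of a compact context string: column summary, cohort means for the
// first 2 categorical x first 5 numeric columns, and a capped CSV sample.
function buildQueryContextSketch(state, sampleSize = 30) {
  const columnLines = state.headers.map((h) => {
    const meta = state.columnMeta[h] || {}; // assumed shape: { type, missing, unique }
    return `${h}: type=${meta.type ?? "unknown"}, missing=${meta.missing ?? 0}, unique=${meta.unique ?? 0}`;
  });

  const cohortLines = [];
  for (const cat of state.catCols.slice(0, 2)) {
    for (const num of state.numericCols.slice(0, 5)) {
      cohortLines.push(`${num} by ${cat}: ${groupMeans(state.rawData, cat, num)}`);
    }
  }

  // Only the first sampleSize rows are serialised, never the full dataset.
  const sampleRows = state.rawData.slice(0, sampleSize);
  const csvSample = [
    state.headers.join(","),
    ...sampleRows.map((row) => state.headers.map((h) => row[h] ?? "").join(",")),
  ].join("\n");

  return [
    "COLUMNS",
    ...columnLines,
    "",
    "COHORT MEANS",
    ...cohortLines,
    "",
    `SAMPLE (${sampleRows.length} of ${state.rawData.length} rows)`,
    csvSample,
  ].join("\n");
}
```

Because the cohort means are computed over all rows while only a slice is serialised, the model sees dataset-wide aggregates without receiving the dataset itself.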

Feature Architecture

How each core intelligence feature is implemented.

Feature 1

💡 Chart Insight Engine

Automatic pattern detection and plain-English explanation for every chart rendered.

1

Chart type dispatch

generateChartInsight(data, col, chartType) receives histogram, categorical, or cohort as chartType and branches accordingly.

2

Histogram analysis

Detects peak bucket, computes top-bucket share, identifies skew direction by comparing mean vs median, flags zero-heavy distributions (>30% zeros).

3

Categorical analysis

Identifies dominant category and its share, detects binary vs multi-category splits, flags single-value columns as low-signal.

4

Cohort analysis

Computes percentage gap between highest and lowest cohort means, scales language intensity based on gap magnitude (>50% = "significant").

5

Render injection

Each chart card contains a hidden #insight_{id} box. After Chart.js renders, the insight text is injected and the box shown.
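The histogram and cohort branches described in steps 2 and 4 can be illustrated with a short sketch. The function names, the returned object shape, the relative-gap formula (gap measured against the lowest cohort mean) and the "moderate" label are assumptions; only the >30% zero threshold and the >50% "significant" threshold come from the text above.

```javascript
// Sketch of the histogram branch: peak bucket, its share, skew and zero-heaviness.
function histogramInsight(values, binCount = 5) {
  const sorted = [...values].sort((a, b) => a - b);
  const n = sorted.length;
  const mean = sorted.reduce((s, v) => s + v, 0) / n;
  const mid = Math.floor(n / 2);
  const median = n % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;

  // Equal-width bins between min and max (width falls back to 1 if all values are equal).
  const min = sorted[0];
  const width = (sorted[n - 1] - min) / binCount || 1;
  const counts = new Array(binCount).fill(0);
  for (const v of sorted) {
    counts[Math.min(binCount - 1, Math.floor((v - min) / width))] += 1;
  }

  const peak = counts.indexOf(Math.max(...counts));
  const zeroShare = sorted.filter((v) => v === 0).length / n;

  return {
    peakBucket: peak,
    peakShare: counts[peak] / n,
    // Mean above median suggests a long right tail, and vice versa.
    skew: mean > median ? "right" : mean < median ? "left" : "symmetric",
    zeroHeavy: zeroShare > 0.3,
  };
}

// Sketch of the cohort branch: percentage gap between highest and lowest means.
function cohortGapInsight(cohortMeans) {
  const entries = Object.entries(cohortMeans).sort((a, b) => b[1] - a[1]);
  const [topName, top] = entries[0];
  const [bottomName, bottom] = entries[entries.length - 1];
  // Assumed formula: gap relative to the lowest cohort mean.
  const gapPct = bottom === 0 ? Infinity : ((top - bottom) / Math.abs(bottom)) * 100;
  const strength = gapPct > 50 ? "significant" : "moderate";
  return `${topName} is ${gapPct.toFixed(0)}% above ${bottomName} (${strength} gap)`;
}
```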

Feature 2

📉 Churn Driver Detection

Automatic identification and ranking of variables most associated with churn.

1

Churn column detection

detectChurnColumn() first looks for column names matching /churn|cancel|attrition|exit/, then falls back to any binary column with yes/no/0/1 values.

2

Categorical churn rates

For each categorical column (2–20 unique values), groups are formed and churn rate computed as churned / total × 100 per segment. Min 3-row sample filter applied.

3

Numeric median split

Each numeric column is split at median into "Below Median" and "Above Median" groups and churn rates compared — a proxy for high/low signal without binning assumptions.

4

Impact ranking

Drivers are ranked by spread (max churn rate − min churn rate). Critical = >25pp, High = >12pp, Medium otherwise. Top 5 are displayed.

5

Context-aware recommendations

generateChurnReco() pattern-matches the column name (plan, channel, device, region, onboarding, support) and returns a tailored recommendation string.
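A compact sketch of steps 1, 2 and 4 above, for illustration only. The names ending in Sketch, plus churnDriver and rankDrivers, are hypothetical; the case-insensitive flag on the regex is an assumption; the binary-column fallback and the numeric median split are left out for brevity.

```javascript
// Name pattern from the description above; the `i` flag is an assumption.
const CHURN_NAME = /churn|cancel|attrition|exit/i;

function detectChurnColumnSketch(headers) {
  return headers.find((h) => CHURN_NAME.test(h)) ?? null;
}

// A row counts as churned when the value reads yes / 1 / true.
function isChurnedSketch(row, col) {
  return ["yes", "1", "true"].includes(String(row[col]).trim().toLowerCase());
}

// Churn rate per segment of one categorical column, plus spread and severity.
function churnDriver(rows, segmentCol, churnCol, minRows = 3) {
  const groups = new Map();
  for (const row of rows) {
    const g = groups.get(row[segmentCol]) || { total: 0, churned: 0 };
    g.total += 1;
    if (isChurnedSketch(row, churnCol)) g.churned += 1;
    groups.set(row[segmentCol], g);
  }

  const rates = [...groups]
    .filter(([, g]) => g.total >= minRows) // min 3-row sample filter
    .map(([segment, g]) => ({ segment, rate: (g.churned / g.total) * 100 }));
  if (rates.length < 2) return null; // nothing to compare

  const values = rates.map((r) => r.rate);
  const spread = Math.max(...values) - Math.min(...values); // percentage points
  const severity = spread > 25 ? "Critical" : spread > 12 ? "High" : "Medium";
  return { column: segmentCol, spread, severity, rates };
}

// Rank drivers by spread and keep the top N (5 in the description above).
function rankDrivers(drivers, topN = 5) {
  return drivers.filter(Boolean).sort((a, b) => b.spread - a.spread).slice(0, topN);
}
```

Ranking by spread rather than by the single highest rate favours columns whose segments differ from each other, which is what makes a column useful as a driver.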

Feature 3 & 4

🧠 Intelligence Summary

Auto-synthesised top-level view of risk, churn rate, driver and recommended action.

1

Runs on every load

renderIntelligenceSummary() is called first in the pipeline, before churn drivers render — so the summary panel is always the first thing a PM sees.

2

Top risk segment

Iterates all categorical columns and all segments to find the absolute highest churn rate across the entire dataset. Annotates if it is >1.5× the overall rate.

3

Primary driver

Repeats the driver detection logic independently and surfaces the variable with the largest cross-segment churn spread alongside its pp value.

4

Narrative generation

A single prose paragraph is assembled from computed values using string templates — no LLM needed. If no churn column is detected, a general dataset summary is shown instead.
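Steps 2 and 4 above, the top-risk search and the template-based narrative, could be sketched like this. The function names, the isChurned predicate argument and the exact wording of the sentence are illustrative assumptions; the 1.5x annotation rule is from the description.

```javascript
// Scan every segment of every categorical column for the highest churn rate.
// Assumes `rows` is non-empty and `isChurned` is a row predicate.
function topRiskSegment(rows, catCols, isChurned) {
  const overall = (rows.filter(isChurned).length / rows.length) * 100;
  let best = null;

  for (const col of catCols) {
    const groups = new Map();
    for (const row of rows) {
      const g = groups.get(row[col]) || { total: 0, churned: 0 };
      g.total += 1;
      if (isChurned(row)) g.churned += 1;
      groups.set(row[col], g);
    }
    for (const [segment, g] of groups) {
      const rate = (g.churned / g.total) * 100;
      if (!best || rate > best.rate) best = { column: col, segment, rate };
    }
  }

  // Flag the segment when it exceeds 1.5x the overall churn rate.
  return { overall, ...best, elevated: best !== null && best.rate > 1.5 * overall };
}

// Assemble the narrative from computed values with a plain string template.
function summaryNarrative(s) {
  const flag = s.elevated ? ", more than 1.5x the overall rate" : "";
  return (
    `Overall churn is ${s.overall.toFixed(1)}%. ` +
    `The highest-risk segment is ${s.column} = ${s.segment} at ${s.rate.toFixed(1)}%${flag}.`
  );
}
```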

Feature 5

🤖 Ask the Agent (LLM Q&A)

Natural language questions answered using the Claude API with your actual data as context.

1

Context window construction

buildQueryContext() compiles column definitions, cohort statistics (mean per group for top 2 categorical × top 5 numeric), and a 30-row CSV sample — kept under ~2,000 tokens.

2

Structured output prompt

The system prompt instructs Claude to return strict JSON with four fields: direct_answer, key_metrics[], insight, recommended_action — enabling deterministic UI rendering.

3

Multi-turn memory

A chatHistory array maintains the last 6 conversation turns (12 messages). The data context is attached only to the question currently being asked and is not stored in history, which saves tokens.

4

Response rendering

JSON is parsed and rendered into structured answer cards: a prose answer, metric pill badges, an insight callout, and a recommended action — all from a single API call.
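The memory and parsing steps above can be sketched as two small helpers. The names appendTurn and parseAgentReply are hypothetical, and the brace-slicing tolerance for extra text around the JSON is an assumption rather than documented behaviour; the four field names and the 12-message cap come from the description.

```javascript
const MAX_MESSAGES = 12; // 6 turns x (user + assistant)

// Append one question/answer pair and keep only the most recent messages.
// The data context is deliberately not stored here, only the bare question.
function appendTurn(chatHistory, question, answerText) {
  const next = [
    ...chatHistory,
    { role: "user", content: question },
    { role: "assistant", content: answerText },
  ];
  return next.slice(-MAX_MESSAGES);
}

// Parse the model reply into the four fields the answer card needs.
function parseAgentReply(text) {
  // Assumption: tolerate stray text around the JSON by slicing brace to brace.
  const start = text.indexOf("{");
  const end = text.lastIndexOf("}");
  if (start === -1 || end <= start) throw new Error("No JSON object in reply");
  const data = JSON.parse(text.slice(start, end + 1));

  return {
    direct_answer: String(data.direct_answer ?? ""),
    key_metrics: Array.isArray(data.key_metrics) ? data.key_metrics : [],
    insight: String(data.insight ?? ""),
    recommended_action: String(data.recommended_action ?? ""),
  };
}
```

Normalising missing fields to empty values lets the renderer skip an empty pill row or callout instead of failing on an undefined property.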

Technology Stack

Everything runs in the browser. No build step, no server, no database.

📄

HTML / CSS / JS

Single self-contained file. All logic, styles and markup in one place. No bundler needed.

Vanilla

📊

PapaParse 5.4

Robust CSV parsing with header detection, dynamic typing and empty row skipping.

CDN

📈

Chart.js 4.4

Bar charts, histograms and cohort comparisons. Instances tracked and destroyed on reset to prevent canvas leaks.

CDN

✦

Claude API

claude-sonnet-4 used for natural language Q&A. Called directly from the browser via the /v1/messages endpoint.

Anthropic

🔡

Geist + Instrument Serif

Geist (body/UI) and Geist Mono (data/labels) from Google Fonts. Instrument Serif for display headings.

Google Fonts

🧮

Statistical Engine

Custom pure-JS implementations of mean, median, stdDev, Pearson r, histogram binning and frequency distribution.

Custom

🗂

STATE Object

Single shared mutable object holding rawData, headers, columnMeta, numericCols, catCols and chart instances.

In-memory

🔒

Privacy by Design

The full CSV never leaves the browser. When the PM asks a question, only a column summary, cohort statistics and a 30-row sample are sent to Claude.

Client-only
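To make the browser-to-Claude call concrete, here is a sketch of how a request to the /v1/messages endpoint could be assembled. buildClaudeRequest and its parameter names are hypothetical, and the maxTokens default is an assumption. The model identifier is passed in rather than hard-coded, since the exact ID string the tool uses is not given above.

```javascript
// Build (but do not send) a Messages API request. Keeping this pure makes
// it easy to inspect exactly what would leave the browser.
function buildClaudeRequest({ apiKey, model, system, history, context, question, maxTokens = 1024 }) {
  // The data context rides along with the current question only.
  const userContent = `${context}\n\nQuestion: ${question}`;

  return {
    url: "https://api.anthropic.com/v1/messages",
    options: {
      method: "POST",
      headers: {
        "content-type": "application/json",
        "x-api-key": apiKey,
        "anthropic-version": "2023-06-01",
        // Opt-in header required for direct calls from a browser.
        "anthropic-dangerous-direct-browser-access": "true",
      },
      body: JSON.stringify({
        model,
        max_tokens: maxTokens,
        system,
        messages: [...history, { role: "user", content: userContent }],
      }),
    },
  };
}
```

A caller would pass the result to fetch(url, options) and read the reply text from the first content block of the JSON response. Note that a key used this way is visible to anyone with access to the page.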

Design Principles

Decisions that shaped how this tool was built.

🚫

No Backend Required

Everything runs in the browser. The file can be opened directly from disk, hosted on GitHub Pages, or shared as an email attachment — zero infrastructure needed.

🎯

PM-First Output

Every analysis result is expressed as a business insight or recommended action, not raw statistics. The tool answers "so what?" automatically.

⚡

Instant Results

The full pipeline — parse → analyse → render all sections — completes in under a second for typical datasets. No loading screens except for the Claude API call.

🔍

Auto-Detection First

Column types, churn signals and segmentation opportunities are detected automatically. The PM doesn't need to configure anything — the tool adapts to whatever CSV is uploaded.

📐

Rule-Based Core, AI at the Edge

The intelligence summary, churn drivers and chart insights are computed deterministically using JS. Claude is only invoked for open-ended natural language Q&A — keeping the tool functional without an API key.

🔁

Stateless & Resettable

All state lives in a single JS object. Clicking "Remove file" resets everything cleanly. No stale state, no memory leaks — the tool is safe to use with multiple datasets in a session.
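The reset behaviour described above might be sketched as follows. createState and resetAllSketch are hypothetical names, and the charts array is an assumed field for the tracked chart instances; destroy() is the Chart.js instance method for releasing a canvas.

```javascript
// Fresh, empty state. Field names follow the STATE object described earlier;
// `charts` is an assumed name for the tracked Chart.js instances.
function createState() {
  return { rawData: [], headers: [], columnMeta: {}, numericCols: [], catCols: [], charts: [] };
}

function resetAllSketch(state) {
  // Destroy each chart so its canvas and event listeners are released.
  for (const chart of state.charts) chart.destroy();
  // Overwrite in place so existing references to the shared object stay valid.
  Object.assign(state, createState());
  return state;
}
```

Resetting in place rather than reassigning the variable matters when render functions hold a reference to the same object.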

Function Reference

Key JavaScript functions and what they do.

Function	Module	Description
processFile(file)	File I/O	Orchestrates the full pipeline: parse → analyse → render all sections
analyseColumns()	Analysis	Classifies each column, extracts nulls, unique counts, stats and sample values
renderIntelligenceSummary()	Intelligence	Computes and renders the top-level 4-cell summary panel
renderChurnDrivers()	Churn	Detects churn column, computes rates per segment, renders ranked driver cards
detectChurnColumn()	Churn	Finds a churn column by name pattern then falls back to binary value detection
isChurned(row, col)	Churn	Returns true if a row represents a churned user (yes/1/true)
generateChurnReco(driver, top, bot, rate)	Churn	Returns a context-aware recommendation string based on column type
generateChartInsight(data, col, type)	Charts	Analyses chart data and returns a human-readable insight string
renderCharts()	Charts	Creates histogram and frequency charts with Chart.js; injects insight boxes
renderCorrelationHeatmap()	Charts	Computes Pearson r matrix and renders colour-coded HTML table with insight
runCohortAnalysis()	Cohort	Groups data by selected category, computes means, renders bar chart + table + insight
generateHypotheses()	Hypotheses	Generates correlation, cohort, data quality and distribution hypotheses
generateInsights()	Insights	Produces human-readable findings from statistical patterns
generateRecommendations()	Recommendations	Creates prioritised action cards from churn and statistical findings
buildQueryContext()	AI Q&A	Compiles token-efficient data context for Claude API calls
submitQuery()	AI Q&A	Handles user question: builds context, calls API, parses JSON, renders answer
pearson(a, b)	Stats	Computes Pearson correlation coefficient between two arrays
histogram(vals, bins)	Stats	Buckets numeric values into equal-width bins with labels
mean / median / stdDev	Stats	Pure JS descriptive statistics utilities
resetAll()	State	Destroys all chart instances, resets STATE, hides all sections
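For reference, the Stats utilities named in the function reference could be implemented along these lines. This is a sketch, not the tool's source: whether stdDev divides by n or n - 1, how pearson handles zero variance, and the label format in histogram are all assumptions.

```javascript
const mean = (xs) => xs.reduce((s, x) => s + x, 0) / xs.length;

function median(xs) {
  const s = [...xs].sort((a, b) => a - b); // copy so the input is not reordered
  const mid = Math.floor(s.length / 2);
  return s.length % 2 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
}

// Sample standard deviation (divides by n - 1); this choice is an assumption.
function stdDev(xs) {
  const m = mean(xs);
  return Math.sqrt(xs.reduce((s, x) => s + (x - m) ** 2, 0) / (xs.length - 1));
}

// Pearson r for two equal-length arrays; returns 0 when either has no variance.
function pearson(a, b) {
  const ma = mean(a);
  const mb = mean(b);
  let cov = 0, va = 0, vb = 0;
  for (let i = 0; i < a.length; i++) {
    const da = a[i] - ma;
    const db = b[i] - mb;
    cov += da * db;
    va += da * da;
    vb += db * db;
  }
  const denom = Math.sqrt(va * vb);
  return denom === 0 ? 0 : cov / denom;
}

// Equal-width bins between min and max; the maximum value lands in the last bin.
function histogram(vals, bins) {
  const min = Math.min(...vals);
  const max = Math.max(...vals);
  const width = (max - min) / bins || 1;
  const buckets = Array.from({ length: bins }, (_, i) => ({
    label: `${(min + i * width).toFixed(1)} to ${(min + (i + 1) * width).toFixed(1)}`,
    count: 0,
  }));
  for (const v of vals) {
    buckets[Math.min(bins - 1, Math.floor((v - min) / width))].count += 1;
  }
  return buckets;
}
```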