Fin-Qwen Design Candidates

01

Comparison first

Baseline Delta Board

Same prompt schema prompt

Base Qwen3-8B

Flags "AI pivot" as positive, misses going-concern sarcasm.

Fin-Qwen V2

Reads the post as bearish and marks risk_flag true.

Hard eval weighted F1 0.7382 -> 0.8528

Focus

Best for proving fine-tuning value against a schema-prompted base model.

Structure

Hero comparison, metric deltas, side-by-side cases, then limitations.

Hero

Two large output panels labeled Base and Fin-Qwen with a hard-eval delta strip.

Middle

Clean vs hard eval chart, explanation of why hard cases matter, small caveat block.

Demo

Static same-input comparison for AI hype, dilution, leverage, and ticker ambiguity.

Metrics

Hard F1, Clean F1, Clean/Hard MAE, Qwen3-8B, QLoRA, DeepSeek teacher.

Pros

Very clear in 20 seconds; strong recruiter readability.

Cons

Less space for pipeline depth unless the page gets longer.

02

Classifier demo

Sentiment Classifier Console

This 30x ETH long cannot go wrong; breakout or bust.

12345

TickerETH

Risktrue

Focus

Best for showing the model as a usable financial sentiment classifier.

Structure

App-like hero, explanation of fields, preset examples, evaluation proof.

Hero

A compact classifier surface with input, sentiment score, ticker chips, and risk badge.

Middle

Field definitions, sentiment scale, risk separate from sentiment direction.

Demo

Mock presets switch visually between bearish, neutral, bullish, and risk-heavy posts.

Metrics

Sentiment score, risk_flag, ticker extraction, Weighted F1, Financial NLP.

Pros

Concrete and easy to understand without reading training details.

Cons

Can look like a product demo rather than a research/engineering project.

03

Structured output

Post to JSON Contract

Raw post

"imo, $IONQ expands ATM capacity right after a cash burn update."

{
  "sentiment_score": 2,
  "tickers": ["IONQ"],
  "risk_flag": true,
  "reasoning": "ATM capacity after cash burn suggests dilution pressure."
}

Focus

Best for communicating output contracts and downstream system readiness.

Structure

Input/output hero, schema fields, parser reliability, example gallery.

Hero

Noisy market post on the left, strict JSON object on the right.

Middle

Field cards for sentiment_score, reasoning, tickers, and risk_flag.

Demo

Multiple static posts map into consistent JSON blocks with highlighted fields.

Metrics

Schema validation, JSON output, ticker exact, risk flag, Hugging Face, PEFT.

Pros

Strong engineering signal and clearly distinct from free-form chat demos.

Cons

Less emotionally engaging than case or dashboard layouts.

04

Pipeline walkthrough

Training Pipeline Walkthrough

01Raw text

posts, FiQA, hard templates

02Teacher JSON

DeepSeek V3.2 labels

03QLoRA

4-bit Qwen3-8B

04Eval

clean + hard splits

V2 training mix1,715 hard + 3,000 replay = 4,715

Focus

Best for explaining the full model-building workflow end to end.

Structure

Hero pipeline, data sources, teacher labeling, training setup, evaluation.

Hero

Horizontal process diagram from noisy posts to local 8B structured model.

Middle

Data cleaning, SFT JSONL shape, QLoRA config, V2 continued training.

Demo

Mock artifact cards: raw row, teacher label, training row, eval result.

Metrics

7,719 teacher-labeled rows, 4,715 V2 mix, 200 clean eval, 385 hard eval.

Pros

Shows that this is not only prompt engineering.

Cons

More process-heavy; needs careful hierarchy to avoid feeling like documentation.

05

Distillation story

Teacher Student Distillation

Teacher DeepSeek V3.2

Finance-aware structured reasoning

->

Student Qwen3-8B

Local schema-following adapter

score boundaries ticker extraction risk flagging slang reasoning

Focus

Best for showing the core learning strategy: transfer teacher behavior into a smaller model.

Structure

Teacher/student hero, supervision schema, what the student learns, evaluation proof.

Hero

Large teacher model flowing into a local Qwen3-8B student with behavior tokens.

Middle

Teacher output fields, SFT training format, QLoRA adapter explanation.

Demo

Three-column Base, Fin-Qwen, Teacher answer for the same hard post.

Metrics

DeepSeek V3.2, Qwen3-8B, QLoRA, TRL SFTTrainer, Hard F1.

Pros

Clear model-learning narrative; good for LLM roles.

Cons

Needs a caveat that teacher labels are not human gold.

06

Metrics first

Metric-First Experiment Page

Clean F10.8840

Hard F10.8528

Clean MAE0.2650

Hard MAE0.3247

Hard eval0.73820.8528

Focus

Best for readers who want evidence before narrative.

Structure

KPI hero, eval table, metric definitions, error categories, caveats.

Hero

Four proof cards: Clean F1, Hard F1, Clean MAE, Hard MAE.

Middle

Clean vs hard split explanation and metric definition cards.

Demo

Small output examples appear after the metrics as qualitative validation.

Metrics

Weighted F1, MAE, eval split sizes, base vs Fin-Qwen deltas.

Pros

Credible and fast for technical reviewers.

Cons

Can feel dry if the visual treatment is too table-heavy.

07

Hard case analysis

Hard Case Gallery

Sarcasm

"totally fine" after warning

Dilution

ATM after cash burn

Leverage

30x long, breakout

AI hype

pivot after going concern

Ticker

KONE vs $BA

Mixed facts

margin up, inventory up

Reviewed hard eval385 examples

Focus

Best for showing why financial social media is a domain-specific NLP problem.

Structure

Problem-first hero, hard-case gallery, model response examples, limitations.

Hero

Grid of difficult language patterns with one headline case highlighted.

Middle

Category cards for sarcasm, dilution, rug/scam, leverage, AI hype, ticker ambiguity.

Demo

Each card expands visually into input, Fin-Qwen JSON, and why it is difficult.

Metrics

Hard F1, hard eval size, risk_flag, ticker exact, curated hard cases.

Pros

Memorable and domain-specific; explains the project value well.

Cons

Needs restraint so it does not become a wall of examples.

08

Low VRAM training

Low-VRAM Fine-Tuning Showcase

Full 8B trainNot practical

4-bit QLoRA12GB GPU

4-bit loading LoRA adapters Unsloth TRL SFTTrainer

Focus

Best for emphasizing practical training constraints and efficient adaptation.

Structure

Hardware hero, QLoRA setup, training config, results, next steps.

Hero

Single-GPU training board comparing full fine-tuning vs 4-bit QLoRA.

Middle

LoRA rank, max sequence length, batch configuration, V2 continued training.

Demo

Mock training run panel with adapter, GPU memory, and eval checkpoint cards.

Metrics

12GB VRAM, Qwen3-8B, 4-bit, LoRA, Unsloth, PyTorch CUDA.

Pros

Strong signal for hands-on ML engineering.

Cons

Less focused on user-facing model behavior.

09

Domain language

Financial Slang Understanding

diamond hands

conviction, not always low risk

rug

scam or liquidity collapse risk

ATM offering

possible dilution pressure

dead cat bounce

bearish rebound skepticism

VWAP

trading reference, not ticker

Focus

Best for showing that the model understands finance-specific social language.

Structure

Slang hero, glossary-to-reasoning map, examples, eval proof.

Hero

Decoder wall mapping slang tokens to model interpretation and risk implications.

Middle

Category sections for meme language, offering/dilution, leverage, scams, mixed facts.

Demo

Mock post annotation that highlights slang terms and the reasoning they trigger.

Metrics

Hard eval, financial slang, sarcasm, risk flagging, Financial NLP.

Pros

Distinct and memorable; makes domain difficulty obvious.

Cons

Can feel less rigorous if not paired with evaluation metrics.

10

Risk dashboard

Risk Flagging Dashboard

$IONQ ATM capacity after cash burnBearish

AI pivot after going concernHigh risk

$AAPL margin up, inventory upMixed

Risk flagtrue/false

Ticker chipsIONQ, AAPL

Focus

Best for expressing risk_flag as a separate signal from bullish/bearish sentiment.

Structure

Dashboard hero, risk taxonomy, post list, structured output, limitations.

Hero

Non-production risk monitor mock with flagged financial social posts.

Middle

Risk categories: rug/scam, dilution, leverage, cash burn, going concern.

Demo

Static watchlist rows reveal sentiment score, ticker, risk flag, and reasoning.

Metrics

Risk F1 concept, hard eval, risk_flag, ticker extraction, structured JSON.

Pros

Applied and visually clear; good for showing downstream usefulness.

Cons

Must avoid implying a production-grade financial risk system.

11

Report style

Model Evaluation Report

Evaluation memoV2 hard-data adapter

Clean eval2000 overlap

Reviewed hard eval385QA filtered

Hard F10.73820.8528

Hard MAE0.53250.3247

Focus

Best for a formal, evidence-backed model evaluation story.

Structure

Report hero, setup, datasets, metrics, results, caveats, next steps.

Hero

A clean evaluation memo with tables and split definitions.

Middle

Metric definitions, train/test overlap checks, hard-data QA process.

Demo

Qualitative case appendix with one strong case and one limitation case.

Metrics

Weighted F1, MAE, clean/hard split size, train/test overlap, teacher labels.

Pros

Professional and conservative; fits technical reviewers.

Cons

Less visually dynamic than dashboard or workbench directions.

12

Recruiter snapshot

Concise Portfolio Page

One-line value

Distilled financial social-media reasoning into a local 8B model.

ModelQwen3-8B

MethodQLoRA

OutputJSON

Hard F10.8528

Focus

Best for a fast-scanning portfolio page aimed at recruiters.

Structure

Hero summary, three proof cards, one demo, one process strip, limitations.

Hero

Large title, one-sentence project summary, and four proof chips above the fold.

Middle

What I built, how I trained it, how I evaluated it.

Demo

One concise Base vs Fin-Qwen comparison rather than many examples.

Metrics

Qwen3-8B, QLoRA, Hard F1, structured JSON, 12GB GPU.

Pros

Fastest to understand and most compatible with the current portfolio style.

Cons

May underplay the richer experiment and hard-case work.

13

Deep dive

Engineer Deep-Dive Technical Page

DataClean + hard sources

dedupe, language filter, overlap check

LabelsTeacher JSON

score, reasoning, tickers, risk

TrainUnsloth QLoRA

4-bit Qwen3-8B adapter

lr: 5e-5
epochs: 1
effective_batch: 8
max_seq_length: 2048

Focus

Best for engineers who want architecture, config, and reproducibility details.

Structure

Architecture hero, data pipeline, training config, eval scripts, demo outputs.

Hero

Blueprint-style system diagram with stack, data, labels, adapter, evaluation.

Middle

Training row format, prompt contract, LoRA setup, deterministic eval scripts.

Demo

Mock config and eval output panels beside selected hard cases.

Metrics

PyTorch, HF Datasets, TRL, PEFT, bitsandbytes, Unsloth, QLoRA.

Pros

Strong technical credibility; useful for engineering interviews.

Cons

Dense and less accessible to non-technical readers.

14

Split screen

Input Reasoning JSON Workbench

Input

"AI pivot" after going-concern warning.

Reasoning

Sarcasm plus financial distress signal.

{
  "score": 1,
  "tickers": [],
  "risk": true
}

Focus

Best for showing the input-to-reasoning-to-JSON transformation in one glance.

Structure

Split hero, field-level reasoning, examples, evaluation proof, limitations.

Hero

Three-pane workbench: input post, model reasoning, structured JSON.

Middle

Reasoning tokens map to sentiment score, ticker extraction, and risk flag.

Demo

Static split-screen cases for AI hype, ATM dilution, and neutral mixed facts.

Metrics

sentiment_score, reasoning, tickers, risk_flag, schema prompt baseline.

Pros

Highly concrete and visually balanced.

Cons

Less focused on training pipeline unless added below the fold.

15

Dark lab

Dark Model Lab

run: v2-hard-adaptereval ok

adapter: fin-qwen-v2
base: qwen3-8b-4bit
hard_f1: 0.8528
hard_mae: 0.3247

GPU12GB

FormatJSON

Focus

Best for a more atmospheric but still engineering-oriented model lab direction.

Structure

Dark console hero, run summary, training pipeline, eval dashboard, demo cases.

Hero

Dark lab console with adapter, metrics, and schema validation status.

Middle

Training cards, hard-case gallery, and model comparison panels.

Demo

Mock terminal output plus side-by-side response panes.

Metrics

Hard F1, Hard MAE, QLoRA, adapter, CUDA, structured output.

Pros

Most visually distinct; makes the page feel like an ML lab.

Cons

Dark treatment diverges more from the current light portfolio style.

16

Editorial

Light Editorial Case Study

Case study

Distilling market slang into structured signals.

Pull quote

The hard part is not sentiment alone. It is sarcasm, ticker ambiguity, and risk language in the same short post.

Focus

Best for a polished narrative case study that still feels restrained.

Structure

Editorial hero, problem, approach, results, demo, limitations.

Hero

Large serif title, short intro, and one elegant JSON/post visual.

Middle

Text-led sections with inline charts and compact example callouts.

Demo

One carefully chosen mock comparison embedded in the story flow.

Metrics

Hard F1 lift, Clean F1 lift, Qwen3-8B, QLoRA, Teacher-student distillation.

Pros

Most compatible with a professional portfolio and easy to read.

Cons

Less visually experimental than other directions.

17

Notebook inspired

Experiment Notebook

Cell 01 / load data

clean_eval = 200
hard_eval = 385

Cell 02 / compare

hard_f1: 0.7382 -> 0.8528
hard_mae: 0.5325 -> 0.3247

Focus

Best for showing experimentation and reproducibility.

Structure

Notebook hero, data prep cells, training cells, eval cells, case inspection.

Hero

Notebook cells with code-like snippets and rendered metric outputs.

Middle

Sequential experiment narrative: prepare, annotate, train, evaluate, inspect.

Demo

Mock cells showing input post, model output, and validation result.

Metrics

Seeded split, overlap check, Weighted F1, MAE, eval reports.

Pros

Good for research workflow credibility.

Cons

Could feel like internal tooling rather than portfolio storytelling.

18

Reliability

Structured Output Reliability

Field

Type

Valid

Used by

sentiment_score

int 1-5

pass

charts

tickers

array

pass

filters

risk_flag

bool

pass

alerts

Focus

Best for showing why strict structured generation matters for downstream use.

Structure

Schema reliability hero, field matrix, validation logic, comparison examples.

Hero

Schema matrix showing expected type, validation status, and downstream use.

Middle

Malformed base output vs corrected Fin-Qwen output and parser-friendly fields.

Demo

Mock validator accepts Fin-Qwen JSON and flags missing or wrong-type fields.

Metrics

JSON/schema valid, ticker exact, risk_flag, structured JSON, parser checks.

Pros

Excellent systems angle for LLM application roles.

Cons

Less focused on training novelty unless paired with pipeline details.

19

Academic minimal

Minimal Academic Project

Abstract

Fin-Qwen: Structured Reasoning for Financial Social Text

MethodQLoRAQwen3-8B

Clean F10.82200.8840

Hard F10.73820.8528

Focus

Best for a sober academic-style summary with method and results.

Structure

Abstract hero, method, data, evaluation, results, limitations.

Hero

Paper-like title, abstract, and compact result table.

Middle

Method diagram and concise dataset/evaluation descriptions.

Demo

One figure-style case study with input and output schema.

Metrics

Weighted F1, MAE, clean/hard eval, QLoRA, teacher labels.

Pros

Serious, compact, and honest about limitations.

Cons

May feel too plain for a personal website if used unchanged.

20

Interactive mock

Interactive Mock Classifier

Selected output

sentiment_score: 2

tickers: ["IONQ"]

risk_flag: true

Focus

Best for a final page that feels demo-ready while remaining fully static.

Structure

Interactive-looking hero, preset examples, comparison, training/eval proof.

Hero

Static classifier app shell with preset buttons and rendered mock output.

Middle

How the mock maps to actual model fields, followed by pipeline and metrics.

Demo

Client-side static JS can switch between curated mock cases without any API call.

Metrics

sentiment_score, tickers, risk_flag, Hard F1, Clean F1, QLoRA.

Pros

Most engaging for selection; easy to imagine as the final detail page.

Cons

Requires the most polish to avoid looking like an unfinished real app.

20 static page directions for a financial LLM fine-tuning case study.

All candidates in one scan.

Baseline Delta Board

Sentiment Classifier Console

Post to JSON Contract

Training Pipeline Walkthrough

Teacher Student Distillation

Metric-First Experiment Page

Hard Case Gallery

Low-VRAM Fine-Tuning Showcase

Financial Slang Understanding

Risk Flagging Dashboard

Model Evaluation Report

Concise Portfolio Page

Engineer Deep-Dive Technical Page

Input Reasoning JSON Workbench

Dark Model Lab

Light Editorial Case Study

Experiment Notebook

Structured Output Reliability

Minimal Academic Project

Interactive Mock Classifier