1 Introduction

This document outlines the comprehensive testing strategy for morph, a web-based text editor that explores new interfaces for machine-assisted creative writing. The plan ensures that morph meets its primary objectives: providing spatial interfaces for non-linear exploration of writing, implementing efficient feature steering through sparse autoencoders (SAEs), and maintaining a file-over-app philosophy that preserves user agency. The V&V activities described in this document cover both the web-based editor (morph) and the ML inference server (asteraceae). Key areas of validation include:

  • Verification of the planning interface and suggestion generation capabilities
  • Testing of feature steering mechanisms using SAEs
  • Validation of the OpenAI-compatible inference API
  • Assessment of system performance, particularly Time-to-First-Token (TTFT) and throughput
  • Evaluation of user interaction patterns and accessibility features
  • Verification of security measures and data privacy controls

2 General Information

2.1 Symbols, Abbreviations, and Acronyms

The following data dictionary, reproduced from SRS section 7.2, defines the symbols used in this document:


| Name | Content | Type |
| --- | --- | --- |
| morph | morph + ReverseProxy + asteraceae | package |
| morph | Web interface for morph | package |
| ReverseProxy | A middleware for reverse proxy with load balancing | module |
| asteraceae | Search + ActivationCache + SAEs + logit_processor + Exo | package |
| logit_processor | A function to convert incoming requests to logits | function |
| exo | Inference engine to run a given LLM | package |
| LLM | Open-weights models to be run for inference | package |
| BlockManager | Helps manage the KV cache during inference | class |
| SAEs | A set of sparse autoencoders trained against a given LLM to steer activation generation | package |
| Search | Additional search tool to improve correctness | module |
| ActivationCache | Stores generated activations for performance purposes | class |
| streaming JSON | Outputs from asteraceae streamed back to morph | text |
| tensor matrix | A matrix of shape (n tokens × m embedding size), where each row is a float vector embedding for a token; represents inputs processed by logit_processor | text |

See also the SRS entries for logits and sparse autoencoders.


Additionally, the following abbreviations will be used:

| Symbol | Description |
| --- | --- |
| T | Test |
| V&V | Verification and Validation |
| UI | User Interface |
| SRS | Software Requirement Specification |
| CI | Continuous Integration |

2.2 Purpose

This V&V plan outlines the testing, validation, and verification procedures to be implemented for morph.

Note on revisions:

Test cases mentioned in this document are subject to change based on the development progress and feedback from the supervisor and peers. The plan will be updated accordingly to ensure that all aspects of the system are thoroughly tested and validated.

2.3 Scope

The following entails the scope of the V&V plan for morph:

2.3.1 In-Scope V&V Objectives

  • Functional verification of the web-based editor (morph), including:

    • Text editing capabilities and file management
    • Planning interface functionality
    • Version control features
    • Configuration and preference management
    • Accessibility features and keyboard navigation
  • Validation of the ML inference server (asteraceae), covering:

    • Feature steering using sparse autoencoders (SAEs)
    • OpenAI-compatible API endpoints
    • Performance metrics (TTFT, throughput)
    • Batch processing capabilities
    • KV cache optimization
  • Security and compliance testing:

    • HTTPS encryption for all communications
    • Data privacy controls
    • Content filtering for harmful outputs
    • SOC 2 compliance validation
  • User experience validation:

    • Usability testing of core writing features
    • Response time measurements
    • Cross-browser compatibility
    • Theme customization
    • Accessibility compliance
  • Integration testing between components:

    • Editor-server communication
    • File system interactions
    • Feature steering feedback loop

2.3.2 Out-of-Scope V&V Objectives

  • Testing of underlying language models (Gemma 2, Llama 3.1)
  • Validation of third-party libraries and frameworks (React, BentoML, vLLM)
  • Hardware-specific performance optimization
  • Testing on unsupported browsers or operating systems
  • Penetration testing of infrastructure components
  • Cloud provider-specific testing (beyond basic deployment verification)
  • Testing of external API integrations beyond OpenAI compatibility
  • Validation of multilingual support (initial release focuses on English)

Challenge Level and Extras

The challenge level for morph is advanced, since the project explores what is possible when building AI-native interfaces. It draws on mechanistic interpretability, a relatively new field in alignment research that requires substantial domain knowledge of the inner workings of transformer circuits.

Extras that we wish to pursue: User Manual, Usability Testing.

Reasoning:

  1. User Manual: We plan to document the functionality and usage of morph to ensure that users can effectively understand and navigate the tool. This includes clear instructions, visual examples, and troubleshooting tips to make the tool more accessible.
  2. Usability Testing: As morph aims to explore new interactions for writing, it is important to validate our design and interaction patterns through user testing. Gathering feedback from real users will help refine our interface and ensure it supports a smooth and intuitive experience.

2.4 Relevant documentation

See also Software Requirement Specification and Hazard Analysis documents for more information

3 Plan

This section provides general information about the testing plan, including the testing team, milestones, and detailed verification procedures.

3.1 Verification and Validation Team

The following project members are responsible for all procedures of the validation process, including writing and executing tests:

| Team Member | Responsibility |
| --- | --- |
| Aaron Pham | SRS Verification: ensure the SRS document accurately describes system functionalities and constraints. Implementation Verification (specifically functional requirements and performance): ensure implemented features align with the SRS specifications. |
| Nebras Khan | Design Verification (evaluate software architecture), UI/UX testing (consistency and usability), and system interaction validation (API interaction) |
| Waleed Malik | Automated Testing (focusing on backend logic, integration tests, and continuous integration pipelines) |
| Zhiwei Li | Software Validation (end-to-end testing simulating real users, regression testing of new features at each stage, and security testing for vulnerabilities) |
| Dr. Spencer Smith | Project Supervisor |
| Dr. Swati Mishra | Project Supervisor |
| Hunter Ceranic | Teaching Assistant, acting as Project Supervisor |

Regular Guidance and Review:

  • Weekly reviews and feedback sessions with Dr. Swati Mishra to ensure testing progress aligns with project milestones and system requirements.
  • Additional support and guidance will be provided by TA Hunter Ceranic as needed, especially during critical testing phases.

Collaborative Testing:

  • Verification and validation tasks related to large documents (e.g., the SRS) will be divided among team members based on their expertise. Multiple team members may collaborate on key sections to ensure thorough coverage and efficiency.
  • Overlapping areas will be coordinated during team meetings to avoid redundancy and ensure clear role allocation.

3.2 Milestones

The following table contains the expected completion dates for important testing milestones:

| Testing | Expected Date of Completion |
| --- | --- |
| Look and Feel Testing | 2025-01-10 |
| Usability and Humanity Testing | 2025-01-15 |
| Performance Testing | 2025-01-20 |
| Security Testing | 2025-02-20 |
| Maintainability and Support Testing | 2025-02-05 |
| Compliance Testing | 2025-01-20 |
| Unit Testing | 2025-02-05 |
| Tests for Nonfunctional Requirements | 2025-02-05 |

Table 3: Milestones

3.3 SRS Verification Plan

The verification of the SRS will follow a systematic approach to ensure completeness, consistency, and alignment with project goals. This process will occur over a one-month period and involve multiple stakeholders.

3.3.1 Review Approaches

  1. Supervisor Reviews

    • Weekly 15-minute discussion sessions during regular meetings
    • Focus on technical feasibility and scope alignment
    • Documentation of feedback and required changes
    • Follow-up verification of implemented changes
  2. Internal Team Reviews

    • Bi-weekly structured review sessions
    • Each team member assigned specific sections for detailed review
    • Cross-validation of requirements between related sections
    • Documentation of findings in a shared review log
  3. Requirements Traceability

    • Mapping requirements to design elements
    • Verification of requirement dependencies
    • Identification of gaps or overlaps
    • Documentation of requirement relationships
  4. External Peer Review:

    • Inviting feedback from peers outside the development team
    • Review of requirements documentation and clarity
    • Identification of potential conflicts or ambiguities
    • Documentation of external feedback and proposed changes
  5. Action Item Tracking

    • Logging action items or required changes as GitHub issues and assigning them to the appropriate team member for resolution
    • Maintaining a shared task tracker to list pending and completed action items, ensuring visibility and accountability
    • Discussing progress on action items during weekly supervisor meetings and internal reviews, with unresolved items flagged for follow-up
    • Updating corresponding sections in project documents upon completion of each action item

3.3.2 Structured Review Process

  1. Individual Review Phase (Week 1)

    • Team members review assigned sections
    • Document initial findings and concerns
    • Identify potential conflicts or ambiguities
    • Prepare discussion points for team review
  2. Team Review Phase (Week 2)

    • Group discussion of individual findings
    • Resolution of identified issues
    • Documentation of decisions and changes
    • Update of requirements based on feedback
  3. Supervisor Review Phase (Week 3)

    • Present consolidated findings
    • Incorporate supervisor feedback
    • Document required changes
    • Plan implementation timeline
  4. Final Verification Phase (Week 4)

    • Implement all approved changes
    • Final review of updates
    • Version control and documentation
    • Sign-off on verified requirements

3.3.3 Verification Checklist

Completeness

  • All functional requirements have clear acceptance criteria
  • Non-functional requirements are measurable and testable
  • User interface requirements cover all interaction points
  • Performance metrics are clearly defined
  • Security requirements address all identified risks

Consistency

  • No conflicts between requirements
  • Terminology is used consistently throughout
  • Requirements align with project goals
  • Dependencies are properly identified
  • Priorities are consistently applied

Clarity

  • Requirements are unambiguous
  • No use of unclear terms or jargon
  • Each requirement is atomic and verifiable
  • Rationale is provided for key decisions
  • Constraints are clearly stated

Technical Feasibility

  • Resource requirements are realistic
  • Performance targets are achievable
  • Technology choices are justified
  • Integration points are well-defined
  • Scalability considerations are addressed

Documentation Quality

  • Document follows template guidelines
  • Sections are properly cross-referenced
  • Version control is maintained
  • Changes are tracked and justified
  • Supporting diagrams are clear and accurate

The verification process will be coordinated by Aaron Pham, with specific section ownership assigned to team members based on their areas of expertise. Progress will be tracked through GitHub issues, and final verification results will be documented in the project repository.

3.4 Design Verification Plan

After the SRS review period, design verification will be conducted through iterative reviews with supervisors during weekly meetings, covering different parts of the design in turn. Each review session will focus on verifying that design choices meet the functional and performance requirements specified in the SRS.

The POC demo session will also be used for design verification. This process will take one month.

3.4.1 Specification Verification

| Category | Verification Items | Acceptance Criteria | Method |
| --- | --- | --- | --- |
| Architecture | System components and their interactions | Matches SRS architecture diagrams; clear separation of concerns | Design review, architecture walkthrough |
| Data Flow | Data handling between components | Follows file-over-app philosophy; efficient data transmission | Flow diagram review, data path analysis |
| Interface Design | API contracts and UI components | OpenAI-compatible endpoints; consistent UI patterns | API spec review, UI mockup validation |
| Security Design | Authentication and data protection | Meets SOC 2 requirements; implements secure communication | Security design review |
| Deployment | Infrastructure and scaling strategy | Supports auto-scaling; enables zero-downtime updates | Infrastructure diagram review |

3.4.2 Functional Verification

| Component | Design Aspects | Verification Method | Success Criteria |
| --- | --- | --- | --- |
| Text Editor | Buffer management, file operations | Code review, prototype testing | Smooth text manipulation, file persistence |
| Planning Interface | Layout, interaction flow | UI/UX review, user testing | Intuitive navigation, clear feedback |
| Feature Steering | SAE integration, activation caching | Technical review, performance testing | Accurate steering, efficient caching |
| Version Control | Diff tracking, history management | Design review, data structure validation | Reliable version tracking |
| Inference Server | Request handling, model integration | API review, load testing | Meets TTFT requirements |

3.4.3 Performance Verification

| Metric | Target | Measurement Method | Verification Approach |
| --- | --- | --- | --- |
| TTFT | 200-500ms | Response time monitoring | Load testing with varying batch sizes |
| Throughput | 300 tokens/sec (batch=4) | Request processing rate | Concurrent request simulation |
| Memory Usage | <500MB per session | Resource monitoring | Memory profiling during operations |
| UI Responsiveness | <50ms input latency | Input lag measurement | UI performance testing |
| Cache Hit Rate | >80% for common operations | Cache statistics | Operation pattern analysis |

3.4.4 Rationale and Benchmark References for Performance Metrics

The selected performance metrics for morph are informed by community benchmarks and design considerations typical of real-time LLM-assisted applications. These targets are meant to ensure low-latency interactions without requiring large-scale infrastructure.

  • Time to First Token (TTFT): The 200–500ms range is consistent with expectations for latency-sensitive tools like chat interfaces and writing assistants. Although OpenAI does not provide exact TTFT values, their rate limit documentation emphasizes prompt, sub-second responses. This threshold ensures users receive suggestions quickly enough to maintain creative flow.

  • Throughput: A target of approximately 300 tokens/second at batch size 4 reflects typical performance observed in community benchmarks for models like LLaMA 2 and Mistral 7B using optimized inference engines such as vLLM. On modern GPUs like the A100 or RTX 4090, these rates are frequently achievable in single-instance deployments without distributed scaling.

These targets are chosen to ensure:

  • Responsive, interactive text generation
  • Compatibility with moderately powerful consumer hardware
  • Support for multiple users or documents without significant delays

They support morph’s vision of enabling fast, local-first, and user-centric machine-assisted writing environments.
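
As a worked example of these targets in combination: at roughly 300 tokens/second aggregate throughput with batch size 4, each concurrent stream receives about 75 tokens/second, so a typical 150-token suggestion would finish streaming in about two seconds after the initial 200–500ms TTFT.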

3.4.5 Documentation Validation

| Document Type | Review Focus | Validation Method | Reviewers |
| --- | --- | --- | --- |
| API Documentation | Completeness, accuracy | Spec review, example validation | Backend team |
| User Guide | Clarity, coverage | User testing, content review | UX team |
| System Architecture | Consistency, completeness | Technical review | Tech leads |
| Deployment Guide | Accuracy, reproducibility | Trial deployment | DevOps team |
| Security Documentation | Compliance, comprehensiveness | Security audit | Security team |

3.4.6 Review Schedule

  1. Initial Design Review (Week 1)

    • Architecture overview
    • Component interaction validation
    • Initial performance projections
  2. Detailed Component Review (Week 2)

    • Individual component design
    • Interface specifications
    • Data flow validation
  3. Performance Design Review (Week 3)

    • Caching strategy
    • Resource optimization
    • Scaling approach
  4. Integration Design Review (Week 4)

    • Component integration
    • End-to-end flow
    • System resilience

3.4.7 Validation Tools

| Category | Tool | Purpose | Usage Phase |
| --- | --- | --- | --- |
| Code Analysis | ruff, eslint | Static analysis | Development |
| Performance | k6, Artillery | Load testing | Integration |
| Security | OWASP ZAP | Security scanning | Pre-release |
| Documentation | Vale | Style consistency | Documentation |
| API Testing | Postman | Endpoint validation | Development |

3.4.8 Design Verification Checklist

  • Architecture aligns with SRS requirements
  • Component interfaces are well-defined
  • Performance targets are achievable with design
  • Security considerations are addressed
  • Scalability is built into design
  • Documentation is complete and accurate
  • Testing strategy is comprehensive
  • Resource requirements are specified
  • Error handling is well-defined
  • Monitoring and logging are planned

This design verification plan will be executed in parallel with the development process, with findings and updates tracked through GitHub issues. Regular reviews will ensure that the implementation stays aligned with the verified design.

3.5 Implementation Verification Plan

In-group code walkthrough inspection is chosen as the main method for implementation verification of the project. Code will be reviewed asynchronously by other team members in exchange. Basic syntax errors will be caught by the linter configured for each component. Detailed test cases can be found in the system testing section.

The final presentation is also expected to serve as part of usability verification; its feedback will be collected and reflected in the final release.

3.6 Automated Testing and Verification Tools

The following include tools and automated tests that will be used for the verification of the system.

The automated testing strategy for morph is divided into two main components: the web-based editor (morph) and the ML inference server (asteraceae). Each component has specific tooling requirements and testing frameworks to ensure comprehensive coverage.

3.6.1 morph

| Category | Tool | Purpose | Configuration |
| --- | --- | --- | --- |
| Unit Testing | Jest | Component-level testing | Custom matchers for React components; coverage threshold: 80%; snapshot testing for UI components |
| End-to-End | Playwright | Browser automation testing | Cross-browser testing (Chrome, Firefox, Safari); device emulation for responsive testing; network condition simulation |
| Performance | Lighthouse | Frontend performance metrics | Performance score target: >90; First Contentful Paint < 1.5s; Time to Interactive < 3.0s |
| Accessibility | axe-core | WCAG compliance testing | WCAG 2.1 Level AA compliance; automated accessibility checks; color contrast validation |
| Static Analysis | ESLint | Code quality enforcement | TypeScript-specific rules; React hooks linting; import sorting |
| Style Checking | Prettier | Code formatting | 2-space indentation; single quotes; no semicolons |

3.6.2 asteraceae

| Category | Tool | Purpose | Configuration |
| --- | --- | --- | --- |
| Unit Testing | pytest | Backend logic testing | Asyncio support; fixture-based test data; parallel test execution |
| Performance | k6 | Load and stress testing | TTFT verification; concurrent request handling; resource utilization monitoring |
| API Testing | locust | API endpoint validation | OpenAI compatibility testing; response schema validation; error handling verification |
| Static Analysis | ruff | Code quality checks | Type checking; import sorting; code complexity limits |
| Security | Bandit | Security vulnerability scanning | Known vulnerability checks; security best practices; dependency scanning |

3.6.3 Critical Paths

Critical paths refer to components whose failure or malfunction would significantly compromise the system’s core functionality, user experience, or data integrity. These include but are not limited to:

  • Authentication components that handle login flows, token generation, and access control.
  • API layers responsible for data exchange between the morph frontend and asteraceae backend, especially those processing user-generated input or returning AI-generated text.
  • Modules in morph that manage file states, local storage, or editing buffers crucial for preserving user content.
  • Code that performs text generation, steering operations, or data validation in asteraceae.
  • Interfaces that facilitate or transform information passed between morph and asteraceae.

These components are considered mission-critical and must meet 100% test coverage as per the code coverage requirements outlined in section 3.6.6. This classification helps ensure that all high-risk paths are thoroughly validated through automated testing.

3.6.4 CI Pipeline

flowchart TD
    A[Code Push] --> B[Static Analysis]
    B --> C[Unit Tests]
    C --> D[Integration Tests]
    D --> E[Performance Tests]
    E --> F[Security Scan]
    F --> G[Build & Package]
    G --> H[Deploy to Staging]

3.6.5 Testing Environment Configuration

| Environment | Purpose | Configuration | Update Frequency |
| --- | --- | --- | --- |
| Development | Local testing | Hot reloading; mock services; debug logging | Real-time |
| Integration | Component testing | Containerized services; test databases; limited resources | Per PR |
| Staging | Pre-production validation | Production-like setup; full monitoring; load testing enabled | Daily |
| Production | Live deployment | High availability; full security; performance monitoring | Weekly |

3.6.6 Automated Testing Requirements

  1. Code Coverage Requirements

    • Frontend (morph): 80% overall coverage
    • Backend (asteraceae): 85% overall coverage
    • Critical paths: 100% coverage

    Enforcement Mechanism in CI/CD Pipeline:

    • CI/CD jobs will run automated test coverage checks after each commit using Cloudflare and Vercel pipelines.
    • If coverage thresholds are not met, merges to the main branch will be blocked using pre-merge checks in the pipeline (a configuration sketch follows this list).
    • Automated coverage reports will be generated by the CI/CD pipeline and shared with the development team.
    • Email notifications will be sent to developers when test coverage requirements are not met, providing details on failed areas.
  2. Performance Testing Thresholds

    const performanceThresholds = {
      ttft: "500ms", // Time to First Token
      p95: "2s", // 95th percentile response time
      errorRate: "1%", // Maximum error rate
      throughput: "300", // Tokens per second
    }
  3. Testing Hooks

    # Pre-commit hooks
    pre-commit:
      - ruff check
      - eslint
      - prettier
      - type checking
     
    # Pre-push hooks
    pre-push:
      - unit tests
      - integration tests
      - build verification
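
To illustrate how the coverage gate in item 1 could be enforced, the following is a minimal Jest configuration sketch for the frontend; the critical-path directory name is hypothetical and would need to match morph's actual layout:

    // jest.config.ts — sketch of the coverage thresholds described above (not the project's actual config)
    import type { Config } from "jest";

    const config: Config = {
      collectCoverage: true,
      coverageThreshold: {
        // 80% overall coverage for the frontend (morph)
        global: { branches: 80, functions: 80, lines: 80, statements: 80 },
        // 100% coverage for a mission-critical module (see section 3.6.3); path is illustrative
        "./src/editor/buffer/": { branches: 100, functions: 100, lines: 100, statements: 100 },
      },
    };

    export default config;

With thresholds in place, a CI run of `jest --coverage` exits non-zero whenever a threshold is missed, which is what allows the pre-merge check to block the merge.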

3.6.7 Quality Gates

| Stage | Criteria | Action on Failure |
| --- | --- | --- |
| Pull Request | All tests pass; coverage thresholds met; no security issues | Block merge |
| Integration | E2E tests pass; performance metrics met; API contract tests pass | Notify team |
| Deployment | Smoke tests pass; no regression; security scan clean | Rollback |

3.6.8 Monitoring and Reporting

| Metric Type | Tools | Frequency | Alert Threshold |
| --- | --- | --- | --- |
| Test Results | GitHub Actions | Per commit | Any failure |
| Coverage | Codecov | Per PR | < 80% |
| Performance | Grafana | Real-time | TTFT > 500ms |
| Security | Snyk | Daily | Critical vulnerabilities |

3.7 Software Validation Plan

The software validation process ensures that morph meets user requirements and functions effectively in its intended environment. This plan outlines the systematic approach to validating the system through user testing, performance analysis, and functionality verification.

3.7.1 Validation Approach

| Phase | Focus Area | Methods | Success Criteria |
| --- | --- | --- | --- |
| Alpha Testing | Core functionality | Internal testing, developer validation | Basic feature completeness |
| Beta Testing | User experience | Limited user group testing | User satisfaction metrics |
| Release Candidate | Production readiness | Full system validation | All validation criteria met |
| Post-Release | Production monitoring | User feedback, performance metrics | Continuous improvement metrics |

3.7.2 User Validation Groups

| Group | Description | Size | Duration | Focus Areas |
| --- | --- | --- | --- | --- |
| Internal Users | Development team and stakeholders | 5-8 people | Throughout development | Feature completeness, technical stability |
| Beta Users | Selected writers and engineers | 15-20 people | 4 weeks | User experience, feature utility |
| Usability Testers | UX researchers and target users | 10-12 people | 2 weeks | Interface design, accessibility |
| Production Pilots | Early adopters | 30-50 people | 6 weeks | Real-world usage, performance |

3.7.3 Validation Scenarios

| Category | Scenario | Success Criteria | Validation Method |
| --- | --- | --- | --- |
| Creative Writing | Complete essay draft using planning features | Successful plan creation; effective suggestion integration; smooth writing flow | User observation, feedback collection |
| Feature Steering | Customize text generation with specific style | Accurate style adaptation; consistent outputs; user satisfaction | Style comparison analysis |
| Document Management | Create and manage multiple writing projects | Reliable file operations; effective organization; version control accuracy | User workflow analysis |
| Collaborative Use | Multiple users working on shared documents | Consistent performance; feature accessibility; clear user permissions | Multi-user testing |

3.7.4 Validation Metrics

| Metric Category | Key Indicators | Target Values | Collection Method |
| --- | --- | --- | --- |
| User Satisfaction | Net Promoter Score; feature usefulness rating; interface satisfaction | NPS > 40; rating > 4/5; >80% satisfied | User surveys, interviews |
| Performance | Response time; feature availability; error rate | TTFT < 500ms; 99.9% uptime; <1% error rate | Automated monitoring |
| Feature Adoption | Feature usage rates; return user rate; session duration | >70% feature usage; >60% return rate; >15 min sessions | Analytics tracking |
| Writing Efficiency | Time to complete drafts; suggestion acceptance rate; planning feature usage | 20% time reduction; >50% acceptance; >30% usage | User metrics, surveys |

3.7.5 Validation Process

  1. Pre-validation Setup

    • Environment preparation
    • Test data creation
    • User group selection
    • Success criteria documentation
  2. Validation Execution

    • Structured testing sessions
    • User feedback collection
    • Performance monitoring
    • Issue tracking
  3. Data Collection

    • Automated metrics gathering
    • User surveys
    • Interview sessions
    • Performance logs
  4. Analysis and Reporting

    • Metrics analysis
    • User feedback synthesis
    • Issue categorization
    • Improvement recommendations
  5. Feedback Integration and Iteration

    • Prioritize fixes from analyzed feedback
    • Assign tasks to relevant team members
    • Apply updates in future sprints
    • Re-test improved features with users
    • Document changes and close the loop

3.7.6 Validation Schedule

| Stage | Timeline | Activities | Deliverables |
| --- | --- | --- | --- |
| Planning | Week 1-2 | Validation strategy development; test group formation; environment setup | Validation plan document |
| Alpha Testing | Week 3-6 | Internal testing; core feature validation; performance baseline | Alpha test report |
| Beta Testing | Week 7-10 | User group testing; feature refinement; performance optimization | Beta test report |
| Final Validation | Week 11-12 | Full system validation; documentation review; release preparation | Final validation report |

3.7.7 Acceptance Criteria

| Component | Criteria | Validation Method |
| --- | --- | --- |
| Text Editor | Text manipulation responsiveness; file operation reliability; interface accessibility; customizable visualization workflows | Usability testing, performance metrics, accessibility testing |
| Planning Features | Suggestion relevance; planning workflow efficiency; user satisfaction; customizable task and progress options | User feedback, feature usage analysis, workflow adaptability testing |
| Feature Steering | Style accuracy; generation consistency; user control effectiveness | Style analysis, user validation, customization testing |
| Performance | Response time targets; system stability; resource utilization | Automated monitoring, load testing |
| Security | Data privacy; access control; secure communication | Security audit, penetration testing |

3.7.8 Issue Management

| Priority | Response Time | Resolution Time | Escalation Path |
| --- | --- | --- | --- |
| Critical | 1 hour | 4 hours | Development lead → Project manager |
| High | 4 hours | 24 hours | Technical lead → Development lead |
| Medium | 24 hours | 72 hours | Developer → Technical lead |
| Low | 48 hours | 1 week | Developer → Issue tracking |

This validation plan ensures that morph meets its intended purpose of providing an effective, user-friendly environment for creative writing while maintaining high standards for performance and reliability. The plan will be reviewed and updated based on findings throughout the validation process.

4 System Tests

This section outlines tests for verifying both functional and non-functional requirements of the software, ensuring it meets user expectations and performs reliably.

4.1 Tests for Functional Requirements

This section outlines test cases for verifying the core functional requirements of morph as specified in the SRS. The tests are organized by major feature areas, with each test designed to validate specific user interactions and system behaviors. For detailed input data associated with each test case, refer to (6.2 Test Input Table).

4.1.1 Planning and Suggestion Features

These tests verify requirements FR-1 through FR-3, which specify the system’s capability to provide writing assistance through planning suggestions and model-guided steering.

4.1.1.1 Planning Suggestion Validation
  1. Test-FR-P1

    • Control: Manual
    • Initial State: Editor opened, planning mode active
    • Input: Valid prompt (e.g., “environmental sustainability”)
    • Output: Planning suggestions appear within MAX_DISPLAY_TIME
    • Test Case Derivation: Based on FR-1’s requirement for timely suggestion generation
    • How test will be performed: Enter prompt and start planning. Verify at least one suggestion appears within MAX_DISPLAY_TIME.
  2. Test-FR-P2

    • Control: Manual
    • Initial State: Editor opened, planning mode active
    • Input: Unintelligible prompt (e.g., “asdh123!@#”)
    • Output: Error message requesting input refinement
    • Test Case Derivation: Derived from FR-1’s need for robust input handling
    • How test will be performed: Enter nonsensical characters and verify that an input validation error message appears clearly instructing the user to revise their input due to unrecognized content.
  3. Test-FR-P3

    • Control: Manual
    • Initial State: Editor opened, planning mode active
    • Input: Large text block (LARGE_INPUT_SIZE) on climate change
    • Output: Condensed suggestions or length warning within LENGTH_RESPONSE_TIME
    • Test Case Derivation: Based on FR-1’s requirement for handling varied input lengths
    • How test will be performed: Paste a LARGE_INPUT_SIZE word passage and verify that the system responds within LENGTH_RESPONSE_TIME with a clear length-related warning message prompting the user to shorten the input.
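
Although these planning tests are controlled manually, a check like Test-FR-P1 could later be automated with Playwright (section 3.6.1). The sketch below is illustrative only: the URL, selectors, and the concrete value bound to MAX_DISPLAY_TIME are assumptions, not morph's actual markup:

    // test-fr-p1.spec.ts — hypothetical automation of Test-FR-P1
    import { test, expect } from "@playwright/test";

    const MAX_DISPLAY_TIME = 5_000; // assumed binding for the symbolic constant, in ms

    test("planning suggestions appear within MAX_DISPLAY_TIME", async ({ page }) => {
      await page.goto("http://localhost:3000"); // assumed dev URL
      await page.getByRole("button", { name: "Plan" }).click(); // assumed control name
      await page.getByRole("textbox").fill("environmental sustainability");
      await page.keyboard.press("Enter");
      // At least one suggestion must render before the deadline.
      await expect(page.getByTestId("planning-suggestion").first())
        .toBeVisible({ timeout: MAX_DISPLAY_TIME });
    });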

4.1.2 Text Generation Features

These tests verify requirements FR-4 through FR-7, focusing on the system’s text generation and style adaptation capabilities.

4.1.2.1 Steering Validation
  1. Test-FR-S1

    • Control: Manual
    • Initial State: Editor opened, steering feature enabled
    • Input: “Didion” tone selection with prompt “reflection on modern life”
    • Output: Text suggestions matching Didion’s writing style under Lexical Constraints.
    • Test Case Derivation: Based on FR-2’s requirement for style-specific generation
    • How test will be performed: Enable steering, select “Didion” tone, and generate output using the prompt. Compare the output to the (6.3.1 Didion’s Writing Style Validation) checklist.
  2. Test-FR-S2

    • Control: Manual
    • Initial State: Editor opened, steering feature enabled
    • Input: User’s writing sample for style adaptation
    • Output: Customized suggestions within STYLE_MATCH_TIME
    • Test Case Derivation: Derived from FR-2’s requirement for personalization
    • How test will be performed: Upload a writing sample (e.g., an excerpt from a Joan Didion essay or similar source). Measure the time from upload to generation, where it must not exceed STYLE_MATCH_TIME. Then, evaluate the output using the (6.3.1 Didion’s Writing Style Validation) checklist.

4.1.3 User Interaction Features

These tests verify requirements FR-8 through FR-10, covering the system’s user interaction capabilities including feedback panels and preference management.

4.1.3.1 LTR Feedback Panel Validation
  1. Test-FR-F1

    • Control: Manual
    • Initial State: Editor opened with LTR feedback panel active, initial suggestions present
    • Input: User feedback on suggestion (tone adjustment, alternative phrasing)
    • Output: Real-time suggestion updates with clear preview
    • Test Case Derivation: Based on FR-3’s requirement for real-time feedback integration
    • How test will be performed: Generate suggestions, provide feedback, verify real-time updates and clarity.
4.1.3.2 User Preferences Validation
  1. Test-FR-TG1

    • Control: Manual
    • Initial State: Editor opened with settings panel accessible
    • Input: Preference settings (formal tone, narrative style, passive voice, high formality)
    • Output: Consistent application of preferences across generations
    • Test Case Derivation: Derived from FR-4’s requirement for persistent preference application
    • How test will be performed: Enable the desired preferences (e.g., formal tone, narrative style) in the settings panel. Generate at least three outputs using the same prompt. Evaluate formal tone consistency using the (6.3.2 Formal Tone Generation) checklist.

4.1.4 Profile Management Features

These tests verify requirements FR-5 and FR-11, focusing on user profile management and configuration persistence.

4.1.4.1 Profile Configuration Validation
  1. Test-FR-UP1

    • Control: Manual
    • Initial State: Editor opened with profile settings access
    • Input: Creative writing configuration (informal tone, narrative style)
    • Output: Successfully saved profile with retained settings
    • Test Case Derivation: Based on FR-5’s requirement for profile saving
    • How test will be performed: Create profile, save settings, verify retention.
  2. Test-FR-UP2

    • Control: Manual
    • Initial State: Profile panel with one existing profile
    • Input: Academic writing profile creation
    • Output: New profile saved without affecting existing profiles
    • Test Case Derivation: Derived from FR-5’s requirement for multiple profile support
    • How test will be performed: Create second profile, verify both profiles maintain settings.

4.1.5 Document Management Features

These tests verify requirements FR-6 and FR-8, covering document navigation and version control capabilities.

4.1.5.1 Revision Management Validation
  1. Test-FR-RM1

    • Control: Manual
    • Initial State: Document with multiple sections loaded
    • Input: Tree view navigation enabled
    • Output: Visual document map with section hierarchy
    • Test Case Derivation: Based on FR-6’s requirement for non-linear navigation
    • How test will be performed: Enable tree view, verify section visualization.
4.1.5.2 Version Control Validation
  1. Test-FR-VC1

    • Control: Manual
    • Initial State: Editor with version history
    • Input: Navigation to previous version
    • Output: Previous version display with restore/diff options
    • Test Case Derivation: Based on FR-8’s requirement for version navigation
    • How test will be performed: Access version menu, verify display and options.
  2. Test-FR-VC2

    • Control: Manual
    • Initial State: Multiple versions available
    • Input: Version reversion command
    • Output: Selected version becomes current with edit capability
    • Test Case Derivation: Based on FR-8’s requirement for version restoration
    • How test will be performed: Select revert option, verify state change.

4.1.6 Progress Tracking Features

These tests verify requirements FR-11 and FR-12, focusing on writing goals and progress monitoring.

4.1.6.1 Writing Goals Validation
  1. Test-FR-PT1

    • Control: Manual
    • Initial State: Progress tracking enabled
    • Input: Word count goal (WORD_GOAL)
    • Output: Real-time progress meter
    • Test Case Derivation: Based on FR-11’s requirement for progress tracking
    • How test will be performed: Set goal, verify meter updates.
  2. Test-FR-PT2

    • Control: Manual
    • Initial State: Tone analysis enabled
    • Input: Formal tone consistency goal
    • Output: Tone deviation alerts and suggestions
    • Test Case Derivation: Based on FR-11’s requirement for style consistency
    • How test will be performed: Write with varying tones, verify alerts.

4.1.7 Document Export Features

These tests verify requirement FR-13, covering document export capabilities.

4.1.7.1 Export Functionality Validation
  1. Test-FR-E1

    • Control: Manual
    • Initial State: Completed document ready for export
    • Input: PDF export selection
    • Output: Formatted PDF with preserved content
    • Test Case Derivation: Based on FR-13’s requirement for PDF export
    • How test will be performed: Export document, verify format retention.
  2. Test-FR-E2

    • Control: Manual
    • Initial State: Completed document ready for export
    • Input: Plain text export selection
    • Output: Raw text file without formatting
    • Test Case Derivation: Based on FR-13’s requirement for plain text export
    • How test will be performed: Export document, verify content preservation.

4.1.8 Interface Customization Features

These tests verify requirement FR-14, covering visual theme customization.

4.1.8.1 Theme Customization Validation
  1. Test-FR-VT1

    • Control: Manual
    • Initial State: Default light theme active
    • Input: Dark mode selection
    • Output: Consistent dark theme application
    • Test Case Derivation: Based on FR-14’s requirement for theme switching
    • How test will be performed: Switch theme, verify visual consistency.

4.2 Tests for Nonfunctional Requirements

4.3 Look and Feel

4.3.1 Verify Unified, Non-Intrusive, and Uncluttered Visual Design

Test-LF-A1

  • Type: Structural, Static, Manual
  • Initial State: The fully developed morph application is accessible on various devices.
  • Input/Condition: Access the application UI on different devices and screen sizes.
  • Output/Result: Confirmation that the UI is unified, non-intrusive, and uncluttered across all interfaces.
  • How test will be performed: Conduct a design review by assembling a team of UI/UX experts who will use a predefined checklist based on design guidelines. Usability testing will be conducted with the target number of participants (USABILITY_TEST_PARTICIPANTS) representing primary user personas, followed by analysis of survey feedback (6.1 Usability Survey Questions).

Test-LF-A2

  • Type: Structural, Static, Manual
  • Initial State: The application’s UI components are fully implemented.
  • Input/Condition: All UI screens and components are available for review.
  • Output/Result: Verification that standardized typography and color palettes are consistently applied.
  • How test will be performed: Conduct a UI audit using design system documentation and WCAG Contrast Checker. Manually verify font sizes, styles, and color codes across screens to ensure adherence to design standards.

4.4 Usability and Humanity

4.4.1 Assess the Planning Interface

Test-UH-EOU3

  • Type: Structural, Dynamic, Manual
  • Initial State: Planning interface is accessible.
  • Input/Condition: Use the planning interface to organize and adjust creative writing steps.
  • Output/Result: Users can effectively organize and debug their writing using the interface.
  • How test will be performed: Provide users with a writing task that requires planning. Observe their navigation and utilization of features. Collect feedback on effectiveness and usability through surveys and interviews (6.1 Usability Survey Questions).

4.4.2 Measure Onboarding Time for New Users

Test-UH-L1

  • Type: Structural, Dynamic, Manual
  • Initial State: The application is ready for first-time use.
  • Input/Condition: Provide new users with access to the application without prior instruction.
  • Output/Result: Users begin creating or editing content within ONBOARDING_TIME.
  • How test will be performed: Recruit participants unfamiliar with morph. Time their process from start to content creation. Note obstacles and gather onboarding feedback. This will then be followed by survey feedback analysis (6.1 Usability Survey Questions).

4.4.3 Verify Keyboard Navigation Accessibility

Test-UH-A2

  • Type: Structural, Dynamic, Manual
  • Initial State: Application is fully developed.
  • Input/Condition: Navigate the application using only the keyboard.
  • Output/Result: All UI components are accessible via keyboard.
  • How test will be performed: Verify focus indicators and functionality of interactive elements on the main interface. Perform common tasks without using a mouse to ensure accessibility.
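
Since axe-core is already part of the tooling (section 3.6.1), the manual keyboard pass could be supplemented by an automated sketch along these lines; the URL and the ten-element traversal depth are assumptions:

    // keyboard-nav.spec.ts — sketch of automated support for Test-UH-A2
    import { test, expect } from "@playwright/test";
    import AxeBuilder from "@axe-core/playwright";

    test("main interface is keyboard-reachable and passes axe checks", async ({ page }) => {
      await page.goto("http://localhost:3000"); // assumed dev URL
      // Tab through the first ten interactive elements and confirm something holds focus.
      for (let i = 0; i < 10; i++) {
        await page.keyboard.press("Tab");
        const focused = await page.evaluate(() => document.activeElement?.tagName);
        expect(focused).toBeTruthy();
      }
      // Automated WCAG 2.1 AA scan; complements rather than replaces the manual pass.
      const results = await new AxeBuilder({ page })
        .withTags(["wcag2a", "wcag2aa", "wcag21a", "wcag21aa"])
        .analyze();
      expect(results.violations).toEqual([]);
    });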

4.5 Performance

4.5.1 Measure Time to First Tokens (TTFT)

Test-PR-SLR1

  • Type: Structural, Dynamic, Automatic
  • Initial State: The inference server and application are operational.
  • Input/Condition: Submit requests for suggestions and planning.
  • Output/Result: TTFT should meet a 95th percentile threshold below 500ms.
  • How test will be performed: Performance testing tools will automatically simulate user requests for suggestions and planning features, recording the time from request submission to receipt of the first token. Tests will be conducted under various network conditions, including different latencies, to assess performance across typical user scenarios. The results will be compiled into a report detailing the average TTFT, the 95th percentile, and any deviations, to confirm that the application's responsiveness meets the specified requirements.
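
A k6 script along the following lines could drive this test. k6's built-in `http_req_waiting` metric (time to first byte) is a practical proxy for TTFT on a streaming endpoint; the endpoint path, model name, and payload are assumptions based on the OpenAI-compatible API:

    // ttft-load-test.ts — k6 sketch for Test-PR-SLR1 (recent k6 releases run TypeScript directly)
    import http from "k6/http";
    import { check } from "k6";

    export const options = {
      vus: 10, // simulated concurrent users
      duration: "1m",
      thresholds: {
        // http_req_waiting is time-to-first-byte, used here as a proxy for TTFT
        http_req_waiting: ["p(95)<500"],
      },
    };

    export default function () {
      const payload = JSON.stringify({
        model: "gemma-2", // assumed model identifier
        prompt: "Outline an essay on environmental sustainability.",
        max_tokens: 64,
        stream: true,
      });
      const res = http.post("http://localhost:3000/v1/completions", payload, {
        headers: { "Content-Type": "application/json" },
      });
      check(res, { "status is 200": (r) => r.status === 200 });
    }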

4.5.2 Evaluate Throughput of Inference Server

Test-PR-SLR2

  • Type: Structural, Dynamic, Automatic
  • Initial State: Inference server is set up with batch processing capabilities.
  • Input/Condition: Send batched requests with the BATCH_SIZE.
  • Output/Result: Achieve approximately SUGGESTION_TOKENS throughput.
  • How test will be performed: Load testing tools will automatically send concurrent batched requests to the inference server. The number of tokens processed per second will be measured over multiple test runs. Server resource utilization including CPU, GPU, and memory will be analyzed to identify any bottlenecks. If the throughput is below the desired level, optimizations will be recommended to enhance performance.
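
The measurement itself could be approximated with a small script that opens BATCH_SIZE concurrent streams and counts streamed frames per second; this is a sketch under the same endpoint assumptions as above, and frame counting only lower-bounds true token throughput:

    // throughput-probe.ts — rough throughput probe for Test-PR-SLR2 (Node 18+)
    async function measureStream(prompt: string): Promise<number> {
      const start = performance.now();
      let frames = 0;
      const res = await fetch("http://localhost:3000/v1/completions", { // assumed endpoint
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ model: "gemma-2", prompt, max_tokens: 256, stream: true }),
      });
      const reader = res.body!.getReader();
      const decoder = new TextDecoder();
      for (;;) {
        const { done, value } = await reader.read();
        if (done) break;
        // Approximate: counts SSE frames per chunk; frames split across chunks may be missed.
        frames += decoder.decode(value, { stream: true }).split("\n\n").filter(Boolean).length;
      }
      return frames / ((performance.now() - start) / 1000);
    }

    const BATCH_SIZE = 4; // matches the batch size in the throughput target
    Promise.all(
      Array.from({ length: BATCH_SIZE }, () =>
        measureStream("Draft an opening paragraph about climate change.")),
    ).then((rates) => console.log("aggregate ≈", rates.reduce((a, b) => a + b, 0).toFixed(1), "frames/sec"));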

4.5.3 Ensure Interface Contains Only Safe Content

Test-PR-SCR2

  • Type: Structural, Static, Automatic
  • Initial State: All UI elements and assets are integrated.
  • Input/Condition: Review all images, icons, and media used in the application.
  • Output/Result: Confirmation that there is no NSFW or harmful content.
  • How test will be performed: Automated image analysis tools will be used to scan all graphical assets for inappropriate content. Licenses and sources of third-party assets will be verified automatically where possible. Any content detected as unsuitable will be reviewed manually for confirmation and then replaced or removed to maintain a safe user environment.

4.5.4 Test Accuracy of Generated Text Matching User Steering

Test-PR-PAR1

  • Type: Structural, Dynamic, Automatic
  • Initial State: SAEs and steering functionalities are implemented.
  • Input/Condition: Provide steering inputs (e.g., tone, style) and generate text.
  • Output/Result: Generated text aligns with user inputs and feedback.
  • How test will be performed: Specific steering parameters will be defined, and automated scripts will generate text outputs based on these inputs. Analytical metrics like cosine similarity and stylistic analysis tools will be used to quantitatively assess the alignment between the generated text and the steering inputs. Results will be compiled to evaluate the system’s responsiveness and adjustments will be made to improve accuracy where necessary.
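
For the cosine-similarity portion of the analysis, a straightforward implementation could look like this; how the embedding vectors are obtained (SAE activations, a sentence-embedding model, etc.) is left as an assumption:

    // steering-similarity.ts — cosine similarity between two embedding vectors
    function cosineSimilarity(a: number[], b: number[]): number {
      if (a.length !== b.length) throw new Error("dimension mismatch");
      let dot = 0, normA = 0, normB = 0;
      for (let i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
      }
      return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // A generation passes if its embedding stays close to the style reference.
    // The 0.8 threshold is an assumption to be calibrated against human judgments.
    const matchesSteering = (gen: number[], ref: number[]) => cosineSimilarity(gen, ref) >= 0.8;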

4.5.5 Test Deployment Strategy for Fault Tolerance

Test-PR-RFR2

  • Type: Structural, Dynamic, Automatic
  • Initial State: Application is deployed on a Kubernetes cluster.
  • Input/Condition: Simulate node or replica failures.
  • Output/Result: Deployment is recreated automatically, maintaining availability.
  • How test will be performed: Automated scripts will intentionally fail pods or nodes within the Kubernetes cluster. Monitoring tools will automatically track the system’s response and recovery time, verifying that deployments are recreated in accordance with the fault tolerance strategy. Application availability will be checked continuously to ensure minimal impact on users.

4.5.6 Assess Asynchronous Processing of Suggestions

Test-PR-CR1

  • Type: Structural, Dynamic, Automatic
  • Initial State: System supports asynchronous suggestion processing.
  • Input/Condition: Multiple users submit suggestion requests simultaneously.
  • Output/Result: Requests are processed without significant delay or errors.
  • How test will be performed: Automated performance testing tools will simulate multiple users submitting requests concurrently. The system’s queue management and processing times will be monitored automatically to assess its ability to handle asynchronous processing. Any request drops or errors will be logged for analysis and remediation.

4.5.7 Verify Input Responsiveness

Test-PR-CR2

  • Type: Structural, Dynamic, Automatic
  • Initial State: Text manipulation features are implemented.
  • Input/Condition: Perform rapid text entry and editing operations.
  • Output/Result: No noticeable delays or lag in input response.
  • How test will be performed: Automated scripts will perform rapid text entry and editing operations while performance profiling tools measure input latency. Tests will be run on various hardware configurations and browsers automatically. If latency exceeds acceptable thresholds, code optimizations will be implemented to enhance responsiveness.

4.6 Security

4.6.1 Ensure HTTPS Encryption for All Communications

Test-SR-INT1

  • Type: Structural, Dynamic, Automatic
  • Initial State: Application and servers are set up with SSL certificates.
  • Input/Condition: Monitor network traffic during application use.
  • Output/Result: All data transmissions are encrypted using HTTPS.
  • How test will be performed: Automated security testing tools will monitor network traffic to verify that all communications use HTTPS. Attempts to access the application via unsecured HTTP will be scripted to ensure automatic redirection to HTTPS. The validity and configuration of SSL certificates will be checked automatically. Any mixed content warnings detected by browsers will be addressed promptly.
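
One of the scripted redirection checks could be expressed as follows, using fetch with manual redirect handling inside the existing Playwright test runner; the hostname is a placeholder:

    // https-redirect.test.ts — sketch verifying plain-HTTP requests redirect to HTTPS
    import { test, expect } from "@playwright/test";

    test("HTTP requests redirect to HTTPS", async () => {
      // redirect: "manual" surfaces the redirect response instead of following it
      const res = await fetch("http://morph.example.com/", { redirect: "manual" }); // placeholder host
      expect([301, 307, 308]).toContain(res.status);
      expect(res.headers.get("location")).toMatch(/^https:\/\//);
    });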

4.6.2 Implement DNS Security Measures

Test-SR-INT2

  • Type: Structural, Dynamic, Automatic
  • Initial State: DNS security configurations are in place.
  • Input/Condition: Perform DNS queries and observe responses.
  • Output/Result: DNS queries and responses are secure from tampering and spoofing.
  • How test will be performed: Automated DNSSEC testing tools will verify the implementation of DNS security measures. Simulated DNS spoofing attacks will be conducted automatically to test system resilience. Any vulnerabilities detected will be logged and remediated to protect against DNS-based attacks.

4.6.3 Validate Content Security Policies (CSP)

Test-SR-INT3

  • Type: Structural, Dynamic, Automatic
  • Initial State: CSP headers are configured on the server.
  • Input/Condition: Use the application while attempting to execute unauthorized scripts.
  • Output/Result: CSP effectively prevents XSS attacks.
  • How test will be performed: Automated security testing tools will attempt to inject malicious scripts into the application. The effectiveness of CSP in blocking these scripts will be verified automatically. CSP headers will be analyzed to ensure they are correctly configured. Any violations or weaknesses will be addressed to enhance security.
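
A complementary automated assertion could confirm that the CSP header is present and restricts script sources; the directive values below are illustrative, not the deployed policy:

    // csp-header.test.ts — sketch asserting the CSP header exists and restricts script sources
    import { test, expect } from "@playwright/test";

    test("CSP header is configured", async () => {
      const res = await fetch("https://morph.example.com/"); // placeholder host
      const csp = res.headers.get("content-security-policy");
      expect(csp).not.toBeNull();
      // Illustrative directives; the real assertions should mirror the deployed policy.
      expect(csp ?? "").toContain("default-src 'self'");
      expect(csp ?? "").toContain("script-src");
    });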

4.6.4 Verify Privacy Compliance

Test-SR-P1

  • Type: Structural, Static, Automatic
  • Initial State: Application codebase is complete.
  • Input/Condition: Review data handling processes and storage mechanisms.
  • Output/Result: Confirmation that no personal information is collected or stored.
  • How test will be performed: Automated code analysis tools will scan the codebase to identify any components that collect, process, or store personal data. Network traffic will be monitored during simulated user interactions to ensure no personal information is transmitted. Storage mechanisms like databases, local storage, and cookies will be inspected to verify they do not retain personal data. All findings will be documented, and any issues will be resolved to ensure compliance with privacy policies.

4.7 Maintainability and Support

4.7.1 Schedule and Verify Security Updates

Test-OER-MR1

  • Type: Structural, Static, Automatic
  • Initial State: Maintenance schedules are established.
  • Input/Condition: Review update logs and schedules while performing security scans.
  • Output/Result: Regular updates are performed. No outstanding vulnerabilities.
  • How test will be performed: Security updates will be verified by running npm audit at scheduled intervals to detect and address vulnerabilities in dependencies. Audit reports will be reviewed, and necessary updates will be applied to resolve critical issues. Continuous integration pipelines will integrate npm audit checks to prevent new vulnerabilities. All security updates and patches will be documented.

4.8 Compliance

4.9 Traceability Between Test Cases and Requirements

| Requirement ID | Requirement Description | Test Case ID(s) |
| --- | --- | --- |
| FR1 | morph shall provide suggestions during the planning phase of creative writing, offering assistance in structuring ideas, topics, or themes based on user input. | Test-FR-P1, Test-FR-P2, Test-FR-P3 |
| FR2 | morph shall provide users with manual control over text generation, enabling them to select advanced steering options such as tone, style, or creativity level. | Test-FR-S1, Test-FR-S2 |
| FR3 | Users shall be able to interact with generated text through a left-to-right (LTR) feedback panel, allowing them to provide real-time feedback on model-generated suggestions. | Test-FR-F1 |
| FR4 | Users shall be able to set preferences for tone, style, voice, and formality, which morph will apply to all future generations of text. | Test-FR-TG1 |
| FR13 | morph shall allow users to export their documents in .pdf, .md (Markdown), and plain text formats, ensuring compatibility with external platforms. | Test-FR-E1, Test-FR-E2 |
| FR14 | morph shall allow users to customize the visual appearance of the editor by choosing from different themes, such as dark mode, light mode, and high-contrast options. | Test-FR-VT1 |
| LF-A1 | morph shall adopt a unified, non-intrusive, and uncluttered visual design. | Test-LF-A1 |
| LF-A2 | morph must implement a consistent design system across all user interfaces, involving standardized typography and a color palette. | Test-LF-A2 |
| LF-S1 | The design of the application will be minimalist, utilizing clean lines and a monotonic color palette. | Test-LF-S1 |
| LF-S2 | The application must be responsive, adapting seamlessly to various device screens and orientations. | Test-LF-S2 |
| LF-S3 | Interactive elements such as buttons and links must contrast significantly with the background to ensure visibility and accessibility. | Test-LF-S3 |
| LF-S4 | The user interface should enable smooth transitions and intuitive animations across various sections and features. | Test-LF-S4 |
| LF-S5 | The application should include visual cues and feedback for user interactions to reinforce usability. | Test-LF-S5 |
| UH-EOU3 | The application shall include a planning interface to assist users in organizing and debugging their creative writing steps. | Test-UH-EOU3 |
| UH-L1 | New users should be able to understand basic functionalities and start creating or editing content within 10 minutes of initial use. | Test-UH-L1 |
| UH-UP1 | The application should utilize clear and concise language for all instructions, feedback, and user interface elements. | Test-UH-UP1, Test-OER-PR2 |
| UH-A1 | morph should support text resizing without loss of content or functionality. | Test-UH-A1 |
| UH-A2 | morph should ensure that all user interface components are accessible via keyboard navigation. | Test-UH-A2 |
| UH-A3 | Implement ARIA (Accessible Rich Internet Applications) attributes throughout the application. | Test-UH-A3 |
| PR-SLR1 | TTFT should be minimal, around 200-500ms | Test-PR-SLR1 |
| PR-SLR2 | Throughput should be approximately 300 tokens/sec for a batch size of 4 | Test-PR-SLR2 |
| PR-SCR1 | Suggestions must not be harmful | Test-PR-SCR1 |
| PR-SCR2 | The interface must not contain harmful images or NSFW content. | Test-PR-SCR2 |
| PR-PAR1 | The generated text should match users’ steering direction | Test-PR-PAR1, Test-CulR-CR2 |
| PR-RFR1 | A notification toast must be sent to users in case in-flight requests fail to complete. | Test-PR-RFR1 |
| PR-RFR2 | morph must implement a recreate deployment strategy | Test-PR-RFR2 |
| PR-CR1 | Suggestions should run asynchronously on request. | Test-PR-CR1 |
| PR-CR2 | Input should not show any noticeable delay | Test-PR-CR2, Test-OER-EPE2 |
| PR-SER1 | morph inference server must include scale-to-zero and concurrency-based autoscaling. | Test-PR-SER1, Test-OER-EPE2 |
| PR-LR1 | Future integration with other language model architectures | Test-PR-LR1, Test-OER-RIAS1 |
| OER-EPE1 | morph will be able to run on different hardware environments, given they can run a modern browser. | Test-OER-AR1 |
| OER-EPE2 | morph should have a minimal increase in power consumption | Test-PR-SER1, Test-PR-CR2 |
| OER-RIAS1 | morph inference server should provide OpenAI-compatible endpoints. | Test-PR-LR1 |
| OER-PR1 | Secrets must be configured with appropriate Role-Based Access Control (RBAC) rules | Test-SR-INT4 |
| OER-PR2 | Relevant documentation should be accessible by users. | Test-UH-UP1, Test-UH-L1 |
| OER-PR3 | Feedback should also be included within the interface | Test-OER-SR1 |
| OER-RR1 | Release cycle must utilize the current GitHub CD workflow. | Test-OER-MR2 |
| OER-RR2 | End-to-end tests should pass before deploying to production. | Test-OER-MR2 |
| OER-MR1 | Security updates must be done periodically | Test-OER-MR1, Test-SR-IM1 |
| OER-MR2 | Feature integrations must pass existing tests | Test-OER-MR2 |
| OER-AR1 | morph must be able to run within existing users’ environments | Test-PR-LR2 |
| SR-INT1 | All communication between the client UI, backend services, and external APIs must be encrypted using HTTPS. | Test-SR-INT1 |
| SR-INT2 | Implement DNS security measures to ensure that DNS queries and responses are protected against tampering and spoofing. | Test-SR-INT2 |
| SR-INT3 | The application will use content security policies to mitigate the risk of XSS attacks. | Test-SR-INT3 |
| SR-P1 | The application must ensure that it does not collect or store personal information, adhering strictly to privacy-by-design principles. | Test-SR-P1 |
| SR-AU1 | Implement monitoring of interactions with external service providers to ensure their use complies with security policies and performance expectations. | Test-OER-MR1 |
| SR-IM1 | Employ up-to-date security measures to protect against known threats and vulnerabilities, including regular updates and patches to the software components. | Test-OER-MR1 |
| SR-IM2 | Configure the application to minimize the surface area for attacks by disabling unused services and endpoints. | Test-SR-INT4 |
| CulR-CR1 | English support | Test-UH-PI1 |
| CulR-CR3 | Support left-to-right (LTR) reading flow | Test-UH-PI1 |

5 Unit Test Description

This section outlines unit tests for verifying the core modules and components of morph. The overall philosophy for test case selection is based on achieving maximum coverage of critical functionalities while prioritizing areas of high complexity or risk.

5.1 Unit Testing Scope

The unit testing scope focuses on core components of morph while excluding certain modules due to priority, ownership, or reliability considerations.

5.1.1 In-Scope Modules

  • Text editor core components including buffer management and file operations
  • Planning interface components for user interactions
  • Feature steering modules for SAE integration and activation caching
  • Version control subsystem for tracking document changes
  • Inference server request handling and response processing
  • User preference and configuration management
  • Security components including authentication and data protection

5.1.2 Out-of-Scope Modules

  • Third-party dependencies:
    • Base language models (Gemma 2, Llama 3.1) as they are externally maintained
  • Browser-specific implementations
  • Operating system specific functionalities
  • Cloud provider specific implementations

5.1.3 Testing Priority Rationale

  1. High Priority (P0)
  • Core text editing functionality
  • Feature steering components
  • Planning interface elements
  • Security-critical components
  2. Medium Priority (P1)
  • Version control features
  • Configuration management
  • Performance optimization components
  3. Low Priority (P2)
  • Extended platform support
  • Optional features
  • Cosmetic components

5.2 Tests for Functional Requirements

Omitted for now; refer to the Module Guide for more information.

5.3 Tests for Nonfunctional Requirements

5.3.1 Inference Engine

Test-IE1

  • Type: Functional, Dynamic, Automatic
  • Initial State: The Inference Engine module is fully implemented and operational.
  • Input/Condition: Submit a predefined request to the inference engine.
  • Output/Result: The Time-to-First-Token (TTFT) is within 200-500 milliseconds.
  • How test will be performed: A unit test will be created to send a request to the inference engine with a specific input. The test will measure the time elapsed from the moment the request is sent to the receipt of the first token from the inference engine. The test will assert that the TTFT falls within the specified range of 200-500 milliseconds.
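
The following is a minimal pytest sketch of this check, assuming an OpenAI-compatible streaming endpoint and the httpx client library; the URL, model id, and payload shape are placeholders, not the actual asteraceae configuration.

```python
# Sketch of Test-IE1 (assumptions: endpoint URL, model id, payload shape).
import time

import httpx  # assumed HTTP client for the test harness

INFERENCE_URL = "http://localhost:8000/v1/completions"  # hypothetical endpoint
TTFT_MIN_S, TTFT_MAX_S = 0.2, 0.5  # 200-500 ms range from PR-SLR1


def test_ttft_within_range():
    payload = {
        "model": "gemma-2",  # placeholder model id
        "prompt": "Write an opening line about the sea.",
        "max_tokens": 32,
        "stream": True,  # stream so the first chunk approximates the first token
    }
    start = time.perf_counter()
    with httpx.stream("POST", INFERENCE_URL, json=payload, timeout=10.0) as resp:
        resp.raise_for_status()
        first_chunk = next(resp.iter_lines())  # time to the first streamed chunk
        ttft = time.perf_counter() - start
    assert first_chunk, "no data received from the stream"
    assert TTFT_MIN_S <= ttft <= TTFT_MAX_S, f"TTFT {ttft*1000:.0f} ms out of range"
```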

Test-IE2

  • Type: Functional, Dynamic, Automatic
  • Initial State: The Inference Engine module supports batch processing.
  • Input/Condition: Send batched requests with a batch size of 4.
  • Output/Result: Achieve a throughput of approximately 300 tokens per second.
  • How test will be performed: The unit test will send multiple batched requests to the inference engine, each batch containing 4 requests. It will measure the total number of tokens processed per second during the test. The test will verify that the throughput meets the specified target by asserting that the tokens processed per second are approximately 300.
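
A sketch of the throughput measurement follows, assuming the same hypothetical endpoint and that responses report token counts in an OpenAI-style `usage` field; the ±20% tolerance band for "approximately 300 tokens/sec" is an assumption.

```python
# Sketch of Test-IE2 (assumptions: endpoint URL, model id, usage field, tolerance).
import asyncio
import time

import httpx

INFERENCE_URL = "http://localhost:8000/v1/completions"  # hypothetical endpoint
BATCH_SIZE = 4                # per PR-SLR2 and BATCH_SIZE in section 6.4
TARGET_TOKENS_PER_SEC = 300   # per SUGGESTION_TOKENS in section 6.4


async def _one_request(client: httpx.AsyncClient) -> int:
    resp = await client.post(INFERENCE_URL, json={
        "model": "gemma-2",  # placeholder model id
        "prompt": "Summarize the benefits of outlining before drafting.",
        "max_tokens": 256,
    }, timeout=60.0)
    resp.raise_for_status()
    # OpenAI-compatible responses report completion token counts under `usage`.
    return resp.json()["usage"]["completion_tokens"]


def test_batched_throughput():
    async def run_batch() -> tuple[int, float]:
        async with httpx.AsyncClient() as client:
            start = time.perf_counter()
            counts = await asyncio.gather(
                *[_one_request(client) for _ in range(BATCH_SIZE)])
            return sum(counts), time.perf_counter() - start

    total_tokens, elapsed = asyncio.run(run_batch())
    throughput = total_tokens / elapsed
    # "Approximately 300 tokens/sec": allow a ±20% band (assumption).
    assert throughput >= TARGET_TOKENS_PER_SEC * 0.8, f"{throughput:.0f} tok/s too low"
```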

Test-IE3

  • Type: Functional, Dynamic, Automatic
  • Initial State: The Harmful Content Filter within the Inference Engine is implemented.
  • Input/Condition: Provide inputs that are likely to generate harmful or inappropriate content.
  • Output/Result: The inference engine outputs are free of harmful or inappropriate content, as judged by keyword and phrase blacklists together with regular expressions.
  • How test will be performed: The unit test will feed the inference engine with inputs known to potentially trigger harmful content. It will analyze the outputs to ensure that no harmful or inappropriate content is present, confirming that the engine’s safety mechanisms effectively filter out undesirable content.
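
A minimal sketch of the blacklist check is shown below; the blacklist entries and the `generate` helper are placeholders standing in for the engine's real filter and API client.

```python
# Sketch of Test-IE3 (assumptions: blacklist entries and `generate` helper
# are placeholders; the real filter lives in the inference engine).
import re

BLACKLIST_KEYWORDS = {"examplebadword"}  # placeholder keyword blacklist
BLACKLIST_PATTERNS = [re.compile(r"(?i)step[- ]by[- ]step weapon")]  # placeholder


def contains_harmful_content(text: str) -> bool:
    """Flag text that matches the keyword or regex blacklists."""
    lowered = text.lower()
    if any(keyword in lowered for keyword in BLACKLIST_KEYWORDS):
        return True
    return any(pattern.search(text) for pattern in BLACKLIST_PATTERNS)


def generate(prompt: str) -> str:
    """Placeholder for a real call to the inference engine's API."""
    return "A restrained, unobjectionable description."


def test_outputs_pass_harmful_content_filter():
    risky_prompts = ["Describe a violent scene in graphic detail."]
    for prompt in risky_prompts:
        assert not contains_harmful_content(generate(prompt))
```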

5.3.2 User Interface

Test-UI1

  • Type: Functional, Dynamic, Automatic
  • Initial State: The UI module is fully implemented and operational.
  • Input/Condition: Simulate rapid text input and editing actions by the user.
  • Output/Result: The UI responds without noticeable delays, ensuring a smooth user experience.
  • How test will be performed: The unit test will automate rapid typing and editing actions within the text editor component of the UI. It will measure the response time of the UI to these actions, asserting that the response time is within acceptable thresholds of less than 50 milliseconds per action, ensuring high responsiveness during intensive user interactions.
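
One way to automate this is sketched below with Playwright's Python API; the app URL and editor selector are assumptions, and per-keystroke round-trip time is used as a rough proxy for UI responsiveness.

```python
# Sketch of Test-UI1 (assumptions: app URL, editor test id, and that
# keystroke round-trip time approximates UI responsiveness).
import time

from playwright.sync_api import sync_playwright

APP_URL = "http://localhost:3000"          # hypothetical dev server
EDITOR_SELECTOR = "[data-testid=editor]"   # hypothetical selector
PER_ACTION_BUDGET_MS = 50


def test_rapid_typing_stays_responsive():
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(APP_URL)
        editor = page.locator(EDITOR_SELECTOR)
        editor.click()
        for char in "rapidtypingsimulation":
            start = time.perf_counter()
            editor.press(char)  # dispatch one keystroke to the editor
            elapsed_ms = (time.perf_counter() - start) * 1000
            assert elapsed_ms < PER_ACTION_BUDGET_MS, (
                f"keystroke took {elapsed_ms:.1f} ms")
        browser.close()
```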

Test-UI2

  • Type: Functional, Dynamic, Automatic
  • Initial State: The notification system within the UI is implemented.
  • Input/Condition: Simulate an inflight request failure in the application.
  • Output/Result: A notification toast is displayed to the user informing about the request failure.
  • How test will be performed: The unit test will mock a failure in an inflight request by triggering an error condition in the request handling module. It will then verify that the UI displays a notification toast with a clear message informing the user of the failure and suggesting a retry or troubleshooting action.
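
The failure can be mocked at the network layer, as in the sketch below; the request URL pattern, the "Suggest" control, and the assumption that toasts carry ARIA role "status" are all hypothetical.

```python
# Sketch of Test-UI2 (assumptions: app URL, request URL pattern, "Suggest"
# control, and that the toast is exposed with ARIA role "status").
from playwright.sync_api import sync_playwright

APP_URL = "http://localhost:3000"     # hypothetical dev server
SUGGESTION_API = "**/v1/completions"  # hypothetical request pattern


def test_toast_shown_when_inflight_request_fails():
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        # Abort every suggestion request to mock an in-flight failure.
        page.route(SUGGESTION_API, lambda route: route.abort())
        page.goto(APP_URL)
        page.get_by_role("button", name="Suggest").click()  # hypothetical control
        toast = page.get_by_role("status")
        toast.wait_for(state="visible", timeout=5000)
        assert "fail" in toast.inner_text().lower()
        browser.close()
```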

5.3.3 Deployment Management

Test-DM1

  • Type: Functional, Dynamic, Automatic
  • Initial State: The deployment strategy is configured with recreate capabilities.
  • Input/Condition: Simulate a failure of a node or replica in the deployment environment.
  • Output/Result: The deployment is automatically recreated, maintaining application availability.
  • How test will be performed: The unit test will simulate the failure of a node or pod within the Kubernetes cluster by programmatically deleting or stopping it. It will verify that the deployment controller automatically recreates the failed components and that the application remains available during the process, ensuring robustness and fault tolerance in the deployment strategy.
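
A sketch using the official `kubernetes` Python client follows; the namespace, label selector, and timing budget are assumptions about the cluster, not the actual deployment configuration.

```python
# Sketch of Test-DM1 (assumptions: namespace, pod label, timing budget).
import time

from kubernetes import client, config

NAMESPACE = "morph"                # hypothetical namespace
LABEL_SELECTOR = "app=asteraceae"  # hypothetical pod label


def test_deployment_recreates_failed_pod():
    config.load_kube_config()  # assumes local kubeconfig access to the cluster
    v1 = client.CoreV1Api()
    pods = v1.list_namespaced_pod(NAMESPACE, label_selector=LABEL_SELECTOR).items
    assert pods, "expected at least one running pod"
    victim = pods[0].metadata.name
    v1.delete_namespaced_pod(victim, NAMESPACE)  # simulate a pod failure
    # Poll until the controller brings a replacement pod to Running state.
    deadline = time.time() + 120
    while time.time() < deadline:
        current = v1.list_namespaced_pod(
            NAMESPACE, label_selector=LABEL_SELECTOR).items
        running = [p for p in current
                   if p.metadata.name != victim and p.status.phase == "Running"]
        if running:
            return
        time.sleep(2)
    raise AssertionError("no replacement pod reached Running state in time")
```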

5.3.4 Suggestion Processing

Test-SP1

  • Type: Functional, Dynamic, Automatic
  • Initial State: The suggestion processing module supports asynchronous operations.
  • Input/Condition: Submit multiple suggestion requests simultaneously.
  • Output/Result: All suggestions are processed correctly without errors or significant delays.
  • How test will be performed: The unit test will concurrently submit multiple suggestion requests to the processing module using asynchronous calls. It will monitor the processing of each request to ensure they are handled independently and efficiently. The test will assert that all suggestions are returned correctly and within acceptable time frames, confirming the system’s capacity to handle concurrent requests.
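
A minimal concurrency sketch is shown below, assuming the same hypothetical endpoint and an OpenAI-style response shape.

```python
# Sketch of Test-SP1 (assumptions: endpoint URL, model id, response shape).
import asyncio

import httpx

SUGGESTION_URL = "http://localhost:8000/v1/completions"  # hypothetical endpoint


async def _suggest(client: httpx.AsyncClient, prompt: str) -> str:
    resp = await client.post(SUGGESTION_URL, json={
        "model": "gemma-2",  # placeholder model id
        "prompt": prompt,
        "max_tokens": 64,
    }, timeout=30.0)
    resp.raise_for_status()
    return resp.json()["choices"][0]["text"]


def test_concurrent_suggestions_complete():
    prompts = [f"Suggest a next sentence for draft {i}." for i in range(8)]

    async def run() -> list[str]:
        async with httpx.AsyncClient() as client:
            # Fire all requests concurrently; each should complete independently.
            return await asyncio.gather(*[_suggest(client, p) for p in prompts])

    results = asyncio.run(run())
    assert len(results) == len(prompts)
    assert all(isinstance(r, str) and r for r in results)
```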

5.3.5 Autoscaling Mechanism

Test-AM1

  • Type: Functional, Dynamic, Automatic
  • Initial State: The autoscaling configuration is implemented in the deployment environment.
  • Input/Condition: Vary the load on the inference server to simulate high and low traffic conditions.
  • Output/Result: The inference server scales up during high traffic and scales down to zero during low traffic.
  • How test will be performed: The unit test will programmatically generate varying loads on the inference server by simulating user requests at different rates. It will monitor the number of active server instances and verify that the server scales up when request rates exceed a defined threshold, and scales down to zero after a period of inactivity (e.g., 5 minutes without traffic), confirming that autoscaling optimizes resource usage as configured.
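
The sketch below illustrates one shape this test could take, assuming a Kubernetes deployment reachable via the `kubernetes` Python client; the deployment name, idle window, and crude sequential load loop (a real run might use a load generator such as k6) are all assumptions.

```python
# Sketch of Test-AM1 (assumptions: deployment/namespace names, idle window,
# and a simplified load burst in place of a real load generator like k6).
import time

import httpx
from kubernetes import client, config

NAMESPACE = "morph"        # hypothetical namespace
DEPLOYMENT = "asteraceae"  # hypothetical deployment name
INFERENCE_URL = "http://localhost:8000/v1/completions"  # hypothetical endpoint


def replica_count() -> int:
    apps = client.AppsV1Api()
    status = apps.read_namespaced_deployment(DEPLOYMENT, NAMESPACE).status
    return status.ready_replicas or 0


def test_scale_up_then_scale_to_zero():
    config.load_kube_config()
    # A burst of traffic should trigger a scale-up.
    for _ in range(50):
        httpx.post(INFERENCE_URL, json={"model": "gemma-2", "prompt": "hi",
                                        "max_tokens": 8}, timeout=60.0)
    assert replica_count() >= 1
    # After the assumed 5-minute idle window, replicas should reach zero.
    time.sleep(6 * 60)
    assert replica_count() == 0
```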

5.3.6 Model Integration

Test-MI1

  • Type: Functional, Dynamic, Automatic
  • Initial State: The application is prepared to support multiple language model architectures.
  • Input/Condition: Integrate an alternative language model architecture into the system.
  • Output/Result: The application functions correctly with the new language model without errors.
  • How test will be performed: The unit test will replace the current language model with an alternative one, such as integrating a new SAE model. It will run the existing unit tests and check for compatibility issues or errors, ensuring that the application remains stable and functional with the new model.

5.3.7 Platform Compatibility

Test-PC1

  • Type: Functional, Dynamic, Automatic
  • Initial State: The application build is configured for different operating systems.
  • Input/Condition: Build and run the application on Windows, macOS, and Linux environments.
  • Output/Result: The application installs and runs successfully without error on all supported platforms.
  • How test will be performed: The unit test will automate the build process for the application on different operating systems using cross-platform build tools. It will then execute automated tests to ensure that the application functions correctly on each platform, ensuring adaptability and support for different distribution platforms.

5.3.8 Security Components

Test-SC1

  • Type: Functional, Static, Automatic
  • Initial State: Security components and dependencies are up-to-date.
  • Input/Condition: Check for known vulnerabilities in the current dependencies.
  • Output/Result: No critical vulnerabilities are found; security updates are scheduled and performed.
  • How test will be performed: The unit test will use tools like npm audit or Snyk to scan the project’s dependencies for known security vulnerabilities. It will generate a report of any issues found and verify that updates are applied promptly, maintaining the security integrity of the application.
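
A sketch of the dependency scan follows; it shells out to `npm audit --json` and reads the severity summary that npm 7+ reports under `metadata.vulnerabilities`.

```python
# Sketch of Test-SC1: fail the build when critical advisories are reported.
import json
import subprocess


def test_no_critical_vulnerabilities():
    # `npm audit --json` emits a machine-readable report; it exits nonzero
    # when issues exist, so we parse stdout rather than check the return code.
    result = subprocess.run(
        ["npm", "audit", "--json"],
        capture_output=True, text=True,
    )
    report = json.loads(result.stdout)
    severities = report.get("metadata", {}).get("vulnerabilities", {})
    critical = severities.get("critical", 0)
    assert critical == 0, f"{critical} critical vulnerabilities found"
```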

Test-SC2

  • Type: Functional, Static, Automatic
  • Initial State: RBAC policies are defined and implemented.
  • Input/Condition: Attempt to access secrets and resources with different user roles.
  • Output/Result: Access is granted only to roles explicitly authorized in the RBAC policy; unauthorized roles are denied access and receive a clear error indicating insufficient permissions.
  • How test will be performed: The unit test will simulate users with various roles attempting to access sensitive resources like secrets or configuration files. It will verify that only authorized roles have access, and unauthorized attempts are blocked, ensuring that secrets are protected with proper access controls.
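
One way to probe the RBAC rules is via `kubectl auth can-i`, as sketched below; the namespace and service-account names are assumptions about the deployment.

```python
# Sketch of Test-SC2 (assumptions: namespace and service-account names).
import subprocess

NAMESPACE = "morph"  # hypothetical namespace


def can_read_secrets(service_account: str) -> bool:
    # `kubectl auth can-i` prints "yes" or "no" for the impersonated identity.
    result = subprocess.run(
        ["kubectl", "auth", "can-i", "get", "secrets",
         "--as", f"system:serviceaccount:{NAMESPACE}:{service_account}",
         "-n", NAMESPACE],
        capture_output=True, text=True,
    )
    return result.stdout.strip() == "yes"


def test_secret_access_follows_rbac_policy():
    assert can_read_secrets("inference-server")  # authorized role (assumed)
    assert not can_read_secrets("frontend")      # unauthorized role (assumed)
```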

5.3.9 Documentation Accessibility

Test-DA1

  • Type: Functional, Static, Manual
  • Initial State: Documentation is written and linked within the application.
  • Input/Condition: Navigate through the application to locate and access the documentation.
  • Output/Result: Users can easily find and access relevant documentation from the interface.
  • How test will be performed: A manual test will be conducted where the tester navigates the application’s UI to locate links to usage manuals and technical documentation. The tester will verify that the links are prominently placed, clearly labeled, and lead to the correct documentation pages. This will facilitate user understanding through accessible documentation.

5.3.10 Release and Deployment

Test-RD1

  • Type: Functional, Dynamic, Automatic
  • Initial State: The GitHub CD workflow is set up for the project.
  • Input/Condition: Commit new code changes and push to the repository.
  • Output/Result: The continuous deployment pipeline is triggered, and the application is built and deployed automatically without error.
  • How test will be performed: The unit test will simulate code changes by committing to a test branch. It will verify that the GitHub CD workflow is triggered, running automated tests and deploying the application if all tests pass. This will ensure that the release cycle utilizes the current CI/CD workflow effectively.
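
The pipeline check could be scripted against the GitHub CLI, as in this sketch; the workflow file name and polling budget are assumptions, and the test presumes a commit was just pushed to the test branch.

```python
# Sketch of Test-RD1 (assumptions: workflow file name, polling budget, and
# that a commit has just been pushed to the test branch).
import json
import subprocess
import time


def latest_run(workflow: str) -> dict:
    # `gh run list --json` returns the most recent runs as JSON.
    out = subprocess.run(
        ["gh", "run", "list", "--workflow", workflow,
         "--limit", "1", "--json", "status,conclusion"],
        capture_output=True, text=True, check=True,
    ).stdout
    runs = json.loads(out)
    return runs[0] if runs else {}


def test_cd_pipeline_deploys_on_push():
    deadline = time.time() + 15 * 60
    while time.time() < deadline:
        run = latest_run("deploy.yml")  # hypothetical workflow file
        if run.get("status") == "completed":
            assert run["conclusion"] == "success"
            return
        time.sleep(30)
    raise AssertionError("deployment workflow did not complete in time")
```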

5.4 Traceability Between Test Cases and Modules

Test Case ID | Description | Module
Test-FR-P1 | Verify planning suggestions appear within 10 seconds with valid prompt input. | Planning Suggestions
Test-FR-P2 | Test error message response to unintelligible prompt input in planning mode. | Planning Suggestions
Test-FR-P3 | Validate system’s handling of large prompt inputs, providing condensed suggestions or a prompt-length warning. | Planning Suggestions
Test-FR-S1 | Verify text suggestions in the style of a specific author when tone steering is applied. | Steering Text Generation
Test-FR-S2 | Test model adaptation to user-uploaded writing sample for customized suggestions within 30 seconds. | Steering Text Generation
Test-FR-F1 | Validate real-time feedback update in LTR panel based on user’s tone and phrasing adjustments. | LTR Feedback Panel
Test-FR-TG1 | Ensure future text generations reflect user-set preferences for tone, style, voice, and formality. | User Preferences for Text Generation
Test-FR-UP1 | Verify the saving of a user profile with specific writing style configurations, such as creative writing. | User Profile Configuration
Test-FR-UP2 | Test the creation and retention of multiple user profiles without overwriting. | User Profile Configuration
Test-FR-RM1 | Verify non-linear navigation in the tree view, showing key points and sections in the document. | Revision Management
Test-FR-VC1 | Validate navigation to a previous version in version control with viewing or restoring options. | Version Control
Test-FR-VC2 | Test reversion to a previous version in version control, making it the current document state. | Version Control
Test-FR-PT1 | Test real-time word count tracking with a user-set writing goal. | Writing Goals and Progress Tracking
Test-FR-PT2 | Validate tone consistency monitoring, with suggestions when tone deviates from user’s set goal. | Writing Goals and Progress Tracking
Test-FR-E1 | Verify successful export of a completed document in .pdf format, preserving formatting. | Document Export Functionality
Test-FR-E2 | Test plain text (.txt) export functionality, ensuring only raw content is saved. | Document Export Functionality
Test-FR-VT1 | Validate theme customization, specifically switching from default light mode to dark mode. | Visual Theme Customization
Test-LF-A1 | Verify unified, non-intrusive, and uncluttered visual design. | User Interface
Test-LF-A2 | Verify standardized typography and color palettes are consistently applied. | User Interface
Test-LF-S1 | Validate minimalist design with a monotonic color palette. | User Interface
Test-LF-S2 | Test responsiveness across devices and orientations. | User Interface
Test-LF-S3 | Verify contrast of interactive elements to ensure visibility. | User Interface
Test-LF-S4 | Assess smooth transitions and animations for an intuitive experience. | User Interface
Test-LF-S5 | Verify visual feedback for user interactions. | User Interface
Test-UH-EOU1 | Evaluate session history feature accuracy in recording activities. | User Interface
Test-UH-EOU2 | Test interactive review and manual acceptance of suggestions. | Suggestion Processing
Test-UH-EOU3 | Assess the planning interface for effective user interaction. | User Interface
Test-UH-PI1 | Verify multilingual support functionality. | User Interface
Test-UH-PI2 | Test theme customization options for light and dark modes. | User Interface
Test-UH-L1 | Measure onboarding time for new users to begin creating or editing content. | User Interface
Test-UH-UP1 | Evaluate the clarity of language used in the UI. | User Interface
Test-UH-A1 | Test text resizing functionality for accessibility. | User Interface
Test-UH-A2 | Verify keyboard navigation accessibility for interactive components. | User Interface
Test-UH-A3 | Implement and test ARIA attributes for screen reader compatibility. | User Interface
Test-PR-SLR1 | Measure TTFT (Time-to-First-Token) with a 95th percentile below 500 ms during requests. | Inference Engine
Test-PR-SLR2 | Evaluate throughput of the inference server with batch processing capabilities. | Inference Engine
Test-PR-SCR1 | Validate that suggestions are non-harmful and appropriate. | Inference Engine
Test-PR-SCR2 | Ensure interface content contains no harmful or NSFW elements. | User Interface
Test-PR-PAR1 | Test the accuracy of generated text matching user steering parameters. | Suggestion Processing
Test-PR-RFR1 | Verify notification toast for in-flight request failures. | User Interface
Test-PR-RFR2 | Test deployment strategy to ensure fault tolerance and application availability. | Deployment Management
Test-PR-CR1 | Assess asynchronous processing of multiple user requests without significant delay. | Suggestion Processing
Test-PR-CR2 | Verify input responsiveness during rapid text entry and editing. | User Interface
Test-PR-SER1 | Test autoscaling mechanism of inference server during varying traffic loads. | Autoscaling Mechanism
Test-PR-LR1 | Evaluate integration with different model architectures. | Model Integration
Test-PR-LR2 | Test packaging and execution across different operating systems. | Platform Compatibility
Test-SR-INT1 | Ensure all communications are encrypted using HTTPS. | Security Components
Test-SR-INT2 | Implement DNS security measures to secure queries and responses. | Security Components
Test-SR-INT3 | Validate Content Security Policies (CSP) to prevent XSS attacks. | Security Components
Test-SR-INT4 | Test session security with JWT and short-lived tokens. | Security Components
Test-SR-P1 | Verify that privacy compliance is maintained with no collection of personal data. | Security Components
Test-OER-MR1 | Schedule and verify that security updates are performed regularly. | Security Components
Test-OER-MR2 | Ensure new feature integrations pass existing tests without regression. | Release and Deployment
Test-OER-SR1 | Implement a user feedback loop to ensure user feedback is recorded and accessible. | User Interface
Test-CompR-LR1 | Verify compliance with Canadian copyright laws for generated content. | Documentation Accessibility
Test-CompR-LR2 | Ensure SOC 2 compliance for security standards of the inference server. | Security Components
Test-CompR-LR3 | Obtain user consent before using content for inference purposes. | Security Components
Test-CompR-SCR1 | Verify that client-server communications adhere to HTTP/1.1 standards. | Release and Deployment

6 Appendix

6.1 Usability Survey Questions

  1. On a scale of 1 to 5, how would you rate the overall visual design of morph?
  2. Did you find the interface to be uncluttered and non-intrusive? Please explain.
  3. How easy was it to navigate through the application?
  4. Were the animations and transitions smooth and helpful in guiding you?
  5. Did you notice any issues with theme customization (light/dark mode)?
  6. How intuitive was the planning interface for organizing your writing?
  7. Were instructions and feedback clear and understandable?
  8. Did you encounter any difficulties when resizing text or using keyboard navigation?
  9. How satisfied are you with the responsiveness of the application?
  10. Do you have any suggestions for improving the usability of morph?

6.2 Test Input Table

Test Case ID | Input Description | Input Type | Sample Input | Notes
Test-FR-P1 | Prompt entered into planning interface | Text | environmental sustainability | Used to test suggestion generation
Test-FR-P2 | Unintelligible prompt for robustness check | Text | asdh123!@# | Should trigger input refinement message
Test-FR-P3 | Large block of text for planning | Text | A 500+ word passage on climate change | Sourced externally and uploaded for testing
Test-FR-S1 | Prompt with “Didion” style selection | Text + Style | Prompt: reflection on modern life; Style: Didion | See Checklist 6.3.1 for style validation
Test-FR-S2 | Writing sample upload for personalization | Text file | Excerpt from Joan Didion’s essay, e.g., The White Album (from published book or verified online source) | Uploaded sample used for tone adaptation
Test-FR-F1 | Feedback on generated suggestion | UI Interaction | Tone adjustment, alternate phrasing | Verifies real-time feedback loop
Test-FR-TG1 | User-defined preferences for generation | Settings | Tone: formal, Style: narrative, Passive voice, High formality | Applied across generations
Test-FR-UP1 | Save creative writing profile | Settings | Informal tone, Narrative style | Verifies profile retention
Test-FR-UP2 | Create and save academic writing profile | Settings | Formal tone, Analytical style | Ensures multiple profiles can be saved
Test-FR-RM1 | Navigate document via tree view | UI Interaction | Enable tree view navigation | Visualizes section hierarchy
Test-FR-VC1 | View previous version of document | UI Interaction | Version history menu | Shows restore/diff options
Test-FR-VC2 | Revert to earlier document version | UI Interaction | Select version to revert | Current document updates to selected version
Test-FR-PT1 | Set word count goal for writing | Numeric Input | Goal: 1500 words | Progress meter should reflect count
Test-FR-PT2 | Set tone consistency goal | Style Setting | Formal tone goal | Alerts when tone deviates
Test-FR-E1 | Export document to PDF | UI Interaction | Select “Export as PDF” | Output must retain formatting
Test-FR-E2 | Export document to plain text | UI Interaction | Select “Export as Plain Text” | Output should be unformatted text only
Test-FR-VT1 | Switch from light to dark theme | UI Interaction | Select “Dark Mode” in preferences | Dark mode applies consistently across UI

6.3 Output Evaluation Checklists

6.3.1 Didion’s Writing Style Validation

To mark this test as a pass, at least 4 of the 5 criteria must be met:

  • Uses first-person narrative (e.g., “I think”, “I remember”)
  • Reflective or meditative tone with subtle emotional nuance
  • Unconventional sentence structures or rhythms (e.g., sentence fragments, loose clauses)
  • Descriptive and personal – includes vivid imagery or self-observation
  • Stylistic consistency across 3+ generated paragraphs

6.3.2 Formal Tone Generation

  • Avoids contractions (e.g., “do not” vs. “don’t”)
  • Uses domain-appropriate vocabulary
  • Maintains neutral and objective tone
  • Consistent sentence structure without slang or colloquialisms

6.4 Symbolic Parameters

The following symbolic constants are used throughout this document to improve clarity, maintainability, and flexibility. These constants replace hardcoded values in both functional and nonfunctional test cases.

Variable Name | Value | Description
MAX_DISPLAY_TIME | 10 seconds | Maximum time allowed for planning suggestions to appear
LENGTH_RESPONSE_TIME | 10 seconds | Time limit for response to long input prompts
LARGE_INPUT_SIZE | 500 words | Threshold for input size triggering condensation or warning
STYLE_MATCH_TIME | 30 seconds | Time to generate personalized suggestions from a user sample
WORD_GOAL | 1500 words | Word count goal for writing progress tracking
ONBOARDING_TIME | 10 minutes | Maximum time expected for new users to begin content creation
SUGGESTION_TOKENS | 300 tokens/sec | Desired throughput of token generation by inference server
TTFT_THRESHOLD | 500 ms | 95th percentile threshold for Time To First Token
BATCH_SIZE | 4 | Number of concurrent requests in throughput testing
USABILITY_TEST_PARTICIPANTS | 10 users | Number of participants for usability testing
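
As a minimal illustration, these parameters could live in a single shared test module so every test imports one source of truth; the module name and the units-as-suffix naming convention below are assumptions, not part of the plan.

```python
# constants.py (hypothetical shared test module mirroring section 6.4)
MAX_DISPLAY_TIME_S = 10            # planning suggestions must appear within this
LENGTH_RESPONSE_TIME_S = 10        # response limit for long input prompts
LARGE_INPUT_SIZE_WORDS = 500       # threshold triggering condensation/warning
STYLE_MATCH_TIME_S = 30            # personalized-suggestion generation budget
WORD_GOAL = 1500                   # progress-tracking word count goal
ONBOARDING_TIME_S = 10 * 60        # new users should start writing within 10 min
SUGGESTION_TOKENS_PER_S = 300      # target inference throughput
TTFT_THRESHOLD_MS = 500            # 95th percentile TTFT bound
BATCH_SIZE = 4                     # concurrent requests in throughput testing
USABILITY_TEST_PARTICIPANTS = 10   # usability study sample size
```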

6.5 Reflection

  1. One of the most successful aspects of writing this deliverable was refining and polishing the overall structure of the document. I focused on ensuring that the contents were clearly articulated, logically ordered, and stylistically consistent with our other project documents such as the SRS and Hazard Analysis. This alignment helped maintain a unified tone and made the V&V plan easier to follow. I also contributed to ensuring that terminology and references across different sections were cross-checked and synchronized, which improved clarity and cohesion.
  2. The main challenge I experienced was the sheer volume of writing involved. Due to the document’s length and depth, it was easy to lose track of structural consistency and test coverage balance across sections. It was also difficult to find the right balance between technical specificity and readability for a diverse audience that includes supervisors, peers, and external reviewers. To address this, I used collaborative writing tools and relied on peer feedback sessions to prioritize clarity and completeness. I also coordinated with team members to divide responsibilities and reduce redundancy, which helped streamline the workload and improve section quality.
  3. To complete the V&V process for morph, our team needs skills in dynamic testing to validate responsiveness and functional behavior under real user scenarios. Static analysis is also important for catching performance or security issues early, particularly in the backend. Accessibility testing is essential to ensure the UI meets WCAG standards and is fully keyboard-navigable. Familiarity with CI/CD pipelines is needed to automate testing and enforce quality gates. Finally, proficiency with tools like Jest, Playwright, Postman, axe-core, and Ruff is crucial for implementing tests efficiently across both the frontend and backend.
  4. Aaron will focus on dynamic testing by applying tools like k6 directly to components during development, allowing for real-time feedback and iteration. Waleed will strengthen his static analysis skills through tutorials on tools like Ruff and Bandit, providing structured learning for backend quality. Nebras chose to enroll in a web accessibility course to gain a comprehensive foundation before conducting audits. Lucas is using GitHub documentation and example workflows to build and tune the CI/CD pipeline directly in our repo. Across the team, we’ll combine reading tool documentation with pair programming to share knowledge and ensure consistent testing practices.

  1. While working on the V&V plan, the process of writing the non-functional requirements section went particularly well. My familiarity with usability, performance, and accessibility testing facilitated a smooth drafting process. Additionally, collaborative efforts from Aaron, Waleed, and Lucas enriched the document by incorporating their insights and ensuring that our approach was comprehensive and aligned with project goals.

  2. A significant challenge during the development of the V&V plan was aligning the diverse verification and validation approaches across different sections, which occasionally led to overlapping content or inconsistencies in test case formats. Managing these coordination issues was essential to maintain a coherent document. To resolve this, we held structured review sessions and used version control practices to ensure consistency and uniformity across sections. Regular team meetings and clearly documented responsibilities facilitated alignment, enabling the team to stay on the same page and maintain a clear direction throughout the process.

  3. To successfully complete the verification and validation of our project, the team will need to acquire a variety of knowledge and skills. For Aaron, expertise in dynamic testing knowledge is essential to ensure that the interfaces and design components meet quality standards through thorough functional and performance testing. Waleed will need static testing knowledge, focusing on tools for code analysis to detect inefficiencies or vulnerabilities in the ML inference server, ensuring robust backend support. Lucas requires an understanding of specialized tool usage such as for model validation and performance monitoring to effectively contribute to SAE interventions. I would need to acquire front-end testing and accessibility knowledge to build and validate the web-based editor’s functionality, ensuring it is intuitive and inclusive for users.

  4. To develop these essential knowledge areas, we have outlined multiple approaches for each skill and explained why each member has chosen their approach. For Aaron’s dynamic testing knowledge, he could either enroll in comprehensive courses focused on functional and performance testing or practice by applying dynamic tests to specific interface components. Aaron prefers the second approach as it provides immediate, practical application to his leadership in design work, helping him refine testing skills relevant to our project’s needs.


    Waleed’s static testing knowledge can be strengthened through tutorials on static code analysis tools or by conducting hands-on analysis of past codebases and open-source projects. Waleed has chosen the tutorial approach for a structured learning path that will directly aid in securing the ML inference server’s integrity, as he believes it will enable him to build foundational expertise efficiently.


    Lucas can gain proficiency in tool usage through workshops focusing on model validation and performance monitoring or by experimenting with these tools on controlled, smaller-scale projects. Lucas plans to attend workshops as they provide an opportunity to learn from experts and peers, making the process more collaborative and efficient. This is crucial for his role in SAE interventions.


    For my part, to develop front-end testing and accessibility knowledge, I could either enroll in courses that cover web accessibility best practices or work on self-directed projects where I conduct accessibility audits. I will pursue the course option because it offers a systematic approach to learning, ensuring I understand the principles thoroughly and can apply them effectively when developing and validating the web-based editor.


  1. Writing this deliverable gave me a better understanding of the project’s core features and functionality. It was particularly helpful in clarifying how specific requirements, like model steering and version control, fit within our overall vision for morph. Focusing on functional requirements also allowed me to refine my grasp of the system’s design goals and limitations, helping ensure that our planned features align with user needs and our technical scope. I also appreciated that the process provided room for collaboration with the team and supervisor, leading to more well-defined requirements.

  2. One of the main pain points I encountered was defining the functional scope accurately. There were initial uncertainties about how deeply certain features would be integrated, particularly technical aspects like the extent of model customization and maintaining performance in offline mode. It was challenging to balance the technical ambitions with practical constraints for the project’s first release. To overcome this, I initiated discussions with the team and supervisor to align our understanding and verify our priorities. These conversations clarified the project goals and identified where we could use existing frameworks versus areas that needed custom development.

  3. Our team will need a variety of skills to carry out verification and validation:

  • Dynamic Testing Knowledge (All Team Members): Understanding how to perform functional and performance testing, especially on features like model responsiveness (TTFT) and interface elements like the LTR panel.
  • Static Testing Knowledge (Aaron and Waleed): Knowledge of code analysis tools to detect vulnerabilities or inefficiencies, which will be critical for ensuring robust and secure code.
  • Front-End Testing and Accessibility (Nebras): Skills in front-end testing, including accessibility evaluations, to validate that the interface is intuitive and accessible for diverse users.
  4. Dynamic Testing Knowledge:
  • Approach 1: Enroll in online courses focused on software testing principles and tools.
  • Approach 2: Run dynamic tests on smaller sub-projects or components of morph to gain hands-on experience. All team members will apply Approach 2 by testing specific features in isolation as we build them.

Static Testing Knowledge:

  • Approach 1: Take specialized tutorials on static code analysis tools.
  • Approach 2: Practice static testing techniques on open-source projects or our previous codebases to identify common issues. Aaron and Waleed will adopt Approach 1, focusing on tutorials for code analysis. This approach will help build foundational knowledge that can later be applied during morph’s development for continuous quality assurance.

Front-End Testing and Accessibility:

  • Approach 1: Complete courses or certifications focused on web accessibility and front-end testing.
  • Approach 2: Participate in accessibility hackathons to practice identifying and solving accessibility issues in real-world projects. Nebras will pursue Approach 1 by taking a structured course in front-end accessibility. Given the importance of accessibility in our project, formal training will help Nebras systematically learn best practices and testing tools.

1. From the past examples and rubric, the expectations were easy to understand.
2. Some of my parts work as a summary of other partners' content, so I need to rearrange the relevant information after the other parts are completed.
3. The coding languages require us to unit test components with matching testing tools, such as Jest for JavaScript and pytest for Python. This builds on the testing knowledge I learned before.
4. Reading the websites that introduce the relevant testing frameworks, and walking through coding examples of the tests, are the two recommended approaches to acquiring this knowledge. The second approach is pursued by group members since the examples can be used as templates to construct our tests and provide detailed information about how to construct a unit test.

6.6 Revision

Date | Version | Notes
Nov. 4, 2024 | 0.0 | Initial VnV
Feb. 3, 2025 | 0.1 | Rev0
March 31, 2025 | 0.2 | Rename to morph for consistency
Apr. 3, 2025 | 0.4 | Full document revision and restructuring