
# Home Assistant MCP Server - Architecture Documentation

## Overview

This document describes the architecture and design decisions for the Home Assistant MCP Server, which bridges the Model Context Protocol with Home Assistant's REST API.

## System Architecture

```
┌─────────────────┐
│   MCP Client    │  (Claude Desktop, other MCP clients)
│   (LLM)         │
└────────┬────────┘
         │ MCP Protocol (stdio)
         │
┌────────▼────────┐
│   MCP Server    │
│   (This app)    │
├─────────────────┤
│ • Resources     │  Read-only data access via URIs
│ • Tools         │  Action execution with parameters
│ • Type Safety   │  Zod validation + TypeScript
└────────┬────────┘
         │ HTTP + Bearer Auth
         │
┌────────▼────────┐
│ Home Assistant  │
│   REST API      │
│  (Port 8123)    │
└─────────────────┘
```

## Component Architecture

### 1. Core Components

#### `index.ts` - MCP Server
**Responsibilities:**
- Initialize MCP server with capabilities
- Handle resource listing and reading
- Handle tool listing and execution
- Manage request/response lifecycle
- Error handling and formatting

**Key Features:**
- Stdio transport for universal client compatibility
- Comprehensive error handling with MCP error codes
- Input validation for all tool parameters
- Connection verification on startup

#### `ha-client.ts` - Home Assistant Client
**Responsibilities:**
- Encapsulate all Home Assistant REST API calls
- Manage HTTP communication with axios
- Handle authentication headers
- Format and parse API responses
- Provide type-safe method interfaces

**Key Features:**
- Singleton client instance with configured base URL
- Bearer token authentication
- Timeout configuration (30s default)
- Comprehensive error formatting
- Type-safe response parsing

#### `types.ts` - Type Definitions
**Responsibilities:**
- Define TypeScript interfaces for all data structures
- Document API response formats
- Ensure type safety across the codebase

**Key Features:**
- Complete coverage of HA API response types
- Nested interface definitions for complex objects
- JSDoc comments for documentation
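Here is a sketch of the kind of definition `types.ts` might hold. It models the state object Home Assistant returns from `GET /api/states/<entity_id>`, which is the `EntityState` mentioned in the data-flow example later in this document. The exact field list in the real file may differ. The `StateContext` name and the `isEntityState` guard are hypothetical. The guard shows one way to back a compile-time interface with a runtime check when parsing untyped JSON.

```typescript
/** Context attached to each state change (who or what caused it). */
interface StateContext {
  id: string;
  parent_id: string | null;
  user_id: string | null;
}

/** State object as returned by GET /api/states/<entity_id>. */
interface EntityState {
  entity_id: string;                   // e.g. "light.living_room"
  state: string;                       // always a string, even for numeric sensors
  attributes: Record<string, unknown>; // domain-specific, e.g. brightness
  last_changed: string;                // ISO 8601 timestamp
  last_updated: string;                // ISO 8601 timestamp
  context: StateContext;
}

/** Hypothetical runtime guard for untyped JSON coming back from the API. */
function isEntityState(value: unknown): value is EntityState {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.entity_id === 'string' &&
    typeof v.state === 'string' &&
    typeof v.attributes === 'object' &&
    v.attributes !== null &&
    typeof v.last_changed === 'string' &&
    typeof v.last_updated === 'string' &&
    typeof v.context === 'object' &&
    v.context !== null
  );
}
```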

### 2. MCP Resource Design

Resources provide read-only access to Home Assistant data through URI-based addressing.

#### Resource URI Scheme
```
ha://states              - All entity states
ha://states/{entity_id}  - Specific entity state
ha://config              - System configuration
ha://services            - Available services
ha://events              - Registered events
ha://components          - Loaded components
ha://error_log           - Error log entries
```
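The URI scheme above can be parsed into a small discriminated union, so the read handler can `switch` on the result instead of doing string matching in several places. This is a minimal sketch: `parseResourceUri`, `HaResource`, and the simplified entity_id regex are assumptions for illustration, not the server's actual code.

```typescript
const STATIC_RESOURCES = ['states', 'config', 'services', 'events', 'components', 'error_log'] as const;
type StaticResource = (typeof STATIC_RESOURCES)[number];

/** Parsed form of an ha:// URI. */
type HaResource =
  | { kind: 'static'; name: StaticResource } // ha://config, ha://states, ...
  | { kind: 'state'; entityId: string };     // ha://states/{entity_id}

// Simplified entity_id check: "<domain>.<object_id>" in lowercase letters, digits, underscores.
const ENTITY_ID = /^[a-z0-9_]+\.[a-z0-9_]+$/;

function parseResourceUri(uri: string): HaResource {
  const prefix = 'ha://';
  if (!uri.startsWith(prefix)) {
    throw new Error(`Unsupported URI scheme: ${uri}`);
  }
  const path = uri.slice(prefix.length);

  // Dynamic form: ha://states/{entity_id}
  if (path.startsWith('states/')) {
    const entityId = path.slice('states/'.length);
    if (!ENTITY_ID.test(entityId)) {
      throw new Error(`Invalid entity_id in URI: ${uri}`);
    }
    return { kind: 'state', entityId };
  }

  // Static forms: exact match against the known list
  const name = STATIC_RESOURCES.find((candidate) => candidate === path);
  if (name === undefined) {
    throw new Error(`Unknown resource: ${uri}`);
  }
  return { kind: 'static', name };
}
```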

#### Resource Implementation Pattern
1. **Declaration**: Resource metadata in `ListResourcesRequestSchema` handler
2. **Reading**: URI parsing and data retrieval in `ReadResourceRequestSchema` handler
3. **Response**: JSON or text content with appropriate MIME type

#### Design Rationale
- **Static URIs**: Predictable, well-known endpoints for core data
- **Dynamic URIs**: Entity-specific URIs follow `ha://states/{entity_id}` pattern
- **MIME Types**: `application/json` for structured data, `text/plain` for logs
- **Caching**: Not implemented (always fresh data from HA)
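The MIME-type rule can be kept in one place. The sketch below wraps fetched data in the `{ contents: [{ uri, mimeType, text }] }` shape that MCP uses for text resources. It assumes, for illustration, that the error log arrives as a plain string and everything else as parsed JSON. The helper name `toResourceResult` is hypothetical.

```typescript
/** One text entry of an MCP ReadResource result. */
interface TextResourceContents {
  uri: string;
  mimeType: string;
  text: string;
}

/**
 * Hypothetical helper: strings (e.g. the error log) are sent as text/plain,
 * everything else is pretty-printed as application/json.
 */
function toResourceResult(uri: string, data: unknown): { contents: TextResourceContents[] } {
  const mimeType = typeof data === 'string' ? 'text/plain' : 'application/json';
  const text = typeof data === 'string' ? data : JSON.stringify(data, null, 2);
  return { contents: [{ uri, mimeType, text }] };
}
```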

### 3. MCP Tool Design

Tools let LLMs execute actions and parameterized queries, including state-changing operations.

#### Tool Categories

**Device Control:**
- `call_service` - Universal service execution interface

**State Management:**
- `get_state` - Read specific entity state
- `set_state` - Create or update an entity's state representation in Home Assistant (does not control the underlying device)

**Event System:**
- `fire_event` - Trigger custom events

**Data Queries:**
- `get_history` - Historical state data
- `get_logbook` - Human-readable event logs
- `render_template` - Execute Jinja2 templates

**Media & Calendar:**
- `get_camera_image` - Camera snapshots
- `get_calendar_events` - Calendar data

#### Tool Schema Design

Each tool has a JSON Schema defining:
- **Required parameters**: Must be provided
- **Optional parameters**: Have defaults or are conditional
- **Type constraints**: String, number, boolean, object, array
- **Descriptions**: Clear, LLM-friendly explanations

Example:
```typescript
// Declaration returned by the ListToolsRequestSchema handler (descriptions elided)
const callServiceTool = {
  name: 'call_service',
  description: 'Call a Home Assistant service...',
  inputSchema: {
    type: 'object',
    properties: {
      domain: { type: 'string', description: '...' },
      service: { type: 'string', description: '...' },
      service_data: { type: 'object', description: '...' },
    },
    required: ['domain', 'service'],
  },
};
```

#### Tool Implementation Pattern
1. **Declaration**: Tool metadata and schema in `ListToolsRequestSchema` handler
2. **Execution**: Parameter extraction and validation in `CallToolRequestSchema` handler
3. **API Call**: Delegate to `HomeAssistantClient` method
4. **Response**: JSON-formatted result in text content
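Steps 2 and 4 of this pattern can be written as small pure functions, which keeps the request handler thin and easy to unit test. This is a sketch under assumptions. `extractCallServiceArgs` and `toToolResult` are hypothetical names. The real handler may throw an `McpError` with an invalid-params code rather than a plain `Error`. Step 3, the `HomeAssistantClient` call, is omitted. The result follows MCP's `{ content: [{ type: 'text', text }] }` tool-result shape.

```typescript
/** Arguments accepted by call_service (mirrors its JSON Schema). */
interface CallServiceArgs {
  domain: string;
  service: string;
  service_data?: Record<string, unknown>;
}

/** MCP tool result with a single text item. */
interface ToolResult {
  content: { type: 'text'; text: string }[];
}

/** Step 2: pull parameters out of the untyped `arguments` object and check them. */
function extractCallServiceArgs(args: Record<string, unknown> | undefined): CallServiceArgs {
  const domain = args?.domain;
  const service = args?.service;
  const serviceData = args?.service_data;
  if (typeof domain !== 'string' || domain === '') {
    throw new Error('call_service: "domain" must be a non-empty string');
  }
  if (typeof service !== 'string' || service === '') {
    throw new Error('call_service: "service" must be a non-empty string');
  }
  const isPlainObject =
    typeof serviceData === 'object' && serviceData !== null && !Array.isArray(serviceData);
  if (serviceData !== undefined && !isPlainObject) {
    throw new Error('call_service: "service_data" must be an object');
  }
  return {
    domain,
    service,
    // Only include service_data when the caller actually sent it
    ...(serviceData === undefined ? {} : { service_data: serviceData as Record<string, unknown> }),
  };
}

/** Step 4: wrap whatever the HA client returned as JSON text content. */
function toToolResult(result: unknown): ToolResult {
  return { content: [{ type: 'text', text: JSON.stringify(result, null, 2) }] };
}
```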

### 4. Authentication & Security

#### Authentication Flow
```
1. Server startup: Load HA_ACCESS_TOKEN from environment
2. Client initialization: Configure axios with Bearer token header
3. Every request: Axios automatically includes Authorization header
4. HA validation: Home Assistant validates token for each request
```
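The flow boils down to attaching the same `Authorization: Bearer <token>` header to every request. The sketch below builds that request explicitly, without axios, so the header and URL composition are visible. The object it returns can be passed to the global `fetch(url, init)` in Node.js 18+. `buildHaRequest` and `HaRequest` are hypothetical names. The server itself relies on a preconfigured axios instance instead.

```typescript
type HttpMethod = 'GET' | 'POST';

/** Request description compatible with fetch(url, init). */
interface HaRequest {
  url: string;
  init: { method: HttpMethod; headers: Record<string, string>; body?: string };
}

function buildHaRequest(
  baseUrl: string,
  token: string,
  path: string,
  method: HttpMethod = 'GET',
  body?: unknown,
): HaRequest {
  // Trim trailing slashes so "http://host:8123/" + "/api/..." has no "//"
  const url = baseUrl.replace(/\/+$/, '') + path;
  const init: HaRequest['init'] = {
    method,
    headers: {
      // Same header the configured axios instance attaches to every request
      Authorization: `Bearer ${token}`,
      'Content-Type': 'application/json',
    },
  };
  if (body !== undefined) {
    init.body = JSON.stringify(body);
  }
  return { url, init };
}
```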

#### Security Measures
- **Token Storage**: Environment variables (never hardcoded)
- **Token Transmission**: HTTPS recommended for production
- **Token Scope**: Same permissions as the Home Assistant user who created the long-lived access token (as in the UI)
- **Token Rotation**: Manual process (revoke + create new)

#### Environment Configuration
```bash
HA_BASE_URL=http://homeassistant.local:8123
HA_ACCESS_TOKEN=eyJ0eXAiOiJKV1QiLCJhbGc...
```

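Both variables are validated at startup with a Zod schema:
```typescript
import { z } from 'zod';

// Fail fast at startup if either variable is missing or malformed
const CONFIG_SCHEMA = z.object({
  HA_BASE_URL: z.string().url(),
  HA_ACCESS_TOKEN: z.string().min(1),
});

const config = CONFIG_SCHEMA.parse(process.env); // throws a ZodError on invalid config
```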

### 5. Error Handling

#### Error Flow
```
1. Try operation
2. Catch error
3. Format error message
4. Throw McpError with appropriate code
5. MCP SDK sends error to client
```

#### Error Types

**Connection Errors:**
- No response from Home Assistant
- Network timeout
- Invalid base URL

**Authentication Errors:**
- Invalid access token (401)
- Token expired
- Missing Authorization header

**API Errors:**
- Entity not found (404)
- Invalid service call (400)
- Service execution failure

**MCP Errors:**
- Invalid request (unknown resource/tool)
- Internal error (unexpected failures)
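The first three categories can be told apart from a caught HTTP failure using only whether Home Assistant responded and with which status. This is a simplified sketch. `HttpFailure` is an assumed shape loosely modelled on axios errors, where `error.response` is set only when HA replied. `classifyError` and the `HINTS` text are hypothetical, not the server's actual messages.

```typescript
type ErrorCategory = 'connection' | 'authentication' | 'api';

/** Assumed view of a failure: `status` is set only when HA sent a response. */
interface HttpFailure {
  status?: number;
}

function classifyError(failure: HttpFailure): ErrorCategory {
  // No response at all: HA down, wrong base URL, or timeout
  if (failure.status === undefined) return 'connection';
  // HA rejected the bearer token
  if (failure.status === 401) return 'authentication';
  // Any other error status HA answered with (400 bad service call, 404 unknown entity, ...)
  return 'api';
}

// Hypothetical user-facing hints, one per category
const HINTS: Record<ErrorCategory, string> = {
  connection: 'Check that Home Assistant is running and HA_BASE_URL is correct.',
  authentication: 'Check that HA_ACCESS_TOKEN is a valid long-lived access token.',
  api: 'Check the entity_id, domain, and service parameters.',
};
```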

#### Error Formatting
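```typescript
import axios from 'axios';

class HomeAssistantClient {
  // Turn any thrown value into a readable message for the MCP client
  static formatError(error: unknown): string {
    if (axios.isAxiosError(error)) {
      if (error.response) {
        // HA replied with an error status (e.g. 401 bad token, 404 unknown entity)
        return `Home Assistant API error (${error.response.status}): ...`;
      } else if (error.request) {
        // Request sent but no reply: HA down, wrong base URL, or timeout
        return `No response from Home Assistant. Check if HA is running...`;
      }
    }
    // Thrown values are not always Error instances, so guard before reading .message
    const message = error instanceof Error ? error.message : String(error);
    return `Unexpected error: ${message}`;
  }
}
```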

### 6. Data Flow Examples

#### Example 1: Reading Entity State (Resource)
```
1. LLM client: Request resource ha://states/light.living_room
2. MCP Server: Parse URI, extract entity_id
3. MCP Server: Call haClient.getState('light.living_room')
4. HA Client: GET /api/states/light.living_room
5. Home Assistant: Validate token, fetch state, return JSON
6. HA Client: Parse response, return EntityState object
7. MCP Server: Format as MCP resource response
8. LLM client: Receive entity state data
```

#### Example 2: Calling Service (Tool)
```
1. LLM client: Call tool call_service with parameters
2. MCP Server: Validate parameters against schema
3. MCP Server: Extract domain, service, service_data
4. MCP Server: Call haClient.callService(...)
5. HA Client: POST /api/services/{domain}/{service}
6. Home Assistant: Execute service, return changed states
7. HA Client: Return state changes array
8. MCP Server: Format as tool response
9. LLM client: Receive execution result
```
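Step 5 builds the URL path from tool input, so it is worth rejecting anything that is not a plain name before it reaches the path (for example `../states`). This sketch assumes domains and services are lowercase `[a-z0-9_]` slugs such as `light` and `turn_on`. It also assumes `service_data` is sent as the JSON body, with an empty object when omitted. `servicePath` and `serviceBody` are hypothetical helpers.

```typescript
// Simplified slug rule for domains and services, e.g. "light", "turn_on"
const SLUG = /^[a-z0-9_]+$/;

function servicePath(domain: string, service: string): string {
  if (!SLUG.test(domain) || !SLUG.test(service)) {
    throw new Error(`Invalid service name: ${domain}.${service}`);
  }
  return `/api/services/${domain}/${service}`;
}

// POST body: service_data as JSON, or an empty object when none was given
function serviceBody(serviceData?: Record<string, unknown>): string {
  return JSON.stringify(serviceData ?? {});
}
```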

## Design Decisions

### 1. Why Stdio Transport?
- **Universal compatibility**: Works with any MCP client
- **Simple integration**: No network configuration needed
- **Secure**: No exposed ports, runs as subprocess
- **Standard**: MCP reference implementation pattern

### 2. Why Resources AND Tools?
- **Semantic clarity**: Read vs. write operations are explicit
- **Optimization**: Clients can cache resource data
- **Discovery**: LLMs can explore available data sources
- **REST mapping**: Aligns with HTTP GET vs. POST semantics

### 3. Why TypeScript?
- **Type safety**: Catch errors at compile time
- **IDE support**: Excellent autocomplete and refactoring
- **Documentation**: Types serve as inline documentation
- **MCP SDK**: Official SDK is TypeScript-first

### 4. Why Axios vs. Fetch?
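- **Error handling**: Rejects on non-2xx status codes (fetch resolves them) and distinguishes "HA replied with an error" (`error.response`) from "no reply" (`error.request`)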
- **Interceptors**: Easy to add logging or retry logic
- **Timeout support**: Built-in timeout configuration
- **Request/response transformation**: Automatic JSON parsing

### 5. Why Not WebSocket?
- **Scope**: Initial version focuses on request-response
- **Complexity**: A persistent WebSocket connection alongside stdio would need reconnection and subscription management
- **Future enhancement**: Can be added for real-time updates

## Extension Points

### Adding New Resources
1. Add resource definition to `ListResourcesRequestSchema`
2. Add URI handler to `ReadResourceRequestSchema`
3. Add HA client method if needed
4. Add type definitions if needed

### Adding New Tools
1. Add tool definition to `ListToolsRequestSchema`
2. Add execution handler to `CallToolRequestSchema`
3. Add HA client method if needed
4. Add type definitions if needed

### Supporting New HA APIs
1. Define TypeScript interfaces in `types.ts`
2. Add client methods to `HomeAssistantClient`
3. Expose as resource or tool in MCP server
4. Update documentation

## Performance Considerations

### Latency Sources
1. **MCP Protocol**: Minimal (local stdio pipe; JSON-RPC serialization cost is negligible next to the HTTP round-trip)
2. **HTTP Request**: Network RTT + HA processing (typically 10-100ms)
3. **JSON Parsing**: Minimal (native V8 parser)

### Optimization Strategies
- **Batch operations**: Use service calls that accept multiple entities
- **Minimal responses**: Use history API filters to reduce data transfer
- **Template rendering**: Offload complex queries to HA's template engine
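The three strategies map onto concrete request shapes. This sketch assumes the standard Home Assistant REST endpoints. Several entities go in one `entity_id` array. `GET /api/history/period/<timestamp>` is narrowed with `filter_entity_id`, `end_time`, `minimal_response`, and `no_attributes`. `POST /api/template` takes a `{ "template": ... }` body. The helper names are hypothetical.

```typescript
/** Batch: one service_data payload that targets several entities at once. */
function batchServiceData(
  entityIds: string[],
  extra: Record<string, unknown> = {},
): Record<string, unknown> {
  return { ...extra, entity_id: entityIds };
}

/** History: only the listed entities, with trimmed intermediate states and no attributes. */
function historyPath(startIso: string, entityIds: string[], endIso?: string): string {
  let path = `/api/history/period/${encodeURIComponent(startIso)}`;
  path += `?filter_entity_id=${encodeURIComponent(entityIds.join(','))}`;
  if (endIso !== undefined) {
    path += `&end_time=${encodeURIComponent(endIso)}`;
  }
  // Flag parameters are passed without a value
  path += '&minimal_response&no_attributes';
  return path;
}

/** Template: JSON body for POST /api/template; HA responds with the rendered text. */
function templateBody(template: string): string {
  return JSON.stringify({ template });
}
```

For instance, `templateBody("{{ states.light | selectattr('state', 'eq', 'on') | list | count }}")` asks Home Assistant to count the lights that are on. Without the template engine, the server would have to fetch every light state and count them itself.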

### Scalability
- **Stateless design**: Each request is independent
- **No caching**: Always fresh data (trade-off for simplicity)
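- **Connection reuse**: Depends on the Node.js HTTP agent; the global agent enables keep-alive by default since Node.js 19, while older versions need an explicit keep-alive `http.Agent` passed to axios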

## Testing Strategy

### Unit Tests (Future)
- Mock HA client responses
- Test MCP request handling
- Validate error formatting
- Test URI parsing

### Integration Tests (Future)
- Test against Home Assistant demo instance
- Verify all API endpoints
- Test authentication flows
- Test error scenarios

### Manual Testing
- Use MCP Inspector tool
- Test with Claude Desktop
- Verify against real Home Assistant instance

## Future Enhancements

### Priority 1: Core Functionality
- [ ] WebSocket support for real-time state updates
- [ ] Connection pooling and retry logic
- [ ] Comprehensive error logging

### Priority 2: Developer Experience
- [ ] Unit test coverage
- [ ] Integration test suite
- [ ] CLI for testing tools directly
- [ ] Debug mode with verbose logging

### Priority 3: Advanced Features
- [ ] Entity discovery with caching
- [ ] Service schema validation
- [ ] Template validation and preview
- [ ] Bulk operations tool
- [ ] Scene and script management tools

### Priority 4: Production Readiness
- [ ] Rate limiting
- [ ] Metrics and monitoring
- [ ] Health check endpoint
- [ ] Graceful shutdown handling
- [ ] Configuration validation UI

## Conclusion

This architecture prioritizes:
1. **Simplicity**: Easy to understand and extend
2. **Type safety**: Catch errors early
3. **Separation of concerns**: Clear component boundaries
4. **MCP best practices**: Follow official patterns
5. **Extensibility**: Easy to add new capabilities

The design balances functionality with maintainability, providing a solid foundation for Home Assistant + MCP integration.