sheeteasyAI/.cursor/rules/xls_processing.mdc

---
description:
globs:
alwaysApply: false
---
# SheetJS Unified File Processing Rules

## **Critical Requirements for Excel/CSV File Processing**

- **Always use SheetJS XLSX.read() for all file types (CSV, XLS, XLSX)**
- **Implement comprehensive Unicode/Korean support through proper codepage settings**
- **Apply defensive programming patterns to prevent processing errors**
- **Use optimized SheetJS options for performance and reliability**

## **SheetJS Unified Processing Chain**

### **1. File Type Detection and Format Handling**
```typescript
// ✅ DO: Detect file format and apply appropriate settings
function getFileFormat(file: File): string {
  const extension = file.name.toLowerCase().split(".").pop();
  switch (extension) {
    case "csv": return "CSV";
    case "xls": return "XLS";
    case "xlsx": return "XLSX";
    default: return "UNKNOWN";
  }
}

// ❌ DON'T: Use different processing chains for different formats
function processXLSX(file) { /* LuckyExcel */ }
function processXLS(file) { /* SheetJS conversion */ }
function processCSV(file) { /* Custom parser */ }
```
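
File extensions can be wrong (for example, an `.xls` file that is really CSV text), so it can help to cross-check the leading bytes. The sketch below is an illustration only: `sniffFormat`, `startsWithBytes`, and the `SniffedFormat` labels are hypothetical names, not part of SheetJS. It relies on two well-known container signatures: the ZIP local-file header used by XLSX and the Compound File Binary header used by XLS.

```typescript
// Hypothetical helper: classify a file by its leading bytes rather than its name.
type SniffedFormat = "XLSX" | "XLS" | "TEXT";

// ZIP local-file header (XLSX container) and CFB header (XLS container).
const ZIP_MAGIC: number[] = [0x50, 0x4b, 0x03, 0x04];
const CFB_MAGIC: number[] = [0xd0, 0xcf, 0x11, 0xe0, 0xa1, 0xb1, 0x1a, 0xe1];

// True when `bytes` begins with every byte of `magic`, in order.
function startsWithBytes(bytes: Uint8Array, magic: number[]): boolean {
  if (bytes.length < magic.length) return false;
  for (let i = 0; i < magic.length; i++) {
    if (bytes[i] !== magic[i]) return false;
  }
  return true;
}

function sniffFormat(bytes: Uint8Array): SniffedFormat {
  if (startsWithBytes(bytes, ZIP_MAGIC)) return "XLSX";
  if (startsWithBytes(bytes, CFB_MAGIC)) return "XLS";
  return "TEXT"; // anything else is treated as delimited text such as CSV
}
```

A caller would pass `new Uint8Array(arrayBuffer)`. A matching signature only identifies the container: other ZIP-based or CFB-based files share the same leading bytes, so the result is a hint to compare against `getFileFormat`, not a guarantee.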

### **2. Korean/Unicode Support Configuration**
```typescript
// ✅ DO: Configure proper codepage for Korean support
function getOptimalReadOptions(fileType: string): XLSX.ParsingOptions {
  const baseOptions: XLSX.ParsingOptions = {
    type: "array",
    cellText: true,    // Generate formatted text (the `w` field) for each cell
    raw: false,        // Parse values in plaintext input such as CSV
  };

  switch (fileType) {
    case "CSV":
      return { ...baseOptions, codepage: 65001 }; // Decode CSV bytes as UTF-8
    case "XLS":
      // Mainly relevant for older XLS files that lack their own CodePage record
      return { ...baseOptions, codepage: 65001 };
    case "XLSX":
      return baseOptions; // XLSX stores text as Unicode; no codepage needed
    default:
      throw new Error(`Unsupported file type: ${fileType}`);
  }
}

// ❌ DON'T: Ignore encoding settings
const workbook = XLSX.read(data); // No codepage: non-UTF-8 Korean text may be garbled
```

### **3. Fallback Encoding Strategy**
```typescript
// ✅ DO: Implement fallback encoding for robust Korean support
let workbook: XLSX.WorkBook;
try {
  workbook = XLSX.read(arrayBuffer, { type: "array", codepage: 65001 }); // UTF-8 first
} catch (error) {
  // Fall back to a legacy codepage: CP949 for Korean CSV, CP1252 otherwise
  const fallbackCodepage = fileFormat === "CSV" ? 949 : 1252;
  workbook = XLSX.read(arrayBuffer, { type: "array", codepage: fallbackCodepage });
}
// A wrong codepage often yields garbled text rather than an exception,
// so inspect the decoded text as well as catching errors.

// ❌ DON'T: Give up on first encoding failure
try {
  workbook = XLSX.read(arrayBuffer);
} catch (error) {
  throw error; // No fallback strategy
}
```
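
Because a mismatched codepage tends to produce garbled text instead of an exception, one alternative to the try/catch chain is to choose the codepage before parsing. The sketch below swaps in a different technique from the one above: it validates the bytes as strict UTF-8 with the standard `TextDecoder` and only falls back to CP949 when that fails. `isValidUtf8`, `pickCsvCodepage`, `readCsvWithDetectedEncoding`, and the `ReadFn` type are hypothetical names; `read` is an injected function that `XLSX.read` is assumed to satisfy.

```typescript
// Minimal shape of the reader this sketch needs (assumption: XLSX.read fits it).
type ReadFn<T> = (data: Uint8Array, opts: { type: "array"; codepage: number }) => T;

const UTF8_CODEPAGE = 65001;
const CP949_CODEPAGE = 949;

// True when the bytes decode as strict UTF-8 with no invalid sequences.
function isValidUtf8(bytes: Uint8Array): boolean {
  try {
    new TextDecoder("utf-8", { fatal: true }).decode(bytes);
    return true;
  } catch {
    return false;
  }
}

// Choose UTF-8 when the bytes are valid UTF-8, otherwise the legacy codepage.
function pickCsvCodepage(bytes: Uint8Array, legacyCodepage: number = CP949_CODEPAGE): number {
  return isValidUtf8(bytes) ? UTF8_CODEPAGE : legacyCodepage;
}

// Parse CSV bytes once, using the codepage selected from the byte content.
function readCsvWithDetectedEncoding<T>(bytes: Uint8Array, read: ReadFn<T>): T {
  return read(bytes, { type: "array", codepage: pickCsvCodepage(bytes) });
}
```

In application code this might be called as `readCsvWithDetectedEncoding(new Uint8Array(arrayBuffer), XLSX.read)`. Non-UTF-8 codepages such as 949 depend on the SheetJS codepage tables being loaded, and how a given SheetJS version applies `codepage` to each input `type` should be verified against real CP949 files.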

### **4. Workbook Validation (Defensive Programming)**
```typescript
// ✅ DO: Validate workbook object thoroughly
if (!workbook || typeof workbook !== "object") {
  throw new Error("워크북을 생성할 수 없습니다"); // "Cannot create workbook"
}

if (!workbook.SheetNames || !Array.isArray(workbook.SheetNames) || workbook.SheetNames.length === 0) {
  throw new Error("시트 이름 정보가 없습니다"); // "No sheet name information"
}

if (!workbook.Sheets || typeof workbook.Sheets !== "object") {
  throw new Error("유효한 시트가 없습니다"); // "No valid sheets"
}

// ❌ DON'T: Skip validation
const firstSheet = workbook.Sheets[workbook.SheetNames[0]]; // Potential runtime error
```
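
The checks above can be packaged into one reusable guard. This is a minimal sketch, not SheetJS API: `WorkbookLike` and `getFirstSheet` are hypothetical names, the interface only mirrors the `SheetNames` and `Sheets` fields used in this section, and the English error messages are placeholders. It adds one check beyond the list above: the first listed sheet name must actually have an entry in `Sheets`.

```typescript
// Structural stand-in for the parts of a workbook that this check touches.
interface WorkbookLike {
  SheetNames: string[];
  Sheets: Record<string, unknown>;
}

// Validate an unknown value and return its first sheet, or throw a specific error.
function getFirstSheet(value: unknown): { name: string; sheet: object } {
  if (typeof value !== "object" || value === null) {
    throw new Error("Workbook could not be created");
  }
  const wb = value as Partial<WorkbookLike>;
  if (!Array.isArray(wb.SheetNames) || wb.SheetNames.length === 0) {
    throw new Error("Workbook has no sheet names");
  }
  if (typeof wb.Sheets !== "object" || wb.Sheets === null) {
    throw new Error("Workbook has no sheet data");
  }
  const name = wb.SheetNames[0];
  const sheet = wb.Sheets[name];
  if (typeof sheet !== "object" || sheet === null) {
    throw new Error(`Sheet "${name}" is listed but has no data`);
  }
  return { name, sheet };
}
```

Taking `unknown` as input forces every field to be checked before use, so the unchecked lookup shown in the DON'T example cannot compile against this helper's result.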

### **5. Optimized SheetJS Options**
```typescript
// ✅ DO: Use performance-optimized options
const readOptions: XLSX.ParsingOptions = {
  type: "array",
  cellText: true,     // ✅ Generate formatted text (`w`) for each cell
  cellNF: false,      // ✅ Skip number format strings (performance)
  cellHTML: false,    // ✅ Skip rich-text HTML (performance)
  cellFormula: false, // ✅ Skip formula text (performance)
  cellStyles: false,  // ✅ Skip style information (performance)
  cellDates: true,    // ✅ Store dates as Date objects
  sheetStubs: false,  // ✅ Skip stub cells for empty cells (performance)
  bookProps: false,   // ✅ Keep false: true parses only workbook metadata
  bookSheets: false,  // ✅ Keep false: true parses only sheet names, no cell data
  bookVBA: false,     // ✅ Skip the VBA blob (performance)
  raw: false,         // ✅ Parse values in plaintext input such as CSV
  dense: false,       // ✅ Default sparse worksheets; consider dense mode for very large sheets
  WTF: false,         // ✅ Do not throw on unexpected file features (stability)
  UTC: false,         // ✅ Use local time
};

// ❌ DON'T: Use default options without optimization
const workbook = XLSX.read(data, { type: "array" }); // Missing optimizations
```

### **6. Korean Data Processing**
```typescript
// ✅ DO: Process Korean data with proper settings
const jsonData = XLSX.utils.sheet_to_json(sheet, {
  header: 1,          // Array format
  defval: "",         // Default value for empty cells
  blankrows: false,   // Remove blank rows
  raw: false,         // Return formatted strings, as displayed in the spreadsheet
});

// ❌ DON'T: Return raw values when the UI expects display text
const rawData = XLSX.utils.sheet_to_json(sheet, {
  raw: true,  // Returns underlying values (numbers, dates) instead of formatted text
});
```
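
With `header: 1`, the result is an array of rows whose first row usually holds the column headers. As a sketch of one possible next step, the hypothetical `rowsToRecords` helper below turns that shape into keyed records. The `Cell` type and the `column_N` placeholder for blank headers are assumptions made for illustration.

```typescript
// Assumed cell value types for rows produced with `header: 1`.
type Cell = string | number | boolean | Date | null;

// Convert [headerRow, ...dataRows] into one record per data row.
function rowsToRecords(rows: Cell[][]): Record<string, Cell>[] {
  if (rows.length === 0) return [];
  // Blank header cells get a positional placeholder name.
  const headers = rows[0].map((h, i) => {
    const text = h === null ? "" : String(h).trim();
    return text !== "" ? text : `column_${i + 1}`;
  });
  return rows.slice(1).map((row) => {
    const record: Record<string, Cell> = {};
    headers.forEach((key, i) => {
      // Short rows are padded with "" to match the `defval` used above.
      record[key] = i < row.length ? row[i] : "";
    });
    return record;
  });
}
```

For example, `rowsToRecords([["이름", "나이"], ["김철수", "30"]])` returns `[{ 이름: "김철수", 나이: "30" }]`. Duplicate header names overwrite each other in this sketch, so real data may need a de-duplication step.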

## **Error Handling Patterns**

### **Specific Error Messages**
```typescript
// ✅ DO: Provide specific error messages
if (arrayBuffer.byteLength === 0) {
  throw new Error(`${fileFormat} 파일이 비어있습니다.`); // "<format> file is empty."
}

if (!workbook.SheetNames.length) {
  // "No sheet name information - the file is empty or corrupted."
  throw new Error("시트 이름 정보가 없습니다 - 파일이 비어있거나 손상되었습니다.");
}

// ❌ DON'T: Use generic error messages
throw new Error("File processing failed");
```
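
One way to keep messages specific and uniform across formats is to build them from a small table of failure codes. The sketch below is hypothetical throughout: `FailureCode`, `FAILURE_TEXT`, `fileError`, and `assertNotEmpty` are illustrative names, and the English wording is a placeholder that could be replaced with localized text.

```typescript
type FileFormat = "CSV" | "XLS" | "XLSX";
type FailureCode = "EMPTY_FILE" | "NO_SHEETS" | "PARSE_FAILED";

// Hypothetical message table; the wording is illustrative only.
const FAILURE_TEXT: Record<FailureCode, string> = {
  EMPTY_FILE: "file is empty",
  NO_SHEETS: "file contains no sheets; it may be empty or corrupted",
  PARSE_FAILED: "file could not be parsed",
};

// Build an Error whose message follows "[FORMAT] CODE: text (detail)".
function fileError(format: FileFormat, code: FailureCode, detail?: string): Error {
  const suffix = detail ? ` (${detail})` : "";
  return new Error(`[${format}] ${code}: ${FAILURE_TEXT[code]}${suffix}`);
}

// Example guard mirroring the empty-file check above.
function assertNotEmpty(bytes: Uint8Array, format: FileFormat): void {
  if (bytes.byteLength === 0) throw fileError(format, "EMPTY_FILE");
}
```

Because every message starts with the format and a stable code, logs can be filtered by either one without parsing free-form text.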

## **Performance Guidelines**

- **Use `type: "array"` for ArrayBuffer input**
- **Disable unnecessary features (cellFormula, cellStyles, etc.)**
- **Use `raw: false` in `sheet_to_json` so cells come back as formatted text**
- **Set `blankrows: false` to skip empty rows**
- **Cache workbook objects when processing multiple sheets**
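
The caching guideline can be sketched as a small memoizing wrapper so that a file is parsed once and each sheet is then read from the stored workbook. `createWorkbookCache` is a hypothetical helper, and `parse` stands in for a call such as `XLSX.read`. The choice of cache key is left to the caller and is an assumption of this sketch.

```typescript
// Hypothetical cache: `parse` stands in for a call such as XLSX.read.
function createWorkbookCache<T>(parse: (bytes: Uint8Array) => T) {
  const entries: Record<string, T> = {};
  return {
    // Parse on the first request for a key; reuse the stored result afterwards.
    get(key: string, bytes: Uint8Array): T {
      if (!Object.prototype.hasOwnProperty.call(entries, key)) {
        entries[key] = parse(bytes);
      }
      return entries[key];
    },
    // Drop an entry so a large workbook can be garbage-collected.
    evict(key: string): void {
      delete entries[key];
    },
  };
}
```

A key such as `${file.name}:${file.size}:${file.lastModified}` is one plausible option. Calling `evict` once a file is closed keeps large workbooks from staying in memory.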

## **Testing Requirements**

- **Test with Korean filenames and Korean data content**
- **Test encoding fallback scenarios (UTF-8 → CP949 for CSV, UTF-8 → CP1252 for other formats)**
- **Test error handling for corrupted files**
- **Test all supported file formats (CSV, XLS, XLSX)**
- **Verify memory efficiency with large files**

## **Migration from LuckyExcel**

- **Replace all LuckyExcel transformExcelToLucky() calls with SheetJS**
- **Remove XLS-to-XLSX conversion logic (SheetJS handles natively)**
- **Update data structure from LuckyExcel format to standard JSON**
- **Maintain backward compatibility in component interfaces**

## **Common Pitfalls to Avoid**

- ❌ Using different processing libraries for different file types
- ❌ Ignoring codepage settings for Korean files
- ❌ Not implementing encoding fallback strategies
- ❌ Skipping workbook validation steps
- ❌ Using `raw: true` in `sheet_to_json` when the UI expects formatted text
- ❌ Not handling empty or corrupted files gracefully

## **Best Practices**

- ✅ Log processing steps for debugging Korean encoding issues
- ✅ Use consistent error message format across all file types
- ✅ Implement comprehensive test coverage for Korean scenarios
- ✅ Monitor performance with large Korean datasets
- ✅ Document encoding strategies for team knowledge sharing