feat: improve file processor - stable Excel file processing

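- Apply the logic of the previously working code to the current processor
- Introduce a LuckyExcel-first + SheetJS fallback pattern
- Stable processing for all formats: CSV, XLS, and XLSX
- Normalize Korean sheet names and strengthen workbook structure validation
- Simplify complex SheetJS options to improve stability
- Create an empty sheet on error to keep the app from crashing
- Update the test environment and Cursor rules

Technical improvements:
- Stable data conversion via the convertSheetJSToLuckyExcel function
- Stronger Korean support via the UTF-8 codepage setting
- Defensive programming via the validateWorkbook function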
2025-06-20 14:32:33 +09:00
parent f288103e55
commit 3a8c6af7ea
16 changed files with 5249 additions and 133 deletions
--- a/.cursor/rules/xls_processing.mdc
+++ b/.cursor/rules/xls_processing.mdc
@@ -0,0 +1,195 @@
+---
+description: 
+globs: 
+alwaysApply: false
+---
+# SheetJS Unified File Processing Rules
+
+## **Critical Requirements for Excel/CSV File Processing**
+
+- **Always use SheetJS XLSX.read() for all file types (CSV, XLS, XLSX)**
+- **Implement comprehensive Unicode/Korean support through proper codepage settings**
+- **Apply defensive programming patterns to prevent processing errors**
+- **Use optimized SheetJS options for performance and reliability**
+
+## **SheetJS Unified Processing Chain**
+
+### **1. File Type Detection and Format Handling**
+```typescript
+// ✅ DO: Detect file format and apply appropriate settings
+function getFileFormat(file: File): string {
+  const extension = file.name.toLowerCase().split(".").pop();
+  switch (extension) {
+    case "csv": return "CSV";
+    case "xls": return "XLS"; 
+    case "xlsx": return "XLSX";
+    default: return "UNKNOWN";
+  }
+}
+
+// ❌ DON'T: Use different processing chains for different formats
+function processXLSX(file) { /* LuckyExcel */ }
+function processXLS(file) { /* SheetJS conversion */ }
+function processCSV(file) { /* Custom parser */ }
+```
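+
+A minimal sketch of the detection step, assuming a hypothetical `detectFileFormat` that takes a file name instead of a browser `File` (so it can be unit-tested in Node) and returns a union type, so the compiler catches misspelled format strings. Using `lastIndexOf` means a name without a dot (for example a file literally named `csv`) maps to `"UNKNOWN"` rather than being read as its own extension.
+
+```typescript
+type FileFormat = "CSV" | "XLS" | "XLSX" | "UNKNOWN";
+
+// Hypothetical variant of getFileFormat that works on a plain file name
+function detectFileFormat(fileName: string): FileFormat {
+  const dot = fileName.lastIndexOf(".");
+  // No dot at all, or a trailing dot: there is no extension to inspect
+  if (dot === -1 || dot === fileName.length - 1) return "UNKNOWN";
+  switch (fileName.slice(dot + 1).toLowerCase()) {
+    case "csv":
+      return "CSV";
+    case "xls":
+      return "XLS";
+    case "xlsx":
+      return "XLSX";
+    default:
+      return "UNKNOWN";
+  }
+}
+```
+
+In the unified chain, `"UNKNOWN"` should be rejected before `XLSX.read` is called. Every other value goes through the same read path with per-format options.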
+
+### **2. Korean/Unicode Support Configuration**
+```typescript
+import * as XLSX from "xlsx";
+
+// ✅ DO: Configure proper codepage for Korean support
+function getOptimalReadOptions(fileType: string): XLSX.ParsingOptions {
+  // Annotated so "array" is checked against ParsingOptions["type"] instead of widening to string
+  const baseOptions: XLSX.ParsingOptions = {
+    type: "array",
+    cellText: true,    // Generate formatted text (.w) for each cell
+    raw: false,        // Parse values in plain-text formats such as CSV
+    codepage: 65001,   // UTF-8 codepage (Korean support)
+  };
+
+  switch (fileType) {
+    case "CSV":
+      return { ...baseOptions, codepage: 65001 }; // UTF-8 for CSV
+    case "XLS":
+      return { ...baseOptions, codepage: 65001 }; // UTF-8 for legacy XLS strings
+    case "XLSX":
+    default:
+      return baseOptions; // XLSX is XML and always UTF-8; codepage matters mainly for CSV and XLS
+  }
+}
+
+// ❌ DON'T: Ignore encoding settings
+const workbook = XLSX.read(data); // Missing encoding options
+```
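+
+The codepage only needs to be guessed when the bytes carry no byte-order mark. Here is a minimal pre-check, assuming a hypothetical helper `codepageFromBom` that is applied to CSV/text input only. XLS and XLSX are binary containers and have no BOM. The returned numbers follow the Windows codepage convention that the `codepage` option uses.
+
+```typescript
+// Hypothetical pre-check: a byte-order mark identifies the encoding outright.
+// 65001 = UTF-8, 1200 = UTF-16LE, 1201 = UTF-16BE (Windows codepage numbers).
+function codepageFromBom(bytes: Uint8Array): number | undefined {
+  if (bytes.length >= 3 && bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf) {
+    return 65001; // UTF-8 BOM
+  }
+  if (bytes.length >= 2 && bytes[0] === 0xff && bytes[1] === 0xfe) {
+    return 1200; // UTF-16 little-endian BOM
+  }
+  if (bytes.length >= 2 && bytes[0] === 0xfe && bytes[1] === 0xff) {
+    return 1201; // UTF-16 big-endian BOM
+  }
+  return undefined; // no BOM: fall back to detection or the configured default
+}
+```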
+
+### **3. Fallback Encoding Strategy**
+```typescript
+// ✅ DO: Implement fallback encoding for robust Korean support
+try {
+  workbook = XLSX.read(arrayBuffer, { type: "array", codepage: 65001 }); // UTF-8 first
+} catch (error) {
+  // Fallback: CP949 for Korean CSV, CP1252 (Western European) otherwise
+  const fallbackCodepage = fileFormat === "CSV" ? 949 : 1252;
+  workbook = XLSX.read(arrayBuffer, { type: "array", codepage: fallbackCodepage });
+}
+}
+
+// ❌ DON'T: Give up on first encoding failure
+try {
+  workbook = XLSX.read(arrayBuffer);
+} catch (error) {
+  throw error; // No fallback strategy
+}
+```
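+
+The try/catch above has one caveat. Decoding bytes with the wrong codepage usually produces garbled text rather than throwing, so the fallback may never run. A sketch of choosing the codepage before parsing follows. It assumes a hypothetical `pickCsvCodepage` and a runtime with `TextDecoder` (modern browsers, Node 11+). A strict UTF-8 decode succeeds only on valid UTF-8. Anything else is treated as CP949, the usual encoding of CSV files saved by Korean-locale Excel.
+
+```typescript
+// Hypothetical helper: pick the CSV codepage before calling XLSX.read
+function pickCsvCodepage(bytes: Uint8Array): number {
+  try {
+    // fatal: true makes decode() throw on any invalid UTF-8 byte sequence
+    new TextDecoder("utf-8", { fatal: true }).decode(bytes);
+    return 65001; // valid UTF-8 (plain ASCII also lands here)
+  } catch {
+    // Not UTF-8: assume CP949 (Korean Windows "ANSI"). This is an assumption
+    // that fits Korean-locale Excel exports, not every legacy file.
+    return 949;
+  }
+}
+```
+
+The result can be passed as `{ type: "array", codepage }` to `XLSX.read`. Depending on the SheetJS build, CP949 decoding may require loading the codepage tables first (see `set_cptable` in the SheetJS docs).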
+
+### **4. Workbook Validation (Defensive Programming)**
+```typescript
+// ✅ DO: Validate workbook object thoroughly
+if (!workbook || typeof workbook !== "object") {
+  throw new Error("워크북을 생성할 수 없습니다"); // "Unable to create the workbook"
+}
+
+if (!workbook.SheetNames || !Array.isArray(workbook.SheetNames) || workbook.SheetNames.length === 0) {
+  throw new Error("시트 이름 정보가 없습니다"); // "No sheet name information"
+}
+
+if (!workbook.Sheets || typeof workbook.Sheets !== "object") {
+  throw new Error("유효한 시트가 없습니다"); // "No valid sheets"
+}
+
+// ❌ DON'T: Skip validation
+const firstSheet = workbook.Sheets[workbook.SheetNames[0]]; // Potential runtime error
+```
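+
+The checks above can be collected into one assertion function. This is only a sketch. The commit message mentions a `validateWorkbook` function, but the body below is illustrative and is not the commit's code. It uses a minimal `WorkbookLike` type, an assumption standing in for `XLSX.WorkBook`. It also adds one check the list above lacks: every name in `SheetNames` must resolve to an entry in `Sheets`, which is exactly the lookup the DON'T example performs unguarded.
+
+```typescript
+// Minimal structural stand-in for XLSX.WorkBook (assumed subset)
+interface WorkbookLike {
+  SheetNames: string[];
+  Sheets: Record<string, unknown>;
+}
+
+// Hypothetical validateWorkbook: narrows `unknown` to WorkbookLike or throws
+function validateWorkbook(workbook: unknown): asserts workbook is WorkbookLike {
+  if (!workbook || typeof workbook !== "object") {
+    throw new Error("워크북을 생성할 수 없습니다"); // "Unable to create the workbook"
+  }
+  const wb = workbook as Partial<WorkbookLike>;
+  if (!Array.isArray(wb.SheetNames) || wb.SheetNames.length === 0) {
+    throw new Error("시트 이름 정보가 없습니다"); // "No sheet name information"
+  }
+  if (!wb.Sheets || typeof wb.Sheets !== "object") {
+    throw new Error("유효한 시트가 없습니다"); // "No valid sheets"
+  }
+  // Every listed name must map to an actual sheet object
+  for (const name of wb.SheetNames) {
+    if (!wb.Sheets[name]) {
+      throw new Error(`시트를 찾을 수 없습니다: ${name}`); // "Sheet not found: <name>"
+    }
+  }
+}
+```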
+
+### **5. Optimized SheetJS Options**
+```typescript
+// ✅ DO: Use performance-optimized options
+const readOptions: XLSX.ParsingOptions = {
+  type: "array",
+  cellText: true,     // ✅ Generate formatted text (.w) for display
+  cellNF: false,      // ✅ Skip number-format strings (performance)
+  cellHTML: false,    // ✅ Skip rich-text HTML generation (performance)
+  cellFormula: false, // ✅ Skip formula parsing (performance)
+  cellStyles: false,  // ✅ Skip style parsing (performance)
+  cellDates: true,    // ✅ Store dates as Date objects
+  sheetStubs: false,  // ✅ Do not create stub cells for empty cells
+  bookProps: false,   // ✅ Keep false: true parses only workbook metadata
+  bookSheets: false,  // ✅ Keep false: true parses only sheet names, not sheet data
+  bookVBA: false,     // ✅ Skip VBA macros (performance)
+  raw: false,         // ✅ Parse values in plain-text formats (CSV)
+  dense: false,       // ✅ Default sparse (A1-keyed) sheet storage
+  WTF: false,         // ✅ Do not throw on unexpected file features (stability)
+  UTC: false,         // ✅ Use local time
+};
+
+// ❌ DON'T: Use default options without optimization
+const workbook = XLSX.read(data, { type: "array" }); // Missing optimizations
+```
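+
+Two of these flags are not performance switches. Per the SheetJS parse-option docs, `bookProps: true` parses only workbook metadata and `bookSheets: true` parses only sheet names, so the returned workbook has no cell data. A hypothetical guard (`assertFullParse`, not a SheetJS API) can catch that before it reaches production:
+
+```typescript
+// Subset of XLSX.ParsingOptions relevant to the check (assumed shape)
+interface ReadFlags {
+  bookProps?: boolean;
+  bookSheets?: boolean;
+  [option: string]: unknown;
+}
+
+// Hypothetical guard: reject flags that stop XLSX.read before worksheet data
+function assertFullParse(options: ReadFlags): void {
+  const flags = ["bookProps", "bookSheets"] as const;
+  const skipping = flags.filter((flag) => options[flag] === true);
+  if (skipping.length > 0) {
+    throw new Error(`${skipping.join(", ")}: true would skip worksheet parsing`);
+  }
+}
+```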
+
+### **6. Korean Data Processing**
+```typescript
+// ✅ DO: Process Korean data with proper settings
+const jsonData = XLSX.utils.sheet_to_json(sheet, {
+  header: 1,          // Array format
+  defval: "",         // Default value for empty cells
+  blankrows: false,   // Remove blank rows
+  raw: false,         // Use formatted text (.w), keeping Korean formats such as "1,000원"
+});
+
+// ❌ DON'T: Use raw values when the displayed text is needed
+const jsonData = XLSX.utils.sheet_to_json(sheet, {
+  raw: true,  // Returns underlying values: a cell shown as "1,000원" comes back as 1000
+});
+```
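+
+With `header: 1` the result is an array of rows. The sketch below turns it into records keyed by the header row, using a hypothetical `rowsToRecords` helper. Header text is NFC-normalized because Korean text can arrive in decomposed (NFD) jamo form, for example names that passed through macOS. A decomposed header would not equal the same word in precomposed form. The same `normalize("NFC")` call applies to sheet names. Duplicate headers overwrite earlier columns in this sketch.
+
+```typescript
+// Hypothetical helper: rows from sheet_to_json(sheet, { header: 1 }) -> records
+function rowsToRecords(rows: unknown[][]): Record<string, string>[] {
+  if (rows.length === 0) return [];
+  // NFC recombines decomposed Hangul jamo into precomposed syllables
+  const headers = rows[0].map((h) => String(h ?? "").normalize("NFC").trim());
+  return rows.slice(1).map((row) => {
+    const record: Record<string, string> = {};
+    headers.forEach((header, i) => {
+      // A later column with a duplicate header overwrites an earlier one
+      record[header] = String(row[i] ?? "").normalize("NFC");
+    });
+    return record;
+  });
+}
+```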
+
+## **Error Handling Patterns**
+
+### **Specific Error Messages**
+```typescript
+// ✅ DO: Provide specific error messages
+if (arrayBuffer.byteLength === 0) {
+  throw new Error(`${fileFormat} 파일이 비어있습니다.`); // "The <format> file is empty."
+}
+
+if (!workbook.SheetNames.length) {
+  throw new Error("시트 이름 정보가 없습니다 - 파일이 비어있거나 손상되었습니다."); // "No sheet name information - the file is empty or corrupted."
+}
+
+// ❌ DON'T: Use generic error messages
+throw new Error("File processing failed");
+```
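+
+To keep messages consistent across formats, a small error type can carry the format and a machine-readable reason alongside the Korean message. `FileProcessingError`, its reason codes, and `checkNotEmpty` are hypothetical names for illustration. SheetJS does not define them.
+
+```typescript
+type SupportedFormat = "CSV" | "XLS" | "XLSX";
+type FailureReason = "EMPTY_FILE" | "NO_SHEETS" | "PARSE_FAILED";
+
+// Hypothetical error type: every file type reports "[FORMAT] message"
+class FileProcessingError extends Error {
+  readonly format: SupportedFormat;
+  readonly reason: FailureReason;
+
+  constructor(format: SupportedFormat, reason: FailureReason, detail: string) {
+    super(`[${format}] ${detail}`);
+    this.name = "FileProcessingError";
+    this.format = format;
+    this.reason = reason;
+    // Keep instanceof working when compiling to ES5
+    Object.setPrototypeOf(this, new.target.prototype);
+  }
+}
+
+// Hypothetical check mirroring the byteLength example above
+function checkNotEmpty(format: SupportedFormat, data: ArrayBuffer): void {
+  if (data.byteLength === 0) {
+    // "파일이 비어있습니다." = "The file is empty."
+    throw new FileProcessingError(format, "EMPTY_FILE", "파일이 비어있습니다.");
+  }
+}
+```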
+
+## **Performance Guidelines**
+
+- **Use `type: "array"` for ArrayBuffer input**
+- **Disable unnecessary features (cellFormula, cellStyles, etc.)**
+- **Use `raw: false` to keep formatted text (e.g. Korean currency and date formats)**
+- **Set `blankrows: false` to skip empty rows**
+- **Cache workbook objects when processing multiple sheets**
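+
+A sketch of the caching guideline, assuming a hypothetical `createWorkbookCache` that wraps whatever parse function the processor uses. The cache key (name, size, and `lastModified`, all available on a browser `File`) is an assumption: a different file with identical metadata would be served from the cache.
+
+```typescript
+// Fields used for the cache key; a browser File provides all three
+interface FileKeySource {
+  name: string;
+  size: number;
+  lastModified: number;
+}
+
+// Hypothetical memoizing wrapper: parse each file once, reuse it per sheet
+function createWorkbookCache<W>(parse: (file: FileKeySource) => W) {
+  const cache = new Map<string, W>();
+  return (file: FileKeySource): W => {
+    const key = `${file.name}:${file.size}:${file.lastModified}`;
+    let workbook = cache.get(key);
+    if (workbook === undefined) {
+      workbook = parse(file);
+      cache.set(key, workbook);
+    }
+    return workbook;
+  };
+}
+```
+
+Reading a browser `File` is asynchronous, so `parse` will typically return a `Promise`. Caching the promise itself also lets concurrent requests for the same file share one parse.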
+
+## **Testing Requirements**
+
+- **Test with Korean filenames and Korean data content**
+- **Test encoding fallback scenarios (UTF-8 → CP949 → CP1252)**
+- **Test error handling for corrupted files**
+- **Test all supported file formats (CSV, XLS, XLSX)**
+- **Verify memory efficiency with large files**
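+
+For the encoding tests, fixtures can be built in code rather than stored as binary files. The sketch below uses hypothetical helper names. UTF-8 CSV bytes come from `TextEncoder`, optionally with a BOM. The CP949 sample is hard-coded because `TextEncoder` only produces UTF-8.
+
+```typescript
+// Hypothetical fixture: UTF-8 CSV bytes, optionally prefixed with a BOM
+function utf8CsvFixture(rows: string[][], withBom = false): Uint8Array {
+  const body = new TextEncoder().encode(rows.map((row) => row.join(",")).join("\r\n"));
+  if (!withBom) return body;
+  const out = new Uint8Array(body.length + 3);
+  out.set([0xef, 0xbb, 0xbf], 0); // UTF-8 byte-order mark
+  out.set(body, 3);
+  return out;
+}
+
+// "한글" encoded in CP949/EUC-KR; TextEncoder cannot produce these bytes
+const CP949_HANGUL = new Uint8Array([0xc7, 0xd1, 0xb1, 0xdb]);
+```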
+
+## **Migration from LuckyExcel**
+
+- **Replace all LuckyExcel transformExcelToLucky() calls with SheetJS**
+- **Remove XLS-to-XLSX conversion logic (SheetJS handles natively)**
+- **Update data structure from LuckyExcel format to standard JSON**
+- **Maintain backward compatibility in component interfaces**
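+
+For the backward-compatibility item, an adapter can map SheetJS rows (`sheet_to_json(sheet, { header: 1 })`) to the Luckysheet-style `celldata` entries (`{ r, c, v: { v, m } }`) that LuckyExcel-based components consume. This sketch assumes that shape and is not the `convertSheetJSToLuckyExcel` function named in the commit message. It fills only the value (`v`) and the display text (`m`), so check the exact cell shape against the Luckysheet version in use.
+
+```typescript
+// Assumed Luckysheet celldata cell shape (subset): row, column, value + display text
+interface LuckyCell {
+  r: number;
+  c: number;
+  v: { v: string | number | boolean; m: string };
+}
+
+// Hypothetical adapter: sheet_to_json(sheet, { header: 1 }) rows -> celldata
+function rowsToCelldata(rows: unknown[][]): LuckyCell[] {
+  const cells: LuckyCell[] = [];
+  rows.forEach((row, r) => {
+    row.forEach((value, c) => {
+      // celldata is sparse: empty cells are simply omitted
+      if (value === null || value === undefined || value === "") return;
+      const v =
+        typeof value === "number" || typeof value === "boolean" ? value : String(value);
+      cells.push({ r, c, v: { v, m: String(value) } });
+    });
+  });
+  return cells;
+}
+```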
+
+## **Common Pitfalls to Avoid**
+
+- ❌ Using different processing libraries for different file types
+- ❌ Ignoring codepage settings for Korean files
+- ❌ Not implementing encoding fallback strategies  
+- ❌ Skipping workbook validation steps
+- ❌ Using `raw: true` when the formatted (displayed) text is needed
+- ❌ Not handling empty or corrupted files gracefully
+
+## **Best Practices**
+
+- ✅ Log processing steps for debugging Korean encoding issues
+- ✅ Use consistent error message format across all file types
+- ✅ Implement comprehensive test coverage for Korean scenarios
+- ✅ Monitor performance with large Korean datasets
+- ✅ Document encoding strategies for team knowledge sharing