---
description:
globs:
alwaysApply: false
---

# SheetJS Unified File Processing Rules

## **Critical Requirements for Excel/CSV File Processing**

- **Always use SheetJS `XLSX.read()` for all file types (CSV, XLS, XLSX)**
- **Implement comprehensive Unicode/Korean support through proper codepage settings**
- **Apply defensive programming patterns to prevent processing errors**
- **Use optimized SheetJS options for performance and reliability**

## **SheetJS Unified Processing Chain**

### **1. File Type Detection and Format Handling**

```typescript
// ✅ DO: Detect file format and apply appropriate settings
function getFileFormat(file: File): string {
  const extension = file.name.toLowerCase().split(".").pop();
  switch (extension) {
    case "csv":
      return "CSV";
    case "xls":
      return "XLS";
    case "xlsx":
      return "XLSX";
    default:
      return "UNKNOWN";
  }
}

// ❌ DON'T: Use different processing chains for different formats
function processXLSX(file) { /* LuckyExcel */ }
function processXLS(file) { /* SheetJS conversion */ }
function processCSV(file) { /* Custom parser */ }
```

### **2. Korean/Unicode Support Configuration**

```typescript
import * as XLSX from "xlsx";

// ✅ DO: Configure proper codepage for Korean support
function getOptimalReadOptions(fileType: string): XLSX.ParsingOptions {
  // The annotation keeps `type: "array"` typed as a literal, not a plain string
  const baseOptions: XLSX.ParsingOptions = {
    type: "array",
    cellText: true, // Generate formatted text (the `w` field) for each cell
    raw: false, // Parse numbers and dates in plain-text formats such as CSV
  };

  switch (fileType) {
    case "CSV":
      return { ...baseOptions, codepage: 65001 }; // Decode CSV bytes as UTF-8
    case "XLS":
      // Only affects legacy (pre-BIFF8) files; BIFF8 .xls stores strings as Unicode
      return { ...baseOptions, codepage: 65001 };
    case "XLSX":
      return baseOptions; // XLSX stores text as UTF-8 XML; no codepage needed
    default:
      return baseOptions; // SheetJS detects the actual format from the file contents
  }
}

// ❌ DON'T: Ignore encoding settings
const workbook = XLSX.read(data); // Missing encoding options
```

### **3. Fallback Encoding Strategy**

```typescript
import * as XLSX from "xlsx";
// ESM builds do not bundle the extended codepage tables (e.g. 949); load them explicitly
import * as cptable from "xlsx/dist/cpexcel.full.mjs";
XLSX.set_cptable(cptable);

// ✅ DO: Implement fallback encoding for robust Korean support
let workbook: XLSX.WorkBook;
try {
  workbook = XLSX.read(arrayBuffer, { type: "array", codepage: 65001 }); // UTF-8 first
} catch (error) {
  // Fall back to a legacy encoding: CP949 for Korean CSV, CP1252 otherwise
  const fallbackCodepage = fileFormat === "CSV" ? 949 : 1252;
  workbook = XLSX.read(arrayBuffer, { type: "array", codepage: fallbackCodepage });
}
// Caveat: a wrong codepage usually produces garbled text rather than an exception,
// so also check that the bytes are valid UTF-8 before trusting the first attempt.

// ❌ DON'T: Give up on first encoding failure
try {
  workbook = XLSX.read(arrayBuffer);
} catch (error) {
  throw error; // No fallback strategy
}
```

### **4. Workbook Validation (Defensive Programming)**

```typescript
// ✅ DO: Validate workbook object thoroughly
if (!workbook || typeof workbook !== "object") {
  throw new Error("워크북을 생성할 수 없습니다"); // "Unable to create the workbook"
}
if (!workbook.SheetNames || !Array.isArray(workbook.SheetNames) || workbook.SheetNames.length === 0) {
  throw new Error("시트 이름 정보가 없습니다"); // "No sheet name information"
}
if (!workbook.Sheets || typeof workbook.Sheets !== "object") {
  throw new Error("유효한 시트가 없습니다"); // "No valid sheets found"
}

// ❌ DON'T: Skip validation
const firstSheet = workbook.Sheets[workbook.SheetNames[0]]; // Potential runtime error
```

### **5. Optimized SheetJS Options**

```typescript
import * as XLSX from "xlsx";

// ✅ DO: Use performance-optimized options
const readOptions: XLSX.ParsingOptions = {
  type: "array",
  cellText: true, // ✅ Generate formatted text (`w`) for display
  cellNF: false, // ✅ Disable for performance
  cellHTML: false, // ✅ Disable for performance
  cellFormula: false, // ✅ Disable for performance
  cellStyles: false, // ✅ Disable for performance
  cellDates: true, // ✅ Enable for date handling
  sheetStubs: false, // ✅ Disable for performance
  bookProps: false, // ✅ Leave false: `true` parses only workbook metadata
  bookSheets: false, // ✅ Must stay false: `true` parses only sheet names, not cell data
  bookVBA: false, // ✅ Disable for performance
  raw: false, // ✅ Parse numbers and dates in plain-text formats such as CSV
  dense: false, // ✅ Default address-keyed sheets; `dense: true` stores rows as arrays for very large files
  WTF: false, // ✅ Do not throw on unsupported features (stability)
  UTC: false, // ✅ Use local time
};

// ❌ DON'T: Use default options without optimization
const workbook = XLSX.read(data, { type: "array" }); // Missing optimizations
```

### **6. Korean Data Processing**

```typescript
// ✅ DO: Process Korean data with proper settings
const jsonData = XLSX.utils.sheet_to_json(sheet, {
  header: 1, // Array of arrays (one array per row)
  defval: "", // Default value for empty cells
  blankrows: false, // Skip blank rows
  raw: false, // Return the formatted display text (`w`) instead of raw values
});

// ❌ DON'T: Use raw values when the UI should show what Excel displays
const jsonData = XLSX.utils.sheet_to_json(sheet, {
  raw: true, // Returns underlying values (numbers, dates), not the displayed text
});
```

## **Error Handling Patterns**

### **Specific Error Messages**

```typescript
// ✅ DO: Provide specific error messages
if (arrayBuffer.byteLength === 0) {
  throw new Error(`${fileFormat} 파일이 비어있습니다.`); // "The ${fileFormat} file is empty."
}
if (!workbook.SheetNames.length) {
  // "No sheet name information - the file is empty or corrupted."
  throw new Error("시트 이름 정보가 없습니다 - 파일이 비어있거나 손상되었습니다.");
}

// ❌ DON'T: Use generic error messages
throw new Error("File processing failed");
```

## **Performance Guidelines**

- **Use `type: "array"` for ArrayBuffer input**
- **Disable unnecessary features (`cellFormula`, `cellStyles`, etc.)**
- **Use `raw: false` in `sheet_to_json` when you need the displayed (formatted) text**
- **Set `blankrows: false` to skip empty rows**
- **Cache workbook objects when processing multiple sheets**

## **Testing Requirements**

- **Test with Korean filenames and Korean data content**
- **Test encoding fallback scenarios (UTF-8 → CP949 → CP1252)**
- **Test error handling for corrupted files**
- **Test all supported file formats (CSV, XLS, XLSX)**
- **Verify memory efficiency with large files**

## **Migration from LuckyExcel**

- **Replace all LuckyExcel `transformExcelToLucky()` calls with SheetJS**
- **Remove XLS-to-XLSX conversion logic (SheetJS reads XLS natively)**
- **Update data structures from the LuckyExcel format to standard JSON**
- **Maintain backward compatibility in component interfaces**

## **Common Pitfalls to Avoid**

- ❌ Using different processing libraries for different file types
- ❌ Ignoring codepage settings for Korean files
- ❌ Not implementing encoding fallback strategies
- ❌ Skipping workbook validation steps
- ❌ Using `raw: true` in `sheet_to_json` when the displayed text is expected
- ❌ Not handling empty or corrupted files gracefully

## **Best Practices**

- ✅ Log processing steps for debugging Korean encoding issues
- ✅ Use a consistent error message format across all file types
- ✅ Implement comprehensive test coverage for Korean scenarios
- ✅ Monitor performance with large Korean datasets
- ✅ Document encoding strategies for team knowledge sharing
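This relates to the **Fallback Encoding Strategy** rule. When the codepage is wrong, `XLSX.read` generally returns mis-decoded text instead of throwing, so a `catch` branch alone rarely triggers.

Below is a minimal sketch of an up-front alternative. It assumes CSV input arrives as an `ArrayBuffer` or `Uint8Array`:

- Validate the bytes as UTF-8 with the standard `TextDecoder`; `fatal: true` makes it throw on malformed input.
- Otherwise fall back to CP949.

The helper name `pickCsvCodepage` is hypothetical. Mapping every non-UTF-8 file to 949 is an assumption that fits Korean exports but not, for example, Western CP1252 files.

```typescript
// Hypothetical helper: choose a SheetJS codepage for a CSV buffer.
// Assumption: input that is not valid UTF-8 is Korean CP949.
function pickCsvCodepage(data: ArrayBuffer | Uint8Array): number {
  const bytes = data instanceof Uint8Array ? data : new Uint8Array(data);

  // A UTF-8 byte order mark is an unambiguous signal.
  if (bytes.length >= 3 && bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf) {
    return 65001;
  }

  try {
    // fatal: true makes decode() throw a TypeError on malformed UTF-8
    new TextDecoder("utf-8", { fatal: true }).decode(bytes);
    return 65001; // valid UTF-8 (pure ASCII and empty input also land here)
  } catch {
    return 949; // assumed legacy Korean CSV export
  }
}

// Usage with SheetJS (not executed here):
// const workbook = XLSX.read(buffer, { type: "array", codepage: pickCsvCodepage(buffer) });
```

The result is passed as the `codepage` option of `XLSX.read`. Deciding before parsing keeps the encoding choice in one place and avoids a second parse.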
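The checks in **Workbook Validation** can be bundled into one TypeScript assertion function, so every format goes through the same gate.

This sketch uses a hypothetical structural type `WorkbookLike` instead of importing `XLSX.WorkBook`, so it runs without SheetJS; a real workbook object satisfies it.

The first-sheet check goes beyond the original rules. It also rejects workbooks that carry sheet names without sheet data, such as those produced with `bookSheets: true`.

```typescript
// Hypothetical minimal structural type; a real XLSX.WorkBook satisfies it.
interface WorkbookLike {
  SheetNames: string[];
  Sheets: Record<string, unknown>;
}

// Hypothetical helper combining the validation steps into one type guard.
function assertValidWorkbook(workbook: unknown): asserts workbook is WorkbookLike {
  if (!workbook || typeof workbook !== "object") {
    throw new Error("워크북을 생성할 수 없습니다"); // "Unable to create the workbook"
  }
  const wb = workbook as Partial<WorkbookLike>;
  if (!Array.isArray(wb.SheetNames) || wb.SheetNames.length === 0) {
    throw new Error("시트 이름 정보가 없습니다"); // "No sheet name information"
  }
  if (!wb.Sheets || typeof wb.Sheets !== "object") {
    throw new Error("유효한 시트가 없습니다"); // "No valid sheets found"
  }
  // Rejects workbooks that list sheet names but contain no data for the first sheet
  const firstSheet = wb.Sheets[wb.SheetNames[0]];
  if (!firstSheet || typeof firstSheet !== "object") {
    throw new Error("유효한 시트가 없습니다"); // "No valid sheets found"
  }
}
```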
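For the **Cache workbook objects when processing multiple sheets** guideline, one possible shape is a small least-recently-used cache wrapped around the parse call. This is a sketch under several assumptions:

- The names `createWorkbookCache` and `FileIdentity` are hypothetical.
- The cache key uses name, size and `lastModified`, all present on a browser `File`. It can collide if a file is replaced by one with identical metadata.
- The default limit of 5 entries is arbitrary.

```typescript
// Hypothetical identity fields; a browser File object provides all three.
interface FileIdentity {
  name: string;
  size: number;
  lastModified: number;
}

// Hypothetical LRU cache so each file is parsed once and reused for every sheet.
function createWorkbookCache<T>(parse: (buffer: ArrayBuffer) => T, maxEntries = 5) {
  const cache = new Map<string, T>();

  return function getWorkbook(file: FileIdentity, buffer: ArrayBuffer): T {
    const key = `${file.name}:${file.size}:${file.lastModified}`;
    const hit = cache.get(key);
    if (hit !== undefined) {
      // Re-insert so the Map's insertion order tracks recency
      cache.delete(key);
      cache.set(key, hit);
      return hit;
    }
    const workbook = parse(buffer);
    cache.set(key, workbook);
    if (cache.size > maxEntries) {
      // Evict the least recently used entry (first key in insertion order)
      const oldest = cache.keys().next().value as string;
      cache.delete(oldest);
    }
    return workbook;
  };
}
```

In an app, `parse` could be `(buf) => XLSX.read(buf, readOptions)`. Switching between sheets of the same file then reuses one parsed workbook instead of re-reading the buffer.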
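To keep the **Specific Error Messages** pattern consistent across CSV, XLS and XLSX, errors can carry the detected format alongside the user-facing Korean message.

This is a minimal sketch. `FileProcessingError` and `ensureNonEmpty` are hypothetical names, and the `FileFormat` union mirrors the strings returned by `getFileFormat`.

```typescript
// Mirrors the strings returned by getFileFormat
type FileFormat = "CSV" | "XLS" | "XLSX" | "UNKNOWN";

// Hypothetical error type: user-facing (Korean) message plus the file format
class FileProcessingError extends Error {
  readonly format: FileFormat;

  constructor(format: FileFormat, message: string) {
    super(message);
    this.name = "FileProcessingError";
    this.format = format;
    // Keeps instanceof working even when compiled to an ES5 target
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Hypothetical guard: reject empty input before handing it to XLSX.read
function ensureNonEmpty(buffer: ArrayBuffer, format: FileFormat): void {
  if (buffer.byteLength === 0) {
    // "The <format> file is empty."
    throw new FileProcessingError(format, `${format} 파일이 비어있습니다.`);
  }
}
```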