feat: Improve file processor - stable Excel file handling

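- Apply the previously working code logic to the current processor
- Introduce a LuckyExcel-first pattern with SheetJS as the fallback
- Stable handling of all formats: CSV, XLS, and XLSX
- Normalize Korean sheet names and strengthen workbook structure validation
- Simplify complex SheetJS options to improve stability
- Create an empty sheet on error so the app does not crash
- Update the test environment and Cursor rules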

Technical improvements:
- Stable data conversion via the convertSheetJSToLuckyExcel function
- Improved Korean support through the UTF-8 codepage setting
- Defensive programming applied via the validateWorkbook function
sheetEasy AI Team
2025-06-20 14:32:33 +09:00
parent f288103e55
commit 3a8c6af7ea
16 changed files with 5249 additions and 133 deletions


@@ -0,0 +1,195 @@
---
description:
globs:
alwaysApply: false
---
# SheetJS Unified File Processing Rules
## **Critical Requirements for Excel/CSV File Processing**
- **Always use SheetJS XLSX.read() for all file types (CSV, XLS, XLSX)**
- **Implement comprehensive Unicode/Korean support through proper codepage settings**
- **Apply defensive programming patterns to prevent processing errors**
- **Use optimized SheetJS options for performance and reliability**
## **SheetJS Unified Processing Chain**
### **1. File Type Detection and Format Handling**
```typescript
// ✅ DO: Detect file format and apply appropriate settings
function getFileFormat(file: File): string {
  const extension = file.name.toLowerCase().split(".").pop();
  switch (extension) {
    case "csv":
      return "CSV";
    case "xls":
      return "XLS";
    case "xlsx":
      return "XLSX";
    default:
      return "UNKNOWN";
  }
}

// ❌ DON'T: Use different processing chains for different formats
function processXLSX(file: File) { /* LuckyExcel */ }
function processXLS(file: File) { /* SheetJS conversion */ }
function processCSV(file: File) { /* Custom parser */ }
```
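File extensions can be wrong (a renamed file, or a `.xls` that is really HTML). As an optional cross-check, here is a minimal sketch that inspects the leading bytes: XLSX files are ZIP archives (`PK\x03\x04`) and BIFF8 XLS files are OLE/CFB containers (`D0 CF 11 E0 A1 B1 1A E1`). The helpers `sniffContainer` and `formatMatchesContent` are hypothetical, not part of SheetJS or this codebase. ZIP also covers XLSB and ODS, CFB also covers encrypted XLSX, and SheetJS can read some non-CFB `.xls` variants, so a mismatch is best treated as a warning rather than a hard error.

```typescript
// Hypothetical helper: classify a file by its leading "magic" bytes.
type Container = "ZIP" | "CFB" | "OTHER";

const ZIP_MAGIC = [0x50, 0x4b, 0x03, 0x04]; // "PK\x03\x04"
const CFB_MAGIC = [0xd0, 0xcf, 0x11, 0xe0, 0xa1, 0xb1, 0x1a, 0xe1]; // OLE compound file

function startsWithBytes(bytes: Uint8Array, magic: number[]): boolean {
  return bytes.length >= magic.length && magic.every((b, i) => bytes[i] === b);
}

function sniffContainer(bytes: Uint8Array): Container {
  if (startsWithBytes(bytes, ZIP_MAGIC)) return "ZIP";
  if (startsWithBytes(bytes, CFB_MAGIC)) return "CFB";
  return "OTHER"; // plain text (e.g. CSV), HTML, or something else
}

// Hypothetical cross-check between the extension-based format and the content.
function formatMatchesContent(format: string, bytes: Uint8Array): boolean {
  const container = sniffContainer(bytes);
  switch (format) {
    case "XLSX":
      return container === "ZIP";
    case "XLS":
      return container === "CFB";
    case "CSV":
      return container === "OTHER";
    default:
      return false;
  }
}
```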
### **2. Korean/Unicode Support Configuration**
```typescript
import * as XLSX from "xlsx";

// ✅ DO: Configure proper codepage for Korean support
function getOptimalReadOptions(fileType: string): XLSX.ParsingOptions {
  const baseOptions: XLSX.ParsingOptions = {
    type: "array",
    cellText: true, // Enable text generation
    raw: false, // Parse values (numbers, dates) in plain-text formats such as CSV
    codepage: 65001, // UTF-8 codepage (Korean support)
  };
  switch (fileType) {
    case "CSV":
      return { ...baseOptions, codepage: 65001 }; // UTF-8 for CSV
    case "XLS":
      return { ...baseOptions, codepage: 65001 }; // UTF-8 for XLS
    case "XLSX":
    default:
      return baseOptions; // XLSX natively stores text as UTF-8
  }
}

// ❌ DON'T: Ignore encoding settings
const workbook = XLSX.read(data); // Missing encoding options
```
### **3. Fallback Encoding Strategy**
```typescript
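// ✅ DO: Implement fallback encoding for robust Korean support
// (fileFormat comes from getFileFormat(file))
let workbook: XLSX.WorkBook;
try {
  workbook = XLSX.read(arrayBuffer, { type: "array", codepage: 65001 }); // UTF-8 first
} catch (error) {
  // Fall back to a legacy codepage: CP949 for Korean CSV, CP1252 otherwise
  const fallbackCodepage = fileFormat === "CSV" ? 949 : 1252;
  workbook = XLSX.read(arrayBuffer, { type: "array", codepage: fallbackCodepage });
}
// Note: decoding with the wrong codepage usually produces garbled text rather than
// an exception, so this catch alone may not detect a CP949 file read as UTF-8.

// ❌ DON'T: Give up on first encoding failure
try {
  workbook = XLSX.read(arrayBuffer);
} catch (error) {
  throw error; // No fallback strategy
}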
```
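Because `XLSX.read` generally does not throw when given the wrong codepage, the `catch` above rarely fires for CP949 CSV files. A more reliable approach is to inspect the bytes before calling SheetJS. The following is a minimal sketch that assumes a browser or Node.js runtime with the standard `TextDecoder`. A UTF-8 BOM or strictly valid UTF-8 selects 65001, and anything else falls back to 949. The function names `hasUtf8Bom`, `isValidUtf8`, and `pickCsvCodepage` are hypothetical. ASCII-only files are valid UTF-8 and decode identically under CP949, so choosing 65001 for them is safe.

```typescript
// Hypothetical helpers: choose a SheetJS codepage for CSV bytes before parsing.
// 65001 = UTF-8, 949 = CP949 (the Korean Windows "ANSI" codepage).
function hasUtf8Bom(bytes: Uint8Array): boolean {
  return bytes.length >= 3 && bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf;
}

function isValidUtf8(bytes: Uint8Array): boolean {
  try {
    // fatal: true makes decode() throw a TypeError on malformed UTF-8
    new TextDecoder("utf-8", { fatal: true }).decode(bytes);
    return true;
  } catch {
    return false;
  }
}

function pickCsvCodepage(bytes: Uint8Array): 65001 | 949 {
  if (hasUtf8Bom(bytes) || isValidUtf8(bytes)) return 65001;
  return 949; // not valid UTF-8: assume a legacy Korean export
}
```

To use it, pass `codepage: pickCsvCodepage(new Uint8Array(arrayBuffer))` together with `type: "array"` to `XLSX.read`. Depending on the SheetJS build, non-UTF-8 codepages such as 949 may also require the library's codepage tables to be loaded.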
### **4. Workbook Validation (Defensive Programming)**
```typescript
// ✅ DO: Validate workbook object thoroughly
if (!workbook || typeof workbook !== "object") {
  throw new Error("워크북을 생성할 수 없습니다"); // "Unable to create the workbook"
}
if (!workbook.SheetNames || !Array.isArray(workbook.SheetNames) || workbook.SheetNames.length === 0) {
  throw new Error("시트 이름 정보가 없습니다"); // "No sheet name information"
}
if (!workbook.Sheets || typeof workbook.Sheets !== "object") {
  throw new Error("유효한 시트가 없습니다"); // "No valid sheets"
}

// ❌ DON'T: Skip validation
const firstSheet = workbook.Sheets[workbook.SheetNames[0]]; // Potential runtime error
```
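The checks above can be wrapped in a reusable function. The commit message names a `validateWorkbook` function, but its actual implementation is not shown here, so the following is only a minimal sketch. It uses a hypothetical `WorkbookLike` type that mirrors just the `SheetNames`/`Sheets` fields of SheetJS's `WorkBook`, so it runs without the library. It also checks that the first listed sheet actually exists in `Sheets`, which is the lookup the DON'T example performs unguarded.

```typescript
// Hypothetical structural type: the subset of SheetJS's WorkBook this check needs.
interface WorkbookLike {
  SheetNames?: unknown;
  Sheets?: unknown;
}

// Throws with the same Korean messages as the rule above; returns the first sheet on success.
function validateWorkbook(workbook: unknown): object {
  if (!workbook || typeof workbook !== "object") {
    throw new Error("워크북을 생성할 수 없습니다"); // "Unable to create the workbook"
  }
  const { SheetNames, Sheets } = workbook as WorkbookLike;
  if (!Array.isArray(SheetNames) || SheetNames.length === 0) {
    throw new Error("시트 이름 정보가 없습니다"); // "No sheet name information"
  }
  if (!Sheets || typeof Sheets !== "object") {
    throw new Error("유효한 시트가 없습니다"); // "No valid sheets"
  }
  const first = (Sheets as Record<string, unknown>)[SheetNames[0]];
  if (!first || typeof first !== "object") {
    throw new Error("유효한 시트가 없습니다"); // first listed sheet is missing
  }
  return first as object;
}
```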
### **5. Optimized SheetJS Options**
```typescript
// ✅ DO: Use performance-optimized options
const readOptions: XLSX.ParsingOptions = {
  type: "array",
  cellText: true, // ✅ Generate formatted text (cell.w)
  cellNF: false, // ✅ Disable for performance
  cellHTML: false, // ✅ Disable for performance
  cellFormula: false, // ✅ Disable for performance
  cellStyles: false, // ✅ Disable for performance
  cellDates: true, // ✅ Enable for date handling
  sheetStubs: false, // ✅ Disable for performance
  bookProps: false, // ✅ Disable for performance
  bookSheets: false, // ✅ Keep false: true parses only sheet names and leaves Sheets empty
  bookVBA: false, // ✅ Disable for performance
  raw: false, // ✅ Parse values (numbers, dates) in plain-text formats such as CSV
  dense: false, // ✅ Default sparse layout; keeps sheet["A1"] lookups working
  WTF: false, // ✅ Do not throw on unexpected file features (stability)
  UTC: false, // ✅ Use local time
};

// ❌ DON'T: Use default options without optimization
const workbook = XLSX.read(data, { type: "array" }); // Missing optimizations
```
### **6. Korean Data Processing**
```typescript
// ✅ DO: Process Korean data with proper settings
const jsonData = XLSX.utils.sheet_to_json(sheet, {
  header: 1, // Array-of-arrays format
  defval: "", // Default value for empty cells
  blankrows: false, // Remove blank rows
  raw: false, // Return formatted text (cell.w), as displayed in Excel
});

// ❌ DON'T: Use raw values when displayed text is expected
const rawData = XLSX.utils.sheet_to_json(sheet, {
  raw: true, // Returns underlying values (e.g. date serial numbers), not display text
});
```
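With `header: 1`, `sheet_to_json` returns an array of row arrays, and a common next step is turning the first row into object keys. This sketch (the `rowsToRecords` name is hypothetical) also applies Unicode NFC normalization to header text. Hangul can arrive in decomposed (NFD) form, with separate jamo code points, and such headers look identical to composed ones while comparing unequal. Empty header cells get a placeholder key `columnN`, which is an assumption of this sketch.

```typescript
type Row = unknown[];

// Hypothetical helper: convert sheet_to_json(..., { header: 1 }) output into keyed records.
function rowsToRecords(rows: Row[]): Record<string, string>[] {
  if (rows.length === 0) return [];
  // NFC-normalize and trim header cells so decomposed Hangul matches composed Hangul
  const headers = rows[0].map((h, i) => {
    const text = String(h ?? "").normalize("NFC").trim();
    return text === "" ? `column${i + 1}` : text;
  });
  return rows.slice(1).map((row) => {
    const record: Record<string, string> = {};
    headers.forEach((key, i) => {
      record[key] = String(row[i] ?? ""); // missing cells become "" (like defval: "")
    });
    return record;
  });
}
```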
## **Error Handling Patterns**
### **Specific Error Messages**
```typescript
// ✅ DO: Provide specific error messages
if (arrayBuffer.byteLength === 0) {
  throw new Error(`${fileFormat} 파일이 비어있습니다.`); // "The {format} file is empty."
}
if (!workbook.SheetNames.length) {
  // "No sheet name information - the file is empty or corrupted."
  throw new Error("시트 이름 정보가 없습니다 - 파일이 비어있거나 손상되었습니다.");
}

// ❌ DON'T: Use generic error messages
throw new Error("File processing failed");
```
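The commit message describes creating an empty sheet when processing fails so that the app does not crash. Here is a minimal sketch of that pattern. `SheetData`, `ProcessResult`, `processWithFallback`, and the default sheet name `Sheet1` are hypothetical, not the project's actual types. The specific error message is preserved for display, while callers always receive at least one sheet to render.

```typescript
// Hypothetical minimal sheet shape consumed by the UI layer.
interface SheetData {
  name: string;
  rows: string[][];
}

interface ProcessResult {
  sheets: SheetData[]; // always contains at least one sheet
  error?: string; // specific message for the user when processing failed
}

// Runs a processing step; on failure returns one empty sheet instead of throwing.
function processWithFallback(run: () => SheetData[], fallbackName = "Sheet1"): ProcessResult {
  const emptySheet = (): SheetData[] => [{ name: fallbackName, rows: [] }];
  try {
    const sheets = run();
    if (sheets.length > 0) return { sheets };
    return { sheets: emptySheet(), error: "시트 이름 정보가 없습니다" }; // "No sheet name information"
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { sheets: emptySheet(), error: message };
  }
}
```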
## **Performance Guidelines**
- **Use `type: "array"` for ArrayBuffer input**
- **Disable unnecessary features (cellFormula, cellStyles, etc.)**
- **Use `raw: false` when formatted (displayed) cell text is needed**
- **Set `blankrows: false` to skip empty rows**
- **Cache workbook objects when processing multiple sheets**
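One way to follow the caching guideline is to memoize the parsed workbook per file, so that switching between sheets does not re-read the bytes. This sketch is generic over the parsed type and takes the parse function as a parameter. In the app it would wrap `XLSX.read`, and `T` can be a `Promise` if reading the file is asynchronous. The key built from name, size, and `lastModified` is an assumption about what identifies a file, and `WorkbookCache` is a hypothetical name. Eviction is first-in-first-out and bounded by `maxEntries` to limit memory.

```typescript
// Identifying fields of a file (a browser File satisfies this shape).
interface FileKeyParts {
  name: string;
  size: number;
  lastModified: number;
}

// Hypothetical cache: parse each file once, reuse the result for every sheet access.
class WorkbookCache<T> {
  private readonly entries = new Map<string, T>();
  private readonly parse: (file: FileKeyParts) => T;
  private readonly maxEntries: number;

  constructor(parse: (file: FileKeyParts) => T, maxEntries = 5) {
    this.parse = parse;
    this.maxEntries = maxEntries;
  }

  get(file: FileKeyParts): T {
    const key = `${file.name}:${file.size}:${file.lastModified}`;
    if (this.entries.has(key)) return this.entries.get(key) as T;
    const parsed = this.parse(file);
    this.entries.set(key, parsed);
    // Evict the oldest inserted entry (Map preserves insertion order)
    if (this.entries.size > this.maxEntries) {
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
    return parsed;
  }
}
```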
## **Testing Requirements**
- **Test with Korean filenames and Korean data content**
- **Test encoding fallback scenarios (UTF-8 → CP949 for CSV, UTF-8 → CP1252 for other formats)**
- **Test error handling for corrupted files**
- **Test all supported file formats (CSV, XLS, XLSX)**
- **Verify memory efficiency with large files**
## **Migration from LuckyExcel**
- **Replace all LuckyExcel transformExcelToLucky() calls with SheetJS**
- **Remove XLS-to-XLSX conversion logic (SheetJS handles natively)**
- **Update data structure from LuckyExcel format to standard JSON**
- **Maintain backward compatibility in component interfaces**
## **Common Pitfalls to Avoid**
- ❌ Using different processing libraries for different file types
- ❌ Ignoring codepage settings for Korean files
- ❌ Not implementing encoding fallback strategies
- ❌ Skipping workbook validation steps
- ❌ Using `raw: true` where formatted display text is expected (it returns underlying values such as date serial numbers)
- ❌ Not handling empty or corrupted files gracefully
## **Best Practices**
- ✅ Log processing steps for debugging Korean encoding issues
- ✅ Use consistent error message format across all file types
- ✅ Implement comprehensive test coverage for Korean scenarios
- ✅ Monitor performance with large Korean datasets
- ✅ Document encoding strategies for team knowledge sharing