各类文件头标准编码(16进制) - 判断文件流格式编码
各类文件格式的文件头标志
正好最近在年文件编码及转换方面的资料,弄到文件提取内容的格式判断,可以取文件流中编码的头部编码来进行文件格式的判断。
以下是收集的比较全的文件头标志十六进制编码。
再粘一段格式判断代码(python),可以根据实际自已判断:
def determine_image_type(stream_first_4_bytes):
"""Find out the image file type based on the magic number comparison of the first 4 (or 2) bytes"""
file_type = None
bytes_as_hex = binascii.b2a_hex(stream_first_4_bytes)
if bytes_as_hex.startswith(b"ffd8", 0, len(bytes_as_hex)):
file_type = '.jpeg'
#89504e47
elif bytes_as_hex.startswith(b"89504e47", 0, len(bytes_as_hex)):
file_type = '.png'
#47494638
elif bytes_as_hex.startswith(b"47494638", 0, len(bytes_as_hex)):
file_type = '.gif'
elif bytes_as_hex.startswith(b"424d", 0, len(bytes_as_hex)):
file_type = '.bmp'
#49492a00
elif bytes_as_hex.startswith(b"49492a", 0, len(bytes_as_hex)):
file_type ='.tif'
return file_type
def save_image(lt_image, page_number, images_folder):
"""Try to save the image data"""
result = None
if lt_image.stream:
file_stream = lt_image.stream.get_rawdata()
if file_stream:
file_ext = determine_image_type(file_stream[0:4])
print("show!!!!",file_stream[0:4])
if file_ext:
file_name = ''.join([str(page_number), '_', lt_image.name, file_ext])
if write_file(images_folder, file_name, file_stream, flags='wb'):
result = file_name
return result
JPEG (jpg),文件头:FFD8FF
- 文件头标识 (2 bytes): $ff, $d8 (SOI) (JPEG 文件标识)
- 文件结束标识 (2 bytes): $ff, $d9 (EOI)
TGA
- 未压缩的前5字节 00 00 02 00 00
- RLE压缩的前5字节 00 00 10 00 00
PNG (png),文件头:89504E47
- 文件头标识 (8 bytes) 89 50 4E 47 0D 0A 1A 0A
GIF (gif),文件头:47494638
文件头标识 (6 bytes) 47 49 46 38 39(37) 61
G I F 8 9 (7) a
Windows Bitmap (bmp),文件头:424D
文件头标识 (2 bytes) 42 4D
B M
PCX
- 文件头标识 (1 bytes) 0A
TIFF (tif),文件头:49492A00
- 文件头标识 (2 bytes) 4D 4D 或 49 49
ICO
- 文件头标识 (8 bytes) 00 00 01 00 01 00 20 20
CUR
- 文件头标识 (8 bytes) 00 00 02 00 01 00 20 20
IFF
文件头标识 (4 bytes) 46 4F 52 4D
F O R M
ANI
文件头标识 (4 bytes) 52 49 46 46
R I F
CAD (dwg),文件头:41433130
Adobe Photoshop (psd),文件头:38425053
Rich Text Format (rtf),文件头:7B5C727466
XML (xml),文件头:3C3F786D6C
HTML (html),文件头:68746D6C3E
Email [thorough only] (eml),文件头:44656C69766572792D646174653A
Outlook Express (dbx),文件头:CFAD12FEC5FD746F
Outlook (pst),文件头:2142444E
MS Word/Excel (xls.or.doc),文件头:D0CF11E0
MS Access (mdb),文件头:5374616E64617264204A
WordPerfect (wpd),文件头:FF575043
Postscript (eps.or.ps),文件头:252150532D41646F6265
Adobe Acrobat (pdf),文件头:255044462D312E
Quicken (qdf),文件头:AC9EBD8F
Windows Password (pwl),文件头:E3828596
ZIP Archive (zip),文件头:504B0304
RAR Archive (rar),文件头:52617221
Wave (wav),文件头:57415645
AVI (avi),文件头:41564920
Real Audio (ram),文件头:2E7261FD
Real Media (rm),文件头:2E524D46
MPEG (mpg),文件头:000001BA
MPEG (mpg),文件头:000001B3
Quicktime (mov),文件头:6D6F6F76
Windows Media (asf),文件头:3026B2758E66CF11
MIDI (mid),文件头:4D546864
2、从winhex中取出的文件头列表
File Type ExtensionsHeader
JPEG jpg;jpeg 0xFFD8FF
PNG png 0x89504E470D0A1A0A
GIF gif GIF8
TIFF tif;tiff 0x49492A00
TIFF tif;tiff 0x4D4D002A
Bit map bmp BM
AOL ART art 0x4A47040E000000
AOL ART art 0x4A47030E000000
PC Paintbrush pcx 0x0A050108
Graphics Metafile wmf 0xD7CDC69A
Graphics Metafile wmf 0x01000900
Graphics Metafile wmf 0x02000900
Enhanced Metafile emf 0x0100000058000000
Corel Draw cdr CDR
CAD dwg 0x41433130
Adobe Photoshop psd 8BPS
Rich Text Format rtf rtf
XML xml
HTML html;htm;php;php3;php4;phtml;shtml type
Email eml Delivery-date:
Outlook Express dbx 0xCFAD12FE
Outlookpst!BDN
MS Office/OLE2doc;xls;dot;ppt;xla;ppa;pps;pot;msi;sdw;db 0xD0CF11E0A1B11AE1
MS Access mdb;mda;mde;mdt Standard J
WordPerfect wpd 0xFF575043
OpenOffice Writer sxw writer
OpenOffice Calc sxc calc
OpenOffice Math sxm math
OpenOffice Impress sxi impress
OpenOffice Draw sxd draw
Adobe FrameMaker fm <MAKERFILE
PostScript eps.or.ps;ps;eps %!PS-Adobe
Adobe Acrobat pdf %PDF-1.
Quicken qdf 0xAC9EBD8F
QuickBooks Backup qbb 0x458600000600
Sage sly.or.srt.or.slt;sly;srt;slt0x53520100
Sage Backup 1 SAGEBACKUP
Lotus WordPro v9 lwp 0x576F726450726F
Lotus 123 v9 123 0x00001A00051004
Lotus 123 v5 wk4 0x00001A0002100400
Lotus 123 v3 wk3 0x00001A0000100400
Lotus 123 v1 wk1 0x2000604060
Windows Password pwl 0xE3828596
ZIP Archive zip;jar 0x504B0304
ZIP Archive (outdated) zip 0x504B3030
RAR Archive rar Rar!
GZ Archive gz;tgz 0x1F8B08
BZIP Archive bz2 BZh
ARJ Archive arj 0x60EA
7-ZIP Archive 7z 7z集'
Wave wav WAVE
AVI avi AVI
Real Audio ram;ra .ra?0
Real Media rm .RMF
MPEG mpg;mpeg 0x000001BA
MPEG mpg;mpeg 0x000001B3
Quicktime mov moov
Windows Media asf 0x3026B2758E66CF11
MIDI mid MThd
Win32 Executable exe;dll;drv;vxd;sys;ocx;vbxMZ
Win16 Executable exe;dll;drv;vxd;sys;ocx;vbxMZ
ELF Executable elf;; 0x7F454C4601010100
各种文件类型文件头标志位详细列表
FFD8FFFE00, .JPEG;.JPE;.JPG, "JPGGraphic File"
FFD8FFE000, .JPEG;.JPE;.JPG, "JPGGraphic File"
474946383961, .gif, "GIF 89A"
474946383761, .gif, "GIF 87A"
424D, .bmp, "Windows Bitmap"
4D5A,.exe;.com;.386;.ax;.acm;.sys;.dll;.drv;.flt;.fon;.ocx;.scr;.lrc;.vxd;
.cpl;.x32, "Executable File"
504B0304, .zip, "Zip Compressed"
3A42617365, .cnt, ""
D0CF11E0A1B11AE1,.doc;.xls;.xlt;.ppt;.apr, "MS Compound Document v1 or Lotus Approach APRfile"
0100000058000000, .emf, ""
03000000C466C456, .evt, ""
3F5F0300, .gid;.hlp;.lhp, "Windows HelpFile"
1F8B08, .gz, "GZ Compressed File"
28546869732066696C65, .hqx, ""
0000010000, .ico, "Icon File"
4C000000011402, .lnk, "Windows LinkFile"
25504446, .pdf, "Adobe PDF File"
5245474544495434, .reg, ""
7B5C727466,.rtf, "Rich Text Format File"
lh, .lzh, "Lz compression file"
MThd, .mid, ""
0A050108, .pcx, ""
25215053, .eps, "Adobe EPS File"
2112, .ain, "AIN Archive File"
1A02, .arc, "ARC/PKPAK Compressed 1"
1A03, .arc, "ARC/PKPAK Compressed 2"
1A04, .arc, "ARC/PKPAK Compressed 3"
1A08, .arc, "ARC/PKPAK Compressed 4"
1A09, .arc, "ARC/PKPAK Compressed 5"
60EA, .arj, "ARJ Compressed"
41564920, .avi, "Audio Video Interleave(AVI)"
425A68, .bz;.bz2, "Bzip Archive"
49536328, .cab, "Cabinet File"
4C01, .obj, "Compiled Object Module"
303730373037, .tar;.cpio, "CPIO ArchiveFile"
4352555348, .cru;.crush, "CRUSH ArchiveFile"
3ADE68B1, .dcx, "DCX Graphic File"
1F8B, .gz;.tar;.tgz, "Gzip ArchiveFile"
91334846, .hap, "HAP Archive File"
3C68746D6C3E,.htm;.html, "HyperText Markup Language 1"
3C48544D4C3E,.htm;.html, "HyperText Markup Language 2"
3C21444F4354, .htm;.html, "HyperText MarkupLanguage 3"
100, .ico, "ICON File"
5F27A889, .jar, "JAR Archive File"
2D6C68352D,.lha, "LHA Compressed"
20006040600, .wk1;.wks, "Lotus 123 v1 Worksheet"
00001A0007800100, .fm3, "Lotus 123 v3 FMTfile"
00001A0000100400, .wk3, "Lotus 123 v3Worksheet"
20006800200, .fmt, "Lotus 123 v4 FMTfile"
00001A0002100400, .wk4, "Lotus 123 v5"
5B7665725D, .ami, "Lotus Ami Pro"
300000041505052, .adx, "Lotus ApproachADX file"
1A0000030000, .nsf;.ntf, "Lotus NotesDatabase/Template"
4D47582069747064, .ds4, "MicrografixDesigner 4"
4D534346, .cab, "Microsoft CAB FileFormat"
4D546864, .mid, "Midi Audio File"
000001B3, .mpg;.mpeg, "MPEG Movie"
0902060000001000B9045C00, .xls, "MS Excel v2"
0904060000001000F6055C00, .xls, "MS Excel v4"
7FFE340A,.doc, "MS Word"
1234567890FF, .doc, "MS Word 6.0"
31BE000000AB0000, .doc, "MS Word forDOS 6.0"
1A00000300001100, .nsf, "NotesDatabase"
7E424B00, .psp, "PaintShop Pro Image File"
504B0304, .zip, "PKZIP Compressed"
89504E470D0A, .png, "PNG Image File"
6D646174, .mov, "QuickTime Movie"
6D646174, .qt, "Quicktime MovieFile"
52617221, .rar, "RAR Archive File"
2E7261FD, .ra;.ram, "Real AudioFile"
EDABEEDB, .rpm, "RPM Archive File"
2E736E64, .au, "SoundMachine AudioFile"
53495421, .sit, "Stuffit v1 ArchiveFile"
53747566664974, .sit, "Stuffit v5Archive File"
1F9D, .z, "TAR Compressed ArchiveFile"
49492A, .tif;.tiff, "TIFF (Intel)"
4D4D2A,.tif;.tiff, "TIFF (Motorola)"
554641, .ufa, "UFA Archive File"
57415645666D74, .wav, "Wave Files"
D7CDC69A,.wmf, "Windows Meta File"
4C000000, .lnk, "Windows Shortcut (LinkFile)"
504B3030504B0304, .zip, "WINZIPCompressed"
FF575047, .wpg, "WordPerfectGraphics"
FF575043, .wp, "WordPerfect v5 orv6"
3C3F786D6C,.xml, "XML Document"
FFFE3C0052004F004F0054005300540055004200, .xml, "XML Document(ROOTSTUB)"
3C21454E54495459, .dtd, "XML DTD"
5A4F4F20, .zoo, "ZOO Archive File"
本作品采用 知识共享署名-相同方式共享 4.0 国际许可协议 进行许可。
请问一下.kjb和.ktr的文件头怎么提取啊
这个没有找过,原理一样,二进制读取,文件头编码识别。