各类文件格式的文件头标志

正好最近在年文件编码及转换方面的资料,弄到文件提取内容的格式判断,可以取文件流中编码的头部编码来进行文件格式的判断。
以下是收集的比较全的文件头标志十六进制编码

再粘一段格式判断代码(python),可以根据实际自已判断:

def determine_image_type(stream_first_4_bytes):

"""Find out the image file type based on the magic number comparison of the first 4 (or 2) bytes"""
file_type = None
bytes_as_hex = binascii.b2a_hex(stream_first_4_bytes)

if bytes_as_hex.startswith(b"ffd8", 0, len(bytes_as_hex)):
    file_type = '.jpeg'
    #89504e47
elif bytes_as_hex.startswith(b"89504e47", 0, len(bytes_as_hex)):
    file_type = '.png'
    #47494638
elif bytes_as_hex.startswith(b"47494638", 0, len(bytes_as_hex)):
    file_type = '.gif'
elif bytes_as_hex.startswith(b"424d", 0, len(bytes_as_hex)):
    file_type = '.bmp'
    #49492a00
elif bytes_as_hex.startswith(b"49492a", 0, len(bytes_as_hex)):
    file_type ='.tif'
return file_type

def save_image(lt_image, page_number, images_folder):

"""Try to save the image data"""
result = None
if lt_image.stream:
    file_stream = lt_image.stream.get_rawdata()
    if file_stream:
        file_ext = determine_image_type(file_stream[0:4])
        print("show!!!!",file_stream[0:4])
        if file_ext:
            file_name = ''.join([str(page_number), '_', lt_image.name, file_ext])
            if write_file(images_folder, file_name, file_stream, flags='wb'):
                result = file_name
return result

JPEG (jpg),文件头:FFD8FF

  • 文件头标识 (2 bytes): $ff, $d8 (SOI) (JPEG 文件标识)
  • 文件结束标识 (2 bytes): $ff, $d9 (EOI)

TGA

  • 未压缩的前5字节 00 00 02 00 00
  • RLE压缩的前5字节 00 00 10 00 00

PNG (png),文件头:89504E47

  • 文件头标识 (8 bytes) 89 50 4E 47 0D 0A 1A 0A

GIF (gif),文件头:47494638

  • 文件头标识 (6 bytes) 47 49 46 38 39(37) 61

                                     G    I    F     8    9 (7)     a

    Windows Bitmap (bmp),文件头:424D

  • 文件头标识 (2 bytes) 42 4D

                                     B    M

    PCX

  • 文件头标识 (1 bytes) 0A

TIFF (tif),文件头:49492A00

  • 文件头标识 (2 bytes) 4D 4D 或 49 49

ICO

  • 文件头标识 (8 bytes) 00 00 01 00 01 00 20 20

CUR

  • 文件头标识 (8 bytes) 00 00 02 00 01 00 20 20

IFF

  • 文件头标识 (4 bytes) 46 4F 52 4D

                                     F    O   R    M
    

ANI

  • 文件头标识 (4 bytes) 52 49 46 46

                                    R     I     F
    

CAD (dwg),文件头:41433130

Adobe Photoshop (psd),文件头:38425053

Rich Text Format (rtf),文件头:7B5C727466

XML (xml),文件头:3C3F786D6C

HTML (html),文件头:68746D6C3E

Email [thorough only] (eml),文件头:44656C69766572792D646174653A

Outlook Express (dbx),文件头:CFAD12FEC5FD746F

Outlook (pst),文件头:2142444E

MS Word/Excel (xls.or.doc),文件头:D0CF11E0

MS Access (mdb),文件头:5374616E64617264204A

WordPerfect (wpd),文件头:FF575043

Postscript (eps.or.ps),文件头:252150532D41646F6265

Adobe Acrobat (pdf),文件头:255044462D312E

Quicken (qdf),文件头:AC9EBD8F

Windows Password (pwl),文件头:E3828596

ZIP Archive (zip),文件头:504B0304

RAR Archive (rar),文件头:52617221

Wave (wav),文件头:57415645

AVI (avi),文件头:41564920

Real Audio (ram),文件头:2E7261FD

Real Media (rm),文件头:2E524D46

MPEG (mpg),文件头:000001BA

MPEG (mpg),文件头:000001B3

Quicktime (mov),文件头:6D6F6F76

Windows Media (asf),文件头:3026B2758E66CF11

MIDI (mid),文件头:4D546864

2、从winhex中取出的文件头列表

File Type ExtensionsHeader

JPEG jpg;jpeg 0xFFD8FF

PNG png 0x89504E470D0A1A0A

GIF gif GIF8

TIFF tif;tiff 0x49492A00

TIFF tif;tiff 0x4D4D002A

Bit map bmp BM

AOL ART art 0x4A47040E000000

AOL ART art 0x4A47030E000000

PC Paintbrush pcx 0x0A050108

Graphics Metafile wmf 0xD7CDC69A

Graphics Metafile wmf 0x01000900

Graphics Metafile wmf 0x02000900

Enhanced Metafile emf 0x0100000058000000

Corel Draw cdr CDR

CAD dwg 0x41433130

Adobe Photoshop psd 8BPS

Rich Text Format rtf rtf

XML xml

HTML html;htm;php;php3;php4;phtml;shtml type

Email eml Delivery-date:

Outlook Express dbx 0xCFAD12FE

Outlookpst!BDN

MS Office/OLE2doc;xls;dot;ppt;xla;ppa;pps;pot;msi;sdw;db 0xD0CF11E0A1B11AE1

MS Access mdb;mda;mde;mdt Standard J

WordPerfect wpd 0xFF575043

OpenOffice Writer sxw writer

OpenOffice Calc sxc calc

OpenOffice Math sxm math

OpenOffice Impress sxi impress

OpenOffice Draw sxd draw

Adobe FrameMaker fm <MAKERFILE

PostScript eps.or.ps;ps;eps %!PS-Adobe

Adobe Acrobat pdf %PDF-1.

Quicken qdf 0xAC9EBD8F

QuickBooks Backup qbb 0x458600000600

Sage sly.or.srt.or.slt;sly;srt;slt0x53520100

Sage Backup 1 SAGEBACKUP

Lotus WordPro v9 lwp 0x576F726450726F

Lotus 123 v9 123 0x00001A00051004

Lotus 123 v5 wk4 0x00001A0002100400

Lotus 123 v3 wk3 0x00001A0000100400

Lotus 123 v1 wk1 0x2000604060

Windows Password pwl 0xE3828596

ZIP Archive zip;jar 0x504B0304

ZIP Archive (outdated) zip 0x504B3030

RAR Archive rar Rar!

GZ Archive gz;tgz 0x1F8B08

BZIP Archive bz2 BZh

ARJ Archive arj 0x60EA

7-ZIP Archive 7z 7z集'

Wave wav WAVE

AVI avi AVI

Real Audio ram;ra .ra?0

Real Media rm .RMF

MPEG mpg;mpeg 0x000001BA

MPEG mpg;mpeg 0x000001B3

Quicktime mov moov

Windows Media asf 0x3026B2758E66CF11

MIDI mid MThd

Win32 Executable exe;dll;drv;vxd;sys;ocx;vbxMZ

Win16 Executable exe;dll;drv;vxd;sys;ocx;vbxMZ

ELF Executable elf;; 0x7F454C4601010100

各种文件类型文件头标志位详细列表

FFD8FFFE00, .JPEG;.JPE;.JPG, "JPGGraphic File"

FFD8FFE000, .JPEG;.JPE;.JPG, "JPGGraphic File"

474946383961, .gif, "GIF 89A"

474946383761, .gif, "GIF 87A"

424D, .bmp, "Windows Bitmap"

4D5A,.exe;.com;.386;.ax;.acm;.sys;.dll;.drv;.flt;.fon;.ocx;.scr;.lrc;.vxd;

.cpl;.x32, "Executable File"

504B0304, .zip, "Zip Compressed"

3A42617365, .cnt, ""

D0CF11E0A1B11AE1,.doc;.xls;.xlt;.ppt;.apr, "MS Compound Document v1 or Lotus Approach APRfile"

0100000058000000, .emf, ""

03000000C466C456, .evt, ""

3F5F0300, .gid;.hlp;.lhp, "Windows HelpFile"

1F8B08, .gz, "GZ Compressed File"

28546869732066696C65, .hqx, ""

0000010000, .ico, "Icon File"

4C000000011402, .lnk, "Windows LinkFile"

25504446, .pdf, "Adobe PDF File"

5245474544495434, .reg, ""

7B5C727466,.rtf, "Rich Text Format File"

lh, .lzh, "Lz compression file"

MThd, .mid, ""

0A050108, .pcx, ""

25215053, .eps, "Adobe EPS File"

2112, .ain, "AIN Archive File"

1A02, .arc, "ARC/PKPAK Compressed 1"

1A03, .arc, "ARC/PKPAK Compressed 2"

1A04, .arc, "ARC/PKPAK Compressed 3"

1A08, .arc, "ARC/PKPAK Compressed 4"

1A09, .arc, "ARC/PKPAK Compressed 5"

60EA, .arj, "ARJ Compressed"

41564920, .avi, "Audio Video Interleave(AVI)"

425A68, .bz;.bz2, "Bzip Archive"

49536328, .cab, "Cabinet File"

4C01, .obj, "Compiled Object Module"

303730373037, .tar;.cpio, "CPIO ArchiveFile"

4352555348, .cru;.crush, "CRUSH ArchiveFile"

3ADE68B1, .dcx, "DCX Graphic File"

1F8B, .gz;.tar;.tgz, "Gzip ArchiveFile"

91334846, .hap, "HAP Archive File"

3C68746D6C3E,.htm;.html, "HyperText Markup Language 1"

3C48544D4C3E,.htm;.html, "HyperText Markup Language 2"

3C21444F4354, .htm;.html, "HyperText MarkupLanguage 3"

100, .ico, "ICON File"

5F27A889, .jar, "JAR Archive File"

2D6C68352D,.lha, "LHA Compressed"

20006040600, .wk1;.wks, "Lotus 123 v1 Worksheet"

00001A0007800100, .fm3, "Lotus 123 v3 FMTfile"

00001A0000100400, .wk3, "Lotus 123 v3Worksheet"

20006800200, .fmt, "Lotus 123 v4 FMTfile"

00001A0002100400, .wk4, "Lotus 123 v5"

5B7665725D, .ami, "Lotus Ami Pro"

300000041505052, .adx, "Lotus ApproachADX file"

1A0000030000, .nsf;.ntf, "Lotus NotesDatabase/Template"

4D47582069747064, .ds4, "MicrografixDesigner 4"

4D534346, .cab, "Microsoft CAB FileFormat"

4D546864, .mid, "Midi Audio File"

000001B3, .mpg;.mpeg, "MPEG Movie"

0902060000001000B9045C00, .xls, "MS Excel v2"

0904060000001000F6055C00, .xls, "MS Excel v4"

7FFE340A,.doc, "MS Word"

1234567890FF, .doc, "MS Word 6.0"

31BE000000AB0000, .doc, "MS Word forDOS 6.0"

1A00000300001100, .nsf, "NotesDatabase"

7E424B00, .psp, "PaintShop Pro Image File"

504B0304, .zip, "PKZIP Compressed"

89504E470D0A, .png, "PNG Image File"

6D646174, .mov, "QuickTime Movie"

6D646174, .qt, "Quicktime MovieFile"

52617221, .rar, "RAR Archive File"

2E7261FD, .ra;.ram, "Real AudioFile"

EDABEEDB, .rpm, "RPM Archive File"

2E736E64, .au, "SoundMachine AudioFile"

53495421, .sit, "Stuffit v1 ArchiveFile"

53747566664974, .sit, "Stuffit v5Archive File"

1F9D, .z, "TAR Compressed ArchiveFile"

49492A, .tif;.tiff, "TIFF (Intel)"

4D4D2A,.tif;.tiff, "TIFF (Motorola)"

554641, .ufa, "UFA Archive File"

57415645666D74, .wav, "Wave Files"

D7CDC69A,.wmf, "Windows Meta File"

4C000000, .lnk, "Windows Shortcut (LinkFile)"

504B3030504B0304, .zip, "WINZIPCompressed"

FF575047, .wpg, "WordPerfectGraphics"

FF575043, .wp, "WordPerfect v5 orv6"

3C3F786D6C,.xml, "XML Document"

FFFE3C0052004F004F0054005300540055004200, .xml, "XML Document(ROOTSTUB)"

3C21454E54495459, .dtd, "XML DTD"

5A4F4F20, .zoo, "ZOO Archive File"