How is mime type of an uploaded file determined by browser?

Mục lục bài viết

Chrome

Chrome (version 38 as of writing) has 3 ways to determine the MIME type and does so in a certain order. The snippet below is from file src/net/base/mime_util.cc, method MimeUtil::GetMimeTypeFromExtensionHelper.

// We implement the same algorithm as Mozilla for mapping a file extension to
// a mime type.  That is, we first check a hard-coded list (that cannot be
// overridden), and then if not found there, we defer to the system registry.
// Finally, we scan a secondary hard-coded list to catch types that we can
// deduce but that we also want to allow the OS to override.

The hard-coded lists come a bit earlier in the file: https://cs.chromium.org/chromium/src/net/base/mime_util.cc?l=170 (kPrimaryMappings and kSecondaryMappings).

An example: when uploading a CSV file from a Windows system with Microsoft Excel installed, Chrome will report this as application/vnd.ms-excel. This is because .csv is not specified in the first hard-coded list, so the browser falls back to the system registry. HKEY_CLASSES_ROOT\.csv has a value named Content Type that is set to application/vnd.ms-excel.

Internet Explorer

Again using the same example, the browser will report application/vnd.ms-excel. I think it’s reasonable to assume Internet Explorer (version 11 as of writing) uses the registry. Possibly it also makes use of a hard-coded list like Chrome and Firefox, but its closed source nature makes it hard to verify.

Firefox

As indicated in the Chrome code, Firefox (version 32 as of writing) works in a similar way. Snippet from file uriloader\exthandler\nsExternalHelperAppService.cpp, method nsExternalHelperAppService::GetTypeFromExtension

// OK. We want to try the following sources of mimetype information, in this order:
// 1. defaultMimeEntries array
// 2. User-set preferences (managed by the handler service)
// 3. OS-provided information
// 4. our "extras" array
// 5. Information from plugins
// 6. The "ext-to-type-mapping" category

The hard-coded lists come earlier in the file, somewhere near line 441. You’re looking for defaultMimeEntries and extraMimeEntries.

With my current profile, the browser will report text/csv because there’s an entry for it in mimeTypes.rdf (item 2 in the list above). With a fresh profile, which does not have this entry, the browser will report application/vnd.ms-excel (item 3 in the list).

Summary

The hard-coded lists in the browsers are pretty limited. Often, the MIME type sent by the browser will be the one reported by the OS. And this is exactly why, as stated in the question, the MIME type reported by the browser is unreliable.