Full TIFF Support is Coming!

Today I had the first milestone meeting with the folks at Hallo Welt about the TIFF support they are implementing for MediaWiki. This is one of the projects Wikimedia Germany offered contracts for a while back. Now we are starting to see the first results. Cool!
So, what will full TIFF support give us? Nothing spectacular, but something quite useful. TIFF is an image format widely used by museums and in scientific research. It’s also the de facto standard in print and reproduction. It is, however, rarely seen on the web, and browsers generally are not able to display TIFF images. Still, the need to deal with TIFF files has increased lately, as we get more and more media from museums and archives. Especially for the people who work on image restoration, it is important to have the original digital version of the image around, which is usually a TIFF file. So TIFF uploads were enabled on Commons a few months ago. But MediaWiki can’t render them, and neither can browsers, so they can’t be used as images on the wiki.
So, what Hallo Welt is doing now is implementing rendering support for TIFF files, which is not so easy because TIFF files may contain multiple images or pages, similar to PDF files or the DjVu files we use to represent scanned books. But the project is coming along pretty well, and it looks like it will also bring some small improvements to the existing support for PDF and DjVu files. We are also experimenting with automatic user interface testing using the Selenium framework. If this works out well, we may use it for more things on MediaWiki in the future.
The project is scheduled to be completed in November, and I hope we will be testing it on the live sites soon after. So, look out for more pretty pictures!

Archive notice: This is an archived post from blog.wikimedia.org, which operated under different editorial and content guidelines than Diff.

11 Comments

Will you be supporting 16-bit TIFFs now or only 8-bit? A portion of Library of Congress files currently require conversion.

@Durova: Not sure. It will support whatever ImageMagick supports.

I think ImageMagick can handle up to 32 bits depending on the version: http://www.imagemagick.org/script/architecture.php
/Micke

That’s great, but why is it possible to do this when PNG, the much more efficient lossless format, which generally has half the file size of TIFF and thus allows twice as big a file to be uploaded, is still stuck at the appallingly low limit of 12.5 megapixels, and requests to fix this on Bugzilla just get met with complete stonewalling and talk about “maybe improving the message”? We’re thus providing a perverse incentive for providing lower resolution lossless scans: You won’t have to deal with idiots deleting your files because they don’t display, and, worse, don’t display…

@Adam: first: yes, we are thinking about doing something about it. The idea is to render a medium-sized (~6 megapixel) version on upload and then generate smaller thumbnails from that. This is an idea that was discussed with Brion & co, but someone has to go and do it. It’s on the list of things WMDE would like to have, but I can’t promise we’ll do it. In the meantime: did you notice that the 12mpx limit applies to PNG but not to JPG? I don’t know if it will apply to TIFF, I don’t know enough about the…

I don’t have a lot of visibility as to where the TIFF code is being added, but I would mention that for large TIFF rendering ImageMagick is probably not ideal, because it loads the full image into memory for resizing.
If we could test something like vips, that could potentially help with quickly generating thumbnails of very large TIFF images without saturating all the available memory.
An example on the vips site has the vips resizer running in 1/5th the time at 1/32 as much memory for a 25-megapixel image.

JPG is lossy, which means if you re-edit an image enough times, it eventually becomes unusable because of JPEG artefacting. That doesn’t fit in very well with our collaborative ethos, does it? Generally, I upload a PNG AND a JPEG, but, because PNG thumbnailing is handled so badly, the PNGs sometimes get deleted. Plus, the thumbnailing tends to blur the images, but only JPEGs have any sharpening applied, and there are a lot of other such problems.

By the way, if you’re going down that route, wouldn’t it be better to generate a full-sized JPEG from both TIFFs and PNGs? Then you have the archival copy, but people who just want to look at the image at full resolution can have a link to download the (much lower filesize) JPEG, and that can be used for thumbnailing. You will need the JPEG to have decently high quality, but it’d probably work out fairly well. I’d only do this above 12.5 Megs for PNGs, though, as PNGs are used, for instance, to explain JPEG artefacting, and if…

A third possibility is to allow administrators only to upload a file to make thumbnails from. This has some advantages with engravings and other images that the thumbnailer has trouble with. For instance, [[Image:Rajpoots_2.png]] is a featured picture on en-wiki, but thumbnails poorly. [[Image:Rajpoots_small.jpg]] is an awful image, but thumbnails very well.

Daniel :
In the meantime: did you notice that the 12mpx limit applies to PNG but not to JPG?

This is because all PNG renderers that I am aware of try to load the entire file uncompressed into memory before resizing and recompressing it.
I wrote a tool some time ago that resizes the image “by row”, which greatly limits its memory footprint. It can be found in Wikimedia SVN as pngds (the C app that does the actual resizing) and PngHandler (the MediaWiki plugin).
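
To illustrate the general “by row” idea, here is a minimal Python sketch. It is not the pngds code, and it makes several assumptions: the function name downscale_rows is hypothetical, pixels are plain grayscale integers, the scale factor is a whole number, and a simple box filter (block average) stands in for whatever filter a real tool would use. The point it shows is that only one accumulator row of the output width has to be kept in memory, however tall the image is.

```python
from typing import Iterable, Iterator, List

Row = List[int]  # one row of grayscale pixel values (assumed 0-255)


def downscale_rows(rows: Iterable[Row], width: int, factor: int) -> Iterator[Row]:
    """Shrink an image by an integer factor, consuming one input row at a time.

    Hypothetical helper for illustration only. Each output pixel is the
    average of a factor x factor block of input pixels (a box filter).
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")

    out_width = width // factor
    acc = [0] * out_width   # running sums for the current band of rows
    rows_in_band = 0

    for row in rows:
        # Add this input row's contribution to each output pixel's sum.
        for x in range(out_width):
            start = x * factor
            acc[x] += sum(row[start:start + factor])
        rows_in_band += 1

        # Once `factor` input rows are accumulated, emit one output row.
        if rows_in_band == factor:
            area = factor * factor
            yield [total // area for total in acc]
            acc = [0] * out_width
            rows_in_band = 0
    # Trailing rows that do not fill a complete band are dropped.


# Usage: the input can be a generator, so the full image is never held in memory.
source = iter([
    [0, 0, 100, 100],
    [0, 0, 100, 100],
    [50, 50, 200, 200],
    [50, 50, 200, 200],
])
small = list(downscale_rows(source, width=4, factor=2))
print(small)
```

A real resizer would also have to decode the PNG scanlines incrementally and write the output incrementally, which this sketch leaves out.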

Keep in mind that TIFF has some variants in the wild, some of which don’t conform to the standard spec. SketchbookPro for example simply uses the metaspace differently, which borks its ability to load properly in GIMP.
In a certain way, we don’t need to preserve the TIFF formatting, simply the layers – so a TIFF to XCF process might be the ideal solution.