add how-it-opens-files page
This commit is contained in:
parent
bc247770f4
commit
a78ef520e2
4
TODO
4
TODO
@ -1,6 +1,4 @@
|
||||
- add note about vips_thumbnail() and vips_thumbnail_buffer()
|
||||
|
||||
move "how libvips opens files" into docs?
|
||||
- move "how libvips opens files" into docs?
|
||||
|
||||
- not sure about utf8 error messages on win
|
||||
|
||||
|
131
doc/How-it-opens-files.md
Normal file
131
doc/How-it-opens-files.md
Normal file
@ -0,0 +1,131 @@
|
||||
libvips now has at least four different ways of opening image files, each best
|
||||
for different file types, file sizes and image use cases. libvips
|
||||
tries hard to pick the best strategy in each case and mostly you don't need
|
||||
to know what it is doing behind the scenes, except unfortunately when you do.
|
||||
|
||||
This post tries to explain what the different strategies are and when each is
|
||||
used. If you are running into unexpected memory, disc or CPU use, this might
|
||||
be helpful. `vips_image_new_from_file()` has the official documentation.
|
||||
|
||||
# Direct access
|
||||
|
||||
This is the fastest and simplest one. The file is mapped directly into
|
||||
the process's address space and can be read with ordinary pointer
|
||||
access. Small files are completely mapped; large files are mapped in a
|
||||
series of small windows that are shared and which scroll about as pixels
|
||||
are read. Files which are accessed like this can be read by many threads
|
||||
at once, making them especially quick. They also interact well with the
|
||||
computer's operating system: your OS will use spare memory to cache
|
||||
recently used chunks of the file, very handy.
|
||||
|
||||
For this to be possible, the file format needs to be a simple dump of a memory
|
||||
array. libvips supports direct access for vips, 8-bit binary ppm/pbm/pnm,
|
||||
analyse and raw.
|
||||
|
||||
libvips has a special direct write mode where pixels can be written directly
|
||||
to the file image. This is used for the [draw operators](libvips-draw.html).
|
||||
|
||||
# Random access via load library
|
||||
|
||||
Some image file formats have libraries which allow true random access to
|
||||
image pixels. For example, libtiff lets you read any tile out of a tiled
|
||||
tiff image very quickly. Because the libraries allow true random access,
|
||||
libvips can simply hook the image load library up to the input of the
|
||||
operation pipeline.
|
||||
|
||||
These libraries are generally single-threaded, so only one thread may
|
||||
read at once, making them slower than simple direct access.
|
||||
Additionally, tiles are often compressed, meaning that each time a tile
|
||||
is fetched it must be decompressed. libvips keeps a cache of
|
||||
recently-decompressed tiles to try to avoid repeatedly decompressing the
|
||||
same tile.
|
||||
|
||||
libvips can load tiled tiff, tiled OpenEXR, FITS and OpenSlide images in
|
||||
this manner.
|
||||
|
||||
# Full decompression
|
||||
|
||||
Many image load libraries do not support random access. In order to use
|
||||
images of this type as inputs to pipelines, libvips has to convert them
|
||||
to a random access format first.
|
||||
|
||||
For small images (less than 100mb when decompressed), libvips allocates
|
||||
a large area of memory and decompresses the entire image to that. It
|
||||
then uses that memory buffer of decompressed pixels to feed the
|
||||
pipeline. For large images, libvips decompresses to a temporary file on
|
||||
disc, then loads that temporary file in direct access mode (see above).
|
||||
Note that on open libvips just reads the image header and is quick: the
|
||||
image decompress happens on the first pixel access.
|
||||
|
||||
You can control this process with environment variables, command-line
|
||||
flags and API calls as you choose, see
|
||||
`vips_image_new_from_file()`.
|
||||
They let you set the threshold at which libvips switches between memory
|
||||
and disc and where on disc the temporary files are held.
|
||||
|
||||
This is the slowest and most memory-hungry way to read files, but it's
|
||||
unavoidable for many file formats. Unless you can use the next one!
|
||||
|
||||
# Sequential access
|
||||
|
||||
This a fairly recent addition to libvips and is a hybrid of the previous
|
||||
two.
|
||||
|
||||
Imagine how this command might be executed:
|
||||
|
||||
```
|
||||
$ vips flip fred.jpg jim.jpg vertical
|
||||
```
|
||||
|
||||
meaning, read `fred.jpg`, flip it up-down, and write as `jim.jpg`.
|
||||
|
||||
In order to write `jim.jpg` top-to-bottom, it'll have to read `fred.jpg`
|
||||
bottom-to-top. Unfortunately libjpeg only supports top-to-bottom reading
|
||||
and writing, so libvips must convert `fred.jpg` to a random access format
|
||||
before it can run the flip operation.
|
||||
|
||||
However many useful operations do not require true random access. For
|
||||
example:
|
||||
|
||||
```
|
||||
$ vips shrink fred.png jim.png 10 10
|
||||
```
|
||||
|
||||
meaning shrink `fred.png` by a factor of 10 in both axies and write as
|
||||
`jim.png`.
|
||||
|
||||
You can imagine this operation running without needing `fred.png` to be
|
||||
completely decompressed first. You just read 10 lines from `fred.png` for
|
||||
every one line you write to `jim.png`.
|
||||
|
||||
To help in this case, libvips has a hint you can give to loaders to say
|
||||
"I will only need pixels from this image in top-to-bottom order". With
|
||||
this hint set, libvips will hook up the pipeline of operations directly
|
||||
to the read-a-line interface provided by the image library, and add a
|
||||
small cache of the most recent 100 or so lines.
|
||||
|
||||
There's an extra unbuffered sequential mode where vips does not keep a
|
||||
cache of recent lines. This gives a useful memory saving for operations
|
||||
like copy which do not need any non-local access.
|
||||
|
||||
# Debugging
|
||||
|
||||
There are a few flags you can use to find out what libvips is doing.
|
||||
|
||||
`--vips-leak` This makes libvips test for leaks on program exit. It checks
|
||||
for images which haven't been closed and also (usefully) shows the memory
|
||||
high-water mark. It counts all memory allocated in libvips for pixel buffers.
|
||||
|
||||
`--vips-progress` This makes libvips show a crude progress bar for every major
|
||||
image loop, with destination and elapsed time. You can see whether images
|
||||
are going to disc or to memory and how long the decompression is taking.
|
||||
|
||||
`--vips-cache-trace This shows a line for every vips operation that executes,
|
||||
with arguments. It's part of vips8, so it doesn't display vips7 operations,
|
||||
sadly.
|
||||
|
||||
# Summary
|
||||
|
||||
libvips tries hard to do the quickest thing in every case, but will
|
||||
sometimes fail. You can prod it in the right direction with a mixture of
|
||||
hints and flags to the load system.
|
122
doc/How-it-opens-files.xml
Normal file
122
doc/How-it-opens-files.xml
Normal file
@ -0,0 +1,122 @@
|
||||
<?xml version="1.0" encoding="utf-8" ?>
|
||||
<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
|
||||
"http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd">
|
||||
<refentry id="How-it-opens-files.md">
|
||||
|
||||
<refmeta>
|
||||
<refentrytitle>How-it-opens-files.md</refentrytitle>
|
||||
<manvolnum>3</manvolnum>
|
||||
<refmiscinfo>libvips</refmiscinfo>
|
||||
</refmeta>
|
||||
|
||||
<refnamediv>
|
||||
<refname>libvips</refname>
|
||||
<refpurpose>How-it-opens-files.md</refpurpose>
|
||||
</refnamediv>
|
||||
|
||||
|
||||
<para>
|
||||
libvips now has at least four different ways of opening image files, each best for different file types, file sizes and image use cases. libvips tries hard to pick the best strategy in each case and mostly you don’t need to know what it is doing behind the scenes, except unfortunately when you do.
|
||||
</para>
|
||||
<para>
|
||||
This post tries to explain what the different strategies are and when each is used. If you are running into unexpected memory, disc or CPU use, this might be helpful. <literal>vips_image_new_from_file()</literal> has the official documentation.
|
||||
</para>
|
||||
<refsect3 id="direct-access">
|
||||
<title>Direct access</title>
|
||||
<para>
|
||||
This is the fastest and simplest one. The file is mapped directly into the process’s address space and can be read with ordinary pointer access. Small files are completely mapped; large files are mapped in a series of small windows that are shared and which scroll about as pixels are read. Files which are accessed like this can be read by many threads at once, making them especially quick. They also interact well with the computer’s operating system: your OS will use spare memory to cache recently used chunks of the file, very handy.
|
||||
</para>
|
||||
<para>
|
||||
For this to be possible, the file format needs to be a simple dump of a memory array. libvips supports direct access for vips, 8-bit binary ppm/pbm/pnm, analyse and raw.
|
||||
</para>
|
||||
<para>
|
||||
libvips has a special direct write mode where pixels can be written directly to the file image. This is used for the <ulink url="libvips-draw.html">draw operators</ulink>.
|
||||
</para>
|
||||
</refsect3>
|
||||
<refsect3 id="random-access-via-load-library">
|
||||
<title>Random access via load library</title>
|
||||
<para>
|
||||
Some image file formats have libraries which allow true random access to image pixels. For example, libtiff lets you read any tile out of a tiled tiff image very quickly. Because the libraries allow true random access, libvips can simply hook the image load library up to the input of the operation pipeline.
|
||||
</para>
|
||||
<para>
|
||||
These libraries are generally single-threaded, so only one thread may read at once, making them slower than simple direct access. Additionally, tiles are often compressed, meaning that each time a tile is fetched it must be decompressed. libvips keeps a cache of recently-decompressed tiles to try to avoid repeatedly decompressing the same tile.
|
||||
</para>
|
||||
<para>
|
||||
libvips can load tiled tiff, tiled OpenEXR, FITS and OpenSlide images in this manner.
|
||||
</para>
|
||||
</refsect3>
|
||||
<refsect3 id="full-decompression">
|
||||
<title>Full decompression</title>
|
||||
<para>
|
||||
Many image load libraries do not support random access. In order to use images of this type as inputs to pipelines, libvips has to convert them to a random access format first.
|
||||
</para>
|
||||
<para>
|
||||
For small images (less than 100mb when decompressed), libvips allocates a large area of memory and decompresses the entire image to that. It then uses that memory buffer of decompressed pixels to feed the pipeline. For large images, libvips decompresses to a temporary file on disc, then loads that temporary file in direct access mode (see above). Note that on open libvips just reads the image header and is quick: the image decompress happens on the first pixel access.
|
||||
</para>
|
||||
<para>
|
||||
You can control this process with environment variables, command-line flags and API calls as you choose, see <literal>vips_image_new_from_file()</literal>. They let you set the threshold at which libvips switches between memory and disc and where on disc the temporary files are held.
|
||||
</para>
|
||||
<para>
|
||||
This is the slowest and most memory-hungry way to read files, but it’s unavoidable for many file formats. Unless you can use the next one!
|
||||
</para>
|
||||
</refsect3>
|
||||
<refsect3 id="sequential-access">
|
||||
<title>Sequential access</title>
|
||||
<para>
|
||||
This a fairly recent addition to libvips and is a hybrid of the previous two.
|
||||
</para>
|
||||
<para>
|
||||
Imagine how this command might be executed:
|
||||
</para>
|
||||
<programlisting>
|
||||
$ vips flip fred.jpg jim.jpg vertical
|
||||
</programlisting>
|
||||
<para>
|
||||
meaning, read <literal>fred.jpg</literal>, flip it up-down, and write as <literal>jim.jpg</literal>.
|
||||
</para>
|
||||
<para>
|
||||
In order to write <literal>jim.jpg</literal> top-to-bottom, it’ll have to read <literal>fred.jpg</literal> bottom-to-top. Unfortunately libjpeg only supports top-to-bottom reading and writing, so libvips must convert <literal>fred.jpg</literal> to a random access format before it can run the flip operation.
|
||||
</para>
|
||||
<para>
|
||||
However many useful operations do not require true random access. For example:
|
||||
</para>
|
||||
<programlisting>
|
||||
$ vips shrink fred.png jim.png 10 10
|
||||
</programlisting>
|
||||
<para>
|
||||
meaning shrink <literal>fred.png</literal> by a factor of 10 in both axies and write as <literal>jim.png</literal>.
|
||||
</para>
|
||||
<para>
|
||||
You can imagine this operation running without needing <literal>fred.png</literal> to be completely decompressed first. You just read 10 lines from <literal>fred.png</literal> for every one line you write to <literal>jim.png</literal>.
|
||||
</para>
|
||||
<para>
|
||||
To help in this case, libvips has a hint you can give to loaders to say <quote>I will only need pixels from this image in top-to-bottom order</quote>. With this hint set, libvips will hook up the pipeline of operations directly to the read-a-line interface provided by the image library, and add a small cache of the most recent 100 or so lines.
|
||||
</para>
|
||||
<para>
|
||||
There’s an extra unbuffered sequential mode where vips does not keep a cache of recent lines. This gives a useful memory saving for operations like copy which do not need any non-local access.
|
||||
</para>
|
||||
</refsect3>
|
||||
<refsect3 id="debugging">
|
||||
<title>Debugging</title>
|
||||
<para>
|
||||
There are a few flags you can use to find out what libvips is doing.
|
||||
</para>
|
||||
<para>
|
||||
<literal>--vips-leak</literal> This makes libvips test for leaks on program exit. It checks for images which haven’t been closed and also (usefully) shows the memory high-water mark. It counts all memory allocated in libvips for pixel buffers.
|
||||
</para>
|
||||
<para>
|
||||
<literal>--vips-progress</literal> This makes libvips show a crude progress bar for every major image loop, with destination and elapsed time. You can see whether images are going to disc or to memory and how long the decompression is taking.
|
||||
</para>
|
||||
<para>
|
||||
`–vips-cache-trace This shows a line for every vips operation that executes, with arguments. It’s part of vips8, so it doesn’t display vips7 operations, sadly.
|
||||
</para>
|
||||
</refsect3>
|
||||
<refsect3 id="summary">
|
||||
<title>Summary</title>
|
||||
<para>
|
||||
libvips tries hard to do the quickest thing in every case, but will sometimes fail. You can prod it in the right direction with a mixture of hints and flags to the load system.
|
||||
</para>
|
||||
</refsect3>
|
||||
|
||||
|
||||
</refentry>
|
@ -145,6 +145,7 @@ HTML_IMAGES = \
|
||||
markdown_content_files = \
|
||||
How-it-works.md \
|
||||
Using-vipsthumbnail.md \
|
||||
How-it-opens-files.md \
|
||||
Making-image-pyramids.md
|
||||
|
||||
# converted to xml in this dir by pandoc
|
||||
|
@ -40,6 +40,7 @@
|
||||
<xi:include href="xml/file-format.xml"/>
|
||||
<xi:include href="xml/using-threads.xml"/>
|
||||
<xi:include href="xml/How-it-works.xml"/>
|
||||
<xi:include href="xml/How-it-opens-files.xml"/>
|
||||
<xi:include href="xml/Making-image-pyramids.xml"/>
|
||||
<xi:include href="xml/Using-vipsthumbnail.xml"/>
|
||||
</chapter>
|
||||
|
Loading…
Reference in New Issue
Block a user