opengl

Sample code from LearnOpenGL.com
Log | Files | Refs

stb_image.h (284733B)


      1 /* stb_image - v2.28 - public domain image loader - http://nothings.org/stb
      2                                   no warranty implied; use at your own risk
      3 
      4    Do this:
      5       #define STB_IMAGE_IMPLEMENTATION
      6    before you include this file in *one* C or C++ file to create the implementation.
      7 
      8    // i.e. it should look like this:
      9    #include ...
     10    #include ...
     11    #include ...
     12    #define STB_IMAGE_IMPLEMENTATION
     13    #include "stb_image.h"
     14 
     15    You can #define STBI_ASSERT(x) before the #include to avoid using assert.h.
     16    And #define STBI_MALLOC, STBI_REALLOC, and STBI_FREE to avoid using malloc,realloc,free
     17 
     18 
     19    QUICK NOTES:
     20       Primarily of interest to game developers and other people who can
     21           avoid problematic images and only need the trivial interface
     22 
     23       JPEG baseline & progressive (12 bpc/arithmetic not supported, same as stock IJG lib)
     24       PNG 1/2/4/8/16-bit-per-channel
     25 
     26       TGA (not sure what subset, if a subset)
     27       BMP non-1bpp, non-RLE
     28       PSD (composited view only, no extra channels, 8/16 bit-per-channel)
     29 
     30       GIF (*comp always reports as 4-channel)
     31       HDR (radiance rgbE format)
     32       PIC (Softimage PIC)
     33       PNM (PPM and PGM binary only)
     34 
     35       Animated GIF still needs a proper API, but here's one way to do it:
     36           http://gist.github.com/urraka/685d9a6340b26b830d49
     37 
     38       - decode from memory or through FILE (define STBI_NO_STDIO to remove code)
     39       - decode from arbitrary I/O callbacks
     40       - SIMD acceleration on x86/x64 (SSE2) and ARM (NEON)
     41 
     42    Full documentation under "DOCUMENTATION" below.
     43 
     44 
     45 LICENSE
     46 
     47   See end of file for license information.
     48 
     49 RECENT REVISION HISTORY:
     50 
     51       2.28  (2023-01-29) many error fixes, security errors, just tons of stuff
     52       2.27  (2021-07-11) document stbi_info better, 16-bit PNM support, bug fixes
     53       2.26  (2020-07-13) many minor fixes
     54       2.25  (2020-02-02) fix warnings
     55       2.24  (2020-02-02) fix warnings; thread-local failure_reason and flip_vertically
     56       2.23  (2019-08-11) fix clang static analysis warning
     57       2.22  (2019-03-04) gif fixes, fix warnings
     58       2.21  (2019-02-25) fix typo in comment
     59       2.20  (2019-02-07) support utf8 filenames in Windows; fix warnings and platform ifdefs
     60       2.19  (2018-02-11) fix warning
     61       2.18  (2018-01-30) fix warnings
     62       2.17  (2018-01-29) bugfix, 1-bit BMP, 16-bitness query, fix warnings
     63       2.16  (2017-07-23) all functions have 16-bit variants; optimizations; bugfixes
     64       2.15  (2017-03-18) fix png-1,2,4; all Imagenet JPGs; no runtime SSE detection on GCC
     65       2.14  (2017-03-03) remove deprecated STBI_JPEG_OLD; fixes for Imagenet JPGs
     66       2.13  (2016-12-04) experimental 16-bit API, only for PNG so far; fixes
     67       2.12  (2016-04-02) fix typo in 2.11 PSD fix that caused crashes
     68       2.11  (2016-04-02) 16-bit PNGS; enable SSE2 in non-gcc x64
     69                          RGB-format JPEG; remove white matting in PSD;
     70                          allocate large structures on the stack;
     71                          correct channel count for PNG & BMP
     72       2.10  (2016-01-22) avoid warning introduced in 2.09
     73       2.09  (2016-01-16) 16-bit TGA; comments in PNM files; STBI_REALLOC_SIZED
     74 
     75    See end of file for full revision history.
     76 
     77 
     78  ============================    Contributors    =========================
     79 
     80  Image formats                          Extensions, features
     81     Sean Barrett (jpeg, png, bmp)          Jetro Lauha (stbi_info)
     82     Nicolas Schulz (hdr, psd)              Martin "SpartanJ" Golini (stbi_info)
     83     Jonathan Dummer (tga)                  James "moose2000" Brown (iPhone PNG)
     84     Jean-Marc Lienher (gif)                Ben "Disch" Wenger (io callbacks)
     85     Tom Seddon (pic)                       Omar Cornut (1/2/4-bit PNG)
     86     Thatcher Ulrich (psd)                  Nicolas Guillemot (vertical flip)
     87     Ken Miller (pgm, ppm)                  Richard Mitton (16-bit PSD)
     88     github:urraka (animated gif)           Junggon Kim (PNM comments)
     89     Christopher Forseth (animated gif)     Daniel Gibson (16-bit TGA)
     90                                            socks-the-fox (16-bit PNG)
     91                                            Jeremy Sawicki (handle all ImageNet JPGs)
     92  Optimizations & bugfixes                  Mikhail Morozov (1-bit BMP)
     93     Fabian "ryg" Giesen                    Anael Seghezzi (is-16-bit query)
     94     Arseny Kapoulkine                      Simon Breuss (16-bit PNM)
     95     John-Mark Allen
     96     Carmelo J Fdez-Aguera
     97 
     98  Bug & warning fixes
     99     Marc LeBlanc            David Woo          Guillaume George     Martins Mozeiko
    100     Christpher Lloyd        Jerry Jansson      Joseph Thomson       Blazej Dariusz Roszkowski
    101     Phil Jordan                                Dave Moore           Roy Eltham
    102     Hayaki Saito            Nathan Reed        Won Chun
    103     Luke Graham             Johan Duparc       Nick Verigakis       the Horde3D community
    104     Thomas Ruf              Ronny Chevalier                         github:rlyeh
    105     Janez Zemva             John Bartholomew   Michal Cichon        github:romigrou
    106     Jonathan Blow           Ken Hamada         Tero Hanninen        github:svdijk
    107     Eugene Golushkov        Laurent Gomila     Cort Stratton        github:snagar
    108     Aruelien Pocheville     Sergio Gonzalez    Thibault Reuille     github:Zelex
    109     Cass Everitt            Ryamond Barbiero                        github:grim210
    110     Paul Du Bois            Engin Manap        Aldo Culquicondor    github:sammyhw
    111     Philipp Wiesemann       Dale Weiler        Oriol Ferrer Mesia   github:phprus
    112     Josh Tobin              Neil Bickford      Matthew Gregan       github:poppolopoppo
    113     Julian Raschke          Gregory Mullen     Christian Floisand   github:darealshinji
    114     Baldur Karlsson         Kevin Schmidt      JR Smith             github:Michaelangel007
    115                             Brad Weinberger    Matvey Cherevko      github:mosra
    116     Luca Sas                Alexander Veselov  Zack Middleton       [reserved]
    117     Ryan C. Gordon          [reserved]                              [reserved]
    118                      DO NOT ADD YOUR NAME HERE
    119 
    120                      Jacko Dirks
    121 
    122   To add your name to the credits, pick a random blank space in the middle and fill it.
    123   80% of merge conflicts on stb PRs are due to people adding their name at the end
    124   of the credits.
    125 */
    126 
    127 #ifndef STBI_INCLUDE_STB_IMAGE_H
    128 #define STBI_INCLUDE_STB_IMAGE_H
    129 
    130 // DOCUMENTATION
    131 //
    132 // Limitations:
    133 //    - no 12-bit-per-channel JPEG
    134 //    - no JPEGs with arithmetic coding
    135 //    - GIF always returns *comp=4
    136 //
    137 // Basic usage (see HDR discussion below for HDR usage):
    138 //    int x,y,n;
    139 //    unsigned char *data = stbi_load(filename, &x, &y, &n, 0);
    140 //    // ... process data if not NULL ...
    141 //    // ... x = width, y = height, n = # 8-bit components per pixel ...
    142 //    // ... replace '0' with '1'..'4' to force that many components per pixel
    143 //    // ... but 'n' will always be the number that it would have been if you said 0
    144 //    stbi_image_free(data);
    145 //
    146 // Standard parameters:
    147 //    int *x                 -- outputs image width in pixels
    148 //    int *y                 -- outputs image height in pixels
    149 //    int *channels_in_file  -- outputs # of image components in image file
    150 //    int desired_channels   -- if non-zero, # of image components requested in result
    151 //
    152 // The return value from an image loader is an 'unsigned char *' which points
    153 // to the pixel data, or NULL on an allocation failure or if the image is
    154 // corrupt or invalid. The pixel data consists of *y scanlines of *x pixels,
    155 // with each pixel consisting of N interleaved 8-bit components; the first
    156 // pixel pointed to is top-left-most in the image. There is no padding between
    157 // image scanlines or between pixels, regardless of format. The number of
    158 // components N is 'desired_channels' if desired_channels is non-zero, or
    159 // *channels_in_file otherwise. If desired_channels is non-zero,
    160 // *channels_in_file has the number of components that _would_ have been
    161 // output otherwise. E.g. if you set desired_channels to 4, you will always
    162 // get RGBA output, but you can check *channels_in_file to see if it's trivially
    163 // opaque because e.g. there were only 3 channels in the source image.
    164 //
    165 // An output image with N components has the following components interleaved
    166 // in this order in each pixel:
    167 //
    168 //     N=#comp     components
    169 //       1           grey
    170 //       2           grey, alpha
    171 //       3           red, green, blue
    172 //       4           red, green, blue, alpha
    173 //
    174 // If image loading fails for any reason, the return value will be NULL,
    175 // and *x, *y, *channels_in_file will be unchanged. The function
    176 // stbi_failure_reason() can be queried for an extremely brief, end-user
    177 // unfriendly explanation of why the load failed. Define STBI_NO_FAILURE_STRINGS
    178 // to avoid compiling these strings at all, and STBI_FAILURE_USERMSG to get slightly
    179 // more user-friendly ones.
    180 //
    181 // Paletted PNG, BMP, GIF, and PIC images are automatically depalettized.
    182 //
    183 // To query the width, height and component count of an image without having to
    184 // decode the full file, you can use the stbi_info family of functions:
    185 //
    186 //   int x,y,n,ok;
    187 //   ok = stbi_info(filename, &x, &y, &n);
    188 //   // returns ok=1 and sets x, y, n if image is a supported format,
    189 //   // 0 otherwise.
    190 //
    191 // Note that stb_image pervasively uses ints in its public API for sizes,
    192 // including sizes of memory buffers. This is now part of the API and thus
    193 // hard to change without causing breakage. As a result, the various image
    194 // loaders all have certain limits on image size; these differ somewhat
    195 // by format but generally boil down to either just under 2GB or just under
    196 // 1GB. When the decoded image would be larger than this, stb_image decoding
    197 // will fail.
    198 //
    199 // Additionally, stb_image will reject image files that have any of their
    200 // dimensions set to a larger value than the configurable STBI_MAX_DIMENSIONS,
    201 // which defaults to 2**24 = 16777216 pixels. Due to the above memory limit,
    202 // the only way to have an image with such dimensions load correctly
    203 // is for it to have a rather extreme aspect ratio. Either way, the
    204 // assumption here is that such larger images are likely to be malformed
    205 // or malicious. If you do need to load an image with individual dimensions
    206 // larger than that, and it still fits in the overall size limit, you can
    207 // #define STBI_MAX_DIMENSIONS on your own to be something larger.
    208 //
    209 // ===========================================================================
    210 //
    211 // UNICODE:
    212 //
    213 //   If compiling for Windows and you wish to use Unicode filenames, compile
    214 //   with
    215 //       #define STBI_WINDOWS_UTF8
    216 //   and pass utf8-encoded filenames. Call stbi_convert_wchar_to_utf8 to convert
    217 //   Windows wchar_t filenames to utf8.
    218 //
    219 // ===========================================================================
    220 //
    221 // Philosophy
    222 //
    223 // stb libraries are designed with the following priorities:
    224 //
    225 //    1. easy to use
    226 //    2. easy to maintain
    227 //    3. good performance
    228 //
    229 // Sometimes I let "good performance" creep up in priority over "easy to maintain",
    230 // and for best performance I may provide less-easy-to-use APIs that give higher
    231 // performance, in addition to the easy-to-use ones. Nevertheless, it's important
    232 // to keep in mind that from the standpoint of you, a client of this library,
    233 // all you care about is #1 and #3, and stb libraries DO NOT emphasize #3 above all.
    234 //
    235 // Some secondary priorities arise directly from the first two, some of which
    236 // provide more explicit reasons why performance can't be emphasized.
    237 //
    238 //    - Portable ("ease of use")
    239 //    - Small source code footprint ("easy to maintain")
    240 //    - No dependencies ("ease of use")
    241 //
    242 // ===========================================================================
    243 //
    244 // I/O callbacks
    245 //
    246 // I/O callbacks allow you to read from arbitrary sources, like packaged
    247 // files or some other source. Data read from callbacks are processed
    248 // through a small internal buffer (currently 128 bytes) to try to reduce
    249 // overhead.
    250 //
    251 // The three functions you must define are "read" (reads some bytes of data),
    252 // "skip" (skips some bytes of data), "eof" (reports if the stream is at the end).
    253 //
    254 // ===========================================================================
    255 //
    256 // SIMD support
    257 //
    258 // The JPEG decoder will try to automatically use SIMD kernels on x86 when
    259 // supported by the compiler. For ARM Neon support, you must explicitly
    260 // request it.
    261 //
    262 // (The old do-it-yourself SIMD API is no longer supported in the current
    263 // code.)
    264 //
    265 // On x86, SSE2 will automatically be used when available based on a run-time
    266 // test; if not, the generic C versions are used as a fall-back. On ARM targets,
    267 // the typical path is to have separate builds for NEON and non-NEON devices
    268 // (at least this is true for iOS and Android). Therefore, the NEON support is
    269 // toggled by a build flag: define STBI_NEON to get NEON loops.
    270 //
    271 // If for some reason you do not want to use any of SIMD code, or if
    272 // you have issues compiling it, you can disable it entirely by
    273 // defining STBI_NO_SIMD.
    274 //
    275 // ===========================================================================
    276 //
    277 // HDR image support   (disable by defining STBI_NO_HDR)
    278 //
    279 // stb_image supports loading HDR images in general, and currently the Radiance
    280 // .HDR file format specifically. You can still load any file through the existing
    281 // interface; if you attempt to load an HDR file, it will be automatically remapped
    282 // to LDR, assuming gamma 2.2 and an arbitrary scale factor defaulting to 1;
    283 // both of these constants can be reconfigured through this interface:
    284 //
    285 //     stbi_hdr_to_ldr_gamma(2.2f);
    286 //     stbi_hdr_to_ldr_scale(1.0f);
    287 //
    288 // (note, do not use _inverse_ constants; stbi_image will invert them
    289 // appropriately).
    290 //
    291 // Additionally, there is a new, parallel interface for loading files as
    292 // (linear) floats to preserve the full dynamic range:
    293 //
    294 //    float *data = stbi_loadf(filename, &x, &y, &n, 0);
    295 //
    296 // If you load LDR images through this interface, those images will
    297 // be promoted to floating point values, run through the inverse of
    298 // constants corresponding to the above:
    299 //
    300 //     stbi_ldr_to_hdr_scale(1.0f);
    301 //     stbi_ldr_to_hdr_gamma(2.2f);
    302 //
    303 // Finally, given a filename (or an open file or memory block--see header
    304 // file for details) containing image data, you can query for the "most
    305 // appropriate" interface to use (that is, whether the image is HDR or
    306 // not), using:
    307 //
    308 //     stbi_is_hdr(char *filename);
    309 //
    310 // ===========================================================================
    311 //
    312 // iPhone PNG support:
    313 //
    314 // We optionally support converting iPhone-formatted PNGs (which store
    315 // premultiplied BGRA) back to RGB, even though they're internally encoded
    316 // differently. To enable this conversion, call
    317 // stbi_convert_iphone_png_to_rgb(1).
    318 //
    319 // Call stbi_set_unpremultiply_on_load(1) as well to force a divide per
    320 // pixel to remove any premultiplied alpha *only* if the image file explicitly
    321 // says there's premultiplied data (currently only happens in iPhone images,
    322 // and only if iPhone convert-to-rgb processing is on).
    323 //
    324 // ===========================================================================
    325 //
    326 // ADDITIONAL CONFIGURATION
    327 //
    328 //  - You can suppress implementation of any of the decoders to reduce
    329 //    your code footprint by #defining one or more of the following
    330 //    symbols before creating the implementation.
    331 //
    332 //        STBI_NO_JPEG
    333 //        STBI_NO_PNG
    334 //        STBI_NO_BMP
    335 //        STBI_NO_PSD
    336 //        STBI_NO_TGA
    337 //        STBI_NO_GIF
    338 //        STBI_NO_HDR
    339 //        STBI_NO_PIC
    340 //        STBI_NO_PNM   (.ppm and .pgm)
    341 //
    342 //  - You can request *only* certain decoders and suppress all other ones
    343 //    (this will be more forward-compatible, as addition of new decoders
    344 //    doesn't require you to disable them explicitly):
    345 //
    346 //        STBI_ONLY_JPEG
    347 //        STBI_ONLY_PNG
    348 //        STBI_ONLY_BMP
    349 //        STBI_ONLY_PSD
    350 //        STBI_ONLY_TGA
    351 //        STBI_ONLY_GIF
    352 //        STBI_ONLY_HDR
    353 //        STBI_ONLY_PIC
    354 //        STBI_ONLY_PNM   (.ppm and .pgm)
    355 //
    356 //   - If you use STBI_NO_PNG (or _ONLY_ without PNG), and you still
    357 //     want the zlib decoder to be available, #define STBI_SUPPORT_ZLIB
    358 //
    359 //  - If you define STBI_MAX_DIMENSIONS, stb_image will reject images greater
    360 //    than that size (in either width or height) without further processing.
    361 //    This is to let programs in the wild set an upper bound to prevent
    362 //    denial-of-service attacks on untrusted data, as one could generate a
    363 //    valid image of gigantic dimensions and force stb_image to allocate a
    364 //    huge block of memory and spend disproportionate time decoding it. By
    365 //    default this is set to (1 << 24), which is 16777216, but that's still
    366 //    very big.
    367 
    368 #ifndef STBI_NO_STDIO
    369 #include <stdio.h>
    370 #endif // STBI_NO_STDIO
    371 
    372 #define STBI_VERSION 1
    373 
    374 enum
    375 {
    376    STBI_default = 0, // only used for desired_channels
    377 
    378    STBI_grey       = 1,
    379    STBI_grey_alpha = 2,
    380    STBI_rgb        = 3,
    381    STBI_rgb_alpha  = 4
    382 };
    383 
    384 #include <stdlib.h>
    385 typedef unsigned char stbi_uc;
    386 typedef unsigned short stbi_us;
    387 
    388 #ifdef __cplusplus
    389 extern "C" {
    390 #endif
    391 
    392 #ifndef STBIDEF
    393 #ifdef STB_IMAGE_STATIC
    394 #define STBIDEF static
    395 #else
    396 #define STBIDEF extern
    397 #endif
    398 #endif
    399 
    400 //////////////////////////////////////////////////////////////////////////////
    401 //
    402 // PRIMARY API - works on images of any type
    403 //
    404 
    405 //
    406 // load image by filename, open file, or memory buffer
    407 //
    408 
    409 typedef struct
    410 {
    411    int      (*read)  (void *user,char *data,int size);   // fill 'data' with 'size' bytes.  return number of bytes actually read
    412    void     (*skip)  (void *user,int n);                 // skip the next 'n' bytes, or 'unget' the last -n bytes if negative
    413    int      (*eof)   (void *user);                       // returns nonzero if we are at end of file/data
    414 } stbi_io_callbacks;
    415 
    416 ////////////////////////////////////
    417 //
    418 // 8-bits-per-channel interface
    419 //
    420 
    421 STBIDEF stbi_uc *stbi_load_from_memory   (stbi_uc           const *buffer, int len   , int *x, int *y, int *channels_in_file, int desired_channels);
    422 STBIDEF stbi_uc *stbi_load_from_callbacks(stbi_io_callbacks const *clbk  , void *user, int *x, int *y, int *channels_in_file, int desired_channels);
    423 
    424 #ifndef STBI_NO_STDIO
    425 STBIDEF stbi_uc *stbi_load            (char const *filename, int *x, int *y, int *channels_in_file, int desired_channels);
    426 STBIDEF stbi_uc *stbi_load_from_file  (FILE *f, int *x, int *y, int *channels_in_file, int desired_channels);
    427 // for stbi_load_from_file, file pointer is left pointing immediately after image
    428 #endif
    429 
    430 #ifndef STBI_NO_GIF
    431 STBIDEF stbi_uc *stbi_load_gif_from_memory(stbi_uc const *buffer, int len, int **delays, int *x, int *y, int *z, int *comp, int req_comp);
    432 #endif
    433 
    434 #ifdef STBI_WINDOWS_UTF8
    435 STBIDEF int stbi_convert_wchar_to_utf8(char *buffer, size_t bufferlen, const wchar_t* input);
    436 #endif
    437 
    438 ////////////////////////////////////
    439 //
    440 // 16-bits-per-channel interface
    441 //
    442 
    443 STBIDEF stbi_us *stbi_load_16_from_memory   (stbi_uc const *buffer, int len, int *x, int *y, int *channels_in_file, int desired_channels);
    444 STBIDEF stbi_us *stbi_load_16_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *channels_in_file, int desired_channels);
    445 
    446 #ifndef STBI_NO_STDIO
    447 STBIDEF stbi_us *stbi_load_16          (char const *filename, int *x, int *y, int *channels_in_file, int desired_channels);
    448 STBIDEF stbi_us *stbi_load_from_file_16(FILE *f, int *x, int *y, int *channels_in_file, int desired_channels);
    449 #endif
    450 
    451 ////////////////////////////////////
    452 //
    453 // float-per-channel interface
    454 //
    455 #ifndef STBI_NO_LINEAR
    456    STBIDEF float *stbi_loadf_from_memory     (stbi_uc const *buffer, int len, int *x, int *y, int *channels_in_file, int desired_channels);
    457    STBIDEF float *stbi_loadf_from_callbacks  (stbi_io_callbacks const *clbk, void *user, int *x, int *y,  int *channels_in_file, int desired_channels);
    458 
    459    #ifndef STBI_NO_STDIO
    460    STBIDEF float *stbi_loadf            (char const *filename, int *x, int *y, int *channels_in_file, int desired_channels);
    461    STBIDEF float *stbi_loadf_from_file  (FILE *f, int *x, int *y, int *channels_in_file, int desired_channels);
    462    #endif
    463 #endif
    464 
    465 #ifndef STBI_NO_HDR
    466    STBIDEF void   stbi_hdr_to_ldr_gamma(float gamma);
    467    STBIDEF void   stbi_hdr_to_ldr_scale(float scale);
    468 #endif // STBI_NO_HDR
    469 
    470 #ifndef STBI_NO_LINEAR
    471    STBIDEF void   stbi_ldr_to_hdr_gamma(float gamma);
    472    STBIDEF void   stbi_ldr_to_hdr_scale(float scale);
    473 #endif // STBI_NO_LINEAR
    474 
    475 // stbi_is_hdr is always defined, but always returns false if STBI_NO_HDR
    476 STBIDEF int    stbi_is_hdr_from_callbacks(stbi_io_callbacks const *clbk, void *user);
    477 STBIDEF int    stbi_is_hdr_from_memory(stbi_uc const *buffer, int len);
    478 #ifndef STBI_NO_STDIO
    479 STBIDEF int      stbi_is_hdr          (char const *filename);
    480 STBIDEF int      stbi_is_hdr_from_file(FILE *f);
    481 #endif // STBI_NO_STDIO
    482 
    483 
    484 // get a VERY brief reason for failure
    485 // on most compilers (and ALL modern mainstream compilers) this is threadsafe
    486 STBIDEF const char *stbi_failure_reason  (void);
    487 
    488 // free the loaded image -- this is just free()
    489 STBIDEF void     stbi_image_free      (void *retval_from_stbi_load);
    490 
    491 // get image dimensions & components without fully decoding
    492 STBIDEF int      stbi_info_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp);
    493 STBIDEF int      stbi_info_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *comp);
    494 STBIDEF int      stbi_is_16_bit_from_memory(stbi_uc const *buffer, int len);
    495 STBIDEF int      stbi_is_16_bit_from_callbacks(stbi_io_callbacks const *clbk, void *user);
    496 
    497 #ifndef STBI_NO_STDIO
    498 STBIDEF int      stbi_info               (char const *filename,     int *x, int *y, int *comp);
    499 STBIDEF int      stbi_info_from_file     (FILE *f,                  int *x, int *y, int *comp);
    500 STBIDEF int      stbi_is_16_bit          (char const *filename);
    501 STBIDEF int      stbi_is_16_bit_from_file(FILE *f);
    502 #endif
    503 
    504 
    505 
    506 // for image formats that explicitly notate that they have premultiplied alpha,
    507 // we just return the colors as stored in the file. set this flag to force
    508 // unpremultiplication. results are undefined if the unpremultiply overflow.
    509 STBIDEF void stbi_set_unpremultiply_on_load(int flag_true_if_should_unpremultiply);
    510 
    511 // indicate whether we should process iphone images back to canonical format,
    512 // or just pass them through "as-is"
    513 STBIDEF void stbi_convert_iphone_png_to_rgb(int flag_true_if_should_convert);
    514 
    515 // flip the image vertically, so the first pixel in the output array is the bottom left
    516 STBIDEF void stbi_set_flip_vertically_on_load(int flag_true_if_should_flip);
    517 
    518 // as above, but only applies to images loaded on the thread that calls the function
    519 // this function is only available if your compiler supports thread-local variables;
    520 // calling it will fail to link if your compiler doesn't
    521 STBIDEF void stbi_set_unpremultiply_on_load_thread(int flag_true_if_should_unpremultiply);
    522 STBIDEF void stbi_convert_iphone_png_to_rgb_thread(int flag_true_if_should_convert);
    523 STBIDEF void stbi_set_flip_vertically_on_load_thread(int flag_true_if_should_flip);
    524 
    525 // ZLIB client - used by PNG, available for other purposes
    526 
    527 STBIDEF char *stbi_zlib_decode_malloc_guesssize(const char *buffer, int len, int initial_size, int *outlen);
    528 STBIDEF char *stbi_zlib_decode_malloc_guesssize_headerflag(const char *buffer, int len, int initial_size, int *outlen, int parse_header);
    529 STBIDEF char *stbi_zlib_decode_malloc(const char *buffer, int len, int *outlen);
    530 STBIDEF int   stbi_zlib_decode_buffer(char *obuffer, int olen, const char *ibuffer, int ilen);
    531 
    532 STBIDEF char *stbi_zlib_decode_noheader_malloc(const char *buffer, int len, int *outlen);
    533 STBIDEF int   stbi_zlib_decode_noheader_buffer(char *obuffer, int olen, const char *ibuffer, int ilen);
    534 
    535 
    536 #ifdef __cplusplus
    537 }
    538 #endif
    539 
    540 //
    541 //
    542 ////   end header file   /////////////////////////////////////////////////////
    543 #endif // STBI_INCLUDE_STB_IMAGE_H
    544 
    545 #ifdef STB_IMAGE_IMPLEMENTATION
    546 
    547 #if defined(STBI_ONLY_JPEG) || defined(STBI_ONLY_PNG) || defined(STBI_ONLY_BMP) \
    548   || defined(STBI_ONLY_TGA) || defined(STBI_ONLY_GIF) || defined(STBI_ONLY_PSD) \
    549   || defined(STBI_ONLY_HDR) || defined(STBI_ONLY_PIC) || defined(STBI_ONLY_PNM) \
    550   || defined(STBI_ONLY_ZLIB)
    551    #ifndef STBI_ONLY_JPEG
    552    #define STBI_NO_JPEG
    553    #endif
    554    #ifndef STBI_ONLY_PNG
    555    #define STBI_NO_PNG
    556    #endif
    557    #ifndef STBI_ONLY_BMP
    558    #define STBI_NO_BMP
    559    #endif
    560    #ifndef STBI_ONLY_PSD
    561    #define STBI_NO_PSD
    562    #endif
    563    #ifndef STBI_ONLY_TGA
    564    #define STBI_NO_TGA
    565    #endif
    566    #ifndef STBI_ONLY_GIF
    567    #define STBI_NO_GIF
    568    #endif
    569    #ifndef STBI_ONLY_HDR
    570    #define STBI_NO_HDR
    571    #endif
    572    #ifndef STBI_ONLY_PIC
    573    #define STBI_NO_PIC
    574    #endif
    575    #ifndef STBI_ONLY_PNM
    576    #define STBI_NO_PNM
    577    #endif
    578 #endif
    579 
    580 #if defined(STBI_NO_PNG) && !defined(STBI_SUPPORT_ZLIB) && !defined(STBI_NO_ZLIB)
    581 #define STBI_NO_ZLIB
    582 #endif
    583 
    584 
    585 #include <stdarg.h>
    586 #include <stddef.h> // ptrdiff_t on osx
    587 #include <stdlib.h>
    588 #include <string.h>
    589 #include <limits.h>
    590 
    591 #if !defined(STBI_NO_LINEAR) || !defined(STBI_NO_HDR)
    592 #include <math.h>  // ldexp, pow
    593 #endif
    594 
    595 #ifndef STBI_NO_STDIO
    596 #include <stdio.h>
    597 #endif
    598 
    599 #ifndef STBI_ASSERT
    600 #include <assert.h>
    601 #define STBI_ASSERT(x) assert(x)
    602 #endif
    603 
    604 #ifdef __cplusplus
    605 #define STBI_EXTERN extern "C"
    606 #else
    607 #define STBI_EXTERN extern
    608 #endif
    609 
    610 
    611 #ifndef _MSC_VER
    612    #ifdef __cplusplus
    613    #define stbi_inline inline
    614    #else
    615    #define stbi_inline
    616    #endif
    617 #else
    618    #define stbi_inline __forceinline
    619 #endif
    620 
    621 #ifndef STBI_NO_THREAD_LOCALS
    622    #if defined(__cplusplus) &&  __cplusplus >= 201103L
    623       #define STBI_THREAD_LOCAL       thread_local
    624    #elif defined(__GNUC__) && __GNUC__ < 5
    625       #define STBI_THREAD_LOCAL       __thread
    626    #elif defined(_MSC_VER)
    627       #define STBI_THREAD_LOCAL       __declspec(thread)
    628    #elif defined (__STDC_VERSION__) && __STDC_VERSION__ >= 201112L && !defined(__STDC_NO_THREADS__)
    629       #define STBI_THREAD_LOCAL       _Thread_local
    630    #endif
    631 
    632    #ifndef STBI_THREAD_LOCAL
    633       #if defined(__GNUC__)
    634         #define STBI_THREAD_LOCAL       __thread
    635       #endif
    636    #endif
    637 #endif
    638 
    639 #if defined(_MSC_VER) || defined(__SYMBIAN32__)
    640 typedef unsigned short stbi__uint16;
    641 typedef   signed short stbi__int16;
    642 typedef unsigned int   stbi__uint32;
    643 typedef   signed int   stbi__int32;
    644 #else
    645 #include <stdint.h>
    646 typedef uint16_t stbi__uint16;
    647 typedef int16_t  stbi__int16;
    648 typedef uint32_t stbi__uint32;
    649 typedef int32_t  stbi__int32;
    650 #endif
    651 
    652 // should produce compiler error if size is wrong
    653 typedef unsigned char validate_uint32[sizeof(stbi__uint32)==4 ? 1 : -1];
    654 
    655 #ifdef _MSC_VER
    656 #define STBI_NOTUSED(v)  (void)(v)
    657 #else
    658 #define STBI_NOTUSED(v)  (void)sizeof(v)
    659 #endif
    660 
    661 #ifdef _MSC_VER
    662 #define STBI_HAS_LROTL
    663 #endif
    664 
    665 #ifdef STBI_HAS_LROTL
    666    #define stbi_lrot(x,y)  _lrotl(x,y)
    667 #else
    668    #define stbi_lrot(x,y)  (((x) << (y)) | ((x) >> (-(y) & 31)))
    669 #endif
    670 
    671 #if defined(STBI_MALLOC) && defined(STBI_FREE) && (defined(STBI_REALLOC) || defined(STBI_REALLOC_SIZED))
    672 // ok
    673 #elif !defined(STBI_MALLOC) && !defined(STBI_FREE) && !defined(STBI_REALLOC) && !defined(STBI_REALLOC_SIZED)
    674 // ok
    675 #else
    676 #error "Must define all or none of STBI_MALLOC, STBI_FREE, and STBI_REALLOC (or STBI_REALLOC_SIZED)."
    677 #endif
    678 
    679 #ifndef STBI_MALLOC
    680 #define STBI_MALLOC(sz)           malloc(sz)
    681 #define STBI_REALLOC(p,newsz)     realloc(p,newsz)
    682 #define STBI_FREE(p)              free(p)
    683 #endif
    684 
    685 #ifndef STBI_REALLOC_SIZED
    686 #define STBI_REALLOC_SIZED(p,oldsz,newsz) STBI_REALLOC(p,newsz)
    687 #endif
    688 
    689 // x86/x64 detection
    690 #if defined(__x86_64__) || defined(_M_X64)
    691 #define STBI__X64_TARGET
    692 #elif defined(__i386) || defined(_M_IX86)
    693 #define STBI__X86_TARGET
    694 #endif
    695 
    696 #if defined(__GNUC__) && defined(STBI__X86_TARGET) && !defined(__SSE2__) && !defined(STBI_NO_SIMD)
    697 // gcc doesn't support sse2 intrinsics unless you compile with -msse2,
    698 // which in turn means it gets to use SSE2 everywhere. This is unfortunate,
    699 // but previous attempts to provide the SSE2 functions with runtime
    700 // detection caused numerous issues. The way architecture extensions are
    701 // exposed in GCC/Clang is, sadly, not really suited for one-file libs.
    702 // New behavior: if compiled with -msse2, we use SSE2 without any
    703 // detection; if not, we don't use it at all.
    704 #define STBI_NO_SIMD
    705 #endif
    706 
    707 #if defined(__MINGW32__) && defined(STBI__X86_TARGET) && !defined(STBI_MINGW_ENABLE_SSE2) && !defined(STBI_NO_SIMD)
    708 // Note that __MINGW32__ doesn't actually mean 32-bit, so we have to avoid STBI__X64_TARGET
    709 //
    710 // 32-bit MinGW wants ESP to be 16-byte aligned, but this is not in the
    711 // Windows ABI and VC++ as well as Windows DLLs don't maintain that invariant.
    712 // As a result, enabling SSE2 on 32-bit MinGW is dangerous when not
    713 // simultaneously enabling "-mstackrealign".
    714 //
    715 // See https://github.com/nothings/stb/issues/81 for more information.
    716 //
    717 // So default to no SSE2 on 32-bit MinGW. If you've read this far and added
    718 // -mstackrealign to your build settings, feel free to #define STBI_MINGW_ENABLE_SSE2.
    719 #define STBI_NO_SIMD
    720 #endif
    721 
    722 #if !defined(STBI_NO_SIMD) && (defined(STBI__X86_TARGET) || defined(STBI__X64_TARGET))
    723 #define STBI_SSE2
    724 #include <emmintrin.h>
    725 
    726 #ifdef _MSC_VER
    727 
    728 #if _MSC_VER >= 1400  // not VC6
    729 #include <intrin.h> // __cpuid
    730 static int stbi__cpuid3(void)
    731 {
    732    int info[4];
    733    __cpuid(info,1);
    734    return info[3];
    735 }
    736 #else
    737 static int stbi__cpuid3(void)
    738 {
    739    int res;
    740    __asm {
    741       mov  eax,1
    742       cpuid
    743       mov  res,edx
    744    }
    745    return res;
    746 }
    747 #endif
    748 
    749 #define STBI_SIMD_ALIGN(type, name) __declspec(align(16)) type name
    750 
    751 #if !defined(STBI_NO_JPEG) && defined(STBI_SSE2)
    752 static int stbi__sse2_available(void)
    753 {
    754    int info3 = stbi__cpuid3();
    755    return ((info3 >> 26) & 1) != 0;
    756 }
    757 #endif
    758 
    759 #else // assume GCC-style if not VC++
    760 #define STBI_SIMD_ALIGN(type, name) type name __attribute__((aligned(16)))
    761 
    762 #if !defined(STBI_NO_JPEG) && defined(STBI_SSE2)
    763 static int stbi__sse2_available(void)
    764 {
    765    // If we're even attempting to compile this on GCC/Clang, that means
    766    // -msse2 is on, which means the compiler is allowed to use SSE2
    767    // instructions at will, and so are we.
    768    return 1;
    769 }
    770 #endif
    771 
    772 #endif
    773 #endif
    774 
    775 // ARM NEON
    776 #if defined(STBI_NO_SIMD) && defined(STBI_NEON)
    777 #undef STBI_NEON
    778 #endif
    779 
    780 #ifdef STBI_NEON
    781 #include <arm_neon.h>
    782 #ifdef _MSC_VER
    783 #define STBI_SIMD_ALIGN(type, name) __declspec(align(16)) type name
    784 #else
    785 #define STBI_SIMD_ALIGN(type, name) type name __attribute__((aligned(16)))
    786 #endif
    787 #endif
    788 
    789 #ifndef STBI_SIMD_ALIGN
    790 #define STBI_SIMD_ALIGN(type, name) type name
    791 #endif
    792 
    793 #ifndef STBI_MAX_DIMENSIONS
    794 #define STBI_MAX_DIMENSIONS (1 << 24)
    795 #endif
    796 
    797 ///////////////////////////////////////////////
    798 //
    799 //  stbi__context struct and start_xxx functions
    800 
    801 // stbi__context structure is our basic context used by all images, so it
    802 // contains all the IO context, plus some basic image information
    803 typedef struct
    804 {
    805    stbi__uint32 img_x, img_y;
    806    int img_n, img_out_n;
    807 
    808    stbi_io_callbacks io;
    809    void *io_user_data;
    810 
    811    int read_from_callbacks;
    812    int buflen;
    813    stbi_uc buffer_start[128];
    814    int callback_already_read;
    815 
    816    stbi_uc *img_buffer, *img_buffer_end;
    817    stbi_uc *img_buffer_original, *img_buffer_original_end;
    818 } stbi__context;
    819 
    820 
    821 static void stbi__refill_buffer(stbi__context *s);
    822 
    823 // initialize a memory-decode context
    824 static void stbi__start_mem(stbi__context *s, stbi_uc const *buffer, int len)
    825 {
    826    s->io.read = NULL;
    827    s->read_from_callbacks = 0;
    828    s->callback_already_read = 0;
    829    s->img_buffer = s->img_buffer_original = (stbi_uc *) buffer;
    830    s->img_buffer_end = s->img_buffer_original_end = (stbi_uc *) buffer+len;
    831 }
    832 
    833 // initialize a callback-based context
    834 static void stbi__start_callbacks(stbi__context *s, stbi_io_callbacks *c, void *user)
    835 {
    836    s->io = *c;
    837    s->io_user_data = user;
    838    s->buflen = sizeof(s->buffer_start);
    839    s->read_from_callbacks = 1;
    840    s->callback_already_read = 0;
    841    s->img_buffer = s->img_buffer_original = s->buffer_start;
    842    stbi__refill_buffer(s);
    843    s->img_buffer_original_end = s->img_buffer_end;
    844 }
    845 
    846 #ifndef STBI_NO_STDIO
    847 
    848 static int stbi__stdio_read(void *user, char *data, int size)
    849 {
    850    return (int) fread(data,1,size,(FILE*) user);
    851 }
    852 
    853 static void stbi__stdio_skip(void *user, int n)
    854 {
    855    int ch;
    856    fseek((FILE*) user, n, SEEK_CUR);
    857    ch = fgetc((FILE*) user);  /* have to read a byte to reset feof()'s flag */
    858    if (ch != EOF) {
    859       ungetc(ch, (FILE *) user);  /* push byte back onto stream if valid. */
    860    }
    861 }
    862 
    863 static int stbi__stdio_eof(void *user)
    864 {
    865    return feof((FILE*) user) || ferror((FILE *) user);
    866 }
    867 
    868 static stbi_io_callbacks stbi__stdio_callbacks =
    869 {
    870    stbi__stdio_read,
    871    stbi__stdio_skip,
    872    stbi__stdio_eof,
    873 };
    874 
    875 static void stbi__start_file(stbi__context *s, FILE *f)
    876 {
    877    stbi__start_callbacks(s, &stbi__stdio_callbacks, (void *) f);
    878 }
    879 
    880 //static void stop_file(stbi__context *s) { }
    881 
    882 #endif // !STBI_NO_STDIO
    883 
    884 static void stbi__rewind(stbi__context *s)
    885 {
    886    // conceptually rewind SHOULD rewind to the beginning of the stream,
    887    // but we just rewind to the beginning of the initial buffer, because
    888    // we only use it after doing 'test', which only ever looks at at most 92 bytes
    889    s->img_buffer = s->img_buffer_original;
    890    s->img_buffer_end = s->img_buffer_original_end;
    891 }
    892 
    893 enum
    894 {
    895    STBI_ORDER_RGB,
    896    STBI_ORDER_BGR
    897 };
    898 
    899 typedef struct
    900 {
    901    int bits_per_channel;
    902    int num_channels;
    903    int channel_order;
    904 } stbi__result_info;
    905 
    906 #ifndef STBI_NO_JPEG
    907 static int      stbi__jpeg_test(stbi__context *s);
    908 static void    *stbi__jpeg_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
    909 static int      stbi__jpeg_info(stbi__context *s, int *x, int *y, int *comp);
    910 #endif
    911 
    912 #ifndef STBI_NO_PNG
    913 static int      stbi__png_test(stbi__context *s);
    914 static void    *stbi__png_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
    915 static int      stbi__png_info(stbi__context *s, int *x, int *y, int *comp);
    916 static int      stbi__png_is16(stbi__context *s);
    917 #endif
    918 
    919 #ifndef STBI_NO_BMP
    920 static int      stbi__bmp_test(stbi__context *s);
    921 static void    *stbi__bmp_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
    922 static int      stbi__bmp_info(stbi__context *s, int *x, int *y, int *comp);
    923 #endif
    924 
    925 #ifndef STBI_NO_TGA
    926 static int      stbi__tga_test(stbi__context *s);
    927 static void    *stbi__tga_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
    928 static int      stbi__tga_info(stbi__context *s, int *x, int *y, int *comp);
    929 #endif
    930 
    931 #ifndef STBI_NO_PSD
    932 static int      stbi__psd_test(stbi__context *s);
    933 static void    *stbi__psd_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri, int bpc);
    934 static int      stbi__psd_info(stbi__context *s, int *x, int *y, int *comp);
    935 static int      stbi__psd_is16(stbi__context *s);
    936 #endif
    937 
    938 #ifndef STBI_NO_HDR
    939 static int      stbi__hdr_test(stbi__context *s);
    940 static float   *stbi__hdr_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
    941 static int      stbi__hdr_info(stbi__context *s, int *x, int *y, int *comp);
    942 #endif
    943 
    944 #ifndef STBI_NO_PIC
    945 static int      stbi__pic_test(stbi__context *s);
    946 static void    *stbi__pic_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
    947 static int      stbi__pic_info(stbi__context *s, int *x, int *y, int *comp);
    948 #endif
    949 
    950 #ifndef STBI_NO_GIF
    951 static int      stbi__gif_test(stbi__context *s);
    952 static void    *stbi__gif_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
    953 static void    *stbi__load_gif_main(stbi__context *s, int **delays, int *x, int *y, int *z, int *comp, int req_comp);
    954 static int      stbi__gif_info(stbi__context *s, int *x, int *y, int *comp);
    955 #endif
    956 
    957 #ifndef STBI_NO_PNM
    958 static int      stbi__pnm_test(stbi__context *s);
    959 static void    *stbi__pnm_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
    960 static int      stbi__pnm_info(stbi__context *s, int *x, int *y, int *comp);
    961 static int      stbi__pnm_is16(stbi__context *s);
    962 #endif
    963 
    964 static
    965 #ifdef STBI_THREAD_LOCAL
    966 STBI_THREAD_LOCAL
    967 #endif
    968 const char *stbi__g_failure_reason;
    969 
    970 STBIDEF const char *stbi_failure_reason(void)
    971 {
    972    return stbi__g_failure_reason;
    973 }
    974 
    975 #ifndef STBI_NO_FAILURE_STRINGS
    976 static int stbi__err(const char *str)
    977 {
    978    stbi__g_failure_reason = str;
    979    return 0;
    980 }
    981 #endif
    982 
    983 static void *stbi__malloc(size_t size)
    984 {
    985     return STBI_MALLOC(size);
    986 }
    987 
    988 // stb_image uses ints pervasively, including for offset calculations.
    989 // therefore the largest decoded image size we can support with the
    990 // current code, even on 64-bit targets, is INT_MAX. this is not a
    991 // significant limitation for the intended use case.
    992 //
    993 // we do, however, need to make sure our size calculations don't
    994 // overflow. hence a few helper functions for size calculations that
    995 // multiply integers together, making sure that they're non-negative
    996 // and no overflow occurs.
    997 
    998 // return 1 if the sum is valid, 0 on overflow.
    999 // negative terms are considered invalid.
   1000 static int stbi__addsizes_valid(int a, int b)
   1001 {
   1002    if (b < 0) return 0;
   1003    // now 0 <= b <= INT_MAX, hence also
   1004    // 0 <= INT_MAX - b <= INTMAX.
   1005    // And "a + b <= INT_MAX" (which might overflow) is the
   1006    // same as a <= INT_MAX - b (no overflow)
   1007    return a <= INT_MAX - b;
   1008 }
   1009 
   1010 // returns 1 if the product is valid, 0 on overflow.
   1011 // negative factors are considered invalid.
   1012 static int stbi__mul2sizes_valid(int a, int b)
   1013 {
   1014    if (a < 0 || b < 0) return 0;
   1015    if (b == 0) return 1; // mul-by-0 is always safe
   1016    // portable way to check for no overflows in a*b
   1017    return a <= INT_MAX/b;
   1018 }
   1019 
   1020 #if !defined(STBI_NO_JPEG) || !defined(STBI_NO_PNG) || !defined(STBI_NO_TGA) || !defined(STBI_NO_HDR)
   1021 // returns 1 if "a*b + add" has no negative terms/factors and doesn't overflow
   1022 static int stbi__mad2sizes_valid(int a, int b, int add)
   1023 {
   1024    return stbi__mul2sizes_valid(a, b) && stbi__addsizes_valid(a*b, add);
   1025 }
   1026 #endif
   1027 
   1028 // returns 1 if "a*b*c + add" has no negative terms/factors and doesn't overflow
   1029 static int stbi__mad3sizes_valid(int a, int b, int c, int add)
   1030 {
   1031    return stbi__mul2sizes_valid(a, b) && stbi__mul2sizes_valid(a*b, c) &&
   1032       stbi__addsizes_valid(a*b*c, add);
   1033 }
   1034 
   1035 // returns 1 if "a*b*c*d + add" has no negative terms/factors and doesn't overflow
   1036 #if !defined(STBI_NO_LINEAR) || !defined(STBI_NO_HDR) || !defined(STBI_NO_PNM)
   1037 static int stbi__mad4sizes_valid(int a, int b, int c, int d, int add)
   1038 {
   1039    return stbi__mul2sizes_valid(a, b) && stbi__mul2sizes_valid(a*b, c) &&
   1040       stbi__mul2sizes_valid(a*b*c, d) && stbi__addsizes_valid(a*b*c*d, add);
   1041 }
   1042 #endif
   1043 
   1044 #if !defined(STBI_NO_JPEG) || !defined(STBI_NO_PNG) || !defined(STBI_NO_TGA) || !defined(STBI_NO_HDR)
   1045 // mallocs with size overflow checking
   1046 static void *stbi__malloc_mad2(int a, int b, int add)
   1047 {
   1048    if (!stbi__mad2sizes_valid(a, b, add)) return NULL;
   1049    return stbi__malloc(a*b + add);
   1050 }
   1051 #endif
   1052 
   1053 static void *stbi__malloc_mad3(int a, int b, int c, int add)
   1054 {
   1055    if (!stbi__mad3sizes_valid(a, b, c, add)) return NULL;
   1056    return stbi__malloc(a*b*c + add);
   1057 }
   1058 
   1059 #if !defined(STBI_NO_LINEAR) || !defined(STBI_NO_HDR) || !defined(STBI_NO_PNM)
   1060 static void *stbi__malloc_mad4(int a, int b, int c, int d, int add)
   1061 {
   1062    if (!stbi__mad4sizes_valid(a, b, c, d, add)) return NULL;
   1063    return stbi__malloc(a*b*c*d + add);
   1064 }
   1065 #endif
   1066 
   1067 // returns 1 if the sum of two signed ints is valid (between -2^31 and 2^31-1 inclusive), 0 on overflow.
   1068 static int stbi__addints_valid(int a, int b)
   1069 {
   1070    if ((a >= 0) != (b >= 0)) return 1; // a and b have different signs, so no overflow
   1071    if (a < 0 && b < 0) return a >= INT_MIN - b; // same as a + b >= INT_MIN; INT_MIN - b cannot overflow since b < 0.
   1072    return a <= INT_MAX - b;
   1073 }
   1074 
   1075 // returns 1 if the product of two signed shorts is valid, 0 on overflow.
   1076 static int stbi__mul2shorts_valid(short a, short b)
   1077 {
   1078    if (b == 0 || b == -1) return 1; // multiplication by 0 is always 0; check for -1 so SHRT_MIN/b doesn't overflow
   1079    if ((a >= 0) == (b >= 0)) return a <= SHRT_MAX/b; // product is positive, so similar to mul2sizes_valid
   1080    if (b < 0) return a <= SHRT_MIN / b; // same as a * b >= SHRT_MIN
   1081    return a >= SHRT_MIN / b;
   1082 }
   1083 
   1084 // stbi__err - error
   1085 // stbi__errpf - error returning pointer to float
   1086 // stbi__errpuc - error returning pointer to unsigned char
   1087 
   1088 #ifdef STBI_NO_FAILURE_STRINGS
   1089    #define stbi__err(x,y)  0
   1090 #elif defined(STBI_FAILURE_USERMSG)
   1091    #define stbi__err(x,y)  stbi__err(y)
   1092 #else
   1093    #define stbi__err(x,y)  stbi__err(x)
   1094 #endif
   1095 
   1096 #define stbi__errpf(x,y)   ((float *)(size_t) (stbi__err(x,y)?NULL:NULL))
   1097 #define stbi__errpuc(x,y)  ((unsigned char *)(size_t) (stbi__err(x,y)?NULL:NULL))
   1098 
   1099 STBIDEF void stbi_image_free(void *retval_from_stbi_load)
   1100 {
   1101    STBI_FREE(retval_from_stbi_load);
   1102 }
   1103 
   1104 #ifndef STBI_NO_LINEAR
   1105 static float   *stbi__ldr_to_hdr(stbi_uc *data, int x, int y, int comp);
   1106 #endif
   1107 
   1108 #ifndef STBI_NO_HDR
   1109 static stbi_uc *stbi__hdr_to_ldr(float   *data, int x, int y, int comp);
   1110 #endif
   1111 
   1112 static int stbi__vertically_flip_on_load_global = 0;
   1113 
   1114 STBIDEF void stbi_set_flip_vertically_on_load(int flag_true_if_should_flip)
   1115 {
   1116    stbi__vertically_flip_on_load_global = flag_true_if_should_flip;
   1117 }
   1118 
   1119 #ifndef STBI_THREAD_LOCAL
   1120 #define stbi__vertically_flip_on_load  stbi__vertically_flip_on_load_global
   1121 #else
   1122 static STBI_THREAD_LOCAL int stbi__vertically_flip_on_load_local, stbi__vertically_flip_on_load_set;
   1123 
   1124 STBIDEF void stbi_set_flip_vertically_on_load_thread(int flag_true_if_should_flip)
   1125 {
   1126    stbi__vertically_flip_on_load_local = flag_true_if_should_flip;
   1127    stbi__vertically_flip_on_load_set = 1;
   1128 }
   1129 
   1130 #define stbi__vertically_flip_on_load  (stbi__vertically_flip_on_load_set       \
   1131                                          ? stbi__vertically_flip_on_load_local  \
   1132                                          : stbi__vertically_flip_on_load_global)
   1133 #endif // STBI_THREAD_LOCAL
   1134 
   1135 static void *stbi__load_main(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri, int bpc)
   1136 {
   1137    memset(ri, 0, sizeof(*ri)); // make sure it's initialized if we add new fields
   1138    ri->bits_per_channel = 8; // default is 8 so most paths don't have to be changed
   1139    ri->channel_order = STBI_ORDER_RGB; // all current input & output are this, but this is here so we can add BGR order
   1140    ri->num_channels = 0;
   1141 
   1142    // test the formats with a very explicit header first (at least a FOURCC
   1143    // or distinctive magic number first)
   1144    #ifndef STBI_NO_PNG
   1145    if (stbi__png_test(s))  return stbi__png_load(s,x,y,comp,req_comp, ri);
   1146    #endif
   1147    #ifndef STBI_NO_BMP
   1148    if (stbi__bmp_test(s))  return stbi__bmp_load(s,x,y,comp,req_comp, ri);
   1149    #endif
   1150    #ifndef STBI_NO_GIF
   1151    if (stbi__gif_test(s))  return stbi__gif_load(s,x,y,comp,req_comp, ri);
   1152    #endif
   1153    #ifndef STBI_NO_PSD
   1154    if (stbi__psd_test(s))  return stbi__psd_load(s,x,y,comp,req_comp, ri, bpc);
   1155    #else
   1156    STBI_NOTUSED(bpc);
   1157    #endif
   1158    #ifndef STBI_NO_PIC
   1159    if (stbi__pic_test(s))  return stbi__pic_load(s,x,y,comp,req_comp, ri);
   1160    #endif
   1161 
   1162    // then the formats that can end up attempting to load with just 1 or 2
   1163    // bytes matching expectations; these are prone to false positives, so
   1164    // try them later
   1165    #ifndef STBI_NO_JPEG
   1166    if (stbi__jpeg_test(s)) return stbi__jpeg_load(s,x,y,comp,req_comp, ri);
   1167    #endif
   1168    #ifndef STBI_NO_PNM
   1169    if (stbi__pnm_test(s))  return stbi__pnm_load(s,x,y,comp,req_comp, ri);
   1170    #endif
   1171 
   1172    #ifndef STBI_NO_HDR
   1173    if (stbi__hdr_test(s)) {
   1174       float *hdr = stbi__hdr_load(s, x,y,comp,req_comp, ri);
   1175       return stbi__hdr_to_ldr(hdr, *x, *y, req_comp ? req_comp : *comp);
   1176    }
   1177    #endif
   1178 
   1179    #ifndef STBI_NO_TGA
   1180    // test tga last because it's a crappy test!
   1181    if (stbi__tga_test(s))
   1182       return stbi__tga_load(s,x,y,comp,req_comp, ri);
   1183    #endif
   1184 
   1185    return stbi__errpuc("unknown image type", "Image not of any known type, or corrupt");
   1186 }
   1187 
   1188 static stbi_uc *stbi__convert_16_to_8(stbi__uint16 *orig, int w, int h, int channels)
   1189 {
   1190    int i;
   1191    int img_len = w * h * channels;
   1192    stbi_uc *reduced;
   1193 
   1194    reduced = (stbi_uc *) stbi__malloc(img_len);
   1195    if (reduced == NULL) return stbi__errpuc("outofmem", "Out of memory");
   1196 
   1197    for (i = 0; i < img_len; ++i)
   1198       reduced[i] = (stbi_uc)((orig[i] >> 8) & 0xFF); // top half of each byte is sufficient approx of 16->8 bit scaling
   1199 
   1200    STBI_FREE(orig);
   1201    return reduced;
   1202 }
   1203 
   1204 static stbi__uint16 *stbi__convert_8_to_16(stbi_uc *orig, int w, int h, int channels)
   1205 {
   1206    int i;
   1207    int img_len = w * h * channels;
   1208    stbi__uint16 *enlarged;
   1209 
   1210    enlarged = (stbi__uint16 *) stbi__malloc(img_len*2);
   1211    if (enlarged == NULL) return (stbi__uint16 *) stbi__errpuc("outofmem", "Out of memory");
   1212 
   1213    for (i = 0; i < img_len; ++i)
   1214       enlarged[i] = (stbi__uint16)((orig[i] << 8) + orig[i]); // replicate to high and low byte, maps 0->0, 255->0xffff
   1215 
   1216    STBI_FREE(orig);
   1217    return enlarged;
   1218 }
   1219 
   1220 static void stbi__vertical_flip(void *image, int w, int h, int bytes_per_pixel)
   1221 {
   1222    int row;
   1223    size_t bytes_per_row = (size_t)w * bytes_per_pixel;
   1224    stbi_uc temp[2048];
   1225    stbi_uc *bytes = (stbi_uc *)image;
   1226 
   1227    for (row = 0; row < (h>>1); row++) {
   1228       stbi_uc *row0 = bytes + row*bytes_per_row;
   1229       stbi_uc *row1 = bytes + (h - row - 1)*bytes_per_row;
   1230       // swap row0 with row1
   1231       size_t bytes_left = bytes_per_row;
   1232       while (bytes_left) {
   1233          size_t bytes_copy = (bytes_left < sizeof(temp)) ? bytes_left : sizeof(temp);
   1234          memcpy(temp, row0, bytes_copy);
   1235          memcpy(row0, row1, bytes_copy);
   1236          memcpy(row1, temp, bytes_copy);
   1237          row0 += bytes_copy;
   1238          row1 += bytes_copy;
   1239          bytes_left -= bytes_copy;
   1240       }
   1241    }
   1242 }
   1243 
   1244 #ifndef STBI_NO_GIF
   1245 static void stbi__vertical_flip_slices(void *image, int w, int h, int z, int bytes_per_pixel)
   1246 {
   1247    int slice;
   1248    int slice_size = w * h * bytes_per_pixel;
   1249 
   1250    stbi_uc *bytes = (stbi_uc *)image;
   1251    for (slice = 0; slice < z; ++slice) {
   1252       stbi__vertical_flip(bytes, w, h, bytes_per_pixel);
   1253       bytes += slice_size;
   1254    }
   1255 }
   1256 #endif
   1257 
   1258 static unsigned char *stbi__load_and_postprocess_8bit(stbi__context *s, int *x, int *y, int *comp, int req_comp)
   1259 {
   1260    stbi__result_info ri;
   1261    void *result = stbi__load_main(s, x, y, comp, req_comp, &ri, 8);
   1262 
   1263    if (result == NULL)
   1264       return NULL;
   1265 
   1266    // it is the responsibility of the loaders to make sure we get either 8 or 16 bit.
   1267    STBI_ASSERT(ri.bits_per_channel == 8 || ri.bits_per_channel == 16);
   1268 
   1269    if (ri.bits_per_channel != 8) {
   1270       result = stbi__convert_16_to_8((stbi__uint16 *) result, *x, *y, req_comp == 0 ? *comp : req_comp);
   1271       ri.bits_per_channel = 8;
   1272    }
   1273 
   1274    // @TODO: move stbi__convert_format to here
   1275 
   1276    if (stbi__vertically_flip_on_load) {
   1277       int channels = req_comp ? req_comp : *comp;
   1278       stbi__vertical_flip(result, *x, *y, channels * sizeof(stbi_uc));
   1279    }
   1280 
   1281    return (unsigned char *) result;
   1282 }
   1283 
   1284 static stbi__uint16 *stbi__load_and_postprocess_16bit(stbi__context *s, int *x, int *y, int *comp, int req_comp)
   1285 {
   1286    stbi__result_info ri;
   1287    void *result = stbi__load_main(s, x, y, comp, req_comp, &ri, 16);
   1288 
   1289    if (result == NULL)
   1290       return NULL;
   1291 
   1292    // it is the responsibility of the loaders to make sure we get either 8 or 16 bit.
   1293    STBI_ASSERT(ri.bits_per_channel == 8 || ri.bits_per_channel == 16);
   1294 
   1295    if (ri.bits_per_channel != 16) {
   1296       result = stbi__convert_8_to_16((stbi_uc *) result, *x, *y, req_comp == 0 ? *comp : req_comp);
   1297       ri.bits_per_channel = 16;
   1298    }
   1299 
   1300    // @TODO: move stbi__convert_format16 to here
   1301    // @TODO: special case RGB-to-Y (and RGBA-to-YA) for 8-bit-to-16-bit case to keep more precision
   1302 
   1303    if (stbi__vertically_flip_on_load) {
   1304       int channels = req_comp ? req_comp : *comp;
   1305       stbi__vertical_flip(result, *x, *y, channels * sizeof(stbi__uint16));
   1306    }
   1307 
   1308    return (stbi__uint16 *) result;
   1309 }
   1310 
   1311 #if !defined(STBI_NO_HDR) && !defined(STBI_NO_LINEAR)
   1312 static void stbi__float_postprocess(float *result, int *x, int *y, int *comp, int req_comp)
   1313 {
   1314    if (stbi__vertically_flip_on_load && result != NULL) {
   1315       int channels = req_comp ? req_comp : *comp;
   1316       stbi__vertical_flip(result, *x, *y, channels * sizeof(float));
   1317    }
   1318 }
   1319 #endif
   1320 
   1321 #ifndef STBI_NO_STDIO
   1322 
   1323 #if defined(_WIN32) && defined(STBI_WINDOWS_UTF8)
   1324 STBI_EXTERN __declspec(dllimport) int __stdcall MultiByteToWideChar(unsigned int cp, unsigned long flags, const char *str, int cbmb, wchar_t *widestr, int cchwide);
   1325 STBI_EXTERN __declspec(dllimport) int __stdcall WideCharToMultiByte(unsigned int cp, unsigned long flags, const wchar_t *widestr, int cchwide, char *str, int cbmb, const char *defchar, int *used_default);
   1326 #endif
   1327 
   1328 #if defined(_WIN32) && defined(STBI_WINDOWS_UTF8)
   1329 STBIDEF int stbi_convert_wchar_to_utf8(char *buffer, size_t bufferlen, const wchar_t* input)
   1330 {
   1331 	return WideCharToMultiByte(65001 /* UTF8 */, 0, input, -1, buffer, (int) bufferlen, NULL, NULL);
   1332 }
   1333 #endif
   1334 
   1335 static FILE *stbi__fopen(char const *filename, char const *mode)
   1336 {
   1337    FILE *f;
   1338 #if defined(_WIN32) && defined(STBI_WINDOWS_UTF8)
   1339    wchar_t wMode[64];
   1340    wchar_t wFilename[1024];
   1341 	if (0 == MultiByteToWideChar(65001 /* UTF8 */, 0, filename, -1, wFilename, sizeof(wFilename)/sizeof(*wFilename)))
   1342       return 0;
   1343 
   1344 	if (0 == MultiByteToWideChar(65001 /* UTF8 */, 0, mode, -1, wMode, sizeof(wMode)/sizeof(*wMode)))
   1345       return 0;
   1346 
   1347 #if defined(_MSC_VER) && _MSC_VER >= 1400
   1348 	if (0 != _wfopen_s(&f, wFilename, wMode))
   1349 		f = 0;
   1350 #else
   1351    f = _wfopen(wFilename, wMode);
   1352 #endif
   1353 
   1354 #elif defined(_MSC_VER) && _MSC_VER >= 1400
   1355    if (0 != fopen_s(&f, filename, mode))
   1356       f=0;
   1357 #else
   1358    f = fopen(filename, mode);
   1359 #endif
   1360    return f;
   1361 }
   1362 
   1363 
   1364 STBIDEF stbi_uc *stbi_load(char const *filename, int *x, int *y, int *comp, int req_comp)
   1365 {
   1366    FILE *f = stbi__fopen(filename, "rb");
   1367    unsigned char *result;
   1368    if (!f) return stbi__errpuc("can't fopen", "Unable to open file");
   1369    result = stbi_load_from_file(f,x,y,comp,req_comp);
   1370    fclose(f);
   1371    return result;
   1372 }
   1373 
   1374 STBIDEF stbi_uc *stbi_load_from_file(FILE *f, int *x, int *y, int *comp, int req_comp)
   1375 {
   1376    unsigned char *result;
   1377    stbi__context s;
   1378    stbi__start_file(&s,f);
   1379    result = stbi__load_and_postprocess_8bit(&s,x,y,comp,req_comp);
   1380    if (result) {
   1381       // need to 'unget' all the characters in the IO buffer
   1382       fseek(f, - (int) (s.img_buffer_end - s.img_buffer), SEEK_CUR);
   1383    }
   1384    return result;
   1385 }
   1386 
   1387 STBIDEF stbi__uint16 *stbi_load_from_file_16(FILE *f, int *x, int *y, int *comp, int req_comp)
   1388 {
   1389    stbi__uint16 *result;
   1390    stbi__context s;
   1391    stbi__start_file(&s,f);
   1392    result = stbi__load_and_postprocess_16bit(&s,x,y,comp,req_comp);
   1393    if (result) {
   1394       // need to 'unget' all the characters in the IO buffer
   1395       fseek(f, - (int) (s.img_buffer_end - s.img_buffer), SEEK_CUR);
   1396    }
   1397    return result;
   1398 }
   1399 
   1400 STBIDEF stbi_us *stbi_load_16(char const *filename, int *x, int *y, int *comp, int req_comp)
   1401 {
   1402    FILE *f = stbi__fopen(filename, "rb");
   1403    stbi__uint16 *result;
   1404    if (!f) return (stbi_us *) stbi__errpuc("can't fopen", "Unable to open file");
   1405    result = stbi_load_from_file_16(f,x,y,comp,req_comp);
   1406    fclose(f);
   1407    return result;
   1408 }
   1409 
   1410 
   1411 #endif //!STBI_NO_STDIO
   1412 
   1413 STBIDEF stbi_us *stbi_load_16_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *channels_in_file, int desired_channels)
   1414 {
   1415    stbi__context s;
   1416    stbi__start_mem(&s,buffer,len);
   1417    return stbi__load_and_postprocess_16bit(&s,x,y,channels_in_file,desired_channels);
   1418 }
   1419 
   1420 STBIDEF stbi_us *stbi_load_16_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *channels_in_file, int desired_channels)
   1421 {
   1422    stbi__context s;
   1423    stbi__start_callbacks(&s, (stbi_io_callbacks *)clbk, user);
   1424    return stbi__load_and_postprocess_16bit(&s,x,y,channels_in_file,desired_channels);
   1425 }
   1426 
   1427 STBIDEF stbi_uc *stbi_load_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp, int req_comp)
   1428 {
   1429    stbi__context s;
   1430    stbi__start_mem(&s,buffer,len);
   1431    return stbi__load_and_postprocess_8bit(&s,x,y,comp,req_comp);
   1432 }
   1433 
   1434 STBIDEF stbi_uc *stbi_load_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *comp, int req_comp)
   1435 {
   1436    stbi__context s;
   1437    stbi__start_callbacks(&s, (stbi_io_callbacks *) clbk, user);
   1438    return stbi__load_and_postprocess_8bit(&s,x,y,comp,req_comp);
   1439 }
   1440 
   1441 #ifndef STBI_NO_GIF
   1442 STBIDEF stbi_uc *stbi_load_gif_from_memory(stbi_uc const *buffer, int len, int **delays, int *x, int *y, int *z, int *comp, int req_comp)
   1443 {
   1444    unsigned char *result;
   1445    stbi__context s;
   1446    stbi__start_mem(&s,buffer,len);
   1447 
   1448    result = (unsigned char*) stbi__load_gif_main(&s, delays, x, y, z, comp, req_comp);
   1449    if (stbi__vertically_flip_on_load) {
   1450       stbi__vertical_flip_slices( result, *x, *y, *z, *comp );
   1451    }
   1452 
   1453    return result;
   1454 }
   1455 #endif
   1456 
   1457 #ifndef STBI_NO_LINEAR
   1458 static float *stbi__loadf_main(stbi__context *s, int *x, int *y, int *comp, int req_comp)
   1459 {
   1460    unsigned char *data;
   1461    #ifndef STBI_NO_HDR
   1462    if (stbi__hdr_test(s)) {
   1463       stbi__result_info ri;
   1464       float *hdr_data = stbi__hdr_load(s,x,y,comp,req_comp, &ri);
   1465       if (hdr_data)
   1466          stbi__float_postprocess(hdr_data,x,y,comp,req_comp);
   1467       return hdr_data;
   1468    }
   1469    #endif
   1470    data = stbi__load_and_postprocess_8bit(s, x, y, comp, req_comp);
   1471    if (data)
   1472       return stbi__ldr_to_hdr(data, *x, *y, req_comp ? req_comp : *comp);
   1473    return stbi__errpf("unknown image type", "Image not of any known type, or corrupt");
   1474 }
   1475 
   1476 STBIDEF float *stbi_loadf_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp, int req_comp)
   1477 {
   1478    stbi__context s;
   1479    stbi__start_mem(&s,buffer,len);
   1480    return stbi__loadf_main(&s,x,y,comp,req_comp);
   1481 }
   1482 
   1483 STBIDEF float *stbi_loadf_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *comp, int req_comp)
   1484 {
   1485    stbi__context s;
   1486    stbi__start_callbacks(&s, (stbi_io_callbacks *) clbk, user);
   1487    return stbi__loadf_main(&s,x,y,comp,req_comp);
   1488 }
   1489 
   1490 #ifndef STBI_NO_STDIO
   1491 STBIDEF float *stbi_loadf(char const *filename, int *x, int *y, int *comp, int req_comp)
   1492 {
   1493    float *result;
   1494    FILE *f = stbi__fopen(filename, "rb");
   1495    if (!f) return stbi__errpf("can't fopen", "Unable to open file");
   1496    result = stbi_loadf_from_file(f,x,y,comp,req_comp);
   1497    fclose(f);
   1498    return result;
   1499 }
   1500 
   1501 STBIDEF float *stbi_loadf_from_file(FILE *f, int *x, int *y, int *comp, int req_comp)
   1502 {
   1503    stbi__context s;
   1504    stbi__start_file(&s,f);
   1505    return stbi__loadf_main(&s,x,y,comp,req_comp);
   1506 }
   1507 #endif // !STBI_NO_STDIO
   1508 
   1509 #endif // !STBI_NO_LINEAR
   1510 
   1511 // these is-hdr-or-not is defined independent of whether STBI_NO_LINEAR is
   1512 // defined, for API simplicity; if STBI_NO_LINEAR is defined, it always
   1513 // reports false!
   1514 
   1515 STBIDEF int stbi_is_hdr_from_memory(stbi_uc const *buffer, int len)
   1516 {
   1517    #ifndef STBI_NO_HDR
   1518    stbi__context s;
   1519    stbi__start_mem(&s,buffer,len);
   1520    return stbi__hdr_test(&s);
   1521    #else
   1522    STBI_NOTUSED(buffer);
   1523    STBI_NOTUSED(len);
   1524    return 0;
   1525    #endif
   1526 }
   1527 
   1528 #ifndef STBI_NO_STDIO
   1529 STBIDEF int      stbi_is_hdr          (char const *filename)
   1530 {
   1531    FILE *f = stbi__fopen(filename, "rb");
   1532    int result=0;
   1533    if (f) {
   1534       result = stbi_is_hdr_from_file(f);
   1535       fclose(f);
   1536    }
   1537    return result;
   1538 }
   1539 
   1540 STBIDEF int stbi_is_hdr_from_file(FILE *f)
   1541 {
   1542    #ifndef STBI_NO_HDR
   1543    long pos = ftell(f);
   1544    int res;
   1545    stbi__context s;
   1546    stbi__start_file(&s,f);
   1547    res = stbi__hdr_test(&s);
   1548    fseek(f, pos, SEEK_SET);
   1549    return res;
   1550    #else
   1551    STBI_NOTUSED(f);
   1552    return 0;
   1553    #endif
   1554 }
   1555 #endif // !STBI_NO_STDIO
   1556 
   1557 STBIDEF int      stbi_is_hdr_from_callbacks(stbi_io_callbacks const *clbk, void *user)
   1558 {
   1559    #ifndef STBI_NO_HDR
   1560    stbi__context s;
   1561    stbi__start_callbacks(&s, (stbi_io_callbacks *) clbk, user);
   1562    return stbi__hdr_test(&s);
   1563    #else
   1564    STBI_NOTUSED(clbk);
   1565    STBI_NOTUSED(user);
   1566    return 0;
   1567    #endif
   1568 }
   1569 
   1570 #ifndef STBI_NO_LINEAR
   1571 static float stbi__l2h_gamma=2.2f, stbi__l2h_scale=1.0f;
   1572 
   1573 STBIDEF void   stbi_ldr_to_hdr_gamma(float gamma) { stbi__l2h_gamma = gamma; }
   1574 STBIDEF void   stbi_ldr_to_hdr_scale(float scale) { stbi__l2h_scale = scale; }
   1575 #endif
   1576 
   1577 static float stbi__h2l_gamma_i=1.0f/2.2f, stbi__h2l_scale_i=1.0f;
   1578 
   1579 STBIDEF void   stbi_hdr_to_ldr_gamma(float gamma) { stbi__h2l_gamma_i = 1/gamma; }
   1580 STBIDEF void   stbi_hdr_to_ldr_scale(float scale) { stbi__h2l_scale_i = 1/scale; }
   1581 
   1582 
   1583 //////////////////////////////////////////////////////////////////////////////
   1584 //
   1585 // Common code used by all image loaders
   1586 //
   1587 
   1588 enum
   1589 {
   1590    STBI__SCAN_load=0,
   1591    STBI__SCAN_type,
   1592    STBI__SCAN_header
   1593 };
   1594 
   1595 static void stbi__refill_buffer(stbi__context *s)
   1596 {
   1597    int n = (s->io.read)(s->io_user_data,(char*)s->buffer_start,s->buflen);
   1598    s->callback_already_read += (int) (s->img_buffer - s->img_buffer_original);
   1599    if (n == 0) {
   1600       // at end of file, treat same as if from memory, but need to handle case
   1601       // where s->img_buffer isn't pointing to safe memory, e.g. 0-byte file
   1602       s->read_from_callbacks = 0;
   1603       s->img_buffer = s->buffer_start;
   1604       s->img_buffer_end = s->buffer_start+1;
   1605       *s->img_buffer = 0;
   1606    } else {
   1607       s->img_buffer = s->buffer_start;
   1608       s->img_buffer_end = s->buffer_start + n;
   1609    }
   1610 }
   1611 
   1612 stbi_inline static stbi_uc stbi__get8(stbi__context *s)
   1613 {
   1614    if (s->img_buffer < s->img_buffer_end)
   1615       return *s->img_buffer++;
   1616    if (s->read_from_callbacks) {
   1617       stbi__refill_buffer(s);
   1618       return *s->img_buffer++;
   1619    }
   1620    return 0;
   1621 }
   1622 
   1623 #if defined(STBI_NO_JPEG) && defined(STBI_NO_HDR) && defined(STBI_NO_PIC) && defined(STBI_NO_PNM)
   1624 // nothing
   1625 #else
   1626 stbi_inline static int stbi__at_eof(stbi__context *s)
   1627 {
   1628    if (s->io.read) {
   1629       if (!(s->io.eof)(s->io_user_data)) return 0;
   1630       // if feof() is true, check if buffer = end
   1631       // special case: we've only got the special 0 character at the end
   1632       if (s->read_from_callbacks == 0) return 1;
   1633    }
   1634 
   1635    return s->img_buffer >= s->img_buffer_end;
   1636 }
   1637 #endif
   1638 
   1639 #if defined(STBI_NO_JPEG) && defined(STBI_NO_PNG) && defined(STBI_NO_BMP) && defined(STBI_NO_PSD) && defined(STBI_NO_TGA) && defined(STBI_NO_GIF) && defined(STBI_NO_PIC)
   1640 // nothing
   1641 #else
   1642 static void stbi__skip(stbi__context *s, int n)
   1643 {
   1644    if (n == 0) return;  // already there!
   1645    if (n < 0) {
   1646       s->img_buffer = s->img_buffer_end;
   1647       return;
   1648    }
   1649    if (s->io.read) {
   1650       int blen = (int) (s->img_buffer_end - s->img_buffer);
   1651       if (blen < n) {
   1652          s->img_buffer = s->img_buffer_end;
   1653          (s->io.skip)(s->io_user_data, n - blen);
   1654          return;
   1655       }
   1656    }
   1657    s->img_buffer += n;
   1658 }
   1659 #endif
   1660 
   1661 #if defined(STBI_NO_PNG) && defined(STBI_NO_TGA) && defined(STBI_NO_HDR) && defined(STBI_NO_PNM)
   1662 // nothing
   1663 #else
   1664 static int stbi__getn(stbi__context *s, stbi_uc *buffer, int n)
   1665 {
   1666    if (s->io.read) {
   1667       int blen = (int) (s->img_buffer_end - s->img_buffer);
   1668       if (blen < n) {
   1669          int res, count;
   1670 
   1671          memcpy(buffer, s->img_buffer, blen);
   1672 
   1673          count = (s->io.read)(s->io_user_data, (char*) buffer + blen, n - blen);
   1674          res = (count == (n-blen));
   1675          s->img_buffer = s->img_buffer_end;
   1676          return res;
   1677       }
   1678    }
   1679 
   1680    if (s->img_buffer+n <= s->img_buffer_end) {
   1681       memcpy(buffer, s->img_buffer, n);
   1682       s->img_buffer += n;
   1683       return 1;
   1684    } else
   1685       return 0;
   1686 }
   1687 #endif
   1688 
   1689 #if defined(STBI_NO_JPEG) && defined(STBI_NO_PNG) && defined(STBI_NO_PSD) && defined(STBI_NO_PIC)
   1690 // nothing
   1691 #else
   1692 static int stbi__get16be(stbi__context *s)
   1693 {
   1694    int z = stbi__get8(s);
   1695    return (z << 8) + stbi__get8(s);
   1696 }
   1697 #endif
   1698 
   1699 #if defined(STBI_NO_PNG) && defined(STBI_NO_PSD) && defined(STBI_NO_PIC)
   1700 // nothing
   1701 #else
   1702 static stbi__uint32 stbi__get32be(stbi__context *s)
   1703 {
   1704    stbi__uint32 z = stbi__get16be(s);
   1705    return (z << 16) + stbi__get16be(s);
   1706 }
   1707 #endif
   1708 
   1709 #if defined(STBI_NO_BMP) && defined(STBI_NO_TGA) && defined(STBI_NO_GIF)
   1710 // nothing
   1711 #else
   1712 static int stbi__get16le(stbi__context *s)
   1713 {
   1714    int z = stbi__get8(s);
   1715    return z + (stbi__get8(s) << 8);
   1716 }
   1717 #endif
   1718 
   1719 #ifndef STBI_NO_BMP
   1720 static stbi__uint32 stbi__get32le(stbi__context *s)
   1721 {
   1722    stbi__uint32 z = stbi__get16le(s);
   1723    z += (stbi__uint32)stbi__get16le(s) << 16;
   1724    return z;
   1725 }
   1726 #endif
   1727 
   1728 #define STBI__BYTECAST(x)  ((stbi_uc) ((x) & 255))  // truncate int to byte without warnings
   1729 
   1730 #if defined(STBI_NO_JPEG) && defined(STBI_NO_PNG) && defined(STBI_NO_BMP) && defined(STBI_NO_PSD) && defined(STBI_NO_TGA) && defined(STBI_NO_GIF) && defined(STBI_NO_PIC) && defined(STBI_NO_PNM)
   1731 // nothing
   1732 #else
   1733 //////////////////////////////////////////////////////////////////////////////
   1734 //
   1735 //  generic converter from built-in img_n to req_comp
   1736 //    individual types do this automatically as much as possible (e.g. jpeg
   1737 //    does all cases internally since it needs to colorspace convert anyway,
   1738 //    and it never has alpha, so very few cases ). png can automatically
   1739 //    interleave an alpha=255 channel, but falls back to this for other cases
   1740 //
   1741 //  assume data buffer is malloced, so malloc a new one and free that one
   1742 //  only failure mode is malloc failing
   1743 
   1744 static stbi_uc stbi__compute_y(int r, int g, int b)
   1745 {
   1746    return (stbi_uc) (((r*77) + (g*150) +  (29*b)) >> 8);
   1747 }
   1748 #endif
   1749 
   1750 #if defined(STBI_NO_PNG) && defined(STBI_NO_BMP) && defined(STBI_NO_PSD) && defined(STBI_NO_TGA) && defined(STBI_NO_GIF) && defined(STBI_NO_PIC) && defined(STBI_NO_PNM)
   1751 // nothing
   1752 #else
   1753 static unsigned char *stbi__convert_format(unsigned char *data, int img_n, int req_comp, unsigned int x, unsigned int y)
   1754 {
   1755    int i,j;
   1756    unsigned char *good;
   1757 
   1758    if (req_comp == img_n) return data;
   1759    STBI_ASSERT(req_comp >= 1 && req_comp <= 4);
   1760 
   1761    good = (unsigned char *) stbi__malloc_mad3(req_comp, x, y, 0);
   1762    if (good == NULL) {
   1763       STBI_FREE(data);
   1764       return stbi__errpuc("outofmem", "Out of memory");
   1765    }
   1766 
   1767    for (j=0; j < (int) y; ++j) {
   1768       unsigned char *src  = data + j * x * img_n   ;
   1769       unsigned char *dest = good + j * x * req_comp;
   1770 
   1771       #define STBI__COMBO(a,b)  ((a)*8+(b))
   1772       #define STBI__CASE(a,b)   case STBI__COMBO(a,b): for(i=x-1; i >= 0; --i, src += a, dest += b)
   1773       // convert source image with img_n components to one with req_comp components;
   1774       // avoid switch per pixel, so use switch per scanline and massive macros
   1775       switch (STBI__COMBO(img_n, req_comp)) {
   1776          STBI__CASE(1,2) { dest[0]=src[0]; dest[1]=255;                                     } break;
   1777          STBI__CASE(1,3) { dest[0]=dest[1]=dest[2]=src[0];                                  } break;
   1778          STBI__CASE(1,4) { dest[0]=dest[1]=dest[2]=src[0]; dest[3]=255;                     } break;
   1779          STBI__CASE(2,1) { dest[0]=src[0];                                                  } break;
   1780          STBI__CASE(2,3) { dest[0]=dest[1]=dest[2]=src[0];                                  } break;
   1781          STBI__CASE(2,4) { dest[0]=dest[1]=dest[2]=src[0]; dest[3]=src[1];                  } break;
   1782          STBI__CASE(3,4) { dest[0]=src[0];dest[1]=src[1];dest[2]=src[2];dest[3]=255;        } break;
   1783          STBI__CASE(3,1) { dest[0]=stbi__compute_y(src[0],src[1],src[2]);                   } break;
   1784          STBI__CASE(3,2) { dest[0]=stbi__compute_y(src[0],src[1],src[2]); dest[1] = 255;    } break;
   1785          STBI__CASE(4,1) { dest[0]=stbi__compute_y(src[0],src[1],src[2]);                   } break;
   1786          STBI__CASE(4,2) { dest[0]=stbi__compute_y(src[0],src[1],src[2]); dest[1] = src[3]; } break;
   1787          STBI__CASE(4,3) { dest[0]=src[0];dest[1]=src[1];dest[2]=src[2];                    } break;
   1788          default: STBI_ASSERT(0); STBI_FREE(data); STBI_FREE(good); return stbi__errpuc("unsupported", "Unsupported format conversion");
   1789       }
   1790       #undef STBI__CASE
   1791    }
   1792 
   1793    STBI_FREE(data);
   1794    return good;
   1795 }
   1796 #endif
   1797 
   1798 #if defined(STBI_NO_PNG) && defined(STBI_NO_PSD)
   1799 // nothing
   1800 #else
   1801 static stbi__uint16 stbi__compute_y_16(int r, int g, int b)
   1802 {
   1803    return (stbi__uint16) (((r*77) + (g*150) +  (29*b)) >> 8);
   1804 }
   1805 #endif
   1806 
   1807 #if defined(STBI_NO_PNG) && defined(STBI_NO_PSD)
   1808 // nothing
   1809 #else
   1810 static stbi__uint16 *stbi__convert_format16(stbi__uint16 *data, int img_n, int req_comp, unsigned int x, unsigned int y)
   1811 {
   1812    int i,j;
   1813    stbi__uint16 *good;
   1814 
   1815    if (req_comp == img_n) return data;
   1816    STBI_ASSERT(req_comp >= 1 && req_comp <= 4);
   1817 
   1818    good = (stbi__uint16 *) stbi__malloc(req_comp * x * y * 2);
   1819    if (good == NULL) {
   1820       STBI_FREE(data);
   1821       return (stbi__uint16 *) stbi__errpuc("outofmem", "Out of memory");
   1822    }
   1823 
   1824    for (j=0; j < (int) y; ++j) {
   1825       stbi__uint16 *src  = data + j * x * img_n   ;
   1826       stbi__uint16 *dest = good + j * x * req_comp;
   1827 
   1828       #define STBI__COMBO(a,b)  ((a)*8+(b))
   1829       #define STBI__CASE(a,b)   case STBI__COMBO(a,b): for(i=x-1; i >= 0; --i, src += a, dest += b)
   1830       // convert source image with img_n components to one with req_comp components;
   1831       // avoid switch per pixel, so use switch per scanline and massive macros
   1832       switch (STBI__COMBO(img_n, req_comp)) {
   1833          STBI__CASE(1,2) { dest[0]=src[0]; dest[1]=0xffff;                                     } break;
   1834          STBI__CASE(1,3) { dest[0]=dest[1]=dest[2]=src[0];                                     } break;
   1835          STBI__CASE(1,4) { dest[0]=dest[1]=dest[2]=src[0]; dest[3]=0xffff;                     } break;
   1836          STBI__CASE(2,1) { dest[0]=src[0];                                                     } break;
   1837          STBI__CASE(2,3) { dest[0]=dest[1]=dest[2]=src[0];                                     } break;
   1838          STBI__CASE(2,4) { dest[0]=dest[1]=dest[2]=src[0]; dest[3]=src[1];                     } break;
   1839          STBI__CASE(3,4) { dest[0]=src[0];dest[1]=src[1];dest[2]=src[2];dest[3]=0xffff;        } break;
   1840          STBI__CASE(3,1) { dest[0]=stbi__compute_y_16(src[0],src[1],src[2]);                   } break;
   1841          STBI__CASE(3,2) { dest[0]=stbi__compute_y_16(src[0],src[1],src[2]); dest[1] = 0xffff; } break;
   1842          STBI__CASE(4,1) { dest[0]=stbi__compute_y_16(src[0],src[1],src[2]);                   } break;
   1843          STBI__CASE(4,2) { dest[0]=stbi__compute_y_16(src[0],src[1],src[2]); dest[1] = src[3]; } break;
   1844          STBI__CASE(4,3) { dest[0]=src[0];dest[1]=src[1];dest[2]=src[2];                       } break;
   1845          default: STBI_ASSERT(0); STBI_FREE(data); STBI_FREE(good); return (stbi__uint16*) stbi__errpuc("unsupported", "Unsupported format conversion");
   1846       }
   1847       #undef STBI__CASE
   1848    }
   1849 
   1850    STBI_FREE(data);
   1851    return good;
   1852 }
   1853 #endif
   1854 
   1855 #ifndef STBI_NO_LINEAR
   1856 static float   *stbi__ldr_to_hdr(stbi_uc *data, int x, int y, int comp)
   1857 {
   1858    int i,k,n;
   1859    float *output;
   1860    if (!data) return NULL;
   1861    output = (float *) stbi__malloc_mad4(x, y, comp, sizeof(float), 0);
   1862    if (output == NULL) { STBI_FREE(data); return stbi__errpf("outofmem", "Out of memory"); }
   1863    // compute number of non-alpha components
   1864    if (comp & 1) n = comp; else n = comp-1;
   1865    for (i=0; i < x*y; ++i) {
   1866       for (k=0; k < n; ++k) {
   1867          output[i*comp + k] = (float) (pow(data[i*comp+k]/255.0f, stbi__l2h_gamma) * stbi__l2h_scale);
   1868       }
   1869    }
   1870    if (n < comp) {
   1871       for (i=0; i < x*y; ++i) {
   1872          output[i*comp + n] = data[i*comp + n]/255.0f;
   1873       }
   1874    }
   1875    STBI_FREE(data);
   1876    return output;
   1877 }
   1878 #endif
   1879 
   1880 #ifndef STBI_NO_HDR
   1881 #define stbi__float2int(x)   ((int) (x))
   1882 static stbi_uc *stbi__hdr_to_ldr(float   *data, int x, int y, int comp)
   1883 {
   1884    int i,k,n;
   1885    stbi_uc *output;
   1886    if (!data) return NULL;
   1887    output = (stbi_uc *) stbi__malloc_mad3(x, y, comp, 0);
   1888    if (output == NULL) { STBI_FREE(data); return stbi__errpuc("outofmem", "Out of memory"); }
   1889    // compute number of non-alpha components
   1890    if (comp & 1) n = comp; else n = comp-1;
   1891    for (i=0; i < x*y; ++i) {
   1892       for (k=0; k < n; ++k) {
   1893          float z = (float) pow(data[i*comp+k]*stbi__h2l_scale_i, stbi__h2l_gamma_i) * 255 + 0.5f;
   1894          if (z < 0) z = 0;
   1895          if (z > 255) z = 255;
   1896          output[i*comp + k] = (stbi_uc) stbi__float2int(z);
   1897       }
   1898       if (k < comp) {
   1899          float z = data[i*comp+k] * 255 + 0.5f;
   1900          if (z < 0) z = 0;
   1901          if (z > 255) z = 255;
   1902          output[i*comp + k] = (stbi_uc) stbi__float2int(z);
   1903       }
   1904    }
   1905    STBI_FREE(data);
   1906    return output;
   1907 }
   1908 #endif
   1909 
   1910 //////////////////////////////////////////////////////////////////////////////
   1911 //
   1912 //  "baseline" JPEG/JFIF decoder
   1913 //
   1914 //    simple implementation
   1915 //      - doesn't support delayed output of y-dimension
   1916 //      - simple interface (only one output format: 8-bit interleaved RGB)
   1917 //      - doesn't try to recover corrupt jpegs
   1918 //      - doesn't allow partial loading, loading multiple at once
   1919 //      - still fast on x86 (copying globals into locals doesn't help x86)
   1920 //      - allocates lots of intermediate memory (full size of all components)
   1921 //        - non-interleaved case requires this anyway
   1922 //        - allows good upsampling (see next)
   1923 //    high-quality
   1924 //      - upsampled channels are bilinearly interpolated, even across blocks
   1925 //      - quality integer IDCT derived from IJG's 'slow'
   1926 //    performance
   1927 //      - fast huffman; reasonable integer IDCT
   1928 //      - some SIMD kernels for common paths on targets with SSE2/NEON
   1929 //      - uses a lot of intermediate memory, could cache poorly
   1930 
   1931 #ifndef STBI_NO_JPEG
   1932 
   1933 // huffman decoding acceleration
   1934 #define FAST_BITS   9  // larger handles more cases; smaller stomps less cache
   1935 
   1936 typedef struct
   1937 {
   1938    stbi_uc  fast[1 << FAST_BITS];
   1939    // weirdly, repacking this into AoS is a 10% speed loss, instead of a win
   1940    stbi__uint16 code[256];
   1941    stbi_uc  values[256];
   1942    stbi_uc  size[257];
   1943    unsigned int maxcode[18];
   1944    int    delta[17];   // old 'firstsymbol' - old 'firstcode'
   1945 } stbi__huffman;
   1946 
   1947 typedef struct
   1948 {
   1949    stbi__context *s;
   1950    stbi__huffman huff_dc[4];
   1951    stbi__huffman huff_ac[4];
   1952    stbi__uint16 dequant[4][64];
   1953    stbi__int16 fast_ac[4][1 << FAST_BITS];
   1954 
   1955 // sizes for components, interleaved MCUs
   1956    int img_h_max, img_v_max;
   1957    int img_mcu_x, img_mcu_y;
   1958    int img_mcu_w, img_mcu_h;
   1959 
   1960 // definition of jpeg image component
   1961    struct
   1962    {
   1963       int id;
   1964       int h,v;
   1965       int tq;
   1966       int hd,ha;
   1967       int dc_pred;
   1968 
   1969       int x,y,w2,h2;
   1970       stbi_uc *data;
   1971       void *raw_data, *raw_coeff;
   1972       stbi_uc *linebuf;
   1973       short   *coeff;   // progressive only
   1974       int      coeff_w, coeff_h; // number of 8x8 coefficient blocks
   1975    } img_comp[4];
   1976 
   1977    stbi__uint32   code_buffer; // jpeg entropy-coded buffer
   1978    int            code_bits;   // number of valid bits
   1979    unsigned char  marker;      // marker seen while filling entropy buffer
   1980    int            nomore;      // flag if we saw a marker so must stop
   1981 
   1982    int            progressive;
   1983    int            spec_start;
   1984    int            spec_end;
   1985    int            succ_high;
   1986    int            succ_low;
   1987    int            eob_run;
   1988    int            jfif;
   1989    int            app14_color_transform; // Adobe APP14 tag
   1990    int            rgb;
   1991 
   1992    int scan_n, order[4];
   1993    int restart_interval, todo;
   1994 
   1995 // kernels
   1996    void (*idct_block_kernel)(stbi_uc *out, int out_stride, short data[64]);
   1997    void (*YCbCr_to_RGB_kernel)(stbi_uc *out, const stbi_uc *y, const stbi_uc *pcb, const stbi_uc *pcr, int count, int step);
   1998    stbi_uc *(*resample_row_hv_2_kernel)(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs);
   1999 } stbi__jpeg;
   2000 
   2001 static int stbi__build_huffman(stbi__huffman *h, int *count)
   2002 {
   2003    int i,j,k=0;
   2004    unsigned int code;
   2005    // build size list for each symbol (from JPEG spec)
   2006    for (i=0; i < 16; ++i) {
   2007       for (j=0; j < count[i]; ++j) {
   2008          h->size[k++] = (stbi_uc) (i+1);
   2009          if(k >= 257) return stbi__err("bad size list","Corrupt JPEG");
   2010       }
   2011    }
   2012    h->size[k] = 0;
   2013 
   2014    // compute actual symbols (from jpeg spec)
   2015    code = 0;
   2016    k = 0;
   2017    for(j=1; j <= 16; ++j) {
   2018       // compute delta to add to code to compute symbol id
   2019       h->delta[j] = k - code;
   2020       if (h->size[k] == j) {
   2021          while (h->size[k] == j)
   2022             h->code[k++] = (stbi__uint16) (code++);
   2023          if (code-1 >= (1u << j)) return stbi__err("bad code lengths","Corrupt JPEG");
   2024       }
   2025       // compute largest code + 1 for this size, preshifted as needed later
   2026       h->maxcode[j] = code << (16-j);
   2027       code <<= 1;
   2028    }
   2029    h->maxcode[j] = 0xffffffff;
   2030 
   2031    // build non-spec acceleration table; 255 is flag for not-accelerated
   2032    memset(h->fast, 255, 1 << FAST_BITS);
   2033    for (i=0; i < k; ++i) {
   2034       int s = h->size[i];
   2035       if (s <= FAST_BITS) {
   2036          int c = h->code[i] << (FAST_BITS-s);
   2037          int m = 1 << (FAST_BITS-s);
   2038          for (j=0; j < m; ++j) {
   2039             h->fast[c+j] = (stbi_uc) i;
   2040          }
   2041       }
   2042    }
   2043    return 1;
   2044 }
   2045 
   2046 // build a table that decodes both magnitude and value of small ACs in
   2047 // one go.
   2048 static void stbi__build_fast_ac(stbi__int16 *fast_ac, stbi__huffman *h)
   2049 {
   2050    int i;
   2051    for (i=0; i < (1 << FAST_BITS); ++i) {
   2052       stbi_uc fast = h->fast[i];
   2053       fast_ac[i] = 0;
   2054       if (fast < 255) {
   2055          int rs = h->values[fast];
   2056          int run = (rs >> 4) & 15;
   2057          int magbits = rs & 15;
   2058          int len = h->size[fast];
   2059 
   2060          if (magbits && len + magbits <= FAST_BITS) {
   2061             // magnitude code followed by receive_extend code
   2062             int k = ((i << len) & ((1 << FAST_BITS) - 1)) >> (FAST_BITS - magbits);
   2063             int m = 1 << (magbits - 1);
   2064             if (k < m) k += (~0U << magbits) + 1;
   2065             // if the result is small enough, we can fit it in fast_ac table
   2066             if (k >= -128 && k <= 127)
   2067                fast_ac[i] = (stbi__int16) ((k * 256) + (run * 16) + (len + magbits));
   2068          }
   2069       }
   2070    }
   2071 }
   2072 
   2073 static void stbi__grow_buffer_unsafe(stbi__jpeg *j)
   2074 {
   2075    do {
   2076       unsigned int b = j->nomore ? 0 : stbi__get8(j->s);
   2077       if (b == 0xff) {
   2078          int c = stbi__get8(j->s);
   2079          while (c == 0xff) c = stbi__get8(j->s); // consume fill bytes
   2080          if (c != 0) {
   2081             j->marker = (unsigned char) c;
   2082             j->nomore = 1;
   2083             return;
   2084          }
   2085       }
   2086       j->code_buffer |= b << (24 - j->code_bits);
   2087       j->code_bits += 8;
   2088    } while (j->code_bits <= 24);
   2089 }
   2090 
   2091 // (1 << n) - 1
   2092 static const stbi__uint32 stbi__bmask[17]={0,1,3,7,15,31,63,127,255,511,1023,2047,4095,8191,16383,32767,65535};
   2093 
   2094 // decode a jpeg huffman value from the bitstream
   2095 stbi_inline static int stbi__jpeg_huff_decode(stbi__jpeg *j, stbi__huffman *h)
   2096 {
   2097    unsigned int temp;
   2098    int c,k;
   2099 
   2100    if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
   2101 
   2102    // look at the top FAST_BITS and determine what symbol ID it is,
   2103    // if the code is <= FAST_BITS
   2104    c = (j->code_buffer >> (32 - FAST_BITS)) & ((1 << FAST_BITS)-1);
   2105    k = h->fast[c];
   2106    if (k < 255) {
   2107       int s = h->size[k];
   2108       if (s > j->code_bits)
   2109          return -1;
   2110       j->code_buffer <<= s;
   2111       j->code_bits -= s;
   2112       return h->values[k];
   2113    }
   2114 
   2115    // naive test is to shift the code_buffer down so k bits are
   2116    // valid, then test against maxcode. To speed this up, we've
   2117    // preshifted maxcode left so that it has (16-k) 0s at the
   2118    // end; in other words, regardless of the number of bits, it
   2119    // wants to be compared against something shifted to have 16;
   2120    // that way we don't need to shift inside the loop.
   2121    temp = j->code_buffer >> 16;
   2122    for (k=FAST_BITS+1 ; ; ++k)
   2123       if (temp < h->maxcode[k])
   2124          break;
   2125    if (k == 17) {
   2126       // error! code not found
   2127       j->code_bits -= 16;
   2128       return -1;
   2129    }
   2130 
   2131    if (k > j->code_bits)
   2132       return -1;
   2133 
   2134    // convert the huffman code to the symbol id
   2135    c = ((j->code_buffer >> (32 - k)) & stbi__bmask[k]) + h->delta[k];
   2136    if(c < 0 || c >= 256) // symbol id out of bounds!
   2137        return -1;
   2138    STBI_ASSERT((((j->code_buffer) >> (32 - h->size[c])) & stbi__bmask[h->size[c]]) == h->code[c]);
   2139 
   2140    // convert the id to a symbol
   2141    j->code_bits -= k;
   2142    j->code_buffer <<= k;
   2143    return h->values[c];
   2144 }
   2145 
   2146 // bias[n] = (-1<<n) + 1
   2147 static const int stbi__jbias[16] = {0,-1,-3,-7,-15,-31,-63,-127,-255,-511,-1023,-2047,-4095,-8191,-16383,-32767};
   2148 
   2149 // combined JPEG 'receive' and JPEG 'extend', since baseline
   2150 // always extends everything it receives.
   2151 stbi_inline static int stbi__extend_receive(stbi__jpeg *j, int n)
   2152 {
   2153    unsigned int k;
   2154    int sgn;
   2155    if (j->code_bits < n) stbi__grow_buffer_unsafe(j);
   2156    if (j->code_bits < n) return 0; // ran out of bits from stream, return 0s intead of continuing
   2157 
   2158    sgn = j->code_buffer >> 31; // sign bit always in MSB; 0 if MSB clear (positive), 1 if MSB set (negative)
   2159    k = stbi_lrot(j->code_buffer, n);
   2160    j->code_buffer = k & ~stbi__bmask[n];
   2161    k &= stbi__bmask[n];
   2162    j->code_bits -= n;
   2163    return k + (stbi__jbias[n] & (sgn - 1));
   2164 }
   2165 
   2166 // get some unsigned bits
   2167 stbi_inline static int stbi__jpeg_get_bits(stbi__jpeg *j, int n)
   2168 {
   2169    unsigned int k;
   2170    if (j->code_bits < n) stbi__grow_buffer_unsafe(j);
   2171    if (j->code_bits < n) return 0; // ran out of bits from stream, return 0s intead of continuing
   2172    k = stbi_lrot(j->code_buffer, n);
   2173    j->code_buffer = k & ~stbi__bmask[n];
   2174    k &= stbi__bmask[n];
   2175    j->code_bits -= n;
   2176    return k;
   2177 }
   2178 
   2179 stbi_inline static int stbi__jpeg_get_bit(stbi__jpeg *j)
   2180 {
   2181    unsigned int k;
   2182    if (j->code_bits < 1) stbi__grow_buffer_unsafe(j);
   2183    if (j->code_bits < 1) return 0; // ran out of bits from stream, return 0s intead of continuing
   2184    k = j->code_buffer;
   2185    j->code_buffer <<= 1;
   2186    --j->code_bits;
   2187    return k & 0x80000000;
   2188 }
   2189 
   2190 // given a value that's at position X in the zigzag stream,
   2191 // where does it appear in the 8x8 matrix coded as row-major?
   2192 static const stbi_uc stbi__jpeg_dezigzag[64+15] =
   2193 {
   2194     0,  1,  8, 16,  9,  2,  3, 10,
   2195    17, 24, 32, 25, 18, 11,  4,  5,
   2196    12, 19, 26, 33, 40, 48, 41, 34,
   2197    27, 20, 13,  6,  7, 14, 21, 28,
   2198    35, 42, 49, 56, 57, 50, 43, 36,
   2199    29, 22, 15, 23, 30, 37, 44, 51,
   2200    58, 59, 52, 45, 38, 31, 39, 46,
   2201    53, 60, 61, 54, 47, 55, 62, 63,
   2202    // let corrupt input sample past end
   2203    63, 63, 63, 63, 63, 63, 63, 63,
   2204    63, 63, 63, 63, 63, 63, 63
   2205 };
   2206 
   2207 // decode one 64-entry block--
   2208 static int stbi__jpeg_decode_block(stbi__jpeg *j, short data[64], stbi__huffman *hdc, stbi__huffman *hac, stbi__int16 *fac, int b, stbi__uint16 *dequant)
   2209 {
   2210    int diff,dc,k;
   2211    int t;
   2212 
   2213    if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
   2214    t = stbi__jpeg_huff_decode(j, hdc);
   2215    if (t < 0 || t > 15) return stbi__err("bad huffman code","Corrupt JPEG");
   2216 
   2217    // 0 all the ac values now so we can do it 32-bits at a time
   2218    memset(data,0,64*sizeof(data[0]));
   2219 
   2220    diff = t ? stbi__extend_receive(j, t) : 0;
   2221    if (!stbi__addints_valid(j->img_comp[b].dc_pred, diff)) return stbi__err("bad delta","Corrupt JPEG");
   2222    dc = j->img_comp[b].dc_pred + diff;
   2223    j->img_comp[b].dc_pred = dc;
   2224    if (!stbi__mul2shorts_valid(dc, dequant[0])) return stbi__err("can't merge dc and ac", "Corrupt JPEG");
   2225    data[0] = (short) (dc * dequant[0]);
   2226 
   2227    // decode AC components, see JPEG spec
   2228    k = 1;
   2229    do {
   2230       unsigned int zig;
   2231       int c,r,s;
   2232       if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
   2233       c = (j->code_buffer >> (32 - FAST_BITS)) & ((1 << FAST_BITS)-1);
   2234       r = fac[c];
   2235       if (r) { // fast-AC path
   2236          k += (r >> 4) & 15; // run
   2237          s = r & 15; // combined length
   2238          if (s > j->code_bits) return stbi__err("bad huffman code", "Combined length longer than code bits available");
   2239          j->code_buffer <<= s;
   2240          j->code_bits -= s;
   2241          // decode into unzigzag'd location
   2242          zig = stbi__jpeg_dezigzag[k++];
   2243          data[zig] = (short) ((r >> 8) * dequant[zig]);
   2244       } else {
   2245          int rs = stbi__jpeg_huff_decode(j, hac);
   2246          if (rs < 0) return stbi__err("bad huffman code","Corrupt JPEG");
   2247          s = rs & 15;
   2248          r = rs >> 4;
   2249          if (s == 0) {
   2250             if (rs != 0xf0) break; // end block
   2251             k += 16;
   2252          } else {
   2253             k += r;
   2254             // decode into unzigzag'd location
   2255             zig = stbi__jpeg_dezigzag[k++];
   2256             data[zig] = (short) (stbi__extend_receive(j,s) * dequant[zig]);
   2257          }
   2258       }
   2259    } while (k < 64);
   2260    return 1;
   2261 }
   2262 
   2263 static int stbi__jpeg_decode_block_prog_dc(stbi__jpeg *j, short data[64], stbi__huffman *hdc, int b)
   2264 {
   2265    int diff,dc;
   2266    int t;
   2267    if (j->spec_end != 0) return stbi__err("can't merge dc and ac", "Corrupt JPEG");
   2268 
   2269    if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
   2270 
   2271    if (j->succ_high == 0) {
   2272       // first scan for DC coefficient, must be first
   2273       memset(data,0,64*sizeof(data[0])); // 0 all the ac values now
   2274       t = stbi__jpeg_huff_decode(j, hdc);
   2275       if (t < 0 || t > 15) return stbi__err("can't merge dc and ac", "Corrupt JPEG");
   2276       diff = t ? stbi__extend_receive(j, t) : 0;
   2277 
   2278       if (!stbi__addints_valid(j->img_comp[b].dc_pred, diff)) return stbi__err("bad delta", "Corrupt JPEG");
   2279       dc = j->img_comp[b].dc_pred + diff;
   2280       j->img_comp[b].dc_pred = dc;
   2281       if (!stbi__mul2shorts_valid(dc, 1 << j->succ_low)) return stbi__err("can't merge dc and ac", "Corrupt JPEG");
   2282       data[0] = (short) (dc * (1 << j->succ_low));
   2283    } else {
   2284       // refinement scan for DC coefficient
   2285       if (stbi__jpeg_get_bit(j))
   2286          data[0] += (short) (1 << j->succ_low);
   2287    }
   2288    return 1;
   2289 }
   2290 
   2291 // @OPTIMIZE: store non-zigzagged during the decode passes,
   2292 // and only de-zigzag when dequantizing
   2293 static int stbi__jpeg_decode_block_prog_ac(stbi__jpeg *j, short data[64], stbi__huffman *hac, stbi__int16 *fac)
   2294 {
   2295    int k;
   2296    if (j->spec_start == 0) return stbi__err("can't merge dc and ac", "Corrupt JPEG");
   2297 
   2298    if (j->succ_high == 0) {
   2299       int shift = j->succ_low;
   2300 
   2301       if (j->eob_run) {
   2302          --j->eob_run;
   2303          return 1;
   2304       }
   2305 
   2306       k = j->spec_start;
   2307       do {
   2308          unsigned int zig;
   2309          int c,r,s;
   2310          if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
   2311          c = (j->code_buffer >> (32 - FAST_BITS)) & ((1 << FAST_BITS)-1);
   2312          r = fac[c];
   2313          if (r) { // fast-AC path
   2314             k += (r >> 4) & 15; // run
   2315             s = r & 15; // combined length
   2316             if (s > j->code_bits) return stbi__err("bad huffman code", "Combined length longer than code bits available");
   2317             j->code_buffer <<= s;
   2318             j->code_bits -= s;
   2319             zig = stbi__jpeg_dezigzag[k++];
   2320             data[zig] = (short) ((r >> 8) * (1 << shift));
   2321          } else {
   2322             int rs = stbi__jpeg_huff_decode(j, hac);
   2323             if (rs < 0) return stbi__err("bad huffman code","Corrupt JPEG");
   2324             s = rs & 15;
   2325             r = rs >> 4;
   2326             if (s == 0) {
   2327                if (r < 15) {
   2328                   j->eob_run = (1 << r);
   2329                   if (r)
   2330                      j->eob_run += stbi__jpeg_get_bits(j, r);
   2331                   --j->eob_run;
   2332                   break;
   2333                }
   2334                k += 16;
   2335             } else {
   2336                k += r;
   2337                zig = stbi__jpeg_dezigzag[k++];
   2338                data[zig] = (short) (stbi__extend_receive(j,s) * (1 << shift));
   2339             }
   2340          }
   2341       } while (k <= j->spec_end);
   2342    } else {
   2343       // refinement scan for these AC coefficients
   2344 
   2345       short bit = (short) (1 << j->succ_low);
   2346 
   2347       if (j->eob_run) {
   2348          --j->eob_run;
   2349          for (k = j->spec_start; k <= j->spec_end; ++k) {
   2350             short *p = &data[stbi__jpeg_dezigzag[k]];
   2351             if (*p != 0)
   2352                if (stbi__jpeg_get_bit(j))
   2353                   if ((*p & bit)==0) {
   2354                      if (*p > 0)
   2355                         *p += bit;
   2356                      else
   2357                         *p -= bit;
   2358                   }
   2359          }
   2360       } else {
   2361          k = j->spec_start;
   2362          do {
   2363             int r,s;
   2364             int rs = stbi__jpeg_huff_decode(j, hac); // @OPTIMIZE see if we can use the fast path here, advance-by-r is so slow, eh
   2365             if (rs < 0) return stbi__err("bad huffman code","Corrupt JPEG");
   2366             s = rs & 15;
   2367             r = rs >> 4;
   2368             if (s == 0) {
   2369                if (r < 15) {
   2370                   j->eob_run = (1 << r) - 1;
   2371                   if (r)
   2372                      j->eob_run += stbi__jpeg_get_bits(j, r);
   2373                   r = 64; // force end of block
   2374                } else {
   2375                   // r=15 s=0 should write 16 0s, so we just do
   2376                   // a run of 15 0s and then write s (which is 0),
   2377                   // so we don't have to do anything special here
   2378                }
   2379             } else {
   2380                if (s != 1) return stbi__err("bad huffman code", "Corrupt JPEG");
   2381                // sign bit
   2382                if (stbi__jpeg_get_bit(j))
   2383                   s = bit;
   2384                else
   2385                   s = -bit;
   2386             }
   2387 
   2388             // advance by r
   2389             while (k <= j->spec_end) {
   2390                short *p = &data[stbi__jpeg_dezigzag[k++]];
   2391                if (*p != 0) {
   2392                   if (stbi__jpeg_get_bit(j))
   2393                      if ((*p & bit)==0) {
   2394                         if (*p > 0)
   2395                            *p += bit;
   2396                         else
   2397                            *p -= bit;
   2398                      }
   2399                } else {
   2400                   if (r == 0) {
   2401                      *p = (short) s;
   2402                      break;
   2403                   }
   2404                   --r;
   2405                }
   2406             }
   2407          } while (k <= j->spec_end);
   2408       }
   2409    }
   2410    return 1;
   2411 }
   2412 
   2413 // take a -128..127 value and stbi__clamp it and convert to 0..255
   2414 stbi_inline static stbi_uc stbi__clamp(int x)
   2415 {
   2416    // trick to use a single test to catch both cases
   2417    if ((unsigned int) x > 255) {
   2418       if (x < 0) return 0;
   2419       if (x > 255) return 255;
   2420    }
   2421    return (stbi_uc) x;
   2422 }
   2423 
   2424 #define stbi__f2f(x)  ((int) (((x) * 4096 + 0.5)))
   2425 #define stbi__fsh(x)  ((x) * 4096)
   2426 
   2427 // derived from jidctint -- DCT_ISLOW
   2428 #define STBI__IDCT_1D(s0,s1,s2,s3,s4,s5,s6,s7) \
   2429    int t0,t1,t2,t3,p1,p2,p3,p4,p5,x0,x1,x2,x3; \
   2430    p2 = s2;                                    \
   2431    p3 = s6;                                    \
   2432    p1 = (p2+p3) * stbi__f2f(0.5411961f);       \
   2433    t2 = p1 + p3*stbi__f2f(-1.847759065f);      \
   2434    t3 = p1 + p2*stbi__f2f( 0.765366865f);      \
   2435    p2 = s0;                                    \
   2436    p3 = s4;                                    \
   2437    t0 = stbi__fsh(p2+p3);                      \
   2438    t1 = stbi__fsh(p2-p3);                      \
   2439    x0 = t0+t3;                                 \
   2440    x3 = t0-t3;                                 \
   2441    x1 = t1+t2;                                 \
   2442    x2 = t1-t2;                                 \
   2443    t0 = s7;                                    \
   2444    t1 = s5;                                    \
   2445    t2 = s3;                                    \
   2446    t3 = s1;                                    \
   2447    p3 = t0+t2;                                 \
   2448    p4 = t1+t3;                                 \
   2449    p1 = t0+t3;                                 \
   2450    p2 = t1+t2;                                 \
   2451    p5 = (p3+p4)*stbi__f2f( 1.175875602f);      \
   2452    t0 = t0*stbi__f2f( 0.298631336f);           \
   2453    t1 = t1*stbi__f2f( 2.053119869f);           \
   2454    t2 = t2*stbi__f2f( 3.072711026f);           \
   2455    t3 = t3*stbi__f2f( 1.501321110f);           \
   2456    p1 = p5 + p1*stbi__f2f(-0.899976223f);      \
   2457    p2 = p5 + p2*stbi__f2f(-2.562915447f);      \
   2458    p3 = p3*stbi__f2f(-1.961570560f);           \
   2459    p4 = p4*stbi__f2f(-0.390180644f);           \
   2460    t3 += p1+p4;                                \
   2461    t2 += p2+p3;                                \
   2462    t1 += p2+p4;                                \
   2463    t0 += p1+p3;
   2464 
   2465 static void stbi__idct_block(stbi_uc *out, int out_stride, short data[64])
   2466 {
   2467    int i,val[64],*v=val;
   2468    stbi_uc *o;
   2469    short *d = data;
   2470 
   2471    // columns
   2472    for (i=0; i < 8; ++i,++d, ++v) {
   2473       // if all zeroes, shortcut -- this avoids dequantizing 0s and IDCTing
   2474       if (d[ 8]==0 && d[16]==0 && d[24]==0 && d[32]==0
   2475            && d[40]==0 && d[48]==0 && d[56]==0) {
   2476          //    no shortcut                 0     seconds
   2477          //    (1|2|3|4|5|6|7)==0          0     seconds
   2478          //    all separate               -0.047 seconds
   2479          //    1 && 2|3 && 4|5 && 6|7:    -0.047 seconds
   2480          int dcterm = d[0]*4;
   2481          v[0] = v[8] = v[16] = v[24] = v[32] = v[40] = v[48] = v[56] = dcterm;
   2482       } else {
   2483          STBI__IDCT_1D(d[ 0],d[ 8],d[16],d[24],d[32],d[40],d[48],d[56])
   2484          // constants scaled things up by 1<<12; let's bring them back
   2485          // down, but keep 2 extra bits of precision
   2486          x0 += 512; x1 += 512; x2 += 512; x3 += 512;
   2487          v[ 0] = (x0+t3) >> 10;
   2488          v[56] = (x0-t3) >> 10;
   2489          v[ 8] = (x1+t2) >> 10;
   2490          v[48] = (x1-t2) >> 10;
   2491          v[16] = (x2+t1) >> 10;
   2492          v[40] = (x2-t1) >> 10;
   2493          v[24] = (x3+t0) >> 10;
   2494          v[32] = (x3-t0) >> 10;
   2495       }
   2496    }
   2497 
   2498    for (i=0, v=val, o=out; i < 8; ++i,v+=8,o+=out_stride) {
   2499       // no fast case since the first 1D IDCT spread components out
   2500       STBI__IDCT_1D(v[0],v[1],v[2],v[3],v[4],v[5],v[6],v[7])
   2501       // constants scaled things up by 1<<12, plus we had 1<<2 from first
   2502       // loop, plus horizontal and vertical each scale by sqrt(8) so together
   2503       // we've got an extra 1<<3, so 1<<17 total we need to remove.
   2504       // so we want to round that, which means adding 0.5 * 1<<17,
   2505       // aka 65536. Also, we'll end up with -128 to 127 that we want
   2506       // to encode as 0..255 by adding 128, so we'll add that before the shift
   2507       x0 += 65536 + (128<<17);
   2508       x1 += 65536 + (128<<17);
   2509       x2 += 65536 + (128<<17);
   2510       x3 += 65536 + (128<<17);
   2511       // tried computing the shifts into temps, or'ing the temps to see
   2512       // if any were out of range, but that was slower
   2513       o[0] = stbi__clamp((x0+t3) >> 17);
   2514       o[7] = stbi__clamp((x0-t3) >> 17);
   2515       o[1] = stbi__clamp((x1+t2) >> 17);
   2516       o[6] = stbi__clamp((x1-t2) >> 17);
   2517       o[2] = stbi__clamp((x2+t1) >> 17);
   2518       o[5] = stbi__clamp((x2-t1) >> 17);
   2519       o[3] = stbi__clamp((x3+t0) >> 17);
   2520       o[4] = stbi__clamp((x3-t0) >> 17);
   2521    }
   2522 }
   2523 
   2524 #ifdef STBI_SSE2
   2525 // sse2 integer IDCT. not the fastest possible implementation but it
   2526 // produces bit-identical results to the generic C version so it's
   2527 // fully "transparent".
   2528 static void stbi__idct_simd(stbi_uc *out, int out_stride, short data[64])
   2529 {
   2530    // This is constructed to match our regular (generic) integer IDCT exactly.
   2531    __m128i row0, row1, row2, row3, row4, row5, row6, row7;
   2532    __m128i tmp;
   2533 
   2534    // dot product constant: even elems=x, odd elems=y
   2535    #define dct_const(x,y)  _mm_setr_epi16((x),(y),(x),(y),(x),(y),(x),(y))
   2536 
   2537    // out(0) = c0[even]*x + c0[odd]*y   (c0, x, y 16-bit, out 32-bit)
   2538    // out(1) = c1[even]*x + c1[odd]*y
   2539    #define dct_rot(out0,out1, x,y,c0,c1) \
   2540       __m128i c0##lo = _mm_unpacklo_epi16((x),(y)); \
   2541       __m128i c0##hi = _mm_unpackhi_epi16((x),(y)); \
   2542       __m128i out0##_l = _mm_madd_epi16(c0##lo, c0); \
   2543       __m128i out0##_h = _mm_madd_epi16(c0##hi, c0); \
   2544       __m128i out1##_l = _mm_madd_epi16(c0##lo, c1); \
   2545       __m128i out1##_h = _mm_madd_epi16(c0##hi, c1)
   2546 
   2547    // out = in << 12  (in 16-bit, out 32-bit)
   2548    #define dct_widen(out, in) \
   2549       __m128i out##_l = _mm_srai_epi32(_mm_unpacklo_epi16(_mm_setzero_si128(), (in)), 4); \
   2550       __m128i out##_h = _mm_srai_epi32(_mm_unpackhi_epi16(_mm_setzero_si128(), (in)), 4)
   2551 
   2552    // wide add
   2553    #define dct_wadd(out, a, b) \
   2554       __m128i out##_l = _mm_add_epi32(a##_l, b##_l); \
   2555       __m128i out##_h = _mm_add_epi32(a##_h, b##_h)
   2556 
   2557    // wide sub
   2558    #define dct_wsub(out, a, b) \
   2559       __m128i out##_l = _mm_sub_epi32(a##_l, b##_l); \
   2560       __m128i out##_h = _mm_sub_epi32(a##_h, b##_h)
   2561 
   2562    // butterfly a/b, add bias, then shift by "s" and pack
   2563    #define dct_bfly32o(out0, out1, a,b,bias,s) \
   2564       { \
   2565          __m128i abiased_l = _mm_add_epi32(a##_l, bias); \
   2566          __m128i abiased_h = _mm_add_epi32(a##_h, bias); \
   2567          dct_wadd(sum, abiased, b); \
   2568          dct_wsub(dif, abiased, b); \
   2569          out0 = _mm_packs_epi32(_mm_srai_epi32(sum_l, s), _mm_srai_epi32(sum_h, s)); \
   2570          out1 = _mm_packs_epi32(_mm_srai_epi32(dif_l, s), _mm_srai_epi32(dif_h, s)); \
   2571       }
   2572 
   2573    // 8-bit interleave step (for transposes)
   2574    #define dct_interleave8(a, b) \
   2575       tmp = a; \
   2576       a = _mm_unpacklo_epi8(a, b); \
   2577       b = _mm_unpackhi_epi8(tmp, b)
   2578 
   2579    // 16-bit interleave step (for transposes)
   2580    #define dct_interleave16(a, b) \
   2581       tmp = a; \
   2582       a = _mm_unpacklo_epi16(a, b); \
   2583       b = _mm_unpackhi_epi16(tmp, b)
   2584 
   2585    #define dct_pass(bias,shift) \
   2586       { \
   2587          /* even part */ \
   2588          dct_rot(t2e,t3e, row2,row6, rot0_0,rot0_1); \
   2589          __m128i sum04 = _mm_add_epi16(row0, row4); \
   2590          __m128i dif04 = _mm_sub_epi16(row0, row4); \
   2591          dct_widen(t0e, sum04); \
   2592          dct_widen(t1e, dif04); \
   2593          dct_wadd(x0, t0e, t3e); \
   2594          dct_wsub(x3, t0e, t3e); \
   2595          dct_wadd(x1, t1e, t2e); \
   2596          dct_wsub(x2, t1e, t2e); \
   2597          /* odd part */ \
   2598          dct_rot(y0o,y2o, row7,row3, rot2_0,rot2_1); \
   2599          dct_rot(y1o,y3o, row5,row1, rot3_0,rot3_1); \
   2600          __m128i sum17 = _mm_add_epi16(row1, row7); \
   2601          __m128i sum35 = _mm_add_epi16(row3, row5); \
   2602          dct_rot(y4o,y5o, sum17,sum35, rot1_0,rot1_1); \
   2603          dct_wadd(x4, y0o, y4o); \
   2604          dct_wadd(x5, y1o, y5o); \
   2605          dct_wadd(x6, y2o, y5o); \
   2606          dct_wadd(x7, y3o, y4o); \
   2607          dct_bfly32o(row0,row7, x0,x7,bias,shift); \
   2608          dct_bfly32o(row1,row6, x1,x6,bias,shift); \
   2609          dct_bfly32o(row2,row5, x2,x5,bias,shift); \
   2610          dct_bfly32o(row3,row4, x3,x4,bias,shift); \
   2611       }
   2612 
   2613    __m128i rot0_0 = dct_const(stbi__f2f(0.5411961f), stbi__f2f(0.5411961f) + stbi__f2f(-1.847759065f));
   2614    __m128i rot0_1 = dct_const(stbi__f2f(0.5411961f) + stbi__f2f( 0.765366865f), stbi__f2f(0.5411961f));
   2615    __m128i rot1_0 = dct_const(stbi__f2f(1.175875602f) + stbi__f2f(-0.899976223f), stbi__f2f(1.175875602f));
   2616    __m128i rot1_1 = dct_const(stbi__f2f(1.175875602f), stbi__f2f(1.175875602f) + stbi__f2f(-2.562915447f));
   2617    __m128i rot2_0 = dct_const(stbi__f2f(-1.961570560f) + stbi__f2f( 0.298631336f), stbi__f2f(-1.961570560f));
   2618    __m128i rot2_1 = dct_const(stbi__f2f(-1.961570560f), stbi__f2f(-1.961570560f) + stbi__f2f( 3.072711026f));
   2619    __m128i rot3_0 = dct_const(stbi__f2f(-0.390180644f) + stbi__f2f( 2.053119869f), stbi__f2f(-0.390180644f));
   2620    __m128i rot3_1 = dct_const(stbi__f2f(-0.390180644f), stbi__f2f(-0.390180644f) + stbi__f2f( 1.501321110f));
   2621 
   2622    // rounding biases in column/row passes, see stbi__idct_block for explanation.
   2623    __m128i bias_0 = _mm_set1_epi32(512);
   2624    __m128i bias_1 = _mm_set1_epi32(65536 + (128<<17));
   2625 
   2626    // load
   2627    row0 = _mm_load_si128((const __m128i *) (data + 0*8));
   2628    row1 = _mm_load_si128((const __m128i *) (data + 1*8));
   2629    row2 = _mm_load_si128((const __m128i *) (data + 2*8));
   2630    row3 = _mm_load_si128((const __m128i *) (data + 3*8));
   2631    row4 = _mm_load_si128((const __m128i *) (data + 4*8));
   2632    row5 = _mm_load_si128((const __m128i *) (data + 5*8));
   2633    row6 = _mm_load_si128((const __m128i *) (data + 6*8));
   2634    row7 = _mm_load_si128((const __m128i *) (data + 7*8));
   2635 
   2636    // column pass
   2637    dct_pass(bias_0, 10);
   2638 
   2639    {
   2640       // 16bit 8x8 transpose pass 1
   2641       dct_interleave16(row0, row4);
   2642       dct_interleave16(row1, row5);
   2643       dct_interleave16(row2, row6);
   2644       dct_interleave16(row3, row7);
   2645 
   2646       // transpose pass 2
   2647       dct_interleave16(row0, row2);
   2648       dct_interleave16(row1, row3);
   2649       dct_interleave16(row4, row6);
   2650       dct_interleave16(row5, row7);
   2651 
   2652       // transpose pass 3
   2653       dct_interleave16(row0, row1);
   2654       dct_interleave16(row2, row3);
   2655       dct_interleave16(row4, row5);
   2656       dct_interleave16(row6, row7);
   2657    }
   2658 
   2659    // row pass
   2660    dct_pass(bias_1, 17);
   2661 
   2662    {
   2663       // pack
   2664       __m128i p0 = _mm_packus_epi16(row0, row1); // a0a1a2a3...a7b0b1b2b3...b7
   2665       __m128i p1 = _mm_packus_epi16(row2, row3);
   2666       __m128i p2 = _mm_packus_epi16(row4, row5);
   2667       __m128i p3 = _mm_packus_epi16(row6, row7);
   2668 
   2669       // 8bit 8x8 transpose pass 1
   2670       dct_interleave8(p0, p2); // a0e0a1e1...
   2671       dct_interleave8(p1, p3); // c0g0c1g1...
   2672 
   2673       // transpose pass 2
   2674       dct_interleave8(p0, p1); // a0c0e0g0...
   2675       dct_interleave8(p2, p3); // b0d0f0h0...
   2676 
   2677       // transpose pass 3
   2678       dct_interleave8(p0, p2); // a0b0c0d0...
   2679       dct_interleave8(p1, p3); // a4b4c4d4...
   2680 
   2681       // store
   2682       _mm_storel_epi64((__m128i *) out, p0); out += out_stride;
   2683       _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p0, 0x4e)); out += out_stride;
   2684       _mm_storel_epi64((__m128i *) out, p2); out += out_stride;
   2685       _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p2, 0x4e)); out += out_stride;
   2686       _mm_storel_epi64((__m128i *) out, p1); out += out_stride;
   2687       _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p1, 0x4e)); out += out_stride;
   2688       _mm_storel_epi64((__m128i *) out, p3); out += out_stride;
   2689       _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p3, 0x4e));
   2690    }
   2691 
   2692 #undef dct_const
   2693 #undef dct_rot
   2694 #undef dct_widen
   2695 #undef dct_wadd
   2696 #undef dct_wsub
   2697 #undef dct_bfly32o
   2698 #undef dct_interleave8
   2699 #undef dct_interleave16
   2700 #undef dct_pass
   2701 }
   2702 
   2703 #endif // STBI_SSE2
   2704 
   2705 #ifdef STBI_NEON
   2706 
   2707 // NEON integer IDCT. should produce bit-identical
   2708 // results to the generic C version.
   2709 static void stbi__idct_simd(stbi_uc *out, int out_stride, short data[64])
   2710 {
   2711    int16x8_t row0, row1, row2, row3, row4, row5, row6, row7;
   2712 
   2713    int16x4_t rot0_0 = vdup_n_s16(stbi__f2f(0.5411961f));
   2714    int16x4_t rot0_1 = vdup_n_s16(stbi__f2f(-1.847759065f));
   2715    int16x4_t rot0_2 = vdup_n_s16(stbi__f2f( 0.765366865f));
   2716    int16x4_t rot1_0 = vdup_n_s16(stbi__f2f( 1.175875602f));
   2717    int16x4_t rot1_1 = vdup_n_s16(stbi__f2f(-0.899976223f));
   2718    int16x4_t rot1_2 = vdup_n_s16(stbi__f2f(-2.562915447f));
   2719    int16x4_t rot2_0 = vdup_n_s16(stbi__f2f(-1.961570560f));
   2720    int16x4_t rot2_1 = vdup_n_s16(stbi__f2f(-0.390180644f));
   2721    int16x4_t rot3_0 = vdup_n_s16(stbi__f2f( 0.298631336f));
   2722    int16x4_t rot3_1 = vdup_n_s16(stbi__f2f( 2.053119869f));
   2723    int16x4_t rot3_2 = vdup_n_s16(stbi__f2f( 3.072711026f));
   2724    int16x4_t rot3_3 = vdup_n_s16(stbi__f2f( 1.501321110f));
   2725 
   2726 #define dct_long_mul(out, inq, coeff) \
   2727    int32x4_t out##_l = vmull_s16(vget_low_s16(inq), coeff); \
   2728    int32x4_t out##_h = vmull_s16(vget_high_s16(inq), coeff)
   2729 
   2730 #define dct_long_mac(out, acc, inq, coeff) \
   2731    int32x4_t out##_l = vmlal_s16(acc##_l, vget_low_s16(inq), coeff); \
   2732    int32x4_t out##_h = vmlal_s16(acc##_h, vget_high_s16(inq), coeff)
   2733 
   2734 #define dct_widen(out, inq) \
   2735    int32x4_t out##_l = vshll_n_s16(vget_low_s16(inq), 12); \
   2736    int32x4_t out##_h = vshll_n_s16(vget_high_s16(inq), 12)
   2737 
   2738 // wide add
   2739 #define dct_wadd(out, a, b) \
   2740    int32x4_t out##_l = vaddq_s32(a##_l, b##_l); \
   2741    int32x4_t out##_h = vaddq_s32(a##_h, b##_h)
   2742 
   2743 // wide sub
   2744 #define dct_wsub(out, a, b) \
   2745    int32x4_t out##_l = vsubq_s32(a##_l, b##_l); \
   2746    int32x4_t out##_h = vsubq_s32(a##_h, b##_h)
   2747 
   2748 // butterfly a/b, then shift using "shiftop" by "s" and pack
   2749 #define dct_bfly32o(out0,out1, a,b,shiftop,s) \
   2750    { \
   2751       dct_wadd(sum, a, b); \
   2752       dct_wsub(dif, a, b); \
   2753       out0 = vcombine_s16(shiftop(sum_l, s), shiftop(sum_h, s)); \
   2754       out1 = vcombine_s16(shiftop(dif_l, s), shiftop(dif_h, s)); \
   2755    }
   2756 
   2757 #define dct_pass(shiftop, shift) \
   2758    { \
   2759       /* even part */ \
   2760       int16x8_t sum26 = vaddq_s16(row2, row6); \
   2761       dct_long_mul(p1e, sum26, rot0_0); \
   2762       dct_long_mac(t2e, p1e, row6, rot0_1); \
   2763       dct_long_mac(t3e, p1e, row2, rot0_2); \
   2764       int16x8_t sum04 = vaddq_s16(row0, row4); \
   2765       int16x8_t dif04 = vsubq_s16(row0, row4); \
   2766       dct_widen(t0e, sum04); \
   2767       dct_widen(t1e, dif04); \
   2768       dct_wadd(x0, t0e, t3e); \
   2769       dct_wsub(x3, t0e, t3e); \
   2770       dct_wadd(x1, t1e, t2e); \
   2771       dct_wsub(x2, t1e, t2e); \
   2772       /* odd part */ \
   2773       int16x8_t sum15 = vaddq_s16(row1, row5); \
   2774       int16x8_t sum17 = vaddq_s16(row1, row7); \
   2775       int16x8_t sum35 = vaddq_s16(row3, row5); \
   2776       int16x8_t sum37 = vaddq_s16(row3, row7); \
   2777       int16x8_t sumodd = vaddq_s16(sum17, sum35); \
   2778       dct_long_mul(p5o, sumodd, rot1_0); \
   2779       dct_long_mac(p1o, p5o, sum17, rot1_1); \
   2780       dct_long_mac(p2o, p5o, sum35, rot1_2); \
   2781       dct_long_mul(p3o, sum37, rot2_0); \
   2782       dct_long_mul(p4o, sum15, rot2_1); \
   2783       dct_wadd(sump13o, p1o, p3o); \
   2784       dct_wadd(sump24o, p2o, p4o); \
   2785       dct_wadd(sump23o, p2o, p3o); \
   2786       dct_wadd(sump14o, p1o, p4o); \
   2787       dct_long_mac(x4, sump13o, row7, rot3_0); \
   2788       dct_long_mac(x5, sump24o, row5, rot3_1); \
   2789       dct_long_mac(x6, sump23o, row3, rot3_2); \
   2790       dct_long_mac(x7, sump14o, row1, rot3_3); \
   2791       dct_bfly32o(row0,row7, x0,x7,shiftop,shift); \
   2792       dct_bfly32o(row1,row6, x1,x6,shiftop,shift); \
   2793       dct_bfly32o(row2,row5, x2,x5,shiftop,shift); \
   2794       dct_bfly32o(row3,row4, x3,x4,shiftop,shift); \
   2795    }
   2796 
   2797    // load
   2798    row0 = vld1q_s16(data + 0*8);
   2799    row1 = vld1q_s16(data + 1*8);
   2800    row2 = vld1q_s16(data + 2*8);
   2801    row3 = vld1q_s16(data + 3*8);
   2802    row4 = vld1q_s16(data + 4*8);
   2803    row5 = vld1q_s16(data + 5*8);
   2804    row6 = vld1q_s16(data + 6*8);
   2805    row7 = vld1q_s16(data + 7*8);
   2806 
   2807    // add DC bias
   2808    row0 = vaddq_s16(row0, vsetq_lane_s16(1024, vdupq_n_s16(0), 0));
   2809 
   2810    // column pass
   2811    dct_pass(vrshrn_n_s32, 10);
   2812 
   2813    // 16bit 8x8 transpose
   2814    {
   2815 // these three map to a single VTRN.16, VTRN.32, and VSWP, respectively.
   2816 // whether compilers actually get this is another story, sadly.
   2817 #define dct_trn16(x, y) { int16x8x2_t t = vtrnq_s16(x, y); x = t.val[0]; y = t.val[1]; }
   2818 #define dct_trn32(x, y) { int32x4x2_t t = vtrnq_s32(vreinterpretq_s32_s16(x), vreinterpretq_s32_s16(y)); x = vreinterpretq_s16_s32(t.val[0]); y = vreinterpretq_s16_s32(t.val[1]); }
   2819 #define dct_trn64(x, y) { int16x8_t x0 = x; int16x8_t y0 = y; x = vcombine_s16(vget_low_s16(x0), vget_low_s16(y0)); y = vcombine_s16(vget_high_s16(x0), vget_high_s16(y0)); }
   2820 
   2821       // pass 1
   2822       dct_trn16(row0, row1); // a0b0a2b2a4b4a6b6
   2823       dct_trn16(row2, row3);
   2824       dct_trn16(row4, row5);
   2825       dct_trn16(row6, row7);
   2826 
   2827       // pass 2
   2828       dct_trn32(row0, row2); // a0b0c0d0a4b4c4d4
   2829       dct_trn32(row1, row3);
   2830       dct_trn32(row4, row6);
   2831       dct_trn32(row5, row7);
   2832 
   2833       // pass 3
   2834       dct_trn64(row0, row4); // a0b0c0d0e0f0g0h0
   2835       dct_trn64(row1, row5);
   2836       dct_trn64(row2, row6);
   2837       dct_trn64(row3, row7);
   2838 
   2839 #undef dct_trn16
   2840 #undef dct_trn32
   2841 #undef dct_trn64
   2842    }
   2843 
   2844    // row pass
   2845    // vrshrn_n_s32 only supports shifts up to 16, we need
   2846    // 17. so do a non-rounding shift of 16 first then follow
   2847    // up with a rounding shift by 1.
   2848    dct_pass(vshrn_n_s32, 16);
   2849 
   2850    {
   2851       // pack and round
   2852       uint8x8_t p0 = vqrshrun_n_s16(row0, 1);
   2853       uint8x8_t p1 = vqrshrun_n_s16(row1, 1);
   2854       uint8x8_t p2 = vqrshrun_n_s16(row2, 1);
   2855       uint8x8_t p3 = vqrshrun_n_s16(row3, 1);
   2856       uint8x8_t p4 = vqrshrun_n_s16(row4, 1);
   2857       uint8x8_t p5 = vqrshrun_n_s16(row5, 1);
   2858       uint8x8_t p6 = vqrshrun_n_s16(row6, 1);
   2859       uint8x8_t p7 = vqrshrun_n_s16(row7, 1);
   2860 
   2861       // again, these can translate into one instruction, but often don't.
   2862 #define dct_trn8_8(x, y) { uint8x8x2_t t = vtrn_u8(x, y); x = t.val[0]; y = t.val[1]; }
   2863 #define dct_trn8_16(x, y) { uint16x4x2_t t = vtrn_u16(vreinterpret_u16_u8(x), vreinterpret_u16_u8(y)); x = vreinterpret_u8_u16(t.val[0]); y = vreinterpret_u8_u16(t.val[1]); }
   2864 #define dct_trn8_32(x, y) { uint32x2x2_t t = vtrn_u32(vreinterpret_u32_u8(x), vreinterpret_u32_u8(y)); x = vreinterpret_u8_u32(t.val[0]); y = vreinterpret_u8_u32(t.val[1]); }
   2865 
   2866       // sadly can't use interleaved stores here since we only write
   2867       // 8 bytes to each scan line!
   2868 
   2869       // 8x8 8-bit transpose pass 1
   2870       dct_trn8_8(p0, p1);
   2871       dct_trn8_8(p2, p3);
   2872       dct_trn8_8(p4, p5);
   2873       dct_trn8_8(p6, p7);
   2874 
   2875       // pass 2
   2876       dct_trn8_16(p0, p2);
   2877       dct_trn8_16(p1, p3);
   2878       dct_trn8_16(p4, p6);
   2879       dct_trn8_16(p5, p7);
   2880 
   2881       // pass 3
   2882       dct_trn8_32(p0, p4);
   2883       dct_trn8_32(p1, p5);
   2884       dct_trn8_32(p2, p6);
   2885       dct_trn8_32(p3, p7);
   2886 
   2887       // store
   2888       vst1_u8(out, p0); out += out_stride;
   2889       vst1_u8(out, p1); out += out_stride;
   2890       vst1_u8(out, p2); out += out_stride;
   2891       vst1_u8(out, p3); out += out_stride;
   2892       vst1_u8(out, p4); out += out_stride;
   2893       vst1_u8(out, p5); out += out_stride;
   2894       vst1_u8(out, p6); out += out_stride;
   2895       vst1_u8(out, p7);
   2896 
   2897 #undef dct_trn8_8
   2898 #undef dct_trn8_16
   2899 #undef dct_trn8_32
   2900    }
   2901 
   2902 #undef dct_long_mul
   2903 #undef dct_long_mac
   2904 #undef dct_widen
   2905 #undef dct_wadd
   2906 #undef dct_wsub
   2907 #undef dct_bfly32o
   2908 #undef dct_pass
   2909 }
   2910 
   2911 #endif // STBI_NEON
   2912 
   2913 #define STBI__MARKER_none  0xff
   2914 // if there's a pending marker from the entropy stream, return that
   2915 // otherwise, fetch from the stream and get a marker. if there's no
   2916 // marker, return 0xff, which is never a valid marker value
   2917 static stbi_uc stbi__get_marker(stbi__jpeg *j)
   2918 {
   2919    stbi_uc x;
   2920    if (j->marker != STBI__MARKER_none) { x = j->marker; j->marker = STBI__MARKER_none; return x; }
   2921    x = stbi__get8(j->s);
   2922    if (x != 0xff) return STBI__MARKER_none;
   2923    while (x == 0xff)
   2924       x = stbi__get8(j->s); // consume repeated 0xff fill bytes
   2925    return x;
   2926 }
   2927 
   2928 // in each scan, we'll have scan_n components, and the order
   2929 // of the components is specified by order[]
   2930 #define STBI__RESTART(x)     ((x) >= 0xd0 && (x) <= 0xd7)
   2931 
   2932 // after a restart interval, stbi__jpeg_reset the entropy decoder and
   2933 // the dc prediction
   2934 static void stbi__jpeg_reset(stbi__jpeg *j)
   2935 {
   2936    j->code_bits = 0;
   2937    j->code_buffer = 0;
   2938    j->nomore = 0;
   2939    j->img_comp[0].dc_pred = j->img_comp[1].dc_pred = j->img_comp[2].dc_pred = j->img_comp[3].dc_pred = 0;
   2940    j->marker = STBI__MARKER_none;
   2941    j->todo = j->restart_interval ? j->restart_interval : 0x7fffffff;
   2942    j->eob_run = 0;
   2943    // no more than 1<<31 MCUs if no restart_interal? that's plenty safe,
   2944    // since we don't even allow 1<<30 pixels
   2945 }
   2946 
   2947 static int stbi__parse_entropy_coded_data(stbi__jpeg *z)
   2948 {
   2949    stbi__jpeg_reset(z);
   2950    if (!z->progressive) {
   2951       if (z->scan_n == 1) {
   2952          int i,j;
   2953          STBI_SIMD_ALIGN(short, data[64]);
   2954          int n = z->order[0];
   2955          // non-interleaved data, we just need to process one block at a time,
   2956          // in trivial scanline order
   2957          // number of blocks to do just depends on how many actual "pixels" this
   2958          // component has, independent of interleaved MCU blocking and such
   2959          int w = (z->img_comp[n].x+7) >> 3;
   2960          int h = (z->img_comp[n].y+7) >> 3;
   2961          for (j=0; j < h; ++j) {
   2962             for (i=0; i < w; ++i) {
   2963                int ha = z->img_comp[n].ha;
   2964                if (!stbi__jpeg_decode_block(z, data, z->huff_dc+z->img_comp[n].hd, z->huff_ac+ha, z->fast_ac[ha], n, z->dequant[z->img_comp[n].tq])) return 0;
   2965                z->idct_block_kernel(z->img_comp[n].data+z->img_comp[n].w2*j*8+i*8, z->img_comp[n].w2, data);
   2966                // every data block is an MCU, so countdown the restart interval
   2967                if (--z->todo <= 0) {
   2968                   if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
   2969                   // if it's NOT a restart, then just bail, so we get corrupt data
   2970                   // rather than no data
   2971                   if (!STBI__RESTART(z->marker)) return 1;
   2972                   stbi__jpeg_reset(z);
   2973                }
   2974             }
   2975          }
   2976          return 1;
   2977       } else { // interleaved
   2978          int i,j,k,x,y;
   2979          STBI_SIMD_ALIGN(short, data[64]);
   2980          for (j=0; j < z->img_mcu_y; ++j) {
   2981             for (i=0; i < z->img_mcu_x; ++i) {
   2982                // scan an interleaved mcu... process scan_n components in order
   2983                for (k=0; k < z->scan_n; ++k) {
   2984                   int n = z->order[k];
   2985                   // scan out an mcu's worth of this component; that's just determined
   2986                   // by the basic H and V specified for the component
   2987                   for (y=0; y < z->img_comp[n].v; ++y) {
   2988                      for (x=0; x < z->img_comp[n].h; ++x) {
   2989                         int x2 = (i*z->img_comp[n].h + x)*8;
   2990                         int y2 = (j*z->img_comp[n].v + y)*8;
   2991                         int ha = z->img_comp[n].ha;
   2992                         if (!stbi__jpeg_decode_block(z, data, z->huff_dc+z->img_comp[n].hd, z->huff_ac+ha, z->fast_ac[ha], n, z->dequant[z->img_comp[n].tq])) return 0;
   2993                         z->idct_block_kernel(z->img_comp[n].data+z->img_comp[n].w2*y2+x2, z->img_comp[n].w2, data);
   2994                      }
   2995                   }
   2996                }
   2997                // after all interleaved components, that's an interleaved MCU,
   2998                // so now count down the restart interval
   2999                if (--z->todo <= 0) {
   3000                   if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
   3001                   if (!STBI__RESTART(z->marker)) return 1;
   3002                   stbi__jpeg_reset(z);
   3003                }
   3004             }
   3005          }
   3006          return 1;
   3007       }
   3008    } else {
   3009       if (z->scan_n == 1) {
   3010          int i,j;
   3011          int n = z->order[0];
   3012          // non-interleaved data, we just need to process one block at a time,
   3013          // in trivial scanline order
   3014          // number of blocks to do just depends on how many actual "pixels" this
   3015          // component has, independent of interleaved MCU blocking and such
   3016          int w = (z->img_comp[n].x+7) >> 3;
   3017          int h = (z->img_comp[n].y+7) >> 3;
   3018          for (j=0; j < h; ++j) {
   3019             for (i=0; i < w; ++i) {
   3020                short *data = z->img_comp[n].coeff + 64 * (i + j * z->img_comp[n].coeff_w);
   3021                if (z->spec_start == 0) {
   3022                   if (!stbi__jpeg_decode_block_prog_dc(z, data, &z->huff_dc[z->img_comp[n].hd], n))
   3023                      return 0;
   3024                } else {
   3025                   int ha = z->img_comp[n].ha;
   3026                   if (!stbi__jpeg_decode_block_prog_ac(z, data, &z->huff_ac[ha], z->fast_ac[ha]))
   3027                      return 0;
   3028                }
   3029                // every data block is an MCU, so countdown the restart interval
   3030                if (--z->todo <= 0) {
   3031                   if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
   3032                   if (!STBI__RESTART(z->marker)) return 1;
   3033                   stbi__jpeg_reset(z);
   3034                }
   3035             }
   3036          }
   3037          return 1;
   3038       } else { // interleaved
   3039          int i,j,k,x,y;
   3040          for (j=0; j < z->img_mcu_y; ++j) {
   3041             for (i=0; i < z->img_mcu_x; ++i) {
   3042                // scan an interleaved mcu... process scan_n components in order
   3043                for (k=0; k < z->scan_n; ++k) {
   3044                   int n = z->order[k];
   3045                   // scan out an mcu's worth of this component; that's just determined
   3046                   // by the basic H and V specified for the component
   3047                   for (y=0; y < z->img_comp[n].v; ++y) {
   3048                      for (x=0; x < z->img_comp[n].h; ++x) {
   3049                         int x2 = (i*z->img_comp[n].h + x);
   3050                         int y2 = (j*z->img_comp[n].v + y);
   3051                         short *data = z->img_comp[n].coeff + 64 * (x2 + y2 * z->img_comp[n].coeff_w);
   3052                         if (!stbi__jpeg_decode_block_prog_dc(z, data, &z->huff_dc[z->img_comp[n].hd], n))
   3053                            return 0;
   3054                      }
   3055                   }
   3056                }
   3057                // after all interleaved components, that's an interleaved MCU,
   3058                // so now count down the restart interval
   3059                if (--z->todo <= 0) {
   3060                   if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
   3061                   if (!STBI__RESTART(z->marker)) return 1;
   3062                   stbi__jpeg_reset(z);
   3063                }
   3064             }
   3065          }
   3066          return 1;
   3067       }
   3068    }
   3069 }
   3070 
   3071 static void stbi__jpeg_dequantize(short *data, stbi__uint16 *dequant)
   3072 {
   3073    int i;
   3074    for (i=0; i < 64; ++i)
   3075       data[i] *= dequant[i];
   3076 }
   3077 
   3078 static void stbi__jpeg_finish(stbi__jpeg *z)
   3079 {
   3080    if (z->progressive) {
   3081       // dequantize and idct the data
   3082       int i,j,n;
   3083       for (n=0; n < z->s->img_n; ++n) {
   3084          int w = (z->img_comp[n].x+7) >> 3;
   3085          int h = (z->img_comp[n].y+7) >> 3;
   3086          for (j=0; j < h; ++j) {
   3087             for (i=0; i < w; ++i) {
   3088                short *data = z->img_comp[n].coeff + 64 * (i + j * z->img_comp[n].coeff_w);
   3089                stbi__jpeg_dequantize(data, z->dequant[z->img_comp[n].tq]);
   3090                z->idct_block_kernel(z->img_comp[n].data+z->img_comp[n].w2*j*8+i*8, z->img_comp[n].w2, data);
   3091             }
   3092          }
   3093       }
   3094    }
   3095 }
   3096 
   3097 static int stbi__process_marker(stbi__jpeg *z, int m)
   3098 {
   3099    int L;
   3100    switch (m) {
   3101       case STBI__MARKER_none: // no marker found
   3102          return stbi__err("expected marker","Corrupt JPEG");
   3103 
   3104       case 0xDD: // DRI - specify restart interval
   3105          if (stbi__get16be(z->s) != 4) return stbi__err("bad DRI len","Corrupt JPEG");
   3106          z->restart_interval = stbi__get16be(z->s);
   3107          return 1;
   3108 
   3109       case 0xDB: // DQT - define quantization table
   3110          L = stbi__get16be(z->s)-2;
   3111          while (L > 0) {
   3112             int q = stbi__get8(z->s);
   3113             int p = q >> 4, sixteen = (p != 0);
   3114             int t = q & 15,i;
   3115             if (p != 0 && p != 1) return stbi__err("bad DQT type","Corrupt JPEG");
   3116             if (t > 3) return stbi__err("bad DQT table","Corrupt JPEG");
   3117 
   3118             for (i=0; i < 64; ++i)
   3119                z->dequant[t][stbi__jpeg_dezigzag[i]] = (stbi__uint16)(sixteen ? stbi__get16be(z->s) : stbi__get8(z->s));
   3120             L -= (sixteen ? 129 : 65);
   3121          }
   3122          return L==0;
   3123 
   3124       case 0xC4: // DHT - define huffman table
   3125          L = stbi__get16be(z->s)-2;
   3126          while (L > 0) {
   3127             stbi_uc *v;
   3128             int sizes[16],i,n=0;
   3129             int q = stbi__get8(z->s);
   3130             int tc = q >> 4;
   3131             int th = q & 15;
   3132             if (tc > 1 || th > 3) return stbi__err("bad DHT header","Corrupt JPEG");
   3133             for (i=0; i < 16; ++i) {
   3134                sizes[i] = stbi__get8(z->s);
   3135                n += sizes[i];
   3136             }
   3137             if(n > 256) return stbi__err("bad DHT header","Corrupt JPEG"); // Loop over i < n would write past end of values!
   3138             L -= 17;
   3139             if (tc == 0) {
   3140                if (!stbi__build_huffman(z->huff_dc+th, sizes)) return 0;
   3141                v = z->huff_dc[th].values;
   3142             } else {
   3143                if (!stbi__build_huffman(z->huff_ac+th, sizes)) return 0;
   3144                v = z->huff_ac[th].values;
   3145             }
   3146             for (i=0; i < n; ++i)
   3147                v[i] = stbi__get8(z->s);
   3148             if (tc != 0)
   3149                stbi__build_fast_ac(z->fast_ac[th], z->huff_ac + th);
   3150             L -= n;
   3151          }
   3152          return L==0;
   3153    }
   3154 
   3155    // check for comment block or APP blocks
   3156    if ((m >= 0xE0 && m <= 0xEF) || m == 0xFE) {
   3157       L = stbi__get16be(z->s);
   3158       if (L < 2) {
   3159          if (m == 0xFE)
   3160             return stbi__err("bad COM len","Corrupt JPEG");
   3161          else
   3162             return stbi__err("bad APP len","Corrupt JPEG");
   3163       }
   3164       L -= 2;
   3165 
   3166       if (m == 0xE0 && L >= 5) { // JFIF APP0 segment
   3167          static const unsigned char tag[5] = {'J','F','I','F','\0'};
   3168          int ok = 1;
   3169          int i;
   3170          for (i=0; i < 5; ++i)
   3171             if (stbi__get8(z->s) != tag[i])
   3172                ok = 0;
   3173          L -= 5;
   3174          if (ok)
   3175             z->jfif = 1;
   3176       } else if (m == 0xEE && L >= 12) { // Adobe APP14 segment
   3177          static const unsigned char tag[6] = {'A','d','o','b','e','\0'};
   3178          int ok = 1;
   3179          int i;
   3180          for (i=0; i < 6; ++i)
   3181             if (stbi__get8(z->s) != tag[i])
   3182                ok = 0;
   3183          L -= 6;
   3184          if (ok) {
   3185             stbi__get8(z->s); // version
   3186             stbi__get16be(z->s); // flags0
   3187             stbi__get16be(z->s); // flags1
   3188             z->app14_color_transform = stbi__get8(z->s); // color transform
   3189             L -= 6;
   3190          }
   3191       }
   3192 
   3193       stbi__skip(z->s, L);
   3194       return 1;
   3195    }
   3196 
   3197    return stbi__err("unknown marker","Corrupt JPEG");
   3198 }
   3199 
   3200 // after we see SOS
   3201 static int stbi__process_scan_header(stbi__jpeg *z)
   3202 {
   3203    int i;
   3204    int Ls = stbi__get16be(z->s);
   3205    z->scan_n = stbi__get8(z->s);
   3206    if (z->scan_n < 1 || z->scan_n > 4 || z->scan_n > (int) z->s->img_n) return stbi__err("bad SOS component count","Corrupt JPEG");
   3207    if (Ls != 6+2*z->scan_n) return stbi__err("bad SOS len","Corrupt JPEG");
   3208    for (i=0; i < z->scan_n; ++i) {
   3209       int id = stbi__get8(z->s), which;
   3210       int q = stbi__get8(z->s);
   3211       for (which = 0; which < z->s->img_n; ++which)
   3212          if (z->img_comp[which].id == id)
   3213             break;
   3214       if (which == z->s->img_n) return 0; // no match
   3215       z->img_comp[which].hd = q >> 4;   if (z->img_comp[which].hd > 3) return stbi__err("bad DC huff","Corrupt JPEG");
   3216       z->img_comp[which].ha = q & 15;   if (z->img_comp[which].ha > 3) return stbi__err("bad AC huff","Corrupt JPEG");
   3217       z->order[i] = which;
   3218    }
   3219 
   3220    {
   3221       int aa;
   3222       z->spec_start = stbi__get8(z->s);
   3223       z->spec_end   = stbi__get8(z->s); // should be 63, but might be 0
   3224       aa = stbi__get8(z->s);
   3225       z->succ_high = (aa >> 4);
   3226       z->succ_low  = (aa & 15);
   3227       if (z->progressive) {
   3228          if (z->spec_start > 63 || z->spec_end > 63  || z->spec_start > z->spec_end || z->succ_high > 13 || z->succ_low > 13)
   3229             return stbi__err("bad SOS", "Corrupt JPEG");
   3230       } else {
   3231          if (z->spec_start != 0) return stbi__err("bad SOS","Corrupt JPEG");
   3232          if (z->succ_high != 0 || z->succ_low != 0) return stbi__err("bad SOS","Corrupt JPEG");
   3233          z->spec_end = 63;
   3234       }
   3235    }
   3236 
   3237    return 1;
   3238 }
   3239 
   3240 static int stbi__free_jpeg_components(stbi__jpeg *z, int ncomp, int why)
   3241 {
   3242    int i;
   3243    for (i=0; i < ncomp; ++i) {
   3244       if (z->img_comp[i].raw_data) {
   3245          STBI_FREE(z->img_comp[i].raw_data);
   3246          z->img_comp[i].raw_data = NULL;
   3247          z->img_comp[i].data = NULL;
   3248       }
   3249       if (z->img_comp[i].raw_coeff) {
   3250          STBI_FREE(z->img_comp[i].raw_coeff);
   3251          z->img_comp[i].raw_coeff = 0;
   3252          z->img_comp[i].coeff = 0;
   3253       }
   3254       if (z->img_comp[i].linebuf) {
   3255          STBI_FREE(z->img_comp[i].linebuf);
   3256          z->img_comp[i].linebuf = NULL;
   3257       }
   3258    }
   3259    return why;
   3260 }
   3261 
   3262 static int stbi__process_frame_header(stbi__jpeg *z, int scan)
   3263 {
   3264    stbi__context *s = z->s;
   3265    int Lf,p,i,q, h_max=1,v_max=1,c;
   3266    Lf = stbi__get16be(s);         if (Lf < 11) return stbi__err("bad SOF len","Corrupt JPEG"); // JPEG
   3267    p  = stbi__get8(s);            if (p != 8) return stbi__err("only 8-bit","JPEG format not supported: 8-bit only"); // JPEG baseline
   3268    s->img_y = stbi__get16be(s);   if (s->img_y == 0) return stbi__err("no header height", "JPEG format not supported: delayed height"); // Legal, but we don't handle it--but neither does IJG
   3269    s->img_x = stbi__get16be(s);   if (s->img_x == 0) return stbi__err("0 width","Corrupt JPEG"); // JPEG requires
   3270    if (s->img_y > STBI_MAX_DIMENSIONS) return stbi__err("too large","Very large image (corrupt?)");
   3271    if (s->img_x > STBI_MAX_DIMENSIONS) return stbi__err("too large","Very large image (corrupt?)");
   3272    c = stbi__get8(s);
   3273    if (c != 3 && c != 1 && c != 4) return stbi__err("bad component count","Corrupt JPEG");
   3274    s->img_n = c;
   3275    for (i=0; i < c; ++i) {
   3276       z->img_comp[i].data = NULL;
   3277       z->img_comp[i].linebuf = NULL;
   3278    }
   3279 
   3280    if (Lf != 8+3*s->img_n) return stbi__err("bad SOF len","Corrupt JPEG");
   3281 
   3282    z->rgb = 0;
   3283    for (i=0; i < s->img_n; ++i) {
   3284       static const unsigned char rgb[3] = { 'R', 'G', 'B' };
   3285       z->img_comp[i].id = stbi__get8(s);
   3286       if (s->img_n == 3 && z->img_comp[i].id == rgb[i])
   3287          ++z->rgb;
   3288       q = stbi__get8(s);
   3289       z->img_comp[i].h = (q >> 4);  if (!z->img_comp[i].h || z->img_comp[i].h > 4) return stbi__err("bad H","Corrupt JPEG");
   3290       z->img_comp[i].v = q & 15;    if (!z->img_comp[i].v || z->img_comp[i].v > 4) return stbi__err("bad V","Corrupt JPEG");
   3291       z->img_comp[i].tq = stbi__get8(s);  if (z->img_comp[i].tq > 3) return stbi__err("bad TQ","Corrupt JPEG");
   3292    }
   3293 
   3294    if (scan != STBI__SCAN_load) return 1;
   3295 
   3296    if (!stbi__mad3sizes_valid(s->img_x, s->img_y, s->img_n, 0)) return stbi__err("too large", "Image too large to decode");
   3297 
   3298    for (i=0; i < s->img_n; ++i) {
   3299       if (z->img_comp[i].h > h_max) h_max = z->img_comp[i].h;
   3300       if (z->img_comp[i].v > v_max) v_max = z->img_comp[i].v;
   3301    }
   3302 
   3303    // check that plane subsampling factors are integer ratios; our resamplers can't deal with fractional ratios
   3304    // and I've never seen a non-corrupted JPEG file actually use them
   3305    for (i=0; i < s->img_n; ++i) {
   3306       if (h_max % z->img_comp[i].h != 0) return stbi__err("bad H","Corrupt JPEG");
   3307       if (v_max % z->img_comp[i].v != 0) return stbi__err("bad V","Corrupt JPEG");
   3308    }
   3309 
   3310    // compute interleaved mcu info
   3311    z->img_h_max = h_max;
   3312    z->img_v_max = v_max;
   3313    z->img_mcu_w = h_max * 8;
   3314    z->img_mcu_h = v_max * 8;
   3315    // these sizes can't be more than 17 bits
   3316    z->img_mcu_x = (s->img_x + z->img_mcu_w-1) / z->img_mcu_w;
   3317    z->img_mcu_y = (s->img_y + z->img_mcu_h-1) / z->img_mcu_h;
   3318 
   3319    for (i=0; i < s->img_n; ++i) {
   3320       // number of effective pixels (e.g. for non-interleaved MCU)
   3321       z->img_comp[i].x = (s->img_x * z->img_comp[i].h + h_max-1) / h_max;
   3322       z->img_comp[i].y = (s->img_y * z->img_comp[i].v + v_max-1) / v_max;
   3323       // to simplify generation, we'll allocate enough memory to decode
   3324       // the bogus oversized data from using interleaved MCUs and their
   3325       // big blocks (e.g. a 16x16 iMCU on an image of width 33); we won't
   3326       // discard the extra data until colorspace conversion
   3327       //
   3328       // img_mcu_x, img_mcu_y: <=17 bits; comp[i].h and .v are <=4 (checked earlier)
   3329       // so these muls can't overflow with 32-bit ints (which we require)
   3330       z->img_comp[i].w2 = z->img_mcu_x * z->img_comp[i].h * 8;
   3331       z->img_comp[i].h2 = z->img_mcu_y * z->img_comp[i].v * 8;
   3332       z->img_comp[i].coeff = 0;
   3333       z->img_comp[i].raw_coeff = 0;
   3334       z->img_comp[i].linebuf = NULL;
   3335       z->img_comp[i].raw_data = stbi__malloc_mad2(z->img_comp[i].w2, z->img_comp[i].h2, 15);
   3336       if (z->img_comp[i].raw_data == NULL)
   3337          return stbi__free_jpeg_components(z, i+1, stbi__err("outofmem", "Out of memory"));
   3338       // align blocks for idct using mmx/sse
   3339       z->img_comp[i].data = (stbi_uc*) (((size_t) z->img_comp[i].raw_data + 15) & ~15);
   3340       if (z->progressive) {
   3341          // w2, h2 are multiples of 8 (see above)
   3342          z->img_comp[i].coeff_w = z->img_comp[i].w2 / 8;
   3343          z->img_comp[i].coeff_h = z->img_comp[i].h2 / 8;
   3344          z->img_comp[i].raw_coeff = stbi__malloc_mad3(z->img_comp[i].w2, z->img_comp[i].h2, sizeof(short), 15);
   3345          if (z->img_comp[i].raw_coeff == NULL)
   3346             return stbi__free_jpeg_components(z, i+1, stbi__err("outofmem", "Out of memory"));
   3347          z->img_comp[i].coeff = (short*) (((size_t) z->img_comp[i].raw_coeff + 15) & ~15);
   3348       }
   3349    }
   3350 
   3351    return 1;
   3352 }
   3353 
   3354 // use comparisons since in some cases we handle more than one case (e.g. SOF)
   3355 #define stbi__DNL(x)         ((x) == 0xdc)
   3356 #define stbi__SOI(x)         ((x) == 0xd8)
   3357 #define stbi__EOI(x)         ((x) == 0xd9)
   3358 #define stbi__SOF(x)         ((x) == 0xc0 || (x) == 0xc1 || (x) == 0xc2)
   3359 #define stbi__SOS(x)         ((x) == 0xda)
   3360 
   3361 #define stbi__SOF_progressive(x)   ((x) == 0xc2)
   3362 
   3363 static int stbi__decode_jpeg_header(stbi__jpeg *z, int scan)
   3364 {
   3365    int m;
   3366    z->jfif = 0;
   3367    z->app14_color_transform = -1; // valid values are 0,1,2
   3368    z->marker = STBI__MARKER_none; // initialize cached marker to empty
   3369    m = stbi__get_marker(z);
   3370    if (!stbi__SOI(m)) return stbi__err("no SOI","Corrupt JPEG");
   3371    if (scan == STBI__SCAN_type) return 1;
   3372    m = stbi__get_marker(z);
   3373    while (!stbi__SOF(m)) {
   3374       if (!stbi__process_marker(z,m)) return 0;
   3375       m = stbi__get_marker(z);
   3376       while (m == STBI__MARKER_none) {
   3377          // some files have extra padding after their blocks, so ok, we'll scan
   3378          if (stbi__at_eof(z->s)) return stbi__err("no SOF", "Corrupt JPEG");
   3379          m = stbi__get_marker(z);
   3380       }
   3381    }
   3382    z->progressive = stbi__SOF_progressive(m);
   3383    if (!stbi__process_frame_header(z, scan)) return 0;
   3384    return 1;
   3385 }
   3386 
   3387 static int stbi__skip_jpeg_junk_at_end(stbi__jpeg *j)
   3388 {
   3389    // some JPEGs have junk at end, skip over it but if we find what looks
   3390    // like a valid marker, resume there
   3391    while (!stbi__at_eof(j->s)) {
   3392       int x = stbi__get8(j->s);
   3393       while (x == 255) { // might be a marker
   3394          if (stbi__at_eof(j->s)) return STBI__MARKER_none;
   3395          x = stbi__get8(j->s);
   3396          if (x != 0x00 && x != 0xff) {
   3397             // not a stuffed zero or lead-in to another marker, looks
   3398             // like an actual marker, return it
   3399             return x;
   3400          }
   3401          // stuffed zero has x=0 now which ends the loop, meaning we go
   3402          // back to regular scan loop.
   3403          // repeated 0xff keeps trying to read the next byte of the marker.
   3404       }
   3405    }
   3406    return STBI__MARKER_none;
   3407 }
   3408 
   3409 // decode image to YCbCr format
   3410 static int stbi__decode_jpeg_image(stbi__jpeg *j)
   3411 {
   3412    int m;
   3413    for (m = 0; m < 4; m++) {
   3414       j->img_comp[m].raw_data = NULL;
   3415       j->img_comp[m].raw_coeff = NULL;
   3416    }
   3417    j->restart_interval = 0;
   3418    if (!stbi__decode_jpeg_header(j, STBI__SCAN_load)) return 0;
   3419    m = stbi__get_marker(j);
   3420    while (!stbi__EOI(m)) {
   3421       if (stbi__SOS(m)) {
   3422          if (!stbi__process_scan_header(j)) return 0;
   3423          if (!stbi__parse_entropy_coded_data(j)) return 0;
   3424          if (j->marker == STBI__MARKER_none ) {
   3425          j->marker = stbi__skip_jpeg_junk_at_end(j);
   3426             // if we reach eof without hitting a marker, stbi__get_marker() below will fail and we'll eventually return 0
   3427          }
   3428          m = stbi__get_marker(j);
   3429          if (STBI__RESTART(m))
   3430             m = stbi__get_marker(j);
   3431       } else if (stbi__DNL(m)) {
   3432          int Ld = stbi__get16be(j->s);
   3433          stbi__uint32 NL = stbi__get16be(j->s);
   3434          if (Ld != 4) return stbi__err("bad DNL len", "Corrupt JPEG");
   3435          if (NL != j->s->img_y) return stbi__err("bad DNL height", "Corrupt JPEG");
   3436          m = stbi__get_marker(j);
   3437       } else {
   3438          if (!stbi__process_marker(j, m)) return 1;
   3439          m = stbi__get_marker(j);
   3440       }
   3441    }
   3442    if (j->progressive)
   3443       stbi__jpeg_finish(j);
   3444    return 1;
   3445 }
   3446 
   3447 // static jfif-centered resampling (across block boundaries)
   3448 
   3449 typedef stbi_uc *(*resample_row_func)(stbi_uc *out, stbi_uc *in0, stbi_uc *in1,
   3450                                     int w, int hs);
   3451 
   3452 #define stbi__div4(x) ((stbi_uc) ((x) >> 2))
   3453 
   3454 static stbi_uc *resample_row_1(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
   3455 {
   3456    STBI_NOTUSED(out);
   3457    STBI_NOTUSED(in_far);
   3458    STBI_NOTUSED(w);
   3459    STBI_NOTUSED(hs);
   3460    return in_near;
   3461 }
   3462 
   3463 static stbi_uc* stbi__resample_row_v_2(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
   3464 {
   3465    // need to generate two samples vertically for every one in input
   3466    int i;
   3467    STBI_NOTUSED(hs);
   3468    for (i=0; i < w; ++i)
   3469       out[i] = stbi__div4(3*in_near[i] + in_far[i] + 2);
   3470    return out;
   3471 }
   3472 
   3473 static stbi_uc*  stbi__resample_row_h_2(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
   3474 {
   3475    // need to generate two samples horizontally for every one in input
   3476    int i;
   3477    stbi_uc *input = in_near;
   3478 
   3479    if (w == 1) {
   3480       // if only one sample, can't do any interpolation
   3481       out[0] = out[1] = input[0];
   3482       return out;
   3483    }
   3484 
   3485    out[0] = input[0];
   3486    out[1] = stbi__div4(input[0]*3 + input[1] + 2);
   3487    for (i=1; i < w-1; ++i) {
   3488       int n = 3*input[i]+2;
   3489       out[i*2+0] = stbi__div4(n+input[i-1]);
   3490       out[i*2+1] = stbi__div4(n+input[i+1]);
   3491    }
   3492    out[i*2+0] = stbi__div4(input[w-2]*3 + input[w-1] + 2);
   3493    out[i*2+1] = input[w-1];
   3494 
   3495    STBI_NOTUSED(in_far);
   3496    STBI_NOTUSED(hs);
   3497 
   3498    return out;
   3499 }
   3500 
   3501 #define stbi__div16(x) ((stbi_uc) ((x) >> 4))
   3502 
   3503 static stbi_uc *stbi__resample_row_hv_2(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
   3504 {
   3505    // need to generate 2x2 samples for every one in input
   3506    int i,t0,t1;
   3507    if (w == 1) {
   3508       out[0] = out[1] = stbi__div4(3*in_near[0] + in_far[0] + 2);
   3509       return out;
   3510    }
   3511 
   3512    t1 = 3*in_near[0] + in_far[0];
   3513    out[0] = stbi__div4(t1+2);
   3514    for (i=1; i < w; ++i) {
   3515       t0 = t1;
   3516       t1 = 3*in_near[i]+in_far[i];
   3517       out[i*2-1] = stbi__div16(3*t0 + t1 + 8);
   3518       out[i*2  ] = stbi__div16(3*t1 + t0 + 8);
   3519    }
   3520    out[w*2-1] = stbi__div4(t1+2);
   3521 
   3522    STBI_NOTUSED(hs);
   3523 
   3524    return out;
   3525 }
   3526 
   3527 #if defined(STBI_SSE2) || defined(STBI_NEON)
   3528 static stbi_uc *stbi__resample_row_hv_2_simd(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
   3529 {
   3530    // need to generate 2x2 samples for every one in input
   3531    int i=0,t0,t1;
   3532 
   3533    if (w == 1) {
   3534       out[0] = out[1] = stbi__div4(3*in_near[0] + in_far[0] + 2);
   3535       return out;
   3536    }
   3537 
   3538    t1 = 3*in_near[0] + in_far[0];
   3539    // process groups of 8 pixels for as long as we can.
   3540    // note we can't handle the last pixel in a row in this loop
   3541    // because we need to handle the filter boundary conditions.
   3542    for (; i < ((w-1) & ~7); i += 8) {
   3543 #if defined(STBI_SSE2)
   3544       // load and perform the vertical filtering pass
   3545       // this uses 3*x + y = 4*x + (y - x)
   3546       __m128i zero  = _mm_setzero_si128();
   3547       __m128i farb  = _mm_loadl_epi64((__m128i *) (in_far + i));
   3548       __m128i nearb = _mm_loadl_epi64((__m128i *) (in_near + i));
   3549       __m128i farw  = _mm_unpacklo_epi8(farb, zero);
   3550       __m128i nearw = _mm_unpacklo_epi8(nearb, zero);
   3551       __m128i diff  = _mm_sub_epi16(farw, nearw);
   3552       __m128i nears = _mm_slli_epi16(nearw, 2);
   3553       __m128i curr  = _mm_add_epi16(nears, diff); // current row
   3554 
   3555       // horizontal filter works the same based on shifted vers of current
   3556       // row. "prev" is current row shifted right by 1 pixel; we need to
   3557       // insert the previous pixel value (from t1).
   3558       // "next" is current row shifted left by 1 pixel, with first pixel
   3559       // of next block of 8 pixels added in.
   3560       __m128i prv0 = _mm_slli_si128(curr, 2);
   3561       __m128i nxt0 = _mm_srli_si128(curr, 2);
   3562       __m128i prev = _mm_insert_epi16(prv0, t1, 0);
   3563       __m128i next = _mm_insert_epi16(nxt0, 3*in_near[i+8] + in_far[i+8], 7);
   3564 
   3565       // horizontal filter, polyphase implementation since it's convenient:
   3566       // even pixels = 3*cur + prev = cur*4 + (prev - cur)
   3567       // odd  pixels = 3*cur + next = cur*4 + (next - cur)
   3568       // note the shared term.
   3569       __m128i bias  = _mm_set1_epi16(8);
   3570       __m128i curs = _mm_slli_epi16(curr, 2);
   3571       __m128i prvd = _mm_sub_epi16(prev, curr);
   3572       __m128i nxtd = _mm_sub_epi16(next, curr);
   3573       __m128i curb = _mm_add_epi16(curs, bias);
   3574       __m128i even = _mm_add_epi16(prvd, curb);
   3575       __m128i odd  = _mm_add_epi16(nxtd, curb);
   3576 
   3577       // interleave even and odd pixels, then undo scaling.
   3578       __m128i int0 = _mm_unpacklo_epi16(even, odd);
   3579       __m128i int1 = _mm_unpackhi_epi16(even, odd);
   3580       __m128i de0  = _mm_srli_epi16(int0, 4);
   3581       __m128i de1  = _mm_srli_epi16(int1, 4);
   3582 
   3583       // pack and write output
   3584       __m128i outv = _mm_packus_epi16(de0, de1);
   3585       _mm_storeu_si128((__m128i *) (out + i*2), outv);
   3586 #elif defined(STBI_NEON)
   3587       // load and perform the vertical filtering pass
   3588       // this uses 3*x + y = 4*x + (y - x)
   3589       uint8x8_t farb  = vld1_u8(in_far + i);
   3590       uint8x8_t nearb = vld1_u8(in_near + i);
   3591       int16x8_t diff  = vreinterpretq_s16_u16(vsubl_u8(farb, nearb));
   3592       int16x8_t nears = vreinterpretq_s16_u16(vshll_n_u8(nearb, 2));
   3593       int16x8_t curr  = vaddq_s16(nears, diff); // current row
   3594 
   3595       // horizontal filter works the same based on shifted vers of current
   3596       // row. "prev" is current row shifted right by 1 pixel; we need to
   3597       // insert the previous pixel value (from t1).
   3598       // "next" is current row shifted left by 1 pixel, with first pixel
   3599       // of next block of 8 pixels added in.
   3600       int16x8_t prv0 = vextq_s16(curr, curr, 7);
   3601       int16x8_t nxt0 = vextq_s16(curr, curr, 1);
   3602       int16x8_t prev = vsetq_lane_s16(t1, prv0, 0);
   3603       int16x8_t next = vsetq_lane_s16(3*in_near[i+8] + in_far[i+8], nxt0, 7);
   3604 
   3605       // horizontal filter, polyphase implementation since it's convenient:
   3606       // even pixels = 3*cur + prev = cur*4 + (prev - cur)
   3607       // odd  pixels = 3*cur + next = cur*4 + (next - cur)
   3608       // note the shared term.
   3609       int16x8_t curs = vshlq_n_s16(curr, 2);
   3610       int16x8_t prvd = vsubq_s16(prev, curr);
   3611       int16x8_t nxtd = vsubq_s16(next, curr);
   3612       int16x8_t even = vaddq_s16(curs, prvd);
   3613       int16x8_t odd  = vaddq_s16(curs, nxtd);
   3614 
   3615       // undo scaling and round, then store with even/odd phases interleaved
   3616       uint8x8x2_t o;
   3617       o.val[0] = vqrshrun_n_s16(even, 4);
   3618       o.val[1] = vqrshrun_n_s16(odd,  4);
   3619       vst2_u8(out + i*2, o);
   3620 #endif
   3621 
   3622       // "previous" value for next iter
   3623       t1 = 3*in_near[i+7] + in_far[i+7];
   3624    }
   3625 
   3626    t0 = t1;
   3627    t1 = 3*in_near[i] + in_far[i];
   3628    out[i*2] = stbi__div16(3*t1 + t0 + 8);
   3629 
   3630    for (++i; i < w; ++i) {
   3631       t0 = t1;
   3632       t1 = 3*in_near[i]+in_far[i];
   3633       out[i*2-1] = stbi__div16(3*t0 + t1 + 8);
   3634       out[i*2  ] = stbi__div16(3*t1 + t0 + 8);
   3635    }
   3636    out[w*2-1] = stbi__div4(t1+2);
   3637 
   3638    STBI_NOTUSED(hs);
   3639 
   3640    return out;
   3641 }
   3642 #endif
   3643 
   3644 static stbi_uc *stbi__resample_row_generic(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
   3645 {
   3646    // resample with nearest-neighbor
   3647    int i,j;
   3648    STBI_NOTUSED(in_far);
   3649    for (i=0; i < w; ++i)
   3650       for (j=0; j < hs; ++j)
   3651          out[i*hs+j] = in_near[i];
   3652    return out;
   3653 }
   3654 
   3655 // this is a reduced-precision calculation of YCbCr-to-RGB introduced
   3656 // to make sure the code produces the same results in both SIMD and scalar
   3657 #define stbi__float2fixed(x)  (((int) ((x) * 4096.0f + 0.5f)) << 8)
   3658 static void stbi__YCbCr_to_RGB_row(stbi_uc *out, const stbi_uc *y, const stbi_uc *pcb, const stbi_uc *pcr, int count, int step)
   3659 {
   3660    int i;
   3661    for (i=0; i < count; ++i) {
   3662       int y_fixed = (y[i] << 20) + (1<<19); // rounding
   3663       int r,g,b;
   3664       int cr = pcr[i] - 128;
   3665       int cb = pcb[i] - 128;
   3666       r = y_fixed +  cr* stbi__float2fixed(1.40200f);
   3667       g = y_fixed + (cr*-stbi__float2fixed(0.71414f)) + ((cb*-stbi__float2fixed(0.34414f)) & 0xffff0000);
   3668       b = y_fixed                                     +   cb* stbi__float2fixed(1.77200f);
   3669       r >>= 20;
   3670       g >>= 20;
   3671       b >>= 20;
   3672       if ((unsigned) r > 255) { if (r < 0) r = 0; else r = 255; }
   3673       if ((unsigned) g > 255) { if (g < 0) g = 0; else g = 255; }
   3674       if ((unsigned) b > 255) { if (b < 0) b = 0; else b = 255; }
   3675       out[0] = (stbi_uc)r;
   3676       out[1] = (stbi_uc)g;
   3677       out[2] = (stbi_uc)b;
   3678       out[3] = 255;
   3679       out += step;
   3680    }
   3681 }
   3682 
   3683 #if defined(STBI_SSE2) || defined(STBI_NEON)
   3684 static void stbi__YCbCr_to_RGB_simd(stbi_uc *out, stbi_uc const *y, stbi_uc const *pcb, stbi_uc const *pcr, int count, int step)
   3685 {
   3686    int i = 0;
   3687 
   3688 #ifdef STBI_SSE2
   3689    // step == 3 is pretty ugly on the final interleave, and i'm not convinced
   3690    // it's useful in practice (you wouldn't use it for textures, for example).
   3691    // so just accelerate step == 4 case.
   3692    if (step == 4) {
   3693       // this is a fairly straightforward implementation and not super-optimized.
   3694       __m128i signflip  = _mm_set1_epi8(-0x80);
   3695       __m128i cr_const0 = _mm_set1_epi16(   (short) ( 1.40200f*4096.0f+0.5f));
   3696       __m128i cr_const1 = _mm_set1_epi16( - (short) ( 0.71414f*4096.0f+0.5f));
   3697       __m128i cb_const0 = _mm_set1_epi16( - (short) ( 0.34414f*4096.0f+0.5f));
   3698       __m128i cb_const1 = _mm_set1_epi16(   (short) ( 1.77200f*4096.0f+0.5f));
   3699       __m128i y_bias = _mm_set1_epi8((char) (unsigned char) 128);
   3700       __m128i xw = _mm_set1_epi16(255); // alpha channel
   3701 
   3702       for (; i+7 < count; i += 8) {
   3703          // load
   3704          __m128i y_bytes = _mm_loadl_epi64((__m128i *) (y+i));
   3705          __m128i cr_bytes = _mm_loadl_epi64((__m128i *) (pcr+i));
   3706          __m128i cb_bytes = _mm_loadl_epi64((__m128i *) (pcb+i));
   3707          __m128i cr_biased = _mm_xor_si128(cr_bytes, signflip); // -128
   3708          __m128i cb_biased = _mm_xor_si128(cb_bytes, signflip); // -128
   3709 
   3710          // unpack to short (and left-shift cr, cb by 8)
   3711          __m128i yw  = _mm_unpacklo_epi8(y_bias, y_bytes);
   3712          __m128i crw = _mm_unpacklo_epi8(_mm_setzero_si128(), cr_biased);
   3713          __m128i cbw = _mm_unpacklo_epi8(_mm_setzero_si128(), cb_biased);
   3714 
   3715          // color transform
   3716          __m128i yws = _mm_srli_epi16(yw, 4);
   3717          __m128i cr0 = _mm_mulhi_epi16(cr_const0, crw);
   3718          __m128i cb0 = _mm_mulhi_epi16(cb_const0, cbw);
   3719          __m128i cb1 = _mm_mulhi_epi16(cbw, cb_const1);
   3720          __m128i cr1 = _mm_mulhi_epi16(crw, cr_const1);
   3721          __m128i rws = _mm_add_epi16(cr0, yws);
   3722          __m128i gwt = _mm_add_epi16(cb0, yws);
   3723          __m128i bws = _mm_add_epi16(yws, cb1);
   3724          __m128i gws = _mm_add_epi16(gwt, cr1);
   3725 
   3726          // descale
   3727          __m128i rw = _mm_srai_epi16(rws, 4);
   3728          __m128i bw = _mm_srai_epi16(bws, 4);
   3729          __m128i gw = _mm_srai_epi16(gws, 4);
   3730 
   3731          // back to byte, set up for transpose
   3732          __m128i brb = _mm_packus_epi16(rw, bw);
   3733          __m128i gxb = _mm_packus_epi16(gw, xw);
   3734 
   3735          // transpose to interleave channels
   3736          __m128i t0 = _mm_unpacklo_epi8(brb, gxb);
   3737          __m128i t1 = _mm_unpackhi_epi8(brb, gxb);
   3738          __m128i o0 = _mm_unpacklo_epi16(t0, t1);
   3739          __m128i o1 = _mm_unpackhi_epi16(t0, t1);
   3740 
   3741          // store
   3742          _mm_storeu_si128((__m128i *) (out + 0), o0);
   3743          _mm_storeu_si128((__m128i *) (out + 16), o1);
   3744          out += 32;
   3745       }
   3746    }
   3747 #endif
   3748 
   3749 #ifdef STBI_NEON
   3750    // in this version, step=3 support would be easy to add. but is there demand?
   3751    if (step == 4) {
   3752       // this is a fairly straightforward implementation and not super-optimized.
   3753       uint8x8_t signflip = vdup_n_u8(0x80);
   3754       int16x8_t cr_const0 = vdupq_n_s16(   (short) ( 1.40200f*4096.0f+0.5f));
   3755       int16x8_t cr_const1 = vdupq_n_s16( - (short) ( 0.71414f*4096.0f+0.5f));
   3756       int16x8_t cb_const0 = vdupq_n_s16( - (short) ( 0.34414f*4096.0f+0.5f));
   3757       int16x8_t cb_const1 = vdupq_n_s16(   (short) ( 1.77200f*4096.0f+0.5f));
   3758 
   3759       for (; i+7 < count; i += 8) {
   3760          // load
   3761          uint8x8_t y_bytes  = vld1_u8(y + i);
   3762          uint8x8_t cr_bytes = vld1_u8(pcr + i);
   3763          uint8x8_t cb_bytes = vld1_u8(pcb + i);
   3764          int8x8_t cr_biased = vreinterpret_s8_u8(vsub_u8(cr_bytes, signflip));
   3765          int8x8_t cb_biased = vreinterpret_s8_u8(vsub_u8(cb_bytes, signflip));
   3766 
   3767          // expand to s16
   3768          int16x8_t yws = vreinterpretq_s16_u16(vshll_n_u8(y_bytes, 4));
   3769          int16x8_t crw = vshll_n_s8(cr_biased, 7);
   3770          int16x8_t cbw = vshll_n_s8(cb_biased, 7);
   3771 
   3772          // color transform
   3773          int16x8_t cr0 = vqdmulhq_s16(crw, cr_const0);
   3774          int16x8_t cb0 = vqdmulhq_s16(cbw, cb_const0);
   3775          int16x8_t cr1 = vqdmulhq_s16(crw, cr_const1);
   3776          int16x8_t cb1 = vqdmulhq_s16(cbw, cb_const1);
   3777          int16x8_t rws = vaddq_s16(yws, cr0);
   3778          int16x8_t gws = vaddq_s16(vaddq_s16(yws, cb0), cr1);
   3779          int16x8_t bws = vaddq_s16(yws, cb1);
   3780 
   3781          // undo scaling, round, convert to byte
   3782          uint8x8x4_t o;
   3783          o.val[0] = vqrshrun_n_s16(rws, 4);
   3784          o.val[1] = vqrshrun_n_s16(gws, 4);
   3785          o.val[2] = vqrshrun_n_s16(bws, 4);
   3786          o.val[3] = vdup_n_u8(255);
   3787 
   3788          // store, interleaving r/g/b/a
   3789          vst4_u8(out, o);
   3790          out += 8*4;
   3791       }
   3792    }
   3793 #endif
   3794 
   3795    for (; i < count; ++i) {
   3796       int y_fixed = (y[i] << 20) + (1<<19); // rounding
   3797       int r,g,b;
   3798       int cr = pcr[i] - 128;
   3799       int cb = pcb[i] - 128;
   3800       r = y_fixed + cr* stbi__float2fixed(1.40200f);
   3801       g = y_fixed + cr*-stbi__float2fixed(0.71414f) + ((cb*-stbi__float2fixed(0.34414f)) & 0xffff0000);
   3802       b = y_fixed                                   +   cb* stbi__float2fixed(1.77200f);
   3803       r >>= 20;
   3804       g >>= 20;
   3805       b >>= 20;
   3806       if ((unsigned) r > 255) { if (r < 0) r = 0; else r = 255; }
   3807       if ((unsigned) g > 255) { if (g < 0) g = 0; else g = 255; }
   3808       if ((unsigned) b > 255) { if (b < 0) b = 0; else b = 255; }
   3809       out[0] = (stbi_uc)r;
   3810       out[1] = (stbi_uc)g;
   3811       out[2] = (stbi_uc)b;
   3812       out[3] = 255;
   3813       out += step;
   3814    }
   3815 }
   3816 #endif
   3817 
   3818 // set up the kernels
   3819 static void stbi__setup_jpeg(stbi__jpeg *j)
   3820 {
   3821    j->idct_block_kernel = stbi__idct_block;
   3822    j->YCbCr_to_RGB_kernel = stbi__YCbCr_to_RGB_row;
   3823    j->resample_row_hv_2_kernel = stbi__resample_row_hv_2;
   3824 
   3825 #ifdef STBI_SSE2
   3826    if (stbi__sse2_available()) {
   3827       j->idct_block_kernel = stbi__idct_simd;
   3828       j->YCbCr_to_RGB_kernel = stbi__YCbCr_to_RGB_simd;
   3829       j->resample_row_hv_2_kernel = stbi__resample_row_hv_2_simd;
   3830    }
   3831 #endif
   3832 
   3833 #ifdef STBI_NEON
   3834    j->idct_block_kernel = stbi__idct_simd;
   3835    j->YCbCr_to_RGB_kernel = stbi__YCbCr_to_RGB_simd;
   3836    j->resample_row_hv_2_kernel = stbi__resample_row_hv_2_simd;
   3837 #endif
   3838 }
   3839 
   3840 // clean up the temporary component buffers
   3841 static void stbi__cleanup_jpeg(stbi__jpeg *j)
   3842 {
   3843    stbi__free_jpeg_components(j, j->s->img_n, 0);
   3844 }
   3845 
   3846 typedef struct
   3847 {
   3848    resample_row_func resample;
   3849    stbi_uc *line0,*line1;
   3850    int hs,vs;   // expansion factor in each axis
   3851    int w_lores; // horizontal pixels pre-expansion
   3852    int ystep;   // how far through vertical expansion we are
   3853    int ypos;    // which pre-expansion row we're on
   3854 } stbi__resample;
   3855 
   3856 // fast 0..255 * 0..255 => 0..255 rounded multiplication
   3857 static stbi_uc stbi__blinn_8x8(stbi_uc x, stbi_uc y)
   3858 {
   3859    unsigned int t = x*y + 128;
   3860    return (stbi_uc) ((t + (t >>8)) >> 8);
   3861 }
   3862 
   3863 static stbi_uc *load_jpeg_image(stbi__jpeg *z, int *out_x, int *out_y, int *comp, int req_comp)
   3864 {
   3865    int n, decode_n, is_rgb;
   3866    z->s->img_n = 0; // make stbi__cleanup_jpeg safe
   3867 
   3868    // validate req_comp
   3869    if (req_comp < 0 || req_comp > 4) return stbi__errpuc("bad req_comp", "Internal error");
   3870 
   3871    // load a jpeg image from whichever source, but leave in YCbCr format
   3872    if (!stbi__decode_jpeg_image(z)) { stbi__cleanup_jpeg(z); return NULL; }
   3873 
   3874    // determine actual number of components to generate
   3875    n = req_comp ? req_comp : z->s->img_n >= 3 ? 3 : 1;
   3876 
   3877    is_rgb = z->s->img_n == 3 && (z->rgb == 3 || (z->app14_color_transform == 0 && !z->jfif));
   3878 
   3879    if (z->s->img_n == 3 && n < 3 && !is_rgb)
   3880       decode_n = 1;
   3881    else
   3882       decode_n = z->s->img_n;
   3883 
   3884    // nothing to do if no components requested; check this now to avoid
   3885    // accessing uninitialized coutput[0] later
   3886    if (decode_n <= 0) { stbi__cleanup_jpeg(z); return NULL; }
   3887 
   3888    // resample and color-convert
   3889    {
   3890       int k;
   3891       unsigned int i,j;
   3892       stbi_uc *output;
   3893       stbi_uc *coutput[4] = { NULL, NULL, NULL, NULL };
   3894 
   3895       stbi__resample res_comp[4];
   3896 
   3897       for (k=0; k < decode_n; ++k) {
   3898          stbi__resample *r = &res_comp[k];
   3899 
   3900          // allocate line buffer big enough for upsampling off the edges
   3901          // with upsample factor of 4
   3902          z->img_comp[k].linebuf = (stbi_uc *) stbi__malloc(z->s->img_x + 3);
   3903          if (!z->img_comp[k].linebuf) { stbi__cleanup_jpeg(z); return stbi__errpuc("outofmem", "Out of memory"); }
   3904 
   3905          r->hs      = z->img_h_max / z->img_comp[k].h;
   3906          r->vs      = z->img_v_max / z->img_comp[k].v;
   3907          r->ystep   = r->vs >> 1;
   3908          r->w_lores = (z->s->img_x + r->hs-1) / r->hs;
   3909          r->ypos    = 0;
   3910          r->line0   = r->line1 = z->img_comp[k].data;
   3911 
   3912          if      (r->hs == 1 && r->vs == 1) r->resample = resample_row_1;
   3913          else if (r->hs == 1 && r->vs == 2) r->resample = stbi__resample_row_v_2;
   3914          else if (r->hs == 2 && r->vs == 1) r->resample = stbi__resample_row_h_2;
   3915          else if (r->hs == 2 && r->vs == 2) r->resample = z->resample_row_hv_2_kernel;
   3916          else                               r->resample = stbi__resample_row_generic;
   3917       }
   3918 
   3919       // can't error after this so, this is safe
   3920       output = (stbi_uc *) stbi__malloc_mad3(n, z->s->img_x, z->s->img_y, 1);
   3921       if (!output) { stbi__cleanup_jpeg(z); return stbi__errpuc("outofmem", "Out of memory"); }
   3922 
   3923       // now go ahead and resample
   3924       for (j=0; j < z->s->img_y; ++j) {
   3925          stbi_uc *out = output + n * z->s->img_x * j;
   3926          for (k=0; k < decode_n; ++k) {
   3927             stbi__resample *r = &res_comp[k];
   3928             int y_bot = r->ystep >= (r->vs >> 1);
   3929             coutput[k] = r->resample(z->img_comp[k].linebuf,
   3930                                      y_bot ? r->line1 : r->line0,
   3931                                      y_bot ? r->line0 : r->line1,
   3932                                      r->w_lores, r->hs);
   3933             if (++r->ystep >= r->vs) {
   3934                r->ystep = 0;
   3935                r->line0 = r->line1;
   3936                if (++r->ypos < z->img_comp[k].y)
   3937                   r->line1 += z->img_comp[k].w2;
   3938             }
   3939          }
   3940          if (n >= 3) {
   3941             stbi_uc *y = coutput[0];
   3942             if (z->s->img_n == 3) {
   3943                if (is_rgb) {
   3944                   for (i=0; i < z->s->img_x; ++i) {
   3945                      out[0] = y[i];
   3946                      out[1] = coutput[1][i];
   3947                      out[2] = coutput[2][i];
   3948                      out[3] = 255;
   3949                      out += n;
   3950                   }
   3951                } else {
   3952                   z->YCbCr_to_RGB_kernel(out, y, coutput[1], coutput[2], z->s->img_x, n);
   3953                }
   3954             } else if (z->s->img_n == 4) {
   3955                if (z->app14_color_transform == 0) { // CMYK
   3956                   for (i=0; i < z->s->img_x; ++i) {
   3957                      stbi_uc m = coutput[3][i];
   3958                      out[0] = stbi__blinn_8x8(coutput[0][i], m);
   3959                      out[1] = stbi__blinn_8x8(coutput[1][i], m);
   3960                      out[2] = stbi__blinn_8x8(coutput[2][i], m);
   3961                      out[3] = 255;
   3962                      out += n;
   3963                   }
   3964                } else if (z->app14_color_transform == 2) { // YCCK
   3965                   z->YCbCr_to_RGB_kernel(out, y, coutput[1], coutput[2], z->s->img_x, n);
   3966                   for (i=0; i < z->s->img_x; ++i) {
   3967                      stbi_uc m = coutput[3][i];
   3968                      out[0] = stbi__blinn_8x8(255 - out[0], m);
   3969                      out[1] = stbi__blinn_8x8(255 - out[1], m);
   3970                      out[2] = stbi__blinn_8x8(255 - out[2], m);
   3971                      out += n;
   3972                   }
   3973                } else { // YCbCr + alpha?  Ignore the fourth channel for now
   3974                   z->YCbCr_to_RGB_kernel(out, y, coutput[1], coutput[2], z->s->img_x, n);
   3975                }
   3976             } else
   3977                for (i=0; i < z->s->img_x; ++i) {
   3978                   out[0] = out[1] = out[2] = y[i];
   3979                   out[3] = 255; // not used if n==3
   3980                   out += n;
   3981                }
   3982          } else {
   3983             if (is_rgb) {
   3984                if (n == 1)
   3985                   for (i=0; i < z->s->img_x; ++i)
   3986                      *out++ = stbi__compute_y(coutput[0][i], coutput[1][i], coutput[2][i]);
   3987                else {
   3988                   for (i=0; i < z->s->img_x; ++i, out += 2) {
   3989                      out[0] = stbi__compute_y(coutput[0][i], coutput[1][i], coutput[2][i]);
   3990                      out[1] = 255;
   3991                   }
   3992                }
   3993             } else if (z->s->img_n == 4 && z->app14_color_transform == 0) {
   3994                for (i=0; i < z->s->img_x; ++i) {
   3995                   stbi_uc m = coutput[3][i];
   3996                   stbi_uc r = stbi__blinn_8x8(coutput[0][i], m);
   3997                   stbi_uc g = stbi__blinn_8x8(coutput[1][i], m);
   3998                   stbi_uc b = stbi__blinn_8x8(coutput[2][i], m);
   3999                   out[0] = stbi__compute_y(r, g, b);
   4000                   out[1] = 255;
   4001                   out += n;
   4002                }
   4003             } else if (z->s->img_n == 4 && z->app14_color_transform == 2) {
   4004                for (i=0; i < z->s->img_x; ++i) {
   4005                   out[0] = stbi__blinn_8x8(255 - coutput[0][i], coutput[3][i]);
   4006                   out[1] = 255;
   4007                   out += n;
   4008                }
   4009             } else {
   4010                stbi_uc *y = coutput[0];
   4011                if (n == 1)
   4012                   for (i=0; i < z->s->img_x; ++i) out[i] = y[i];
   4013                else
   4014                   for (i=0; i < z->s->img_x; ++i) { *out++ = y[i]; *out++ = 255; }
   4015             }
   4016          }
   4017       }
   4018       stbi__cleanup_jpeg(z);
   4019       *out_x = z->s->img_x;
   4020       *out_y = z->s->img_y;
   4021       if (comp) *comp = z->s->img_n >= 3 ? 3 : 1; // report original components, not output
   4022       return output;
   4023    }
   4024 }
   4025 
   4026 static void *stbi__jpeg_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
   4027 {
   4028    unsigned char* result;
   4029    stbi__jpeg* j = (stbi__jpeg*) stbi__malloc(sizeof(stbi__jpeg));
   4030    if (!j) return stbi__errpuc("outofmem", "Out of memory");
   4031    memset(j, 0, sizeof(stbi__jpeg));
   4032    STBI_NOTUSED(ri);
   4033    j->s = s;
   4034    stbi__setup_jpeg(j);
   4035    result = load_jpeg_image(j, x,y,comp,req_comp);
   4036    STBI_FREE(j);
   4037    return result;
   4038 }
   4039 
   4040 static int stbi__jpeg_test(stbi__context *s)
   4041 {
   4042    int r;
   4043    stbi__jpeg* j = (stbi__jpeg*)stbi__malloc(sizeof(stbi__jpeg));
   4044    if (!j) return stbi__err("outofmem", "Out of memory");
   4045    memset(j, 0, sizeof(stbi__jpeg));
   4046    j->s = s;
   4047    stbi__setup_jpeg(j);
   4048    r = stbi__decode_jpeg_header(j, STBI__SCAN_type);
   4049    stbi__rewind(s);
   4050    STBI_FREE(j);
   4051    return r;
   4052 }
   4053 
   4054 static int stbi__jpeg_info_raw(stbi__jpeg *j, int *x, int *y, int *comp)
   4055 {
   4056    if (!stbi__decode_jpeg_header(j, STBI__SCAN_header)) {
   4057       stbi__rewind( j->s );
   4058       return 0;
   4059    }
   4060    if (x) *x = j->s->img_x;
   4061    if (y) *y = j->s->img_y;
   4062    if (comp) *comp = j->s->img_n >= 3 ? 3 : 1;
   4063    return 1;
   4064 }
   4065 
   4066 static int stbi__jpeg_info(stbi__context *s, int *x, int *y, int *comp)
   4067 {
   4068    int result;
   4069    stbi__jpeg* j = (stbi__jpeg*) (stbi__malloc(sizeof(stbi__jpeg)));
   4070    if (!j) return stbi__err("outofmem", "Out of memory");
   4071    memset(j, 0, sizeof(stbi__jpeg));
   4072    j->s = s;
   4073    result = stbi__jpeg_info_raw(j, x, y, comp);
   4074    STBI_FREE(j);
   4075    return result;
   4076 }
   4077 #endif
   4078 
   4079 // public domain zlib decode    v0.2  Sean Barrett 2006-11-18
   4080 //    simple implementation
   4081 //      - all input must be provided in an upfront buffer
   4082 //      - all output is written to a single output buffer (can malloc/realloc)
   4083 //    performance
   4084 //      - fast huffman
   4085 
   4086 #ifndef STBI_NO_ZLIB
   4087 
   4088 // fast-way is faster to check than jpeg huffman, but slow way is slower
   4089 #define STBI__ZFAST_BITS  9 // accelerate all cases in default tables
   4090 #define STBI__ZFAST_MASK  ((1 << STBI__ZFAST_BITS) - 1)
   4091 #define STBI__ZNSYMS 288 // number of symbols in literal/length alphabet
   4092 
   4093 // zlib-style huffman encoding
   4094 // (jpegs packs from left, zlib from right, so can't share code)
   4095 typedef struct
   4096 {
   4097    stbi__uint16 fast[1 << STBI__ZFAST_BITS];
   4098    stbi__uint16 firstcode[16];
   4099    int maxcode[17];
   4100    stbi__uint16 firstsymbol[16];
   4101    stbi_uc  size[STBI__ZNSYMS];
   4102    stbi__uint16 value[STBI__ZNSYMS];
   4103 } stbi__zhuffman;
   4104 
   4105 stbi_inline static int stbi__bitreverse16(int n)
   4106 {
   4107   n = ((n & 0xAAAA) >>  1) | ((n & 0x5555) << 1);
   4108   n = ((n & 0xCCCC) >>  2) | ((n & 0x3333) << 2);
   4109   n = ((n & 0xF0F0) >>  4) | ((n & 0x0F0F) << 4);
   4110   n = ((n & 0xFF00) >>  8) | ((n & 0x00FF) << 8);
   4111   return n;
   4112 }
   4113 
   4114 stbi_inline static int stbi__bit_reverse(int v, int bits)
   4115 {
   4116    STBI_ASSERT(bits <= 16);
   4117    // to bit reverse n bits, reverse 16 and shift
   4118    // e.g. 11 bits, bit reverse and shift away 5
   4119    return stbi__bitreverse16(v) >> (16-bits);
   4120 }
   4121 
   4122 static int stbi__zbuild_huffman(stbi__zhuffman *z, const stbi_uc *sizelist, int num)
   4123 {
   4124    int i,k=0;
   4125    int code, next_code[16], sizes[17];
   4126 
   4127    // DEFLATE spec for generating codes
   4128    memset(sizes, 0, sizeof(sizes));
   4129    memset(z->fast, 0, sizeof(z->fast));
   4130    for (i=0; i < num; ++i)
   4131       ++sizes[sizelist[i]];
   4132    sizes[0] = 0;
   4133    for (i=1; i < 16; ++i)
   4134       if (sizes[i] > (1 << i))
   4135          return stbi__err("bad sizes", "Corrupt PNG");
   4136    code = 0;
   4137    for (i=1; i < 16; ++i) {
   4138       next_code[i] = code;
   4139       z->firstcode[i] = (stbi__uint16) code;
   4140       z->firstsymbol[i] = (stbi__uint16) k;
   4141       code = (code + sizes[i]);
   4142       if (sizes[i])
   4143          if (code-1 >= (1 << i)) return stbi__err("bad codelengths","Corrupt PNG");
   4144       z->maxcode[i] = code << (16-i); // preshift for inner loop
   4145       code <<= 1;
   4146       k += sizes[i];
   4147    }
   4148    z->maxcode[16] = 0x10000; // sentinel
   4149    for (i=0; i < num; ++i) {
   4150       int s = sizelist[i];
   4151       if (s) {
   4152          int c = next_code[s] - z->firstcode[s] + z->firstsymbol[s];
   4153          stbi__uint16 fastv = (stbi__uint16) ((s << 9) | i);
   4154          z->size [c] = (stbi_uc     ) s;
   4155          z->value[c] = (stbi__uint16) i;
   4156          if (s <= STBI__ZFAST_BITS) {
   4157             int j = stbi__bit_reverse(next_code[s],s);
   4158             while (j < (1 << STBI__ZFAST_BITS)) {
   4159                z->fast[j] = fastv;
   4160                j += (1 << s);
   4161             }
   4162          }
   4163          ++next_code[s];
   4164       }
   4165    }
   4166    return 1;
   4167 }
   4168 
   4169 // zlib-from-memory implementation for PNG reading
   4170 //    because PNG allows splitting the zlib stream arbitrarily,
   4171 //    and it's annoying structurally to have PNG call ZLIB call PNG,
   4172 //    we require PNG read all the IDATs and combine them into a single
   4173 //    memory buffer
   4174 
   4175 typedef struct
   4176 {
   4177    stbi_uc *zbuffer, *zbuffer_end;
   4178    int num_bits;
   4179    stbi__uint32 code_buffer;
   4180 
   4181    char *zout;
   4182    char *zout_start;
   4183    char *zout_end;
   4184    int   z_expandable;
   4185 
   4186    stbi__zhuffman z_length, z_distance;
   4187 } stbi__zbuf;
   4188 
   4189 stbi_inline static int stbi__zeof(stbi__zbuf *z)
   4190 {
   4191    return (z->zbuffer >= z->zbuffer_end);
   4192 }
   4193 
   4194 stbi_inline static stbi_uc stbi__zget8(stbi__zbuf *z)
   4195 {
   4196    return stbi__zeof(z) ? 0 : *z->zbuffer++;
   4197 }
   4198 
   4199 static void stbi__fill_bits(stbi__zbuf *z)
   4200 {
   4201    do {
   4202       if (z->code_buffer >= (1U << z->num_bits)) {
   4203         z->zbuffer = z->zbuffer_end;  /* treat this as EOF so we fail. */
   4204         return;
   4205       }
   4206       z->code_buffer |= (unsigned int) stbi__zget8(z) << z->num_bits;
   4207       z->num_bits += 8;
   4208    } while (z->num_bits <= 24);
   4209 }
   4210 
   4211 stbi_inline static unsigned int stbi__zreceive(stbi__zbuf *z, int n)
   4212 {
   4213    unsigned int k;
   4214    if (z->num_bits < n) stbi__fill_bits(z);
   4215    k = z->code_buffer & ((1 << n) - 1);
   4216    z->code_buffer >>= n;
   4217    z->num_bits -= n;
   4218    return k;
   4219 }
   4220 
   4221 static int stbi__zhuffman_decode_slowpath(stbi__zbuf *a, stbi__zhuffman *z)
   4222 {
   4223    int b,s,k;
   4224    // not resolved by fast table, so compute it the slow way
   4225    // use jpeg approach, which requires MSbits at top
   4226    k = stbi__bit_reverse(a->code_buffer, 16);
   4227    for (s=STBI__ZFAST_BITS+1; ; ++s)
   4228       if (k < z->maxcode[s])
   4229          break;
   4230    if (s >= 16) return -1; // invalid code!
   4231    // code size is s, so:
   4232    b = (k >> (16-s)) - z->firstcode[s] + z->firstsymbol[s];
   4233    if (b >= STBI__ZNSYMS) return -1; // some data was corrupt somewhere!
   4234    if (z->size[b] != s) return -1;  // was originally an assert, but report failure instead.
   4235    a->code_buffer >>= s;
   4236    a->num_bits -= s;
   4237    return z->value[b];
   4238 }
   4239 
   4240 stbi_inline static int stbi__zhuffman_decode(stbi__zbuf *a, stbi__zhuffman *z)
   4241 {
   4242    int b,s;
   4243    if (a->num_bits < 16) {
   4244       if (stbi__zeof(a)) {
   4245          return -1;   /* report error for unexpected end of data. */
   4246       }
   4247       stbi__fill_bits(a);
   4248    }
   4249    b = z->fast[a->code_buffer & STBI__ZFAST_MASK];
   4250    if (b) {
   4251       s = b >> 9;
   4252       a->code_buffer >>= s;
   4253       a->num_bits -= s;
   4254       return b & 511;
   4255    }
   4256    return stbi__zhuffman_decode_slowpath(a, z);
   4257 }
   4258 
   4259 static int stbi__zexpand(stbi__zbuf *z, char *zout, int n)  // need to make room for n bytes
   4260 {
   4261    char *q;
   4262    unsigned int cur, limit, old_limit;
   4263    z->zout = zout;
   4264    if (!z->z_expandable) return stbi__err("output buffer limit","Corrupt PNG");
   4265    cur   = (unsigned int) (z->zout - z->zout_start);
   4266    limit = old_limit = (unsigned) (z->zout_end - z->zout_start);
   4267    if (UINT_MAX - cur < (unsigned) n) return stbi__err("outofmem", "Out of memory");
   4268    while (cur + n > limit) {
   4269       if(limit > UINT_MAX / 2) return stbi__err("outofmem", "Out of memory");
   4270       limit *= 2;
   4271    }
   4272    q = (char *) STBI_REALLOC_SIZED(z->zout_start, old_limit, limit);
   4273    STBI_NOTUSED(old_limit);
   4274    if (q == NULL) return stbi__err("outofmem", "Out of memory");
   4275    z->zout_start = q;
   4276    z->zout       = q + cur;
   4277    z->zout_end   = q + limit;
   4278    return 1;
   4279 }
   4280 
   4281 static const int stbi__zlength_base[31] = {
   4282    3,4,5,6,7,8,9,10,11,13,
   4283    15,17,19,23,27,31,35,43,51,59,
   4284    67,83,99,115,131,163,195,227,258,0,0 };
   4285 
   4286 static const int stbi__zlength_extra[31]=
   4287 { 0,0,0,0,0,0,0,0,1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,5,5,5,5,0,0,0 };
   4288 
   4289 static const int stbi__zdist_base[32] = { 1,2,3,4,5,7,9,13,17,25,33,49,65,97,129,193,
   4290 257,385,513,769,1025,1537,2049,3073,4097,6145,8193,12289,16385,24577,0,0};
   4291 
   4292 static const int stbi__zdist_extra[32] =
   4293 { 0,0,0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9,10,10,11,11,12,12,13,13};
   4294 
   4295 static int stbi__parse_huffman_block(stbi__zbuf *a)
   4296 {
   4297    char *zout = a->zout;
   4298    for(;;) {
   4299       int z = stbi__zhuffman_decode(a, &a->z_length);
   4300       if (z < 256) {
   4301          if (z < 0) return stbi__err("bad huffman code","Corrupt PNG"); // error in huffman codes
   4302          if (zout >= a->zout_end) {
   4303             if (!stbi__zexpand(a, zout, 1)) return 0;
   4304             zout = a->zout;
   4305          }
   4306          *zout++ = (char) z;
   4307       } else {
   4308          stbi_uc *p;
   4309          int len,dist;
   4310          if (z == 256) {
   4311             a->zout = zout;
   4312             return 1;
   4313          }
   4314          if (z >= 286) return stbi__err("bad huffman code","Corrupt PNG"); // per DEFLATE, length codes 286 and 287 must not appear in compressed data
   4315          z -= 257;
   4316          len = stbi__zlength_base[z];
   4317          if (stbi__zlength_extra[z]) len += stbi__zreceive(a, stbi__zlength_extra[z]);
   4318          z = stbi__zhuffman_decode(a, &a->z_distance);
   4319          if (z < 0 || z >= 30) return stbi__err("bad huffman code","Corrupt PNG"); // per DEFLATE, distance codes 30 and 31 must not appear in compressed data
   4320          dist = stbi__zdist_base[z];
   4321          if (stbi__zdist_extra[z]) dist += stbi__zreceive(a, stbi__zdist_extra[z]);
   4322          if (zout - a->zout_start < dist) return stbi__err("bad dist","Corrupt PNG");
   4323          if (zout + len > a->zout_end) {
   4324             if (!stbi__zexpand(a, zout, len)) return 0;
   4325             zout = a->zout;
   4326          }
   4327          p = (stbi_uc *) (zout - dist);
   4328          if (dist == 1) { // run of one byte; common in images.
   4329             stbi_uc v = *p;
   4330             if (len) { do *zout++ = v; while (--len); }
   4331          } else {
   4332             if (len) { do *zout++ = *p++; while (--len); }
   4333          }
   4334       }
   4335    }
   4336 }
   4337 
   4338 static int stbi__compute_huffman_codes(stbi__zbuf *a)
   4339 {
   4340    static const stbi_uc length_dezigzag[19] = { 16,17,18,0,8,7,9,6,10,5,11,4,12,3,13,2,14,1,15 };
   4341    stbi__zhuffman z_codelength;
   4342    stbi_uc lencodes[286+32+137];//padding for maximum single op
   4343    stbi_uc codelength_sizes[19];
   4344    int i,n;
   4345 
   4346    int hlit  = stbi__zreceive(a,5) + 257;
   4347    int hdist = stbi__zreceive(a,5) + 1;
   4348    int hclen = stbi__zreceive(a,4) + 4;
   4349    int ntot  = hlit + hdist;
   4350 
   4351    memset(codelength_sizes, 0, sizeof(codelength_sizes));
   4352    for (i=0; i < hclen; ++i) {
   4353       int s = stbi__zreceive(a,3);
   4354       codelength_sizes[length_dezigzag[i]] = (stbi_uc) s;
   4355    }
   4356    if (!stbi__zbuild_huffman(&z_codelength, codelength_sizes, 19)) return 0;
   4357 
   4358    n = 0;
   4359    while (n < ntot) {
   4360       int c = stbi__zhuffman_decode(a, &z_codelength);
   4361       if (c < 0 || c >= 19) return stbi__err("bad codelengths", "Corrupt PNG");
   4362       if (c < 16)
   4363          lencodes[n++] = (stbi_uc) c;
   4364       else {
   4365          stbi_uc fill = 0;
   4366          if (c == 16) {
   4367             c = stbi__zreceive(a,2)+3;
   4368             if (n == 0) return stbi__err("bad codelengths", "Corrupt PNG");
   4369             fill = lencodes[n-1];
   4370          } else if (c == 17) {
   4371             c = stbi__zreceive(a,3)+3;
   4372          } else if (c == 18) {
   4373             c = stbi__zreceive(a,7)+11;
   4374          } else {
   4375             return stbi__err("bad codelengths", "Corrupt PNG");
   4376          }
   4377          if (ntot - n < c) return stbi__err("bad codelengths", "Corrupt PNG");
   4378          memset(lencodes+n, fill, c);
   4379          n += c;
   4380       }
   4381    }
   4382    if (n != ntot) return stbi__err("bad codelengths","Corrupt PNG");
   4383    if (!stbi__zbuild_huffman(&a->z_length, lencodes, hlit)) return 0;
   4384    if (!stbi__zbuild_huffman(&a->z_distance, lencodes+hlit, hdist)) return 0;
   4385    return 1;
   4386 }
   4387 
   4388 static int stbi__parse_uncompressed_block(stbi__zbuf *a)
   4389 {
   4390    stbi_uc header[4];
   4391    int len,nlen,k;
   4392    if (a->num_bits & 7)
   4393       stbi__zreceive(a, a->num_bits & 7); // discard
   4394    // drain the bit-packed data into header
   4395    k = 0;
   4396    while (a->num_bits > 0) {
   4397       header[k++] = (stbi_uc) (a->code_buffer & 255); // suppress MSVC run-time check
   4398       a->code_buffer >>= 8;
   4399       a->num_bits -= 8;
   4400    }
   4401    if (a->num_bits < 0) return stbi__err("zlib corrupt","Corrupt PNG");
   4402    // now fill header the normal way
   4403    while (k < 4)
   4404       header[k++] = stbi__zget8(a);
   4405    len  = header[1] * 256 + header[0];
   4406    nlen = header[3] * 256 + header[2];
   4407    if (nlen != (len ^ 0xffff)) return stbi__err("zlib corrupt","Corrupt PNG");
   4408    if (a->zbuffer + len > a->zbuffer_end) return stbi__err("read past buffer","Corrupt PNG");
   4409    if (a->zout + len > a->zout_end)
   4410       if (!stbi__zexpand(a, a->zout, len)) return 0;
   4411    memcpy(a->zout, a->zbuffer, len);
   4412    a->zbuffer += len;
   4413    a->zout += len;
   4414    return 1;
   4415 }
   4416 
   4417 static int stbi__parse_zlib_header(stbi__zbuf *a)
   4418 {
   4419    int cmf   = stbi__zget8(a);
   4420    int cm    = cmf & 15;
   4421    /* int cinfo = cmf >> 4; */
   4422    int flg   = stbi__zget8(a);
   4423    if (stbi__zeof(a)) return stbi__err("bad zlib header","Corrupt PNG"); // zlib spec
   4424    if ((cmf*256+flg) % 31 != 0) return stbi__err("bad zlib header","Corrupt PNG"); // zlib spec
   4425    if (flg & 32) return stbi__err("no preset dict","Corrupt PNG"); // preset dictionary not allowed in png
   4426    if (cm != 8) return stbi__err("bad compression","Corrupt PNG"); // DEFLATE required for png
   4427    // window = 1 << (8 + cinfo)... but who cares, we fully buffer output
   4428    return 1;
   4429 }
   4430 
   4431 static const stbi_uc stbi__zdefault_length[STBI__ZNSYMS] =
   4432 {
   4433    8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8, 8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,
   4434    8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8, 8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,
   4435    8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8, 8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,
   4436    8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8, 8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,
   4437    8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8, 9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,
   4438    9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9, 9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,
   4439    9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9, 9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,
   4440    9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9, 9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,
   4441    7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7, 7,7,7,7,7,7,7,7,8,8,8,8,8,8,8,8
   4442 };
   4443 static const stbi_uc stbi__zdefault_distance[32] =
   4444 {
   4445    5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5
   4446 };
   4447 /*
   4448 Init algorithm:
   4449 {
   4450    int i;   // use <= to match clearly with spec
   4451    for (i=0; i <= 143; ++i)     stbi__zdefault_length[i]   = 8;
   4452    for (   ; i <= 255; ++i)     stbi__zdefault_length[i]   = 9;
   4453    for (   ; i <= 279; ++i)     stbi__zdefault_length[i]   = 7;
   4454    for (   ; i <= 287; ++i)     stbi__zdefault_length[i]   = 8;
   4455 
   4456    for (i=0; i <=  31; ++i)     stbi__zdefault_distance[i] = 5;
   4457 }
   4458 */
   4459 
   4460 static int stbi__parse_zlib(stbi__zbuf *a, int parse_header)
   4461 {
   4462    int final, type;
   4463    if (parse_header)
   4464       if (!stbi__parse_zlib_header(a)) return 0;
   4465    a->num_bits = 0;
   4466    a->code_buffer = 0;
   4467    do {
   4468       final = stbi__zreceive(a,1);
   4469       type = stbi__zreceive(a,2);
   4470       if (type == 0) {
   4471          if (!stbi__parse_uncompressed_block(a)) return 0;
   4472       } else if (type == 3) {
   4473          return 0;
   4474       } else {
   4475          if (type == 1) {
   4476             // use fixed code lengths
   4477             if (!stbi__zbuild_huffman(&a->z_length  , stbi__zdefault_length  , STBI__ZNSYMS)) return 0;
   4478             if (!stbi__zbuild_huffman(&a->z_distance, stbi__zdefault_distance,  32)) return 0;
   4479          } else {
   4480             if (!stbi__compute_huffman_codes(a)) return 0;
   4481          }
   4482          if (!stbi__parse_huffman_block(a)) return 0;
   4483       }
   4484    } while (!final);
   4485    return 1;
   4486 }
   4487 
   4488 static int stbi__do_zlib(stbi__zbuf *a, char *obuf, int olen, int exp, int parse_header)
   4489 {
   4490    a->zout_start = obuf;
   4491    a->zout       = obuf;
   4492    a->zout_end   = obuf + olen;
   4493    a->z_expandable = exp;
   4494 
   4495    return stbi__parse_zlib(a, parse_header);
   4496 }
   4497 
   4498 STBIDEF char *stbi_zlib_decode_malloc_guesssize(const char *buffer, int len, int initial_size, int *outlen)
   4499 {
   4500    stbi__zbuf a;
   4501    char *p = (char *) stbi__malloc(initial_size);
   4502    if (p == NULL) return NULL;
   4503    a.zbuffer = (stbi_uc *) buffer;
   4504    a.zbuffer_end = (stbi_uc *) buffer + len;
   4505    if (stbi__do_zlib(&a, p, initial_size, 1, 1)) {
   4506       if (outlen) *outlen = (int) (a.zout - a.zout_start);
   4507       return a.zout_start;
   4508    } else {
   4509       STBI_FREE(a.zout_start);
   4510       return NULL;
   4511    }
   4512 }
   4513 
   4514 STBIDEF char *stbi_zlib_decode_malloc(char const *buffer, int len, int *outlen)
   4515 {
   4516    return stbi_zlib_decode_malloc_guesssize(buffer, len, 16384, outlen);
   4517 }
   4518 
   4519 STBIDEF char *stbi_zlib_decode_malloc_guesssize_headerflag(const char *buffer, int len, int initial_size, int *outlen, int parse_header)
   4520 {
   4521    stbi__zbuf a;
   4522    char *p = (char *) stbi__malloc(initial_size);
   4523    if (p == NULL) return NULL;
   4524    a.zbuffer = (stbi_uc *) buffer;
   4525    a.zbuffer_end = (stbi_uc *) buffer + len;
   4526    if (stbi__do_zlib(&a, p, initial_size, 1, parse_header)) {
   4527       if (outlen) *outlen = (int) (a.zout - a.zout_start);
   4528       return a.zout_start;
   4529    } else {
   4530       STBI_FREE(a.zout_start);
   4531       return NULL;
   4532    }
   4533 }
   4534 
   4535 STBIDEF int stbi_zlib_decode_buffer(char *obuffer, int olen, char const *ibuffer, int ilen)
   4536 {
   4537    stbi__zbuf a;
   4538    a.zbuffer = (stbi_uc *) ibuffer;
   4539    a.zbuffer_end = (stbi_uc *) ibuffer + ilen;
   4540    if (stbi__do_zlib(&a, obuffer, olen, 0, 1))
   4541       return (int) (a.zout - a.zout_start);
   4542    else
   4543       return -1;
   4544 }
   4545 
   4546 STBIDEF char *stbi_zlib_decode_noheader_malloc(char const *buffer, int len, int *outlen)
   4547 {
   4548    stbi__zbuf a;
   4549    char *p = (char *) stbi__malloc(16384);
   4550    if (p == NULL) return NULL;
   4551    a.zbuffer = (stbi_uc *) buffer;
   4552    a.zbuffer_end = (stbi_uc *) buffer+len;
   4553    if (stbi__do_zlib(&a, p, 16384, 1, 0)) {
   4554       if (outlen) *outlen = (int) (a.zout - a.zout_start);
   4555       return a.zout_start;
   4556    } else {
   4557       STBI_FREE(a.zout_start);
   4558       return NULL;
   4559    }
   4560 }
   4561 
   4562 STBIDEF int stbi_zlib_decode_noheader_buffer(char *obuffer, int olen, const char *ibuffer, int ilen)
   4563 {
   4564    stbi__zbuf a;
   4565    a.zbuffer = (stbi_uc *) ibuffer;
   4566    a.zbuffer_end = (stbi_uc *) ibuffer + ilen;
   4567    if (stbi__do_zlib(&a, obuffer, olen, 0, 0))
   4568       return (int) (a.zout - a.zout_start);
   4569    else
   4570       return -1;
   4571 }
   4572 #endif
   4573 
   4574 // public domain "baseline" PNG decoder   v0.10  Sean Barrett 2006-11-18
   4575 //    simple implementation
   4576 //      - only 8-bit samples
   4577 //      - no CRC checking
   4578 //      - allocates lots of intermediate memory
   4579 //        - avoids problem of streaming data between subsystems
   4580 //        - avoids explicit window management
   4581 //    performance
   4582 //      - uses stb_zlib, a PD zlib implementation with fast huffman decoding
   4583 
   4584 #ifndef STBI_NO_PNG
   4585 typedef struct
   4586 {
   4587    stbi__uint32 length;
   4588    stbi__uint32 type;
   4589 } stbi__pngchunk;
   4590 
   4591 static stbi__pngchunk stbi__get_chunk_header(stbi__context *s)
   4592 {
   4593    stbi__pngchunk c;
   4594    c.length = stbi__get32be(s);
   4595    c.type   = stbi__get32be(s);
   4596    return c;
   4597 }
   4598 
   4599 static int stbi__check_png_header(stbi__context *s)
   4600 {
   4601    static const stbi_uc png_sig[8] = { 137,80,78,71,13,10,26,10 };
   4602    int i;
   4603    for (i=0; i < 8; ++i)
   4604       if (stbi__get8(s) != png_sig[i]) return stbi__err("bad png sig","Not a PNG");
   4605    return 1;
   4606 }
   4607 
   4608 typedef struct
   4609 {
   4610    stbi__context *s;
   4611    stbi_uc *idata, *expanded, *out;
   4612    int depth;
   4613 } stbi__png;
   4614 
   4615 
   4616 enum {
   4617    STBI__F_none=0,
   4618    STBI__F_sub=1,
   4619    STBI__F_up=2,
   4620    STBI__F_avg=3,
   4621    STBI__F_paeth=4,
   4622    // synthetic filters used for first scanline to avoid needing a dummy row of 0s
   4623    STBI__F_avg_first,
   4624    STBI__F_paeth_first
   4625 };
   4626 
   4627 static stbi_uc first_row_filter[5] =
   4628 {
   4629    STBI__F_none,
   4630    STBI__F_sub,
   4631    STBI__F_none,
   4632    STBI__F_avg_first,
   4633    STBI__F_paeth_first
   4634 };
   4635 
   4636 static int stbi__paeth(int a, int b, int c)
   4637 {
   4638    int p = a + b - c;
   4639    int pa = abs(p-a);
   4640    int pb = abs(p-b);
   4641    int pc = abs(p-c);
   4642    if (pa <= pb && pa <= pc) return a;
   4643    if (pb <= pc) return b;
   4644    return c;
   4645 }
   4646 
   4647 static const stbi_uc stbi__depth_scale_table[9] = { 0, 0xff, 0x55, 0, 0x11, 0,0,0, 0x01 };
   4648 
   4649 // create the png data from post-deflated data
   4650 static int stbi__create_png_image_raw(stbi__png *a, stbi_uc *raw, stbi__uint32 raw_len, int out_n, stbi__uint32 x, stbi__uint32 y, int depth, int color)
   4651 {
   4652    int bytes = (depth == 16? 2 : 1);
   4653    stbi__context *s = a->s;
   4654    stbi__uint32 i,j,stride = x*out_n*bytes;
   4655    stbi__uint32 img_len, img_width_bytes;
   4656    int k;
   4657    int img_n = s->img_n; // copy it into a local for later
   4658 
   4659    int output_bytes = out_n*bytes;
   4660    int filter_bytes = img_n*bytes;
   4661    int width = x;
   4662 
   4663    STBI_ASSERT(out_n == s->img_n || out_n == s->img_n+1);
   4664    a->out = (stbi_uc *) stbi__malloc_mad3(x, y, output_bytes, 0); // extra bytes to write off the end into
   4665    if (!a->out) return stbi__err("outofmem", "Out of memory");
   4666 
   4667    if (!stbi__mad3sizes_valid(img_n, x, depth, 7)) return stbi__err("too large", "Corrupt PNG");
   4668    img_width_bytes = (((img_n * x * depth) + 7) >> 3);
   4669    img_len = (img_width_bytes + 1) * y;
   4670 
   4671    // we used to check for exact match between raw_len and img_len on non-interlaced PNGs,
   4672    // but issue #276 reported a PNG in the wild that had extra data at the end (all zeros),
   4673    // so just check for raw_len < img_len always.
   4674    if (raw_len < img_len) return stbi__err("not enough pixels","Corrupt PNG");
   4675 
   4676    for (j=0; j < y; ++j) {
   4677       stbi_uc *cur = a->out + stride*j;
   4678       stbi_uc *prior;
   4679       int filter = *raw++;
   4680 
   4681       if (filter > 4)
   4682          return stbi__err("invalid filter","Corrupt PNG");
   4683 
   4684       if (depth < 8) {
   4685          if (img_width_bytes > x) return stbi__err("invalid width","Corrupt PNG");
   4686          cur += x*out_n - img_width_bytes; // store output to the rightmost img_len bytes, so we can decode in place
   4687          filter_bytes = 1;
   4688          width = img_width_bytes;
   4689       }
   4690       prior = cur - stride; // bugfix: need to compute this after 'cur +=' computation above
   4691 
   4692       // if first row, use special filter that doesn't sample previous row
   4693       if (j == 0) filter = first_row_filter[filter];
   4694 
   4695       // handle first byte explicitly
   4696       for (k=0; k < filter_bytes; ++k) {
   4697          switch (filter) {
   4698             case STBI__F_none       : cur[k] = raw[k]; break;
   4699             case STBI__F_sub        : cur[k] = raw[k]; break;
   4700             case STBI__F_up         : cur[k] = STBI__BYTECAST(raw[k] + prior[k]); break;
   4701             case STBI__F_avg        : cur[k] = STBI__BYTECAST(raw[k] + (prior[k]>>1)); break;
   4702             case STBI__F_paeth      : cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(0,prior[k],0)); break;
   4703             case STBI__F_avg_first  : cur[k] = raw[k]; break;
   4704             case STBI__F_paeth_first: cur[k] = raw[k]; break;
   4705          }
   4706       }
   4707 
   4708       if (depth == 8) {
   4709          if (img_n != out_n)
   4710             cur[img_n] = 255; // first pixel
   4711          raw += img_n;
   4712          cur += out_n;
   4713          prior += out_n;
   4714       } else if (depth == 16) {
   4715          if (img_n != out_n) {
   4716             cur[filter_bytes]   = 255; // first pixel top byte
   4717             cur[filter_bytes+1] = 255; // first pixel bottom byte
   4718          }
   4719          raw += filter_bytes;
   4720          cur += output_bytes;
   4721          prior += output_bytes;
   4722       } else {
   4723          raw += 1;
   4724          cur += 1;
   4725          prior += 1;
   4726       }
   4727 
   4728       // this is a little gross, so that we don't switch per-pixel or per-component
   4729       if (depth < 8 || img_n == out_n) {
   4730          int nk = (width - 1)*filter_bytes;
   4731          #define STBI__CASE(f) \
   4732              case f:     \
   4733                 for (k=0; k < nk; ++k)
   4734          switch (filter) {
   4735             // "none" filter turns into a memcpy here; make that explicit.
   4736             case STBI__F_none:         memcpy(cur, raw, nk); break;
   4737             STBI__CASE(STBI__F_sub)          { cur[k] = STBI__BYTECAST(raw[k] + cur[k-filter_bytes]); } break;
   4738             STBI__CASE(STBI__F_up)           { cur[k] = STBI__BYTECAST(raw[k] + prior[k]); } break;
   4739             STBI__CASE(STBI__F_avg)          { cur[k] = STBI__BYTECAST(raw[k] + ((prior[k] + cur[k-filter_bytes])>>1)); } break;
   4740             STBI__CASE(STBI__F_paeth)        { cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(cur[k-filter_bytes],prior[k],prior[k-filter_bytes])); } break;
   4741             STBI__CASE(STBI__F_avg_first)    { cur[k] = STBI__BYTECAST(raw[k] + (cur[k-filter_bytes] >> 1)); } break;
   4742             STBI__CASE(STBI__F_paeth_first)  { cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(cur[k-filter_bytes],0,0)); } break;
   4743          }
   4744          #undef STBI__CASE
   4745          raw += nk;
   4746       } else {
   4747          STBI_ASSERT(img_n+1 == out_n);
   4748          #define STBI__CASE(f) \
   4749              case f:     \
   4750                 for (i=x-1; i >= 1; --i, cur[filter_bytes]=255,raw+=filter_bytes,cur+=output_bytes,prior+=output_bytes) \
   4751                    for (k=0; k < filter_bytes; ++k)
   4752          switch (filter) {
   4753             STBI__CASE(STBI__F_none)         { cur[k] = raw[k]; } break;
   4754             STBI__CASE(STBI__F_sub)          { cur[k] = STBI__BYTECAST(raw[k] + cur[k- output_bytes]); } break;
   4755             STBI__CASE(STBI__F_up)           { cur[k] = STBI__BYTECAST(raw[k] + prior[k]); } break;
   4756             STBI__CASE(STBI__F_avg)          { cur[k] = STBI__BYTECAST(raw[k] + ((prior[k] + cur[k- output_bytes])>>1)); } break;
   4757             STBI__CASE(STBI__F_paeth)        { cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(cur[k- output_bytes],prior[k],prior[k- output_bytes])); } break;
   4758             STBI__CASE(STBI__F_avg_first)    { cur[k] = STBI__BYTECAST(raw[k] + (cur[k- output_bytes] >> 1)); } break;
   4759             STBI__CASE(STBI__F_paeth_first)  { cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(cur[k- output_bytes],0,0)); } break;
   4760          }
   4761          #undef STBI__CASE
   4762 
   4763          // the loop above sets the high byte of the pixels' alpha, but for
   4764          // 16 bit png files we also need the low byte set. we'll do that here.
   4765          if (depth == 16) {
   4766             cur = a->out + stride*j; // start at the beginning of the row again
   4767             for (i=0; i < x; ++i,cur+=output_bytes) {
   4768                cur[filter_bytes+1] = 255;
   4769             }
   4770          }
   4771       }
   4772    }
   4773 
   4774    // we make a separate pass to expand bits to pixels; for performance,
   4775    // this could run two scanlines behind the above code, so it won't
   4776    // intefere with filtering but will still be in the cache.
   4777    if (depth < 8) {
   4778       for (j=0; j < y; ++j) {
   4779          stbi_uc *cur = a->out + stride*j;
   4780          stbi_uc *in  = a->out + stride*j + x*out_n - img_width_bytes;
   4781          // unpack 1/2/4-bit into a 8-bit buffer. allows us to keep the common 8-bit path optimal at minimal cost for 1/2/4-bit
   4782          // png guarante byte alignment, if width is not multiple of 8/4/2 we'll decode dummy trailing data that will be skipped in the later loop
   4783          stbi_uc scale = (color == 0) ? stbi__depth_scale_table[depth] : 1; // scale grayscale values to 0..255 range
   4784 
   4785          // note that the final byte might overshoot and write more data than desired.
   4786          // we can allocate enough data that this never writes out of memory, but it
   4787          // could also overwrite the next scanline. can it overwrite non-empty data
   4788          // on the next scanline? yes, consider 1-pixel-wide scanlines with 1-bit-per-pixel.
   4789          // so we need to explicitly clamp the final ones
   4790 
   4791          if (depth == 4) {
   4792             for (k=x*img_n; k >= 2; k-=2, ++in) {
   4793                *cur++ = scale * ((*in >> 4)       );
   4794                *cur++ = scale * ((*in     ) & 0x0f);
   4795             }
   4796             if (k > 0) *cur++ = scale * ((*in >> 4)       );
   4797          } else if (depth == 2) {
   4798             for (k=x*img_n; k >= 4; k-=4, ++in) {
   4799                *cur++ = scale * ((*in >> 6)       );
   4800                *cur++ = scale * ((*in >> 4) & 0x03);
   4801                *cur++ = scale * ((*in >> 2) & 0x03);
   4802                *cur++ = scale * ((*in     ) & 0x03);
   4803             }
   4804             if (k > 0) *cur++ = scale * ((*in >> 6)       );
   4805             if (k > 1) *cur++ = scale * ((*in >> 4) & 0x03);
   4806             if (k > 2) *cur++ = scale * ((*in >> 2) & 0x03);
   4807          } else if (depth == 1) {
   4808             for (k=x*img_n; k >= 8; k-=8, ++in) {
   4809                *cur++ = scale * ((*in >> 7)       );
   4810                *cur++ = scale * ((*in >> 6) & 0x01);
   4811                *cur++ = scale * ((*in >> 5) & 0x01);
   4812                *cur++ = scale * ((*in >> 4) & 0x01);
   4813                *cur++ = scale * ((*in >> 3) & 0x01);
   4814                *cur++ = scale * ((*in >> 2) & 0x01);
   4815                *cur++ = scale * ((*in >> 1) & 0x01);
   4816                *cur++ = scale * ((*in     ) & 0x01);
   4817             }
   4818             if (k > 0) *cur++ = scale * ((*in >> 7)       );
   4819             if (k > 1) *cur++ = scale * ((*in >> 6) & 0x01);
   4820             if (k > 2) *cur++ = scale * ((*in >> 5) & 0x01);
   4821             if (k > 3) *cur++ = scale * ((*in >> 4) & 0x01);
   4822             if (k > 4) *cur++ = scale * ((*in >> 3) & 0x01);
   4823             if (k > 5) *cur++ = scale * ((*in >> 2) & 0x01);
   4824             if (k > 6) *cur++ = scale * ((*in >> 1) & 0x01);
   4825          }
   4826          if (img_n != out_n) {
   4827             int q;
   4828             // insert alpha = 255
   4829             cur = a->out + stride*j;
   4830             if (img_n == 1) {
   4831                for (q=x-1; q >= 0; --q) {
   4832                   cur[q*2+1] = 255;
   4833                   cur[q*2+0] = cur[q];
   4834                }
   4835             } else {
   4836                STBI_ASSERT(img_n == 3);
   4837                for (q=x-1; q >= 0; --q) {
   4838                   cur[q*4+3] = 255;
   4839                   cur[q*4+2] = cur[q*3+2];
   4840                   cur[q*4+1] = cur[q*3+1];
   4841                   cur[q*4+0] = cur[q*3+0];
   4842                }
   4843             }
   4844          }
   4845       }
   4846    } else if (depth == 16) {
   4847       // force the image data from big-endian to platform-native.
   4848       // this is done in a separate pass due to the decoding relying
   4849       // on the data being untouched, but could probably be done
   4850       // per-line during decode if care is taken.
   4851       stbi_uc *cur = a->out;
   4852       stbi__uint16 *cur16 = (stbi__uint16*)cur;
   4853 
   4854       for(i=0; i < x*y*out_n; ++i,cur16++,cur+=2) {
   4855          *cur16 = (cur[0] << 8) | cur[1];
   4856       }
   4857    }
   4858 
   4859    return 1;
   4860 }
   4861 
   4862 static int stbi__create_png_image(stbi__png *a, stbi_uc *image_data, stbi__uint32 image_data_len, int out_n, int depth, int color, int interlaced)
   4863 {
   4864    int bytes = (depth == 16 ? 2 : 1);
   4865    int out_bytes = out_n * bytes;
   4866    stbi_uc *final;
   4867    int p;
   4868    if (!interlaced)
   4869       return stbi__create_png_image_raw(a, image_data, image_data_len, out_n, a->s->img_x, a->s->img_y, depth, color);
   4870 
   4871    // de-interlacing
   4872    final = (stbi_uc *) stbi__malloc_mad3(a->s->img_x, a->s->img_y, out_bytes, 0);
   4873    if (!final) return stbi__err("outofmem", "Out of memory");
   4874    for (p=0; p < 7; ++p) {
   4875       int xorig[] = { 0,4,0,2,0,1,0 };
   4876       int yorig[] = { 0,0,4,0,2,0,1 };
   4877       int xspc[]  = { 8,8,4,4,2,2,1 };
   4878       int yspc[]  = { 8,8,8,4,4,2,2 };
   4879       int i,j,x,y;
   4880       // pass1_x[4] = 0, pass1_x[5] = 1, pass1_x[12] = 1
   4881       x = (a->s->img_x - xorig[p] + xspc[p]-1) / xspc[p];
   4882       y = (a->s->img_y - yorig[p] + yspc[p]-1) / yspc[p];
   4883       if (x && y) {
   4884          stbi__uint32 img_len = ((((a->s->img_n * x * depth) + 7) >> 3) + 1) * y;
   4885          if (!stbi__create_png_image_raw(a, image_data, image_data_len, out_n, x, y, depth, color)) {
   4886             STBI_FREE(final);
   4887             return 0;
   4888          }
   4889          for (j=0; j < y; ++j) {
   4890             for (i=0; i < x; ++i) {
   4891                int out_y = j*yspc[p]+yorig[p];
   4892                int out_x = i*xspc[p]+xorig[p];
   4893                memcpy(final + out_y*a->s->img_x*out_bytes + out_x*out_bytes,
   4894                       a->out + (j*x+i)*out_bytes, out_bytes);
   4895             }
   4896          }
   4897          STBI_FREE(a->out);
   4898          image_data += img_len;
   4899          image_data_len -= img_len;
   4900       }
   4901    }
   4902    a->out = final;
   4903 
   4904    return 1;
   4905 }
   4906 
   4907 static int stbi__compute_transparency(stbi__png *z, stbi_uc tc[3], int out_n)
   4908 {
   4909    stbi__context *s = z->s;
   4910    stbi__uint32 i, pixel_count = s->img_x * s->img_y;
   4911    stbi_uc *p = z->out;
   4912 
   4913    // compute color-based transparency, assuming we've
   4914    // already got 255 as the alpha value in the output
   4915    STBI_ASSERT(out_n == 2 || out_n == 4);
   4916 
   4917    if (out_n == 2) {
   4918       for (i=0; i < pixel_count; ++i) {
   4919          p[1] = (p[0] == tc[0] ? 0 : 255);
   4920          p += 2;
   4921       }
   4922    } else {
   4923       for (i=0; i < pixel_count; ++i) {
   4924          if (p[0] == tc[0] && p[1] == tc[1] && p[2] == tc[2])
   4925             p[3] = 0;
   4926          p += 4;
   4927       }
   4928    }
   4929    return 1;
   4930 }
   4931 
   4932 static int stbi__compute_transparency16(stbi__png *z, stbi__uint16 tc[3], int out_n)
   4933 {
   4934    stbi__context *s = z->s;
   4935    stbi__uint32 i, pixel_count = s->img_x * s->img_y;
   4936    stbi__uint16 *p = (stbi__uint16*) z->out;
   4937 
   4938    // compute color-based transparency, assuming we've
   4939    // already got 65535 as the alpha value in the output
   4940    STBI_ASSERT(out_n == 2 || out_n == 4);
   4941 
   4942    if (out_n == 2) {
   4943       for (i = 0; i < pixel_count; ++i) {
   4944          p[1] = (p[0] == tc[0] ? 0 : 65535);
   4945          p += 2;
   4946       }
   4947    } else {
   4948       for (i = 0; i < pixel_count; ++i) {
   4949          if (p[0] == tc[0] && p[1] == tc[1] && p[2] == tc[2])
   4950             p[3] = 0;
   4951          p += 4;
   4952       }
   4953    }
   4954    return 1;
   4955 }
   4956 
   4957 static int stbi__expand_png_palette(stbi__png *a, stbi_uc *palette, int len, int pal_img_n)
   4958 {
   4959    stbi__uint32 i, pixel_count = a->s->img_x * a->s->img_y;
   4960    stbi_uc *p, *temp_out, *orig = a->out;
   4961 
   4962    p = (stbi_uc *) stbi__malloc_mad2(pixel_count, pal_img_n, 0);
   4963    if (p == NULL) return stbi__err("outofmem", "Out of memory");
   4964 
   4965    // between here and free(out) below, exitting would leak
   4966    temp_out = p;
   4967 
   4968    if (pal_img_n == 3) {
   4969       for (i=0; i < pixel_count; ++i) {
   4970          int n = orig[i]*4;
   4971          p[0] = palette[n  ];
   4972          p[1] = palette[n+1];
   4973          p[2] = palette[n+2];
   4974          p += 3;
   4975       }
   4976    } else {
   4977       for (i=0; i < pixel_count; ++i) {
   4978          int n = orig[i]*4;
   4979          p[0] = palette[n  ];
   4980          p[1] = palette[n+1];
   4981          p[2] = palette[n+2];
   4982          p[3] = palette[n+3];
   4983          p += 4;
   4984       }
   4985    }
   4986    STBI_FREE(a->out);
   4987    a->out = temp_out;
   4988 
   4989    STBI_NOTUSED(len);
   4990 
   4991    return 1;
   4992 }
   4993 
   4994 static int stbi__unpremultiply_on_load_global = 0;
   4995 static int stbi__de_iphone_flag_global = 0;
   4996 
   4997 STBIDEF void stbi_set_unpremultiply_on_load(int flag_true_if_should_unpremultiply)
   4998 {
   4999    stbi__unpremultiply_on_load_global = flag_true_if_should_unpremultiply;
   5000 }
   5001 
   5002 STBIDEF void stbi_convert_iphone_png_to_rgb(int flag_true_if_should_convert)
   5003 {
   5004    stbi__de_iphone_flag_global = flag_true_if_should_convert;
   5005 }
   5006 
   5007 #ifndef STBI_THREAD_LOCAL
   5008 #define stbi__unpremultiply_on_load  stbi__unpremultiply_on_load_global
   5009 #define stbi__de_iphone_flag  stbi__de_iphone_flag_global
   5010 #else
   5011 static STBI_THREAD_LOCAL int stbi__unpremultiply_on_load_local, stbi__unpremultiply_on_load_set;
   5012 static STBI_THREAD_LOCAL int stbi__de_iphone_flag_local, stbi__de_iphone_flag_set;
   5013 
   5014 STBIDEF void stbi_set_unpremultiply_on_load_thread(int flag_true_if_should_unpremultiply)
   5015 {
   5016    stbi__unpremultiply_on_load_local = flag_true_if_should_unpremultiply;
   5017    stbi__unpremultiply_on_load_set = 1;
   5018 }
   5019 
   5020 STBIDEF void stbi_convert_iphone_png_to_rgb_thread(int flag_true_if_should_convert)
   5021 {
   5022    stbi__de_iphone_flag_local = flag_true_if_should_convert;
   5023    stbi__de_iphone_flag_set = 1;
   5024 }
   5025 
   5026 #define stbi__unpremultiply_on_load  (stbi__unpremultiply_on_load_set           \
   5027                                        ? stbi__unpremultiply_on_load_local      \
   5028                                        : stbi__unpremultiply_on_load_global)
   5029 #define stbi__de_iphone_flag  (stbi__de_iphone_flag_set                         \
   5030                                 ? stbi__de_iphone_flag_local                    \
   5031                                 : stbi__de_iphone_flag_global)
   5032 #endif // STBI_THREAD_LOCAL
   5033 
   5034 static void stbi__de_iphone(stbi__png *z)
   5035 {
   5036    stbi__context *s = z->s;
   5037    stbi__uint32 i, pixel_count = s->img_x * s->img_y;
   5038    stbi_uc *p = z->out;
   5039 
   5040    if (s->img_out_n == 3) {  // convert bgr to rgb
   5041       for (i=0; i < pixel_count; ++i) {
   5042          stbi_uc t = p[0];
   5043          p[0] = p[2];
   5044          p[2] = t;
   5045          p += 3;
   5046       }
   5047    } else {
   5048       STBI_ASSERT(s->img_out_n == 4);
   5049       if (stbi__unpremultiply_on_load) {
   5050          // convert bgr to rgb and unpremultiply
   5051          for (i=0; i < pixel_count; ++i) {
   5052             stbi_uc a = p[3];
   5053             stbi_uc t = p[0];
   5054             if (a) {
   5055                stbi_uc half = a / 2;
   5056                p[0] = (p[2] * 255 + half) / a;
   5057                p[1] = (p[1] * 255 + half) / a;
   5058                p[2] = ( t   * 255 + half) / a;
   5059             } else {
   5060                p[0] = p[2];
   5061                p[2] = t;
   5062             }
   5063             p += 4;
   5064          }
   5065       } else {
   5066          // convert bgr to rgb
   5067          for (i=0; i < pixel_count; ++i) {
   5068             stbi_uc t = p[0];
   5069             p[0] = p[2];
   5070             p[2] = t;
   5071             p += 4;
   5072          }
   5073       }
   5074    }
   5075 }
   5076 
   5077 #define STBI__PNG_TYPE(a,b,c,d)  (((unsigned) (a) << 24) + ((unsigned) (b) << 16) + ((unsigned) (c) << 8) + (unsigned) (d))
   5078 
   5079 static int stbi__parse_png_file(stbi__png *z, int scan, int req_comp)
   5080 {
   5081    stbi_uc palette[1024], pal_img_n=0;
   5082    stbi_uc has_trans=0, tc[3]={0};
   5083    stbi__uint16 tc16[3];
   5084    stbi__uint32 ioff=0, idata_limit=0, i, pal_len=0;
   5085    int first=1,k,interlace=0, color=0, is_iphone=0;
   5086    stbi__context *s = z->s;
   5087 
   5088    z->expanded = NULL;
   5089    z->idata = NULL;
   5090    z->out = NULL;
   5091 
   5092    if (!stbi__check_png_header(s)) return 0;
   5093 
   5094    if (scan == STBI__SCAN_type) return 1;
   5095 
   5096    for (;;) {
   5097       stbi__pngchunk c = stbi__get_chunk_header(s);
   5098       switch (c.type) {
   5099          case STBI__PNG_TYPE('C','g','B','I'):
   5100             is_iphone = 1;
   5101             stbi__skip(s, c.length);
   5102             break;
   5103          case STBI__PNG_TYPE('I','H','D','R'): {
   5104             int comp,filter;
   5105             if (!first) return stbi__err("multiple IHDR","Corrupt PNG");
   5106             first = 0;
   5107             if (c.length != 13) return stbi__err("bad IHDR len","Corrupt PNG");
   5108             s->img_x = stbi__get32be(s);
   5109             s->img_y = stbi__get32be(s);
   5110             if (s->img_y > STBI_MAX_DIMENSIONS) return stbi__err("too large","Very large image (corrupt?)");
   5111             if (s->img_x > STBI_MAX_DIMENSIONS) return stbi__err("too large","Very large image (corrupt?)");
   5112             z->depth = stbi__get8(s);  if (z->depth != 1 && z->depth != 2 && z->depth != 4 && z->depth != 8 && z->depth != 16)  return stbi__err("1/2/4/8/16-bit only","PNG not supported: 1/2/4/8/16-bit only");
   5113             color = stbi__get8(s);  if (color > 6)         return stbi__err("bad ctype","Corrupt PNG");
   5114             if (color == 3 && z->depth == 16)                  return stbi__err("bad ctype","Corrupt PNG");
   5115             if (color == 3) pal_img_n = 3; else if (color & 1) return stbi__err("bad ctype","Corrupt PNG");
   5116             comp  = stbi__get8(s);  if (comp) return stbi__err("bad comp method","Corrupt PNG");
   5117             filter= stbi__get8(s);  if (filter) return stbi__err("bad filter method","Corrupt PNG");
   5118             interlace = stbi__get8(s); if (interlace>1) return stbi__err("bad interlace method","Corrupt PNG");
   5119             if (!s->img_x || !s->img_y) return stbi__err("0-pixel image","Corrupt PNG");
   5120             if (!pal_img_n) {
   5121                s->img_n = (color & 2 ? 3 : 1) + (color & 4 ? 1 : 0);
   5122                if ((1 << 30) / s->img_x / s->img_n < s->img_y) return stbi__err("too large", "Image too large to decode");
   5123             } else {
   5124                // if paletted, then pal_n is our final components, and
   5125                // img_n is # components to decompress/filter.
   5126                s->img_n = 1;
   5127                if ((1 << 30) / s->img_x / 4 < s->img_y) return stbi__err("too large","Corrupt PNG");
   5128             }
   5129             // even with SCAN_header, have to scan to see if we have a tRNS
   5130             break;
   5131          }
   5132 
   5133          case STBI__PNG_TYPE('P','L','T','E'):  {
   5134             if (first) return stbi__err("first not IHDR", "Corrupt PNG");
   5135             if (c.length > 256*3) return stbi__err("invalid PLTE","Corrupt PNG");
   5136             pal_len = c.length / 3;
   5137             if (pal_len * 3 != c.length) return stbi__err("invalid PLTE","Corrupt PNG");
   5138             for (i=0; i < pal_len; ++i) {
   5139                palette[i*4+0] = stbi__get8(s);
   5140                palette[i*4+1] = stbi__get8(s);
   5141                palette[i*4+2] = stbi__get8(s);
   5142                palette[i*4+3] = 255;
   5143             }
   5144             break;
   5145          }
   5146 
   5147          case STBI__PNG_TYPE('t','R','N','S'): {
   5148             if (first) return stbi__err("first not IHDR", "Corrupt PNG");
   5149             if (z->idata) return stbi__err("tRNS after IDAT","Corrupt PNG");
   5150             if (pal_img_n) {
   5151                if (scan == STBI__SCAN_header) { s->img_n = 4; return 1; }
   5152                if (pal_len == 0) return stbi__err("tRNS before PLTE","Corrupt PNG");
   5153                if (c.length > pal_len) return stbi__err("bad tRNS len","Corrupt PNG");
   5154                pal_img_n = 4;
   5155                for (i=0; i < c.length; ++i)
   5156                   palette[i*4+3] = stbi__get8(s);
   5157             } else {
   5158                if (!(s->img_n & 1)) return stbi__err("tRNS with alpha","Corrupt PNG");
   5159                if (c.length != (stbi__uint32) s->img_n*2) return stbi__err("bad tRNS len","Corrupt PNG");
   5160                has_trans = 1;
   5161                // non-paletted with tRNS = constant alpha. if header-scanning, we can stop now.
   5162                if (scan == STBI__SCAN_header) { ++s->img_n; return 1; }
   5163                if (z->depth == 16) {
   5164                   for (k = 0; k < s->img_n; ++k) tc16[k] = (stbi__uint16)stbi__get16be(s); // copy the values as-is
   5165                } else {
   5166                   for (k = 0; k < s->img_n; ++k) tc[k] = (stbi_uc)(stbi__get16be(s) & 255) * stbi__depth_scale_table[z->depth]; // non 8-bit images will be larger
   5167                }
   5168             }
   5169             break;
   5170          }
   5171 
   5172          case STBI__PNG_TYPE('I','D','A','T'): {
   5173             if (first) return stbi__err("first not IHDR", "Corrupt PNG");
   5174             if (pal_img_n && !pal_len) return stbi__err("no PLTE","Corrupt PNG");
   5175             if (scan == STBI__SCAN_header) {
   5176                // header scan definitely stops at first IDAT
   5177                if (pal_img_n)
   5178                   s->img_n = pal_img_n;
   5179                return 1;
   5180             }
   5181             if (c.length > (1u << 30)) return stbi__err("IDAT size limit", "IDAT section larger than 2^30 bytes");
   5182             if ((int)(ioff + c.length) < (int)ioff) return 0;
   5183             if (ioff + c.length > idata_limit) {
   5184                stbi__uint32 idata_limit_old = idata_limit;
   5185                stbi_uc *p;
   5186                if (idata_limit == 0) idata_limit = c.length > 4096 ? c.length : 4096;
   5187                while (ioff + c.length > idata_limit)
   5188                   idata_limit *= 2;
   5189                STBI_NOTUSED(idata_limit_old);
   5190                p = (stbi_uc *) STBI_REALLOC_SIZED(z->idata, idata_limit_old, idata_limit); if (p == NULL) return stbi__err("outofmem", "Out of memory");
   5191                z->idata = p;
   5192             }
   5193             if (!stbi__getn(s, z->idata+ioff,c.length)) return stbi__err("outofdata","Corrupt PNG");
   5194             ioff += c.length;
   5195             break;
   5196          }
   5197 
   5198          case STBI__PNG_TYPE('I','E','N','D'): {
   5199             stbi__uint32 raw_len, bpl;
   5200             if (first) return stbi__err("first not IHDR", "Corrupt PNG");
   5201             if (scan != STBI__SCAN_load) return 1;
   5202             if (z->idata == NULL) return stbi__err("no IDAT","Corrupt PNG");
   5203             // initial guess for decoded data size to avoid unnecessary reallocs
   5204             bpl = (s->img_x * z->depth + 7) / 8; // bytes per line, per component
   5205             raw_len = bpl * s->img_y * s->img_n /* pixels */ + s->img_y /* filter mode per row */;
   5206             z->expanded = (stbi_uc *) stbi_zlib_decode_malloc_guesssize_headerflag((char *) z->idata, ioff, raw_len, (int *) &raw_len, !is_iphone);
   5207             if (z->expanded == NULL) return 0; // zlib should set error
   5208             STBI_FREE(z->idata); z->idata = NULL;
   5209             if ((req_comp == s->img_n+1 && req_comp != 3 && !pal_img_n) || has_trans)
   5210                s->img_out_n = s->img_n+1;
   5211             else
   5212                s->img_out_n = s->img_n;
   5213             if (!stbi__create_png_image(z, z->expanded, raw_len, s->img_out_n, z->depth, color, interlace)) return 0;
   5214             if (has_trans) {
   5215                if (z->depth == 16) {
   5216                   if (!stbi__compute_transparency16(z, tc16, s->img_out_n)) return 0;
   5217                } else {
   5218                   if (!stbi__compute_transparency(z, tc, s->img_out_n)) return 0;
   5219                }
   5220             }
   5221             if (is_iphone && stbi__de_iphone_flag && s->img_out_n > 2)
   5222                stbi__de_iphone(z);
   5223             if (pal_img_n) {
   5224                // pal_img_n == 3 or 4
   5225                s->img_n = pal_img_n; // record the actual colors we had
   5226                s->img_out_n = pal_img_n;
   5227                if (req_comp >= 3) s->img_out_n = req_comp;
   5228                if (!stbi__expand_png_palette(z, palette, pal_len, s->img_out_n))
   5229                   return 0;
   5230             } else if (has_trans) {
   5231                // non-paletted image with tRNS -> source image has (constant) alpha
   5232                ++s->img_n;
   5233             }
   5234             STBI_FREE(z->expanded); z->expanded = NULL;
   5235             // end of PNG chunk, read and skip CRC
   5236             stbi__get32be(s);
   5237             return 1;
   5238          }
   5239 
   5240          default:
   5241             // if critical, fail
   5242             if (first) return stbi__err("first not IHDR", "Corrupt PNG");
   5243             if ((c.type & (1 << 29)) == 0) {
   5244                #ifndef STBI_NO_FAILURE_STRINGS
   5245                // not threadsafe
   5246                static char invalid_chunk[] = "XXXX PNG chunk not known";
   5247                invalid_chunk[0] = STBI__BYTECAST(c.type >> 24);
   5248                invalid_chunk[1] = STBI__BYTECAST(c.type >> 16);
   5249                invalid_chunk[2] = STBI__BYTECAST(c.type >>  8);
   5250                invalid_chunk[3] = STBI__BYTECAST(c.type >>  0);
   5251                #endif
   5252                return stbi__err(invalid_chunk, "PNG not supported: unknown PNG chunk type");
   5253             }
   5254             stbi__skip(s, c.length);
   5255             break;
   5256       }
   5257       // end of PNG chunk, read and skip CRC
   5258       stbi__get32be(s);
   5259    }
   5260 }
   5261 
   5262 static void *stbi__do_png(stbi__png *p, int *x, int *y, int *n, int req_comp, stbi__result_info *ri)
   5263 {
   5264    void *result=NULL;
   5265    if (req_comp < 0 || req_comp > 4) return stbi__errpuc("bad req_comp", "Internal error");
   5266    if (stbi__parse_png_file(p, STBI__SCAN_load, req_comp)) {
   5267       if (p->depth <= 8)
   5268          ri->bits_per_channel = 8;
   5269       else if (p->depth == 16)
   5270          ri->bits_per_channel = 16;
   5271       else
   5272          return stbi__errpuc("bad bits_per_channel", "PNG not supported: unsupported color depth");
   5273       result = p->out;
   5274       p->out = NULL;
   5275       if (req_comp && req_comp != p->s->img_out_n) {
   5276          if (ri->bits_per_channel == 8)
   5277             result = stbi__convert_format((unsigned char *) result, p->s->img_out_n, req_comp, p->s->img_x, p->s->img_y);
   5278          else
   5279             result = stbi__convert_format16((stbi__uint16 *) result, p->s->img_out_n, req_comp, p->s->img_x, p->s->img_y);
   5280          p->s->img_out_n = req_comp;
   5281          if (result == NULL) return result;
   5282       }
   5283       *x = p->s->img_x;
   5284       *y = p->s->img_y;
   5285       if (n) *n = p->s->img_n;
   5286    }
   5287    STBI_FREE(p->out);      p->out      = NULL;
   5288    STBI_FREE(p->expanded); p->expanded = NULL;
   5289    STBI_FREE(p->idata);    p->idata    = NULL;
   5290 
   5291    return result;
   5292 }
   5293 
   5294 static void *stbi__png_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
   5295 {
   5296    stbi__png p;
   5297    p.s = s;
   5298    return stbi__do_png(&p, x,y,comp,req_comp, ri);
   5299 }
   5300 
   5301 static int stbi__png_test(stbi__context *s)
   5302 {
   5303    int r;
   5304    r = stbi__check_png_header(s);
   5305    stbi__rewind(s);
   5306    return r;
   5307 }
   5308 
   5309 static int stbi__png_info_raw(stbi__png *p, int *x, int *y, int *comp)
   5310 {
   5311    if (!stbi__parse_png_file(p, STBI__SCAN_header, 0)) {
   5312       stbi__rewind( p->s );
   5313       return 0;
   5314    }
   5315    if (x) *x = p->s->img_x;
   5316    if (y) *y = p->s->img_y;
   5317    if (comp) *comp = p->s->img_n;
   5318    return 1;
   5319 }
   5320 
   5321 static int stbi__png_info(stbi__context *s, int *x, int *y, int *comp)
   5322 {
   5323    stbi__png p;
   5324    p.s = s;
   5325    return stbi__png_info_raw(&p, x, y, comp);
   5326 }
   5327 
   5328 static int stbi__png_is16(stbi__context *s)
   5329 {
   5330    stbi__png p;
   5331    p.s = s;
   5332    if (!stbi__png_info_raw(&p, NULL, NULL, NULL))
   5333 	   return 0;
   5334    if (p.depth != 16) {
   5335       stbi__rewind(p.s);
   5336       return 0;
   5337    }
   5338    return 1;
   5339 }
   5340 #endif
   5341 
   5342 // Microsoft/Windows BMP image
   5343 
   5344 #ifndef STBI_NO_BMP
   5345 static int stbi__bmp_test_raw(stbi__context *s)
   5346 {
   5347    int r;
   5348    int sz;
   5349    if (stbi__get8(s) != 'B') return 0;
   5350    if (stbi__get8(s) != 'M') return 0;
   5351    stbi__get32le(s); // discard filesize
   5352    stbi__get16le(s); // discard reserved
   5353    stbi__get16le(s); // discard reserved
   5354    stbi__get32le(s); // discard data offset
   5355    sz = stbi__get32le(s);
   5356    r = (sz == 12 || sz == 40 || sz == 56 || sz == 108 || sz == 124);
   5357    return r;
   5358 }
   5359 
   5360 static int stbi__bmp_test(stbi__context *s)
   5361 {
   5362    int r = stbi__bmp_test_raw(s);
   5363    stbi__rewind(s);
   5364    return r;
   5365 }
   5366 
   5367 
   5368 // returns 0..31 for the highest set bit
   5369 static int stbi__high_bit(unsigned int z)
   5370 {
   5371    int n=0;
   5372    if (z == 0) return -1;
   5373    if (z >= 0x10000) { n += 16; z >>= 16; }
   5374    if (z >= 0x00100) { n +=  8; z >>=  8; }
   5375    if (z >= 0x00010) { n +=  4; z >>=  4; }
   5376    if (z >= 0x00004) { n +=  2; z >>=  2; }
   5377    if (z >= 0x00002) { n +=  1;/* >>=  1;*/ }
   5378    return n;
   5379 }
   5380 
   5381 static int stbi__bitcount(unsigned int a)
   5382 {
   5383    a = (a & 0x55555555) + ((a >>  1) & 0x55555555); // max 2
   5384    a = (a & 0x33333333) + ((a >>  2) & 0x33333333); // max 4
   5385    a = (a + (a >> 4)) & 0x0f0f0f0f; // max 8 per 4, now 8 bits
   5386    a = (a + (a >> 8)); // max 16 per 8 bits
   5387    a = (a + (a >> 16)); // max 32 per 8 bits
   5388    return a & 0xff;
   5389 }
   5390 
   5391 // extract an arbitrarily-aligned N-bit value (N=bits)
   5392 // from v, and then make it 8-bits long and fractionally
   5393 // extend it to full full range.
   5394 static int stbi__shiftsigned(unsigned int v, int shift, int bits)
   5395 {
   5396    static unsigned int mul_table[9] = {
   5397       0,
   5398       0xff/*0b11111111*/, 0x55/*0b01010101*/, 0x49/*0b01001001*/, 0x11/*0b00010001*/,
   5399       0x21/*0b00100001*/, 0x41/*0b01000001*/, 0x81/*0b10000001*/, 0x01/*0b00000001*/,
   5400    };
   5401    static unsigned int shift_table[9] = {
   5402       0, 0,0,1,0,2,4,6,0,
   5403    };
   5404    if (shift < 0)
   5405       v <<= -shift;
   5406    else
   5407       v >>= shift;
   5408    STBI_ASSERT(v < 256);
   5409    v >>= (8-bits);
   5410    STBI_ASSERT(bits >= 0 && bits <= 8);
   5411    return (int) ((unsigned) v * mul_table[bits]) >> shift_table[bits];
   5412 }
   5413 
   5414 typedef struct
   5415 {
   5416    int bpp, offset, hsz;
   5417    unsigned int mr,mg,mb,ma, all_a;
   5418    int extra_read;
   5419 } stbi__bmp_data;
   5420 
   5421 static int stbi__bmp_set_mask_defaults(stbi__bmp_data *info, int compress)
   5422 {
   5423    // BI_BITFIELDS specifies masks explicitly, don't override
   5424    if (compress == 3)
   5425       return 1;
   5426 
   5427    if (compress == 0) {
   5428       if (info->bpp == 16) {
   5429          info->mr = 31u << 10;
   5430          info->mg = 31u <<  5;
   5431          info->mb = 31u <<  0;
   5432       } else if (info->bpp == 32) {
   5433          info->mr = 0xffu << 16;
   5434          info->mg = 0xffu <<  8;
   5435          info->mb = 0xffu <<  0;
   5436          info->ma = 0xffu << 24;
   5437          info->all_a = 0; // if all_a is 0 at end, then we loaded alpha channel but it was all 0
   5438       } else {
   5439          // otherwise, use defaults, which is all-0
   5440          info->mr = info->mg = info->mb = info->ma = 0;
   5441       }
   5442       return 1;
   5443    }
   5444    return 0; // error
   5445 }
   5446 
   5447 static void *stbi__bmp_parse_header(stbi__context *s, stbi__bmp_data *info)
   5448 {
   5449    int hsz;
   5450    if (stbi__get8(s) != 'B' || stbi__get8(s) != 'M') return stbi__errpuc("not BMP", "Corrupt BMP");
   5451    stbi__get32le(s); // discard filesize
   5452    stbi__get16le(s); // discard reserved
   5453    stbi__get16le(s); // discard reserved
   5454    info->offset = stbi__get32le(s);
   5455    info->hsz = hsz = stbi__get32le(s);
   5456    info->mr = info->mg = info->mb = info->ma = 0;
   5457    info->extra_read = 14;
   5458 
   5459    if (info->offset < 0) return stbi__errpuc("bad BMP", "bad BMP");
   5460 
   5461    if (hsz != 12 && hsz != 40 && hsz != 56 && hsz != 108 && hsz != 124) return stbi__errpuc("unknown BMP", "BMP type not supported: unknown");
   5462    if (hsz == 12) {
   5463       s->img_x = stbi__get16le(s);
   5464       s->img_y = stbi__get16le(s);
   5465    } else {
   5466       s->img_x = stbi__get32le(s);
   5467       s->img_y = stbi__get32le(s);
   5468    }
   5469    if (stbi__get16le(s) != 1) return stbi__errpuc("bad BMP", "bad BMP");
   5470    info->bpp = stbi__get16le(s);
   5471    if (hsz != 12) {
   5472       int compress = stbi__get32le(s);
   5473       if (compress == 1 || compress == 2) return stbi__errpuc("BMP RLE", "BMP type not supported: RLE");
   5474       if (compress >= 4) return stbi__errpuc("BMP JPEG/PNG", "BMP type not supported: unsupported compression"); // this includes PNG/JPEG modes
   5475       if (compress == 3 && info->bpp != 16 && info->bpp != 32) return stbi__errpuc("bad BMP", "bad BMP"); // bitfields requires 16 or 32 bits/pixel
   5476       stbi__get32le(s); // discard sizeof
   5477       stbi__get32le(s); // discard hres
   5478       stbi__get32le(s); // discard vres
   5479       stbi__get32le(s); // discard colorsused
   5480       stbi__get32le(s); // discard max important
   5481       if (hsz == 40 || hsz == 56) {
   5482          if (hsz == 56) {
   5483             stbi__get32le(s);
   5484             stbi__get32le(s);
   5485             stbi__get32le(s);
   5486             stbi__get32le(s);
   5487          }
   5488          if (info->bpp == 16 || info->bpp == 32) {
   5489             if (compress == 0) {
   5490                stbi__bmp_set_mask_defaults(info, compress);
   5491             } else if (compress == 3) {
   5492                info->mr = stbi__get32le(s);
   5493                info->mg = stbi__get32le(s);
   5494                info->mb = stbi__get32le(s);
   5495                info->extra_read += 12;
   5496                // not documented, but generated by photoshop and handled by mspaint
   5497                if (info->mr == info->mg && info->mg == info->mb) {
   5498                   // ?!?!?
   5499                   return stbi__errpuc("bad BMP", "bad BMP");
   5500                }
   5501             } else
   5502                return stbi__errpuc("bad BMP", "bad BMP");
   5503          }
   5504       } else {
   5505          // V4/V5 header
   5506          int i;
   5507          if (hsz != 108 && hsz != 124)
   5508             return stbi__errpuc("bad BMP", "bad BMP");
   5509          info->mr = stbi__get32le(s);
   5510          info->mg = stbi__get32le(s);
   5511          info->mb = stbi__get32le(s);
   5512          info->ma = stbi__get32le(s);
   5513          if (compress != 3) // override mr/mg/mb unless in BI_BITFIELDS mode, as per docs
   5514             stbi__bmp_set_mask_defaults(info, compress);
   5515          stbi__get32le(s); // discard color space
   5516          for (i=0; i < 12; ++i)
   5517             stbi__get32le(s); // discard color space parameters
   5518          if (hsz == 124) {
   5519             stbi__get32le(s); // discard rendering intent
   5520             stbi__get32le(s); // discard offset of profile data
   5521             stbi__get32le(s); // discard size of profile data
   5522             stbi__get32le(s); // discard reserved
   5523          }
   5524       }
   5525    }
   5526    return (void *) 1;
   5527 }
   5528 
   5529 
   5530 static void *stbi__bmp_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
   5531 {
   5532    stbi_uc *out;
   5533    unsigned int mr=0,mg=0,mb=0,ma=0, all_a;
   5534    stbi_uc pal[256][4];
   5535    int psize=0,i,j,width;
   5536    int flip_vertically, pad, target;
   5537    stbi__bmp_data info;
   5538    STBI_NOTUSED(ri);
   5539 
   5540    info.all_a = 255;
   5541    if (stbi__bmp_parse_header(s, &info) == NULL)
   5542       return NULL; // error code already set
   5543 
   5544    flip_vertically = ((int) s->img_y) > 0;
   5545    s->img_y = abs((int) s->img_y);
   5546 
   5547    if (s->img_y > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
   5548    if (s->img_x > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
   5549 
   5550    mr = info.mr;
   5551    mg = info.mg;
   5552    mb = info.mb;
   5553    ma = info.ma;
   5554    all_a = info.all_a;
   5555 
   5556    if (info.hsz == 12) {
   5557       if (info.bpp < 24)
   5558          psize = (info.offset - info.extra_read - 24) / 3;
   5559    } else {
   5560       if (info.bpp < 16)
   5561          psize = (info.offset - info.extra_read - info.hsz) >> 2;
   5562    }
   5563    if (psize == 0) {
   5564       // accept some number of extra bytes after the header, but if the offset points either to before
   5565       // the header ends or implies a large amount of extra data, reject the file as malformed
   5566       int bytes_read_so_far = s->callback_already_read + (int)(s->img_buffer - s->img_buffer_original);
   5567       int header_limit = 1024; // max we actually read is below 256 bytes currently.
   5568       int extra_data_limit = 256*4; // what ordinarily goes here is a palette; 256 entries*4 bytes is its max size.
   5569       if (bytes_read_so_far <= 0 || bytes_read_so_far > header_limit) {
   5570          return stbi__errpuc("bad header", "Corrupt BMP");
   5571       }
   5572       // we established that bytes_read_so_far is positive and sensible.
   5573       // the first half of this test rejects offsets that are either too small positives, or
   5574       // negative, and guarantees that info.offset >= bytes_read_so_far > 0. this in turn
   5575       // ensures the number computed in the second half of the test can't overflow.
   5576       if (info.offset < bytes_read_so_far || info.offset - bytes_read_so_far > extra_data_limit) {
   5577          return stbi__errpuc("bad offset", "Corrupt BMP");
   5578       } else {
   5579          stbi__skip(s, info.offset - bytes_read_so_far);
   5580       }
   5581    }
   5582 
   5583    if (info.bpp == 24 && ma == 0xff000000)
   5584       s->img_n = 3;
   5585    else
   5586       s->img_n = ma ? 4 : 3;
   5587    if (req_comp && req_comp >= 3) // we can directly decode 3 or 4
   5588       target = req_comp;
   5589    else
   5590       target = s->img_n; // if they want monochrome, we'll post-convert
   5591 
   5592    // sanity-check size
   5593    if (!stbi__mad3sizes_valid(target, s->img_x, s->img_y, 0))
   5594       return stbi__errpuc("too large", "Corrupt BMP");
   5595 
   5596    out = (stbi_uc *) stbi__malloc_mad3(target, s->img_x, s->img_y, 0);
   5597    if (!out) return stbi__errpuc("outofmem", "Out of memory");
   5598    if (info.bpp < 16) {
   5599       int z=0;
   5600       if (psize == 0 || psize > 256) { STBI_FREE(out); return stbi__errpuc("invalid", "Corrupt BMP"); }
   5601       for (i=0; i < psize; ++i) {
   5602          pal[i][2] = stbi__get8(s);
   5603          pal[i][1] = stbi__get8(s);
   5604          pal[i][0] = stbi__get8(s);
   5605          if (info.hsz != 12) stbi__get8(s);
   5606          pal[i][3] = 255;
   5607       }
   5608       stbi__skip(s, info.offset - info.extra_read - info.hsz - psize * (info.hsz == 12 ? 3 : 4));
   5609       if (info.bpp == 1) width = (s->img_x + 7) >> 3;
   5610       else if (info.bpp == 4) width = (s->img_x + 1) >> 1;
   5611       else if (info.bpp == 8) width = s->img_x;
   5612       else { STBI_FREE(out); return stbi__errpuc("bad bpp", "Corrupt BMP"); }
   5613       pad = (-width)&3;
   5614       if (info.bpp == 1) {
   5615          for (j=0; j < (int) s->img_y; ++j) {
   5616             int bit_offset = 7, v = stbi__get8(s);
   5617             for (i=0; i < (int) s->img_x; ++i) {
   5618                int color = (v>>bit_offset)&0x1;
   5619                out[z++] = pal[color][0];
   5620                out[z++] = pal[color][1];
   5621                out[z++] = pal[color][2];
   5622                if (target == 4) out[z++] = 255;
   5623                if (i+1 == (int) s->img_x) break;
   5624                if((--bit_offset) < 0) {
   5625                   bit_offset = 7;
   5626                   v = stbi__get8(s);
   5627                }
   5628             }
   5629             stbi__skip(s, pad);
   5630          }
   5631       } else {
   5632          for (j=0; j < (int) s->img_y; ++j) {
   5633             for (i=0; i < (int) s->img_x; i += 2) {
   5634                int v=stbi__get8(s),v2=0;
   5635                if (info.bpp == 4) {
   5636                   v2 = v & 15;
   5637                   v >>= 4;
   5638                }
   5639                out[z++] = pal[v][0];
   5640                out[z++] = pal[v][1];
   5641                out[z++] = pal[v][2];
   5642                if (target == 4) out[z++] = 255;
   5643                if (i+1 == (int) s->img_x) break;
   5644                v = (info.bpp == 8) ? stbi__get8(s) : v2;
   5645                out[z++] = pal[v][0];
   5646                out[z++] = pal[v][1];
   5647                out[z++] = pal[v][2];
   5648                if (target == 4) out[z++] = 255;
   5649             }
   5650             stbi__skip(s, pad);
   5651          }
   5652       }
   5653    } else {
   5654       int rshift=0,gshift=0,bshift=0,ashift=0,rcount=0,gcount=0,bcount=0,acount=0;
   5655       int z = 0;
   5656       int easy=0;
   5657       stbi__skip(s, info.offset - info.extra_read - info.hsz);
   5658       if (info.bpp == 24) width = 3 * s->img_x;
   5659       else if (info.bpp == 16) width = 2*s->img_x;
   5660       else /* bpp = 32 and pad = 0 */ width=0;
   5661       pad = (-width) & 3;
   5662       if (info.bpp == 24) {
   5663          easy = 1;
   5664       } else if (info.bpp == 32) {
   5665          if (mb == 0xff && mg == 0xff00 && mr == 0x00ff0000 && ma == 0xff000000)
   5666             easy = 2;
   5667       }
   5668       if (!easy) {
   5669          if (!mr || !mg || !mb) { STBI_FREE(out); return stbi__errpuc("bad masks", "Corrupt BMP"); }
   5670          // right shift amt to put high bit in position #7
   5671          rshift = stbi__high_bit(mr)-7; rcount = stbi__bitcount(mr);
   5672          gshift = stbi__high_bit(mg)-7; gcount = stbi__bitcount(mg);
   5673          bshift = stbi__high_bit(mb)-7; bcount = stbi__bitcount(mb);
   5674          ashift = stbi__high_bit(ma)-7; acount = stbi__bitcount(ma);
   5675          if (rcount > 8 || gcount > 8 || bcount > 8 || acount > 8) { STBI_FREE(out); return stbi__errpuc("bad masks", "Corrupt BMP"); }
   5676       }
   5677       for (j=0; j < (int) s->img_y; ++j) {
   5678          if (easy) {
   5679             for (i=0; i < (int) s->img_x; ++i) {
   5680                unsigned char a;
   5681                out[z+2] = stbi__get8(s);
   5682                out[z+1] = stbi__get8(s);
   5683                out[z+0] = stbi__get8(s);
   5684                z += 3;
   5685                a = (easy == 2 ? stbi__get8(s) : 255);
   5686                all_a |= a;
   5687                if (target == 4) out[z++] = a;
   5688             }
   5689          } else {
   5690             int bpp = info.bpp;
   5691             for (i=0; i < (int) s->img_x; ++i) {
   5692                stbi__uint32 v = (bpp == 16 ? (stbi__uint32) stbi__get16le(s) : stbi__get32le(s));
   5693                unsigned int a;
   5694                out[z++] = STBI__BYTECAST(stbi__shiftsigned(v & mr, rshift, rcount));
   5695                out[z++] = STBI__BYTECAST(stbi__shiftsigned(v & mg, gshift, gcount));
   5696                out[z++] = STBI__BYTECAST(stbi__shiftsigned(v & mb, bshift, bcount));
   5697                a = (ma ? stbi__shiftsigned(v & ma, ashift, acount) : 255);
   5698                all_a |= a;
   5699                if (target == 4) out[z++] = STBI__BYTECAST(a);
   5700             }
   5701          }
   5702          stbi__skip(s, pad);
   5703       }
   5704    }
   5705 
   5706    // if alpha channel is all 0s, replace with all 255s
   5707    if (target == 4 && all_a == 0)
   5708       for (i=4*s->img_x*s->img_y-1; i >= 0; i -= 4)
   5709          out[i] = 255;
   5710 
   5711    if (flip_vertically) {
   5712       stbi_uc t;
   5713       for (j=0; j < (int) s->img_y>>1; ++j) {
   5714          stbi_uc *p1 = out +      j     *s->img_x*target;
   5715          stbi_uc *p2 = out + (s->img_y-1-j)*s->img_x*target;
   5716          for (i=0; i < (int) s->img_x*target; ++i) {
   5717             t = p1[i]; p1[i] = p2[i]; p2[i] = t;
   5718          }
   5719       }
   5720    }
   5721 
   5722    if (req_comp && req_comp != target) {
   5723       out = stbi__convert_format(out, target, req_comp, s->img_x, s->img_y);
   5724       if (out == NULL) return out; // stbi__convert_format frees input on failure
   5725    }
   5726 
   5727    *x = s->img_x;
   5728    *y = s->img_y;
   5729    if (comp) *comp = s->img_n;
   5730    return out;
   5731 }
   5732 #endif
   5733 
   5734 // Targa Truevision - TGA
   5735 // by Jonathan Dummer
   5736 #ifndef STBI_NO_TGA
   5737 // returns STBI_rgb or whatever, 0 on error
   5738 static int stbi__tga_get_comp(int bits_per_pixel, int is_grey, int* is_rgb16)
   5739 {
   5740    // only RGB or RGBA (incl. 16bit) or grey allowed
   5741    if (is_rgb16) *is_rgb16 = 0;
   5742    switch(bits_per_pixel) {
   5743       case 8:  return STBI_grey;
   5744       case 16: if(is_grey) return STBI_grey_alpha;
   5745                // fallthrough
   5746       case 15: if(is_rgb16) *is_rgb16 = 1;
   5747                return STBI_rgb;
   5748       case 24: // fallthrough
   5749       case 32: return bits_per_pixel/8;
   5750       default: return 0;
   5751    }
   5752 }
   5753 
   5754 static int stbi__tga_info(stbi__context *s, int *x, int *y, int *comp)
   5755 {
   5756     int tga_w, tga_h, tga_comp, tga_image_type, tga_bits_per_pixel, tga_colormap_bpp;
   5757     int sz, tga_colormap_type;
   5758     stbi__get8(s);                   // discard Offset
   5759     tga_colormap_type = stbi__get8(s); // colormap type
   5760     if( tga_colormap_type > 1 ) {
   5761         stbi__rewind(s);
   5762         return 0;      // only RGB or indexed allowed
   5763     }
   5764     tga_image_type = stbi__get8(s); // image type
   5765     if ( tga_colormap_type == 1 ) { // colormapped (paletted) image
   5766         if (tga_image_type != 1 && tga_image_type != 9) {
   5767             stbi__rewind(s);
   5768             return 0;
   5769         }
   5770         stbi__skip(s,4);       // skip index of first colormap entry and number of entries
   5771         sz = stbi__get8(s);    //   check bits per palette color entry
   5772         if ( (sz != 8) && (sz != 15) && (sz != 16) && (sz != 24) && (sz != 32) ) {
   5773             stbi__rewind(s);
   5774             return 0;
   5775         }
   5776         stbi__skip(s,4);       // skip image x and y origin
   5777         tga_colormap_bpp = sz;
   5778     } else { // "normal" image w/o colormap - only RGB or grey allowed, +/- RLE
   5779         if ( (tga_image_type != 2) && (tga_image_type != 3) && (tga_image_type != 10) && (tga_image_type != 11) ) {
   5780             stbi__rewind(s);
   5781             return 0; // only RGB or grey allowed, +/- RLE
   5782         }
   5783         stbi__skip(s,9); // skip colormap specification and image x/y origin
   5784         tga_colormap_bpp = 0;
   5785     }
   5786     tga_w = stbi__get16le(s);
   5787     if( tga_w < 1 ) {
   5788         stbi__rewind(s);
   5789         return 0;   // test width
   5790     }
   5791     tga_h = stbi__get16le(s);
   5792     if( tga_h < 1 ) {
   5793         stbi__rewind(s);
   5794         return 0;   // test height
   5795     }
   5796     tga_bits_per_pixel = stbi__get8(s); // bits per pixel
   5797     stbi__get8(s); // ignore alpha bits
   5798     if (tga_colormap_bpp != 0) {
   5799         if((tga_bits_per_pixel != 8) && (tga_bits_per_pixel != 16)) {
   5800             // when using a colormap, tga_bits_per_pixel is the size of the indexes
   5801             // I don't think anything but 8 or 16bit indexes makes sense
   5802             stbi__rewind(s);
   5803             return 0;
   5804         }
   5805         tga_comp = stbi__tga_get_comp(tga_colormap_bpp, 0, NULL);
   5806     } else {
   5807         tga_comp = stbi__tga_get_comp(tga_bits_per_pixel, (tga_image_type == 3) || (tga_image_type == 11), NULL);
   5808     }
   5809     if(!tga_comp) {
   5810       stbi__rewind(s);
   5811       return 0;
   5812     }
   5813     if (x) *x = tga_w;
   5814     if (y) *y = tga_h;
   5815     if (comp) *comp = tga_comp;
   5816     return 1;                   // seems to have passed everything
   5817 }
   5818 
   5819 static int stbi__tga_test(stbi__context *s)
   5820 {
   5821    int res = 0;
   5822    int sz, tga_color_type;
   5823    stbi__get8(s);      //   discard Offset
   5824    tga_color_type = stbi__get8(s);   //   color type
   5825    if ( tga_color_type > 1 ) goto errorEnd;   //   only RGB or indexed allowed
   5826    sz = stbi__get8(s);   //   image type
   5827    if ( tga_color_type == 1 ) { // colormapped (paletted) image
   5828       if (sz != 1 && sz != 9) goto errorEnd; // colortype 1 demands image type 1 or 9
   5829       stbi__skip(s,4);       // skip index of first colormap entry and number of entries
   5830       sz = stbi__get8(s);    //   check bits per palette color entry
   5831       if ( (sz != 8) && (sz != 15) && (sz != 16) && (sz != 24) && (sz != 32) ) goto errorEnd;
   5832       stbi__skip(s,4);       // skip image x and y origin
   5833    } else { // "normal" image w/o colormap
   5834       if ( (sz != 2) && (sz != 3) && (sz != 10) && (sz != 11) ) goto errorEnd; // only RGB or grey allowed, +/- RLE
   5835       stbi__skip(s,9); // skip colormap specification and image x/y origin
   5836    }
   5837    if ( stbi__get16le(s) < 1 ) goto errorEnd;      //   test width
   5838    if ( stbi__get16le(s) < 1 ) goto errorEnd;      //   test height
   5839    sz = stbi__get8(s);   //   bits per pixel
   5840    if ( (tga_color_type == 1) && (sz != 8) && (sz != 16) ) goto errorEnd; // for colormapped images, bpp is size of an index
   5841    if ( (sz != 8) && (sz != 15) && (sz != 16) && (sz != 24) && (sz != 32) ) goto errorEnd;
   5842 
   5843    res = 1; // if we got this far, everything's good and we can return 1 instead of 0
   5844 
   5845 errorEnd:
   5846    stbi__rewind(s);
   5847    return res;
   5848 }
   5849 
   5850 // read 16bit value and convert to 24bit RGB
   5851 static void stbi__tga_read_rgb16(stbi__context *s, stbi_uc* out)
   5852 {
   5853    stbi__uint16 px = (stbi__uint16)stbi__get16le(s);
   5854    stbi__uint16 fiveBitMask = 31;
   5855    // we have 3 channels with 5bits each
   5856    int r = (px >> 10) & fiveBitMask;
   5857    int g = (px >> 5) & fiveBitMask;
   5858    int b = px & fiveBitMask;
   5859    // Note that this saves the data in RGB(A) order, so it doesn't need to be swapped later
   5860    out[0] = (stbi_uc)((r * 255)/31);
   5861    out[1] = (stbi_uc)((g * 255)/31);
   5862    out[2] = (stbi_uc)((b * 255)/31);
   5863 
   5864    // some people claim that the most significant bit might be used for alpha
   5865    // (possibly if an alpha-bit is set in the "image descriptor byte")
   5866    // but that only made 16bit test images completely translucent..
   5867    // so let's treat all 15 and 16bit TGAs as RGB with no alpha.
   5868 }
   5869 
   5870 static void *stbi__tga_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
   5871 {
   5872    //   read in the TGA header stuff
   5873    int tga_offset = stbi__get8(s);
   5874    int tga_indexed = stbi__get8(s);
   5875    int tga_image_type = stbi__get8(s);
   5876    int tga_is_RLE = 0;
   5877    int tga_palette_start = stbi__get16le(s);
   5878    int tga_palette_len = stbi__get16le(s);
   5879    int tga_palette_bits = stbi__get8(s);
   5880    int tga_x_origin = stbi__get16le(s);
   5881    int tga_y_origin = stbi__get16le(s);
   5882    int tga_width = stbi__get16le(s);
   5883    int tga_height = stbi__get16le(s);
   5884    int tga_bits_per_pixel = stbi__get8(s);
   5885    int tga_comp, tga_rgb16=0;
   5886    int tga_inverted = stbi__get8(s);
   5887    // int tga_alpha_bits = tga_inverted & 15; // the 4 lowest bits - unused (useless?)
   5888    //   image data
   5889    unsigned char *tga_data;
   5890    unsigned char *tga_palette = NULL;
   5891    int i, j;
   5892    unsigned char raw_data[4] = {0};
   5893    int RLE_count = 0;
   5894    int RLE_repeating = 0;
   5895    int read_next_pixel = 1;
   5896    STBI_NOTUSED(ri);
   5897    STBI_NOTUSED(tga_x_origin); // @TODO
   5898    STBI_NOTUSED(tga_y_origin); // @TODO
   5899 
   5900    if (tga_height > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
   5901    if (tga_width > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
   5902 
   5903    //   do a tiny bit of precessing
   5904    if ( tga_image_type >= 8 )
   5905    {
   5906       tga_image_type -= 8;
   5907       tga_is_RLE = 1;
   5908    }
   5909    tga_inverted = 1 - ((tga_inverted >> 5) & 1);
   5910 
   5911    //   If I'm paletted, then I'll use the number of bits from the palette
   5912    if ( tga_indexed ) tga_comp = stbi__tga_get_comp(tga_palette_bits, 0, &tga_rgb16);
   5913    else tga_comp = stbi__tga_get_comp(tga_bits_per_pixel, (tga_image_type == 3), &tga_rgb16);
   5914 
   5915    if(!tga_comp) // shouldn't really happen, stbi__tga_test() should have ensured basic consistency
   5916       return stbi__errpuc("bad format", "Can't find out TGA pixelformat");
   5917 
   5918    //   tga info
   5919    *x = tga_width;
   5920    *y = tga_height;
   5921    if (comp) *comp = tga_comp;
   5922 
   5923    if (!stbi__mad3sizes_valid(tga_width, tga_height, tga_comp, 0))
   5924       return stbi__errpuc("too large", "Corrupt TGA");
   5925 
   5926    tga_data = (unsigned char*)stbi__malloc_mad3(tga_width, tga_height, tga_comp, 0);
   5927    if (!tga_data) return stbi__errpuc("outofmem", "Out of memory");
   5928 
   5929    // skip to the data's starting position (offset usually = 0)
   5930    stbi__skip(s, tga_offset );
   5931 
   5932    if ( !tga_indexed && !tga_is_RLE && !tga_rgb16 ) {
   5933       for (i=0; i < tga_height; ++i) {
   5934          int row = tga_inverted ? tga_height -i - 1 : i;
   5935          stbi_uc *tga_row = tga_data + row*tga_width*tga_comp;
   5936          stbi__getn(s, tga_row, tga_width * tga_comp);
   5937       }
   5938    } else  {
   5939       //   do I need to load a palette?
   5940       if ( tga_indexed)
   5941       {
   5942          if (tga_palette_len == 0) {  /* you have to have at least one entry! */
   5943             STBI_FREE(tga_data);
   5944             return stbi__errpuc("bad palette", "Corrupt TGA");
   5945          }
   5946 
   5947          //   any data to skip? (offset usually = 0)
   5948          stbi__skip(s, tga_palette_start );
   5949          //   load the palette
   5950          tga_palette = (unsigned char*)stbi__malloc_mad2(tga_palette_len, tga_comp, 0);
   5951          if (!tga_palette) {
   5952             STBI_FREE(tga_data);
   5953             return stbi__errpuc("outofmem", "Out of memory");
   5954          }
   5955          if (tga_rgb16) {
   5956             stbi_uc *pal_entry = tga_palette;
   5957             STBI_ASSERT(tga_comp == STBI_rgb);
   5958             for (i=0; i < tga_palette_len; ++i) {
   5959                stbi__tga_read_rgb16(s, pal_entry);
   5960                pal_entry += tga_comp;
   5961             }
   5962          } else if (!stbi__getn(s, tga_palette, tga_palette_len * tga_comp)) {
   5963                STBI_FREE(tga_data);
   5964                STBI_FREE(tga_palette);
   5965                return stbi__errpuc("bad palette", "Corrupt TGA");
   5966          }
   5967       }
   5968       //   load the data
   5969       for (i=0; i < tga_width * tga_height; ++i)
   5970       {
   5971          //   if I'm in RLE mode, do I need to get a RLE stbi__pngchunk?
   5972          if ( tga_is_RLE )
   5973          {
   5974             if ( RLE_count == 0 )
   5975             {
   5976                //   yep, get the next byte as a RLE command
   5977                int RLE_cmd = stbi__get8(s);
   5978                RLE_count = 1 + (RLE_cmd & 127);
   5979                RLE_repeating = RLE_cmd >> 7;
   5980                read_next_pixel = 1;
   5981             } else if ( !RLE_repeating )
   5982             {
   5983                read_next_pixel = 1;
   5984             }
   5985          } else
   5986          {
   5987             read_next_pixel = 1;
   5988          }
   5989          //   OK, if I need to read a pixel, do it now
   5990          if ( read_next_pixel )
   5991          {
   5992             //   load however much data we did have
   5993             if ( tga_indexed )
   5994             {
   5995                // read in index, then perform the lookup
   5996                int pal_idx = (tga_bits_per_pixel == 8) ? stbi__get8(s) : stbi__get16le(s);
   5997                if ( pal_idx >= tga_palette_len ) {
   5998                   // invalid index
   5999                   pal_idx = 0;
   6000                }
   6001                pal_idx *= tga_comp;
   6002                for (j = 0; j < tga_comp; ++j) {
   6003                   raw_data[j] = tga_palette[pal_idx+j];
   6004                }
   6005             } else if(tga_rgb16) {
   6006                STBI_ASSERT(tga_comp == STBI_rgb);
   6007                stbi__tga_read_rgb16(s, raw_data);
   6008             } else {
   6009                //   read in the data raw
   6010                for (j = 0; j < tga_comp; ++j) {
   6011                   raw_data[j] = stbi__get8(s);
   6012                }
   6013             }
   6014             //   clear the reading flag for the next pixel
   6015             read_next_pixel = 0;
   6016          } // end of reading a pixel
   6017 
   6018          // copy data
   6019          for (j = 0; j < tga_comp; ++j)
   6020            tga_data[i*tga_comp+j] = raw_data[j];
   6021 
   6022          //   in case we're in RLE mode, keep counting down
   6023          --RLE_count;
   6024       }
   6025       //   do I need to invert the image?
   6026       if ( tga_inverted )
   6027       {
   6028          for (j = 0; j*2 < tga_height; ++j)
   6029          {
   6030             int index1 = j * tga_width * tga_comp;
   6031             int index2 = (tga_height - 1 - j) * tga_width * tga_comp;
   6032             for (i = tga_width * tga_comp; i > 0; --i)
   6033             {
   6034                unsigned char temp = tga_data[index1];
   6035                tga_data[index1] = tga_data[index2];
   6036                tga_data[index2] = temp;
   6037                ++index1;
   6038                ++index2;
   6039             }
   6040          }
   6041       }
   6042       //   clear my palette, if I had one
   6043       if ( tga_palette != NULL )
   6044       {
   6045          STBI_FREE( tga_palette );
   6046       }
   6047    }
   6048 
   6049    // swap RGB - if the source data was RGB16, it already is in the right order
   6050    if (tga_comp >= 3 && !tga_rgb16)
   6051    {
   6052       unsigned char* tga_pixel = tga_data;
   6053       for (i=0; i < tga_width * tga_height; ++i)
   6054       {
   6055          unsigned char temp = tga_pixel[0];
   6056          tga_pixel[0] = tga_pixel[2];
   6057          tga_pixel[2] = temp;
   6058          tga_pixel += tga_comp;
   6059       }
   6060    }
   6061 
   6062    // convert to target component count
   6063    if (req_comp && req_comp != tga_comp)
   6064       tga_data = stbi__convert_format(tga_data, tga_comp, req_comp, tga_width, tga_height);
   6065 
   6066    //   the things I do to get rid of an error message, and yet keep
   6067    //   Microsoft's C compilers happy... [8^(
   6068    tga_palette_start = tga_palette_len = tga_palette_bits =
   6069          tga_x_origin = tga_y_origin = 0;
   6070    STBI_NOTUSED(tga_palette_start);
   6071    //   OK, done
   6072    return tga_data;
   6073 }
   6074 #endif
   6075 
   6076 // *************************************************************************************************
   6077 // Photoshop PSD loader -- PD by Thatcher Ulrich, integration by Nicolas Schulz, tweaked by STB
   6078 
   6079 #ifndef STBI_NO_PSD
   6080 static int stbi__psd_test(stbi__context *s)
   6081 {
   6082    int r = (stbi__get32be(s) == 0x38425053);
   6083    stbi__rewind(s);
   6084    return r;
   6085 }
   6086 
   6087 static int stbi__psd_decode_rle(stbi__context *s, stbi_uc *p, int pixelCount)
   6088 {
   6089    int count, nleft, len;
   6090 
   6091    count = 0;
   6092    while ((nleft = pixelCount - count) > 0) {
   6093       len = stbi__get8(s);
   6094       if (len == 128) {
   6095          // No-op.
   6096       } else if (len < 128) {
   6097          // Copy next len+1 bytes literally.
   6098          len++;
   6099          if (len > nleft) return 0; // corrupt data
   6100          count += len;
   6101          while (len) {
   6102             *p = stbi__get8(s);
   6103             p += 4;
   6104             len--;
   6105          }
   6106       } else if (len > 128) {
   6107          stbi_uc   val;
   6108          // Next -len+1 bytes in the dest are replicated from next source byte.
   6109          // (Interpret len as a negative 8-bit int.)
   6110          len = 257 - len;
   6111          if (len > nleft) return 0; // corrupt data
   6112          val = stbi__get8(s);
   6113          count += len;
   6114          while (len) {
   6115             *p = val;
   6116             p += 4;
   6117             len--;
   6118          }
   6119       }
   6120    }
   6121 
   6122    return 1;
   6123 }
   6124 
   6125 static void *stbi__psd_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri, int bpc)
   6126 {
   6127    int pixelCount;
   6128    int channelCount, compression;
   6129    int channel, i;
   6130    int bitdepth;
   6131    int w,h;
   6132    stbi_uc *out;
   6133    STBI_NOTUSED(ri);
   6134 
   6135    // Check identifier
   6136    if (stbi__get32be(s) != 0x38425053)   // "8BPS"
   6137       return stbi__errpuc("not PSD", "Corrupt PSD image");
   6138 
   6139    // Check file type version.
   6140    if (stbi__get16be(s) != 1)
   6141       return stbi__errpuc("wrong version", "Unsupported version of PSD image");
   6142 
   6143    // Skip 6 reserved bytes.
   6144    stbi__skip(s, 6 );
   6145 
   6146    // Read the number of channels (R, G, B, A, etc).
   6147    channelCount = stbi__get16be(s);
   6148    if (channelCount < 0 || channelCount > 16)
   6149       return stbi__errpuc("wrong channel count", "Unsupported number of channels in PSD image");
   6150 
   6151    // Read the rows and columns of the image.
   6152    h = stbi__get32be(s);
   6153    w = stbi__get32be(s);
   6154 
   6155    if (h > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
   6156    if (w > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
   6157 
   6158    // Make sure the depth is 8 bits.
   6159    bitdepth = stbi__get16be(s);
   6160    if (bitdepth != 8 && bitdepth != 16)
   6161       return stbi__errpuc("unsupported bit depth", "PSD bit depth is not 8 or 16 bit");
   6162 
   6163    // Make sure the color mode is RGB.
   6164    // Valid options are:
   6165    //   0: Bitmap
   6166    //   1: Grayscale
   6167    //   2: Indexed color
   6168    //   3: RGB color
   6169    //   4: CMYK color
   6170    //   7: Multichannel
   6171    //   8: Duotone
   6172    //   9: Lab color
   6173    if (stbi__get16be(s) != 3)
   6174       return stbi__errpuc("wrong color format", "PSD is not in RGB color format");
   6175 
   6176    // Skip the Mode Data.  (It's the palette for indexed color; other info for other modes.)
   6177    stbi__skip(s,stbi__get32be(s) );
   6178 
   6179    // Skip the image resources.  (resolution, pen tool paths, etc)
   6180    stbi__skip(s, stbi__get32be(s) );
   6181 
   6182    // Skip the reserved data.
   6183    stbi__skip(s, stbi__get32be(s) );
   6184 
   6185    // Find out if the data is compressed.
   6186    // Known values:
   6187    //   0: no compression
   6188    //   1: RLE compressed
   6189    compression = stbi__get16be(s);
   6190    if (compression > 1)
   6191       return stbi__errpuc("bad compression", "PSD has an unknown compression format");
   6192 
   6193    // Check size
   6194    if (!stbi__mad3sizes_valid(4, w, h, 0))
   6195       return stbi__errpuc("too large", "Corrupt PSD");
   6196 
   6197    // Create the destination image.
   6198 
   6199    if (!compression && bitdepth == 16 && bpc == 16) {
   6200       out = (stbi_uc *) stbi__malloc_mad3(8, w, h, 0);
   6201       ri->bits_per_channel = 16;
   6202    } else
   6203       out = (stbi_uc *) stbi__malloc(4 * w*h);
   6204 
   6205    if (!out) return stbi__errpuc("outofmem", "Out of memory");
   6206    pixelCount = w*h;
   6207 
   6208    // Initialize the data to zero.
   6209    //memset( out, 0, pixelCount * 4 );
   6210 
   6211    // Finally, the image data.
   6212    if (compression) {
   6213       // RLE as used by .PSD and .TIFF
   6214       // Loop until you get the number of unpacked bytes you are expecting:
   6215       //     Read the next source byte into n.
   6216       //     If n is between 0 and 127 inclusive, copy the next n+1 bytes literally.
   6217       //     Else if n is between -127 and -1 inclusive, copy the next byte -n+1 times.
   6218       //     Else if n is 128, noop.
   6219       // Endloop
   6220 
   6221       // The RLE-compressed data is preceded by a 2-byte data count for each row in the data,
   6222       // which we're going to just skip.
   6223       stbi__skip(s, h * channelCount * 2 );
   6224 
   6225       // Read the RLE data by channel.
   6226       for (channel = 0; channel < 4; channel++) {
   6227          stbi_uc *p;
   6228 
   6229          p = out+channel;
   6230          if (channel >= channelCount) {
   6231             // Fill this channel with default data.
   6232             for (i = 0; i < pixelCount; i++, p += 4)
   6233                *p = (channel == 3 ? 255 : 0);
   6234          } else {
   6235             // Read the RLE data.
   6236             if (!stbi__psd_decode_rle(s, p, pixelCount)) {
   6237                STBI_FREE(out);
   6238                return stbi__errpuc("corrupt", "bad RLE data");
   6239             }
   6240          }
   6241       }
   6242 
   6243    } else {
   6244       // We're at the raw image data.  It's each channel in order (Red, Green, Blue, Alpha, ...)
   6245       // where each channel consists of an 8-bit (or 16-bit) value for each pixel in the image.
   6246 
   6247       // Read the data by channel.
   6248       for (channel = 0; channel < 4; channel++) {
   6249          if (channel >= channelCount) {
   6250             // Fill this channel with default data.
   6251             if (bitdepth == 16 && bpc == 16) {
   6252                stbi__uint16 *q = ((stbi__uint16 *) out) + channel;
   6253                stbi__uint16 val = channel == 3 ? 65535 : 0;
   6254                for (i = 0; i < pixelCount; i++, q += 4)
   6255                   *q = val;
   6256             } else {
   6257                stbi_uc *p = out+channel;
   6258                stbi_uc val = channel == 3 ? 255 : 0;
   6259                for (i = 0; i < pixelCount; i++, p += 4)
   6260                   *p = val;
   6261             }
   6262          } else {
   6263             if (ri->bits_per_channel == 16) {    // output bpc
   6264                stbi__uint16 *q = ((stbi__uint16 *) out) + channel;
   6265                for (i = 0; i < pixelCount; i++, q += 4)
   6266                   *q = (stbi__uint16) stbi__get16be(s);
   6267             } else {
   6268                stbi_uc *p = out+channel;
   6269                if (bitdepth == 16) {  // input bpc
   6270                   for (i = 0; i < pixelCount; i++, p += 4)
   6271                      *p = (stbi_uc) (stbi__get16be(s) >> 8);
   6272                } else {
   6273                   for (i = 0; i < pixelCount; i++, p += 4)
   6274                      *p = stbi__get8(s);
   6275                }
   6276             }
   6277          }
   6278       }
   6279    }
   6280 
   6281    // remove weird white matte from PSD
   6282    if (channelCount >= 4) {
   6283       if (ri->bits_per_channel == 16) {
   6284          for (i=0; i < w*h; ++i) {
   6285             stbi__uint16 *pixel = (stbi__uint16 *) out + 4*i;
   6286             if (pixel[3] != 0 && pixel[3] != 65535) {
   6287                float a = pixel[3] / 65535.0f;
   6288                float ra = 1.0f / a;
   6289                float inv_a = 65535.0f * (1 - ra);
   6290                pixel[0] = (stbi__uint16) (pixel[0]*ra + inv_a);
   6291                pixel[1] = (stbi__uint16) (pixel[1]*ra + inv_a);
   6292                pixel[2] = (stbi__uint16) (pixel[2]*ra + inv_a);
   6293             }
   6294          }
   6295       } else {
   6296          for (i=0; i < w*h; ++i) {
   6297             unsigned char *pixel = out + 4*i;
   6298             if (pixel[3] != 0 && pixel[3] != 255) {
   6299                float a = pixel[3] / 255.0f;
   6300                float ra = 1.0f / a;
   6301                float inv_a = 255.0f * (1 - ra);
   6302                pixel[0] = (unsigned char) (pixel[0]*ra + inv_a);
   6303                pixel[1] = (unsigned char) (pixel[1]*ra + inv_a);
   6304                pixel[2] = (unsigned char) (pixel[2]*ra + inv_a);
   6305             }
   6306          }
   6307       }
   6308    }
   6309 
   6310    // convert to desired output format
   6311    if (req_comp && req_comp != 4) {
   6312       if (ri->bits_per_channel == 16)
   6313          out = (stbi_uc *) stbi__convert_format16((stbi__uint16 *) out, 4, req_comp, w, h);
   6314       else
   6315          out = stbi__convert_format(out, 4, req_comp, w, h);
   6316       if (out == NULL) return out; // stbi__convert_format frees input on failure
   6317    }
   6318 
   6319    if (comp) *comp = 4;
   6320    *y = h;
   6321    *x = w;
   6322 
   6323    return out;
   6324 }
   6325 #endif
   6326 
   6327 // *************************************************************************************************
   6328 // Softimage PIC loader
   6329 // by Tom Seddon
   6330 //
   6331 // See http://softimage.wiki.softimage.com/index.php/INFO:_PIC_file_format
   6332 // See http://ozviz.wasp.uwa.edu.au/~pbourke/dataformats/softimagepic/
   6333 
   6334 #ifndef STBI_NO_PIC
   6335 static int stbi__pic_is4(stbi__context *s,const char *str)
   6336 {
   6337    int i;
   6338    for (i=0; i<4; ++i)
   6339       if (stbi__get8(s) != (stbi_uc)str[i])
   6340          return 0;
   6341 
   6342    return 1;
   6343 }
   6344 
   6345 static int stbi__pic_test_core(stbi__context *s)
   6346 {
   6347    int i;
   6348 
   6349    if (!stbi__pic_is4(s,"\x53\x80\xF6\x34"))
   6350       return 0;
   6351 
   6352    for(i=0;i<84;++i)
   6353       stbi__get8(s);
   6354 
   6355    if (!stbi__pic_is4(s,"PICT"))
   6356       return 0;
   6357 
   6358    return 1;
   6359 }
   6360 
   6361 typedef struct
   6362 {
   6363    stbi_uc size,type,channel;
   6364 } stbi__pic_packet;
   6365 
   6366 static stbi_uc *stbi__readval(stbi__context *s, int channel, stbi_uc *dest)
   6367 {
   6368    int mask=0x80, i;
   6369 
   6370    for (i=0; i<4; ++i, mask>>=1) {
   6371       if (channel & mask) {
   6372          if (stbi__at_eof(s)) return stbi__errpuc("bad file","PIC file too short");
   6373          dest[i]=stbi__get8(s);
   6374       }
   6375    }
   6376 
   6377    return dest;
   6378 }
   6379 
   6380 static void stbi__copyval(int channel,stbi_uc *dest,const stbi_uc *src)
   6381 {
   6382    int mask=0x80,i;
   6383 
   6384    for (i=0;i<4; ++i, mask>>=1)
   6385       if (channel&mask)
   6386          dest[i]=src[i];
   6387 }
   6388 
   6389 static stbi_uc *stbi__pic_load_core(stbi__context *s,int width,int height,int *comp, stbi_uc *result)
   6390 {
   6391    int act_comp=0,num_packets=0,y,chained;
   6392    stbi__pic_packet packets[10];
   6393 
   6394    // this will (should...) cater for even some bizarre stuff like having data
   6395     // for the same channel in multiple packets.
   6396    do {
   6397       stbi__pic_packet *packet;
   6398 
   6399       if (num_packets==sizeof(packets)/sizeof(packets[0]))
   6400          return stbi__errpuc("bad format","too many packets");
   6401 
   6402       packet = &packets[num_packets++];
   6403 
   6404       chained = stbi__get8(s);
   6405       packet->size    = stbi__get8(s);
   6406       packet->type    = stbi__get8(s);
   6407       packet->channel = stbi__get8(s);
   6408 
   6409       act_comp |= packet->channel;
   6410 
   6411       if (stbi__at_eof(s))          return stbi__errpuc("bad file","file too short (reading packets)");
   6412       if (packet->size != 8)  return stbi__errpuc("bad format","packet isn't 8bpp");
   6413    } while (chained);
   6414 
   6415    *comp = (act_comp & 0x10 ? 4 : 3); // has alpha channel?
   6416 
   6417    for(y=0; y<height; ++y) {
   6418       int packet_idx;
   6419 
   6420       for(packet_idx=0; packet_idx < num_packets; ++packet_idx) {
   6421          stbi__pic_packet *packet = &packets[packet_idx];
   6422          stbi_uc *dest = result+y*width*4;
   6423 
   6424          switch (packet->type) {
   6425             default:
   6426                return stbi__errpuc("bad format","packet has bad compression type");
   6427 
   6428             case 0: {//uncompressed
   6429                int x;
   6430 
   6431                for(x=0;x<width;++x, dest+=4)
   6432                   if (!stbi__readval(s,packet->channel,dest))
   6433                      return 0;
   6434                break;
   6435             }
   6436 
   6437             case 1://Pure RLE
   6438                {
   6439                   int left=width, i;
   6440 
   6441                   while (left>0) {
   6442                      stbi_uc count,value[4];
   6443 
   6444                      count=stbi__get8(s);
   6445                      if (stbi__at_eof(s))   return stbi__errpuc("bad file","file too short (pure read count)");
   6446 
   6447                      if (count > left)
   6448                         count = (stbi_uc) left;
   6449 
   6450                      if (!stbi__readval(s,packet->channel,value))  return 0;
   6451 
   6452                      for(i=0; i<count; ++i,dest+=4)
   6453                         stbi__copyval(packet->channel,dest,value);
   6454                      left -= count;
   6455                   }
   6456                }
   6457                break;
   6458 
   6459             case 2: {//Mixed RLE
   6460                int left=width;
   6461                while (left>0) {
   6462                   int count = stbi__get8(s), i;
   6463                   if (stbi__at_eof(s))  return stbi__errpuc("bad file","file too short (mixed read count)");
   6464 
   6465                   if (count >= 128) { // Repeated
   6466                      stbi_uc value[4];
   6467 
   6468                      if (count==128)
   6469                         count = stbi__get16be(s);
   6470                      else
   6471                         count -= 127;
   6472                      if (count > left)
   6473                         return stbi__errpuc("bad file","scanline overrun");
   6474 
   6475                      if (!stbi__readval(s,packet->channel,value))
   6476                         return 0;
   6477 
   6478                      for(i=0;i<count;++i, dest += 4)
   6479                         stbi__copyval(packet->channel,dest,value);
   6480                   } else { // Raw
   6481                      ++count;
   6482                      if (count>left) return stbi__errpuc("bad file","scanline overrun");
   6483 
   6484                      for(i=0;i<count;++i, dest+=4)
   6485                         if (!stbi__readval(s,packet->channel,dest))
   6486                            return 0;
   6487                   }
   6488                   left-=count;
   6489                }
   6490                break;
   6491             }
   6492          }
   6493       }
   6494    }
   6495 
   6496    return result;
   6497 }
   6498 
   6499 static void *stbi__pic_load(stbi__context *s,int *px,int *py,int *comp,int req_comp, stbi__result_info *ri)
   6500 {
   6501    stbi_uc *result;
   6502    int i, x,y, internal_comp;
   6503    STBI_NOTUSED(ri);
   6504 
   6505    if (!comp) comp = &internal_comp;
   6506 
   6507    for (i=0; i<92; ++i)
   6508       stbi__get8(s);
   6509 
   6510    x = stbi__get16be(s);
   6511    y = stbi__get16be(s);
   6512 
   6513    if (y > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
   6514    if (x > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
   6515 
   6516    if (stbi__at_eof(s))  return stbi__errpuc("bad file","file too short (pic header)");
   6517    if (!stbi__mad3sizes_valid(x, y, 4, 0)) return stbi__errpuc("too large", "PIC image too large to decode");
   6518 
   6519    stbi__get32be(s); //skip `ratio'
   6520    stbi__get16be(s); //skip `fields'
   6521    stbi__get16be(s); //skip `pad'
   6522 
   6523    // intermediate buffer is RGBA
   6524    result = (stbi_uc *) stbi__malloc_mad3(x, y, 4, 0);
   6525    if (!result) return stbi__errpuc("outofmem", "Out of memory");
   6526    memset(result, 0xff, x*y*4);
   6527 
   6528    if (!stbi__pic_load_core(s,x,y,comp, result)) {
   6529       STBI_FREE(result);
   6530       result=0;
   6531    }
   6532    *px = x;
   6533    *py = y;
   6534    if (req_comp == 0) req_comp = *comp;
   6535    result=stbi__convert_format(result,4,req_comp,x,y);
   6536 
   6537    return result;
   6538 }
   6539 
   6540 static int stbi__pic_test(stbi__context *s)
   6541 {
   6542    int r = stbi__pic_test_core(s);
   6543    stbi__rewind(s);
   6544    return r;
   6545 }
   6546 #endif
   6547 
   6548 // *************************************************************************************************
   6549 // GIF loader -- public domain by Jean-Marc Lienher -- simplified/shrunk by stb
   6550 
   6551 #ifndef STBI_NO_GIF
   6552 typedef struct
   6553 {
   6554    stbi__int16 prefix;
   6555    stbi_uc first;
   6556    stbi_uc suffix;
   6557 } stbi__gif_lzw;
   6558 
   6559 typedef struct
   6560 {
   6561    int w,h;
   6562    stbi_uc *out;                 // output buffer (always 4 components)
   6563    stbi_uc *background;          // The current "background" as far as a gif is concerned
   6564    stbi_uc *history;
   6565    int flags, bgindex, ratio, transparent, eflags;
   6566    stbi_uc  pal[256][4];
   6567    stbi_uc lpal[256][4];
   6568    stbi__gif_lzw codes[8192];
   6569    stbi_uc *color_table;
   6570    int parse, step;
   6571    int lflags;
   6572    int start_x, start_y;
   6573    int max_x, max_y;
   6574    int cur_x, cur_y;
   6575    int line_size;
   6576    int delay;
   6577 } stbi__gif;
   6578 
   6579 static int stbi__gif_test_raw(stbi__context *s)
   6580 {
   6581    int sz;
   6582    if (stbi__get8(s) != 'G' || stbi__get8(s) != 'I' || stbi__get8(s) != 'F' || stbi__get8(s) != '8') return 0;
   6583    sz = stbi__get8(s);
   6584    if (sz != '9' && sz != '7') return 0;
   6585    if (stbi__get8(s) != 'a') return 0;
   6586    return 1;
   6587 }
   6588 
   6589 static int stbi__gif_test(stbi__context *s)
   6590 {
   6591    int r = stbi__gif_test_raw(s);
   6592    stbi__rewind(s);
   6593    return r;
   6594 }
   6595 
   6596 static void stbi__gif_parse_colortable(stbi__context *s, stbi_uc pal[256][4], int num_entries, int transp)
   6597 {
   6598    int i;
   6599    for (i=0; i < num_entries; ++i) {
   6600       pal[i][2] = stbi__get8(s);
   6601       pal[i][1] = stbi__get8(s);
   6602       pal[i][0] = stbi__get8(s);
   6603       pal[i][3] = transp == i ? 0 : 255;
   6604    }
   6605 }
   6606 
   6607 static int stbi__gif_header(stbi__context *s, stbi__gif *g, int *comp, int is_info)
   6608 {
   6609    stbi_uc version;
   6610    if (stbi__get8(s) != 'G' || stbi__get8(s) != 'I' || stbi__get8(s) != 'F' || stbi__get8(s) != '8')
   6611       return stbi__err("not GIF", "Corrupt GIF");
   6612 
   6613    version = stbi__get8(s);
   6614    if (version != '7' && version != '9')    return stbi__err("not GIF", "Corrupt GIF");
   6615    if (stbi__get8(s) != 'a')                return stbi__err("not GIF", "Corrupt GIF");
   6616 
   6617    stbi__g_failure_reason = "";
   6618    g->w = stbi__get16le(s);
   6619    g->h = stbi__get16le(s);
   6620    g->flags = stbi__get8(s);
   6621    g->bgindex = stbi__get8(s);
   6622    g->ratio = stbi__get8(s);
   6623    g->transparent = -1;
   6624 
   6625    if (g->w > STBI_MAX_DIMENSIONS) return stbi__err("too large","Very large image (corrupt?)");
   6626    if (g->h > STBI_MAX_DIMENSIONS) return stbi__err("too large","Very large image (corrupt?)");
   6627 
   6628    if (comp != 0) *comp = 4;  // can't actually tell whether it's 3 or 4 until we parse the comments
   6629 
   6630    if (is_info) return 1;
   6631 
   6632    if (g->flags & 0x80)
   6633       stbi__gif_parse_colortable(s,g->pal, 2 << (g->flags & 7), -1);
   6634 
   6635    return 1;
   6636 }
   6637 
   6638 static int stbi__gif_info_raw(stbi__context *s, int *x, int *y, int *comp)
   6639 {
   6640    stbi__gif* g = (stbi__gif*) stbi__malloc(sizeof(stbi__gif));
   6641    if (!g) return stbi__err("outofmem", "Out of memory");
   6642    if (!stbi__gif_header(s, g, comp, 1)) {
   6643       STBI_FREE(g);
   6644       stbi__rewind( s );
   6645       return 0;
   6646    }
   6647    if (x) *x = g->w;
   6648    if (y) *y = g->h;
   6649    STBI_FREE(g);
   6650    return 1;
   6651 }
   6652 
   6653 static void stbi__out_gif_code(stbi__gif *g, stbi__uint16 code)
   6654 {
   6655    stbi_uc *p, *c;
   6656    int idx;
   6657 
   6658    // recurse to decode the prefixes, since the linked-list is backwards,
   6659    // and working backwards through an interleaved image would be nasty
   6660    if (g->codes[code].prefix >= 0)
   6661       stbi__out_gif_code(g, g->codes[code].prefix);
   6662 
   6663    if (g->cur_y >= g->max_y) return;
   6664 
   6665    idx = g->cur_x + g->cur_y;
   6666    p = &g->out[idx];
   6667    g->history[idx / 4] = 1;
   6668 
   6669    c = &g->color_table[g->codes[code].suffix * 4];
   6670    if (c[3] > 128) { // don't render transparent pixels;
   6671       p[0] = c[2];
   6672       p[1] = c[1];
   6673       p[2] = c[0];
   6674       p[3] = c[3];
   6675    }
   6676    g->cur_x += 4;
   6677 
   6678    if (g->cur_x >= g->max_x) {
   6679       g->cur_x = g->start_x;
   6680       g->cur_y += g->step;
   6681 
   6682       while (g->cur_y >= g->max_y && g->parse > 0) {
   6683          g->step = (1 << g->parse) * g->line_size;
   6684          g->cur_y = g->start_y + (g->step >> 1);
   6685          --g->parse;
   6686       }
   6687    }
   6688 }
   6689 
   6690 static stbi_uc *stbi__process_gif_raster(stbi__context *s, stbi__gif *g)
   6691 {
   6692    stbi_uc lzw_cs;
   6693    stbi__int32 len, init_code;
   6694    stbi__uint32 first;
   6695    stbi__int32 codesize, codemask, avail, oldcode, bits, valid_bits, clear;
   6696    stbi__gif_lzw *p;
   6697 
   6698    lzw_cs = stbi__get8(s);
   6699    if (lzw_cs > 12) return NULL;
   6700    clear = 1 << lzw_cs;
   6701    first = 1;
   6702    codesize = lzw_cs + 1;
   6703    codemask = (1 << codesize) - 1;
   6704    bits = 0;
   6705    valid_bits = 0;
   6706    for (init_code = 0; init_code < clear; init_code++) {
   6707       g->codes[init_code].prefix = -1;
   6708       g->codes[init_code].first = (stbi_uc) init_code;
   6709       g->codes[init_code].suffix = (stbi_uc) init_code;
   6710    }
   6711 
   6712    // support no starting clear code
   6713    avail = clear+2;
   6714    oldcode = -1;
   6715 
   6716    len = 0;
   6717    for(;;) {
   6718       if (valid_bits < codesize) {
   6719          if (len == 0) {
   6720             len = stbi__get8(s); // start new block
   6721             if (len == 0)
   6722                return g->out;
   6723          }
   6724          --len;
   6725          bits |= (stbi__int32) stbi__get8(s) << valid_bits;
   6726          valid_bits += 8;
   6727       } else {
   6728          stbi__int32 code = bits & codemask;
   6729          bits >>= codesize;
   6730          valid_bits -= codesize;
   6731          // @OPTIMIZE: is there some way we can accelerate the non-clear path?
   6732          if (code == clear) {  // clear code
   6733             codesize = lzw_cs + 1;
   6734             codemask = (1 << codesize) - 1;
   6735             avail = clear + 2;
   6736             oldcode = -1;
   6737             first = 0;
   6738          } else if (code == clear + 1) { // end of stream code
   6739             stbi__skip(s, len);
   6740             while ((len = stbi__get8(s)) > 0)
   6741                stbi__skip(s,len);
   6742             return g->out;
   6743          } else if (code <= avail) {
   6744             if (first) {
   6745                return stbi__errpuc("no clear code", "Corrupt GIF");
   6746             }
   6747 
   6748             if (oldcode >= 0) {
   6749                p = &g->codes[avail++];
   6750                if (avail > 8192) {
   6751                   return stbi__errpuc("too many codes", "Corrupt GIF");
   6752                }
   6753 
   6754                p->prefix = (stbi__int16) oldcode;
   6755                p->first = g->codes[oldcode].first;
   6756                p->suffix = (code == avail) ? p->first : g->codes[code].first;
   6757             } else if (code == avail)
   6758                return stbi__errpuc("illegal code in raster", "Corrupt GIF");
   6759 
   6760             stbi__out_gif_code(g, (stbi__uint16) code);
   6761 
   6762             if ((avail & codemask) == 0 && avail <= 0x0FFF) {
   6763                codesize++;
   6764                codemask = (1 << codesize) - 1;
   6765             }
   6766 
   6767             oldcode = code;
   6768          } else {
   6769             return stbi__errpuc("illegal code in raster", "Corrupt GIF");
   6770          }
   6771       }
   6772    }
   6773 }
   6774 
   6775 // this function is designed to support animated gifs, although stb_image doesn't support it
   6776 // two back is the image from two frames ago, used for a very specific disposal format
   6777 static stbi_uc *stbi__gif_load_next(stbi__context *s, stbi__gif *g, int *comp, int req_comp, stbi_uc *two_back)
   6778 {
   6779    int dispose;
   6780    int first_frame;
   6781    int pi;
   6782    int pcount;
   6783    STBI_NOTUSED(req_comp);
   6784 
   6785    // on first frame, any non-written pixels get the background colour (non-transparent)
   6786    first_frame = 0;
   6787    if (g->out == 0) {
   6788       if (!stbi__gif_header(s, g, comp,0)) return 0; // stbi__g_failure_reason set by stbi__gif_header
   6789       if (!stbi__mad3sizes_valid(4, g->w, g->h, 0))
   6790          return stbi__errpuc("too large", "GIF image is too large");
   6791       pcount = g->w * g->h;
   6792       g->out = (stbi_uc *) stbi__malloc(4 * pcount);
   6793       g->background = (stbi_uc *) stbi__malloc(4 * pcount);
   6794       g->history = (stbi_uc *) stbi__malloc(pcount);
   6795       if (!g->out || !g->background || !g->history)
   6796          return stbi__errpuc("outofmem", "Out of memory");
   6797 
   6798       // image is treated as "transparent" at the start - ie, nothing overwrites the current background;
   6799       // background colour is only used for pixels that are not rendered first frame, after that "background"
   6800       // color refers to the color that was there the previous frame.
   6801       memset(g->out, 0x00, 4 * pcount);
   6802       memset(g->background, 0x00, 4 * pcount); // state of the background (starts transparent)
   6803       memset(g->history, 0x00, pcount);        // pixels that were affected previous frame
   6804       first_frame = 1;
   6805    } else {
   6806       // second frame - how do we dispose of the previous one?
   6807       dispose = (g->eflags & 0x1C) >> 2;
   6808       pcount = g->w * g->h;
   6809 
   6810       if ((dispose == 3) && (two_back == 0)) {
   6811          dispose = 2; // if I don't have an image to revert back to, default to the old background
   6812       }
   6813 
   6814       if (dispose == 3) { // use previous graphic
   6815          for (pi = 0; pi < pcount; ++pi) {
   6816             if (g->history[pi]) {
   6817                memcpy( &g->out[pi * 4], &two_back[pi * 4], 4 );
   6818             }
   6819          }
   6820       } else if (dispose == 2) {
   6821          // restore what was changed last frame to background before that frame;
   6822          for (pi = 0; pi < pcount; ++pi) {
   6823             if (g->history[pi]) {
   6824                memcpy( &g->out[pi * 4], &g->background[pi * 4], 4 );
   6825             }
   6826          }
   6827       } else {
   6828          // This is a non-disposal case eithe way, so just
   6829          // leave the pixels as is, and they will become the new background
   6830          // 1: do not dispose
   6831          // 0:  not specified.
   6832       }
   6833 
   6834       // background is what out is after the undoing of the previou frame;
   6835       memcpy( g->background, g->out, 4 * g->w * g->h );
   6836    }
   6837 
   6838    // clear my history;
   6839    memset( g->history, 0x00, g->w * g->h );        // pixels that were affected previous frame
   6840 
   6841    for (;;) {
   6842       int tag = stbi__get8(s);
   6843       switch (tag) {
   6844          case 0x2C: /* Image Descriptor */
   6845          {
   6846             stbi__int32 x, y, w, h;
   6847             stbi_uc *o;
   6848 
   6849             x = stbi__get16le(s);
   6850             y = stbi__get16le(s);
   6851             w = stbi__get16le(s);
   6852             h = stbi__get16le(s);
   6853             if (((x + w) > (g->w)) || ((y + h) > (g->h)))
   6854                return stbi__errpuc("bad Image Descriptor", "Corrupt GIF");
   6855 
   6856             g->line_size = g->w * 4;
   6857             g->start_x = x * 4;
   6858             g->start_y = y * g->line_size;
   6859             g->max_x   = g->start_x + w * 4;
   6860             g->max_y   = g->start_y + h * g->line_size;
   6861             g->cur_x   = g->start_x;
   6862             g->cur_y   = g->start_y;
   6863 
   6864             // if the width of the specified rectangle is 0, that means
   6865             // we may not see *any* pixels or the image is malformed;
   6866             // to make sure this is caught, move the current y down to
   6867             // max_y (which is what out_gif_code checks).
   6868             if (w == 0)
   6869                g->cur_y = g->max_y;
   6870 
   6871             g->lflags = stbi__get8(s);
   6872 
   6873             if (g->lflags & 0x40) {
   6874                g->step = 8 * g->line_size; // first interlaced spacing
   6875                g->parse = 3;
   6876             } else {
   6877                g->step = g->line_size;
   6878                g->parse = 0;
   6879             }
   6880 
   6881             if (g->lflags & 0x80) {
   6882                stbi__gif_parse_colortable(s,g->lpal, 2 << (g->lflags & 7), g->eflags & 0x01 ? g->transparent : -1);
   6883                g->color_table = (stbi_uc *) g->lpal;
   6884             } else if (g->flags & 0x80) {
   6885                g->color_table = (stbi_uc *) g->pal;
   6886             } else
   6887                return stbi__errpuc("missing color table", "Corrupt GIF");
   6888 
   6889             o = stbi__process_gif_raster(s, g);
   6890             if (!o) return NULL;
   6891 
   6892             // if this was the first frame,
   6893             pcount = g->w * g->h;
   6894             if (first_frame && (g->bgindex > 0)) {
   6895                // if first frame, any pixel not drawn to gets the background color
   6896                for (pi = 0; pi < pcount; ++pi) {
   6897                   if (g->history[pi] == 0) {
   6898                      g->pal[g->bgindex][3] = 255; // just in case it was made transparent, undo that; It will be reset next frame if need be;
   6899                      memcpy( &g->out[pi * 4], &g->pal[g->bgindex], 4 );
   6900                   }
   6901                }
   6902             }
   6903 
   6904             return o;
   6905          }
   6906 
   6907          case 0x21: // Comment Extension.
   6908          {
   6909             int len;
   6910             int ext = stbi__get8(s);
   6911             if (ext == 0xF9) { // Graphic Control Extension.
   6912                len = stbi__get8(s);
   6913                if (len == 4) {
   6914                   g->eflags = stbi__get8(s);
   6915                   g->delay = 10 * stbi__get16le(s); // delay - 1/100th of a second, saving as 1/1000ths.
   6916 
   6917                   // unset old transparent
   6918                   if (g->transparent >= 0) {
   6919                      g->pal[g->transparent][3] = 255;
   6920                   }
   6921                   if (g->eflags & 0x01) {
   6922                      g->transparent = stbi__get8(s);
   6923                      if (g->transparent >= 0) {
   6924                         g->pal[g->transparent][3] = 0;
   6925                      }
   6926                   } else {
   6927                      // don't need transparent
   6928                      stbi__skip(s, 1);
   6929                      g->transparent = -1;
   6930                   }
   6931                } else {
   6932                   stbi__skip(s, len);
   6933                   break;
   6934                }
   6935             }
   6936             while ((len = stbi__get8(s)) != 0) {
   6937                stbi__skip(s, len);
   6938             }
   6939             break;
   6940          }
   6941 
   6942          case 0x3B: // gif stream termination code
   6943             return (stbi_uc *) s; // using '1' causes warning on some compilers
   6944 
   6945          default:
   6946             return stbi__errpuc("unknown code", "Corrupt GIF");
   6947       }
   6948    }
   6949 }
   6950 
   6951 static void *stbi__load_gif_main_outofmem(stbi__gif *g, stbi_uc *out, int **delays)
   6952 {
   6953    STBI_FREE(g->out);
   6954    STBI_FREE(g->history);
   6955    STBI_FREE(g->background);
   6956 
   6957    if (out) STBI_FREE(out);
   6958    if (delays && *delays) STBI_FREE(*delays);
   6959    return stbi__errpuc("outofmem", "Out of memory");
   6960 }
   6961 
   6962 static void *stbi__load_gif_main(stbi__context *s, int **delays, int *x, int *y, int *z, int *comp, int req_comp)
   6963 {
   6964    if (stbi__gif_test(s)) {
   6965       int layers = 0;
   6966       stbi_uc *u = 0;
   6967       stbi_uc *out = 0;
   6968       stbi_uc *two_back = 0;
   6969       stbi__gif g;
   6970       int stride;
   6971       int out_size = 0;
   6972       int delays_size = 0;
   6973 
   6974       STBI_NOTUSED(out_size);
   6975       STBI_NOTUSED(delays_size);
   6976 
   6977       memset(&g, 0, sizeof(g));
   6978       if (delays) {
   6979          *delays = 0;
   6980       }
   6981 
   6982       do {
   6983          u = stbi__gif_load_next(s, &g, comp, req_comp, two_back);
   6984          if (u == (stbi_uc *) s) u = 0;  // end of animated gif marker
   6985 
   6986          if (u) {
   6987             *x = g.w;
   6988             *y = g.h;
   6989             ++layers;
   6990             stride = g.w * g.h * 4;
   6991 
   6992             if (out) {
   6993                void *tmp = (stbi_uc*) STBI_REALLOC_SIZED( out, out_size, layers * stride );
   6994                if (!tmp)
   6995                   return stbi__load_gif_main_outofmem(&g, out, delays);
   6996                else {
   6997                    out = (stbi_uc*) tmp;
   6998                    out_size = layers * stride;
   6999                }
   7000 
   7001                if (delays) {
   7002                   int *new_delays = (int*) STBI_REALLOC_SIZED( *delays, delays_size, sizeof(int) * layers );
   7003                   if (!new_delays)
   7004                      return stbi__load_gif_main_outofmem(&g, out, delays);
   7005                   *delays = new_delays;
   7006                   delays_size = layers * sizeof(int);
   7007                }
   7008             } else {
   7009                out = (stbi_uc*)stbi__malloc( layers * stride );
   7010                if (!out)
   7011                   return stbi__load_gif_main_outofmem(&g, out, delays);
   7012                out_size = layers * stride;
   7013                if (delays) {
   7014                   *delays = (int*) stbi__malloc( layers * sizeof(int) );
   7015                   if (!*delays)
   7016                      return stbi__load_gif_main_outofmem(&g, out, delays);
   7017                   delays_size = layers * sizeof(int);
   7018                }
   7019             }
   7020             memcpy( out + ((layers - 1) * stride), u, stride );
   7021             if (layers >= 2) {
   7022                two_back = out - 2 * stride;
   7023             }
   7024 
   7025             if (delays) {
   7026                (*delays)[layers - 1U] = g.delay;
   7027             }
   7028          }
   7029       } while (u != 0);
   7030 
   7031       // free temp buffer;
   7032       STBI_FREE(g.out);
   7033       STBI_FREE(g.history);
   7034       STBI_FREE(g.background);
   7035 
   7036       // do the final conversion after loading everything;
   7037       if (req_comp && req_comp != 4)
   7038          out = stbi__convert_format(out, 4, req_comp, layers * g.w, g.h);
   7039 
   7040       *z = layers;
   7041       return out;
   7042    } else {
   7043       return stbi__errpuc("not GIF", "Image was not as a gif type.");
   7044    }
   7045 }
   7046 
   7047 static void *stbi__gif_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
   7048 {
   7049    stbi_uc *u = 0;
   7050    stbi__gif g;
   7051    memset(&g, 0, sizeof(g));
   7052    STBI_NOTUSED(ri);
   7053 
   7054    u = stbi__gif_load_next(s, &g, comp, req_comp, 0);
   7055    if (u == (stbi_uc *) s) u = 0;  // end of animated gif marker
   7056    if (u) {
   7057       *x = g.w;
   7058       *y = g.h;
   7059 
   7060       // moved conversion to after successful load so that the same
   7061       // can be done for multiple frames.
   7062       if (req_comp && req_comp != 4)
   7063          u = stbi__convert_format(u, 4, req_comp, g.w, g.h);
   7064    } else if (g.out) {
   7065       // if there was an error and we allocated an image buffer, free it!
   7066       STBI_FREE(g.out);
   7067    }
   7068 
   7069    // free buffers needed for multiple frame loading;
   7070    STBI_FREE(g.history);
   7071    STBI_FREE(g.background);
   7072 
   7073    return u;
   7074 }
   7075 
   7076 static int stbi__gif_info(stbi__context *s, int *x, int *y, int *comp)
   7077 {
   7078    return stbi__gif_info_raw(s,x,y,comp);
   7079 }
   7080 #endif
   7081 
   7082 // *************************************************************************************************
   7083 // Radiance RGBE HDR loader
   7084 // originally by Nicolas Schulz
   7085 #ifndef STBI_NO_HDR
   7086 static int stbi__hdr_test_core(stbi__context *s, const char *signature)
   7087 {
   7088    int i;
   7089    for (i=0; signature[i]; ++i)
   7090       if (stbi__get8(s) != signature[i])
   7091           return 0;
   7092    stbi__rewind(s);
   7093    return 1;
   7094 }
   7095 
   7096 static int stbi__hdr_test(stbi__context* s)
   7097 {
   7098    int r = stbi__hdr_test_core(s, "#?RADIANCE\n");
   7099    stbi__rewind(s);
   7100    if(!r) {
   7101        r = stbi__hdr_test_core(s, "#?RGBE\n");
   7102        stbi__rewind(s);
   7103    }
   7104    return r;
   7105 }
   7106 
   7107 #define STBI__HDR_BUFLEN  1024
   7108 static char *stbi__hdr_gettoken(stbi__context *z, char *buffer)
   7109 {
   7110    int len=0;
   7111    char c = '\0';
   7112 
   7113    c = (char) stbi__get8(z);
   7114 
   7115    while (!stbi__at_eof(z) && c != '\n') {
   7116       buffer[len++] = c;
   7117       if (len == STBI__HDR_BUFLEN-1) {
   7118          // flush to end of line
   7119          while (!stbi__at_eof(z) && stbi__get8(z) != '\n')
   7120             ;
   7121          break;
   7122       }
   7123       c = (char) stbi__get8(z);
   7124    }
   7125 
   7126    buffer[len] = 0;
   7127    return buffer;
   7128 }
   7129 
   7130 static void stbi__hdr_convert(float *output, stbi_uc *input, int req_comp)
   7131 {
   7132    if ( input[3] != 0 ) {
   7133       float f1;
   7134       // Exponent
   7135       f1 = (float) ldexp(1.0f, input[3] - (int)(128 + 8));
   7136       if (req_comp <= 2)
   7137          output[0] = (input[0] + input[1] + input[2]) * f1 / 3;
   7138       else {
   7139          output[0] = input[0] * f1;
   7140          output[1] = input[1] * f1;
   7141          output[2] = input[2] * f1;
   7142       }
   7143       if (req_comp == 2) output[1] = 1;
   7144       if (req_comp == 4) output[3] = 1;
   7145    } else {
   7146       switch (req_comp) {
   7147          case 4: output[3] = 1; /* fallthrough */
   7148          case 3: output[0] = output[1] = output[2] = 0;
   7149                  break;
   7150          case 2: output[1] = 1; /* fallthrough */
   7151          case 1: output[0] = 0;
   7152                  break;
   7153       }
   7154    }
   7155 }
   7156 
   7157 static float *stbi__hdr_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
   7158 {
   7159    char buffer[STBI__HDR_BUFLEN];
   7160    char *token;
   7161    int valid = 0;
   7162    int width, height;
   7163    stbi_uc *scanline;
   7164    float *hdr_data;
   7165    int len;
   7166    unsigned char count, value;
   7167    int i, j, k, c1,c2, z;
   7168    const char *headerToken;
   7169    STBI_NOTUSED(ri);
   7170 
   7171    // Check identifier
   7172    headerToken = stbi__hdr_gettoken(s,buffer);
   7173    if (strcmp(headerToken, "#?RADIANCE") != 0 && strcmp(headerToken, "#?RGBE") != 0)
   7174       return stbi__errpf("not HDR", "Corrupt HDR image");
   7175 
   7176    // Parse header
   7177    for(;;) {
   7178       token = stbi__hdr_gettoken(s,buffer);
   7179       if (token[0] == 0) break;
   7180       if (strcmp(token, "FORMAT=32-bit_rle_rgbe") == 0) valid = 1;
   7181    }
   7182 
   7183    if (!valid)    return stbi__errpf("unsupported format", "Unsupported HDR format");
   7184 
   7185    // Parse width and height
   7186    // can't use sscanf() if we're not using stdio!
   7187    token = stbi__hdr_gettoken(s,buffer);
   7188    if (strncmp(token, "-Y ", 3))  return stbi__errpf("unsupported data layout", "Unsupported HDR format");
   7189    token += 3;
   7190    height = (int) strtol(token, &token, 10);
   7191    while (*token == ' ') ++token;
   7192    if (strncmp(token, "+X ", 3))  return stbi__errpf("unsupported data layout", "Unsupported HDR format");
   7193    token += 3;
   7194    width = (int) strtol(token, NULL, 10);
   7195 
   7196    if (height > STBI_MAX_DIMENSIONS) return stbi__errpf("too large","Very large image (corrupt?)");
   7197    if (width > STBI_MAX_DIMENSIONS) return stbi__errpf("too large","Very large image (corrupt?)");
   7198 
   7199    *x = width;
   7200    *y = height;
   7201 
   7202    if (comp) *comp = 3;
   7203    if (req_comp == 0) req_comp = 3;
   7204 
   7205    if (!stbi__mad4sizes_valid(width, height, req_comp, sizeof(float), 0))
   7206       return stbi__errpf("too large", "HDR image is too large");
   7207 
   7208    // Read data
   7209    hdr_data = (float *) stbi__malloc_mad4(width, height, req_comp, sizeof(float), 0);
   7210    if (!hdr_data)
   7211       return stbi__errpf("outofmem", "Out of memory");
   7212 
   7213    // Load image data
   7214    // image data is stored as some number of sca
   7215    if ( width < 8 || width >= 32768) {
   7216       // Read flat data
   7217       for (j=0; j < height; ++j) {
   7218          for (i=0; i < width; ++i) {
   7219             stbi_uc rgbe[4];
   7220            main_decode_loop:
   7221             stbi__getn(s, rgbe, 4);
   7222             stbi__hdr_convert(hdr_data + j * width * req_comp + i * req_comp, rgbe, req_comp);
   7223          }
   7224       }
   7225    } else {
   7226       // Read RLE-encoded data
   7227       scanline = NULL;
   7228 
   7229       for (j = 0; j < height; ++j) {
   7230          c1 = stbi__get8(s);
   7231          c2 = stbi__get8(s);
   7232          len = stbi__get8(s);
   7233          if (c1 != 2 || c2 != 2 || (len & 0x80)) {
   7234             // not run-length encoded, so we have to actually use THIS data as a decoded
   7235             // pixel (note this can't be a valid pixel--one of RGB must be >= 128)
   7236             stbi_uc rgbe[4];
   7237             rgbe[0] = (stbi_uc) c1;
   7238             rgbe[1] = (stbi_uc) c2;
   7239             rgbe[2] = (stbi_uc) len;
   7240             rgbe[3] = (stbi_uc) stbi__get8(s);
   7241             stbi__hdr_convert(hdr_data, rgbe, req_comp);
   7242             i = 1;
   7243             j = 0;
   7244             STBI_FREE(scanline);
   7245             goto main_decode_loop; // yes, this makes no sense
   7246          }
   7247          len <<= 8;
   7248          len |= stbi__get8(s);
   7249          if (len != width) { STBI_FREE(hdr_data); STBI_FREE(scanline); return stbi__errpf("invalid decoded scanline length", "corrupt HDR"); }
   7250          if (scanline == NULL) {
   7251             scanline = (stbi_uc *) stbi__malloc_mad2(width, 4, 0);
   7252             if (!scanline) {
   7253                STBI_FREE(hdr_data);
   7254                return stbi__errpf("outofmem", "Out of memory");
   7255             }
   7256          }
   7257 
   7258          for (k = 0; k < 4; ++k) {
   7259             int nleft;
   7260             i = 0;
   7261             while ((nleft = width - i) > 0) {
   7262                count = stbi__get8(s);
   7263                if (count > 128) {
   7264                   // Run
   7265                   value = stbi__get8(s);
   7266                   count -= 128;
   7267                   if ((count == 0) || (count > nleft)) { STBI_FREE(hdr_data); STBI_FREE(scanline); return stbi__errpf("corrupt", "bad RLE data in HDR"); }
   7268                   for (z = 0; z < count; ++z)
   7269                      scanline[i++ * 4 + k] = value;
   7270                } else {
   7271                   // Dump
   7272                   if ((count == 0) || (count > nleft)) { STBI_FREE(hdr_data); STBI_FREE(scanline); return stbi__errpf("corrupt", "bad RLE data in HDR"); }
   7273                   for (z = 0; z < count; ++z)
   7274                      scanline[i++ * 4 + k] = stbi__get8(s);
   7275                }
   7276             }
   7277          }
   7278          for (i=0; i < width; ++i)
   7279             stbi__hdr_convert(hdr_data+(j*width + i)*req_comp, scanline + i*4, req_comp);
   7280       }
   7281       if (scanline)
   7282          STBI_FREE(scanline);
   7283    }
   7284 
   7285    return hdr_data;
   7286 }
   7287 
   7288 static int stbi__hdr_info(stbi__context *s, int *x, int *y, int *comp)
   7289 {
   7290    char buffer[STBI__HDR_BUFLEN];
   7291    char *token;
   7292    int valid = 0;
   7293    int dummy;
   7294 
   7295    if (!x) x = &dummy;
   7296    if (!y) y = &dummy;
   7297    if (!comp) comp = &dummy;
   7298 
   7299    if (stbi__hdr_test(s) == 0) {
   7300        stbi__rewind( s );
   7301        return 0;
   7302    }
   7303 
   7304    for(;;) {
   7305       token = stbi__hdr_gettoken(s,buffer);
   7306       if (token[0] == 0) break;
   7307       if (strcmp(token, "FORMAT=32-bit_rle_rgbe") == 0) valid = 1;
   7308    }
   7309 
   7310    if (!valid) {
   7311        stbi__rewind( s );
   7312        return 0;
   7313    }
   7314    token = stbi__hdr_gettoken(s,buffer);
   7315    if (strncmp(token, "-Y ", 3)) {
   7316        stbi__rewind( s );
   7317        return 0;
   7318    }
   7319    token += 3;
   7320    *y = (int) strtol(token, &token, 10);
   7321    while (*token == ' ') ++token;
   7322    if (strncmp(token, "+X ", 3)) {
   7323        stbi__rewind( s );
   7324        return 0;
   7325    }
   7326    token += 3;
   7327    *x = (int) strtol(token, NULL, 10);
   7328    *comp = 3;
   7329    return 1;
   7330 }
   7331 #endif // STBI_NO_HDR
   7332 
   7333 #ifndef STBI_NO_BMP
   7334 static int stbi__bmp_info(stbi__context *s, int *x, int *y, int *comp)
   7335 {
   7336    void *p;
   7337    stbi__bmp_data info;
   7338 
   7339    info.all_a = 255;
   7340    p = stbi__bmp_parse_header(s, &info);
   7341    if (p == NULL) {
   7342       stbi__rewind( s );
   7343       return 0;
   7344    }
   7345    if (x) *x = s->img_x;
   7346    if (y) *y = s->img_y;
   7347    if (comp) {
   7348       if (info.bpp == 24 && info.ma == 0xff000000)
   7349          *comp = 3;
   7350       else
   7351          *comp = info.ma ? 4 : 3;
   7352    }
   7353    return 1;
   7354 }
   7355 #endif
   7356 
   7357 #ifndef STBI_NO_PSD
   7358 static int stbi__psd_info(stbi__context *s, int *x, int *y, int *comp)
   7359 {
   7360    int channelCount, dummy, depth;
   7361    if (!x) x = &dummy;
   7362    if (!y) y = &dummy;
   7363    if (!comp) comp = &dummy;
   7364    if (stbi__get32be(s) != 0x38425053) {
   7365        stbi__rewind( s );
   7366        return 0;
   7367    }
   7368    if (stbi__get16be(s) != 1) {
   7369        stbi__rewind( s );
   7370        return 0;
   7371    }
   7372    stbi__skip(s, 6);
   7373    channelCount = stbi__get16be(s);
   7374    if (channelCount < 0 || channelCount > 16) {
   7375        stbi__rewind( s );
   7376        return 0;
   7377    }
   7378    *y = stbi__get32be(s);
   7379    *x = stbi__get32be(s);
   7380    depth = stbi__get16be(s);
   7381    if (depth != 8 && depth != 16) {
   7382        stbi__rewind( s );
   7383        return 0;
   7384    }
   7385    if (stbi__get16be(s) != 3) {
   7386        stbi__rewind( s );
   7387        return 0;
   7388    }
   7389    *comp = 4;
   7390    return 1;
   7391 }
   7392 
   7393 static int stbi__psd_is16(stbi__context *s)
   7394 {
   7395    int channelCount, depth;
   7396    if (stbi__get32be(s) != 0x38425053) {
   7397        stbi__rewind( s );
   7398        return 0;
   7399    }
   7400    if (stbi__get16be(s) != 1) {
   7401        stbi__rewind( s );
   7402        return 0;
   7403    }
   7404    stbi__skip(s, 6);
   7405    channelCount = stbi__get16be(s);
   7406    if (channelCount < 0 || channelCount > 16) {
   7407        stbi__rewind( s );
   7408        return 0;
   7409    }
   7410    STBI_NOTUSED(stbi__get32be(s));
   7411    STBI_NOTUSED(stbi__get32be(s));
   7412    depth = stbi__get16be(s);
   7413    if (depth != 16) {
   7414        stbi__rewind( s );
   7415        return 0;
   7416    }
   7417    return 1;
   7418 }
   7419 #endif
   7420 
   7421 #ifndef STBI_NO_PIC
   7422 static int stbi__pic_info(stbi__context *s, int *x, int *y, int *comp)
   7423 {
   7424    int act_comp=0,num_packets=0,chained,dummy;
   7425    stbi__pic_packet packets[10];
   7426 
   7427    if (!x) x = &dummy;
   7428    if (!y) y = &dummy;
   7429    if (!comp) comp = &dummy;
   7430 
   7431    if (!stbi__pic_is4(s,"\x53\x80\xF6\x34")) {
   7432       stbi__rewind(s);
   7433       return 0;
   7434    }
   7435 
   7436    stbi__skip(s, 88);
   7437 
   7438    *x = stbi__get16be(s);
   7439    *y = stbi__get16be(s);
   7440    if (stbi__at_eof(s)) {
   7441       stbi__rewind( s);
   7442       return 0;
   7443    }
   7444    if ( (*x) != 0 && (1 << 28) / (*x) < (*y)) {
   7445       stbi__rewind( s );
   7446       return 0;
   7447    }
   7448 
   7449    stbi__skip(s, 8);
   7450 
   7451    do {
   7452       stbi__pic_packet *packet;
   7453 
   7454       if (num_packets==sizeof(packets)/sizeof(packets[0]))
   7455          return 0;
   7456 
   7457       packet = &packets[num_packets++];
   7458       chained = stbi__get8(s);
   7459       packet->size    = stbi__get8(s);
   7460       packet->type    = stbi__get8(s);
   7461       packet->channel = stbi__get8(s);
   7462       act_comp |= packet->channel;
   7463 
   7464       if (stbi__at_eof(s)) {
   7465           stbi__rewind( s );
   7466           return 0;
   7467       }
   7468       if (packet->size != 8) {
   7469           stbi__rewind( s );
   7470           return 0;
   7471       }
   7472    } while (chained);
   7473 
   7474    *comp = (act_comp & 0x10 ? 4 : 3);
   7475 
   7476    return 1;
   7477 }
   7478 #endif
   7479 
   7480 // *************************************************************************************************
   7481 // Portable Gray Map and Portable Pixel Map loader
   7482 // by Ken Miller
   7483 //
   7484 // PGM: http://netpbm.sourceforge.net/doc/pgm.html
   7485 // PPM: http://netpbm.sourceforge.net/doc/ppm.html
   7486 //
   7487 // Known limitations:
   7488 //    Does not support comments in the header section
   7489 //    Does not support ASCII image data (formats P2 and P3)
   7490 
   7491 #ifndef STBI_NO_PNM
   7492 
   7493 static int      stbi__pnm_test(stbi__context *s)
   7494 {
   7495    char p, t;
   7496    p = (char) stbi__get8(s);
   7497    t = (char) stbi__get8(s);
   7498    if (p != 'P' || (t != '5' && t != '6')) {
   7499        stbi__rewind( s );
   7500        return 0;
   7501    }
   7502    return 1;
   7503 }
   7504 
   7505 static void *stbi__pnm_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
   7506 {
   7507    stbi_uc *out;
   7508    STBI_NOTUSED(ri);
   7509 
   7510    ri->bits_per_channel = stbi__pnm_info(s, (int *)&s->img_x, (int *)&s->img_y, (int *)&s->img_n);
   7511    if (ri->bits_per_channel == 0)
   7512       return 0;
   7513 
   7514    if (s->img_y > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
   7515    if (s->img_x > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
   7516 
   7517    *x = s->img_x;
   7518    *y = s->img_y;
   7519    if (comp) *comp = s->img_n;
   7520 
   7521    if (!stbi__mad4sizes_valid(s->img_n, s->img_x, s->img_y, ri->bits_per_channel / 8, 0))
   7522       return stbi__errpuc("too large", "PNM too large");
   7523 
   7524    out = (stbi_uc *) stbi__malloc_mad4(s->img_n, s->img_x, s->img_y, ri->bits_per_channel / 8, 0);
   7525    if (!out) return stbi__errpuc("outofmem", "Out of memory");
   7526    if (!stbi__getn(s, out, s->img_n * s->img_x * s->img_y * (ri->bits_per_channel / 8))) {
   7527       STBI_FREE(out);
   7528       return stbi__errpuc("bad PNM", "PNM file truncated");
   7529    }
   7530 
   7531    if (req_comp && req_comp != s->img_n) {
   7532       if (ri->bits_per_channel == 16) {
   7533          out = (stbi_uc *) stbi__convert_format16((stbi__uint16 *) out, s->img_n, req_comp, s->img_x, s->img_y);
   7534       } else {
   7535          out = stbi__convert_format(out, s->img_n, req_comp, s->img_x, s->img_y);
   7536       }
   7537       if (out == NULL) return out; // stbi__convert_format frees input on failure
   7538    }
   7539    return out;
   7540 }
   7541 
   7542 static int      stbi__pnm_isspace(char c)
   7543 {
   7544    return c == ' ' || c == '\t' || c == '\n' || c == '\v' || c == '\f' || c == '\r';
   7545 }
   7546 
   7547 static void     stbi__pnm_skip_whitespace(stbi__context *s, char *c)
   7548 {
   7549    for (;;) {
   7550       while (!stbi__at_eof(s) && stbi__pnm_isspace(*c))
   7551          *c = (char) stbi__get8(s);
   7552 
   7553       if (stbi__at_eof(s) || *c != '#')
   7554          break;
   7555 
   7556       while (!stbi__at_eof(s) && *c != '\n' && *c != '\r' )
   7557          *c = (char) stbi__get8(s);
   7558    }
   7559 }
   7560 
   7561 static int      stbi__pnm_isdigit(char c)
   7562 {
   7563    return c >= '0' && c <= '9';
   7564 }
   7565 
   7566 static int      stbi__pnm_getinteger(stbi__context *s, char *c)
   7567 {
   7568    int value = 0;
   7569 
   7570    while (!stbi__at_eof(s) && stbi__pnm_isdigit(*c)) {
   7571       value = value*10 + (*c - '0');
   7572       *c = (char) stbi__get8(s);
   7573       if((value > 214748364) || (value == 214748364 && *c > '7'))
   7574           return stbi__err("integer parse overflow", "Parsing an integer in the PPM header overflowed a 32-bit int");
   7575    }
   7576 
   7577    return value;
   7578 }
   7579 
   7580 static int      stbi__pnm_info(stbi__context *s, int *x, int *y, int *comp)
   7581 {
   7582    int maxv, dummy;
   7583    char c, p, t;
   7584 
   7585    if (!x) x = &dummy;
   7586    if (!y) y = &dummy;
   7587    if (!comp) comp = &dummy;
   7588 
   7589    stbi__rewind(s);
   7590 
   7591    // Get identifier
   7592    p = (char) stbi__get8(s);
   7593    t = (char) stbi__get8(s);
   7594    if (p != 'P' || (t != '5' && t != '6')) {
   7595        stbi__rewind(s);
   7596        return 0;
   7597    }
   7598 
   7599    *comp = (t == '6') ? 3 : 1;  // '5' is 1-component .pgm; '6' is 3-component .ppm
   7600 
   7601    c = (char) stbi__get8(s);
   7602    stbi__pnm_skip_whitespace(s, &c);
   7603 
   7604    *x = stbi__pnm_getinteger(s, &c); // read width
   7605    if(*x == 0)
   7606        return stbi__err("invalid width", "PPM image header had zero or overflowing width");
   7607    stbi__pnm_skip_whitespace(s, &c);
   7608 
   7609    *y = stbi__pnm_getinteger(s, &c); // read height
   7610    if (*y == 0)
   7611        return stbi__err("invalid width", "PPM image header had zero or overflowing width");
   7612    stbi__pnm_skip_whitespace(s, &c);
   7613 
   7614    maxv = stbi__pnm_getinteger(s, &c);  // read max value
   7615    if (maxv > 65535)
   7616       return stbi__err("max value > 65535", "PPM image supports only 8-bit and 16-bit images");
   7617    else if (maxv > 255)
   7618       return 16;
   7619    else
   7620       return 8;
   7621 }
   7622 
   7623 static int stbi__pnm_is16(stbi__context *s)
   7624 {
   7625    if (stbi__pnm_info(s, NULL, NULL, NULL) == 16)
   7626 	   return 1;
   7627    return 0;
   7628 }
   7629 #endif
   7630 
   7631 static int stbi__info_main(stbi__context *s, int *x, int *y, int *comp)
   7632 {
   7633    #ifndef STBI_NO_JPEG
   7634    if (stbi__jpeg_info(s, x, y, comp)) return 1;
   7635    #endif
   7636 
   7637    #ifndef STBI_NO_PNG
   7638    if (stbi__png_info(s, x, y, comp))  return 1;
   7639    #endif
   7640 
   7641    #ifndef STBI_NO_GIF
   7642    if (stbi__gif_info(s, x, y, comp))  return 1;
   7643    #endif
   7644 
   7645    #ifndef STBI_NO_BMP
   7646    if (stbi__bmp_info(s, x, y, comp))  return 1;
   7647    #endif
   7648 
   7649    #ifndef STBI_NO_PSD
   7650    if (stbi__psd_info(s, x, y, comp))  return 1;
   7651    #endif
   7652 
   7653    #ifndef STBI_NO_PIC
   7654    if (stbi__pic_info(s, x, y, comp))  return 1;
   7655    #endif
   7656 
   7657    #ifndef STBI_NO_PNM
   7658    if (stbi__pnm_info(s, x, y, comp))  return 1;
   7659    #endif
   7660 
   7661    #ifndef STBI_NO_HDR
   7662    if (stbi__hdr_info(s, x, y, comp))  return 1;
   7663    #endif
   7664 
   7665    // test tga last because it's a crappy test!
   7666    #ifndef STBI_NO_TGA
   7667    if (stbi__tga_info(s, x, y, comp))
   7668        return 1;
   7669    #endif
   7670    return stbi__err("unknown image type", "Image not of any known type, or corrupt");
   7671 }
   7672 
   7673 static int stbi__is_16_main(stbi__context *s)
   7674 {
   7675    #ifndef STBI_NO_PNG
   7676    if (stbi__png_is16(s))  return 1;
   7677    #endif
   7678 
   7679    #ifndef STBI_NO_PSD
   7680    if (stbi__psd_is16(s))  return 1;
   7681    #endif
   7682 
   7683    #ifndef STBI_NO_PNM
   7684    if (stbi__pnm_is16(s))  return 1;
   7685    #endif
   7686    return 0;
   7687 }
   7688 
   7689 #ifndef STBI_NO_STDIO
   7690 STBIDEF int stbi_info(char const *filename, int *x, int *y, int *comp)
   7691 {
   7692     FILE *f = stbi__fopen(filename, "rb");
   7693     int result;
   7694     if (!f) return stbi__err("can't fopen", "Unable to open file");
   7695     result = stbi_info_from_file(f, x, y, comp);
   7696     fclose(f);
   7697     return result;
   7698 }
   7699 
   7700 STBIDEF int stbi_info_from_file(FILE *f, int *x, int *y, int *comp)
   7701 {
   7702    int r;
   7703    stbi__context s;
   7704    long pos = ftell(f);
   7705    stbi__start_file(&s, f);
   7706    r = stbi__info_main(&s,x,y,comp);
   7707    fseek(f,pos,SEEK_SET);
   7708    return r;
   7709 }
   7710 
   7711 STBIDEF int stbi_is_16_bit(char const *filename)
   7712 {
   7713     FILE *f = stbi__fopen(filename, "rb");
   7714     int result;
   7715     if (!f) return stbi__err("can't fopen", "Unable to open file");
   7716     result = stbi_is_16_bit_from_file(f);
   7717     fclose(f);
   7718     return result;
   7719 }
   7720 
   7721 STBIDEF int stbi_is_16_bit_from_file(FILE *f)
   7722 {
   7723    int r;
   7724    stbi__context s;
   7725    long pos = ftell(f);
   7726    stbi__start_file(&s, f);
   7727    r = stbi__is_16_main(&s);
   7728    fseek(f,pos,SEEK_SET);
   7729    return r;
   7730 }
   7731 #endif // !STBI_NO_STDIO
   7732 
   7733 STBIDEF int stbi_info_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp)
   7734 {
   7735    stbi__context s;
   7736    stbi__start_mem(&s,buffer,len);
   7737    return stbi__info_main(&s,x,y,comp);
   7738 }
   7739 
   7740 STBIDEF int stbi_info_from_callbacks(stbi_io_callbacks const *c, void *user, int *x, int *y, int *comp)
   7741 {
   7742    stbi__context s;
   7743    stbi__start_callbacks(&s, (stbi_io_callbacks *) c, user);
   7744    return stbi__info_main(&s,x,y,comp);
   7745 }
   7746 
   7747 STBIDEF int stbi_is_16_bit_from_memory(stbi_uc const *buffer, int len)
   7748 {
   7749    stbi__context s;
   7750    stbi__start_mem(&s,buffer,len);
   7751    return stbi__is_16_main(&s);
   7752 }
   7753 
   7754 STBIDEF int stbi_is_16_bit_from_callbacks(stbi_io_callbacks const *c, void *user)
   7755 {
   7756    stbi__context s;
   7757    stbi__start_callbacks(&s, (stbi_io_callbacks *) c, user);
   7758    return stbi__is_16_main(&s);
   7759 }
   7760 
   7761 #endif // STB_IMAGE_IMPLEMENTATION
   7762 
   7763 /*
   7764    revision history:
   7765       2.20  (2019-02-07) support utf8 filenames in Windows; fix warnings and platform ifdefs
   7766       2.19  (2018-02-11) fix warning
   7767       2.18  (2018-01-30) fix warnings
   7768       2.17  (2018-01-29) change sbti__shiftsigned to avoid clang -O2 bug
   7769                          1-bit BMP
   7770                          *_is_16_bit api
   7771                          avoid warnings
   7772       2.16  (2017-07-23) all functions have 16-bit variants;
   7773                          STBI_NO_STDIO works again;
   7774                          compilation fixes;
   7775                          fix rounding in unpremultiply;
   7776                          optimize vertical flip;
   7777                          disable raw_len validation;
   7778                          documentation fixes
   7779       2.15  (2017-03-18) fix png-1,2,4 bug; now all Imagenet JPGs decode;
   7780                          warning fixes; disable run-time SSE detection on gcc;
   7781                          uniform handling of optional "return" values;
   7782                          thread-safe initialization of zlib tables
   7783       2.14  (2017-03-03) remove deprecated STBI_JPEG_OLD; fixes for Imagenet JPGs
   7784       2.13  (2016-11-29) add 16-bit API, only supported for PNG right now
   7785       2.12  (2016-04-02) fix typo in 2.11 PSD fix that caused crashes
   7786       2.11  (2016-04-02) allocate large structures on the stack
   7787                          remove white matting for transparent PSD
   7788                          fix reported channel count for PNG & BMP
   7789                          re-enable SSE2 in non-gcc 64-bit
   7790                          support RGB-formatted JPEG
   7791                          read 16-bit PNGs (only as 8-bit)
   7792       2.10  (2016-01-22) avoid warning introduced in 2.09 by STBI_REALLOC_SIZED
   7793       2.09  (2016-01-16) allow comments in PNM files
   7794                          16-bit-per-pixel TGA (not bit-per-component)
   7795                          info() for TGA could break due to .hdr handling
   7796                          info() for BMP to shares code instead of sloppy parse
   7797                          can use STBI_REALLOC_SIZED if allocator doesn't support realloc
   7798                          code cleanup
   7799       2.08  (2015-09-13) fix to 2.07 cleanup, reading RGB PSD as RGBA
   7800       2.07  (2015-09-13) fix compiler warnings
   7801                          partial animated GIF support
   7802                          limited 16-bpc PSD support
   7803                          #ifdef unused functions
   7804                          bug with < 92 byte PIC,PNM,HDR,TGA
   7805       2.06  (2015-04-19) fix bug where PSD returns wrong '*comp' value
   7806       2.05  (2015-04-19) fix bug in progressive JPEG handling, fix warning
   7807       2.04  (2015-04-15) try to re-enable SIMD on MinGW 64-bit
   7808       2.03  (2015-04-12) extra corruption checking (mmozeiko)
   7809                          stbi_set_flip_vertically_on_load (nguillemot)
   7810                          fix NEON support; fix mingw support
   7811       2.02  (2015-01-19) fix incorrect assert, fix warning
   7812       2.01  (2015-01-17) fix various warnings; suppress SIMD on gcc 32-bit without -msse2
   7813       2.00b (2014-12-25) fix STBI_MALLOC in progressive JPEG
   7814       2.00  (2014-12-25) optimize JPG, including x86 SSE2 & NEON SIMD (ryg)
   7815                          progressive JPEG (stb)
   7816                          PGM/PPM support (Ken Miller)
   7817                          STBI_MALLOC,STBI_REALLOC,STBI_FREE
   7818                          GIF bugfix -- seemingly never worked
   7819                          STBI_NO_*, STBI_ONLY_*
   7820       1.48  (2014-12-14) fix incorrectly-named assert()
   7821       1.47  (2014-12-14) 1/2/4-bit PNG support, both direct and paletted (Omar Cornut & stb)
   7822                          optimize PNG (ryg)
   7823                          fix bug in interlaced PNG with user-specified channel count (stb)
   7824       1.46  (2014-08-26)
   7825               fix broken tRNS chunk (colorkey-style transparency) in non-paletted PNG
   7826       1.45  (2014-08-16)
   7827               fix MSVC-ARM internal compiler error by wrapping malloc
   7828       1.44  (2014-08-07)
   7829               various warning fixes from Ronny Chevalier
   7830       1.43  (2014-07-15)
   7831               fix MSVC-only compiler problem in code changed in 1.42
   7832       1.42  (2014-07-09)
   7833               don't define _CRT_SECURE_NO_WARNINGS (affects user code)
   7834               fixes to stbi__cleanup_jpeg path
   7835               added STBI_ASSERT to avoid requiring assert.h
   7836       1.41  (2014-06-25)
   7837               fix search&replace from 1.36 that messed up comments/error messages
   7838       1.40  (2014-06-22)
   7839               fix gcc struct-initialization warning
   7840       1.39  (2014-06-15)
   7841               fix to TGA optimization when req_comp != number of components in TGA;
   7842               fix to GIF loading because BMP wasn't rewinding (whoops, no GIFs in my test suite)
   7843               add support for BMP version 5 (more ignored fields)
   7844       1.38  (2014-06-06)
   7845               suppress MSVC warnings on integer casts truncating values
   7846               fix accidental rename of 'skip' field of I/O
   7847       1.37  (2014-06-04)
   7848               remove duplicate typedef
   7849       1.36  (2014-06-03)
   7850               convert to header file single-file library
   7851               if de-iphone isn't set, load iphone images color-swapped instead of returning NULL
   7852       1.35  (2014-05-27)
   7853               various warnings
   7854               fix broken STBI_SIMD path
   7855               fix bug where stbi_load_from_file no longer left file pointer in correct place
   7856               fix broken non-easy path for 32-bit BMP (possibly never used)
   7857               TGA optimization by Arseny Kapoulkine
   7858       1.34  (unknown)
   7859               use STBI_NOTUSED in stbi__resample_row_generic(), fix one more leak in tga failure case
   7860       1.33  (2011-07-14)
   7861               make stbi_is_hdr work in STBI_NO_HDR (as specified), minor compiler-friendly improvements
   7862       1.32  (2011-07-13)
   7863               support for "info" function for all supported filetypes (SpartanJ)
   7864       1.31  (2011-06-20)
   7865               a few more leak fixes, bug in PNG handling (SpartanJ)
   7866       1.30  (2011-06-11)
   7867               added ability to load files via callbacks to accomidate custom input streams (Ben Wenger)
   7868               removed deprecated format-specific test/load functions
   7869               removed support for installable file formats (stbi_loader) -- would have been broken for IO callbacks anyway
   7870               error cases in bmp and tga give messages and don't leak (Raymond Barbiero, grisha)
   7871               fix inefficiency in decoding 32-bit BMP (David Woo)
   7872       1.29  (2010-08-16)
   7873               various warning fixes from Aurelien Pocheville
   7874       1.28  (2010-08-01)
   7875               fix bug in GIF palette transparency (SpartanJ)
   7876       1.27  (2010-08-01)
   7877               cast-to-stbi_uc to fix warnings
   7878       1.26  (2010-07-24)
   7879               fix bug in file buffering for PNG reported by SpartanJ
   7880       1.25  (2010-07-17)
   7881               refix trans_data warning (Won Chun)
   7882       1.24  (2010-07-12)
   7883               perf improvements reading from files on platforms with lock-heavy fgetc()
   7884               minor perf improvements for jpeg
   7885               deprecated type-specific functions so we'll get feedback if they're needed
   7886               attempt to fix trans_data warning (Won Chun)
   7887       1.23    fixed bug in iPhone support
   7888       1.22  (2010-07-10)
   7889               removed image *writing* support
   7890               stbi_info support from Jetro Lauha
   7891               GIF support from Jean-Marc Lienher
   7892               iPhone PNG-extensions from James Brown
   7893               warning-fixes from Nicolas Schulz and Janez Zemva (i.stbi__err. Janez (U+017D)emva)
   7894       1.21    fix use of 'stbi_uc' in header (reported by jon blow)
   7895       1.20    added support for Softimage PIC, by Tom Seddon
   7896       1.19    bug in interlaced PNG corruption check (found by ryg)
   7897       1.18  (2008-08-02)
   7898               fix a threading bug (local mutable static)
   7899       1.17    support interlaced PNG
   7900       1.16    major bugfix - stbi__convert_format converted one too many pixels
   7901       1.15    initialize some fields for thread safety
   7902       1.14    fix threadsafe conversion bug
   7903               header-file-only version (#define STBI_HEADER_FILE_ONLY before including)
   7904       1.13    threadsafe
   7905       1.12    const qualifiers in the API
   7906       1.11    Support installable IDCT, colorspace conversion routines
   7907       1.10    Fixes for 64-bit (don't use "unsigned long")
   7908               optimized upsampling by Fabian "ryg" Giesen
   7909       1.09    Fix format-conversion for PSD code (bad global variables!)
   7910       1.08    Thatcher Ulrich's PSD code integrated by Nicolas Schulz
   7911       1.07    attempt to fix C++ warning/errors again
   7912       1.06    attempt to fix C++ warning/errors again
   7913       1.05    fix TGA loading to return correct *comp and use good luminance calc
   7914       1.04    default float alpha is 1, not 255; use 'void *' for stbi_image_free
   7915       1.03    bugfixes to STBI_NO_STDIO, STBI_NO_HDR
   7916       1.02    support for (subset of) HDR files, float interface for preferred access to them
   7917       1.01    fix bug: possible bug in handling right-side up bmps... not sure
   7918               fix bug: the stbi__bmp_load() and stbi__tga_load() functions didn't work at all
   7919       1.00    interface to zlib that skips zlib header
   7920       0.99    correct handling of alpha in palette
   7921       0.98    TGA loader by lonesock; dynamically add loaders (untested)
   7922       0.97    jpeg errors on too large a file; also catch another malloc failure
   7923       0.96    fix detection of invalid v value - particleman@mollyrocket forum
   7924       0.95    during header scan, seek to markers in case of padding
   7925       0.94    STBI_NO_STDIO to disable stdio usage; rename all #defines the same
   7926       0.93    handle jpegtran output; verbose errors
   7927       0.92    read 4,8,16,24,32-bit BMP files of several formats
   7928       0.91    output 24-bit Windows 3.0 BMP files
   7929       0.90    fix a few more warnings; bump version number to approach 1.0
   7930       0.61    bugfixes due to Marc LeBlanc, Christopher Lloyd
   7931       0.60    fix compiling as c++
   7932       0.59    fix warnings: merge Dave Moore's -Wall fixes
   7933       0.58    fix bug: zlib uncompressed mode len/nlen was wrong endian
   7934       0.57    fix bug: jpg last huffman symbol before marker was >9 bits but less than 16 available
   7935       0.56    fix bug: zlib uncompressed mode len vs. nlen
   7936       0.55    fix bug: restart_interval not initialized to 0
   7937       0.54    allow NULL for 'int *comp'
   7938       0.53    fix bug in png 3->4; speedup png decoding
   7939       0.52    png handles req_comp=3,4 directly; minor cleanup; jpeg comments
   7940       0.51    obey req_comp requests, 1-component jpegs return as 1-component,
   7941               on 'test' only check type, not whether we support this variant
   7942       0.50  (2006-11-19)
   7943               first released version
   7944 */
   7945 
   7946 
   7947 /*
   7948 ------------------------------------------------------------------------------
   7949 This software is available under 2 licenses -- choose whichever you prefer.
   7950 ------------------------------------------------------------------------------
   7951 ALTERNATIVE A - MIT License
   7952 Copyright (c) 2017 Sean Barrett
   7953 Permission is hereby granted, free of charge, to any person obtaining a copy of
   7954 this software and associated documentation files (the "Software"), to deal in
   7955 the Software without restriction, including without limitation the rights to
   7956 use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
   7957 of the Software, and to permit persons to whom the Software is furnished to do
   7958 so, subject to the following conditions:
   7959 The above copyright notice and this permission notice shall be included in all
   7960 copies or substantial portions of the Software.
   7961 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
   7962 IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
   7963 FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
   7964 AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
   7965 LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
   7966 OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
   7967 SOFTWARE.
   7968 ------------------------------------------------------------------------------
   7969 ALTERNATIVE B - Public Domain (www.unlicense.org)
   7970 This is free and unencumbered software released into the public domain.
   7971 Anyone is free to copy, modify, publish, use, compile, sell, or distribute this
   7972 software, either in source code form or as a compiled binary, for any purpose,
   7973 commercial or non-commercial, and by any means.
   7974 In jurisdictions that recognize copyright laws, the author or authors of this
   7975 software dedicate any and all copyright interest in the software to the public
   7976 domain. We make this dedication for the benefit of the public at large and to
   7977 the detriment of our heirs and successors. We intend this dedication to be an
   7978 overt act of relinquishment in perpetuity of all present and future rights to
   7979 this software under copyright law.
   7980 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
   7981 IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
   7982 FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
   7983 AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
   7984 ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
   7985 WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
   7986 ------------------------------------------------------------------------------
   7987 */