commit d7d9103
dalem · 2026-02-28 15:16:38 +0000 UTC
parent 2e266a0
Remove /stb

/stb was needed when everything was handled on the swc side; now it happens within swall, so it's all dead code.
2 files changed, +0, -23134
+0, -9875
1@@ -1,9875 +0,0 @@
2-/* stb_image - v2.30 - public domain image loader - http://nothings.org/stb
3- no warranty implied; use at your own risk
4-
5- Do this:
6- #define STB_IMAGE_IMPLEMENTATION
7- before you include this file in *one* C or C++ file to create the
8-implementation.
9-
10- // i.e. it should look like this:
11- #include ...
12- #include ...
13- #include ...
14- #define STB_IMAGE_IMPLEMENTATION
15- #include "stb_image.h"
16-
17- You can #define STBI_ASSERT(x) before the #include to avoid using assert.h.
18- And #define STBI_MALLOC, STBI_REALLOC, and STBI_FREE to avoid using
19-malloc,realloc,free
20-
21-
22- QUICK NOTES:
23- Primarily of interest to game developers and other people who can
24- avoid problematic images and only need the trivial interface
25-
26- JPEG baseline & progressive (12 bpc/arithmetic not supported, same as stock IJG lib)
27- PNG 1/2/4/8/16-bit-per-channel
28-
29- TGA (not sure what subset, if a subset)
30- BMP non-1bpp, non-RLE
31- PSD (composited view only, no extra channels, 8/16 bit-per-channel)
32-
33- GIF (*comp always reports as 4-channel)
34- HDR (radiance rgbE format)
35- PIC (Softimage PIC)
36- PNM (PPM and PGM binary only)
37-
38- Animated GIF still needs a proper API, but here's one way to do it:
39- http://gist.github.com/urraka/685d9a6340b26b830d49
40-
41- - decode from memory or through FILE (define STBI_NO_STDIO to remove code)
42- - decode from arbitrary I/O callbacks
43- - SIMD acceleration on x86/x64 (SSE2) and ARM (NEON)
44-
45- Full documentation under "DOCUMENTATION" below.
46-
47-
48-LICENSE
49-
50- See end of file for license information.
51-
52-RECENT REVISION HISTORY:
53-
54- 2.30 (2024-05-31) avoid erroneous gcc warning
55- 2.29 (2023-05-xx) optimizations
56- 2.28 (2023-01-29) many error fixes, security errors, just tons of stuff
- 2.27 (2021-07-11) document stbi_info better, 16-bit PNM support, bug fixes
- 2.26 (2020-07-13) many minor fixes
- 2.25 (2020-02-02) fix warnings
- 2.24 (2020-02-02) fix warnings; thread-local failure_reason and flip_vertically
- 2.23 (2019-08-11) fix clang static analysis warning
- 2.22 (2019-03-04) gif fixes, fix warnings
- 2.21 (2019-02-25) fix typo in comment
- 2.20 (2019-02-07) support utf8 filenames in Windows; fix warnings and platform ifdefs
- 2.19 (2018-02-11) fix warning
- 2.18 (2018-01-30) fix warnings
- 2.17 (2018-01-29) bugfix, 1-bit BMP, 16-bitness query, fix warnings
- 2.16 (2017-07-23) all functions have 16-bit variants; optimizations; bugfixes
- 2.15 (2017-03-18) fix png-1,2,4; all Imagenet JPGs; no runtime SSE detection on GCC
- 2.14 (2017-03-03) remove deprecated STBI_JPEG_OLD; fixes for Imagenet JPGs
- 2.13 (2016-12-04) experimental 16-bit API, only for PNG so far; fixes
- 2.12 (2016-04-02) fix typo in 2.11 PSD fix that caused crashes
- 2.11 (2016-04-02) 16-bit PNGS; enable SSE2 in non-gcc x64
70- RGB-format JPEG; remove white matting in PSD;
71- allocate large structures on the stack;
72- correct channel count for PNG & BMP
73- 2.10 (2016-01-22) avoid warning introduced in 2.09
74- 2.09 (2016-01-16) 16-bit TGA; comments in PNM files; STBI_REALLOC_SIZED
75-
76- See end of file for full revision history.
77-
78-
79- ============================ Contributors =========================
80-
81- Image formats                          Extensions, features
82-    Sean Barrett (jpeg, png, bmp)          Jetro Lauha (stbi_info)
83-    Nicolas Schulz (hdr, psd)              Martin "SpartanJ" Golini (stbi_info)
84-    Jonathan Dummer (tga)                  James "moose2000" Brown (iPhone PNG)
85-    Jean-Marc Lienher (gif)                Ben "Disch" Wenger (io callbacks)
86-    Tom Seddon (pic)                       Omar Cornut (1/2/4-bit PNG)
87-    Thatcher Ulrich (psd)                  Nicolas Guillemot (vertical flip)
88-    Ken Miller (pgm, ppm)                  Richard Mitton (16-bit PSD)
89-    github:urraka (animated gif)           Junggon Kim (PNM comments)
90-    Christopher Forseth (animated gif)     Daniel Gibson (16-bit TGA)
91-                                           socks-the-fox (16-bit PNG)
92-                                           Jeremy Sawicki (handle all ImageNet JPGs)
93- Optimizations & bugfixes                  Mikhail Morozov (1-bit BMP)
94-    Fabian "ryg" Giesen                    Anael Seghezzi (is-16-bit query)
95-    Arseny Kapoulkine                      Simon Breuss (16-bit PNM)
96-    John-Mark Allen
97-    Carmelo J Fdez-Aguera
98-
99- Bug & warning fixes
-    Marc LeBlanc            David Woo          Guillaume George     Martins Mozeiko
-    Christpher Lloyd        Jerry Jansson      Joseph Thomson       Blazej Dariusz Roszkowski
-    Phil Jordan             Dave Moore         Roy Eltham
-    Hayaki Saito            Nathan Reed        Won Chun
-    Luke Graham             Johan Duparc       Nick Verigakis       the Horde3D community
-    Thomas Ruf              Ronny Chevalier    github:rlyeh
-    Janez Zemva             John Bartholomew   Michal Cichon        github:romigrou
-    Jonathan Blow           Ken Hamada         Tero Hanninen        github:svdijk
-    Eugene Golushkov        Laurent Gomila     Cort Stratton        github:snagar
-    Aruelien Pocheville     Sergio Gonzalez    Thibault Reuille     github:Zelex
-    Cass Everitt            Ryamond Barbiero   github:grim210
-    Paul Du Bois            Engin Manap        Aldo Culquicondor    github:sammyhw
-    Philipp Wiesemann       Dale Weiler        Oriol Ferrer Mesia   github:phprus
-    Josh Tobin              Neil Bickford      Matthew Gregan       github:poppolopoppo
-    Julian Raschke          Gregory Mullen     Christian Floisand   github:darealshinji
-    Baldur Karlsson         Kevin Schmidt      JR Smith             github:Michaelangel007
-    Brad Weinberger         Matvey Cherevko    github:mosra
-    Luca Sas                Alexander Veselov  Zack Middleton       [reserved]
-    Ryan C. Gordon          [reserved]         [reserved]
-                     DO NOT ADD YOUR NAME HERE
119-
120- Jacko Dirks
121-
122- To add your name to the credits, pick a random blank space in the middle and
123-fill it. 80% of merge conflicts on stb PRs are due to people adding their name
124-at the end of the credits.
125-*/
126-
127-#ifndef STBI_INCLUDE_STB_IMAGE_H
128-#define STBI_INCLUDE_STB_IMAGE_H
129-
130-// DOCUMENTATION
131-//
132-// Limitations:
133-// - no 12-bit-per-channel JPEG
134-// - no JPEGs with arithmetic coding
135-// - GIF always returns *comp=4
136-//
137-// Basic usage (see HDR discussion below for HDR usage):
138-// int x,y,n;
139-// unsigned char *data = stbi_load(filename, &x, &y, &n, 0);
140-// // ... process data if not NULL ...
141-// // ... x = width, y = height, n = # 8-bit components per pixel ...
142-// // ... replace '0' with '1'..'4' to force that many components per pixel
143-// // ... but 'n' will always be the number that it would have been if you said 0
144-// stbi_image_free(data);
145-//
146-// Standard parameters:
147-// int *x -- outputs image width in pixels
148-// int *y -- outputs image height in pixels
149-// int *channels_in_file -- outputs # of image components in image file
150-// int desired_channels -- if non-zero, # of image components requested in
151-// result
152-//
153-// The return value from an image loader is an 'unsigned char *' which points
154-// to the pixel data, or NULL on an allocation failure or if the image is
155-// corrupt or invalid. The pixel data consists of *y scanlines of *x pixels,
156-// with each pixel consisting of N interleaved 8-bit components; the first
157-// pixel pointed to is top-left-most in the image. There is no padding between
158-// image scanlines or between pixels, regardless of format. The number of
159-// components N is 'desired_channels' if desired_channels is non-zero, or
160-// *channels_in_file otherwise. If desired_channels is non-zero,
161-// *channels_in_file has the number of components that _would_ have been
162-// output otherwise. E.g. if you set desired_channels to 4, you will always
163-// get RGBA output, but you can check *channels_in_file to see if it's trivially
164-// opaque because e.g. there were only 3 channels in the source image.
165-//
166-// An output image with N components has the following components interleaved
167-// in this order in each pixel:
168-//
169-// N=#comp components
170-// 1 grey
171-// 2 grey, alpha
172-// 3 red, green, blue
173-// 4 red, green, blue, alpha
174-//
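Because the output described above is *y rows of *x pixels with N interleaved 8-bit components, top-left first and with no padding, locating a pixel is pure arithmetic. A minimal standalone sketch; `pixel_offset` and `is_fully_opaque_rgba` are hypothetical helpers, not part of stb_image:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: byte offset of component c of pixel (px, py) in a
   w-pixel-wide buffer with n interleaved components and no row padding. */
static size_t pixel_offset(int w, int n, int px, int py, int c)
{
    assert(w > 0 && px >= 0 && px < w && py >= 0);
    assert(n >= 1 && n <= 4 && c >= 0 && c < n);
    return ((size_t)py * (size_t)w + (size_t)px) * (size_t)n + (size_t)c;
}

/* Hypothetical helper: with desired_channels = 4 the output is always RGBA,
   so alpha is byte 3 of every pixel. Returns 1 if every alpha byte is 255. */
static int is_fully_opaque_rgba(const unsigned char *data, int w, int h)
{
    size_t count = (size_t)w * (size_t)h;
    for (size_t i = 0; i < count; ++i)
        if (data[i * 4 + 3] != 255)
            return 0;
    return 1;
}
```

Checking *channels_in_file first (as suggested above) lets a caller skip the alpha scan entirely when the source had only 3 channels.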
175-// If image loading fails for any reason, the return value will be NULL,
176-// and *x, *y, *channels_in_file will be unchanged. The function
177-// stbi_failure_reason() can be queried for an extremely brief, end-user
178-// unfriendly explanation of why the load failed. Define STBI_NO_FAILURE_STRINGS
179-// to avoid compiling these strings at all, and STBI_FAILURE_USERMSG to get
180-// slightly more user-friendly ones.
181-//
182-// Paletted PNG, BMP, GIF, and PIC images are automatically depalettized.
183-//
184-// To query the width, height and component count of an image without having to
185-// decode the full file, you can use the stbi_info family of functions:
186-//
187-// int x,y,n,ok;
188-// ok = stbi_info(filename, &x, &y, &n);
189-// // returns ok=1 and sets x, y, n if image is a supported format,
190-// // 0 otherwise.
191-//
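The stbi_info functions avoid decoding by reading only the file header. As a rough illustration for a single format, here is a standalone sketch that pulls width, height, and bit depth out of a PNG's IHDR chunk, following the layout in the PNG specification. `png_header_info` is hypothetical and does far less validation than a real loader:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Read a big-endian 32-bit value, as PNG stores integers. */
static unsigned long be32(const unsigned char *p)
{
    return ((unsigned long)p[0] << 24) | ((unsigned long)p[1] << 16) |
           ((unsigned long)p[2] << 8) | (unsigned long)p[3];
}

/* Returns 1 and fills *w, *h, *bit_depth if buf starts with a PNG signature
   followed by an IHDR chunk header; returns 0 otherwise. */
static int png_header_info(const unsigned char *buf, size_t len,
                           unsigned long *w, unsigned long *h, int *bit_depth)
{
    static const unsigned char sig[8] = {137, 80, 78, 71, 13, 10, 26, 10};
    assert(w != NULL && h != NULL && bit_depth != NULL);
    /* signature (8) + chunk length (4) + chunk type (4) + IHDR data (13) */
    if (buf == NULL || len < 8 + 4 + 4 + 13) return 0;
    if (memcmp(buf, sig, sizeof sig) != 0) return 0;
    if (be32(buf + 8) != 13 || memcmp(buf + 12, "IHDR", 4) != 0) return 0;
    *w = be32(buf + 16);
    *h = be32(buf + 20);
    *bit_depth = buf[24];
    return *w != 0 && *h != 0;
}
```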
192-// Note that stb_image pervasively uses ints in its public API for sizes,
193-// including sizes of memory buffers. This is now part of the API and thus
194-// hard to change without causing breakage. As a result, the various image
195-// loaders all have certain limits on image size; these differ somewhat
196-// by format but generally boil down to either just under 2GB or just under
197-// 1GB. When the decoded image would be larger than this, stb_image decoding
198-// will fail.
199-//
200-// Additionally, stb_image will reject image files that have any of their
201-// dimensions set to a larger value than the configurable STBI_MAX_DIMENSIONS,
202-// which defaults to 2**24 = 16777216 pixels. Due to the above memory limit,
203-// the only way to have an image with such dimensions load correctly
204-// is for it to have a rather extreme aspect ratio. Either way, the
205-// assumption here is that such larger images are likely to be malformed
206-// or malicious. If you do need to load an image with individual dimensions
207-// larger than that, and it still fits in the overall size limit, you can
208-// #define STBI_MAX_DIMENSIONS on your own to be something larger.
209-//
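Both limits above can be checked before any pixel memory is allocated. A minimal sketch, assuming a per-dimension cap equal to the documented default of 1 << 24 and an overall byte budget of INT_MAX; stb_image's real per-format limits differ somewhat, and `image_size_ok` and `SKETCH_MAX_DIMENSIONS` are hypothetical names:

```c
#include <assert.h>
#include <limits.h>

#define SKETCH_MAX_DIMENSIONS (1 << 24) /* mirrors the documented default */

/* Returns 1 if a w x h image with n components passes the per-dimension cap
   and its byte size w * h * n fits in a signed int, without overflowing. */
static int image_size_ok(int w, int h, int n)
{
    assert(n >= 1 && n <= 4);
    if (w <= 0 || h <= 0) return 0;
    if (w > SKETCH_MAX_DIMENSIONS || h > SKETCH_MAX_DIMENSIONS) return 0;
    if (w > INT_MAX / h) return 0;     /* w * h would overflow */
    if (w * h > INT_MAX / n) return 0; /* w * h * n would overflow */
    return 1;
}
```

Note that a 16777216 x 64 single-channel image passes, which is the "rather extreme aspect ratio" case described above.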
210-// ===========================================================================
211-//
212-// UNICODE:
213-//
214-// If compiling for Windows and you wish to use Unicode filenames, compile
215-// with
216-// #define STBI_WINDOWS_UTF8
217-// and pass utf8-encoded filenames. Call stbi_convert_wchar_to_utf8 to convert
218-// Windows wchar_t filenames to utf8.
219-//
220-// ===========================================================================
221-//
222-// Philosophy
223-//
224-// stb libraries are designed with the following priorities:
225-//
226-// 1. easy to use
227-// 2. easy to maintain
228-// 3. good performance
229-//
230-// Sometimes I let "good performance" creep up in priority over "easy to
231-// maintain", and for best performance I may provide less-easy-to-use APIs that
232-// give higher performance, in addition to the easy-to-use ones. Nevertheless,
233-// it's important to keep in mind that from the standpoint of you, a client of
234-// this library, all you care about is #1 and #3, and stb libraries DO NOT
235-// emphasize #3 above all.
236-//
237-// Some secondary priorities arise directly from the first two, some of which
238-// provide more explicit reasons why performance can't be emphasized.
239-//
240-// - Portable ("ease of use")
241-// - Small source code footprint ("easy to maintain")
242-// - No dependencies ("ease of use")
243-//
244-// ===========================================================================
245-//
246-// I/O callbacks
247-//
248-// I/O callbacks allow you to read from arbitrary sources, like packaged
249-// files or some other source. Data read from callbacks are processed
250-// through a small internal buffer (currently 128 bytes) to try to reduce
251-// overhead.
252-//
253-// The three functions you must define are "read" (reads some bytes of data),
254-// "skip" (skips some bytes of data), "eof" (reports if the stream is at the
255-// end).
256-//
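To make the read/skip/eof contract concrete, here is a minimal sketch that serves data from a memory block. The struct has the same shape as the `stbi_io_callbacks` declaration in this header; `io_callbacks`, `mem_reader`, and the `mem_*` functions are hypothetical:

```c
#include <assert.h>
#include <string.h>

/* Same shape as stbi_io_callbacks. */
typedef struct {
    int (*read)(void *user, char *data, int size);
    void (*skip)(void *user, int n);
    int (*eof)(void *user);
} io_callbacks;

/* Hypothetical user state: a read cursor over a memory block. */
typedef struct {
    const char *data;
    int len;
    int pos;
} mem_reader;

static int mem_read(void *user, char *out, int size)
{
    mem_reader *m = (mem_reader *)user;
    assert(size >= 0 && m->pos >= 0 && m->pos <= m->len);
    int avail = m->len - m->pos;
    int n = size < avail ? size : avail;
    memcpy(out, m->data + m->pos, (size_t)n);
    m->pos += n;
    return n; /* number of bytes actually read */
}

static void mem_skip(void *user, int n)
{
    mem_reader *m = (mem_reader *)user;
    /* negative n "ungets" bytes; clamp the cursor to the buffer bounds */
    long p = (long)m->pos + n;
    if (p < 0) p = 0;
    if (p > m->len) p = m->len;
    m->pos = (int)p;
}

static int mem_eof(void *user)
{
    const mem_reader *m = (const mem_reader *)user;
    return m->pos >= m->len;
}

static const io_callbacks mem_callbacks = {mem_read, mem_skip, mem_eof};
```

With the real header, the same three functions would go into a `stbi_io_callbacks` value and be passed, together with a pointer to the reader as `user`, to `stbi_load_from_callbacks`.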
257-// ===========================================================================
258-//
259-// SIMD support
260-//
261-// The JPEG decoder will try to automatically use SIMD kernels on x86 when
262-// supported by the compiler. For ARM Neon support, you must explicitly
263-// request it.
264-//
265-// (The old do-it-yourself SIMD API is no longer supported in the current
266-// code.)
267-//
268-// On x86, SSE2 is used when available: MSVC builds test for it at run time
269-// (CPUID); GCC/Clang builds use it only if the compiler already targets SSE2
270-// (e.g. -msse2). Otherwise the generic C versions are used. On ARM targets,
271-// the typical path is separate builds for NEON and non-NEON devices (at least
272-// on iOS and Android), so NEON is toggled by a build flag: define STBI_NEON.
273-//
274-// If for some reason you do not want to use any of SIMD code, or if
275-// you have issues compiling it, you can disable it entirely by
276-// defining STBI_NO_SIMD.
277-//
278-// ===========================================================================
279-//
280-// HDR image support (disable by defining STBI_NO_HDR)
281-//
282-// stb_image supports loading HDR images in general, and currently the Radiance
283-// .HDR file format specifically. You can still load any file through the
284-// existing interface; if you attempt to load an HDR file, it will be
285-// automatically remapped to LDR, assuming gamma 2.2 and an arbitrary scale
286-// factor defaulting to 1; both of these constants can be reconfigured through
287-// this interface:
288-//
289-// stbi_hdr_to_ldr_gamma(2.2f);
290-// stbi_hdr_to_ldr_scale(1.0f);
291-//
292-// (note, do not use _inverse_ constants; stb_image will invert them
293-// appropriately).
294-//
295-// Additionally, there is a new, parallel interface for loading files as
296-// (linear) floats to preserve the full dynamic range:
297-//
298-// float *data = stbi_loadf(filename, &x, &y, &n, 0);
299-//
300-// If you load LDR images through this interface, those images will
301-// be promoted to floating point values, run through the inverse of
302-// constants corresponding to the above:
303-//
304-// stbi_ldr_to_hdr_scale(1.0f);
305-// stbi_ldr_to_hdr_gamma(2.2f);
306-//
307-// Finally, given a filename (or an open file or memory block--see header
308-// file for details) containing image data, you can query for the "most
309-// appropriate" interface to use (that is, whether the image is HDR or
310-// not), using:
311-//
312-// stbi_is_hdr(char const *filename);
313-//
314-// ===========================================================================
315-//
316-// iPhone PNG support:
317-//
318-// We optionally support converting iPhone-formatted PNGs (which store
319-// premultiplied BGRA) back to RGB, even though they're internally encoded
320-// differently. To enable this conversion, call
321-// stbi_convert_iphone_png_to_rgb(1).
322-//
323-// Call stbi_set_unpremultiply_on_load(1) as well to force a divide per
324-// pixel to remove any premultiplied alpha *only* if the image file explicitly
325-// says there's premultiplied data (currently only happens in iPhone images,
326-// and only if iPhone convert-to-rgb processing is on).
327-//
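The per-pixel divide mentioned above amounts to scaling each color channel by 255 / alpha. A rough sketch for 8-bit RGBA, assuming round-to-nearest and clamping to 255 when a stored channel exceeds alpha; `unpremultiply_rgba` is a hypothetical name and this is not stb_image's own routine (it also ignores the BGRA-to-RGB channel swap):

```c
#include <assert.h>
#include <stddef.h>

/* Undo premultiplied alpha in place; pixels with alpha 0 are left as-is. */
static void unpremultiply_rgba(unsigned char *p, size_t pixel_count)
{
    assert(p != NULL || pixel_count == 0);
    for (size_t i = 0; i < pixel_count; ++i, p += 4) {
        unsigned a = p[3];
        if (a == 0)
            continue;
        for (int k = 0; k < 3; ++k) {
            /* (c * 255 + a/2) / a rounds c * 255 / a to nearest */
            unsigned v = (p[k] * 255u + a / 2) / a;
            p[k] = (unsigned char)(v > 255 ? 255 : v);
        }
    }
}
```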
328-// ===========================================================================
329-//
330-// ADDITIONAL CONFIGURATION
331-//
332-// - You can suppress implementation of any of the decoders to reduce
333-// your code footprint by #defining one or more of the following
334-// symbols before creating the implementation.
335-//
336-// STBI_NO_JPEG
337-// STBI_NO_PNG
338-// STBI_NO_BMP
339-// STBI_NO_PSD
340-// STBI_NO_TGA
341-// STBI_NO_GIF
342-// STBI_NO_HDR
343-// STBI_NO_PIC
344-// STBI_NO_PNM (.ppm and .pgm)
345-//
346-// - You can request *only* certain decoders and suppress all other ones
347-// (this will be more forward-compatible, as addition of new decoders
348-// doesn't require you to disable them explicitly):
349-//
350-// STBI_ONLY_JPEG
351-// STBI_ONLY_PNG
352-// STBI_ONLY_BMP
353-// STBI_ONLY_PSD
354-// STBI_ONLY_TGA
355-// STBI_ONLY_GIF
356-// STBI_ONLY_HDR
357-// STBI_ONLY_PIC
358-// STBI_ONLY_PNM (.ppm and .pgm)
359-//
360-// - If you use STBI_NO_PNG (or _ONLY_ without PNG), and you still
361-// want the zlib decoder to be available, #define STBI_SUPPORT_ZLIB
362-//
363-// - If you define STBI_MAX_DIMENSIONS, stb_image will reject images greater
364-// than that size (in either width or height) without further processing.
365-// This is to let programs in the wild set an upper bound to prevent
366-// denial-of-service attacks on untrusted data, as one could generate a
367-// valid image of gigantic dimensions and force stb_image to allocate a
368-// huge block of memory and spend disproportionate time decoding it. By
369-// default this is set to (1 << 24), which is 16777216, but that's still
370-// very big.
371-
372-#ifndef STBI_NO_STDIO
373-#include <stdio.h>
374-#endif // STBI_NO_STDIO
375-
376-#define STBI_VERSION 1
377-
378-enum {
379- STBI_default = 0, // only used for desired_channels
380-
381- STBI_grey = 1,
382- STBI_grey_alpha = 2,
383- STBI_rgb = 3,
384- STBI_rgb_alpha = 4
385-};
386-
387-#include <stdlib.h>
388-typedef unsigned char stbi_uc;
389-typedef unsigned short stbi_us;
390-
391-#ifdef __cplusplus
392-extern "C" {
393-#endif
394-
395-#ifndef STBIDEF
396-#ifdef STB_IMAGE_STATIC
397-#define STBIDEF static
398-#else
399-#define STBIDEF extern
400-#endif
401-#endif
402-
403-//////////////////////////////////////////////////////////////////////////////
404-//
405-// PRIMARY API - works on images of any type
406-//
407-
408-//
409-// load image by filename, open file, or memory buffer
410-//
411-
412-typedef struct {
413- int (*read)(void *user, char *data,
414- int size); // fill 'data' with 'size' bytes. return number of
415- // bytes actually read
416- void (*skip)(void *user, int n); // skip the next 'n' bytes, or 'unget' the
417- // last -n bytes if negative
418- int (*eof)(void *user); // returns nonzero if we are at end of file/data
419-} stbi_io_callbacks;
420-
421-////////////////////////////////////
422-//
423-// 8-bits-per-channel interface
424-//
425-
426-STBIDEF stbi_uc *
427-stbi_load_from_memory(stbi_uc const *buffer, int len, int *x, int *y,
428- int *channels_in_file, int desired_channels);
429-STBIDEF stbi_uc *
430-stbi_load_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x,
431- int *y, int *channels_in_file, int desired_channels);
432-
433-#ifndef STBI_NO_STDIO
434-STBIDEF stbi_uc *
435-stbi_load(char const *filename, int *x, int *y, int *channels_in_file,
436- int desired_channels);
437-STBIDEF stbi_uc *
438-stbi_load_from_file(FILE *f, int *x, int *y, int *channels_in_file,
439- int desired_channels);
440-// for stbi_load_from_file, file pointer is left pointing immediately after
441-// image
442-#endif
443-
444-#ifndef STBI_NO_GIF
445-STBIDEF stbi_uc *
446-stbi_load_gif_from_memory(stbi_uc const *buffer, int len, int **delays, int *x,
447- int *y, int *z, int *comp, int req_comp);
448-#endif
449-
450-#ifdef STBI_WINDOWS_UTF8
451-STBIDEF int
452-stbi_convert_wchar_to_utf8(char *buffer, size_t bufferlen,
453- const wchar_t *input);
454-#endif
455-
456-////////////////////////////////////
457-//
458-// 16-bits-per-channel interface
459-//
460-
461-STBIDEF stbi_us *
462-stbi_load_16_from_memory(stbi_uc const *buffer, int len, int *x, int *y,
463- int *channels_in_file, int desired_channels);
464-STBIDEF stbi_us *
465-stbi_load_16_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x,
466- int *y, int *channels_in_file,
467- int desired_channels);
468-
469-#ifndef STBI_NO_STDIO
470-STBIDEF stbi_us *
471-stbi_load_16(char const *filename, int *x, int *y, int *channels_in_file,
472- int desired_channels);
473-STBIDEF stbi_us *
474-stbi_load_from_file_16(FILE *f, int *x, int *y, int *channels_in_file,
475- int desired_channels);
476-#endif
477-
478-////////////////////////////////////
479-//
480-// float-per-channel interface
481-//
482-#ifndef STBI_NO_LINEAR
483-STBIDEF float *
484-stbi_loadf_from_memory(stbi_uc const *buffer, int len, int *x, int *y,
485- int *channels_in_file, int desired_channels);
486-STBIDEF float *
487-stbi_loadf_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x,
488- int *y, int *channels_in_file, int desired_channels);
489-
490-#ifndef STBI_NO_STDIO
491-STBIDEF float *
492-stbi_loadf(char const *filename, int *x, int *y, int *channels_in_file,
493- int desired_channels);
494-STBIDEF float *
495-stbi_loadf_from_file(FILE *f, int *x, int *y, int *channels_in_file,
496- int desired_channels);
497-#endif
498-#endif
499-
500-#ifndef STBI_NO_HDR
501-STBIDEF void
502-stbi_hdr_to_ldr_gamma(float gamma);
503-STBIDEF void
504-stbi_hdr_to_ldr_scale(float scale);
505-#endif // STBI_NO_HDR
506-
507-#ifndef STBI_NO_LINEAR
508-STBIDEF void
509-stbi_ldr_to_hdr_gamma(float gamma);
510-STBIDEF void
511-stbi_ldr_to_hdr_scale(float scale);
512-#endif // STBI_NO_LINEAR
513-
514-// stbi_is_hdr is always defined, but always returns false if STBI_NO_HDR
515-STBIDEF int
516-stbi_is_hdr_from_callbacks(stbi_io_callbacks const *clbk, void *user);
517-STBIDEF int
518-stbi_is_hdr_from_memory(stbi_uc const *buffer, int len);
519-#ifndef STBI_NO_STDIO
520-STBIDEF int
521-stbi_is_hdr(char const *filename);
522-STBIDEF int
523-stbi_is_hdr_from_file(FILE *f);
524-#endif // STBI_NO_STDIO
525-
526-// get a VERY brief reason for failure
527-// on most compilers (and ALL modern mainstream compilers) this is threadsafe
528-STBIDEF const char *
529-stbi_failure_reason(void);
530-
531-// free the loaded image -- this is just free()
532-STBIDEF void
533-stbi_image_free(void *retval_from_stbi_load);
534-
535-// get image dimensions & components without fully decoding
536-STBIDEF int
537-stbi_info_from_memory(stbi_uc const *buffer, int len, int *x, int *y,
538- int *comp);
539-STBIDEF int
540-stbi_info_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x,
541- int *y, int *comp);
542-STBIDEF int
543-stbi_is_16_bit_from_memory(stbi_uc const *buffer, int len);
544-STBIDEF int
545-stbi_is_16_bit_from_callbacks(stbi_io_callbacks const *clbk, void *user);
546-
547-#ifndef STBI_NO_STDIO
548-STBIDEF int
549-stbi_info(char const *filename, int *x, int *y, int *comp);
550-STBIDEF int
551-stbi_info_from_file(FILE *f, int *x, int *y, int *comp);
552-STBIDEF int
553-stbi_is_16_bit(char const *filename);
554-STBIDEF int
555-stbi_is_16_bit_from_file(FILE *f);
556-#endif
557-
558-// for image formats that explicitly notate that they have premultiplied alpha,
559-// we just return the colors as stored in the file. set this flag to force
560-// unpremultiplication. results are undefined if the unpremultiply overflows.
561-STBIDEF void
562-stbi_set_unpremultiply_on_load(int flag_true_if_should_unpremultiply);
563-
564-// indicate whether we should process iphone images back to canonical format,
565-// or just pass them through "as-is"
566-STBIDEF void
567-stbi_convert_iphone_png_to_rgb(int flag_true_if_should_convert);
568-
569-// flip the image vertically, so the first pixel in the output array is the
570-// bottom left
571-STBIDEF void
572-stbi_set_flip_vertically_on_load(int flag_true_if_should_flip);
573-
574-// as above, but only applies to images loaded on the thread that calls the
575-// function. This function is only available if your compiler supports
576-// thread-local variables; calling it will fail to link if your compiler doesn't
577-STBIDEF void
578-stbi_set_unpremultiply_on_load_thread(int flag_true_if_should_unpremultiply);
579-STBIDEF void
580-stbi_convert_iphone_png_to_rgb_thread(int flag_true_if_should_convert);
581-STBIDEF void
582-stbi_set_flip_vertically_on_load_thread(int flag_true_if_should_flip);
583-
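Flipping vertically, as stbi_set_flip_vertically_on_load requests, comes down to swapping row r with row h-1-r of the tightly packed output. A minimal standalone sketch; `flip_rows` is hypothetical and not stb_image's implementation:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Flip a tightly packed image vertically in place, copying through a small
   stack buffer. bytes_per_pixel is n for 8-bit output, 2*n for 16-bit, and
   4*n for float output. */
static void flip_rows(unsigned char *data, int w, int h, int bytes_per_pixel)
{
    assert(w >= 0 && h >= 0 && bytes_per_pixel > 0);
    size_t stride = (size_t)w * (size_t)bytes_per_pixel;
    unsigned char tmp[256];
    for (int row = 0; row < h / 2; ++row) {
        unsigned char *a = data + (size_t)row * stride;
        unsigned char *b = data + (size_t)(h - 1 - row) * stride;
        size_t left = stride;
        while (left > 0) {
            size_t chunk = left < sizeof tmp ? left : sizeof tmp;
            memcpy(tmp, a, chunk);
            memcpy(a, b, chunk);
            memcpy(b, tmp, chunk);
            a += chunk;
            b += chunk;
            left -= chunk;
        }
    }
}
```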
584-// ZLIB client - used by PNG, available for other purposes
585-
586-STBIDEF char *
587-stbi_zlib_decode_malloc_guesssize(const char *buffer, int len, int initial_size,
588- int *outlen);
589-STBIDEF char *
590-stbi_zlib_decode_malloc_guesssize_headerflag(const char *buffer, int len,
591- int initial_size, int *outlen,
592- int parse_header);
593-STBIDEF char *
594-stbi_zlib_decode_malloc(const char *buffer, int len, int *outlen);
595-STBIDEF int
596-stbi_zlib_decode_buffer(char *obuffer, int olen, const char *ibuffer, int ilen);
597-
598-STBIDEF char *
599-stbi_zlib_decode_noheader_malloc(const char *buffer, int len, int *outlen);
600-STBIDEF int
601-stbi_zlib_decode_noheader_buffer(char *obuffer, int olen, const char *ibuffer,
602- int ilen);
603-
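The `_noheader` variants above take raw deflate data, while the others expect a 2-byte zlib wrapper first (the `parse_header` argument presumably selects between the two). As a rough illustration of what checking that wrapper involves, here is a standalone validator written from RFC 1950; `zlib_header_ok` is hypothetical and not stb_image's parser:

```c
#include <assert.h>

/* Returns 1 if p[0..1] is an acceptable zlib stream header per RFC 1950. */
static int zlib_header_ok(const unsigned char *p, int len)
{
    assert(p != 0 || len == 0);
    if (len < 2) return 0;
    int cmf = p[0];
    int flg = p[1];
    if ((cmf & 0x0F) != 8) return 0;          /* CM: 8 = deflate */
    if ((cmf >> 4) > 7) return 0;             /* CINFO: window at most 32K */
    if ((cmf * 256 + flg) % 31 != 0) return 0; /* FCHECK */
    if (flg & 0x20) return 0;                 /* FDICT: preset dictionary, rejected here */
    return 1;
}
```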
604-#ifdef __cplusplus
605-}
606-#endif
607-
608-//
609-//
610-//// end header file /////////////////////////////////////////////////////
611-#endif // STBI_INCLUDE_STB_IMAGE_H
612-
613-#ifdef STB_IMAGE_IMPLEMENTATION
614-
615-#if defined(STBI_ONLY_JPEG) || defined(STBI_ONLY_PNG) || \
616- defined(STBI_ONLY_BMP) || defined(STBI_ONLY_TGA) || \
617- defined(STBI_ONLY_GIF) || defined(STBI_ONLY_PSD) || \
618- defined(STBI_ONLY_HDR) || defined(STBI_ONLY_PIC) || \
619- defined(STBI_ONLY_PNM) || defined(STBI_ONLY_ZLIB)
620-#ifndef STBI_ONLY_JPEG
621-#define STBI_NO_JPEG
622-#endif
623-#ifndef STBI_ONLY_PNG
624-#define STBI_NO_PNG
625-#endif
626-#ifndef STBI_ONLY_BMP
627-#define STBI_NO_BMP
628-#endif
629-#ifndef STBI_ONLY_PSD
630-#define STBI_NO_PSD
631-#endif
632-#ifndef STBI_ONLY_TGA
633-#define STBI_NO_TGA
634-#endif
635-#ifndef STBI_ONLY_GIF
636-#define STBI_NO_GIF
637-#endif
638-#ifndef STBI_ONLY_HDR
639-#define STBI_NO_HDR
640-#endif
641-#ifndef STBI_ONLY_PIC
642-#define STBI_NO_PIC
643-#endif
644-#ifndef STBI_ONLY_PNM
645-#define STBI_NO_PNM
646-#endif
647-#endif
648-
649-#if defined(STBI_NO_PNG) && !defined(STBI_SUPPORT_ZLIB) && \
650- !defined(STBI_NO_ZLIB)
651-#define STBI_NO_ZLIB
652-#endif
653-
654-#include <limits.h>
655-#include <stdarg.h>
656-#include <stddef.h> // ptrdiff_t on osx
657-#include <stdlib.h>
658-#include <string.h>
659-
660-#if !defined(STBI_NO_LINEAR) || !defined(STBI_NO_HDR)
661-#include <math.h> // ldexp, pow
662-#endif
663-
664-#ifndef STBI_NO_STDIO
665-#include <stdio.h>
666-#endif
667-
668-#ifndef STBI_ASSERT
669-#include <assert.h>
670-#define STBI_ASSERT(x) assert(x)
671-#endif
672-
673-#ifdef __cplusplus
674-#define STBI_EXTERN extern "C"
675-#else
676-#define STBI_EXTERN extern
677-#endif
678-
679-#ifndef _MSC_VER
680-#ifdef __cplusplus
681-#define stbi_inline inline
682-#else
683-#define stbi_inline
684-#endif
685-#else
686-#define stbi_inline __forceinline
687-#endif
688-
689-#ifndef STBI_NO_THREAD_LOCALS
690-#if defined(__cplusplus) && __cplusplus >= 201103L
691-#define STBI_THREAD_LOCAL thread_local
692-#elif defined(__GNUC__) && __GNUC__ < 5
693-#define STBI_THREAD_LOCAL __thread
694-#elif defined(_MSC_VER)
695-#define STBI_THREAD_LOCAL __declspec(thread)
696-#elif defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L && \
697- !defined(__STDC_NO_THREADS__)
698-#define STBI_THREAD_LOCAL _Thread_local
699-#endif
700-
701-#ifndef STBI_THREAD_LOCAL
702-#if defined(__GNUC__)
703-#define STBI_THREAD_LOCAL __thread
704-#endif
705-#endif
706-#endif
707-
708-#if defined(_MSC_VER) || defined(__SYMBIAN32__)
709-typedef unsigned short stbi__uint16;
710-typedef signed short stbi__int16;
711-typedef unsigned int stbi__uint32;
712-typedef signed int stbi__int32;
713-#else
714-#include <stdint.h>
715-typedef uint16_t stbi__uint16;
716-typedef int16_t stbi__int16;
717-typedef uint32_t stbi__uint32;
718-typedef int32_t stbi__int32;
719-#endif
720-
721-// should produce compiler error if size is wrong
722-typedef unsigned char validate_uint32[sizeof(stbi__uint32) == 4 ? 1 : -1];
723-
724-#ifdef _MSC_VER
725-#define STBI_NOTUSED(v) (void)(v)
726-#else
727-#define STBI_NOTUSED(v) (void)sizeof(v)
728-#endif
729-
730-#ifdef _MSC_VER
731-#define STBI_HAS_LROTL
732-#endif
733-
734-#ifdef STBI_HAS_LROTL
735-#define stbi_lrot(x, y) _lrotl(x, y)
736-#else
737-#define stbi_lrot(x, y) (((x) << (y)) | ((x) >> (-(y) & 31)))
738-#endif
739-
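The portable `stbi_lrot` fallback above is the usual rotate-left idiom. A minimal sketch of the same expression as a function, with its precondition spelled out; `rotl32` is a hypothetical name:

```c
#include <assert.h>
#include <stdint.h>

/* Same expression as the stbi_lrot fallback. ((0u - y) & 31) equals
   (32 - y) % 32, so y == 0 yields x | x == x rather than an undefined
   32-bit shift. The left shift still requires y < 32. */
static uint32_t rotl32(uint32_t x, unsigned y)
{
    assert(y < 32);
    return (x << y) | (x >> ((0u - y) & 31u));
}
```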
740-#if defined(STBI_MALLOC) && defined(STBI_FREE) && \
741- (defined(STBI_REALLOC) || defined(STBI_REALLOC_SIZED))
742-// ok
743-#elif !defined(STBI_MALLOC) && !defined(STBI_FREE) && \
744- !defined(STBI_REALLOC) && !defined(STBI_REALLOC_SIZED)
745-// ok
746-#else
747-#error \
748- "Must define all or none of STBI_MALLOC, STBI_FREE, and STBI_REALLOC (or STBI_REALLOC_SIZED)."
749-#endif
750-
751-#ifndef STBI_MALLOC
752-#define STBI_MALLOC(sz) malloc(sz)
753-#define STBI_REALLOC(p, newsz) realloc(p, newsz)
754-#define STBI_FREE(p) free(p)
755-#endif
756-
757-#ifndef STBI_REALLOC_SIZED
758-#define STBI_REALLOC_SIZED(p, oldsz, newsz) STBI_REALLOC(p, newsz)
759-#endif
760-
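The preprocessor check above accepts either all of the allocator macros or none of them. A minimal sketch of a client-supplied allocator that counts live blocks; `tracked_malloc`, `tracked_realloc`, `tracked_free`, and `live_blocks` are hypothetical names:

```c
#include <assert.h>
#include <stdlib.h>

/* A client would point all three macros at these before compiling the
   implementation, for example:
     #define STBI_MALLOC(sz)        tracked_malloc(sz)
     #define STBI_REALLOC(p, newsz) tracked_realloc(p, newsz)
     #define STBI_FREE(p)           tracked_free(p)
   Supplying only some of them trips the #error check. */
static long live_blocks = 0;

static void *tracked_malloc(size_t sz)
{
    void *p = malloc(sz);
    if (p) ++live_blocks;
    return p;
}

static void *tracked_realloc(void *p, size_t newsz)
{
    void *q = realloc(p, newsz);
    if (q && !p) ++live_blocks; /* realloc(NULL, n) acts like malloc(n) */
    return q;
}

static void tracked_free(void *p)
{
    if (p) {
        assert(live_blocks > 0);
        --live_blocks;
    }
    free(p);
}
```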
761-// x86/x64 detection
762-#if defined(__x86_64__) || defined(_M_X64)
763-#define STBI__X64_TARGET
764-#elif defined(__i386) || defined(_M_IX86)
765-#define STBI__X86_TARGET
766-#endif
767-
768-#if defined(__GNUC__) && defined(STBI__X86_TARGET) && !defined(__SSE2__) && \
769- !defined(STBI_NO_SIMD)
770-// gcc doesn't support sse2 intrinsics unless you compile with -msse2,
771-// which in turn means it gets to use SSE2 everywhere. This is unfortunate,
772-// but previous attempts to provide the SSE2 functions with runtime
773-// detection caused numerous issues. The way architecture extensions are
774-// exposed in GCC/Clang is, sadly, not really suited for one-file libs.
775-// New behavior: if compiled with -msse2, we use SSE2 without any
776-// detection; if not, we don't use it at all.
777-#define STBI_NO_SIMD
778-#endif
779-
780-#if defined(__MINGW32__) && defined(STBI__X86_TARGET) && \
781- !defined(STBI_MINGW_ENABLE_SSE2) && !defined(STBI_NO_SIMD)
782-// Note that __MINGW32__ doesn't actually mean 32-bit, so we have to avoid
783-// STBI__X64_TARGET
784-//
785-// 32-bit MinGW wants ESP to be 16-byte aligned, but this is not in the
786-// Windows ABI and VC++ as well as Windows DLLs don't maintain that invariant.
787-// As a result, enabling SSE2 on 32-bit MinGW is dangerous when not
788-// simultaneously enabling "-mstackrealign".
789-//
790-// See https://github.com/nothings/stb/issues/81 for more information.
791-//
792-// So default to no SSE2 on 32-bit MinGW. If you've read this far and added
793-// -mstackrealign to your build settings, feel free to #define
794-// STBI_MINGW_ENABLE_SSE2.
795-#define STBI_NO_SIMD
796-#endif
797-
798-#if !defined(STBI_NO_SIMD) && \
799- (defined(STBI__X86_TARGET) || defined(STBI__X64_TARGET))
800-#define STBI_SSE2
801-#include <emmintrin.h>
802-
803-#ifdef _MSC_VER
804-
805-#if _MSC_VER >= 1400 // not VC6
806-#include <intrin.h> // __cpuid
807-static int
808-stbi__cpuid3(void)
809-{
810- int info[4];
811- __cpuid(info, 1);
812- return info[3];
813-}
814-#else
815-static int
816-stbi__cpuid3(void)
817-{
818- int res;
819- __asm {
820- mov eax,1
821- cpuid
822- mov res,edx
823- }
824- return res;
825-}
826-#endif
827-
828-#define STBI_SIMD_ALIGN(type, name) __declspec(align(16)) type name
829-
830-#if !defined(STBI_NO_JPEG) && defined(STBI_SSE2)
831-static int
832-stbi__sse2_available(void)
833-{
834- int info3 = stbi__cpuid3();
835- return ((info3 >> 26) & 1) != 0;
836-}
837-#endif
838-
839-#else // assume GCC-style if not VC++
840-#define STBI_SIMD_ALIGN(type, name) type name __attribute__((aligned(16)))
841-
842-#if !defined(STBI_NO_JPEG) && defined(STBI_SSE2)
843-static int
844-stbi__sse2_available(void)
845-{
846- // If we're even attempting to compile this on GCC/Clang, that means
847- // -msse2 is on, which means the compiler is allowed to use SSE2
848- // instructions at will, and so are we.
849- return 1;
850-}
851-#endif
852-
853-#endif
854-#endif
855-
856-// ARM NEON
857-#if defined(STBI_NO_SIMD) && defined(STBI_NEON)
858-#undef STBI_NEON
859-#endif
860-
861-#ifdef STBI_NEON
862-#include <arm_neon.h>
863-#ifdef _MSC_VER
864-#define STBI_SIMD_ALIGN(type, name) __declspec(align(16)) type name
865-#else
866-#define STBI_SIMD_ALIGN(type, name) type name __attribute__((aligned(16)))
867-#endif
868-#endif
869-
870-#ifndef STBI_SIMD_ALIGN
871-#define STBI_SIMD_ALIGN(type, name) type name
872-#endif
873-
874-#ifndef STBI_MAX_DIMENSIONS
875-#define STBI_MAX_DIMENSIONS (1 << 24)
876-#endif
877-
878-///////////////////////////////////////////////
879-//
880-// stbi__context struct and start_xxx functions
881-
882-// stbi__context structure is our basic context used by all images, so it
883-// contains all the IO context, plus some basic image information
884-typedef struct {
885- stbi__uint32 img_x, img_y;
886- int img_n, img_out_n;
887-
888- stbi_io_callbacks io;
889- void *io_user_data;
890-
891- int read_from_callbacks;
892- int buflen;
893- stbi_uc buffer_start[128];
894- int callback_already_read;
895-
896- stbi_uc *img_buffer, *img_buffer_end;
897- stbi_uc *img_buffer_original, *img_buffer_original_end;
898-} stbi__context;
899-
900-static void
901-stbi__refill_buffer(stbi__context *s);
902-
903-// initialize a memory-decode context
904-static void
905-stbi__start_mem(stbi__context *s, stbi_uc const *buffer, int len)
906-{
907- s->io.read = NULL;
908- s->read_from_callbacks = 0;
909- s->callback_already_read = 0;
910- s->img_buffer = s->img_buffer_original = (stbi_uc *)buffer;
911- s->img_buffer_end = s->img_buffer_original_end = (stbi_uc *)buffer + len;
912-}
913-
914-// initialize a callback-based context
915-static void
916-stbi__start_callbacks(stbi__context *s, stbi_io_callbacks *c, void *user)
917-{
918- s->io = *c;
919- s->io_user_data = user;
920- s->buflen = sizeof(s->buffer_start);
921- s->read_from_callbacks = 1;
922- s->callback_already_read = 0;
923- s->img_buffer = s->img_buffer_original = s->buffer_start;
924- stbi__refill_buffer(s);
925- s->img_buffer_original_end = s->img_buffer_end;
926-}
927-
928-#ifndef STBI_NO_STDIO
929-
930-static int
931-stbi__stdio_read(void *user, char *data, int size)
932-{
933- return (int)fread(data, 1, size, (FILE *)user);
934-}
935-
936-static void
937-stbi__stdio_skip(void *user, int n)
938-{
939- int ch;
940- fseek((FILE *)user, n, SEEK_CUR);
941- ch = fgetc((FILE *)user); /* have to read a byte to reset feof()'s flag */
942- if (ch != EOF) {
943- ungetc(ch, (FILE *)user); /* push byte back onto stream if valid. */
944- }
945-}
946-
947-static int
948-stbi__stdio_eof(void *user)
949-{
950- return feof((FILE *)user) || ferror((FILE *)user);
951-}
952-
953-static stbi_io_callbacks stbi__stdio_callbacks = {
954- stbi__stdio_read,
955- stbi__stdio_skip,
956- stbi__stdio_eof,
957-};
958-
959-static void
960-stbi__start_file(stbi__context *s, FILE *f)
961-{
962- stbi__start_callbacks(s, &stbi__stdio_callbacks, (void *)f);
963-}
964-
965-// static void stop_file(stbi__context *s) { }
966-
967-#endif // !STBI_NO_STDIO
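The stdio callbacks above are one instance of a three-function read/skip/eof interface. The following is a minimal sketch of the same shape backed by a caller-owned memory buffer instead of a `FILE *`. `mem_io_callbacks` is a hypothetical local mirror of `stbi_io_callbacks` (its field order matches the `stbi__stdio_callbacks` initializer), and `mem_cursor` and the `mem_*` functions are hypothetical names. The skip callback clamps in both directions, so a negative `n` moves the cursor backwards.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical mirror of the read/skip/eof callback table used above. */
typedef struct {
    int (*read)(void *user, char *data, int size);
    void (*skip)(void *user, int n);
    int (*eof)(void *user);
} mem_io_callbacks;

/* Hypothetical cursor over a caller-owned byte buffer. */
typedef struct {
    const unsigned char *data;
    int len;
    int pos;
} mem_cursor;

static int mem_read(void *user, char *out, int size)
{
    mem_cursor *c = (mem_cursor *)user;
    int avail = c->len - c->pos;
    int n = size < avail ? size : avail; /* never read past the end */
    if (n > 0) {
        memcpy(out, c->data + c->pos, (size_t)n);
        c->pos += n;
    }
    return n; /* 0 signals end of data, like fread() */
}

static void mem_skip(void *user, int n)
{
    mem_cursor *c = (mem_cursor *)user;
    c->pos += n;
    if (c->pos > c->len) c->pos = c->len; /* clamp forward skips */
    if (c->pos < 0) c->pos = 0;           /* clamp backward skips */
}

static int mem_eof(void *user)
{
    mem_cursor *c = (mem_cursor *)user;
    return c->pos >= c->len;
}

static const mem_io_callbacks mem_callbacks = {mem_read, mem_skip, mem_eof};
```

The `user` pointer carries all of the state, which is why the same table can serve any number of independent buffers at once.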
968-
969-static void
970-stbi__rewind(stbi__context *s)
971-{
972- // conceptually rewind SHOULD rewind to the beginning of the stream,
973- // but we just rewind to the beginning of the initial buffer, because
974- // we only use it after doing 'test', which only ever looks at at most 92
975- // bytes
976- s->img_buffer = s->img_buffer_original;
977- s->img_buffer_end = s->img_buffer_original_end;
978-}
979-
980-enum { STBI_ORDER_RGB, STBI_ORDER_BGR };
981-
982-typedef struct {
983- int bits_per_channel;
984- int num_channels;
985- int channel_order;
986-} stbi__result_info;
987-
988-#ifndef STBI_NO_JPEG
989-static int
990-stbi__jpeg_test(stbi__context *s);
991-static void *
992-stbi__jpeg_load(stbi__context *s, int *x, int *y, int *comp, int req_comp,
993- stbi__result_info *ri);
994-static int
995-stbi__jpeg_info(stbi__context *s, int *x, int *y, int *comp);
996-#endif
997-
998-#ifndef STBI_NO_PNG
999-static int
1000-stbi__png_test(stbi__context *s);
1001-static void *
1002-stbi__png_load(stbi__context *s, int *x, int *y, int *comp, int req_comp,
1003- stbi__result_info *ri);
1004-static int
1005-stbi__png_info(stbi__context *s, int *x, int *y, int *comp);
1006-static int
1007-stbi__png_is16(stbi__context *s);
1008-#endif
1009-
1010-#ifndef STBI_NO_BMP
1011-static int
1012-stbi__bmp_test(stbi__context *s);
1013-static void *
1014-stbi__bmp_load(stbi__context *s, int *x, int *y, int *comp, int req_comp,
1015- stbi__result_info *ri);
1016-static int
1017-stbi__bmp_info(stbi__context *s, int *x, int *y, int *comp);
1018-#endif
1019-
1020-#ifndef STBI_NO_TGA
1021-static int
1022-stbi__tga_test(stbi__context *s);
1023-static void *
1024-stbi__tga_load(stbi__context *s, int *x, int *y, int *comp, int req_comp,
1025- stbi__result_info *ri);
1026-static int
1027-stbi__tga_info(stbi__context *s, int *x, int *y, int *comp);
1028-#endif
1029-
1030-#ifndef STBI_NO_PSD
1031-static int
1032-stbi__psd_test(stbi__context *s);
1033-static void *
1034-stbi__psd_load(stbi__context *s, int *x, int *y, int *comp, int req_comp,
1035- stbi__result_info *ri, int bpc);
1036-static int
1037-stbi__psd_info(stbi__context *s, int *x, int *y, int *comp);
1038-static int
1039-stbi__psd_is16(stbi__context *s);
1040-#endif
1041-
1042-#ifndef STBI_NO_HDR
1043-static int
1044-stbi__hdr_test(stbi__context *s);
1045-static float *
1046-stbi__hdr_load(stbi__context *s, int *x, int *y, int *comp, int req_comp,
1047- stbi__result_info *ri);
1048-static int
1049-stbi__hdr_info(stbi__context *s, int *x, int *y, int *comp);
1050-#endif
1051-
1052-#ifndef STBI_NO_PIC
1053-static int
1054-stbi__pic_test(stbi__context *s);
1055-static void *
1056-stbi__pic_load(stbi__context *s, int *x, int *y, int *comp, int req_comp,
1057- stbi__result_info *ri);
1058-static int
1059-stbi__pic_info(stbi__context *s, int *x, int *y, int *comp);
1060-#endif
1061-
1062-#ifndef STBI_NO_GIF
1063-static int
1064-stbi__gif_test(stbi__context *s);
1065-static void *
1066-stbi__gif_load(stbi__context *s, int *x, int *y, int *comp, int req_comp,
1067- stbi__result_info *ri);
1068-static void *
1069-stbi__load_gif_main(stbi__context *s, int **delays, int *x, int *y, int *z,
1070- int *comp, int req_comp);
1071-static int
1072-stbi__gif_info(stbi__context *s, int *x, int *y, int *comp);
1073-#endif
1074-
1075-#ifndef STBI_NO_PNM
1076-static int
1077-stbi__pnm_test(stbi__context *s);
1078-static void *
1079-stbi__pnm_load(stbi__context *s, int *x, int *y, int *comp, int req_comp,
1080- stbi__result_info *ri);
1081-static int
1082-stbi__pnm_info(stbi__context *s, int *x, int *y, int *comp);
1083-static int
1084-stbi__pnm_is16(stbi__context *s);
1085-#endif
1086-
1087-static
1088-#ifdef STBI_THREAD_LOCAL
1089- STBI_THREAD_LOCAL
1090-#endif
1091- const char *stbi__g_failure_reason;
1092-
1093-STBIDEF const char *
1094-stbi_failure_reason(void)
1095-{
1096- return stbi__g_failure_reason;
1097-}
1098-
1099-#ifndef STBI_NO_FAILURE_STRINGS
1100-static int
1101-stbi__err(const char *str)
1102-{
1103- stbi__g_failure_reason = str;
1104- return 0;
1105-}
1106-#endif
1107-
1108-static void *
1109-stbi__malloc(size_t size)
1110-{
1111- return STBI_MALLOC(size);
1112-}
1113-
1114-// stb_image uses ints pervasively, including for offset calculations.
1115-// therefore the largest decoded image size we can support with the
1116-// current code, even on 64-bit targets, is INT_MAX. this is not a
1117-// significant limitation for the intended use case.
1118-//
1119-// we do, however, need to make sure our size calculations don't
1120-// overflow. hence a few helper functions for size calculations that
1121-// multiply integers together, making sure that they're non-negative
1122-// and no overflow occurs.
1123-
1124-// return 1 if the sum is valid, 0 on overflow. a negative b is invalid;
1125-// a is assumed non-negative (callers pass an already-validated product).
1126-static int
1127-stbi__addsizes_valid(int a, int b)
1128-{
1129- if (b < 0) {
1130- return 0;
1131- }
1132- // now 0 <= b <= INT_MAX, hence also
1133- // 0 <= INT_MAX - b <= INT_MAX.
1134- // And "a + b <= INT_MAX" (which might overflow) is the
1135- // same as a <= INT_MAX - b (no overflow)
1136- return a <= INT_MAX - b;
1137-}
1138-
1139-// returns 1 if the product is valid, 0 on overflow.
1140-// negative factors are considered invalid.
1141-static int
1142-stbi__mul2sizes_valid(int a, int b)
1143-{
1144- if (a < 0 || b < 0) {
1145- return 0;
1146- }
1147- if (b == 0) {
1148- return 1; // mul-by-0 is always safe
1149- }
1150- // portable way to check for no overflows in a*b
1151- return a <= INT_MAX / b;
1152-}
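The two checks above avoid overflow by rearranging the inequality so that the comparison itself cannot overflow: `a + b <= INT_MAX` becomes `a <= INT_MAX - b`, and `a * b <= INT_MAX` becomes `a <= INT_MAX / b`. A minimal sketch with hypothetical names `add_sizes_ok` and `mul_sizes_ok`, mirroring the same logic (including that only `b` is checked for negativity in the sum):

```c
#include <assert.h>
#include <limits.h>

/* Sum check: assumes a >= 0, as the callers above guarantee. */
static int add_sizes_ok(int a, int b)
{
    if (b < 0) return 0;     /* negative sizes are rejected */
    return a <= INT_MAX - b; /* a + b <= INT_MAX, rearranged */
}

/* Product check: both factors must be non-negative. */
static int mul_sizes_ok(int a, int b)
{
    if (a < 0 || b < 0) return 0;
    if (b == 0) return 1;    /* anything times 0 is 0 */
    return a <= INT_MAX / b; /* a * b <= INT_MAX, rearranged */
}
```

For example, with a 32-bit `int`, 46340 * 46340 fits but 46341 * 46341 does not, and the division form reports that without ever computing the overflowing product.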
1153-
1154-#if !defined(STBI_NO_JPEG) || !defined(STBI_NO_PNG) || \
1155- !defined(STBI_NO_TGA) || !defined(STBI_NO_HDR)
1156-// returns 1 if "a*b + add" has no negative terms/factors and doesn't overflow
1157-static int
1158-stbi__mad2sizes_valid(int a, int b, int add)
1159-{
1160- return stbi__mul2sizes_valid(a, b) && stbi__addsizes_valid(a * b, add);
1161-}
1162-#endif
1163-
1164-// returns 1 if "a*b*c + add" has no negative terms/factors and doesn't overflow
1165-static int
1166-stbi__mad3sizes_valid(int a, int b, int c, int add)
1167-{
1168- return stbi__mul2sizes_valid(a, b) && stbi__mul2sizes_valid(a * b, c) &&
1169- stbi__addsizes_valid(a * b * c, add);
1170-}
1171-
1172-// returns 1 if "a*b*c*d + add" has no negative terms/factors and doesn't
1173-// overflow
1174-#if !defined(STBI_NO_LINEAR) || !defined(STBI_NO_HDR) || !defined(STBI_NO_PNM)
1175-static int
1176-stbi__mad4sizes_valid(int a, int b, int c, int d, int add)
1177-{
1178- return stbi__mul2sizes_valid(a, b) && stbi__mul2sizes_valid(a * b, c) &&
1179- stbi__mul2sizes_valid(a * b * c, d) &&
1180- stbi__addsizes_valid(a * b * c * d, add);
1181-}
1182-#endif
1183-
1184-#if !defined(STBI_NO_JPEG) || !defined(STBI_NO_PNG) || \
1185- !defined(STBI_NO_TGA) || !defined(STBI_NO_HDR)
1186-// mallocs with size overflow checking
1187-static void *
1188-stbi__malloc_mad2(int a, int b, int add)
1189-{
1190- if (!stbi__mad2sizes_valid(a, b, add)) {
1191- return NULL;
1192- }
1193- return stbi__malloc(a * b + add);
1194-}
1195-#endif
1196-
1197-static void *
1198-stbi__malloc_mad3(int a, int b, int c, int add)
1199-{
1200- if (!stbi__mad3sizes_valid(a, b, c, add)) {
1201- return NULL;
1202- }
1203- return stbi__malloc(a * b * c + add);
1204-}
1205-
1206-#if !defined(STBI_NO_LINEAR) || !defined(STBI_NO_HDR) || !defined(STBI_NO_PNM)
1207-static void *
1208-stbi__malloc_mad4(int a, int b, int c, int d, int add)
1209-{
1210- if (!stbi__mad4sizes_valid(a, b, c, d, add)) {
1211- return NULL;
1212- }
1213- return stbi__malloc(a * b * c * d + add);
1214-}
1215-#endif
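The `stbi__malloc_madN` helpers validate the whole size expression in `int` arithmetic before allocating, and return `NULL` instead of calling `malloc` with a wrapped-around size. A minimal sketch of the three-factor case, assuming plain `malloc` in place of `STBI_MALLOC`; `mul_ok` and `malloc_mad3_sketch` are hypothetical names:

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Non-negative product check, same rearrangement as above. */
static int mul_ok(int a, int b)
{
    if (a < 0 || b < 0) return 0;
    return b == 0 || a <= INT_MAX / b;
}

/* Allocate a*b*c + add bytes, or return NULL if that would overflow int. */
static void *malloc_mad3_sketch(int a, int b, int c, int add)
{
    if (!mul_ok(a, b) || !mul_ok(a * b, c)) return NULL;
    if (add < 0 || a * b * c > INT_MAX - add) return NULL;
    return malloc((size_t)(a * b * c + add));
}
```

Each multiplication is only performed after the previous check proved it safe, so `a * b` is never evaluated when it could overflow.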
1216-
1217-// returns 1 if the sum of two signed ints is valid (between -2^31 and 2^31-1
1218-// inclusive), 0 on overflow.
1219-static int
1220-stbi__addints_valid(int a, int b)
1221-{
1222- if ((a >= 0) != (b >= 0)) {
1223- return 1; // a and b have different signs, so no overflow
1224- }
1225- if (a < 0 && b < 0) {
1226- // same as a + b >= INT_MIN; INT_MIN - b cannot overflow since b < 0.
1227- return a >= INT_MIN - b;
1228- }
1229- return a <= INT_MAX - b;
1230-}
1231-
1232-// returns 1 if the product of two ints fits in a signed short, 0 on overflow.
1233-static int
1234-stbi__mul2shorts_valid(int a, int b)
1235-{
1236- if (b == 0 || b == -1) {
1237- // multiplication by 0 is always 0; check for -1 so SHRT_MIN/b doesn't overflow
1238- return 1;
1239- }
1240- if ((a >= 0) == (b >= 0)) {
1241- // product is positive, so similar to mul2sizes_valid
1242- return a <= SHRT_MAX / b;
1243- }
1244- if (b < 0) {
1245- return a <= SHRT_MIN / b; // same as a * b >= SHRT_MIN
1246- }
1247- return a >= SHRT_MIN / b;
1248-}
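A different technique from the division-based checks above, useful as a reference when testing them, is to widen to `long long`, compute the exact result, and range-check it. This is a sketch, not what stb_image does; `add_ints_fit` and `mul_fits_short` are hypothetical names, and it assumes `long long` is wider than `int` (true on common platforms).

```c
#include <assert.h>
#include <limits.h>

/* Exact sum in a wider type, then compare against int's range. */
static int add_ints_fit(int a, int b)
{
    long long s = (long long)a + b;
    return s >= INT_MIN && s <= INT_MAX;
}

/* Exact product in a wider type, then compare against short's range. */
static int mul_fits_short(int a, int b)
{
    long long p = (long long)a * b;
    return p >= SHRT_MIN && p <= SHRT_MAX;
}
```

One edge where the two approaches differ: the widened version rejects `SHRT_MIN * -1` (32768), whereas `stbi__mul2shorts_valid` returns 1 for any `a` when `b == -1`, so they are not interchangeable at that boundary.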
1249-
1250-// stbi__err - error
1251-// stbi__errpf - error returning pointer to float
1252-// stbi__errpuc - error returning pointer to unsigned char
1253-
1254-#ifdef STBI_NO_FAILURE_STRINGS
1255-#define stbi__err(x, y) 0
1256-#elif defined(STBI_FAILURE_USERMSG)
1257-#define stbi__err(x, y) stbi__err(y)
1258-#else
1259-#define stbi__err(x, y) stbi__err(x)
1260-#endif
1261-
1262-#define stbi__errpf(x, y) ((float *)(size_t)(stbi__err(x, y) ? NULL : NULL))
1263-#define stbi__errpuc(x, y) \
1264- ((unsigned char *)(size_t)(stbi__err(x, y) ? NULL : NULL))
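The error macros above combine a global "last failure" string, set by a helper that always returns 0, with a conditional expression whose branches are both `NULL`; the result is a typed null pointer that a loader can `return` directly after recording the reason. A minimal sketch of that pattern; `g_failure_reason`, `set_err`, `ERR_PUC` and `load_or_fail` are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static const char *g_failure_reason;

/* Record the reason and return 0, like stbi__err. */
static int set_err(const char *msg)
{
    g_failure_reason = msg;
    return 0;
}

/* Evaluate set_err for its side effect, then yield a typed NULL. */
#define ERR_PUC(msg) ((unsigned char *)(size_t)(set_err(msg) ? NULL : NULL))

static unsigned char *load_or_fail(int ok)
{
    static unsigned char pixel = 42;
    if (!ok) return ERR_PUC("outofmem");
    return &pixel;
}
```

With `STBI_NO_FAILURE_STRINGS`, the original macro reduces `stbi__err(x, y)` to a plain `0`, so the same call sites still compile and still return `NULL`, just without recording a message.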
1265-
1266-STBIDEF void
1267-stbi_image_free(void *retval_from_stbi_load)
1268-{
1269- STBI_FREE(retval_from_stbi_load);
1270-}
1271-
1272-#ifndef STBI_NO_LINEAR
1273-static float *
1274-stbi__ldr_to_hdr(stbi_uc *data, int x, int y, int comp);
1275-#endif
1276-
1277-#ifndef STBI_NO_HDR
1278-static stbi_uc *
1279-stbi__hdr_to_ldr(float *data, int x, int y, int comp);
1280-#endif
1281-
1282-static int stbi__vertically_flip_on_load_global = 0;
1283-
1284-STBIDEF void
1285-stbi_set_flip_vertically_on_load(int flag_true_if_should_flip)
1286-{
1287- stbi__vertically_flip_on_load_global = flag_true_if_should_flip;
1288-}
1289-
1290-#ifndef STBI_THREAD_LOCAL
1291-#define stbi__vertically_flip_on_load stbi__vertically_flip_on_load_global
1292-#else
1293-static STBI_THREAD_LOCAL int stbi__vertically_flip_on_load_local,
1294- stbi__vertically_flip_on_load_set;
1295-
1296-STBIDEF void
1297-stbi_set_flip_vertically_on_load_thread(int flag_true_if_should_flip)
1298-{
1299- stbi__vertically_flip_on_load_local = flag_true_if_should_flip;
1300- stbi__vertically_flip_on_load_set = 1;
1301-}
1302-
1303-#define stbi__vertically_flip_on_load \
1304- (stbi__vertically_flip_on_load_set ? stbi__vertically_flip_on_load_local \
1305- : stbi__vertically_flip_on_load_global)
1306-#endif // STBI_THREAD_LOCAL
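The flip setting above is a process-wide default that a thread can override: a per-thread "has been set" flag decides which value wins. A minimal sketch of that pattern, assuming C11 `_Thread_local` in place of the `STBI_THREAD_LOCAL` macro; `flip_global`, `set_flip`, `set_flip_thread` and `effective_flip` are hypothetical names:

```c
#include <assert.h>

static int flip_global = 0;
static _Thread_local int flip_local, flip_local_set;

/* Change the default seen by every thread without an override. */
static void set_flip(int flag) { flip_global = flag; }

/* Override for the calling thread only. */
static void set_flip_thread(int flag)
{
    flip_local = flag;
    flip_local_set = 1; /* from now on this thread ignores the global */
}

/* Same selection as the stbi__vertically_flip_on_load macro. */
static int effective_flip(void)
{
    return flip_local_set ? flip_local : flip_global;
}
```

As in the code shown here, there is no call that clears a thread's override once it has been set; later changes to the global default no longer affect that thread.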
1307-
1308-static void *
1309-stbi__load_main(stbi__context *s, int *x, int *y, int *comp, int req_comp,
1310- stbi__result_info *ri, int bpc)
1311-{
1312- // make sure it's initialized if we add new fields
1313- memset(ri, 0, sizeof(*ri));
1314- // default is 8 so most paths don't have to be changed
1315- ri->bits_per_channel = 8;
1316- // all current input & output are this, but this is here
1317- // so we can add BGR order
1318- ri->channel_order = STBI_ORDER_RGB;
1319- ri->num_channels = 0;
1320-
1321-// test the formats with a very explicit header first (at least a FOURCC
1322-// or distinctive magic number first)
1323-#ifndef STBI_NO_PNG
1324- if (stbi__png_test(s)) {
1325- return stbi__png_load(s, x, y, comp, req_comp, ri);
1326- }
1327-#endif
1328-#ifndef STBI_NO_BMP
1329- if (stbi__bmp_test(s)) {
1330- return stbi__bmp_load(s, x, y, comp, req_comp, ri);
1331- }
1332-#endif
1333-#ifndef STBI_NO_GIF
1334- if (stbi__gif_test(s)) {
1335- return stbi__gif_load(s, x, y, comp, req_comp, ri);
1336- }
1337-#endif
1338-#ifndef STBI_NO_PSD
1339- if (stbi__psd_test(s)) {
1340- return stbi__psd_load(s, x, y, comp, req_comp, ri, bpc);
1341- }
1342-#else
1343- STBI_NOTUSED(bpc);
1344-#endif
1345-#ifndef STBI_NO_PIC
1346- if (stbi__pic_test(s)) {
1347- return stbi__pic_load(s, x, y, comp, req_comp, ri);
1348- }
1349-#endif
1350-
1351-// then the formats that can end up attempting to load with just 1 or 2
1352-// bytes matching expectations; these are prone to false positives, so
1353-// try them later
1354-#ifndef STBI_NO_JPEG
1355- if (stbi__jpeg_test(s)) {
1356- return stbi__jpeg_load(s, x, y, comp, req_comp, ri);
1357- }
1358-#endif
1359-#ifndef STBI_NO_PNM
1360- if (stbi__pnm_test(s)) {
1361- return stbi__pnm_load(s, x, y, comp, req_comp, ri);
1362- }
1363-#endif
1364-
1365-#ifndef STBI_NO_HDR
1366- if (stbi__hdr_test(s)) {
1367- float *hdr = stbi__hdr_load(s, x, y, comp, req_comp, ri);
1368- return stbi__hdr_to_ldr(hdr, *x, *y, req_comp ? req_comp : *comp);
1369- }
1370-#endif
1371-
1372-#ifndef STBI_NO_TGA
1373- // test tga last because it's a crappy test!
1374- if (stbi__tga_test(s)) {
1375- return stbi__tga_load(s, x, y, comp, req_comp, ri);
1376- }
1377-#endif
1378-
1379- return stbi__errpuc("unknown image type",
1380- "Image not of any known type, or corrupt");
1381-}
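`stbi__load_main` tries formats with long, distinctive signatures first and leaves formats that can match on one or two bytes for later, to limit false positives. A much cruder sketch of that ordering, using the standard signatures (the 8-byte PNG signature, "GIF8", "BM", and the JPEG SOI marker `FF D8`); `image_format` and `sniff_format` are hypothetical, and the real `stbi__*_test` functions validate more of each header than this does.

```c
#include <assert.h>
#include <string.h>

typedef enum { FMT_UNKNOWN, FMT_PNG, FMT_GIF, FMT_BMP, FMT_JPEG } image_format;

static image_format sniff_format(const unsigned char *p, size_t n)
{
    static const unsigned char png_sig[8] = {137, 80, 78, 71, 13, 10, 26, 10};
    /* Strong signatures first. */
    if (n >= 8 && memcmp(p, png_sig, 8) == 0) return FMT_PNG;
    if (n >= 4 && memcmp(p, "GIF8", 4) == 0) return FMT_GIF;
    if (n >= 2 && p[0] == 'B' && p[1] == 'M') return FMT_BMP;
    /* Weak 2-byte marker: only consulted after the distinctive ones. */
    if (n >= 2 && p[0] == 0xFF && p[1] == 0xD8) return FMT_JPEG;
    return FMT_UNKNOWN;
}
```

Each test only reads bytes it has checked are present, which mirrors why the original rewinds after each `*_test` call instead of consuming input.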
1382-
1383-static stbi_uc *
1384-stbi__convert_16_to_8(stbi__uint16 *orig, int w, int h, int channels)
1385-{
1386- int i;
1387- int img_len = w * h * channels;
1388- stbi_uc *reduced;
1389-
1390- reduced = (stbi_uc *)stbi__malloc(img_len);
1391- if (reduced == NULL) {
1392- return stbi__errpuc("outofmem", "Out of memory");
1393- }
1394-
1395- for (i = 0; i < img_len; ++i) {
1396- // the high byte of each 16-bit value is a sufficient
1397- // approximation of 16->8 bit scaling
1398- reduced[i] = (stbi_uc)((orig[i] >> 8) & 0xFF);
1399- }
1400-
1401- STBI_FREE(orig);
1402- return reduced;
1403-}
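The 16-to-8-bit reduction above keeps only the high byte of each sample. A minimal standalone sketch of the same operation on plain arrays (`samples_16_to_8` is a hypothetical name):

```c
#include <assert.h>
#include <stdint.h>

/* Keep the high byte of each 16-bit sample (truncation, no rounding). */
static void samples_16_to_8(const uint16_t *in, uint8_t *out, int count)
{
    for (int i = 0; i < count; ++i) {
        out[i] = (uint8_t)(in[i] >> 8);
    }
}
```

This is why the comment calls it an approximation: for example 0xFF00 maps to 0xFF (255) here, while exact scaling, round(0xFF00 * 255 / 65535), would give 254.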
1404-
1405-static stbi__uint16 *
1406-stbi__convert_8_to_16(stbi_uc *orig, int w, int h, int channels)
1407-{
1408- int i;
1409- int img_len = w * h * channels;
1410- stbi__uint16 *enlarged;
1411-
1412- enlarged = (stbi__uint16 *)stbi__malloc(img_len * 2);
1413- if (enlarged == NULL) {
1414- return (stbi__uint16 *)stbi__errpuc("outofmem", "Out of memory");
1415- }
1416-
1417- for (i = 0; i < img_len; ++i) {
1418- // replicate to high and low byte,
1419- // maps 0->0, 255->0xffff
1420- enlarged[i] = (stbi__uint16)((orig[i] << 8) + orig[i]);
1421- }
1422-
1423- STBI_FREE(orig);
1424- return enlarged;
1425-}
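Replicating the byte into both halves is the same as multiplying by 257, which maps full-scale 8-bit (255) to full-scale 16-bit (0xFFFF); a plain `v << 8` would top out at 0xFF00. A minimal sketch (`samples_8_to_16` is a hypothetical name):

```c
#include <assert.h>
#include <stdint.h>

/* v -> (v << 8) | v, i.e. v * 257. */
static void samples_8_to_16(const uint8_t *in, uint16_t *out, int count)
{
    for (int i = 0; i < count; ++i) {
        out[i] = (uint16_t)((in[i] << 8) | in[i]);
    }
}
```

A side effect of this choice is that the 16-to-8 reduction (keeping the high byte) recovers every original 8-bit value exactly, so an 8 -> 16 -> 8 round trip is lossless.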
1426-
1427-static void
1428-stbi__vertical_flip(void *image, int w, int h, int bytes_per_pixel)
1429-{
1430- int row;
1431- size_t bytes_per_row = (size_t)w * bytes_per_pixel;
1432- stbi_uc temp[2048];
1433- stbi_uc *bytes = (stbi_uc *)image;
1434-
1435- for (row = 0; row < (h >> 1); row++) {
1436- stbi_uc *row0 = bytes + row * bytes_per_row;
1437- stbi_uc *row1 = bytes + (h - row - 1) * bytes_per_row;
1438- // swap row0 with row1
1439- size_t bytes_left = bytes_per_row;
1440- while (bytes_left) {
1441- size_t bytes_copy =
1442- (bytes_left < sizeof(temp)) ? bytes_left : sizeof(temp);
1443- memcpy(temp, row0, bytes_copy);
1444- memcpy(row0, row1, bytes_copy);
1445- memcpy(row1, temp, bytes_copy);
1446- row0 += bytes_copy;
1447- row1 += bytes_copy;
1448- bytes_left -= bytes_copy;
1449- }
1450- }
1451-}
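`stbi__vertical_flip` swaps row `r` with row `h - 1 - r` for the top half of the image, copying through a fixed-size stack buffer in chunks so that rows of any width can be swapped without a heap allocation. A minimal sketch of the same approach; `flip_rows` is a hypothetical name, and its 64-byte buffer (the original uses 2048) is deliberately small so that wide rows exercise the chunking.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static void flip_rows(void *image, int w, int h, int bytes_per_pixel)
{
    size_t row_bytes = (size_t)w * (size_t)bytes_per_pixel;
    unsigned char tmp[64];
    unsigned char *base = (unsigned char *)image;

    for (int r = 0; r < h / 2; ++r) { /* a middle row (odd h) stays put */
        unsigned char *a = base + (size_t)r * row_bytes;
        unsigned char *b = base + (size_t)(h - 1 - r) * row_bytes;
        size_t left = row_bytes;
        while (left > 0) { /* swap one chunk at a time */
            size_t n = left < sizeof tmp ? left : sizeof tmp;
            memcpy(tmp, a, n);
            memcpy(a, b, n);
            memcpy(b, tmp, n);
            a += n;
            b += n;
            left -= n;
        }
    }
}
```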
1452-
1453-#ifndef STBI_NO_GIF
1454-static void
1455-stbi__vertical_flip_slices(void *image, int w, int h, int z,
1456- int bytes_per_pixel)
1457-{
1458- int slice;
1459- int slice_size = w * h * bytes_per_pixel;
1460-
1461- stbi_uc *bytes = (stbi_uc *)image;
1462- for (slice = 0; slice < z; ++slice) {
1463- stbi__vertical_flip(bytes, w, h, bytes_per_pixel);
1464- bytes += slice_size;
1465- }
1466-}
1467-#endif
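For animated GIFs the frames are stacked in one allocation, and each frame is flipped independently: the frame stride is `w * h * bytes_per_pixel`, and rows are only swapped within a frame, never across frames. A compact sketch that swaps byte by byte for brevity; `flip_frames` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

static void flip_frames(unsigned char *p, int w, int h, int z, int bpp)
{
    size_t row = (size_t)w * (size_t)bpp;
    size_t frame = row * (size_t)h;
    for (int f = 0; f < z; ++f, p += frame) { /* advance one whole frame */
        for (int r = 0; r < h / 2; ++r) {
            unsigned char *a = p + (size_t)r * row;
            unsigned char *b = p + (size_t)(h - 1 - r) * row;
            for (size_t i = 0; i < row; ++i) {
                unsigned char t = a[i];
                a[i] = b[i];
                b[i] = t;
            }
        }
    }
}
```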
1468-
1469-static unsigned char *
1470-stbi__load_and_postprocess_8bit(stbi__context *s, int *x, int *y, int *comp,
1471- int req_comp)
1472-{
1473- stbi__result_info ri;
1474- void *result = stbi__load_main(s, x, y, comp, req_comp, &ri, 8);
1475-
1476- if (result == NULL) {
1477- return NULL;
1478- }
1479-
1480- // it is the responsibility of the loaders to make sure we get either 8 or
1481- // 16 bit.
1482- STBI_ASSERT(ri.bits_per_channel == 8 || ri.bits_per_channel == 16);
1483-
1484- if (ri.bits_per_channel != 8) {
1485- result = stbi__convert_16_to_8((stbi__uint16 *)result, *x, *y,
1486- req_comp == 0 ? *comp : req_comp);
1487- ri.bits_per_channel = 8;
1488- }
1489-
1490- // @TODO: move stbi__convert_format to here
1491-
1492- if (stbi__vertically_flip_on_load) {
1493- int channels = req_comp ? req_comp : *comp;
1494- stbi__vertical_flip(result, *x, *y, channels * sizeof(stbi_uc));
1495- }
1496-
1497- return (unsigned char *)result;
1498-}
1499-
1500-static stbi__uint16 *
1501-stbi__load_and_postprocess_16bit(stbi__context *s, int *x, int *y, int *comp,
1502- int req_comp)
1503-{
1504- stbi__result_info ri;
1505- void *result = stbi__load_main(s, x, y, comp, req_comp, &ri, 16);
1506-
1507- if (result == NULL) {
1508- return NULL;
1509- }
1510-
1511- // it is the responsibility of the loaders to make sure we get either 8 or
1512- // 16 bit.
1513- STBI_ASSERT(ri.bits_per_channel == 8 || ri.bits_per_channel == 16);
1514-
1515- if (ri.bits_per_channel != 16) {
1516- result = stbi__convert_8_to_16((stbi_uc *)result, *x, *y,
1517- req_comp == 0 ? *comp : req_comp);
1518- ri.bits_per_channel = 16;
1519- }
1520-
1521- // @TODO: move stbi__convert_format16 to here
1522- // @TODO: special case RGB-to-Y (and RGBA-to-YA) for 8-bit-to-16-bit case to
1523- // keep more precision
1524-
1525- if (stbi__vertically_flip_on_load) {
1526- int channels = req_comp ? req_comp : *comp;
1527- stbi__vertical_flip(result, *x, *y, channels * sizeof(stbi__uint16));
1528- }
1529-
1530- return (stbi__uint16 *)result;
1531-}
1532-
1533-#if !defined(STBI_NO_HDR) && !defined(STBI_NO_LINEAR)
1534-static void
1535-stbi__float_postprocess(float *result, int *x, int *y, int *comp, int req_comp)
1536-{
1537- if (stbi__vertically_flip_on_load && result != NULL) {
1538- int channels = req_comp ? req_comp : *comp;
1539- stbi__vertical_flip(result, *x, *y, channels * sizeof(float));
1540- }
1541-}
1542-#endif
1543-
1544-#ifndef STBI_NO_STDIO
1545-
1546-#if defined(_WIN32) && defined(STBI_WINDOWS_UTF8)
1547-STBI_EXTERN __declspec(dllimport) int __stdcall MultiByteToWideChar(
1548- unsigned int cp, unsigned long flags, const char *str, int cbmb,
1549- wchar_t *widestr, int cchwide);
1550-STBI_EXTERN __declspec(dllimport) int __stdcall WideCharToMultiByte(
1551- unsigned int cp, unsigned long flags, const wchar_t *widestr, int cchwide,
1552- char *str, int cbmb, const char *defchar, int *used_default);
1553-#endif
1554-
1555-#if defined(_WIN32) && defined(STBI_WINDOWS_UTF8)
1556-STBIDEF int
1557-stbi_convert_wchar_to_utf8(char *buffer, size_t bufferlen, const wchar_t *input)
1558-{
1559- return WideCharToMultiByte(65001 /* UTF8 */, 0, input, -1, buffer,
1560- (int)bufferlen, NULL, NULL);
1561-}
1562-#endif
1563-
1564-static FILE *
1565-stbi__fopen(char const *filename, char const *mode)
1566-{
1567- FILE *f;
1568-#if defined(_WIN32) && defined(STBI_WINDOWS_UTF8)
1569- wchar_t wMode[64];
1570- wchar_t wFilename[1024];
1571- if (0 == MultiByteToWideChar(65001 /* UTF8 */, 0, filename, -1, wFilename,
1572- sizeof(wFilename) / sizeof(*wFilename))) {
1573- return 0;
1574- }
1575-
1576- if (0 == MultiByteToWideChar(65001 /* UTF8 */, 0, mode, -1, wMode,
1577- sizeof(wMode) / sizeof(*wMode))) {
1578- return 0;
1579- }
1580-
1581-#if defined(_MSC_VER) && _MSC_VER >= 1400
1582- if (0 != _wfopen_s(&f, wFilename, wMode)) {
1583- f = 0;
1584- }
1585-#else
1586- f = _wfopen(wFilename, wMode);
1587-#endif
1588-
1589-#elif defined(_MSC_VER) && _MSC_VER >= 1400
1590- if (0 != fopen_s(&f, filename, mode)) {
1591- f = 0;
1592- }
1593-#else
1594- f = fopen(filename, mode);
1595-#endif
1596- return f;
1597-}
1598-
1599-STBIDEF stbi_uc *
1600-stbi_load(char const *filename, int *x, int *y, int *comp, int req_comp)
1601-{
1602- FILE *f = stbi__fopen(filename, "rb");
1603- unsigned char *result;
1604- if (!f) {
1605- return stbi__errpuc("can't fopen", "Unable to open file");
1606- }
1607- result = stbi_load_from_file(f, x, y, comp, req_comp);
1608- fclose(f);
1609- return result;
1610-}
1611-
1612-STBIDEF stbi_uc *
1613-stbi_load_from_file(FILE *f, int *x, int *y, int *comp, int req_comp)
1614-{
1615- unsigned char *result;
1616- stbi__context s;
1617- stbi__start_file(&s, f);
1618- result = stbi__load_and_postprocess_8bit(&s, x, y, comp, req_comp);
1619- if (result) {
1620- // need to 'unget' all the characters in the IO buffer
1621- fseek(f, -(int)(s.img_buffer_end - s.img_buffer), SEEK_CUR);
1622- }
1623- return result;
1624-}
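`stbi_load_from_file` seeks backwards after decoding because the callback layer reads the `FILE` in chunks and may pull in bytes the decoder never consumed. Seeking back by `img_buffer_end - img_buffer` leaves the stream positioned just after the last consumed byte, so the caller can keep reading. A minimal sketch of that idea; `consume_with_lookahead` and its 16-byte chunk are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Read ahead a chunk, "consume" up to `want` bytes, and seek back over the
   rest. Returns the number of bytes consumed, or -1 if the seek fails. */
static int consume_with_lookahead(FILE *f, int want)
{
    unsigned char chunk[16];
    size_t got = fread(chunk, 1, sizeof chunk, f); /* read ahead */
    size_t used = (size_t)want < got ? (size_t)want : got;
    /* "Unget" the bytes that were buffered but not consumed. */
    if (fseek(f, -(long)(got - used), SEEK_CUR) != 0) return -1;
    return (int)used;
}
```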
1625-
1626-STBIDEF stbi__uint16 *
1627-stbi_load_from_file_16(FILE *f, int *x, int *y, int *comp, int req_comp)
1628-{
1629- stbi__uint16 *result;
1630- stbi__context s;
1631- stbi__start_file(&s, f);
1632- result = stbi__load_and_postprocess_16bit(&s, x, y, comp, req_comp);
1633- if (result) {
1634- // need to 'unget' all the characters in the IO buffer
1635- fseek(f, -(int)(s.img_buffer_end - s.img_buffer), SEEK_CUR);
1636- }
1637- return result;
1638-}
1639-
1640-STBIDEF stbi_us *
1641-stbi_load_16(char const *filename, int *x, int *y, int *comp, int req_comp)
1642-{
1643- FILE *f = stbi__fopen(filename, "rb");
1644- stbi__uint16 *result;
1645- if (!f) {
1646- return (stbi_us *)stbi__errpuc("can't fopen", "Unable to open file");
1647- }
1648- result = stbi_load_from_file_16(f, x, y, comp, req_comp);
1649- fclose(f);
1650- return result;
1651-}
1652-
1653-#endif //! STBI_NO_STDIO
1654-
1655-STBIDEF stbi_us *
1656-stbi_load_16_from_memory(stbi_uc const *buffer, int len, int *x, int *y,
1657- int *channels_in_file, int desired_channels)
1658-{
1659- stbi__context s;
1660- stbi__start_mem(&s, buffer, len);
1661- return stbi__load_and_postprocess_16bit(&s, x, y, channels_in_file,
1662- desired_channels);
1663-}
1664-
1665-STBIDEF stbi_us *
1666-stbi_load_16_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x,
1667- int *y, int *channels_in_file, int desired_channels)
1668-{
1669- stbi__context s;
1670- stbi__start_callbacks(&s, (stbi_io_callbacks *)clbk, user);
1671- return stbi__load_and_postprocess_16bit(&s, x, y, channels_in_file,
1672- desired_channels);
1673-}
1674-
1675-STBIDEF stbi_uc *
1676-stbi_load_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp,
1677- int req_comp)
1678-{
1679- stbi__context s;
1680- stbi__start_mem(&s, buffer, len);
1681- return stbi__load_and_postprocess_8bit(&s, x, y, comp, req_comp);
1682-}
1683-
1684-STBIDEF stbi_uc *
1685-stbi_load_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x,
1686- int *y, int *comp, int req_comp)
1687-{
1688- stbi__context s;
1689- stbi__start_callbacks(&s, (stbi_io_callbacks *)clbk, user);
1690- return stbi__load_and_postprocess_8bit(&s, x, y, comp, req_comp);
1691-}
1692-
1693-#ifndef STBI_NO_GIF
1694-STBIDEF stbi_uc *
1695-stbi_load_gif_from_memory(stbi_uc const *buffer, int len, int **delays, int *x,
1696- int *y, int *z, int *comp, int req_comp)
1697-{
1698- unsigned char *result;
1699- stbi__context s;
1700- stbi__start_mem(&s, buffer, len);
1701-
1702- result = (unsigned char *)stbi__load_gif_main(&s, delays, x, y, z, comp,
1703- req_comp);
1704- if (result && stbi__vertically_flip_on_load) {
1705- stbi__vertical_flip_slices(result, *x, *y, *z, req_comp ? req_comp : *comp);
1706- }
1707-
1708- return result;
1709-}
1710-#endif
1711-
1712-#ifndef STBI_NO_LINEAR
1713-static float *
1714-stbi__loadf_main(stbi__context *s, int *x, int *y, int *comp, int req_comp)
1715-{
1716- unsigned char *data;
1717-#ifndef STBI_NO_HDR
1718- if (stbi__hdr_test(s)) {
1719- stbi__result_info ri;
1720- float *hdr_data = stbi__hdr_load(s, x, y, comp, req_comp, &ri);
1721- if (hdr_data) {
1722- stbi__float_postprocess(hdr_data, x, y, comp, req_comp);
1723- }
1724- return hdr_data;
1725- }
1726-#endif
1727- data = stbi__load_and_postprocess_8bit(s, x, y, comp, req_comp);
1728- if (data) {
1729- return stbi__ldr_to_hdr(data, *x, *y, req_comp ? req_comp : *comp);
1730- }
1731- return stbi__errpf("unknown image type",
1732- "Image not of any known type, or corrupt");
1733-}
1734-
1735-STBIDEF float *
1736-stbi_loadf_from_memory(stbi_uc const *buffer, int len, int *x, int *y,
1737- int *comp, int req_comp)
1738-{
1739- stbi__context s;
1740- stbi__start_mem(&s, buffer, len);
1741- return stbi__loadf_main(&s, x, y, comp, req_comp);
1742-}
1743-
1744-STBIDEF float *
1745-stbi_loadf_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x,
1746- int *y, int *comp, int req_comp)
1747-{
1748- stbi__context s;
1749- stbi__start_callbacks(&s, (stbi_io_callbacks *)clbk, user);
1750- return stbi__loadf_main(&s, x, y, comp, req_comp);
1751-}
1752-
1753-#ifndef STBI_NO_STDIO
1754-STBIDEF float *
1755-stbi_loadf(char const *filename, int *x, int *y, int *comp, int req_comp)
1756-{
1757- float *result;
1758- FILE *f = stbi__fopen(filename, "rb");
1759- if (!f) {
1760- return stbi__errpf("can't fopen", "Unable to open file");
1761- }
1762- result = stbi_loadf_from_file(f, x, y, comp, req_comp);
1763- fclose(f);
1764- return result;
1765-}
1766-
1767-STBIDEF float *
1768-stbi_loadf_from_file(FILE *f, int *x, int *y, int *comp, int req_comp)
1769-{
1770- stbi__context s;
1771- stbi__start_file(&s, f);
1772- return stbi__loadf_main(&s, x, y, comp, req_comp);
1773-}
1774-#endif // !STBI_NO_STDIO
1775-
1776-#endif // !STBI_NO_LINEAR
1777-
1778-// these is-hdr-or-not functions are defined independent of whether
1779-// STBI_NO_LINEAR is defined, for API simplicity; if STBI_NO_HDR is defined,
1780-// they always report false!
1781-
1782-STBIDEF int
1783-stbi_is_hdr_from_memory(stbi_uc const *buffer, int len)
1784-{
1785-#ifndef STBI_NO_HDR
1786- stbi__context s;
1787- stbi__start_mem(&s, buffer, len);
1788- return stbi__hdr_test(&s);
1789-#else
1790- STBI_NOTUSED(buffer);
1791- STBI_NOTUSED(len);
1792- return 0;
1793-#endif
1794-}
1795-
1796-#ifndef STBI_NO_STDIO
1797-STBIDEF int
1798-stbi_is_hdr(char const *filename)
1799-{
1800- FILE *f = stbi__fopen(filename, "rb");
1801- int result = 0;
1802- if (f) {
1803- result = stbi_is_hdr_from_file(f);
1804- fclose(f);
1805- }
1806- return result;
1807-}
1808-
1809-STBIDEF int
1810-stbi_is_hdr_from_file(FILE *f)
1811-{
1812-#ifndef STBI_NO_HDR
1813- long pos = ftell(f);
1814- int res;
1815- stbi__context s;
1816- stbi__start_file(&s, f);
1817- res = stbi__hdr_test(&s);
1818- fseek(f, pos, SEEK_SET);
1819- return res;
1820-#else
1821- STBI_NOTUSED(f);
1822- return 0;
1823-#endif
1824-}
1825-#endif // !STBI_NO_STDIO
1826-
1827-STBIDEF int
1828-stbi_is_hdr_from_callbacks(stbi_io_callbacks const *clbk, void *user)
1829-{
1830-#ifndef STBI_NO_HDR
1831- stbi__context s;
1832- stbi__start_callbacks(&s, (stbi_io_callbacks *)clbk, user);
1833- return stbi__hdr_test(&s);
1834-#else
1835- STBI_NOTUSED(clbk);
1836- STBI_NOTUSED(user);
1837- return 0;
1838-#endif
1839-}
1840-
1841-#ifndef STBI_NO_LINEAR
1842-static float stbi__l2h_gamma = 2.2f, stbi__l2h_scale = 1.0f;
1843-
1844-STBIDEF void
1845-stbi_ldr_to_hdr_gamma(float gamma)
1846-{
1847- stbi__l2h_gamma = gamma;
1848-}
1849-STBIDEF void
1850-stbi_ldr_to_hdr_scale(float scale)
1851-{
1852- stbi__l2h_scale = scale;
1853-}
1854-#endif
1855-
1856-static float stbi__h2l_gamma_i = 1.0f / 2.2f, stbi__h2l_scale_i = 1.0f;
1857-
1858-STBIDEF void
1859-stbi_hdr_to_ldr_gamma(float gamma)
1860-{
1861- stbi__h2l_gamma_i = 1 / gamma;
1862-}
1863-STBIDEF void
1864-stbi_hdr_to_ldr_scale(float scale)
1865-{
1866- stbi__h2l_scale_i = 1 / scale;
1867-}
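The setters above store the LDR-to-HDR parameters directly but store the HDR-to-LDR parameters as reciprocals. Assuming the conversion routines (declared above, defined later in the file) apply them in the usual gamma/scale form to color channels, with any alpha channel scaled linearly and the 8-bit result clamped to [0, 255], the two mappings would be:

```latex
$$
f = s_{l}\left(\frac{v}{255}\right)^{\gamma_{l}},
\qquad
v' = \operatorname{clamp}_{[0,255]}\!\left(\left\lfloor 255\left(\frac{f}{s_{h}}\right)^{1/\gamma_{h}} + \tfrac{1}{2}\right\rfloor\right)
$$
```

Here $\gamma_l$ is `stbi__l2h_gamma`, $s_l$ is `stbi__l2h_scale`, and the HDR-to-LDR side keeps $1/\gamma_h$ in `stbi__h2l_gamma_i` and $1/s_h$ in `stbi__h2l_scale_i`. With the defaults ($\gamma_l = \gamma_h = 2.2$, $s_l = s_h = 1$), the second map is the inverse of the first up to rounding.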
1868-
1869-//////////////////////////////////////////////////////////////////////////////
1870-//
1871-// Common code used by all image loaders
1872-//
1873-
1874-enum { STBI__SCAN_load = 0, STBI__SCAN_type, STBI__SCAN_header };
1875-
1876-static void
1877-stbi__refill_buffer(stbi__context *s)
1878-{
1879- int n = (s->io.read)(s->io_user_data, (char *)s->buffer_start, s->buflen);
1880- s->callback_already_read += (int)(s->img_buffer - s->img_buffer_original);
1881- if (n == 0) {
1882- // at end of file, treat same as if from memory, but need to handle case
1883- // where s->img_buffer isn't pointing to safe memory, e.g. 0-byte file
1884- s->read_from_callbacks = 0;
1885- s->img_buffer = s->buffer_start;
1886- s->img_buffer_end = s->buffer_start + 1;
1887- *s->img_buffer = 0;
1888- } else {
1889- s->img_buffer = s->buffer_start;
1890- s->img_buffer_end = s->buffer_start + n;
1891- }
1892-}
1893-
1894-stbi_inline static stbi_uc
1895-stbi__get8(stbi__context *s)
1896-{
1897- if (s->img_buffer < s->img_buffer_end) {
1898- return *s->img_buffer++;
1899- }
1900- if (s->read_from_callbacks) {
1901- stbi__refill_buffer(s);
1902- return *s->img_buffer++;
1903- }
1904- return 0;
1905-}
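`stbi__get8` serves bytes from a small buffer, refills it from the read callback when it runs dry, and, once the source is exhausted, plants a single 0 byte and stops refilling, so callers always read safe memory and simply see zeros past the end. A minimal sketch of that behaviour; `byte_reader`, `refill` and `get8` are hypothetical names, and the in-memory `src` array stands in for `io.read`.

```c
#include <assert.h>
#include <string.h>

typedef struct {
    const unsigned char *src;
    int src_len, src_pos;
    unsigned char buf[4];
    int cur, end; /* valid bytes are buf[cur..end) */
    int exhausted;
} byte_reader;

static void refill(byte_reader *r)
{
    int n = r->src_len - r->src_pos;
    if (n > (int)sizeof r->buf) n = (int)sizeof r->buf;
    if (n > 0) memcpy(r->buf, r->src + r->src_pos, (size_t)n);
    r->src_pos += n;
    r->cur = 0;
    if (n == 0) {
        /* Plant one 0 byte so there is always something safe to read. */
        r->buf[0] = 0;
        r->end = 1;
        r->exhausted = 1;
    } else {
        r->end = n;
    }
}

static unsigned char get8(byte_reader *r)
{
    if (r->cur < r->end) return r->buf[r->cur++];
    if (!r->exhausted) {
        refill(r);
        return r->buf[r->cur++];
    }
    return 0; /* past the end: keep returning 0 */
}
```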
1906-
1907-#if defined(STBI_NO_JPEG) && defined(STBI_NO_HDR) && defined(STBI_NO_PIC) && \
1908- defined(STBI_NO_PNM)
1909-// nothing
1910-#else
1911-stbi_inline static int
1912-stbi__at_eof(stbi__context *s)
1913-{
1914- if (s->io.read) {
1915- if (!(s->io.eof)(s->io_user_data)) {
1916- return 0;
1917- }
1918- // if feof() is true, check whether buffer == end; special case: once a
1919- // refill has hit EOF, only the sentinel 0 byte remains, so report EOF
1920- if (s->read_from_callbacks == 0) {
1921- return 1;
1922- }
1923- }
1924-
1925- return s->img_buffer >= s->img_buffer_end;
1926-}
1927-#endif
1928-
1929-#if defined(STBI_NO_JPEG) && defined(STBI_NO_PNG) && defined(STBI_NO_BMP) && \
1930- defined(STBI_NO_PSD) && defined(STBI_NO_TGA) && defined(STBI_NO_GIF) && \
1931- defined(STBI_NO_PIC)
1932-// nothing
1933-#else
1934-static void
1935-stbi__skip(stbi__context *s, int n)
1936-{
1937- if (n == 0) {
1938- return; // already there!
1939- }
1940- if (n < 0) {
1941- s->img_buffer = s->img_buffer_end;
1942- return;
1943- }
1944- if (s->io.read) {
1945- int blen = (int)(s->img_buffer_end - s->img_buffer);
1946- if (blen < n) {
1947- s->img_buffer = s->img_buffer_end;
1948- (s->io.skip)(s->io_user_data, n - blen);
1949- return;
1950- }
1951- }
1952- s->img_buffer += n;
1953-}
1954-#endif
1955-
1956-#if defined(STBI_NO_PNG) && defined(STBI_NO_TGA) && defined(STBI_NO_HDR) && \
1957- defined(STBI_NO_PNM)
1958-// nothing
1959-#else
1960-static int
1961-stbi__getn(stbi__context *s, stbi_uc *buffer, int n)
1962-{
1963- if (s->io.read) {
1964- int blen = (int)(s->img_buffer_end - s->img_buffer);
1965- if (blen < n) {
1966- int res, count;
1967-
1968- memcpy(buffer, s->img_buffer, blen);
1969-
1970- count =
1971- (s->io.read)(s->io_user_data, (char *)buffer + blen, n - blen);
1972- res = (count == (n - blen));
1973- s->img_buffer = s->img_buffer_end;
1974- return res;
1975- }
1976- }
1977-
1978- if (s->img_buffer + n <= s->img_buffer_end) {
1979- memcpy(buffer, s->img_buffer, n);
1980- s->img_buffer += n;
1981- return 1;
1982- } else {
1983- return 0;
1984- }
1985-}
1986-#endif
1987-
1988-#if defined(STBI_NO_JPEG) && defined(STBI_NO_PNG) && defined(STBI_NO_PSD) && \
1989- defined(STBI_NO_PIC)
1990-// nothing
1991-#else
1992-static int
1993-stbi__get16be(stbi__context *s)
1994-{
1995- int z = stbi__get8(s);
1996- return (z << 8) + stbi__get8(s);
1997-}
1998-#endif
1999-
2000-#if defined(STBI_NO_PNG) && defined(STBI_NO_PSD) && defined(STBI_NO_PIC)
2001-// nothing
2002-#else
2003-static stbi__uint32
2004-stbi__get32be(stbi__context *s)
2005-{
2006- stbi__uint32 z = stbi__get16be(s);
2007- return (z << 16) + stbi__get16be(s);
2008-}
2009-#endif
2010-
2011-#if defined(STBI_NO_BMP) && defined(STBI_NO_TGA) && defined(STBI_NO_GIF)
2012-// nothing
2013-#else
2014-static int
2015-stbi__get16le(stbi__context *s)
2016-{
2017- int z = stbi__get8(s);
2018- return z + (stbi__get8(s) << 8);
2019-}
2020-#endif
2021-
2022-#ifndef STBI_NO_BMP
2023-static stbi__uint32
2024-stbi__get32le(stbi__context *s)
2025-{
2026- stbi__uint32 z = stbi__get16le(s);
2027- z += (stbi__uint32)stbi__get16le(s) << 16;
2028- return z;
2029-}
2030-#endif
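The stbi__get16be/stbi__get32be and stbi__get16le/stbi__get32le helpers above assemble multi-byte integers from successive stbi__get8 calls. Below is a minimal standalone sketch of the same byte assembly over a plain memory buffer. The `byte_cursor` type and `cur_*` functions are hypothetical stand-ins for stbi__context and its readers (memory only, no I/O callbacks). As with stbi__get8, reads past the end return 0.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical memory-only cursor standing in for stbi__context. */
typedef struct {
    const uint8_t *p;
    const uint8_t *end;
} byte_cursor;

/* Like stbi__get8: returns 0 once the buffer is exhausted. */
static int cur_get8(byte_cursor *c)
{
    return (c->p < c->end) ? *c->p++ : 0;
}

static int cur_get16be(byte_cursor *c)
{
    int z = cur_get8(c);              /* high byte comes first */
    return (z << 8) + cur_get8(c);
}

static uint32_t cur_get32be(byte_cursor *c)
{
    uint32_t z = (uint32_t)cur_get16be(c);
    return (z << 16) + (uint32_t)cur_get16be(c);
}

static int cur_get16le(byte_cursor *c)
{
    int z = cur_get8(c);              /* low byte comes first */
    return z + (cur_get8(c) << 8);
}

static uint32_t cur_get32le(byte_cursor *c)
{
    uint32_t z = (uint32_t)cur_get16le(c);
    return z + ((uint32_t)cur_get16le(c) << 16);
}
```

The first byte is read into a local before the second call. This pins down the evaluation order, which C leaves unspecified for the two operands of `+`.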
2031-
2032-#define STBI__BYTECAST(x) \
2033- ((stbi_uc)((x) & 255)) // truncate int to byte without warnings
2034-
2035-#if defined(STBI_NO_JPEG) && defined(STBI_NO_PNG) && defined(STBI_NO_BMP) && \
2036- defined(STBI_NO_PSD) && defined(STBI_NO_TGA) && defined(STBI_NO_GIF) && \
2037- defined(STBI_NO_PIC) && defined(STBI_NO_PNM)
2038-// nothing
2039-#else
2040-//////////////////////////////////////////////////////////////////////////////
2041-//
2042-// generic converter from built-in img_n to req_comp
2043-// individual types do this automatically as much as possible (e.g. jpeg
2044-// does all cases internally since it needs to colorspace convert anyway,
2045-// and it never has alpha, so very few cases). png can automatically
2046-// interleave an alpha=255 channel, but falls back to this for other cases
2047-//
2048-// assume data buffer is malloced, so malloc a new one and free that one
2049-// only failure mode is malloc failing
2050-
2051-static stbi_uc
2052-stbi__compute_y(int r, int g, int b)
2053-{
2054- return (stbi_uc)(((r * 77) + (g * 150) + (29 * b)) >> 8);
2055-}
2056-#endif
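stbi__compute_y uses integer weights 77, 150, and 29. They sum to 256, so `>> 8` divides by the weight total, and any gray input (r == g == b) maps back to itself. As fractions of 256, the weights are roughly 0.301, 0.586, and 0.113, which is close to the familiar 0.299/0.587/0.114 luma coefficients. Here is a standalone copy under a hypothetical name, `luma8`:

```c
#include <stdint.h>

/* Same fixed-point weights as stbi__compute_y; 77 + 150 + 29 == 256. */
static uint8_t luma8(int r, int g, int b)
{
    return (uint8_t)(((r * 77) + (g * 150) + (b * 29)) >> 8);
}
```

The shift truncates rather than rounds. Pure red (255, 0, 0) therefore gives 76, not 77.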
2057-
2058-#if defined(STBI_NO_PNG) && defined(STBI_NO_BMP) && defined(STBI_NO_PSD) && \
2059- defined(STBI_NO_TGA) && defined(STBI_NO_GIF) && defined(STBI_NO_PIC) && \
2060- defined(STBI_NO_PNM)
2061-// nothing
2062-#else
2063-static unsigned char *
2064-stbi__convert_format(unsigned char *data, int img_n, int req_comp,
2065- unsigned int x, unsigned int y)
2066-{
2067- int i, j;
2068- unsigned char *good;
2069-
2070- if (req_comp == img_n) {
2071- return data;
2072- }
2073- STBI_ASSERT(req_comp >= 1 && req_comp <= 4);
2074-
2075- good = (unsigned char *)stbi__malloc_mad3(req_comp, x, y, 0);
2076- if (good == NULL) {
2077- STBI_FREE(data);
2078- return stbi__errpuc("outofmem", "Out of memory");
2079- }
2080-
2081- for (j = 0; j < (int)y; ++j) {
2082- unsigned char *src = data + j * x * img_n;
2083- unsigned char *dest = good + j * x * req_comp;
2084-
2085-#define STBI__COMBO(a, b) ((a) * 8 + (b))
2086-#define STBI__CASE(a, b) \
2087- case STBI__COMBO(a, b): \
2088- for (i = x - 1; i >= 0; --i, src += a, dest += b)
2089- // convert source image with img_n components to one with req_comp
2090- // components; avoid switch per pixel, so use switch per scanline and
2091- // massive macros
2092- switch (STBI__COMBO(img_n, req_comp)) {
2093- STBI__CASE(1, 2)
2094- {
2095- dest[0] = src[0];
2096- dest[1] = 255;
2097- }
2098- break;
2099- STBI__CASE(1, 3) { dest[0] = dest[1] = dest[2] = src[0]; }
2100- break;
2101- STBI__CASE(1, 4)
2102- {
2103- dest[0] = dest[1] = dest[2] = src[0];
2104- dest[3] = 255;
2105- }
2106- break;
2107- STBI__CASE(2, 1) { dest[0] = src[0]; }
2108- break;
2109- STBI__CASE(2, 3) { dest[0] = dest[1] = dest[2] = src[0]; }
2110- break;
2111- STBI__CASE(2, 4)
2112- {
2113- dest[0] = dest[1] = dest[2] = src[0];
2114- dest[3] = src[1];
2115- }
2116- break;
2117- STBI__CASE(3, 4)
2118- {
2119- dest[0] = src[0];
2120- dest[1] = src[1];
2121- dest[2] = src[2];
2122- dest[3] = 255;
2123- }
2124- break;
2125- STBI__CASE(3, 1)
2126- {
2127- dest[0] = stbi__compute_y(src[0], src[1], src[2]);
2128- }
2129- break;
2130- STBI__CASE(3, 2)
2131- {
2132- dest[0] = stbi__compute_y(src[0], src[1], src[2]);
2133- dest[1] = 255;
2134- }
2135- break;
2136- STBI__CASE(4, 1)
2137- {
2138- dest[0] = stbi__compute_y(src[0], src[1], src[2]);
2139- }
2140- break;
2141- STBI__CASE(4, 2)
2142- {
2143- dest[0] = stbi__compute_y(src[0], src[1], src[2]);
2144- dest[1] = src[3];
2145- }
2146- break;
2147- STBI__CASE(4, 3)
2148- {
2149- dest[0] = src[0];
2150- dest[1] = src[1];
2151- dest[2] = src[2];
2152- }
2153- break;
2154- default:
2155- STBI_ASSERT(0);
2156- STBI_FREE(data);
2157- STBI_FREE(good);
2158- return stbi__errpuc("unsupported", "Unsupported format conversion");
2159- }
2160-#undef STBI__CASE
2161- }
2162-
2163- STBI_FREE(data);
2164- return good;
2165-}
2166-#endif
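stbi__convert_format switches once per scanline on the key `img_n * 8 + req_comp` (STBI__COMBO) and then runs a tight per-pixel loop, so the branch is not repeated for every pixel. Below is a reduced sketch of that pattern for a single row. The function `convert_row` and macro `COMBO` are hypothetical names. The sketch covers only three of the conversions the real function handles and returns 0 for the others.

```c
#include <stddef.h>
#include <stdint.h>

#define COMBO(a, b) ((a) * 8 + (b))

/* Convert one row of `width` pixels from img_n to req_comp channels.
   Returns 1 on success, 0 if the pair is not covered by this sketch. */
static int convert_row(const uint8_t *src, int img_n, uint8_t *dest,
                       int req_comp, size_t width)
{
    size_t i;
    switch (COMBO(img_n, req_comp)) {
    case COMBO(1, 4): /* gray -> RGBA, opaque alpha */
        for (i = 0; i < width; ++i, src += 1, dest += 4) {
            dest[0] = dest[1] = dest[2] = src[0];
            dest[3] = 255;
        }
        return 1;
    case COMBO(2, 4): /* gray+alpha -> RGBA, keep alpha */
        for (i = 0; i < width; ++i, src += 2, dest += 4) {
            dest[0] = dest[1] = dest[2] = src[0];
            dest[3] = src[1];
        }
        return 1;
    case COMBO(4, 3): /* RGBA -> RGB, drop alpha */
        for (i = 0; i < width; ++i, src += 4, dest += 3) {
            dest[0] = src[0];
            dest[1] = src[1];
            dest[2] = src[2];
        }
        return 1;
    default:
        return 0; /* not handled in this sketch */
    }
}
```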
2167-
2168-#if defined(STBI_NO_PNG) && defined(STBI_NO_PSD)
2169-// nothing
2170-#else
2171-static stbi__uint16
2172-stbi__compute_y_16(int r, int g, int b)
2173-{
2174- return (stbi__uint16)(((r * 77) + (g * 150) + (29 * b)) >> 8);
2175-}
2176-#endif
2177-
2178-#if defined(STBI_NO_PNG) && defined(STBI_NO_PSD)
2179-// nothing
2180-#else
2181-static stbi__uint16 *
2182-stbi__convert_format16(stbi__uint16 *data, int img_n, int req_comp,
2183- unsigned int x, unsigned int y)
2184-{
2185- int i, j;
2186- stbi__uint16 *good;
2187-
2188- if (req_comp == img_n) {
2189- return data;
2190- }
2191- STBI_ASSERT(req_comp >= 1 && req_comp <= 4);
2192-
2193- good = (stbi__uint16 *)stbi__malloc(req_comp * x * y * 2);
2194- if (good == NULL) {
2195- STBI_FREE(data);
2196- return (stbi__uint16 *)stbi__errpuc("outofmem", "Out of memory");
2197- }
2198-
2199- for (j = 0; j < (int)y; ++j) {
2200- stbi__uint16 *src = data + j * x * img_n;
2201- stbi__uint16 *dest = good + j * x * req_comp;
2202-
2203-#define STBI__COMBO(a, b) ((a) * 8 + (b))
2204-#define STBI__CASE(a, b) \
2205- case STBI__COMBO(a, b): \
2206- for (i = x - 1; i >= 0; --i, src += a, dest += b)
2207- // convert source image with img_n components to one with req_comp
2208- // components; avoid switch per pixel, so use switch per scanline and
2209- // massive macros
2210- switch (STBI__COMBO(img_n, req_comp)) {
2211- STBI__CASE(1, 2)
2212- {
2213- dest[0] = src[0];
2214- dest[1] = 0xffff;
2215- }
2216- break;
2217- STBI__CASE(1, 3) { dest[0] = dest[1] = dest[2] = src[0]; }
2218- break;
2219- STBI__CASE(1, 4)
2220- {
2221- dest[0] = dest[1] = dest[2] = src[0];
2222- dest[3] = 0xffff;
2223- }
2224- break;
2225- STBI__CASE(2, 1) { dest[0] = src[0]; }
2226- break;
2227- STBI__CASE(2, 3) { dest[0] = dest[1] = dest[2] = src[0]; }
2228- break;
2229- STBI__CASE(2, 4)
2230- {
2231- dest[0] = dest[1] = dest[2] = src[0];
2232- dest[3] = src[1];
2233- }
2234- break;
2235- STBI__CASE(3, 4)
2236- {
2237- dest[0] = src[0];
2238- dest[1] = src[1];
2239- dest[2] = src[2];
2240- dest[3] = 0xffff;
2241- }
2242- break;
2243- STBI__CASE(3, 1)
2244- {
2245- dest[0] = stbi__compute_y_16(src[0], src[1], src[2]);
2246- }
2247- break;
2248- STBI__CASE(3, 2)
2249- {
2250- dest[0] = stbi__compute_y_16(src[0], src[1], src[2]);
2251- dest[1] = 0xffff;
2252- }
2253- break;
2254- STBI__CASE(4, 1)
2255- {
2256- dest[0] = stbi__compute_y_16(src[0], src[1], src[2]);
2257- }
2258- break;
2259- STBI__CASE(4, 2)
2260- {
2261- dest[0] = stbi__compute_y_16(src[0], src[1], src[2]);
2262- dest[1] = src[3];
2263- }
2264- break;
2265- STBI__CASE(4, 3)
2266- {
2267- dest[0] = src[0];
2268- dest[1] = src[1];
2269- dest[2] = src[2];
2270- }
2271- break;
2272- default:
2273- STBI_ASSERT(0);
2274- STBI_FREE(data);
2275- STBI_FREE(good);
2276- return (stbi__uint16 *)stbi__errpuc(
2277- "unsupported", "Unsupported format conversion");
2278- }
2279-#undef STBI__CASE
2280- }
2281-
2282- STBI_FREE(data);
2283- return good;
2284-}
2285-#endif
2286-
2287-#ifndef STBI_NO_LINEAR
2288-static float *
2289-stbi__ldr_to_hdr(stbi_uc *data, int x, int y, int comp)
2290-{
2291- int i, k, n;
2292- float *output;
2293- if (!data) {
2294- return NULL;
2295- }
2296- output = (float *)stbi__malloc_mad4(x, y, comp, sizeof(float), 0);
2297- if (output == NULL) {
2298- STBI_FREE(data);
2299- return stbi__errpf("outofmem", "Out of memory");
2300- }
2301- // compute number of non-alpha components
2302- if (comp & 1) {
2303- n = comp;
2304- } else {
2305- n = comp - 1;
2306- }
2307- for (i = 0; i < x * y; ++i) {
2308- for (k = 0; k < n; ++k) {
2309- output[i * comp + k] =
2310- (float)(pow(data[i * comp + k] / 255.0f, stbi__l2h_gamma) *
2311- stbi__l2h_scale);
2312- }
2313- }
2314- if (n < comp) {
2315- for (i = 0; i < x * y; ++i) {
2316- output[i * comp + n] = data[i * comp + n] / 255.0f;
2317- }
2318- }
2319- STBI_FREE(data);
2320- return output;
2321-}
2322-#endif
2323-
2324-#ifndef STBI_NO_HDR
2325-#define stbi__float2int(x) ((int)(x))
2326-static stbi_uc *
2327-stbi__hdr_to_ldr(float *data, int x, int y, int comp)
2328-{
2329- int i, k, n;
2330- stbi_uc *output;
2331- if (!data) {
2332- return NULL;
2333- }
2334- output = (stbi_uc *)stbi__malloc_mad3(x, y, comp, 0);
2335- if (output == NULL) {
2336- STBI_FREE(data);
2337- return stbi__errpuc("outofmem", "Out of memory");
2338- }
2339- // compute number of non-alpha components
2340- if (comp & 1) {
2341- n = comp;
2342- } else {
2343- n = comp - 1;
2344- }
2345- for (i = 0; i < x * y; ++i) {
2346- for (k = 0; k < n; ++k) {
2347- float z = (float)pow(data[i * comp + k] * stbi__h2l_scale_i,
2348- stbi__h2l_gamma_i) *
2349- 255 +
2350- 0.5f;
2351- if (z < 0) {
2352- z = 0;
2353- }
2354- if (z > 255) {
2355- z = 255;
2356- }
2357- output[i * comp + k] = (stbi_uc)stbi__float2int(z);
2358- }
2359- if (k < comp) {
2360- float z = data[i * comp + k] * 255 + 0.5f;
2361- if (z < 0) {
2362- z = 0;
2363- }
2364- if (z > 255) {
2365- z = 255;
2366- }
2367- output[i * comp + k] = (stbi_uc)stbi__float2int(z);
2368- }
2369- }
2370- STBI_FREE(data);
2371- return output;
2372-}
2373-#endif
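stbi__hdr_to_ldr and stbi__ldr_to_hdr share one rule for locating alpha. Odd channel counts (1 = Y, 3 = RGB) have no alpha. Even counts (2 = YA, 4 = RGBA) carry alpha in the last channel. In stbi__hdr_to_ldr, color channels first go through `pow` with the inverse gamma. The alpha channel skips that step and is only scaled to 0..255, rounded, and clamped. The hypothetical helpers below isolate those two details. The gamma step is left out on purpose.

```c
/* Number of color (non-alpha) channels for a given component count. */
static int color_channels(int comp)
{
    return (comp & 1) ? comp : comp - 1;
}

/* Scale a 0..1 float to 0..255 with round-half-up, clamp, then truncate,
   as stbi__hdr_to_ldr does for its alpha channel. */
static unsigned char unit_float_to_byte(float v)
{
    float z = v * 255 + 0.5f;
    if (z < 0) {
        z = 0;
    }
    if (z > 255) {
        z = 255;
    }
    return (unsigned char)(int)z;
}
```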
2374-
2375-//////////////////////////////////////////////////////////////////////////////
2376-//
2377-// "baseline" JPEG/JFIF decoder
2378-//
2379-// simple implementation
2380-// - doesn't support delayed output of y-dimension
2381-// - simple interface (only one output format: 8-bit interleaved RGB)
2382-// - doesn't try to recover corrupt jpegs
2383-// - doesn't allow partial loading, loading multiple at once
2384-// - still fast on x86 (copying globals into locals doesn't help x86)
2385-// - allocates lots of intermediate memory (full size of all components)
2386-// - non-interleaved case requires this anyway
2387-// - allows good upsampling (see next)
2388-// high-quality
2389-// - upsampled channels are bilinearly interpolated, even across blocks
2390-// - quality integer IDCT derived from IJG's 'slow'
2391-// performance
2392-// - fast huffman; reasonable integer IDCT
2393-// - some SIMD kernels for common paths on targets with SSE2/NEON
2394-// - uses a lot of intermediate memory, could cache poorly
2395-
2396-#ifndef STBI_NO_JPEG
2397-
2398-// huffman decoding acceleration
2399-#define FAST_BITS 9 // larger handles more cases; smaller stomps less cache
2400-
2401-typedef struct {
2402- stbi_uc fast[1 << FAST_BITS];
2403- // weirdly, repacking this into AoS is a 10% speed loss, instead of a win
2404- stbi__uint16 code[256];
2405- stbi_uc values[256];
2406- stbi_uc size[257];
2407- unsigned int maxcode[18];
2408- int delta[17]; // old 'firstsymbol' - old 'firstcode'
2409-} stbi__huffman;
2410-
2411-typedef struct {
2412- stbi__context *s;
2413- stbi__huffman huff_dc[4];
2414- stbi__huffman huff_ac[4];
2415- stbi__uint16 dequant[4][64];
2416- stbi__int16 fast_ac[4][1 << FAST_BITS];
2417-
2418- // sizes for components, interleaved MCUs
2419- int img_h_max, img_v_max;
2420- int img_mcu_x, img_mcu_y;
2421- int img_mcu_w, img_mcu_h;
2422-
2423- // definition of jpeg image component
2424- struct {
2425- int id;
2426- int h, v;
2427- int tq;
2428- int hd, ha;
2429- int dc_pred;
2430-
2431- int x, y, w2, h2;
2432- stbi_uc *data;
2433- void *raw_data, *raw_coeff;
2434- stbi_uc *linebuf;
2435- short *coeff; // progressive only
2436- int coeff_w, coeff_h; // number of 8x8 coefficient blocks
2437- } img_comp[4];
2438-
2439- stbi__uint32 code_buffer; // jpeg entropy-coded buffer
2440- int code_bits; // number of valid bits
2441- unsigned char marker; // marker seen while filling entropy buffer
2442- int nomore; // flag if we saw a marker so must stop
2443-
2444- int progressive;
2445- int spec_start;
2446- int spec_end;
2447- int succ_high;
2448- int succ_low;
2449- int eob_run;
2450- int jfif;
2451- int app14_color_transform; // Adobe APP14 tag
2452- int rgb;
2453-
2454- int scan_n, order[4];
2455- int restart_interval, todo;
2456-
2457- // kernels
2458- void (*idct_block_kernel)(stbi_uc *out, int out_stride, short data[64]);
2459- void (*YCbCr_to_RGB_kernel)(stbi_uc *out, const stbi_uc *y,
2460- const stbi_uc *pcb, const stbi_uc *pcr,
2461- int count, int step);
2462- stbi_uc *(*resample_row_hv_2_kernel)(stbi_uc *out, stbi_uc *in_near,
2463- stbi_uc *in_far, int w, int hs);
2464-} stbi__jpeg;
2465-
2466-static int
2467-stbi__build_huffman(stbi__huffman *h, int *count)
2468-{
2469- int i, j, k = 0;
2470- unsigned int code;
2471- // build size list for each symbol (from JPEG spec)
2472- for (i = 0; i < 16; ++i) {
2473- for (j = 0; j < count[i]; ++j) {
2474- h->size[k++] = (stbi_uc)(i + 1);
2475- if (k >= 257) {
2476- return stbi__err("bad size list", "Corrupt JPEG");
2477- }
2478- }
2479- }
2480- h->size[k] = 0;
2481-
2482- // compute actual symbols (from jpeg spec)
2483- code = 0;
2484- k = 0;
2485- for (j = 1; j <= 16; ++j) {
2486- // compute delta to add to code to compute symbol id
2487- h->delta[j] = k - code;
2488- if (h->size[k] == j) {
2489- while (h->size[k] == j) {
2490- h->code[k++] = (stbi__uint16)(code++);
2491- }
2492- if (code - 1 >= (1u << j)) {
2493- return stbi__err("bad code lengths", "Corrupt JPEG");
2494- }
2495- }
2496- // compute largest code + 1 for this size, preshifted as needed later
2497- h->maxcode[j] = code << (16 - j);
2498- code <<= 1;
2499- }
2500- h->maxcode[j] = 0xffffffff;
2501-
2502- // build non-spec acceleration table; 255 is flag for not-accelerated
2503- memset(h->fast, 255, 1 << FAST_BITS);
2504- for (i = 0; i < k; ++i) {
2505- int s = h->size[i];
2506- if (s <= FAST_BITS) {
2507- int c = h->code[i] << (FAST_BITS - s);
2508- int m = 1 << (FAST_BITS - s);
2509- for (j = 0; j < m; ++j) {
2510- h->fast[c + j] = (stbi_uc)i;
2511- }
2512- }
2513- }
2514- return 1;
2515-}
2516-
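stbi__build_huffman turns the 16 per-length counts from a DHT segment into canonical codes. Codes of one length are consecutive. When moving to the next length, the running code gets a 0 bit appended (`code <<= 1`). The table is rejected if a length ends up holding more codes than its bits can represent. Below is a standalone sketch of just that step, under the hypothetical name `canonical_codes`. It omits the maxcode, delta, and fast-lookup tables.

```c
#include <stdint.h>

/* counts[i] = number of codes of length i + 1. Fills sizes/codes in symbol
   order; returns the number of codes, or -1 if the lengths are invalid. */
static int canonical_codes(const int counts[16], uint8_t sizes[256],
                           uint16_t codes[256])
{
    int i, j, k = 0;
    unsigned int code = 0;
    for (i = 0; i < 16; ++i) {
        for (j = 0; j < counts[i]; ++j) {
            if (k >= 256) {
                return -1; /* more symbols than a table can hold */
            }
            sizes[k++] = (uint8_t)(i + 1);
        }
    }
    int total = k;
    k = 0;
    for (int len = 1; len <= 16; ++len) {
        while (k < total && sizes[k] == len) {
            codes[k++] = (uint16_t)code++;
        }
        if (code > (1u << len)) {
            return -1; /* oversubscribed: codes no longer fit in len bits */
        }
        code <<= 1; /* next length: append a 0 bit */
    }
    return total;
}
```

As a usage example, the counts {0, 1, 5, 1, 1, 1, 1, 1, 1, 0, ...} give one 2-bit code (00), five 3-bit codes (010 through 110), and then a single code for each length from 4 to 9 bits.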
2517-// build a table that decodes both magnitude and value of small ACs in
2518-// one go.
2519-static void
2520-stbi__build_fast_ac(stbi__int16 *fast_ac, stbi__huffman *h)
2521-{
2522- int i;
2523- for (i = 0; i < (1 << FAST_BITS); ++i) {
2524- stbi_uc fast = h->fast[i];
2525- fast_ac[i] = 0;
2526- if (fast < 255) {
2527- int rs = h->values[fast];
2528- int run = (rs >> 4) & 15;
2529- int magbits = rs & 15;
2530- int len = h->size[fast];
2531-
2532- if (magbits && len + magbits <= FAST_BITS) {
2533- // magnitude code followed by receive_extend code
2534- int k = ((i << len) & ((1 << FAST_BITS) - 1)) >>
2535- (FAST_BITS - magbits);
2536- int m = 1 << (magbits - 1);
2537- if (k < m) {
2538- k += (~0U << magbits) + 1;
2539- }
2540- // if the result is small enough, we can fit it in fast_ac table
2541- if (k >= -128 && k <= 127) {
2542- fast_ac[i] =
2543- (stbi__int16)((k * 256) + (run * 16) + (len + magbits));
2544- }
2545- }
2546- }
2547- }
2548-}
2549-
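Each fast_ac entry built above packs three fields into one 16-bit value. The combined code length (`len + magbits`) goes in bits 0-3, the zero run in bits 4-7, and the signed coefficient value in the high byte. The value limit of -128..127 exists so that it fits in that byte. Below is a hypothetical pack/unpack pair using the same layout and the same extraction expressions as stbi__jpeg_decode_block. Extracting the value relies on `>>` of a negative number being an arithmetic shift. C leaves that implementation-defined, but mainstream compilers do it.

```c
#include <stdint.h>

static int16_t pack_fast_ac(int value, int run, int len)
{
    return (int16_t)((value * 256) + (run * 16) + len);
}

static int fast_ac_value(int16_t r) { return r >> 8; }        /* signed high byte */
static int fast_ac_run(int16_t r)   { return (r >> 4) & 15; } /* zero run */
static int fast_ac_len(int16_t r)   { return r & 15; }        /* bits consumed */
```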
2550-static void
2551-stbi__grow_buffer_unsafe(stbi__jpeg *j)
2552-{
2553- do {
2554- unsigned int b = j->nomore ? 0 : stbi__get8(j->s);
2555- if (b == 0xff) {
2556- int c = stbi__get8(j->s);
2557- while (c == 0xff) {
2558- c = stbi__get8(j->s); // consume fill bytes
2559- }
2560- if (c != 0) {
2561- j->marker = (unsigned char)c;
2562- j->nomore = 1;
2563- return;
2564- }
2565- }
2566- j->code_buffer |= b << (24 - j->code_bits);
2567- j->code_bits += 8;
2568- } while (j->code_bits <= 24);
2569-}
2570-
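stbi__grow_buffer_unsafe applies the JPEG byte-stuffing rules as it refills the bit buffer:

- 0xFF followed by 0x00 is a literal 0xFF data byte.
- Extra 0xFF fill bytes are skipped.
- 0xFF followed by any other byte is a marker, which stops entropy decoding.

The hypothetical `unstuff` function below applies the same rules to a whole buffer at once. Like stbi__get8, it treats reads past the end as 0.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy entropy-coded bytes to `out`, removing stuffed zeros and stopping at
   the first marker. Returns bytes written; *marker gets the marker code or 0. */
static size_t unstuff(const uint8_t *in, size_t n, uint8_t *out, uint8_t *marker)
{
    size_t i = 0, w = 0;
    *marker = 0;
    while (i < n) {
        uint8_t b = in[i++];
        if (b == 0xFF) {
            uint8_t c = (i < n) ? in[i++] : 0;
            while (c == 0xFF) {          /* consume fill bytes */
                c = (i < n) ? in[i++] : 0;
            }
            if (c != 0) {                /* a real marker: stop here */
                *marker = c;
                break;
            }
            /* c == 0: stuffed byte, so the 0xFF itself is data */
        }
        out[w++] = b;
    }
    return w;
}
```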
2571-// (1 << n) - 1
2572-static const stbi__uint32 stbi__bmask[17] = {
2573- 0, 1, 3, 7, 15, 31, 63, 127, 255,
2574- 511, 1023, 2047, 4095, 8191, 16383, 32767, 65535};
2575-
2576-// decode a jpeg huffman value from the bitstream
2577-stbi_inline static int
2578-stbi__jpeg_huff_decode(stbi__jpeg *j, stbi__huffman *h)
2579-{
2580- unsigned int temp;
2581- int c, k;
2582-
2583- if (j->code_bits < 16) {
2584- stbi__grow_buffer_unsafe(j);
2585- }
2586-
2587- // look at the top FAST_BITS and determine what symbol ID it is,
2588- // if the code is <= FAST_BITS
2589- c = (j->code_buffer >> (32 - FAST_BITS)) & ((1 << FAST_BITS) - 1);
2590- k = h->fast[c];
2591- if (k < 255) {
2592- int s = h->size[k];
2593- if (s > j->code_bits) {
2594- return -1;
2595- }
2596- j->code_buffer <<= s;
2597- j->code_bits -= s;
2598- return h->values[k];
2599- }
2600-
2601- // naive test is to shift the code_buffer down so k bits are
2602- // valid, then test against maxcode. To speed this up, we've
2603- // preshifted maxcode left so that it has (16-k) 0s at the
2604- // end; in other words, regardless of the number of bits, it
2605- // wants to be compared against something shifted to have 16;
2606- // that way we don't need to shift inside the loop.
2607- temp = j->code_buffer >> 16;
2608- for (k = FAST_BITS + 1;; ++k) {
2609- if (temp < h->maxcode[k]) {
2610- break;
2611- }
2612- }
2613- if (k == 17) {
2614- // error! code not found
2615- j->code_bits -= 16;
2616- return -1;
2617- }
2618-
2619- if (k > j->code_bits) {
2620- return -1;
2621- }
2622-
2623- // convert the huffman code to the symbol id
2624- c = ((j->code_buffer >> (32 - k)) & stbi__bmask[k]) + h->delta[k];
2625- if (c < 0 || c >= 256) { // symbol id out of bounds!
2626- return -1;
2627- }
2628- STBI_ASSERT((((j->code_buffer) >> (32 - h->size[c])) &
2629- stbi__bmask[h->size[c]]) == h->code[c]);
2630-
2631- // convert the id to a symbol
2632- j->code_bits -= k;
2633- j->code_buffer <<= k;
2634- return h->values[c];
2635-}
2636-
2637-// bias[n] = (-1<<n) + 1
2638-static const int stbi__jbias[16] = {0, -1, -3, -7, -15, -31,
2639- -63, -127, -255, -511, -1023, -2047,
2640- -4095, -8191, -16383, -32767};
2641-
2642-// combined JPEG 'receive' and JPEG 'extend', since baseline
2643-// always extends everything it receives.
2644-stbi_inline static int
2645-stbi__extend_receive(stbi__jpeg *j, int n)
2646-{
2647- unsigned int k;
2648- int sgn;
2649- if (j->code_bits < n) {
2650- stbi__grow_buffer_unsafe(j);
2651- }
2652- if (j->code_bits < n) {
2653- return 0; // ran out of bits from stream, return 0s instead of continuing
2654- }
2655-
2656- sgn = j->code_buffer >> 31; // sign bit always in MSB; 1 if MSB set
2657- // (positive), 0 if MSB clear (negative, so stbi__jbias[n] is added)
2658- k = stbi_lrot(j->code_buffer, n);
2659- j->code_buffer = k & ~stbi__bmask[n];
2660- k &= stbi__bmask[n];
2661- j->code_bits -= n;
2662- return k + (stbi__jbias[n] & (sgn - 1));
2663-}
2664-
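stbi__extend_receive folds the JPEG EXTEND step into the bit read. In an n-bit magnitude category, a received value whose top bit is 0 stands for a negative number, namely v - (2^n - 1). That offset is what stbi__jbias[n] stores. Below is a standalone sketch of just the arithmetic, under the hypothetical name `jpeg_extend`. It subtracts `(1 << n) - 1` rather than shifting -1, because left-shifting a negative value is undefined in C.

```c
/* Map an n-bit received value v to its signed coefficient value. */
static int jpeg_extend(int v, int n)
{
    if (n == 0) {
        return 0;
    }
    if (v < (1 << (n - 1))) {   /* top bit clear: negative value */
        v -= (1 << n) - 1;      /* same offset as stbi__jbias[n] */
    }
    return v;
}
```

For n = 2, the four codes 0, 1, 2, and 3 decode to -3, -2, 2, and 3.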
2665-// get some unsigned bits
2666-stbi_inline static int
2667-stbi__jpeg_get_bits(stbi__jpeg *j, int n)
2668-{
2669- unsigned int k;
2670- if (j->code_bits < n) {
2671- stbi__grow_buffer_unsafe(j);
2672- }
2673- if (j->code_bits < n) {
2674- return 0; // ran out of bits from stream, return 0s instead of continuing
2675- }
2676- k = stbi_lrot(j->code_buffer, n);
2677- j->code_buffer = k & ~stbi__bmask[n];
2678- k &= stbi__bmask[n];
2679- j->code_bits -= n;
2680- return k;
2681-}
2682-
2683-stbi_inline static int
2684-stbi__jpeg_get_bit(stbi__jpeg *j)
2685-{
2686- unsigned int k;
2687- if (j->code_bits < 1) {
2688- stbi__grow_buffer_unsafe(j);
2689- }
2690- if (j->code_bits < 1) {
2691- return 0; // ran out of bits from stream, return 0s instead of continuing
2692- }
2693- k = j->code_buffer;
2694- j->code_buffer <<= 1;
2695- --j->code_bits;
2696- return k & 0x80000000;
2697-}
2698-
2699-// given a value that's at position X in the zigzag stream,
2700-// where does it appear in the 8x8 matrix coded as row-major?
2701-static const stbi_uc stbi__jpeg_dezigzag[64 + 15] = {
2702- 0, 1, 8, 16, 9, 2, 3, 10, 17, 24, 32, 25, 18, 11, 4, 5, 12, 19, 26, 33, 40,
2703- 48, 41, 34, 27, 20, 13, 6, 7, 14, 21, 28, 35, 42, 49, 56, 57, 50, 43, 36,
2704- 29, 22, 15, 23, 30, 37, 44, 51, 58, 59, 52, 45, 38, 31, 39, 46, 53, 60, 61,
2705- 54, 47, 55, 62, 63,
2706- // let corrupt input sample past end
2707- 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63};
2708-
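The first 64 entries of stbi__jpeg_dezigzag follow the standard JPEG zigzag scan. The scan visits the anti-diagonals (row + col = s) of the 8x8 block in order, reversing direction on each one, and records `row * 8 + col`. The hypothetical generator below rebuilds the table that way. It is only a cross-check: the hard-coded table also carries 15 padding entries of 63 so that corrupt input cannot index past the end.

```c
/* Fill out[i] with the row-major index of the i-th coefficient in zigzag order. */
static void build_dezigzag(unsigned char out[64])
{
    int idx = 0;
    for (int s = 0; s < 15; ++s) {
        if (s % 2 == 0) {
            /* even diagonal: walk up and to the right (row decreasing) */
            for (int row = (s < 8 ? s : 7); row >= 0 && s - row < 8; --row) {
                out[idx++] = (unsigned char)(row * 8 + (s - row));
            }
        } else {
            /* odd diagonal: walk down and to the left (row increasing) */
            for (int row = (s < 8 ? 0 : s - 7); row < 8 && row <= s; ++row) {
                out[idx++] = (unsigned char)(row * 8 + (s - row));
            }
        }
    }
}
```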
2709-// decode one 64-entry block
2710-static int
2711-stbi__jpeg_decode_block(stbi__jpeg *j, short data[64], stbi__huffman *hdc,
2712- stbi__huffman *hac, stbi__int16 *fac, int b,
2713- stbi__uint16 *dequant)
2714-{
2715- int diff, dc, k;
2716- int t;
2717-
2718- if (j->code_bits < 16) {
2719- stbi__grow_buffer_unsafe(j);
2720- }
2721- t = stbi__jpeg_huff_decode(j, hdc);
2722- if (t < 0 || t > 15) {
2723- return stbi__err("bad huffman code", "Corrupt JPEG");
2724- }
2725-
2726- // 0 all the ac values now so we can do it 32-bits at a time
2727- memset(data, 0, 64 * sizeof(data[0]));
2728-
2729- diff = t ? stbi__extend_receive(j, t) : 0;
2730- if (!stbi__addints_valid(j->img_comp[b].dc_pred, diff)) {
2731- return stbi__err("bad delta", "Corrupt JPEG");
2732- }
2733- dc = j->img_comp[b].dc_pred + diff;
2734- j->img_comp[b].dc_pred = dc;
2735- if (!stbi__mul2shorts_valid(dc, dequant[0])) {
2736- return stbi__err("can't merge dc and ac", "Corrupt JPEG");
2737- }
2738- data[0] = (short)(dc * dequant[0]);
2739-
2740- // decode AC components, see JPEG spec
2741- k = 1;
2742- do {
2743- unsigned int zig;
2744- int c, r, s;
2745- if (j->code_bits < 16) {
2746- stbi__grow_buffer_unsafe(j);
2747- }
2748- c = (j->code_buffer >> (32 - FAST_BITS)) & ((1 << FAST_BITS) - 1);
2749- r = fac[c];
2750- if (r) { // fast-AC path
2751- k += (r >> 4) & 15; // run
2752- s = r & 15; // combined length
2753- if (s > j->code_bits) {
2754- return stbi__err(
2755- "bad huffman code",
2756- "Combined length longer than code bits available");
2757- }
2758- j->code_buffer <<= s;
2759- j->code_bits -= s;
2760- // decode into unzigzag'd location
2761- zig = stbi__jpeg_dezigzag[k++];
2762- data[zig] = (short)((r >> 8) * dequant[zig]);
2763- } else {
2764- int rs = stbi__jpeg_huff_decode(j, hac);
2765- if (rs < 0) {
2766- return stbi__err("bad huffman code", "Corrupt JPEG");
2767- }
2768- s = rs & 15;
2769- r = rs >> 4;
2770- if (s == 0) {
2771- if (rs != 0xf0) {
2772- break; // end block
2773- }
2774- k += 16;
2775- } else {
2776- k += r;
2777- // decode into unzigzag'd location
2778- zig = stbi__jpeg_dezigzag[k++];
2779- data[zig] = (short)(stbi__extend_receive(j, s) * dequant[zig]);
2780- }
2781- }
2782- } while (k < 64);
2783- return 1;
2784-}
2785-
2786-static int
2787-stbi__jpeg_decode_block_prog_dc(stbi__jpeg *j, short data[64],
2788- stbi__huffman *hdc, int b)
2789-{
2790- int diff, dc;
2791- int t;
2792- if (j->spec_end != 0) {
2793- return stbi__err("can't merge dc and ac", "Corrupt JPEG");
2794- }
2795-
2796- if (j->code_bits < 16) {
2797- stbi__grow_buffer_unsafe(j);
2798- }
2799-
2800- if (j->succ_high == 0) {
2801- // first scan for DC coefficient, must be first
2802- memset(data, 0, 64 * sizeof(data[0])); // 0 all the ac values now
2803- t = stbi__jpeg_huff_decode(j, hdc);
2804- if (t < 0 || t > 15) {
2805- return stbi__err("can't merge dc and ac", "Corrupt JPEG");
2806- }
2807- diff = t ? stbi__extend_receive(j, t) : 0;
2808-
2809- if (!stbi__addints_valid(j->img_comp[b].dc_pred, diff)) {
2810- return stbi__err("bad delta", "Corrupt JPEG");
2811- }
2812- dc = j->img_comp[b].dc_pred + diff;
2813- j->img_comp[b].dc_pred = dc;
2814- if (!stbi__mul2shorts_valid(dc, 1 << j->succ_low)) {
2815- return stbi__err("can't merge dc and ac", "Corrupt JPEG");
2816- }
2817- data[0] = (short)(dc * (1 << j->succ_low));
2818- } else {
2819- // refinement scan for DC coefficient
2820- if (stbi__jpeg_get_bit(j)) {
2821- data[0] += (short)(1 << j->succ_low);
2822- }
2823- }
2824- return 1;
2825-}
2826-
2827-// @OPTIMIZE: store non-zigzagged during the decode passes,
2828-// and only de-zigzag when dequantizing
2829-static int
2830-stbi__jpeg_decode_block_prog_ac(stbi__jpeg *j, short data[64],
2831- stbi__huffman *hac, stbi__int16 *fac)
2832-{
2833- int k;
2834- if (j->spec_start == 0) {
2835- return stbi__err("can't merge dc and ac", "Corrupt JPEG");
2836- }
2837-
2838- if (j->succ_high == 0) {
2839- int shift = j->succ_low;
2840-
2841- if (j->eob_run) {
2842- --j->eob_run;
2843- return 1;
2844- }
2845-
2846- k = j->spec_start;
2847- do {
2848- unsigned int zig;
2849- int c, r, s;
2850- if (j->code_bits < 16) {
2851- stbi__grow_buffer_unsafe(j);
2852- }
2853- c = (j->code_buffer >> (32 - FAST_BITS)) & ((1 << FAST_BITS) - 1);
2854- r = fac[c];
2855- if (r) { // fast-AC path
2856- k += (r >> 4) & 15; // run
2857- s = r & 15; // combined length
2858- if (s > j->code_bits) {
2859- return stbi__err(
2860- "bad huffman code",
2861- "Combined length longer than code bits available");
2862- }
2863- j->code_buffer <<= s;
2864- j->code_bits -= s;
2865- zig = stbi__jpeg_dezigzag[k++];
2866- data[zig] = (short)((r >> 8) * (1 << shift));
2867- } else {
2868- int rs = stbi__jpeg_huff_decode(j, hac);
2869- if (rs < 0) {
2870- return stbi__err("bad huffman code", "Corrupt JPEG");
2871- }
2872- s = rs & 15;
2873- r = rs >> 4;
2874- if (s == 0) {
2875- if (r < 15) {
2876- j->eob_run = (1 << r);
2877- if (r) {
2878- j->eob_run += stbi__jpeg_get_bits(j, r);
2879- }
2880- --j->eob_run;
2881- break;
2882- }
2883- k += 16;
2884- } else {
2885- k += r;
2886- zig = stbi__jpeg_dezigzag[k++];
2887- data[zig] =
2888- (short)(stbi__extend_receive(j, s) * (1 << shift));
2889- }
2890- }
2891- } while (k <= j->spec_end);
2892- } else {
2893- // refinement scan for these AC coefficients
2894-
2895- short bit = (short)(1 << j->succ_low);
2896-
2897- if (j->eob_run) {
2898- --j->eob_run;
2899- for (k = j->spec_start; k <= j->spec_end; ++k) {
2900- short *p = &data[stbi__jpeg_dezigzag[k]];
2901- if (*p != 0) {
2902- if (stbi__jpeg_get_bit(j)) {
2903- if ((*p & bit) == 0) {
2904- if (*p > 0) {
2905- *p += bit;
2906- } else {
2907- *p -= bit;
2908- }
2909- }
2910- }
2911- }
2912- }
2913- } else {
2914- k = j->spec_start;
2915- do {
2916- int r, s;
2917- int rs = stbi__jpeg_huff_decode(
2918- j, hac); // @OPTIMIZE see if we can use the fast path here,
2919- // advance-by-r is so slow, eh
2920- if (rs < 0) {
2921- return stbi__err("bad huffman code", "Corrupt JPEG");
2922- }
2923- s = rs & 15;
2924- r = rs >> 4;
2925- if (s == 0) {
2926- if (r < 15) {
2927- j->eob_run = (1 << r) - 1;
2928- if (r) {
2929- j->eob_run += stbi__jpeg_get_bits(j, r);
2930- }
2931- r = 64; // force end of block
2932- } else {
2933- // r=15 s=0 should write 16 0s, so we just do
2934- // a run of 15 0s and then write s (which is 0),
2935- // so we don't have to do anything special here
2936- }
2937- } else {
2938- if (s != 1) {
2939- return stbi__err("bad huffman code", "Corrupt JPEG");
2940- }
2941- // sign bit
2942- if (stbi__jpeg_get_bit(j)) {
2943- s = bit;
2944- } else {
2945- s = -bit;
2946- }
2947- }
2948-
2949- // advance by r
2950- while (k <= j->spec_end) {
2951- short *p = &data[stbi__jpeg_dezigzag[k++]];
2952- if (*p != 0) {
2953- if (stbi__jpeg_get_bit(j)) {
2954- if ((*p & bit) == 0) {
2955- if (*p > 0) {
2956- *p += bit;
2957- } else {
2958- *p -= bit;
2959- }
2960- }
2961- }
2962- } else {
2963- if (r == 0) {
2964- *p = (short)s;
2965- break;
2966- }
2967- --r;
2968- }
2969- }
2970- } while (k <= j->spec_end);
2971- }
2972- }
2973- return 1;
2974-}
2975-
2976-// clamp an int to the range 0..255 and convert it to a byte
2977-stbi_inline static stbi_uc
2978-stbi__clamp(int x)
2979-{
2980- // trick to use a single test to catch both cases
2981- if ((unsigned int)x > 255) {
2982- if (x < 0) {
2983- return 0;
2984- }
2985- if (x > 255) {
2986- return 255;
2987- }
2988- }
2989- return (stbi_uc)x;
2990-}
2991-
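stbi__clamp catches both out-of-range cases with one comparison. Casting a negative int to unsigned produces a very large value, so `(unsigned int)x > 255` holds exactly when x < 0 or x > 255. The common in-range path therefore costs a single branch. The nested `x > 255` test in the original is redundant once `x < 0` has been ruled out. The hypothetical `clamp_byte` below drops it:

```c
/* One unsigned compare detects both x < 0 and x > 255. */
static unsigned char clamp_byte(int x)
{
    if ((unsigned int)x > 255u) {
        return x < 0 ? 0 : 255;
    }
    return (unsigned char)x;
}
```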
2992-#define stbi__f2f(x) ((int)(((x) * 4096 + 0.5)))
2993-#define stbi__fsh(x) ((x) * 4096)
2994-
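stbi__f2f stores a real IDCT constant as an integer scaled by 4096 (12 fractional bits). stbi__fsh scales an integer sample by the same 4096, so the two can be added together. The macros and `fixmul` below are hypothetical copies that illustrate the scale. The `(int)` cast truncates toward zero, so adding 0.5 rounds positive constants to nearest but rounds negative ones toward zero. The `fixmul` helper adds half of 4096 before shifting, which rounds to nearest for non-negative products only.

```c
/* Same 12-bit fixed-point encoding as stbi__f2f / stbi__fsh. */
#define FIX_F2F(x) ((int)(((x) * 4096 + 0.5)))
#define FIX_FSH(x) ((x) * 4096)

/* Multiply a sample by a fixed-point constant and drop the 12 fraction
   bits, rounding to nearest for non-negative products. */
static int fixmul(int sample, int k_fixed)
{
    return (sample * k_fixed + 2048) >> 12;
}
```

For example, `FIX_F2F(0.5411961f)` is 2217, and `fixmul(100, 2217)` gives 54, close to 100 * 0.5412.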
2995-// derived from jidctint -- DCT_ISLOW
2996-#define STBI__IDCT_1D(s0, s1, s2, s3, s4, s5, s6, s7) \
2997- int t0, t1, t2, t3, p1, p2, p3, p4, p5, x0, x1, x2, x3; \
2998- p2 = s2; \
2999- p3 = s6; \
3000- p1 = (p2 + p3) * stbi__f2f(0.5411961f); \
3001- t2 = p1 + p3 * stbi__f2f(-1.847759065f); \
3002- t3 = p1 + p2 * stbi__f2f(0.765366865f); \
3003- p2 = s0; \
3004- p3 = s4; \
3005- t0 = stbi__fsh(p2 + p3); \
3006- t1 = stbi__fsh(p2 - p3); \
3007- x0 = t0 + t3; \
3008- x3 = t0 - t3; \
3009- x1 = t1 + t2; \
3010- x2 = t1 - t2; \
3011- t0 = s7; \
3012- t1 = s5; \
3013- t2 = s3; \
3014- t3 = s1; \
3015- p3 = t0 + t2; \
3016- p4 = t1 + t3; \
3017- p1 = t0 + t3; \
3018- p2 = t1 + t2; \
3019- p5 = (p3 + p4) * stbi__f2f(1.175875602f); \
3020- t0 = t0 * stbi__f2f(0.298631336f); \
3021- t1 = t1 * stbi__f2f(2.053119869f); \
3022- t2 = t2 * stbi__f2f(3.072711026f); \
3023- t3 = t3 * stbi__f2f(1.501321110f); \
3024- p1 = p5 + p1 * stbi__f2f(-0.899976223f); \
3025- p2 = p5 + p2 * stbi__f2f(-2.562915447f); \
3026- p3 = p3 * stbi__f2f(-1.961570560f); \
3027- p4 = p4 * stbi__f2f(-0.390180644f); \
3028- t3 += p1 + p4; \
3029- t2 += p2 + p3; \
3030- t1 += p2 + p4; \
3031- t0 += p1 + p3;
3032-
3033-static void
3034-stbi__idct_block(stbi_uc *out, int out_stride, short data[64])
3035-{
3036- int i, val[64], *v = val;
3037- stbi_uc *o;
3038- short *d = data;
3039-
3040- // columns
3041- for (i = 0; i < 8; ++i, ++d, ++v) {
3042- // if all zeroes, shortcut -- this avoids dequantizing 0s and IDCTing
3043- if (d[8] == 0 && d[16] == 0 && d[24] == 0 && d[32] == 0 && d[40] == 0 &&
3044- d[48] == 0 && d[56] == 0) {
3045- // no shortcut 0 seconds
3046- // (1|2|3|4|5|6|7)==0 0 seconds
3047- // all separate -0.047 seconds
3048- // 1 && 2|3 && 4|5 && 6|7: -0.047 seconds
3049- int dcterm = d[0] * 4;
3050- v[0] = v[8] = v[16] = v[24] = v[32] = v[40] = v[48] = v[56] =
3051- dcterm;
3052- } else {
3053- STBI__IDCT_1D(d[0], d[8], d[16], d[24], d[32], d[40], d[48], d[56])
3054- // constants scaled things up by 1<<12; let's bring them back
3055- // down, but keep 2 extra bits of precision
3056- x0 += 512;
3057- x1 += 512;
3058- x2 += 512;
3059- x3 += 512;
3060- v[0] = (x0 + t3) >> 10;
3061- v[56] = (x0 - t3) >> 10;
3062- v[8] = (x1 + t2) >> 10;
3063- v[48] = (x1 - t2) >> 10;
3064- v[16] = (x2 + t1) >> 10;
3065- v[40] = (x2 - t1) >> 10;
3066- v[24] = (x3 + t0) >> 10;
3067- v[32] = (x3 - t0) >> 10;
3068- }
3069- }
3070-
3071- for (i = 0, v = val, o = out; i < 8; ++i, v += 8, o += out_stride) {
3072- // no fast case since the first 1D IDCT spread components out
3073- STBI__IDCT_1D(v[0], v[1], v[2], v[3], v[4], v[5], v[6], v[7])
3074- // constants scaled things up by 1<<12, plus we had 1<<2 from first
3075- // loop, plus horizontal and vertical each scale by sqrt(8) so together
3076- // we've got an extra 1<<3, so 1<<17 total we need to remove.
3077- // so we want to round that, which means adding 0.5 * 1<<17,
3078- // aka 65536. Also, we'll end up with -128 to 127 that we want
3079- // to encode as 0..255 by adding 128, so we'll add that before the shift
3080- x0 += 65536 + (128 << 17);
3081- x1 += 65536 + (128 << 17);
3082- x2 += 65536 + (128 << 17);
3083- x3 += 65536 + (128 << 17);
3084- // tried computing the shifts into temps, or'ing the temps to see
3085- // if any were out of range, but that was slower
3086- o[0] = stbi__clamp((x0 + t3) >> 17);
3087- o[7] = stbi__clamp((x0 - t3) >> 17);
3088- o[1] = stbi__clamp((x1 + t2) >> 17);
3089- o[6] = stbi__clamp((x1 - t2) >> 17);
3090- o[2] = stbi__clamp((x2 + t1) >> 17);
3091- o[5] = stbi__clamp((x2 - t1) >> 17);
3092- o[3] = stbi__clamp((x3 + t0) >> 17);
3093- o[4] = stbi__clamp((x3 - t0) >> 17);
3094- }
3095-}
3096-
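The scale bookkeeping in the comments above can be sanity-checked on a DC-only block. By the textbook 2-D IDCT, a block whose only nonzero coefficient is the DC term d decodes to d/8 at every pixel: the 1/4 normalization times the two C(0) = 1/√2 factors gives 1/8. After the +128 level shift, each output should therefore be round(d/8) + 128. The sketch below walks d through the integer steps of stbi__idct_block. `dc_only_pixel` is a hypothetical helper. The 1<<12 scaling of the DC term in the row pass is assumed from the comment, not taken from the STBI__IDCT_1D macro, which is only partly shown here.

```c
#include <assert.h>

/* Hypothetical helper (not part of stb_image): the value every pixel of a
   DC-only 8x8 block decodes to, following the integer steps of
   stbi__idct_block. */
static int dc_only_pixel(int dc)
{
    /* column pass: the all-zero shortcut stores dc * 4 in every entry */
    int col = dc * 4;
    /* row pass: with only v[0] nonzero, x0..x3 all equal v[0] scaled by
       1<<12 (assumed, per "constants scaled things up by 1<<12"), and the
       odd-part terms are zero */
    int x0 = col * 4096;
    /* add 0.5 * 1<<17 to round and 128 << 17 to level-shift, remove the
       1<<17 scale, then clamp; assumes >> on a negative int is arithmetic */
    int pixel = (x0 + 65536 + (128 << 17)) >> 17;
    return pixel < 0 ? 0 : (pixel > 255 ? 255 : pixel);
}
```

For example, `dc_only_pixel(80)` gives 138 (80/8 + 128). A DC term of 4, which is exactly 0.5 after scaling, rounds up to 129.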
3097-#ifdef STBI_SSE2
3098-// sse2 integer IDCT. not the fastest possible implementation but it
3099-// produces bit-identical results to the generic C version so it's
3100-// fully "transparent".
3101-static void
3102-stbi__idct_simd(stbi_uc *out, int out_stride, short data[64])
3103-{
3104- // This is constructed to match our regular (generic) integer IDCT exactly.
3105- __m128i row0, row1, row2, row3, row4, row5, row6, row7;
3106- __m128i tmp;
3107-
3108-// dot product constant: even elems=x, odd elems=y
3109-#define dct_const(x, y) _mm_setr_epi16((x), (y), (x), (y), (x), (y), (x), (y))
3110-
3111-// out(0) = c0[even]*x + c0[odd]*y (c0, x, y 16-bit, out 32-bit)
3112-// out(1) = c1[even]*x + c1[odd]*y
3113-#define dct_rot(out0, out1, x, y, c0, c1) \
3114- __m128i c0##lo = _mm_unpacklo_epi16((x), (y)); \
3115- __m128i c0##hi = _mm_unpackhi_epi16((x), (y)); \
3116- __m128i out0##_l = _mm_madd_epi16(c0##lo, c0); \
3117- __m128i out0##_h = _mm_madd_epi16(c0##hi, c0); \
3118- __m128i out1##_l = _mm_madd_epi16(c0##lo, c1); \
3119- __m128i out1##_h = _mm_madd_epi16(c0##hi, c1)
3120-
3121-// out = in << 12 (in 16-bit, out 32-bit)
3122-#define dct_widen(out, in) \
3123- __m128i out##_l = \
3124- _mm_srai_epi32(_mm_unpacklo_epi16(_mm_setzero_si128(), (in)), 4); \
3125- __m128i out##_h = \
3126- _mm_srai_epi32(_mm_unpackhi_epi16(_mm_setzero_si128(), (in)), 4)
3127-
3128-// wide add
3129-#define dct_wadd(out, a, b) \
3130- __m128i out##_l = _mm_add_epi32(a##_l, b##_l); \
3131- __m128i out##_h = _mm_add_epi32(a##_h, b##_h)
3132-
3133-// wide sub
3134-#define dct_wsub(out, a, b) \
3135- __m128i out##_l = _mm_sub_epi32(a##_l, b##_l); \
3136- __m128i out##_h = _mm_sub_epi32(a##_h, b##_h)
3137-
3138-// butterfly a/b, add bias, then shift by "s" and pack
3139-#define dct_bfly32o(out0, out1, a, b, bias, s) \
3140- { \
3141- __m128i abiased_l = _mm_add_epi32(a##_l, bias); \
3142- __m128i abiased_h = _mm_add_epi32(a##_h, bias); \
3143- dct_wadd(sum, abiased, b); \
3144- dct_wsub(dif, abiased, b); \
3145- out0 = _mm_packs_epi32(_mm_srai_epi32(sum_l, s), \
3146- _mm_srai_epi32(sum_h, s)); \
3147- out1 = _mm_packs_epi32(_mm_srai_epi32(dif_l, s), \
3148- _mm_srai_epi32(dif_h, s)); \
3149- }
3150-
3151-// 8-bit interleave step (for transposes)
3152-#define dct_interleave8(a, b) \
3153- tmp = a; \
3154- a = _mm_unpacklo_epi8(a, b); \
3155- b = _mm_unpackhi_epi8(tmp, b)
3156-
3157-// 16-bit interleave step (for transposes)
3158-#define dct_interleave16(a, b) \
3159- tmp = a; \
3160- a = _mm_unpacklo_epi16(a, b); \
3161- b = _mm_unpackhi_epi16(tmp, b)
3162-
3163-#define dct_pass(bias, shift) \
3164- { \
3165- /* even part */ \
3166- dct_rot(t2e, t3e, row2, row6, rot0_0, rot0_1); \
3167- __m128i sum04 = _mm_add_epi16(row0, row4); \
3168- __m128i dif04 = _mm_sub_epi16(row0, row4); \
3169- dct_widen(t0e, sum04); \
3170- dct_widen(t1e, dif04); \
3171- dct_wadd(x0, t0e, t3e); \
3172- dct_wsub(x3, t0e, t3e); \
3173- dct_wadd(x1, t1e, t2e); \
3174- dct_wsub(x2, t1e, t2e); \
3175- /* odd part */ \
3176- dct_rot(y0o, y2o, row7, row3, rot2_0, rot2_1); \
3177- dct_rot(y1o, y3o, row5, row1, rot3_0, rot3_1); \
3178- __m128i sum17 = _mm_add_epi16(row1, row7); \
3179- __m128i sum35 = _mm_add_epi16(row3, row5); \
3180- dct_rot(y4o, y5o, sum17, sum35, rot1_0, rot1_1); \
3181- dct_wadd(x4, y0o, y4o); \
3182- dct_wadd(x5, y1o, y5o); \
3183- dct_wadd(x6, y2o, y5o); \
3184- dct_wadd(x7, y3o, y4o); \
3185- dct_bfly32o(row0, row7, x0, x7, bias, shift); \
3186- dct_bfly32o(row1, row6, x1, x6, bias, shift); \
3187- dct_bfly32o(row2, row5, x2, x5, bias, shift); \
3188- dct_bfly32o(row3, row4, x3, x4, bias, shift); \
3189- }
3190-
3191- __m128i rot0_0 =
3192- dct_const(stbi__f2f(0.5411961f),
3193- stbi__f2f(0.5411961f) + stbi__f2f(-1.847759065f));
3194- __m128i rot0_1 = dct_const(stbi__f2f(0.5411961f) + stbi__f2f(0.765366865f),
3195- stbi__f2f(0.5411961f));
3196- __m128i rot1_0 =
3197- dct_const(stbi__f2f(1.175875602f) + stbi__f2f(-0.899976223f),
3198- stbi__f2f(1.175875602f));
3199- __m128i rot1_1 =
3200- dct_const(stbi__f2f(1.175875602f),
3201- stbi__f2f(1.175875602f) + stbi__f2f(-2.562915447f));
3202- __m128i rot2_0 =
3203- dct_const(stbi__f2f(-1.961570560f) + stbi__f2f(0.298631336f),
3204- stbi__f2f(-1.961570560f));
3205- __m128i rot2_1 =
3206- dct_const(stbi__f2f(-1.961570560f),
3207- stbi__f2f(-1.961570560f) + stbi__f2f(3.072711026f));
3208- __m128i rot3_0 =
3209- dct_const(stbi__f2f(-0.390180644f) + stbi__f2f(2.053119869f),
3210- stbi__f2f(-0.390180644f));
3211- __m128i rot3_1 =
3212- dct_const(stbi__f2f(-0.390180644f),
3213- stbi__f2f(-0.390180644f) + stbi__f2f(1.501321110f));
3214-
3215- // rounding biases in column/row passes, see stbi__idct_block for
3216- // explanation.
3217- __m128i bias_0 = _mm_set1_epi32(512);
3218- __m128i bias_1 = _mm_set1_epi32(65536 + (128 << 17));
3219-
3220- // load
3221- row0 = _mm_load_si128((const __m128i *)(data + 0 * 8));
3222- row1 = _mm_load_si128((const __m128i *)(data + 1 * 8));
3223- row2 = _mm_load_si128((const __m128i *)(data + 2 * 8));
3224- row3 = _mm_load_si128((const __m128i *)(data + 3 * 8));
3225- row4 = _mm_load_si128((const __m128i *)(data + 4 * 8));
3226- row5 = _mm_load_si128((const __m128i *)(data + 5 * 8));
3227- row6 = _mm_load_si128((const __m128i *)(data + 6 * 8));
3228- row7 = _mm_load_si128((const __m128i *)(data + 7 * 8));
3229-
3230- // column pass
3231- dct_pass(bias_0, 10);
3232-
3233- {
3234- // 16bit 8x8 transpose pass 1
3235- dct_interleave16(row0, row4);
3236- dct_interleave16(row1, row5);
3237- dct_interleave16(row2, row6);
3238- dct_interleave16(row3, row7);
3239-
3240- // transpose pass 2
3241- dct_interleave16(row0, row2);
3242- dct_interleave16(row1, row3);
3243- dct_interleave16(row4, row6);
3244- dct_interleave16(row5, row7);
3245-
3246- // transpose pass 3
3247- dct_interleave16(row0, row1);
3248- dct_interleave16(row2, row3);
3249- dct_interleave16(row4, row5);
3250- dct_interleave16(row6, row7);
3251- }
3252-
3253- // row pass
3254- dct_pass(bias_1, 17);
3255-
3256- {
3257- // pack
3258- __m128i p0 = _mm_packus_epi16(row0, row1); // a0a1a2a3...a7b0b1b2b3...b7
3259- __m128i p1 = _mm_packus_epi16(row2, row3);
3260- __m128i p2 = _mm_packus_epi16(row4, row5);
3261- __m128i p3 = _mm_packus_epi16(row6, row7);
3262-
3263- // 8bit 8x8 transpose pass 1
3264- dct_interleave8(p0, p2); // a0e0a1e1...
3265- dct_interleave8(p1, p3); // c0g0c1g1...
3266-
3267- // transpose pass 2
3268- dct_interleave8(p0, p1); // a0c0e0g0...
3269- dct_interleave8(p2, p3); // b0d0f0h0...
3270-
3271- // transpose pass 3
3272- dct_interleave8(p0, p2); // a0b0c0d0...
3273- dct_interleave8(p1, p3); // a4b4c4d4...
3274-
3275- // store
3276- _mm_storel_epi64((__m128i *)out, p0);
3277- out += out_stride;
3278- _mm_storel_epi64((__m128i *)out, _mm_shuffle_epi32(p0, 0x4e));
3279- out += out_stride;
3280- _mm_storel_epi64((__m128i *)out, p2);
3281- out += out_stride;
3282- _mm_storel_epi64((__m128i *)out, _mm_shuffle_epi32(p2, 0x4e));
3283- out += out_stride;
3284- _mm_storel_epi64((__m128i *)out, p1);
3285- out += out_stride;
3286- _mm_storel_epi64((__m128i *)out, _mm_shuffle_epi32(p1, 0x4e));
3287- out += out_stride;
3288- _mm_storel_epi64((__m128i *)out, p3);
3289- out += out_stride;
3290- _mm_storel_epi64((__m128i *)out, _mm_shuffle_epi32(p3, 0x4e));
3291- }
3292-
3293-#undef dct_const
3294-#undef dct_rot
3295-#undef dct_widen
3296-#undef dct_wadd
3297-#undef dct_wsub
3298-#undef dct_bfly32o
3299-#undef dct_interleave8
3300-#undef dct_interleave16
3301-#undef dct_pass
3302-}
3303-
3304-#endif // STBI_SSE2
3305-
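The three rounds of dct_interleave16 in the SSE2 stbi__idct_simd perform an 8x8 transpose of 16-bit lanes. The following scalar model checks the pairing order used there: (0,4) (1,5) (2,6) (3,7), then (0,2) (1,3) (4,6) (5,7), then (0,1) (2,3) (4,5) (6,7). That order yields a plain transpose with the rows left in order. `vec8x16` is a hypothetical stand-in for one `__m128i` holding eight int16 lanes. The two loop functions model `_mm_unpacklo_epi16` and `_mm_unpackhi_epi16`. This is a sketch for reasoning about the shuffle, not a replacement for the intrinsics.

```c
#include <assert.h>

/* Hypothetical stand-in for one __m128i holding eight int16 lanes. */
typedef struct {
    short e[8];
} vec8x16;

/* Scalar model of _mm_unpacklo_epi16: a0 b0 a1 b1 a2 b2 a3 b3 */
static vec8x16 unpacklo16(vec8x16 a, vec8x16 b)
{
    vec8x16 r;
    for (int i = 0; i < 4; ++i) {
        r.e[2 * i] = a.e[i];
        r.e[2 * i + 1] = b.e[i];
    }
    return r;
}

/* Scalar model of _mm_unpackhi_epi16: a4 b4 a5 b5 a6 b6 a7 b7 */
static vec8x16 unpackhi16(vec8x16 a, vec8x16 b)
{
    vec8x16 r;
    for (int i = 0; i < 4; ++i) {
        r.e[2 * i] = a.e[4 + i];
        r.e[2 * i + 1] = b.e[4 + i];
    }
    return r;
}

/* Same shape as the dct_interleave16(a, b) macro. */
static void interleave16(vec8x16 *a, vec8x16 *b)
{
    vec8x16 tmp = *a;
    *a = unpacklo16(*a, *b);
    *b = unpackhi16(tmp, *b);
}

/* The three transpose passes, in the pairing order of the SSE2 IDCT. */
static void transpose8x8_model(vec8x16 row[8])
{
    interleave16(&row[0], &row[4]); interleave16(&row[1], &row[5]);
    interleave16(&row[2], &row[6]); interleave16(&row[3], &row[7]);

    interleave16(&row[0], &row[2]); interleave16(&row[1], &row[3]);
    interleave16(&row[4], &row[6]); interleave16(&row[5], &row[7]);

    interleave16(&row[0], &row[1]); interleave16(&row[2], &row[3]);
    interleave16(&row[4], &row[5]); interleave16(&row[6], &row[7]);
}
```

Filling row r with r * 8 + c and running the model leaves c * 8 + r in each lane. That is element (c, r) of the input, which confirms a plain transpose.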
3306-#ifdef STBI_NEON
3307-
3308-// NEON integer IDCT. should produce bit-identical
3309-// results to the generic C version.
3310-static void
3311-stbi__idct_simd(stbi_uc *out, int out_stride, short data[64])
3312-{
3313- int16x8_t row0, row1, row2, row3, row4, row5, row6, row7;
3314-
3315- int16x4_t rot0_0 = vdup_n_s16(stbi__f2f(0.5411961f));
3316- int16x4_t rot0_1 = vdup_n_s16(stbi__f2f(-1.847759065f));
3317- int16x4_t rot0_2 = vdup_n_s16(stbi__f2f(0.765366865f));
3318- int16x4_t rot1_0 = vdup_n_s16(stbi__f2f(1.175875602f));
3319- int16x4_t rot1_1 = vdup_n_s16(stbi__f2f(-0.899976223f));
3320- int16x4_t rot1_2 = vdup_n_s16(stbi__f2f(-2.562915447f));
3321- int16x4_t rot2_0 = vdup_n_s16(stbi__f2f(-1.961570560f));
3322- int16x4_t rot2_1 = vdup_n_s16(stbi__f2f(-0.390180644f));
3323- int16x4_t rot3_0 = vdup_n_s16(stbi__f2f(0.298631336f));
3324- int16x4_t rot3_1 = vdup_n_s16(stbi__f2f(2.053119869f));
3325- int16x4_t rot3_2 = vdup_n_s16(stbi__f2f(3.072711026f));
3326- int16x4_t rot3_3 = vdup_n_s16(stbi__f2f(1.501321110f));
3327-
3328-#define dct_long_mul(out, inq, coeff) \
3329- int32x4_t out##_l = vmull_s16(vget_low_s16(inq), coeff); \
3330- int32x4_t out##_h = vmull_s16(vget_high_s16(inq), coeff)
3331-
3332-#define dct_long_mac(out, acc, inq, coeff) \
3333- int32x4_t out##_l = vmlal_s16(acc##_l, vget_low_s16(inq), coeff); \
3334- int32x4_t out##_h = vmlal_s16(acc##_h, vget_high_s16(inq), coeff)
3335-
3336-#define dct_widen(out, inq) \
3337- int32x4_t out##_l = vshll_n_s16(vget_low_s16(inq), 12); \
3338- int32x4_t out##_h = vshll_n_s16(vget_high_s16(inq), 12)
3339-
3340-// wide add
3341-#define dct_wadd(out, a, b) \
3342- int32x4_t out##_l = vaddq_s32(a##_l, b##_l); \
3343- int32x4_t out##_h = vaddq_s32(a##_h, b##_h)
3344-
3345-// wide sub
3346-#define dct_wsub(out, a, b) \
3347- int32x4_t out##_l = vsubq_s32(a##_l, b##_l); \
3348- int32x4_t out##_h = vsubq_s32(a##_h, b##_h)
3349-
3350-// butterfly a/b, then shift using "shiftop" by "s" and pack
3351-#define dct_bfly32o(out0, out1, a, b, shiftop, s) \
3352- { \
3353- dct_wadd(sum, a, b); \
3354- dct_wsub(dif, a, b); \
3355- out0 = vcombine_s16(shiftop(sum_l, s), shiftop(sum_h, s)); \
3356- out1 = vcombine_s16(shiftop(dif_l, s), shiftop(dif_h, s)); \
3357- }
3358-
3359-#define dct_pass(shiftop, shift) \
3360- { \
3361- /* even part */ \
3362- int16x8_t sum26 = vaddq_s16(row2, row6); \
3363- dct_long_mul(p1e, sum26, rot0_0); \
3364- dct_long_mac(t2e, p1e, row6, rot0_1); \
3365- dct_long_mac(t3e, p1e, row2, rot0_2); \
3366- int16x8_t sum04 = vaddq_s16(row0, row4); \
3367- int16x8_t dif04 = vsubq_s16(row0, row4); \
3368- dct_widen(t0e, sum04); \
3369- dct_widen(t1e, dif04); \
3370- dct_wadd(x0, t0e, t3e); \
3371- dct_wsub(x3, t0e, t3e); \
3372- dct_wadd(x1, t1e, t2e); \
3373- dct_wsub(x2, t1e, t2e); \
3374- /* odd part */ \
3375- int16x8_t sum15 = vaddq_s16(row1, row5); \
3376- int16x8_t sum17 = vaddq_s16(row1, row7); \
3377- int16x8_t sum35 = vaddq_s16(row3, row5); \
3378- int16x8_t sum37 = vaddq_s16(row3, row7); \
3379- int16x8_t sumodd = vaddq_s16(sum17, sum35); \
3380- dct_long_mul(p5o, sumodd, rot1_0); \
3381- dct_long_mac(p1o, p5o, sum17, rot1_1); \
3382- dct_long_mac(p2o, p5o, sum35, rot1_2); \
3383- dct_long_mul(p3o, sum37, rot2_0); \
3384- dct_long_mul(p4o, sum15, rot2_1); \
3385- dct_wadd(sump13o, p1o, p3o); \
3386- dct_wadd(sump24o, p2o, p4o); \
3387- dct_wadd(sump23o, p2o, p3o); \
3388- dct_wadd(sump14o, p1o, p4o); \
3389- dct_long_mac(x4, sump13o, row7, rot3_0); \
3390- dct_long_mac(x5, sump24o, row5, rot3_1); \
3391- dct_long_mac(x6, sump23o, row3, rot3_2); \
3392- dct_long_mac(x7, sump14o, row1, rot3_3); \
3393- dct_bfly32o(row0, row7, x0, x7, shiftop, shift); \
3394- dct_bfly32o(row1, row6, x1, x6, shiftop, shift); \
3395- dct_bfly32o(row2, row5, x2, x5, shiftop, shift); \
3396- dct_bfly32o(row3, row4, x3, x4, shiftop, shift); \
3397- }
3398-
3399- // load
3400- row0 = vld1q_s16(data + 0 * 8);
3401- row1 = vld1q_s16(data + 1 * 8);
3402- row2 = vld1q_s16(data + 2 * 8);
3403- row3 = vld1q_s16(data + 3 * 8);
3404- row4 = vld1q_s16(data + 4 * 8);
3405- row5 = vld1q_s16(data + 5 * 8);
3406- row6 = vld1q_s16(data + 6 * 8);
3407- row7 = vld1q_s16(data + 7 * 8);
3408-
3409- // add DC bias: +1024 on the DC term adds 128 to every output pixel
3410- row0 = vaddq_s16(row0, vsetq_lane_s16(1024, vdupq_n_s16(0), 0));
3411-
3412- // column pass
3413- dct_pass(vrshrn_n_s32, 10);
3414-
3415- // 16bit 8x8 transpose
3416- {
3417-// these three map to a single VTRN.16, VTRN.32, and VSWP, respectively.
3418-// whether compilers actually get this is another story, sadly.
3419-#define dct_trn16(x, y) \
3420- { \
3421- int16x8x2_t t = vtrnq_s16(x, y); \
3422- x = t.val[0]; \
3423- y = t.val[1]; \
3424- }
3425-#define dct_trn32(x, y) \
3426- { \
3427- int32x4x2_t t = \
3428- vtrnq_s32(vreinterpretq_s32_s16(x), vreinterpretq_s32_s16(y)); \
3429- x = vreinterpretq_s16_s32(t.val[0]); \
3430- y = vreinterpretq_s16_s32(t.val[1]); \
3431- }
3432-#define dct_trn64(x, y) \
3433- { \
3434- int16x8_t x0 = x; \
3435- int16x8_t y0 = y; \
3436- x = vcombine_s16(vget_low_s16(x0), vget_low_s16(y0)); \
3437- y = vcombine_s16(vget_high_s16(x0), vget_high_s16(y0)); \
3438- }
3439-
3440- // pass 1
3441- dct_trn16(row0, row1); // a0b0a2b2a4b4a6b6
3442- dct_trn16(row2, row3);
3443- dct_trn16(row4, row5);
3444- dct_trn16(row6, row7);
3445-
3446- // pass 2
3447- dct_trn32(row0, row2); // a0b0c0d0a4b4c4d4
3448- dct_trn32(row1, row3);
3449- dct_trn32(row4, row6);
3450- dct_trn32(row5, row7);
3451-
3452- // pass 3
3453- dct_trn64(row0, row4); // a0b0c0d0e0f0g0h0
3454- dct_trn64(row1, row5);
3455- dct_trn64(row2, row6);
3456- dct_trn64(row3, row7);
3457-
3458-#undef dct_trn16
3459-#undef dct_trn32
3460-#undef dct_trn64
3461- }
3462-
3463- // row pass
3464- // vrshrn_n_s32 only supports shifts up to 16, but we need
3465- // 17, so do a non-rounding shift by 16 first, then follow
3466- // up with a rounding shift by 1.
3467- dct_pass(vshrn_n_s32, 16);
3468-
3469- {
3470- // pack and round
3471- uint8x8_t p0 = vqrshrun_n_s16(row0, 1);
3472- uint8x8_t p1 = vqrshrun_n_s16(row1, 1);
3473- uint8x8_t p2 = vqrshrun_n_s16(row2, 1);
3474- uint8x8_t p3 = vqrshrun_n_s16(row3, 1);
3475- uint8x8_t p4 = vqrshrun_n_s16(row4, 1);
3476- uint8x8_t p5 = vqrshrun_n_s16(row5, 1);
3477- uint8x8_t p6 = vqrshrun_n_s16(row6, 1);
3478- uint8x8_t p7 = vqrshrun_n_s16(row7, 1);
3479-
3480- // again, these can translate into one instruction, but often don't.
3481-#define dct_trn8_8(x, y) \
3482- { \
3483- uint8x8x2_t t = vtrn_u8(x, y); \
3484- x = t.val[0]; \
3485- y = t.val[1]; \
3486- }
3487-#define dct_trn8_16(x, y) \
3488- { \
3489- uint16x4x2_t t = \
3490- vtrn_u16(vreinterpret_u16_u8(x), vreinterpret_u16_u8(y)); \
3491- x = vreinterpret_u8_u16(t.val[0]); \
3492- y = vreinterpret_u8_u16(t.val[1]); \
3493- }
3494-#define dct_trn8_32(x, y) \
3495- { \
3496- uint32x2x2_t t = \
3497- vtrn_u32(vreinterpret_u32_u8(x), vreinterpret_u32_u8(y)); \
3498- x = vreinterpret_u8_u32(t.val[0]); \
3499- y = vreinterpret_u8_u32(t.val[1]); \
3500- }
3501-
3502- // sadly can't use interleaved stores here since we only write
3503- // 8 bytes to each scan line!
3504-
3505- // 8x8 8-bit transpose pass 1
3506- dct_trn8_8(p0, p1);
3507- dct_trn8_8(p2, p3);
3508- dct_trn8_8(p4, p5);
3509- dct_trn8_8(p6, p7);
3510-
3511- // pass 2
3512- dct_trn8_16(p0, p2);
3513- dct_trn8_16(p1, p3);
3514- dct_trn8_16(p4, p6);
3515- dct_trn8_16(p5, p7);
3516-
3517- // pass 3
3518- dct_trn8_32(p0, p4);
3519- dct_trn8_32(p1, p5);
3520- dct_trn8_32(p2, p6);
3521- dct_trn8_32(p3, p7);
3522-
3523- // store
3524- vst1_u8(out, p0);
3525- out += out_stride;
3526- vst1_u8(out, p1);
3527- out += out_stride;
3528- vst1_u8(out, p2);
3529- out += out_stride;
3530- vst1_u8(out, p3);
3531- out += out_stride;
3532- vst1_u8(out, p4);
3533- out += out_stride;
3534- vst1_u8(out, p5);
3535- out += out_stride;
3536- vst1_u8(out, p6);
3537- out += out_stride;
3538- vst1_u8(out, p7);
3539-
3540-#undef dct_trn8_8
3541-#undef dct_trn8_16
3542-#undef dct_trn8_32
3543- }
3544-
3545-#undef dct_long_mul
3546-#undef dct_long_mac
3547-#undef dct_widen
3548-#undef dct_wadd
3549-#undef dct_wsub
3550-#undef dct_bfly32o
3551-#undef dct_pass
3552-}
3553-
3554-#endif // STBI_NEON
3555-
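The NEON row pass replaces the generic rounding shift by 17 with two steps. First comes a truncating `vshrn_n_s32(x, 16)`, then a rounding, saturating `vqrshrun_n_s16(y, 1)`. For arithmetic shifts this split is exact: floor((floor(x / 2^16) + 1) / 2) equals floor((x + 2^16) / 2^17). Also, x >> 16 of any int32 fits in int16, so the narrowing loses nothing. The +128 level shift is left out of this model because the NEON path folds it into the 1024 added to the DC coefficient before the column pass. The sketch below is scalar and assumes `>>` on negative values is arithmetic. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Reference: one rounding shift by 17, then saturate to 0..255
   (assumes arithmetic >> on negative values). */
static uint8_t shift17_direct(int32_t x)
{
    int32_t v = (x + (1 << 16)) >> 17;
    return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Scalar model of vshrn_n_s32(x, 16) followed by vqrshrun_n_s16(y, 1). */
static uint8_t shift17_split(int32_t x)
{
    /* truncating shift and narrow: x >> 16 always fits in int16 */
    int16_t y = (int16_t)(x >> 16);
    /* rounding shift by 1, computed in 32 bits so y + 1 cannot overflow */
    int32_t v = ((int32_t)y + 1) >> 1;
    /* saturate to unsigned 8-bit, as the "qrshrun" form does */
    return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}
```

Sweeping x across a wide range and comparing the two functions is a cheap way to confirm the identity on a given compiler.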
3556-#define STBI__MARKER_none 0xff
3557-// if there's a pending marker from the entropy stream, return that;
3558-// otherwise, fetch from the stream and get a marker. if there's no
3559-// marker, return 0xff, which is never a valid marker value
3560-static stbi_uc
3561-stbi__get_marker(stbi__jpeg *j)
3562-{
3563- stbi_uc x;
3564- if (j->marker != STBI__MARKER_none) {
3565- x = j->marker;
3566- j->marker = STBI__MARKER_none;
3567- return x;
3568- }
3569- x = stbi__get8(j->s);
3570- if (x != 0xff) {
3571- return STBI__MARKER_none;
3572- }
3573- while (x == 0xff) {
3574- x = stbi__get8(j->s); // consume repeated 0xff fill bytes
3575- }
3576- return x;
3577-}
3578-
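stbi__get_marker treats a single 0xFF followed by any number of 0xFF fill bytes as one marker prefix. The sketch below runs the same loop over an in-memory buffer. `byte_cursor`, `next_byte`, and `read_marker` are hypothetical names. The pending-marker slot (`j->marker`) is left out. Returning 0 past the end of the buffer is an assumption about how the reader behaves at end of input.

```c
#include <assert.h>
#include <stddef.h>

#define MARKER_NONE 0xff /* mirrors STBI__MARKER_none */

/* Hypothetical byte cursor standing in for stbi__context. */
typedef struct {
    const unsigned char *buf;
    size_t len, pos;
} byte_cursor;

/* Returns 0 once the buffer is exhausted (assumed end-of-input behavior). */
static unsigned char next_byte(byte_cursor *c)
{
    return c->pos < c->len ? c->buf[c->pos++] : 0;
}

/* Same logic as stbi__get_marker without the pending-marker slot:
   a non-0xFF byte means "no marker"; otherwise skip the fill bytes
   and return the first byte that is not 0xFF. */
static unsigned char read_marker(byte_cursor *c)
{
    unsigned char x = next_byte(c);
    if (x != 0xff) {
        return MARKER_NONE;
    }
    while (x == 0xff) {
        x = next_byte(c); /* consume repeated 0xff fill bytes */
    }
    return x;
}
```

For the bytes FF FF FF D9, `read_marker` returns 0xD9 (EOI) and leaves the cursor just past it.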
3579-// in each scan, we'll have scan_n components, and the order
3580-// of the components is specified by order[]
3581-#define STBI__RESTART(x) ((x) >= 0xd0 && (x) <= 0xd7)
3582-
3583-// after a restart interval, reset the entropy decoder and
3584-// the DC prediction
3585-static void
3586-stbi__jpeg_reset(stbi__jpeg *j)
3587-{
3588- j->code_bits = 0;
3589- j->code_buffer = 0;
3590- j->nomore = 0;
3591- j->img_comp[0].dc_pred = j->img_comp[1].dc_pred = j->img_comp[2].dc_pred =
3592- j->img_comp[3].dc_pred = 0;
3593- j->marker = STBI__MARKER_none;
3594- j->todo = j->restart_interval ? j->restart_interval : 0x7fffffff;
3595- j->eob_run = 0;
3596- // with no restart_interval, todo starts at 0x7fffffff (2^31 - 1) MCUs;
3597- // that's plenty safe, since we don't even allow 1<<30 pixels
3598-}
3599-
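The todo counter set by stbi__jpeg_reset expires once every restart_interval MCUs, or effectively never when the interval is 0. On expiry, the decoder only checks that the pending marker is one of RST0 to RST7 via STBI__RESTART. It does not check the sequence number, which the JPEG standard cycles modulo 8 starting from RST0. The sketch below simulates the countdown and counts expiries rather than decoding anything. Both helper names are hypothetical.

```c
#include <assert.h>

/* Same range test as STBI__RESTART: RST0..RST7 are 0xD0..0xD7. */
#define RESTART(x) ((x) >= 0xd0 && (x) <= 0xd7)

/* Hypothetical simulation of the todo countdown: how many times the
   restart interval expires over `mcus` MCUs. */
static int count_interval_expiries(int mcus, int restart_interval)
{
    int todo = restart_interval ? restart_interval : 0x7fffffff;
    int expiries = 0;
    for (int m = 0; m < mcus; ++m) {
        if (--todo <= 0) {
            ++expiries;
            /* what stbi__jpeg_reset does to the counter */
            todo = restart_interval ? restart_interval : 0x7fffffff;
        }
    }
    return expiries;
}

/* The k-th restart marker in a scan (k = 0, 1, 2, ...) is RST(k mod 8). */
static int nth_restart_marker(int k)
{
    return 0xd0 + (k & 7);
}
```

With 10 MCUs and an interval of 4, the countdown expires twice. When an expiry lines up with the last MCU of a scan, the next marker is normally not an RSTn. In that case the `!STBI__RESTART(z->marker)` branch simply returns 1.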
3600-static int
3601-stbi__parse_entropy_coded_data(stbi__jpeg *z)
3602-{
3603- stbi__jpeg_reset(z);
3604- if (!z->progressive) {
3605- if (z->scan_n == 1) {
3606- int i, j;
3607- STBI_SIMD_ALIGN(short, data[64]);
3608- int n = z->order[0];
3609- // non-interleaved data: we just need to process one block at a
3610- // time, in trivial scanline order. The number of blocks to do just
3611- // depends on how many actual "pixels" this component has,
3612- // independent of interleaved MCU blocking and such
3613- int w = (z->img_comp[n].x + 7) >> 3;
3614- int h = (z->img_comp[n].y + 7) >> 3;
3615- for (j = 0; j < h; ++j) {
3616- for (i = 0; i < w; ++i) {
3617- int ha = z->img_comp[n].ha;
3618- if (!stbi__jpeg_decode_block(
3619- z, data, z->huff_dc + z->img_comp[n].hd,
3620- z->huff_ac + ha, z->fast_ac[ha], n,
3621- z->dequant[z->img_comp[n].tq])) {
3622- return 0;
3623- }
3624- z->idct_block_kernel(z->img_comp[n].data +
3625- z->img_comp[n].w2 * j * 8 + i * 8,
3626- z->img_comp[n].w2, data);
3627- // every data block is an MCU, so count down the restart
3628- // interval
3629- if (--z->todo <= 0) {
3630- if (z->code_bits < 24) {
3631- stbi__grow_buffer_unsafe(z);
3632- }
3633- // if it's NOT a restart, then just bail, so we get
3634- // corrupt data rather than no data
3635- if (!STBI__RESTART(z->marker)) {
3636- return 1;
3637- }
3638- stbi__jpeg_reset(z);
3639- }
3640- }
3641- }
3642- return 1;
3643- } else { // interleaved
3644- int i, j, k, x, y;
3645- STBI_SIMD_ALIGN(short, data[64]);
3646- for (j = 0; j < z->img_mcu_y; ++j) {
3647- for (i = 0; i < z->img_mcu_x; ++i) {
3648- // scan an interleaved mcu... process scan_n components in
3649- // order
3650- for (k = 0; k < z->scan_n; ++k) {
3651- int n = z->order[k];
3652- // scan out an mcu's worth of this component; that's
3653- // just determined by the basic H and V specified for
3654- // the component
3655- for (y = 0; y < z->img_comp[n].v; ++y) {
3656- for (x = 0; x < z->img_comp[n].h; ++x) {
3657- int x2 = (i * z->img_comp[n].h + x) * 8;
3658- int y2 = (j * z->img_comp[n].v + y) * 8;
3659- int ha = z->img_comp[n].ha;
3660- if (!stbi__jpeg_decode_block(
3661- z, data, z->huff_dc + z->img_comp[n].hd,
3662- z->huff_ac + ha, z->fast_ac[ha], n,
3663- z->dequant[z->img_comp[n].tq])) {
3664- return 0;
3665- }
3666- z->idct_block_kernel(
3667- z->img_comp[n].data +
3668- z->img_comp[n].w2 * y2 + x2,
3669- z->img_comp[n].w2, data);
3670- }
3671- }
3672- }
3673- // after all interleaved components, that's an interleaved
3674- // MCU, so now count down the restart interval
3675- if (--z->todo <= 0) {
3676- if (z->code_bits < 24) {
3677- stbi__grow_buffer_unsafe(z);
3678- }
3679- if (!STBI__RESTART(z->marker)) {
3680- return 1;
3681- }
3682- stbi__jpeg_reset(z);
3683- }
3684- }
3685- }
3686- return 1;
3687- }
3688- } else {
3689- if (z->scan_n == 1) {
3690- int i, j;
3691- int n = z->order[0];
3692- // non-interleaved data: we just need to process one block at a
3693- // time, in trivial scanline order. The number of blocks to do just
3694- // depends on how many actual "pixels" this component has,
3695- // independent of interleaved MCU blocking and such
3696- int w = (z->img_comp[n].x + 7) >> 3;
3697- int h = (z->img_comp[n].y + 7) >> 3;
3698- for (j = 0; j < h; ++j) {
3699- for (i = 0; i < w; ++i) {
3700- short *data = z->img_comp[n].coeff +
3701- 64 * (i + j * z->img_comp[n].coeff_w);
3702- if (z->spec_start == 0) {
3703- if (!stbi__jpeg_decode_block_prog_dc(
3704- z, data, &z->huff_dc[z->img_comp[n].hd], n)) {
3705- return 0;
3706- }
3707- } else {
3708- int ha = z->img_comp[n].ha;
3709- if (!stbi__jpeg_decode_block_prog_ac(
3710- z, data, &z->huff_ac[ha], z->fast_ac[ha])) {
3711- return 0;
3712- }
3713- }
3714- // every data block is an MCU, so count down the restart
3715- // interval
3716- if (--z->todo <= 0) {
3717- if (z->code_bits < 24) {
3718- stbi__grow_buffer_unsafe(z);
3719- }
3720- if (!STBI__RESTART(z->marker)) {
3721- return 1;
3722- }
3723- stbi__jpeg_reset(z);
3724- }
3725- }
3726- }
3727- return 1;
3728- } else { // interleaved
3729- int i, j, k, x, y;
3730- for (j = 0; j < z->img_mcu_y; ++j) {
3731- for (i = 0; i < z->img_mcu_x; ++i) {
3732- // scan an interleaved mcu... process scan_n components in
3733- // order
3734- for (k = 0; k < z->scan_n; ++k) {
3735- int n = z->order[k];
3736- // scan out an mcu's worth of this component; that's
3737- // just determined by the basic H and V specified for
3738- // the component
3739- for (y = 0; y < z->img_comp[n].v; ++y) {
3740- for (x = 0; x < z->img_comp[n].h; ++x) {
3741- int x2 = (i * z->img_comp[n].h + x);
3742- int y2 = (j * z->img_comp[n].v + y);
3743- short *data =
3744- z->img_comp[n].coeff +
3745- 64 * (x2 + y2 * z->img_comp[n].coeff_w);
3746- if (!stbi__jpeg_decode_block_prog_dc(
3747- z, data, &z->huff_dc[z->img_comp[n].hd],
3748- n)) {
3749- return 0;
3750- }
3751- }
3752- }
3753- }
3754- // after all interleaved components, that's an interleaved
3755- // MCU, so now count down the restart interval
3756- if (--z->todo <= 0) {
3757- if (z->code_bits < 24) {
3758- stbi__grow_buffer_unsafe(z);
3759- }
3760- if (!STBI__RESTART(z->marker)) {
3761- return 1;
3762- }
3763- stbi__jpeg_reset(z);
3764- }
3765- }
3766- }
3767- return 1;
3768- }
3769- }
3770-}
3771-
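The block addressing in the loops above can be worked through for a 4:2:0 image 33 pixels wide, with Y sampled 2x2 and Cb and Cr sampled 1x1. An interleaved MCU then holds 2*2 + 1 + 1 = 6 blocks. Because MCUs are 16 pixels wide, the luma plane is padded to 3 MCUs, which is 48 pixels or 6 blocks across. A non-interleaved scan of the same component visits only ceil(33 / 8) = 5 blocks per row. The helpers below are hypothetical and only restate the arithmetic from stbi__parse_entropy_coded_data and stbi__process_frame_header.

```c
#include <assert.h>

/* Hypothetical sampling factors for one component, as read from SOF. */
typedef struct {
    int h, v;
} sampling;

static int ceil_div(int a, int b) { return (a + b - 1) / b; }

/* Effective width of a component, as in stbi__process_frame_header:
   ceil(img_x * h / h_max). */
static int component_width(int img_x, int h, int h_max)
{
    return ceil_div(img_x * h, h_max);
}

/* 8x8 blocks in one interleaved MCU: sum of h * v over the scan's
   components. */
static int blocks_per_mcu(const sampling *comp, int n)
{
    int total = 0;
    for (int i = 0; i < n; ++i) {
        total += comp[i].h * comp[i].v;
    }
    return total;
}

/* Offset of block (x, y) of MCU (i, j) in a component plane of stride w2,
   using the same x2/y2 arithmetic as the interleaved baseline loop. */
static int block_offset(sampling c, int w2, int i, int j, int x, int y)
{
    int x2 = (i * c.h + x) * 8;
    int y2 = (j * c.v + y) * 8;
    return w2 * y2 + x2;
}
```

The padding columns decoded here are the "bogus oversized data" that the frame-header comment says is only discarded at colorspace conversion.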
3772-static void
3773-stbi__jpeg_dequantize(short *data, stbi__uint16 *dequant)
3774-{
3775- int i;
3776- for (i = 0; i < 64; ++i) {
3777- data[i] *= dequant[i];
3778- }
3779-}
3780-
3781-static void
3782-stbi__jpeg_finish(stbi__jpeg *z)
3783-{
3784- if (z->progressive) {
3785- // dequantize and idct the data
3786- int i, j, n;
3787- for (n = 0; n < z->s->img_n; ++n) {
3788- int w = (z->img_comp[n].x + 7) >> 3;
3789- int h = (z->img_comp[n].y + 7) >> 3;
3790- for (j = 0; j < h; ++j) {
3791- for (i = 0; i < w; ++i) {
3792- short *data = z->img_comp[n].coeff +
3793- 64 * (i + j * z->img_comp[n].coeff_w);
3794- stbi__jpeg_dequantize(data, z->dequant[z->img_comp[n].tq]);
3795- z->idct_block_kernel(z->img_comp[n].data +
3796- z->img_comp[n].w2 * j * 8 + i * 8,
3797- z->img_comp[n].w2, data);
3798- }
3799- }
3800- }
3801- }
3802-}
3803-
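The DQT case in stbi__process_marker stores each table in natural (row-major) order by indexing through stbi__jpeg_dezigzag. As a result, stbi__jpeg_dequantize can multiply element by element, provided the coefficients are held in the same order. The sketch below uses the standard zigzag-to-natural order from the JPEG specification. `zigzag_to_natural`, `load_qtable`, and `dequantize` are hypothetical names. stb_image's own stbi__jpeg_dezigzag table is not shown in this excerpt.

```c
#include <assert.h>

/* Natural (row-major) index for each zigzag position, per the JPEG spec. */
static const unsigned char zigzag_to_natural[64] = {
     0,  1,  8, 16,  9,  2,  3, 10,
    17, 24, 32, 25, 18, 11,  4,  5,
    12, 19, 26, 33, 40, 48, 41, 34,
    27, 20, 13,  6,  7, 14, 21, 28,
    35, 42, 49, 56, 57, 50, 43, 36,
    29, 22, 15, 23, 30, 37, 44, 51,
    58, 59, 52, 45, 38, 31, 39, 46,
    53, 60, 61, 54, 47, 55, 62, 63
};

/* Hypothetical: place a table transmitted in zigzag order into natural
   order, as the DQT case does with z->dequant[t][stbi__jpeg_dezigzag[i]]. */
static void load_qtable(const unsigned short zz[64], unsigned short natural[64])
{
    for (int i = 0; i < 64; ++i) {
        natural[zigzag_to_natural[i]] = zz[i];
    }
}

/* Hypothetical: element-wise multiply like stbi__jpeg_dequantize, for
   coefficients already in natural order. */
static void dequantize(short data[64], const unsigned short q[64])
{
    for (int i = 0; i < 64; ++i) {
        data[i] = (short)(data[i] * q[i]);
    }
}
```

With a table whose zigzag entries are 1, 2, 3, ..., the third transmitted value lands at natural index 8 (row 1, column 0).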
3804-static int
3805-stbi__process_marker(stbi__jpeg *z, int m)
3806-{
3807- int L;
3808- switch (m) {
3809- case STBI__MARKER_none: // no marker found
3810- return stbi__err("expected marker", "Corrupt JPEG");
3811-
3812- case 0xDD: // DRI - specify restart interval
3813- if (stbi__get16be(z->s) != 4) {
3814- return stbi__err("bad DRI len", "Corrupt JPEG");
3815- }
3816- z->restart_interval = stbi__get16be(z->s);
3817- return 1;
3818-
3819- case 0xDB: // DQT - define quantization table
3820- L = stbi__get16be(z->s) - 2;
3821- while (L > 0) {
3822- int q = stbi__get8(z->s);
3823- int p = q >> 4, sixteen = (p != 0);
3824- int t = q & 15, i;
3825- if (p != 0 && p != 1) {
3826- return stbi__err("bad DQT type", "Corrupt JPEG");
3827- }
3828- if (t > 3) {
3829- return stbi__err("bad DQT table", "Corrupt JPEG");
3830- }
3831-
3832- for (i = 0; i < 64; ++i) {
3833- z->dequant[t][stbi__jpeg_dezigzag[i]] =
3834- (stbi__uint16)(sixteen ? stbi__get16be(z->s)
3835- : stbi__get8(z->s));
3836- }
3837- L -= (sixteen ? 129 : 65);
3838- }
3839- return L == 0;
3840-
3841- case 0xC4: // DHT - define huffman table
3842- L = stbi__get16be(z->s) - 2;
3843- while (L > 0) {
3844- stbi_uc *v;
3845- int sizes[16], i, n = 0;
3846- int q = stbi__get8(z->s);
3847- int tc = q >> 4;
3848- int th = q & 15;
3849- if (tc > 1 || th > 3) {
3850- return stbi__err("bad DHT header", "Corrupt JPEG");
3851- }
3852- for (i = 0; i < 16; ++i) {
3853- sizes[i] = stbi__get8(z->s);
3854- n += sizes[i];
3855- }
3856- if (n > 256) {
3857- // the loop over i < n below would otherwise write past the end
3858- // of the values array
3859- return stbi__err("bad DHT header", "Corrupt JPEG");
3860- }
3861- L -= 17;
3862- if (tc == 0) {
3863- if (!stbi__build_huffman(z->huff_dc + th, sizes)) {
3864- return 0;
3865- }
3866- v = z->huff_dc[th].values;
3867- } else {
3868- if (!stbi__build_huffman(z->huff_ac + th, sizes)) {
3869- return 0;
3870- }
3871- v = z->huff_ac[th].values;
3872- }
3873- for (i = 0; i < n; ++i) {
3874- v[i] = stbi__get8(z->s);
3875- }
3876- if (tc != 0) {
3877- stbi__build_fast_ac(z->fast_ac[th], z->huff_ac + th);
3878- }
3879- L -= n;
3880- }
3881- return L == 0;
3882- }
3883-
3884- // check for comment block or APP blocks
3885- if ((m >= 0xE0 && m <= 0xEF) || m == 0xFE) {
3886- L = stbi__get16be(z->s);
3887- if (L < 2) {
3888- if (m == 0xFE) {
3889- return stbi__err("bad COM len", "Corrupt JPEG");
3890- } else {
3891- return stbi__err("bad APP len", "Corrupt JPEG");
3892- }
3893- }
3894- L -= 2;
3895-
3896- if (m == 0xE0 && L >= 5) { // JFIF APP0 segment
3897- static const unsigned char tag[5] = {'J', 'F', 'I', 'F', '\0'};
3898- int ok = 1;
3899- int i;
3900- for (i = 0; i < 5; ++i) {
3901- if (stbi__get8(z->s) != tag[i]) {
3902- ok = 0;
3903- }
3904- }
3905- L -= 5;
3906- if (ok) {
3907- z->jfif = 1;
3908- }
3909- } else if (m == 0xEE && L >= 12) { // Adobe APP14 segment
3910- static const unsigned char tag[6] = {'A', 'd', 'o', 'b', 'e', '\0'};
3911- int ok = 1;
3912- int i;
3913- for (i = 0; i < 6; ++i) {
3914- if (stbi__get8(z->s) != tag[i]) {
3915- ok = 0;
3916- }
3917- }
3918- L -= 6;
3919- if (ok) {
3920- stbi__get8(z->s); // version
3921- stbi__get16be(z->s); // flags0
3922- stbi__get16be(z->s); // flags1
3923- z->app14_color_transform = stbi__get8(z->s); // color transform
3924- L -= 6;
3925- }
3926- }
3927-
3928- stbi__skip(z->s, L);
3929- return 1;
3930- }
3931-
3932- return stbi__err("unknown marker", "Corrupt JPEG");
3933-}
3934-
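The DHT case reads 16 code-length counts and then n symbol values, and hands the counts to stbi__build_huffman, which is not part of this excerpt. The usual way to turn such counts into codes is the canonical assignment described in Annex C of the JPEG specification. Codes of each length are consecutive integers, and the running code shifts left by one bit whenever the length grows. The sketch below implements that procedure; it is not stb_image's implementation. It also rejects counts whose codes would not fit in their length. `assign_codes` is a hypothetical name.

```c
#include <assert.h>

/* Hypothetical: canonical Huffman code assignment from the 16 BITS counts
   of a DHT segment. Fills codes[k] / lengths[k] for symbol k and returns
   the symbol count, or -1 if the counts are invalid. */
static int assign_codes(const int sizes[16], unsigned codes[256], int lengths[256])
{
    unsigned code = 0;
    int k = 0;
    for (int len = 1; len <= 16; ++len) {
        for (int i = 0; i < sizes[len - 1]; ++i) {
            if (k == 256) {
                return -1; /* same limit as the n > 256 check */
            }
            codes[k] = code++;
            lengths[k] = len;
            ++k;
        }
        if (code > (1u << len)) {
            return -1; /* codes of this length no longer fit in len bits */
        }
        code <<= 1; /* next length: append a 0 bit */
    }
    return k;
}
```

With the typical luminance DC counts {0, 1, 5, 1, 1, 1, 1, 1, 1}, this yields 00, then 010 through 110, then 1110, and so on up to the 9-bit code 111111110.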
3935-// after we see SOS
3936-static int
3937-stbi__process_scan_header(stbi__jpeg *z)
3938-{
3939- int i;
3940- int Ls = stbi__get16be(z->s);
3941- z->scan_n = stbi__get8(z->s);
3942- if (z->scan_n < 1 || z->scan_n > 4 || z->scan_n > (int)z->s->img_n) {
3943- return stbi__err("bad SOS component count", "Corrupt JPEG");
3944- }
3945- if (Ls != 6 + 2 * z->scan_n) {
3946- return stbi__err("bad SOS len", "Corrupt JPEG");
3947- }
3948- for (i = 0; i < z->scan_n; ++i) {
3949- int id = stbi__get8(z->s), which;
3950- int q = stbi__get8(z->s);
3951- for (which = 0; which < z->s->img_n; ++which) {
3952- if (z->img_comp[which].id == id) {
3953- break;
3954- }
3955- }
3956- if (which == z->s->img_n) {
3957- return 0; // no match
3958- }
3959- z->img_comp[which].hd = q >> 4;
3960- if (z->img_comp[which].hd > 3) {
3961- return stbi__err("bad DC huff", "Corrupt JPEG");
3962- }
3963- z->img_comp[which].ha = q & 15;
3964- if (z->img_comp[which].ha > 3) {
3965- return stbi__err("bad AC huff", "Corrupt JPEG");
3966- }
3967- z->order[i] = which;
3968- }
3969-
3970- {
3971- int aa;
3972- z->spec_start = stbi__get8(z->s);
3973- z->spec_end = stbi__get8(z->s); // should be 63, but might be 0
3974- aa = stbi__get8(z->s);
3975- z->succ_high = (aa >> 4);
3976- z->succ_low = (aa & 15);
3977- if (z->progressive) {
3978- if (z->spec_start > 63 || z->spec_end > 63 ||
3979- z->spec_start > z->spec_end || z->succ_high > 13 ||
3980- z->succ_low > 13) {
3981- return stbi__err("bad SOS", "Corrupt JPEG");
3982- }
3983- } else {
3984- if (z->spec_start != 0) {
3985- return stbi__err("bad SOS", "Corrupt JPEG");
3986- }
3987- if (z->succ_high != 0 || z->succ_low != 0) {
3988- return stbi__err("bad SOS", "Corrupt JPEG");
3989- }
3990- z->spec_end = 63;
3991- }
3992- }
3993-
3994- return 1;
3995-}
3996-
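The checks in stbi__process_scan_header can be restated over a raw SOS header held in memory:

- the length must equal 6 + 2 * Ns;
- each component's DC and AC table ids must be 0 to 3;
- the spectral-selection and successive-approximation fields must fit the mode.

The sketch below is hypothetical. It skips the lookup of component ids against the frame header and assumes the buffer starts at the length field.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: apply the same tests as stbi__process_scan_header to an SOS
   header in memory (p points at the 2-byte length). Returns 1 if accepted. */
static int sos_header_ok(const unsigned char *p, size_t len, int img_n, int progressive)
{
    if (len < 3) {
        return 0;
    }
    int Ls = (p[0] << 8) | p[1];
    int Ns = p[2];
    if (Ns < 1 || Ns > 4 || Ns > img_n) {
        return 0; /* bad SOS component count */
    }
    if (Ls != 6 + 2 * Ns || len < (size_t)Ls) {
        return 0; /* bad SOS len */
    }
    for (int i = 0; i < Ns; ++i) {
        int tables = p[3 + 2 * i + 1]; /* high nibble DC id, low nibble AC id */
        if ((tables >> 4) > 3 || (tables & 15) > 3) {
            return 0;
        }
    }
    int Ss = p[3 + 2 * Ns], Se = p[4 + 2 * Ns], aa = p[5 + 2 * Ns];
    int Ah = aa >> 4, Al = aa & 15;
    if (progressive) {
        return Ss <= 63 && Se <= 63 && Ss <= Se && Ah <= 13 && Al <= 13;
    }
    return Ss == 0 && Ah == 0 && Al == 0; /* baseline; Se is forced to 63 */
}
```

A typical 3-component baseline header is 00 0C 03 01 00 02 11 03 11 00 3F 00. A progressive DC scan (Ss = Se = 0, Al = 1) passes only when `progressive` is set.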
3997-static int
3998-stbi__free_jpeg_components(stbi__jpeg *z, int ncomp, int why)
3999-{
4000- int i;
4001- for (i = 0; i < ncomp; ++i) {
4002- if (z->img_comp[i].raw_data) {
4003- STBI_FREE(z->img_comp[i].raw_data);
4004- z->img_comp[i].raw_data = NULL;
4005- z->img_comp[i].data = NULL;
4006- }
4007- if (z->img_comp[i].raw_coeff) {
4008- STBI_FREE(z->img_comp[i].raw_coeff);
4009- z->img_comp[i].raw_coeff = 0;
4010- z->img_comp[i].coeff = 0;
4011- }
4012- if (z->img_comp[i].linebuf) {
4013- STBI_FREE(z->img_comp[i].linebuf);
4014- z->img_comp[i].linebuf = NULL;
4015- }
4016- }
4017- return why;
4018-}
4019-
4020-static int
4021-stbi__process_frame_header(stbi__jpeg *z, int scan)
4022-{
4023- stbi__context *s = z->s;
4024- int Lf, p, i, q, h_max = 1, v_max = 1, c;
4025- Lf = stbi__get16be(s);
4026- if (Lf < 11) {
4027- return stbi__err("bad SOF len", "Corrupt JPEG"); // JPEG
4028- }
4029- p = stbi__get8(s);
4030- if (p != 8) {
4031- return stbi__err(
4032- "only 8-bit",
4033- "JPEG format not supported: 8-bit only"); // JPEG baseline
4034- }
4035- s->img_y = stbi__get16be(s);
4036- if (s->img_y == 0) {
4037- // legal (the height can come later in a DNL marker), but we
4038- // don't handle it, and neither does IJG
4039- return stbi__err(
4040- "no header height",
4041- "JPEG format not supported: delayed height");
4042- }
4043- s->img_x = stbi__get16be(s);
4044- if (s->img_x == 0) {
4045- return stbi__err("0 width", "Corrupt JPEG"); // JPEG requires a nonzero width
4046- }
4047- if (s->img_y > STBI_MAX_DIMENSIONS) {
4048- return stbi__err("too large", "Very large image (corrupt?)");
4049- }
4050- if (s->img_x > STBI_MAX_DIMENSIONS) {
4051- return stbi__err("too large", "Very large image (corrupt?)");
4052- }
4053- c = stbi__get8(s);
4054- if (c != 3 && c != 1 && c != 4) {
4055- return stbi__err("bad component count", "Corrupt JPEG");
4056- }
4057- s->img_n = c;
4058- for (i = 0; i < c; ++i) {
4059- z->img_comp[i].data = NULL;
4060- z->img_comp[i].linebuf = NULL;
4061- }
4062-
4063- if (Lf != 8 + 3 * s->img_n) {
4064- return stbi__err("bad SOF len", "Corrupt JPEG");
4065- }
4066-
4067- z->rgb = 0;
4068- for (i = 0; i < s->img_n; ++i) {
4069- static const unsigned char rgb[3] = {'R', 'G', 'B'};
4070- z->img_comp[i].id = stbi__get8(s);
4071- if (s->img_n == 3 && z->img_comp[i].id == rgb[i]) {
4072- ++z->rgb;
4073- }
4074- q = stbi__get8(s);
4075- z->img_comp[i].h = (q >> 4);
4076- if (!z->img_comp[i].h || z->img_comp[i].h > 4) {
4077- return stbi__err("bad H", "Corrupt JPEG");
4078- }
4079- z->img_comp[i].v = q & 15;
4080- if (!z->img_comp[i].v || z->img_comp[i].v > 4) {
4081- return stbi__err("bad V", "Corrupt JPEG");
4082- }
4083- z->img_comp[i].tq = stbi__get8(s);
4084- if (z->img_comp[i].tq > 3) {
4085- return stbi__err("bad TQ", "Corrupt JPEG");
4086- }
4087- }
4088-
4089- if (scan != STBI__SCAN_load) {
4090- return 1;
4091- }
4092-
4093- if (!stbi__mad3sizes_valid(s->img_x, s->img_y, s->img_n, 0)) {
4094- return stbi__err("too large", "Image too large to decode");
4095- }
4096-
4097- for (i = 0; i < s->img_n; ++i) {
4098- if (z->img_comp[i].h > h_max) {
4099- h_max = z->img_comp[i].h;
4100- }
4101- if (z->img_comp[i].v > v_max) {
4102- v_max = z->img_comp[i].v;
4103- }
4104- }
4105-
4106- // check that plane subsampling factors are integer ratios; our resamplers
4107- // can't deal with fractional ratios and I've never seen a non-corrupted
4108- // JPEG file actually use them
4109- for (i = 0; i < s->img_n; ++i) {
4110- if (h_max % z->img_comp[i].h != 0) {
4111- return stbi__err("bad H", "Corrupt JPEG");
4112- }
4113- if (v_max % z->img_comp[i].v != 0) {
4114- return stbi__err("bad V", "Corrupt JPEG");
4115- }
4116- }
4117-
4118- // compute interleaved mcu info
4119- z->img_h_max = h_max;
4120- z->img_v_max = v_max;
4121- z->img_mcu_w = h_max * 8;
4122- z->img_mcu_h = v_max * 8;
4123- // these sizes can't be more than 17 bits
4124- z->img_mcu_x = (s->img_x + z->img_mcu_w - 1) / z->img_mcu_w;
4125- z->img_mcu_y = (s->img_y + z->img_mcu_h - 1) / z->img_mcu_h;
4126-
4127- for (i = 0; i < s->img_n; ++i) {
4128- // number of effective pixels (e.g. for non-interleaved MCU)
4129- z->img_comp[i].x = (s->img_x * z->img_comp[i].h + h_max - 1) / h_max;
4130- z->img_comp[i].y = (s->img_y * z->img_comp[i].v + v_max - 1) / v_max;
4131- // to simplify generation, we'll allocate enough memory to decode
4132- // the bogus oversized data from using interleaved MCUs and their
4133- // big blocks (e.g. a 16x16 iMCU on an image of width 33); we won't
4134- // discard the extra data until colorspace conversion
4135- //
4136- // img_mcu_x, img_mcu_y: <=17 bits; comp[i].h and .v are <=4 (checked
4137- // earlier) so these muls can't overflow with 32-bit ints (which we
4138- // require)
4139- z->img_comp[i].w2 = z->img_mcu_x * z->img_comp[i].h * 8;
4140- z->img_comp[i].h2 = z->img_mcu_y * z->img_comp[i].v * 8;
4141- z->img_comp[i].coeff = 0;
4142- z->img_comp[i].raw_coeff = 0;
4143- z->img_comp[i].linebuf = NULL;
4144- z->img_comp[i].raw_data =
4145- stbi__malloc_mad2(z->img_comp[i].w2, z->img_comp[i].h2, 15);
4146- if (z->img_comp[i].raw_data == NULL) {
4147- return stbi__free_jpeg_components(
4148- z, i + 1, stbi__err("outofmem", "Out of memory"));
4149- }
4150- // align blocks for idct using mmx/sse
4151- z->img_comp[i].data =
4152- (stbi_uc *)(((size_t)z->img_comp[i].raw_data + 15) & ~15);
4153- if (z->progressive) {
4154- // w2, h2 are multiples of 8 (see above)
4155- z->img_comp[i].coeff_w = z->img_comp[i].w2 / 8;
4156- z->img_comp[i].coeff_h = z->img_comp[i].h2 / 8;
4157- z->img_comp[i].raw_coeff = stbi__malloc_mad3(
4158- z->img_comp[i].w2, z->img_comp[i].h2, sizeof(short), 15);
4159- if (z->img_comp[i].raw_coeff == NULL) {
4160- return stbi__free_jpeg_components(
4161- z, i + 1, stbi__err("outofmem", "Out of memory"));
4162- }
4163- z->img_comp[i].coeff =
4164- (short *)(((size_t)z->img_comp[i].raw_coeff + 15) & ~15);
4165- }
4166- }
4167-
4168- return 1;
4169-}
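The MCU bookkeeping in `stbi__process_frame_header` is ceiling division throughout. Below is a minimal standalone sketch of that arithmetic for one component. It uses a hypothetical `frame_geometry` helper, which is not part of stb_image. The helper takes the component's sampling factors `h`, `v` and the frame maxima `h_max`, `v_max`:

```c
/* Hypothetical helper mirroring the frame-header arithmetic above.
 * h, v: this component's sampling factors (1..4);
 * h_max, v_max: the largest factors over all components. */
typedef struct {
    int x, y;   /* effective pixels covered by this component */
    int w2, h2; /* allocated plane size, padded to whole MCUs */
} comp_geom;

static comp_geom frame_geometry(int img_x, int img_y, int h, int v,
                                int h_max, int v_max)
{
    comp_geom g;
    int mcu_w = h_max * 8, mcu_h = v_max * 8;
    /* ceiling division: number of MCUs needed to cover the image */
    int mcu_x = (img_x + mcu_w - 1) / mcu_w;
    int mcu_y = (img_y + mcu_h - 1) / mcu_h;
    g.x = (img_x * h + h_max - 1) / h_max;
    g.y = (img_y * v + v_max - 1) / v_max;
    /* each MCU holds h x v 8x8 blocks of this component */
    g.w2 = mcu_x * h * 8;
    g.h2 = mcu_y * v * 8;
    return g;
}
```

Take a 33x17 image with 2x2 luma and 1x1 chroma sampling. This gives 3x2 MCUs of 16x16 pixels. The luma plane is padded to 48x32 and each chroma plane to 24x16, even though chroma covers only 17x9 effective pixels.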
4170-
4171-// use comparisons since in some cases we handle more than one case (e.g. SOF)
4172-#define stbi__DNL(x) ((x) == 0xdc)
4173-#define stbi__SOI(x) ((x) == 0xd8)
4174-#define stbi__EOI(x) ((x) == 0xd9)
4175-#define stbi__SOF(x) ((x) == 0xc0 || (x) == 0xc1 || (x) == 0xc2)
4176-#define stbi__SOS(x) ((x) == 0xda)
4177-
4178-#define stbi__SOF_progressive(x) ((x) == 0xc2)
4179-
4180-static int
4181-stbi__decode_jpeg_header(stbi__jpeg *z, int scan)
4182-{
4183- int m;
4184- z->jfif = 0;
4185- z->app14_color_transform = -1; // valid values are 0,1,2
4186- z->marker = STBI__MARKER_none; // initialize cached marker to empty
4187- m = stbi__get_marker(z);
4188- if (!stbi__SOI(m)) {
4189- return stbi__err("no SOI", "Corrupt JPEG");
4190- }
4191- if (scan == STBI__SCAN_type) {
4192- return 1;
4193- }
4194- m = stbi__get_marker(z);
4195- while (!stbi__SOF(m)) {
4196- if (!stbi__process_marker(z, m)) {
4197- return 0;
4198- }
4199- m = stbi__get_marker(z);
4200- while (m == STBI__MARKER_none) {
4201- // some files have extra padding after their blocks, so ok, we'll
4202- // scan
4203- if (stbi__at_eof(z->s)) {
4204- return stbi__err("no SOF", "Corrupt JPEG");
4205- }
4206- m = stbi__get_marker(z);
4207- }
4208- }
4209- z->progressive = stbi__SOF_progressive(m);
4210- if (!stbi__process_frame_header(z, scan)) {
4211- return 0;
4212- }
4213- return 1;
4214-}
4215-
4216-static stbi_uc
4217-stbi__skip_jpeg_junk_at_end(stbi__jpeg *j)
4218-{
4219- // some JPEGs have junk at end, skip over it but if we find what looks
4220- // like a valid marker, resume there
4221- while (!stbi__at_eof(j->s)) {
4222- stbi_uc x = stbi__get8(j->s);
4223- while (x == 0xff) { // might be a marker
4224- if (stbi__at_eof(j->s)) {
4225- return STBI__MARKER_none;
4226- }
4227- x = stbi__get8(j->s);
4228- if (x != 0x00 && x != 0xff) {
4229- // not a stuffed zero or lead-in to another marker, looks
4230- // like an actual marker, return it
4231- return x;
4232- }
4233- // stuffed zero has x=0 now which ends the loop, meaning we go
4234- // back to regular scan loop.
4235- // repeated 0xff keeps trying to read the next byte of the marker.
4236- }
4237- }
4238- return STBI__MARKER_none;
4239-}
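`stbi__skip_jpeg_junk_at_end` treats `0xFF 0x00` as a stuffed data byte and repeated `0xFF` as fill. It returns the first other byte that follows an `0xFF`. Below is the same rule applied to an in-memory buffer, as a hypothetical `scan_for_marker` helper. It returns -1 where the original returns `STBI__MARKER_none`:

```c
#include <stddef.h>

/* Hypothetical helper: returns the marker code (the byte after 0xFF),
 * or -1 if the buffer ends first. 0xFF 0x00 is stuffed data; repeated
 * 0xFF bytes are fill before a marker. */
static int scan_for_marker(const unsigned char *buf, size_t len)
{
    size_t p = 0;
    while (p < len) {
        unsigned char x = buf[p++];
        while (x == 0xff) {
            if (p >= len)
                return -1;
            x = buf[p++];
            if (x != 0x00 && x != 0xff)
                return x; /* looks like a real marker */
            /* x == 0x00 leaves the loop; x == 0xff keeps scanning */
        }
    }
    return -1;
}
```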
4240-
4241-// decode image to YCbCr format
4242-static int
4243-stbi__decode_jpeg_image(stbi__jpeg *j)
4244-{
4245- int m;
4246- for (m = 0; m < 4; m++) {
4247- j->img_comp[m].raw_data = NULL;
4248- j->img_comp[m].raw_coeff = NULL;
4249- }
4250- j->restart_interval = 0;
4251- if (!stbi__decode_jpeg_header(j, STBI__SCAN_load)) {
4252- return 0;
4253- }
4254- m = stbi__get_marker(j);
4255- while (!stbi__EOI(m)) {
4256- if (stbi__SOS(m)) {
4257- if (!stbi__process_scan_header(j)) {
4258- return 0;
4259- }
4260- if (!stbi__parse_entropy_coded_data(j)) {
4261- return 0;
4262- }
4263- if (j->marker == STBI__MARKER_none) {
4264- j->marker = stbi__skip_jpeg_junk_at_end(j);
4265- // if we reach eof without hitting a marker, stbi__get_marker()
4266- // below will fail and we'll eventually return 0
4267- }
4268- m = stbi__get_marker(j);
4269- if (STBI__RESTART(m)) {
4270- m = stbi__get_marker(j);
4271- }
4272- } else if (stbi__DNL(m)) {
4273- int Ld = stbi__get16be(j->s);
4274- stbi__uint32 NL = stbi__get16be(j->s);
4275- if (Ld != 4) {
4276- return stbi__err("bad DNL len", "Corrupt JPEG");
4277- }
4278- if (NL != j->s->img_y) {
4279- return stbi__err("bad DNL height", "Corrupt JPEG");
4280- }
4281- m = stbi__get_marker(j);
4282- } else {
4283- if (!stbi__process_marker(j, m)) {
4284- return 1;
4285- }
4286- m = stbi__get_marker(j);
4287- }
4288- }
4289- if (j->progressive) {
4290- stbi__jpeg_finish(j);
4291- }
4292- return 1;
4293-}
4294-
4295-// static jfif-centered resampling (across block boundaries)
4296-
4297-typedef stbi_uc *(*resample_row_func)(stbi_uc *out, stbi_uc *in0, stbi_uc *in1,
4298- int w, int hs);
4299-
4300-#define stbi__div4(x) ((stbi_uc)((x) >> 2))
4301-
4302-static stbi_uc *
4303-resample_row_1(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
4304-{
4305- STBI_NOTUSED(out);
4306- STBI_NOTUSED(in_far);
4307- STBI_NOTUSED(w);
4308- STBI_NOTUSED(hs);
4309- return in_near;
4310-}
4311-
4312-static stbi_uc *
4313-stbi__resample_row_v_2(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w,
4314- int hs)
4315-{
4316- // need to generate two samples vertically for every one in input
4317- int i;
4318- STBI_NOTUSED(hs);
4319- for (i = 0; i < w; ++i) {
4320- out[i] = stbi__div4(3 * in_near[i] + in_far[i] + 2);
4321- }
4322- return out;
4323-}
4324-
4325-static stbi_uc *
4326-stbi__resample_row_h_2(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w,
4327- int hs)
4328-{
4329- // need to generate two samples horizontally for every one in input
4330- int i;
4331- stbi_uc *input = in_near;
4332-
4333- if (w == 1) {
4334- // if only one sample, can't do any interpolation
4335- out[0] = out[1] = input[0];
4336- return out;
4337- }
4338-
4339- out[0] = input[0];
4340- out[1] = stbi__div4(input[0] * 3 + input[1] + 2);
4341- for (i = 1; i < w - 1; ++i) {
4342- int n = 3 * input[i] + 2;
4343- out[i * 2 + 0] = stbi__div4(n + input[i - 1]);
4344- out[i * 2 + 1] = stbi__div4(n + input[i + 1]);
4345- }
4346- out[i * 2 + 0] = stbi__div4(input[w - 2] * 3 + input[w - 1] + 2);
4347- out[i * 2 + 1] = input[w - 1];
4348-
4349- STBI_NOTUSED(in_far);
4350- STBI_NOTUSED(hs);
4351-
4352- return out;
4353-}
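Horizontal 2x upsampling turns each input pixel into two outputs. Each output is weighted 3:1 toward the nearer source pixel, with `+ 2` rounding before the divide by 4. Below is a standalone sketch of that triangle filter, as a hypothetical `upsample_h2` helper. It differs from the listing above in one place. The listing's final `out[i * 2 + 0]` weights `input[w - 2]` by 3. This sketch instead weights the nearer pixel, `input[w - 1]`, by 3, which matches the pattern of the loop body and of `out[1]`:

```c
/* Hypothetical helper: 2x horizontal upsampling of one row with a
 * 3:1 triangle filter. out must hold 2*w bytes. */
static void upsample_h2(unsigned char *out, const unsigned char *in, int w)
{
    int i;
    if (w == 1) { /* nothing to interpolate against */
        out[0] = out[1] = in[0];
        return;
    }
    out[0] = in[0];
    out[1] = (unsigned char)((3 * in[0] + in[1] + 2) >> 2);
    for (i = 1; i < w - 1; ++i) {
        int n = 3 * in[i] + 2; /* shared term, includes rounding */
        out[2 * i] = (unsigned char)((n + in[i - 1]) >> 2);
        out[2 * i + 1] = (unsigned char)((n + in[i + 1]) >> 2);
    }
    /* here i == w - 1: the last input pixel gets weight 3 */
    out[2 * i] = (unsigned char)((3 * in[i] + in[i - 1] + 2) >> 2);
    out[2 * i + 1] = in[i];
}
```

On a linear ramp such as `{0, 100, 200}`, this produces the evenly spaced row `{0, 25, 75, 125, 175, 200}`.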
4354-
4355-#define stbi__div16(x) ((stbi_uc)((x) >> 4))
4356-
4357-static stbi_uc *
4358-stbi__resample_row_hv_2(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w,
4359- int hs)
4360-{
4361- // need to generate 2x2 samples for every one in input
4362- int i, t0, t1;
4363- if (w == 1) {
4364- out[0] = out[1] = stbi__div4(3 * in_near[0] + in_far[0] + 2);
4365- return out;
4366- }
4367-
4368- t1 = 3 * in_near[0] + in_far[0];
4369- out[0] = stbi__div4(t1 + 2);
4370- for (i = 1; i < w; ++i) {
4371- t0 = t1;
4372- t1 = 3 * in_near[i] + in_far[i];
4373- out[i * 2 - 1] = stbi__div16(3 * t0 + t1 + 8);
4374- out[i * 2] = stbi__div16(3 * t1 + t0 + 8);
4375- }
4376- out[w * 2 - 1] = stbi__div4(t1 + 2);
4377-
4378- STBI_NOTUSED(hs);
4379-
4380- return out;
4381-}
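For 2x2 chroma, `stbi__resample_row_hv_2` works in two passes. First it blends the two source rows 3:1 (`t = 3*near + far`, which carries a weight of 4). Then it applies the same 3:1 blend horizontally. Interior outputs therefore carry a total weight of 16 and are rounded with `+ 8` before `>> 4`. The two edge outputs get only the vertical pass and use `+ 2`, `>> 2`. Below is a scalar sketch of the same scheme, under the hypothetical name `upsample_hv2`:

```c
/* Hypothetical helper: 2x2 triangle upsampling for one output row.
 * row_near is the closer source row, row_far the other; out holds 2*w. */
static void upsample_hv2(unsigned char *out, const unsigned char *row_near,
                         const unsigned char *row_far, int w)
{
    int i, t0, t1;
    if (w == 1) {
        out[0] = out[1] =
            (unsigned char)((3 * row_near[0] + row_far[0] + 2) >> 2);
        return;
    }
    t1 = 3 * row_near[0] + row_far[0]; /* vertical pass, weight 4 */
    out[0] = (unsigned char)((t1 + 2) >> 2);
    for (i = 1; i < w; ++i) {
        t0 = t1;
        t1 = 3 * row_near[i] + row_far[i];
        /* horizontal pass on top of vertical: total weight 16 */
        out[2 * i - 1] = (unsigned char)((3 * t0 + t1 + 8) >> 4);
        out[2 * i] = (unsigned char)((3 * t1 + t0 + 8) >> 4);
    }
    out[2 * w - 1] = (unsigned char)((t1 + 2) >> 2);
}
```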
4382-
4383-#if defined(STBI_SSE2) || defined(STBI_NEON)
4384-static stbi_uc *
4385-stbi__resample_row_hv_2_simd(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far,
4386- int w, int hs)
4387-{
4388- // need to generate 2x2 samples for every one in input
4389- int i = 0, t0, t1;
4390-
4391- if (w == 1) {
4392- out[0] = out[1] = stbi__div4(3 * in_near[0] + in_far[0] + 2);
4393- return out;
4394- }
4395-
4396- t1 = 3 * in_near[0] + in_far[0];
4397- // process groups of 8 pixels for as long as we can.
4398- // note we can't handle the last pixel in a row in this loop
4399- // because we need to handle the filter boundary conditions.
4400- for (; i < ((w - 1) & ~7); i += 8) {
4401-#if defined(STBI_SSE2)
4402- // load and perform the vertical filtering pass
4403- // this uses 3*x + y = 4*x + (y - x)
4404- __m128i zero = _mm_setzero_si128();
4405- __m128i farb = _mm_loadl_epi64((__m128i *)(in_far + i));
4406- __m128i nearb = _mm_loadl_epi64((__m128i *)(in_near + i));
4407- __m128i farw = _mm_unpacklo_epi8(farb, zero);
4408- __m128i nearw = _mm_unpacklo_epi8(nearb, zero);
4409- __m128i diff = _mm_sub_epi16(farw, nearw);
4410- __m128i nears = _mm_slli_epi16(nearw, 2);
4411- __m128i curr = _mm_add_epi16(nears, diff); // current row
4412-
4413- // horizontal filter works the same based on shifted vers of current
4414- // row. "prev" is current row shifted right by 1 pixel; we need to
4415- // insert the previous pixel value (from t1).
4416- // "next" is current row shifted left by 1 pixel, with first pixel
4417- // of next block of 8 pixels added in.
4418- __m128i prv0 = _mm_slli_si128(curr, 2);
4419- __m128i nxt0 = _mm_srli_si128(curr, 2);
4420- __m128i prev = _mm_insert_epi16(prv0, t1, 0);
4421- __m128i next =
4422- _mm_insert_epi16(nxt0, 3 * in_near[i + 8] + in_far[i + 8], 7);
4423-
4424- // horizontal filter, polyphase implementation since it's convenient:
4425- // even pixels = 3*cur + prev = cur*4 + (prev - cur)
4426- // odd pixels = 3*cur + next = cur*4 + (next - cur)
4427- // note the shared term.
4428- __m128i bias = _mm_set1_epi16(8);
4429- __m128i curs = _mm_slli_epi16(curr, 2);
4430- __m128i prvd = _mm_sub_epi16(prev, curr);
4431- __m128i nxtd = _mm_sub_epi16(next, curr);
4432- __m128i curb = _mm_add_epi16(curs, bias);
4433- __m128i even = _mm_add_epi16(prvd, curb);
4434- __m128i odd = _mm_add_epi16(nxtd, curb);
4435-
4436- // interleave even and odd pixels, then undo scaling.
4437- __m128i int0 = _mm_unpacklo_epi16(even, odd);
4438- __m128i int1 = _mm_unpackhi_epi16(even, odd);
4439- __m128i de0 = _mm_srli_epi16(int0, 4);
4440- __m128i de1 = _mm_srli_epi16(int1, 4);
4441-
4442- // pack and write output
4443- __m128i outv = _mm_packus_epi16(de0, de1);
4444- _mm_storeu_si128((__m128i *)(out + i * 2), outv);
4445-#elif defined(STBI_NEON)
4446- // load and perform the vertical filtering pass
4447- // this uses 3*x + y = 4*x + (y - x)
4448- uint8x8_t farb = vld1_u8(in_far + i);
4449- uint8x8_t nearb = vld1_u8(in_near + i);
4450- int16x8_t diff = vreinterpretq_s16_u16(vsubl_u8(farb, nearb));
4451- int16x8_t nears = vreinterpretq_s16_u16(vshll_n_u8(nearb, 2));
4452- int16x8_t curr = vaddq_s16(nears, diff); // current row
4453-
4454- // horizontal filter works the same based on shifted vers of current
4455- // row. "prev" is current row shifted right by 1 pixel; we need to
4456- // insert the previous pixel value (from t1).
4457- // "next" is current row shifted left by 1 pixel, with first pixel
4458- // of next block of 8 pixels added in.
4459- int16x8_t prv0 = vextq_s16(curr, curr, 7);
4460- int16x8_t nxt0 = vextq_s16(curr, curr, 1);
4461- int16x8_t prev = vsetq_lane_s16(t1, prv0, 0);
4462- int16x8_t next =
4463- vsetq_lane_s16(3 * in_near[i + 8] + in_far[i + 8], nxt0, 7);
4464-
4465- // horizontal filter, polyphase implementation since it's convenient:
4466- // even pixels = 3*cur + prev = cur*4 + (prev - cur)
4467- // odd pixels = 3*cur + next = cur*4 + (next - cur)
4468- // note the shared term.
4469- int16x8_t curs = vshlq_n_s16(curr, 2);
4470- int16x8_t prvd = vsubq_s16(prev, curr);
4471- int16x8_t nxtd = vsubq_s16(next, curr);
4472- int16x8_t even = vaddq_s16(curs, prvd);
4473- int16x8_t odd = vaddq_s16(curs, nxtd);
4474-
4475- // undo scaling and round, then store with even/odd phases interleaved
4476- uint8x8x2_t o;
4477- o.val[0] = vqrshrun_n_s16(even, 4);
4478- o.val[1] = vqrshrun_n_s16(odd, 4);
4479- vst2_u8(out + i * 2, o);
4480-#endif
4481-
4482- // "previous" value for next iter
4483- t1 = 3 * in_near[i + 7] + in_far[i + 7];
4484- }
4485-
4486- t0 = t1;
4487- t1 = 3 * in_near[i] + in_far[i];
4488- out[i * 2] = stbi__div16(3 * t1 + t0 + 8);
4489-
4490- for (++i; i < w; ++i) {
4491- t0 = t1;
4492- t1 = 3 * in_near[i] + in_far[i];
4493- out[i * 2 - 1] = stbi__div16(3 * t0 + t1 + 8);
4494- out[i * 2] = stbi__div16(3 * t1 + t0 + 8);
4495- }
4496- out[w * 2 - 1] = stbi__div4(t1 + 2);
4497-
4498- STBI_NOTUSED(hs);
4499-
4500- return out;
4501-}
4502-#endif
4503-
4504-static stbi_uc *
4505-stbi__resample_row_generic(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far,
4506- int w, int hs)
4507-{
4508- // resample with nearest-neighbor
4509- int i, j;
4510- STBI_NOTUSED(in_far);
4511- for (i = 0; i < w; ++i) {
4512- for (j = 0; j < hs; ++j) {
4513- out[i * hs + j] = in_near[i];
4514- }
4515- }
4516- return out;
4517-}
4518-
4519-// this is a reduced-precision calculation of YCbCr-to-RGB introduced
4520-// to make sure the code produces the same results in both SIMD and scalar
4521-#define stbi__float2fixed(x) (((int)((x) * 4096.0f + 0.5f)) << 8)
4522-static void
4523-stbi__YCbCr_to_RGB_row(stbi_uc *out, const stbi_uc *y, const stbi_uc *pcb,
4524- const stbi_uc *pcr, int count, int step)
4525-{
4526- int i;
4527- for (i = 0; i < count; ++i) {
4528- int y_fixed = (y[i] << 20) + (1 << 19); // rounding
4529- int r, g, b;
4530- int cr = pcr[i] - 128;
4531- int cb = pcb[i] - 128;
4532- r = y_fixed + cr * stbi__float2fixed(1.40200f);
4533- g = y_fixed + (cr * -stbi__float2fixed(0.71414f)) +
4534- ((cb * -stbi__float2fixed(0.34414f)) & 0xffff0000);
4535- b = y_fixed + cb * stbi__float2fixed(1.77200f);
4536- r >>= 20;
4537- g >>= 20;
4538- b >>= 20;
4539- if ((unsigned)r > 255) {
4540- if (r < 0) {
4541- r = 0;
4542- } else {
4543- r = 255;
4544- }
4545- }
4546- if ((unsigned)g > 255) {
4547- if (g < 0) {
4548- g = 0;
4549- } else {
4550- g = 255;
4551- }
4552- }
4553- if ((unsigned)b > 255) {
4554- if (b < 0) {
4555- b = 0;
4556- } else {
4557- b = 255;
4558- }
4559- }
4560- out[0] = (stbi_uc)r;
4561- out[1] = (stbi_uc)g;
4562- out[2] = (stbi_uc)b;
4563- out[3] = 255;
4564- out += step;
4565- }
4566-}
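`stbi__float2fixed` rounds each coefficient times 4096 to an integer and shifts it left by 8. Every product therefore carries 20 fractional bits. `y << 20` puts luma on the same scale, and `1 << 19` adds one half for rounding before the final `>> 20`. Below is a single-pixel sketch of that arithmetic, using hypothetical names `FLOAT2FIXED`, `clamp255` and `ycbcr_to_rgb_pixel`. It keeps the `& 0xffff0000` mask on the Cb contribution to green. The comment above describes that mask as part of the reduced-precision calculation shared by the SIMD and scalar paths:

```c
/* Same scaling as stbi__float2fixed: 12 fractional bits, then << 8. */
#define FLOAT2FIXED(x) (((int)((x) * 4096.0f + 0.5f)) << 8)

static unsigned char clamp255(int v)
{
    return (unsigned char)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Hypothetical helper: one YCbCr pixel to RGB with 20-bit fixed point,
 * following the arithmetic of stbi__YCbCr_to_RGB_row. Like the original,
 * it relies on arithmetic right shift of negative ints. */
static void ycbcr_to_rgb_pixel(unsigned char y, unsigned char cb,
                               unsigned char cr, unsigned char rgb[3])
{
    int y_fixed = (y << 20) + (1 << 19); /* luma plus 0.5 for rounding */
    int crs = cr - 128, cbs = cb - 128;  /* center chroma on zero */
    int r = y_fixed + crs * FLOAT2FIXED(1.40200f);
    /* low 16 bits of the Cb term are masked off, as in the listing */
    int g = y_fixed + crs * -FLOAT2FIXED(0.71414f) +
            (int)((unsigned)(cbs * -FLOAT2FIXED(0.34414f)) & 0xffff0000u);
    int b = y_fixed + cbs * FLOAT2FIXED(1.77200f);
    rgb[0] = clamp255(r >> 20);
    rgb[1] = clamp255(g >> 20);
    rgb[2] = clamp255(b >> 20);
}
```

Neutral chroma (`cb == cr == 128`) reproduces the luma value on all three channels. Extreme Cr values saturate red at 0 or 255.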
4567-
4568-#if defined(STBI_SSE2) || defined(STBI_NEON)
4569-static void
4570-stbi__YCbCr_to_RGB_simd(stbi_uc *out, stbi_uc const *y, stbi_uc const *pcb,
4571- stbi_uc const *pcr, int count, int step)
4572-{
4573- int i = 0;
4574-
4575-#ifdef STBI_SSE2
4576- // step == 3 is pretty ugly on the final interleave, and i'm not convinced
4577- // it's useful in practice (you wouldn't use it for textures, for example).
4578- // so just accelerate step == 4 case.
4579- if (step == 4) {
4580- // this is a fairly straightforward implementation and not
4581- // super-optimized.
4582- __m128i signflip = _mm_set1_epi8(-0x80);
4583- __m128i cr_const0 = _mm_set1_epi16((short)(1.40200f * 4096.0f + 0.5f));
4584- __m128i cr_const1 = _mm_set1_epi16(-(short)(0.71414f * 4096.0f + 0.5f));
4585- __m128i cb_const0 = _mm_set1_epi16(-(short)(0.34414f * 4096.0f + 0.5f));
4586- __m128i cb_const1 = _mm_set1_epi16((short)(1.77200f * 4096.0f + 0.5f));
4587- __m128i y_bias = _mm_set1_epi8((char)(unsigned char)128);
4588- __m128i xw = _mm_set1_epi16(255); // alpha channel
4589-
4590- for (; i + 7 < count; i += 8) {
4591- // load
4592- __m128i y_bytes = _mm_loadl_epi64((__m128i *)(y + i));
4593- __m128i cr_bytes = _mm_loadl_epi64((__m128i *)(pcr + i));
4594- __m128i cb_bytes = _mm_loadl_epi64((__m128i *)(pcb + i));
4595- __m128i cr_biased = _mm_xor_si128(cr_bytes, signflip); // -128
4596- __m128i cb_biased = _mm_xor_si128(cb_bytes, signflip); // -128
4597-
4598- // unpack to short (and left-shift cr, cb by 8)
4599- __m128i yw = _mm_unpacklo_epi8(y_bias, y_bytes);
4600- __m128i crw = _mm_unpacklo_epi8(_mm_setzero_si128(), cr_biased);
4601- __m128i cbw = _mm_unpacklo_epi8(_mm_setzero_si128(), cb_biased);
4602-
4603- // color transform
4604- __m128i yws = _mm_srli_epi16(yw, 4);
4605- __m128i cr0 = _mm_mulhi_epi16(cr_const0, crw);
4606- __m128i cb0 = _mm_mulhi_epi16(cb_const0, cbw);
4607- __m128i cb1 = _mm_mulhi_epi16(cbw, cb_const1);
4608- __m128i cr1 = _mm_mulhi_epi16(crw, cr_const1);
4609- __m128i rws = _mm_add_epi16(cr0, yws);
4610- __m128i gwt = _mm_add_epi16(cb0, yws);
4611- __m128i bws = _mm_add_epi16(yws, cb1);
4612- __m128i gws = _mm_add_epi16(gwt, cr1);
4613-
4614- // descale
4615- __m128i rw = _mm_srai_epi16(rws, 4);
4616- __m128i bw = _mm_srai_epi16(bws, 4);
4617- __m128i gw = _mm_srai_epi16(gws, 4);
4618-
4619- // back to byte, set up for transpose
4620- __m128i brb = _mm_packus_epi16(rw, bw);
4621- __m128i gxb = _mm_packus_epi16(gw, xw);
4622-
4623- // transpose to interleave channels
4624- __m128i t0 = _mm_unpacklo_epi8(brb, gxb);
4625- __m128i t1 = _mm_unpackhi_epi8(brb, gxb);
4626- __m128i o0 = _mm_unpacklo_epi16(t0, t1);
4627- __m128i o1 = _mm_unpackhi_epi16(t0, t1);
4628-
4629- // store
4630- _mm_storeu_si128((__m128i *)(out + 0), o0);
4631- _mm_storeu_si128((__m128i *)(out + 16), o1);
4632- out += 32;
4633- }
4634- }
4635-#endif
4636-
4637-#ifdef STBI_NEON
4638- // in this version, step=3 support would be easy to add. but is there
4639- // demand?
4640- if (step == 4) {
4641- // this is a fairly straightforward implementation and not
4642- // super-optimized.
4643- uint8x8_t signflip = vdup_n_u8(0x80);
4644- int16x8_t cr_const0 = vdupq_n_s16((short)(1.40200f * 4096.0f + 0.5f));
4645- int16x8_t cr_const1 = vdupq_n_s16(-(short)(0.71414f * 4096.0f + 0.5f));
4646- int16x8_t cb_const0 = vdupq_n_s16(-(short)(0.34414f * 4096.0f + 0.5f));
4647- int16x8_t cb_const1 = vdupq_n_s16((short)(1.77200f * 4096.0f + 0.5f));
4648-
4649- for (; i + 7 < count; i += 8) {
4650- // load
4651- uint8x8_t y_bytes = vld1_u8(y + i);
4652- uint8x8_t cr_bytes = vld1_u8(pcr + i);
4653- uint8x8_t cb_bytes = vld1_u8(pcb + i);
4654- int8x8_t cr_biased =
4655- vreinterpret_s8_u8(vsub_u8(cr_bytes, signflip));
4656- int8x8_t cb_biased =
4657- vreinterpret_s8_u8(vsub_u8(cb_bytes, signflip));
4658-
4659- // expand to s16
4660- int16x8_t yws = vreinterpretq_s16_u16(vshll_n_u8(y_bytes, 4));
4661- int16x8_t crw = vshll_n_s8(cr_biased, 7);
4662- int16x8_t cbw = vshll_n_s8(cb_biased, 7);
4663-
4664- // color transform
4665- int16x8_t cr0 = vqdmulhq_s16(crw, cr_const0);
4666- int16x8_t cb0 = vqdmulhq_s16(cbw, cb_const0);
4667- int16x8_t cr1 = vqdmulhq_s16(crw, cr_const1);
4668- int16x8_t cb1 = vqdmulhq_s16(cbw, cb_const1);
4669- int16x8_t rws = vaddq_s16(yws, cr0);
4670- int16x8_t gws = vaddq_s16(vaddq_s16(yws, cb0), cr1);
4671- int16x8_t bws = vaddq_s16(yws, cb1);
4672-
4673- // undo scaling, round, convert to byte
4674- uint8x8x4_t o;
4675- o.val[0] = vqrshrun_n_s16(rws, 4);
4676- o.val[1] = vqrshrun_n_s16(gws, 4);
4677- o.val[2] = vqrshrun_n_s16(bws, 4);
4678- o.val[3] = vdup_n_u8(255);
4679-
4680- // store, interleaving r/g/b/a
4681- vst4_u8(out, o);
4682- out += 8 * 4;
4683- }
4684- }
4685-#endif
4686-
4687- for (; i < count; ++i) {
4688- int y_fixed = (y[i] << 20) + (1 << 19); // rounding
4689- int r, g, b;
4690- int cr = pcr[i] - 128;
4691- int cb = pcb[i] - 128;
4692- r = y_fixed + cr * stbi__float2fixed(1.40200f);
4693- g = y_fixed + cr * -stbi__float2fixed(0.71414f) +
4694- ((cb * -stbi__float2fixed(0.34414f)) & 0xffff0000);
4695- b = y_fixed + cb * stbi__float2fixed(1.77200f);
4696- r >>= 20;
4697- g >>= 20;
4698- b >>= 20;
4699- if ((unsigned)r > 255) {
4700- if (r < 0) {
4701- r = 0;
4702- } else {
4703- r = 255;
4704- }
4705- }
4706- if ((unsigned)g > 255) {
4707- if (g < 0) {
4708- g = 0;
4709- } else {
4710- g = 255;
4711- }
4712- }
4713- if ((unsigned)b > 255) {
4714- if (b < 0) {
4715- b = 0;
4716- } else {
4717- b = 255;
4718- }
4719- }
4720- out[0] = (stbi_uc)r;
4721- out[1] = (stbi_uc)g;
4722- out[2] = (stbi_uc)b;
4723- out[3] = 255;
4724- out += step;
4725- }
4726-}
4727-#endif
4728-
4729-// set up the kernels
4730-static void
4731-stbi__setup_jpeg(stbi__jpeg *j)
4732-{
4733- j->idct_block_kernel = stbi__idct_block;
4734- j->YCbCr_to_RGB_kernel = stbi__YCbCr_to_RGB_row;
4735- j->resample_row_hv_2_kernel = stbi__resample_row_hv_2;
4736-
4737-#ifdef STBI_SSE2
4738- if (stbi__sse2_available()) {
4739- j->idct_block_kernel = stbi__idct_simd;
4740- j->YCbCr_to_RGB_kernel = stbi__YCbCr_to_RGB_simd;
4741- j->resample_row_hv_2_kernel = stbi__resample_row_hv_2_simd;
4742- }
4743-#endif
4744-
4745-#ifdef STBI_NEON
4746- j->idct_block_kernel = stbi__idct_simd;
4747- j->YCbCr_to_RGB_kernel = stbi__YCbCr_to_RGB_simd;
4748- j->resample_row_hv_2_kernel = stbi__resample_row_hv_2_simd;
4749-#endif
4750-}
4751-
4752-// clean up the temporary component buffers
4753-static void
4754-stbi__cleanup_jpeg(stbi__jpeg *j)
4755-{
4756- stbi__free_jpeg_components(j, j->s->img_n, 0);
4757-}
4758-
4759-typedef struct {
4760- resample_row_func resample;
4761- stbi_uc *line0, *line1;
4762- int hs, vs; // expansion factor in each axis
4763- int w_lores; // horizontal pixels pre-expansion
4764- int ystep; // how far through vertical expansion we are
4765- int ypos; // which pre-expansion row we're on
4766-} stbi__resample;
4767-
4768-// fast 0..255 * 0..255 => 0..255 rounded multiplication
4769-static stbi_uc
4770-stbi__blinn_8x8(stbi_uc x, stbi_uc y)
4771-{
4772- unsigned int t = x * y + 128;
4773- return (stbi_uc)((t + (t >> 8)) >> 8);
4774-}
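`stbi__blinn_8x8` computes `x * y / 255` rounded to nearest without a division. It adds 128, then folds the high byte back in before the final `>> 8`. Below is a brute-force sketch (hypothetical helper names) that compares the same formula against a division-based reference over all 65,536 byte pairs. A product of two bytes divided by 255 can never land exactly on a .5 tie, so the reference `(2p + 255) / 510` is an unambiguous round-to-nearest:

```c
/* same formula as stbi__blinn_8x8 */
static unsigned char blinn_mul(unsigned char x, unsigned char y)
{
    unsigned int t = (unsigned int)x * y + 128;
    return (unsigned char)((t + (t >> 8)) >> 8);
}

/* reference: x*y/255 rounded to nearest via integer division */
static unsigned char ref_mul(unsigned char x, unsigned char y)
{
    unsigned int p = (unsigned int)x * y;
    return (unsigned char)((2 * p + 255) / 510);
}

/* number of (x, y) pairs on which the two disagree */
static int count_blinn_mismatches(void)
{
    int x, y, bad = 0;
    for (x = 0; x < 256; ++x)
        for (y = 0; y < 256; ++y)
            if (blinn_mul((unsigned char)x, (unsigned char)y) !=
                ref_mul((unsigned char)x, (unsigned char)y))
                ++bad;
    return bad;
}
```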
4775-
4776-static stbi_uc *
4777-load_jpeg_image(stbi__jpeg *z, int *out_x, int *out_y, int *comp, int req_comp)
4778-{
4779- int n, decode_n, is_rgb;
4780- z->s->img_n = 0; // make stbi__cleanup_jpeg safe
4781-
4782- // validate req_comp
4783- if (req_comp < 0 || req_comp > 4) {
4784- return stbi__errpuc("bad req_comp", "Internal error");
4785- }
4786-
4787- // load a jpeg image from whichever source, but leave in YCbCr format
4788- if (!stbi__decode_jpeg_image(z)) {
4789- stbi__cleanup_jpeg(z);
4790- return NULL;
4791- }
4792-
4793- // determine actual number of components to generate
4794- n = req_comp ? req_comp : z->s->img_n >= 3 ? 3 : 1;
4795-
4796- is_rgb = z->s->img_n == 3 &&
4797- (z->rgb == 3 || (z->app14_color_transform == 0 && !z->jfif));
4798-
4799- if (z->s->img_n == 3 && n < 3 && !is_rgb) {
4800- decode_n = 1;
4801- } else {
4802- decode_n = z->s->img_n;
4803- }
4804-
4805- // nothing to do if no components requested; check this now to avoid
4806- // accessing uninitialized coutput[0] later
4807- if (decode_n <= 0) {
4808- stbi__cleanup_jpeg(z);
4809- return NULL;
4810- }
4811-
4812- // resample and color-convert
4813- {
4814- int k;
4815- unsigned int i, j;
4816- stbi_uc *output;
4817- stbi_uc *coutput[4] = {NULL, NULL, NULL, NULL};
4818-
4819- stbi__resample res_comp[4];
4820-
4821- for (k = 0; k < decode_n; ++k) {
4822- stbi__resample *r = &res_comp[k];
4823-
4824- // allocate line buffer big enough for upsampling off the edges
4825- // with upsample factor of 4
4826- z->img_comp[k].linebuf = (stbi_uc *)stbi__malloc(z->s->img_x + 3);
4827- if (!z->img_comp[k].linebuf) {
4828- stbi__cleanup_jpeg(z);
4829- return stbi__errpuc("outofmem", "Out of memory");
4830- }
4831-
4832- r->hs = z->img_h_max / z->img_comp[k].h;
4833- r->vs = z->img_v_max / z->img_comp[k].v;
4834- r->ystep = r->vs >> 1;
4835- r->w_lores = (z->s->img_x + r->hs - 1) / r->hs;
4836- r->ypos = 0;
4837- r->line0 = r->line1 = z->img_comp[k].data;
4838-
4839- if (r->hs == 1 && r->vs == 1) {
4840- r->resample = resample_row_1;
4841- } else if (r->hs == 1 && r->vs == 2) {
4842- r->resample = stbi__resample_row_v_2;
4843- } else if (r->hs == 2 && r->vs == 1) {
4844- r->resample = stbi__resample_row_h_2;
4845- } else if (r->hs == 2 && r->vs == 2) {
4846- r->resample = z->resample_row_hv_2_kernel;
4847- } else {
4848- r->resample = stbi__resample_row_generic;
4849- }
4850- }
4851-
4852- // can't error after this so, this is safe
4853- output = (stbi_uc *)stbi__malloc_mad3(n, z->s->img_x, z->s->img_y, 1);
4854- if (!output) {
4855- stbi__cleanup_jpeg(z);
4856- return stbi__errpuc("outofmem", "Out of memory");
4857- }
4858-
4859- // now go ahead and resample
4860- for (j = 0; j < z->s->img_y; ++j) {
4861- stbi_uc *out = output + n * z->s->img_x * j;
4862- for (k = 0; k < decode_n; ++k) {
4863- stbi__resample *r = &res_comp[k];
4864- int y_bot = r->ystep >= (r->vs >> 1);
4865- coutput[k] = r->resample(
4866- z->img_comp[k].linebuf, y_bot ? r->line1 : r->line0,
4867- y_bot ? r->line0 : r->line1, r->w_lores, r->hs);
4868- if (++r->ystep >= r->vs) {
4869- r->ystep = 0;
4870- r->line0 = r->line1;
4871- if (++r->ypos < z->img_comp[k].y) {
4872- r->line1 += z->img_comp[k].w2;
4873- }
4874- }
4875- }
4876- if (n >= 3) {
4877- stbi_uc *y = coutput[0];
4878- if (z->s->img_n == 3) {
4879- if (is_rgb) {
4880- for (i = 0; i < z->s->img_x; ++i) {
4881- out[0] = y[i];
4882- out[1] = coutput[1][i];
4883- out[2] = coutput[2][i];
4884- out[3] = 255;
4885- out += n;
4886- }
4887- } else {
4888- z->YCbCr_to_RGB_kernel(out, y, coutput[1], coutput[2],
4889- z->s->img_x, n);
4890- }
4891- } else if (z->s->img_n == 4) {
4892- if (z->app14_color_transform == 0) { // CMYK
4893- for (i = 0; i < z->s->img_x; ++i) {
4894- stbi_uc m = coutput[3][i];
4895- out[0] = stbi__blinn_8x8(coutput[0][i], m);
4896- out[1] = stbi__blinn_8x8(coutput[1][i], m);
4897- out[2] = stbi__blinn_8x8(coutput[2][i], m);
4898- out[3] = 255;
4899- out += n;
4900- }
4901- } else if (z->app14_color_transform == 2) { // YCCK
4902- z->YCbCr_to_RGB_kernel(out, y, coutput[1], coutput[2],
4903- z->s->img_x, n);
4904- for (i = 0; i < z->s->img_x; ++i) {
4905- stbi_uc m = coutput[3][i];
4906- out[0] = stbi__blinn_8x8(255 - out[0], m);
4907- out[1] = stbi__blinn_8x8(255 - out[1], m);
4908- out[2] = stbi__blinn_8x8(255 - out[2], m);
4909- out += n;
4910- }
4911- } else { // YCbCr + alpha? Ignore the fourth channel for
4912- // now
4913- z->YCbCr_to_RGB_kernel(out, y, coutput[1], coutput[2],
4914- z->s->img_x, n);
4915- }
4916- } else {
4917- for (i = 0; i < z->s->img_x; ++i) {
4918- out[0] = out[1] = out[2] = y[i];
4919- out[3] = 255; // not used if n==3
4920- out += n;
4921- }
4922- }
4923- } else {
4924- if (is_rgb) {
4925- if (n == 1) {
4926- for (i = 0; i < z->s->img_x; ++i) {
4927- *out++ = stbi__compute_y(
4928- coutput[0][i], coutput[1][i], coutput[2][i]);
4929- }
4930- } else {
4931- for (i = 0; i < z->s->img_x; ++i, out += 2) {
4932- out[0] = stbi__compute_y(
4933- coutput[0][i], coutput[1][i], coutput[2][i]);
4934- out[1] = 255;
4935- }
4936- }
4937- } else if (z->s->img_n == 4 && z->app14_color_transform == 0) {
4938- for (i = 0; i < z->s->img_x; ++i) {
4939- stbi_uc m = coutput[3][i];
4940- stbi_uc r = stbi__blinn_8x8(coutput[0][i], m);
4941- stbi_uc g = stbi__blinn_8x8(coutput[1][i], m);
4942- stbi_uc b = stbi__blinn_8x8(coutput[2][i], m);
4943- out[0] = stbi__compute_y(r, g, b);
4944- out[1] = 255;
4945- out += n;
4946- }
4947- } else if (z->s->img_n == 4 && z->app14_color_transform == 2) {
4948- for (i = 0; i < z->s->img_x; ++i) {
4949- out[0] =
4950- stbi__blinn_8x8(255 - coutput[0][i], coutput[3][i]);
4951- out[1] = 255;
4952- out += n;
4953- }
4954- } else {
4955- stbi_uc *y = coutput[0];
4956- if (n == 1) {
4957- for (i = 0; i < z->s->img_x; ++i) {
4958- out[i] = y[i];
4959- }
4960- } else {
4961- for (i = 0; i < z->s->img_x; ++i) {
4962- *out++ = y[i];
4963- *out++ = 255;
4964- }
4965- }
4966- }
4967- }
4968- }
4969- stbi__cleanup_jpeg(z);
4970- *out_x = z->s->img_x;
4971- *out_y = z->s->img_y;
4972- if (comp) {
4973- *comp = z->s->img_n >= 3
4974- ? 3
4975- : 1; // report original components, not output
4976- }
4977- return output;
4978- }
4979-}
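`load_jpeg_image` chooses two counts up front: the output channel count `n`, and the number of planes actually resampled, `decode_n`. A 3-component YCbCr image requested as grey or grey+alpha needs only its Y plane. An image judged to be stored as RGB must decode all three planes so that `stbi__compute_y` can derive luma. Below is a sketch of that decision as a hypothetical `choose_components` helper. Its parameters stand in for the `stbi__jpeg` fields named in the comments:

```c
/* Hypothetical helper reproducing the component selection above.
 * img_n: components in the file; req_comp: caller request (0 = keep);
 * rgb_ids: count of component ids equal to 'R','G','B' (z->rgb);
 * app14: APP14 color transform (z->app14_color_transform, -1 if absent);
 * jfif: nonzero if a JFIF header was seen (z->jfif). */
static void choose_components(int img_n, int req_comp, int rgb_ids,
                              int app14, int jfif, int *n, int *decode_n)
{
    int is_rgb = img_n == 3 && (rgb_ids == 3 || (app14 == 0 && !jfif));
    *n = req_comp ? req_comp : (img_n >= 3 ? 3 : 1);
    if (img_n == 3 && *n < 3 && !is_rgb)
        *decode_n = 1; /* luma plane alone suffices for grey output */
    else
        *decode_n = img_n;
}
```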
4980-
4981-static void *
4982-stbi__jpeg_load(stbi__context *s, int *x, int *y, int *comp, int req_comp,
4983- stbi__result_info *ri)
4984-{
4985- unsigned char *result;
4986- stbi__jpeg *j = (stbi__jpeg *)stbi__malloc(sizeof(stbi__jpeg));
4987- if (!j) {
4988- return stbi__errpuc("outofmem", "Out of memory");
4989- }
4990- memset(j, 0, sizeof(stbi__jpeg));
4991- STBI_NOTUSED(ri);
4992- j->s = s;
4993- stbi__setup_jpeg(j);
4994- result = load_jpeg_image(j, x, y, comp, req_comp);
4995- STBI_FREE(j);
4996- return result;
4997-}
4998-
4999-static int
5000-stbi__jpeg_test(stbi__context *s)
5001-{
5002- int r;
5003- stbi__jpeg *j = (stbi__jpeg *)stbi__malloc(sizeof(stbi__jpeg));
5004- if (!j) {
5005- return stbi__err("outofmem", "Out of memory");
5006- }
5007- memset(j, 0, sizeof(stbi__jpeg));
5008- j->s = s;
5009- stbi__setup_jpeg(j);
5010- r = stbi__decode_jpeg_header(j, STBI__SCAN_type);
5011- stbi__rewind(s);
5012- STBI_FREE(j);
5013- return r;
5014-}
5015-
5016-static int
5017-stbi__jpeg_info_raw(stbi__jpeg *j, int *x, int *y, int *comp)
5018-{
5019- if (!stbi__decode_jpeg_header(j, STBI__SCAN_header)) {
5020- stbi__rewind(j->s);
5021- return 0;
5022- }
5023- if (x) {
5024- *x = j->s->img_x;
5025- }
5026- if (y) {
5027- *y = j->s->img_y;
5028- }
5029- if (comp) {
5030- *comp = j->s->img_n >= 3 ? 3 : 1;
5031- }
5032- return 1;
5033-}
5034-
5035-static int
5036-stbi__jpeg_info(stbi__context *s, int *x, int *y, int *comp)
5037-{
5038- int result;
5039- stbi__jpeg *j = (stbi__jpeg *)(stbi__malloc(sizeof(stbi__jpeg)));
5040- if (!j) {
5041- return stbi__err("outofmem", "Out of memory");
5042- }
5043- memset(j, 0, sizeof(stbi__jpeg));
5044- j->s = s;
5045- result = stbi__jpeg_info_raw(j, x, y, comp);
5046- STBI_FREE(j);
5047- return result;
5048-}
5049-#endif
5050-
5051-// public domain zlib decode v0.2 Sean Barrett 2006-11-18
5052-// simple implementation
5053-// - all input must be provided in an upfront buffer
5054-// - all output is written to a single output buffer (can malloc/realloc)
5055-// performance
5056-// - fast huffman
5057-
5058-#ifndef STBI_NO_ZLIB
5059-
5060-// fast-way is faster to check than jpeg huffman, but slow way is slower
5061-#define STBI__ZFAST_BITS 9 // accelerate all cases in default tables
5062-#define STBI__ZFAST_MASK ((1 << STBI__ZFAST_BITS) - 1)
5063-#define STBI__ZNSYMS 288 // number of symbols in literal/length alphabet
5064-
5065-// zlib-style huffman encoding
5066-// (jpegs packs from left, zlib from right, so can't share code)
5067-typedef struct {
5068- stbi__uint16 fast[1 << STBI__ZFAST_BITS];
5069- stbi__uint16 firstcode[16];
5070- int maxcode[17];
5071- stbi__uint16 firstsymbol[16];
5072- stbi_uc size[STBI__ZNSYMS];
5073- stbi__uint16 value[STBI__ZNSYMS];
5074-} stbi__zhuffman;
5075-
5076-stbi_inline static int
5077-stbi__bitreverse16(int n)
5078-{
5079- n = ((n & 0xAAAA) >> 1) | ((n & 0x5555) << 1);
5080- n = ((n & 0xCCCC) >> 2) | ((n & 0x3333) << 2);
5081- n = ((n & 0xF0F0) >> 4) | ((n & 0x0F0F) << 4);
5082- n = ((n & 0xFF00) >> 8) | ((n & 0x00FF) << 8);
5083- return n;
5084-}
5085-
5086-stbi_inline static int
5087-stbi__bit_reverse(int v, int bits)
5088-{
5089- STBI_ASSERT(bits <= 16);
5090- // to bit reverse n bits, reverse 16 and shift
5091- // e.g. 11 bits, bit reverse and shift away 5
5092- return stbi__bitreverse16(v) >> (16 - bits);
5093-}
5094-
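stbi__bitreverse16 reverses 16 bits with a fixed ladder of mask-and-shift swaps (adjacent bits, then pairs, nibbles and bytes), and stbi__bit_reverse narrows that to n bits by shifting away the unused low positions. Below is a minimal standalone sketch of the same idea, with a naive loop alongside as a cross-check. The function names are hypothetical.

```c
#include <assert.h>

/* Swap adjacent bits, then bit pairs, nibbles and bytes: after four
   passes the low 16 bits of n are fully reversed. Bits above bit 15
   are discarded by the masks. */
int reverse16(int n)
{
    n = ((n & 0xAAAA) >> 1) | ((n & 0x5555) << 1);
    n = ((n & 0xCCCC) >> 2) | ((n & 0x3333) << 2);
    n = ((n & 0xF0F0) >> 4) | ((n & 0x0F0F) << 4);
    n = ((n & 0xFF00) >> 8) | ((n & 0x00FF) << 8);
    return n;
}

/* Reverse only the low `bits` bits of v (v < 2^bits): reverse all 16,
   then shift away the (16 - bits) positions that are now zero. */
int reverse_bits(int v, int bits)
{
    assert(bits >= 1 && bits <= 16);
    return reverse16(v) >> (16 - bits);
}

/* Reference version: move bits one at a time from the bottom of v
   to the bottom of r, so the first bit taken ends up on top. */
int reverse_bits_naive(int v, int bits)
{
    int r = 0;
    for (int i = 0; i < bits; ++i) {
        r = (r << 1) | ((v >> i) & 1);
    }
    return r;
}
```

The reversal is needed because DEFLATE packs Huffman codes starting from the code's most significant bit, while the zlib bit buffer is filled least-significant-bit first.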
5095-static int
5096-stbi__zbuild_huffman(stbi__zhuffman *z, const stbi_uc *sizelist, int num)
5097-{
5098- int i, k = 0;
5099- int code, next_code[16], sizes[17];
5100-
5101- // DEFLATE spec for generating codes
5102- memset(sizes, 0, sizeof(sizes));
5103- memset(z->fast, 0, sizeof(z->fast));
5104- for (i = 0; i < num; ++i) {
5105- ++sizes[sizelist[i]];
5106- }
5107- sizes[0] = 0;
5108- for (i = 1; i < 16; ++i) {
5109- if (sizes[i] > (1 << i)) {
5110- return stbi__err("bad sizes", "Corrupt PNG");
5111- }
5112- }
5113- code = 0;
5114- for (i = 1; i < 16; ++i) {
5115- next_code[i] = code;
5116- z->firstcode[i] = (stbi__uint16)code;
5117- z->firstsymbol[i] = (stbi__uint16)k;
5118- code = (code + sizes[i]);
5119- if (sizes[i]) {
5120- if (code - 1 >= (1 << i)) {
5121- return stbi__err("bad codelengths", "Corrupt PNG");
5122- }
5123- }
5124- z->maxcode[i] = code << (16 - i); // preshift for inner loop
5125- code <<= 1;
5126- k += sizes[i];
5127- }
5128- z->maxcode[16] = 0x10000; // sentinel
5129- for (i = 0; i < num; ++i) {
5130- int s = sizelist[i];
5131- if (s) {
5132- int c = next_code[s] - z->firstcode[s] + z->firstsymbol[s];
5133- stbi__uint16 fastv = (stbi__uint16)((s << 9) | i);
5134- z->size[c] = (stbi_uc)s;
5135- z->value[c] = (stbi__uint16)i;
5136- if (s <= STBI__ZFAST_BITS) {
5137- int j = stbi__bit_reverse(next_code[s], s);
5138- while (j < (1 << STBI__ZFAST_BITS)) {
5139- z->fast[j] = fastv;
5140- j += (1 << s);
5141- }
5142- }
5143- ++next_code[s];
5144- }
5145- }
5146- return 1;
5147-}
5148-
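stbi__zbuild_huffman follows the DEFLATE rule for turning a list of code lengths into canonical codes: count the codes of each length, compute the first code of each length, then hand out consecutive codes in symbol order. The sketch below spells out those three steps as RFC 1951 section 3.2.2 describes them. It returns plain MSB-first code values rather than building stb's fast and slow lookup tables. `canonical_codes` and `MAX_BITS` are illustrative names, and the over-subscription test is a simplified stand-in for stb's "bad sizes" / "bad codelengths" checks.

```c
#include <assert.h>
#include <string.h>

#define MAX_BITS 15 /* longest code length DEFLATE allows */

/* Assign canonical Huffman codes from code lengths (RFC 1951 3.2.2).
   lengths[i] == 0 means symbol i is unused; its code is set to 0.
   Returns 0 if the lengths over-subscribe the code space, else 1. */
int canonical_codes(const unsigned char *lengths, int num, int *codes)
{
    int bl_count[MAX_BITS + 1];
    int next_code[MAX_BITS + 1];
    int code = 0;

    /* Step 1: count how many symbols use each code length. */
    memset(bl_count, 0, sizeof(bl_count));
    for (int i = 0; i < num; ++i) {
        assert(lengths[i] <= MAX_BITS);
        bl_count[lengths[i]]++;
    }
    bl_count[0] = 0;

    /* Step 2: smallest code value for each length. */
    next_code[0] = 0;
    for (int bits = 1; bits <= MAX_BITS; ++bits) {
        code = (code + bl_count[bits - 1]) << 1;
        if (bl_count[bits] && code + bl_count[bits] > (1 << bits)) {
            return 0; /* more codes of this length than fit */
        }
        next_code[bits] = code;
    }

    /* Step 3: consecutive codes for symbols of equal length,
       in increasing symbol order. */
    for (int i = 0; i < num; ++i) {
        codes[i] = 0;
        if (lengths[i] != 0) {
            codes[i] = next_code[lengths[i]]++;
        }
    }
    return 1;
}
```

With the lengths (3, 3, 3, 3, 3, 2, 4, 4) from the RFC's own example, this yields the codes 010, 011, 100, 101, 110, 00, 1110 and 1111.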
5149-// zlib-from-memory implementation for PNG reading
5150-// because PNG allows splitting the zlib stream arbitrarily,
5151-// and it's annoying structurally to have PNG call ZLIB call PNG,
5152-// we require PNG read all the IDATs and combine them into a single
5153-// memory buffer
5154-
5155-typedef struct {
5156- stbi_uc *zbuffer, *zbuffer_end;
5157- int num_bits;
5158- int hit_zeof_once;
5159- stbi__uint32 code_buffer;
5160-
5161- char *zout;
5162- char *zout_start;
5163- char *zout_end;
5164- int z_expandable;
5165-
5166- stbi__zhuffman z_length, z_distance;
5167-} stbi__zbuf;
5168-
5169-stbi_inline static int
5170-stbi__zeof(stbi__zbuf *z)
5171-{
5172- return (z->zbuffer >= z->zbuffer_end);
5173-}
5174-
5175-stbi_inline static stbi_uc
5176-stbi__zget8(stbi__zbuf *z)
5177-{
5178- return stbi__zeof(z) ? 0 : *z->zbuffer++;
5179-}
5180-
5181-static void
5182-stbi__fill_bits(stbi__zbuf *z)
5183-{
5184- do {
5185- if (z->code_buffer >= (1U << z->num_bits)) {
5186- z->zbuffer = z->zbuffer_end; /* treat this as EOF so we fail. */
5187- return;
5188- }
5189- z->code_buffer |= (unsigned int)stbi__zget8(z) << z->num_bits;
5190- z->num_bits += 8;
5191- } while (z->num_bits <= 24);
5192-}
5193-
5194-stbi_inline static unsigned int
5195-stbi__zreceive(stbi__zbuf *z, int n)
5196-{
5197- unsigned int k;
5198- if (z->num_bits < n) {
5199- stbi__fill_bits(z);
5200- }
5201- k = z->code_buffer & ((1 << n) - 1);
5202- z->code_buffer >>= n;
5203- z->num_bits -= n;
5204- return k;
5205-}
5206-
5207-static int
5208-stbi__zhuffman_decode_slowpath(stbi__zbuf *a, stbi__zhuffman *z)
5209-{
5210- int b, s, k;
5211- // not resolved by fast table, so compute it the slow way
5212- // use jpeg approach, which requires MSbits at top
5213- k = stbi__bit_reverse(a->code_buffer, 16);
5214- for (s = STBI__ZFAST_BITS + 1;; ++s) {
5215- if (k < z->maxcode[s]) {
5216- break;
5217- }
5218- }
5219- if (s >= 16) {
5220- return -1; // invalid code!
5221- }
5222- // code size is s, so:
5223- b = (k >> (16 - s)) - z->firstcode[s] + z->firstsymbol[s];
5224- if (b >= STBI__ZNSYMS) {
5225- return -1; // some data was corrupt somewhere!
5226- }
5227- if (z->size[b] != s) {
5228- return -1; // was originally an assert, but report failure instead.
5229- }
5230- a->code_buffer >>= s;
5231- a->num_bits -= s;
5232- return z->value[b];
5233-}
5234-
5235-stbi_inline static int
5236-stbi__zhuffman_decode(stbi__zbuf *a, stbi__zhuffman *z)
5237-{
5238- int b, s;
5239- if (a->num_bits < 16) {
5240- if (stbi__zeof(a)) {
5241- if (!a->hit_zeof_once) {
5242- // This is the first time we hit eof, insert 16 extra padding
5243- // bits to allow us to keep going; if we actually consume any of
5244- // them though, that is invalid data. This is caught later.
5245- a->hit_zeof_once = 1;
5246- a->num_bits += 16; // add 16 implicit zero bits
5247- } else {
5248- // We already inserted our extra 16 padding bits and are again
5249- // out, this stream is actually prematurely terminated.
5250- return -1;
5251- }
5252- } else {
5253- stbi__fill_bits(a);
5254- }
5255- }
5256- b = z->fast[a->code_buffer & STBI__ZFAST_MASK];
5257- if (b) {
5258- s = b >> 9;
5259- a->code_buffer >>= s;
5260- a->num_bits -= s;
5261- return b & 511;
5262- }
5263- return stbi__zhuffman_decode_slowpath(a, z);
5264-}
5265-
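stbi__fill_bits and stbi__zreceive implement an LSB-first bit buffer: each new byte is ORed in above the bits already held, and fields are taken from the bottom. Here is a stripped-down sketch of that mechanism, assuming a hypothetical `bitreader` struct. Unlike stb, it makes no attempt to detect reads past the end of input; it simply supplies zero bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* LSB-first bit reader in the DEFLATE style. */
typedef struct {
    const uint8_t *p, *end;
    uint32_t buf; /* pending bits; the next stream bit is bit 0 */
    int count;    /* number of valid bits in buf */
} bitreader;

void br_init(bitreader *br, const uint8_t *data, size_t len)
{
    br->p = data;
    br->end = data + len;
    br->buf = 0;
    br->count = 0;
}

/* Return the next n bits (1 <= n <= 24); bit 0 of the result is the
   first bit read. Past the end of input, zero bytes are supplied. */
uint32_t br_get(bitreader *br, int n)
{
    uint32_t v;
    assert(n >= 1 && n <= 24);
    while (br->count < n) {
        uint32_t byte = (br->p < br->end) ? *br->p++ : 0;
        br->buf |= byte << br->count; /* append above existing bits */
        br->count += 8;
    }
    v = br->buf & ((1u << n) - 1);
    br->buf >>= n;
    br->count -= n;
    return v;
}
```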
5266-static int
5267-stbi__zexpand(stbi__zbuf *z, char *zout, int n) // need to make room for n bytes
5268-{
5269- char *q;
5270- unsigned int cur, limit, old_limit;
5271- z->zout = zout;
5272- if (!z->z_expandable) {
5273- return stbi__err("output buffer limit", "Corrupt PNG");
5274- }
5275- cur = (unsigned int)(z->zout - z->zout_start);
5276- limit = old_limit = (unsigned)(z->zout_end - z->zout_start);
5277- if (UINT_MAX - cur < (unsigned)n) {
5278- return stbi__err("outofmem", "Out of memory");
5279- }
5280- while (cur + n > limit) {
5281- if (limit > UINT_MAX / 2) {
5282- return stbi__err("outofmem", "Out of memory");
5283- }
5284- limit *= 2;
5285- }
5286- q = (char *)STBI_REALLOC_SIZED(z->zout_start, old_limit, limit);
5287- STBI_NOTUSED(old_limit);
5288- if (q == NULL) {
5289- return stbi__err("outofmem", "Out of memory");
5290- }
5291- z->zout_start = q;
5292- z->zout = q + cur;
5293- z->zout_end = q + limit;
5294- return 1;
5295-}
5296-
5297-static const int stbi__zlength_base[31] = {
5298- 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 15, 17, 19, 23, 27, 31,
5299- 35, 43, 51, 59, 67, 83, 99, 115, 131, 163, 195, 227, 258, 0, 0};
5300-
5301-static const int stbi__zlength_extra[31] = {0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1,
5302- 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4,
5303- 4, 4, 5, 5, 5, 5, 0, 0, 0};
5304-
5305-static const int stbi__zdist_base[32] = {
5306- 1, 2, 3, 4, 5, 7, 9, 13, 17, 25, 33,
5307- 49, 65, 97, 129, 193, 257, 385, 513, 769, 1025, 1537,
5308- 2049, 3073, 4097, 6145, 8193, 12289, 16385, 24577, 0, 0};
5309-
5310-static const int stbi__zdist_extra[32] = {0, 0, 0, 0, 1, 1, 2, 2, 3, 3,
5311- 4, 4, 5, 5, 6, 6, 7, 7, 8, 8,
5312- 9, 9, 10, 10, 11, 11, 12, 12, 13, 13};
5313-
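When stbi__parse_huffman_block decodes a symbol of 257 or more, it indexes stbi__zlength_base and stbi__zlength_extra with `z - 257`, reads that many extra bits, and adds them to the base. Below is a minimal sketch of the same lookup for the valid symbols 257..285. It takes the already-read extra-bits value as a parameter. `match_length` is a hypothetical helper; distances work the same way with the stbi__zdist_* tables.

```c
#include <assert.h>

/* Base lengths and extra-bit counts for symbols 257..285
   (RFC 1951 section 3.2.5). */
static const int length_base[29] = {
    3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 15, 17, 19, 23, 27, 31,
    35, 43, 51, 59, 67, 83, 99, 115, 131, 163, 195, 227, 258};
static const int length_extra[29] = {
    0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2,
    3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 0};

/* Match length for a length symbol plus the value of its extra bits.
   Returns -1 for symbols outside 257..285 or an extra value that
   does not fit in the symbol's extra-bit count. */
int match_length(int symbol, int extra_value)
{
    int idx = symbol - 257;
    if (idx < 0 || idx >= 29) {
        return -1;
    }
    if (extra_value < 0 || extra_value >= (1 << length_extra[idx])) {
        return -1;
    }
    return length_base[idx] + extra_value;
}
```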
5314-static int
5315-stbi__parse_huffman_block(stbi__zbuf *a)
5316-{
5317- char *zout = a->zout;
5318- for (;;) {
5319- int z = stbi__zhuffman_decode(a, &a->z_length);
5320- if (z < 256) {
5321- if (z < 0) {
5322- return stbi__err("bad huffman code",
5323- "Corrupt PNG"); // error in huffman codes
5324- }
5325- if (zout >= a->zout_end) {
5326- if (!stbi__zexpand(a, zout, 1)) {
5327- return 0;
5328- }
5329- zout = a->zout;
5330- }
5331- *zout++ = (char)z;
5332- } else {
5333- stbi_uc *p;
5334- int len, dist;
5335- if (z == 256) {
5336- a->zout = zout;
5337- if (a->hit_zeof_once && a->num_bits < 16) {
5338- // The first time we hit zeof, we inserted 16 extra zero
5339- // bits into our bit buffer so the decoder can just do its
5340- // speculative decoding. But if we actually consumed any of
5341- // those bits (which is the case when num_bits < 16), the
5342- // stream actually read past the end so it is malformed.
5343- return stbi__err("unexpected end", "Corrupt PNG");
5344- }
5345- return 1;
5346- }
5347- if (z >= 286) {
5348- return stbi__err(
5349- "bad huffman code",
5350- "Corrupt PNG"); // per DEFLATE, length codes 286 and 287
5351- // must not appear in compressed data
5352- }
5353- z -= 257;
5354- len = stbi__zlength_base[z];
5355- if (stbi__zlength_extra[z]) {
5356- len += stbi__zreceive(a, stbi__zlength_extra[z]);
5357- }
5358- z = stbi__zhuffman_decode(a, &a->z_distance);
5359- if (z < 0 || z >= 30) {
5360- return stbi__err(
5361- "bad huffman code",
5362- "Corrupt PNG"); // per DEFLATE, distance codes 30 and 31
5363- // must not appear in compressed data
5364- }
5365- dist = stbi__zdist_base[z];
5366- if (stbi__zdist_extra[z]) {
5367- dist += stbi__zreceive(a, stbi__zdist_extra[z]);
5368- }
5369- if (zout - a->zout_start < dist) {
5370- return stbi__err("bad dist", "Corrupt PNG");
5371- }
5372- if (len > a->zout_end - zout) {
5373- if (!stbi__zexpand(a, zout, len)) {
5374- return 0;
5375- }
5376- zout = a->zout;
5377- }
5378- p = (stbi_uc *)(zout - dist);
5379- if (dist == 1) { // run of one byte; common in images.
5380- stbi_uc v = *p;
5381- if (len) {
5382- do {
5383- *zout++ = v;
5384- } while (--len);
5385- }
5386- } else {
5387- if (len) {
5388- do {
5389- *zout++ = *p++;
5390- } while (--len);
5391- }
5392- }
5393- }
5394- }
5395-}
5396-
5397-static int
5398-stbi__compute_huffman_codes(stbi__zbuf *a)
5399-{
5400- static const stbi_uc length_dezigzag[19] = {
5401- 16, 17, 18, 0, 8, 7, 9, 6, 10, 5, 11, 4, 12, 3, 13, 2, 14, 1, 15};
5402- stbi__zhuffman z_codelength;
5403- stbi_uc lencodes[286 + 32 + 137]; // padding for maximum single op
5404- stbi_uc codelength_sizes[19];
5405- int i, n;
5406-
5407- int hlit = stbi__zreceive(a, 5) + 257;
5408- int hdist = stbi__zreceive(a, 5) + 1;
5409- int hclen = stbi__zreceive(a, 4) + 4;
5410- int ntot = hlit + hdist;
5411-
5412- memset(codelength_sizes, 0, sizeof(codelength_sizes));
5413- for (i = 0; i < hclen; ++i) {
5414- int s = stbi__zreceive(a, 3);
5415- codelength_sizes[length_dezigzag[i]] = (stbi_uc)s;
5416- }
5417- if (!stbi__zbuild_huffman(&z_codelength, codelength_sizes, 19)) {
5418- return 0;
5419- }
5420-
5421- n = 0;
5422- while (n < ntot) {
5423- int c = stbi__zhuffman_decode(a, &z_codelength);
5424- if (c < 0 || c >= 19) {
5425- return stbi__err("bad codelengths", "Corrupt PNG");
5426- }
5427- if (c < 16) {
5428- lencodes[n++] = (stbi_uc)c;
5429- } else {
5430- stbi_uc fill = 0;
5431- if (c == 16) {
5432- c = stbi__zreceive(a, 2) + 3;
5433- if (n == 0) {
5434- return stbi__err("bad codelengths", "Corrupt PNG");
5435- }
5436- fill = lencodes[n - 1];
5437- } else if (c == 17) {
5438- c = stbi__zreceive(a, 3) + 3;
5439- } else if (c == 18) {
5440- c = stbi__zreceive(a, 7) + 11;
5441- } else {
5442- return stbi__err("bad codelengths", "Corrupt PNG");
5443- }
5444- if (ntot - n < c) {
5445- return stbi__err("bad codelengths", "Corrupt PNG");
5446- }
5447- memset(lencodes + n, fill, c);
5448- n += c;
5449- }
5450- }
5451- if (n != ntot) {
5452- return stbi__err("bad codelengths", "Corrupt PNG");
5453- }
5454- if (!stbi__zbuild_huffman(&a->z_length, lencodes, hlit)) {
5455- return 0;
5456- }
5457- if (!stbi__zbuild_huffman(&a->z_distance, lencodes + hlit, hdist)) {
5458- return 0;
5459- }
5460- return 1;
5461-}
5462-
5463-static int
5464-stbi__parse_uncompressed_block(stbi__zbuf *a)
5465-{
5466- stbi_uc header[4];
5467- int len, nlen, k;
5468- if (a->num_bits & 7) {
5469- stbi__zreceive(a, a->num_bits & 7); // discard
5470- }
5471- // drain the bit-packed data into header
5472- k = 0;
5473- while (a->num_bits > 0) {
5474- header[k++] =
5475- (stbi_uc)(a->code_buffer & 255); // suppress MSVC run-time check
5476- a->code_buffer >>= 8;
5477- a->num_bits -= 8;
5478- }
5479- if (a->num_bits < 0) {
5480- return stbi__err("zlib corrupt", "Corrupt PNG");
5481- }
5482- // now fill header the normal way
5483- while (k < 4) {
5484- header[k++] = stbi__zget8(a);
5485- }
5486- len = header[1] * 256 + header[0];
5487- nlen = header[3] * 256 + header[2];
5488- if (nlen != (len ^ 0xffff)) {
5489- return stbi__err("zlib corrupt", "Corrupt PNG");
5490- }
5491- if (a->zbuffer + len > a->zbuffer_end) {
5492- return stbi__err("read past buffer", "Corrupt PNG");
5493- }
5494- if (a->zout + len > a->zout_end) {
5495- if (!stbi__zexpand(a, a->zout, len)) {
5496- return 0;
5497- }
5498- }
5499- memcpy(a->zout, a->zbuffer, len);
5500- a->zbuffer += len;
5501- a->zout += len;
5502- return 1;
5503-}
5504-
5505-static int
5506-stbi__parse_zlib_header(stbi__zbuf *a)
5507-{
5508- int cmf = stbi__zget8(a);
5509- int cm = cmf & 15;
5510- /* int cinfo = cmf >> 4; */
5511- int flg = stbi__zget8(a);
5512- if (stbi__zeof(a)) {
5513- return stbi__err("bad zlib header", "Corrupt PNG"); // zlib spec
5514- }
5515- if ((cmf * 256 + flg) % 31 != 0) {
5516- return stbi__err("bad zlib header", "Corrupt PNG"); // zlib spec
5517- }
5518- if (flg & 32) {
5519- return stbi__err("no preset dict",
5520- "Corrupt PNG"); // preset dictionary not allowed in png
5521- }
5522- if (cm != 8) {
5523- return stbi__err("bad compression",
5524- "Corrupt PNG"); // DEFLATE required for png
5525- }
5526- // window = 1 << (8 + cinfo)... but who cares, we fully buffer output
5527- return 1;
5528-}
5529-
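stbi__parse_zlib_header accepts a stream only if three conditions hold: the 16-bit header passes the FCHECK test, it uses compression method 8 (DEFLATE), and it does not request a preset dictionary. Here is a minimal sketch of those three checks on their own, using a hypothetical `zlib_header_ok` name.

```c
#include <assert.h>

/* Check the two-byte zlib header (RFC 1950 section 2.2) the way
   stbi__parse_zlib_header does for PNG:
   - CMF*256 + FLG must be a multiple of 31 (FCHECK),
   - FDICT (bit 5 of FLG) must be clear: no preset dictionary,
   - CM (low nibble of CMF) must be 8, i.e. DEFLATE.
   Returns 1 if the header is acceptable, 0 otherwise. */
int zlib_header_ok(unsigned char cmf, unsigned char flg)
{
    if ((cmf * 256 + flg) % 31 != 0) {
        return 0;
    }
    if (flg & 0x20) {
        return 0;
    }
    if ((cmf & 0x0F) != 8) {
        return 0;
    }
    return 1;
}
```

The common headers 78 01, 78 9C and 78 DA all pass. 78 BB also satisfies FCHECK but is rejected because its FDICT bit is set.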
5530-static const stbi_uc stbi__zdefault_length[STBI__ZNSYMS] = {
5531- 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8,
5532- 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8,
5533- 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8,
5534- 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8,
5535- 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8,
5536- 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8,
5537- 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9,
5538- 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9,
5539- 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9,
5540- 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9,
5541- 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 7, 7, 7, 7, 7, 7, 7, 7,
5542- 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 8, 8};
5543-static const stbi_uc stbi__zdefault_distance[32] = {
5544- 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5545- 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5};
5546-/*
5547-Init algorithm:
5548-{
5549- int i; // use <= to match clearly with spec
5550- for (i=0; i <= 143; ++i) stbi__zdefault_length[i] = 8;
5551- for ( ; i <= 255; ++i) stbi__zdefault_length[i] = 9;
5552- for ( ; i <= 279; ++i) stbi__zdefault_length[i] = 7;
5553- for ( ; i <= 287; ++i) stbi__zdefault_length[i] = 8;
5554-
5555- for (i=0; i <= 31; ++i) stbi__zdefault_distance[i] = 5;
5556-}
5557-*/
5558-
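The Init algorithm comment shows how the hard-coded stbi__zdefault_length and stbi__zdefault_distance tables are derived. The sketch below turns the same loops into a compilable function; the name `fixed_code_lengths` is hypothetical.

```c
#include <assert.h>

/* Fixed ("type 1") Huffman code lengths from RFC 1951 section 3.2.6:
   literals 0-143 -> 8 bits, 144-255 -> 9, 256-279 -> 7,
   280-287 -> 8, and all 32 distance codes -> 5 bits. */
void fixed_code_lengths(unsigned char lit[288], unsigned char dist[32])
{
    int i;
    for (i = 0; i <= 143; ++i) lit[i] = 8;
    for (; i <= 255; ++i) lit[i] = 9;
    for (; i <= 279; ++i) lit[i] = 7;
    for (; i <= 287; ++i) lit[i] = 8;
    for (i = 0; i < 32; ++i) dist[i] = 5;
}
```

A quick sanity check on the result: the literal/length lengths form a complete prefix code, because 144*2^7 + 112*2^6 + 24*2^8 + 8*2^7 = 2^15.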
5559-static int
5560-stbi__parse_zlib(stbi__zbuf *a, int parse_header)
5561-{
5562- int final, type;
5563- if (parse_header) {
5564- if (!stbi__parse_zlib_header(a)) {
5565- return 0;
5566- }
5567- }
5568- a->num_bits = 0;
5569- a->code_buffer = 0;
5570- a->hit_zeof_once = 0;
5571- do {
5572- final = stbi__zreceive(a, 1);
5573- type = stbi__zreceive(a, 2);
5574- if (type == 0) {
5575- if (!stbi__parse_uncompressed_block(a)) {
5576- return 0;
5577- }
5578- } else if (type == 3) {
5579- return 0;
5580- } else {
5581- if (type == 1) {
5582- // use fixed code lengths
5583- if (!stbi__zbuild_huffman(&a->z_length, stbi__zdefault_length,
5584- STBI__ZNSYMS)) {
5585- return 0;
5586- }
5587- if (!stbi__zbuild_huffman(&a->z_distance,
5588- stbi__zdefault_distance, 32)) {
5589- return 0;
5590- }
5591- } else {
5592- if (!stbi__compute_huffman_codes(a)) {
5593- return 0;
5594- }
5595- }
5596- if (!stbi__parse_huffman_block(a)) {
5597- return 0;
5598- }
5599- }
5600- } while (!final);
5601- return 1;
5602-}
5603-
5604-static int
5605-stbi__do_zlib(stbi__zbuf *a, char *obuf, int olen, int exp, int parse_header)
5606-{
5607- a->zout_start = obuf;
5608- a->zout = obuf;
5609- a->zout_end = obuf + olen;
5610- a->z_expandable = exp;
5611-
5612- return stbi__parse_zlib(a, parse_header);
5613-}
5614-
5615-STBIDEF char *
5616-stbi_zlib_decode_malloc_guesssize(const char *buffer, int len, int initial_size,
5617- int *outlen)
5618-{
5619- stbi__zbuf a;
5620- char *p = (char *)stbi__malloc(initial_size);
5621- if (p == NULL) {
5622- return NULL;
5623- }
5624- a.zbuffer = (stbi_uc *)buffer;
5625- a.zbuffer_end = (stbi_uc *)buffer + len;
5626- if (stbi__do_zlib(&a, p, initial_size, 1, 1)) {
5627- if (outlen) {
5628- *outlen = (int)(a.zout - a.zout_start);
5629- }
5630- return a.zout_start;
5631- } else {
5632- STBI_FREE(a.zout_start);
5633- return NULL;
5634- }
5635-}
5636-
5637-STBIDEF char *
5638-stbi_zlib_decode_malloc(char const *buffer, int len, int *outlen)
5639-{
5640- return stbi_zlib_decode_malloc_guesssize(buffer, len, 16384, outlen);
5641-}
5642-
5643-STBIDEF char *
5644-stbi_zlib_decode_malloc_guesssize_headerflag(const char *buffer, int len,
5645- int initial_size, int *outlen,
5646- int parse_header)
5647-{
5648- stbi__zbuf a;
5649- char *p = (char *)stbi__malloc(initial_size);
5650- if (p == NULL) {
5651- return NULL;
5652- }
5653- a.zbuffer = (stbi_uc *)buffer;
5654- a.zbuffer_end = (stbi_uc *)buffer + len;
5655- if (stbi__do_zlib(&a, p, initial_size, 1, parse_header)) {
5656- if (outlen) {
5657- *outlen = (int)(a.zout - a.zout_start);
5658- }
5659- return a.zout_start;
5660- } else {
5661- STBI_FREE(a.zout_start);
5662- return NULL;
5663- }
5664-}
5665-
5666-STBIDEF int
5667-stbi_zlib_decode_buffer(char *obuffer, int olen, char const *ibuffer, int ilen)
5668-{
5669- stbi__zbuf a;
5670- a.zbuffer = (stbi_uc *)ibuffer;
5671- a.zbuffer_end = (stbi_uc *)ibuffer + ilen;
5672- if (stbi__do_zlib(&a, obuffer, olen, 0, 1)) {
5673- return (int)(a.zout - a.zout_start);
5674- } else {
5675- return -1;
5676- }
5677-}
5678-
5679-STBIDEF char *
5680-stbi_zlib_decode_noheader_malloc(char const *buffer, int len, int *outlen)
5681-{
5682- stbi__zbuf a;
5683- char *p = (char *)stbi__malloc(16384);
5684- if (p == NULL) {
5685- return NULL;
5686- }
5687- a.zbuffer = (stbi_uc *)buffer;
5688- a.zbuffer_end = (stbi_uc *)buffer + len;
5689- if (stbi__do_zlib(&a, p, 16384, 1, 0)) {
5690- if (outlen) {
5691- *outlen = (int)(a.zout - a.zout_start);
5692- }
5693- return a.zout_start;
5694- } else {
5695- STBI_FREE(a.zout_start);
5696- return NULL;
5697- }
5698-}
5699-
5700-STBIDEF int
5701-stbi_zlib_decode_noheader_buffer(char *obuffer, int olen, const char *ibuffer,
5702- int ilen)
5703-{
5704- stbi__zbuf a;
5705- a.zbuffer = (stbi_uc *)ibuffer;
5706- a.zbuffer_end = (stbi_uc *)ibuffer + ilen;
5707- if (stbi__do_zlib(&a, obuffer, olen, 0, 0)) {
5708- return (int)(a.zout - a.zout_start);
5709- } else {
5710- return -1;
5711- }
5712-}
5713-#endif
5714-
5715-// public domain "baseline" PNG decoder v0.10 Sean Barrett 2006-11-18
5716-// simple implementation
5717-// - only 8-bit samples
5718-// - no CRC checking
5719-// - allocates lots of intermediate memory
5720-// - avoids problem of streaming data between subsystems
5721-// - avoids explicit window management
5722-// performance
5723-// - uses stb_zlib, a PD zlib implementation with fast huffman decoding
5724-
5725-#ifndef STBI_NO_PNG
5726-typedef struct {
5727- stbi__uint32 length;
5728- stbi__uint32 type;
5729-} stbi__pngchunk;
5730-
5731-static stbi__pngchunk
5732-stbi__get_chunk_header(stbi__context *s)
5733-{
5734- stbi__pngchunk c;
5735- c.length = stbi__get32be(s);
5736- c.type = stbi__get32be(s);
5737- return c;
5738-}
5739-
5740-static int
5741-stbi__check_png_header(stbi__context *s)
5742-{
5743- static const stbi_uc png_sig[8] = {137, 80, 78, 71, 13, 10, 26, 10};
5744- int i;
5745- for (i = 0; i < 8; ++i) {
5746- if (stbi__get8(s) != png_sig[i]) {
5747- return stbi__err("bad png sig", "Not a PNG");
5748- }
5749- }
5750- return 1;
5751-}
5752-
5753-typedef struct {
5754- stbi__context *s;
5755- stbi_uc *idata, *expanded, *out;
5756- int depth;
5757-} stbi__png;
5758-
5759-enum {
5760- STBI__F_none = 0,
5761- STBI__F_sub = 1,
5762- STBI__F_up = 2,
5763- STBI__F_avg = 3,
5764- STBI__F_paeth = 4,
5765- // synthetic filter used for first scanline to avoid needing a dummy row of
5766- // 0s
5767- STBI__F_avg_first
5768-};
5769-
5770-static stbi_uc first_row_filter[5] = {
5771- STBI__F_none, STBI__F_sub, STBI__F_none, STBI__F_avg_first,
5772- STBI__F_sub // Paeth with b=c=0 turns out to be equivalent to sub
5773-};
5774-
5775-static int
5776-stbi__paeth(int a, int b, int c)
5777-{
5778- // This formulation looks very different from the reference in the PNG spec,
5779- // but is actually equivalent and has favorable data dependencies and admits
5780- // straightforward generation of branch-free code, which helps performance
5781- // significantly.
5782- int thresh = c * 3 - (a + b);
5783- int lo = a < b ? a : b;
5784- int hi = a < b ? b : a;
5785- int t0 = (hi <= thresh) ? lo : c;
5786- int t1 = (thresh <= lo) ? hi : t0;
5787- return t1;
5788-}
5789-
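The comment in stbi__paeth states that its branch-free form is equivalent to the reference predictor in the PNG specification. The sketch below puts the spec's version next to stb's formulation so the two can be compared. The spec's version predicts p = a + b - c, then picks whichever of a, b, c is closest to p, breaking ties in that order. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Paeth predictor as written in the PNG specification:
   a = left, b = above, c = upper-left. */
int paeth_reference(int a, int b, int c)
{
    int p = a + b - c;
    int pa = abs(p - a);
    int pb = abs(p - b);
    int pc = abs(p - c);
    if (pa <= pb && pa <= pc) {
        return a;
    }
    if (pb <= pc) {
        return b;
    }
    return c;
}

/* The branch-free formulation used by stbi__paeth. */
int paeth_branchless(int a, int b, int c)
{
    int thresh = c * 3 - (a + b);
    int lo = a < b ? a : b;
    int hi = a < b ? b : a;
    int t0 = (hi <= thresh) ? lo : c;
    int t1 = (thresh <= lo) ? hi : t0;
    return t1;
}
```

Comparing the two over every byte triple (a, b, c) in 0..255 takes about 16.7 million calls, which is cheap enough to run as a test of the equivalence claim.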
5790-static const stbi_uc stbi__depth_scale_table[9] = {0, 0xff, 0x55, 0, 0x11,
5791- 0, 0, 0, 0x01};
5792-
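For each grayscale bit depth, stbi__depth_scale_table holds the factor that maps the largest k-bit sample, 2^k - 1, onto 255 (1*0xff, 3*0x55, 15*0x11). Palette images use a scale of 1 instead, since their samples are indices rather than intensities. Here is a minimal sketch of that scaling for a single gray sample, using a hypothetical `scale_gray` helper.

```c
#include <assert.h>

/* Same values as stbi__depth_scale_table; unused depths are 0. */
static const unsigned char depth_scale[9] = {0, 0xff, 0x55, 0, 0x11,
                                             0, 0, 0, 0x01};

/* Stretch one k-bit gray sample (k = 1, 2, 4 or 8) to 0..255. */
unsigned char scale_gray(unsigned v, int depth)
{
    assert(depth == 1 || depth == 2 || depth == 4 || depth == 8);
    assert(v < (1u << depth));
    return (unsigned char)(v * depth_scale[depth]);
}
```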
5793-// adds an extra all-255 alpha channel
5794-// dest == src is legal
5795-// img_n must be 1 or 3
5796-static void
5797-stbi__create_png_alpha_expand8(stbi_uc *dest, stbi_uc *src, stbi__uint32 x,
5798- int img_n)
5799-{
5800- int i;
5801- // must process data backwards since we allow dest==src
5802- if (img_n == 1) {
5803- for (i = x - 1; i >= 0; --i) {
5804- dest[i * 2 + 1] = 255;
5805- dest[i * 2 + 0] = src[i];
5806- }
5807- } else {
5808- STBI_ASSERT(img_n == 3);
5809- for (i = x - 1; i >= 0; --i) {
5810- dest[i * 4 + 3] = 255;
5811- dest[i * 4 + 2] = src[i * 3 + 2];
5812- dest[i * 4 + 1] = src[i * 3 + 1];
5813- dest[i * 4 + 0] = src[i * 3 + 0];
5814- }
5815- }
5816-}
5817-
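stbi__create_png_alpha_expand8 may be called with dest == src, which is why it walks the pixels from last to first. Each output pixel starts at or after the position of its input pixel, so working backwards ensures no input byte is overwritten before it has been read. The sketch below performs the same in-place expansion; `add_alpha8` is a hypothetical name.

```c
#include <assert.h>

/* Append an opaque alpha byte to x pixels of 1 (gray) or 3 (RGB)
   channels. dest may alias src; dest needs x * (img_n + 1) bytes. */
void add_alpha8(unsigned char *dest, const unsigned char *src, int x,
                int img_n)
{
    assert(img_n == 1 || img_n == 3);
    for (int i = x - 1; i >= 0; --i) { /* backwards: safe in place */
        if (img_n == 1) {
            dest[i * 2 + 1] = 255;
            dest[i * 2 + 0] = src[i];
        } else {
            dest[i * 4 + 3] = 255;
            dest[i * 4 + 2] = src[i * 3 + 2];
            dest[i * 4 + 1] = src[i * 3 + 1];
            dest[i * 4 + 0] = src[i * 3 + 0];
        }
    }
}
```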
5818-// create the png data from post-deflated data
5819-static int
5820-stbi__create_png_image_raw(stbi__png *a, stbi_uc *raw, stbi__uint32 raw_len,
5821- int out_n, stbi__uint32 x, stbi__uint32 y, int depth,
5822- int color)
5823-{
5824- int bytes = (depth == 16 ? 2 : 1);
5825- stbi__context *s = a->s;
5826- stbi__uint32 i, j, stride = x * out_n * bytes;
5827- stbi__uint32 img_len, img_width_bytes;
5828- stbi_uc *filter_buf;
5829- int all_ok = 1;
5830- int k;
5831- int img_n = s->img_n; // copy it into a local for later
5832-
5833- int output_bytes = out_n * bytes;
5834- int filter_bytes = img_n * bytes;
5835- int width = x;
5836-
5837- STBI_ASSERT(out_n == s->img_n || out_n == s->img_n + 1);
5838- a->out = (stbi_uc *)stbi__malloc_mad3(
5839- x, y, output_bytes, 0); // extra bytes to write off the end into
5840- if (!a->out) {
5841- return stbi__err("outofmem", "Out of memory");
5842- }
5843-
5844- // note: error exits here don't need to clean up a->out individually,
5845- // stbi__do_png always does on error.
5846- if (!stbi__mad3sizes_valid(img_n, x, depth, 7)) {
5847- return stbi__err("too large", "Corrupt PNG");
5848- }
5849- img_width_bytes = (((img_n * x * depth) + 7) >> 3);
5850- if (!stbi__mad2sizes_valid(img_width_bytes, y, img_width_bytes)) {
5851- return stbi__err("too large", "Corrupt PNG");
5852- }
5853- img_len = (img_width_bytes + 1) * y;
5854-
5855- // we used to check for exact match between raw_len and img_len on
5856- // non-interlaced PNGs, but issue #276 reported a PNG in the wild that had
5857- // extra data at the end (all zeros), so just check for raw_len < img_len
5858- // always.
5859- if (raw_len < img_len) {
5860- return stbi__err("not enough pixels", "Corrupt PNG");
5861- }
5862-
5863- // Allocate two scan lines worth of filter workspace buffer.
5864- filter_buf = (stbi_uc *)stbi__malloc_mad2(img_width_bytes, 2, 0);
5865- if (!filter_buf) {
5866- return stbi__err("outofmem", "Out of memory");
5867- }
5868-
5869- // Filtering for low-bit-depth images
5870- if (depth < 8) {
5871- filter_bytes = 1;
5872- width = img_width_bytes;
5873- }
5874-
5875- for (j = 0; j < y; ++j) {
5876- // cur/prior filter buffers alternate
5877- stbi_uc *cur = filter_buf + (j & 1) * img_width_bytes;
5878- stbi_uc *prior = filter_buf + (~j & 1) * img_width_bytes;
5879- stbi_uc *dest = a->out + stride * j;
5880- int nk = width * filter_bytes;
5881- int filter = *raw++;
5882-
5883- // check filter type
5884- if (filter > 4) {
5885- all_ok = stbi__err("invalid filter", "Corrupt PNG");
5886- break;
5887- }
5888-
5889- // if first row, use special filter that doesn't sample previous row
5890- if (j == 0) {
5891- filter = first_row_filter[filter];
5892- }
5893-
5894- // perform actual filtering
5895- switch (filter) {
5896- case STBI__F_none:
5897- memcpy(cur, raw, nk);
5898- break;
5899- case STBI__F_sub:
5900- memcpy(cur, raw, filter_bytes);
5901- for (k = filter_bytes; k < nk; ++k) {
5902- cur[k] = STBI__BYTECAST(raw[k] + cur[k - filter_bytes]);
5903- }
5904- break;
5905- case STBI__F_up:
5906- for (k = 0; k < nk; ++k) {
5907- cur[k] = STBI__BYTECAST(raw[k] + prior[k]);
5908- }
5909- break;
5910- case STBI__F_avg:
5911- for (k = 0; k < filter_bytes; ++k) {
5912- cur[k] = STBI__BYTECAST(raw[k] + (prior[k] >> 1));
5913- }
5914- for (k = filter_bytes; k < nk; ++k) {
5915- cur[k] = STBI__BYTECAST(
5916- raw[k] + ((prior[k] + cur[k - filter_bytes]) >> 1));
5917- }
5918- break;
5919- case STBI__F_paeth:
5920- for (k = 0; k < filter_bytes; ++k) {
5921- cur[k] = STBI__BYTECAST(
5922- raw[k] + prior[k]); // prior[k] == stbi__paeth(0,prior[k],0)
5923- }
5924- for (k = filter_bytes; k < nk; ++k) {
5925- cur[k] = STBI__BYTECAST(
5926- raw[k] + stbi__paeth(cur[k - filter_bytes], prior[k],
5927- prior[k - filter_bytes]));
5928- }
5929- break;
5930- case STBI__F_avg_first:
5931- memcpy(cur, raw, filter_bytes);
5932- for (k = filter_bytes; k < nk; ++k) {
5933- cur[k] = STBI__BYTECAST(raw[k] + (cur[k - filter_bytes] >> 1));
5934- }
5935- break;
5936- }
5937-
5938- raw += nk;
5939-
5940- // expand decoded bits in cur to dest, also adding an extra alpha
5941- // channel if desired
5942- if (depth < 8) {
5943- stbi_uc scale = (color == 0)
5944- ? stbi__depth_scale_table[depth]
5945- : 1; // scale grayscale values to 0..255 range
5946- stbi_uc *in = cur;
5947- stbi_uc *out = dest;
5948- stbi_uc inb = 0;
5949- stbi__uint32 nsmp = x * img_n;
5950-
5951- // expand bits to bytes first
5952- if (depth == 4) {
5953- for (i = 0; i < nsmp; ++i) {
5954- if ((i & 1) == 0) {
5955- inb = *in++;
5956- }
5957- *out++ = scale * (inb >> 4);
5958- inb <<= 4;
5959- }
5960- } else if (depth == 2) {
5961- for (i = 0; i < nsmp; ++i) {
5962- if ((i & 3) == 0) {
5963- inb = *in++;
5964- }
5965- *out++ = scale * (inb >> 6);
5966- inb <<= 2;
5967- }
5968- } else {
5969- STBI_ASSERT(depth == 1);
5970- for (i = 0; i < nsmp; ++i) {
5971- if ((i & 7) == 0) {
5972- inb = *in++;
5973- }
5974- *out++ = scale * (inb >> 7);
5975- inb <<= 1;
5976- }
5977- }
5978-
5979- // insert alpha=255 values if desired
5980- if (img_n != out_n) {
5981- stbi__create_png_alpha_expand8(dest, dest, x, img_n);
5982- }
5983- } else if (depth == 8) {
5984- if (img_n == out_n) {
5985- memcpy(dest, cur, x * img_n);
5986- } else {
5987- stbi__create_png_alpha_expand8(dest, cur, x, img_n);
5988- }
5989- } else if (depth == 16) {
5990- // convert the image data from big-endian to platform-native
5991- stbi__uint16 *dest16 = (stbi__uint16 *)dest;
5992- stbi__uint32 nsmp = x * img_n;
5993-
5994- if (img_n == out_n) {
5995- for (i = 0; i < nsmp; ++i, ++dest16, cur += 2) {
5996- *dest16 = (cur[0] << 8) | cur[1];
5997- }
5998- } else {
5999- STBI_ASSERT(img_n + 1 == out_n);
6000- if (img_n == 1) {
6001- for (i = 0; i < x; ++i, dest16 += 2, cur += 2) {
6002- dest16[0] = (cur[0] << 8) | cur[1];
6003- dest16[1] = 0xffff;
6004- }
6005- } else {
6006- STBI_ASSERT(img_n == 3);
6007- for (i = 0; i < x; ++i, dest16 += 4, cur += 6) {
6008- dest16[0] = (cur[0] << 8) | cur[1];
6009- dest16[1] = (cur[2] << 8) | cur[3];
6010- dest16[2] = (cur[4] << 8) | cur[5];
6011- dest16[3] = 0xffff;
6012- }
6013- }
6014- }
6015- }
6016- }
6017-
6018- STBI_FREE(filter_buf);
6019- if (!all_ok) {
6020- return 0;
6021- }
6022-
6023- return 1;
6024-}
6025-
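stbi__create_png_image_raw undoes the five PNG filter types one scanline at a time. It uses the synthetic STBI__F_avg_first filter and the first_row_filter table so that the first row never reads a previous row. The compact sketch below instead treats a missing previous row as zeros. That is what the PNG specification prescribes, and it yields the same bytes. `unfilter_row` and `paeth_predict` are hypothetical names, and `bpp` plays the role of stb's filter_bytes.

```c
#include <assert.h>
#include <stdlib.h>

/* Paeth predictor as given in the PNG specification. */
int paeth_predict(int a, int b, int c)
{
    int p = a + b - c;
    int pa = abs(p - a), pb = abs(p - b), pc = abs(p - c);
    if (pa <= pb && pa <= pc) return a;
    return (pb <= pc) ? b : c;
}

/* Reconstruct one scanline of n bytes.
   type:  filter-type byte that precedes the row (0..4)
   raw:   the filtered bytes that follow it
   prior: reconstructed previous row, or NULL for the first row,
          in which case "above" and "upper-left" read as 0
   bpp:   bytes per complete pixel (at least 1)
   Returns 0 for an unknown filter type, 1 otherwise. */
int unfilter_row(int type, const unsigned char *raw,
                 const unsigned char *prior, unsigned char *cur,
                 int n, int bpp)
{
    assert(bpp >= 1);
    for (int k = 0; k < n; ++k) {
        int a = (k >= bpp) ? cur[k - bpp] : 0;            /* left */
        int b = prior ? prior[k] : 0;                     /* above */
        int c = (prior && k >= bpp) ? prior[k - bpp] : 0; /* upper-left */
        int pred;
        switch (type) {
        case 0: pred = 0; break;                       /* None */
        case 1: pred = a; break;                       /* Sub */
        case 2: pred = b; break;                       /* Up */
        case 3: pred = (a + b) >> 1; break;            /* Average */
        case 4: pred = paeth_predict(a, b, c); break;  /* Paeth */
        default: return 0;
        }
        cur[k] = (unsigned char)(raw[k] + pred); /* wraps mod 256 */
    }
    return 1;
}
```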
6026-static int
6027-stbi__create_png_image(stbi__png *a, stbi_uc *image_data,
6028- stbi__uint32 image_data_len, int out_n, int depth,
6029- int color, int interlaced)
6030-{
6031- int bytes = (depth == 16 ? 2 : 1);
6032- int out_bytes = out_n * bytes;
6033- stbi_uc *final;
6034- int p;
6035- if (!interlaced) {
6036- return stbi__create_png_image_raw(a, image_data, image_data_len, out_n,
6037- a->s->img_x, a->s->img_y, depth,
6038- color);
6039- }
6040-
6041- // de-interlacing
6042- final =
6043- (stbi_uc *)stbi__malloc_mad3(a->s->img_x, a->s->img_y, out_bytes, 0);
6044- if (!final) {
6045- return stbi__err("outofmem", "Out of memory");
6046- }
6047- for (p = 0; p < 7; ++p) {
6048- int xorig[] = {0, 4, 0, 2, 0, 1, 0};
6049- int yorig[] = {0, 0, 4, 0, 2, 0, 1};
6050- int xspc[] = {8, 8, 4, 4, 2, 2, 1};
6051- int yspc[] = {8, 8, 8, 4, 4, 2, 2};
6052- int i, j, x, y;
6053- // pass1_x[4] = 0, pass1_x[5] = 1, pass1_x[12] = 1
6054- x = (a->s->img_x - xorig[p] + xspc[p] - 1) / xspc[p];
6055- y = (a->s->img_y - yorig[p] + yspc[p] - 1) / yspc[p];
6056- if (x && y) {
6057- stbi__uint32 img_len =
6058- ((((a->s->img_n * x * depth) + 7) >> 3) + 1) * y;
6059- if (!stbi__create_png_image_raw(a, image_data, image_data_len,
6060- out_n, x, y, depth, color)) {
6061- STBI_FREE(final);
6062- return 0;
6063- }
6064- for (j = 0; j < y; ++j) {
6065- for (i = 0; i < x; ++i) {
6066- int out_y = j * yspc[p] + yorig[p];
6067- int out_x = i * xspc[p] + xorig[p];
6068- memcpy(final + out_y * a->s->img_x * out_bytes +
6069- out_x * out_bytes,
6070- a->out + (j * x + i) * out_bytes, out_bytes);
6071- }
6072- }
6073- STBI_FREE(a->out);
6074- image_data += img_len;
6075- image_data_len -= img_len;
6076- }
6077- }
6078- a->out = final;
6079-
6080- return 1;
6081-}
6082-
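For interlaced images, stbi__create_png_image decodes seven reduced images, one per Adam7 pass. Each is sized from the xorig/yorig/xspc/yspc tables with a round-up division, and a pass with zero width or height contributes no bytes to the stream. Here is a sketch of just that size computation; `adam7_pass_size` is a hypothetical name.

```c
#include <assert.h>

/* Adam7 start column/row and column/row spacing for the 7 passes,
   the same tables as in stbi__create_png_image. */
static const int xorig[7] = {0, 4, 0, 2, 0, 1, 0};
static const int yorig[7] = {0, 0, 4, 0, 2, 0, 1};
static const int xspc[7] = {8, 8, 4, 4, 2, 2, 1};
static const int yspc[7] = {8, 8, 8, 4, 4, 2, 2};

/* Width and height of reduced image `pass` (0-based) for a w x h
   image, w, h >= 1. Either result may be 0 (empty pass). */
void adam7_pass_size(int pass, int w, int h, int *pw, int *ph)
{
    assert(pass >= 0 && pass < 7 && w >= 1 && h >= 1);
    /* count of columns c in [0, w) with c = xorig + k * xspc */
    *pw = (w - xorig[pass] + xspc[pass] - 1) / xspc[pass];
    *ph = (h - yorig[pass] + yspc[pass] - 1) / yspc[pass];
}
```

For an 8x8 image the passes come out as 1x1, 1x1, 2x1, 2x2, 4x2, 4x4 and 8x4, which together cover all 64 pixels.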
6083-static int
6084-stbi__compute_transparency(stbi__png *z, stbi_uc tc[3], int out_n)
6085-{
6086- stbi__context *s = z->s;
6087- stbi__uint32 i, pixel_count = s->img_x * s->img_y;
6088- stbi_uc *p = z->out;
6089-
6090- // compute color-based transparency, assuming we've
6091- // already got 255 as the alpha value in the output
6092- STBI_ASSERT(out_n == 2 || out_n == 4);
6093-
6094- if (out_n == 2) {
6095- for (i = 0; i < pixel_count; ++i) {
6096- p[1] = (p[0] == tc[0] ? 0 : 255);
6097- p += 2;
6098- }
6099- } else {
6100- for (i = 0; i < pixel_count; ++i) {
6101- if (p[0] == tc[0] && p[1] == tc[1] && p[2] == tc[2]) {
6102- p[3] = 0;
6103- }
6104- p += 4;
6105- }
6106- }
6107- return 1;
6108-}
6109-
6110-static int
6111-stbi__compute_transparency16(stbi__png *z, stbi__uint16 tc[3], int out_n)
6112-{
6113- stbi__context *s = z->s;
6114- stbi__uint32 i, pixel_count = s->img_x * s->img_y;
6115- stbi__uint16 *p = (stbi__uint16 *)z->out;
6116-
6117- // compute color-based transparency, assuming we've
6118- // already got 65535 as the alpha value in the output
6119- STBI_ASSERT(out_n == 2 || out_n == 4);
6120-
6121- if (out_n == 2) {
6122- for (i = 0; i < pixel_count; ++i) {
6123- p[1] = (p[0] == tc[0] ? 0 : 65535);
6124- p += 2;
6125- }
6126- } else {
6127- for (i = 0; i < pixel_count; ++i) {
6128- if (p[0] == tc[0] && p[1] == tc[1] && p[2] == tc[2]) {
6129- p[3] = 0;
6130- }
6131- p += 4;
6132- }
6133- }
6134- return 1;
6135-}
6136-
6137-static int
6138-stbi__expand_png_palette(stbi__png *a, stbi_uc *palette, int len, int pal_img_n)
6139-{
6140- stbi__uint32 i, pixel_count = a->s->img_x * a->s->img_y;
6141- stbi_uc *p, *temp_out, *orig = a->out;
6142-
6143- p = (stbi_uc *)stbi__malloc_mad2(pixel_count, pal_img_n, 0);
6144- if (p == NULL) {
6145- return stbi__err("outofmem", "Out of memory");
6146- }
6147-
6148- // between here and free(out) below, exiting would leak
6149- temp_out = p;
6150-
6151- if (pal_img_n == 3) {
6152- for (i = 0; i < pixel_count; ++i) {
6153- int n = orig[i] * 4;
6154- p[0] = palette[n];
6155- p[1] = palette[n + 1];
6156- p[2] = palette[n + 2];
6157- p += 3;
6158- }
6159- } else {
6160- for (i = 0; i < pixel_count; ++i) {
6161- int n = orig[i] * 4;
6162- p[0] = palette[n];
6163- p[1] = palette[n + 1];
6164- p[2] = palette[n + 2];
6165- p[3] = palette[n + 3];
6166- p += 4;
6167- }
6168- }
6169- STBI_FREE(a->out);
6170- a->out = temp_out;
6171-
6172- STBI_NOTUSED(len);
6173-
6174- return 1;
6175-}
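Palette expansion replaces each 1-byte index with the RGB or RGBA bytes of its palette entry. The loader stores the palette with a fixed 4-byte stride (alpha defaults to 255 and is overwritten by tRNS), so the entry offset is always `index * 4` whatever the output channel count. Because that palette is 1024 bytes, any 8-bit index stays in range. A sketch under those assumptions follows. `expand_palette` is a hypothetical helper and, unlike the loader, it writes into a caller-supplied buffer instead of allocating one.

```c
#include <assert.h>
#include <stddef.h>

/* Expand 8-bit palette indices into out, which must hold
   pixel_count * out_n bytes. palette uses 4 bytes (R, G, B, A) per entry,
   so it must cover 4 * (largest index + 1) bytes. */
static void expand_palette(const unsigned char *indices, size_t pixel_count,
                           const unsigned char *palette, int out_n,
                           unsigned char *out)
{
    size_t i;
    int c;
    assert(out_n == 3 || out_n == 4);
    for (i = 0; i < pixel_count; ++i) {
        /* fixed 4-byte stride, independent of out_n */
        const unsigned char *entry = palette + 4 * (size_t)indices[i];
        for (c = 0; c < out_n; ++c)
            *out++ = entry[c];
    }
}
```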
6176-
6177-static int stbi__unpremultiply_on_load_global = 0;
6178-static int stbi__de_iphone_flag_global = 0;
6179-
6180-STBIDEF void
6181-stbi_set_unpremultiply_on_load(int flag_true_if_should_unpremultiply)
6182-{
6183- stbi__unpremultiply_on_load_global = flag_true_if_should_unpremultiply;
6184-}
6185-
6186-STBIDEF void
6187-stbi_convert_iphone_png_to_rgb(int flag_true_if_should_convert)
6188-{
6189- stbi__de_iphone_flag_global = flag_true_if_should_convert;
6190-}
6191-
6192-#ifndef STBI_THREAD_LOCAL
6193-#define stbi__unpremultiply_on_load stbi__unpremultiply_on_load_global
6194-#define stbi__de_iphone_flag stbi__de_iphone_flag_global
6195-#else
6196-static STBI_THREAD_LOCAL int stbi__unpremultiply_on_load_local,
6197- stbi__unpremultiply_on_load_set;
6198-static STBI_THREAD_LOCAL int stbi__de_iphone_flag_local,
6199- stbi__de_iphone_flag_set;
6200-
6201-STBIDEF void
6202-stbi_set_unpremultiply_on_load_thread(int flag_true_if_should_unpremultiply)
6203-{
6204- stbi__unpremultiply_on_load_local = flag_true_if_should_unpremultiply;
6205- stbi__unpremultiply_on_load_set = 1;
6206-}
6207-
6208-STBIDEF void
6209-stbi_convert_iphone_png_to_rgb_thread(int flag_true_if_should_convert)
6210-{
6211- stbi__de_iphone_flag_local = flag_true_if_should_convert;
6212- stbi__de_iphone_flag_set = 1;
6213-}
6214-
6215-#define stbi__unpremultiply_on_load \
6216- (stbi__unpremultiply_on_load_set ? stbi__unpremultiply_on_load_local \
6217- : stbi__unpremultiply_on_load_global)
6218-#define stbi__de_iphone_flag \
6219- (stbi__de_iphone_flag_set ? stbi__de_iphone_flag_local \
6220- : stbi__de_iphone_flag_global)
6221-#endif // STBI_THREAD_LOCAL
6222-
6223-static void
6224-stbi__de_iphone(stbi__png *z)
6225-{
6226- stbi__context *s = z->s;
6227- stbi__uint32 i, pixel_count = s->img_x * s->img_y;
6228- stbi_uc *p = z->out;
6229-
6230- if (s->img_out_n == 3) { // convert bgr to rgb
6231- for (i = 0; i < pixel_count; ++i) {
6232- stbi_uc t = p[0];
6233- p[0] = p[2];
6234- p[2] = t;
6235- p += 3;
6236- }
6237- } else {
6238- STBI_ASSERT(s->img_out_n == 4);
6239- if (stbi__unpremultiply_on_load) {
6240- // convert bgr to rgb and unpremultiply
6241- for (i = 0; i < pixel_count; ++i) {
6242- stbi_uc a = p[3];
6243- stbi_uc t = p[0];
6244- if (a) {
6245- stbi_uc half = a / 2;
6246- p[0] = (p[2] * 255 + half) / a;
6247- p[1] = (p[1] * 255 + half) / a;
6248- p[2] = (t * 255 + half) / a;
6249- } else {
6250- p[0] = p[2];
6251- p[2] = t;
6252- }
6253- p += 4;
6254- }
6255- } else {
6256- // convert bgr to rgb
6257- for (i = 0; i < pixel_count; ++i) {
6258- stbi_uc t = p[0];
6259- p[0] = p[2];
6260- p[2] = t;
6261- p += 4;
6262- }
6263- }
6264- }
6265-}
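For CgBI ("iPhone") PNGs the pixels are stored as BGR(A) with premultiplied alpha. The unpremultiply step divides each channel by alpha with round-to-nearest, computing `(c * 255 + a / 2) / a`. The sketch below handles one RGBA pixel. `bgra_premul_to_rgba` and `unpremul` are hypothetical names, and the clamp to 255 is an addition for malformed input where a channel exceeds alpha; the code above would instead let such a value wrap when it is stored into a `stbi_uc`.

```c
#include <assert.h>

/* Undo premultiplication for one channel, rounding to nearest. */
static unsigned char unpremul(unsigned c, unsigned a)
{
    unsigned v;
    assert(a > 0);
    v = (c * 255u + a / 2u) / a;
    return (unsigned char)(v > 255u ? 255u : v); /* clamp malformed c > a */
}

/* Convert one premultiplied BGRA pixel to straight-alpha RGBA in place.
   With a == 0 the color is undefined, so only B and R are swapped. */
static void bgra_premul_to_rgba(unsigned char p[4])
{
    unsigned a = p[3];
    unsigned char b = p[0];
    if (a) {
        p[0] = unpremul(p[2], a);
        p[1] = unpremul(p[1], a);
        p[2] = unpremul(b, a);
    } else {
        p[0] = p[2];
        p[2] = b;
    }
}
```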
6266-
6267-#define STBI__PNG_TYPE(a, b, c, d) \
6268- (((unsigned)(a) << 24) + ((unsigned)(b) << 16) + ((unsigned)(c) << 8) + \
6269- (unsigned)(d))
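`STBI__PNG_TYPE` packs the four ASCII letters of a chunk name into one big-endian 32-bit value, so chunk types can serve as `case` labels. The `default:` branch of the parser relies on PNG's naming rule that bit 5 of the first byte (bit 29 of the packed value) is the ancillary bit. A lowercase first letter means the chunk may be skipped, and an uppercase one means it is critical. `CgBI` starts with an uppercase letter, so without its explicit `case` the default branch would reject it as an unknown critical chunk. A small sketch of both pieces follows, with hypothetical function names and assuming a 32-bit `unsigned int`.

```c
#include <assert.h>

/* Pack a four-letter chunk name big-endian, like STBI__PNG_TYPE. */
static unsigned png_chunk_type(const char name[4])
{
    return ((unsigned)(unsigned char)name[0] << 24) |
           ((unsigned)(unsigned char)name[1] << 16) |
           ((unsigned)(unsigned char)name[2] << 8) |
           (unsigned)(unsigned char)name[3];
}

/* Critical chunks have bit 5 of the first byte clear (uppercase letter). */
static int png_chunk_is_critical(unsigned type)
{
    return (type & (1u << 29)) == 0;
}
```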
6270-
6271-static int
6272-stbi__parse_png_file(stbi__png *z, int scan, int req_comp)
6273-{
6274- stbi_uc palette[1024], pal_img_n = 0;
6275- stbi_uc has_trans = 0, tc[3] = {0};
6276- stbi__uint16 tc16[3];
6277- stbi__uint32 ioff = 0, idata_limit = 0, i, pal_len = 0;
6278- int first = 1, k, interlace = 0, color = 0, is_iphone = 0;
6279- stbi__context *s = z->s;
6280-
6281- z->expanded = NULL;
6282- z->idata = NULL;
6283- z->out = NULL;
6284-
6285- if (!stbi__check_png_header(s)) {
6286- return 0;
6287- }
6288-
6289- if (scan == STBI__SCAN_type) {
6290- return 1;
6291- }
6292-
6293- for (;;) {
6294- stbi__pngchunk c = stbi__get_chunk_header(s);
6295- switch (c.type) {
6296- case STBI__PNG_TYPE('C', 'g', 'B', 'I'):
6297- is_iphone = 1;
6298- stbi__skip(s, c.length);
6299- break;
6300- case STBI__PNG_TYPE('I', 'H', 'D', 'R'): {
6301- int comp, filter;
6302- if (!first) {
6303- return stbi__err("multiple IHDR", "Corrupt PNG");
6304- }
6305- first = 0;
6306- if (c.length != 13) {
6307- return stbi__err("bad IHDR len", "Corrupt PNG");
6308- }
6309- s->img_x = stbi__get32be(s);
6310- s->img_y = stbi__get32be(s);
6311- if (s->img_y > STBI_MAX_DIMENSIONS) {
6312- return stbi__err("too large", "Very large image (corrupt?)");
6313- }
6314- if (s->img_x > STBI_MAX_DIMENSIONS) {
6315- return stbi__err("too large", "Very large image (corrupt?)");
6316- }
6317- z->depth = stbi__get8(s);
6318- if (z->depth != 1 && z->depth != 2 && z->depth != 4 &&
6319- z->depth != 8 && z->depth != 16) {
6320- return stbi__err("1/2/4/8/16-bit only",
6321- "PNG not supported: 1/2/4/8/16-bit only");
6322- }
6323- color = stbi__get8(s);
6324- if (color > 6) {
6325- return stbi__err("bad ctype", "Corrupt PNG");
6326- }
6327- if (color == 3 && z->depth == 16) {
6328- return stbi__err("bad ctype", "Corrupt PNG");
6329- }
6330- if (color == 3) {
6331- pal_img_n = 3;
6332- } else if (color & 1) {
6333- return stbi__err("bad ctype", "Corrupt PNG");
6334- }
6335- comp = stbi__get8(s);
6336- if (comp) {
6337- return stbi__err("bad comp method", "Corrupt PNG");
6338- }
6339- filter = stbi__get8(s);
6340- if (filter) {
6341- return stbi__err("bad filter method", "Corrupt PNG");
6342- }
6343- interlace = stbi__get8(s);
6344- if (interlace > 1) {
6345- return stbi__err("bad interlace method", "Corrupt PNG");
6346- }
6347- if (!s->img_x || !s->img_y) {
6348- return stbi__err("0-pixel image", "Corrupt PNG");
6349- }
6350- if (!pal_img_n) {
6351- s->img_n = (color & 2 ? 3 : 1) + (color & 4 ? 1 : 0);
6352- if ((1 << 30) / s->img_x / s->img_n < s->img_y) {
6353- return stbi__err("too large", "Image too large to decode");
6354- }
6355- } else {
6356- // if paletted, then pal_img_n is our final component count, and
6357- // img_n is the number of components to decompress/filter.
6358- s->img_n = 1;
6359- if ((1 << 30) / s->img_x / 4 < s->img_y) {
6360- return stbi__err("too large", "Corrupt PNG");
6361- }
6362- }
6363- // even with SCAN_header, have to scan to see if we have a tRNS
6364- break;
6365- }
6366-
6367- case STBI__PNG_TYPE('P', 'L', 'T', 'E'): {
6368- if (first) {
6369- return stbi__err("first not IHDR", "Corrupt PNG");
6370- }
6371- if (c.length > 256 * 3) {
6372- return stbi__err("invalid PLTE", "Corrupt PNG");
6373- }
6374- pal_len = c.length / 3;
6375- if (pal_len * 3 != c.length) {
6376- return stbi__err("invalid PLTE", "Corrupt PNG");
6377- }
6378- for (i = 0; i < pal_len; ++i) {
6379- palette[i * 4 + 0] = stbi__get8(s);
6380- palette[i * 4 + 1] = stbi__get8(s);
6381- palette[i * 4 + 2] = stbi__get8(s);
6382- palette[i * 4 + 3] = 255;
6383- }
6384- break;
6385- }
6386-
6387- case STBI__PNG_TYPE('t', 'R', 'N', 'S'): {
6388- if (first) {
6389- return stbi__err("first not IHDR", "Corrupt PNG");
6390- }
6391- if (z->idata) {
6392- return stbi__err("tRNS after IDAT", "Corrupt PNG");
6393- }
6394- if (pal_img_n) {
6395- if (scan == STBI__SCAN_header) {
6396- s->img_n = 4;
6397- return 1;
6398- }
6399- if (pal_len == 0) {
6400- return stbi__err("tRNS before PLTE", "Corrupt PNG");
6401- }
6402- if (c.length > pal_len) {
6403- return stbi__err("bad tRNS len", "Corrupt PNG");
6404- }
6405- pal_img_n = 4;
6406- for (i = 0; i < c.length; ++i) {
6407- palette[i * 4 + 3] = stbi__get8(s);
6408- }
6409- } else {
6410- if (!(s->img_n & 1)) {
6411- return stbi__err("tRNS with alpha", "Corrupt PNG");
6412- }
6413- if (c.length != (stbi__uint32)s->img_n * 2) {
6414- return stbi__err("bad tRNS len", "Corrupt PNG");
6415- }
6416- has_trans = 1;
6417- // non-paletted with tRNS = constant alpha. if header-scanning,
6418- // we can stop now.
6419- if (scan == STBI__SCAN_header) {
6420- ++s->img_n;
6421- return 1;
6422- }
6423- if (z->depth == 16) {
6424- for (k = 0; k < s->img_n && k < 3;
6425- ++k) { // extra loop test to suppress false GCC warning
6426- tc16[k] = (stbi__uint16)stbi__get16be(
6427- s); // copy the values as-is
6428- }
6429- } else {
6430- for (k = 0; k < s->img_n && k < 3; ++k) {
6431- tc[k] =
6432- (stbi_uc)(stbi__get16be(s) & 255) *
6433- stbi__depth_scale_table
6434- [z->depth]; // non 8-bit images will be larger
6435- }
6436- }
6437- }
6438- break;
6439- }
6440-
6441- case STBI__PNG_TYPE('I', 'D', 'A', 'T'): {
6442- if (first) {
6443- return stbi__err("first not IHDR", "Corrupt PNG");
6444- }
6445- if (pal_img_n && !pal_len) {
6446- return stbi__err("no PLTE", "Corrupt PNG");
6447- }
6448- if (scan == STBI__SCAN_header) {
6449- // header scan definitely stops at first IDAT
6450- if (pal_img_n) {
6451- s->img_n = pal_img_n;
6452- }
6453- return 1;
6454- }
6455- if (c.length > (1u << 30)) {
6456- return stbi__err("IDAT size limit",
6457- "IDAT section larger than 2^30 bytes");
6458- }
6459- if ((int)(ioff + c.length) < (int)ioff) {
6460- return 0;
6461- }
6462- if (ioff + c.length > idata_limit) {
6463- stbi__uint32 idata_limit_old = idata_limit;
6464- stbi_uc *p;
6465- if (idata_limit == 0) {
6466- idata_limit = c.length > 4096 ? c.length : 4096;
6467- }
6468- while (ioff + c.length > idata_limit) {
6469- idata_limit *= 2;
6470- }
6471- STBI_NOTUSED(idata_limit_old);
6472- p = (stbi_uc *)STBI_REALLOC_SIZED(z->idata, idata_limit_old,
6473- idata_limit);
6474- if (p == NULL) {
6475- return stbi__err("outofmem", "Out of memory");
6476- }
6477- z->idata = p;
6478- }
6479- if (!stbi__getn(s, z->idata + ioff, c.length)) {
6480- return stbi__err("outofdata", "Corrupt PNG");
6481- }
6482- ioff += c.length;
6483- break;
6484- }
6485-
6486- case STBI__PNG_TYPE('I', 'E', 'N', 'D'): {
6487- stbi__uint32 raw_len, bpl;
6488- if (first) {
6489- return stbi__err("first not IHDR", "Corrupt PNG");
6490- }
6491- if (scan != STBI__SCAN_load) {
6492- return 1;
6493- }
6494- if (z->idata == NULL) {
6495- return stbi__err("no IDAT", "Corrupt PNG");
6496- }
6497- // initial guess for decoded data size to avoid unnecessary reallocs
6498- bpl =
6499- (s->img_x * z->depth + 7) / 8; // bytes per line, per component
6500- raw_len = bpl * s->img_y * s->img_n /* pixels */ +
6501- s->img_y /* filter mode per row */;
6502- z->expanded =
6503- (stbi_uc *)stbi_zlib_decode_malloc_guesssize_headerflag(
6504- (char *)z->idata, ioff, raw_len, (int *)&raw_len,
6505- !is_iphone);
6506- if (z->expanded == NULL) {
6507- return 0; // zlib should set error
6508- }
6509- STBI_FREE(z->idata);
6510- z->idata = NULL;
6511- if ((req_comp == s->img_n + 1 && req_comp != 3 && !pal_img_n) ||
6512- has_trans) {
6513- s->img_out_n = s->img_n + 1;
6514- } else {
6515- s->img_out_n = s->img_n;
6516- }
6517- if (!stbi__create_png_image(z, z->expanded, raw_len, s->img_out_n,
6518- z->depth, color, interlace)) {
6519- return 0;
6520- }
6521- if (has_trans) {
6522- if (z->depth == 16) {
6523- if (!stbi__compute_transparency16(z, tc16, s->img_out_n)) {
6524- return 0;
6525- }
6526- } else {
6527- if (!stbi__compute_transparency(z, tc, s->img_out_n)) {
6528- return 0;
6529- }
6530- }
6531- }
6532- if (is_iphone && stbi__de_iphone_flag && s->img_out_n > 2) {
6533- stbi__de_iphone(z);
6534- }
6535- if (pal_img_n) {
6536- // pal_img_n == 3 or 4
6537- s->img_n = pal_img_n; // record the actual colors we had
6538- s->img_out_n = pal_img_n;
6539- if (req_comp >= 3) {
6540- s->img_out_n = req_comp;
6541- }
6542- if (!stbi__expand_png_palette(z, palette, pal_len,
6543- s->img_out_n)) {
6544- return 0;
6545- }
6546- } else if (has_trans) {
6547- // non-paletted image with tRNS -> source image has (constant)
6548- // alpha
6549- ++s->img_n;
6550- }
6551- STBI_FREE(z->expanded);
6552- z->expanded = NULL;
6553- // end of PNG chunk, read and skip CRC
6554- stbi__get32be(s);
6555- return 1;
6556- }
6557-
6558- default:
6559- // if critical, fail
6560- if (first) {
6561- return stbi__err("first not IHDR", "Corrupt PNG");
6562- }
6563- if ((c.type & (1 << 29)) == 0) {
6564-#ifndef STBI_NO_FAILURE_STRINGS
6565- // not threadsafe
6566- static char invalid_chunk[] = "XXXX PNG chunk not known";
6567- invalid_chunk[0] = STBI__BYTECAST(c.type >> 24);
6568- invalid_chunk[1] = STBI__BYTECAST(c.type >> 16);
6569- invalid_chunk[2] = STBI__BYTECAST(c.type >> 8);
6570- invalid_chunk[3] = STBI__BYTECAST(c.type >> 0);
6571-#endif
6572- return stbi__err(invalid_chunk,
6573- "PNG not supported: unknown PNG chunk type");
6574- }
6575- stbi__skip(s, c.length);
6576- break;
6577- }
6578- // end of PNG chunk, read and skip CRC
6579- stbi__get32be(s);
6580- }
6581-}
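In the IHDR branch the channel count comes straight from the color-type bits. Bit 0 marks a palette, bit 1 color, and bit 2 alpha. Any odd type other than 3, or any type above 6, is rejected. Below is a sketch of that mapping. `png_channels` is a hypothetical helper that returns 0 for invalid types. For palette images it returns 1 because the decompressed data holds one index per pixel before expansion. The separate check that rejects 16-bit paletted images is not repeated here.

```c
#include <assert.h>

/* PNG color type -> channels per pixel in the decompressed data,
   or 0 for an invalid color type. */
static int png_channels(int color_type)
{
    if (color_type == 3)
        return 1; /* palette index, expanded to RGB(A) later */
    if (color_type < 0 || color_type > 6 || (color_type & 1))
        return 0; /* odd types other than 3 are invalid */
    /* bit 1: color (3 channels) or grey (1); bit 2: +1 alpha channel */
    return ((color_type & 2) ? 3 : 1) + ((color_type & 4) ? 1 : 0);
}
```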
6582-
6583-static void *
6584-stbi__do_png(stbi__png *p, int *x, int *y, int *n, int req_comp,
6585- stbi__result_info *ri)
6586-{
6587- void *result = NULL;
6588- if (req_comp < 0 || req_comp > 4) {
6589- return stbi__errpuc("bad req_comp", "Internal error");
6590- }
6591- if (stbi__parse_png_file(p, STBI__SCAN_load, req_comp)) {
6592- if (p->depth <= 8) {
6593- ri->bits_per_channel = 8;
6594- } else if (p->depth == 16) {
6595- ri->bits_per_channel = 16;
6596- } else {
6597- return stbi__errpuc("bad bits_per_channel",
6598- "PNG not supported: unsupported color depth");
6599- }
6600- result = p->out;
6601- p->out = NULL;
6602- if (req_comp && req_comp != p->s->img_out_n) {
6603- if (ri->bits_per_channel == 8) {
6604- result = stbi__convert_format((unsigned char *)result,
6605- p->s->img_out_n, req_comp,
6606- p->s->img_x, p->s->img_y);
6607- } else {
6608- result = stbi__convert_format16((stbi__uint16 *)result,
6609- p->s->img_out_n, req_comp,
6610- p->s->img_x, p->s->img_y);
6611- }
6612- p->s->img_out_n = req_comp;
6613- if (result == NULL) {
6614- return result;
6615- }
6616- }
6617- *x = p->s->img_x;
6618- *y = p->s->img_y;
6619- if (n) {
6620- *n = p->s->img_n;
6621- }
6622- }
6623- STBI_FREE(p->out);
6624- p->out = NULL;
6625- STBI_FREE(p->expanded);
6626- p->expanded = NULL;
6627- STBI_FREE(p->idata);
6628- p->idata = NULL;
6629-
6630- return result;
6631-}
6632-
6633-static void *
6634-stbi__png_load(stbi__context *s, int *x, int *y, int *comp, int req_comp,
6635- stbi__result_info *ri)
6636-{
6637- stbi__png p;
6638- p.s = s;
6639- return stbi__do_png(&p, x, y, comp, req_comp, ri);
6640-}
6641-
6642-static int
6643-stbi__png_test(stbi__context *s)
6644-{
6645- int r;
6646- r = stbi__check_png_header(s);
6647- stbi__rewind(s);
6648- return r;
6649-}
6650-
6651-static int
6652-stbi__png_info_raw(stbi__png *p, int *x, int *y, int *comp)
6653-{
6654- if (!stbi__parse_png_file(p, STBI__SCAN_header, 0)) {
6655- stbi__rewind(p->s);
6656- return 0;
6657- }
6658- if (x) {
6659- *x = p->s->img_x;
6660- }
6661- if (y) {
6662- *y = p->s->img_y;
6663- }
6664- if (comp) {
6665- *comp = p->s->img_n;
6666- }
6667- return 1;
6668-}
6669-
6670-static int
6671-stbi__png_info(stbi__context *s, int *x, int *y, int *comp)
6672-{
6673- stbi__png p;
6674- p.s = s;
6675- return stbi__png_info_raw(&p, x, y, comp);
6676-}
6677-
6678-static int
6679-stbi__png_is16(stbi__context *s)
6680-{
6681- stbi__png p;
6682- p.s = s;
6683- if (!stbi__png_info_raw(&p, NULL, NULL, NULL)) {
6684- return 0;
6685- }
6686- if (p.depth != 16) {
6687- stbi__rewind(p.s);
6688- return 0;
6689- }
6690- return 1;
6691-}
6692-#endif
6693-
6694-// Microsoft/Windows BMP image
6695-
6696-#ifndef STBI_NO_BMP
6697-static int
6698-stbi__bmp_test_raw(stbi__context *s)
6699-{
6700- int r;
6701- int sz;
6702- if (stbi__get8(s) != 'B') {
6703- return 0;
6704- }
6705- if (stbi__get8(s) != 'M') {
6706- return 0;
6707- }
6708- stbi__get32le(s); // discard filesize
6709- stbi__get16le(s); // discard reserved
6710- stbi__get16le(s); // discard reserved
6711- stbi__get32le(s); // discard data offset
6712- sz = stbi__get32le(s);
6713- r = (sz == 12 || sz == 40 || sz == 56 || sz == 108 || sz == 124);
6714- return r;
6715-}
6716-
6717-static int
6718-stbi__bmp_test(stbi__context *s)
6719-{
6720- int r = stbi__bmp_test_raw(s);
6721- stbi__rewind(s);
6722- return r;
6723-}
6724-
6725-// returns 0..31 for the highest set bit, or -1 if z == 0
6726-static int
6727-stbi__high_bit(unsigned int z)
6728-{
6729- int n = 0;
6730- if (z == 0) {
6731- return -1;
6732- }
6733- if (z >= 0x10000) {
6734- n += 16;
6735- z >>= 16;
6736- }
6737- if (z >= 0x00100) {
6738- n += 8;
6739- z >>= 8;
6740- }
6741- if (z >= 0x00010) {
6742- n += 4;
6743- z >>= 4;
6744- }
6745- if (z >= 0x00004) {
6746- n += 2;
6747- z >>= 2;
6748- }
6749- if (z >= 0x00002) {
6750- n += 1; /* >>= 1;*/
6751- }
6752- return n;
6753-}
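`stbi__high_bit` is a binary search. In turn, it checks whether anything is set above the low 16, 8, 4, 2 and 1 bits, shifts those bits down when they are, and adds the shift amount to the result. The same idea is written as a loop below, assuming a 32-bit `unsigned int`; `high_bit` is a hypothetical name.

```c
#include <assert.h>

/* Index of the highest set bit (0..31), or -1 for zero. */
static int high_bit(unsigned int z)
{
    int n = 0;
    int step;
    if (z == 0)
        return -1;
    for (step = 16; step > 0; step >>= 1) {
        if (z >> step) { /* something set in the upper part */
            n += step;
            z >>= step;
        }
    }
    return n;
}
```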
6754-
6755-static int
6756-stbi__bitcount(unsigned int a)
6757-{
6758- a = (a & 0x55555555) + ((a >> 1) & 0x55555555); // max 2
6759- a = (a & 0x33333333) + ((a >> 2) & 0x33333333); // max 4
6760- a = (a + (a >> 4)) & 0x0f0f0f0f; // max 8 per 4, now 8 bits
6761- a = (a + (a >> 8)); // max 16 per 8 bits
6762- a = (a + (a >> 16)); // max 32 per 8 bits
6763- return a & 0xff;
6764-}
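`stbi__bitcount` is a branch-free SWAR population count. It sums adjacent 1-bit fields into 2-bit fields, then 4-bit and 8-bit fields, and finally folds the four byte counts into the low byte. The sketch below copies that reduction and cross-checks it against a simpler loop that clears the lowest set bit on each iteration, which is a different and slower technique. Both names are hypothetical, and a 32-bit `unsigned int` is assumed.

```c
#include <assert.h>

/* Same reduction as stbi__bitcount. */
static int bitcount_swar(unsigned int a)
{
    a = (a & 0x55555555u) + ((a >> 1) & 0x55555555u); /* 2-bit sums, max 2 */
    a = (a & 0x33333333u) + ((a >> 2) & 0x33333333u); /* 4-bit sums, max 4 */
    a = (a + (a >> 4)) & 0x0f0f0f0fu;                  /* 8-bit sums, max 8 */
    a = a + (a >> 8);                                  /* fold byte counts */
    a = a + (a >> 16);
    return (int)(a & 0xffu); /* low byte holds the total, max 32 */
}

/* Reference version: clear the lowest set bit until none remain. */
static int bitcount_loop(unsigned int v)
{
    int n = 0;
    while (v) {
        v &= v - 1u;
        ++n;
    }
    return n;
}
```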
6765-
6766-// extract an arbitrarily-aligned N-bit value (N=bits)
6767-// from v, and then make it 8-bits long and fractionally
6768-// extend it to the full range.
6769-static int
6770-stbi__shiftsigned(unsigned int v, int shift, int bits)
6771-{
6772- static unsigned int mul_table[9] = {
6773- 0,
6774- 0xff /*0b11111111*/,
6775- 0x55 /*0b01010101*/,
6776- 0x49 /*0b01001001*/,
6777- 0x11 /*0b00010001*/,
6778- 0x21 /*0b00100001*/,
6779- 0x41 /*0b01000001*/,
6780- 0x81 /*0b10000001*/,
6781- 0x01 /*0b00000001*/,
6782- };
6783- static unsigned int shift_table[9] = {
6784- 0, 0, 0, 1, 0, 2, 4, 6, 0,
6785- };
6786- if (shift < 0) {
6787- v <<= -shift;
6788- } else {
6789- v >>= shift;
6790- }
6791- STBI_ASSERT(v < 256);
6792- v >>= (8 - bits);
6793- STBI_ASSERT(bits >= 0 && bits <= 8);
6794- return (int)((unsigned)v * mul_table[bits]) >> shift_table[bits];
6795-}
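The `mul_table`/`shift_table` pair in `stbi__shiftsigned` widens an N-bit channel to 8 bits by bit replication. Multiplying by a pattern such as `0x21` (`0b00100001`) lays copies of the value side by side, and the final shift keeps the top 8 bits, so 0 stays 0 and the all-ones value becomes 255. Below is an explicit loop version with a hypothetical name. For `bits` from 1 to 8 it yields the same results as the tables; for example, the 5-bit value 16 becomes 132 either way. Unlike the table, it does not accept `bits == 0`.

```c
#include <assert.h>

/* Widen an N-bit value (1 <= bits <= 8) to 8 bits by repeating its pattern. */
static unsigned char replicate_to_8(unsigned v, int bits)
{
    unsigned out = 0;
    int filled = 0;
    assert(bits >= 1 && bits <= 8 && v < (1u << bits));
    while (filled < 8) {
        out = (out << bits) | v; /* append another copy of v */
        filled += bits;
    }
    return (unsigned char)(out >> (filled - 8)); /* keep the top 8 bits */
}
```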
6796-
6797-typedef struct {
6798- int bpp, offset, hsz;
6799- unsigned int mr, mg, mb, ma, all_a;
6800- int extra_read;
6801-} stbi__bmp_data;
6802-
6803-static int
6804-stbi__bmp_set_mask_defaults(stbi__bmp_data *info, int compress)
6805-{
6806- // BI_BITFIELDS specifies masks explicitly, don't override
6807- if (compress == 3) {
6808- return 1;
6809- }
6810-
6811- if (compress == 0) {
6812- if (info->bpp == 16) {
6813- info->mr = 31u << 10;
6814- info->mg = 31u << 5;
6815- info->mb = 31u << 0;
6816- } else if (info->bpp == 32) {
6817- info->mr = 0xffu << 16;
6818- info->mg = 0xffu << 8;
6819- info->mb = 0xffu << 0;
6820- info->ma = 0xffu << 24;
6821- info->all_a = 0; // if all_a is 0 at end, then we loaded alpha
6822- // channel but it was all 0
6823- } else {
6824- // otherwise, use defaults, which is all-0
6825- info->mr = info->mg = info->mb = info->ma = 0;
6826- }
6827- return 1;
6828- }
6829- return 0; // error
6830-}
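With `BI_RGB` compression and 16 bits per pixel, the defaults set above describe an X1R5G5B5 layout: five bits each for red, green and blue, with the top bit unused. The sketch below decodes one such pixel using those masks. `bmp555_to_rgb8` is a hypothetical name. Widening with `(v << 3) | (v >> 2)` is the 5-bit case of the bit replication that `stbi__shiftsigned` performs.

```c
#include <assert.h>

/* Decode one 16-bit BI_RGB pixel using the default masks. */
static void bmp555_to_rgb8(unsigned short px, unsigned char rgb[3])
{
    const unsigned masks[3] = {31u << 10, 31u << 5, 31u << 0};
    const int shifts[3] = {10, 5, 0};
    int c;
    assert(rgb != 0);
    for (c = 0; c < 3; ++c) {
        unsigned v = ((unsigned)px & masks[c]) >> shifts[c]; /* 0..31 */
        rgb[c] = (unsigned char)((v << 3) | (v >> 2));       /* to 0..255 */
    }
}
```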
6831-
6832-static void *
6833-stbi__bmp_parse_header(stbi__context *s, stbi__bmp_data *info)
6834-{
6835- int hsz;
6836- if (stbi__get8(s) != 'B' || stbi__get8(s) != 'M') {
6837- return stbi__errpuc("not BMP", "Corrupt BMP");
6838- }
6839- stbi__get32le(s); // discard filesize
6840- stbi__get16le(s); // discard reserved
6841- stbi__get16le(s); // discard reserved
6842- info->offset = stbi__get32le(s);
6843- info->hsz = hsz = stbi__get32le(s);
6844- info->mr = info->mg = info->mb = info->ma = 0;
6845- info->extra_read = 14;
6846-
6847- if (info->offset < 0) {
6848- return stbi__errpuc("bad BMP", "bad BMP");
6849- }
6850-
6851- if (hsz != 12 && hsz != 40 && hsz != 56 && hsz != 108 && hsz != 124) {
6852- return stbi__errpuc("unknown BMP", "BMP type not supported: unknown");
6853- }
6854- if (hsz == 12) {
6855- s->img_x = stbi__get16le(s);
6856- s->img_y = stbi__get16le(s);
6857- } else {
6858- s->img_x = stbi__get32le(s);
6859- s->img_y = stbi__get32le(s);
6860- }
6861- if (stbi__get16le(s) != 1) {
6862- return stbi__errpuc("bad BMP", "bad BMP");
6863- }
6864- info->bpp = stbi__get16le(s);
6865- if (hsz != 12) {
6866- int compress = stbi__get32le(s);
6867- if (compress == 1 || compress == 2) {
6868- return stbi__errpuc("BMP RLE", "BMP type not supported: RLE");
6869- }
6870- if (compress >= 4) {
6871- return stbi__errpuc(
6872- "BMP JPEG/PNG",
6873- "BMP type not supported: unsupported compression"); // this
6874- // includes
6875- // PNG/JPEG
6876- // modes
6877- }
6878- if (compress == 3 && info->bpp != 16 && info->bpp != 32) {
6879- return stbi__errpuc(
6880- "bad BMP", "bad BMP"); // bitfields requires 16 or 32 bits/pixel
6881- }
6882- stbi__get32le(s); // discard sizeof
6883- stbi__get32le(s); // discard hres
6884- stbi__get32le(s); // discard vres
6885- stbi__get32le(s); // discard colorsused
6886- stbi__get32le(s); // discard max important
6887- if (hsz == 40 || hsz == 56) {
6888- if (hsz == 56) {
6889- stbi__get32le(s);
6890- stbi__get32le(s);
6891- stbi__get32le(s);
6892- stbi__get32le(s);
6893- }
6894- if (info->bpp == 16 || info->bpp == 32) {
6895- if (compress == 0) {
6896- stbi__bmp_set_mask_defaults(info, compress);
6897- } else if (compress == 3) {
6898- info->mr = stbi__get32le(s);
6899- info->mg = stbi__get32le(s);
6900- info->mb = stbi__get32le(s);
6901- info->extra_read += 12;
6902- // not documented, but generated by Photoshop and handled by
6903- // mspaint
6904- if (info->mr == info->mg && info->mg == info->mb) {
6905- // ?!?!?
6906- return stbi__errpuc("bad BMP", "bad BMP");
6907- }
6908- } else {
6909- return stbi__errpuc("bad BMP", "bad BMP");
6910- }
6911- }
6912- } else {
6913- // V4/V5 header
6914- int i;
6915- if (hsz != 108 && hsz != 124) {
6916- return stbi__errpuc("bad BMP", "bad BMP");
6917- }
6918- info->mr = stbi__get32le(s);
6919- info->mg = stbi__get32le(s);
6920- info->mb = stbi__get32le(s);
6921- info->ma = stbi__get32le(s);
6922- if (compress != 3) { // override mr/mg/mb unless in BI_BITFIELDS
6923- // mode, as per docs
6924- stbi__bmp_set_mask_defaults(info, compress);
6925- }
6926- stbi__get32le(s); // discard color space
6927- for (i = 0; i < 12; ++i) {
6928- stbi__get32le(s); // discard color space parameters
6929- }
6930- if (hsz == 124) {
6931- stbi__get32le(s); // discard rendering intent
6932- stbi__get32le(s); // discard offset of profile data
6933- stbi__get32le(s); // discard size of profile data
6934- stbi__get32le(s); // discard reserved
6935- }
6936- }
6937- }
6938- return (void *)1;
6939-}
6940-
6941-static void *
6942-stbi__bmp_load(stbi__context *s, int *x, int *y, int *comp, int req_comp,
6943- stbi__result_info *ri)
6944-{
6945- stbi_uc *out;
6946- unsigned int mr = 0, mg = 0, mb = 0, ma = 0, all_a;
6947- stbi_uc pal[256][4];
6948- int psize = 0, i, j, width;
6949- int flip_vertically, pad, target;
6950- stbi__bmp_data info;
6951- STBI_NOTUSED(ri);
6952-
6953- info.all_a = 255;
6954- if (stbi__bmp_parse_header(s, &info) == NULL) {
6955- return NULL; // error code already set
6956- }
6957-
6958- flip_vertically = ((int)s->img_y) > 0;
6959- s->img_y = abs((int)s->img_y);
6960-
6961- if (s->img_y > STBI_MAX_DIMENSIONS) {
6962- return stbi__errpuc("too large", "Very large image (corrupt?)");
6963- }
6964- if (s->img_x > STBI_MAX_DIMENSIONS) {
6965- return stbi__errpuc("too large", "Very large image (corrupt?)");
6966- }
6967-
6968- mr = info.mr;
6969- mg = info.mg;
6970- mb = info.mb;
6971- ma = info.ma;
6972- all_a = info.all_a;
6973-
6974- if (info.hsz == 12) {
6975- if (info.bpp < 24) {
6976- psize = (info.offset - info.extra_read - 24) / 3;
6977- }
6978- } else {
6979- if (info.bpp < 16) {
6980- psize = (info.offset - info.extra_read - info.hsz) >> 2;
6981- }
6982- }
6983- if (psize == 0) {
6984- // accept some number of extra bytes after the header, but if the offset
6985- // points either to before the header ends or implies a large amount of
6986- // extra data, reject the file as malformed
6987- int bytes_read_so_far = s->callback_already_read +
6988- (int)(s->img_buffer - s->img_buffer_original);
6989- int header_limit =
6990- 1024; // max we actually read is below 256 bytes currently.
6991- int extra_data_limit =
6992- 256 * 4; // what ordinarily goes here is a palette; 256 entries*4
6993- // bytes is its max size.
6994- if (bytes_read_so_far <= 0 || bytes_read_so_far > header_limit) {
6995- return stbi__errpuc("bad header", "Corrupt BMP");
6996- }
6997- // we established that bytes_read_so_far is positive and sensible.
6998- // the first half of this test rejects offsets that are either positive
6999- // but too small, or negative, and guarantees that info.offset >=
7000- // bytes_read_so_far > 0. this in turn ensures the number computed in
7001- // the second half of the test can't overflow.
7002- if (info.offset < bytes_read_so_far ||
7003- info.offset - bytes_read_so_far > extra_data_limit) {
7004- return stbi__errpuc("bad offset", "Corrupt BMP");
7005- } else {
7006- stbi__skip(s, info.offset - bytes_read_so_far);
7007- }
7008- }
7009-
7010- if (info.bpp == 24 && ma == 0xff000000) {
7011- s->img_n = 3;
7012- } else {
7013- s->img_n = ma ? 4 : 3;
7014- }
7015- if (req_comp && req_comp >= 3) { // we can directly decode 3 or 4
7016- target = req_comp;
7017- } else {
7018- target = s->img_n; // if they want monochrome, we'll post-convert
7019- }
7020-
7021- // sanity-check size
7022- if (!stbi__mad3sizes_valid(target, s->img_x, s->img_y, 0)) {
7023- return stbi__errpuc("too large", "Corrupt BMP");
7024- }
7025-
7026- out = (stbi_uc *)stbi__malloc_mad3(target, s->img_x, s->img_y, 0);
7027- if (!out) {
7028- return stbi__errpuc("outofmem", "Out of memory");
7029- }
7030- if (info.bpp < 16) {
7031- int z = 0;
7032- if (psize == 0 || psize > 256) {
7033- STBI_FREE(out);
7034- return stbi__errpuc("invalid", "Corrupt BMP");
7035- }
7036- for (i = 0; i < psize; ++i) {
7037- pal[i][2] = stbi__get8(s);
7038- pal[i][1] = stbi__get8(s);
7039- pal[i][0] = stbi__get8(s);
7040- if (info.hsz != 12) {
7041- stbi__get8(s);
7042- }
7043- pal[i][3] = 255;
7044- }
7045- stbi__skip(s, info.offset - info.extra_read - info.hsz -
7046- psize * (info.hsz == 12 ? 3 : 4));
7047- if (info.bpp == 1) {
7048- width = (s->img_x + 7) >> 3;
7049- } else if (info.bpp == 4) {
7050- width = (s->img_x + 1) >> 1;
7051- } else if (info.bpp == 8) {
7052- width = s->img_x;
7053- } else {
7054- STBI_FREE(out);
7055- return stbi__errpuc("bad bpp", "Corrupt BMP");
7056- }
7057- pad = (-width) & 3;
7058- if (info.bpp == 1) {
7059- for (j = 0; j < (int)s->img_y; ++j) {
7060- int bit_offset = 7, v = stbi__get8(s);
7061- for (i = 0; i < (int)s->img_x; ++i) {
7062- int color = (v >> bit_offset) & 0x1;
7063- out[z++] = pal[color][0];
7064- out[z++] = pal[color][1];
7065- out[z++] = pal[color][2];
7066- if (target == 4) {
7067- out[z++] = 255;
7068- }
7069- if (i + 1 == (int)s->img_x) {
7070- break;
7071- }
7072- if ((--bit_offset) < 0) {
7073- bit_offset = 7;
7074- v = stbi__get8(s);
7075- }
7076- }
7077- stbi__skip(s, pad);
7078- }
7079- } else {
7080- for (j = 0; j < (int)s->img_y; ++j) {
7081- for (i = 0; i < (int)s->img_x; i += 2) {
7082- int v = stbi__get8(s), v2 = 0;
7083- if (info.bpp == 4) {
7084- v2 = v & 15;
7085- v >>= 4;
7086- }
7087- out[z++] = pal[v][0];
7088- out[z++] = pal[v][1];
7089- out[z++] = pal[v][2];
7090- if (target == 4) {
7091- out[z++] = 255;
7092- }
7093- if (i + 1 == (int)s->img_x) {
7094- break;
7095- }
7096- v = (info.bpp == 8) ? stbi__get8(s) : v2;
7097- out[z++] = pal[v][0];
7098- out[z++] = pal[v][1];
7099- out[z++] = pal[v][2];
7100- if (target == 4) {
7101- out[z++] = 255;
7102- }
7103- }
7104- stbi__skip(s, pad);
7105- }
7106- }
7107- } else {
7108- int rshift = 0, gshift = 0, bshift = 0, ashift = 0, rcount = 0,
7109- gcount = 0, bcount = 0, acount = 0;
7110- int z = 0;
7111- int easy = 0;
7112- stbi__skip(s, info.offset - info.extra_read - info.hsz);
7113- if (info.bpp == 24) {
7114- width = 3 * s->img_x;
7115- } else if (info.bpp == 16) {
7116- width = 2 * s->img_x;
7117- } else { /* bpp = 32 and pad = 0 */
7118- width = 0;
7119- }
7120- pad = (-width) & 3;
7121- if (info.bpp == 24) {
7122- easy = 1;
7123- } else if (info.bpp == 32) {
7124- if (mb == 0xff && mg == 0xff00 && mr == 0x00ff0000 &&
7125- ma == 0xff000000) {
7126- easy = 2;
7127- }
7128- }
7129- if (!easy) {
7130- if (!mr || !mg || !mb) {
7131- STBI_FREE(out);
7132- return stbi__errpuc("bad masks", "Corrupt BMP");
7133- }
7134- // right shift amt to put high bit in position #7
7135- rshift = stbi__high_bit(mr) - 7;
7136- rcount = stbi__bitcount(mr);
7137- gshift = stbi__high_bit(mg) - 7;
7138- gcount = stbi__bitcount(mg);
7139- bshift = stbi__high_bit(mb) - 7;
7140- bcount = stbi__bitcount(mb);
7141- ashift = stbi__high_bit(ma) - 7;
7142- acount = stbi__bitcount(ma);
7143- if (rcount > 8 || gcount > 8 || bcount > 8 || acount > 8) {
7144- STBI_FREE(out);
7145- return stbi__errpuc("bad masks", "Corrupt BMP");
7146- }
7147- }
7148- for (j = 0; j < (int)s->img_y; ++j) {
7149- if (easy) {
7150- for (i = 0; i < (int)s->img_x; ++i) {
7151- unsigned char a;
7152- out[z + 2] = stbi__get8(s);
7153- out[z + 1] = stbi__get8(s);
7154- out[z + 0] = stbi__get8(s);
7155- z += 3;
7156- a = (easy == 2 ? stbi__get8(s) : 255);
7157- all_a |= a;
7158- if (target == 4) {
7159- out[z++] = a;
7160- }
7161- }
7162- } else {
7163- int bpp = info.bpp;
7164- for (i = 0; i < (int)s->img_x; ++i) {
7165- stbi__uint32 v = (bpp == 16 ? (stbi__uint32)stbi__get16le(s)
7166- : stbi__get32le(s));
7167- unsigned int a;
7168- out[z++] = STBI__BYTECAST(
7169- stbi__shiftsigned(v & mr, rshift, rcount));
7170- out[z++] = STBI__BYTECAST(
7171- stbi__shiftsigned(v & mg, gshift, gcount));
7172- out[z++] = STBI__BYTECAST(
7173- stbi__shiftsigned(v & mb, bshift, bcount));
7174- a = (ma ? stbi__shiftsigned(v & ma, ashift, acount) : 255);
7175- all_a |= a;
7176- if (target == 4) {
7177- out[z++] = STBI__BYTECAST(a);
7178- }
7179- }
7180- }
7181- stbi__skip(s, pad);
7182- }
7183- }
7184-
7185- // if alpha channel is all 0s, replace with all 255s
7186- if (target == 4 && all_a == 0) {
7187- for (i = 4 * s->img_x * s->img_y - 1; i >= 0; i -= 4) {
7188- out[i] = 255;
7189- }
7190- }
7191-
7192- if (flip_vertically) {
7193- stbi_uc t;
7194- for (j = 0; j < (int)s->img_y >> 1; ++j) {
7195- stbi_uc *p1 = out + j * s->img_x * target;
7196- stbi_uc *p2 = out + (s->img_y - 1 - j) * s->img_x * target;
7197- for (i = 0; i < (int)s->img_x * target; ++i) {
7198- t = p1[i];
7199- p1[i] = p2[i];
7200- p2[i] = t;
7201- }
7202- }
7203- }
7204-
7205- if (req_comp && req_comp != target) {
7206- out = stbi__convert_format(out, target, req_comp, s->img_x, s->img_y);
7207- if (out == NULL) {
7208- return out; // stbi__convert_format frees input on failure
7209- }
7210- }
7211-
7212- *x = s->img_x;
7213- *y = s->img_y;
7214- if (comp) {
7215- *comp = s->img_n;
7216- }
7217- return out;
7218-}
7219-#endif
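The BMP loader relies on two layout rules. First, every pixel row is padded to a multiple of 4 bytes (`pad = (-width) & 3`). Second, a positive height means rows are stored bottom-up, which is why the image is flipped at the end. The sketch below covers both, using hypothetical helper names. It computes the padding in unsigned arithmetic and derives the unpadded row size from the bit depth with one formula. That formula gives the same padding as the per-depth cases in the loader; for 32 bpp the loader sets `width = 0` only to force a padding of 0.

```c
#include <assert.h>
#include <stddef.h>

/* Unpadded bytes per row for the depths the loader accepts. */
static unsigned bmp_row_bytes(unsigned width, int bpp)
{
    assert(bpp == 1 || bpp == 4 || bpp == 8 || bpp == 16 || bpp == 24 ||
           bpp == 32);
    return (width * (unsigned)bpp + 7u) / 8u;
}

/* Padding that brings a row up to the next multiple of 4 bytes. */
static unsigned bmp_row_pad(unsigned row_bytes)
{
    return (0u - row_bytes) & 3u; /* unsigned wraparound is well-defined */
}

/* Reverse row order in place (bottom-up storage to top-down image). */
static void flip_rows(unsigned char *pixels, size_t row_bytes, size_t rows)
{
    size_t j, i;
    for (j = 0; j < rows / 2; ++j) {
        unsigned char *a = pixels + j * row_bytes;
        unsigned char *b = pixels + (rows - 1 - j) * row_bytes;
        for (i = 0; i < row_bytes; ++i) {
            unsigned char t = a[i];
            a[i] = b[i];
            b[i] = t;
        }
    }
}
```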
7220-
7221-// Targa Truevision - TGA
7222-// by Jonathan Dummer
7223-#ifndef STBI_NO_TGA
7224-// returns the component count (STBI_grey, STBI_grey_alpha, STBI_rgb or STBI_rgb_alpha), 0 on error
7225-static int
7226-stbi__tga_get_comp(int bits_per_pixel, int is_grey, int *is_rgb16)
7227-{
7228- // only RGB or RGBA (incl. 16bit) or grey allowed
7229- if (is_rgb16) {
7230- *is_rgb16 = 0;
7231- }
7232- switch (bits_per_pixel) {
7233- case 8:
7234- return STBI_grey;
7235- case 16:
7236- if (is_grey) {
7237- return STBI_grey_alpha;
7238- }
7239- // fallthrough
7240- case 15:
7241- if (is_rgb16) {
7242- *is_rgb16 = 1;
7243- }
7244- return STBI_rgb;
7245- case 24: // fallthrough
7246- case 32:
7247- return bits_per_pixel / 8;
7248- default:
7249- return 0;
7250- }
7251-}
7252-
7253-static int
7254-stbi__tga_info(stbi__context *s, int *x, int *y, int *comp)
7255-{
7256- int tga_w, tga_h, tga_comp, tga_image_type, tga_bits_per_pixel,
7257- tga_colormap_bpp;
7258- int sz, tga_colormap_type;
7259- stbi__get8(s); // discard Offset
7260- tga_colormap_type = stbi__get8(s); // colormap type
7261- if (tga_colormap_type > 1) {
7262- stbi__rewind(s);
7263- return 0; // only RGB or indexed allowed
7264- }
7265- tga_image_type = stbi__get8(s); // image type
7266- if (tga_colormap_type == 1) { // colormapped (paletted) image
7267- if (tga_image_type != 1 && tga_image_type != 9) {
7268- stbi__rewind(s);
7269- return 0;
7270- }
7271- stbi__skip(
7272- s, 4); // skip index of first colormap entry and number of entries
7273- sz = stbi__get8(s); // check bits per palette color entry
7274- if ((sz != 8) && (sz != 15) && (sz != 16) && (sz != 24) && (sz != 32)) {
7275- stbi__rewind(s);
7276- return 0;
7277- }
7278- stbi__skip(s, 4); // skip image x and y origin
7279- tga_colormap_bpp = sz;
7280- } else { // "normal" image w/o colormap - only RGB or grey allowed, +/- RLE
7281- if ((tga_image_type != 2) && (tga_image_type != 3) &&
7282- (tga_image_type != 10) && (tga_image_type != 11)) {
7283- stbi__rewind(s);
7284- return 0; // only RGB or grey allowed, +/- RLE
7285- }
7286- stbi__skip(s, 9); // skip colormap specification and image x/y origin
7287- tga_colormap_bpp = 0;
7288- }
7289- tga_w = stbi__get16le(s);
7290- if (tga_w < 1) {
7291- stbi__rewind(s);
7292- return 0; // test width
7293- }
7294- tga_h = stbi__get16le(s);
7295- if (tga_h < 1) {
7296- stbi__rewind(s);
7297- return 0; // test height
7298- }
7299- tga_bits_per_pixel = stbi__get8(s); // bits per pixel
7300- stbi__get8(s); // ignore alpha bits
7301- if (tga_colormap_bpp != 0) {
7302- if ((tga_bits_per_pixel != 8) && (tga_bits_per_pixel != 16)) {
7303- // when using a colormap, tga_bits_per_pixel is the size of the
7304- // indexes. I don't think anything but 8- or 16-bit indexes makes sense
7305- stbi__rewind(s);
7306- return 0;
7307- }
7308- tga_comp = stbi__tga_get_comp(tga_colormap_bpp, 0, NULL);
7309- } else {
7310- tga_comp = stbi__tga_get_comp(
7311- tga_bits_per_pixel, (tga_image_type == 3) || (tga_image_type == 11),
7312- NULL);
7313- }
7314- if (!tga_comp) {
7315- stbi__rewind(s);
7316- return 0;
7317- }
7318- if (x) {
7319- *x = tga_w;
7320- }
7321- if (y) {
7322- *y = tga_h;
7323- }
7324- if (comp) {
7325- *comp = tga_comp;
7326- }
7327- return 1; // seems to have passed everything
7328-}
7329-
7330-static int
7331-stbi__tga_test(stbi__context *s)
7332-{
7333- int res = 0;
7334- int sz, tga_color_type;
7335- stbi__get8(s); // discard Offset
7336- tga_color_type = stbi__get8(s); // color type
7337- if (tga_color_type > 1) {
7338- goto errorEnd; // only RGB or indexed allowed
7339- }
7340- sz = stbi__get8(s); // image type
7341- if (tga_color_type == 1) { // colormapped (paletted) image
7342- if (sz != 1 && sz != 9) {
7343- goto errorEnd; // colortype 1 demands image type 1 or 9
7344- }
7345- stbi__skip(
7346- s, 4); // skip index of first colormap entry and number of entries
7347- sz = stbi__get8(s); // check bits per palette color entry
7348- if ((sz != 8) && (sz != 15) && (sz != 16) && (sz != 24) && (sz != 32)) {
7349- goto errorEnd;
7350- }
7351- stbi__skip(s, 4); // skip image x and y origin
7352- } else { // "normal" image w/o colormap
7353- if ((sz != 2) && (sz != 3) && (sz != 10) && (sz != 11)) {
7354- goto errorEnd; // only RGB or grey allowed, +/- RLE
7355- }
7356- stbi__skip(s, 9); // skip colormap specification and image x/y origin
7357- }
7358- if (stbi__get16le(s) < 1) {
7359- goto errorEnd; // test width
7360- }
7361- if (stbi__get16le(s) < 1) {
7362- goto errorEnd; // test height
7363- }
7364- sz = stbi__get8(s); // bits per pixel
7365- if ((tga_color_type == 1) && (sz != 8) && (sz != 16)) {
7366- goto errorEnd; // for colormapped images, bpp is size of an index
7367- }
7368- if ((sz != 8) && (sz != 15) && (sz != 16) && (sz != 24) && (sz != 32)) {
7369- goto errorEnd;
7370- }
7371-
7372- res = 1; // if we got this far, everything's good and we can return 1
7373- // instead of 0
7374-
7375-errorEnd:
7376- stbi__rewind(s);
7377- return res;
7378-}
7379-
7380-// read 16bit value and convert to 24bit RGB
7381-static void
7382-stbi__tga_read_rgb16(stbi__context *s, stbi_uc *out)
7383-{
7384- stbi__uint16 px = (stbi__uint16)stbi__get16le(s);
7385- stbi__uint16 fiveBitMask = 31;
7386- // we have 3 channels with 5bits each
7387- int r = (px >> 10) & fiveBitMask;
7388- int g = (px >> 5) & fiveBitMask;
7389- int b = px & fiveBitMask;
7390- // Note that this saves the data in RGB(A) order, so it doesn't need to be
7391- // swapped later
7392- out[0] = (stbi_uc)((r * 255) / 31);
7393- out[1] = (stbi_uc)((g * 255) / 31);
7394- out[2] = (stbi_uc)((b * 255) / 31);
7395-
7396- // some people claim that the most significant bit might be used for alpha
7397- // (possibly if an alpha-bit is set in the "image descriptor byte")
7398- // but that only made 16bit test images completely translucent,
7399- // so let's treat all 15 and 16bit TGAs as RGB with no alpha.
7400-}
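The 5-5-5 unpacking in stbi__tga_read_rgb16 can be pulled out into a standalone helper. This is a minimal sketch that assumes the 16-bit value has already been assembled from its two little-endian bytes. `tga_rgb16_to_rgb8` is a hypothetical name and is not part of stb_image. Like the loader, it ignores bit 15.

```c
#include <stdint.h>

/* Hypothetical helper (not stb_image API): expand a 16-bit TGA pixel laid
   out as xRRRRRGGGGGBBBBB into 8-bit R, G, B. Bit 15 is ignored, matching
   the loader's decision to treat 15/16-bit TGAs as RGB with no alpha. */
static void tga_rgb16_to_rgb8(uint16_t px, unsigned char out[3])
{
    const unsigned mask = 0x1F;          /* five bits per channel */
    unsigned r = (px >> 10) & mask;
    unsigned g = (px >> 5) & mask;
    unsigned b = px & mask;
    /* Scale 0..31 onto 0..255 so that 31 maps exactly to 255. */
    out[0] = (unsigned char)((r * 255) / 31);
    out[1] = (unsigned char)((g * 255) / 31);
    out[2] = (unsigned char)((b * 255) / 31);
}
```

For example, the file bytes 0x1F 0x00 give px = 0x001F, which expands to pure blue (0, 0, 255).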
7401-
7402-static void *
7403-stbi__tga_load(stbi__context *s, int *x, int *y, int *comp, int req_comp,
7404- stbi__result_info *ri)
7405-{
7406- // read in the TGA header stuff
7407- int tga_offset = stbi__get8(s);
7408- int tga_indexed = stbi__get8(s);
7409- int tga_image_type = stbi__get8(s);
7410- int tga_is_RLE = 0;
7411- int tga_palette_start = stbi__get16le(s);
7412- int tga_palette_len = stbi__get16le(s);
7413- int tga_palette_bits = stbi__get8(s);
7414- int tga_x_origin = stbi__get16le(s);
7415- int tga_y_origin = stbi__get16le(s);
7416- int tga_width = stbi__get16le(s);
7417- int tga_height = stbi__get16le(s);
7418- int tga_bits_per_pixel = stbi__get8(s);
7419- int tga_comp, tga_rgb16 = 0;
7420- int tga_inverted = stbi__get8(s);
7421- // int tga_alpha_bits = tga_inverted & 15; // the 4 lowest bits - unused
7422- // (useless?)
7423- // image data
7424- unsigned char *tga_data;
7425- unsigned char *tga_palette = NULL;
7426- int i, j;
7427- unsigned char raw_data[4] = {0};
7428- int RLE_count = 0;
7429- int RLE_repeating = 0;
7430- int read_next_pixel = 1;
7431- STBI_NOTUSED(ri);
7432- STBI_NOTUSED(tga_x_origin); // @TODO
7433- STBI_NOTUSED(tga_y_origin); // @TODO
7434-
7435- if (tga_height > STBI_MAX_DIMENSIONS) {
7436- return stbi__errpuc("too large", "Very large image (corrupt?)");
7437- }
7438- if (tga_width > STBI_MAX_DIMENSIONS) {
7439- return stbi__errpuc("too large", "Very large image (corrupt?)");
7440- }
7441-
7442- // do a tiny bit of processing
7443- if (tga_image_type >= 8) {
7444- tga_image_type -= 8;
7445- tga_is_RLE = 1;
7446- }
7447- tga_inverted = 1 - ((tga_inverted >> 5) & 1);
7448-
7449- // If I'm paletted, then I'll use the number of bits from the palette
7450- if (tga_indexed) {
7451- tga_comp = stbi__tga_get_comp(tga_palette_bits, 0, &tga_rgb16);
7452- } else {
7453- tga_comp = stbi__tga_get_comp(tga_bits_per_pixel, (tga_image_type == 3),
7454- &tga_rgb16);
7455- }
7456-
7457- if (!tga_comp) { // shouldn't really happen, stbi__tga_test() should have
7458- // ensured basic consistency
7459- return stbi__errpuc("bad format", "Can't find out TGA pixelformat");
7460- }
7461-
7462- // tga info
7463- *x = tga_width;
7464- *y = tga_height;
7465- if (comp) {
7466- *comp = tga_comp;
7467- }
7468-
7469- if (!stbi__mad3sizes_valid(tga_width, tga_height, tga_comp, 0)) {
7470- return stbi__errpuc("too large", "Corrupt TGA");
7471- }
7472-
7473- tga_data =
7474- (unsigned char *)stbi__malloc_mad3(tga_width, tga_height, tga_comp, 0);
7475- if (!tga_data) {
7476- return stbi__errpuc("outofmem", "Out of memory");
7477- }
7478-
7479- // skip to the data's starting position (offset usually = 0)
7480- stbi__skip(s, tga_offset);
7481-
7482- if (!tga_indexed && !tga_is_RLE && !tga_rgb16) {
7483- for (i = 0; i < tga_height; ++i) {
7484- int row = tga_inverted ? tga_height - i - 1 : i;
7485- stbi_uc *tga_row = tga_data + row * tga_width * tga_comp;
7486- stbi__getn(s, tga_row, tga_width * tga_comp);
7487- }
7488- } else {
7489- // do I need to load a palette?
7490- if (tga_indexed) {
7491- if (tga_palette_len ==
7492- 0) { /* you have to have at least one entry! */
7493- STBI_FREE(tga_data);
7494- return stbi__errpuc("bad palette", "Corrupt TGA");
7495- }
7496-
7497- // any data to skip? (offset usually = 0)
7498- stbi__skip(s, tga_palette_start);
7499- // load the palette
7500- tga_palette = (unsigned char *)stbi__malloc_mad2(tga_palette_len,
7501- tga_comp, 0);
7502- if (!tga_palette) {
7503- STBI_FREE(tga_data);
7504- return stbi__errpuc("outofmem", "Out of memory");
7505- }
7506- if (tga_rgb16) {
7507- stbi_uc *pal_entry = tga_palette;
7508- STBI_ASSERT(tga_comp == STBI_rgb);
7509- for (i = 0; i < tga_palette_len; ++i) {
7510- stbi__tga_read_rgb16(s, pal_entry);
7511- pal_entry += tga_comp;
7512- }
7513- } else if (!stbi__getn(s, tga_palette,
7514- tga_palette_len * tga_comp)) {
7515- STBI_FREE(tga_data);
7516- STBI_FREE(tga_palette);
7517- return stbi__errpuc("bad palette", "Corrupt TGA");
7518- }
7519- }
7520- // load the data
7521- for (i = 0; i < tga_width * tga_height; ++i) {
7522- // if I'm in RLE mode, do I need to read a new RLE packet header?
7523- if (tga_is_RLE) {
7524- if (RLE_count == 0) {
7525- // yep, get the next byte as a RLE command
7526- int RLE_cmd = stbi__get8(s);
7527- RLE_count = 1 + (RLE_cmd & 127);
7528- RLE_repeating = RLE_cmd >> 7;
7529- read_next_pixel = 1;
7530- } else if (!RLE_repeating) {
7531- read_next_pixel = 1;
7532- }
7533- } else {
7534- read_next_pixel = 1;
7535- }
7536- // OK, if I need to read a pixel, do it now
7537- if (read_next_pixel) {
7538- // read one pixel: a palette index, a 16-bit RGB value, or raw bytes
7539- if (tga_indexed) {
7540- // read in index, then perform the lookup
7541- int pal_idx = (tga_bits_per_pixel == 8) ? stbi__get8(s)
7542- : stbi__get16le(s);
7543- if (pal_idx >= tga_palette_len) {
7544- // invalid index
7545- pal_idx = 0;
7546- }
7547- pal_idx *= tga_comp;
7548- for (j = 0; j < tga_comp; ++j) {
7549- raw_data[j] = tga_palette[pal_idx + j];
7550- }
7551- } else if (tga_rgb16) {
7552- STBI_ASSERT(tga_comp == STBI_rgb);
7553- stbi__tga_read_rgb16(s, raw_data);
7554- } else {
7555- // read in the data raw
7556- for (j = 0; j < tga_comp; ++j) {
7557- raw_data[j] = stbi__get8(s);
7558- }
7559- }
7560- // clear the reading flag for the next pixel
7561- read_next_pixel = 0;
7562- } // end of reading a pixel
7563-
7564- // copy data
7565- for (j = 0; j < tga_comp; ++j) {
7566- tga_data[i * tga_comp + j] = raw_data[j];
7567- }
7568-
7569- // in case we're in RLE mode, keep counting down
7570- --RLE_count;
7571- }
7572- // do I need to invert the image?
7573- if (tga_inverted) {
7574- for (j = 0; j * 2 < tga_height; ++j) {
7575- int index1 = j * tga_width * tga_comp;
7576- int index2 = (tga_height - 1 - j) * tga_width * tga_comp;
7577- for (i = tga_width * tga_comp; i > 0; --i) {
7578- unsigned char temp = tga_data[index1];
7579- tga_data[index1] = tga_data[index2];
7580- tga_data[index2] = temp;
7581- ++index1;
7582- ++index2;
7583- }
7584- }
7585- }
7586- // clear my palette, if I had one
7587- if (tga_palette != NULL) {
7588- STBI_FREE(tga_palette);
7589- }
7590- }
7591-
7592- // swap RGB - if the source data was RGB16, it already is in the right order
7593- if (tga_comp >= 3 && !tga_rgb16) {
7594- unsigned char *tga_pixel = tga_data;
7595- for (i = 0; i < tga_width * tga_height; ++i) {
7596- unsigned char temp = tga_pixel[0];
7597- tga_pixel[0] = tga_pixel[2];
7598- tga_pixel[2] = temp;
7599- tga_pixel += tga_comp;
7600- }
7601- }
7602-
7603- // convert to target component count
7604- if (req_comp && req_comp != tga_comp) {
7605- tga_data = stbi__convert_format(tga_data, tga_comp, req_comp, tga_width,
7606- tga_height);
7607- }
7608-
7609- // the things I do to get rid of an error message, and yet keep
7610- // Microsoft's C compilers happy... [8^(
7611- tga_palette_start = tga_palette_len = tga_palette_bits = tga_x_origin =
7612- tga_y_origin = 0;
7613- STBI_NOTUSED(tga_palette_start);
7614- // OK, done
7615- return tga_data;
7616-}
7617-#endif
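The RLE branch of stbi__tga_load keeps its packet state (RLE_count, RLE_repeating) across pixels. The same packet format can also be decoded in a single pass over an in-memory buffer. In this sketch the whole compressed payload is assumed to be in memory, and `tga_rle_decode` is a hypothetical helper. Unlike the loader, it reports truncated input by returning 0.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not stb_image API): decode TGA RLE packets.
   Header byte: bit 7 set = run packet (one pixel repeated),
   clear = raw packet (pixels copied verbatim); low 7 bits + 1 = count.
   Returns the number of pixels written, or 0 if `src` runs out. */
static size_t tga_rle_decode(const unsigned char *src, size_t src_len,
                             unsigned char *dst, size_t pixel_count,
                             size_t bytes_pp)
{
    size_t in = 0, out = 0;
    while (out < pixel_count) {
        if (in >= src_len) return 0;
        unsigned char hdr = src[in++];
        size_t count = (size_t)(hdr & 0x7F) + 1;
        int repeat = hdr >> 7;
        if (count > pixel_count - out)   /* clamp a packet that overruns */
            count = pixel_count - out;
        if (repeat) {
            if (in + bytes_pp > src_len) return 0;
            for (size_t i = 0; i < count; ++i)
                memcpy(dst + (out + i) * bytes_pp, src + in, bytes_pp);
            in += bytes_pp;
        } else {
            if (in + count * bytes_pp > src_len) return 0;
            memcpy(dst + out * bytes_pp, src + in, count * bytes_pp);
            in += count * bytes_pp;
        }
        out += count;
    }
    return out;
}
```

Packets may span scanlines. That is why both this sketch and the loader treat the image as one flat run of `width * height` pixels.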
7618-
7619-// *************************************************************************************************
7620-// Photoshop PSD loader -- PD by Thatcher Ulrich, integration by Nicolas Schulz,
7621-// tweaked by STB
7622-
7623-#ifndef STBI_NO_PSD
7624-static int
7625-stbi__psd_test(stbi__context *s)
7626-{
7627- int r = (stbi__get32be(s) == 0x38425053);
7628- stbi__rewind(s);
7629- return r;
7630-}
7631-
7632-static int
7633-stbi__psd_decode_rle(stbi__context *s, stbi_uc *p, int pixelCount)
7634-{
7635- int count, nleft, len;
7636-
7637- count = 0;
7638- while ((nleft = pixelCount - count) > 0) {
7639- len = stbi__get8(s);
7640- if (len == 128) {
7641- // No-op.
7642- } else if (len < 128) {
7643- // Copy next len+1 bytes literally.
7644- len++;
7645- if (len > nleft) {
7646- return 0; // corrupt data
7647- }
7648- count += len;
7649- while (len) {
7650- *p = stbi__get8(s);
7651- p += 4;
7652- len--;
7653- }
7654- } else if (len > 128) {
7655- stbi_uc val;
7656- // Next -len+1 bytes in the dest are replicated from next source
7657- // byte. (Interpret len as a negative 8-bit int.)
7658- len = 257 - len;
7659- if (len > nleft) {
7660- return 0; // corrupt data
7661- }
7662- val = stbi__get8(s);
7663- count += len;
7664- while (len) {
7665- *p = val;
7666- p += 4;
7667- len--;
7668- }
7669- }
7670- }
7671-
7672- return 1;
7673-}
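stbi__psd_decode_rle implements PackBits. It writes every fourth byte so that each channel lands in its slot of the interleaved RGBA buffer. The sketch below applies the same logic to an in-memory buffer and takes the stride as a parameter. `packbits_decode` is a hypothetical name, and the bounds checks on the source buffer are this sketch's own addition.

```c
#include <stddef.h>

/* Hypothetical PackBits decoder mirroring stbi__psd_decode_rle: writes
   `count` output bytes spaced `stride` apart (stride 4 = one channel of an
   interleaved RGBA buffer). Returns 1 on success, 0 on corrupt/short input. */
static int packbits_decode(const unsigned char *src, size_t src_len,
                           unsigned char *dst, int count, int stride)
{
    size_t in = 0;
    int done = 0;
    while (done < count) {
        if (in >= src_len) return 0;
        int len = src[in++];
        if (len == 128) {
            continue;                    /* no-op header byte */
        } else if (len < 128) {
            len += 1;                    /* literal run of len+1 bytes */
            if (len > count - done || in + (size_t)len > src_len) return 0;
            for (int i = 0; i < len; ++i, dst += stride)
                *dst = src[in++];
        } else {
            len = 257 - len;             /* signed -1..-127: repeat 2..128 times */
            if (len > count - done || in >= src_len) return 0;
            unsigned char val = src[in++];
            for (int i = 0; i < len; ++i, dst += stride)
                *dst = val;
        }
        done += len;
    }
    return 1;
}
```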
7674-
7675-static void *
7676-stbi__psd_load(stbi__context *s, int *x, int *y, int *comp, int req_comp,
7677- stbi__result_info *ri, int bpc)
7678-{
7679- int pixelCount;
7680- int channelCount, compression;
7681- int channel, i;
7682- int bitdepth;
7683- int w, h;
7684- stbi_uc *out;
7685- STBI_NOTUSED(ri);
7686-
7687- // Check identifier
7688- if (stbi__get32be(s) != 0x38425053) { // "8BPS"
7689- return stbi__errpuc("not PSD", "Corrupt PSD image");
7690- }
7691-
7692- // Check file type version.
7693- if (stbi__get16be(s) != 1) {
7694- return stbi__errpuc("wrong version",
7695- "Unsupported version of PSD image");
7696- }
7697-
7698- // Skip 6 reserved bytes.
7699- stbi__skip(s, 6);
7700-
7701- // Read the number of channels (R, G, B, A, etc).
7702- channelCount = stbi__get16be(s);
7703- if (channelCount < 0 || channelCount > 16) {
7704- return stbi__errpuc("wrong channel count",
7705- "Unsupported number of channels in PSD image");
7706- }
7707-
7708- // Read the rows and columns of the image.
7709- h = stbi__get32be(s);
7710- w = stbi__get32be(s);
7711-
7712- if (h > STBI_MAX_DIMENSIONS) {
7713- return stbi__errpuc("too large", "Very large image (corrupt?)");
7714- }
7715- if (w > STBI_MAX_DIMENSIONS) {
7716- return stbi__errpuc("too large", "Very large image (corrupt?)");
7717- }
7718-
7719- // Make sure the depth is 8 or 16 bits.
7720- bitdepth = stbi__get16be(s);
7721- if (bitdepth != 8 && bitdepth != 16) {
7722- return stbi__errpuc("unsupported bit depth",
7723- "PSD bit depth is not 8 or 16 bit");
7724- }
7725-
7726- // Make sure the color mode is RGB.
7727- // Valid options are:
7728- // 0: Bitmap
7729- // 1: Grayscale
7730- // 2: Indexed color
7731- // 3: RGB color
7732- // 4: CMYK color
7733- // 7: Multichannel
7734- // 8: Duotone
7735- // 9: Lab color
7736- if (stbi__get16be(s) != 3) {
7737- return stbi__errpuc("wrong color format",
7738- "PSD is not in RGB color format");
7739- }
7740-
7741- // Skip the Mode Data. (It's the palette for indexed color; other info for
7742- // other modes.)
7743- stbi__skip(s, stbi__get32be(s));
7744-
7745- // Skip the image resources. (resolution, pen tool paths, etc)
7746- stbi__skip(s, stbi__get32be(s));
7747-
7748- // Skip the reserved data.
7749- stbi__skip(s, stbi__get32be(s));
7750-
7751- // Find out if the data is compressed.
7752- // Known values:
7753- // 0: no compression
7754- // 1: RLE compressed
7755- compression = stbi__get16be(s);
7756- if (compression > 1) {
7757- return stbi__errpuc("bad compression",
7758- "PSD has an unknown compression format");
7759- }
7760-
7761- // Check size
7762- if (!stbi__mad3sizes_valid(4, w, h, 0)) {
7763- return stbi__errpuc("too large", "Corrupt PSD");
7764- }
7765-
7766- // Create the destination image.
7767-
7768- if (!compression && bitdepth == 16 && bpc == 16) {
7769- out = (stbi_uc *)stbi__malloc_mad3(8, w, h, 0);
7770- ri->bits_per_channel = 16;
7771- } else {
7772- out = (stbi_uc *)stbi__malloc(4 * w * h);
7773- }
7774-
7775- if (!out) {
7776- return stbi__errpuc("outofmem", "Out of memory");
7777- }
7778- pixelCount = w * h;
7779-
7780- // Initialize the data to zero.
7781- // memset( out, 0, pixelCount * 4 );
7782-
7783- // Finally, the image data.
7784- if (compression) {
7785- // RLE as used by .PSD and .TIFF
7786- // Loop until you get the number of unpacked bytes you are expecting:
7787- // Read the next source byte into n.
7788- // If n is between 0 and 127 inclusive, copy the next n+1 bytes
7789- // literally. Else if n is between -127 and -1 inclusive, copy the
7790- // next byte -n+1 times. Else if n is 128, noop.
7791- // Endloop
7792-
7793- // The RLE-compressed data is preceded by a 2-byte data count for each
7794- // row in the data, which we're going to just skip.
7795- stbi__skip(s, h * channelCount * 2);
7796-
7797- // Read the RLE data by channel.
7798- for (channel = 0; channel < 4; channel++) {
7799- stbi_uc *p;
7800-
7801- p = out + channel;
7802- if (channel >= channelCount) {
7803- // Fill this channel with default data.
7804- for (i = 0; i < pixelCount; i++, p += 4) {
7805- *p = (channel == 3 ? 255 : 0);
7806- }
7807- } else {
7808- // Read the RLE data.
7809- if (!stbi__psd_decode_rle(s, p, pixelCount)) {
7810- STBI_FREE(out);
7811- return stbi__errpuc("corrupt", "bad RLE data");
7812- }
7813- }
7814- }
7815-
7816- } else {
7817- // We're at the raw image data. It's each channel in order (Red, Green,
7818- // Blue, Alpha, ...) where each channel consists of an 8-bit (or 16-bit)
7819- // value for each pixel in the image.
7820-
7821- // Read the data by channel.
7822- for (channel = 0; channel < 4; channel++) {
7823- if (channel >= channelCount) {
7824- // Fill this channel with default data.
7825- if (bitdepth == 16 && bpc == 16) {
7826- stbi__uint16 *q = ((stbi__uint16 *)out) + channel;
7827- stbi__uint16 val = channel == 3 ? 65535 : 0;
7828- for (i = 0; i < pixelCount; i++, q += 4) {
7829- *q = val;
7830- }
7831- } else {
7832- stbi_uc *p = out + channel;
7833- stbi_uc val = channel == 3 ? 255 : 0;
7834- for (i = 0; i < pixelCount; i++, p += 4) {
7835- *p = val;
7836- }
7837- }
7838- } else {
7839- if (ri->bits_per_channel == 16) { // output bpc
7840- stbi__uint16 *q = ((stbi__uint16 *)out) + channel;
7841- for (i = 0; i < pixelCount; i++, q += 4) {
7842- *q = (stbi__uint16)stbi__get16be(s);
7843- }
7844- } else {
7845- stbi_uc *p = out + channel;
7846- if (bitdepth == 16) { // input bpc
7847- for (i = 0; i < pixelCount; i++, p += 4) {
7848- *p = (stbi_uc)(stbi__get16be(s) >> 8);
7849- }
7850- } else {
7851- for (i = 0; i < pixelCount; i++, p += 4) {
7852- *p = stbi__get8(s);
7853- }
7854- }
7855- }
7856- }
7857- }
7858- }
7859-
7860- // remove weird white matte from PSD
7861- if (channelCount >= 4) {
7862- if (ri->bits_per_channel == 16) {
7863- for (i = 0; i < w * h; ++i) {
7864- stbi__uint16 *pixel = (stbi__uint16 *)out + 4 * i;
7865- if (pixel[3] != 0 && pixel[3] != 65535) {
7866- float a = pixel[3] / 65535.0f;
7867- float ra = 1.0f / a;
7868- float inv_a = 65535.0f * (1 - ra);
7869- pixel[0] = (stbi__uint16)(pixel[0] * ra + inv_a);
7870- pixel[1] = (stbi__uint16)(pixel[1] * ra + inv_a);
7871- pixel[2] = (stbi__uint16)(pixel[2] * ra + inv_a);
7872- }
7873- }
7874- } else {
7875- for (i = 0; i < w * h; ++i) {
7876- unsigned char *pixel = out + 4 * i;
7877- if (pixel[3] != 0 && pixel[3] != 255) {
7878- float a = pixel[3] / 255.0f;
7879- float ra = 1.0f / a;
7880- float inv_a = 255.0f * (1 - ra);
7881- pixel[0] = (unsigned char)(pixel[0] * ra + inv_a);
7882- pixel[1] = (unsigned char)(pixel[1] * ra + inv_a);
7883- pixel[2] = (unsigned char)(pixel[2] * ra + inv_a);
7884- }
7885- }
7886- }
7887- }
7888-
7889- // convert to desired output format
7890- if (req_comp && req_comp != 4) {
7891- if (ri->bits_per_channel == 16) {
7892- out = (stbi_uc *)stbi__convert_format16((stbi__uint16 *)out, 4,
7893- req_comp, w, h);
7894- } else {
7895- out = stbi__convert_format(out, 4, req_comp, w, h);
7896- }
7897- if (out == NULL) {
7898- return out; // stbi__convert_format frees input on failure
7899- }
7900- }
7901-
7902- if (comp) {
7903- *comp = 4;
7904- }
7905- *y = h;
7906- *x = w;
7907-
7908- return out;
7909-}
7910-#endif
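The "weird white matte" step in stbi__psd_load undoes compositing against white. If a stored channel is c = a*f + (1 - a)*255, then f = c/a + 255*(1 - 1/a). That is exactly `pixel * ra + inv_a` in the loader, with ra = 1/a. Below is a per-pixel sketch for 8-bit RGBA. `remove_white_matte` is a hypothetical name, and the clamp to 0..255 is this sketch's own choice rather than something the loader does.

```c
/* Hypothetical helper (not stb_image API): undo compositing over a white
   matte for one 8-bit RGBA pixel. Alpha 0 and 255 are left untouched, as in
   the loader. Results are clamped to 0..255 before truncation. */
static void remove_white_matte(unsigned char px[4])
{
    if (px[3] == 0 || px[3] == 255) return;
    float a = px[3] / 255.0f;
    float ra = 1.0f / a;
    float inv_a = 255.0f * (1.0f - ra);
    for (int k = 0; k < 3; ++k) {
        float v = px[k] * ra + inv_a;    /* f = c/a + 255*(1 - 1/a) */
        if (v < 0.0f) v = 0.0f;
        if (v > 255.0f) v = 255.0f;
        px[k] = (unsigned char)v;
    }
}
```

For example, black composited at alpha 128 is stored as roughly 127 and maps back to about 0.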
7911-
7912-// *************************************************************************************************
7913-// Softimage PIC loader
7914-// by Tom Seddon
7915-//
7916-// See http://softimage.wiki.softimage.com/index.php/INFO:_PIC_file_format
7917-// See http://ozviz.wasp.uwa.edu.au/~pbourke/dataformats/softimagepic/
7918-
7919-#ifndef STBI_NO_PIC
7920-static int
7921-stbi__pic_is4(stbi__context *s, const char *str)
7922-{
7923- int i;
7924- for (i = 0; i < 4; ++i) {
7925- if (stbi__get8(s) != (stbi_uc)str[i]) {
7926- return 0;
7927- }
7928- }
7929-
7930- return 1;
7931-}
7932-
7933-static int
7934-stbi__pic_test_core(stbi__context *s)
7935-{
7936- int i;
7937-
7938- if (!stbi__pic_is4(s, "\x53\x80\xF6\x34")) {
7939- return 0;
7940- }
7941-
7942- for (i = 0; i < 84; ++i) {
7943- stbi__get8(s);
7944- }
7945-
7946- if (!stbi__pic_is4(s, "PICT")) {
7947- return 0;
7948- }
7949-
7950- return 1;
7951-}
7952-
7953-typedef struct {
7954- stbi_uc size, type, channel;
7955-} stbi__pic_packet;
7956-
7957-static stbi_uc *
7958-stbi__readval(stbi__context *s, int channel, stbi_uc *dest)
7959-{
7960- int mask = 0x80, i;
7961-
7962- for (i = 0; i < 4; ++i, mask >>= 1) {
7963- if (channel & mask) {
7964- if (stbi__at_eof(s)) {
7965- return stbi__errpuc("bad file", "PIC file too short");
7966- }
7967- dest[i] = stbi__get8(s);
7968- }
7969- }
7970-
7971- return dest;
7972-}
7973-
7974-static void
7975-stbi__copyval(int channel, stbi_uc *dest, const stbi_uc *src)
7976-{
7977- int mask = 0x80, i;
7978-
7979- for (i = 0; i < 4; ++i, mask >>= 1) {
7980- if (channel & mask) {
7981- dest[i] = src[i];
7982- }
7983- }
7984-}
7985-
7986-static stbi_uc *
7987-stbi__pic_load_core(stbi__context *s, int width, int height, int *comp,
7988- stbi_uc *result)
7989-{
7990- int act_comp = 0, num_packets = 0, y, chained;
7991- stbi__pic_packet packets[10];
7992-
7993- // this will (should...) cater for even some bizarre stuff like having data
7994- // for the same channel in multiple packets.
7995- do {
7996- stbi__pic_packet *packet;
7997-
7998- if (num_packets == sizeof(packets) / sizeof(packets[0])) {
7999- return stbi__errpuc("bad format", "too many packets");
8000- }
8001-
8002- packet = &packets[num_packets++];
8003-
8004- chained = stbi__get8(s);
8005- packet->size = stbi__get8(s);
8006- packet->type = stbi__get8(s);
8007- packet->channel = stbi__get8(s);
8008-
8009- act_comp |= packet->channel;
8010-
8011- if (stbi__at_eof(s)) {
8012- return stbi__errpuc("bad file", "file too short (reading packets)");
8013- }
8014- if (packet->size != 8) {
8015- return stbi__errpuc("bad format", "packet isn't 8bpp");
8016- }
8017- } while (chained);
8018-
8019- *comp = (act_comp & 0x10 ? 4 : 3); // has alpha channel?
8020-
8021- for (y = 0; y < height; ++y) {
8022- int packet_idx;
8023-
8024- for (packet_idx = 0; packet_idx < num_packets; ++packet_idx) {
8025- stbi__pic_packet *packet = &packets[packet_idx];
8026- stbi_uc *dest = result + y * width * 4;
8027-
8028- switch (packet->type) {
8029- default:
8030- return stbi__errpuc("bad format",
8031- "packet has bad compression type");
8032-
8033- case 0: { // uncompressed
8034- int x;
8035-
8036- for (x = 0; x < width; ++x, dest += 4) {
8037- if (!stbi__readval(s, packet->channel, dest)) {
8038- return 0;
8039- }
8040- }
8041- break;
8042- }
8043-
8044- case 1: // Pure RLE
8045- {
8046- int left = width, i;
8047-
8048- while (left > 0) {
8049- stbi_uc count, value[4];
8050-
8051- count = stbi__get8(s);
8052- if (stbi__at_eof(s)) {
8053- return stbi__errpuc("bad file",
8054- "file too short (pure read count)");
8055- }
8056-
8057- if (count > left) {
8058- count = (stbi_uc)left;
8059- }
8060-
8061- if (!stbi__readval(s, packet->channel, value)) {
8062- return 0;
8063- }
8064-
8065- for (i = 0; i < count; ++i, dest += 4) {
8066- stbi__copyval(packet->channel, dest, value);
8067- }
8068- left -= count;
8069- }
8070- } break;
8071-
8072- case 2: { // Mixed RLE
8073- int left = width;
8074- while (left > 0) {
8075- int count = stbi__get8(s), i;
8076- if (stbi__at_eof(s)) {
8077- return stbi__errpuc(
8078- "bad file", "file too short (mixed read count)");
8079- }
8080-
8081- if (count >= 128) { // Repeated
8082- stbi_uc value[4];
8083-
8084- if (count == 128) {
8085- count = stbi__get16be(s);
8086- } else {
8087- count -= 127;
8088- }
8089- if (count > left) {
8090- return stbi__errpuc("bad file", "scanline overrun");
8091- }
8092-
8093- if (!stbi__readval(s, packet->channel, value)) {
8094- return 0;
8095- }
8096-
8097- for (i = 0; i < count; ++i, dest += 4) {
8098- stbi__copyval(packet->channel, dest, value);
8099- }
8100- } else { // Raw
8101- ++count;
8102- if (count > left) {
8103- return stbi__errpuc("bad file", "scanline overrun");
8104- }
8105-
8106- for (i = 0; i < count; ++i, dest += 4) {
8107- if (!stbi__readval(s, packet->channel, dest)) {
8108- return 0;
8109- }
8110- }
8111- }
8112- left -= count;
8113- }
8114- break;
8115- }
8116- }
8117- }
8118- }
8119-
8120- return result;
8121-}
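The "mixed RLE" packets handled in stbi__pic_load_core use one count byte:

- Values below 128 introduce a literal run of count + 1 values.
- Values 129 to 255 repeat the next value count - 127 times.
- Exactly 128 means a 16-bit big-endian repeat count follows.

Below is a single-channel sketch that decodes one scanline. It assumes one byte per value and an in-memory source, and `pic_mixed_rle_line` is hypothetical. The loader itself reads up to four channel bytes per value through stbi__readval.

```c
#include <stddef.h>

/* Hypothetical single-channel version of the "mixed RLE" branch: decodes one
   scanline of `width` bytes and stores the number of source bytes used in
   *consumed. Returns 1 on success, 0 on truncated input or scanline overrun. */
static int pic_mixed_rle_line(const unsigned char *src, size_t src_len,
                              size_t *consumed, unsigned char *dst, int width)
{
    size_t in = 0;
    int left = width;
    while (left > 0) {
        if (in >= src_len) return 0;
        int count = src[in++];
        if (count >= 128) {                  /* repeated run */
            if (count == 128) {              /* long run: 16-bit big-endian count */
                if (in + 2 > src_len) return 0;
                count = (src[in] << 8) | src[in + 1];
                in += 2;
            } else {
                count -= 127;                /* 129..255 -> 2..128 */
            }
            if (count > left || in >= src_len) return 0;
            unsigned char v = src[in++];
            for (int i = 0; i < count; ++i) *dst++ = v;
        } else {                             /* raw run of count+1 values */
            ++count;
            if (count > left || in + (size_t)count > src_len) return 0;
            for (int i = 0; i < count; ++i) *dst++ = src[in++];
        }
        left -= count;
    }
    *consumed = in;
    return 1;
}
```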
8122-
8123-static void *
8124-stbi__pic_load(stbi__context *s, int *px, int *py, int *comp, int req_comp,
8125- stbi__result_info *ri)
8126-{
8127- stbi_uc *result;
8128- int i, x, y, internal_comp;
8129- STBI_NOTUSED(ri);
8130-
8131- if (!comp) {
8132- comp = &internal_comp;
8133- }
8134-
8135- for (i = 0; i < 92; ++i) {
8136- stbi__get8(s);
8137- }
8138-
8139- x = stbi__get16be(s);
8140- y = stbi__get16be(s);
8141-
8142- if (y > STBI_MAX_DIMENSIONS) {
8143- return stbi__errpuc("too large", "Very large image (corrupt?)");
8144- }
8145- if (x > STBI_MAX_DIMENSIONS) {
8146- return stbi__errpuc("too large", "Very large image (corrupt?)");
8147- }
8148-
8149- if (stbi__at_eof(s)) {
8150- return stbi__errpuc("bad file", "file too short (pic header)");
8151- }
8152- if (!stbi__mad3sizes_valid(x, y, 4, 0)) {
8153- return stbi__errpuc("too large", "PIC image too large to decode");
8154- }
8155-
8156- stbi__get32be(s); // skip `ratio'
8157- stbi__get16be(s); // skip `fields'
8158- stbi__get16be(s); // skip `pad'
8159-
8160- // intermediate buffer is RGBA
8161- result = (stbi_uc *)stbi__malloc_mad3(x, y, 4, 0);
8162- if (!result) {
8163- return stbi__errpuc("outofmem", "Out of memory");
8164- }
8165- memset(result, 0xff, x * y * 4);
8166-
8167- if (!stbi__pic_load_core(s, x, y, comp, result)) {
8168- STBI_FREE(result);
8169- result = 0;
8170- }
8171- *px = x;
8172- *py = y;
8173- if (req_comp == 0) {
8174- req_comp = *comp;
8175- }
8176- result = stbi__convert_format(result, 4, req_comp, x, y);
8177-
8178- return result;
8179-}
8180-
8181-static int
8182-stbi__pic_test(stbi__context *s)
8183-{
8184- int r = stbi__pic_test_core(s);
8185- stbi__rewind(s);
8186- return r;
8187-}
8188-#endif
8189-
8190-// *************************************************************************************************
8191-// GIF loader -- public domain by Jean-Marc Lienher -- simplified/shrunk by stb
8192-
8193-#ifndef STBI_NO_GIF
8194-typedef struct {
8195- stbi__int16 prefix;
8196- stbi_uc first;
8197- stbi_uc suffix;
8198-} stbi__gif_lzw;
8199-
8200-typedef struct {
8201- int w, h;
8202- stbi_uc *out; // output buffer (always 4 components)
8203- stbi_uc
8204- *background; // The current "background" as far as a gif is concerned
8205- stbi_uc *history;
8206- int flags, bgindex, ratio, transparent, eflags;
8207- stbi_uc pal[256][4];
8208- stbi_uc lpal[256][4];
8209- stbi__gif_lzw codes[8192];
8210- stbi_uc *color_table;
8211- int parse, step;
8212- int lflags;
8213- int start_x, start_y;
8214- int max_x, max_y;
8215- int cur_x, cur_y;
8216- int line_size;
8217- int delay;
8218-} stbi__gif;
8219-
8220-static int
8221-stbi__gif_test_raw(stbi__context *s)
8222-{
8223- int sz;
8224- if (stbi__get8(s) != 'G' || stbi__get8(s) != 'I' || stbi__get8(s) != 'F' ||
8225- stbi__get8(s) != '8') {
8226- return 0;
8227- }
8228- sz = stbi__get8(s);
8229- if (sz != '9' && sz != '7') {
8230- return 0;
8231- }
8232- if (stbi__get8(s) != 'a') {
8233- return 0;
8234- }
8235- return 1;
8236-}
8237-
8238-static int
8239-stbi__gif_test(stbi__context *s)
8240-{
8241- int r = stbi__gif_test_raw(s);
8242- stbi__rewind(s);
8243- return r;
8244-}
8245-
8246-static void
8247-stbi__gif_parse_colortable(stbi__context *s, stbi_uc pal[256][4],
8248- int num_entries, int transp)
8249-{
8250- int i;
8251- for (i = 0; i < num_entries; ++i) {
8252- pal[i][2] = stbi__get8(s);
8253- pal[i][1] = stbi__get8(s);
8254- pal[i][0] = stbi__get8(s);
8255- pal[i][3] = transp == i ? 0 : 255;
8256- }
8257-}
8258-
8259-static int
8260-stbi__gif_header(stbi__context *s, stbi__gif *g, int *comp, int is_info)
8261-{
8262- stbi_uc version;
8263- if (stbi__get8(s) != 'G' || stbi__get8(s) != 'I' || stbi__get8(s) != 'F' ||
8264- stbi__get8(s) != '8') {
8265- return stbi__err("not GIF", "Corrupt GIF");
8266- }
8267-
8268- version = stbi__get8(s);
8269- if (version != '7' && version != '9') {
8270- return stbi__err("not GIF", "Corrupt GIF");
8271- }
8272- if (stbi__get8(s) != 'a') {
8273- return stbi__err("not GIF", "Corrupt GIF");
8274- }
8275-
8276- stbi__g_failure_reason = "";
8277- g->w = stbi__get16le(s);
8278- g->h = stbi__get16le(s);
8279- g->flags = stbi__get8(s);
8280- g->bgindex = stbi__get8(s);
8281- g->ratio = stbi__get8(s);
8282- g->transparent = -1;
8283-
8284- if (g->w > STBI_MAX_DIMENSIONS) {
8285- return stbi__err("too large", "Very large image (corrupt?)");
8286- }
8287- if (g->h > STBI_MAX_DIMENSIONS) {
8288- return stbi__err("too large", "Very large image (corrupt?)");
8289- }
8290-
8291- if (comp != 0) {
8292- *comp = 4; // can't actually tell whether it's 3 or 4 until we parse the
8293- // comments
8294- }
8295-
8296- if (is_info) {
8297- return 1;
8298- }
8299-
8300- if (g->flags & 0x80) {
8301- stbi__gif_parse_colortable(s, g->pal, 2 << (g->flags & 7), -1);
8302- }
8303-
8304- return 1;
8305-}
8306-
8307-static int
8308-stbi__gif_info_raw(stbi__context *s, int *x, int *y, int *comp)
8309-{
8310- stbi__gif *g = (stbi__gif *)stbi__malloc(sizeof(stbi__gif));
8311- if (!g) {
8312- return stbi__err("outofmem", "Out of memory");
8313- }
8314- if (!stbi__gif_header(s, g, comp, 1)) {
8315- STBI_FREE(g);
8316- stbi__rewind(s);
8317- return 0;
8318- }
8319- if (x) {
8320- *x = g->w;
8321- }
8322- if (y) {
8323- *y = g->h;
8324- }
8325- STBI_FREE(g);
8326- return 1;
8327-}
8328-
8329-static void
8330-stbi__out_gif_code(stbi__gif *g, stbi__uint16 code)
8331-{
8332- stbi_uc *p, *c;
8333- int idx;
8334-
8335- // recurse to decode the prefixes, since the linked-list is backwards,
8336- // and working backwards through an interleaved image would be nasty
8337- if (g->codes[code].prefix >= 0) {
8338- stbi__out_gif_code(g, g->codes[code].prefix);
8339- }
8340-
8341- if (g->cur_y >= g->max_y) {
8342- return;
8343- }
8344-
8345- idx = g->cur_x + g->cur_y;
8346- p = &g->out[idx];
8347- g->history[idx / 4] = 1;
8348-
8349- c = &g->color_table[g->codes[code].suffix * 4];
8350- if (c[3] > 128) { // don't render transparent pixels
8351- p[0] = c[2];
8352- p[1] = c[1];
8353- p[2] = c[0];
8354- p[3] = c[3];
8355- }
8356- g->cur_x += 4;
8357-
8358- if (g->cur_x >= g->max_x) {
8359- g->cur_x = g->start_x;
8360- g->cur_y += g->step;
8361-
8362- while (g->cur_y >= g->max_y && g->parse > 0) {
8363- g->step = (1 << g->parse) * g->line_size;
8364- g->cur_y = g->start_y + (g->step >> 1);
8365- --g->parse;
8366- }
8367- }
8368-}
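For interlaced frames, stbi__out_gif_code advances through rows using g->parse and g->step. After a pass over every 8th row starting at row 0, it restarts at row 4 with step 8, then at row 2 with step 4, then at row 1 with step 2. That is the GIF89a interlace order, and it can be listed directly. `gif_interlace_order` is a hypothetical helper that works in row indices rather than the byte offsets (multiples of line_size) the decoder uses.

```c
/* Hypothetical helper (not stb_image API): fill `rows` with the order in
   which an interlaced GIF stores its scanlines. Pass 1: every 8th row from 0;
   pass 2: every 8th from 4; pass 3: every 4th from 2; pass 4: every 2nd from 1.
   Returns the number of rows written (always `height`). */
static int gif_interlace_order(int height, int *rows)
{
    static const int start[4] = {0, 4, 2, 1};
    static const int step[4] = {8, 8, 4, 2};
    int n = 0;
    for (int pass = 0; pass < 4; ++pass)
        for (int y = start[pass]; y < height; y += step[pass])
            rows[n++] = y;
    return n;
}
```

For a 10-row image this yields rows 0, 8, 4, 2, 6, 1, 3, 5, 7, 9.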
8369-
8370-static stbi_uc *
8371-stbi__process_gif_raster(stbi__context *s, stbi__gif *g)
8372-{
8373- stbi_uc lzw_cs;
8374- stbi__int32 len, init_code;
8375- stbi__uint32 first;
8376- stbi__int32 codesize, codemask, avail, oldcode, bits, valid_bits, clear;
8377- stbi__gif_lzw *p;
8378-
8379- lzw_cs = stbi__get8(s);
8380- if (lzw_cs > 12) {
8381- return NULL;
8382- }
8383- clear = 1 << lzw_cs;
8384- first = 1;
8385- codesize = lzw_cs + 1;
8386- codemask = (1 << codesize) - 1;
8387- bits = 0;
8388- valid_bits = 0;
8389- for (init_code = 0; init_code < clear; init_code++) {
8390- g->codes[init_code].prefix = -1;
8391- g->codes[init_code].first = (stbi_uc)init_code;
8392- g->codes[init_code].suffix = (stbi_uc)init_code;
8393- }
8394-
8395- // support no starting clear code
8396- avail = clear + 2;
8397- oldcode = -1;
8398-
8399- len = 0;
8400- for (;;) {
8401- if (valid_bits < codesize) {
8402- if (len == 0) {
8403- len = stbi__get8(s); // start new block
8404- if (len == 0) {
8405- return g->out;
8406- }
8407- }
8408- --len;
8409- bits |= (stbi__int32)stbi__get8(s) << valid_bits;
8410- valid_bits += 8;
8411- } else {
8412- stbi__int32 code = bits & codemask;
8413- bits >>= codesize;
8414- valid_bits -= codesize;
8415- // @OPTIMIZE: is there some way we can accelerate the non-clear
8416- // path?
8417- if (code == clear) { // clear code
8418- codesize = lzw_cs + 1;
8419- codemask = (1 << codesize) - 1;
8420- avail = clear + 2;
8421- oldcode = -1;
8422- first = 0;
8423- } else if (code == clear + 1) { // end of stream code
8424- stbi__skip(s, len);
8425- while ((len = stbi__get8(s)) > 0) {
8426- stbi__skip(s, len);
8427- }
8428- return g->out;
8429- } else if (code <= avail) {
8430- if (first) {
8431- return stbi__errpuc("no clear code", "Corrupt GIF");
8432- }
8433-
8434- if (oldcode >= 0) {
8435- p = &g->codes[avail++];
8436- if (avail > 8192) {
8437- return stbi__errpuc("too many codes", "Corrupt GIF");
8438- }
8439-
8440- p->prefix = (stbi__int16)oldcode;
8441- p->first = g->codes[oldcode].first;
8442- p->suffix =
8443- (code == avail) ? p->first : g->codes[code].first;
8444- } else if (code == avail) {
8445- return stbi__errpuc("illegal code in raster",
8446- "Corrupt GIF");
8447- }
8448-
8449- stbi__out_gif_code(g, (stbi__uint16)code);
8450-
8451- if ((avail & codemask) == 0 && avail <= 0x0FFF) {
8452- codesize++;
8453- codemask = (1 << codesize) - 1;
8454- }
8455-
8456- oldcode = code;
8457- } else {
8458- return stbi__errpuc("illegal code in raster", "Corrupt GIF");
8459- }
8460- }
8461- }
8462-}
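The raster loop above reads variable-width LZW codes least-significant-bit first. Each new byte is ORed in above the bits already held, and each code is masked off the bottom. The sketch below isolates that accumulator as a hypothetical `read_code` function over a flat byte array. It ignores GIF's length-prefixed sub-blocks, the code table, and the code-size growth that the listing handles.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical LSB-first code reader, modeled on the bit accumulator in
   stbi__process_gif_raster. Sub-block framing is deliberately omitted. */
typedef struct {
    const uint8_t *data;
    size_t len, pos;
    uint32_t bits;  /* pending bits, least significant first */
    int valid_bits; /* how many low bits of `bits` are meaningful */
} lsb_reader;

/* Returns the next `codesize`-bit code, or -1 if the input runs out. */
static int read_code(lsb_reader *r, int codesize)
{
    assert(codesize >= 1 && codesize <= 12);
    while (r->valid_bits < codesize) {
        if (r->pos == r->len)
            return -1;
        /* append the next byte above the bits already buffered */
        r->bits |= (uint32_t)r->data[r->pos++] << r->valid_bits;
        r->valid_bits += 8;
    }
    int code = (int)(r->bits & ((1u << codesize) - 1));
    r->bits >>= codesize;
    r->valid_bits -= codesize;
    return code;
}
```

With `lzw_cs = 2`, codes start 3 bits wide (`lzw_cs + 1`), the clear code is `1 << 2 = 4`, and end-of-stream is 5. The bytes `0x8C 0x0B` carry the codes 4, 1, 6, 5 in that order.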
8463-
8464-// this function is designed to support animated gifs, although stb_image
8465- // doesn't support it. two_back is the image from two frames ago, used for a
8466- // very specific disposal format (method 3, restore to previous).
8467-static stbi_uc *
8468-stbi__gif_load_next(stbi__context *s, stbi__gif *g, int *comp, int req_comp,
8469- stbi_uc *two_back)
8470-{
8471- int dispose;
8472- int first_frame;
8473- int pi;
8474- int pcount;
8475- STBI_NOTUSED(req_comp);
8476-
8477- // on first frame, any non-written pixels get the background colour
8478- // (non-transparent)
8479- first_frame = 0;
8480- if (g->out == 0) {
8481- if (!stbi__gif_header(s, g, comp, 0)) {
8482- return 0; // stbi__g_failure_reason set by stbi__gif_header
8483- }
8484- if (!stbi__mad3sizes_valid(4, g->w, g->h, 0)) {
8485- return stbi__errpuc("too large", "GIF image is too large");
8486- }
8487- pcount = g->w * g->h;
8488- g->out = (stbi_uc *)stbi__malloc(4 * pcount);
8489- g->background = (stbi_uc *)stbi__malloc(4 * pcount);
8490- g->history = (stbi_uc *)stbi__malloc(pcount);
8491- if (!g->out || !g->background || !g->history) {
8492- return stbi__errpuc("outofmem", "Out of memory");
8493- }
8494-
8495- // image is treated as "transparent" at the start - ie, nothing
8496- // overwrites the current background; background colour is only used for
8497- // pixels that are not rendered first frame, after that "background"
8498- // color refers to the color that was there the previous frame.
8499- memset(g->out, 0x00, 4 * pcount);
8500- memset(g->background, 0x00,
8501- 4 * pcount); // state of the background (starts transparent)
8502- memset(g->history, 0x00,
8503- pcount); // pixels that were affected previous frame
8504- first_frame = 1;
8505- } else {
8506- // second frame - how do we dispose of the previous one?
8507- dispose = (g->eflags & 0x1C) >> 2;
8508- pcount = g->w * g->h;
8509-
8510- if ((dispose == 3) && (two_back == 0)) {
8511- dispose = 2; // if I don't have an image to revert back to, default
8512- // to the old background
8513- }
8514-
8515- if (dispose == 3) { // use previous graphic
8516- for (pi = 0; pi < pcount; ++pi) {
8517- if (g->history[pi]) {
8518- memcpy(&g->out[pi * 4], &two_back[pi * 4], 4);
8519- }
8520- }
8521- } else if (dispose == 2) {
8522- // restore what was changed last frame to background before that
8523- // frame;
8524- for (pi = 0; pi < pcount; ++pi) {
8525- if (g->history[pi]) {
8526- memcpy(&g->out[pi * 4], &g->background[pi * 4], 4);
8527- }
8528- }
8529- } else {
8530- // This is a non-disposal case either way, so just
8531- // leave the pixels as is, and they will become the new background
8532- // 1: do not dispose
8533- // 0: not specified.
8534- }
8535-
8536- // background is what out is after the undoing of the previous frame;
8537- memcpy(g->background, g->out, 4 * g->w * g->h);
8538- }
8539-
8540- // clear my history;
8541- memset(g->history, 0x00,
8542- g->w * g->h); // pixels that were affected previous frame
8543-
8544- for (;;) {
8545- int tag = stbi__get8(s);
8546- switch (tag) {
8547- case 0x2C: /* Image Descriptor */
8548- {
8549- stbi__int32 x, y, w, h;
8550- stbi_uc *o;
8551-
8552- x = stbi__get16le(s);
8553- y = stbi__get16le(s);
8554- w = stbi__get16le(s);
8555- h = stbi__get16le(s);
8556- if (((x + w) > (g->w)) || ((y + h) > (g->h))) {
8557- return stbi__errpuc("bad Image Descriptor", "Corrupt GIF");
8558- }
8559-
8560- g->line_size = g->w * 4;
8561- g->start_x = x * 4;
8562- g->start_y = y * g->line_size;
8563- g->max_x = g->start_x + w * 4;
8564- g->max_y = g->start_y + h * g->line_size;
8565- g->cur_x = g->start_x;
8566- g->cur_y = g->start_y;
8567-
8568- // if the width of the specified rectangle is 0, that means
8569- // we may not see *any* pixels or the image is malformed;
8570- // to make sure this is caught, move the current y down to
8571- // max_y (which is what out_gif_code checks).
8572- if (w == 0) {
8573- g->cur_y = g->max_y;
8574- }
8575-
8576- g->lflags = stbi__get8(s);
8577-
8578- if (g->lflags & 0x40) {
8579- g->step = 8 * g->line_size; // first interlaced spacing
8580- g->parse = 3;
8581- } else {
8582- g->step = g->line_size;
8583- g->parse = 0;
8584- }
8585-
8586- if (g->lflags & 0x80) {
8587- stbi__gif_parse_colortable(s, g->lpal, 2 << (g->lflags & 7),
8588- g->eflags & 0x01 ? g->transparent
8589- : -1);
8590- g->color_table = (stbi_uc *)g->lpal;
8591- } else if (g->flags & 0x80) {
8592- g->color_table = (stbi_uc *)g->pal;
8593- } else {
8594- return stbi__errpuc("missing color table", "Corrupt GIF");
8595- }
8596-
8597- o = stbi__process_gif_raster(s, g);
8598- if (!o) {
8599- return NULL;
8600- }
8601-
8602- // if this was the first frame,
8603- pcount = g->w * g->h;
8604- if (first_frame && (g->bgindex > 0)) {
8605- // if first frame, any pixel not drawn to gets the background
8606- // color
8607- for (pi = 0; pi < pcount; ++pi) {
8608- if (g->history[pi] == 0) {
8609- g->pal[g->bgindex][3] =
8610- 255; // just in case it was made transparent, undo
8611- // that; It will be reset next frame if need
8612- // be;
8613- memcpy(&g->out[pi * 4], &g->pal[g->bgindex], 4);
8614- }
8615- }
8616- }
8617-
8618- return o;
8619- }
8620-
8621- case 0x21: // Extension Introducer (Graphic Control, Comment, etc.).
8622- {
8623- int len;
8624- int ext = stbi__get8(s);
8625- if (ext == 0xF9) { // Graphic Control Extension.
8626- len = stbi__get8(s);
8627- if (len == 4) {
8628- g->eflags = stbi__get8(s);
8629- g->delay =
8630- 10 * stbi__get16le(s); // delay - 1/100th of a second,
8631- // saving as 1/1000ths.
8632-
8633- // unset old transparent
8634- if (g->transparent >= 0) {
8635- g->pal[g->transparent][3] = 255;
8636- }
8637- if (g->eflags & 0x01) {
8638- g->transparent = stbi__get8(s);
8639- if (g->transparent >= 0) {
8640- g->pal[g->transparent][3] = 0;
8641- }
8642- } else {
8643- // don't need transparent
8644- stbi__skip(s, 1);
8645- g->transparent = -1;
8646- }
8647- } else {
8648- stbi__skip(s, len);
8649- break;
8650- }
8651- }
8652- while ((len = stbi__get8(s)) != 0) {
8653- stbi__skip(s, len);
8654- }
8655- break;
8656- }
8657-
8658- case 0x3B: // gif stream termination code
8659- return (stbi_uc *)s; // using '1' causes warning on some compilers
8660-
8661- default:
8662- return stbi__errpuc("unknown code", "Corrupt GIF");
8663- }
8664- }
8665-}
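The disposal method comes from bits 2 to 4 of the Graphic Control Extension flags (`(eflags & 0x1C) >> 2`). Only pixels that the previous frame flagged in `history` are affected. The function below restates that step in standalone form. The name `apply_gif_disposal` is hypothetical, and plain RGBA byte buffers stand in for `stbi__gif`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical restatement of the disposal step in stbi__gif_load_next.
   out, background and two_back are RGBA buffers of pcount pixels; history
   marks the pixels the previous frame wrote. Returns the method applied. */
static int apply_gif_disposal(uint8_t *out, const uint8_t *background,
                              const uint8_t *two_back, const uint8_t *history,
                              int pcount, int eflags)
{
    int dispose = (eflags & 0x1C) >> 2; /* bits 2..4 of the GCE flags */
    assert(out != NULL && background != NULL && history != NULL);

    /* "restore to previous" needs an older frame; fall back to background */
    if (dispose == 3 && two_back == NULL)
        dispose = 2;

    for (int pi = 0; pi < pcount; ++pi) {
        if (!history[pi])
            continue; /* pixels the last frame did not touch stay as is */
        if (dispose == 3)
            memcpy(&out[pi * 4], &two_back[pi * 4], 4);
        else if (dispose == 2)
            memcpy(&out[pi * 4], &background[pi * 4], 4);
        /* 0, 1 and reserved values: leave the pixel in place */
    }
    return dispose;
}
```

Methods 0 and 1 leave the pixels in place, so those pixels carry over into the new background. This matches the listing's fall-through branch.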
8666-
8667-static void *
8668-stbi__load_gif_main_outofmem(stbi__gif *g, stbi_uc *out, int **delays)
8669-{
8670- STBI_FREE(g->out);
8671- STBI_FREE(g->history);
8672- STBI_FREE(g->background);
8673-
8674- if (out) {
8675- STBI_FREE(out);
8676- }
8677- if (delays && *delays) {
8678- STBI_FREE(*delays);
8679- }
8680- return stbi__errpuc("outofmem", "Out of memory");
8681-}
8682-
8683-static void *
8684-stbi__load_gif_main(stbi__context *s, int **delays, int *x, int *y, int *z,
8685- int *comp, int req_comp)
8686-{
8687- if (stbi__gif_test(s)) {
8688- int layers = 0;
8689- stbi_uc *u = 0;
8690- stbi_uc *out = 0;
8691- stbi_uc *two_back = 0;
8692- stbi__gif g;
8693- int stride;
8694- int out_size = 0;
8695- int delays_size = 0;
8696-
8697- STBI_NOTUSED(out_size);
8698- STBI_NOTUSED(delays_size);
8699-
8700- memset(&g, 0, sizeof(g));
8701- if (delays) {
8702- *delays = 0;
8703- }
8704-
8705- do {
8706- u = stbi__gif_load_next(s, &g, comp, req_comp, two_back);
8707- if (u == (stbi_uc *)s) {
8708- u = 0; // end of animated gif marker
8709- }
8710-
8711- if (u) {
8712- *x = g.w;
8713- *y = g.h;
8714- ++layers;
8715- stride = g.w * g.h * 4;
8716-
8717- if (out) {
8718- void *tmp = (stbi_uc *)STBI_REALLOC_SIZED(out, out_size,
8719- layers * stride);
8720- if (!tmp) {
8721- return stbi__load_gif_main_outofmem(&g, out, delays);
8722- } else {
8723- out = (stbi_uc *)tmp;
8724- out_size = layers * stride;
8725- }
8726-
8727- if (delays) {
8728- int *new_delays = (int *)STBI_REALLOC_SIZED(
8729- *delays, delays_size, sizeof(int) * layers);
8730- if (!new_delays) {
8731- return stbi__load_gif_main_outofmem(&g, out,
8732- delays);
8733- }
8734- *delays = new_delays;
8735- delays_size = layers * sizeof(int);
8736- }
8737- } else {
8738- out = (stbi_uc *)stbi__malloc(layers * stride);
8739- if (!out) {
8740- return stbi__load_gif_main_outofmem(&g, out, delays);
8741- }
8742- out_size = layers * stride;
8743- if (delays) {
8744- *delays = (int *)stbi__malloc(layers * sizeof(int));
8745- if (!*delays) {
8746- return stbi__load_gif_main_outofmem(&g, out,
8747- delays);
8748- }
8749- delays_size = layers * sizeof(int);
8750- }
8751- }
8752- memcpy(out + ((layers - 1) * stride), u, stride);
8753- if (layers >= 2) {
8754- two_back = out + (layers - 2) * stride;
8755- }
8756-
8757- if (delays) {
8758- (*delays)[layers - 1U] = g.delay;
8759- }
8760- }
8761- } while (u != 0);
8762-
8763- // free temp buffer;
8764- STBI_FREE(g.out);
8765- STBI_FREE(g.history);
8766- STBI_FREE(g.background);
8767-
8768- // do the final conversion after loading everything;
8769- if (req_comp && req_comp != 4) {
8770- out = stbi__convert_format(out, 4, req_comp, layers * g.w, g.h);
8771- }
8772-
8773- *z = layers;
8774- return out;
8775- } else {
8776- return stbi__errpuc("not GIF", "Image was not a GIF.");
8777- }
8778-}
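In `stbi__load_gif_main`, each decoded frame is copied to `out + (layers - 1) * stride`. Frame `i` therefore begins at `out + i * stride`, and the frame two back from the next one to decode sits at index `layers - 2`. Any pointer formed by subtracting a multiple of `stride` from `out` would land before the allocation. The sketch below shows that arithmetic using two hypothetical helpers, `frame_at` and `two_back_after`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: frames are stacked back to back in one buffer, each
   `stride` bytes (w * h * 4). Returns the start of frame `index`. */
static uint8_t *frame_at(uint8_t *base, size_t stride, int index)
{
    assert(index >= 0);
    return base + (size_t)index * stride;
}

/* After frame `layers - 1` has been stored, the frame two back from the
   next frame is at index `layers - 2`; NULL if it does not exist yet. */
static uint8_t *two_back_after(uint8_t *base, size_t stride, int layers)
{
    return layers >= 2 ? frame_at(base, stride, layers - 2) : NULL;
}
```

`STBI_REALLOC_SIZED` may move the block, so a pointer like this is only valid until the next reallocation and must be recomputed afterwards.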
8779-
8780-static void *
8781-stbi__gif_load(stbi__context *s, int *x, int *y, int *comp, int req_comp,
8782- stbi__result_info *ri)
8783-{
8784- stbi_uc *u = 0;
8785- stbi__gif g;
8786- memset(&g, 0, sizeof(g));
8787- STBI_NOTUSED(ri);
8788-
8789- u = stbi__gif_load_next(s, &g, comp, req_comp, 0);
8790- if (u == (stbi_uc *)s) {
8791- u = 0; // end of animated gif marker
8792- }
8793- if (u) {
8794- *x = g.w;
8795- *y = g.h;
8796-
8797- // moved conversion to after successful load so that the same
8798- // can be done for multiple frames.
8799- if (req_comp && req_comp != 4) {
8800- u = stbi__convert_format(u, 4, req_comp, g.w, g.h);
8801- }
8802- } else if (g.out) {
8803- // if there was an error and we allocated an image buffer, free it!
8804- STBI_FREE(g.out);
8805- }
8806-
8807- // free buffers needed for multiple frame loading;
8808- STBI_FREE(g.history);
8809- STBI_FREE(g.background);
8810-
8811- return u;
8812-}
8813-
8814-static int
8815-stbi__gif_info(stbi__context *s, int *x, int *y, int *comp)
8816-{
8817- return stbi__gif_info_raw(s, x, y, comp);
8818-}
8819-#endif
8820-
8821-// *************************************************************************************************
8822-// Radiance RGBE HDR loader
8823-// originally by Nicolas Schulz
8824-#ifndef STBI_NO_HDR
8825-static int
8826-stbi__hdr_test_core(stbi__context *s, const char *signature)
8827-{
8828- int i;
8829- for (i = 0; signature[i]; ++i) {
8830- if (stbi__get8(s) != signature[i]) {
8831- return 0;
8832- }
8833- }
8834- stbi__rewind(s);
8835- return 1;
8836-}
8837-
8838-static int
8839-stbi__hdr_test(stbi__context *s)
8840-{
8841- int r = stbi__hdr_test_core(s, "#?RADIANCE\n");
8842- stbi__rewind(s);
8843- if (!r) {
8844- r = stbi__hdr_test_core(s, "#?RGBE\n");
8845- stbi__rewind(s);
8846- }
8847- return r;
8848-}
8849-
8850-#define STBI__HDR_BUFLEN 1024
8851-static char *
8852-stbi__hdr_gettoken(stbi__context *z, char *buffer)
8853-{
8854- int len = 0;
8855- char c = '\0';
8856-
8857- c = (char)stbi__get8(z);
8858-
8859- while (!stbi__at_eof(z) && c != '\n') {
8860- buffer[len++] = c;
8861- if (len == STBI__HDR_BUFLEN - 1) {
8862- // flush to end of line
8863- while (!stbi__at_eof(z) && stbi__get8(z) != '\n')
8864- ;
8865- break;
8866- }
8867- c = (char)stbi__get8(z);
8868- }
8869-
8870- buffer[len] = 0;
8871- return buffer;
8872-}
8873-
8874-static void
8875-stbi__hdr_convert(float *output, stbi_uc *input, int req_comp)
8876-{
8877- if (input[3] != 0) {
8878- float f1;
8879- // Exponent
8880- f1 = (float)ldexp(1.0f, input[3] - (int)(128 + 8));
8881- if (req_comp <= 2) {
8882- output[0] = (input[0] + input[1] + input[2]) * f1 / 3;
8883- } else {
8884- output[0] = input[0] * f1;
8885- output[1] = input[1] * f1;
8886- output[2] = input[2] * f1;
8887- }
8888- if (req_comp == 2) {
8889- output[1] = 1;
8890- }
8891- if (req_comp == 4) {
8892- output[3] = 1;
8893- }
8894- } else {
8895- switch (req_comp) {
8896- case 4:
8897- output[3] = 1; /* fallthrough */
8898- case 3:
8899- output[0] = output[1] = output[2] = 0;
8900- break;
8901- case 2:
8902- output[1] = 1; /* fallthrough */
8903- case 1:
8904- output[0] = 0;
8905- break;
8906- }
8907- }
8908-}
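`stbi__hdr_convert` treats the fourth RGBE byte as a shared exponent. Each mantissa byte is scaled by `ldexp(1.0f, E - 136)`, and an exponent byte of 0 means black. The sketch below covers the three-channel case under a hypothetical name, `rgbe_to_rgb`. It substitutes a doubling/halving loop for `ldexp` so that no math library is needed.

```c
#include <assert.h>
#include <stddef.h>

/* Decodes one Radiance RGBE pixel to linear RGB with the scaling used by
   stbi__hdr_convert for req_comp == 3: each mantissa byte is multiplied by
   2^(E - 136), i.e. m / 256 * 2^(E - 128). The power of two is built by
   repeated doubling or halving instead of ldexp; every step is exact. */
static void rgbe_to_rgb(const unsigned char rgbe[4], float rgb[3])
{
    assert(rgbe != NULL && rgb != NULL);
    if (rgbe[3] == 0) { /* exponent byte 0 encodes black */
        rgb[0] = rgb[1] = rgb[2] = 0.0f;
        return;
    }
    int e = rgbe[3] - (128 + 8);
    float scale = 1.0f;
    while (e > 0) { scale *= 2.0f; --e; }
    while (e < 0) { scale *= 0.5f; ++e; }
    for (int i = 0; i < 3; ++i)
        rgb[i] = rgbe[i] * scale;
}
```

For the pixel `{128, 64, 0, 129}` the scale is 2^-7, which gives RGB (1.0, 0.5, 0.0).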
8909-
8910-static float *
8911-stbi__hdr_load(stbi__context *s, int *x, int *y, int *comp, int req_comp,
8912- stbi__result_info *ri)
8913-{
8914- char buffer[STBI__HDR_BUFLEN];
8915- char *token;
8916- int valid = 0;
8917- int width, height;
8918- stbi_uc *scanline;
8919- float *hdr_data;
8920- int len;
8921- unsigned char count, value;
8922- int i, j, k, c1, c2, z;
8923- const char *headerToken;
8924- STBI_NOTUSED(ri);
8925-
8926- // Check identifier
8927- headerToken = stbi__hdr_gettoken(s, buffer);
8928- if (strcmp(headerToken, "#?RADIANCE") != 0 &&
8929- strcmp(headerToken, "#?RGBE") != 0) {
8930- return stbi__errpf("not HDR", "Corrupt HDR image");
8931- }
8932-
8933- // Parse header
8934- for (;;) {
8935- token = stbi__hdr_gettoken(s, buffer);
8936- if (token[0] == 0) {
8937- break;
8938- }
8939- if (strcmp(token, "FORMAT=32-bit_rle_rgbe") == 0) {
8940- valid = 1;
8941- }
8942- }
8943-
8944- if (!valid) {
8945- return stbi__errpf("unsupported format", "Unsupported HDR format");
8946- }
8947-
8948- // Parse width and height
8949- // can't use sscanf() if we're not using stdio!
8950- token = stbi__hdr_gettoken(s, buffer);
8951- if (strncmp(token, "-Y ", 3)) {
8952- return stbi__errpf("unsupported data layout", "Unsupported HDR format");
8953- }
8954- token += 3;
8955- height = (int)strtol(token, &token, 10);
8956- while (*token == ' ') {
8957- ++token;
8958- }
8959- if (strncmp(token, "+X ", 3)) {
8960- return stbi__errpf("unsupported data layout", "Unsupported HDR format");
8961- }
8962- token += 3;
8963- width = (int)strtol(token, NULL, 10);
8964-
8965- if (height > STBI_MAX_DIMENSIONS) {
8966- return stbi__errpf("too large", "Very large image (corrupt?)");
8967- }
8968- if (width > STBI_MAX_DIMENSIONS) {
8969- return stbi__errpf("too large", "Very large image (corrupt?)");
8970- }
8971-
8972- *x = width;
8973- *y = height;
8974-
8975- if (comp) {
8976- *comp = 3;
8977- }
8978- if (req_comp == 0) {
8979- req_comp = 3;
8980- }
8981-
8982- if (!stbi__mad4sizes_valid(width, height, req_comp, sizeof(float), 0)) {
8983- return stbi__errpf("too large", "HDR image is too large");
8984- }
8985-
8986- // Read data
8987- hdr_data =
8988- (float *)stbi__malloc_mad4(width, height, req_comp, sizeof(float), 0);
8989- if (!hdr_data) {
8990- return stbi__errpf("outofmem", "Out of memory");
8991- }
8992-
8993- // Load image data
8994- // image data is stored as some number of sca
8995- if (width < 8 || width >= 32768) {
8996- // Read flat data
8997- for (j = 0; j < height; ++j) {
8998- for (i = 0; i < width; ++i) {
8999- stbi_uc rgbe[4];
9000- main_decode_loop:
9001- stbi__getn(s, rgbe, 4);
9002- stbi__hdr_convert(hdr_data + j * width * req_comp +
9003- i * req_comp,
9004- rgbe, req_comp);
9005- }
9006- }
9007- } else {
9008- // Read RLE-encoded data
9009- scanline = NULL;
9010-
9011- for (j = 0; j < height; ++j) {
9012- c1 = stbi__get8(s);
9013- c2 = stbi__get8(s);
9014- len = stbi__get8(s);
9015- if (c1 != 2 || c2 != 2 || (len & 0x80)) {
9016- // not run-length encoded, so we have to actually use THIS data
9017- // as a decoded pixel (note this can't be a valid pixel--one of
9018- // RGB must be >= 128)
9019- stbi_uc rgbe[4];
9020- rgbe[0] = (stbi_uc)c1;
9021- rgbe[1] = (stbi_uc)c2;
9022- rgbe[2] = (stbi_uc)len;
9023- rgbe[3] = (stbi_uc)stbi__get8(s);
9024- stbi__hdr_convert(hdr_data, rgbe, req_comp);
9025- i = 1;
9026- j = 0;
9027- STBI_FREE(scanline);
9028- goto main_decode_loop; // yes, this makes no sense
9029- }
9030- len <<= 8;
9031- len |= stbi__get8(s);
9032- if (len != width) {
9033- STBI_FREE(hdr_data);
9034- STBI_FREE(scanline);
9035- return stbi__errpf("invalid decoded scanline length",
9036- "corrupt HDR");
9037- }
9038- if (scanline == NULL) {
9039- scanline = (stbi_uc *)stbi__malloc_mad2(width, 4, 0);
9040- if (!scanline) {
9041- STBI_FREE(hdr_data);
9042- return stbi__errpf("outofmem", "Out of memory");
9043- }
9044- }
9045-
9046- for (k = 0; k < 4; ++k) {
9047- int nleft;
9048- i = 0;
9049- while ((nleft = width - i) > 0) {
9050- count = stbi__get8(s);
9051- if (count > 128) {
9052- // Run
9053- value = stbi__get8(s);
9054- count -= 128;
9055- if ((count == 0) || (count > nleft)) {
9056- STBI_FREE(hdr_data);
9057- STBI_FREE(scanline);
9058- return stbi__errpf("corrupt",
9059- "bad RLE data in HDR");
9060- }
9061- for (z = 0; z < count; ++z) {
9062- scanline[i++ * 4 + k] = value;
9063- }
9064- } else {
9065- // Dump
9066- if ((count == 0) || (count > nleft)) {
9067- STBI_FREE(hdr_data);
9068- STBI_FREE(scanline);
9069- return stbi__errpf("corrupt",
9070- "bad RLE data in HDR");
9071- }
9072- for (z = 0; z < count; ++z) {
9073- scanline[i++ * 4 + k] = stbi__get8(s);
9074- }
9075- }
9076- }
9077- }
9078- for (i = 0; i < width; ++i) {
9079- stbi__hdr_convert(hdr_data + (j * width + i) * req_comp,
9080- scanline + i * 4, req_comp);
9081- }
9082- }
9083- if (scanline) {
9084- STBI_FREE(scanline);
9085- }
9086- }
9087-
9088- return hdr_data;
9089-}
9090-
9091-static int
9092-stbi__hdr_info(stbi__context *s, int *x, int *y, int *comp)
9093-{
9094- char buffer[STBI__HDR_BUFLEN];
9095- char *token;
9096- int valid = 0;
9097- int dummy;
9098-
9099- if (!x) {
9100- x = &dummy;
9101- }
9102- if (!y) {
9103- y = &dummy;
9104- }
9105- if (!comp) {
9106- comp = &dummy;
9107- }
9108-
9109- if (stbi__hdr_test(s) == 0) {
9110- stbi__rewind(s);
9111- return 0;
9112- }
9113-
9114- for (;;) {
9115- token = stbi__hdr_gettoken(s, buffer);
9116- if (token[0] == 0) {
9117- break;
9118- }
9119- if (strcmp(token, "FORMAT=32-bit_rle_rgbe") == 0) {
9120- valid = 1;
9121- }
9122- }
9123-
9124- if (!valid) {
9125- stbi__rewind(s);
9126- return 0;
9127- }
9128- token = stbi__hdr_gettoken(s, buffer);
9129- if (strncmp(token, "-Y ", 3)) {
9130- stbi__rewind(s);
9131- return 0;
9132- }
9133- token += 3;
9134- *y = (int)strtol(token, &token, 10);
9135- while (*token == ' ') {
9136- ++token;
9137- }
9138- if (strncmp(token, "+X ", 3)) {
9139- stbi__rewind(s);
9140- return 0;
9141- }
9142- token += 3;
9143- *x = (int)strtol(token, NULL, 10);
9144- *comp = 3;
9145- return 1;
9146-}
9147-#endif // STBI_NO_HDR
9148-
9149-#ifndef STBI_NO_BMP
9150-static int
9151-stbi__bmp_info(stbi__context *s, int *x, int *y, int *comp)
9152-{
9153- void *p;
9154- stbi__bmp_data info;
9155-
9156- info.all_a = 255;
9157- p = stbi__bmp_parse_header(s, &info);
9158- if (p == NULL) {
9159- stbi__rewind(s);
9160- return 0;
9161- }
9162- if (x) {
9163- *x = s->img_x;
9164- }
9165- if (y) {
9166- *y = s->img_y;
9167- }
9168- if (comp) {
9169- if (info.bpp == 24 && info.ma == 0xff000000) {
9170- *comp = 3;
9171- } else {
9172- *comp = info.ma ? 4 : 3;
9173- }
9174- }
9175- return 1;
9176-}
9177-#endif
9178-
9179-#ifndef STBI_NO_PSD
9180-static int
9181-stbi__psd_info(stbi__context *s, int *x, int *y, int *comp)
9182-{
9183- int channelCount, dummy, depth;
9184- if (!x) {
9185- x = &dummy;
9186- }
9187- if (!y) {
9188- y = &dummy;
9189- }
9190- if (!comp) {
9191- comp = &dummy;
9192- }
9193- if (stbi__get32be(s) != 0x38425053) {
9194- stbi__rewind(s);
9195- return 0;
9196- }
9197- if (stbi__get16be(s) != 1) {
9198- stbi__rewind(s);
9199- return 0;
9200- }
9201- stbi__skip(s, 6);
9202- channelCount = stbi__get16be(s);
9203- if (channelCount < 0 || channelCount > 16) {
9204- stbi__rewind(s);
9205- return 0;
9206- }
9207- *y = stbi__get32be(s);
9208- *x = stbi__get32be(s);
9209- depth = stbi__get16be(s);
9210- if (depth != 8 && depth != 16) {
9211- stbi__rewind(s);
9212- return 0;
9213- }
9214- if (stbi__get16be(s) != 3) {
9215- stbi__rewind(s);
9216- return 0;
9217- }
9218- *comp = 4;
9219- return 1;
9220-}
9221-
9222-static int
9223-stbi__psd_is16(stbi__context *s)
9224-{
9225- int channelCount, depth;
9226- if (stbi__get32be(s) != 0x38425053) {
9227- stbi__rewind(s);
9228- return 0;
9229- }
9230- if (stbi__get16be(s) != 1) {
9231- stbi__rewind(s);
9232- return 0;
9233- }
9234- stbi__skip(s, 6);
9235- channelCount = stbi__get16be(s);
9236- if (channelCount < 0 || channelCount > 16) {
9237- stbi__rewind(s);
9238- return 0;
9239- }
9240- STBI_NOTUSED(stbi__get32be(s));
9241- STBI_NOTUSED(stbi__get32be(s));
9242- depth = stbi__get16be(s);
9243- if (depth != 16) {
9244- stbi__rewind(s);
9245- return 0;
9246- }
9247- return 1;
9248-}
9249-#endif
9250-
9251-#ifndef STBI_NO_PIC
9252-static int
9253-stbi__pic_info(stbi__context *s, int *x, int *y, int *comp)
9254-{
9255- int act_comp = 0, num_packets = 0, chained, dummy;
9256- stbi__pic_packet packets[10];
9257-
9258- if (!x) {
9259- x = &dummy;
9260- }
9261- if (!y) {
9262- y = &dummy;
9263- }
9264- if (!comp) {
9265- comp = &dummy;
9266- }
9267-
9268- if (!stbi__pic_is4(s, "\x53\x80\xF6\x34")) {
9269- stbi__rewind(s);
9270- return 0;
9271- }
9272-
9273- stbi__skip(s, 88);
9274-
9275- *x = stbi__get16be(s);
9276- *y = stbi__get16be(s);
9277- if (stbi__at_eof(s)) {
9278- stbi__rewind(s);
9279- return 0;
9280- }
9281- if ((*x) != 0 && (1 << 28) / (*x) < (*y)) {
9282- stbi__rewind(s);
9283- return 0;
9284- }
9285-
9286- stbi__skip(s, 8);
9287-
9288- do {
9289- stbi__pic_packet *packet;
9290-
9291- if (num_packets == sizeof(packets) / sizeof(packets[0])) {
9292- return 0;
9293- }
9294-
9295- packet = &packets[num_packets++];
9296- chained = stbi__get8(s);
9297- packet->size = stbi__get8(s);
9298- packet->type = stbi__get8(s);
9299- packet->channel = stbi__get8(s);
9300- act_comp |= packet->channel;
9301-
9302- if (stbi__at_eof(s)) {
9303- stbi__rewind(s);
9304- return 0;
9305- }
9306- if (packet->size != 8) {
9307- stbi__rewind(s);
9308- return 0;
9309- }
9310- } while (chained);
9311-
9312- *comp = (act_comp & 0x10 ? 4 : 3);
9313-
9314- return 1;
9315-}
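`stbi__pic_info` rejects headers whose pixel count would exceed 2^28. It writes the test as `(1 << 28) / x < y` so that the product `x * y` is never computed. The function below is a hedged standalone version under the hypothetical name `pic_dims_ok`.

```c
#include <assert.h>

/* Hypothetical restatement of the size guard in stbi__pic_info: accept
   x by y only if x * y <= 2^28, testing via division so the product is
   never formed and cannot overflow an int. x == 0 skips the check. */
static int pic_dims_ok(int x, int y)
{
    assert(x >= 0 && y >= 0);
    return x == 0 || (1 << 28) / x >= y;
}
```

`floor(2^28 / x) >= y` holds exactly when `x * y <= 2^28`. A 16384 by 16384 image therefore passes, and 16384 by 16385 does not.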
9316-#endif
9317-
9318-// *************************************************************************************************
9319-// Portable Gray Map and Portable Pixel Map loader
9320-// by Ken Miller
9321-//
9322-// PGM: http://netpbm.sourceforge.net/doc/pgm.html
9323-// PPM: http://netpbm.sourceforge.net/doc/ppm.html
9324-//
9325-// Known limitations:
9326- // Supports only 8-bit and 16-bit samples (max value <= 65535)
9327-// Does not support ASCII image data (formats P2 and P3)
9328-
9329-#ifndef STBI_NO_PNM
9330-
9331-static int
9332-stbi__pnm_test(stbi__context *s)
9333-{
9334- char p, t;
9335- p = (char)stbi__get8(s);
9336- t = (char)stbi__get8(s);
9337- if (p != 'P' || (t != '5' && t != '6')) {
9338- stbi__rewind(s);
9339- return 0;
9340- }
9341- return 1;
9342-}
9343-
9344-static void *
9345-stbi__pnm_load(stbi__context *s, int *x, int *y, int *comp, int req_comp,
9346- stbi__result_info *ri)
9347-{
9348- stbi_uc *out;
9349- STBI_NOTUSED(ri);
9350-
9351- ri->bits_per_channel =
9352- stbi__pnm_info(s, (int *)&s->img_x, (int *)&s->img_y, (int *)&s->img_n);
9353- if (ri->bits_per_channel == 0) {
9354- return 0;
9355- }
9356-
9357- if (s->img_y > STBI_MAX_DIMENSIONS) {
9358- return stbi__errpuc("too large", "Very large image (corrupt?)");
9359- }
9360- if (s->img_x > STBI_MAX_DIMENSIONS) {
9361- return stbi__errpuc("too large", "Very large image (corrupt?)");
9362- }
9363-
9364- *x = s->img_x;
9365- *y = s->img_y;
9366- if (comp) {
9367- *comp = s->img_n;
9368- }
9369-
9370- if (!stbi__mad4sizes_valid(s->img_n, s->img_x, s->img_y,
9371- ri->bits_per_channel / 8, 0)) {
9372- return stbi__errpuc("too large", "PNM too large");
9373- }
9374-
9375- out = (stbi_uc *)stbi__malloc_mad4(s->img_n, s->img_x, s->img_y,
9376- ri->bits_per_channel / 8, 0);
9377- if (!out) {
9378- return stbi__errpuc("outofmem", "Out of memory");
9379- }
9380- if (!stbi__getn(s, out,
9381- s->img_n * s->img_x * s->img_y *
9382- (ri->bits_per_channel / 8))) {
9383- STBI_FREE(out);
9384- return stbi__errpuc("bad PNM", "PNM file truncated");
9385- }
9386-
9387- if (req_comp && req_comp != s->img_n) {
9388- if (ri->bits_per_channel == 16) {
9389- out = (stbi_uc *)stbi__convert_format16(
9390- (stbi__uint16 *)out, s->img_n, req_comp, s->img_x, s->img_y);
9391- } else {
9392- out = stbi__convert_format(out, s->img_n, req_comp, s->img_x,
9393- s->img_y);
9394- }
9395- if (out == NULL) {
9396- return out; // stbi__convert_format frees input on failure
9397- }
9398- }
9399- return out;
9400-}
9401-
9402-static int
9403-stbi__pnm_isspace(char c)
9404-{
9405- return c == ' ' || c == '\t' || c == '\n' || c == '\v' || c == '\f' ||
9406- c == '\r';
9407-}
9408-
9409-static void
9410-stbi__pnm_skip_whitespace(stbi__context *s, char *c)
9411-{
9412- for (;;) {
9413- while (!stbi__at_eof(s) && stbi__pnm_isspace(*c)) {
9414- *c = (char)stbi__get8(s);
9415- }
9416-
9417- if (stbi__at_eof(s) || *c != '#') {
9418- break;
9419- }
9420-
9421- while (!stbi__at_eof(s) && *c != '\n' && *c != '\r') {
9422- *c = (char)stbi__get8(s);
9423- }
9424- }
9425-}
9426-
9427-static int
9428-stbi__pnm_isdigit(char c)
9429-{
9430- return c >= '0' && c <= '9';
9431-}
9432-
9433-static int
9434-stbi__pnm_getinteger(stbi__context *s, char *c)
9435-{
9436- int value = 0;
9437-
9438- while (!stbi__at_eof(s) && stbi__pnm_isdigit(*c)) {
9439- value = value * 10 + (*c - '0');
9440- *c = (char)stbi__get8(s);
9441- if ((value > 214748364) || (value == 214748364 && *c > '7')) {
9442- return stbi__err(
9443- "integer parse overflow",
9444- "Parsing an integer in the PPM header overflowed a 32-bit int");
9445- }
9446- }
9447-
9448- return value;
9449-}
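`stbi__pnm_getinteger` guards against `int` overflow by comparing the running value with 214748364 (`INT_MAX / 10` for a 32-bit `int`). The sketch below applies the same guard to a C string, under the hypothetical name `parse_header_int`. It differs from the listing in two ways. First, it tests the digit about to be appended rather than the character read after it. Second, it reports overflow as -1 instead of setting an error string.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical string version of the overflow guard in
   stbi__pnm_getinteger: before appending digit d, reject when
   value * 10 + d would exceed INT_MAX. Returns -1 on overflow, otherwise
   the value of the leading digits of s (0 if there are none). */
static int parse_header_int(const char *s)
{
    int value = 0;
    assert(s != NULL);
    for (; *s >= '0' && *s <= '9'; ++s) {
        int d = *s - '0';
        if (value > INT_MAX / 10 || (value == INT_MAX / 10 && d > INT_MAX % 10))
            return -1;
        value = value * 10 + d;
    }
    return value;
}
```

With a 32-bit `int`, "2147483647" parses and "2147483648" is rejected.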
9450-
9451-static int
9452-stbi__pnm_info(stbi__context *s, int *x, int *y, int *comp)
9453-{
9454- int maxv, dummy;
9455- char c, p, t;
9456-
9457- if (!x) {
9458- x = &dummy;
9459- }
9460- if (!y) {
9461- y = &dummy;
9462- }
9463- if (!comp) {
9464- comp = &dummy;
9465- }
9466-
9467- stbi__rewind(s);
9468-
9469- // Get identifier
9470- p = (char)stbi__get8(s);
9471- t = (char)stbi__get8(s);
9472- if (p != 'P' || (t != '5' && t != '6')) {
9473- stbi__rewind(s);
9474- return 0;
9475- }
9476-
9477- *comp =
9478- (t == '6') ? 3 : 1; // '5' is 1-component .pgm; '6' is 3-component .ppm
9479-
9480- c = (char)stbi__get8(s);
9481- stbi__pnm_skip_whitespace(s, &c);
9482-
9483- *x = stbi__pnm_getinteger(s, &c); // read width
9484- if (*x == 0) {
9485- return stbi__err("invalid width",
9486- "PPM image header had zero or overflowing width");
9487- }
9488- stbi__pnm_skip_whitespace(s, &c);
9489-
9490- *y = stbi__pnm_getinteger(s, &c); // read height
9491- if (*y == 0) {
9492- return stbi__err("invalid height",
9493- "PPM image header had zero or overflowing height");
9494- }
9495- stbi__pnm_skip_whitespace(s, &c);
9496-
9497- maxv = stbi__pnm_getinteger(s, &c); // read max value
9498- if (maxv > 65535) {
9499- return stbi__err("max value > 65535",
9500- "PPM image supports only 8-bit and 16-bit images");
9501- } else if (maxv > 255) {
9502- return 16;
9503- } else {
9504- return 8;
9505- }
9506-}
9507-
9508-static int
9509-stbi__pnm_is16(stbi__context *s)
9510-{
9511- if (stbi__pnm_info(s, NULL, NULL, NULL) == 16) {
9512- return 1;
9513- }
9514- return 0;
9515-}
9516-#endif
9517-
9518-static int
9519-stbi__info_main(stbi__context *s, int *x, int *y, int *comp)
9520-{
9521-#ifndef STBI_NO_JPEG
9522- if (stbi__jpeg_info(s, x, y, comp)) {
9523- return 1;
9524- }
9525-#endif
9526-
9527-#ifndef STBI_NO_PNG
9528- if (stbi__png_info(s, x, y, comp)) {
9529- return 1;
9530- }
9531-#endif
9532-
9533-#ifndef STBI_NO_GIF
9534- if (stbi__gif_info(s, x, y, comp)) {
9535- return 1;
9536- }
9537-#endif
9538-
9539-#ifndef STBI_NO_BMP
9540- if (stbi__bmp_info(s, x, y, comp)) {
9541- return 1;
9542- }
9543-#endif
9544-
9545-#ifndef STBI_NO_PSD
9546- if (stbi__psd_info(s, x, y, comp)) {
9547- return 1;
9548- }
9549-#endif
9550-
9551-#ifndef STBI_NO_PIC
9552- if (stbi__pic_info(s, x, y, comp)) {
9553- return 1;
9554- }
9555-#endif
9556-
9557-#ifndef STBI_NO_PNM
9558- if (stbi__pnm_info(s, x, y, comp)) {
9559- return 1;
9560- }
9561-#endif
9562-
9563-#ifndef STBI_NO_HDR
9564- if (stbi__hdr_info(s, x, y, comp)) {
9565- return 1;
9566- }
9567-#endif
9568-
9569-// test tga last because it's a crappy test!
9570-#ifndef STBI_NO_TGA
9571- if (stbi__tga_info(s, x, y, comp)) {
9572- return 1;
9573- }
9574-#endif
9575- return stbi__err("unknown image type",
9576- "Image not of any known type, or corrupt");
9577-}
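`stbi__info_main` tries each enabled format in a fixed order and leaves TGA for last because TGA has no magic number. The listing itself contains the signatures for PSD (`0x38425053`, "8BPS"), PIC, PNM (`P5`/`P6`) and HDR. The JPEG, PNG, GIF and BMP prefixes used below are the standard file signatures for those formats. The sketch assumes a hypothetical `sniff_format` that checks only leading bytes; the real probes also validate header fields.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical signature sniffer in the spirit of stbi__info_main: checks
   magic bytes in the same order and returns a short format name, or NULL.
   TGA has no reliable magic, so it is not attempted here. */
static const char *sniff_format(const unsigned char *p, size_t n)
{
    static const struct { const char *name; const char *magic; size_t len; } sigs[] = {
        {"jpeg", "\xFF\xD8", 2},
        {"png", "\x89PNG\r\n\x1A\n", 8},
        {"gif", "GIF8", 4},
        {"bmp", "BM", 2},
        {"psd", "8BPS", 4},
        {"pic", "\x53\x80\xF6\x34", 4},
        {"pnm", "P5", 2},
        {"pnm", "P6", 2},
        {"hdr", "#?RADIANCE\n", 11},
        {"hdr", "#?RGBE\n", 7},
    };
    assert(p != NULL || n == 0);
    for (size_t i = 0; i < sizeof sigs / sizeof sigs[0]; ++i)
        if (n >= sigs[i].len && memcmp(p, sigs[i].magic, sigs[i].len) == 0)
            return sigs[i].name;
    return NULL;
}
```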
9578-
9579-static int
9580-stbi__is_16_main(stbi__context *s)
9581-{
9582-#ifndef STBI_NO_PNG
9583- if (stbi__png_is16(s)) {
9584- return 1;
9585- }
9586-#endif
9587-
9588-#ifndef STBI_NO_PSD
9589- if (stbi__psd_is16(s)) {
9590- return 1;
9591- }
9592-#endif
9593-
9594-#ifndef STBI_NO_PNM
9595- if (stbi__pnm_is16(s)) {
9596- return 1;
9597- }
9598-#endif
9599- return 0;
9600-}
9601-
9602-#ifndef STBI_NO_STDIO
9603-STBIDEF int
9604-stbi_info(char const *filename, int *x, int *y, int *comp)
9605-{
9606- FILE *f = stbi__fopen(filename, "rb");
9607- int result;
9608- if (!f) {
9609- return stbi__err("can't fopen", "Unable to open file");
9610- }
9611- result = stbi_info_from_file(f, x, y, comp);
9612- fclose(f);
9613- return result;
9614-}
9615-
9616-STBIDEF int
9617-stbi_info_from_file(FILE *f, int *x, int *y, int *comp)
9618-{
9619- int r;
9620- stbi__context s;
9621- long pos = ftell(f);
9622- stbi__start_file(&s, f);
9623- r = stbi__info_main(&s, x, y, comp);
9624- fseek(f, pos, SEEK_SET);
9625- return r;
9626-}
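`stbi_info_from_file` and `stbi_is_16_bit_from_file` record the stream position with `ftell`, probe the header, and then `fseek` back, which leaves the caller's `FILE *` where it was. The sketch below is a minimal version of that save-and-restore pattern, under the hypothetical name `peek_header`.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper following the pattern of stbi_info_from_file: note
   the current position, read up to n header bytes, then seek back so the
   stream is left where it was. Returns the bytes read, or 0 on error. */
static size_t peek_header(FILE *f, unsigned char *buf, size_t n)
{
    assert(f != NULL && buf != NULL);
    long pos = ftell(f);
    if (pos < 0)
        return 0; /* stream is not seekable */
    size_t got = fread(buf, 1, n, f);
    if (fseek(f, pos, SEEK_SET) != 0)
        return 0;
    return got;
}
```

Checking the result of `ftell` matters for pipes and other non-seekable streams, where it returns -1.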
9627-
9628-STBIDEF int
9629-stbi_is_16_bit(char const *filename)
9630-{
9631- FILE *f = stbi__fopen(filename, "rb");
9632- int result;
9633- if (!f) {
9634- return stbi__err("can't fopen", "Unable to open file");
9635- }
9636- result = stbi_is_16_bit_from_file(f);
9637- fclose(f);
9638- return result;
9639-}
9640-
9641-STBIDEF int
9642-stbi_is_16_bit_from_file(FILE *f)
9643-{
9644- int r;
9645- stbi__context s;
9646- long pos = ftell(f);
9647- stbi__start_file(&s, f);
9648- r = stbi__is_16_main(&s);
9649- fseek(f, pos, SEEK_SET);
9650- return r;
9651-}
9652-#endif // !STBI_NO_STDIO
9653-
9654-STBIDEF int
9655-stbi_info_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp)
9656-{
9657- stbi__context s;
9658- stbi__start_mem(&s, buffer, len);
9659- return stbi__info_main(&s, x, y, comp);
9660-}
9661-
9662-STBIDEF int
9663-stbi_info_from_callbacks(stbi_io_callbacks const *c, void *user, int *x, int *y,
9664- int *comp)
9665-{
9666- stbi__context s;
9667- stbi__start_callbacks(&s, (stbi_io_callbacks *)c, user);
9668- return stbi__info_main(&s, x, y, comp);
9669-}
9670-
9671-STBIDEF int
9672-stbi_is_16_bit_from_memory(stbi_uc const *buffer, int len)
9673-{
9674- stbi__context s;
9675- stbi__start_mem(&s, buffer, len);
9676- return stbi__is_16_main(&s);
9677-}
9678-
9679-STBIDEF int
9680-stbi_is_16_bit_from_callbacks(stbi_io_callbacks const *c, void *user)
9681-{
9682- stbi__context s;
9683- stbi__start_callbacks(&s, (stbi_io_callbacks *)c, user);
9684- return stbi__is_16_main(&s);
9685-}
9686-
9687-#endif // STB_IMAGE_IMPLEMENTATION
9688-
9689-/*
9690- revision history:
9691- 2.20 (2019-02-07) support utf8 filenames in Windows; fix warnings and
9692- platform ifdefs 2.19 (2018-02-11) fix warning 2.18 (2018-01-30) fix
9693- warnings 2.17 (2018-01-29) change stbi__shiftsigned to avoid clang -O2 bug
9694- 1-bit BMP
9695- *_is_16_bit api
9696- avoid warnings
9697- 2.16 (2017-07-23) all functions have 16-bit variants;
9698- STBI_NO_STDIO works again;
9699- compilation fixes;
9700- fix rounding in unpremultiply;
9701- optimize vertical flip;
9702- disable raw_len validation;
9703- documentation fixes
9704- 2.15 (2017-03-18) fix png-1,2,4 bug; now all Imagenet JPGs decode;
9705- warning fixes; disable run-time SSE detection on gcc;
9706- uniform handling of optional "return" values;
9707- thread-safe initialization of zlib tables
9708- 2.14 (2017-03-03) remove deprecated STBI_JPEG_OLD; fixes for Imagenet
9709- JPGs 2.13 (2016-11-29) add 16-bit API, only supported for PNG right now 2.12
9710- (2016-04-02) fix typo in 2.11 PSD fix that caused crashes 2.11 (2016-04-02)
9711- allocate large structures on the stack remove white matting for transparent
9712- PSD fix reported channel count for PNG & BMP re-enable SSE2 in non-gcc 64-bit
9713- support RGB-formatted JPEG
9714- read 16-bit PNGs (only as 8-bit)
9715- 2.10 (2016-01-22) avoid warning introduced in 2.09 by STBI_REALLOC_SIZED
9716- 2.09 (2016-01-16) allow comments in PNM files
9717- 16-bit-per-pixel TGA (not bit-per-component)
9718- info() for TGA could break due to .hdr handling
9719- info() for BMP shares code instead of a sloppy parse
9720- can use STBI_REALLOC_SIZED if allocator doesn't support
9721- realloc code cleanup 2.08 (2015-09-13) fix to 2.07 cleanup, reading RGB PSD
9722- as RGBA 2.07 (2015-09-13) fix compiler warnings partial animated GIF support
9723- limited 16-bpc PSD support
9724- #ifdef unused functions
9725- bug with < 92 byte PIC,PNM,HDR,TGA
9726- 2.06 (2015-04-19) fix bug where PSD returns wrong '*comp' value
9727- 2.05 (2015-04-19) fix bug in progressive JPEG handling, fix warning
9728- 2.04 (2015-04-15) try to re-enable SIMD on MinGW 64-bit
9729- 2.03 (2015-04-12) extra corruption checking (mmozeiko)
9730- stbi_set_flip_vertically_on_load (nguillemot)
9731- fix NEON support; fix mingw support
9732- 2.02 (2015-01-19) fix incorrect assert, fix warning
9733- 2.01 (2015-01-17) fix various warnings; suppress SIMD on gcc 32-bit
9734- without -msse2 2.00b (2014-12-25) fix STBI_MALLOC in progressive JPEG 2.00
9735- (2014-12-25) optimize JPG, including x86 SSE2 & NEON SIMD (ryg) progressive
9736- JPEG (stb) PGM/PPM support (Ken Miller) STBI_MALLOC,STBI_REALLOC,STBI_FREE
9737- GIF bugfix -- seemingly never worked
9738- STBI_NO_*, STBI_ONLY_*
9739- 1.48 (2014-12-14) fix incorrectly-named assert()
9740- 1.47 (2014-12-14) 1/2/4-bit PNG support, both direct and paletted (Omar
9741- Cornut & stb) optimize PNG (ryg) fix bug in interlaced PNG with
9742- user-specified channel count (stb) 1.46 (2014-08-26) fix broken tRNS chunk
9743- (colorkey-style transparency) in non-paletted PNG 1.45 (2014-08-16) fix
9744- MSVC-ARM internal compiler error by wrapping malloc 1.44 (2014-08-07)
9745- various warning fixes from Ronny Chevalier
9746- 1.43 (2014-07-15)
9747- fix MSVC-only compiler problem in code changed in 1.42
9748- 1.42 (2014-07-09)
9749- don't define _CRT_SECURE_NO_WARNINGS (affects user code)
9750- fixes to stbi__cleanup_jpeg path
9751- added STBI_ASSERT to avoid requiring assert.h
9752- 1.41 (2014-06-25)
9753- fix search&replace from 1.36 that messed up comments/error
9754- messages 1.40 (2014-06-22) fix gcc struct-initialization warning 1.39
9755- (2014-06-15) fix to TGA optimization when req_comp != number of components in
9756- TGA; fix to GIF loading because BMP wasn't rewinding (whoops, no GIFs in my
9757- test suite) add support for BMP version 5 (more ignored fields) 1.38
9758- (2014-06-06) suppress MSVC warnings on integer casts truncating values fix
9759- accidental rename of 'skip' field of I/O 1.37 (2014-06-04) remove duplicate
9760- typedef 1.36 (2014-06-03) convert to header file single-file library if
9761- de-iphone isn't set, load iphone images color-swapped instead of returning
9762- NULL 1.35 (2014-05-27) various warnings fix broken STBI_SIMD path fix bug
9763- where stbi_load_from_file no longer left file pointer in correct place fix
9764- broken non-easy path for 32-bit BMP (possibly never used) TGA optimization by
9765- Arseny Kapoulkine 1.34 (unknown) use STBI_NOTUSED in
9766- stbi__resample_row_generic(), fix one more leak in tga failure case 1.33
9767- (2011-07-14) make stbi_is_hdr work in STBI_NO_HDR (as specified), minor
9768- compiler-friendly improvements 1.32 (2011-07-13) support for "info" function
9769- for all supported filetypes (SpartanJ) 1.31 (2011-06-20) a few more leak
9770- fixes, bug in PNG handling (SpartanJ) 1.30 (2011-06-11) added ability to
9771- load files via callbacks to accommodate custom input streams (Ben Wenger)
9772- removed deprecated format-specific test/load functions
9773- removed support for installable file formats (stbi_loader) --
9774- would have been broken for IO callbacks anyway error cases in bmp and tga
9775- give messages and don't leak (Raymond Barbiero, grisha) fix inefficiency in
9776- decoding 32-bit BMP (David Woo) 1.29 (2010-08-16) various warning fixes from
9777- Aurelien Pocheville 1.28 (2010-08-01) fix bug in GIF palette transparency
9778- (SpartanJ) 1.27 (2010-08-01) cast-to-stbi_uc to fix warnings 1.26
9779- (2010-07-24) fix bug in file buffering for PNG reported by SpartanJ 1.25
9780- (2010-07-17) refix trans_data warning (Won Chun) 1.24 (2010-07-12) perf
9781- improvements reading from files on platforms with lock-heavy fgetc() minor
9782- perf improvements for jpeg deprecated type-specific functions so we'll get
9783- feedback if they're needed attempt to fix trans_data warning (Won Chun) 1.23
9784- fixed bug in iPhone support 1.22 (2010-07-10) removed image *writing*
9785- support stbi_info support from Jetro Lauha GIF support from Jean-Marc Lienher
9786- iPhone PNG-extensions from James Brown
9787- warning-fixes from Nicolas Schulz and Janez Zemva (i.e.
9788- Janez Žemva) 1.21 fix use of 'stbi_uc' in header (reported by jon
9789- blow) 1.20 added support for Softimage PIC, by Tom Seddon 1.19 bug in
9790- interlaced PNG corruption check (found by ryg) 1.18 (2008-08-02) fix a
9791- threading bug (local mutable static) 1.17 support interlaced PNG 1.16
9792- major bugfix - stbi__convert_format converted one too many pixels 1.15
9793- initialize some fields for thread safety 1.14 fix threadsafe conversion
9794- bug header-file-only version (#define STBI_HEADER_FILE_ONLY before including)
9795- 1.13 threadsafe
9796- 1.12 const qualifiers in the API
9797- 1.11 Support installable IDCT, colorspace conversion routines
9798- 1.10 Fixes for 64-bit (don't use "unsigned long")
9799- optimized upsampling by Fabian "ryg" Giesen
9800- 1.09 Fix format-conversion for PSD code (bad global variables!)
9801- 1.08 Thatcher Ulrich's PSD code integrated by Nicolas Schulz
9802- 1.07 attempt to fix C++ warning/errors again
9803- 1.06 attempt to fix C++ warning/errors again
9804- 1.05 fix TGA loading to return correct *comp and use good luminance
9805- calc 1.04 default float alpha is 1, not 255; use 'void *' for
9806- stbi_image_free 1.03 bugfixes to STBI_NO_STDIO, STBI_NO_HDR 1.02 support
9807- for (subset of) HDR files, float interface for preferred access to them 1.01
9808- fix bug: possible bug in handling right-side up bmps... not sure fix bug: the
9809- stbi__bmp_load() and stbi__tga_load() functions didn't work at all 1.00
9810- interface to zlib that skips zlib header 0.99 correct handling of alpha in
9811- palette 0.98 TGA loader by lonesock; dynamically add loaders (untested)
9812- 0.97 jpeg errors on too large a file; also catch another malloc failure
9813- 0.96 fix detection of invalid v value - particleman@mollyrocket forum
9814- 0.95 during header scan, seek to markers in case of padding
9815- 0.94 STBI_NO_STDIO to disable stdio usage; rename all #defines the same
9816- 0.93 handle jpegtran output; verbose errors
9817- 0.92 read 4,8,16,24,32-bit BMP files of several formats
9818- 0.91 output 24-bit Windows 3.0 BMP files
9819- 0.90 fix a few more warnings; bump version number to approach 1.0
9820- 0.61 bugfixes due to Marc LeBlanc, Christopher Lloyd
9821- 0.60 fix compiling as c++
9822- 0.59 fix warnings: merge Dave Moore's -Wall fixes
9823- 0.58 fix bug: zlib uncompressed mode len/nlen was wrong endian
9824- 0.57 fix bug: jpg last huffman symbol before marker was >9 bits but
9825- less than 16 available 0.56 fix bug: zlib uncompressed mode len vs. nlen
9826- 0.55 fix bug: restart_interval not initialized to 0
9827- 0.54 allow NULL for 'int *comp'
9828- 0.53 fix bug in png 3->4; speedup png decoding
9829- 0.52 png handles req_comp=3,4 directly; minor cleanup; jpeg comments
9830- 0.51 obey req_comp requests, 1-component jpegs return as 1-component,
9831- on 'test' only check type, not whether we support this variant
9832- 0.50 (2006-11-19)
9833- first released version
9834-*/
9835-
9836-/*
9837-------------------------------------------------------------------------------
9838-This software is available under 2 licenses -- choose whichever you prefer.
9839-------------------------------------------------------------------------------
9840-ALTERNATIVE A - MIT License
9841-Copyright (c) 2017 Sean Barrett
9842-Permission is hereby granted, free of charge, to any person obtaining a copy of
9843-this software and associated documentation files (the "Software"), to deal in
9844-the Software without restriction, including without limitation the rights to
9845-use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
9846-of the Software, and to permit persons to whom the Software is furnished to do
9847-so, subject to the following conditions:
9848-The above copyright notice and this permission notice shall be included in all
9849-copies or substantial portions of the Software.
9850-THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
9851-IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
9852-FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
9853-AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
9854-LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
9855-OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
9856-SOFTWARE.
9857-------------------------------------------------------------------------------
9858-ALTERNATIVE B - Public Domain (www.unlicense.org)
9859-This is free and unencumbered software released into the public domain.
9860-Anyone is free to copy, modify, publish, use, compile, sell, or distribute this
9861-software, either in source code form or as a compiled binary, for any purpose,
9862-commercial or non-commercial, and by any means.
9863-In jurisdictions that recognize copyright laws, the author or authors of this
9864-software dedicate any and all copyright interest in the software to the public
9865-domain. We make this dedication for the benefit of the public at large and to
9866-the detriment of our heirs and successors. We intend this dedication to be an
9867-overt act of relinquishment in perpetuity of all present and future rights to
9868-this software under copyright law.
9869-THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
9870-IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
9871-FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
9872-AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
9873-ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
9874-WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
9875-------------------------------------------------------------------------------
9876-*/
+0,
-13259
1@@ -1,13259 +0,0 @@
2-/* stb_image_resize2 - v2.17 - public domain image resizing
3-
4- by Jeff Roberts (v2) and Jorge L Rodriguez
5- http://github.com/nothings/stb
6-
7- Can be threaded with the extended API. SSE2, AVX, Neon and WASM SIMD support.
8- Only scaling and translation is supported, no rotations or shears.
9-
10- COMPILING & LINKING
11- In one C/C++ file that #includes this file, do this:
12- #define STB_IMAGE_RESIZE_IMPLEMENTATION
13- before the #include. That will create the implementation in that file.
14-
15- EASY API CALLS:
16- Easy API downsamples w/Mitchell filter, upsamples w/cubic interpolation,
17- clamps to edge.
18-
19- stbir_resize_uint8_srgb( input_pixels, input_w, input_h,
20- input_stride_in_bytes, output_pixels, output_w, output_h,
21- output_stride_in_bytes, pixel_layout_enum )
22-
23- stbir_resize_uint8_linear( input_pixels, input_w, input_h,
24- input_stride_in_bytes, output_pixels, output_w, output_h,
25- output_stride_in_bytes, pixel_layout_enum )
26-
27- stbir_resize_float_linear( input_pixels, input_w, input_h,
28- input_stride_in_bytes, output_pixels, output_w, output_h,
29- output_stride_in_bytes, pixel_layout_enum )
30-
31- If you pass NULL or zero for the output_pixels, we will allocate the output
32- buffer for you and return it from the function (free with free() or
33- STBIR_FREE). As a special case, XX_stride_in_bytes of 0 means packed
34- contiguously in memory.
35-
36- API LEVELS
37- There are three levels of API - easy-to-use, medium-complexity and
38- extended-complexity.
39-
40- See the "header file" section of the source for API documentation.
41-
42- ADDITIONAL DOCUMENTATION
43-
44- MEMORY ALLOCATION
45- By default, we use malloc and free for memory allocation. To override
46- the memory allocation, before the implementation #include, add a:
47-
48- #define STBIR_MALLOC(size,user_data) ...
49- #define STBIR_FREE(ptr,user_data) ...
50-
51- Each resize makes exactly one call to malloc/free (unless you use the
52- extended API where you can do one allocation for many resizes). Under
53- address sanitizer, we do separate allocations to find overread/writes.
54-
55- PERFORMANCE
56- This library was written with an emphasis on performance. When testing
57- stb_image_resize with RGBA, the fastest mode is STBIR_4CHANNEL with
58- STBIR_TYPE_UINT8 pixels and CLAMPed edges (which is what many other
59- resize libs do by default). Also, make sure SIMD is turned on of course
60- (default for 64-bit targets). Avoid WRAP edge mode if you want the fastest
61- speed.
62-
63- This library also comes with profiling built-in. If you define
64- STBIR_PROFILE, you can use the advanced API and get low-level profiling
65- information by calling stbir_resize_extended_profile_info() or
66- stbir_resize_split_profile_info() after a resize.
67-
68- SIMD
69- Most of the routines have optimized SSE2, AVX, NEON and WASM versions.
70-
71- On Microsoft compilers, we automatically turn on SIMD for 64-bit x64
72- and ARM; for 32-bit x86 and ARM, you select SIMD mode by defining STBIR_SSE2
73- or STBIR_NEON. For AVX and AVX2, we auto-select it by detecting the /arch:AVX
74- or /arch:AVX2 switches. You can also always manually turn SSE2, AVX or
75- AVX2 support on by defining STBIR_SSE2, STBIR_AVX or STBIR_AVX2.
76-
77- On Linux, SSE2 and Neon are on by default for 64-bit x64 or ARM64. For
78- 32-bit, we select x86 SIMD mode by whether you have -msse2, -mavx or -mavx2
79- enabled on the command line. For 32-bit ARM, you must pass -mfpu=neon-vfpv4
80- for both clang and GCC, but GCC also requires an additional
81- -mfp16-format=ieee to automatically enable NEON.
82-
83- On x86 platforms, you can also define STBIR_FP16C to turn on FP16C
84- instructions for converting back and forth to half-floats. This is
85- autoselected when we are using AVX2. Clang and GCC also require the -mf16c
86- switch. ARM always uses the built-in half float hardware NEON instructions.
87-
88- You can also tell us to use multiply-add instructions with
89- STBIR_USE_FMA. Because x86 doesn't always have fma, we turn it off by default
90- to maintain determinism across all platforms. If you don't care about non-FMA
91- determinism and are willing to restrict yourself to more recent x86 CPUs
92- (around the AVX timeframe), then fma will give you around a 15% speedup.
93-
94- You can force off SIMD in all cases by defining STBIR_NO_SIMD. You can
95- turn off AVX or AVX2 specifically with STBIR_NO_AVX or STBIR_NO_AVX2. AVX is
96- 10% to 40% faster, and AVX2 is generally another 12%.
97-
98- ALPHA CHANNEL
99- Most of the resizing functions provide the ability to control how the
100- alpha channel of an image is processed.
101-
102- When alpha represents transparency, it is important that when combining
103- colors with filtering, the pixels should not be treated equally; they
104- should use a weighted average based on their alpha values. For example,
105- if a pixel is 1% opaque bright green and another pixel is 99% opaque
106- black and you average them, the average will be 50% opaque, but the
107- unweighted average will be a middling green color, while the
108- weighted average will be nearly black. This means the unweighted version
109- introduced green energy that didn't exist in the source image.
110-
111- (If you want to know why this makes sense, you can work out the math
112- for the following: consider what happens if you alpha composite a source
113- image over a fixed color and then average the output, vs. if you average the
114- source image pixels and then composite that over the same fixed color.
115- Only the weighted average produces the same result as the ground truth
116- composite-then-average result.)
117-
118- Therefore, it is in general best to "alpha weight" the pixels when
119- applying filters to them. This essentially means multiplying the colors by
120- the alpha values before combining them, and then dividing by the alpha value
121- at the end.
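The 1%-opaque green / 99%-opaque black example above can be checked numerically. This is a minimal sketch, assuming float channels in the range 0..1; `average_rgba` and its `weighted` flag are illustrative names, not part of the stb_image_resize2 API:

```c
#include <assert.h>

typedef struct { float r, g, b, a; } rgba;

/* Illustrative only: average two non-premultiplied pixels.
   weighted == 0: plain per-channel average.
   weighted != 0: "alpha weighting", i.e. multiply colors by alpha,
   average, then divide by the averaged alpha. */
static rgba average_rgba(rgba p, rgba q, int weighted)
{
    rgba out;
    assert(p.a >= 0.0f && q.a >= 0.0f);
    out.a = 0.5f * (p.a + q.a);
    if (!weighted) {
        out.r = 0.5f * (p.r + q.r);
        out.g = 0.5f * (p.g + q.g);
        out.b = 0.5f * (p.b + q.b);
        return out;
    }
    if (out.a == 0.0f) {          /* fully transparent: color is undefined */
        out.r = out.g = out.b = 0.0f;
        return out;
    }
    out.r = 0.5f * (p.r * p.a + q.r * q.a) / out.a;
    out.g = 0.5f * (p.g * p.a + q.g * q.a) / out.a;
    out.b = 0.5f * (p.b * p.a + q.b * q.a) / out.a;
    return out;
}
```

For those two pixels the unweighted average has g = 0.5 (a middling green), while the weighted average has g of roughly 0.01 (nearly black); both results are about 50% opaque.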
122-
123- The computer graphics industry introduced a technique called
124- "premultiplied alpha" or "associated alpha" in which image colors are stored
125- in image files already multiplied by their alpha. This saves some math when
126- compositing, and also avoids the need to divide by the alpha at the end
127- (which is quite inefficient). However, while premultiplied alpha is common in
128- the movie CGI industry, it is not commonplace in other industries like
129- videogames, and most consumer file formats are generally expected to contain
130- not-premultiplied colors. For example, Photoshop saves PNG files
131- "unpremultiplied", and web browsers like Chrome and Firefox expect PNG images
132- to be unpremultiplied.
133-
134- Note that there are three possibilities that might describe your image
135- and resize expectation:
136-
137- 1. images are not premultiplied, alpha weighting is desired
138- 2. images are not premultiplied, alpha weighting is not desired
139- 3. images are premultiplied
140-
141- Both case #2 and case #3 require the exact same math: no alpha
142- weighting should be applied or removed. Only case 1 requires extra math
143- operations; the other two cases can be handled identically.
144-
145- stb_image_resize expects case #1 by default, applying alpha weighting
146- to images, expecting the input images to be unpremultiplied. This is what the
147- COLOR+ALPHA buffer types tell the resizer to do.
148-
149- When you use the pixel layouts STBIR_RGBA, STBIR_BGRA, STBIR_ARGB,
150- STBIR_ABGR, STBIR_RX, or STBIR_XR you are telling us that the pixels
151- are non-premultiplied. In these cases, the resizer will alpha weight the
152- colors (effectively creating the premultiplied image), do the filtering, and
153- then convert back to non-premult on exit.
154-
155- When you use the pixel layouts STBIR_RGBA_PM, STBIR_BGRA_PM,
156- STBIR_ARGB_PM, STBIR_ABGR_PM, STBIR_RX_PM or STBIR_XR_PM, you are telling
157- us that the pixels ARE premultiplied. In this case, the resizer doesn't have to
158- do the premultiplying - it can filter directly on the input. This is about twice
159- as fast as the non-premultiplied case, so it's the right option if your data
160- is already set up correctly.
161-
162- When you use the pixel layout STBIR_4CHANNEL or STBIR_2CHANNEL, you are
163- telling us that there is no channel that represents transparency; it
164- may be RGB and some unrelated fourth channel that has been stored in the
165- alpha channel, but it is actually not alpha. No special processing will be
166- performed.
167-
168- The difference between the generic 4 or 2 channel layouts and the
169- specialized _PM versions is that with the _PM versions you are telling us
170- that the data *is* alpha; just don't premultiply it. That's important when
171- using SRGB pixel formats: we need to know where the alpha is, because
172- it is converted linearly (rather than with the SRGB converters).
173-
174- Because alpha weighting produces the same effect as premultiplying, you
175- even have the option with non-premultiplied inputs to let the resizer
176- produce a premultiplied output. Because the initially computed
177- alpha-weighted output image is effectively premultiplied, this is actually
178- more performant than the normal path which un-premultiplies the output image
179- as a final step.
180-
181- Finally, when converting both in and out of non-premultiplied space
182- (for example, when using STBIR_RGBA), we go to somewhat heroic measures to
183- ensure that areas with zero alpha value pixels get something reasonable
184- in the RGB values. If you don't care about the RGB values of zero alpha
185- pixels, you can call the stbir_set_non_pm_alpha_speed_over_quality()
186- function - this runs a non-premultiplied resize about 25% faster. That
187- said, when you really care about speed, using premultiplied pixels for both
188- in and out (STBIR_RGBA_PM, etc.) is much faster than either of these
189- options.
190-
191- PIXEL LAYOUT CONVERSION
192- The resizer can convert from some pixel layouts to others. When using
193- the stbir_set_pixel_layouts(), you can, for example, specify STBIR_RGBA on
194- input, and STBIR_ARGB on output, and it will re-organize the channels during
195- the resize. Currently, you can only convert between two pixel layouts with
196- the same number of channels.
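Converting between two layouts with the same channel count is a fixed per-pixel permutation of channels. A sketch of RGBA to ARGB on packed 8-bit data; `rgba_to_argb` is a hypothetical helper for illustration, since stb_image_resize2 performs its own conversion internally as part of the resize:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: reorder packed RGBA8 pixels into ARGB8.
   src and dst must not overlap. */
static void rgba_to_argb(const unsigned char *src, unsigned char *dst,
                         size_t pixels)
{
    size_t i;
    assert(src != NULL && dst != NULL);
    for (i = 0; i < pixels; ++i) {
        const unsigned char *s = src + 4 * i;
        unsigned char *d = dst + 4 * i;
        d[0] = s[3]; /* A */
        d[1] = s[0]; /* R */
        d[2] = s[1]; /* G */
        d[3] = s[2]; /* B */
    }
}
```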
197-
198- DETERMINISM
199- We commit to being deterministic (from x64 to ARM to scalar to SIMD,
200- etc). This requires compiling with fast-math off (using at least
201- /fp:precise). Also, you must turn off fp-contracting (which turns mult+adds
202- into fmas)! We attempt to do this with pragmas, but with Clang, you usually
203- want to add -ffp-contract=off to the command line as well.
204-
205- For 32-bit x86, you must use SSE and SSE2 codegen for determinism. That
206- is, if the scalar x87 unit gets used at all, we immediately lose determinism.
207- On Microsoft Visual Studio 2008 and earlier, from what we can tell
208- there is no way to be deterministic in 32-bit x86 (some x87 always leaks in,
209- even with fp:strict). On 32-bit x86 GCC, determinism requires both -msse2 and
210- -mfpmath=sse.
211-
212- Note that we will not be deterministic with float data containing NaNs
213- - the NaNs will propagate differently on different SIMD and platforms.
214-
215- If you turn on STBIR_USE_FMA, then we will be deterministic with other
216- fma targets, but we will differ from non-fma targets (this is
217- unavoidable, because an fma isn't simply an add with a mult - it also
218- introduces a rounding difference compared to non-fma instruction sequences).
219-
220- FLOAT PIXEL FORMAT RANGE
221- Any range of values can be used for the non-alpha float data that you
222- pass in (0 to 1, -1 to 1, whatever). However, if you are inputting float
223- values but *outputting* bytes or shorts, you must use a range of 0 to 1 so
224- that we scale back properly. The alpha channel must also be 0 to 1 for any
225- format that does premultiplication prior to resizing.
226-
227- Note also that with float output, using filters with negative lobes,
228- the output filtered values might go slightly out of range. You can define
229- STBIR_FLOAT_LOW_CLAMP and/or STBIR_FLOAT_HIGH_CLAMP to specify the
230- range to clamp to on output, if that's important.
231-
232- MAX/MIN SCALE FACTORS
233- The input pixel resolutions are in integers, and we do the internal
234- pointer resolution in size_t sized integers. However, the scale ratio from
235- input resolution to output resolution is calculated in float form. This means
236- the effective possible scale ratio is limited to 24 bits (or 16 million
237- to 1). As you get close to the size of the float resolution (again, 16
238- million pixels wide or high), you might start seeing float inaccuracy
239- issues in general in the pipeline. If you have to do extreme resizes,
240- you can usually do this in multiple stages (using float intermediate
241- buffers).
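The 24-bit figure is the significand width of IEEE-754 single precision: above 2^24 = 16,777,216, not every integer has an exact `float` representation. A quick check, assuming IEEE-754 `float` and the default round-to-nearest mode (`exact_in_float` is an illustrative name; `volatile` only keeps the compiler from evaluating in wider precision):

```c
#include <assert.h>
#include <float.h>

/* Illustrative check: does integer n survive a round trip through float? */
static int exact_in_float(long n)
{
    volatile float f;
    assert(FLT_MANT_DIG == 24);   /* assumes IEEE-754 binary32 floats */
    f = (float)n;                 /* rounds to the nearest representable float */
    return (long)f == n;
}
```

16777217 rounds to 16777216.0f, so two sizes that differ by one pixel can become indistinguishable once converted to float.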
242-
243- FLIPPED IMAGES
244- Stride is just the delta from one scanline to the next. This means you
245- can use a negative stride to handle inverted images (point to the final
246- scanline and use a negative stride). You can invert the input or
247- output, using negative strides.
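Because a stride is just a signed byte offset between rows, starting at the last row with a negative stride reads the image bottom-up. A sketch with a hypothetical `copy_rows` helper (not an stb_image_resize2 function) that also treats a stride of 0 as tightly packed, matching the easy API's convention:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy h rows of row_bytes each.
   Strides are signed byte offsets between successive rows;
   0 means packed, i.e. stride == row_bytes. */
static void copy_rows(const unsigned char *src, ptrdiff_t src_stride,
                      unsigned char *dst, ptrdiff_t dst_stride,
                      size_t row_bytes, int h)
{
    int y;
    assert(h >= 0);
    if (src_stride == 0) src_stride = (ptrdiff_t)row_bytes;
    if (dst_stride == 0) dst_stride = (ptrdiff_t)row_bytes;
    for (y = 0; y < h; ++y)
        memcpy(dst + y * dst_stride, src + y * src_stride, row_bytes);
}
```

Flipping a 3-row image is then `copy_rows(img + 2 * row_bytes, -(ptrdiff_t)row_bytes, out, 0, row_bytes, 3)`: the source pointer starts at the final scanline and steps backwards.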
248-
249- DEFAULT FILTERS
250- For functions which don't provide explicit control over what filters to
251- use, you can change the compile-time defaults with:
252-
253- #define STBIR_DEFAULT_FILTER_UPSAMPLE STBIR_FILTER_something
254- #define STBIR_DEFAULT_FILTER_DOWNSAMPLE STBIR_FILTER_something
255-
256- See stbir_filter in the header-file section for the list of filters.
257-
258- NEW FILTERS
259- A number of 1D filter kernels are supplied. For a list of supported
260- filters, see the stbir_filter enum. You can install your own filters by
261- using the stbir_set_filter_callbacks function.
262-
263- PROGRESS
264- For interactive use with slow resize operations, you can use the
265- scanline callbacks in the extended API. It would have to be a *very*
266- large image resample to need progress though - we're very fast.
267-
268- CEIL and FLOOR
269- In scalar mode, the only functions we use from math.h are ceilf and
270- floorf, but if you have your own versions, you can define the STBIR_CEILF(v)
271- and STBIR_FLOORF(v) macros and we'll use them instead. In SIMD, we just use
272- our own versions.
273-
274- ASSERT
275- Define STBIR_ASSERT(boolval) to override assert() and not use assert.h
276-
277- PORTING FROM VERSION 1
278- The API has changed. You can continue to use the old version of
279- stb_image_resize.h, which is available in the "deprecated/" directory.
280-
281- If you're using the old simple-to-use API, porting is straightforward.
282- (For more advanced APIs, read the documentation.)
283-
284- stbir_resize_uint8():
285- - call `stbir_resize_uint8_linear`, cast channel count to
286- `stbir_pixel_layout`
287-
288- stbir_resize_float():
289- - call `stbir_resize_float_linear`, cast channel count to
290- `stbir_pixel_layout`
291-
292- stbir_resize_uint8_srgb():
293- - function name is unchanged
294- - cast channel count to `stbir_pixel_layout`
295- - above is sufficient unless your image has alpha and it's not
296- RGBA/BGRA
297- - in that case, follow the below instructions for
298- stbir_resize_uint8_srgb_edgemode
299-
300- stbir_resize_uint8_srgb_edgemode()
301- - switch to the "medium complexity" API
302- - stbir_resize(), very similar API but a few more parameters:
303- - pixel_layout: cast channel count to `stbir_pixel_layout`
304- - data_type: STBIR_TYPE_UINT8_SRGB
305- - edge: unchanged (STBIR_EDGE_WRAP, etc.)
306- - filter: STBIR_FILTER_DEFAULT
307- - which channel is alpha is specified in stbir_pixel_layout, see
308- enum for details
309-
310- FUTURE TODOS
311- * For polyphase integral filters, we just memcpy the coeffs to dupe
312- them, but we should indirect and use the same coeff memory.
313- * Add pixel layout conversions for sensible different channel counts
314- (maybe, 1->3/4, 3->4, 4->1, 3->1).
315- * For SIMD encode and decode scanline routines, do any pre-aligning
316- for bad input/output buffer alignments and pitch?
317- * For very wide scanlines, should we do vertical strips to stay
318- within L2 cache. Maybe do chunks of 1K pixels at a time. There would be some
319- pixel reconversion, but probably dwarfed by things falling out of cache.
320- Probably also something possible with alternating between scattering and
321- gathering at high resize scales?
322- * Should we have a multiple-MIPs-at-the-same-time function (could keep
323- more memory in cache during multiple resizes)?
324- * Rewrite the coefficient generator to do many at once.
325- * AVX-512 vertical kernels - worried about downclocking here.
326- * Convert the reincludes to macros when we know they aren't changing.
327- * Experiment with pivoting the horizontal and always using the
328- vertical filters (which are faster, but perhaps not enough to
329- overcome the pivot cost and the extra memory touches). Need to buffer the
330- whole image so have to balance memory use.
331- * Most of our code is internally function pointers, should we compile
332- all the SIMD stuff always and dynamically dispatch?
333-
334- CONTRIBUTORS
335- Jeff Roberts: 2.0 implementation, optimizations, SIMD
336- Martins Mozeiko: NEON simd, WASM simd, clang and GCC whisperer
337- Fabian Giesen: half float and srgb converters
338- Sean Barrett: API design, optimizations
339- Jorge L Rodriguez: Original 1.0 implementation
340- Aras Pranckevicius: bugfixes
341- Nathan Reed: warning fixes for 1.0
342-
343- REVISIONS
344- 2.17 (2025-10-25) silly format bug in easy-to-use APIs.
345- 2.16 (2025-10-21) fixed the easy-to-use APIs to allow inverted bitmaps (negative strides),
346-     fix vertical filter kernel callback, fix threaded gather buffer priming (and assert).
347-     (thanks adipose, TainZerL, and Harrison Green)
348- 2.15 (2025-07-17) fixed an assert in debug mode when using floats with input callbacks,
349-     work around GCC warning when adding to null ptr (thanks Johannes Spohr and Pyry Kovanen).
350- 2.14 (2025-05-09) fixed a bug using downsampling gather horizontal first, and scatter with vertical first.
351- 2.13 (2025-02-27) fixed a bug when using input callbacks, turned off simd for tiny-c, fixed
352-     some variables that should have been static, fixed a bug when calculating temp memory with
353-     resizes that exceed 2GB of temp memory (very large resizes).
354- 2.12 (2024-10-18) fix incorrect use of user_data with STBIR_FREE.
355- 2.11 (2024-09-08) fix harmless asan warnings in 2-channel and 3-channel mode with AVX-2,
356-     fix some weird scaling edge conditions with point sample mode.
357- 2.10 (2024-07-27) fix the defines GCC and mingw for loop unroll control, fix MSVC 32-bit
358-     arm half float routines.
359- 2.09 (2024-06-19) fix the defines for 32-bit ARM GCC builds (was selecting hardware half floats).
360- 2.08 (2024-06-10) fix for RGB->BGR three channel flips and add SIMD (thanks to Ryan Salsbury),
361-     fix for sub-rect resizes, use the pragmas to control unrolling when they are available.
362- 2.07 (2024-05-24) fix for slow final split during threaded conversions of very wide scanlines
363-     when downsampling (caused by extra input converting), fix for wide scanline resamples with
364-     many splits (int overflow), fix GCC warning.
365- 2.06 (2024-02-10) fix for identical width/height 3x or more down-scaling undersampling a single
366-     row on rare resize ratios (about 1%).
367- 2.05 (2024-02-07) fix for 2 pixel to 1 pixel resizes with wrap (thanks Aras), fix for output
368-     callback (thanks Julien Koenen).
369- 2.04 (2023-11-17) fix for rare AVX bug, shadowed symbol (thanks Nikola Smiljanic).
370- 2.03 (2023-11-01) ASAN and TSAN warnings fixed, minor tweaks.
371- 2.00 (2023-10-10) mostly new source: new api, optimizations, simd, vertical-first, etc.
372-     2x-5x faster without simd, 4x-12x faster with simd, in some cases, 20x to 40x faster esp
373-     resizing large to very small.
374- 0.96 (2019-03-04) fixed warnings
375- 0.95 (2017-07-23) fixed warnings
376- 0.94 (2017-03-18) fixed warnings
377- 0.93 (2017-03-03) fixed bug with certain combinations of heights
378- 0.92 (2017-01-02) fix integer overflow on large (>2GB) images
379- 0.91 (2016-04-02) fix warnings; fix handling of subpixel regions
380- 0.90 (2014-09-17) first released version
381-
382- LICENSE
383- See end of file for license information.
384-*/
385-
386-#if !defined(STB_IMAGE_RESIZE_DO_HORIZONTALS) && \
387- !defined(STB_IMAGE_RESIZE_DO_VERTICALS) && \
388- !defined(STB_IMAGE_RESIZE_DO_CODERS) // for internal re-includes
389-
390-#ifndef STBIR_INCLUDE_STB_IMAGE_RESIZE2_H
391-#define STBIR_INCLUDE_STB_IMAGE_RESIZE2_H
392-
393-#include <stddef.h>
394-#ifdef _MSC_VER
395-typedef unsigned char stbir_uint8;
396-typedef unsigned short stbir_uint16;
397-typedef unsigned int stbir_uint32;
398-typedef unsigned __int64 stbir_uint64;
399-#else
400-#include <stdint.h>
401-typedef uint8_t stbir_uint8;
402-typedef uint16_t stbir_uint16;
403-typedef uint32_t stbir_uint32;
404-typedef uint64_t stbir_uint64;
405-#endif
406-
407-#ifndef STBIRDEF
408-#ifdef STB_IMAGE_RESIZE_STATIC
409-#define STBIRDEF static
410-#else
411-#ifdef __cplusplus
412-#define STBIRDEF extern "C"
413-#else
414-#define STBIRDEF extern
415-#endif
416-#endif
417-#endif
418-
419-//////////////////////////////////////////////////////////////////////////////
420-//// start "header file" ///////////////////////////////////////////////////
421-//
422-// Easy-to-use API:
423-//
424-// * stride is the offset between successive rows of image data
425-// in memory, in bytes. Specify 0 if rows are packed contiguously in memory
426-// * colorspace is linear or sRGB as specified by function name
427-// * Uses the default filters
428-// * Uses edge mode clamped
429-// * returned result is 1 for success or 0 in case of an error.
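The following is a minimal sketch of the stride rule above. It assumes interleaved pixels with a fixed number of bytes per channel. The helpers `row_stride_bytes` and `row_pointer` are hypothetical and are not stb_image_resize2 functions. The negative-stride case models the inverted bitmaps that the 2.16 history entry says the easy-to-use APIs accept.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: resolve a caller-supplied stride in bytes, where 0
   means rows are packed back to back. */
static ptrdiff_t row_stride_bytes(int width, int channels,
                                  int bytes_per_channel, int stride_in_bytes)
{
  assert(width >= 0 && channels > 0 && bytes_per_channel > 0);
  if (stride_in_bytes != 0)
    return stride_in_bytes; /* negative for bottom-up (inverted) bitmaps */
  return (ptrdiff_t)width * channels * bytes_per_channel;
}

/* Hypothetical helper: address of row y, given the address of row 0.
   With a negative stride, row 0 is the row at the highest address. */
static const unsigned char *row_pointer(const unsigned char *row0,
                                        ptrdiff_t stride_in_bytes, int y)
{
  return row0 + (ptrdiff_t)y * stride_in_bytes;
}
```

For example, a 10-pixel-wide RGBA8 image passed with stride 0 is treated as having 40-byte rows.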
430-
431-// stbir_pixel_layout specifies:
432-// number of channels
433-// order of channels
434-// whether color is premultiplied by alpha
435-// for back compatibility, you can cast the old channel count to an
436-// stbir_pixel_layout
437-typedef enum {
438- STBIR_1CHANNEL = 1,
439- STBIR_2CHANNEL = 2,
440- STBIR_RGB = 3, // 3-chan, with order specified (for channel flipping)
441- STBIR_BGR = 0, // 3-chan, with order specified (for channel flipping)
442- STBIR_4CHANNEL = 5,
443-
444- STBIR_RGBA = 4, // alpha formats, where alpha is NOT premultiplied into
445- // color channels
446- STBIR_BGRA = 6,
447- STBIR_ARGB = 7,
448- STBIR_ABGR = 8,
449- STBIR_RA = 9,
450- STBIR_AR = 10,
451-
452- STBIR_RGBA_PM =
453- 11, // alpha formats, where alpha is premultiplied into color channels
454- STBIR_BGRA_PM = 12,
455- STBIR_ARGB_PM = 13,
456- STBIR_ABGR_PM = 14,
457- STBIR_RA_PM = 15,
458- STBIR_AR_PM = 16,
459-
460- STBIR_RGBA_NO_AW =
461- 11, // alpha formats, where NO alpha weighting is applied at all!
462- STBIR_BGRA_NO_AW =
463- 12, // these are just synonyms for the _PM flags (which also do
464- STBIR_ARGB_NO_AW =
465- 13, // no alpha weighting). These names just make it more clear
466- STBIR_ABGR_NO_AW = 14, // for some folks.
467- STBIR_RA_NO_AW = 15,
468- STBIR_AR_NO_AW = 16,
469-
470-} stbir_pixel_layout;
471-
472-//===============================================================
473-// Simple-complexity API
474-//
475-// If output_pixels is NULL (0), then we will allocate the buffer and return
476-// it to you.
477-//--------------------------------
478-
479-STBIRDEF unsigned char *
480-stbir_resize_uint8_srgb(const unsigned char *input_pixels, int input_w,
481- int input_h, int input_stride_in_bytes,
482- unsigned char *output_pixels, int output_w,
483- int output_h, int output_stride_in_bytes,
484- stbir_pixel_layout pixel_type);
485-
486-STBIRDEF unsigned char *
487-stbir_resize_uint8_linear(const unsigned char *input_pixels, int input_w,
488- int input_h, int input_stride_in_bytes,
489- unsigned char *output_pixels, int output_w,
490- int output_h, int output_stride_in_bytes,
491- stbir_pixel_layout pixel_type);
492-
493-STBIRDEF float *
494-stbir_resize_float_linear(const float *input_pixels, int input_w, int input_h,
495- int input_stride_in_bytes, float *output_pixels,
496- int output_w, int output_h,
497- int output_stride_in_bytes,
498- stbir_pixel_layout pixel_type);
499-//===============================================================
500-
501-//===============================================================
502-// Medium-complexity API
503-//
504-// This extends the easy-to-use API as follows:
505-//
506-// * Can specify the datatype - U8, U8_SRGB, U16, FLOAT, HALF_FLOAT
507-// * Edge wrap can be selected explicitly
508-// * Filter can be selected explicitly
509-//--------------------------------
510-
511-typedef enum {
512- STBIR_EDGE_CLAMP = 0,
513- STBIR_EDGE_REFLECT = 1,
514- STBIR_EDGE_WRAP = 2, // this edge mode is slower and uses more memory
515- STBIR_EDGE_ZERO = 3,
516-} stbir_edge;
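The edge modes decide which source pixel is read when a filter tap falls outside the image. The sketch below shows one common way to map an out-of-range index under each mode. It is an illustration, not stb_image_resize2's internal code. The enum is a hypothetical mirror of stbir_edge, and the exact reflect convention (mirroring about the edge pixel without repeating it) is an assumption.

```c
#include <assert.h>

/* Hypothetical mirror of stbir_edge, in the same order. */
typedef enum { EDGE_CLAMP, EDGE_REFLECT, EDGE_WRAP, EDGE_ZERO } edge_mode;

/* Map a possibly out-of-range pixel index n into [0, size).
   For EDGE_ZERO, returns -1 to mean "this tap reads 0". */
static int edge_index(int n, int size, edge_mode mode)
{
  assert(size > 0);
  if (n >= 0 && n < size)
    return n;
  switch (mode) {
  case EDGE_CLAMP:
    return n < 0 ? 0 : size - 1;
  case EDGE_REFLECT: {
    if (size == 1)
      return 0;
    int period = 2 * (size - 1); /* mirror about the first and last pixel */
    int m = ((n % period) + period) % period;
    return m < size ? m : period - m;
  }
  case EDGE_WRAP:
    return ((n % size) + size) % size;
  case EDGE_ZERO:
  default:
    return -1;
  }
}
```

Wrap needs pixels from the opposite side of the image. That is consistent with the note above that this mode is slower and uses more memory.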
517-
518-typedef enum {
519- STBIR_FILTER_DEFAULT =
520- 0, // use same filter type that easy-to-use API chooses
521- STBIR_FILTER_BOX = 1, // A trapezoid w/1-pixel wide ramps, same result as
522- // box for integer scale ratios
523- STBIR_FILTER_TRIANGLE =
524- 2, // On upsampling, produces same results as bilinear texture filtering
525- STBIR_FILTER_CUBICBSPLINE =
526- 3, // The cubic b-spline (aka Mitchell-Netravali with B=1,C=0),
527- // gaussian-esque
528- STBIR_FILTER_CATMULLROM = 4, // An interpolating cubic spline
529- STBIR_FILTER_MITCHELL = 5, // Mitchell-Netravali filter with B=1/3, C=1/3
530- STBIR_FILTER_POINT_SAMPLE = 6, // Simple point sampling
531- STBIR_FILTER_OTHER = 7, // User callback specified
532-} stbir_filter;
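The cubic filters in this enum all belong to the Mitchell-Netravali family k(x; B, C). The comments give B=1, C=0 for the cubic B-spline and B=1/3, C=1/3 for Mitchell. Catmull-Rom is the B=0, C=1/2 member; that is a standard result and is not stated in this header. Below is a textbook sketch of those kernels plus the triangle (tent) filter. It is not the library's own kernel code, and the function names are illustrative.

```c
#include <assert.h>

/* Mitchell-Netravali cubic k(x) with parameters B and C; zero for |x| >= 2. */
static float mitchell_netravali(float x, float B, float C)
{
  float ax = x < 0.0f ? -x : x;
  if (ax < 1.0f)
    return ((12.0f - 9.0f * B - 6.0f * C) * ax * ax * ax +
            (-18.0f + 12.0f * B + 6.0f * C) * ax * ax +
            (6.0f - 2.0f * B)) / 6.0f;
  if (ax < 2.0f)
    return ((-B - 6.0f * C) * ax * ax * ax +
            (6.0f * B + 30.0f * C) * ax * ax +
            (-12.0f * B - 48.0f * C) * ax +
            (8.0f * B + 24.0f * C)) / 6.0f;
  return 0.0f;
}

/* Tent filter: linear falloff to zero one pixel away from the center. */
static float triangle_kernel(float x)
{
  float ax = x < 0.0f ? -x : x;
  return ax < 1.0f ? 1.0f - ax : 0.0f;
}

static float cubic_bspline(float x) { return mitchell_netravali(x, 1.0f, 0.0f); }
static float mitchell(float x) { return mitchell_netravali(x, 1.0f / 3.0f, 1.0f / 3.0f); }
static float catmull_rom(float x) { return mitchell_netravali(x, 0.0f, 0.5f); }
```

Catmull-Rom is 1 at the center and 0 at every other integer offset, which is why it is described as "interpolating". The B-spline is nonzero at offset 1, so it blurs slightly ("gaussian-esque").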
533-
534-typedef enum {
535- STBIR_TYPE_UINT8 = 0,
536- STBIR_TYPE_UINT8_SRGB = 1,
537- STBIR_TYPE_UINT8_SRGB_ALPHA = 2, // alpha channel, when present, should also
538- // be SRGB (this is very unusual)
539- STBIR_TYPE_UINT16 = 3,
540- STBIR_TYPE_FLOAT = 4,
541- STBIR_TYPE_HALF_FLOAT = 5
542-} stbir_datatype;
543-
544-// medium api
545-STBIRDEF void *
546-stbir_resize(const void *input_pixels, int input_w, int input_h,
547- int input_stride_in_bytes, void *output_pixels, int output_w,
548- int output_h, int output_stride_in_bytes,
549- stbir_pixel_layout pixel_layout, stbir_datatype data_type,
550- stbir_edge edge, stbir_filter filter);
551-//===============================================================
552-
553-//===============================================================
554-// Extended-complexity API
555-//
556-// This API exposes all resize functionality.
557-//
558-// * Separate filter types for each axis
559-// * Separate edge modes for each axis
560-// * Separate input and output data types
561-// * Can specify regions with subpixel correctness
562-// * Can specify alpha flags
563-// * Can specify a memory callback
564-// * Can specify a callback data type for pixel input and output
565-// * Can be threaded for a single resize
566-// * Can be used to resize many frames without recalculating the sampler
567-// info
568-//
569-// Use this API as follows:
570-// 1) Call the stbir_resize_init function on a local STBIR_RESIZE structure
571-// 2) Call any of the stbir_set functions
572-// 3) Optionally call stbir_build_samplers() if you are going to resample
573-//    multiple times with the same input and output dimensions (like
574-//    resizing video frames)
575-// 4) Resample by calling stbir_resize_extended().
576-// 5) Call stbir_free_samplers() if you called stbir_build_samplers()
577-//--------------------------------
578-
579-// Types:
580-
581-// INPUT CALLBACK: this callback is used for input scanlines
582-typedef void const *
583-stbir_input_callback(void *optional_output, void const *input_ptr,
584- int num_pixels, int x, int y, void *context);
585-
586-// OUTPUT CALLBACK: this callback is used for output scanlines
587-typedef void
588-stbir_output_callback(void const *output_ptr, int num_pixels, int y,
589- void *context);
590-
591-// callbacks for user installed filters
592-typedef float
593-stbir__kernel_callback(float x, float scale,
594- void *user_data); // centered at zero
595-typedef float
596-stbir__support_callback(float scale, void *user_data);
597-
598-// internal structure with precomputed scaling
599-typedef struct stbir__info stbir__info;
600-
601-typedef struct STBIR_RESIZE // use the stbir_resize_init and stbir_set_*
602- // functions to set these values for future
603- // compatibility
604-{
605- void *user_data;
606- void const *input_pixels;
607- int input_w, input_h;
608- double input_s0, input_t0, input_s1, input_t1;
609- stbir_input_callback *input_cb;
610- void *output_pixels;
611- int output_w, output_h;
612- int output_subx, output_suby, output_subw, output_subh;
613- stbir_output_callback *output_cb;
614- int input_stride_in_bytes;
615- int output_stride_in_bytes;
616- int splits;
617- int fast_alpha;
618- int needs_rebuild;
619- int called_alloc;
620- stbir_pixel_layout input_pixel_layout_public;
621- stbir_pixel_layout output_pixel_layout_public;
622- stbir_datatype input_data_type;
623- stbir_datatype output_data_type;
624- stbir_filter horizontal_filter, vertical_filter;
625- stbir_edge horizontal_edge, vertical_edge;
626- stbir__kernel_callback *horizontal_filter_kernel;
627- stbir__support_callback *horizontal_filter_support;
628- stbir__kernel_callback *vertical_filter_kernel;
629- stbir__support_callback *vertical_filter_support;
630- stbir__info *samplers;
631-} STBIR_RESIZE;
632-
633-// extended complexity api
634-
635-// First off, you must ALWAYS call stbir_resize_init on your resize structure
636-// before any of the other calls!
637-STBIRDEF void
638-stbir_resize_init(STBIR_RESIZE *resize, const void *input_pixels, int input_w,
639- int input_h, int input_stride_in_bytes, // stride can be zero
640- void *output_pixels, int output_w, int output_h,
641- int output_stride_in_bytes, // stride can be zero
642- stbir_pixel_layout pixel_layout, stbir_datatype data_type);
643-
644-//===============================================================
645-// You can update these parameters any time after resize_init at no cost
646-// (they do not trigger a sampler rebuild, unlike the functions below)
647-//--------------------------------
648-
649-STBIRDEF void
650-stbir_set_datatypes(STBIR_RESIZE *resize, stbir_datatype input_type,
651- stbir_datatype output_type);
652-STBIRDEF void
653-stbir_set_pixel_callbacks(
654- STBIR_RESIZE *resize, stbir_input_callback *input_cb,
655- stbir_output_callback *output_cb); // no callbacks by default
656-STBIRDEF void
657-stbir_set_user_data(STBIR_RESIZE *resize,
658- void *user_data); // pass back STBIR_RESIZE* by default
659-STBIRDEF void
660-stbir_set_buffer_ptrs(STBIR_RESIZE *resize, const void *input_pixels,
661- int input_stride_in_bytes, void *output_pixels,
662- int output_stride_in_bytes);
663-
664-//===============================================================
665-
666-//===============================================================
667-// If you call any of these functions, you will trigger a sampler rebuild!
668-//--------------------------------
669-
670-STBIRDEF int
671-stbir_set_pixel_layouts(
672- STBIR_RESIZE *resize, stbir_pixel_layout input_pixel_layout,
673- stbir_pixel_layout output_pixel_layout); // sets new buffer layouts
674-STBIRDEF int
675-stbir_set_edgemodes(STBIR_RESIZE *resize, stbir_edge horizontal_edge,
676- stbir_edge vertical_edge); // CLAMP by default
677-
678-STBIRDEF int
679-stbir_set_filters(STBIR_RESIZE *resize, stbir_filter horizontal_filter,
680- stbir_filter vertical_filter); // STBIR_DEFAULT_FILTER_UPSAMPLE/DOWNSAMPLE
681- // by default
682-STBIRDEF int
683-stbir_set_filter_callbacks(STBIR_RESIZE *resize,
684- stbir__kernel_callback *horizontal_filter,
685- stbir__support_callback *horizontal_support,
686- stbir__kernel_callback *vertical_filter,
687- stbir__support_callback *vertical_support);
688-
689-STBIRDEF int
690-stbir_set_pixel_subrect(
691- STBIR_RESIZE *resize, int subx, int suby, int subw,
692- int subh); // sets both sub-regions (full regions by default)
693-STBIRDEF int
694-stbir_set_input_subrect(
695- STBIR_RESIZE *resize, double s0, double t0, double s1,
696- double t1); // sets input sub-region (full region by default)
697-STBIRDEF int
698-stbir_set_output_pixel_subrect(
699- STBIR_RESIZE *resize, int subx, int suby, int subw,
700- int subh); // sets output sub-region (full region by default)
701-
702-// When inputting AND outputting non-premultiplied alpha pixels, we
703-// use a slower but higher quality technique that fills the RGB
704-// values of zero-alpha pixels with something plausible. If you
705-// don't care about areas of zero alpha, you can call this function
706-// to get about a 25% speed improvement for STBIR_RGBA to STBIR_RGBA
707-// types of resizes.
708-STBIRDEF int
709-stbir_set_non_pm_alpha_speed_over_quality(STBIR_RESIZE *resize,
710- int non_pma_alpha_speed_over_quality);
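For straight (non-premultiplied) alpha, alpha weighting means scaling each color by its alpha before filtering and dividing by the filtered alpha afterwards. This stops invisible pixels from tinting visible ones. The _PM and _NO_AW layouts skip this step. The two-pixel average below sketches the idea using a hypothetical `rgba` struct. It is a generic illustration, not stbir's code path. It also shows why zero-alpha output needs special handling: there is nothing to divide by.

```c
#include <assert.h>

typedef struct { float r, g, b, a; } rgba; /* hypothetical straight-alpha pixel */

/* Average two straight-alpha (non-premultiplied) pixels with alpha weighting:
   weight color by alpha, average, then divide by the averaged alpha. */
static rgba average_alpha_weighted(rgba p, rgba q)
{
  assert(p.a >= 0.0f && q.a >= 0.0f);
  rgba out;
  out.a = 0.5f * (p.a + q.a);
  if (out.a == 0.0f) {
    /* No visible contribution, so color is undefined. This sketch returns
       black; the slower path described above instead fills such pixels
       with something plausible. */
    out.r = out.g = out.b = 0.0f;
    return out;
  }
  out.r = 0.5f * (p.r * p.a + q.r * q.a) / out.a;
  out.g = 0.5f * (p.g * p.a + q.g * q.a) / out.a;
  out.b = 0.5f * (p.b * p.a + q.b * q.a) / out.a;
  return out;
}
```

Averaging opaque red with fully transparent green gives pure red at alpha 0.5. A naive unweighted average would instead produce a muddy (0.5, 0.5, 0) color.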
711-//===============================================================
712-
713-//===============================================================
714-// You can call build_samplers to prebuild all the internal data we need to
715-// resample.
716-// Then, if you call resize_extended many times with the same resize, you only
717-// pay the cost once.
718-// If you do call build_samplers, you MUST call free_samplers eventually.
719-//--------------------------------
720-
721-// This builds the samplers and does one allocation
722-STBIRDEF int
723-stbir_build_samplers(STBIR_RESIZE *resize);
724-
725-// You MUST call this if you called stbir_build_samplers or
726-// stbir_build_samplers_with_splits
727-STBIRDEF void
728-stbir_free_samplers(STBIR_RESIZE *resize);
729-//===============================================================
730-
731-// And this is the main function to perform the resize synchronously on one
732-// thread.
733-STBIRDEF int
734-stbir_resize_extended(STBIR_RESIZE *resize);
735-
736-//===============================================================
737-// Use these functions for multithreading.
738-// 1) You call stbir_build_samplers_with_splits first on the main thread
739-// 2) Then stbir_resize_extended_split on each thread
740-// 3) stbir_free_samplers when done on the main thread
741-//--------------------------------
742-
743-// This will build samplers for threading.
744-// You can pass in the number of threads you'd like to use (try_splits).
745-// It returns the number of splits (threads) that you can call it with.
746-// It might be less if the image resize can't be split up that many ways.
747-
748-STBIRDEF int
749-stbir_build_samplers_with_splits(STBIR_RESIZE *resize, int try_splits);
750-
751-// This function does a split of the resizing (you call this function for each
752-// split, on multiple threads). A split is a piece of the output resize pixel
753-// space.
754-
755-// Note that you MUST call stbir_build_samplers_with_splits before
756-// stbir_resize_extended_split!
757-
758-// Usually, you will call stbir_resize_extended_split once per thread, with
759-// the thread index as split_start and 1 as split_count.
760-//
761-// But if you have a weird situation where you MIGHT want 8 threads, but
762-// sometimes only 4, you can build 8 splits and then use 0, 2, 4 and 6 as
763-// the split_start values, with 2 as the split_count each time, to turn it
764-// into a 4-thread resize.
765-// (This is unusual.)
766-
767-STBIRDEF int
768-stbir_resize_extended_split(STBIR_RESIZE *resize, int split_start,
769- int split_count);
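The split_start/split_count scheme above amounts to giving each thread a contiguous run of the splits reported by stbir_build_samplers_with_splits. The sketch below shows that bookkeeping with a hypothetical helper, `assign_splits`, which spreads any remainder over the first threads. It only computes the arguments; each thread would then pass them to stbir_resize_extended_split. In this sketch, a thread that receives a count of 0 has no work and would skip the call.

```c
#include <assert.h>

/* Hypothetical helper: given the number of splits the samplers were built
   with and the number of worker threads, compute the split_start and
   split_count for thread `thread_index`. */
static void assign_splits(int num_splits, int num_threads, int thread_index,
                          int *split_start, int *split_count)
{
  assert(num_splits > 0 && num_threads > 0);
  assert(thread_index >= 0 && thread_index < num_threads);
  int base = num_splits / num_threads;
  int extra = num_splits % num_threads; /* the first `extra` threads get one more */
  *split_count = base + (thread_index < extra ? 1 : 0);
  *split_start = thread_index * base + (thread_index < extra ? thread_index : extra);
}
```

With 8 splits and 4 threads, this reproduces the 0, 2, 4, 6 starts (count 2) from the comment above.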
770-//===============================================================
771-
772-//===============================================================
773-// Pixel Callbacks info:
774-//--------------------------------
775-
776-// The input callback is super flexible - it calls you with the input address
777-// (based on the stride and base pointer), it gives you an optional_output
778-// pointer that you can fill, or you can just return your own pointer into
779-// your own data.
780-//
781-// You can also do conversion from non-supported data types if necessary - in
782-// this case, you ignore the input_ptr and just use the x and y parameters to
783-// calculate your own input_ptr based on the size of each non-supported pixel.
784-// (Something like the third example below.)
785-//
786-// You can also install just an input or just an output callback by setting
787-// the callback that you don't want to zero.
788-//
789-// First example, progress (getting a callback that lets you monitor the
790-// progress):
791-//     void const * my_callback( void * optional_output, void const * input_ptr,
792-//                               int num_pixels, int x, int y, void * context )
793-//     {
794-//       percentage_done = (float) y / input_height;
795-//       return input_ptr; // use buffer from call
796-//     }
797-//
798-// Next example, copying (copy from some other buffer or stream):
799-//     void const * my_callback( void * optional_output, void const * input_ptr,
800-//                               int num_pixels, int x, int y, void * context )
801-//     {
802-//       CopyOrStreamData( optional_output, other_data_src,
803-//                         num_pixels * pixel_width_in_bytes );
804-//       return optional_output; // return the optional buffer that we filled
805-//     }
806-//
807-// Third example, input another buffer without copying (zero-copy from
808-// other buffer):
809-//     void const * my_callback( void * optional_output, void const * input_ptr,
810-//                               int num_pixels, int x, int y, void * context )
811-//     {
812-//       void * pixels = ( (char*) other_image_base ) + ( y * other_image_stride )
813-//                       + ( x * other_pixel_width_in_bytes );
814-//       return pixels; // return pointer to your data without copying
815-//     }
816-//
817-//
818-// The output callback is considerably simpler - it just calls you so that you
819-// can dump out each scanline. You could even directly copy out to disk if you
820-// have a simple format like TGA or BMP. You can also convert to other output
821-// types here if you want.
822-//
823-// Simple example (it returns void, as stbir_output_callback does, and
824-// receives a const output pointer):
825-//     void my_output( void const * output_ptr, int num_pixels, int y,
826-//                     void * context )
827-//     {
828-//       percentage_done = (float) y / output_height;
829-//       fwrite( output_ptr, pixel_width_in_bytes, num_pixels, output_file );
830-//     }
831-//===============================================================
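The third input example above relies on globals (`other_image_base` and friends). A self-contained variant can pass the source image through the `context` pointer instead. This assumes the callback's context argument is the value set with stbir_set_user_data, which the comment on that function suggests but this header does not spell out. The `other_image` struct is hypothetical. The typedef restates the stbir_input_callback signature under a different name so that the sketch compiles on its own.

```c
#include <assert.h>
#include <stddef.h>

/* Same signature as stbir_input_callback, restated so this compiles alone. */
typedef void const *input_callback_fn(void *optional_output,
                                      void const *input_ptr, int num_pixels,
                                      int x, int y, void *context);

/* Hypothetical description of an image stored elsewhere in memory. */
typedef struct {
  const unsigned char *base;
  ptrdiff_t stride_in_bytes;
  int bytes_per_pixel;
} other_image;

/* Zero-copy input callback: ignore input_ptr and optional_output and return
   a pointer into the other image at column x of scanline y. */
static void const *zero_copy_input(void *optional_output, void const *input_ptr,
                                   int num_pixels, int x, int y, void *context)
{
  const other_image *img = (const other_image *)context;
  assert(img != NULL);
  (void)optional_output;
  (void)input_ptr;
  (void)num_pixels;
  return img->base + (ptrdiff_t)y * img->stride_in_bytes +
         (ptrdiff_t)x * img->bytes_per_pixel;
}
```

Assigning `zero_copy_input` to an `input_callback_fn *` lets the compiler confirm that the signature matches.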
832-
833-//===============================================================
834-// optional built-in profiling API
835-//--------------------------------
836-
837-#ifdef STBIR_PROFILE
838-
839-typedef struct STBIR_PROFILE_INFO {
840- stbir_uint64 total_clocks;
841-
842- // how many clocks spent (of total_clocks) in the various resize routines,
843- // along with a string description
844- // there are "count" number of zones
845- stbir_uint64 clocks[8];
846- char const **descriptions;
847-
848- // count of clocks and descriptions
849- stbir_uint32 count;
850-} STBIR_PROFILE_INFO;
851-
852-// use after calling stbir_resize_extended (or stbir_build_samplers or
853-// stbir_build_samplers_with_splits)
854-STBIRDEF void
855-stbir_resize_build_profile_info(STBIR_PROFILE_INFO *out_info,
856- STBIR_RESIZE const *resize);
857-
858-// use after calling stbir_resize_extended
859-STBIRDEF void
860-stbir_resize_extended_profile_info(STBIR_PROFILE_INFO *out_info,
861- STBIR_RESIZE const *resize);
862-
863-// use after calling stbir_resize_extended_split
864-STBIRDEF void
865-stbir_resize_split_profile_info(STBIR_PROFILE_INFO *out_info,
866- STBIR_RESIZE const *resize, int split_start,
867- int split_num);
868-
869-//===============================================================
870-
871-#endif
872-
873-//// end header file /////////////////////////////////////////////////////
874-#endif // STBIR_INCLUDE_STB_IMAGE_RESIZE2_H
875-
876-#if defined(STB_IMAGE_RESIZE_IMPLEMENTATION) || \
877- defined(STB_IMAGE_RESIZE2_IMPLEMENTATION)
878-
879-#ifndef STBIR_ASSERT
880-#include <assert.h>
881-#define STBIR_ASSERT(x) assert(x)
882-#endif
883-
884-#ifndef STBIR_MALLOC
885-#include <stdlib.h>
886-#define STBIR_MALLOC(size, user_data) ((void)(user_data), malloc(size))
887-#define STBIR_FREE(ptr, user_data) ((void)(user_data), free(ptr))
888-// (we used the comma operator to evaluate user_data, to avoid "unused
889-// parameter" warnings)
890-#endif
891-
892-#ifdef _MSC_VER
893-
894-#define stbir__inline __forceinline
895-
896-#else
897-
898-#define stbir__inline __inline__
899-
900-// Clang address sanitizer
901-#if defined(__has_feature)
902-#if __has_feature(address_sanitizer) || __has_feature(memory_sanitizer)
903-#ifndef STBIR__SEPARATE_ALLOCATIONS
904-#define STBIR__SEPARATE_ALLOCATIONS
905-#endif
906-#endif
907-#endif
908-
909-#endif
910-
911-// GCC and MSVC
912-#if defined(__SANITIZE_ADDRESS__)
913-#ifndef STBIR__SEPARATE_ALLOCATIONS
914-#define STBIR__SEPARATE_ALLOCATIONS
915-#endif
916-#endif
917-
918-// Always turn off automatic FMA use - use STBIR_USE_FMA if you want.
919-// Otherwise, this is a determinism disaster.
920-#ifndef STBIR_DONT_CHANGE_FP_CONTRACT // override in case you don't want this
921- // behavior
922-#if defined(_MSC_VER) && !defined(__clang__)
923-#if _MSC_VER > 1200
924-#pragma fp_contract(off)
925-#endif
926-#elif defined(__GNUC__) && !defined(__clang__)
927-#pragma GCC optimize("fp-contract=off")
928-#else
929-#pragma STDC FP_CONTRACT OFF
930-#endif
931-#endif
932-
933-#ifdef _MSC_VER
934-#define STBIR__UNUSED(v) (void)(v)
935-#else
936-#define STBIR__UNUSED(v) (void)sizeof(v)
937-#endif
938-
939-#define STBIR__ARRAY_SIZE(a) (sizeof((a)) / sizeof((a)[0]))
940-
941-#ifndef STBIR_DEFAULT_FILTER_UPSAMPLE
942-#define STBIR_DEFAULT_FILTER_UPSAMPLE STBIR_FILTER_CATMULLROM
943-#endif
944-
945-#ifndef STBIR_DEFAULT_FILTER_DOWNSAMPLE
946-#define STBIR_DEFAULT_FILTER_DOWNSAMPLE STBIR_FILTER_MITCHELL
947-#endif
948-
949-#ifndef STBIR__HEADER_FILENAME
950-#define STBIR__HEADER_FILENAME "stb_image_resize2.h"
951-#endif
952-
953-// the internal pixel layout enums are in a different order, so we can easily do
954-// range comparisons of types
955-// the public pixel layout is ordered in a way that if you cast num_channels
956-// (1-4) to the enum, you get something sensible
957-typedef enum {
958- STBIRI_1CHANNEL = 0,
959- STBIRI_2CHANNEL = 1,
960- STBIRI_RGB = 2,
961- STBIRI_BGR = 3,
962- STBIRI_4CHANNEL = 4,
963-
964- STBIRI_RGBA = 5,
965- STBIRI_BGRA = 6,
966- STBIRI_ARGB = 7,
967- STBIRI_ABGR = 8,
968- STBIRI_RA = 9,
969- STBIRI_AR = 10,
970-
971- STBIRI_RGBA_PM = 11,
972- STBIRI_BGRA_PM = 12,
973- STBIRI_ARGB_PM = 13,
974- STBIRI_ABGR_PM = 14,
975- STBIRI_RA_PM = 15,
976- STBIRI_AR_PM = 16,
977-} stbir_internal_pixel_layout;
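The internal enum groups layouts by kind: plain channels, then straight alpha, then premultiplied alpha. As a result, layout properties become simple range checks. The sketch below uses plain ints. The public-to-internal table is read off the two enum listings in this file, and the helper names are hypothetical. It also checks the claim that casting a channel count of 1 to 4 to the public enum yields a layout with that many channels.

```c
#include <assert.h>

/* Public stbir_pixel_layout value -> internal stbir_internal_pixel_layout
   value, read off the two enum listings (index = public value). */
static const int public_to_internal[17] = {
  3, /* STBIR_BGR      -> STBIRI_BGR      */
  0, /* STBIR_1CHANNEL -> STBIRI_1CHANNEL */
  1, /* STBIR_2CHANNEL -> STBIRI_2CHANNEL */
  2, /* STBIR_RGB      -> STBIRI_RGB      */
  5, /* STBIR_RGBA     -> STBIRI_RGBA     */
  4, /* STBIR_4CHANNEL -> STBIRI_4CHANNEL */
  6, 7, 8, 9, 10,       /* BGRA, ARGB, ABGR, RA, AR: same values */
  11, 12, 13, 14, 15, 16 /* the *_PM layouts: same values */
};

/* With the internal ordering, layout properties are range tests. */
static int has_alpha(int internal) { return internal >= 5; }
static int is_premultiplied(int internal) { return internal >= 11; }

static int channel_count(int internal)
{
  assert(internal >= 0 && internal <= 16);
  if (internal <= 1)
    return internal + 1; /* 1CHANNEL, 2CHANNEL */
  if (internal <= 3)
    return 3; /* RGB, BGR */
  if (internal == 9 || internal == 10 || internal >= 15)
    return 2; /* RA, AR, RA_PM, AR_PM */
  return 4; /* 4CHANNEL and the four-channel alpha layouts */
}
```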
978-
979-// define the public pixel layouts to not compile inside the implementation (to
980-// avoid accidental use)
981-#define STBIR_BGR bad_dont_use_in_implementation
982-#define STBIR_1CHANNEL STBIR_BGR
983-#define STBIR_2CHANNEL STBIR_BGR
984-#define STBIR_RGB STBIR_BGR
985-#define STBIR_RGBA STBIR_BGR
986-#define STBIR_4CHANNEL STBIR_BGR
987-#define STBIR_BGRA STBIR_BGR
988-#define STBIR_ARGB STBIR_BGR
989-#define STBIR_ABGR STBIR_BGR
990-#define STBIR_RA STBIR_BGR
991-#define STBIR_AR STBIR_BGR
992-#define STBIR_RGBA_PM STBIR_BGR
993-#define STBIR_BGRA_PM STBIR_BGR
994-#define STBIR_ARGB_PM STBIR_BGR
995-#define STBIR_ABGR_PM STBIR_BGR
996-#define STBIR_RA_PM STBIR_BGR
997-#define STBIR_AR_PM STBIR_BGR
998-
999-// must match stbir_datatype
1000-static unsigned char stbir__type_size[] = {
1001- 1, 1, 1, 2,
1002- 4, 2 // STBIR_TYPE_UINT8,STBIR_TYPE_UINT8_SRGB,STBIR_TYPE_UINT8_SRGB_ALPHA,STBIR_TYPE_UINT16,STBIR_TYPE_FLOAT,STBIR_TYPE_HALF_FLOAT
1003-};
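The `// must match stbir_datatype` note describes an invariant that nothing enforces. The sketch below shows one way to have the compiler check the table length, assuming C11 and a count sentinel. The mirror enum and its `TYPE_COUNT` member are hypothetical, because stbir_datatype has no such member. This check catches a length mismatch only, not reordered entries.

```c
#include <assert.h>

/* Hypothetical mirror of stbir_datatype, plus a count sentinel. */
typedef enum {
  TYPE_UINT8,
  TYPE_UINT8_SRGB,
  TYPE_UINT8_SRGB_ALPHA,
  TYPE_UINT16,
  TYPE_FLOAT,
  TYPE_HALF_FLOAT,
  TYPE_COUNT
} datatype;

/* Bytes per channel, indexed by datatype (same values as stbir__type_size). */
static const unsigned char type_size[] = { 1, 1, 1, 2, 4, 2 };

/* C11: compilation fails if the enum and the table differ in length. */
static_assert(sizeof(type_size) / sizeof(type_size[0]) == TYPE_COUNT,
              "type_size needs one entry per datatype");

static int bytes_per_pixel(datatype type, int channels)
{
  return type_size[type] * channels;
}
```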
1004-
1005-// When gathering, the contributors are which source pixels contribute.
1006-// When scattering, the contributors are which destination pixels are
1007-// contributed to.
1008-typedef struct {
1009- int n0; // First contributing pixel
1010- int n1; // Last contributing pixel
1011-} stbir__contributors;
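In the gather case, the contributors for an output pixel are the input pixels [n0, n1] that lie under the filter once it is centered on that output pixel. When downsampling, the filter is also stretched by 1/scale. The sketch uses a common pixel-center convention: input pixel j is centered at j, and output pixel i maps to (i + 0.5) / scale - 0.5 in input coordinates. The library's exact rounding and edge bookkeeping may differ, and all names here are hypothetical. The resulting indices can fall outside the image; the edge mode then decides what those taps read.

```c
#include <assert.h>

typedef struct { int n0, n1; } contributors; /* same shape as stbir__contributors */

static int floor_to_int(float v) { int i = (int)v; return v < (float)i ? i - 1 : i; }
static int ceil_to_int(float v) { int i = (int)v; return v > (float)i ? i + 1 : i; }

/* Input pixels [n0, n1] that output pixel out_index gathers from.
   scale = output_size / input_size; support = filter radius at scale 1
   (1 for the triangle filter, 2 for the cubics). Taps exactly on the
   boundary are included even though their weight may be zero. */
static contributors gather_contributors(int out_index, float scale, float support)
{
  assert(scale > 0.0f && support > 0.0f);
  /* output pixel center mapped into input coordinates */
  float center = ((float)out_index + 0.5f) / scale - 0.5f;
  /* when downsampling, the filter is stretched by 1/scale in input space */
  float radius = scale < 1.0f ? support / scale : support;
  contributors c;
  c.n0 = ceil_to_int(center - radius);
  c.n1 = floor_to_int(center + radius);
  return c;
}
```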
1012-
1013-typedef struct {
1014- int lowest; // First sample index for whole filter
1015- int highest; // Last sample index for whole filter
1016- int widest; // widest single set of samples for an output
1017-} stbir__filter_extent_info;
1018-
1019-typedef struct {
1020- int n0; // First pixel of decode buffer to write to
1021- int n1; // Last pixel of decode that will be written to
1022- int pixel_offset_for_input; // Pixel offset into input_scanline
1023-} stbir__span;
1024-
1025-typedef struct stbir__scale_info {
1026- int input_full_size;
1027- int output_sub_size;
1028- float scale;
1029- float inv_scale;
1030- float pixel_shift; // starting shift in output pixel space (in pixels)
1031- int scale_is_rational;
1032- stbir_uint32 scale_numerator, scale_denominator;
1033-} stbir__scale_info;
1034-
1035-typedef struct {
1036- stbir__contributors *contributors;
1037- float *coefficients;
1038- stbir__contributors *gather_prescatter_contributors;
1039- float *gather_prescatter_coefficients;
1040- stbir__scale_info scale_info;
1041- float support;
1042- stbir_filter filter_enum;
1043- stbir__kernel_callback *filter_kernel;
1044- stbir__support_callback *filter_support;
1045- stbir_edge edge;
1046- int coefficient_width;
1047- int filter_pixel_width;
1048- int filter_pixel_margin;
1049- int num_contributors;
1050- int contributors_size;
1051- int coefficients_size;
1052- stbir__filter_extent_info extent_info;
1053- int is_gather; // 0 = scatter, 1 = gather with scale >= 1, 2 = gather with
1054- // scale < 1
1055- int gather_prescatter_num_contributors;
1056- int gather_prescatter_coefficient_width;
1057- int gather_prescatter_contributors_size;
1058- int gather_prescatter_coefficients_size;
1059-} stbir__sampler;
1060-
1061-typedef struct {
1062- stbir__contributors conservative;
1063- int edge_sizes[2]; // this can be less than filter_pixel_margin, if the
1064- // filter and scaling falls off
1065- stbir__span spans[2]; // can be two spans, if doing input subrect with edge
1066- // mode WRAP
1067-} stbir__extents;
1068-
1069-typedef struct {
1070-#ifdef STBIR_PROFILE
1071- union {
1072- struct {
1073- stbir_uint64 total, looping, vertical, horizontal, decode, encode,
1074- alpha, unalpha;
1075- } named;
1076- stbir_uint64 array[8];
1077- } profile;
1078- stbir_uint64 *current_zone_excluded_ptr;
1079-#endif
1080- float *decode_buffer;
1081-
1082- int ring_buffer_first_scanline;
1083- int ring_buffer_last_scanline;
1084- int ring_buffer_begin_index; // first_scanline is at this index in the ring
1085- // buffer
1086- int start_output_y, end_output_y;
1087- int start_input_y, end_input_y; // used in scatter only
1088-
1089-#ifdef STBIR__SEPARATE_ALLOCATIONS
1090- float **ring_buffers; // one pointer for each ring buffer
1091-#else
1092- float *ring_buffer; // one big buffer that we index into
1093-#endif
1094-
1095- float *vertical_buffer;
1096-
1097- char no_cache_straddle[64];
1098-} stbir__per_split_info;
1099-
1100-typedef float *
1101-stbir__decode_pixels_func(float *decode, int width_times_channels,
1102- void const *input);
1103-typedef void
1104-stbir__alpha_weight_func(float *decode_buffer, int width_times_channels);
1105-typedef void
1106-stbir__horizontal_gather_channels_func(
1107- float *output_buffer, unsigned int output_sub_size,
1108- float const *decode_buffer,
1109- stbir__contributors const *horizontal_contributors,
1110- float const *horizontal_coefficients, int coefficient_width);
1111-typedef void
1112-stbir__alpha_unweight_func(float *encode_buffer, int width_times_channels);
1113-typedef void
1114-stbir__encode_pixels_func(void *output, int width_times_channels,
1115- float const *encode);
1116-
1117-struct stbir__info {
1118-#ifdef STBIR_PROFILE
1119- union {
1120- struct {
1121- stbir_uint64 total, build, alloc, horizontal, vertical, cleanup,
1122- pivot;
1123- } named;
1124- stbir_uint64 array[7];
1125- } profile;
1126- stbir_uint64 *current_zone_excluded_ptr;
1127-#endif
1128- stbir__sampler horizontal;
1129- stbir__sampler vertical;
1130-
1131- void const *input_data;
1132- void *output_data;
1133-
1134- int input_stride_bytes;
1135- int output_stride_bytes;
1136- int ring_buffer_length_bytes; // The length of an individual entry in the
1137- // ring buffer. The total number of ring
1138- // buffers is
1139- // stbir__get_filter_pixel_width(filter)
1140- int ring_buffer_num_entries; // Total number of entries in the ring buffer.
1141-
1142- stbir_datatype input_type;
1143- stbir_datatype output_type;
1144-
1145- stbir_input_callback *in_pixels_cb;
1146- void *user_data;
1147- stbir_output_callback *out_pixels_cb;
1148-
1149- stbir__extents scanline_extents;
1150-
1151- void *alloced_mem;
1152- stbir__per_split_info
1153- *split_info; // by default 1, but there will be N of these allocated
1154- // based on the thread init you did
1155-
1156- stbir__decode_pixels_func *decode_pixels;
1157- stbir__alpha_weight_func *alpha_weight;
1158- stbir__horizontal_gather_channels_func *horizontal_gather_channels;
1159- stbir__alpha_unweight_func *alpha_unweight;
1160- stbir__encode_pixels_func *encode_pixels;
1161-
1162- int alloc_ring_buffer_num_entries; // Number of entries in the ring buffer
1163- // that will be allocated
1164- int splits; // count of splits
1165-
1166- stbir_internal_pixel_layout input_pixel_layout_internal;
1167- stbir_internal_pixel_layout output_pixel_layout_internal;
1168-
1169- int input_color_and_type;
1170- int offset_x, offset_y; // offset within output_data
1171- int vertical_first;
1172- int channels;
1173- int effective_channels; // same as channels, except on RGBA/ARGB (7), or
1174- // XA/AX (3)
1175- size_t alloced_total;
1176-};
1177-
1178-#define stbir__max_uint8_as_float 255.0f
1179-#define stbir__max_uint16_as_float 65535.0f
1180-#define stbir__max_uint8_as_float_inverted 3.9215689e-03f // (1.0f/255.0f)
1181-#define stbir__max_uint16_as_float_inverted 1.5259022e-05f // (1.0f/65535.0f)
1182-#define stbir__small_float \
1183- ((float)1 / (1 << 20) / (1 << 20) / (1 << 20) / (1 << 20) / (1 << 20) / \
1184- (1 << 20))
1185-
1186-// min/max friendly
1187-#define STBIR_CLAMP(x, xmin, xmax) \
1188- for (;;) { \
1189- if ((x) < (xmin)) \
1190- (x) = (xmin); \
1191- if ((x) > (xmax)) \
1192- (x) = (xmax); \
1193- break; \
1194- }
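STBIR_CLAMP updates its first argument in place and evaluates its arguments more than once. It should therefore only be given plain variables and side-effect-free bounds. Because it expands to a `for` statement rather than `do { ... } while (0)`, `STBIR_CLAMP(x, a, b);` will not compile as the unbraced body of an `if` that has an `else`: the trailing semicolon ends the `if` before the `else`. Brace the body in that case. The quick check below redeclares the macro exactly as above so the snippet stands alone.

```c
#include <assert.h>

/* Same definition as STBIR_CLAMP above: clamps the lvalue x in place. */
#define STBIR_CLAMP(x, xmin, xmax) \
  for (;;) {                       \
    if ((x) < (xmin))              \
      (x) = (xmin);                \
    if ((x) > (xmax))              \
      (x) = (xmax);                \
    break;                         \
  }

/* Clamp an int to the 0..255 byte range; v is read and written in place. */
static int clamp_to_byte(int v)
{
  STBIR_CLAMP(v, 0, 255);
  return v;
}

/* Clamp a float to [0, 1]; the macro works for any comparable lvalue type. */
static float clamp_unit(float f)
{
  STBIR_CLAMP(f, 0.0f, 1.0f);
  return f;
}
```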
1195-
1196-static stbir__inline int
1197-stbir__min(int a, int b)
1198-{
1199- return a < b ? a : b;
1200-}
1201-
1202-static stbir__inline int
1203-stbir__max(int a, int b)
1204-{
1205- return a > b ? a : b;
1206-}
1207-
1208-static float stbir__srgb_uchar_to_linear_float[256] = {
1209- 0.000000f, 0.000304f, 0.000607f, 0.000911f, 0.001214f, 0.001518f, 0.001821f,
1210- 0.002125f, 0.002428f, 0.002732f, 0.003035f, 0.003347f, 0.003677f, 0.004025f,
1211- 0.004391f, 0.004777f, 0.005182f, 0.005605f, 0.006049f, 0.006512f, 0.006995f,
1212- 0.007499f, 0.008023f, 0.008568f, 0.009134f, 0.009721f, 0.010330f, 0.010960f,
1213- 0.011612f, 0.012286f, 0.012983f, 0.013702f, 0.014444f, 0.015209f, 0.015996f,
1214- 0.016807f, 0.017642f, 0.018500f, 0.019382f, 0.020289f, 0.021219f, 0.022174f,
1215- 0.023153f, 0.024158f, 0.025187f, 0.026241f, 0.027321f, 0.028426f, 0.029557f,
1216- 0.030713f, 0.031896f, 0.033105f, 0.034340f, 0.035601f, 0.036889f, 0.038204f,
1217- 0.039546f, 0.040915f, 0.042311f, 0.043735f, 0.045186f, 0.046665f, 0.048172f,
1218- 0.049707f, 0.051269f, 0.052861f, 0.054480f, 0.056128f, 0.057805f, 0.059511f,
1219- 0.061246f, 0.063010f, 0.064803f, 0.066626f, 0.068478f, 0.070360f, 0.072272f,
1220- 0.074214f, 0.076185f, 0.078187f, 0.080220f, 0.082283f, 0.084376f, 0.086500f,
1221- 0.088656f, 0.090842f, 0.093059f, 0.095307f, 0.097587f, 0.099899f, 0.102242f,
1222- 0.104616f, 0.107023f, 0.109462f, 0.111932f, 0.114435f, 0.116971f, 0.119538f,
1223- 0.122139f, 0.124772f, 0.127438f, 0.130136f, 0.132868f, 0.135633f, 0.138432f,
1224- 0.141263f, 0.144128f, 0.147027f, 0.149960f, 0.152926f, 0.155926f, 0.158961f,
1225- 0.162029f, 0.165132f, 0.168269f, 0.171441f, 0.174647f, 0.177888f, 0.181164f,
1226- 0.184475f, 0.187821f, 0.191202f, 0.194618f, 0.198069f, 0.201556f, 0.205079f,
1227- 0.208637f, 0.212231f, 0.215861f, 0.219526f, 0.223228f, 0.226966f, 0.230740f,
1228- 0.234551f, 0.238398f, 0.242281f, 0.246201f, 0.250158f, 0.254152f, 0.258183f,
1229- 0.262251f, 0.266356f, 0.270498f, 0.274677f, 0.278894f, 0.283149f, 0.287441f,
1230- 0.291771f, 0.296138f, 0.300544f, 0.304987f, 0.309469f, 0.313989f, 0.318547f,
1231- 0.323143f, 0.327778f, 0.332452f, 0.337164f, 0.341914f, 0.346704f, 0.351533f,
1232- 0.356400f, 0.361307f, 0.366253f, 0.371238f, 0.376262f, 0.381326f, 0.386430f,
1233- 0.391573f, 0.396755f, 0.401978f, 0.407240f, 0.412543f, 0.417885f, 0.423268f,
1234- 0.428691f, 0.434154f, 0.439657f, 0.445201f, 0.450786f, 0.456411f, 0.462077f,
1235- 0.467784f, 0.473532f, 0.479320f, 0.485150f, 0.491021f, 0.496933f, 0.502887f,
1236- 0.508881f, 0.514918f, 0.520996f, 0.527115f, 0.533276f, 0.539480f, 0.545725f,
1237- 0.552011f, 0.558340f, 0.564712f, 0.571125f, 0.577581f, 0.584078f, 0.590619f,
1238- 0.597202f, 0.603827f, 0.610496f, 0.617207f, 0.623960f, 0.630757f, 0.637597f,
1239- 0.644480f, 0.651406f, 0.658375f, 0.665387f, 0.672443f, 0.679543f, 0.686685f,
1240- 0.693872f, 0.701102f, 0.708376f, 0.715694f, 0.723055f, 0.730461f, 0.737911f,
1241- 0.745404f, 0.752942f, 0.760525f, 0.768151f, 0.775822f, 0.783538f, 0.791298f,
1242- 0.799103f, 0.806952f, 0.814847f, 0.822786f, 0.830770f, 0.838799f, 0.846873f,
1243- 0.854993f, 0.863157f, 0.871367f, 0.879622f, 0.887923f, 0.896269f, 0.904661f,
1244- 0.913099f, 0.921582f, 0.930111f, 0.938686f, 0.947307f, 0.955974f, 0.964686f,
1245- 0.973445f, 0.982251f, 0.991102f, 1.0f};
1246-
1247-typedef union {
1248- unsigned int u;
1249- float f;
1250-} stbir__FP32;
1251-
1252-// From https://gist.github.com/rygorous/2203834
1253-
1254-static const stbir_uint32 fp32_to_srgb8_tab4[104] = {
1255- 0x0073000d, 0x007a000d, 0x0080000d, 0x0087000d, 0x008d000d, 0x0094000d,
1256- 0x009a000d, 0x00a1000d, 0x00a7001a, 0x00b4001a, 0x00c1001a, 0x00ce001a,
1257- 0x00da001a, 0x00e7001a, 0x00f4001a, 0x0101001a, 0x010e0033, 0x01280033,
1258- 0x01410033, 0x015b0033, 0x01750033, 0x018f0033, 0x01a80033, 0x01c20033,
1259- 0x01dc0067, 0x020f0067, 0x02430067, 0x02760067, 0x02aa0067, 0x02dd0067,
1260- 0x03110067, 0x03440067, 0x037800ce, 0x03df00ce, 0x044600ce, 0x04ad00ce,
1261- 0x051400ce, 0x057b00c5, 0x05dd00bc, 0x063b00b5, 0x06970158, 0x07420142,
1262- 0x07e30130, 0x087b0120, 0x090b0112, 0x09940106, 0x0a1700fc, 0x0a9500f2,
1263- 0x0b0f01cb, 0x0bf401ae, 0x0ccb0195, 0x0d950180, 0x0e56016e, 0x0f0d015e,
1264- 0x0fbc0150, 0x10630143, 0x11070264, 0x1238023e, 0x1357021d, 0x14660201,
1265- 0x156601e9, 0x165a01d3, 0x174401c0, 0x182401af, 0x18fe0331, 0x1a9602fe,
1266- 0x1c1502d2, 0x1d7e02ad, 0x1ed4028d, 0x201a0270, 0x21520256, 0x227d0240,
1267- 0x239f0443, 0x25c003fe, 0x27bf03c4, 0x29a10392, 0x2b6a0367, 0x2d1d0341,
1268- 0x2ebe031f, 0x304d0300, 0x31d105b0, 0x34a80555, 0x37520507, 0x39d504c5,
1269- 0x3c37048b, 0x3e7c0458, 0x40a8042a, 0x42bd0401, 0x44c20798, 0x488e071e,
1270- 0x4c1c06b6, 0x4f76065d, 0x52a50610, 0x55ac05cc, 0x5892058f, 0x5b590559,
1271- 0x5e0c0a23, 0x631c0980, 0x67db08f6, 0x6c55087f, 0x70940818, 0x74a007bd,
1272- 0x787d076c, 0x7c330723,
1273-};
1274-
1275-static stbir__inline stbir_uint8
1276-stbir__linear_to_srgb_uchar(float in)
1277-{
1278- static const stbir__FP32 almostone = {0x3f7fffff}; // 1-eps
1279- static const stbir__FP32 minval = {(127 - 13) << 23};
1280- stbir_uint32 tab, bias, scale, t;
1281- stbir__FP32 f;
1282-
1283- // Clamp to [2^(-13), 1-eps]; these two values map to 0 and 1, respectively.
1284- // The tests are carefully written so that NaNs map to 0, same as in the
1285- // reference implementation.
1286- if (!(in > minval.f)) { // written this way to catch NaNs
1287- return 0;
1288- }
1289- if (in > almostone.f) {
1290- return 255;
1291- }
1292-
1293- // Do the table lookup and unpack bias, scale
1294- f.f = in;
1295- tab = fp32_to_srgb8_tab4[(f.u - minval.u) >> 20];
1296- bias = (tab >> 16) << 9;
1297- scale = tab & 0xffff;
1298-
1299- // Grab next-highest mantissa bits and perform linear interpolation
1300- t = (f.u >> 12) & 0xff;
1301- return (unsigned char)((bias + scale * t) >> 16);
1302-}
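In stbir__linear_to_srgb_uchar, the table index packs the float's exponent together with its top three mantissa bits. Inputs from 2^-13 up to 1 span 13 binades, and each binade is split into 8 sub-intervals, which is why fp32_to_srgb8_tab4 has 104 entries. The next 8 mantissa bits become the interpolation weight `t`.

The sketch below shows only that decomposition, using memcpy for the type pun. The names srgb_lookup_key and srgb_key_for are hypothetical. The caller is assumed to have already clamped the input to [2^-13, 1-eps], as the original function does.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical: the two values stbir__linear_to_srgb_uchar derives from a float. */
typedef struct {
  uint32_t index;  /* 0..103: (biased exponent - 114) * 8 + top 3 mantissa bits */
  uint32_t weight; /* 0..255: next 8 mantissa bits, the lerp factor t */
} srgb_lookup_key;

/* Assumes 2^-13 <= in < 1, i.e. the clamping has already happened. */
static srgb_lookup_key srgb_key_for(float in)
{
  const uint32_t minval_bits = (uint32_t)(127 - 13) << 23; /* bit pattern of 2^-13 */
  uint32_t bits;
  srgb_lookup_key key;
  memcpy(&bits, &in, sizeof bits); /* well-defined alternative to the union pun */
  key.index = (bits - minval_bits) >> 20;
  key.weight = (bits >> 12) & 0xff;
  return key;
}
```

For example, 0.5f lands at index 96 with weight 0. Entry 96 is 0x5e0c0a23, so the function returns (0x5e0c << 9) >> 16 = 188, which matches the exact sRGB value 187.5 rounded.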
1303-
1304-#ifndef STBIR_FORCE_GATHER_FILTER_SCANLINES_AMOUNT
1305-#define STBIR_FORCE_GATHER_FILTER_SCANLINES_AMOUNT \
1306- 32 // when downsampling and <= 32 scanlines of buffering, use gather. gather
1307- // used down to 1/8th scaling for 25% win.
1308-#endif
1309-
1310-#ifndef STBIR_FORCE_MINIMUM_SCANLINES_FOR_SPLITS
1311-#define STBIR_FORCE_MINIMUM_SCANLINES_FOR_SPLITS \
1312- 4 // when threading, what is the minimum number of scanlines for a split?
1313-#endif
1314-
1315-#define STBIR_INPUT_CALLBACK_PADDING 3
1316-
1317-#ifdef _M_IX86_FP
1318-#if (_M_IX86_FP >= 1)
1319-#ifndef STBIR_SSE
1320-#define STBIR_SSE
1321-#endif
1322-#endif
1323-#endif
1324-
1325-#ifdef __TINYC__
1326-// tiny c has no intrinsics yet - this can become a version check if they add
1327-// them
1328-#define STBIR_NO_SIMD
1329-#endif
1330-
1331-#if defined(_x86_64) || defined(__x86_64__) || defined(_M_X64) || \
1332- defined(__x86_64) || defined(_M_AMD64) || defined(__SSE2__) || \
1333- defined(STBIR_SSE) || defined(STBIR_SSE2)
1334-#ifndef STBIR_SSE2
1335-#define STBIR_SSE2
1336-#endif
1337-#if defined(__AVX__) || defined(STBIR_AVX2)
1338-#ifndef STBIR_AVX
1339-#ifndef STBIR_NO_AVX
1340-#define STBIR_AVX
1341-#endif
1342-#endif
1343-#endif
1344-#if defined(__AVX2__) || defined(STBIR_AVX2)
1345-#ifndef STBIR_NO_AVX2
1346-#ifndef STBIR_AVX2
1347-#define STBIR_AVX2
1348-#endif
1349-#if defined(_MSC_VER) && !defined(__clang__)
1350-#ifndef STBIR_FP16C // FP16C instructions are on all AVX2 cpus, so we can
1351- // autoselect it here on microsoft - clang needs -mf16c
1352-#define STBIR_FP16C
1353-#endif
1354-#endif
1355-#endif
1356-#endif
1357-#ifdef __F16C__
1358-#ifndef STBIR_FP16C // turn on FP16C instructions if the define is set (for
1359- // clang and gcc)
1360-#define STBIR_FP16C
1361-#endif
1362-#endif
1363-#endif
1364-
1365-#if defined(_M_ARM64) || defined(__aarch64__) || defined(__arm64__) || \
1366- ((__ARM_NEON_FP & 4) != 0) || defined(__ARM_NEON__)
1367-#ifndef STBIR_NEON
1368-#define STBIR_NEON
1369-#endif
1370-#endif
1371-
1372-#if defined(_M_ARM) || defined(__arm__)
1373-#ifdef STBIR_USE_FMA
1374-#undef STBIR_USE_FMA // no FMA for 32-bit arm on MSVC
1375-#endif
1376-#endif
1377-
1378-#if defined(__wasm__) && defined(__wasm_simd128__)
1379-#ifndef STBIR_WASM
1380-#define STBIR_WASM
1381-#endif
1382-#endif
1383-
1384-// restrict pointers for the output pointers, other loop and unroll control
1385-#if defined(_MSC_VER) && !defined(__clang__)
1386-#define STBIR_STREAMOUT_PTR(star) star __restrict
1387-#define STBIR_NO_UNROLL(ptr) \
1388- __assume(ptr) // this oddly keeps msvc from unrolling a loop
1389-#if _MSC_VER >= 1900
1390-#define STBIR_NO_UNROLL_LOOP_START __pragma(loop(no_vector))
1391-#else
1392-#define STBIR_NO_UNROLL_LOOP_START
1393-#endif
1394-#elif defined(__clang__)
1395-#define STBIR_STREAMOUT_PTR(star) star __restrict__
1396-#define STBIR_NO_UNROLL(ptr) __asm__("" ::"r"(ptr))
1397-#if (__clang_major__ >= 4) || ((__clang_major__ >= 3) && (__clang_minor__ >= 5))
1398-#define STBIR_NO_UNROLL_LOOP_START \
1399- _Pragma("clang loop unroll(disable)") \
1400- _Pragma("clang loop vectorize(disable)")
1401-#else
1402-#define STBIR_NO_UNROLL_LOOP_START
1403-#endif
1404-#elif defined(__GNUC__)
1405-#define STBIR_STREAMOUT_PTR(star) star __restrict__
1406-#define STBIR_NO_UNROLL(ptr) __asm__("" ::"r"(ptr))
1407-#if __GNUC__ >= 14
1408-#define STBIR_NO_UNROLL_LOOP_START \
1409- _Pragma("GCC unroll 0") _Pragma("GCC novector")
1410-#else
1411-#define STBIR_NO_UNROLL_LOOP_START
1412-#endif
1413-#define STBIR_NO_UNROLL_LOOP_START_INF_FOR
1414-#else
1415-#define STBIR_STREAMOUT_PTR(star) star
1416-#define STBIR_NO_UNROLL(ptr)
1417-#define STBIR_NO_UNROLL_LOOP_START
1418-#endif
1419-
1420-#ifndef STBIR_NO_UNROLL_LOOP_START_INF_FOR
1421-#define STBIR_NO_UNROLL_LOOP_START_INF_FOR STBIR_NO_UNROLL_LOOP_START
1422-#endif
1423-
1424-#ifdef STBIR_NO_SIMD // force simd off for whatever reason
1425-
1426-// force simd off overrides everything else, so clear it all
1427-
1428-#ifdef STBIR_SSE2
1429-#undef STBIR_SSE2
1430-#endif
1431-
1432-#ifdef STBIR_AVX
1433-#undef STBIR_AVX
1434-#endif
1435-
1436-#ifdef STBIR_NEON
1437-#undef STBIR_NEON
1438-#endif
1439-
1440-#ifdef STBIR_AVX2
1441-#undef STBIR_AVX2
1442-#endif
1443-
1444-#ifdef STBIR_FP16C
1445-#undef STBIR_FP16C
1446-#endif
1447-
1448-#ifdef STBIR_WASM
1449-#undef STBIR_WASM
1450-#endif
1451-
1452-#ifdef STBIR_SIMD
1453-#undef STBIR_SIMD
1454-#endif
1455-
1456-#else // STBIR_SIMD
1457-
1458-#ifdef STBIR_SSE2
1459-#include <emmintrin.h>
1460-
1461-#define stbir__simdf __m128
1462-#define stbir__simdi __m128i
1463-
1464-#define stbir_simdi_castf(reg) _mm_castps_si128(reg)
1465-#define stbir_simdf_casti(reg) _mm_castsi128_ps(reg)
1466-
1467-#define stbir__simdf_load(reg, ptr) (reg) = _mm_loadu_ps((float const *)(ptr))
1468-#define stbir__simdi_load(reg, ptr) \
1469- (reg) = _mm_loadu_si128((stbir__simdi const *)(ptr))
1470-#define stbir__simdf_load1(out, ptr) \
1471- (out) = _mm_load_ss((float const *)(ptr)) // top values can be random (not
1472- // denormal or nan for perf)
1473-#define stbir__simdi_load1(out, ptr) \
1474- (out) = _mm_castps_si128(_mm_load_ss((float const *)(ptr)))
1475-#define stbir__simdf_load1z(out, ptr) \
1476- (out) = _mm_load_ss((float const *)(ptr)) // top values must be zero
1477-#define stbir__simdf_frep4(fvar) _mm_set_ps1(fvar)
1478-#define stbir__simdf_load1frep4(out, fvar) (out) = _mm_set_ps1(fvar)
1479-#define stbir__simdf_load2(out, ptr) \
1480- (out) = _mm_castsi128_ps( \
1481- _mm_loadl_epi64((__m128i *)(ptr))) // top values can be random (not
1482- // denormal or nan for perf)
1483-#define stbir__simdf_load2z(out, ptr) \
1484- (out) = _mm_castsi128_ps( \
1485- _mm_loadl_epi64((__m128i *)(ptr))) // top values must be zero
1486-#define stbir__simdf_load2hmerge(out, reg, ptr) \
1487- (out) = _mm_castpd_ps(_mm_loadh_pd(_mm_castps_pd(reg), (double *)(ptr)))
1488-
1489-#define stbir__simdf_zeroP() _mm_setzero_ps()
1490-#define stbir__simdf_zero(reg) (reg) = _mm_setzero_ps()
1491-
1492-#define stbir__simdf_store(ptr, reg) _mm_storeu_ps((float *)(ptr), reg)
1493-#define stbir__simdf_store1(ptr, reg) _mm_store_ss((float *)(ptr), reg)
1494-#define stbir__simdf_store2(ptr, reg) \
1495- _mm_storel_epi64((__m128i *)(ptr), _mm_castps_si128(reg))
1496-#define stbir__simdf_store2h(ptr, reg) \
1497- _mm_storeh_pd((double *)(ptr), _mm_castps_pd(reg))
1498-
1499-#define stbir__simdi_store(ptr, reg) _mm_storeu_si128((__m128i *)(ptr), reg)
1500-#define stbir__simdi_store1(ptr, reg) \
1501- _mm_store_ss((float *)(ptr), _mm_castsi128_ps(reg))
1502-#define stbir__simdi_store2(ptr, reg) _mm_storel_epi64((__m128i *)(ptr), (reg))
1503-
1504-#define stbir__prefetch(ptr) _mm_prefetch((char *)(ptr), _MM_HINT_T0)
1505-
1506-#define stbir__simdi_expand_u8_to_u32(out0, out1, out2, out3, ireg) \
1507- { \
1508- stbir__simdi zero = _mm_setzero_si128(); \
1509- out2 = _mm_unpacklo_epi8(ireg, zero); \
1510- out3 = _mm_unpackhi_epi8(ireg, zero); \
1511- out0 = _mm_unpacklo_epi16(out2, zero); \
1512- out1 = _mm_unpackhi_epi16(out2, zero); \
1513- out2 = _mm_unpacklo_epi16(out3, zero); \
1514- out3 = _mm_unpackhi_epi16(out3, zero); \
1515- }
1516-
1517-#define stbir__simdi_expand_u8_to_1u32(out, ireg) \
1518- { \
1519- stbir__simdi zero = _mm_setzero_si128(); \
1520- out = _mm_unpacklo_epi8(ireg, zero); \
1521- out = _mm_unpacklo_epi16(out, zero); \
1522- }
1523-
1524-#define stbir__simdi_expand_u16_to_u32(out0, out1, ireg) \
1525- { \
1526- stbir__simdi zero = _mm_setzero_si128(); \
1527- out0 = _mm_unpacklo_epi16(ireg, zero); \
1528- out1 = _mm_unpackhi_epi16(ireg, zero); \
1529- }
1530-
1531-#define stbir__simdf_convert_float_to_i32(i, f) (i) = _mm_cvttps_epi32(f)
1532-#define stbir__simdf_convert_float_to_int(f) _mm_cvtt_ss2si(f)
1533-#define stbir__simdf_convert_float_to_uint8(f) \
1534- ((unsigned char)_mm_cvtsi128_si32(_mm_cvttps_epi32( \
1535- _mm_max_ps(_mm_min_ps(f, STBIR__CONSTF(STBIR_max_uint8_as_float)), \
1536- _mm_setzero_ps()))))
1537-#define stbir__simdf_convert_float_to_short(f) \
1538- ((unsigned short)_mm_cvtsi128_si32(_mm_cvttps_epi32( \
1539- _mm_max_ps(_mm_min_ps(f, STBIR__CONSTF(STBIR_max_uint16_as_float)), \
1540- _mm_setzero_ps()))))
1541-
1542-#define stbir__simdi_to_int(i) _mm_cvtsi128_si32(i)
1543-#define stbir__simdi_convert_i32_to_float(out, ireg) \
1544- (out) = _mm_cvtepi32_ps(ireg)
1545-#define stbir__simdf_add(out, reg0, reg1) (out) = _mm_add_ps(reg0, reg1)
1546-#define stbir__simdf_mult(out, reg0, reg1) (out) = _mm_mul_ps(reg0, reg1)
1547-#define stbir__simdf_mult_mem(out, reg, ptr) \
1548- (out) = _mm_mul_ps(reg, _mm_loadu_ps((float const *)(ptr)))
1549-#define stbir__simdf_mult1_mem(out, reg, ptr) \
1550- (out) = _mm_mul_ss(reg, _mm_load_ss((float const *)(ptr)))
1551-#define stbir__simdf_add_mem(out, reg, ptr) \
1552- (out) = _mm_add_ps(reg, _mm_loadu_ps((float const *)(ptr)))
1553-#define stbir__simdf_add1_mem(out, reg, ptr) \
1554- (out) = _mm_add_ss(reg, _mm_load_ss((float const *)(ptr)))
1555-
1556-#ifdef STBIR_USE_FMA // off by default so SIMD results stay bit-identical to
1557- // the non-SIMD path
1558-#include <immintrin.h>
1559-#define stbir__simdf_madd(out, add, mul1, mul2) \
1560- (out) = _mm_fmadd_ps(mul1, mul2, add)
1561-#define stbir__simdf_madd1(out, add, mul1, mul2) \
1562- (out) = _mm_fmadd_ss(mul1, mul2, add)
1563-#define stbir__simdf_madd_mem(out, add, mul, ptr) \
1564- (out) = _mm_fmadd_ps(mul, _mm_loadu_ps((float const *)(ptr)), add)
1565-#define stbir__simdf_madd1_mem(out, add, mul, ptr) \
1566- (out) = _mm_fmadd_ss(mul, _mm_load_ss((float const *)(ptr)), add)
1567-#else
1568-#define stbir__simdf_madd(out, add, mul1, mul2) \
1569- (out) = _mm_add_ps(add, _mm_mul_ps(mul1, mul2))
1570-#define stbir__simdf_madd1(out, add, mul1, mul2) \
1571- (out) = _mm_add_ss(add, _mm_mul_ss(mul1, mul2))
1572-#define stbir__simdf_madd_mem(out, add, mul, ptr) \
1573- (out) = _mm_add_ps(add, _mm_mul_ps(mul, _mm_loadu_ps((float const *)(ptr))))
1574-#define stbir__simdf_madd1_mem(out, add, mul, ptr) \
1575- (out) = _mm_add_ss(add, _mm_mul_ss(mul, _mm_load_ss((float const *)(ptr))))
1576-#endif
1577-
1578-#define stbir__simdf_add1(out, reg0, reg1) (out) = _mm_add_ss(reg0, reg1)
1579-#define stbir__simdf_mult1(out, reg0, reg1) (out) = _mm_mul_ss(reg0, reg1)
1580-
1581-#define stbir__simdf_and(out, reg0, reg1) (out) = _mm_and_ps(reg0, reg1)
1582-#define stbir__simdf_or(out, reg0, reg1) (out) = _mm_or_ps(reg0, reg1)
1583-
1584-#define stbir__simdf_min(out, reg0, reg1) (out) = _mm_min_ps(reg0, reg1)
1585-#define stbir__simdf_max(out, reg0, reg1) (out) = _mm_max_ps(reg0, reg1)
1586-#define stbir__simdf_min1(out, reg0, reg1) (out) = _mm_min_ss(reg0, reg1)
1587-#define stbir__simdf_max1(out, reg0, reg1) (out) = _mm_max_ss(reg0, reg1)
1588-
1589-#define stbir__simdf_0123ABCDto3ABx(out, reg0, reg1) \
1590- (out) = _mm_castsi128_ps(_mm_shuffle_epi32( \
1591- _mm_castps_si128(_mm_shuffle_ps( \
1592- reg1, reg0, (0 << 0) + (1 << 2) + (2 << 4) + (3 << 6))), \
1593- (3 << 0) + (0 << 2) + (1 << 4) + (2 << 6)))
1594-#define stbir__simdf_0123ABCDto23Ax(out, reg0, reg1) \
1595- (out) = _mm_castsi128_ps(_mm_shuffle_epi32( \
1596- _mm_castps_si128(_mm_shuffle_ps( \
1597- reg1, reg0, (0 << 0) + (1 << 2) + (2 << 4) + (3 << 6))), \
1598- (2 << 0) + (3 << 2) + (0 << 4) + (1 << 6)))
1599-
1600-static const stbir__simdf STBIR_zeroones = {0.0f, 1.0f, 0.0f, 1.0f};
1601-static const stbir__simdf STBIR_onezeros = {1.0f, 0.0f, 1.0f, 0.0f};
1602-#define stbir__simdf_aaa1(out, alp, ones) \
1603- (out) = _mm_castsi128_ps( \
1604- _mm_shuffle_epi32(_mm_castps_si128(_mm_movehl_ps(ones, alp)), \
1605- (1 << 0) + (1 << 2) + (1 << 4) + (2 << 6)))
1606-#define stbir__simdf_1aaa(out, alp, ones) \
1607- (out) = _mm_castsi128_ps( \
1608- _mm_shuffle_epi32(_mm_castps_si128(_mm_movelh_ps(ones, alp)), \
1609- (0 << 0) + (2 << 2) + (2 << 4) + (2 << 6)))
1610-#define stbir__simdf_a1a1(out, alp, ones) \
1611- (out) = \
1612- _mm_or_ps(_mm_castsi128_ps(_mm_srli_epi64(_mm_castps_si128(alp), 32)), \
1613- STBIR_zeroones)
1614-#define stbir__simdf_1a1a(out, alp, ones) \
1615- (out) = \
1616- _mm_or_ps(_mm_castsi128_ps(_mm_slli_epi64(_mm_castps_si128(alp), 32)), \
1617- STBIR_onezeros)
1618-
1619-#define stbir__simdf_swiz(reg, one, two, three, four) \
1620- _mm_castsi128_ps(_mm_shuffle_epi32(_mm_castps_si128(reg), \
1621- (one << 0) + (two << 2) + \
1622- (three << 4) + (four << 6)))
1623-
1624-#define stbir__simdi_and(out, reg0, reg1) (out) = _mm_and_si128(reg0, reg1)
1625-#define stbir__simdi_or(out, reg0, reg1) (out) = _mm_or_si128(reg0, reg1)
1626-#define stbir__simdi_16madd(out, reg0, reg1) (out) = _mm_madd_epi16(reg0, reg1)
1627-
1628-#define stbir__simdf_pack_to_8bytes(out, aa, bb) \
1629- { \
1630- stbir__simdf af, bf; \
1631- stbir__simdi a, b; \
1632- af = _mm_min_ps(aa, STBIR_max_uint8_as_float); \
1633- bf = _mm_min_ps(bb, STBIR_max_uint8_as_float); \
1634- af = _mm_max_ps(af, _mm_setzero_ps()); \
1635- bf = _mm_max_ps(bf, _mm_setzero_ps()); \
1636- a = _mm_cvttps_epi32(af); \
1637- b = _mm_cvttps_epi32(bf); \
1638- a = _mm_packs_epi32(a, b); \
1639- out = _mm_packus_epi16(a, a); \
1640- }
1641-
1642-#define stbir__simdf_load4_transposed(o0, o1, o2, o3, ptr) \
1643- stbir__simdf_load(o0, (ptr)); \
1644- stbir__simdf_load(o1, (ptr) + 4); \
1645- stbir__simdf_load(o2, (ptr) + 8); \
1646- stbir__simdf_load(o3, (ptr) + 12); \
1647- { \
1648- __m128 tmp0, tmp1, tmp2, tmp3; \
1649- tmp0 = _mm_unpacklo_ps(o0, o1); \
1650- tmp2 = _mm_unpacklo_ps(o2, o3); \
1651- tmp1 = _mm_unpackhi_ps(o0, o1); \
1652- tmp3 = _mm_unpackhi_ps(o2, o3); \
1653- o0 = _mm_movelh_ps(tmp0, tmp2); \
1654- o1 = _mm_movehl_ps(tmp2, tmp0); \
1655- o2 = _mm_movelh_ps(tmp1, tmp3); \
1656- o3 = _mm_movehl_ps(tmp3, tmp1); \
1657- }
1658-
1659-#define stbir__interleave_pack_and_store_16_u8(ptr, r0, r1, r2, r3) \
1660- r0 = _mm_packs_epi32(r0, r1); \
1661- r2 = _mm_packs_epi32(r2, r3); \
1662- r1 = _mm_unpacklo_epi16(r0, r2); \
1663- r3 = _mm_unpackhi_epi16(r0, r2); \
1664- r0 = _mm_unpacklo_epi16(r1, r3); \
1665- r2 = _mm_unpackhi_epi16(r1, r3); \
1666- r0 = _mm_packus_epi16(r0, r2); \
1667- stbir__simdi_store(ptr, r0);
1668-
1669-#define stbir__simdi_32shr(out, reg, imm) out = _mm_srli_epi32(reg, imm)
1670-
1671-#if defined(_MSC_VER) && !defined(__clang__)
1672-// msvc initializes __m128i constants byte by byte
1673-#define STBIR__CONST_32_TO_8(v) \
1674- (char)(unsigned char)((v) & 255), (char)(unsigned char)(((v) >> 8) & 255), \
1675- (char)(unsigned char)(((v) >> 16) & 255), \
1676- (char)(unsigned char)(((v) >> 24) & 255)
1677-#define STBIR__CONST_4_32i(v) \
1678- STBIR__CONST_32_TO_8(v), STBIR__CONST_32_TO_8(v), STBIR__CONST_32_TO_8(v), \
1679- STBIR__CONST_32_TO_8(v)
1680-#define STBIR__CONST_4d_32i(v0, v1, v2, v3) \
1681- STBIR__CONST_32_TO_8(v0), STBIR__CONST_32_TO_8(v1), \
1682- STBIR__CONST_32_TO_8(v2), STBIR__CONST_32_TO_8(v3)
1683-#else
1684-// everything else initializes vector constants from long longs
1685-#define STBIR__CONST_4_32i(v) \
1686- (long long)((((stbir_uint64)(stbir_uint32)(v)) << 32) | \
1687- ((stbir_uint64)(stbir_uint32)(v))), \
1688- (long long)((((stbir_uint64)(stbir_uint32)(v)) << 32) | \
1689- ((stbir_uint64)(stbir_uint32)(v)))
1690-#define STBIR__CONST_4d_32i(v0, v1, v2, v3) \
1691- (long long)((((stbir_uint64)(stbir_uint32)(v1)) << 32) | \
1692- ((stbir_uint64)(stbir_uint32)(v0))), \
1693- (long long)((((stbir_uint64)(stbir_uint32)(v3)) << 32) | \
1694- ((stbir_uint64)(stbir_uint32)(v2)))
1695-#endif
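Outside MSVC, GCC and clang define __m128i as a vector of two long longs. STBIR__CONST_4d_32i therefore folds four 32-bit lanes into two 64-bit initializers, placing the lower-numbered lane in the low half.

The sketch below shows that folding. It assumes a little-endian target (x86 and the usual ARM configurations). pack_lanes is a hypothetical helper that mirrors one half of the macro.

```c
#include <stdint.h>

/* Hypothetical helper mirroring one half of STBIR__CONST_4d_32i:
   lane v0 goes in the low 32 bits, lane v1 in the high 32 bits. */
static uint64_t pack_lanes(uint32_t v0, uint32_t v1)
{
  return ((uint64_t)v1 << 32) | (uint64_t)v0;
}
```

On a little-endian machine, the 32-bit lanes then appear in memory in the order the macro arguments were written. This is how the __m256i permute constants such as stbir_00001111 get their lane layout.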
1696-
1697-#define STBIR__SIMDF_CONST(var, x) stbir__simdf var = {x, x, x, x}
1698-#define STBIR__SIMDI_CONST(var, x) stbir__simdi var = {STBIR__CONST_4_32i(x)}
1699-#define STBIR__CONSTF(var) (var)
1700-#define STBIR__CONSTI(var) (var)
1701-
1702-#if defined(STBIR_AVX) || defined(__SSE4_1__)
1703-#include <smmintrin.h>
1704-#define stbir__simdf_pack_to_8words(out, reg0, reg1) \
1705- out = _mm_packus_epi32( \
1706- _mm_cvttps_epi32(_mm_max_ps( \
1707- _mm_min_ps(reg0, STBIR__CONSTF(STBIR_max_uint16_as_float)), \
1708- _mm_setzero_ps())), \
1709- _mm_cvttps_epi32(_mm_max_ps( \
1710- _mm_min_ps(reg1, STBIR__CONSTF(STBIR_max_uint16_as_float)), \
1711- _mm_setzero_ps())))
1712-#else
1713-static STBIR__SIMDI_CONST(stbir__s32_32768, 32768);
1714-static STBIR__SIMDI_CONST(stbir__s16_32768, ((32768 << 16) | 32768));
1715-
1716-#define stbir__simdf_pack_to_8words(out, reg0, reg1) \
1717- { \
1718- stbir__simdi tmp0, tmp1; \
1719- tmp0 = _mm_cvttps_epi32(_mm_max_ps( \
1720- _mm_min_ps(reg0, STBIR__CONSTF(STBIR_max_uint16_as_float)), \
1721- _mm_setzero_ps())); \
1722- tmp1 = _mm_cvttps_epi32(_mm_max_ps( \
1723- _mm_min_ps(reg1, STBIR__CONSTF(STBIR_max_uint16_as_float)), \
1724- _mm_setzero_ps())); \
1725- tmp0 = _mm_sub_epi32(tmp0, stbir__s32_32768); \
1726- tmp1 = _mm_sub_epi32(tmp1, stbir__s32_32768); \
1727- out = _mm_packs_epi32(tmp0, tmp1); \
1728- out = _mm_sub_epi16(out, stbir__s16_32768); \
1729- }
1730-
1731-#endif
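SSE2 has no unsigned 32-to-16-bit saturating pack; _mm_packus_epi32 only arrives with SSE4.1. The fallback therefore:

- shifts the clamped values down by 32768,
- packs them with the signed _mm_packs_epi32,
- subtracts 32768 again with wrapping 16-bit arithmetic to restore the unsigned range.

Below is a scalar sketch of one lane. The function names are hypothetical. The ternaries follow MINPS/MAXPS semantics (`a < b ? a : b` and `a > b ? a : b`), so a NaN input also ends up at 65535.

```c
#include <stdint.h>

/* Per-lane behavior of _mm_packs_epi32: signed saturation to 16 bits. */
static int16_t saturate_s16(int32_t v)
{
  if (v < -32768) return -32768;
  if (v > 32767) return 32767;
  return (int16_t)v;
}

/* Hypothetical scalar model of one lane of the SSE2 stbir__simdf_pack_to_8words. */
static uint16_t float_to_u16_sse2_style(float f)
{
  int32_t i;
  int16_t s;
  f = (f < 65535.0f) ? f : 65535.0f; /* _mm_min_ps(f, 65535.0f) */
  f = (f > 0.0f) ? f : 0.0f;         /* _mm_max_ps(f, 0.0f) */
  i = (int32_t)f;                    /* _mm_cvttps_epi32: truncate toward zero */
  s = saturate_s16(i - 32768);       /* _mm_sub_epi32 + _mm_packs_epi32; never saturates here */
  return (uint16_t)((uint16_t)s - 32768u); /* _mm_sub_epi16 wraps modulo 2^16 */
}
```

After the clamp, `i - 32768` always lies in [-32768, 32767]. The signed pack is therefore lossless, and the final wrapping subtraction maps -32768..32767 back onto 0..65535.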
1732-
1733-#define STBIR_SIMD
1734-
1735-// if we detect AVX, set the simd8 defines
1736-#ifdef STBIR_AVX
1737-#include <immintrin.h>
1738-#define STBIR_SIMD8
1739-#define stbir__simdf8 __m256
1740-#define stbir__simdi8 __m256i
1741-#define stbir__simdf8_load(out, ptr) \
1742- (out) = _mm256_loadu_ps((float const *)(ptr))
1743-#define stbir__simdi8_load(out, ptr) \
1744- (out) = _mm256_loadu_si256((__m256i const *)(ptr))
1745-#define stbir__simdf8_mult(out, a, b) (out) = _mm256_mul_ps((a), (b))
1746-#define stbir__simdf8_store(ptr, out) _mm256_storeu_ps((float *)(ptr), out)
1747-#define stbir__simdi8_store(ptr, reg) _mm256_storeu_si256((__m256i *)(ptr), reg)
1748-#define stbir__simdf8_frep8(fval) _mm256_set1_ps(fval)
1749-
1750-#define stbir__simdf8_min(out, reg0, reg1) (out) = _mm256_min_ps(reg0, reg1)
1751-#define stbir__simdf8_max(out, reg0, reg1) (out) = _mm256_max_ps(reg0, reg1)
1752-
1753-#define stbir__simdf8_add4halves(out, bot4, top8) \
1754- (out) = _mm_add_ps(bot4, _mm256_extractf128_ps(top8, 1))
1755-#define stbir__simdf8_mult_mem(out, reg, ptr) \
1756- (out) = _mm256_mul_ps(reg, _mm256_loadu_ps((float const *)(ptr)))
1757-#define stbir__simdf8_add_mem(out, reg, ptr) \
1758- (out) = _mm256_add_ps(reg, _mm256_loadu_ps((float const *)(ptr)))
1759-#define stbir__simdf8_add(out, a, b) (out) = _mm256_add_ps(a, b)
1760-#define stbir__simdf8_load1b(out, ptr) (out) = _mm256_broadcast_ss(ptr)
1761-#define stbir__simdf_load1rep4(out, ptr) \
1762- (out) = _mm_broadcast_ss(ptr) // avx load instruction
1763-
1764-#define stbir__simdi8_convert_i32_to_float(out, ireg) \
1765- (out) = _mm256_cvtepi32_ps(ireg)
1766-#define stbir__simdf8_convert_float_to_i32(i, f) (i) = _mm256_cvttps_epi32(f)
1767-
1768-#define stbir__simdf8_bot4s(out, a, b) \
1769- (out) = _mm256_permute2f128_ps(a, b, (0 << 0) + (2 << 4))
1770-#define stbir__simdf8_top4s(out, a, b) \
1771- (out) = _mm256_permute2f128_ps(a, b, (1 << 0) + (3 << 4))
1772-
1773-#define stbir__simdf8_gettop4(reg) _mm256_extractf128_ps(reg, 1)
1774-
1775-#ifdef STBIR_AVX2
1776-
1777-#define stbir__simdi8_expand_u8_to_u32(out0, out1, ireg) \
1778- { \
1779- stbir__simdi8 a, zero = _mm256_setzero_si256(); \
1780- a = _mm256_permute4x64_epi64( \
1781- _mm256_unpacklo_epi8( \
1782- _mm256_permute4x64_epi64(_mm256_castsi128_si256(ireg), \
1783- (0 << 0) + (2 << 2) + (1 << 4) + \
1784- (3 << 6)), \
1785- zero), \
1786- (0 << 0) + (2 << 2) + (1 << 4) + (3 << 6)); \
1787- out0 = _mm256_unpacklo_epi16(a, zero); \
1788- out1 = _mm256_unpackhi_epi16(a, zero); \
1789- }
1790-
1791-#define stbir__simdf8_pack_to_16bytes(out, aa, bb) \
1792- { \
1793- stbir__simdi8 t; \
1794- stbir__simdf8 af, bf; \
1795- stbir__simdi8 a, b; \
1796- af = _mm256_min_ps(aa, STBIR_max_uint8_as_floatX); \
1797- bf = _mm256_min_ps(bb, STBIR_max_uint8_as_floatX); \
1798- af = _mm256_max_ps(af, _mm256_setzero_ps()); \
1799- bf = _mm256_max_ps(bf, _mm256_setzero_ps()); \
1800- a = _mm256_cvttps_epi32(af); \
1801- b = _mm256_cvttps_epi32(bf); \
1802- t = _mm256_permute4x64_epi64(_mm256_packs_epi32(a, b), \
1803- (0 << 0) + (2 << 2) + (1 << 4) + \
1804- (3 << 6)); \
1805- out = _mm256_castsi256_si128(_mm256_permute4x64_epi64( \
1806- _mm256_packus_epi16(t, t), \
1807- (0 << 0) + (2 << 2) + (1 << 4) + (3 << 6))); \
1808- }
1809-
1810-#define stbir__simdi8_expand_u16_to_u32(out, ireg) \
1811- out = _mm256_unpacklo_epi16( \
1812- _mm256_permute4x64_epi64(_mm256_castsi128_si256(ireg), \
1813- (0 << 0) + (2 << 2) + (1 << 4) + (3 << 6)), \
1814- _mm256_setzero_si256());
1815-
1816-#define stbir__simdf8_pack_to_16words(out, aa, bb) \
1817- { \
1818- stbir__simdf8 af, bf; \
1819- stbir__simdi8 a, b; \
1820- af = _mm256_min_ps(aa, STBIR_max_uint16_as_floatX); \
1821- bf = _mm256_min_ps(bb, STBIR_max_uint16_as_floatX); \
1822- af = _mm256_max_ps(af, _mm256_setzero_ps()); \
1823- bf = _mm256_max_ps(bf, _mm256_setzero_ps()); \
1824- a = _mm256_cvttps_epi32(af); \
1825- b = _mm256_cvttps_epi32(bf); \
1826- (out) = _mm256_permute4x64_epi64(_mm256_packus_epi32(a, b), \
1827- (0 << 0) + (2 << 2) + (1 << 4) + \
1828- (3 << 6)); \
1829- }
1830-
1831-#else
1832-
1833-#define stbir__simdi8_expand_u8_to_u32(out0, out1, ireg) \
1834- { \
1835- stbir__simdi a, zero = _mm_setzero_si128(); \
1836- a = _mm_unpacklo_epi8(ireg, zero); \
1837- out0 = _mm256_setr_m128i(_mm_unpacklo_epi16(a, zero), \
1838- _mm_unpackhi_epi16(a, zero)); \
1839- a = _mm_unpackhi_epi8(ireg, zero); \
1840- out1 = _mm256_setr_m128i(_mm_unpacklo_epi16(a, zero), \
1841- _mm_unpackhi_epi16(a, zero)); \
1842- }
1843-
1844-#define stbir__simdf8_pack_to_16bytes(out, aa, bb) \
1845- { \
1846- stbir__simdi t; \
1847- stbir__simdf8 af, bf; \
1848- stbir__simdi8 a, b; \
1849- af = _mm256_min_ps(aa, STBIR_max_uint8_as_floatX); \
1850- bf = _mm256_min_ps(bb, STBIR_max_uint8_as_floatX); \
1851- af = _mm256_max_ps(af, _mm256_setzero_ps()); \
1852- bf = _mm256_max_ps(bf, _mm256_setzero_ps()); \
1853- a = _mm256_cvttps_epi32(af); \
1854- b = _mm256_cvttps_epi32(bf); \
1855- out = _mm_packs_epi32(_mm256_castsi256_si128(a), \
1856- _mm256_extractf128_si256(a, 1)); \
1857- out = _mm_packus_epi16(out, out); \
1858- t = _mm_packs_epi32(_mm256_castsi256_si128(b), \
1859- _mm256_extractf128_si256(b, 1)); \
1860- t = _mm_packus_epi16(t, t); \
1861- out = _mm_castps_si128( \
1862- _mm_shuffle_ps(_mm_castsi128_ps(out), _mm_castsi128_ps(t), \
1863- (0 << 0) + (1 << 2) + (0 << 4) + (1 << 6))); \
1864- }
1865-
1866-#define stbir__simdi8_expand_u16_to_u32(out, ireg) \
1867- { \
1868- stbir__simdi a, b, zero = _mm_setzero_si128(); \
1869- a = _mm_unpacklo_epi16(ireg, zero); \
1870- b = _mm_unpackhi_epi16(ireg, zero); \
1871- out = _mm256_insertf128_si256(_mm256_castsi128_si256(a), b, 1); \
1872- }
1873-
1874-#define stbir__simdf8_pack_to_16words(out, aa, bb) \
1875- { \
1876- stbir__simdi t0, t1; \
1877- stbir__simdf8 af, bf; \
1878- stbir__simdi8 a, b; \
1879- af = _mm256_min_ps(aa, STBIR_max_uint16_as_floatX); \
1880- bf = _mm256_min_ps(bb, STBIR_max_uint16_as_floatX); \
1881- af = _mm256_max_ps(af, _mm256_setzero_ps()); \
1882- bf = _mm256_max_ps(bf, _mm256_setzero_ps()); \
1883- a = _mm256_cvttps_epi32(af); \
1884- b = _mm256_cvttps_epi32(bf); \
1885- t0 = _mm_packus_epi32(_mm256_castsi256_si128(a), \
1886- _mm256_extractf128_si256(a, 1)); \
1887- t1 = _mm_packus_epi32(_mm256_castsi256_si128(b), \
1888- _mm256_extractf128_si256(b, 1)); \
1889- out = _mm256_setr_m128i(t0, t1); \
1890- }
1891-
1892-#endif
1893-
1894-static __m256i stbir_00001111 = {STBIR__CONST_4d_32i(0, 0, 0, 0),
1895- STBIR__CONST_4d_32i(1, 1, 1, 1)};
1896-#define stbir__simdf8_0123to00001111(out, in) \
1897- (out) = _mm256_permutevar_ps(in, stbir_00001111)
1898-
1899-static __m256i stbir_22223333 = {STBIR__CONST_4d_32i(2, 2, 2, 2),
1900- STBIR__CONST_4d_32i(3, 3, 3, 3)};
1901-#define stbir__simdf8_0123to22223333(out, in) \
1902- (out) = _mm256_permutevar_ps(in, stbir_22223333)
1903-
1904-#define stbir__simdf8_0123to2222(out, in) \
1905- (out) = stbir__simdf_swiz(_mm256_castps256_ps128(in), 2, 2, 2, 2)
1906-
1907-#define stbir__simdf8_load4b(out, ptr) \
1908- (out) = _mm256_broadcast_ps((__m128 const *)(ptr))
1909-
1910-static __m256i stbir_00112233 = {STBIR__CONST_4d_32i(0, 0, 1, 1),
1911- STBIR__CONST_4d_32i(2, 2, 3, 3)};
1912-#define stbir__simdf8_0123to00112233(out, in) \
1913- (out) = _mm256_permutevar_ps(in, stbir_00112233)
1914-#define stbir__simdf8_add4(out, a8, b) \
1915- (out) = _mm256_add_ps(a8, _mm256_castps128_ps256(b))
1916-
1917-static __m256i stbir_load6 = {
1918- STBIR__CONST_4_32i(0x80000000),
1919- STBIR__CONST_4d_32i(0x80000000, 0x80000000, 0, 0)};
1920-#define stbir__simdf8_load6z(out, ptr) \
1921- (out) = _mm256_maskload_ps(ptr, stbir_load6)
1922-
1923-#define stbir__simdf8_0123to00000000(out, in) \
1924- (out) = _mm256_shuffle_ps(in, in, (0 << 0) + (0 << 2) + (0 << 4) + (0 << 6))
1925-#define stbir__simdf8_0123to11111111(out, in) \
1926- (out) = _mm256_shuffle_ps(in, in, (1 << 0) + (1 << 2) + (1 << 4) + (1 << 6))
1927-#define stbir__simdf8_0123to22222222(out, in) \
1928- (out) = _mm256_shuffle_ps(in, in, (2 << 0) + (2 << 2) + (2 << 4) + (2 << 6))
1929-#define stbir__simdf8_0123to33333333(out, in) \
1930- (out) = _mm256_shuffle_ps(in, in, (3 << 0) + (3 << 2) + (3 << 4) + (3 << 6))
1931-#define stbir__simdf8_0123to21032103(out, in) \
1932- (out) = _mm256_shuffle_ps(in, in, (2 << 0) + (1 << 2) + (0 << 4) + (3 << 6))
1933-#define stbir__simdf8_0123to32103210(out, in) \
1934- (out) = _mm256_shuffle_ps(in, in, (3 << 0) + (2 << 2) + (1 << 4) + (0 << 6))
1935-#define stbir__simdf8_0123to12301230(out, in) \
1936- (out) = _mm256_shuffle_ps(in, in, (1 << 0) + (2 << 2) + (3 << 4) + (0 << 6))
1937-#define stbir__simdf8_0123to10321032(out, in) \
1938- (out) = _mm256_shuffle_ps(in, in, (1 << 0) + (0 << 2) + (3 << 4) + (2 << 6))
1939-#define stbir__simdf8_0123to30123012(out, in) \
1940- (out) = _mm256_shuffle_ps(in, in, (3 << 0) + (0 << 2) + (1 << 4) + (2 << 6))
1941-
1942-#define stbir__simdf8_0123to11331133(out, in) \
1943- (out) = _mm256_shuffle_ps(in, in, (1 << 0) + (1 << 2) + (3 << 4) + (3 << 6))
1944-#define stbir__simdf8_0123to00220022(out, in) \
1945- (out) = _mm256_shuffle_ps(in, in, (0 << 0) + (0 << 2) + (2 << 4) + (2 << 6))
1946-
1947-#define stbir__simdf8_aaa1(out, alp, ones) \
1948- (out) = _mm256_blend_ps(alp, ones, \
1949- (1 << 0) + (1 << 1) + (1 << 2) + (0 << 3) + \
1950- (1 << 4) + (1 << 5) + (1 << 6) + (0 << 7)); \
1951- (out) = \
1952- _mm256_shuffle_ps(out, out, (3 << 0) + (3 << 2) + (3 << 4) + (0 << 6))
1953-#define stbir__simdf8_1aaa(out, alp, ones) \
1954- (out) = _mm256_blend_ps(alp, ones, \
1955- (0 << 0) + (1 << 1) + (1 << 2) + (1 << 3) + \
1956- (0 << 4) + (1 << 5) + (1 << 6) + (1 << 7)); \
1957- (out) = \
1958- _mm256_shuffle_ps(out, out, (1 << 0) + (0 << 2) + (0 << 4) + (0 << 6))
1959-#define stbir__simdf8_a1a1(out, alp, ones) \
1960- (out) = _mm256_blend_ps(alp, ones, \
1961- (1 << 0) + (0 << 1) + (1 << 2) + (0 << 3) + \
1962- (1 << 4) + (0 << 5) + (1 << 6) + (0 << 7)); \
1963- (out) = \
1964- _mm256_shuffle_ps(out, out, (1 << 0) + (0 << 2) + (3 << 4) + (2 << 6))
1965-#define stbir__simdf8_1a1a(out, alp, ones) \
1966- (out) = _mm256_blend_ps(alp, ones, \
1967- (0 << 0) + (1 << 1) + (0 << 2) + (1 << 3) + \
1968- (0 << 4) + (1 << 5) + (0 << 6) + (1 << 7)); \
1969- (out) = \
1970- _mm256_shuffle_ps(out, out, (1 << 0) + (0 << 2) + (3 << 4) + (2 << 6))
1971-
1972-#define stbir__simdf8_zero(reg) (reg) = _mm256_setzero_ps()
1973-
1974-#ifdef STBIR_USE_FMA // off by default so SIMD results stay bit-identical to
1975- // the non-SIMD path
1976-#define stbir__simdf8_madd(out, add, mul1, mul2) \
1977- (out) = _mm256_fmadd_ps(mul1, mul2, add)
1978-#define stbir__simdf8_madd_mem(out, add, mul, ptr) \
1979- (out) = _mm256_fmadd_ps(mul, _mm256_loadu_ps((float const *)(ptr)), add)
1980-#define stbir__simdf8_madd_mem4(out, add, mul, ptr) \
1981- (out) = \
1982- _mm256_fmadd_ps(_mm256_setr_m128(mul, _mm_setzero_ps()), \
1983- _mm256_setr_m128(_mm_loadu_ps((float const *)(ptr)), \
1984- _mm_setzero_ps()), \
1985- add)
1986-#else
1987-#define stbir__simdf8_madd(out, add, mul1, mul2) \
1988- (out) = _mm256_add_ps(add, _mm256_mul_ps(mul1, mul2))
1989-#define stbir__simdf8_madd_mem(out, add, mul, ptr) \
1990- (out) = _mm256_add_ps( \
1991- add, _mm256_mul_ps(mul, _mm256_loadu_ps((float const *)(ptr))))
1992-#define stbir__simdf8_madd_mem4(out, add, mul, ptr) \
1993- (out) = _mm256_add_ps( \
1994- add, \
1995- _mm256_setr_m128(_mm_mul_ps(mul, _mm_loadu_ps((float const *)(ptr))), \
1996- _mm_setzero_ps()))
1997-#endif
1998-#define stbir__if_simdf8_cast_to_simdf4(val) _mm256_castps256_ps128(val)
1999-
2000-#endif
2001-
2002-#ifdef STBIR_FLOORF
2003-#undef STBIR_FLOORF
2004-#endif
2005-#define STBIR_FLOORF stbir_simd_floorf
2006-static stbir__inline float
2007-stbir_simd_floorf(float x) // martins floorf
2008-{
2009-#if defined(STBIR_AVX) || defined(__SSE4_1__) || defined(STBIR_SSE41)
2010- __m128 t = _mm_set_ss(x);
2011- return _mm_cvtss_f32(_mm_floor_ss(t, t));
2012-#else
2013- __m128 f = _mm_set_ss(x);
2014- __m128 t = _mm_cvtepi32_ps(_mm_cvttps_epi32(f));
2015- __m128 r = _mm_add_ss(t, _mm_and_ps(_mm_cmplt_ss(f, t), _mm_set_ss(-1.0f)));
2016- return _mm_cvtss_f32(r);
2017-#endif
2018-}
2019-
2020-#ifdef STBIR_CEILF
2021-#undef STBIR_CEILF
2022-#endif
2023-#define STBIR_CEILF stbir_simd_ceilf
2024-static stbir__inline float
2025-stbir_simd_ceilf(float x) // martins ceilf
2026-{
2027-#if defined(STBIR_AVX) || defined(__SSE4_1__) || defined(STBIR_SSE41)
2028- __m128 t = _mm_set_ss(x);
2029- return _mm_cvtss_f32(_mm_ceil_ss(t, t));
2030-#else
2031- __m128 f = _mm_set_ss(x);
2032- __m128 t = _mm_cvtepi32_ps(_mm_cvttps_epi32(f));
2033- __m128 r = _mm_add_ss(t, _mm_and_ps(_mm_cmplt_ss(t, f), _mm_set_ss(1.0f)));
2034- return _mm_cvtss_f32(r);
2035-#endif
2036-}
2037-
2038-#elif defined(STBIR_NEON)
2039-
2040-#include <arm_neon.h>
2041-
2042-#define stbir__simdf float32x4_t
2043-#define stbir__simdi uint32x4_t
2044-
2045-#define stbir_simdi_castf(reg) vreinterpretq_u32_f32(reg)
2046-#define stbir_simdf_casti(reg) vreinterpretq_f32_u32(reg)
2047-
2048-#define stbir__simdf_load(reg, ptr) (reg) = vld1q_f32((float const *)(ptr))
2049-#define stbir__simdi_load(reg, ptr) (reg) = vld1q_u32((uint32_t const *)(ptr))
2050-#define stbir__simdf_load1(out, ptr) \
2051- (out) = vld1q_dup_f32((float const *)(ptr)) // top values can be random (not
2052- // denormal or nan for perf)
2053-#define stbir__simdi_load1(out, ptr) \
2054- (out) = vld1q_dup_u32((uint32_t const *)(ptr))
2055-#define stbir__simdf_load1z(out, ptr) \
2056- (out) = vld1q_lane_f32((float const *)(ptr), vdupq_n_f32(0), \
2057- 0) // top values must be zero
2058-#define stbir__simdf_frep4(fvar) vdupq_n_f32(fvar)
2059-#define stbir__simdf_load1frep4(out, fvar) (out) = vdupq_n_f32(fvar)
2060-#define stbir__simdf_load2(out, ptr) \
2061- (out) = vcombine_f32( \
2062- vld1_f32((float const *)(ptr)), \
2063- vcreate_f32( \
2064- 0)) // top values can be random (not denormal or nan for perf)
2065-#define stbir__simdf_load2z(out, ptr) \
2066- (out) = vcombine_f32(vld1_f32((float const *)(ptr)), \
2067- vcreate_f32(0)) // top values must be zero
2068-#define stbir__simdf_load2hmerge(out, reg, ptr) \
2069- (out) = vcombine_f32(vget_low_f32(reg), vld1_f32((float const *)(ptr)))
2070-
2071-#define stbir__simdf_zeroP() vdupq_n_f32(0)
2072-#define stbir__simdf_zero(reg) (reg) = vdupq_n_f32(0)
2073-
2074-#define stbir__simdf_store(ptr, reg) vst1q_f32((float *)(ptr), reg)
2075-#define stbir__simdf_store1(ptr, reg) vst1q_lane_f32((float *)(ptr), reg, 0)
2076-#define stbir__simdf_store2(ptr, reg) \
2077- vst1_f32((float *)(ptr), vget_low_f32(reg))
2078-#define stbir__simdf_store2h(ptr, reg) \
2079- vst1_f32((float *)(ptr), vget_high_f32(reg))
2080-
2081-#define stbir__simdi_store(ptr, reg) vst1q_u32((uint32_t *)(ptr), reg)
2082-#define stbir__simdi_store1(ptr, reg) vst1q_lane_u32((uint32_t *)(ptr), reg, 0)
2083-#define stbir__simdi_store2(ptr, reg) \
2084- vst1_u32((uint32_t *)(ptr), vget_low_u32(reg))
2085-
2086-#define stbir__prefetch(ptr)
2087-
2088-#define stbir__simdi_expand_u8_to_u32(out0, out1, out2, out3, ireg) \
2089- { \
2090- uint16x8_t l = vmovl_u8(vget_low_u8(vreinterpretq_u8_u32(ireg))); \
2091- uint16x8_t h = vmovl_u8(vget_high_u8(vreinterpretq_u8_u32(ireg))); \
2092- out0 = vmovl_u16(vget_low_u16(l)); \
2093- out1 = vmovl_u16(vget_high_u16(l)); \
2094- out2 = vmovl_u16(vget_low_u16(h)); \
2095- out3 = vmovl_u16(vget_high_u16(h)); \
2096- }
2097-
2098-#define stbir__simdi_expand_u8_to_1u32(out, ireg) \
2099- { \
2100- uint16x8_t tmp = vmovl_u8(vget_low_u8(vreinterpretq_u8_u32(ireg))); \
2101- out = vmovl_u16(vget_low_u16(tmp)); \
2102- }
2103-
2104-#define stbir__simdi_expand_u16_to_u32(out0, out1, ireg) \
2105- { \
2106- uint16x8_t tmp = vreinterpretq_u16_u32(ireg); \
2107- out0 = vmovl_u16(vget_low_u16(tmp)); \
2108- out1 = vmovl_u16(vget_high_u16(tmp)); \
2109- }
2110-
2111-#define stbir__simdf_convert_float_to_i32(i, f) \
2112- (i) = vreinterpretq_u32_s32(vcvtq_s32_f32(f))
2113-#define stbir__simdf_convert_float_to_int(f) vgetq_lane_s32(vcvtq_s32_f32(f), 0)
2114-#define stbir__simdi_to_int(i) (int)vgetq_lane_u32(i, 0)
2115-#define stbir__simdf_convert_float_to_uint8(f) \
2116- ((unsigned char)vgetq_lane_s32( \
2117- vcvtq_s32_f32( \
2118- vmaxq_f32(vminq_f32(f, STBIR__CONSTF(STBIR_max_uint8_as_float)), \
2119- vdupq_n_f32(0))), \
2120- 0))
2121-#define stbir__simdf_convert_float_to_short(f) \
2122- ((unsigned short)vgetq_lane_s32( \
2123- vcvtq_s32_f32( \
2124- vmaxq_f32(vminq_f32(f, STBIR__CONSTF(STBIR_max_uint16_as_float)), \
2125- vdupq_n_f32(0))), \
2126- 0))
2127-#define stbir__simdi_convert_i32_to_float(out, ireg) \
2128- (out) = vcvtq_f32_s32(vreinterpretq_s32_u32(ireg))
2129-#define stbir__simdf_add(out, reg0, reg1) (out) = vaddq_f32(reg0, reg1)
2130-#define stbir__simdf_mult(out, reg0, reg1) (out) = vmulq_f32(reg0, reg1)
2131-#define stbir__simdf_mult_mem(out, reg, ptr) \
2132- (out) = vmulq_f32(reg, vld1q_f32((float const *)(ptr)))
2133-#define stbir__simdf_mult1_mem(out, reg, ptr) \
2134- (out) = vmulq_f32(reg, vld1q_dup_f32((float const *)(ptr)))
2135-#define stbir__simdf_add_mem(out, reg, ptr) \
2136- (out) = vaddq_f32(reg, vld1q_f32((float const *)(ptr)))
2137-#define stbir__simdf_add1_mem(out, reg, ptr) \
2138- (out) = vaddq_f32(reg, vld1q_dup_f32((float const *)(ptr)))
2139-
2140-#ifdef STBIR_USE_FMA // not on by default to maintain bit identical simd to
2141- // non-simd (and also x64 no madd to arm madd)
2142-#define stbir__simdf_madd(out, add, mul1, mul2) \
2143- (out) = vfmaq_f32(add, mul1, mul2)
2144-#define stbir__simdf_madd1(out, add, mul1, mul2) \
2145- (out) = vfmaq_f32(add, mul1, mul2)
2146-#define stbir__simdf_madd_mem(out, add, mul, ptr) \
2147- (out) = vfmaq_f32(add, mul, vld1q_f32((float const *)(ptr)))
2148-#define stbir__simdf_madd1_mem(out, add, mul, ptr) \
2149- (out) = vfmaq_f32(add, mul, vld1q_dup_f32((float const *)(ptr)))
2150-#else
2151-#define stbir__simdf_madd(out, add, mul1, mul2) \
2152- (out) = vaddq_f32(add, vmulq_f32(mul1, mul2))
2153-#define stbir__simdf_madd1(out, add, mul1, mul2) \
2154- (out) = vaddq_f32(add, vmulq_f32(mul1, mul2))
2155-#define stbir__simdf_madd_mem(out, add, mul, ptr) \
2156- (out) = vaddq_f32(add, vmulq_f32(mul, vld1q_f32((float const *)(ptr))))
2157-#define stbir__simdf_madd1_mem(out, add, mul, ptr) \
2158- (out) = vaddq_f32(add, vmulq_f32(mul, vld1q_dup_f32((float const *)(ptr))))
2159-#endif
2160-
2161-#define stbir__simdf_add1(out, reg0, reg1) (out) = vaddq_f32(reg0, reg1)
2162-#define stbir__simdf_mult1(out, reg0, reg1) (out) = vmulq_f32(reg0, reg1)
2163-
2164-#define stbir__simdf_and(out, reg0, reg1) \
2165- (out) = vreinterpretq_f32_u32( \
2166- vandq_u32(vreinterpretq_u32_f32(reg0), vreinterpretq_u32_f32(reg1)))
2167-#define stbir__simdf_or(out, reg0, reg1) \
2168- (out) = vreinterpretq_f32_u32( \
2169- vorrq_u32(vreinterpretq_u32_f32(reg0), vreinterpretq_u32_f32(reg1)))
2170-
2171-#define stbir__simdf_min(out, reg0, reg1) (out) = vminq_f32(reg0, reg1)
2172-#define stbir__simdf_max(out, reg0, reg1) (out) = vmaxq_f32(reg0, reg1)
2173-#define stbir__simdf_min1(out, reg0, reg1) (out) = vminq_f32(reg0, reg1)
2174-#define stbir__simdf_max1(out, reg0, reg1) (out) = vmaxq_f32(reg0, reg1)
2175-
2176-#define stbir__simdf_0123ABCDto3ABx(out, reg0, reg1) \
2177- (out) = vextq_f32(reg0, reg1, 3)
2178-#define stbir__simdf_0123ABCDto23Ax(out, reg0, reg1) \
2179- (out) = vextq_f32(reg0, reg1, 2)
2180-
2181-#define stbir__simdf_a1a1(out, alp, ones) \
2182- (out) = vzipq_f32(vuzpq_f32(alp, alp).val[1], ones).val[0]
2183-#define stbir__simdf_1a1a(out, alp, ones) \
2184- (out) = vzipq_f32(ones, vuzpq_f32(alp, alp).val[0]).val[0]
2185-
2186-#if defined(_M_ARM64) || defined(__aarch64__) || defined(__arm64__)
2187-
2188-#define stbir__simdf_aaa1(out, alp, ones) \
2189- (out) = vcopyq_laneq_f32(vdupq_n_f32(vgetq_lane_f32(alp, 3)), 3, ones, 3)
2190-#define stbir__simdf_1aaa(out, alp, ones) \
2191- (out) = vcopyq_laneq_f32(vdupq_n_f32(vgetq_lane_f32(alp, 0)), 0, ones, 0)
2192-
2193-#if defined(_MSC_VER) && !defined(__clang__)
2194-#define stbir_make16(a, b, c, d) \
2195- vcombine_u8( \
2196- vcreate_u8((4 * a + 0) | ((4 * a + 1) << 8) | ((4 * a + 2) << 16) | \
2197- ((4 * a + 3) << 24) | ((stbir_uint64)(4 * b + 0) << 32) | \
2198- ((stbir_uint64)(4 * b + 1) << 40) | \
2199- ((stbir_uint64)(4 * b + 2) << 48) | \
2200- ((stbir_uint64)(4 * b + 3) << 56)), \
2201- vcreate_u8((4 * c + 0) | ((4 * c + 1) << 8) | ((4 * c + 2) << 16) | \
2202- ((4 * c + 3) << 24) | ((stbir_uint64)(4 * d + 0) << 32) | \
2203- ((stbir_uint64)(4 * d + 1) << 40) | \
2204- ((stbir_uint64)(4 * d + 2) << 48) | \
2205- ((stbir_uint64)(4 * d + 3) << 56)))
2206-
2207-static stbir__inline uint8x16x2_t
2208-stbir_make16x2(float32x4_t rega, float32x4_t regb)
2209-{
2210- uint8x16x2_t r = {vreinterpretq_u8_f32(rega), vreinterpretq_u8_f32(regb)};
2211- return r;
2212-}
2213-#else
2214-#define stbir_make16(a, b, c, d) \
2215- (uint8x16_t){4 * a + 0, 4 * a + 1, 4 * a + 2, 4 * a + 3, \
2216- 4 * b + 0, 4 * b + 1, 4 * b + 2, 4 * b + 3, \
2217- 4 * c + 0, 4 * c + 1, 4 * c + 2, 4 * c + 3, \
2218- 4 * d + 0, 4 * d + 1, 4 * d + 2, 4 * d + 3}
2219-#define stbir_make16x2(a, b) \
2220- (uint8x16x2_t) \
2221- { \
2222- { \
2223- vreinterpretq_u8_f32(a), vreinterpretq_u8_f32(b) \
2224- } \
2225- }
2226-#endif
2227-
2228-#define stbir__simdf_swiz(reg, one, two, three, four) \
2229- vreinterpretq_f32_u8(vqtbl1q_u8(vreinterpretq_u8_f32(reg), \
2230- stbir_make16(one, two, three, four)))
2231-#define stbir__simdf_swiz2(rega, regb, one, two, three, four) \
2232- vreinterpretq_f32_u8(vqtbl2q_u8(stbir_make16x2(rega, regb), \
2233- stbir_make16(one, two, three, four)))
2234-
2235-#define stbir__simdi_16madd(out, reg0, reg1) \
2236- { \
2237- int16x8_t r0 = vreinterpretq_s16_u32(reg0); \
2238- int16x8_t r1 = vreinterpretq_s16_u32(reg1); \
2239- int32x4_t tmp0 = vmull_s16(vget_low_s16(r0), vget_low_s16(r1)); \
2240- int32x4_t tmp1 = vmull_s16(vget_high_s16(r0), vget_high_s16(r1)); \
2241- (out) = vreinterpretq_u32_s32(vpaddq_s32(tmp0, tmp1)); \
2242- }
2243-
2244-#else
2245-
2246-#define stbir__simdf_aaa1(out, alp, ones) \
2247- (out) = vsetq_lane_f32(1.0f, vdupq_n_f32(vgetq_lane_f32(alp, 3)), 3)
2248-#define stbir__simdf_1aaa(out, alp, ones) \
2249- (out) = vsetq_lane_f32(1.0f, vdupq_n_f32(vgetq_lane_f32(alp, 0)), 0)
2250-
2251-#if defined(_MSC_VER) && !defined(__clang__)
2252-static stbir__inline uint8x8x2_t
2253-stbir_make8x2(float32x4_t reg)
2254-{
2255- uint8x8x2_t r = {{vget_low_u8(vreinterpretq_u8_f32(reg)),
2256- vget_high_u8(vreinterpretq_u8_f32(reg))}};
2257- return r;
2258-}
2259-#define stbir_make8(a, b) \
2260- vcreate_u8((4 * a + 0) | ((4 * a + 1) << 8) | ((4 * a + 2) << 16) | \
2261- ((4 * a + 3) << 24) | ((stbir_uint64)(4 * b + 0) << 32) | \
2262- ((stbir_uint64)(4 * b + 1) << 40) | \
2263- ((stbir_uint64)(4 * b + 2) << 48) | \
2264- ((stbir_uint64)(4 * b + 3) << 56))
2265-#else
2266-#define stbir_make8x2(reg) \
2267- (uint8x8x2_t) \
2268- { \
2269- { \
2270- vget_low_u8(vreinterpretq_u8_f32(reg)), \
2271- vget_high_u8(vreinterpretq_u8_f32(reg)) \
2272- } \
2273- }
2274-#define stbir_make8(a, b) \
2275- (uint8x8_t){4 * a + 0, 4 * a + 1, 4 * a + 2, 4 * a + 3, \
2276- 4 * b + 0, 4 * b + 1, 4 * b + 2, 4 * b + 3}
2277-#endif
2278-
2279-#define stbir__simdf_swiz(reg, one, two, three, four) \
2280- vreinterpretq_f32_u8( \
2281- vcombine_u8(vtbl2_u8(stbir_make8x2(reg), stbir_make8(one, two)), \
2282- vtbl2_u8(stbir_make8x2(reg), stbir_make8(three, four))))
2283-
2284-#define stbir__simdi_16madd(out, reg0, reg1) \
2285- { \
2286- int16x8_t r0 = vreinterpretq_s16_u32(reg0); \
2287- int16x8_t r1 = vreinterpretq_s16_u32(reg1); \
2288- int32x4_t tmp0 = vmull_s16(vget_low_s16(r0), vget_low_s16(r1)); \
2289- int32x4_t tmp1 = vmull_s16(vget_high_s16(r0), vget_high_s16(r1)); \
2290- int32x2_t out0 = vpadd_s32(vget_low_s32(tmp0), vget_high_s32(tmp0)); \
2291- int32x2_t out1 = vpadd_s32(vget_low_s32(tmp1), vget_high_s32(tmp1)); \
2292- (out) = vreinterpretq_u32_s32(vcombine_s32(out0, out1)); \
2293- }
2294-
2295-#endif
2296-
2297-#define stbir__simdi_and(out, reg0, reg1) (out) = vandq_u32(reg0, reg1)
2298-#define stbir__simdi_or(out, reg0, reg1) (out) = vorrq_u32(reg0, reg1)
2299-
2300-#define stbir__simdf_pack_to_8bytes(out, aa, bb) \
2301- { \
2302- float32x4_t af = \
2303- vmaxq_f32(vminq_f32(aa, STBIR__CONSTF(STBIR_max_uint8_as_float)), \
2304- vdupq_n_f32(0)); \
2305- float32x4_t bf = \
2306- vmaxq_f32(vminq_f32(bb, STBIR__CONSTF(STBIR_max_uint8_as_float)), \
2307- vdupq_n_f32(0)); \
2308- int16x4_t ai = vqmovn_s32(vcvtq_s32_f32(af)); \
2309- int16x4_t bi = vqmovn_s32(vcvtq_s32_f32(bf)); \
2310- uint8x8_t out8 = vqmovun_s16(vcombine_s16(ai, bi)); \
2311- out = vreinterpretq_u32_u8(vcombine_u8(out8, out8)); \
2312- }
2313-
2314-#define stbir__simdf_pack_to_8words(out, aa, bb) \
2315- { \
2316- float32x4_t af = \
2317- vmaxq_f32(vminq_f32(aa, STBIR__CONSTF(STBIR_max_uint16_as_float)), \
2318- vdupq_n_f32(0)); \
2319- float32x4_t bf = \
2320- vmaxq_f32(vminq_f32(bb, STBIR__CONSTF(STBIR_max_uint16_as_float)), \
2321- vdupq_n_f32(0)); \
2322- int32x4_t ai = vcvtq_s32_f32(af); \
2323- int32x4_t bi = vcvtq_s32_f32(bf); \
2324- out = vreinterpretq_u32_u16( \
2325- vcombine_u16(vqmovun_s32(ai), vqmovun_s32(bi))); \
2326- }
2327-
2328-#define stbir__interleave_pack_and_store_16_u8(ptr, r0, r1, r2, r3) \
2329- { \
2330- int16x4x2_t tmp0 = vzip_s16(vqmovn_s32(vreinterpretq_s32_u32(r0)), \
2331- vqmovn_s32(vreinterpretq_s32_u32(r2))); \
2332- int16x4x2_t tmp1 = vzip_s16(vqmovn_s32(vreinterpretq_s32_u32(r1)), \
2333- vqmovn_s32(vreinterpretq_s32_u32(r3))); \
2334- uint8x8x2_t out = {{ \
2335- vqmovun_s16(vcombine_s16(tmp0.val[0], tmp0.val[1])), \
2336- vqmovun_s16(vcombine_s16(tmp1.val[0], tmp1.val[1])), \
2337- }}; \
2338- vst2_u8(ptr, out); \
2339- }
2340-
2341-#define stbir__simdf_load4_transposed(o0, o1, o2, o3, ptr) \
2342- { \
2343- float32x4x4_t tmp = vld4q_f32(ptr); \
2344- o0 = tmp.val[0]; \
2345- o1 = tmp.val[1]; \
2346- o2 = tmp.val[2]; \
2347- o3 = tmp.val[3]; \
2348- }
2349-
2350-#define stbir__simdi_32shr(out, reg, imm) out = vshrq_n_u32(reg, imm)
2351-
2352-#if defined(_MSC_VER) && !defined(__clang__)
2353-#define STBIR__SIMDF_CONST(var, x) \
2354- __declspec(align(8)) float var[] = {x, x, x, x}
2355-#define STBIR__SIMDI_CONST(var, x) \
2356- __declspec(align(8)) uint32_t var[] = {x, x, x, x}
2357-#define STBIR__CONSTF(var) (*(const float32x4_t *)var)
2358-#define STBIR__CONSTI(var) (*(const uint32x4_t *)var)
2359-#else
2360-#define STBIR__SIMDF_CONST(var, x) stbir__simdf var = {x, x, x, x}
2361-#define STBIR__SIMDI_CONST(var, x) stbir__simdi var = {x, x, x, x}
2362-#define STBIR__CONSTF(var) (var)
2363-#define STBIR__CONSTI(var) (var)
2364-#endif
2365-
2366-#ifdef STBIR_FLOORF
2367-#undef STBIR_FLOORF
2368-#endif
2369-#define STBIR_FLOORF stbir_simd_floorf
2370-static stbir__inline float
2371-stbir_simd_floorf(float x)
2372-{
2373-#if defined(_M_ARM64) || defined(__aarch64__) || defined(__arm64__)
2374- return vget_lane_f32(vrndm_f32(vdup_n_f32(x)), 0);
2375-#else
2376- float32x2_t f = vdup_n_f32(x);
2377- float32x2_t t = vcvt_f32_s32(vcvt_s32_f32(f));
2378- uint32x2_t a = vclt_f32(f, t);
2379- uint32x2_t b = vreinterpret_u32_f32(vdup_n_f32(-1.0f));
2380- float32x2_t r = vadd_f32(t, vreinterpret_f32_u32(vand_u32(a, b)));
2381- return vget_lane_f32(r, 0);
2382-#endif
2383-}
2384-
2385-#ifdef STBIR_CEILF
2386-#undef STBIR_CEILF
2387-#endif
2388-#define STBIR_CEILF stbir_simd_ceilf
2389-static stbir__inline float
2390-stbir_simd_ceilf(float x)
2391-{
2392-#if defined(_M_ARM64) || defined(__aarch64__) || defined(__arm64__)
2393- return vget_lane_f32(vrndp_f32(vdup_n_f32(x)), 0);
2394-#else
2395- float32x2_t f = vdup_n_f32(x);
2396- float32x2_t t = vcvt_f32_s32(vcvt_s32_f32(f));
2397- uint32x2_t a = vclt_f32(t, f);
2398- uint32x2_t b = vreinterpret_u32_f32(vdup_n_f32(1.0f));
2399- float32x2_t r = vadd_f32(t, vreinterpret_f32_u32(vand_u32(a, b)));
2400- return vget_lane_f32(r, 0);
2401-#endif
2402-}
2403-
2404-#define STBIR_SIMD
2405-
2406-#elif defined(STBIR_WASM)
2407-
2408-#include <wasm_simd128.h>
2409-
2410-#define stbir__simdf v128_t
2411-#define stbir__simdi v128_t
2412-
2413-#define stbir_simdi_castf(reg) (reg)
2414-#define stbir_simdf_casti(reg) (reg)
2415-
2416-#define stbir__simdf_load(reg, ptr) (reg) = wasm_v128_load((void const *)(ptr))
2417-#define stbir__simdi_load(reg, ptr) (reg) = wasm_v128_load((void const *)(ptr))
2418-#define stbir__simdf_load1(out, ptr) \
2419- (out) = wasm_v128_load32_splat( \
2420- (void const *)(ptr)) // top values can be random (not denormal or nan
2421- // for perf)
2422-#define stbir__simdi_load1(out, ptr) \
2423- (out) = wasm_v128_load32_splat((void const *)(ptr))
2424-#define stbir__simdf_load1z(out, ptr) \
2425- (out) = \
2426- wasm_v128_load32_zero((void const *)(ptr)) // top values must be zero
2427-#define stbir__simdf_frep4(fvar) wasm_f32x4_splat(fvar)
2428-#define stbir__simdf_load1frep4(out, fvar) (out) = wasm_f32x4_splat(fvar)
2429-#define stbir__simdf_load2(out, ptr) \
2430- (out) = wasm_v128_load64_splat( \
2431- (void const *)(ptr)) // top values can be random (not denormal or nan
2432- // for perf)
2433-#define stbir__simdf_load2z(out, ptr) \
2434- (out) = \
2435- wasm_v128_load64_zero((void const *)(ptr)) // top values must be zero
2436-#define stbir__simdf_load2hmerge(out, reg, ptr) \
2437- (out) = wasm_v128_load64_lane((void const *)(ptr), reg, 1)
2438-
2439-#define stbir__simdf_zeroP() wasm_f32x4_const_splat(0)
2440-#define stbir__simdf_zero(reg) (reg) = wasm_f32x4_const_splat(0)
2441-
2442-#define stbir__simdf_store(ptr, reg) wasm_v128_store((void *)(ptr), reg)
2443-#define stbir__simdf_store1(ptr, reg) \
2444- wasm_v128_store32_lane((void *)(ptr), reg, 0)
2445-#define stbir__simdf_store2(ptr, reg) \
2446- wasm_v128_store64_lane((void *)(ptr), reg, 0)
2447-#define stbir__simdf_store2h(ptr, reg) \
2448- wasm_v128_store64_lane((void *)(ptr), reg, 1)
2449-
2450-#define stbir__simdi_store(ptr, reg) wasm_v128_store((void *)(ptr), reg)
2451-#define stbir__simdi_store1(ptr, reg) \
2452- wasm_v128_store32_lane((void *)(ptr), reg, 0)
2453-#define stbir__simdi_store2(ptr, reg) \
2454- wasm_v128_store64_lane((void *)(ptr), reg, 0)
2455-
2456-#define stbir__prefetch(ptr)
2457-
2458-#define stbir__simdi_expand_u8_to_u32(out0, out1, out2, out3, ireg) \
2459- { \
2460- v128_t l = wasm_u16x8_extend_low_u8x16(ireg); \
2461- v128_t h = wasm_u16x8_extend_high_u8x16(ireg); \
2462- out0 = wasm_u32x4_extend_low_u16x8(l); \
2463- out1 = wasm_u32x4_extend_high_u16x8(l); \
2464- out2 = wasm_u32x4_extend_low_u16x8(h); \
2465- out3 = wasm_u32x4_extend_high_u16x8(h); \
2466- }
2467-
2468-#define stbir__simdi_expand_u8_to_1u32(out, ireg) \
2469- { \
2470- v128_t tmp = wasm_u16x8_extend_low_u8x16(ireg); \
2471- out = wasm_u32x4_extend_low_u16x8(tmp); \
2472- }
2473-
2474-#define stbir__simdi_expand_u16_to_u32(out0, out1, ireg) \
2475- { \
2476- out0 = wasm_u32x4_extend_low_u16x8(ireg); \
2477- out1 = wasm_u32x4_extend_high_u16x8(ireg); \
2478- }
2479-
2480-#define stbir__simdf_convert_float_to_i32(i, f) \
2481- (i) = wasm_i32x4_trunc_sat_f32x4(f)
2482-#define stbir__simdf_convert_float_to_int(f) \
2483- wasm_i32x4_extract_lane(wasm_i32x4_trunc_sat_f32x4(f), 0)
2484-#define stbir__simdi_to_int(i) wasm_i32x4_extract_lane(i, 0)
2485-#define stbir__simdf_convert_float_to_uint8(f) \
2486- ((unsigned char)wasm_i32x4_extract_lane( \
2487- wasm_i32x4_trunc_sat_f32x4( \
2488- wasm_f32x4_max(wasm_f32x4_min(f, STBIR_max_uint8_as_float), \
2489- wasm_f32x4_const_splat(0))), \
2490- 0))
2491-#define stbir__simdf_convert_float_to_short(f) \
2492- ((unsigned short)wasm_i32x4_extract_lane( \
2493- wasm_i32x4_trunc_sat_f32x4( \
2494- wasm_f32x4_max(wasm_f32x4_min(f, STBIR_max_uint16_as_float), \
2495- wasm_f32x4_const_splat(0))), \
2496- 0))
2497-#define stbir__simdi_convert_i32_to_float(out, ireg) \
2498- (out) = wasm_f32x4_convert_i32x4(ireg)
2499-#define stbir__simdf_add(out, reg0, reg1) (out) = wasm_f32x4_add(reg0, reg1)
2500-#define stbir__simdf_mult(out, reg0, reg1) (out) = wasm_f32x4_mul(reg0, reg1)
2501-#define stbir__simdf_mult_mem(out, reg, ptr) \
2502- (out) = wasm_f32x4_mul(reg, wasm_v128_load((void const *)(ptr)))
2503-#define stbir__simdf_mult1_mem(out, reg, ptr) \
2504- (out) = wasm_f32x4_mul(reg, wasm_v128_load32_splat((void const *)(ptr)))
2505-#define stbir__simdf_add_mem(out, reg, ptr) \
2506- (out) = wasm_f32x4_add(reg, wasm_v128_load((void const *)(ptr)))
2507-#define stbir__simdf_add1_mem(out, reg, ptr) \
2508- (out) = wasm_f32x4_add(reg, wasm_v128_load32_splat((void const *)(ptr)))
2509-
2510-#define stbir__simdf_madd(out, add, mul1, mul2) \
2511- (out) = wasm_f32x4_add(add, wasm_f32x4_mul(mul1, mul2))
2512-#define stbir__simdf_madd1(out, add, mul1, mul2) \
2513- (out) = wasm_f32x4_add(add, wasm_f32x4_mul(mul1, mul2))
2514-#define stbir__simdf_madd_mem(out, add, mul, ptr) \
2515- (out) = wasm_f32x4_add( \
2516- add, wasm_f32x4_mul(mul, wasm_v128_load((void const *)(ptr))))
2517-#define stbir__simdf_madd1_mem(out, add, mul, ptr) \
2518- (out) = wasm_f32x4_add( \
2519- add, wasm_f32x4_mul(mul, wasm_v128_load32_splat((void const *)(ptr))))
2520-
2521-#define stbir__simdf_add1(out, reg0, reg1) (out) = wasm_f32x4_add(reg0, reg1)
2522-#define stbir__simdf_mult1(out, reg0, reg1) (out) = wasm_f32x4_mul(reg0, reg1)
2523-
2524-#define stbir__simdf_and(out, reg0, reg1) (out) = wasm_v128_and(reg0, reg1)
2525-#define stbir__simdf_or(out, reg0, reg1) (out) = wasm_v128_or(reg0, reg1)
2526-
2527-#define stbir__simdf_min(out, reg0, reg1) (out) = wasm_f32x4_min(reg0, reg1)
2528-#define stbir__simdf_max(out, reg0, reg1) (out) = wasm_f32x4_max(reg0, reg1)
2529-#define stbir__simdf_min1(out, reg0, reg1) (out) = wasm_f32x4_min(reg0, reg1)
2530-#define stbir__simdf_max1(out, reg0, reg1) (out) = wasm_f32x4_max(reg0, reg1)
2531-
2532-#define stbir__simdf_0123ABCDto3ABx(out, reg0, reg1) \
2533- (out) = wasm_i32x4_shuffle(reg0, reg1, 3, 4, 5, -1)
2534-#define stbir__simdf_0123ABCDto23Ax(out, reg0, reg1) \
2535- (out) = wasm_i32x4_shuffle(reg0, reg1, 2, 3, 4, -1)
2536-
2537-#define stbir__simdf_aaa1(out, alp, ones) \
2538- (out) = wasm_i32x4_shuffle(alp, ones, 3, 3, 3, 4)
2539-#define stbir__simdf_1aaa(out, alp, ones) \
2540- (out) = wasm_i32x4_shuffle(alp, ones, 4, 0, 0, 0)
2541-#define stbir__simdf_a1a1(out, alp, ones) \
2542- (out) = wasm_i32x4_shuffle(alp, ones, 1, 4, 3, 4)
2543-#define stbir__simdf_1a1a(out, alp, ones) \
2544- (out) = wasm_i32x4_shuffle(alp, ones, 4, 0, 4, 2)
2545-
2546-#define stbir__simdf_swiz(reg, one, two, three, four) \
2547- wasm_i32x4_shuffle(reg, reg, one, two, three, four)
2548-
2549-#define stbir__simdi_and(out, reg0, reg1) (out) = wasm_v128_and(reg0, reg1)
2550-#define stbir__simdi_or(out, reg0, reg1) (out) = wasm_v128_or(reg0, reg1)
2551-#define stbir__simdi_16madd(out, reg0, reg1) \
2552- (out) = wasm_i32x4_dot_i16x8(reg0, reg1)
2553-
2554-#define stbir__simdf_pack_to_8bytes(out, aa, bb) \
2555- { \
2556- v128_t af = \
2557- wasm_f32x4_max(wasm_f32x4_min(aa, STBIR_max_uint8_as_float), \
2558- wasm_f32x4_const_splat(0)); \
2559- v128_t bf = \
2560- wasm_f32x4_max(wasm_f32x4_min(bb, STBIR_max_uint8_as_float), \
2561- wasm_f32x4_const_splat(0)); \
2562- v128_t ai = wasm_i32x4_trunc_sat_f32x4(af); \
2563- v128_t bi = wasm_i32x4_trunc_sat_f32x4(bf); \
2564- v128_t out16 = wasm_i16x8_narrow_i32x4(ai, bi); \
2565- out = wasm_u8x16_narrow_i16x8(out16, out16); \
2566- }
2567-
2568-#define stbir__simdf_pack_to_8words(out, aa, bb) \
2569- { \
2570- v128_t af = \
2571- wasm_f32x4_max(wasm_f32x4_min(aa, STBIR_max_uint16_as_float), \
2572- wasm_f32x4_const_splat(0)); \
2573- v128_t bf = \
2574- wasm_f32x4_max(wasm_f32x4_min(bb, STBIR_max_uint16_as_float), \
2575- wasm_f32x4_const_splat(0)); \
2576- v128_t ai = wasm_i32x4_trunc_sat_f32x4(af); \
2577- v128_t bi = wasm_i32x4_trunc_sat_f32x4(bf); \
2578- out = wasm_u16x8_narrow_i32x4(ai, bi); \
2579- }
2580-
2581-#define stbir__interleave_pack_and_store_16_u8(ptr, r0, r1, r2, r3) \
2582- { \
2583- v128_t tmp0 = wasm_i16x8_narrow_i32x4(r0, r1); \
2584- v128_t tmp1 = wasm_i16x8_narrow_i32x4(r2, r3); \
2585- v128_t tmp = wasm_u8x16_narrow_i16x8(tmp0, tmp1); \
2586- tmp = wasm_i8x16_shuffle(tmp, tmp, 0, 4, 8, 12, 1, 5, 9, 13, 2, 6, 10, \
2587- 14, 3, 7, 11, 15); \
2588- wasm_v128_store((void *)(ptr), tmp); \
2589- }
2590-
2591-#define stbir__simdf_load4_transposed(o0, o1, o2, o3, ptr) \
2592- { \
2593- v128_t t0 = wasm_v128_load(ptr); \
2594- v128_t t1 = wasm_v128_load(ptr + 4); \
2595- v128_t t2 = wasm_v128_load(ptr + 8); \
2596- v128_t t3 = wasm_v128_load(ptr + 12); \
2597- v128_t s0 = wasm_i32x4_shuffle(t0, t1, 0, 4, 2, 6); \
2598- v128_t s1 = wasm_i32x4_shuffle(t0, t1, 1, 5, 3, 7); \
2599- v128_t s2 = wasm_i32x4_shuffle(t2, t3, 0, 4, 2, 6); \
2600- v128_t s3 = wasm_i32x4_shuffle(t2, t3, 1, 5, 3, 7); \
2601- o0 = wasm_i32x4_shuffle(s0, s2, 0, 1, 4, 5); \
2602- o1 = wasm_i32x4_shuffle(s1, s3, 0, 1, 4, 5); \
2603- o2 = wasm_i32x4_shuffle(s0, s2, 2, 3, 6, 7); \
2604- o3 = wasm_i32x4_shuffle(s1, s3, 2, 3, 6, 7); \
2605- }
2606-
2607-#define stbir__simdi_32shr(out, reg, imm) out = wasm_u32x4_shr(reg, imm)
2608-
2609-typedef float stbir__f32x4
2610- __attribute__((__vector_size__(16), __aligned__(16)));
2611-#define STBIR__SIMDF_CONST(var, x) \
2612- stbir__simdf var = (v128_t)(stbir__f32x4) { x, x, x, x }
2613-#define STBIR__SIMDI_CONST(var, x) stbir__simdi var = {x, x, x, x}
2614-#define STBIR__CONSTF(var) (var)
2615-#define STBIR__CONSTI(var) (var)
2616-
2617-#ifdef STBIR_FLOORF
2618-#undef STBIR_FLOORF
2619-#endif
2620-#define STBIR_FLOORF stbir_simd_floorf
2621-static stbir__inline float
2622-stbir_simd_floorf(float x)
2623-{
2624- return wasm_f32x4_extract_lane(wasm_f32x4_floor(wasm_f32x4_splat(x)), 0);
2625-}
2626-
2627-#ifdef STBIR_CEILF
2628-#undef STBIR_CEILF
2629-#endif
2630-#define STBIR_CEILF stbir_simd_ceilf
2631-static stbir__inline float
2632-stbir_simd_ceilf(float x)
2633-{
2634- return wasm_f32x4_extract_lane(wasm_f32x4_ceil(wasm_f32x4_splat(x)), 0);
2635-}
2636-
2637-#define STBIR_SIMD
2638-
2639-#endif // SSE2/NEON/WASM
2640-
2641-#endif // NO SIMD
2642-
2643-#ifdef STBIR_SIMD8
2644-#define stbir__simdfX stbir__simdf8
2645-#define stbir__simdiX stbir__simdi8
2646-#define stbir__simdfX_load stbir__simdf8_load
2647-#define stbir__simdiX_load stbir__simdi8_load
2648-#define stbir__simdfX_mult stbir__simdf8_mult
2649-#define stbir__simdfX_add_mem stbir__simdf8_add_mem
2650-#define stbir__simdfX_madd_mem stbir__simdf8_madd_mem
2651-#define stbir__simdfX_store stbir__simdf8_store
2652-#define stbir__simdiX_store stbir__simdi8_store
2653-#define stbir__simdf_frepX stbir__simdf8_frep8
2654-#define stbir__simdfX_madd stbir__simdf8_madd
2655-#define stbir__simdfX_min stbir__simdf8_min
2656-#define stbir__simdfX_max stbir__simdf8_max
2657-#define stbir__simdfX_aaa1 stbir__simdf8_aaa1
2658-#define stbir__simdfX_1aaa stbir__simdf8_1aaa
2659-#define stbir__simdfX_a1a1 stbir__simdf8_a1a1
2660-#define stbir__simdfX_1a1a stbir__simdf8_1a1a
2661-#define stbir__simdfX_convert_float_to_i32 stbir__simdf8_convert_float_to_i32
2662-#define stbir__simdfX_pack_to_words stbir__simdf8_pack_to_16words
2663-#define stbir__simdfX_zero stbir__simdf8_zero
2664-#define STBIR_onesX STBIR_ones8
2665-#define STBIR_max_uint8_as_floatX STBIR_max_uint8_as_float8
2666-#define STBIR_max_uint16_as_floatX STBIR_max_uint16_as_float8
2667-#define STBIR_simd_point5X STBIR_simd_point58
2668-#define stbir__simdfX_float_count 8
2669-#define stbir__simdfX_0123to1230 stbir__simdf8_0123to12301230
2670-#define stbir__simdfX_0123to2103 stbir__simdf8_0123to21032103
2671-static const stbir__simdf8 STBIR_max_uint16_as_float_inverted8 = {
2672- stbir__max_uint16_as_float_inverted, stbir__max_uint16_as_float_inverted,
2673- stbir__max_uint16_as_float_inverted, stbir__max_uint16_as_float_inverted,
2674- stbir__max_uint16_as_float_inverted, stbir__max_uint16_as_float_inverted,
2675- stbir__max_uint16_as_float_inverted, stbir__max_uint16_as_float_inverted};
2676-static const stbir__simdf8 STBIR_max_uint8_as_float_inverted8 = {
2677- stbir__max_uint8_as_float_inverted, stbir__max_uint8_as_float_inverted,
2678- stbir__max_uint8_as_float_inverted, stbir__max_uint8_as_float_inverted,
2679- stbir__max_uint8_as_float_inverted, stbir__max_uint8_as_float_inverted,
2680- stbir__max_uint8_as_float_inverted, stbir__max_uint8_as_float_inverted};
2681-static const stbir__simdf8 STBIR_ones8 = {1.0, 1.0, 1.0, 1.0,
2682- 1.0, 1.0, 1.0, 1.0};
2683-static const stbir__simdf8 STBIR_simd_point58 = {0.5, 0.5, 0.5, 0.5,
2684- 0.5, 0.5, 0.5, 0.5};
2685-static const stbir__simdf8 STBIR_max_uint8_as_float8 = {
2686- stbir__max_uint8_as_float, stbir__max_uint8_as_float,
2687- stbir__max_uint8_as_float, stbir__max_uint8_as_float,
2688- stbir__max_uint8_as_float, stbir__max_uint8_as_float,
2689- stbir__max_uint8_as_float, stbir__max_uint8_as_float};
2690-static const stbir__simdf8 STBIR_max_uint16_as_float8 = {
2691- stbir__max_uint16_as_float, stbir__max_uint16_as_float,
2692- stbir__max_uint16_as_float, stbir__max_uint16_as_float,
2693- stbir__max_uint16_as_float, stbir__max_uint16_as_float,
2694- stbir__max_uint16_as_float, stbir__max_uint16_as_float};
2695-#else
2696-#define stbir__simdfX stbir__simdf
2697-#define stbir__simdiX stbir__simdi
2698-#define stbir__simdfX_load stbir__simdf_load
2699-#define stbir__simdiX_load stbir__simdi_load
2700-#define stbir__simdfX_mult stbir__simdf_mult
2701-#define stbir__simdfX_add_mem stbir__simdf_add_mem
2702-#define stbir__simdfX_madd_mem stbir__simdf_madd_mem
2703-#define stbir__simdfX_store stbir__simdf_store
2704-#define stbir__simdiX_store stbir__simdi_store
2705-#define stbir__simdf_frepX stbir__simdf_frep4
2706-#define stbir__simdfX_madd stbir__simdf_madd
2707-#define stbir__simdfX_min stbir__simdf_min
2708-#define stbir__simdfX_max stbir__simdf_max
2709-#define stbir__simdfX_aaa1 stbir__simdf_aaa1
2710-#define stbir__simdfX_1aaa stbir__simdf_1aaa
2711-#define stbir__simdfX_a1a1 stbir__simdf_a1a1
2712-#define stbir__simdfX_1a1a stbir__simdf_1a1a
2713-#define stbir__simdfX_convert_float_to_i32 stbir__simdf_convert_float_to_i32
2714-#define stbir__simdfX_pack_to_words stbir__simdf_pack_to_8words
2715-#define stbir__simdfX_zero stbir__simdf_zero
2716-#define STBIR_onesX STBIR__CONSTF(STBIR_ones)
2717-#define STBIR_simd_point5X STBIR__CONSTF(STBIR_simd_point5)
2718-#define STBIR_max_uint8_as_floatX STBIR__CONSTF(STBIR_max_uint8_as_float)
2719-#define STBIR_max_uint16_as_floatX STBIR__CONSTF(STBIR_max_uint16_as_float)
2720-#define stbir__simdfX_float_count 4
2721-#define stbir__if_simdf8_cast_to_simdf4(val) (val)
2722-#define stbir__simdfX_0123to1230 stbir__simdf_0123to1230
2723-#define stbir__simdfX_0123to2103 stbir__simdf_0123to2103
2724-#endif
2725-
2726-#if defined(STBIR_NEON) && !defined(_M_ARM) && !defined(__arm__)
2727-
2728-#if defined(_MSC_VER) && !defined(__clang__)
2729-typedef __int16 stbir__FP16;
2730-#else
2731-typedef float16_t stbir__FP16;
2732-#endif
2733-
2734-#else // no NEON, or 32-bit ARM (_M_ARM or __arm__)
2735-
2736-typedef union stbir__FP16 {
2737- unsigned short u;
2738-} stbir__FP16;
2739-
2740-#endif
2741-
2742-#if (!defined(STBIR_NEON) && !defined(STBIR_FP16C)) || \
2743- (defined(STBIR_NEON) && defined(_M_ARM)) || \
2744- (defined(STBIR_NEON) && defined(__arm__))
2745-
2746-// Fabian's half float routines, see: https://gist.github.com/rygorous/2156668
2747-
2748-static stbir__inline float
2749-stbir__half_to_float(stbir__FP16 h)
2750-{
2751- static const stbir__FP32 magic = {(254 - 15) << 23};
2752- static const stbir__FP32 was_infnan = {(127 + 16) << 23};
2753- stbir__FP32 o;
2754-
2755- o.u = (h.u & 0x7fff) << 13; // exponent/mantissa bits
2756- o.f *= magic.f; // exponent adjust
2757- if (o.f >= was_infnan.f) { // make sure Inf/NaN survive
2758- o.u |= 255 << 23;
2759- }
2760- o.u |= (h.u & 0x8000) << 16; // sign bit
2761- return o.f;
2762-}
2763-
2764-static stbir__inline stbir__FP16
2765-stbir__float_to_half(float val)
2766-{
2767- stbir__FP32 f32infty = {255 << 23};
2768- stbir__FP32 f16max = {(127 + 16) << 23};
2769- stbir__FP32 denorm_magic = {((127 - 15) + (23 - 10) + 1) << 23};
2770- unsigned int sign_mask = 0x80000000u;
2771- stbir__FP16 o = {0};
2772- stbir__FP32 f;
2773- unsigned int sign;
2774-
2775- f.f = val;
2776- sign = f.u & sign_mask;
2777- f.u ^= sign;
2778-
2779- if (f.u >= f16max.u) { // result is Inf or NaN (all exponent bits set)
2780- o.u = (f.u > f32infty.u) ? 0x7e00 : 0x7c00; // NaN->qNaN and Inf->Inf
2781- } else // (De)normalized number or zero
2782- {
2783- if (f.u < (113 << 23)) // resulting FP16 is subnormal or zero
2784- {
2785- // use a magic value to align our 10 mantissa bits at the bottom of
2786- // the float. as long as FP addition is round-to-nearest-even this
2787- // just works.
2788- f.f += denorm_magic.f;
2789- // and one integer subtract of the bias later, we have our final
2790- // float!
2791- o.u = (unsigned short)(f.u - denorm_magic.u);
2792- } else {
2793- unsigned int mant_odd =
2794- (f.u >> 13) & 1; // resulting mantissa is odd
2795- // update exponent, rounding bias part 1
2796- f.u = f.u + ((15u - 127) << 23) + 0xfff;
2797- // rounding bias part 2
2798- f.u += mant_odd;
2799- // take the bits!
2800- o.u = (unsigned short)(f.u >> 13);
2801- }
2802- }
2803-
2804- o.u |= sign >> 16;
2805- return o;
2806-}
2807-
2808-#endif
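The scalar fallback above uses Fabian Giesen's bit tricks for half-float conversion. The following is a minimal standalone sketch of the same two conversions. It assumes IEEE-754 binary32 `float`, round-to-nearest, and no flush-to-zero or denormals-are-zero mode. The names `half_to_float`, `float_to_half`, `bits_to_float`, and `float_to_bits` are hypothetical, and `memcpy` replaces the `stbir__FP32` union for type punning.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical bit-cast helpers; memcpy sidesteps strict-aliasing issues. */
static float bits_to_float(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }
static uint32_t float_to_bits(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }

/* Half -> float: move exponent/mantissa into place, rebias with one multiply. */
static float half_to_float(uint16_t h)
{
    const float magic = bits_to_float((254u - 15u) << 23);      /* 2^112 */
    const float was_infnan = bits_to_float((127u + 16u) << 23); /* 2^16 */
    float o = bits_to_float((uint32_t)(h & 0x7fffu) << 13);
    uint32_t u;

    o *= magic;                         /* also normalizes half subnormals */
    u = float_to_bits(o);
    if (o >= was_infnan)                /* half exponent was 31: Inf/NaN */
        u |= 255u << 23;
    u |= (uint32_t)(h & 0x8000u) << 16; /* sign bit */
    return bits_to_float(u);
}

/* Float -> half with round-to-nearest-even. */
static uint16_t float_to_half(float val)
{
    const uint32_t f32infty = 255u << 23;
    const uint32_t f16max = (127u + 16u) << 23;
    const uint32_t denorm_magic = ((127u - 15u) + (23u - 10u) + 1u) << 23; /* 0.5f */
    uint32_t u = float_to_bits(val);
    uint32_t sign = u & 0x80000000u;
    uint16_t o;

    u ^= sign;
    if (u >= f16max) {                  /* too large for half, Inf, or NaN */
        o = (uint16_t)((u > f32infty) ? 0x7e00u : 0x7c00u);
    } else if (u < (113u << 23)) {      /* result is a half subnormal or zero */
        /* the FP add rounds the mantissa into the low 10 bits (RTNE) */
        float f = bits_to_float(u) + bits_to_float(denorm_magic);
        o = (uint16_t)(float_to_bits(f) - denorm_magic);
    } else {                            /* result is a normal half */
        uint32_t mant_odd = (u >> 13) & 1u;
        u += ((15u - 127u) << 23) + 0xfffu; /* rebias exponent, rounding bias part 1 */
        u += mant_odd;                      /* rounding bias part 2: ties go to even */
        o = (uint16_t)(u >> 13);
    }
    return (uint16_t)(o | (sign >> 16));
}
```

Feeding every non-NaN half bit pattern through `half_to_float` and then `float_to_half` should reproduce it exactly. NaNs come back as the quiet NaN `0x7e00`, with the sign preserved.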
2809-
2810-#if defined(STBIR_FP16C)
2811-
2812-#include <immintrin.h>
2813-
2814-static stbir__inline void
2815-stbir__half_to_float_SIMD(float *output, stbir__FP16 const *input)
2816-{
2817- _mm256_storeu_ps((float *)output,
2818- _mm256_cvtph_ps(_mm_loadu_si128((__m128i const *)input)));
2819-}
2820-
2821-static stbir__inline void
2822-stbir__float_to_half_SIMD(stbir__FP16 *output, float const *input)
2823-{
2824- _mm_storeu_si128((__m128i *)output,
2825- _mm256_cvtps_ph(_mm256_loadu_ps(input), 0));
2826-}
2827-
2828-static stbir__inline float
2829-stbir__half_to_float(stbir__FP16 h)
2830-{
2831- return _mm_cvtss_f32(_mm_cvtph_ps(_mm_cvtsi32_si128((int)h.u)));
2832-}
2833-
2834-static stbir__inline stbir__FP16
2835-stbir__float_to_half(float f)
2836-{
2837- stbir__FP16 h;
2838- h.u = (unsigned short)_mm_cvtsi128_si32(_mm_cvtps_ph(_mm_set_ss(f), 0));
2839- return h;
2840-}
2841-
2842-#elif defined(STBIR_SSE2)
2843-
2844-// Fabian's half float routines, see: https://gist.github.com/rygorous/2156668
2845-stbir__inline static void
2846-stbir__half_to_float_SIMD(float *output, void const *input)
2847-{
2848- static const STBIR__SIMDI_CONST(mask_nosign, 0x7fff);
2849- static const STBIR__SIMDI_CONST(smallest_normal, 0x0400);
2850- static const STBIR__SIMDI_CONST(infinity, 0x7c00);
2851- static const STBIR__SIMDI_CONST(expadjust_normal, (127 - 15) << 23);
2852- static const STBIR__SIMDI_CONST(magic_denorm, 113 << 23);
2853-
2854- __m128i i = _mm_loadu_si128((__m128i const *)(input));
2855- __m128i h = _mm_unpacklo_epi16(i, _mm_setzero_si128());
2856- __m128i mnosign = STBIR__CONSTI(mask_nosign);
2857- __m128i eadjust = STBIR__CONSTI(expadjust_normal);
2858- __m128i smallest = STBIR__CONSTI(smallest_normal);
2859- __m128i infty = STBIR__CONSTI(infinity);
2860- __m128i expmant = _mm_and_si128(mnosign, h);
2861- __m128i justsign = _mm_xor_si128(h, expmant);
2862- __m128i b_notinfnan = _mm_cmpgt_epi32(infty, expmant);
2863- __m128i b_isdenorm = _mm_cmpgt_epi32(smallest, expmant);
2864- __m128i shifted = _mm_slli_epi32(expmant, 13);
2865- __m128i adj_infnan = _mm_andnot_si128(b_notinfnan, eadjust);
2866- __m128i adjusted = _mm_add_epi32(eadjust, shifted);
2867- __m128i den1 = _mm_add_epi32(shifted, STBIR__CONSTI(magic_denorm));
2868- __m128i adjusted2 = _mm_add_epi32(adjusted, adj_infnan);
2869- __m128 den2 =
2870- _mm_sub_ps(_mm_castsi128_ps(den1), *(const __m128 *)&magic_denorm);
2871- __m128 adjusted3 = _mm_and_ps(den2, _mm_castsi128_ps(b_isdenorm));
2872- __m128 adjusted4 = _mm_andnot_ps(_mm_castsi128_ps(b_isdenorm),
2873- _mm_castsi128_ps(adjusted2));
2874- __m128 adjusted5 = _mm_or_ps(adjusted3, adjusted4);
2875- __m128i sign = _mm_slli_epi32(justsign, 16);
2876- __m128 final = _mm_or_ps(adjusted5, _mm_castsi128_ps(sign));
2877- stbir__simdf_store(output + 0, final);
2878-
2879- h = _mm_unpackhi_epi16(i, _mm_setzero_si128());
2880- expmant = _mm_and_si128(mnosign, h);
2881- justsign = _mm_xor_si128(h, expmant);
2882- b_notinfnan = _mm_cmpgt_epi32(infty, expmant);
2883- b_isdenorm = _mm_cmpgt_epi32(smallest, expmant);
2884- shifted = _mm_slli_epi32(expmant, 13);
2885- adj_infnan = _mm_andnot_si128(b_notinfnan, eadjust);
2886- adjusted = _mm_add_epi32(eadjust, shifted);
2887- den1 = _mm_add_epi32(shifted, STBIR__CONSTI(magic_denorm));
2888- adjusted2 = _mm_add_epi32(adjusted, adj_infnan);
2889- den2 = _mm_sub_ps(_mm_castsi128_ps(den1), *(const __m128 *)&magic_denorm);
2890- adjusted3 = _mm_and_ps(den2, _mm_castsi128_ps(b_isdenorm));
2891- adjusted4 = _mm_andnot_ps(_mm_castsi128_ps(b_isdenorm),
2892- _mm_castsi128_ps(adjusted2));
2893- adjusted5 = _mm_or_ps(adjusted3, adjusted4);
2894- sign = _mm_slli_epi32(justsign, 16);
2895- final = _mm_or_ps(adjusted5, _mm_castsi128_ps(sign));
2896- stbir__simdf_store(output + 4, final);
2897-
2898- // ~38 SSE2 ops for 8 values
2899-}
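Each 32-bit lane in the SSE2 routine above follows one recipe. The routine simply replaces branches with compare masks. The following sketch models a single lane with ordinary `if`s, so it can be checked against the `b_notinfnan` and `b_isdenorm` masks. It uses hypothetical names and assumes IEEE-754 binary32 with no flush-to-zero.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static float bits_to_float(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }
static uint32_t float_to_bits(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }

/* One lane of the SSE2 half -> float conversion, with branches instead of masks. */
static float half_to_float_lane(uint16_t h)
{
    const uint32_t eadjust = (127u - 15u) << 23;  /* expadjust_normal */
    const uint32_t magic_denorm = 113u << 23;     /* bit pattern of 2^-14 */
    uint32_t expmant = h & 0x7fffu;               /* mask_nosign */
    uint32_t justsign = h ^ expmant;
    uint32_t shifted = expmant << 13;
    uint32_t out = eadjust + shifted;             /* normal case: rebias exponent */

    if (expmant >= 0x7c00u)                       /* !b_notinfnan: Inf/NaN */
        out += eadjust;                           /* pushes the exponent to 255 */
    if (expmant < 0x0400u) {                      /* b_isdenorm: zero or subnormal */
        /* renormalize by letting the FPU subtract the magic value */
        float den = bits_to_float(shifted + magic_denorm) - bits_to_float(magic_denorm);
        out = float_to_bits(den);
    }
    return bits_to_float(out | (justsign << 16));
}
```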
2900-
2901-// Fabian's round-to-nearest-even float to half
2902-// ~48 SSE2 ops for 8 outputs
2903-stbir__inline static void
2904-stbir__float_to_half_SIMD(void *output, float const *input)
2905-{
2906- static const STBIR__SIMDI_CONST(mask_sign, 0x80000000u);
2907- static const STBIR__SIMDI_CONST(
2908- c_f16max, (127 + 16) << 23); // all FP32 values >=this round to +inf
2909- static const STBIR__SIMDI_CONST(c_nanbit, 0x200);
2910- static const STBIR__SIMDI_CONST(c_infty_as_fp16, 0x7c00);
2911- static const STBIR__SIMDI_CONST(
2912- c_min_normal, (127 - 14)
2913- << 23); // smallest FP32 that yields a normalized FP16
2914- static const STBIR__SIMDI_CONST(c_subnorm_magic,
2915- ((127 - 15) + (23 - 10) + 1) << 23);
2916- static const STBIR__SIMDI_CONST(
2917- c_normal_bias,
2918- 0xfff -
2919- ((127 - 15) << 23)); // adjust exponent and add mantissa rounding
2920-
2921- __m128 f = _mm_loadu_ps(input);
2922- __m128 msign = _mm_castsi128_ps(STBIR__CONSTI(mask_sign));
2923- __m128 justsign = _mm_and_ps(msign, f);
2924- __m128 absf = _mm_xor_ps(f, justsign);
2925- __m128i absf_int = _mm_castps_si128(
2926- absf); // the cast is "free" (extra bypass latency, but no thruput hit)
2927- __m128i f16max = STBIR__CONSTI(c_f16max);
2928- __m128 b_isnan = _mm_cmpunord_ps(absf, absf); // is this a NaN?
2929- __m128i b_isregular =
2930- _mm_cmpgt_epi32(f16max, absf_int); // (sub)normalized or special?
2931- __m128i nanbit =
2932- _mm_and_si128(_mm_castps_si128(b_isnan), STBIR__CONSTI(c_nanbit));
2933- __m128i inf_or_nan = _mm_or_si128(
2934- nanbit, STBIR__CONSTI(c_infty_as_fp16)); // output for specials
2935-
2936- __m128i min_normal = STBIR__CONSTI(c_min_normal);
2937- __m128i b_issub = _mm_cmpgt_epi32(min_normal, absf_int);
2938-
2939- // "result is subnormal" path
2940- __m128 subnorm1 = _mm_add_ps(
2941- absf, _mm_castsi128_ps(STBIR__CONSTI(
2942- c_subnorm_magic))); // magic value to round output mantissa
2943- __m128i subnorm2 =
2944- _mm_sub_epi32(_mm_castps_si128(subnorm1),
2945- STBIR__CONSTI(c_subnorm_magic)); // subtract out bias
2946-
2947- // "result is normal" path
2948- __m128i mantoddbit = _mm_slli_epi32(
2949- absf_int, 31 - 13); // shift bit 13 (mantissa LSB) to sign
2950- __m128i mantodd =
2951- _mm_srai_epi32(mantoddbit, 31); // -1 if FP16 mantissa odd, else 0
2952-
2953- __m128i round1 = _mm_add_epi32(absf_int, STBIR__CONSTI(c_normal_bias));
2954- __m128i round2 = _mm_sub_epi32(
2955- round1,
2956- mantodd); // if mantissa LSB odd, bias towards rounding up (RTNE)
2957- __m128i normal = _mm_srli_epi32(round2, 13); // rounded result
2958-
2959- // combine the two non-specials
2960- __m128i nonspecial = _mm_or_si128(_mm_and_si128(subnorm2, b_issub),
2961- _mm_andnot_si128(b_issub, normal));
2962-
2963- // merge in specials as well
2964- __m128i joined = _mm_or_si128(_mm_and_si128(nonspecial, b_isregular),
2965- _mm_andnot_si128(b_isregular, inf_or_nan));
2966-
2967- __m128i sign_shift = _mm_srai_epi32(_mm_castps_si128(justsign), 16);
2968- __m128i final2, final = _mm_or_si128(joined, sign_shift);
2969-
2970- f = _mm_loadu_ps(input + 4);
2971- justsign = _mm_and_ps(msign, f);
2972- absf = _mm_xor_ps(f, justsign);
2973- absf_int = _mm_castps_si128(
2974- absf); // the cast is "free" (extra bypass latency, but no thruput hit)
2975- b_isnan = _mm_cmpunord_ps(absf, absf); // is this a NaN?
2976- b_isregular =
2977- _mm_cmpgt_epi32(f16max, absf_int); // (sub)normalized or special?
2978- nanbit = _mm_and_si128(_mm_castps_si128(b_isnan), STBIR__CONSTI(c_nanbit));
2979- inf_or_nan = _mm_or_si128(
2980- nanbit, STBIR__CONSTI(c_infty_as_fp16)); // output for specials
2981-
2982- b_issub = _mm_cmpgt_epi32(min_normal, absf_int);
2983-
2984- // "result is subnormal" path
2985- subnorm1 = _mm_add_ps(
2986- absf, _mm_castsi128_ps(STBIR__CONSTI(
2987- c_subnorm_magic))); // magic value to round output mantissa
2988- subnorm2 =
2989- _mm_sub_epi32(_mm_castps_si128(subnorm1),
2990- STBIR__CONSTI(c_subnorm_magic)); // subtract out bias
2991-
2992- // "result is normal" path
2993- mantoddbit = _mm_slli_epi32(absf_int,
2994- 31 - 13); // shift bit 13 (mantissa LSB) to sign
2995- mantodd = _mm_srai_epi32(mantoddbit, 31); // -1 if FP16 mantissa odd, else 0
2996-
2997- round1 = _mm_add_epi32(absf_int, STBIR__CONSTI(c_normal_bias));
2998- round2 = _mm_sub_epi32(
2999- round1,
3000- mantodd); // if mantissa LSB odd, bias towards rounding up (RTNE)
3001- normal = _mm_srli_epi32(round2, 13); // rounded result
3002-
3003- // combine the two non-specials
3004- nonspecial = _mm_or_si128(_mm_and_si128(subnorm2, b_issub),
3005- _mm_andnot_si128(b_issub, normal));
3006-
3007- // merge in specials as well
3008- joined = _mm_or_si128(_mm_and_si128(nonspecial, b_isregular),
3009- _mm_andnot_si128(b_isregular, inf_or_nan));
3010-
3011- sign_shift = _mm_srai_epi32(_mm_castps_si128(justsign), 16);
3012- final2 = _mm_or_si128(joined, sign_shift);
3013- final = _mm_packs_epi32(final, final2);
3014- stbir__simdi_store(output, final);
3015-}
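The final step of the routine above relies on `_mm_packs_epi32`, which narrows 32-bit lanes to 16 bits with *signed* saturation. That is presumably why the sign is moved down with an arithmetic shift (`_mm_srai_epi32`). A negative lane becomes `0xFFFF8000 | bits`, which fits in `int16_t` and packs to `0x8000 | bits`. A logical shift would instead give `0x00008000 | bits`, which saturates to `0x7fff`. The following scalar sketch models one lane and uses hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one lane of _mm_packs_epi32: signed saturation to int16. */
static int16_t saturate_i16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* bits: 15-bit half magnitude (exponent + mantissa); negative: input sign. */
static uint16_t pack_lane_arithmetic(uint32_t bits, int negative)
{
    int32_t sign_shift = negative ? -32768 : 0; /* srai(0x80000000, 16) == 0xFFFF8000 */
    assert(bits <= 0x7fffu);
    return (uint16_t)saturate_i16((int32_t)bits | sign_shift);
}

static uint16_t pack_lane_logical(uint32_t bits, int negative)
{
    int32_t sign_shift = negative ? 0x8000 : 0; /* srli(0x80000000, 16) == 0x00008000 */
    assert(bits <= 0x7fffu);
    return (uint16_t)saturate_i16((int32_t)bits | sign_shift);
}
```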
3016-
3017-#elif defined(STBIR_NEON) && defined(_MSC_VER) && defined(_M_ARM64) && \
3018- !defined(__clang__) // 64-bit ARM on MSVC (not clang)
3019-
3020-static stbir__inline void
3021-stbir__half_to_float_SIMD(float *output, stbir__FP16 const *input)
3022-{
3023- float16x4_t in0 = vld1_f16(input + 0);
3024- float16x4_t in1 = vld1_f16(input + 4);
3025- vst1q_f32(output + 0, vcvt_f32_f16(in0));
3026- vst1q_f32(output + 4, vcvt_f32_f16(in1));
3027-}
3028-
3029-static stbir__inline void
3030-stbir__float_to_half_SIMD(stbir__FP16 *output, float const *input)
3031-{
3032- float16x4_t out0 = vcvt_f16_f32(vld1q_f32(input + 0));
3033- float16x4_t out1 = vcvt_f16_f32(vld1q_f32(input + 4));
3034- vst1_f16(output + 0, out0);
3035- vst1_f16(output + 4, out1);
3036-}
3037-
3038-static stbir__inline float
3039-stbir__half_to_float(stbir__FP16 h)
3040-{
3041- return vgetq_lane_f32(vcvt_f32_f16(vld1_dup_f16(&h)), 0);
3042-}
3043-
3044-static stbir__inline stbir__FP16
3045-stbir__float_to_half(float f)
3046-{
3047- return vget_lane_f16(vcvt_f16_f32(vdupq_n_f32(f)), 0).n16_u16[0];
3048-}
3049-
3050-#elif defined(STBIR_NEON) && (defined(_M_ARM64) || defined(__aarch64__) || \
3051- defined(__arm64__)) // 64-bit ARM
3052-
3053-static stbir__inline void
3054-stbir__half_to_float_SIMD(float *output, stbir__FP16 const *input)
3055-{
3056- float16x8_t in = vld1q_f16(input);
3057- vst1q_f32(output + 0, vcvt_f32_f16(vget_low_f16(in)));
3058- vst1q_f32(output + 4, vcvt_f32_f16(vget_high_f16(in)));
3059-}
3060-
3061-static stbir__inline void
3062-stbir__float_to_half_SIMD(stbir__FP16 *output, float const *input)
3063-{
3064- float16x4_t out0 = vcvt_f16_f32(vld1q_f32(input + 0));
3065- float16x4_t out1 = vcvt_f16_f32(vld1q_f32(input + 4));
3066- vst1q_f16(output, vcombine_f16(out0, out1));
3067-}
3068-
3069-static stbir__inline float
3070-stbir__half_to_float(stbir__FP16 h)
3071-{
3072- return vgetq_lane_f32(vcvt_f32_f16(vdup_n_f16(h)), 0);
3073-}
3074-
3075-static stbir__inline stbir__FP16
3076-stbir__float_to_half(float f)
3077-{
3078- return vget_lane_f16(vcvt_f16_f32(vdupq_n_f32(f)), 0);
3079-}
3080-
3081-#elif defined(STBIR_WASM) || \
3082- (defined(STBIR_NEON) && \
3083- (defined(_MSC_VER) || defined(_M_ARM) || \
3084- defined(__arm__))) // WASM or 32-bit ARM on MSVC/clang
3085-
3086-static stbir__inline void
3087-stbir__half_to_float_SIMD(float *output, stbir__FP16 const *input)
3088-{
3089- for (int i = 0; i < 8; i++) {
3090- output[i] = stbir__half_to_float(input[i]);
3091- }
3092-}
3093-static stbir__inline void
3094-stbir__float_to_half_SIMD(stbir__FP16 *output, float const *input)
3095-{
3096- for (int i = 0; i < 8; i++) {
3097- output[i] = stbir__float_to_half(input[i]);
3098- }
3099-}
3100-
3101-#endif
3102-
3103-#ifdef STBIR_SIMD
3104-
3105-#define stbir__simdf_0123to3333(out, reg) \
3106- (out) = stbir__simdf_swiz(reg, 3, 3, 3, 3)
3107-#define stbir__simdf_0123to2222(out, reg) \
3108- (out) = stbir__simdf_swiz(reg, 2, 2, 2, 2)
3109-#define stbir__simdf_0123to1111(out, reg) \
3110- (out) = stbir__simdf_swiz(reg, 1, 1, 1, 1)
3111-#define stbir__simdf_0123to0000(out, reg) \
3112- (out) = stbir__simdf_swiz(reg, 0, 0, 0, 0)
3113-#define stbir__simdf_0123to0003(out, reg) \
3114- (out) = stbir__simdf_swiz(reg, 0, 0, 0, 3)
3115-#define stbir__simdf_0123to0001(out, reg) \
3116- (out) = stbir__simdf_swiz(reg, 0, 0, 0, 1)
3117-#define stbir__simdf_0123to1122(out, reg) \
3118- (out) = stbir__simdf_swiz(reg, 1, 1, 2, 2)
3119-#define stbir__simdf_0123to2333(out, reg) \
3120- (out) = stbir__simdf_swiz(reg, 2, 3, 3, 3)
3121-#define stbir__simdf_0123to0023(out, reg) \
3122- (out) = stbir__simdf_swiz(reg, 0, 0, 2, 3)
3123-#define stbir__simdf_0123to1230(out, reg) \
3124- (out) = stbir__simdf_swiz(reg, 1, 2, 3, 0)
3125-#define stbir__simdf_0123to2103(out, reg) \
3126- (out) = stbir__simdf_swiz(reg, 2, 1, 0, 3)
3127-#define stbir__simdf_0123to3210(out, reg) \
3128- (out) = stbir__simdf_swiz(reg, 3, 2, 1, 0)
3129-#define stbir__simdf_0123to2301(out, reg) \
3130- (out) = stbir__simdf_swiz(reg, 2, 3, 0, 1)
3131-#define stbir__simdf_0123to3012(out, reg) \
3132- (out) = stbir__simdf_swiz(reg, 3, 0, 1, 2)
3133-#define stbir__simdf_0123to0011(out, reg) \
3134- (out) = stbir__simdf_swiz(reg, 0, 0, 1, 1)
3135-#define stbir__simdf_0123to1100(out, reg) \
3136- (out) = stbir__simdf_swiz(reg, 1, 1, 0, 0)
3137-#define stbir__simdf_0123to2233(out, reg) \
3138- (out) = stbir__simdf_swiz(reg, 2, 2, 3, 3)
3139-#define stbir__simdf_0123to1133(out, reg) \
3140- (out) = stbir__simdf_swiz(reg, 1, 1, 3, 3)
3141-#define stbir__simdf_0123to0022(out, reg) \
3142- (out) = stbir__simdf_swiz(reg, 0, 0, 2, 2)
3143-#define stbir__simdf_0123to1032(out, reg) \
3144- (out) = stbir__simdf_swiz(reg, 1, 0, 3, 2)
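Each `stbir__simdf_0123toABCD` name means that lanes 0, 1, 2, 3 become lanes A, B, C, D. Its definition is not shown here, but the naming suggests that `stbir__simdf_swiz(reg, a, b, c, d)` places input lane `a` in output lane 0, lane `b` in output lane 1, and so on. The following scalar sketch assumes those semantics. On SSE, one plausible mapping is `_mm_shuffle_ps(reg, reg, _MM_SHUFFLE(d, c, b, a))`. The `vec4` type and `swiz` function are hypothetical.

```c
#include <assert.h>

/* Hypothetical stand-in for a 4-lane float register. */
typedef struct { float lane[4]; } vec4;

/* Scalar model: output lane i takes input lane idx_i. */
static vec4 swiz(vec4 r, int a, int b, int c, int d)
{
    vec4 o;
    assert(a >= 0 && a < 4 && b >= 0 && b < 4 && c >= 0 && c < 4 && d >= 0 && d < 4);
    o.lane[0] = r.lane[a];
    o.lane[1] = r.lane[b];
    o.lane[2] = r.lane[c];
    o.lane[3] = r.lane[d];
    return o;
}

/* "0123to1230": rotate lanes left by one. */
#define vec4_0123to1230(out, reg) ((out) = swiz((reg), 1, 2, 3, 0))
```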
3145-
3146-typedef union stbir__simdi_u32 {
3147- stbir_uint32 m128i_u32[4];
3148- int m128i_i32[4];
3149- stbir__simdi m128i_i128;
3150-} stbir__simdi_u32;
3151-
3152-static const int STBIR_mask[9] = {0, 0, 0, -1, -1, -1, 0, 0, 0};
3153-
3154-static const STBIR__SIMDF_CONST(STBIR_max_uint8_as_float,
3155- stbir__max_uint8_as_float);
3156-static const STBIR__SIMDF_CONST(STBIR_max_uint16_as_float,
3157- stbir__max_uint16_as_float);
3158-static const STBIR__SIMDF_CONST(STBIR_max_uint8_as_float_inverted,
3159- stbir__max_uint8_as_float_inverted);
3160-static const STBIR__SIMDF_CONST(STBIR_max_uint16_as_float_inverted,
3161- stbir__max_uint16_as_float_inverted);
3162-
3163-static const STBIR__SIMDF_CONST(STBIR_simd_point5, 0.5f);
3164-static const STBIR__SIMDF_CONST(STBIR_ones, 1.0f);
3165-static const STBIR__SIMDI_CONST(STBIR_almost_zero, (127 - 13) << 23);
3166-static const STBIR__SIMDI_CONST(STBIR_almost_one, 0x3f7fffff);
3167-static const STBIR__SIMDI_CONST(STBIR_mastissa_mask, 0xff);
3168-static const STBIR__SIMDI_CONST(STBIR_topscale, 0x02000000);
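Two of the integer constants above are easier to read as floats. Reinterpreting their bit patterns as IEEE-754 binary32 shows that `STBIR_almost_zero` is 2^-13 and that `STBIR_almost_one` is the largest float below 1.0, namely 1 - 2^-24. The following sketch demonstrates this. `bits_to_float` and the two accessor names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical bit-cast helper; assumes 32-bit IEEE-754 float. */
static float bits_to_float(uint32_t u)
{
    float f;
    assert(sizeof f == sizeof u);
    memcpy(&f, &u, sizeof f);
    return f;
}

static float almost_zero_value(void) { return bits_to_float((127u - 13u) << 23); } /* 2^-13 */
static float almost_one_value(void) { return bits_to_float(0x3f7fffffu); }       /* 1 - 2^-24 */
```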
3169-
3170-// Basically, in simd mode, we unroll the proper amount, and we don't want
3171-// the non-simd remnant loops to be unrolled because they only run a few times
3172-// Adding this switch saves about 5K on clang which is Captain Unroll the 3rd.
3173-#define STBIR_SIMD_STREAMOUT_PTR(star) STBIR_STREAMOUT_PTR(star)
3174-#define STBIR_SIMD_NO_UNROLL(ptr) STBIR_NO_UNROLL(ptr)
3175-#define STBIR_SIMD_NO_UNROLL_LOOP_START STBIR_NO_UNROLL_LOOP_START
3176-#define STBIR_SIMD_NO_UNROLL_LOOP_START_INF_FOR \
3177- STBIR_NO_UNROLL_LOOP_START_INF_FOR
3178-
3179-#ifdef STBIR_MEMCPY
3180-#undef STBIR_MEMCPY
3181-#endif
3182-#define STBIR_MEMCPY stbir_simd_memcpy
3183-
3184-// override normal use of memcpy with much simpler copy (faster and smaller with
3185-// our sized copies)
3186-static void
3187-stbir_simd_memcpy(void *dest, void const *src, size_t bytes)
3188-{
3189- char STBIR_SIMD_STREAMOUT_PTR(*) d = (char *)dest;
3190- char STBIR_SIMD_STREAMOUT_PTR(*) d_end = ((char *)dest) + bytes;
3191- ptrdiff_t ofs_to_src = (char *)src - (char *)dest;
3192-
3193- // check overlaps
3194- STBIR_ASSERT(((d >= ((char *)src) + bytes)) ||
3195- ((d + bytes) <= (char *)src));
3196-
3197- if (bytes < (16 * stbir__simdfX_float_count)) {
3198- if (bytes < 16) {
3199- if (bytes) {
3200- STBIR_SIMD_NO_UNROLL_LOOP_START
3201- do {
3202- STBIR_SIMD_NO_UNROLL(d);
3203- d[0] = d[ofs_to_src];
3204- ++d;
3205- } while (d < d_end);
3206- }
3207- } else {
3208- stbir__simdf x;
3209- // do one unaligned to get us aligned for the stream out below
3210- stbir__simdf_load(x, (d + ofs_to_src));
3211- stbir__simdf_store(d, x);
3212- d = (char *)((((size_t)d) + 16) & ~15);
3213-
3214- STBIR_SIMD_NO_UNROLL_LOOP_START_INF_FOR
3215- for (;;) {
3216- STBIR_SIMD_NO_UNROLL(d);
3217-
3218- if (d > (d_end - 16)) {
3219- if (d == d_end) {
3220- return;
3221- }
3222- d = d_end - 16;
3223- }
3224-
3225- stbir__simdf_load(x, (d + ofs_to_src));
3226- stbir__simdf_store(d, x);
3227- d += 16;
3228- }
3229- }
3230- } else {
3231- stbir__simdfX x0, x1, x2, x3;
3232-
3233- // do one unaligned to get us aligned for the stream out below
3234- stbir__simdfX_load(x0,
3235- (d + ofs_to_src) + 0 * stbir__simdfX_float_count);
3236- stbir__simdfX_load(x1,
3237- (d + ofs_to_src) + 4 * stbir__simdfX_float_count);
3238- stbir__simdfX_load(x2,
3239- (d + ofs_to_src) + 8 * stbir__simdfX_float_count);
3240- stbir__simdfX_load(x3,
3241- (d + ofs_to_src) + 12 * stbir__simdfX_float_count);
3242- stbir__simdfX_store(d + 0 * stbir__simdfX_float_count, x0);
3243- stbir__simdfX_store(d + 4 * stbir__simdfX_float_count, x1);
3244- stbir__simdfX_store(d + 8 * stbir__simdfX_float_count, x2);
3245- stbir__simdfX_store(d + 12 * stbir__simdfX_float_count, x3);
3246- d = (char *)((((size_t)d) + (16 * stbir__simdfX_float_count)) &
3247- ~((16 * stbir__simdfX_float_count) - 1));
3248-
3249- STBIR_SIMD_NO_UNROLL_LOOP_START_INF_FOR
3250- for (;;) {
3251- STBIR_SIMD_NO_UNROLL(d);
3252-
3253- if (d > (d_end - (16 * stbir__simdfX_float_count))) {
3254- if (d == d_end) {
3255- return;
3256- }
3257- d = d_end - (16 * stbir__simdfX_float_count);
3258- }
3259-
3260- stbir__simdfX_load(x0, (d + ofs_to_src) +
3261- 0 * stbir__simdfX_float_count);
3262- stbir__simdfX_load(x1, (d + ofs_to_src) +
3263- 4 * stbir__simdfX_float_count);
3264- stbir__simdfX_load(x2, (d + ofs_to_src) +
3265- 8 * stbir__simdfX_float_count);
3266- stbir__simdfX_load(x3, (d + ofs_to_src) +
3267- 12 * stbir__simdfX_float_count);
3268- stbir__simdfX_store(d + 0 * stbir__simdfX_float_count, x0);
3269- stbir__simdfX_store(d + 4 * stbir__simdfX_float_count, x1);
3270- stbir__simdfX_store(d + 8 * stbir__simdfX_float_count, x2);
3271- stbir__simdfX_store(d + 12 * stbir__simdfX_float_count, x3);
3272- d += (16 * stbir__simdfX_float_count);
3273- }
3274- }
3275-}
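The copy routine above avoids a scalar tail loop in three steps. It copies one unaligned chunk to reach an aligned destination, then streams aligned chunks. When the last chunk would run past the end, it backs up to `d_end - 16` and copies an overlapping final chunk. Re-copying a few bytes is harmless because source and destination do not overlap.

The following scalar sketch covers the 16-byte path. A fixed 16-byte `memcpy` stands in for one SIMD load/store. `chunked_copy`, `copy_chunk`, and `CHUNK` are hypothetical names, and the sketch uses indices instead of the original's pointer offset between the two buffers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHUNK 16 /* hypothetical: bytes per SIMD register */

/* Fixed-size copy standing in for one SIMD load + store. */
static void copy_chunk(char *d, const char *s) { memcpy(d, s, CHUNK); }

/* Buffers must not overlap (the original asserts this as well). */
static void chunked_copy(void *dest, const void *src, size_t bytes)
{
    char *d = (char *)dest;
    const char *s = (const char *)src;
    size_t i;

    if (bytes < CHUNK) {                        /* tiny copy: byte loop */
        for (i = 0; i < bytes; ++i)
            d[i] = s[i];
        return;
    }

    copy_chunk(d, s);                           /* unaligned head chunk */
    i = CHUNK - (size_t)((uintptr_t)d % CHUNK); /* next 16-byte-aligned dest offset */

    for (;;) {
        if (i > bytes - CHUNK) {                /* next chunk would overrun */
            if (i == bytes)
                return;
            i = bytes - CHUNK;                  /* back up: overlap the previous chunk */
        }
        copy_chunk(d + i, s + i);
        i += CHUNK;
    }
}
```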
3276-
3277-// memcpy that is specifically intentionally overlapping (src is smaller than
3278-// dest, so can be
3279-// a normal forward copy, bytes is divisible by 4 and bytes is greater than or
3280-// equal to the diff between dest and src)
3281-static void
3282-stbir_overlapping_memcpy(void *dest, void const *src, size_t bytes)
3283-{
3284- char STBIR_SIMD_STREAMOUT_PTR(*) sd = (char *)src;
3285- char STBIR_SIMD_STREAMOUT_PTR(*) s_end = ((char *)src) + bytes;
3286- ptrdiff_t ofs_to_dest = (char *)dest - (char *)src;
3287-
3288- if (ofs_to_dest >= 16) // is the overlap more than 16 away?
3289- {
3290- char STBIR_SIMD_STREAMOUT_PTR(*) s_end16 =
3291- ((char *)src) + (bytes & ~15);
3292- STBIR_SIMD_NO_UNROLL_LOOP_START
3293- do {
3294- stbir__simdf x;
3295- STBIR_SIMD_NO_UNROLL(sd);
3296- stbir__simdf_load(x, sd);
3297- stbir__simdf_store((sd + ofs_to_dest), x);
3298- sd += 16;
3299- } while (sd < s_end16);
3300-
3301- if (sd == s_end) {
3302- return;
3303- }
3304- }
3305-
3306- do {
3307- STBIR_SIMD_NO_UNROLL(sd);
3308- *(int *)(sd + ofs_to_dest) = *(int *)sd;
3309- sd += 4;
3310- } while (sd < s_end);
3311-}
3312-
3313-#else // no SIMD
3314-
3315-// when in scalar mode, we let unrolling happen, so STREAMOUT_PTR only adds
3316-// __restrict and the NO_UNROLL macros expand to nothing
3317-#define STBIR_SIMD_STREAMOUT_PTR(star) STBIR_STREAMOUT_PTR(star)
3318-#define STBIR_SIMD_NO_UNROLL(ptr)
3319-#define STBIR_SIMD_NO_UNROLL_LOOP_START
3320-#define STBIR_SIMD_NO_UNROLL_LOOP_START_INF_FOR
3321-
3322-#endif // STBIR_SIMD
3323-
3324-#ifdef STBIR_PROFILE
3325-
3326-#ifndef STBIR_PROFILE_FUNC
3327-
3328-#if defined(_x86_64) || defined(__x86_64__) || defined(_M_X64) || \
3329- defined(__x86_64) || defined(__SSE2__) || defined(STBIR_SSE) || \
3330- defined(_M_IX86_FP) || defined(__i386) || defined(__i386__) || \
3331- defined(_M_IX86) || defined(_X86_)
3332-
3333-#ifdef _MSC_VER
3334-
3335-STBIRDEF stbir_uint64
3336-__rdtsc();
3337-#define STBIR_PROFILE_FUNC() __rdtsc()
3338-
3339-#else // non msvc
3340-
3341-static stbir__inline stbir_uint64
3342-STBIR_PROFILE_FUNC()
3343-{
3344- stbir_uint32 lo, hi;
3345- asm volatile("rdtsc" : "=a"(lo), "=d"(hi));
3346- return (((stbir_uint64)hi) << 32) | ((stbir_uint64)lo);
3347-}
3348-
3349-#endif // msvc
3350-
3351-#elif defined(_M_ARM64) || defined(__aarch64__) || defined(__arm64__) || \
3352- defined(__ARM_NEON__)
3353-
3354-#if defined(_MSC_VER) && !defined(__clang__)
3355-
3356-#define STBIR_PROFILE_FUNC() _ReadStatusReg(ARM64_CNTVCT)
3357-
3358-#else
3359-
3360-static stbir__inline stbir_uint64
3361-STBIR_PROFILE_FUNC()
3362-{
3363- stbir_uint64 tsc;
3364- asm volatile("mrs %0, cntvct_el0" : "=r"(tsc));
3365- return tsc;
3366-}
3367-
3368-#endif
3369-
3370-#else // x64, arm
3371-
3372-#error Unknown platform for profiling.
3373-
3374-#endif // x64, arm
3375-
3376-#endif // STBIR_PROFILE_FUNC
3377-
3378-#define STBIR_ONLY_PROFILE_GET_SPLIT_INFO , stbir__per_split_info *split_info
3379-#define STBIR_ONLY_PROFILE_SET_SPLIT_INFO , split_info
3380-
3381-#define STBIR_ONLY_PROFILE_BUILD_GET_INFO , stbir__info *profile_info
3382-#define STBIR_ONLY_PROFILE_BUILD_SET_INFO , profile_info
3383-
3384-// super light-weight micro profiler
3385-#define STBIR_PROFILE_START_ll(info, wh) \
3386- { \
3387- stbir_uint64 wh##thiszonetime = STBIR_PROFILE_FUNC(); \
3388- stbir_uint64 *wh##save_parent_excluded_ptr = \
3389- info->current_zone_excluded_ptr; \
3390- stbir_uint64 wh##current_zone_excluded = 0; \
3391- info->current_zone_excluded_ptr = &wh##current_zone_excluded;
3392-#define STBIR_PROFILE_END_ll(info, wh) \
3393- wh##thiszonetime = STBIR_PROFILE_FUNC() - wh##thiszonetime; \
3394- info->profile.named.wh += wh##thiszonetime - wh##current_zone_excluded; \
3395- *wh##save_parent_excluded_ptr += wh##thiszonetime; \
3396- info->current_zone_excluded_ptr = wh##save_parent_excluded_ptr; \
3397- }
3398-#define STBIR_PROFILE_FIRST_START_ll(info, wh) \
3399- { \
3400- int i; \
3401- info->current_zone_excluded_ptr = &info->profile.named.total; \
3402- for (i = 0; i < STBIR__ARRAY_SIZE(info->profile.array); i++) \
3403- info->profile.array[i] = 0; \
3404- } \
3405- STBIR_PROFILE_START_ll(info, wh);
3406-#define STBIR_PROFILE_CLEAR_EXTRAS_ll(info, num) \
3407- { \
3408- int extra; \
3409- for (extra = 1; extra < (num); extra++) { \
3410- int i; \
3411- for (i = 0; i < STBIR__ARRAY_SIZE((info)->profile.array); i++) \
3412- (info)[extra].profile.array[i] = 0; \
3413- } \
3414- }
3415-
3416-// for thread data
3417-#define STBIR_PROFILE_START(wh) STBIR_PROFILE_START_ll(split_info, wh)
3418-#define STBIR_PROFILE_END(wh) STBIR_PROFILE_END_ll(split_info, wh)
3419-#define STBIR_PROFILE_FIRST_START(wh) \
3420- STBIR_PROFILE_FIRST_START_ll(split_info, wh)
3421-#define STBIR_PROFILE_CLEAR_EXTRAS() \
3422- STBIR_PROFILE_CLEAR_EXTRAS_ll(split_info, split_count)
3423-
3424-// for build data
3425-#define STBIR_PROFILE_BUILD_START(wh) STBIR_PROFILE_START_ll(profile_info, wh)
3426-#define STBIR_PROFILE_BUILD_END(wh) STBIR_PROFILE_END_ll(profile_info, wh)
3427-#define STBIR_PROFILE_BUILD_FIRST_START(wh) \
3428- STBIR_PROFILE_FIRST_START_ll(profile_info, wh)
3429-#define STBIR_PROFILE_BUILD_CLEAR(info) \
3430- { \
3431- int i; \
3432- for (i = 0; i < STBIR__ARRAY_SIZE(info->profile.array); i++) \
3433- info->profile.array[i] = 0; \
3434- }
3435-
3436-#else // no profile
3437-
3438-#define STBIR_ONLY_PROFILE_GET_SPLIT_INFO
3439-#define STBIR_ONLY_PROFILE_SET_SPLIT_INFO
3440-
3441-#define STBIR_ONLY_PROFILE_BUILD_GET_INFO
3442-#define STBIR_ONLY_PROFILE_BUILD_SET_INFO
3443-
3444-#define STBIR_PROFILE_START(wh)
3445-#define STBIR_PROFILE_END(wh)
3446-#define STBIR_PROFILE_FIRST_START(wh)
3447-#define STBIR_PROFILE_CLEAR_EXTRAS()
3448-
3449-#define STBIR_PROFILE_BUILD_START(wh)
3450-#define STBIR_PROFILE_BUILD_END(wh)
3451-#define STBIR_PROFILE_BUILD_FIRST_START(wh)
3452-#define STBIR_PROFILE_BUILD_CLEAR(info)
3453-
3454-#endif // stbir_profile
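The macros above implement exclusive (self) timing. Each zone measures its elapsed ticks and subtracts whatever its child zones reported into its `..._current_zone_excluded` counter. It then adds its full elapsed time to the parent's excluded counter. The following sketch reproduces that bookkeeping. A hypothetical fake clock replaces `rdtsc`/`cntvct_el0` so the numbers are reproducible, and the struct and macro names are also hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical deterministic clock standing in for STBIR_PROFILE_FUNC(). */
static uint64_t fake_now;
static uint64_t read_clock(void) { return fake_now; }

/* Hypothetical profile record: one exclusive-tick counter per named zone. */
typedef struct {
    uint64_t total, outer, inner;
    uint64_t *current_zone_excluded_ptr; /* where child zones report their time */
} profile_info;

#define PROFILE_START(info, wh)                                             \
    {                                                                       \
        uint64_t wh##_time = read_clock();                                  \
        uint64_t *wh##_parent_excluded = (info)->current_zone_excluded_ptr; \
        uint64_t wh##_excluded = 0;                                         \
        (info)->current_zone_excluded_ptr = &wh##_excluded;

#define PROFILE_END(info, wh)                                               \
        wh##_time = read_clock() - wh##_time;                               \
        (info)->wh += wh##_time - wh##_excluded; /* self time only */       \
        *wh##_parent_excluded += wh##_time;      /* parent excludes all */  \
        (info)->current_zone_excluded_ptr = wh##_parent_excluded;           \
    }

static void run_example(profile_info *info)
{
    info->total = info->outer = info->inner = 0;
    info->current_zone_excluded_ptr = &info->total; /* top-level zones report here */
    fake_now = 100;

    PROFILE_START(info, outer)
        fake_now += 10;         /* outer-only work */
        PROFILE_START(info, inner)
            fake_now += 25;     /* work inside inner */
        PROFILE_END(info, inner)
        fake_now += 5;          /* more outer-only work */
    PROFILE_END(info, outer)
}
```

In this run, `inner` accumulates 25 ticks and `outer` accumulates 15 ticks (40 elapsed minus 25 spent in `inner`). `total` receives the full 40.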
3455-
3456-#ifndef STBIR_CEILF
3457-#include <math.h>
3458-#if defined(_MSC_VER) && _MSC_VER <= 1200 // support VC6 for Sean
3459-#define STBIR_CEILF(x) ((float)ceil((float)(x)))
3460-#define STBIR_FLOORF(x) ((float)floor((float)(x)))
3461-#else
3462-#define STBIR_CEILF(x) ceilf(x)
3463-#define STBIR_FLOORF(x) floorf(x)
3464-#endif
3465-#endif
3466-
3467-#ifndef STBIR_MEMCPY
3468-// For memcpy
3469-#include <string.h>
3470-#define STBIR_MEMCPY(dest, src, len) memcpy(dest, src, len)
3471-#endif
3472-
3473-#ifndef STBIR_SIMD
3474-
3475-// memcpy that is specifically intentionally overlapping (src is smaller than
3476-// dest, so can be
3477-// a normal forward copy, bytes is divisible by 4 and bytes is greater than or
3478-// equal to the diff between dest and src)
3479-static void
3480-stbir_overlapping_memcpy(void *dest, void const *src, size_t bytes)
3481-{
3482- char STBIR_SIMD_STREAMOUT_PTR(*) sd = (char *)src;
3483- char STBIR_SIMD_STREAMOUT_PTR(*) s_end = ((char *)src) + bytes;
3484- ptrdiff_t ofs_to_dest = (char *)dest - (char *)src;
3485-
3486- if (ofs_to_dest >= 8) // is the overlap more than 8 away?
3487- {
3488- char STBIR_SIMD_STREAMOUT_PTR(*) s_end8 = ((char *)src) + (bytes & ~7);
3489- STBIR_NO_UNROLL_LOOP_START
3490- do {
3491- STBIR_NO_UNROLL(sd);
3492- *(stbir_uint64 *)(sd + ofs_to_dest) = *(stbir_uint64 *)sd;
3493- sd += 8;
3494- } while (sd < s_end8);
3495-
3496- if (sd == s_end) {
3497- return;
3498- }
3499- }
3500-
3501- STBIR_NO_UNROLL_LOOP_START
3502- do {
3503- STBIR_NO_UNROLL(sd);
3504- *(int *)(sd + ofs_to_dest) = *(int *)sd;
3505- sd += 4;
3506- } while (sd < s_end);
3507-}
3508-
3509-#endif
3510-
3511-static float
3512-stbir__filter_trapezoid(float x, float scale, void *user_data)
3513-{
3514- float halfscale = scale / 2;
3515- float t = 0.5f + halfscale;
3516- STBIR_ASSERT(scale <= 1);
3517- STBIR__UNUSED(user_data);
3518-
3519- if (x < 0.0f) {
3520- x = -x;
3521- }
3522-
3523- if (x >= t) {
3524- return 0.0f;
3525- } else {
3526- float r = 0.5f - halfscale;
3527- if (x <= r) {
3528- return 1.0f;
3529- } else {
3530- return (t - x) / scale;
3531- }
3532- }
3533-}
3534-
3535-static float
3536-stbir__support_trapezoid(float scale, void *user_data)
3537-{
3538- STBIR__UNUSED(user_data);
3539- return 0.5f + scale / 2.0f;
3540-}
3541-
3542-static float
3543-stbir__filter_triangle(float x, float s, void *user_data)
3544-{
3545- STBIR__UNUSED(s);
3546- STBIR__UNUSED(user_data);
3547-
3548- if (x < 0.0f) {
3549- x = -x;
3550- }
3551-
3552- if (x <= 1.0f) {
3553- return 1.0f - x;
3554- } else {
3555- return 0.0f;
3556- }
3557-}
3558-
3559-static float
3560-stbir__filter_point(float x, float s, void *user_data)
3561-{
3562- STBIR__UNUSED(x);
3563- STBIR__UNUSED(s);
3564- STBIR__UNUSED(user_data);
3565-
3566- return 1.0f;
3567-}
3568-
3569-static float
3570-stbir__filter_cubic(float x, float s, void *user_data)
3571-{
3572- STBIR__UNUSED(s);
3573- STBIR__UNUSED(user_data);
3574-
3575- if (x < 0.0f) {
3576- x = -x;
3577- }
3578-
3579- if (x < 1.0f) {
3580- return (4.0f + x * x * (3.0f * x - 6.0f)) / 6.0f;
3581- } else if (x < 2.0f) {
3582- return (8.0f + x * (-12.0f + x * (6.0f - x))) / 6.0f;
3583- }
3584-
3585- return (0.0f);
3586-}
3587-
3588-static float
3589-stbir__filter_catmullrom(float x, float s, void *user_data)
3590-{
3591- STBIR__UNUSED(s);
3592- STBIR__UNUSED(user_data);
3593-
3594- if (x < 0.0f) {
3595- x = -x;
3596- }
3597-
3598- if (x < 1.0f) {
3599- return 1.0f - x * x * (2.5f - 1.5f * x);
3600- } else if (x < 2.0f) {
3601- return 2.0f - x * (4.0f + x * (0.5f * x - 2.5f));
3602- }
3603-
3604- return (0.0f);
3605-}
3606-
3607-static float
3608-stbir__filter_mitchell(float x, float s, void *user_data)
3609-{
3610- STBIR__UNUSED(s);
3611- STBIR__UNUSED(user_data);
3612-
3613- if (x < 0.0f) {
3614- x = -x;
3615- }
3616-
3617- if (x < 1.0f) {
3618- return (16.0f + x * x * (21.0f * x - 36.0f)) / 18.0f;
3619- } else if (x < 2.0f) {
3620- return (32.0f + x * (-60.0f + x * (36.0f - 7.0f * x))) / 18.0f;
3621- }
3622-
3623- return (0.0f);
3624-}
3625-
3626-static float
3627-stbir__support_zeropoint5(float s, void *user_data)
3628-{
3629- STBIR__UNUSED(s);
3630- STBIR__UNUSED(user_data);
3631- return 0.5f;
3632-}
3633-
3634-static float
3635-stbir__support_one(float s, void *user_data)
3636-{
3637- STBIR__UNUSED(s);
3638- STBIR__UNUSED(user_data);
3639- return 1;
3640-}
3641-
3642-static float
3643-stbir__support_two(float s, void *user_data)
3644-{
3645- STBIR__UNUSED(s);
3646- STBIR__UNUSED(user_data);
3647- return 2;
3648-}
3649-
3650-// This is the maximum number of input samples that can affect an output sample
3651-// with the given filter from the output pixel's perspective
3652-static int
3653-stbir__get_filter_pixel_width(stbir__support_callback *support, float scale,
3654- void *user_data)
3655-{
3656- STBIR_ASSERT(support != 0);
3657-
3658- if (scale >= (1.0f - stbir__small_float)) { // upscale
3659- return (int)STBIR_CEILF(support(1.0f / scale, user_data) * 2.0f);
3660- } else {
3661- return (int)STBIR_CEILF(support(scale, user_data) * 2.0f / scale);
3662- }
3663-}
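Here is a worked example of the rule above. When upscaling, the kernel keeps its natural width, which is twice the support. When downscaling by `scale` < 1, the kernel stretches by 1/scale in input pixels. With a support of 2 (as returned by `stbir__support_two`), that gives 4 taps at 2x and 8 taps at 0.5x.

The following sketch uses hypothetical names. `SMALL_FLOAT` stands in for `stbir__small_float`, whose actual value is not shown here, and `ceil_pos` replaces `ceilf` to avoid a libm dependency.

```c
#include <assert.h>

#define SMALL_FLOAT (1.0f / 1048576.0f) /* hypothetical stand-in for stbir__small_float */

typedef float support_fn(float scale);

static float support_two(float scale) { (void)scale; return 2.0f; }
static float support_trapezoid(float scale) { return 0.5f + scale / 2.0f; }

/* ceilf for non-negative inputs that fit in an int. */
static int ceil_pos(float v)
{
    int i = (int)v;
    return (v > (float)i) ? i + 1 : i;
}

static int filter_pixel_width(support_fn *support, float scale)
{
    assert(support != 0);
    if (scale >= 1.0f - SMALL_FLOAT)                /* upscale or 1:1 */
        return ceil_pos(support(1.0f / scale) * 2.0f);
    return ceil_pos(support(scale) * 2.0f / scale); /* downscale: kernel widens by 1/scale */
}
```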
3664-
3665-// this is how many coefficients per run of the filter (which is different
3666-// from the filter_pixel_width depending on if we are scattering or gathering)
3667-static int
3668-stbir__get_coefficient_width(stbir__sampler *samp, int is_gather,
3669- void *user_data)
3670-{
3671- float scale = samp->scale_info.scale;
3672- stbir__support_callback *support = samp->filter_support;
3673-
3674- switch (is_gather) {
3675- case 1:
3676- return (int)STBIR_CEILF(support(1.0f / scale, user_data) * 2.0f);
3677- case 2:
3678- return (int)STBIR_CEILF(support(scale, user_data) * 2.0f / scale);
3679- case 0:
3680- return (int)STBIR_CEILF(support(scale, user_data) * 2.0f);
3681- default:
3682- STBIR_ASSERT((is_gather >= 0) && (is_gather <= 2));
3683- return 0;
3684- }
3685-}
3686-
3687-static int
3688-stbir__get_contributors(stbir__sampler *samp, int is_gather)
3689-{
3690- if (is_gather) {
3691- return samp->scale_info.output_sub_size;
3692- } else {
3693- return (samp->scale_info.input_full_size +
3694- samp->filter_pixel_margin * 2);
3695- }
3696-}
3697-
3698-static int
3699-stbir__edge_zero_full(int n, int max)
3700-{
3701- STBIR__UNUSED(n);
3702- STBIR__UNUSED(max);
3703- return 0; // NOTREACHED
3704-}
3705-
3706-static int
3707-stbir__edge_clamp_full(int n, int max)
3708-{
3709- if (n < 0) {
3710- return 0;
3711- }
3712-
3713- if (n >= max) {
3714- return max - 1;
3715- }
3716-
3717- return n; // NOTREACHED
3718-}
3719-
3720-static int
3721-stbir__edge_reflect_full(int n, int max)
3722-{
3723- if (n < 0) {
3724- if (n > -max) {
3725- return -n;
3726- } else {
3727- return max - 1;
3728- }
3729- }
3730-
3731- if (n >= max) {
3732- int max2 = max * 2;
3733- if (n >= max2) {
3734- return 0;
3735- } else {
3736- return max2 - n - 1;
3737- }
3738- }
3739-
3740- return n; // NOTREACHED
3741-}
3742-
3743-static int
3744-stbir__edge_wrap_full(int n, int max)
3745-{
3746- if (n >= 0) {
3747- return (n % max);
3748- } else {
3749- int m = (-n) % max;
3750-
3751- if (m != 0) {
3752- m = max - m;
3753- }
3754-
3755- return (m);
3756- }
3757-}
3758-
3759-typedef int
3760-stbir__edge_wrap_func(int n, int max);
3761-static stbir__edge_wrap_func *stbir__edge_wrap_slow[] = {
3762- stbir__edge_clamp_full, // STBIR_EDGE_CLAMP
3763- stbir__edge_reflect_full, // STBIR_EDGE_REFLECT
3764- stbir__edge_wrap_full, // STBIR_EDGE_WRAP
3765- stbir__edge_zero_full, // STBIR_EDGE_ZERO
3766-};
3767-
3768-stbir__inline static int
3769-stbir__edge_wrap(stbir_edge edge, int n, int max)
3770-{
3771- // avoid per-pixel switch
3772- if (n >= 0 && n < max) {
3773- return n;
3774- }
3775- return stbir__edge_wrap_slow[edge](n, max);
3776-}
3777-
3778-#define STBIR__MERGE_RUNS_PIXEL_THRESHOLD 16
3779-
3780-// get information on the extents of a sampler
3781-static void
3782-stbir__get_extents(stbir__sampler *samp, stbir__extents *scanline_extents)
3783-{
3784- int j, stop;
3785- int left_margin, right_margin;
3786- int min_n = 0x7fffffff, max_n = -0x7fffffff;
3787- int min_left = 0x7fffffff, max_left = -0x7fffffff;
3788- int min_right = 0x7fffffff, max_right = -0x7fffffff;
3789- stbir_edge edge = samp->edge;
3790- stbir__contributors *contributors = samp->contributors;
3791- int output_sub_size = samp->scale_info.output_sub_size;
3792- int input_full_size = samp->scale_info.input_full_size;
3793- int filter_pixel_margin = samp->filter_pixel_margin;
3794-
3795- STBIR_ASSERT(samp->is_gather);
3796-
3797- stop = output_sub_size;
3798- for (j = 0; j < stop; j++) {
3799- STBIR_ASSERT(contributors[j].n1 >= contributors[j].n0);
3800- if (contributors[j].n0 < min_n) {
3801- min_n = contributors[j].n0;
3802- stop = j + filter_pixel_margin; // if we find a new min, only scan
3803- // another filter width
3804- if (stop > output_sub_size) {
3805- stop = output_sub_size;
3806- }
3807- }
3808- }
3809-
3810- stop = 0;
3811- for (j = output_sub_size - 1; j >= stop; j--) {
3812- STBIR_ASSERT(contributors[j].n1 >= contributors[j].n0);
3813- if (contributors[j].n1 > max_n) {
3814- max_n = contributors[j].n1;
3815- stop = j - filter_pixel_margin; // if we find a new max, only scan
3816- // another filter width
3817- if (stop < 0) {
3818- stop = 0;
3819- }
3820- }
3821- }
3822-
3823- STBIR_ASSERT(scanline_extents->conservative.n0 <= min_n);
3824- STBIR_ASSERT(scanline_extents->conservative.n1 >= max_n);
3825-
3826- // now calculate how much into the margins we really read
3827- left_margin = 0;
3828- if (min_n < 0) {
3829- left_margin = -min_n;
3830- min_n = 0;
3831- }
3832-
3833- right_margin = 0;
3834- if (max_n >= input_full_size) {
3835- right_margin = max_n - input_full_size + 1;
3836- max_n = input_full_size - 1;
3837- }
3838-
3839- // index 1 is margin pixel extents (how many pixels we hang over the edge)
3840- scanline_extents->edge_sizes[0] = left_margin;
3841- scanline_extents->edge_sizes[1] = right_margin;
3842-
3843- // index 2 is pixels read from the input
3844- scanline_extents->spans[0].n0 = min_n;
3845- scanline_extents->spans[0].n1 = max_n;
3846- scanline_extents->spans[0].pixel_offset_for_input = min_n;
3847-
3848- // default to no other input range
3849- scanline_extents->spans[1].n0 = 0;
3850- scanline_extents->spans[1].n1 = -1;
3851- scanline_extents->spans[1].pixel_offset_for_input = 0;
3852-
3853- // don't have to do edge calc for zero clamp
3854- if (edge == STBIR_EDGE_ZERO) {
3855- return;
3856- }
3857-
3858- // convert margin pixels to the pixels within the input (min and max)
3859- for (j = -left_margin; j < 0; j++) {
3860- int p = stbir__edge_wrap(edge, j, input_full_size);
3861- if (p < min_left) {
3862- min_left = p;
3863- }
3864- if (p > max_left) {
3865- max_left = p;
3866- }
3867- }
3868-
3869- for (j = input_full_size; j < (input_full_size + right_margin); j++) {
3870- int p = stbir__edge_wrap(edge, j, input_full_size);
3871- if (p < min_right) {
3872- min_right = p;
3873- }
3874- if (p > max_right) {
3875- max_right = p;
3876- }
3877- }
3878-
3879- // merge the left margin pixel region if it connects within
3880- // STBIR__MERGE_RUNS_PIXEL_THRESHOLD pixels of the main pixel region
3881- if (min_left != 0x7fffffff) {
3882- if (((min_left <= min_n) &&
3883- ((max_left + STBIR__MERGE_RUNS_PIXEL_THRESHOLD) >= min_n)) ||
3884- ((min_n <= min_left) &&
3885- ((max_n + STBIR__MERGE_RUNS_PIXEL_THRESHOLD) >= max_left))) {
3886- scanline_extents->spans[0].n0 = min_n = stbir__min(min_n, min_left);
3887- scanline_extents->spans[0].n1 = max_n = stbir__max(max_n, max_left);
3888- scanline_extents->spans[0].pixel_offset_for_input = min_n;
3889- left_margin = 0;
3890- }
3891- }
3892-
3893- // merge the right margin pixel region if it connects within
3894- // STBIR__MERGE_RUNS_PIXEL_THRESHOLD pixels of the main pixel region
3895- if (min_right != 0x7fffffff) {
3896- if (((min_right <= min_n) &&
3897- ((max_right + STBIR__MERGE_RUNS_PIXEL_THRESHOLD) >= min_n)) ||
3898- ((min_n <= min_right) &&
3899- ((max_n + STBIR__MERGE_RUNS_PIXEL_THRESHOLD) >= max_right))) {
3900- scanline_extents->spans[0].n0 = min_n =
3901- stbir__min(min_n, min_right);
3902- scanline_extents->spans[0].n1 = max_n =
3903- stbir__max(max_n, max_right);
3904- scanline_extents->spans[0].pixel_offset_for_input = min_n;
3905- right_margin = 0;
3906- }
3907- }
3908-
3909- STBIR_ASSERT(scanline_extents->conservative.n0 <= min_n);
3910- STBIR_ASSERT(scanline_extents->conservative.n1 >= max_n);
3911-
3912- // you get two ranges when you have the WRAP edge mode and you are doing
3913- // just a piece of the resize,
3914- // so you need to get a second run of pixels from the opposite side of the
3915- // scanline (which you wouldn't need except for WRAP)
3916-
3917- // if we can't merge the min_left range, add it as a second range
3918- if ((left_margin) && (min_left != 0x7fffffff)) {
3919- stbir__span *newspan = scanline_extents->spans + 1;
3920- STBIR_ASSERT(right_margin == 0);
3921- if (min_left < scanline_extents->spans[0].n0) {
3922- scanline_extents->spans[1].pixel_offset_for_input =
3923- scanline_extents->spans[0].n0;
3924- scanline_extents->spans[1].n0 = scanline_extents->spans[0].n0;
3925- scanline_extents->spans[1].n1 = scanline_extents->spans[0].n1;
3926- --newspan;
3927- }
3928- newspan->pixel_offset_for_input = min_left;
3929- newspan->n0 = -left_margin;
3930- newspan->n1 = (max_left - min_left) - left_margin;
3931- scanline_extents->edge_sizes[0] = 0;
3932- // don't need to copy the left margin, since we are directly
3933- // decoding into the margin
3934- }
3935- // if we can't merge the min_right range, add it as a second range
3936- else if ((right_margin) && (min_right != 0x7fffffff)) {
3937- stbir__span *newspan = scanline_extents->spans + 1;
3938- if (min_right < scanline_extents->spans[0].n0) {
3939- scanline_extents->spans[1].pixel_offset_for_input =
3940- scanline_extents->spans[0].n0;
3941- scanline_extents->spans[1].n0 = scanline_extents->spans[0].n0;
3942- scanline_extents->spans[1].n1 = scanline_extents->spans[0].n1;
3943- --newspan;
3944- }
3945- newspan->pixel_offset_for_input = min_right;
3946- newspan->n0 = scanline_extents->spans[1].n1 + 1;
3947- newspan->n1 =
3948- scanline_extents->spans[1].n1 + 1 + (max_right - min_right);
3949- scanline_extents->edge_sizes[1] = 0;
3950- // don't need to copy the right margin, since we are directly
3951- // decoding into the margin
3952- }
3953-
3954- // sort the spans into write output order
3955- if ((scanline_extents->spans[1].n1 > scanline_extents->spans[1].n0) &&
3956- (scanline_extents->spans[0].n0 > scanline_extents->spans[1].n0)) {
3957- stbir__span tspan = scanline_extents->spans[0];
3958- scanline_extents->spans[0] = scanline_extents->spans[1];
3959- scanline_extents->spans[1] = tspan;
3960- }
3961-}
3962-
3963-static void
3964-stbir__calculate_in_pixel_range(int *first_pixel, int *last_pixel,
3965- float out_pixel_center, float out_filter_radius,
3966- float inv_scale, float out_shift,
3967- int input_size, stbir_edge edge)
3968-{
3969- int first, last;
3970- float out_pixel_influence_lowerbound = out_pixel_center - out_filter_radius;
3971- float out_pixel_influence_upperbound = out_pixel_center + out_filter_radius;
3972-
3973- float in_pixel_influence_lowerbound =
3974- (out_pixel_influence_lowerbound + out_shift) * inv_scale;
3975- float in_pixel_influence_upperbound =
3976- (out_pixel_influence_upperbound + out_shift) * inv_scale;
3977-
3978- first = (int)(STBIR_FLOORF(in_pixel_influence_lowerbound + 0.5f));
3979- last = (int)(STBIR_FLOORF(in_pixel_influence_upperbound - 0.5f));
3980- if (last < first) {
3981- last = first; // point sample mode can span a value *right* at 0.5, and
3982- // cause these to cross
3983- }
3984-
3985- if (edge == STBIR_EDGE_WRAP) {
3986- if (first < -input_size) {
3987- first = -input_size;
3988- }
3989- if (last >= (input_size * 2)) {
3990- last = (input_size * 2) - 1;
3991- }
3992- }
3993-
3994- *first_pixel = first;
3995- *last_pixel = last;
3996-}
3997-
3998-static void
3999-stbir__calculate_coefficients_for_gather_upsample(
4000- float out_filter_radius, stbir__kernel_callback *kernel,
4001- stbir__scale_info *scale_info, int num_contributors,
4002- stbir__contributors *contributors, float *coefficient_group,
4003- int coefficient_width, stbir_edge edge, void *user_data)
4004-{
4005- int n, end;
4006- float inv_scale = scale_info->inv_scale;
4007- float out_shift = scale_info->pixel_shift;
4008- int input_size = scale_info->input_full_size;
4009- int numerator = scale_info->scale_numerator;
4010- int polyphase =
4011- ((scale_info->scale_is_rational) && (numerator < num_contributors));
4012-
4013- // Looping through out pixels
4014- end = num_contributors;
4015- if (polyphase) {
4016- end = numerator;
4017- }
4018- for (n = 0; n < end; n++) {
4019- int i;
4020- int last_non_zero;
4021- float out_pixel_center = (float)n + 0.5f;
4022- float in_center_of_out = (out_pixel_center + out_shift) * inv_scale;
4023-
4024- int in_first_pixel, in_last_pixel;
4025-
4026- stbir__calculate_in_pixel_range(&in_first_pixel, &in_last_pixel,
4027- out_pixel_center, out_filter_radius,
4028- inv_scale, out_shift, input_size, edge);
4029-
4030- // make sure we never generate a range larger than our precalculated
4031- // coeff width
4032- // this only happens in point sample mode, but it's a good safe thing
4033- // to do anyway
4034- if ((in_last_pixel - in_first_pixel + 1) > coefficient_width) {
4035- in_last_pixel = in_first_pixel + coefficient_width - 1;
4036- }
4037-
4038- last_non_zero = -1;
4039- for (i = 0; i <= in_last_pixel - in_first_pixel; i++) {
4040- float in_pixel_center = (float)(i + in_first_pixel) + 0.5f;
4041- float coeff = kernel(in_center_of_out - in_pixel_center, inv_scale,
4042- user_data);
4043-
4044- // kill denormals
4045- if (((coeff < stbir__small_float) &&
4046- (coeff > -stbir__small_float))) {
4047- if (i == 0) // if we're at the front, just eat zero contributors
4048- {
4049- STBIR_ASSERT((in_last_pixel - in_first_pixel) !=
4050- 0); // there should be at least one contrib
4051- ++in_first_pixel;
4052- i--;
4053- continue;
4054- }
4055- coeff =
4056- 0; // make sure is fully zero (should keep denormals away)
4057- } else {
4058- last_non_zero = i;
4059- }
4060-
4061- coefficient_group[i] = coeff;
4062- }
4063-
4064- in_last_pixel = last_non_zero + in_first_pixel; // kills trailing zeros
4065- contributors->n0 = in_first_pixel;
4066- contributors->n1 = in_last_pixel;
4067-
4068- STBIR_ASSERT(contributors->n1 >= contributors->n0);
4069-
4070- ++contributors;
4071- coefficient_group += coefficient_width;
4072- }
4073-}
4074-
4075-static void
4076-stbir__insert_coeff(stbir__contributors *contribs, float *coeffs, int new_pixel,
4077- float new_coeff, int max_width)
4078-{
4079- if (new_pixel <= contribs->n1) // before the end
4080- {
4081- if (new_pixel < contribs->n0) // before the front?
4082- {
4083- if ((contribs->n1 - new_pixel + 1) <= max_width) {
4084- int j, o = contribs->n0 - new_pixel;
4085- for (j = contribs->n1 - contribs->n0; j >= 0; j--) { // shift existing coeffs up by o, back to front
4086- coeffs[j + o] = coeffs[j];
4087- }
4088- for (j = 1; j < o; j++) { // zero the gap between new_pixel and the old n0
4089- coeffs[j] = 0.0f;
4090- }
4091- coeffs[0] = new_coeff;
4092- contribs->n0 = new_pixel;
4093- }
4094- } else {
4095- coeffs[new_pixel - contribs->n0] += new_coeff;
4096- }
4097- } else {
4098- if ((new_pixel - contribs->n0 + 1) <= max_width) {
4099- int j, e = new_pixel - contribs->n0;
4100- for (j = (contribs->n1 - contribs->n0) + 1; j < e;
4101- j++) { // clear in-betweens coeffs if there are any
4102- coeffs[j] = 0;
4103- }
4104-
4105- coeffs[e] = new_coeff;
4106- contribs->n1 = new_pixel;
4107- }
4108- }
4109-}
4110-
4111-static void
4112-stbir__calculate_out_pixel_range(int *first_pixel, int *last_pixel,
4113- float in_pixel_center, float in_pixels_radius,
4114- float scale, float out_shift, int out_size)
4115-{
4116- float in_pixel_influence_lowerbound = in_pixel_center - in_pixels_radius;
4117- float in_pixel_influence_upperbound = in_pixel_center + in_pixels_radius;
4118- float out_pixel_influence_lowerbound =
4119- in_pixel_influence_lowerbound * scale - out_shift;
4120- float out_pixel_influence_upperbound =
4121- in_pixel_influence_upperbound * scale - out_shift;
4122- int out_first_pixel =
4123- (int)(STBIR_FLOORF(out_pixel_influence_lowerbound + 0.5f));
4124- int out_last_pixel =
4125- (int)(STBIR_FLOORF(out_pixel_influence_upperbound - 0.5f));
4126-
4127- if (out_first_pixel < 0) {
4128- out_first_pixel = 0;
4129- }
4130- if (out_last_pixel >= out_size) {
4131- out_last_pixel = out_size - 1;
4132- }
4133- *first_pixel = out_first_pixel;
4134- *last_pixel = out_last_pixel;
4135-}
4136-
4137-static void
4138-stbir__calculate_coefficients_for_gather_downsample(
4139- int start, int end, float in_pixels_radius, stbir__kernel_callback *kernel,
4140- stbir__scale_info *scale_info, int coefficient_width, int num_contributors,
4141- stbir__contributors *contributors, float *coefficient_group,
4142- void *user_data)
4143-{
4144- int in_pixel;
4145- int i;
4146- int first_out_inited = -1;
4147- float scale = scale_info->scale;
4148- float out_shift = scale_info->pixel_shift;
4149- int out_size = scale_info->output_sub_size;
4150- int numerator = scale_info->scale_numerator;
4151- int polyphase = ((scale_info->scale_is_rational) && (numerator < out_size));
4152-
4153- STBIR__UNUSED(num_contributors);
4154-
4155- // Loop through the input pixels
4156- for (in_pixel = start; in_pixel < end; in_pixel++) {
4157- float in_pixel_center = (float)in_pixel + 0.5f;
4158- float out_center_of_in = in_pixel_center * scale - out_shift;
4159- int out_first_pixel, out_last_pixel;
4160-
4161- stbir__calculate_out_pixel_range(&out_first_pixel, &out_last_pixel,
4162- in_pixel_center, in_pixels_radius,
4163- scale, out_shift, out_size);
4164-
4165- if (out_first_pixel > out_last_pixel) {
4166- continue;
4167- }
4168-
4169- // clamp or exit if we are using polyphase filtering and the limit is
4170- // reached
4171- if (polyphase) {
4172- // when polyphase, you only have to do coeffs up to the numerator
4173- // count
4174- if (out_first_pixel == numerator) {
4175- break;
4176- }
4177-
4178- // don't do any extra work, clamp last pixel at numerator too
4179- if (out_last_pixel >= numerator) {
4180- out_last_pixel = numerator - 1;
4181- }
4182- }
4183-
4184- for (i = 0; i <= out_last_pixel - out_first_pixel; i++) {
4185- float out_pixel_center = (float)(i + out_first_pixel) + 0.5f;
4186- float x = out_pixel_center - out_center_of_in;
4187- float coeff = kernel(x, scale, user_data) * scale;
4188-
4189- // kill the coeff if it's too small (avoid denormals)
4190- if (((coeff < stbir__small_float) &&
4191- (coeff > -stbir__small_float))) {
4192- coeff = 0.0f;
4193- }
4194-
4195- {
4196- int out = i + out_first_pixel;
4197- float *coeffs = coefficient_group + out * coefficient_width;
4198- stbir__contributors *contribs = contributors + out;
4199-
4200- // is this the first time this output pixel has been seen? Init
4201- // it.
4202- if (out > first_out_inited) {
4203- STBIR_ASSERT(
4204- out == (first_out_inited +
4205- 1)); // ensure we have only advanced one at time
4206- first_out_inited = out;
4207- contribs->n0 = in_pixel;
4208- contribs->n1 = in_pixel;
4209- coeffs[0] = coeff;
4210- } else {
4211- // insert on end (always in order)
4212- if (coeffs[0] == 0.0f) // if the first coefficient is zero,
4213- // then zap it for these coeffs
4214- {
4215- STBIR_ASSERT(
4216- (in_pixel - contribs->n0) ==
4217- 1); // ensure that when we zap, we're at the 2nd pos
4218- contribs->n0 = in_pixel;
4219- }
4220- contribs->n1 = in_pixel;
4221- STBIR_ASSERT((in_pixel - contribs->n0) < coefficient_width);
4222- coeffs[in_pixel - contribs->n0] = coeff;
4223- }
4224- }
4225- }
4226- }
4227-}
4228-
4229-#ifdef STBIR_RENORMALIZE_IN_FLOAT
4230-#define STBIR_RENORM_TYPE float
4231-#else
4232-#define STBIR_RENORM_TYPE double
4233-#endif
4234-
4235-static void
4236-stbir__cleanup_gathered_coefficients(stbir_edge edge,
4237- stbir__filter_extent_info *filter_info,
4238- stbir__scale_info *scale_info,
4239- int num_contributors,
4240- stbir__contributors *contributors,
4241- float *coefficient_group,
4242- int coefficient_width)
4243-{
4244- int input_size = scale_info->input_full_size;
4245- int input_last_n1 = input_size - 1;
4246- int n, end;
4247- int lowest = 0x7fffffff;
4248- int highest = -0x7fffffff;
4249- int widest = -1;
4250- int numerator = scale_info->scale_numerator;
4251- int denominator = scale_info->scale_denominator;
4252- int polyphase =
4253- ((scale_info->scale_is_rational) && (numerator < num_contributors));
4254- float *coeffs;
4255- stbir__contributors *contribs;
4256-
4257- // weight all the coeffs for each sample
4258- coeffs = coefficient_group;
4259- contribs = contributors;
4260- end = num_contributors;
4261- if (polyphase) {
4262- end = numerator;
4263- }
4264- for (n = 0; n < end; n++) {
4265- int i;
4266- STBIR_RENORM_TYPE filter_scale, total_filter = 0;
4267- int e;
4268-
4269- // add all contribs
4270- e = contribs->n1 - contribs->n0;
4271- for (i = 0; i <= e; i++) {
4272- total_filter += (STBIR_RENORM_TYPE)coeffs[i];
4273- STBIR_ASSERT((coeffs[i] >= -2.0f) &&
4274- (coeffs[i] <= 2.0f)); // check for wonky weights
4275- }
4276-
4277- // rescale
4278- if ((total_filter < stbir__small_float) &&
4279- (total_filter > -stbir__small_float)) {
4280- // all coeffs are extremely small, just zero it
4281- contribs->n1 = contribs->n0;
4282- coeffs[0] = 0.0f;
4283- } else {
4284- // if the total isn't 1.0, rescale everything
4285- if ((total_filter < (1.0f - stbir__small_float)) ||
4286- (total_filter > (1.0f + stbir__small_float))) {
4287- filter_scale = ((STBIR_RENORM_TYPE)1.0) / total_filter;
4288-
4289- // scale them all
4290- for (i = 0; i <= e; i++) {
4291- coeffs[i] = (float)(coeffs[i] * filter_scale);
4292- }
4293- }
4294- }
4295- ++contribs;
4296- coeffs += coefficient_width;
4297- }
4298-
4299- // if the scale is rational, we can exploit the polyphase structure so
4300- // that we don't calculate most of the coefficients; instead we copy
4301- // the first `numerator` sets forward here
4302- if (polyphase) {
4303- stbir__contributors *prev_contribs = contributors;
4304- stbir__contributors *cur_contribs = contributors + numerator;
4305-
4306- for (n = numerator; n < num_contributors; n++) {
4307- cur_contribs->n0 = prev_contribs->n0 + denominator;
4308- cur_contribs->n1 = prev_contribs->n1 + denominator;
4309- ++cur_contribs;
4310- ++prev_contribs;
4311- }
4312- stbir_overlapping_memcpy(coefficient_group +
4313- numerator * coefficient_width,
4314- coefficient_group,
4315- (num_contributors - numerator) *
4316- coefficient_width * sizeof(coeffs[0]));
4317- }
4318-
4319- coeffs = coefficient_group;
4320- contribs = contributors;
4321-
4322- for (n = 0; n < num_contributors; n++) {
4323- int i;
4324-
4325- // in zero edge mode, just remove out of bounds contribs completely
4326- // (since their weights are accounted for now)
4327- if (edge == STBIR_EDGE_ZERO) {
4328- // shrink the right side if necessary
4329- if (contribs->n1 > input_last_n1) {
4330- contribs->n1 = input_last_n1;
4331- }
4332-
4333- // shrink the left side
4334- if (contribs->n0 < 0) {
4335- int j, left, skips = 0;
4336-
4337- skips = -contribs->n0;
4338- contribs->n0 = 0;
4339-
4340- // now move down the weights
4341- left = contribs->n1 - contribs->n0 + 1;
4342- if (left > 0) {
4343- for (j = 0; j < left; j++) {
4344- coeffs[j] = coeffs[j + skips];
4345- }
4346- }
4347- }
4348- } else if ((edge == STBIR_EDGE_CLAMP) || (edge == STBIR_EDGE_REFLECT)) {
4349- // for clamp and reflect, calculate the true inbounds position
4350- // (based on edge type) and just add that to the existing weight
4351-
4352- // right hand side first
4353- if (contribs->n1 > input_last_n1) {
4354- int start = contribs->n0;
4355- int endi = contribs->n1;
4356- contribs->n1 = input_last_n1;
4357- for (i = input_size; i <= endi; i++) {
4358- stbir__insert_coeff(
4359- contribs, coeffs,
4360- stbir__edge_wrap_slow[edge](i, input_size),
4361- coeffs[i - start], coefficient_width);
4362- }
4363- }
4364-
4365- // now check left hand edge
4366- if (contribs->n0 < 0) {
4367- int save_n0;
4368- float save_n0_coeff;
4369- float *c = coeffs - (contribs->n0 + 1);
4370-
4371- // reinsert the coeffs with it reflected or clamped (insert
4372- // accumulates, if the coeffs exist)
4373- for (i = -1; i > contribs->n0; i--) {
4374- stbir__insert_coeff(
4375- contribs, coeffs,
4376- stbir__edge_wrap_slow[edge](i, input_size), *c--,
4377- coefficient_width);
4378- }
4379- save_n0 = contribs->n0;
4380- save_n0_coeff = c[0]; // save it, since we didn't do the final
4381- // one (i==n0), because there might be too
4382- // many coeffs to hold (before we resize)!
4383-
4384- // now slide all the coeffs down (since we have accumulated them
4385- // in the positive contribs) and reset the first contrib
4386- contribs->n0 = 0;
4387- for (i = 0; i <= contribs->n1; i++) {
4388- coeffs[i] = coeffs[i - save_n0];
4389- }
4390-
4391- // now that we have shrunk down the contribs, we insert the
4392- // first one safely
4393- stbir__insert_coeff(
4394- contribs, coeffs,
4395- stbir__edge_wrap_slow[edge](save_n0, input_size),
4396- save_n0_coeff, coefficient_width);
4397- }
4398- }
4399-
4400- if (contribs->n0 <= contribs->n1) {
4401- int diff = contribs->n1 - contribs->n0 + 1;
4402- while (diff && (coeffs[diff - 1] == 0.0f)) {
4403- --diff;
4404- }
4405-
4406- contribs->n1 = contribs->n0 + diff - 1;
4407-
4408- if (contribs->n0 <= contribs->n1) {
4409- if (contribs->n0 < lowest) {
4410- lowest = contribs->n0;
4411- }
4412- if (contribs->n1 > highest) {
4413- highest = contribs->n1;
4414- }
4415- if (diff > widest) {
4416- widest = diff;
4417- }
4418- }
4419-
4420- // re-zero out unused coefficients (if any)
4421- for (i = diff; i < coefficient_width; i++) {
4422- coeffs[i] = 0.0f;
4423- }
4424- }
4425-
4426- ++contribs;
4427- coeffs += coefficient_width;
4428- }
4429- filter_info->lowest = lowest;
4430- filter_info->highest = highest;
4431- filter_info->widest = widest;
4432-}
4433-
4434-#undef STBIR_RENORM_TYPE
4435-
4436-static int
4437-stbir__pack_coefficients(int num_contributors,
4438- stbir__contributors *contributors, float *coefficents,
4439- int coefficient_width, int widest, int row0, int row1)
4440-{
4441-#define STBIR_MOVE_1(dest, src) \
4442- { \
4443- STBIR_NO_UNROLL(dest); \
4444- ((stbir_uint32 *)(dest))[0] = ((stbir_uint32 *)(src))[0]; \
4445- }
4446-#define STBIR_MOVE_2(dest, src) \
4447- { \
4448- STBIR_NO_UNROLL(dest); \
4449- ((stbir_uint64 *)(dest))[0] = ((stbir_uint64 *)(src))[0]; \
4450- }
4451-#ifdef STBIR_SIMD
4452-#define STBIR_MOVE_4(dest, src) \
4453- { \
4454- stbir__simdf t; \
4455- STBIR_NO_UNROLL(dest); \
4456- stbir__simdf_load(t, src); \
4457- stbir__simdf_store(dest, t); \
4458- }
4459-#else
4460-#define STBIR_MOVE_4(dest, src) \
4461- { \
4462- STBIR_NO_UNROLL(dest); \
4463- ((stbir_uint64 *)(dest))[0] = ((stbir_uint64 *)(src))[0]; \
4464- ((stbir_uint64 *)(dest))[1] = ((stbir_uint64 *)(src))[1]; \
4465- }
4466-#endif
4467-
4468- int row_end = row1 + 1;
4469- STBIR__UNUSED(row0); // only used in an assert
4470-
4471- if (coefficient_width != widest) {
4472- float *pc = coefficents;
4473- float *coeffs = coefficents;
4474- float *pc_end = coefficents + num_contributors * widest;
4475- switch (widest) {
4476- case 1:
4477- STBIR_NO_UNROLL_LOOP_START
4478- do {
4479- STBIR_MOVE_1(pc, coeffs);
4480- ++pc;
4481- coeffs += coefficient_width;
4482- } while (pc < pc_end);
4483- break;
4484- case 2:
4485- STBIR_NO_UNROLL_LOOP_START
4486- do {
4487- STBIR_MOVE_2(pc, coeffs);
4488- pc += 2;
4489- coeffs += coefficient_width;
4490- } while (pc < pc_end);
4491- break;
4492- case 3:
4493- STBIR_NO_UNROLL_LOOP_START
4494- do {
4495- STBIR_MOVE_2(pc, coeffs);
4496- STBIR_MOVE_1(pc + 2, coeffs + 2);
4497- pc += 3;
4498- coeffs += coefficient_width;
4499- } while (pc < pc_end);
4500- break;
4501- case 4:
4502- STBIR_NO_UNROLL_LOOP_START
4503- do {
4504- STBIR_MOVE_4(pc, coeffs);
4505- pc += 4;
4506- coeffs += coefficient_width;
4507- } while (pc < pc_end);
4508- break;
4509- case 5:
4510- STBIR_NO_UNROLL_LOOP_START
4511- do {
4512- STBIR_MOVE_4(pc, coeffs);
4513- STBIR_MOVE_1(pc + 4, coeffs + 4);
4514- pc += 5;
4515- coeffs += coefficient_width;
4516- } while (pc < pc_end);
4517- break;
4518- case 6:
4519- STBIR_NO_UNROLL_LOOP_START
4520- do {
4521- STBIR_MOVE_4(pc, coeffs);
4522- STBIR_MOVE_2(pc + 4, coeffs + 4);
4523- pc += 6;
4524- coeffs += coefficient_width;
4525- } while (pc < pc_end);
4526- break;
4527- case 7:
4528- STBIR_NO_UNROLL_LOOP_START
4529- do {
4530- STBIR_MOVE_4(pc, coeffs);
4531- STBIR_MOVE_2(pc + 4, coeffs + 4);
4532- STBIR_MOVE_1(pc + 6, coeffs + 6);
4533- pc += 7;
4534- coeffs += coefficient_width;
4535- } while (pc < pc_end);
4536- break;
4537- case 8:
4538- STBIR_NO_UNROLL_LOOP_START
4539- do {
4540- STBIR_MOVE_4(pc, coeffs);
4541- STBIR_MOVE_4(pc + 4, coeffs + 4);
4542- pc += 8;
4543- coeffs += coefficient_width;
4544- } while (pc < pc_end);
4545- break;
4546- case 9:
4547- STBIR_NO_UNROLL_LOOP_START
4548- do {
4549- STBIR_MOVE_4(pc, coeffs);
4550- STBIR_MOVE_4(pc + 4, coeffs + 4);
4551- STBIR_MOVE_1(pc + 8, coeffs + 8);
4552- pc += 9;
4553- coeffs += coefficient_width;
4554- } while (pc < pc_end);
4555- break;
4556- case 10:
4557- STBIR_NO_UNROLL_LOOP_START
4558- do {
4559- STBIR_MOVE_4(pc, coeffs);
4560- STBIR_MOVE_4(pc + 4, coeffs + 4);
4561- STBIR_MOVE_2(pc + 8, coeffs + 8);
4562- pc += 10;
4563- coeffs += coefficient_width;
4564- } while (pc < pc_end);
4565- break;
4566- case 11:
4567- STBIR_NO_UNROLL_LOOP_START
4568- do {
4569- STBIR_MOVE_4(pc, coeffs);
4570- STBIR_MOVE_4(pc + 4, coeffs + 4);
4571- STBIR_MOVE_2(pc + 8, coeffs + 8);
4572- STBIR_MOVE_1(pc + 10, coeffs + 10);
4573- pc += 11;
4574- coeffs += coefficient_width;
4575- } while (pc < pc_end);
4576- break;
4577- case 12:
4578- STBIR_NO_UNROLL_LOOP_START
4579- do {
4580- STBIR_MOVE_4(pc, coeffs);
4581- STBIR_MOVE_4(pc + 4, coeffs + 4);
4582- STBIR_MOVE_4(pc + 8, coeffs + 8);
4583- pc += 12;
4584- coeffs += coefficient_width;
4585- } while (pc < pc_end);
4586- break;
4587- default:
4588- STBIR_NO_UNROLL_LOOP_START
4589- do {
4590- float *copy_end = pc + widest - 4;
4591- float *c = coeffs;
4592- do {
4593- STBIR_NO_UNROLL(pc);
4594- STBIR_MOVE_4(pc, c);
4595- pc += 4;
4596- c += 4;
4597- } while (pc <= copy_end);
4598- copy_end += 4;
4599- STBIR_NO_UNROLL_LOOP_START
4600- while (pc < copy_end) {
4601- STBIR_MOVE_1(pc, c);
4602- ++pc;
4603- ++c;
4604- }
4605- coeffs += coefficient_width;
4606- } while (pc < pc_end);
4607- break;
4608- }
4609- }
4610-
4611- // some horizontal routines read one float off the end (which is then masked
4612- // off), so put in a sentinel so we don't read an snan or denormal
4613- coefficents[widest * num_contributors] = 8888.0f;
4614-
4615- // the minimum we might read for unrolled filter widths is 12, so we
4616- // need to make sure we never read outside the decode buffer, possibly
4617- // by moving the sample area back into the scanline and putting zero
4618- // weights first.
4619- //
4620- // we start on the right edge and check until we're well past the
4621- // possible clip area (2*widest).
4622- {
4623- stbir__contributors *contribs = contributors + num_contributors - 1;
4624- float *coeffs = coefficents + widest * (num_contributors - 1);
4625-
4626- // go until no chance of clipping (this is usually fewer than 8 loops)
4627- while ((contribs >= contributors) &&
4628- ((contribs->n0 + widest * 2) >= row_end)) {
4629- // might we clip??
4630- if ((contribs->n0 + widest) > row_end) {
4631- int stop_range = widest;
4632-
4633- // if range is larger than 12, it will be handled by generic
4634- // loops that can terminate on the exact length
4635- // of this contrib n1, instead of a fixed widest amount - so
4636- // calculate this
4637- if (widest > 12) {
4638- int mod;
4639-
4640- // how far will be read in the n_coeff loop (which depends
4641- // on the widest count mod 4)
4642- mod = widest & 3;
4643- stop_range =
4644- (((contribs->n1 - contribs->n0 + 1) - mod + 3) & ~3) +
4645- mod;
4646-
4647- // the n_coeff loops do a minimum amount of coeffs, so
4648- // factor that in!
4649- if (stop_range < (8 + mod)) {
4650- stop_range = 8 + mod;
4651- }
4652- }
4653-
4654- // now see if we still clip with the refined range
4655- if ((contribs->n0 + stop_range) > row_end) {
4656- int new_n0 = row_end - stop_range;
4657- int num = contribs->n1 - contribs->n0 + 1;
4658- int backup = contribs->n0 - new_n0;
4659- float *from_co = coeffs + num - 1;
4660- float *to_co = from_co + backup;
4661-
4662- STBIR_ASSERT((new_n0 >= row0) && (new_n0 < contribs->n0));
4663-
4664- // move the coeffs over
4665- while (num) {
4666- *to_co-- = *from_co--;
4667- --num;
4668- }
4669- // zero new positions
4670- while (to_co >= coeffs) {
4671- *to_co-- = 0;
4672- }
4673- // set new start point
4674- contribs->n0 = new_n0;
4675- if (widest > 12) {
4676- int mod;
4677-
4678- // how far will be read in the n_coeff loop (which
4679- // depends on the widest count mod 4)
4680- mod = widest & 3;
4681- stop_range =
4682- (((contribs->n1 - contribs->n0 + 1) - mod + 3) &
4683- ~3) +
4684- mod;
4685-
4686- // the n_coeff loops do a minimum amount of coeffs, so
4687- // factor that in!
4688- if (stop_range < (8 + mod)) {
4689- stop_range = 8 + mod;
4690- }
4691- }
4692- }
4693- }
4694- --contribs;
4695- coeffs -= widest;
4696- }
4697- }
4698-
4699- return widest;
4700-#undef STBIR_MOVE_1
4701-#undef STBIR_MOVE_2
4702-#undef STBIR_MOVE_4
4703-}
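The clipping pass above hinges on two small pieces of arithmetic: how many taps the `n_coeff` loops actually read when `widest > 12` (the tap count minus the `widest % 4` tail, rounded up to a multiple of 4, with the tail added back and a floor of `8 + tail`), and the backward shift of coefficients when `n0` is pulled back to `row_end - stop_range`. For example, with `widest = 14` a 3-tap contributor reads `((3 - 2 + 3) & ~3) + 2 = 6` taps, which is then clamped to 10. The sketch below isolates both steps; `sketch_stop_range` and `sketch_shift_coeffs` are hypothetical names, not stb_image_resize2 functions, and the shift uses indices so it never forms a pointer before the start of the array.

```c
#include <assert.h>

/* Hypothetical helper mirroring the stop_range arithmetic: for filters wider
   than 12 taps, the read length is the tap count minus the widest % 4 tail,
   rounded up to a multiple of 4, plus the tail, and at least 8 + tail. */
static int sketch_stop_range(int widest, int n0, int n1)
{
    int mod, stop_range = widest;
    if (widest > 12) {
        mod = widest & 3;
        stop_range = (((n1 - n0 + 1) - mod + 3) & ~3) + mod;
        if (stop_range < 8 + mod)
            stop_range = 8 + mod;
    }
    return stop_range;
}

/* Hypothetical helper: when n0 moves back by `backup`, move the `num`
   existing coefficients right by `backup` slots (copying from the end so
   nothing is overwritten before it is read), then zero the vacated front. */
static void sketch_shift_coeffs(float *coeffs, int num, int backup)
{
    int i;
    assert(backup > 0);
    for (i = num - 1; i >= 0; --i)
        coeffs[i + backup] = coeffs[i];
    for (i = 0; i < backup; ++i)
        coeffs[i] = 0.0f;
}
```

Copying from the end matters because source and destination overlap whenever `backup < num`.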
4704-
4705-static void
4706-stbir__calculate_filters(stbir__sampler *samp,
4707- stbir__sampler *other_axis_for_pivot,
4708- void *user_data STBIR_ONLY_PROFILE_BUILD_GET_INFO)
4709-{
4710- int n;
4711- float scale = samp->scale_info.scale;
4712- stbir__kernel_callback *kernel = samp->filter_kernel;
4713- stbir__support_callback *support = samp->filter_support;
4714- float inv_scale = samp->scale_info.inv_scale;
4715- int input_full_size = samp->scale_info.input_full_size;
4716- int gather_num_contributors = samp->num_contributors;
4717- stbir__contributors *gather_contributors = samp->contributors;
4718- float *gather_coeffs = samp->coefficients;
4719- int gather_coefficient_width = samp->coefficient_width;
4720-
4721- switch (samp->is_gather) {
4722- case 1: // gather upsample
4723- {
4724- float out_pixels_radius = support(inv_scale, user_data) * scale;
4725-
4726- stbir__calculate_coefficients_for_gather_upsample(
4727- out_pixels_radius, kernel, &samp->scale_info,
4728- gather_num_contributors, gather_contributors, gather_coeffs,
4729- gather_coefficient_width, samp->edge, user_data);
4730-
4731- STBIR_PROFILE_BUILD_START(cleanup);
4732- stbir__cleanup_gathered_coefficients(
4733- samp->edge, &samp->extent_info, &samp->scale_info,
4734- gather_num_contributors, gather_contributors, gather_coeffs,
4735- gather_coefficient_width);
4736- STBIR_PROFILE_BUILD_END(cleanup);
4737- } break;
4738-
4739- case 0: // scatter downsample (only on vertical)
4740- case 2: // gather downsample
4741- {
4742- float in_pixels_radius = support(scale, user_data) * inv_scale;
4743- int filter_pixel_margin = samp->filter_pixel_margin;
4744- int input_end = input_full_size + filter_pixel_margin;
4745-
4746- // if this is a scatter, we do a downsample gather to get the coeffs,
4747- // and then pivot after
4748- if (!samp->is_gather) {
4749- // check if we are using the same gather downsample on the
4750- // horizontal as on this vertical; if so, we don't have to
4751- // generate the coeffs, we can just pivot them
4752- // from the horizontal.
4753- if (other_axis_for_pivot) {
4754- gather_contributors = other_axis_for_pivot->contributors;
4755- gather_coeffs = other_axis_for_pivot->coefficients;
4756- gather_coefficient_width =
4757- other_axis_for_pivot->coefficient_width;
4758- gather_num_contributors =
4759- other_axis_for_pivot->num_contributors;
4760- samp->extent_info.lowest =
4761- other_axis_for_pivot->extent_info.lowest;
4762- samp->extent_info.highest =
4763- other_axis_for_pivot->extent_info.highest;
4764- samp->extent_info.widest =
4765- other_axis_for_pivot->extent_info.widest;
4766- goto jump_right_to_pivot;
4767- }
4768-
4769- gather_contributors = samp->gather_prescatter_contributors;
4770- gather_coeffs = samp->gather_prescatter_coefficients;
4771- gather_coefficient_width =
4772- samp->gather_prescatter_coefficient_width;
4773- gather_num_contributors = samp->gather_prescatter_num_contributors;
4774- }
4775-
4776- stbir__calculate_coefficients_for_gather_downsample(
4777- -filter_pixel_margin, input_end, in_pixels_radius, kernel,
4778- &samp->scale_info, gather_coefficient_width,
4779- gather_num_contributors, gather_contributors, gather_coeffs,
4780- user_data);
4781-
4782- STBIR_PROFILE_BUILD_START(cleanup);
4783- stbir__cleanup_gathered_coefficients(
4784- samp->edge, &samp->extent_info, &samp->scale_info,
4785- gather_num_contributors, gather_contributors, gather_coeffs,
4786- gather_coefficient_width);
4787- STBIR_PROFILE_BUILD_END(cleanup);
4788-
4789- if (!samp->is_gather) {
4790- // if this is a scatter (vertical only), then we need to pivot the
4791- // coeffs
4792- stbir__contributors *scatter_contributors;
4793- int highest_set;
4794-
4795- jump_right_to_pivot:
4796-
4797- STBIR_PROFILE_BUILD_START(pivot);
4798-
4799- highest_set = (-filter_pixel_margin) - 1;
4800- for (n = 0; n < gather_num_contributors; n++) {
4801- int k;
4802- int gn0 = gather_contributors->n0,
4803- gn1 = gather_contributors->n1;
4804- int scatter_coefficient_width = samp->coefficient_width;
4805- float *scatter_coeffs =
4806- samp->coefficients +
4807- (gn0 + filter_pixel_margin) * scatter_coefficient_width;
4808- float *g_coeffs = gather_coeffs;
4809- scatter_contributors =
4810- samp->contributors + (gn0 + filter_pixel_margin);
4811-
4812- for (k = gn0; k <= gn1; k++) {
4813- float gc = *g_coeffs++;
4814-
4815- // skip zero and denormals - must skip zeros to avoid adding
4816- // coeffs beyond scatter_coefficient_width
4817- // (which happens when pivoting from horizontal, which
4818- // might have dummy zeros)
4819- if (((gc >= stbir__small_float) ||
4820- (gc <= -stbir__small_float))) {
4821- if ((k > highest_set) || (scatter_contributors->n0 >
4822- scatter_contributors->n1)) {
4823- {
4824- // if we are skipping over several contributors,
4825- // we need to clear the skipped ones
4826- stbir__contributors *clear_contributors =
4827- samp->contributors +
4828- (highest_set + filter_pixel_margin + 1);
4829- while (clear_contributors <
4830- scatter_contributors) {
4831- clear_contributors->n0 = 0;
4832- clear_contributors->n1 = -1;
4833- ++clear_contributors;
4834- }
4835- }
4836- scatter_contributors->n0 = n;
4837- scatter_contributors->n1 = n;
4838- scatter_coeffs[0] = gc;
4839- highest_set = k;
4840- } else {
4841- stbir__insert_coeff(scatter_contributors,
4842- scatter_coeffs, n, gc,
4843- scatter_coefficient_width);
4844- }
4845- STBIR_ASSERT((scatter_contributors->n1 -
4846- scatter_contributors->n0 + 1) <=
4847- scatter_coefficient_width);
4848- }
4849- ++scatter_contributors;
4850- scatter_coeffs += scatter_coefficient_width;
4851- }
4852-
4853- ++gather_contributors;
4854- gather_coeffs += gather_coefficient_width;
4855- }
4856-
4857- // now clear any unset contribs
4858- {
4859- stbir__contributors *clear_contributors =
4860- samp->contributors +
4861- (highest_set + filter_pixel_margin + 1);
4862- stbir__contributors *end_contributors =
4863- samp->contributors + samp->num_contributors;
4864- while (clear_contributors < end_contributors) {
4865- clear_contributors->n0 = 0;
4866- clear_contributors->n1 = -1;
4867- ++clear_contributors;
4868- }
4869- }
4870-
4871- STBIR_PROFILE_BUILD_END(pivot);
4872- }
4873- } break;
4874- }
4875-}
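Once the margin offsets, the clearing of skipped contributors, and `stbir__insert_coeff` are set aside, the pivot above is a sparse transpose: gather form lists, for each output `n`, the inputs `n0..n1` that feed it, and scatter form lists, for each input `k`, the outputs it feeds. A minimal sketch, assuming outputs are visited in increasing order (as the `for (n = ...)` loop does) so each input's output range only ever grows at its end; `sketch_span` and `sketch_pivot` are hypothetical stand-ins, and unlike the real code this version only appends rather than inserting.

```c
#include <assert.h>

/* Hypothetical stand-in for stbir__contributors: an inclusive range,
   where n1 < n0 means "empty". */
typedef struct { int n0, n1; } sketch_span;

/* Transpose gather contributors (per output) into scatter contributors
   (per input). Zero coefficients are skipped, as in the pivot above. */
static void sketch_pivot(int num_out, const sketch_span *gather,
                         const float *gcoeffs, int gwidth,
                         int num_in, sketch_span *scatter,
                         float *scoeffs, int swidth)
{
    int n, k, j;
    for (k = 0; k < num_in; ++k) {           /* every input starts empty */
        scatter[k].n0 = 0;
        scatter[k].n1 = -1;
    }
    for (n = 0; n < num_out; ++n) {
        for (k = gather[n].n0; k <= gather[n].n1; ++k) {
            float c = gcoeffs[n * gwidth + (k - gather[n].n0)];
            sketch_span *s = &scatter[k];
            float *sc = scoeffs + k * swidth;
            if (c == 0.0f)
                continue;
            if (s->n1 < s->n0) {              /* first output using input k */
                s->n0 = s->n1 = n;
                sc[0] = c;
            } else {                          /* n only grows: append */
                assert(n - s->n0 < swidth);
                for (j = s->n1 + 1; j < n; ++j)
                    sc[j - s->n0] = 0.0f;     /* fill skipped outputs */
                s->n1 = n;
                sc[n - s->n0] = c;
            }
        }
    }
}
```

With two outputs that each average two neighbouring inputs, the middle input ends up feeding both outputs, so its scatter range is `0..1`.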
4876-
4877-//========================================================================================================
4878-// scanline decoders and encoders
4879-
4880-#define stbir__coder_min_num 1
4881-#define STB_IMAGE_RESIZE_DO_CODERS
4882-#include STBIR__HEADER_FILENAME
4883-
4884-#define stbir__decode_suffix BGRA
4885-#define stbir__decode_swizzle
4886-#define stbir__decode_order0 2
4887-#define stbir__decode_order1 1
4888-#define stbir__decode_order2 0
4889-#define stbir__decode_order3 3
4890-#define stbir__encode_order0 2
4891-#define stbir__encode_order1 1
4892-#define stbir__encode_order2 0
4893-#define stbir__encode_order3 3
4894-#define stbir__coder_min_num 4
4895-#define STB_IMAGE_RESIZE_DO_CODERS
4896-#include STBIR__HEADER_FILENAME
4897-
4898-#define stbir__decode_suffix ARGB
4899-#define stbir__decode_swizzle
4900-#define stbir__decode_order0 1
4901-#define stbir__decode_order1 2
4902-#define stbir__decode_order2 3
4903-#define stbir__decode_order3 0
4904-#define stbir__encode_order0 3
4905-#define stbir__encode_order1 0
4906-#define stbir__encode_order2 1
4907-#define stbir__encode_order3 2
4908-#define stbir__coder_min_num 4
4909-#define STB_IMAGE_RESIZE_DO_CODERS
4910-#include STBIR__HEADER_FILENAME
4911-
4912-#define stbir__decode_suffix ABGR
4913-#define stbir__decode_swizzle
4914-#define stbir__decode_order0 3
4915-#define stbir__decode_order1 2
4916-#define stbir__decode_order2 1
4917-#define stbir__decode_order3 0
4918-#define stbir__encode_order0 3
4919-#define stbir__encode_order1 2
4920-#define stbir__encode_order2 1
4921-#define stbir__encode_order3 0
4922-#define stbir__coder_min_num 4
4923-#define STB_IMAGE_RESIZE_DO_CODERS
4924-#include STBIR__HEADER_FILENAME
4925-
4926-#define stbir__decode_suffix AR
4927-#define stbir__decode_swizzle
4928-#define stbir__decode_order0 1
4929-#define stbir__decode_order1 0
4930-#define stbir__decode_order2 3
4931-#define stbir__decode_order3 2
4932-#define stbir__encode_order0 1
4933-#define stbir__encode_order1 0
4934-#define stbir__encode_order2 3
4935-#define stbir__encode_order3 2
4936-#define stbir__coder_min_num 2
4937-#define STB_IMAGE_RESIZE_DO_CODERS
4938-#include STBIR__HEADER_FILENAME
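The order macros above describe channel permutations. For ARGB, the decode order (1, 2, 3, 0) and the encode order (3, 0, 1, 2) are inverse permutations, while the BGRA, ABGR and AR orders are each their own inverse, which is why their decode and encode values match. The sketch below shows the ARGB round trip, assuming the internal channel order is RGBA and that the orders act as gather indices (`internal[i] = input[decode_order_i]`, `output[i] = internal[encode_order_i]`); `sketch_swizzle4` is a hypothetical helper, not the generated coder code.

```c
#include <assert.h>

/* Hypothetical helper: out[i] = in[order[i]]. Not safe in place. */
static void sketch_swizzle4(const float *in, float *out, const int order[4])
{
    int i;
    assert(in != out);
    for (i = 0; i < 4; ++i)
        out[i] = in[order[i]];
}
```

Decoding an ARGB pixel `{A, R, G, B}` with (1, 2, 3, 0) yields `{R, G, B, A}`, and encoding that with (3, 0, 1, 2) restores `{A, R, G, B}`.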
4939-
4940-// fancy alpha means we expand to keep both premultiplied and non-premultiplied
4941-// color channels
4942-static void
4943-stbir__fancy_alpha_weight_4ch(float *out_buffer, int width_times_channels)
4944-{
4945- float STBIR_STREAMOUT_PTR(*) out = out_buffer;
4946- float const *end_decode =
4947- out_buffer + (width_times_channels / 4) *
4948- 7; // decode buffer aligned to end of out_buffer
4949- float STBIR_STREAMOUT_PTR(*) decode =
4950- (float *)end_decode - width_times_channels;
4951-
4952- // fancy alpha is stored internally as R G B A Rpm Gpm Bpm
4953-
4954-#ifdef STBIR_SIMD
4955-
4956-#ifdef STBIR_SIMD8
4957- decode += 16;
4958- STBIR_NO_UNROLL_LOOP_START
4959- while (decode <= end_decode) {
4960- stbir__simdf8 d0, d1, a0, a1, p0, p1;
4961- STBIR_NO_UNROLL(decode);
4962- stbir__simdf8_load(d0, decode - 16);
4963- stbir__simdf8_load(d1, decode - 16 + 8);
4964- stbir__simdf8_0123to33333333(a0, d0);
4965- stbir__simdf8_0123to33333333(a1, d1);
4966- stbir__simdf8_mult(p0, a0, d0);
4967- stbir__simdf8_mult(p1, a1, d1);
4968- stbir__simdf8_bot4s(a0, d0, p0);
4969- stbir__simdf8_bot4s(a1, d1, p1);
4970- stbir__simdf8_top4s(d0, d0, p0);
4971- stbir__simdf8_top4s(d1, d1, p1);
4972- stbir__simdf8_store(out, a0);
4973- stbir__simdf8_store(out + 7, d0);
4974- stbir__simdf8_store(out + 14, a1);
4975- stbir__simdf8_store(out + 21, d1);
4976- decode += 16;
4977- out += 28;
4978- }
4979- decode -= 16;
4980-#else
4981- decode += 8;
4982- STBIR_NO_UNROLL_LOOP_START
4983- while (decode <= end_decode) {
4984- stbir__simdf d0, a0, d1, a1, p0, p1;
4985- STBIR_NO_UNROLL(decode);
4986- stbir__simdf_load(d0, decode - 8);
4987- stbir__simdf_load(d1, decode - 8 + 4);
4988- stbir__simdf_0123to3333(a0, d0);
4989- stbir__simdf_0123to3333(a1, d1);
4990- stbir__simdf_mult(p0, a0, d0);
4991- stbir__simdf_mult(p1, a1, d1);
4992- stbir__simdf_store(out, d0);
4993- stbir__simdf_store(out + 4, p0);
4994- stbir__simdf_store(out + 7, d1);
4995- stbir__simdf_store(out + 7 + 4, p1);
4996- decode += 8;
4997- out += 14;
4998- }
4999- decode -= 8;
5000-#endif
5001-
5002-// might be one last odd pixel
5003-#ifdef STBIR_SIMD8
5004- STBIR_NO_UNROLL_LOOP_START
5005- while (decode < end_decode)
5006-#else
5007- if (decode < end_decode)
5008-#endif
5009- {
5010- stbir__simdf d, a, p;
5011- STBIR_NO_UNROLL(decode);
5012- stbir__simdf_load(d, decode);
5013- stbir__simdf_0123to3333(a, d);
5014- stbir__simdf_mult(p, a, d);
5015- stbir__simdf_store(out, d);
5016- stbir__simdf_store(out + 4, p);
5017- decode += 4;
5018- out += 7;
5019- }
5020-
5021-#else
5022-
5023- while (decode < end_decode) {
5024- float r = decode[0], g = decode[1], b = decode[2], alpha = decode[3];
5025- out[0] = r;
5026- out[1] = g;
5027- out[2] = b;
5028- out[3] = alpha;
5029- out[4] = r * alpha;
5030- out[5] = g * alpha;
5031- out[6] = b * alpha;
5032- out += 7;
5033- decode += 4;
5034- }
5035-
5036-#endif
5037-}
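The expansion above works in place because the decoded RGBA pixels sit right-aligned in the same buffer that receives the 7-float R G B A Rpm Gpm Bpm pixels. With `n` pixels, output pixel `i` ends at float `7i + 6`, and the next unread input pixel starts at `3n + 4(i + 1)`. Since `7i + 6 < 3n + 4i + 4` holds exactly when `i < n - 2/3`, the writes never catch up with unread input. A scalar sketch of the same layout (the function name is hypothetical; it mirrors the non-SIMD branch):

```c
#include <assert.h>

/* Expand n RGBA pixels, stored right-aligned in a buffer of 7n floats,
   into n pixels of R G B A Rpm Gpm Bpm, working in place. */
static void sketch_fancy_weight_4ch(float *buf, int width_times_channels)
{
    int pixels = width_times_channels / 4;
    float *out = buf;
    const float *decode = buf + pixels * 7 - width_times_channels;
    int i;
    assert(width_times_channels % 4 == 0);
    for (i = 0; i < pixels; ++i) {
        /* read the whole source pixel before writing anything */
        float r = decode[0], g = decode[1], b = decode[2], a = decode[3];
        out[0] = r;
        out[1] = g;
        out[2] = b;
        out[3] = a;
        out[4] = r * a;   /* premultiplied copies */
        out[5] = g * a;
        out[6] = b * a;
        decode += 4;
        out += 7;
    }
}
```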
5038-
5039-static void
5040-stbir__fancy_alpha_weight_2ch(float *out_buffer, int width_times_channels)
5041-{
5042- float STBIR_STREAMOUT_PTR(*) out = out_buffer;
5043- float const *end_decode = out_buffer + (width_times_channels / 2) * 3;
5044- float STBIR_STREAMOUT_PTR(*) decode =
5045- (float *)end_decode - width_times_channels;
5046-
5047- // for fancy alpha, turns into: [X A Xpm][X A Xpm],etc
5048-
5049-#ifdef STBIR_SIMD
5050-
5051- decode += 8;
5052- if (decode <= end_decode) {
5053- STBIR_NO_UNROLL_LOOP_START
5054- do {
5055-#ifdef STBIR_SIMD8
5056- stbir__simdf8 d0, a0, p0;
5057- STBIR_NO_UNROLL(decode);
5058- stbir__simdf8_load(d0, decode - 8);
5059- stbir__simdf8_0123to11331133(p0, d0);
5060- stbir__simdf8_0123to00220022(a0, d0);
5061- stbir__simdf8_mult(p0, p0, a0);
5062-
5063- stbir__simdf_store2(out, stbir__if_simdf8_cast_to_simdf4(d0));
5064- stbir__simdf_store(out + 2, stbir__if_simdf8_cast_to_simdf4(p0));
5065- stbir__simdf_store2h(out + 3, stbir__if_simdf8_cast_to_simdf4(d0));
5066-
5067- stbir__simdf_store2(out + 6, stbir__simdf8_gettop4(d0));
5068- stbir__simdf_store(out + 8, stbir__simdf8_gettop4(p0));
5069- stbir__simdf_store2h(out + 9, stbir__simdf8_gettop4(d0));
5070-#else
5071- stbir__simdf d0, a0, d1, a1, p0, p1;
5072- STBIR_NO_UNROLL(decode);
5073- stbir__simdf_load(d0, decode - 8);
5074- stbir__simdf_load(d1, decode - 8 + 4);
5075- stbir__simdf_0123to1133(p0, d0);
5076- stbir__simdf_0123to1133(p1, d1);
5077- stbir__simdf_0123to0022(a0, d0);
5078- stbir__simdf_0123to0022(a1, d1);
5079- stbir__simdf_mult(p0, p0, a0);
5080- stbir__simdf_mult(p1, p1, a1);
5081-
5082- stbir__simdf_store2(out, d0);
5083- stbir__simdf_store(out + 2, p0);
5084- stbir__simdf_store2h(out + 3, d0);
5085-
5086- stbir__simdf_store2(out + 6, d1);
5087- stbir__simdf_store(out + 8, p1);
5088- stbir__simdf_store2h(out + 9, d1);
5089-#endif
5090- decode += 8;
5091- out += 12;
5092- } while (decode <= end_decode);
5093- }
5094- decode -= 8;
5095-#endif
5096-
5097- STBIR_SIMD_NO_UNROLL_LOOP_START
5098- while (decode < end_decode) {
5099- float x = decode[0], y = decode[1];
5100- STBIR_SIMD_NO_UNROLL(decode);
5101- out[0] = x;
5102- out[1] = y;
5103- out[2] = x * y;
5104- out += 3;
5105- decode += 2;
5106- }
5107-}
5108-
5109-static void
5110-stbir__fancy_alpha_unweight_4ch(float *encode_buffer, int width_times_channels)
5111-{
5112- float STBIR_SIMD_STREAMOUT_PTR(*) encode = encode_buffer;
5113- float STBIR_SIMD_STREAMOUT_PTR(*) input = encode_buffer;
5114- float const *end_output = encode_buffer + width_times_channels;
5115-
5116- // fancy RGBA is stored internally as R G B A Rpm Gpm Bpm
5117-
5118- STBIR_SIMD_NO_UNROLL_LOOP_START
5119- do {
5120- float alpha = input[3];
5121-#ifdef STBIR_SIMD
5122- stbir__simdf i, ia;
5123- STBIR_SIMD_NO_UNROLL(encode);
5124- if (alpha < stbir__small_float) {
5125- stbir__simdf_load(i, input);
5126- stbir__simdf_store(encode, i);
5127- } else {
5128- stbir__simdf_load1frep4(ia, 1.0f / alpha);
5129- stbir__simdf_load(i, input + 4);
5130- stbir__simdf_mult(i, i, ia);
5131- stbir__simdf_store(encode, i);
5132- encode[3] = alpha;
5133- }
5134-#else
5135- if (alpha < stbir__small_float) {
5136- encode[0] = input[0];
5137- encode[1] = input[1];
5138- encode[2] = input[2];
5139- } else {
5140- float ialpha = 1.0f / alpha;
5141- encode[0] = input[4] * ialpha;
5142- encode[1] = input[5] * ialpha;
5143- encode[2] = input[6] * ialpha;
5144- }
5145- encode[3] = alpha;
5146-#endif
5147-
5148- input += 7;
5149- encode += 4;
5150- } while (encode < end_output);
5151-}
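The unweight step reverses that layout, collapsing 7 floats per pixel back to 4. Where alpha is usable, color is recovered by dividing the premultiplied copy by alpha. Otherwise the stored straight color is kept, so a fully transparent pixel keeps its RGB instead of coming back black. A scalar sketch, where `SKETCH_SMALL_ALPHA` is a hypothetical threshold standing in for `stbir__small_float` (whose actual value is not assumed here):

```c
#include <assert.h>

#define SKETCH_SMALL_ALPHA (1.0f / (1 << 20)) /* hypothetical threshold */

/* Collapse R G B A Rpm Gpm Bpm pixels to RGBA in place. The write cursor
   advances 4 floats per pixel and the read cursor 7, so writes never
   overtake unread input. */
static void sketch_fancy_unweight_4ch(float *buf, int width_times_channels)
{
    float *encode = buf;
    const float *input = buf;
    const float *end = buf + width_times_channels;
    assert(width_times_channels % 4 == 0);
    while (encode < end) {
        float alpha = input[3];
        if (alpha < SKETCH_SMALL_ALPHA) {      /* keep the straight color */
            encode[0] = input[0];
            encode[1] = input[1];
            encode[2] = input[2];
        } else {                               /* un-premultiply */
            float ialpha = 1.0f / alpha;
            encode[0] = input[4] * ialpha;
            encode[1] = input[5] * ialpha;
            encode[2] = input[6] * ialpha;
        }
        encode[3] = alpha;
        input += 7;
        encode += 4;
    }
}
```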
5152-
5153-// format: [X A Xpm][X A Xpm] etc
5154-static void
5155-stbir__fancy_alpha_unweight_2ch(float *encode_buffer, int width_times_channels)
5156-{
5157- float STBIR_SIMD_STREAMOUT_PTR(*) encode = encode_buffer;
5158- float STBIR_SIMD_STREAMOUT_PTR(*) input = encode_buffer;
5159- float const *end_output = encode_buffer + width_times_channels;
5160-
5161- do {
5162- float alpha = input[1];
5163- encode[0] = input[0];
5164- if (alpha >= stbir__small_float) {
5165- encode[0] = input[2] / alpha;
5166- }
5167- encode[1] = alpha;
5168-
5169- input += 3;
5170- encode += 2;
5171- } while (encode < end_output);
5172-}
5173-
5174-static void
5175-stbir__simple_alpha_weight_4ch(float *decode_buffer, int width_times_channels)
5176-{
5177- float STBIR_STREAMOUT_PTR(*) decode = decode_buffer;
5178- float const *end_decode = decode_buffer + width_times_channels;
5179-
5180-#ifdef STBIR_SIMD
5181- {
5182- decode += 2 * stbir__simdfX_float_count;
5183- STBIR_NO_UNROLL_LOOP_START
5184- while (decode <= end_decode) {
5185- stbir__simdfX d0, a0, d1, a1;
5186- STBIR_NO_UNROLL(decode);
5187- stbir__simdfX_load(d0, decode - 2 * stbir__simdfX_float_count);
5188- stbir__simdfX_load(d1, decode - 2 * stbir__simdfX_float_count +
5189- stbir__simdfX_float_count);
5190- stbir__simdfX_aaa1(a0, d0, STBIR_onesX);
5191- stbir__simdfX_aaa1(a1, d1, STBIR_onesX);
5192- stbir__simdfX_mult(d0, d0, a0);
5193- stbir__simdfX_mult(d1, d1, a1);
5194- stbir__simdfX_store(decode - 2 * stbir__simdfX_float_count, d0);
5195- stbir__simdfX_store(decode - 2 * stbir__simdfX_float_count +
5196- stbir__simdfX_float_count,
5197- d1);
5198- decode += 2 * stbir__simdfX_float_count;
5199- }
5200- decode -= 2 * stbir__simdfX_float_count;
5201-
5202-// remnants: the last few pixels
5203-#ifdef STBIR_SIMD8
5204- STBIR_NO_UNROLL_LOOP_START
5205- while (decode < end_decode)
5206-#else
5207- if (decode < end_decode)
5208-#endif
5209- {
5210- stbir__simdf d, a;
5211- stbir__simdf_load(d, decode);
5212- stbir__simdf_aaa1(a, d, STBIR__CONSTF(STBIR_ones));
5213- stbir__simdf_mult(d, d, a);
5214- stbir__simdf_store(decode, d);
5215- decode += 4;
5216- }
5217- }
5218-
5219-#else
5220-
5221- while (decode < end_decode) {
5222- float alpha = decode[3];
5223- decode[0] *= alpha;
5224- decode[1] *= alpha;
5225- decode[2] *= alpha;
5226- decode += 4;
5227- }
5228-
5229-#endif
5230-}
5231-
5232-static void
5233-stbir__simple_alpha_weight_2ch(float *decode_buffer, int width_times_channels)
5234-{
5235- float STBIR_STREAMOUT_PTR(*) decode = decode_buffer;
5236- float const *end_decode = decode_buffer + width_times_channels;
5237-
5238-#ifdef STBIR_SIMD
5239- decode += 2 * stbir__simdfX_float_count;
5240- STBIR_NO_UNROLL_LOOP_START
5241- while (decode <= end_decode) {
5242- stbir__simdfX d0, a0, d1, a1;
5243- STBIR_NO_UNROLL(decode);
5244- stbir__simdfX_load(d0, decode - 2 * stbir__simdfX_float_count);
5245- stbir__simdfX_load(d1, decode - 2 * stbir__simdfX_float_count +
5246- stbir__simdfX_float_count);
5247- stbir__simdfX_a1a1(a0, d0, STBIR_onesX);
5248- stbir__simdfX_a1a1(a1, d1, STBIR_onesX);
5249- stbir__simdfX_mult(d0, d0, a0);
5250- stbir__simdfX_mult(d1, d1, a1);
5251- stbir__simdfX_store(decode - 2 * stbir__simdfX_float_count, d0);
5252- stbir__simdfX_store(decode - 2 * stbir__simdfX_float_count +
5253- stbir__simdfX_float_count,
5254- d1);
5255- decode += 2 * stbir__simdfX_float_count;
5256- }
5257- decode -= 2 * stbir__simdfX_float_count;
5258-#endif
5259-
5260- STBIR_SIMD_NO_UNROLL_LOOP_START
5261- while (decode < end_decode) {
5262- float alpha = decode[1];
5263- STBIR_SIMD_NO_UNROLL(decode);
5264- decode[0] *= alpha;
5265- decode += 2;
5266- }
5267-}
5268-
5269-static void
5270-stbir__simple_alpha_unweight_4ch(float *encode_buffer, int width_times_channels)
5271-{
5272- float STBIR_SIMD_STREAMOUT_PTR(*) encode = encode_buffer;
5273- float const *end_output = encode_buffer + width_times_channels;
5274-
5275- STBIR_SIMD_NO_UNROLL_LOOP_START
5276- do {
5277- float alpha = encode[3];
5278-
5279-#ifdef STBIR_SIMD
5280- stbir__simdf i, ia;
5281- STBIR_SIMD_NO_UNROLL(encode);
5282- if (alpha >= stbir__small_float) {
5283- stbir__simdf_load1frep4(ia, 1.0f / alpha);
5284- stbir__simdf_load(i, encode);
5285- stbir__simdf_mult(i, i, ia);
5286- stbir__simdf_store(encode, i);
5287- encode[3] = alpha;
5288- }
5289-#else
5290- if (alpha >= stbir__small_float) {
5291- float ialpha = 1.0f / alpha;
5292- encode[0] *= ialpha;
5293- encode[1] *= ialpha;
5294- encode[2] *= ialpha;
5295- }
5296-#endif
5297- encode += 4;
5298- } while (encode < end_output);
5299-}
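The simple path keeps only the premultiplied color, so a weight/unweight round trip is lossless only where alpha is usable. A fully transparent pixel comes back with zeroed RGB, which is the case the fancy 7-float layout earlier in this file is designed to preserve. A scalar sketch of both directions, with `SKETCH_SMALL_ALPHA` again a hypothetical stand-in for `stbir__small_float`:

```c
#include <assert.h>

#define SKETCH_SMALL_ALPHA (1.0f / (1 << 20)) /* hypothetical threshold */

/* Premultiply RGBA in place (like the scalar branch of the weight step). */
static void sketch_premultiply_4ch(float *p, int width_times_channels)
{
    float *end = p + width_times_channels;
    assert(width_times_channels % 4 == 0);
    for (; p < end; p += 4) {
        p[0] *= p[3];
        p[1] *= p[3];
        p[2] *= p[3];
    }
}

/* Un-premultiply in place; pixels with near-zero alpha are left as is. */
static void sketch_unpremultiply_4ch(float *p, int width_times_channels)
{
    float *end = p + width_times_channels;
    assert(width_times_channels % 4 == 0);
    for (; p < end; p += 4) {
        float alpha = p[3];
        if (alpha >= SKETCH_SMALL_ALPHA) {
            float ialpha = 1.0f / alpha;
            p[0] *= ialpha;
            p[1] *= ialpha;
            p[2] *= ialpha;
        }
    }
}
```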
5300-
5301-static void
5302-stbir__simple_alpha_unweight_2ch(float *encode_buffer, int width_times_channels)
5303-{
5304- float STBIR_SIMD_STREAMOUT_PTR(*) encode = encode_buffer;
5305- float const *end_output = encode_buffer + width_times_channels;
5306-
5307- do {
5308- float alpha = encode[1];
5309- if (alpha >= stbir__small_float) {
5310- encode[0] /= alpha;
5311- }
5312- encode += 2;
5313- } while (encode < end_output);
5314-}
5315-
5316-// only used in RGB->BGR or BGR->RGB
5317-static void
5318-stbir__simple_flip_3ch(float *decode_buffer, int width_times_channels)
5319-{
5320- float STBIR_STREAMOUT_PTR(*) decode = decode_buffer;
5321- float const *end_decode = decode_buffer + width_times_channels;
5322-
5323-#ifdef STBIR_SIMD
5324-#ifdef stbir__simdf_swiz2 // do we have two argument swizzles?
5325- end_decode -= 12;
5326- STBIR_NO_UNROLL_LOOP_START
5327- while (decode <= end_decode) {
5328- // on arm64 8 instructions, no overlapping stores
5329- stbir__simdf a, b, c, na, nb;
5330- STBIR_SIMD_NO_UNROLL(decode);
5331- stbir__simdf_load(a, decode);
5332- stbir__simdf_load(b, decode + 4);
5333- stbir__simdf_load(c, decode + 8);
5334-
5335- na = stbir__simdf_swiz2(a, b, 2, 1, 0, 5);
5336- b = stbir__simdf_swiz2(a, b, 4, 3, 6, 7);
5337- nb = stbir__simdf_swiz2(b, c, 0, 1, 4, 3);
5338- c = stbir__simdf_swiz2(b, c, 2, 7, 6, 5);
5339-
5340- stbir__simdf_store(decode, na);
5341- stbir__simdf_store(decode + 4, nb);
5342- stbir__simdf_store(decode + 8, c);
5343- decode += 12;
5344- }
5345- end_decode += 12;
5346-#else
5347- end_decode -= 24;
5348- STBIR_NO_UNROLL_LOOP_START
5349- while (decode <= end_decode) {
5350- // 26 instructions on x64
5351- stbir__simdf a, b, c, d, e, f, g;
5352- float i21, i23;
5353- STBIR_SIMD_NO_UNROLL(decode);
5354- stbir__simdf_load(a, decode);
5355- stbir__simdf_load(b, decode + 3);
5356- stbir__simdf_load(c, decode + 6);
5357- stbir__simdf_load(d, decode + 9);
5358- stbir__simdf_load(e, decode + 12);
5359- stbir__simdf_load(f, decode + 15);
5360- stbir__simdf_load(g, decode + 18);
5361-
5362- a = stbir__simdf_swiz(a, 2, 1, 0, 3);
5363- b = stbir__simdf_swiz(b, 2, 1, 0, 3);
5364- c = stbir__simdf_swiz(c, 2, 1, 0, 3);
5365- d = stbir__simdf_swiz(d, 2, 1, 0, 3);
5366- e = stbir__simdf_swiz(e, 2, 1, 0, 3);
5367- f = stbir__simdf_swiz(f, 2, 1, 0, 3);
5368- g = stbir__simdf_swiz(g, 2, 1, 0, 3);
5369-
5370- // stores overlap, so they need to be done in order
5371- stbir__simdf_store(decode, a);
5372- i21 = decode[21];
5373- stbir__simdf_store(decode + 3, b);
5374- i23 = decode[23];
5375- stbir__simdf_store(decode + 6, c);
5376- stbir__simdf_store(decode + 9, d);
5377- stbir__simdf_store(decode + 12, e);
5378- stbir__simdf_store(decode + 15, f);
5379- stbir__simdf_store(decode + 18, g);
5380- decode[21] = i23;
5381- decode[23] = i21;
5382- decode += 24;
5383- }
5384- end_decode += 24;
5385-#endif
5386-#else
5387- end_decode -= 12;
5388- STBIR_NO_UNROLL_LOOP_START
5389- while (decode <= end_decode) {
5390- // 16 instructions
5391- float t0, t1, t2, t3;
5392- STBIR_NO_UNROLL(decode);
5393- t0 = decode[0];
5394- t1 = decode[3];
5395- t2 = decode[6];
5396- t3 = decode[9];
5397- decode[0] = decode[2];
5398- decode[3] = decode[5];
5399- decode[6] = decode[8];
5400- decode[9] = decode[11];
5401- decode[2] = t0;
5402- decode[5] = t1;
5403- decode[8] = t2;
5404- decode[11] = t3;
5405- decode += 12;
5406- }
5407- end_decode += 12;
5408-#endif
5409-
5410- STBIR_NO_UNROLL_LOOP_START
5411- while (decode < end_decode) {
5412- float t = decode[0];
5413- STBIR_NO_UNROLL(decode);
5414- decode[0] = decode[2];
5415- decode[2] = t;
5416- decode += 3;
5417- }
5418-}
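The x64 branch above flips eight RGB pixels (24 floats) using seven 4-float loads at a stride of 3. Each store's fourth lane writes the next pixel's first float back with its original value, so the next store must come after it; that is why the stores stay in order. The eighth pixel (floats 21..23) has no following 4-wide store, so floats 21 and 23 are swapped by hand. A portable sketch of the same trick, with `memcpy` of four floats standing in for the SIMD load and store (the function name is hypothetical):

```c
#include <assert.h>
#include <string.h>

/* Swap R and B in 8 consecutive RGB pixels using overlapping 4-float
   stores, mirroring the non-swiz2 SIMD path. */
static void sketch_flip8_rgb(float *p)
{
    float v[7][4];
    float i21, i23;
    int j;
    assert(p != NULL);
    i21 = p[21];
    i23 = p[23];
    for (j = 0; j < 7; ++j) {
        float t[4];
        memcpy(t, p + 3 * j, sizeof t);  /* 4-float load at pixel j */
        v[j][0] = t[2];                  /* swizzle 2,1,0,3 */
        v[j][1] = t[1];
        v[j][2] = t[0];
        v[j][3] = t[3];                  /* next pixel's R, unchanged */
    }
    for (j = 0; j < 7; ++j)              /* overlapping stores, in order */
        memcpy(p + 3 * j, v[j], sizeof v[j]);
    p[21] = i23;                         /* last pixel: swap by hand */
    p[23] = i21;
}
```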
5419-
5420-static void
5421-stbir__decode_scanline(stbir__info const *stbir_info, int n,
5422- float *output_buffer STBIR_ONLY_PROFILE_GET_SPLIT_INFO)
5423-{
5424- int channels = stbir_info->channels;
5425- int effective_channels = stbir_info->effective_channels;
5426- int input_sample_in_bytes =
5427- stbir__type_size[stbir_info->input_type] * channels;
5428- stbir_edge edge_horizontal = stbir_info->horizontal.edge;
5429- stbir_edge edge_vertical = stbir_info->vertical.edge;
5430- int row = stbir__edge_wrap(edge_vertical, n,
5431- stbir_info->vertical.scale_info.input_full_size);
5432- const void *input_plane_data =
5433- ((char *)stbir_info->input_data) +
5434- (size_t)row * (size_t)stbir_info->input_stride_bytes;
5435- stbir__span const *spans = stbir_info->scanline_extents.spans;
5436- float *full_decode_buffer =
5437- output_buffer -
5438- stbir_info->scanline_extents.conservative.n0 * effective_channels;
5439- float *last_decoded = 0;
5440-
5441- // if we are on edge_zero, and we get in here with an out of bounds n, then
5442- // the filter calculation has failed
5443- STBIR_ASSERT(
5444- !(edge_vertical == STBIR_EDGE_ZERO &&
5445- (n < 0 || n >= stbir_info->vertical.scale_info.input_full_size)));
5446-
5447- do {
5448- float *decode_buffer;
5449- void const *input_data;
5450- float *end_decode;
5451- int width_times_channels;
5452- int width;
5453-
5454- if (spans->n1 < spans->n0) {
5455- break;
5456- }
5457-
5458- width = spans->n1 + 1 - spans->n0;
5459- decode_buffer = full_decode_buffer + spans->n0 * effective_channels;
5460- end_decode = full_decode_buffer + (spans->n1 + 1) * effective_channels;
5461- width_times_channels = width * channels;
5462-
5463- // read directly out of input plane by default
5464- input_data = ((char *)input_plane_data) +
5465- spans->pixel_offset_for_input * input_sample_in_bytes;
5466-
5467- // if we have an input callback, call it to get the input data
5468- if (stbir_info->in_pixels_cb) {
5469- // call the callback with a temp buffer (that they can choose to use
5470- // or not). the temp is just right aligned memory in the
5471- // decode_buffer itself
5472- input_data = stbir_info->in_pixels_cb(
5473- ((char *)end_decode) - (width * input_sample_in_bytes) +
5474- ((stbir_info->input_type != STBIR_TYPE_FLOAT)
5475- ? (sizeof(float) * STBIR_INPUT_CALLBACK_PADDING)
5476- : 0),
5477- input_plane_data, width, spans->pixel_offset_for_input, row,
5478- stbir_info->user_data);
5479- }
5480-
5481- STBIR_PROFILE_START(decode);
5482- // convert the pixels into the float decode_buffer (we index from
5483- // end_decode so that, when channels < effective_channels, we are
5484- // right-justified in the buffer)
5485- last_decoded = stbir_info->decode_pixels(
5486- (float *)end_decode - width_times_channels, width_times_channels,
5487- input_data);
5488- STBIR_PROFILE_END(decode);
5489-
5490- if (stbir_info->alpha_weight) {
5491- STBIR_PROFILE_START(alpha);
5492- stbir_info->alpha_weight(decode_buffer, width_times_channels);
5493- STBIR_PROFILE_END(alpha);
5494- }
5495-
5496- ++spans;
5497- } while (spans <= (&stbir_info->scanline_extents.spans[1]));
5498-
5499- // handle the edge_wrap filter (all other types are handled back out at the
5500- // calculate_filter stage). basically, if we have the
5501- // whole scanline in memory, we don't re-decode the
5502- // wrapped edge pixels; instead we just memcpy them from the scanline
5503- // into the edge positions
5504- if ((edge_horizontal == STBIR_EDGE_WRAP) &&
5505- (stbir_info->scanline_extents.edge_sizes[0] |
5506- stbir_info->scanline_extents.edge_sizes[1])) {
5507- // this code only runs if we're in edge_wrap, and we're doing the entire
5508- // scanline
5509- int e, start_x[2];
5510- int input_full_size = stbir_info->horizontal.scale_info.input_full_size;
5511-
5512- start_x[0] =
5513- -stbir_info->scanline_extents.edge_sizes[0]; // left edge start x
5514- start_x[1] = input_full_size; // right edge
5515-
5516- for (e = 0; e < 2; e++) {
5517- // do each margin
5518- int margin = stbir_info->scanline_extents.edge_sizes[e];
5519- if (margin) {
5520- int x = start_x[e];
5521- float *marg = full_decode_buffer + x * effective_channels;
5522- float const *src =
5523- full_decode_buffer +
5524- stbir__edge_wrap(edge_horizontal, x, input_full_size) *
5525- effective_channels;
5526- STBIR_MEMCPY(marg, src,
5527- margin * effective_channels * sizeof(float));
5528- if (e == 1) {
5529- last_decoded = marg + margin * effective_channels;
5530- }
5531- }
5532- }
5533- }
5534-
5535- // some of the horizontal gathers read one float off the edge (which is
5536- // masked out), but we force a zero here to make sure no NaNs leak in
5537- // (we can't pre-zero it, because the input callback can use that area as
5538- // padding)
5539- last_decoded[0] = 0.0f;
5540-
5541- // we clear this extra float because the final output pixel filter kernel
5542- // might have used one less coeff than the max filter width.
5543- // when this happens, we still read that pixel from the input, so it too
5544- // could be NaN; just zero an extra one. this fits because each
5545- // scanline is padded by three floats (STBIR_INPUT_CALLBACK_PADDING)
5547-}
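The edge-wrap block in `stbir__decode_scanline` fills the left and right margins by copying already-decoded pixels from their wrapped positions instead of decoding them a second time. A simplified sketch, assuming `stbir__edge_wrap` for `STBIR_EDGE_WRAP` behaves like a non-negative modulo and that each margin is no wider than the scanline. `sketch_wrap` and `sketch_fill_wrap_margins` are hypothetical helpers, and the buffer layout (pixel 0 at `row`, margins on either side) is a simplification of `full_decode_buffer`.

```c
#include <assert.h>
#include <string.h>

/* Non-negative modulo, assumed to match STBIR_EDGE_WRAP behavior. */
static int sketch_wrap(int x, int size)
{
    int m = x % size;
    return m < 0 ? m + size : m;
}

/* `row` points at pixel 0; `left` margin pixels precede it and `right`
   margin pixels follow pixel size - 1. */
static void sketch_fill_wrap_margins(float *row, int size, int channels,
                                     int left, int right)
{
    assert(left <= size && right <= size);
    if (left)   /* left margin starts at x = -left */
        memcpy(row - left * channels,
               row + sketch_wrap(-left, size) * channels,
               (size_t)left * channels * sizeof(float));
    if (right)  /* right margin starts at x = size */
        memcpy(row + size * channels,
               row + sketch_wrap(size, size) * channels,
               (size_t)right * channels * sizeof(float));
}
```

Source and destination never overlap here, since the source lies inside `[0, size)` and each margin lies outside it, so `memcpy` is safe.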
5548-
5549-//=================
5550-// Do 1 channel horizontal routines
5551-
5552-#ifdef STBIR_SIMD
5553-
5554-#define stbir__1_coeff_only() \
5555- stbir__simdf tot, c; \
5556- STBIR_SIMD_NO_UNROLL(decode); \
5557- stbir__simdf_load1(c, hc); \
5558- stbir__simdf_mult1_mem(tot, c, decode);
5559-
5560-#define stbir__2_coeff_only() \
5561- stbir__simdf tot, c, d; \
5562- STBIR_SIMD_NO_UNROLL(decode); \
5563- stbir__simdf_load2z(c, hc); \
5564- stbir__simdf_load2(d, decode); \
5565- stbir__simdf_mult(tot, c, d); \
5566- stbir__simdf_0123to1230(c, tot); \
5567- stbir__simdf_add1(tot, tot, c);
5568-
5569-#define stbir__3_coeff_only() \
5570- stbir__simdf tot, c, t; \
5571- STBIR_SIMD_NO_UNROLL(decode); \
5572- stbir__simdf_load(c, hc); \
5573- stbir__simdf_mult_mem(tot, c, decode); \
5574- stbir__simdf_0123to1230(c, tot); \
5575- stbir__simdf_0123to2301(t, tot); \
5576- stbir__simdf_add1(tot, tot, c); \
5577- stbir__simdf_add1(tot, tot, t);
5578-
5579-#define stbir__store_output_tiny() \
5580- stbir__simdf_store1(output, tot); \
5581- horizontal_coefficients += coefficient_width; \
5582- ++horizontal_contributors; \
5583- output += 1;
5584-
5585-#define stbir__4_coeff_start() \
5586- stbir__simdf tot, c; \
5587- STBIR_SIMD_NO_UNROLL(decode); \
5588- stbir__simdf_load(c, hc); \
5589- stbir__simdf_mult_mem(tot, c, decode);
5590-
5591-#define stbir__4_coeff_continue_from_4(ofs) \
5592- STBIR_SIMD_NO_UNROLL(decode); \
5593- stbir__simdf_load(c, hc + (ofs)); \
5594- stbir__simdf_madd_mem(tot, tot, c, decode + (ofs));
5595-
5596-#define stbir__1_coeff_remnant(ofs) \
5597- { \
5598- stbir__simdf d; \
5599- stbir__simdf_load1z(c, hc + (ofs)); \
5600- stbir__simdf_load1(d, decode + (ofs)); \
5601- stbir__simdf_madd(tot, tot, d, c); \
5602- }
5603-
5604-#define stbir__2_coeff_remnant(ofs) \
5605- { \
5606- stbir__simdf d; \
5607- stbir__simdf_load2z(c, hc + (ofs)); \
5608- stbir__simdf_load2(d, decode + (ofs)); \
5609- stbir__simdf_madd(tot, tot, d, c); \
5610- }
5611-
5612-#define stbir__3_coeff_setup() \
5613- stbir__simdf mask; \
5614- stbir__simdf_load(mask, STBIR_mask + 3);
5615-
5616-#define stbir__3_coeff_remnant(ofs) \
5617- stbir__simdf_load(c, hc + (ofs)); \
5618- stbir__simdf_and(c, c, mask); \
5619- stbir__simdf_madd_mem(tot, tot, c, decode + (ofs));
5620-
5621-#define stbir__store_output() \
5622- stbir__simdf_0123to2301(c, tot); \
5623- stbir__simdf_add(tot, tot, c); \
5624- stbir__simdf_0123to1230(c, tot); \
5625- stbir__simdf_add1(tot, tot, c); \
5626- stbir__simdf_store1(output, tot); \
5627- horizontal_coefficients += coefficient_width; \
5628- ++horizontal_contributors; \
5629- output += 1;
5630-
5631-#else
5632-
5633-#define stbir__1_coeff_only() \
5634- float tot; \
5635- tot = decode[0] * hc[0];
5636-
5637-#define stbir__2_coeff_only() \
5638- float tot; \
5639- tot = decode[0] * hc[0]; \
5640- tot += decode[1] * hc[1];
5641-
5642-#define stbir__3_coeff_only() \
5643- float tot; \
5644- tot = decode[0] * hc[0]; \
5645- tot += decode[1] * hc[1]; \
5646- tot += decode[2] * hc[2];
5647-
5648-#define stbir__store_output_tiny() \
5649- output[0] = tot; \
5650- horizontal_coefficients += coefficient_width; \
5651- ++horizontal_contributors; \
5652- output += 1;
5653-
5654-#define stbir__4_coeff_start() \
5655- float tot0, tot1, tot2, tot3; \
5656- tot0 = decode[0] * hc[0]; \
5657- tot1 = decode[1] * hc[1]; \
5658- tot2 = decode[2] * hc[2]; \
5659- tot3 = decode[3] * hc[3];
5660-
5661-#define stbir__4_coeff_continue_from_4(ofs) \
5662- tot0 += decode[0 + (ofs)] * hc[0 + (ofs)]; \
5663- tot1 += decode[1 + (ofs)] * hc[1 + (ofs)]; \
5664- tot2 += decode[2 + (ofs)] * hc[2 + (ofs)]; \
5665- tot3 += decode[3 + (ofs)] * hc[3 + (ofs)];
5666-
5667-#define stbir__1_coeff_remnant(ofs) tot0 += decode[0 + (ofs)] * hc[0 + (ofs)];
5668-
5669-#define stbir__2_coeff_remnant(ofs) \
5670- tot0 += decode[0 + (ofs)] * hc[0 + (ofs)]; \
5671- tot1 += decode[1 + (ofs)] * hc[1 + (ofs)];
5672-
5673-#define stbir__3_coeff_remnant(ofs) \
5674- tot0 += decode[0 + (ofs)] * hc[0 + (ofs)]; \
5675- tot1 += decode[1 + (ofs)] * hc[1 + (ofs)]; \
5676- tot2 += decode[2 + (ofs)] * hc[2 + (ofs)];
5677-
5678-#define stbir__store_output() \
5679- output[0] = (tot0 + tot2) + (tot1 + tot3); \
5680- horizontal_coefficients += coefficient_width; \
5681- ++horizontal_contributors; \
5682- output += 1;
5683-
5684-#endif
5685-
5686-#define STBIR__horizontal_channels 1
5687-#define STB_IMAGE_RESIZE_DO_HORIZONTALS
5688-#include STBIR__HEADER_FILENAME
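The 1-channel macros are stitched into a per-sample loop by the header included via `STB_IMAGE_RESIZE_DO_HORIZONTALS`. As a rough, non-authoritative sketch (the function `filter_1ch` and its signature are hypothetical), the scalar path for four or more taps behaves like four running sums, leftover taps folded into the first accumulators, and a pairwise reduction:

```c
#include <assert.h>

/* Hypothetical sketch mirroring stbir__4_coeff_start / _continue_from_4 /
   stbir__N_coeff_remnant / stbir__store_output for one output sample. */
static float filter_1ch(const float *decode, const float *hc, int n)
{
    float tot0 = 0.0f, tot1 = 0.0f, tot2 = 0.0f, tot3 = 0.0f;
    int i = 0;
    assert(n >= 4); /* fewer taps use the 1/2/3_coeff_only macros instead */
    for (; i + 4 <= n; i += 4) { /* full groups of four taps */
        tot0 += decode[i + 0] * hc[i + 0];
        tot1 += decode[i + 1] * hc[i + 1];
        tot2 += decode[i + 2] * hc[i + 2];
        tot3 += decode[i + 3] * hc[i + 3];
    }
    /* 1..3 leftover taps (the "remnant") go into tot0, tot1, tot2 */
    if (i < n)     tot0 += decode[i] * hc[i];
    if (i + 1 < n) tot1 += decode[i + 1] * hc[i + 1];
    if (i + 2 < n) tot2 += decode[i + 2] * hc[i + 2];
    return (tot0 + tot2) + (tot1 + tot3);
}
```

For `decode = {1, 2, 3, 4, 5, 6}` with all coefficients 1, six taps give 21.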
5689-
5690-//=================
5691-// Do 2 channel horizontal routines
5692-
5693-#ifdef STBIR_SIMD
5694-
5695-#define stbir__1_coeff_only() \
5696- stbir__simdf tot, c, d; \
5697- STBIR_SIMD_NO_UNROLL(decode); \
5698- stbir__simdf_load1z(c, hc); \
5699- stbir__simdf_0123to0011(c, c); \
5700- stbir__simdf_load2(d, decode); \
5701- stbir__simdf_mult(tot, d, c);
5702-
5703-#define stbir__2_coeff_only() \
5704- stbir__simdf tot, c; \
5705- STBIR_SIMD_NO_UNROLL(decode); \
5706- stbir__simdf_load2(c, hc); \
5707- stbir__simdf_0123to0011(c, c); \
5708- stbir__simdf_mult_mem(tot, c, decode);
5709-
5710-#define stbir__3_coeff_only() \
5711- stbir__simdf tot, c, cs, d; \
5712- STBIR_SIMD_NO_UNROLL(decode); \
5713- stbir__simdf_load(cs, hc); \
5714- stbir__simdf_0123to0011(c, cs); \
5715- stbir__simdf_mult_mem(tot, c, decode); \
5716- stbir__simdf_0123to2222(c, cs); \
5717- stbir__simdf_load2z(d, decode + 4); \
5718- stbir__simdf_madd(tot, tot, d, c);
5719-
5720-#define stbir__store_output_tiny() \
5721- stbir__simdf_0123to2301(c, tot); \
5722- stbir__simdf_add(tot, tot, c); \
5723- stbir__simdf_store2(output, tot); \
5724- horizontal_coefficients += coefficient_width; \
5725- ++horizontal_contributors; \
5726- output += 2;
5727-
5728-#ifdef STBIR_SIMD8
5729-
5730-#define stbir__4_coeff_start() \
5731- stbir__simdf8 tot0, c, cs; \
5732- STBIR_SIMD_NO_UNROLL(decode); \
5733- stbir__simdf8_load4b(cs, hc); \
5734- stbir__simdf8_0123to00112233(c, cs); \
5735- stbir__simdf8_mult_mem(tot0, c, decode);
5736-
5737-#define stbir__4_coeff_continue_from_4(ofs) \
5738- STBIR_SIMD_NO_UNROLL(decode); \
5739- stbir__simdf8_load4b(cs, hc + (ofs)); \
5740- stbir__simdf8_0123to00112233(c, cs); \
5741- stbir__simdf8_madd_mem(tot0, tot0, c, decode + (ofs) * 2);
5742-
5743-#define stbir__1_coeff_remnant(ofs) \
5744- { \
5745- stbir__simdf t, d; \
5746- stbir__simdf_load1z(t, hc + (ofs)); \
5747- stbir__simdf_load2(d, decode + (ofs) * 2); \
5748- stbir__simdf_0123to0011(t, t); \
5749- stbir__simdf_mult(t, t, d); \
5750- stbir__simdf8_add4(tot0, tot0, t); \
5751- }
5752-
5753-#define stbir__2_coeff_remnant(ofs) \
5754- { \
5755- stbir__simdf t; \
5756- stbir__simdf_load2(t, hc + (ofs)); \
5757- stbir__simdf_0123to0011(t, t); \
5758- stbir__simdf_mult_mem(t, t, decode + (ofs) * 2); \
5759- stbir__simdf8_add4(tot0, tot0, t); \
5760- }
5761-
5762-#define stbir__3_coeff_remnant(ofs) \
5763- { \
5764- stbir__simdf8 d; \
5765- stbir__simdf8_load4b(cs, hc + (ofs)); \
5766- stbir__simdf8_0123to00112233(c, cs); \
5767- stbir__simdf8_load6z(d, decode + (ofs) * 2); \
5768- stbir__simdf8_madd(tot0, tot0, c, d); \
5769- }
5770-
5771-#define stbir__store_output() \
5772- { \
5773- stbir__simdf t, d; \
5774- stbir__simdf8_add4halves(t, stbir__if_simdf8_cast_to_simdf4(tot0), \
5775- tot0); \
5776- stbir__simdf_0123to2301(d, t); \
5777- stbir__simdf_add(t, t, d); \
5778- stbir__simdf_store2(output, t); \
5779- horizontal_coefficients += coefficient_width; \
5780- ++horizontal_contributors; \
5781- output += 2; \
5782- }
5783-
5784-#else
5785-
5786-#define stbir__4_coeff_start() \
5787- stbir__simdf tot0, tot1, c, cs; \
5788- STBIR_SIMD_NO_UNROLL(decode); \
5789- stbir__simdf_load(cs, hc); \
5790- stbir__simdf_0123to0011(c, cs); \
5791- stbir__simdf_mult_mem(tot0, c, decode); \
5792- stbir__simdf_0123to2233(c, cs); \
5793- stbir__simdf_mult_mem(tot1, c, decode + 4);
5794-
5795-#define stbir__4_coeff_continue_from_4(ofs) \
5796- STBIR_SIMD_NO_UNROLL(decode); \
5797- stbir__simdf_load(cs, hc + (ofs)); \
5798- stbir__simdf_0123to0011(c, cs); \
5799- stbir__simdf_madd_mem(tot0, tot0, c, decode + (ofs) * 2); \
5800- stbir__simdf_0123to2233(c, cs); \
5801- stbir__simdf_madd_mem(tot1, tot1, c, decode + (ofs) * 2 + 4);
5802-
5803-#define stbir__1_coeff_remnant(ofs) \
5804- { \
5805- stbir__simdf d; \
5806- stbir__simdf_load1z(cs, hc + (ofs)); \
5807- stbir__simdf_0123to0011(c, cs); \
5808- stbir__simdf_load2(d, decode + (ofs) * 2); \
5809- stbir__simdf_madd(tot0, tot0, d, c); \
5810- }
5811-
5812-#define stbir__2_coeff_remnant(ofs) \
5813- stbir__simdf_load2(cs, hc + (ofs)); \
5814- stbir__simdf_0123to0011(c, cs); \
5815- stbir__simdf_madd_mem(tot0, tot0, c, decode + (ofs) * 2);
5816-
5817-#define stbir__3_coeff_remnant(ofs) \
5818- { \
5819- stbir__simdf d; \
5820- stbir__simdf_load(cs, hc + (ofs)); \
5821- stbir__simdf_0123to0011(c, cs); \
5822- stbir__simdf_madd_mem(tot0, tot0, c, decode + (ofs) * 2); \
5823- stbir__simdf_0123to2222(c, cs); \
5824- stbir__simdf_load2z(d, decode + (ofs) * 2 + 4); \
5825- stbir__simdf_madd(tot1, tot1, d, c); \
5826- }
5827-
5828-#define stbir__store_output() \
5829- stbir__simdf_add(tot0, tot0, tot1); \
5830- stbir__simdf_0123to2301(c, tot0); \
5831- stbir__simdf_add(tot0, tot0, c); \
5832- stbir__simdf_store2(output, tot0); \
5833- horizontal_coefficients += coefficient_width; \
5834- ++horizontal_contributors; \
5835- output += 2;
5836-
5837-#endif
5838-
5839-#else
5840-
5841-#define stbir__1_coeff_only() \
5842- float tota, totb, c; \
5843- c = hc[0]; \
5844- tota = decode[0] * c; \
5845- totb = decode[1] * c;
5846-
5847-#define stbir__2_coeff_only() \
5848- float tota, totb, c; \
5849- c = hc[0]; \
5850- tota = decode[0] * c; \
5851- totb = decode[1] * c; \
5852- c = hc[1]; \
5853- tota += decode[2] * c; \
5854- totb += decode[3] * c;
5855-
5856-// this add order (taps 0, 2, then 1) matches the SIMD path
5857-#define stbir__3_coeff_only() \
5858- float tota, totb, c; \
5859- c = hc[0]; \
5860- tota = decode[0] * c; \
5861- totb = decode[1] * c; \
5862- c = hc[2]; \
5863- tota += decode[4] * c; \
5864- totb += decode[5] * c; \
5865- c = hc[1]; \
5866- tota += decode[2] * c; \
5867- totb += decode[3] * c;
5868-
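The scalar `stbir__3_coeff_only` adds taps in the order 0, 2, 1 because float addition is not associative, so a different order can produce a different result from the SIMD path. The sketch below (helper names are hypothetical) computes channel `a` of an interleaved `a0 b0 a1 b1 a2 b2` row both ways, using values chosen to make the rounding visible:

```c
#include <assert.h>

/* Hypothetical: tap order 0, 1, 2. volatile forces rounding to float
   after every step, even on platforms with wider intermediates. */
static float tap3_natural(const float *decode, const float *hc)
{
    volatile float t = decode[0] * hc[0];
    t = t + decode[2] * hc[1];
    t = t + decode[4] * hc[2];
    return t;
}

/* Hypothetical: tap order 0, 2, 1, as in the scalar stbir__3_coeff_only. */
static float tap3_simd_order(const float *decode, const float *hc)
{
    volatile float t = decode[0] * hc[0];
    t = t + decode[4] * hc[2];
    t = t + decode[2] * hc[1];
    return t;
}
```

With `a = {1e8, 1, -1e8}` and unit coefficients, the natural order returns 0 (1e8 + 1 rounds back to 1e8) while the 0, 2, 1 order returns 1.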
5869-#define stbir__store_output_tiny() \
5870- output[0] = tota; \
5871- output[1] = totb; \
5872- horizontal_coefficients += coefficient_width; \
5873- ++horizontal_contributors; \
5874- output += 2;
5875-
5876-#define stbir__4_coeff_start() \
5877- float tota0, tota1, tota2, tota3, totb0, totb1, totb2, totb3, c; \
5878- c = hc[0]; \
5879- tota0 = decode[0] * c; \
5880- totb0 = decode[1] * c; \
5881- c = hc[1]; \
5882- tota1 = decode[2] * c; \
5883- totb1 = decode[3] * c; \
5884- c = hc[2]; \
5885- tota2 = decode[4] * c; \
5886- totb2 = decode[5] * c; \
5887- c = hc[3]; \
5888- tota3 = decode[6] * c; \
5889- totb3 = decode[7] * c;
5890-
5891-#define stbir__4_coeff_continue_from_4(ofs) \
5892- c = hc[0 + (ofs)]; \
5893- tota0 += decode[0 + (ofs) * 2] * c; \
5894- totb0 += decode[1 + (ofs) * 2] * c; \
5895- c = hc[1 + (ofs)]; \
5896- tota1 += decode[2 + (ofs) * 2] * c; \
5897- totb1 += decode[3 + (ofs) * 2] * c; \
5898- c = hc[2 + (ofs)]; \
5899- tota2 += decode[4 + (ofs) * 2] * c; \
5900- totb2 += decode[5 + (ofs) * 2] * c; \
5901- c = hc[3 + (ofs)]; \
5902- tota3 += decode[6 + (ofs) * 2] * c; \
5903- totb3 += decode[7 + (ofs) * 2] * c;
5904-
5905-#define stbir__1_coeff_remnant(ofs) \
5906- c = hc[0 + (ofs)]; \
5907- tota0 += decode[0 + (ofs) * 2] * c; \
5908- totb0 += decode[1 + (ofs) * 2] * c;
5909-
5910-#define stbir__2_coeff_remnant(ofs) \
5911- c = hc[0 + (ofs)]; \
5912- tota0 += decode[0 + (ofs) * 2] * c; \
5913- totb0 += decode[1 + (ofs) * 2] * c; \
5914- c = hc[1 + (ofs)]; \
5915- tota1 += decode[2 + (ofs) * 2] * c; \
5916- totb1 += decode[3 + (ofs) * 2] * c;
5917-
5918-#define stbir__3_coeff_remnant(ofs) \
5919- c = hc[0 + (ofs)]; \
5920- tota0 += decode[0 + (ofs) * 2] * c; \
5921- totb0 += decode[1 + (ofs) * 2] * c; \
5922- c = hc[1 + (ofs)]; \
5923- tota1 += decode[2 + (ofs) * 2] * c; \
5924- totb1 += decode[3 + (ofs) * 2] * c; \
5925- c = hc[2 + (ofs)]; \
5926- tota2 += decode[4 + (ofs) * 2] * c; \
5927- totb2 += decode[5 + (ofs) * 2] * c;
5928-
5929-#define stbir__store_output() \
5930- output[0] = (tota0 + tota2) + (tota1 + tota3); \
5931- output[1] = (totb0 + totb2) + (totb1 + totb3); \
5932- horizontal_coefficients += coefficient_width; \
5933- ++horizontal_contributors; \
5934- output += 2;
5935-
5936-#endif
5937-
5938-#define STBIR__horizontal_channels 2
5939-#define STB_IMAGE_RESIZE_DO_HORIZONTALS
5940-#include STBIR__HEADER_FILENAME
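The 2-channel SIMD macros duplicate each coefficient into a pair (`stbir__simdf_0123to0011`) so an interleaved `a0 b0 a1 b1 ...` row can be multiplied lane by lane, then fold lanes 2 and 3 onto lanes 0 and 1 (`0123to2301` + add). A scalar emulation of that idea, with a hypothetical `filter_2ch` and the simplifying assumption of an even tap count:

```c
#include <assert.h>

/* Hypothetical emulation of the 2-channel SIMD layout in scalar code. */
static void filter_2ch(const float *decode, const float *hc, int n,
                       float out[2])
{
    float lanes[4] = { 0.0f, 0.0f, 0.0f, 0.0f };
    assert(n % 2 == 0); /* sketch handles whole pairs of taps only */
    for (int i = 0; i < n; i += 2) {
        /* coefficients broadcast as (h_i, h_i, h_i+1, h_i+1) */
        const float c[4] = { hc[i], hc[i], hc[i + 1], hc[i + 1] };
        for (int l = 0; l < 4; ++l)
            lanes[l] += c[l] * decode[2 * i + l];
    }
    out[0] = lanes[0] + lanes[2]; /* channel a */
    out[1] = lanes[1] + lanes[3]; /* channel b */
}
```

For `decode = {1,10, 2,20, 3,30, 4,40}` and `hc = {1, 2, 3, 4}` this yields `{30, 300}`.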
5941-
5942-//=================
5943-// Do 3 channel horizontal routines
5944-
5945-#ifdef STBIR_SIMD
5946-
5947-#define stbir__1_coeff_only() \
5948- stbir__simdf tot, c, d; \
5949- STBIR_SIMD_NO_UNROLL(decode); \
5950- stbir__simdf_load1z(c, hc); \
5951- stbir__simdf_0123to0001(c, c); \
5952- stbir__simdf_load(d, decode); \
5953- stbir__simdf_mult(tot, d, c);
5954-
5955-#define stbir__2_coeff_only() \
5956- stbir__simdf tot, c, cs, d; \
5957- STBIR_SIMD_NO_UNROLL(decode); \
5958- stbir__simdf_load2(cs, hc); \
5959- stbir__simdf_0123to0000(c, cs); \
5960- stbir__simdf_load(d, decode); \
5961- stbir__simdf_mult(tot, d, c); \
5962- stbir__simdf_0123to1111(c, cs); \
5963- stbir__simdf_load(d, decode + 3); \
5964- stbir__simdf_madd(tot, tot, d, c);
5965-
5966-#define stbir__3_coeff_only() \
5967- stbir__simdf tot, c, d, cs; \
5968- STBIR_SIMD_NO_UNROLL(decode); \
5969- stbir__simdf_load(cs, hc); \
5970- stbir__simdf_0123to0000(c, cs); \
5971- stbir__simdf_load(d, decode); \
5972- stbir__simdf_mult(tot, d, c); \
5973- stbir__simdf_0123to1111(c, cs); \
5974- stbir__simdf_load(d, decode + 3); \
5975- stbir__simdf_madd(tot, tot, d, c); \
5976- stbir__simdf_0123to2222(c, cs); \
5977- stbir__simdf_load(d, decode + 6); \
5978- stbir__simdf_madd(tot, tot, d, c);
5979-
5980-#define stbir__store_output_tiny() \
5981- stbir__simdf_store2(output, tot); \
5982- stbir__simdf_0123to2301(tot, tot); \
5983- stbir__simdf_store1(output + 2, tot); \
5984- horizontal_coefficients += coefficient_width; \
5985- ++horizontal_contributors; \
5986- output += 3;
5987-
5988-#ifdef STBIR_SIMD8
5989-
5990-// we load the XXXYYY decode starting one float early (decode - 1) so that
5991-// XXX and YYY land in different halves of the AVX register
5992-#define stbir__4_coeff_start() \
5993- stbir__simdf8 tot0, tot1, c, cs; \
5994- stbir__simdf t; \
5995- STBIR_SIMD_NO_UNROLL(decode); \
5996- stbir__simdf8_load4b(cs, hc); \
5997- stbir__simdf8_0123to00001111(c, cs); \
5998- stbir__simdf8_mult_mem(tot0, c, decode - 1); \
5999- stbir__simdf8_0123to22223333(c, cs); \
6000- stbir__simdf8_mult_mem(tot1, c, decode + 6 - 1);
6001-
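The `decode - 1` load in the 3-channel `STBIR_SIMD8` path puts pixel 0's RGB in lanes 1..3 and pixel 1's RGB in lanes 4..6; lanes 0 and 7 hold neighbouring values that end up as don't-care results. The scalar emulation below is a sketch under two assumptions: that `stbir__simdf8_add4halves(out, a, b)` computes `a` plus the upper half of `b` (as its call sites suggest), and that the caller provides readable floats at `decode[-1]` and `decode[12]`. The function name is hypothetical.

```c
#include <assert.h>

/* Hypothetical emulation of 4_coeff_start + store_output for 3 channels. */
static void filter_3ch_avx_layout(const float *decode, const float *hc,
                                  float out[4])
{
    float tot0[8], tot1[8];
    for (int l = 0; l < 8; ++l) {
        /* 0123to00001111 and 0123to22223333 coefficient broadcasts */
        tot0[l] = hc[l < 4 ? 0 : 1] * decode[-1 + l];
        tot1[l] = hc[l < 4 ? 2 : 3] * decode[6 - 1 + l];
    }
    for (int l = 0; l < 8; ++l)
        tot0[l] += tot1[l];
    /* 0123to1230 on the low half, plus the high half */
    for (int l = 0; l < 4; ++l)
        out[l] = tot0[(l + 1) % 4] + tot0[4 + l];
    /* out[0..2] is RGB; out[3] is the don't-care lane */
}
```

With four unit coefficients over pixels `(1,2,3) (4,5,6) (7,8,9) (10,11,12)`, `out[0..2]` is `{22, 26, 30}`.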
6002-#define stbir__4_coeff_continue_from_4(ofs) \
6003- STBIR_SIMD_NO_UNROLL(decode); \
6004- stbir__simdf8_load4b(cs, hc + (ofs)); \
6005- stbir__simdf8_0123to00001111(c, cs); \
6006- stbir__simdf8_madd_mem(tot0, tot0, c, decode + (ofs) * 3 - 1); \
6007- stbir__simdf8_0123to22223333(c, cs); \
6008- stbir__simdf8_madd_mem(tot1, tot1, c, decode + (ofs) * 3 + 6 - 1);
6009-
6010-#define stbir__1_coeff_remnant(ofs) \
6011- STBIR_SIMD_NO_UNROLL(decode); \
6012- stbir__simdf_load1rep4(t, hc + (ofs)); \
6013- stbir__simdf8_madd_mem4(tot0, tot0, t, decode + (ofs) * 3 - 1);
6014-
6015-#define stbir__2_coeff_remnant(ofs) \
6016- STBIR_SIMD_NO_UNROLL(decode); \
6017- stbir__simdf8_load4b(cs, hc + (ofs) - 2); \
6018- stbir__simdf8_0123to22223333(c, cs); \
6019- stbir__simdf8_madd_mem(tot0, tot0, c, decode + (ofs) * 3 - 1);
6020-
6021-#define stbir__3_coeff_remnant(ofs) \
6022- STBIR_SIMD_NO_UNROLL(decode); \
6023- stbir__simdf8_load4b(cs, hc + (ofs)); \
6024- stbir__simdf8_0123to00001111(c, cs); \
6025- stbir__simdf8_madd_mem(tot0, tot0, c, decode + (ofs) * 3 - 1); \
6026- stbir__simdf8_0123to2222(t, cs); \
6027- stbir__simdf8_madd_mem4(tot1, tot1, t, decode + (ofs) * 3 + 6 - 1);
6028-
6029-#define stbir__store_output() \
6030- stbir__simdf8_add(tot0, tot0, tot1); \
6031- stbir__simdf_0123to1230(t, stbir__if_simdf8_cast_to_simdf4(tot0)); \
6032- stbir__simdf8_add4halves(t, t, tot0); \
6033- horizontal_coefficients += coefficient_width; \
6034- ++horizontal_contributors; \
6035- output += 3; \
6036- if (output < output_end) { \
6037- stbir__simdf_store(output - 3, t); \
6038- continue; \
6039- } \
6040- { \
6041- stbir__simdf tt; \
6042- stbir__simdf_0123to2301(tt, t); \
6043- stbir__simdf_store2(output - 3, t); \
6044- stbir__simdf_store1(output + 2 - 3, tt); \
6045- } \
6046- break;
6047-
6048-#else
6049-
6050-#define stbir__4_coeff_start() \
6051- stbir__simdf tot0, tot1, tot2, c, cs; \
6052- STBIR_SIMD_NO_UNROLL(decode); \
6053- stbir__simdf_load(cs, hc); \
6054- stbir__simdf_0123to0001(c, cs); \
6055- stbir__simdf_mult_mem(tot0, c, decode); \
6056- stbir__simdf_0123to1122(c, cs); \
6057- stbir__simdf_mult_mem(tot1, c, decode + 4); \
6058- stbir__simdf_0123to2333(c, cs); \
6059- stbir__simdf_mult_mem(tot2, c, decode + 8);
6060-
6061-#define stbir__4_coeff_continue_from_4(ofs) \
6062- STBIR_SIMD_NO_UNROLL(decode); \
6063- stbir__simdf_load(cs, hc + (ofs)); \
6064- stbir__simdf_0123to0001(c, cs); \
6065- stbir__simdf_madd_mem(tot0, tot0, c, decode + (ofs) * 3); \
6066- stbir__simdf_0123to1122(c, cs); \
6067- stbir__simdf_madd_mem(tot1, tot1, c, decode + (ofs) * 3 + 4); \
6068- stbir__simdf_0123to2333(c, cs); \
6069- stbir__simdf_madd_mem(tot2, tot2, c, decode + (ofs) * 3 + 8);
6070-
6071-#define stbir__1_coeff_remnant(ofs) \
6072- STBIR_SIMD_NO_UNROLL(decode); \
6073- stbir__simdf_load1z(c, hc + (ofs)); \
6074- stbir__simdf_0123to0001(c, c); \
6075- stbir__simdf_madd_mem(tot0, tot0, c, decode + (ofs) * 3);
6076-
6077-#define stbir__2_coeff_remnant(ofs) \
6078- { \
6079- stbir__simdf d; \
6080- STBIR_SIMD_NO_UNROLL(decode); \
6081- stbir__simdf_load2z(cs, hc + (ofs)); \
6082- stbir__simdf_0123to0001(c, cs); \
6083- stbir__simdf_madd_mem(tot0, tot0, c, decode + (ofs) * 3); \
6084- stbir__simdf_0123to1122(c, cs); \
6085- stbir__simdf_load2z(d, decode + (ofs) * 3 + 4); \
6086- stbir__simdf_madd(tot1, tot1, c, d); \
6087- }
6088-
6089-#define stbir__3_coeff_remnant(ofs) \
6090- { \
6091- stbir__simdf d; \
6092- STBIR_SIMD_NO_UNROLL(decode); \
6093- stbir__simdf_load(cs, hc + (ofs)); \
6094- stbir__simdf_0123to0001(c, cs); \
6095- stbir__simdf_madd_mem(tot0, tot0, c, decode + (ofs) * 3); \
6096- stbir__simdf_0123to1122(c, cs); \
6097- stbir__simdf_madd_mem(tot1, tot1, c, decode + (ofs) * 3 + 4); \
6098- stbir__simdf_0123to2222(c, cs); \
6099- stbir__simdf_load1z(d, decode + (ofs) * 3 + 8); \
6100- stbir__simdf_madd(tot2, tot2, c, d); \
6101- }
6102-
6103-#define stbir__store_output() \
6104- stbir__simdf_0123ABCDto3ABx(c, tot0, tot1); \
6105- stbir__simdf_0123ABCDto23Ax(cs, tot1, tot2); \
6106- stbir__simdf_0123to1230(tot2, tot2); \
6107- stbir__simdf_add(tot0, tot0, cs); \
6108- stbir__simdf_add(c, c, tot2); \
6109- stbir__simdf_add(tot0, tot0, c); \
6110- horizontal_coefficients += coefficient_width; \
6111- ++horizontal_contributors; \
6112- output += 3; \
6113- if (output < output_end) { \
6114- stbir__simdf_store(output - 3, tot0); \
6115- continue; \
6116- } \
6117- stbir__simdf_0123to2301(tot1, tot0); \
6118- stbir__simdf_store2(output - 3, tot0); \
6119- stbir__simdf_store1(output + 2 - 3, tot1); \
6120- break;
6121-
6122-#endif
6123-
6124-#else
6125-
6126-#define stbir__1_coeff_only() \
6127- float tot0, tot1, tot2, c; \
6128- c = hc[0]; \
6129- tot0 = decode[0] * c; \
6130- tot1 = decode[1] * c; \
6131- tot2 = decode[2] * c;
6132-
6133-#define stbir__2_coeff_only() \
6134- float tot0, tot1, tot2, c; \
6135- c = hc[0]; \
6136- tot0 = decode[0] * c; \
6137- tot1 = decode[1] * c; \
6138- tot2 = decode[2] * c; \
6139- c = hc[1]; \
6140- tot0 += decode[3] * c; \
6141- tot1 += decode[4] * c; \
6142- tot2 += decode[5] * c;
6143-
6144-#define stbir__3_coeff_only() \
6145- float tot0, tot1, tot2, c; \
6146- c = hc[0]; \
6147- tot0 = decode[0] * c; \
6148- tot1 = decode[1] * c; \
6149- tot2 = decode[2] * c; \
6150- c = hc[1]; \
6151- tot0 += decode[3] * c; \
6152- tot1 += decode[4] * c; \
6153- tot2 += decode[5] * c; \
6154- c = hc[2]; \
6155- tot0 += decode[6] * c; \
6156- tot1 += decode[7] * c; \
6157- tot2 += decode[8] * c;
6158-
6159-#define stbir__store_output_tiny() \
6160- output[0] = tot0; \
6161- output[1] = tot1; \
6162- output[2] = tot2; \
6163- horizontal_coefficients += coefficient_width; \
6164- ++horizontal_contributors; \
6165- output += 3;
6166-
6167-#define stbir__4_coeff_start() \
6168- float tota0, tota1, tota2, totb0, totb1, totb2, totc0, totc1, totc2, \
6169- totd0, totd1, totd2, c; \
6170- c = hc[0]; \
6171- tota0 = decode[0] * c; \
6172- tota1 = decode[1] * c; \
6173- tota2 = decode[2] * c; \
6174- c = hc[1]; \
6175- totb0 = decode[3] * c; \
6176- totb1 = decode[4] * c; \
6177- totb2 = decode[5] * c; \
6178- c = hc[2]; \
6179- totc0 = decode[6] * c; \
6180- totc1 = decode[7] * c; \
6181- totc2 = decode[8] * c; \
6182- c = hc[3]; \
6183- totd0 = decode[9] * c; \
6184- totd1 = decode[10] * c; \
6185- totd2 = decode[11] * c;
6186-
6187-#define stbir__4_coeff_continue_from_4(ofs) \
6188- c = hc[0 + (ofs)]; \
6189- tota0 += decode[0 + (ofs) * 3] * c; \
6190- tota1 += decode[1 + (ofs) * 3] * c; \
6191- tota2 += decode[2 + (ofs) * 3] * c; \
6192- c = hc[1 + (ofs)]; \
6193- totb0 += decode[3 + (ofs) * 3] * c; \
6194- totb1 += decode[4 + (ofs) * 3] * c; \
6195- totb2 += decode[5 + (ofs) * 3] * c; \
6196- c = hc[2 + (ofs)]; \
6197- totc0 += decode[6 + (ofs) * 3] * c; \
6198- totc1 += decode[7 + (ofs) * 3] * c; \
6199- totc2 += decode[8 + (ofs) * 3] * c; \
6200- c = hc[3 + (ofs)]; \
6201- totd0 += decode[9 + (ofs) * 3] * c; \
6202- totd1 += decode[10 + (ofs) * 3] * c; \
6203- totd2 += decode[11 + (ofs) * 3] * c;
6204-
6205-#define stbir__1_coeff_remnant(ofs) \
6206- c = hc[0 + (ofs)]; \
6207- tota0 += decode[0 + (ofs) * 3] * c; \
6208- tota1 += decode[1 + (ofs) * 3] * c; \
6209- tota2 += decode[2 + (ofs) * 3] * c;
6210-
6211-#define stbir__2_coeff_remnant(ofs) \
6212- c = hc[0 + (ofs)]; \
6213- tota0 += decode[0 + (ofs) * 3] * c; \
6214- tota1 += decode[1 + (ofs) * 3] * c; \
6215- tota2 += decode[2 + (ofs) * 3] * c; \
6216- c = hc[1 + (ofs)]; \
6217- totb0 += decode[3 + (ofs) * 3] * c; \
6218- totb1 += decode[4 + (ofs) * 3] * c; \
6219- totb2 += decode[5 + (ofs) * 3] * c;
6220-
6221-#define stbir__3_coeff_remnant(ofs) \
6222- c = hc[0 + (ofs)]; \
6223- tota0 += decode[0 + (ofs) * 3] * c; \
6224- tota1 += decode[1 + (ofs) * 3] * c; \
6225- tota2 += decode[2 + (ofs) * 3] * c; \
6226- c = hc[1 + (ofs)]; \
6227- totb0 += decode[3 + (ofs) * 3] * c; \
6228- totb1 += decode[4 + (ofs) * 3] * c; \
6229- totb2 += decode[5 + (ofs) * 3] * c; \
6230- c = hc[2 + (ofs)]; \
6231- totc0 += decode[6 + (ofs) * 3] * c; \
6232- totc1 += decode[7 + (ofs) * 3] * c; \
6233- totc2 += decode[8 + (ofs) * 3] * c;
6234-
6235-#define stbir__store_output() \
6236- output[0] = (tota0 + totc0) + (totb0 + totd0); \
6237- output[1] = (tota1 + totc1) + (totb1 + totd1); \
6238- output[2] = (tota2 + totc2) + (totb2 + totd2); \
6239- horizontal_coefficients += coefficient_width; \
6240- ++horizontal_contributors; \
6241- output += 3;
6242-
6243-#endif
6244-
6245-#define STBIR__horizontal_channels 3
6246-#define STB_IMAGE_RESIZE_DO_HORIZONTALS
6247-#include STBIR__HEADER_FILENAME
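The 3-channel SIMD `stbir__store_output` writes a full 4-float vector while more pixels follow (the junk fourth lane is overwritten by the next pixel) and only 2 + 1 floats for the last pixel, so nothing past the row is touched. A sketch of that overlapped-store pattern, assuming `output_end` marks the end of the row; `store_rgb_row` and its flat `results` layout (4 floats per pixel) are hypothetical:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical sketch: results holds count vectors of 4 floats each,
   the 4th float of every vector being junk. */
static void store_rgb_row(float *output, const float *results, int count)
{
    float *output_end = output + 3 * count;
    assert(count > 0);
    for (int i = 0; i < count; ++i) {
        output += 3;
        if (output < output_end) {
            /* wide store: the junk lane lands on the next pixel's slot */
            memcpy(output - 3, results + 4 * i, 4 * sizeof(float));
            continue;
        }
        /* last pixel: write exactly 3 floats (store2 + store1 upstream) */
        memcpy(output - 3, results + 4 * i, 3 * sizeof(float));
        break;
    }
}
```

A sentinel placed just after the row stays untouched, while every pixel's junk lane is overwritten by its successor.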
6248-
6249-//=================
6250-// Do 4 channel horizontal routines
6251-
6252-#ifdef STBIR_SIMD
6253-
6254-#define stbir__1_coeff_only() \
6255- stbir__simdf tot, c; \
6256- STBIR_SIMD_NO_UNROLL(decode); \
6257- stbir__simdf_load1(c, hc); \
6258- stbir__simdf_0123to0000(c, c); \
6259- stbir__simdf_mult_mem(tot, c, decode);
6260-
6261-#define stbir__2_coeff_only() \
6262- stbir__simdf tot, c, cs; \
6263- STBIR_SIMD_NO_UNROLL(decode); \
6264- stbir__simdf_load2(cs, hc); \
6265- stbir__simdf_0123to0000(c, cs); \
6266- stbir__simdf_mult_mem(tot, c, decode); \
6267- stbir__simdf_0123to1111(c, cs); \
6268- stbir__simdf_madd_mem(tot, tot, c, decode + 4);
6269-
6270-#define stbir__3_coeff_only() \
6271- stbir__simdf tot, c, cs; \
6272- STBIR_SIMD_NO_UNROLL(decode); \
6273- stbir__simdf_load(cs, hc); \
6274- stbir__simdf_0123to0000(c, cs); \
6275- stbir__simdf_mult_mem(tot, c, decode); \
6276- stbir__simdf_0123to1111(c, cs); \
6277- stbir__simdf_madd_mem(tot, tot, c, decode + 4); \
6278- stbir__simdf_0123to2222(c, cs); \
6279- stbir__simdf_madd_mem(tot, tot, c, decode + 8);
6280-
6281-#define stbir__store_output_tiny() \
6282- stbir__simdf_store(output, tot); \
6283- horizontal_coefficients += coefficient_width; \
6284- ++horizontal_contributors; \
6285- output += 4;
6286-
6287-#ifdef STBIR_SIMD8
6288-
6289-#define stbir__4_coeff_start() \
6290- stbir__simdf8 tot0, c, cs; \
6291- stbir__simdf t; \
6292- STBIR_SIMD_NO_UNROLL(decode); \
6293- stbir__simdf8_load4b(cs, hc); \
6294- stbir__simdf8_0123to00001111(c, cs); \
6295- stbir__simdf8_mult_mem(tot0, c, decode); \
6296- stbir__simdf8_0123to22223333(c, cs); \
6297- stbir__simdf8_madd_mem(tot0, tot0, c, decode + 8);
6298-
6299-#define stbir__4_coeff_continue_from_4(ofs) \
6300- STBIR_SIMD_NO_UNROLL(decode); \
6301- stbir__simdf8_load4b(cs, hc + (ofs)); \
6302- stbir__simdf8_0123to00001111(c, cs); \
6303- stbir__simdf8_madd_mem(tot0, tot0, c, decode + (ofs) * 4); \
6304- stbir__simdf8_0123to22223333(c, cs); \
6305- stbir__simdf8_madd_mem(tot0, tot0, c, decode + (ofs) * 4 + 8);
6306-
6307-#define stbir__1_coeff_remnant(ofs) \
6308- STBIR_SIMD_NO_UNROLL(decode); \
6309- stbir__simdf_load1rep4(t, hc + (ofs)); \
6310- stbir__simdf8_madd_mem4(tot0, tot0, t, decode + (ofs) * 4);
6311-
6312-#define stbir__2_coeff_remnant(ofs) \
6313- STBIR_SIMD_NO_UNROLL(decode); \
6314- stbir__simdf8_load4b(cs, hc + (ofs) - 2); \
6315- stbir__simdf8_0123to22223333(c, cs); \
6316- stbir__simdf8_madd_mem(tot0, tot0, c, decode + (ofs) * 4);
6317-
6318-#define stbir__3_coeff_remnant(ofs) \
6319- STBIR_SIMD_NO_UNROLL(decode); \
6320- stbir__simdf8_load4b(cs, hc + (ofs)); \
6321- stbir__simdf8_0123to00001111(c, cs); \
6322- stbir__simdf8_madd_mem(tot0, tot0, c, decode + (ofs) * 4); \
6323- stbir__simdf8_0123to2222(t, cs); \
6324- stbir__simdf8_madd_mem4(tot0, tot0, t, decode + (ofs) * 4 + 8);
6325-
6326-#define stbir__store_output() \
6327- stbir__simdf8_add4halves(t, stbir__if_simdf8_cast_to_simdf4(tot0), tot0); \
6328- stbir__simdf_store(output, t); \
6329- horizontal_coefficients += coefficient_width; \
6330- ++horizontal_contributors; \
6331- output += 4;
6332-
6333-#else
6334-
6335-#define stbir__4_coeff_start() \
6336- stbir__simdf tot0, tot1, c, cs; \
6337- STBIR_SIMD_NO_UNROLL(decode); \
6338- stbir__simdf_load(cs, hc); \
6339- stbir__simdf_0123to0000(c, cs); \
6340- stbir__simdf_mult_mem(tot0, c, decode); \
6341- stbir__simdf_0123to1111(c, cs); \
6342- stbir__simdf_mult_mem(tot1, c, decode + 4); \
6343- stbir__simdf_0123to2222(c, cs); \
6344- stbir__simdf_madd_mem(tot0, tot0, c, decode + 8); \
6345- stbir__simdf_0123to3333(c, cs); \
6346- stbir__simdf_madd_mem(tot1, tot1, c, decode + 12);
6347-
6348-#define stbir__4_coeff_continue_from_4(ofs) \
6349- STBIR_SIMD_NO_UNROLL(decode); \
6350- stbir__simdf_load(cs, hc + (ofs)); \
6351- stbir__simdf_0123to0000(c, cs); \
6352- stbir__simdf_madd_mem(tot0, tot0, c, decode + (ofs) * 4); \
6353- stbir__simdf_0123to1111(c, cs); \
6354- stbir__simdf_madd_mem(tot1, tot1, c, decode + (ofs) * 4 + 4); \
6355- stbir__simdf_0123to2222(c, cs); \
6356- stbir__simdf_madd_mem(tot0, tot0, c, decode + (ofs) * 4 + 8); \
6357- stbir__simdf_0123to3333(c, cs); \
6358- stbir__simdf_madd_mem(tot1, tot1, c, decode + (ofs) * 4 + 12);
6359-
6360-#define stbir__1_coeff_remnant(ofs) \
6361- STBIR_SIMD_NO_UNROLL(decode); \
6362- stbir__simdf_load1(c, hc + (ofs)); \
6363- stbir__simdf_0123to0000(c, c); \
6364- stbir__simdf_madd_mem(tot0, tot0, c, decode + (ofs) * 4);
6365-
6366-#define stbir__2_coeff_remnant(ofs) \
6367- STBIR_SIMD_NO_UNROLL(decode); \
6368- stbir__simdf_load2(cs, hc + (ofs)); \
6369- stbir__simdf_0123to0000(c, cs); \
6370- stbir__simdf_madd_mem(tot0, tot0, c, decode + (ofs) * 4); \
6371- stbir__simdf_0123to1111(c, cs); \
6372- stbir__simdf_madd_mem(tot1, tot1, c, decode + (ofs) * 4 + 4);
6373-
6374-#define stbir__3_coeff_remnant(ofs) \
6375- STBIR_SIMD_NO_UNROLL(decode); \
6376- stbir__simdf_load(cs, hc + (ofs)); \
6377- stbir__simdf_0123to0000(c, cs); \
6378- stbir__simdf_madd_mem(tot0, tot0, c, decode + (ofs) * 4); \
6379- stbir__simdf_0123to1111(c, cs); \
6380- stbir__simdf_madd_mem(tot1, tot1, c, decode + (ofs) * 4 + 4); \
6381- stbir__simdf_0123to2222(c, cs); \
6382- stbir__simdf_madd_mem(tot0, tot0, c, decode + (ofs) * 4 + 8);
6383-
6384-#define stbir__store_output() \
6385- stbir__simdf_add(tot0, tot0, tot1); \
6386- stbir__simdf_store(output, tot0); \
6387- horizontal_coefficients += coefficient_width; \
6388- ++horizontal_contributors; \
6389- output += 4;
6390-
6391-#endif
6392-
6393-#else
6394-
6395-#define stbir__1_coeff_only() \
6396- float p0, p1, p2, p3, c; \
6397- STBIR_SIMD_NO_UNROLL(decode); \
6398- c = hc[0]; \
6399- p0 = decode[0] * c; \
6400- p1 = decode[1] * c; \
6401- p2 = decode[2] * c; \
6402- p3 = decode[3] * c;
6403-
6404-#define stbir__2_coeff_only() \
6405- float p0, p1, p2, p3, c; \
6406- STBIR_SIMD_NO_UNROLL(decode); \
6407- c = hc[0]; \
6408- p0 = decode[0] * c; \
6409- p1 = decode[1] * c; \
6410- p2 = decode[2] * c; \
6411- p3 = decode[3] * c; \
6412- c = hc[1]; \
6413- p0 += decode[4] * c; \
6414- p1 += decode[5] * c; \
6415- p2 += decode[6] * c; \
6416- p3 += decode[7] * c;
6417-
6418-#define stbir__3_coeff_only() \
6419- float p0, p1, p2, p3, c; \
6420- STBIR_SIMD_NO_UNROLL(decode); \
6421- c = hc[0]; \
6422- p0 = decode[0] * c; \
6423- p1 = decode[1] * c; \
6424- p2 = decode[2] * c; \
6425- p3 = decode[3] * c; \
6426- c = hc[1]; \
6427- p0 += decode[4] * c; \
6428- p1 += decode[5] * c; \
6429- p2 += decode[6] * c; \
6430- p3 += decode[7] * c; \
6431- c = hc[2]; \
6432- p0 += decode[8] * c; \
6433- p1 += decode[9] * c; \
6434- p2 += decode[10] * c; \
6435- p3 += decode[11] * c;
6436-
6437-#define stbir__store_output_tiny() \
6438- output[0] = p0; \
6439- output[1] = p1; \
6440- output[2] = p2; \
6441- output[3] = p3; \
6442- horizontal_coefficients += coefficient_width; \
6443- ++horizontal_contributors; \
6444- output += 4;
6445-
6446-#define stbir__4_coeff_start() \
6447- float x0, x1, x2, x3, y0, y1, y2, y3, c; \
6448- STBIR_SIMD_NO_UNROLL(decode); \
6449- c = hc[0]; \
6450- x0 = decode[0] * c; \
6451- x1 = decode[1] * c; \
6452- x2 = decode[2] * c; \
6453- x3 = decode[3] * c; \
6454- c = hc[1]; \
6455- y0 = decode[4] * c; \
6456- y1 = decode[5] * c; \
6457- y2 = decode[6] * c; \
6458- y3 = decode[7] * c; \
6459- c = hc[2]; \
6460- x0 += decode[8] * c; \
6461- x1 += decode[9] * c; \
6462- x2 += decode[10] * c; \
6463- x3 += decode[11] * c; \
6464- c = hc[3]; \
6465- y0 += decode[12] * c; \
6466- y1 += decode[13] * c; \
6467- y2 += decode[14] * c; \
6468- y3 += decode[15] * c;
6469-
6470-#define stbir__4_coeff_continue_from_4(ofs) \
6471- STBIR_SIMD_NO_UNROLL(decode); \
6472- c = hc[0 + (ofs)]; \
6473- x0 += decode[0 + (ofs) * 4] * c; \
6474- x1 += decode[1 + (ofs) * 4] * c; \
6475- x2 += decode[2 + (ofs) * 4] * c; \
6476- x3 += decode[3 + (ofs) * 4] * c; \
6477- c = hc[1 + (ofs)]; \
6478- y0 += decode[4 + (ofs) * 4] * c; \
6479- y1 += decode[5 + (ofs) * 4] * c; \
6480- y2 += decode[6 + (ofs) * 4] * c; \
6481- y3 += decode[7 + (ofs) * 4] * c; \
6482- c = hc[2 + (ofs)]; \
6483- x0 += decode[8 + (ofs) * 4] * c; \
6484- x1 += decode[9 + (ofs) * 4] * c; \
6485- x2 += decode[10 + (ofs) * 4] * c; \
6486- x3 += decode[11 + (ofs) * 4] * c; \
6487- c = hc[3 + (ofs)]; \
6488- y0 += decode[12 + (ofs) * 4] * c; \
6489- y1 += decode[13 + (ofs) * 4] * c; \
6490- y2 += decode[14 + (ofs) * 4] * c; \
6491- y3 += decode[15 + (ofs) * 4] * c;
6492-
6493-#define stbir__1_coeff_remnant(ofs) \
6494- STBIR_SIMD_NO_UNROLL(decode); \
6495- c = hc[0 + (ofs)]; \
6496- x0 += decode[0 + (ofs) * 4] * c; \
6497- x1 += decode[1 + (ofs) * 4] * c; \
6498- x2 += decode[2 + (ofs) * 4] * c; \
6499- x3 += decode[3 + (ofs) * 4] * c;
6500-
6501-#define stbir__2_coeff_remnant(ofs) \
6502- STBIR_SIMD_NO_UNROLL(decode); \
6503- c = hc[0 + (ofs)]; \
6504- x0 += decode[0 + (ofs) * 4] * c; \
6505- x1 += decode[1 + (ofs) * 4] * c; \
6506- x2 += decode[2 + (ofs) * 4] * c; \
6507- x3 += decode[3 + (ofs) * 4] * c; \
6508- c = hc[1 + (ofs)]; \
6509- y0 += decode[4 + (ofs) * 4] * c; \
6510- y1 += decode[5 + (ofs) * 4] * c; \
6511- y2 += decode[6 + (ofs) * 4] * c; \
6512- y3 += decode[7 + (ofs) * 4] * c;
6513-
6514-#define stbir__3_coeff_remnant(ofs) \
6515- STBIR_SIMD_NO_UNROLL(decode); \
6516- c = hc[0 + (ofs)]; \
6517- x0 += decode[0 + (ofs) * 4] * c; \
6518- x1 += decode[1 + (ofs) * 4] * c; \
6519- x2 += decode[2 + (ofs) * 4] * c; \
6520- x3 += decode[3 + (ofs) * 4] * c; \
6521- c = hc[1 + (ofs)]; \
6522- y0 += decode[4 + (ofs) * 4] * c; \
6523- y1 += decode[5 + (ofs) * 4] * c; \
6524- y2 += decode[6 + (ofs) * 4] * c; \
6525- y3 += decode[7 + (ofs) * 4] * c; \
6526- c = hc[2 + (ofs)]; \
6527- x0 += decode[8 + (ofs) * 4] * c; \
6528- x1 += decode[9 + (ofs) * 4] * c; \
6529- x2 += decode[10 + (ofs) * 4] * c; \
6530- x3 += decode[11 + (ofs) * 4] * c;
6531-
6532-#define stbir__store_output() \
6533- output[0] = x0 + y0; \
6534- output[1] = x1 + y1; \
6535- output[2] = x2 + y2; \
6536- output[3] = x3 + y3; \
6537- horizontal_coefficients += coefficient_width; \
6538- ++horizontal_contributors; \
6539- output += 4;
6540-
6541-#endif
6542-
6543-#define STBIR__horizontal_channels 4
6544-#define STB_IMAGE_RESIZE_DO_HORIZONTALS
6545-#include STBIR__HEADER_FILENAME
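In the scalar 4-channel macros, even-indexed taps accumulate into `x0..x3` and odd-indexed taps into `y0..y3`, and the halves are summed once in `stbir__store_output`. Presumably this gives two independent dependency chains per channel. A compact sketch of the same split (the function `filter_4ch` is hypothetical):

```c
#include <assert.h>

/* Hypothetical scalar sketch of the 4-channel x/y accumulator split. */
static void filter_4ch(const float *decode, const float *hc, int n,
                       float out[4])
{
    float x[4] = { 0.0f, 0.0f, 0.0f, 0.0f };
    float y[4] = { 0.0f, 0.0f, 0.0f, 0.0f };
    assert(n > 0);
    for (int i = 0; i < n; ++i) {
        float *acc = (i % 2 == 0) ? x : y; /* even tap -> x, odd tap -> y */
        for (int ch = 0; ch < 4; ++ch)
            acc[ch] += decode[4 * i + ch] * hc[i];
    }
    for (int ch = 0; ch < 4; ++ch)
        out[ch] = x[ch] + y[ch]; /* as in output[0] = x0 + y0 */
}
```

Three unit taps over RGBA pixels `(1,2,3,4) (10,20,30,40) (100,200,300,400)` give `{111, 222, 333, 444}`.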
6546-
6547-//=================
6548-// Do 7 channel horizontal routines
6549-
6550-#ifdef STBIR_SIMD
6551-
6552-#define stbir__1_coeff_only() \
6553- stbir__simdf tot0, tot1, c; \
6554- STBIR_SIMD_NO_UNROLL(decode); \
6555- stbir__simdf_load1(c, hc); \
6556- stbir__simdf_0123to0000(c, c); \
6557- stbir__simdf_mult_mem(tot0, c, decode); \
6558- stbir__simdf_mult_mem(tot1, c, decode + 3);
6559-
6560-#define stbir__2_coeff_only() \
6561- stbir__simdf tot0, tot1, c, cs; \
6562- STBIR_SIMD_NO_UNROLL(decode); \
6563- stbir__simdf_load2(cs, hc); \
6564- stbir__simdf_0123to0000(c, cs); \
6565- stbir__simdf_mult_mem(tot0, c, decode); \
6566- stbir__simdf_mult_mem(tot1, c, decode + 3); \
6567- stbir__simdf_0123to1111(c, cs); \
6568- stbir__simdf_madd_mem(tot0, tot0, c, decode + 7); \
6569- stbir__simdf_madd_mem(tot1, tot1, c, decode + 10);
6570-
6571-#define stbir__3_coeff_only() \
6572- stbir__simdf tot0, tot1, c, cs; \
6573- STBIR_SIMD_NO_UNROLL(decode); \
6574- stbir__simdf_load(cs, hc); \
6575- stbir__simdf_0123to0000(c, cs); \
6576- stbir__simdf_mult_mem(tot0, c, decode); \
6577- stbir__simdf_mult_mem(tot1, c, decode + 3); \
6578- stbir__simdf_0123to1111(c, cs); \
6579- stbir__simdf_madd_mem(tot0, tot0, c, decode + 7); \
6580- stbir__simdf_madd_mem(tot1, tot1, c, decode + 10); \
6581- stbir__simdf_0123to2222(c, cs); \
6582- stbir__simdf_madd_mem(tot0, tot0, c, decode + 14); \
6583- stbir__simdf_madd_mem(tot1, tot1, c, decode + 17);
6584-
6585-#define stbir__store_output_tiny() \
6586- stbir__simdf_store(output + 3, tot1); \
6587- stbir__simdf_store(output, tot0); \
6588- horizontal_coefficients += coefficient_width; \
6589- ++horizontal_contributors; \
6590- output += 7;
6591-
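The 7-channel SIMD macros cover each pixel with two 4-lane vectors loaded at `decode` and `decode + 3`, which overlap on channel 3. Both vectors compute that channel identically, so `stbir__store_output_tiny` can store `output + 3` first and `output` second without harm. A scalar emulation with a hypothetical `filter_7ch`:

```c
#include <assert.h>

/* Hypothetical emulation of the 7-channel overlapping-vector layout. */
static void filter_7ch(const float *decode, const float *hc, int n,
                       float out[7])
{
    float tot0[4] = { 0.0f, 0.0f, 0.0f, 0.0f }; /* channels 0..3 */
    float tot1[4] = { 0.0f, 0.0f, 0.0f, 0.0f }; /* channels 3..6 */
    assert(n > 0);
    for (int i = 0; i < n; ++i)
        for (int l = 0; l < 4; ++l) {
            tot0[l] += hc[i] * decode[7 * i + l];
            tot1[l] += hc[i] * decode[7 * i + 3 + l];
        }
    for (int l = 0; l < 4; ++l) out[3 + l] = tot1[l]; /* store(output + 3, tot1) */
    for (int l = 0; l < 4; ++l) out[l] = tot0[l];     /* store(output, tot0) */
}
```

Two unit taps over `(1..7)` and `(10, 20, ..., 70)` give `{11, 22, ..., 77}`, with channel 3 written twice as 44.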
6592-#ifdef STBIR_SIMD8
6593-
6594-#define stbir__4_coeff_start() \
6595- stbir__simdf8 tot0, tot1, c, cs; \
6596- STBIR_SIMD_NO_UNROLL(decode); \
6597- stbir__simdf8_load4b(cs, hc); \
6598- stbir__simdf8_0123to00000000(c, cs); \
6599- stbir__simdf8_mult_mem(tot0, c, decode); \
6600- stbir__simdf8_0123to11111111(c, cs); \
6601- stbir__simdf8_mult_mem(tot1, c, decode + 7); \
6602- stbir__simdf8_0123to22222222(c, cs); \
6603- stbir__simdf8_madd_mem(tot0, tot0, c, decode + 14); \
6604- stbir__simdf8_0123to33333333(c, cs); \
6605- stbir__simdf8_madd_mem(tot1, tot1, c, decode + 21);
6606-
6607-#define stbir__4_coeff_continue_from_4(ofs) \
6608- STBIR_SIMD_NO_UNROLL(decode); \
6609- stbir__simdf8_load4b(cs, hc + (ofs)); \
6610- stbir__simdf8_0123to00000000(c, cs); \
6611- stbir__simdf8_madd_mem(tot0, tot0, c, decode + (ofs) * 7); \
6612- stbir__simdf8_0123to11111111(c, cs); \
6613- stbir__simdf8_madd_mem(tot1, tot1, c, decode + (ofs) * 7 + 7); \
6614- stbir__simdf8_0123to22222222(c, cs); \
6615- stbir__simdf8_madd_mem(tot0, tot0, c, decode + (ofs) * 7 + 14); \
6616- stbir__simdf8_0123to33333333(c, cs); \
6617- stbir__simdf8_madd_mem(tot1, tot1, c, decode + (ofs) * 7 + 21);
6618-
6619-#define stbir__1_coeff_remnant(ofs) \
6620- STBIR_SIMD_NO_UNROLL(decode); \
6621- stbir__simdf8_load1b(c, hc + (ofs)); \
6622- stbir__simdf8_madd_mem(tot0, tot0, c, decode + (ofs) * 7);
6623-
6624-#define stbir__2_coeff_remnant(ofs) \
6625- STBIR_SIMD_NO_UNROLL(decode); \
6626- stbir__simdf8_load1b(c, hc + (ofs)); \
6627- stbir__simdf8_madd_mem(tot0, tot0, c, decode + (ofs) * 7); \
6628- stbir__simdf8_load1b(c, hc + (ofs) + 1); \
6629- stbir__simdf8_madd_mem(tot1, tot1, c, decode + (ofs) * 7 + 7);
6630-
6631-#define stbir__3_coeff_remnant(ofs) \
6632- STBIR_SIMD_NO_UNROLL(decode); \
6633- stbir__simdf8_load4b(cs, hc + (ofs)); \
6634- stbir__simdf8_0123to00000000(c, cs); \
6635- stbir__simdf8_madd_mem(tot0, tot0, c, decode + (ofs) * 7); \
6636- stbir__simdf8_0123to11111111(c, cs); \
6637- stbir__simdf8_madd_mem(tot1, tot1, c, decode + (ofs) * 7 + 7); \
6638- stbir__simdf8_0123to22222222(c, cs); \
6639- stbir__simdf8_madd_mem(tot0, tot0, c, decode + (ofs) * 7 + 14);
6640-
6641-#define stbir__store_output() \
6642- stbir__simdf8_add(tot0, tot0, tot1); \
6643- horizontal_coefficients += coefficient_width; \
6644- ++horizontal_contributors; \
6645- output += 7; \
6646- if (output < output_end) { \
6647- stbir__simdf8_store(output - 7, tot0); \
6648- continue; \
6649- } \
6650- stbir__simdf_store( \
6651- output - 7 + 3, \
6652- stbir__simdf_swiz(stbir__simdf8_gettop4(tot0), 0, 0, 1, 2)); \
6653- stbir__simdf_store(output - 7, stbir__if_simdf8_cast_to_simdf4(tot0)); \
6654- break;
6655-
6656-#else
6657-
6658-#define stbir__4_coeff_start() \
6659- stbir__simdf tot0, tot1, tot2, tot3, c, cs; \
6660- STBIR_SIMD_NO_UNROLL(decode); \
6661- stbir__simdf_load(cs, hc); \
6662- stbir__simdf_0123to0000(c, cs); \
6663- stbir__simdf_mult_mem(tot0, c, decode); \
6664- stbir__simdf_mult_mem(tot1, c, decode + 3); \
6665- stbir__simdf_0123to1111(c, cs); \
6666- stbir__simdf_mult_mem(tot2, c, decode + 7); \
6667- stbir__simdf_mult_mem(tot3, c, decode + 10); \
6668- stbir__simdf_0123to2222(c, cs); \
6669- stbir__simdf_madd_mem(tot0, tot0, c, decode + 14); \
6670- stbir__simdf_madd_mem(tot1, tot1, c, decode + 17); \
6671- stbir__simdf_0123to3333(c, cs); \
6672- stbir__simdf_madd_mem(tot2, tot2, c, decode + 21); \
6673- stbir__simdf_madd_mem(tot3, tot3, c, decode + 24);
6674-
6675-#define stbir__4_coeff_continue_from_4(ofs) \
6676- STBIR_SIMD_NO_UNROLL(decode); \
6677- stbir__simdf_load(cs, hc + (ofs)); \
6678- stbir__simdf_0123to0000(c, cs); \
6679- stbir__simdf_madd_mem(tot0, tot0, c, decode + (ofs) * 7); \
6680- stbir__simdf_madd_mem(tot1, tot1, c, decode + (ofs) * 7 + 3); \
6681- stbir__simdf_0123to1111(c, cs); \
6682- stbir__simdf_madd_mem(tot2, tot2, c, decode + (ofs) * 7 + 7); \
6683- stbir__simdf_madd_mem(tot3, tot3, c, decode + (ofs) * 7 + 10); \
6684- stbir__simdf_0123to2222(c, cs); \
6685- stbir__simdf_madd_mem(tot0, tot0, c, decode + (ofs) * 7 + 14); \
6686- stbir__simdf_madd_mem(tot1, tot1, c, decode + (ofs) * 7 + 17); \
6687- stbir__simdf_0123to3333(c, cs); \
6688- stbir__simdf_madd_mem(tot2, tot2, c, decode + (ofs) * 7 + 21); \
6689- stbir__simdf_madd_mem(tot3, tot3, c, decode + (ofs) * 7 + 24);
6690-
6691-#define stbir__1_coeff_remnant(ofs) \
6692- STBIR_SIMD_NO_UNROLL(decode); \
6693- stbir__simdf_load1(c, hc + (ofs)); \
6694- stbir__simdf_0123to0000(c, c); \
6695- stbir__simdf_madd_mem(tot0, tot0, c, decode + (ofs) * 7); \
6696- stbir__simdf_madd_mem(tot1, tot1, c, decode + (ofs) * 7 + 3);
6697-
6698-#define stbir__2_coeff_remnant(ofs) \
6699- STBIR_SIMD_NO_UNROLL(decode); \
6700- stbir__simdf_load2(cs, hc + (ofs)); \
6701- stbir__simdf_0123to0000(c, cs); \
6702- stbir__simdf_madd_mem(tot0, tot0, c, decode + (ofs) * 7); \
6703- stbir__simdf_madd_mem(tot1, tot1, c, decode + (ofs) * 7 + 3); \
6704- stbir__simdf_0123to1111(c, cs); \
6705- stbir__simdf_madd_mem(tot2, tot2, c, decode + (ofs) * 7 + 7); \
6706- stbir__simdf_madd_mem(tot3, tot3, c, decode + (ofs) * 7 + 10);
6707-
6708-#define stbir__3_coeff_remnant(ofs) \
6709- STBIR_SIMD_NO_UNROLL(decode); \
6710- stbir__simdf_load(cs, hc + (ofs)); \
6711- stbir__simdf_0123to0000(c, cs); \
6712- stbir__simdf_madd_mem(tot0, tot0, c, decode + (ofs) * 7); \
6713- stbir__simdf_madd_mem(tot1, tot1, c, decode + (ofs) * 7 + 3); \
6714- stbir__simdf_0123to1111(c, cs); \
6715- stbir__simdf_madd_mem(tot2, tot2, c, decode + (ofs) * 7 + 7); \
6716- stbir__simdf_madd_mem(tot3, tot3, c, decode + (ofs) * 7 + 10); \
6717- stbir__simdf_0123to2222(c, cs); \
6718- stbir__simdf_madd_mem(tot0, tot0, c, decode + (ofs) * 7 + 14); \
6719- stbir__simdf_madd_mem(tot1, tot1, c, decode + (ofs) * 7 + 17);
6720-
6721-#define stbir__store_output() \
6722- stbir__simdf_add(tot0, tot0, tot2); \
6723- stbir__simdf_add(tot1, tot1, tot3); \
6724- stbir__simdf_store(output + 3, tot1); \
6725- stbir__simdf_store(output, tot0); \
6726- horizontal_coefficients += coefficient_width; \
6727- ++horizontal_contributors; \
6728- output += 7;
6729-
6730-#endif
6731-
6732-#else
6733-
6734-#define stbir__1_coeff_only() \
6735- float tot0, tot1, tot2, tot3, tot4, tot5, tot6, c; \
6736- c = hc[0]; \
6737- tot0 = decode[0] * c; \
6738- tot1 = decode[1] * c; \
6739- tot2 = decode[2] * c; \
6740- tot3 = decode[3] * c; \
6741- tot4 = decode[4] * c; \
6742- tot5 = decode[5] * c; \
6743- tot6 = decode[6] * c;
6744-
6745-#define stbir__2_coeff_only() \
6746- float tot0, tot1, tot2, tot3, tot4, tot5, tot6, c; \
6747- c = hc[0]; \
6748- tot0 = decode[0] * c; \
6749- tot1 = decode[1] * c; \
6750- tot2 = decode[2] * c; \
6751- tot3 = decode[3] * c; \
6752- tot4 = decode[4] * c; \
6753- tot5 = decode[5] * c; \
6754- tot6 = decode[6] * c; \
6755- c = hc[1]; \
6756- tot0 += decode[7] * c; \
6757- tot1 += decode[8] * c; \
6758- tot2 += decode[9] * c; \
6759- tot3 += decode[10] * c; \
6760- tot4 += decode[11] * c; \
6761- tot5 += decode[12] * c; \
6762- tot6 += decode[13] * c;
6763-
6764-#define stbir__3_coeff_only() \
6765- float tot0, tot1, tot2, tot3, tot4, tot5, tot6, c; \
6766- c = hc[0]; \
6767- tot0 = decode[0] * c; \
6768- tot1 = decode[1] * c; \
6769- tot2 = decode[2] * c; \
6770- tot3 = decode[3] * c; \
6771- tot4 = decode[4] * c; \
6772- tot5 = decode[5] * c; \
6773- tot6 = decode[6] * c; \
6774- c = hc[1]; \
6775- tot0 += decode[7] * c; \
6776- tot1 += decode[8] * c; \
6777- tot2 += decode[9] * c; \
6778- tot3 += decode[10] * c; \
6779- tot4 += decode[11] * c; \
6780- tot5 += decode[12] * c; \
6781- tot6 += decode[13] * c; \
6782- c = hc[2]; \
6783- tot0 += decode[14] * c; \
6784- tot1 += decode[15] * c; \
6785- tot2 += decode[16] * c; \
6786- tot3 += decode[17] * c; \
6787- tot4 += decode[18] * c; \
6788- tot5 += decode[19] * c; \
6789- tot6 += decode[20] * c;
6790-
6791-#define stbir__store_output_tiny() \
6792- output[0] = tot0; \
6793- output[1] = tot1; \
6794- output[2] = tot2; \
6795- output[3] = tot3; \
6796- output[4] = tot4; \
6797- output[5] = tot5; \
6798- output[6] = tot6; \
6799- horizontal_coefficients += coefficient_width; \
6800- ++horizontal_contributors; \
6801- output += 7;
6802-
6803-#define stbir__4_coeff_start() \
6804- float x0, x1, x2, x3, x4, x5, x6, y0, y1, y2, y3, y4, y5, y6, c; \
6805- STBIR_SIMD_NO_UNROLL(decode); \
6806- c = hc[0]; \
6807- x0 = decode[0] * c; \
6808- x1 = decode[1] * c; \
6809- x2 = decode[2] * c; \
6810- x3 = decode[3] * c; \
6811- x4 = decode[4] * c; \
6812- x5 = decode[5] * c; \
6813- x6 = decode[6] * c; \
6814- c = hc[1]; \
6815- y0 = decode[7] * c; \
6816- y1 = decode[8] * c; \
6817- y2 = decode[9] * c; \
6818- y3 = decode[10] * c; \
6819- y4 = decode[11] * c; \
6820- y5 = decode[12] * c; \
6821- y6 = decode[13] * c; \
6822- c = hc[2]; \
6823- x0 += decode[14] * c; \
6824- x1 += decode[15] * c; \
6825- x2 += decode[16] * c; \
6826- x3 += decode[17] * c; \
6827- x4 += decode[18] * c; \
6828- x5 += decode[19] * c; \
6829- x6 += decode[20] * c; \
6830- c = hc[3]; \
6831- y0 += decode[21] * c; \
6832- y1 += decode[22] * c; \
6833- y2 += decode[23] * c; \
6834- y3 += decode[24] * c; \
6835- y4 += decode[25] * c; \
6836- y5 += decode[26] * c; \
6837- y6 += decode[27] * c;
6838-
6839-#define stbir__4_coeff_continue_from_4(ofs) \
6840- STBIR_SIMD_NO_UNROLL(decode); \
6841- c = hc[0 + (ofs)]; \
6842- x0 += decode[0 + (ofs) * 7] * c; \
6843- x1 += decode[1 + (ofs) * 7] * c; \
6844- x2 += decode[2 + (ofs) * 7] * c; \
6845- x3 += decode[3 + (ofs) * 7] * c; \
6846- x4 += decode[4 + (ofs) * 7] * c; \
6847- x5 += decode[5 + (ofs) * 7] * c; \
6848- x6 += decode[6 + (ofs) * 7] * c; \
6849- c = hc[1 + (ofs)]; \
6850- y0 += decode[7 + (ofs) * 7] * c; \
6851- y1 += decode[8 + (ofs) * 7] * c; \
6852- y2 += decode[9 + (ofs) * 7] * c; \
6853- y3 += decode[10 + (ofs) * 7] * c; \
6854- y4 += decode[11 + (ofs) * 7] * c; \
6855- y5 += decode[12 + (ofs) * 7] * c; \
6856- y6 += decode[13 + (ofs) * 7] * c; \
6857- c = hc[2 + (ofs)]; \
6858- x0 += decode[14 + (ofs) * 7] * c; \
6859- x1 += decode[15 + (ofs) * 7] * c; \
6860- x2 += decode[16 + (ofs) * 7] * c; \
6861- x3 += decode[17 + (ofs) * 7] * c; \
6862- x4 += decode[18 + (ofs) * 7] * c; \
6863- x5 += decode[19 + (ofs) * 7] * c; \
6864- x6 += decode[20 + (ofs) * 7] * c; \
6865- c = hc[3 + (ofs)]; \
6866- y0 += decode[21 + (ofs) * 7] * c; \
6867- y1 += decode[22 + (ofs) * 7] * c; \
6868- y2 += decode[23 + (ofs) * 7] * c; \
6869- y3 += decode[24 + (ofs) * 7] * c; \
6870- y4 += decode[25 + (ofs) * 7] * c; \
6871- y5 += decode[26 + (ofs) * 7] * c; \
6872- y6 += decode[27 + (ofs) * 7] * c;
6873-
6874-#define stbir__1_coeff_remnant(ofs) \
6875- STBIR_SIMD_NO_UNROLL(decode); \
6876- c = hc[0 + (ofs)]; \
6877- x0 += decode[0 + (ofs) * 7] * c; \
6878- x1 += decode[1 + (ofs) * 7] * c; \
6879- x2 += decode[2 + (ofs) * 7] * c; \
6880- x3 += decode[3 + (ofs) * 7] * c; \
6881- x4 += decode[4 + (ofs) * 7] * c; \
6882- x5 += decode[5 + (ofs) * 7] * c; \
6883- x6 += decode[6 + (ofs) * 7] * c;
6884-
6885-#define stbir__2_coeff_remnant(ofs) \
6886- STBIR_SIMD_NO_UNROLL(decode); \
6887- c = hc[0 + (ofs)]; \
6888- x0 += decode[0 + (ofs) * 7] * c; \
6889- x1 += decode[1 + (ofs) * 7] * c; \
6890- x2 += decode[2 + (ofs) * 7] * c; \
6891- x3 += decode[3 + (ofs) * 7] * c; \
6892- x4 += decode[4 + (ofs) * 7] * c; \
6893- x5 += decode[5 + (ofs) * 7] * c; \
6894- x6 += decode[6 + (ofs) * 7] * c; \
6895- c = hc[1 + (ofs)]; \
6896- y0 += decode[7 + (ofs) * 7] * c; \
6897- y1 += decode[8 + (ofs) * 7] * c; \
6898- y2 += decode[9 + (ofs) * 7] * c; \
6899- y3 += decode[10 + (ofs) * 7] * c; \
6900- y4 += decode[11 + (ofs) * 7] * c; \
6901- y5 += decode[12 + (ofs) * 7] * c; \
6902- y6 += decode[13 + (ofs) * 7] * c;
6903-
6904-#define stbir__3_coeff_remnant(ofs) \
6905- STBIR_SIMD_NO_UNROLL(decode); \
6906- c = hc[0 + (ofs)]; \
6907- x0 += decode[0 + (ofs) * 7] * c; \
6908- x1 += decode[1 + (ofs) * 7] * c; \
6909- x2 += decode[2 + (ofs) * 7] * c; \
6910- x3 += decode[3 + (ofs) * 7] * c; \
6911- x4 += decode[4 + (ofs) * 7] * c; \
6912- x5 += decode[5 + (ofs) * 7] * c; \
6913- x6 += decode[6 + (ofs) * 7] * c; \
6914- c = hc[1 + (ofs)]; \
6915- y0 += decode[7 + (ofs) * 7] * c; \
6916- y1 += decode[8 + (ofs) * 7] * c; \
6917- y2 += decode[9 + (ofs) * 7] * c; \
6918- y3 += decode[10 + (ofs) * 7] * c; \
6919- y4 += decode[11 + (ofs) * 7] * c; \
6920- y5 += decode[12 + (ofs) * 7] * c; \
6921- y6 += decode[13 + (ofs) * 7] * c; \
6922- c = hc[2 + (ofs)]; \
6923- x0 += decode[14 + (ofs) * 7] * c; \
6924- x1 += decode[15 + (ofs) * 7] * c; \
6925- x2 += decode[16 + (ofs) * 7] * c; \
6926- x3 += decode[17 + (ofs) * 7] * c; \
6927- x4 += decode[18 + (ofs) * 7] * c; \
6928- x5 += decode[19 + (ofs) * 7] * c; \
6929- x6 += decode[20 + (ofs) * 7] * c;
6930-
6931-#define stbir__store_output() \
6932- output[0] = x0 + y0; \
6933- output[1] = x1 + y1; \
6934- output[2] = x2 + y2; \
6935- output[3] = x3 + y3; \
6936- output[4] = x4 + y4; \
6937- output[5] = x5 + y5; \
6938- output[6] = x6 + y6; \
6939- horizontal_coefficients += coefficient_width; \
6940- ++horizontal_contributors; \
6941- output += 7;
6942-
6943-#endif
6944-
6945-#define STBIR__horizontal_channels 7
6946-#define STB_IMAGE_RESIZE_DO_HORIZONTALS
6947-#include STBIR__HEADER_FILENAME
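```c
/*
 * A minimal scalar sketch (not stb code) of the layout trick used by the
 * 7-channel SIMD macros above. A 7-float pixel is covered by two 4-lane
 * groups that start at offsets 0 and 3. Channel 3 is therefore computed
 * twice, with the same value, so the overlapping stores are harmless. The
 * vec4 type and helper names are hypothetical stand-ins for stbir__simdf
 * and its load/madd/store macros.
 */
#include <assert.h>
#include <string.h>

typedef struct { float v[4]; } vec4; /* hypothetical 4-lane stand-in */

static vec4 vec4_mult_mem(float c, const float *mem)
{
  vec4 r;
  for (int i = 0; i < 4; i++)
    r.v[i] = c * mem[i];
  return r;
}

static void vec4_madd_mem(vec4 *acc, float c, const float *mem)
{
  for (int i = 0; i < 4; i++)
    acc->v[i] += c * mem[i];
}

static void vec4_store(float *mem, vec4 a)
{
  memcpy(mem, a.v, sizeof a.v);
}

/* Two-tap blend of adjacent 7-channel pixels, following the shape of
 * stbir__2_coeff_only() followed by stbir__store_output_tiny(). */
static void blend_two_7ch(float *output, const float *decode, const float *hc)
{
  vec4 tot0 = vec4_mult_mem(hc[0], decode);     /* channels 0..3 */
  vec4 tot1 = vec4_mult_mem(hc[0], decode + 3); /* channels 3..6 */
  assert(output != decode); /* the sketch assumes separate buffers */
  vec4_madd_mem(&tot0, hc[1], decode + 7);  /* next pixel, channels 0..3 */
  vec4_madd_mem(&tot1, hc[1], decode + 10); /* next pixel, channels 3..6 */
  /* store the upper group first; the lower store rewrites channel 3
   * with an identical value */
  vec4_store(output + 3, tot1);
  vec4_store(output, tot0);
}
```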
6948-
6949-// include all of the vertical resamplers (both scatter and gather versions)
6950-
6951-#define STBIR__vertical_channels 1
6952-#define STB_IMAGE_RESIZE_DO_VERTICALS
6953-#include STBIR__HEADER_FILENAME
6954-
6955-#define STBIR__vertical_channels 1
6956-#define STB_IMAGE_RESIZE_DO_VERTICALS
6957-#define STB_IMAGE_RESIZE_VERTICAL_CONTINUE
6958-#include STBIR__HEADER_FILENAME
6959-
6960-#define STBIR__vertical_channels 2
6961-#define STB_IMAGE_RESIZE_DO_VERTICALS
6962-#include STBIR__HEADER_FILENAME
6963-
6964-#define STBIR__vertical_channels 2
6965-#define STB_IMAGE_RESIZE_DO_VERTICALS
6966-#define STB_IMAGE_RESIZE_VERTICAL_CONTINUE
6967-#include STBIR__HEADER_FILENAME
6968-
6969-#define STBIR__vertical_channels 3
6970-#define STB_IMAGE_RESIZE_DO_VERTICALS
6971-#include STBIR__HEADER_FILENAME
6972-
6973-#define STBIR__vertical_channels 3
6974-#define STB_IMAGE_RESIZE_DO_VERTICALS
6975-#define STB_IMAGE_RESIZE_VERTICAL_CONTINUE
6976-#include STBIR__HEADER_FILENAME
6977-
6978-#define STBIR__vertical_channels 4
6979-#define STB_IMAGE_RESIZE_DO_VERTICALS
6980-#include STBIR__HEADER_FILENAME
6981-
6982-#define STBIR__vertical_channels 4
6983-#define STB_IMAGE_RESIZE_DO_VERTICALS
6984-#define STB_IMAGE_RESIZE_VERTICAL_CONTINUE
6985-#include STBIR__HEADER_FILENAME
6986-
6987-#define STBIR__vertical_channels 5
6988-#define STB_IMAGE_RESIZE_DO_VERTICALS
6989-#include STBIR__HEADER_FILENAME
6990-
6991-#define STBIR__vertical_channels 5
6992-#define STB_IMAGE_RESIZE_DO_VERTICALS
6993-#define STB_IMAGE_RESIZE_VERTICAL_CONTINUE
6994-#include STBIR__HEADER_FILENAME
6995-
6996-#define STBIR__vertical_channels 6
6997-#define STB_IMAGE_RESIZE_DO_VERTICALS
6998-#include STBIR__HEADER_FILENAME
6999-
7000-#define STBIR__vertical_channels 6
7001-#define STB_IMAGE_RESIZE_DO_VERTICALS
7002-#define STB_IMAGE_RESIZE_VERTICAL_CONTINUE
7003-#include STBIR__HEADER_FILENAME
7004-
7005-#define STBIR__vertical_channels 7
7006-#define STB_IMAGE_RESIZE_DO_VERTICALS
7007-#include STBIR__HEADER_FILENAME
7008-
7009-#define STBIR__vertical_channels 7
7010-#define STB_IMAGE_RESIZE_DO_VERTICALS
7011-#define STB_IMAGE_RESIZE_VERTICAL_CONTINUE
7012-#include STBIR__HEADER_FILENAME
7013-
7014-#define STBIR__vertical_channels 8
7015-#define STB_IMAGE_RESIZE_DO_VERTICALS
7016-#include STBIR__HEADER_FILENAME
7017-
7018-#define STBIR__vertical_channels 8
7019-#define STB_IMAGE_RESIZE_DO_VERTICALS
7020-#define STB_IMAGE_RESIZE_VERTICAL_CONTINUE
7021-#include STBIR__HEADER_FILENAME
7022-
7023-typedef void
7024-STBIR_VERTICAL_GATHERFUNC(float *output, float const *coeffs,
7025- float const **inputs, float const *input0_end);
7026-
7027-static STBIR_VERTICAL_GATHERFUNC *stbir__vertical_gathers[8] = {
7028- stbir__vertical_gather_with_1_coeffs, stbir__vertical_gather_with_2_coeffs,
7029- stbir__vertical_gather_with_3_coeffs, stbir__vertical_gather_with_4_coeffs,
7030- stbir__vertical_gather_with_5_coeffs, stbir__vertical_gather_with_6_coeffs,
7031- stbir__vertical_gather_with_7_coeffs, stbir__vertical_gather_with_8_coeffs};
7032-
7033-static STBIR_VERTICAL_GATHERFUNC *stbir__vertical_gathers_continues[8] = {
7034- stbir__vertical_gather_with_1_coeffs_cont,
7035- stbir__vertical_gather_with_2_coeffs_cont,
7036- stbir__vertical_gather_with_3_coeffs_cont,
7037- stbir__vertical_gather_with_4_coeffs_cont,
7038- stbir__vertical_gather_with_5_coeffs_cont,
7039- stbir__vertical_gather_with_6_coeffs_cont,
7040- stbir__vertical_gather_with_7_coeffs_cont,
7041- stbir__vertical_gather_with_8_coeffs_cont};
7042-
7043-typedef void
7044-STBIR_VERTICAL_SCATTERFUNC(float **outputs, float const *coeffs,
7045- float const *input, float const *input_end);
7046-
7047-static STBIR_VERTICAL_SCATTERFUNC *stbir__vertical_scatter_sets[8] = {
7048- stbir__vertical_scatter_with_1_coeffs,
7049- stbir__vertical_scatter_with_2_coeffs,
7050- stbir__vertical_scatter_with_3_coeffs,
7051- stbir__vertical_scatter_with_4_coeffs,
7052- stbir__vertical_scatter_with_5_coeffs,
7053- stbir__vertical_scatter_with_6_coeffs,
7054- stbir__vertical_scatter_with_7_coeffs,
7055- stbir__vertical_scatter_with_8_coeffs};
7056-
7057-static STBIR_VERTICAL_SCATTERFUNC *stbir__vertical_scatter_blends[8] = {
7058- stbir__vertical_scatter_with_1_coeffs_cont,
7059- stbir__vertical_scatter_with_2_coeffs_cont,
7060- stbir__vertical_scatter_with_3_coeffs_cont,
7061- stbir__vertical_scatter_with_4_coeffs_cont,
7062- stbir__vertical_scatter_with_5_coeffs_cont,
7063- stbir__vertical_scatter_with_6_coeffs_cont,
7064- stbir__vertical_scatter_with_7_coeffs_cont,
7065- stbir__vertical_scatter_with_8_coeffs_cont};
7066-
7067-static void
7068-stbir__encode_scanline(stbir__info const *stbir_info, void *output_buffer_data,
7069- float *encode_buffer,
7070- int row STBIR_ONLY_PROFILE_GET_SPLIT_INFO)
7071-{
7072- int num_pixels = stbir_info->horizontal.scale_info.output_sub_size;
7073- int channels = stbir_info->channels;
7074- int width_times_channels = num_pixels * channels;
7075- void *output_buffer;
7076-
7077- // un-alpha weight if we need to
7078- if (stbir_info->alpha_unweight) {
7079- STBIR_PROFILE_START(unalpha);
7080- stbir_info->alpha_unweight(encode_buffer, width_times_channels);
7081- STBIR_PROFILE_END(unalpha);
7082- }
7083-
7084- // write directly into output by default
7085- output_buffer = output_buffer_data;
7086-
7087- // if we have an output callback, we first convert the decode buffer in
7088- // place (and then hand that to the callback)
7089- if (stbir_info->out_pixels_cb) {
7090- output_buffer = encode_buffer;
7091- }
7092-
7093- STBIR_PROFILE_START(encode);
7094- // convert into the output buffer
7095- stbir_info->encode_pixels(output_buffer, width_times_channels,
7096- encode_buffer);
7097- STBIR_PROFILE_END(encode);
7098-
7099- // if we have an output callback, call it to send the data
7100- if (stbir_info->out_pixels_cb) {
7101- stbir_info->out_pixels_cb(output_buffer, num_pixels, row,
7102- stbir_info->user_data);
7103- }
7104-}
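```c
/*
 * A sketch of why the in-place conversion mentioned in the comment above
 * ("convert the decode buffer in place") is safe when the output element is
 * no wider than a float. The conversion runs front to back. Output element i
 * lands at byte offset i, which lies inside float i / 4 <= i, and that float
 * has already been read. The encoder itself (clamp to [0,1], scale by 255,
 * round) and the function name are assumptions for illustration. This is not
 * stb's encode_pixels implementation.
 */
#include <assert.h>

static void encode_float_to_u8_in_place(float *buffer, int count)
{
  unsigned char *out = (unsigned char *)buffer; /* same storage, narrower */
  assert(count >= 0);
  for (int i = 0; i < count; i++) {
    float v = buffer[i]; /* read float i before byte i is written */
    if (v < 0.0f)
      v = 0.0f;
    if (v > 1.0f)
      v = 1.0f;
    out[i] = (unsigned char)(v * 255.0f + 0.5f);
  }
}
```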
7105-
7106-// Get the ring buffer pointer for an index
7107-static float *
7108-stbir__get_ring_buffer_entry(stbir__info const *stbir_info,
7109- stbir__per_split_info const *split_info, int index)
7110-{
7111- STBIR_ASSERT(index < stbir_info->ring_buffer_num_entries);
7112-
7113-#ifdef STBIR__SEPARATE_ALLOCATIONS
7114- return split_info->ring_buffers[index];
7115-#else
7116- return (float *)(((char *)split_info->ring_buffer) +
7117- (index * stbir_info->ring_buffer_length_bytes));
7118-#endif
7119-}
7120-
7121-// Get the specified scan line from the ring buffer
7122-static float *
7123-stbir__get_ring_buffer_scanline(stbir__info const *stbir_info,
7124- stbir__per_split_info const *split_info,
7125- int get_scanline)
7126-{
7127- int ring_buffer_index =
7128- (split_info->ring_buffer_begin_index +
7129- (get_scanline - split_info->ring_buffer_first_scanline)) %
7130- stbir_info->ring_buffer_num_entries;
7131- return stbir__get_ring_buffer_entry(stbir_info, split_info,
7132- ring_buffer_index);
7133-}
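```c
/*
 * A sketch of the scanline-to-slot arithmetic in
 * stbir__get_ring_buffer_scanline() above. The struct and function names are
 * hypothetical; only the formula mirrors the source. Because of the final
 * modulo, begin_index may grow past num_entries and still map to a valid
 * slot. The gather loop increments it without wrapping.
 */
#include <assert.h>

typedef struct {
  int begin_index;    /* slot that holds first_scanline */
  int first_scanline; /* oldest scanline still resident */
  int num_entries;    /* ring capacity */
} ring_sketch;

static int ring_slot_for_scanline(const ring_sketch *r, int scanline)
{
  /* preconditions added for the sketch: scanline lies inside the window */
  assert(scanline >= r->first_scanline);
  assert(scanline - r->first_scanline < r->num_entries);
  return (r->begin_index + (scanline - r->first_scanline)) % r->num_entries;
}
```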
7134-
7135-static void
7136-stbir__resample_horizontal_gather(
7137- stbir__info const *stbir_info, float *output_buffer,
7138- float const *input_buffer STBIR_ONLY_PROFILE_GET_SPLIT_INFO)
7139-{
7140- float const *decode_buffer =
7141- input_buffer - (stbir_info->scanline_extents.conservative.n0 *
7142- stbir_info->effective_channels);
7143-
7144- STBIR_PROFILE_START(horizontal);
7145- if ((stbir_info->horizontal.filter_enum == STBIR_FILTER_POINT_SAMPLE) &&
7146- (stbir_info->horizontal.scale_info.scale == 1.0f)) {
7147- STBIR_MEMCPY(output_buffer, input_buffer,
7148- stbir_info->horizontal.scale_info.output_sub_size *
7149- sizeof(float) * stbir_info->effective_channels);
7150- } else {
7151- stbir_info->horizontal_gather_channels(
7152- output_buffer, stbir_info->horizontal.scale_info.output_sub_size,
7153- decode_buffer, stbir_info->horizontal.contributors,
7154- stbir_info->horizontal.coefficients,
7155- stbir_info->horizontal.coefficient_width);
7156- }
7157- STBIR_PROFILE_END(horizontal);
7158-}
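```c
/*
 * A scalar, single-channel sketch of a horizontal gather. Each output pixel
 * is a weighted sum over a contiguous input range. The following are
 * assumptions, not taken from this file. Output x reads the inclusive input
 * range [n0, n1] of its contributor. Its weights start at
 * coefficients + x * coefficient_width. decode is indexed by absolute
 * input x, which is presumably why the function above offsets decode_buffer
 * by conservative.n0. Type and function names are hypothetical.
 */
#include <assert.h>

typedef struct { int n0, n1; } contrib_sketch; /* hypothetical */

static void horizontal_gather_1ch(float *output, int output_count,
                                  const float *decode,
                                  const contrib_sketch *contributors,
                                  const float *coefficients,
                                  int coefficient_width)
{
  for (int x = 0; x < output_count; x++) {
    const float *hc = coefficients + x * coefficient_width;
    int n = contributors[x].n1 - contributors[x].n0 + 1;
    float sum = 0.0f;
    assert(n > 0 && n <= coefficient_width);
    for (int k = 0; k < n; k++)
      sum += hc[k] * decode[contributors[x].n0 + k];
    output[x] = sum;
  }
}
```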
7159-
7160-static void
7161-stbir__resample_vertical_gather(stbir__info const *stbir_info,
7162- stbir__per_split_info *split_info, int n,
7163- int contrib_n0, int contrib_n1,
7164- float const *vertical_coefficients)
7165-{
7166- float *encode_buffer = split_info->vertical_buffer;
7167- float *decode_buffer = split_info->decode_buffer;
7168- int vertical_first = stbir_info->vertical_first;
7169- int width = (vertical_first)
7170- ? (stbir_info->scanline_extents.conservative.n1 -
7171- stbir_info->scanline_extents.conservative.n0 + 1)
7172- : stbir_info->horizontal.scale_info.output_sub_size;
7173- int width_times_channels = stbir_info->effective_channels * width;
7174-
7175- STBIR_ASSERT(stbir_info->vertical.is_gather);
7176-
7177- // loop over the contributing scanlines and scale into the buffer
7178- STBIR_PROFILE_START(vertical);
7179- {
7180- int k = 0, total = contrib_n1 - contrib_n0 + 1;
7181- STBIR_ASSERT(total > 0);
7182- do {
7183- float const *inputs[8];
7184- int i, cnt = total;
7185- if (cnt > 8) {
7186- cnt = 8;
7187- }
7188- for (i = 0; i < cnt; i++) {
7189- inputs[i] = stbir__get_ring_buffer_scanline(
7190- stbir_info, split_info, k + i + contrib_n0);
7191- }
7192-
7193- // call the N scanlines at a time function (up to 8 scanlines of
7194- // blending at once)
7195- ((k == 0) ? stbir__vertical_gathers
7196- : stbir__vertical_gathers_continues)[cnt - 1](
7197- (vertical_first) ? decode_buffer : encode_buffer,
7198- vertical_coefficients + k, inputs,
7199- inputs[0] + width_times_channels);
7200- k += cnt;
7201- total -= cnt;
7202- } while (total);
7203- }
7204- STBIR_PROFILE_END(vertical);
7205-
7206- if (vertical_first) {
7207- // Now resample the gathered vertical data in the horizontal axis into
7208- // the encode buffer
7209- // clear two over for horizontals with a remnant of 3
7210- decode_buffer[width_times_channels] = 0.0f;
7211- decode_buffer[width_times_channels + 1] = 0.0f;
7212- stbir__resample_horizontal_gather(
7213- stbir_info, encode_buffer,
7214- decode_buffer STBIR_ONLY_PROFILE_SET_SPLIT_INFO);
7215- }
7216-
7217- stbir__encode_scanline(
7218- stbir_info,
7219- ((char *)stbir_info->output_data) +
7220- ((size_t)n * (size_t)stbir_info->output_stride_bytes),
7221- encode_buffer, n STBIR_ONLY_PROFILE_SET_SPLIT_INFO);
7222-}
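```c
/*
 * A scalar, single-channel sketch of the chunking in
 * stbir__resample_vertical_gather() above. Contributing scanlines are
 * consumed at most 8 at a time. The first chunk overwrites the output and
 * later chunks add to it. That is the split between the
 * stbir__vertical_gathers and stbir__vertical_gathers_continues tables.
 * Function names are hypothetical.
 */
#include <assert.h>

#define MAX_CHUNK 8 /* matches the inputs[8] limit above */

static void gather_chunk(float *out, int width, const float *coeffs,
                         const float *const *inputs, int cnt, int accumulate)
{
  for (int x = 0; x < width; x++) {
    float sum = accumulate ? out[x] : 0.0f; /* "continue" vs. first chunk */
    for (int i = 0; i < cnt; i++)
      sum += coeffs[i] * inputs[i][x];
    out[x] = sum;
  }
}

static void vertical_gather(float *out, int width, const float *coeffs,
                            const float *const *rows, int total)
{
  int k = 0;
  assert(total > 0);
  do {
    int cnt = total > MAX_CHUNK ? MAX_CHUNK : total;
    gather_chunk(out, width, coeffs + k, rows + k, cnt, k != 0);
    k += cnt;
    total -= cnt;
  } while (total);
}
```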
7223-
7224-static void
7225-stbir__decode_and_resample_for_vertical_gather_loop(
7226- stbir__info const *stbir_info, stbir__per_split_info *split_info, int n)
7227-{
7228- int ring_buffer_index;
7229- float *ring_buffer;
7230-
7231- // Decode the nth scanline from the source image into the decode buffer.
7232- stbir__decode_scanline(
7233- stbir_info, n,
7234- split_info->decode_buffer STBIR_ONLY_PROFILE_SET_SPLIT_INFO);
7235-
7236- // update new end scanline
7237- split_info->ring_buffer_last_scanline = n;
7238-
7239- // get ring buffer
7240- ring_buffer_index = (split_info->ring_buffer_begin_index +
7241- (split_info->ring_buffer_last_scanline -
7242- split_info->ring_buffer_first_scanline)) %
7243- stbir_info->ring_buffer_num_entries;
7244- ring_buffer =
7245- stbir__get_ring_buffer_entry(stbir_info, split_info, ring_buffer_index);
7246-
7247- // Now resample it into the ring buffer.
7248- stbir__resample_horizontal_gather(
7249- stbir_info, ring_buffer,
7250- split_info->decode_buffer STBIR_ONLY_PROFILE_SET_SPLIT_INFO);
7251-
7252- // Now it's sitting in the ring buffer ready to be used as source for the
7253- // vertical sampling.
7254-}
7255-
7256-static void
7257-stbir__vertical_gather_loop(stbir__info const *stbir_info,
7258- stbir__per_split_info *split_info, int split_count)
7259-{
7260- int y, start_output_y, end_output_y;
7261- stbir__contributors *vertical_contributors =
7262- stbir_info->vertical.contributors;
7263- float const *vertical_coefficients = stbir_info->vertical.coefficients;
7264-
7265- STBIR_ASSERT(stbir_info->vertical.is_gather);
7266-
7267- start_output_y = split_info->start_output_y;
7268- end_output_y = split_info[split_count - 1].end_output_y;
7269-
7270- vertical_contributors += start_output_y;
7271- vertical_coefficients +=
7272- start_output_y * stbir_info->vertical.coefficient_width;
7273-
7274- // initialize the ring buffer for gathering
7275- split_info->ring_buffer_begin_index = 0;
7276- split_info->ring_buffer_first_scanline = vertical_contributors->n0;
7277- split_info->ring_buffer_last_scanline =
7278- split_info->ring_buffer_first_scanline - 1; // means "empty"
7279-
7280- for (y = start_output_y; y < end_output_y; y++) {
7281- int in_first_scanline, in_last_scanline;
7282-
7283- in_first_scanline = vertical_contributors->n0;
7284- in_last_scanline = vertical_contributors->n1;
7285-
7286- // make sure the indexing hasn't broken
7287- STBIR_ASSERT(in_first_scanline >=
7288- split_info->ring_buffer_first_scanline);
7289-
7290- // Load in new scanlines
7291- while (in_last_scanline > split_info->ring_buffer_last_scanline) {
7292- STBIR_ASSERT((split_info->ring_buffer_last_scanline -
7293- split_info->ring_buffer_first_scanline + 1) <=
7294- stbir_info->ring_buffer_num_entries);
7295-
7296- // make sure there was room in the ring buffer when we add new
7297- // scanlines
7298- if ((split_info->ring_buffer_last_scanline -
7299- split_info->ring_buffer_first_scanline + 1) ==
7300- stbir_info->ring_buffer_num_entries) {
7301- split_info->ring_buffer_first_scanline++;
7302- split_info->ring_buffer_begin_index++;
7303- }
7304-
7305- if (stbir_info->vertical_first) {
7306- float *ring_buffer = stbir__get_ring_buffer_scanline(
7307- stbir_info, split_info,
7308- ++split_info->ring_buffer_last_scanline);
7309- // Decode the nth scanline from the source image into the decode
7310- // buffer.
7311- stbir__decode_scanline(
7312- stbir_info, split_info->ring_buffer_last_scanline,
7313- ring_buffer STBIR_ONLY_PROFILE_SET_SPLIT_INFO);
7314- } else {
7315- stbir__decode_and_resample_for_vertical_gather_loop(
7316- stbir_info, split_info,
7317- split_info->ring_buffer_last_scanline + 1);
7318- }
7319- }
7320-
7321- // Now all buffers should be ready to write a row of vertical sampling,
7322- // so do it.
7323- stbir__resample_vertical_gather(stbir_info, split_info, y,
7324- in_first_scanline, in_last_scanline,
7325- vertical_coefficients);
7326-
7327- ++vertical_contributors;
7328- vertical_coefficients += stbir_info->vertical.coefficient_width;
7329- }
7330-}
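```c
/*
 * A sketch of the sliding-window loading in stbir__vertical_gather_loop()
 * above. For each output row, input scanlines are loaded until the ring
 * holds the row's last contributor. When the ring is full, the oldest
 * scanline is evicted first. Decoding is replaced by a counter, and the
 * names are hypothetical. With forward-moving windows, each input scanline
 * is loaded exactly once.
 */
#include <assert.h>

typedef struct { int n0, n1; } window_sketch; /* hypothetical */

/* Returns how many scanlines would have been decoded. */
static int simulate_gather_loading(const window_sketch *windows, int rows,
                                   int ring_entries)
{
  int first = windows[0].n0;
  int last = first - 1; /* last < first means "empty", as above */
  int decoded = 0;
  for (int y = 0; y < rows; y++) {
    assert(windows[y].n0 >= first); /* windows only move forward */
    while (windows[y].n1 > last) {
      if (last - first + 1 == ring_entries)
        first++; /* ring full: evict the oldest scanline */
      last++;    /* stands in for decoding scanline 'last' */
      decoded++;
    }
    /* check added for the sketch: the whole window is resident */
    assert(windows[y].n0 >= first && windows[y].n1 <= last);
  }
  return decoded;
}
```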
7331-
7332-#define STBIR__FLOAT_EMPTY_MARKER 3.0e+38F
7333-#define STBIR__FLOAT_BUFFER_IS_EMPTY(ptr) \
7334- ((ptr)[0] == STBIR__FLOAT_EMPTY_MARKER)
7335-
7336-static void
7337-stbir__encode_first_scanline_from_scatter(stbir__info const *stbir_info,
7338- stbir__per_split_info *split_info)
7339-{
7340- // evict a scanline out into the output buffer
7341- float *ring_buffer_entry = stbir__get_ring_buffer_entry(
7342- stbir_info, split_info, split_info->ring_buffer_begin_index);
7343-
7344- // dump the scanline out
7345- stbir__encode_scanline(stbir_info,
7346- ((char *)stbir_info->output_data) +
7347- ((size_t)split_info->ring_buffer_first_scanline *
7348- (size_t)stbir_info->output_stride_bytes),
7349- ring_buffer_entry,
7350- split_info->ring_buffer_first_scanline
7351- STBIR_ONLY_PROFILE_SET_SPLIT_INFO);
7352-
7353- // mark it as empty
7354- ring_buffer_entry[0] = STBIR__FLOAT_EMPTY_MARKER;
7355-
7356- // advance the first scanline
7357- split_info->ring_buffer_first_scanline++;
7358- if (++split_info->ring_buffer_begin_index ==
7359- stbir_info->ring_buffer_num_entries) {
7360- split_info->ring_buffer_begin_index = 0;
7361- }
7362-}
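```c
/*
 * A sketch of the eviction step in
 * stbir__encode_first_scanline_from_scatter() above. The oldest ring entry
 * is written out and tagged with the empty marker in its first float. The
 * ring start then advances with explicit wrap-around. "Writing out" is
 * modeled here as copying into one row of an output image. The names and
 * sizes are hypothetical.
 */
#include <assert.h>
#include <string.h>

#define EMPTY_MARKER 3.0e+38F /* same value as STBIR__FLOAT_EMPTY_MARKER */
#define RING_ENTRIES 3
#define ROW_FLOATS 4

typedef struct {
  float rows[RING_ENTRIES][ROW_FLOATS];
  int begin_index;    /* slot holding first_scanline */
  int first_scanline; /* output row owned by the begin slot */
} scatter_ring_sketch;

static void evict_first(scatter_ring_sketch *r, float out[][ROW_FLOATS])
{
  float *entry;
  assert(r->begin_index >= 0 && r->begin_index < RING_ENTRIES);
  entry = r->rows[r->begin_index];
  memcpy(out[r->first_scanline], entry, sizeof(float) * ROW_FLOATS);
  entry[0] = EMPTY_MARKER; /* next scatter into this slot will "set" */
  r->first_scanline++;
  if (++r->begin_index == RING_ENTRIES)
    r->begin_index = 0;
}
```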
7363-
7364-static void
7365-stbir__horizontal_resample_and_encode_first_scanline_from_scatter(
7366- stbir__info const *stbir_info, stbir__per_split_info *split_info)
7367-{
7368- // evict a scanline out into the output buffer
7369-
7370- float *ring_buffer_entry = stbir__get_ring_buffer_entry(
7371- stbir_info, split_info, split_info->ring_buffer_begin_index);
7372-
7373- // Now resample it into the buffer.
7374- stbir__resample_horizontal_gather(
7375- stbir_info, split_info->vertical_buffer,
7376- ring_buffer_entry STBIR_ONLY_PROFILE_SET_SPLIT_INFO);
7377-
7378- // dump the scanline out
7379- stbir__encode_scanline(stbir_info,
7380- ((char *)stbir_info->output_data) +
7381- ((size_t)split_info->ring_buffer_first_scanline *
7382- (size_t)stbir_info->output_stride_bytes),
7383- split_info->vertical_buffer,
7384- split_info->ring_buffer_first_scanline
7385- STBIR_ONLY_PROFILE_SET_SPLIT_INFO);
7386-
7387- // mark it as empty
7388- ring_buffer_entry[0] = STBIR__FLOAT_EMPTY_MARKER;
7389-
7390- // advance the first scanline
7391- split_info->ring_buffer_first_scanline++;
7392- if (++split_info->ring_buffer_begin_index ==
7393- stbir_info->ring_buffer_num_entries) {
7394- split_info->ring_buffer_begin_index = 0;
7395- }
7396-}
7397-
7398-static void
7399-stbir__resample_vertical_scatter(stbir__info const *stbir_info,
7400- stbir__per_split_info *split_info, int n0,
7401- int n1, float const *vertical_coefficients,
7402- float const *vertical_buffer,
7403- float const *vertical_buffer_end)
7404-{
7405- STBIR_ASSERT(!stbir_info->vertical.is_gather);
7406-
7407- STBIR_PROFILE_START(vertical);
7408- {
7409- int k = 0, total = n1 - n0 + 1;
7410- STBIR_ASSERT(total > 0);
7411- do {
7412- float *outputs[8];
7413- int i, n = total;
7414- if (n > 8) {
7415- n = 8;
7416- }
7417- for (i = 0; i < n; i++) {
7418- outputs[i] = stbir__get_ring_buffer_scanline(
7419- stbir_info, split_info, k + i + n0);
7420- if ((i) &&
7421- (STBIR__FLOAT_BUFFER_IS_EMPTY(outputs[i]) !=
7422- STBIR__FLOAT_BUFFER_IS_EMPTY(
7423- outputs[0]))) // make sure runs are of the same type
7424- {
7425- n = i;
7426- break;
7427- }
7428- }
7429- // call the scatter to N scanlines at a time function (up to 8
7430- // scanlines of scattering at once)
7431- ((STBIR__FLOAT_BUFFER_IS_EMPTY(outputs[0]))
7432- ? stbir__vertical_scatter_sets
7433- : stbir__vertical_scatter_blends)[n - 1](
7434- outputs, vertical_coefficients + k, vertical_buffer,
7435- vertical_buffer_end);
7436- k += n;
7437- total -= n;
7438- } while (total);
7439- }
7440-
7441- STBIR_PROFILE_END(vertical);
7442-}
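```c
/*
 * A sketch of how stbir__resample_vertical_scatter() above splits the target
 * rows into runs. Each run holds at most 8 rows. A run also ends early when
 * a row's empty/non-empty state differs from the run's first row, so that
 * each call uses a single "set" or "blend" routine. Rows are reduced to an
 * is_empty flag here, and the names are hypothetical.
 */
#include <assert.h>

#define MAX_RUN 8 /* matches the outputs[8] limit above */

/* Fills run_lengths and returns the number of runs. */
static int split_scatter_runs(const int *is_empty, int total, int *run_lengths)
{
  int k = 0, runs = 0;
  assert(total > 0);
  do {
    int n = total > MAX_RUN ? MAX_RUN : total;
    for (int i = 1; i < n; i++) {
      if (is_empty[k + i] != is_empty[k]) { /* state change ends the run */
        n = i;
        break;
      }
    }
    run_lengths[runs++] = n;
    k += n;
    total -= n;
  } while (total);
  return runs;
}
```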
7443-
7444-typedef void
7445-stbir__handle_scanline_for_scatter_func(stbir__info const *stbir_info,
7446- stbir__per_split_info *split_info);
7447-
7448-static void
7449-stbir__vertical_scatter_loop(stbir__info const *stbir_info,
7450- stbir__per_split_info *split_info, int split_count)
7451-{
7452- int y, start_output_y, end_output_y, start_input_y, end_input_y;
7453- stbir__contributors *vertical_contributors =
7454- stbir_info->vertical.contributors;
7455- float const *vertical_coefficients = stbir_info->vertical.coefficients;
7456- stbir__handle_scanline_for_scatter_func *handle_scanline_for_scatter;
7457- void *scanline_scatter_buffer;
7458- void *scanline_scatter_buffer_end;
7459- int on_first_input_y, last_input_y;
7460- int width = (stbir_info->vertical_first)
7461- ? (stbir_info->scanline_extents.conservative.n1 -
7462- stbir_info->scanline_extents.conservative.n0 + 1)
7463- : stbir_info->horizontal.scale_info.output_sub_size;
7464- int width_times_channels = stbir_info->effective_channels * width;
7465-
7466- STBIR_ASSERT(!stbir_info->vertical.is_gather);
7467-
7468- start_output_y = split_info->start_output_y;
7469- end_output_y = split_info[split_count - 1]
7470- .end_output_y; // may do multiple split counts
7471-
7472- start_input_y = split_info->start_input_y;
7473- end_input_y = split_info[split_count - 1].end_input_y;
7474-
7475- // adjust for starting offset start_input_y
7476- y = start_input_y + stbir_info->vertical.filter_pixel_margin;
7477- vertical_contributors += y;
7478- vertical_coefficients += stbir_info->vertical.coefficient_width * y;
7479-
7480- if (stbir_info->vertical_first) {
7481- handle_scanline_for_scatter =
7482- stbir__horizontal_resample_and_encode_first_scanline_from_scatter;
7483- scanline_scatter_buffer = split_info->decode_buffer;
7484- scanline_scatter_buffer_end =
7485- ((char *)scanline_scatter_buffer) +
7486- sizeof(float) * stbir_info->effective_channels *
7487- (stbir_info->scanline_extents.conservative.n1 -
7488- stbir_info->scanline_extents.conservative.n0 + 1);
7489- } else {
7490- handle_scanline_for_scatter = stbir__encode_first_scanline_from_scatter;
7491- scanline_scatter_buffer = split_info->vertical_buffer;
7492- scanline_scatter_buffer_end =
7493- ((char *)scanline_scatter_buffer) +
7494- sizeof(float) * stbir_info->effective_channels *
7495- stbir_info->horizontal.scale_info.output_sub_size;
7496- }
7497-
7498- // initialize the ring buffer for scattering
7499- split_info->ring_buffer_first_scanline = start_output_y;
7500- split_info->ring_buffer_last_scanline = -1;
7501- split_info->ring_buffer_begin_index = -1;
7502-
7503- // mark all the buffers as empty to start
7504- for (y = 0; y < stbir_info->ring_buffer_num_entries; y++) {
7505- float *decode_buffer =
7506- stbir__get_ring_buffer_entry(stbir_info, split_info, y);
7507- decode_buffer[width_times_channels] =
7508- 0.0f; // clear two over for horizontals with a remnant of 3
7509- decode_buffer[width_times_channels + 1] = 0.0f;
7510- decode_buffer[0] = STBIR__FLOAT_EMPTY_MARKER; // only used on scatter
7511- }
7512-
7513- // do the loop in input space
7514- on_first_input_y = 1;
7515- last_input_y = start_input_y;
7516- for (y = start_input_y; y < end_input_y; y++) {
7517- int out_first_scanline, out_last_scanline;
7518-
7519- out_first_scanline = vertical_contributors->n0;
7520- out_last_scanline = vertical_contributors->n1;
7521-
7522- STBIR_ASSERT(out_last_scanline - out_first_scanline + 1 <=
7523- stbir_info->ring_buffer_num_entries);
7524-
7525- if ((out_last_scanline >= out_first_scanline) &&
7526- (((out_first_scanline >= start_output_y) &&
7527- (out_first_scanline < end_output_y)) ||
7528- ((out_last_scanline >= start_output_y) &&
7529- (out_last_scanline < end_output_y)))) {
7530- float const *vc = vertical_coefficients;
7531-
7532- // keep track of the range actually seen for the next resize
7533- last_input_y = y;
7534- if ((on_first_input_y) && (y > start_input_y)) {
7535- split_info->start_input_y = y;
7536- }
7537- on_first_input_y = 0;
7538-
7539- // clip the region
7540- if (out_first_scanline < start_output_y) {
7541- vc += start_output_y - out_first_scanline;
7542- out_first_scanline = start_output_y;
7543- }
7544-
7545- if (out_last_scanline >= end_output_y) {
7546- out_last_scanline = end_output_y - 1;
7547- }
7548-
7549- // if very first scanline, init the index
7550- if (split_info->ring_buffer_begin_index < 0) {
7551- split_info->ring_buffer_begin_index =
7552- out_first_scanline - start_output_y;
7553- }
7554-
7555- STBIR_ASSERT(split_info->ring_buffer_begin_index <=
7556- out_first_scanline);
7557-
7558- // Decode the nth scanline from the source image into the decode
7559- // buffer.
7560- stbir__decode_scanline(
7561- stbir_info, y,
7562- split_info->decode_buffer STBIR_ONLY_PROFILE_SET_SPLIT_INFO);
7563-
7564- // When horizontal first, we resample horizontally into the vertical
7565- // buffer before we scatter it out
7566- if (!stbir_info->vertical_first) {
7567- stbir__resample_horizontal_gather(
7568- stbir_info, split_info->vertical_buffer,
7569- split_info
7570- ->decode_buffer STBIR_ONLY_PROFILE_SET_SPLIT_INFO);
7571- }
7572-
7573- // Now it's sitting in the buffer ready to be distributed into the
7574- // ring buffers.
7575-
7576- // evict from the ring buffer if it is full
7577- if (((split_info->ring_buffer_last_scanline -
7578- split_info->ring_buffer_first_scanline + 1) ==
7579- stbir_info->ring_buffer_num_entries) &&
7580- (out_last_scanline > split_info->ring_buffer_last_scanline)) {
7581- handle_scanline_for_scatter(stbir_info, split_info);
7582- }
7583-
7584- // Now the horizontal buffer is ready to write to all ring buffer
7585- // rows, so do it.
7586- stbir__resample_vertical_scatter(
7587- stbir_info, split_info, out_first_scanline, out_last_scanline,
7588- vc, (float *)scanline_scatter_buffer,
7589- (float *)scanline_scatter_buffer_end);
7590-
7591- // update the end of the buffer
7592- if (out_last_scanline > split_info->ring_buffer_last_scanline) {
7593- split_info->ring_buffer_last_scanline = out_last_scanline;
7594- }
7595- }
7596- ++vertical_contributors;
7597- vertical_coefficients += stbir_info->vertical.coefficient_width;
7598- }
7599-
7600- // now evict the scanlines that are left over in the ring buffer
7601- while (split_info->ring_buffer_first_scanline < end_output_y) {
7602- handle_scanline_for_scatter(stbir_info, split_info);
7603- }
7604-
7605- // update the end_input_y if we do multiple resizes with the same data
7606- ++last_input_y;
7607- for (y = 0; y < split_count; y++) {
7608- if (split_info[y].end_input_y > last_input_y) {
7609- split_info[y].end_input_y = last_input_y;
7610- }
7611- }
7612-}
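The loop above runs in input space: each decoded input row is pushed out, with its own weights, to every output row in its contributor range. Here is a minimal one-dimensional sketch of that scatter idea. It makes several simplifying assumptions: a single channel, the whole output held in memory instead of a ring buffer, and zero-initialized outputs instead of the separate "set" and "blend" paths. `scatter_range` and `scatter_resample` are hypothetical names.

```c
/* Hypothetical contributor record: the output range [n0, n1] that one input
   sample touches (the role stbir__contributors plays in scatter mode). */
typedef struct {
    int n0, n1;
} scatter_range;

/* Scatter-style 1-D resample: walk the input once and add each sample,
   times its per-output weight, into every output it touches. `coeffs`
   holds `coeff_width` weights per input sample, the first one for n0. */
static void scatter_resample(const float *in, int in_len,
                             const scatter_range *ranges,
                             const float *coeffs, int coeff_width,
                             float *out, int out_len)
{
    int y, o;
    for (o = 0; o < out_len; o++) {
        out[o] = 0.0f; /* simplification: the real code "sets" empty rows */
    }
    for (y = 0; y < in_len; y++) {
        const float *c = coeffs + y * coeff_width;
        for (o = ranges[y].n0; o <= ranges[y].n1; o++) {
            if (o >= 0 && o < out_len) { /* clip to the output range */
                out[o] += in[y] * c[o - ranges[y].n0];
            }
        }
    }
}
```

With a 2:1 box filter (each input contributes 0.5 to output `y / 2`), the input `{1, 3, 5, 7}` becomes `{2, 6}`.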
7613-
7614-static stbir__kernel_callback *stbir__builtin_kernels[] = {
7615- 0,
7616- stbir__filter_trapezoid,
7617- stbir__filter_triangle,
7618- stbir__filter_cubic,
7619- stbir__filter_catmullrom,
7620- stbir__filter_mitchell,
7621- stbir__filter_point};
7622-static stbir__support_callback *stbir__builtin_supports[] = {
7623- 0,
7624- stbir__support_trapezoid,
7625- stbir__support_one,
7626- stbir__support_two,
7627- stbir__support_two,
7628- stbir__support_two,
7629- stbir__support_zeropoint5};
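The two tables above pair each built-in filter's kernel (a weight for a given distance) with its support (how far from the center the kernel is non-zero), and they leave slot 0 empty. This is a minimal sketch of that pairing with a single tent filter. The callback shapes are simplified: they drop the `user_data` argument. The names `kernel_fn`, `tent_kernel`, `MY_FILTER_TENT` and the slot numbering are hypothetical.

```c
/* Hypothetical callback shapes modelled loosely on the kernel/support pair. */
typedef float kernel_fn(float x, float scale);
typedef float support_fn(float scale);

/* Triangle (tent) kernel: weight falls linearly from 1 at x = 0 to 0 at
   |x| = 1. */
static float tent_kernel(float x, float scale)
{
    (void)scale;
    if (x < 0.0f) {
        x = -x;
    }
    return (x <= 1.0f) ? 1.0f - x : 0.0f;
}

/* The tent kernel is non-zero only within one unit of the center. */
static float tent_support(float scale)
{
    (void)scale;
    return 1.0f;
}

/* Slot 0 stays empty so that 0 can mean "choose a default filter". */
enum { MY_FILTER_DEFAULT = 0, MY_FILTER_TENT = 1, MY_FILTER_COUNT };
static kernel_fn *my_kernels[MY_FILTER_COUNT] = {0, tent_kernel};
static support_fn *my_supports[MY_FILTER_COUNT] = {0, tent_support};
```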
7630-
7631-static void
7632-stbir__set_sampler(stbir__sampler *samp, stbir_filter filter,
7633- stbir__kernel_callback *kernel,
7634- stbir__support_callback *support, stbir_edge edge,
7635- stbir__scale_info *scale_info, int always_gather,
7636- void *user_data)
7637-{
7638- // set filter
7639- if (filter == 0) {
7640- filter = STBIR_DEFAULT_FILTER_DOWNSAMPLE; // default to downsample
7641- if (scale_info->scale >= (1.0f - stbir__small_float)) {
7642- if ((scale_info->scale <= (1.0f + stbir__small_float)) &&
7643- (STBIR_CEILF(scale_info->pixel_shift) ==
7644- scale_info->pixel_shift)) {
7645- filter = STBIR_FILTER_POINT_SAMPLE;
7646- } else {
7647- filter = STBIR_DEFAULT_FILTER_UPSAMPLE;
7648- }
7649- }
7650- }
7651- samp->filter_enum = filter;
7652-
7653- STBIR_ASSERT(samp->filter_enum != 0);
7654- STBIR_ASSERT((unsigned)samp->filter_enum < STBIR_FILTER_OTHER);
7655- samp->filter_kernel = stbir__builtin_kernels[filter];
7656- samp->filter_support = stbir__builtin_supports[filter];
7657-
7658- if (kernel && support) {
7659- samp->filter_kernel = kernel;
7660- samp->filter_support = support;
7661- samp->filter_enum = STBIR_FILTER_OTHER;
7662- }
7663-
7664- samp->edge = edge;
7665- samp->filter_pixel_width = stbir__get_filter_pixel_width(
7666- samp->filter_support, scale_info->scale, user_data);
7667- // Gather is always better, but in extreme downsamples, you have to have most
7668- // or all of the data in memory
7669- // For horizontal, we always have all the pixels, so we always use gather
7670- // here (always_gather==1). For vertical, we use gather if scaling up
7671- // (which means we will have samp->filter_pixel_width scanlines in memory
7672- // at once).
7673- samp->is_gather = 0;
7674- if (scale_info->scale >= (1.0f - stbir__small_float)) {
7675- samp->is_gather = 1;
7676- } else if ((always_gather) ||
7677- (samp->filter_pixel_width <=
7678- STBIR_FORCE_GATHER_FILTER_SCANLINES_AMOUNT)) {
7679- samp->is_gather = 2;
7680- }
7681-
7682- // pre calculate stuff based on the above
7683- samp->coefficient_width =
7684- stbir__get_coefficient_width(samp, samp->is_gather, user_data);
7685-
7686- // filter_pixel_width is the conservative size in pixels of input that
7687- // affects an output pixel.
7688- // In rare cases (only with 2 pix to 1 pix with the default filters), it's
7689- // possible that the filter will extend before or after the scanline
7690- // beyond just one extra entire copy of the scanline (we would hit the
7691- // edge twice). We don't let you do that, so we clamp the total width to
7692- // 3x the number of input pixels (once for the scanline, once for the left
7693- // side overhang, and once for the right side). We only do this for wrap
7694- // mode, since the other edge modes can just clamp back in again.
7695- if (edge == STBIR_EDGE_WRAP) {
7696- if (samp->filter_pixel_width > (scale_info->input_full_size * 3)) {
7697- samp->filter_pixel_width = scale_info->input_full_size * 3;
7698- }
7699- }
7700-
7701- // This is how much to expand buffers to account for filters seeking outside
7702- // the image boundaries.
7703- samp->filter_pixel_margin = samp->filter_pixel_width / 2;
7704-
7705- // filter_pixel_margin is the amount that this filter can overhang on just
7706- // one side of either
7707- // end of the scanline (left or the right). Since we only allow you to
7708- // overhang 1 scanline's worth of pixels, we clamp this one side of
7709- // overhang to the input scanline size. Again, this clamping only happens
7710- // in rare cases with the default filters (2 pix to 1 pix).
7711- if (edge == STBIR_EDGE_WRAP) {
7712- if (samp->filter_pixel_margin > scale_info->input_full_size) {
7713- samp->filter_pixel_margin = scale_info->input_full_size;
7714- }
7715- }
7716-
7717- samp->num_contributors = stbir__get_contributors(samp, samp->is_gather);
7718-
7719- samp->contributors_size =
7720- samp->num_contributors * sizeof(stbir__contributors);
7721- samp->coefficients_size =
7722- samp->num_contributors * samp->coefficient_width * sizeof(float) +
7723- sizeof(float) *
7724- STBIR_INPUT_CALLBACK_PADDING; // extra sizeof(float) is padding
7725-
7726- samp->gather_prescatter_contributors = 0;
7727- samp->gather_prescatter_coefficients = 0;
7728- if (samp->is_gather == 0) {
7729- samp->gather_prescatter_coefficient_width = samp->filter_pixel_width;
7730- samp->gather_prescatter_num_contributors =
7731- stbir__get_contributors(samp, 2);
7732- samp->gather_prescatter_contributors_size =
7733- samp->gather_prescatter_num_contributors *
7734- sizeof(stbir__contributors);
7735- samp->gather_prescatter_coefficients_size =
7736- samp->gather_prescatter_num_contributors *
7737- samp->gather_prescatter_coefficient_width * sizeof(float);
7738- }
7739-}
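The sampler setup above makes two decisions: which default filter to use when none was requested, and which gather/scatter mode to use. Both can be sketched as small pure functions. This is a minimal sketch: the filter ids are hypothetical, and `SMALL_FLOAT` and `FORCE_GATHER_LINES` are assumed stand-ins for `stbir__small_float` and `STBIR_FORCE_GATHER_FILTER_SCANLINES_AMOUNT`, whose real values are defined elsewhere in the header.

```c
/* Hypothetical stand-ins: filter ids and the two tuning constants. */
enum { F_DEFAULT = 0, F_DOWN_DEFAULT, F_UP_DEFAULT, F_POINT };
#define SMALL_FLOAT (1.0f / 65536.0f) /* assumed epsilon */
#define FORCE_GATHER_LINES 32         /* assumed threshold */

/* Default filter choice: downsampling gets the downsample filter, a 1:1
   copy at a whole-pixel offset gets point sampling, and anything else
   (upsampling or a fractional shift) gets the upsample filter. */
static int pick_filter(int filter, float scale, float pixel_shift)
{
    if (filter != F_DEFAULT) {
        return filter;
    }
    if (scale < 1.0f - SMALL_FLOAT) {
        return F_DOWN_DEFAULT;
    }
    /* (float)(long)x == x tests "x is a whole number", like ceilf(x) == x */
    if (scale <= 1.0f + SMALL_FLOAT &&
        (float)(long)pixel_shift == pixel_shift) {
        return F_POINT;
    }
    return F_UP_DEFAULT;
}

/* 1 = gather (upsample or 1:1), 2 = gather while downsampling,
   0 = scatter (wide downsample filters that are not forced to gather). */
static int pick_gather_mode(float scale, int always_gather,
                            int filter_pixel_width)
{
    if (scale >= 1.0f - SMALL_FLOAT) {
        return 1;
    }
    if (always_gather || filter_pixel_width <= FORCE_GATHER_LINES) {
        return 2;
    }
    return 0;
}
```

The horizontal sampler passes `always_gather = 1`, so only the vertical axis can end up in scatter mode.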
7740-
7741-static void
7742-stbir__get_conservative_extents(stbir__sampler *samp,
7743- stbir__contributors *range, void *user_data)
7744-{
7745- float scale = samp->scale_info.scale;
7746- float out_shift = samp->scale_info.pixel_shift;
7747- stbir__support_callback *support = samp->filter_support;
7748- int input_full_size = samp->scale_info.input_full_size;
7749- stbir_edge edge = samp->edge;
7750- float inv_scale = samp->scale_info.inv_scale;
7751-
7752- STBIR_ASSERT(samp->is_gather != 0);
7753-
7754- if (samp->is_gather == 1) {
7755- int in_first_pixel, in_last_pixel;
7756- float out_filter_radius = support(inv_scale, user_data) * scale;
7757-
7758- stbir__calculate_in_pixel_range(&in_first_pixel, &in_last_pixel, 0.5,
7759- out_filter_radius, inv_scale, out_shift,
7760- input_full_size, edge);
7761- range->n0 = in_first_pixel;
7762- stbir__calculate_in_pixel_range(
7763- &in_first_pixel, &in_last_pixel,
7764- ((float)(samp->scale_info.output_sub_size - 1)) + 0.5f,
7765- out_filter_radius, inv_scale, out_shift, input_full_size, edge);
7766- range->n1 = in_last_pixel;
7767- } else if (samp->is_gather == 2) // downsample gather, refine
7768- {
7769- float in_pixels_radius = support(scale, user_data) * inv_scale;
7770- int filter_pixel_margin = samp->filter_pixel_margin;
7771- int output_sub_size = samp->scale_info.output_sub_size;
7772- int input_end;
7773- int n;
7774- int in_first_pixel, in_last_pixel;
7775-
7776- // get a conservative area of the input range
7777- stbir__calculate_in_pixel_range(&in_first_pixel, &in_last_pixel, 0, 0,
7778- inv_scale, out_shift, input_full_size,
7779- edge);
7780- range->n0 = in_first_pixel;
7781- stbir__calculate_in_pixel_range(&in_first_pixel, &in_last_pixel,
7782- (float)output_sub_size, 0, inv_scale,
7783- out_shift, input_full_size, edge);
7784- range->n1 = in_last_pixel;
7785-
7786- // now go through the margin to the start of area to find bottom
7787- n = range->n0 + 1;
7788- input_end = -filter_pixel_margin;
7789- while (n >= input_end) {
7790- int out_first_pixel, out_last_pixel;
7791- stbir__calculate_out_pixel_range(
7792- &out_first_pixel, &out_last_pixel, ((float)n) + 0.5f,
7793- in_pixels_radius, scale, out_shift, output_sub_size);
7794- if (out_first_pixel > out_last_pixel) {
7795- break;
7796- }
7797-
7798- if ((out_first_pixel < output_sub_size) || (out_last_pixel >= 0)) {
7799- range->n0 = n;
7800- }
7801- --n;
7802- }
7803-
7804- // now go through the end of the area through the margin to find top
7805- n = range->n1 - 1;
7806- input_end = n + 1 + filter_pixel_margin;
7807- while (n <= input_end) {
7808- int out_first_pixel, out_last_pixel;
7809- stbir__calculate_out_pixel_range(
7810- &out_first_pixel, &out_last_pixel, ((float)n) + 0.5f,
7811- in_pixels_radius, scale, out_shift, output_sub_size);
7812- if (out_first_pixel > out_last_pixel) {
7813- break;
7814- }
7815- if ((out_first_pixel < output_sub_size) || (out_last_pixel >= 0)) {
7816- range->n1 = n;
7817- }
7818- ++n;
7819- }
7820- }
7821-
7822- if (samp->edge == STBIR_EDGE_WRAP) {
7823- // if we are wrapping, and we are very close to the image size (so the
7824- // edges might merge), just use the scanline up to the edge
7825- if ((range->n0 > 0) && (range->n1 >= input_full_size)) {
7826- int marg = range->n1 - input_full_size + 1;
7827- if ((marg + STBIR__MERGE_RUNS_PIXEL_THRESHOLD) >= range->n0) {
7828- range->n0 = 0;
7829- }
7830- }
7831- if ((range->n0 < 0) && (range->n1 < (input_full_size - 1))) {
7832- int marg = -range->n0;
7833- if ((input_full_size - marg - STBIR__MERGE_RUNS_PIXEL_THRESHOLD -
7834- 1) <= range->n1) {
7835- range->n1 = input_full_size - 1;
7836- }
7837- }
7838- } else {
7839- // for non-edge-wrap modes, we never read over the edge, so clamp
7840- if (range->n0 < 0) {
7841- range->n0 = 0;
7842- }
7843- if (range->n1 >= input_full_size) {
7844- range->n1 = input_full_size - 1;
7845- }
7846- }
7847-}
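The last block of the extent calculation above either merges wrapped runs or clamps to the image. This is a minimal sketch of that final step on its own. `MERGE_RUNS_THRESHOLD` is an assumed stand-in for `STBIR__MERGE_RUNS_PIXEL_THRESHOLD`, and `finish_extents` is a hypothetical name.

```c
#define MERGE_RUNS_THRESHOLD 4 /* assumed stand-in for the real threshold */

/* With wrapping, if the overhang that wraps around almost meets the main
   run, read the scanline up to the edge instead of two separate pieces;
   without wrapping, never read past the image, so clamp. */
static void finish_extents(int *n0, int *n1, int input_full_size, int wrap)
{
    if (wrap) {
        if (*n0 > 0 && *n1 >= input_full_size) {
            int marg = *n1 - input_full_size + 1; /* pixels past the right */
            if (marg + MERGE_RUNS_THRESHOLD >= *n0) {
                *n0 = 0;
            }
        }
        if (*n0 < 0 && *n1 < input_full_size - 1) {
            int marg = -*n0; /* pixels before the left edge */
            if (input_full_size - marg - MERGE_RUNS_THRESHOLD - 1 <= *n1) {
                *n1 = input_full_size - 1;
            }
        }
    } else {
        if (*n0 < 0) {
            *n0 = 0;
        }
        if (*n1 >= input_full_size) {
            *n1 = input_full_size - 1;
        }
    }
}
```

For a 100 pixel scanline with wrapping, the range 5..101 wraps 2 pixels onto the start, which is within the threshold of pixel 5, so the range becomes 0..101. The range 50..101 is left alone.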
7848-
7849-static void
7850-stbir__get_split_info(stbir__per_split_info *split_info, int splits,
7851- int output_height, int vertical_pixel_margin,
7852- int input_full_height, int is_gather,
7853- stbir__contributors *contribs)
7854-{
7855- int i, cur;
7856- int left = output_height;
7857-
7858- cur = 0;
7859- for (i = 0; i < splits; i++) {
7860- int each;
7861-
7862- split_info[i].start_output_y = cur;
7863- each = left / (splits - i);
7864- split_info[i].end_output_y = cur + each;
7865-
7866- // ok, when we are gathering, we need to make sure we are starting on a
7867- // y offset that doesn't have
7868- // a "special" set of coefficients. Basically, with exactly the right
7869- // filter at exactly the right resize at exactly the right phase, some
7870- // of the coefficients can be zero. When they are zero, we don't
7871- // process them at all. But this leads to a tricky thing with the
7872- // thread splits, where we might have a set of two coeffs like this
7873- // for example: (4,4) and (3,6). The 4,4 means there was just one
7874- // single coeff because things worked out perfectly (normally, they
7875- // all have 4 coeffs, like the range 3,6). The problem is that if we
7876- // start right on the (4,4) on a brand new thread, then when we get to
7877- // (3,6), we don't have the "3" sample in memory (because we didn't
7878- // load it on the initial (4,4) range, since that range didn't have a 3;
7879- // we only add new samples that are larger than our existing samples,
7880- // which is just how the eviction works). So, our solution here is pretty
7881- // simple: if we start right on a range that has samples that start
7882- // earlier, then we simply bump up our previous thread split range to
7883- // include it, and then start this thread's range with the smaller
7884- // sample. It just moves one scanline from one thread split to
7885- // another, so that we end with the unusual one, instead of starting with
7886- // it. To do this, we check 2-4 samples at each thread split start and
7887- // then occasionally move them.
7888-
7889- if ((is_gather) && (i)) {
7890- stbir__contributors *small_contribs;
7891- int j, smallest, stop, start_n0;
7892- stbir__contributors *split_contribs = contribs + cur;
7893-
7894- // scan for a max of 3x the filter width or until the next thread
7895- // split
7896- stop = vertical_pixel_margin * 3;
7897- if (each < stop) {
7898- stop = each;
7899- }
7900-
7901- // loops a few times before early out
7902- smallest = 0;
7903- small_contribs = split_contribs;
7904- start_n0 = small_contribs->n0;
7905- for (j = 1; j <= stop; j++) {
7906- ++split_contribs;
7907- if (split_contribs->n0 > start_n0) {
7908- break;
7909- }
7910- if (split_contribs->n0 < small_contribs->n0) {
7911- small_contribs = split_contribs;
7912- smallest = j;
7913- }
7914- }
7915-
7916- split_info[i - 1].end_output_y += smallest;
7917- split_info[i].start_output_y += smallest;
7918- }
7919-
7920- cur += each;
7921- left -= each;
7922-
7923- // scatter range (updated to minimum as you run it)
7924- split_info[i].start_input_y = -vertical_pixel_margin;
7925- split_info[i].end_input_y = input_full_height + vertical_pixel_margin;
7926- }
7927-}
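The long comment above explains why a gather split that would start on an unusual row, such as (4,4) followed by (3,6), is shifted so that the previous split absorbs the unusual row. This is a minimal sketch of the look-ahead that computes the shift. `row_range` and `split_bump` are hypothetical names, and the caller is assumed to guarantee `stop + 1` readable entries, as the original loop also requires.

```c
/* Hypothetical per-row record: first and last input row an output row uses. */
typedef struct {
    int n0, n1;
} row_range;

/* Starting at the proposed split row c[0], look ahead up to `stop` rows and
   return how many rows to move into the previous split, so the new split
   starts on the row with the smallest n0. Stop early once n0 grows past
   the starting n0. */
static int split_bump(const row_range *c, int stop)
{
    int j, smallest = 0;
    int start_n0 = c[0].n0;
    int small_n0 = c[0].n0;
    for (j = 1; j <= stop; j++) {
        if (c[j].n0 > start_n0) {
            break;
        }
        if (c[j].n0 < small_n0) {
            small_n0 = c[j].n0;
            smallest = j;
        }
    }
    return smallest;
}
```

Using the comment's own example, rows (4,4), (3,6), (4,7) give a bump of 1: the (4,4) row moves to the previous split, and the new split starts on (3,6).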
7928-
7929-static void
7930-stbir__free_internal_mem(stbir__info *info)
7931-{
7932-#define STBIR__FREE_AND_CLEAR(ptr) \
7933- { \
7934- if (ptr) { \
7935- void *p = (ptr); \
7936- (ptr) = 0; \
7937- STBIR_FREE(p, info->user_data); \
7938- } \
7939- }
7940-
7941- if (info) {
7942-#ifndef STBIR__SEPARATE_ALLOCATIONS
7943- STBIR__FREE_AND_CLEAR(info->alloced_mem);
7944-#else
7945- int i, j;
7946-
7947- if ((info->vertical.gather_prescatter_contributors) &&
7948- ((void *)info->vertical.gather_prescatter_contributors !=
7949- (void *)info->split_info[0].decode_buffer)) {
7950- STBIR__FREE_AND_CLEAR(
7951- info->vertical.gather_prescatter_coefficients);
7952- STBIR__FREE_AND_CLEAR(
7953- info->vertical.gather_prescatter_contributors);
7954- }
7955- for (i = 0; i < info->splits; i++) {
7956- for (j = 0; j < info->alloc_ring_buffer_num_entries; j++) {
7957-#ifdef STBIR_SIMD8
7958- if (info->effective_channels == 3) {
7959- --info->split_info[i]
7960- .ring_buffers[j]; // avx in 3 channel mode needs one
7961- // float at the start of the buffer
7962- }
7963-#endif
7964- STBIR__FREE_AND_CLEAR(info->split_info[i].ring_buffers[j]);
7965- }
7966-
7967-#ifdef STBIR_SIMD8
7968- if (info->effective_channels == 3) {
7969- --info->split_info[i]
7970- .decode_buffer; // avx in 3 channel mode needs one float
7971- // at the start of the buffer
7972- }
7973-#endif
7974- STBIR__FREE_AND_CLEAR(info->split_info[i].decode_buffer);
7975- STBIR__FREE_AND_CLEAR(info->split_info[i].ring_buffers);
7976- STBIR__FREE_AND_CLEAR(info->split_info[i].vertical_buffer);
7977- }
7978- STBIR__FREE_AND_CLEAR(info->split_info);
7979- if (info->vertical.coefficients != info->horizontal.coefficients) {
7980- STBIR__FREE_AND_CLEAR(info->vertical.coefficients);
7981- STBIR__FREE_AND_CLEAR(info->vertical.contributors);
7982- }
7983- STBIR__FREE_AND_CLEAR(info->horizontal.coefficients);
7984- STBIR__FREE_AND_CLEAR(info->horizontal.contributors);
7985- STBIR__FREE_AND_CLEAR(info->alloced_mem);
7986- STBIR_FREE(info, info->user_data);
7987-#endif
7988- }
7989-
7990-#undef STBIR__FREE_AND_CLEAR
7991-}
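The free routine above relies on two small patterns: a macro that frees a pointer and nulls it in one step, and stepping a pointer back by one float before freeing a buffer that was handed out one float past its start. This is a minimal sketch of both using plain `malloc`/`free`. `FREE_AND_CLEAR`, `alloc_with_lead` and `free_with_lead` are hypothetical names. The sketch uses `do { } while (0)` instead of a bare block so that the macro behaves like a single statement, even after an `if` with an `else`.

```c
#include <stdlib.h>

/* Free once and null the pointer, so calling it again is a no-op. */
#define FREE_AND_CLEAR(ptr)   \
    do {                      \
        if (ptr) {            \
            void *p_ = (ptr); \
            (ptr) = NULL;     \
            free(p_);         \
        }                     \
    } while (0)

/* Hypothetical helpers for the "one spare float at the start" pattern:
   allocate n + 1 floats, hand out a pointer one float in, and step back
   before freeing (like the pointer decrement in the STBIR_SIMD8 paths). */
static float *alloc_with_lead(size_t n)
{
    float *p = malloc((n + 1) * sizeof(float));
    return p ? p + 1 : NULL;
}

static void free_with_lead(float **pp)
{
    if (*pp) {
        float *base = *pp - 1; /* recover the pointer malloc returned */
        *pp = NULL;
        free(base);
    }
}
```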
7992-
7993-static int
7994-stbir__get_max_split(int splits, int height)
7995-{
7996- int i;
7997- int max = 0;
7998-
7999- for (i = 0; i < splits; i++) {
8000- int each = height / (splits - i);
8001- if (each > max) {
8002- max = each;
8003- }
8004- height -= each;
8005- }
8006- return max;
8007-}
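The split code and `stbir__get_max_split` above share one partitioning rule: each split takes `remaining / splits_left` rows, so any leftover rows land in the later splits. The allocator uses the largest split to cap the number of ring-buffer entries in scatter mode. This is a minimal sketch of the rule. `even_splits` and `largest_split` are hypothetical names.

```c
/* Partition `height` rows over `splits` workers: split i covers rows
   start[i] .. end[i] - 1. */
static void even_splits(int splits, int height, int *start, int *end)
{
    int i, cur = 0;
    for (i = 0; i < splits; i++) {
        int each = height / (splits - i);
        start[i] = cur;
        end[i] = cur + each;
        cur += each;
        height -= each;
    }
}

/* Largest split produced by the same rule. */
static int largest_split(int splits, int height)
{
    int i, max = 0;
    for (i = 0; i < splits; i++) {
        int each = height / (splits - i);
        if (each > max) {
            max = each;
        }
        height -= each;
    }
    return max;
}
```

Ten rows over three splits gives 3, 3 and 4 rows, so the largest split is 4.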
8008-
8009-static stbir__horizontal_gather_channels_func *
8010- *stbir__horizontal_gather_n_coeffs_funcs[8] = {
8011- 0,
8012- stbir__horizontal_gather_1_channels_with_n_coeffs_funcs,
8013- stbir__horizontal_gather_2_channels_with_n_coeffs_funcs,
8014- stbir__horizontal_gather_3_channels_with_n_coeffs_funcs,
8015- stbir__horizontal_gather_4_channels_with_n_coeffs_funcs,
8016- 0,
8017- 0,
8018- stbir__horizontal_gather_7_channels_with_n_coeffs_funcs};
8019-
8020-static stbir__horizontal_gather_channels_func *
8021- *stbir__horizontal_gather_channels_funcs[8] = {
8022- 0,
8023- stbir__horizontal_gather_1_channels_funcs,
8024- stbir__horizontal_gather_2_channels_funcs,
8025- stbir__horizontal_gather_3_channels_funcs,
8026- stbir__horizontal_gather_4_channels_funcs,
8027- 0,
8028- 0,
8029- stbir__horizontal_gather_7_channels_funcs};
8030-
8031- // there are eight classification slots (4 is unused): 0 == vertical scatter,
8032- // 1 == vertical gather <= 1x scale, 2 == 1x-2x scale, 3 == 2x-3x scale,
8033- // 5 == 3x-4x scale, 6 == > 4x scale or a <=4 pixel output that is shorter
8034- // than it is wide, 7 == any other <=4 pixel output (e.g. a narrow column)
8035-#define STBIR_RESIZE_CLASSIFICATIONS 8
8036-
8037-static float stbir__compute_weights[5][STBIR_RESIZE_CLASSIFICATIONS]
8038- [4] = // 5 tables, by channel count: 0=1chan, 1=2chan, 2=3chan,
8039- // 3=4chan, 4=7chan
8040- {{
8041- {1.00000f, 1.00000f, 0.31250f, 1.00000f},
8042- {0.56250f, 0.59375f, 0.00000f, 0.96875f},
8043- {1.00000f, 0.06250f, 0.00000f, 1.00000f},
8044- {0.00000f, 0.09375f, 1.00000f, 1.00000f},
8045- {1.00000f, 1.00000f, 1.00000f, 1.00000f},
8046- {0.03125f, 0.12500f, 1.00000f, 1.00000f},
8047- {0.06250f, 0.12500f, 0.00000f, 1.00000f},
8048- {0.00000f, 1.00000f, 0.00000f, 0.03125f},
8049- },
8050- {
8051- {0.00000f, 0.84375f, 0.00000f, 0.03125f},
8052- {0.09375f, 0.93750f, 0.00000f, 0.78125f},
8053- {0.87500f, 0.21875f, 0.00000f, 0.96875f},
8054- {0.09375f, 0.09375f, 1.00000f, 1.00000f},
8055- {1.00000f, 1.00000f, 1.00000f, 1.00000f},
8056- {0.03125f, 0.12500f, 1.00000f, 1.00000f},
8057- {0.06250f, 0.12500f, 0.00000f, 1.00000f},
8058- {0.00000f, 1.00000f, 0.00000f, 0.53125f},
8059- },
8060- {
8061- {0.00000f, 0.53125f, 0.00000f, 0.03125f},
8062- {0.06250f, 0.96875f, 0.00000f, 0.53125f},
8063- {0.87500f, 0.18750f, 0.00000f, 0.93750f},
8064- {0.00000f, 0.09375f, 1.00000f, 1.00000f},
8065- {1.00000f, 1.00000f, 1.00000f, 1.00000f},
8066- {0.03125f, 0.12500f, 1.00000f, 1.00000f},
8067- {0.06250f, 0.12500f, 0.00000f, 1.00000f},
8068- {0.00000f, 1.00000f, 0.00000f, 0.56250f},
8069- },
8070- {
8071- {0.00000f, 0.50000f, 0.00000f, 0.71875f},
8072- {0.06250f, 0.84375f, 0.00000f, 0.87500f},
8073- {1.00000f, 0.50000f, 0.50000f, 0.96875f},
8074- {1.00000f, 0.09375f, 0.31250f, 0.50000f},
8075- {1.00000f, 1.00000f, 1.00000f, 1.00000f},
8076- {1.00000f, 0.03125f, 0.03125f, 0.53125f},
8077- {0.18750f, 0.12500f, 0.00000f, 1.00000f},
8078- {0.00000f, 1.00000f, 0.03125f, 0.18750f},
8079- },
8080- {
8081- {0.00000f, 0.59375f, 0.00000f, 0.96875f},
8082- {0.06250f, 0.81250f, 0.06250f, 0.59375f},
8083- {0.75000f, 0.43750f, 0.12500f, 0.96875f},
8084- {0.87500f, 0.06250f, 0.18750f, 0.43750f},
8085- {1.00000f, 1.00000f, 1.00000f, 1.00000f},
8086- {0.15625f, 0.12500f, 1.00000f, 1.00000f},
8087- {0.06250f, 0.12500f, 0.00000f, 1.00000f},
8088- {0.00000f, 1.00000f, 0.03125f, 0.34375f},
8089- }};
8090-
8091- // structure that allows us to query and override info for training the costs
8092-typedef struct STBIR__V_FIRST_INFO {
8093- double v_cost, h_cost;
8094- int control_v_first; // 0 = no control, 1 = force hori, 2 = force vert
8095- int v_first;
8096- int v_resize_classification;
8097- int is_gather;
8098-} STBIR__V_FIRST_INFO;
8099-
8100-#ifdef STBIR__V_FIRST_INFO_BUFFER
8101-static STBIR__V_FIRST_INFO STBIR__V_FIRST_INFO_BUFFER = {0};
8102-#define STBIR__V_FIRST_INFO_POINTER &STBIR__V_FIRST_INFO_BUFFER
8103-#else
8104-#define STBIR__V_FIRST_INFO_POINTER 0
8105-#endif
8106-
8107-// Figure out whether to scale along the horizontal or vertical first.
8108- // This is only *super* important when you are scaling by a massively
8109- // different amount in the vertical vs the horizontal (for example, if
8110- // you are scaling by 2x in the width, and 0.5x in the height, then you
8111- // want to do the vertical scale first, because it's around 3x faster
8112- // in that order).
8113- //
8114- // In more normal circumstances, this makes a 20-40% difference, so
8115- // it's good to get right, but not critical. The normal way that you
8116- // decide which direction goes first is just figuring out which
8117- // direction does more multiplies. But modern CPUs, with their fancy
8118- // caches, SIMD and high IPC, make the real cost depend on a lot
8119- // more than that.
8120-//
8121-// My handwavy sort of solution is to have an app that does a whole
8122-// bunch of timing for both vertical and horizontal first modes,
8123-// and then another app that can read lots of these timing files
8124-// and try to search for the best weights to use. Dotimings.c
8125-// is the app that does a bunch of timings, and vf_train.c is the
8126-// app that solves for the best weights (and shows how well it
8127-// does currently).
8128-
8129-static int
8130-stbir__should_do_vertical_first(
8131- float weights_table[STBIR_RESIZE_CLASSIFICATIONS][4],
8132- int horizontal_filter_pixel_width, float horizontal_scale,
8133- int horizontal_output_size, int vertical_filter_pixel_width,
8134- float vertical_scale, int vertical_output_size, int is_gather,
8135- STBIR__V_FIRST_INFO *info)
8136-{
8137- double v_cost, h_cost;
8138- float *weights;
8139- int vertical_first;
8140- int v_classification;
8141-
8142- // categorize the resize into buckets
8143- if ((vertical_output_size <= 4) || (horizontal_output_size <= 4)) {
8144- v_classification =
8145- (vertical_output_size < horizontal_output_size) ? 6 : 7;
8146- } else if (vertical_scale <= 1.0f) {
8147- v_classification = (is_gather) ? 1 : 0;
8148- } else if (vertical_scale <= 2.0f) {
8149- v_classification = 2;
8150- } else if (vertical_scale <= 3.0f) {
8151- v_classification = 3;
8152- } else if (vertical_scale <= 4.0f) {
8153- v_classification = 5;
8154- } else {
8155- v_classification = 6;
8156- }
8157-
8158- // use the right weights
8159- weights = weights_table[v_classification];
8160-
8161- // these are the costs when you don't take into account modern CPUs with high
8162- // IPC, SIMD and caches - wish we had a better estimate
8163- h_cost = (float)horizontal_filter_pixel_width * weights[0] +
8164- horizontal_scale * (float)vertical_filter_pixel_width * weights[1];
8165- v_cost = (float)vertical_filter_pixel_width * weights[2] +
8166- vertical_scale * (float)horizontal_filter_pixel_width * weights[3];
8167-
8168- // use computation estimate to decide vertical first or not
8169- vertical_first = (v_cost <= h_cost) ? 1 : 0;
8170-
8171- // save these, if requested
8172- if (info) {
8173- info->h_cost = h_cost;
8174- info->v_cost = v_cost;
8175- info->v_resize_classification = v_classification;
8176- info->v_first = vertical_first;
8177- info->is_gather = is_gather;
8178- }
8179-
8180- // and this allows us to override everything for testing (see dotiming.c)
8181- if ((info) && (info->control_v_first)) {
8182- vertical_first = (info->control_v_first == 2) ? 1 : 0;
8183- }
8184-
8185- return vertical_first;
8186-}
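The decision above comes in two steps. The resize is first put into a classification bucket, which selects a row of tuned weights, and then two weighted multiply counts are compared. This is a minimal sketch of both steps. It uses unit weights in the example, whereas the real tables are tuned per bucket and per channel count. It computes in double throughout, while the original computes in float and stores the result in double. `classify_resize` and `prefer_vertical_first` are hypothetical names.

```c
/* Same bucketing as the classification code above (slot 4 is never
   produced). */
static int classify_resize(float vertical_scale, int is_gather,
                           int horizontal_output_size,
                           int vertical_output_size)
{
    if (vertical_output_size <= 4 || horizontal_output_size <= 4) {
        return (vertical_output_size < horizontal_output_size) ? 6 : 7;
    }
    if (vertical_scale <= 1.0f) {
        return is_gather ? 1 : 0;
    }
    if (vertical_scale <= 2.0f) {
        return 2;
    }
    if (vertical_scale <= 3.0f) {
        return 3;
    }
    if (vertical_scale <= 4.0f) {
        return 5;
    }
    return 6;
}

/* Weighted cost model: returns 1 when the vertical-first estimate is no
   more expensive than the horizontal-first one. */
static int prefer_vertical_first(const float w[4], int h_filter_width,
                                 float h_scale, int v_filter_width,
                                 float v_scale)
{
    double h_cost = (double)h_filter_width * w[0] +
                    (double)h_scale * v_filter_width * w[1];
    double v_cost = (double)v_filter_width * w[2] +
                    (double)v_scale * h_filter_width * w[3];
    return v_cost <= h_cost;
}
```

With unit weights and 4-pixel filters, scaling 2x in width and 0.5x in height gives h_cost = 4 + 2*4 = 12 and v_cost = 4 + 0.5*4 = 6, so vertical first wins. This matches the example in the comment above.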
8187-
8188-// layout lookups - must match stbir_internal_pixel_layout
8189-static unsigned char stbir__pixel_channels[] = {
8190- 1, 2, 3, 3, 4, // 1ch, 2ch, rgb, bgr, 4ch
8191- 4, 4, 4, 4, 2, 2, // RGBA,BGRA,ARGB,ABGR,RA,AR
8192- 4, 4, 4, 4, 2, 2, // RGBA_PM,BGRA_PM,ARGB_PM,ABGR_PM,RA_PM,AR_PM
8193-};
8194-
8195-// the internal pixel layout enums are in a different order, so we can easily do
8196-// range comparisons of types
8197-// the public pixel layout is ordered in a way that if you cast num_channels
8198-// (1-4) to the enum, you get something sensible
8199-static stbir_internal_pixel_layout
8200- stbir__pixel_layout_convert_public_to_internal[] = {
8201- STBIRI_BGR, STBIRI_1CHANNEL, STBIRI_2CHANNEL, STBIRI_RGB,
8202- STBIRI_RGBA, STBIRI_4CHANNEL, STBIRI_BGRA, STBIRI_ARGB,
8203- STBIRI_ABGR, STBIRI_RA, STBIRI_AR, STBIRI_RGBA_PM,
8204- STBIRI_BGRA_PM, STBIRI_ARGB_PM, STBIRI_ABGR_PM, STBIRI_RA_PM,
8205- STBIRI_AR_PM,
8206-};
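The comments above say the internal layout enum is ordered so that each family of layouts is one contiguous range. The allocator that follows uses exactly those range tests to pick its alpha handling. This is a minimal sketch of the idea. It assumes the layout order given in the stbir__pixel_channels comments, and the `L_*` names are hypothetical. The point-sampling exception (no alpha weighting) is left out.

```c
/* Hypothetical internal layout order: plain layouts, then straight alpha,
   then premultiplied alpha. */
typedef enum {
    L_1CH, L_2CH, L_RGB, L_BGR, L_4CH,
    L_RGBA, L_BGRA, L_ARGB, L_ABGR, L_RA, L_AR,
    L_RGBA_PM, L_BGRA_PM, L_ARGB_PM, L_ABGR_PM, L_RA_PM, L_AR_PM
} layout;

static int is_straight_alpha(layout l) { return l >= L_RGBA && l <= L_AR; }
static int is_premult_alpha(layout l) { return l >= L_RGBA_PM && l <= L_AR_PM; }

/* Alpha handling chosen by range tests, numbered like the
   alpha_weighting_type values in the allocator below. */
static int alpha_mode(layout in, layout out, int fast_alpha)
{
    if (is_straight_alpha(in) && is_straight_alpha(out)) {
        return fast_alpha ? 4 : 2;
    }
    if (is_premult_alpha(in) && is_straight_alpha(out)) {
        return 3; /* premultiplied in, straight out */
    }
    if (is_straight_alpha(in) && is_premult_alpha(out)) {
        return 1; /* straight in, premultiplied out */
    }
    return 0; /* no alpha, or premultiplied on both sides */
}
```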
8207-
8208-static stbir__info *
8209-stbir__alloc_internal_mem_and_build_samplers(
8210- stbir__sampler *horizontal, stbir__sampler *vertical,
8211- stbir__contributors *conservative,
8212- stbir_pixel_layout input_pixel_layout_public,
8213- stbir_pixel_layout output_pixel_layout_public, int splits, int new_x,
8214- int new_y, int fast_alpha,
8215- void *user_data STBIR_ONLY_PROFILE_BUILD_GET_INFO)
8216-{
8217- static char stbir_channel_count_index[8] = {9, 0, 1, 2, 3, 9, 9, 4};
8218-
8219- stbir__info *info = 0;
8220- void *alloced = 0;
8221- size_t alloced_total = 0;
8222- int vertical_first;
8223- size_t decode_buffer_size, ring_buffer_length_bytes, ring_buffer_size,
8224- vertical_buffer_size;
8225- int alloc_ring_buffer_num_entries;
8226-
8227- int alpha_weighting_type = 0; // 0=none, 1=simple, 2=fancy, 3=premult in/non-premult out, 4=fast_alpha
8228- int conservative_split_output_size =
8229- stbir__get_max_split(splits, vertical->scale_info.output_sub_size);
8230- stbir_internal_pixel_layout input_pixel_layout =
8231- stbir__pixel_layout_convert_public_to_internal
8232- [input_pixel_layout_public];
8233- stbir_internal_pixel_layout output_pixel_layout =
8234- stbir__pixel_layout_convert_public_to_internal
8235- [output_pixel_layout_public];
8236- int channels = stbir__pixel_channels[input_pixel_layout];
8237- int effective_channels = channels;
8238-
8239- // first figure out what type of alpha weighting to use (if any)
8240- if ((horizontal->filter_enum != STBIR_FILTER_POINT_SAMPLE) ||
8241- (vertical->filter_enum !=
8242- STBIR_FILTER_POINT_SAMPLE)) // no alpha weighting on point sampling
8243- {
8244- if ((input_pixel_layout >= STBIRI_RGBA) &&
8245- (input_pixel_layout <= STBIRI_AR) &&
8246- (output_pixel_layout >= STBIRI_RGBA) &&
8247- (output_pixel_layout <= STBIRI_AR)) {
8248- if (fast_alpha) {
8249- alpha_weighting_type = 4;
8250- } else {
8251- static int fancy_alpha_effective_cnts[6] = {7, 7, 7, 7, 3, 3};
8252- alpha_weighting_type = 2;
8253- effective_channels =
8254- fancy_alpha_effective_cnts[input_pixel_layout -
8255- STBIRI_RGBA];
8256- }
8257- } else if ((input_pixel_layout >= STBIRI_RGBA_PM) &&
8258- (input_pixel_layout <= STBIRI_AR_PM) &&
8259- (output_pixel_layout >= STBIRI_RGBA) &&
8260- (output_pixel_layout <= STBIRI_AR)) {
8261- // input premult, output non-premult
8262- alpha_weighting_type = 3;
8263- } else if ((input_pixel_layout >= STBIRI_RGBA) &&
8264- (input_pixel_layout <= STBIRI_AR) &&
8265- (output_pixel_layout >= STBIRI_RGBA_PM) &&
8266- (output_pixel_layout <= STBIRI_AR_PM)) {
8267- // input non-premult, output premult
8268- alpha_weighting_type = 1;
8269- }
8270- }
8271-
8272- // channel in and out count must match currently
8273- if (channels != stbir__pixel_channels[output_pixel_layout]) {
8274- return 0;
8275- }
8276-
8277- // get vertical first
8278- vertical_first = stbir__should_do_vertical_first(
8279- stbir__compute_weights[(
8280- int)stbir_channel_count_index[effective_channels]],
8281- horizontal->filter_pixel_width, horizontal->scale_info.scale,
8282- horizontal->scale_info.output_sub_size, vertical->filter_pixel_width,
8283- vertical->scale_info.scale, vertical->scale_info.output_sub_size,
8284- vertical->is_gather, STBIR__V_FIRST_INFO_POINTER);
8285-
8286- // sometimes read one float off in some of the unrolled loops (with a weight
8287- // of zero coeff, so it doesn't have an effect)
8288- // we use a few extra floats instead of just 1, so that input callback
8289- // buffer can overlap with the decode buffer without the conversion
8290- // routines overwriting the callback input data.
8291- decode_buffer_size =
8292- (conservative->n1 - conservative->n0 + 1) * effective_channels *
8293- sizeof(float) +
8294- sizeof(float) * STBIR_INPUT_CALLBACK_PADDING; // extra floats for input
8295- // callback stagger
8296-
8297-#if defined(STBIR__SEPARATE_ALLOCATIONS) && defined(STBIR_SIMD8)
8298- if (effective_channels == 3) {
8299- decode_buffer_size +=
8300- sizeof(float); // avx in 3 channel mode needs one float at the start
8301- // of the buffer (only with separate allocations)
8302- }
8303-#endif
8304-
8305- ring_buffer_length_bytes =
8306- (size_t)horizontal->scale_info.output_sub_size *
8307- (size_t)effective_channels * sizeof(float) +
8308- sizeof(float) *
8309- STBIR_INPUT_CALLBACK_PADDING; // extra floats for padding
8310-
8311- // if we do vertical first, the ring buffer holds a whole decoded line
8312- if (vertical_first) {
8313- ring_buffer_length_bytes = (decode_buffer_size + 15) & ~15;
8314- }
8315-
8316- if ((ring_buffer_length_bytes & 4095) == 0) {
8317- ring_buffer_length_bytes += 64 * 3; // avoid 4k alias
8318- }
8319-
8320- // One extra entry because floating point precision problems sometimes cause
8321- // an extra to be necessary.
8322- alloc_ring_buffer_num_entries = vertical->filter_pixel_width + 1;
8323-
8324- // we never need more ring buffer entries than the scanlines we're
8325- // outputting when in scatter mode
8326- if ((!vertical->is_gather) &&
8327- (alloc_ring_buffer_num_entries > conservative_split_output_size)) {
8328- alloc_ring_buffer_num_entries = conservative_split_output_size;
8329- }
8330-
8331- ring_buffer_size = (size_t)alloc_ring_buffer_num_entries *
8332- (size_t)ring_buffer_length_bytes;
8333-
8334- // The vertical buffer is used differently, depending on whether we are
8335- // scattering
8336- // the vertical scanlines, or gathering them.
8337- // If scattering, it's used at the temp buffer to accumulate each output.
8338- // If gathering, it's just the output buffer.
8339- vertical_buffer_size = (size_t)horizontal->scale_info.output_sub_size *
8340- (size_t)effective_channels * sizeof(float) +
8341- sizeof(float); // extra float for padding
8342-
8343- // we make two passes through this loop, 1st to add everything up, 2nd to
8344- // allocate and init
8345- for (;;) {
8346- int i;
8347- void *advance_mem = alloced;
8348- int copy_horizontal = 0;
8349- stbir__sampler *possibly_use_horizontal_for_pivot = 0;
8350-
8351-#ifdef STBIR__SEPARATE_ALLOCATIONS
8352-#define STBIR__NEXT_PTR(ptr, size, ntype) \
8353- if (alloced) { \
8354- void *p = STBIR_MALLOC(size, user_data); \
8355- if (p == 0) { \
8356- stbir__free_internal_mem(info); \
8357- return 0; \
8358- } \
8359- (ptr) = (ntype *)p; \
8360- }
8361-#else
8362-#define STBIR__NEXT_PTR(ptr, size, ntype) \
8363- advance_mem = (void *)((((size_t)advance_mem) + 15) & ~15); \
8364- if (alloced) \
8365- ptr = (ntype *)advance_mem; \
8366- advance_mem = (char *)(((size_t)advance_mem) + (size));
8367-#endif
8368-
8369- STBIR__NEXT_PTR(info, sizeof(stbir__info), stbir__info);
8370-
8371- STBIR__NEXT_PTR(info->split_info,
8372- sizeof(stbir__per_split_info) * splits,
8373- stbir__per_split_info);
8374-
8375- if (info) {
8376- static stbir__alpha_weight_func *fancy_alpha_weights[6] = {
8377- stbir__fancy_alpha_weight_4ch, stbir__fancy_alpha_weight_4ch,
8378- stbir__fancy_alpha_weight_4ch, stbir__fancy_alpha_weight_4ch,
8379- stbir__fancy_alpha_weight_2ch, stbir__fancy_alpha_weight_2ch};
8380- static stbir__alpha_unweight_func *fancy_alpha_unweights[6] = {
8381- stbir__fancy_alpha_unweight_4ch,
8382- stbir__fancy_alpha_unweight_4ch,
8383- stbir__fancy_alpha_unweight_4ch,
8384- stbir__fancy_alpha_unweight_4ch,
8385- stbir__fancy_alpha_unweight_2ch,
8386- stbir__fancy_alpha_unweight_2ch};
8387- static stbir__alpha_weight_func *simple_alpha_weights[6] = {
8388- stbir__simple_alpha_weight_4ch, stbir__simple_alpha_weight_4ch,
8389- stbir__simple_alpha_weight_4ch, stbir__simple_alpha_weight_4ch,
8390- stbir__simple_alpha_weight_2ch, stbir__simple_alpha_weight_2ch};
8391- static stbir__alpha_unweight_func *simple_alpha_unweights[6] = {
8392- stbir__simple_alpha_unweight_4ch,
8393- stbir__simple_alpha_unweight_4ch,
8394- stbir__simple_alpha_unweight_4ch,
8395- stbir__simple_alpha_unweight_4ch,
8396- stbir__simple_alpha_unweight_2ch,
8397- stbir__simple_alpha_unweight_2ch};
8398-
8399- // initialize info fields
8400- info->alloced_mem = alloced;
8401- info->alloced_total = alloced_total;
8402-
8403- info->channels = channels;
8404- info->effective_channels = effective_channels;
8405-
8406- info->offset_x = new_x;
8407- info->offset_y = new_y;
8408- info->alloc_ring_buffer_num_entries =
8409- (int)alloc_ring_buffer_num_entries;
8410- info->ring_buffer_num_entries = 0;
8411- info->ring_buffer_length_bytes = (int)ring_buffer_length_bytes;
8412- info->splits = splits;
8413- info->vertical_first = vertical_first;
8414-
8415- info->input_pixel_layout_internal = input_pixel_layout;
8416- info->output_pixel_layout_internal = output_pixel_layout;
8417-
8418- // setup alpha weight functions
8419- info->alpha_weight = 0;
8420- info->alpha_unweight = 0;
8421-
8422- // handle alpha weighting functions and overrides
8423- if (alpha_weighting_type == 2) {
8424- // high quality alpha multiplying on the way in, dividing on the
8425- // way out
8426- info->alpha_weight =
8427- fancy_alpha_weights[input_pixel_layout - STBIRI_RGBA];
8428- info->alpha_unweight =
8429- fancy_alpha_unweights[output_pixel_layout - STBIRI_RGBA];
8430- } else if (alpha_weighting_type == 4) {
8431- // fast alpha multiplying on the way in, dividing on the way out
8432- info->alpha_weight =
8433- simple_alpha_weights[input_pixel_layout - STBIRI_RGBA];
8434- info->alpha_unweight =
8435- simple_alpha_unweights[output_pixel_layout - STBIRI_RGBA];
8436- } else if (alpha_weighting_type == 1) {
8437- // fast alpha on the way in, leave in premultiplied form on way
8438- // out
8439- info->alpha_weight =
8440- simple_alpha_weights[input_pixel_layout - STBIRI_RGBA];
8441- } else if (alpha_weighting_type == 3) {
8442- // incoming is premultiplied, fast alpha dividing on the way out
8443- // - non-premultiplied output
8444- info->alpha_unweight =
8445- simple_alpha_unweights[output_pixel_layout - STBIRI_RGBA];
8446- }
8447-
8448- // handle 3-chan color flipping, using the alpha weight path
8449- if (((input_pixel_layout == STBIRI_RGB) &&
8450- (output_pixel_layout == STBIRI_BGR)) ||
8451- ((input_pixel_layout == STBIRI_BGR) &&
8452- (output_pixel_layout == STBIRI_RGB))) {
8453- // do the flipping on the smaller of the two ends
8454- if (horizontal->scale_info.scale < 1.0f) {
8455- info->alpha_unweight = stbir__simple_flip_3ch;
8456- } else {
8457- info->alpha_weight = stbir__simple_flip_3ch;
8458- }
8459- }
8460- }
8461-
8462- // get all the per-split buffers
8463- for (i = 0; i < splits; i++) {
8464- STBIR__NEXT_PTR(info->split_info[i].decode_buffer,
8465- decode_buffer_size, float);
8466-
8467-#ifdef STBIR__SEPARATE_ALLOCATIONS
8468-
8469-#ifdef STBIR_SIMD8
8470- if ((info) && (effective_channels == 3)) {
8471- ++info->split_info[i]
8472- .decode_buffer; // avx in 3 channel mode needs one float
8473- // at the start of the buffer
8474- }
8475-#endif
8476-
8477- STBIR__NEXT_PTR(info->split_info[i].ring_buffers,
8478- alloc_ring_buffer_num_entries * sizeof(float *),
8479- float *);
8480- {
8481- int j;
8482- for (j = 0; j < alloc_ring_buffer_num_entries; j++) {
8483- STBIR__NEXT_PTR(info->split_info[i].ring_buffers[j],
8484- ring_buffer_length_bytes, float);
8485-#ifdef STBIR_SIMD8
8486- if ((info) && (effective_channels == 3)) {
8487- ++info->split_info[i]
8488- .ring_buffers[j]; // avx in 3 channel mode needs
8489- // one float at the start of the
8490- // buffer
8491- }
8492-#endif
8493- }
8494- }
8495-#else
8496- STBIR__NEXT_PTR(info->split_info[i].ring_buffer, ring_buffer_size,
8497- float);
8498-#endif
8499- STBIR__NEXT_PTR(info->split_info[i].vertical_buffer,
8500- vertical_buffer_size, float);
8501- }
8502-
8503- // alloc memory for to-be-pivoted coeffs (if necessary)
8504- if (vertical->is_gather == 0) {
8505- size_t both;
8506- size_t temp_mem_amt;
8507-
8508- // when in vertical scatter mode, we first build the coefficients in
8509- // gather mode, and then pivot after,
8510- // that means we need two buffers, so we try to use the decode
8511- // buffer and ring buffer for this. if that is too small, we just
8512- // allocate extra memory to use as this temp.
8513-
8514- both = (size_t)vertical->gather_prescatter_contributors_size +
8515- (size_t)vertical->gather_prescatter_coefficients_size;
8516-
8517-#ifdef STBIR__SEPARATE_ALLOCATIONS
8518- temp_mem_amt = decode_buffer_size;
8519-
8520-#ifdef STBIR_SIMD8
8521- if (effective_channels == 3) {
8522- --temp_mem_amt; // avx in 3 channel mode needs one float at the
8523- // start of the buffer
8524- }
8525-#endif
8526-#else
8527- temp_mem_amt = (size_t)(decode_buffer_size + ring_buffer_size +
8528- vertical_buffer_size) *
8529- (size_t)splits;
8530-#endif
8531- if (temp_mem_amt >= both) {
8532- if (info) {
8533- vertical->gather_prescatter_contributors =
8534- (stbir__contributors *)info->split_info[0]
8535- .decode_buffer;
8536- vertical->gather_prescatter_coefficients =
8537- (float *)(((char *)info->split_info[0].decode_buffer) +
8538- vertical
8539- ->gather_prescatter_contributors_size);
8540- }
8541- } else {
8542- // ring+decode memory is too small, so allocate temp memory
8543- STBIR__NEXT_PTR(vertical->gather_prescatter_contributors,
8544- vertical->gather_prescatter_contributors_size,
8545- stbir__contributors);
8546- STBIR__NEXT_PTR(vertical->gather_prescatter_coefficients,
8547- vertical->gather_prescatter_coefficients_size,
8548- float);
8549- }
8550- }
8551-
8552- STBIR__NEXT_PTR(horizontal->contributors, horizontal->contributors_size,
8553- stbir__contributors);
8554- STBIR__NEXT_PTR(horizontal->coefficients, horizontal->coefficients_size,
8555- float);
8556-
8557- // are the two filters identical?? (happens a lot with mipmap
8558- // generation)
8559- if ((horizontal->filter_kernel == vertical->filter_kernel) &&
8560- (horizontal->filter_support == vertical->filter_support) &&
8561- (horizontal->edge == vertical->edge) &&
8562- (horizontal->scale_info.output_sub_size ==
8563- vertical->scale_info.output_sub_size)) {
8564- float diff_scale =
8565- horizontal->scale_info.scale - vertical->scale_info.scale;
8566- float diff_shift = horizontal->scale_info.pixel_shift -
8567- vertical->scale_info.pixel_shift;
8568- if (diff_scale < 0.0f) {
8569- diff_scale = -diff_scale;
8570- }
8571- if (diff_shift < 0.0f) {
8572- diff_shift = -diff_shift;
8573- }
8574- if ((diff_scale <= stbir__small_float) &&
8575- (diff_shift <= stbir__small_float)) {
8576- if (horizontal->is_gather == vertical->is_gather) {
8577- copy_horizontal = 1;
8578- goto no_vert_alloc;
8579- }
8580- // everything matches, but vertical is scatter, horizontal is
8581- // gather, use horizontal coeffs for vertical pivot coeffs
8582- possibly_use_horizontal_for_pivot = horizontal;
8583- }
8584- }
8585-
8586- STBIR__NEXT_PTR(vertical->contributors, vertical->contributors_size,
8587- stbir__contributors);
8588- STBIR__NEXT_PTR(vertical->coefficients, vertical->coefficients_size,
8589- float);
8590-
8591- no_vert_alloc:
8592-
8593- if (info) {
8594- STBIR_PROFILE_BUILD_START(horizontal);
8595-
8596- stbir__calculate_filters(
8597- horizontal, 0, user_data STBIR_ONLY_PROFILE_BUILD_SET_INFO);
8598-
8599- // setup the horizontal gather functions
8600- // start with defaulting to the n_coeffs functions (specialized on
8601- // channels and remnant leftover)
8602- info->horizontal_gather_channels =
8603- stbir__horizontal_gather_n_coeffs_funcs
8604- [effective_channels][horizontal->extent_info.widest & 3];
8605- // but if the number of coeffs <= 12, use another set of special
8606- // cases. <=12 coeffs is any enlarging resize, or shrinking resize
8607- // down to about 1/3 size
8608- if (horizontal->extent_info.widest <= 12) {
8609- info->horizontal_gather_channels =
8610- stbir__horizontal_gather_channels_funcs
8611- [effective_channels]
8612- [horizontal->extent_info.widest - 1];
8613- }
8614-
8615- info->scanline_extents.conservative.n0 = conservative->n0;
8616- info->scanline_extents.conservative.n1 = conservative->n1;
8617-
8618- // get exact extents
8619- stbir__get_extents(horizontal, &info->scanline_extents);
8620-
8621- // pack the horizontal coeffs
8622- horizontal->coefficient_width = stbir__pack_coefficients(
8623- horizontal->num_contributors, horizontal->contributors,
8624- horizontal->coefficients, horizontal->coefficient_width,
8625- horizontal->extent_info.widest,
8626- info->scanline_extents.conservative.n0,
8627- info->scanline_extents.conservative.n1);
8628-
8629- STBIR_MEMCPY(&info->horizontal, horizontal, sizeof(stbir__sampler));
8630-
8631- STBIR_PROFILE_BUILD_END(horizontal);
8632-
8633- if (copy_horizontal) {
8634- STBIR_MEMCPY(&info->vertical, horizontal,
8635- sizeof(stbir__sampler));
8636- } else {
8637- STBIR_PROFILE_BUILD_START(vertical);
8638-
8639- stbir__calculate_filters(
8640- vertical, possibly_use_horizontal_for_pivot,
8641- user_data STBIR_ONLY_PROFILE_BUILD_SET_INFO);
8642- STBIR_MEMCPY(&info->vertical, vertical, sizeof(stbir__sampler));
8643-
8644- STBIR_PROFILE_BUILD_END(vertical);
8645- }
8646-
8647- // setup the vertical split ranges
8648- stbir__get_split_info(info->split_info, info->splits,
8649- info->vertical.scale_info.output_sub_size,
8650- info->vertical.filter_pixel_margin,
8651- info->vertical.scale_info.input_full_size,
8652- info->vertical.is_gather,
8653- info->vertical.contributors);
8654-
8655- // now we know precisely how many entries we need
8656- info->ring_buffer_num_entries = info->vertical.extent_info.widest;
8657-
8658- // we never need more ring buffer entries than the scanlines we're
8659- // outputting
8660- if ((!info->vertical.is_gather) &&
8661- (info->ring_buffer_num_entries >
8662- conservative_split_output_size)) {
8663- info->ring_buffer_num_entries = conservative_split_output_size;
8664- }
8665- STBIR_ASSERT(info->ring_buffer_num_entries <=
8666- info->alloc_ring_buffer_num_entries);
8667- }
8668-#undef STBIR__NEXT_PTR
8669-
8670- // is this the first time through loop?
8671- if (info == 0) {
8672- alloced_total = (15 + (size_t)advance_mem);
8673- alloced = STBIR_MALLOC(alloced_total, user_data);
8674- if (alloced == 0) {
8675- return 0;
8676- }
8677- } else {
8678- return info; // success
8679- }
8680- }
8681-}
8682-
8683-static int
8684-stbir__perform_resize(stbir__info const *info, int split_start, int split_count)
8685-{
8686- stbir__per_split_info *split_info = info->split_info + split_start;
8687-
8688- STBIR_PROFILE_CLEAR_EXTRAS();
8689-
8690- STBIR_PROFILE_FIRST_START(looping);
8691- if (info->vertical.is_gather) {
8692- stbir__vertical_gather_loop(info, split_info, split_count);
8693- } else {
8694- stbir__vertical_scatter_loop(info, split_info, split_count);
8695- }
8696- STBIR_PROFILE_END(looping);
8697-
8698- return 1;
8699-}
8700-
8701-static void
8702-stbir__update_info_from_resize(stbir__info *info, STBIR_RESIZE *resize)
8703-{
8704- static stbir__decode_pixels_func
8705- *decode_simple[STBIR_TYPE_HALF_FLOAT - STBIR_TYPE_UINT8_SRGB + 1] = {
8706- /* 1ch-4ch */ stbir__decode_uint8_srgb,
8707- stbir__decode_uint8_srgb,
8708- 0,
8709- stbir__decode_float_linear,
8710- stbir__decode_half_float_linear,
8711- };
8712-
8713- static stbir__decode_pixels_func
8714- *decode_alphas[STBIRI_AR - STBIRI_RGBA +
8715- 1][STBIR_TYPE_HALF_FLOAT - STBIR_TYPE_UINT8_SRGB + 1] = {
8716- {/* RGBA */ stbir__decode_uint8_srgb4_linearalpha,
8717- stbir__decode_uint8_srgb, 0, stbir__decode_float_linear,
8718- stbir__decode_half_float_linear},
8719- {/* BGRA */ stbir__decode_uint8_srgb4_linearalpha_BGRA,
8720- stbir__decode_uint8_srgb_BGRA, 0, stbir__decode_float_linear_BGRA,
8721- stbir__decode_half_float_linear_BGRA},
8722- {/* ARGB */ stbir__decode_uint8_srgb4_linearalpha_ARGB,
8723- stbir__decode_uint8_srgb_ARGB, 0, stbir__decode_float_linear_ARGB,
8724- stbir__decode_half_float_linear_ARGB},
8725- {/* ABGR */ stbir__decode_uint8_srgb4_linearalpha_ABGR,
8726- stbir__decode_uint8_srgb_ABGR, 0, stbir__decode_float_linear_ABGR,
8727- stbir__decode_half_float_linear_ABGR},
8728- {/* RA */ stbir__decode_uint8_srgb2_linearalpha,
8729- stbir__decode_uint8_srgb, 0, stbir__decode_float_linear,
8730- stbir__decode_half_float_linear},
8731- {/* AR */ stbir__decode_uint8_srgb2_linearalpha_AR,
8732- stbir__decode_uint8_srgb_AR, 0, stbir__decode_float_linear_AR,
8733- stbir__decode_half_float_linear_AR},
8734- };
8735-
8736- static stbir__decode_pixels_func *decode_simple_scaled_or_not[2][2] = {
8737- {stbir__decode_uint8_linear_scaled, stbir__decode_uint8_linear},
8738- {stbir__decode_uint16_linear_scaled, stbir__decode_uint16_linear},
8739- };
8740-
8741- static stbir__decode_pixels_func
8742- *decode_alphas_scaled_or_not[STBIRI_AR - STBIRI_RGBA + 1][2][2] = {
8743- {/* RGBA */ {stbir__decode_uint8_linear_scaled,
8744- stbir__decode_uint8_linear},
8745- {stbir__decode_uint16_linear_scaled, stbir__decode_uint16_linear}},
8746- {/* BGRA */ {stbir__decode_uint8_linear_scaled_BGRA,
8747- stbir__decode_uint8_linear_BGRA},
8748- {stbir__decode_uint16_linear_scaled_BGRA,
8749- stbir__decode_uint16_linear_BGRA}},
8750- {/* ARGB */ {stbir__decode_uint8_linear_scaled_ARGB,
8751- stbir__decode_uint8_linear_ARGB},
8752- {stbir__decode_uint16_linear_scaled_ARGB,
8753- stbir__decode_uint16_linear_ARGB}},
8754- {/* ABGR */ {stbir__decode_uint8_linear_scaled_ABGR,
8755- stbir__decode_uint8_linear_ABGR},
8756- {stbir__decode_uint16_linear_scaled_ABGR,
8757- stbir__decode_uint16_linear_ABGR}},
8758- {/* RA */ {stbir__decode_uint8_linear_scaled,
8759- stbir__decode_uint8_linear},
8760- {stbir__decode_uint16_linear_scaled, stbir__decode_uint16_linear}},
8761- {/* AR */ {stbir__decode_uint8_linear_scaled_AR,
8762- stbir__decode_uint8_linear_AR},
8763- {stbir__decode_uint16_linear_scaled_AR,
8764- stbir__decode_uint16_linear_AR}}};
8765-
8766- static stbir__encode_pixels_func
8767- *encode_simple[STBIR_TYPE_HALF_FLOAT - STBIR_TYPE_UINT8_SRGB + 1] = {
8768- /* 1ch-4ch */ stbir__encode_uint8_srgb,
8769- stbir__encode_uint8_srgb,
8770- 0,
8771- stbir__encode_float_linear,
8772- stbir__encode_half_float_linear,
8773- };
8774-
8775- static stbir__encode_pixels_func
8776- *encode_alphas[STBIRI_AR - STBIRI_RGBA +
8777- 1][STBIR_TYPE_HALF_FLOAT - STBIR_TYPE_UINT8_SRGB + 1] = {
8778- {/* RGBA */ stbir__encode_uint8_srgb4_linearalpha,
8779- stbir__encode_uint8_srgb, 0, stbir__encode_float_linear,
8780- stbir__encode_half_float_linear},
8781- {/* BGRA */ stbir__encode_uint8_srgb4_linearalpha_BGRA,
8782- stbir__encode_uint8_srgb_BGRA, 0, stbir__encode_float_linear_BGRA,
8783- stbir__encode_half_float_linear_BGRA},
8784- {/* ARGB */ stbir__encode_uint8_srgb4_linearalpha_ARGB,
8785- stbir__encode_uint8_srgb_ARGB, 0, stbir__encode_float_linear_ARGB,
8786- stbir__encode_half_float_linear_ARGB},
8787- {/* ABGR */ stbir__encode_uint8_srgb4_linearalpha_ABGR,
8788- stbir__encode_uint8_srgb_ABGR, 0, stbir__encode_float_linear_ABGR,
8789- stbir__encode_half_float_linear_ABGR},
8790- {/* RA */ stbir__encode_uint8_srgb2_linearalpha,
8791- stbir__encode_uint8_srgb, 0, stbir__encode_float_linear,
8792- stbir__encode_half_float_linear},
8793- {/* AR */ stbir__encode_uint8_srgb2_linearalpha_AR,
8794- stbir__encode_uint8_srgb_AR, 0, stbir__encode_float_linear_AR,
8795- stbir__encode_half_float_linear_AR}};
8796-
8797- static stbir__encode_pixels_func *encode_simple_scaled_or_not[2][2] = {
8798- {stbir__encode_uint8_linear_scaled, stbir__encode_uint8_linear},
8799- {stbir__encode_uint16_linear_scaled, stbir__encode_uint16_linear},
8800- };
8801-
8802- static stbir__encode_pixels_func
8803- *encode_alphas_scaled_or_not[STBIRI_AR - STBIRI_RGBA + 1][2][2] = {
8804- {/* RGBA */ {stbir__encode_uint8_linear_scaled,
8805- stbir__encode_uint8_linear},
8806- {stbir__encode_uint16_linear_scaled, stbir__encode_uint16_linear}},
8807- {/* BGRA */ {stbir__encode_uint8_linear_scaled_BGRA,
8808- stbir__encode_uint8_linear_BGRA},
8809- {stbir__encode_uint16_linear_scaled_BGRA,
8810- stbir__encode_uint16_linear_BGRA}},
8811- {/* ARGB */ {stbir__encode_uint8_linear_scaled_ARGB,
8812- stbir__encode_uint8_linear_ARGB},
8813- {stbir__encode_uint16_linear_scaled_ARGB,
8814- stbir__encode_uint16_linear_ARGB}},
8815- {/* ABGR */ {stbir__encode_uint8_linear_scaled_ABGR,
8816- stbir__encode_uint8_linear_ABGR},
8817- {stbir__encode_uint16_linear_scaled_ABGR,
8818- stbir__encode_uint16_linear_ABGR}},
8819- {/* RA */ {stbir__encode_uint8_linear_scaled,
8820- stbir__encode_uint8_linear},
8821- {stbir__encode_uint16_linear_scaled, stbir__encode_uint16_linear}},
8822- {/* AR */ {stbir__encode_uint8_linear_scaled_AR,
8823- stbir__encode_uint8_linear_AR},
8824- {stbir__encode_uint16_linear_scaled_AR,
8825- stbir__encode_uint16_linear_AR}}};
8826-
8827- stbir__decode_pixels_func *decode_pixels = 0;
8828- stbir__encode_pixels_func *encode_pixels = 0;
8829- stbir_datatype input_type, output_type;
8830-
8831- input_type = resize->input_data_type;
8832- output_type = resize->output_data_type;
8833- info->input_data = resize->input_pixels;
8834- info->input_stride_bytes = resize->input_stride_in_bytes;
8835- info->output_stride_bytes = resize->output_stride_in_bytes;
8836-
8837- // if we're completely point sampling, then we can turn off SRGB
8838- if ((info->horizontal.filter_enum == STBIR_FILTER_POINT_SAMPLE) &&
8839- (info->vertical.filter_enum == STBIR_FILTER_POINT_SAMPLE)) {
8840- if (((input_type == STBIR_TYPE_UINT8_SRGB) ||
8841- (input_type == STBIR_TYPE_UINT8_SRGB_ALPHA)) &&
8842- ((output_type == STBIR_TYPE_UINT8_SRGB) ||
8843- (output_type == STBIR_TYPE_UINT8_SRGB_ALPHA))) {
8844- input_type = STBIR_TYPE_UINT8;
8845- output_type = STBIR_TYPE_UINT8;
8846- }
8847- }
8848-
8849- // recalc the output and input strides
8850- if (info->input_stride_bytes == 0) {
8851- info->input_stride_bytes = info->channels *
8852- info->horizontal.scale_info.input_full_size *
8853- stbir__type_size[input_type];
8854- }
8855-
8856- if (info->output_stride_bytes == 0) {
8857- info->output_stride_bytes =
8858- info->channels * info->horizontal.scale_info.output_sub_size *
8859- stbir__type_size[output_type];
8860- }
8861-
8862- // calc offset
8863- info->output_data =
8864- ((char *)resize->output_pixels) +
8865- ((size_t)info->offset_y * (size_t)resize->output_stride_in_bytes) +
8866- (info->offset_x * info->channels * stbir__type_size[output_type]);
8867-
8868- info->in_pixels_cb = resize->input_cb;
8869- info->user_data = resize->user_data;
8870- info->out_pixels_cb = resize->output_cb;
8871-
8872- // setup the input format converters
8873- if ((input_type == STBIR_TYPE_UINT8) || (input_type == STBIR_TYPE_UINT16)) {
8874- int non_scaled = 0;
8875-
8876- // check if we can run unscaled - 0-255.0/0-65535.0 instead of 0-1.0
8877- // (which is a tiny bit faster when doing linear 8->8 or 16->16)
8878- if ((!info->alpha_weight) &&
8879- (!info->alpha_unweight)) { // don't short circuit when alpha
8880- // weighting (get everything to 0-1.0 as
8881- // usual)
8882- if (((input_type == STBIR_TYPE_UINT8) &&
8883- (output_type == STBIR_TYPE_UINT8)) ||
8884- ((input_type == STBIR_TYPE_UINT16) &&
8885- (output_type == STBIR_TYPE_UINT16))) {
8886- non_scaled = 1;
8887- }
8888- }
8889-
8890- if (info->input_pixel_layout_internal <= STBIRI_4CHANNEL) {
8891- decode_pixels =
8892- decode_simple_scaled_or_not[input_type == STBIR_TYPE_UINT16]
8893- [non_scaled];
8894- } else {
8895- decode_pixels =
8896- decode_alphas_scaled_or_not[(info->input_pixel_layout_internal -
8897- STBIRI_RGBA) %
8898- (STBIRI_AR - STBIRI_RGBA + 1)]
8899- [input_type == STBIR_TYPE_UINT16]
8900- [non_scaled];
8901- }
8902- } else {
8903- if (info->input_pixel_layout_internal <= STBIRI_4CHANNEL) {
8904- decode_pixels = decode_simple[input_type - STBIR_TYPE_UINT8_SRGB];
8905- } else {
8906- decode_pixels = decode_alphas[(info->input_pixel_layout_internal -
8907- STBIRI_RGBA) %
8908- (STBIRI_AR - STBIRI_RGBA + 1)]
8909- [input_type - STBIR_TYPE_UINT8_SRGB];
8910- }
8911- }
8912-
8913- // setup the output format converters
8914- if ((output_type == STBIR_TYPE_UINT8) ||
8915- (output_type == STBIR_TYPE_UINT16)) {
8916- int non_scaled = 0;
8917-
8918- // check if we can run unscaled - 0-255.0/0-65535.0 instead of 0-1.0
8919- // (which is a tiny bit faster when doing linear 8->8 or 16->16)
8920- if ((!info->alpha_weight) &&
8921- (!info->alpha_unweight)) { // don't short circuit when alpha
8922- // weighting (get everything to 0-1.0 as
8923- // usual)
8924- if (((input_type == STBIR_TYPE_UINT8) &&
8925- (output_type == STBIR_TYPE_UINT8)) ||
8926- ((input_type == STBIR_TYPE_UINT16) &&
8927- (output_type == STBIR_TYPE_UINT16))) {
8928- non_scaled = 1;
8929- }
8930- }
8931-
8932- if (info->output_pixel_layout_internal <= STBIRI_4CHANNEL) {
8933- encode_pixels =
8934- encode_simple_scaled_or_not[output_type == STBIR_TYPE_UINT16]
8935- [non_scaled];
8936- } else {
8937- encode_pixels = encode_alphas_scaled_or_not
8938- [(info->output_pixel_layout_internal - STBIRI_RGBA) %
8939- (STBIRI_AR - STBIRI_RGBA + 1)]
8940- [output_type == STBIR_TYPE_UINT16][non_scaled];
8941- }
8942- } else {
8943- if (info->output_pixel_layout_internal <= STBIRI_4CHANNEL) {
8944- encode_pixels = encode_simple[output_type - STBIR_TYPE_UINT8_SRGB];
8945- } else {
8946- encode_pixels = encode_alphas[(info->output_pixel_layout_internal -
8947- STBIRI_RGBA) %
8948- (STBIRI_AR - STBIRI_RGBA + 1)]
8949- [output_type - STBIR_TYPE_UINT8_SRGB];
8950- }
8951- }
8952-
8953- info->input_type = input_type;
8954- info->output_type = output_type;
8955- info->decode_pixels = decode_pixels;
8956- info->encode_pixels = encode_pixels;
8957-}
8958-
8959-static void
8960-stbir__clip(int *outx, int *outsubw, int outw, double *u0, double *u1)
8961-{
8962- double per, adj;
8963- int over;
8964-
8965- // do left/top edge
8966- if (*outx < 0) {
8967- per = ((double)*outx) / ((double)*outsubw); // is negative
8968- adj = per * (*u1 - *u0);
8969- *u0 -= adj; // increases u0
8970- *outx = 0;
8971- }
8972-
8973- // do right/bot edge
8974- over = outw - (*outx + *outsubw);
8975- if (over < 0) {
8976- per = ((double)over) / ((double)*outsubw); // is negative
8977- adj = per * (*u1 - *u0);
8978- *u1 += adj; // decrease u1
8979- *outsubw = outw - *outx;
8980- }
8981-}
8982-
8983-// converts a double to a rational that has less than one float bit of error
8984-// (returns 0 if unable to do so)
8985-static int
8986-stbir__double_to_rational(double f, stbir_uint32 limit, stbir_uint32 *numer,
8987- stbir_uint32 *denom,
8988- int limit_denom) // limit_denom (1) or limit numer (0)
8989-{
8990- double err;
8991- stbir_uint64 top, bot;
8992- stbir_uint64 numer_last = 0;
8993- stbir_uint64 denom_last = 1;
8994- stbir_uint64 numer_estimate = 1;
8995- stbir_uint64 denom_estimate = 0;
8996-
8997- // scale to past float error range
8998- top = (stbir_uint64)(f * (double)(1 << 25));
8999- bot = 1 << 25;
9000-
9001- // keep refining, but usually stops in a few loops - usually 5 for bad cases
9002- for (;;) {
9003- stbir_uint64 est, temp;
9004-
9005- // hit limit, break out and do best full range estimate
9006- if (((limit_denom) ? denom_estimate : numer_estimate) >= limit) {
9007- break;
9008- }
9009-
9010- // is the current error less than 1 bit of a float? if so, we're done
9011- if (denom_estimate) {
9012- err = ((double)numer_estimate / (double)denom_estimate) - f;
9013- if (err < 0.0) {
9014- err = -err;
9015- }
9016- if (err < (1.0 / (double)(1 << 24))) {
9017- // yup, found it
9018- *numer = (stbir_uint32)numer_estimate;
9019- *denom = (stbir_uint32)denom_estimate;
9020- return 1;
9021- }
9022- }
9023-
9024- // no more refinement bits left? break out and do full range estimate
9025- if (bot == 0) {
9026- break;
9027- }
9028-
9029- // gcd the estimate bits
9030- est = top / bot;
9031- temp = top % bot;
9032- top = bot;
9033- bot = temp;
9034-
9035- // move remainders
9036- temp = est * denom_estimate + denom_last;
9037- denom_last = denom_estimate;
9038- denom_estimate = temp;
9039-
9040- // move remainders
9041- temp = est * numer_estimate + numer_last;
9042- numer_last = numer_estimate;
9043- numer_estimate = temp;
9044- }
9045-
9046- // we didn't fine anything good enough for float, use a full range estimate
9047- if (limit_denom) {
9048- numer_estimate = (stbir_uint64)(f * (double)limit + 0.5);
9049- denom_estimate = limit;
9050- } else {
9051- numer_estimate = limit;
9052- denom_estimate = (stbir_uint64)(((double)limit / f) + 0.5);
9053- }
9054-
9055- *numer = (stbir_uint32)numer_estimate;
9056- *denom = (stbir_uint32)denom_estimate;
9057-
9058- err = (denom_estimate) ? (((double)(stbir_uint32)numer_estimate /
9059- (double)(stbir_uint32)denom_estimate) -
9060- f)
9061- : 1.0;
9062- if (err < 0.0) {
9063- err = -err;
9064- }
9065- return (err < (1.0 / (double)(1 << 24))) ? 1 : 0;
9066-}
9067-
9068-static int
9069-stbir__calculate_region_transform(stbir__scale_info *scale_info,
9070- int output_full_range, int *output_offset,
9071- int output_sub_range, int input_full_range,
9072- double input_s0, double input_s1)
9073-{
9074- double output_range, input_range, output_s, input_s, ratio, scale;
9075-
9076- input_s = input_s1 - input_s0;
9077-
9078- // null area
9079- if ((output_full_range == 0) || (input_full_range == 0) ||
9080- (output_sub_range == 0) || (input_s <= stbir__small_float)) {
9081- return 0;
9082- }
9083-
9084- // are either of the ranges completely out of bounds?
9085- if ((*output_offset >= output_full_range) ||
9086- ((*output_offset + output_sub_range) <= 0) ||
9087- (input_s0 >= (1.0f - stbir__small_float)) ||
9088- (input_s1 <= stbir__small_float)) {
9089- return 0;
9090- }
9091-
9092- output_range = (double)output_full_range;
9093- input_range = (double)input_full_range;
9094-
9095- output_s = ((double)output_sub_range) / output_range;
9096-
9097- // figure out the scaling to use
9098- ratio = output_s / input_s;
9099-
9100- // save scale before clipping
9101- scale = (output_range / input_range) * ratio;
9102- scale_info->scale = (float)scale;
9103- scale_info->inv_scale = (float)(1.0 / scale);
9104-
9105- // clip output area to left/right output edges (and adjust input area)
9106- stbir__clip(output_offset, &output_sub_range, output_full_range, &input_s0,
9107- &input_s1);
9108-
9109- // recalc input area
9110- input_s = input_s1 - input_s0;
9111-
9112- // after clipping do we have zero input area?
9113- if (input_s <= stbir__small_float) {
9114- return 0;
9115- }
9116-
9117- // calculate and store the starting source offsets in output pixel space
9118- scale_info->pixel_shift = (float)(input_s0 * ratio * output_range);
9119-
9120- scale_info->scale_is_rational = stbir__double_to_rational(
9121- scale, (scale <= 1.0) ? output_full_range : input_full_range,
9122- &scale_info->scale_numerator, &scale_info->scale_denominator,
9123- (scale >= 1.0));
9124-
9125- scale_info->input_full_size = input_full_range;
9126- scale_info->output_sub_size = output_sub_range;
9127-
9128- return 1;
9129-}
9130-
9131-static void
9132-stbir__init_and_set_layout(STBIR_RESIZE *resize,
9133- stbir_pixel_layout pixel_layout,
9134- stbir_datatype data_type)
9135-{
9136- resize->input_cb = 0;
9137- resize->output_cb = 0;
9138- resize->user_data = resize;
9139- resize->samplers = 0;
9140- resize->called_alloc = 0;
9141- resize->horizontal_filter = STBIR_FILTER_DEFAULT;
9142- resize->horizontal_filter_kernel = 0;
9143- resize->horizontal_filter_support = 0;
9144- resize->vertical_filter = STBIR_FILTER_DEFAULT;
9145- resize->vertical_filter_kernel = 0;
9146- resize->vertical_filter_support = 0;
9147- resize->horizontal_edge = STBIR_EDGE_CLAMP;
9148- resize->vertical_edge = STBIR_EDGE_CLAMP;
9149- resize->input_s0 = 0;
9150- resize->input_t0 = 0;
9151- resize->input_s1 = 1;
9152- resize->input_t1 = 1;
9153- resize->output_subx = 0;
9154- resize->output_suby = 0;
9155- resize->output_subw = resize->output_w;
9156- resize->output_subh = resize->output_h;
9157- resize->input_data_type = data_type;
9158- resize->output_data_type = data_type;
9159- resize->input_pixel_layout_public = pixel_layout;
9160- resize->output_pixel_layout_public = pixel_layout;
9161- resize->needs_rebuild = 1;
9162-}
9163-
9164-STBIRDEF void
9165-stbir_resize_init(STBIR_RESIZE *resize, const void *input_pixels, int input_w,
9166- int input_h, int input_stride_in_bytes, // stride can be zero
9167- void *output_pixels, int output_w, int output_h,
9168- int output_stride_in_bytes, // stride can be zero
9169- stbir_pixel_layout pixel_layout, stbir_datatype data_type)
9170-{
9171- resize->input_pixels = input_pixels;
9172- resize->input_w = input_w;
9173- resize->input_h = input_h;
9174- resize->input_stride_in_bytes = input_stride_in_bytes;
9175- resize->output_pixels = output_pixels;
9176- resize->output_w = output_w;
9177- resize->output_h = output_h;
9178- resize->output_stride_in_bytes = output_stride_in_bytes;
9179- resize->fast_alpha = 0;
9180-
9181- stbir__init_and_set_layout(resize, pixel_layout, data_type);
9182-}
9183-
9184-// You can update parameters any time after resize_init
9185-STBIRDEF void
9186-stbir_set_datatypes(
9187- STBIR_RESIZE *resize, stbir_datatype input_type,
9188- stbir_datatype output_type) // by default, datatype from resize_init
9189-{
9190- resize->input_data_type = input_type;
9191- resize->output_data_type = output_type;
9192- if ((resize->samplers) && (!resize->needs_rebuild)) {
9193- stbir__update_info_from_resize(resize->samplers, resize);
9194- }
9195-}
9196-
9197-STBIRDEF void
9198-stbir_set_pixel_callbacks(
9199- STBIR_RESIZE *resize, stbir_input_callback *input_cb,
9200- stbir_output_callback *output_cb) // no callbacks by default
9201-{
9202- resize->input_cb = input_cb;
9203- resize->output_cb = output_cb;
9204-
9205- if ((resize->samplers) && (!resize->needs_rebuild)) {
9206- resize->samplers->in_pixels_cb = input_cb;
9207- resize->samplers->out_pixels_cb = output_cb;
9208- }
9209-}
9210-
9211-STBIRDEF void
9212-stbir_set_user_data(STBIR_RESIZE *resize,
9213- void *user_data) // pass back STBIR_RESIZE* by default
9214-{
9215- resize->user_data = user_data;
9216- if ((resize->samplers) && (!resize->needs_rebuild)) {
9217- resize->samplers->user_data = user_data;
9218- }
9219-}
9220-
9221-STBIRDEF void
9222-stbir_set_buffer_ptrs(STBIR_RESIZE *resize, const void *input_pixels,
9223- int input_stride_in_bytes, void *output_pixels,
9224- int output_stride_in_bytes)
9225-{
9226- resize->input_pixels = input_pixels;
9227- resize->input_stride_in_bytes = input_stride_in_bytes;
9228- resize->output_pixels = output_pixels;
9229- resize->output_stride_in_bytes = output_stride_in_bytes;
9230- if ((resize->samplers) && (!resize->needs_rebuild)) {
9231- stbir__update_info_from_resize(resize->samplers, resize);
9232- }
9233-}
9234-
9235-STBIRDEF int
9236-stbir_set_edgemodes(STBIR_RESIZE *resize, stbir_edge horizontal_edge,
9237- stbir_edge vertical_edge) // CLAMP by default
9238-{
9239- resize->horizontal_edge = horizontal_edge;
9240- resize->vertical_edge = vertical_edge;
9241- resize->needs_rebuild = 1;
9242- return 1;
9243-}
9244-
9245-STBIRDEF int
9246-stbir_set_filters(STBIR_RESIZE *resize, stbir_filter horizontal_filter,
9247- stbir_filter vertical_filter) // STBIR_DEFAULT_FILTER_UPSAMPLE/DOWNSAMPLE
9248- // by default
9249-{
9250- resize->horizontal_filter = horizontal_filter;
9251- resize->vertical_filter = vertical_filter;
9252- resize->needs_rebuild = 1;
9253- return 1;
9254-}
9255-
9256-STBIRDEF int
9257-stbir_set_filter_callbacks(STBIR_RESIZE *resize,
9258- stbir__kernel_callback *horizontal_filter,
9259- stbir__support_callback *horizontal_support,
9260- stbir__kernel_callback *vertical_filter,
9261- stbir__support_callback *vertical_support)
9262-{
9263- resize->horizontal_filter_kernel = horizontal_filter;
9264- resize->horizontal_filter_support = horizontal_support;
9265- resize->vertical_filter_kernel = vertical_filter;
9266- resize->vertical_filter_support = vertical_support;
9267- resize->needs_rebuild = 1;
9268- return 1;
9269-}
9270-
9271-STBIRDEF int
9272-stbir_set_pixel_layouts(
9273- STBIR_RESIZE *resize, stbir_pixel_layout input_pixel_layout,
9274- stbir_pixel_layout output_pixel_layout) // sets new pixel layouts
9275-{
9276- resize->input_pixel_layout_public = input_pixel_layout;
9277- resize->output_pixel_layout_public = output_pixel_layout;
9278- resize->needs_rebuild = 1;
9279- return 1;
9280-}
9281-
9282-STBIRDEF int
9283-stbir_set_non_pm_alpha_speed_over_quality(
9284- STBIR_RESIZE *resize,
9285- int non_pma_alpha_speed_over_quality) // sets alpha speed
9286-{
9287- resize->fast_alpha = non_pma_alpha_speed_over_quality;
9288- resize->needs_rebuild = 1;
9289- return 1;
9290-}
9291-
9292-STBIRDEF int
9293-stbir_set_input_subrect(STBIR_RESIZE *resize, double s0, double t0, double s1,
9294- double t1) // sets input region (full region by default)
9295-{
9296- resize->input_s0 = s0;
9297- resize->input_t0 = t0;
9298- resize->input_s1 = s1;
9299- resize->input_t1 = t1;
9300- resize->needs_rebuild = 1;
9301-
9302- // are we inbounds?
9303- if ((s1 < stbir__small_float) || ((s1 - s0) < stbir__small_float) ||
9304- (t1 < stbir__small_float) || ((t1 - t0) < stbir__small_float) ||
9305- (s0 > (1.0f - stbir__small_float)) ||
9306- (t0 > (1.0f - stbir__small_float))) {
9307- return 0;
9308- }
9309-
9310- return 1;
9311-}
9312-
9313-STBIRDEF int
9314-stbir_set_output_pixel_subrect(
9315- STBIR_RESIZE *resize, int subx, int suby, int subw,
9316- int subh) // sets output region (full region by default)
9317-{
9318- resize->output_subx = subx;
9319- resize->output_suby = suby;
9320- resize->output_subw = subw;
9321- resize->output_subh = subh;
9322- resize->needs_rebuild = 1;
9323-
9324- // are we inbounds?
9325- if ((subx >= resize->output_w) || ((subx + subw) <= 0) ||
9326- (suby >= resize->output_h) || ((suby + subh) <= 0) || (subw == 0) ||
9327- (subh == 0)) {
9328- return 0;
9329- }
9330-
9331- return 1;
9332-}
9333-
9334-STBIRDEF int
9335-stbir_set_pixel_subrect(STBIR_RESIZE *resize, int subx, int suby, int subw,
9336- int subh) // sets both regions (full regions by default)
9337-{
9338- double s0, t0, s1, t1;
9339-
9340- s0 = ((double)subx) / ((double)resize->output_w);
9341- t0 = ((double)suby) / ((double)resize->output_h);
9342- s1 = ((double)(subx + subw)) / ((double)resize->output_w);
9343- t1 = ((double)(suby + subh)) / ((double)resize->output_h);
9344-
9345- resize->input_s0 = s0;
9346- resize->input_t0 = t0;
9347- resize->input_s1 = s1;
9348- resize->input_t1 = t1;
9349- resize->output_subx = subx;
9350- resize->output_suby = suby;
9351- resize->output_subw = subw;
9352- resize->output_subh = subh;
9353- resize->needs_rebuild = 1;
9354-
9355- // are we inbounds?
9356- if ((subx >= resize->output_w) || ((subx + subw) <= 0) ||
9357- (suby >= resize->output_h) || ((suby + subh) <= 0) || (subw == 0) ||
9358- (subh == 0)) {
9359- return 0;
9360- }
9361-
9362- return 1;
9363-}
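`stbir_set_pixel_subrect` derives the normalized input rectangle from the output pixel rectangle. The input and output regions therefore cover the same fraction of their images, so the result appears to be a tile of the full-size resize. The sketch below isolates that mapping and the bounds test. `norm_rect`, `pixel_subrect_to_normalized`, and `pixel_subrect_in_bounds` are hypothetical names used only for illustration.

```c
/* Hypothetical sketch of stbir_set_pixel_subrect's coordinate mapping. */
typedef struct { double s0, t0, s1, t1; } norm_rect;

static norm_rect pixel_subrect_to_normalized(int output_w, int output_h, int subx,
                                             int suby, int subw, int subh)
{
    norm_rect r;
    r.s0 = (double)subx / (double)output_w;
    r.t0 = (double)suby / (double)output_h;
    r.s1 = (double)(subx + subw) / (double)output_w;
    r.t1 = (double)(suby + subh) / (double)output_h;
    return r;
}

/* Same rejection test as the original: empty or fully off-image rects fail. */
static int pixel_subrect_in_bounds(int output_w, int output_h, int subx, int suby,
                                   int subw, int subh)
{
    return !((subx >= output_w) || ((subx + subw) <= 0) ||
             (suby >= output_h) || ((suby + subh) <= 0) ||
             (subw == 0) || (subh == 0));
}
```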
9364-
9365-static int
9366-stbir__perform_build(STBIR_RESIZE *resize, int splits)
9367-{
9368- stbir__contributors conservative = {0, 0};
9369- stbir__sampler horizontal, vertical;
9370- int new_output_subx, new_output_suby;
9371- stbir__info *out_info;
9372-#ifdef STBIR_PROFILE
9373- stbir__info profile_infod; // used to contain building profile info before
9374- // everything is allocated
9375- stbir__info *profile_info = &profile_infod;
9376-#endif
9377-
9378- // have we already built the samplers?
9379- if (resize->samplers) {
9380- return 0;
9381- }
9382-
9383-#define STBIR_RETURN_ERROR_AND_ASSERT(exp) \
9384- STBIR_ASSERT(!(exp)); \
9385- if (exp) \
9386- return 0;
9387- STBIR_RETURN_ERROR_AND_ASSERT((unsigned)resize->horizontal_filter >=
9388- STBIR_FILTER_OTHER)
9389- STBIR_RETURN_ERROR_AND_ASSERT((unsigned)resize->vertical_filter >=
9390- STBIR_FILTER_OTHER)
9391-#undef STBIR_RETURN_ERROR_AND_ASSERT
9392-
9393- if (splits <= 0) {
9394- return 0;
9395- }
9396-
9397- STBIR_PROFILE_BUILD_FIRST_START(build);
9398-
9399- new_output_subx = resize->output_subx;
9400- new_output_suby = resize->output_suby;
9401-
9402- // do horizontal clip and scale calcs
9403- if (!stbir__calculate_region_transform(
9404- &horizontal.scale_info, resize->output_w, &new_output_subx,
9405- resize->output_subw, resize->input_w, resize->input_s0,
9406- resize->input_s1)) {
9407- return 0;
9408- }
9409-
9410- // do vertical clip and scale calcs
9411- if (!stbir__calculate_region_transform(
9412- &vertical.scale_info, resize->output_h, &new_output_suby,
9413- resize->output_subh, resize->input_h, resize->input_t0,
9414- resize->input_t1)) {
9415- return 0;
9416- }
9417-
9418- // if nothing to do, just return
9419- if ((horizontal.scale_info.output_sub_size == 0) ||
9420- (vertical.scale_info.output_sub_size == 0)) {
9421- return 0;
9422- }
9423-
9424- stbir__set_sampler(
9425- &horizontal, resize->horizontal_filter,
9426- resize->horizontal_filter_kernel, resize->horizontal_filter_support,
9427- resize->horizontal_edge, &horizontal.scale_info, 1, resize->user_data);
9428- stbir__get_conservative_extents(&horizontal, &conservative,
9429- resize->user_data);
9430- stbir__set_sampler(&vertical, resize->vertical_filter,
9431- resize->vertical_filter_kernel,
9432- resize->vertical_filter_support, resize->vertical_edge,
9433- &vertical.scale_info, 0, resize->user_data);
9434-
9435- if ((vertical.scale_info.output_sub_size / splits) <
9436- STBIR_FORCE_MINIMUM_SCANLINES_FOR_SPLITS) // each split should be a
9437- // minimum of 4 scanlines
9438- // (handwavey choice)
9439- {
9440- splits = vertical.scale_info.output_sub_size /
9441- STBIR_FORCE_MINIMUM_SCANLINES_FOR_SPLITS;
9442- if (splits == 0) {
9443- splits = 1;
9444- }
9445- }
9446-
9447- STBIR_PROFILE_BUILD_START(alloc);
9448- out_info = stbir__alloc_internal_mem_and_build_samplers(
9449- &horizontal, &vertical, &conservative,
9450- resize->input_pixel_layout_public, resize->output_pixel_layout_public,
9451- splits, new_output_subx, new_output_suby, resize->fast_alpha,
9452- resize->user_data STBIR_ONLY_PROFILE_BUILD_SET_INFO);
9453- STBIR_PROFILE_BUILD_END(alloc);
9454- STBIR_PROFILE_BUILD_END(build);
9455-
9456- if (out_info) {
9457- resize->splits = splits;
9458- resize->samplers = out_info;
9459- resize->needs_rebuild = 0;
9460-#ifdef STBIR_PROFILE
9461- STBIR_MEMCPY(&out_info->profile, &profile_infod.profile,
9462- sizeof(out_info->profile));
9463-#endif
9464-
9465- // update anything that can be changed without recalcing samplers
9466- stbir__update_info_from_resize(out_info, resize);
9467-
9468- return splits;
9469- }
9470-
9471- return 0;
9472-}
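`stbir__perform_build` reduces the requested split count so that each split still covers a minimum number of output scanlines. The sketch below shows just that clamp. It assumes `STBIR_FORCE_MINIMUM_SCANLINES_FOR_SPLITS` is 4, as the inline comment suggests; `MIN_SCANLINES_PER_SPLIT` and `clamp_splits` are hypothetical names.

```c
/* Hypothetical stand-in for STBIR_FORCE_MINIMUM_SCANLINES_FOR_SPLITS (assumed 4). */
#define MIN_SCANLINES_PER_SPLIT 4

/* Returns the split count the build would actually use, or 0 if invalid. */
static int clamp_splits(int output_sub_h, int splits)
{
    if (splits <= 0)
        return 0;
    if ((output_sub_h / splits) < MIN_SCANLINES_PER_SPLIT) {
        splits = output_sub_h / MIN_SCANLINES_PER_SPLIT;
        if (splits == 0)
            splits = 1; /* always at least one split */
    }
    return splits;
}
```

For example, asking for 8 splits of a 10-row output yields 2 splits, because 10 / 4 = 2.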
9473-
9474-void
9475-stbir_free_samplers(STBIR_RESIZE *resize)
9476-{
9477- if (resize->samplers) {
9478- stbir__free_internal_mem(resize->samplers);
9479- resize->samplers = 0;
9480- resize->called_alloc = 0;
9481- }
9482-}
9483-
9484-STBIRDEF int
9485-stbir_build_samplers_with_splits(STBIR_RESIZE *resize, int splits)
9486-{
9487- if ((resize->samplers == 0) || (resize->needs_rebuild)) {
9488- if (resize->samplers) {
9489- stbir_free_samplers(resize);
9490- }
9491-
9492- resize->called_alloc = 1;
9493- return stbir__perform_build(resize, splits);
9494- }
9495-
9496- STBIR_PROFILE_BUILD_CLEAR(resize->samplers);
9497-
9498- return 1;
9499-}
9500-
9501-STBIRDEF int
9502-stbir_build_samplers(STBIR_RESIZE *resize)
9503-{
9504- return stbir_build_samplers_with_splits(resize, 1);
9505-}
9506-
9507-STBIRDEF int
9508-stbir_resize_extended(STBIR_RESIZE *resize)
9509-{
9510- int result;
9511-
9512- if ((resize->samplers == 0) || (resize->needs_rebuild)) {
9513- int alloc_state = resize->called_alloc; // remember allocated state
9514-
9515- if (resize->samplers) {
9516- stbir__free_internal_mem(resize->samplers);
9517- resize->samplers = 0;
9518- }
9519-
9520- if (!stbir_build_samplers(resize)) {
9521- return 0;
9522- }
9523-
9524- resize->called_alloc = alloc_state;
9525-
9526- // if build_samplers succeeded (above), but there are no samplers set,
9527- // then
9528- // the area to stretch into was zero pixels, so don't do anything and
9529- // return success
9530- if (resize->samplers == 0) {
9531- return 1;
9532- }
9533- } else {
9534- // didn't build anything - clear it
9535- STBIR_PROFILE_BUILD_CLEAR(resize->samplers);
9536- }
9537-
9538- // do resize
9539- result = stbir__perform_resize(resize->samplers, 0, resize->splits);
9540-
9541- // if we alloced, then free
9542- if (!resize->called_alloc) {
9543- stbir_free_samplers(resize);
9544- resize->samplers = 0;
9545- }
9546-
9547- return result;
9548-}
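`stbir_resize_extended` decides who owns the samplers through `called_alloc`. If the caller never built them, they are built on demand and freed after the resize. If the caller did build them, they persist until a setter marks `needs_rebuild`. The model below is a hypothetical sketch of that lifecycle: `lifecycle`, `lc_build`, `lc_build_samplers`, and `lc_resize` are illustrative names, and `built` stands in for `resize->samplers != 0`.

```c
/* Hypothetical model of the sampler lifecycle around stbir_build_samplers
   and stbir_resize_extended. */
typedef struct {
    int built;         /* stands in for resize->samplers != 0 */
    int needs_rebuild; /* set by setters that change filters, edges, layouts, subrects */
    int called_alloc;  /* 1 if the caller built the samplers explicitly */
    int build_count;   /* number of (re)builds, for illustration */
} lifecycle;

static void lc_build(lifecycle *lc)
{
    lc->built = 1;
    lc->needs_rebuild = 0;
    lc->build_count++;
}

/* like stbir_build_samplers: caller-owned samplers persist */
static void lc_build_samplers(lifecycle *lc)
{
    if (!lc->built || lc->needs_rebuild) {
        lc->called_alloc = 1;
        lc_build(lc);
    }
}

/* like stbir_resize_extended: builds on demand, frees if the caller never built */
static void lc_resize(lifecycle *lc)
{
    if (!lc->built || lc->needs_rebuild) {
        int alloc_state = lc->called_alloc; /* remember who owns the samplers */
        lc_build(lc);
        lc->called_alloc = alloc_state;
    }
    /* ... the actual resize work would run here ... */
    if (!lc->called_alloc)
        lc->built = 0; /* transient samplers are freed after one use */
}
```

Under this model, repeated resizes with the same parameters are cheaper if the caller builds the samplers once up front.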
9549-
9550-STBIRDEF int
9551-stbir_resize_extended_split(STBIR_RESIZE *resize, int split_start,
9552- int split_count)
9553-{
9554- STBIR_ASSERT(resize->samplers);
9555-
9556- // if we're just doing the whole thing, call full
9557- if ((split_start == -1) ||
9558- ((split_start == 0) && (split_count == resize->splits))) {
9559- return stbir_resize_extended(resize);
9560- }
9561-
9562- // you **must** build samplers first when using split resize
9563- if ((resize->samplers == 0) || (resize->needs_rebuild)) {
9564- return 0;
9565- }
9566-
9567- if ((split_start >= resize->splits) || (split_start < 0) ||
9568- ((split_start + split_count) > resize->splits) || (split_count <= 0)) {
9569- return 0;
9570- }
9571-
9572- // do resize
9573- return stbir__perform_resize(resize->samplers, split_start, split_count);
9574-}
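`stbir_resize_extended_split` accepts a half-open range of splits, and `-1` means "everything". The sketch below isolates those argument checks. `split_range_valid` is a hypothetical name; it folds the `-1` shortcut into the same function for brevity.

```c
/* Hypothetical sketch of the argument checks in stbir_resize_extended_split.
   Returns 1 for a valid [split_start, split_start + split_count) range. */
static int split_range_valid(int total_splits, int split_start, int split_count)
{
    if (split_start == -1) /* -1 means "all splits" in the original */
        return 1;
    return !((split_start >= total_splits) || (split_start < 0) ||
             ((split_start + split_count) > total_splits) ||
             (split_count <= 0));
}
```

A plausible threading pattern follows from this design: call `stbir_build_samplers_with_splits` once with N splits, then have worker i call `stbir_resize_extended_split(&resize, i, 1)`, so that each thread writes a disjoint band of scanlines.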
9575-
9576-static void *
9577-stbir_quick_resize_helper(const void *input_pixels, int input_w, int input_h,
9578- int input_stride_in_bytes, void *output_pixels,
9579- int output_w, int output_h,
9580- int output_stride_in_bytes,
9581- stbir_pixel_layout pixel_layout,
9582- stbir_datatype data_type, stbir_edge edge,
9583- stbir_filter filter)
9584-{
9585- STBIR_RESIZE resize;
9586- int scanline_output_in_bytes;
9587- int positive_output_stride_in_bytes;
9588- void *start_ptr;
9589- void *free_ptr;
9590-
9591- scanline_output_in_bytes =
9592- output_w * stbir__type_size[data_type] *
9593- stbir__pixel_channels
9594- [stbir__pixel_layout_convert_public_to_internal[pixel_layout]];
9595- if (scanline_output_in_bytes == 0) {
9596- return 0;
9597- }
9598-
9599- // if zero stride, use scanline output
9600- if (output_stride_in_bytes == 0) {
9601- output_stride_in_bytes = scanline_output_in_bytes;
9602- }
9603-
9604- // abs value for inverted images (negative pitches)
9605- positive_output_stride_in_bytes = output_stride_in_bytes;
9606- if (positive_output_stride_in_bytes < 0) {
9607- positive_output_stride_in_bytes = -positive_output_stride_in_bytes;
9608- }
9609-
9610- // is the requested stride smaller than the scanline output? if so, just
9611- // fail
9612- if (positive_output_stride_in_bytes < scanline_output_in_bytes) {
9613- return 0;
9614- }
9615-
9616- start_ptr = output_pixels;
9617- free_ptr = 0; // no free pointer, since they passed buffer to use
9618-
9619- // did they pass a zero for the dest? if so, allocate the buffer
9620- if (output_pixels == 0) {
9621- size_t size;
9622- char *ptr;
9623-
9624- size = (size_t)positive_output_stride_in_bytes * (size_t)output_h;
9625- if (size == 0) {
9626- return 0;
9627- }
9628-
9629- ptr = (char *)STBIR_MALLOC(size, 0);
9630- if (ptr == 0) {
9631- return 0;
9632- }
9633-
9634- free_ptr = ptr;
9635-
9636- // point at the last scanline, if they requested a flipped image
9637- if (output_stride_in_bytes < 0) {
9638- start_ptr = ptr + ((size_t)positive_output_stride_in_bytes *
9639- (size_t)(output_h - 1));
9640- } else {
9641- start_ptr = ptr;
9642- }
9643- }
9644-
9645- // ok, now do the resize
9646- stbir_resize_init(&resize, input_pixels, input_w, input_h,
9647- input_stride_in_bytes, start_ptr, output_w, output_h,
9648- output_stride_in_bytes, pixel_layout, data_type);
9649-
9650- resize.horizontal_edge = edge;
9651- resize.vertical_edge = edge;
9652- resize.horizontal_filter = filter;
9653- resize.vertical_filter = filter;
9654-
9655- if (!stbir_resize_extended(&resize)) {
9656- if (free_ptr) {
9657- STBIR_FREE(free_ptr, 0);
9658- }
9659- return 0;
9660- }
9661-
9662- return (free_ptr) ? free_ptr : start_ptr;
9663-}
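`stbir_quick_resize_helper` treats a zero stride as "tightly packed" and rejects strides narrower than one scanline. A negative stride means a bottom-up image, so row 0 starts at the last row of an allocated buffer. The sketch below isolates those two computations; `resolve_output_stride` and `first_row_offset` are hypothetical names. When the helper allocates the buffer, it returns the allocation base (`free_ptr`), not the row-0 pointer, so the caller can free it.

```c
#include <stddef.h>

/* Hypothetical sketch of the stride handling in stbir_quick_resize_helper.
   Returns the stride to use, or 0 if |stride| is smaller than one scanline. */
static int resolve_output_stride(int scanline_bytes, int stride_in_bytes)
{
    int abs_stride;
    if (stride_in_bytes == 0) /* zero means tightly packed */
        stride_in_bytes = scanline_bytes;
    abs_stride = (stride_in_bytes < 0) ? -stride_in_bytes : stride_in_bytes;
    return (abs_stride < scanline_bytes) ? 0 : stride_in_bytes;
}

/* Byte offset of row 0 within an allocated buffer: the last physical row
   when the stride is negative (bottom-up image), otherwise the first. */
static size_t first_row_offset(int stride_in_bytes, int h)
{
    size_t pitch = (size_t)((stride_in_bytes < 0) ? -stride_in_bytes : stride_in_bytes);
    return (stride_in_bytes < 0) ? pitch * (size_t)(h - 1) : 0;
}
```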
9664-
9665-STBIRDEF unsigned char *
9666-stbir_resize_uint8_linear(const unsigned char *input_pixels, int input_w,
9667- int input_h, int input_stride_in_bytes,
9668- unsigned char *output_pixels, int output_w,
9669- int output_h, int output_stride_in_bytes,
9670- stbir_pixel_layout pixel_layout)
9671-{
9672- return (unsigned char *)stbir_quick_resize_helper(
9673- input_pixels, input_w, input_h, input_stride_in_bytes, output_pixels,
9674- output_w, output_h, output_stride_in_bytes, pixel_layout,
9675- STBIR_TYPE_UINT8, STBIR_EDGE_CLAMP, STBIR_FILTER_DEFAULT);
9676-}
9677-
9678-STBIRDEF unsigned char *
9679-stbir_resize_uint8_srgb(const unsigned char *input_pixels, int input_w,
9680- int input_h, int input_stride_in_bytes,
9681- unsigned char *output_pixels, int output_w,
9682- int output_h, int output_stride_in_bytes,
9683- stbir_pixel_layout pixel_layout)
9684-{
9685- return (unsigned char *)stbir_quick_resize_helper(
9686- input_pixels, input_w, input_h, input_stride_in_bytes, output_pixels,
9687- output_w, output_h, output_stride_in_bytes, pixel_layout,
9688- STBIR_TYPE_UINT8_SRGB, STBIR_EDGE_CLAMP, STBIR_FILTER_DEFAULT);
9689-}
9690-
9691-STBIRDEF float *
9692-stbir_resize_float_linear(const float *input_pixels, int input_w, int input_h,
9693- int input_stride_in_bytes, float *output_pixels,
9694- int output_w, int output_h,
9695- int output_stride_in_bytes,
9696- stbir_pixel_layout pixel_layout)
9697-{
9698- return (float *)stbir_quick_resize_helper(
9699- input_pixels, input_w, input_h, input_stride_in_bytes, output_pixels,
9700- output_w, output_h, output_stride_in_bytes, pixel_layout,
9701- STBIR_TYPE_FLOAT, STBIR_EDGE_CLAMP, STBIR_FILTER_DEFAULT);
9702-}
9703-
9704-STBIRDEF void *
9705-stbir_resize(const void *input_pixels, int input_w, int input_h,
9706- int input_stride_in_bytes, void *output_pixels, int output_w,
9707- int output_h, int output_stride_in_bytes,
9708- stbir_pixel_layout pixel_layout, stbir_datatype data_type,
9709- stbir_edge edge, stbir_filter filter)
9710-{
9711- return (void *)stbir_quick_resize_helper(
9712- input_pixels, input_w, input_h, input_stride_in_bytes, output_pixels,
9713- output_w, output_h, output_stride_in_bytes, pixel_layout, data_type,
9714- edge, filter);
9715-}
9716-
9717-#ifdef STBIR_PROFILE
9718-
9719-STBIRDEF void
9720-stbir_resize_build_profile_info(STBIR_PROFILE_INFO *info,
9721- STBIR_RESIZE const *resize)
9722-{
9723- static char const *bdescriptions[6] = {
9724- "Building", "Allocating", "Horizontal sampler",
9725- "Vertical sampler", "Coefficient cleanup", "Coefficient pivot"};
9726- stbir__info *samp = resize->samplers;
9727- int i;
9728-
9729- typedef int testa[(STBIR__ARRAY_SIZE(bdescriptions) ==
9730- (STBIR__ARRAY_SIZE(samp->profile.array) - 1))
9731- ? 1
9732- : -1];
9733- typedef int
9734- testb[(sizeof(samp->profile.array) == (sizeof(samp->profile.named)))
9735- ? 1
9736- : -1];
9737- typedef int
9738- testc[(sizeof(info->clocks) >= (sizeof(samp->profile.named))) ? 1 : -1];
9739-
9740- for (i = 0; i < STBIR__ARRAY_SIZE(bdescriptions); i++) {
9741- info->clocks[i] = samp->profile.array[i + 1];
9742- }
9743-
9744- info->total_clocks = samp->profile.named.total;
9745- info->descriptions = bdescriptions;
9746- info->count = STBIR__ARRAY_SIZE(bdescriptions);
9747-}
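The `typedef int testa[cond ? 1 : -1];` lines above are a pre-C11 compile-time assertion: a false condition produces a negative array size, which fails to compile. The sketch below shows the same idiom next to its C11 `_Static_assert` spelling, plus the "skip slot 0" copy used when filling `info->clocks`. `phase_names`, `phase_clocks`, `ARRAY_SIZE`, and `copy_phase_clocks` are hypothetical names, and the layout (slot 0 holds the total) is modeled on the profile arrays.

```c
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical layout: slot 0 holds the total, slots 1..N one counter per name. */
static const char *const phase_names[3] = {"Building", "Allocating", "Horizontal sampler"};
static unsigned long long phase_clocks[4];

/* Pre-C11 compile-time check: a negative array size fails to compile. */
typedef int phase_names_match_clocks[(ARRAY_SIZE(phase_names) == ARRAY_SIZE(phase_clocks) - 1) ? 1 : -1];

/* C11 spelling of the same check. */
_Static_assert(ARRAY_SIZE(phase_names) == ARRAY_SIZE(phase_clocks) - 1,
               "one description per non-total slot");

/* Copy the per-phase counters, skipping the total in slot 0. */
static void copy_phase_clocks(unsigned long long *dst)
{
    size_t i;
    for (i = 0; i < ARRAY_SIZE(phase_names); i++)
        dst[i] = phase_clocks[i + 1];
}
```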
9748-
9749-STBIRDEF void
9750-stbir_resize_split_profile_info(STBIR_PROFILE_INFO *info,
9751- STBIR_RESIZE const *resize, int split_start,
9752- int split_count)
9753-{
9754- static char const *descriptions[7] = {
9755- "Looping", "Vertical sampling", "Horizontal sampling",
9756- "Scanline input", "Scanline output", "Alpha weighting",
9757- "Alpha unweighting"};
9758- stbir__per_split_info *split_info;
9759- int s, i;
9760-
9761- typedef int testa[(STBIR__ARRAY_SIZE(descriptions) ==
9762- (STBIR__ARRAY_SIZE(split_info->profile.array) - 1))
9763- ? 1
9764- : -1];
9765- typedef int testb[(sizeof(split_info->profile.array) ==
9766- (sizeof(split_info->profile.named)))
9767- ? 1
9768- : -1];
9769- typedef int
9770- testc[(sizeof(info->clocks) >= (sizeof(split_info->profile.named)))
9771- ? 1
9772- : -1];
9773-
9774- if (split_start == -1) {
9775- split_start = 0;
9776- split_count = resize->samplers->splits;
9777- }
9778-
9779- if ((split_start >= resize->splits) || (split_start < 0) ||
9780- ((split_start + split_count) > resize->splits) || (split_count <= 0)) {
9781- info->total_clocks = 0;
9782- info->descriptions = 0;
9783- info->count = 0;
9784- return;
9785- }
9786-
9787- split_info = resize->samplers->split_info + split_start;
9788-
9789- // sum up the profile from all the splits
9790- for (i = 0; i < STBIR__ARRAY_SIZE(descriptions); i++) {
9791- stbir_uint64 sum = 0;
9792- for (s = 0; s < split_count; s++) {
9793- sum += split_info[s].profile.array[i + 1];
9794- }
9795- info->clocks[i] = sum;
9796- }
9797-
9798- info->total_clocks = split_info->profile.named.total;
9799- info->descriptions = descriptions;
9800- info->count = STBIR__ARRAY_SIZE(descriptions);
9801-}
9802-
9803-STBIRDEF void
9804-stbir_resize_extended_profile_info(STBIR_PROFILE_INFO *info,
9805- STBIR_RESIZE const *resize)
9806-{
9807- stbir_resize_split_profile_info(info, resize, -1, 0);
9808-}
9809-
9810-#endif // STBIR_PROFILE
9811-
9812-#undef STBIR_BGR
9813-#undef STBIR_1CHANNEL
9814-#undef STBIR_2CHANNEL
9815-#undef STBIR_RGB
9816-#undef STBIR_RGBA
9817-#undef STBIR_4CHANNEL
9818-#undef STBIR_BGRA
9819-#undef STBIR_ARGB
9820-#undef STBIR_ABGR
9821-#undef STBIR_RA
9822-#undef STBIR_AR
9823-#undef STBIR_RGBA_PM
9824-#undef STBIR_BGRA_PM
9825-#undef STBIR_ARGB_PM
9826-#undef STBIR_ABGR_PM
9827-#undef STBIR_RA_PM
9828-#undef STBIR_AR_PM
9829-
9830-#endif // STB_IMAGE_RESIZE_IMPLEMENTATION
9831-
9832-#else // STB_IMAGE_RESIZE_HORIZONTALS&STB_IMAGE_RESIZE_DO_VERTICALS
9833-
9834-// we reinclude the header file to define all the horizontal functions
9835-// specializing each function for the number of coeffs is 20-40% faster
9836-// *OVERALL*
9837-
9838-// by including the header file again this way, we can still debug the functions
9839-
9840-#define STBIR_strs_join2(start, mid, end) start##mid##end
9841-#define STBIR_strs_join1(start, mid, end) STBIR_strs_join2(start, mid, end)
9842-
9843-#define STBIR_strs_join24(start, mid1, mid2, end) start##mid1##mid2##end
9844-#define STBIR_strs_join14(start, mid1, mid2, end) \
9845- STBIR_strs_join24(start, mid1, mid2, end)
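The paired `STBIR_strs_join1`/`STBIR_strs_join2` macros exist because operands of `##` are not macro-expanded. The extra level of indirection forces arguments such as `stbir__decode_suffix` to expand before pasting. The sketch below demonstrates the difference with hypothetical `JOIN1`/`JOIN2`/`STR` macros and a hypothetical `SUFFIX`.

```c
#include <string.h>

/* Same two-level shape as STBIR_strs_join2 / STBIR_strs_join1. */
#define JOIN2(start, mid, end) start##mid##end
#define JOIN1(start, mid, end) JOIN2(start, mid, end)

/* Stringize helpers, also two-level so the argument is expanded first. */
#define STR2(x) #x
#define STR(x) STR2(x)

#define SUFFIX rgba /* hypothetical stand-in for stbir__decode_suffix */

/* Pasting directly: SUFFIX is an operand of ## and is NOT expanded. */
static const char *joined_direct = STR(JOIN2(decode, _, SUFFIX));   /* "decode_SUFFIX" */
/* Through JOIN1: SUFFIX expands to rgba before JOIN2 pastes it. */
static const char *joined_expanded = STR(JOIN1(decode, _, SUFFIX)); /* "decode_rgba" */

static int names_match(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}
```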
9846-
9847-#ifdef STB_IMAGE_RESIZE_DO_CODERS
9848-
9849-#ifdef stbir__decode_suffix
9850-#define STBIR__CODER_NAME(name) STBIR_strs_join1(name, _, stbir__decode_suffix)
9851-#else
9852-#define STBIR__CODER_NAME(name) name
9853-#endif
9854-
9855-#ifdef stbir__decode_swizzle
9856-#define stbir__decode_simdf8_flip(reg) \
9857- STBIR_strs_join1( \
9858- STBIR_strs_join1( \
9859- STBIR_strs_join1(STBIR_strs_join1(stbir__simdf8_0123to, \
9860- stbir__decode_order0, \
9861- stbir__decode_order1), \
9862- stbir__decode_order2, stbir__decode_order3), \
9863- stbir__decode_order0, stbir__decode_order1), \
9864- stbir__decode_order2, stbir__decode_order3)(reg, reg)
9865-#define stbir__decode_simdf4_flip(reg) \
9866- STBIR_strs_join1(STBIR_strs_join1(stbir__simdf_0123to, \
9867- stbir__decode_order0, \
9868- stbir__decode_order1), \
9869- stbir__decode_order2, stbir__decode_order3)(reg, reg)
9870-#define stbir__encode_simdf8_unflip(reg) \
9871- STBIR_strs_join1( \
9872- STBIR_strs_join1( \
9873- STBIR_strs_join1(STBIR_strs_join1(stbir__simdf8_0123to, \
9874- stbir__encode_order0, \
9875- stbir__encode_order1), \
9876- stbir__encode_order2, stbir__encode_order3), \
9877- stbir__encode_order0, stbir__encode_order1), \
9878- stbir__encode_order2, stbir__encode_order3)(reg, reg)
9879-#define stbir__encode_simdf4_unflip(reg) \
9880- STBIR_strs_join1(STBIR_strs_join1(stbir__simdf_0123to, \
9881- stbir__encode_order0, \
9882- stbir__encode_order1), \
9883- stbir__encode_order2, stbir__encode_order3)(reg, reg)
9884-#else
9885-#define stbir__decode_order0 0
9886-#define stbir__decode_order1 1
9887-#define stbir__decode_order2 2
9888-#define stbir__decode_order3 3
9889-#define stbir__encode_order0 0
9890-#define stbir__encode_order1 1
9891-#define stbir__encode_order2 2
9892-#define stbir__encode_order3 3
9893-#define stbir__decode_simdf8_flip(reg)
9894-#define stbir__decode_simdf4_flip(reg)
9895-#define stbir__encode_simdf8_unflip(reg)
9896-#define stbir__encode_simdf4_unflip(reg)
9897-#endif
9898-
9899-#ifdef STBIR_SIMD8
9900-#define stbir__encode_simdfX_unflip stbir__encode_simdf8_unflip
9901-#else
9902-#define stbir__encode_simdfX_unflip stbir__encode_simdf4_unflip
9903-#endif
9904-
9905-static float *
9906-STBIR__CODER_NAME(stbir__decode_uint8_linear_scaled)(float *decodep,
9907- int width_times_channels,
9908- void const *inputp)
9909-{
9910- float STBIR_STREAMOUT_PTR(*) decode = decodep;
9911- float *decode_end = (float *)decode + width_times_channels;
9912- unsigned char const *input = (unsigned char const *)inputp;
9913-
9914-#ifdef STBIR_SIMD
9915- unsigned char const *end_input_m16 = input + width_times_channels - 16;
9916- if (width_times_channels >= 16) {
9917- decode_end -= 16;
9918- STBIR_NO_UNROLL_LOOP_START_INF_FOR
9919- for (;;) {
9920-#ifdef STBIR_SIMD8
9921- stbir__simdi i;
9922- stbir__simdi8 o0, o1;
9923- stbir__simdf8 of0, of1;
9924- STBIR_NO_UNROLL(decode);
9925- stbir__simdi_load(i, input);
9926- stbir__simdi8_expand_u8_to_u32(o0, o1, i);
9927- stbir__simdi8_convert_i32_to_float(of0, o0);
9928- stbir__simdi8_convert_i32_to_float(of1, o1);
9929- stbir__simdf8_mult(of0, of0, STBIR_max_uint8_as_float_inverted8);
9930- stbir__simdf8_mult(of1, of1, STBIR_max_uint8_as_float_inverted8);
9931- stbir__decode_simdf8_flip(of0);
9932- stbir__decode_simdf8_flip(of1);
9933- stbir__simdf8_store(decode + 0, of0);
9934- stbir__simdf8_store(decode + 8, of1);
9935-#else
9936- stbir__simdi i, o0, o1, o2, o3;
9937- stbir__simdf of0, of1, of2, of3;
9938- STBIR_NO_UNROLL(decode);
9939- stbir__simdi_load(i, input);
9940- stbir__simdi_expand_u8_to_u32(o0, o1, o2, o3, i);
9941- stbir__simdi_convert_i32_to_float(of0, o0);
9942- stbir__simdi_convert_i32_to_float(of1, o1);
9943- stbir__simdi_convert_i32_to_float(of2, o2);
9944- stbir__simdi_convert_i32_to_float(of3, o3);
9945- stbir__simdf_mult(of0, of0,
9946- STBIR__CONSTF(STBIR_max_uint8_as_float_inverted));
9947- stbir__simdf_mult(of1, of1,
9948- STBIR__CONSTF(STBIR_max_uint8_as_float_inverted));
9949- stbir__simdf_mult(of2, of2,
9950- STBIR__CONSTF(STBIR_max_uint8_as_float_inverted));
9951- stbir__simdf_mult(of3, of3,
9952- STBIR__CONSTF(STBIR_max_uint8_as_float_inverted));
9953- stbir__decode_simdf4_flip(of0);
9954- stbir__decode_simdf4_flip(of1);
9955- stbir__decode_simdf4_flip(of2);
9956- stbir__decode_simdf4_flip(of3);
9957- stbir__simdf_store(decode + 0, of0);
9958- stbir__simdf_store(decode + 4, of1);
9959- stbir__simdf_store(decode + 8, of2);
9960- stbir__simdf_store(decode + 12, of3);
9961-#endif
9962- decode += 16;
9963- input += 16;
9964- if (decode <= decode_end) {
9965- continue;
9966- }
9967- if (decode == (decode_end + 16)) {
9968- break;
9969- }
9970- decode = decode_end; // backup and do last couple
9971- input = end_input_m16;
9972- }
9973- return decode_end + 16;
9974- }
9975-#endif
9976-
9977-// try to do blocks of 4 when you can
9978-#if stbir__coder_min_num != 3 // doesn't divide cleanly by four
9979- decode += 4;
9980- STBIR_SIMD_NO_UNROLL_LOOP_START
9981- while (decode <= decode_end) {
9982- STBIR_SIMD_NO_UNROLL(decode);
9983- decode[0 - 4] = ((float)(input[stbir__decode_order0])) *
9984- stbir__max_uint8_as_float_inverted;
9985- decode[1 - 4] = ((float)(input[stbir__decode_order1])) *
9986- stbir__max_uint8_as_float_inverted;
9987- decode[2 - 4] = ((float)(input[stbir__decode_order2])) *
9988- stbir__max_uint8_as_float_inverted;
9989- decode[3 - 4] = ((float)(input[stbir__decode_order3])) *
9990- stbir__max_uint8_as_float_inverted;
9991- decode += 4;
9992- input += 4;
9993- }
9994- decode -= 4;
9995-#endif
9996-
9997-// do the remnants
9998-#if stbir__coder_min_num < 4
9999- STBIR_NO_UNROLL_LOOP_START
10000- while (decode < decode_end) {
10001- STBIR_NO_UNROLL(decode);
10002- decode[0] = ((float)(input[stbir__decode_order0])) *
10003- stbir__max_uint8_as_float_inverted;
10004-#if stbir__coder_min_num >= 2
10005- decode[1] = ((float)(input[stbir__decode_order1])) *
10006- stbir__max_uint8_as_float_inverted;
10007-#endif
10008-#if stbir__coder_min_num >= 3
10009- decode[2] = ((float)(input[stbir__decode_order2])) *
10010- stbir__max_uint8_as_float_inverted;
10011-#endif
10012- decode += stbir__coder_min_num;
10013- input += stbir__coder_min_num;
10014- }
10015-#endif
10016-
10017- return decode_end;
10018-}
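The SIMD decode loop above processes fixed-size blocks and, rather than running a scalar tail, backs up so the last block ends exactly at the end of the row ("backup and do last couple"). Overlapped elements are converted twice, which is harmless because each output depends only on its own input. The scalar sketch below reproduces that control flow with a hypothetical block width of 4. It also adds the scale, add 0.5, clamp, and truncate rounding used by the non-SIMD encode path. `decode_u8_to_float`, `encode_float_to_u8`, `BLOCK`, and `inv_255` are hypothetical names.

```c
#include <stddef.h>

#define BLOCK 4 /* hypothetical block width; the SIMD path above uses 16 */

/* plays the role of stbir__max_uint8_as_float_inverted */
static const float inv_255 = 1.0f / 255.0f;

static void decode_block(float *out, const unsigned char *in)
{
    int k;
    for (k = 0; k < BLOCK; k++)
        out[k] = (float)in[k] * inv_255;
}

/* Whole blocks first; if a partial block remains, back up so the final
   block ends exactly at n. */
static void decode_u8_to_float(float *out, const unsigned char *in, size_t n)
{
    size_t i = 0;
    if (n < BLOCK) { /* too short for a block: plain scalar loop */
        for (; i < n; i++)
            out[i] = (float)in[i] * inv_255;
        return;
    }
    for (;;) {
        decode_block(out + i, in + i);
        i += BLOCK;
        if (i + BLOCK <= n) /* another full block fits */
            continue;
        if (i == n) /* finished exactly on a block boundary */
            break;
        i = n - BLOCK; /* back up for an overlapping last block */
    }
}

/* Scalar encode: scale, add 0.5, clamp to [0, 255], truncate. */
static unsigned char encode_float_to_u8(float v)
{
    float f = v * 255.0f + 0.5f;
    if (f < 0.0f)
        f = 0.0f;
    if (f > 255.0f)
        f = 255.0f;
    return (unsigned char)f;
}
```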
10019-
10020-static void
10021-STBIR__CODER_NAME(stbir__encode_uint8_linear_scaled)(void *outputp,
10022- int width_times_channels,
10023- float const *encode)
10024-{
10025- unsigned char STBIR_SIMD_STREAMOUT_PTR(*) output = (unsigned char *)outputp;
10026- unsigned char *end_output =
10027- ((unsigned char *)output) + width_times_channels;
10028-
10029-#ifdef STBIR_SIMD
10030- if (width_times_channels >= stbir__simdfX_float_count * 2) {
10031- float const *end_encode_m8 =
10032- encode + width_times_channels - stbir__simdfX_float_count * 2;
10033- end_output -= stbir__simdfX_float_count * 2;
10034- STBIR_NO_UNROLL_LOOP_START_INF_FOR
10035- for (;;) {
10036- stbir__simdfX e0, e1;
10037- stbir__simdi i;
10038- STBIR_SIMD_NO_UNROLL(encode);
10039- stbir__simdfX_madd_mem(e0, STBIR_simd_point5X,
10040- STBIR_max_uint8_as_floatX, encode);
10041- stbir__simdfX_madd_mem(e1, STBIR_simd_point5X,
10042- STBIR_max_uint8_as_floatX,
10043- encode + stbir__simdfX_float_count);
10044- stbir__encode_simdfX_unflip(e0);
10045- stbir__encode_simdfX_unflip(e1);
10046-#ifdef STBIR_SIMD8
10047- stbir__simdf8_pack_to_16bytes(i, e0, e1);
10048- stbir__simdi_store(output, i);
10049-#else
10050- stbir__simdf_pack_to_8bytes(i, e0, e1);
10051- stbir__simdi_store2(output, i);
10052-#endif
10053- encode += stbir__simdfX_float_count * 2;
10054- output += stbir__simdfX_float_count * 2;
10055- if (output <= end_output) {
10056- continue;
10057- }
10058- if (output == (end_output + stbir__simdfX_float_count * 2)) {
10059- break;
10060- }
10061- output = end_output; // backup and do last couple
10062- encode = end_encode_m8;
10063- }
10064- return;
10065- }
10066-
10067-// try to do blocks of 4 when you can
10068-#if stbir__coder_min_num != 3 // doesn't divide cleanly by four
10069- output += 4;
10070- STBIR_NO_UNROLL_LOOP_START
10071- while (output <= end_output) {
10072- stbir__simdf e0;
10073- stbir__simdi i0;
10074- STBIR_NO_UNROLL(encode);
10075- stbir__simdf_load(e0, encode);
10076- stbir__simdf_madd(e0, STBIR__CONSTF(STBIR_simd_point5),
10077- STBIR__CONSTF(STBIR_max_uint8_as_float), e0);
10078- stbir__encode_simdf4_unflip(e0);
10079- stbir__simdf_pack_to_8bytes(i0, e0, e0); // only use first 4
10080- *(int *)(output - 4) = stbir__simdi_to_int(i0);
10081- output += 4;
10082- encode += 4;
10083- }
10084- output -= 4;
10085-#endif
10086-
10087-// do the remnants
10088-#if stbir__coder_min_num < 4
10089- STBIR_NO_UNROLL_LOOP_START
10090- while (output < end_output) {
10091- stbir__simdf e0;
10092- STBIR_NO_UNROLL(encode);
10093- stbir__simdf_madd1_mem(e0, STBIR__CONSTF(STBIR_simd_point5),
10094- STBIR__CONSTF(STBIR_max_uint8_as_float),
10095- encode + stbir__encode_order0);
10096- output[0] = stbir__simdf_convert_float_to_uint8(e0);
10097-#if stbir__coder_min_num >= 2
10098- stbir__simdf_madd1_mem(e0, STBIR__CONSTF(STBIR_simd_point5),
10099- STBIR__CONSTF(STBIR_max_uint8_as_float),
10100- encode + stbir__encode_order1);
10101- output[1] = stbir__simdf_convert_float_to_uint8(e0);
10102-#endif
10103-#if stbir__coder_min_num >= 3
10104- stbir__simdf_madd1_mem(e0, STBIR__CONSTF(STBIR_simd_point5),
10105- STBIR__CONSTF(STBIR_max_uint8_as_float),
10106- encode + stbir__encode_order2);
10107- output[2] = stbir__simdf_convert_float_to_uint8(e0);
10108-#endif
10109- output += stbir__coder_min_num;
10110- encode += stbir__coder_min_num;
10111- }
10112-#endif
10113-
10114-#else
10115-
10116-// try to do blocks of 4 when you can
10117-#if stbir__coder_min_num != 3 // doesn't divide cleanly by four
10118- output += 4;
10119- while (output <= end_output) {
10120- float f;
10121- f = encode[stbir__encode_order0] * stbir__max_uint8_as_float + 0.5f;
10122- STBIR_CLAMP(f, 0, 255);
10123- output[0 - 4] = (unsigned char)f;
10124- f = encode[stbir__encode_order1] * stbir__max_uint8_as_float + 0.5f;
10125- STBIR_CLAMP(f, 0, 255);
10126- output[1 - 4] = (unsigned char)f;
10127- f = encode[stbir__encode_order2] * stbir__max_uint8_as_float + 0.5f;
10128- STBIR_CLAMP(f, 0, 255);
10129- output[2 - 4] = (unsigned char)f;
10130- f = encode[stbir__encode_order3] * stbir__max_uint8_as_float + 0.5f;
10131- STBIR_CLAMP(f, 0, 255);
10132- output[3 - 4] = (unsigned char)f;
10133- output += 4;
10134- encode += 4;
10135- }
10136- output -= 4;
10137-#endif
10138-
10139-// do the remnants
10140-#if stbir__coder_min_num < 4
10141- STBIR_NO_UNROLL_LOOP_START
10142- while (output < end_output) {
10143- float f;
10144- STBIR_NO_UNROLL(encode);
10145- f = encode[stbir__encode_order0] * stbir__max_uint8_as_float + 0.5f;
10146- STBIR_CLAMP(f, 0, 255);
10147- output[0] = (unsigned char)f;
10148-#if stbir__coder_min_num >= 2
10149- f = encode[stbir__encode_order1] * stbir__max_uint8_as_float + 0.5f;
10150- STBIR_CLAMP(f, 0, 255);
10151- output[1] = (unsigned char)f;
10152-#endif
10153-#if stbir__coder_min_num >= 3
10154- f = encode[stbir__encode_order2] * stbir__max_uint8_as_float + 0.5f;
10155- STBIR_CLAMP(f, 0, 255);
10156- output[2] = (unsigned char)f;
10157-#endif
10158- output += stbir__coder_min_num;
10159- encode += stbir__coder_min_num;
10160- }
10161-#endif
10162-#endif
10163-}
10164-
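The scalar `#else` branch of the function above converts each sample with `f * stbir__max_uint8_as_float + 0.5f`, clamps it to [0, 255], and truncates. Below is a minimal portable sketch of that per-sample step. It assumes `stbir__max_uint8_as_float` is 255.0f and ignores the `stbir__encode_order*` channel swizzle. The `sketch_*` names are hypothetical and not part of stb_image_resize2.

```c
#include <stddef.h>

/* Hypothetical sketch: linear float in [0, 1] -> byte, mirroring the scalar
   path: scale by 255 (assumed value of stbir__max_uint8_as_float), add 0.5,
   clamp, then truncate. */
static unsigned char sketch_encode_u8_scaled(float v)
{
  float f = v * 255.0f + 0.5f;
  if (f < 0.0f) f = 0.0f;       /* filter undershoot below 0 */
  if (f > 255.0f) f = 255.0f;   /* filter overshoot above 1 */
  return (unsigned char)f;      /* truncating x + 0.5 rounds half up */
}

/* Hypothetical row helper: no channel reordering, unlike the library. */
static void sketch_encode_row_u8_scaled(unsigned char *out, const float *in,
                                        size_t n)
{
  for (size_t i = 0; i < n; ++i)
    out[i] = sketch_encode_u8_scaled(in[i]);
}
```

The clamp is needed because resampling filters with negative lobes can push values slightly outside [0, 1].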
10165-static float *
10166-STBIR__CODER_NAME(stbir__decode_uint8_linear)(float *decodep,
10167- int width_times_channels,
10168- void const *inputp)
10169-{
10170- float STBIR_STREAMOUT_PTR(*) decode = decodep;
10171- float *decode_end = (float *)decode + width_times_channels;
10172- unsigned char const *input = (unsigned char const *)inputp;
10173-
10174-#ifdef STBIR_SIMD
10175- unsigned char const *end_input_m16 = input + width_times_channels - 16;
10176- if (width_times_channels >= 16) {
10177- decode_end -= 16;
10178- STBIR_NO_UNROLL_LOOP_START_INF_FOR
10179- for (;;) {
10180-#ifdef STBIR_SIMD8
10181- stbir__simdi i;
10182- stbir__simdi8 o0, o1;
10183- stbir__simdf8 of0, of1;
10184- STBIR_NO_UNROLL(decode);
10185- stbir__simdi_load(i, input);
10186- stbir__simdi8_expand_u8_to_u32(o0, o1, i);
10187- stbir__simdi8_convert_i32_to_float(of0, o0);
10188- stbir__simdi8_convert_i32_to_float(of1, o1);
10189- stbir__decode_simdf8_flip(of0);
10190- stbir__decode_simdf8_flip(of1);
10191- stbir__simdf8_store(decode + 0, of0);
10192- stbir__simdf8_store(decode + 8, of1);
10193-#else
10194- stbir__simdi i, o0, o1, o2, o3;
10195- stbir__simdf of0, of1, of2, of3;
10196- STBIR_NO_UNROLL(decode);
10197- stbir__simdi_load(i, input);
10198- stbir__simdi_expand_u8_to_u32(o0, o1, o2, o3, i);
10199- stbir__simdi_convert_i32_to_float(of0, o0);
10200- stbir__simdi_convert_i32_to_float(of1, o1);
10201- stbir__simdi_convert_i32_to_float(of2, o2);
10202- stbir__simdi_convert_i32_to_float(of3, o3);
10203- stbir__decode_simdf4_flip(of0);
10204- stbir__decode_simdf4_flip(of1);
10205- stbir__decode_simdf4_flip(of2);
10206- stbir__decode_simdf4_flip(of3);
10207- stbir__simdf_store(decode + 0, of0);
10208- stbir__simdf_store(decode + 4, of1);
10209- stbir__simdf_store(decode + 8, of2);
10210- stbir__simdf_store(decode + 12, of3);
10211-#endif
10212- decode += 16;
10213- input += 16;
10214- if (decode <= decode_end) {
10215- continue;
10216- }
10217- if (decode == (decode_end + 16)) {
10218- break;
10219- }
10220- decode = decode_end; // backup and do last couple
10221- input = end_input_m16;
10222- }
10223- return decode_end + 16;
10224- }
10225-#endif
10226-
10227-// try to do blocks of 4 when you can
10228-#if stbir__coder_min_num != 3 // doesn't divide cleanly by four
10229- decode += 4;
10230- STBIR_SIMD_NO_UNROLL_LOOP_START
10231- while (decode <= decode_end) {
10232- STBIR_SIMD_NO_UNROLL(decode);
10233- decode[0 - 4] = ((float)(input[stbir__decode_order0]));
10234- decode[1 - 4] = ((float)(input[stbir__decode_order1]));
10235- decode[2 - 4] = ((float)(input[stbir__decode_order2]));
10236- decode[3 - 4] = ((float)(input[stbir__decode_order3]));
10237- decode += 4;
10238- input += 4;
10239- }
10240- decode -= 4;
10241-#endif
10242-
10243-// do the remnants
10244-#if stbir__coder_min_num < 4
10245- STBIR_NO_UNROLL_LOOP_START
10246- while (decode < decode_end) {
10247- STBIR_NO_UNROLL(decode);
10248- decode[0] = ((float)(input[stbir__decode_order0]));
10249-#if stbir__coder_min_num >= 2
10250- decode[1] = ((float)(input[stbir__decode_order1]));
10251-#endif
10252-#if stbir__coder_min_num >= 3
10253- decode[2] = ((float)(input[stbir__decode_order2]));
10254-#endif
10255- decode += stbir__coder_min_num;
10256- input += stbir__coder_min_num;
10257- }
10258-#endif
10259- return decode_end;
10260-}
10261-
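The SIMD path in `stbir__decode_uint8_linear` has no scalar tail loop. When the row length is not a multiple of 16, it backs up to the last full 16-wide block and converts it again, overwriting samples it already produced. This is safe because each output depends only on its own input. The sketch below reproduces that control flow. A hypothetical scalar `sketch_convert_block` stands in for one SIMD iteration, and like the library's guarded path it requires at least one full block.

```c
#include <stddef.h>

#define SKETCH_BLOCK 16

/* Hypothetical stand-in for one SIMD iteration: 16 bytes -> 16 floats. */
static void sketch_convert_block(float *dst, const unsigned char *src)
{
  for (int k = 0; k < SKETCH_BLOCK; ++k)
    dst[k] = (float)src[k];
}

/* Requires n >= SKETCH_BLOCK, matching the width_times_channels >= 16 guard. */
static void sketch_decode_u8_overlapped(float *dst, const unsigned char *src,
                                        size_t n)
{
  float *out = dst;
  float *last = dst + n - SKETCH_BLOCK;                 /* final full block */
  const unsigned char *last_in = src + n - SKETCH_BLOCK;
  for (;;) {
    sketch_convert_block(out, src);
    out += SKETCH_BLOCK;
    src += SKETCH_BLOCK;
    if (out <= last)
      continue;                          /* room for another full block */
    if (out == last + SKETCH_BLOCK)
      break;                             /* ended exactly at the row end */
    out = last;                          /* back up: redo an overlapping block */
    src = last_in;
  }
}
```

With n = 20, the first block covers samples 0..15 and the second, backed-up block covers 4..19.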
10262-static void
10263-STBIR__CODER_NAME(stbir__encode_uint8_linear)(void *outputp,
10264- int width_times_channels,
10265- float const *encode)
10266-{
10267- unsigned char STBIR_SIMD_STREAMOUT_PTR(*) output = (unsigned char *)outputp;
10268- unsigned char *end_output =
10269- ((unsigned char *)output) + width_times_channels;
10270-
10271-#ifdef STBIR_SIMD
10272- if (width_times_channels >= stbir__simdfX_float_count * 2) {
10273- float const *end_encode_m8 =
10274- encode + width_times_channels - stbir__simdfX_float_count * 2;
10275- end_output -= stbir__simdfX_float_count * 2;
10276- STBIR_SIMD_NO_UNROLL_LOOP_START_INF_FOR
10277- for (;;) {
10278- stbir__simdfX e0, e1;
10279- stbir__simdi i;
10280- STBIR_SIMD_NO_UNROLL(encode);
10281- stbir__simdfX_add_mem(e0, STBIR_simd_point5X, encode);
10282- stbir__simdfX_add_mem(e1, STBIR_simd_point5X,
10283- encode + stbir__simdfX_float_count);
10284- stbir__encode_simdfX_unflip(e0);
10285- stbir__encode_simdfX_unflip(e1);
10286-#ifdef STBIR_SIMD8
10287- stbir__simdf8_pack_to_16bytes(i, e0, e1);
10288- stbir__simdi_store(output, i);
10289-#else
10290- stbir__simdf_pack_to_8bytes(i, e0, e1);
10291- stbir__simdi_store2(output, i);
10292-#endif
10293- encode += stbir__simdfX_float_count * 2;
10294- output += stbir__simdfX_float_count * 2;
10295- if (output <= end_output) {
10296- continue;
10297- }
10298- if (output == (end_output + stbir__simdfX_float_count * 2)) {
10299- break;
10300- }
10301- output = end_output; // backup and do last couple
10302- encode = end_encode_m8;
10303- }
10304- return;
10305- }
10306-
10307-// try to do blocks of 4 when you can
10308-#if stbir__coder_min_num != 3 // doesn't divide cleanly by four
10309- output += 4;
10310- STBIR_NO_UNROLL_LOOP_START
10311- while (output <= end_output) {
10312- stbir__simdf e0;
10313- stbir__simdi i0;
10314- STBIR_NO_UNROLL(encode);
10315- stbir__simdf_load(e0, encode);
10316- stbir__simdf_add(e0, STBIR__CONSTF(STBIR_simd_point5), e0);
10317- stbir__encode_simdf4_unflip(e0);
10318- stbir__simdf_pack_to_8bytes(i0, e0, e0); // only use first 4
10319- *(int *)(output - 4) = stbir__simdi_to_int(i0);
10320- output += 4;
10321- encode += 4;
10322- }
10323- output -= 4;
10324-#endif
10325-
10326-#else
10327-
10328-// try to do blocks of 4 when you can
10329-#if stbir__coder_min_num != 3 // doesn't divide cleanly by four
10330- output += 4;
10331- while (output <= end_output) {
10332- float f;
10333- f = encode[stbir__encode_order0] + 0.5f;
10334- STBIR_CLAMP(f, 0, 255);
10335- output[0 - 4] = (unsigned char)f;
10336- f = encode[stbir__encode_order1] + 0.5f;
10337- STBIR_CLAMP(f, 0, 255);
10338- output[1 - 4] = (unsigned char)f;
10339- f = encode[stbir__encode_order2] + 0.5f;
10340- STBIR_CLAMP(f, 0, 255);
10341- output[2 - 4] = (unsigned char)f;
10342- f = encode[stbir__encode_order3] + 0.5f;
10343- STBIR_CLAMP(f, 0, 255);
10344- output[3 - 4] = (unsigned char)f;
10345- output += 4;
10346- encode += 4;
10347- }
10348- output -= 4;
10349-#endif
10350-
10351-#endif
10352-
10353-// do the remnants
10354-#if stbir__coder_min_num < 4
10355- STBIR_NO_UNROLL_LOOP_START
10356- while (output < end_output) {
10357- float f;
10358- STBIR_NO_UNROLL(encode);
10359- f = encode[stbir__encode_order0] + 0.5f;
10360- STBIR_CLAMP(f, 0, 255);
10361- output[0] = (unsigned char)f;
10362-#if stbir__coder_min_num >= 2
10363- f = encode[stbir__encode_order1] + 0.5f;
10364- STBIR_CLAMP(f, 0, 255);
10365- output[1] = (unsigned char)f;
10366-#endif
10367-#if stbir__coder_min_num >= 3
10368- f = encode[stbir__encode_order2] + 0.5f;
10369- STBIR_CLAMP(f, 0, 255);
10370- output[2] = (unsigned char)f;
10371-#endif
10372- output += stbir__coder_min_num;
10373- encode += stbir__coder_min_num;
10374- }
10375-#endif
10376-}
10377-
10378-static float *
10379-STBIR__CODER_NAME(stbir__decode_uint8_srgb)(float *decodep,
10380- int width_times_channels,
10381- void const *inputp)
10382-{
10383- float STBIR_STREAMOUT_PTR(*) decode = decodep;
10384- float *decode_end = (float *)decode + width_times_channels;
10385- unsigned char const *input = (unsigned char const *)inputp;
10386-
10387-// try to do blocks of 4 when you can
10388-#if stbir__coder_min_num != 3 // doesn't divide cleanly by four
10389- decode += 4;
10390- while (decode <= decode_end) {
10391- decode[0 - 4] =
10392- stbir__srgb_uchar_to_linear_float[input[stbir__decode_order0]];
10393- decode[1 - 4] =
10394- stbir__srgb_uchar_to_linear_float[input[stbir__decode_order1]];
10395- decode[2 - 4] =
10396- stbir__srgb_uchar_to_linear_float[input[stbir__decode_order2]];
10397- decode[3 - 4] =
10398- stbir__srgb_uchar_to_linear_float[input[stbir__decode_order3]];
10399- decode += 4;
10400- input += 4;
10401- }
10402- decode -= 4;
10403-#endif
10404-
10405-// do the remnants
10406-#if stbir__coder_min_num < 4
10407- STBIR_NO_UNROLL_LOOP_START
10408- while (decode < decode_end) {
10409- STBIR_NO_UNROLL(decode);
10410- decode[0] =
10411- stbir__srgb_uchar_to_linear_float[input[stbir__decode_order0]];
10412-#if stbir__coder_min_num >= 2
10413- decode[1] =
10414- stbir__srgb_uchar_to_linear_float[input[stbir__decode_order1]];
10415-#endif
10416-#if stbir__coder_min_num >= 3
10417- decode[2] =
10418- stbir__srgb_uchar_to_linear_float[input[stbir__decode_order2]];
10419-#endif
10420- decode += stbir__coder_min_num;
10421- input += stbir__coder_min_num;
10422- }
10423-#endif
10424- return decode_end;
10425-}
10426-
10427-#define stbir__min_max_shift20(i, f) \
10428- stbir__simdf_max(f, f, \
10429- stbir_simdf_casti(STBIR__CONSTI(STBIR_almost_zero))); \
10430- stbir__simdf_min(f, f, \
10431- stbir_simdf_casti(STBIR__CONSTI(STBIR_almost_one))); \
10432- stbir__simdi_32shr(i, stbir_simdi_castf(f), 20);
10433-
10434-#define stbir__scale_and_convert(i, f) \
10435- stbir__simdf_madd(f, STBIR__CONSTF(STBIR_simd_point5), \
10436- STBIR__CONSTF(STBIR_max_uint8_as_float), f); \
10437- stbir__simdf_max(f, f, stbir__simdf_zeroP()); \
10438- stbir__simdf_min(f, f, STBIR__CONSTF(STBIR_max_uint8_as_float)); \
10439- stbir__simdf_convert_float_to_i32(i, f);
10440-
10441-#define stbir__linear_to_srgb_finish(i, f) \
10442- { \
10443- stbir__simdi temp; \
10444- stbir__simdi_32shr(temp, stbir_simdi_castf(f), 12); \
10445- stbir__simdi_and(temp, temp, STBIR__CONSTI(STBIR_mastissa_mask)); \
10446- stbir__simdi_or(temp, temp, STBIR__CONSTI(STBIR_topscale)); \
10447- stbir__simdi_16madd(i, i, temp); \
10448- stbir__simdi_32shr(i, i, 16); \
10449- }
10450-
10451-#define stbir__simdi_table_lookup2(v0, v1, table) \
10452- { \
10453- stbir__simdi_u32 temp0, temp1; \
10454- temp0.m128i_i128 = v0; \
10455- temp1.m128i_i128 = v1; \
10456- temp0.m128i_u32[0] = table[temp0.m128i_i32[0]]; \
10457- temp0.m128i_u32[1] = table[temp0.m128i_i32[1]]; \
10458- temp0.m128i_u32[2] = table[temp0.m128i_i32[2]]; \
10459- temp0.m128i_u32[3] = table[temp0.m128i_i32[3]]; \
10460- temp1.m128i_u32[0] = table[temp1.m128i_i32[0]]; \
10461- temp1.m128i_u32[1] = table[temp1.m128i_i32[1]]; \
10462- temp1.m128i_u32[2] = table[temp1.m128i_i32[2]]; \
10463- temp1.m128i_u32[3] = table[temp1.m128i_i32[3]]; \
10464- v0 = temp0.m128i_i128; \
10465- v1 = temp1.m128i_i128; \
10466- }
10467-
10468-#define stbir__simdi_table_lookup3(v0, v1, v2, table) \
10469- { \
10470- stbir__simdi_u32 temp0, temp1, temp2; \
10471- temp0.m128i_i128 = v0; \
10472- temp1.m128i_i128 = v1; \
10473- temp2.m128i_i128 = v2; \
10474- temp0.m128i_u32[0] = table[temp0.m128i_i32[0]]; \
10475- temp0.m128i_u32[1] = table[temp0.m128i_i32[1]]; \
10476- temp0.m128i_u32[2] = table[temp0.m128i_i32[2]]; \
10477- temp0.m128i_u32[3] = table[temp0.m128i_i32[3]]; \
10478- temp1.m128i_u32[0] = table[temp1.m128i_i32[0]]; \
10479- temp1.m128i_u32[1] = table[temp1.m128i_i32[1]]; \
10480- temp1.m128i_u32[2] = table[temp1.m128i_i32[2]]; \
10481- temp1.m128i_u32[3] = table[temp1.m128i_i32[3]]; \
10482- temp2.m128i_u32[0] = table[temp2.m128i_i32[0]]; \
10483- temp2.m128i_u32[1] = table[temp2.m128i_i32[1]]; \
10484- temp2.m128i_u32[2] = table[temp2.m128i_i32[2]]; \
10485- temp2.m128i_u32[3] = table[temp2.m128i_i32[3]]; \
10486- v0 = temp0.m128i_i128; \
10487- v1 = temp1.m128i_i128; \
10488- v2 = temp2.m128i_i128; \
10489- }
10490-
10491-#define stbir__simdi_table_lookup4(v0, v1, v2, v3, table) \
10492- { \
10493- stbir__simdi_u32 temp0, temp1, temp2, temp3; \
10494- temp0.m128i_i128 = v0; \
10495- temp1.m128i_i128 = v1; \
10496- temp2.m128i_i128 = v2; \
10497- temp3.m128i_i128 = v3; \
10498- temp0.m128i_u32[0] = table[temp0.m128i_i32[0]]; \
10499- temp0.m128i_u32[1] = table[temp0.m128i_i32[1]]; \
10500- temp0.m128i_u32[2] = table[temp0.m128i_i32[2]]; \
10501- temp0.m128i_u32[3] = table[temp0.m128i_i32[3]]; \
10502- temp1.m128i_u32[0] = table[temp1.m128i_i32[0]]; \
10503- temp1.m128i_u32[1] = table[temp1.m128i_i32[1]]; \
10504- temp1.m128i_u32[2] = table[temp1.m128i_i32[2]]; \
10505- temp1.m128i_u32[3] = table[temp1.m128i_i32[3]]; \
10506- temp2.m128i_u32[0] = table[temp2.m128i_i32[0]]; \
10507- temp2.m128i_u32[1] = table[temp2.m128i_i32[1]]; \
10508- temp2.m128i_u32[2] = table[temp2.m128i_i32[2]]; \
10509- temp2.m128i_u32[3] = table[temp2.m128i_i32[3]]; \
10510- temp3.m128i_u32[0] = table[temp3.m128i_i32[0]]; \
10511- temp3.m128i_u32[1] = table[temp3.m128i_i32[1]]; \
10512- temp3.m128i_u32[2] = table[temp3.m128i_i32[2]]; \
10513- temp3.m128i_u32[3] = table[temp3.m128i_i32[3]]; \
10514- v0 = temp0.m128i_i128; \
10515- v1 = temp1.m128i_i128; \
10516- v2 = temp2.m128i_i128; \
10517- v3 = temp3.m128i_i128; \
10518- }
10519-
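`stbir__simdi_table_lookup2/3/4` emulate a gather. Each 32-bit lane of a vector is used as an index and replaced with the table entry at that index. The sketch below shows the 4-lane case portably, with a hypothetical union in place of `stbir__simdi_u32`. The callers pass the table as `fp32_to_srgb8_tab4 - (127 - 13) * 8`. This is presumably so the raw `bits >> 20` value can index directly: with the sign cleared by the clamp, that value is 8 exponent bits followed by 3 mantissa bits. Assuming `STBIR_almost_zero` is 2^-13, the first real table entry then corresponds to index (127 - 13) * 8.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 128-bit vector viewed as four 32-bit lanes. */
typedef union {
  uint32_t u32[4];
  int32_t i32[4];
} sketch_vec4;

/* Replace each lane's index with table[index], one lane at a time. */
static void sketch_table_lookup4(sketch_vec4 *v, const uint32_t *table)
{
  for (int lane = 0; lane < 4; ++lane)
    v->u32[lane] = table[v->i32[lane]];
}
```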
10520-static void
10521-STBIR__CODER_NAME(stbir__encode_uint8_srgb)(void *outputp,
10522- int width_times_channels,
10523- float const *encode)
10524-{
10525- unsigned char STBIR_SIMD_STREAMOUT_PTR(*) output = (unsigned char *)outputp;
10526- unsigned char *end_output =
10527- ((unsigned char *)output) + width_times_channels;
10528-
10529-#ifdef STBIR_SIMD
10530-
10531- if (width_times_channels >= 16) {
10532- float const *end_encode_m16 = encode + width_times_channels - 16;
10533- end_output -= 16;
10534- STBIR_SIMD_NO_UNROLL_LOOP_START_INF_FOR
10535- for (;;) {
10536- stbir__simdf f0, f1, f2, f3;
10537- stbir__simdi i0, i1, i2, i3;
10538- STBIR_SIMD_NO_UNROLL(encode);
10539-
10540- stbir__simdf_load4_transposed(f0, f1, f2, f3, encode);
10541-
10542- stbir__min_max_shift20(i0, f0);
10543- stbir__min_max_shift20(i1, f1);
10544- stbir__min_max_shift20(i2, f2);
10545- stbir__min_max_shift20(i3, f3);
10546-
10547- stbir__simdi_table_lookup4(i0, i1, i2, i3,
10548- (fp32_to_srgb8_tab4 - (127 - 13) * 8));
10549-
10550- stbir__linear_to_srgb_finish(i0, f0);
10551- stbir__linear_to_srgb_finish(i1, f1);
10552- stbir__linear_to_srgb_finish(i2, f2);
10553- stbir__linear_to_srgb_finish(i3, f3);
10554-
10555- stbir__interleave_pack_and_store_16_u8(
10556- output, STBIR_strs_join1(i, , stbir__encode_order0),
10557- STBIR_strs_join1(i, , stbir__encode_order1),
10558- STBIR_strs_join1(i, , stbir__encode_order2),
10559- STBIR_strs_join1(i, , stbir__encode_order3));
10560-
10561- encode += 16;
10562- output += 16;
10563- if (output <= end_output) {
10564- continue;
10565- }
10566- if (output == (end_output + 16)) {
10567- break;
10568- }
10569- output = end_output; // backup and do last couple
10570- encode = end_encode_m16;
10571- }
10572- return;
10573- }
10574-#endif
10575-
10576-// try to do blocks of 4 when you can
10577-#if stbir__coder_min_num != 3 // doesn't divide cleanly by four
10578- output += 4;
10579- STBIR_SIMD_NO_UNROLL_LOOP_START
10580- while (output <= end_output) {
10581- STBIR_SIMD_NO_UNROLL(encode);
10582-
10583- output[0 - 4] =
10584- stbir__linear_to_srgb_uchar(encode[stbir__encode_order0]);
10585- output[1 - 4] =
10586- stbir__linear_to_srgb_uchar(encode[stbir__encode_order1]);
10587- output[2 - 4] =
10588- stbir__linear_to_srgb_uchar(encode[stbir__encode_order2]);
10589- output[3 - 4] =
10590- stbir__linear_to_srgb_uchar(encode[stbir__encode_order3]);
10591-
10592- output += 4;
10593- encode += 4;
10594- }
10595- output -= 4;
10596-#endif
10597-
10598-// do the remnants
10599-#if stbir__coder_min_num < 4
10600- STBIR_NO_UNROLL_LOOP_START
10601- while (output < end_output) {
10602- STBIR_NO_UNROLL(encode);
10603- output[0] = stbir__linear_to_srgb_uchar(encode[stbir__encode_order0]);
10604-#if stbir__coder_min_num >= 2
10605- output[1] = stbir__linear_to_srgb_uchar(encode[stbir__encode_order1]);
10606-#endif
10607-#if stbir__coder_min_num >= 3
10608- output[2] = stbir__linear_to_srgb_uchar(encode[stbir__encode_order2]);
10609-#endif
10610- output += stbir__coder_min_num;
10611- encode += stbir__coder_min_num;
10612- }
10613-#endif
10614-}
10615-
10616-#if (stbir__coder_min_num == 4) || \
10617- ((stbir__coder_min_num == 1) && (!defined(stbir__decode_swizzle)))
10618-
10619-static float *
10620-STBIR__CODER_NAME(stbir__decode_uint8_srgb4_linearalpha)(
10621- float *decodep, int width_times_channels, void const *inputp)
10622-{
10623- float STBIR_STREAMOUT_PTR(*) decode = decodep;
10624- float *decode_end = (float *)decode + width_times_channels;
10625- unsigned char const *input = (unsigned char const *)inputp;
10626-
10627- do {
10628- decode[0] =
10629- stbir__srgb_uchar_to_linear_float[input[stbir__decode_order0]];
10630- decode[1] =
10631- stbir__srgb_uchar_to_linear_float[input[stbir__decode_order1]];
10632- decode[2] =
10633- stbir__srgb_uchar_to_linear_float[input[stbir__decode_order2]];
10634- decode[3] = ((float)input[stbir__decode_order3]) *
10635- stbir__max_uint8_as_float_inverted;
10636- input += 4;
10637- decode += 4;
10638- } while (decode < decode_end);
10639- return decode_end;
10640-}
10641-
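`stbir__decode_uint8_srgb4_linearalpha` treats the first three channels as sRGB-encoded and the fourth as linear alpha. Alpha is only scaled by `stbir__max_uint8_as_float_inverted`, presumably 1/255. The per-pixel sketch below assumes a caller-supplied 256-entry sRGB table and ignores the decode swizzle. The function name is hypothetical.

```c
/* Hypothetical sketch: decode one RGBA8 pixel. R, G and B go through an
   sRGB -> linear table; alpha is already linear and is only rescaled. */
static void sketch_decode_rgba8_srgb_linear_alpha(float out[4],
                                                  const unsigned char in[4],
                                                  const float srgb_table[256])
{
  out[0] = srgb_table[in[0]];
  out[1] = srgb_table[in[1]];
  out[2] = srgb_table[in[2]];
  out[3] = (float)in[3] * (1.0f / 255.0f);  /* alpha: plain linear scale */
}
```

Alpha is not gamma-encoded, which is why it bypasses the sRGB curve.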
10642-static void
10643-STBIR__CODER_NAME(stbir__encode_uint8_srgb4_linearalpha)(
10644- void *outputp, int width_times_channels, float const *encode)
10645-{
10646- unsigned char STBIR_SIMD_STREAMOUT_PTR(*) output = (unsigned char *)outputp;
10647- unsigned char *end_output =
10648- ((unsigned char *)output) + width_times_channels;
10649-
10650-#ifdef STBIR_SIMD
10651-
10652- if (width_times_channels >= 16) {
10653- float const *end_encode_m16 = encode + width_times_channels - 16;
10654- end_output -= 16;
10655- STBIR_SIMD_NO_UNROLL_LOOP_START_INF_FOR
10656- for (;;) {
10657- stbir__simdf f0, f1, f2, f3;
10658- stbir__simdi i0, i1, i2, i3;
10659-
10660- STBIR_SIMD_NO_UNROLL(encode);
10661- stbir__simdf_load4_transposed(f0, f1, f2, f3, encode);
10662-
10663- stbir__min_max_shift20(i0, f0);
10664- stbir__min_max_shift20(i1, f1);
10665- stbir__min_max_shift20(i2, f2);
10666- stbir__scale_and_convert(i3, f3);
10667-
10668- stbir__simdi_table_lookup3(i0, i1, i2,
10669- (fp32_to_srgb8_tab4 - (127 - 13) * 8));
10670-
10671- stbir__linear_to_srgb_finish(i0, f0);
10672- stbir__linear_to_srgb_finish(i1, f1);
10673- stbir__linear_to_srgb_finish(i2, f2);
10674-
10675- stbir__interleave_pack_and_store_16_u8(
10676- output, STBIR_strs_join1(i, , stbir__encode_order0),
10677- STBIR_strs_join1(i, , stbir__encode_order1),
10678- STBIR_strs_join1(i, , stbir__encode_order2),
10679- STBIR_strs_join1(i, , stbir__encode_order3));
10680-
10681- output += 16;
10682- encode += 16;
10683-
10684- if (output <= end_output) {
10685- continue;
10686- }
10687- if (output == (end_output + 16)) {
10688- break;
10689- }
10690- output = end_output; // backup and do last couple
10691- encode = end_encode_m16;
10692- }
10693- return;
10694- }
10695-#endif
10696-
10697- STBIR_SIMD_NO_UNROLL_LOOP_START
10698- do {
10699- float f;
10700- STBIR_SIMD_NO_UNROLL(encode);
10701-
10702- output[stbir__decode_order0] = stbir__linear_to_srgb_uchar(encode[0]);
10703- output[stbir__decode_order1] = stbir__linear_to_srgb_uchar(encode[1]);
10704- output[stbir__decode_order2] = stbir__linear_to_srgb_uchar(encode[2]);
10705-
10706- f = encode[3] * stbir__max_uint8_as_float + 0.5f;
10707- STBIR_CLAMP(f, 0, 255);
10708- output[stbir__decode_order3] = (unsigned char)f;
10709-
10710- output += 4;
10711- encode += 4;
10712- } while (output < end_output);
10713-}
10714-
10715-#endif
10716-
10717-#if (stbir__coder_min_num == 2) || \
10718- ((stbir__coder_min_num == 1) && (!defined(stbir__decode_swizzle)))
10719-
10720-static float *
10721-STBIR__CODER_NAME(stbir__decode_uint8_srgb2_linearalpha)(
10722- float *decodep, int width_times_channels, void const *inputp)
10723-{
10724- float STBIR_STREAMOUT_PTR(*) decode = decodep;
10725- float *decode_end = (float *)decode + width_times_channels;
10726- unsigned char const *input = (unsigned char const *)inputp;
10727-
10728- decode += 4;
10729- while (decode <= decode_end) {
10730- decode[0 - 4] =
10731- stbir__srgb_uchar_to_linear_float[input[stbir__decode_order0]];
10732- decode[1 - 4] = ((float)input[stbir__decode_order1]) *
10733- stbir__max_uint8_as_float_inverted;
10734- decode[2 - 4] =
10735- stbir__srgb_uchar_to_linear_float[input[stbir__decode_order0 + 2]];
10736- decode[3 - 4] = ((float)input[stbir__decode_order1 + 2]) *
10737- stbir__max_uint8_as_float_inverted;
10738- input += 4;
10739- decode += 4;
10740- }
10741- decode -= 4;
10742- if (decode < decode_end) {
10743- decode[0] = stbir__srgb_uchar_to_linear_float[input[stbir__decode_order0]];
10744- decode[1] = ((float)input[stbir__decode_order1]) *
10745- stbir__max_uint8_as_float_inverted;
10746- }
10747- return decode_end;
10748-}
10749-
10750-static void
10751-STBIR__CODER_NAME(stbir__encode_uint8_srgb2_linearalpha)(
10752- void *outputp, int width_times_channels, float const *encode)
10753-{
10754- unsigned char STBIR_SIMD_STREAMOUT_PTR(*) output = (unsigned char *)outputp;
10755- unsigned char *end_output =
10756- ((unsigned char *)output) + width_times_channels;
10757-
10758-#ifdef STBIR_SIMD
10759-
10760- if (width_times_channels >= 16) {
10761- float const *end_encode_m16 = encode + width_times_channels - 16;
10762- end_output -= 16;
10763- STBIR_SIMD_NO_UNROLL_LOOP_START_INF_FOR
10764- for (;;) {
10765- stbir__simdf f0, f1, f2, f3;
10766- stbir__simdi i0, i1, i2, i3;
10767-
10768- STBIR_SIMD_NO_UNROLL(encode);
10769- stbir__simdf_load4_transposed(f0, f1, f2, f3, encode);
10770-
10771- stbir__min_max_shift20(i0, f0);
10772- stbir__scale_and_convert(i1, f1);
10773- stbir__min_max_shift20(i2, f2);
10774- stbir__scale_and_convert(i3, f3);
10775-
10776- stbir__simdi_table_lookup2(i0, i2,
10777- (fp32_to_srgb8_tab4 - (127 - 13) * 8));
10778-
10779- stbir__linear_to_srgb_finish(i0, f0);
10780- stbir__linear_to_srgb_finish(i2, f2);
10781-
10782- stbir__interleave_pack_and_store_16_u8(
10783- output, STBIR_strs_join1(i, , stbir__encode_order0),
10784- STBIR_strs_join1(i, , stbir__encode_order1),
10785- STBIR_strs_join1(i, , stbir__encode_order2),
10786- STBIR_strs_join1(i, , stbir__encode_order3));
10787-
10788- output += 16;
10789- encode += 16;
10790- if (output <= end_output) {
10791- continue;
10792- }
10793- if (output == (end_output + 16)) {
10794- break;
10795- }
10796- output = end_output; // backup and do last couple
10797- encode = end_encode_m16;
10798- }
10799- return;
10800- }
10801-#endif
10802-
10803- STBIR_SIMD_NO_UNROLL_LOOP_START
10804- do {
10805- float f;
10806- STBIR_SIMD_NO_UNROLL(encode);
10807-
10808- output[stbir__decode_order0] = stbir__linear_to_srgb_uchar(encode[0]);
10809-
10810- f = encode[1] * stbir__max_uint8_as_float + 0.5f;
10811- STBIR_CLAMP(f, 0, 255);
10812- output[stbir__decode_order1] = (unsigned char)f;
10813-
10814- output += 2;
10815- encode += 2;
10816- } while (output < end_output);
10817-}
10818-
10819-#endif
10820-
10821-static float *
10822-STBIR__CODER_NAME(stbir__decode_uint16_linear_scaled)(float *decodep,
10823- int width_times_channels,
10824- void const *inputp)
10825-{
10826- float STBIR_STREAMOUT_PTR(*) decode = decodep;
10827- float *decode_end = (float *)decode + width_times_channels;
10828- unsigned short const *input = (unsigned short const *)inputp;
10829-
10830-#ifdef STBIR_SIMD
10831- unsigned short const *end_input_m8 = input + width_times_channels - 8;
10832- if (width_times_channels >= 8) {
10833- decode_end -= 8;
10834- STBIR_NO_UNROLL_LOOP_START_INF_FOR
10835- for (;;) {
10836-#ifdef STBIR_SIMD8
10837- stbir__simdi i;
10838- stbir__simdi8 o;
10839- stbir__simdf8 of;
10840- STBIR_NO_UNROLL(decode);
10841- stbir__simdi_load(i, input);
10842- stbir__simdi8_expand_u16_to_u32(o, i);
10843- stbir__simdi8_convert_i32_to_float(of, o);
10844- stbir__simdf8_mult(of, of, STBIR_max_uint16_as_float_inverted8);
10845- stbir__decode_simdf8_flip(of);
10846- stbir__simdf8_store(decode + 0, of);
10847-#else
10848- stbir__simdi i, o0, o1;
10849- stbir__simdf of0, of1;
10850- STBIR_NO_UNROLL(decode);
10851- stbir__simdi_load(i, input);
10852- stbir__simdi_expand_u16_to_u32(o0, o1, i);
10853- stbir__simdi_convert_i32_to_float(of0, o0);
10854- stbir__simdi_convert_i32_to_float(of1, o1);
10855- stbir__simdf_mult(
10856- of0, of0, STBIR__CONSTF(STBIR_max_uint16_as_float_inverted));
10857- stbir__simdf_mult(
10858- of1, of1, STBIR__CONSTF(STBIR_max_uint16_as_float_inverted));
10859- stbir__decode_simdf4_flip(of0);
10860- stbir__decode_simdf4_flip(of1);
10861- stbir__simdf_store(decode + 0, of0);
10862- stbir__simdf_store(decode + 4, of1);
10863-#endif
10864- decode += 8;
10865- input += 8;
10866- if (decode <= decode_end) {
10867- continue;
10868- }
10869- if (decode == (decode_end + 8)) {
10870- break;
10871- }
10872- decode = decode_end; // backup and do last couple
10873- input = end_input_m8;
10874- }
10875- return decode_end + 8;
10876- }
10877-#endif
10878-
10879-// try to do blocks of 4 when you can
10880-#if stbir__coder_min_num != 3 // doesn't divide cleanly by four
10881- decode += 4;
10882- STBIR_SIMD_NO_UNROLL_LOOP_START
10883- while (decode <= decode_end) {
10884- STBIR_SIMD_NO_UNROLL(decode);
10885- decode[0 - 4] = ((float)(input[stbir__decode_order0])) *
10886- stbir__max_uint16_as_float_inverted;
10887- decode[1 - 4] = ((float)(input[stbir__decode_order1])) *
10888- stbir__max_uint16_as_float_inverted;
10889- decode[2 - 4] = ((float)(input[stbir__decode_order2])) *
10890- stbir__max_uint16_as_float_inverted;
10891- decode[3 - 4] = ((float)(input[stbir__decode_order3])) *
10892- stbir__max_uint16_as_float_inverted;
10893- decode += 4;
10894- input += 4;
10895- }
10896- decode -= 4;
10897-#endif
10898-
10899-// do the remnants
10900-#if stbir__coder_min_num < 4
10901- STBIR_NO_UNROLL_LOOP_START
10902- while (decode < decode_end) {
10903- STBIR_NO_UNROLL(decode);
10904- decode[0] = ((float)(input[stbir__decode_order0])) *
10905- stbir__max_uint16_as_float_inverted;
10906-#if stbir__coder_min_num >= 2
10907- decode[1] = ((float)(input[stbir__decode_order1])) *
10908- stbir__max_uint16_as_float_inverted;
10909-#endif
10910-#if stbir__coder_min_num >= 3
10911- decode[2] = ((float)(input[stbir__decode_order2])) *
10912- stbir__max_uint16_as_float_inverted;
10913-#endif
10914- decode += stbir__coder_min_num;
10915- input += stbir__coder_min_num;
10916- }
10917-#endif
10918- return decode_end;
10919-}
10920-
10921-static void
10922-STBIR__CODER_NAME(stbir__encode_uint16_linear_scaled)(void *outputp,
10923- int width_times_channels,
10924- float const *encode)
10925-{
10926- unsigned short STBIR_SIMD_STREAMOUT_PTR(*) output =
10927- (unsigned short *)outputp;
10928- unsigned short *end_output =
10929- ((unsigned short *)output) + width_times_channels;
10930-
10931-#ifdef STBIR_SIMD
10932- {
10933- if (width_times_channels >= stbir__simdfX_float_count * 2) {
10934- float const *end_encode_m8 =
10935- encode + width_times_channels - stbir__simdfX_float_count * 2;
10936- end_output -= stbir__simdfX_float_count * 2;
10937- STBIR_SIMD_NO_UNROLL_LOOP_START_INF_FOR
10938- for (;;) {
10939- stbir__simdfX e0, e1;
10940- stbir__simdiX i;
10941- STBIR_SIMD_NO_UNROLL(encode);
10942- stbir__simdfX_madd_mem(e0, STBIR_simd_point5X,
10943- STBIR_max_uint16_as_floatX, encode);
10944- stbir__simdfX_madd_mem(e1, STBIR_simd_point5X,
10945- STBIR_max_uint16_as_floatX,
10946- encode + stbir__simdfX_float_count);
10947- stbir__encode_simdfX_unflip(e0);
10948- stbir__encode_simdfX_unflip(e1);
10949- stbir__simdfX_pack_to_words(i, e0, e1);
10950- stbir__simdiX_store(output, i);
10951- encode += stbir__simdfX_float_count * 2;
10952- output += stbir__simdfX_float_count * 2;
10953- if (output <= end_output) {
10954- continue;
10955- }
10956- if (output == (end_output + stbir__simdfX_float_count * 2)) {
10957- break;
10958- }
10959- output = end_output; // backup and do last couple
10960- encode = end_encode_m8;
10961- }
10962- return;
10963- }
10964- }
10965-
10966-// try to do blocks of 4 when you can
10967-#if stbir__coder_min_num != 3 // doesn't divide cleanly by four
10968- output += 4;
10969- STBIR_NO_UNROLL_LOOP_START
10970- while (output <= end_output) {
10971- stbir__simdf e;
10972- stbir__simdi i;
10973- STBIR_NO_UNROLL(encode);
10974- stbir__simdf_load(e, encode);
10975- stbir__simdf_madd(e, STBIR__CONSTF(STBIR_simd_point5),
10976- STBIR__CONSTF(STBIR_max_uint16_as_float), e);
10977- stbir__encode_simdf4_unflip(e);
10978- stbir__simdf_pack_to_8words(i, e, e); // only use first 4
10979- stbir__simdi_store2(output - 4, i);
10980- output += 4;
10981- encode += 4;
10982- }
10983- output -= 4;
10984-#endif
10985-
10986-// do the remnants
10987-#if stbir__coder_min_num < 4
10988- STBIR_NO_UNROLL_LOOP_START
10989- while (output < end_output) {
10990- stbir__simdf e;
10991- STBIR_NO_UNROLL(encode);
10992- stbir__simdf_madd1_mem(e, STBIR__CONSTF(STBIR_simd_point5),
10993- STBIR__CONSTF(STBIR_max_uint16_as_float),
10994- encode + stbir__encode_order0);
10995- output[0] = stbir__simdf_convert_float_to_short(e);
10996-#if stbir__coder_min_num >= 2
10997- stbir__simdf_madd1_mem(e, STBIR__CONSTF(STBIR_simd_point5),
10998- STBIR__CONSTF(STBIR_max_uint16_as_float),
10999- encode + stbir__encode_order1);
11000- output[1] = stbir__simdf_convert_float_to_short(e);
11001-#endif
11002-#if stbir__coder_min_num >= 3
11003- stbir__simdf_madd1_mem(e, STBIR__CONSTF(STBIR_simd_point5),
11004- STBIR__CONSTF(STBIR_max_uint16_as_float),
11005- encode + stbir__encode_order2);
11006- output[2] = stbir__simdf_convert_float_to_short(e);
11007-#endif
11008- output += stbir__coder_min_num;
11009- encode += stbir__coder_min_num;
11010- }
11011-#endif
11012-
11013-#else
11014-
11015-// try to do blocks of 4 when you can
11016-#if stbir__coder_min_num != 3 // doesn't divide cleanly by four
11017- output += 4;
11018- STBIR_SIMD_NO_UNROLL_LOOP_START
11019- while (output <= end_output) {
11020- float f;
11021- STBIR_SIMD_NO_UNROLL(encode);
11022- f = encode[stbir__encode_order0] * stbir__max_uint16_as_float + 0.5f;
11023- STBIR_CLAMP(f, 0, 65535);
11024- output[0 - 4] = (unsigned short)f;
11025- f = encode[stbir__encode_order1] * stbir__max_uint16_as_float + 0.5f;
11026- STBIR_CLAMP(f, 0, 65535);
11027- output[1 - 4] = (unsigned short)f;
11028- f = encode[stbir__encode_order2] * stbir__max_uint16_as_float + 0.5f;
11029- STBIR_CLAMP(f, 0, 65535);
11030- output[2 - 4] = (unsigned short)f;
11031- f = encode[stbir__encode_order3] * stbir__max_uint16_as_float + 0.5f;
11032- STBIR_CLAMP(f, 0, 65535);
11033- output[3 - 4] = (unsigned short)f;
11034- output += 4;
11035- encode += 4;
11036- }
11037- output -= 4;
11038-#endif
11039-
11040-// do the remnants
11041-#if stbir__coder_min_num < 4
11042- STBIR_NO_UNROLL_LOOP_START
11043- while (output < end_output) {
11044- float f;
11045- STBIR_NO_UNROLL(encode);
11046- f = encode[stbir__encode_order0] * stbir__max_uint16_as_float + 0.5f;
11047- STBIR_CLAMP(f, 0, 65535);
11048- output[0] = (unsigned short)f;
11049-#if stbir__coder_min_num >= 2
11050- f = encode[stbir__encode_order1] * stbir__max_uint16_as_float + 0.5f;
11051- STBIR_CLAMP(f, 0, 65535);
11052- output[1] = (unsigned short)f;
11053-#endif
11054-#if stbir__coder_min_num >= 3
11055- f = encode[stbir__encode_order2] * stbir__max_uint16_as_float + 0.5f;
11056- STBIR_CLAMP(f, 0, 65535);
11057- output[2] = (unsigned short)f;
11058-#endif
11059- output += stbir__coder_min_num;
11060- encode += stbir__coder_min_num;
11061- }
11062-#endif
11063-#endif
11064-}
11065-
11066-static float *
11067-STBIR__CODER_NAME(stbir__decode_uint16_linear)(float *decodep,
11068- int width_times_channels,
11069- void const *inputp)
11070-{
11071- float STBIR_STREAMOUT_PTR(*) decode = decodep;
11072- float *decode_end = (float *)decode + width_times_channels;
11073- unsigned short const *input = (unsigned short const *)inputp;
11074-
11075-#ifdef STBIR_SIMD
11076- unsigned short const *end_input_m8 = input + width_times_channels - 8;
11077- if (width_times_channels >= 8) {
11078- decode_end -= 8;
11079- STBIR_NO_UNROLL_LOOP_START_INF_FOR
11080- for (;;) {
11081-#ifdef STBIR_SIMD8
11082- stbir__simdi i;
11083- stbir__simdi8 o;
11084- stbir__simdf8 of;
11085- STBIR_NO_UNROLL(decode);
11086- stbir__simdi_load(i, input);
11087- stbir__simdi8_expand_u16_to_u32(o, i);
11088- stbir__simdi8_convert_i32_to_float(of, o);
11089- stbir__decode_simdf8_flip(of);
11090- stbir__simdf8_store(decode + 0, of);
11091-#else
11092- stbir__simdi i, o0, o1;
11093- stbir__simdf of0, of1;
11094- STBIR_NO_UNROLL(decode);
11095- stbir__simdi_load(i, input);
11096- stbir__simdi_expand_u16_to_u32(o0, o1, i);
11097- stbir__simdi_convert_i32_to_float(of0, o0);
11098- stbir__simdi_convert_i32_to_float(of1, o1);
11099- stbir__decode_simdf4_flip(of0);
11100- stbir__decode_simdf4_flip(of1);
11101- stbir__simdf_store(decode + 0, of0);
11102- stbir__simdf_store(decode + 4, of1);
11103-#endif
11104- decode += 8;
11105- input += 8;
11106- if (decode <= decode_end) {
11107- continue;
11108- }
11109- if (decode == (decode_end + 8)) {
11110- break;
11111- }
11112- decode = decode_end; // backup and do last couple
11113- input = end_input_m8;
11114- }
11115- return decode_end + 8;
11116- }
11117-#endif
11118-
11119-// try to do blocks of 4 when you can
11120-#if stbir__coder_min_num != 3 // doesn't divide cleanly by four
11121- decode += 4;
11122- STBIR_SIMD_NO_UNROLL_LOOP_START
11123- while (decode <= decode_end) {
11124- STBIR_SIMD_NO_UNROLL(decode);
11125- decode[0 - 4] = ((float)(input[stbir__decode_order0]));
11126- decode[1 - 4] = ((float)(input[stbir__decode_order1]));
11127- decode[2 - 4] = ((float)(input[stbir__decode_order2]));
11128- decode[3 - 4] = ((float)(input[stbir__decode_order3]));
11129- decode += 4;
11130- input += 4;
11131- }
11132- decode -= 4;
11133-#endif
11134-
11135-// do the remnants
11136-#if stbir__coder_min_num < 4
11137- STBIR_NO_UNROLL_LOOP_START
11138- while (decode < decode_end) {
11139- STBIR_NO_UNROLL(decode);
11140- decode[0] = ((float)(input[stbir__decode_order0]));
11141-#if stbir__coder_min_num >= 2
11142- decode[1] = ((float)(input[stbir__decode_order1]));
11143-#endif
11144-#if stbir__coder_min_num >= 3
11145- decode[2] = ((float)(input[stbir__decode_order2]));
11146-#endif
11147- decode += stbir__coder_min_num;
11148- input += stbir__coder_min_num;
11149- }
11150-#endif
11151- return decode_end;
11152-}
11153-
11154-static void
11155-STBIR__CODER_NAME(stbir__encode_uint16_linear)(void *outputp,
11156- int width_times_channels,
11157- float const *encode)
11158-{
11159- unsigned short STBIR_SIMD_STREAMOUT_PTR(*) output =
11160- (unsigned short *)outputp;
11161- unsigned short *end_output =
11162- ((unsigned short *)output) + width_times_channels;
11163-
11164-#ifdef STBIR_SIMD
11165- {
11166- if (width_times_channels >= stbir__simdfX_float_count * 2) {
11167- float const *end_encode_m8 =
11168- encode + width_times_channels - stbir__simdfX_float_count * 2;
11169- end_output -= stbir__simdfX_float_count * 2;
11170- STBIR_SIMD_NO_UNROLL_LOOP_START_INF_FOR
11171- for (;;) {
11172- stbir__simdfX e0, e1;
11173- stbir__simdiX i;
11174- STBIR_SIMD_NO_UNROLL(encode);
11175- stbir__simdfX_add_mem(e0, STBIR_simd_point5X, encode);
11176- stbir__simdfX_add_mem(e1, STBIR_simd_point5X,
11177- encode + stbir__simdfX_float_count);
11178- stbir__encode_simdfX_unflip(e0);
11179- stbir__encode_simdfX_unflip(e1);
11180- stbir__simdfX_pack_to_words(i, e0, e1);
11181- stbir__simdiX_store(output, i);
11182- encode += stbir__simdfX_float_count * 2;
11183- output += stbir__simdfX_float_count * 2;
11184- if (output <= end_output) {
11185- continue;
11186- }
11187- if (output == (end_output + stbir__simdfX_float_count * 2)) {
11188- break;
11189- }
11190- output = end_output; // backup and do last couple
11191- encode = end_encode_m8;
11192- }
11193- return;
11194- }
11195- }
11196-
11197-// try to do blocks of 4 when you can
11198-#if stbir__coder_min_num != 3 // doesn't divide cleanly by four
11199- output += 4;
11200- STBIR_NO_UNROLL_LOOP_START
11201- while (output <= end_output) {
11202- stbir__simdf e;
11203- stbir__simdi i;
11204- STBIR_NO_UNROLL(encode);
11205- stbir__simdf_load(e, encode);
11206- stbir__simdf_add(e, STBIR__CONSTF(STBIR_simd_point5), e);
11207- stbir__encode_simdf4_unflip(e);
11208- stbir__simdf_pack_to_8words(i, e, e); // only use first 4
11209- stbir__simdi_store2(output - 4, i);
11210- output += 4;
11211- encode += 4;
11212- }
11213- output -= 4;
11214-#endif
11215-
11216-#else
11217-
11218-// try to do blocks of 4 when you can
11219-#if stbir__coder_min_num != 3 // doesn't divide cleanly by four
11220- output += 4;
11221- STBIR_SIMD_NO_UNROLL_LOOP_START
11222- while (output <= end_output) {
11223- float f;
11224- STBIR_SIMD_NO_UNROLL(encode);
11225- f = encode[stbir__encode_order0] + 0.5f;
11226- STBIR_CLAMP(f, 0, 65535);
11227- output[0 - 4] = (unsigned short)f;
11228- f = encode[stbir__encode_order1] + 0.5f;
11229- STBIR_CLAMP(f, 0, 65535);
11230- output[1 - 4] = (unsigned short)f;
11231- f = encode[stbir__encode_order2] + 0.5f;
11232- STBIR_CLAMP(f, 0, 65535);
11233- output[2 - 4] = (unsigned short)f;
11234- f = encode[stbir__encode_order3] + 0.5f;
11235- STBIR_CLAMP(f, 0, 65535);
11236- output[3 - 4] = (unsigned short)f;
11237- output += 4;
11238- encode += 4;
11239- }
11240- output -= 4;
11241-#endif
11242-
11243-#endif
11244-
11245-// do the remnants
11246-#if stbir__coder_min_num < 4
11247- STBIR_NO_UNROLL_LOOP_START
11248- while (output < end_output) {
11249- float f;
11250- STBIR_NO_UNROLL(encode);
11251- f = encode[stbir__encode_order0] + 0.5f;
11252- STBIR_CLAMP(f, 0, 65535);
11253- output[0] = (unsigned short)f;
11254-#if stbir__coder_min_num >= 2
11255- f = encode[stbir__encode_order1] + 0.5f;
11256- STBIR_CLAMP(f, 0, 65535);
11257- output[1] = (unsigned short)f;
11258-#endif
11259-#if stbir__coder_min_num >= 3
11260- f = encode[stbir__encode_order2] + 0.5f;
11261- STBIR_CLAMP(f, 0, 65535);
11262- output[2] = (unsigned short)f;
11263-#endif
11264- output += stbir__coder_min_num;
11265- encode += stbir__coder_min_num;
11266- }
11267-#endif
11268-}
11269-
11270-static float *
11271-STBIR__CODER_NAME(stbir__decode_half_float_linear)(float *decodep,
11272- int width_times_channels,
11273- void const *inputp)
11274-{
11275- float STBIR_STREAMOUT_PTR(*) decode = decodep;
11276- float *decode_end = (float *)decode + width_times_channels;
11277- stbir__FP16 const *input = (stbir__FP16 const *)inputp;
11278-
11279-#ifdef STBIR_SIMD
11280- if (width_times_channels >= 8) {
11281- stbir__FP16 const *end_input_m8 = input + width_times_channels - 8;
11282- decode_end -= 8;
11283- STBIR_NO_UNROLL_LOOP_START_INF_FOR
11284- for (;;) {
11285- STBIR_NO_UNROLL(decode);
11286-
11287- stbir__half_to_float_SIMD(decode, input);
11288-#ifdef stbir__decode_swizzle
11289-#ifdef STBIR_SIMD8
11290- {
11291- stbir__simdf8 of;
11292- stbir__simdf8_load(of, decode);
11293- stbir__decode_simdf8_flip(of);
11294- stbir__simdf8_store(decode, of);
11295- }
11296-#else
11297- {
11298- stbir__simdf of0, of1;
11299- stbir__simdf_load(of0, decode);
11300- stbir__simdf_load(of1, decode + 4);
11301- stbir__decode_simdf4_flip(of0);
11302- stbir__decode_simdf4_flip(of1);
11303- stbir__simdf_store(decode, of0);
11304- stbir__simdf_store(decode + 4, of1);
11305- }
11306-#endif
11307-#endif
11308- decode += 8;
11309- input += 8;
11310- if (decode <= decode_end) {
11311- continue;
11312- }
11313- if (decode == (decode_end + 8)) {
11314- break;
11315- }
11316- decode = decode_end; // backup and do last couple
11317- input = end_input_m8;
11318- }
11319- return decode_end + 8;
11320- }
11321-#endif
11322-
11323-// try to do blocks of 4 when you can
11324-#if stbir__coder_min_num != 3 // doesn't divide cleanly by four
11325- decode += 4;
11326- STBIR_SIMD_NO_UNROLL_LOOP_START
11327- while (decode <= decode_end) {
11328- STBIR_SIMD_NO_UNROLL(decode);
11329- decode[0 - 4] = stbir__half_to_float(input[stbir__decode_order0]);
11330- decode[1 - 4] = stbir__half_to_float(input[stbir__decode_order1]);
11331- decode[2 - 4] = stbir__half_to_float(input[stbir__decode_order2]);
11332- decode[3 - 4] = stbir__half_to_float(input[stbir__decode_order3]);
11333- decode += 4;
11334- input += 4;
11335- }
11336- decode -= 4;
11337-#endif
11338-
11339-// do the remnants
11340-#if stbir__coder_min_num < 4
11341- STBIR_NO_UNROLL_LOOP_START
11342- while (decode < decode_end) {
11343- STBIR_NO_UNROLL(decode);
11344- decode[0] = stbir__half_to_float(input[stbir__decode_order0]);
11345-#if stbir__coder_min_num >= 2
11346- decode[1] = stbir__half_to_float(input[stbir__decode_order1]);
11347-#endif
11348-#if stbir__coder_min_num >= 3
11349- decode[2] = stbir__half_to_float(input[stbir__decode_order2]);
11350-#endif
11351- decode += stbir__coder_min_num;
11352- input += stbir__coder_min_num;
11353- }
11354-#endif
11355- return decode_end;
11356-}
11357-
11358-static void
11359-STBIR__CODER_NAME(stbir__encode_half_float_linear)(void *outputp,
11360- int width_times_channels,
11361- float const *encode)
11362-{
11363- stbir__FP16 STBIR_SIMD_STREAMOUT_PTR(*) output = (stbir__FP16 *)outputp;
11364- stbir__FP16 *end_output = ((stbir__FP16 *)output) + width_times_channels;
11365-
11366-#ifdef STBIR_SIMD
11367- if (width_times_channels >= 8) {
11368- float const *end_encode_m8 = encode + width_times_channels - 8;
11369- end_output -= 8;
11370- STBIR_SIMD_NO_UNROLL_LOOP_START_INF_FOR
11371- for (;;) {
11372- STBIR_SIMD_NO_UNROLL(encode);
11373-#ifdef stbir__decode_swizzle
11374-#ifdef STBIR_SIMD8
11375- {
11376- stbir__simdf8 of;
11377- stbir__simdf8_load(of, encode);
11378- stbir__encode_simdf8_unflip(of);
11379- stbir__float_to_half_SIMD(output, (float *)&of);
11380- }
11381-#else
11382- {
11383- stbir__simdf of[2];
11384- stbir__simdf_load(of[0], encode);
11385- stbir__simdf_load(of[1], encode + 4);
11386- stbir__encode_simdf4_unflip(of[0]);
11387- stbir__encode_simdf4_unflip(of[1]);
11388- stbir__float_to_half_SIMD(output, (float *)of);
11389- }
11390-#endif
11391-#else
11392- stbir__float_to_half_SIMD(output, encode);
11393-#endif
11394- encode += 8;
11395- output += 8;
11396- if (output <= end_output) {
11397- continue;
11398- }
11399- if (output == (end_output + 8)) {
11400- break;
11401- }
11402- output = end_output; // backup and do last couple
11403- encode = end_encode_m8;
11404- }
11405- return;
11406- }
11407-#endif
11408-
11409-// try to do blocks of 4 when you can
11410-#if stbir__coder_min_num != 3 // doesn't divide cleanly by four
11411- output += 4;
11412- STBIR_SIMD_NO_UNROLL_LOOP_START
11413- while (output <= end_output) {
11414- STBIR_SIMD_NO_UNROLL(output);
11415- output[0 - 4] = stbir__float_to_half(encode[stbir__encode_order0]);
11416- output[1 - 4] = stbir__float_to_half(encode[stbir__encode_order1]);
11417- output[2 - 4] = stbir__float_to_half(encode[stbir__encode_order2]);
11418- output[3 - 4] = stbir__float_to_half(encode[stbir__encode_order3]);
11419- output += 4;
11420- encode += 4;
11421- }
11422- output -= 4;
11423-#endif
11424-
11425-// do the remnants
11426-#if stbir__coder_min_num < 4
11427- STBIR_NO_UNROLL_LOOP_START
11428- while (output < end_output) {
11429- STBIR_NO_UNROLL(output);
11430- output[0] = stbir__float_to_half(encode[stbir__encode_order0]);
11431-#if stbir__coder_min_num >= 2
11432- output[1] = stbir__float_to_half(encode[stbir__encode_order1]);
11433-#endif
11434-#if stbir__coder_min_num >= 3
11435- output[2] = stbir__float_to_half(encode[stbir__encode_order2]);
11436-#endif
11437- output += stbir__coder_min_num;
11438- encode += stbir__coder_min_num;
11439- }
11440-#endif
11441-}
11442-
11443-static float *
11444-STBIR__CODER_NAME(stbir__decode_float_linear)(float *decodep,
11445- int width_times_channels,
11446- void const *inputp)
11447-{
11448-#ifdef stbir__decode_swizzle
11449- float STBIR_STREAMOUT_PTR(*) decode = decodep;
11450- float *decode_end = (float *)decode + width_times_channels;
11451- float const *input = (float const *)inputp;
11452-
11453-#ifdef STBIR_SIMD
11454- if (width_times_channels >= 16) {
11455- float const *end_input_m16 = input + width_times_channels - 16;
11456- decode_end -= 16;
11457- STBIR_NO_UNROLL_LOOP_START_INF_FOR
11458- for (;;) {
11459- STBIR_NO_UNROLL(decode);
11460-#ifdef stbir__decode_swizzle
11461-#ifdef STBIR_SIMD8
11462- {
11463- stbir__simdf8 of0, of1;
11464- stbir__simdf8_load(of0, input);
11465- stbir__simdf8_load(of1, input + 8);
11466- stbir__decode_simdf8_flip(of0);
11467- stbir__decode_simdf8_flip(of1);
11468- stbir__simdf8_store(decode, of0);
11469- stbir__simdf8_store(decode + 8, of1);
11470- }
11471-#else
11472- {
11473- stbir__simdf of0, of1, of2, of3;
11474- stbir__simdf_load(of0, input);
11475- stbir__simdf_load(of1, input + 4);
11476- stbir__simdf_load(of2, input + 8);
11477- stbir__simdf_load(of3, input + 12);
11478- stbir__decode_simdf4_flip(of0);
11479- stbir__decode_simdf4_flip(of1);
11480- stbir__decode_simdf4_flip(of2);
11481- stbir__decode_simdf4_flip(of3);
11482- stbir__simdf_store(decode, of0);
11483- stbir__simdf_store(decode + 4, of1);
11484- stbir__simdf_store(decode + 8, of2);
11485- stbir__simdf_store(decode + 12, of3);
11486- }
11487-#endif
11488-#endif
11489- decode += 16;
11490- input += 16;
11491- if (decode <= decode_end) {
11492- continue;
11493- }
11494- if (decode == (decode_end + 16)) {
11495- break;
11496- }
11497- decode = decode_end; // backup and do last couple
11498- input = end_input_m16;
11499- }
11500- return decode_end + 16;
11501- }
11502-#endif
11503-
11504-// try to do blocks of 4 when you can
11505-#if stbir__coder_min_num != 3 // doesn't divide cleanly by four
11506- decode += 4;
11507- STBIR_SIMD_NO_UNROLL_LOOP_START
11508- while (decode <= decode_end) {
11509- STBIR_SIMD_NO_UNROLL(decode);
11510- decode[0 - 4] = input[stbir__decode_order0];
11511- decode[1 - 4] = input[stbir__decode_order1];
11512- decode[2 - 4] = input[stbir__decode_order2];
11513- decode[3 - 4] = input[stbir__decode_order3];
11514- decode += 4;
11515- input += 4;
11516- }
11517- decode -= 4;
11518-#endif
11519-
11520-// do the remnants
11521-#if stbir__coder_min_num < 4
11522- STBIR_NO_UNROLL_LOOP_START
11523- while (decode < decode_end) {
11524- STBIR_NO_UNROLL(decode);
11525- decode[0] = input[stbir__decode_order0];
11526-#if stbir__coder_min_num >= 2
11527- decode[1] = input[stbir__decode_order1];
11528-#endif
11529-#if stbir__coder_min_num >= 3
11530- decode[2] = input[stbir__decode_order2];
11531-#endif
11532- decode += stbir__coder_min_num;
11533- input += stbir__coder_min_num;
11534- }
11535-#endif
11536- return decode_end;
11537-
11538-#else
11539-
11540- if ((void *)decodep != inputp) {
11541- STBIR_MEMCPY(decodep, inputp, width_times_channels * sizeof(float));
11542- }
11543-
11544- return decodep + width_times_channels;
11545-
11546-#endif
11547-}
11548-
11549-static void
11550-STBIR__CODER_NAME(stbir__encode_float_linear)(void *outputp,
11551- int width_times_channels,
11552- float const *encode)
11553-{
11554-#if !defined(STBIR_FLOAT_HIGH_CLAMP) && !defined(STBIR_FLOAT_LOW_CLAMP) && \
11555- !defined(stbir__decode_swizzle)
11556-
11557- if ((void *)outputp != (void *)encode) {
11558- STBIR_MEMCPY(outputp, encode, width_times_channels * sizeof(float));
11559- }
11560-
11561-#else
11562-
11563- float STBIR_SIMD_STREAMOUT_PTR(*) output = (float *)outputp;
11564- float *end_output = ((float *)output) + width_times_channels;
11565-
11566-#ifdef STBIR_FLOAT_HIGH_CLAMP
11567-#define stbir_scalar_hi_clamp(v) \
11568- if (v > STBIR_FLOAT_HIGH_CLAMP) \
11569- v = STBIR_FLOAT_HIGH_CLAMP;
11570-#else
11571-#define stbir_scalar_hi_clamp(v)
11572-#endif
11573-#ifdef STBIR_FLOAT_LOW_CLAMP
11574-#define stbir_scalar_lo_clamp(v) \
11575- if (v < STBIR_FLOAT_LOW_CLAMP) \
11576- v = STBIR_FLOAT_LOW_CLAMP;
11577-#else
11578-#define stbir_scalar_lo_clamp(v)
11579-#endif
11580-
11581-#ifdef STBIR_SIMD
11582-
11583-#ifdef STBIR_FLOAT_HIGH_CLAMP
11584- const stbir__simdfX high_clamp = stbir__simdf_frepX(STBIR_FLOAT_HIGH_CLAMP);
11585-#endif
11586-#ifdef STBIR_FLOAT_LOW_CLAMP
11587- const stbir__simdfX low_clamp = stbir__simdf_frepX(STBIR_FLOAT_LOW_CLAMP);
11588-#endif
11589-
11590- if (width_times_channels >= (stbir__simdfX_float_count * 2)) {
11591- float const *end_encode_m8 =
11592- encode + width_times_channels - (stbir__simdfX_float_count * 2);
11593- end_output -= (stbir__simdfX_float_count * 2);
11594- STBIR_SIMD_NO_UNROLL_LOOP_START_INF_FOR
11595- for (;;) {
11596- stbir__simdfX e0, e1;
11597- STBIR_SIMD_NO_UNROLL(encode);
11598- stbir__simdfX_load(e0, encode);
11599- stbir__simdfX_load(e1, encode + stbir__simdfX_float_count);
11600-#ifdef STBIR_FLOAT_HIGH_CLAMP
11601- stbir__simdfX_min(e0, e0, high_clamp);
11602- stbir__simdfX_min(e1, e1, high_clamp);
11603-#endif
11604-#ifdef STBIR_FLOAT_LOW_CLAMP
11605- stbir__simdfX_max(e0, e0, low_clamp);
11606- stbir__simdfX_max(e1, e1, low_clamp);
11607-#endif
11608- stbir__encode_simdfX_unflip(e0);
11609- stbir__encode_simdfX_unflip(e1);
11610- stbir__simdfX_store(output, e0);
11611- stbir__simdfX_store(output + stbir__simdfX_float_count, e1);
11612- encode += stbir__simdfX_float_count * 2;
11613- output += stbir__simdfX_float_count * 2;
11614- if (output < end_output) {
11615- continue;
11616- }
11617- if (output == (end_output + (stbir__simdfX_float_count * 2))) {
11618- break;
11619- }
11620- output = end_output; // backup and do last couple
11621- encode = end_encode_m8;
11622- }
11623- return;
11624- }
11625-
11626-// try to do blocks of 4 when you can
11627-#if stbir__coder_min_num != 3 // doesn't divide cleanly by four
11628- output += 4;
11629- STBIR_NO_UNROLL_LOOP_START
11630- while (output <= end_output) {
11631- stbir__simdf e0;
11632- STBIR_NO_UNROLL(encode);
11633- stbir__simdf_load(e0, encode);
11634-#ifdef STBIR_FLOAT_HIGH_CLAMP
11635- stbir__simdf_min(e0, e0, high_clamp);
11636-#endif
11637-#ifdef STBIR_FLOAT_LOW_CLAMP
11638- stbir__simdf_max(e0, e0, low_clamp);
11639-#endif
11640- stbir__encode_simdf4_unflip(e0);
11641- stbir__simdf_store(output - 4, e0);
11642- output += 4;
11643- encode += 4;
11644- }
11645- output -= 4;
11646-#endif
11647-
11648-#else
11649-
11650-// try to do blocks of 4 when you can
11651-#if stbir__coder_min_num != 3 // doesn't divide cleanly by four
11652- output += 4;
11653- STBIR_SIMD_NO_UNROLL_LOOP_START
11654- while (output <= end_output) {
11655- float e;
11656- STBIR_SIMD_NO_UNROLL(encode);
11657- e = encode[stbir__encode_order0];
11658- stbir_scalar_hi_clamp(e);
11659- stbir_scalar_lo_clamp(e);
11660- output[0 - 4] = e;
11661- e = encode[stbir__encode_order1];
11662- stbir_scalar_hi_clamp(e);
11663- stbir_scalar_lo_clamp(e);
11664- output[1 - 4] = e;
11665- e = encode[stbir__encode_order2];
11666- stbir_scalar_hi_clamp(e);
11667- stbir_scalar_lo_clamp(e);
11668- output[2 - 4] = e;
11669- e = encode[stbir__encode_order3];
11670- stbir_scalar_hi_clamp(e);
11671- stbir_scalar_lo_clamp(e);
11672- output[3 - 4] = e;
11673- output += 4;
11674- encode += 4;
11675- }
11676- output -= 4;
11677-
11678-#endif
11679-
11680-#endif
11681-
11682-// do the remnants
11683-#if stbir__coder_min_num < 4
11684- STBIR_NO_UNROLL_LOOP_START
11685- while (output < end_output) {
11686- float e;
11687- STBIR_NO_UNROLL(encode);
11688- e = encode[stbir__encode_order0];
11689- stbir_scalar_hi_clamp(e);
11690- stbir_scalar_lo_clamp(e);
11691- output[0] = e;
11692-#if stbir__coder_min_num >= 2
11693- e = encode[stbir__encode_order1];
11694- stbir_scalar_hi_clamp(e);
11695- stbir_scalar_lo_clamp(e);
11696- output[1] = e;
11697-#endif
11698-#if stbir__coder_min_num >= 3
11699- e = encode[stbir__encode_order2];
11700- stbir_scalar_hi_clamp(e);
11701- stbir_scalar_lo_clamp(e);
11702- output[2] = e;
11703-#endif
11704- output += stbir__coder_min_num;
11705- encode += stbir__coder_min_num;
11706- }
11707-#endif
11708-
11709-#endif
11710-}
11711-
11712-#undef stbir__decode_suffix
11713-#undef stbir__decode_simdf8_flip
11714-#undef stbir__decode_simdf4_flip
11715-#undef stbir__decode_order0
11716-#undef stbir__decode_order1
11717-#undef stbir__decode_order2
11718-#undef stbir__decode_order3
11719-#undef stbir__encode_order0
11720-#undef stbir__encode_order1
11721-#undef stbir__encode_order2
11722-#undef stbir__encode_order3
11723-#undef stbir__encode_simdf8_unflip
11724-#undef stbir__encode_simdf4_unflip
11725-#undef stbir__encode_simdfX_unflip
11726-#undef STBIR__CODER_NAME
11727-#undef stbir__coder_min_num
11728-#undef stbir__decode_swizzle
11729-#undef stbir_scalar_hi_clamp
11730-#undef stbir_scalar_lo_clamp
11731-#undef STB_IMAGE_RESIZE_DO_CODERS
11732-
11733-#elif defined(STB_IMAGE_RESIZE_DO_VERTICALS)
11734-
11735-#ifdef STB_IMAGE_RESIZE_VERTICAL_CONTINUE
11736-#define STBIR_chans(start, end) \
11737- STBIR_strs_join14(start, STBIR__vertical_channels, end, _cont)
11738-#else
11739-#define STBIR_chans(start, end) \
11740- STBIR_strs_join1(start, STBIR__vertical_channels, end)
11741-#endif
11742-
11743-#if STBIR__vertical_channels >= 1
11744-#define stbIF0(code) code
11745-#else
11746-#define stbIF0(code)
11747-#endif
11748-#if STBIR__vertical_channels >= 2
11749-#define stbIF1(code) code
11750-#else
11751-#define stbIF1(code)
11752-#endif
11753-#if STBIR__vertical_channels >= 3
11754-#define stbIF2(code) code
11755-#else
11756-#define stbIF2(code)
11757-#endif
11758-#if STBIR__vertical_channels >= 4
11759-#define stbIF3(code) code
11760-#else
11761-#define stbIF3(code)
11762-#endif
11763-#if STBIR__vertical_channels >= 5
11764-#define stbIF4(code) code
11765-#else
11766-#define stbIF4(code)
11767-#endif
11768-#if STBIR__vertical_channels >= 6
11769-#define stbIF5(code) code
11770-#else
11771-#define stbIF5(code)
11772-#endif
11773-#if STBIR__vertical_channels >= 7
11774-#define stbIF6(code) code
11775-#else
11776-#define stbIF6(code)
11777-#endif
11778-#if STBIR__vertical_channels >= 8
11779-#define stbIF7(code) code
11780-#else
11781-#define stbIF7(code)
11782-#endif
11783-
11784-static void
11785-STBIR_chans(stbir__vertical_scatter_with_,
11786- _coeffs)(float **outputs,
11787- float const *vertical_coefficients,
11788- float const *input,
11789- float const *input_end)
11790-{
11791- stbIF0(float STBIR_SIMD_STREAMOUT_PTR(*) output0 = outputs[0];
11792- float c0s = vertical_coefficients[0];)
11793- stbIF1(float STBIR_SIMD_STREAMOUT_PTR(*) output1 = outputs[1];
11794- float c1s = vertical_coefficients[1];)
11795- stbIF2(float STBIR_SIMD_STREAMOUT_PTR(*) output2 = outputs[2];
11796- float c2s = vertical_coefficients[2];)
11797- stbIF3(float STBIR_SIMD_STREAMOUT_PTR(*) output3 = outputs[3];
11798- float c3s = vertical_coefficients[3];)
11799- stbIF4(float STBIR_SIMD_STREAMOUT_PTR(*) output4 =
11800- outputs[4];
11801- float c4s = vertical_coefficients[4];)
11802- stbIF5(float STBIR_SIMD_STREAMOUT_PTR(*) output5 =
11803- outputs[5];
11804- float c5s = vertical_coefficients[5];)
11805- stbIF6(float STBIR_SIMD_STREAMOUT_PTR(*) output6 =
11806- outputs[6];
11807- float c6s = vertical_coefficients[6];)
11808- stbIF7(float STBIR_SIMD_STREAMOUT_PTR(*)
11809- output7 = outputs[7];
11810- float c7s = vertical_coefficients[7];)
11811-
11812-#ifdef STBIR_SIMD
11813- {
11814- stbIF0(stbir__simdfX c0 = stbir__simdf_frepX(c0s);)
11815- stbIF1(stbir__simdfX c1 = stbir__simdf_frepX(c1s);)
11816- stbIF2(stbir__simdfX c2 = stbir__simdf_frepX(c2s);) stbIF3(
11817- stbir__simdfX c3 = stbir__simdf_frepX(c3s);)
11818- stbIF4(stbir__simdfX c4 = stbir__simdf_frepX(c4s);) stbIF5(
11819- stbir__simdfX c5 = stbir__simdf_frepX(c5s);)
11820- stbIF6(stbir__simdfX c6 = stbir__simdf_frepX(c6s);)
11821- stbIF7(stbir__simdfX c7 = stbir__simdf_frepX(c7s);)
11822- STBIR_SIMD_NO_UNROLL_LOOP_START while (
11823- ((char *)input_end - (char *)input) >=
11824- (16 * stbir__simdfX_float_count))
11825- {
11826- stbir__simdfX o0, o1, o2, o3, r0, r1, r2, r3;
11827- STBIR_SIMD_NO_UNROLL(output0);
11828-
11829- stbir__simdfX_load(r0, input);
11830- stbir__simdfX_load(r1, input + stbir__simdfX_float_count);
11831- stbir__simdfX_load(r2, input + (2 * stbir__simdfX_float_count));
11832- stbir__simdfX_load(r3, input + (3 * stbir__simdfX_float_count));
11833-
11834-#ifdef STB_IMAGE_RESIZE_VERTICAL_CONTINUE
11835- stbIF0(
11836- stbir__simdfX_load(o0, output0);
11837- stbir__simdfX_load(o1, output0 + stbir__simdfX_float_count);
11838- stbir__simdfX_load(o2,
11839- output0 + (2 * stbir__simdfX_float_count));
11840- stbir__simdfX_load(o3,
11841- output0 + (3 * stbir__simdfX_float_count));
11842- stbir__simdfX_madd(o0, o0, r0, c0);
11843- stbir__simdfX_madd(o1, o1, r1, c0);
11844- stbir__simdfX_madd(o2, o2, r2, c0);
11845- stbir__simdfX_madd(o3, o3, r3, c0);
11846- stbir__simdfX_store(output0, o0);
11847- stbir__simdfX_store(output0 + stbir__simdfX_float_count, o1);
11848- stbir__simdfX_store(output0 + (2 * stbir__simdfX_float_count),
11849- o2);
11850- stbir__simdfX_store(
11851- output0 + (3 * stbir__simdfX_float_count),
11852- o3);) stbIF1(stbir__simdfX_load(o0, output1);
11853- stbir__simdfX_load(
11854- o1, output1 + stbir__simdfX_float_count);
11855- stbir__simdfX_load(
11856- o2,
11857- output1 + (2 * stbir__simdfX_float_count));
11858- stbir__simdfX_load(
11859- o3,
11860- output1 + (3 * stbir__simdfX_float_count));
11861- stbir__simdfX_madd(o0, o0, r0, c1);
11862- stbir__simdfX_madd(o1, o1, r1, c1);
11863- stbir__simdfX_madd(o2, o2, r2, c1);
11864- stbir__simdfX_madd(o3, o3, r3, c1);
11865- stbir__simdfX_store(output1, o0);
11866- stbir__simdfX_store(
11867- output1 + stbir__simdfX_float_count, o1);
11868- stbir__simdfX_store(
11869- output1 + (2 * stbir__simdfX_float_count),
11870- o2);
11871- stbir__simdfX_store(
11872- output1 + (3 * stbir__simdfX_float_count),
11873- o3);)
11874- stbIF2(
11875- stbir__simdfX_load(o0, output2);
11876- stbir__simdfX_load(o1, output2 + stbir__simdfX_float_count);
11877- stbir__simdfX_load(
11878- o2, output2 + (2 * stbir__simdfX_float_count));
11879- stbir__simdfX_load(
11880- o3, output2 + (3 * stbir__simdfX_float_count));
11881- stbir__simdfX_madd(o0, o0, r0, c2);
11882- stbir__simdfX_madd(o1, o1, r1, c2);
11883- stbir__simdfX_madd(o2, o2, r2, c2);
11884- stbir__simdfX_madd(o3, o3, r3, c2);
11885- stbir__simdfX_store(output2, o0);
11886- stbir__simdfX_store(output2 + stbir__simdfX_float_count,
11887- o1);
11888- stbir__simdfX_store(
11889- output2 + (2 * stbir__simdfX_float_count), o2);
11890- stbir__simdfX_store(
11891- output2 + (3 * stbir__simdfX_float_count),
11892- o3);) stbIF3(stbir__simdfX_load(o0, output3);
11893- stbir__simdfX_load(
11894- o1,
11895- output3 + stbir__simdfX_float_count);
11896- stbir__simdfX_load(
11897- o2,
11898- output3 +
11899- (2 * stbir__simdfX_float_count));
11900- stbir__simdfX_load(
11901- o3,
11902- output3 +
11903- (3 * stbir__simdfX_float_count));
11904- stbir__simdfX_madd(o0, o0, r0, c3);
11905- stbir__simdfX_madd(o1, o1, r1, c3);
11906- stbir__simdfX_madd(o2, o2, r2, c3);
11907- stbir__simdfX_madd(o3, o3, r3, c3);
11908- stbir__simdfX_store(output3, o0);
11909- stbir__simdfX_store(
11910- output3 + stbir__simdfX_float_count,
11911- o1);
11912- stbir__simdfX_store(
11913- output3 +
11914- (2 * stbir__simdfX_float_count),
11915- o2);
11916- stbir__simdfX_store(
11917- output3 +
11918- (3 * stbir__simdfX_float_count),
11919- o3);)
11920- stbIF4(stbir__simdfX_load(o0, output4); stbir__simdfX_load(
11921- o1, output4 + stbir__simdfX_float_count);
11922- stbir__simdfX_load(
11923- o2, output4 + (2 * stbir__simdfX_float_count));
11924- stbir__simdfX_load(
11925- o3, output4 + (3 * stbir__simdfX_float_count));
11926- stbir__simdfX_madd(o0, o0, r0, c4);
11927- stbir__simdfX_madd(o1, o1, r1, c4);
11928- stbir__simdfX_madd(o2, o2, r2, c4);
11929- stbir__simdfX_madd(o3, o3, r3, c4);
11930- stbir__simdfX_store(output4, o0);
11931- stbir__simdfX_store(
11932- output4 + stbir__simdfX_float_count, o1);
11933- stbir__simdfX_store(
11934- output4 + (2 * stbir__simdfX_float_count), o2);
11935- stbir__simdfX_store(
11936- output4 + (3 * stbir__simdfX_float_count), o3);)
11937- stbIF5(
11938- stbir__simdfX_load(o0, output5); stbir__simdfX_load(
11939- o1, output5 + stbir__simdfX_float_count);
11940- stbir__simdfX_load(
11941- o2, output5 + (2 * stbir__simdfX_float_count));
11942- stbir__simdfX_load(
11943- o3, output5 + (3 * stbir__simdfX_float_count));
11944- stbir__simdfX_madd(o0, o0, r0, c5);
11945- stbir__simdfX_madd(o1, o1, r1, c5);
11946- stbir__simdfX_madd(o2, o2, r2, c5);
11947- stbir__simdfX_madd(o3, o3, r3, c5);
11948- stbir__simdfX_store(output5, o0);
11949- stbir__simdfX_store(
11950- output5 + stbir__simdfX_float_count, o1);
11951- stbir__simdfX_store(
11952- output5 + (2 * stbir__simdfX_float_count), o2);
11953- stbir__simdfX_store(
11954- output5 + (3 * stbir__simdfX_float_count), o3);)
11955- stbIF6(
11956- stbir__simdfX_load(o0, output6);
11957- stbir__simdfX_load(
11958- o1, output6 + stbir__simdfX_float_count);
11959- stbir__simdfX_load(
11960- o2,
11961- output6 + (2 * stbir__simdfX_float_count));
11962- stbir__simdfX_load(
11963- o3,
11964- output6 + (3 * stbir__simdfX_float_count));
11965- stbir__simdfX_madd(o0, o0, r0, c6);
11966- stbir__simdfX_madd(o1, o1, r1, c6);
11967- stbir__simdfX_madd(o2, o2, r2, c6);
11968- stbir__simdfX_madd(o3, o3, r3, c6);
11969- stbir__simdfX_store(output6, o0);
11970- stbir__simdfX_store(
11971- output6 + stbir__simdfX_float_count, o1);
11972- stbir__simdfX_store(
11973- output6 + (2 * stbir__simdfX_float_count),
11974- o2);
11975- stbir__simdfX_store(
11976- output6 + (3 * stbir__simdfX_float_count),
11977- o3);)
11978- stbIF7(stbir__simdfX_load(o0, output7);
11979- stbir__simdfX_load(
11980- o1,
11981- output7 + stbir__simdfX_float_count);
11982- stbir__simdfX_load(
11983- o2,
11984- output7 +
11985- (2 * stbir__simdfX_float_count));
11986- stbir__simdfX_load(
11987- o3,
11988- output7 +
11989- (3 * stbir__simdfX_float_count));
11990- stbir__simdfX_madd(o0, o0, r0, c7);
11991- stbir__simdfX_madd(o1, o1, r1, c7);
11992- stbir__simdfX_madd(o2, o2, r2, c7);
11993- stbir__simdfX_madd(o3, o3, r3, c7);
11994- stbir__simdfX_store(output7, o0);
11995- stbir__simdfX_store(
11996- output7 + stbir__simdfX_float_count,
11997- o1);
11998- stbir__simdfX_store(
11999- output7 +
12000- (2 * stbir__simdfX_float_count),
12001- o2);
12002- stbir__simdfX_store(
12003- output7 +
12004- (3 * stbir__simdfX_float_count),
12005- o3);)
12006-#else
12007- stbIF0(
12008- stbir__simdfX_mult(o0, r0, c0); stbir__simdfX_mult(o1, r1, c0);
12009- stbir__simdfX_mult(o2, r2, c0);
12010- stbir__simdfX_mult(o3, r3, c0);
12011- stbir__simdfX_store(output0, o0);
12012- stbir__simdfX_store(output0 + stbir__simdfX_float_count, o1);
12013- stbir__simdfX_store(output0 + (2 * stbir__simdfX_float_count),
12014- o2);
12015- stbir__simdfX_store(
12016- output0 + (3 * stbir__simdfX_float_count),
12017- o3);) stbIF1(stbir__simdfX_mult(o0, r0, c1);
12018- stbir__simdfX_mult(o1, r1, c1);
12019- stbir__simdfX_mult(o2, r2, c1);
12020- stbir__simdfX_mult(o3, r3, c1);
12021- stbir__simdfX_store(output1, o0);
12022- stbir__simdfX_store(
12023- output1 + stbir__simdfX_float_count, o1);
12024- stbir__simdfX_store(
12025- output1 + (2 * stbir__simdfX_float_count),
12026- o2);
12027- stbir__simdfX_store(
12028- output1 + (3 * stbir__simdfX_float_count),
12029- o3);)
12030- stbIF2(stbir__simdfX_mult(o0, r0, c2);
12031- stbir__simdfX_mult(o1, r1, c2);
12032- stbir__simdfX_mult(o2, r2, c2);
12033- stbir__simdfX_mult(o3, r3, c2);
12034- stbir__simdfX_store(output2, o0);
12035- stbir__simdfX_store(output2 + stbir__simdfX_float_count,
12036- o1);
12037- stbir__simdfX_store(
12038- output2 + (2 * stbir__simdfX_float_count), o2);
12039- stbir__simdfX_store(
12040- output2 + (3 * stbir__simdfX_float_count),
12041- o3);) stbIF3(stbir__simdfX_mult(o0, r0, c3);
12042- stbir__simdfX_mult(o1, r1, c3);
12043- stbir__simdfX_mult(o2, r2, c3);
12044- stbir__simdfX_mult(o3, r3, c3);
12045- stbir__simdfX_store(output3, o0);
12046- stbir__simdfX_store(
12047- output3 + stbir__simdfX_float_count,
12048- o1);
12049- stbir__simdfX_store(
12050- output3 +
12051- (2 * stbir__simdfX_float_count),
12052- o2);
12053- stbir__simdfX_store(
12054- output3 +
12055- (3 * stbir__simdfX_float_count),
12056- o3);)
12057- stbIF4(stbir__simdfX_mult(o0, r0, c4);
12058- stbir__simdfX_mult(o1, r1, c4);
12059- stbir__simdfX_mult(o2, r2, c4);
12060- stbir__simdfX_mult(o3, r3, c4);
12061- stbir__simdfX_store(output4, o0);
12062- stbir__simdfX_store(
12063- output4 + stbir__simdfX_float_count, o1);
12064- stbir__simdfX_store(
12065- output4 + (2 * stbir__simdfX_float_count), o2);
12066- stbir__simdfX_store(
12067- output4 + (3 * stbir__simdfX_float_count), o3);)
12068- stbIF5(
12069- stbir__simdfX_mult(o0, r0, c5);
12070- stbir__simdfX_mult(o1, r1, c5);
12071- stbir__simdfX_mult(o2, r2, c5);
12072- stbir__simdfX_mult(o3, r3, c5);
12073- stbir__simdfX_store(output5, o0);
12074- stbir__simdfX_store(
12075- output5 + stbir__simdfX_float_count, o1);
12076- stbir__simdfX_store(
12077- output5 + (2 * stbir__simdfX_float_count), o2);
12078- stbir__simdfX_store(
12079- output5 + (3 * stbir__simdfX_float_count), o3);)
12080- stbIF6(
12081- stbir__simdfX_mult(o0, r0, c6);
12082- stbir__simdfX_mult(o1, r1, c6);
12083- stbir__simdfX_mult(o2, r2, c6);
12084- stbir__simdfX_mult(o3, r3, c6);
12085- stbir__simdfX_store(output6, o0);
12086- stbir__simdfX_store(
12087- output6 + stbir__simdfX_float_count, o1);
12088- stbir__simdfX_store(
12089- output6 + (2 * stbir__simdfX_float_count),
12090- o2);
12091- stbir__simdfX_store(
12092- output6 + (3 * stbir__simdfX_float_count),
12093- o3);)
12094- stbIF7(stbir__simdfX_mult(o0, r0, c7);
12095- stbir__simdfX_mult(o1, r1, c7);
12096- stbir__simdfX_mult(o2, r2, c7);
12097- stbir__simdfX_mult(o3, r3, c7);
12098- stbir__simdfX_store(output7, o0);
12099- stbir__simdfX_store(
12100- output7 + stbir__simdfX_float_count,
12101- o1);
12102- stbir__simdfX_store(
12103- output7 +
12104- (2 * stbir__simdfX_float_count),
12105- o2);
12106- stbir__simdfX_store(
12107- output7 +
12108- (3 * stbir__simdfX_float_count),
12109- o3);)
12110-#endif
12111-
12112- input += (4 * stbir__simdfX_float_count);
12113- stbIF0(output0 += (4 * stbir__simdfX_float_count);) stbIF1(
12114- output1 += (4 * stbir__simdfX_float_count);)
12115- stbIF2(output2 += (4 * stbir__simdfX_float_count);) stbIF3(
12116- output3 += (4 * stbir__simdfX_float_count);)
12117- stbIF4(output4 += (4 * stbir__simdfX_float_count);) stbIF5(
12118- output5 += (4 * stbir__simdfX_float_count);)
12119- stbIF6(output6 += (4 * stbir__simdfX_float_count);)
12120- stbIF7(output7 += (4 * stbir__simdfX_float_count);)
12121- }
12122- STBIR_SIMD_NO_UNROLL_LOOP_START
12123- while (((char *)input_end - (char *)input) >= 16) {
12124- stbir__simdf o0, r0;
12125- STBIR_SIMD_NO_UNROLL(output0);
12126-
12127- stbir__simdf_load(r0, input);
12128-
12129-#ifdef STB_IMAGE_RESIZE_VERTICAL_CONTINUE
12130- stbIF0(stbir__simdf_load(o0, output0); stbir__simdf_madd(
12131- o0, o0, r0, stbir__if_simdf8_cast_to_simdf4(c0));
12132- stbir__simdf_store(
12133- output0,
12134- o0);) stbIF1(stbir__simdf_load(o0, output1);
12135- stbir__simdf_madd(
12136- o0,
12137- o0,
12138- r0,
12139- stbir__if_simdf8_cast_to_simdf4(c1));
12140- stbir__simdf_store(output1, o0);)
12141- stbIF2(stbir__simdf_load(o0, output2); stbir__simdf_madd(
12142- o0, o0, r0, stbir__if_simdf8_cast_to_simdf4(c2));
12143- stbir__simdf_store(output2, o0);)
12144- stbIF3(stbir__simdf_load(o0, output3); stbir__simdf_madd(
12145- o0, o0, r0, stbir__if_simdf8_cast_to_simdf4(c3));
12146- stbir__simdf_store(output3, o0);)
12147- stbIF4(stbir__simdf_load(o0, output4);
12148- stbir__simdf_madd(
12149- o0,
12150- o0,
12151- r0,
12152- stbir__if_simdf8_cast_to_simdf4(c4));
12153- stbir__simdf_store(output4, o0);)
12154- stbIF5(stbir__simdf_load(o0, output5);
12155- stbir__simdf_madd(
12156- o0,
12157- o0,
12158- r0,
12159- stbir__if_simdf8_cast_to_simdf4(c5));
12160- stbir__simdf_store(output5, o0);)
12161- stbIF6(stbir__simdf_load(o0, output6);
12162- stbir__simdf_madd(
12163- o0,
12164- o0,
12165- r0,
12166- stbir__if_simdf8_cast_to_simdf4(c6));
12167- stbir__simdf_store(output6, o0);)
12168- stbIF7(stbir__simdf_load(o0, output7);
12169- stbir__simdf_madd(
12170- o0,
12171- o0,
12172- r0,
12173- stbir__if_simdf8_cast_to_simdf4(
12174- c7));
12175- stbir__simdf_store(output7, o0);)
12176-#else
12177- stbIF0(
12178- stbir__simdf_mult(o0, r0, stbir__if_simdf8_cast_to_simdf4(c0));
12179- stbir__simdf_store(output0, o0);)
12180- stbIF1(stbir__simdf_mult(
12181- o0, r0, stbir__if_simdf8_cast_to_simdf4(c1));
12182- stbir__simdf_store(output1, o0);)
12183- stbIF2(stbir__simdf_mult(
12184- o0, r0, stbir__if_simdf8_cast_to_simdf4(c2));
12185- stbir__simdf_store(output2, o0);)
12186- stbIF3(stbir__simdf_mult(
12187- o0, r0, stbir__if_simdf8_cast_to_simdf4(c3));
12188- stbir__simdf_store(output3, o0);)
12189- stbIF4(stbir__simdf_mult(
12190- o0,
12191- r0,
12192- stbir__if_simdf8_cast_to_simdf4(c4));
12193- stbir__simdf_store(output4, o0);)
12194- stbIF5(stbir__simdf_mult(
12195- o0,
12196- r0,
12197- stbir__if_simdf8_cast_to_simdf4(c5));
12198- stbir__simdf_store(output5, o0);)
12199- stbIF6(stbir__simdf_mult(
12200- o0,
12201- r0,
12202- stbir__if_simdf8_cast_to_simdf4(
12203- c6));
12204- stbir__simdf_store(output6, o0);)
12205- stbIF7(
12206- stbir__simdf_mult(
12207- o0,
12208- r0,
12209- stbir__if_simdf8_cast_to_simdf4(
12210- c7));
12211- stbir__simdf_store(output7, o0);)
12212-#endif
12213-
12214- input += 4;
12215- stbIF0(output0 += 4;) stbIF1(output1 += 4;) stbIF2(output2 += 4;)
12216- stbIF3(output3 += 4;) stbIF4(output4 += 4;)
12217- stbIF5(output5 += 4;) stbIF6(output6 += 4;)
12218- stbIF7(output7 += 4;)
12219- }
12220- }
12221-#else
12222- STBIR_NO_UNROLL_LOOP_START while (
12223- ((char *)input_end - (char *)input) >=
12224- 16)
12225- {
12226- float r0, r1, r2, r3;
12227- STBIR_NO_UNROLL(input);
12228-
12229- r0 = input[0], r1 = input[1], r2 = input[2], r3 = input[3];
12230-
12231-#ifdef STB_IMAGE_RESIZE_VERTICAL_CONTINUE
12232- stbIF0(output0[0] += (r0 * c0s); output0[1] += (r1 * c0s);
12233- output0[2] += (r2 * c0s); output0[3] += (r3 * c0s);)
12234- stbIF1(output1[0] += (r0 * c1s); output1[1] += (r1 * c1s);
12235- output1[2] += (r2 * c1s); output1[3] += (r3 * c1s);)
12236- stbIF2(output2[0] += (r0 * c2s); output2[1] += (r1 * c2s);
12237- output2[2] += (r2 * c2s); output2[3] += (r3 * c2s);)
12238- stbIF3(output3[0] += (r0 * c3s); output3[1] += (r1 * c3s);
12239- output3[2] += (r2 * c3s); output3[3] += (r3 * c3s);)
12240- stbIF4(
12241- output4[0] += (r0 * c4s); output4[1] += (r1 * c4s);
12242- output4[2] += (r2 * c4s); output4[3] += (r3 * c4s);)
12243- stbIF5(output5[0] += (r0 * c5s);
12244- output5[1] += (r1 * c5s);
12245- output5[2] += (r2 * c5s);
12246- output5[3] += (r3 * c5s);)
12247- stbIF6(output6[0] += (r0 * c6s);
12248- output6[1] += (r1 * c6s);
12249- output6[2] += (r2 * c6s);
12250- output6[3] += (r3 * c6s);)
12251- stbIF7(output7[0] += (r0 * c7s);
12252- output7[1] += (r1 * c7s);
12253- output7[2] += (r2 * c7s);
12254- output7[3] += (r3 * c7s);)
12255-#else
12256- stbIF0(output0[0] = (r0 * c0s); output0[1] = (r1 * c0s);
12257- output0[2] = (r2 * c0s); output0[3] = (r3 * c0s);)
12258- stbIF1(output1[0] = (r0 * c1s); output1[1] = (r1 * c1s);
12259- output1[2] = (r2 * c1s); output1[3] = (r3 * c1s);)
12260- stbIF2(output2[0] = (r0 * c2s); output2[1] = (r1 * c2s);
12261- output2[2] = (r2 * c2s); output2[3] = (r3 * c2s);)
12262- stbIF3(output3[0] = (r0 * c3s); output3[1] = (r1 * c3s);
12263- output3[2] = (r2 * c3s); output3[3] = (r3 * c3s);)
12264- stbIF4(output4[0] = (r0 * c4s); output4[1] = (r1 * c4s);
12265- output4[2] = (r2 * c4s);
12266- output4[3] = (r3 * c4s);)
12267- stbIF5(output5[0] = (r0 * c5s);
12268- output5[1] = (r1 * c5s);
12269- output5[2] = (r2 * c5s);
12270- output5[3] = (r3 * c5s);)
12271- stbIF6(output6[0] = (r0 * c6s);
12272- output6[1] = (r1 * c6s);
12273- output6[2] = (r2 * c6s);
12274- output6[3] = (r3 * c6s);)
12275- stbIF7(output7[0] = (r0 * c7s);
12276- output7[1] = (r1 * c7s);
12277- output7[2] = (r2 * c7s);
12278- output7[3] = (r3 * c7s);)
12279-#endif
12280-
12281- input += 4;
12282- stbIF0(output0 += 4;) stbIF1(output1 += 4;) stbIF2(output2 += 4;)
12283- stbIF3(output3 += 4;) stbIF4(output4 += 4;) stbIF5(output5 += 4;)
12284- stbIF6(output6 += 4;) stbIF7(output7 += 4;)
12285- }
12286-#endif
12287- STBIR_NO_UNROLL_LOOP_START
12288- while (input < input_end) {
12289- float r = input[0];
12290- STBIR_NO_UNROLL(output0);
12291-
12292-#ifdef STB_IMAGE_RESIZE_VERTICAL_CONTINUE
12293- stbIF0(output0[0] += (r * c0s);) stbIF1(output1[0] += (r * c1s);)
12294- stbIF2(output2[0] += (r * c2s);) stbIF3(output3[0] += (r * c3s);)
12295- stbIF4(output4[0] += (r * c4s);)
12296- stbIF5(output5[0] += (r * c5s);)
12297- stbIF6(output6[0] += (r * c6s);)
12298- stbIF7(output7[0] += (r * c7s);)
12299-#else
12300- stbIF0(output0[0] = (r * c0s);) stbIF1(output1[0] = (r * c1s);)
12301- stbIF2(output2[0] = (r * c2s);) stbIF3(output3[0] = (r * c3s);)
12302- stbIF4(output4[0] = (r * c4s);) stbIF5(output5[0] = (r * c5s);)
12303- stbIF6(output6[0] = (r * c6s);)
12304- stbIF7(output7[0] = (r * c7s);)
12305-#endif
12306-
12307- ++ input;
12308- stbIF0(++output0;) stbIF1(++output1;) stbIF2(++output2;)
12309- stbIF3(++output3;) stbIF4(++output4;) stbIF5(++output5;)
12310- stbIF6(++output6;) stbIF7(++output7;)
12311- }
12312-}
12313-
12314-static void
12315-STBIR_chans(stbir__vertical_gather_with_,
12316- _coeffs)(float *outputp,
12317- float const *vertical_coefficients,
12318- float const **inputs,
12319- float const *input0_end)
12320-{
12321- float STBIR_SIMD_STREAMOUT_PTR(*) output = outputp;
12322-
12323- stbIF0(float const *input0 = inputs[0];
12324- float c0s = vertical_coefficients[0];)
12325- stbIF1(float const *input1 = inputs[1];
12326- float c1s = vertical_coefficients[1];)
12327- stbIF2(float const *input2 = inputs[2];
12328- float c2s = vertical_coefficients[2];)
12329- stbIF3(float const *input3 = inputs[3];
12330- float c3s = vertical_coefficients[3];)
12331- stbIF4(float const *input4 = inputs[4];
12332- float c4s = vertical_coefficients[4];)
12333- stbIF5(float const *input5 = inputs[5];
12334- float c5s = vertical_coefficients[5];)
12335- stbIF6(float const *input6 = inputs[6];
12336- float c6s = vertical_coefficients[6];)
12337- stbIF7(float const *input7 = inputs[7];
12338- float c7s = vertical_coefficients[7];)
12339-
12340-#if (STBIR__vertical_channels == 1) && \
12341- !defined(STB_IMAGE_RESIZE_VERTICAL_CONTINUE)
12342- // check single channel one weight
12343- if ((c0s >= (1.0f - 0.000001f)) && (c0s <= (1.0f + 0.000001f)))
12344- {
12345- STBIR_MEMCPY(output, input0, (char *)input0_end - (char *)input0);
12346- return;
12347- }
12348-#endif
12349-
12350-#ifdef STBIR_SIMD
12351- {
12352- stbIF0(stbir__simdfX c0 = stbir__simdf_frepX(c0s);)
12353- stbIF1(stbir__simdfX c1 = stbir__simdf_frepX(c1s);)
12354- stbIF2(stbir__simdfX c2 = stbir__simdf_frepX(c2s);) stbIF3(
12355- stbir__simdfX c3 = stbir__simdf_frepX(c3s);)
12356- stbIF4(stbir__simdfX c4 = stbir__simdf_frepX(c4s);) stbIF5(
12357- stbir__simdfX c5 = stbir__simdf_frepX(c5s);)
12358- stbIF6(stbir__simdfX c6 = stbir__simdf_frepX(c6s);)
12359- stbIF7(stbir__simdfX c7 = stbir__simdf_frepX(c7s);)
12360-
12361- STBIR_SIMD_NO_UNROLL_LOOP_START while (
12362- ((char *)input0_end - (char *)input0) >=
12363- (16 * stbir__simdfX_float_count))
12364- {
12365- stbir__simdfX o0, o1, o2, o3, r0, r1, r2, r3;
12366- STBIR_SIMD_NO_UNROLL(output);
12367-
12368- // prefetch four loop iterations ahead (doesn't affect much for
12369- // small resizes, but helps with big ones)
12370- stbIF0(stbir__prefetch(input0 + (16 * stbir__simdfX_float_count));) stbIF1(
12371- stbir__prefetch(input1 + (16 * stbir__simdfX_float_count));)
12372- stbIF2(stbir__prefetch(input2 + (16 * stbir__simdfX_float_count));) stbIF3(
12373- stbir__prefetch(input3 + (16 * stbir__simdfX_float_count));)
12374- stbIF4(stbir__prefetch(input4 + (16 * stbir__simdfX_float_count));) stbIF5(
12375- stbir__prefetch(input5 +
12376- (16 * stbir__simdfX_float_count));)
12377- stbIF6(stbir__prefetch(input6 + (16 * stbir__simdfX_float_count));) stbIF7(
12378- stbir__prefetch(input7 +
12379- (16 * stbir__simdfX_float_count));)
12380-
12381-#ifdef STB_IMAGE_RESIZE_VERTICAL_CONTINUE
12382- stbIF0(
12383- stbir__simdfX_load(o0, output);
12384- stbir__simdfX_load(
12385- o1, output + stbir__simdfX_float_count);
12386- stbir__simdfX_load(
12387- o2,
12388- output + (2 * stbir__simdfX_float_count));
12389- stbir__simdfX_load(
12390- o3,
12391- output + (3 * stbir__simdfX_float_count));
12392- stbir__simdfX_load(r0, input0);
12393- stbir__simdfX_load(
12394- r1, input0 + stbir__simdfX_float_count);
12395- stbir__simdfX_load(
12396- r2,
12397- input0 + (2 * stbir__simdfX_float_count));
12398- stbir__simdfX_load(
12399- r3,
12400- input0 + (3 * stbir__simdfX_float_count));
12401- stbir__simdfX_madd(o0, o0, r0, c0);
12402- stbir__simdfX_madd(o1, o1, r1, c0);
12403- stbir__simdfX_madd(o2, o2, r2, c0);
12404- stbir__simdfX_madd(o3, o3, r3, c0);)
12405-#else
12406- stbIF0(
12407- stbir__simdfX_load(r0, input0);
12408- stbir__simdfX_load(
12409- r1, input0 + stbir__simdfX_float_count);
12410- stbir__simdfX_load(
12411- r2,
12412- input0 + (2 * stbir__simdfX_float_count));
12413- stbir__simdfX_load(
12414- r3,
12415- input0 + (3 * stbir__simdfX_float_count));
12416- stbir__simdfX_mult(o0, r0, c0);
12417- stbir__simdfX_mult(o1, r1, c0);
12418- stbir__simdfX_mult(o2, r2, c0);
12419- stbir__simdfX_mult(o3, r3, c0);)
12420-#endif
12421-
12422- stbIF1(
12423- stbir__simdfX_load(r0, input1);
12424- stbir__simdfX_load(
12425- r1, input1 + stbir__simdfX_float_count);
12426- stbir__simdfX_load(
12427- r2,
12428- input1 +
12429- (2 * stbir__simdfX_float_count));
12430- stbir__simdfX_load(
12431- r3,
12432- input1 +
12433- (3 * stbir__simdfX_float_count));
12434- stbir__simdfX_madd(o0, o0, r0, c1);
12435- stbir__simdfX_madd(o1, o1, r1, c1);
12436- stbir__simdfX_madd(o2, o2, r2, c1);
12437- stbir__simdfX_madd(
12438- o3,
12439- o3,
12440- r3,
12441- c1);) stbIF2(stbir__simdfX_load(r0,
12442- input2);
12443- stbir__simdfX_load(
12444- r1,
12445- input2 +
12446- stbir__simdfX_float_count);
12447- stbir__simdfX_load(
12448- r2,
12449- input2 +
12450- (2 *
12451- stbir__simdfX_float_count));
12452- stbir__simdfX_load(
12453- r3,
12454- input2 +
12455- (3 *
12456- stbir__simdfX_float_count));
12457- stbir__simdfX_madd(
12458- o0, o0, r0, c2);
12459- stbir__simdfX_madd(
12460- o1, o1, r1, c2);
12461- stbir__simdfX_madd(
12462- o2, o2, r2, c2);
12463- stbir__simdfX_madd(
12464- o3, o3, r3, c2);)
12465- stbIF3(
12466- stbir__simdfX_load(r0, input3);
12467- stbir__simdfX_load(
12468- r1,
12469- input3 + stbir__simdfX_float_count);
12470- stbir__simdfX_load(
12471- r2,
12472- input3 +
12473- (2 *
12474- stbir__simdfX_float_count));
12475- stbir__simdfX_load(
12476- r3,
12477- input3 +
12478- (3 *
12479- stbir__simdfX_float_count));
12480- stbir__simdfX_madd(o0, o0, r0, c3);
12481- stbir__simdfX_madd(o1, o1, r1, c3);
12482- stbir__simdfX_madd(o2, o2, r2, c3);
12483- stbir__simdfX_madd(o3, o3, r3, c3);)
12484- stbIF4(
12485- stbir__simdfX_load(r0, input4);
12486- stbir__simdfX_load(
12487- r1,
12488- input4 +
12489- stbir__simdfX_float_count);
12490- stbir__simdfX_load(
12491- r2,
12492- input4 +
12493- (2 *
12494- stbir__simdfX_float_count));
12495- stbir__simdfX_load(
12496- r3,
12497- input4 +
12498- (3 *
12499- stbir__simdfX_float_count));
12500- stbir__simdfX_madd(o0, o0, r0, c4);
12501- stbir__simdfX_madd(o1, o1, r1, c4);
12502- stbir__simdfX_madd(o2, o2, r2, c4);
12503- stbir__simdfX_madd(o3, o3, r3, c4);)
12504- stbIF5(
12505- stbir__simdfX_load(r0, input5);
12506- stbir__simdfX_load(
12507- r1,
12508- input5 +
12509- stbir__simdfX_float_count);
12510- stbir__simdfX_load(
12511- r2,
12512- input5 +
12513- (2 *
12514- stbir__simdfX_float_count));
12515- stbir__simdfX_load(
12516- r3,
12517- input5 +
12518- (3 *
12519- stbir__simdfX_float_count));
12520- stbir__simdfX_madd(
12521- o0, o0, r0, c5);
12522- stbir__simdfX_madd(
12523- o1, o1, r1, c5);
12524- stbir__simdfX_madd(
12525- o2, o2, r2, c5);
12526- stbir__simdfX_madd(
12527- o3, o3, r3, c5);)
12528- stbIF6(
12529- stbir__simdfX_load(r0,
12530- input6);
12531- stbir__simdfX_load(
12532- r1,
12533- input6 +
12534- stbir__simdfX_float_count);
12535- stbir__simdfX_load(
12536- r2,
12537- input6 +
12538- (2 *
12539- stbir__simdfX_float_count));
12540- stbir__simdfX_load(
12541- r3,
12542- input6 +
12543- (3 *
12544- stbir__simdfX_float_count));
12545- stbir__simdfX_madd(
12546- o0, o0, r0, c6);
12547- stbir__simdfX_madd(
12548- o1, o1, r1, c6);
12549- stbir__simdfX_madd(
12550- o2, o2, r2, c6);
12551- stbir__simdfX_madd(
12552- o3, o3, r3, c6);)
12553- stbIF7(
12554- stbir__simdfX_load(
12555- r0, input7);
12556- stbir__simdfX_load(
12557- r1,
12558- input7 +
12559- stbir__simdfX_float_count);
12560- stbir__simdfX_load(
12561- r2,
12562- input7 +
12563- (2 *
12564- stbir__simdfX_float_count));
12565- stbir__simdfX_load(
12566- r3,
12567- input7 +
12568- (3 *
12569- stbir__simdfX_float_count));
12570- stbir__simdfX_madd(
12571- o0, o0, r0, c7);
12572- stbir__simdfX_madd(
12573- o1, o1, r1, c7);
12574- stbir__simdfX_madd(
12575- o2, o2, r2, c7);
12576- stbir__simdfX_madd(
12577- o3, o3, r3, c7);)
12578-
12579- stbir__simdfX_store(
12580- output, o0);
12581- stbir__simdfX_store(output + stbir__simdfX_float_count, o1);
12582- stbir__simdfX_store(output + (2 * stbir__simdfX_float_count), o2);
12583- stbir__simdfX_store(output + (3 * stbir__simdfX_float_count), o3);
12584- output += (4 * stbir__simdfX_float_count);
12585- stbIF0(input0 += (4 * stbir__simdfX_float_count);) stbIF1(
12586- input1 += (4 * stbir__simdfX_float_count);)
12587- stbIF2(input2 += (4 * stbir__simdfX_float_count);) stbIF3(
12588- input3 += (4 * stbir__simdfX_float_count);)
12589- stbIF4(input4 += (4 * stbir__simdfX_float_count);) stbIF5(
12590- input5 += (4 * stbir__simdfX_float_count);)
12591- stbIF6(input6 += (4 * stbir__simdfX_float_count);)
12592- stbIF7(input7 += (4 * stbir__simdfX_float_count);)
12593- }
12594-
12595- STBIR_SIMD_NO_UNROLL_LOOP_START
12596- while (((char *)input0_end - (char *)input0) >= 16) {
12597- stbir__simdf o0, r0;
12598- STBIR_SIMD_NO_UNROLL(output);
12599-
12600-#ifdef STB_IMAGE_RESIZE_VERTICAL_CONTINUE
12601- stbIF0(stbir__simdf_load(o0, output); stbir__simdf_load(r0, input0);
12602- stbir__simdf_madd(
12603- o0, o0, r0, stbir__if_simdf8_cast_to_simdf4(c0));)
12604-#else
12605- stbIF0(stbir__simdf_load(r0, input0); stbir__simdf_mult(
12606- o0, r0, stbir__if_simdf8_cast_to_simdf4(c0));)
12607-#endif
12608- stbIF1(stbir__simdf_load(r0, input1); stbir__simdf_madd(
12609- o0, o0, r0, stbir__if_simdf8_cast_to_simdf4(c1));)
12610- stbIF2(
12611- stbir__simdf_load(r0, input2); stbir__simdf_madd(
12612- o0, o0, r0, stbir__if_simdf8_cast_to_simdf4(c2));)
12613- stbIF3(stbir__simdf_load(r0, input3); stbir__simdf_madd(
12614- o0,
12615- o0,
12616- r0,
12617- stbir__if_simdf8_cast_to_simdf4(c3));)
12618- stbIF4(stbir__simdf_load(r0, input4);
12619- stbir__simdf_madd(
12620- o0,
12621- o0,
12622- r0,
12623- stbir__if_simdf8_cast_to_simdf4(c4));)
12624- stbIF5(
12625- stbir__simdf_load(r0, input5);
12626- stbir__simdf_madd(
12627- o0,
12628- o0,
12629- r0,
12630- stbir__if_simdf8_cast_to_simdf4(c5));)
12631- stbIF6(stbir__simdf_load(r0, input6);
12632- stbir__simdf_madd(
12633- o0,
12634- o0,
12635- r0,
12636- stbir__if_simdf8_cast_to_simdf4(
12637- c6));)
12638- stbIF7(
12639- stbir__simdf_load(r0, input7);
12640- stbir__simdf_madd(
12641- o0,
12642- o0,
12643- r0,
12644- stbir__if_simdf8_cast_to_simdf4(
12645- c7));)
12646-
12647- stbir__simdf_store(output, o0);
12648- output += 4;
12649- stbIF0(input0 += 4;) stbIF1(input1 += 4;) stbIF2(input2 += 4;)
12650- stbIF3(input3 += 4;) stbIF4(input4 += 4;) stbIF5(input5 += 4;)
12651- stbIF6(input6 += 4;) stbIF7(input7 += 4;)
12652- }
12653- }
12654-#else
12655- STBIR_NO_UNROLL_LOOP_START while (
12656- ((char *)input0_end - (char *)input0) >=
12657- 16)
12658- {
12659- float o0, o1, o2, o3;
12660- STBIR_NO_UNROLL(output);
12661-#ifdef STB_IMAGE_RESIZE_VERTICAL_CONTINUE
12662- stbIF0(
12663- o0 = output[0] + input0[0] * c0s; o1 = output[1] + input0[1] * c0s;
12664- o2 = output[2] + input0[2] * c0s; o3 = output[3] + input0[3] * c0s;)
12665-#else
12666- stbIF0(o0 = input0[0] * c0s; o1 = input0[1] * c0s; o2 = input0[2] * c0s;
12667- o3 = input0[3] * c0s;)
12668-#endif
12669- stbIF1(o0 += input1[0] * c1s; o1 += input1[1] * c1s;
12670- o2 += input1[2] * c1s; o3 += input1[3] * c1s;)
12671- stbIF2(o0 += input2[0] * c2s; o1 += input2[1] * c2s;
12672- o2 += input2[2] * c2s;
12673- o3 += input2[3] * c2s;) stbIF3(o0 += input3[0] * c3s;
12674- o1 += input3[1] * c3s;
12675- o2 += input3[2] * c3s;
12676- o3 += input3[3] * c3s;)
12677- stbIF4(o0 += input4[0] * c4s; o1 += input4[1] * c4s;
12678- o2 += input4[2] * c4s; o3 += input4[3] * c4s;)
12679- stbIF5(o0 += input5[0] * c5s; o1 += input5[1] * c5s;
12680- o2 += input5[2] * c5s; o3 += input5[3] * c5s;)
12681- stbIF6(o0 += input6[0] * c6s; o1 += input6[1] * c6s;
12682- o2 += input6[2] * c6s;
12683- o3 += input6[3] * c6s;)
12684- stbIF7(o0 += input7[0] * c7s;
12685- o1 += input7[1] * c7s;
12686- o2 += input7[2] * c7s;
12687- o3 += input7[3] * c7s;) output[0] = o0;
12688- output[1] = o1;
12689- output[2] = o2;
12690- output[3] = o3;
12691- output += 4;
12692- stbIF0(input0 += 4;) stbIF1(input1 += 4;) stbIF2(input2 += 4;)
12693- stbIF3(input3 += 4;) stbIF4(input4 += 4;) stbIF5(input5 += 4;)
12694- stbIF6(input6 += 4;) stbIF7(input7 += 4;)
12695- }
12696-#endif
12697- STBIR_NO_UNROLL_LOOP_START
12698- while (input0 < input0_end) {
12699- float o0;
12700- STBIR_NO_UNROLL(output);
12701-#ifdef STB_IMAGE_RESIZE_VERTICAL_CONTINUE
12702- stbIF0(o0 = output[0] + input0[0] * c0s;)
12703-#else
12704- stbIF0(o0 = input0[0] * c0s;)
12705-#endif
12706- stbIF1(o0 += input1[0] * c1s;) stbIF2(o0 += input2[0] * c2s;)
12707- stbIF3(o0 += input3[0] * c3s;) stbIF4(o0 += input4[0] * c4s;)
12708- stbIF5(o0 += input5[0] * c5s;)
12709- stbIF6(o0 += input6[0] * c6s;)
12710- stbIF7(o0 += input7[0] * c7s;) output[0] = o0;
12711- ++output;
12712- stbIF0(++input0;) stbIF1(++input1;) stbIF2(++input2;) stbIF3(++input3;)
12713- stbIF4(++input4;) stbIF5(++input5;) stbIF6(++input6;)
12714- stbIF7(++input7;)
12715- }
12716-}
12717-
12718-#undef stbIF0
12719-#undef stbIF1
12720-#undef stbIF2
12721-#undef stbIF3
12722-#undef stbIF4
12723-#undef stbIF5
12724-#undef stbIF6
12725-#undef stbIF7
12726-#undef STB_IMAGE_RESIZE_DO_VERTICALS
12727-#undef STBIR__vertical_channels
12728-#undef STB_IMAGE_RESIZE_DO_HORIZONTALS
12729-#undef STBIR_strs_join24
12730-#undef STBIR_strs_join14
12731-#undef STBIR_chans
12732-#ifdef STB_IMAGE_RESIZE_VERTICAL_CONTINUE
12733-#undef STB_IMAGE_RESIZE_VERTICAL_CONTINUE
12734-#endif
12735-
12736-#else // !STB_IMAGE_RESIZE_DO_VERTICALS
12737-
12738-#define STBIR_chans(start, end) \
12739- STBIR_strs_join1(start, STBIR__horizontal_channels, end)
12740-
12741-#ifndef stbir__2_coeff_only
12742-#define stbir__2_coeff_only() \
12743- stbir__1_coeff_only(); \
12744- stbir__1_coeff_remnant(1);
12745-#endif
12746-
12747-#ifndef stbir__2_coeff_remnant
12748-#define stbir__2_coeff_remnant(ofs) \
12749- stbir__1_coeff_remnant(ofs); \
12750- stbir__1_coeff_remnant((ofs) + 1);
12751-#endif
12752-
12753-#ifndef stbir__3_coeff_only
12754-#define stbir__3_coeff_only() \
12755- stbir__2_coeff_only(); \
12756- stbir__1_coeff_remnant(2);
12757-#endif
12758-
12759-#ifndef stbir__3_coeff_remnant
12760-#define stbir__3_coeff_remnant(ofs) \
12761- stbir__2_coeff_remnant(ofs); \
12762- stbir__1_coeff_remnant((ofs) + 2);
12763-#endif
12764-
12765-#ifndef stbir__3_coeff_setup
12766-#define stbir__3_coeff_setup()
12767-#endif
12768-
12769-#ifndef stbir__4_coeff_start
12770-#define stbir__4_coeff_start() \
12771- stbir__2_coeff_only(); \
12772- stbir__2_coeff_remnant(2);
12773-#endif
12774-
12775-#ifndef stbir__4_coeff_continue_from_4
12776-#define stbir__4_coeff_continue_from_4(ofs) \
12777- stbir__2_coeff_remnant(ofs); \
12778- stbir__2_coeff_remnant((ofs) + 2);
12779-#endif
12780-
12781-#ifndef stbir__store_output_tiny
12782-#define stbir__store_output_tiny stbir__store_output
12783-#endif
12784-
12785-static void
12786-STBIR_chans(stbir__horizontal_gather_, _channels_with_1_coeff)(
12787- float *output_buffer, unsigned int output_sub_size,
12788- float const *decode_buffer,
12789- stbir__contributors const *horizontal_contributors,
12790- float const *horizontal_coefficients, int coefficient_width)
12791-{
12792- float const *output_end =
12793- output_buffer + output_sub_size * STBIR__horizontal_channels;
12794- float STBIR_SIMD_STREAMOUT_PTR(*) output = output_buffer;
12795- STBIR_SIMD_NO_UNROLL_LOOP_START
12796- do {
12797- float const *decode = decode_buffer + horizontal_contributors->n0 *
12798- STBIR__horizontal_channels;
12799- float const *hc = horizontal_coefficients;
12800- stbir__1_coeff_only();
12801- stbir__store_output_tiny();
12802- } while (output < output_end);
12803-}
12804-
12805-static void
12806-STBIR_chans(stbir__horizontal_gather_, _channels_with_2_coeffs)(
12807- float *output_buffer, unsigned int output_sub_size,
12808- float const *decode_buffer,
12809- stbir__contributors const *horizontal_contributors,
12810- float const *horizontal_coefficients, int coefficient_width)
12811-{
12812- float const *output_end =
12813- output_buffer + output_sub_size * STBIR__horizontal_channels;
12814- float STBIR_SIMD_STREAMOUT_PTR(*) output = output_buffer;
12815- STBIR_SIMD_NO_UNROLL_LOOP_START
12816- do {
12817- float const *decode = decode_buffer + horizontal_contributors->n0 *
12818- STBIR__horizontal_channels;
12819- float const *hc = horizontal_coefficients;
12820- stbir__2_coeff_only();
12821- stbir__store_output_tiny();
12822- } while (output < output_end);
12823-}
12824-
12825-static void
12826-STBIR_chans(stbir__horizontal_gather_, _channels_with_3_coeffs)(
12827- float *output_buffer, unsigned int output_sub_size,
12828- float const *decode_buffer,
12829- stbir__contributors const *horizontal_contributors,
12830- float const *horizontal_coefficients, int coefficient_width)
12831-{
12832- float const *output_end =
12833- output_buffer + output_sub_size * STBIR__horizontal_channels;
12834- float STBIR_SIMD_STREAMOUT_PTR(*) output = output_buffer;
12835- STBIR_SIMD_NO_UNROLL_LOOP_START
12836- do {
12837- float const *decode = decode_buffer + horizontal_contributors->n0 *
12838- STBIR__horizontal_channels;
12839- float const *hc = horizontal_coefficients;
12840- stbir__3_coeff_only();
12841- stbir__store_output_tiny();
12842- } while (output < output_end);
12843-}
12844-
12845-static void
12846-STBIR_chans(stbir__horizontal_gather_, _channels_with_4_coeffs)(
12847- float *output_buffer, unsigned int output_sub_size,
12848- float const *decode_buffer,
12849- stbir__contributors const *horizontal_contributors,
12850- float const *horizontal_coefficients, int coefficient_width)
12851-{
12852- float const *output_end =
12853- output_buffer + output_sub_size * STBIR__horizontal_channels;
12854- float STBIR_SIMD_STREAMOUT_PTR(*) output = output_buffer;
12855- STBIR_SIMD_NO_UNROLL_LOOP_START
12856- do {
12857- float const *decode = decode_buffer + horizontal_contributors->n0 *
12858- STBIR__horizontal_channels;
12859- float const *hc = horizontal_coefficients;
12860- stbir__4_coeff_start();
12861- stbir__store_output();
12862- } while (output < output_end);
12863-}
12864-
12865-static void
12866-STBIR_chans(stbir__horizontal_gather_, _channels_with_5_coeffs)(
12867- float *output_buffer, unsigned int output_sub_size,
12868- float const *decode_buffer,
12869- stbir__contributors const *horizontal_contributors,
12870- float const *horizontal_coefficients, int coefficient_width)
12871-{
12872- float const *output_end =
12873- output_buffer + output_sub_size * STBIR__horizontal_channels;
12874- float STBIR_SIMD_STREAMOUT_PTR(*) output = output_buffer;
12875- STBIR_SIMD_NO_UNROLL_LOOP_START
12876- do {
12877- float const *decode = decode_buffer + horizontal_contributors->n0 *
12878- STBIR__horizontal_channels;
12879- float const *hc = horizontal_coefficients;
12880- stbir__4_coeff_start();
12881- stbir__1_coeff_remnant(4);
12882- stbir__store_output();
12883- } while (output < output_end);
12884-}
12885-
12886-static void
12887-STBIR_chans(stbir__horizontal_gather_, _channels_with_6_coeffs)(
12888- float *output_buffer, unsigned int output_sub_size,
12889- float const *decode_buffer,
12890- stbir__contributors const *horizontal_contributors,
12891- float const *horizontal_coefficients, int coefficient_width)
12892-{
12893- float const *output_end =
12894- output_buffer + output_sub_size * STBIR__horizontal_channels;
12895- float STBIR_SIMD_STREAMOUT_PTR(*) output = output_buffer;
12896- STBIR_SIMD_NO_UNROLL_LOOP_START
12897- do {
12898- float const *decode = decode_buffer + horizontal_contributors->n0 *
12899- STBIR__horizontal_channels;
12900- float const *hc = horizontal_coefficients;
12901- stbir__4_coeff_start();
12902- stbir__2_coeff_remnant(4);
12903- stbir__store_output();
12904- } while (output < output_end);
12905-}
12906-
12907-static void
12908-STBIR_chans(stbir__horizontal_gather_, _channels_with_7_coeffs)(
12909- float *output_buffer, unsigned int output_sub_size,
12910- float const *decode_buffer,
12911- stbir__contributors const *horizontal_contributors,
12912- float const *horizontal_coefficients, int coefficient_width)
12913-{
12914- float const *output_end =
12915- output_buffer + output_sub_size * STBIR__horizontal_channels;
12916- float STBIR_SIMD_STREAMOUT_PTR(*) output = output_buffer;
12917- stbir__3_coeff_setup();
12918- STBIR_SIMD_NO_UNROLL_LOOP_START
12919- do {
12920- float const *decode = decode_buffer + horizontal_contributors->n0 *
12921- STBIR__horizontal_channels;
12922- float const *hc = horizontal_coefficients;
12923-
12924- stbir__4_coeff_start();
12925- stbir__3_coeff_remnant(4);
12926- stbir__store_output();
12927- } while (output < output_end);
12928-}
12929-
12930-static void
12931-STBIR_chans(stbir__horizontal_gather_, _channels_with_8_coeffs)(
12932- float *output_buffer, unsigned int output_sub_size,
12933- float const *decode_buffer,
12934- stbir__contributors const *horizontal_contributors,
12935- float const *horizontal_coefficients, int coefficient_width)
12936-{
12937- float const *output_end =
12938- output_buffer + output_sub_size * STBIR__horizontal_channels;
12939- float STBIR_SIMD_STREAMOUT_PTR(*) output = output_buffer;
12940- STBIR_SIMD_NO_UNROLL_LOOP_START
12941- do {
12942- float const *decode = decode_buffer + horizontal_contributors->n0 *
12943- STBIR__horizontal_channels;
12944- float const *hc = horizontal_coefficients;
12945- stbir__4_coeff_start();
12946- stbir__4_coeff_continue_from_4(4);
12947- stbir__store_output();
12948- } while (output < output_end);
12949-}
12950-
12951-static void
12952-STBIR_chans(stbir__horizontal_gather_, _channels_with_9_coeffs)(
12953- float *output_buffer, unsigned int output_sub_size,
12954- float const *decode_buffer,
12955- stbir__contributors const *horizontal_contributors,
12956- float const *horizontal_coefficients, int coefficient_width)
12957-{
12958- float const *output_end =
12959- output_buffer + output_sub_size * STBIR__horizontal_channels;
12960- float STBIR_SIMD_STREAMOUT_PTR(*) output = output_buffer;
12961- STBIR_SIMD_NO_UNROLL_LOOP_START
12962- do {
12963- float const *decode = decode_buffer + horizontal_contributors->n0 *
12964- STBIR__horizontal_channels;
12965- float const *hc = horizontal_coefficients;
12966- stbir__4_coeff_start();
12967- stbir__4_coeff_continue_from_4(4);
12968- stbir__1_coeff_remnant(8);
12969- stbir__store_output();
12970- } while (output < output_end);
12971-}
12972-
12973-static void
12974-STBIR_chans(stbir__horizontal_gather_, _channels_with_10_coeffs)(
12975- float *output_buffer, unsigned int output_sub_size,
12976- float const *decode_buffer,
12977- stbir__contributors const *horizontal_contributors,
12978- float const *horizontal_coefficients, int coefficient_width)
12979-{
12980- float const *output_end =
12981- output_buffer + output_sub_size * STBIR__horizontal_channels;
12982- float STBIR_SIMD_STREAMOUT_PTR(*) output = output_buffer;
12983- STBIR_SIMD_NO_UNROLL_LOOP_START
12984- do {
12985- float const *decode = decode_buffer + horizontal_contributors->n0 *
12986- STBIR__horizontal_channels;
12987- float const *hc = horizontal_coefficients;
12988- stbir__4_coeff_start();
12989- stbir__4_coeff_continue_from_4(4);
12990- stbir__2_coeff_remnant(8);
12991- stbir__store_output();
12992- } while (output < output_end);
12993-}
12994-
12995-static void
12996-STBIR_chans(stbir__horizontal_gather_, _channels_with_11_coeffs)(
12997- float *output_buffer, unsigned int output_sub_size,
12998- float const *decode_buffer,
12999- stbir__contributors const *horizontal_contributors,
13000- float const *horizontal_coefficients, int coefficient_width)
13001-{
13002- float const *output_end =
13003- output_buffer + output_sub_size * STBIR__horizontal_channels;
13004- float STBIR_SIMD_STREAMOUT_PTR(*) output = output_buffer;
13005- stbir__3_coeff_setup();
13006- STBIR_SIMD_NO_UNROLL_LOOP_START
13007- do {
13008- float const *decode = decode_buffer + horizontal_contributors->n0 *
13009- STBIR__horizontal_channels;
13010- float const *hc = horizontal_coefficients;
13011- stbir__4_coeff_start();
13012- stbir__4_coeff_continue_from_4(4);
13013- stbir__3_coeff_remnant(8);
13014- stbir__store_output();
13015- } while (output < output_end);
13016-}
13017-
13018-static void
13019-STBIR_chans(stbir__horizontal_gather_, _channels_with_12_coeffs)(
13020- float *output_buffer, unsigned int output_sub_size,
13021- float const *decode_buffer,
13022- stbir__contributors const *horizontal_contributors,
13023- float const *horizontal_coefficients, int coefficient_width)
13024-{
13025- float const *output_end =
13026- output_buffer + output_sub_size * STBIR__horizontal_channels;
13027- float STBIR_SIMD_STREAMOUT_PTR(*) output = output_buffer;
13028- STBIR_SIMD_NO_UNROLL_LOOP_START
13029- do {
13030- float const *decode = decode_buffer + horizontal_contributors->n0 *
13031- STBIR__horizontal_channels;
13032- float const *hc = horizontal_coefficients;
13033- stbir__4_coeff_start();
13034- stbir__4_coeff_continue_from_4(4);
13035- stbir__4_coeff_continue_from_4(8);
13036- stbir__store_output();
13037- } while (output < output_end);
13038-}
13039-
13040-static void
13041-STBIR_chans(stbir__horizontal_gather_, _channels_with_n_coeffs_mod0)(
13042- float *output_buffer, unsigned int output_sub_size,
13043- float const *decode_buffer,
13044- stbir__contributors const *horizontal_contributors,
13045- float const *horizontal_coefficients, int coefficient_width)
13046-{
13047- float const *output_end =
13048- output_buffer + output_sub_size * STBIR__horizontal_channels;
13049- float STBIR_SIMD_STREAMOUT_PTR(*) output = output_buffer;
13050- STBIR_SIMD_NO_UNROLL_LOOP_START
13051- do {
13052- float const *decode = decode_buffer + horizontal_contributors->n0 *
13053- STBIR__horizontal_channels;
13054- int n =
13055- ((horizontal_contributors->n1 - horizontal_contributors->n0 + 1) -
13056- 4 + 3) >>
13057- 2;
13058- float const *hc = horizontal_coefficients;
13059-
13060- stbir__4_coeff_start();
13061- STBIR_SIMD_NO_UNROLL_LOOP_START
13062- do {
13063- hc += 4;
13064- decode += STBIR__horizontal_channels * 4;
13065- stbir__4_coeff_continue_from_4(0);
13066- --n;
13067- } while (n > 0);
13068- stbir__store_output();
13069- } while (output < output_end);
13070-}
13071-
13072-static void
13073-STBIR_chans(stbir__horizontal_gather_, _channels_with_n_coeffs_mod1)(
13074- float *output_buffer, unsigned int output_sub_size,
13075- float const *decode_buffer,
13076- stbir__contributors const *horizontal_contributors,
13077- float const *horizontal_coefficients, int coefficient_width)
13078-{
13079- float const *output_end =
13080- output_buffer + output_sub_size * STBIR__horizontal_channels;
13081- float STBIR_SIMD_STREAMOUT_PTR(*) output = output_buffer;
13082- STBIR_SIMD_NO_UNROLL_LOOP_START
13083- do {
13084- float const *decode = decode_buffer + horizontal_contributors->n0 *
13085- STBIR__horizontal_channels;
13086- int n =
13087- ((horizontal_contributors->n1 - horizontal_contributors->n0 + 1) -
13088- 5 + 3) >>
13089- 2;
13090- float const *hc = horizontal_coefficients;
13091-
13092- stbir__4_coeff_start();
13093- STBIR_SIMD_NO_UNROLL_LOOP_START
13094- do {
13095- hc += 4;
13096- decode += STBIR__horizontal_channels * 4;
13097- stbir__4_coeff_continue_from_4(0);
13098- --n;
13099- } while (n > 0);
13100- stbir__1_coeff_remnant(4);
13101- stbir__store_output();
13102- } while (output < output_end);
13103-}
13104-
13105-static void
13106-STBIR_chans(stbir__horizontal_gather_, _channels_with_n_coeffs_mod2)(
13107- float *output_buffer, unsigned int output_sub_size,
13108- float const *decode_buffer,
13109- stbir__contributors const *horizontal_contributors,
13110- float const *horizontal_coefficients, int coefficient_width)
13111-{
13112- float const *output_end =
13113- output_buffer + output_sub_size * STBIR__horizontal_channels;
13114- float STBIR_SIMD_STREAMOUT_PTR(*) output = output_buffer;
13115- STBIR_SIMD_NO_UNROLL_LOOP_START
13116- do {
13117- float const *decode = decode_buffer + horizontal_contributors->n0 *
13118- STBIR__horizontal_channels;
13119- int n =
13120- ((horizontal_contributors->n1 - horizontal_contributors->n0 + 1) -
13121- 6 + 3) >>
13122- 2;
13123- float const *hc = horizontal_coefficients;
13124-
13125- stbir__4_coeff_start();
13126- STBIR_SIMD_NO_UNROLL_LOOP_START
13127- do {
13128- hc += 4;
13129- decode += STBIR__horizontal_channels * 4;
13130- stbir__4_coeff_continue_from_4(0);
13131- --n;
13132- } while (n > 0);
13133- stbir__2_coeff_remnant(4);
13134-
13135- stbir__store_output();
13136- } while (output < output_end);
13137-}
13138-
13139-static void
13140-STBIR_chans(stbir__horizontal_gather_, _channels_with_n_coeffs_mod3)(
13141- float *output_buffer, unsigned int output_sub_size,
13142- float const *decode_buffer,
13143- stbir__contributors const *horizontal_contributors,
13144- float const *horizontal_coefficients, int coefficient_width)
13145-{
13146- float const *output_end =
13147- output_buffer + output_sub_size * STBIR__horizontal_channels;
13148- float STBIR_SIMD_STREAMOUT_PTR(*) output = output_buffer;
13149- stbir__3_coeff_setup();
13150- STBIR_SIMD_NO_UNROLL_LOOP_START
13151- do {
13152- float const *decode = decode_buffer + horizontal_contributors->n0 *
13153- STBIR__horizontal_channels;
13154- int n =
13155- ((horizontal_contributors->n1 - horizontal_contributors->n0 + 1) -
13156- 7 + 3) >>
13157- 2;
13158- float const *hc = horizontal_coefficients;
13159-
13160- stbir__4_coeff_start();
13161- STBIR_SIMD_NO_UNROLL_LOOP_START
13162- do {
13163- hc += 4;
13164- decode += STBIR__horizontal_channels * 4;
13165- stbir__4_coeff_continue_from_4(0);
13166- --n;
13167- } while (n > 0);
13168- stbir__3_coeff_remnant(4);
13169-
13170- stbir__store_output();
13171- } while (output < output_end);
13172-}
13173-
13174-static stbir__horizontal_gather_channels_func *
13175- STBIR_chans(stbir__horizontal_gather_, _channels_with_n_coeffs_funcs)[4] = {
13176- STBIR_chans(stbir__horizontal_gather_, _channels_with_n_coeffs_mod0),
13177- STBIR_chans(stbir__horizontal_gather_, _channels_with_n_coeffs_mod1),
13178- STBIR_chans(stbir__horizontal_gather_, _channels_with_n_coeffs_mod2),
13179- STBIR_chans(stbir__horizontal_gather_, _channels_with_n_coeffs_mod3),
13180-};
13181-
13182-static stbir__horizontal_gather_channels_func *
13183- STBIR_chans(stbir__horizontal_gather_, _channels_funcs)[12] = {
13184- STBIR_chans(stbir__horizontal_gather_, _channels_with_1_coeff),
13185- STBIR_chans(stbir__horizontal_gather_, _channels_with_2_coeffs),
13186- STBIR_chans(stbir__horizontal_gather_, _channels_with_3_coeffs),
13187- STBIR_chans(stbir__horizontal_gather_, _channels_with_4_coeffs),
13188- STBIR_chans(stbir__horizontal_gather_, _channels_with_5_coeffs),
13189- STBIR_chans(stbir__horizontal_gather_, _channels_with_6_coeffs),
13190- STBIR_chans(stbir__horizontal_gather_, _channels_with_7_coeffs),
13191- STBIR_chans(stbir__horizontal_gather_, _channels_with_8_coeffs),
13192- STBIR_chans(stbir__horizontal_gather_, _channels_with_9_coeffs),
13193- STBIR_chans(stbir__horizontal_gather_, _channels_with_10_coeffs),
13194- STBIR_chans(stbir__horizontal_gather_, _channels_with_11_coeffs),
13195- STBIR_chans(stbir__horizontal_gather_, _channels_with_12_coeffs),
13196-};
13197-
13198-#undef STBIR__horizontal_channels
13199-#undef STB_IMAGE_RESIZE_DO_HORIZONTALS
13200-#undef stbir__1_coeff_only
13201-#undef stbir__1_coeff_remnant
13202-#undef stbir__2_coeff_only
13203-#undef stbir__2_coeff_remnant
13204-#undef stbir__3_coeff_only
13205-#undef stbir__3_coeff_remnant
13206-#undef stbir__3_coeff_setup
13207-#undef stbir__4_coeff_start
13208-#undef stbir__4_coeff_continue_from_4
13209-#undef stbir__store_output
13210-#undef stbir__store_output_tiny
13211-#undef STBIR_chans
13212-
13213-#endif // HORIZONALS
13214-
13215-#undef STBIR_strs_join2
13216-#undef STBIR_strs_join1
13217-
13218-#endif // STB_IMAGE_RESIZE_DO_HORIZONTALS/VERTICALS/CODERS
13219-
13220-/*
13221-------------------------------------------------------------------------------
13222-This software is available under 2 licenses -- choose whichever you prefer.
13223-------------------------------------------------------------------------------
13224-ALTERNATIVE A - MIT License
13225-Copyright (c) 2017 Sean Barrett
13226-Permission is hereby granted, free of charge, to any person obtaining a copy of
13227-this software and associated documentation files (the "Software"), to deal in
13228-the Software without restriction, including without limitation the rights to
13229-use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
13230-of the Software, and to permit persons to whom the Software is furnished to do
13231-so, subject to the following conditions:
13232-The above copyright notice and this permission notice shall be included in all
13233-copies or substantial portions of the Software.
13234-THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
13235-IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
13236-FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
13237-AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
13238-LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
13239-OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
13240-SOFTWARE.
13241-------------------------------------------------------------------------------
13242-ALTERNATIVE B - Public Domain (www.unlicense.org)
13243-This is free and unencumbered software released into the public domain.
13244-Anyone is free to copy, modify, publish, use, compile, sell, or distribute this
13245-software, either in source code form or as a compiled binary, for any purpose,
13246-commercial or non-commercial, and by any means.
13247-In jurisdictions that recognize copyright laws, the author or authors of this
13248-software dedicate any and all copyright interest in the software to the public
13249-domain. We make this dedication for the benefit of the public at large and to
13250-the detriment of our heirs and successors. We intend this dedication to be an
13251-overt act of relinquishment in perpetuity of all present and future rights to
13252-this software under copyright law.
13253-THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
13254-IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
13255-FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
13256-AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
13257-ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
13258-WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
13259-------------------------------------------------------------------------------
13260-*/
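The removed gather kernels all share one piece of bookkeeping. Each output pixel starts with 4 coefficients (stbir__4_coeff_start), may continue in blocks of 4, and finishes with a remnant of 1 to 3 coefficients. The sketch below is a hypothetical scalar model of that bookkeeping, not code from stb_image_resize2.h. The helper names (pick_gather_kernel, gather_loop_iterations, gather_coeffs_touched) are made up for illustration. The selection rule in pick_gather_kernel is an assumption, because the code that chooses between the _channels_funcs[12] and _channels_with_n_coeffs_funcs[4] tables lives elsewhere in the header and is not part of this hunk. The loop count mirrors the `((n1 - n0 + 1) - (4 + k) + 3) >> 2` expression in the _n_coeffs_mod<k> variants.

```c
#include <assert.h>

/* Hypothetical scalar model of the coefficient bookkeeping in the
   removed horizontal gather kernels; not part of stb_image_resize2.h. */

/* ASSUMPTION: widths 1..12 index the fixed-width table at width - 1;
   wider filters index the n_coeffs table at width & 3. Sets *generic
   to 1 when the n_coeffs table would be used. */
static int pick_gather_kernel(int coeff_width, int *generic)
{
  assert(coeff_width >= 1);
  if (coeff_width <= 12) {
    *generic = 0;
    return coeff_width - 1;   /* _channels_with_<w>_coeff(s) */
  }
  *generic = 1;
  return coeff_width & 3;     /* _channels_with_n_coeffs_mod<k> */
}

/* Iterations of the inner do/while in _n_coeffs_mod<remnant>. The
   first 4 coefficients come from the start step and the last
   `remnant` from the remnant step, so the loop covers the rest in
   blocks of 4, rounded up. */
static int gather_loop_iterations(int contrib_count, int remnant)
{
  int rest = contrib_count - 4 - remnant;
  int n = rest > 0 ? (rest + 3) / 4 : 0;  /* ceil(rest / 4) */
  return n > 0 ? n : 1;                   /* do/while runs at least once */
}

/* Coefficients multiplied for one output pixel:
   4 (start) + 4 per loop iteration + remnant. */
static int gather_coeffs_touched(int contrib_count, int remnant)
{
  return 4 + 4 * gather_loop_iterations(contrib_count, remnant) + remnant;
}
```

With 13 contributors, for example, the assumed selector picks the mod1 kernel, whose loop runs twice: 4 + 2 * 4 + 1 = 13 coefficients. When a pixel has fewer contributors than the kernel was chosen for, the rounded-up loop touches a few extra coefficients. That is only harmless if those slots hold zeros, which is an assumption about how the coefficient rows are padded.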