optimization - How do I "clamp" a value into the range really fast in Delphi?


I have a lot of sample processing procedures:

    function Add8(A, B: Byte): Byte; {$IFDEF CODEINLINING} inline; {$ENDIF}
    begin
      Result := A + B;
    end;

    function Sub16(A, B: Word): Word; {$IFDEF CODEINLINING} inline; {$ENDIF}
    begin
      Result := A - B;
    end;

    { et cetera }

These functions are assigned to a data processor and applied to each input sample (millions of them). By design, the Result type should have the same size as the arguments (operands).

The problem arises when the result of an operation exceeds the bounds defined by Low(Result)..High(Result): the most significant bits are cut off to make the result valid. For example, adding a low value to a peak value, Add8(240, 22), cancels the peak; I would rather get 255 here. When subtracting two values near the baseline level, Sub16(32000, 33000), I would rather get 0.

My questions are: how do I clamp values into the range effectively, performance-wise? Is there a generic solution for all arithmetic operations and all base types (8 bit, 16 bit, unsigned, signed)?

Because you are working with large-scale data processing, I suggest trying some assembler: the MMX and SSE2 instruction sets are intended specifically for such operations. For example, PADDUSB can add 16 pairs of bytes at once with saturation (each byte result is clamped into range). Do not forget about proper alignment of the data.

Example (not fully tested). With the 32-bit compiler it works 9x faster than the Pascal version when processing a 256M array (604 vs. 5100 ms with 10 iterations). Note that a Pascal version written for the proper data size is itself quite fast.

    program Project1;
    {$APPTYPE CONSOLE}

    uses
      SysUtils;

    procedure AddBytesSat(const A, B, Res: PByteArray; Len: Integer);
    // Adds byte arrays with saturation: Res[i] := A[i] + B[i]
    // Arrays should be aligned to a 16-byte boundary, length divisible by 16
    // First three parameters arrive in eax, edx, ecx; the fourth is on the stack
    asm
      push esi             // preserve esi (restored by pop esi below)
      mov esi, ecx         // save Res pointer
      mov ecx, Len
      shr ecx, 4           // Len div 16
    @@Start:
      movdqa xmm0, [eax]   // copy 16 bytes (aligned) into an SSE register
      paddusb xmm0, [edx]  // add 16 unsigned bytes with saturation
      movdqa [esi], xmm0   // store the result to memory
      add eax, 16          // advance pointers
      add edx, 16
      add esi, 16
      loop @@Start         // next iteration
      pop esi
    end;

    var
      A, B, C: PByteArray;
      i: Integer;
    begin
      // ensure that the memory manager returns properly aligned blocks
      SetMinimumBlockAlignment(System.mba16Byte);
      GetMem(A, 32);
      GetMem(B, 32);
      GetMem(C, 32);
      for i := 0 to 31 do
      begin
        A[i] := 8 * i;
        B[i] := 200;
      end;
      AddBytesSat(A, B, C, 32);
      // show clamping
      for i := 0 to 15 do
        Writeln(C[i]);
      Readln;
    end.
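For comparison, the plain Pascal version used in the timing above is not shown. What follows is only a minimal sketch of one common scalar approach, not necessarily the one that was measured: do the arithmetic in a wider type (Integer), then clamp the result to the target type's Low..High range. The names Add8Sat, Sub16Sat and Add16SSat are hypothetical and do not come from the question. The same pattern covers signed and unsigned 8-bit and 16-bit operands, since Integer can hold every intermediate result.

```pascal
program ClampSketch;
{$APPTYPE CONSOLE}

// Hypothetical helper names, introduced for illustration only.

// Unsigned 8-bit add: the sum 0..510 fits in Integer, so clamp the upper bound.
function Add8Sat(A, B: Byte): Byte;
var
  T: Integer;
begin
  T := Integer(A) + Integer(B);
  if T > High(Byte) then
    T := High(Byte);
  Result := Byte(T);
end;

// Unsigned 16-bit subtract: the difference can only underflow, so clamp at 0.
function Sub16Sat(A, B: Word): Word;
var
  T: Integer;
begin
  T := Integer(A) - Integer(B);
  if T < 0 then
    T := 0;
  Result := Word(T);
end;

// Signed 16-bit add: the sum can overflow in either direction, so clamp both ends.
function Add16SSat(A, B: Smallint): Smallint;
var
  T: Integer;
begin
  T := Integer(A) + Integer(B);
  if T > High(Smallint) then
    T := High(Smallint)
  else if T < Low(Smallint) then
    T := Low(Smallint);
  Result := Smallint(T);
end;

begin
  Writeln(Add8Sat(240, 22));          // 255
  Writeln(Sub16Sat(32000, 33000));    // 0
  Writeln(Add16SSat(30000, 10000));   // 32767
  Writeln(Add16SSat(-30000, -10000)); // -32768
end.
```

This handles one sample per call with a branch, so it is the slow baseline that the SSE2 routine replaces for bulk data. It can still be handy for array tails whose length is not divisible by 16.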
