How to decode the Nintendo logo from gameboy? - device-emulation

I've tried to decode the following bitmap using the background palette scheme described at http://imrannazar.com/GameBoy-Emulation-in-JavaScript:-Graphics
CE ED 66 66 CC 0D 00 0B 03 73 00 83 00 0C 00 0D 00 08 11 1F 88 89 00
0E DC CC 6E E6 DD DD D9 99 BB BB 67 63 6E 0E EC CC DD DC 99 9F BB B9
33 3E
source: http://gbdev.gg8.se/wiki/articles/The_Cartridge_Header#0104-0133_-_Nintendo_Logo
But I only got something that resembles noise.
Which direction should I go in? Is it using compression? I can't find more information about this dump on the internet.
Best so far (20x zoom):

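There is no compression or encryption at all.
The logo is binary encoded: 1 is black and 0 is white/green (or whatever you want to call the background color of the Game Boy).
Simply put the hexadecimal digits in the correct order and then convert them to binary. Each pair of bytes describes a block 4 pixels wide and 4 rows tall, one nibble per row; the first 24 bytes form the top four rows and the last 24 bytes the bottom four.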
Hexadecimal:
C 6 C 0 0 0 0 0 0 1 8 0
E 6 C 0 3 0 0 0 0 1 8 0
E 6 0 0 7 8 0 0 0 1 8 0
D 6 D B 3 3 C D 8 F 9 E
D 6 D D B 6 6 E D 9 B 3
C E D 9 B 7 E C D 9 B 3
C E D 9 B 6 0 C D 9 B 3
C 6 D 9 B 3 E C C F 9 E
Binary:
1100 0110 1100 0000 0000 0000 0000 0000 0000 0001 1000 0000
1110 0110 1100 0000 0011 0000 0000 0000 0000 0001 1000 0000
1110 0110 0000 0000 0111 1000 0000 0000 0000 0001 1000 0000
1101 0110 1101 1011 0011 0011 1100 1101 1000 1111 1001 1110
1101 0110 1101 1101 1011 0110 0110 1110 1101 1001 1011 0011
1100 1110 1101 1001 1011 0111 1110 1100 1101 1001 1011 0011
1100 1110 1101 1001 1011 0110 0000 1100 1101 1001 1011 0011
1100 0110 1101 1001 1011 0011 1110 1100 1100 1111 1001 1110
There you go. Your Nintendo logo (0s shown as blanks, nibble separators removed):
11   11 11                             11
111  11 11        11                   11
111  11          1111                  11
11 1 11 11 11 11  11  1111  11 11   11111  1111
11 1 11 11 111 11 11 11  11 111 11 11  11 11  11
11  111 11 11  11 11 111111 11  11 11  11 11  11
11  111 11 11  11 11 11     11  11 11  11 11  11
11   11 11 11  11 11  11111 11  11  11111  1111
Using █ instead of 1:
██   ██ ██                             ██
███  ██ ██        ██                   ██
███  ██          ████                  ██
██ █ ██ ██ ██ ██  ██  ████  ██ ██   █████  ████
██ █ ██ ██ ███ ██ ██ ██  ██ ███ ██ ██  ██ ██  ██
██  ███ ██ ██  ██ ██ ██████ ██  ██ ██  ██ ██  ██
██  ███ ██ ██  ██ ██ ██     ██  ██ ██  ██ ██  ██
██   ██ ██ ██  ██ ██  █████ ██  ██  █████  ████

In addition to the answer by PBurggraf, here is a snippet of my code that I used to check my understanding of it.
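#include <cstdint>
#include <iostream>

static const uint8_t data[] = {
    0xCE, 0xED, 0x66, 0x66, 0xCC, 0x0D, 0x00, 0x0B, 0x03, 0x73, 0x00, 0x83,
    0x00, 0x0C, 0x00, 0x0D, 0x00, 0x08, 0x11, 0x1F, 0x88, 0x89, 0x00, 0x0E,
    0xDC, 0xCC, 0x6E, 0xE6, 0xDD, 0xDD, 0xD9, 0x99, 0xBB, 0xBB, 0x67, 0x63,
    0x6E, 0x0E, 0xEC, 0xCC, 0xDD, 0xDC, 0x99, 0x9F, 0xBB, 0xB9, 0x33, 0x3E,
};

int main()
{
    for (int y = 0; y < 8; ++y)
    {
        // Rows 0-3 read the first 24 bytes, rows 4-7 the last 24.
        // Within each half, rows 0-1 start at byte 0 and rows 2-3 at byte 1.
        int i = ((y / 2) % 2) + (y / 4) * 24;
        for (int x = 0; x < 12; ++x, i += 2)
        {
            // Even rows take the high nibble, odd rows the low nibble.
            const uint8_t nibble = (y % 2) ? (data[i] & 0xF) : (data[i] >> 4);
            // Print the nibble's four bits, most significant first.
            for (int b = 4; b--;) std::cout << (((nibble >> b) & 1) ? "*" : " ");
        }
        std::cout << std::endl;
    }
}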
It outputs:
**   ** **                             **
***  ** **        **                   **
***  **          ****                  **
** * ** ** ** **  **  ****  ** **   *****  ****
** * ** ** *** ** ** **  ** *** ** **  ** **  **
**  *** ** **  ** ** ****** **  ** **  ** **  **
**  *** ** **  ** ** **     **  ** **  ** **  **
**   ** ** **  ** **  ***** **  **  *****  ****
Hope it helps someone.

Clarification:
There is, in a sense, a kind of encryption/compression on the logo.
You must reorder the hex string (decrypt).
You must draw each bit 4 times (decompress), as pokechu22 said before.
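The two steps above can be sketched together in a few lines of Python. This is only an illustration: the function names logo_rows and scale2x are made up here, and the sketch assumes that "draw each bit 4 times" means each stored bit becomes a 2x2 block of pixels, which the answers above do not spell out.

```python
# The 48 logo bytes from the cartridge header, as quoted in the question.
LOGO = bytes.fromhex(
    "CEED6666CC0D000B03730083000C000D0008111F8889000E"
    "DCCC6EE6DDDDD999BBBB67636E0EECCCDDDC999FBBB9333E"
)

def logo_rows(data):
    """Reorder the nibbles into 8 rows of 48 '0'/'1' characters."""
    rows = []
    for y in range(8):
        # Rows 0-3 use the first 24 bytes, rows 4-7 the last 24.
        # Within a half, rows 0-1 use the even bytes, rows 2-3 the odd bytes.
        i = (y // 2) % 2 + (y // 4) * 24
        bits = ""
        for x in range(12):
            byte = data[i + 2 * x]
            # Even rows take the high nibble, odd rows the low nibble.
            nibble = (byte & 0xF) if y % 2 else (byte >> 4)
            bits += format(nibble, "04b")
        rows.append(bits)
    return rows

def scale2x(rows):
    """Assumed interpretation: every bit is drawn as a 2x2 block."""
    out = []
    for row in rows:
        wide = "".join(ch * 2 for ch in row)
        out.extend([wide, wide])
    return out

if __name__ == "__main__":
    for line in scale2x(logo_rows(LOGO)):
        print(line.replace("1", "█").replace("0", " "))
```

Calling logo_rows(LOGO) alone gives the unscaled 48x8 bitmap; scale2x turns it into 96x16.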

Related

VerifyError: Bad type on operand stack dropwizard

We upgraded the Java version to 11 in a microservice.
When we tried to run the app, we got the following message:
Caused by: java.lang.VerifyError: Bad type on operand stack
Exception Details:
Location:
com/template/main/App.initialize(Lio/dropwizard/setup/Bootstrap;)V #153: invokespecial
Reason:
Type 'io/dropwizard/configuration/EnvironmentVariableSubstitutor' (current frame, stack[4]) is not assignable to 'org/apache/commons/text/StrSubstitutor'
Current Frame:
bci: #153
flags: { }
locals: { 'com/template/main/App', 'io/dropwizard/setup/Bootstrap', 'com/bendb/dropwizard/jooq/JooqBundle' }
stack: { 'io/dropwizard/setup/Bootstrap', uninitialized 137, uninitialized 137, 'io/dropwizard/configuration/ConfigurationSourceProvider', 'io/dropwizard/configuration/EnvironmentVariableSubstitutor' }
Bytecode:
0000000: 2a2b b700 052a b600 064d 2b2c b600 072b
0000010: bb00 0859 b700 09b6 0007 2bbb 000a 59b7
0000020: 000b b600 0c2a b800 0d12 0eb6 000f bb00
0000030: 1059 b700 11b6 0012 bb00 1359 2cb7 0014
0000040: b600 12bb 0015 59b7 0016 b600 12bb 0017
0000050: 59b7 0018 b600 12bb 0019 59b7 001a b600
0000060: 12bb 001b 59b7 001c b600 1204 bd00 1d59
0000070: 0312 1e53 b600 1fb2 0020 b600 21b5 0022
0000080: 2b2a b400 22b6 0007 2bbb 0023 592b b600
0000090: 24bb 0025 5903 b700 26b7 0027 b600 282b
00000a0: bb00 2959 2ab7 002a b600 0c2b bb00 2b59
00000b0: 2ab7 002c b600 07b1
Any idea how can we fix it?

How to convert two bytes to floating point number

I have some legacy files that need to be mined for data. The files were created by Lotus 1-2-3 Release 4 for DOS. I'm trying to read the files faster by parsing the bytes rather than using Lotus to open the files.
Dim fileBytes() As Byte = My.Computer.FileSystem.ReadAllBytes(fiPath)
'I loop through all the data getting first/second bytes for each value
do ...
Dim FirstByte As Int16 = Convert.ToInt16(fileBytes(Index))
Dim SecondByte As Int16 = Convert.ToInt16(fileBytes(Index + 1))
loop ...
I can get integer values like this:
Dim value As Int16 = BitConverter.ToInt16(fileBytes, Index + 8) / 2
But floating-point numbers are more complicated. Only the smaller numbers are stored in two bytes. Larger values take 10 bytes, but that's another question; here we only have the smaller, two-byte values. Here are some sample values. I entered the byte values into Excel and used =DEC2BIN() to convert them to binary, padding with zeros on the left as needed to get 8 bits.
First byte  Second byte  =  Value  First byte (binary)  Second byte (binary)
7 241 = -1.2 0000 0111 1111 0001
254 255 = -1 1111 1110 1111 1111
9 156 = -0.8 0000 1001 1001 1100
9 181 = -0.6 0000 1001 1011 0101
9 206 = -0.4 0000 1001 1100 1110
9 231 = -0.2 0000 1001 1110 0111
13 0 = 0 0000 1101 0000 0000
137 12 = 0.1 1000 1001 0000 1100
9 25 = 0.2 0000 1001 0001 1001
137 37 = 0.3 1000 1001 0010 0101
9 50 = 0.4 0000 1001 0011 0010
15 2 = 0.5 0000 1111 0000 0010
9 75 = 0.6 0000 1001 0100 1011
137 87 = 0.7 1000 1001 0101 0111
9 100 = 0.8 0000 1001 0110 0100
137 112 = 0.9 1000 1001 0111 0000
2 0 = 1 0000 0010 0000 0000
199 13 = 1.1 1100 0111 0000 1101
7 15 = 1.2 0000 0111 0000 1111
71 16 = 1.3 0100 0111 0001 0000
135 17 = 1.4 1000 0111 0001 0001
15 6 = 1.5 0000 1111 0000 0110
7 20 = 1.6 0000 0111 0001 0100
71 21 = 1.7 0100 0111 0001 0101
135 22 = 1.8 1000 0111 0001 0110
199 23 = 1.9 1100 0111 0001 0111
4 0 = 2 0000 0100 0000 0000
I'm hoping for a simple conversion method. Or maybe it'll be more complicated.
I looked at BCD: "BCD was used in many early decimal computers, and is implemented in the instruction set of machines such as the IBM System/360 series" and Intel BCD opcode
I do not know if this is BCD or what it is. How do I convert the two bytes into a floating-point number?
I used the information from the website pointed out by Andrew Morton in comments. Basically the stored 16-bit quantity consists of either a 15-bit two's complement integer (when the lsb is 0) or a 12-bit two's complement integer plus a processing code indicating a scale factor to be applied to that integer (when the lsb is 1). I am not familiar with vb.net so am providing ISO-C code here. Program below successfully decodes all the data provided in the question.
Note: I am converting to an 8-byte double in code below, while the question suggests that the original conversion may have been to a 10-byte long double format (the 80-bit extended-precision format of the 8087 math coprocessor). It would seem like a good idea to try more test data to achieve full coverage of the eight scaling codes: Large integers like 1,000,000 and 1,000,000,000; decimal fractions like 0.0003, 0.000005, and 0.00000007; and binary fractions like 0.125 (1/8) and 0.046875 (3/64).
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
typedef struct {
uint8_t byte1;
uint8_t byte2;
} num;
num data[] =
{
{ 7, 241}, {254, 255}, { 9, 156}, { 9, 181}, { 9, 206}, { 9, 231},
{ 13, 0}, {137, 12}, { 9, 25}, {137, 37}, { 9, 50}, { 15, 2},
{ 9, 75}, {137, 87}, { 9, 100}, {137, 112}, { 2, 0}, {199, 13},
{ 7, 15}, { 71, 16}, {135, 17}, { 15, 6}, { 7, 20}, { 71, 21},
{135, 22}, {199, 23}, { 4, 0}
};
int data_count = sizeof (data) / sizeof (data[0]);
/* define operators that may look more familiar to vb.net programmers */
#define XOR ^
#define MOD %
int main (void)
{
int i;
uint8_t b1, b2;
uint16_t h, code;
int32_t n;
double r;
for (i = 0; i < data_count; i++) {
b1 = data[i].byte1;
b2 = data[i].byte2;
/* data word */
h = ((uint16_t)b2 * 256) + b1;
/* h<0>=1 indicates stored integer needs to be scaled */
if ((h MOD 2) == 1) {
/* extract scaling code in h<3:1> */
code = (h / 2) MOD 8;
/* scaled 12-bit integer in h<15:4>. Extract, sign-extend to 32 bits */
n = (int32_t)((((uint32_t)h / 16) XOR 2048) - 2048);
/* convert integer to floating-point */
r = (double)n;
/* scale based on scaling code */
switch (code) {
case 0x0: r = r * 5000; break;
case 0x1: r = r * 500; break;
case 0x2: r = r / 20; break;
case 0x3: r = r / 200; break;
case 0x4: r = r / 2000; break;
case 0x5: r = r / 20000; break;
case 0x6: r = r / 16; break;
case 0x7: r = r / 64; break;
};
} else {
/* unscaled 15-bit integer in h<15:1>. Extract, sign extend to 32 bits */
n = (int32_t)((((uint32_t)h / 2) XOR 16384) - 16384);
/* convert integer to floating-point */
r = (double)n;
}
printf ("[%3d,%3d] n=%08x r=% 12.8f\n", b1, b2, n, r);
}
return EXIT_SUCCESS;
}
The output of this program is as follows:
[ 7,241] n=ffffff10 r= -1.20000000
[254,255] n=ffffffff r= -1.00000000
[ 9,156] n=fffff9c0 r= -0.80000000
[ 9,181] n=fffffb50 r= -0.60000000
[ 9,206] n=fffffce0 r= -0.40000000
[ 9,231] n=fffffe70 r= -0.20000000
[ 13, 0] n=00000000 r= 0.00000000
[137, 12] n=000000c8 r= 0.10000000
[ 9, 25] n=00000190 r= 0.20000000
[137, 37] n=00000258 r= 0.30000000
[ 9, 50] n=00000320 r= 0.40000000
[ 15, 2] n=00000020 r= 0.50000000
[ 9, 75] n=000004b0 r= 0.60000000
[137, 87] n=00000578 r= 0.70000000
[ 9,100] n=00000640 r= 0.80000000
[137,112] n=00000708 r= 0.90000000
[ 2, 0] n=00000001 r= 1.00000000
[199, 13] n=000000dc r= 1.10000000
[ 7, 15] n=000000f0 r= 1.20000000
[ 71, 16] n=00000104 r= 1.30000000
[135, 17] n=00000118 r= 1.40000000
[ 15, 6] n=00000060 r= 1.50000000
[ 7, 20] n=00000140 r= 1.60000000
[ 71, 21] n=00000154 r= 1.70000000
[135, 22] n=00000168 r= 1.80000000
[199, 23] n=0000017c r= 1.90000000
[ 4, 0] n=00000002 r= 2.00000000
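As a cross-check, the same rules can be condensed into a single function. The sketch below is in Python rather than C; the names decode_word and SCALES are illustrative, and it assumes the eight scale factors used in the program above, which were only verified against the sample values from the question.

```python
# (multiplier, divisor) per scaling code 0..7, copied from the C switch above.
SCALES = [(5000, 1), (500, 1), (1, 20), (1, 200),
          (1, 2000), (1, 20000), (1, 16), (1, 64)]

def decode_word(b1, b2):
    """Decode one two-byte value; b1 is the first (low) byte, b2 the second."""
    h = b2 * 256 + b1
    if h & 1:
        # Bit 0 set: bits 3..1 hold the scaling code, bits 15..4 a signed
        # 12-bit integer (sign-extended with the XOR/subtract trick).
        code = (h >> 1) & 7
        n = ((h >> 4) ^ 2048) - 2048
        mul, div = SCALES[code]
        return n * mul / div
    # Bit 0 clear: bits 15..1 hold a signed 15-bit integer, unscaled.
    n = ((h >> 1) ^ 16384) - 16384
    return float(n)

if __name__ == "__main__":
    for pair in [(7, 241), (254, 255), (9, 25), (15, 2), (2, 0)]:
        print(pair, decode_word(*pair))
```

For the pair (9, 25), for instance, h = 25 * 256 + 9 = 6409, the low bit is 1, the scaling code is 4, the 12-bit integer is 400, and 400 / 2000 gives 0.2.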
Just a VB.Net translation of the C code posted by njuffa.
The original structure has been substituted with a Byte array and the numeric data type adapted to .Net types. That's all.
Dim data As Byte(,) = New Byte(,) {
{7, 241}, {254, 255}, {9, 156}, {9, 181}, {9, 206}, {9, 231}, {13, 0}, {137, 12}, {9, 25},
{137, 37}, {9, 50}, {15, 2}, {9, 75}, {137, 87}, {9, 100}, {137, 112}, {2, 0}, {199, 13},
{7, 15}, {71, 16}, {135, 17}, {15, 6}, {7, 20}, {71, 21}, {135, 22}, {199, 23}, {4, 0}
}
Dim byte1, byte2 As Byte
Dim word, code As UShort
Dim nValue As Integer
Dim result As Double
For i As Integer = 0 To (data.Length \ 2 - 1)
byte1 = data(i, 0)
byte2 = data(i, 1)
word = (byte2 * 256US) + byte1
If (word Mod 2) = 1 Then
code = (word \ 2US) Mod 8US
nValue = ((word \ 16) Xor 2048) - 2048
Select Case code
Case 0 : result = nValue * 5000
Case 1 : result = nValue * 500
Case 2 : result = nValue / 20
Case 3 : result = nValue / 200
Case 4 : result = nValue / 2000
Case 5 : result = nValue / 20000
Case 6 : result = nValue / 16
Case 7 : result = nValue / 64
End Select
Else
'unscaled 15-bit integer in word<15:1>: extract and sign-extend to 32 bits
nValue = ((word \ 2) Xor 16384) - 16384
result = nValue
End If
Console.WriteLine($"[{byte1,3:D}, {byte2,3:D}] number = {nValue:X8} result ={result,12:F8}")
Next

SQL query is not working (Error in rsqlite_send_query)

This is what the head of my data frame looks like
> head(d19_1)
SMZ SIZ1_diff SIZ1_base SIZ2_diff SIZ2_base SIZ3_diff SIZ3_base SIZ4_diff SIZ4_base SIZ5_diff SIZ5_base
1 1 -620 4170 -189 1347 -35 2040 82 1437 244 1533
2 2 -219 831 -57 255 -4 392 8 282 14 297
3 3 -426 834 -162 294 -134 379 -81 241 -22 221
4 4 -481 676 -142 216 -114 267 -50 158 -43 166
5 5 -233 1711 -109 584 54 913 71 624 74 707
6 6 -322 1539 -79 512 -50 799 23 532 63 576
Total_og Total_base %_SIZ1 %_SIZ2 %_SIZ3 %_SIZ4 %_SIZ5 Total_og Total_base
1 11980 12648 14.86811 14.03118 1.715686 5.706333 15.916504 11980 12648
2 2156 2415 26.35379 22.35294 1.020408 2.836879 4.713805 2156 2415
3 1367 2314 51.07914 55.10204 35.356201 33.609959 9.954751 1367 2314
4 790 1736 71.15385 65.74074 42.696629 31.645570 25.903614 790 1736
5 5339 5496 13.61777 18.66438 5.914567 11.378205 10.466761 5339 5496
6 4362 4747 20.92268 15.42969 6.257822 4.323308 10.937500 4362 4747
The structure of the data frame, from str(d19_1), is as follows:
> str(d19_1)
'data.frame': 1588 obs. of 20 variables:
$ SMZ : int 1 2 3 4 5 6 7 8 9 10 ...
$ SIZ1_diff : int -620 -219 -426 -481 -233 -322 -176 -112 -34 -103 ...
$ SIZ1_base : int 4170 831 834 676 1711 1539 720 1396 998 1392 ...
$ SIZ2_diff : int -189 -57 -162 -142 -109 -79 -12 72 -36 -33 ...
$ SIZ2_base : int 1347 255 294 216 584 512 196 437 343 479 ...
$ SIZ3_diff : int -35 -4 -134 -114 54 -50 16 4 26 83 ...
$ SIZ3_base : int 2040 392 379 267 913 799 361 804 566 725 ...
$ SIZ4_diff : int 82 8 -81 -50 71 23 36 127 46 75 ...
$ SIZ4_base : int 1437 282 241 158 624 532 242 471 363 509 ...
$ SIZ5_diff : int 244 14 -22 -43 74 63 11 143 79 125 ...
$ SIZ5_base : int 1533 297 221 166 707 576 263 582 429 536 ...
$ Total_og : int 11980 2156 1367 790 5339 4362 2027 4715 3465 4561 ...
$ Total_base: int 12648 2415 2314 1736 5496 4747 2168 4464 3278 4375 ...
$ %_SIZ1 : num 14.9 26.4 51.1 71.2 13.6 ...
$ %_SIZ2 : num 14 22.4 55.1 65.7 18.7 ...
$ %_SIZ3 : num 1.72 1.02 35.36 42.7 5.91 ...
$ %_SIZ4 : num 5.71 2.84 33.61 31.65 11.38 ...
$ %_SIZ5 : num 15.92 4.71 9.95 25.9 10.47 ...
$ Total_og : int 11980 2156 1367 790 5339 4362 2027 4715 3465 4561 ...
$ Total_base: int 12648 2415 2314 1736 5496 4747 2168 4464 3278 4375 ...
When I run the query below, it returns the error below, and I don't know why. I don't have any column named <NA> in the table.
Query
d20_1 <- sqldf('SELECT *, CASE
WHEN SMZ BETWEEN 1 AND 110 THEN "Baltimore City"
WHEN SMZ BETWEEN 111 AND 217 THEN "Anne Arundel County"
WHEN SMZ BETWEEN 218 AND 405 THEN "Baltimore County"
WHEN SMZ BETWEEN 406 AND 453 THEN "Carroll County"
WHEN SMZ BETWEEN 454 AND 524 THEN "Harford County"
WHEN SMZ BETWEEN 1667 AND 1674 THEN "York County"
ELSE 0
END Jurisdiction
FROM d19_1')
Error:
Error in rsqlite_send_query(conn#ptr, statement) :
table d19_1 has no column named <NA>
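Your code works correctly for me with the data below. Note that its column names are unique and syntactic (X._SIZ1, Total_og.1, and so on), unlike the duplicated Total_og and Total_base in your str() output, so the duplicate names may be what triggers the error: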
d19_1 <- structure(list(SMZ = 1:6, SIZ1_diff = c(-620L, -219L, -426L,
-481L, -233L, -322L), SIZ1_base = c(4170L, 831L, 834L, 676L,
1711L, 1539L), SIZ2_diff = c(-189L, -57L, -162L, -142L, -109L,
-79L), SIZ2_base = c(1347L, 255L, 294L, 216L, 584L, 512L), SIZ3_diff = c(-35L,
-4L, -134L, -114L, 54L, -50L), SIZ3_base = c(2040L, 392L, 379L,
267L, 913L, 799L), SIZ4_diff = c(82L, 8L, -81L, -50L, 71L, 23L
), SIZ4_base = c(1437L, 282L, 241L, 158L, 624L, 532L), SIZ5_diff = c(244L,
14L, -22L, -43L, 74L, 63L), SIZ5_base = c(1533L, 297L, 221L,
166L, 707L, 576L), Total_og = c(11980L, 2156L, 1367L, 790L, 5339L,
4362L), Total_base = c(12648L, 2415L, 2314L, 1736L, 5496L, 4747L
), X._SIZ1 = c(14.86811, 26.35379, 51.07914, 71.15385, 13.61777,
20.92268), X._SIZ2 = c(14.03118, 22.35294, 55.10204, 65.74074,
18.66438, 15.42969), X._SIZ3 = c(1.715686, 1.020408, 35.356201,
42.696629, 5.914567, 6.257822), X._SIZ4 = c(5.706333, 2.836879,
33.609959, 31.64557, 11.378205, 4.323308), X._SIZ5 = c(15.916504,
4.713805, 9.954751, 25.903614, 10.466761, 10.9375), Total_og.1 = c(11980L,
2156L, 1367L, 790L, 5339L, 4362L), Total_base.1 = c(12648L, 2415L,
2314L, 1736L, 5496L, 4747L)), .Names = c("SMZ", "SIZ1_diff",
"SIZ1_base", "SIZ2_diff", "SIZ2_base", "SIZ3_diff", "SIZ3_base",
"SIZ4_diff", "SIZ4_base", "SIZ5_diff", "SIZ5_base", "Total_og",
"Total_base", "X._SIZ1", "X._SIZ2", "X._SIZ3", "X._SIZ4", "X._SIZ5",
"Total_og.1", "Total_base.1"), row.names = c(NA, -6L), class = "data.frame")
library(sqldf)
sqldf('SELECT *, CASE
WHEN SMZ BETWEEN 1 AND 110 THEN "Baltimore City"
WHEN SMZ BETWEEN 111 AND 217 THEN "Anne Arundel County"
WHEN SMZ BETWEEN 218 AND 405 THEN "Baltimore County"
WHEN SMZ BETWEEN 406 AND 453 THEN "Carroll County"
WHEN SMZ BETWEEN 454 AND 524 THEN "Harford County"
WHEN SMZ BETWEEN 1667 AND 1674 THEN "York County"
ELSE 0
END Jurisdiction
FROM d19_1')

Can't get Svg to show up in react-native with react-native-svg or Svg/expo

I am not sure exactly why this is not showing up. I suspect maybe it has to do with the transform?
<Svg.G id="startup" stroke="none" strokeWidth="1" fill="none" fillRule="evenodd" transform="translate(135.000000, 76.000000)">
But I really don't know... I'm seeing "invalid prop transform"
As I copied the SVG code from Sketch, I suspect I'm having trouble similar to this issue:
https://github.com/react-native-community/react-native-svg/issues/205
Code is below. Any ideas?
import React from 'react';
import { Svg } from 'expo';
const RocketSvg = () => {
return (
<Svg width="108px" height="128px" viewBox="135 76 108 128" version="1.1">
<Svg.G id="startup" stroke="none" strokeWidth="1" fill="none" fillRule="evenodd" transform="translate(135.000000, 76.000000)">
<Svg.Ellipse id="Oval" fill="#DCDDDE" cx="54" cy="75" rx="52" ry="51"></Svg.Ellipse>
<Svg.Path d="M81.764,76.608 C79.316,75.5 75.758,74.198 72,73.81 L72,61.414 C77.408,64.05 80.866,70.118 81.764,76.608 Z" id="Shape" fill="#FFFFFF"></Svg.Path>
<Svg.Path d="M26.236,76.608 C27.134,70.118 30.592,64.052 36,61.412 L36,73.804 C32.242,74.194 28.682,75.498 26.236,76.608 Z" id="Shape" fill="#FFFFFF"></Svg.Path>
<Svg.Path d="M40,76 L40,42 C40,37.68 40.562,33.678 41.448,30 L66.556,30 C67.44,33.676 68,37.676 68,42 L68,76 L66,76 L42,76 L40,76 Z" id="Shape" fill="#FFFFFF"></Svg.Path>
<Svg.Polygon id="Shape" fill="#FFFFFF" points="46 84 46 80 62 80 62 84 60 84 48 84"></Svg.Polygon>
<Svg.Path d="M70,78 L70,42 C70,18 54,2 54,2 L54,78 L70,78 Z" id="Shape" fill="#DCDDDE"></Svg.Path>
<Svg.Circle id="Oval" fill="#53B7E8" cx="54" cy="42" r="6"></Svg.Circle>
<Svg.Path d="M98.8,100.76 C96.66,104.34 94.04,107.7 90.96,110.78 C88.58,113.16 86.02,115.26 83.32,117.08 C65.74,129.02 42.46,128.98 24.92,116.96 C22.28,115.18 19.78,113.12 17.44,110.78 C14.26,107.6 11.56,104.1 9.38,100.4 C11.4,96.6 15.4,94 20,94 C25.48,94 30.1,97.7 31.54,102.74 C33.3,99.9 36.42,98 40,98 C44.84,98 48.88,101.44 49.8,106 L50,106 L50,86 L58,86 L58,106 L58.2,106 C59.12,101.44 63.16,98 68,98 C71.58,98 74.7,99.9 76.46,102.74 C77.88,97.7 82.5,94 88,94 C92.74,94 96.84,96.76 98.8,100.76 Z" id="Shape" fill="#FFD768"></Svg.Path>
<Svg.Rect id="Rectangle-path" fill="#DCDDDE" x="54" y="80" width="8" height="4"></Svg.Rect>
<Svg.Path d="M91.414,36.586 L90,35.172 L87.172,38 L88.586,39.414 C98.67,49.5 104,61.458 104,74 C104,82.056 102.096,89.812 98.522,96.778 C95.9,93.776 92.12,92 88,92 C82.948,92 78.354,94.752 75.894,98.988 C73.742,97.08 70.96,96 68,96 C64.966,96 62.144,97.148 60,99.062 L60,88 L66,88 L66,80 L72,80 L72,77.844 C77.584,78.546 82.886,81.664 82.966,81.714 L86,83.538 L86,80 C86,70.238 80.864,60.222 72,57.052 L72,42 C72,17.454 56.092,2.09 55.414,1.414 L54,0 L52.586,1.414 C51.908,2.09 36,17.454 36,42 L36,57.052 C27.136,60.222 22,70.238 22,80 L22.002,83.538 L25.034,81.714 C25.116,81.664 30.418,78.542 36,77.842 L36,80 L42,80 L42,88 L48,88 L48,99.062 C45.856,97.148 43.034,96 40,96 C37.036,96 34.252,97.082 32.1,98.994 C29.628,94.754 25.036,92 20,92 C15.91,92 12.116,93.794 9.486,96.794 C5.906,89.824 4,82.06 4,73.998 C4,61.458 9.33,49.5 19.414,39.414 L20.828,38 L18,35.172 L16.586,36.586 C5.736,47.438 0,60.376 0,73.998 C0,88.424 5.618,101.984 15.816,112.184 C26.344,122.71 40.172,128 54,128 C67.828,128 81.656,122.71 92.184,112.184 C102.382,101.984 108,88.424 108,74 C108,60.376 102.264,47.438 91.414,36.586 Z M81.764,76.608 C79.316,75.5 75.758,74.198 72,73.81 L72,61.414 C77.408,64.05 80.866,70.118 81.764,76.608 Z M54.002,5.824 C56.604,8.896 62.168,15.912 65.444,26 L42.566,26 C45.844,15.928 51.404,8.902 54.002,5.824 Z M26.236,76.608 C27.134,70.118 30.592,64.052 36,61.412 L36,73.804 C32.242,74.194 28.682,75.498 26.236,76.608 Z M40,76 L40,42 C40,37.68 40.562,33.678 41.448,30 L66.556,30 C67.44,33.676 68,37.676 68,42 L68,76 L66,76 L42,76 L40,76 Z M46,84 L46,80 L62,80 L62,84 L60,84 L48,84 L46,84 Z M89.356,109.356 C69.86,128.85 38.14,128.85 18.646,109.356 C15.952,106.662 13.616,103.706 11.634,100.558 C13.48,97.732 16.598,96 20,96 C24.436,96 28.39,98.998 29.618,103.29 L30.86,107.638 L33.24,103.794 C34.712,101.418 37.24,100 40,100 C43.794,100 47.092,102.69 47.838,106.394 L48.164,108 L52,108 L52,88 L56,88 L56,108 L59.836,108 L60.16,106.394 C60.908,102.69 64.206,100 68,100 C70.76,100 
73.288,101.418 74.76,103.792 L77.154,107.656 L78.386,103.282 C79.592,98.994 83.546,96 88,96 C91.426,96 94.528,97.718 96.372,100.55 C94.39,103.7 92.052,106.658 89.356,109.356 Z" id="Shape" fill="#0C3847"></Svg.Path>
<Svg.Path d="M54,34 C49.588,34 46,37.588 46,42 C46,46.412 49.588,50 54,50 C58.412,50 62,46.412 62,42 C62,37.588 58.412,34 54,34 Z M54,46 C51.794,46 50,44.206 50,42 C50,39.794 51.794,38 54,38 C56.206,38 58,39.794 58,42 C58,44.206 56.206,46 54,46 Z" id="Shape" fill="#0C3847"></Svg.Path>
<Svg.Polygon id="Shape" fill="#0C3847" points="78 16 82 16 82 12 86 12 86 8 82 8 82 4 78 4 78 8 74 8 74 12 78 12"></Svg.Polygon>
<Svg.Polygon id="Shape" fill="#0C3847" points="14 26 18 26 18 22 22 22 22 18 18 18 18 14 14 14 14 18 10 18 10 22 14 22"></Svg.Polygon>
<Svg.Polygon id="Shape" fill="#0C3847" points="96 32 100 32 100 28 104 28 104 24 100 24 100 20 96 20 96 24 92 24 92 28 96 28"></Svg.Polygon>
<Svg.Polygon id="Shape" fill="#0C3847" points="84 46 84 48 82 48 82 52 84 52 84 54 88 54 88 52 90 52 90 48 88 48 88 46"></Svg.Polygon>
<Svg.Polygon id="Shape" fill="#0C3847" points="26 44 30 44 30 42 32 42 32 38 30 38 30 36 26 36 26 38 24 38 24 42 26 42"></Svg.Polygon>
<Svg.Polygon id="Shape" fill="#0C3847" points="26 8 30 8 30 6 32 6 32 2 30 2 30 0 26 0 26 2 24 2 24 6 26 6"></Svg.Polygon>
<Svg.Path d="M54.002,5.824 C56.604,8.896 62.168,15.912 65.444,26 L42.566,26 C45.844,15.928 51.404,8.902 54.002,5.824 Z" id="Shape" fill="#F37053"></Svg.Path>
</Svg.G>
</Svg>
);
}
Answer regarding transform
The transform prop is not currently supported by react-native-svg (as of 4 June 2017).
However, you can use the x and y props instead:
<Svg.G x="135.000000" y="76.000000" id="startup" stroke="none" strokeWidth="1" fill="none" fillRule="evenodd">
Fix for the SVG
However, the SVG you pasted still does not seem to render.
I first converted it back to SVG and re-exported it with Sketch.
// logo.svg
<?xml version="1.0" encoding="UTF-8"?>
<svg width="108px" height="128px" viewBox="0 0 108 128" version="1.1" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
<!-- Generator: Sketch 44.1 (41455) - http://www.bohemiancoding.com/sketch -->
<title>Slice 1</title>
<desc>Created with Sketch.</desc>
<defs></defs>
<g id="Page-1" stroke="none" stroke-width="1" fill="none" fill-rule="evenodd">
<g id="so" fill-rule="nonzero">
<ellipse id="Oval" fill="#DCDDDE" cx="54" cy="75" rx="52" ry="51"></ellipse>
<path d="M81.764,76.608 C79.316,75.5 75.758,74.198 72,73.81 L72,61.414 C77.408,64.05 80.866,70.118 81.764,76.608 Z" id="Shape" fill="#FFFFFF"></path>
<path d="M26.236,76.608 C27.134,70.118 30.592,64.052 36,61.412 L36,73.804 C32.242,74.194 28.682,75.498 26.236,76.608 Z" id="Shape" fill="#FFFFFF"></path>
<path d="M40,76 L40,42 C40,37.68 40.562,33.678 41.448,30 L66.556,30 C67.44,33.676 68,37.676 68,42 L68,76 L66,76 L42,76 L40,76 Z" id="Shape" fill="#FFFFFF"></path>
<polygon id="Shape" fill="#FFFFFF" points="46 84 46 80 62 80 62 84 60 84 48 84"></polygon>
<path d="M70,78 L70,42 C70,18 54,2 54,2 L54,78 L70,78 Z" id="Shape" fill="#DCDDDE"></path>
<circle id="Oval" fill="#53B7E8" cx="54" cy="42" r="6"></circle>
<path d="M98.8,100.76 C96.66,104.34 94.04,107.7 90.96,110.78 C88.58,113.16 86.02,115.26 83.32,117.08 C65.74,129.02 42.46,128.98 24.92,116.96 C22.28,115.18 19.78,113.12 17.44,110.78 C14.26,107.6 11.56,104.1 9.38,100.4 C11.4,96.6 15.4,94 20,94 C25.48,94 30.1,97.7 31.54,102.74 C33.3,99.9 36.42,98 40,98 C44.84,98 48.88,101.44 49.8,106 L50,106 L50,86 L58,86 L58,106 L58.2,106 C59.12,101.44 63.16,98 68,98 C71.58,98 74.7,99.9 76.46,102.74 C77.88,97.7 82.5,94 88,94 C92.74,94 96.84,96.76 98.8,100.76 Z" id="Shape" fill="#FFD768"></path>
<rect id="Rectangle-path" fill="#DCDDDE" x="54" y="80" width="8" height="4"></rect>
<path d="M91.414,36.586 L90,35.172 L87.172,38 L88.586,39.414 C98.67,49.5 104,61.458 104,74 C104,82.056 102.096,89.812 98.522,96.778 C95.9,93.776 92.12,92 88,92 C82.948,92 78.354,94.752 75.894,98.988 C73.742,97.08 70.96,96 68,96 C64.966,96 62.144,97.148 60,99.062 L60,88 L66,88 L66,80 L72,80 L72,77.844 C77.584,78.546 82.886,81.664 82.966,81.714 L86,83.538 L86,80 C86,70.238 80.864,60.222 72,57.052 L72,42 C72,17.454 56.092,2.09 55.414,1.414 L54,0 L52.586,1.414 C51.908,2.09 36,17.454 36,42 L36,57.052 C27.136,60.222 22,70.238 22,80 L22.002,83.538 L25.034,81.714 C25.116,81.664 30.418,78.542 36,77.842 L36,80 L42,80 L42,88 L48,88 L48,99.062 C45.856,97.148 43.034,96 40,96 C37.036,96 34.252,97.082 32.1,98.994 C29.628,94.754 25.036,92 20,92 C15.91,92 12.116,93.794 9.486,96.794 C5.906,89.824 4,82.06 4,73.998 C4,61.458 9.33,49.5 19.414,39.414 L20.828,38 L18,35.172 L16.586,36.586 C5.736,47.438 0,60.376 0,73.998 C0,88.424 5.618,101.984 15.816,112.184 C26.344,122.71 40.172,128 54,128 C67.828,128 81.656,122.71 92.184,112.184 C102.382,101.984 108,88.424 108,74 C108,60.376 102.264,47.438 91.414,36.586 Z M81.764,76.608 C79.316,75.5 75.758,74.198 72,73.81 L72,61.414 C77.408,64.05 80.866,70.118 81.764,76.608 Z M54.002,5.824 C56.604,8.896 62.168,15.912 65.444,26 L42.566,26 C45.844,15.928 51.404,8.902 54.002,5.824 Z M26.236,76.608 C27.134,70.118 30.592,64.052 36,61.412 L36,73.804 C32.242,74.194 28.682,75.498 26.236,76.608 Z M40,76 L40,42 C40,37.68 40.562,33.678 41.448,30 L66.556,30 C67.44,33.676 68,37.676 68,42 L68,76 L66,76 L42,76 L40,76 Z M46,84 L46,80 L62,80 L62,84 L60,84 L48,84 L46,84 Z M89.356,109.356 C69.86,128.85 38.14,128.85 18.646,109.356 C15.952,106.662 13.616,103.706 11.634,100.558 C13.48,97.732 16.598,96 20,96 C24.436,96 28.39,98.998 29.618,103.29 L30.86,107.638 L33.24,103.794 C34.712,101.418 37.24,100 40,100 C43.794,100 47.092,102.69 47.838,106.394 L48.164,108 L52,108 L52,88 L56,88 L56,108 L59.836,108 L60.16,106.394 C60.908,102.69 64.206,100 68,100 C70.76,100 73.288,101.418 
74.76,103.792 L77.154,107.656 L78.386,103.282 C79.592,98.994 83.546,96 88,96 C91.426,96 94.528,97.718 96.372,100.55 C94.39,103.7 92.052,106.658 89.356,109.356 Z" id="Shape" fill="#0C3847"></path>
<path d="M54,34 C49.588,34 46,37.588 46,42 C46,46.412 49.588,50 54,50 C58.412,50 62,46.412 62,42 C62,37.588 58.412,34 54,34 Z M54,46 C51.794,46 50,44.206 50,42 C50,39.794 51.794,38 54,38 C56.206,38 58,39.794 58,42 C58,44.206 56.206,46 54,46 Z" id="Shape" fill="#0C3847"></path>
<polygon id="Shape" fill="#0C3847" points="78 16 82 16 82 12 86 12 86 8 82 8 82 4 78 4 78 8 74 8 74 12 78 12"></polygon>
<polygon id="Shape" fill="#0C3847" points="14 26 18 26 18 22 22 22 22 18 18 18 18 14 14 14 14 18 10 18 10 22 14 22"></polygon>
<polygon id="Shape" fill="#0C3847" points="96 32 100 32 100 28 104 28 104 24 100 24 100 20 96 20 96 24 92 24 92 28 96 28"></polygon>
<polygon id="Shape" fill="#0C3847" points="84 46 84 48 82 48 82 52 84 52 84 54 88 54 88 52 90 52 90 48 88 48 88 46"></polygon>
<polygon id="Shape" fill="#0C3847" points="26 44 30 44 30 42 32 42 32 38 30 38 30 36 26 36 26 38 24 38 24 42 26 42"></polygon>
<polygon id="Shape" fill="#0C3847" points="26 8 30 8 30 6 32 6 32 2 30 2 30 0 26 0 26 2 24 2 24 6 26 6"></polygon>
<path d="M54.002,5.824 C56.604,8.896 62.168,15.912 65.444,26 L42.566,26 C45.844,15.928 51.404,8.902 54.002,5.824 Z" id="Shape" fill="#F37053"></path>
</g>
</g>
</svg>
Then I used msvgc to convert it to a react-native-svg compatible component. Lastly, I made changes so it works with import { Svg } from 'expo'. You'll notice that there is no longer a transform anyway, so your original question no longer applies.
// Logo.js
import React from 'react'
import { Svg } from 'expo'
const logo = props => (
<Svg width={props.width || 108} height={props.height || 128} viewBox="0 0 108 128">
<Svg.G fillRule="nonzero" fill="none">
<Svg.Ellipse fill="#DCDDDE" cx="54" cy="75" rx="52" ry="51"></Svg.Ellipse>
<Svg.Path d="M81.764 76.608C79.316 75.5 75.758 74.198 72 73.81V61.414c5.408 2.636 8.866 8.704 9.764 15.194zM26.236 76.608c.898-6.49 4.356-12.556 9.764-15.196v12.392c-3.758.39-7.318 1.694-9.764 2.804zM40 76V42c0-4.32.562-8.322 1.448-12h25.108A51.108 51.108 0 0 1 68 42v34H40zM46 84v-4h16v4H48z" fill="#FFF"></Svg.Path>
<Svg.Path d="M70 78V42C70 18 54 2 54 2v76h16z" fill="#DCDDDE"></Svg.Path>
<Svg.Circle fill="#53B7E8" cx="54" cy="42" r="6"></Svg.Circle>
<Svg.Path d="M98.8 100.76a51.654 51.654 0 0 1-7.84 10.02 51.539 51.539 0 0 1-7.64 6.3c-17.58 11.94-40.86 11.9-58.4-.12a50.5 50.5 0 0 1-7.48-6.18 52.123 52.123 0 0 1-8.06-10.38C11.4 96.6 15.4 94 20 94c5.48 0 10.1 3.7 11.54 8.74C33.3 99.9 36.42 98 40 98a10 10 0 0 1 9.8 8h.2V86h8v20h.2a10 10 0 0 1 9.8-8c3.58 0 6.7 1.9 8.46 4.74C77.88 97.7 82.5 94 88 94c4.74 0 8.84 2.76 10.8 6.76z" fill="#FFD768"></Svg.Path>
<Svg.Path fill="#DCDDDE" d="M54 80h8v4h-8z"></Svg.Path>
<Svg.Path d="M91.414 36.586L90 35.172 87.172 38l1.414 1.414C98.67 49.5 104 61.458 104 74c0 8.056-1.904 15.812-5.478 22.778A13.905 13.905 0 0 0 88 92c-5.052 0-9.646 2.752-12.106 6.988A11.857 11.857 0 0 0 68 96a12.009 12.009 0 0 0-8 3.062V88h6v-8h6v-2.156c5.584.702 10.886 3.82 10.966 3.87L86 83.538V80c0-9.762-5.136-19.778-14-22.948V42C72 17.454 56.092 2.09 55.414 1.414L54 0l-1.414 1.414C51.908 2.09 36 17.454 36 42v15.052C27.136 60.222 22 70.238 22 80l.002 3.538 3.032-1.824c.082-.05 5.384-3.172 10.966-3.872V80h6v8h6v11.062A12.009 12.009 0 0 0 40 96a11.85 11.85 0 0 0-7.9 2.994C29.628 94.754 25.036 92 20 92c-4.09 0-7.884 1.794-10.514 4.794C5.906 89.824 4 82.06 4 73.998 4 61.458 9.33 49.5 19.414 39.414L20.828 38 18 35.172l-1.414 1.414C5.736 47.438 0 60.376 0 73.998c0 14.426 5.618 27.986 15.816 38.186C26.344 122.71 40.172 128 54 128c13.828 0 27.656-5.29 38.184-15.816C102.382 101.984 108 88.424 108 74c0-13.624-5.736-26.562-16.586-37.414zm-9.65 40.022C79.316 75.5 75.758 74.198 72 73.81V61.414c5.408 2.636 8.866 8.704 9.764 15.194zM54.002 5.824C56.604 8.896 62.168 15.912 65.444 26H42.566c3.278-10.072 8.838-17.098 11.436-20.176zM26.236 76.608c.898-6.49 4.356-12.556 9.764-15.196v12.392c-3.758.39-7.318 1.694-9.764 2.804zM40 76V42c0-4.32.562-8.322 1.448-12h25.108A51.108 51.108 0 0 1 68 42v34H40zm6 8v-4h16v4H46zm43.356 25.356c-19.496 19.494-51.216 19.494-70.71 0a50.398 50.398 0 0 1-7.012-8.798C13.48 97.732 16.598 96 20 96c4.436 0 8.39 2.998 9.618 7.29l1.242 4.348 2.38-3.844c1.472-2.376 4-3.794 6.76-3.794 3.794 0 7.092 2.69 7.838 6.394l.326 1.606H52V88h4v20h3.836l.324-1.606C60.908 102.69 64.206 100 68 100c2.76 0 5.288 1.418 6.76 3.792l2.394 3.864 1.232-4.374C79.592 98.994 83.546 96 88 96c3.426 0 6.528 1.718 8.372 4.55a50.456 50.456 0 0 1-7.016 8.806z" fill="#0C3847"></Svg.Path>
<Svg.Path d="M54 34c-4.412 0-8 3.588-8 8s3.588 8 8 8 8-3.588 8-8-3.588-8-8-8zm0 12c-2.206 0-4-1.794-4-4s1.794-4 4-4 4 1.794 4 4-1.794 4-4 4zM78 16h4v-4h4V8h-4V4h-4v4h-4v4h4zM14 26h4v-4h4v-4h-4v-4h-4v4h-4v4h4zM96 32h4v-4h4v-4h-4v-4h-4v4h-4v4h4zM84 46v2h-2v4h2v2h4v-2h2v-4h-2v-2zM26 44h4v-2h2v-4h-2v-2h-4v2h-2v4h2zM26 8h4V6h2V2h-2V0h-4v2h-2v4h2z" fill="#0C3847"></Svg.Path>
<Svg.Path d="M54.002 5.824C56.604 8.896 62.168 15.912 65.444 26H42.566c3.278-10.072 8.838-17.098 11.436-20.176z" fill="#F37053"></Svg.Path>
</Svg.G>
</Svg>
)
export default logo

What is the fastest way for adding the vector elements horizontally in odd order?

Following this question, I implemented horizontal addition again, this time over groups of 5 and of 7 elements. It produces correct results, but it is not fast enough.
Can it be made faster? I tried hadd and other instructions, but the improvement is limited. For example, using _mm256_bsrli_epi128 is slightly better, but it then needs an extra permutation because of the 128-bit lanes, which cancels out the benefit. So the question is how this should be implemented to gain more performance. The same applies to 9 elements, etc.
This adds 5 elements horizontally and puts the results in places 0, 5, and 10:
// puts the results in places 0, 5, and 10
inline __m256i _mm256_hadd5x5_epi16(__m256i a )
{
__m256i a1, a2, a3, a4;
a1 = _mm256_alignr_epi8(_mm256_permute2x128_si256(a, _mm256_setzero_si256(), 0x31), a, 1 * 2);
a2 = _mm256_alignr_epi8(_mm256_permute2x128_si256(a, _mm256_setzero_si256(), 0x31), a, 2 * 2);
a3 = _mm256_bsrli_epi128(a2, 2);
a4 = _mm256_bsrli_epi128(a3, 2);
return _mm256_add_epi16(_mm256_add_epi16(_mm256_add_epi16(a1, a2), _mm256_add_epi16(a3, a4)) , a );
}
And this adds 7 elements horizontally and puts the results in places 0 and 7:
inline __m256i _mm256_hadd7x7_epi16(__m256i a )
{
__m256i a1, a2, a3, a4, a5, a6;
a1 = _mm256_alignr_epi8(_mm256_permute2x128_si256(a, _mm256_setzero_si256(), 0x31), a, 1 * 2);
a2 = _mm256_alignr_epi8(_mm256_permute2x128_si256(a, _mm256_setzero_si256(), 0x31), a, 2 * 2);
a3 = _mm256_alignr_epi8(_mm256_permute2x128_si256(a, _mm256_setzero_si256(), 0x31), a, 3 * 2);
a4 = _mm256_alignr_epi8(_mm256_permute2x128_si256(a, _mm256_setzero_si256(), 0x31), a, 4 * 2);
a5 = _mm256_alignr_epi8(_mm256_permute2x128_si256(a, _mm256_setzero_si256(), 0x31), a, 5 * 2);
a6 = _mm256_alignr_epi8(_mm256_permute2x128_si256(a, _mm256_setzero_si256(), 0x31), a, 6 * 2);
return _mm256_add_epi16(_mm256_add_epi16(_mm256_add_epi16(a1, a2), _mm256_add_epi16(a3, a4)) , _mm256_add_epi16(_mm256_add_epi16(a5, a6), a ));
}
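For checking any faster variant against a known-good result, a plain scalar reference can help. The following is only a minimal sketch and not part of the code above: the names ref_hadd5x5 and ref_hadd7x7 are hypothetical, and it assumes that only positions 0, 5, 10 (respectively 0 and 7) matter, so all other positions are simply set to zero.

```c
#include <stdint.h>
#include <string.h>

/* Sum `len` consecutive 16-bit elements starting at in[start].
   The accumulator is unsigned so that it wraps modulo 2^16,
   which is how 16-bit SIMD addition behaves. */
static int16_t ref_group_sum(const int16_t in[16], int start, int len)
{
    uint16_t s = 0;
    for (int j = 0; j < len; j++)
        s = (uint16_t)(s + (uint16_t)in[start + j]);
    return (int16_t)s;
}

/* Hypothetical reference for the 5-element case:
   out[k] = in[k] + ... + in[k+4] for k = 0, 5, 10.
   Assumption: positions that are not of interest are zeroed. */
static void ref_hadd5x5(const int16_t in[16], int16_t out[16])
{
    memset(out, 0, 16 * sizeof(int16_t));
    for (int k = 0; k <= 10; k += 5)
        out[k] = ref_group_sum(in, k, 5);
}

/* Hypothetical reference for the 7-element case:
   out[k] = in[k] + ... + in[k+6] for k = 0, 7. */
static void ref_hadd7x7(const int16_t in[16], int16_t out[16])
{
    memset(out, 0, 16 * sizeof(int16_t));
    for (int k = 0; k <= 7; k += 7)
        out[k] = ref_group_sum(in, k, 7);
}
```

To compare with a SIMD version, store its result with _mm256_storeu_si256 into a 16-element array and compare only the positions of interest, since the SIMD functions leave other values in the remaining positions.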
Indeed, it is possible to calculate these sums with fewer instructions. The idea is to accumulate
the partial sums not only in columns 10, 5 and 0, but also in other columns. This reduces the number of
vpaddw instructions and the number of 'shuffles' compared to your solution.
#include <stdio.h>
#include <x86intrin.h>
/* gcc -O3 -Wall -m64 -march=haswell hor_sum5x5.c */
int print_vec_short(__m256i x);
int print_10_5_0_short(__m256i x);
__m256i _mm256_hadd5x5_epi16(__m256i a );
__m256i _mm256_hadd7x7_epi16(__m256i a );
int main() {
short x[16];
for(int i=0; i<16; i++) x[i] = i+1; /* arbitrary initial values */
__m256i t0 = _mm256_loadu_si256((__m256i*)x);
__m256i t2 = _mm256_permutevar8x32_epi32(t0,_mm256_set_epi32(0,7,6,5,4,3,2,1));
__m256i t02 = _mm256_add_epi16(t0,t2);
__m256i t3 = _mm256_bsrli_epi128(t2,4); /* byte shift right */
__m256i t023 = _mm256_add_epi16(t02,t3);
__m256i t13 = _mm256_srli_epi64(t02,16); /* bit shift right */
__m256i sum = _mm256_add_epi16(t023,t13);
printf("t0 = ");print_vec_short(t0 );
printf("t2 = ");print_vec_short(t2 );
printf("t02 = ");print_vec_short(t02 );
printf("t3 = ");print_vec_short(t3 );
printf("t023= ");print_vec_short(t023);
printf("t13 = ");print_vec_short(t13 );
printf("sum = ");print_vec_short(sum );
printf("\nVector elements of interest: columns 10, 5, 0:\n");
printf("t0 [10, 5, 0] = ");print_10_5_0_short(t0 );
printf("t2 [10, 5, 0] = ");print_10_5_0_short(t2 );
printf("t02 [10, 5, 0] = ");print_10_5_0_short(t02 );
printf("t3 [10, 5, 0] = ");print_10_5_0_short(t3 );
printf("t023[10, 5, 0] = ");print_10_5_0_short(t023);
printf("t13 [10, 5, 0] = ");print_10_5_0_short(t13 );
printf("sum [10, 5, 0] = ");print_10_5_0_short(sum );
printf("\nSum with _mm256_hadd5x5_epi16(t0)\n");
sum = _mm256_hadd5x5_epi16(t0);
printf("sum [10, 5, 0] = ");print_10_5_0_short(sum );
/* now the sum of 7 elements: */
printf("\n\nSum of short ints 13...7 and short ints 6...0:\n");
__m256i t = _mm256_loadu_si256((__m256i*)x);
/* keep dwords 0...6 in place and copy dword 3 (shorts 6 and 7) into dword 7, so that short 7 ends up in the upper lane */
t0 = _mm256_permutevar8x32_epi32(t,_mm256_set_epi32(3,6,5,4,3,2,1,0));
/* zero positions 7 and 14 (-1 sets all 16 bits): lower lane = shorts 0...6, upper lane = shorts 7...13 */
t0 = _mm256_and_si256(t0,_mm256_set_epi16(-1,0,-1,-1,-1,-1,-1,-1, 0,-1,-1,-1,-1,-1,-1,-1));
__m256i t1 = _mm256_alignr_epi8(t0,t0,2); /* rotate each 128-bit lane right by one element */
__m256i t01 = _mm256_add_epi16(t0,t1);
__m256i t23 = _mm256_alignr_epi8(t01,t01,4); /* rotate by two elements */
__m256i t0123 = _mm256_add_epi16(t01,t23);
__m256i t4567 = _mm256_alignr_epi8(t0123,t0123,8); /* rotate by four elements */
__m256i sum08 = _mm256_add_epi16(t0123,t4567); /* all elements are summed, but another permutation is needed to get the answer at position 7 */
sum = _mm256_permutevar8x32_epi32(sum08,_mm256_set_epi32(4,4,4,4,4,0,0,0));
printf("t = ");print_vec_short(t );
printf("t0 = ");print_vec_short(t0 );
printf("t1 = ");print_vec_short(t1 );
printf("t01 = ");print_vec_short(t01 );
printf("t23 = ");print_vec_short(t23 );
printf("t0123 = ");print_vec_short(t0123 );
printf("t4567 = ");print_vec_short(t4567 );
printf("sum08 = ");print_vec_short(sum08 );
printf("sum = ");print_vec_short(sum );
printf("\nSum with _mm256_hadd7x7_epi16(t) (the answer is in column 0 and in column 7)\n");
sum = _mm256_hadd7x7_epi16(t);
printf("sum = ");print_vec_short(sum );
return 0;
}
inline __m256i _mm256_hadd5x5_epi16(__m256i a )
{
__m256i a1, a2, a3, a4;
a1 = _mm256_alignr_epi8(_mm256_permute2x128_si256(a, _mm256_setzero_si256(), 0x31), a, 1 * 2);
a2 = _mm256_alignr_epi8(_mm256_permute2x128_si256(a, _mm256_setzero_si256(), 0x31), a, 2 * 2);
a3 = _mm256_bsrli_epi128(a2, 2);
a4 = _mm256_bsrli_epi128(a3, 2);
return _mm256_add_epi16(_mm256_add_epi16(_mm256_add_epi16(a1, a2), _mm256_add_epi16(a3, a4)) , a );
}
inline __m256i _mm256_hadd7x7_epi16(__m256i a )
{
__m256i a1, a2, a3, a4, a5, a6;
a1 = _mm256_alignr_epi8(_mm256_permute2x128_si256(a, _mm256_setzero_si256(), 0x31), a, 1 * 2);
a2 = _mm256_alignr_epi8(_mm256_permute2x128_si256(a, _mm256_setzero_si256(), 0x31), a, 2 * 2);
a3 = _mm256_alignr_epi8(_mm256_permute2x128_si256(a, _mm256_setzero_si256(), 0x31), a, 3 * 2);
a4 = _mm256_alignr_epi8(_mm256_permute2x128_si256(a, _mm256_setzero_si256(), 0x31), a, 4 * 2);
a5 = _mm256_alignr_epi8(_mm256_permute2x128_si256(a, _mm256_setzero_si256(), 0x31), a, 5 * 2);
a6 = _mm256_alignr_epi8(_mm256_permute2x128_si256(a, _mm256_setzero_si256(), 0x31), a, 6 * 2);
return _mm256_add_epi16(_mm256_add_epi16(_mm256_add_epi16(a1, a2), _mm256_add_epi16(a3, a4)) , _mm256_add_epi16(_mm256_add_epi16(a5, a6), a ));
}
int print_vec_short(__m256i x){
short int v[16];
_mm256_storeu_si256((__m256i *)v,x);
printf("%4hi %4hi %4hi %4hi | %4hi %4hi %4hi %4hi | %4hi %4hi %4hi %4hi | %4hi %4hi %4hi %4hi \n",
v[15],v[14],v[13],v[12],v[11],v[10],v[9],v[8],v[7],v[6],v[5],v[4],v[3],v[2],v[1],v[0]);
return 0;
}
int print_10_5_0_short(__m256i x){
short int v[16];
_mm256_storeu_si256((__m256i *)v,x);
printf("%4hi %4hi %4hi \n",v[10],v[5],v[0]);
return 0;
}
The output is:
$ ./a.out
t0 = 16 15 14 13 | 12 11 10 9 | 8 7 6 5 | 4 3 2 1
t2 = 2 1 16 15 | 14 13 12 11 | 10 9 8 7 | 6 5 4 3
t02 = 18 16 30 28 | 26 24 22 20 | 18 16 14 12 | 10 8 6 4
t3 = 0 0 2 1 | 16 15 14 13 | 0 0 10 9 | 8 7 6 5
t023= 18 16 32 29 | 42 39 36 33 | 18 16 24 21 | 18 15 12 9
t13 = 0 18 16 30 | 0 26 24 22 | 0 18 16 14 | 0 10 8 6
sum = 18 34 48 59 | 42 65 60 55 | 18 34 40 35 | 18 25 20 15
Vector elements of interest: columns 10, 5, 0:
t0 [10, 5, 0] = 11 6 1
t2 [10, 5, 0] = 13 8 3
t02 [10, 5, 0] = 24 14 4
t3 [10, 5, 0] = 15 10 5
t023[10, 5, 0] = 39 24 9
t13 [10, 5, 0] = 26 16 6
sum [10, 5, 0] = 65 40 15
Sum with _mm256_hadd5x5_epi16(t0)
sum [10, 5, 0] = 65 40 15
Sum of short ints 13...7 and short ints 6...0:
t = 16 15 14 13 | 12 11 10 9 | 8 7 6 5 | 4 3 2 1
t0 = 8 0 14 13 | 12 11 10 9 | 0 7 6 5 | 4 3 2 1
t1 = 9 8 0 14 | 13 12 11 10 | 1 0 7 6 | 5 4 3 2
t01 = 17 8 14 27 | 25 23 21 19 | 1 7 13 11 | 9 7 5 3
t23 = 21 19 17 8 | 14 27 25 23 | 5 3 1 7 | 13 11 9 7
t0123 = 38 27 31 35 | 39 50 46 42 | 6 10 14 18 | 22 18 14 10
t4567 = 39 50 46 42 | 38 27 31 35 | 22 18 14 10 | 6 10 14 18
sum08 = 77 77 77 77 | 77 77 77 77 | 28 28 28 28 | 28 28 28 28
sum = 77 77 77 77 | 77 77 77 77 | 77 77 28 28 | 28 28 28 28
Sum with _mm256_hadd7x7_epi16(t) (the answer is in column 0 and in column 7)
sum = 16 31 45 58 | 70 81 91 84 | 77 70 63 56 | 49 42 35 28
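To see why the shorter sequences give the right answer, the data flow of the steps in main can be traced with a scalar model. For the 5-element case, sum[k] = t02[k] + t3[k] + t13[k] = (a[k] + a[k+2]) + a[k+4] + (a[k+1] + a[k+3]), which holds as long as k+2 stays inside the same 128-bit lane and k+1 inside the same 64-bit element; that is true for k = 0, 5 and 10. The sketch below is an illustration only: the function names are hypothetical, and the lane and element boundaries are modelled from the printed output above rather than taken from any library.

```c
#include <stdint.h>

/* Scalar model of the 5-element sequence (t2, t02, t3, t13, sum).
   Index 0 is the lowest element, matching the printed "columns". */
static void model_hadd5x5(const uint16_t a[16], uint16_t sum[16])
{
    uint16_t t2[16], t02[16], t3[16], t13[16];
    for (int i = 0; i < 16; i++)   /* _mm256_permutevar8x32_epi32: rotate whole vector by one dword */
        t2[i] = a[(i + 2) % 16];
    for (int i = 0; i < 16; i++)
        t02[i] = (uint16_t)(a[i] + t2[i]);
    for (int i = 0; i < 16; i++)   /* _mm256_bsrli_epi128(t2, 4): shift inside each 128-bit lane */
        t3[i] = (i % 8 <= 5) ? t2[i + 2] : 0;
    for (int i = 0; i < 16; i++)   /* _mm256_srli_epi64(t02, 16): shift inside each 64-bit element */
        t13[i] = (i % 4 <= 2) ? t02[i + 1] : 0;
    for (int i = 0; i < 16; i++)
        sum[i] = (uint16_t)(t02[i] + t3[i] + t13[i]);
}

/* Rotate each 128-bit lane (8 elements) right by n elements,
   modelling _mm256_alignr_epi8(v, v, 2 * n). */
static void rot_lanes(const uint16_t v[16], int n, uint16_t out[16])
{
    for (int i = 0; i < 16; i++)
        out[i] = v[(i / 8) * 8 + (i % 8 + n) % 8];
}

/* Scalar model of the 7-element sequence. */
static void model_hadd7x7(const uint16_t a[16], uint16_t sum[16])
{
    uint16_t t0[16], r[16];
    /* permute + mask: lower lane = a[0..6], 0; upper lane = a[8..13], 0, a[7] */
    for (int i = 0; i < 16; i++)
        t0[i] = a[i];
    t0[15] = a[7];
    t0[7] = 0;
    t0[14] = 0;
    /* rotate by 1, 2 and 4 elements, adding each time: after three
       steps every position holds the total of its own lane */
    for (int n = 1; n <= 4; n *= 2) {
        rot_lanes(t0, n, r);
        for (int i = 0; i < 16; i++)
            t0[i] = (uint16_t)(t0[i] + r[i]);
    }
    /* final permute: dwords 0..2 take the lower-lane total,
       dwords 3..7 take the upper-lane total */
    for (int i = 0; i < 16; i++)
        sum[i] = (i < 6) ? t0[0] : t0[8];
}
```

With the input 1...16 used above, the model reproduces the printed vectors: 15, 40 and 65 in columns 0, 5 and 10 for the 5-element case, and 28 and 77 in columns 0 and 7 for the 7-element case. The 7-element version trades six shifted copies for three rotate-and-add steps, at the cost of one permute before and one after.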