Merging unequal dataframes with partial string match - pandas

I want to merge two data frames. df1 has 115 rows and df2 has 600,000 rows.
df1 = pd.DataFrame({'Invoice': ['20561', '20562', '20563', '20564'],
                    'Currency': ['EUR', 'EUR', 'EUR', 'USD']})
df2 = pd.DataFrame({'Ref': ['20561', 'INV20562', 'INV20563BG', '20564'],
                    'Type': ['01', '03', '04', '02'],
                    'Amount': ['150', '175', '160', '180'],
                    'Comment': ['bla', 'bla', 'bla', 'bla']})
print(df1)
  Invoice Currency
0   20561      EUR
1   20562      EUR
2   20563      EUR
3   20564      USD
print(df2)
          Ref Type Amount Comment
0       20561   01    150     bla
1    INV20562   03    175     bla
2  INV20563BG   04    160     bla
3       20564   02    180     bla
I applied the following code:
compList = '|'.join(df1['Invoice'].tolist())
df2['compMatch'] = df2.Ref.str.contains(compList)
# drop unmatched articles
df2 = df2[df2['compMatch']==True]
df2['content'] = df2['Ref'].str.lower().str.split()
df2['matchedName'] = df2['content'].apply(lambda x: [item for item in x if item in df1['Invoice'].tolist()])
print (df2)
          Ref Type Amount Comment  compMatch       content matchedName
0       20561   01    150     bla       True       [20561]     [20561]
1    INV20562   03    175     bla       True    [inv20562]          []
2  INV20563BG   04    160     bla       True  [inv20563bg]          []
3       20564   02    180     bla       True       [20564]     [20564]
Here you can see that matchedName is empty for Ref INV20562 and Ref INV20563BG.
What's wrong with this code? Is there any other solution?

The matching fails because str.split() splits on whitespace, so each Ref stays a single lowercased token like 'inv20562' that is never exactly equal to a bare invoice number. Looks like you want to merge on the digits of the ref instead:
df2.merge(df1,
          left_on=df2['Ref'].str.extract(r'(\d+)', expand=False),
          right_on='Invoice', how='left')
Output:
          Ref Type Amount Comment Invoice Currency
0       20561   01    150     bla   20561      EUR
1    INV20562   03    175     bla   20562      EUR
2  INV20563BG   04    160     bla   20563      EUR
3       20564   02    180     bla   20564      USD
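For reference, a self-contained sketch of the extract-and-merge approach on the toy data above (same column names as in the question):

```python
import pandas as pd

df1 = pd.DataFrame({'Invoice': ['20561', '20562', '20563', '20564'],
                    'Currency': ['EUR', 'EUR', 'EUR', 'USD']})
df2 = pd.DataFrame({'Ref': ['20561', 'INV20562', 'INV20563BG', '20564'],
                    'Type': ['01', '03', '04', '02'],
                    'Amount': ['150', '175', '160', '180'],
                    'Comment': ['bla', 'bla', 'bla', 'bla']})

# Pull the first run of digits out of Ref and use it as the join key.
key = df2['Ref'].str.extract(r'(\d+)', expand=False)
out = df2.merge(df1, left_on=key, right_on='Invoice', how='left')
print(out[['Ref', 'Invoice', 'Currency']])
```

Because the key is computed on the fly, df2 itself never needs a helper column; with 600,000 rows this single vectorized extract-and-merge should also be far faster than row-wise str.contains checks.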

Related

How do I get a time delta that is closest to 0 days?

I have the following dataframe:
gp_columns = {
    'name': ['companyA', 'companyB'],
    'firm_ID': [1, 2],
    'timestamp_one': ['2016-04-01', '2017-09-01']
}
fund_columns = {
    'firm_ID': [1, 1, 2, 2, 2],
    'department_ID': [10, 11, 20, 21, 22],
    'timestamp_mult': ['2015-01-01', '2016-03-01', '2016-10-01', '2017-02-01', '2018-11-01'],
    'number': [400, 500, 1000, 3000, 4000]
}
gp_df = pd.DataFrame(gp_columns)
fund_df = pd.DataFrame(fund_columns)
gp_df['timestamp_one'] = pd.to_datetime(gp_df['timestamp_one'])
fund_df['timestamp_mult'] = pd.to_datetime(fund_df['timestamp_mult'])
merged_df = gp_df.merge(fund_df)
merged_df
merged_df_v1 = merged_df.copy()
merged_df_v1['incidence_num'] = merged_df.groupby('firm_ID')['department_ID'] \
    .transform('cumcount')
merged_df_v1['incidence_num'] = merged_df_v1['incidence_num'] + 1
merged_df_v1['time_delta'] = merged_df_v1['timestamp_mult'] - merged_df_v1['timestamp_one']
merged_wide = pd.pivot(merged_df_v1, index=['name', 'firm_ID', 'timestamp_one'],
                       columns='incidence_num',
                       values=['department_ID', 'time_delta', 'timestamp_mult', 'number'])
merged_wide.reset_index()
that looks as follows:
My question is how I get a column with the time delta closest to 0. Note that the time delta can be negative or positive, so .abs() alone does not work for me here.
I want a dataframe with this particular output:
You can stack (which removes NaTs) and groupby.first after sorting the rows by absolute value (with the key parameter of sort_values):
df = merged_wide.reset_index()
df['time_delta_min'] = (df['time_delta'].stack()
                        .sort_values(key=abs)
                        .groupby(level=0).first()
                       )
output:
name firm_ID timestamp_one department_ID \
incidence_num 1 2 3
0 companyA 1 2016-04-01 10 11 NaN
1 companyB 2 2017-09-01 20 21 22
time_delta timestamp_mult \
incidence_num 1 2 3 1 2
0 -456 days -31 days NaT 2015-01-01 2016-03-01
1 -335 days -212 days 426 days 2016-10-01 2017-02-01
number time_delta_min
incidence_num 3 1 2 3
0 NaT 400 500 NaN -31 days
1 2018-11-01 1000 3000 4000 -212 days
Alternatively, use a positional lookup with the columns of the minimal absolute values found by DataFrame.idxmin:
import numpy as np

idx, cols = pd.factorize(df['time_delta'].abs().idxmin(axis=1))
df['time_delta_min'] = (df['time_delta'].reindex(cols, axis=1).to_numpy()
                        [np.arange(len(df)), idx])
print(df)
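The closest-to-zero trick can be seen in isolation on a toy wide frame of signed timedeltas (the values are taken from the time_delta block of the output above):

```python
import pandas as pd

# One row per firm, one column per incidence; NaT marks a missing incidence.
wide = pd.DataFrame({
    1: pd.to_timedelta(['-456 days', '-335 days']),
    2: pd.to_timedelta(['-31 days', '-212 days']),
    3: pd.to_timedelta([pd.NaT, '426 days']),
})

# Go long (dropna guards newer pandas versions where stack keeps NaT),
# sort by magnitude, then keep the first value of each original row.
closest = wide.stack().dropna().sort_values(key=abs).groupby(level=0).first()
print(closest)  # row 0 -> -31 days, row 1 -> -212 days
```

Note that the result keeps the original sign; only the ordering uses the absolute value.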

Merging two datasets with partial match

I want to merge two dataframes df1 and df2. The shape of df1 is (115, 16) and the shape of df2 is (624402, 23).
df1 = pd.DataFrame({'Invoice': ['20561', '20562', '20563', '20564'],
                    'Currency': ['EUR', 'EUR', 'EUR', 'USD']})
df2 = pd.DataFrame({'Ref': ['20561', 'INV20562', 'INV20563BG', '20564'],
                    'Type': ['01', '03', '04', '02'],
                    'Amount': ['150', '175', '160', '180'],
                    'Comment': ['bla', 'bla', 'bla', 'bla']})
print(df1)
  Invoice Currency
0   20561      EUR
1   20562      EUR
2   20563      EUR
3   20564      USD
print(df2)
          Ref Type Amount Comment
0       20561   01    150     bla
1    INV20562   03    175     bla
2  INV20563BG   04    160     bla
3       20564   02    180     bla
I applied the following code:
df4 = df1.copy()
for i, row in df1.iterrows():
    tmp = df2[df2['Ref'].str.contains(row['Invoice'], na=False)]
    df4.loc[i, 'Amount'] = tmp['Amount'].values[0]
print(df4)
It is showing: IndexError: index 0 is out of bounds for axis 0 with size 0
The IndexError occurs when no row matches the invoice. You can check for this and return np.nan (or a different default value) if a matching invoice is not found:
import numpy as np

df4 = df1.copy()
for i, row in df1.iterrows():
    tmp = df2[df2['Ref'].str.contains(row['Invoice'], na=False)]
    df4.loc[i, 'Amount'] = tmp['Amount'].values[0] if not tmp.empty else np.nan
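A minimal run showing the guard in action, using a made-up invoice 99999 that matches nothing in df2:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'Invoice': ['20561', '99999']})  # 99999 is hypothetical: no match
df2 = pd.DataFrame({'Ref': ['20561', 'INV20562'],
                    'Amount': ['150', '175']})

df4 = df1.copy()
for i, row in df1.iterrows():
    tmp = df2[df2['Ref'].str.contains(row['Invoice'], na=False)]
    # An empty match now yields NaN instead of raising an IndexError
    df4.loc[i, 'Amount'] = tmp['Amount'].values[0] if not tmp.empty else np.nan

print(df4)
```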

Scala Unpivot Table

SCALA
I have a table with this structure:

FName    SName    Email                Jan 2021   Feb 2021   Mar 2021   Total 2021
Micheal  Scott    scarrel#gmail.com    4000       5000       3400       50660
Dwight   Schrute  dschrute#gmail.com   1200       6900       1000       35000
Kevin    Malone   kmalone#gmail.com    9000       6000       18000      32000
And I want to transform it to:
I tried the 'stack' method but I couldn't get it to work.
Thanks
You can flatten the monthly/total columns via explode as shown below:
val df = Seq(
  ("Micheal", "Scott", "scarrel#gmail.com", 4000, 5000, 3400, 50660),
  ("Dwight", "Schrute", "dschrute#gmail.com", 1200, 6900, 1000, 35000),
  ("Kevin", "Malone", "kmalone#gmail.com", 9000, 6000, 18000, 32000)
).toDF("FName", "SName", "Email", "Jan 2021", "Feb 2021", "Mar 2021", "Total 2021")

val moYrCols  = Array("Jan 2021", "Feb 2021", "Mar 2021", "Total 2021") // (**)
val otherCols = df.columns diff moYrCols

val structCols = moYrCols.map { c =>
  val moYr = split(lit(c), "\\s+")
  struct(moYr(1).as("Year"), moYr(0).as("Month"), col(c).as("Value"))
}

df.
  withColumn("flattened", explode(array(structCols: _*))).
  select(otherCols.map(col) :+ $"flattened.*": _*).
  show
/*
+-------+-------+------------------+----+-----+-----+
| FName| SName| Email|Year|Month|Value|
+-------+-------+------------------+----+-----+-----+
|Micheal| Scott| scarrel#gmail.com|2021| Jan| 4000|
|Micheal| Scott| scarrel#gmail.com|2021| Feb| 5000|
|Micheal| Scott| scarrel#gmail.com|2021| Mar| 3400|
|Micheal| Scott| scarrel#gmail.com|2021|Total|50660|
| Dwight|Schrute|dschrute#gmail.com|2021| Jan| 1200|
| Dwight|Schrute|dschrute#gmail.com|2021| Feb| 6900|
| Dwight|Schrute|dschrute#gmail.com|2021| Mar| 1000|
| Dwight|Schrute|dschrute#gmail.com|2021|Total|35000|
| Kevin| Malone| kmalone#gmail.com|2021| Jan| 9000|
| Kevin| Malone| kmalone#gmail.com|2021| Feb| 6000|
| Kevin| Malone| kmalone#gmail.com|2021| Mar|18000|
| Kevin| Malone| kmalone#gmail.com|2021|Total|32000|
+-------+-------+------------------+----+-----+-----+
*/
(**) Use pattern matching in case there are many columns; for example:
val moYrCols = df.columns.filter(_.matches("[A-Za-z]+\\s+\\d{4}"))
val data = Seq(
  ("Micheal", "Scott", "scarrel#gmail.com", 4000, 5000, 3400, 50660),
  ("Dwight", "Schrute", "dschrute#gmail.com", 1200, 6900, 1000, 35000),
  ("Kevin", "Malone", "kmalone#gmail.com", 9000, 6000, 18000, 32000))
val columns = Seq("FName", "SName", "Email", "Jan 2021", "Feb 2021", "Mar 2021", "Total 2021")
val newColumns = Array("FName", "SName", "Email", "Total 2021")
val df = spark.createDataFrame(data).toDF(columns: _*)
df
  .select(
    struct(
      (for { column <- df.columns } yield col(column)).toSeq: _*
    ).as("mystruct")) // create your data set with a column as a struct
  .select(
    $"mystruct.Fname", // refer to a sub-element of the struct with the '.' operator
    $"mystruct.sname",
    $"mystruct.Email",
    explode( // make rows for every entry in the array
      array(
        (for { column <- df.columns if !(newColumns contains column) } // filter out the columns we already selected
         yield // for each element yield the following expression (similar to map)
           struct(
             col(s"mystruct.$column").as("value"), // create the value column
             lit(column).as("date_year")) // create a date column
        ).toSeq: _*) // shorthand to pass a Scala array into the varargs of the array function
    )
  )
  .select(
    col("*"),    // just being lazy instead of typing
    col("col.*") // create columns from the new column; separating the year/month should be easy from here
  ).drop($"col")
  .show(false)
+--------------+--------------+------------------+-----+---------+
|mystruct.Fname|mystruct.sname|mystruct.Email |value|date_year|
+--------------+--------------+------------------+-----+---------+
|Micheal |Scott |scarrel#gmail.com |4000 |Jan 2021 |
|Micheal |Scott |scarrel#gmail.com |5000 |Feb 2021 |
|Micheal |Scott |scarrel#gmail.com |3400 |Mar 2021 |
|Dwight |Schrute |dschrute#gmail.com|1200 |Jan 2021 |
|Dwight |Schrute |dschrute#gmail.com|6900 |Feb 2021 |
|Dwight |Schrute |dschrute#gmail.com|1000 |Mar 2021 |
|Kevin |Malone |kmalone#gmail.com |9000 |Jan 2021 |
|Kevin |Malone |kmalone#gmail.com |6000 |Feb 2021 |
|Kevin |Malone |kmalone#gmail.com |18000|Mar 2021 |
+--------------+--------------+------------------+-----+---------+

Sum columns in pandas based on the names of the columns

I have a dataframe with the population by age in several cities:
City Age_25 Age_26 Age_27 Age_28 Age_29 Age_30
New York 11312 3646 4242 4344 4242 6464
London 6446 2534 3343 63475 34433 34434
Paris 5242 34343 6667 132 323 3434
Hong Kong 354 979 878 6776 7676 898
Buenos Aires 4244 7687 78 8676 786 9798
I want to create a new dataframe with the sum of the columns based on ranges of three years. That is, people from 25 to 27 and people from 28 to 30. Like this:
City Age_25_27 Age_28_30
New York 19200 15050
London 12323 132342
Paris 46252 3889
Hong Kong 2211 15350
Buenos Aires 12009 19260
In this example I used ranges of three years, but in my real database the range has to be five, with 100 ages.
How could I do that? I've seen some related answers but none of them work very well in my case.
Try this:
age_columns = df.filter(like='Age_').columns
n = age_columns.str.split('_').str[-1].astype(int)
df['Age_25-27'] = df[age_columns[(n >= 25) & (n <= 27)]].sum(axis=1)
df['Age_28-30'] = df[age_columns[(n >= 28) & (n <= 30)]].sum(axis=1)
Output:
>>> df
           City  Age_25  Age_26  Age_27  Age_28  Age_29  Age_30  Age_25-27  Age_28-30
0      New York   11312    3646    4242    4344    4242    6464      19200      15050
1        London    6446    2534    3343   63475   34433   34434      12323     132342
2         Paris    5242   34343    6667     132     323    3434      46252       3889
3     Hong Kong     354     979     878    6776    7676     898       2211      15350
4  Buenos Aires    4244    7687      78    8676     786    9798      12009      19260
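The same name-parsing idea generalizes to any band width (five for the real data, here three for the toy data). A sketch, assuming the columns are named `Age_<n>` as above:

```python
import pandas as pd

df = pd.DataFrame({
    'City': ['New York', 'London'],
    'Age_25': [11312, 6446], 'Age_26': [3646, 2534], 'Age_27': [4242, 3343],
    'Age_28': [4344, 63475], 'Age_29': [4242, 34433], 'Age_30': [6464, 34434],
})

band = 3  # set to 5 for the real dataset
age_cols = df.filter(like='Age_').columns
ages = age_cols.str.split('_').str[-1].astype(int)
bins = (ages - ages.min()) // band  # 0, 0, 0, 1, 1, 1

# One summed column per bin, named after the ages it covers.
out = pd.DataFrame({'City': df['City']})
for b in sorted(set(bins)):
    cols = age_cols[bins == b]
    lo, hi = ages[bins == b].min(), ages[bins == b].max()
    out[f'Age_{lo}_{hi}'] = df[cols].sum(axis=1)
```

The loop runs once per band (about 20 iterations for 100 ages in bands of 5), while the actual summation stays vectorized.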
You can use groupby:
In [1]: import pandas as pd
...: import numpy as np
In [2]: d = {
...: 'City': ['New York', 'London', 'Paris', 'Hong Kong', 'Buenos Aires'],
...: 'Age_25': [11312, 6446, 5242, 354, 4244],
...: 'Age_26': [3646, 2534, 34343, 979, 7687],
...: 'Age_27': [4242, 3343, 6667, 878, 78],
...: 'Age_28': [4344, 63475, 132, 6776, 8676],
...: 'Age_29': [4242, 34433, 323, 7676, 786],
...: 'Age_30': [6464, 34434, 3434, 898, 9798]
...: }
...:
...: df = pd.DataFrame(data=d)
...: df = df.set_index('City')
...: df
Out[2]:
Age_25 Age_26 Age_27 Age_28 Age_29 Age_30
City
New York 11312 3646 4242 4344 4242 6464
London 6446 2534 3343 63475 34433 34434
Paris 5242 34343 6667 132 323 3434
Hong Kong 354 979 878 6776 7676 898
Buenos Aires 4244 7687 78 8676 786 9798
In [3]: n_cols = 3 # change to 5 for actual dataset
...: sum_every_n_cols_df = df.groupby((np.arange(len(df.columns)) // n_cols) + 1, axis=1).sum()
...: sum_every_n_cols_df
Out[3]:
1 2
City
New York 19200 15050
London 12323 132342
Paris 46252 3889
Hong Kong 2211 15350
Buenos Aires 12009 19260
You can extract the columns of the dataframe and put them in a list:
col_list = df.columns.tolist()
But ultimately, I think what you'd want is more of a while loop, with your inputs (a band of 5, up to 100 ages) as static values that you iterate over.
band = 5
start = 20
max_age = 120

i = start
while i < max_age:
    age_start = i
    age_end = i + band - 1
    col_name = 'age_' + str(age_start) + '_to_' + str(age_end)
    df[col_name] = 0
    for j in range(age_start, age_end + 1):
        age_adder = 'age_' + str(j)
        if age_adder in df.columns:
            df[col_name] += df[age_adder]
    i += band

What is the fastest way for adding the vector elements horizontally in odd order?

According to this question I implemented the horizontal addition, this time 5 by 5 and 7 by 7. It does the job correctly, but it is not fast enough.
Can it be faster than it is? I tried hadd and other instructions, but the improvement is limited. For example, when I use _mm256_bsrli_epi128 it is slightly better, but it needs some extra permutation that ruins the benefit because of the lanes. So the question is how it should be implemented to gain more performance. The same story holds for 9 elements, etc.
This adds 5 elements horizontally and puts the results in places 0, 5, and 10:
// it puts the results in places 0, 5, and 10
inline __m256i _mm256_hadd5x5_epi16(__m256i a)
{
    __m256i a1, a2, a3, a4;
    a1 = _mm256_alignr_epi8(_mm256_permute2x128_si256(a, _mm256_setzero_si256(), 0x31), a, 1 * 2);
    a2 = _mm256_alignr_epi8(_mm256_permute2x128_si256(a, _mm256_setzero_si256(), 0x31), a, 2 * 2);
    a3 = _mm256_bsrli_epi128(a2, 2);
    a4 = _mm256_bsrli_epi128(a3, 2);
    return _mm256_add_epi16(_mm256_add_epi16(_mm256_add_epi16(a1, a2), _mm256_add_epi16(a3, a4)), a);
}
And this adds 7 elements horizontally and puts the results in places 0 and 7:
inline __m256i _mm256_hadd7x7_epi16(__m256i a)
{
    __m256i a1, a2, a3, a4, a5, a6;
    a1 = _mm256_alignr_epi8(_mm256_permute2x128_si256(a, _mm256_setzero_si256(), 0x31), a, 1 * 2);
    a2 = _mm256_alignr_epi8(_mm256_permute2x128_si256(a, _mm256_setzero_si256(), 0x31), a, 2 * 2);
    a3 = _mm256_alignr_epi8(_mm256_permute2x128_si256(a, _mm256_setzero_si256(), 0x31), a, 3 * 2);
    a4 = _mm256_alignr_epi8(_mm256_permute2x128_si256(a, _mm256_setzero_si256(), 0x31), a, 4 * 2);
    a5 = _mm256_alignr_epi8(_mm256_permute2x128_si256(a, _mm256_setzero_si256(), 0x31), a, 5 * 2);
    a6 = _mm256_alignr_epi8(_mm256_permute2x128_si256(a, _mm256_setzero_si256(), 0x31), a, 6 * 2);
    return _mm256_add_epi16(_mm256_add_epi16(_mm256_add_epi16(a1, a2), _mm256_add_epi16(a3, a4)), _mm256_add_epi16(_mm256_add_epi16(a5, a6), a));
}
Indeed it is possible to calculate these sums with fewer instructions. The idea is to accumulate the partial sums not only in columns 10, 5 and 0, but also in the other columns. This reduces the number of vpaddw instructions and the number of 'shuffles' compared to your solution.
#include <stdio.h>
#include <x86intrin.h>
/* gcc -O3 -Wall -m64 -march=haswell hor_sum5x5.c */
int print_vec_short(__m256i x);
int print_10_5_0_short(__m256i x);
__m256i _mm256_hadd5x5_epi16(__m256i a );
__m256i _mm256_hadd7x7_epi16(__m256i a );
int main() {
    short x[16];
    for(int i=0; i<16; i++) x[i] = i+1;            /* arbitrary initial values */
    __m256i t0   = _mm256_loadu_si256((__m256i*)x);
    __m256i t2   = _mm256_permutevar8x32_epi32(t0, _mm256_set_epi32(0,7,6,5,4,3,2,1));
    __m256i t02  = _mm256_add_epi16(t0, t2);
    __m256i t3   = _mm256_bsrli_epi128(t2, 4);     /* byte shift right */
    __m256i t023 = _mm256_add_epi16(t02, t3);
    __m256i t13  = _mm256_srli_epi64(t02, 16);     /* bit shift right */
    __m256i sum  = _mm256_add_epi16(t023, t13);

    printf("t0 = ");   print_vec_short(t0);
    printf("t2 = ");   print_vec_short(t2);
    printf("t02 = ");  print_vec_short(t02);
    printf("t3 = ");   print_vec_short(t3);
    printf("t023= ");  print_vec_short(t023);
    printf("t13 = ");  print_vec_short(t13);
    printf("sum = ");  print_vec_short(sum);

    printf("\nVector elements of interest: columns 10, 5, 0:\n");
    printf("t0 [10, 5, 0] = ");   print_10_5_0_short(t0);
    printf("t2 [10, 5, 0] = ");   print_10_5_0_short(t2);
    printf("t02 [10, 5, 0] = ");  print_10_5_0_short(t02);
    printf("t3 [10, 5, 0] = ");   print_10_5_0_short(t3);
    printf("t023[10, 5, 0] = ");  print_10_5_0_short(t023);
    printf("t13 [10, 5, 0] = ");  print_10_5_0_short(t13);
    printf("sum [10, 5, 0] = ");  print_10_5_0_short(sum);

    printf("\nSum with _mm256_hadd5x5_epi16(t0)\n");
    sum = _mm256_hadd5x5_epi16(t0);
    printf("sum [10, 5, 0] = ");  print_10_5_0_short(sum);

    /* now the sum of 7 elements: */
    printf("\n\nSum of short ints 13...7 and short ints 6...0:\n");
    __m256i t = _mm256_loadu_si256((__m256i*)x);
    t0 = _mm256_permutevar8x32_epi32(t0, _mm256_set_epi32(3,6,5,4,3,2,1,0));
    t0 = _mm256_and_si256(t0, _mm256_set_epi16(0xFFFF,0,0xFFFF,0xFFFF,0xFFFF,0xFFFF,0xFFFF,0xFFFF,
                                               0,0xFFFF,0xFFFF,0xFFFF,0xFFFF,0xFFFF,0xFFFF,0xFFFF));
    __m256i t1    = _mm256_alignr_epi8(t0, t0, 2);
    __m256i t01   = _mm256_add_epi16(t0, t1);
    __m256i t23   = _mm256_alignr_epi8(t01, t01, 4);
    __m256i t0123 = _mm256_add_epi16(t01, t23);
    __m256i t4567 = _mm256_alignr_epi8(t0123, t0123, 8);
    __m256i sum08 = _mm256_add_epi16(t0123, t4567); /* all elements are summed, but another permutation is needed to get the answer at position 7 */
    sum = _mm256_permutevar8x32_epi32(sum08, _mm256_set_epi32(4,4,4,4,4,0,0,0));

    printf("t = ");      print_vec_short(t);
    printf("t0 = ");     print_vec_short(t0);
    printf("t1 = ");     print_vec_short(t1);
    printf("t01 = ");    print_vec_short(t01);
    printf("t23 = ");    print_vec_short(t23);
    printf("t0123 = ");  print_vec_short(t0123);
    printf("t4567 = ");  print_vec_short(t4567);
    printf("sum08 = ");  print_vec_short(sum08);
    printf("sum = ");    print_vec_short(sum);

    printf("\nSum with _mm256_hadd7x7_epi16(t) (the answer is in column 0 and in column 7)\n");
    sum = _mm256_hadd7x7_epi16(t);
    printf("sum = ");    print_vec_short(sum);
    return 0;
}
inline __m256i _mm256_hadd5x5_epi16(__m256i a)
{
    __m256i a1, a2, a3, a4;
    a1 = _mm256_alignr_epi8(_mm256_permute2x128_si256(a, _mm256_setzero_si256(), 0x31), a, 1 * 2);
    a2 = _mm256_alignr_epi8(_mm256_permute2x128_si256(a, _mm256_setzero_si256(), 0x31), a, 2 * 2);
    a3 = _mm256_bsrli_epi128(a2, 2);
    a4 = _mm256_bsrli_epi128(a3, 2);
    return _mm256_add_epi16(_mm256_add_epi16(_mm256_add_epi16(a1, a2), _mm256_add_epi16(a3, a4)), a);
}

inline __m256i _mm256_hadd7x7_epi16(__m256i a)
{
    __m256i a1, a2, a3, a4, a5, a6;
    a1 = _mm256_alignr_epi8(_mm256_permute2x128_si256(a, _mm256_setzero_si256(), 0x31), a, 1 * 2);
    a2 = _mm256_alignr_epi8(_mm256_permute2x128_si256(a, _mm256_setzero_si256(), 0x31), a, 2 * 2);
    a3 = _mm256_alignr_epi8(_mm256_permute2x128_si256(a, _mm256_setzero_si256(), 0x31), a, 3 * 2);
    a4 = _mm256_alignr_epi8(_mm256_permute2x128_si256(a, _mm256_setzero_si256(), 0x31), a, 4 * 2);
    a5 = _mm256_alignr_epi8(_mm256_permute2x128_si256(a, _mm256_setzero_si256(), 0x31), a, 5 * 2);
    a6 = _mm256_alignr_epi8(_mm256_permute2x128_si256(a, _mm256_setzero_si256(), 0x31), a, 6 * 2);
    return _mm256_add_epi16(_mm256_add_epi16(_mm256_add_epi16(a1, a2), _mm256_add_epi16(a3, a4)), _mm256_add_epi16(_mm256_add_epi16(a5, a6), a));
}

int print_vec_short(__m256i x){
    short int v[16];
    _mm256_storeu_si256((__m256i *)v, x);
    printf("%4hi %4hi %4hi %4hi | %4hi %4hi %4hi %4hi | %4hi %4hi %4hi %4hi | %4hi %4hi %4hi %4hi \n",
           v[15],v[14],v[13],v[12],v[11],v[10],v[9],v[8],v[7],v[6],v[5],v[4],v[3],v[2],v[1],v[0]);
    return 0;
}

int print_10_5_0_short(__m256i x){
    short int v[16];
    _mm256_storeu_si256((__m256i *)v, x);
    printf("%4hi %4hi %4hi \n", v[10], v[5], v[0]);
    return 0;
}
The output is:
$ ./a.out
t0 = 16 15 14 13 | 12 11 10 9 | 8 7 6 5 | 4 3 2 1
t2 = 2 1 16 15 | 14 13 12 11 | 10 9 8 7 | 6 5 4 3
t02 = 18 16 30 28 | 26 24 22 20 | 18 16 14 12 | 10 8 6 4
t3 = 0 0 2 1 | 16 15 14 13 | 0 0 10 9 | 8 7 6 5
t023= 18 16 32 29 | 42 39 36 33 | 18 16 24 21 | 18 15 12 9
t13 = 0 18 16 30 | 0 26 24 22 | 0 18 16 14 | 0 10 8 6
sum = 18 34 48 59 | 42 65 60 55 | 18 34 40 35 | 18 25 20 15
Vector elements of interest: columns 10, 5, 0:
t0 [10, 5, 0] = 11 6 1
t2 [10, 5, 0] = 13 8 3
t02 [10, 5, 0] = 24 14 4
t3 [10, 5, 0] = 15 10 5
t023[10, 5, 0] = 39 24 9
t13 [10, 5, 0] = 26 16 6
sum [10, 5, 0] = 65 40 15
Sum with _mm256_hadd5x5_epi16(t0)
sum [10, 5, 0] = 65 40 15
Sum of short ints 13...7 and short ints 6...0:
t = 16 15 14 13 | 12 11 10 9 | 8 7 6 5 | 4 3 2 1
t0 = 8 0 14 13 | 12 11 10 9 | 0 7 6 5 | 4 3 2 1
t1 = 9 8 0 14 | 13 12 11 10 | 1 0 7 6 | 5 4 3 2
t01 = 17 8 14 27 | 25 23 21 19 | 1 7 13 11 | 9 7 5 3
t23 = 21 19 17 8 | 14 27 25 23 | 5 3 1 7 | 13 11 9 7
t0123 = 38 27 31 35 | 39 50 46 42 | 6 10 14 18 | 22 18 14 10
t4567 = 39 50 46 42 | 38 27 31 35 | 22 18 14 10 | 6 10 14 18
sum08 = 77 77 77 77 | 77 77 77 77 | 28 28 28 28 | 28 28 28 28
sum = 77 77 77 77 | 77 77 77 77 | 77 77 28 28 | 28 28 28 28
Sum with _mm256_hadd7x7_epi16(t) (the answer is in column 0 and in column 7)
sum = 16 31 45 58 | 70 81 91 84 | 77 70 63 56 | 49 42 35 28