Splunk: Use output of search A row by row as input for search B, then produce common result table

In Splunk, I have a search producing a result table like this:
_time                A   B   C
2022-10-19 09:00:00  A1  B1  C1
2022-10-19 09:00:00  A2  B2  C2
2022-10-19 09:10:20  A3  B3  C3
Now, for each row, I want to run a second search, using the _time value as an input parameter.
For rows 1 and 2 above (same _time value), the result of the second search would be:
_time                D   E
2022-10-19 09:00:00  D1  E1
For row 3 above, the result of the second search would be:
_time                D   E
2022-10-19 09:10:20  D3  E3
And now I want to output the results in a common table, like this:
_time                A   B   C   D   E
2022-10-19 09:00:00  A1  B1  C1  D1  E1
2022-10-19 09:00:00  A2  B2  C2  D1  E1
2022-10-19 09:10:20  A3  B3  C3  D3  E3
I experimented with join, append, map, appendcols and subsearch, but I am struggling both with the row-by-row character of the second search and with pulling the data together into one common table.
For example, appendcols simply tacks one result table onto another, even if they are completely unrelated and differently shaped. Like so:
_time                A   B   C   D   E
2022-10-19 09:00:00  A1  B1  C1  D1  E1
2022-10-19 09:00:00  A2  B2  C2  -   -
2022-10-19 09:10:20  A3  B3  C3  -   -
Can anybody please point me in the right direction?
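One way to get this shape, assuming the D/E values of the second search come from an index you can query directly (index=first and index=second below are placeholders), is to aggregate the second search by _time and join it back on that field. Rows sharing a _time, like rows 1 and 2 above, then pick up the same D and E values:

index=first sourcetype=st1
| table _time A B C
| join type=left _time
    [ search index=second sourcetype=st2
      | stats latest(D) as D, latest(E) as E by _time ]
| table _time A B C D E

Note that join runs its subsearch once, subject to the usual subsearch result limits, not once per outer row. If the second search truly needs per-row parameters, map is the row-by-row tool: it re-runs its inner search for every input row, substituting field values via $field$ tokens, but it replaces the outer rows with the inner results, so A, B and C would have to be re-attached (for example with eval) inside the mapped search.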

Related

Compare values from one column in table A and another column in table B

I need to create a NeedDate column in the expected output, comparing QtyShort from Table B with QtyReceive from Table A.
In the expected output, if QtyShort = 0, NeedDate = MatlDueDate.
For the first row of Table A: if 0 < QtyShort <= 6 (its QtyReceive), NeedDate = 10/08/2021 (its DueDate).
If 6 < QtyShort <= 10, move to the second row: NeedDate = 10/22/2021.
If 10 < QtyShort <= 20, move to the third row: NeedDate = 02/01/2022.
If QtyShort > 20 (the largest QtyReceive), NeedDate = 09/09/9999.
This should continue until the last row of Table B has been compared.
How could we do this? Any help will be appreciated. Thank you in advance!
Table A
Item DueDate QtyReceive
A1 10/08/2021 6
A1 10/22/2021 10
A1 02/01/2022 20
Table B
Item MatlDueDate QtyShort
A1 06/01/2022 0
A1 06/02/2022 0
A1 06/03/2022 1
A1 06/04/2022 2
A1 06/05/2022 5
A1 06/06/2022 7
A1 06/07/2022 10
A1 06/08/2022 15
A1 06/09/2022 25
Expected Output:
Item MatlDueDate QtyShort NeedDate
A1 06/01/2022 0 06/01/2022
A1 06/02/2022 0 06/02/2022
A1 06/03/2022 1 10/08/2021
A1 06/04/2022 2 10/08/2021
A1 06/05/2022 5 10/08/2021
A1 06/06/2022 7 10/22/2021
A1 06/07/2022 10 10/22/2021
A1 06/08/2022 15 02/01/2022
A1 06/09/2022 25 09/09/9999
Use the OUTER APPLY operator to find the minimum DueDate from TableA that is able to fulfill the QtyShort:
select b.Item, b.MatlDueDate, b.QtyShort,
NeedDate = case when b.QtyShort = 0
then b.MatlDueDate
else isnull(a.DueDate, '9999-09-09')
end
from TableB b
outer apply
(
select DueDate = min(a.DueDate)
from TableA a
where a.Item = b.Item
and a.QtyReceive >= b.QtyShort
) a
Result:
Item  MatlDueDate  QtyShort  NeedDate
A1    2022-06-01   0         2022-06-01
A1    2022-06-02   0         2022-06-02
A1    2022-06-03   1         2021-10-08
A1    2022-06-04   2         2021-10-08
A1    2022-06-05   5         2021-10-08
A1    2022-06-06   7         2021-10-22
A1    2022-06-07   10        2021-10-22
A1    2022-06-08   15        2022-02-01
A1    2022-06-09   25        9999-09-09
db<>fiddle demo

create a new table from 2 other tables

I want to fill table a by merging it with two other tables, b and c, where table a contains the columns (Parent, Style, Ending_Date, WeekNum, Net_Requirment), and calculate how much is required to make product A on a certain date.
The table should work like a BOM (Bill of Materials).
Can this be done with pandas?
Table b represents the demand for product A per date:
Style Date WeekNum Quantity
A 24/11/2019 0 600
A 01/12/2019 1 500
Table c represents the components and quantities used to make product A:
Parent Child Q
A A1 2
A1 A11 3
A1 A12 2
So table a should be filled like this:
Parent Child Date WeekNum Net_Quantity
A A1 24/11/2019 0 1200
A1 A11 24/11/2019 0 3600
A1 A12 24/11/2019 0 2400
A A1 01/12/2019 1 1000
A1 A11 01/12/2019 1 3000
A1 A12 01/12/2019 1 2000
Welcome! In order to merge these tables properly you need a common key to merge on. What you can do is add such a key to each table, like this:
import pandas as pd

data2 = {'Parent': ['A', 'A1', 'A1'], 'Child': ['A1', 'A11', 'A12'],
         'Q': [2, 3, 2], 'Style': ['A', 'A', 'A']}
df2 = pd.DataFrame(data2)
After this you can do a left join on the first table, which gives you multiple rows for the same date. So essentially this:
(notice that with a left join, the left table will create as many duplicate rows as needed to satisfy the matching keys on the right table)
data = {'Style':['A','A'], 'Date':['24/11/2019', '01/12/2019'],
'WeekNum':[0,1], 'Quantity':[600,500]}
df = pd.DataFrame(data)
mergeDf = df.merge(df2,how='left', left_on='Style', right_on='Style')
mergeDf
Then to calculate:
mergeDf['Net_Quantity'] = mergeDf.Quantity * mergeDf.Q
mergeDf.drop(['Q'], axis=1, inplace=True)
result:
Style Date WeekNum Quantity Parent Child Net_Quantity
0 A 24/11/2019 0 600 A A1 1200
1 A 24/11/2019 0 600 A1 A11 1800
2 A 24/11/2019 0 600 A1 A12 1200
3 A 01/12/2019 1 500 A A1 1000
4 A 01/12/2019 1 500 A1 A11 1500
5 A 01/12/2019 1 500 A1 A12 1000
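Note that the A11/A12 rows above (1800/1200 and 1500/1000) differ from the expected output in the question (3600/2400 and 3000/2000): the expected figures cascade quantities down the BOM, so A11's requirement is 1200 * 3 (since 1200 units of A1 are needed), not 600 * 3. A minimal sketch of that cascading pass, assuming BOM rows are listed parent-before-child as in table c:

import pandas as pd

bom = pd.DataFrame({'Parent': ['A', 'A1', 'A1'],
                    'Child': ['A1', 'A11', 'A12'],
                    'Q': [2, 3, 2]})
demand = pd.DataFrame({'Style': ['A', 'A'],
                       'Date': ['24/11/2019', '01/12/2019'],
                       'WeekNum': [0, 1],
                       'Quantity': [600, 500]})

rows = []
for _, d in demand.iterrows():
    need = {d['Style']: d['Quantity']}   # net requirement per item, seeded with top-level demand
    for _, b in bom.iterrows():          # parent-before-child order lets one pass cascade quantities
        net = need[b['Parent']] * b['Q']
        need[b['Child']] = net
        rows.append({'Parent': b['Parent'], 'Child': b['Child'],
                     'Date': d['Date'], 'WeekNum': d['WeekNum'],
                     'Net_Quantity': net})

result = pd.DataFrame(rows)   # matches the question's expected table a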

Secondary Sorting (individually)

How would I do secondary sorting on a bar chart, for each individual date?
For example, I have data as follows:
Date Type Value
1/1/2020 A1 4
1/1/2020 A2 2
1/1/2020 A3 9
1/1/2020 A4 5
1/1/2020 A5 7
2/1/2020 A1 7
2/1/2020 A2 5
2/1/2020 A3 0
2/1/2020 A4 3
2/1/2020 A5 1
3/1/2020 A1 3
3/1/2020 A2 5
3/1/2020 A3 7
3/1/2020 A4 9
3/1/2020 A5 8
Now I need to plot a daily bar chart showing only the top three values for each individual date, i.e. the chart would use:
Date Type Value
1/1/2020 A3 9
1/1/2020 A4 5
1/1/2020 A5 7
2/1/2020 A1 7
2/1/2020 A2 5
2/1/2020 A4 3
3/1/2020 A3 7
3/1/2020 A4 9
3/1/2020 A5 8
That is, the top three within each individual date, not first summing A1,A2,A3,A4,A5 across dates and then sorting by the cumulative sum.
You should be able to achieve the sorting you need by using Date as the dimension and Type as the breakdown dimension.
You can then sort by Date, with a secondary sort on Type.
Restricting to 3 per date, however, is something you'd need to do in your data source, as Data Studio currently can't do that.
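If the data source is prepared with something like pandas, the per-date restriction is short; a sketch using the sample data above:

import pandas as pd

df = pd.DataFrame({
    'Date': ['1/1/2020'] * 5 + ['2/1/2020'] * 5 + ['3/1/2020'] * 5,
    'Type': ['A1', 'A2', 'A3', 'A4', 'A5'] * 3,
    'Value': [4, 2, 9, 5, 7, 7, 5, 0, 3, 1, 3, 5, 7, 9, 8],
})

# keep the three largest Values within each Date, then restore Date/Type order
top3 = (df.sort_values('Value', ascending=False)
          .groupby('Date')
          .head(3)
          .sort_values(['Date', 'Type']))
print(top3)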

Compare two tables and display selected value from 2nd table

I'm trying to match the 3rd column of one table against the 2nd column of another. In the example below, I need to look up the PROGRAM from the first table and append it to the second table's rows using awk. The common field between the two tables is TESTER.
Below is my code, which is not working; please help me fix it:
awk -F, 'NR==FNR{a[$1]=$8;next;}{print $0,a[$3]?a[$2]:"N/A"}' OFS=, table1 table2
Table1:
Date Time TESTER Niche SMS_NO TEST_AREA SCREEN_TYPE PROGRAM
4/23/2019 8:40:42 A1 Nxx S11 TA1 ST1 PGM1
4/23/2019 7:34:08 B1 Nx1 S21 TA2 ST2 PGM2
4/23/2019 3:16:24 C1 Nx2 S31 TA3 ST3 PGM3
4/23/2019 6:22:04 D1 Nx3 S41 TA4 ST4 PGM4
4/23/2019 8:55:19 E1 Nx4 S51 TA5 ST5 PGM5
7/22/2018 17:30:37 F1 Nx5 S61 TA6 ST6 PGM6
Table2:
FEATURE TESTER LICENSE_USED
FEA1 A1 4
FEA2 B1 16
FEA3 C1 16
FEA4 D1 16
FEA5 E1 16
FEA6 F1 16
FEA7 G1 16
FEA8 G2 16
Expected output:
FEATURE TESTER LICENSE_USED PROGRAM
FEA1 A1 4 PGM1
FEA2 B1 16 PGM2
FEA3 C1 16 PGM3
FEA4 D1 16 PGM4
FEA5 E1 16 PGM5
FEA6 F1 16 PGM6
FEA7 G1 16 N/A
FEA8 G2 16 N/A
Please check this: while reading the first file (NR==FNR is true only then), a[$3]=$8 maps TESTER to PROGRAM; for each line of the second file, a[$2] looks up that map, falling back to N/A:
awk 'NR==FNR {a[$3]=$8; next} {print $0 FS (a[$2]?a[$2]:"N/A")}' file1.txt file2.txt
File1.txt
Date Time TESTER Niche SMS_NO TEST_AREA SCREEN_TYPE PROGRAM
4/23/2019 8:40:42 A1 Nxx S11 TA1 ST1 PGM1
4/23/2019 7:34:08 B1 Nx1 S21 TA2 ST2 PGM2
4/23/2019 3:16:24 C1 Nx2 S31 TA3 ST3 PGM3
4/23/2019 6:22:04 D1 Nx3 S41 TA4 ST4 PGM4
4/23/2019 8:55:19 E1 Nx4 S51 TA5 ST5 PGM5
7/22/2018 17:30:37 F1 Nx5 S61 TA6 ST6 PGM6
File2.txt
FEATURE TESTER LICENSE_USED
FEA1 A1 4
FEA2 B1 16
FEA3 C1 16
FEA4 D1 16
FEA5 E1 16
FEA6 F1 16
FEA7 G1 16
FEA8 G2 16
Output:
FEATURE TESTER LICENSE_USED PROGRAM
FEA1 A1 4 PGM1
FEA2 B1 16 PGM2
FEA3 C1 16 PGM3
FEA4 D1 16 PGM4
FEA5 E1 16 PGM5
FEA6 F1 16 PGM6
FEA7 G1 16 N/A
FEA8 G2 16 N/A
Tried on GNU awk:
awk 'NR==FNR{a[$3]=$8;next} {$4=a[$2];if($4=="") $4="N/A";print}' Table1 Table2
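For readability, the same logic as a standalone script (run with awk -f join.awk Table1 Table2); testing membership with in also survives a PROGRAM value that is empty or 0, which the truthiness test a[$2]? would not:

# join.awk: print each Table2 line plus the PROGRAM looked up by TESTER
NR == FNR {              # true only while reading the first file (Table1)
    prog[$3] = $8        # map TESTER (column 3) to PROGRAM (column 8)
    next
}
{                        # every line of the second file (Table2)
    print $0, (($2 in prog) ? prog[$2] : "N/A")
}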

Pandas pivot table selecting rows with maximum values

I have a pandas dataframe:
df
Id Name CaseId Value
82 A1 case1.01 37.71
1558 A3 case1.01 27.71
82 A1 case1.06 29.54
1558 A3 case1.06 29.54
82 A1 case1.11 12.09
1558 A3 case1.11 32.09
82 A1 case1.16 33.35
1558 A3 case1.16 33.35
For each (Id, Name) pair I need to select the CaseId with the maximum Value.
i.e. I am seeking the following output:
Id Name CaseId Value
82 A1 case1.01 37.71
1558 A3 case1.16 33.35
I tried the following:
import numpy as np
import pandas as pd

pd.pivot_table(df, index=['Id', 'Name'], columns=['CaseId'], values=['Value'], aggfunc=[np.max])['amax']
But all it does is give the maximum value for each CaseId as a column, which is not the result I am seeking above.
sort_values + drop_duplicates
df.sort_values('Value').drop_duplicates(['Id'],keep='last')
Out[93]:
Id Name CaseId Value
7 1558 A3 case1.16 33.35
0 82 A1 case1.01 37.71
Since we posted at the same time, adding one more method:
df.sort_values('Value').groupby('Id').tail(1)
Out[98]:
Id Name CaseId Value
7 1558 A3 case1.16 33.35
0 82 A1 case1.01 37.71
This should work:
df = df.sort_values('Value', ascending=False).drop_duplicates('Id').sort_index()
Output:
Id Name CaseId Value
0 82 A1 case1.01 37.71
7 1558 A3 case1.16 33.35
With nlargest and groupby
pd.concat(d.nlargest(1, ['Value']) for _, d in df.groupby('Name'))
Id Name CaseId Value
0 82 A1 case1.01 37.71
7 1558 A3 case1.16 33.35
Another idea is to create a joint column, take its max, then split it back to two columns:
df['ValueCase'] = list(zip(df['Value'], df['CaseId']))
p = pd.pivot_table(df, index=['Id', 'Name'], values=['ValueCase'], aggfunc='max')
p['Value'], p['CaseId'] = list(zip(*p['ValueCase']))
del p['ValueCase']
Results in:
CaseId Value
Id Name
82 A1 case1.01 37.71
1558 A3 case1.16 33.35
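For completeness, the same rows can be selected without sorting, via groupby + idxmax on the df above:

df.loc[df.groupby(['Id', 'Name'])['Value'].idxmax()]

idxmax returns the index label of the maximum Value within each (Id, Name) group, and loc pulls those full rows back out with all columns intact.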