How to write sql query to generate a group no for each grouped record - sql

following is scenario:
I have data in following format:
entryid , ac_no, db/cr, amt
-----------------------------------------------
1 10 D 5
1 11 C 5
2 01 D 8
2 11 C 8
3 12 D 10
3 13 C 10
4 14 D 5
4 16 C 5
5 14 D 2
5 17 C 2
6 14 D 3
6 18 C 3
I want data in following format:
So far i have acheived the first 3 columns by query
select wm_concat(entryid),ac_no,db_cr,Sum(amt) from t1 group by ac_no,db_cr
wm_Concat(entryid),ac_no, db/cr, Sum(amt), set_id
------------------------------------------------
1 10 D 5 S1
2 01 D 8 S1
1,2 11 C 13 S1
3 12 D 10 S2
3 13 C 10 S2
4,5,6 14 D 10 S3
4 16 C 5 S3
5 17 C 2 S3
6 18 C 3 S3
I want an additional column `set_id` that either shows this S1, S2.. or any number 1,2.. so that the debit & credit entries sets can be identified.
I am making sets of debit and credit entries based on their Ac_no values.
Any little help will be highly appreciated. Thanks

Create a new column say set and give a unique identifier to the particular set. So for example the first three records will have set id S1, next two will have S2 and so on.
To distinguish a transaction from a set you can use column db/cr along with newly added set column. You can identify that the 3rd row is a set since it's transaction type is 'C' whereas the transactions are of type 'D'.
Here I have assumed that your transactions are debit only, if not please provide more details in the question. Hope this helps.

Related

Create multiple rows based on a column containing a list of numbers

I currently have a table which looks like this.
A Category Code
1 A 10,30
2 B 30
3 C 20,30,40
Is there anyway to write a sql statement that would get me
ID Category Code
1 A 10
1 A 30
2 B 30
3 C 20
3 C 30
3 C 40
Thanks
You can use UNNEST with SPLIT function...
select a, category, s_code
from my_data, unnest(split(code, ',')) as s_code
a
category
s_code
1
A
10
1
A
30
2
B
30
3
C
20
3
C
30
3
C
40

Running ID based on 2 columns

Can someone help. Been trawling through Google and loads of forums but can't seem to find what I am looking for. I need some kind of running ID added to my data. See example below.
This is my data
ID
A
B
C
1
22
WP1234
C
2
22
WP1235
C
3
22
WP1236
O
4
24
WP1237
C
5
24
WP1238
C
6
24
WP1239
O
7
26
WP1240
C
8
26
WP1241
C
9
28
WP1242
C
I need to get some kind of running ID based on columns and A, C.
Desired outcome would be
ID
A
B
C
RunningID
1
22
WP1234
C
1
2
22
WP1235
C
2
3
22
WP1236
O
1
4
24
WP1237
C
1
5
24
WP1238
C
2
6
24
WP1239
O
1
7
26
WP1240
C
1
8
26
WP1241
C
2
9
28
WP1242
C
1
I think you want row_number():
select t.*,
row_number() over (partition by a, c order by id) as running_id
from t;

To prepare a dataframe with elements being repeated from a list in python

I have a list as primary = ['A' , 'B' , 'C' , 'D']
and a DataFrame as
df2 = pd.DataFrame(data=dateRange, columns = ['Date'])
which contains 1 date column starting from 01-July-2020 till 31-Dec-2020.
I created another column 'DayNum' which will contain the day number from the date like 01-July-2020 is Wednesday so the 'DayNum' column will have 2 and so on.
Now using the list I want to create another column 'primary' so that the DataFrame looks as follows:
In short, the elements on the list should repeat. You can say that this is a roster to show the name of the person on the roster on a weekly basis where Monday is the start (day 0) and Sunday is the end (day 6).
The output should be like this:
Date DayNum Primary
0 01-Jul-20 2 A
1 02-Jul-20 3 A
2 03-Jul-20 4 A
3 04-Jul-20 5 A
4 05-Jul-20 6 A
5 06-Jul-20 0 B
6 07-Jul-20 1 B
7 08-Jul-20 2 B
8 09-Jul-20 3 B
9 10-Jul-20 4 B
10 11-Jul-20 5 B
11 12-Jul-20 6 B
12 13-Jul-20 0 C
13 14-Jul-20 1 C
14 15-Jul-20 2 C
15 16-Jul-20 3 C
16 17-Jul-20 4 C
17 18-Jul-20 5 C
18 19-Jul-20 6 C
19 20-Jul-20 0 D
20 21-Jul-20 1 D
21 22-Jul-20 2 D
22 23-Jul-20 3 D
23 24-Jul-20 4 D
24 25-Jul-20 5 D
25 26-Jul-20 6 D
26 27-Jul-20 0 A
27 28-Jul-20 1 A
28 29-Jul-20 2 A
29 30-Jul-20 3 A
30 31-Jul-20 4 A
First compare column for 0 by Series.eq with cumulative sum by Series.cumsum for groups for each week, then use modulo by Series.mod with number of values in list and last map by dictioanry created by enumerate and list by Series.map:
primary = ['A','B','C','D']
d = dict(enumerate(primary))
df['Primary'] = df['DayNum'].eq(0).cumsum().mod(len(primary)).map(d)

SAS sum observations not in a group, by group

I have a data set :
data have;
input group $ value;
datalines;
A 4
A 3
A 2
A 1
B 1
C 1
D 2
D 1
E 1
F 1
G 2
G 1
H 1
;
run;
The first variable is a group identifier, the second a value.
For each group, I want a new variable "sum" with the sum of all values in the column, exept for the group the observation is in.
My issue is having to do that on nearly 30 millions of observations, so efficiency matters.
I found that using data step was more efficient than using procs.
The final database should looks like :
data want;
input group $ value $ sum;
datalines;
A 4 11
A 3 11
A 2 11
A 1 11
B 1 20
C 1 20
D 2 18
D 1 18
E 1 20
F 1 20
G 2 18
G 1 20
H 1 20
;
run;
Any idea how to perform this please?
Edit: I don't know if this matter but the example I gave is a simplified version of my issue. In the real case, I have 2 other group variable, thus taking the sum of the whole column and substract the sum in the group is not a viable solution.
The requirement
sum of all values in the column, except for the group the observation is in
indicates two passes of the data must occur:
Compute the all_sum and each group's group_sumA hash can store each group's sum -- computed via a specified suminc: variable and .ref() method invocation. A variable can accumulate allsum.
Compute allsum - group_sum for each row of a group.The group_sum is retrieved from hash and subtracted from allsum.
Example:
data want;
if 0 then set have; * prep pdv;
declare hash sums (suminc:'value');
sums.defineKey('group');
sums.defineDone();
do while (not hash_loaded);
set have end=hash_loaded;
sums.ref(); * adds value to internal sum of hash data record;
allsum + value;
end;
do while (not last_have);
set have end=last_have;
sums.sum(sum:sum); * retrieve groups sum. Do you hear the Dragnet theme too?;
sum = allsum - sum; * subtract from allsum;
output;
end;
stop;
run;
What is wrong with a straight forward approach? You need to make two passes no matter what you do.
Like this. I included extra variables so you can see how the values are derived.
proc sql ;
create table want as
select a.*,b.grand,sum(value) as total, b.grand - sum(value) as sum
from have a
, (select sum(value) as grand from have) b
group by a.group
;
quit;
Results:
Obs group value grand total sum
1 A 3 21 10 11
2 A 1 21 10 11
3 A 2 21 10 11
4 A 4 21 10 11
5 B 1 21 1 20
6 C 1 21 1 20
7 D 2 21 3 18
8 D 1 21 3 18
9 E 1 21 1 20
10 F 1 21 1 20
11 G 1 21 3 18
12 G 2 21 3 18
13 H 1 21 1 20
Note it does not matter what you have as your GROUP BY clause.
Do you really need to output all of the original observations? Why not just output the summary table?
proc sql ;
create table want as
select a.group, b.grand - sum(value) as sum
from have a
, (select sum(value) as grand from have) b
group by a.group
;
quit;
Results
Obs group total sum
1 A 10 11
2 B 1 20
3 C 1 20
4 D 3 18
5 E 1 20
6 F 1 20
7 G 3 18
8 H 1 20
I would break this out into two different segments:
1.) You could start by using PROC SQL to get the sums by the group
2.) Then use some IF/THEN statements to reassign the values by group

Using temporary extended table to make a sum

From a given table I want to be able to sum values having the same number (should be easy, right?)
Problem: A given value can be assigned from 2 to n consecutive numbers.
For some reasons this information is stored in a single row describing the value, the starting number and the ending number as below.
TABLE A
id | starting_number | ending_number | value
----+-----------------+---------------+-------
1 2 5 8
2 0 3 5
3 4 6 6
4 7 8 10
For instance the first row means:
value '8' is assigned to numbers: 2, 3 and 4 (5 is excluded)
So, I would like the following intermediairy result table
TABLE B
id | number | value
----+--------+-------
1 2 8
1 3 8
1 4 8
2 0 5
2 1 5
2 2 5
3 4 6
3 5 6
4 7 10
So I can sum 'value' for elements having identical 'number'
SELECT number, sum(value)
FROM B
GROUP BY number
TABLE C
number | sum(value)
--------+------------
2 13
3 8
4 14
0 5
1 5
5 6
7 10
I don't know how to do this and didn't find any answer on the web (maybe not looking with appropriate key words...)
Any idea?
You can do what you want with generate_series(). So, TableB is basically:
select id, generate_series(starting_number, ending_number - 1, 1) as n, value
from tableA;
Your aggregation is then:
select n, sum(value)
from (select id, generate_series(starting_number, ending_number - 1, 1) as n, value
from tableA
) a
group by n;