SQL: add a column with integers in a loop for duplicates

I have a sql table like:
ID Name Balance
1 Peter 324.5
2 Michael 122.7
3 Peter 788.3
4 Mark 45.7
5 Ralph 333.5
6 Thomas 563.2
7 Ralph 9685.1
8 Peter 2444.5
9 Susi 35.2
10 Andrew 442.5
11 Susi 2424.8
Is it possible to write a while loop in SQL that adds a new column holding an integer (for example 1...3) for each duplicated name (3 times Peter, 2 times Susi, 2 times Ralph)? Non-duplicated names should get a value of 0.
So the final table should look like this:
ID Name Balance Value
1 Peter 324.5 1
2 Michael 122.7 0
3 Peter 788.3 1
4 Mark 45.7 0
5 Ralph 333.5 2
6 Thomas 563.2 0
7 Ralph 9685.1 2
8 Peter 2444.5 1
9 Susi 35.2 3
10 Andrew 442.5 0
11 Susi 2424.8 3

You wouldn't want to use a while loop for this. Just use window functions:
select t.*, count(*) over (partition by name) as cnt
from t;
This provides the total count for each name. If you want an incremental value, you can use row_number():
select t.*, row_number() over (partition by name order by id) as seqnum
from t;
This would enumerate the rows for each name, so every name would have a "1" value, some would have "2" and so on.
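Neither query above produces the exact Value column from the question (one number per duplicated name, 0 otherwise). A minimal sketch of one way to get it, run through Python's built-in sqlite3 module so it can be tried without a database server. It assumes a SQLite build with window functions (3.25 or later) and that duplicated names are numbered by their first id; the CTE names counted and dups and the column first_id are made up for this example.

```python
import sqlite3

# Hypothetical in-memory copy of the question's table, named t as in the answer.
conn = sqlite3.connect(":memory:")
conn.execute("create table t (id integer primary key, name text, balance real)")
conn.executemany(
    "insert into t values (?, ?, ?)",
    [(1, "Peter", 324.5), (2, "Michael", 122.7), (3, "Peter", 788.3),
     (4, "Mark", 45.7), (5, "Ralph", 333.5), (6, "Thomas", 563.2),
     (7, "Ralph", 9685.1), (8, "Peter", 2444.5), (9, "Susi", 35.2),
     (10, "Andrew", 442.5), (11, "Susi", 2424.8)],
)

query = """
with counted as (
    -- cnt: how often the name occurs; first_id: where it first appears
    select t.*,
           count(*) over (partition by name) as cnt,
           min(id)  over (partition by name) as first_id
    from t
),
dups as (
    -- number only the duplicated names, in order of first appearance
    select distinct name,
           dense_rank() over (order by first_id) as value
    from counted
    where cnt > 1
)
select c.id, c.name, c.balance, coalesce(d.value, 0) as value
from counted c
left join dups d on d.name = c.name
order by c.id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

With the sample data this gives Peter 1, Ralph 2, Susi 3 and 0 for every name that occurs once.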

Related

Create a new column for group based on condition

I wanted to create a new column (Group ID) on the basis of following conditions:
If the DOB and the first three letters of the Name are the same, the rows must fall in the same Group ID.
Name           DOB         Group ID
Anny           18-01-1922  0
Anny Scott     01-01-1950  1
Annie          01-01-1950  1
David          14-02-1950  2
David Kern     15-02-1951  3
William Perry  15-02-1953  4
Kenneth Field  15-02-1953  5
This is how I want to create the groups.
I have used the following code, to create the group ID for name (If first three letters are matched)
df['Group ID Name']=df.groupby(df['name'].str[:3]).ngroup()
The following code is used to create the group ID for DOB (If two records have the same DOB)
df['Group ID DOB']=df.groupby('Date of Birth').ngroup()
I want to use both conditions together to create the Group ID. How can I do that?
Pass both keys to groupby as a list, and add sort=False so that the groups are numbered in order of first appearance:
df['Group ID Name'] = df.groupby(['DOB',df['Name'].str[:3]], sort=False).ngroup()
print (df)
            Name         DOB  Group ID  Group ID Name
0           Anny  18-01-1922         0              0
1     Anny Scott  01-01-1950         1              1
2          Annie  01-01-1950         1              1
3          David  14-02-1950         2              2
4     David Kern  15-02-1951         3              3
5  William Perry  15-02-1953         4              4
6  Kenneth Field  15-02-1953         5              5
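To make the numbering rule explicit, here is a small plain-Python sketch of what a two-key groupby with sort=False followed by ngroup() computes: each distinct (DOB, first three letters) key gets the next integer the first time it is seen. It deliberately avoids pandas so it runs anywhere; the function name group_ids and the list people are made up for this illustration.

```python
# Rows from the question as (Name, DOB) pairs.
people = [
    ("Anny", "18-01-1922"),
    ("Anny Scott", "01-01-1950"),
    ("Annie", "01-01-1950"),
    ("David", "14-02-1950"),
    ("David Kern", "15-02-1951"),
    ("William Perry", "15-02-1953"),
    ("Kenneth Field", "15-02-1953"),
]


def group_ids(rows):
    """Number each (DOB, name[:3]) key in order of first appearance."""
    seen = {}  # key -> group id
    ids = []
    for name, dob in rows:
        key = (dob, name[:3])
        # A new key receives the current size of the dict as its id.
        ids.append(seen.setdefault(key, len(seen)))
    return ids


ids = group_ids(people)
print(ids)
```

"Anny Scott" and "Annie" share both the DOB and the prefix "Ann", so they land in the same group, while "Anny" has a different DOB and gets its own.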

How to duplicate row based on int column

If I have a table like this in Hive:
name impressions sampling_rate
------------------------------------
paul 34 1
emma 0 3
greg 0 5
How can I duplicate each row in a select statement by the sampling_rate column so that it would look like this:
name impressions sampling_rate
------------------------------------
paul 34 1
emma 0 3
emma 0 3
emma 0 3
greg 0 5
greg 0 5
greg 0 5
greg 0 5
greg 0 5
Using space() you can produce a string of spaces of length sampling_rate-1, split it, and explode it with a lateral view; this duplicates the rows.
Demo:
with your_table as(--Demo data, use your table instead of this CTE
select stack (3, --number of tuples
'paul',34,1,
'emma', 0,3,
'greg', 0,5
) as (name,impressions,sampling_rate)
)
select t.*
from your_table t --use your table here
lateral view explode(split(space(t.sampling_rate-1),' '))e
Result:
name impressions sampling_rate
------------------------------------
paul 34 1
emma 0 3
emma 0 3
emma 0 3
greg 0 5
greg 0 5
greg 0 5
greg 0 5
greg 0 5
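The trick works because splitting a string of n-1 separators yields n pieces, so exploding the pieces yields n copies of the row. A rough Python sketch of the same idea, assuming sampling_rate is at least 1; Python's str.split(" ") behaves the same way for this case, and the function name duplicate_rows is made up for this example.

```python
def duplicate_rows(rows):
    """Repeat each (name, impressions, sampling_rate) row sampling_rate times."""
    out = []
    for name, impressions, sampling_rate in rows:
        # n - 1 spaces split on " " gives n (empty) pieces.
        pieces = (" " * (sampling_rate - 1)).split(" ")
        # One output row per piece, like explode() in the Hive query.
        out.extend((name, impressions, sampling_rate) for _ in pieces)
    return out


table = [("paul", 34, 1), ("emma", 0, 3), ("greg", 0, 5)]
result = duplicate_rows(table)
for row in result:
    print(row)
```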

Using a Case statement for an alias

I'm trying to change the header of a column based on a variable.
Currently I have:
SELECT
(CASE
WHEN GROUPING(CASE ##Role
WHEN 2 THEN Processor
WHEN 3 THEN Reviewer
END) = 1
THEN 'Total'
ELSE (CASE ##Role
WHEN 2 THEN Processor
WHEN 3 THEN Reviewer
END)
END) AS 'User',
COUNT(EntityId) AS 'Tickets Processed'
FROM
table
WHERE
conditions
GROUP BY
CASE ##Role
WHEN 2 THEN Processor
WHEN 3 THEN Reviewer
END WITH ROLLUP
This returns the data I need for the correct role. However, is there a way to change the second column's header based on the variable, to something like:
COUNT(EntityId) AS CASE ##Role
WHEN 2 THEN 'Tickets Processed'
WHEN 3 THEN 'Tickets Reviewed'
END
EDIT:
Sample of current result:
##Role = 2 or ##Role = 3
Both return:
User Tickets Processed
-----------------------------
Steve 1
Gerald 3
John 1
Paul 2
Peter 5
Total 12
Desired result:
##Role = 2
User Tickets Processed
-----------------------------
Steve 1
Gerald 3
John 1
Paul 2
Peter 5
Total 12
##Role = 3
User Tickets Reviewed
-----------------------------
Steve 1
Gerald 3
John 1
Paul 2
Peter 5
Total 12
Sample data
EntityID Processor Reviewer
----------------------------------
1 Peter Bob
2 Peter Paul
3 Peter Bob
4 John Paul
5 Peter Bob
6 Peter Bob
...
You can either use dynamic sql, or you can split the logic based on the ##role variable:
IF ##Role = 2 THEN {do Query A}
ELSE {do Query B}
But you definitely cannot base the column alias on the value of a variable in the context of a non-dynamic query.
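A minimal sketch of the dynamic SQL route, with the query text built in client code (Python with the built-in sqlite3 module) rather than in the database. It is an illustration under assumptions: the table name tickets, the mapping ROLE_COLUMNS and the function tickets_by_role are invented here, and the ROLLUP total row is left out because SQLite does not support WITH ROLLUP. The column name and the header come from a fixed whitelist, so no user-supplied text is interpolated into the SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table tickets (EntityId integer, Processor text, Reviewer text)")
conn.executemany(
    "insert into tickets values (?, ?, ?)",
    [(1, "Peter", "Bob"), (2, "Peter", "Paul"), (3, "Peter", "Bob"),
     (4, "John", "Paul"), (5, "Peter", "Bob"), (6, "Peter", "Bob")],
)

# Whitelist: role -> (grouping column, header for the count column).
ROLE_COLUMNS = {
    2: ("Processor", "Tickets Processed"),
    3: ("Reviewer", "Tickets Reviewed"),
}


def tickets_by_role(conn, role):
    """Build the query text for the given role and return (headers, rows)."""
    column, header = ROLE_COLUMNS[role]  # KeyError for an unknown role
    sql = (
        f'select {column} as "User", count(EntityId) as "{header}" '
        f"from tickets group by {column} order by {column}"
    )
    cur = conn.execute(sql)
    headers = [d[0] for d in cur.description]
    return headers, cur.fetchall()


print(tickets_by_role(conn, 2))
print(tickets_by_role(conn, 3))
```

The same shape applies to the branching alternative: each role selects one of two complete query texts, each with its own fixed alias.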

How to rewrite query which gives amount of specific value in row to avoid some values and count further with others?

I have a query that gives me the number of consecutive grade 5s for every student (as long as the student has no other grade in between):
select distinct on (student, class) scg.*
from (select student, class, grade, count(*) as cnt,
min(gradeDate), max(gradeDate), min_gradeDate, max_gradeDate
from (select t.*,
row_number() over (partition by student, class, grade order by gradeDate) as seqnum_scg,
row_number() over (partition by student, class order by gradeDate) as seqnum_sc
from t
) t
where grade = 5
group by student, class, grade, (seqnum_sc - seqnum_scg)
) scg
order by student, class, cnt desc;
The original problem is explained here:
How to count data with specific values and for specific user/person (in row)?
But now I want to extend this query with one more feature. At the moment the counter gives me the max value only until the student gets a grade of 4/3/2/1, but now I want it to:
stop counting if the student gets a 4 or 3, and resume from the previous count when the student gets another 5.
What I mean:
Actual query: 5, 5, 5, 4, 3, 5, 5, 2 --> gives me max = 3
New query: 5, 5, 5, 4, 3, 5, 5, 2 --> gives me max = 5, because 4 and 3 pause the counter, and it resumes when the student gets another 5
stop counting if the student gets a grade of 2 or 1 (and give me the max value reached before the 2/1 grade). This is what the query does now for every grade except 5, but I want it only for grades 2 and lower (which I can specify in the query).
Can someone help me rewrite the second query given by @Gordon Linoff to work like that and tell me what changed?
Edit: examples as requested:
id student grade class gradeDate
1 1 5 1 2017-03-03
2 1 5 1 2017-03-04
3 1 1 1 2017-03-05
4 1 5 1 2017-03-06
5 1 5 1 2017-03-07
6 1 5 1 2017-03-08
7 1 1 1 2017-03-09
8 2 5 2 2017-03-03
9 3 5 3 2017-03-03
10 4 5 4 2017-03-03
11 4 5 4 2017-03-04
12 4 4 4 2017-03-05
13 4 3 4 2017-03-06
14 4 5 4 2017-03-07
15 4 5 4 2017-03-08
16 5 5 5 2017-03-01
17 5 5 5 2017-03-03
18 5 5 5 2017-03-04
19 5 5 5 2017-03-05
20 5 5 5 2017-03-06
21 5 2 5 2017-03-07
22 5 5 5 2017-03-08
23 5 5 5 2017-03-09
Student one : max = 3
Student two : max = 1
Student three : max = 1
Student four : max = 4 (grade 4 and 3 stop counter, but don't reset it)
Student five : max = 5 (because grade 2 resets the counter; the lack of a grade on 2017-03-02 is not a problem for the counter)
One method is to use two subqueries and one analytic function:
Demo: http://sqlfiddle.com/#!15/74b71/10
SELECT student, max( xxx )
FROM (
SELECT student, grp_nbr, count(CASE WHEN grade = 5 THEN 1 END) As xxx
FROM (
SELECT *,
SUM ( CASE WHEN grade in (1,2)
THEN 1 ELSE 0
END
) OVER (Partition by student Order By gradeDate ) As grp_nbr
FROM table1
) x
GROUP BY student, grp_nbr
) y
GROUP BY student
ORDER BY student
| student | max |
|---------|-----|
| 1 | 3 |
| 2 | 1 |
| 3 | 1 |
| 4 | 4 |
| 5 | 5 |
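The logic of the query can be checked by hand with a few lines of Python: the running SUM starts a new group at every grade of 1 or 2, the inner query counts the 5s inside each group, and the outer query takes the largest count. A small sketch of that rule, assuming the grades are already sorted by gradeDate; the function name max_fives and its reset_grades parameter are made up for this example.

```python
def max_fives(grades, reset_grades=(1, 2)):
    """Largest number of 5s between two resetting grades.

    Grades 3 and 4 are simply skipped, so they pause the counter
    without resetting it.
    """
    best = current = 0
    for grade in grades:
        if grade in reset_grades:
            current = 0  # a 1 or 2 starts a new group
        elif grade == 5:
            current += 1
            best = max(best, current)
    return best


# Grade sequences per student, taken from the sample data in date order.
students = {
    1: [5, 5, 1, 5, 5, 5, 1],
    2: [5],
    3: [5],
    4: [5, 5, 4, 3, 5, 5],
    5: [5, 5, 5, 5, 5, 2, 5, 5],
}
results = {student: max_fives(g) for student, g in students.items()}
print(results)
```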

Adding missing information to a table? (considering random start and end months)

I have the following table spanishcourse, representing the grades of students in a Spanish course. This is a school with courses starting every month, and students can join and leave the course at any point during the year (the month in and month out columns, respectively). Some of these students were absent in some months, and when a student is absent the grade should be 0. The problem is that the table has no row for the months in which a student was absent, so the grade of 0 is missing (grades column).
month in month out month student grades
3 9 3 John 10
3 9 5 John 8
3 9 6 John 4
3 9 7 John 3
3 9 9 John 7
2 7 2 Mary 9
2 7 3 Mary 2
2 7 6 Mary 6
2 7 7 Mary 9
1 3 1 Jane 8
1 3 2 Jane 7
1 3 3 Jane 5
6 10 6 Rick 9
6 10 8 Rick 1
6 10 10 Rick 3
The output that I need is as follows, showing only Rick's rows:
month in month out month student grades
6 10 6 Rick 9
6 10 7 Rick 0
6 10 8 Rick 1
6 10 9 Rick 0
6 10 10 Rick 3
Conclusion: I only need to add the missing periods from the start until the end of a student. Considering Rick's example, we only added months 7 and 9 as having grade 0. Can some of you help me please?
PS: I have already seen other answered questions, but they solve a different problem: they assume the data always runs from 1 to n, rather than between arbitrary start and end months as in this example.
You can do this using cross join and left outer join. The cross join generates all combinations between students and months. The left outer join brings in the data for the matching records. Records that don't match get a grade of 0.
The following assumes that some student somewhere has a grade in each month:
select s.month_in, s.month_out, m.month, s.student,
coalesce(sc.grades, 0) as grades
from (select distinct student, month_in, month_out from spanishcourse sc) s cross join
(select distinct month from spanishcourse sc) m left outer join
spanishcourse sc
on sc.student = s.student and sc.month = m.month
where m.month between s.month_in and s.month_out;
Alternatively, in PostgreSQL, generate_series can produce each student's months directly (SQL Fiddle):
select s.month_in, s.month_out, month, student, coalesce(grades, 0)
from
spanishcourse sc
right join
(
select distinct
student, month_in, month_out,
generate_series(month_in, month_out, 1) as month
from spanishcourse
) s using (student, month)
order by student, month
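For trying the idea without a PostgreSQL server, here is a rough equivalent run through Python's built-in sqlite3 module. Since generate_series may not be available in a given SQLite build, a recursive CTE that lists months 1 to 12 is swapped in; that CTE, its name months, and the assumption that months are numbered 1 to 12 are choices made for this sketch, not part of the answers above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table spanishcourse "
    "(month_in integer, month_out integer, month integer, student text, grades integer)"
)
conn.executemany(
    "insert into spanishcourse values (?, ?, ?, ?, ?)",
    [(3, 9, 3, "John", 10), (3, 9, 5, "John", 8), (3, 9, 6, "John", 4),
     (3, 9, 7, "John", 3), (3, 9, 9, "John", 7),
     (2, 7, 2, "Mary", 9), (2, 7, 3, "Mary", 2), (2, 7, 6, "Mary", 6),
     (2, 7, 7, "Mary", 9),
     (1, 3, 1, "Jane", 8), (1, 3, 2, "Jane", 7), (1, 3, 3, "Jane", 5),
     (6, 10, 6, "Rick", 9), (6, 10, 8, "Rick", 1), (6, 10, 10, "Rick", 3)],
)

query = """
with recursive months(month) as (
    -- stand-in for generate_series: months 1..12
    select 1
    union all
    select month + 1 from months where month < 12
)
select s.month_in, s.month_out, m.month, s.student,
       coalesce(sc.grades, 0) as grades
from (select distinct student, month_in, month_out from spanishcourse) s
join months m
  on m.month between s.month_in and s.month_out   -- only the student's own span
left join spanishcourse sc
  on sc.student = s.student and sc.month = m.month -- missing months give NULL
order by s.student, m.month
"""
rows = conn.execute(query).fetchall()
rick = [r for r in rows if r[3] == "Rick"]
for r in rick:
    print(r)
```

Because the months come from a generated list rather than from the table itself, a month in which no student has a grade (month 4 in the sample data) is still filled in.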