How to duplicate row based on int column

If I have a table like this in Hive:
name impressions sampling_rate
------------------------------------
paul 34 1
emma 0 3
greg 0 5
How can I duplicate each row in a select statement by the sampling_rate column so that it would look like this:
name impressions sampling_rate
------------------------------------
paul 34 1
emma 0 3
emma 0 3
emma 0 3
greg 0 5
greg 0 5
greg 0 5
greg 0 5
greg 0 5

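Using space(), you can produce a string of sampling_rate - 1 spaces, split it on the space character, and explode the resulting array with a lateral view. Splitting n - 1 spaces yields n empty strings, so each row is repeated sampling_rate times.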
Demo:
with your_table as ( -- Demo data; use your table instead of this CTE
    select stack(3, -- number of tuples
                 'paul', 34, 1,
                 'emma',  0, 3,
                 'greg',  0, 5
           ) as (name, impressions, sampling_rate)
)
select t.*
from your_table t -- use your table here
lateral view explode(split(space(t.sampling_rate - 1), ' ')) e
Result:
name impressions sampling_rate
------------------------------------
paul 34 1
emma 0 3
emma 0 3
emma 0 3
greg 0 5
greg 0 5
greg 0 5
greg 0 5
greg 0 5
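
To see why the split produces exactly sampling_rate elements, the same steps can be mimicked outside Hive. This is only a plain-Python sketch of the idea, not Hive code; the helper name duplicate_rows is hypothetical.

```python
def duplicate_rows(rows):
    """Repeat each (name, impressions, sampling_rate) row sampling_rate times."""
    out = []
    for name, impressions, sampling_rate in rows:
        # Mirrors split(space(n - 1), ' '): n - 1 spaces split into n empty strings.
        pieces = (" " * (sampling_rate - 1)).split(" ")
        # Mirrors LATERAL VIEW explode: one output row per array element.
        for _ in pieces:
            out.append((name, impressions, sampling_rate))
    return out


rows = [("paul", 34, 1), ("emma", 0, 3), ("greg", 0, 5)]
result = duplicate_rows(rows)
for row in result:
    print(*row)
```

With a sampling_rate of 1 the string is empty, and splitting an empty string still yields one element, which is why paul keeps a single row.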

Related

pandas conditional fill binary [duplicate]

I have a df
name cars
john honda,kia
tom honda,kia,nissan
jack toyota
johnny honda,kia
tommy honda,kia,nissan
jacky toyota
What is the best way, using pandas, to add one column per car to the existing df, holding 1 if the car is present in cars and 0 otherwise? The result should look like this:
name cars honda kia nissan toyota
john honda,kia 1 1 0 0
tom honda,kia,nissan 1 1 1 0
jack toyota 0 0 0 1
johnny honda,kia 1 1 0 0
tommy honda,kia,nissan 1 1 1 0
jacky toyota 0 0 0 1
I tried using np.where with multiple conditions as described here, but I don't think it's the right approach.

That's exactly what pd.Series.str.get_dummies does. Just join its result to your dataframe without the cars column:
>>> df.drop(columns=['cars']).join(df['cars'].str.get_dummies(sep=','))
name honda kia nissan toyota
0 john 1 1 0 0
1 tom 1 1 1 0
2 jack 0 0 0 1
3 johnny 1 1 0 0
4 tommy 1 1 1 0
5 jacky 0 0 0 1
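
For intuition about what those indicator columns contain, the same 0/1 matrix can be built without pandas. This is a plain-Python illustration of the output, not how pandas implements str.get_dummies; the helper name indicator_columns is hypothetical.

```python
def indicator_columns(values, sep=","):
    """Return (sorted labels, one 0/1 row per input string)."""
    # Split every string into the set of labels it contains.
    split_values = [set(v.split(sep)) for v in values]
    # The columns are the sorted union of all labels seen.
    labels = sorted(set().union(*split_values))
    matrix = [[1 if label in present else 0 for label in labels]
              for present in split_values]
    return labels, matrix


cars = ["honda,kia", "honda,kia,nissan", "toyota",
        "honda,kia", "honda,kia,nissan", "toyota"]
labels, matrix = indicator_columns(cars)
print(labels)
for row in matrix:
    print(row)
```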

Sql: add a column with integers in a loop for duplicates

I have a sql table like:
ID Name Balance
1 Peter 324.5
2 Michael 122.7
3 Peter 788.3
4 Mark 45.7
5 Ralph 333.5
6 Thomas 563.2
7 Ralph 9685.1
8 Peter 2444.5
9 Susi 35.2
10 Andrew 442.5
11 Susi 2424.8
Is it possible to write a while loop in SQL that adds a new column with an integer (for example 1 to 3) for each duplicated name (Peter appears three times, Susi twice, Ralph twice)? Non-duplicated names should get the value 0.
So the final table should look like this:
ID Name Balance Value
1 Peter 324.5 1
2 Michael 122.7 0
3 Peter 788.3 1
4 Mark 45.7 0
5 Ralph 333.5 2
6 Thomas 563.2 0
7 Ralph 9685.1 2
8 Peter 2444.5 1
9 Susi 35.2 3
10 Andrew 442.5 0
11 Susi 2424.8 3

You wouldn't want to use a while loop for this. Just use window functions:
select t.*, count(*) over (partition by name) as cnt
from t;
This provides the total count for each name. If you want an incremental value, you can use row_number():
select t.*, row_number() over (partition by name order by id) as seqnum
from t;
This would enumerate the rows for each name, so every name would have a "1" value, some would have "2" and so on.
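
To try both window functions on the sample data, here is a minimal sketch using Python's built-in sqlite3 module. It assumes the bundled SQLite is version 3.25 or newer (the first release with window functions); the table name t and the lowercase column names are taken from the queries above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (id integer, name text, balance real)")
conn.executemany("insert into t values (?, ?, ?)", [
    (1, "Peter", 324.5), (2, "Michael", 122.7), (3, "Peter", 788.3),
    (4, "Mark", 45.7), (5, "Ralph", 333.5), (6, "Thomas", 563.2),
    (7, "Ralph", 9685.1), (8, "Peter", 2444.5), (9, "Susi", 35.2),
    (10, "Andrew", 442.5), (11, "Susi", 2424.8),
])

# cnt: total rows per name; seqnum: running number within each name.
query = """
select t.id, t.name,
       count(*) over (partition by name) as cnt,
       row_number() over (partition by name order by id) as seqnum
from t
order by t.id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Names that occur once get cnt = 1 and seqnum = 1, so mapping them to 0 as in the question would need an extra CASE expression on cnt.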

How to rewrite query which gives amount of specific value in row to avoid some values and count further with others?

I have a query that gives me, for every student, the number of grade 5 results in a row (as long as the student has no other grade in between):
select distinct on (student, class) scg.*
from (select student, class, grade, count(*) as cnt,
             min(gradeDate) as min_gradeDate, max(gradeDate) as max_gradeDate
      from (select t.*,
                   row_number() over (partition by student, class, grade order by gradeDate) as seqnum_scg,
                   row_number() over (partition by student, class order by gradeDate) as seqnum_sc
            from t
           ) t
      where grade = 5
      group by student, class, grade, (seqnum_sc - seqnum_scg)
     ) scg
order by student, class, cnt desc;
The original problem is explained here:
How to count data with specific values and for specific user/person (in row)?
Now I want to extend this query with one more feature. Currently the counter returns the longest streak of 5s, and any other grade (4, 3, 2, or 1) breaks the streak. Instead, I want it to:
Pause counting when the student gets a 4 or 3, and resume from the previous count when the student gets another 5.
What I mean:
Actual query: 5, 5, 5, 4, 3, 5, 5, 2 --> gives me max = 3
New query: 5, 5, 5, 4, 3, 5, 5, 2 --> gives me max = 5, because 4 and 3 pause the counter and it resumes when the student gets another 5
Stop counting when the student gets a 2 or 1, and report the maximum reached before that grade. This is what the query does now for every grade other than 5, but I want it only for grades 2 and lower (a threshold I can specify in the query).
Can someone help me rewrite the second query given by @Gordon Linoff to work like that, and tell me what changed?
Edit: examples as requested:
id student grade class gradeDate
1 1 5 1 2017-03-03
2 1 5 1 2017-03-04
3 1 1 1 2017-03-05
4 1 5 1 2017-03-06
5 1 5 1 2017-03-07
6 1 5 1 2017-03-08
7 1 1 1 2017-03-09
8 2 5 2 2017-03-03
9 3 5 3 2017-03-03
10 4 5 4 2017-03-03
11 4 5 4 2017-03-04
12 4 4 4 2017-03-05
13 4 3 4 2017-03-06
14 4 5 4 2017-03-07
15 4 5 4 2017-03-08
16 5 5 5 2017-03-01
17 5 5 5 2017-03-03
18 5 5 5 2017-03-04
19 5 5 5 2017-03-05
20 5 5 5 2017-03-06
21 5 2 5 2017-03-07
22 5 5 5 2017-03-08
23 5 5 5 2017-03-09
Student one: max = 3
Student two: max = 1
Student three: max = 1
Student four: max = 4 (grades 4 and 3 pause the counter but don't reset it)
Student five: max = 5 (grade 2 resets the counter; the missing grade on 2017-03-02 is not a problem for the counter)

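One approach uses two nested subqueries and one analytic (window) function. The running sum starts a new group number at every grade 1 or 2, the 5s are counted per group, and the largest count per student is returned: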
Demo: http://sqlfiddle.com/#!15/74b71/10
SELECT student, max( xxx )
FROM (
    SELECT student, grp_nbr, count(CASE WHEN grade = 5 THEN 1 END) As xxx
    FROM (
        SELECT *,
               SUM ( CASE WHEN grade in (1,2)
                          THEN 1 ELSE 0
                     END
                   ) OVER (Partition by student Order By gradeDate ) As grp_nbr
        FROM table1
    ) x
    GROUP BY student, grp_nbr
) y
GROUP BY student
ORDER BY student
| student | max |
|---------|-----|
| 1 | 3 |
| 2 | 1 |
| 3 | 1 |
| 4 | 4 |
| 5 | 5 |
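
The grouping rule behind that query can also be traced by hand. The following plain-Python sketch restates the logic (a 1 or 2 starts a new group, only 5s are counted, 3s and 4s are ignored); it is an illustration rather than a replacement for the SQL, and the function name max_fives is hypothetical. The grade lists are the sample rows above, ordered by gradeDate.

```python
def max_fives(grades, reset_grades=(1, 2)):
    """Largest number of 5s in any stretch between reset grades."""
    best = current = 0
    for grade in grades:
        if grade in reset_grades:
            current = 0              # a 1 or 2 starts a new group
        elif grade == 5:
            current += 1             # only 5s are counted
            best = max(best, current)
        # grades 3 and 4 neither count nor reset
    return best


grades_by_student = {
    1: [5, 5, 1, 5, 5, 5, 1],
    2: [5],
    3: [5],
    4: [5, 5, 4, 3, 5, 5],
    5: [5, 5, 5, 5, 5, 2, 5, 5],
}
result = {student: max_fives(g) for student, g in grades_by_student.items()}
print(result)
```

Changing reset_grades plays the role of the "grade in (1,2)" condition in the query.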

Collating data in SQL Server

I have the following data in SQL Server
St 1 2 3 4 5 6 7 8
===========================================
603 2 5 1.5 3 0 0 0 0
603 0 0 0 0 2 1 3 5
Because I insert the data in batches and each batch fills only 4 of the columns, I want to collate the rows into the following:
St 1 2 3 4 5 6 7 8
===========================================
603 2 5 1.5 3 2 1 3 5
Most of the threads I see here are about concatenating strings from a single column.
Does anyone have an idea how to collate or merge different rows into a single row?

You can use GROUP BY together with the SUM aggregate in T-SQL:
SELECT SUM(COL1), SUM(COL2), ... FROM tbl GROUP BY ST

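You can use the GROUP BY clause and aggregate fields 1-8 with SUM. Because the column names are numeric, they must be bracket-quoted; SUM(1) would sum the literal 1 instead of the column:
SELECT St, SUM([1]), SUM([2]), ... FROM tbl GROUP BY St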
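
The SUM approach works here because every value a batch did not fill is 0. A minimal sketch with Python's built-in sqlite3 module shows the collapse on the two sample rows; it relies on SQLite accepting bracket-quoted identifiers for SQL Server compatibility, and the REAL column type is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Columns literally named 1..8, bracket-quoted as in T-SQL.
cols = ", ".join(f"[{i}] real" for i in range(1, 9))
conn.execute(f"create table tbl (St integer, {cols})")
conn.executemany("insert into tbl values (?, ?, ?, ?, ?, ?, ?, ?, ?)", [
    (603, 2, 5, 1.5, 3, 0, 0, 0, 0),
    (603, 0, 0, 0, 0, 2, 1, 3, 5),
])

# One SUM per column, grouped by St, merges the batches into one row.
sums = ", ".join(f"sum([{i}])" for i in range(1, 9))
collated = conn.execute(f"select St, {sums} from tbl group by St").fetchall()
print(collated)
```

If the unfilled values were NULL instead of 0, SUM would still work, since aggregates skip NULLs.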

stored procedure to find value in 2 columns out of 3

I am including sample data below; what I need to do is similar to what I describe here.
I want to run a query that returns rows where any two of the three columns contain a 1, or where any single column contains a 1. It should search all three columns, and wherever it finds a 1 in any of them, it should return that row. Can anyone please help me with this? Thanks in advance.
ID Patient Patient Name prio prio2 prio3
-------------------------------------------------
1 101563 Robert Riley 1 1 1
2 101583 Cody Ayers 1 0 1
3 101825 Jason Lawler 0 0 1
4 101984 Dustin Lumis 1 0 0
5 102365 Stacy smith 1 0 0
6 102564 Frank Milon 1 0 0
7 102692 Thomas Kroning 1 0 0
8 102856 Andrew Philips 1 0 0
9 102915 Alice Davies 0 0 1
10 103785 Jon Durley 0 0 1
11 103958 Clayton Folsom 1 1 1
12 104696 Michelle Holsley 1 1 1
13 104983 Teresa Jones 1 0 1
14 105892 Betsy Prat 1 1 0
15 106859 Casey Ayers 1 1 0

So, basically, you want to return any row where at least one of the three columns prio, prio2, or prio3 equals 1? Please clarify your question if that isn't what you are asking (for a better answer). You should also tag it with the SQL dialect you are using.
SELECT ID,Patient,[Patient Name],prio,prio2, prio3
FROM uRtable
WHERE prio = 1 OR prio2 = 1 OR prio3 = 1
But if you mean that you want any row where at least one of the three columns is 1 and at least one of them is 0 (that is, exclude rows where all three are 1), probably the easiest form to understand is:
SELECT ID,Patient,[Patient Name],prio,prio2, prio3
FROM uRtable
WHERE (prio = 1 OR prio2 = 1 OR prio3 = 1)
AND (prio = 0 OR prio2 = 0 OR prio3 = 0)

Try this:
select * from mytable
where prio + prio2 + prio3 = (
select max(prio + prio2 + prio3)
from mytable
where prio = 1 or prio2 = 1 or prio3 = 1
)

SELECT *
FROM tbl
WHERE 1 IN (prio,prio2,prio3)
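
The answers above do not all select the same rows. The sketch below runs each WHERE clause against the sample data with Python's built-in sqlite3 module so the differences are visible. It is an illustration under assumptions: a single table named mytable stands in for uRtable, mytable, and tbl, the patient columns are omitted, and the helper ids is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table mytable (id integer, prio integer, prio2 integer, prio3 integer)"
)
conn.executemany("insert into mytable values (?, ?, ?, ?)", [
    (1, 1, 1, 1), (2, 1, 0, 1), (3, 0, 0, 1), (4, 1, 0, 0), (5, 1, 0, 0),
    (6, 1, 0, 0), (7, 1, 0, 0), (8, 1, 0, 0), (9, 0, 0, 1), (10, 0, 0, 1),
    (11, 1, 1, 1), (12, 1, 1, 1), (13, 1, 0, 1), (14, 1, 1, 0), (15, 1, 1, 0),
])


def ids(where):
    """Return the ids of the rows matching the given WHERE clause."""
    sql = f"select id from mytable where {where} order by id"
    return [row[0] for row in conn.execute(sql)]


# At least one column is 1 (two equivalent spellings).
any_one = ids("prio = 1 or prio2 = 1 or prio3 = 1")
any_one_in = ids("1 in (prio, prio2, prio3)")
# At least one 1 and at least one 0.
mixed = ids("(prio = 1 or prio2 = 1 or prio3 = 1) "
            "and (prio = 0 or prio2 = 0 or prio3 = 0)")
# Only the rows with the highest number of 1s.
max_sum = ids("prio + prio2 + prio3 = ("
              "select max(prio + prio2 + prio3) from mytable "
              "where prio = 1 or prio2 = 1 or prio3 = 1)")

print(any_one, any_one_in, mixed, max_sum, sep="\n")
```

On this data every row has at least one 1, so the OR and IN forms return all 15 rows, while the max-sum query returns only the rows where all three columns are 1.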
