If I have a table like this in Hive:
name impressions sampling_rate
------------------------------------
paul 34 1
emma 0 3
greg 0 5
How can I duplicate each row in a select statement by the sampling_rate column so that it would look like this:
name impressions sampling_rate
------------------------------------
paul 34 1
emma 0 3
emma 0 3
emma 0 3
greg 0 5
greg 0 5
greg 0 5
greg 0 5
greg 0 5
Using space() you can produce a string of spaces with length = sampling_rate - 1, split it on ' ' and explode the resulting array with lateral view. Splitting n-1 spaces yields n empty strings, so each row is emitted sampling_rate times.
Demo:
with your_table as(--Demo data, use your table instead of this CTE
select stack (3, --number of tuples
'paul',34,1,
'emma', 0,3,
'greg', 0,5
) as (name,impressions,sampling_rate)
)
select t.*
from your_table t --use your table here
lateral view explode(split(space(t.sampling_rate-1),' '))e
Result:
name impressions sampling_rate
------------------------------------
paul 34 1
emma 0 3
emma 0 3
emma 0 3
greg 0 5
greg 0 5
greg 0 5
greg 0 5
greg 0 5
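To see why the trick produces exactly sampling_rate copies, here is a minimal Python sketch of the same idea. It assumes Python's str.split(' ') mirrors what Hive's split does on a string of spaces; the rows list and the duplicate_rows helper are hypothetical names used only for illustration.

```python
# Demo rows mirroring the table above: (name, impressions, sampling_rate)
rows = [("paul", 34, 1), ("emma", 0, 3), ("greg", 0, 5)]


def duplicate_rows(rows):
    """Repeat each row sampling_rate times via the space/split trick."""
    out = []
    for name, impressions, rate in rows:
        # " " * (rate - 1) plays the role of space(rate - 1);
        # splitting n-1 spaces on " " yields n empty strings.
        for _ in (" " * (rate - 1)).split(" "):
            out.append((name, impressions, rate))
    return out


result = duplicate_rows(rows)
for row in result:
    print(*row)
```

The empty strings themselves are never used; only their count matters, which is why the Hive query selects t.* and ignores the exploded column.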
I have a df
name cars
john honda,kia
tom honda,kia,nissan
jack toyota
johnny honda,kia
tommy honda,kia,nissan
jacky toyota
What is the best way, using pandas, to add one column per car to the existing df, holding 1 if the car is present and 0 otherwise? The result would look like this:
name cars honda kia nissan toyota
john honda,kia 1 1 0 0
tom honda,kia,nissan 1 1 1 0
jack toyota 0 0 0 1
johnny honda,kia 1 1 0 0
tommy honda,kia,nissan 1 1 1 0
jacky toyota 0 0 0 1
I tried using np.where with multiple conditions, as described here, but I don't think it's the right approach.
That's exactly what pd.Series.str.get_dummies does; just join its result to your dataframe without the cars column:
>>> df.drop(columns=['cars']).join(df['cars'].str.get_dummies(sep=','))
name honda kia nissan toyota
0 john 1 1 0 0
1 tom 1 1 1 0
2 jack 0 0 0 1
3 johnny 1 1 0 0
4 tommy 1 1 1 0
5 jacky 0 0 0 1
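For readers who want to see what the indicator columns contain, here is a plain-Python sketch of the same logic without pandas: collect the distinct tokens, sort them, and emit 1 or 0 per token for each row. The names records, columns and dummies are hypothetical, and this only illustrates the idea; it is not how pandas implements get_dummies.

```python
# Sample data mirroring the df above: (name, cars)
records = [
    ("john", "honda,kia"),
    ("tom", "honda,kia,nissan"),
    ("jack", "toyota"),
    ("johnny", "honda,kia"),
    ("tommy", "honda,kia,nissan"),
    ("jacky", "toyota"),
]

# One indicator column per distinct car, in sorted order
columns = sorted({car for _, cars in records for car in cars.split(",")})

# For each row, 1 if the car appears in its comma-separated list, else 0
dummies = [
    {"name": name, "cars": cars,
     **{col: int(col in cars.split(",")) for col in columns}}
    for name, cars in records
]

for row in dummies:
    print(row)
```

This sketch keeps the cars column, as in the desired output of the question. In pandas the equivalent would presumably be to join without the drop, i.e. df.join(df['cars'].str.get_dummies(sep=',')).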
I have a sql table like:
ID Name Balance
1 Peter 324.5
2 Michael 122.7
3 Peter 788.3
4 Mark 45.7
5 Ralph 333.5
6 Thomas 563.2
7 Ralph 9685.1
8 Peter 2444.5
9 Susi 35.2
10 Andrew 442.5
11 Susi 2424.8
Is it possible to write a while loop in SQL that adds a new column holding an integer (for example 1...3) for each duplicated name (3 times Peter, 2 times Susi, 2 times Ralph)? For the non-duplicated names the value should be 0.
So the final table should look like this:
ID Name Balance Value
1 Peter 324.5 1
2 Michael 122.7 0
3 Peter 788.3 1
4 Mark 45.7 0
5 Ralph 333.5 2
6 Thomas 563.2 0
7 Ralph 9685.1 2
8 Peter 2444.5 1
9 Susi 35.2 3
10 Andrew 442.5 0
11 Susi 2424.8 3
You wouldn't want to use a while loop for this. Just use window functions:
select t.*, count(*) over (partition by name) as cnt
from t;
This provides the total count for each name. If you want an incremental value, you can use row_number():
select t.*, row_number() over (partition by name order by id) as seqnum
from t;
This would enumerate the rows for each name, so every name would have a "1" value, some would have "2" and so on.
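Neither query reproduces the Value column from the question exactly (one number per duplicated name, 0 for unique names). A possible way to get there, sketched with Python's built-in sqlite3 module, is to number only the duplicated names and join the result back. This assumes an SQLite build with window-function support (3.25 or later) and that the groups should be numbered by first appearance; the CTE names dup and numbered are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT, balance REAL)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, "Peter", 324.5), (2, "Michael", 122.7), (3, "Peter", 788.3),
     (4, "Mark", 45.7), (5, "Ralph", 333.5), (6, "Thomas", 563.2),
     (7, "Ralph", 9685.1), (8, "Peter", 2444.5), (9, "Susi", 35.2),
     (10, "Andrew", 442.5), (11, "Susi", 2424.8)],
)

# Number only the duplicated names (in order of first appearance),
# then join back so that unique names fall through to 0.
query = """
WITH dup AS (
    SELECT name, MIN(id) AS first_id
    FROM t
    GROUP BY name
    HAVING COUNT(*) > 1
),
numbered AS (
    SELECT name, ROW_NUMBER() OVER (ORDER BY first_id) AS value
    FROM dup
)
SELECT t.id, t.name, t.balance, COALESCE(n.value, 0) AS value
FROM t
LEFT JOIN numbered n ON n.name = t.name
ORDER BY t.id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The LEFT JOIN plus COALESCE is what turns "no match in the duplicated set" into 0, so no loop is needed.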
I have a query that gives me, for every student, the longest run of consecutive grade 5s (a run is broken as soon as the student gets any other grade):
select distinct on (student, class) scg.*
from (select student, class, grade, count(*) as cnt,
min(gradeDate) as min_gradeDate, max(gradeDate) as max_gradeDate
from (select t.*,
row_number() over (partition by student, class, grade order by gradeDate) as seqnum_scg,
row_number() over (partition by student, class order by gradeDate) as seqnum_sc
from t
) t
where grade = 5
group by student, class, grade, (seqnum_sc - seqnum_scg)
) scg
order by student, class, cnt desc;
The original problem is explained here:
How to count data with specific values and for specific user/person (in row)?
But now I want to extend this query with one more feature. Currently the counter gives me the longest run of 5s, and any grade 4/3/2/1 breaks the run. Now I want it to:
stop counting if the student has a grade 4 or 3, and resume (from the previous count) when the student gets another 5
What I mean:
Actual query: 5, 5, 5, 4, 3, 5, 5, 2 --> gives me max = 3
New query: 5, 5, 5, 4, 3, 5, 5, 2 --> gives me max = 5, because 4 and 3 pause the counter and it resumes when the student gets another 5
stop counting if the student gets a grade 2 or 1 (and give me the max value reached before the 2/1 grade). So the same thing the query does now for every grade except 5, but I want it only for 2 and lower (a threshold that I can specify in the query).
Can someone help me rewrite the second query given by @Gordon Linoff to work like that and tell me what changed?
Edit: examples as requested:
id student grade class gradeDate
1 1 5 1 2017-03-03
2 1 5 1 2017-03-04
3 1 1 1 2017-03-05
4 1 5 1 2017-03-06
5 1 5 1 2017-03-07
6 1 5 1 2017-03-08
7 1 1 1 2017-03-09
8 2 5 2 2017-03-03
9 3 5 3 2017-03-03
10 4 5 4 2017-03-03
11 4 5 4 2017-03-04
12 4 4 4 2017-03-05
13 4 3 4 2017-03-06
14 4 5 4 2017-03-07
15 4 5 4 2017-03-08
16 5 5 5 2017-03-01
17 5 5 5 2017-03-03
18 5 5 5 2017-03-04
19 5 5 5 2017-03-05
20 5 5 5 2017-03-06
21 5 2 5 2017-03-07
22 5 5 5 2017-03-08
23 5 5 5 2017-03-09
Student one : max = 3
Student two : max = 1
Student three : max = 1
Student four : max = 4 (grade 4 and 3 stop counter, but don't reset it)
Student five : max = 5 (because grade 2 reset counter, lack of grade on date
2017-03-02 is not a problem for counter)
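One approach uses two nested subqueries and one analytic (window) function: a running count of grades 1 and 2 per student assigns a group number to every row, the 5s are counted within each group, and the largest count per student is the answer. Grades 3 and 4 neither start a new group nor get counted, so they pause the counter without resetting it.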
Demo: http://sqlfiddle.com/#!15/74b71/10
SELECT student, max( xxx )
FROM (
SELECT student, grp_nbr, count(CASE WHEN grade = 5 THEN 1 END) As xxx
FROM (
SELECT *,
SUM ( CASE WHEN grade in (1,2)
THEN 1 ELSE 0
END
) OVER (Partition by student Order By gradeDate ) As grp_nbr
FROM table1
) x
GROUP BY student, grp_nbr
) y
GROUP BY student
ORDER BY student
| student | max |
|---------|-----|
| 1 | 3 |
| 2 | 1 |
| 3 | 1 |
| 4 | 4 |
| 5 | 5 |
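The grouping step can be easier to follow outside SQL. Below is a small Python sketch of the same logic, assuming the grades of one student are already sorted by gradeDate; max_fives and reset_grades are hypothetical names, and the sample lists are transcribed from the example table in the question.

```python
from itertools import groupby


def max_fives(grades, reset_grades=(1, 2)):
    """Largest number of 5s seen between two resetting grades.

    `grades` must already be ordered by gradeDate.
    """
    group_nbr = 0
    labelled = []
    for grade in grades:
        # Running count of resetting grades, like the windowed SUM(...)
        if grade in reset_grades:
            group_nbr += 1
        labelled.append((group_nbr, grade))
    # Count the 5s per group, then take the maximum over all groups
    counts = [
        sum(1 for _, g in members if g == 5)
        for _, members in groupby(labelled, key=lambda pair: pair[0])
    ]
    return max(counts, default=0)


students = {
    1: [5, 5, 1, 5, 5, 5, 1],
    2: [5],
    3: [5],
    4: [5, 5, 4, 3, 5, 5],
    5: [5, 5, 5, 5, 5, 2, 5, 5],
}
result = {student: max_fives(grades) for student, grades in students.items()}
print(result)
```

Changing reset_grades corresponds to editing the `grade in (1,2)` condition in the SQL, which is the one place where the resetting threshold is specified.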
I have the following data in SQL Server
St 1 2 3 4 5 6 7 8
===========================================
603 2 5 1.5 3 0 0 0 0
603 0 0 0 0 2 1 3 5
As I insert the data by batches, each batch only has 4 columns each and I want to collate the data to the following
St 1 2 3 4 5 6 7 8
===========================================
603 2 5 1.5 3 2 1 3 5
but most of the threads I see here are about concatenating strings of a single column.
Does anyone have any idea how to collate or merge different rows into a single row?
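You can use GROUP BY together with the SUM aggregate in T-SQL:
SELECT ST, SUM(COL1), SUM(COL2), ... FROM tbl GROUP BY ST
You can use the GROUP BY clause and aggregate fields 1-8 with SUM. Because the column names are numbers, they have to be bracket-quoted; a bare SUM(1) would sum the literal 1 instead of the column:
SELECT St, SUM([1]), SUM([2]), ... FROM tbl GROUP BY St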
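As a quick check of the GROUP BY approach, here is a sketch that runs it against the sample rows using Python's built-in sqlite3 module rather than SQL Server. It assumes the columns are literally named 1 to 8 and relies on SQLite accepting the same bracket quoting ([1]) for such names; the variable names are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Columns are literally named 1..8, so they must be quoted as [1]..[8]
cols = ", ".join(f"[{i}] REAL" for i in range(1, 9))
conn.execute(f"CREATE TABLE tbl (St INTEGER, {cols})")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
    [(603, 2, 5, 1.5, 3, 0, 0, 0, 0),
     (603, 0, 0, 0, 0, 2, 1, 3, 5)],
)

# One SUM per numbered column, grouped by station
sums = ", ".join(f"SUM([{i}]) AS [{i}]" for i in range(1, 9))
merged = conn.execute(f"SELECT St, {sums} FROM tbl GROUP BY St").fetchall()
print(merged)
```

Summing works here because each batch leaves the other four columns at 0. If the untouched columns were NULL instead, SUM would still work since it ignores NULLs.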
Below is some sample data; my real task is similar to what I am asking here.
I want to run a query that returns a row if any of the three columns (prio, prio2, prio3) has a value of 1, whether that is one, two, or all three of them. The query should check all three columns and return every row where at least one of them is 1. Can anyone please help me with this? Thanks in advance.
ID Patient Patient Name prio prio2 prio3
-------------------------------------------------
1 101563 Robert Riley 1 1 1
2 101583 Cody Ayers 1 0 1
3 101825 Jason Lawler 0 0 1
4 101984 Dustin Lumis 1 0 0
5 102365 Stacy smith 1 0 0
6 102564 Frank Milon 1 0 0
7 102692 Thomas Kroning 1 0 0
8 102856 Andrew Philips 1 0 0
9 102915 Alice Davies 0 0 1
10 103785 Jon Durley 0 0 1
11 103958 Clayton Folsom 1 1 1
12 104696 Michelle Holsley 1 1 1
13 104983 Teresa Jones 1 0 1
14 105892 Betsy Prat 1 1 0
15 106859 Casey Ayers 1 1 0
So, basically you want to pull anything where any of the 3 columns prio, prio2, or prio3 = 1? Please clarify your question if this isn't what you are asking (for a better answer). Also, you should tag it with the type of SQL you are using.
SELECT ID,Patient,[Patient Name],prio,prio2, prio3
FROM uRtable
WHERE prio = 1 OR prio2 = 1 OR prio3 = 1
But, if you are saying that you want to pull back any row where any of the 3 columns prio,prio2, or prio3 = 1, but at least one of them is 0 (Get any where any of the 3 = 1 but exclude where they all = 1), probably the easiest way to understand that would be
SELECT ID,Patient,[Patient Name],prio,prio2, prio3
FROM uRtable
WHERE (prio = 1 OR prio2 = 1 OR prio3 = 1)
AND (prio = 0 OR prio2 = 0 OR prio3 = 0)
Try this:
select * from mytable
where prio + prio2 + prio3 = (
select max(prio + prio2 + prio3)
from mytable
where prio = 1 or prio2 = 1 or prio3 = 1
)
SELECT *
FROM tbl
WHERE 1 IN (prio,prio2,prio3)
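To compare the variants above, here is a sketch using Python's built-in sqlite3 module. The table holds only the ID and the three prio columns from the sample data, plus one extra all-zero row (ID 16) that is not in the original data and was added purely so that the filter has something to exclude; the ids helper is likewise hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl (id INTEGER, prio INTEGER, prio2 INTEGER, prio3 INTEGER)"
)
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?, ?)",
    [(1, 1, 1, 1), (2, 1, 0, 1), (3, 0, 0, 1), (4, 1, 0, 0), (5, 1, 0, 0),
     (6, 1, 0, 0), (7, 1, 0, 0), (8, 1, 0, 0), (9, 0, 0, 1), (10, 0, 0, 1),
     (11, 1, 1, 1), (12, 1, 1, 1), (13, 1, 0, 1), (14, 1, 1, 0), (15, 1, 1, 0),
     (16, 0, 0, 0)],  # hypothetical all-zero row, not in the sample data
)


def ids(where_clause):
    """Return the ids matching a WHERE clause, in id order."""
    sql = f"SELECT id FROM tbl WHERE {where_clause} ORDER BY id"
    return [row[0] for row in conn.execute(sql)]


# Any of the three columns is 1: the OR form and the IN form agree
any_or = ids("prio = 1 OR prio2 = 1 OR prio3 = 1")
any_in = ids("1 IN (prio, prio2, prio3)")

# At least one 1 and at least one 0: drops the all-ones rows as well
mixed = ids("1 IN (prio, prio2, prio3) AND 0 IN (prio, prio2, prio3)")

print(any_or == any_in, any_in)
print(mixed)
```

On this data the OR and IN forms return the same rows, so choosing between them is mostly a matter of readability.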