Let's say I have a table table1 with the following structure:
id date v1 v2 v3 v4 ... vn
------------------------------
1 03 Y N 89 77 ... x
1 04 N N 9 7 ... i
1 05 N Y 6 90 ... j
1 06 N Y 9 34 ... i
1 07 N Y 0 88 ... i
2 03 N N 9 77 ... f
2 04 Y Y 90 7 ... y
2 05 Y N 6 90 ... v
2 06 N Y 9 34 ... i
2 07 N N 10 88 ... i
As you can see, the table has five rows for each id. I'd like to create two new columns:
-summarystory:= This variable is computed for those rows having the date between 05 and 07 and is the sum of the variable v3 for the last three rows.
Let me explain this better: the first two rows (date 03 and 04) must have NULL values, but the row with date=05 is the sum of the last three v3 values, i.e., 89+9+6=104. Likewise, the row with date=06 must be equal to 9+6+9=24. This has to be done for each id and for each date.
This is the desired result:
id date v3 summarystory
-------------------------
1 03 89 NULL
1 04 9 NULL
1 05 6 104
1 06 9 24
1 07 0 15
2 03 9 NULL
2 04 90 NULL
2 05 6 105
2 06 9 105
2 07 10 25
-VcountYN:= the number of Y values in each row (based only on variables v1 and v2). So, for instance, for the first row it would be VcountYN=1. This variable must be computed for all rows.
Any help is much appreciated.
Here's how to do the computations. Turning it into the new table is left as an exercise:
-- SQL 2012 version
Select
t.id,
t.[date],
Case When [date] Between 5 And 7 Then
Sum(v3) over (
partition by
id
order by
[date]
rows between
2 preceding and current row
) Else Null End As summarystory,
Case When v1 = 'Y' Then 1 Else 0 End +
Case When v2 = 'Y' Then 1 Else 0 End As VcountYN
From
table1 t;
-- SQL 2005 version
Select
t1.id,
t1.[date],
Case When t1.[date] Between 5 And 7 Then t1.v3 + IsNull(t2.v3, 0) + IsNull(t3.v3, 0) Else Null End As summarystory,
Case When t1.v1 = 'Y' Then 1 Else 0 End +
Case When t1.v2 = 'Y' Then 1 Else 0 End As VcountYN
From
table1 t1
Left Outer Join
table1 t2
On t1.id = t2.id and t1.[date] = t2.[date] + 1
Left Outer Join
table1 t3
On t2.id = t3.id and t2.[date] = t3.[date] + 1
http://sqlfiddle.com/#!6/a1c45/2
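To see the SQL 2012 window-function approach produce the desired numbers, here is a minimal Python sketch that runs an equivalent query against an in-memory SQLite database (window functions need SQLite 3.25 or later). It is not SQL Server: only the id, date, v1, v2 and v3 columns are modelled, and date is assumed to be stored as an integer (3 to 7) so that BETWEEN 5 AND 7 compares numerically.

```python
import sqlite3

# Sample rows from table1 above, reduced to the columns the query reads:
# (id, date, v1, v2, v3). Assumption: date is stored as an INTEGER.
ROWS = [
    (1, 3, "Y", "N", 89), (1, 4, "N", "N", 9), (1, 5, "N", "Y", 6),
    (1, 6, "N", "Y", 9), (1, 7, "N", "Y", 0),
    (2, 3, "N", "N", 9), (2, 4, "Y", "Y", 90), (2, 5, "Y", "N", 6),
    (2, 6, "N", "Y", 9), (2, 7, "N", "N", 10),
]

QUERY = """
SELECT id, date, v3,
       -- Sum of the current row and the two before it, per id,
       -- reported only for dates 5 to 7 (NULL otherwise).
       CASE WHEN date BETWEEN 5 AND 7 THEN
            SUM(v3) OVER (PARTITION BY id ORDER BY date
                          ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
       END AS summarystory,
       -- One point for each of v1 and v2 that equals 'Y'.
       (CASE WHEN v1 = 'Y' THEN 1 ELSE 0 END) +
       (CASE WHEN v2 = 'Y' THEN 1 ELSE 0 END) AS VcountYN
FROM table1
ORDER BY id, date
"""


def compute(rows):
    """Load rows into an in-memory table1 and run QUERY (SQLite 3.25+)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE table1 (id INTEGER, date INTEGER, "
                "v1 TEXT, v2 TEXT, v3 INTEGER)")
    con.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in compute(ROWS):
        print(row)
```

The summarystory column comes out as NULL, NULL, 104, 24, 15 for id 1 and NULL, NULL, 105, 105, 25 for id 2, matching the desired result.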
How can I use a COUNTIF statement in PostgreSQL?
max(COUNTIF(t1.A1:C10,t2.a1),COUNTIF(t1.A1:C10,t2.b1),COUNTIF(t1.A1:C10,t2.c1))
I have table1, which has more than a million rows:
a  b  c  M5
-----------
16 27 31
3  7  27
and table2, which has more than 100 rows and a column of different dates after column c:
a  b  c
--------
10 15 16
30 40 50
60 70 80
16 18 37
5  12 16
8  31 28
11 12 13
7  9  31
2  7  21
20 16 27
8  12 17
2  8  14
3  14 15
The outcome should be something like this:
a  b  c  M5
-----------
16 27 31 3
3  7  27 2
I tried the query below, but the outcome is not correct:
UPDATE table1 SET m5 = greatest(
case When a in(select unnest(array[a,b,c]) from (select * from table2 order by date DESC limit 10) foo) then 1 else 0 END,
case When b in(select unnest(array[a,b,c]) from (select * from table2 order by date DESC limit 10) foo) then 1 else 0 END,
case When c in(select unnest(array[a,b,c]) from (select * from table2 order by date DESC limit 10) foo) then 1 else 0 END)
Assuming your columns are fixed and predictable, I think you could put all possible table values into a single column and then do counts for each occurrence:
with exploded as (
select a from table2
union all
select b from table2
union all
select c from table2
)
select a, count (*) as count
from exploded e
group by a
So for example, the value 7 occurs twice (which would be reflected in this output).
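A minimal sketch of this explode-and-count step, run through Python's built-in sqlite3 module on the sample table2 values from the question (the date column is left out because this step does not use it). The CTE and UNION ALL are written the same way as in the PostgreSQL answer.

```python
import sqlite3

# Sample table2 rows from the question (a, b, c); the date column is omitted.
TABLE2 = [
    (10, 15, 16), (30, 40, 50), (60, 70, 80), (16, 18, 37), (5, 12, 16),
    (8, 31, 28), (11, 12, 13), (7, 9, 31), (2, 7, 21), (20, 16, 27),
    (8, 12, 17), (2, 8, 14), (3, 14, 15),
]

# Stack a, b and c into one column, then count each distinct value.
EXPLODE_AND_COUNT = """
WITH exploded AS (
    SELECT a FROM table2
    UNION ALL
    SELECT b FROM table2
    UNION ALL
    SELECT c FROM table2
)
SELECT a, count(*) AS count
FROM exploded
GROUP BY a
"""


def occurrence_counts(rows):
    """Return {value: number of times it appears in column a, b or c}."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE table2 (a INTEGER, b INTEGER, c INTEGER)")
    con.executemany("INSERT INTO table2 VALUES (?, ?, ?)", rows)
    counts = dict(con.execute(EXPLODE_AND_COUNT).fetchall())
    con.close()
    return counts


if __name__ == "__main__":
    print(occurrence_counts(TABLE2))
```

With this data 7 is counted twice and 16 four times.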
From there, you can just do the updates from the CTE:
with exploded as (
select a from table2
union all
select b from table2
union all
select c from table2
),
counted as (
select a, count (*) as count
from exploded e
group by a
)
update table1 t
set m5 = greatest (ca.count, cb.count, cc.count)
from
counted ca,
counted cb,
counted cc
where
t.a = ca.a and
t.b = cb.a and
t.c = cc.a
The only issue I see is when one of the values never occurs in table2: the inner join then finds no match, so that table1 row is not updated. In your example that doesn't seem to happen.
If it can happen, I think it could be resolved with one more CTE that adds the missing table1 values to the set of counted occurrences.
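One possible way to handle values that never occur in table2 is to replace the inner joins with LEFT JOINs and wrap each count in COALESCE, so a missing value counts as 0; this is a swapped-in technique rather than the extra CTE suggested above. The Python sqlite3 sketch below writes it as a SELECT instead of an UPDATE and uses SQLite's multi-argument max() where PostgreSQL would use GREATEST(). The third table1 row (99, 98, 97) is hypothetical, added only to show the missing-value case. Like the answer, it counts over all of table2 rather than only the 10 most recent rows, so the first row gets 4 rather than the 3 shown in the question.

```python
import sqlite3

TABLE1 = [(16, 27, 31), (3, 7, 27), (99, 98, 97)]  # (99, 98, 97) is hypothetical
TABLE2 = [
    (10, 15, 16), (30, 40, 50), (60, 70, 80), (16, 18, 37), (5, 12, 16),
    (8, 31, 28), (11, 12, 13), (7, 9, 31), (2, 7, 21), (20, 16, 27),
    (8, 12, 17), (2, 8, 14), (3, 14, 15),
]

M5_QUERY = """
WITH exploded AS (
    SELECT a FROM table2
    UNION ALL SELECT b FROM table2
    UNION ALL SELECT c FROM table2
),
counted AS (
    SELECT a, count(*) AS cnt FROM exploded GROUP BY a
)
-- LEFT JOIN keeps table1 rows whose values never occur in table2;
-- COALESCE turns the missing count into 0.
SELECT t.a, t.b, t.c,
       max(COALESCE(ca.cnt, 0), COALESCE(cb.cnt, 0), COALESCE(cc.cnt, 0)) AS m5
FROM table1 t
LEFT JOIN counted ca ON t.a = ca.a
LEFT JOIN counted cb ON t.b = cb.a
LEFT JOIN counted cc ON t.c = cc.a
ORDER BY t.rowid
"""


def compute_m5(table1, table2):
    """Return (a, b, c, m5) for every table1 row, in insertion order."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE table1 (a INTEGER, b INTEGER, c INTEGER)")
    con.execute("CREATE TABLE table2 (a INTEGER, b INTEGER, c INTEGER)")
    con.executemany("INSERT INTO table1 VALUES (?, ?, ?)", table1)
    con.executemany("INSERT INTO table2 VALUES (?, ?, ?)", table2)
    result = con.execute(M5_QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in compute_m5(TABLE1, TABLE2):
        print(row)
```

The hypothetical row comes back with m5 = 0 instead of dropping out of the result.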
I have this data:
ID PERSNR YEARNR MONTHNR DAYNR ABSTIME ABSID ABSCALC TypeLine
---------------------------------------------------------------------
1 26 2018 12 3 480 3 11 0
2 26 2018 12 3 480 3 11 1
5 26 2018 10 1 60 1 31 0
8 26 2018 10 3 60 1 31 0
13 69 2018 12 3 480 3 11 0
14 69 2018 12 3 480 3 11 1
19 69 2018 9 3 60 3 31 1
22 69 2018 9 3 60 3 31 0
23 69 2018 9 3 420 21 11 0
26 69 2018 9 6 120 21 31 1
29 69 2018 9 10 120 21 31 1
32 69 2018 9 4 480 21 11 1
I need to identify the following situations:
1. the rows that have both TypeLine 0 and 1
Result IDs: 1 and 2; 13 and 14; 19 and 22
2. the rows that have only TypeLine 0
Result IDs: 5; 8; 23
3. the rows that have only TypeLine 1
Result IDs: 26; 29; 32
I'm not sure how to write these 3 scripts and I couldn't find a solution.
Could you, please, help me?
Does this do what you want?
select (case when cnt_type_0 > 0 and cnt_type_1 > 0
then 'Condition 1'
when cnt_type_1 = 0
then 'Condition 2'
when cnt_type_0 = 0
then 'Condition 3'
end) as condition,
t.*
from (select t.*,
count(*) over (partition by PERSNR, YEARNR, MONTHNR, DAYNR, ABSTIME, ABSID, ABSCALC) as cnt,
sum(case when TypeLine = 0 then 1 else 0 end) over (partition by PERSNR, YEARNR, MONTHNR, DAYNR, ABSTIME, ABSID, ABSCALC) as cnt_type_0,
sum(case when TypeLine = 1 then 1 else 0 end) over (partition by PERSNR, YEARNR, MONTHNR, DAYNR, ABSTIME, ABSID, ABSCALC) as cnt_type_1
from t
) t;
You can add the conditions into the WHERE clause to get rows of just one type.
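A minimal Python sketch of the window-function classification, run on the question's sample rows in an in-memory SQLite database (SQLite 3.25+). It partitions on every column except ID, because ID is unique per row and a partition that includes it only ever holds one row, and it has no cnt >= 2 filter, so single rows are classified as well.

```python
import sqlite3

# Sample rows from the question:
# (ID, PERSNR, YEARNR, MONTHNR, DAYNR, ABSTIME, ABSID, ABSCALC, TypeLine)
ROWS = [
    (1, 26, 2018, 12, 3, 480, 3, 11, 0), (2, 26, 2018, 12, 3, 480, 3, 11, 1),
    (5, 26, 2018, 10, 1, 60, 1, 31, 0), (8, 26, 2018, 10, 3, 60, 1, 31, 0),
    (13, 69, 2018, 12, 3, 480, 3, 11, 0), (14, 69, 2018, 12, 3, 480, 3, 11, 1),
    (19, 69, 2018, 9, 3, 60, 3, 31, 1), (22, 69, 2018, 9, 3, 60, 3, 31, 0),
    (23, 69, 2018, 9, 3, 420, 21, 11, 0), (26, 69, 2018, 9, 6, 120, 21, 31, 1),
    (29, 69, 2018, 9, 10, 120, 21, 31, 1), (32, 69, 2018, 9, 4, 480, 21, 11, 1),
]

# Every column except ID: rows that agree on all of these form one group.
KEY = "PERSNR, YEARNR, MONTHNR, DAYNR, ABSTIME, ABSID, ABSCALC"

QUERY = f"""
SELECT ID,
       CASE WHEN cnt_type_0 > 0 AND cnt_type_1 > 0 THEN 'Condition 1'
            WHEN cnt_type_1 = 0 THEN 'Condition 2'
            WHEN cnt_type_0 = 0 THEN 'Condition 3'
       END AS condition
FROM (
    SELECT t.*,
           SUM(CASE WHEN TypeLine = 0 THEN 1 ELSE 0 END)
               OVER (PARTITION BY {KEY}) AS cnt_type_0,
           SUM(CASE WHEN TypeLine = 1 THEN 1 ELSE 0 END)
               OVER (PARTITION BY {KEY}) AS cnt_type_1
    FROM t
) AS x
ORDER BY ID
"""


def classify(rows):
    """Return {ID: condition label} (needs SQLite 3.25+ for window functions)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (ID, PERSNR, YEARNR, MONTHNR, DAYNR, "
                "ABSTIME, ABSID, ABSCALC, TypeLine)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)", rows)
    labels = dict(con.execute(QUERY).fetchall())
    con.close()
    return labels


if __name__ == "__main__":
    print(classify(ROWS))
```

With the sample data this labels IDs 1, 2, 13, 14, 19 and 22 as Condition 1, IDs 5, 8 and 23 as Condition 2, and IDs 26, 29 and 32 as Condition 3, which matches the three result lists in the question.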
Assuming the source data is correct, you could run the following 3 queries. #1 currently returns the expected pairs, but in the current version of the question the example rows for #2 and #3 have different DAYNR values, so you won't get anything back using those example values.
--1
SELECT T1.ID AS [T1_ID], T2.ID AS [T2_ID]
FROM [tablename] T1 INNER JOIN [tablename] T2 ON T1.PERSNR = T2.PERSNR
AND T1.YEARNR = T2.YEARNR AND T1.MONTHNR = T2.MONTHNR
AND T1.DAYNR = T2.DAYNR AND T1.ABSTIME = T2.ABSTIME
AND T1.ABSID = T2.ABSID AND T1.ABSCALC = T2.ABSCALC
AND (T1.TypeLine = 0 AND T2.TypeLine = 1
OR
T1.TypeLine = 1 AND T2.TypeLine = 0
)
AND T1.ID < T2.ID
--2
SELECT T1.ID AS [T1_ID], T2.ID AS [T2_ID]
FROM [tablename] T1 INNER JOIN [tablename] T2 ON T1.PERSNR = T2.PERSNR
AND T1.YEARNR = T2.YEARNR AND T1.MONTHNR = T2.MONTHNR
AND T1.DAYNR = T2.DAYNR AND T1.ABSTIME = T2.ABSTIME
AND T1.ABSID = T2.ABSID AND T1.ABSCALC = T2.ABSCALC
AND T1.TypeLine = 0 AND T2.TypeLine = 0
AND T1.ID < T2.ID
--3
SELECT T1.ID AS [T1_ID], T2.ID AS [T2_ID]
FROM [tablename] T1 INNER JOIN [tablename] T2 ON T1.PERSNR = T2.PERSNR
AND T1.YEARNR = T2.YEARNR AND T1.MONTHNR = T2.MONTHNR
AND T1.DAYNR = T2.DAYNR AND T1.ABSTIME = T2.ABSTIME
AND T1.ABSID = T2.ABSID AND T1.ABSCALC = T2.ABSCALC
AND T1.TypeLine = 1 AND T2.TypeLine = 1
AND T1.ID < T2.ID
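The three self-join queries differ only in their TypeLine filter, so this Python sqlite3 sketch runs one parameterized version of them against the question's sample rows. tablename is the answer's placeholder and find_pairs is a hypothetical helper; the filter is spliced in with str.format, which is fine for fixed illustration strings but not for user input.

```python
import sqlite3

# Sample rows from the question:
# (ID, PERSNR, YEARNR, MONTHNR, DAYNR, ABSTIME, ABSID, ABSCALC, TypeLine)
ROWS = [
    (1, 26, 2018, 12, 3, 480, 3, 11, 0), (2, 26, 2018, 12, 3, 480, 3, 11, 1),
    (5, 26, 2018, 10, 1, 60, 1, 31, 0), (8, 26, 2018, 10, 3, 60, 1, 31, 0),
    (13, 69, 2018, 12, 3, 480, 3, 11, 0), (14, 69, 2018, 12, 3, 480, 3, 11, 1),
    (19, 69, 2018, 9, 3, 60, 3, 31, 1), (22, 69, 2018, 9, 3, 60, 3, 31, 0),
    (23, 69, 2018, 9, 3, 420, 21, 11, 0), (26, 69, 2018, 9, 6, 120, 21, 31, 1),
    (29, 69, 2018, 9, 10, 120, 21, 31, 1), (32, 69, 2018, 9, 4, 480, 21, 11, 1),
]

PAIR_QUERY = """
SELECT T1.ID, T2.ID
FROM tablename T1
INNER JOIN tablename T2
    ON  T1.PERSNR = T2.PERSNR AND T1.YEARNR = T2.YEARNR
    AND T1.MONTHNR = T2.MONTHNR AND T1.DAYNR = T2.DAYNR
    AND T1.ABSTIME = T2.ABSTIME AND T1.ABSID = T2.ABSID
    AND T1.ABSCALC = T2.ABSCALC
    AND ({typeline_filter})
    AND T1.ID < T2.ID  -- report each pair only once
ORDER BY T1.ID
"""

# TypeLine filters for queries #1, #2 and #3 of the answer.
BOTH = "T1.TypeLine = 0 AND T2.TypeLine = 1 OR T1.TypeLine = 1 AND T2.TypeLine = 0"
ONLY_0 = "T1.TypeLine = 0 AND T2.TypeLine = 0"
ONLY_1 = "T1.TypeLine = 1 AND T2.TypeLine = 1"


def find_pairs(rows, typeline_filter):
    """Return (T1_ID, T2_ID) pairs of matching rows for one TypeLine filter."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tablename (ID, PERSNR, YEARNR, MONTHNR, DAYNR, "
                "ABSTIME, ABSID, ABSCALC, TypeLine)")
    con.executemany("INSERT INTO tablename VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)", rows)
    query = PAIR_QUERY.format(typeline_filter=typeline_filter)
    pairs = con.execute(query).fetchall()
    con.close()
    return pairs


if __name__ == "__main__":
    for name, flt in (("#1", BOTH), ("#2", ONLY_0), ("#3", ONLY_1)):
        print(name, find_pairs(ROWS, flt))
```

Query #1 returns the pairs (1, 2), (13, 14) and (19, 22), while #2 and #3 return nothing for this data, as the answer points out.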
Try something like this:
SELECT PERSNR,
YEARNR,
MONTHNR,
DAYNR,
ABSTIME,
ABSID,
ABSCALC,
iif(count(TypeLine) >= 2, 'duplicate', iif(min(TypeLine) = 1, '1', '0')) as status
FROM [table]
GROUP BY PERSNR, YEARNR, MONTHNR, DAYNR, ABSTIME, ABSID, ABSCALC
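A Python sqlite3 sketch of the GROUP BY idea on the question's sample rows. It groups on the attribute columns only (grouping by the unique ID would put every row in its own group), uses CASE in place of IIF, and, as an addition not in the answer, lists each group's IDs with SQLite's group_concat(). Since group_concat() does not guarantee an order, the IDs are sorted in Python.

```python
import sqlite3

# Sample rows from the question:
# (ID, PERSNR, YEARNR, MONTHNR, DAYNR, ABSTIME, ABSID, ABSCALC, TypeLine)
ROWS = [
    (1, 26, 2018, 12, 3, 480, 3, 11, 0), (2, 26, 2018, 12, 3, 480, 3, 11, 1),
    (5, 26, 2018, 10, 1, 60, 1, 31, 0), (8, 26, 2018, 10, 3, 60, 1, 31, 0),
    (13, 69, 2018, 12, 3, 480, 3, 11, 0), (14, 69, 2018, 12, 3, 480, 3, 11, 1),
    (19, 69, 2018, 9, 3, 60, 3, 31, 1), (22, 69, 2018, 9, 3, 60, 3, 31, 0),
    (23, 69, 2018, 9, 3, 420, 21, 11, 0), (26, 69, 2018, 9, 6, 120, 21, 31, 1),
    (29, 69, 2018, 9, 10, 120, 21, 31, 1), (32, 69, 2018, 9, 4, 480, 21, 11, 1),
]

# Group on every column except ID; group_concat() lists the IDs per group.
STATUS_QUERY = """
SELECT group_concat(ID) AS ids,
       CASE WHEN count(TypeLine) >= 2 THEN 'duplicate'
            WHEN min(TypeLine) = 1 THEN '1'
            ELSE '0'
       END AS status
FROM tablename
GROUP BY PERSNR, YEARNR, MONTHNR, DAYNR, ABSTIME, ABSID, ABSCALC
"""


def statuses(rows):
    """Return {sorted tuple of the IDs in a group: status}."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tablename (ID, PERSNR, YEARNR, MONTHNR, DAYNR, "
                "ABSTIME, ABSID, ABSCALC, TypeLine)")
    con.executemany("INSERT INTO tablename VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)", rows)
    result = {}
    for ids, status in con.execute(STATUS_QUERY):
        # group_concat() order is not guaranteed, so sort the IDs here.
        key = tuple(sorted(int(i) for i in str(ids).split(",")))
        result[key] = status
    con.close()
    return result


if __name__ == "__main__":
    for ids, status in sorted(statuses(ROWS).items()):
        print(ids, status)
```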
In Hive, I have two tables as shown below:
SELECT * FROM p_test;
OK
p_test.id p_test.age
01 1
02 2
01 10
02 11
Time taken: 0.07 seconds, Fetched: 4 row(s)
SELECT * FROM p_test2;
OK
p_test2.id p_test2.height
02 172
01 170
Time taken: 0.053 seconds, Fetched: 2 row(s)
I need to get the age differences for the same user in the p_test table, so I run the following HiveQL using the row_number function:
SELECT *
FROM
(SELECT *, ROW_NUMBER() OVER(partition by id order by age asc) rn FROM p_test) t1
LEFT JOIN
(SELECT *, ROW_NUMBER() OVER(partition by id order by age asc) rn FROM p_test) t2
ON t2.id=t1.id AND t1.rn=(t2.rn+1)
LEFT JOIN
(SELECT * FROM p_test2) t_2
ON t_2.id = t1.id;
The result is:
t1.id t1.age t1.rn t2.id t2.age t2.rn t_2.id t_2.height
01 1 1 NULL NULL NULL 01 170
01 10 2 01 1 1 01 170
02 11 1 NULL NULL NULL 02 172
02 2 2 02 11 1 02 172
Time taken: 60.773 seconds, Fetched: 4 row(s)
So far so good. However, if I move the condition that joins t1 and t2 down to the last line, as shown below:
SELECT *
FROM
(SELECT *, ROW_NUMBER() OVER(partition by id order by age asc) rn FROM p_test) t1
LEFT JOIN
(SELECT *, ROW_NUMBER() OVER(partition by id order by age asc) rn FROM p_test) t2
LEFT JOIN
(SELECT * FROM p_test2) t_2
ON t_2.id = t1.id
AND t2.id=t1.id AND t1.rn=(t2.rn+1);
I get this unexpected result:
t1.id t1.age t1.rn t2.id t2.age t2.rn t_2.id t_2.height
01 1 1 01 1 1 NULL NULL
01 1 1 01 10 2 NULL NULL
01 1 1 02 11 1 NULL NULL
01 1 1 02 2 2 NULL NULL
01 10 2 01 1 1 01 170
01 10 2 01 10 2 NULL NULL
01 10 2 02 11 1 NULL NULL
01 10 2 02 2 2 NULL NULL
02 11 1 01 1 1 NULL NULL
02 11 1 01 10 2 NULL NULL
02 11 1 02 11 1 NULL NULL
02 11 1 02 2 2 NULL NULL
02 2 2 01 1 1 NULL NULL
02 2 2 01 10 2 NULL NULL
02 2 2 02 11 1 02 172
02 2 2 02 2 2 NULL NULL
It seems that the condition I moved to the last line no longer works. This has bothered me for a long time, and I'd appreciate any helpful answers. Thanks in advance.
In your second query, the LEFT JOIN with t2 has no ON condition, so it is transformed into a CROSS JOIN. This is why you get duplication: subqueries t1 and t2 each have 4 rows, so the CROSS JOIN gives you 4x4=16 rows, pairing every row of t1 with every row of t2 regardless of id.
The ON condition does work, but it belongs only to the last LEFT JOIN with the t_2 subquery. It is checked only to decide which t_2 rows to join in that last join; it does not affect the first CROSS JOIN (the LEFT JOIN without an ON condition) at all.
Every join should have its own ON condition, except cross joins.
See also this answer about joins without ON condition behavior: https://stackoverflow.com/a/46843832/2700344
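The row counts in this explanation can be reproduced with a small Python sqlite3 sketch. Two assumptions: age is stored as an integer, and an explicit CROSS JOIN stands in for Hive's LEFT JOIN without an ON condition, following the explanation above. The version with an ON condition on each join returns one row per t1 row; the version with every condition moved to the last join returns 16 rows, only 2 of which find a t_2 match.

```python
import sqlite3

P_TEST = [("01", 1), ("02", 2), ("01", 10), ("02", 11)]
P_TEST2 = [("02", 172), ("01", 170)]

# Each copy of p_test numbers its rows per id, ordered by age.
RANKED = ("(SELECT id, age, ROW_NUMBER() OVER (PARTITION BY id ORDER BY age) AS rn"
          " FROM p_test)")

# Every join carries its own ON condition.
SEPARATE_ON = f"""
SELECT t1.id, t1.age, t2.age, t_2.height
FROM {RANKED} t1
LEFT JOIN {RANKED} t2 ON t2.id = t1.id AND t1.rn = t2.rn + 1
LEFT JOIN p_test2 t_2 ON t_2.id = t1.id
"""

# The t1/t2 join has no condition of its own (a cross join); all conditions
# sit on the last LEFT JOIN and only decide which t_2 row, if any, matches.
MOVED_ON = f"""
SELECT t1.id, t1.age, t2.age, t_2.height
FROM {RANKED} t1
CROSS JOIN {RANKED} t2
LEFT JOIN p_test2 t_2
    ON t_2.id = t1.id AND t2.id = t1.id AND t1.rn = t2.rn + 1
"""


def run(query):
    """Run query against fresh in-memory copies of p_test and p_test2."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE p_test (id TEXT, age INTEGER)")
    con.execute("CREATE TABLE p_test2 (id TEXT, height INTEGER)")
    con.executemany("INSERT INTO p_test VALUES (?, ?)", P_TEST)
    con.executemany("INSERT INTO p_test2 VALUES (?, ?)", P_TEST2)
    rows = con.execute(query).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    print(len(run(SEPARATE_ON)), len(run(MOVED_ON)))
```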
BTW, you can do the same without the t2 join at all, using the lag or lead analytic functions to get values from the neighbouring row ordered by age.
Like this:
lag(age) over(partition by id order by age) -- to get the previous age
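A minimal sketch of the lag() approach applied to the age difference, with no self-join, run through Python's sqlite3 (SQLite 3.25+; Hive's LAG takes the same basic form). Age is assumed to be an integer here. In the question's output the ages for id 02 are numbered as if sorted as text (11 before 2), which suggests age may be a string column there, in which case it would need a cast before ordering or subtracting.

```python
import sqlite3

# p_test from the question; age stored as INTEGER (an assumption).
P_TEST = [("01", 1), ("02", 2), ("01", 10), ("02", 11)]

AGE_DIFF = """
SELECT id, age,
       -- lag(age) is the previous age for the same id (NULL on the first row)
       age - lag(age) OVER (PARTITION BY id ORDER BY age) AS age_diff
FROM p_test
ORDER BY id, age
"""


def age_differences(rows):
    """Return (id, age, age_diff) rows (needs SQLite 3.25+)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE p_test (id TEXT, age INTEGER)")
    con.executemany("INSERT INTO p_test VALUES (?, ?)", rows)
    result = con.execute(AGE_DIFF).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in age_differences(P_TEST):
        print(row)
```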
I need to add more rows to Table_1 from another table, Table_2, continuing the sequence in Table_1's first column [NO] after its last value.
For example:
Table_1
NO F1 f2 code f3 Name
-- -- -- -- -- --
1 a 0 22 0 ID
2 b 0 19 0 ID
3 c 0 10 0 pass
4 d 0 05 0 pass
Table_2, which was imported from Excel:
NO code Name
-- -- --
5 11 ID
6 12 ID
7 06 pass
8 29 pass
Desired result:
NO F1 f2 code f3 Name
-- -- -- -- -- --
1 a 0 22 0 ID
2 b 0 19 0 ID
3 c 0 10 0 pass
4 d 0 05 0 pass
5 0 0 11 0 ID
6 0 0 12 0 ID
7 0 0 06 0 pass
8 0 0 29 0 pass
Let me restate what I think you are trying to do:
Add all records from Table_2 whose NO is greater than every NO in Table_1.
This will insert the new rows and set f1, f2 and f3 to 0:
declare @maxNo int = (select max(NO) from Table_1)
insert into Table_1 (NO,code,Name,f1,f2,f3)
select NO,code,Name,'0',0,0 from Table_2
where NO > @maxNo
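A Python sqlite3 sketch of this insert using the question's sample tables. SQLite has no DECLARE, so the max(NO) lookup moves into a subquery in the WHERE clause; code is stored as text so values such as 05 keep their leading zero (an assumption about the column type). append_new_rows is a hypothetical helper that can run the insert more than once.

```python
import sqlite3

TABLE_1 = [(1, "a", 0, "22", 0, "ID"), (2, "b", 0, "19", 0, "ID"),
           (3, "c", 0, "10", 0, "pass"), (4, "d", 0, "05", 0, "pass")]
TABLE_2 = [(5, "11", "ID"), (6, "12", "ID"), (7, "06", "pass"), (8, "29", "pass")]

APPEND_NEW = """
INSERT INTO Table_1 ([NO], code, Name, F1, f2, f3)
SELECT [NO], code, Name, '0', 0, 0
FROM Table_2
-- Only rows numbered after the current maximum, so a rerun adds nothing.
WHERE [NO] > (SELECT max([NO]) FROM Table_1)
"""


def append_new_rows(times=1):
    """Build both tables, run APPEND_NEW `times` times, return Table_1."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Table_1 ([NO] INTEGER, F1 TEXT, f2 INTEGER, "
                "code TEXT, f3 INTEGER, Name TEXT)")
    con.execute("CREATE TABLE Table_2 ([NO] INTEGER, code TEXT, Name TEXT)")
    con.executemany("INSERT INTO Table_1 VALUES (?, ?, ?, ?, ?, ?)", TABLE_1)
    con.executemany("INSERT INTO Table_2 VALUES (?, ?, ?)", TABLE_2)
    for _ in range(times):
        con.execute(APPEND_NEW)
    rows = con.execute("SELECT [NO], F1, f2, code, f3, Name "
                       "FROM Table_1 ORDER BY [NO]").fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    for row in append_new_rows():
        print(row)
```

Running it a second time adds nothing, because no Table_2 row is numbered above the new maximum.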
You can try this.
DECLARE @MaxID INT = (SELECT MAX([NO]) FROM Table_1)
INSERT INTO Table_1 ([NO], F1, f2, code, f3, Name)
SELECT ( @MaxID + (ROW_NUMBER() OVER(ORDER BY [NO])) ) , 0 , 0, code, 0, Name FROM
Table_2
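A Python sqlite3 sketch of the ROW_NUMBER() renumbering on the question's sample data. The current maximum is read once in Python in place of DECLARE @MaxID, and ROW_NUMBER() is ordered by Table_2's NO so the new rows keep their original order. append_renumbered is a hypothetical helper.

```python
import sqlite3

TABLE_1 = [(1, "a", 0, "22", 0, "ID"), (2, "b", 0, "19", 0, "ID"),
           (3, "c", 0, "10", 0, "pass"), (4, "d", 0, "05", 0, "pass")]
TABLE_2 = [(5, "11", "ID"), (6, "12", "ID"), (7, "06", "pass"), (8, "29", "pass")]


def append_renumbered(table_2):
    """Append table_2 to Table_1, numbering the new rows after MAX([NO])."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Table_1 ([NO] INTEGER, F1 TEXT, f2 INTEGER, "
                "code TEXT, f3 INTEGER, Name TEXT)")
    con.execute("CREATE TABLE Table_2 ([NO] INTEGER, code TEXT, Name TEXT)")
    con.executemany("INSERT INTO Table_1 VALUES (?, ?, ?, ?, ?, ?)", TABLE_1)
    con.executemany("INSERT INTO Table_2 VALUES (?, ?, ?)", table_2)
    # Stand-in for DECLARE @MaxID: read the current maximum once.
    max_id = con.execute("SELECT MAX([NO]) FROM Table_1").fetchone()[0]
    # New numbers are max_id + 1, max_id + 2, ... in Table_2's [NO] order.
    con.execute(
        "INSERT INTO Table_1 ([NO], F1, f2, code, f3, Name) "
        "SELECT ? + ROW_NUMBER() OVER (ORDER BY [NO]), 0, 0, code, 0, Name "
        "FROM Table_2",
        (max_id,),
    )
    rows = con.execute("SELECT [NO], code FROM Table_1 ORDER BY [NO]").fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    print(append_renumbered(TABLE_2))
```

Because the new numbers are computed from Table_1's maximum, Table_2's own NO values only decide the order of the appended rows, not the numbers they receive.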
I have 10 decimal columns and I would like to add a computed column to my table that contains their average. The complication is that not every record has all 10 columns filled in: some records have 4, some have 8 and some have 10.
e.g.
ID D1 D2 D3 D4 D5 D6 D7 D8 D9 D10
1 12 19 13 14
2 32 53 34 54 65 34 12 09
3 41 54 33 61 71 12 09 08 08 12
How can I get the average of these, so that ID 1 = 14.5, ID 2 = 36.625, etc.?
I can't just do (D1 + D2 + D3...) / 10, because the divisor isn't always 10.
The ideal would just be to do AVG(D1:D10) but clearly the world isn't ideal!
You can't use the AVG aggregate function here (it works across rows, not across columns), but you can calculate the average with the following query:
SELECT
(ISNULL(D1,0) + ISNULL(D2,0) +
ISNULL(D3,0) + ISNULL(D4,0) + ISNULL(D5,0) +
ISNULL(D6,0) + ISNULL(D7,0) + ISNULL(D8,0) +
ISNULL(D9,0) + ISNULL(D10,0)) /
CASE
WHEN
D1 IS NOT NULL
OR D2 IS NOT NULL
OR D3 IS NOT NULL
OR D4 IS NOT NULL
OR D5 IS NOT NULL
OR D6 IS NOT NULL
OR D7 IS NOT NULL
OR D8 IS NOT NULL
OR D9 IS NOT NULL
OR D10 IS NOT NULL
THEN
(
CASE
WHEN D1 IS NOT NULL THEN 1 ELSE 0
END +
CASE
WHEN D2 IS NOT NULL THEN 1 ELSE 0
END +
CASE
WHEN D3 IS NOT NULL THEN 1 ELSE 0
END +
CASE
WHEN D4 IS NOT NULL THEN 1 ELSE 0
END +
CASE
WHEN D5 IS NOT NULL THEN 1 ELSE 0
END +
CASE
WHEN D6 IS NOT NULL THEN 1 ELSE 0
END +
CASE
WHEN D7 IS NOT NULL THEN 1 ELSE 0
END +
CASE
WHEN D8 IS NOT NULL THEN 1 ELSE 0
END +
CASE
WHEN D9 IS NOT NULL THEN 1 ELSE 0
END +
CASE
WHEN D10 IS NOT NULL THEN 1 ELSE 0
END
)
ELSE 1
END
FROM yourtable
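As a check on the arithmetic, this Python sketch runs the same sum-over-count expression in an in-memory SQLite database, generating the ten terms in a loop rather than writing them out. SQLite has no ISNULL(), so COALESCE() stands in for it, and the columns are declared REAL as a stand-in for the question's decimal type.

```python
import sqlite3

COLS = [f"D{i}" for i in range(1, 11)]

# Sample rows from the question; None marks an empty column.
ROWS = [
    (1, 12, 19, 13, 14, None, None, None, None, None, None),
    (2, 32, 53, 34, 54, 65, 34, 12, 9, None, None),
    (3, 41, 54, 33, 61, 71, 12, 9, 8, 8, 12),
]

# Sum of the columns with NULL treated as 0 ...
TOTAL = " + ".join(f"COALESCE({c}, 0)" for c in COLS)
# ... divided by how many columns are filled (1 if none, as in the answer).
FILLED = " + ".join(f"CASE WHEN {c} IS NOT NULL THEN 1 ELSE 0 END" for c in COLS)
QUERY = (f"SELECT ID, ({TOTAL}) / "
         f"(CASE WHEN ({FILLED}) > 0 THEN ({FILLED}) ELSE 1 END) "
         "FROM yourtable ORDER BY ID")


def row_averages(rows):
    """Return {ID: average of the non-NULL D1..D10 values}."""
    con = sqlite3.connect(":memory:")
    col_defs = ", ".join(f"{c} REAL" for c in COLS)
    con.execute(f"CREATE TABLE yourtable (ID INTEGER, {col_defs})")
    con.executemany(f"INSERT INTO yourtable VALUES ({', '.join('?' * 11)})", rows)
    result = dict(con.execute(QUERY).fetchall())
    con.close()
    return result


if __name__ == "__main__":
    print(row_averages(ROWS))
```

It returns 14.5 for ID 1 and 36.625 for ID 2, as in the question.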
AVG for each id:
select id, avg(d) from
(
select id, d1 as d from tablename
union all
select id, d2 as d from tablename
union all
select id, d3 as d from tablename
union all
select id, d4 as d from tablename
union all
select id, d5 as d from tablename
union all
select id, d6 as d from tablename
union all
select id, d7 as d from tablename
union all
select id, d8 as d from tablename
union all
select id, d9 as d from tablename
union all
select id, d10 as d from tablename) as x
group by id
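A Python sqlite3 sketch of the UNION ALL unpivot on the question's sample rows, with the ten SELECTs generated in a loop. tablename is the answer's placeholder; the column names d1 to d10 follow the question.

```python
import sqlite3

COLS = [f"d{i}" for i in range(1, 11)]

# Sample rows from the question; None marks an empty column.
ROWS = [
    (1, 12, 19, 13, 14, None, None, None, None, None, None),
    (2, 32, 53, 34, 54, 65, 34, 12, 9, None, None),
    (3, 41, 54, 33, 61, 71, 12, 9, 8, 8, 12),
]

# One SELECT per column stacked with UNION ALL, so AVG() can work down rows.
UNPIVOT = "\nUNION ALL\n".join(f"SELECT id, {c} AS d FROM tablename" for c in COLS)
QUERY = f"SELECT id, avg(d) FROM ({UNPIVOT}) AS x GROUP BY id ORDER BY id"


def averages(rows):
    """Return {id: average of the non-NULL d1..d10 values}."""
    con = sqlite3.connect(":memory:")
    con.execute(f"CREATE TABLE tablename (id INTEGER, {', '.join(COLS)})")
    con.executemany(f"INSERT INTO tablename VALUES ({', '.join('?' * 11)})", rows)
    # AVG() skips NULLs, so empty columns affect neither the sum nor the count.
    result = dict(con.execute(QUERY).fetchall())
    con.close()
    return result


if __name__ == "__main__":
    print(averages(ROWS))
```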
Use the VALUES table value constructor to unpivot the data, then find the average per ID. Try this:
select id,avg(data) from Yourtable
cross apply
(values(D1), (D2), (D3), (D4), (D5), (D6) ,(D7), (D8), (D9) ,(D10)) cs (data)
group by id
Or, if you want decimal values (AVG returns an integer when the columns are integers), use this:
select id,sum(data)/sum(case when data is not null then 1.0 else 0 end) from Yourtable
cross apply
(values(D1), (D2), (D3), (D4), (D5), (D6) ,(D7), (D8), (D9) ,(D10)) cs (data)
group by id