SQL: Same date but different values

I have a table that looks like this:
Station year month day number
A1 1990 1 1 50
A1 1990 1 1 60
A1 1990 1 2 55
A1 1990 1 3 10
A1 1990 1 4 40
In this example, the query result should look like the table below, i.e. only the rows that share the same station and date:
Station year month day number
A1 1990 1 1 50
A1 1990 1 1 60
How can I write a proper SQL query for this?

If I understand correctly, you want rows where the first four columns are duplicated. A simple method uses count(*):
select t.*
from (select t.*,
count(*) over (partition by station, year, month, day) as cnt
from t
) t
where cnt > 1;
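As a quick check, the window-function query can be run against the sample rows from the question. The sketch below uses Python's built-in sqlite3 module only as a convenient test harness (an assumption: SQLite 3.25 or later, which has window functions), keeps the answer's table name t, and partitions on day, the column name the question actually uses.

```python
import sqlite3

# In-memory SQLite database standing in for the real one (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (station TEXT, year INT, month INT, day INT, number INT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?, ?)",
    [
        ("A1", 1990, 1, 1, 50),
        ("A1", 1990, 1, 1, 60),
        ("A1", 1990, 1, 2, 55),
        ("A1", 1990, 1, 3, 10),
        ("A1", 1990, 1, 4, 40),
    ],
)

# Count rows per (station, year, month, day) and keep groups with more than one row.
query = """
select station, year, month, day, number
from (select t.*,
             count(*) over (partition by station, year, month, day) as cnt
      from t
     ) t
where cnt > 1
order by number
"""
duplicates = conn.execute(query).fetchall()
print(duplicates)  # [('A1', 1990, 1, 1, 50), ('A1', 1990, 1, 1, 60)]
```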

Assuming your table has a primary key column called id, we can also try using exists logic here:
SELECT t1.*
FROM yourTable t1
WHERE EXISTS (SELECT 1 FROM yourTable t2
WHERE t2.Station = t1.Station AND t2.year = t1.year AND
t2.month = t1.month AND t2.day = t1.day AND t2.id <> t1.id);
If you don't have such an id column, then we could also use aggregation here:
SELECT t1.*
FROM yourTable t1
INNER JOIN
(
SELECT Station, year, month, day
FROM yourTable
GROUP BY Station, year, month, day
HAVING COUNT(*) > 1
) t2
ON t2.Station = t1.Station AND t2.year = t1.year AND t2.month = t1.month AND
t2.day = t1.day;
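Both queries in this answer should return the same rows on the sample data. The sketch below adds a hypothetical integer primary key id (the question does not show one) so the EXISTS version can run, again using Python's sqlite3 module as a stand-in database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# id is a hypothetical primary key; the question's table does not show one.
conn.execute(
    "CREATE TABLE yourTable (id INTEGER PRIMARY KEY, Station TEXT, year INT,"
    " month INT, day INT, number INT)"
)
conn.executemany(
    "INSERT INTO yourTable (Station, year, month, day, number) VALUES (?, ?, ?, ?, ?)",
    [
        ("A1", 1990, 1, 1, 50),
        ("A1", 1990, 1, 1, 60),
        ("A1", 1990, 1, 2, 55),
        ("A1", 1990, 1, 3, 10),
        ("A1", 1990, 1, 4, 40),
    ],
)

# EXISTS version: another row with the same date but a different id.
exists_rows = conn.execute("""
SELECT t1.Station, t1.year, t1.month, t1.day, t1.number
FROM yourTable t1
WHERE EXISTS (SELECT 1 FROM yourTable t2
              WHERE t2.Station = t1.Station AND t2.year = t1.year AND
                    t2.month = t1.month AND t2.day = t1.day AND t2.id <> t1.id)
ORDER BY t1.number
""").fetchall()

# Aggregation version: join back to the date groups that have COUNT(*) > 1.
agg_rows = conn.execute("""
SELECT t1.Station, t1.year, t1.month, t1.day, t1.number
FROM yourTable t1
INNER JOIN (SELECT Station, year, month, day
            FROM yourTable
            GROUP BY Station, year, month, day
            HAVING COUNT(*) > 1) t2
  ON t2.Station = t1.Station AND t2.year = t1.year AND t2.month = t1.month AND
     t2.day = t1.day
ORDER BY t1.number
""").fetchall()

print(exists_rows == agg_rows)
```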

You can use an EXISTS clause to find rows with the same station and date but a different number, as follows:
Select t.*
From your_table t
Where exists
(select 1 from your_table tt
Where tt.station = t.station
And tt.year = t.year
And tt.month = t.month
And tt.day = t.day
And tt.number <> t.number);
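One difference from the count-based answers is worth checking: because this version also requires tt.number <> t.number, rows that are exact duplicates (same date and same number) are not returned. The sketch below adds a hypothetical duplicate row (A1, 1990-01-02, 55), which is not in the question, and compares the two approaches using Python's sqlite3 module as a test harness.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (station TEXT, year INT, month INT, day INT, number INT)")
rows = [
    ("A1", 1990, 1, 1, 50),
    ("A1", 1990, 1, 1, 60),
    ("A1", 1990, 1, 2, 55),
    ("A1", 1990, 1, 2, 55),  # hypothetical exact duplicate, not in the question
    ("A1", 1990, 1, 3, 10),
    ("A1", 1990, 1, 4, 40),
]
conn.executemany("INSERT INTO your_table VALUES (?, ?, ?, ?, ?)", rows)

# Same date, different number.
exists_query = """
select t.station, t.year, t.month, t.day, t.number
from your_table t
where exists (select 1 from your_table tt
              where tt.station = t.station
                and tt.year = t.year
                and tt.month = t.month
                and tt.day = t.day
                and tt.number <> t.number)
order by t.day, t.number
"""

# Same date, any number (count-based).
count_query = """
select station, year, month, day, number
from (select t.*, count(*) over (partition by station, year, month, day) as cnt
      from your_table t) x
where cnt > 1
order by day, number
"""

different_number = conn.execute(exists_query).fetchall()
any_duplicate = conn.execute(count_query).fetchall()
print(different_number)
print(any_duplicate)
```

Which behaviour is right depends on whether rows that repeat the same number should count as "same date but different values".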

Related

Match by id and date between 2 tables, OR last known match id

I am trying to make the following work:
T1: take the id per dt where name = A, using the most recent load_id
Notice 2 records on 5-Jan-23, with load_id 2 and 3 => take load_id = 3
T2: display the corresponding id per dt for each param row, using the most recent load_id
Notice only load_id = 13 is kept on 05-Jan-23
T2: if a date is not available in T1, keep the T2 rows matching the last known id
Fiddle: https://dbfiddle.uk/-JO16GSj
My SQL seems a bit wild. Can it be simplified?
SELECT t2.dt, t2.param, t2.load_id, t2.id FROM
(SELECT
dt,
param,
load_id,
MAX(load_id) OVER (PARTITION BY dt, param) AS max_load_id,
id
FROM table2) t2
LEFT JOIN
(SELECT * FROM
(SELECT
dt,
id,
load_id,
MAX(load_id) OVER (PARTITION BY dt) AS max_load_id
FROM table1
WHERE name = 'A') t1_prep
WHERE t1_prep.load_id = t1_prep.max_load_id) t1
ON t1.dt = t2.dt and t1.id = t2.id
WHERE t2.load_id = t2.max_load_id
ORDER BY 1, 2
Your query can be rewritten as:
SELECT t2.*
FROM ( SELECT *
FROM table2
ORDER BY RANK() OVER (PARTITION BY dt, param ORDER BY load_id DESC)
FETCH FIRST ROW WITH TIES
) t2
LEFT OUTER JOIN
( SELECT *
FROM table1
WHERE name = 'A'
ORDER BY RANK() OVER (PARTITION BY dt ORDER BY load_id DESC)
FETCH FIRST ROW WITH TIES
) t1
ON t1.dt = t2.dt and t1.id = t2.id
ORDER BY t2.dt, t2.param
However, since the columns from t1 are never output, and t1 is joined with a LEFT OUTER JOIN (and will only output a single row per dt), it is irrelevant whether or not a match is found in t1, so that table can be eliminated from the query, simplifying it to:
SELECT *
FROM (
SELECT *
FROM table2
ORDER BY RANK() OVER (PARTITION BY dt, param ORDER BY load_id DESC)
FETCH FIRST ROW WITH TIES
)
ORDER BY dt, param;
or, keeping the structure of your query:
SELECT dt, param, load_id, id
FROM (
SELECT dt, param, load_id, id,
MAX(load_id) OVER (PARTITION BY dt, param) AS max_load_id
FROM table2
)
WHERE load_id = max_load_id
ORDER BY dt, param
For the sample data, all of these output:
DT PARAM LOAD_ID ID
04-JAN-23 0 11 1
04-JAN-23 1 11 1
05-JAN-23 0 13 3
05-JAN-23 1 13 3
06-JAN-23 0 14 3
06-JAN-23 1 14 3
07-JAN-23 1 14 3
08-JAN-23 1 15 3
09-JAN-23 0 16 3
09-JAN-23 1 16 3
10-JAN-23 0 17 3
10-JAN-23 1 17 3
fiddle
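The claim that t1 can be dropped can be checked on a small data set. The real data is only in the fiddle, so the rows below are hypothetical; SQLite (used here through Python's sqlite3) has no FETCH FIRST ... WITH TIES, so a RANK() = 1 filter stands in for it. The equivalence relies on the premise stated in the answer: after filtering, t1 contributes at most one row per dt, so the LEFT OUTER JOIN can neither drop nor duplicate rows of t2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample rows; the real data lives in the question's fiddle.
conn.executescript("""
CREATE TABLE table1 (dt TEXT, name TEXT, load_id INT, id INT);
CREATE TABLE table2 (dt TEXT, param INT, load_id INT, id INT);
INSERT INTO table1 VALUES
  ('2023-01-04', 'A', 1, 1),
  ('2023-01-05', 'A', 2, 2),
  ('2023-01-05', 'A', 3, 3),
  ('2023-01-05', 'B', 4, 9);
INSERT INTO table2 VALUES
  ('2023-01-04', 0, 11, 1),
  ('2023-01-04', 1, 11, 1),
  ('2023-01-05', 0, 12, 2),
  ('2023-01-05', 0, 13, 3),
  ('2023-01-05', 1, 13, 3),
  ('2023-01-06', 0, 14, 3);
""")

# RANK() = 1 stands in for Oracle's ORDER BY RANK() ... FETCH FIRST ROW WITH TIES.
with_join = conn.execute("""
SELECT t2.dt, t2.param, t2.load_id, t2.id
FROM (SELECT * FROM (SELECT table2.*,
                            RANK() OVER (PARTITION BY dt, param ORDER BY load_id DESC) AS rnk
                     FROM table2) r2
      WHERE rnk = 1) t2
LEFT OUTER JOIN
     (SELECT * FROM (SELECT table1.*,
                            RANK() OVER (PARTITION BY dt ORDER BY load_id DESC) AS rnk
                     FROM table1
                     WHERE name = 'A') r1
      WHERE rnk = 1) t1
  ON t1.dt = t2.dt AND t1.id = t2.id
ORDER BY t2.dt, t2.param
""").fetchall()

# Same query with table1 removed entirely.
without_join = conn.execute("""
SELECT dt, param, load_id, id
FROM (SELECT table2.*,
             RANK() OVER (PARTITION BY dt, param ORDER BY load_id DESC) AS rnk
      FROM table2) r2
WHERE rnk = 1
ORDER BY dt, param
""").fetchall()

# The asker's MAX(load_id) OVER (...) formulation.
with_max = conn.execute("""
SELECT dt, param, load_id, id
FROM (SELECT dt, param, load_id, id,
             MAX(load_id) OVER (PARTITION BY dt, param) AS max_load_id
      FROM table2) m
WHERE load_id = max_load_id
ORDER BY dt, param
""").fetchall()

print(with_join == without_join == with_max)
```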

SQL - How to add a column which shows the average of another column where record-specific criteria are met

I have this table:
PersonID Class Score
1 1 90
1 2 100
1 3 110
2 1 40
2 2 50
2 3 60
I need the new column to show, for each PersonID, the average Score across all of that person's records whose Class is less than or equal to the Class of the current record.
Here's what it should look like:
PersonID Class Score Avg_Score_ClassLessThanOrEqual
1 1 90 90
1 2 100 95
1 3 110 100
2 1 40 40
2 2 50 45
2 3 60 50
Is this possible? I've tried partition by and sum(Case when), but I'm just starting out learning. I believe I need something like the pseudocode Partition by PersonID where PersonID = PersonID and Class <= Class
Try this:
SELECT t1.personid, t1.class, t1.score,
avg(t2.score) as average_score
FROM yourtable AS t1
LEFT JOIN yourtable AS t2
ON t1.personid = t2.personid
AND t2.class <= t1.class
GROUP BY t1.personid, t1.class, t1.score
EDIT:
I added the following part to match your request in the comments:
SELECT t1.personid, t1.class, t1.score,
avg(CASE WHEN t2.class <= t1.class THEN t2.score END) AS average_lower,
avg(CASE WHEN t2.class > t1.class THEN t2.score END) AS average_higher
FROM yourtable AS t1
LEFT JOIN yourtable AS t2
ON t1.personid = t2.personid
GROUP BY t1.personid, t1.class, t1.score
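Both queries in this answer can be run on the question's data to confirm the expected averages. The sketch uses Python's sqlite3 module as a stand-in database. In the EDIT version, avg() ignores the NULLs produced by the CASE expressions, so a row with no higher class gets NULL (None in Python) for average_higher.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (personid INT, class INT, score INT)")
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?, ?)",
    [(1, 1, 90), (1, 2, 100), (1, 3, 110), (2, 1, 40), (2, 2, 50), (2, 3, 60)],
)

# First query: join each row to the same person's rows with class <= its class.
lower_only = conn.execute("""
SELECT t1.personid, t1.class, t1.score, avg(t2.score) AS average_score
FROM yourtable AS t1
LEFT JOIN yourtable AS t2
  ON t1.personid = t2.personid
 AND t2.class <= t1.class
GROUP BY t1.personid, t1.class, t1.score
ORDER BY t1.personid, t1.class
""").fetchall()

# EDIT query: join to all of the person's rows and split them with CASE inside avg().
lower_and_higher = conn.execute("""
SELECT t1.personid, t1.class, t1.score,
       avg(CASE WHEN t2.class <= t1.class THEN t2.score END) AS average_lower,
       avg(CASE WHEN t2.class > t1.class THEN t2.score END) AS average_higher
FROM yourtable AS t1
LEFT JOIN yourtable AS t2
  ON t1.personid = t2.personid
GROUP BY t1.personid, t1.class, t1.score
ORDER BY t1.personid, t1.class
""").fetchall()

for row in lower_and_higher:
    print(row)
```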
I think the simplest method is a correlated subquery:
select t.*,
(select avg(t2.score)
from t t2
where t2.personid = t.personid and t2.class <= t.class
) as avg_score_less_than
from t;
For performance, you want an index on (personid, class, score). Note: this uses class for the "less than or equal to" part.
Hmmm . . . Actually, you can do this with window functions:
select t.*,
avg(score) over (partition by personid order by class) as avg_score_less_than
from t;
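The correlated subquery and the window function should agree, including when a person has several rows in the same class: with ORDER BY and no explicit frame, the window frame defaults to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which includes rows with an equal class. The sketch below checks this with Python's sqlite3 module, adding a hypothetical person 3 with two class-1 rows that are not in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (personid INT, class INT, score INT)")
rows = [(1, 1, 90), (1, 2, 100), (1, 3, 110), (2, 1, 40), (2, 2, 50), (2, 3, 60)]
# Hypothetical extra person with two rows in the same class, to exercise ties.
rows += [(3, 1, 10), (3, 1, 30), (3, 2, 50)]
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)

correlated = conn.execute("""
select t.personid, t.class, t.score,
       (select avg(t2.score)
        from t t2
        where t2.personid = t.personid and t2.class <= t.class
       ) as avg_score_less_than
from t
order by t.personid, t.class, t.score
""").fetchall()

# Default frame with ORDER BY is RANGE ... CURRENT ROW, so rows with an
# equal class (peers) are included, matching the <= condition.
windowed = conn.execute("""
select personid, class, score,
       avg(score) over (partition by personid order by class) as avg_score_less_than
from t
order by personid, class, score
""").fetchall()

print(correlated == windowed)
```

An explicit ROWS frame would not include those peers, so the two class-1 rows of person 3 could then get different averages.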

How to group by a complicated condition in SQL

I'd like to group by region, keeping only the regions that have customers with type = a.
region customer type score
A a a 1
A b b 2
A c a 3
B d c 4
B e d 5
C f a 6
C g c 7
Therefore, after the first step:
region customer type score
A a a 1
A b b 2
A c a 3
C f a 6
C g c 7
Then I group by region:
region sum(score)
A 6
C 13
I'd also like to extract the customers whose type = a:
region customer type
A a a
A c a
C f a
Then I'd like to merge the two results above.
My desired result is:
customer sum_in_region
a 6
c 6
f 13
Is there any way to achieve this?
So far I have only got as far as the second step.
How can I proceed further?
SELECT t1.region,t1.customer, t1.type, t1.score
FROM yourTable t1
WHERE EXISTS (SELECT 1
FROM yourTable t2
WHERE t2.region = t1.region
AND t2.type = 'a');
Thanks
Join the table to a derived table that does your first two steps.
SELECT t3.customer,
x1.score
FROM yourtable t3
INNER JOIN (SELECT t1.region,
sum(score) score
FROM yourtable t1
WHERE EXISTS (SELECT *
FROM yourtable t2
WHERE t2.region = t1.region
AND t2.type = 'a')
GROUP BY t1.region) x1
ON x1.region = t3.region
WHERE t3.type = 'a';
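A runnable check of this derived-table approach on the question's data, using Python's sqlite3 module as a stand-in database. The outer filter references t3, the alias in scope at that point; t2 is only visible inside the EXISTS subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (region TEXT, customer TEXT, type TEXT, score INT)")
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?, ?, ?)",
    [
        ("A", "a", "a", 1), ("A", "b", "b", 2), ("A", "c", "a", 3),
        ("B", "d", "c", 4), ("B", "e", "d", 5),
        ("C", "f", "a", 6), ("C", "g", "c", 7),
    ],
)

# Derived table x1 sums scores per region that has at least one type 'a' row;
# the outer query then keeps only the type 'a' customers.
result = conn.execute("""
SELECT t3.customer, x1.score
FROM yourtable t3
INNER JOIN (SELECT t1.region, sum(score) score
            FROM yourtable t1
            WHERE EXISTS (SELECT *
                          FROM yourtable t2
                          WHERE t2.region = t1.region
                            AND t2.type = 'a')
            GROUP BY t1.region) x1
  ON x1.region = t3.region
WHERE t3.type = 'a'
ORDER BY t3.customer
""").fetchall()
print(result)  # [('a', 6), ('c', 6), ('f', 13)]
```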
You could use window functions to get your result. The first step keeps only the regions that contain at least one row with type a; the second step computes the sum of scores per region and then selects only the customer and sum columns for the type a rows:
with filter_type_a as
(select region, customer, type, score
from
(select *,
sum(case when type = 'a' then 1 else 0 end) over (partition by region) as counter
from your_table) x
where counter > 0)
select customer, sum_region
from
(select customer, type,
sum(score) over (partition by region) as sum_region
from filter_type_a) y
where type = 'a';
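The same two-step idea can be tried in SQLite through Python's sqlite3 module. The sketch writes the type test with CASE and a single = so it should carry over to other databases, and gives each derived table an alias. It also compares against a shorter variant that skips the filtering step: a region without any type a row produces no type a rows in the final WHERE anyway, so the result should be the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (region TEXT, customer TEXT, type TEXT, score INT)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?, ?)",
    [
        ("A", "a", "a", 1), ("A", "b", "b", 2), ("A", "c", "a", 3),
        ("B", "d", "c", 4), ("B", "e", "d", 5),
        ("C", "f", "a", 6), ("C", "g", "c", 7),
    ],
)

# Two-step version: keep regions containing a type 'a' row, then sum per region.
two_step = conn.execute("""
with filter_type_a as (
  select region, customer, type, score
  from (select *,
               sum(case when type = 'a' then 1 else 0 end) over (partition by region) as counter
        from your_table) x
  where counter > 0)
select customer, sum_region
from (select customer, type,
             sum(score) over (partition by region) as sum_region
      from filter_type_a) y
where type = 'a'
order by customer
""").fetchall()

# One-step version: regions without a type 'a' row contribute no output rows anyway.
one_step = conn.execute("""
select customer, sum_region
from (select customer, type,
             sum(score) over (partition by region) as sum_region
      from your_table) y
where type = 'a'
order by customer
""").fetchall()

print(two_step == one_step)
```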
You can use the query below:
SQLFiddle
with country_tmp as
(SELECT t1.region,t1.customer, t1.type, t1.score
FROM country t1
WHERE EXISTS (SELECT 1
FROM country t2
WHERE t2.region = t1.region
AND t2.type = 'a'))
select y.customer, x.score from
(select a.region, sum(a.score) score from (
SELECT t1.region,t1.customer, t1.type, t1.score
FROM country_tmp t1) a
group by region) x , (SELECT t1.region,t1.customer, t1.type
FROM country_tmp t1
Where t1.type = 'a') y where x.region = y.region;

Subtract data from two columns and put the result into one column

I have data like this:
id year marks id year marks
1 2017 80 1 2018 100
2 2017 60 2 2018 70
3 2017 500 3 2018 600
My result should contain the values 20, 10 and 100 in a Difference column.
All this data should be in a single row.
Unless I'm missing something, you just want to take a difference of the two marks columns:
SELECT
t1.id, t1.year, t1.marks AS marks_2017, t2.id, t2.year, t2.marks AS marks_2018,
t2.marks - t1.marks AS diff
FROM yourTable t1
INNER JOIN yourTable t2
ON t1.id = t2.id AND t1.year = 2017 AND t2.year = 2018;
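A quick check of this self-join on the question's numbers, assuming (as the answer does) that the data is stored in one table with one row per id and year rather than side by side. Python's sqlite3 module is used only as a stand-in database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (id INT, year INT, marks INT)")
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?, ?)",
    [(1, 2017, 80), (2, 2017, 60), (3, 2017, 500),
     (1, 2018, 100), (2, 2018, 70), (3, 2018, 600)],
)

# Pair each id's 2017 row with its 2018 row and subtract the marks.
rows = conn.execute("""
SELECT t1.id, t1.year, t1.marks AS marks_2017, t2.id, t2.year, t2.marks AS marks_2018,
       t2.marks - t1.marks AS diff
FROM yourTable t1
INNER JOIN yourTable t2
  ON t1.id = t2.id AND t1.year = 2017 AND t2.year = 2018
ORDER BY t1.id
""").fetchall()

for row in rows:
    print(row)
```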
Somehow, I think you want:
select id, sum(case when year = 2018 then marks else - marks end) as diff
from t
where year in (2017, 2018)
group by id;
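The conditional-sum version can be checked the same way. It behaves differently from the self-join when a year is missing: the hypothetical id 4 below has only a 2017 row, so this query returns its negated mark, whereas an inner self-join would drop that id. Python's sqlite3 module is used as a stand-in database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INT, year INT, marks INT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, 2017, 80), (2, 2017, 60), (3, 2017, 500),
     (1, 2018, 100), (2, 2018, 70), (3, 2018, 600),
     (4, 2017, 50)],  # hypothetical id with no 2018 row
)

# 2018 marks count as positive, 2017 marks as negative, summed per id.
diffs = conn.execute("""
select id, sum(case when year = 2018 then marks else - marks end) as diff
from t
where year in (2017, 2018)
group by id
order by id
""").fetchall()
print(diffs)  # [(1, 20), (2, 10), (3, 100), (4, -50)]
```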
SELECT id, year, marks, id, year, new_marks, new_marks - marks AS diff
FROM yourTable where year = '2018'
select
t1.*, t2.year, t2.marks, t2.marks - t1.marks as diff
from
test t1
inner join
test t2
on t1.id = t2.id and t1.year = 2017 and t2.year = 2018
Demo

Cumulative distinct count filtered by last value - T-SQL

I am trying to come up with exactly the same answer as here:
Cumulative distinct count filtered by last value - DAX
but in SQL Server. For convenience I am copying the whole problem description.
I have a dataset:
month name flag
1 abc TRUE
2 xyz TRUE
3 abc TRUE
4 xyz TRUE
5 abc FALSE
6 abc TRUE
I want to calculate a month-cumulative distinct count of 'name', filtered by the last 'flag' value (TRUE), i.e. I want this result:
month count
1 1
2 2
3 2
4 2
5 1
6 2
In months 5 and 6 'abc' should be excluded because the flag switched to 'FALSE' in month 5.
I am thinking about using an OVER clause with PARTITION BY, but I don't have any experience with it, so it's a struggle for me.
UPDATE
I have updated the last row in exemplary source data.
was:
6 abc FALSE
is:
6 abc TRUE
And the last row in output data.
Was:
6 1
is:
6 2
It might not have been obvious from the description that it should work this way, and the proposed answer does not solve this problem.
UPDATE 2
I have managed to create a query that gives the result, but it's ugly and I think it could be shortened by using an OVER clause. Can you help me with that?
select t5.month_current, count(*) as count from
(select t3.month month_current, t4.month months_until_current, t3.name, t4.flag from
(select name ,month from
(select distinct name
from Source_data) t1
,(select distinct month
from Source_data) t2) t3
left join
Source_data t4
on t3.name = t4.name and t3.month >= t4.month) t5
inner join
(select t3.month month_current, max(t4.month) real_max_month_until_current, t3.name from
(select name ,month from
(select distinct name
from Source_data) t1
,(select distinct month
from Source_data) t2) t3
left join
Source_data t4
on t3.name = t4.name and t3.month >= t4.month
group by
t3.month, t3.name) t6
on t5.month_current = t6.month_current
and t5.months_until_current = t6.real_max_month_until_current
and t5.name = t6.name
where t5.flag = 'TRUE'
group by t5.month_current
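One way the query above might be condensed with window functions is to track, per name, when the flag changes. The sketch below is not taken from the thread: it uses LAG to compare each row with the previous row of the same name, turns each switch to TRUE into +1 and each switch away from TRUE into -1, and keeps a running total. It assumes one row per name per month and at least one row in every month, as in the sample. The demo runs on SQLite through Python's sqlite3; the SQL only uses LAG and a windowed SUM, which SQL Server also supports from 2012 onwards, but it has not been tested there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Source_data (month INT, name TEXT, flag TEXT)")
conn.executemany(
    "INSERT INTO Source_data VALUES (?, ?, ?)",
    [(1, "abc", "TRUE"), (2, "xyz", "TRUE"), (3, "abc", "TRUE"),
     (4, "xyz", "TRUE"), (5, "abc", "FALSE"), (6, "abc", "TRUE")],
)

query = """
WITH changes AS (
  -- previous flag of the same name, NULL for its first row
  SELECT month, name, flag,
         LAG(flag) OVER (PARTITION BY name ORDER BY month) AS prev_flag
  FROM Source_data
),
deltas AS (
  -- +1 when a name becomes TRUE, -1 when it stops being TRUE
  SELECT month,
         SUM(CASE
               WHEN flag = 'TRUE' AND (prev_flag IS NULL OR prev_flag <> 'TRUE') THEN 1
               WHEN flag <> 'TRUE' AND prev_flag = 'TRUE' THEN -1
               ELSE 0
             END) AS delta
  FROM changes
  GROUP BY month
)
SELECT month, SUM(delta) OVER (ORDER BY month) AS cnt
FROM deltas
ORDER BY month
"""
result = conn.execute(query).fetchall()
print(result)  # [(1, 1), (2, 2), (3, 2), (4, 2), (5, 1), (6, 2)]
```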
You can do a cumulative distinct count as:
select t.*,
sum(case when seqnum = 1 then 1 else 0 end) over (order by month) as cnt
from (select t.*,
row_number() over (partition by name order by month) as seqnum
from t
) t;
I don't understand the logic for incorporating the flag.
You can replicate the results in the question by incorporating the flag:
select t.*,
sum(case when seqnum = 1 and flag = 'true' then 1
when seqnum = 1 and flag = 'false' then -1
else 0
end) over (order by month) as cnt
from (select t.*,
row_number() over (partition by name, flag order by month) as seqnum
from t
) t;
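Both queries in this answer can be run on the question's data with Python's sqlite3 module. The plain version counts every name the first time it appears; the flag version reproduces the original expected output, where month 6 is 1, not the updated expectation of 2. Since SQLite compares strings case-sensitively, the sketch writes 'TRUE' and 'FALSE' to match the stored values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (month INT, name TEXT, flag TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, "abc", "TRUE"), (2, "xyz", "TRUE"), (3, "abc", "TRUE"),
     (4, "xyz", "TRUE"), (5, "abc", "FALSE"), (6, "abc", "TRUE")],
)

# Plain cumulative distinct count: count each name the first time it appears.
plain = [row[0] for row in conn.execute("""
select sum(case when seqnum = 1 then 1 else 0 end) over (order by month) as cnt
from (select t.*,
             row_number() over (partition by name order by month) as seqnum
      from t
     ) t
order by month
""")]

# Flag-aware version: first TRUE per name adds 1, first FALSE per name subtracts 1.
with_flag = [row[0] for row in conn.execute("""
select sum(case when seqnum = 1 and flag = 'TRUE' then 1
                when seqnum = 1 and flag = 'FALSE' then -1
                else 0
           end) over (order by month) as cnt
from (select t.*,
             row_number() over (partition by name, flag order by month) as seqnum
      from t
     ) t
order by month
""")]

print(plain)      # [1, 2, 2, 2, 2, 2]
print(with_flag)  # [1, 2, 2, 2, 1, 1]
```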