Finding the value in a table that has the most primary-key values - SQL

I have a table with the following values (please ignore the index; the column with R1..R10 is the PK of the table):
1 R1 M1 Mo1
2 R2 M2 Mo3
3 R3 M4 Mo6
4 R4 M2 Mo1
5 R5 M7 Mo1
6 R6 M5 Mo2
7 R7 M6 Mo1
8 R8 M4 Mo4
9 R9 M9 Mo3
10 R10 M3 Mo9
I want to find the value of Mo[i] that has the largest number of R[i] values. For example, in the table above Mo1 has the most R[i] values, so the query must return Mo1.
I have been trying to do this with COUNT, but have not succeeded yet.
Here is what I wrote:
select Mo from table1 where Mo=(select max(r.Mo),max(count((r.Mo))) from table1 )r group by r.Mo

Try this (Oracle syntax, since it relies on ROWNUM):
select Mo from
(
select Mo from
(
select Mo, count(*) cnt
from table1
group by Mo
)
order by cnt desc
) where rownum = 1;
This first groups the table by the Mo column, which gives:
Mo | cnt
----+----
Mo3 | 2
Mo2 | 1
Mo4 | 1
Mo1 | 4
Mo6 | 1
Mo9 | 1
It then orders the groups by count, in descending order:
Mo
---
Mo1
Mo3
Mo6
Mo2
Mo4
Mo9
Finally, it returns only the first row of that result, which is Mo1.
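The same group, sort and take-the-first-row idea can be checked quickly in SQLite from Python, where `LIMIT 1` stands in for Oracle's outer `rownum = 1` wrapper. This is a minimal sketch: the column names `R` and `M` are hypothetical (the question only names `Mo`), and if several Mo values tied for the highest count, `LIMIT 1` would return just one of them.

```python
import sqlite3

# Rows from the question. The column names R and M are hypothetical;
# only Mo is named in the question.
ROWS = [
    ("R1", "M1", "Mo1"), ("R2", "M2", "Mo3"), ("R3", "M4", "Mo6"),
    ("R4", "M2", "Mo1"), ("R5", "M7", "Mo1"), ("R6", "M5", "Mo2"),
    ("R7", "M6", "Mo1"), ("R8", "M4", "Mo4"), ("R9", "M9", "Mo3"),
    ("R10", "M3", "Mo9"),
]


def most_frequent_mo():
    """Return (Mo, count) for the Mo value shared by the most R rows."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE table1 (R TEXT PRIMARY KEY, M TEXT, Mo TEXT)")
    con.executemany("INSERT INTO table1 VALUES (?, ?, ?)", ROWS)
    # Group by Mo, sort the groups by size, keep only the largest one.
    # LIMIT 1 plays the role of Oracle's outer "WHERE rownum = 1".
    row = con.execute(
        "SELECT Mo, COUNT(*) AS cnt FROM table1 "
        "GROUP BY Mo ORDER BY cnt DESC LIMIT 1"
    ).fetchone()
    con.close()
    return row


print(most_frequent_mo())  # ('Mo1', 4)
```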

Related

How to perform a complex SQL join with multiple approximate matches and return only the first match

I am trying to perform a LEFT JOIN in SQL where I need to check multiple match criteria and keep only the first matching row from the right table after sorting the right table in a particular order.
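Below is my left table (no NULL values):
Date     | Customer | Shop | Product | Customer_Score
---------+----------+------+---------+---------------
1/1/2020 | C1       | S1   | P1      | 2
1/2/2020 | C2       | S1   | P2      | 8
1/5/2020 | C3       | S2   | P1      | 6
1/6/2020 | C4       | S2   | P2      | 10
1/7/2020 | C1       | S2   | P3      | 2
1/8/2020 | C2       | S2   | P4      | 4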
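And this is the right table (NULL values are allowed only in the Product column):
Shop | Product | Min_Customer_Score | Valid_From | Valid_To | Percent_Discount
-----+---------+--------------------+------------+----------+-----------------
S1   | P1      | 4                  | 1/1/2020   | 1/5/2020 | 10
S1   | P1      | 5                  | 1/1/2020   | 1/5/2020 | 11
S1   | P1      | 7                  | 1/1/2020   | 1/5/2020 | 12
S1   | null    | 5                  | 1/1/2020   | 1/5/2020 | 13
S2   | P1      | 4                  | 1/1/2020   | 1/5/2020 | 14
S2   | P2      | 4                  | 1/1/2020   | 1/5/2020 | 15
S2   | null    | 6                  | 1/1/2020   | 1/5/2020 | 16
S2   | null    | 9                  | 1/1/2020   | 1/5/2020 | 17
S2   | P1      | 4                  | 1/6/2020   | 1/8/2020 | 18
S2   | P2      | 4                  | 1/6/2020   | 1/8/2020 | 19
S2   | null    | 6                  | 1/6/2020   | 1/8/2020 | 20
S2   | null    | 9                  | 1/6/2020   | 1/8/2020 | 21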
I want to sort the right table first by Product (NULLs last) and then by Min_Customer_Score (ascending).
Then I want to take Min_Customer_Score and Percent_Discount from the first row that matches these conditions:
Left.Date >= Right.Valid_From
Left.Date <= Right.Valid_To
Left.Shop = Right.Shop
Left.Product = Right.Product OR Right.Product IS NULL
Left.Customer_Score >= Right.Min_Customer_Score
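My final result should look like this:
Date     | Customer | Shop | Product | Customer_Score | Min_Customer_Score | Percent_Discount
---------+----------+------+---------+----------------+--------------------+-----------------
1/1/2020 | C1       | S1   | P1      | 2              | null               | null
1/2/2020 | C2       | S1   | P2      | 8              | 5                  | 13
1/5/2020 | C3       | S2   | P1      | 6              | 4                  | 14
1/6/2020 | C4       | S2   | P2      | 10             | 4                  | 19
1/7/2020 | C1       | S2   | P3      | 2              | null               | null
1/8/2020 | C2       | S2   | P4      | 4              | null               | null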
Basically, I want to find the right discount for each purchase, treating rows with a NULL Right.Product as the default discount that applies to all other products.
I am familiar with left joins and subqueries in SQL, but I couldn't even work out where to start with a query this complex. I have also looked at other answers that suggest using ROW_NUMBER() OVER (PARTITION BY ...), but couldn't make it work for this case.
Edit:
This is what I was able to work out so far.
SELECT left_table.*, right_table.Percent_Discount, right_table.Min_Customer_Score
, ROW_NUMBER() OVER (
PARTITION BY left_table.Date, left_table.Customer, left_table.Shop, left_table.Product
ORDER BY right_table.Product DESC, right_table.Min_Customer_Score ASC) as row_num
FROM left_table
LEFT JOIN right_table
ON left_table.Date >= right_table.Valid_From
AND left_table.Date <= right_table.Valid_To
AND left_table.Shop>= right_table.Shop
AND (left_table.Product = right_table.Product OR right_table.Product is NULL)
AND left_table.Customer_Score >= right_table.Min_Customer_Score
WHERE row_num = 1
But it gives me the error below:
ERROR: column "row_num" does not exist
LINE: WHERE row_num = 1
Use APPLY (SQL Server syntax):
select l.*, r.Min_Customer_Score, r.Percent_Discount
from left_table l outer apply
(select top (1) rt.Min_Customer_Score, rt.Percent_Discount
from right_table rt
where l.Date >= rt.Valid_From and
l.Date <= rt.Valid_To and
l.Shop = rt.Shop and
(l.Product = rt.Product or rt.Product is null) and
(l.Customer_Score >= rt.Min_Customer_Score)
order by (case when rt.Product is not null then 1 else 2 end),
rt.Min_Customer_Score asc
) r
Finally, I was able to solve it as shown below. Thanks to @iamdave for the comment.
SELECT Date, Customer, Shop, Product, Customer_Score, Min_Customer_Score, Percent_Discount
FROM
(
SELECT left_table.*, right_table.Percent_Discount, right_table.Min_Customer_Score
, ROW_NUMBER() OVER (
PARTITION BY left_table.Date, left_table.Customer, left_table.Shop, left_table.Product
-- rows for the specific product first, default (NULL product) rows last
ORDER BY CASE WHEN right_table.Product IS NULL THEN 1 ELSE 0 END, right_table.Min_Customer_Score ASC) as row_num
FROM left_table
LEFT JOIN right_table
ON left_table.Date >= right_table.Valid_From
AND left_table.Date <= right_table.Valid_To
AND left_table.Shop = right_table.Shop
AND (left_table.Product = right_table.Product OR right_table.Product is NULL)
AND left_table.Customer_Score >= right_table.Min_Customer_Score
) as sub_query
WHERE row_num = 1
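The ROW_NUMBER() approach can be checked end to end on the sample data with an in-memory SQLite database (window functions need SQLite 3.25 or later). This is a sketch, not the asker's exact environment: dates are rewritten as ISO `YYYY-MM-DD` strings (an assumption, so that text comparison orders them correctly), and the NULLs-last ordering is spelled out with a CASE expression because the default placement of NULLs under `DESC` differs between databases (PostgreSQL puts them first).

```python
import sqlite3

# Sample data from the question, with dates rewritten as ISO strings.
LEFT_ROWS = [
    ("2020-01-01", "C1", "S1", "P1", 2),
    ("2020-01-02", "C2", "S1", "P2", 8),
    ("2020-01-05", "C3", "S2", "P1", 6),
    ("2020-01-06", "C4", "S2", "P2", 10),
    ("2020-01-07", "C1", "S2", "P3", 2),
    ("2020-01-08", "C2", "S2", "P4", 4),
]
RIGHT_ROWS = [
    ("S1", "P1", 4, "2020-01-01", "2020-01-05", 10),
    ("S1", "P1", 5, "2020-01-01", "2020-01-05", 11),
    ("S1", "P1", 7, "2020-01-01", "2020-01-05", 12),
    ("S1", None, 5, "2020-01-01", "2020-01-05", 13),
    ("S2", "P1", 4, "2020-01-01", "2020-01-05", 14),
    ("S2", "P2", 4, "2020-01-01", "2020-01-05", 15),
    ("S2", None, 6, "2020-01-01", "2020-01-05", 16),
    ("S2", None, 9, "2020-01-01", "2020-01-05", 17),
    ("S2", "P1", 4, "2020-01-06", "2020-01-08", 18),
    ("S2", "P2", 4, "2020-01-06", "2020-01-08", 19),
    ("S2", None, 6, "2020-01-06", "2020-01-08", 20),
    ("S2", None, 9, "2020-01-06", "2020-01-08", 21),
]

QUERY = """
SELECT Date, Customer, Shop, Product, Customer_Score, Min_Customer_Score, Percent_Discount
FROM (
    SELECT l.*, r.Min_Customer_Score, r.Percent_Discount,
           ROW_NUMBER() OVER (
               PARTITION BY l.Date, l.Customer, l.Shop, l.Product
               -- specific-product rows first, default (NULL product) rows last
               ORDER BY CASE WHEN r.Product IS NULL THEN 1 ELSE 0 END,
                        r.Min_Customer_Score
           ) AS row_num
    FROM left_table l
    LEFT JOIN right_table r
      ON l.Date >= r.Valid_From
     AND l.Date <= r.Valid_To
     AND l.Shop = r.Shop
     AND (l.Product = r.Product OR r.Product IS NULL)
     AND l.Customer_Score >= r.Min_Customer_Score
) AS sub_query
WHERE row_num = 1
ORDER BY Date
"""


def best_discounts():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE left_table (Date TEXT, Customer TEXT, Shop TEXT, "
                "Product TEXT, Customer_Score INTEGER)")
    con.execute("CREATE TABLE right_table (Shop TEXT, Product TEXT, Min_Customer_Score INTEGER, "
                "Valid_From TEXT, Valid_To TEXT, Percent_Discount INTEGER)")
    con.executemany("INSERT INTO left_table VALUES (?, ?, ?, ?, ?)", LEFT_ROWS)
    con.executemany("INSERT INTO right_table VALUES (?, ?, ?, ?, ?, ?)", RIGHT_ROWS)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows


for row in best_discounts():
    print(row)
```

Left rows with no qualifying right row keep their single LEFT JOIN row (with NULLs), which gets row_num = 1, so they still appear in the result as in the expected table.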

Find the favorite products in a large table on SQL Server

In SQL Server, I have a table like this:
CostumerID TITLE DATE
1 m1 1999-05-08
1 m1 2000-07-10
1 m1 2001-12-11
1 m2 2008-03-20
1 m2 2005-09-05
1 m2 2011-07-08
1 m3 2006-07-22
1 m3 2009-01-19
1 m3 2012-02-18
2 m1 2007-09-28
2 m1 2010-11-19
2 m1 2009-08-09
2 m2 2010-04-22
2 m2 2008-10-16
2 m2 2010-07-22
2 m3 2013-07-31
2 m3 2011-01-11
2 m3 2010-02-20
3 m1 2010-04-07
3 m1 2011-06-11
3 m1 2010-11-09
3 m2 2013-08-21
3 m2 2014-07-19
3 m2 2015-12-29
3 m3 2011-04-17
3 m3 2014-01-31
3 m3 2012-09-19
2 m3 2010-02-03
…
Q1: I need to find the CostumerIDs that consumed a product in both January and February.
Select a.CostumerID
From
(Select distinct CostumerID from theTable where month(date) = '2') as a
Inner join
(Select distinct CostumerID from theTable where month(date) = '1') as b
On a.CostumerID = b.CostumerID
Q2: I also need to find the most popular product among the products each customer consumed first.
Select b.title, count(b.title) as cnt
from
(
Select a.CostumerID , min(a.date) as earliestDate
from [DJX_test1].dbo.ama_services as a
group by a.CostumerID
) as c
inner join [DJX_test1].dbo.ama_services as b
on b.CostumerID = c.CostumerID and b.[date] = c.earliestDate
group by b.title
order by cnt desc
The table may be large, with 10+ million rows.
Are there better queries that don't use subqueries?
Also, how can I estimate a query's performance without running it?
Thanks
Q1: I need to find the CostumerID that has consumed a product in Jan and Feb.
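SELECT costumerID
FROM theTable
WHERE month(Convert(date,[date])) IN (1, 2)
GROUP BY costumerID
-- a customer qualifies only with rows in both months, not just one
HAVING COUNT(DISTINCT month(Convert(date,[date]))) = 2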
Q2: Also, I need to find the most popular product among the products each customer consumed first.
You can do this with the dense_rank() function, as below:
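;with cte as
(
select costumerID,title,date,
dense_rank() over (partition by costumerid order by convert(Date,[date])) as rn
from theTable
)
-- rn = 1 marks each customer's first consumption(s); count titles among those
select title, count(*) as cnt
from cte
where rn=1
group by title
order by cnt desc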
For the first question, try this:
SELECT costumerID
FROM theTable
WHERE month(date) IN (1, 2)
GROUP BY costumerID
HAVING COUNT(DISTINCT month(date)) = 2
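A quick way to sanity-check both questions on the sample rows is an in-memory SQLite database driven from Python. This is a sketch rather than SQL Server code: `strftime('%m', ...)` stands in for `MONTH()`, the table name `theTable` is borrowed from the answers, and only the rows shown in the question (not the ones elided by "…") are loaded. Q1 uses a single scan with GROUP BY/HAVING so that a customer needs rows in both months; Q2 follows the dense_rank() idea and then counts titles among each customer's first rows.

```python
import sqlite3

# Rows shown in the question; the rows elided by "…" are not included.
ROWS = [
    (1, "m1", "1999-05-08"), (1, "m1", "2000-07-10"), (1, "m1", "2001-12-11"),
    (1, "m2", "2008-03-20"), (1, "m2", "2005-09-05"), (1, "m2", "2011-07-08"),
    (1, "m3", "2006-07-22"), (1, "m3", "2009-01-19"), (1, "m3", "2012-02-18"),
    (2, "m1", "2007-09-28"), (2, "m1", "2010-11-19"), (2, "m1", "2009-08-09"),
    (2, "m2", "2010-04-22"), (2, "m2", "2008-10-16"), (2, "m2", "2010-07-22"),
    (2, "m3", "2013-07-31"), (2, "m3", "2011-01-11"), (2, "m3", "2010-02-20"),
    (3, "m1", "2010-04-07"), (3, "m1", "2011-06-11"), (3, "m1", "2010-11-09"),
    (3, "m2", "2013-08-21"), (3, "m2", "2014-07-19"), (3, "m2", "2015-12-29"),
    (3, "m3", "2011-04-17"), (3, "m3", "2014-01-31"), (3, "m3", "2012-09-19"),
    (2, "m3", "2010-02-03"),
]

# Q1: customers with rows in BOTH January and February (one scan, no join).
Q1 = """
SELECT CostumerID
FROM theTable
WHERE strftime('%m', date) IN ('01', '02')
GROUP BY CostumerID
HAVING COUNT(DISTINCT strftime('%m', date)) = 2
ORDER BY CostumerID
"""

# Q2: rank each customer's rows by date, keep rank 1, count titles.
Q2 = """
WITH firsts AS (
    SELECT CostumerID, TITLE,
           DENSE_RANK() OVER (PARTITION BY CostumerID ORDER BY date) AS rn
    FROM theTable
)
SELECT TITLE, COUNT(*) AS cnt
FROM firsts
WHERE rn = 1
GROUP BY TITLE
ORDER BY cnt DESC, TITLE
"""


def run(query):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE theTable (CostumerID INTEGER, TITLE TEXT, date TEXT)")
    con.executemany("INSERT INTO theTable VALUES (?, ?, ?)", ROWS)
    result = con.execute(query).fetchall()
    con.close()
    return result


print(run(Q1))  # [(1,), (2,)]
print(run(Q2))  # [('m1', 3)]
```

Customer 3 has a January row (2014-01-31) but no February row, so Q1 leaves it out.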

GROUP BY across multiple SELECT requests on different tables

So I wrote this:
select (
SELECT COUNT (*)
from TXN_TOTO
WHERE (CO1 = '1L' OR CO1 = '1') AND OP1 in('P3', 'R1')
) as A,
(
SELECT COUNT (*)
from TXN_TITI
WHERE (CO1 = '1L' OR CO1 = '1') AND OP1 in('P3', 'R1') AND STAT = 6
) as B,
(
SELECT COUNT (*)
from T_TITI tti inner join T_TATA ttdi
ON tti.ID_DINT = ttdi.ID_DINT
WHERE (CO1 = '1L' AND OP1 in('01', '04', 'Z8')) OR (CO1 = '1' AND OP1 in('P3', 'R1')) AND COM = 'O'
) as C
FROM DUAL;
I get a result that looks like this:
A | B | C
----------
7 | 1 | 9
Both the TXN_TOTO and TXN_TITI tables have a 'cent' column; I'd like to break the counts down by it in order to get:
CENT | A | B | C
----------------
0 | 2 | 0 | 0
1 | 2 | 1 | 4
2 | 3 | 0 | 5
Since I'm getting my data from two different tables, I really don't see how to do it.
Thanks.
EDIT: as requested, here are example data and the expected result:
TXN_TOTO
ID_DINT | CO1 | OP1 | DID_CENT
------------------------------
1 2L Z3 088
2 1L 1 089
3 1 P3 155
4 1L Z3 155
5 1L 1 077
6 1 P3 077
7 1L Z3 077
8 1L 1 077
9 1 P3 022
TXN_TITI
ID_DINT | CO1 | OP1 | DID_CENT |STAT
------------------------------------
1 2L Z3 088 6
2 1L 1 089 6
3 1 P3 155 6
4 1L Z3 155 6
5 1L 1 077 6
6 1 P3 077 6
7 1L Z3 077 6
8 1L Z8 077 6
9 1 R1 022 5
TXN_TATA
ID_DINT | COM |
---------------
1 O
2 O
3 O
4 O
5 N
6 O
7 O
8 O
9 O
Expected results:
DID_CENT | A | B | C
155 1 1 0
077 1 1 1
022 1 0 0
A is computed only from TXN_TOTO.
B is computed only from TXN_TITI; the only difference from A is the STAT filter.
C is a join of TXN_TITI and TXN_TATA; the matching TXN_TATA row must have COM = 'O'.
select coalesce(A.DID_CENT,B.DID_CENT,C.DID_CENT) DID_CENT,
nvl(sum(A.cnt),0) A, nvl(sum(B.cnt),0) B, nvl(sum(C.cnt),0) C
from
(
SELECT DID_CENT, COUNT (*) cnt
from TXN_TOTO
WHERE (CO1 = '1L' OR CO1 = '1') AND OP1 in('P3', 'R1')
GROUP BY DID_CENT
) A
FULL JOIN
(
SELECT DID_CENT, COUNT (*) cnt
from TXN_TITI
WHERE (CO1 = '1L' OR CO1 = '1') AND OP1 in('P3', 'R1') AND STAT = 6
GROUP BY DID_CENT
) B ON A.DID_CENT=B.DID_CENT
FULL JOIN
(
SELECT DID_CENT, COUNT (*) cnt
from TXN_TITI tti inner join TXN_TATA ttdi
ON tti.ID_DINT = ttdi.ID_DINT
WHERE ((CO1 = '1L' AND OP1 in('01', '04', 'Z8')) OR (CO1 = '1' AND OP1 in('P3', 'R1')))
AND COM = 'O'
GROUP BY DID_CENT
) C ON coalesce(A.DID_CENT,B.DID_CENT)=C.DID_CENT
GROUP BY coalesce(A.DID_CENT,B.DID_CENT,C.DID_CENT)
This is because the CENT values differ between A, B and C.
Selecting FROM DUAL returns only one row, so the query can't be written that way.
Instead, use
count(*) over(partition by CENT)
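Another way to get one row per DID_CENT without FULL JOIN (a different technique from the answers above) is to tag each filtered row with the column it belongs to, stack the three row sets with UNION ALL, and count with CASE. The sketch below runs it in SQLite against the EDIT's sample rows; it assumes `COM = 'O'` applies to both OP1 branches of the C filter, as the description of C suggests, and the derived-table alias is written without AS because Oracle does not accept AS there.

```python
import sqlite3

# Sample rows from the question's EDIT (DID_CENT kept as text to preserve leading zeros).
TOTO = [
    (1, "2L", "Z3", "088"), (2, "1L", "1", "089"), (3, "1", "P3", "155"),
    (4, "1L", "Z3", "155"), (5, "1L", "1", "077"), (6, "1", "P3", "077"),
    (7, "1L", "Z3", "077"), (8, "1L", "1", "077"), (9, "1", "P3", "022"),
]
TITI = [
    (1, "2L", "Z3", "088", 6), (2, "1L", "1", "089", 6), (3, "1", "P3", "155", 6),
    (4, "1L", "Z3", "155", 6), (5, "1L", "1", "077", 6), (6, "1", "P3", "077", 6),
    (7, "1L", "Z3", "077", 6), (8, "1L", "Z8", "077", 6), (9, "1", "R1", "022", 5),
]
TATA = [(1, "O"), (2, "O"), (3, "O"), (4, "O"), (5, "N"),
        (6, "O"), (7, "O"), (8, "O"), (9, "O")]

# Each branch emits (src, DID_CENT); the outer query counts rows per source.
QUERY = """
SELECT DID_CENT,
       SUM(CASE WHEN src = 'A' THEN 1 ELSE 0 END) AS A,
       SUM(CASE WHEN src = 'B' THEN 1 ELSE 0 END) AS B,
       SUM(CASE WHEN src = 'C' THEN 1 ELSE 0 END) AS C
FROM (
    SELECT 'A' AS src, DID_CENT FROM TXN_TOTO
    WHERE CO1 IN ('1L', '1') AND OP1 IN ('P3', 'R1')
    UNION ALL
    SELECT 'B', DID_CENT FROM TXN_TITI
    WHERE CO1 IN ('1L', '1') AND OP1 IN ('P3', 'R1') AND STAT = 6
    UNION ALL
    SELECT 'C', tti.DID_CENT
    FROM TXN_TITI tti JOIN TXN_TATA ttdi ON tti.ID_DINT = ttdi.ID_DINT
    WHERE ((tti.CO1 = '1L' AND tti.OP1 IN ('01', '04', 'Z8'))
        OR (tti.CO1 = '1' AND tti.OP1 IN ('P3', 'R1')))
      AND ttdi.COM = 'O'
) tagged
GROUP BY DID_CENT
ORDER BY DID_CENT DESC
"""


def cent_counts():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE TXN_TOTO (ID_DINT INTEGER, CO1 TEXT, OP1 TEXT, DID_CENT TEXT)")
    con.execute("CREATE TABLE TXN_TITI (ID_DINT INTEGER, CO1 TEXT, OP1 TEXT, "
                "DID_CENT TEXT, STAT INTEGER)")
    con.execute("CREATE TABLE TXN_TATA (ID_DINT INTEGER, COM TEXT)")
    con.executemany("INSERT INTO TXN_TOTO VALUES (?, ?, ?, ?)", TOTO)
    con.executemany("INSERT INTO TXN_TITI VALUES (?, ?, ?, ?, ?)", TITI)
    con.executemany("INSERT INTO TXN_TATA VALUES (?, ?)", TATA)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows


for row in cent_counts():
    print(row)
```

With the rows as given, TXN_TITI rows 3, 6, 8 and 9 all pass the C filter, so C comes out as 1, 2 and 1 for 155, 077 and 022.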

SQL closing distinct counts at the end of each day

Hi, I need to calculate the closing count of distinct house numbers with status = 'AA' at the end of each day.
For example, if at the end of 19/03/2016 we have 3 distinct houses H1, H2, H3 with AA, the count = 3.
If on 20/03/2016 we have an H2 record again with AA, plus H4 and H5 with AA, the closing balance = 5.
If on 21/03/2016 two of the houses move out of AA and another three are added to AA, the count = 6.
Could someone please help me with the SQL to solve this?
Database: Netezza.
So basically, for each and every day, I need to count the distinct houses (from the beginning of time up to that day) that have status AA; that is the closing count of houses for the day. [ClsBal is the required derived column.]
ReleaseDate|HNo|Status|HType|RelReason| ValidFrm | ValidTo |ClsBal
-------------------------------------------------------------------
01-Jan-16 H1 AA R XYZ 01-Jan-16 01-Jan-16 2
01-Jan-16 H2 AA R XYZ 01-Jan-16 31/12/2999 2
02-Jan-16 H3 AA R XYZ 02-Jan-16 31/12/2999 4
02-Jan-16 H4 AA R XYZ 02-Jan-16 31/12/2999 4
02-Jan-16 H5 AA R XYZ 02-Jan-16 31/12/2999 4
02-Jan-16 H1 AB R XYZ 02-Jan-16 31/12/2999 4
03-Jan-16 H6 AA R XYZ 02-Jan-16 31/12/2999 8
03-Jan-16 H7 AA R XYZ 02-Jan-16 31/12/2999 8
03-Jan-16 H8 AA R XYZ 02-Jan-16 31/12/2999 8
03-Jan-16 H9 AA R XYZ 02-Jan-16 31/12/2999 8
03-Jan-16 H3 AA R XYZ 02-Jan-16 31/12/2999 8
The code below is the logic for finding the closing balance for a single given day:
select cast('31-dec-2015' as date ) as RELEASEDATE,
RelReason,
HType,
count(distinct Hno ) as ClosingCount
from HouseChanges
where ReleaseDate <= '31-dec-2015'
and Status='AA' -- House Status
and ValidTo >= '31-dec-2015'
group by RelReason, Htype
I need this calculated for every single day from 31-dec-2015 onward.
Can we use a recursive CTE to solve this?
I tried, but it keeps erroring out, and Aginity isn't very good at returning error codes.
I used something like this:
with closing as (
select cast('31-dec-2015' as date ) as RELEASEDATE,
HNo,
HType,
count(distinct Hno ) as ClosingCount
from HouseChanges
where ReleaseDate <= '31-dec-2015'
and
Status='AA' -- House Status
and
VALID_TO_DATE >= '31-dec-2015'
group by RelReason, Htype
union all
select max(Release_date) as RELEASEDATE,
HNo,
HType,
count(distinct Hno ) as ClosingCount
from HouseChanges t1
inner join closing t2
on t1.Release_date <= t2. RELEASEDATE++interval '1 days'
where
Status='AA' -- House Status
and
VALID_TO_DATE >= t2. RELEASEDATE+interval '1 days'
and
t2. RELEASEDATE<=current_date
group by RelReason, Htype
)
select * from closing
It turns out Netezza does not support recursive CTEs. Could someone advise how I can do this without a recursive CTE?
I don't fully understand the requirements you're describing, but here is my guess. It should be possible to use a window function for this, though I don't know whether you have those available:
select
ReleaseDate, HNo, HType,
(
select count(distinct hc2.Hno)
from HouseChanges hc2
where hc2.ReleaseDate <= hc.ReleaseDate
and Status = 'AA' and ValidTo >= '31-dec-2015'
) as ClosingCount
from HouseChanges hc
where Status = 'AA' and ValidTo >= '31-dec-2015'
group by ReleaseDate, HNo, HType
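Since Netezza has no recursive CTEs, one non-recursive alternative (a different technique from the answer above) is to join a list of days to HouseChanges, apply the question's single-day rule to every day at once (released on or before the day, Status = 'AA', ValidTo on or after the day), and take COUNT(DISTINCT HNo) per day. The sketch runs in SQLite for convenience, with dates rewritten as ISO strings; the list of days comes from the distinct ReleaseDate values, which is an assumption, since days with no changes would need a calendar table (hypothetical here). On the sample rows it reproduces the ClsBal column (2, 4, 8).

```python
import sqlite3

# Sample rows from the question, with dates rewritten as ISO strings.
# Columns: ReleaseDate, HNo, Status, HType, RelReason, ValidFrm, ValidTo
ROWS = [
    ("2016-01-01", "H1", "AA", "R", "XYZ", "2016-01-01", "2016-01-01"),
    ("2016-01-01", "H2", "AA", "R", "XYZ", "2016-01-01", "2999-12-31"),
    ("2016-01-02", "H3", "AA", "R", "XYZ", "2016-01-02", "2999-12-31"),
    ("2016-01-02", "H4", "AA", "R", "XYZ", "2016-01-02", "2999-12-31"),
    ("2016-01-02", "H5", "AA", "R", "XYZ", "2016-01-02", "2999-12-31"),
    ("2016-01-02", "H1", "AB", "R", "XYZ", "2016-01-02", "2999-12-31"),
    ("2016-01-03", "H6", "AA", "R", "XYZ", "2016-01-02", "2999-12-31"),
    ("2016-01-03", "H7", "AA", "R", "XYZ", "2016-01-02", "2999-12-31"),
    ("2016-01-03", "H8", "AA", "R", "XYZ", "2016-01-02", "2999-12-31"),
    ("2016-01-03", "H9", "AA", "R", "XYZ", "2016-01-02", "2999-12-31"),
    ("2016-01-03", "H3", "AA", "R", "XYZ", "2016-01-02", "2999-12-31"),
]

# For every day in the derived table d, apply the single-day rule and count
# distinct houses. LEFT JOIN keeps days where no house qualifies (count 0).
QUERY = """
SELECT d.day, COUNT(DISTINCT h.HNo) AS ClsBal
FROM (SELECT DISTINCT ReleaseDate AS day FROM HouseChanges) d
LEFT JOIN HouseChanges h
       ON h.ReleaseDate <= d.day
      AND h.ValidTo >= d.day
      AND h.Status = 'AA'
GROUP BY d.day
ORDER BY d.day
"""


def closing_balances():
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE HouseChanges (ReleaseDate TEXT, HNo TEXT, Status TEXT, "
        "HType TEXT, RelReason TEXT, ValidFrm TEXT, ValidTo TEXT)"
    )
    con.executemany("INSERT INTO HouseChanges VALUES (?, ?, ?, ?, ?, ?, ?)", ROWS)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows


print(closing_balances())  # [('2016-01-01', 2), ('2016-01-02', 4), ('2016-01-03', 8)]
```

H1 drops out on 2016-01-02 because its AA row ends on 2016-01-01 and its new row has status AB, which is why the rule relies on ValidTo being closed when a status changes.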

Filter records depending on other records with the same ID in SQL

My table looks like this:
Table Name: Quality
ID Name Type
-- ---- ----
1 XYZ S1
1 XYZ B1
1 XYZ S2
1 XYZ R1
2 ABC B1
2 ABC B2
2 ABC R1
2 ABC U1
3 PQR B1
3 PQR B2
3 PQR R2
3 PQR R1
4 AAA B1
4 AAA S1
5 BBB B1
5 BBB B2
5 BBB U2
I want to select the rows whose Type is B1, but only if no other row with the same ID has Type R1 or U1; and the rows whose Type is B2, but only if no other row with the same ID has Type R2 or U2.
Here, the output should be:
ID Name Type
-- ---- ----
2 ABC B2
4 AAA B1
5 BBB B1
My query is the following, but it does not give the correct result:
SELECT
ID , NAME , TYPE
FROM
QUALITY Q
WHERE
(Q.TYPE IN ('B1') AND Q.TYPE NOT IN ('R1', 'U1'))
OR
(Q.TYPE IN ('B2') AND Q.TYPE NOT IN ('R2', 'U2'))
My query evaluates one row at a time, so I am not getting the correct result. How can I make the query check every row with the same ID for its Type?
Any help will be really useful.
You can use NOT EXISTS:
SELECT *
FROM Quality q
WHERE
(Type = 'B1' AND NOT EXISTS(SELECT 1 FROM Quality WHERE ID = q.ID AND Type IN ('R1', 'U1')))
OR (Type = 'B2' AND NOT EXISTS(SELECT 1 FROM Quality WHERE ID = q.ID AND Type IN ('R2', 'U2')))
select * from
quality q
where
(q.type = 'B1' and q.id not in (select q2.id from quality q2 where q2.type in ('R1','U1')))
or
(q.type = 'B2' and q.id not in (select q3.id from quality q3 where q3.type in ('R2','U2')))
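The NOT EXISTS answer can be checked against the sample rows with an in-memory SQLite database. This is a minimal sketch, assuming the B2 exclusions are R2 and U2, as in the queries and the expected output; the correlated subquery looks at every row with the same ID rather than only the current row, which is what the original single-row WHERE could not do.

```python
import sqlite3

ROWS = [
    (1, "XYZ", "S1"), (1, "XYZ", "B1"), (1, "XYZ", "S2"), (1, "XYZ", "R1"),
    (2, "ABC", "B1"), (2, "ABC", "B2"), (2, "ABC", "R1"), (2, "ABC", "U1"),
    (3, "PQR", "B1"), (3, "PQR", "B2"), (3, "PQR", "R2"), (3, "PQR", "R1"),
    (4, "AAA", "B1"), (4, "AAA", "S1"),
    (5, "BBB", "B1"), (5, "BBB", "B2"), (5, "BBB", "U2"),
]

# Keep a B1 row only if no row with the same ID is R1/U1, and a B2 row
# only if no row with the same ID is R2/U2.
QUERY = """
SELECT q.ID, q.Name, q.Type
FROM Quality q
WHERE (q.Type = 'B1' AND NOT EXISTS (
          SELECT 1 FROM Quality o WHERE o.ID = q.ID AND o.Type IN ('R1', 'U1')))
   OR (q.Type = 'B2' AND NOT EXISTS (
          SELECT 1 FROM Quality o WHERE o.ID = q.ID AND o.Type IN ('R2', 'U2')))
ORDER BY q.ID, q.Type
"""


def filtered():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Quality (ID INTEGER, Name TEXT, Type TEXT)")
    con.executemany("INSERT INTO Quality VALUES (?, ?, ?)", ROWS)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows


print(filtered())  # [(2, 'ABC', 'B2'), (4, 'AAA', 'B1'), (5, 'BBB', 'B1')]
```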