SQL: Combination of one column + count - sql

I have a source table containing userIDs and their actions (entries are distinct)
userID | action
1 A
1 B
2 A
3 B
3 C
and I need to get all combinations of two actions together with the number of users who performed both actions.
action1| action2 | number of users
A A 2
A B 1
(A C 0)
B B 2
B C 1
C C 1
A-C is in parenthesis, because I don't need rows in the output containing 0 users.
a row containing twice the same action (A-A) just stores the number of users who performed that action. (user 1 and user 2 performed action A, that's 2 users)
I tried to join the source table with itself, but as it contains more than an million of rows, I ran out of spool space:
SELECT sT1.action, sT2.action, count(distinct sT1.userID)
FROM sourceTable sT1
JOIN sourceTable sT2 ON (sT1.userID=sT2.userID)
GROUP BY 1,2
HAVING sT1.action <= sT2.action
The output itself shouldn't be too big, as the majority of combinations will not exist (0 users performed both actions).
Is there a more efficient way to query what I need?
Thank you in advance.

SELECT sT1.action, sT2.action, count(*)
FROM sourceTable sT1
LEFT INNER JOIN sourceTable sT2 ON (sT1.userID=sT2.userID)
where (st1.RowID <> st1.RowID)
and sT1.action <= sT2.action
GROUP BY st1.action, st2.action
HAVING count(*) > 0
The only problematic this is that you need to discard the case where st1 and st2 are matching the same row.In the SQL above I have assumed that the sourceTable has a PK I've called RowID and exclude the case where its joining a row to itself.
I've also changed the HAVING line as that didn't seem to be what your description of the problem called for: it sounded like it was better in the WHERE clause. The new HAVING clause is actually redundant: it should never have a count(*) of 0, but it won't hurt.

Related

How to Limit Results Per Match on a Left Join - SQL Server

I have a table with student info [STU] and a table with parent info [PAR]. I want to return an email address for each student, but just one. So I run this query:
SELECT [STU].[ID], [PAR].[EM]
FROM (SELECT [STU].* FROM DB1.STU)
STU LEFT JOIN (SELECT [PAR].* FROM DB1.PAR) PAR ON [STU].[ID] = [PAR].[ID]
This gives me the below table:
Student ID ParentEmail
1 jim#email.com
1 sarah#email.com
2 paul#email.com
2 tim#email.com
3 bill#email.com
3 frank#email.com
3 joyce#email.com
4 greg#email.com
5 tony#email.com
5 sam#email.com
Each student has multiple parent emails, but I only want one. In other words, I want the output to look like this:
Student ID ParentEmail
1 jim#email.com
2 paul#email.com
3 frank#email.com
4 greg#email.com
5 sam#email.com
I've tried so many things. I've tried using GROUP BY and MIN/MAX and I've tried complex CASE statements, and I've tried COALESCE but I just can't seem to figure it out.
I think OUTER APPLY is the simplest method:
SELECT [STU].[ID], [PAR].[EM]
FROM DB1.STU OUTER APPLY
(SELECT TOP (1) [PAR].*
FROM DB1.PAR
WHERE [STU].[ID] = [PAR].[ID]
) PAR;
Normally, there would be an ORDER BY in the subquery, to give you control over which email you want -- the longest, shortest, oldest, or whatever. Without an ORDER BY it returns just one email, which is what you are asking for.
If you just want one column from the parent table, a simple approach is a correlated subquery:
select
s.id student_id,
(select max(p.em) from db1.par p where p.id = s.id) parent_email
from db1.stu s
This gives you the greatest parent email per student.

JOIN query, SQL Server is dropping some rows of my first table

I have two tables customer_details and address_details. I want to display customer details with their corresponding address, so I was using a LEFT JOIN, but when I'm executing this query, SQL Server drops rows where street_no of customer_details table doesn't match with the street_no in address_detials table and displays only rows where `street_no' of customer_detials = street_no of address_details table. I need to display a complete customer_details table and in case if street_no doesn't matches it should display empty string or anything. Am I doing anything wrong in my SQL join?
Table customer_details:
case_id customer_name mob_no street_no
-------------------------------------------------
1 John 242342343 4324234234234
1 Rohan 343233333 43332
1 Ankit 234234233 2342332423433
1 Suresh 234234324 2342342342342
1 Ranjeet 343424323 32233
1 Ramu 234234333 2342342342343
Table address_details:
s_no streen_no address city case_id
------------------------------------------------------
1 4324234234234 Roni road Delhi 1
2 2342332423433 Natan street Lucknow 1
3 2342342342342 Koliko road Herdoi 1
SQL JOIN query:
select
a.*, b.address
from
customer_details a
left join
address_details b on a.street_no = b.street_no
where
b.case_id = 1
Now that it became clear that you used b.case_id=1, I will explain why it filters:
The LEFT JOIN itself returns some rows that contain all NULL values for table b in the result set, which is what you want and expect.
But by using WHERE b.case_id=1, the rows containing NULL values for table b are filtered out because none of them matches the condition (all those rows have b.case_id=NULL so they don't match).
It might work to instead use WHERE a.case_id=1, but we don't know if a.case_id and b.case_id are always the same value for matching rows (they might not be; and if they are always the same, then we just identified a potential redundancy).
There are two ways to fix this for sure.
(1) Move b.case_id = 1 into the left join condition:
left join address_details b on a.street_no = b.street_no and b.case_id = 1
(2) Keep b.case_id = 1 in the WHERE but also allow for NULLED-out b values:
left join address_details b on a.street_no = b.street_no
where b.case_id = 1
or b.street_no IS NULL
Personally I'd go for (1) because that is the most clear way to express that you want to filter b on two conditions, without affecting the rows of a that are being returned.
I do think that Wilhelm Poggenpohl answer is kind of right. You just need to change the last join condition a.case_id=1 to b.case_id=1
select a.* , b.address
from customer_details a
left join address_details b on a.street_no=b.street_no
and b.case_id=1
This query will show every row from customer_details and the corresponding adress if there is a match of street_no and the adress meets the condition case_id=1.
This is because of the where clause. Try this:
select a.* , b.address
from customer_details a
left join address_details b on a.street_no=b.street_no
and a.case_id=1

SQL Inner join in a nested select statement

I'm trying to do an inner join in a nested select statement. Basically, There are first and last reason IDs that produce a certain number (EX: 200). In another table, there are definitions for the IDs. I'm trying to pull the Last ID, along with the corresponding comment for whatever is pulled (EX: 200 - Patient Cancelled), then the first ID and the comment for whatever ID it is.
This is what I have so far:
Select BUSN_ID
AREA_NAME
DATE
AREA_STATUS
(Select B.REASON_ID
A.LAST_REASON_ID
FROM BUSN_INFO A, BUSN_REASONS B
WHERE A.LAST_REASON _ID=B.REASON_ID,
(Select B.REASON_ID
A. FIRST_REASON_ID
FROM BUSN_INFO A, BUSN_REASONS B
WHERE A_FIRST_REASON_ID = B.REASON_ID)
FROM BUSN_INFO
I believe an inner join is best, but I'm stuck on how it would actually work.
Required result would look like (this is example dummy data):
First ID -- Busn Reason -- Last ID -- Busn Reason
1 Patient Sick 2 Patient Cancelled
2 Patient Cancelled 2 Patient Cancelled
3 Patient No Show 1 Patient Sick
Justin_Cave's SECOND example is the way I used to solve this problem.
If you want to use inline select statements, your inline select has to select a single column and should just join back to the table that is the basis of your query. In the query you posted, you're selecting the same numeric identifier multiple times. My guess is that you really want to query a string column from the lookup table-- I'll assume that the column is called reason_description
Select BUSN_ID,
AREA_NAME,
DATE,
AREA_STATUS,
a.last_reason_id,
(Select B.REASON_description
FROM BUSN_REASONS B
WHERE A.LAST_REASON_ID=B.REASON_ID),
a.first_reason_id,
(Select B.REASON_description
FROM BUSN_REASONS B
WHERE A.FIRST_REASON_ID = B.REASON_ID)
FROM BUSN_INFO A
More conventionally, though, you'd just join to the busn_reasons table twice
SELECT i.busn_id,
i.area_name,
i.date,
i.area_status,
i.last_reason_id,
last_reason.reason_description,
i.first_reason_id,
first_reason.reason_description
FROM busn_info i
JOIN busn_reason first_reason
ON( i.first_reason_id = first_reason.reason_id )
JOIN busn_reason last_reason
ON( i.last_reason_id = last_reason.reason_id )

pad database out with NULL criteria

If I have the following sample table (order by ID)
ID Date Type
-- ---- ----
1 01/01/2000 A
2 22/04/1995 A
2 14/02/2001 B
Where you can immediate see that ID=1 does not have a Type=B, but ID=2 does. What I want to do, if fill in a line to show this:
ID Date Type
-- ---- ----
1 01/01/2000 A
1 NULL B
2 22/04/1995 A
2 14/02/2001 B
where there could potentially be 100's of different types, (so may need to end up inserting 100's rows per person if they lack 100's Types!)
Is there a general solution to do this?
Could I possibly outer join the table on itself and do it that way?
You can do this with a cross join to generate all the rows and a left join to get the actual data values:
select i.id, s.date, t.type
from (select distinct id from sample) i cross join
(select distinct type from sample) t left join
sample s
on s.id = i.id and
s.type = t.type;

Remove Duplicates from LEFT OUTER JOIN

My question is quite similar to Restricting a LEFT JOIN, with a variation.
Assuming I have a table SHOP and another table LOCATION. Location is a sort of child table of table SHOP, that has two columns of interest, one is a Division Key (calling it just KEY) and a "SHOP" number. This matches to the Number "NO" in table SHOP.
I tried this left outer join:
SELECT S.NO, L.KEY
FROM SHOP S
LEFT OUTER JOIN LOCATN L ON S.NO = L.SHOP
but I'm getting a lot of duplicates since there are many locations that belong to a single shop. I want to eliminate them and just get a list of "shop, key" entries without duplicates.
The data is correct but duplicates appear as follows:
SHOP KEY
1 XXX
1 XXX
2 YYY
3 ZZZ
3 ZZZ etc.
I would like the data to appear like this instead:
SHOP KEY
1 XXX
2 YYY
3 ZZZ etc.
SHOP table:
NO
1
2
3
LOCATION table:
LOCATION SHOP KEY
L-1 1 XXX
L-2 1 XXX
L-3 2 YYY
L-4 3 YYY
L-5 3 YYY
(ORACLE 10g Database)
You need to GROUP BY 'S.No' & 'L.KEY'
SELECT S.NO, L.KEY
FROM SHOP S
LEFT OUTER JOIN LOCATN L
ON S.NO = L.SHOP
GROUP BY S.NO, L.KEY
EDIT Following the update in your scenario
I think you should be able to do this with a simple sub query (though I haven't tested this against an Oracle database). Something like the following
UPDATE shop s
SET divnkey = (SELECT DISTINCT L.KEY FROM LOCATN L WHERE S.NO = L.SHOP)
The above will raise an error in the event of a shop being associated with locations that are in multiple divisions.
If you just want to ignore this possibility and select an arbitrary one in that event you could use
UPDATE shop s
SET divnkey = (SELECT MAX(L.KEY) FROM LOCATN L WHERE S.NO = L.SHOP)
I had this problem too but I couldn't use GROUP BY to fix it because I was also returning TEXT type fields. (Same goes for using DISTINCT).
This code gave me duplicates:
select mx.*, case isnull(ty.ty_id,0) when 0 then 'N' else 'Y' end as inuse
from master_x mx
left outer join thing_y ty on mx.rpt_id = ty.rpt_id
I fixed it by rewriting it thusly:
select mx.*,
case when exists (select 1 from thing_y ty where mx.rpt_id = ty.rpt_id) then 'Y' else 'N' end as inuse
from master_x mx
As you can see I didn't care about the data within the 2nd table (thing_y), just whether there was greater than zero matches on the rpt_id within it. (FYI: rpt_id was also not the primary key on the 1st table, master_x).