Counting if name appears under a given date - sql

I have a dataframe with a few columns, I will focus on the ones relevant to my issue.
I would like to return the value 1 for cases where a user's name appears in a row associated with a given date.
So here is an example:
user_name date
a 20/1/2019
a 20/1/2019
c 21/1/2019
c 21/1/2019
a 21/1/2019
b 20/1/2019
Using this as an example, this would be my desired output
user_name date val
a 20/1/2019 1
b 20/1/2019 1
c 20/1/2019 0
a 21/1/2019 1
b 21/1/2019 0
c 21/1/2019 1
I had though about use a case statement of sorts with the pseudo logic if name doesn't appear under a specific date then return 0 else return 1
case when user_name not in date then 0 else 1 end
but this doesn't seem to make sense

I think you want a cross join to generate rows and then logic to set the flag:
select u.user_name, d.date,
(case when exists (select 1 from t where t.user_name = u.user_name and t.date = d.date)
then 1 else 0
end) as flag
from (select distinct user_name from t) u cross join
(select distinct date from t) d ;

You may use a calendar table approach here. In the absence of any formal tables containing all dates and users, we can use subqueries in their place:
SELECT DISTINCT
u.user_name,
d.date,
CASE WHEN t.user_name IS NULL THEN 0 ELSE 1 END AS val
FROM (SELECT DISTINCT user_name FROM yourTable) u
CROSS JOIN (SELECT DISTINCT date FROM yourTable) d
LEFT JOIN yourTable t
ON t.user_name = u.user_name AND
t.date = d.date
ORDER BY
d.date,
u.user_name;
Demo

Related

Limit result from inner join query to 2 rows

My query is giving me result from grouped data but now I want only two rows
I have tried HAVING COUNT(*) <= 2 but issue is is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
my query is
select f.CompanyName, f.EmployeeCity, f.PrioritySL ,f.EmployeeSeniorityLevel ,f.EmployeeID
from (
select ConcatKey, min(PrioritySL) as PSL
from dbo.WalkerItContacts group by ConcatKey
) as x inner join dbo.WalkerItContacts as f on f.ConcatKey = x.ConcatKey and f.PrioritySL = x.PSL
where f.PrioritySL != '10'
Company apple have 9 records I want only 2 records
my data
company name priority
a 10
a 1
a 3
b 2
b 4
b 3
b 5
c 1
c 10
c 2
my expected data
company name priority
a 1
a 3
b 2
b 3
c 1
c 2
Add a 'top 2' clause to the outer query:
select top 2 f.CompanyName, f.EmployeeCity, f.PrioritySL ,f.EmployeeSeniorityLevel ,f.EmployeeID
from (
select ConcatKey, min(PrioritySL) as PSL
from dbo.WalkerItContacts group by ConcatKey
) as x inner join dbo.WalkerItContacts as f on f.ConcatKey = x.ConcatKey and f.PrioritySL = x.PSL
where f.PrioritySL != '10'
and f.CompanyName= 'Apple'
will give you two rows. Add a order clause by in the outer query so you can control which two rows are returned.
You can phrase this more succinctly and with better performance as:
select top (2) wic.*
from (select wic,
rank() over (partition by CompanyName, ConcatKey order by PrioritySL) as seqnum
from dbo.WalkerItContacts wic
) wic
where seqnum = 1 and
wic.PrioritySL <> 10 and
wic.CompanyName = 'Apple';
I think you could solve your problem using the ROW_NUMBER() function to count the rows and filter it in the WHERE clause to only show 2 rows per group.
I think something like this might work for you:
SELECT rownum, f.CompanyName, f.EmployeeCity, f.PrioritySL,
f.EmployeeSeniorityLevel, f.EmployeeID
FROM ( SELECT ConcatKey, MIN(PrioritySL) AS PSL, ROW_NUMBER() OVER(PARTITION BY
f.CompanyName) AS rownum
FROM dbo.WalkerItContacts
GROUP BY ConcatKey) AS x
INNER JOIN dbo.WalkerItContacts AS f ON f.ConcatKey = x.ConcatKey
AND f.PrioritySL = x.PSL
WHERE f.PrioritySL != '10' AND rownum <= 2
ORDER BY f.CompanyName ASC;
Hope this helps some.

SQL query to conditionally select a field value

I have an SQL query that joins 3 tables to return me the required data. The query is as follows:
SELECT (s.user_created,
u.first_name,
u.last_name,
u.email,
s.school_name,
p.plan_name,
substring( a.message, 11), u.phone1)
FROM cb_school s
inner join ugrp_user u
on s.user_created = u.user_id
inner join cb_plan p
on s.current_plan = p.plan_id
inner join audit a
on u.user_id = a.user_id
where s.type = 'sample'
and a.module_short = 'sample-user'
and s.created_time > current_timestamp - interval '10 day';
The query works fine if all the attributes are present. Now for few of my rows, the following value would be a.module_short = 'sample-user' missing. But since I have included it as an AND condition, those rows will not be returned. I am trying to return an empty string for that field if it is present, else the value as per my current query. Is there any way to achieve this.
Think you could possibly use a CASE WHEN statement, like this:
SELECT CASE WHEN a.module_short = 'sample-user' THEN a.module_short
ELSE '' END AS a.module_short
FROM TableA
you can use COALESCE it returns the first not null.
SELECT COALESCE(a.module_short,'')
FROM TableA AS a
SELECT (s.user_created,
u.first_name,
u.last_name,
u.email,
s.school_name,
p.plan_name,
substring( a.message, 11), u.phone1)
FROM cb_school s
INNER JOIN ugrp_user u
ON s.user_created = u.user_id
INNER JOIN cb_plan p
ON s.current_plan = p.plan_id
INNER JOIN audit a
ON u.user_id = a.user_id
AND a.module_short = 'sample-user'
WHERE s.type = 'sample'
AND s.created_time > current_timestamp - interval '10 day';
You want to show all users that have at least one module_short.
If the module_short contains 'sample-user' then it should show it, else it should show NULL as module_short. You only want 1 row per user, even if it has multiple module_shorts.
You can use a CTE, ROW_NUMBER() and the CASE clause for this question.
Example Question
I have 3 tables.
Users: Users with an ID
Modules: Modules with an ID
UserModules: The link between users and modules. You user can have multiple models.
I need a query that returns me all users that have at least 1 module with 2 columns UserName and ModuleName.
I only one 1 row for each user. The ModuleName should only display SQL if the user has that module. Else it should display no module.
Example Tables:
Users:
id name
1 Manuel
2 John
3 Doe
Modules:
id module
1 StackOverflow
2 SQL
3 StackExchange
4 SomethingElse
UserModules:
id module_id user_id
1 1 2
2 1 3
4 2 2
5 2 3
6 3 1
7 3 3
8 4 1
9 4 3
Example Query:
with CTE as (
select
u.name as UserName
, CASE
WHEN m.module = 'SQL' THEN 'SQL' ELSE NULL END as ModuleName
, ROW_NUMBER() OVER(PARTITION BY u.id
ORDER BY (CASE
WHEN m.module = 'SQL' THEN 'Ja' ELSE NULL END) DESC) as rn
from UserModules as um
inner join Users as u
on um.user_id = u.id
inner join Modules as m
on um.module_id = m.id
)
select UserName, ModuleName from CTE
where rn = 1
Example Result:
UserName ModuleName
Manuel NULL
John SQL
Doe SQL
Your query would look like this:
with UsersWithRownumbersBasedOnModule_short as (
SELECT s.user_created,
u.first_name,
u.last_name,
u.email,
s.school_name,
p.plan_name,
substring( a.message, 11),
u.phone1)
CASE
WHEN a.module_short = 'sample-user'
THEN a.module_short
ELSE NULL
END AS ModuleShort
ROW_NUMBER() OVER(PARTITION BY u.user_id ORDER BY (
CASE
WHEN a.module_short = 'sample-user'
THEN a.module_short
ELSE NULL
END) DESC) as rn
FROM cb_school s
inner join ugrp_user u
on s.user_created = u.user_id
inner join cb_plan p
on s.current_plan = p.plan_id
inner join audit a
on u.user_id = a.user_id
where s.type = 'sample'
and s.created_time > current_timestamp - interval '10 day';)
select * from UsersWithRownumbersBasedOnModule_short
where rn = 1
PS: I removed a lose bracket after SELECT and your SUBSTRING() is missing 1 parameter, it needs 3.

Join table on conditions, count on conditions

SELECT *, null AS score,
'0' AS SortOrder
FROM products
WHERE datelive = -1
AND hidden = 0
UNION
SELECT e.*, (SUM(r.a)/(COUNT(*)*1.0)+
SUM(r.b)/(COUNT(*)*1.0)+
SUM(r.c)/(COUNT(*)*1.0)+
SUM(r.d)/(COUNT(*)*1.0))/4 AS score,
'1' AS SortOrder
FROM products e
LEFT JOIN reviews r
ON r.productID = e.productID
WHERE e.hidden = 0
AND e.datelive != -1
GROUP BY e.productID
HAVING COUNT(*) >= 5
UNION
SELECT e.*, (SUM(r.a)/(COUNT(*)*1.0)+
SUM(r.b)/(COUNT(*)*1.0)+
SUM(r.c)/(COUNT(*)*1.0)+
SUM(r.d)/(COUNT(*)*1.0))/4 AS score,
'2' AS SortOrder
FROM products e
LEFT JOIN reviews r
ON r.productID = e.productID
WHERE e.hidden = 0
AND e.datelive != -1
GROUP BY e.productID
HAVING COUNT(*) < 5
ORDER BY SortOrder ASC, score DESC
This creates an SQL object for displaying products on a page. The first request grabs items of type datelive = -1, the second of type datelive != -1 but r.count(*) >= 5, and the third of type datelive != -1 and r.count(*) < 5. The reviews table is structured similar to the below:
reviewID | productID | a | b | c | d | approved
-------------------------------------------------
1 1 5 4 5 5 1
2 5 3 2 5 5 0
3 2 5 5 4 3 1
... ... ... ... ... ... ...
I'm trying to work it such that r.count(*) only cares for rows of type approved = 1, since tallying data based on unapproved reviews isn't ideal. How can I join these tables such that the summations of scores and the number of rows is dependent only on approved = 1?
I've tried adding in AND r.approved = 1 in the WHERE conditional for the joins and it doesn't do what I'd like. It does sort it properly, but then it no longer includes items with zero reviews.
You seem to be nearly there.
In your question you talked about adding the AND r.approved = 1 to the join criteria but by the sounds of it you are actually adding it to the WHERE clause.
If you instead properly add it to the join criteria like below then it should work fine:
SELECT *, null AS score,
'0' AS SortOrder
FROM products
WHERE datelive = -1
AND hidden = 0
UNION
SELECT e.*, (SUM(r.a)/(COUNT(*)*1.0)+
SUM(r.b)/(COUNT(*)*1.0)+
SUM(r.c)/(COUNT(*)*1.0)+
SUM(r.d)/(COUNT(*)*1.0))/4 AS score,
'1' AS SortOrder
FROM products e
LEFT JOIN reviews r ON r.productID = e.productID
WHERE e.hidden = 0
AND e.datelive != -1
GROUP BY e.productID
HAVING COUNT(*) >= 5
UNION
SELECT e.*, (SUM(r.a)/(COUNT(*)*1.0)+
SUM(r.b)/(COUNT(*)*1.0)+
SUM(r.c)/(COUNT(*)*1.0)+
SUM(r.d)/(COUNT(*)*1.0))/4 AS score,
'2' AS SortOrder
FROM products e
LEFT JOIN reviews r ON r.productID = e.productID AND r.approved = 1
WHERE e.hidden = 0
AND e.datelive != -1
GROUP BY e.productID
HAVING COUNT(*) < 5
ORDER BY SortOrder ASC, score DESC
SQL Fiddle here.
Notice again how I have simply put the AND r.approved = 1 directly after LEFT JOIN reviews r ON r.productID = e.productID which adds an extra criteria to the join.
As I mentioned in my comment, the WHERE clause will filter rows out of the combined record set after the join has been made. In some cases the RDBMS may optimise it out and put it into the join criteria but only where that would make no difference to the result set.
Calculating the non-zero sums and joining it to your result may solve it;
fiddle
SELECT a.productID,
NULL AS score,
'0' AS SortOrder
FROM products a
WHERE datelive = -1
AND hidden = 0
UNION
SELECT e.productID,
(min(x.a)/(min(x.cnt)*1.0)+ min(x.b)/(min(x.cnt)*1.0)+ min(x.c)/(min(x.cnt)*1.0)+ min(x.d)/(min(x.cnt)*1.0))/4 AS score,
'1' AS SortOrder
FROM products e
JOIN reviews r ON r.productID = e.productID
LEFT JOIN
(SELECT ee.productID,
sum(rr.a) AS a,
sum(rr.b) AS b,
sum(rr.c) AS c,
sum(rr.d) AS d,
count(*) AS cnt
FROM products ee
LEFT JOIN reviews rr ON ee.productID = rr.productID
GROUP BY ee.productID) x ON e.productID = x.productID
WHERE e.hidden = 0
AND e.datelive != -1
GROUP BY e.productID HAVING COUNT(*) >= 5
UNION
SELECT e.productID,
(min(x.a)/(min(x.cnt)*1.0)+ min(x.b)/(min(x.cnt)*1.0)+ min(x.c)/(min(x.cnt)*1.0)+ min(x.d)/(min(x.cnt)*1.0))/4 AS score,
'2' AS SortOrder
FROM products e
LEFT JOIN reviews r ON r.productID = e.productID
LEFT JOIN
(SELECT ee.productID,
sum(rr.a) AS a,
sum(rr.b) AS b,
sum(rr.c) AS c,
sum(rr.d) AS d,
count(*) AS cnt
FROM products ee
LEFT JOIN reviews rr ON ee.productID = rr.productID
GROUP BY ee.productID) x ON e.productID = x.productID
WHERE e.hidden = 0
AND e.datelive != -1
GROUP BY e.productID HAVING COUNT(*) < 5
ORDER BY SortOrder ASC,
score DESC
You could create a temp table that only contains rows where approved = 1, and then join on the temp table instead of reviews.
create table tt_reviews like reviews;
insert into tt_reviews
select * from reviews
where approved = 1;
alter table tt_reviews add index(productID);
Then replace reviews with tt_reviews in your above query.

Remove grouped data set when total of count is zero with subquery

I'm generating a data set that looks like this
category user total
1 jonesa 0
2 jonesa 0
3 jonesa 0
1 smithb 0
2 smithb 0
3 smithb 5
1 brownc 2
2 brownc 3
3 brownc 4
Where a particular user has 0 records in all categories is it possible to remove their rows form the set? If a user has some activity like smithb does, I'd like to keep all of their records. Even the zeroes rows. Not sure how to go about that, I thought a CASE statement may be of some help but I'm not sure, this is pretty complicated for me. Here is my query
SELECT DISTINCT c.category,
u.user_name,
CASE WHEN (
SELECT COUNT(e.entry_id)
FROM category c1
INNER JOIN entry e1
ON c1.category_id = e1.category_id
WHERE c1.category_id = c.category_id
AND e.user_name = u.user_name
AND e1.entered_date >= TO_DATE ('20140625','YYYYMMDD')
AND e1.entered_date <= TO_DATE ('20140731', 'YYYYMMDD')) > 0 -- I know this won't work
THEN 'Yes'
ELSE NULL
END AS TOTAL
FROM user u
INNER JOIN role r
ON u.id = r.user_id
AND r.id IN (1,2),
category c
LEFT JOIN entry e
ON c.category_id = e.category_id
WHERE c.category_id NOT IN (19,20)
I realise the case statement won't work, but it was an attempt on how this might be possible. I'm really not sure if it's possible or the best direction. Appreciate any guidance.
Try this:
delete from t1
where user in (
select user
from t1
group by user
having count(distinct category) = sum(case when total=0 then 1 else 0 end) )
The sub query can get all the users fit your removal requirement.
count(distinct category) get how many category a user have.
sum(case when total=0 then 1 else 0 end) get how many rows with activities a user have.
There are a number of ways to do this, but the less verbose the SQL is, the harder it may be for you to follow along with the logic. For that reason, I think that using multiple Common Table Expressions will avoid the need to use redundant joins, while being the most readable.
-- assuming user_name and category_name are unique on [user] and [category] respectively.
WITH valid_categories (category_id, category_name) AS
(
-- get set of valid categories
SELECT c.category_id, c.category AS category_name
FROM category c
WHERE c.category_id NOT IN (19,20)
),
valid_users ([user_name]) AS
(
-- get set of users who belong to valid roles
SELECT u.[user_name]
FROM [user] u
WHERE EXISTS (
SELECT *
FROM [role] r
WHERE u.id = r.[user_id] AND r.id IN (1,2)
)
),
valid_entries (entry_id, [user_name], category_id, entry_count) AS
(
-- provides a flag of 1 for easier aggregation
SELECT e.[entry_id], e.[user_name], e.category_id, CAST( 1 AS INT) AS entry_count
FROM [entry] e
WHERE e.entered_date BETWEEN TO_DATE('20140625','YYYYMMDD') AND TO_DATE('20140731', 'YYYYMMDD')
-- determines if entry is within date range
),
user_categories ([user_name], category_id, category_name) AS
( SELECT u.[user_name], c.category_id, c.category_name
FROM valid_users u
-- get the cartesian product of users and categories
CROSS JOIN valid_categories c
-- get only users with a valid entry
WHERE EXISTS (
SELECT *
FROM valid_entries e
WHERE e.[user_name] = u.[user_name]
)
)
/*
You can use these for testing.
SELECT COUNT(*) AS valid_categories_count
FROM valid_categories
SELECT COUNT(*) AS valid_users_count
FROM valid_users
SELECT COUNT(*) AS valid_entries_count
FROM valid_entries
SELECT COUNT(*) AS users_with_entries_count
FROM valid_users u
WHERE EXISTS (
SELECT *
FROM user_categories uc
WHERE uc.user_name = u.user_name
)
SELECT COUNT(*) AS users_without_entries_count
FROM valid_users u
WHERE NOT EXISTS (
SELECT *
FROM user_categories uc
WHERE uc.user_name = u.user_name
)
SELECT uc.[user_name], uc.[category_name], e.[entry_count]
FROM user_categories uc
INNER JOIN valid_entries e ON (uc.[user_name] = e.[user_name] AND uc.[category_id] = e.[category_id])
*/
-- Finally, the results:
SELECT uc.[user_name], uc.[category_name], SUM(NVL(e.[entry_count],0)) AS [entry_count]
FROM user_categories uc
LEFT OUTER JOIN valid_entries e ON (uc.[user_name] = e.[user_name] AND uc.[category_id] = e.[category_id])
Here's another method:
WITH totals AS (
SELECT
c.category,
u.user_name,
COUNT(e.entry_id) AS total,
SUM(COUNT(e.entry_id)) OVER (PARTITION BY u.user_name) AS user_total
FROM
user u
INNER JOIN
role r ON u.id = r.user_id
CROSS JOIN
category c
LEFT JOIN
entry e ON c.category_id = e.category_id
AND u.user_name = e.user_name
AND e1.entered_date >= TO_DATE ('20140625', 'YYYYMMDD')
AND e1.entered_date <= TO_DATE ('20140731', 'YYYYMMDD')
WHERE
r.id IN (1, 2)
AND c.category_id IN (19, 20)
GROUP BY
c.category,
u.user_name
)
SELECT
category,
user_name,
total
FROM
totals
WHERE
user_total > 0
;
The totals derived table calculates the totals per user and category as well as totals across all categories per user (using SUM() OVER ...). The main query returns only rows where the user total is greater than zero.

SQL Dynamic join?

Please see http://sqlfiddle.com/#!3/2506f/2/0
I have two tables. One is a general record, and the other is a table containing related documents that link to that record.
In my example I've created a straightforward query which shows all records and their associated documents. This is fine, but I want a more complex situation.
In the 'mainrecord' table there is a 'multiple' field. If this is 0, then I only want the most recent document from the documents table (that is, with the highest ID). If it is 1, I want to join all linked documents.
So, rather than the result of the query being this:-
ID NAME MULTIPLE DOCUMENTNAME IDLINK
1 One 1 first document 1
1 One 1 second document 1
2 Two 0 third document 2
2 Two 0 fourth document 2
3 Three 1 fifth document 3
3 Three 1 sixth document 3
It should look like this:-
ID NAME MULTIPLE DOCUMENTNAME IDLINK
1 One 1 first document 1
1 One 1 second document 1
2 Two 0 fourth document 2
3 Three 1 fifth document 3
3 Three 1 sixth document 3
Is there a way of including this condition into my query to get the results I'm after. I'm happy to explain further if needed.
Thanks in advance.
WITH myData
AS
(SELECT mainrecord.*, documentlinks.documentName, documentlinks.idlink,
Row_number()
OVER (
partition BY mainrecord.ID
ORDER BY mainrecord.ID ASC) AS ROWNUM
FROM mainrecord INNER JOIN documentlinks
ON mainrecord.id = documentlinks.idlink)
SELECT *
FROM mydata o
WHERE multiple = 0 AND rownum =
(SELECT max(rownum) FROM mydata i WHERE i.id = o.id)
UNION
SELECT *
FROM myData
WHERE multiple = 1
http://sqlfiddle.com/#!3/2506f/57
Another solution (tested at SQL-Fiddle):
SELECT m.*,
d.id as did, d.documentName, d.IDLink
FROM mainrecord AS m
JOIN documentlinks AS d
ON d.IDLink = m.id
AND m.multiple = 1
UNION ALL
SELECT m.*,
d.id as did, d.documentName, d.IDLink
FROM mainrecord AS m
JOIN
( SELECT d.IDLink
, MAX(d.id) AS did
FROM mainrecord AS m
JOIN documentlinks AS d
ON d.IDLink = m.id
AND m.multiple = 0
GROUP BY d.IDLink
) AS g
ON g.IDLink = m.id
JOIN documentlinks AS d
ON d.id = g.did
ORDER BY id, did ;
This will probably do:
SELECT mainrecord.name, documentlinks.documentname
FROM documentlinks
INNER JOIN mainrecord ON mainrecord.id = documentlinks.IDLink AND multiple = 1
UNION
SELECT mainrecord.name, documentlinks.documentname
FROM (SELECT max(id) id, IDLink FROM documentlinks group by IDLink) maxdocuments
INNER JOIN documentlinks ON documentlinks.id = maxdocuments.id
INNER JOIN mainrecord ON mainrecord.id = documentlinks.IDLink AND multiple = 0
How about this:
select * from mainrecord a inner join documentlinks b on a.Id=b.IDLink
where b.id=(case
when a.multiple=1 then b.id
else (select max(id) from documentlinks c where c.IDLink=b.IDLink) end)
SQL FIDDLE
select
m.ID, m.name, m.multiple, dl.idlink,
dl.documentName
from mainrecord as m
left outer join documentlinks as dl on dl.IDlink = m.id
where
m.multiple = 1 or
not exists (select * from documentlinks as t where t.idlink = m.id and t.id < dl.id)