PostgreSQL, NOT IN clause - sql

I want to calculate DAU and exclude users that we don't consider "real" (employees, beta testers, etc.).
It worked fine previously when I wrote the filtering in the query:
SELECT
count(distinct user_id) AS daily,
e.event_timestamp::DATE AS date
FROM
"public"."events" AS e
WHERE
user_id IN (SELECT
distinct id
from
"user"."user"
WHERE
username IS NOT NULL AND position IS NOT NULL )
GROUP BY date
When I change it to the query below, I expect more or less the same count (instead of defining the 4000 "real users", I define the 1000 "non-users" I want to exclude). However, this gives me much higher counts. It's as if the DISTINCT isn't working.
I added IS NOT NULL to the subquery, but it doesn't change the result. Does NOT IN with a subquery work differently from IN?
SELECT
count(distinct e.user_id) AS daily,
e.event_timestamp::DATE AS date
FROM
"public"."events" AS e
WHERE
e.user_id NOT IN (SELECT distinct id FROM "public"."non_users" WHERE id IS NOT NULL)
GROUP BY
date
ORDER BY
date

Yes. If any of the values in the subquery are NULL, then NOT IN returns no rows. For this reason, I strongly recommend that you always use NOT EXISTS with a subquery; it behaves as expected.
You seem to know this, because you are using a NULL comparison in the WHERE. So, the difference is probably due to the other condition. So, include it as well:
SELECT count(distinct e.user_id) AS daily,
e.event_timestamp::DATE AS date
FROM "public"."events" e
WHERE NOT EXISTS (SELECT 1
FROM "public"."non_users" nu
WHERE e.user_id = nu.id AND
nu.position IS NOT NULL
)
GROUP BY date
ORDER BY date;
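The NULL behaviour described in this answer is easy to reproduce. Below is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for PostgreSQL (the three-valued logic involved is standard SQL). The table contents are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (user_id INTEGER);
    CREATE TABLE non_users (id INTEGER);
    INSERT INTO events VALUES (1), (2), (3);
    INSERT INTO non_users VALUES (2), (NULL);  -- one NULL in the exclusion list
""")

# NOT IN: comparing user_id with NULL yields "unknown", so no row passes the filter.
not_in = conn.execute(
    "SELECT user_id FROM events "
    "WHERE user_id NOT IN (SELECT id FROM non_users) ORDER BY user_id"
).fetchall()

# NOT EXISTS: only asks whether a matching row exists, so the NULL is harmless.
not_exists = conn.execute(
    "SELECT user_id FROM events e "
    "WHERE NOT EXISTS (SELECT 1 FROM non_users nu WHERE nu.id = e.user_id) "
    "ORDER BY user_id"
).fetchall()

print(not_in)      # []
print(not_exists)  # [(1,), (3,)]
```

Filtering the NULLs out inside the subquery (as the question already does) makes NOT IN return the same rows as NOT EXISTS here.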

Related

Is there a way to use DISTINCT and COUNT(*) together to bulletproof your code against DUPLICATE entries?

I got help with a function yesterday to correctly get the count of multiple items in a column based on multiple criteria/columns. However, I would like to know whether there is a way to get the DISTINCT count of all the entries in the table based on an aggregated GROUP BY statement.
SELECT TIME = ap.day,
acms.tenantId,
acms.CallingService,
policyList = ltrim(sp.value),
policyInstanceList = ltrim(dp.value),
COUNT(*) AS DISTINCTCount
FROM dbo.acms_data acms
CROSS APPLY string_split(acms.policyList, ',') sp
CROSS APPLY string_split(acms.policyInstanceList, ',') dp
CROSS APPLY (select day = convert(date, acms.[Time])) ap
GROUP BY ap.day, acms.tenantId, sp.value, dp.value, acms.CallingService
I would just like to know whether there is a workaround for using DISTINCT and COUNT(*) together, and whether it would affect my results, so that this query is robust against duplicate entries.
The reason I have to use COUNT(*) is that I am aggregating on every column in the table, not just one or several specific columns.
We can use DISTINCT together with COUNT, as in this example:
USE AdventureWorks2012
GO
-- This query shows 290 JobTitle
SELECT COUNT(JobTitle) Total_JobTitle
FROM [HumanResources].[Employee]
GO
-- This query shows only 67 JobTitle
SELECT COUNT( DISTINCT JobTitle) Total_Distinct_JobTitle
FROM [HumanResources].[Employee]
GO
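COUNT(DISTINCT ...) takes a column or expression rather than *, so one common way to count distinct whole rows is to de-duplicate in a derived table first and then apply COUNT(*). The sketch below shows that idea with Python's sqlite3 module standing in for SQL Server. The table is a simplified, hypothetical version of acms_data, and the policy column is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE acms_data (tenantId TEXT, CallingService TEXT, policy TEXT);
    INSERT INTO acms_data VALUES
        ('t1', 'svcA', 'p1'),
        ('t1', 'svcA', 'p1'),   -- exact duplicate of the first row
        ('t1', 'svcA', 'p2'),
        ('t2', 'svcB', 'p1');
""")

# COUNT(*) counts every row, duplicates included.
raw = conn.execute(
    "SELECT tenantId, COUNT(*) FROM acms_data "
    "GROUP BY tenantId ORDER BY tenantId"
).fetchall()

# De-duplicate whole rows first, then count what is left.
deduped = conn.execute(
    "SELECT tenantId, COUNT(*) "
    "FROM (SELECT DISTINCT tenantId, CallingService, policy FROM acms_data) AS d "
    "GROUP BY tenantId ORDER BY tenantId"
).fetchall()

print(raw)      # [('t1', 3), ('t2', 1)]
print(deduped)  # [('t1', 2), ('t2', 1)]
```

Whether duplicates should be collapsed at all depends on whether two identical rows mean one event recorded twice or two genuine events.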

Oracle SQL - return multiple columns from subquery

Let's take a simple query in Oracle:
SELECT
CASE.ID,
CASE.TYPE,
CASE.DATE_RAISED
FROM
CASE
WHERE
CASE.DATE_RAISED > '2019-01-01'
Now let's say another table, EVENT, contains multiple events which may be associated with each case (linked via EVENT.CASE_ID), or may not exist at all. I want to report on the earliest-dated future event per case - or if nothing exists, return NULL. I can do this with a subquery in the SELECT clause, as follows:
SELECT
CASE.ID,
CASE.TYPE,
CASE.DATE_RAISED,
(
SELECT
MIN(EVENT.DATE)
FROM
EVENT
WHERE
EVENT.CASE_ID = CASE.ID
AND EVENT.DATE >= CURRENT_DATE
) AS MIN_EVENT_DATE
FROM
CASE
WHERE
CASE.DATE_RAISED > '2019-01-01'
This will return a table like this:
Case ID Case Type Date Raised Min Event Date
76 A 03/01/2019 10/05/2019
43 B 02/02/2019 [NULL]
89 A 29/01/2019 08/07/2019
90 A 04/03/2019 [NULL]
102 C 15/04/2019 20/05/2019
Note that if there do not exist any Events which match the criteria, the line is still returned but without a value. This is because the subquery is in the SELECT clause. This works just fine.
My problem, however, is if I want to return more than one column from the EVENT table - while still at the same time preserving the possibility that there are no matching rows from the EVENT table. The above code only returns EVENT.DATE as the single subquery result, to ONE column of the main query. But what if I also want to return EVENT.ID, or EVENT.TYPE? While still allowing for them to be NULL (if no matching records from CASE are found)?
I suppose I could use multiple subqueries in the SELECT clause: each returning just ONE column. But this seems horribly inefficient, given that each subquery would be based on the same criteria (the minimum-dated EVENT whose CASE ID matches that of the main query; or NULL if no such events found).
I suspect some nifty joins would be the answer - although I'm struggling to understand which ones exactly.
Please note that the above examples are vastly simplified versions of my actual code, which already contains multiple joins in the "old style" Oracle format, eg:
WHERE
CASE.ID(+) = EVENT.CASE_ID
There are reasons why this is so - therefore a request to anyone answering this, please would you demonstrate any solutions in this style of coding, as my SQL isn't far enough advanced to be able to re-factor the "newer" style joins into existing code.
You can use a join and window functions. For instance:
select c.*, e.*
from c left join
(select e.*,
row_number() over (partition by e.case_id order by e.date asc) as seqnum -- ascending, so seqnum = 1 is the earliest event
from events e
) e
on e.case_id = c.id and e.seqnum = 1
where c.date_raised > date '2019-01-01'; -- assuming the value is a date
Is this what you mean? I just rewrote Gordon's answer with old Oracle join syntax and your code style.
SELECT
CASE.ID,
CASE.TYPE,
CASE.DATE_RAISED,
MIN_E.DATE AS MIN_EVENT_DATE
FROM
CASE,
(SELECT EVENT.*,
ROW_NUMBER() OVER (PARTITION BY EVENT.CASE_ID ORDER BY EVENT.DATE ASC) AS SEQNUM
FROM
EVENT
WHERE
EVENT.DATE >= CURRENT_DATE
) MIN_E
WHERE
CASE.DATE_RAISED > DATE '2019-01-01'
AND MIN_E.CASE_ID (+) = CASE.ID
AND MIN_E.SEQNUM (+) = 1;
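To see how the row-number approach returns several event columns while keeping cases that have no future event, here is a small sketch using Python's sqlite3 module in place of Oracle (window functions need SQLite 3.25 or later). The table names cases and events, the event_date column, and the sample rows are all hypothetical. It uses ANSI LEFT JOIN syntax because SQLite has no (+) operator.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cases (id INTEGER, type TEXT);
    CREATE TABLE events (id INTEGER, case_id INTEGER, type TEXT, event_date TEXT);
    INSERT INTO cases VALUES (76, 'A'), (43, 'B');
    INSERT INTO events VALUES
        (1, 76, 'hearing', '2999-05-10'),
        (2, 76, 'review',  '2999-07-08'),
        (3, 76, 'intake',  '2000-01-01');  -- in the past, filtered out
""")

rows = conn.execute("""
    SELECT c.id, e.id, e.type, e.event_date
    FROM cases c
    LEFT JOIN (
        SELECT ev.*,
               ROW_NUMBER() OVER (PARTITION BY ev.case_id
                                  ORDER BY ev.event_date) AS seqnum
        FROM events ev
        WHERE ev.event_date >= date('now')   -- future events only
    ) e ON e.case_id = c.id AND e.seqnum = 1  -- earliest future event per case
    ORDER BY c.id
""").fetchall()

for row in rows:
    print(row)
```

Case 43 has no events, so it comes back with NULLs (None in Python) in every event column, while case 76 gets the id, type, and date of its earliest future event from a single pass over the events table.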
Create an object type with the columns you want and return it from the subquery. Your query would look like this:
SELECT
CASE.ID,
CASE.TYPE,
CASE.DATE_RAISED,
(
SELECT
t_your_new_type ( MIN(EVENT.DATE) , min ( EVENT.your_another_column ) )
FROM
EVENT
WHERE
EVENT.CASE_ID = CASE.ID
AND EVENT.DATE >= CURRENT_DATE
) AS MIN_EVENT_DATE
FROM
CASE
WHERE
CASE.DATE_RAISED > '2019-01-01'

sql oracle group by subquery

I get the same ecommerce number for each date. I am trying to get the ecommerce count per date, which should differ from date to date: the total is only 105 for all of October, not 391958.
Any idea how to group by the output of a subquery?
Thank you!
SELECT to_char(wcs1.start_tms,'DD/MM/YYYY') as dates,
(
SELECT count(*)
FROM ft_t_wcs1 wcs1,ft_t_stup stup
WHERE stup.modl_id='ECOMMERC'
AND stup.CROSS_REF_ID=wcs1.acct_id
AND stup.end_tms IS NULL
) AS ecommerce
FROM ft_t_wcs1 wcs1, ft_t_stup stup
WHERE wcs1.scenario='CREATE'
AND wcs1.acct_id IS NOT NULL
AND wcs1.start_tms BETWEEN add_months(TRUNC(SYSDATE,'mm'),-1) AND LAST_DAY(add_months(TRUNC(SYSDATE,'mm'),-1))
GROUP BY to_char(wcs1.start_tms,'DD/MM/YYYY')
ORDER BY to_char(wcs1.start_tms,'DD/MM/YYYY');
Try the modified queries below.
select to_char(wcs1.start_tms,'DD/MM/YYYY') as dates, count(*) AS ecommerce
from ft_t_wcs1 wcs1, ft_t_stup stup
where stup.modl_id='ECOMMERC' and stup.CROSS_REF_ID=wcs1.acct_id and stup.end_tms is null
and wcs1.scenario='CREATE' and wcs1.acct_id is not null
and wcs1.start_tms between add_months(TRUNC(SYSDATE,'mm'),-1) and LAST_DAY(add_months(TRUNC(SYSDATE,'mm'),-1))
group by to_char(wcs1.start_tms,'DD/MM/YYYY')
order by to_char(wcs1.start_tms,'DD/MM/YYYY');
-- Another way using JOIN clause
select to_char(wcs1.start_tms,'DD/MM/YYYY') as dates, count(*) AS ecommerce
from ft_t_wcs1 wcs1
join ft_t_stup stup
ON stup.CROSS_REF_ID=wcs1.acct_id
where stup.modl_id='ECOMMERC' and stup.end_tms is null
and wcs1.scenario='CREATE' and wcs1.acct_id is not null
and wcs1.start_tms between add_months(TRUNC(SYSDATE,'mm'),-1) and LAST_DAY(add_months(TRUNC(SYSDATE,'mm'),-1))
group by to_char(wcs1.start_tms,'DD/MM/YYYY')
order by to_char(wcs1.start_tms,'DD/MM/YYYY');
It's hard to suggest an answer without understanding your table relationship, but I can tell that your problem is there is no relationship between your subquery and your main query. Your subquery simply returns a count where modl_id='ECOMMERC', so that value will always be the same - in your case, 105. You need to add a JOIN criteria to the subquery that ties the unique value to your main query. You'll also want to alias the table names differently to ensure you're joining correctly.
You are doing unnecessary joins when you just want a correlated subquery:
SELECT to_char(wcs1.start_tms,'DD/MM/YYYY') as dates,
(SELECT count(*)
FROM ft_t_stup stup
WHERE stup.modl_id= 'ECOMMERC' AND
stup.CROSS_REF_ID = wcs1.acct_id AND
stup.end_tms IS NULL
) AS ecommerce
FROM ft_t_wcs1 wcs1
WHERE wcs1.scenario = 'CREATE' AND
wcs1.acct_id IS NOT NULL AND
wcs1.start_tms BETWEEN add_months(TRUNC(SYSDATE,'mm'),-1) AND LAST_DAY(add_months(TRUNC(SYSDATE,'mm'),-1))
GROUP BY to_char(wcs1.start_tms, 'DD/MM/YYYY')
ORDER BY to_char(wcs1.start_tms, 'DD/MM/YYYY');
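The symptom in the question (the same number on every date) can be reproduced on a toy dataset. The sketch below uses Python's sqlite3 module rather than Oracle, with simplified, hypothetical tables: wcs1 has a plain start_day text column instead of start_tms, and stup keeps only the columns needed for the join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE wcs1 (acct_id INTEGER, start_day TEXT);
    CREATE TABLE stup (cross_ref_id INTEGER, modl_id TEXT);
    INSERT INTO wcs1 VALUES (1, '2019-10-01'), (2, '2019-10-01'), (3, '2019-10-02');
    INSERT INTO stup VALUES (1, 'ECOMMERC'), (3, 'ECOMMERC'), (2, 'OTHER');
""")

# The inner query never refers to the outer row, so it is evaluated
# independently and every date shows the same grand total.
uncorrelated = conn.execute("""
    SELECT w.start_day,
           (SELECT COUNT(*)
            FROM wcs1 w2 JOIN stup s ON s.cross_ref_id = w2.acct_id
            WHERE s.modl_id = 'ECOMMERC') AS ecommerce
    FROM wcs1 w
    GROUP BY w.start_day
    ORDER BY w.start_day
""").fetchall()

# Joining in the main query ties each matching row to its own date,
# so the count is computed per group.
per_day = conn.execute("""
    SELECT w.start_day, COUNT(*) AS ecommerce
    FROM wcs1 w JOIN stup s ON s.cross_ref_id = w.acct_id
    WHERE s.modl_id = 'ECOMMERC'
    GROUP BY w.start_day
    ORDER BY w.start_day
""").fetchall()

print(uncorrelated)  # [('2019-10-01', 2), ('2019-10-02', 2)]
print(per_day)       # [('2019-10-01', 1), ('2019-10-02', 1)]
```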

Group by Week beginning Sunday

I have a table with eventdatetime, userid, etc. Data is inserted into the table daily.
For the report, I need to give the count of userid and projectid grouped by week (Tue-Mon), for a month range at a time.
I need help grouping the data by week within the month. I'm using Oracle.
select count(distinct( table1.projectid))as Projects, count(distinct( table2.userid)) as Users,??
from table1
join table2
on table1.a= table2.a
where table1.e='1'
and table1.eventdatetime between sysdate-30 and sysdate-1
group by ??
I want the output to be grouped by week like :
WeekBegin
2013-04-14
2013-04-21
http://www.techonthenet.com/oracle/functions/to_char.php Use the TO_CHAR function with IW to get the week number. Then you can GROUP BY that IW value.
Note that IW is the ISO week, which always starts on Monday. The D format element and TRUNC(date, 'DAY'), on the other hand, depend on the session's NLS_TERRITORY setting: some territories start the week on Sunday and some on Monday. You'll have to look at your settings to see. If the week already starts on Sunday there, then you're in luck!
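The general idea, grouping by a computed "week begin" date, can be sketched independently of Oracle. The example below uses Python's sqlite3 module, so the date modifiers ('-6 days', 'weekday 0') are SQLite-specific and would need an Oracle equivalent. The events table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (eventdatetime TEXT, userid INTEGER);
    INSERT INTO events VALUES
        ('2013-04-14', 1),  -- Sunday
        ('2013-04-17', 2),  -- Wednesday
        ('2013-04-20', 1),  -- Saturday
        ('2013-04-21', 3);  -- the following Sunday
""")

# Step back six days, then move forward to the next Sunday (weekday 0).
# A Sunday maps to itself; any other day maps to the Sunday before it.
rows = conn.execute("""
    SELECT date(eventdatetime, '-6 days', 'weekday 0') AS week_begin,
           COUNT(DISTINCT userid) AS users
    FROM events
    GROUP BY week_begin
    ORDER BY week_begin
""").fetchall()

print(rows)  # [('2013-04-14', 2), ('2013-04-21', 1)]
```

Because the grouping key is an actual date rather than a week number, it can be shown directly as the WeekBegin column and does not collide across years.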
If the example you have posted is your work-in-progress version, then before worrying about getting the days of the week in, you should look into getting the basics of the query right.
You are selecting e.projectid and u.userid, but you haven't got any tables named e or u in your query. It looks like you want to alias them as e and u?
The where clause of your query is also looking for the table e, which isn't present.
In that case you should change
from table1
join table2
on table1.a= table2.a
to
from table1 e -- select from table1 using alias e
join table2 u -- join table2 using alias u
on ( e.a = u.a ) -- joining on column a from table1 (e) = a from table2 (u)
Once you have replaced the a's in the on section with the column names you want to join on, it might well run after you remove the last column ", ??" from the select. Perhaps something along these lines:
select
count (e.projectid) PROJECTS,
count (u.userid) USERS
from table1 e
join table2 u
on ( e.a = u.a )
where e.FILTERING_COLUMN = '1'
and e.eventdatetime >= sysdate-30
Note that as sysdate is the current time on the server (depending on localisation and session settings), you can use >= sysdate-30 instead of between, which may well give the query optimiser an easier time if the table is suitably indexed.
The basic rule for grouping is that to select a column you need to either group by it or wrap it in an aggregate function such as COUNT().
So you'll probably want something like:
select
count (e.projectid) PROJECTS,
count (u.userid) USERS,
to_char(e.eventdatetime,'MM') MONTH
from table1 e
join table2 u
on ( e.a = u.a )
where e.FILTERING_COLUMN = '1'
and e.eventdatetime >= sysdate-30
group by to_char(e.eventdatetime,'MM')
This won't be the most optimal way to do it, though. It would be easier if you posted the schemas involved in the issue.

How do you select data from PostgreSQL database, but if no data is present for a given day, then return 0?

I have the following query:
SELECT created_at::DATE, count (*)
FROM messages
WHERE city = 'los angeles'
GROUP BY created_at::DATE
Which works great. The challenge is that if there are no messages for a given date, then it returns no record for that date. How do you make the above query return the date and 0 if there are no messages on that date, for all days between a given date and today?
Working in PostgreSQL 8.3.
Thanks!
It sounds like you need a table of all the dates you are interested in, as it may contain dates not in your messages table. If you have, or build, this table, then left join it with the messages table and do the count on a column of that table. It will return 0 where nothing matches the join.
select d.created_at, count(m.messageId)
from possibleDates d
left join messages m
on d.created_at = m.created_at::date -- compare dates, since messages.created_at carries a time
group by d.created_at
The typical way is to have a separate calendar table with all of the dates in it, left joined to your table on the date column, and then a COALESCE(x, 0) expression (PostgreSQL's equivalent of ifnull) or a CASE expression that returns 0 when the left join on the date returns null and 1 when it does not. Then you can do your normal GROUP BY and use SUM(x) instead of COUNT().
Very often, when you want to fill in zeroes for missing entries in a series, the answer in PostgreSQL involves the generate_series function. (Search Stackoverflow for lots of similar questions and answers.) In your case, use something like this:
SELECT ts::date AS date, coalesce(count, 0) AS count
FROM
(SELECT created_at::date, count(*)
FROM messages
WHERE city = 'los angeles'
GROUP BY created_at::date) AS m
RIGHT JOIN
(SELECT *
FROM generate_series(timestamp '2011-07-01',
timestamp 'today',
interval '1 day')) AS series(ts)
ON m.created_at = series.ts
ORDER BY 1;
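The same zero-filling pattern (generate every date, then outer join the real data onto it) can be tried without a PostgreSQL server. The sketch below uses Python's sqlite3 module, where a recursive CTE plays the role of generate_series. The date range and sample messages are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE messages (created_at TEXT, city TEXT);
    INSERT INTO messages VALUES
        ('2011-07-01 09:00:00', 'los angeles'),
        ('2011-07-01 17:30:00', 'los angeles'),
        ('2011-07-03 12:00:00', 'los angeles'),
        ('2011-07-02 08:00:00', 'new york');
""")

rows = conn.execute("""
    WITH RECURSIVE days(day) AS (          -- stand-in for generate_series
        SELECT '2011-07-01'
        UNION ALL
        SELECT date(day, '+1 day') FROM days WHERE day < '2011-07-04'
    )
    SELECT d.day, COUNT(m.created_at) AS n  -- counts only matched rows
    FROM days d
    LEFT JOIN messages m
           ON date(m.created_at) = d.day
          AND m.city = 'los angeles'        -- filter in ON, not WHERE
    GROUP BY d.day
    ORDER BY d.day
""").fetchall()

print(rows)
# [('2011-07-01', 2), ('2011-07-02', 0), ('2011-07-03', 1), ('2011-07-04', 0)]
```

The city filter sits in the ON clause on purpose: moving it to WHERE would discard the unmatched days again and the zero rows would disappear.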