Find number of occurrences of event per day - SQL

I'm trying to set up some monitoring in my SQL database.
select
gn.Goal_Name_
,gn.EventTimestamp as Timestamp
--,Max(EventTimestamp) as Timestamp
from(
select CASE when substr(GoalName,1,3)='MSD' then 'MSD' when substr(GoalName,1,5)='https' then 'https' else goalname END as Goal_Name_
,EventTimestamp
from CG.Goal as goal
)gn
group by 1,2
Produces a table with a structure like:
Goal_Name_ | Timestamp
MSD        | 05.03.2021 11:05:20.162
Logout     | 18.01.2022 20:07:29.799
Login      | 23.01.2022 09:12:16.597
etc        | etc
The problem I'm having is finding a way to count the rows for each distinct Goal Name on each day, i.e. the daily number of occurrences.

You are almost there. The only things missing are converting your timestamp to a date and counting the rows.
select
gn.Goal_Name_
,CAST(gn.EventTimestamp AS DATE FORMAT 'YYYY/MM/DD') as eventDay
,Count(*) as GoalsCount
from(
select CASE when substr(GoalName,1,3)='MSD' then 'MSD' when substr(GoalName,1,5)='https' then 'https' else goalname END as Goal_Name_
,EventTimestamp
from CG.Goal as goal
)gn
group by 1,2

Related

POSTGRES DATE_TRUNC should return 0 for intervals that have no data

I am trying to do time-series-like reporting using the Postgres DATE_TRUNC function. It works fine and I am getting the expected output, but when a specific interval has no records, that interval is skipped in the result. My expected output should include those intervals too, with 0 as the count. Below is the query that I have right now. What should I change to get the intervals that have no data? Thanks in advance.
SELECT date_trunc('days', sent_at), count('*')
FROM (select * from invoice
WHERE supplier = 'ABC' and sent_at BETWEEN '2021-12-01' AND '2022-07-31') as inv
GROUP BY date_trunc('days', sent_at)
ORDER BY date_trunc('days', sent_at);
Expected: As you can see below, the current output shows 02/12 and then 07/12. It has skipped the dates in between, but it should also show 03/12, 04/12, 05/12 and so on with a count of 0.
Current output

It doesn't seem like you have those dates in your data, in which case you need to generate them. Also, casting your timestamp to date instead of using date_trunc() gets rid of the zeroed-out time part in the output.
SELECT dates::date, count(*) filter (where sent_at is not null)
FROM (
select *
from invoice a
right join generate_series( '2021-12-01'::date,
'2021-12-31'::date,
'1 day'::interval ) as b(dates)
on sent_at::date=b.dates) as inv
GROUP BY 1
ORDER BY 1;
Here's a working example. Also, please try to improve your question according to @nbk's comment.
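The gap-filling idea can also be tried without any existing table. The following is a minimal PostgreSQL sketch, not the answerer's own example: the sample rows, the shortened date range, and the aliases d, dt, sent_day and invoice_count are assumptions made for illustration. It keeps the supplier filter from the question by placing it in the join condition, so that days without a matching invoice still survive the outer join.

```sql
-- Sample data standing in for the real invoice table (made-up rows).
WITH invoice(supplier, sent_at) AS (
    VALUES
        ('ABC', timestamp '2021-12-02 10:00:00'),
        ('ABC', timestamp '2021-12-02 15:30:00'),
        ('ABC', timestamp '2021-12-07 09:00:00')
)
SELECT d.dt::date       AS sent_day,
       count(i.sent_at) AS invoice_count   -- counts only matched rows, so empty days give 0
FROM generate_series(date '2021-12-01',
                     date '2021-12-07',
                     interval '1 day') AS d(dt)   -- one row per calendar day
LEFT JOIN invoice AS i
       ON i.sent_at::date = d.dt::date
      AND i.supplier = 'ABC'               -- filter in ON, not WHERE, to keep empty days
GROUP BY 1
ORDER BY 1;
```

With these sample rows the result has seven rows, one per day from 2021-12-01 to 2021-12-07, with a count of 2 on 2021-12-02, 1 on 2021-12-07, and 0 elsewhere. Moving the supplier condition into a WHERE clause would turn the outer join back into an inner join and drop the empty days again.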

SQL: Apply an aggregate result per day using window functions

Consider a time-series table that contains three fields: time of type timestamptz, balance of type numeric, and is_spent_column of type text.
The following query generates a valid result for the last day of the given interval.
SELECT
MAX(DATE_TRUNC('DAY', (time))) as last_day,
SUM(balance) FILTER ( WHERE is_spent_column is NULL ) AS value_at_last_day
FROM tbl
2010-07-12 18681.800775017498741407984000
However, I need an equivalent query based on window functions that reports the total value of the balance column for all the days up to and including the given date.
Here is what I've tried so far, but without any valid result:
SELECT
DATE_TRUNC('DAY', (time)) AS daily,
SUM(sum(balance) FILTER ( WHERE is_spent_column is NULL ) ) OVER ( ORDER BY DATE_TRUNC('DAY', (time)) ) AS total_value_per_day
FROM tbl
group by 1
order by 1 desc
2010-07-12 16050.496339044977568391974000
2010-07-11 13103.159119670350269890284000
2010-07-10 12594.525752964512456914454000
2010-07-09 12380.159588711091681327014000
2010-07-08 12178.119542536668113577014000
2010-07-07 11995.943973804127033140014000
EDIT:
Here is a sample dataset:
LINK REMOVED
The running total can be computed by applying the first query above on the entire dataset up to and including the desired day. For example, for day 2009-01-31, the result is 97.13522530000000000000, or for day 2009-01-15 when we filter time as time < '2009-01-16 00:00:00' it returns 24.446144000000000000.
What I need is an alternative query that computes the running total for each day in a single query.
EDIT 2:
Thank you all so very much for your participation and support.
The differences between the result sets of the queries were caused by the preceding ETL pipelines. Sorry for my ignorance!
Below I've provided a sample schema to test the queries.
https://www.db-fiddle.com/f/veUiRauLs23s3WUfXQu3WE/2
Now both queries given above and the query given in the answer below return the same result.

Consider calculating the running total via a window function after aggregating the data to day level. And since you aggregate with a single condition, the FILTER condition can be converted to a plain WHERE:
SELECT daily,
SUM(total_balance) OVER (ORDER BY daily) AS total_value_per_day
FROM (
SELECT
DATE_TRUNC('DAY', (time)) AS daily,
SUM(balance) AS total_balance
FROM tbl
WHERE is_spent_column IS NULL
GROUP BY 1
) AS daily_agg
ORDER BY daily

SQL Why am I getting the invalid identifier error?

I am trying to use columns that I created in this query to create another column.
Let me first show my messy query. The query looks like this:
SELECT tb.team, tb.player, tb.type, tb.date, ToChar(Current Date-1, 'DD-MON-YY') as yesterday,
CASE WHEN to_date(tb.date) = yesterday then 1 else 0 end dateindicator,
FROM (
COUNT DISTINCT(*)
FROM TABLE_A, dual
where dateindicator = 1
Group by tb.team
)
What I am trying to do here is:
creating a column with "Yesterday's date"
Using the "Yesterday" column to create another column called dateindicator indicating each row is yesterday's data or not.
then using that dateindicator, I want to count the distinct number of players for each team where the dateindicator column is 1.
But I am getting the "invalid identifier" error. I am new to Oracle SQL and trying to learn here.

You cannot reference a column alias in the same SELECT list that defines it.
See here: SQL: Alias Column Name for Use in CASE Statement
You need to repeat the full TO_CHAR(...) expression in the CASE WHEN.
Also:
Your WHERE condition (line 5) doesn't belong there. The order should be:
SELECT DISTINCT ... FROM ... WHERE ...; you have to specify the table first, then you can filter it with WHERE.
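One way to work around the alias restriction is to define the alias one level down, in a subquery or CTE, and reference it from the outer query. The following is a minimal Oracle sketch under assumptions: the sample rows, the column name game_date, and the CTE name flagged are made up for illustration, and table_a stands in for the real table.

```sql
-- Sample data standing in for the real table (made-up rows).
WITH table_a AS (
    SELECT 'Red' AS team, 'Ann' AS player, SYSDATE - 1 AS game_date FROM dual
    UNION ALL
    SELECT 'Red', 'Bob', SYSDATE - 1 FROM dual
    UNION ALL
    SELECT 'Blue', 'Cid', SYSDATE - 3 FROM dual
),
-- The alias dateindicator is defined here ...
flagged AS (
    SELECT team,
           player,
           CASE
               WHEN TRUNC(game_date) = TRUNC(SYSDATE) - 1 THEN 1
               ELSE 0
           END AS dateindicator
    FROM table_a
)
-- ... so the outer query can use it in WHERE without an invalid identifier error.
SELECT team,
       COUNT(DISTINCT player) AS players_yesterday
FROM flagged
WHERE dateindicator = 1
GROUP BY team;
```

With these sample rows, only team Red has rows dated yesterday, so the query returns a single row for Red with a count of 2. Comparing dates with TRUNC avoids the round trip through TO_CHAR and TO_DATE from the question.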

If I follow your explanation correctly: for each team, you want to count the number of players whose date column is yesterday.
If so, you can just filter and aggregate:
select team, count(*) as cnt
from mytable
where mydate >= trunc(sysdate) - 1 and mydate < trunc(sysdate)
group by team
This assumes that the dates are stored in column mydate, that is of date datatype.
I am unsure what you mean by counting distinct players; presumably, a given player appears just once per team, so I used count(*). If you really need to, you can change that to count(distinct player).
Finally: if you want to allow teams where no player matches, you can move the filtering logic within the aggregate function:
select team,
sum(case when mydate >= trunc(sysdate) - 1 and mydate < trunc(sysdate) then 1 else 0 end) as cnt
from mytable
group by team

View data by date after Format 'mmyy'

I'm trying to answer questions like, how many POs per month do we have? Or, how many lines are there in every PO by month, etc. The original PO dates are all formatted #1/1/2013#. So my first step was to Format each PO record date into 'mmyy' so I could group and COUNT them.
This worked well, but now I cannot view the data by date. For example, I cannot ask 'How many POs did we get after December?' I think this is because SQL does not recognize mm/yy as a comparable date.
Any ideas how I could restructure this?
There are 2 queries I wrote. This is the query to format the dates. This is also the query I was trying to add the date filter to (ex: >#3/14#)
SELECT qryALL_PO.POLN, Format([PO CREATE DATE],"mm/yy") AS [Date]
FROM qryALL_PO
GROUP BY qryALL_PO.POLN, Format([PO CREATE DATE],"mm/yy");
My group and counting query is:
SELECT qryALL_PO.POLN, Sum(qryALL_PO.[LINE QUANTITY]) AS SUM_QTY_PO
FROM qryALL_PO
GROUP BY qryALL_PO.POLN;

You can still count and group dates, as long as you have a way to determine the part of the date you are looking for.
In Access you can use year and month for example to get the year and month part of the date:
select year(mydate)
, month(mydate)
, count(*)
from tableX
group
by year(mydate)
, month(mydate)

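You can format it as "yyyy-mm" and then use ">" for the "after" condition. With the year first and a zero-padded month, the text values sort in chronological order, so a string comparison works.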
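As a rough Access SQL sketch of that suggestion, reusing qryALL_PO and [PO CREATE DATE] from the question: the aliases PO_MONTH and PO_ROWS and the cutoff "2013-12" are assumptions chosen for illustration. Access SQL has no comment syntax, so the notes are kept here rather than inside the query. The query groups rows by year and month, and keeps only the months after December 2013.

```sql
SELECT Format([PO CREATE DATE], "yyyy-mm") AS PO_MONTH,
       Count(*) AS PO_ROWS
FROM qryALL_PO
WHERE Format([PO CREATE DATE], "yyyy-mm") > "2013-12"
GROUP BY Format([PO CREATE DATE], "yyyy-mm")
ORDER BY Format([PO CREATE DATE], "yyyy-mm");
```

An alternative that leaves the grouping unchanged is to filter on the raw date column instead, for example WHERE [PO CREATE DATE] >= #1/1/2014#, since the original dates remain comparable even when the output is formatted.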

PostgreSQL "nested"? distincts and count

I need to get the count of the distinct names per hour in one query in PostgreSQL 9.1.
The relevant columns(generalized for question) in my table are:
occurred timestamp with time zone and
name character varying(250)
And the table name for the sake of the question is just table
The occurred timestamps will all be within a midnight to midnight(exclusive) range for one day. So far my query looks like:
'SELECT COUNT(DISTINCT ON (name)) FROM table'
It would be nice if I could get the output formatted as a list of 24 integers(one for each hour of the day), the names aren't required to be returned.

If I understand correctly what you want, you can write:
SELECT EXTRACT(HOUR FROM occurred),
COUNT(DISTINCT name)
FROM ...
WHERE ...
GROUP
BY EXTRACT(HOUR FROM occurred)
ORDER
BY EXTRACT(HOUR FROM occurred)
;

SELECT date_trunc('hour', occurred) AS hour_slice
,count(DISTINCT name) AS name_ct
FROM mytable
GROUP BY 1
ORDER BY 1;
DISTINCT ON is a different feature.
date_trunc() gives you a count for every distinct hour slice (date plus hour), while EXTRACT counts per hour of day, pooling the same hour across all days in the data. The totals of the two results need not match: the sum of several count(DISTINCT x) values over separate groups is equal to or greater than a single count(DISTINCT x) over those groups combined.
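The difference between the two groupings shows up as soon as the data spans more than one day. The following is a small PostgreSQL sketch with made-up sample rows; the CTE names events, by_slice and by_hour_of_day are assumptions, and plain timestamp values are used instead of timestamptz so the result does not depend on the session time zone.

```sql
-- Made-up rows: two names in the 09:00 hour on day one, one repeat on day two.
WITH events(occurred, name) AS (
    VALUES
        (timestamp '2013-06-01 09:10:00', 'alice'),
        (timestamp '2013-06-01 09:40:00', 'bob'),
        (timestamp '2013-06-02 09:15:00', 'alice')
),
-- One group per calendar hour: (06-01 09:00) and (06-02 09:00).
by_slice AS (
    SELECT date_trunc('hour', occurred) AS hour_slice,
           count(DISTINCT name)         AS name_ct
    FROM events
    GROUP BY 1
),
-- One group per hour of day: both days fall into hour 9.
by_hour_of_day AS (
    SELECT extract(hour FROM occurred) AS hour_of_day,
           count(DISTINCT name)        AS name_ct
    FROM events
    GROUP BY 1
)
SELECT (SELECT sum(name_ct) FROM by_slice)       AS total_over_slices,
       (SELECT sum(name_ct) FROM by_hour_of_day) AS total_over_hours_of_day;
```

With these rows the query returns 3 and 2: alice is counted once per calendar hour slice (twice in total), but only once when both days are pooled into hour 9.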

You want this by hour:
select extract(hour from occurred) as hr, count(distinct name)
from table t
group by extract(hour from occurred)
order by 1
This assumes there is data for only one day. Otherwise, hours from different days would be combined. To get around this, you would need to include date information as well.
