Counting by dates in a loop in SQL - sql

I've been working with about 20k records. I don't need all the information, just aggregate totals as snapshots at certain points in each record's history. Luckily, each event has a column that records the date of the event; a date is null when that event never happened to the record. A couple of the stages, however, can only be derived from other fields. For instance, a stage of "In Progress" is determined by a create date on or before the run date and either a null submit date or a submit date later than the run date. In pseudocode:
if createDate <= @runDate && (submitDate == null || submitDate > @runDate)
In_Progress_count = In_Progress_count + 1
All of the other fields are simply counted if the date in the field is less than or equal to the run date, for example:
if approvedDate <= @runDate
Approved_count = Approved_count + 1
For example I have data that looks something like this:
+-------------+--------------+--------------+--------------+--------------+----------------+------------------+------------------+------------------+------------------+------------------+------------------+------------------+------------------+------------------+------------------+------------------+------------------+------------------+
| Application | Applicant | Program | Create Date | Accept |Active Duplicate| Cond. Accept | Defer | Deposited | Divert | Duplicate | Early Quit | Incomplete | Ineligible | Pending | Review | Purge | Reject | Withdraw |
+-------------+--------------+--------------+--------------+--------------+----------------+------------------+------------------+------------------+------------------+------------------+------------------+------------------+------------------+------------------+------------------+------------------+------------------+------------------+
| 1 | Peg Bundy | Comp-Sci | 2013-08-01 | <null> | <null> | <null> | <null> | <null> | <null> | <null> | <null> | <null> | <null> | <null> | <null> | <null> | <null> | <null> |
| 2 | Marcy Darcy | Comp-Sci | 2013-08-25 | 2013-09-05 | <null> | <null> | <null> | 2013-09-30 | <null> | <null> | <null> | 2013-08-30 | <null> | <null> | <null> | <null> | 2013-10-01 | <null> |
| 3 | Al Bundy | Language | 2013-09-01 | 2013-09-05 | <null> | <null> | <null> | 2013-09-27 | <null> | <null> | <null> | 2013-09-05 | <null> | <null> | <null> | <null> | <null> | 2013-09-27 |
+-------------+--------------+--------------+--------------+--------------+----------------+------------------+------------------+------------------+------------------+------------------+------------------+------------------+------------------+------------------+------------------+------------------+------------------+------------------+
I'm trying to get a result that looks like this when the query is run with '2013-09-26' as the @runDate:
+---------------+--------------+--------------+----------------+------------------+------------------+------------------+------------------+------------------+------------------+------------------+------------------+------------------+------------------+------------------+------------------+------------------+
| Program Name | totalApps | countAccept |ActivDuplicates | countCondAccept | countDefer | countDeposited | countDivert | countDuplicate | countEarlyQuit | countIncomplete | countIneligible | countPending | countReview | countPurge | countReject | countWithdraw |
+---------------+--------------+--------------+----------------+------------------+------------------+------------------+------------------+------------------+------------------+------------------+------------------+------------------+------------------+------------------+------------------+------------------+
| Comp-Sci | 2 | 1 | 0 | 0 | <null> | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| Language | 1 | 1 | 0 | 0 | <null> | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
+---------------+--------------+--------------+----------------+------------------+------------------+------------------+------------------+------------------+------------------+------------------+------------------+------------------+------------------+------------------+------------------+------------------+
What I've tried so far is counting each of the columns, but I'm getting the wrong totals because I only know how to filter on one column's date. The query below counts everything that isn't null, even dates after the date I'm trying to report on:
SELECT Programs_Name,
Reported_Application_Stage,
count(Reported_Application_Stage) AS AppStageTotal,
count(SubmitDate) AS AppSubmitted,
count(Application_Accept_Date) AS AcceptDate,
count(Deposit_Paid_Date) AS Deposited,
count(Defer_Date) AS Deferred,
count(Deny_Date) AS Denied,
count(Divert_Date) AS Divert,
count(Early_Quit) AS EarlyQuit,
count(Ineligible_Date) AS Ineligible,
count(Purge_Date) AS Purged,
FROM ExtractApplications
WHERE (Report_Date1='2013-09-27')
GROUP BY ExtractSnapshots.Report_Date1, ExtractSnapshots.id, .Programs, .Reported_Application_Stage, _Program, _Start_Term_Year, _Start_Term, _Decision_Display_Value;
I can easily get correct values for any single stage by date using this, though:
SELECT Programs_Name,
count(Defer_Date) AS Deferred
FROM ExtractApplication
WHERE Defer_Date <='2013-09-26'
GROUP BY Programs_Name;
The problem being that I have about 100 dates that I have to use, and about 15 stages that I'm looking for, and I can't really sit and run 1500 queries one at a time for the next week or so without getting fired :P
So what I'm trying to do is find the right query to count each field. I honestly just don't know how to use the count() function with the kind of condition I need; I've tried count(someField<'2013-09-27') and it didn't work. I also don't know how to compute the "In Progress" value, which relies on a createDate combined with a null or later date in the submitDate field.
To top all of that off, I need to put it into a loop that runs this with the dates being the first, eighth, fifteenth, and twenty-second of each month over the last few years, and running a loop in SQL is something I don't know how to do. If it were Java, I would just nest for loops that run off of array sizes, like:
for (int i = 0; i < year.length; i++) {
    for (int j = 1; j < 13; j++) {
        for (int k = 0; k < setDays.length; k++) {
            // build one run date per year/month/day combination
            runDate = year[i] + "-" + j + "-" + setDays[k];
        }
    }
}
(I only include that because that's how I think of this happening contextually as I'm a PHP/Java programmer mainly and not a database admin)
I could really use some help here as I'm at a loss of what to do and I've spent a ton of time working on this already.

Assuming this is SQL Server, and not Access...
This should get you going in the right direction. This is effectively what @DaveJohnson suggested, with a twist in that it only counts each column if the date is on or before the @RunDate (and not null).
DECLARE @RunDate DATE
SET @RunDate = '2013-12-01'
DECLARE @DATA TABLE (AppID INT,Applicant VARCHAR(100),Program VARCHAR(100),CreateDate DATE,Accept DATE,ActiveDuplicate DATE,CondAccept DATE,Defer DATE,Deposited DATE,Divert DATE,Duplicate DATE,EarlyQuit DATE,Incomplete DATE,Ineligible DATE,Pending DATE,Review DATE,Purge DATE,Reject DATE,Withdraw DATE)
INSERT INTO @DATA
SELECT 1,'Peg Bundy','Comp-Sci','2013-08-01',NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL
UNION ALL
SELECT 2,'Marcy Darcy','Comp-Sci','2013-08-25','2013-09-05',NULL,NULL,NULL,'2013-09-30',NULL,NULL,NULL,'2013-08-30',NULL,NULL,NULL,NULL,'2013-10-01',NULL
UNION ALL
SELECT 3,'Al Bundy','Language','2013-09-01','2013-09-05',NULL,NULL,NULL,'2013-09-27',NULL,NULL,NULL,'2013-09-05',NULL,NULL,NULL,NULL,NULL,'2013-09-27'
SELECT Program
, SUM(CASE WHEN CreateDate IS NULL OR CreateDate>@RunDate THEN 0 ELSE 1 END) AS CreateDate
, SUM(CASE WHEN Accept IS NULL OR Accept>@RunDate THEN 0 ELSE 1 END) AS Accept
, SUM(CASE WHEN ActiveDuplicate IS NULL OR ActiveDuplicate>@RunDate THEN 0 ELSE 1 END) AS ActiveDuplicate
, SUM(CASE WHEN CondAccept IS NULL OR CondAccept>@RunDate THEN 0 ELSE 1 END) AS CondAccept
, SUM(CASE WHEN Defer IS NULL OR Defer>@RunDate THEN 0 ELSE 1 END) AS Defer
, SUM(CASE WHEN Deposited IS NULL OR Deposited>@RunDate THEN 0 ELSE 1 END) AS Deposited
, SUM(CASE WHEN Divert IS NULL OR Divert>@RunDate THEN 0 ELSE 1 END) AS Divert
, SUM(CASE WHEN Duplicate IS NULL OR Duplicate>@RunDate THEN 0 ELSE 1 END) AS Duplicate
, SUM(CASE WHEN EarlyQuit IS NULL OR EarlyQuit>@RunDate THEN 0 ELSE 1 END) AS EarlyQuit
, SUM(CASE WHEN Incomplete IS NULL OR Incomplete>@RunDate THEN 0 ELSE 1 END) AS Incomplete
, SUM(CASE WHEN Ineligible IS NULL OR Ineligible>@RunDate THEN 0 ELSE 1 END) AS Ineligible
, SUM(CASE WHEN Pending IS NULL OR Pending>@RunDate THEN 0 ELSE 1 END) AS Pending
, SUM(CASE WHEN Review IS NULL OR Review>@RunDate THEN 0 ELSE 1 END) AS Review
, SUM(CASE WHEN Purge IS NULL OR Purge>@RunDate THEN 0 ELSE 1 END) AS Purge
, SUM(CASE WHEN Reject IS NULL OR Reject>@RunDate THEN 0 ELSE 1 END) AS Reject
, SUM(CASE WHEN Withdraw IS NULL OR Withdraw>@RunDate THEN 0 ELSE 1 END) AS Withdraw
FROM @DATA
GROUP BY Program
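For readers who want to try the same conditional aggregation outside SQL Server, here is a minimal sketch in Python using the standard-library sqlite3 module. The table name apps, the lower-case column names, and the snapshot helper are assumptions for illustration; only three of the date columns from the sample data are included. Dates are stored as ISO strings, which compare correctly as text, and a NULL date fails the <= test and so falls through to ELSE 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical cut-down table: three of the date columns from the sample data.
conn.execute(
    "CREATE TABLE apps (program TEXT, create_date TEXT, accept TEXT, incomplete TEXT)"
)
conn.executemany(
    "INSERT INTO apps VALUES (?, ?, ?, ?)",
    [
        ("Comp-Sci", "2013-08-01", None, None),
        ("Comp-Sci", "2013-08-25", "2013-09-05", "2013-08-30"),
        ("Language", "2013-09-01", "2013-09-05", "2013-09-05"),
    ],
)


def snapshot(run_date):
    """Count, per program, the events dated on or before run_date."""
    sql = """
        SELECT program,
               SUM(CASE WHEN create_date <= :d THEN 1 ELSE 0 END) AS total_apps,
               SUM(CASE WHEN accept      <= :d THEN 1 ELSE 0 END) AS count_accept,
               SUM(CASE WHEN incomplete  <= :d THEN 1 ELSE 0 END) AS count_incomplete
        FROM apps
        GROUP BY program
        ORDER BY program
    """
    return conn.execute(sql, {"d": run_date}).fetchall()


for row in snapshot("2013-09-26"):
    print(row)
```

Passing the run date as a bound parameter means the same query text can be reused for every snapshot date.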

Try using a conditional CASE WHEN construct within your aggregation. Also, avoid looping in SQL for your dates as SQL Server is not optimized for this. You can build a date range and then join to that for an efficient set-based solution.
This is a SQL Server only answer (2008+ as written, since it casts to the DATE type).
ex:
WITH [cte] AS
(
SELECT
[date]
FROM ( -- build date range
SELECT TOP (DATEDIFF(DAY,0,GETDATE())) -- avoid overflow
DATEADD(DAY,-1 * ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),CAST(GETDATE() AS DATE)) [Date]
FROM sys.all_objects O1
CROSS JOIN sys.all_objects O2 -- if you need LOTS of days
) A
WHERE [date] BETWEEN '01 Jan 2010' AND GETDATE() -- set these accordingly
AND DAY([date]) IN (1,8,15,22)
)
SELECT
B.[date] [RunDate], -- one result row per run date and program
[Programs_Name],
SUM(CASE WHEN [SubmitDate] <= B.[date] THEN 1 ELSE 0 END) [AppSubmitted],
SUM(CASE WHEN [Application_Accept_Date] <= B.[date] THEN 1 ELSE 0 END) [AcceptDate],
...
FROM ExtractApplications A
CROSS JOIN [cte] B
GROUP BY B.[date], [Programs_Name]
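The same set-based idea can be sketched in Python with sqlite3: build the list of run dates once, load it into a table, and cross join it to the applications so every run date is counted in a single query. The run_dates helper, the table names, and the sample rows below are hypothetical and only illustrate the shape of the approach.

```python
import sqlite3
from datetime import date


def run_dates(first_year, last_year, days=(1, 8, 15, 22)):
    """Return ISO dates for the given days of every month in the year range."""
    return [
        date(y, m, d).isoformat()
        for y in range(first_year, last_year + 1)
        for m in range(1, 13)
        for d in days
    ]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE apps (program TEXT, submit_date TEXT)")
# Hypothetical rows, not taken from the question.
conn.executemany(
    "INSERT INTO apps VALUES (?, ?)",
    [("Comp-Sci", "2013-08-05"), ("Comp-Sci", "2013-08-20"), ("Language", None)],
)
conn.execute("CREATE TABLE run_dates (run_date TEXT)")
conn.executemany(
    "INSERT INTO run_dates VALUES (?)", [(d,) for d in run_dates(2013, 2013)]
)

# One pass over apps x run_dates replaces the nested loops.
rows = conn.execute(
    """
    SELECT r.run_date, a.program,
           SUM(CASE WHEN a.submit_date <= r.run_date THEN 1 ELSE 0 END) AS submitted
    FROM apps AS a
    CROSS JOIN run_dates AS r
    GROUP BY r.run_date, a.program
    ORDER BY r.run_date, a.program
    """
).fetchall()

counts = {(run_date, program): n for run_date, program, n in rows}
print(counts[("2013-08-08", "Comp-Sci")], counts[("2013-08-22", "Comp-Sci")])
```

Grouping by the run date as well as the program is what keeps the snapshots separate; without it, all dates collapse into one total per program.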

Ok sorry that I derped on the "sql-server" tags guys, and I appreciate the help, but I figured it out.
Instead of using SUM(CASE WHEN field=x THEN 1 ELSE 0 END), I found that the equivalent in Access is basically SUM(IIF(field=x, 1, 0)), thanks to LittleBobbyTables (fantastic username) over in this thread: "getting sum using sql with multiple conditions".
So what I was looking for in the combined field is SUM(IIF(createDate<=@myDate AND (submitDate>@myDate OR submitDate IS NULL),1,0)), and the rest of the columns work via SUM(IIF(column<=@myDate, 1, 0)). Note that the null test has to be IS NULL; a comparison written as submitDate=null never evaluates to true.
Thanks again guys!
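To see why the null test matters for the "In Progress" count, here is a small sketch in Python with sqlite3. The apps table, its rows, and the in_progress helper are hypothetical; the point is only that comparing with = NULL silently drops the never-submitted rows, while IS NULL counts them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE apps (create_date TEXT, submit_date TEXT)")
# Hypothetical rows covering each case of the "In Progress" rule.
conn.executemany(
    "INSERT INTO apps VALUES (?, ?)",
    [
        ("2013-08-01", None),          # created, never submitted
        ("2013-08-25", "2013-09-30"),  # submitted after the run date
        ("2013-09-01", "2013-09-05"),  # submitted before the run date
        ("2013-10-01", None),          # created after the run date
    ],
)

CORRECT = "submit_date IS NULL"
BUGGY = "submit_date = NULL"  # never true: any comparison with NULL yields NULL


def in_progress(run_date, null_check=CORRECT):
    """Count rows created by run_date and not yet submitted as of run_date."""
    sql = f"""
        SELECT SUM(CASE WHEN create_date <= :d
                         AND ({null_check} OR submit_date > :d)
                    THEN 1 ELSE 0 END)
        FROM apps
    """
    return conn.execute(sql, {"d": run_date}).fetchone()[0]


print(in_progress("2013-09-26"))         # counts the never-submitted row too
print(in_progress("2013-09-26", BUGGY))  # misses it
```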

Related

Optimise SQL Query with SUM and Case

I have the following query, which takes more than 1 minute to return data:
SELECT extract(HOUR
               FROM date) AS HOUR,
       SUM(CASE
               WHEN country_name = 'France' THEN atdelay
               ELSE 0
           END) AS France,
       SUM(CASE
               WHEN country_name = 'USA' THEN atdelay
               ELSE 0
           END) AS USA,
       SUM(CASE
               WHEN country_name = 'China' THEN atdelay
               ELSE 0
           END) AS China,
       SUM(CASE
               WHEN country_name = 'Brezil' THEN atdelay
               ELSE 0
           END) AS Brazil,
       SUM(CASE
               WHEN country_name = 'Argentine' THEN atdelay
               ELSE 0
           END) AS Argentine,
       SUM(CASE
               WHEN country_name = 'Equator' THEN atdelay
               ELSE 0
           END) AS Equator,
       SUM(CASE
               WHEN country_name = 'Maroc' THEN atdelay
               ELSE 0
           END) AS Maroc,
       SUM(CASE
               WHEN country_name = 'Egypt' THEN atdelay
               ELSE 0
           END) AS Egypt
FROM
  (SELECT *
   FROM Contry
   WHERE (TO_CHAR(entrydate, 'YYYY-MM-DD')::DATE) >= '2021-01-01'
     AND (TO_CHAR(entrydate, 'YYYY-MM-DD')::DATE) <= '2021-01-31'
     AND code IS NOT NULL) AS A
GROUP BY HOUR
ORDER BY HOUR ASC;
My table is structured like so:
+---------------------+---------------+------+-----+-------------------+-----------------------------+
| Field | Type | Null | Key | Default | Extra |
+---------------------+---------------+------+-----+-------------------+-----------------------------+
| id | bigint(20) | NO | PRI | NULL | auto_increment |
| country_name | varchar(30) | YES | MUL | NULL | |
| date | timestamp | NO | MUL | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
| entrydate | timestamp | NO | | NULL | |
| keyword_count | int(11) | YES | | NULL | |
| all_impressions | int(11) | YES | | NULL | |
| all_clicks | int(11) | YES | | NULL | |
| all_ctr | float | YES | | NULL | |
| all_positions | float | YES | | NULL | |
+---------------------+---------------+------+-----+-------------------+-----------------------------+
The current table size is closing in on 50 million rows.
How can I make this faster?
I'm hoping there is another query or table optimisation I can do - alternatively I could pre-aggregate the data but I'd rather avoid that.
(Your table definition doesn't look like you are really using Postgres, but as you tagged your question with Postgres I'll answer it nevertheless)
One obvious attempt would be to create an index on entrydate, then change your WHERE clause so it can make use of that index. When it comes to timestamp columns and a range condition, it's usually better to use the "next day" as the upper limit together with < instead of <=:
WHERE entrydate >= date '2021-01-01'
AND entrydate < date '2021-02-01'
AND code IS NOT NULL
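The boundary effect of the half-open range can be checked with a small sketch in Python and sqlite3. This is only an illustration under assumptions: the rows are hypothetical, and SQLite stores the timestamps as ISO text and compares them lexically, which is not how Postgres compares real timestamp values, although the result at the upper boundary is the same. A timestamp during the last day of the month is later than midnight of that day, so <= '2021-01-31' misses it while < '2021-02-01' keeps it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE country (entrydate TEXT)")
# Hypothetical timestamps around the edges of January 2021.
conn.executemany(
    "INSERT INTO country VALUES (?)",
    [
        ("2020-12-31 23:59:59",),
        ("2021-01-01 00:00:00",),
        ("2021-01-31 15:30:00",),
        ("2021-02-01 00:00:00",),
    ],
)


def count_where(condition):
    """Count rows matching a hard-coded WHERE condition (illustration only)."""
    return conn.execute(f"SELECT COUNT(*) FROM country WHERE {condition}").fetchone()[0]


inclusive = count_where("entrydate >= '2021-01-01' AND entrydate <= '2021-01-31'")
half_open = count_where("entrydate >= '2021-01-01' AND entrydate < '2021-02-01'")
print(inclusive, half_open)
```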
If the condition AND code IS NOT NULL removes many rows in addition to the date range, you can create a partial index:
create index on country (entrydate)
where code IS NOT NULL;
However, when a large part of the rows qualifies for code is not null the additional filter won't help very much.
Not performance related, but the conditional aggregation can be written in a bit more compact way using the filter clause:
sum(atdelay) filter (where country_name = 'France') as france

How to table date according to date

Given table like:
+---------+------+--------+-----------+--------------+
| Empcode | name | desig | joinmonth | releivemonth |
+---------+------+--------+-----------+--------------+
| 1. | A1. | D1. | Jan-18. | null |
| 2. | A2. | D2. | Jan-18. | May-18 |
| 3. | A3. | D3. | Jan-18. | null |
+---------+------+--------+-----------+--------------+
I want to show table like:
+---------------+--------+--------+--------+--------+--------+
| Remarks | jan-18 | feb-18 | mar-18 | apr-18 | may-18 |
+---------------+--------+--------+--------+--------+--------+
| Joinmonth | 3 | 0 | 0 | 0 | 0 |
| Releivedmonth | 0 | 0 | 0 | 0 | 1 |
+---------------+--------+--------+--------+--------+--------+
You need to unpivot and then re-pivot:
select remarks,
sum(case when mon = 'jan-18' then 1 else 0 end) as jan_18,
sum(case when mon = 'feb-18' then 1 else 0 end) as feb_18,
sum(case when mon = 'mar-18' then 1 else 0 end) as mar_18,
sum(case when mon = 'apr-18' then 1 else 0 end) as apr_18,
sum(case when mon = 'may-18' then 1 else 0 end) as may_18
from t cross apply
(values ('Joinmonth', t.joinmonth), ('Releivedmonth', t.releivemonth)
) v(remarks, mon)
group by remarks
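The unpivot-then-re-pivot idea can also be sketched without CROSS APPLY. The Python/sqlite3 example below is a minimal illustration that uses UNION ALL for the unpivot step; the table name t and the month strings without trailing dots are assumptions, and only two of the month columns are shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (empcode INTEGER, joinmonth TEXT, releivemonth TEXT)")
# Sample rows from the question, assuming the trailing dots are not part of the data.
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, "Jan-18", None), (2, "Jan-18", "May-18"), (3, "Jan-18", None)],
)

rows = conn.execute(
    """
    SELECT remarks,
           SUM(CASE WHEN mon = 'Jan-18' THEN 1 ELSE 0 END) AS jan_18,
           SUM(CASE WHEN mon = 'May-18' THEN 1 ELSE 0 END) AS may_18
    FROM (
        -- unpivot: one row per (label, month value)
        SELECT 'Joinmonth' AS remarks, joinmonth AS mon FROM t
        UNION ALL
        SELECT 'Releivedmonth' AS remarks, releivemonth AS mon FROM t
    ) AS v
    GROUP BY remarks   -- re-pivot: one row per label
    ORDER BY remarks
    """
).fetchall()

for row in rows:
    print(row)
```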
This is an extended comment rather than an answer; please accept that I needed formatting controls before down-voting this.
You appear to have added a query into a comment, although the syntax wasn't fully correct. You have often used standard parentheses () instead of brackets [] and there was a closing parenthesis missing to terminate the IN(). I believe your query should look like this:
SELECT
empname AS remarks
, [1-1-18]
, [1-2-18]
, [1-3-18]
, [1-4-18]
, [1-5-18]
FROM (
SELECT
empname
, joinmonth
, releivedmonth
FROM emply
) AS s
PIVOT (
COUNT(releivedmonth)
FOR joinmonth IN ([1-1-18], [1-2-18], [1-3-18], [1-4-18], [1-5-18])
) piv
You should not attempt to add queries to comments, instead just edit the question.
In this query you refer to values that look like 1-1-18 but in the sample of data there is nothing that looks like that at all. What data type is the column [joinmonth] and [releivedmonth]?
With text data in those columns you have a substantial problem. For example, Jan-18., Jan 18 and Jan-18 are all different values, so they would not align as you need them to. Variations in data like this will make the pivot impossible.
CREATE TABLE emply(
Empcode NUMERIC(9,0)
,empname VARCHAR(6)
,desig VARCHAR(8)
,joinmonth varchar(30)
,releivemonth varchar(30)
);
INSERT INTO emply(Empcode,empname,desig,joinmonth,releivemonth) VALUES (1.,'A1.','D1.','Jan-18.',NULL);
INSERT INTO emply(Empcode,empname,desig,joinmonth,releivemonth) VALUES (2.,'A2.','D2.','Jan-18.','May 18');
INSERT INTO emply(Empcode,empname,desig,joinmonth,releivemonth) VALUES (3.,'A3.','D3.','Jan-18.',NULL);
SELECT
empname AS remarks
, [Jan-18.]
, [Feb-18.]
, [Mar-18.]
, [Apr-18.]
, [May-18.]
FROM (
SELECT
empname
, joinmonth
, releivemonth
FROM emply
) AS s
PIVOT (
COUNT(releivemonth)
FOR joinmonth IN ([Jan-18.], [Feb-18.], [Mar-18.], [Apr-18.], [May-18.])
) piv
The output from this however is:
+----+---------+---------+---------+---------+---------+---------+
| | remarks | Jan-18. | Feb-18. | Mar-18. | Apr-18. | May-18. |
+----+---------+---------+---------+---------+---------+---------+
| 1 | A1. | 0 | 0 | 0 | 0 | 0 |
| 2 | A2. | 1 | 0 | 0 | 0 | 0 |
| 3 | A3. | 0 | 0 | 0 | 0 | 0 |
+----+---------+---------+---------+---------+---------+---------+
There is only one non-null value of releivemonth, so only one COUNT(releivemonth) is non-zero.

SQL- count the non NULL values and count the rows that has string "1"

I'm trying to count the non-null rows in one column, but it's counting all the rows. I also want to count the rows in a column that contain the value 1.
I was able to count the rows that have 1 for the first column, but for the second one it counts the 0 values too.
I've seen some articles here, but they didn't resolve the issue.
SELECT NAME as Agent_Name, COUNT(case when Thumbs_Up= 1 then 1 else null end) as Thumbs_Up,
COUNT(case when No_Solution_Found =1 then 1 else null end) as No_Solution,
COUNT(case when Save is null then 0 else 1 end) as Total_Saves,
FROM table
GROUP BY NAME
Table:
Name | Thumbs_up | No_Solution_Found | Save
Jonathan | 1 | 0 | Saved
Mike | 0 | 1 | Null
Peter | 1 | 0 | Null
Mike | 1 | 0 | Saved
Peter | 0 | 1 | Saved
Mike | 1 | 0 | Saved
Peter | 0 | 1 | Saved
Expected results:
Name | Thumbs_up | No_Solution | Total_Save
Jonathan | 1 | 0 | 1
Mike | 2 | 1 | 2
Peter | 1 | 2 | 2
Try with SUM instead of COUNT
SELECT NAME as Agent_Name,
SUM(case when Thumbs_Up = 1 then 1 else 0 end) as Thumbs_Up,
SUM(case when No_Solution_Found =1 then 1 else 0 end) as No_Solution,
SUM(case when Save is null then 0 else 1 end) as Total_Saves
FROM table
GROUP BY NAME
Since only the Save column has NULLs, I assume that's the column you have the problem with.
In your query you wrote:
COUNT(case when Save is null then 0 else 1 end) as Total_Saves,
That is, you're replacing NULL by 0, which is a non null value and therefore is counted.
You presumably wanted to just write:
COUNT(Save) as Total_Saves
(And BTW, there is a comma after as Total_Saves in your query, that doesn't belong there, as no other column expression follows.)
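The difference is easy to reproduce. The sketch below uses Python with sqlite3 and the rows from the question; the table name feedback and the column name save_status are renamed placeholders chosen for illustration. COUNT counts every non-null expression value, so a CASE that returns 0 for the null rows still gets counted, whereas COUNT(column) skips the nulls.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE feedback (name TEXT, thumbs_up INTEGER, "
    "no_solution_found INTEGER, save_status TEXT)"
)
conn.executemany(
    "INSERT INTO feedback VALUES (?, ?, ?, ?)",
    [
        ("Jonathan", 1, 0, "Saved"),
        ("Mike", 0, 1, None),
        ("Peter", 1, 0, None),
        ("Mike", 1, 0, "Saved"),
        ("Peter", 0, 1, "Saved"),
        ("Mike", 1, 0, "Saved"),
        ("Peter", 0, 1, "Saved"),
    ],
)

rows = conn.execute(
    """
    SELECT name,
           -- 0 is not NULL, so this counts every row in the group
           COUNT(CASE WHEN save_status IS NULL THEN 0 ELSE 1 END) AS wrong_total,
           -- COUNT(column) ignores NULLs
           COUNT(save_status) AS total_saves,
           SUM(CASE WHEN thumbs_up = 1 THEN 1 ELSE 0 END) AS thumbs_up
    FROM feedback
    GROUP BY name
    ORDER BY name
    """
).fetchall()

for row in rows:
    print(row)
```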
Try the following query:
Select
Name,
sum(Thumbs_up),
sum(No_Solution_Found),
count(case when [Save] is not null then 1 else null end) as Total_save
from TABLE
group by Name
SQL Server 2014

How to count and group by

As you can see in the image below, I need to count how many times the number 1 appears in each column. The number 1 means that the person interviewed feels secure at Home (AP4_4_01), Workplace (AP4_4_02), and so on.
Number 2 = Insecure
Number 3 = Doesn't Apply
Number 9 = Didn't Answer
+----------+----------------------+
| Columns | Numbers of persons |
+----------+----------------------+
| AP4_4_01 | 312 |
| AP4_4_02 | 232 |
| AP4_4_03 | 345 |
| AP4_4_0X | XXX |
+----------+----------------------+
You just need to use the SUM function on some CASE expressions:
SELECT
SUM(CASE WHEN AP4_4_01 = 1 THEN 1 ELSE 0 END)
,SUM(CASE WHEN AP4_4_02 = 1 THEN 1 ELSE 0 END)
...etc
FROM Table
To get a result set like the one in your question, you will need to use the UNPIVOT operator, or you can transpose it in Excel.

SQL: query one column in same table

A football manager here. How do I:
Select all matches that have kicked-off but never had a goal.
Select all matches that kicked-off more than 1h ago but haven't yet had a goal or a corner-kick.
| Match | Event | EventTime |
|-------------------------------------------|
| 1 | Kick-off | 2014-12-15T16:00:00 |
| 1 | Throw-in | 2014-12-15T16:15:00 |
| 1 | Goal | 2014-12-15T16:20:00 |
| 1 | Corner-kick | 2014-12-15T16:30:00 |
| 1 | End | 2014-12-15T17:30:00 |
| 2 | Kick-off | 2014-12-10T16:00:00 |
| 2 | Goal | 2014-12-10T16:01:00 |
| 3 | Kick-off | 2014-12-05T08:00:00 |
| 3 | Corner-kick | 2014-12-05T08:10:00 |
I feel this should be simple, but I'm stuck somehow.
1:
SELECT DISTINCT Match
FROM dbo.YourTable A
WHERE [Event] = 'Kick-off'
AND NOT EXISTS( SELECT 1 FROM dbo.YourTable
WHERE Match = A.Match
AND [Event] = 'Goal')
2:
SELECT DISTINCT Match
FROM dbo.YourTable A
WHERE [Event] = 'Kick-off'
AND EventTime <= DATEADD(HOUR,-1,GETDATE())
AND NOT EXISTS( SELECT 1 FROM dbo.YourTable
WHERE Match = A.Match
AND [Event] IN ('Goal','Corner-kick'))
You would do this with aggregation and a having clause. For the first:
select match
from table t
group by match
having sum(case when event = 'Kick-off' then 1 else 0 end) > 0 and
sum(case when event = 'Goal' then 1 else 0 end) = 0;
For the second:
select match
from table t
group by match
having max(case when event = 'Kick-off' then eventtime end) <= getdate() - 1.0/24 and
sum(case when event in ('Goal', 'Corner-kick') then 1 else 0 end) = 0;
Each condition in the having clause counts the number of rows that match the condition. > 0 means that at least one row matched. = 0 means no rows match.
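Both HAVING queries can be tried with a small sketch in Python and sqlite3. The events table and match_id column are renamed placeholders, the quiet_matches helper is hypothetical, and match 4 is an extra made-up row added so the second query has something to return; the current time is passed in explicitly instead of calling getdate().

```python
import sqlite3
from datetime import datetime, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (match_id INTEGER, event TEXT, event_time TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        (1, "Kick-off", "2014-12-15T16:00:00"),
        (1, "Throw-in", "2014-12-15T16:15:00"),
        (1, "Goal", "2014-12-15T16:20:00"),
        (1, "Corner-kick", "2014-12-15T16:30:00"),
        (1, "End", "2014-12-15T17:30:00"),
        (2, "Kick-off", "2014-12-10T16:00:00"),
        (2, "Goal", "2014-12-10T16:01:00"),
        (3, "Kick-off", "2014-12-05T08:00:00"),
        (3, "Corner-kick", "2014-12-05T08:10:00"),
        (4, "Kick-off", "2014-12-19T23:30:00"),  # hypothetical extra match
    ],
)

# 1: kicked off, never had a goal.
no_goal = [
    r[0]
    for r in conn.execute(
        """
        SELECT match_id FROM events
        GROUP BY match_id
        HAVING SUM(CASE WHEN event = 'Kick-off' THEN 1 ELSE 0 END) > 0
           AND SUM(CASE WHEN event = 'Goal' THEN 1 ELSE 0 END) = 0
        ORDER BY match_id
        """
    )
]


def quiet_matches(now):
    """2: kicked off more than an hour before `now`, no goal or corner-kick."""
    cutoff = (now - timedelta(hours=1)).isoformat()
    sql = """
        SELECT match_id FROM events
        GROUP BY match_id
        HAVING MAX(CASE WHEN event = 'Kick-off' THEN event_time END) <= :cutoff
           AND SUM(CASE WHEN event IN ('Goal', 'Corner-kick') THEN 1 ELSE 0 END) = 0
        ORDER BY match_id
    """
    return [r[0] for r in conn.execute(sql, {"cutoff": cutoff})]


print(no_goal)
print(quiet_matches(datetime(2014, 12, 20, 1, 0, 0)))
```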