How can I optimize the below query? - sql

I have a table like this.
_id (integer)
event_name(varchar(20))
event_date(timestamp)
Here is some sample data:
ID event_date event_name
101 2013-04-24 18:33:37.694818 event_A
102 2013-04-24 20:34:37.000000 event_B
103 2013-04-24 20:40:37.000000 event_A
104 2013-04-25 01:00:00.694818 event_B
105 2013-04-25 12:00:15.694818 event_A
I need the data from the above table in the format below.
Date count_eventA count_eventB
2013-04-24 2 1
2013-04-25 1 1
So basically I need the count of each event on each date.
I have tried the below query to get the desired result.
SELECT A.date1 AS Date,
       A.count1 AS count_eventA,
       B.count2 AS count_eventB
FROM
  (SELECT count(event_name) AS count1,
          event_date::date AS date1
   FROM tblname
   WHERE event_name = 'event_A'
   GROUP BY event_date::date) AS A
LEFT JOIN
  (SELECT count(event_name) AS count2,
          event_date::date AS date1
   FROM tblname
   WHERE event_name = 'event_B'
   GROUP BY event_date::date) AS B ON A.date1 = B.date1
Can someone please suggest a better, more optimized query, or confirm whether I am following a good approach?

Something along these lines should work:
select event_date::date AS Date,
       sum(case when event_name = 'event_A' then 1 else 0 end) AS count_eventA,
       sum(case when event_name = 'event_B' then 1 else 0 end) AS count_eventB
from tblname
GROUP BY event_date::date
If you have more events you only need to add more sum(case) lines :)
The DB engine only runs through the table once to give you the totals, independently of the number of events you want to count; with a high row count you will notice a significant delay with the original query. Should I add this to my answer, do you think?

Simpler (and cleaner) than the case syntax - count() skips nulls, and a false comparison or-ed with null collapses to null:
select
event_date::date as Date,
count(event_name = 'event_A' or null) count_eventA,
count(event_name = 'event_B' or null) count_eventB
from t
group by 1
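On PostgreSQL 9.4 or later, the FILTER clause expresses the same idea even more directly. A minimal sketch, assuming the same tblname table from the question:
select event_date::date as Date,
       count(*) filter (where event_name = 'event_A') as count_eventA,
       count(*) filter (where event_name = 'event_B') as count_eventB
from tblname
group by event_date::date
Like the sum(case ...) version, this scans the table only once.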

You are looking for PIVOT and UNPIVOT in SQL. Check the example below; it is very handy:
http://blog.sqlauthority.com/2008/06/07/sql-server-pivot-and-unpivot-table-examples/

Related

Choosing MAX value by id in a view?

I have created a simple view based on a few columns in our database:
ALTER VIEW [BI].[v_RCVLI_Test] AS
Select distinct
Borger.CPRnrKort as CPR,
(...)
IndsatsDetaljer.VisitationId as VisitationsId,
Indsats.KatalogNavn as IndsatsNavn,
(case
when
(
Indsats.Model = 'SMDB2 Tilbudsmodel' or
Indsats.Model = 'SMDB2 Samtalemodel' or
Indsats.Model = 'Tilbudsmodel' or
Indsats.Model = 'NAB Tilbudsmodel'
)
then IndsatsDetaljer.ServicePeriodeStart
else IndsatsDetaljer.Ikrafttraedelsesdato
end
) as StartDato,
(case
when
(
Indsats.Model = 'SMDB2 Tilbudsmodel' or
Indsats.Model = 'SMDB2 Samtalemodel' or
Indsats.Model = 'Tilbudsmodel'
)
then (case when IndsatsDetaljer.VisitationSlut = '9999-12-31' then convert(varchar(10), getdate(), 23) else IndsatsDetaljer.VisitationSlut end)
when
Indsats.Model = 'NAB Tilbudsmodel'
then (case when IndsatsDetaljer.NABehandlingSlutDato = '9999-12-31' then convert(varchar(10), getdate(), 23) else IndsatsDetaljer.NABehandlingSlutDato end)
else (case when IndsatsDetaljer.VisitationSlut = '9999-12-31' then convert(varchar(10), getdate(), 23) else IndsatsDetaljer.VisitationSlut end)
end
) as StopDato,
Refusion.Handlekommune as Handlekommune,
replace(Refusion.Betalingskommune, 'Ukendt', 'Kendt') Betalingskommune
from nexus2.Fact_VisiteretTid as Fact
join nexus2.Dim_Paragraf Paragraf
on Fact.DW_SK_Paragraf = Paragraf.DW_SK_Paragraf
join nexus2.Dim_Indsats Indsats
on Fact.DW_SK_Indsats = Indsats.DW_SK_Indsats (...)
The cases for StartDato and StopDato are there because those dates come from different columns. I've converted the date '9999-12-31' to the current date because we'll be doing some time calculations later on, and it's just more convenient.
CPR is the id of a person, VisitationsId is the id for the service the person received.
In theory, there should only be one StartDato and one StopDato per VisitationsId, but because of a glitch in the documentation system, we sometimes get TWO StopDato values: one is correct, and one is '9999-12-31' (now converted to the current date).
So I need to group by VisitationsId and then just take the MIN value of StopDato, but I'm kind of unsure how to go about doing that?
CPR  VisitationsId  StartDato   StopDato    Something Else
123  56             2019-01-01  2019-12-12  Something
123  56             2019-01-01  9999-12-31  Something
123  58             2019-01-01  2019-12-14  Something
345  59             2018-11-01  9999-12-31  Something
345  55             2017-01-02  2017-11-12  Something
345  55             2017-01-02  9999-12-31  Something
In the above table I need to remove rows 2 and 6, because the VisitationsId is identical to the previous row, but they diverge on StopDato.
Using a group by anywhere in the query gives me an error on another (seemingly random) column telling me that the column is:
invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Any suggestions on how I can go about doing this?
Add a filter which tests for this condition?
with cte as (
    {your current query}
)
select *
from cte T
where not (
    T.StopDato = '9999-12-31'
    and exists (
        select 1
        from cte T1
        where T1.VisitationsId = T.VisitationsId
          and T1.StopDato != '9999-12-31'
    )
);
Also, it looks like you are converting StopDato to a varchar, which is bad - you should treat dates as dates until you need to display them.
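Alternatively, if what you ultimately want is exactly one row per VisitationsId, a ROW_NUMBER() sketch along these lines avoids the self-join (this assumes the earliest StopDato is always the correct one to keep):
with cte as (
    {your current query}
)
select CPR, VisitationsId, StartDato, StopDato -- plus whatever other columns you need
from (
    select c.*,
           row_number() over (partition by VisitationsId order by StopDato) as rn
    from cte c
) ranked
where rn = 1;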

Datediff on 2 rows of a table with a condition

My data looks like the following
TicketID OwnedbyTeamT Createddate ClosedDate
1234 A
1234 A 01/01/2019 01/05/2019
1234 A 10/05/2018 10/07/2018
1234 B 10/04/2019 10/08/2018
1234 finance 11/01/2018 11/11/2018
1234 B 12/02/2018
Now, I want to calculate the datediff between the closed dates for teams A and B, if the max closed date for team A is greater than the max closed date for team B. If it is smaller or null, I don't want to see them. So, for example, I want to see only one record like this:
TicketID (Datediff)result-days
1234 86
and for other tickets, display the info. For example, if the conditions aren't met:
TicketID (Datediff)result-days
2456 -1111111
Data sample for 2456:
TicketID OwnedbyTeamT Createddate ClosedDate
2456 A
2456 A 10/01/2019 10/05/2019
2456 B 08/05/2018 08/07/2018
2456 B 06/04/2019 06/08/2018
2456 finance 11/01/2018 11/11/2018
2456 B 12/02/2018
I want to see the difference in days between 01/05/2019 for team A, and
10/08/2018 for team B.
Here is the query that I wrote; however, all I see is -1111111. Any help please?
SELECT A.incidentid,
( CASE
WHEN Max(B.[build validation]) <> 'No data'
AND Max(A.crfs) <> 'No data'
AND Max(B.[build validation]) < Max(A.crfs) THEN
Datediff(day, Max(B.[build validation]), Max(A.crfs))
ELSE -1111111
END ) AS 'Days-CRF-diff'
FROM (SELECT DISTINCT incidentid,
Iif(( ownedbyteam = 'B'
AND titlet LIKE '%Build validation%' ), Cast(
closeddatetimet AS NVARCHAR(255)), 'No data') AS
'Build Validation'
FROM incidentticketspecifics) B
INNER JOIN (SELECT incidentid,
Iif(( ownedbyteamt = 'B'
OR ownedbyteamt =
'Finance' ),
Cast(
closeddatetimet AS NVARCHAR(255)), 'No data') AS
'CRFS'
FROM incidentticketspecifics
GROUP BY incidentid,
ownedbyteamt,
closeddatetimet) CRF
ON A.incidentid = B.incidentid
GROUP BY A.incidentid
I hope the following answer will be of help.
Two subqueries, one per team (A and B), fetch the max date for every ticket. A left join between these two tables brings this information into the same row so that DATEDIFF can be applied. The final WHERE clause keeps only the rows where team A's max date is greater than team B's.
Please replace [YourDB] and [MytableName] in the following code with your own names.
--Select the items to be viewed in the final view along with the difference in days
SELECT A.[TicketID],A.[OwnedbyTeamT], A.[Max_DateA],B.[OwnedbyTeamT], B.[Max_DateB], DATEDIFF(dd,B.[Max_DateB],A.[Max_DateA]) AS My_Diff
FROM
(
--The following subquery creates table A with the max date for every ticket for team A
SELECT [TicketID]
,[OwnedbyTeamT]
,MAX([ClosedDate]) AS Max_DateA
FROM [YourDB].[dbo].[MytableName]
GROUP BY [TicketID],[OwnedbyTeamT]
HAVING [OwnedbyTeamT]='A')A
--A left join between subqueries A and B to bring the max dates for every ticket into one row
LEFT JOIN (
--The max date for every ticket for team B
SELECT [TicketID]
,[OwnedbyTeamT]
,MAX([ClosedDate]) AS Max_DateB
FROM [YourDB].[dbo].[MytableName]
GROUP BY [TicketID],[OwnedbyTeamT]
HAVING [OwnedbyTeamT]='B')B
ON A.[TicketID]=B.[TicketID]
--Keep only rows where team A's max date is later than team B's
WHERE A.Max_DateA>B.Max_DateB
You might be able to do this with a PIVOT. Here is a working example:
SELECT [TicketID], [A], [B], DATEDIFF(dd, [B], [A]) AS My_Date_Diff
FROM
(
    SELECT [TicketID], [OwnedbyTeamT], MAX([ClosedDate]) AS My_Max
    FROM [YourDB].[dbo].[MytableName]
    GROUP BY [TicketID], [OwnedbyTeamT]
) Temp
PIVOT
(
    MAX(My_Max)
    FOR [OwnedbyTeamT] IN ([A], [B])
) PIV
WHERE [A] > [B]
Your sample query is quite complicated and has conditions not mentioned in the text. It doesn't really help.
I want to calculate the datediff between the closeddates for teams A, and B, if the max closeddate for team A is greater than max closeddate team B. If it is smaller or null I don't want to see them.
I think you want this per TicketId. You can do this using conditional aggregation:
SELECT TicketId,
       DATEDIFF(day,
                MAX(CASE WHEN OwnedbyTeamT = 'B' THEN ClosedDate END),
                MAX(CASE WHEN OwnedbyTeamT = 'A' THEN ClosedDate END)
       ) AS diff
FROM incidentticketspecifics its
GROUP BY TicketId
HAVING MAX(CASE WHEN OwnedbyTeamT = 'A' THEN ClosedDate END) >
       MAX(CASE WHEN OwnedbyTeamT = 'B' THEN ClosedDate END)

How to calculate data from rows in sql server?

I have a table like this one below:
datatime in_out
---------------------
08:00 IN
08:30 OUT
09:30 OUT
10:00 IN
10:30 OUT
Is there any chance, with a SQL Server query, of getting something like this:
IN OUT
---------------
08:00 08:30
NULL 09:30
10:00 10:30
I spent about two weeks trying to find a solution. I am a beginner. The only solution I found used MIN and MAX, but it did not help me.
This will solve it using ROW_NUMBER():
with Ordered as (
select *, rn = row_number() over (order by datatime)
from Input
)
select
[In] = o_in.datatime
, [Out] = o_out.datatime
from Ordered o_out
left join Ordered o_in
on o_in.rn = o_out.rn - 1
and o_in.in_out = 'IN'
where o_out.in_out = 'OUT'
Guess I'm a bit slow to answer, but here's what I got using an inner query:
select
(select top 1
IIF(a.in_out = b.in_out, null, datatime)
from clk b
where a.datatime > b.datatime
order by b.datatime desc
) as [IN],
a.datatime as [OUT]
from clk a
where a.in_out = 'OUT'
Note: doing it this way will "skip" the null rows either forwards or backwards depending on which way it's implemented...
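On SQL Server 2012 or later you could also reach for LAG(); here is a minimal sketch, assuming the same Input table used in the first answer:
with flagged as (
    select datatime,
           in_out,
           lag(in_out)   over (order by datatime) as prev_in_out,
           lag(datatime) over (order by datatime) as prev_datatime
    from Input
)
select
    [In]  = case when prev_in_out = 'IN' then prev_datatime end,
    [Out] = datatime
from flagged
where in_out = 'OUT'
Each OUT row looks back one row; if that row is an IN, its time becomes the [In] value, otherwise [In] stays NULL, which reproduces the sample output above.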

How to use "Group By" for date interval in postgres

I have a table like this.
ID (integer)
event_name(varchar(20))
event_date(timestamp)
Some sample data is given below.
ID event_date event_name
101 2013-04-24 18:33:37.694818 event_A
102 2013-04-24 20:34:37.000000 event_B
103 2013-04-24 20:40:37.000000 event_A
104 2013-04-25 01:00:00.694818 event_A
105 2013-04-25 12:00:15.694818 event_A
106 2013-04-26 00:56:10.800000 event_A
107 2013-04-27 12:00:15.694818 event_A
108 2013-04-27 12:00:15.694818 event_B
I need to generate a window-based report, where a window represents a group of rows. E.g. if I choose a window size of 2, I need to show the total count of each event over two successive days, i.e. the same day and the previous day.
If I choose a window size of 3, I need to generate the count of each event for three successive days.
So if a 2-day window is selected, the result should be something like below.
Date Count_eventA Count_eventB
2013-04-27 (this counts sum of 27th, 26th) 2 1
2013-04-26 (this counts sum of 26th, 25th) 3 0
2013-04-25 (this counts sum of 25th, 24th) 4 1
2013-04-24 (this counts sum of 24th ) 2 1
I have read about window functions in Postgres. Can someone guide me on how to write a SQL query for this report?
You want to use the count aggregate as a window function, e.g. count(id) over (partition by event_date rows 3 preceding)... but it's greatly complicated by the nature of your data. You're storing timestamps, not just dates, and you want to group by day, not by number of previous events. To top it all off, you want to cross-tabulate the results.
If PostgreSQL supported RANGE in window functions this would be considerably simpler than it is. As it is, you have to do it the hard way.
First get the per-day event counts, then run those through a window to get the per-event per-day lagged counts ... except that your event days aren't contiguous and, unfortunately, PostgreSQL window functions only support ROWS, not RANGE, so you have to join across a generated series of dates first.
WITH
/* First, get a listing of event counts by day */
event_days(event_name, event_day, event_day_count) AS (
SELECT event_name, date_trunc('day', event_date), count(id)
FROM Table1
GROUP BY event_name, date_trunc('day', event_date)
ORDER BY date_trunc('day', event_date), event_name
),
/*
* Then fill in zeros for any days within the range that didn't have any events.
* If PostgreSQL supported RANGE windows, not just ROWS, we could get rid of this.
*/
event_days_contiguous(event_name, event_day, event_day_count) AS (
SELECT event_names.event_name, gen_day, COALESCE(event_days.event_day_count,0)
FROM generate_series( (SELECT min(event_day)::date FROM event_days), (SELECT max(event_day)::date FROM event_days), INTERVAL '1' DAY ) gen_day
CROSS JOIN (SELECT DISTINCT event_name FROM event_days) event_names(event_name)
LEFT OUTER JOIN event_days ON (gen_day = event_days.event_day AND event_names.event_name = event_days.event_name)
),
/*
* Get the lagged counts by using the sum() function over a row window...
*/
lagged_days(event_name, event_day_first, event_day_last, event_days_count) AS (
SELECT event_name, event_day, first_value(event_day) OVER w, sum(event_day_count) OVER w
FROM event_days_contiguous
WINDOW w AS (PARTITION BY event_name ORDER BY event_day ROWS 1 PRECEDING)
)
/* Now do a manual pivot. For arbitrary column counts use an external tool
* or check out the 'crosstab' function in the 'tablefunc' contrib module
*/
SELECT d1.event_day_first, d1.event_days_count AS "Event_A", d2.event_days_count AS "Event_B"
FROM lagged_days d1
INNER JOIN lagged_days d2 ON (d1.event_day_first = d2.event_day_first AND d1.event_name = 'event_A' AND d2.event_name = 'event_B')
ORDER BY d1.event_day_first;
Output with the sample data:
event_day_first | Event_A | Event_B
------------------------+---------+---------
2013-04-24 00:00:00+08 | 2 | 1
2013-04-25 00:00:00+08 | 4 | 1
2013-04-26 00:00:00+08 | 3 | 0
2013-04-27 00:00:00+08 | 2 | 1
(4 rows)
You can potentially make the query faster but much uglier by combining the three CTE clauses into a nested query using FROM (SELECT...) and wrapping them in a view instead of a CTE for use from the outer query. This will allow Pg to "push down" predicates into the queries, greatly reducing the data you have to work with when querying subsets of the data.
SQLFiddle doesn't seem to be working at the moment, but here's the demo setup I used:
CREATE TABLE Table1
(id integer primary key, "event_date" timestamp not null, "event_name" text);
INSERT INTO Table1
("id", "event_date", "event_name")
VALUES
(101, '2013-04-24 18:33:37', 'event_A'),
(102, '2013-04-24 20:34:37', 'event_B'),
(103, '2013-04-24 20:40:37', 'event_A'),
(104, '2013-04-25 01:00:00', 'event_A'),
(105, '2013-04-25 12:00:15', 'event_A'),
(106, '2013-04-26 00:56:10', 'event_A'),
(107, '2013-04-27 12:00:15', 'event_A'),
(108, '2013-04-27 12:00:15', 'event_B');
I changed the ID of the last entry from 107 to 108, as I suspect that was just an error in your manual editing.
Here's how to express it as a view instead:
CREATE VIEW lagged_days AS
SELECT event_name, event_day AS event_day_first, sum(event_day_count) OVER w AS event_days_count
FROM (
SELECT event_names.event_name, gen_day, COALESCE(event_days.event_day_count,0)
FROM generate_series( (SELECT min(event_date)::date FROM Table1), (SELECT max(event_date)::date FROM Table1), INTERVAL '1' DAY ) gen_day
CROSS JOIN (SELECT DISTINCT event_name FROM Table1) event_names(event_name)
LEFT OUTER JOIN (
SELECT event_name, date_trunc('day', event_date), count(id)
FROM Table1
GROUP BY event_name, date_trunc('day', event_date)
ORDER BY date_trunc('day', event_date), event_name
) event_days(event_name, event_day, event_day_count)
ON (gen_day = event_days.event_day AND event_names.event_name = event_days.event_name)
) event_days_contiguous(event_name, event_day, event_day_count)
WINDOW w AS (PARTITION BY event_name ORDER BY event_day ROWS 1 PRECEDING);
You can then use the view in whatever crosstab queries you want to write. It'll work with the prior hand-crosstab query:
SELECT d1.event_day_first, d1.event_days_count AS "Event_A", d2.event_days_count AS "Event_B"
FROM lagged_days d1
INNER JOIN lagged_days d2 ON (d1.event_day_first = d2.event_day_first AND d1.event_name = 'event_A' AND d2.event_name = 'event_B')
ORDER BY d1.event_day_first;
... or using crosstab from the tablefunc extension, which I'll let you study up on.
For a laugh, here's the explain on the above view-based query: http://explain.depesz.com/s/nvUq
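And if you'd rather not hand-write the pivot, a rough crosstab sketch against the lagged_days view could look like this (it assumes the tablefunc extension is installed; the casts pin the column types so they match the AS list):
-- CREATE EXTENSION IF NOT EXISTS tablefunc;   -- once per database
SELECT *
FROM crosstab(
    $$ SELECT event_day_first::date, event_name, event_days_count::bigint
       FROM lagged_days
       ORDER BY 1, 2 $$,
    $$ VALUES ('event_A'), ('event_B') $$
) AS ct(event_day date, "Event_A" bigint, "Event_B" bigint)
ORDER BY event_day;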

SQL Query: Calculating the deltas in a time series

For a development aid project I am helping a small town in Nicaragua improve their water-network administration.
There are about 150 households, and every month a person checks the meter and charges the household according to the consumed water (reading from this month minus reading from last month). Today everything is done on paper and I would like to digitize the administration to avoid calculation errors.
I have an MS Access Table in mind - e.g.:
*HousholdID* *Date* *Meter*
0 1/1/2013 100
1 1/1/2013 130
0 1/2/2013 120
1 1/2/2013 140
...
From this data I would like to create a query that calculates the consumed water (the meter difference of one household between two months):
*HouseholdID* *Date* *Consumption*
0 1/2/2013 20
1 1/2/2013 10
...
Please, how would I approach this problem?
This query returns every date with previous date, even if there are missing months:
SELECT TabPrev.*, Tab.Meter as PrevMeter, TabPrev.Meter-Tab.Meter as Diff
FROM (
SELECT
Tab.HousholdID,
Tab.Data,
Max(Tab_1.Data) AS PrevData,
Tab.Meter
FROM
Tab INNER JOIN Tab AS Tab_1 ON Tab.HousholdID = Tab_1.HousholdID
AND Tab.Data > Tab_1.Data
GROUP BY Tab.HousholdID, Tab.Data, Tab.Meter) As TabPrev
INNER JOIN Tab
ON TabPrev.HousholdID = Tab.HousholdID
AND TabPrev.PrevData=Tab.Data
Here's the result:
HousholdID Data PrevData Meter PrevMeter Diff
----------------------------------------------------------
0 01/02/2013 01/01/2013 120 100 20
1 01/02/2013 01/01/2013 140 130 10
The query above will return every delta, for every household, for every month (or for every interval). If you are just interested in the last delta, you could use this query:
SELECT
MaxTab.*,
TabCurr.Meter as CurrMeter,
TabPrev.Meter as PrevMeter,
TabCurr.Meter-TabPrev.Meter as Diff
FROM ((
SELECT
Tab.HousholdID,
Max(Tab.Data) AS CurrData,
Max(Tab_1.Data) AS PrevData
FROM
Tab INNER JOIN Tab AS Tab_1
ON Tab.HousholdID = Tab_1.HousholdID
AND Tab.Data > Tab_1.Data
GROUP BY Tab.HousholdID) As MaxTab
INNER JOIN Tab TabPrev
ON TabPrev.HousholdID = MaxTab.HousholdID
AND TabPrev.Data=MaxTab.PrevData)
INNER JOIN Tab TabCurr
ON TabCurr.HousholdID = MaxTab.HousholdID
AND TabCurr.Data=MaxTab.CurrData
and (depending on what you are after) you could filter only the current month:
WHERE
DateSerial(Year(CurrData), Month(CurrData), 1)=
DateSerial(Year(DATE()), Month(DATE()), 1)
this way if you miss a check for a particular household, it won't show.
Or you might be interested in showing last month present in the table (which can be different than current month):
WHERE
DateSerial(Year(CurrData), Month(CurrData), 1)=
(SELECT MAX(DateSerial(Year(Data), Month(Data), 1))
FROM Tab)
(here I am taking into consideration the fact that checks might be on different days)
I think the best approach is to use a correlated subquery to get the previous date and join back to the original table. This ensures that you get the previous record, even if there is more or less than a 1 month lag.
So the right query looks like:
select t.*, tprev.meter as prevmeter, t.meter - tprev.meter as consumption
from (select t.*,
             (select top 1 t2.date
              from t as t2
              where t2.housholdid = t.housholdid
                and t2.date < t.date
              order by t2.date desc
             ) as prevdate
      from t
     ) as t
inner join t as tprev
    on tprev.housholdid = t.housholdid
   and tprev.date = t.prevdate
In an environment such as the one you describe, it is very important not to make assumptions about the frequency of reading the meter. Although they may be read on average once per month, there will always be exceptions.
Testing with the following data:
HousholdID Date Meter
0 01/12/2012 100
1 01/12/2012 130
0 01/01/2013 120
1 01/01/2013 140
0 01/02/2013 120
1 01/02/2013 140
The following query:
SELECT a.housholdid,
a.date,
b.date,
a.meter,
b.meter,
a.meter - b.meter AS Consumption
FROM (SELECT *
FROM water
WHERE Month([date]) = Month(Date())
AND Year([date])=year(Date())) a
LEFT JOIN (SELECT *
FROM water
WHERE DateSerial(Year([date]),Month([date]),Day([date]))
=DateSerial(Year(Date()),Month(Date())-1,Day([date])) ) b
ON a.housholdid = b.housholdid
The above query selects the records for this month (Month([date]) = Month(Date())) and compares them to the records for last month (Month([date]) = Month(Date()) - 1).
Please do not use Date as a field name.
Returns the following result.
housholdid a.date b.date a.meter b.meter Consumption
0 01/02/2013 01/01/2013 120 100 20
1 01/02/2013 01/01/2013 140 130 10
Try
select t.householdID
, max(s.theDate) as billingMonth
, max(s.meter)-max(t.meter) as waterUsed
from myTbl t join (
select householdID, max(theDate) as theDate, max(meter) as meter
from myTbl
group by householdID ) s
on t.householdID = s.householdID and t.theDate <> s.theDate
group by t.householdID
This works in SQL Server; I'm not sure about Access.
You can use the LAG() function in certain SQL dialects. I found this to be much faster and easier to read than joins.
Source: http://blog.jooq.org/2015/05/12/use-this-neat-window-function-trick-to-calculate-time-differences-in-a-time-series/
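A minimal sketch of that LAG() approach, assuming SQL Server 2012+ (it will not run in MS Access itself) and the water table from the test data above; [date] is bracketed since, as noted, Date is best avoided as a field name:
select housholdid,
       [date],
       meter - lag(meter) over (partition by housholdid order by [date]) as consumption
from water
order by housholdid, [date];
The first reading of each household has no previous row, so its consumption comes back as NULL.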