Query Results from Two Tables without a JOIN - sql

I have two tables that each have a datetime column, but they share no other matching columns, so I cannot JOIN them directly.
I have tried joining on the MonthYear alias, but this does not give the expected results and is significantly slower to execute, whereas running the two queries individually is instant.
Every month will have data, and since I am counting by month and ordering, I suspect a join may not be essential. I tried a UNION, but that returned the results as additional rows when I need them in adjacent columns (see desired outcome).
table1:
SELECT
LEFT(DATENAME(MONTH,[date]),3) + '-' + RIGHT('00' + CAST(YEAR([date]) AS VARCHAR),2) AS 'MonthYear',
COUNT(CASE WHEN responseType = 'positive' THEN 1 END) AS 'Positive',
COUNT(CASE WHEN responseType = 'negative' THEN 1 END) AS 'Negative'
FROM Database.dbo.Response
WHERE [date] BETWEEN '2022/09/01' AND '2022/12/01'
GROUP BY LEFT(DATENAME(MONTH,[date]),3) + '-' + RIGHT('00' + CAST(YEAR([date]) AS VARCHAR),2)
ORDER BY MAX([date])
table1 results:
MonthYear Positive Negative
Sep-22 8738 6001
Oct-22 10120 4512
Nov-22 5621 5451
table2:
SELECT
LEFT(DATENAME(MONTH,[date]),3) + '-' + RIGHT('00' + CAST(YEAR([date]) AS VARCHAR),2) AS 'MonthYear',
COUNT(CASE WHEN Reason = 'Legacy Unsub' THEN 1 END) AS 'Unsub',
COUNT(CASE WHEN Reason = 'Complaint' THEN 1 END) AS 'Complaint'
FROM Database.dbo.Complaint
WHERE [date] BETWEEN '2022/09/01' AND '2022/12/01'
GROUP BY LEFT(DATENAME(MONTH, [date]),3) + '-' + RIGHT('00' + CAST(YEAR([date]) AS VARCHAR),2)
ORDER BY MAX([date])
table2 results:
MonthYear Unsub Complaint
Sep-22 541 5
Oct-22 171 0
Nov-22 459 12
My desired outcome:
MonthYear Positive Negative Unsub Complaint
Sep-22 8738 6001 541 5
Oct-22 10120 4512 171 0
Nov-22 5621 5451 459 12

I would expect the following to give you your expected output and perform not significantly worse than running the two queries individually (although it will ultimately depend on how many different MonthYears you are returning).
WITH t1 AS
(
SELECT
LEFT(DATENAME(MONTH,[date]),3) + '-' + RIGHT('00' + CAST(YEAR([date]) AS VARCHAR),2) AS 'MonthYear',
MAX([date]) AS 'SortOrder',
COUNT(CASE WHEN responseType = 'positive' THEN 1 END) AS 'Positive',
COUNT(CASE WHEN responseType = 'negative' THEN 1 END) AS 'Negative'
FROM Database.dbo.Response
WHERE [date] BETWEEN '2022/09/01' AND '2022/12/01'
GROUP BY LEFT(DATENAME(MONTH,[date]),3) + '-' + RIGHT('00' + CAST(YEAR([date]) AS VARCHAR),2)
),
t2 AS (
SELECT
LEFT(DATENAME(MONTH,[date]),3) + '-' + RIGHT('00' + CAST(YEAR([date]) AS VARCHAR),2) AS 'MonthYear',
MAX([date]) AS 'SortOrder',
COUNT(CASE WHEN Reason = 'Legacy Unsub' THEN 1 END) AS 'Unsub',
COUNT(CASE WHEN Reason = 'Complaint' THEN 1 END) AS 'Complaint'
FROM Database.dbo.Complaint
WHERE [date] BETWEEN '2022/09/01' AND '2022/12/01'
GROUP BY LEFT(DATENAME(MONTH, [date]),3) + '-' + RIGHT('00' + CAST(YEAR([date]) AS VARCHAR),2)
)
SELECT COALESCE(t1.MonthYear, t2.MonthYear) AS MonthYear, t1.Positive, t1.Negative, t2.Unsub, t2.Complaint
FROM t1 FULL OUTER JOIN t2
ON t1.MonthYear = t2.MonthYear
ORDER BY COALESCE(t1.SortOrder, t2.SortOrder)
This simply joins the results of your two existing queries (as CTEs) on MonthYear. I added a SortOrder column to each CTE (since we want to sort in date order, not alphabetic order), and the COALESCE calls keep the month label and the sort key populated when a month appears in only one of the tables. You said "every month will have data", so perhaps an INNER JOIN is sufficient in your case, but the FULL OUTER JOIN is probably safer for things like this (where possibly one of the tables doesn't yet have data for a month that the other table has).
Although SQL isn't procedural so there are no guarantees, I expect SQL Server will run your existing queries in about the same time it takes to run them individually, and then matching, say, a few thousand MonthYears should be relatively insignificant.
If this does not have acceptable performance, I would consider adding a computed column MonthYear to both tables and indexing that.
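To illustrate the "aggregate each table separately, then join the per-month results" idea, here is a minimal sketch that runs against SQLite through Python's sqlite3 module rather than SQL Server. The sample rows are invented, the month key is a plain 'YYYY-MM' string (which also sorts chronologically), and the FULL OUTER JOIN is emulated with a union of month keys plus two LEFT JOINs, because older SQLite versions lack FULL OUTER JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Response (date TEXT, responseType TEXT);
CREATE TABLE Complaint (date TEXT, Reason TEXT);
-- Invented sample rows, for illustration only.
INSERT INTO Response VALUES
  ('2022-09-03', 'positive'), ('2022-09-10', 'negative'),
  ('2022-10-01', 'positive'), ('2022-10-15', 'positive');
INSERT INTO Complaint VALUES
  ('2022-09-05', 'Complaint'), ('2022-11-02', 'Legacy Unsub');
""")

query = """
WITH t1 AS (                      -- one row per month from the first table
    SELECT strftime('%Y-%m', date) AS ym,
           SUM(responseType = 'positive') AS Positive,
           SUM(responseType = 'negative') AS Negative
    FROM Response GROUP BY ym
),
t2 AS (                           -- one row per month from the second table
    SELECT strftime('%Y-%m', date) AS ym,
           SUM(Reason = 'Legacy Unsub') AS Unsub,
           SUM(Reason = 'Complaint') AS Complaint
    FROM Complaint GROUP BY ym
),
months AS (SELECT ym FROM t1 UNION SELECT ym FROM t2)
SELECT m.ym,
       COALESCE(t1.Positive, 0), COALESCE(t1.Negative, 0),
       COALESCE(t2.Unsub, 0), COALESCE(t2.Complaint, 0)
FROM months AS m
LEFT JOIN t1 ON t1.ym = m.ym
LEFT JOIN t2 ON t2.ym = m.ym
ORDER BY m.ym
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Because each CTE returns at most one row per month, the join only has to match a handful of rows, and months present in just one table still appear with zero counts for the other.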

Related

Sum values in two different tables and join results keeping the columns

I have two tables: one with downtime and the other with productive time.
I want to have a table like this
But I am getting this
In the result, I am getting twice the downtime of the sum for the report 04102021-1, but as can be seen in the second picture, the value is present only once.
The script I am using is the following:
SELECT WAJ.REPORTCODE,--BASIC_REPORT_TABLE.TECHNICIAN,BASIC_REPORT_TABLE.JOBREPORTCODE,
SUM(CASE WHEN DATEDIFF(SECOND,WAJ.TIMESTARTED,WAJ.TIMEFINISHED)<0
THEN (86400+DATEDIFF(SECOND,WAJ.TIMESTARTED,WAJ.TIMEFINISHED))/3600.0
ELSE DATEDIFF(SECOND,WAJ.TIMESTARTED,WAJ.TIMEFINISHED) /3600.0
END) AS PRODUCTION_TIME,
SUM(CASE WHEN DATEDIFF(SECOND,WAS.TIMESTARTED,WAS.TIMEFINISHED)<0
THEN (86400+DATEDIFF(SECOND,WAS.TIMESTARTED,WAS.TIMEFINISHED))/3600.0
ELSE DATEDIFF(SECOND,WAS.TIMESTARTED,WAS.TIMEFINISHED) /3600.0
END) AS DOWNTIME
FROM WORK_AT_JOB WAJ,WAITING_AT_SITE WAS
WHERE (WAJ.REPORTCODE=WAS.REPORTCODE AND WAJ.REPORTCODE LIKE '04102021%') GROUP BY WAJ.REPORTCODE
After the @xQbert comment, I tried this:
SELECT WAS.REPORTCODE,
SUM(CASE WHEN DATEDIFF(SECOND,WAJ.TIMESTARTED,WAJ.TIMEFINISHED)<0
THEN (86400+DATEDIFF(SECOND,WAJ.TIMESTARTED,WAJ.TIMEFINISHED))/3600.0
ELSE DATEDIFF(SECOND,WAJ.TIMESTARTED,WAJ.TIMEFINISHED) /3600.0
END) AS PRODUCTION_TIME,
SUM(CASE WHEN DATEDIFF(SECOND,WAS.TIMESTARTED,WAS.TIMEFINISHED)<0
THEN (86400+DATEDIFF(SECOND,WAS.TIMESTARTED,WAS.TIMEFINISHED))/3600.0
ELSE DATEDIFF(SECOND,WAS.TIMESTARTED,WAS.TIMEFINISHED) /3600.0
END) AS DOWNTIME
FROM WAITING_AT_SITE WAS
JOIN WORK_AT_JOB WAJ
ON (WAJ.REPORTCODE=WAS.REPORTCODE AND WAS.REPORTCODE LIKE '04102021%') GROUP BY WAS.REPORTCODE
But I got the same result.
Could you give me some advice on how to get the result I want?
Thanks in advance
You could use conditional aggregation for this, but the easiest, and probably most performant, way to do this is to pre-aggregate the results before you join.
SELECT
waj.REPORTCODE,
waj.PRODUCTION_TIME,
was.DOWNTIME
FROM (
SELECT
waj.REPORTCODE,
SUM(CASE WHEN v.diff < 0 THEN 86400 + v.diff ELSE v.diff END / 3600.0) AS PRODUCTION_TIME
FROM WORK_AT_JOB waj
CROSS APPLY (VALUES( DATEDIFF(SECOND, waj.TIMESTARTED, waj.TIMEFINISHED) )) v(diff)
WHERE waj.REPORTCODE LIKE '04102021%'
GROUP BY
waj.REPORTCODE
) waj
JOIN (
SELECT
was.REPORTCODE,
SUM(CASE WHEN v.diff < 0 THEN 86400 + v.diff ELSE v.diff END / 3600.0) AS DOWNTIME
FROM WAITING_AT_SITE was
CROSS APPLY (VALUES( DATEDIFF(SECOND, was.TIMESTARTED, was.TIMEFINISHED) )) v(diff)
WHERE was.REPORTCODE LIKE '04102021%'
GROUP BY
was.REPORTCODE
) was ON waj.REPORTCODE = was.REPORTCODE;
Note the use of CROSS APPLY (VALUES to avoid code repetition.
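The reason pre-aggregation helps is that joining the raw rows repeats each downtime row once per matching production row, so the SUM counts it several times. The sketch below reproduces that effect with SQLite through Python's sqlite3 module; the lowercase tables and the precomputed hours column are simplifications assumed for illustration, not the original schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Simplified stand-ins for WORK_AT_JOB and WAITING_AT_SITE (hypothetical columns).
CREATE TABLE work_at_job (reportcode TEXT, hours REAL);
CREATE TABLE waiting_at_site (reportcode TEXT, hours REAL);
INSERT INTO work_at_job VALUES ('04102021-1', 2.0), ('04102021-1', 3.0);
INSERT INTO waiting_at_site VALUES ('04102021-1', 1.5);
""")

# Join first, aggregate second: the single downtime row is paired with
# both production rows, so its 1.5 hours are summed twice.
naive = conn.execute("""
SELECT j.reportcode, SUM(j.hours), SUM(s.hours)
FROM work_at_job AS j
JOIN waiting_at_site AS s ON s.reportcode = j.reportcode
GROUP BY j.reportcode
""").fetchall()

# Aggregate first, join second: each side is already one row per report.
pre_aggregated = conn.execute("""
SELECT j.reportcode, j.production_time, s.downtime
FROM (SELECT reportcode, SUM(hours) AS production_time
      FROM work_at_job GROUP BY reportcode) AS j
JOIN (SELECT reportcode, SUM(hours) AS downtime
      FROM waiting_at_site GROUP BY reportcode) AS s
  ON s.reportcode = j.reportcode
""").fetchall()

print(naive)           # downtime inflated to 3.0
print(pre_aggregated)  # downtime is 1.5
```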

Compare 2 columns rows if date is greater then update "Yes" else "No"

I have a table with From and Through date columns, plus an Eligible column. If the Through value is greater than the From value, Eligible should be Yes; otherwise it should be No. Kindly help me with this logic in SQL Server.
WHILE @MyDate > DATEADD(DAY,1,GETDATE())
BEGIN
SELECT
MI.Suffix as [Mem Sfx],
CONVERT(VARCHAR(100), EB.Eligibility_Date, 101) as [From],
CONVERT(VARCHAR(100), MI.EOI_Termination_Date, 101) as [Through],
'No' AS Eligible,
SG.SubGroupId as Subgroup,
EB.Plan_ID as [Plan],
'00' + RIGHT('123658' + CAST('00' AS VARCHAR(8)), 8) Product
FROM [dbo].[Members.Indicative] MI
INNER JOIN [dbo].[Eligibilty] EB ON EB.Subscriber_ID = MI.Subscriber_ID
INNER JOIN [dbo].[Subgroup] SG ON SG.Subscriber_ID=MI.Subscriber_ID
order by MI.Suffix
END
You can use a CASE expression. Compare the date columns themselves rather than their CONVERT(..., 101) strings, because mm/dd/yyyy text does not sort chronologically:
case when EB.Eligibility_Date < MI.EOI_Termination_Date then
'Yes'
else
'No'
end as [Eligible]
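A small sketch of why the comparison should be made on date values and not on formatted text. It uses Python's datetime module purely to illustrate the ordering problem; the helper name eligible and the sample dates are made up for the example.

```python
from datetime import date


def eligible(from_date: date, through_date: date) -> str:
    """Return 'Yes' when the Through date is later than the From date."""
    return "Yes" if through_date > from_date else "No"


from_date = date(2020, 12, 31)
through_date = date(2021, 1, 1)

# Comparing mm/dd/yyyy strings compares the month digits first,
# so '01/01/2021' is treated as earlier than '12/31/2020'.
text_says_later = (
    through_date.strftime("%m/%d/%Y") > from_date.strftime("%m/%d/%Y")
)

print(text_says_later)                    # False: the text comparison is wrong
print(eligible(from_date, through_date))  # Yes
```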

How to count every half hour?

I have a query that counts rows for every hour, using a pivot table.
How can I get the count for every 30 minutes instead?
For example 8:00-8:29, 8:30-8:59, 9:00-9:29, etc., until 5:00.
SELECT CONVERT(varchar(8),start_date,1) AS 'Day',
SUM(CASE WHEN DATEPART(hour,start_date) = 8 THEN 1 ELSE 0 END) as eight ,
SUM(CASE WHEN DATEPART(hour,start_date) = 9 THEN 1 ELSE 0 END) AS nine,
SUM(CASE WHEN DATEPART(hour,start_date) = 10 THEN 1 ELSE 0 END) AS ten,
SUM(CASE WHEN DATEPART(hour,start_date) = 11 THEN 1 ELSE 0 END) AS eleven,
SUM(CASE WHEN DATEPART(hour,start_date) = 12 THEN 1 ELSE 0 END) AS twelve,
SUM(CASE WHEN DATEPART(hour,start_date) = 13 THEN 1 ELSE 0 END) AS one_clock,
SUM(CASE WHEN DATEPART(hour,start_date) = 14 THEN 1 ELSE 0 END) AS two_clock,
SUM(CASE WHEN DATEPART(hour,start_date) = 15 THEN 1 ELSE 0 END) AS three_clock,
SUM(CASE WHEN DATEPART(hour,start_date) = 16 THEN 1 ELSE 0 END) AS four_clock
FROM test
where user_id is not null
GROUP BY CONVERT(varchar(8),start_date,1)
ORDER BY CONVERT(varchar(8),start_date,1)
I use sql server 2012 (version Microsoft SQL Server Management Studio 11.0.3128.0)
Try using IIF as below:
SELECT CONVERT(varchar(8),start_date,1) AS 'Day', SUM(iif(DATEPART(hour,start_date) = 8 and
DATEPART(minute,start_date) >= 0 and
DATEPART(minute,start_date) <= 29,1,0)) as eight_thirty
FROM test where user_id is not null GROUP BY
CONVERT(varchar(8),start_date,1) ORDER BY
CONVERT(varchar(8),start_date,1)
To get counts by day and half hour, something like this should work.
SELECT day, half_hour, count(1) AS half_hour_count
FROM (
SELECT
CAST(start_date AS date) AS day,
DATEPART(hh, start_date)
+ 0.5*(DATEPART(n,start_date)/30) AS half_hour
FROM test
WHERE user_id IS NOT NULL
) qry
GROUP BY day, half_hour
ORDER BY day, half_hour;
Formatting the result could be done later.
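The bucket expression above maps a timestamp to its hour plus 0.5 when the minute is 30 or later. Here is a minimal Python sketch of the same arithmetic, with invented sample timestamps, which may help when checking edge cases such as 8:29 versus 8:30.

```python
from collections import Counter
from datetime import datetime


def half_hour_bucket(ts: datetime) -> float:
    """Hour plus 0.5 for the second half of the hour (integer division by 30)."""
    return ts.hour + 0.5 * (ts.minute // 30)


# Invented sample start times, for illustration only.
starts = [
    datetime(2016, 8, 23, 8, 5),
    datetime(2016, 8, 23, 8, 29),
    datetime(2016, 8, 23, 8, 30),
    datetime(2016, 8, 23, 16, 59),
]

# Equivalent of GROUP BY day, half_hour with COUNT.
counts = Counter((ts.date(), half_hour_bucket(ts)) for ts in starts)

for (day, bucket), n in sorted(counts.items()):
    print(day, bucket, n)
```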
You need a few things, and then this query just falls together.
First, assuming you need multiple dates, you're going to want what's known as a Calendar Table (hands down, probably the most useful analysis table).
Next, you're going to want either an existing Numbers table if you have one, or just generate one on the fly:
WITH Halfs AS (SELECT CAST(0 AS INT) m
UNION ALL
SELECT m + 1
FROM Halfs
WHERE m < 24 * 2 - 1)
SELECT m
FROM Halfs
(a recursive CTE that generates the numbers 0 through 47, one per half hour of the day).
These two tables will provide the basis for a range query based on the timestamps in your main table. This will make it very easy for the optimizer to bucket rows for whatever aggregation you're doing. That's done by CROSS JOINing the two tables together in a subquery, as well as adding a couple of other derived columns:
WITH Halfs AS (SELECT CAST(0 AS INT) m
UNION ALL
SELECT m + 1
FROM Halfs
WHERE m < 24 * 2 - 1)
SELECT calendarDate, rangeGroup, rangeStart, rangeEnd
FROM (SELECT Calendar.calendarDate, Halfs.m rangeGroup,
DATEADD(minute, m * 30, CAST(Calendar.calendarDate AS DATETIME2)) rangeStart,
DATEADD(minute, (m + 1) * 30, CAST(Calendar.calendarDate AS DATETIME2)) rangeEnd
FROM Calendar
CROSS JOIN Halfs
WHERE Calendar.calendarDate >= CAST('20160823' AS DATE)
AND Calendar.calendarDate < CAST('20160830' AS DATE)
-- OR whatever your date range actually is.
) Range
ORDER BY rangeStart
(Note that, if the range of dates is sufficiently large, it may be beneficial to save this off as a temporary table with indices. For small tables and datasets, the performance gain isn't likely to be noticeable.)
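As a sanity check on the range arithmetic, the following Python sketch builds the same half-open [rangeStart, rangeEnd) intervals for a single day. The function name half_hour_ranges is hypothetical; the point is that group numbers 0 through 47 cover exactly one day, with the last range ending at the next midnight.

```python
from datetime import date, datetime, timedelta


def half_hour_ranges(day: date):
    """Return (group, start, end) tuples for the 48 half hours of one day."""
    midnight = datetime(day.year, day.month, day.day)
    return [
        (m,
         midnight + timedelta(minutes=30 * m),        # rangeStart (inclusive)
         midnight + timedelta(minutes=30 * (m + 1)))  # rangeEnd (exclusive)
        for m in range(48)
    ]


ranges = half_hour_ranges(date(2016, 8, 23))
print(len(ranges))   # number of half-hour groups in the day
print(ranges[0])     # first group starts at midnight
print(ranges[-1])    # last group ends at the following midnight
```

Using an exclusive upper bound means a timestamp of exactly 8:30:00 falls into one range only, which is why the join conditions use >= for the start and < for the end.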
Now that we have our ranges, it's trivial to get our groups, and pivot the table.
Oh, and SQL Server has a specific operator for PIVOTing.
WITH Halfs AS (SELECT CAST(0 AS INT) m
UNION ALL
SELECT m + 1
FROM Halfs
WHERE m < 3 * 2 - 1)
-- Intentionally limiting range for example only
SELECT calendarDate AS day, [0], [1], [2], [3], [4], [5]
-- If you're displaying "nice" names,
-- do it at this point, or in the reporting application
FROM (SELECT Range.calendarDate, Range.rangeGroup, Test.start_date
FROM (SELECT Calendar.calendarDate, Halfs.m rangeGroup,
DATEADD(minute, m * 30, CAST(Calendar.calendarDate AS DATETIME2)) rangeStart,
DATEADD(minute, (m + 1) * 30, CAST(Calendar.calendarDate AS DATETIME2)) rangeEnd
FROM Calendar
CROSS JOIN Halfs
WHERE Calendar.calendarDate >= CAST('20160823' AS DATE)
AND Calendar.calendarDate < CAST('20160830' AS DATE)
-- OR whatever your date range actually is.
) Range
LEFT JOIN Test
ON Test.user_id IS NOT NULL
AND Test.start_date >= Range.rangeStart
AND Test.start_date < Range.rangeEnd
) AS DataTable
PIVOT (COUNT(start_date)
FOR rangeGroup IN ([0], [1], [2], [3], [4], [5])) AS PT
-- Only covers the first 6 groups,
-- or the first three hours.
ORDER BY day
The pivot takes care of producing the individual columns. PIVOT needs a column to aggregate (COUNT(*) is not allowed there), and COUNT(start_date) ignores the NULL rows produced by the LEFT JOIN, so empty ranges report 0. Should be all you need.

More efficient way of grouping rows by hour (using a timestamp)

I'm trying to show a log of daily transactions that take place. My current method is embarrassingly inefficient and I'm sure there is a much better solution. Here is my current query:
select ReaderMACAddress,
count(typeid) as 'Total Transactions',
SUM(CASE WHEN CAST("Timestamp" as TIME) between '05:00:00' and '11:59:59' THEN 1 ELSE 0 END) as 'Morning(5am-12pm)',
SUM(CASE WHEN CAST("Timestamp" as TIME) between '12:00:00' and '17:59:59' THEN 1 ELSE 0 END) as 'AfternoonActivity(12pm-6pm)',
SUM(CASE WHEN CAST("Timestamp" as TIME) between '18:00:00' and '23:59:59' THEN 1 ELSE 0 END) as 'EveningActivity(6pm-12am)',
SUM(CASE WHEN CAST("Timestamp" as TIME) between '00:00:00' and '04:59:59' THEN 1 ELSE 0 END) as 'OtherActivity(12am-5am)'
from Transactions
where ReaderMACAddress = '0014f54033f5'
Group by ReaderMACAddress;
which returns the results:
ReaderMACAddress Total Transactions Morning(5am-12pm) AfternoonActivity(12pm-6pm) EveningActivity(6pm-12am) OtherActivity(12am-5am)
0014f54033f5 932 269 431 232 0
(sorry for any alignment issues here)
At the moment I only want to look at a single Reader that I specify (through the where clause). Ideally, it would be easier to read if the time sections were in a single column and the results, i.e. a count function were in a second column yielding results such as:
Total Transactions 932
Morning(5am-12pm) 269
AfternoonActivity(12pm-6pm) 431
EveningActivity(6pm-12am) 232
OtherActivity(12am-5am) 0
Thanks for any help :)
I would first consider a computed column, but I believe from a previous post you don't have the ability to change the schema. So how about a view?
CREATE VIEW dbo.GroupedReaderView
AS
SELECT ReaderMACAddress,
Slot = CASE WHEN t >= '05:00' AND t < '12:00' THEN 1
WHEN t >= '12:00' AND t < '18:00' THEN 2
WHEN t >= '18:00' THEN 3 ELSE 4 END
FROM
(
SELECT ReaderMACAddress, t = CONVERT(TIME, [Timestamp])
FROM dbo.Transactions
) AS x;
Now your per-MAC address query is much, much simpler:
SELECT Slot, COUNT(*)
FROM dbo.GroupedReaderView
WHERE ReaderMACAddress = '00...'
GROUP BY Slot;
This will provide a result like the following (a slot with no matching rows, such as slot 4 here, is simply absent from the output):
1 269
2 431
3 232
You can also add WITH ROLLUP which will provide a grand total with the Slot column being NULL:
SELECT Slot, COUNT(*)
FROM dbo.GroupedReaderView
WHERE ReaderMACAddress = '00...'
GROUP BY Slot
WITH ROLLUP;
Should yield:
1 269
2 431
3 232
NULL 932
And you can pivot that if you need to, add labels per slot, etc. in your presentation tier.
You could also do it this way, it just makes the view a lot more verbose and pulls a lot of extra data when you query it directly; it's also slightly less efficient to group by strings.
CREATE VIEW dbo.GroupedReaderView
AS
SELECT ReaderMACAddress,
Slot = CASE WHEN t >= '05:00' AND t < '12:00' THEN
'Morning(5am-12pm)'
WHEN t >= '12:00' AND t < '18:00' THEN
'Afternoon(12pm-6pm)'
WHEN t >= '18:00' THEN
'Evening(6pm-12am)'
ELSE
'Other(12am-5am)'
END
FROM
(
SELECT ReaderMACAddress, t = CONVERT(TIME, [Timestamp])
FROM dbo.Transactions
) AS x;
These aren't necessarily more efficient than what you've got, but they're less repetitive and easier on the eyes. :-)
Also if you don't want to (or can't) create a view, you can just put that into a subquery, e.g.
SELECT Slot, COUNT(*)
FROM
(
SELECT ReaderMACAddress,
Slot = CASE WHEN t >= '05:00' AND t < '12:00' THEN
'Morning(5am-12pm)'
WHEN t >= '12:00' AND t < '18:00' THEN
'Afternoon(12pm-6pm)'
WHEN t >= '18:00' THEN
'Evening(6pm-12am)'
ELSE
'Other(12am-5am)'
END
FROM
(
SELECT ReaderMACAddress, t = CONVERT(TIME, [Timestamp])
FROM dbo.Transactions
) AS x
) AS y
WHERE ReaderMACAddress = '00...'
GROUP BY Slot
WITH ROLLUP;
Just an alternative that still lets you use BETWEEN and may be even a little less verbose:
SELECT Slot, COUNT(*)
FROM
(
SELECT ReaderMACAddress,
Slot = CASE WHEN h BETWEEN 5 AND 11 THEN 'Morning(5am-12pm)'
WHEN h BETWEEN 12 AND 17 THEN 'Afternoon(12pm-6pm)'
WHEN h >= 18 THEN 'Evening(6pm-12am)'
ELSE 'Other(12am-5am)'
END
FROM
(
SELECT ReaderMACAddress, h = DATEPART(HOUR, [Timestamp])
FROM dbo.Transactions
) AS x
) AS y
WHERE ReaderMACAddress = '00...'
GROUP BY Slot
WITH ROLLUP;
UPDATE
To always include each slot even if there are no results for that slot:
;WITH slots(s, label, h1, h2) AS
(
SELECT 1, 'Morning(5am-12pm)' , 5, 11
UNION ALL SELECT 2, 'Afternoon(12pm-6pm)' , 12, 17
UNION ALL SELECT 3, 'Evening(6pm-12am)' , 18, 23
UNION ALL SELECT 4, 'Other(12am-5am)' , 0, 4
)
SELECT s.label, c = COALESCE(COUNT(y.ReaderMACAddress), 0)
FROM slots AS s
LEFT OUTER JOIN
(
SELECT ReaderMACAddress, h = DATEPART(HOUR, [Timestamp])
FROM dbo.Transactions
WHERE ReaderMACAddress = '00...'
) AS y
ON y.h BETWEEN s.h1 AND s.h2
GROUP BY s.label
WITH ROLLUP;
The key in all of these cases is to simplify and not repeat yourself. Even if SQL Server only performs it once, why convert to time 4+ times?
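The "always include each slot" variant relies on starting from the list of slots and LEFT JOINing the data onto it. Below is a minimal sketch of that pattern using SQLite through Python's sqlite3 module instead of SQL Server; the sample transactions are invented, and the hour is extracted with strftime since DATEPART is not available in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Transactions (ReaderMACAddress TEXT, Timestamp TEXT);
-- Invented sample rows, for illustration only.
INSERT INTO Transactions VALUES
  ('0014f54033f5', '2013-05-01 06:15:00'),
  ('0014f54033f5', '2013-05-01 09:00:00'),
  ('0014f54033f5', '2013-05-01 13:30:00'),
  ('0014f54033f5', '2013-05-01 19:45:00'),
  ('00aabbccddee', '2013-05-01 02:00:00');
""")

query = """
WITH slots(s, label, h1, h2) AS (
    VALUES (1, 'Morning',    5, 11),
           (2, 'Afternoon', 12, 17),
           (3, 'Evening',   18, 23),
           (4, 'Other',      0,  4)
)
SELECT slots.label, COUNT(y.h)      -- COUNT skips the NULL from unmatched slots
FROM slots
LEFT JOIN (
    SELECT CAST(strftime('%H', Timestamp) AS INTEGER) AS h
    FROM Transactions
    WHERE ReaderMACAddress = ?
) AS y ON y.h BETWEEN slots.h1 AND slots.h2
GROUP BY slots.s, slots.label
ORDER BY slots.s
"""
rows = conn.execute(query, ("0014f54033f5",)).fetchall()
for row in rows:
    print(row)
```

The 'Other' slot still appears with a count of 0 because the row comes from the slots list, not from the transactions.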

Grabbing/rearranging data from SQL for table

I have data in sql that looks like so:
Month PersonID Level
01 102 2
01 506 1
02 617 3
02 506 1
03 297 2
And I need to query this data to receive it for use in a table that would look like this
Jan Feb March ...etc
Level 1
Level 2
Level 3
with the values being how many people are in each level each month.
I'm a complete noob with SQL so any help and relevant links to explain answers would be much appreciated.
Try this:
SELECT 'Level' + CAST(level as varchar), [January], [February], [March]
FROM (SELECT DATENAME(month, '2013'+Month+'01') Month, PersonID, Level FROM Tbl) T
PIVOT
(
COUNT(PersonID) FOR Month IN ([January], [February], [March])
) A
SELECT 'Level ' + CAST("Level" AS VARCHAR(2)),
SUM(CASE Month WHEN '01' THEN 1 ELSE 0 END) AS Jan,
SUM(CASE Month WHEN '02' THEN 1 ELSE 0 END) AS Feb,
SUM(CASE Month WHEN '03' THEN 1 ELSE 0 END) AS Mar,
...
FROM myTable
GROUP BY "Level"
This is basically a poor man's pivot table, which should work on most RDBMS. What it does is use a SUM with a CASE to achieve a count-if for each month. That is, for January, the value for each row will be 1 if Month = '01', or 0 otherwise. Summing these values gets the total count of all "January" rows in your table.
The GROUP BY Level clause tells the engine to produce one result row for each distinct value in Level, thus separating your data by the different levels.
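The SUM-with-CASE approach described above can be tried end to end on the question's sample rows. This sketch uses SQLite through Python's sqlite3 module rather than SQL Server (so string concatenation is || instead of +) and covers only the first three months.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE myTable (Month TEXT, PersonID INTEGER, Level INTEGER);
-- The sample rows from the question.
INSERT INTO myTable VALUES
  ('01', 102, 2), ('01', 506, 1), ('02', 617, 3),
  ('02', 506, 1), ('03', 297, 2);
""")

rows = conn.execute("""
SELECT 'Level ' || Level,
       SUM(CASE Month WHEN '01' THEN 1 ELSE 0 END) AS Jan,
       SUM(CASE Month WHEN '02' THEN 1 ELSE 0 END) AS Feb,
       SUM(CASE Month WHEN '03' THEN 1 ELSE 0 END) AS Mar
FROM myTable
GROUP BY Level
ORDER BY Level
""").fetchall()

for row in rows:
    print(row)
```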
Since you are using SQL Server 2005, which supports PIVOT, you can simply do:
SELECT 'Level ' + CAST("Level" AS VARCHAR(2)),
[01] AS [Jan], [02] AS [Feb], [03] AS [Mar], ...
FROM myTable
PIVOT
(
COUNT(PersonId)
FOR Month IN ([01], [02], [03], ...)
) x