Complex SQL query excluding results from other SQL queries - sql

This question is a continuation of this question.
There is a slight change to the database structure, however (as I oversimplified slightly).
I have a database with data columns as follows:
Key, Instance, User ID, Day, Size, Instance Type
Key - This is the primary key.
Instance - This is a descriptor of the specific instance (this will be unique).
User ID - This will refer to 1 of a number of users where the number will be less than the number of entries in this table.
Day - This is the day the specific user created this instance. It will be one of either Day 1 or Day 2.
Size - This is the size of the data stored.
Instance Type - There are several instance types. An instance, itself, will be one of these instance types.
Now, in my previous question I was building a nested SQL query to find distinct users who have instances on both day 1 and day 2, in this case with a specific instance type.
Now I have 2 sets of these queries.
What I would now like to do is a 3rd query that returns the rows where the User ID does not exist in either of the other queries.
So far I have set up a query, but it is REALLY slow (something to do with the <> comparator in the ON clause) and I'm not even 100% sure it does exactly what I want.
This is my SQL statement so far:
Select Max( Table.Key ) as Key,
Max( Table.Instance ) as Instance,
Table.[User ID],
Max( Table.Day ) as Day,
Max( Table.Size ) as Size,
Max( Table.[Instance Type] ) as [Instance Type]
from (((Table
inner join (Select top 90 Max( Table.Key ) as Key,
Max( Table.Instance ) as Instance,
Table.[User ID],
Max( Table.Day ) as Day,
Max( Table.Size ) as Size,
Max( Table.[Instance Type] ) as [Instance Type]
from Table
where Table.[Instance Type]="type1" and
Table.[Day]=1 and
1=1
group by Table.[User ID]) as t2
on Table.[User ID]<>t2.[User ID])
inner join (Select top 90 Max( Table.Key ) as Key,
Max( Table.Instance ) as Instance,
Table.[User ID],
Max( Table.Day ) as Day,
Max( Table.Size ) as Size,
Max( Table.[Instance Type] ) as [Instance Type]
from Table
where Table.[Instance Type]="type1" and
Table.[Day]=2 and
1=1
group by Table.[User ID]) as t3
on Table.[User ID]<>t3.[User ID])
inner join (Select Table.[User ID]
from Table
where 1=1 ) as t4
on Table.[User ID]=t4.[User ID])
where Table.[Instance Type]="type1"
group by Table.[User ID];
Any help or advice on how to get what I'm after would be massively appreciated!

It might make performance tuning easier to start by breaking this up into temporary tables (in MS Access, I guess 'views' or 'make-table' queries), particularly as your logic seems overly complex (and therefore difficult to maintain and debug!).
Part 1 of your question: "Now, in my previous question I was building a nested SQL query to find distinct users who have instances on both day 1 and day 2, in this case with a specific instance type."
Why don't you first use a temp table, make-table query or view (depending on how you're implementing this) to get a distinct list of user IDs with their days?
SELECT DISTINCT Table.UserID, Table.Day
INTO #DistinctListOfUserIDsWithDays /* Not sure of exact syntax in access! */
FROM Table
WHERE Table.Day = 1
OR Table.Day = 2
And then you can get all users that have data for both days, which solves your first part easily, by aggregating the results of the view/query above with GROUP BY and a HAVING clause:
SELECT UserID, COUNT(*)
FROM #DistinctListOfUserIDsWithDays
GROUP BY UserID
HAVING COUNT(*) > 1
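The DISTINCT-then-HAVING idea can be tried out with the sketch below. It runs in SQLite through Python's sqlite3 module rather than MS Access, uses a derived table in place of the #temp table, and the table and column names (instances, user_id, day, instance_type) and the rows are hypothetical stand-ins. The DISTINCT matters: without it, a user with two day-2 rows and no day-1 row would still pass COUNT(*) > 1.

```python
import sqlite3

# Hypothetical sample rows: (user_id, day, instance_type).
ROWS = [
    (1, 1, "type1"), (1, 2, "type1"),                   # user 1: both days
    (2, 1, "type1"),                                    # user 2: day 1 only
    (3, 1, "type1"), (3, 2, "type1"), (3, 2, "type1"),  # user 3: both days, duplicate day 2
    (4, 2, "type2"),                                    # user 4: other instance type
]

def users_on_both_days(conn, instance_type):
    """Return sorted user IDs with at least one row on day 1 and one on day 2."""
    sql = """
        SELECT user_id
        FROM (SELECT DISTINCT user_id, day
              FROM instances
              WHERE instance_type = ? AND day IN (1, 2)) AS distinct_days
        GROUP BY user_id
        HAVING COUNT(*) > 1
        ORDER BY user_id
    """
    return [row[0] for row in conn.execute(sql, (instance_type,))]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instances (user_id INTEGER, day INTEGER, instance_type TEXT)")
conn.executemany("INSERT INTO instances VALUES (?, ?, ?)", ROWS)
print(users_on_both_days(conn, "type1"))
```

With this data, users 1 and 3 qualify for "type1"; user 3's duplicate day-2 row is collapsed by the DISTINCT before counting.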
"Now I have 2 sets of these queries."
Re: your new query, "What I would now like to do is a 3rd query that returns the rows where the User ID does not exist in either of the other queries", here's one simple solution, based on the query/view above (keep only the users that appear on both days, then exclude them):
SELECT Table.UserID
FROM Table
WHERE UserID NOT IN (SELECT UserID
                     FROM #DistinctListOfUserIDsWithDays
                     GROUP BY UserID
                     HAVING COUNT(*) > 1)
You could also outer join the Table with that grouped result and select only those UserIDs that return a NULL on its side of the query...
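Both exclusion styles can be compared in a small sketch. It assumes SQLite via Python's sqlite3, and the instances table, its columns and its rows are hypothetical. BOTH_DAYS stands in for the two existing "both days" queries by grouping per user and instance type. One practical difference: NOT IN returns no rows at all if the subquery yields a NULL, whereas the LEFT JOIN ... IS NULL anti-join does not have that problem.

```python
import sqlite3

# Hypothetical rows: (user_id, day, instance_type).
ROWS = [
    (1, 1, "type1"), (1, 2, "type1"),  # both days, type1
    (2, 1, "type2"), (2, 2, "type2"),  # both days, type2
    (3, 1, "type1"),                   # day 1 only
    (4, 2, "type2"),                   # day 2 only
]

# Users with rows on both days for some instance type.
BOTH_DAYS = """
    SELECT user_id
    FROM (SELECT DISTINCT user_id, day, instance_type
          FROM instances WHERE day IN (1, 2)) AS d
    GROUP BY user_id, instance_type
    HAVING COUNT(*) > 1
"""

def not_in(conn):
    sql = f"SELECT DISTINCT user_id FROM instances WHERE user_id NOT IN ({BOTH_DAYS}) ORDER BY user_id"
    return [r[0] for r in conn.execute(sql)]

def anti_join(conn):
    # LEFT JOIN keeps every row; users with no match get NULL on the right side.
    sql = f"""
        SELECT DISTINCT i.user_id
        FROM instances AS i
        LEFT JOIN ({BOTH_DAYS}) AS b ON b.user_id = i.user_id
        WHERE b.user_id IS NULL
        ORDER BY i.user_id
    """
    return [r[0] for r in conn.execute(sql)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instances (user_id INTEGER, day INTEGER, instance_type TEXT)")
conn.executemany("INSERT INTO instances VALUES (?, ?, ?)", ROWS)
print(not_in(conn), anti_join(conn))
```

Both functions return users 3 and 4 here, since neither appears on both days for any instance type.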


Query keeps giving me duplicate records. How can I fix this?

I wrote a query which uses 2 temp tables and then joins them into 1. However, I am seeing duplicate records in the student visit temp table (query is below). How could this be modified to remove the duplicate records from the visit temp table?
with clientbridge as (Select *
from (Select visitorid, --Visid
roomnumber,
room_id,
profid,
student_id,
cohd.datekey,
RANK() over(PARTITION BY visitorid,student_id,profid ORDER BY ambc.datekey desc) as rn
from university.course_office_hour_bridge cohd
--where student_id = '9999999-aaaa-6634-bbbb-96fa18a9046e'
)
where rn = 1 --visitorid = '999999999999999999999999999999'---'1111111111111111111111111111111' --and pai.datekey is not null --- 00000000000000000000000000
),
-----------------Data Header Table
studentvisit as
(SELECT
--Visit key will allow us to track everything they did within that visit.
distinct visid_visitorid,
--calcualted_visitorid,
uniquevisitkey,
--channel, -- says the room they're in. Channel might not be reliable would need to see how that operates
--office_list, -- add 7 to exact
--user_college,
--first_office_hour_name,
--first_question_time_attended,
studentaccountid_5,
profid_officenumber_8,
studentvisitstarttime,
room_id_115,
--date_time,
qqq144, --Course Name
qqq145, -- Course Office Hour Benefit
qqq146, --Course Office Hour ID
datekey
FROM university.office_hour_details ohd
--left_join niversity.course_office_hour_bridge cohd on ohd.visid_visitorid
where DateKey >='2022-10-01' --between '2022-10-01' and '2022-10-27'
and (qqq146 <> '')
)
select
*
from clientbridge cb inner join studentvisit sv on sv.visid_visitorid = cb.visitorid
I think you may have a better shot by joining the two datasets in the same query where you rank the data; otherwise the rank from the first query will be ignored within the results of the second query. Perhaps something like this:
;with studentvisit as
(SELECT
--Visit key will allow us to track everything they did within that visit.
distinct visid_visitorid,
--calcualted_visitorid,
uniquevisitkey,
--channel, -- says the room they're in. Channel might not be reliable would need to see how that operates
--office_list, -- add 7 to exact
--user_college,
--first_office_hour_name,
--first_question_time_attended,
studentaccountid_5,
profid_officenumber_8,
studentvisitstarttime,
room_id_115,
--date_time,
qqq144, --Course Name
qqq145, -- Course Office Hour Benefit
qqq146, --Course Office Hour ID
datekey
FROM university.office_hour_details ohd
--left_join niversity.course_office_hour_bridge cohd on ohd.visid_visitorid
where DateKey >='2022-10-01' --between '2022-10-01' and '2022-10-27'
and (qqq146 <> '')
)
,clientbridge as (
Select
sv.*,
cohd.visitorid, --Visid
cohd.roomnumber,
cohd.room_id,
cohd.profid,
cohd.student_id,
cohd.datekey as bridge_datekey, -- renamed so it does not clash with sv.datekey
RANK() over(PARTITION BY cohd.visitorid, cohd.student_id, cohd.profid ORDER BY cohd.datekey desc) as rn
from university.course_office_hour_bridge cohd
inner join studentvisit sv on sv.visid_visitorid = cohd.visitorid
)
select
*
from clientbridge WHERE rn=1
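If some duplicates survive the rn = 1 filter, one possible cause is ties: RANK() gives tied rows the same rank, so several rows per partition can have rn = 1, while ROW_NUMBER() always numbers 1, 2, 3... and keeps exactly one (add a tiebreaker column to the ORDER BY if you need that choice to be deterministic). The sketch below shows the difference in SQLite (3.25+ for window functions) with a hypothetical bridge table and made-up rows.

```python
import sqlite3

# Hypothetical bridge rows: (visitorid, student_id, profid, datekey).
# v1/s1/p1 has two rows tied on the latest datekey.
ROWS = [
    ("v1", "s1", "p1", "2022-10-05"),
    ("v1", "s1", "p1", "2022-10-05"),
    ("v1", "s1", "p1", "2022-10-01"),
    ("v2", "s2", "p1", "2022-10-03"),
]

def rows_kept(conn, fn):
    """Count rows where fn() OVER (...) = 1; fn must be 'RANK' or 'ROW_NUMBER'."""
    if fn not in ("RANK", "ROW_NUMBER"):
        raise ValueError(fn)
    sql = f"""
        SELECT COUNT(*) FROM (
            SELECT {fn}() OVER (PARTITION BY visitorid, student_id, profid
                                ORDER BY datekey DESC) AS rn
            FROM bridge
        ) AS ranked
        WHERE rn = 1
    """
    return conn.execute(sql).fetchone()[0]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bridge (visitorid TEXT, student_id TEXT, profid TEXT, datekey TEXT)")
conn.executemany("INSERT INTO bridge VALUES (?, ?, ?, ?)", ROWS)
print(rows_kept(conn, "RANK"), rows_kept(conn, "ROW_NUMBER"))
```

RANK keeps both tied v1 rows (3 rows in total), while ROW_NUMBER keeps one row per partition (2 rows).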

Different results in SQL based on what columns I display

I am trying to run a query to gather the total items on hand in our database. However, it seems I'm getting incorrect data. I am selecting just the amount field and summing it, using joins from separate tables based on certain parameters. However, if I display additional fields such as order number and date, all of a sudden I'm getting different data, even though those fields are being used as filters in the query. Is it because they're not in the SELECT statement? If they need to be in the SELECT statement, is it possible to not display them?
Here are the two queries.
-- Items On Hand
select CONVERT(decimal(25, 2), SUM(tw.amount)) as 'Amt'
from [Sales Header] sh
join
(
select *
from TWAllOrders
where [Status] like 'Released'
) tw
on tw.[Order Nb] = sh.No_
join
(
select *
from OnHand
) oh
on tw.No_ = oh.[Item No_]
where sh.[Requested Delivery Date] < getdate()
HAVING SUM(tw.Quantity) <= SUM(oh.Qty)
This returns a sum of 21667457.20.
And with the added columns:
-- Items On Hand
select CONVERT(decimal(25, 2), SUM(tw.amount)) as 'Amt', [Requested Delivery Date], sh.No_, tw.[Status]
from [Sales Header] sh
join
(
select *
from TWAllOrders
where [Status] like 'Released'
) tw
on tw.[Order Nb] = sh.No_
join
(
select *
from OnHand
) oh
on tw.No_ = oh.[Item No_]
where sh.[Requested Delivery Date] < getdate()
group by sh.[Requested Delivery Date], sh.No_, tw.[Status]
HAVING SUM(tw.Quantity) <= SUM(oh.Qty)
order by sh.[Requested Delivery Date] ASC
This gives a total of 12319998.
I'm self-taught in SQL, so I may be misunderstanding something obvious. Thanks for the help.
With no sample data, I am going to have to demonstrate this in principle. In the latter query you have a GROUP BY, meaning the scope of the values in the HAVING will differ, and thus the filtering from said HAVING will be different.
Let's take the following sample data:
CREATE TABLE dbo.MyTable (Grp char(1),
Quantity int,
Required int);
INSERT INTO dbo.MyTable (Grp, Quantity, [Required])
VALUES('a',2,7),
('a',14,2),
('b',4, 7),
('b',3,4),
('c',17,5);
Now we'll perform an overly simplified version of your query:
SELECT SUM(Quantity)
FROM dbo.MyTable
HAVING SUM(Quantity) > SUM(Required);
This brings back the value 40, which is the SUM of all the values in Quantity. A value is returned because the total SUM of Required is only 25.
Now let's add a GROUP BY like your second query:
SELECT SUM(Quantity)
FROM dbo.MyTable
GROUP BY Grp
HAVING SUM(Quantity) > SUM(Required);
Now we have 2 rows, with the values 16 and 17, giving a total value of 33. That's because the rows where Grp has a value of 'b' are filtered out, as the SUM of Quantity (7) is lower than the SUM of Required (11) for 'b'.
The same is happening in your data; in the grouped data you have groups where the HAVING condition isn't met, so those rows aren't returned.
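The sample above can be reproduced end to end with the sketch below, which runs the same data in SQLite via Python's sqlite3 (no dbo schema). The ungrouped case is written as a filter over a derived table instead of HAVING without GROUP BY, because older SQLite releases reject a HAVING clause without GROUP BY; the logic is the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (Grp TEXT, Quantity INTEGER, Required INTEGER)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?)",
    [("a", 2, 7), ("a", 14, 2), ("b", 4, 7), ("b", 3, 4), ("c", 17, 5)],
)

# Whole-table scope: a single aggregate row, filtered once (40 > 25).
ungrouped = conn.execute("""
    SELECT total
    FROM (SELECT SUM(Quantity) AS total, SUM(Required) AS req FROM MyTable)
    WHERE total > req
""").fetchall()

# Per-group scope: each group is filtered separately, so 'b' (7 vs 11) drops out.
grouped = conn.execute("""
    SELECT Grp, SUM(Quantity)
    FROM MyTable
    GROUP BY Grp
    HAVING SUM(Quantity) > SUM(Required)
    ORDER BY Grp
""").fetchall()

print(ungrouped)
print(grouped)
```

The first query yields the single total 40; the second yields 16 for 'a' and 17 for 'c', which add up to 33.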

SQL - Returning CTE with Top 1

I am trying to return a set of results and decided to try my luck with CTE, the first table "Vendor", has a list of references, the second table "TVView", has ticket numbers that were created using a reference from the "Vendor" table. There may be one or more tickets using the same ticket number depending on the state of that ticket and I am wanting to return the last entry for each ticket found in "TVView" that matches a selected reference from "Vendor". Also, the "TVView" table has a seed field that is incremented.
I got this to return the right number of entries (meaning each duplicate ticket is shown only once), but I cannot figure out how to add an additional layer to go back through, select the last entry for that ticket, and return some other fields. I can figure out how to sum, which is actually easy, but I really need the top 1 of each ticket entry in "TVView", regardless of whether it's a duplicate or not, while returning all references from "Vendor". It would be nice if SQL supported "Last".
How do you do that?
Here is what I have done so far:
with cteTickets as (
Select s.Mth2, c.Ticket, c.PyRt from Vendor s
Inner join
TVView c on c.Mth1 = s.Mth1 and c.Vendor = s.Vendor
)
Select Mth2, Ticket, PayRt from cteTickets
Where cteTickets.Vendor >='20'
and cteTickets.Vendor <='40'
and cteTickets.Mth2 ='8/15/2014'
Group by cteTickets.Ticket
order by cteTickets.Ticket
Several RDBMSs that support Common Table Expressions (CTEs) that I am aware of also support analytic functions, including the very useful ROW_NUMBER(), so the following should work in Oracle, T-SQL (MSSQL/Sybase), DB2 and PostgreSQL.
In the suggestions, the intention is to return just the most recent entry for each ticket found in TVView. This is done with ROW_NUMBER() partitioned by Ticket, which makes ROW_NUMBER restart its numbering each time the Ticket value changes. The subsequent ORDER BY Mth1 DESC determines which record within each partition is assigned 1; here it will be the one with the most recent date.
The output of row_number() needs to be referenced by a column alias, so using it in a CTE or derived table permits selection of just the most recent records by RN = 1 which you will see used in both options below:
-- using a CTE
WITH
TVLatest
AS (
SELECT
* -- specify the fields
, ROW_NUMBER() OVER (PARTITION BY Ticket
ORDER BY Mth1 DESC) AS RN
FROM TVView
)
SELECT
Mth2
, Ticket
, PayRt
FROM Vendor v
INNER JOIN TVLatest l ON v.Mth1 = l.Mth1
AND v.Vendor = l.Vendor
AND l.RN = 1
WHERE v.Vendor >= '20'
AND v.Vendor <= '40'
AND v.Mth2 = '2014-08-15'
ORDER BY
l.Ticket
;
-- using a derived table instead
SELECT
Mth2
, Ticket
, PayRt
FROM Vendor v
INNER JOIN (
SELECT
* -- specify the fields
, ROW_NUMBER() OVER (PARTITION BY Ticket
ORDER BY Mth1 DESC) AS RN
FROM TVView
) l ON v.Mth1 = l.Mth1
AND v.Vendor = l.Vendor
AND l.RN = 1
WHERE v.Vendor >= '20'
AND v.Vendor <= '40'
AND v.Mth2 = '2014-08-15'
ORDER BY
l.Ticket
;
Please note: "SELECT *" is a convenience, or used as an abbreviation when the full details are unknown. The queries above may not work without correctly specifying the field list (e.g. 'as is' they would fail in Oracle).
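The latest-row-per-ticket pattern can be checked in isolation with the sketch below (SQLite 3.25+ via Python's sqlite3). It orders by a hypothetical Seq column standing in for the incrementing seed field the question mentions, on the assumption that the seed reflects insertion order; ordering by a month column alone could leave ties within a ticket. The rows are made up, and the Vendor join is left out to keep the focus on RN = 1.

```python
import sqlite3

# Hypothetical TVView rows: (Seq, Ticket, PayRt). Seq stands in for the
# incrementing seed column mentioned in the question.
ROWS = [
    (1, "T100", 10.0),
    (2, "T100", 12.5),
    (3, "T200", 8.0),
    (4, "T100", 15.0),  # latest entry for T100
    (5, "T200", 9.0),   # latest entry for T200
]

LATEST_PER_TICKET = """
    WITH TVLatest AS (
        SELECT Seq, Ticket, PayRt,
               ROW_NUMBER() OVER (PARTITION BY Ticket ORDER BY Seq DESC) AS RN
        FROM TVView
    )
    SELECT Ticket, PayRt FROM TVLatest WHERE RN = 1 ORDER BY Ticket
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TVView (Seq INTEGER PRIMARY KEY, Ticket TEXT, PayRt REAL)")
conn.executemany("INSERT INTO TVView VALUES (?, ?, ?)", ROWS)
print(conn.execute(LATEST_PER_TICKET).fetchall())
```

Each ticket appears once, carrying the PayRt of its highest Seq.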

Need to do this in 1 SQL

I have to do the following processing in SQL. Each order is made up of many detail rows. I only need to look at one table, TRA99.
Order number | TRAN CODE
123          | QEE
123          | #23
123          | ABC
SELECT
ALL OTRIDC, OTCOM#, OTORD#, OTFL50, OTTRND, OTTRT, OTENT#,
OTSFX#,
OTREL#, OTUSRN, OTTRNC, OTTRN$, OTFL01
FROM ASTDTA.OETRANOT T01
WHERE OTTRNC IN ('QEE', 'QNE')
I want all the order numbers which have 'QEE' or 'QNE'. These are quote codes. We want a report that tells us which quote orders converted to a real order and which did not.
Then, if they also have #23, this tells me that the order was converted, i.e. became an actual order. I am not sure how to do this in 1 SQL query; I was thinking of creating a view for all QEE and QNE codes, then running a second query against that looking for #23.
Based on what I think you're trying to do, this will give you all the order numbers that are associated with both (QEE or QNE) and #23:
SELECT T1.OrderNumber
FROM TRA99 T1
WHERE T1.OrderNumber in (
SELECT OrderNumber
FROM TRA99
WHERE TCode IN ('QEE','QNE')
)
AND T1.TCode='#23'
GROUP BY T1.OrderNumber
What you need to do is use a GROUP BY; an EXISTS subquery can then check for the existence of your flag:
select t1.[Order Number]
,(CASE WHEN EXISTS(select *
from TRA99 as t2
where t1.[Order Number] = t2.[Order Number]
and t2.[Tran Code] = '#23')
THEN
cast(1 as bit)
ELSE
cast(0 as bit)
END) as HasFlag
from TRA99 as t1
where t1.[Tran Code] in ('QEE', 'QNE')
group by t1.[Order Number]
NOTE: you did not list what DBMS you are using, so I wrote this against Microsoft SQL Server syntax, but you can translate this concept to any DBMS you need.
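As one such translation, here is a sketch of the EXISTS-flag report in SQLite via Python's sqlite3. SQLite has no bit type, so the flag is a plain 1/0 integer. The column names order_number and tran_code and the rows for orders 124 and 125 are hypothetical; order 123 uses the question's sample codes.

```python
import sqlite3

# Rows: (order_number, tran_code). Orders 124 and 125 are hypothetical.
ROWS = [
    (123, "QEE"), (123, "#23"), (123, "ABC"),  # quote that converted
    (124, "QNE"), (124, "ABC"),                # quote that did not convert
    (125, "ABC"),                              # not a quote at all
]

CONVERSION_REPORT = """
    SELECT t1.order_number,
           CASE WHEN EXISTS (SELECT 1 FROM TRA99 AS t2
                             WHERE t2.order_number = t1.order_number
                               AND t2.tran_code = '#23')
                THEN 1 ELSE 0 END AS converted
    FROM TRA99 AS t1
    WHERE t1.tran_code IN ('QEE', 'QNE')   -- only quote orders
    GROUP BY t1.order_number
    ORDER BY t1.order_number
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TRA99 (order_number INTEGER, tran_code TEXT)")
conn.executemany("INSERT INTO TRA99 VALUES (?, ?)", ROWS)
print(conn.execute(CONVERSION_REPORT).fetchall())
```

Order 123 is flagged as converted, 124 is listed as an unconverted quote, and 125 is excluded because it never had a quote code.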

Datediff between two tables

I have these two tables:
1-Add to queue table
TransID , ADD date
10 , 10/10/2012
11 , 14/10/2012
11 , 18/11/2012
11 , 25/12/2012
12 , 1/1/2013
2-Removed from queue table
TransID , Removed Date
10 , 15/1/2013
11 , 12/12/2012
11 , 13/1/2013
11 , 20/1/2013
TransID is the key between the two tables, and I can't modify those tables. What I want is to query the amount of time each transaction spent in the queue.
It's easy when there is one item in each table, but when an item gets queued more than once, how do I calculate that?
Assuming each TransID's entries are removed in the same order in which they were added, you can use the following:
WITH OrderedAdds AS
( SELECT TransID,
AddDate,
[RowNumber] = ROW_NUMBER() OVER(PARTITION BY TransID ORDER BY AddDate)
FROM AddTable
), OrderedRemoves AS
( SELECT TransID,
RemovedDate,
[RowNumber] = ROW_NUMBER() OVER(PARTITION BY TransID ORDER BY RemovedDate)
FROM RemoveTable
)
SELECT OrderedAdds.TransID,
OrderedAdds.AddDate,
OrderedRemoves.RemovedDate,
[DaysInQueue] = DATEDIFF(DAY, OrderedAdds.AddDate, ISNULL(OrderedRemoves.RemovedDate, CURRENT_TIMESTAMP))
FROM OrderedAdds
LEFT JOIN OrderedRemoves
ON OrderedAdds.TransID = OrderedRemoves.TransID
AND OrderedAdds.RowNumber = OrderedRemoves.RowNumber;
The key part is that each record gets a row number based on the transaction ID and the date it was entered; you can then join on both the row number and TransID to avoid any cross joining.
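The pairing can be checked against the question's own data with the sketch below. It runs in SQLite 3.25+ via Python's sqlite3, with the dates rewritten in ISO format (the question uses day/month/year) and julianday() standing in for DATEDIFF. Unlike the T-SQL version, it leaves DaysInQueue as NULL for a transaction that is still queued instead of measuring up to CURRENT_TIMESTAMP, so the output stays deterministic.

```python
import sqlite3

# The question's data, rewritten as ISO dates.
ADDS = [(10, "2012-10-10"), (11, "2012-10-14"), (11, "2012-11-18"),
        (11, "2012-12-25"), (12, "2013-01-01")]
REMOVES = [(10, "2013-01-15"), (11, "2012-12-12"), (11, "2013-01-13"),
           (11, "2013-01-20")]

PAIRED = """
    WITH OrderedAdds AS (
        SELECT TransID, AddDate,
               ROW_NUMBER() OVER (PARTITION BY TransID ORDER BY AddDate) AS rn
        FROM AddTable
    ), OrderedRemoves AS (
        SELECT TransID, RemovedDate,
               ROW_NUMBER() OVER (PARTITION BY TransID ORDER BY RemovedDate) AS rn
        FROM RemoveTable
    )
    SELECT a.TransID, a.AddDate, r.RemovedDate,
           -- difference in whole days; NULL when there is no matching removal
           CAST(julianday(r.RemovedDate) - julianday(a.AddDate) AS INTEGER) AS DaysInQueue
    FROM OrderedAdds AS a
    LEFT JOIN OrderedRemoves AS r
           ON a.TransID = r.TransID AND a.rn = r.rn
    ORDER BY a.TransID, a.AddDate
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE AddTable (TransID INTEGER, AddDate TEXT)")
conn.execute("CREATE TABLE RemoveTable (TransID INTEGER, RemovedDate TEXT)")
conn.executemany("INSERT INTO AddTable VALUES (?, ?)", ADDS)
conn.executemany("INSERT INTO RemoveTable VALUES (?, ?)", REMOVES)
rows = conn.execute(PAIRED).fetchall()
for row in rows:
    print(row)
```

TransID 10 spends 97 days in the queue; TransID 11's three visits come out as 59, 56 and 26 days; TransID 12 gets NULL because it has not been removed yet.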
DISCLAIMER: There are probably problems with this, but I hope it points you in one possible direction.
You can try in the following direction (which might work in some way depending on your system, version, etc) :
SELECT transId, (sum(remove_date_sum) - sum(add_date_sum)) / (60*60*24) AS days_in_queue
FROM
(
SELECT transId, SUM(UNIX_TIMESTAMP(add_date)) as add_date_sum, 0 as remove_date_sum
FROM add_to_queue
GROUP BY transId
UNION ALL
SELECT transId, 0 as add_date_sum, SUM(UNIX_TIMESTAMP(remove_date)) as remove_date_sum
FROM remove_from_queue
GROUP BY transId
) AS t
GROUP BY transId;
A bit of explanation: as far as I know, you cannot sum dates, but you can convert them to some sort of timestamp. Check whether UNIX_TIMESTAMP works for you, or figure out something else. Then you can sum in each table, build a union by conveniently leaving the other column as zero, and then subtract in the outer query. This only gives a sensible total when every add has a matching removal; a transaction that is still queued will produce a meaningless value.
As for the division at the end of the first SELECT: UNIX_TIMESTAMP returns seconds, so you divide by 60*60*24 to get days (or whatever unit you want).
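The sum-of-timestamps idea can be checked on the question's data with the sketch below. It runs in SQLite via Python's sqlite3, with strftime('%s', ...) standing in for MySQL's UNIX_TIMESTAMP, and it unions raw rows instead of pre-aggregating each branch, which gives the same totals. TransID 12 is deliberately left out because it has no removal yet.

```python
import sqlite3

# TransIDs 10 and 11 from the question, as ISO dates.
ADDS = [(10, "2012-10-10"), (11, "2012-10-14"), (11, "2012-11-18"), (11, "2012-12-25")]
REMOVES = [(10, "2013-01-15"), (11, "2012-12-12"), (11, "2013-01-13"), (11, "2013-01-20")]

SUM_DIFF = """
    SELECT TransID,
           (SUM(remove_secs) - SUM(add_secs)) / 86400 AS days_in_queue  -- seconds to days
    FROM (
        SELECT TransID, CAST(strftime('%s', AddDate) AS INTEGER) AS add_secs, 0 AS remove_secs
        FROM AddTable
        UNION ALL
        SELECT TransID, 0, CAST(strftime('%s', RemovedDate) AS INTEGER)
        FROM RemoveTable
    ) AS t
    GROUP BY TransID
    ORDER BY TransID
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE AddTable (TransID INTEGER, AddDate TEXT)")
conn.execute("CREATE TABLE RemoveTable (TransID INTEGER, RemovedDate TEXT)")
conn.executemany("INSERT INTO AddTable VALUES (?, ?)", ADDS)
conn.executemany("INSERT INTO RemoveTable VALUES (?, ?)", REMOVES)
print(conn.execute(SUM_DIFF).fetchall())
```

TransID 11 comes out at 141 days, which equals its three individual stays (59 + 56 + 26) added together. Subtracting the sums works without pairing rows as long as every add has a removal.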
That said, I would probably solve this using a stored procedure or some client script. SQL is not a weapon for every battle, and making two separate queries can be much simpler.
Answer 2: after your comments. (As a side note, some of your dates, e.g. 15/1/2013 and 13/1/2013, are not valid in month/day/year format.)
select transId, sum(numberOfDays) totalQueueTime
from (
select a.transId,
datediff(day,a.addDate,isnull(r.removeDate,a.addDate)) numberOfDays
from AddTable a left join RemoveTable r on a.transId = r.transId
) X
group by transId
Answer 1: before your comments
This assumes that a new record won't be added unless the previous one has been removed. Also note that the following query returns numberOfDays as zero for records that haven't been removed:
select a.transId, a.addDate, r.removeDate,
datediff(day,a.addDate,isnull(r.removeDate,a.addDate)) numberOfDays
from AddTable a left join RemoveTable r on a.transId = r.transId
order by a.transId, a.addDate, r.removeDate
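One thing to watch with joining on transId alone: when a TransID appears several times in both tables, every add row is paired with every remove row. The sketch below shows this for TransID 11 from the question. It runs in SQLite via Python's sqlite3, with julianday() and COALESCE standing in for DATEDIFF and ISNULL. The three adds and three removes produce 9 joined rows, and summing them gives 423 days, whereas the matching pairs alone (59, 56 and 26 days) add up to 141.

```python
import sqlite3

# TransID 11 from the question, as ISO dates.
ADDS = [(11, "2012-10-14"), (11, "2012-11-18"), (11, "2012-12-25")]
REMOVES = [(11, "2012-12-12"), (11, "2013-01-13"), (11, "2013-01-20")]

NAIVE = """
    SELECT COUNT(*),
           SUM(CAST(julianday(COALESCE(r.RemovedDate, a.AddDate))
                    - julianday(a.AddDate) AS INTEGER))
    FROM AddTable AS a
    LEFT JOIN RemoveTable AS r ON a.TransID = r.TransID  -- join on TransID only
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE AddTable (TransID INTEGER, AddDate TEXT)")
conn.execute("CREATE TABLE RemoveTable (TransID INTEGER, RemovedDate TEXT)")
conn.executemany("INSERT INTO AddTable VALUES (?, ?)", ADDS)
conn.executemany("INSERT INTO RemoveTable VALUES (?, ?)", REMOVES)
print(conn.execute(NAIVE).fetchone())
```

Some of the cross-paired rows even have negative durations (a removal dated before the add it was matched with), which is a quick sign that the join is pairing the wrong rows.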