How to do a query with unique results between two tables? - sql

I need a query that returns the last accessed file per user in SCCM 2012. I'm writing it in SQL, but I'm getting a lot of duplicate results.
The result must contain only the most recent date for each user.
Here is the query I'm using:
SELECT
dbo.v_GS_SoftwareFile.FileName,
dbo.v_R_System.User_Name0,
dbo.v_GS_SoftwareFile.FileModifiedDate
FROM
dbo.v_GS_SoftwareFile
CROSS JOIN dbo.v_R_System
WHERE
(dbo.v_GS_SoftwareFile.FileName = N'outlook.exe')
AND (dbo.v_GS_SoftwareFile.FileModifiedDate > CONVERT(DATETIME, '2015-02-01 00:00:00', 102))
GROUP BY
dbo.v_R_System.User_Name0,
dbo.v_GS_SoftwareFile.FileName,
dbo.v_GS_SoftwareFile.FileModifiedDate
What do I need to add to this query?

Your CROSS JOIN might be responsible for the 'duplicate results' you report, since you don't have an actual join condition there (so, if you have 10 records in one table, and 100 records in another, you will have 10x100=1000 records). Is there a common key between your SoftwareFile and System tables?
Once you've added the JOIN condition, to get it down to a single date per user, use the MAX() function as follows:
SELECT
dbo.v_GS_SoftwareFile.FileName,
dbo.v_R_System.User_Name0,
MAX(dbo.v_GS_SoftwareFile.FileModifiedDate) AS LastFileModifiedDate
FROM
dbo.v_GS_SoftwareFile
CROSS JOIN -- replace with INNER JOIN ... ON <common key> once the join condition is known
dbo.v_R_System
WHERE
(dbo.v_GS_SoftwareFile.FileName = N'outlook.exe')
AND (dbo.v_GS_SoftwareFile.FileModifiedDate > CONVERT(DATETIME, '2015-02-01 00:00:00', 102))
GROUP BY
dbo.v_R_System.User_Name0,
dbo.v_GS_SoftwareFile.FileName
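As a rough sketch of how the two steps combine (a real join condition plus MAX()), the T-SQL below runs against made-up sample rows instead of the SCCM views. The table names SoftwareFile and SystemInfo, the sample data, and the use of ResourceID as the common key are all assumptions for illustration; check which key your own views actually share before relying on it.

```sql
-- Hypothetical sample rows standing in for v_GS_SoftwareFile and v_R_System.
-- ResourceID as the shared key is an assumption, not confirmed by the question.
WITH SoftwareFile (ResourceID, FileName, FileModifiedDate) AS (
    SELECT 1, N'outlook.exe', CAST('2015-02-10' AS DATETIME) UNION ALL
    SELECT 1, N'outlook.exe', CAST('2015-03-05' AS DATETIME) UNION ALL
    SELECT 2, N'outlook.exe', CAST('2015-02-20' AS DATETIME)
),
SystemInfo (ResourceID, User_Name0) AS (
    SELECT 1, N'alice' UNION ALL
    SELECT 2, N'bob'
)
SELECT
    sf.FileName,
    s.User_Name0,
    MAX(sf.FileModifiedDate) AS LastFileModifiedDate  -- one date per user
FROM SoftwareFile AS sf
INNER JOIN SystemInfo AS s
    ON s.ResourceID = sf.ResourceID                   -- join condition replaces CROSS JOIN
WHERE sf.FileName = N'outlook.exe'
  AND sf.FileModifiedDate > '20150201'
GROUP BY s.User_Name0, sf.FileName;
```

With this sample data the query returns one row per user: alice with 2015-03-05 and bob with 2015-02-20.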

Related

How can I get the first instance of an event per day with multiple columns including a datetime and return those columns plus the full datetime value?

I need to generate a SQL script that will pull out Distinct entries using a number of columns, one of which is a datetime column. I am only interested in the first occurrence of the day per event and the query needs to span multiple days. The query will be run against a very large database and can potentially be returning hundreds of thousands of results if not millions. Therefore I need this script to be as efficient as possible as well. This will eventually be a script running in SSRS to pull access transactions.
I've tried using GROUP BY, DISTINCT, subqueries, FIRST, and such without success. All the examples I can find online don't have JOIN statements or calculated columns such as only gathering the date from a datetime field.
I've simplified the below script some to only pull one day and one door, but the prod will be multiple days and doors. This code returns the data I need, I don't care about the COUNT, but I also need to get the (DateAdd(minute,-(ServerLocaleOffset),ServerUTC)) field in my result set as well somehow. The problem is since it goes down to the second it makes all records DISTINCT.
DECLARE @Begin datetime2 = '4/10/2019',
@End datetime2 = '4/11/2019',
@Door varchar(max) = 'Front Entrance'
SELECT
CONVERT(VARCHAR(10), (DateAdd(minute,-(ServerLocaleOffset),ServerUTC)),101) AS 'Date'
,AJ.PrimaryObjectIdentity
,AJ.SecondaryObjectIdentity
,AJ.MessageType
,AJ.PrimaryObjectName
,AJ.SecondaryObjectName
,AP.Text13
,COUNT(*) AS 'Count'
FROM Access.JournalLogView AJ
LEFT OUTER JOIN Access.Personnel as AP on AP.GUID = AJ.PrimaryObjectIdentity
WHERE (MessageType like 'CardAdmitted' OR MessageType like 'CardRejected')
AND (DateAdd(minute,-(ServerLocaleOffset),ServerUTC)) BETWEEN @Begin AND @End
AND (SecondaryObjectName IN (@Door))
GROUP BY CONVERT(VARCHAR(10), (DateAdd(minute,-(ServerLocaleOffset),ServerUTC)),101)
,PrimaryObjectIdentity
,SecondaryObjectIdentity
,MessageType
,PrimaryObjectName
,SecondaryObjectName
,Text13
ORDER BY AJ.PrimaryObjectName
I want to get the columns called out in the SELECT statement plus the datetime which includes the second. Again I also want the most efficient way of pulling this data as well. Thank you very much.
Assuming PrimaryObjectIdentity identifies the person in JournalLogView and that ServerUTC shifted by ServerLocaleOffset is the local datetime of each event, I have written this:
DECLARE @Begin datetime2 = '4/10/2019',
@End datetime2 = '4/11/2019',
@Door varchar(max) = 'Front Entrance';
WITH cte
AS(
SELECT
ROW_NUMBER() OVER
(PARTITION BY PrimaryObjectIdentity,CONVERT(VARCHAR(10), (DateAdd(minute,-(ServerLocaleOffset),ServerUTC)),101) ORDER BY DateAdd(minute,-(ServerLocaleOffset),ServerUTC)) AS row_num,
--whatever the columns you want here
*
FROM
Access.JournalLogView
--filter before numbering, so that row_num = 1 is the first matching event of the day
WHERE (MessageType like 'CardAdmitted' OR MessageType like 'CardRejected')
AND (DateAdd(minute,-(ServerLocaleOffset),ServerUTC)) BETWEEN @Begin AND @End
AND (SecondaryObjectName IN (@Door)))
SELECT
DateAdd(minute,-(AJ.ServerLocaleOffset),AJ.ServerUTC) AS 'DateTime'
,AJ.PrimaryObjectIdentity
,AJ.SecondaryObjectIdentity
,AJ.MessageType
,AJ.PrimaryObjectName
,AJ.SecondaryObjectName
,AP.Text13
--COUNT(*) and GROUP BY are dropped: only the first row per user and date is selected
FROM cte AJ
LEFT OUTER JOIN
Access.Personnel as AP
on
AP.GUID = AJ.PrimaryObjectIdentity
WHERE
AJ.row_num = 1
ORDER BY AJ.PrimaryObjectName
This query uses PARTITION BY to split the table by user and date, and ROW_NUMBER() numbers the rows within each partition, starting from that user's first entry on that date. Any row with row_num = 1 is therefore the user's first entry for that date, which is the condition used in the WHERE clause. Hope this helps :)
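To see the first-row-per-partition idea in isolation, here is a minimal T-SQL sketch on invented data. The Journal table, its columns (PersonId, EventTime, Door) and the sample rows are hypothetical and only stand in for the real view.

```sql
-- Hypothetical journal rows: two people, two days.
WITH Journal (PersonId, EventTime, Door) AS (
    SELECT 1, CAST('2019-04-10 08:01:15' AS DATETIME2), 'Front Entrance' UNION ALL
    SELECT 1, CAST('2019-04-10 12:30:00' AS DATETIME2), 'Front Entrance' UNION ALL
    SELECT 1, CAST('2019-04-11 07:55:42' AS DATETIME2), 'Front Entrance' UNION ALL
    SELECT 2, CAST('2019-04-10 09:10:05' AS DATETIME2), 'Front Entrance'
),
Numbered AS (
    SELECT
        PersonId,
        EventTime,
        Door,
        -- Restart the numbering for every person and calendar day,
        -- earliest event first.
        ROW_NUMBER() OVER (
            PARTITION BY PersonId, CAST(EventTime AS DATE)
            ORDER BY EventTime
        ) AS row_num
    FROM Journal
)
SELECT PersonId, EventTime, Door   -- full datetime is kept, down to the second
FROM Numbered
WHERE row_num = 1
ORDER BY PersonId, EventTime;
```

On this data the result has three rows: person 1 at 2019-04-10 08:01:15 and 2019-04-11 07:55:42, and person 2 at 2019-04-10 09:10:05. Partitioning on CAST(... AS DATE) rather than a formatted VARCHAR is a design choice of this sketch, not something the answer above prescribes.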

Joining multiple tables returning duplicates

I am trying the following select statement, which includes columns from 4 tables. The results return each row 4 times. I'm sure this is because I have multiple left joins, but I have tried other joins and cannot get the desired result.
select table1.empid,table2.name,table2.datefrom, table2.UserDefNumber1, table3.UserDefNumber1, table4.UserDefChar6
from table1
inner join table2
on table2.empid=table1.empid
inner join table3
on table3.empid=table1.empid
inner join table4
on table4.empid=table1.empid
where MONTH(table2.datefrom) = Month (Getdate())
I need this to return the data without any duplicates, so only 1 row for each entry.
I would also like the "where Month" clause at the end to look at the previous month, not the current month, but I'm struggling with that too.
I am a bit new to this, so I hope it makes sense.
Thanks
If the duplicate rows are identical on each column you can use the DISTINCT keyword to eliminate those duplicates.
But I think you should reconsider your JOIN or WHERE clause, because there has to be a reason for those duplicates:
The WHERE clause matches several rows in table2 that have the same month for a single empid.
There are several rows with the same empid in one of the other tables.
Both of the above are true.
You may want to rule those duplicate rows out with conditions in the WHERE/JOIN clauses instead of the DISTINCT keyword. DISTINCT only removes rows that are identical in every column, so as soon as some value changes in one of the underlying rows you start getting duplicate empids again.
You can check whether a date is in the previous month with the following clause:
date >= dateadd(mm, -1, datefromparts(year(getdate()), month(getdate()), 1))
AND date < datefromparts(year(getdate()), month(getdate()), 1)
This statement uses DATEFROMPARTS to build the beginning of the current month twice, subtracts a month from the first one with DATEADD (which gives the beginning of the previous month), and checks that date falls between the two. The upper bound is exclusive because BETWEEN would also match midnight on the first day of the current month.
If your query is returning duplicates, then one or more of the tables have duplicate empid values. This is a data problem. You can find them with queries like this:
select empid, count(*)
from table1
group by empid
having count(*) > 1;
You should really fix the data and query so it returns what you want. You can do a bandage solution with select distinct, but I would not usually recommend that. Something is causing the duplicates, and if you do not understand why, then the query may not be returning the results you expect.
As for your where clause. Given your logic, the proper way to express this would include the year:
where year(table2.datefrom) = year(getdate()) and
month(table2.datefrom) = month(Getdate())
Although there are other ways to express this logic that are more compatible with indexes, you can continue down this course with:
where year(table2.datefrom) * 12 + month(table2.datefrom) = year(getdate()) * 12 + Month(Getdate()) - 1
That is, convert the months to a number of months since time zero and then use month arithmetic.
If you care about indexes, then your current where clause would look like:
where table2.datefrom >= dateadd(day,
- (day(getdate()) - 1),
cast(getdate() as date)) and
table2.datefrom < dateadd(day,
- (day(dateadd(month, 1, getdate())) - 1),
cast(dateadd(month, 1, getdate()) as date))
Eliminate duplicates from your query by including the distinct keyword immediately after select
Comparing against a previous month is slightly more complicated. It depends what you mean:
If the report was run on the 23rd Jan 2015, would you want 01/12/2014-31/12/2014 or 23/12/2014-22/01/2015?
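The two readings of "previous month" can be made concrete with a small T-SQL sketch. The variable names are made up for illustration, and @today is fixed to the 23rd Jan 2015 example instead of calling GETDATE(). Each pair is meant to be used as a half-open range (>= From and < Before).

```sql
-- @today is a stand-in for CAST(GETDATE() AS date); fixed here for the example.
DECLARE @today date = '2015-01-23';
DECLARE @thisMonthStart date = DATEFROMPARTS(YEAR(@today), MONTH(@today), 1);

SELECT
    DATEADD(month, -1, @thisMonthStart) AS CalendarFrom,    -- 2014-12-01
    @thisMonthStart                     AS CalendarBefore,  -- 2015-01-01 (exclusive)
    DATEADD(month, -1, @today)          AS RollingFrom,     -- 2014-12-23
    @today                              AS RollingBefore;   -- 2015-01-23 (exclusive)
```

A filter such as datefrom >= CalendarFrom AND datefrom < CalendarBefore covers all of December 2014, while the rolling pair covers 23/12/2014 through 22/01/2015.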

SQL Query, return value from table with no common key

I'm hoping for an idea on the best way to approach what I'm trying to do.
I have a table with a list of transactions. Each transactions has a PostDate in DateTime format. I have another table holding the fiscal period values. This table has the following columns; FiscalYear, FiscalMonth, StartDate, EndDate.
I'm trying to write a query that will return all values from my transactions table, along with the FiscalYear and FiscalMonth of the PostDate. So I guess I'm just trying to return the FiscalYear and FiscalMonth values when the PostDate falls between the StartDate and EndDate.
I've tried using a subquery, but I have little experience with them and kept getting an error message that the subquery was returning more than 1 value. Help would be appreciated.
EDIT: Sorry, here is the query I tried. I also changed the title from "with no join" to "with no common key" to more accurately reflect my problem.
SELECT Transactions.PostDate, Transactions.TranKey, Transactions.CustKey,
(SELECT FiscalPeriod.FiscPer
FROM FiscalPeriod
WHERE (Transactions.PostDate > CONVERT(Datetime, FiscalPeriod.StartDate, 102)) AND (Transactions.PostDate < CONVERT(DATETIME, FiscalPeriod.EndDate, 102))) AS FisPer
FROM Transactions
You should be able to eliminate the subquery and use a join like this:
SELECT Transactions.PostDate, Transactions.TranKey, Transactions.CustKey, FiscPer
FROM Transactions
INNER JOIN FiscalPeriod ON (PostDate BETWEEN StartDate AND EndDate)
This is not quite the same, though: the subquery version returns every transaction even when its PostDate isn't covered by the fiscal table. If you want that behaviour, change the join to a LEFT JOIN.
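A minimal T-SQL sketch of the range join with the LEFT JOIN variant, on invented rows. The sample data is hypothetical, and the sketch assumes EndDate holds midnight of the last day of the period, which is why the upper bound is written as "before the day after EndDate" rather than BETWEEN.

```sql
-- Hypothetical transactions and fiscal periods.
WITH Transactions (TranKey, PostDate) AS (
    SELECT 1, CAST('2015-01-15 10:30:00' AS DATETIME) UNION ALL
    SELECT 2, CAST('2015-02-28 17:45:00' AS DATETIME) UNION ALL
    SELECT 3, CAST('2016-07-01 09:00:00' AS DATETIME)
),
FiscalPeriod (FiscalYear, FiscalMonth, StartDate, EndDate) AS (
    SELECT 2015, 1, CAST('2015-01-01' AS DATETIME), CAST('2015-01-31' AS DATETIME) UNION ALL
    SELECT 2015, 2, CAST('2015-02-01' AS DATETIME), CAST('2015-02-28' AS DATETIME)
)
SELECT t.TranKey, t.PostDate, fp.FiscalYear, fp.FiscalMonth
FROM Transactions AS t
LEFT JOIN FiscalPeriod AS fp
    ON  t.PostDate >= fp.StartDate
    AND t.PostDate <  DATEADD(day, 1, fp.EndDate)  -- keeps times on the last day
ORDER BY t.TranKey;
```

Transactions 1 and 2 pick up fiscal months 1 and 2 of 2015; transaction 3 has no matching period and comes back with NULLs instead of being dropped. If two periods overlap, a transaction appears once per matching period, which is the same condition that made the original subquery fail.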
Maybe you need to join these two tables on something they have in common, or do something like this:
SELECT Transactions.PostDate, Transactions.TranKey, Transactions.CustKey,
(SELECT DISTINCT FiscalPeriod.FiscPer
FROM FiscalPeriod
WHERE (Transactions.PostDate > CONVERT(Datetime, FiscalPeriod.StartDate, 102)) AND (Transactions.PostDate < CONVERT(DATETIME, FiscalPeriod.EndDate, 102))) AS FisPer
FROM Transactions
Remember, if the subquery matches several different values like this:
2004-01
2004-02
2004-03
for FiscalPeriod.FiscPer, the DISTINCT keyword will not help, because the subquery still returns more than one value.

SQL merging result sets on a unique column value

I have 2 similar queries which both work on the same table, and I essentially want to combine their results such that the second query supplies default values for what the first query doesn't return. I've simplified the problem as much as possible here. I'm using Oracle btw.
The table has account information in it for a number of accounts, and there are multiple entries for each account with a commit_date to tell when the account information was inserted. I need to get the account info which was current for a certain date.
The queries take a list of account ids and a date.
Here is the query:
-- Select the row which was current for the accounts for the given date. (won't return anything for an account which didn't exist for the given date)
SELECT actr.*
FROM Account_Information actr
WHERE actr.account_id in (30000316, 30000350, 30000351)
AND actr.commit_date <= to_date( '2010-DEC-30','YYYY-MON-DD ')
AND actr.commit_date =
(
SELECT MAX(actrInner.commit_date)
FROM Account_Information actrInner
WHERE actrInner.account_id = actr.account_id
AND actrInner.commit_date <= to_date( '2010-DEC-30','YYYY-MON-DD ')
)
This looks a little ugly, but it returns a single row for each account which was current for the given date. The problem is that it doesn't return anything if the account didn't exist until after the given date.
Selecting the earliest account info for each account is trivial - I don't need to supply a date for this one:
-- Select the earliest row for the accounts.
SELECT actr.*
FROM Account_Information actr
WHERE actr.account_id in (30000316, 30000350, 30000351)
AND actr.commit_date =
(
SELECT MAX(actrInner.commit_date)
FROM Account_Information actrInner
WHERE actrInner.account_id = actr.account_id
)
But I want to merge the result sets in such a way that:
For each account, if there is account info for it in the first result set - use that.
Otherwise, use the account info from the second result set.
I've researched all of the joins I can use without success. Unions almost do it but they will only merge for unique rows. I want to merge based on the account id in each row.
Sql Merging two result sets - my case is obviously more complicated than that
SQL to return a merged set of results - I might be able to adapt that technique? I'm a programmer being forced to write SQL and I can't quite follow that example well enough to see how I could modify it for what I need.
The standard way to do this is with a left outer join and coalesce. That is, your overall query will look like this:
SELECT ...
FROM defaultQuery
LEFT OUTER JOIN currentQuery ON ...
If you did a SELECT *, each row would correspond to the current account data plus your defaults. With me so far?
Now, instead of SELECT *, for each column you want to return, you do a COALESCE() on matched pairs of columns:
SELECT COALESCE(currentQuery.columnA, defaultQuery.columnA) ...
This will choose the current account data if present, otherwise it will choose the default data.
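Put together, the left join plus COALESCE pattern might look like the Oracle sketch below. The sample rows and the balance column are invented for illustration; currentQuery and defaultQuery are simplified versions of the two queries from the question.

```sql
-- Hypothetical rows: account 1 existed before the cut-off date, account 2 did not.
WITH Account_Information (account_id, commit_date, balance) AS (
    SELECT 1, DATE '2010-11-01', 100 FROM dual UNION ALL
    SELECT 1, DATE '2011-01-15', 150 FROM dual UNION ALL
    SELECT 2, DATE '2011-02-01', 200 FROM dual UNION ALL
    SELECT 2, DATE '2011-03-01', 250 FROM dual
),
currentQuery AS (   -- row that was current on the given date, if any
    SELECT a.*
    FROM Account_Information a
    WHERE a.commit_date = (SELECT MAX(i.commit_date)
                           FROM Account_Information i
                           WHERE i.account_id = a.account_id
                             AND i.commit_date <= DATE '2010-12-30')
),
defaultQuery AS (   -- fallback row, always present for every account
    SELECT a.*
    FROM Account_Information a
    WHERE a.commit_date = (SELECT MAX(i.commit_date)
                           FROM Account_Information i
                           WHERE i.account_id = a.account_id)
)
SELECT d.account_id,
       COALESCE(c.commit_date, d.commit_date) AS commit_date,
       COALESCE(c.balance, d.balance)         AS balance
FROM defaultQuery d
LEFT OUTER JOIN currentQuery c
    ON c.account_id = d.account_id
ORDER BY d.account_id;
```

Account 1 returns its 2010-11-01 row (balance 100) and account 2 falls back to its default row (2011-03-01, balance 250). One caveat of column-by-column COALESCE: if a column is genuinely NULL in the current row, the default row's value shows through for that column only.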
You can do this more directly using analytic functions:
select *
from (SELECT actr.*, max(commit_date) over (partition by account_id) as maxCommitDate,
max(case when commit_date <= to_date( '2010-DEC-30','YYYY-MON-DD ') then commit_date end) over
(partition by account_id) as MaxCommitDate2
FROM Account_Information actr
WHERE actr.account_id in (30000316, 30000350, 30000351)
) t
where (MaxCommitDate2 is not null and Commit_date = MaxCommitDate2) or
(MaxCommitDate2 is null and Commit_Date = MaxCommitDate)
The subquery calculates two values, the two possibilities of commit dates. The where clause then chooses the appropriate row, using the logic that you want.
I've combined the other answers. Tried it out at apex.oracle.com. Here's some explanation.
MAX(CASE WHEN commit_date <= to_date('2010-DEC-30', 'YYYY-MON-DD') THEN commit_date END) will give us the latest date on or before Dec 30th, or NULL if there isn't one. Combining that with a COALESCE, we get
COALESCE(MAX(CASE WHEN commit_date <= to_date('2010-DEC-30', 'YYYY-MON-DD') THEN commit_date END), MAX(commit_date)).
Now we take the account id and commit date we have and join them with the original table to get all the other fields. Here's the whole query that I came up with:
SELECT *
FROM Account_Information
JOIN (SELECT account_id,
COALESCE(MAX(CASE WHEN commit_date <=
to_date('2010-DEC-30', 'YYYY-MON-DD')
THEN commit_date END),
MAX(commit_date)) AS commit_date
FROM Account_Information
WHERE account_id in (30000316, 30000350, 30000351)
GROUP BY account_id)
USING (account_id, commit_date);
Note that if you do use USING, you have to use * instead of actr.*.

SELECT with MAX and SUM from multiple tables

I have 3 tables :
weather_data (hourly_date, rain)
weather_data_calculated (hourly_date, calc_value)
weather_data_daily (daily_date, daily_value)
I would like to get a list of DAILY value from these 3 tables using this select :
SELECT daily_date, daily_value, SUM(rain), MAX(calc_value)
The SUM and the MAX need to be done for all the hour of the day.
This is what I did :
SELECT
date_format(convert_tz(daily_date, 'GMT', 'America/Los_Angeles'), '%Y-%m-%d 00:00:00') as daily_date_gmt,
daily_value,
SUM(rain),
MAX(calc_value)
FROM weather_data_daily wdd, weather_data wd, weather_data_calculated wdc
WHERE daily_date_gmt=date_format(convert_tz(wd.hourly_date, 'GMT', 'America/Los_Angeles'), '%Y-%m-%d 00:00:00')
and daily_date_gmt=date_format(convert_tz(wdc.hourly_date, 'GMT', 'America/Los_Angeles'), '%Y-%m-%d 00:00:00')
group by daily_date_gmt
order by daily_date_gmt;
This didn't work because I don't know how to deal with the group by in this case.
I also tried to use a temporary table, but without success.
Thanks for your help!
Either include daily_value in your group by, or use two queries. One will contain the date column and the two aggregates, the other will contain the date column and daily value. You can then use a single outer query to join these result sets on the date column.
EDIT: You say in your comment that including daily_value in the group by means the query doesn't complete. This is because (probably) you have no join criteria between all the tables your query includes. This will result in a potentially VERY large result set which would take a very long time. I don't mind helping with the actual SQL but you will need to update your question so that we can see which fields are coming from which tables.
Assuming you only have one entry for daily_date, daily_value in 'weather_data_daily' you should
GROUP BY daily_date, daily_value, then your aggregations (SUM and MAX) will operate on the correct grouping.
try this:
select a.daily_date, a.daily_value, SUM(b.rain), MAX(c.calc_value)
from weather_data_daily a,weather_data b,weather_data_calculated c
where convert(varchar, a.daily_date, 101)=convert(varchar, b.hourly_date, 101)
and convert(varchar, a.daily_date, 101)=convert(varchar, c.hourly_date, 101)
group by a.daily_date, a.daily_value
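You have to connect the tables together somehow (this uses an inner join). That requires getting the hourly dates and the daily dates into the same format; here CONVERT with style 101 gives them the format MM/DD/YYYY. Note that CONVERT(varchar, ..., 101) is SQL Server syntax; in MySQL, which the question uses, DATE_FORMAT(..., '%m/%d/%Y') produces the same format.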
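One thing to watch when all three tables are joined on the day: every hourly rain row pairs with every hourly calc row of the same day, so SUM(rain) counts each value once per calc row. A possible way around that, in line with the "two queries joined by an outer query" suggestion above, is to aggregate each hourly table per day first. The MySQL 8.0+ sketch below uses invented sample rows and leaves out the CONVERT_TZ step for brevity; the names rain_per_day, calc_per_day and the_day are hypothetical.

```sql
-- Hypothetical sample rows for the three tables.
WITH weather_data (hourly_date, rain) AS (
    SELECT TIMESTAMP('2015-03-01 00:00:00'), 1.0 UNION ALL
    SELECT TIMESTAMP('2015-03-01 01:00:00'), 2.0 UNION ALL
    SELECT TIMESTAMP('2015-03-02 00:00:00'), 0.5
),
weather_data_calculated (hourly_date, calc_value) AS (
    SELECT TIMESTAMP('2015-03-01 00:00:00'), 10 UNION ALL
    SELECT TIMESTAMP('2015-03-01 01:00:00'), 30 UNION ALL
    SELECT TIMESTAMP('2015-03-02 00:00:00'), 20
),
weather_data_daily (daily_date, daily_value) AS (
    SELECT DATE('2015-03-01'), 7 UNION ALL
    SELECT DATE('2015-03-02'), 9
),
rain_per_day AS (   -- one row per day, so the later join cannot multiply rows
    SELECT DATE(hourly_date) AS the_day, SUM(rain) AS total_rain
    FROM weather_data
    GROUP BY DATE(hourly_date)
),
calc_per_day AS (
    SELECT DATE(hourly_date) AS the_day, MAX(calc_value) AS max_calc
    FROM weather_data_calculated
    GROUP BY DATE(hourly_date)
)
SELECT wdd.daily_date, wdd.daily_value, r.total_rain, c.max_calc
FROM weather_data_daily AS wdd
LEFT JOIN rain_per_day AS r ON r.the_day = wdd.daily_date
LEFT JOIN calc_per_day AS c ON c.the_day = wdd.daily_date
ORDER BY wdd.daily_date;
```

For 2015-03-01 this gives a rain total of 3.0 and a maximum of 30. Joining the raw hourly tables directly would produce 2 x 2 rows for that day and a rain total of 6.0.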