Joining multiple tables returning duplicates - sql

I am trying the following select statement, which includes columns from 4 tables. But the results return each row 4 times. I'm sure this is because I have multiple left joins, but I have tried other joins and cannot get the desired result.
select table1.empid,table2.name,table2.datefrom, table2.UserDefNumber1, table3.UserDefNumber1, table4.UserDefChar6
from table1
inner join table2
on table2.empid=table1.empid
inner join table3
on table3.empid=table1.empid
inner join table4
on table4.empid=table1.empid
where MONTH(table2.datefrom) = Month (Getdate())
I need this to return the data without any duplicates so only 1 row for each entry.
I would also like the "where Month" clause at the end to look at the previous month, not the current month, but I am struggling with that as well.
I am a bit new to this so I hope it makes sense.
Thanks

If the duplicate rows are identical on each column you can use the DISTINCT keyword to eliminate those duplicates.
But I think you should reconsider your JOIN or WHERE clause, because there has to be a reason for those duplicates:
The WHERE clause matches several rows in table2 that have the same month for a single empid.
There are several rows with the same empid in one of the other tables.
Both of the above are true.
You may want to rule those duplicate rows out by conditions in WHERE/JOIN instead of the DISTINCT keyword as there may be unexpected behaviour when some data is changing in a single row of the original resultset. Then you start having duplicate empids again.
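To see how the second cause multiplies rows, here is a minimal sketch with made-up data (the names t1, t2, t3, val2 and val3 and all the values are assumptions for illustration, not the asker's tables). One empid that appears twice in each of two joined tables comes back 2 x 2 = 4 times:

```sql
-- Hypothetical sample data: empid 1 appears twice in t2 and twice in t3.
WITH t1 AS (SELECT * FROM (VALUES (1)) AS v(empid)),
     t2 AS (SELECT * FROM (VALUES (1, 'a'), (1, 'b')) AS v(empid, val2)),
     t3 AS (SELECT * FROM (VALUES (1, 'x'), (1, 'y')) AS v(empid, val3))
SELECT t1.empid, t2.val2, t3.val3
FROM t1
INNER JOIN t2 ON t2.empid = t1.empid
INNER JOIN t3 ON t3.empid = t1.empid;
-- Returns 4 rows (order not guaranteed): every t2 row paired with every t3 row.
```

If the real tables behave like this, the fix is usually to add a join or WHERE condition that picks a single row per empid from each table.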
You can check whether a date is in the previous month with the following clause:
date >= dateadd(mm, -1, datefromparts(year(getdate()), month(getdate()), 1))
AND date < datefromparts(year(getdate()), month(getdate()), 1)
This statement uses DATEFROMPARTS to create the beginning of the current month twice, subtracts a month from the first one using DATEADD (which gives the beginning of the previous month), and checks that date falls on or after the first value and before the second. A half-open range is used rather than BETWEEN because BETWEEN is inclusive at both ends and would also match the first day of the current month.

If your query is returning duplicates, then one or more of the tables have duplicate empid values. This is a data problem. You can find them with queries like this:
select empid, count(*)
from table1
group by empid
having count(*) > 1;
You should really fix the data and query so it returns what you want. You can do a bandage solution with select distinct, but I would not usually recommend that. Something is causing the duplicates, and if you do not understand why, then the query may not be returning the results you expect.
As for your where clause: given your logic, the proper way to express this would include the year:
where year(table2.datefrom) = year(getdate()) and
month(table2.datefrom) = month(Getdate())
Although there are other ways to express this logic that are more compatible with indexes, you can continue down this course with:
where year(table2.datefrom) * 12 + month(table2.datefrom) = year(getdate()) * 12 + Month(Getdate()) - 1
That is, convert the months to a number of months since time zero and then use month arithmetic.
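As a small worked example of that month arithmetic, the sketch below uses a fixed date in place of getdate() so the year boundary is visible. The variable @today and the two sample dates are assumptions chosen for illustration:

```sql
-- @today stands in for getdate(); the sample dates are made up.
DECLARE @today date = '2015-01-23';

SELECT d.datefrom,
       year(d.datefrom) * 12 + month(d.datefrom) AS month_number,
       year(@today) * 12 + month(@today) - 1     AS previous_month_number
FROM (VALUES (CAST('2014-12-15' AS date)),
             (CAST('2015-01-10' AS date))) AS d(datefrom);
-- 2014-12-15 -> month_number 24180, which equals previous_month_number (24180)
-- 2015-01-10 -> month_number 24181, so it is not in the previous month
```

Because the year is folded into the number, January correctly looks back to December of the year before.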
If you care about indexes, then your current where clause would look like:
where table2.datefrom >= dateadd(day,
- (day(getdate()) - 1),
cast(getdate() as date)) and
table2.datefrom < dateadd(day,
- (day(dateadd(month, 1, getdate())) - 1),
cast(dateadd(month, 1, getdate()) as date))

Eliminate duplicates from your query by including the distinct keyword immediately after select.
Comparing against a previous month is slightly more complicated. It depends what you mean:
If the report was run on the 23rd Jan 2015, would you want 01/12/2014-31/12/2014 or 23/12/2014-22/01/2015?
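The two readings of "previous month" can be sketched like this, using the 23rd Jan 2015 run date from the question above. The variable @today and the column aliases are assumptions for illustration; both ranges are written with an exclusive end so they can be used with >= and <:

```sql
DECLARE @today date = '2015-01-23';  -- assumed run date, taken from the example

SELECT
    -- Reading 1: the previous calendar month, 2014-12-01 up to but excluding 2015-01-01
    dateadd(month, -1, datefromparts(year(@today), month(@today), 1)) AS calendar_start,
    datefromparts(year(@today), month(@today), 1)                     AS calendar_end_exclusive,
    -- Reading 2: the month leading up to the run date, 2014-12-23 up to but excluding 2015-01-23
    dateadd(month, -1, @today) AS rolling_start,
    @today                     AS rolling_end_exclusive;
```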

Related

Using the result of a query to determine a value in a where clause in SQL?

So basically I've got a query where I want to filter using a date, however that date may change depending on what data is in the system.
If there's no records with a date of >='X', I want to use >='Y' instead
Currently I've got something like the following mess (Pseudocoded down to avoid using actual table names and such)
With a as (SELECT
count(column_id) as num
FROM tableA
WHERE ADate >= getdate() - 8)
,
b as (SELECT
case when num = '0' then getdate() - 15 else getdate() - 8 end as DateToUseInQuery
from A)
SELECT *
FROM tableB
Bunch of joins to other tables
WHERE BDate >= DateToUseInQuery
The general idea is that if there are no records for the week beforehand, use 2 weeks beforehand.
I tried using a query within the where clause like:
WHERE BDate >= (SELECT DateToUseInQuery FROM b)
But the query ran for 11 minutes before I stopped it (Up from about 18 seconds before I tried to put this extra bit in)
I've been thinking about trying to set a variable as the date, but I can't do it in a CTE, and when I do it after, it breaks everything else.
So basically:
Is there an easier way to do this than the cack-handed way I'm trying?
If my way is fine, how can I pass that date properly into the WHERE clause?
You could try something like this. I am using a nested select and case statement to determine the number of days.
There may be a prettier way, but this works.
SELECT COUNT(column_id) AS num FROM TableA
WHERE ADate >=
DATEADD(Day,
(SELECT
CASE
WHEN EXISTS(
SELECT 1 FROM TableA
WHERE ADate >= DATEADD(Day,-8,GETDATE()))
THEN -8
ELSE -15
END AS 'dt')
,GETDATE());
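Another option, since the question mentions trying a variable, is to work out the cut-off date once and then filter on it. This is only a sketch: the table variables @tableA and @tableB stand in for the real tables, their data is made up, and @cutoff is a hypothetical name.

```sql
-- Stand-ins for the asker's tables (assumed shape and data).
DECLARE @tableA TABLE (column_id int, ADate datetime);
DECLARE @tableB TABLE (id int, BDate datetime);
INSERT INTO @tableA VALUES (1, DATEADD(day, -10, GETDATE()));
INSERT INTO @tableB VALUES (1, DATEADD(day, -12, GETDATE())),
                           (2, DATEADD(day, -20, GETDATE()));

-- Work out the cut-off once, before the main query runs.
DECLARE @cutoff datetime;
SET @cutoff = CASE WHEN EXISTS (SELECT 1 FROM @tableA
                                WHERE ADate >= DATEADD(day, -8, GETDATE()))
                   THEN DATEADD(day, -8, GETDATE())
                   ELSE DATEADD(day, -15, GETDATE())
              END;

-- The main query then compares against a plain value.
SELECT * FROM @tableB WHERE BDate >= @cutoff;
-- With this data nothing in @tableA is newer than 8 days,
-- so the 15-day cut-off is used and only id 1 is returned.
```

If the main query starts with a CTE, the DECLARE and SET have to come before the WITH, and the statement just before WITH must end with a semicolon.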

Can't access CTE via inner join SQL Server

I know I'm missing something obvious but it's not so obvious to me!
I've got a table valued function that produces a nice interval range of dates given a start, end, interval (thanks to another SO answer!).
I've another TVF that produces the latest part transaction given a date.
However, I was after being able to produce the last parts transaction in a series of dates lying between the start and end dates given. So, given March to May and an interval of say, 2 days, I'd get a sort of time series between the two.
However, I've hit a wall now with CTE's and was trying to avoid going into procedural/cursor style looping to do this.
This is the code:
WITH datesTbl(DateValue)
AS (SELECT DateValue
FROM [dbo].[DateRange]('2016-03-18', '2016-04-27', 1))
SELECT *
FROM datesTbl dr
INNER JOIN dbo.MoveDateDiff(dr.Datevalue, DATEADD(day, 1, dr.DateValue), 14792) pm
ON DATEDIFF(Day, dr.dateValue, pm.MovementDate) <= 1;
I know I've other conceptual errors in the underlying TVF's however here I'm wanting to find a way past the fact I can't seem to access the CTE in the first part of the Inner Join statement (there is no syntax error after the ON declaration!).
Any guidance would be gratefully received!
When the arguments of a TVF refer to columns from another table in the FROM clause, as they do here, you want APPLY, not JOIN:
WITH datesTbl(DateValue) as (
SELECT DateValue
FROM [dbo].[DateRange]('2016-03-18', '2016-04-27', 1)
)
SELECT *
FROM datesTbl dr CROSS APPLY
dbo.MoveDateDiff(dr.Datevalue, DATEADD(day, 1, dr.DateValue), 14792) pm
WHERE DATEDIFF(Day, dr.dateValue, pm.MovementDate) <= 1;
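The reason is that the right-hand side of a JOIN cannot refer to columns from the left-hand side, whereas APPLY evaluates its right-hand side once per left-hand row. A minimal sketch of that behaviour, using a correlated derived table as a stand-in for the TVF (the sample dates and the alias nxt are assumptions):

```sql
-- Two made-up dates play the role of the DateRange output.
WITH datesTbl(DateValue) AS (
    SELECT CAST(v.d AS date)
    FROM (VALUES ('2016-03-18'), ('2016-03-19')) AS v(d)
)
SELECT dr.DateValue, nxt.NextDay
FROM datesTbl dr
CROSS APPLY (
    -- Refers to dr.DateValue, which an INNER JOIN would not allow here.
    SELECT DATEADD(day, 1, dr.DateValue) AS NextDay
) AS nxt;
```

CROSS APPLY drops left-hand rows for which the right-hand side returns nothing; OUTER APPLY keeps them with NULLs.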

SQL - Syntax for data within X number of days pulls all data

I have a query I need to run regularly which pulls errors from a log table for x number of days. I originally found the syntax for the x number of days on here highlighted below with --<<.
The script pulls the information I want, throws no errors, but it pulls items from all dates not just past X days I am looking for. No matter what value I place before ...,getdate()) it pulls all the records.
Any Ideas how to get it to correctly give me the date range from today that I am looking for?
SELECT
t2.[campaignshortname],
t1.[CampaignId],
t1.[CreatedDtTm],
t1.[Msg],
t1.[ReferenceDate]
FROM
[alchemy].[CM].[CampaignLog] AS t1 (nolock)
INNER JOIN
[alchemy].[CM].[Campaign] AS t2 ON t1.CampaignId = t2.Id
WHERE
CAST(t1.[ReferenceDate] AS DATE) <= dateadd(DAY, -30, getdate()) --<<
AND t1.Msg LIKE '%fail%' OR t1.Msg LIKE '%error%'
ORDER BY
ReferenceDate DESC
I would actually change the query a little bit more and use between:
where convert(datetime, t1.[ReferenceDate]) between getdate()-30 and getdate()
and (t1.Msg like '%fail%' or t1.Msg like '%error%')
Note that those two LIKEs will cause a full scan of either the index or the table unless you subquery it.
You need to enclose t1.Msg like '%fail%' or t1.Msg like '%error%' in parentheses, because AND has higher precedence than OR. As written, your condition matches the rows where the date is 30 days or more before the current date and the message contains the word "fail", plus every row where the message contains the word "error", whatever its date. Your condition should then be:
where CAST(t1.[ReferenceDate] AS DATE) <= dateadd(DAY, -30,getdate())
and (t1.Msg like '%fail%' or t1.Msg like '%error%')
Note that <= keeps rows that are 30 or more days old; if you want the rows from the last 30 days, use >= instead.
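A small sketch of the precedence effect, using three made-up log rows and a fixed cut-off date (the CTE name logs, the data and the cut-off are all assumptions for illustration):

```sql
-- Hypothetical log rows; only the first one is on or after the cut-off.
WITH logs AS (
    SELECT * FROM (VALUES
        (CAST('2024-01-20' AS date), 'job failed'),
        (CAST('2023-06-01' AS date), 'disk error'),
        (CAST('2023-06-02' AS date), 'all good')
    ) AS v(ReferenceDate, Msg)
)
SELECT
    SUM(CASE WHEN ReferenceDate >= '2024-01-01'
              AND Msg LIKE '%fail%' OR Msg LIKE '%error%'
             THEN 1 ELSE 0 END) AS without_parens,
    SUM(CASE WHEN ReferenceDate >= '2024-01-01'
              AND (Msg LIKE '%fail%' OR Msg LIKE '%error%')
             THEN 1 ELSE 0 END) AS with_parens
FROM logs;
-- without_parens = 2: the old 'disk error' row slips through the date filter
-- with_parens    = 1: only the recent 'job failed' row matches
```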

SQL group by 1 column but include TOP 1 of other columns

I am trying to build a SQL query where I group by 1 column, but then also include the values of other columns from an arbitrary record in each group. So, something like
SELECT BoxNo
FROM MuffinData
WHERE FrostingTimeApplied >= CONVERT(date, GETDATE())
GROUP BY BoxNo
but including some value from columns MuffinType, FrostingType in the result (I know that there will be only 1 value of MuffinType and FrostingType per box.)
You have to use an aggregate function for each column selected that is not present in the GROUP BY clause:
SELECT BoxNo, MAX(MuffinType) AS MuffinType, MAX(FrostingType) AS FrostingType
FROM MuffinData
WHERE FrostingTimeApplied >= CONVERT(date, GETDATE())
GROUP BY BoxNo
If there is only 1 value of MuffinType and FrostingType per box, then these unique values per box no are going to be selected in the above query.
"I know that there will be only 1 value of MuffinType and FrostingType per box"
If that's indeed the case, a simple DISTINCT should do the trick, like so:
SELECT DISTINCT BoxNo, MuffinType, FrostingType
FROM MuffinData
WHERE FrostingTimeApplied >= CONVERT(date, GETDATE());
If that's not the case, you're dealing with a problem known generally as the Top N per group problem. You can find coverage of the problem and suggested solutions here.
Cheers,
Itzik
If you're grouping by anything, then the only way to do this in a single statement (that I'm aware of) is to have the other columns you're returning be the result of an aggregate function. Aggregate functions are anything that take multiple values but return you a single result like: SUM, MAX, MIN, COUNT, etc...
SELECT BoxNo, COUNT(MuffinData.ID), MAX(FrostingType.FlavorID) FROM MuffinData, FrostingType etc...
You might have to adjust your WHERE logic or have another data source in your FROM list (subquery).
You can use a CTE and join back to the original table to get the fields you want. In this case,
WITH BoxGroup AS (
SELECT BoxNo
FROM MuffinData
WHERE FrostingTimeApplied >= CONVERT(date, GETDATE())
GROUP BY BoxNo
)
SELECT md.BoxNo, md.MuffinType, md.FrostingType
FROM MuffinData md
INNER JOIN BoxGroup bg ON bg.BoxNo = md.BoxNo
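For the case where a box can hold more than one combination (the Top N per group problem mentioned above), one common approach is ROW_NUMBER. This is a sketch under assumptions: the sample rows are made up, "latest FrostingTimeApplied" is an arbitrary choice of which row to keep, and the date filter from the question is left out to keep the data static.

```sql
-- Hypothetical rows in the shape of the asker's MuffinData table.
WITH MuffinData AS (
    SELECT * FROM (VALUES
        (1, 'Blueberry', 'Vanilla',   CAST('2024-01-01 08:00' AS datetime)),
        (1, 'Blueberry', 'Vanilla',   CAST('2024-01-01 09:00' AS datetime)),
        (2, 'Bran',      'Chocolate', CAST('2024-01-01 08:30' AS datetime))
    ) AS v(BoxNo, MuffinType, FrostingType, FrostingTimeApplied)
),
Ranked AS (
    -- Number the rows inside each box, newest first.
    SELECT BoxNo, MuffinType, FrostingType,
           ROW_NUMBER() OVER (PARTITION BY BoxNo
                              ORDER BY FrostingTimeApplied DESC) AS rn
    FROM MuffinData
)
SELECT BoxNo, MuffinType, FrostingType
FROM Ranked
WHERE rn = 1;  -- exactly one row per BoxNo
```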

SQL Query, return value from table with no common key

I'm hoping for an idea on the best way to approach what I'm trying to do.
I have a table with a list of transactions. Each transaction has a PostDate in DateTime format. I have another table holding the fiscal period values. This table has the following columns: FiscalYear, FiscalMonth, StartDate, EndDate.
I'm trying to write a query that will return all values from my transactions table, along with the FiscalYear and FiscalMonth of the PostDate. So I guess I'm just trying to return the FiscalYear and FiscalMonth values when the PostDate falls between the StartDate and EndDate.
I've tried using a subquery, but I have little experience with them and kept getting an error message that the subquery was returning more than 1 value. Help would be appreciated.
EDIT: Sorry, here is the query I tried. I also changed the title from "with no join", to "with no common key" to more accurately reflect my problem
SELECT Transactions.PostDate, Transactions.TranKey, Transactions.CustKey,
(SELECT FiscalPeriod.FiscPer
FROM FiscalPeriod
WHERE (Transactions.PostDate > CONVERT(Datetime, FiscalPeriod.StartDate, 102)) AND (Transactions.PostDate < CONVERT(DATETIME, FiscalPeriod.EndDate, 102))) AS FisPer
FROM Transactions
You should be able to eliminate the subquery and use a join like this:
SELECT Transactions.PostDate, Transactions.TranKey, Transactions.CustKey, FiscPer
FROM Transactions
INNER JOIN FiscalPeriod ON (PostDate BETWEEN StartDate AND EndDate)
This is not quite the same, though: the subquery version shows all the records even if the PostDate isn't covered by the fiscal table. If you want that, change this join to a LEFT JOIN.
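A sketch of the LEFT JOIN variant with made-up data follows. It assumes EndDate is exclusive, i.e. equal to the StartDate of the next period, which is why it uses >= and < instead of BETWEEN; if the real table stores an inclusive last day, the comparison needs adjusting. All sample rows are assumptions.

```sql
-- Hypothetical data; EndDate is assumed to be the start of the next period.
WITH Transactions AS (
    SELECT * FROM (VALUES
        (1, CAST('2004-01-15' AS datetime)),
        (2, CAST('2004-02-01' AS datetime)),
        (3, CAST('2005-07-04' AS datetime))
    ) AS v(TranKey, PostDate)
),
FiscalPeriod AS (
    SELECT * FROM (VALUES
        ('2004-01', CAST('2004-01-01' AS datetime), CAST('2004-02-01' AS datetime)),
        ('2004-02', CAST('2004-02-01' AS datetime), CAST('2004-03-01' AS datetime))
    ) AS v(FiscPer, StartDate, EndDate)
)
SELECT t.TranKey, t.PostDate, fp.FiscPer
FROM Transactions t
LEFT JOIN FiscalPeriod fp
       ON t.PostDate >= fp.StartDate
      AND t.PostDate <  fp.EndDate;
-- TranKey 1 -> 2004-01, TranKey 2 -> 2004-02, TranKey 3 -> NULL (no matching period)
```

With periods shaped like these, BETWEEN would match TranKey 2 against both periods, because 2004-02-01 is the end of one and the start of the next.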
Maybe you need to join these two tables on something in common, or do something like this:
SELECT Transactions.PostDate, Transactions.TranKey, Transactions.CustKey,
(SELECT distinct FiscalPeriod.FiscPer
FROM FiscalPeriod
WHERE (Transactions.PostDate > CONVERT(Datetime, FiscalPeriod.StartDate, 102)) AND (Transactions.PostDate < CONVERT(DATETIME, FiscalPeriod.EndDate, 102))) AS FisPer
FROM Transactions
Remember, if the subquery matches several different values of FiscalPeriod.FiscPer, such as:
2004-01
2004-02
2004-03
then the distinct keyword will not help, because the subquery still returns more than one value.