Sorting by date across two separate columns in a Full Outer Join - sql

I have two tables of data that I am lining up with a Full Outer Join, but the result includes two separate date columns, which makes it challenging to sort.
Table 1 has sales rank data for a product.
Table 2 has actual sales data for the same product.
Each table may have entries for dates on which the other does not.
So envision after the full join, we end up with something like this simplified example:
ProdID L.Date P.Date Rank Units
101 null 2011-10-01 null 740
101 2011-10-02 2011-10-02 23 652
101 2011-10-03 null 32 null
Here is the query I am using to pull this data:
select L.ListID, L.ASIN, L.date, L.ranking, P.ASIN, P.POSdate, P.units from ListItem L
full outer join POSdata P on
L.ASIN = P.ASIN and
L.date = P.POSdate and
(L.ListID = 1 OR L.ASIN is null)
where (L.ASIN = 'xxxxxxxxxx' and L.ListID = 1) or
(P.ASIN = 'xxxxxxxxxx' and L.BookID is null)
order by POSdate, date
It's a bit more complex because products may appear on multiple lists so I have to account for that as well, but it returns the data I need. I am open to suggestions on improving it of course should someone have one.
The problem is, how can I sort this properly when both date columns are likely to have at least some NULLs in them? The way I am ordering now will not work when both columns have at least one NULL.
Thanks.

ORDER BY ISNULL(P.POSdate,L.date) should do what you need I think?
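As a minimal sketch of that idea, the snippet below sorts the three example rows from the question by whichever date is present. It runs on SQLite through Python's sqlite3 module, so COALESCE stands in for SQL Server's ISNULL, and the table `joined` and its column names are hypothetical stand-ins for the output of the full outer join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table holding rows shaped like the full outer join output.
conn.execute(
    "CREATE TABLE joined ("
    "prod_id INTEGER, l_date TEXT, p_date TEXT, ranking INTEGER, units INTEGER)"
)
conn.executemany(
    "INSERT INTO joined VALUES (?, ?, ?, ?, ?)",
    [
        (101, "2011-10-03", None, 32, None),          # rank only
        (101, None, "2011-10-01", None, 740),         # sales only
        (101, "2011-10-02", "2011-10-02", 23, 652),   # both sides matched
    ],
)

# COALESCE returns the first non-NULL argument, so every row gets a sort key.
rows = conn.execute(
    "SELECT COALESCE(p_date, l_date) AS sort_date, ranking, units "
    "FROM joined "
    "ORDER BY COALESCE(p_date, l_date)"
).fetchall()

sort_dates = [row[0] for row in rows]
print(sort_dates)
```

Because the join condition requires L.date = P.POSdate, a row never carries two different dates, so taking either one as the sort key is safe.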

Related

Get full drilldown of a sparse fact table

I have a transaction fact table, with product, time and location as dimension tables. The fact table is sparse, so if no Pizzas were sold in January there is no record for Pizza in the fact table.
When I drill down by product, Pizza is missing from the aggregated results. But I want it included with units_sold = 0.
One solution is to join the product table to the fact table with a left outer join. Then I get the desired result.
But when I cut by another dimension such as location or time, those products are missing from the result again.
The outer join leaves the other dimensions' foreign keys NULL, so the WHERE clause removes those rows again.
How can I solve the problem? (I use ROLAP)
Using join conditions is a good idea, as some people answered. But I need a more general solution.
For example,
Table1
person birth year death year
a 1950 2006
b 1952 2008
c 1960 2007
d 1953 1990
I want to get a year-by-year count of the people who were born between 1950 and 1953 and died between 2006 and 2008.
Like
birth = 1950 death = 2006 count = 1
birth = 1951 death = 2006 count = 0
...
Can we handle this scenario by using join conditions and where conditions appropriately?
You want LEFT JOIN, and then LEFT JOIN again. Then conditions go in the on clause. Something like this:
select . . .
from products p left join
fact f
on p.product_id = f.product_id left join
timedim td
on f.time_id = td.time_id and
td.month = 'January'
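To see why the placement of the month filter matters, here is a small sketch on SQLite via Python's sqlite3 module. The table contents are invented for illustration, and the first query is a variant of the idea rather than the exact query above: it restricts the facts to January inside a derived table and left joins the products to it, so unsold products survive with a count of 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (product_id INTEGER, name TEXT);
CREATE TABLE timedim (time_id INTEGER, month TEXT);
CREATE TABLE fact (product_id INTEGER, time_id INTEGER, units_sold INTEGER);
INSERT INTO products VALUES (1, 'Pizza'), (2, 'Pasta');
INSERT INTO timedim VALUES (1, 'January'), (2, 'February');
-- No Pizza row for January: the fact table is sparse.
INSERT INTO fact VALUES (2, 1, 5), (1, 2, 7), (2, 2, 3);
""")

# Filter applied before the outer join: every product is kept.
filter_before_join = conn.execute("""
SELECT p.name, COALESCE(SUM(j.units_sold), 0) AS units_sold
FROM products p
LEFT JOIN (
    SELECT f.product_id, f.units_sold
    FROM fact f
    JOIN timedim td ON f.time_id = td.time_id
    WHERE td.month = 'January'
) j ON p.product_id = j.product_id
GROUP BY p.name
ORDER BY p.name
""").fetchall()

# Filter applied in WHERE after the outer joins: Pizza disappears.
filter_in_where = conn.execute("""
SELECT p.name, COALESCE(SUM(f.units_sold), 0) AS units_sold
FROM products p
LEFT JOIN fact f ON p.product_id = f.product_id
LEFT JOIN timedim td ON f.time_id = td.time_id
WHERE td.month = 'January'
GROUP BY p.name
ORDER BY p.name
""").fetchall()

print(filter_before_join)
print(filter_in_where)
```

The second query shows the behaviour described in the question: the WHERE clause discards the NULL-extended rows that the outer join produced.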

SQL Server: Two COUNTs in one query multiplying with one another in output

I have a query that is used to display information in a queue, and part of that information is the number of child entities (packages and labs) that belong to the parent entity (change). However, instead of showing the individual counts of each type of child, the counts multiply with one another.
In the case below, there are supposed to be 3 labs and 18 packages; however, they multiply with one another and the output is 54 of each.
Below is the offending portion of the query.
SELECT cef.ChangeId, COUNT(pac.PackageId) AS 'Packages', COUNT(lab.LabRequestId) AS 'Labs'
FROM dbo.ChangeEvaluationForm cef
LEFT JOIN dbo.Lab
ON cef.ChangeId = Lab.ChangeId
LEFT JOIN dbo.Package pac
ON (cef.ChangeId = pac.ChangeId AND pac.PackageStatus != 6 AND pac.PackageStatus !=7)
WHERE cef.ChangeId = 255
GROUP BY cef.ChangeId
I feel like this is obvious but it's not occurring to me how to fix it so the two counts are independent of one another like to me they should be. There doesn't seem to be a scenario like this in any of my research either. Can anyone guide me in the right direction?
Because each LEFT JOIN multiplies the source rows: every Lab row is paired with every Package row for the same ChangeId, so the result behaves like a cross join between the two child tables.
SELECT cef.ChangeId, p.Packages, l.Labs
FROM dbo.ChangeEvaluationForm cef
OUTER APPLY(
SELECT COUNT(*) as Labs
FROM dbo.Lab
WHERE cef.ChangeId = Lab.ChangeId
) l
OUTER APPLY(
SELECT COUNT(*) AS Packages
FROM dbo.Package pac
WHERE (cef.ChangeId = pac.ChangeId AND pac.PackageStatus != 6 AND pac.PackageStatus !=7)
) p
WHERE cef.ChangeId = 255
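The GROUP BY is no longer needed here, because each OUTER APPLY already returns a single aggregated row per ChangeId. Keeping it would make SQL Server reject p.Packages and l.Labs, since they are neither aggregated nor listed in the GROUP BY.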
From your question it is difficult to tell what result you expect from your query, so I presume you want the following result:
+----------+----------+------+
| ChangeId | Packages | Labs |
+----------+----------+------+
| 255 | 18 | 3 |
+----------+----------+------+
Try the query below if that is the result you are looking for.
SELECT cef.ChangeId, ISNULL(pac.PacCount, 0) AS 'Packages', ISNULL(Lab.LabCount, 0) AS 'Labs'
FROM dbo.ChangeEvaluationForm cef
LEFT JOIN (SELECT Lab.ChangeId, COUNT(*) LabCount FROM dbo.Lab GROUP BY Lab.ChangeId) Lab
ON cef.ChangeId = Lab.ChangeId
LEFT JOIN (SELECT pac.ChangeId, COUNT(*) PacCount FROM dbo.Package pac WHERE pac.PackageStatus != 6 AND pac.PackageStatus !=7 GROUP BY pac.ChangeId) pac
ON cef.ChangeId = pac.ChangeId
WHERE cef.ChangeId = 255
Query Explanation:
In your query the rows are joined before they are counted, so each count ends up as 54, which is the Cartesian product of the 3 labs and 18 packages.
In this query I group by ChangeId and compute the aggregate before joining the tables, so the 3 labs and 18 packages are counted before the join.
You will also notice that I have moved the PackageStatus filter into the pac subquery, ahead of its GROUP BY, so unwanted records won't distort the count.
You start with a particular ChangeId from the dbo.ChangeEvaluationForm table (ChangeId = 255 from your example), then join to the dbo.Lab table. This join makes your result go from 1 row to 3, considering there are 3 Labs with ChangeId = 255. Your problem is on the next join, you are joining all 3 resulting rows from the previous join with the dbo.Package table, which has 18 rows for ChangeId = 255. The resulting count for columns pac.PackageId and lab.LabRequestId will then be 3 x 18 = 54.
To get what you want, there are 2 easy solutions:
Use COUNT(DISTINCT ...) instead of COUNT(...). This counts only the different values of pac.PackageId and lab.LabRequestId and not the repeated ones.
Split the joins into 2 subqueries and join their result (by ChangeId)
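Both options can be checked with a small sketch. It runs on SQLite through Python's sqlite3 module, uses simplified hypothetical table and column names, and leaves out the PackageStatus filter; the data is just 3 labs and 18 packages for change 255, as in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE change_form (change_id INTEGER);
CREATE TABLE lab (lab_request_id INTEGER, change_id INTEGER);
CREATE TABLE package (package_id INTEGER, change_id INTEGER);
INSERT INTO change_form VALUES (255);
""")
conn.executemany("INSERT INTO lab VALUES (?, 255)", [(i,) for i in range(1, 4)])
conn.executemany("INSERT INTO package VALUES (?, 255)", [(i,) for i in range(1, 19)])

# Original shape: both children joined, then counted -> 3 x 18 rows.
naive = conn.execute("""
SELECT COUNT(p.package_id), COUNT(l.lab_request_id)
FROM change_form c
LEFT JOIN lab l ON c.change_id = l.change_id
LEFT JOIN package p ON c.change_id = p.change_id
GROUP BY c.change_id
""").fetchone()

# Option 1: count distinct keys so the duplicated rows collapse.
distinct = conn.execute("""
SELECT COUNT(DISTINCT p.package_id), COUNT(DISTINCT l.lab_request_id)
FROM change_form c
LEFT JOIN lab l ON c.change_id = l.change_id
LEFT JOIN package p ON c.change_id = p.change_id
GROUP BY c.change_id
""").fetchone()

# Option 2: aggregate each child in its own subquery, then join the results.
pre_aggregated = conn.execute("""
SELECT COALESCE(p.n, 0), COALESCE(l.n, 0)
FROM change_form c
LEFT JOIN (SELECT change_id, COUNT(*) AS n FROM lab GROUP BY change_id) l
    ON c.change_id = l.change_id
LEFT JOIN (SELECT change_id, COUNT(*) AS n FROM package GROUP BY change_id) p
    ON c.change_id = p.change_id
""").fetchone()

print(naive, distinct, pre_aggregated)
```

COUNT(DISTINCT ...) is the smaller change, but the join still produces all 54 intermediate rows; pre-aggregating avoids building them at all.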

SQL join two tables and the elements that satisfies one condition

Good afternoon,
I'm having an issue with two tables that I'm trying to join.
What I am trying to do is print a table with all products that are registered in some agenda (codControl), so the person can enter a price for each one.
But first I have to look into lctocotacao to see whether he has already given a price to some product. When I do this, I only get the products that have a price, and I don't see the other ones.
Here is an example of my table cadprodutoscotacao
codProduct desc codControl
1 abc 197
2 cde 197
3 fgh 197
1 abc 198
And my table lctocotacao
codProduct price codControl codPerson
1 2.5000 197 19
2 3.0000 197 37
3 4.5000 198 37
I have this SQL statement at the moment:
SELECT cadc.cod, cadc.desc, lcto.codEnt, lcto.price
FROM cadprodutoscotacao cadc JOIN lctocotacao lcto
ON cadc.codControl = lcto.codControl
AND cadc.codProduct = lcto.codProduct
AND cadc.codControl = '197'
AND lcto.codPerson = '19'
ORDER BY cadc.codControl;
What I'm getting:
cod desc price codPerson codControl
1 abc 2.5000 19 197
And the table I expect
cod desc price codPerson codControl
1 abc 2.5000 19 197
2 cde 197
3 fgh 197
197 and 19 will be parameters to my query.
Any ideas on how to proceed?
EDIT
Basically, I have two queries:
SELECT *
FROM cadprodutoscotacao
WHERE cadc_codControl = '197'
The first one returns all products registered in the agenda '197'.
And the second one:
SELECT *
FROM lctocotacao
WHERE codPerson = 19
AND codControl = '197'
The second one returns the products that already have a price added by Person 19 in the agenda 197.
I have to return one table, including all records from the first query, and, if there is some price in the second one, I have to "concatenate" them.
Thanks in advance.
You need a LEFT JOIN, but you also need to be careful about the filtering conditions:
SELECT cadc.cod, cadc.desc, lcto.codEnt, lcto.price
FROM cadprodutoscotacao cadc LEFT JOIN
lctocotacao lcto
ON cadc.codControl = lcto.codControl AND
cadc.cod = lcto.cod AND
lcto.codEnt = '19'
WHERE cadc.codControl = '197'
ORDER BY cadc_codigo;
A LEFT JOIN keeps all rows in the first table, regardless of whether a match is found in the ON conditions. This applies to conditions on the first table as well as the second. Hence, you don't want to put filters on the first table in the ON clause.
The rule is: When using LEFT JOIN put filters on the first table in the WHERE clause. Filters on the second table go in the ON clause (otherwise the outer join is generally turned into an inner join).
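The rule can be tried out on the sample data from the question. This is a sketch on SQLite via Python's sqlite3 module; it uses the column names from the sample tables (codProduct, codPerson) rather than the ones in the posted queries, and renames desc to description because desc is a reserved word.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cadprodutoscotacao (codProduct INTEGER, description TEXT, codControl INTEGER);
CREATE TABLE lctocotacao (codProduct INTEGER, price REAL, codControl INTEGER, codPerson INTEGER);
INSERT INTO cadprodutoscotacao VALUES
    (1, 'abc', 197), (2, 'cde', 197), (3, 'fgh', 197), (1, 'abc', 198);
INSERT INTO lctocotacao VALUES
    (1, 2.5, 197, 19), (2, 3.0, 197, 37), (3, 4.5, 198, 37);
""")

# Second-table filter in ON, first-table filter in WHERE: all products of 197 kept.
filter_in_on = conn.execute("""
SELECT cadc.codProduct, cadc.description, lcto.price
FROM cadprodutoscotacao cadc
LEFT JOIN lctocotacao lcto
    ON cadc.codControl = lcto.codControl
   AND cadc.codProduct = lcto.codProduct
   AND lcto.codPerson = 19
WHERE cadc.codControl = 197
ORDER BY cadc.codProduct
""").fetchall()

# Second-table filter moved to WHERE: unmatched rows are dropped again.
filter_in_where = conn.execute("""
SELECT cadc.codProduct, cadc.description, lcto.price
FROM cadprodutoscotacao cadc
LEFT JOIN lctocotacao lcto
    ON cadc.codControl = lcto.codControl
   AND cadc.codProduct = lcto.codProduct
WHERE cadc.codControl = 197
  AND lcto.codPerson = 19
ORDER BY cadc.codProduct
""").fetchall()

print(filter_in_on)
print(filter_in_where)
```

The first result matches the table the question expects; the second behaves like the original inner join.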
Your rows are filtered because you specified JOIN, which is a shortcut for INNER JOIN.
If you want all the records from the left table, even if they don't have correlated records in the right table, you should do a LEFT JOIN:
SELECT cadc.cod, cadc.desc, lcto.codEnt, lcto.price
FROM cadprodutoscotacao cadc
LEFT JOIN lctocotacao lcto
ON cadc.codControl = lcto.codControl
AND cadc.cod = lcto.cod
AND lcto.codEnt = '19'
WHERE cadc.codControl = '197'
ORDER BY cadc_codigo;
I don't understand your example. What are the primary keys? "cod" and "codentry" appear in both tables. Your schema seems to be very redundant.
But whenever someone JOINs and is missing some entries, it might be solved by using a LEFT OUTER JOIN.

Group several tables on a datetime including same table

I am trying to collate several tables into a single row display for reporting purposes - all on a date time value. I do not have an issue when joining disparate tables on say a CTE of datetime values. When I encounter a table with a different FK I get too many records for this to work. Example
Reporting Structure
DateTime EngineName EngineValue PartName PartValue Part2Name Part2Value
20160118 00:00 Engine1 100 Part1 100 Part2 200
Engine Table
DateTime EngineName EngineValue
20160118 00:00 Engine1 100
Part Table
DateTime Name(fK) Value
20160118 00:00 Part1 100
20160118 00:00 Part2 200
At this point I have tried to create a CTE of datetimes and join the logs to the CTE on the datetime. I can't get this to work, and I know I'm not the first to create reports like this.
If we assume Part1 and Part2 are actual names and static in number then you can do something like this:
SELECT E.DateTime, E.EngineName, E.EngineValue,
P1.Name AS PartName, P1.Value AS PartValue,
P2.Name AS Part2Name, P2.Value AS Part2Value
FROM Engine E
INNER JOIN Part P1
on E.DateTime = P1.DateTime
and P1.Name = 'Part1'
INNER JOIN Part P2
on E.DateTime = P2.DateTime
and P2.Name = 'Part2'
If they are more dynamic in nature then Dynamic SQL will be needed.
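For the static case, the self-join can be sketched end to end with the sample rows from the question. The snippet runs on SQLite via Python's sqlite3 module and uses hypothetical snake_case column names (log_time, part_name, part_value); each Part alias is joined on its own timestamp and pinned to one part name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE engine (log_time TEXT, engine_name TEXT, engine_value INTEGER);
CREATE TABLE part (log_time TEXT, part_name TEXT, part_value INTEGER);
INSERT INTO engine VALUES ('20160118 00:00', 'Engine1', 100);
INSERT INTO part VALUES
    ('20160118 00:00', 'Part1', 100),
    ('20160118 00:00', 'Part2', 200);
""")

# One alias of the part table per reported part, so each lands in its own columns.
report = conn.execute("""
SELECT e.log_time, e.engine_name, e.engine_value,
       p1.part_name, p1.part_value,
       p2.part_name, p2.part_value
FROM engine e
JOIN part p1 ON e.log_time = p1.log_time AND p1.part_name = 'Part1'
JOIN part p2 ON e.log_time = p2.log_time AND p2.part_name = 'Part2'
""").fetchall()

print(report)
```

With INNER JOIN, a timestamp that lacks either part drops out of the report; switching both joins to LEFT JOIN would keep the engine row and leave the missing part columns NULL.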
If there are always 2, you could use an analytic function such as row_number() over (partition by datetime order by name) as RN and join on rows 1 and 2 instead of on part names, i.e. instead of P2.Name = 'Part2' use P2.RN = 2. But this limits the report to exactly 2 parts and ignores all others.
So the real question is:
Is the number of parts per engine that need to be reported always 2? Always the same part names with different values? Or dynamic in nature?

2 Queries same logic but different no. of output rows

I have 2 queries which both aim to select all batchNo values that meet 3 conditions:
ClaimStatus must be 95 or 90
CreatedBy = ProviderLink
The minimum dateUpdate should be from 3pm yesterday until when this query was run
Query 1: Outputs 940 rows
SELECT
DISTINCT bh.BatchNo,
bh.Coverage,
DateUploaded = MIN(csa.DateUpdated)
FROM
Registration2..BatchHeader bh with(nolock)
INNER JOIN ClaimsProcess..BatchHeader bhc with(nolock) on bhc.BatchNo = bh.BatchNo
INNER JOIN ClaimsInfo ci with(nolock) on ci.BatchNo = bhc.BatchNo
INNER JOIN Claims c with(nolock) on c.ClaimNo = ci.ClaimNo
INNER JOIN ClaimStatusAudit csa WITH(NOLOCK) on csa.CLAIMNO = ci.ClaimNo
WHERE c.ClaimStatus in('95','90') AND bhc.CreatedBy = 'PROVIDERLINK'
GROUP BY bh.BatchNo, bh.Coverage
HAVING MIN(CSA.DateUpdated) >= convert(varchar(10),GETDATE() -1,110) + ' 15:00:00.000'
Query 2: Outputs 1314 rows
SELECT
DISTINCT bh.BatchNo,
bh.Coverage
FROM Registration2..BatchHeader bh with(nolock)
INNER JOIN ClaimsProcess..BatchHeader bhc with(nolock) on bhc.BatchNo = bh.BatchNo
INNER JOIN ClaimsInfo ci with(nolock) on ci.BatchNo = bhc.BatchNo
INNER JOIN Claims c with(nolock) on c.ClaimNo = ci.ClaimNo
WHERE c.ClaimStatus in('95','90') AND bhc.CreatedBy = 'PROVIDERLINK'
AND (SELECT MIN(DATEUPDATED) FROM CLAIMSTATUSAUDIT WITH(NOLOCK)WHERE CLAIMNO = ci.ClaimNo) >= convert(varchar(10),GETDATE() -1,110) + ' 15:00:00.000'
Though both have the same logic, they output a different number of rows. I would like to know which of the two is more accurate.
BTW, both outputs follow the 3 given conditions.
Your assumption is wrong. These two queries do not employ the same logic, simply because of the order in which each clause is evaluated. Clauses are evaluated in the following order:
From
Where
Group By
Having
Select
Order By
With that detail out of the way, let's analyze why these two queries return a different number of rows.
The reason you're returning a different number of rows is the point at which each query applies the 3pm cutoff to DateUpdated.
In Query 1, you're selecting all Batch Numbers and Coverages that meet two conditions:
1. have corresponding records in all joined tables
2. have the desired claim status and were created by "ProviderLink"
You get this list of records once the From, Where, and Group by clauses have been executed.
You are then running the aggregate calculation (Min) on that set of data, pulling the minimum DateUpdated, yet you have not yet put any restriction on how the DateUpdated should be limited. So when you then group your data and filter the groups using the Having clause, you're filtering out all records that meet criteria from numbers 1 and 2 above and also had a DateUpdated prior to 3pm today. Let's look at an example.
Record 1 has a BatchNo 123 and Coverage A and was last updated on 4/4/2014 12:00:00.000
Record 2 has a BatchNo 123 and Coverage A and was last updated today at 5/1/2014 3:01:00.000
Assuming Records 1 & 2 have corresponding records in all joined tables, query 1 will pull back the distinct BatchNo and Coverage (123 & A, respectively) and find the minimum DateUpdated which is 4/4/2014 12:00:00.000. Then, once grouped, your Having clause will say the DateUpdated is not greater than today at 3pm, so it will filter the grouped records out.
Query 2, on the other hand, is taking a different approach. It will see Records 1 and 2 as the same in terms of BatchNo & Coverage because those values are identical. However, in the where clause (i.e., the initial filtering process), it's only looking for records where the minimum DateUpdated is greater than today at 3pm, so it's finding Record 2, and returning it in the dataset.
I think you will find that is the case with the 374 missing records from Dataset 1.
All that said, and with the understanding that we cannot tell you which dataset is better, you'll find that Query 1 will only show groups of distinct BatchNos & Coverages where the minimum DateUpdated among any of the records falling into that group was last updated after 3pm today. This means Query 1 is returning only BatchNos and Coverages which contain very new records.
Query 2 is returning any distinct BatchNo & Coverage groupings where any record within its group was last updated after 3pm today. So which one is right for you?
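The difference can be reproduced on a tiny data set. The sketch below runs on SQLite via Python's sqlite3 module with simplified hypothetical tables, a fixed cutoff instead of GETDATE(), and one assumption that goes beyond the example above: the old and the new audit rows of batch 123 belong to two different claims, which is what lets the per-claim subquery of Query 2 pick the batch up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE claims_info (batch_no INTEGER, claim_no INTEGER);
CREATE TABLE claim_status_audit (claim_no INTEGER, date_updated TEXT);
INSERT INTO claims_info VALUES (123, 1), (123, 2), (456, 3);
INSERT INTO claim_status_audit VALUES
    (1, '2014-04-04 12:00:00'),   -- old update, batch 123
    (2, '2014-05-01 15:01:00'),   -- recent update, batch 123
    (3, '2014-05-01 16:00:00');   -- recent update, batch 456
""")
cutoff = "2014-05-01 15:00:00"

# Query 1 shape: the minimum is taken over the whole batch, then filtered.
having_rows = conn.execute("""
SELECT ci.batch_no
FROM claims_info ci
JOIN claim_status_audit csa ON csa.claim_no = ci.claim_no
GROUP BY ci.batch_no
HAVING MIN(csa.date_updated) >= ?
ORDER BY ci.batch_no
""", (cutoff,)).fetchall()

# Query 2 shape: the minimum is taken per claim, so one recent claim is enough.
subquery_rows = conn.execute("""
SELECT DISTINCT ci.batch_no
FROM claims_info ci
WHERE (SELECT MIN(date_updated)
       FROM claim_status_audit
       WHERE claim_no = ci.claim_no) >= ?
ORDER BY ci.batch_no
""", (cutoff,)).fetchall()

print(having_rows)
print(subquery_rows)
```

Batch 123 is rejected by the HAVING version because its oldest update predates the cutoff, while the subquery version accepts it because claim 2 on its own passes the test.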