Apportioning data into new columns - sql

Morning,
I am quite new to SQL Server 2008 so I was wondering if you could help me.
I currently have:
SELECT
c.code, d.date, d.date_previous,
CAST(d.date-date_previous as int) AS Days,
d.units, d.cost
FROM table1 AS d
INNER JOIN table2 AS p ON d.ID = p.ID
INNER JOIN table3 AS c ON p.c_id = c.ID
WHERE date_previous > '31/12/2012'
This is bringing back one row per invoice received after 31/12/2012. The aim is to get the following columns:
Code, Jan data, Feb data, Mar data, etc.
with one unique code per row (so I'm assuming some grouping or row partitioning is required).
Where a bill covers a period of 3 months with, for example, 300 units, I'd like that split evenly across the 3 months (100 in each).
I'm aware I'd probably need to use a pivot function and some temp tables but I'm not that advanced yet.
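A minimal sketch of one possible reading of this requirement: spread each bill's units evenly over the calendar months its period touches, then pivot with conditional aggregation. It runs against SQLite through Python's sqlite3 only so that it is self-contained; the bills table, the bill_date column, the sample rows and the fixed 2013 calendar are assumptions for illustration, and SQL Server would need its own date functions (or PIVOT) in place of the SQLite ones.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical, simplified stand-in for the joined tables in the question.
CREATE TABLE bills (code TEXT, date_previous TEXT, bill_date TEXT, units REAL);
INSERT INTO bills VALUES
    ('A', '2013-01-01', '2013-03-31', 300),  -- three-month bill
    ('B', '2013-02-01', '2013-02-28', 50);   -- one-month bill
""")

apportioned = conn.execute("""
WITH RECURSIVE months(month_start) AS (
    -- Calendar of month starts for 2013.
    SELECT '2013-01-01'
    UNION ALL
    SELECT date(month_start, '+1 month') FROM months
    WHERE month_start < '2013-12-01'
),
spread AS (
    -- One row per bill per month touched, each carrying an equal share.
    SELECT b.code,
           m.month_start,
           b.units / (
               (CAST(strftime('%Y', b.bill_date) AS INTEGER)
                - CAST(strftime('%Y', b.date_previous) AS INTEGER)) * 12
               + CAST(strftime('%m', b.bill_date) AS INTEGER)
               - CAST(strftime('%m', b.date_previous) AS INTEGER) + 1
           ) AS units_share
    FROM bills AS b
    JOIN months AS m
      ON m.month_start BETWEEN date(b.date_previous, 'start of month')
                           AND b.bill_date
)
-- Pivot: one row per code, one column per month.
SELECT code,
       SUM(CASE WHEN month_start = '2013-01-01' THEN units_share ELSE 0 END) AS jan_units,
       SUM(CASE WHEN month_start = '2013-02-01' THEN units_share ELSE 0 END) AS feb_units,
       SUM(CASE WHEN month_start = '2013-03-01' THEN units_share ELSE 0 END) AS mar_units
FROM spread
GROUP BY code
ORDER BY code
""").fetchall()

for row in apportioned:
    print(row)
```

An even split per month is only one choice; weighting by the number of days in each month would need a different share calculation.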

Related

Get full drilldown of a sparse fact table

I have a transaction fact table, with product, time and location as dimension tables. The fact table is sparse, so if no pizzas were sold in January there is no record for Pizza in the fact table.
When I drill down by product, Pizza is missing from the aggregated results. But I want it to appear with zero values, i.e. units_sold = 0.
One solution is to join the product table to the fact table with a left outer join. Then I get the desired result.
But when I slice by another dimension such as location or time, those products are missing from the result again.
The outer join yields NULLs for the other dimensions' foreign keys, so the WHERE clause removes those rows again.
How can I solve the problem? (I use ROLAP)
Using a join condition is a good idea, as some people answered. But I need a more general solution.
For example,
Table1
person  birth year  death year
a       1950        2006
b       1952        2008
c       1960        2007
d       1953        1990
I want a year-by-year count of the people who were born between 1950 and 1953 and died between 2006 and 2008.
Like
birth = 1950 death = 2006 count = 1
birth = 1951 death = 2006 count = 0
...
Can we handle this scenario by using join conditions and WHERE conditions appropriately?
You want a LEFT JOIN, and then another LEFT JOIN, with the filter conditions in the ON clause rather than in the WHERE clause. Something like this:
select . . .
from products p left join
fact f
on p.product_id = f.product_id left join
timedim td
on f.time_id = td.time_id and
td.month = 'January'
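For the more general birth/death-year case, one common pattern is to build the full grid of dimension values with a CROSS JOIN first and then LEFT JOIN the sparse data onto it, so that empty combinations survive with a count of 0. Below is a minimal sketch, assuming a hypothetical years dimension table; it uses SQLite via Python's sqlite3 only to be runnable, and the SQL itself is plain enough to carry over to other engines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE people (person TEXT, birth_year INTEGER, death_year INTEGER);
INSERT INTO people VALUES
    ('a', 1950, 2006), ('b', 1952, 2008), ('c', 1960, 2007), ('d', 1953, 1990);
CREATE TABLE years (year INTEGER);  -- hypothetical dimension table
""")
conn.executemany("INSERT INTO years VALUES (?)",
                 [(y,) for y in range(1950, 2009)])

grid = conn.execute("""
SELECT b.year AS birth, d.year AS death, COUNT(p.person) AS cnt
FROM years AS b
CROSS JOIN years AS d                      -- every birth/death combination
LEFT JOIN people AS p                      -- sparse data joined onto the grid
       ON p.birth_year = b.year AND p.death_year = d.year
WHERE b.year BETWEEN 1950 AND 1953         -- filters touch only the dimensions,
  AND d.year BETWEEN 2006 AND 2008         -- so they cannot remove empty cells
GROUP BY b.year, d.year
ORDER BY b.year, d.year
""").fetchall()

for row in grid:
    print(row)
```

The WHERE clause is safe here because it filters only the dimension side; any condition on the people table would still have to go in the ON clause.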

Join two tables with additional field in one table [duplicate]

I would like to join together two tables with additional columns.
First table is for number of products despatched by product
** Table 1 - Despatches **
Month  ProductID  No_despatched
Jan    abc        10
Jan    def        15
Jan    xyz        12
The second table is for the number of products returned by product, but also an additional column by return reason
** Table 2 - Returns **
Month  ProductID  No_returned  Return_reason
Jan    abc        2            Too big
Jan    abc        3            Too small
Jan    xyz        1            Wrong colour
I would like to join the tables to show returns and despatched on the same row with the number of despatched being duplicated if there are multiple return reasons for the same product.
** Desired output **
Month  ProductID  No_despatched  No_returned  Return_reason
Jan    abc        10             2            Too big
Jan    abc        10             3            Too small
Jan    xyz        12             1            Wrong colour
Hope this makes sense...
Thanks in advance!
afk
This seems like a basic JOIN:
select r.month, r.productid, d.no_despatched, r.no_returned, r.return_reason
from returns r join
despatches d
on r.month = d.month and r.productid = d.productid;
The results don't seem particularly useful, because some products are missing (those with no returns). And the amounts are duplicated if there is more than one return record.
Just use a join:
select a.*, b.No_returned, b.Return_reason
from table1 a join table2 b
on a.ProductID = b.ProductID
and a.month = b.month
In case of duplicates you may use DISTINCT.
Rearranging the phrases of your question gives you the query.
"with additional columns":
SELECT Table1.Month, Table1.ProductID, Table1.No_despatched, Table2.No_returned, Table2.Return_reason
"join two tables":
FROM Table1 LEFT JOIN Table2
ON Table1.Month = Table2.Month AND Table1.ProductID = Table2.ProductID
We use a LEFT JOIN because, presumably a product can be dispatched without being returned, but nobody can return a product you didn't send out.
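To see how the join type changes the result, here is a small sketch using the sample data from the question. It runs on SQLite via Python's sqlite3 purely so it can be executed as-is; the table names despatches and returns follow the first answer and are otherwise an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE despatches (month TEXT, productid TEXT, no_despatched INTEGER);
INSERT INTO despatches VALUES
    ('Jan', 'abc', 10), ('Jan', 'def', 15), ('Jan', 'xyz', 12);
CREATE TABLE returns (month TEXT, productid TEXT,
                      no_returned INTEGER, return_reason TEXT);
INSERT INTO returns VALUES
    ('Jan', 'abc', 2, 'Too big'),
    ('Jan', 'abc', 3, 'Too small'),
    ('Jan', 'xyz', 1, 'Wrong colour');
""")

# Same query twice; only the join keyword changes.
query = """
SELECT d.month, d.productid, d.no_despatched, r.no_returned, r.return_reason
FROM despatches AS d
{join} returns AS r
  ON r.month = d.month AND r.productid = d.productid
ORDER BY d.productid, r.return_reason
"""

inner_rows = conn.execute(query.format(join="JOIN")).fetchall()
left_rows = conn.execute(query.format(join="LEFT JOIN")).fetchall()

print(inner_rows)  # only products that have at least one return
print(left_rows)   # additionally keeps 'def' with NULL return columns
```

The inner join reproduces the desired output from the question exactly; the LEFT JOIN adds a row for def, which was despatched but never returned.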

Postgresql: Values of multiple rows in one row

I have the following database:
Car: {[CarID, HorsePower, Brand, HeadDesigner]}
DesignsCar:{[CarID, DesID]}
Designer:{[DesID, Name]}
You should note that while every Car has only 1 HeadDesigner, multiple people can design cars (as in work on them).
Say I have 10 cars in my database. For CarIDs 1 to 9 there is only one DesID per CarID in DesignsCar.
However, for carID 10 we have 3 people working on it (carID has 3 entries in DesignsCar because 3 people worked on it).
Say I do this:
select *
from car c
left outer join designscar ds on c.carid = ds.carid
left outer join designer d on ds.desid = d.desid
This gives me 12 rows, when I only want 10. The reason why this gives me 12 rows should be clear: for carID 10 we have 3 people working on it (carID has 3 entries in DesignsCar because 3 people worked on it).
I hope I've done a good job explaining this problem, so here comes my question:
How do I modify the query above so that I get 10 rows? For CarID 10 I'd like the 3 designers to be written in one column (comma-separated, say, but anything works as long as it's in one column).
Is that possible?
You need to aggregate the values. Here is one possibility:
select c.*,
array_agg(d.name) as designer_names
from car c left outer join
designscar ds
on c.carid = ds.carid left outer join
designer d
on ds.desid = d.desid
group by c.carid ; -- allowed assuming `carid` is the primary key
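The effect of the aggregation can be checked on a small data set. The sketch below is an illustration only: it uses SQLite through Python's sqlite3, where group_concat stands in for PostgreSQL's array_agg (in PostgreSQL, string_agg(d.name, ', ') would give the comma-separated form asked for). The three cars and the designer names are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE car (carid INTEGER PRIMARY KEY, brand TEXT);
CREATE TABLE designer (desid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE designscar (carid INTEGER, desid INTEGER);
-- Made-up sample data: cars 1 and 2 have one designer, car 3 has three.
INSERT INTO car VALUES (1, 'brand_1'), (2, 'brand_2'), (3, 'brand_3');
INSERT INTO designer VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO designscar VALUES (1, 1), (2, 2), (3, 1), (3, 2), (3, 3);
""")

joins = """
FROM car AS c
LEFT JOIN designscar AS ds ON c.carid = ds.carid
LEFT JOIN designer AS d ON ds.desid = d.desid
"""

# Plain join: one row per (car, designer) pair.
joined = conn.execute("SELECT c.carid, d.name" + joins).fetchall()

# Aggregated: one row per car, designers collapsed into one column.
aggregated = conn.execute(
    "SELECT c.carid, c.brand, group_concat(d.name, ', ') AS designer_names"
    + joins
    + "GROUP BY c.carid, c.brand ORDER BY c.carid"
).fetchall()

print(len(joined), len(aggregated))
for row in aggregated:
    print(row)
```

The order of names inside the concatenated column is not guaranteed unless the engine is told how to order them.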

SQL Server: Two COUNTs in one query multiplying with one another in output

I have a query that is used to display information in a queue, and part of that information is the number of child entities (packages and labs) that belong to the parent entity (change). However, instead of showing the individual counts of each type of child, the counts multiply with one another.
In the case below, there are supposed to be 3 labs and 18 packages; however, they multiply with one another and the output is 54 of each.
Below is the offending portion of the query.
SELECT cef.ChangeId, COUNT(pac.PackageId) AS 'Packages', COUNT(lab.LabRequestId) AS 'Labs'
FROM dbo.ChangeEvaluationForm cef
LEFT JOIN dbo.Lab
ON cef.ChangeId = Lab.ChangeId
LEFT JOIN dbo.Package pac
ON (cef.ChangeId = pac.ChangeId AND pac.PackageStatus != 6 AND pac.PackageStatus !=7)
WHERE cef.ChangeId = 255
GROUP BY cef.ChangeId
I feel like this is obvious but it's not occurring to me how to fix it so the two counts are independent of one another like to me they should be. There doesn't seem to be a scenario like this in any of my research either. Can anyone guide me in the right direction?
Each LEFT JOIN multiplies the source rows, so what you end up with is effectively a cross join between the labs and the packages of each change. Counting each child table in its own subquery avoids that:
SELECT cef.ChangeId, p.Packages, l.Labs
FROM dbo.ChangeEvaluationForm cef
OUTER APPLY(
SELECT COUNT(*) as Labs
FROM dbo.Lab
WHERE cef.ChangeId = Lab.ChangeId
) l
OUTER APPLY(
SELECT COUNT(*) AS Packages
FROM dbo.Package pac
WHERE (cef.ChangeId = pac.ChangeId AND pac.PackageStatus != 6 AND pac.PackageStatus !=7)
) p
WHERE cef.ChangeId = 255
No GROUP BY is needed now, since each OUTER APPLY subquery already returns exactly one row per ChangeId.
From your question it's difficult to tell what result you expect from your query, so I presume you want the following result:
+----------+----------+------+
| ChangeId | Packages | Labs |
+----------+----------+------+
| 255 | 18 | 3 |
+----------+----------+------+
Try below query if you are looking for above mentioned result.
SELECT cef.ChangeId, ISNULL(pac.PacCount, 0) AS 'Packages', ISNULL(Lab.LabCount, 0) AS 'Labs'
FROM dbo.ChangeEvaluationForm cef
LEFT JOIN (SELECT Lab.ChangeId, COUNT(*) LabCount FROM dbo.Lab GROUP BY Lab.ChangeId) Lab
ON cef.ChangeId = Lab.ChangeId
LEFT JOIN (SELECT pac.ChangeId, COUNT(*) PacCount FROM dbo.Package pac WHERE pac.PackageStatus != 6 AND pac.PackageStatus !=7 GROUP BY pac.ChangeId) pac
ON cef.ChangeId = pac.ChangeId
WHERE cef.ChangeId = 255
Query Explanation:
In your query the counts were taken after both joins, so each ended up as 54, which is the size of the Cartesian product (3 x 18).
In this query I group by ChangeId and compute the aggregates before joining the tables, so the 3 labs and 18 packages are counted before the join.
You will also notice that I have moved the PackageStatus filter inside the pac subquery, before the GROUP BY, so unwanted records won't distort the count.
You start with a particular ChangeId from the dbo.ChangeEvaluationForm table (ChangeId = 255 from your example), then join to the dbo.Lab table. This join makes your result go from 1 row to 3, considering there are 3 Labs with ChangeId = 255. Your problem is on the next join, you are joining all 3 resulting rows from the previous join with the dbo.Package table, which has 18 rows for ChangeId = 255. The resulting count for columns pac.PackageId and lab.LabRequestId will then be 3 x 18 = 54.
To get what you want, there are 2 easy solutions:
Use COUNT(DISTINCT ...) instead of COUNT. This counts only the distinct values of pac.PackageId and lab.LabRequestId, not the repeated ones.
Split the joins into 2 subqueries and join their result (by ChangeId)
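The fan-out and both fixes can be reproduced with 3 labs and 18 packages. This is a sketch on SQLite via Python's sqlite3, with simplified, hypothetical table names (change_form, lab, package) standing in for the dbo tables in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE change_form (changeid INTEGER);
CREATE TABLE lab (labrequestid INTEGER PRIMARY KEY, changeid INTEGER);
CREATE TABLE package (packageid INTEGER PRIMARY KEY, changeid INTEGER,
                      packagestatus INTEGER);
INSERT INTO change_form VALUES (255);
""")
conn.executemany("INSERT INTO lab (changeid) VALUES (?)", [(255,)] * 3)
conn.executemany(
    "INSERT INTO package (changeid, packagestatus) VALUES (?, 1)",
    [(255,)] * 18,
)

joins = """
FROM change_form AS c
LEFT JOIN lab AS l ON l.changeid = c.changeid
LEFT JOIN package AS p
       ON p.changeid = c.changeid AND p.packagestatus NOT IN (6, 7)
GROUP BY c.changeid
"""

# Counting after both joins: every lab row is paired with every package row.
naive = conn.execute(
    "SELECT c.changeid, COUNT(p.packageid), COUNT(l.labrequestid)" + joins
).fetchone()

# Fix 1: count distinct keys instead of joined rows.
distinct = conn.execute(
    "SELECT c.changeid, COUNT(DISTINCT p.packageid),"
    " COUNT(DISTINCT l.labrequestid)" + joins
).fetchone()

# Fix 2: aggregate each child table first, then join the totals.
pre_aggregated = conn.execute("""
SELECT c.changeid, COALESCE(p.cnt, 0), COALESCE(l.cnt, 0)
FROM change_form AS c
LEFT JOIN (SELECT changeid, COUNT(*) AS cnt FROM lab GROUP BY changeid) AS l
       ON l.changeid = c.changeid
LEFT JOIN (SELECT changeid, COUNT(*) AS cnt FROM package
           WHERE packagestatus NOT IN (6, 7) GROUP BY changeid) AS p
       ON p.changeid = c.changeid
""").fetchone()

print(naive, distinct, pre_aggregated)
```

COUNT(DISTINCT ...) only works when the counted column is a unique key of the child table; pre-aggregating also works for sums and other aggregates.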

2 Queries same logic but different no. of output rows

I have 2 queries that both aim to select all BatchNos that satisfy 3 conditions:
ClaimStatus must be 95 or 90
CreatedBy = ProviderLink
The minimum DateUpdated should fall between 3pm yesterday and the time the query is run
Query 1: Outputs 940 rows
SELECT
DISTINCT bh.BatchNo,
bh.Coverage,
DateUploaded = MIN(csa.DateUpdated)
FROM
Registration2..BatchHeader bh with(nolock)
INNER JOIN ClaimsProcess..BatchHeader bhc with(nolock) on bhc.BatchNo = bh.BatchNo
INNER JOIN ClaimsInfo ci with(nolock) on ci.BatchNo = bhc.BatchNo
INNER JOIN Claims c with(nolock) on c.ClaimNo = ci.ClaimNo
INNER JOIN ClaimStatusAudit csa WITH(NOLOCK) on csa.CLAIMNO = ci.ClaimNo
WHERE c.ClaimStatus in('95','90') AND bhc.CreatedBy = 'PROVIDERLINK'
GROUP BY bh.BatchNo, bh.Coverage
HAVING MIN(CSA.DateUpdated) >= convert(varchar(10),GETDATE() -1,110) + ' 15:00:00.000'
Query 2: Outputs 1314 rows
SELECT
DISTINCT bh.BatchNo,
bh.Coverage
FROM Registration2..BatchHeader bh with(nolock)
INNER JOIN ClaimsProcess..BatchHeader bhc with(nolock) on bhc.BatchNo = bh.BatchNo
INNER JOIN ClaimsInfo ci with(nolock) on ci.BatchNo = bhc.BatchNo
INNER JOIN Claims c with(nolock) on c.ClaimNo = ci.ClaimNo
WHERE c.ClaimStatus in('95','90') AND bhc.CreatedBy = 'PROVIDERLINK'
AND (SELECT MIN(DATEUPDATED) FROM CLAIMSTATUSAUDIT WITH(NOLOCK)WHERE CLAIMNO = ci.ClaimNo) >= convert(varchar(10),GETDATE() -1,110) + ' 15:00:00.000'
Though both use the same logic, they output different numbers of rows. I would like to know which of the two is more accurate.
BTW, both outputs satisfy the 3 given conditions.
Your assumption is wrong. These two queries do not employ the same logic, because of the order in which the clauses are evaluated. Clauses are logically evaluated in the following order:
From
Where
Group By
Having
Select
Order By
With that detail out of the way, let's analyze why these two queries return a different number of rows.
The reason you're returning a different number of rows is the point at which each query applies the 3pm cutoff on DateUpdated.
In Query 1, you're selecting all Batch Numbers and Coverages that meet two conditions:
1. have corresponding records in all joined tables
2. have the desired claim status and were created by "ProviderLink"
You get this list of records once the From, Where, and Group by clauses have been executed.
The aggregate (MIN) is then computed over each BatchNo/Coverage group, across every audit record of every claim in that group, because nothing so far has restricted DateUpdated. The HAVING clause then discards any group whose earliest DateUpdated falls before the 3pm cutoff, even if the group also contains newer records. Let's look at an example.
Record 1 has a BatchNo 123 and Coverage A and was last updated on 4/4/2014 12:00:00.000
Record 2 has a BatchNo 123 and Coverage A and was last updated today at 5/1/2014 3:01:00.000
Assuming Records 1 & 2 have corresponding records in all joined tables, Query 1 will pull back the distinct BatchNo and Coverage (123 & A, respectively) and find the minimum DateUpdated, which is 4/4/2014 12:00:00.000. Then, once grouped, the HAVING clause sees that this minimum is earlier than the 3pm cutoff and filters the whole group out.
Query 2, on the other hand, takes a different approach. It also treats Records 1 and 2 as the same BatchNo & Coverage, because those values are identical. However, its WHERE clause evaluates the minimum DateUpdated separately for each claim (the subquery is correlated on ClaimNo). Assuming Records 1 and 2 belong to different claims, Record 2 passes the cutoff on its own, so BatchNo 123 is returned in the dataset.
I think you will find that is the case with the 374 missing records from Dataset 1.
All that said, and with the understanding that we cannot tell you which dataset is better: Query 1 only shows distinct BatchNo & Coverage groups in which the minimum DateUpdated across all records in the group is after the 3pm cutoff. In other words, Query 1 returns only BatchNos and Coverages that consist entirely of very new records.
Query 2 returns any distinct BatchNo & Coverage group in which at least one claim has its minimum DateUpdated after the 3pm cutoff. So which one is right for you?
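The difference between the two filters can be reproduced on a tiny data set. The sketch below is a simplified illustration, not the original schema: it uses SQLite via Python's sqlite3, hypothetical tables claims and audit, ISO-formatted timestamps, and a fixed cutoff in place of the GETDATE() expression.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical, simplified tables: batch 123 holds an old and a new claim,
-- batch 456 holds only a new claim.
CREATE TABLE claims (claimno INTEGER, batchno INTEGER);
CREATE TABLE audit (claimno INTEGER, dateupdated TEXT);
INSERT INTO claims VALUES (1, 123), (2, 123), (3, 456);
INSERT INTO audit VALUES
    (1, '2014-04-04 12:00:00'),
    (2, '2014-05-01 15:01:00'),
    (3, '2014-05-01 16:00:00');
""")

cutoff = "2014-05-01 15:00:00"  # fixed stand-in for the 3pm cutoff

# Query 1 style: minimum over the whole batch, filtered with HAVING.
having_style = conn.execute("""
SELECT c.batchno
FROM claims AS c
JOIN audit AS a ON a.claimno = c.claimno
GROUP BY c.batchno
HAVING MIN(a.dateupdated) >= ?
ORDER BY c.batchno
""", (cutoff,)).fetchall()

# Query 2 style: minimum per claim, filtered in WHERE before DISTINCT.
where_style = conn.execute("""
SELECT DISTINCT c.batchno
FROM claims AS c
WHERE (SELECT MIN(a.dateupdated) FROM audit AS a
       WHERE a.claimno = c.claimno) >= ?
ORDER BY c.batchno
""", (cutoff,)).fetchall()

print(having_style)  # batches whose oldest audit row is after the cutoff
print(where_style)   # batches with at least one claim after the cutoff
```

Batch 123 is dropped by the HAVING version because its oldest audit record predates the cutoff, while the WHERE version keeps it because claim 2 qualifies on its own.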