Sum datediff and group by - sql

I have this simple query, but it's not producing the results I want. Hopefully you can help.
The result is:
Construction 2
Construction 3
Emergency Funds 4
Housing 5
Seniors Services 9
Seniors Services 185
What I want is:
Construction 5
Emergency Funds 4
Housing 5
Seniors Services 194
SELECT T.NAME, SUM(DATEDIFF (HH,T.DATE_DUE,T.DATE_START))as Donation_Hours FROM TASKS T
GROUP BY t.name, T.DATE_DUE,T.DATE_START
order by name

Try this:
SELECT T.NAME, SUM(DATEDIFF(HH,T.DATE_DUE,T.DATE_START))as Donation_Hours
FROM TASKS T
GROUP BY t.name
ORDER BY name

Stan, some additional detail will go a long way in helping you with this problem. The specific database platform, some sample data, and what you mean when you compare what your result is and what you want it to be, should be included at a minimum.
That being said, I think you're alluding to the fact that you get multiple rows per name rather than a single row per name with a grand total of the hours difference. If this is your problem, you can fix it by removing everything in your GROUP BY after t.name. Columns that appear only inside an aggregate do not have to be listed in the GROUP BY clause; only columns that are selected on their own do.
I'm assuming that you're using MSSQL through SSMS. This answer was checked against SQL2008 R2.
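To see the effect of the extra GROUP BY columns on a small scale, here is a minimal sketch in Python using the standard sqlite3 module. The table contents are made up for illustration. SQLite has no DATEDIFF, so the hour difference is computed from Unix timestamps (taken as due minus start); treat it as an approximation of the T-SQL above, not a reproduction of it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tasks (name TEXT, date_start TEXT, date_due TEXT)")
# Hypothetical sample rows; the original post did not include its data.
conn.executemany(
    "INSERT INTO tasks VALUES (?, ?, ?)",
    [
        ("Construction", "2024-01-01 08:00:00", "2024-01-01 10:00:00"),
        ("Construction", "2024-01-02 08:00:00", "2024-01-02 11:00:00"),
        ("Housing", "2024-01-03 08:00:00", "2024-01-03 13:00:00"),
    ],
)

# Whole hours between start and due, standing in for DATEDIFF(HH, ...).
HOURS = (
    "(CAST(strftime('%s', date_due) AS INTEGER)"
    " - CAST(strftime('%s', date_start) AS INTEGER)) / 3600"
)

# Grouping by the date columns too: one group per distinct (name, dates) row.
per_row = conn.execute(
    f"SELECT name, SUM({HOURS}) FROM tasks "
    "GROUP BY name, date_due, date_start ORDER BY name"
).fetchall()

# Grouping by name only: one total per name.
per_name = conn.execute(
    f"SELECT name, SUM({HOURS}) FROM tasks GROUP BY name ORDER BY name"
).fetchall()

print(per_row)
print(per_name)
```

With the extra columns in the GROUP BY, each task row forms its own group, so SUM has nothing to add up; with only the name, the hours collapse into one total per name.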

Related

SQL Aggregate Function over partitions

I'm relatively new to SQL but have learned some cool stuff. I'm getting results that don't make sense. I've got a query with several subqueries and what-not but I have a windowed function that isn't working like I'm expecting.
The part that isn't working is this (simplified from the 300 line query):
SELECT AVG(table.sales_amount)
OVER (PARTITION BY table.month, table.sales_rep, table.department)
FROM table
The problem is that when I pull the data non-aggregated and average it myself I get a different value (107) than the query above returns (95).
I've used windowed functions for COUNT and SUM and they work fine, but AVG is acting strangely. Am I missing something about how this works with AVG?
The subquery that table is a standin for looks like:
sales_rep, month, department, sales_amount
1, 2017-1, abc, 125.20
1, 2017-2, abc, 120.00
2, 2017-1, def, 100.00
...etc
Working out of Sql Server Management studio
SOLVED: I finally figured it out. The results I was joining this subquery to had the sales rep multiple times in a month (selling objects A and B), which caused whoever sold both to be counted twice. Whoops, my bad.
The results that you get should be the same values as in:
SELECT AVG(table.sales_amount)
FROM table
GROUP BY table.month, table.sales_rep, table.department;
Of course, the rows will be different. You need to match up the three key columns.
Based on your sample data, it looks like the partitioning keys uniquely define each row. Perhaps you really intend:
SELECT AVG(table.sales_amount) OVER () as overall_average
FROM table;
EDIT:
For the departmental average:
SELECT AVG(table.sales_amount) OVER (partition by table.department) as department_average
FROM table;
After some brute-forcing of potential errors I finally figured out the issue. I was joining that subquery to another one which had multiple instances of a sales_rep in a given month (selling objects A and B), which caused the sales of anyone who sold both objects to be counted twice in the average instead of once.
So sales rep 1 sold objects A and B, which made his average count for 66% of the department average instead of 50%, while sales rep 2 counted for only 33%.
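The skew described above can be reproduced with a minimal sketch in Python and the standard sqlite3 module (window functions need SQLite 3.25 or later). The tables `sales` and `items` and their values are hypothetical stand-ins for the subquery and the table it was joined to.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (sales_rep INTEGER, month TEXT, department TEXT, sales_amount REAL)"
)
conn.execute("CREATE TABLE items (sales_rep INTEGER, month TEXT, item TEXT)")
# Hypothetical data: rep 1 sold two kinds of object, rep 2 sold one.
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [(1, "2017-1", "abc", 100.0), (2, "2017-1", "abc", 50.0)],
)
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [(1, "2017-1", "A"), (1, "2017-1", "B"), (2, "2017-1", "A")],
)

# Window average over the un-joined rows: each rep counts once.
clean_avg = conn.execute(
    "SELECT DISTINCT AVG(sales_amount) OVER (PARTITION BY month, department) FROM sales"
).fetchone()[0]

# Same window after the join: rep 1's row is duplicated, so it counts twice.
joined_avg = conn.execute(
    """
    SELECT DISTINCT AVG(s.sales_amount) OVER (PARTITION BY s.month, s.department)
    FROM sales AS s
    JOIN items AS i ON i.sales_rep = s.sales_rep AND i.month = s.month
    """
).fetchone()[0]

print(clean_avg, joined_avg)
```

The window function itself is behaving correctly in both queries; the join changes how many rows fall inside each partition, and AVG (unlike a hand-picked figure) weights every row equally.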

Problems with distinct in SQL query

Okay, I've been trying this for a while and haven't succeeded yet. It's kind of mystical, so please help.
Here is my table. I need to select all distinct models and group/order them by the vehicle_type. Everything is ok until I start using DISTINCT.
I'm using postgres
Little help with query please?
Assuming model could be shared between several vehicle types:
SELECT vehicle_type,model
FROM vehicle
GROUP BY vehicle_type,model
ORDER BY vehicle_type,model
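As a quick check of that idea, here is a minimal sketch using Python's standard sqlite3 module (SQLite rather than Postgres, and with made-up rows, since the original table was not included). It shows that grouping by both columns returns the same distinct pairs as SELECT DISTINCT on both columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vehicle (vehicle_type TEXT, model TEXT)")
# Hypothetical rows, including a duplicate and a model shared by two types.
conn.executemany(
    "INSERT INTO vehicle VALUES (?, ?)",
    [
        ("car", "Golf"),
        ("car", "Golf"),
        ("van", "Transit"),
        ("car", "Polo"),
        ("van", "Golf"),
    ],
)

grouped = conn.execute(
    "SELECT vehicle_type, model FROM vehicle "
    "GROUP BY vehicle_type, model ORDER BY vehicle_type, model"
).fetchall()

distinct = conn.execute(
    "SELECT DISTINCT vehicle_type, model FROM vehicle "
    "ORDER BY vehicle_type, model"
).fetchall()

print(grouped)
```

Either form works here; GROUP BY becomes the better fit once an aggregate such as a count per pair is needed.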
The data model does not adequately capture your reporting requirements, as the column data needs to be inspected to categorise it, but something like:
(Extrapolating a possible relationship from your description)
SELECT CASE (vt.description ~ 'car$')
WHEN TRUE THEN 'car'
ELSE 'van'
END AS vehicle_group,
vt.description AS vehicle_sub_group,
COUNT (*) -- or whatever aggregates you might need
FROM vehicle v
INNER JOIN vehicle_type vt ON vt.vehicle_type = v.vehicle_type
GROUP BY 1,2;
Might get you towards what you need in the stated case, however it is a fragile way of dealing with data and will not cope well with additional complexities e.g. if you need to further split car into saloon car, sports car, 4WD or van into flatbed, 7.5 ton, 15 ton etc.

MS Access Complicated Order By

For an invoicing app I'm working on my PHB has decided that the parts sold listed on the invoice need to go in a very unique order. I would like to accomplish this with a single sql statement if possible.
Essentially the order needs to be as follows:
1. The most expensive part (but only if there is another part listed at $0)
2. All parts listed at $0
3. All other parts (or all parts) listed by order of part_id
4. All parts with a part_id of ("MISC-30","MISC-31","TEMP")
5. All parts with a negative qty [returns]
There is also a jumble of comments that will need to be added, but that will have to be handled by the code.
So far I have:
SELECT *
FROM order_part
WHERE ordid = 1234
ORDER BY qty > 0, part_id NOT IN("MISC-30","MISC-31","TEMP"), part_id
However, I cannot figure out how to incorporate the first 2 rules.
Since you've had to give up being messing long ago on this project ;)
SELECT *
, IIf((SELECT Count(*) FROM order_part
WHERE ordid = 1234 AND price = 0) > 0
AND price = (SELECT Max(price) FROM order_part
WHERE ordid = 1234 AND qty > 0
AND part_id NOT IN ("MISC-30","MISC-31","TEMP")), 1
, IIf(price = 0, 2
, IIf(part_id IN ("MISC-30","MISC-31","TEMP"), 4
, IIf(qty < 0, 5
, 3)))) AS Part_Sort
FROM order_part
WHERE ordid = 1234
ORDER BY Part_Sort, part_id
Really wish Access had a CASE statement. But you can build these nested IIf's and provide a sorting number based on your logic. The first test checks that at least one part on the order is listed at $0 before marking the most expensive part as 1. The final "ELSE" part is the #3, since just sorting by the part ID is the third choice and covers anything that doesn't fall under the other categories. If Access refuses to sort by the Part_Sort alias, wrap the statement in an outer SELECT and do the ORDER BY there.
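The sort-bucket idea can be tried outside Access. Below is a minimal sketch using Python's standard sqlite3 module, with CASE standing in for the nested IIf calls. The function name `invoice_order`, the `price` column, and the sample rows are assumptions for illustration; only `order_part`, `ordid`, `part_id` and `qty` come from the question.

```python
import sqlite3

SORT_QUERY = """
SELECT part_id,
       CASE
         -- Bucket 1: the most expensive part, only when some part costs $0
         WHEN (SELECT COUNT(*) FROM order_part
               WHERE ordid = :ordid AND price = 0) > 0
          AND price = (SELECT MAX(price) FROM order_part
                       WHERE ordid = :ordid AND qty > 0
                         AND part_id NOT IN ('MISC-30', 'MISC-31', 'TEMP'))
           THEN 1
         WHEN price = 0 THEN 2
         WHEN part_id IN ('MISC-30', 'MISC-31', 'TEMP') THEN 4
         WHEN qty < 0 THEN 5
         ELSE 3
       END AS part_sort
FROM order_part
WHERE ordid = :ordid
ORDER BY part_sort, part_id
"""


def invoice_order(conn, ordid):
    """Return (part_id, bucket) pairs in invoice order for one order."""
    return conn.execute(SORT_QUERY, {"ordid": ordid}).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE order_part (ordid INTEGER, part_id TEXT, price REAL, qty INTEGER)"
)
# Hypothetical order: one free part, one return, one MISC part.
conn.executemany(
    "INSERT INTO order_part VALUES (?, ?, ?, ?)",
    [
        (1234, "C300", 20.0, 2),
        (1234, "MISC-30", 5.0, 1),
        (1234, "A100", 50.0, 1),
        (1234, "D400", 10.0, -1),
        (1234, "B200", 0.0, 1),
    ],
)

print(invoice_order(conn, 1234))
```

The order of the WHEN branches matters: a row is assigned to the first bucket whose test it passes, so a $0 part that is also a return would land in bucket 2 in this sketch.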
Perhaps you mean you want to have a single recordset i.e. output from your SQL that can be processed by your invoicing App?
Have you thought of the following? It's not pretty, but it might work.
Select * From
(
Select 1 as MyOrder .... rest of criteria 1
Union
Select 2 as MyOrder .... rest of criteria 2
Union
Select 3 as MyOrder .... rest of criteria 3
Union
Select 4 as MyOrder .... rest of criteria 4
Union
Select 5 as MyOrder .... rest of criteria 5
)
Order by MyOrder
If it's not possible to do this with one select statement, I would write 5 queries that each get the parts of this end query you need with no intersections. Then add a SortBy integer value to each query and union them together (sorting by the SortBy value).
I've done this in SQL Server and I'm guessing this is possible in Access...
Even if this is possible with a single query, I think you owe it to yourself and future developers to make individual queries and join the results together some other way.
The only exception to this is if performance is 100% completely critical and you need to save every microsecond.
But as a developer and manager I'd rather see maintainable code that a junior team member can figure out than some uber-messy SQL statement.
My advice: have the guy that makes you waste your precious time doing such dummy things fired! He must be a slave driver or something? If you can't have him fired, leave the company.

A simple MDX question

I am new to MDX and I know that this must be a simple question but I haven't been able to find an answer.
I am modeling a questionnaire that has questions and answers. What I am trying to achieve is to find out the number of people who gave specific answers to questions, e.g. the number of males aged between 20 and 25.
When I run the query below for the questions individually the correct result is returned
SELECT
[Measures].[Fact Demographics Count] ON Columns
FROM
[Dsv All]
WHERE
[Answer].[Dim Answer].&[1]
[Measures].[Fact Demographics Count] is a count of the primary key column
[Answer].[Dim Answer].&[1] is the key for the Male answer
Result for number of people who are male = 150
Result for number of people who are between 20-25 = 12
But when I run the next query below, rather than getting the number of people who are male and aged 20-25, I get the sum of the number of people who are male and the number of people who are aged 20-25.
SELECT
[Measures].[Fact Demographics Count] ON Columns
FROM
[Dsv All]
WHERE
{[Answer].[Dim Answer].&[1],[Answer].[Dim Answer].&[9]}
result = 162
The structure of the fact table is
FactDemographicsKey,
RespodentKey,
QuestionKey,
AnswerKey
Any help would be greatly appreciated
Thanks
Take a look at the MDX function FILTER - this may give you what you need. A combination of FILTER and Member Properties to filter against the ID's might do it. You're having a problem because what you're trying to do is a little against the grain of how an OLAP cube is structured (from my experience) because Age and Gender are both members of the same dimension (Answers), which means that they each get their own cells for aggregation, but unlike if Age and Gender were each on their own dimension, they don't get aggregated with respect to one another except to get added together. In an OLAP cube, each combination of each member of each dimension with each member of every other dimension gets a "cell" with the value of each measure that is unique to that combination - that is what you want, but members of the same dimension (such as Answers) aren't cross-calculated in that way.
If possible, I would recommend breaking out the individual answers into individual dimensions, i.e. Age and Gender each have their own dimensions with their own members; then what you want to do will naturally flow out of your cube. Otherwise, I'm afraid you will have lots of MDX fiddling to do. (I am not an MDX expert, though, so I could be completely off base on this one, but that is my understanding.)
Also, definitely read the book previously mentioned, MDX Solutions, unless this is the only MDX query you think you'll need to write. MDX and Multidimensional analysis are nothing like SQL, and a solid understanding of the structure of an OLAP database and MDX in general is absolutely essential, and that book does a very, very nice job of getting you where you need to be in that department.
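The difference between adding the two counts and intersecting them can be illustrated outside the cube. The following is a relational sketch only (Python with the standard sqlite3 module, not MDX), built on a fact table shaped like the one in the question. The table name and rows are hypothetical; answer keys 1 (Male) and 9 (aged 20-25) follow the question, the other keys are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fact_demographics ("
    "respondent_key INTEGER, question_key INTEGER, answer_key INTEGER)"
)
# Hypothetical respondents: 1 = male and 20-25, 2 = male only, 3 = 20-25 only.
conn.executemany(
    "INSERT INTO fact_demographics VALUES (?, ?, ?)",
    [
        (1, 1, 1), (1, 2, 9),
        (2, 1, 1), (2, 2, 10),
        (3, 1, 2), (3, 2, 9),
    ],
)

# Slicing on the set {Male, 20-25} in one dimension behaves like this:
# every fact row matching either answer is counted, so the counts add up.
summed = conn.execute(
    "SELECT COUNT(*) FROM fact_demographics WHERE answer_key IN (1, 9)"
).fetchone()[0]

# What was actually wanted: respondents who gave both answers.
both = conn.execute(
    """
    SELECT COUNT(*) FROM (
        SELECT respondent_key
        FROM fact_demographics
        WHERE answer_key IN (1, 9)
        GROUP BY respondent_key
        HAVING COUNT(DISTINCT answer_key) = 2
    ) AS matched
    """
).fetchone()[0]

print(summed, both)
```

Because each answer is its own fact row, "male and 20-25" is a property of the respondent, not of any single row, which is why the count has to be taken per respondent.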
When trying to figure out problems with where-criteria or slices, I find it helpful to break down the items that you're slicing on into dimensions, rather than measures.
select
[Measures].[Fact Demographics Count] on Columns
from [Dsv All]
where
{
[Answer].[Dim Answer].&[1],
[Dim Age Band].[20-25]
}
Although then you're not really using the power of MDX - you're getting just one value.
select
[Dim Answer].Members on Columns,
[Dim Age Band].Members on Rows
from [Dsv All]
where ( [Measures].[Fact Demographics Count] )
Will give you a pivot table (or crosstable) breaking down gender (on columns) by age-bands (on rows).
BTW, if you're learning MDX, the book MDX Solutions is far and away the best starting point that I've found.
Firstly, thanks to everyone for their replies. This was an interesting one to solve, and for anyone new to MDX and coming from SQL it's an easy trap to fall into.
So for those interested here is a brief overview of the solution.
I have 3 tables
factDemographics: holds respondents and their answers (who answered what)
dimAnswer: the answers
dimRespondent: the respondents
In the datasource view for the cube I duplicated factDemographics 5 times using Named Queries and I named these fact1, fact2, ..., fact5. (which will create 5 measure groups)
Using Visual Studio's create cube wizard I set up the following fact tables:
fact1, fact2, ... as fact tables
dimRespondent as a fact table. I use this table to get the number of respondents.
Removed the original factDemographics table.
Once the cube was created I duplicated the dimAnswer dimension 5 times, naming them filter1, filter2, ...
Finally in the Cube Structure's Dimension Usage tab I linked these together as follows
filter1 many to many dimRespondent
filter2 many to many dimRespondent
filter3 many to many dimRespondent
filter4 many to many dimRespondent
filter5 many to many dimRespondent
filter1 regular relationship fact1
filter2 regular relationship fact2
filter3 regular relationship fact3
filter4 regular relationship fact4
filter5 regular relationship fact5
This now enables me to rewrite the query I used in my original post as
SELECT
[Measures].[Dim Respondent Count] On 0
FROM
[DemographicsCube]
WHERE
(
[Filter1].[Answer].&[Male],
[Filter2].[Answer].&[20-25]
)
My query can now be filtered by up to 5 questions.
Although this works I'm sure that there is a more elegant solution. If anyone knows what that is I'd love to hear it.
Thanks
If you are using MSSQL, you can add "WITH ROLLUP" or "WITH CUBE" to a GROUP BY to get some extra subtotal rows which would have the information you want. Also, you are not using a "GROUP BY", which you will need.
Use the GROUP BY to break up the set into groups and then use aggregate functions to get your counts and other stats.
Example:
select AGE, GENDER, count(1)
from MY_TABLE
group by AGE, GENDER
with cube
This would give you the number of each gender of person in your table in each age group, and the "cube" would add the total number of people in your table, the numbers in each age group regardless of gender, and the numbers of each gender regardless of age. ("with rollup" on the same GROUP BY would add only the per-age subtotals and the grand total.) Something like
AGE   GENDER  COUNT
----  ------  -----
20    M       1245
21    M       1012
20    F        942
21    F        838
NULL  M       2257
NULL  F       1780
20    NULL    2187
21    NULL    1850
NULL  NULL    4037

SQL Group By

If I have a set of records
name amount Code
Dave 2 1234
Dave 3 1234
Daves 4 1234
I want this to group based on Code & Name, but the last row has a typo in the name, so this won't group.
What would be the best way to group these as:
Dave/Daves 9 1234
As a general rule if the data is wrong you should fix the data.
However if you want to do the report anyway you could come up with another criteria to group on, for example LEFT(Name, 4) would perform a grouping on the first 4 characters of the name.
You may also want to consider the CASE expression as a method (CASE WHEN name = 'Daves' THEN 'Dave' ELSE name END), but I really don't like this method, especially if you are proposing to use this for anything other than a one-off report.
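A minimal sketch of the prefix-grouping idea, using Python's standard sqlite3 module. SQLite has no LEFT(), so `substr(name, 1, 4)` is used as its equivalent here; the table name `records` is an assumption, and the rows are the three from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (name TEXT, amount INTEGER, code INTEGER)")
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?)",
    [("Dave", 2, 1234), ("Dave", 3, 1234), ("Daves", 4, 1234)],
)

# Group on the first four characters of the name plus the code,
# so 'Dave' and 'Daves' fall into the same group.
rows = conn.execute(
    """
    SELECT substr(name, 1, 4) AS short_name, code, SUM(amount)
    FROM records
    GROUP BY substr(name, 1, 4), code
    """
).fetchall()

print(rows)
```

The same shortcut would also merge genuinely different names that share a four-character prefix (a hypothetical 'Davey', say), which is one more reason to treat it as a one-off reporting workaround.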
If it's a workaround, try
SELECT cname, SUM(amount)
FROM (
SELECT CASE WHEN NAME = 'Daves' THEN 'Dave' ELSE name END AS cname, amount
FROM mytable
) AS t
GROUP BY cname
This of course will handle only this exact case.
For MySQL:
select
group_concat(distinct name separator '/'),
sum(amount),
code
from
T
group by
code
For MSSQL 2005+ group_concat() can be implemented as .NET custom aggregate.
Fix the typo? Otherwise grouping on the name is going to create a new group.
Fixing your data should be your highest priority instead of trying to devise ways to "work around" it.
It should also be noted that if you have this single typo in your data, it is likely that you have (or will have at some point in the future) even more screwy data that will not cleanly fit into your code, which will force you to invent more and more "work arounds" to deal with it, when you should be focusing on the cleanliness of your data.
If the name field is supposed to be a key, then the assumption has to be that Dave and Daves are two different items altogether, and thus should be grouped differently. If, however, it is a typo, then as others have suggested, fix the data.
Grouping on a freeform entered text field if that is what this is, will always have issues. Data entry is never 100%.
To me it makes more sense to group on the code alone, if that is the key field, and leave name out of the grouping altogether.