Sum duplicate row values - sql

I'm trying to sum the values of rows, two of which share the same value. The table I have is below:
Table name (Customers)
value years total
1 30 30
3 10 10
4 15 15
4 25 25
I would ideally like to finally have:
value years total
1 30 30
3 10 10
4 40 40
I've tried using SELECT DISTINCT and GROUP BY to get rid of the duplicate row, but neither works. (The join in the code below isn't necessary.) Here's my code:
SELECT DISTINCT
value,
years,
SUM(customer.years) AS total
FROM customer
INNER JOIN language
ON customer.expert=language.l_id
GROUP BY
expert,
years;
But that produces a copy of the first table. Any input is welcome. Thanks!

SELECT
value,
SUM(years) AS years,
SUM(total) AS total
FROM customers
GROUP BY value;
You want the sum of the years and the sum of the total per value, that is, grouped by value.
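As a quick check of the GROUP BY approach, here is a minimal sketch that runs the same query against the sample rows from the question. It assumes an in-memory SQLite database driven from Python, which the question does not specify; the ORDER BY is added only to make the output order predictable.

```python
import sqlite3

# In-memory database holding the sample rows from the question
# (SQLite is an assumption; the original database is not named).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (value INTEGER, years INTEGER, total INTEGER)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, 30, 30), (3, 10, 10), (4, 15, 15), (4, 25, 25)],
)

# One output row per distinct value; years and total are summed within each group.
rows = conn.execute(
    """
    SELECT value, SUM(years) AS years, SUM(total) AS total
    FROM customers
    GROUP BY value
    ORDER BY value
    """
).fetchall()

for row in rows:
    print(row)
```

The two rows with value 4 collapse into a single row whose years and total are both 40, which matches the result the question asks for.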

SELECT
value,
SUM(years) AS years,
SUM(total) AS total
-- no DISTINCT in the derived table: it would drop identical rows before they are summed
FROM (SELECT
customer.value,
customer.years,
customer.years AS total
FROM customer
INNER JOIN language
ON customer.expert = language.l_id) AS tablecus
GROUP BY
value;

Related

GROUP BY one column, then GROUP BY another column

I have a database with a sales table t:
ID TYPE AGE
1 B 20
1 BP 20
1 BP 20
1 P 20
2 B 30
2 BP 30
2 BP 30
3 P 40
If a person buys a bundle, the sale appears as one bundle row (TYPE B) plus one row per bundled product (TYPE BP), all with the same ID. So a bundle with 2 products appears 3 times (1x TYPE B and 2x TYPE BP) under the same ID.
A person can also buy any other product in the same sale (TYPE P), which also has the same ID.
I need to calculate the average/min/max age of the customers, but the multiple entries per sale distort the calculation.
The real average age is
(20 + 30 + 40) / 3 = 30
and not
(20+20+20+20 + 30+30+30 + 40) / 8 = 26.25
But I don't know how to reduce each sale to a single row AND get the 4 needed values.
Do I need to GROUP BY twice (first by ID, then by AGE)? If so, how can I do it?
My code so far:
SELECT
AVG(AGE)
, MIN(AGE)
, MAX(AGE)
, MEDIAN(AGE)
FROM t
but that does count every row.
Assuming the age is the same for all rows with the same ID (which in itself indicates a normalisation problem), you can use nested aggregation:
select avg(min(age)) from sales
group by id
AVG(MIN(AGE))
-------------
30
The example in the documentation is very similar; and is explained as:
This calculation evaluates the inner aggregate (MAX(salary)) for each group defined by the GROUP BY clause (department_id), and aggregates the results again.
So for your version:
This calculation evaluates the inner aggregate (MIN(age)) for each group defined by the GROUP BY clause (id), and aggregates the results again.
It doesn't really matter whether the inner aggregate is min or max - again, assuming they are all the same - it's just to get a single value per ID, which can then be averaged.
You can do the same for the other values in your original query:
select
avg(min(age)) as avg_age,
min(min(age)) as min_age,
max(min(age)) as max_age,
median(min(age)) as med_age
from sales
group by id;
AVG_AGE MIN_AGE MAX_AGE MED_AGE
------- ------- ------- -------
30 20 40 30
Or, if you prefer, you could get the one-age-per-ID values once in a CTE or subquery and apply the second layer of aggregation to that:
select
avg(age) as avg_age,
min(age) as min_age,
max(age) as max_age,
median(age) as med_age
from (
select min(age) as age
from sales
group by id
);
which gets the same result.
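The subquery form is also the portable one, since not every database accepts an aggregate nested inside another. A minimal sketch, assuming SQLite driven from Python rather than the database used above; MEDIAN is left out because SQLite has no built-in median function:

```python
import sqlite3

# Sample rows from the question, loaded into an in-memory SQLite database
# (SQLite is an assumption; it stands in for the original database).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER, type TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        (1, "B", 20), (1, "BP", 20), (1, "BP", 20), (1, "P", 20),
        (2, "B", 30), (2, "BP", 30), (2, "BP", 30),
        (3, "P", 40),
    ],
)

# Naive average: every row counts, so sales with many rows are over-weighted.
naive_avg = conn.execute("SELECT AVG(age) FROM sales").fetchone()[0]

# Two-step aggregation: reduce each sale to one age, then aggregate those ages.
per_sale = conn.execute(
    """
    SELECT AVG(age) AS avg_age, MIN(age) AS min_age, MAX(age) AS max_age
    FROM (
        SELECT MIN(age) AS age
        FROM sales
        GROUP BY id
    )
    """
).fetchone()

print(naive_avg)
print(per_sale)
```

The naive average comes out at 26.25, while the per-sale figures are an average of 30, a minimum of 20 and a maximum of 40.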

How is an average used with GROUP BY different from a straightforward average in a subquery?

Can anyone help me with the below?
SELECT SalesOrderID, SUM (LineTotal) AS TotalSales
FROM SALES.SalesOrderDetail
GROUP BY SalesOrderID
HAVING SUM (LineTotal)>
--query1
(
SELECT AVG (SumCalc.SumValues)
FROM (SELECT SUM(LineTotal) AS SumValues
FROM Sales.SalesOrderDetail
GROUP BY SalesOrderID
) AS SumCalc
)
---query2
SELECT AVG (LineTotal)
FROM SALES.SalesOrderDetail
Why are query 1 and query 2 giving different results? I am just trying to simplify query 1. How does the GROUP BY clause make a difference here? When I ran the sum in separate queries the values were identical, so I am not sure how the way the average is calculated differs here.
This is the difference between a weighted average and a simple average. Consider this simple example:
code value
A 1
A 1
A 4
B 4
The overall average is simply 2.5 -- (1 + 1 + 4 + 4) / 4.
If we aggregate by the code first and then take the average, we have:
code average
A 2
B 4
The average of these two rows is 3, quite different.
Simply put: The average of averages on aggregated data can be quite different from the overall average.
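The four-row example can be checked directly. This is a minimal sketch assuming an in-memory SQLite database driven from Python; the table name samples is made up for the illustration.

```python
import sqlite3

# The four example rows; "samples" is a hypothetical table name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (code TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO samples VALUES (?, ?)",
    [("A", 1), ("A", 1), ("A", 4), ("B", 4)],
)

# Overall average: each row carries the same weight.
overall_avg = conn.execute("SELECT AVG(value) FROM samples").fetchone()[0]

# Average of per-code averages: each code carries the same weight,
# however many rows it has.
avg_of_avgs = conn.execute(
    """
    SELECT AVG(group_avg)
    FROM (SELECT AVG(value) AS group_avg FROM samples GROUP BY code)
    """
).fetchone()[0]

print(overall_avg, avg_of_avgs)
```

Code B has a single row but counts as much as code A's three rows in the second query, which is why the two results differ.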

Forcing empty rows from query

I have a table containing monthly statistics for clients.
Columns are CustNo, Year, Month, Trips
Some customers do not have any trips in some months and therefore there are combinations of CustNo, Year and Month that have no rows in that table.
I am trying to write a Query that shows 0 for those combinations of CustNo, Year and Month that have no trips, instead of producing an empty row.
To start with I have created a ValidPeriods table that has a Year and a Month column containing those periods that are valid.
I can then Query like this:
SELECT v.ValidYear, v.ValidMonth, tc.CustNo, tc.Trips
FROM ValidPeriods v
LEFT OUTER JOIN TempTrips AS tc ON v.ValidYear = tc.Year
AND v.ValidMonth = tc.Month
WHERE tc.CustNo IN (1001230, 1001286, 1001292)
This will give me rows for all periods, with 1 row with NULL values for those periods where there are no customers in the list that have any trips.
But how do I get one row for each customer in the list for all periods?
Ideally I want this:
2016 1 1001230 0
2016 1 1001286 14
2016 1 1001292 23
2016 2 1001230 7
2016 2 1001286 0
2016 2 1001292 4
etc...
Generate the rows using cross join. Then fill in the values using left join:
SELECT ym.ValidYear, ym.ValidMonth, c.CustNo, COALESCE(tt.Trips, 0) AS Trips
FROM ValidPeriods ym CROSS JOIN
(VALUES (1001230), (1001286), (1001292)) c(CustNo) LEFT JOIN
TempTrips tt
ON tt.Year = ym.ValidYear AND tt.Month = ym.ValidMonth AND
tt.CustNo = c.CustNo;
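To see the cross join followed by a left join at work, here is a minimal sketch on made-up rows chosen to reproduce the six lines of desired output from the question. It assumes SQLite driven from Python; because SQLite does not accept a column list on a derived-table alias, the customer list is written as a CTE instead of an inline VALUES table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ValidPeriods (ValidYear INTEGER, ValidMonth INTEGER)")
conn.execute("CREATE TABLE TempTrips (CustNo INTEGER, Year INTEGER, Month INTEGER, Trips INTEGER)")
conn.executemany("INSERT INTO ValidPeriods VALUES (?, ?)", [(2016, 1), (2016, 2)])
# Hypothetical trip rows; customer/month pairs without trips have no row at all.
conn.executemany(
    "INSERT INTO TempTrips VALUES (?, ?, ?, ?)",
    [
        (1001286, 2016, 1, 14),
        (1001292, 2016, 1, 23),
        (1001230, 2016, 2, 7),
        (1001292, 2016, 2, 4),
    ],
)

rows = conn.execute(
    """
    WITH c(CustNo) AS (VALUES (1001230), (1001286), (1001292))
    SELECT ym.ValidYear, ym.ValidMonth, c.CustNo, COALESCE(tt.Trips, 0) AS Trips
    FROM ValidPeriods ym
    CROSS JOIN c                      -- every period paired with every customer
    LEFT JOIN TempTrips tt            -- attach trips where a matching row exists
           ON tt.Year = ym.ValidYear
          AND tt.Month = ym.ValidMonth
          AND tt.CustNo = c.CustNo
    ORDER BY ym.ValidYear, ym.ValidMonth, c.CustNo
    """
).fetchall()

for row in rows:
    print(row)
```

The customer filter lives in the generated customer list rather than in a WHERE clause on the trips table, so periods without trips still produce a row with 0.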

SQL Server: select rows whose sum matches a value

Here is table T:
id num
-------
1 50
2 20
3 90
4 40
5 10
6 60
7 30
8 100
9 70
10 80
and the following is a fictional sql
select *
from T
where sum(num) = '150'
the expected result is :-
(A)
id num
-------
1 50
8 100
(B)
id num
-------
2 20
7 30
8 100
(C)
id num
-------
4 40
5 10
8 100
Case 'A' is the most preferred!
I know this problem is related to combinations.
In the real world, a client gets items from a shop and, by agreement with the shop, pays every Friday. The payment amount is not the exact total of the items.
For example: he gets 5 books at 50 € each (= 250 €), and on Friday he brings 150 €, so the first 3 books are a perfect match: 3 * 50 = 150. I need to find the IDs of those 3 books!
Any help would be appreciated!
You can use a recursive query in MSSQL to solve this.
The first recursive query builds a tree of items with cumulative sum <= 150. The second recursive query takes the leaves with cumulative sum = 150 and outputs every such path back to its root. The final results are ordered by ItemsCount, so the preferred groups (those with the fewest items) come first.
WITH CTE as
( SELECT id,num,
id as Grp,
0 as parent,
num as CSum,
1 as cnt,
CAST(id as Varchar(MAX)) as path
from T where num<=150
UNION all
SELECT t.id,t.num,
CTE.Grp as Grp,
CTE.id as parent,
T.num+CTE.CSum as CSum,
CTE.cnt+1 as cnt,
CTE.path+','+CAST(t.id as Varchar(MAX)) as path
from T
JOIN CTE on T.num+CTE.CSum<=150
and CTE.id<T.id
),
BACK_CTE as
(select CTE.id,CTE.num,CTE.grp,
CTE.path ,CTE.cnt as cnt,
CTE.parent,CSum
from CTE where CTE.CSum=150
union all
select CTE.id,CTE.num,CTE.grp,
BACK_CTE.path,BACK_CTE.cnt,
CTE.parent,CTE.CSum
from CTE
JOIN BACK_CTE on CTE.id=BACK_CTE.parent
and CTE.Grp=BACK_CTE.Grp
and BACK_CTE.CSum-BACK_CTE.num=CTE.CSum
)
select id,NUM,path, cnt as ItemsCount from BACK_CTE order by cnt,path,Id
If you restrict your problem to "which two numbers add up to a value", the solution is as follows:
SELECT t1.id, t1.num, t2.id,t2.num
FROM T t1
INNER JOIN T t2
ON t1.id < t2.id
WHERE t1.num + t2.num = 150
If you also want the result for three and more numbers you can achieve that by using the above query as a base for recursive SQL. Don't forget to specify a maximum recursion depth!
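One way the pair query might be extended to larger groups is sketched below. It assumes SQLite driven from Python rather than SQL Server, and the names combo, last_id, depth, :target and :max_depth are made up for the illustration. The depth check plays the role of the maximum recursion depth, and requiring ascending ids keeps each combination from being reported in several orders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (id INTEGER PRIMARY KEY, num INTEGER)")
data = [(1, 50), (2, 20), (3, 90), (4, 40), (5, 10),
        (6, 60), (7, 30), (8, 100), (9, 70), (10, 80)]
conn.executemany("INSERT INTO T VALUES (?, ?)", data)
num_by_id = dict(data)

rows = conn.execute(
    """
    WITH RECURSIVE combo(last_id, total, depth, path) AS (
        -- start every combination with a single row
        SELECT id, num, 1, CAST(id AS TEXT)
        FROM T
        WHERE num <= :target
        UNION ALL
        -- extend a combination with a row of higher id, within the limits
        SELECT T.id, combo.total + T.num, combo.depth + 1, combo.path || ',' || T.id
        FROM combo
        JOIN T ON T.id > combo.last_id
        WHERE combo.total + T.num <= :target
          AND combo.depth < :max_depth
    )
    SELECT path, depth
    FROM combo
    WHERE total = :target
    ORDER BY depth, path
    """,
    {"target": 150, "max_depth": 3},
).fetchall()

pairs = [path for path, depth in rows if depth == 2]
print(pairs)
```

Sorting by depth lists the smallest groups first, so the pair of ids 1 and 8 (case A in the question) leads the result.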
To find the IDs of the books that the client is paying for, you would need a table of your clients and another one storing each client's orders and the products he bought.
Otherwise it would be impossible to know what product the payment refers to.

Sqlite: Selecting records spread over total records

I have a SQL / SQLite question. I need to write a query that selects some values from a SQLite database table. I never want more than 20 records returned. If more than 20 records match, I need to select 20 records that are spread evenly (not randomly) over the total. It is also important that I always select the first and last value from the table when sorted by date. These records should come first and last in the result.
I know how to accomplish this in code but it would be perfect to have a sqlite query that can do the same.
The query I'm using now is really simple and looks like this:
"SELECT value,date,valueid FROM tblvalue WHERE tblvalue.deleted=0 ORDER BY DATE(date)"
For example, suppose I have these records in the table and, to keep the example simple, the maximum number of results I want is 5.
id value date
1 10 2010-04-10
2 8 2010-04-11
3 8 2010-04-13
4 9 2010-04-15
5 10 2010-04-16
6 9 2010-04-17
7 8 2010-04-18
8 11 2010-04-19
9 9 2010-04-20
10 10 2010-04-24
The result I would like is spread evenly like this:
id value date
1 10 2010-04-10
3 8 2010-04-13
5 10 2010-04-16
7 8 2010-04-18
10 10 2010-04-24
I hope that explains what I want, thanks!
Something like this should work for you:
SELECT *
FROM (
SELECT v.value, v.date, v.valueid
FROM tblvalue v
LEFT OUTER JOIN (
SELECT min(DATE(date)) as MinDate, max(DATE(date)) as MaxDate
FROM tblvalue
WHERE tblvalue.deleted = 0
) vm on DATE(v.date) = vm.MinDate or DATE(v.date) = vm.MaxDate
WHERE v.deleted = 0
ORDER BY vm.MinDate desc, Random()
LIMIT 20
) a
ORDER BY DATE(date)
I think you want this:
SELECT value,date,valueid FROM tblvalue WHERE tblvalue.deleted=0
ORDER BY DATE(date), Random()
LIMIT 20
In other words, you want to select rows by date: take every other element from the sorted list of dates, add the last recorded element (the one with the latest date), and limit everything to at most 20 rows?
If that's the case, then I think this one should do:
SELECT id,value,date FROM source_table WHERE date IN (SELECT date FROM source_table WHERE (rowid-1) % 2 = 0 OR date = (SELECT max(date) FROM source_table) ORDER BY date) LIMIT 20
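For the non-random, evenly spread selection described in the question, one possible approach is to number the rows by date and keep those whose position is k * (n - 1) / (max_rows - 1) for k = 0 .. max_rows - 1, using integer division. The sketch below is an illustration under several assumptions: SQLite 3.25 or later (for window functions) driven from Python, max_rows of at least 2, and the made-up names ordered, picks, idx and :max_rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tblvalue (valueid INTEGER PRIMARY KEY, value INTEGER, "
    "date TEXT, deleted INTEGER DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO tblvalue (valueid, value, date) VALUES (?, ?, ?)",
    [
        (1, 10, "2010-04-10"), (2, 8, "2010-04-11"), (3, 8, "2010-04-13"),
        (4, 9, "2010-04-15"), (5, 10, "2010-04-16"), (6, 9, "2010-04-17"),
        (7, 8, "2010-04-18"), (8, 11, "2010-04-19"), (9, 9, "2010-04-20"),
        (10, 10, "2010-04-24"),
    ],
)

rows = conn.execute(
    """
    WITH RECURSIVE
    ordered AS (
        -- zero-based position of each live row by date, plus the row count
        SELECT valueid, value, date,
               ROW_NUMBER() OVER (ORDER BY DATE(date)) - 1 AS idx,
               COUNT(*) OVER () AS n
        FROM tblvalue
        WHERE deleted = 0
    ),
    picks(k) AS (
        -- the sample numbers 0 .. max_rows - 1
        SELECT 0
        UNION ALL
        SELECT k + 1 FROM picks WHERE k + 1 < :max_rows
    )
    SELECT DISTINCT o.valueid, o.value, o.date
    FROM ordered o
    JOIN picks p ON o.idx = p.k * (o.n - 1) / (:max_rows - 1)
    ORDER BY DATE(o.date)
    """,
    {"max_rows": 5},
).fetchall()

selected_ids = [r[0] for r in rows]
print(selected_ids)
```

With k = 0 the first row is always picked and with k = max_rows - 1 the last one is, so both ends are included. When the table holds no more rows than max_rows, several k values land on the same position and DISTINCT removes the repeats, so every row comes back once.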