An aggregation is affecting results in a major way - sql

I seem to be getting duplicates as a result of this query. The only analysis I want is the sum of calls / the total orders, and to be able to see how many support tickets were generated from orders within an order date range, up to a call_date. Very simple, but surprisingly complex to code up. Here is my attempt. I have also tried to rewrite the below as a union, but I still get wrong aggregate results.
The query:
SELECT marketing_code,
       count(order_code) order_code_count,
       order_date,
       sum(support_ticket_call) call_count,
       call_date
FROM
    (select distinct marketing_code, order_code, order_date from table1) a
    left join
    (select count(call_ids) as support_ticket_Call, call_date
     FROM table2 group by call_date) b
    on b.order_ID_code = a.order_id_code
group by marketing_code, order_date, call_date
Please note, the call can happen at a much later date than the order. The order date is in table 1, but not in table 2; the call_date is in table 2, but not in table 1. Also, in the data, the marketing code is either AB16 or AB17.
Sample data:
marketing_code   order_code_count   call_count   call_date    order_date
AB16             30                 45           2016-01-01   2015-12-27
AB17             13                 17           2016-01-02   2015-12-29
AB16             24                 29           2016-01-02   2016-01-01
The sum of support ticket calls should be lower than the order count.

You join your tables on order_id_code, but the right-hand side of your join has already counted all calls for a whole day and no longer carries an order_id_code to join on. This doesn't seem right. Try something like this:
select
    marketing_code,
    count(order_code) order_code_count,
    order_date,
    count(call_ids) call_count,
    call_date
from
    table1 a left join table2 b on b.order_ID_code = a.order_id_code
group by
    marketing_code, order_date, call_date
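One caveat, as an aside: if a single order can generate several support calls, the join above will repeat each order row once per call, so count(order_code) gets inflated. Counting distinct order codes is one way to keep the order count honest; a sketch under that assumption:
select
    marketing_code,
    count(distinct order_code) order_code_count,
    order_date,
    count(call_ids) call_count,
    call_date
from
    table1 a left join table2 b on b.order_ID_code = a.order_id_code
group by
    marketing_code, order_date, call_date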

How to GROUP BY and aggregate fields after JOINS in Query

I have the following data which I got from the following query:
#   date                  quantity   name       season_id   contract_id   signing_date
1   2016-07-01 00:00:00   3          John Doe   4           3000          2016-10-20
2   2021-07-28 00:00:00   14         John Doe   5           3541          2021-01-28
3   2016-08-15 00:00:00   10         John Doe   5           3000          2016-10-20
4   2016-08-02 00:00:00   5          John Doe   5           1528          2016-03-02
WITH ws AS (select date, quantity,
name, season_id, contract_id, contract.signing_date
FROM warehouse_state
JOIN inventory ON inventory.id = warehouse_state.inventory_id
JOIN owner ON owner.inventory_id = warehouse_state.id
JOIN season ON season.id = owner.season_id
JOIN contract ON contract.id = warehouse_contract.contract_id
GROUP BY date, quantity, name, season.id, contract.id, signing_date)
Now, I am having trouble aggregating the ws records based on dates.
Let's say I want a SUM of quantity grouped by date, where date is before the contract's signing_date. I'm not sure how to proceed with this; it can probably be done in a single query without a WITH x AS at all, or with something that actually uses it, like:
SELECT * FROM ws
LEFT JOIN contract on contract.contract_id = ws.contract_id
-- Here set following condition: for any ws record that has `date` before `signing_date`, SUM quantity and return aggregate
Expected output:
contract_id   signing_date   quantity   name
3000          2016-10-20     18         John Doe
3541          2021-01-28     18         John Doe
1528          2021-01-28     0          John Doe
In the expected output, quantity is a SUM, and the records are grouped by contract. In the first record, #1, #3, and #4 were aggregated because their date values are before the contract (3000) signing_date. Even though the 4th record does not have the same contract_id, it is also aggregated because its date field is before the signing date of contract 3000. Similarly, when grouped by contract 3541, record #2 is excluded from the aggregation because its date value is not before the signing_date of contract 3541.
Any suggestions? Thanks
Does that SQL really compile? The reason I ask is that I see you referencing a warehouse_contract table that I don't see joined anywhere.
Also, you are grouping on all columns -- essentially a "select distinct." Is that what you meant to do?
That aside, assuming your joins are correct and a couple of other assumptions, I'm going to sub them all with "< your tables and joins >." I think all you want is a simple aggregate. No need for a CTE (with clause).
select
date, sum (quantity)
FROM
< your tables and joins >
where
date < signing_date
GROUP BY
date
Alternatively, you can see the total quantity for all dates AND the total quantity before the contract date using a filter:
select
date, sum (quantity) as total_quantity,
sum (quantity) filter (where date < signing_date) as qty_before_contract_sign
FROM
< your tables and joins >
GROUP BY
date
If you wanted to see the other columns as well, then you want a windowing function. Let me know if that's the case and I can demonstrate.
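For illustration, such a windowed variant might look roughly like this, keeping the < your tables and joins > placeholder and assuming a dialect (such as Postgres) that supports FILTER:
select
    date, quantity, name, contract_id, signing_date,
    sum (quantity) over (partition by date) as total_quantity,
    sum (quantity) filter (where date < signing_date)
        over (partition by date) as qty_before_contract_sign
from
    < your tables and joins >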
-- EDIT 9/7/22 --
Based on your update, I think this is what you want:
select
contract_id, contract.signing_date, sum (quantity) as quantity,
name
FROM warehouse_state
JOIN inventory ON inventory.id = warehouse_state.inventory_id
JOIN owner ON owner.inventory_id = warehouse_state.id
JOIN season ON season.id = owner.season_id
JOIN contract ON contract.id = warehouse_contract.contract_id
where
date < contract.signing_date
GROUP BY
contract_id, contract.signing_date, name
But the one gotcha is Contract 1528 will not show up in this output since it's filtered out by the where condition.
I'm not fond of this, but you could keep the filter to overcome this... maybe there's a better solution.
select
contract_id, contract.signing_date,
coalesce (sum (quantity) filter (where date < contract.signing_date), 0) as quantity,
name
FROM warehouse_state
JOIN inventory ON inventory.id = warehouse_state.inventory_id
JOIN owner ON owner.inventory_id = warehouse_state.id
JOIN season ON season.id = owner.season_id
JOIN contract ON contract.id = warehouse_contract.contract_id
GROUP BY
contract_id, contract.signing_date, name
Also, my output does not match yours, but I'm hoping that's because of sample data.

Finding a min() date for one column and then using this to join with other tables that have a date LESS than this date

In short, I have two tables:
(1) pharmacy_claims (columns: user_id, date_service, claim_id, record_id, prescription)
(2) medical_claims (columns: user_id, date_service, provider, npi, cost)
I want to find user_id's in (1) that have a certain prescription value, find their earliest date_service (e.g. min(date_service)) and then use these user_id's with their earliest date of service as a cohort to pull all of their associated data from (2). Basically I want to find all of their medical_claims data PRIOR to the first time they were prescribed a given prescription in pharmacy_claims.
pharmacy_claims looks something like this:
user_id | prescription | date_service
1 a 2018-05-01
1 a 2018-02-11
1 a 2019-10-11
1 b 2018-07-12
2 a 2019-01-02
2 a 2019-03-10
2 c 2018-04-11
3 c 2019-05-26
So for instance, if I was interested in prescription = 'a', I would only want user_id 1 and 2 returned, with dates 2018-02-11 and 2019-01-02, respectively. Then I would want to pull user_id 1 and 2 from the medical_claims, and get all of their data PRIOR to these respective dates.
The way I tried to go about this was to build a temp table from the pharmacy_claims table to query the user_ids that have a given medication, and then left join this back to the table to create a cohort of user_ids with a date_service.
Here's what I did:
(1) Pulled all of the relevant data from the main pharmacy claims table:
CREATE TABLE user.temp_pharmacy_claims AS
SELECT user_id, claim_id, record_id, date_service
FROM dw.pharmacyclaims
WHERE date_service between '2018-01-01' and '2019-08-31'
This results in ~50,000 user_id's
(2) Created a table with just the user_id's a min(date_service):
CREATE TABLE user.temp_pharmacy_claims_index AS
SELECT distinct user_id, min(date_service) AS Min_Date
FROM user.temp_pharmacy_claims
GROUP BY 1
(3) Created a final table (to get the desired cohort):
CREATE TABLE user.temp_pharmacy_claims_final_index AS
SELECT a.user_id
FROM user.temp_pharmacy_claims a
LEFT JOIN user.temp_pharmacy_claims_index b
ON a.user_id = b.user_id
WHERE a.date_service < Min_Date
However, this gets me 0 results when there should be a few thousand. Is this set up correctly? It's probably not the most efficient approach, but it looks sound to me, so not sure what's going on.
I think you just want a correlated subquery:
select mc.*
from medical_claims mc
where mc.date_service < (select min(pc.date_service)
from pharmacy_claims pc
where pc.user_id = mc.user_id and
pc.prescription = ?
);
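For the cohort-style pull described above, a derived table that materialises each user's first prescription date works the same way. A sketch, assuming the column names given in the question (the alias first_rx and the column first_rx_date are just illustrative names, and ? stays as the prescription parameter):
select mc.*
from medical_claims mc
join (select user_id, min(date_service) as first_rx_date
      from pharmacy_claims
      where prescription = ?
      group by user_id) first_rx
  on first_rx.user_id = mc.user_id
where mc.date_service < first_rx.first_rx_date;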

SQL - Select lowest values with group by and order by?

In my rankings database I have a table named times. I also have another table with authors. The authors have author id's (named ath_id inside the times table).
Records saved in times table:
id ath_id brand_id time date
------------- ------------ -------------- -------------- --------------
65125537 5384729 3 44741 May 8 2014
72073658 4298584 1 1104 Jun 28 2015
86139060 4298584 2 2376 Nov 20 2016
92237079 4298584 1 1115 Jun 24 2017
92237082 4298584 1 1104 Jun 24 2017
93436362 5384729 12 376492 Dec 31 2012
What I want to achieve
I'd like to retrieve an ordered list of the times that belong to the author (by the author id). I'd like to order them by brand_id, and I only want the records with the lowest time value.
Also, when there are multiple records with the same brand_id and the same time value, I'd like the list to be ordered by date. So the record with the latest date will be last.
What I have
I currently use this query: SELECT * FROM times WHERE ath_id = 4298584 GROUP BY brand_id ASC.
It works great, but it limits records with the same brand_id to 1, and thereby it limits records with the same time, even when multiple records have the lowest time value.
To sum it up
So in the case of the example above. When I select all the records with ath_id = 4298584, I'd like to retrieve the following ordered list:
id ath_id brand_id time date
------------- ------------ -------------- -------------- --------------
72073658 4298584 1 1104 Jun 28 2015
92237082 4298584 1 1104 Jun 24 2017
86139060 4298584 2 2376 Nov 20 2016
This is my first time doing a bit more advanced SQL queries. I'm working with Laravel, so giving both a raw SQL solution and a Laravel solution using the Laravel Query Builder wouldn't do any harm.
You could try using a derived table to get the min time for an ath_id and brand_id. Then join it back to your original table to get the rest of the data.
SELECT t.*
FROM times t
JOIN (SELECT ath_id, brand_id, MIN(time) AS time FROM times GROUP BY ath_id, brand_id) b
ON t.ath_id = b.ath_id AND t.brand_id = b.brand_id AND t.time = b.time
WHERE t.ath_id = 4298584
ORDER BY t.brand_id ASC, t.date DESC
This is another way you can do it. The output is similar to SQLChao's answer, but the difference is that the inner query builds and assigns ranks to each combination of ath_id, brand_id and date, ordered by time. In the outer query you can then filter on rank 1, so you are essentially replicating the row_number() function.
You can change rnk = 1 to rnk <= n if you want the first n records for each combination, but in your case SQLChao's answer will be faster.
select t3.id,t3.ath_id,t3.brand_id,t3.time,t3.date
from times1 t3
inner join
(
select t1.ath_id,t1.brand_id,t1.date,t1.time,count(*) as rnk
from times1 t1
inner join times1 t2
on t1.ath_id=t2.ath_id
and t1.brand_id=t2.brand_id
and t1.date=t2.date
and t1.time >= t2.time
where t1.ath_id=4298584
group by t1.ath_id,t1.brand_id,t1.date,t1.time
) t4
on t3.ath_id=t4.ath_id
and t3.brand_id=t4.brand_id
and t3.date=t4.date
and t3.time = t4.time
and t4.rnk=1
;
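As a side note: on MySQL 8+, or any database with window functions, the "lowest time per brand, keeping ties" logic that the self-join emulates can be written with RANK() directly. A sketch using the question's table and column names:
select id, ath_id, brand_id, time, date
from (
    select t.*,
           rank() over (partition by brand_id order by time) as rnk
    from times t
    where ath_id = 4298584
) ranked
where rnk = 1
order by brand_id, date; -- latest date last, as requested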

SQL Inner Join query

I have the following table structures:
cust_info: cust_id, cust_name
bill_info: bill_id, cust_id, bill_amount, bill_date
paid_info: paid_id, bill_id, paid_amount, paid_date
Now my output should display, as a single row per bill, the records whose bill_date falls between two dates (1 Jan 2013 to 1 Feb 2013), as follows:
cust_name | bill_id | bill_amount | tpaid_amount | bill_date | balance
where tpaid_amount is the total paid for a particular bill_id.
For example,
for bill_id abcd, bill_amount is 10000 and the user pays 2000 one time and 3000 a second time,
which means the paid_info table contains two entries for the same bill_id:
bill_id | paid_amount
abcd 2000
abcd 3000
so, tpaid_amount = 2000 + 3000 = 5000 and balance = 10000 - tpaid_amount = 10000 - 5000 = 5000
Is there any way to do this with a single query (inner joins)?
You'd want to join the 3 tables, then group them by bill ids and other relevant data, like so.
-- the select line, as well as getting your columns to display, is where you'll work
-- out your computed columns, or what are called aggregate functions, such as tpaid and balance
SELECT c.cust_name, p.bill_id, b.bill_amount, SUM(p.paid_amount) AS tpaid, b.bill_date, b.bill_amount - SUM(p.paid_amount) AS balance
-- joining up the 3 tables here on the id columns that point to the other tables
FROM cust_info c INNER JOIN bill_info b ON c.cust_id = b.cust_id
INNER JOIN paid_info p ON p.bill_id = b.bill_id
-- between pretty much does what it says
WHERE b.bill_date BETWEEN '2013-01-01' AND '2013-02-01'
-- in group by, we not only need to join rows together based on which bill they're for
-- (bill_id), but also any column we want to select in SELECT.
GROUP BY c.cust_name, p.bill_id, b.bill_amount, b.bill_date
A quick overview of group by: It will take your result set and smoosh rows together, based on where they have the same data in the columns you give it. Since each bill will have the same customer name, amount, date, etc, we are fine to group by those as well as the bill id, and we'll get a record for each bill. If we wanted to group it by p.paid_amount, though, since each payment would have a different one of those (possibly), you'd get a record for each payment as opposed to for each bill, which isn't what you'd want. Once group by has smooshed these rows together, you can run aggregate functions such as SUM(column). In this example, SUM(p.paid_amount) totals up all the payments that have that bill_id to work out how much has been paid. For more information, please look at W3Schools chapter on group by in their SQL tutorials.
Hope I've understood this correctly and that this helps you.
This will do the trick:
select
cust_name,
bill_id,
bill_amount,
sum(paid_amount),
bill_date,
bill_amount - sum(paid_amount)
from
cust_info
left outer join bill_info
left outer join paid_info
on bill_info.bill_id=paid_info.bill_id
on cust_info.cust_id=bill_info.cust_id
where
bill_info.bill_date between X and Y
group by
cust_name,
bill_id,
bill_amount,
bill_date
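One caveat worth noting: with the left outer joins, a bill that has no rows in paid_info produces a NULL sum, so the balance also comes out NULL. Wrapping the sums in coalesce keeps such bills showing the full amount owed; a sketch based on the query above, with the question's date range filled in:
select
    cust_name,
    bill_info.bill_id,
    bill_amount,
    coalesce(sum(paid_amount), 0) as tpaid_amount,
    bill_date,
    bill_amount - coalesce(sum(paid_amount), 0) as balance
from
    cust_info
    left outer join bill_info on cust_info.cust_id = bill_info.cust_id
    left outer join paid_info on bill_info.bill_id = paid_info.bill_id
where
    bill_info.bill_date between '2013-01-01' and '2013-02-01'
group by
    cust_name,
    bill_info.bill_id,
    bill_amount,
    bill_date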

SQL select and count all items that have occured before

I have a table with rows that symbolize order dates:
2009-05-15 13:31:47.713
2009-05-15 22:09:32.227
2009-05-16 02:38:36.027
2009-05-16 12:06:49.743
2009-05-16 16:20:26.680
2009-05-17 01:36:19.480
2009-05-18 09:44:46.993
2009-05-18 14:06:12.073
2009-05-18 15:25:47.540
2009-05-19 10:28:24.150
I would like have query that returns the following:
2009-05-15 2
2009-05-16 5
2009-05-17 6
2009-05-18 9
2009-05-19 10
Basically it keeps a running total of all the orders placed by the end of the day indicated. The orders are not just the orders on that day but all the orders since the earliest date in the table.
This is MSSQL 2000. The datatype in the first table is just datetime; in the second it could be datetime or string, it doesn't really matter for my purposes.
I got this to work on SQL Server 2005. I think it should work with 2000, as well.
SELECT dt, count(q2.YourDate)
FROM (SELECT DISTINCT CONVERT(varchar,YourDate,101) dt FROM YourTable) t1
JOIN YourTable q2 ON DATEADD(d,-1,CONVERT(varchar,YourDate,101)) < dt
GROUP BY dt
This will query the table twice, but at least gives correct output.
I recommend a 2 query solution. This is slow, but I use this method almost daily. The important thing is to NOT join the 2 tables in the first query. You want the duplication of each order for every date in your lookup table.
You will need a Lookup table with 1 row for each date of the time period you're interested in. Let's call it dboDateLookup. Here's what it will look like:
DtIndex
2009-05-15
2009-05-16
2009-05-17
2009-05-18
2009-05-19
Let's also assume the order table, dboOrders has 2 columns, ordernumber and orderdate.
ordernumber   orderdate
1             2009-05-15 13:31:47.713
2             2009-05-15 22:09:32.227
3             2009-05-16 02:38:36.027
4             2009-05-16 12:06:49.743
5             2009-05-16 16:20:26.680
Query1:
SELECT
    [DtIndex] AS ByDate,
    [ordernumber],
    IIf(Format([orderdate],"yyyy-mm-dd") <= Format([DtIndex],"yyyy-mm-dd"), 1, 0) AS NumOrdersBefore
FROM [dboOrders], [dboDateLookUp];
Query2:
SELECT
    [ByDate],
    Sum([NumOrdersBefore]) AS RunningTotal
FROM [Query1]
GROUP BY [ByDate];
Try this (returns string dates):
SELECT
LEFT(CONVERT(char(23),YourDate,121),10) AS Date
,COUNT(*) AS CountOf
FROM YourTable
GROUP BY LEFT(CONVERT(char(23),YourDate,121),10)
ORDER BY 1
This will table scan. If it is too slow, consider using a persisted computed column with an index for the date; that will run much faster. However, I'm not sure if you can do all that in SQL 2000.
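For reference, the persisted-computed-column idea might be declared like this on SQL Server 2005 or later (the column and index names here are only illustrative):
ALTER TABLE YourTable
    ADD OrderDay AS CONVERT(char(10), YourDate, 121) PERSISTED;
CREATE INDEX IX_YourTable_OrderDay ON YourTable (OrderDay);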
EDIT: I read the question better; try this:
SELECT
    d.Date
    ,SUM(dt.CountOf) AS CountOf
FROM (SELECT
          LEFT(CONVERT(char(23),YourDate,121),10) AS Date
          ,COUNT(*) AS CountOf
      FROM YourTable
      GROUP BY LEFT(CONVERT(char(23),YourDate,121),10)
     ) dt
INNER JOIN (SELECT DISTINCT
          LEFT(CONVERT(char(23),YourDate,121),10) AS Date
      FROM YourTable
     ) d ON dt.Date <= d.Date
GROUP BY d.Date
ORDER BY d.Date
I have another one. It is not so fancy; I ran it on Access, so the syntax may differ a little bit, but it seems to work.
P.S. I'm relatively new to SQL.
Data:
ID F1 F2
1 15/05/2009 13:31:47.713
2 15/05/2009 22:09:32.227
3 16/05/2009 02:38:36.027
4 16/05/2009 12:06:49.743
5 16/05/2009 16:20:26.680
6 17/05/2009 01:36:19.480
7 18/05/2009 09:44:46.993
8 18/05/2009 14:06:12.073
9 18/05/2009 15:25:47.540
10 19/05/2009 10:28:24.150
Query:
SELECT Table1.F1 AS Dates, SUM(REPLACE(Len(Table1.F2), Len(Table1.F2), 1)) AS Occurred
FROM Table1
GROUP BY Table1.F1;
Result:
Dates Occurred
15/05/2009 2
16/05/2009 3
17/05/2009 1
18/05/2009 3
19/05/2009 1
SELECT Count(*) AS CountOf, x.Date FROM
(SELECT DISTINCT
     LEFT(CONVERT(char(23),YourDate,121),10) AS Date
 FROM YourTable) x -- gets the distinct dates
INNER JOIN YourTable y ON x.Date >= LEFT(CONVERT(char(23),y.YourDate,121),10)
GROUP BY x.Date
It's going to be slow. REALLY REALLY slow. I hate to think what run times would be.
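For anyone on SQL Server 2012 or later (not an option for the SQL 2000 setup in the question), a windowed SUM over the per-day counts gives the running total directly; a sketch using the same YourTable / YourDate names:
SELECT
    CONVERT(char(10), YourDate, 121) AS [Date],
    SUM(COUNT(*)) OVER (ORDER BY CONVERT(char(10), YourDate, 121)
                        ROWS UNBOUNDED PRECEDING) AS RunningTotal
FROM YourTable
GROUP BY CONVERT(char(10), YourDate, 121)
ORDER BY [Date];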