SQL query resultset, I need SUM(quantity) COLUMN grouped by DAY - sql

This is the query I am using: the product table LEFT JOINed to the page table, where the productid column in page references the id column in product. Pretty straightforward.
SELECT
COUNT(DISTINCT `p`.`id`) as `quantity`,
DATE_FORMAT(`p`.`created_time`,'%Y-%m-%d') AS `day`
FROM
`product` AS `p`
LEFT JOIN
`page` AS `pg` ON `p`.`id` = `pg`.`productid`
WHERE
`p`.`created_time` BETWEEN '2013-07-03 00:00:00' AND '2013-07-10 23:59:59'
AND
`p`.`group` = '101'
GROUP BY `day`, `p`.`id` HAVING COUNT(`pg`.`productid`)>=10
ORDER BY `p`.`created_time`
The two example tables concerned:
**product**
id created_time
32 2013-07-09
33 2013-07-09
**page**
id productid
1 33
2 33
.. ..
20 33
21 32
22 32
.. ..
54 32
Now my resultset looks like this:
quantity day
1 2013-07-09
1 2013-07-09
1 2013-07-10
But I would like the following output without UNION and without using temp tables:
quantity day
2 2013-07-09
1 2013-07-10
I have now added the two tables to my example above. I need the number of products with ten or more pages, grouped by day.

I think that is because you are leaving p.id in the group by clause. Try this:
SELECT COUNT(DISTINCT `p`.`id`) as `quantity`,
DATE_FORMAT(`p`.`created_time`,'%Y-%m-%d') AS `day`
FROM `product` AS `p` LEFT JOIN
`page` AS `pg`
ON `p`.`id` = `pg`.`productid`
WHERE `p`.`created_time` BETWEEN '2013-07-03 00:00:00' AND '2013-07-10 23:59:59'
AND `p`.`group` = '101'
GROUP BY `day`
HAVING COUNT(`pg`.`productid`)>=10
ORDER BY `p`.`created_time`

Don't GROUP BY the id value; the ORDER BY can also be on day.
Note that the alias day is not available in the GROUP BY in standard SQL or when sql_mode includes "only_full_group_by". MySQL allows it as an extension, but it is misleading:
SELECT
COUNT(*) as `quantity`,
DATE_FORMAT(`p`.`created_time`,'%Y-%m-%d') AS `day`
FROM
`product` AS `p`
JOIN
`page` AS `pg` ON `p`.`id` = `pg`.`productid`
WHERE
`p`.`created_time` BETWEEN '2013-07-03 00:00:00' AND '2013-07-10 23:59:59'
AND
`p`.`group` = '101'
GROUP BY
`pg`.`productid`, DATE_FORMAT(`p`.`created_time`,'%Y-%m-%d')
HAVING
COUNT(*) >= 10
ORDER BY
`day`;

I found the solution to my query:
SELECT
COUNT(`p`.`id`) as `quantity`,
DATE_FORMAT(`p`.`created_time`,'%Y-%m-%d') AS `day`
FROM
`product` AS `p`
INNER JOIN
(
SELECT
`productid` AS `id`,
count(id) AS pagesNR
FROM
`page`
GROUP BY
`productid` HAVING COUNT(`id`) >= 10
)
AS
`pg` USING (`id`)
WHERE
`p`.`created_time` BETWEEN '2013-07-03 00:00:00' AND '2013-07-10 23:59:59'
AND
`p`.`group` = '101'
GROUP BY
`day`
ORDER BY
`created_time`
Thanks to a friend of my co-worker Daniël Versteeg
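For anyone who wants to try the derived-table approach without a MySQL server, here is a minimal sketch using Python's sqlite3 module. Everything beyond the question's two tables is an assumption made for illustration: products 34 and 35 and the page counts are invented sample data, SQLite's date() stands in for DATE_FORMAT, and the date-range and group filters are left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (id INTEGER PRIMARY KEY, created_time TEXT);
    CREATE TABLE page (id INTEGER PRIMARY KEY, productid INTEGER);
    INSERT INTO product VALUES
        (32, '2013-07-09 08:00:00'),
        (33, '2013-07-09 09:30:00'),
        (34, '2013-07-10 10:00:00'),
        (35, '2013-07-10 11:00:00');
""")
# Hypothetical data: products 32, 33 and 34 get 10 pages each, product 35 only 3.
pages = [(pid,) for pid in (32, 33, 34) for _ in range(10)] + [(35,)] * 3
conn.executemany("INSERT INTO page (productid) VALUES (?)", pages)

rows = conn.execute("""
    SELECT COUNT(*) AS quantity, date(p.created_time) AS day
    FROM product AS p
    -- The derived table keeps only products that have at least 10 pages,
    -- so the outer query can count products per day without HAVING.
    JOIN (SELECT productid
          FROM page
          GROUP BY productid
          HAVING COUNT(*) >= 10) AS pg ON pg.productid = p.id
    GROUP BY date(p.created_time)
    ORDER BY day
""").fetchall()
print(rows)
```

The key point is that the ten-page filter is applied per product inside the subquery, before the per-day grouping happens, which is why one row per day comes back.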

Splitting value to two columns in SQL

I have a table that stores the VIN numbers and delivery dates of vehicles based on a code. I want to be able to get one row with three columns of data.
I have tried the following
SELECT DISTINCT VIN, MAX(TRANSACTION_DATE) AS DELIVERY_DATE
FROM "TABLE"
WHERE DELIVERY_TYPE ='025'
AND VIN IN ('XYZ')
GROUP BY VIN
UNION ALL
SELECT VIN, MAX(TRANSACTION_DATE) AS OTHER_DELIVERY_DATE
FROM "TABLE"
WHERE DELIVERY_TYPE !='025'
AND VIN IN ('XYZ')
GROUP BY VIN;
When I run this I get
VIN DELIVERY_DATE
XYZ 26-dec-18
XYZ 01-MAY-19
current data format in table:
VIN TRANSACTION_DATE
XYZ 26-DEC-18
XYZ 01-MAY-19
Required format:
VIN DELIVERY_DATE OTHER_DELIVERY_DATE
XYZ 26-DEC-18 01-MAY-19
Use conditional aggregation:
SELECT VIN,
MAX(CASE WHEN DELIVERY_TYPE = '025' AND
VIN IN ('XYZ') THEN TRANSACTION_DATE END) AS DELIVERY_DATE,
MAX(CASE WHEN DELIVERY_TYPE != '025' AND
VIN IN ('XYZ') THEN TRANSACTION_DATE END) AS OTHER_DELIVERY_DATE
FROM "TABLE"
GROUP BY VIN
Just use conditional aggregation:
SELECT VIN,
MAX(CASE WHEN DELIVERY_TYPE = '025' THEN TRANSACTION_DATE END) AS DELIVERY_DATE,
MAX(CASE WHEN DELIVERY_TYPE <> '025' THEN TRANSACTION_DATE END) AS OTHER_DELIVERY_DATE
FROM "TABLE"
WHERE VIN IN ('XYZ')
GROUP BY VIN;
Note that SELECT DISTINCT is almost never used with GROUP BY.
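To see conditional aggregation collapse the two rows into one, here is a small runnable sketch using Python's sqlite3 module. The table name deliveries, the ISO date strings and the extra '030' row are assumptions for illustration, not the asker's real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE deliveries (vin TEXT, delivery_type TEXT, transaction_date TEXT);
    INSERT INTO deliveries VALUES
        ('XYZ', '025', '2018-12-26'),
        ('XYZ', '030', '2019-05-01'),
        ('XYZ', '030', '2019-03-19');
""")

rows = conn.execute("""
    SELECT vin,
           -- Each CASE yields NULL for rows of the other kind, and MAX ignores NULLs.
           MAX(CASE WHEN delivery_type =  '025' THEN transaction_date END) AS delivery_date,
           MAX(CASE WHEN delivery_type <> '025' THEN transaction_date END) AS other_delivery_date
    FROM deliveries
    WHERE vin = 'XYZ'
    GROUP BY vin
""").fetchall()
print(rows)
```

Because the dates are stored as ISO-formatted text here, MAX compares them correctly as strings; with a real DATE column that workaround is not needed.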
You can use CROSS APPLY:
DECLARE @Cars TABLE (VIN VARCHAR(100), DELIVERY_TYPE VARCHAR(3), TRANSACTION_DATE DATE)
INSERT INTO @Cars
(VIN, DELIVERY_TYPE , TRANSACTION_DATE)
VALUES
('XYZ', '025', '20181226'), ('XYZ', '030', '20190319')
I needed the code above only so the example can run without your table and data; all you need is this:
SELECT DISTINCT C.VIN, DD.DELIVERY_DATE, TD.TRANSACTION_DATE
FROM @Cars C
CROSS APPLY (SELECT MAX(TRANSACTION_DATE) DELIVERY_DATE FROM @Cars D WHERE D.DELIVERY_TYPE = '025' AND D.VIN = C.VIN) DD
CROSS APPLY (SELECT MAX(TRANSACTION_DATE) TRANSACTION_DATE FROM @Cars D WHERE D.DELIVERY_TYPE <> '025' AND D.VIN = C.VIN) TD
If you need to transpose not two but many more columns, I'd suggest PIVOT as more appropriate, but for two columns either CROSS APPLY or conditional aggregation will do the trick.

populating table from different tables

Good day.
I have the following tables:
Order_Header(Order_id {pk}, customer_id {fk}, agent_id {fk}, Order_date(DATE FORMAT))
Invoice_Header (Invoice_ID {pk}, Customer_ID {fk}, Agent_ID{fk}, invoice_Date{DATE FORMAT} )
Stock( Product_ID {pk}, Product_description)
I created a table called AVG_COMPLETION_TIME_FACT and want to populate it with the following values regarding the previous 3 tables:
Product_ID
Invoice_month
Invoice_Year
AVG_Completion_Time (Invoice_date - Order_date)
I have the following code that doesn't work:
INSERT INTO AVG_COMPLETION_TIME_FACT(
SELECT PRODUCT_ID, EXTRACT (YEAR FROM INVOICE_DATE), EXTRACT (MONTH FROM INVOICE_DATE), (INVOICE_DATE - ORDER_DATE)
FROM STOCK, INVOICE_HEADER, ORDER_HEADER
GROUP BY PRODUCT_ID, EXTRACT (YEAR FROM INVOICE_DATE), EXTRACT (MONTH FROM INVOICE_DATE)
);
I want to group it by the product_id, year of invoice and month of invoice.
Is this possible?
Any advice would be much appreciated.
Regards
Short answer: it may be possible - if your database contains some more columns that are needed for writing the correct query.
There are several problems, apart from the syntactical ones. When we create some test tables, you can see that the answer you are looking for cannot be derived from the columns you have provided in your question. Example tables (Oracle 12c), all PK/FK constraints omitted:
-- 3 tables, similar to the ones described in your question,
-- including some test data
create table order_header (id, customer_id, agent_id, order_date )
as
select 1000, 100, 1, date'2018-01-01' from dual union all
select 1001, 100, 2, date'2018-01-02' from dual union all
select 1002, 100, 3, date'2018-01-03' from dual
;
create table invoice_header ( id, customer_id, agent_id, invoice_date )
as
select 2000, 100, 1, date'2018-02-01' from dual union all
select 2001, 100, 2, date'2018-03-11' from dual union all
select 2002, 100, 3, date'2018-04-21' from dual
;
create table stock( product_id, product_description)
as
select 3000, 'product3000' from dual union all
select 3001, 'product3001' from dual union all
select 3002, 'product3002' from dual
;
If you join the tables as you have done it (using a cross join), you will see that you get more rows than expected ... But: Neither the invoice_header table, nor the order_header table contains any PRODUCT_ID data. Thus, we cannot tell which product_ids are associated with the stored order_ids or invoice_ids.
select
product_id
, extract( year from invoice_date )
, extract( month from invoice_date )
, invoice_date - order_date
from stock, invoice_header, order_header -- cross join -> too many rows in the resultset!
-- group by ...
;
...
27 rows selected.
For getting your query right, you should probably write INNER JOINs and conditions (keyword: ON). If we try to do this with your original table definitions (as provided in your question) you will see that we cannot join all 3 tables, as they do not contain all the columns needed: PRODUCT_ID (table STOCK) cannot be associated with ORDER_HEADER or INVOICE_HEADER.
One column that these 2 tables (ORDER_HEADER and INVOICE_HEADER) do have in common is: customer_id, but that's not enough for answering your question. However, we can use it for demonstrating how you could code the JOINs.
select
-- product_id
IH.customer_id as cust_id
, OH.id as OH_id
, IH.id as IH_id
, extract( year from invoice_date ) as year_
, extract( month from invoice_date ) as month_
, invoice_date - order_date as completion_time
from invoice_header IH
join order_header OH on IH.customer_id = OH.customer_id
-- the stock table cannot be joined at this stage
;
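The row counts make the problem easy to see. Below is a minimal sketch using Python's sqlite3 module with the same test rows as above (SQLite syntax instead of Oracle, dates stored as text, which is an assumption for the demo): the cross join returns every combination, and joining on customer_id alone still pairs every invoice with every order of that customer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE order_header (id INTEGER, customer_id INTEGER, order_date TEXT);
    CREATE TABLE invoice_header (id INTEGER, customer_id INTEGER, invoice_date TEXT);
    CREATE TABLE stock (product_id INTEGER);
    INSERT INTO order_header VALUES
        (1000, 100, '2018-01-01'), (1001, 100, '2018-01-02'), (1002, 100, '2018-01-03');
    INSERT INTO invoice_header VALUES
        (2000, 100, '2018-02-01'), (2001, 100, '2018-03-11'), (2002, 100, '2018-04-21');
    INSERT INTO stock VALUES (3000), (3001), (3002);
""")

# Comma-separated tables without a join condition: 3 x 3 x 3 combinations.
cross = conn.execute(
    "SELECT COUNT(*) FROM stock, invoice_header, order_header"
).fetchone()[0]

# Joining on customer_id only: all rows share customer 100, so 3 x 3 pairs remain.
joined = conn.execute("""
    SELECT COUNT(*)
    FROM invoice_header IH
    JOIN order_header OH ON IH.customer_id = OH.customer_id
""").fetchone()[0]

print(cross, joined)
```

Nine rows for three invoices shows why customer_id cannot stand in for a real order-to-invoice link.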
Missing columns:
Please regard the following just as "proof of concept" code. Assuming that somewhere in your database, you have tables that have columns that {1} link STOCK and ORDER_HEADER (name here: STOCK_ORDER) and {2} link ORDER_HEADER and INVOICE_HEADER (name here: ORDER_INVOICE), you could actually get the information you want.
-- each ORDER_HEADER is mapped to multiple product_ids
create table stock_order
as
select S.product_id, OH.id as oh_id -- STOCK and ORDER_HEADER
from stock S, order_header OH ; -- cross join, we use all possible combinations here
select product_id, oh_id
from stock_order
order by product_id, oh_id
;
PRODUCT_ID OH_ID
---------- ----------
3000 1000
3000 1001
3000 1002
3001 1000
3001 1001
3001 1002
3002 1000
3002 1001
3002 1002
9 rows selected.
-- each INVOICE_HEADER mapped to a single ORDER_HEADER
create table order_invoice ( order_id, invoice_id )
as
select 1000, 2000 from dual union all
select 1001, 2001 from dual union all
select 1002, 2002 from dual
;
For querying, make sure that you code the correct JOIN conditions (ON ...) eg
-- example query. NOTICE: conditions in ON ...
select
S.product_id
, IH.customer_id as cust_id
, OH.id as OH_id
, IH.id as IH_id
, extract( year from invoice_date ) as year_
, extract( month from invoice_date ) as month_
, invoice_date - order_date as completion_time
from invoice_header IH
join order_invoice OI on IH.id = OI.invoice_id -- <- new "link" table
join order_header OH on OI.order_id = OH.id
join stock_order SO on OH.id = SO.OH_id -- <- new "link" table
join stock S on S.product_id = SO.product_id
;
Now you can add the GROUP BY, and SELECT only the columns you need. Combined with an INSERT, you should write something like ...
-- example avg_completion_time_fact table.
create table avg_completion_time_fact (
product_id number
, year_ number
, month_ number
, avg_completion_time number
) ;
insert into avg_completion_time_fact ( product_id, year_, month_, avg_completion_time )
select
S.product_id
, extract( year from invoice_date ) as year_
, extract( month from invoice_date ) as month_
, avg( invoice_date - order_date ) as avg_completion_time
from invoice_header IH
join order_invoice OI on IH.id = OI.invoice_id
join order_header OH on OI.order_id = OH.id
join stock_order SO on OH.id = SO.OH_id
join stock S on S.product_id = SO.product_id
group by S.product_id, extract( year from invoice_date ), extract( month from invoice_date )
;
The AVG_COMPLETION_TIME_FACT table now contains:
SQL> select * from avg_completion_time_fact order by product_id ;
PRODUCT_ID YEAR_ MONTH_ AVG_COMPLETION_TIME
---------- ---------- ---------- -------------------
3000 2018 3 68
3000 2018 4 108
3000 2018 2 31
3001 2018 3 68
3001 2018 2 31
3001 2018 4 108
3002 2018 3 68
3002 2018 4 108
3002 2018 2 31
It is not completely clear what the final query for your database (or schema) will look like, as we don't know the definitions of all the tables it contains. However, if you apply the techniques and stick to the syntax of the examples, you should be able to obtain the required results. Best of luck!

SQL: How to get min date associated with patient value

I am trying to get the earliest date associated with each PatientID for this period of time.
My current SQL returns multiple visits/documents per patient within the time period. I need to show only the earliest date for each patient tied to a particular provider in the date range.
Multiple Dates for PatientID
USE EHR
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
DECLARE @PROV NVARCHAR (255) ='KCOOPER0'
DECLARE @START_DATE DATETIME = '2017-09-18 00:00:00.000'
DECLARE @END_DATE DATETIME = '2017-12-17 23:59:59.999'
--DECLARE @START_DATE DATETIME = '2017-10-02 00:00:00.000'
--DECLARE @END_DATE DATETIME = '2017-12-31 23:59:59.999'
SELECT DISTINCT
PS.ID AS AppointmentID
, CL.Code AS PatientID
-- , SU.NameFirst AS PROVFNAME
-- , SU.NameLast AS PROVLNAME
-- , SU.NameSuffix AS PROVSUFFIX
, PS.ProviderId
, PS.ScheduledDateTime AS AppointmentDT
, PS.Duration
, PS.[TYPE] AS TypeDescription
, PS.IsActive as [Status]
, PS.ExternalId AS VisitID
-- , REPLACE(REPLACE(LOC.[Description],'[',''),']','') AS LOCATIONPLACE
, CDA.CreatedOn AS CDA
FROM PatientSchedule PS
INNER JOIN ContactsList CL WITH(NOLOCK) ON PS.PatientID=CL.ReferenceID
AND CL.Relation = 0
AND PS.ScheduledDateTime BETWEEN @START_DATE AND @END_DATE
INNER JOIN SystemUsers SU WITH(NOLOCK) ON PS.InterfaceCode=SU.InterfaceCode AND SU.Status='1'
INNER JOIN EMRDocuments ED ON PS.ID=ED.PatientScheduleId
AND ED.IsActive=1
LEFT JOIN
(SELECT DISTINCT ED.ID
,SU.NPI
,ED.PATIENTSCHEDULEID
,EDE.CreatedOn
FROM
EMRDOCUMENTS ED
INNER JOIN SystemUsers SU ON ED.ModifiedByID=SU.ID
AND ED.IsActive = 1
AND ED.IsSignedOff ='TRUE'
INNER JOIN EMRDocumentExport EDE ON ED.ID=EDE.DocumentId
AND EDE.LabCompanyName = 'FollowMyHealth_CCDA'
) CDA ON PS.ID=CDA.PatientScheduleId
WHERE --CL.Code = @PatientID
su.RegisteredProvider =1
AND SU.UserID =@PROV
ORDER BY CL.Code, CDA.CreatedOn
This is the general idea. You can fill in the details.
select your fields
from your tables
join (select patientId, min(the date field you want) minDate
from your tables
where whatever
group by patientId) minDates
on minDates.patientId = sometable.patientId
and the date field you want = minDate
etc
A join-less alternative uses a window function, wrapped in either a common table expression or a subselect.
using CTE:
with mindates as (
select field1, field2, ...,
AppointmentDT,
min(AppointmentDT) OVER (PARTITION BY PatientID) minptAppointmentDT
from table
)
select field1, field2, ... , AppointmentDT from mindates
where AppointmentDT = minptAppointmentDT
using subselect:
select field1, field2, ... , AppointmentDT from
(select field1, field2, ...,
AppointmentDT,
min(AppointmentDT) OVER (PARTITION BY PatientID) minptAppointmentDT
from table) mindates
where AppointmentDT = minptAppointmentDT
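Here is a filled-in version of the window-function idea that can be run as is, using Python's sqlite3 module. The visits table, its columns and the sample rows are hypothetical stand-ins for the real EHR schema, and it assumes the bundled SQLite is 3.25 or newer (the first release with window functions).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE visits (appointment_id INTEGER, patient_id TEXT, appointment_dt TEXT);
    INSERT INTO visits VALUES
        (1, 'P1', '2017-10-05'),
        (2, 'P1', '2017-09-20'),
        (3, 'P2', '2017-11-01'),
        (4, 'P2', '2017-12-01');
""")

rows = conn.execute("""
    WITH mindates AS (
        SELECT appointment_id, patient_id, appointment_dt,
               -- Earliest appointment per patient, repeated on each of that patient's rows.
               MIN(appointment_dt) OVER (PARTITION BY patient_id) AS first_dt
        FROM visits
    )
    SELECT appointment_id, patient_id, appointment_dt
    FROM mindates
    WHERE appointment_dt = first_dt
    ORDER BY patient_id
""").fetchall()
print(rows)
```

If two visits for the same patient share the earliest timestamp, both rows come back; ROW_NUMBER() would be the usual way to force exactly one, at the cost of an arbitrary tie-break.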

sql query - find distinct user from table

I am trying to solve a problem using an SQL query and need some expert advice.
I have the transaction table below.
-- UserID, ProductId, TransactionDate
-- 1 , 2 , 2014-01-01
-- 1 , 3 , 2014-01-05
-- 2 , 2 , 2014-01-02
-- 2 , 3 , 2014-05-07
.
.
.
What I am trying to achieve is to find all users who purchased more than one product within 30 days.
My query so far is like
select UserID, COUNT(distinct ProductID)
from tableA
GROUP BY UserID HAVING COUNT(distinct ProductID) > 1
I am not sure where to apply the "within 30 days" logic in the query.
The outcome should be :
1, 2
2, 1
Thanks in advance for your help.
Edit: Within 30 Days
SQL Fiddle
SELECT
a.UserID,
COUNT(DISTINCT ProductID)
FROM TableA a
INNER JOIN (
SELECT UserID, TransactionDate = MAX(TransactionDate)
FROM TableA
GROUP BY UserID
) AS t
ON t.UserID = a.UserID
AND a.TransactionDate >= DATEADD(DAY, -30, t.TransactionDate)
AND a.TransactionDate <= t.TransactionDate
GROUP BY a.UserID
You can use GROUP BY YEAR(TransactionDate), MONTH(TransactionDate)
SELECT
UserID,
COUNT(DISTINCT ProductID)
FROM TableA
GROUP BY
UserID, YEAR(TransactionDate), MONTH(TransactionDate)
HAVING
COUNT(DISTINCT ProductID) > 1
Just add a where clause.
SELECT UserID, COUNT(DISTINCT ProductID) cnt
FROM tableA
WHERE TransactionDate >= CAST(DATEADD(DAY,-30,GETDATE()) AS DATE)
GROUP BY UserID
HAVING COUNT(DISTINCT ProductID) > 1
This works because the WHERE clause is evaluated before GROUP BY and HAVING. It first filters out all transactions more than 30 days old, and then returns only the users who bought more than one distinct product among the remaining rows.
Query Processing Order:
http://blog.sqlauthority.com/2009/04/06/sql-server-logical-query-processing-phases-order-of-statement-execution/
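The effect of that ordering can be checked with the question's four sample rows. This is a rough sketch using Python's sqlite3 module, with two assumptions that are not in the answer: a fixed reference date (2014-01-20) replaces GETDATE() so the result is reproducible, and an upper bound is added because the sample data contains a date later than that reference date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (UserID INTEGER, ProductID INTEGER, TransactionDate TEXT);
    INSERT INTO tableA VALUES
        (1, 2, '2014-01-01'),
        (1, 3, '2014-01-05'),
        (2, 2, '2014-01-02'),
        (2, 3, '2014-05-07');
""")

grouping = " GROUP BY UserID HAVING COUNT(DISTINCT ProductID) > 1 ORDER BY UserID"

# No WHERE clause: HAVING sees every transaction of each user.
unfiltered = conn.execute(
    "SELECT UserID, COUNT(DISTINCT ProductID) FROM tableA" + grouping
).fetchall()

# WHERE runs first, so HAVING only sees rows inside the 30-day window.
filtered = conn.execute(
    "SELECT UserID, COUNT(DISTINCT ProductID) FROM tableA"
    " WHERE TransactionDate BETWEEN date(:today, '-30 days') AND :today" + grouping,
    {"today": "2014-01-20"},
).fetchall()

print(unfiltered, filtered)
```

User 2 drops out of the second result because only one of their purchases survives the WHERE filter before the groups are formed. Note that this checks a window ending at a fixed date, not any two purchases within 30 days of each other.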

SQL Server select with multiple groupings

I have two tables describing users and their payments:
CREATE TABLE test_users
(id int IDENTITY NOT NULL,
name varchar(25),
PRIMARY KEY (id));
CREATE TABLE test_payments
(id int IDENTITY NOT NULL,
user_id int NOT NULL,
money money NOT NULL,
date datetime NOT NULL,
PRIMARY KEY (id));
INSERT INTO test_users (name)
VALUES ('john');
INSERT INTO test_users (name)
VALUES ('peter');
INSERT INTO test_payments (user_id, money, date)
VALUES (1, $1, CONVERT(datetime, '15.12.2012'));
INSERT INTO test_payments (user_id, money, date)
VALUES (1, $2, CONVERT(datetime, '16.12.2012'));
INSERT INTO test_payments (user_id, money, date)
VALUES (2, $1, CONVERT(datetime, '16.12.2012'));
INSERT INTO test_payments (user_id, money, date)
VALUES (2, $3, CONVERT(datetime, '17.12.2012'));
INSERT INTO test_payments (user_id, money, date)
VALUES (1, $1, CONVERT(datetime, '19.12.2012'));
Table test_users:
id name
-------------
1 john
2 peter
Table test_payments:
id user_id money date
---------------------------------------
1 1 1.0000 2012-12-15
2 1 2.0000 2012-12-16
3 2 1.0000 2012-12-16
4 2 3.0000 2012-12-17
5 1 1.0000 2012-12-19
I need to build a user statistic that shows:
username
total fee for a period of time
the date of the user's last activity (overall, not limited to the time period).
For example taking the period 15-18.12.12 I expect the following results:
name total last_activity
--------------------------------
peter $4 2012-12-17
john $3 2012-12-19
I've tried the following query:
SELECT u.*, SUM(p.money) total, MAX(p.date) last_activity
FROM test_users u
JOIN test_payments p
ON u.id= p.user_id
WHERE p.date BETWEEN CONVERT(datetime, '15.12.2012') AND CONVERT(datetime, '18.12.2012')
GROUP BY u.id, u.name
ORDER BY total DESC;
but I get the wrong result for last_activity, because it is also restricted to the date range:
id name total last_activity
--------------------------------
2 peter 4.0000 2012-12-17
1 john 3.0000 2012-12-16
Please suggest a solution.
Looks like a couple of other answers popped up while I worked on mine, but here it is anyhow. There is a working sql fiddle here: http://sqlfiddle.com/#!3/14808/6
Basically, you need a query to pull the max date regardless of the date range. I chose to do this as a correlated subquery.
SELECT
u.id,
u.name,
SUM(IsNull(money,0)) as TotalMoneyInRange,
(SELECT max(date) FROM test_payments where user_id = u.id) AS LastPaymentOverAll
FROM test_users AS u
LEFT JOIN test_payments AS p
ON u.id = p.user_id
WHERE
p.date IS NULL OR
p.date between
CAST('12-11-2012' AS datetime) --range begin
and
CAST('12-16-2012' as datetime) --range end
GROUP BY u.id, u.name
You need to move the condition from the WHERE clause into a CASE expression:
SELECT u.id, u.name,
SUM(case when p.date BETWEEN CONVERT(datetime, '15.12.2012') AND CONVERT(datetime, '18.12.2012')
then p.money
end) total,
MAX(p.date) last_activity
FROM test_users u JOIN
test_payments p
ON u.id= p.user_id
GROUP BY u.id, u.name
ORDER BY total DESC;
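If you only want users who had a payment in that period, add a HAVING clause. SQL Server does not allow the column alias total in HAVING, so repeat the aggregate:
HAVING SUM(case when p.date BETWEEN CONVERT(datetime, '15.12.2012') AND CONVERT(datetime, '18.12.2012')
then p.money
end) IS NOT NULL
If you want the NULL values to appear as 0 instead of NULL, then include else 0 in the CASE expression.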
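The CASE-based query can be checked against the question's sample data. Here is a minimal sketch using Python's sqlite3 module; it assumes ISO date strings and a REAL column in place of SQL Server's datetime and money types, which is a simplification for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE test_users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE test_payments (id INTEGER PRIMARY KEY, user_id INTEGER,
                                money REAL, date TEXT);
    INSERT INTO test_users VALUES (1, 'john'), (2, 'peter');
    INSERT INTO test_payments (user_id, money, date) VALUES
        (1, 1, '2012-12-15'), (1, 2, '2012-12-16'),
        (2, 1, '2012-12-16'), (2, 3, '2012-12-17'),
        (1, 1, '2012-12-19');
""")

rows = conn.execute("""
    SELECT u.name,
           -- Only payments inside the period contribute to the total ...
           SUM(CASE WHEN p.date BETWEEN '2012-12-15' AND '2012-12-18'
                    THEN p.money END) AS total,
           -- ... while MAX still sees every payment of the user.
           MAX(p.date) AS last_activity
    FROM test_users AS u
    JOIN test_payments AS p ON p.user_id = u.id
    GROUP BY u.id, u.name
    ORDER BY total DESC
""").fetchall()
print(rows)
```

John's last_activity is 2012-12-19 even though that payment falls outside the period, which is the result the question asked for.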
You can also use subqueries to get the result:
SELECT u.*, total, last_activity
FROM test_users u
JOIN
(
select sum(money) total, user_id
from test_payments
WHERE date BETWEEN CONVERT(datetime, '2012-12-15')
AND CONVERT(datetime, '2012-12-18')
group by user_id
) p
ON u.id= p.user_id
inner join
(
select user_id, max(date) last_activity
from test_payments
group by user_id
) p1
on p.user_id = p1.user_id
ORDER BY total DESC;
See SQL Fiddle with Demo
You could add a sub query for the MAX date that doesn't have the WHERE clause like so:
SELECT
u.*
,SUM(p.money) total
,a.max_date last_activity
FROM test_users u
INNER JOIN test_payments p ON u.id = p.user_id
INNER JOIN (SELECT user_id, MAX(date) AS max_date
FROM test_payments
GROUP BY user_id) a ON u.id = a.user_id
WHERE p.date BETWEEN CONVERT(datetime, '15.12.2012') AND CONVERT(datetime, '18.12.2012')
GROUP BY u.id, u.name, a.max_date
ORDER BY total DESC;