Averaging values and getting standard deviations from a database to build a graph - SQL

This is tricky to explain, so I'll keep the information to a minimum.
First, my ultimate goal: I want to take the users who trialed a product, determine how that product affected a value as a percentage of each user's average baseline, and then average all these percentages and get their standard deviations.
I have a database with a table that has a user_id, a value, and a date:
user_id  int
value    int
date     int (epoch milliseconds)
I then have a second table that records when a trial begins and ends for a user, and which product they used for that trial:
user_id     int
start_date  int (epoch milliseconds)
end_date    int (epoch milliseconds)
product_id  int
What I want to do is gather all the trials for one product and, for each user who participated, get a baseline value and their percent change on each day. Then I want to average all these percentages and get a standard deviation for each day.
One problem is that the date needs to be converted to days since start_date: anything within the first 24 hours after the start date is lumped into day 0, the next 24 hours into day 1, and so forth. I'll be averaging the percentages for each day.
Not every day was recorded for each user, so some users have multiple missing days; I need to label each value with its number of days since the start.
The start dates differ from user to user.
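The day bucketing described above is plain integer division on the millisecond difference. A minimal Go sketch, assuming both timestamps are epoch milliseconds as in the tables above; `daysSinceStart` is a hypothetical helper name:

```go
package main

import "fmt"

// msPerDay is the length of one 24-hour bucket in milliseconds.
const msPerDay int64 = 24 * 60 * 60 * 1000

// daysSinceStart returns which 24-hour bucket dateMs falls into, counting from
// startMs: [start, start+24h) is day 0, the next 24 hours are day 1, and so on.
// Timestamps before the start are reported with ok == false, because Go's
// integer division truncates toward zero and would otherwise lump the last
// 24 hours before the start into day 0 as well.
func daysSinceStart(dateMs, startMs int64) (day int64, ok bool) {
	if dateMs < startMs {
		return 0, false
	}
	return (dateMs - startMs) / msPerDay, true
}

func main() {
	start := int64(1667896408000) // start_date of trial 1 in the sample data
	for _, d := range []int64{1667838601000, 1667925002000, 1668012300000, 1668099442000} {
		day, ok := daysSinceStart(d, start)
		fmt.Println(d, day, ok)
	}
}
```

These are 24-hour buckets measured from the exact start time, not calendar days; the SQL answer later in this post casts to ::date, so it buckets by calendar date instead.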
So the graph will look like this:
[Image: graph]
I would prefer to do as much of this in SQL as possible; the rest will be in Go.
I was thinking about fetching each trial, where each trial has an array of results. Then I would iterate over each trial, iterate over the results for that trial, pick out day 0, day 1, day 2, and so on, and save these in their own arrays, which I would then average. Everything starts getting very messy, though.
For example, in semi-pseudocode:
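rows, err := db.Query("select user_id, start_date, end_date from trials where product_id = $1", productID)
if err != nil { return err }
defer rows.Close()
var trials []Trial
for rows.Next() { // extract trials from rows
	var t Trial
	if err := rows.Scan(&t.UserID, &t.StartDate, &t.EndDate); err != nil { return err }
	trials = append(trials, t)
}
for _, trial := range trials {
	leadingAvgStart := trial.StartDate - 20*24*60*60*1000 // start of the baseline window, e.g. 20 days earlier
	var baseline float64
	if err := db.QueryRow("select avg(value) from results where user_id = $1 and date between $2 and $3",
		trial.UserID, leadingAvgStart, trial.StartDate).Scan(&baseline); err != nil { return err }
	// Now we have the baseline for the user; next, fetch the values recorded during the trial
	results, err := db.Query("select value, date from results where user_id = $1 and date between $2 and $3",
		trial.UserID, trial.StartDate, trial.EndDate)
	if err != nil { return err }
	// extract the results into a slice, convert dates to days since the start date...
	// ...? It just starts getting ugly and I believe there has to be a better way
	results.Close()
}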
How can I do most of the heavy lifting in SQL?
create table users (id int PRIMARY KEY, name text);
create table products (id int PRIMARY KEY, name text);
create table values (
id int PRIMARY KEY
, user_id int REFERENCES users(id)
, value int
, date numeric
);
create table trials (
id int PRIMARY KEY
, user_id int REFERENCES users(id)
, start_date numeric
, end_date numeric
, product_id int REFERENCES products(id)
);
INSERT INTO users (id, name ) VALUES
(1,'John'),
(2,'Jane'),
(3,'Billy'),
(4,'Miranda');
INSERT INTO products (id, name ) VALUES
(1, 'pill A'),
(2, 'pill B'),
(3, 'pill C'),
(4, 'exercise ball'),
(5, 'diet plan');
INSERT INTO trials (id,user_id,start_date,end_date,product_id) VALUES
(1, 1, 1667896408000, 1668099442000, 1),
(2, 1, 1667896408000, 1668099442000, 2),
(3, 2, 1667576960000, 1668074401000, 3),
(4, 3, 1667896408000, 1668099442000, 1);
INSERT INTO values (id, user_id, value, date) VALUES
(38, 1, 7, 1668182428000),
(1, 1, 7, 1668099442000),
(2, 1, 8, 1668074401000),
(3, 1, 8, 1668012300000),
(4, 1, 6, 1668011197000),
(5, 1, 6, 1667978268000),
(6, 1, 9, 1667925002000),
(7, 1, 9, 1667896408000),
(8, 1, 4, 1667838601000),
(9, 1, 6, 1667803049000),
(10, 1, 7, 1667576960000),
(12, 1, 5, 1667546428000),
(13, 1, 8, 1667490149000),
(14, 2, 8, 1668182428000),
(15, 2, 7, 1668099442000),
(16, 2, 8, 1668074401000),
(17, 2, 9, 1668012300000),
(18, 2, 6, 1668011197000),
(19, 2, 6, 1667978268000),
(20, 2, 5, 1667925002000),
(21, 2, 9, 1667896408000),
(22, 2, 4, 1667803049000),
(23, 2, 4, 1667576960000),
(24, 2, 5, 1667546428000),
(25, 2, 9, 1667490149000),
(26, 3, 6, 1668182428000),
(27, 3, 7, 1668099442000),
(28, 3, 8, 1668074401000),
(29, 3, 9, 1668011197000),
(30, 3, 6, 1667978268000),
(31, 3, 9, 1667925002000),
(32, 3, 9, 1667896408000),
(33, 3, 8, 1667838601000),
(34, 3, 6, 1667803049000),
(35, 3, 4, 1667576960000),
(36, 3, 5, 1667546428000),
(37, 3, 6, 1667490149000);

OK, I figured it out. Basically, I write two subqueries that each use an inner join, treat them as derived tables, inner join those two, and use GROUP BY to average:
select
  query1.product_uuid,
  query1.days,
  AVG(query1.value / query2.avg) as avg_percent
from
  (
    select
      DATE_PART('day', to_timestamp(values.date / 1000)::date - trials.start_date) as days,
      trials.uuid as trial_uuid,
      trials.product_uuid,
      values.value as value
    from
      values as values
      inner join product_trials as trials ON values.user_id = trials.user_id
    where
      values.source = 'Trued'
      and values.use = 'true'
      AND trials.start_date IS NOT NULL
      AND trials.end_date IS NOT NULL
      AND to_timestamp(values.date / 1000)::date > trials.start_date
      AND to_timestamp(values.date / 1000)::date < trials.end_date
  ) as query1
  inner join (
    select
      values.user_id,
      trials.uuid as trial_uuid,
      AVG(value)
    from
      values
      inner join product_trials as trials ON values.user_id = trials.user_id
    where
      source = 'Trued'
      and use = true
      AND trials.start_date IS NOT NULL
      AND trials.end_date IS NOT NULL
      AND to_timestamp(values.date / 1000)::date < trials.start_date
      AND DATE_PART('day', to_timestamp(values.date / 1000)::date - trials.start_date) > -20
    GROUP BY
      values.user_id,
      trials.uuid
  ) as query2 ON query1.trial_uuid = query2.trial_uuid
where
  query2.avg > 0
GROUP BY
  query1.days,
  query1.product_uuid
ORDER BY
  query1.product_uuid,
  query1.days
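The same two-step logic as the query above (a baseline per trial, then each in-trial value divided by that baseline and bucketed by day) can be sketched in Go over rows already loaded into memory, which may help when checking the query's output. This is a hypothetical illustration, not the asker's code: `Value`, `Trial` and `ratiosByDay` are made-up names, timestamps are compared as epoch milliseconds rather than cast to dates, and the 20-day baseline window is passed in as a parameter:

```go
package main

import "fmt"

// msPerDay is one 24-hour bucket in milliseconds.
const msPerDay int64 = 24 * 60 * 60 * 1000

// Value is one row of the values table (date in epoch milliseconds).
type Value struct {
	UserID int
	Value  int
	Date   int64
}

// Trial is one row of the trials table (dates in epoch milliseconds).
type Trial struct {
	UserID    int
	StartDate int64
	EndDate   int64
}

// ratiosByDay computes, for one trial, what the two derived tables compute:
// the baseline is the user's average value in the baselineDays before
// StartDate (query2), and every value strictly between StartDate and EndDate
// is divided by it and bucketed by whole days since StartDate (query1).
// A trial without a positive baseline yields nothing, like "query2.avg > 0".
func ratiosByDay(values []Value, t Trial, baselineDays int64) map[int64][]float64 {
	out := map[int64][]float64{}
	var sum float64
	var n int
	for _, v := range values {
		if v.UserID == t.UserID && v.Date < t.StartDate && v.Date > t.StartDate-baselineDays*msPerDay {
			sum += float64(v.Value)
			n++
		}
	}
	if n == 0 || sum/float64(n) <= 0 {
		return out
	}
	baseline := sum / float64(n)
	for _, v := range values {
		if v.UserID == t.UserID && v.Date > t.StartDate && v.Date < t.EndDate {
			day := (v.Date - t.StartDate) / msPerDay
			out[day] = append(out[day], float64(v.Value)/baseline)
		}
	}
	return out
}

func main() {
	// User 1's rows and trial 1 from the sample data in the question.
	values := []Value{
		{1, 7, 1668099442000}, {1, 8, 1668074401000}, {1, 8, 1668012300000},
		{1, 6, 1668011197000}, {1, 6, 1667978268000}, {1, 9, 1667925002000},
		{1, 9, 1667896408000}, {1, 4, 1667838601000}, {1, 6, 1667803049000},
		{1, 7, 1667576960000}, {1, 5, 1667546428000}, {1, 8, 1667490149000},
	}
	trial := Trial{UserID: 1, StartDate: 1667896408000, EndDate: 1668099442000}
	fmt.Println(ratiosByDay(values, trial, 20))
}
```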

Related

SQL: WHERE condition for multiple values of a column

I'm looking for a hint on filtering for multiple values within a column.
I'm interested in an "AND" condition for some values in column X (i.e., WHERE ColumnX IN (1,2,3) doesn't meet my needs).
Consider this example table:
I'm interested in finding the COD_OPE values that have both status 6 and status 7. In this example, I want to find only COD_OPE = 3.
If I use WHERE STATUS IN (6, 7), I also get COD_OPE 1 and 2.
Is there a smart way to find only COD_OPE = 3?
Thank you!
Code for table in the example:
CREATE TABLE [TABLE] (
COD_OPE int,
STATUS int,
Observation_date int
)
INSERT INTO [TABLE] (COD_OPE, STATUS, Observation_date)
VALUES (1, 1, 2022),(1, 1, 2021), (1, 1, 2020), (1, 6, 2019), (1, 6, 2018), (2, 1, 2022), (2, 7, 2021), (2, 4, 2020), (2, 4, 2019), (2, 7, 2018), (3, 1, 2022), (3, 1, 2021), (3, 4, 2020), (3, 7, 2019), (3, 6, 2018)
select * from [TABLE]
Use aggregation:
SELECT COD_OPE
FROM [TABLE]
WHERE STATUS IN (6, 7)
GROUP BY COD_OPE
HAVING COUNT(DISTINCT STATUS) = 2;
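The HAVING COUNT(DISTINCT STATUS) = 2 trick works because the WHERE clause keeps only the wanted statuses, so a code that reaches a count of 2 must have both. A Go sketch of the same grouping over the example rows, with hypothetical names (`statusRow`, `codesWithAllStatuses`):

```go
package main

import (
	"fmt"
	"sort"
)

// statusRow is one row of the example table (Observation_date is not needed).
type statusRow struct {
	CodOpe int
	Status int
}

// codesWithAllStatuses returns the COD_OPE values that have every status in
// wanted, mirroring
// WHERE STATUS IN (...) GROUP BY COD_OPE HAVING COUNT(DISTINCT STATUS) = n
func codesWithAllStatuses(rows []statusRow, wanted []int) []int {
	want := map[int]bool{}
	for _, s := range wanted {
		want[s] = true
	}
	seen := map[int]map[int]bool{} // COD_OPE -> distinct matching statuses
	for _, r := range rows {
		if !want[r.Status] {
			continue // WHERE STATUS IN (...)
		}
		if seen[r.CodOpe] == nil {
			seen[r.CodOpe] = map[int]bool{}
		}
		seen[r.CodOpe][r.Status] = true
	}
	var out []int
	for code, statuses := range seen {
		if len(statuses) == len(want) { // HAVING COUNT(DISTINCT STATUS) = n
			out = append(out, code)
		}
	}
	sort.Ints(out)
	return out
}

func main() {
	rows := []statusRow{
		{1, 1}, {1, 1}, {1, 1}, {1, 6}, {1, 6},
		{2, 1}, {2, 7}, {2, 4}, {2, 4}, {2, 7},
		{3, 1}, {3, 1}, {3, 4}, {3, 7}, {3, 6},
	}
	fmt.Println(codesWithAllStatuses(rows, []int{6, 7}))
}
```

Counting distinct statuses rather than rows matters here: COD_OPE 1 has status 6 twice and COD_OPE 2 has status 7 twice, so COUNT(*) = 2 would wrongly match them as well.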

Summing by rows in SQL

I attempted to do this using an analytic function, but it appears that I did so incorrectly...
How can I get the expected output from the table I've been given?
CREATE TABLE rides (
ride_id INT,
driver_id INT,
ride_in_kms INT,
ride_fare FLOAT,
ride_date DATE
);
INSERT INTO rides VALUES (1, 1, 3, 4.45, "2016-05-16");
INSERT INTO rides VALUES (2, 1, 4, 8.46, "2016-05-16");
INSERT INTO rides VALUES (3, 2, 6, 11.9, "2016-05-16");
INSERT INTO rides VALUES (4, 3, 3, 6.76, "2016-05-16");
INSERT INTO rides VALUES (5, 2, 6, 13.55, "2016-05-16");
INSERT INTO rides VALUES (6, 4, 3, 4.91, "2016-05-20");
INSERT INTO rides VALUES (7, 1, 7, 16.77, "2016-05-20");
INSERT INTO rides VALUES (8, 3, 9, 16.18, "2016-05-20");
INSERT INTO rides VALUES (9, 2, 3, 6.07, "2016-05-20");
INSERT INTO rides VALUES (10, 4, 4, 6.25, "2016-05-20");
Expected output:
[Image: output result]
Thanks in advance
The general gist is to put a conditional expression inside sum() so that each column only adds up the rows for its own date:
select
driver_id,
sum(case when ride_date = "2016-05-16" then ride_in_kms else 0 end) `KMS_MAY_16`,
sum(case when ride_date = "2016-05-20" then ride_in_kms else 0 end) `KMS_MAY_20`
from rides
group by driver_id;
The exact syntax available, and how to express the column labels, depends on which database you are using.
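The same conditional aggregation can be illustrated outside the database. A small Go sketch with hypothetical type and function names (`ride`, `kmsByDate`, `pivotKms`) and the two dates hard-coded, as they are in the query:

```go
package main

import "fmt"

// ride keeps the columns of the rides table that the query uses.
type ride struct {
	DriverID int
	Kms      int
	Date     string // "YYYY-MM-DD"
}

// kmsByDate is one output row: the per-date columns of the pivot.
type kmsByDate struct {
	May16 int
	May20 int
}

// pivotKms sums ride_in_kms per driver into one field per date, the same idea
// as sum(case when ride_date = ... then ride_in_kms else 0 end): each ride adds
// only to the column for its own date, and every driver gets a row.
func pivotKms(rides []ride) map[int]kmsByDate {
	out := map[int]kmsByDate{}
	for _, r := range rides {
		k := out[r.DriverID] // zero value when the driver is new, like "else 0"
		switch r.Date {
		case "2016-05-16":
			k.May16 += r.Kms
		case "2016-05-20":
			k.May20 += r.Kms
		}
		out[r.DriverID] = k
	}
	return out
}

func main() {
	rides := []ride{
		{1, 3, "2016-05-16"}, {1, 4, "2016-05-16"}, {2, 6, "2016-05-16"},
		{3, 3, "2016-05-16"}, {2, 6, "2016-05-16"}, {4, 3, "2016-05-20"},
		{1, 7, "2016-05-20"}, {3, 9, "2016-05-20"}, {2, 3, "2016-05-20"},
		{4, 4, "2016-05-20"},
	}
	totals := pivotKms(rides)
	for driver := 1; driver <= 4; driver++ {
		fmt.Println(driver, totals[driver].May16, totals[driver].May20)
	}
}
```

With the sample rows this prints 1 7 7, 2 12 3, 3 3 9 and 4 0 7, one line per driver.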

SQL Pivot Half of table

I have a table that consists of time information. It's basically:
Employee, Date, Seq, Time In, Time Out.
Employees can clock in and out multiple times a day, so I'm trying to get all of the clock-ins and clock-outs for a day onto one row. My result would be something like:
Employee, Date, TimeIn1, TimeOut1, TimeIn2, TimeOut2, TimeIn3, TimeOut3....
Here 1, 2, and 3 are the sequence numbers. I know I could just do a bunch of left joins of the table to itself on employee = employee, date = date, and seq = seq + 1, but is there a way to do it with a pivot? I don't want to pivot the employee and date fields, just the time in and time out.
The short answer is: Yes, it's possible.
The exact code will be updated if/when you provide sample data to clarify some points, but you can absolutely pivot the times out while leaving the employee/work date alone.
Sorry for the wall of code; none of the fiddle sites are working from my current computer
create table #test (
pk int,
workdate date,
seq int,
tIN time,
tOUT time
)
insert into #test values
(1, '2020-11-25', 1, '08:00', null),
(1, '2020-11-25', 2, null, '11:00'),
(1, '2020-11-25', 3, '11:32', null),
(1, '2020-11-25', 4, null, '17:00'),
(2, '2020-11-25', 5, '08:00', null),
(2, '2020-11-25', 6, null, '09:00'),
(2, '2020-11-25', 7, '09:15', null),
-- new date
(1, '2020-11-27', 8, '08:00', null),
(1, '2020-11-27', 9, null, '08:22'),
(1, '2020-11-27', 10, '09:14', null),
(1, '2020-11-27', 11, null, '12:08'),
(1, '2020-11-27', 12, '01:08', null),
(1, '2020-11-27', 13, null, '14:40'),
(1, '2020-11-27', 14, '14:55', null),
(1, '2020-11-27', 15, null, '17:00')
select *
from (
/* this just sets the column header names and condenses their values */
select
pk,
workdate,
colName = case when tin is not null then 'TimeIn' + cast(empDaySEQ as varchar) else 'TimeOut' + cast(empDaySEQ as varchar) end,
colValue = coalesce(tin, tout)
from (
/* main query */
select
pk,
workdate,
/* grab what pair # this clock in or out is; reset by employee & date */
empDaySEQ = (row_number() over (partition by pk, workdate order by seq) / 2) + (row_number() over (partition by pk, workdate order by seq) % 2),
tin,
tout
from #test
) i
) a
PIVOT (
max(colValue)
for colName
IN ( /* replace w/ dynamic if you don't know upper boundary of max in/out pairs */
[TimeIn1],
[TimeOut1],
[TimeIn2],
[TimeOut2],
[TimeIn3],
[TimeOut3],
[TimeIn4],
[TimeOut4]
)
) mypivotTable
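This generates one row per employee (pk) and workdate, with the in/out pairs spread across the TimeIn1/TimeOut1 through TimeIn4/TimeOut4 columns.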
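The pairing trick in the query, empDaySEQ = rn/2 + rn%2 (that is, rn/2 rounded up), is what puts each clock-in and the clock-out after it into the same numbered pair. A Go sketch of the same grouping, assuming punches alternate in/out in seq order as in the sample data; `Punch`, `dayKey` and `pivotPunches` are hypothetical names, and an empty string stands in for NULL:

```go
package main

import (
	"fmt"
	"sort"
)

// Punch mirrors one row of the example table; "" stands in for NULL.
type Punch struct {
	Employee int
	WorkDate string
	Seq      int
	In       string
	Out      string
}

// dayKey identifies one output row: an employee on one work date.
type dayKey struct {
	Employee int
	WorkDate string
}

// pivotPunches numbers each employee's punches per work date in seq order
// (like row_number() over (partition by pk, workdate order by seq)), turns
// row number rn into pair number (rn+1)/2, and stores the time under
// TimeIn<pair> or TimeOut<pair>, one map of columns per output row.
func pivotPunches(punches []Punch) map[dayKey]map[string]string {
	sorted := append([]Punch(nil), punches...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Seq < sorted[j].Seq })
	rowNum := map[dayKey]int{}
	out := map[dayKey]map[string]string{}
	for _, p := range sorted {
		k := dayKey{p.Employee, p.WorkDate}
		rowNum[k]++
		pair := (rowNum[k] + 1) / 2 // same as rn/2 + rn%2
		if out[k] == nil {
			out[k] = map[string]string{}
		}
		if p.In != "" {
			out[k][fmt.Sprintf("TimeIn%d", pair)] = p.In
		} else {
			out[k][fmt.Sprintf("TimeOut%d", pair)] = p.Out
		}
	}
	return out
}

func main() {
	punches := []Punch{
		{1, "2020-11-25", 1, "08:00", ""},
		{1, "2020-11-25", 2, "", "11:00"},
		{1, "2020-11-25", 3, "11:32", ""},
		{1, "2020-11-25", 4, "", "17:00"},
		{2, "2020-11-25", 5, "08:00", ""},
		{2, "2020-11-25", 6, "", "09:00"},
		{2, "2020-11-25", 7, "09:15", ""},
	}
	rows := pivotPunches(punches)
	for _, k := range []dayKey{{1, "2020-11-25"}, {2, "2020-11-25"}} {
		r := rows[k]
		fmt.Println(k.Employee, r["TimeIn1"], r["TimeOut1"], r["TimeIn2"], r["TimeOut2"])
	}
}
```

Like the fixed IN (...) list in the PIVOT, this only works cleanly if every clock-in is followed by its clock-out; a missed punch shifts all later pairs for that day.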

Group By: get the currently effective data out of past and future dates in SQL Server [duplicate]

This question already has answers here:
Get top 1 row of each group
Closed 2 years ago.
I have a list of rows with an EffectiveOn column in a SQL Server database table. I want to fetch the currently applicable EffectiveOn for each AccountId relative to the current date. Consider the table in the sample data below.
From that table I need to fetch the rows with Id 11 to 15, because the current date (today) is 2020-11-19.
I tried the solution from "How do I get the current records based on its Effective Date?", but I couldn't get it to work.
Please help me get the expected result set.
Sample Data:
CREATE TABLE [dbo].[DataInfo]
(
[Id] INT NOT NULL,
[AccountId] INT NOT NULL,
[EffectiveOn] DATE NOT NULL
)
GO
INSERT INTO [dbo].[DataInfo](Id, AccountId, EffectiveOn)
VALUES (1, 1, '2020-01-01'), (2, 2, '2020-01-02'), (3, 3, '2020-01-03'), (4, 4, '2020-01-04'), (5, 5, '2020-01-05'),
(6, 1, '2020-05-01'), (7, 2, '2020-05-02'), (8, 3, '2020-05-03'), (9, 4, '2020-05-04'), (10, 5, '2020-05-05'),
(11, 1, '2020-10-01'), (12, 2, '2020-10-02'), (13, 3, '2020-10-03'), (14, 4, '2020-10-04'), (15, 5, '2020-10-05'),
(16, 1, '2021-02-01'), (17, 2, '2021-02-02'), (18, 3, '2021-02-03'), (19, 4, '2021-02-04'), (20, 5, '2021-02-05')
You can use a correlated subquery to get the most recent date as of a particular date:
select di.*
from datainfo di
where di.effectiveon = (select max(di2.effectiveon)
from datainfo di2
where di2.accountid = di.accountid and
di2.effectiveon < getdate()
);
You can also do this with window functions:
select di.*
from (select di.*,
row_number() over (partition by accountid order by effectiveon desc) as seqnum
from datainfo di
where di.effectiveon < getdate()
) di
where seqnum = 1;
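Both queries keep, per account, the latest row that is already effective. The same selection over in-memory rows, as a Go sketch with hypothetical names (`dataInfo`, `currentPerAccount`) and dates held as ISO "YYYY-MM-DD" strings:

```go
package main

import "fmt"

// dataInfo mirrors one row of dbo.DataInfo; dates are ISO "YYYY-MM-DD"
// strings, which compare in date order as plain strings.
type dataInfo struct {
	ID          int
	AccountID   int
	EffectiveOn string
}

// currentPerAccount keeps, for each account, the row with the latest
// EffectiveOn that is not after asOf: the same result as
// row_number() over (partition by accountid order by effectiveon desc) = 1
// restricted to rows that are already effective. Because getdate() includes
// the time of day, a row effective today counts as current, hence "not after".
func currentPerAccount(rows []dataInfo, asOf string) map[int]dataInfo {
	best := map[int]dataInfo{}
	for _, r := range rows {
		if r.EffectiveOn > asOf {
			continue // not effective yet (future-dated)
		}
		if cur, ok := best[r.AccountID]; !ok || r.EffectiveOn > cur.EffectiveOn {
			best[r.AccountID] = r
		}
	}
	return best
}

func main() {
	rows := []dataInfo{
		{1, 1, "2020-01-01"}, {2, 2, "2020-01-02"},
		{6, 1, "2020-05-01"}, {7, 2, "2020-05-02"},
		{11, 1, "2020-10-01"}, {12, 2, "2020-10-02"},
		{16, 1, "2021-02-01"}, {17, 2, "2021-02-02"},
	}
	current := currentPerAccount(rows, "2020-11-19")
	fmt.Println(current[1].ID, current[2].ID)
}
```

With these rows from the sample data it prints 11 12.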

COUNT() with GROUP BY - Query to get the number of times each value appears in a column (SQL)

Just struggling with this question.
Write a query that selects the following details about the sides that have been ordered in orders:
• The side ID number and side name.
• How many orders the side has been ordered in (regardless of quantity).
I have already created a view named ordered_sides_details; this is its table:
[Image: view table]
I've written this query, but I believe it just counts the number of rows instead of how many orders each side has been in.
SELECT ordered_sides_details.side_name, COUNT(*)
FROM ordered_sides_details
GROUP BY ordered_sides_details.side_name;
This is the resulting table:
[Image: resulting table]
It's obviously incorrect, as the 1.25L Coke has only been in one order.
Any help with solving this would be awesome. Thanks.
Solution
There must be something wrong with the view you've created.
This should be enough to yield proper results:
SELECT
side_id
,side_name
,COUNT(DISTINCT order_id) AS total_count
FROM dbo.orders
GROUP BY side_id, side_name
Bootstrapping (SQL Server)
Scripts for bootstrapping your example:
IF NOT EXISTS (SELECT 1 FROM sys.tables t WHERE t.object_id = OBJECT_ID('dbo.orders'))
BEGIN
CREATE TABLE orders
(
order_id INT,
side_id INT NOT NULL,
side_name NVARCHAR(100) NOT NULL,
ordered_quantity INT NOT NULL,
total_cost MONEY NOT NULL
);
END;
INSERT INTO orders (order_id, side_id, side_name, ordered_quantity, total_cost)
VALUES
(10, 1, '390ml Coke', 1, 3.00),
(5, 2, '1.25l Coke', 2, 10.00),
(8, 3, 'Lava Cake', 3, 8.85),
(7, 4, 'Chicken Wings', 4, 14.00),
(6, 5, 'Garlic Bread', 4, 7.80),
(5, 6, 'Healthy Kale Chips', 3, 16.50),
(5, 6, 'Healthy Kale Chips', 2, 11.00),
(4, 5, 'Garlic Bread', 1, 1.95),
(3, 4, 'Chicken Wings', 1, 3.50),
(2, 3, 'Lava Cake', 2, 5.90);
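Because the task asks how many orders a side appears in regardless of quantity, what has to be counted is distinct order IDs: in the rows above, Healthy Kale Chips appears twice in order 5 and should count once. A Go sketch of that count, with hypothetical names (`orderLine`, `sideCount`, `ordersPerSide`) and the assumption that each side_id always has the same side_name:

```go
package main

import (
	"fmt"
	"sort"
)

// orderLine keeps the orders columns that the count needs.
type orderLine struct {
	OrderID  int
	SideID   int
	SideName string
}

// sideCount is one result row: a side and the number of distinct orders it is in.
type sideCount struct {
	SideID   int
	SideName string
	Orders   int
}

// ordersPerSide counts distinct order IDs per side, like
// COUNT(DISTINCT order_id) ... GROUP BY side_id, side_name, so a side listed
// twice in the same order is counted once regardless of quantity.
func ordersPerSide(lines []orderLine) []sideCount {
	orders := map[int]map[int]bool{} // side_id -> set of order_id
	names := map[int]string{}
	for _, l := range lines {
		if orders[l.SideID] == nil {
			orders[l.SideID] = map[int]bool{}
		}
		orders[l.SideID][l.OrderID] = true
		names[l.SideID] = l.SideName
	}
	out := make([]sideCount, 0, len(orders))
	for id, set := range orders {
		out = append(out, sideCount{SideID: id, SideName: names[id], Orders: len(set)})
	}
	sort.Slice(out, func(i, j int) bool { return out[i].SideID < out[j].SideID })
	return out
}

func main() {
	lines := []orderLine{
		{10, 1, "390ml Coke"}, {5, 2, "1.25l Coke"}, {8, 3, "Lava Cake"},
		{7, 4, "Chicken Wings"}, {6, 5, "Garlic Bread"}, {5, 6, "Healthy Kale Chips"},
		{5, 6, "Healthy Kale Chips"}, {4, 5, "Garlic Bread"}, {3, 4, "Chicken Wings"},
		{2, 3, "Lava Cake"},
	}
	for _, c := range ordersPerSide(lines) {
		fmt.Println(c.SideID, c.SideName, c.Orders)
	}
}
```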