Summing by rows in SQL

I attempted to do it using the analytical function, but it appears that I did so improperly...
How can I receive the output from the table I've been given?
CREATE TABLE rides (
ride_id INT,
driver_id INT,
ride_in_kms INT,
ride_fare FLOAT,
ride_date DATE
);
INSERT INTO rides VALUES (1, 1, 3, 4.45, "2016-05-16");
INSERT INTO rides VALUES (2, 1, 4, 8.46, "2016-05-16");
INSERT INTO rides VALUES (3, 2, 6, 11.9, "2016-05-16");
INSERT INTO rides VALUES (4, 3, 3, 6.76, "2016-05-16");
INSERT INTO rides VALUES (5, 2, 6, 13.55, "2016-05-16");
INSERT INTO rides VALUES (6, 4, 3, 4.91, "2016-05-20");
INSERT INTO rides VALUES (7, 1, 7, 16.77, "2016-05-20");
INSERT INTO rides VALUES (8, 3, 9, 16.18, "2016-05-20");
INSERT INTO rides VALUES (9, 2, 3, 6.07, "2016-05-20");
INSERT INTO rides VALUES (10, 4, 4, 6.25, "2016-05-20");
Output result
Thanks in advance

The general gist is to use an expression within the sum() to operate on the correct rows:
select
driver_id,
sum(case when ride_date = "2016-05-16" then ride_in_kms else 0 end) `KMS_MAY_16`,
sum(case when ride_date = "2016-05-20" then ride_in_kms else 0 end) `KMS_MAY_20`
from rides
group by driver_id;
The particular syntax available, and how to express the column label depends on what database you are using.
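To try the conditional-aggregation idea above without a database server, here is a minimal sketch using Python's built-in sqlite3 module. It loads the sample rides from the question and uses standard single-quoted literals with AS aliases instead of the MySQL-style double quotes and backticks. The column labels kms_may_16 and kms_may_20 are just illustrative names, and the question's expected output is not shown, so this is only one plausible reading of it.

```python
import sqlite3

# In-memory database loaded with the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE rides (ride_id INT, driver_id INT, ride_in_kms INT, "
    "ride_fare FLOAT, ride_date DATE)"
)
conn.executemany(
    "INSERT INTO rides VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, 3, 4.45, "2016-05-16"), (2, 1, 4, 8.46, "2016-05-16"),
        (3, 2, 6, 11.9, "2016-05-16"), (4, 3, 3, 6.76, "2016-05-16"),
        (5, 2, 6, 13.55, "2016-05-16"), (6, 4, 3, 4.91, "2016-05-20"),
        (7, 1, 7, 16.77, "2016-05-20"), (8, 3, 9, 16.18, "2016-05-20"),
        (9, 2, 3, 6.07, "2016-05-20"), (10, 4, 4, 6.25, "2016-05-20"),
    ],
)

# One SUM per date: rows from other dates contribute 0 to that column.
pivot = conn.execute(
    """
    SELECT driver_id,
           SUM(CASE WHEN ride_date = '2016-05-16' THEN ride_in_kms ELSE 0 END) AS kms_may_16,
           SUM(CASE WHEN ride_date = '2016-05-20' THEN ride_in_kms ELSE 0 END) AS kms_may_20
    FROM rides
    GROUP BY driver_id
    ORDER BY driver_id
    """
).fetchall()

for row in pivot:
    print(row)
```

Driver 4 has no rides on 2016-05-16, so the ELSE 0 branch gives that column a 0 rather than NULL.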


Averaging values and getting standard devs from database to build graph

This is tricky to explain, so I'll shrink the info down to a minimum.
But first, I'll try to explain my ultimate goal: I want to take users who trialed a product and determine how that product affected a value, as a percentage of their average baseline, and then average all these percentages and get standard deviations.
I have a database with a table that has a user_id, a value, and a date.
user_id | value | date
int | int | int in epoch milliseconds
I then have a second table which indicates when a trial began and ends for a user and the product they are using for said trial.
user_id | start_date | end_date | product id
int | int in epoch milliseconds | int in epoch milliseconds | int
What I want to do is gather all the user's trials for one product type, and for each user that participated get a baseline value and their percent change each day. Then take all these percentages and average them and get a standard deviation for each day.
One problem is that date needs to be converted to days since start_date, so anything between the start date and the first 24 hrs is lumped as day 0, the next 24 hrs as day 1, and so forth. So I'll be averaging the percents of each day.
Not every day was recorded for each user, so some will have multiple missing days, and I need to mark each result with its number of days from the start.
The start_dates are random between users.
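The day bucketing described above could be sketched as follows. This is only an illustration in Python, not the asker's Go code: day_index, daily_stats and the sample percentages are hypothetical, and using the population standard deviation (pstdev) is an assumption, since the question does not say which one is wanted.

```python
from collections import defaultdict
from statistics import mean, pstdev

MS_PER_DAY = 24 * 60 * 60 * 1000  # milliseconds in one 24-hour bucket


def day_index(ts_ms, start_ms):
    """Whole 24-hour periods elapsed since the trial start (day 0, 1, ...).

    Floor division means timestamps before the start get negative indexes.
    """
    return (ts_ms - start_ms) // MS_PER_DAY


def daily_stats(samples):
    """samples: (day, percent) pairs pooled across users.

    Returns {day: (mean, population std dev)}; days nobody recorded are
    simply absent, so gaps in the data need no special handling.
    """
    by_day = defaultdict(list)
    for day, pct in samples:
        by_day[day].append(pct)
    return {day: (mean(v), pstdev(v)) for day, v in sorted(by_day.items())}


start = 1667896408000  # a trial start_date taken from the sample data
print(day_index(1667925002000, start))  # within the first 24 hours
print(day_index(1668011197000, start))  # during the second 24 hours
print(daily_stats([(0, 100.0), (0, 120.0), (1, 90.0)]))  # made-up percentages
```

The same integer division can be done in SQL (for example (date - start_date) / 86400000 on integer columns), which leaves only the plotting to the application code.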
So the graph will look like this:
picture
I would prefer to do as much of it in sql as possible, but the rest will be in Golang.
I was thinking about grabbing each trial, where each trial has an array of results. Then I iterate over each trial and over the results for each trial, picking day 0, day 1, day 2 and saving these in their own arrays, which I will then average. Everything starts getting super messy though,
such as in semi pseudo code:
db.Query("select user_id, start_date from trials where product_id = $1", productId).Scan(&trial.UserId, &trial.StartDate)
//extract trials from rows
for _, trial := range trials {
// extract leadingAvgStart from StartDate
db.QueryRow("select AVG(value) from results where user_id = $1 and date between $2 and $3", trial.UserId, leadingAvgStart, trial.StartDate)
// Now we have the baseline for the user
rows := db.Query("select value, date from results where product_id = $1", start)
//Now we extract the results and have and array
//Convert Dates to Dates from start Date
//...? It just start getting ugly and I believe there has to be a better way
}
How can I do most of the heavy lifting with sql?
create table users (id int PRIMARY KEY, name text);
create table products (id int PRIMARY KEY, name text);
create table values (
id int PRIMARY KEY
, user_id int REFERENCES users(id)
, value int
, date numeric
);
create table trials (
id int PRIMARY KEY
, user_id int REFERENCES users(id)
, start_date numeric
, end_date numeric
, product_id int REFERENCES products(id)
);
INSERT INTO users (id, name ) VALUES
(1,'John'),
(2,'Jane'),
(3,'Billy'),
(4,'Miranda');
INSERT INTO products (id, name ) VALUES
(1, 'pill A'),
(2, 'pill B'),
(3, 'pill C'),
(4, 'exercise bal'),
(5, 'diet plan');
INSERT INTO trials (id,user_id,start_date,end_date,product_id) VALUES
(1, 1, 1667896408000, 1668099442000, 1),
(2, 1, 1667896408000, 1668099442000, 2),
(3, 2, 1667576960000, 1668074401000, 3),
(4, 3, 1667896408000, 1668099442000, 1);
INSERT INTO values (id, user_id, value, date) VALUES
(38, 1, 7, 1668182428000),
(1, 1, 7, 1668099442000),
(2, 1, 8, 1668074401000),
(3, 1, 8, 1668012300000),
(4, 1, 6, 1668011197000),
(5, 1, 6, 1667978268000),
(6, 1, 9, 1667925002000),
(7, 1, 9, 1667896408000),
(8, 1, 4, 1667838601000),
(9, 1, 6, 1667803049000),
(10, 1, 7, 1667576960000),
(12, 1, 5, 1667546428000),
(13, 1, 8, 1667490149000),
(14, 2, 8, 1668182428000),
(15, 2, 7, 1668099442000),
(16, 2, 8, 1668074401000),
(17, 2, 9, 1668012300000),
(18, 2, 6, 1668011197000),
(19, 2, 6, 1667978268000),
(20, 2, 5, 1667925002000),
(21, 2, 9, 1667896408000),
(22, 2, 4, 1667803049000),
(23, 2, 4, 1667576960000),
(24, 2, 5, 1667546428000),
(25, 2, 9, 1667490149000),
(26, 3, 6, 1668182428000),
(27, 3, 7, 1668099442000),
(28, 3, 8, 1668074401000),
(29, 3, 9, 1668011197000),
(30, 3, 6, 1667978268000),
(31, 3, 9, 1667925002000),
(32, 3, 9, 1667896408000),
(33, 3, 8, 1667838601000),
(34, 3, 6, 1667803049000),
(35, 3, 4, 1667576960000),
(36, 3, 5, 1667546428000),
(37, 3, 6, 1667490149000);
OK, I figured it out. Basically, I run two inner-join queries, treat those as tables, inner join them to each other, and use a GROUP BY to average:
select
query1.product_uuid,
query1.days,
AVG(query1.value / query2.avg) as avg_percent
from
(
select
DATE_PART('day', to_timestamp(values.date / 1000)::date - trials.start_date) as days,
trials.uuid as trial_uuid,
trials.product_uuid,
values.value as value
from
values as values
inner join product_trials as trials ON values.user_id = trials.user_id
where
values.source = 'Trued'
and values.use = 'true'
AND trials.start_date IS NOT NULL
AND trials.end_date IS NOT NULL
AND to_timestamp(values.date / 1000)::date > trials.start_date
AND to_timestamp(values.date / 1000)::date < trials.end_date
) as query1
inner join (
select
values.user_id,
trials.uuid as trial_uuid,
AVG(value)
from
values
inner join product_trials as trials ON values.user_id = trials.user_id
where
source = 'Trued'
and use = true
AND trials.start_date IS NOT NULL
AND trials.end_date IS NOT NULL
AND to_timestamp(values.date / 1000)::date < trials.start_date
AND DATE_PART('day', to_timestamp(values.date / 1000)::date - trials.start_date) > -20
GROUP BY
values.user_id,
trials.uuid
) as query2 ON query1.trial_uuid = query2.trial_uuid
where
query2.avg > 0
GROUP BY
query1.days,
query1.product_uuid
ORDER BY
query1.product_uuid,
query1.days

SQL. Where condition for multiple values of column

I'm looking for a hint on how to filter for multiple values within a column.
I'm interested in an "AND" condition for some values in column X (i.e. the statement Where Column X in (1,2,3) doesn't fulfill my needs).
Consider this example table:
I'm interested in finding the COD_OPE that has both status 6 and status 7. In this example I want to find only COD_OPE = 3.
If I use Where status in (6,7), I also get COD_OPE 1 and 2.
Is there a smart way to find only COD_OPE = 3?
Thank you!
Code for table in the example:
CREATE TABLE [TABLE] (
COD_OPE int,
STATUS int,
Observation_date int
)
INSERT INTO [TABLE] (COD_OPE, STATUS, Observation_date)
VALUES (1, 1, 2022),(1, 1, 2021), (1, 1, 2020), (1, 6, 2019), (1, 6, 2018), (2, 1, 2022), (2, 7, 2021), (2, 4, 2020), (2, 4, 2019), (2, 7, 2018), (3, 1, 2022), (3, 1, 2021), (3, 4, 2020), (3, 7, 2019), (3, 6, 2018)
select * from [TABLE]
Use aggregation:
SELECT COD_OPE
FROM [TABLE]
WHERE STATUS IN (6, 7)
GROUP BY COD_OPE
HAVING COUNT(DISTINCT STATUS) = 2;
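A quick way to check this answer against the sample data is the sketch below, using Python's built-in sqlite3 module. The table name ops is a stand-in for [TABLE], chosen only for this illustration. The rows are the ones from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "ops" is a hypothetical stand-in for the question's [TABLE].
conn.execute("CREATE TABLE ops (cod_ope INT, status INT, observation_date INT)")
conn.executemany(
    "INSERT INTO ops VALUES (?, ?, ?)",
    [
        (1, 1, 2022), (1, 1, 2021), (1, 1, 2020), (1, 6, 2019), (1, 6, 2018),
        (2, 1, 2022), (2, 7, 2021), (2, 4, 2020), (2, 4, 2019), (2, 7, 2018),
        (3, 1, 2022), (3, 1, 2021), (3, 4, 2020), (3, 7, 2019), (3, 6, 2018),
    ],
)

# IN alone keeps every group that has a 6 OR a 7 ...
either = conn.execute(
    "SELECT DISTINCT cod_ope FROM ops WHERE status IN (6, 7) ORDER BY cod_ope"
).fetchall()

# ... while requiring two distinct matching statuses keeps only groups with both.
both = conn.execute(
    """
    SELECT cod_ope
    FROM ops
    WHERE status IN (6, 7)
    GROUP BY cod_ope
    HAVING COUNT(DISTINCT status) = 2
    """
).fetchall()

print(either)
print(both)
```

The DISTINCT matters here: COD_OPE 1 has two rows with status 6, so a plain COUNT(*) = 2 would wrongly accept it.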

Counting id for both days SQL

Can anyone help me, please?
The task is to find the number of rides taken by drivers who took a ride on both days
create table rides
(
ride_id int,
driver_id int,
ride_in_kms int,
ride_fare float,
ride_date date
);
insert into rides values (1, 1, 3, 4.45, "2016-05-16");
insert into rides values (2, 1, 4, 8.46, "2016-05-16");
insert into rides values (3, 2, 6, 11.9, "2016-05-16");
insert into rides values (4, 3, 3, 6.76, "2016-05-16");
insert into rides values (5, 2, 6, 13.55, "2016-05-16");
insert into rides values (6, 4, 3, 4.91, "2016-05-20");
insert into rides values (7, 1, 7, 16.77, "2016-05-20");
insert into rides values (8, 3, 9, 16.18, "2016-05-20");
insert into rides values (9, 2, 3, 6.07, "2016-05-20");
insert into rides values (10, 4, 4, 6.25, "2016-05-20");
The output:
driver_id rides
--------------
1 3
2 3
3 2
Try it like this:
select driver_id,count(*) as cnt
from
rides
group by driver_id
having count(distinct ride_date) > 1
select driver_id ,count( ride_id) as rides
from rides group by driver_id having driver_id in
(select driver_id from rides
where ride_date ="2016-05-16" and driver_id in
(select driver_id from rides
where ride_date="2016-05-20")) ;

Can I improve this query for use in large tables?

How can I improve this query for use in large tables....?
I use a table ('DataValues') to store a collection of values ('Value') for collections ('Visit_id'), i.e. it records certain values for each visit.
I use a table ('MatchItems') to store dynamic match sets ('MatchSet') of values ('Value'); sets can contain any number of values. The table also has an IsNeg field to indicate whether the match requires a value to be absent from the visit collection.
This allows me to dynamically match visits that conform to certain criteria such as
Must contain values A, B and C and NOT D OR C and B AND NOT A.
ie (Value = A and Value = B and Value = C and Value /= D)
or (Value = C and Value = B and Value /= A)
I have a query that delivers a reasonable solution (fiddle):
CREATE TABLE DataValues (
id NUMBER(5) CONSTRAINT DataValues_pk PRIMARY KEY,
Visit_id Number(5) ,
Value varchar(5)
);
INSERT INTO DataValues VALUES (1, 1, 'M');
INSERT INTO DataValues VALUES (2, 1, 'I');
INSERT INTO DataValues VALUES (3, 1, 'C');
INSERT INTO DataValues VALUES (4, 1, 'K');
INSERT INTO DataValues VALUES (5, 1, 'E');
INSERT INTO DataValues VALUES (6, 1, 'Y');
INSERT INTO DataValues VALUES (7, 2, 'M');
INSERT INTO DataValues VALUES (8, 2, 'O');
INSERT INTO DataValues VALUES (9, 2, 'U');
INSERT INTO DataValues VALUES (10, 2, 'S');
INSERT INTO DataValues VALUES (11, 2, 'E');
INSERT INTO DataValues VALUES (12, 3, 'C');
INSERT INTO DataValues VALUES (13, 3, 'A');
INSERT INTO DataValues VALUES (14, 3, 'T');
INSERT INTO DataValues VALUES (15, 4, 'S');
INSERT INTO DataValues VALUES (16, 4, 'A');
INSERT INTO DataValues VALUES (17, 4, 'T');
INSERT INTO DataValues VALUES (18, 5, 'M');
INSERT INTO DataValues VALUES (19, 5, 'A');
INSERT INTO DataValues VALUES (20, 5, 'T');
CREATE TABLE MatchItems (
id NUMBER(5) CONSTRAINT MatchItems_pk PRIMARY KEY,
MatchSet Number(5),
Value VARCHAR(5),
IsNeg NUMBER(1) NOT NULL CHECK (IsNeg in (0,1))
);
INSERT INTO MatchItems VALUES (1, 1, 'M', 0);
INSERT INTO MatchItems VALUES (2, 1, 'I', 0);
INSERT INTO MatchItems VALUES (3, 1, 'C', 0);
INSERT INTO MatchItems VALUES (4, 1, 'K', 0);
INSERT INTO MatchItems VALUES (5, 1, 'E', 0);
INSERT INTO MatchItems VALUES (6, 1, 'Y', 0);
INSERT INTO MatchItems VALUES (7, 2, 'C', 0);
INSERT INTO MatchItems VALUES (8, 2, 'A', 0);
INSERT INTO MatchItems VALUES (9, 3, 'A', 0);
INSERT INTO MatchItems VALUES (10, 3, 'T', 0);
INSERT INTO MatchItems VALUES (11, 4, 'S', 1);
INSERT INTO MatchItems VALUES (12, 4, 'A', 0);
INSERT INTO MatchItems VALUES (13, 4, 'K', 1);
INSERT INTO MatchItems VALUES (14, 5, 'A', 0);
INSERT INTO MatchItems VALUES (15, 5, 'T', 0);
SELECT
MatchItems.MatchSet,
DataValues.Visit_id,
GpMatchItems.Count TgtCount,
Count(MatchItems.Id),
sum(MatchItems.IsNeg)
FROM DataValues
LEFT JOIN MatchItems ON MatchItems.Value = DataValues.Value
--AND MatchItems.MatchSet = 4
LEFT JOIN (SELECT
MatchItems.MatchSet,
count(*) Count
FROM MatchItems
WHERE
MatchItems.IsNeg = 0
GROUP BY
MatchItems.MatchSet) GpMatchItems ON GpMatchItems.MatchSet = MatchItems.MatchSet
HAVING
Count(MatchItems.Id) = GpMatchItems.Count
AND sum(MatchItems.IsNeg) = 0
GROUP BY
MatchItems.MatchSet,
DataValues.Visit_id,
GpMatchItems.Count
How can I improve the performance of this query where the DataValues table contains 100m records, and MatchItems may include a collection of 50 sets each of 2 - 20 values?
You can try this version using Analytic functions and see if it performs any better. This query removes the subquery GpMatchItems that you are joining with.
SELECT DISTINCT matchset,
visit_id,
tgtcount,
match_visit_count,
isneg_sum
FROM (SELECT MatchItems.MatchSet,
DataValues.Visit_id,
COUNT (DISTINCT CASE MatchItems.IsNeg WHEN 0 THEN MatchItems.id ELSE NULL END)
OVER (PARTITION BY MatchItems.MatchSet)
AS tgtcount,
COUNT (*) OVER (PARTITION BY MatchItems.MatchSet, DataValues.Visit_id)
AS match_visit_count,
SUM (MatchItems.IsNeg) OVER (PARTITION BY MatchItems.MatchSet, DataValues.Visit_id)
AS isneg_sum
FROM DataValues LEFT JOIN MatchItems ON MatchItems.VALUE = DataValues.VALUE)
WHERE tgtcount = match_visit_count AND isneg_sum = 0;
I have adjusted EJ's suggestion to include a LEFT JOIN to collect the tgtCount to identify the total number of good matches required in each MatchSet:
SELECT DISTINCT matchset,
visit_id,
tgtcount,
match_visit_count,
isneg_sum
FROM (SELECT MatchItems.MatchSet,
DataValues.Visit_id,
GpMatchItems.Count tgtcount,
COUNT (*) OVER (PARTITION BY MatchItems.MatchSet, DataValues.Visit_id)
AS match_visit_count,
SUM (MatchItems.IsNeg) OVER (PARTITION BY MatchItems.MatchSet, DataValues.Visit_id)
AS isneg_sum
FROM DataValues
LEFT JOIN MatchItems ON MatchItems.VALUE = DataValues.VALUE
LEFT JOIN ( SELECT
MatchItems.MatchSet,
count(*) Count
FROM MatchItems
WHERE MatchItems.IsNeg = 0
GROUP BY
MatchItems.MatchSet) GpMatchItems
ON GpMatchItems.MatchSet = MatchItems.MatchSet
)
WHERE
tgtcount = match_visit_count
AND isneg_sum = 0;

SQL MIN() smaller/greater not working properly

I have this Data in DB
CREATE TABLE Stu_Table
(
Stu_Id VARCHAR(2),
Stu_Name VARCHAR(15),
Stu_Class VARCHAR(10),
sub_id VARCHAR(2),
marks VARCHAR(3)
);
INSERT INTO Stu_Table VALUES (1, 'Komal', 10, 1, 45);
INSERT INTO Stu_Table VALUES (2, 'Ajay', 10, 1, 56);
INSERT INTO Stu_Table VALUES (3, 'Rakesh', 10, 1, 67);
INSERT INTO Stu_Table VALUES (1, 'Komal', 10, 2, 47);
INSERT INTO Stu_Table VALUES (2, 'Ajay', 10, 2, 53);
INSERT INTO Stu_Table VALUES (3, 'Rakesh', 10, 2, 57);
INSERT INTO Stu_Table VALUES (1, 'Komal', 10, 3, 45);
INSERT INTO Stu_Table VALUES (2, 'Ajay', 10, 3, 56);
INSERT INTO Stu_Table VALUES (3, 'Rakesh', 10, 3, 67);
INSERT INTO Stu_Table VALUES (1, 'Komal', 10, 4, 65);
INSERT INTO Stu_Table VALUES (2, 'Ajay', 10, 4, 56);
INSERT INTO Stu_Table VALUES (3, 'Rakesh', 10, 4, 37);
INSERT INTO Stu_Table VALUES (1, 'Komal', 10, 5, 65);
INSERT INTO Stu_Table VALUES (2, 'Ajay', 10, 5, 46);
INSERT INTO Stu_Table VALUES (3, 'Rakesh', 10, 5, 63);
And I'm doing this query on this data.
SELECT *
FROM
(
SELECT
Stu_Id,
MIN(marks) AS mini,
AVG(marks) AS per
FROM stu_table
GROUP BY stu_id
HAVING MIN(marks) > 45
);
And I'm getting this:
Stu_Id| mini | per
1 | 45 | 53.4
2 | 46 | 53.4
3 | 37 | 58.2
I don't understand why I still see Stu_Id 1 with min(mark)=45 when I clearly have this HAVING min(marks)>45 in my query.
Runnable Demo
FIX:
Thanks to @sybkar and @Martin Smith!
I set the marks type as a string.
It should be INT...
Thanks guys!
Working perfectly!
create table Stu_Table(Stu_Id INT(2), Stu_Name varchar(15),
Stu_Class varchar(10),sub_id INT(2),marks INT(3)); -- INT!!!
I don't understand why I still see Stu_Id 1 with min(mark)=45
when I clearly have this HAVING min(marks)>45 in my query.
You don't. Or at least the demo you have provided doesn't.
In general, any weird results that you are getting will be because marks is being stored as a string, so MIN(marks) will bring back the earliest value in alphabetical order.
For example, HAVING MIN(marks) > 45 will also bring back 5, 6, 7, 8 and 9.
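The string-versus-number ordering can be seen in a small sketch with Python's built-in sqlite3 module. The table marks_demo and the values 5, 45 and 100 are made up for illustration and are not from the question's data. Other databases may coerce types differently when comparing a string column to a numeric literal, so treat this as a demonstration of the ordering only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical demo table: the same marks stored once as text, once as integers.
conn.execute("CREATE TABLE marks_demo (marks_text VARCHAR(3), marks_int INT)")
conn.executemany(
    "INSERT INTO marks_demo VALUES (?, ?)",
    [("5", 5), ("45", 45), ("100", 100)],
)

# MIN over text compares character by character; MIN over integers compares values.
text_min, int_min = conn.execute(
    "SELECT MIN(marks_text), MIN(marks_int) FROM marks_demo"
).fetchone()

# As strings, '5' sorts after '45' because '5' > '4' at the first character.
five_after_45 = conn.execute("SELECT '5' > '45'").fetchone()[0]

print(text_min, int_min, five_after_45)
```

Declaring the column as an integer type, as in the fix above, avoids relying on casts in every query.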