SQL. Where condition for multiple values of column - sql

Im looking for some hint when trying to filter for multiple values within column.
I'm interested in an "AND" condition for some values in column X (ie. statement Where Column X in (1,2,3) doesn't fulfill my needs).
Consider this example table:
I'm interested in finding COD_OPE that has both status 6 and 7. In this example i'm interested to find only COD_OPE = 3
If i use Where status in (6,7) i'll get cod_ope 1 and 6.
Any smart way to find cod_ope = 3?
Thank you!
Code for table in the example:
CREATE TABLE [TABLE] (
COD_OPE int,
STATUS int,
Observation_date int
)
INSERT INTO [TABLE] (COD_OPE, STATUS, Observation_date)
VALUES (1, 1, 2022),(1, 1, 2021), (1, 1, 2020), (1, 6, 2019), (1, 6, 2018), (2, 1, 2022), (2, 7, 2021), (2, 4, 2020), (2, 4, 2019), (2, 7, 2018), (3, 1, 2022), (3, 1, 2021), (3, 4, 2020), (3, 7, 2019), (3, 6, 2018)
select * from [TABLE]

Use aggregation:
SELECT COD_OPE
FROM [TABLE]
WHERE STATUS IN (6, 7)
GROUP BY COD_OPE
HAVING COUNT(DISTINCT STATUS) = 2;

Related

Averaging values and getting standard devs from database to build graph

Tricky to Explain so Ill shrink down the info to a minimum:
But first, I'll try and explain my ultimate goal, I want to take users who trialed a product and determine how that product affected a value as a percentage compared to their average baseline and then average all these percentages with stand devs.
I have database with the a table that has a user_id, a value, a date.
user_id
value
date
int
int
int in epoch miliseconds
I then have a second table which indicates when a trial began and ends for a user and the product they are using for said trial.
user_id
start_date
end_date
product id
int
int in epoch milisecs
int in epoch milisecs
int
What I want to do is gather all the user's trials for one product type, and for each user that participated get a baseline value and their percent change each day. Then take all these percentages and average them and get a standard deviation for each day.
One problem is date needs to convert to days since start_date so anything between the start date and the first 24 hrs will be lumped as day 0, next 24 as day 1, and so forth. So ill be averaging the percents of each day
Not every day was recorded for each user so some will have multiple missing days, so I cant need to mark each day as days from start
The start_date's are random between users
So the graph will look like this:
picture
I would prefer to do as much of it in sql as possible, but the rest will be in Golang.
I was thinking about grabbing each trial , and then each trial will have an array of results. so then I iterate over each trial and iteriate over the results for each trial picking day 0, day 1, day 2 and saving these in their own arrays which I will then average. Everything start getting super messy though
such as in semi pseudo code:
db.Query("select user_id, start_date from trials where product_id = $1", productId).Scan(&trial.UserId, &trial.StartDate)
//extract trials from rows
for _, trial := range trials {
// extract leadingAvgStart from StartDate
db.QueryRow("select AVG(value) from results where user_id = $1 date between $2 and $3", trial.UserId, leadingAvgStart, trial.StartDate)
// Now we have the baseline for the user
rows := db.Query("select value, date from results where product_id = $1", start)
//Now we extract the results and have and array
//Convert Dates to Dates from start Date
//...? It just start getting ugly and I believe there has to be a better way
}
How can I do most of the heavy lifting with sql?
create table users (id int PRIMARY KEY, name text);
create table products (id int PRIMARY KEY, name text);
create table values (
id int PRIMARY KEY
, user_id int REFERENCES users(id)
, value int
, date numeric
);
create table trials (
id int PRIMARY KEY
, user_id int REFERENCES users(id)
, start_date numeric
, end_date numeric
, product_id int REFERENCES products(id)
);
INSERT INTO users (id, name ) VALUES
(1,'John'),
(2,'Jane'),
(3,'Billy'),
(4,'Miranda');
INSERT INTO products (id, name ) VALUES
(1, 'pill A'),
(2, 'pill B'),
(3, 'pill C'),
(4, 'exercise bal'),
(5, 'diet plan');
INSERT INTO trials (id,user_id,start_date,end_date,product_id) VALUES
(1, 1, 1667896408000, 1668099442000, 1),
(2, 1, 1667896408000, 1668099442000, 2),
(3, 2, 1667576960000, 1668074401000, 3),
(4, 3, 1667896408000, 1668099442000, 1);
INSERT INTO values (id, user_id, value, date) VALUES
(38, 1, 7, 1668182428000),
(1, 1, 7, 1668099442000),
(2, 1, 8, 1668074401000),
(3, 1, 8, 1668012300000),
(4, 1, 6, 1668011197000),
(5, 1, 6, 1667978268000),
(6, 1, 9, 1667925002000),
(7, 1, 9, 1667896408000),
(8, 1, 4, 1667838601000),
(9, 1, 6, 1667803049000),
(10, 1, 7, 1667576960000),
(12, 1, 5, 1667546428000),
(13, 1, 8, 1667490149000),
(14, 2, 8, 1668182428000),
(15, 2, 7, 1668099442000),
(16, 2, 8, 1668074401000),
(17, 2, 9, 1668012300000),
(18, 2, 6, 1668011197000),
(19, 2, 6, 1667978268000),
(20, 2, 5, 1667925002000),
(21, 2, 9, 1667896408000),
(22, 2, 4, 1667803049000),
(23, 2, 4, 1667576960000),
(24, 2, 5, 1667546428000),
(25, 2, 9, 1667490149000),
(26, 3, 6, 1668182428000),
(27, 3, 7, 1668099442000),
(28, 3, 8, 1668074401000),
(29, 3, 9, 1668011197000),
(30, 3, 6, 1667978268000),
(31, 3, 9, 1667925002000),
(32, 3, 9, 1667896408000),
(33, 3, 8, 1667838601000),
(34, 3, 6, 1667803049000),
(35, 3, 4, 1667576960000),
(36, 3, 5, 1667546428000),
(37, 3, 6, 1667490149000);
Ok I figures it out, basically I do two inner join queries and treat those as tables and then inner join those, and use a group by to average
select
query1.product_uuid,
query1.days,
AVG(query1.value / query2.avg) as avg_percent
from
(
select
DATE_PART(
'day',
to_timestamp(
values
.date / 1000
):: date - trials.start_date
) as days,
trials.uuid as trial_uuid,
trials.product_uuid,
values
.value as value
from
values
as
values
inner join product_trials as trials ON
values
.user_id = trials.user_id
where
values
.source = 'Trued'
and
values
.use = 'true'
AND trials.start_date IS NOT NULL
AND trials.end_date IS NOT NULL
AND to_timestamp(
values
.date / 1000
):: date > trials.start_date
AND to_timestamp(
values
.date / 1000
):: date < trials.end_date
) as query1
inner join (
select
values
.user_id,
trials.uuid as trial_uuid,
AVG(value)
from
values
inner join product_trials as trials ON
values
.user_id = trials.user_id
where
source = 'Trued'
and use = true
AND trials.start_date IS NOT NULL
AND trials.end_date IS NOT NULL
AND to_timestamp(
values
.date / 1000
):: date < trials.start_date
AND DATE_PART(
'day',
to_timestamp(
values
.date / 1000
):: date - trials.start_date
) > -20
GROUP BY
values
.user_id,
trials.uuid
) as query2 ON query1.trial_uuid = query2.trial_uuid
where
query2.avg > 0
GROUP BY
query1.days,
query1.product_uuid
ORDER BY
query1.product_uuid,
query1.days

summing by rows sql

I attempted to do it using the analytical function, but it appears that I did so improperly...
How can I receive the output from the table I've been given?
CREATE TABLE rides (
ride_id INT,
driver_id INT,
ride_in_kms INT,
ride_fare FLOAT,
ride_date DATE
);
INSERT INTO rides VALUES (1, 1, 3, 4.45, "2016-05-16");
INSERT INTO rides VALUES (2, 1, 4, 8.46, "2016-05-16");
INSERT INTO rides VALUES (3, 2, 6, 11.9, "2016-05-16");
INSERT INTO rides VALUES (4, 3, 3, 6.76, "2016-05-16");
INSERT INTO rides VALUES (5, 2, 6, 13.55, "2016-05-16");
INSERT INTO rides VALUES (6, 4, 3, 4.91, "2016-05-20");
INSERT INTO rides VALUES (7, 1, 7, 16.77, "2016-05-20");
INSERT INTO rides VALUES (8, 3, 9, 16.18, "2016-05-20");
INSERT INTO rides VALUES (9, 2, 3, 6.07, "2016-05-20");
INSERT INTO rides VALUES (10, 4, 4, 6.25, "2016-05-20");
Output result
Thanks in advance
The general gist is to use an expression within the sum() to operate on the correct rows:
select
driver_id,
sum(case when ride_date = "2016-05-16" then ride_in_kms else 0 end) `KMS_MAY_16`,
sum(case when ride_date = "2016-05-20" then ride_in_kms else 0 end) `KMS_MAY_20`
from
group by driver_id;
The particular syntax available, and how to express the column label depends on what database you are using.

Sum over range but reset when other column value is 1

So I have an account number and a reading number that I want to take the cumulative sum of but reset at the beginning of a new reading cycle (I want to reset the running sum).
I am using a window function but cannot figure out how to set it when the new reading cycle exists.
Data has the following format:
The Reading cycle Volume value is what I am attempting to achieve.
Currently I have tried SUM(Value) OVER(PARTITION BY ACCOUNT ORDER BY OBS)
I do not know how to reset it when reading # = 1.
I have tried:
Case
when [Reading #] = 1 THEN value
ELSE SUM(Value) OVER(PARTITION BY ACCOUNT ORDER BY OBS)
END AS [Running Total]
If I understand the question correctly and the values, stored in the Obs and [Reading #] columns are without gaps, the next approach is an option:
Table:
SELECT *
INTO Data
FROM (VALUES
(1, 1, 1, 5),
(1, 2, 2, 6),
(1, 3, 3, 5),
(1, 4, 4, 6),
(1, 5, 5, 5),
(1, 6, 6, 5),
(1, 7, 1, 5),
(1, 8, 2, 6),
(1, 9, 3, 5),
(1, 10, 4, 6),
(1, 11, 5, 5),
(1, 12, 6, 5),
(2, 1, 1, 7),
(2, 2, 2, 8),
(2, 3, 3, 9),
(2, 4, 4, 10),
(2, 5, 5, 11),
(2, 6, 6, 12),
(2, 7, 1, 7),
(2, 8, 2, 8),
(2, 9, 3, 9),
(2, 10, 4, 10),
(2, 11, 5, 11),
(2, 12, 6, 12)
) v (Account, Obs, [Reading #], [Value])
Statement:
SELECT
Account, Obs, [Reading #], [Value],
SUM([Value]) OVER (PARTITION BY Account, [Group] ORDER BY Account, Obs) AS [Ready Cicle Value]
FROM (
SELECT
*,
(Obs - [Reading #]) AS [Group]
FROM Data
) t
One additional option (as a more general approach) is to create groups when [Reading #] is equal to 1:
SELECT
Account, Obs, [Reading #], [Value],
SUM([Value]) OVER (PARTITION BY Account, [Group] ORDER BY Obs) AS [Ready Cicle Value]
FROM (
SELECT *, SUM([Change]) OVER (PARTITION BY Account ORDER BY Obs) AS [Group]
FROM (
SELECT *, CASE WHEN [Reading #] = 1 THEN 1 ELSE 0 END AS [Change]
FROM Data
) a
) b
Help us to help you. Always include a minimal set of data and the code we need, which we can copy and paste to immediately be on the same page as you without wasting time & effort that is better spent helping others. Note how you can simply copy and paste our solutions and play with them? They are complete and stand alone. That is what we are looking for from you.
You are close. The piece you are missing is that you need some way to group your readings and then you can include that in your partitioning as well.
There are any number of ways to create the new derived value for "reading_group" the following is just one way.
DECLARE #t_customer_readings TABLE
( account_number INT,
observation INT,
reading_number INT,
reading_value INT
)
INSERT INTO #t_customer_readings
VALUES (1, 1 , 1, 3),
(1, 2 , 2, 6),
(1, 3 , 3, 9),
(1, 4 , 4, 5),
(1, 5 , 5, 5),
(1, 6 , 6, 8),
(1, 7 , 1, 1),
(1, 8 , 2, 4),
(1, 9 , 3, 7),
(1, 10, 4, 0),
(1, 11, 5, 3),
(1, 12, 6, 6),
(2, 1 , 1, 9),
(2, 2 , 2, 2),
(2, 3 , 3, 5),
(2, 4 , 4, 8),
(2, 5 , 5, 1),
(2, 6 , 6, 4),
(2, 7 , 1, 7),
(2, 8 , 2, 0),
(2, 9 , 3, 3),
(2, 10, 1, 6), -- note I have split this group into 2 to show that the reading numbers do not need to be sequential.
(2, 11, 5, 9),
(2, 12, 6, 2)
SELECT r.*,
-- reading_group = CASE WHEN r.reading_number = 1 THEN observation ELSE rg.reading_group END,
ready_cycle_volume = SUM(reading_value) OVER(PARTITION BY account_number,
CASE WHEN r.reading_number = 1 THEN observation
ELSE rg.reading_group
END
ORDER BY observation)
FROM #t_customer_readings r
CROSS APPLY
(SELECT reading_group = MAX(observation) -- I picked observation but you could use whatever value you like. we are just creating something we can group on.
FROM #t_customer_readings
WHERE account_number = r.account_number
AND observation < r.observation
AND reading_number = 1) rg

Group By get the currently matching Effective data out of past and future date in SQL Server [duplicate]

This question already has answers here:
Get top 1 row of each group
(19 answers)
Closed 2 years ago.
I'm having list of rows with a column EffectiveOn in SQL Server database table. I want to fetch the currently applicable EffectiveOn for each AccountId respective to current date. Consider the following table
In the above table I have to fetch the rows whose (Id: 11 to 15) because the current date (i.e., today) is 2020-11-19
I tried the following solution but I can't How do I get the current records based on it's Effective Date?
Kindly assist me how to get the expected result-set.
Sample Data:
CREATE TABLE [dbo].[DataInfo]
(
[Id] INT NOT NULL,
[AccountId] INT NOT NULL,
[EffectiveOn] DATE NOT NULL
)
GO
INSERT INTO [dbo].[DataInfo](Id, AccountId, EffectiveOn)
VALUES (1, 1, '2020-01-01'), (2, 2, '2020-01-02'), (3, 3, '2020-01-03'), (4, 4, '2020-01-04'), (5, 5, '2020-01-05'),
(6, 1, '2020-05-01'), (7, 2, '2020-05-02'), (8, 3, '2020-05-03'), (9, 4, '2020-05-04'), (10, 5, '2020-05-05'),
(11, 1, '2020-10-01'), (12, 2, '2020-10-02'), (13, 3, '2020-10-03'), (14, 4, '2020-10-04'), (15, 5, '2020-10-05'),
(16, 1, '2021-02-01'), (17, 2, '2021-02-02'), (18, 3, '2021-02-03'), (19, 4, '2021-02-04'), (20, 5, '2021-02-05')
You can use a correlated subquery to get the most recent date as of a particular date:
select di.*
from datainfo di
where di.effectiveon = (select max(di2.effecctiveon)
from datainfo di2
where di2.accountid = di.accountid and
di2.effectiveon < getdate()
);
You can also do this with window functions:
select di.*
from (select di.*,
row_number() over (partition by accountid order by effective on desc) as seqnum
from datainfo di
where di.effectiveon < getdate()
) di
where seqnum = 1;

COUNT() with GROUP BY - Query to get number of times a set of values appear in a column SQL

Just struggling with this question.
Write a query that selects the following details about the sides that have been ordered in orders:
• The side ID number and side name.
• How many orders the side has been ordered in (regardless of quantity).
I have created a view already and this is the table for it (ordered_sides_details is the name of the view)
View Table
I've wrttien this query but I believe it just counts the number of rows instead of how many times each side is ordered.
SELECT ordered_sides_details.side_name, COUNT(*)
FROM ordered_sides_details
GROUP BY ordered_sides_details.side_name;
This is the resulting table
Obviously its incorrect as 1.25L Coke has only been in 1 order.
Any help with solving this would be awesome. Thanks.
Solution
There must be something wrong with view you've created.
This should be enough to yield proper results:
SELECT
side_id
,side_name
,COUNT(*) AS total_count
FROM dbo.orders
GROUP BY side_id, side_name
Boostrapping (SQL Server)
Scripts for bootstrapping your example:
IF NOT EXISTS (SELECT 1 FROM sys.tables t WHERE t.object_id = OBJECT_ID('dbo.orders'))
BEGIN
CREATE TABLE orders
(
order_id INT,
side_id INT NOT NULL,
side_name NVARCHAR(100) NOT NULL,
ordered_quantity INT NOT NULL,
total_cost MONEY NOT NULL
);
END;
INSERT INTO orders (order_id, side_id, side_name, ordered_quantity, total_cost)
VALUES
(10, 1, '390ml Coke', 1, 3.00),
(5, 2, '1.25l Coke', 2, 10.00),
(8, 3, 'Lava Cake', 3, 8.85),
(7, 4, 'Chicken Wings', 4, 14.00),
(6, 5, 'Garlic Bread', 4, 7.80),
(5, 6, 'Healthy Kale Chips', 3, 16.50),
(5, 6, 'Healthy Kale Chips', 2, 11.00),
(4, 5, 'Garlic Bread', 1, 1.95),
(3, 4, 'Chicken Wings', 1, 3.50),
(2, 3, 'Lava Cake', 2, 5.90);