SQLite3 database conditional summing - sql

I am looking to order a list of part keys by the number of returns recorded in a database of order requests. Basically, given a table, call it orders(o_partkey, o_returnflag), I am trying to get the total number of returns for each part key. I have tried many variations of the following snippet with the goal schema returnlist(partkey, numreturns):
select O.o_partkey as partkey,
count(case when O.o_returnflag = 'R' then 1 else 0 end) as numreturns
from orders O
orderby quantity_returned desc;
I am very new to SQLite and am just jumping into the basics. This is an adjustment of a homework question (the actual question is more complex) but I have simplified down the issue I am having.

Consider using a derived table subquery with SUM() as the aggregate function:
SELECT dT.partkey, dT.numreturns
FROM
(SELECT O.o_partkey as partkey,
SUM(CASE WHEN O.o_returnflag = 'R' THEN 1 ELSE 0 END) as numreturns
FROM orders O
GROUP BY O.o_partkey) AS dT
ORDER BY dT.numreturns DESC;
If your table is actually named ORDER, bracket or quote the name ([ORDER] or "ORDER"), because ORDER is an SQLite keyword.

Your problem is that COUNT counts rows, so it counts both 0 and 1 values.
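You are not interested in any other rows, so you can keep only the returns with WHERE and then count per part key with GROUP BY. Note that parts with no returns at all will not appear in the result:
SELECT o_partkey AS partkey,
COUNT(*) AS numreturns
FROM orders
WHERE o_returnflag = 'R'
GROUP BY o_partkey
ORDER BY 2 DESC;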
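To see the difference between the three variants side by side, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names come from the question; the sample rows are made up for illustration.

```python
import sqlite3

# Hypothetical sample rows; only the table and column names come from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (o_partkey INTEGER, o_returnflag TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "R"), (1, "N"), (1, "R"), (2, "N"), (3, "R")],
)

# COUNT(expr) counts every non-NULL value, so the ELSE 0 rows are counted too.
count_rows = conn.execute(
    """
    SELECT o_partkey,
           COUNT(CASE WHEN o_returnflag = 'R' THEN 1 ELSE 0 END) AS numreturns
    FROM orders
    GROUP BY o_partkey
    ORDER BY o_partkey
    """
).fetchall()

# SUM adds up the 1s and 0s, so only returned rows contribute.
sum_rows = conn.execute(
    """
    SELECT o_partkey,
           SUM(CASE WHEN o_returnflag = 'R' THEN 1 ELSE 0 END) AS numreturns
    FROM orders
    GROUP BY o_partkey
    ORDER BY numreturns DESC, o_partkey
    """
).fetchall()

# Filtering with WHERE gives the same counts but drops parts with no returns.
where_rows = conn.execute(
    """
    SELECT o_partkey, COUNT(*) AS numreturns
    FROM orders
    WHERE o_returnflag = 'R'
    GROUP BY o_partkey
    ORDER BY numreturns DESC, o_partkey
    """
).fetchall()

print(count_rows)  # COUNT version: every row is counted
print(sum_rows)    # SUM version: part 2 shows up with 0 returns
print(where_rows)  # WHERE version: part 2 is missing
```

Which of the last two you want depends on whether parts without any returns should be listed with a zero or left out.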

Related

Group by after a partition by in MS SQL Server

I am working on some car accident data and am stuck on how to get the data in the form I want.
select
sex_of_driver,
accident_severity,
count(accident_severity) over (partition by sex_of_driver, accident_severity)
from
SQL.dbo.accident as accident
inner join SQL.dbo.vehicle as vehicle on
accident.accident_index = vehicle.accident_index
This is my code, which counts the accidents had per each sex for each severity. I know I can do this with group by but I wanted to use a partition by in order to work out % too.
However, I get a very large table (I assume it has one row per input row for each sex/severity pair). When I do the following:
select
sex_of_driver,
accident_severity,
count(accident_severity) over (partition by sex_of_driver, accident_severity)
from
SQL.dbo.accident as accident
inner join SQL.dbo.vehicle as vehicle on
accident.accident_index = vehicle.accident_index
group by
sex_of_driver,
accident_severity
I get this:
sex_of_driver  accident_severity  (No column name)
1              1                  1
1              2                  1
-1             2                  1
-1             1                  1
1              3                  1
I won't give you the whole table, but basically, the group by has caused the count to just be 1.
I can't figure out why group by isn't working. Is this an MS SQL-Server thing?
I want to get the same result as below (obv without the CASE etc)
select
accident.accident_severity,
count(accident.accident_severity) as num_accidents,
vehicle.sex_of_driver,
CASE vehicle.sex_of_driver WHEN '1' THEN 'Male' WHEN '2' THEN 'Female' end as sex_col,
CASE accident.accident_severity WHEN '1' THEN 'Fatal' WHEN '2' THEN 'Serious' WHEN '3' THEN 'Slight' end as serious_col
from
SQL.dbo.accident as accident
inner join SQL.dbo.vehicle as vehicle on
accident.accident_index = vehicle.accident_index
where
sex_of_driver != 3
and
sex_of_driver != -1
group by
accident.accident_severity,
vehicle.sex_of_driver
order by
accident.accident_severity
You seem to have a misunderstanding here.
GROUP BY will reduce your rows to a single row per grouping (i.e. per pair of sex_of_driver, accident_severity values). Any normal aggregates you use with this, such as COUNT(*), will return the aggregate value within that group.
OVER, on the other hand, gives you a windowed aggregate, which is calculated after GROUP BY has reduced your rows. Therefore, when you write count(accident_severity) over (partition by sex_of_driver, accident_severity), the aggregate only receives a single row in each partition, because the rows have already been reduced.
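The effect is easy to reproduce on a small scale. The sketch below uses Python's sqlite3 module rather than SQL Server, and a single hypothetical table crashes that stands in for the joined accident/vehicle result; it assumes a SQLite build with window function support (3.25 or later).

```python
import sqlite3

# Hypothetical stand-in for the joined accident/vehicle rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE crashes (sex_of_driver INTEGER, accident_severity INTEGER)")
conn.executemany(
    "INSERT INTO crashes VALUES (?, ?)",
    [(1, 1), (1, 2), (1, 2), (2, 2), (2, 3), (2, 3), (2, 3)],
)

# Window function evaluated after GROUP BY: each partition holds one grouped row.
windowed_after_group = conn.execute(
    """
    SELECT sex_of_driver, accident_severity,
           COUNT(accident_severity) OVER (PARTITION BY sex_of_driver, accident_severity)
    FROM crashes
    GROUP BY sex_of_driver, accident_severity
    ORDER BY 1, 2
    """
).fetchall()

# Plain aggregate with GROUP BY: counts the rows inside each group.
plain_group = conn.execute(
    """
    SELECT sex_of_driver, accident_severity, COUNT(*)
    FROM crashes
    GROUP BY sex_of_driver, accident_severity
    ORDER BY 1, 2
    """
).fetchall()

print(windowed_after_group)  # the third column is always 1
print(plain_group)           # the third column is the real per-group count
```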
You say "I know I can do this with group by but I wanted to use a partition by in order to work out % too." but you are misunderstanding how to do that. You don't need PARTITION BY to work out percentage. All you need to calculate a percentage over the whole resultset is COUNT(*) * 1.0 / SUM(COUNT(*)) OVER (), in other words a windowed aggregate over a normal aggregate.
Note also that count(accident_severity) does not give you the number of distinct accident_severity values; it gives you the number of non-null values, which is probably not what you intend. Your join predicate also looks very strange to me; you probably want something like a.vehicle_id = v.vehicle_id.
So you want something like this:
select
sex_of_driver,
accident_severity,
count(*) as Count,
count(*) * 1.0 /
sum(count(*)) over (partition by sex_of_driver) as PercentOfSex,
count(*) * 1.0 /
sum(count(*)) over () as PercentOfTotal
from
dbo.accident as a
inner join dbo.vehicle as v on
a.vehicle_id = v.vehicle_id
group by
sex_of_driver,
accident_severity;
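As a rough check of the "windowed aggregate over a normal aggregate" pattern, here is a sketch in Python's sqlite3 module (not SQL Server). The single table crashes and its rows are hypothetical and stand in for the joined data; a SQLite build with window function support (3.25 or later) is assumed.

```python
import sqlite3

# Hypothetical stand-in for the joined accident/vehicle rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE crashes (sex_of_driver INTEGER, accident_severity INTEGER)")
conn.executemany(
    "INSERT INTO crashes VALUES (?, ?)",
    [(1, 1), (1, 2), (1, 2), (2, 2), (2, 3), (2, 3), (2, 3)],
)

# COUNT(*) is the per-group count; SUM(COUNT(*)) OVER (...) then totals those
# counts across the grouped rows, either per sex or over the whole result.
pct_rows = conn.execute(
    """
    SELECT sex_of_driver,
           accident_severity,
           COUNT(*) AS n,
           COUNT(*) * 1.0 / SUM(COUNT(*)) OVER (PARTITION BY sex_of_driver) AS pct_of_sex,
           COUNT(*) * 1.0 / SUM(COUNT(*)) OVER () AS pct_of_total
    FROM crashes
    GROUP BY sex_of_driver, accident_severity
    ORDER BY sex_of_driver, accident_severity
    """
).fetchall()

for row in pct_rows:
    print(row)
```

Within each sex the pct_of_sex values add up to 1, and over the whole result the pct_of_total values add up to 1.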

SQL - Count new entries based on last date

I have a table with the following structure
ID ReportDate Object_id
What I need to know, is the count of new and count of old (Object id's)
For example: If I have the data below:
I want the following output grouped by ReportDate:
I thought of doing it with a WHERE clause based on date; however, I need the data for all the dates I have in the table, to see the count of what already existed in a previous report and what is new in that report. Any ideas?
Edit: New/Old definition- New would be the records that never appeared before that report run date and appeared on this one, whereas old is the number of records that had at least one match in previous dates. I'll edit the post to include this info.
I managed to do it using a left join. Below is my solution in case it helps anyone in the future :)
SELECT table.ReportRunDate,
-1*sum(table.ReportRunDate = new_table.init_date) as count_new,
-1*sum(table.ReportRunDate <> new_table.init_date) as count_old,
count(*) as count_total
FROM table LEFT JOIN
((SELECT Object_ID, min(ReportRunDate) as init_date
FROM table
GROUP By OBJECT_ID) as new_table)
ON table.Object_ID = new_table.Object_ID
GROUP BY ReportRunDate
This would work in Oracle, not sure about ms-access:
SELECT ReportDate
,COUNT(CASE WHEN rnk = 1 THEN 1 ELSE NULL END) count_of_new
,COUNT(CASE WHEN rnk <> 1 THEN 1 ELSE NULL END)count_of_old
FROM (SELECT ID
,ReportDate
,Object_id
,RANK() OVER (PARTITION BY Object_id ORDER BY ReportDate) rnk
FROM table_name)
GROUP BY ReportDate
The inner query ranks each occurrence of object_id by ReportDate, so the first occurrence of a given object_id has rank = 1, the next one rank = 2, and so on.
The outer query then counts how many records in each group have a rank equal to 1 and how many do not.
I assumed that 1 object_id can appear only once within each reportDate.
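Both answers rest on the same idea: an object is new on the first date it appears and old on every later date. A minimal sketch of that idea with Python's sqlite3 module follows; the table name reports and its rows are hypothetical, since the question's sample data did not survive, and it uses the first-appearance join rather than RANK().

```python
import sqlite3

# Hypothetical data: one row per (ReportDate, Object_id) pair.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reports (ReportDate TEXT, Object_id TEXT)")
conn.executemany(
    "INSERT INTO reports VALUES (?, ?)",
    [
        ("2020-01-01", "A"), ("2020-01-01", "B"),
        ("2020-02-01", "A"), ("2020-02-01", "C"),
        ("2020-03-01", "A"), ("2020-03-01", "B"),
        ("2020-03-01", "C"), ("2020-03-01", "D"),
    ],
)

# Join every row to the first date its object appeared, then count per report
# date how many rows are first appearances (new) and how many are not (old).
new_old = conn.execute(
    """
    SELECT r.ReportDate,
           SUM(CASE WHEN r.ReportDate = f.first_date THEN 1 ELSE 0 END) AS count_new,
           SUM(CASE WHEN r.ReportDate > f.first_date THEN 1 ELSE 0 END) AS count_old,
           COUNT(*) AS count_total
    FROM reports r
    JOIN (SELECT Object_id, MIN(ReportDate) AS first_date
          FROM reports
          GROUP BY Object_id) f
      ON f.Object_id = r.Object_id
    GROUP BY r.ReportDate
    ORDER BY r.ReportDate
    """
).fetchall()

for row in new_old:
    print(row)
```

Like the RANK() answer, this assumes an object appears at most once per report date.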

using correlated subquery in the case statement

I'm trying to use a correlated subquery in my SQL code and I can't wrap my head around what I'm doing wrong. A brief description of the code and what I'm trying to do:
The code consists of a big query (aliased as A) whose result set is a list of customer IDs, offer IDs and the response status name ("SOLD", "SELLING", "IRRELEVANT", "NO ANSWER" etc.) of each customer to each offer. The customer IDs and the responses in the result set are non-unique, since more than one offer can be made to each customer, and a customer can have different responses to different offers.
The goal is to generate a list of distinct customer IDs and to mark each ID with 0 or 1 flag :
if the ID has AT LEAST ONE offer with status name is "SOLD" or "SELLING" the flag should be 1 otherwise 0. Since each customer has an array of different responses, what I'm trying to do is to check if "SOLD" or "SELLING" appears in this array for each customer ID, using correlated subquery in the case statement and aliasing the big underlying query named A with A1 this time:
select distinct
A.customer_ID,
case when 'SOLD' in (select distinct A1.response from A as A1
where A.customer_ID = A1.customer_ID) OR
'SELLING' in (select distinct A1.response from A as A1
where A.customer_ID = A1.customer_ID)
then 1 else 0 end as FLAG
FROM
(select …) A
What I get is an error message saying there is no such object as A or A1.
Thanks in advance for the help!
You can use exists with cte :
with cte as (
<query here>
)
select c.*,
(case when exists (select 1
from cte c1
where c1.customer_ID = c.customer_ID and
c1.response in ('SOLD', 'SELLING')
)
then 1 else 0
end) as flag
from cte c;
You can also do aggregation :
select customer_id,
max(case when a.response in ('SOLD', 'SELLING') then 1 else 0 end) as flag
from < query here > a
group by customer_id;
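Here is a small sketch of both approaches using Python's sqlite3 module. The table offers and its rows are hypothetical and stand in for the big query A, which the question does not show; the CTE name a plays the role of the alias.

```python
import sqlite3

# Hypothetical stand-in for the result set of the big query A.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE offers (customer_ID INTEGER, offer_ID INTEGER, response TEXT)")
conn.executemany(
    "INSERT INTO offers VALUES (?, ?, ?)",
    [
        (1, 10, "SOLD"), (1, 11, "NO ANSWER"),
        (2, 12, "IRRELEVANT"), (2, 15, "NO ANSWER"),
        (3, 13, "SELLING"), (3, 14, "IRRELEVANT"),
    ],
)

# Aggregation: the flag is 1 if any row of the customer matches.
agg_flags = conn.execute(
    """
    WITH a AS (SELECT customer_ID, response FROM offers)
    SELECT customer_ID,
           MAX(CASE WHEN response IN ('SOLD', 'SELLING') THEN 1 ELSE 0 END) AS flag
    FROM a
    GROUP BY customer_ID
    ORDER BY customer_ID
    """
).fetchall()

# EXISTS: a CTE can be referenced again inside the correlated subquery,
# which a derived-table alias in the outer FROM clause cannot.
exists_flags = conn.execute(
    """
    WITH a AS (SELECT customer_ID, response FROM offers)
    SELECT DISTINCT a.customer_ID,
           CASE WHEN EXISTS (SELECT 1
                             FROM a AS a1
                             WHERE a1.customer_ID = a.customer_ID
                               AND a1.response IN ('SOLD', 'SELLING'))
                THEN 1 ELSE 0 END AS flag
    FROM a
    ORDER BY a.customer_ID
    """
).fetchall()

print(agg_flags)
print(exists_flags)
```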
The WITH clause suggested by Yogesh is a good option. If you run into performance issues with WITH, you can create a volatile table and use its columns in your select statement:
create volatile table as (select response from where response in ('SOLD','SELLING')).
SELECT from customer table < and join volatile table>.
The only disadvantage here is that volatile tables cannot be accessed after you disconnect from the session.

Retrieve the total number of orders made and the number of orders for which payment has been done

Retrieve the total number of orders made and the number of orders for which payment has been done(delivered).
TABLE ORDER
------------------------------------------------------
ORDERID QUOTATIONID STATUS
----------------------------------------------------
Q1001 Q1002 Delivered
O1002 Q1006 Ordered
O1003 Q1003 Delivered
O1004 Q1006 Delivered
O1005 Q1002 Delivered
O1006 Q1008 Delivered
O1007 Q1009 Ordered
O1008 Q1013 Ordered
Unable to get the total number of orderid i.e 8
select count(orderid) as "TOTALORDERSCOUNT",count(Status) as "PAIDORDERSCOUNT"
from orders
where status ='Delivered'
The expected output is
TOTALORDERDSCOUNT PAIDORDERSCOUNT
8 5
I think you want conditional aggregation:
select count(*) as TOTALORDERSCOUNT,
sum(case when status = 'Delivered' then 1 else 0 end) as PAIDORDERSCOUNT
from orders;
Try this-
SELECT COUNT(ORDERID) TOTALORDERSCOUNT,
SUM(CASE WHEN STATUS = 'Delivered' THEN 1 ELSE 0 END ) PAIDORDERSCOUNT
FROM orders
You can also use COUNT in place of SUM as below-
SELECT COUNT(ORDERID) TOTALORDERSCOUNT,
COUNT(CASE WHEN STATUS = 'Delivered' THEN 1 ELSE NULL END ) PAIDORDERSCOUNT
FROM orders
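The conditional aggregation answers can be checked against the sample data from the question with a short sketch in Python's sqlite3 module. The rows are copied from the question as given; the table is named orders, as in the asker's query.

```python
import sqlite3

# Rows copied from the question's sample table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (orderid TEXT, quotationid TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("Q1001", "Q1002", "Delivered"),
        ("O1002", "Q1006", "Ordered"),
        ("O1003", "Q1003", "Delivered"),
        ("O1004", "Q1006", "Delivered"),
        ("O1005", "Q1002", "Delivered"),
        ("O1006", "Q1008", "Delivered"),
        ("O1007", "Q1009", "Ordered"),
        ("O1008", "Q1013", "Ordered"),
    ],
)

# SUM variant: add 1 for delivered rows and 0 for everything else.
sum_variant = conn.execute(
    """
    SELECT COUNT(*) AS total_orders,
           SUM(CASE WHEN status = 'Delivered' THEN 1 ELSE 0 END) AS paid_orders
    FROM orders
    """
).fetchone()

# COUNT variant: non-delivered rows become NULL, which COUNT ignores.
count_variant = conn.execute(
    """
    SELECT COUNT(orderid) AS total_orders,
           COUNT(CASE WHEN status = 'Delivered' THEN 1 END) AS paid_orders
    FROM orders
    """
).fetchone()

print(sum_variant)
print(count_variant)
```

No WHERE clause is used here, because filtering on status would also shrink the total count, which is the problem in the original query.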
You could also cross join the two counts:
select count(orderid) as TOTALORDERSCOUNT, t.PAIDORDERSCOUNT
from orders
cross join (
select count(Status) PAIDORDERSCOUNT
from orders where Status ='Delivered'
) t
What I've used in the past for summarizing totals is
SELECT
count(*) 'Total Orders',
sum( iif( orders.STATUS = 'Delivered', 1, 0 ) ) 'Total Paid Orders'
FROM orders
I personally don't like using CASE WHEN if I don't have to. This logic may look like it's a little too much for a simple summation of totals, but it allows more conditions to be added quite easily and also involves less typing, at least for what I regularly use it for.
The iif() call sets up the condition: if STATUS is 'Delivered', it yields 1 for that order; for any other value, such as 'Ordered', NULL, or a status like 'Pending' that you might add later, it yields 0, so the count stays accurate.
Then, nesting this within the 'sum' function totals all of the 1's denoted from your matched values. I use this method regularly for report querying when there's a need for many conditions to be narrowed down to a summed value. This also opens up a lot of options in the case you need to join tables in your FROM statement.
Also just out of personal preference and depending on which SQL environment you're using this in, I tend to only use AS statements for renaming when absolutely necessary and instead just denote the column name with a single quoted string. Does the same thing, but that's just personal preference.
As stated before, this may seem like it's doing too much, but for me, good SQL allows for easy change to conditions without having to rewrite an entire query.
EDIT** I forgot to mention using count(*) only works if the orderid's are all unique values. Generally speaking for an orders table, orderid is an expected unique value, but just wanted to add that in as a side note.
SELECT COUNT(DISTINCT ORDERID) AS [TOTALORDERSCOUNT],
COUNT(CASE WHEN STATUS = 'Delivered' THEN ORDERID ELSE NULL END) AS [PAIDORDERCOUNT]
FROM ORDERS
TOTALORDERSCOUNT counts the distinct values in ORDERID, while the CASE expression in PAIDORDERCOUNT yields NULL for any row that does not have the desired status, so COUNT skips it.

How to group the outcomes of a query by the values in another column?

I have a table called vegetation with 2 columns:
type, count
I want to sum the count for all the rows where count value is smaller than the average count and all those for which it is higher.
I don't know how to reflect this in the group by clause... (or somewhere else?).
I guess that another way of doing it should be by assigning a value to all less-than-average data and another value to all higher-than-average data and then group by this value. But I just started and can not figure out how to do that either.
SELECT sum(CASE WHEN ct <= x.avg_ct THEN ct ELSE 0 END) AS sum_ct_low
,sum(CASE WHEN ct > x.avg_ct THEN ct ELSE 0 END) AS sum_ct_hi
FROM vegetation v
,(SELECT avg(ct) AS avg_ct FROM vegetation) x
The average is a single value, so you can just CROSS JOIN the subquery to the base table (a comma-separated list of tables means cross-joining).
Then split the rows with a simple CASE expression inside each SUM.
I use ct instead of count, since that's a reserved word in SQL.
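A quick way to try this out is the following sketch with Python's sqlite3 module. The column name ct follows the answer; the vegetation types and numbers are made up for illustration.

```python
import sqlite3

# Hypothetical sample rows; ct replaces the question's "count" column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vegetation (type TEXT, ct INTEGER)")
conn.executemany(
    "INSERT INTO vegetation VALUES (?, ?)",
    [("grass", 10), ("fern", 20), ("shrub", 30), ("tree", 100)],
)

# The subquery returns a single row (the average, 40 here), so the cross join
# simply attaches that value to every row of the base table.
sum_low, sum_hi = conn.execute(
    """
    SELECT SUM(CASE WHEN v.ct <= x.avg_ct THEN v.ct ELSE 0 END) AS sum_ct_low,
           SUM(CASE WHEN v.ct >  x.avg_ct THEN v.ct ELSE 0 END) AS sum_ct_hi
    FROM vegetation v
    CROSS JOIN (SELECT AVG(ct) AS avg_ct FROM vegetation) x
    """
).fetchone()

print(sum_low, sum_hi)
```

Rows exactly equal to the average land in the low bucket here because of the <= comparison; move the equals sign if you want them counted as high.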