This question already has answers here: Get top 1 row of each group (19 answers). Closed 6 years ago.
I have a table that looks like this:
Invoice |Line # |Item # |Price Per
1 |1 |11 |5.00
1 |2 |22 |10.00
2 |1 |11 |5.00
2 |2 |22 |12.00
3 |1 |11 |5.00
4 |1 |11 |6.00
I am trying to get the last selling price of each item.
How do I run a script that yields the following results?
Invoice |Line # |Item # |Price Per
2 |2 |22 |12.00
4 |1 |11 |6.00
I am using this script to compare to the current selling price.
Thanks
Assuming invoice and line define the ordering, the traditional method uses row_number():
select t.*
from (select t.*,
row_number() over (partition by item order by invoice desc, line desc) as seqnum
from t
) t
where seqnum = 1;
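To check the row_number() approach against the sample data, here is a small sketch that runs it on an in-memory SQLite database through Python's standard sqlite3 module. SQLite is only a stand-in for whatever DBMS the question uses (window functions need SQLite 3.25 or later). The column names invoice, line, item and price are assumed renderings of "Invoice", "Line #", "Item #" and "Price Per", and the outer alias ranked is added for clarity.

```python
import sqlite3

# Sample rows from the question as (invoice, line, item, price).
# Column names are assumptions standing in for "Invoice", "Line #", "Item #", "Price Per".
ROWS = [
    (1, 1, 11, 5.00),
    (1, 2, 22, 10.00),
    (2, 1, 11, 5.00),
    (2, 2, 22, 12.00),
    (3, 1, 11, 5.00),
    (4, 1, 11, 6.00),
]

# Same shape as the answer: number each item's rows newest first, keep row 1.
QUERY = """
select invoice, line, item, price
from (select t.*,
             row_number() over (partition by item
                                order by invoice desc, line desc) as seqnum
      from t) ranked
where seqnum = 1
order by invoice
"""


def last_price_per_item(rows):
    """Return the most recent (invoice, line, item, price) row for each item."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("create table t (invoice integer, line integer, item integer, price real)")
        con.executemany("insert into t values (?, ?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    for row in last_price_per_item(ROWS):
        print(row)
```

On the question's data this returns (2, 2, 22, 12.0) and (4, 1, 11, 6.0), which matches the expected output.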
Background
I'm a novice Postgres user running a local server on a Windows 10 machine. I've got a dataset g that looks like this:
+--+---------+----------------+
|id|treatment|outcome_category|
+--+---------+----------------+
|a |1 |cardiovascular |
|a |0 |cardiovascular |
|b |0 |metabolic |
|b |0 |sensory |
|c |1 |NULL |
|c |0 |cardiovascular |
|c |1 |sensory |
|d |1 |NULL |
|d |0 |cns |
+--+---------+----------------+
The Problem
I'd like to get a count of rows per outcome_category for those ids who are "ever treated", defined as "ids that have any row where treatment=1".
Here's the desired result:
+----------------+---------+
|outcome_category| count |
+----------------+---------+
|cardiovascular | 3 |
|sensory | 1 |
|cns | 1 |
+----------------+---------+
It would also be fine if the result included metabolic (with a count of 0), like so:
+----------------+---------+
|outcome_category|  count  |
+----------------+---------+
|cardiovascular | 3 |
|metabolic | 0 |
|sensory | 1 |
|cns | 1 |
+----------------+---------+
I don't need the rows in any particular order, though descending order would be nice.
What I've tried
Here's a query I've written:
select treatment, outcome_category, sum(outcome_ct)
from (select max(treatment) as treatment,
outcome_category,
count(outcome_category) as outcome_ct
from g
group by outcome_category) as sub
group by outcome_category, sub.treatment;
But it's a mishmash result:
+---------+----------------+---+
|treatment|outcome_category|sum|
+---------+----------------+---+
|1 |cardiovascular |3 |
|1 |sensory |2 |
|0 |metabolic |1 |
|1 |NULL |0 |
|0 |cns |1 |
+---------+----------------+---+
I'm trying to identify the "ever exposed" ids using the first line of the subquery, select max(treatment) as treatment, but I'm not quite getting the rest of it.
EDIT
I realized that the toy dataset g I originally gave above doesn't reflect the idiosyncrasies of my real dataset. I've updated g to reflect that many ids who are "ever treated" won't have a non-null outcome_category on a row with treatment=1.
Interesting little problem. You can do:
select
outcome_category,
count(x.id) as count
from g
left join (
select distinct id from g where treatment = 1
) x on x.id = g.id
where outcome_category is not null
group by outcome_category
order by count desc
Result:
outcome_category count
----------------- -----
cardiovascular 3
sensory 1
cns 1
metabolic 0
See running example at db<>fiddle.
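One way to confirm that the LEFT JOIN version produces these counts (including metabolic with 0) is to run it on the toy table g in SQLite from Python. This is a sketch only: SQLite stands in for Postgres, and the alias n replaces count purely to keep the example unambiguous.

```python
import sqlite3

# Rows of the toy table g from the question as (id, treatment, outcome_category); NULL -> None.
G_ROWS = [
    ("a", 1, "cardiovascular"),
    ("a", 0, "cardiovascular"),
    ("b", 0, "metabolic"),
    ("b", 0, "sensory"),
    ("c", 1, None),
    ("c", 0, "cardiovascular"),
    ("c", 1, "sensory"),
    ("d", 1, None),
    ("d", 0, "cns"),
]

QUERY = """
select g.outcome_category,
       count(x.id) as n           -- x.id is NULL for never-treated ids, so they add 0
from g
left join (select distinct id from g where treatment = 1) x
       on x.id = g.id
where g.outcome_category is not null
group by g.outcome_category
order by n desc, g.outcome_category
"""


def ever_treated_counts(rows):
    """Return {outcome_category: number of rows belonging to ever-treated ids}."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("create table g (id text, treatment integer, outcome_category text)")
        con.executemany("insert into g values (?, ?, ?)", rows)
        return dict(con.execute(QUERY).fetchall())
    finally:
        con.close()


if __name__ == "__main__":
    print(ever_treated_counts(G_ROWS))
```

The key design point is counting x.id rather than *: the LEFT JOIN keeps every category row, but only rows whose id matched the "ever treated" subquery contribute to the count.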
This would appear to be just a simple aggregation:
select outcome_category, Count(*) count
from g
where treatment=1
group by outcome_category
order by Count(*) desc
Demo fiddle
Background
I'm a novice SQL user. Using PostgreSQL 13 on Windows 10 locally, I have a table t:
+--+---------+-------+
|id|treatment|outcome|
+--+---------+-------+
|a |1 |0 |
|a |1 |1 |
|b |0 |1 |
|c |1 |0 |
|c |0 |1 |
|c |1 |1 |
+--+---------+-------+
The Problem
I didn't explain myself well initially, so I've rewritten the goal.
Desired result:
+-----------------------+-----+
|ever treated |count|
+-----------------------+-----+
|0 |1 |
|1 |3 |
+-----------------------+-----+
First, identify the ids that have ever been treated. Being "ever treated" means having any row with treatment = 1.
Second, count the rows with outcome = 1 in each of the two groups (ever treated and never treated). In my original table, the "ever treated" ids have a total of 3 rows with outcome = 1, and the "never treated" ids, so to speak, have 1.
What I've tried
I can get much of the way there, I think, with something like this:
select treatment, count(outcome)
from t
group by treatment;
But that only gets me this result:
+---------+-----+
|treatment|count|
+---------+-----+
|0 |2 |
|1 |4 |
+---------+-----+
For the updated question:
SELECT ever_treated, sum(outcome_ct) AS count
FROM (
SELECT id
, max(treatment) AS ever_treated
, count(*) FILTER (WHERE outcome = 1) AS outcome_ct
FROM t
GROUP BY 1
) sub
GROUP BY 1;
ever_treated | count
--------------+-------
0 | 1
1 | 3
db<>fiddle here
Read:
For those who got no treatment at all (all treatment = 0), we see 1 x outcome = 1.
For those who got any treatment (at least one treatment = 1), we see 3 x outcome = 1.
This would be simpler and faster with proper boolean columns instead of integers.
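The same two-level aggregation can be tried outside Postgres. The sketch below runs it in SQLite via Python's sqlite3 module; because aggregate FILTER clauses are not available in every database, it swaps in the portable sum(case when ... then 1 else 0 end) form, which counts the same rows. The alias n is an addition for readability.

```python
import sqlite3

# Rows of table t from the question as (id, treatment, outcome).
T_ROWS = [
    ("a", 1, 0),
    ("a", 1, 1),
    ("b", 0, 1),
    ("c", 1, 0),
    ("c", 0, 1),
    ("c", 1, 1),
]

QUERY = """
select ever_treated, sum(outcome_ct) as n
from (select id,
             max(treatment) as ever_treated,                  -- 1 if any row of the id is treated
             sum(case when outcome = 1 then 1 else 0 end) as outcome_ct
      from t
      group by id) sub
group by ever_treated
order by ever_treated
"""


def outcomes_by_ever_treated(rows):
    """Return [(ever_treated, number of outcome = 1 rows), ...]."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("create table t (id text, treatment integer, outcome integer)")
        con.executemany("insert into t values (?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    print(outcomes_by_ever_treated(T_ROWS))
```

Because max(treatment) is computed over all of an id's rows before any outcome filtering, an id whose only treated row has outcome = 0 is still classified as ever treated.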
(Answer to updated question)
Here is easy-to-follow subquery logic that works with integer columns:
select subq.ever_treated, sum(subq.count) as count
from (select id,
max(treatment) as ever_treated, -- computed over all rows of the id
count(case when outcome = 1 then 1 end) as count -- counts only outcome = 1 rows
from t
group by id) as subq
group by subq.ever_treated;
I am working with an inpatient data table that looks like the following:
|ID    | AdmissionDate | DischDate  | LOS | Readmitted30days
+------+---------------+------------+-----+-----------------
|001   | 2014-01-01    | 2014-01-12 | 11  | 1
|101   | 2014-02-05    | 2014-02-12 | 7   | 1
|001   | 2014-02-18    | 2018-02-27 | 9   | 1
|001   | 2018-02-01    | 2018-02-13 | 12  | 0
|212   | 2014-01-28    | 2014-02-12 | 15  | 1
|212   | 2014-03-02    | 2014-03-15 | 13  | 0
|212   | 2016-12-23    | 2016-12-29 | 4   | 0
|1011  | 2017-06-10    | 2017-06-21 | 11  | 0
|401   | 2018-01-01    | 2018-01-11 | 10  | 0
|401   | 2018-10-01    | 2018-10-10 | 9   | 0
I want to create another table from the above in which the total length of stay (LOS) is summed up for those who have been readmitted within 30 days. The table I want to create looks like the following:
|ID    |Total LOS
+------+----------
|001 |39
|212 |28
|212 |4
|1011 |11
|401 |10
|401 |9
I am using SQL Server Version 17.
Could anyone help me do this?
Thanks in advance
The Readmitted30days column seems irrelevant to the question and a complete red herring. What you seem to want is to aggregate rows which are within 30 days of each other.
This is a type of gaps-and-islands problem. There are a number of solutions; here is one:
We use LAG to check whether the previous DischDate is within 30 days of this AdmissionDate
Based on that we assign a grouping ID by doing a running count
Then simply group by ID and our grouping ID, and sum
The dates and LOS don't seem to match up, so I've given you both totals:
WITH StartPoints AS (
SELECT *,
-- 1 when this admission is more than 30 days after the previous discharge,
-- i.e. this row starts a new island; NULL otherwise (including each ID's first row)
IsStart = CASE WHEN
DATEADD(day, -30, AdmissionDate) >
LAG(DischDate) OVER (PARTITION BY ID ORDER BY DischDate)
THEN 1 END
FROM YourTable
),
Groupings AS (
SELECT *,
-- running count of island starts = island number within each ID
GroupId = COUNT(IsStart) OVER (PARTITION BY ID ORDER BY DischDate ROWS UNBOUNDED PRECEDING)
FROM StartPoints
)
SELECT
ID,
TotalBasedOnDates = SUM(DATEDIFF(day, AdmissionDate, DischDate)), -- do you need to add 1 within the sum?
TotalBasedOnLOS = SUM(LOS)
FROM Groupings
GROUP BY ID, GroupID;
db<>fiddle
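Here is a sketch of the LAG plus running-COUNT technique in SQLite (3.25 or later), driven from Python. It approximates the T-SQL above rather than reproducing it: julianday() replaces DATEADD, the lag is computed in its own CTE for clarity, and the table and column names (stays, admission_date, disch_date, los) are hypothetical. In this sketch a row starts a new island when its admission is more than 30 days after the previous discharge for the same ID.

```python
import sqlite3

# Rows from the question as (id, admission_date, disch_date, los, readmitted30days).
ADMISSIONS = [
    ("001", "2014-01-01", "2014-01-12", 11, 1),
    ("101", "2014-02-05", "2014-02-12", 7, 1),
    ("001", "2014-02-18", "2018-02-27", 9, 1),
    ("001", "2018-02-01", "2018-02-13", 12, 0),
    ("212", "2014-01-28", "2014-02-12", 15, 1),
    ("212", "2014-03-02", "2014-03-15", 13, 0),
    ("212", "2016-12-23", "2016-12-29", 4, 0),
    ("1011", "2017-06-10", "2017-06-21", 11, 0),
    ("401", "2018-01-01", "2018-01-11", 10, 0),
    ("401", "2018-10-01", "2018-10-10", 9, 0),
]

QUERY = """
with prev as (
    select stays.*,
           lag(disch_date) over (partition by id order by disch_date) as prev_disch
    from stays
),
start_points as (
    select prev.*,
           -- 1 when this admission is more than 30 days after the previous discharge
           -- (a new island starts); NULL otherwise, including each id's first row
           case when julianday(admission_date) - 30 > julianday(prev_disch)
                then 1 end as is_start
    from prev
),
groupings as (
    select start_points.*,
           -- running count of island starts = island number within each id
           count(is_start) over (partition by id order by disch_date
                                 rows between unbounded preceding and current row) as group_id
    from start_points
)
select id, group_id, sum(los) as total_los
from groupings
group by id, group_id
order by id, group_id
"""


def islands(rows):
    """Return [(id, group_id, total_los), ...] for each 30-day island."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("create table stays (id text, admission_date text, disch_date text,"
                    " los integer, readmitted30days integer)")
        con.executemany("insert into stays values (?, ?, ?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    for row in islands(ADMISSIONS):
        print(row)
```

With the question's data, ID 212 comes out as 28 and 4 and ID 401 as 10 and 9. ID 001 splits into 11 and 21 because its middle row has a discharge date of 2018-02-27 (which may be a typo), and that changes the DischDate ordering.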
If I understand correctly:
select Id, sum(LOS)
from tablename
where Readmitted30days = 1
group by Id
You want to use aggregation:
select id, sum(los)
from t
group by id
having max(Readmitted30days) = 1;
This filters after the aggregation so all los values are included in the sum.
EDIT:
I think I understand. Every time Readmitted30days = 0, you want a row in the result set that combines that row with the earlier rows back to the previous 0 row.
If that interpretation is correct, you can construct groups using a cumulative sum (walking backwards in time) and then aggregate:
select id, sum(los)
from (select t.*,
sum(1 - Readmitted30days) over (partition by id order by admissiondate desc) as grp
from t
) t
group by id, grp;
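A runnable sketch of the cumulative-sum grouping in SQLite, via Python. It assumes the running sum should walk backwards in time (order by admission_date desc), so that each Readmitted30days = 0 row closes a group together with the earlier rows back to the previous 0 row; that reading reproduces the 28 / 4 split for ID 212 in the desired output. The table and column names are hypothetical.

```python
import sqlite3

# Rows from the question as (id, admission_date, disch_date, los, readmitted30days).
ADMISSIONS = [
    ("001", "2014-01-01", "2014-01-12", 11, 1),
    ("101", "2014-02-05", "2014-02-12", 7, 1),
    ("001", "2014-02-18", "2018-02-27", 9, 1),
    ("001", "2018-02-01", "2018-02-13", 12, 0),
    ("212", "2014-01-28", "2014-02-12", 15, 1),
    ("212", "2014-03-02", "2014-03-15", 13, 0),
    ("212", "2016-12-23", "2016-12-29", 4, 0),
    ("1011", "2017-06-10", "2017-06-21", 11, 0),
    ("401", "2018-01-01", "2018-01-11", 10, 0),
    ("401", "2018-10-01", "2018-10-10", 9, 0),
]

QUERY = """
select id, sum(los) as total_los
from (select stays.*,
             -- running count of readmitted30days = 0 rows, newest stay first
             sum(1 - readmitted30days) over (partition by id
                                             order by admission_date desc) as grp
      from stays) s
group by id, grp
order by id, total_los desc
"""


def totals_by_group(rows):
    """Return [(id, total_los), ...], one row per cumulative-sum group."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("create table stays (id text, admission_date text, disch_date text,"
                    " los integer, readmitted30days integer)")
        con.executemany("insert into stays values (?, ?, ?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    for row in totals_by_group(ADMISSIONS):
        print(row)
```

ID 101 also appears (7) because nothing in this grouping filters it out. ID 001 totals 32, the sum of its LOS values (11 + 9 + 12).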
I have these two tables:
TIME (this table maps each time_id to details such as the hour, day, month and year)
|time_id|hour|day|month|year|
+-------+----+---+-----+----+
|1234   |1   |6  |9    |2013|
|1235   |2   |7  |9    |2013|
|1223   |2   |4  |8    |2014|
|1227   |2   |8  |8    |2014|
SUM_JOBS_PROCESSED (this table contains the time_id and the number of jobs processed for that time_id)
|time_id|sum_of_jobs_processed|
+-------+---------------------+
|1234   |5                    |
|1235   |6                    |
|1223   |4                    |
|1227   |4                    |
I am trying to write a query that displays something like this:
|month|year|sum_of_jobs_processed|
+-----+----+---------------------+
|9    |2013| 11                  |
|8    |2014| 8                   |
It should display the total number of jobs processed per month.
Could anyone please help me with this? I am able to find the total number of jobs processed per day, but not per month.
Not sure I fully understood what you're trying, but I think this query should give you the desired result:
SELECT t.month,
t.year,
SUM(s.sum_of_jobs_processed)
FROM bspm_dim_time t
JOIN bspm_sum_jobs_day s
ON t.time_id = s.time_id
GROUP BY t.month,
t.year
ORDER BY t.year,
t.month
Live DEMO.
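To sanity-check the join-then-group query, this sketch loads the two sample tables into SQLite from Python and runs the same SELECT, using the table names from the answer (bspm_dim_time, bspm_sum_jobs_day). The alias jobs for the sum is an addition for readability.

```python
import sqlite3

# Sample rows from the question: (time_id, hour, day, month, year) and (time_id, jobs).
TIME_ROWS = [
    (1234, 1, 6, 9, 2013),
    (1235, 2, 7, 9, 2013),
    (1223, 2, 4, 8, 2014),
    (1227, 2, 8, 8, 2014),
]
JOB_ROWS = [(1234, 5), (1235, 6), (1223, 4), (1227, 4)]

QUERY = """
select t.month, t.year, sum(s.sum_of_jobs_processed) as jobs
from bspm_dim_time t
join bspm_sum_jobs_day s on t.time_id = s.time_id
group by t.month, t.year
order by t.year, t.month
"""


def jobs_per_month(time_rows, job_rows):
    """Return [(month, year, total jobs), ...] ordered chronologically."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("create table bspm_dim_time (time_id integer, hour integer,"
                    " day integer, month integer, year integer)")
        con.execute("create table bspm_sum_jobs_day (time_id integer,"
                    " sum_of_jobs_processed integer)")
        con.executemany("insert into bspm_dim_time values (?, ?, ?, ?, ?)", time_rows)
        con.executemany("insert into bspm_sum_jobs_day values (?, ?)", job_rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    print(jobs_per_month(TIME_ROWS, JOB_ROWS))
```

Grouping by both month and year matters: grouping by month alone would merge, for example, September 2013 with September 2014.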
Try this:
SELECT month,
year,
sum(sum_of_jobs_processed)
FROM TIME
INNER JOIN SUM_JOBS_PROCESSED
ON TIME.time_id = SUM_JOBS_PROCESSED.time_id
GROUP BY month,
year
ORDER BY year,
month
Mark as answer if correct.
I have a database table containing two cost columns. I want to find the distinct costs across these two columns, and also count how many times each cost appears. The table may look like:
|id|cost1|cost2|
|1 |50 |60 |
|2 |20 |50 |
|3 |50 |70 |
|4 |20 |30 |
|5 |50 |60 |
In this case I want a result that is distinct across both columns, with the number of times each cost appears. So the result I would like is:
|distinctCost|count|
|20 |2 |
|30 |1 |
|50 |4 |
|60 |2 |
|70 |1 |
and ideally ordered by count, descending:
|distinctCost|count|
|50 |4 |
|60 |2 |
|20 |2 |
|70 |1 |
|30 |1 |
I can get the distinct values across both columns with something like:
select DISTINCT c FROM (SELECT cost1 AS c FROM my_costs UNION SELECT cost2 AS c FROM my_costs);
and I can get the count for each column separately with:
select cost1, count(*)
from my_costs
group by cost1
order by count(*) desc;
My problem is: how can I get the count across both columns? I am stuck on how to count over each individual column and then add the results up.
Any pointers would be appreciated.
I am using Oracle DB.
Thanks
By combining your two queries:
select cost, count(*)
from
(
SELECT id, cost1 AS cost FROM my_costs
UNION ALL
SELECT id, cost2 AS cost FROM my_costs
) v
group by cost
order by count(*) desc;
(If a row whose cost1 and cost2 are equal should be counted once rather than twice, change the UNION ALL to a UNION.)
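Here is a sketch that runs the UNION ALL query on the sample data in SQLite through Python, with a switch for UNION so the difference described in the parenthetical note can be seen. The extra row (6, 40, 40) used in the test is hypothetical, added only to create a row whose two costs are equal; ordering by cost descending as a tiebreaker is also an assumption that happens to reproduce the ordering asked for.

```python
import sqlite3

# Sample rows from the question as (id, cost1, cost2).
COSTS = [(1, 50, 60), (2, 20, 50), (3, 50, 70), (4, 20, 30), (5, 50, 60)]

QUERY_TEMPLATE = """
select cost, count(*) as n
from (select id, cost1 as cost from my_costs
      {set_op}
      select id, cost2 as cost from my_costs) v
group by cost
order by n desc, cost desc
"""


def count_costs(rows, set_op="union all"):
    """Count how often each cost appears across cost1 and cost2.

    With "union all" every (row, column) occurrence counts; with "union" the
    (id, cost) pairs are de-duplicated, so a row with cost1 == cost2 counts once.
    """
    if set_op not in ("union all", "union"):
        raise ValueError("set_op must be 'union all' or 'union'")
    con = sqlite3.connect(":memory:")
    try:
        con.execute("create table my_costs (id integer, cost1 integer, cost2 integer)")
        con.executemany("insert into my_costs values (?, ?, ?)", rows)
        return con.execute(QUERY_TEMPLATE.format(set_op=set_op)).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    print(count_costs(COSTS))
```

Keeping id in both branches is what makes the UNION variant safe: without it, UNION would collapse every repeated cost value to a single row, not just the within-row duplicates.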
You can use the UNPIVOT clause:
select *
from
(
SELECT cost , count(*) as num_of_costs
FROM my_costs
UNPIVOT
(
cost
FOR cost_num IN (cost1,cost2)
)
group by cost
)
order by num_of_costs desc;