Let's take the following sample table:
Person Quantity
A 1
B 2
C 3
D 4
E 5
Result should be:
PersonAggregate
1 (0+Quantity of PersonA)=sumA
3 (sumA+Quantity of PersonB)=sumB
6 (sumB+Quantity of PersonC)=sumC
10 (sumC+Quantity of PersonD)=sumD
15 (sumD+Quantity of PersonE)
Is it possible to get this result in one SQL query?
Most versions of SQL support cumulative sums as a window function:
select person, sum(quantity) over (order by person) as cumesum
from sample;
You can also do this with a correlated subquery:
select s.person,
(select sum(s2.quantity)
from sample s2
where s2.person <= s.person
) as cumesum
from sample s;
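To check the two queries above against the sample data, here is a minimal sketch that runs them through Python's built-in sqlite3 module. The choice of SQLite is an assumption made only to keep the example self-contained (window functions need SQLite 3.25 or newer); the SQL itself is the same as above.

```python
import sqlite3

# In-memory database holding the sample table from the question.
conn = sqlite3.connect(":memory:")
conn.execute("create table sample (person text, quantity integer)")
conn.executemany(
    "insert into sample values (?, ?)",
    [("A", 1), ("B", 2), ("C", 3), ("D", 4), ("E", 5)],
)

# Running total via a window function.
window_rows = conn.execute(
    "select person, sum(quantity) over (order by person) as cumesum "
    "from sample order by person"
).fetchall()

# Same running total via a correlated subquery.
subquery_rows = conn.execute(
    "select s.person, "
    "       (select sum(s2.quantity) from sample s2 "
    "        where s2.person <= s.person) as cumesum "
    "from sample s order by s.person"
).fetchall()

print(window_rows)
print(subquery_rows)
```

Both lists come out as A=1, B=3, C=6, D=10, E=15. On large tables the window version is usually the better choice, since the correlated subquery rescans the table for every row.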
This will obviously get the individual sums:
select person, sum(quantity)
from sample
group by person
order by person
I don't think your desired effect can be done in a set-based way. A procedural language with cursors, like T-SQL or PL/SQL, can do it easily.
I'd write a stored procedure and call it.
If the sample table has more than one row per person with multiple quantities that need to be summed you could use:
select curr.person, curr.sum_person + case when prev.person <> curr.person
then prev.sum_person
else 0 end as person_sum
from (select person, sum(quantity) as sum_person
from sample
group by person) curr
cross join (select person, sum(quantity) as sum_person
from sample
group by person) prev
where prev.person =
(select max(x.person) from sample x where x.person < curr.person)
or curr.person = (select min(person) from sample)
group by curr.person
Fiddle: http://sqlfiddle.com/#!2/7c3135/6/0
Output:
| PERSON | PERSON_SUM |
|--------|------------|
| A | 1 |
| B | 3 |
| C | 5 |
| D | 7 |
| E | 9 |
If there is only one row per person on the sample table, you could more simply use:
select curr.person, curr.quantity + case when prev.person <> curr.person
then prev.quantity
else 0 end as person_sum
from sample curr
cross join sample prev
where prev.person =
(select max(x.person) from sample x where x.person < curr.person)
or curr.person = (select min(person) from sample)
group by curr.person
Fiddle: http://sqlfiddle.com/#!2/7c3135/8/0
Output returned is the same, because in your example, there is only one row per person.
If using Oracle, SQL Server, or a database that supports analytic functions, you could use:
If sample has one row per person:
select person,
sum(quantity) over(order by person rows between 1 preceding and current row) as your_sum
from sample
order by person
Fiddle: http://sqlfiddle.com/#!4/82e6f/1/0
If sample has 2+ rows per person:
select person,
sum(sum_person) over(order by person rows between 1 preceding and current row) as your_sum
from (select person, sum(quantity) as sum_person
from sample
group by person) x
order by person
Fiddle: http://sqlfiddle.com/#!4/82e6f/4/0
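The frame clause decides how many rows each sum covers. As a small sketch (run through Python's sqlite3 module purely for convenience, which is an assumption and not the Oracle setup used in the fiddles), this compares the `rows between 1 preceding and current row` frame used above with an `unbounded preceding` frame. The helper name `frame_sums` is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table sample (person text, quantity integer)")
conn.executemany(
    "insert into sample values (?, ?)",
    [("A", 1), ("B", 2), ("C", 3), ("D", 4), ("E", 5)],
)

def frame_sums(frame):
    """Return sum(quantity) per row, ordered by person, for a given frame clause."""
    sql = (
        "select sum(quantity) over (order by person " + frame + ") "
        "from sample order by person"
    )
    return [row[0] for row in conn.execute(sql)]

# Current row plus the one before it.
pairwise = frame_sums("rows between 1 preceding and current row")
# Current row plus every earlier row, i.e. a running total.
running = frame_sums("rows between unbounded preceding and current row")

print(pairwise)
print(running)
```

The first frame gives 1, 3, 5, 7, 9 (the output shown earlier), while the second gives 1, 3, 6, 10, 15, which is the aggregate the question lists.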
Related
I have index_date information for IDs and I want to extract the baseline (information between index_date and index_date minus 6 months). I want to do this without using a Cartesian product.
Total Table
ID index_date detail
1 01Jan2012 xyz
1 01Dec2011 pqr
1 01Nov2010 pqr
2 26Feb2013 abc
3 02Mar2013 abc
3 02Feb2013 ert
3 02Jan2013 tyu
4 07May2015 rts
I have a table A extracted from Total which has the index_dates:
ID index_date index_detail
1 01Jan2012 xyz
2 26Feb2013 abc
3 02Mar2013 abc
4 07May2015 rts
I want to extract the baseline period data for the IDs in A from the Total table.
Table want :
ID date index_date detail index_detail
1 01Jan2012 01Jan2012 xyz xyz
1 01Dec2011 01Jan2012 pqr xyz
2 26Feb2013 26Feb2013 abc abc
3 02Mar2013 02Mar2013 abc abc
3 02Feb2013 02Mar2013 ert abc
3 02Jan2013 02Mar2013 tyu abc
4 07May2015 07May2015 rts rts
code used :
create table want as
select a.* , b.date,b.detail
from table_a as a
right join
Total as b
on a.id = b.id where
a.index_date > b.date
AND b.date >= add_months( a.index_date, -6)
;
But this requires a Cartesian product. Is there a way to do this without one?
DBMS - Hive
Sorry, I don't know it.
I'll give the solution in plain SQL for MySQL 8+. Maybe you'll find a way to convert it to Hive syntax.
SELECT id,
index_date date,
FIRST_VALUE(index_date) OVER (PARTITION BY ID ORDER BY STR_TO_DATE(index_date, '%d%b%Y') DESC) index_date,
detail,
FIRST_VALUE(detail) OVER (PARTITION BY ID ORDER BY STR_TO_DATE(index_date, '%d%b%Y') DESC) index_detail
FROM test
ORDER BY 1 ASC, 2 DESC
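For a runnable version of the same idea, here is a sketch using Python's sqlite3 module instead of MySQL or Hive. Several things are assumptions for the example: the dates are stored as ISO strings in a column named `dt`, the index date is taken to be the latest date per ID (true for the sample data), and a six-month filter is added on top of the FIRST_VALUE logic so the output matches the wanted table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table total (id integer, dt text, detail text)")
conn.executemany(
    "insert into total values (?, ?, ?)",
    [
        (1, "2012-01-01", "xyz"),
        (1, "2011-12-01", "pqr"),
        (1, "2010-11-01", "pqr"),
        (2, "2013-02-26", "abc"),
        (3, "2013-03-02", "abc"),
        (3, "2013-02-02", "ert"),
        (3, "2013-01-02", "tyu"),
        (4, "2015-05-07", "rts"),
    ],
)

# first_value(...) over the rows of each id, newest first, picks the index row.
# The outer filter keeps only rows within six months before that index date.
sql = """
select id, dt, index_date, detail, index_detail
from (select id, dt, detail,
             first_value(dt) over (partition by id order by dt desc) as index_date,
             first_value(detail) over (partition by id order by dt desc) as index_detail
      from total) t
where dt >= date(index_date, '-6 months')
order by id, dt desc
"""
rows = conn.execute(sql).fetchall()
for row in rows:
    print(row)
```

This reads the table once and needs no join, so no Cartesian product is involved. The 01Nov2010 row for ID 1 is dropped because it falls outside the six-month window.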
I would recommend three steps:
Convert the date to a number.
Find the minimum date in a six month period.
Get the first value in that group.
This looks like:
select t.*, t2.index_date, t2.detail
from (select t.*,
min(index_date) over (partition by id
order by months
range between 6 preceding and current row
) as sixmonth_date
from (select t.*,
year(index_date) * 12 + month(index_date) as months
from total t
) t
) t left join
total t2
on t2.id = t.id and t2.index_date = t.sixmonth_date;
This is marginally simpler if first_value() accepts range window frames -- but I'm not sure if it does. It is worth trying, though:
select t.*,
min(index_date) over (partition by id
order by months
range between 6 preceding and current row
) as sixmonth_date,
first_value(detail) over (partition by id
order by months
range between 6 preceding and current row
) as sixmonth_value
from (select t.*,
year(index_date) * 12 + month(index_date) as months
from total t
) t
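To see what the range frame does, here is a sketch of the month-number trick using Python's sqlite3 module (an assumption: SQLite 3.28 or newer stands in for Hive, ISO date strings stand in for real dates, and the column is called `dt`). Only IDs 1 and 3 from the sample are loaded to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table total (id integer, dt text, detail text)")
conn.executemany(
    "insert into total values (?, ?, ?)",
    [
        (1, "2012-01-01", "xyz"),
        (1, "2011-12-01", "pqr"),
        (1, "2010-11-01", "pqr"),
        (3, "2013-03-02", "abc"),
        (3, "2013-02-02", "ert"),
        (3, "2013-01-02", "tyu"),
    ],
)

# months = year * 12 + month turns each date into a plain month counter,
# so "range between 6 preceding" means "up to six month numbers back".
sql = """
select id, dt,
       min(dt) over (partition by id
                     order by months
                     range between 6 preceding and current row) as sixmonth_date
from (select t.*,
             cast(strftime('%Y', dt) as integer) * 12
               + cast(strftime('%m', dt) as integer) as months
      from total t) t
order by id, dt
"""
rows = conn.execute(sql).fetchall()
for row in rows:
    print(row)
```

For ID 1, the 2012-01-01 row picks up 2011-12-01 as its earliest date in the window, while 2010-11-01 is too far back to be included. Note that the counter ignores the day of the month, so the window is six calendar months rather than an exact day count.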
I'm working with a medical claim table in pyspark and I want to return only userid's that have at least 2 claim_ids. My table looks something like this:
claim_id | userid | diagnosis_type | claim_type
__________________________________________________
1 1 C100 M
2 1 C100a M
3 2 D50 F
5 3 G200 M
6 3 C100 M
7 4 C100a M
8 4 D50 F
9 4 A25 F
From this example, I would want to return only userids 1, 3, and 4. Currently I'm building a temp table to count all of the distinct claim_ids per user:
create table temp.claim_count as
select distinct userid, count(distinct claim_id) as claims
from medical_claims
group by userid
and then pulling from this table when the number of claim_id >1
select distinct userid
from medical_claims
where userid in (
select distinct userid
from temp.claim_count
where claims>1)
Is there a better / more efficient way of doing this?
If you want only the ids, then use group by:
select userid, count(*) as claims
from medical_claims
group by userid
having count(*) > 1;
If you want the original rows, then use window functions:
select mc.*
from (select mc.*, count(*) over (partition by userid) as num_claims
from medical_claims mc
) mc
where num_claims > 1;
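Both queries can be tried against the sample claims without a temp table. This is a sketch using Python's sqlite3 module rather than pyspark (an assumption, just to keep it self-contained); the same SQL strings should work unchanged with `spark.sql(...)` on a registered table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table medical_claims "
    "(claim_id integer, userid integer, diagnosis_type text, claim_type text)"
)
conn.executemany(
    "insert into medical_claims values (?, ?, ?, ?)",
    [
        (1, 1, "C100", "M"), (2, 1, "C100a", "M"), (3, 2, "D50", "F"),
        (5, 3, "G200", "M"), (6, 3, "C100", "M"), (7, 4, "C100a", "M"),
        (8, 4, "D50", "F"), (9, 4, "A25", "F"),
    ],
)

# Only the ids: group and filter on the group size.
user_ids = [
    row[0]
    for row in conn.execute(
        "select userid from medical_claims "
        "group by userid having count(*) > 1 order by userid"
    )
]

# The original rows: attach the per-user count to each row, then filter.
claim_ids = [
    row[0]
    for row in conn.execute(
        "select mc.claim_id "
        "from (select mc.*, count(*) over (partition by userid) as num_claims "
        "      from medical_claims mc) mc "
        "where num_claims > 1 order by mc.claim_id"
    )
]

print(user_ids)
print(claim_ids)
```

This returns userids 1, 3 and 4, and every claim except claim 3. If a claim_id can appear on more than one row, `count(distinct claim_id)` in the HAVING clause would be the safer test.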
Below is the table:
Name | Hike% | Month
------------------------
A 7 1
A 6 2
A 8 3
b 4 1
b 7 2
b 7 3
Result should be:
Name | Hike% | Month
------------------------
A 8 3
b 7 2
Here is one way of doing this:
SELECT Name, [Hike%], Month
FROM
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY Name ORDER BY [Hike%] DESC, Month) rn
FROM yourTable
) t
WHERE rn = 1
ORDER BY Name;
If you instead want to return multiple records per name, in the case where two or more records are tied for the greatest hike%, then replace ROW_NUMBER with RANK and drop Month from the ORDER BY (with Month as a tie-breaker, RANK never sees a tie).
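Here is a small sketch of the difference, run through Python's sqlite3 module (an assumption; the original answer is written for SQL Server). The table name `hikes`, the column name `hike_pct` and the helper `top_rows` are made up for the example, since `Hike%` needs quoting in most databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table hikes (name text, hike_pct integer, month integer)")
conn.executemany(
    "insert into hikes values (?, ?, ?)",
    [("A", 7, 1), ("A", 6, 2), ("A", 8, 3), ("b", 4, 1), ("b", 7, 2), ("b", 7, 3)],
)

def top_rows(ranking):
    """Return the rows ranked first within each name for a given ranking expression."""
    sql = (
        "select name, hike_pct, month from ("
        "  select *, " + ranking + " as rn from hikes"
        ") t where rn = 1 order by name, month"
    )
    return conn.execute(sql).fetchall()

# Exactly one row per name; ties on hike_pct go to the earliest month.
one_per_name = top_rows(
    "row_number() over (partition by name order by hike_pct desc, month)"
)
# Every row tied for the highest hike_pct; month is left out of the ordering.
with_ties = top_rows("rank() over (partition by name order by hike_pct desc)")

print(one_per_name)
print(with_ties)
```

ROW_NUMBER yields (A, 8, 3) and (b, 7, 2), matching the expected result. RANK also returns (b, 7, 3), because months 2 and 3 tie at 7.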
Use a correlated subquery:
select Name,min(Hike) as Hike,min(Month) as Month
from
(
select * from tablename a
where Hike in (select max(Hike) from tablename b where a.name=b.name)
)A group by Name
You can use something similar to the below:
SELECT Name, MAX(Hike), Month
FROM table
GROUP BY Name, Month
Hope this helps :)
How do I compute the average number of times a given number appears in a database?
id | ...
----------
1 | ...
5 | ...
2 | ...
3 | ...
3 | ...
1 | ...
6 | ...
4 | ...
3 | ...
...| ...
id corresponds to the id of the user. Perhaps the table is for customer orders or donations made by a user. For the above table:
id 1 = 2 entries
id 2 = 1 entry
id 3 = 3 entries
id 4 = 1 entry
id 5 = 1 entry
id 6 = 1 entry
Average = (2+1+3+1+1+1)/6 = 1.5 entries per user
The average number of orders/donations made per user is 1.5 to give an example.
I could do something like the below:
$getTotalEntries = $db->prepare("
SELECT *
FROM table
");
$getTotalEntries->execute();
$totalEntries = $getTotalEntries->rowCount();
$getGroupedEntries = $db->prepare("
SELECT *
FROM table
GROUP BY id
");
$getGroupedEntries->execute();
$groupedEntries = $getGroupedEntries->rowCount();
$average = $totalEntries/$groupedEntries;
I'm hoping for a single SQL request, however. Incidentally, the below gives me the number of occurrences of a given id, but I cannot AVG() them.
$getAverageEntries = $db->prepare("
SELECT id, COUNT(*)
FROM table
GROUP BY id
"); // works, returns the 2,1,3,1,... from before
$getAverageEntries = $db->prepare("
SELECT AVG(COUNT(*))
FROM table
GROUP BY id
"); // won't find aggregate count
How about this?
select count(id) / count(distinct id) as avgEntriesPerUser
from table t;
The only issue with this would be a NULL value for id. If this occurred (and I find it highly unlikely for a column named id), then the above ignores those rows entirely. It can be modified to take this situation into account.
select avg(a.entryCount)
from (
select id, count(id) as entryCount
from <tablename>
group by id
) a;
In SQL you need to do a count to get the number of entries per id, then average those counts:
select avg(entries) from (
select id, count(id) as entries
from tableName
group by id) t
You mean?
select avg(countPerID) from (
select id, count(*) as countPerID from table group by id) x
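The ratio approach and the nested-aggregate approach can be compared on the sample ids. This sketch uses Python's sqlite3 module instead of PHP/MySQL, which is an assumption that happens to expose a portability catch: in SQLite (and several other databases) dividing two integers truncates, whereas MySQL's `/` returns a decimal. The table name `entries` is made up, since `table` is a reserved word.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table entries (id integer)")
conn.executemany(
    "insert into entries values (?)",
    [(i,) for i in (1, 5, 2, 3, 3, 1, 6, 4, 3)],
)

# Total rows divided by distinct ids; 1.0 * forces floating-point division.
ratio = conn.execute(
    "select 1.0 * count(id) / count(distinct id) from entries"
).fetchone()[0]

# Average of the per-id counts, using a derived table.
nested = conn.execute(
    "select avg(entry_count) "
    "from (select id, count(*) as entry_count from entries group by id) x"
).fetchone()[0]

# Without the 1.0 factor SQLite performs integer division.
integer_ratio = conn.execute(
    "select count(id) / count(distinct id) from entries"
).fetchone()[0]

print(ratio, nested, integer_ratio)
```

Both `ratio` and `nested` come out as 1.5 for the nine sample rows, while the plain integer division gives 1. The derived table needs an alias (`x` here) in most databases.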
I have the following code
select ID, count(*) from
( select ID, service type from database
group by 1,2) suba
group by 1
having count (*) > 1
And I get a table where I see the IDs and a count of changes, similar to this:
ID | Count(*)
5675 | 2
5695 | 3
5855 | 2
5625 | 4
5725 | 3
Can someone explain to me how to count all the count(*) values into groups such that I get a table similar to...
count (*) | number
2 | 2
3 | 2
4 | 1
and so forth. Can someone also explain to me what suba means?
MY NEWEST CODE:
select suba.id, count(*) from
( select id, service_type from table_name
group by 1,2) as suba
group by 1
having count (*) > 1
Haven't tried it, but I think this should work
select NoOfChanges, count (*) from
(
select suba.id, count(*) as NoOfChanges from
( select id, service_type from table_name
group by 1,2) as suba
group by 1
having count (*) > 1
)
subtableb
group by NoOfChanges
You can think of that as
select NoOfChanges, count (*) from subtableb
group by NoOfChanges
but subtableb isn't a real table, but the results from your previous query
suba is the alias of the subquery. Every table or subquery needs a unique name or an alias so you can refer to it in other parts of the query (and disambiguate). Note that the AS keyword is optional: your first query omits it between the closing parenthesis and "suba".
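To try the nested query end to end, here is a sketch using Python's sqlite3 module. The data is invented for the example: the service type names (`type0`, `type1`, ...) and the extra ID 5999 are hypothetical, chosen so the per-ID counts match the table in the question. The positional `group by 1,2` is spelled out with column names to keep the sketch portable.

```python
import sqlite3

# Hypothetical data: id -> number of distinct service types it has used.
type_counts = {5675: 2, 5695: 3, 5855: 2, 5625: 4, 5725: 3, 5999: 1}
rows = [(id_, f"type{n}") for id_, k in type_counts.items() for n in range(k)]
rows += rows[:3]  # repeated rows, which the innermost group by collapses

conn = sqlite3.connect(":memory:")
conn.execute("create table table_name (id integer, service_type text)")
conn.executemany("insert into table_name values (?, ?)", rows)

sql = """
select no_of_changes, count(*) as number
from (select suba.id, count(*) as no_of_changes
      from (select id, service_type
            from table_name
            group by id, service_type) as suba   -- one row per (id, type)
      group by suba.id
      having count(*) > 1) as subtableb          -- one row per id with 2+ types
group by no_of_changes
order by no_of_changes
"""
histogram = conn.execute(sql).fetchall()
print(histogram)
```

Each level of nesting is just a named intermediate result: `suba` removes duplicate (id, service_type) pairs, `subtableb` counts types per ID, and the outer query counts how many IDs share each count. ID 5999 has a single type and is filtered out by the HAVING clause.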