I have a table that contains info about customers and the amount they purchased of each type of food. I want to create a new column holding the most frequently purchased type(s) of food. Is there an efficient way to do this?
I tried using CASE WHEN with one-to-one comparisons, but it got very tedious.
Sample data:
Cust_ID  apple_type1  apple_type2  apple_type3  apple_type4  apple_type5  apple_type6
1        2            0            0            3            6            1
2        0            0            0            1            0            1
3        4            2            1            1            0            1
4        5            5            5            0            0            0
5        0            0            0            0            0            0
--WANT
Cust_ID  freq_apple_type_buy
1        type5
2        type4 and type6
3        type1
4        type1 and type2 and type3
5        unknown
Consider the approach below:
select Cust_ID,
  if(count(1) = any_value(all_count), 'unknown', string_agg(type, ' and ')) freq_apple_type_buy
from (
  select *, count(1) over(partition by Cust_ID) all_count
  from (
    select Cust_ID,
      replace(arr[offset(0)], 'apple_', '') type,
      cast(arr[offset(1)] as int64) value
    from data t,
    unnest(split(translate(to_json_string((select as struct * except(Cust_ID) from unnest([t]))), '{}"', ''))) kv,
    unnest([struct(split(kv, ':') as arr)])
  )
  where true qualify 1 = rank() over(partition by Cust_ID order by value desc)
)
group by Cust_ID
Applied to the sample data in your question, the output matches the expected result.
This uses UNPIVOT to turn your columns into rows. It then uses RANK() to assign each row a rank, which means that rows with the same quantity share the same rank.
Finally, it selects only the products with rank = 1 (possibly multiple rows, if several products are tied for first place).
WITH
  normalised_and_ranked AS
(
  SELECT
    cust_id,
    product,
    qty,
    RANK() OVER (PARTITION BY cust_id ORDER BY qty DESC) AS product_rank,
    ROW_NUMBER() OVER (PARTITION BY cust_id ORDER BY qty DESC) AS product_row
  FROM
    yourData
  UNPIVOT(
    qty FOR product IN (apple_type1, apple_type2, apple_type3, apple_type4, apple_type5, apple_type6)
  )
)
SELECT
  cust_id,
  CASE WHEN qty = 0 THEN NULL ELSE product END AS product,
  CASE WHEN qty = 0 THEN NULL ELSE qty END AS qty
FROM
  normalised_and_ranked
WHERE
  (product_rank = 1 AND qty > 0)
  OR
  (product_row = 1)
Edit: added a fudge to ensure that a row of NULLs is returned if all quantities are 0.
(Normally I'd just not return a row for such customers.)
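For anyone who wants to check the tie handling outside the database, here is a minimal Python sketch of the same unpivot-then-rank idea, run against the sample data from the question. The names `purchases` and `most_frequent_types` are made up for illustration, and returning 'unknown' when every quantity is 0 is an assumption taken from the desired output, not from the query above (which returns NULLs in that case).

```python
# Sample rows from the question: Cust_ID -> quantity per apple type.
purchases = {
    1: {"type1": 2, "type2": 0, "type3": 0, "type4": 3, "type5": 6, "type6": 1},
    2: {"type1": 0, "type2": 0, "type3": 0, "type4": 1, "type5": 0, "type6": 1},
    3: {"type1": 4, "type2": 2, "type3": 1, "type4": 1, "type5": 0, "type6": 1},
    4: {"type1": 5, "type2": 5, "type3": 5, "type4": 0, "type5": 0, "type6": 0},
    5: {"type1": 0, "type2": 0, "type3": 0, "type4": 0, "type5": 0, "type6": 0},
}


def most_frequent_types(counts):
    """Return every type tied for the highest quantity, joined with ' and '."""
    top = max(counts.values())
    if top == 0:
        # Assumption: a customer with no purchases at all is reported as 'unknown'.
        return "unknown"
    # Equivalent of keeping the rows where RANK() = 1: all types tied at the top.
    winners = [name for name, qty in counts.items() if qty == top]
    return " and ".join(winners)


result = {cust_id: most_frequent_types(counts) for cust_id, counts in purchases.items()}
for cust_id, label in result.items():
    print(cust_id, label)
```

The point of the sketch is only that "rank 1" can mean several rows per customer, which is why the SQL needs an aggregation or a multi-row result rather than a single CASE expression.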
I've imported data ("Amount" and "Narration") from a spreadsheet into a table and need help with a query to group consecutive records according to their "Narration", for example:
Expected output:
line_no amount narration calc_group <-Not part of table
----------------------------------------
1 10 Reason 1 1
2 -10 Reason 1 1
3 5 Reason 2 2
4 5 Reason 2 2
5 -10 Reason 2 2
6 -8 Reason 1 3
7 8 Reason 1 3
8 11 Reason 1 3
9 99 Reason 3 4
10 -99 Reason 3 4
I've tried some analytical functions:
select line_no, amount, narration,
first_value (line_no) over
(partition by narration order by line_no) "calc_group"
from test
order by line_no
But that does not work, because the narration of lines 6 to 8 is the same as that of lines 1 and 2, so they end up in the same group:
line_no amount narration calc_group
----------------------------------------
1 10 Reason 1 1
2 -10 Reason 1 1
3 5 Reason 2 3
4 5 Reason 2 3
5 -10 Reason 2 3
6 -8 Reason 1 1
7 8 Reason 1 1
8 11 Reason 1 1
9 99 Reason 3 4
10 -99 Reason 3 4
UPDATE
I've managed to do it using the LAG analytic function and a sequence. It's not very elegant, but it works. There should be a better way; please comment!
create or replace function get_next_test_seq
return number
as
begin
return test_seq.nextval;
end get_next_test_seq;
create or replace function get_curr_test_seq
return number
as
begin
return test_seq.currval;
end get_curr_test_seq;
update test
set group_no =
(with cte1
as (select line_no, amount, narration,
lag (narration) over (order by line_no) prev_narration, group_no
from test
order by line_no),
cte2
as (select line_no, amount, narration, group_no,
case when prev_narration is null or prev_narration <> narration then get_next_test_seq else get_curr_test_seq end new_group_no
from cte1)
select new_group_no
from cte2
where cte2.line_no = test.line_no);
UPDATE 2
I'm satisfied with the better accepted answer. Thanks kordiko!
Try this query:
SELECT line_no,
       amount,
       narration,
       SUM( x ) OVER ( ORDER BY line_no
                       ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
                     ) as calc_group
FROM (
  SELECT t.*,
         CASE lag( narration ) OVER (order by line_no )
           WHEN narration THEN 0
           ELSE 1 END x
  FROM test t
)
ORDER BY line_no
demo --> http://www.sqlfiddle.com/#!4/6d7aa/9
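The query above works in two steps: flag each row with 1 when its narration differs from the previous row (0 otherwise), then take a running sum of those flags. A rough Python sketch of the same two steps, using the rows from the question (`assign_groups` is a made-up name for illustration):

```python
# (line_no, amount, narration) rows from the question.
rows = [
    (1, 10, "Reason 1"),
    (2, -10, "Reason 1"),
    (3, 5, "Reason 2"),
    (4, 5, "Reason 2"),
    (5, -10, "Reason 2"),
    (6, -8, "Reason 1"),
    (7, 8, "Reason 1"),
    (8, 11, "Reason 1"),
    (9, 99, "Reason 3"),
    (10, -99, "Reason 3"),
]


def assign_groups(rows):
    """Number consecutive runs of equal narration, ordered by line_no."""
    grouped = []
    group_no = 0
    previous = None  # plays the role of lag(narration); None for the first row
    for line_no, amount, narration in sorted(rows):
        x = 0 if narration == previous else 1  # the CASE ... WHEN narration THEN 0 ELSE 1 flag
        group_no += x                          # the running SUM(x) OVER (ORDER BY line_no)
        grouped.append((line_no, amount, narration, group_no))
        previous = narration
    return grouped


calc = assign_groups(rows)
for row in calc:
    print(row)
```

Because the flag only looks at the immediately preceding row, lines 6 to 8 get a new group even though "Reason 1" already appeared in lines 1 and 2.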
I have two tables:
Table_A
ID PostId Item Stock Price
1 1 A 30 10
2 1 B 40 20
3 2 A 50 5
4 3 A 50 25
Table_B
ID PostId Item_ID Sold Price
1 1 1 2 20
2 1 2 2 40
3 1 1 1 10
4 2 3 3 15
5 2 3 1 5
I want to query these two tables by the same 'PostId', applying COUNT and SUM to some fields grouped by 'PostId'. The expected output would look like this:
Output
ID PostId Total Item Total Stock Total Buyer(s) Total Sold Total Price
1 1 2 70 3 5 70
I've tried a JOIN, but the result is still miscalculated:
SELECT Table_A.PostId AS PostId, COUNT(Table_A.Item) AS Total_Item, SUM(Table_A.stock) AS Total_Stock, COUNT(Table_B.Item_ID) AS total_buyer, SUM( Table_B.Sold ) AS TotalSold, SUM( Table_B.Price ) AS Total_Price
FROM Table_A
LEFT JOIN Table_B
ON Table_A.PostId = Table_B.PostId
WHERE Table_A.PostId = '1'
GROUP BY Table_A.PostId
LIMIT 0 , 30
Any suggestions for this query problem? Thank you.
SELECT Table_B.PostId AS PostId,
MIN(Table_A.Total_Item) AS Total_Item,
MIN(Table_A.Total_Stock) AS Total_Stock,
COUNT(Table_B.Item_ID) AS total_buyer,
SUM( Table_B.Sold ) AS TotalSold,
SUM( Table_B.Price ) AS Total_Price
FROM Table_B
LEFT JOIN
(
SELECT PostId,
COUNT(Item) AS Total_Item,
SUM(stock) AS Total_Stock
FROM
Table_A
GROUP BY PostId
) Table_A
ON Table_B.PostId=Table_A.PostId
WHERE Table_B.PostId = '1'
GROUP BY Table_B.PostId
LIMIT 0 , 30
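To see why the original JOIN miscounts and why aggregating Table_A first helps, here is a small Python sketch on the sample data. The function names are made up for illustration. It joins every Table_A row to every Table_B row with the same PostId, as the original query does, and compares that with aggregating each table separately.

```python
# (ID, PostId, Item, Stock, Price)
table_a = [(1, 1, "A", 30, 10), (2, 1, "B", 40, 20), (3, 2, "A", 50, 5), (4, 3, "A", 50, 25)]
# (ID, PostId, Item_ID, Sold, Price)
table_b = [(1, 1, 1, 2, 20), (2, 1, 2, 2, 40), (3, 1, 1, 1, 10), (4, 2, 3, 3, 15), (5, 2, 3, 1, 5)]


def naive_join_totals(post_id):
    """Aggregate over the joined rows: each A row is repeated once per matching B row."""
    joined = [(a, b) for a in table_a for b in table_b
              if a[1] == post_id and b[1] == post_id]
    return {
        "total_item": len(joined),
        "total_stock": sum(a[3] for a, _ in joined),
        "total_buyer": len(joined),
        "total_sold": sum(b[3] for _, b in joined),
        "total_price": sum(b[4] for _, b in joined),
    }


def pre_aggregated_totals(post_id):
    """Aggregate each table on its own, so no row is counted more than once."""
    a_rows = [a for a in table_a if a[1] == post_id]
    b_rows = [b for b in table_b if b[1] == post_id]
    return {
        "total_item": len(a_rows),
        "total_stock": sum(a[3] for a in a_rows),
        "total_buyer": len(b_rows),
        "total_sold": sum(b[3] for b in b_rows),
        "total_price": sum(b[4] for b in b_rows),
    }


naive = naive_join_totals(1)
correct = pre_aggregated_totals(1)
print(naive)
print(correct)
```

For PostId 1 the join produces 2 x 3 = 6 rows, so every Table_A value is counted three times and every Table_B value twice. Collapsing Table_A to one row per PostId before joining, as the query above does, removes that multiplication.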
The closest thing I found by searching this site is this:
Inventory Average Cost Calculation in SQL
But unfortunately it was Oracle-specific, using the MODEL clause.
So let's begin.
There are two tables:
- one that holds inventory transactions, and
- one that holds the latest inventory valuation.
I am trying to build an inventory valuation report as of a certain date, using the average costing method.
Doing it the usual way, calculating from the beginning up to that date, gives a response time that grows with the amount of data.
Imagine calculating over five years' worth of data (and thousands of different inventory items).
It would take a considerable amount of time (and my company is not Silicon Valley grade, meaning a 2-core CPU and only 8 GB of RAM).
So I am calculating backwards instead: from the latest (current) valuation back to that specific date.
(The accounting department checks the data every month, so the calculation only ever deals with one month's worth of data,
which means consistent, unchanging performance.)
I have merged the two tables into one in the script below:
create table test3 ( rn integer, amt numeric, qty integer, oqty integer);
insert into test3 (rn,amt,qty,oqty) values (0,2260038.16765793,8,0);
insert into test3 (rn,amt,qty,oqty) values (1,1647727.2727,3,0);
insert into test3 (rn,amt,qty,oqty) values (2,2489654.75326715,0,1);
insert into test3 (rn,amt,qty,oqty) values (3,2489654.75326715,0,1);
insert into test3 (rn,amt,qty,oqty) values (4,1875443.6364,1,0);
insert into test3 (rn,amt,qty,oqty) values (5,1647727.2727,3,0);
insert into test3 (rn,amt,qty,oqty) values (6,3012987.01302857,0,1);
insert into test3 (rn,amt,qty,oqty) values (7,3012987.01302857,0,1);
select * from test3; (already sorted desc so rn=1 is the newest transaction)
rn amt qty oqty
0 2260038.168 8 0 --> this is the current average
1 1647727.273 3 0
2 2489654.753 0 1
3 2489654.753 0 1
4 1875443.636 1 0
5 1647727.273 3 0
6 3012987.013 0 1
7 3012987.013 0 1
with recursive
runsum (id,amt,qty,oqty,sqty,avg) as
(select data.id, data.amt, data.qty, data.oqty, data.sqty, data.avg
from (
select rn as id,amt,qty, oqty,
sum(case when rn=0 then qty else
case when oqty=0 then qty*-1
else oqty end end) over (order by rn) as sqty, lag(amt) over (order by rn) as avg
from test3 ) data
),
trans (id,amt,qty,oqty,sqty,prevavg,avg) as
(select id,amt,qty,oqty, sqty,avg,avg
from runsum
union
select runsum.id,trans.amt,trans.qty, trans.oqty, trans.sqty, lag(trans.avg) over (order by 1),
case when runsum.sqty=0 then runsum.amt else
((trans.prevavg*(runsum.sqty+trans.qty))-(runsum.amt*trans.qty)+(trans.prevavg*trans.oqty))/(runsum.sqty+trans.oqty)
end
from runsum join trans using (id))
select *
from trans
where prevavg is null and avg is not null
order by id;
The result is supposed to be like this
rn amt qty oqty sum avg
1 1647727.273 3 0 5 2627424.705
2 2489654.753 0 1 6 2627424.705
3 2489654.753 0 1 7 2627424.705
4 1875443.636 1 0 6 2752754.883
5 1647727.273 3 0 3 3857782.493
6 3012987.013 0 1 4 3857782.493
7 3012987.013 0 1 5 3857782.493
but instead I get this
id amt qty oqty sqty avg
1 1647727.273 3 0 5 2627424.705
2 2489654.753 0 1 6 2627424.705
3 2489654.753 0 1 7 2627424.705
5 1647727.273 3 0 3 3607122.137 --> id=4 is missing, thus screwing up the calculation, and id=6 in turn disappears too
7 3012987.013 0 1 5 3607122.137
I am flabbergasted.
Where is the mistake?
Thank you for your kind help.
EDITED
Average Costing Method backtracking ( given current avg calculate last transaction avg, and so on until nth transactions )
Avg(n) = ((Avg(n-1) * (Cum Qty(n) + In Qty(n))) - (In Amount(n) * In Qty(n)) + (Avg(n-1) * Out Qty(n))) / (Cum Qty(n) + Out Qty(n))
When backtracking, the cumulative qty is reduced by "in" quantities and increased by "out" quantities.
So if the current qty is 8 and the preceding transaction has an in qty of 3, the cumulative qty for that transaction is 5.
To calculate the average for the transaction before the latest one, we plug the current average into that transaction's calculation.
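To make the backtracking rule concrete, here is a rough Python sketch that applies it row by row to the test3 data, using the same arithmetic as the CASE expression in the query (with the cumulative qty plus the out qty in the denominator). The name `backtrack_averages` is made up for illustration; this is a plain loop, not a translation of the recursive CTE.

```python
# (rn, amt, qty, oqty) rows from test3; rn = 0 holds the current average and quantity.
transactions = [
    (0, 2260038.16765793, 8, 0),
    (1, 1647727.2727, 3, 0),
    (2, 2489654.75326715, 0, 1),
    (3, 2489654.75326715, 0, 1),
    (4, 1875443.6364, 1, 0),
    (5, 1647727.2727, 3, 0),
    (6, 3012987.01302857, 0, 1),
    (7, 3012987.01302857, 0, 1),
]


def backtrack_averages(transactions):
    """Walk from the current average back through older transactions."""
    rows = sorted(transactions)
    _, prev_avg, cum_qty, _ = rows[0]  # starting point: current average and current qty
    results = []
    for rn, amt, qty, oqty in rows[1:]:
        # Backtracking: an "in" lowers the cumulative qty, an "out" raises it.
        cum_qty += oqty if oqty != 0 else -qty
        if cum_qty == 0:
            avg = amt  # same fallback as the CASE expression in the query
        else:
            avg = (prev_avg * (cum_qty + qty) - amt * qty + prev_avg * oqty) / (cum_qty + oqty)
        results.append((rn, cum_qty, avg))
        prev_avg = avg  # each row feeds its average into the next older row
    return results


for rn, cum_qty, avg in backtrack_averages(transactions):
    print(rn, cum_qty, round(avg, 3))
```

Each row depends on the average computed for the row before it, so the calculation is inherently sequential. That is the dependency the recursive query has to reproduce.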
CURRENT ANSWER, WITH @kordirko's HELP
with recursive
runsum (id,amt,qty,oqty,sqty,avg) as
(select data.id, data.amt, data.qty, data.oqty, data.sqty, data.avg
from (
select rn as id,amt,qty, oqty,
sum(case when rn=0 then qty else
case when oqty=0 then qty*-1
else oqty end end) over (order by rn) as sqty, lag(amt) over (order by rn) as avg
from test3 ) data
),
counter (maximum) as
(select count(rn)
from test3
),
trans (n, id,amt,qty,oqty,sqty,prevavg,avg) as
(select 0 n, id,amt,qty,oqty, sqty,avg,avg
from runsum
union
select trans.n+1, runsum.id,trans.amt,trans.qty, trans.oqty, trans.sqty,
lag(trans.avg) over (order by 1),
case when runsum.sqty=0 then runsum.amt else
((trans.prevavg*(runsum.sqty+trans.qty))-(runsum.amt*trans.qty)+(trans.prevavg*trans.oqty))/(runsum.sqty+trans.oqty)
end
from runsum join trans using (id)
where trans.n<(select maximum*2 from counter))
select *
from trans
where prevavg is null and avg is not null
order by id;
This is probably not the "best" answer to your question, but while struggling with this tricky problem I stumbled, just by accident, on an ugly workaround :).
Click on this SQL Fiddle demo
with recursive
trans (n, id, amt, qty, oqty, sqty, prevavg, avg) as (
select 0 n, id, amt, qty, oqty, sqty, avg, avg
from runsum
union
select trans.n + 1, runsum.id, trans.amt, trans.qty, trans.oqty, trans.sqty,
lag(trans.avg) over (order by 1),
case when runsum.sqty=0 then runsum.amt
else
((trans.prevavg *(runsum.sqty+trans.qty))-(runsum.amt*trans.qty)+(trans.prevavg*trans.oqty))/(runsum.sqty+trans.oqty)
end
from runsum
join trans using (id)
where trans.n < 20
)
select *
from trans
where prevavg is null and avg is not null
order by id;
It seems that the source of the problem is the UNION clause in the recursive query.
Read this link: http://www.postgresql.org/docs/8.4/static/queries-with.html
The documentation says that with UNION (as opposed to UNION ALL), duplicate rows, including rows that duplicate any previous result row, are discarded while the recursive query is evaluated.
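Here is a toy Python model of that evaluation rule, only to show the effect of the duplicate check. It is not how PostgreSQL is implemented. The names `evaluate_recursive` and `next_row` and the `max_iterations` safety cap are made up for illustration.

```python
def evaluate_recursive(seed, step, distinct, max_iterations):
    """Iteratively evaluate `seed UNION [ALL] step(working table)`.

    distinct=True models UNION: a produced row is dropped if it duplicates
    any row already in the result. distinct=False models UNION ALL.
    """
    result = list(dict.fromkeys(seed)) if distinct else list(seed)
    working = list(result)
    for _ in range(max_iterations):  # hypothetical cap so UNION ALL terminates here
        if not working:
            break
        produced = [step(row) for row in working]
        if distinct:
            fresh = []
            for row in produced:
                if row not in result and row not in fresh:
                    fresh.append(row)
            produced = fresh
        result.extend(produced)
        working = produced  # only the new rows feed the next iteration
    return result


def next_row(row):
    # A recursive term whose output repeats after three steps.
    return ((row[0] + 1) % 3,)


with_union = evaluate_recursive([(0,)], next_row, distinct=True, max_iterations=5)
with_union_all = evaluate_recursive([(0,)], next_row, distinct=False, max_iterations=5)
print(with_union)
print(with_union_all)
```

With UNION, the repeated row is thrown away and nothing is left to recurse on. With UNION ALL, the same row is kept and keeps feeding the next step, which is why a stop condition such as `trans.n < 20` becomes necessary.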
I have a table with Identity (Id), RecordId, Type, Reading and IsDeleted columns. Id is the auto-increment primary key; RecordId is an integer that can contain duplicate values; Type is the type of reading, either 'one' or 'average'; Reading is any integer value; and IsDeleted is a bit that can be 0 or 1, i.e. false or true.
Now I want a query that returns the records of the table as follows: if COUNT(Id) for a RecordId is greater than 2, display all the records of that RecordId.
If COUNT(Id) == 2 for a RecordId and the Reading values of both records (the 'one' and the 'average' type) are the same, display only the average record.
If COUNT(Id) == 1, display only that record.
For example :
Id RecordId Type Reading IsDeleted
1 1 one 4 0
2 1 one 5 0
3 1 one 6 0
4 1 average 5 0
5 2 one 1 0
6 2 one 3 0
7 2 average 2 0
8 3 one 2 0
9 3 average 2 0
10 4 one 5 0
11 4 average 6 0
12 5 one 7 0
And the result would be:
Id RecordId Type Reading IsDeleted
1 1 one 4 0
2 1 one 5 0
3 1 one 6 0
4 1 average 5 0
5 2 one 1 0
6 2 one 3 0
7 2 average 2 0
9 3 average 2 0
10 4 one 5 0
11 4 average 6 0
12 5 one 7 0
In short, I want to skip a 'one' type reading when there is an 'average' reading with the same value and the RecordId has no more than one 'one' type reading.
Check out this program
DECLARE @t TABLE(ID INT IDENTITY,RecordId INT,[Type] VARCHAR(10),Reading INT,IsDeleted BIT)
INSERT INTO @t VALUES
(1,'one',4,0),(1,'one',5,0),(1,'one',6,0),(1,'average',5,0),(2,'one',1,0),(2,'one',3,0),
(2,'average',2,0),(3,'one',2,0),(3,'average',2,0),(4,'one',5,0),(4,'average',6,0),(5,'one',7,0),
(6,'average',6,0),(6,'average',6,0),(7,'one',6,0),(7,'one',6,0)
--SELECT * FROM @t
;WITH GetAllRecordsCount AS
(
SELECT *,Cnt = COUNT(RecordId) OVER(PARTITION BY RecordId ORDER BY RecordId)
FROM @t
)
)
-- Condition 1 : When COUNT(RecordId) for each RecordId is greater than 2
-- then display all the records of that RecordId.
, GetRecordsWithCountMoreThan2 AS
(
SELECT * FROM GetAllRecordsCount WHERE Cnt > 2
)
-- Get all records where count = 2
, GetRecordsWithCountEquals2 AS
(
SELECT * FROM GetAllRecordsCount WHERE Cnt = 2
)
-- Condition 3 : When COUNT(RecordId) == 1 then display only that record.
, GetRecordsWithCountEquals1 AS
(
SELECT * FROM GetAllRecordsCount WHERE Cnt = 1
)
-- Condition 1: When COUNT(RecordId) > 2
SELECT * FROM GetRecordsWithCountMoreThan2 UNION ALL
-- Condition 2 : When COUNT(RecordId) == 2 for that specific RecordId and Reading value of
-- both i.e. 'one' or 'average' type of the records are same then display only
-- average record.
SELECT t1.* FROM GetRecordsWithCountEquals2 t1
JOIN (Select RecordId From GetRecordsWithCountEquals2 Where [Type] = ('one') )X
ON t1.RecordId = X.RecordId
AND t1.Type = 'average' UNION ALL
-- Condition 3: When COUNT(RecordId) = 1
SELECT * FROM GetRecordsWithCountEquals1
Result
ID RecordId Type Reading IsDeleted Cnt
1 1 one 4 0 4
2 1 one 5 0 4
3 1 one 6 0 4
4 1 average 5 0 4
5 2 one 1 0 3
6 2 one 3 0 3
7 2 average 2 0 3
9 3 average 2 0 2
11 4 average 6 0 2
12 5 one 7 0 1
;with a as
(
select Id,RecordId,Type,Reading,IsDeleted, count(*) over (partition by RecordId, Reading) cnt,
row_number() over (partition by RecordId, Reading order by Type, RecordId) rn
from table
)
select Id,RecordId,Type,Reading,IsDeleted
from a where cnt <> 2 or rn = 1
Assuming your table is named the_table, let's do this:
select main.*
from the_table as main
inner join (
select recordId, count(Id) as num, count(distinct Reading) as reading_num
from the_table
group by recordId
) as counter on counter.recordId=main.recordId
where num=1 or num>2 or reading_num=2 or main.type='average';
Untested, but it should be some variant of that.
EDIT: TEST HERE ON FIDDLE
The short summary is that we want to join the table with an aggregated version of itself, then filter it based on the count criteria you mentioned (num=1: show it; num=2: show just the average record if the readings are the same, otherwise show both; num>2: show all records).
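As a quick sanity check of that WHERE clause, here is a small Python sketch that applies the same condition to the sample rows from the question (`filter_readings` is a made-up name for illustration):

```python
from collections import defaultdict

# (Id, RecordId, Type, Reading, IsDeleted) rows from the question.
rows = [
    (1, 1, "one", 4, 0), (2, 1, "one", 5, 0), (3, 1, "one", 6, 0), (4, 1, "average", 5, 0),
    (5, 2, "one", 1, 0), (6, 2, "one", 3, 0), (7, 2, "average", 2, 0),
    (8, 3, "one", 2, 0), (9, 3, "average", 2, 0),
    (10, 4, "one", 5, 0), (11, 4, "average", 6, 0),
    (12, 5, "one", 7, 0),
]


def filter_readings(rows):
    """Keep rows using the same condition as the WHERE clause of the query."""
    by_record = defaultdict(list)
    for row in rows:
        by_record[row[1]].append(row)

    kept = []
    for row in rows:
        group = by_record[row[1]]
        num = len(group)                         # count(Id) per recordId
        reading_num = len({r[3] for r in group})  # count(distinct Reading) per recordId
        if num == 1 or num > 2 or reading_num == 2 or row[2] == "average":
            kept.append(row)
    return kept


kept_ids = [row[0] for row in filter_readings(rows)]
print(kept_ids)
```

On the sample data only Id 8 is dropped: RecordId 3 has exactly two rows with the same reading, so only its average row survives.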