How to subtract by period in another column - SQL

I need the Minus column computed like this: for each Category, the first Result (Period 1) is the baseline.
For example, for category A we take the first Result, 23 (Period 1);
then 34 (Period 2) - 23 (Period 1) = 11, then 23 (Period 3) - 23 (Period 1) = 0...
And so on, for each category.
+--------+----------+--------+-------+
| Period | Category | Result | Minus |
+--------+----------+--------+-------+
|      1 | A        |     23 | n/a   |
|      1 | B        |     24 | n/a   |
|      1 | C        |     25 | n/a   |
|      2 | A        |     34 | 11    |
|      2 | B        |     23 | -1    |
|      2 | C        |      1 | -24   |
|      3 | A        |     23 | 0     |
|      3 | B        |     90 | 66    |
|      3 | C        |     21 | -4    |
+--------+----------+--------+-------+
Could you help me?
Could we use window partitions or LEAD here?

SELECT
    *,
    Result - FIRST_VALUE(Result) OVER (PARTITION BY Category ORDER BY Period) AS Minus
FROM
    yourTable
This doesn't produce the n/a values, but returns 0 instead. I'm not sure returning an arbitrary string in an integer column makes sense, so I didn't do it.
If you really need to avoid the 0, you could just use a CASE expression...
CASE WHEN 1 = Period
     THEN NULL
     ELSE Result - FIRST_VALUE(Result) OVER (PARTITION BY Category ORDER BY Period)
END
Or, even more robustly...
CASE WHEN 1 = ROW_NUMBER() OVER (PARTITION BY Category ORDER BY Period)
     THEN NULL
     ELSE Result - FIRST_VALUE(Result) OVER (PARTITION BY Category ORDER BY Period)
END
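Putting the robust version together as a complete statement (just the two snippets above assembled; nothing new):
SELECT
    *,
    CASE WHEN 1 = ROW_NUMBER() OVER (PARTITION BY Category ORDER BY Period)
         THEN NULL
         ELSE Result - FIRST_VALUE(Result) OVER (PARTITION BY Category ORDER BY Period)
    END AS Minus
FROM
    yourTable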
(Apologies for any typos, etc, I'm on my phone.)

You can do:
select b.*, b.result - a.result as "minus"
from t a
join t b on b.category = a.category and a.period = 1
Result:
period  category  result  minus
------  --------  ------  -----
1       A         23      0
1       B         24      0
1       C         25      0
2       A         34      11
2       B         23      -1
2       C         1       -24
3       A         23      0
3       B         90      66
3       C         21      -4
See running example at DB Fiddle.
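If you want n/a (NULL) rather than 0 for period 1, the join variant can be wrapped in a CASE too; a minimal sketch on the same table t:
select b.*,
       case when b.period = 1 then null
            else b.result - a.result
       end as "minus"
from t a
join t b on b.category = a.category and a.period = 1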

OK, just so as not to duplicate my question: how would this work if, for each new Sub period, we should repeat the first_value logic? (See the sketch after the table.)
+------------+--------+----------+--------+-------+
| Sub period | Period | Category | Result | Minus |
+------------+--------+----------+--------+-------+
| SA         |      1 | A        |     23 | n/a   |
| SA         |      2 | A        |     34 | 11    |
| SA         |      3 | A        |     35 | 12    |
| SA         |      4 | A        |     36 | 13    |
| KS         |      1 | A        |     23 | n/a   |
| KS         |      2 | A        |     21 | -2    |
| KS         |      3 | A        |     23 | 0     |
| KS         |      4 | A        |     21 | -2    |
+------------+--------+----------+--------+-------+
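One sketch, building on the robust ROW_NUMBER version above: add the sub-period to both PARTITION BY clauses so the baseline resets for each sub-period (assuming the column is named Sub_period):
SELECT
    *,
    CASE WHEN 1 = ROW_NUMBER() OVER (PARTITION BY Sub_period, Category ORDER BY Period)
         THEN NULL
         ELSE Result - FIRST_VALUE(Result) OVER (PARTITION BY Sub_period, Category ORDER BY Period)
    END AS Minus
FROM
    yourTable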

Related

How to assign duplicate increment in SQL?

While going through the rows, whenever we find the text 'NEW' in the Calc column, increment a count (starting at 1) in the Results column.
The output should look like the display column in the result below.
The following uses an id column to resolve the ordering issue; replace it with your corresponding expression. This also addresses the requirements to start the display sequence at 1 and to show 0 for the 'NEW' rows.
The SQL (updated):
SELECT logs.*
     , CASE WHEN text = 'NEW' THEN 0
            ELSE COALESCE(SUM(CASE WHEN text = 'NEW' THEN 1 END) OVER (PARTITION BY xrank ORDER BY id) + 1, 1)
       END AS display
FROM logs
ORDER BY id
The result:
+----+-------+------+---------+
| id | xrank | text | display |
+----+-------+------+---------+
|  1 |     1 | A    |       1 |
|  2 |     1 | B    |       1 |
|  3 |     1 | C    |       1 |
|  4 |     1 | NEW  |       0 |
|  5 |     1 | D    |       2 |
|  6 |     1 | Q    |       2 |
|  7 |     1 | B    |       2 |
|  8 |     1 | NEW  |       0 |
|  9 |     1 | D    |       3 |
| 10 |     1 | Z    |       3 |
| 11 |     2 | A    |       1 |
| 12 |     2 | B    |       1 |
| 13 |     2 | C    |       1 |
| 14 |     2 | NEW  |       0 |
| 15 |     2 | D    |       2 |
| 16 |     2 | Q    |       2 |
| 17 |     2 | B    |       2 |
| 18 |     2 | NEW  |       0 |
| 19 |     2 | D    |       3 |
| 20 |     2 | Z    |       3 |
+----+-------+------+---------+
You need a column that specifies the ordering for the table. With that, just use a cumulative sum:
select t.*,
       1 + sum(case when Calc = 'NEW' then 1 else 0 end) over (partition by Rank_Id order by Seq) as display
from t;
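If the 0 display for the 'NEW' rows is also wanted, the cumulative sum can be wrapped in a CASE, much like the first answer; a sketch reusing the Calc, Rank_Id, and Seq names from above:
select t.*,
       case when Calc = 'NEW' then 0
            else 1 + sum(case when Calc = 'NEW' then 1 else 0 end)
                     over (partition by Rank_Id order by Seq)
       end as display
from t;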

Output the difference of two values in the same column to another column

Can anyone help me out or point me in the right direction? What is the simplest way to get from the current table to the output table?
Current table
ID | type | amount |
---|------|--------|
 2 | A    |     19 |
 2 | B    |      6 |
 3 | A    |      5 |
 3 | B    |     11 |
 4 | A    |      1 |
 4 | B    |     23 |
Desired output
ID | type | amount | change |
---|------|--------|--------|
 2 | A    |     19 | 13     |
 2 | B    |      6 | -6     |
 3 | A    |      5 | -22    |
 3 | B    |     11 |        |
 4 | A    |      1 |        |
 4 | B    |     23 |        |
I don't get how the values are put on the rows in your desired output. You can, for instance, subtract the "B" value from the "A" value for any given id:
select t.*,
       (case when type = 'A'
             then amount - max(amount) filter (where type = 'B') over (partition by id)
        end) as diff_a_b
from t;
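Note that FILTER is PostgreSQL syntax; on engines without it, a conditional aggregate is an equivalent sketch:
select t.*,
       (case when type = 'A'
             then amount - max(case when type = 'B' then amount end)
                           over (partition by id)
        end) as diff_a_b
from t;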

Query Different Condition With 1 Column

I have a table like:
+-------+--------+----------+------------+-------+
| cd_hs | cd_cnt | name_cnt | dates      | value |
+-------+--------+----------+------------+-------+
|     1 |      1 | aaa      | 2018-06-01 |    50 |
|     1 |      2 | bbb      | 2018-07-01 |   150 |
|     1 |      3 | ccc      | 2018-08-01 |    20 |
|     1 |      1 | aaa      | 2018-06-02 |    40 |
|     1 |      2 | bbb      | 2018-07-02 |    70 |
|     1 |      3 | ccc      | 2018-08-02 |    80 |
+-------+--------+----------+------------+-------+
Actually I have more data; I'm just showing a sample. What I want to do is group by cd_hs, name_cnt, and the year of the dates column, and take SUM(value), under two conditions: first, show the values where cd_cnt is 1 or 2 under their own name_cnt; second, lump all the values where cd_cnt is not 1 or 2 together and alias them as other in the same column.
Expected result:
+-------+------+----------+-------------+
| cd_hs | year | name_cnt | total_value |
+-------+------+----------+-------------+
|     1 | 2018 | aaa      |          90 |
|     1 | 2018 | bbb      |         220 |
|     1 | 2018 | other    |         100 |
+-------+------+----------+-------------+
How can I do that? I'm new to SQL and don't know where to start.
Your question is a bit confusing, since your description doesn't quite line up with the sample result you provided.
If the sample result is actually what you're looking for, a simple SUM and GROUP BY should do the trick:
SELECT cd_hs, EXTRACT(YEAR FROM dates) AS year, name_cnt, SUM(value_)
FROM foo
GROUP BY cd_hs, EXTRACT(YEAR FROM dates), name_cnt
Result:
| cd_hs | year | name_cnt | sum |
|-------|------|----------|-----|
|     1 | 2018 | aaa      |  90 |
|     1 | 2018 | bbb      | 220 |
|     1 | 2018 | ccc      | 100 |
SQLFiddle
Since you mentioned you wanted two different totals with two separate conditions, you could use JOIN in conjunction with some well-crafted subqueries:
SELECT a.cd_hs, EXTRACT(YEAR FROM a.dates), a.name_cnt,
       COALESCE(b.total_a, 0) as "Total A",
       COALESCE(c.total_b, 0) as "Total B"
FROM foo a
LEFT JOIN (
    SELECT b.cd_hs, b.name_cnt, EXTRACT(YEAR FROM b.dates), SUM(value_) as total_a
    FROM foo b
    WHERE b.cd_cnt NOT IN (1, 2)
    GROUP BY b.cd_hs, b.name_cnt, EXTRACT(YEAR FROM b.dates)
) b ON a.cd_hs = b.cd_hs AND a.name_cnt = b.name_cnt
LEFT JOIN (
    SELECT c.cd_hs, c.name_cnt, EXTRACT(YEAR FROM c.dates), SUM(value_) as total_b
    FROM foo c
    WHERE c.cd_cnt IN (1, 2)
    GROUP BY c.cd_hs, c.name_cnt, EXTRACT(YEAR FROM c.dates)
) c ON a.cd_hs = c.cd_hs AND a.name_cnt = c.name_cnt
This particular solution is readable and will get you to the correct end result, but it will most likely not scale well in its current form.
Result:
| cd_hs | date_part | name_cnt | Total A | Total B |
|-------|-----------|----------|---------|---------|
|     1 |      2018 | aaa      |       0 |      90 |
|     1 |      2018 | bbb      |       0 |     220 |
|     1 |      2018 | ccc      |     100 |       0 |
|     1 |      2018 | aaa      |       0 |      90 |
|     1 |      2018 | bbb      |       0 |     220 |
|     1 |      2018 | ccc      |     100 |       0 |
SQLFiddle
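Alternatively, if the goal is to reproduce the expected result exactly (everything outside cd_cnt 1 and 2 collapsed into a single other row), a CASE expression repeated in the select list and the GROUP BY is a simpler sketch:
SELECT cd_hs,
       EXTRACT(YEAR FROM dates) AS year,
       CASE WHEN cd_cnt IN (1, 2) THEN name_cnt ELSE 'other' END AS name_cnt,
       SUM(value_) AS total_value
FROM foo
GROUP BY cd_hs,
         EXTRACT(YEAR FROM dates),
         CASE WHEN cd_cnt IN (1, 2) THEN name_cnt ELSE 'other' END
On the sample data this yields aaa = 90, bbb = 220, and other = 100.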

PostgreSQL multiple rows as columns

I have a table like this:
| id | name | segment | date_created | question | answer |
|----|------|---------|--------------|----------|--------|
|  1 | John |       1 | 2018-01-01   |       10 |     28 |
|  1 | John |       1 | 2018-01-01   |       14 |     37 |
|  1 | John |       1 | 2018-01-01   |        9 |     83 |
|  2 | Jack |       3 | 2018-03-11   |       22 |     13 |
|  2 | Jack |       3 | 2018-03-11   |       23 |     16 |
And I want to show this information in a single row, transposing all the questions and answers into columns:
| id | name | segment | date_created | question_01 | answer_01 | question_02 | answer_02 | question_03 | answer_03 |
|----|------|---------|--------------|-------------|-----------|-------------|-----------|-------------|-----------|
|  1 | John |       1 | 2018-01-01   |          10 |        28 |          14 |        37 |           9 |        83 |
|  2 | Jack |       3 | 2018-03-11   |          22 |        13 |          23 |        16 |             |           |
The number of questions/answers for the same ID is known: at most 15.
I've already tried using crosstab, but it only accepts a single value as the category, and I can have two (question/answer). Any help on how to solve this?
You can use row_number to number the rows in a subquery, then do conditional aggregation in the main query:
SELECT ID,
       Name,
       segment,
       date_created,
       max(CASE WHEN rn = 1 THEN question END) question_01,
       max(CASE WHEN rn = 1 THEN answer END) answer_01,
       max(CASE WHEN rn = 2 THEN question END) question_02,
       max(CASE WHEN rn = 2 THEN answer END) answer_02,
       max(CASE WHEN rn = 3 THEN question END) question_03,
       max(CASE WHEN rn = 3 THEN answer END) answer_03
FROM (
    select *,
           row_number() over (partition by ID, Name, segment, date_created order by (select 1)) rn
    from T
) t1
GROUP BY ID, Name, segment, date_created
sqlfiddle
[Results]:
| id | name | segment | date_created | question_01 | answer_01 | question_02 | answer_02 | question_03 | answer_03 |
|----|------|---------|--------------|-------------|-----------|-------------|-----------|-------------|-----------|
|  1 | John |       1 | 2018-01-01   |          10 |        28 |          14 |        37 |           9 |        83 |
|  2 | Jack |       3 | 2018-03-11   |          22 |        13 |          23 |        16 |      (null) |    (null) |
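One caveat: order by (select 1) gives an arbitrary, nondeterministic row order, so which pair becomes question_01 versus question_02 isn't guaranteed. If the source table has a column that fixes the original order (created_at below is an assumed name), order the row_number by it instead:
select *,
       row_number() over (
           partition by ID, Name, segment, date_created
           order by created_at --assumed column that fixes the original row order
       ) rn
from T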

How do I select columns whenever they change?

I'm trying to create a slowly changing dimension (type 2 dimension) and am a bit lost on how to logically write it out. Say that we have a source table with a grain of Person | Country | Department | Login Time. I want to create this dimension table with Person | Country | Department | Eff Start time | Eff End Time.
Data could look like this:
Person | Country | Department | Login Time
-------+---------+------------+-----------
Bob    | CANADA  | Marketing  | 2009-01-01
Bob    | CANADA  | Marketing  | 2009-02-01
Bob    | USA     | Marketing  | 2009-03-01
Bob    | USA     | Sales      | 2009-04-01
Bob    | MEX     | Product    | 2009-05-01
Bob    | MEX     | Product    | 2009-06-01
Bob    | MEX     | Product    | 2009-07-01
Bob    | CANADA  | Marketing  | 2009-08-01
What I want in the Type 2 dimension would look like this:
Person | Country | Department | Eff Start Time | Eff End Time
-------+---------+------------+----------------+-------------
Bob    | CANADA  | Marketing  | 2009-01-01     | 2009-03-01
Bob    | USA     | Marketing  | 2009-03-01     | 2009-04-01
Bob    | USA     | Sales      | 2009-04-01     | 2009-05-01
Bob    | MEX     | Product    | 2009-05-01     | 2009-08-01
Bob    | CANADA  | Marketing  | 2009-08-01     | NULL
Assume that Bob's name, Country, and Department haven't been updated since 2009-08-01, so the effective end time is left as NULL.
What function would work best here? This is on Netezza, which uses a flavor of Postgres.
Obviously GROUP BY would not work here, because the same grouping can recur later on (I added the final Bob | CANADA | Marketing row to show this).
EDIT
Would including a hash column on Person, Country, and Department make sense? I'm thinking of using logic along the lines of:
SELECT person, country, department
FROM table t1
WHERE person = person
  AND t1.hash <> hash_function(person, country, department)
Answer
create table so (
    person varchar(32)
    ,country varchar(32)
    ,department varchar(32)
    ,login_time date
) distribute on random;
insert into so values ('Bob','CANADA','Marketing','2009-01-01');
insert into so values ('Bob','CANADA','Marketing','2009-02-01');
insert into so values ('Bob','USA','Marketing','2009-03-01');
insert into so values ('Bob','USA','Sales','2009-04-01');
insert into so values ('Bob','MEX','Product','2009-05-01');
insert into so values ('Bob','MEX','Product','2009-06-01');
insert into so values ('Bob','MEX','Product','2009-07-01');
insert into so values ('Bob','CANADA','Marketing','2009-08-01');
/* ************************************************************************** */
with prm as ( --Create an ordinal primary key.
    select
        *
        ,row_number() over (
            partition by person
            order by login_time
        ) rwn
    from
        so
), chn as ( --Chain events to their previous and next event.
    select
        cur.rwn
        ,cur.person
        ,cur.country
        ,cur.department
        ,cur.login_time cur_login
        ,case
            when
                cur.country = prv.country
                and cur.department = prv.department
            then 1
            else 0
        end prv_equal
        ,case
            when
                (
                    cur.country = nxt.country
                    and cur.department = nxt.department
                ) or nxt.rwn is null --No next record should be equivalent to matching.
            then 1
            else 0
        end nxt_equal
        ,case prv_equal
            when 0 then cur_login
            else null
        end eff_login_start_sparse
        ,case
            when eff_login_start_sparse is null
            then max(eff_login_start_sparse) over (
                partition by cur.person
                order by rwn
                rows unbounded preceding --The secret sauce.
            )
            else eff_login_start_sparse
        end eff_login_start
        ,case nxt_equal
            when 0 then cur_login
            else null
        end eff_login_end
    from
        prm cur
        left outer join prm nxt on
            cur.person = nxt.person
            and cur.rwn + 1 = nxt.rwn
        left outer join prm prv on
            cur.person = prv.person
            and cur.rwn - 1 = prv.rwn
), grp as ( --Group by login starts.
    select
        person
        ,country
        ,department
        ,eff_login_start
        ,max(eff_login_end) eff_login_end
    from
        chn
    group by
        person
        ,country
        ,department
        ,eff_login_start
), led as ( --Change the effective end to be the next start, if desired.
    select
        person
        ,country
        ,department
        ,eff_login_start
        ,case
            when eff_login_end is null
            then null
            else lead(eff_login_start) over (
                partition by person
                order by eff_login_start
            )
        end eff_login_end
    from
        grp
)
select * from led order by eff_login_start;
This code returns the following table.
PERSON | COUNTRY | DEPARTMENT | EFF_LOGIN_START | EFF_LOGIN_END
-------+---------+------------+-----------------+--------------
Bob    | CANADA  | Marketing  | 2009-01-01      | 2009-03-01
Bob    | USA     | Marketing  | 2009-03-01      | 2009-04-01
Bob    | USA     | Sales      | 2009-04-01      | 2009-05-01
Bob    | MEX     | Product    | 2009-05-01      | 2009-08-01
Bob    | CANADA  | Marketing  | 2009-08-01      |
Explanation
I must have solved this four or five times in the past few years and keep neglecting to write it down formally. I'm glad to have the chance to do it, so this is a great question.
When attempting this, I like writing down the problem in matrix form. Here's the input, presuming that all values have the same key in the SCD.
Cv | Ce
---|---
A  | 10
A  | 11
B  | 14
C  | 16
D  | 18
D  | 25
D  | 34
A  | 40
Where Cv is the value that we'll need to compare against (again, presuming that the key value for the SCD is equal in this data; we'll be partitioning over the key value the entire time so it's irrelevant to the solution) and Ce is the event time.
First, we need an ordinal primary key. I've designated this Ck in the table. This will allow us to join the table to itself to get the previous and next events. I've called these columns Pk (previous key), Nk (next key), Pv, and Nv.
Cv | Ce | Ck | Pk | Pv | Nk | Nv |
---|----|----|----|----|----|----|
A  | 10 |  1 |    |    |  2 | A  |
A  | 11 |  2 |  1 | A  |  3 | B  |
B  | 14 |  3 |  2 | A  |  4 | C  |
C  | 16 |  4 |  3 | B  |  5 | D  |
D  | 18 |  5 |  4 | C  |  6 | D  |
D  | 25 |  6 |  5 | D  |  7 | D  |
D  | 34 |  7 |  6 | D  |  8 | A  |
A  | 40 |  8 |  7 | D  |    |    |
Now we need some columns to see whether we're at the beginning or end of a contiguous event block. I'll call these Pc and Nc, for contiguous. Pc is defined as Pv = Cv => true, where 1 represents true and 0 represents false. Nc is defined similarly, except that the null case defaults to true (we'll see why in a minute).
Cv | Ce | Ck | Pk | Pv | Nk | Nv | Pc | Nc |
---|----|----|----|----|----|----|----|----|
A  | 10 |  1 |    |    |  2 | A  |  0 |  1 |
A  | 11 |  2 |  1 | A  |  3 | B  |  1 |  0 |
B  | 14 |  3 |  2 | A  |  4 | C  |  0 |  0 |
C  | 16 |  4 |  3 | B  |  5 | D  |  0 |  0 |
D  | 18 |  5 |  4 | C  |  6 | D  |  0 |  1 |
D  | 25 |  6 |  5 | D  |  7 | D  |  1 |  1 |
D  | 34 |  7 |  6 | D  |  8 | A  |  1 |  0 |
A  | 40 |  8 |  7 | D  |    |    |  0 |  1 |
Now you can start to see how the 1,1 combination of Pc,Nc is a completely useless record. We know this intuitively, since Bob's Mex/Product combination on the 6th row is pretty much useless information when building an SCD.
So let's get rid of the useless information. I'll add two new columns here: an almost-complete effective start time called Sn and an actually-complete effective end time called Ee. Sn is populated with Ce when Pc is 0, and Ee is populated with Ce when Nc is 0.
Cv | Ce | Ck | Pk | Pv | Nk | Nv | Pc | Nc | Sn | Ee |
---|----|----|----|----|----|----|----|----|----|----|
A  | 10 |  1 |    |    |  2 | A  |  0 |  1 | 10 |    |
A  | 11 |  2 |  1 | A  |  3 | B  |  1 |  0 |    | 11 |
B  | 14 |  3 |  2 | A  |  4 | C  |  0 |  0 | 14 | 14 |
C  | 16 |  4 |  3 | B  |  5 | D  |  0 |  0 | 16 | 16 |
D  | 18 |  5 |  4 | C  |  6 | D  |  0 |  1 | 18 |    |
D  | 25 |  6 |  5 | D  |  7 | D  |  1 |  1 |    |    |
D  | 34 |  7 |  6 | D  |  8 | A  |  1 |  0 |    | 34 |
A  | 40 |  8 |  7 | D  |    |    |  0 |  1 | 40 |    |
This looks really close, but we still have the problem that we can't group by Cv (person/country/department). What we need is for Sn to fill all those nulls with the previous non-null value of Sn. You could join this table to itself on rwn less than the current row's rwn and take the maximum, but I'm going to be lazy and use Netezza's analytic functions and the rows unbounded preceding clause, which is a shortcut to the method I just described. So we're going to create another column called Es, effective start, defined as follows.
case
    when Sn is null
    then max(Sn) over (
        partition by k --key value of the SCD
        order by Ck
        rows unbounded preceding
    )
    else Sn
end Es
With that definition, we get this.
Cv | Ce | Ck | Pk | Pv | Nk | Nv | Pc | Nc | Sn | Ee | Es |
---|----|----|----|----|----|----|----|----|----|----|----|
A  | 10 |  1 |    |    |  2 | A  |  0 |  1 | 10 |    | 10 |
A  | 11 |  2 |  1 | A  |  3 | B  |  1 |  0 |    | 11 | 10 |
B  | 14 |  3 |  2 | A  |  4 | C  |  0 |  0 | 14 | 14 | 14 |
C  | 16 |  4 |  3 | B  |  5 | D  |  0 |  0 | 16 | 16 | 16 |
D  | 18 |  5 |  4 | C  |  6 | D  |  0 |  1 | 18 |    | 18 |
D  | 25 |  6 |  5 | D  |  7 | D  |  1 |  1 |    |    | 18 |
D  | 34 |  7 |  6 | D  |  8 | A  |  1 |  0 |    | 34 | 18 |
A  | 40 |  8 |  7 | D  |    |    |  0 |  1 | 40 |    | 40 |
The rest is trivial. Group by Es and grab the max of Ee to obtain this table.
Cv | Es | Ee |
---|----|----|
A  | 10 | 11 |
B  | 14 | 14 |
C  | 16 | 16 |
D  | 18 | 34 |
A  | 40 |    |
If you want to populate the effective end time with the next start, join the table again to itself or use the lead() window function to grab it.
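In SQL terms that last grouping step is just the following, sketched in the matrix notation (assuming the intermediate result is available as m):
select Cv, Es, max(Ee) as Ee
from m
group by Cv, Es
order by Es;
The lead() variant then replaces each non-null Ee with lead(Es) over (order by Es), exactly as the led step does in the full query above.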