Modify data in a specific format - sql

I want to represent data in a specific format. Currently, the data looks like this:
product_id order_id product_type day1_sale day2_sale day3_sale day4_sale
123 456 A null 0.2 0.3 null
123 456 B null null 0.4 null
111 222 A null null null null
333 444 B 0.7 0.1 0.2 0.6
I want to represent it in the following format:
product_id order_id product_type sale_day %sales_on_day
123 456 A day2 0.2
123 456 A day3 0.3
123 456 B day3 0.4
111 222 A null null
333 444 B day1 0.7
333 444 B day2 0.1
333 444 B day3 0.2
333 444 B day4 0.6
Is there a way to get the data in this format?

Below is for BigQuery Standard SQL
#standardSQL
SELECT product_id, order_id, product_type, x.*
FROM `project.dataset.table`,
UNNEST([STRUCT('day1' AS sale_day, day1_sale AS sales_on_day), ('day2', day2_sale), ('day3', day3_sale), ('day4', day4_sale)]) x
WHERE NOT sales_on_day IS NULL
Applied to the sample data from your question, the result is as follows (the all-NULL row for product 111 is filtered out):
Row product_id order_id product_type sale_day sales_on_day
1 123 456 A day2 0.2
2 123 456 A day3 0.3
3 123 456 B day3 0.4
4 333 444 B day1 0.7
5 333 444 B day2 0.1
6 333 444 B day3 0.2
7 333 444 B day4 0.6

You want to unpivot and filter. Here is a BigQuery'ish way to do this:
with t as (
select 123 as product_id, 456 as order_id, 'A' as product_type, null as day1_sale, 0.2 as day2_sale, 0.3 as day3_sale, null as day4_sale UNION ALL
select 123, 456, 'B', null, null, 0.4, null UNION ALL
select 111, 222, 'A', null, null, null, null UNION ALL
select 333, 444, 'B', 0.7, 0.1, 0.2, 0.6
)
select t.product_id, t.order_id, t.product_type, ds.*
from t cross join
unnest(array[struct('1' as day, day1_sale as day_sale),
('2', day2_sale),
('3', day3_sale),
('4', day4_sale)
]
) ds
where day_sale is not null;
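To make the unpivot-and-filter logic of both answers easier to check, here is a minimal plain-Python sketch of the same idea. It is not BigQuery code; the function name unpivot and the keep_empty flag are made up for illustration. The flag is only there because the desired output in the question keeps a null/null row for product 111, which the WHERE filter in both queries drops.

```python
# Sample rows from the question: (product_id, order_id, product_type, day1..day4 sales).
rows = [
    (123, 456, "A", None, 0.2, 0.3, None),
    (123, 456, "B", None, None, 0.4, None),
    (111, 222, "A", None, None, None, None),
    (333, 444, "B", 0.7, 0.1, 0.2, 0.6),
]
DAYS = ("day1", "day2", "day3", "day4")


def unpivot(rows, keep_empty=False):
    """Turn each wide row into one (sale_day, sales_on_day) row per non-NULL day."""
    out = []
    for product_id, order_id, product_type, *sales in rows:
        # Pair each day label with its value and drop NULLs (the WHERE ... IS NOT NULL step).
        pairs = [(day, sale) for day, sale in zip(DAYS, sales) if sale is not None]
        if not pairs and keep_empty:
            # Hypothetical option: keep one placeholder row when every day is NULL.
            pairs = [(None, None)]
        out.extend((product_id, order_id, product_type, day, sale) for day, sale in pairs)
    return out


filtered = unpivot(rows)                    # same rows as the query results above
with_empty = unpivot(rows, keep_empty=True)  # also keeps (111, 222, 'A', None, None)

for row in with_empty:
    print(row)
```

In SQL, keeping the all-NULL row would need something beyond the plain filter (for example a LEFT JOIN against the unnested array), so treat the flag as a description of the requirement rather than a drop-in solution.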

Related

How to calculate the running balance of an asset based on INPUT and OUTPUT

I'm looking at different blockchain transactions and wanted to create a running balance of a given asset based on INPUT_ADDRESS (the address sending the currency) INPUT_AMOUNT (the amount being sent by an INPUT_ADDRESS), OUTPUT_ADDRESS (the address receiving the currency) and OUTPUT_AMOUNT (the amount being received by an OUTPUT_ADDRESS)
Here's a sample of a table I'm using:
BLOCK_DATE | BLOCK_HEIGHT | TRANS_HASH | INPUT_ADDRESS | OUTPUT_ADDRESS | INPUT_AMOUNT | OUTPUT_AMOUNT
01/11/2020 190 15c7853 abc xyz1 -0.01 0.0001
01/11/2020 190 14v9876 abc xyz2 -0.50 0.70
01/11/2020 191 19vc842 abc xyz3 -5.03 0.413
01/12/2020 192 20ff4d3 abc xyz4 -0.06 0.201
01/12/2020 192 154gf34 xyz1 abc -0.07 0.18
01/12/2020 192 45f4ti5 ggg abc -0.10 0.24
01/12/2020 192 33cv5c5 jjj abc -0.08 1.13
If I were to calculate a running sum of address abc, what's an efficient way of going about this? I tried using something like:
SELECT BLOCK_DATE, BLOCK_HEIGHT, TRANS_HASH, INPUT_ADDRESS, OUTPUT_ADDRESS, INPUT_AMOUNT, OUTPUT_AMOUNT, SUM(INPUT_AMOUNT) OVER (ORDER BY BLOCK_DATE) AS RunningAgeTotal
FROM TRANSACTION_TABLE
WHERE INPUT_ADDRESS = 'abc'
In this particular example, the total balance for abc would be the sum of OUTPUT_AMOUNT where abc is the OUTPUT_ADDRESS (i.e 0.18 + 0.24 + 1.13) + the sum of INPUT_AMOUNT where abc is the INPUT_ADDRESS (i.e. -0.01 + -0.50 + -5.03 + -0.06). So, 1.55 + (-5.60) = -4.05
But I don't think this is the right way of going about it, and I'm not sure how to account for the OUTPUT_AMOUNT (e.g. when abc is an OUTPUT_ADDRESS and receives an OUTPUT_AMOUNT).
Is this what you want?
select t.*,
sum(case when input_address = 'abc' then input_amount
when output_address = 'abc' then output_amount
end) over (order by block_date) as running_amount
from transaction_table t
where 'abc' in (input_address, output_address);
This is a cumulative sum of the amounts aligned with the input/output columns.
EDIT:
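You may want to add block_height to the ordering, so that transactions on the same date are accumulated block by block rather than all at once:
sum(case when input_address = 'abc' then input_amount
when output_address = 'abc' then output_amount
end) over (order by block_date, block_height) as running_amount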
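As a sanity check, the running balance can be reproduced on the sample rows with an in-memory SQLite database. This is only a sketch under a few assumptions that are not in the original: dates are rewritten as ISO text (reading 01/11/2020 as MM/DD/YYYY) so that they sort chronologically, trans_hash is added as a tie-breaker so rows within one block get distinct running totals, and the SQLite build supports window functions (3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE transaction_table (
           block_date TEXT, block_height INTEGER, trans_hash TEXT,
           input_address TEXT, output_address TEXT,
           input_amount REAL, output_amount REAL)"""
)
# Sample rows from the question; dates converted to ISO format (assumption).
conn.executemany(
    "INSERT INTO transaction_table VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("2020-01-11", 190, "15c7853", "abc", "xyz1", -0.01, 0.0001),
        ("2020-01-11", 190, "14v9876", "abc", "xyz2", -0.50, 0.70),
        ("2020-01-11", 191, "19vc842", "abc", "xyz3", -5.03, 0.413),
        ("2020-01-12", 192, "20ff4d3", "abc", "xyz4", -0.06, 0.201),
        ("2020-01-12", 192, "154gf34", "xyz1", "abc", -0.07, 0.18),
        ("2020-01-12", 192, "45f4ti5", "ggg", "abc", -0.10, 0.24),
        ("2020-01-12", 192, "33cv5c5", "jjj", "abc", -0.08, 1.13),
    ],
)

# Pick the amount on whichever side the address appears, then accumulate it.
query = """
SELECT trans_hash,
       SUM(CASE WHEN input_address = :addr THEN input_amount
                WHEN output_address = :addr THEN output_amount
           END) OVER (ORDER BY block_date, block_height, trans_hash) AS running_amount
FROM transaction_table
WHERE :addr IN (input_address, output_address)
ORDER BY block_date, block_height, trans_hash
"""
balances = conn.execute(query, {"addr": "abc"}).fetchall()

for trans_hash, running_amount in balances:
    print(trans_hash, round(running_amount, 4))
```

The last running_amount should agree with the total worked out in the question, 1.55 + (-5.60) = -4.05, up to floating-point rounding.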

query to get the SUM

Suppose I have the following data:
code_table
code_id | code_no | stats |
2 60 22A3
3 60 22A3
value_table
value_no | amount_value_one | amount_value_two | amount_diff | code_no | sample_no | code_id
1 1200.00 400.00 800.00 60 90 2
1 600.00 200.00 400.00 60 100 3
1 1800.00 600.00 1200.00 60 110 2
2 1200.00 1200.00 0.00 60 110 2
2 800.00 600.00 200.00 60 90 2
2 400.00 0.00 400.00 60 100 3
What I want is the SUM of amount_value_two for each sample_no, while keeping only the first amount_value_one, i.e. the one with value_no = 1.
The output should look like this:
amount_value_one | SUM_of_amount_value_two | amount_diff | sample_no
1200.00 1000.00 200.00 90
600.00 200.00 400.00 100
1800.00 1800.00 0.00 110
So far I have the following query:
SELECT SUM(p.amount_value_one) as value_one,
SUM(p.amount_value_two) as value_two,
SUM(p.amount_diff) as amount_diff,
p.sample_no as sampleNo FROM value_table p
INNER JOIN code_table On code_table.code_no = p.code_no
WHERE code_table.code_id = p.code_id
AND code_table.stats = '22A3'
GROUP BY p.sample_no
The query above is wrong because it also sums p.amount_value_one and p.amount_diff.
It's just a test query, because I can't work out what the correct query would look like.
Assuming that you have a column that specifies the ordering, then you can use that to figure out the "first" row. Then use conditional aggregation:
SELECT SUM(CASE WHEN seqnum = 1 THEN p.amount_value_one END) as value_one,
SUM(p.amount_value_two) as value_two,
SUM(p.amount_diff) as amount_diff,
p.sample_no as sampleNo
FROM (SELECT p.*,
ROW_NUMBER() OVER (PARTITION BY p.sample_no ORDER BY <ordering column>) as seqnum
FROM value_table p
) p JOIN
code_table ct
ON ct.code_no = p.code_no AND
ct.code_id = p.code_id
WHERE ct.stats = '22A3'
GROUP BY p.sample_no
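Here is a small sketch of that conditional aggregation run against the sample data in SQLite. Two things in it are assumptions rather than part of the answer: value_no is used as the ordering column (the question says the first row is the one with value_no = 1), and amount_diff is computed as the first amount_value_one minus the summed amount_value_two, because that is what reproduces the desired output in the question (200, 400, 0) while SUM(amount_diff) does not. It also needs a SQLite build with window functions (3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE code_table (code_id INTEGER, code_no INTEGER, stats TEXT)")
conn.execute(
    """CREATE TABLE value_table (
           value_no INTEGER, amount_value_one REAL, amount_value_two REAL,
           amount_diff REAL, code_no INTEGER, sample_no INTEGER, code_id INTEGER)"""
)
conn.executemany("INSERT INTO code_table VALUES (?, ?, ?)", [(2, 60, "22A3"), (3, 60, "22A3")])
conn.executemany(
    "INSERT INTO value_table VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, 1200.0, 400.0, 800.0, 60, 90, 2),
        (1, 600.0, 200.0, 400.0, 60, 100, 3),
        (1, 1800.0, 600.0, 1200.0, 60, 110, 2),
        (2, 1200.0, 1200.0, 0.0, 60, 110, 2),
        (2, 800.0, 600.0, 200.0, 60, 90, 2),
        (2, 400.0, 0.0, 400.0, 60, 100, 3),
    ],
)

query = """
SELECT SUM(CASE WHEN p.seqnum = 1 THEN p.amount_value_one END) AS value_one,
       SUM(p.amount_value_two) AS value_two,
       -- Assumption: diff = first value_one minus total value_two.
       SUM(CASE WHEN p.seqnum = 1 THEN p.amount_value_one END)
           - SUM(p.amount_value_two) AS amount_diff,
       p.sample_no
FROM (SELECT v.*,
             ROW_NUMBER() OVER (PARTITION BY v.sample_no ORDER BY v.value_no) AS seqnum
      FROM value_table v) p
JOIN code_table ct
  ON ct.code_no = p.code_no AND ct.code_id = p.code_id
WHERE ct.stats = '22A3'
GROUP BY p.sample_no
ORDER BY p.sample_no
"""
summary = conn.execute(query).fetchall()

for row in summary:
    print(row)
```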

SQL query how to get the last working day of week/year in Oracle/Hive?

Suppose I have some sample data in test_data as below. Each week has between 1 and 5 days with data in the database:
code vol val num test_date
--------------------------------------------
1 00001 100 0.1 111 20191104
2 00001 100 0.1 111 20191105
3 00001 100 0.1 111 20191106
4 00001 100 0.1 111 20191107
5 00001 100 0.1 111 20191108
7 00001 100 0.1 111 20191111
8 00001 200 0.1 222 20191112
9 00001 200 0.1 111 20191113
10 00001 400 0.3 222 20191114
11 00001 200 0.2 333 20191118
12 00002 100 0.1 111 20191104
13 00002 200 0.1 222 20191105
14 00002 200 0.1 111 20191106
15 00002 400 0.3 222 20191107
16 00002 200 0.2 333 20191108
....................
....................
I would like to summarize volume, number and value by week/year and code. I can do that with the SQL query below, but I can't get the last date of each week according to test_date. Because of business/working days, the last date may fall on any day of the week or year, and we need to display that last date as a column.
SELECT t.code
,date_add(concat_ws('-',substr(t.test_date,1,4),substr(t.test_date,5,2),substr(t.test_date,7,2)) ,
-pmod(datediff(concat_ws('-',substr(t.test_date,1,4),substr(t.test_date,5,2),substr(t.test_date,7,2)),'1990-01-01'),7)) AS test_date
,sum(t.number) AS num
,sum(t.volume) AS vol
,sum(t.value) AS val
FROM test_data t
GROUP BY t.code, test_date
Now my output is as below:
code vol val num test_date(monday)
----------------------------------------------------
1 00001 500 0.5 555 20191104
2 00001 900 0.6 666 20191111
3 00001 200 0.1 111 20191118
4 00001 400 0.3 222 20191125
5 00001 200 0.2 333 20191202
But my expected output is as below:
code vol val num test_date(the last date of week in database)
-------------------------------------------------------------------------------
1 00001 500 0.5 555 20191108
2 00001 900 0.6 666 20191114
3 00001 200 0.1 111 20191122
4 00001 400 0.3 222 20191129
5 00001 200 0.2 333 20191206
Thanks so much for any advice.
I think the following is what you want:
SELECT t.code
, max(t.test_date) AS test_date
, sum(t.number) AS num
, sum(t.volume) AS vol
, sum(t.value) AS val
FROM test_data t
GROUP BY t.code, TRUNC(TO_DATE(t.test_date,'RRRRMMDD'),'IW')
I just shortened your expression for the first day of the week a bit using TO_DATE and TRUNC. Then you select the maximum test_date for each group, which is the last day in that week for which data exists.
If you want the last day of the week regardless of whether there is data, add the corresponding number of days to the starting day, e.g. TRUNC(TO_DATE(t.test_date,'RRRRMMDD'),'IW') + 6 for Sunday.
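The grouping idea (bucket each row by the Monday of its week, then take the maximum date in the bucket) can be illustrated with a short plain-Python sketch. It only uses the code 00001 rows shown in the question and only the vol column; the helper week_start is a made-up name that plays the role of TRUNC(..., 'IW').

```python
from collections import defaultdict
from datetime import datetime, timedelta

# (code, vol, test_date) for the code 00001 rows listed in the question.
rows = [
    ("00001", 100, "20191104"), ("00001", 100, "20191105"),
    ("00001", 100, "20191106"), ("00001", 100, "20191107"),
    ("00001", 100, "20191108"), ("00001", 100, "20191111"),
    ("00001", 200, "20191112"), ("00001", 200, "20191113"),
    ("00001", 400, "20191114"), ("00001", 200, "20191118"),
]


def week_start(yyyymmdd):
    """Monday of the week containing the date (hypothetical helper, like TRUNC(d, 'IW'))."""
    d = datetime.strptime(yyyymmdd, "%Y%m%d").date()
    return d - timedelta(days=d.weekday())


groups = defaultdict(lambda: {"vol": 0, "last_date": ""})
for code, vol, test_date in rows:
    g = groups[(code, week_start(test_date))]
    g["vol"] += vol
    # YYYYMMDD strings sort chronologically, so max() gives the last day with data.
    g["last_date"] = max(g["last_date"], test_date)

summary = sorted((code, g["last_date"], g["vol"]) for (code, _), g in groups.items())

for row in summary:
    print(row)
```

For the third week only 20191118 is present in the listed sample, so that is the last date this sketch reports for it.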

How can I duplicate data records based on period?

I have a data set organized by period. However, the periods are not consecutive. My data looks like this:
Period Customer_No Date Product
1 111 01.01.2017 X
3 111 05.09.2017 Y
8 111 02.05.2018 Z
6 222 02.02.2017 X
9 222 06.04.2017 Z
12 222 05.09.2018 B
15 222 02.01.2019 A
End of the period should be 15 for all customers. I want to create consecutive periods based on customers and fill them with previous data like below:
Period Customer_No Date Product
1 111 01.01.2017 X
2 111 01.01.2017 X
3 111 05.09.2017 Y
4 111 05.09.2017 Y
5 111 05.09.2017 Y
6 111 05.09.2017 Y
7 111 05.09.2017 Y
8 111 02.05.2018 Z
9 111 02.05.2018 Z
10 111 02.05.2018 Z
11 111 02.05.2018 Z
12 111 02.05.2018 Z
13 111 02.05.2018 Z
14 111 02.05.2018 Z
15 111 02.05.2018 Z
6 222 02.02.2017 X
7 222 02.02.2017 X
8 222 02.02.2017 X
9 222 06.04.2017 Z
10 222 06.04.2017 Z
11 222 06.04.2017 Z
12 222 05.09.2018 B
13 222 05.09.2018 B
14 222 05.09.2018 B
15 222 02.01.2019 A
create table tbl_cust(period int,Customer_No int, Date date, Product varchar)
insert into tbl_cust values(1,111,'01.01.2017','X')
insert into tbl_cust values(3,111,'05.09.2017','Y')
insert into tbl_cust values(8,111,'02.05.2018','Z')
insert into tbl_cust values(6,222,'02.02.2017','X')
insert into tbl_cust values(9,222,'06.04.2017','Z')
insert into tbl_cust values(12,222,'05.09.2018','B')
insert into tbl_cust values(15,222,'02.01.2019','A')
You can use a recursive CTE to generate the rows that you want. Then you need to fill them in with the most recent data. What you really want is lag(ignore nulls), but SQL Server did not support that functionality before SQL Server 2022.
There are only up to 15 rows per customer, so apply is a reasonable alternative:
with cte as (
select min(period) as period, customer_no
from tbl_cust
group by customer_no
union all
select period + 1, customer_no
from cte
where period < 15
)
select cte.period, cte.customer_no, c.date, c.product
from cte cross apply
(select top (1) c.*
from tbl_cust c
where c.customer_no = cte.customer_no and
c.period <= cte.period
order by c.period desc
) c
order by cte.customer_no, cte.period;
You can try this.
select ID as period, Customer_No, [Date], Product from
(VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12),(13),(14),(15)) P(ID)
OUTER APPLY( SELECT *, ROW_NUMBER() OVER(PARTITION BY Customer_No ORDER BY period desc) RN
FROM tbl_cust C WHERE C.period <= P.ID ) X
WHERE X.RN = 1
ORDER BY Customer_No, ID
Result:
period Customer_No Date Product
----------- ----------- ---------- -------
1 111 2017-01-01 X
2 111 2017-01-01 X
3 111 2017-05-09 Y
4 111 2017-05-09 Y
5 111 2017-05-09 Y
6 111 2017-05-09 Y
7 111 2017-05-09 Y
8 111 2018-02-05 Z
9 111 2018-02-05 Z
10 111 2018-02-05 Z
11 111 2018-02-05 Z
12 111 2018-02-05 Z
13 111 2018-02-05 Z
14 111 2018-02-05 Z
15 111 2018-02-05 Z
6 222 2017-02-02 X
7 222 2017-02-02 X
8 222 2017-02-02 X
9 222 2017-06-04 Z
10 222 2017-06-04 Z
11 222 2017-06-04 Z
12 222 2018-05-09 B
13 222 2018-05-09 B
14 222 2018-05-09 B
15 222 2019-02-01 A
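Both answers boil down to the same two steps: generate every period from a customer's first period up to 15, then attach the latest source row whose period is not greater than the generated one. A minimal plain-Python sketch of that fill-forward logic follows; the function name fill_periods is made up for illustration, and dates are kept as the original strings so no date-format interpretation is involved.

```python
# Source rows from the question: (period, customer_no, date, product).
source = [
    (1, 111, "01.01.2017", "X"),
    (3, 111, "05.09.2017", "Y"),
    (8, 111, "02.05.2018", "Z"),
    (6, 222, "02.02.2017", "X"),
    (9, 222, "06.04.2017", "Z"),
    (12, 222, "05.09.2018", "B"),
    (15, 222, "02.01.2019", "A"),
]
LAST_PERIOD = 15


def fill_periods(source, last_period=LAST_PERIOD):
    """Expand each customer to consecutive periods, carrying the latest known row forward."""
    by_customer = {}
    for period, cust, date, product in sorted(source, key=lambda r: (r[1], r[0])):
        by_customer.setdefault(cust, []).append((period, date, product))

    out = []
    for cust, recs in by_customer.items():
        idx = 0
        # Start at the customer's first period, like min(period) in the recursive CTE.
        for period in range(recs[0][0], last_period + 1):
            # Move to the latest record with record.period <= period (the TOP (1) ... DESC step).
            while idx + 1 < len(recs) and recs[idx + 1][0] <= period:
                idx += 1
            out.append((period, cust, recs[idx][1], recs[idx][2]))
    return out


filled = fill_periods(source)
for row in filled:
    print(row)
```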

Exclude rows where keys match, but are on different rows

I'm looking for the best way to produce the result set in the scenario provided. My cust3 column isn't identifying the repeated values in the invid2 column. The end result I'm looking for is to exclude the rows where key1 and key2 match (ids 1, 2, 6 and 7), then sum the amounts where the acctids match. If there's a better way to code this, I welcome all suggestions. Thanks!
WITH T10 as (
SELECT acctid,invid,(
case
when invid like '%-R' then left (InvID,LEN(invid) -2) else InvID
END) as InvID2
FROM table x
GROUP BY acctID,invID
),
T11 as (
SELECT acctid, Invid2, COUNT(InvID2) as cust3
FROM T10
GROUP BY InvID2,acctid
HAVING
COUNT (InvID2) > 1
)
select DISTINCT
a.acctid,
a.name,
b.invid,
C.invid2,
D.cust3,
b.amt,
b.key1,
b.key2
from table a
inner join table b (nolock) on a.acctid = b.acctid
inner join T10 C (nolock) on b.invid = c.invid
inner join T11 D (nolock) on C.invid2 = D.invid2
Resultset
id acctID name invid invid2 Cust3 amt key1 key2
1 123 James 101 101 2 $500 NULL 6789
2 123 james 101-R 101 2 ($500) 6789 NULL
3 123 James 102 102 2 $350 NULL NULL
4 123 James 103 103 2 $200 NULL NULL
5 246 Tony 98-R 98 2 ($750) 7423 NULL
6 432 David 45 45 2 $100 NULL 9634
7 432 David 45-R 45 2 ($100) 9634 NULL
8 359 Stan 39-R 39 2 ($50) 6157 NULL
9 753 George 95 95 2 $365 NULL NULL
10 753 George 108 108 2 $100 NULL NULL
Desired Resultset
id acctID name invid invid2 Cust3 amt key1 key2
1 123 James 101 101 2 $500 NULL 6789
2 123 james 101-R 101 2 ($500) 6789 NULL
3 123 James 102 102 1 $350 NULL NULL
4 123 James 103 103 1 $200 NULL NULL
5 246 Tony 98-R 98 1 ($750) 7423 NULL
6 432 David 45 45 2 $100 NULL 9634
7 432 David 45-R 45 2 ($100) 9634 NULL
8 359 Stan 39-R 39 1 ($50) 6157 NULL
9 753 George 95 95 1 $365 NULL NULL
10 753 George 108 108 1 $100 NULL NULL
Then to sum amt by acctid
id acctid name amt
1 123 James $550
2 246 Tony ($750)
3 359 Stan ($50)
4 753 George $465
Something like:
;WITH Keys as (
SELECT Key1.acctID, [Key] = Key1.Key1
FROM YourTable as Key1
INNER JOIN YourTable as Key2
ON Key1.Key1 = Key2.Key2 and Key1.acctID = Key2.acctID
)
SELECT t.acctID, t.name, amt = SUM(t.amt)
FROM YourTable as t
LEFT JOIN Keys as k
ON t.acctID = k.acctID and (t.Key1 = [Key] or t.Key2 = [Key])
WHERE k.acctID is Null
GROUP BY t.acctID, t.name
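To check this approach against the sample rows, here is a sketch that runs the same anti-join in an in-memory SQLite database. The query is rewritten in portable SQL (SQLite does not accept the [Key] = ... alias syntax), the CTE name matched and the column alias k are made up, and amt is stored as a plain signed integer instead of the formatted ($500) values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE YourTable (
           id INTEGER, acctID INTEGER, name TEXT, invid TEXT,
           amt INTEGER, key1 INTEGER, key2 INTEGER)"""
)
# Rows from the question's result set; negative amt stands for the parenthesised values.
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, 123, "James", "101", 500, None, 6789),
        (2, 123, "james", "101-R", -500, 6789, None),
        (3, 123, "James", "102", 350, None, None),
        (4, 123, "James", "103", 200, None, None),
        (5, 246, "Tony", "98-R", -750, 7423, None),
        (6, 432, "David", "45", 100, None, 9634),
        (7, 432, "David", "45-R", -100, 9634, None),
        (8, 359, "Stan", "39-R", -50, 6157, None),
        (9, 753, "George", "95", 365, None, None),
        (10, 753, "George", "108", 100, None, None),
    ],
)

query = """
WITH matched AS (
    -- Key values that appear as key1 on one row and key2 on another row of the same account.
    SELECT k1.acctID, k1.key1 AS k
    FROM YourTable k1
    JOIN YourTable k2
      ON k1.key1 = k2.key2 AND k1.acctID = k2.acctID
)
SELECT t.acctID, t.name, SUM(t.amt) AS amt
FROM YourTable t
LEFT JOIN matched m
  ON t.acctID = m.acctID AND (t.key1 = m.k OR t.key2 = m.k)
WHERE m.acctID IS NULL          -- keep only rows that are not part of a matched pair
GROUP BY t.acctID, t.name
ORDER BY t.acctID
"""
totals = conn.execute(query).fetchall()

for row in totals:
    print(row)
```

Account 432 disappears entirely because both of its rows form a matched pair, which agrees with the desired summary in the question.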