SQL query to select a column with an expression of a non-aggregate value and an aggregate function

Tables used:
1) v(date d, name c(25), desc c(50), debit n(7), credit n(7))
name in 'v' refers to name in the vn table
2) vn(date d, name c(25), type c(25), obal n(7))
name in 'vn' is a primary key, and different names are grouped by type
ex: names abc, def, ghi belong to type 'bank'; names xyz, pqr belong to type 'ledger', ...
I have a query like this:
SELECT vn.type, SUM(vn.obal + IIF(v.date < sd, v.credit-v.debit, 0)) OpBal, ;
SUM(IIF(BETWEEN(v.date, sd, ed), v.credit-v.debit, 0)) CurBal ;
FROM v, vn WHERE v.name = vn.name GROUP BY vn.type ;
ORDER BY vn.type HAVING OpBal + CurBal != 0
It works fine, but there is one problem: obal is entered only once per name in table 'vn', yet this query adds obal once for every credit-debit row in table 'v', so it gets counted multiple times in OpBal. When the query is modified like below:
SELECT vn.type, vn.obal + SUM(IIF(v.date < sd, v.credit-v.debit, 0)) OpBal, ;
SUM(IIF(BETWEEN(v.date, sd, ed), v.credit-v.debit, 0)) CurBal ;
FROM v, vn WHERE v.name = vn.name GROUP BY vn.type ;
ORDER BY vn.type HAVING OpBal + CurBal != 0
it shows an error message like 'Group by clause is missing or invalid'!
RDBMS used is MS Visual FoxPro 9. sd and ed are date-type variables used for the purposes of the query, where sd < ed.
Please help me out getting the expected result. Thanks a lot.

I saw the SQL syntax for VFP for the first time a few minutes ago, so this could well be full of errors, but as a 'guessful hunch':
SELECT vn.type, ;
       SUM(vn.obal + (SELECT SUM(IIF(v.date < sd, v.credit-v.debit, 0)) ;
                      FROM v ;
                      WHERE v.name = vn.name)) OpBal, ;
       SUM((SELECT SUM(IIF(BETWEEN(v.date, sd, ed), v.credit-v.debit, 0)) ;
            FROM v ;
            WHERE v.name = vn.name)) CurBal ;
FROM vn ;
GROUP BY vn.type ;
ORDER BY vn.type ;
HAVING OpBal + CurBal != 0
Basically, I've just turned the selection from v into subselects to avoid vn.obal being repeated. It shouldn't matter to v that the sum is taken per individual name first before summing them all together.

Just a few things. VFP 9 has a setting to NOT require grouping of all non-aggregate columns, for backward compatibility and for MySQL-like results where not all columns must be aggregates, such as querying extra columns from a customer's record that never change no matter how many records you join against its PK column (name, address, phone, whatever).
SET ENGINEBEHAVIOR 80 && default for VFP 9
SET ENGINEBEHAVIOR 90 && requires all non-GROUP BY columns to be aggregates to comply
Next... it looks like you have some very bad column names in the tables you are dealing with: three reserved words in VFP ("date", "name" and "type"). However, you are OK as long as you qualify them in the query with an alias.column reference.
The following sample code will create temporary tables (cursors) with the structures you've described in your question. I've also inserted some sample data and simulated your "sd" (start date) and "ed" (end date) variables:
CREATE CURSOR vn;
( date d, ;
name c(25), ;
type c(25), ;
obal n(7) )
INSERT INTO vn VALUES ( CTOD( "5/20/2012" ), "person 1", "person type 1", 125 )
INSERT INTO vn VALUES ( CTOD( "5/20/2012" ), "person 2", "another type ", 2155 )
CREATE CURSOR v;
( date d, ;
name c(25), ;
desc c(50), ;
debit n(7), ;
credit n(7))
INSERT INTO V VALUES ( CTOD( "6/1/2012" ), "person 1", "description 1", 10, 32 )
INSERT INTO V VALUES ( CTOD( "6/2/2012" ), "person 1", "desc 2", 235, 123 )
INSERT INTO V VALUES ( CTOD( "6/3/2012" ), "person 1", "desc 3", 22, 4 )
INSERT INTO V VALUES ( CTOD( "6/4/2012" ), "person 1", "desc 4", 53, 36 )
INSERT INTO V VALUES ( CTOD( "6/5/2012" ), "person 1", "desc 5", 31, 3 )
INSERT INTO V VALUES ( CTOD( "6/1/2012" ), "person 2", "another 1", 43, 664 )
INSERT INTO V VALUES ( CTOD( "6/4/2012" ), "person 2", "more desc", 78, 332 )
INSERT INTO V VALUES ( CTOD( "6/6/2012" ), "person 2", "anything", 366, 854 )
sd = CTOD( "6/3/2012" ) && start date of transactions
ed = DATE() && current date as the end date...
Now, the querying... You want groups by type, but the per-person (name) data needs to be pre-aggregated on a per-name basis FIRST. It appears you are trying to get a total opening balance of transactions prior to the start date (sd) as the basis at a given point in time, then look at activity WITHIN the start/end dates in question. Do this first, but don't deal with adding in the "obal" column from the "vn" table yet. Since the engine wants aggregates of non-GROUP BY columns, I would just use MAX() on that column. Since it is grouped on a PK (name) basis, you'll end up with whatever the value was, but with the rolled-up totals of transactions, and all your data pre-summarized into a single row per name via...
select;
vn.name,;
vn.type,;
MAX( vn.obal ) as BalByNameOnly,;
SUM( IIF( v.date < sd, v.credit-v.debit, 000000.00 )) OpBal, ;
SUM( IIF( BETWEEN(v.date, sd, ed), v.credit - v.debit, 000000.00 )) CurBal ;
FROM ;
v,;
vn ;
WHERE ;
v.name = vn.name;
GROUP BY ;
vn.Name,;
vn.Type;
INTO ;
CURSOR C_JustByName READWRITE
With this, the results (from my sample data) would look like...
Name Type BalByNameOnly OpBal CurBal
person 1 person type 1 125 -90 -63
person 2 another type 2155 621 742
For your final aggregate by type, you can just query the above result "cursor" (C_JustByName) and use IT to get your grouping by type, having, etc... something like
SELECT ;
    JBN.type, ;
    SUM( JBN.BalByNameOnly - JBN.OpBal ) as OpBal,;
    SUM( JBN.CurBal ) as CurBal ;
    FROM ;
        C_JustByName JBN ;
    GROUP BY ;
        JBN.type ;
    ORDER BY ;
        JBN.type ;
    HAVING ;
        OpBal + CurBal != 0;
    INTO ;
        CURSOR C_Final
Now, I am just simplifying the above because I don't know what you are really looking for: the date within your "VN" table (which appears to be like a customer table) is unclear in purpose, as is its obal column with respect to the transactions table.
The nice thing about VFP is that you can query into a temporary cursor without creating a permanent table and use IT as the basis for any querying after that... It helps readability by not having to nest query inside query inside query. It also lets you see the results of each layer and know you are getting the answers you are EXPECTING before continuing to the next query phase...
Hopefully this will help you in the direction of what you are trying to solve.

Once again, this question was bumped to the front page :( Earlier I made a comment like:
It is unfortunate that this question from 6 years back has been bumped to the front page :( The reason is that the question is starving for some explanations and the design is screaming of flaws. Without knowing those details no answer would be good.
1) Never ever rely on the old enginebehavior to work around this error. It was a BUG and was corrected. Relying on a bug is not the way to go.
2) To create a date value, never ever do that by converting from a string. That is settings dependent. Better either use solid date(), datetime() functions or strict date/datetime literals.
That comment stands and there are more, anyway.
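For instance, point 2 can be done in two settings-independent ways (a small sketch, reusing the question's variable names):
* safe date construction in VFP 9, immune to SET DATE / SET CENTURY
sd = DATE(2012, 6, 3)   && DATE(nYear, nMonth, nDay) constructor
ed = {^2012-06-30}      && strict date literal, always {^yyyy-mm-dd}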
Let's add more about the flaws and a reply based on what I understand from the question.
In the question the OP says:"
SELECT vn.type, SUM(vn.obal + IIF(v.date < sd, v.credit-v.debit, 0)) OpBal, ;
SUM(IIF(BETWEEN(v.date, sd, ed), v.credit-v.debit, 0)) CurBal ;
FROM v, vn WHERE v.name = vn.name GROUP BY vn.type ;
ORDER BY vn.type HAVING OpBal + CurBal != 0
works fine."
But of course, any seasoned SQL developer would immediately see that it would not work fine; the results could only be correct coincidentally. Let's make it clear why it doesn't really work. First, let's create some cursors describing the OP's data:
CREATE CURSOR vn ( date d, name c(25), type c(25), obal n(7) )
INSERT INTO vn (date, name, type, obal) VALUES ( DATE(2012,5,20), "abc", "bank", 100 )
INSERT INTO vn (date, name, type, obal) VALUES ( DATE(2012,5,20), "def", "bank", 200 )
INSERT INTO vn (date, name, type, obal) VALUES ( DATE(2012,5,20), "ghi", "bank", 300 )
INSERT INTO vn (date, name, type, obal) VALUES ( DATE(2012,5,20), "xyz", "ledger", 400 )
INSERT INTO vn (date, name, type, obal) VALUES ( DATE(2012,5,20), "pqr", "ledger", 500 )
CREATE CURSOR v ( date d, name c(25), desc c(50), debit n(7), credit n(7))
INSERT INTO V (date,name,desc,debit,credit) VALUES ( DATE(2012,6,1), "abc", "description 1", 50, 0 )
INSERT INTO V (date,name,desc,debit,credit) VALUES ( DATE(2012,6,2), "abc", "description 1", 60, 0 )
INSERT INTO V (date,name,desc,debit,credit) VALUES ( DATE(2012,6,3), "abc", "description 1", 70, 0 )
INSERT INTO V (date,name,desc,debit,credit) VALUES ( DATE(2012,6,1), "def", "description 1", 50, 0 )
INSERT INTO V (date,name,desc,debit,credit) VALUES ( DATE(2012,6,2), "def", "description 1", 60, 0 )
INSERT INTO V (date,name,desc,debit,credit) VALUES ( DATE(2012,6,3), "def", "description 1", 70, 0 )
INSERT INTO V (date,name,desc,debit,credit) VALUES ( DATE(2012,6,1), "ghi", "description 1", 50, 0 )
INSERT INTO V (date,name,desc,debit,credit) VALUES ( DATE(2012,6,2), "ghi", "description 1", 60, 0 )
INSERT INTO V (date,name,desc,debit,credit) VALUES ( DATE(2012,6,4), "xyz", "description 1", 50, 0 )
INSERT INTO V (date,name,desc,debit,credit) VALUES ( DATE(2012,6,5), "xyz", "description 1", 60, 0 )
INSERT INTO V (date,name,desc,debit,credit) VALUES ( DATE(2012,6,6), "pqr", "description 1", 50, 0 )
INSERT INTO V (date,name,desc,debit,credit) VALUES ( DATE(2012,6,7), "pqr", "description 1", 60, 0 )
INSERT INTO V (date,name,desc,debit,credit) VALUES ( DATE(2012,6,8), "pqr", "description 1", 70, 0 )
Let's run OP's query on this data:
LOCAL sd,ed
sd = date(2012,6,1) && start date of transactions
ed = DATE() && current date as the end date...
SELECT vn.type, SUM(vn.obal + IIF(v.date < sd, v.credit-v.debit, 0)) OpBal, ;
SUM(IIF(BETWEEN(v.date, sd, ed), v.credit-v.debit, 0)) CurBal ;
FROM v, vn WHERE v.name = vn.name GROUP BY vn.type ;
ORDER BY vn.type HAVING OpBal + CurBal != 0
We get a result like:
TYPE OPBAL CURBAL
------- ----- ------
bank 1500 -470
ledger 2300 -290
which is obviously incorrect. With a query like this, the more credit or debit activity you have, the bigger the opening balance gets. Let's see why that is happening: removing the GROUP BY and aggregation, check what we are really summing up:
SELECT vn.type, vn.name, v.date, vn.obal, v.credit, v.debit ;
FROM v, vn ;
WHERE v.name = vn.name
Output:
TYPE NAME DATE OBAL CREDIT DEBIT
bank abc 06/01/2012 100 0 50
bank abc 06/02/2012 100 0 60
bank abc 06/03/2012 100 0 70
bank def 06/01/2012 200 0 50
bank def 06/02/2012 200 0 60
bank def 06/03/2012 200 0 70
bank ghi 06/01/2012 300 0 50
bank ghi 06/02/2012 300 0 60
ledger xyz 06/04/2012 400 0 50
ledger xyz 06/05/2012 400 0 60
ledger pqr 06/06/2012 500 0 50
ledger pqr 06/07/2012 500 0 60
ledger pqr 06/08/2012 500 0 70
You can see that, say, for 'abc' the OBal of 100 is repeated 3 times, because of the 3 entries in v. Summing makes it 300 when it is just 100.
When you are using an aggregate like SUM() or AVG(), you should do the aggregation first, without a join, and then do your join. You can still do the aggregation with a join, PROVIDED the join results in a 1-to-many relation. If the above resultset were:
TYPE NAME OBAL CREDIT DEBIT
bank abc 100 0 180
bank def 200 0 180
bank ghi 300 0 110
ledger xyz 400 0 110
ledger pqr 500 0 180
it would be OK to SUM() by type (the 1 side of the 1-to-many).
Having said that, and adding that VFP supports subqueries, let's write a solution:
Local sd,ed
sd = Date(2012,6,1) && start date of transactions
ed = Date() && current date as the end date...
Select vn.Type, Sum(vn.OBal - tmp.TotCd) As Opbal, Sum(tmp.Curbal) As Curbal ;
FROM vn ;
LEFT Join ;
(Select v.Name, Sum(Iif(v.Date < sd, v.credit-v.debit, 0)) TotCd, ;
SUM(Iif(Between(v.Date, sd, ed), v.credit-v.debit, 0)) Curbal ;
FROM v ;
GROUP By v.Name ) tmp On tmp.Name = vn.Name ;
GROUP By vn.Type ;
ORDER By vn.Type ;
HAVING Sum(vn.OBal - tmp.TotCd + tmp.Curbal) != 0
we get what we want:
TYPE OPBAL CURBAL
------- ----- ------
bank 600 -470
ledger 900 -290
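One caveat on this solution (my addition, not part of the original answer): because of the LEFT JOIN, a name in vn with no transactions in v yields NULL for tmp.TotCd and tmp.Curbal, which would knock that name's OBal out of the sums. Wrapping the joined columns in NVL() keeps such names counted; a sketch:
Select vn.Type, Sum(vn.OBal - Nvl(tmp.TotCd, 0)) As Opbal, Sum(Nvl(tmp.Curbal, 0)) As Curbal ;
    FROM vn ;
    LEFT Join ;
    (Select v.Name, Sum(Iif(v.Date < sd, v.credit-v.debit, 0)) TotCd, ;
    SUM(Iif(Between(v.Date, sd, ed), v.credit-v.debit, 0)) Curbal ;
    FROM v ;
    GROUP By v.Name ) tmp On tmp.Name = vn.Name ;
    GROUP By vn.Type ;
    ORDER By vn.Type ;
    HAVING Sum(vn.OBal - Nvl(tmp.TotCd, 0) + Nvl(tmp.Curbal, 0)) != 0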

How to compare range of integer values in PL/SQL?

I am trying to compare ranges of integer values between a test table and a reference table. If any range of values from the test table overlaps with a range in the reference table, the overlapping values should be removed, splitting the test range where necessary.
Sorry if it's not clear, but here is some example data:
TEST_TABLE:
MIN MAX
10 121
122 648
1200 1599
REFERENCE_TABLE:
MIN MAX
50 106
200 1400
1450 1500
MODIFIED TEST_TABLE: (expected result after running PL/SQL)
MIN MAX
10 49
107 121
122 199
1401 1449
1501 1599
In the first row of the example above, 10-121 has been cut down into two rows, 10-49 and 107-121, because the values 50, 51, ..., 106 are covered by the first row of REFERENCE_TABLE (50-106); and so on.
Here's what I've written so far, using nested loops. I've created two additional temp tables that store every individual value from the test ranges that is found in the reference table; from those it creates new sets of ranges to be inserted into test_table.
But this does not seem to work correctly, and it might cause performance issues, especially if we're dealing with values in the millions and above:
CREATE TABLE new_table (num_value NUMBER);
CREATE TABLE new_table_next (num_value NUMBER, next_value NUMBER);
-- PL/SQL start
DECLARE
l_count NUMBER;
l_now_min NUMBER;
l_now_max NUMBER;
l_final_min NUMBER;
l_final_max NUMBER;
BEGIN
FOR now IN (SELECT min_num, max_num FROM test_table) LOOP
l_now_min:=now.min_num;
l_now_max:=now.max_num;
WHILE (l_now_min < l_now_max) LOOP
SELECT COUNT(*) -- to check if number is found in reference table
INTO l_count
FROM reference_table refr
WHERE l_now_min >= refr.min_num
AND l_now_min <= refr.max_num;
IF l_count > 0 THEN
INSERT INTO new_table (num_value) VALUES (l_now_min);
COMMIT;
END IF;
l_now_min:=l_now_min+1;
END LOOP;
INSERT INTO new_table_next (num_value, next_value)
SELECT num_value, (SELECT MIN (num_value) FROM new_table t2 WHERE t2.num_value > t.num_value) AS next_value FROM new_table t;
DELETE FROM test_table t
WHERE now.min_num = t.min_num
AND now.max_num = t.max_num;
COMMIT;
SELECT (num_value + 1) INTO l_final_min FROM new_table_next;
SELECT (next_value - num_value - 2) INTO l_final_max FROM new_table_next;
INSERT INTO test_table (min_num, max_num)
VALUES (l_final_min, l_final_max);
COMMIT;
DELETE FROM new_table;
DELETE FROM new_table_next;
COMMIT;
END LOOP;
END;
/
Please help, I'm stuck. :)
The idea behind this approach is to unwind both tables, keeping track of whether the numbers are in the reference table or the original table. This is really cumbersome, because adjacent values can cause problems.
The idea is then to do a "gaps-and-islands" type solution along both dimensions, and then only keep the values that are in the original table and not in the reference table. Perhaps this could be called "exclusionary gaps-and-islands".
Here is a working version:
with vals as (
select min as x, 1 as inc, 0 as is_ref
from test_table
union all
select max + 1, -1 as inc, 0 as is_ref
from test_table
union all
select min as x, 0, 1 as is_ref
from reference_table
union all
select max + 1 as x, 0, -1 as is_ref
from reference_table
)
select min, max
from (select refgrp, incgrp, ref, inc2, min(x) as min, (lead(min(x), 1, max(x) + 1) over (order by min(x)) - 1) as max
from (select v.*,
row_number() over (order by x) - row_number() over (partition by ref order by x) as refgrp,
row_number() over (order by x) - row_number() over (partition by inc2 order by x) as incgrp
from (select v.*, sum(is_ref) over (order by x, inc) as ref,
sum(inc) over (order by x, inc) as inc2
from vals v
) v
) v
group by refgrp, incgrp, ref, inc2
) v
where ref = 0 and inc2 = 1 and min < max
order by min;
And here is a db<>fiddle.
The inverse problem of getting the overlaps is much easier. It might be feasible to "invert" the reference table to handle this.
select greatest(tt.min, rt.min), least(tt.max, rt.max)
from test_table tt join
reference_table rt
on tt.min < rt.max and tt.max > rt.min -- is there an overlap?
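Against the sample data this returns the overlapping pieces (50, 106), (200, 648), (1200, 1400) and (1450, 1500); the expected MODIFIED TEST_TABLE is exactly what is left of TEST_TABLE once those pieces are cut away.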
This is modified from a similar task (using dates instead of numbers) I did on Teradata. It's based on the same base data as Gordon's (all begin/end values combined in a single list), but uses a simpler logic:
WITH minmax AS
( -- create a list of all existing start/end values (possible to simplify using Unpivot or Cross Apply)
SELECT Min AS val, -1 AS prio, 1 AS flag -- main table, range start
FROM test_table
UNION ALL
SELECT Max+1, -1, -1 -- main table, range end
FROM test_table
UNION ALL
SELECT Min, 1, 1 -- reference table, adjusted range start
FROM reference_table
UNION ALL
SELECT Max+1, 1, -1 -- reference table, adjusted range end
FROM reference_table
)
, all_ranges AS
( -- create all ranges from current to next row
SELECT minmax.*,
Lead(val) Over (ORDER BY val, prio desc, flag) AS next_val, -- next value = end of range
Sum(flag) Over (ORDER BY val, prio desc, flag ROWS Unbounded Preceding) AS Cnt -- how many overlapping periods exist
FROM minmax
)
SELECT val, next_val-1
FROM all_ranges
WHERE Cnt = 1 -- 1st level only
AND prio + flag = 0 -- either (prio -1 and flag 1) = range start in base table
-- or (prio 1 and flag -1) = range end in ref table
ORDER BY 1
See db-fiddle
Here's one way to do this. I put the test data in a WITH clause rather than creating the tables (I find this is easier for testing purposes). I used your column names (MIN and MAX); these are very poor choices though, as MIN and MAX are Oracle keywords. They will generate confusion for sure, and they may cause queries to error out.
The strategy is simple: first take the COMPLEMENT of the ranges in REFERENCE_TABLE, which will also be a union of intervals (using NULL as the marker for minus infinity and plus infinity); then take the intersection of each interval in TEST_TABLE with each interval in the complement of REFERENCE_TABLE. How that is done is shown in the final (outer) query of the solution below.
with
test_table (min, max) as (
select 10, 121 from dual union all
select 122, 648 from dual union all
select 1200, 1599 from dual
)
, reference_table (min, max) as (
select 50, 106 from dual union all
select 200, 1400 from dual union all
select 1450, 1500 from dual
)
,
prep (min, max) as (
select lag(max) over (order by max) + 1 as min
, min - 1 as max
from ( select min, max from reference_table
union all
select null, null from dual
)
)
select greatest(t.min, nvl(p.min, t.min)) as min
, least (t.max, nvl(p.max, t.max)) as max
from test_table t inner join prep p
on t.min <= nvl(p.max, t.max)
and t.max >= nvl(p.min, t.min)
order by min
;
MIN MAX
---------- ----------
10 49
107 121
122 199
1401 1449
1501 1599
An example to resolve the problem:
CREATE TABLE xrange_reception
(
vdeb NUMBER,
vfin NUMBER
);
CREATE TABLE xrange_transfert
(
vdeb NUMBER,
vfin NUMBER
);
CREATE TABLE xrange_resultat
(
vdeb NUMBER,
vfin NUMBER
);
insert into xrange_reception values (10,50);
insert into xrange_transfert values (15,25);
insert into xrange_transfert values (30,33);
insert into xrange_transfert values (40,45);
DECLARE
CURSOR cr_rec IS SELECT * FROM xrange_reception;
CURSOR cr_tra IS
SELECT *
FROM xrange_transfert
ORDER BY vdeb;
i NUMBER;
vdebSui NUMBER;
BEGIN
FOR rc IN cr_rec
LOOP
i := 1;
vdebSui := NULL;
FOR tr IN cr_tra
LOOP
IF tr.vdeb BETWEEN rc.vdeb AND rc.vfin
THEN
IF i = 1 AND tr.vdeb > rc.vdeb
THEN
INSERT INTO xrange_resultat (vdeb, vfin)
VALUES (rc.vdeb, tr.vdeb - 1);
ELSIF i = cr_rec%ROWCOUNT AND tr.vfin < rc.vfin
THEN
INSERT INTO xrange_resultat (vdeb, vfin)
VALUES (tr.vfin, rc.vfin);
ELSIF vdebSui < tr.vdeb
THEN
INSERT INTO xrange_resultat (vdeb, vfin)
VALUES (vdebSui + 1, tr.vdeb - 1);
END IF;
vdebSui := tr.vfin;
i := i + 1;
END IF;
END LOOP;
IF vdebSui IS NOT NULL THEN
IF vdebSui < rc.vfin
THEN
INSERT INTO xrange_resultat (vdeb, vfin)
VALUES (vdebSui + 1, rc.vfin);
END IF;
ELSE
INSERT INTO xrange_resultat (vdeb, vfin)
VALUES (rc.vdeb, rc.vfin);
END IF;
END LOOP;
END;
So:
Table xrange_reception:
vdeb vfin
10 50
Table xrange_transfert:
vdeb vfin
15 25
30 33
40 45
Table xrange_resultat:
vdeb vfin
10 14
26 29
34 39
46 50

Reshape from wide to long in BigQuery (standard SQL)

Unfortunately reshaping in BigQuery is not as easy as in R, and I can't export my data for this project.
Here is input
date country A B C D
20170928 CH 3000.3 121 13 3200
20170929 CH 2800.31 137 23 1614.31
Expected output
date country Metric Value
20170928 CH A 3000.3
20170928 CH B 121
20170928 CH C 13
20170928 CH D 3200
20170929 CH A 2800.31
20170929 CH B 137
20170929 CH C 23
20170929 CH D 1614.31
Also, my table has many more columns and rows (so I assume a lot of manual work would be required).
Below is for BigQuery Standard SQL and does not require repeating SELECTs that depend on the number of columns. It will pick up as many columns as you have and transform them into metrics and values.
#standardSQL
SELECT DATE, country,
metric, SAFE_CAST(value AS FLOAT64) value
FROM (
SELECT DATE, country,
REGEXP_REPLACE(SPLIT(pair, ':')[OFFSET(0)], r'^"|"$', '') metric,
REGEXP_REPLACE(SPLIT(pair, ':')[OFFSET(1)], r'^"|"$', '') value
FROM `project.dataset.yourtable` t,
UNNEST(SPLIT(REGEXP_REPLACE(to_json_string(t), r'{|}', ''))) pair
)
WHERE NOT LOWER(metric) IN ('date', 'country')
You can test / play with the above using dummy data as in your question:
#standardSQL
WITH `project.dataset.yourtable` AS (
SELECT '20170928' DATE, 'CH' country, 3000.3 A, 121 B, 13 C, 3200 D UNION ALL
SELECT '20170929', 'CH', 2800.31, 137, 23, 1614.31
)
SELECT DATE, country,
metric, SAFE_CAST(value AS FLOAT64) value
FROM (
SELECT DATE, country,
REGEXP_REPLACE(SPLIT(pair, ':')[OFFSET(0)], r'^"|"$', '') metric,
REGEXP_REPLACE(SPLIT(pair, ':')[OFFSET(1)], r'^"|"$', '') value
FROM `project.dataset.yourtable` t,
UNNEST(SPLIT(REGEXP_REPLACE(to_json_string(t), r'{|}', ''))) pair
)
WHERE NOT LOWER(metric) IN ('date', 'country')
result is as expected
DATE country metric value
20170928 CH A 3000.3
20170928 CH B 121.0
20170928 CH C 13.0
20170928 CH D 3200.0
20170929 CH A 2800.31
20170929 CH B 137.0
20170929 CH C 23.0
20170929 CH D 1614.31
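One caveat worth adding (mine, not the answerer's): the to_json_string()/SPLIT-on-comma trick assumes no value contains an embedded comma or colon; that holds for simple identifiers like country and for purely numeric metrics, but it would break on free-form string values.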
You need a UNION, which is denoted using commas in BigQuery legacy SQL:
SELECT date, country, Metric, Value
FROM (
SELECT date, country, 'A' as Metric, A as Value FROM your_table
), (
SELECT date, country, 'B' as Metric, B as Value FROM your_table
), (
SELECT date, country, 'C' as Metric, C as Value FROM your_table
) , (
SELECT date, country, 'D' as Metric, D as Value FROM your_table
)
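In standard SQL the same manual unpivot is spelled with explicit UNION ALL; here is a sketch against the earlier placeholder table name (and, on newer BigQuery, the built-in UNPIVOT operator can remove the repetition entirely):
#standardSQL
SELECT DATE, country, 'A' AS metric, A AS value FROM `project.dataset.yourtable` UNION ALL
SELECT DATE, country, 'B', B FROM `project.dataset.yourtable` UNION ALL
SELECT DATE, country, 'C', C FROM `project.dataset.yourtable` UNION ALL
SELECT DATE, country, 'D', D FROM `project.dataset.yourtable`
-- or, on newer BigQuery (CAST the metric columns first if their types differ):
-- SELECT DATE, country, metric, value
-- FROM `project.dataset.yourtable`
-- UNPIVOT(value FOR metric IN (A, B, C, D))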
Most answers that I managed to find required specifying the name of EVERY column to be melted. This is not tractable when I have hundreds to thousands of columns in the table. Here is an answer that works for an arbitrarily wide table.
It utilizes dynamic SQL and automatically extracts the column names from the data schema, collates a command string, and then evaluates that string. This is intended to mimic Python pandas.melt() / R reshape2::melt() behavior.
I intentionally did not create user-defined functions because of some undesirable properties of UDFs. Depending on how you use this, you may or may not want to do that.
Input:
id0 id1 _2020_05_27 _2020_05_28
1 1 11 12
1 2 13 14
2 1 15 16
2 2 17 18
Output:
id0 id1 date value
1 2 _2020_05_27 13
1 2 _2020_05_28 14
2 2 _2020_05_27 17
2 2 _2020_05_28 18
1 1 _2020_05_27 11
1 1 _2020_05_28 12
2 1 _2020_05_27 15
2 1 _2020_05_28 16
#standardSQL
-- PANDAS MELT FUNCTION IN GOOGLE BIGQUERY
-- author: Luna Huang
-- email: lunahuang@google.com
-- run this script with Google BigQuery Web UI in the Cloud Console
-- this piece of code functions like the pandas melt function
-- pandas.melt(id_vars, value_vars, var_name, value_name, col_level=None)
-- without utilizing user defined functions (UDFs)
-- see below for where to input corresponding arguments
DECLARE cmd STRING;
DECLARE subcmd STRING;
SET cmd = ("""
WITH original AS (
-- query to retrieve the original table
%s
),
nested AS (
SELECT
[
-- sub command to be automatically generated
%s
] as s,
-- equivalent to id_vars in pandas.melt()
%s,
FROM original
)
SELECT
-- equivalent to id_vars in pandas.melt()
%s,
-- equivalent to var_name in pandas.melt()
s.key AS %s,
-- equivalent to value_name in pandas.melt()
s.value AS %s,
FROM nested
CROSS JOIN UNNEST(nested.s) AS s
""");
SET subcmd = ("""
WITH
columns AS (
-- query to retrieve the column names
-- equivalent to value_vars in pandas.melt()
-- the resulting table should have only one column
-- with the name: column_name
%s
),
scs AS (
SELECT FORMAT("STRUCT('%%s' as key, %%s as value)", column_name, column_name) AS sc
FROM columns
)
SELECT ARRAY_TO_STRING(ARRAY (SELECT sc FROM scs), ",\\n")
""");
-- -- -- EXAMPLE BELOW -- -- --
-- SET UP AN EXAMPLE TABLE --
CREATE OR REPLACE TABLE `tmp.example`
(
id0 INT64,
id1 INT64,
_2020_05_27 INT64,
_2020_05_28 INT64
);
INSERT INTO `tmp.example` VALUES (1, 1, 11, 12);
INSERT INTO `tmp.example` VALUES (1, 2, 13, 14);
INSERT INTO `tmp.example` VALUES (2, 1, 15, 16);
INSERT INTO `tmp.example` VALUES (2, 2, 17, 18);
-- MELTING STARTS --
-- execute these two command to melt the table
-- the first generates the STRUCT commands
-- and saves a string in subcmd
EXECUTE IMMEDIATE FORMAT(
-- please do not change this argument
subcmd,
-- query to retrieve the column names
-- equivalent to value_vars in pandas.melt()
-- the resulting table should have only one column
-- with the name: column_name
"""
SELECT column_name
FROM `tmp.INFORMATION_SCHEMA.COLUMNS`
WHERE (table_name = "example") AND (column_name NOT IN ("id0", "id1"))
"""
) INTO subcmd;
-- the second implements the melting
EXECUTE IMMEDIATE FORMAT(
-- please do not change this argument
cmd,
-- query to retrieve the original table
"""
SELECT *
FROM `tmp.example`
""",
-- please do not change this argument
subcmd,
-- equivalent to id_vars in pandas.melt()
-- !!please type these twice!!
"id0, id1", "id0, id1",
-- equivalent to var_name in pandas.melt()
"date",
-- equivalent to value_name in pandas.melt()
"value"
);

How to add and subtract values from previous rows based on a condition

I have a table with values
Slno Type Amount
1 P 40
2 C 20
3 P 45
4 P 20
5 C 10
I want to get values for RESULT column.
Type Amount RESULT
P 40 40
C 20 20
P 45 65
P 20 85
C 10 75
If Type is C then the value gets subtracted from the previous result;
if Type is P then the value gets added to the previous result.
This is what I've tried:
;WITH FINALMIDRESULT
AS (SELECT Type,
Amount,
Row_number()
OVER(
ORDER BY Slno ASC) rownum
FROM #midRes)
SELECT C1.Type,
C1.Amount,
CASE
WHEN C1.Type = 'C' THEN (SELECT Sum(Amount)
FROM FINALMIDRESULT c2
WHERE c2.rownum <= C1.rownum)
ELSE (SELECT Sum(Amount) - Sum(Amount)
FROM FINALMIDRESULT c2
WHERE c2.rownum <= C1.rownum)
END AS RESULT
FROM FINALMIDRESULT C1
This is the result that I got:
Type Amount RESULT
P 40 0
C 20 60
P 45 0
P 20 0
C 10 135
You need a self join to sum all the values with Slno less than the current row's (the code below uses a LEFT JOIN so the first row, which has no previous rows, is kept), like below:
;WITH OriginalData AS
( SELECT *
FROM
( VALUES
(1, 'P', 40),
(2, 'C', 20),
(3, 'P', 45),
(4, 'P', 20),
(5, 'C', 10)
) AS Temp(Slno, Type, Amount)
)
SELECT [Current].Type, [Current].Amount,
ISNULL(SUM(
CASE WHEN [Previous].Type = 'P'
THEN +[Previous].Amount
ELSE -[Previous].Amount
END),0) +
CASE WHEN [Current].Type = 'P'
THEN +[Current].Amount
ELSE -[Current].Amount
END Result
FROM OriginalData [Current]
LEFT JOIN OriginalData [Previous]
ON [Previous].Slno < [Current].Slno
GROUP BY [Current].Slno, [Current].Type, [Current].Amount
ORDER BY [Current].Slno
I think the biggest change you can make is to shift your mindset. When you think "previous values" you choose a procedural path, which could be followed in any major programming language but rapidly devolves into a cursor approach in SQL, and that isn't appropriate in this case.
When it comes to SQL, you need to think in "sets", so you can direct your efforts to identifying those data sets and combining them.
SELECT SlNo, Type, Amount,
Sum((Case when Type='C' then -1 else 1 END)*Amount) Over(Order by SlNo) Result
FROM TableName
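Applied to the sample rows, the signed running total yields 40, 20, 65, 85, 75, exactly the expected RESULT column. Note that this windowed form of SUM() OVER (ORDER BY ...) requires SQL Server 2012 or later; on older versions, fall back to the self-join approach above.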

SQL query to divide a value over different records

Consider the following recordset:
1 1000 -1
2 500 2
3 1000 -1
4 500 3
5 500 2
6 1000 -1
7 500 1
So the number 1000 appears 3 times with value -1, a total of -3,
and the number 500 appears 4 times with different values.
Now I need a query which divides the sum for code 1000 over the 4 code-500 records and removes code 1000.
So the end result would look like:
1 500 1.25
2 500 2.25
3 500 1.25
4 500 0.25
The sum for code 1000 = -3.
There are 4 rows with code 500 in the table over which -3 has to be divided:
-3/4 = -0.75
so the record "2 500 2" becomes "2 500 (2 + -0.75)" = 1.25
etc.
As an SQL newbie I have no clue how to get this done; can anyone help?
You can use CTEs to do it "step-wise" and build your solution. Like this:
with sumup as
(
select sum(colb) as s
from table
where cola = 1000
), countup as
(
select count(*) as c
from table
where cola = 500
), change as
(
select s / c as v
from sumup, countup
)
select cola, colb + v
from table, change
where cola = 500
Two things to note:
This might not be the fastest solution, but it is often close.
You can test this code easily: just change the final select statement to select from one of the CTEs and see what it returns. For example, this would be a good test if you are getting a bad result:
with sumup as
(
select sum(colb) as s
from table
where cola = 1000
), countup as
(
select count(*) as c
from table
where cola = 500
), change as
(
select s / c as v
from sumup, countup
)
select * from change
Select col1,
       ( (Select Sum(col2)
          From tab
          Where col1 = 1000)
         /
         (Select Count(*)
          From tab
          Where col1 = 500) ) + col2 as new_value
From tab
Where col1 = 500
Here tab, col1, and col2 are the table name, the column holding the 1000/500 codes, and the column holding the values (1, 2, 3, ...).
This will give the results you are after:
DECLARE #T TABLE (ID INT, Number INT, Value INT)
INSERT #T (ID, Number, Value)
VALUES
(1, 1000, -1),
(2, 500, 2),
(3, 1000, -1),
(4, 500, 3),
(5, 500, 2),
(6, 1000,-1),
(7, 500, 1);
SELECT Number, Value, NewValue = Value + (x.Total / COUNT(*) OVER())
FROM #T T
CROSS JOIN
( SELECT Total = CAST(SUM(Value) AS FLOAT)
FROM #T
WHERE Number = 1000
) x
WHERE T.Number = 500;
Inside the cross join we simply get the sum where the number is 1000; this could just as easily be done as a subselect:
SELECT Number, Value, NewValue = Value + ((SELECT CAST(SUM(Value) AS FLOAT) FROM #T WHERE Number = 1000) / COUNT(*) OVER())
FROM #T T
WHERE T.Number = 500;
Or with a variable:
DECLARE #Total FLOAT = (SELECT SUM(Value) FROM #T WHERE Number = 1000);
SELECT Number, Value, NewValue = Value + (#Total / COUNT(*) OVER())
FROM #T T
WHERE T.Number = 500;
Then, using the analytic function COUNT(*) OVER(), you count the total number of rows where the number is 500.
And here is another solution:
select number1, value1,
value1
+ (select sum(value1) from table1 where number1=1000)/
(select count(*) from table1 where number1=500) calc_value
from table1 where number1=500
http://sqlfiddle.com/#!6/c68a0/1
I hope I got your question right; if so, this is imho the easiest one to read.

Inserting missing rows with a join

I have a SQL script that returns this derived table.
MM/YYYY Cat Score
01/2012 Test1 17
02/2012 Test1 19
04/2012 Test1 15
05/2012 Test1 16
07/2012 Test1 14
08/2012 Test1 15
09/2012 Test1 15
12/2012 Test1 11
01/2013 Test2 10
02/2013 Test2 15
03/2013 Test2 13
05/2013 Test2 18
06/2013 Test2 14
08/2013 Test2 15
09/2013 Test2 14
12/2013 Test2 10
As you can see, I am missing some MM/YYYYs (03/2012, 06/2012, 11/2012, etc).
I would like to fill in the missing MM/YYYYs with the Cat and a 0 (zero) for the Score.
I have tried joining a table that contains all the MM/YYYYs for the range the query covers, but this only returns each missing row once; it does not repeat it for each Cat (should have known that).
So my question is this: can I do this using a join, or will I have to do this in a temp table and then output the data?
AHIGA,
LarryR…
You need to cross join your categories and a list of all dates in the range. Since you have posted no table structures I'll have to guess at your structure slightly, but assuming you have a calendar table you can use something like this:
SELECT calendar.Date,
Category.Cat,
Score = ISNULL(Scores.Score, 0)
FROM Calendar
CROSS JOIN Category
LEFT JOIN Scores
ON Scores.Cat = Category.Cat
AND Scores.Date = Calendar.Date
WHERE Calendar.DayOfMonth = 1;
If you do not have a calendar table you can generate a list of dates using the system table Master..spt_values:
SELECT Date = DATEADD(MONTH, Number, '20120101')
FROM Master..spt_values
WHERE Type = 'P';
Where the hardcoded date '20120101' is the first date in your range.
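Putting the two together, here is a sketch that needs no Calendar table at all (table and column names are the same guesses as above; Number in spt_values only runs 0-2047, so cap the month range as needed):
SELECT d.Date,
       c.Cat,
       Score = ISNULL(s.Score, 0)
FROM ( SELECT Date = DATEADD(MONTH, Number, '20120101')
       FROM Master..spt_values
       WHERE Type = 'P'
       AND Number < 24 -- two years of months
     ) d
CROSS JOIN (SELECT DISTINCT Cat FROM Scores) c
LEFT JOIN Scores s
    ON s.Cat = c.Cat
    AND s.Date = d.Date;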
ADDENDUM
If you need to actually insert the missing rows, rather than just have a query that fills in the blanks you can use this:
INSERT Scores (Date, Cat, Score)
SELECT calendar.Date,
Category.Cat,
Score = 0
FROM Calendar
CROSS JOIN Category
WHERE Calendar.DayOfMonth = 1
AND NOT EXISTS
( SELECT 1
FROM Scores
WHERE Scores.Cat = Category.Cat
AND Scores.Date = Calendar.Date
)
Although, in my opinion if you have a query that fills in the blanks inserting the data is a bit of a waste of time.
To get what you want, start with a driver table and then use a left outer join. The result is something like this:
select driver.cat, driver.MMYYYY, coalesce(t.score, 0) as score
from (select cat, MMYYYY
from (select distinct cat from t) c cross join
themonths -- use where to get a date range
) driver left outer join
t
on t.cat = driver.cat and t.MMYYYY = driver.MMYYYY
Try this one -
DECLARE #temp TABLE (FDOM DATETIME, Cat NVARCHAR(50), Score INT)
INSERT INTO #temp (FDOM, Cat, Score)
VALUES
('20120101', 'Test1', 17),('20120201', 'Test1', 19),
('20120401', 'Test1', 15),('20120501', 'Test1', 16),
('20120701', 'Test1', 14),('20120801', 'Test1', 15),
('20120901', 'Test1', 15),('20121001', 'Test1', 13),
('20121201', 'Test1', 11),('20130101', 'Test1', 10),
('20130201', 'Test1', 15),('20130301', 'Test1', 13),
('20130501', 'Test1', 18),('20130601', 'Test1', 14),
('20130801', 'Test1', 15),('20130901', 'Test1', 14),
('20131201', 'Test1', 10),('20120601', 'Test2', 10)
;WITH enum AS
(
SELECT Cat, StartDate = MIN(FDOM), EndDate = MAX(FDOM)
FROM #temp
GROUP BY Cat
UNION ALL
SELECT Cat, DATEADD(MONTH, 1, StartDate), EndDate
FROM enum
WHERE StartDate < EndDate
)
SELECT e.StartDate, e.Cat, Score = ISNULL(t.Score, 0)
FROM enum e
LEFT JOIN #temp t ON e.StartDate = t.FDOM AND e.Cat = t.Cat
ORDER BY e.StartDate, e.Cat
Do a left join from the "complete table" to the "incomplete table", and add a WHERE clause that checks the date column of the incomplete table for NULL. That way the select query returns only the missing rows; then just put an "INSERT INTO tablename" in front of it.
In the first run it will find two rows that aren't already in the incomplete table, so they are inserted by the insert into statement: two rows affected. In a second run the select statement returns 0 rows, so nothing happens: zero rows affected :-)
Sample: http://sqlfiddle.com/#!2/895fe/6
(Just run the select statement on its own to see how the join works; the insert into statement isn't required for that.)
Insert Into supportContacts
Select * FROM
(
Select
'01/2012' as DDate, 'Test1' as Cat, 17 as Score
UNION
Select
'02/2012' as DDate, 'Test1' as Cat, 17 as Score
UNION
Select
'03/2012' as DDate, 'Test1' as Cat, 17 as Score
UNION
Select
'04/2012' as DDate, 'Test1' as Cat, 17 as Score
UNION
Select
'05/2012' as DDate, 'Test1' as Cat, 17 as Score
) CompleteTable
LEFT JOIN
(
Select
'01/2012' as DDate, 'Test1' as Cat, 17 as Score
UNION
Select
'02/2012' as DDate, 'Test1' as Cat, 17 as Score
UNION
Select
'03/2012' as DDate, 'Test1' as Cat, 17 as Score
) InCompleteTable
ON CompleteTable.DDate = IncompleteTable.DDate
WHERE IncompleteTable.DDate is null