BigQuery, subtract between 2 tables with the same column names - sql

I have 2 tables in BigQuery.
The first table is the user_id table: each user has many labels associated with it (label1, label2, label3, ...). The second table is the product_id table, and each product has the same set of labels associated with it (label1, label2, label3, ...).
Table 1:
user_id, label1, label2, label3, ... (hundreds of columns)
001 , 1 , 2 , 0 , ...
002 , 2 , 0 , 1 , ...
Table 2:
product_id, label1, label2, label3, ... (hundreds of columns)
a , 0 , 3 , 1 , ...
b , 1 , 2 , 0 , ...
I'd like to write a SQL script to generate the following table. Each label column is calculated as labelX in the user_id table minus labelX in the product_id table. For example, the label1 cell for the row with user_id=001 and product_id=a is 001's label1 - a's label1 = 1-0 = 1.
user_id, product_id, label1, label2, label3, ... (hundreds of columns)
001 , a , 1 , -1 , -1 , ...
001 , b , 0 , 0 , 0 , ...
002 , a , 2 , -3 , 0 , ...
002 , b , 1 , -2 , 1 , ...

You can cross join the two tables. The main query you need to run is:
select
a.user_id, b.product_id,
(a.label1 - b.label1) as label1,
(a.label2 - b.label2) as label2,
(a.label3 - b.label3) as label3,
...
from table1 as a
cross join table2 as b
Because there are hundreds of label columns, you will need to generate this query dynamically. You can do that in a programming language or with BigQuery scripting, as below. Make sure to replace the label count and the table names in the query.
DECLARE label_clause STRING;
SET (label_clause) = (
select as struct STRING_AGG('(a.label' || i || '-b.label'|| i ||') as label' || i, ',')
from unnest(generate_array(1,100)) i
);
EXECUTE IMMEDIATE
FORMAT("select a.user_id, b.product_id, %s from table1 as a cross join table2 as b", label_clause)
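The same string-building step can also be done in a client language. Below is a minimal Python sketch, assuming only three label columns. It uses the standard library's sqlite3 purely as a stand-in engine to check the generated query against the sample data from the question. build_label_clause, build_query and run_demo are hypothetical helper names, and the ORDER BY is added only to make the output deterministic.

```python
import sqlite3


def build_label_clause(label_count):
    # One "(a.labelN - b.labelN) as labelN" term per label column.
    return ", ".join(
        f"(a.label{i} - b.label{i}) as label{i}"
        for i in range(1, label_count + 1)
    )


def build_query(label_count):
    # Same shape as the cross join above; ORDER BY is an addition for stable output.
    return (
        "select a.user_id, b.product_id, "
        + build_label_clause(label_count)
        + " from table1 as a cross join table2 as b"
        + " order by a.user_id, b.product_id"
    )


def run_demo():
    # In-memory SQLite stands in for BigQuery here (an assumption for testing only).
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        create table table1 (user_id text, label1 int, label2 int, label3 int);
        create table table2 (product_id text, label1 int, label2 int, label3 int);
        insert into table1 values ('001', 1, 2, 0), ('002', 2, 0, 1);
        insert into table2 values ('a', 0, 3, 1), ('b', 1, 2, 0);
        """
    )
    rows = conn.execute(build_query(3)).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in run_demo():
        print(row)
```

On BigQuery itself you would pass the string from build_query to whichever client you already use; only the string generation carries over unchanged.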

Related

Value as a % of row total in QlikSense

In QlikSense I want to show a value as a % of the row total, using:
count(-value-) / count(Total -value-)
The expression above gives me the % of the grand total, not of the row total.
For example, in the new type column T, the % should be 7935 / 8287 = 95.75% and not 13.71%.
count(new_type_id) / count(Total New_Type_ID) gives me the result in the attached picture.
I'm using the following script to recreate your data:
Load * inline [
Type, New_Type_ID, new_type_id
T , 8287 , 7935
B , 11942 , 565
C , 18233 , 674
X , 13890 , 165
P , 5515 , 0
];
And using the following expression:
Sum(new_type_id) / Sum(total <Type> New_Type_ID)
The result will show 95.75% for Type = T
The difference is in the scope of the Total qualifier. In your case Total is evaluated over all values, ignoring the chart dimensions, and yields 57 867.
Total can be scoped so that it is evaluated only over the values of specific field(s):
Total <[field name here]>
From Qlik's documentation page
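The arithmetic behind the two scopes can be checked outside Qlik. This is a small Python sketch of the calculation only, not of Qlik's engine. It uses the inline data from the load script above, and the function names are hypothetical.

```python
# Type -> (New_Type_ID, new_type_id), copied from the inline load script.
ROWS = {
    "T": (8287, 7935),
    "B": (11942, 565),
    "C": (18233, 674),
    "X": (13890, 165),
    "P": (5515, 0),
}


def pct_of_grand_total(type_name):
    # Mirrors Sum(new_type_id) / Sum(Total New_Type_ID):
    # the denominator spans every row, regardless of Type.
    grand_total = sum(total for total, _ in ROWS.values())
    return ROWS[type_name][1] / grand_total


def pct_of_row_total(type_name):
    # Mirrors Sum(new_type_id) / Sum(Total <Type> New_Type_ID):
    # the denominator is limited to rows sharing the current Type.
    total, part = ROWS[type_name]
    return part / total


if __name__ == "__main__":
    print(f"{pct_of_grand_total('T'):.2%}")  # share of the grand total
    print(f"{pct_of_row_total('T'):.2%}")    # share of the row total
```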
Update
(as pivot table and split)
Load script to achieve pivot table structure:
Load * inline [
Type, Amount
T , 8287
B , 11942
C , 18233
X , 13890
P , 5515
];
Load * Inline [
New_Type, Type, SplitAmount
T , T , 7935
B , T , 332
C , T , 12
X , T , 6
P , T , 2
T , B , 565
B , B , 11022
C , B , 302
X , B , 45
P , B , 8
];
Measures:
Units calculation: Sum(SplitAmount)
% calculation: Sum(SplitAmount) / Sum(Total <Type> SplitAmount)

How to calculate row using different value from joined table

I need to run an update on a SQL table, using a value from a different (but almost identical) row in the same table, to show how different sales scenarios play out.
e.g.
Starting with:
ITEM, SITE, SUPPLIER, SCENARIO, SALES_VOLUME, PRICE, TOTAL_SALES
1 , A , X , S1 , 100 , 10 , 1000
2 , A , Y , S1 , 25 , 20 , 500
1 , A , X , S2 , {blank} , 20 , 0
I would like to update to show:
ITEM, SITE, SUPPLIER, SCENARIO, SALES_VOLUME, PRICE, TOTAL_SALES
1 , A , X , S1 , 100 , 10 , 1000
2 , A , Y , S1 , 25 , 20 , 500
1 , A , X , S2 , {blank} , 20 , 2000
So basically, if SCENARIO = S2, I need to recalculate TOTAL_SALES using the SALES_VOLUME from the matching row where SCENARIO = S1.
I tried the following, but it didn't work. I think that is because I'm specifying both = 'S1' and != 'S1' for the same alias (t1) in the same lookup.
UPDATE TABLE1
SET [TOTAL_SALES] = (t1.[PRICE] * t2.[SALES_VOLUME])
FROM TABLE1 t1
inner join TABLE1 t2
on t1.[ITEM] = t2.[ITEM]
and t1.[SITE] = t2.[SITE]
and t1.[SUPPLIER] = t2.[SUPPLIER]
and t1.[SCENARIO] = 'S1'
WHERE t1.[SCENARIO] != 'S1'
I don't think I'm too far off, but just feel I'm missing something.
Any pointers would be gratefully received. :)
You may try this -
UPDATE t1
SET [TOTAL_SALES] = (t1.[PRICE] * t2.[SALES_VOLUME])
FROM TABLE1 t1
inner join TABLE1 t2
on t1.[ITEM] = t2.[ITEM]
and t1.[SITE] = t2.[SITE]
and t1.[SUPPLIER] = t2.[SUPPLIER]
and t1.[SCENARIO] = 'S2'
WHERE t2.[SCENARIO] = 'S1';
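To try the idea without a SQL Server instance, here is a minimal Python sketch that runs against an in-memory SQLite database. It swaps the UPDATE ... FROM join for a correlated subquery, which is an assumption made for portability and not the syntax used above. One behavioural difference: an S2 row with no matching S1 row would get NULL here, whereas the join version would leave it untouched.

```python
import sqlite3


def recalc_scenario_totals():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        create table TABLE1 (
            ITEM int, SITE text, SUPPLIER text, SCENARIO text,
            SALES_VOLUME int, PRICE int, TOTAL_SALES int
        );
        insert into TABLE1 values
            (1, 'A', 'X', 'S1', 100, 10, 1000),
            (2, 'A', 'Y', 'S1', 25, 20, 500),
            (1, 'A', 'X', 'S2', null, 20, 0);
        """
    )
    # For every S2 row, look up SALES_VOLUME on the S1 row with the same
    # ITEM / SITE / SUPPLIER and multiply it by the S2 row's own PRICE.
    conn.execute(
        """
        update TABLE1
        set TOTAL_SALES = PRICE * (
            select t2.SALES_VOLUME
            from TABLE1 as t2
            where t2.ITEM = TABLE1.ITEM
              and t2.SITE = TABLE1.SITE
              and t2.SUPPLIER = TABLE1.SUPPLIER
              and t2.SCENARIO = 'S1'
        )
        where SCENARIO = 'S2'
        """
    )
    rows = conn.execute(
        "select ITEM, SCENARIO, TOTAL_SALES from TABLE1 order by SCENARIO, ITEM"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in recalc_scenario_totals():
        print(row)
```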

Group and count by another columns value

I have a table like below:
CREATE TABLE public.test_table
(
"ID" serial PRIMARY KEY NOT NULL,
"CID" integer NOT NULL,
"SEG" integer NOT NULL,
"DDN" character varying(3) NOT NULL
)
and data looks like this:
ID CID SEG DDN
1 1 1 "711"
2 1 2 "800"
3 1 3 "124"
4 2 1 "711"
5 3 1 "711"
6 3 2 "802"
7 4 1 "799"
8 5 1 "799"
9 5 2 "804"
10 6 1 "799"
I need to group this data by the CID column and count the groups per DDN, using the DDN value of each group's first row. The counts must be split into two columns, depending on whether the group has exactly one row or more than one.
I'm sorry if that isn't clear. Let me show you what I need:
DDN END TRA
711 1 2
799 2 1
As you can see, DDN 711 has 1 CID with a single row (ID 4). This is the END column.
It also has 2 CIDs with multiple SEG rows (IDs 1 to 3 and IDs 5 to 6). This is the TRA column.
I'm not sure which column should go in the GROUP BY clause!
My solution:
Just found a solution like below
WITH x AS (
SELECT
(SELECT t1."DDN" FROM public.test_table AS t1
WHERE t1."CID"=t."CID" AND t1."SEG"=1) AS ddn,
COUNT("CID") AS seg_count
FROM public.test_table AS t
GROUP BY "CID"
)
SELECT ddn, COUNT(seg_count) AS "TOTAL",
SUM(CASE WHEN x.seg_count=1 THEN 1 ELSE 0 END) as "END",
SUM(CASE WHEN x.seg_count>1 THEN 1 ELSE 0 END) as "TRA"
FROM x
GROUP BY ddn;
Equivalent, faster query:
SELECT "DDN"
, COUNT(*) AS "TOTAL"
, COUNT(*) FILTER (WHERE seg_count = 1) AS "END"
, COUNT(*) FILTER (WHERE seg_count > 1) AS "TRA"
FROM (
SELECT DISTINCT ON ("CID")
"DDN" -- assuming min "SEG" is always 1
, COUNT(*) OVER (PARTITION BY "CID") AS seg_count
FROM test_table
ORDER BY "CID", "SEG"
) sub
GROUP BY "DDN";
Notes
Before Postgres 12, CTEs always act as optimization barriers, so they are typically slower there and should only be used where needed.
This is equivalent to the query in the question assuming that the minimum "SEG" per "CID" is always 1 - since this query returns the row with the minimum "SEG" while your query returns the one with "SEG" = 1. Typically, you would want the "first" segment and my query implements this requirement more reliably, but that's not clear from the question.
COUNT(*) is slightly faster than COUNT(column) and equivalent while not involving NULL values (applicable here). Related:
PostgreSQL: running count of rows for a query 'by minute'
About DISTINCT ON:
Select first row in each GROUP BY group?
The aggregate FILTER syntax requires Postgres 9.4+:
Conditional SQL count
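The counting logic itself (take the DDN of each CID's lowest SEG, then split the CIDs by whether they have one row or several) can be sketched in plain Python as a sanity check on the expected output. This is only an illustration of the logic on the question's sample data, with a hypothetical function name; it is not a replacement for the SQL.

```python
from collections import defaultdict

# (ID, CID, SEG, DDN) rows from the question.
ROWS = [
    (1, 1, 1, "711"), (2, 1, 2, "800"), (3, 1, 3, "124"),
    (4, 2, 1, "711"),
    (5, 3, 1, "711"), (6, 3, 2, "802"),
    (7, 4, 1, "799"),
    (8, 5, 1, "799"), (9, 5, 2, "804"),
    (10, 6, 1, "799"),
]


def count_end_tra(rows):
    # Collect (SEG, DDN) pairs per CID.
    by_cid = defaultdict(list)
    for _id, cid, seg, ddn in rows:
        by_cid[cid].append((seg, ddn))

    result = defaultdict(lambda: {"END": 0, "TRA": 0})
    for segments in by_cid.values():
        first_ddn = min(segments)[1]  # DDN of the row with the lowest SEG
        key = "END" if len(segments) == 1 else "TRA"
        result[first_ddn][key] += 1
    return dict(result)


if __name__ == "__main__":
    for ddn, counts in sorted(count_end_tra(ROWS).items()):
        print(ddn, counts["END"], counts["TRA"])
```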
Here is the solution I propose. The query can probably be simplified.
CREATE TABLE test_table
(
ID serial PRIMARY KEY NOT NULL,
CID integer NOT NULL,
SEG integer NOT NULL,
DDN character varying(3) NOT NULL
);
insert into test_table(CID,SEG,DDN)
values
( 1, 1, '711'),
( 1, 2, '800'),
( 1, 3, '124'),
( 2, 1, '711'),
( 3, 1, '711'),
( 3, 2, '802'),
( 4, 1, '799'),
( 5, 1, '799'),
( 5, 2, '804'),
( 6, 1, '799');
with summary as (
  with ddn_t as (
    -- order by seg so that rn = 1 is reliably the first segment of each cid
    select cid, ddn, row_number() over (partition by cid order by seg) as rn
    from test_table
  )
  select a.cid, count(distinct a.ddn) as cnt, b.ddn
  from ddn_t a
  join ddn_t b on b.cid = a.cid and b.rn = 1
  group by a.cid, b.ddn
)
select ddn,
  sum(case when cnt > 1 then 1 else 0 end) as TRA,
  sum(case when cnt = 1 then 1 else 0 end) as END
from summary
group by ddn;

SQLite: Select record based on select result values on same table

I have a table Fields. Example:-
id remote unique_id
1 23 30007
1 24 30008
1 1 30009
2 4 30007
2 5 30008
2 1 30009
3 6 30007
3 7 30008
3 2 30009
Here I want the sum of the remote values of the rows with unique_id = 30008, restricted to the ids whose unique_id = 30009 row has remote = 1.
Basically, what I want is:
(query 1)
select SUM(remote) from Fields where unique_id = '30008';
with one added filter from (query 2):
select remote from Fields where unique_id = '30009';
I want query 1 to use only the ids for which query 2 returns remote = 1.
So for the data above the output would be: 24 + 5 = 29.
Here ids 1 and 2 are selected. id = 3 is rejected because its remote for unique_id = 30009 is 2.
This might be simple for you guys, but I am new to SQLite, so please help.
If I understand your question correctly, then you want to sum the remote column where the unique_id is 30008, but only for those id groups having remote=1 and unique_id=30009.
SELECT SUM(f1.remote)
FROM Fields f1
INNER JOIN
(
SELECT id
FROM Fields
GROUP BY id
HAVING SUM(CASE WHEN unique_id = '30009' AND remote = 1 THEN 1 ELSE 0 END) > 0
) f2
ON f1.id = f2.id
WHERE unique_id = '30008';
I didn't actually run it, but something like this should do what you need:
select SUM(remote) from Fields as F where F.unique_id = '30008' and F.id in (select G.id from Fields as G where G.unique_id = '30009' and G.remote = 1);
So I tested it:
create table Fields(id integer, remote integer, unique_id integer);
insert into Fields(id, remote, unique_id) values (1, 23, 30007), (1, 24, 30008), (1, 1, 30009), (2, 4, 30007), (2, 5, 30008), (2, 1, 30009), (3, 6, 30007), (3, 7, 30008), (3, 2, 30009);
/*select * from Fields;*/
select SUM(remote) from Fields as F where F.unique_id = '30008' and F.id in (select G.id from Fields as G where G.unique_id = '30009' and G.remote = 1);
And the output was: 29, as desired.
You could use exists:
select sum(remote) total
from Fields f
where exists (
select 1 from Fields
where id = f.id and unique_id = 30009 and remote = 1
) and f.unique_id = 30008
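A quick way to check a query like this is Python's built-in sqlite3 module with an in-memory database. The sketch below assumes the filter is meant to keep the id groups whose unique_id = 30009 row has remote = 1. The function name and the second, renumbered data set are hypothetical and exist only to show that the result does not depend on the particular id values.

```python
import sqlite3

SAMPLE = [
    (1, 23, 30007), (1, 24, 30008), (1, 1, 30009),
    (2, 4, 30007), (2, 5, 30008), (2, 1, 30009),
    (3, 6, 30007), (3, 7, 30008), (3, 2, 30009),
]


def sum_remote_for_flagged_ids(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("create table Fields (id integer, remote integer, unique_id integer)")
    conn.executemany("insert into Fields values (?, ?, ?)", rows)
    # The EXISTS subquery matches on id and checks remote = 1 on the 30009 row.
    total = conn.execute(
        """
        select sum(f.remote)
        from Fields as f
        where f.unique_id = 30008
          and exists (
              select 1 from Fields as g
              where g.id = f.id and g.unique_id = 30009 and g.remote = 1
          )
        """
    ).fetchone()[0]
    conn.close()
    return total


if __name__ == "__main__":
    print(sum_remote_for_flagged_ids(SAMPLE))
    # Same data with ids 10, 20, 30 instead of 1, 2, 3.
    renumbered = [(i * 10, remote, uid) for i, remote, uid in SAMPLE]
    print(sum_remote_for_flagged_ids(renumbered))
```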

SQL Server UDF array inputs and outputs

I have a set of columns CODE_1-10, which contain diagnostic codes. I want to create a set of variables CODE_GROUP_1-17, which indicate whether or not one of some particular set of diagnostic codes matches any of the CODE_1-10 variables. For example, CODE_GROUP_1 = 1 if any of CODE_1-10 match either '123' or '456', and CODE_GROUP_2 = 1 if any of CODE_1-10 match '789','111','333','444' or 'foo'.
Here's an example of how you could do this using values constructors.
CASE WHEN (SELECT count(value.val)
FROM (VALUES (CODE_1)
, (CODE_2)
, (CODE_3)
, (CODE_4)
, (CODE_5)
, (CODE_6)
, (CODE_7)
, (CODE_8)
, (CODE_9)
, (CODE_10)
) AS value(val)
WHERE value.val in ('123', '456')
) > 0 THEN 1 ELSE 0 END AS CODE_GROUP_1,
CASE WHEN (SELECT count(value.val)
FROM (VALUES (CODE_1)
, (CODE_2)
, (CODE_3)
, (CODE_4)
, (CODE_5)
, (CODE_6)
, (CODE_7)
, (CODE_8)
, (CODE_9)
, (CODE_10)
) AS value(val)
WHERE value.val in ('789','111','333','444','foo')
) > 0 THEN 1 ELSE 0 END AS CODE_GROUP_2
I am wondering if there is another way to do this that is more efficient. Is there a way to make a CLR UDF that takes an array of CODE_1-10, and outputs a set of columns CODE_GROUP_1-17?
You could at least avoid the repetition of FROM (VALUES ...) like this:
SELECT
CODE_GROUP_1 = COUNT(DISTINCT CASE WHEN val IN ('123', '456') THEN 1 END),
CODE_GROUP_2 = COUNT(DISTINCT CASE WHEN val IN ('789','111','333','444','foo') THEN 1 END),
...
FROM
(
VALUES
(CODE_1),
(CODE_2),
(CODE_3),
(CODE_4),
(CODE_5),
(CODE_6),
(CODE_7),
(CODE_8),
(CODE_9),
(CODE_10)
) AS value(val)
If CODE_1, CODE_2 etc. are column names, you can use the above query as a derived table in CROSS APPLY:
SELECT
...
FROM
dbo.atable -- table containing CODE_1, CODE_2 etc.
CROSS APPLY
(
SELECT ... -- the above query
) AS x
;
Can you create 2 new tables with the columns appended as rows? One table would be dxCode, holding the dx code, whatever key field(s) you need, and a source column if you need to retain the 1-10 position. The other table would be dxGroup, holding your 17 groups (with the source groupID if you need it) and their target dx values.
Then, to determine which codes are in which groups, you can join on your dx fields.
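The normalized layout can be tried out quickly with an in-memory SQLite database from Python. This is a rough sketch under assumed names: the columns record_id, source, group_id and code are hypothetical, as are the sample codes for records 1 to 3. Only the two group definitions come from the question.

```python
import sqlite3


def code_groups_per_record():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- One row per (record, code slot); source keeps the original 1-10 position.
        create table dxCode (record_id int, source int, code text);
        -- One row per (group, code) pair.
        create table dxGroup (group_id int, code text);

        insert into dxCode values
            (1, 1, '123'), (1, 2, '999'),
            (2, 1, '789'), (2, 2, '456'),
            (3, 1, '000');
        insert into dxGroup values
            (1, '123'), (1, '456'),
            (2, '789'), (2, '111'), (2, '333'), (2, '444'), (2, 'foo');
        """
    )
    # A record belongs to a group if any of its codes appears in that group.
    rows = conn.execute(
        """
        select distinct c.record_id, g.group_id
        from dxCode as c
        join dxGroup as g on g.code = c.code
        order by c.record_id, g.group_id
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for record_id, group_id in code_groups_per_record():
        print(record_id, group_id)
```

With this shape, adding or changing a group is a data change in dxGroup; the query stays the same.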