TSQL filtering by character match - sql

How can you filter rows by counting matching values, without building new user functions (i.e. you can use built-in functions), on the given data?
The requirement is to keep rows where every non-zero number in the gw columns appears the same number of times. For example, Sandy qualifies because every value is '1', and Don qualifies since he has '1' two times and '2' two times as well. Voland would not meet the requirement, since he has '1' two times but '2' only once, etc. You don't want to count '0' at all.
login gw1 gw2 gw3 gw4 gw5
Peter 1 0 1 0 0
Sandy 1 1 1 1 0
Voland 1 0 1 2 0
Don 1 2 0 1 2
Desired output is:
login gw1 gw2 gw3 gw4 gw5
Peter 1 0 1 0 0
Sandy 1 1 1 1 0
Don 1 2 0 1 2
Values can repeat any positive number of times, but to match the criteria each value also has to appear at least twice. I.e. 1 2 3 4 0 is not OK, since every value appears only once; 1 1 0 3 3 is a match.
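The criteria can be restated as a small predicate; `qualifies` below is a hypothetical helper (not from any answer) that ignores zeroes, then checks that all remaining values share one occurrence count of at least 2:

```python
from collections import Counter

# Hypothetical helper restating the criteria: drop zeroes, then every
# remaining value must occur the same number of times, and that shared
# count must be at least 2.
def qualifies(gws):
    counts = Counter(g for g in gws if g != 0)
    return bool(counts) and len(set(counts.values())) == 1 and min(counts.values()) >= 2

print(qualifies([1, 0, 1, 0, 0]))  # Peter  -> True
print(qualifies([1, 0, 1, 2, 0]))  # Voland -> False ('2' appears only once)
print(qualifies([1, 2, 3, 4, 0]))  # -> False (every value appears only once)
print(qualifies([1, 1, 0, 3, 3]))  # -> True
```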

SQL Fiddle
WITH Cte(login, gw) AS(
SELECT login, gw1 FROM TestData WHERE gw1 > 0 UNION ALL
SELECT login, gw2 FROM TestData WHERE gw2 > 0 UNION ALL
SELECT login, gw3 FROM TestData WHERE gw3 > 0 UNION ALL
SELECT login, gw4 FROM TestData WHERE gw4 > 0 UNION ALL
SELECT login, gw5 FROM TestData WHERE gw5 > 0
),
CteCountByLoginGw AS(
SELECT
login, gw, COUNT(*) AS cc
FROM Cte
GROUP BY login, gw
),
CteFinal AS(
SELECT login
FROM CteCountByLoginGw c
GROUP BY login
HAVING
MAX(cc) > 1
AND COUNT(DISTINCT gw) = (
SELECT COUNT(*)
FROM CteCountByLoginGw
WHERE
c.login = login
AND cc = MAX(c.cc)
)
)
SELECT t.*
FROM CteFinal c
INNER JOIN TestData t
ON t.login = c.login
First you unpivot the table, excluding gw values that are equal to 0.
The result (CTE) is:
login gw
---------- -----------
Peter 1
Sandy 1
Voland 1
Don 1
Sandy 1
Don 2
Peter 1
Sandy 1
Voland 1
Sandy 1
Voland 2
Don 1
Don 2
Then, you perform a COUNT(*) GROUP BY login, gw. The result would be (CteCountByLoginGw):
login gw cc
---------- ----------- -----------
Don 1 2
Peter 1 2
Sandy 1 4
Voland 1 2
Don 2 2
Voland 2 1
Finally, only get those login values whose max(cc) is greater than 1 (this eliminates rows like 1,2,3,4,0), and whose count of distinct gw values is the same as the number of gw values whose count equals max(cc) (this makes sure each gw value occurs as often as the others):
login gw1 gw2 gw3 gw4 gw5
---------- ----------- ----------- ----------- ----------- -----------
Peter 1 0 1 0 0
Sandy 1 1 1 1 0
Don 1 2 0 1 2
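The unpivot-and-count approach runs unchanged on SQLite, which makes it easy to verify; the sketch below swaps the correlated MAX subquery for the equivalent HAVING MIN(cc) = MAX(cc) check (an assumption on my part: "every count equals the max count" is the same as "all counts are equal"):

```python
import sqlite3

# In-memory copy of the sample table, then the same UNION ALL unpivot and
# per-(login, gw) count; all counts equal and > 1 means the row qualifies.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TestData (login TEXT, gw1 INT, gw2 INT, gw3 INT, gw4 INT, gw5 INT);
INSERT INTO TestData VALUES
  ('Peter',  1, 0, 1, 0, 0),
  ('Sandy',  1, 1, 1, 1, 0),
  ('Voland', 1, 0, 1, 2, 0),
  ('Don',    1, 2, 0, 1, 2);
""")
rows = conn.execute("""
WITH Cte(login, gw) AS (
  SELECT login, gw1 FROM TestData WHERE gw1 > 0 UNION ALL
  SELECT login, gw2 FROM TestData WHERE gw2 > 0 UNION ALL
  SELECT login, gw3 FROM TestData WHERE gw3 > 0 UNION ALL
  SELECT login, gw4 FROM TestData WHERE gw4 > 0 UNION ALL
  SELECT login, gw5 FROM TestData WHERE gw5 > 0
),
CteCountByLoginGw AS (
  SELECT login, gw, COUNT(*) AS cc FROM Cte GROUP BY login, gw
)
SELECT login
FROM CteCountByLoginGw
GROUP BY login
HAVING MIN(cc) = MAX(cc) AND MIN(cc) > 1
ORDER BY login
""").fetchall()
print([r[0] for r in rows])  # → ['Don', 'Peter', 'Sandy']
```

Voland drops out because his counts are {2, 1}, so MIN(cc) and MAX(cc) differ.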

I know I'm late to the party; I can't type as fast as some, and I think I arrived about 40 minutes late, but since I'd done it I thought I'd share it anyway.
My method used unpivot and pivot to achieve the result:
Select *
from foobar f1
where exists
(Select * from
(Select login_, Case when [1] = 0 then null else [1] % 2 end Val1, Case when [2] = 0 then null else [2] % 2 end Val2,
Case when [3] = 0 then null else [3] % 2 end Val3, Case when [4] = 0 then null else [4] % 2 end Val4, Case when [5] = 0 then null else [5] % 2 end Val5
from
(Select *
from
(select * from foobar) src
UNPIVOT
(value for amount in (gw1, gw2, gw3, gw4, gw5)) unpvt) src2
PIVOT
(count(amount) for value in ([1],[2],[3],[4],[5])) as pvt) res
Where 0 in (Val1,Val2, Val3, Val4, Val5) and not exists (select * from foobar where 1 in (Val1, Val2, Val3, Val4, Val5)) and login_ = f1.login_)
and here is the fiddle: http://www.sqlfiddle.com/#!6/b78f8/1/0

I think this logic is correct, viz:
Find the rows where at least one gw value occurs more than once, and no gw value occurs exactly once. Zeroes are excluded.
Unpivot is used to simplify reasoning over the gwx columns, and CTEs are used to prevent repetition.
WITH unpvt AS
(
SELECT *
FROM MyTable
UNPIVOT
(
gwvalue
for z in (gw1, gw2, gw3, gw4, gw5)
) x
),
grp AS
(
SELECT [login], gwvalue, COUNT(gwvalue) gwCount
FROM unpvt
WHERE gwvalue > 0
GROUP BY [login], gwvalue
)
SELECT
*
FROM MyTable mt
WHERE EXISTS
(
SELECT 1
FROM grp g
WHERE g.[login] = mt.[login]
AND gwCount > 1
)
AND NOT EXISTS
(
SELECT 1
FROM grp g
WHERE g.[login] = mt.[login]
AND gwCount = 1
);
SqlFiddle here
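SQLite has no UNPIVOT, so the sketch below emulates the unpivot step with UNION ALL over the gw columns; the EXISTS / NOT EXISTS pair then keeps a login when some non-zero gw value repeats and no non-zero value occurs exactly once (table name MyTable and sample rows mirror the question):

```python
import sqlite3

# Emulate UNPIVOT with UNION ALL, then apply the EXISTS / NOT EXISTS logic.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE MyTable (login TEXT, gw1 INT, gw2 INT, gw3 INT, gw4 INT, gw5 INT);
INSERT INTO MyTable VALUES
  ('Peter',  1, 0, 1, 0, 0),
  ('Sandy',  1, 1, 1, 1, 0),
  ('Voland', 1, 0, 1, 2, 0),
  ('Don',    1, 2, 0, 1, 2);
""")
rows = conn.execute("""
WITH unpvt(login, gwvalue) AS (
  SELECT login, gw1 FROM MyTable UNION ALL
  SELECT login, gw2 FROM MyTable UNION ALL
  SELECT login, gw3 FROM MyTable UNION ALL
  SELECT login, gw4 FROM MyTable UNION ALL
  SELECT login, gw5 FROM MyTable
),
grp AS (
  SELECT login, gwvalue, COUNT(*) AS gwCount
  FROM unpvt
  WHERE gwvalue > 0
  GROUP BY login, gwvalue
)
SELECT mt.login
FROM MyTable mt
WHERE EXISTS (SELECT 1 FROM grp g WHERE g.login = mt.login AND g.gwCount > 1)
  AND NOT EXISTS (SELECT 1 FROM grp g WHERE g.login = mt.login AND g.gwCount = 1)
""").fetchall()
print(sorted(r[0] for r in rows))  # → ['Don', 'Peter', 'Sandy']
```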

Related

Get the sum of (count(column1) + count(column2))

I have a table A:
entity_id name
------------------
1 Test1
2 Test2
3 Test3
4 Test4
5 Test5
6 Test6
I have a table B:
entity_id value1 value2
-----------------------------
1 10 20
1 15 30
2 10 25
1 9 45
3 null 1
2 45 50
3 20 null
I need to write a single query to select the entity_id and name from Table A, count the non-NULL occurrences of columns value1 and value2 from Table B for each entity_id, and then output the total of those two column counts (NULL doesn't count).
So my output table would be:
entity_id name value1_count value2_count total_count
----------------------------------------------------------------------
1 Test1 3 3 6
2 Test2 2 2 4
3 Test3 1 1 2
4 Test4 0 0 0
5 Test5 0 0 0
6 Test6 0 0 0
I am having trouble summing the count of value1 and the count of value2 and outputting that value as total_count per unique entity_id.
This is the query I have so far:
SELECT DISTINCT a.entity_id, a.name
, count(b.value1) AS value1_count, count(b.value2) AS value2_count, sum(2) AS total_count
FROM a
LEFT JOIN b ON a.entity_id = b.entity_id
GROUP BY a.entity_id, a.name
I know that the sum(2) as total_count is incorrect and doesn't get me what I want.
SELECT entity_id, a.name
, COALESCE(b.v1_ct, 0) AS value1_count
, COALESCE(b.v2_ct, 0) AS value2_count
, COALESCE(b.v1_ct + b.v2_ct, 0) AS total_count
FROM a
LEFT JOIN (
SELECT entity_id, count(value1) AS v1_ct, count(value2) AS v2_ct
FROM b
GROUP BY 1
) b USING (entity_id);
db<>fiddle here
Aggregate first, join later. That's simpler and faster. See:
Query with LEFT JOIN not returning rows for count of 0
count() never produces NULL. Only the LEFT JOIN can introduce NULL values for counts in this query, so v1_ct and v2_ct are either both NULL or both NOT NULL. Hence COALESCE(v1_ct + v2_ct, 0) is ok. (Else, one NULL would nullify the other summand in the addition.)
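The "aggregate first, join later" shape runs the same on an in-memory SQLite copy of the sample tables (note that with the rows exactly as posted, entity 2 has two non-NULL value1 entries, so its counts come out as 2/2/4):

```python
import sqlite3

# Aggregate Table B first, then LEFT JOIN the per-entity counts onto Table A;
# COALESCE turns the NULLs introduced by the join into zeroes.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (entity_id INT, name TEXT);
INSERT INTO a VALUES (1,'Test1'),(2,'Test2'),(3,'Test3'),(4,'Test4'),(5,'Test5'),(6,'Test6');
CREATE TABLE b (entity_id INT, value1 INT, value2 INT);
INSERT INTO b VALUES (1,10,20),(1,15,30),(2,10,25),(1,9,45),(3,NULL,1),(2,45,50),(3,20,NULL);
""")
rows = conn.execute("""
SELECT a.entity_id, a.name,
       COALESCE(v.v1_ct, 0) AS value1_count,
       COALESCE(v.v2_ct, 0) AS value2_count,
       COALESCE(v.v1_ct + v.v2_ct, 0) AS total_count
FROM a
LEFT JOIN (
  SELECT entity_id, COUNT(value1) AS v1_ct, COUNT(value2) AS v2_ct
  FROM b
  GROUP BY entity_id
) v ON v.entity_id = a.entity_id
ORDER BY a.entity_id
""").fetchall()
for r in rows:
    print(r)
```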
try this :
WITH list AS
(
SELECT b.entity_id
, count(*) FILTER (WHERE b.value1 IS NOT NULL) AS value1_count
, count(*) FILTER (WHERE b.value2 IS NOT NULL) AS value2_count
FROM Table_B AS b
GROUP BY b.entity_id
)
SELECT a.entity_id, a.name
, COALESCE(l.value1_count, 0)
, COALESCE(l.value2_count,0)
, COALESCE(l.value1_count + l.value2_count, 0) AS total_count
FROM Table_A AS a
LEFT JOIN list AS l
ON a.entity_id = l.entity_id

How to update a column based on values of other columns

I have a table as below
row_wid id code sub_code item_nbr orc_cnt part_cnt variance reporting_date var_start_date
1 1 ABC PQR 23AB 0 1 1 11-10-2019 NULL
2 1 ABC PQR 23AB 0 1 1 12-10-2019 NULL
3 1 ABC PQR 23AB 1 1 0 13-10-2019 NULL
4 1 ABC PQR 23AB 1 2 1 14-10-2019 NULL
5 1 ABC PQR 23AB 1 3 2 15-10-2019 NULL
I have to update the var_start_date column with min(reporting_date) for each combination of id, code, sub_code and item_nbr, restarting whenever the variance field is zero.
A row with variance = 0 should have a NULL var_start_date, and the rows after it should use the next min(reporting_date). FYI, variance is calculated as part_cnt - orc_cnt.
so my output should look like this -
row_wid id code sub_code item_nbr orc_cnt part_cnt variance reporting_date var_start_date
1 1 ABC PQR 23AB 0 1 1 11-10-2019 11-10-2019
2 1 ABC PQR 23AB 0 1 1 12-10-2019 11-10-2019
3 1 ABC PQR 23AB 1 1 0 13-10-2019 NULL
4 1 ABC PQR 23AB 1 2 1 14-10-2019 14-10-2019
5 1 ABC PQR 23AB 1 3 2 15-10-2019 14-10-2019
I am trying to write a function using below query to divide the data into sets.
SELECT DISTINCT MIN(reporting_date)
OVER (partition by id, code,sub_code,item_nbr ORDER BY row_wid ),
RANK() OVER (partition by id, code,sub_code,item_nbr ORDER BY row_wid)
AS rnk,id, code,sub_code,item_nbr,orc_cnt,part_cnt,variance,row_wid
FROM TABLE T1
But I don't know how to include the variance field to split the sets.
I would suggest:
select t.*,
       (case when variance <> 0
             then min(case when variance <> 0 then reporting_date end)
                    over (partition by id, code, sub_code, item_nbr, grp)
        end) as new_var_start_date
from (select t.*,
             sum(case when variance = 0 then 1 else 0 end)
               over (partition by id, code, sub_code, item_nbr
                     order by reporting_date) as grp
      from t
     ) t;
Note that this does not use a JOIN. It should be more efficient than an answer that does.
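A runnable sketch of this cumulative-sum grouping idea on SQLite (3.25+ for window functions): the running SUM is ordered by reporting_date so the group id increments at each variance = 0 row, and the inner MIN skips the variance = 0 row itself so the next group starts at the following date. ISO date strings are assumed so text ordering matches date ordering:

```python
import sqlite3

# Group id = running count of variance = 0 rows; each group's start date is
# the earliest reporting_date among its variance <> 0 rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (row_wid INT, id INT, code TEXT, sub_code TEXT, item_nbr TEXT,
                orc_cnt INT, part_cnt INT, variance INT, reporting_date TEXT);
INSERT INTO t VALUES
 (1,1,'ABC','PQR','23AB',0,1,1,'2019-10-11'),
 (2,1,'ABC','PQR','23AB',0,1,1,'2019-10-12'),
 (3,1,'ABC','PQR','23AB',1,1,0,'2019-10-13'),
 (4,1,'ABC','PQR','23AB',1,2,1,'2019-10-14'),
 (5,1,'ABC','PQR','23AB',1,3,2,'2019-10-15');
""")
rows = conn.execute("""
SELECT row_wid,
       CASE WHEN variance <> 0 THEN
         MIN(CASE WHEN variance <> 0 THEN reporting_date END)
           OVER (PARTITION BY id, code, sub_code, item_nbr, grp)
       END AS var_start_date
FROM (
  SELECT t.*,
         SUM(CASE WHEN variance = 0 THEN 1 ELSE 0 END)
           OVER (PARTITION BY id, code, sub_code, item_nbr
                 ORDER BY reporting_date) AS grp
  FROM t
)
ORDER BY row_wid
""").fetchall()
print(rows)
```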
Try as below
SELECT T.*, CASE WHEN T.variance = 0 THEN NULL ELSE MIN(reporting_date) OVER (PARTITION BY T1.[Rank] ORDER BY T1.[Rank]) END AS New_var_start_date
FROM mytbl T
LEFT JOIN (
SELECT row_wid, variance, COUNT(CASE variance WHEN 0 THEN 1 END) OVER (ORDER BY row_wid ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) +1 AS [Rank]
FROM mytbl
) T1 ON T.row_wid = T1.row_wid
SQL FIDDLE DEMO

Summing numbers across multiple columns in BigQuery

I have a query which returns many columns that are either 1 or 0 depending on a user's interaction with many parts of a website; my data looks like this:
UserID Variable_1 Variable_2 Variable_3 Variable_4 Variable_5
User 1 1 0 1 0 0
User 2 1 1 1 0 0
User 3 0 0 0 0 0
User 4 1 1 1 1 1
User 5 1 0 1 0 1
Each variable is defined with its own line of code, like:
MAX(IF(LOWER(hits_product.productbrand) LIKE "Variable_1",1,0)) AS Variable_1,
I'd like to have one column that sums up all the variable columns per user, which would look like this:
UserID Total Variable_1 Variable_2 Variable_3 Variable_4 Variable_5
User 1 2 1 0 1 0 0
User 2 3 1 1 1 0 0
User 3 0 0 0 0 0 0
User 4 5 1 1 1 1 1
User 5 3 1 0 1 0 1
What is the most elegant way to achieve this?
Even though it happens that for the OP's particular case a simple COUNT(DISTINCT) would suffice, I still wanted to answer the original question of how to sum all numeric columns into one Total without depending on the number and names of those columns.
Below is for BigQuery Standard SQL
#standardSQL
SELECT
UserID,
( SELECT SUM(CAST(value AS INT64))
FROM UNNEST(REGEXP_EXTRACT_ALL(TO_JSON_STRING(t), r':(\d+),?')) value
) Total,
* EXCEPT(UserID)
FROM t
This can be tested / played with using dummy data from question
#standardSQL
WITH t AS (
SELECT 'User 1' UserID, 1 Variable_1, 0 Variable_2, 1 Variable_3, 0 Variable_4, 0 Variable_5 UNION ALL
SELECT 'User 2', 1, 1, 1, 0, 0 UNION ALL
SELECT 'User 3', 0, 0, 0, 0, 0 UNION ALL
SELECT 'User 4', 1, 1, 1, 1, 1 UNION ALL
SELECT 'User 5', 1, 0, 1, 0, 1
)
SELECT
UserID,
( SELECT SUM(CAST(value AS INT64))
FROM UNNEST(REGEXP_EXTRACT_ALL(TO_JSON_STRING(t), r':(\d+),?')) value
) Total,
* EXCEPT(UserID)
FROM t
ORDER BY UserID
result is
Row UserID Total Variable_1 Variable_2 Variable_3 Variable_4 Variable_5
1 User 1 2 1 0 1 0 0
2 User 2 3 1 1 1 0 0
3 User 3 0 0 0 0 0 0
4 User 4 5 1 1 1 1 1
5 User 5 3 1 0 1 0 1
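The TO_JSON_STRING trick can be shown in miniature outside BigQuery: serialize the row to JSON, pull out every numeric field value with the same regex, and sum (the UserID string never matches the `:(\d+)` pattern because its value is quoted):

```python
import json
import re

# One sample row from the question, serialized the way TO_JSON_STRING would
# (compact separators, key:value pairs).
row = {"UserID": "User 1", "Variable_1": 1, "Variable_2": 0,
       "Variable_3": 1, "Variable_4": 0, "Variable_5": 0}
s = json.dumps(row, separators=(",", ":"))
# Extract every value that follows a colon as bare digits, then sum.
total = sum(int(v) for v in re.findall(r':(\d+),?', s))
print(total)  # → 2
```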
A simple method uses a subquery or CTE:
select t.*, (v1 + v2 + v3 . . . ) as total
from (<your query here>
) t;
Not knowing what the data looks like, it is quite possible that count(distinct hits_product.productbrand) would also do the trick.
How about defining the multiple variable columns as one repeated 'variables' column of KeyValue messages, where the key is your variable name and the value a number? It can greatly simplify your calculation.

SQL -- Multiple rows, similar value in a row, need to not show specific values

Here is the issue:
Table name = a
1 2 3
123 1 A
123 1 A
123 2 A
332 1 A
332 1 A
321 2 B
321 2 A
321 1 A
So far what I have is this:
select distinct 1,2,3 from a where a.2='1' and a.3='B';
What it returns is each result (except for 321).
I only want to select values of column 1 as long as that value is not in any row where there is a 2 in column 2 or a B in column 3. Is this possible?
"not in a row where there is a 2 in column 2 or a B in column 3" can be expressed as
select distinct 1,2,3 from a where a.2!='2' or a.3!='B';
or
select distinct 1,2,3 from a where a.2 <> '2' or a.3 <> 'B';
I would use group by and having:
select col1
from t
group by col1
having sum(case when col2 = 2 then 1 else 0 end) = 0 and
sum(case when col3 = 'B' then 1 else 0 end) = 0;
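The GROUP BY / HAVING approach runs as-is on SQLite (columns renamed col1..col3 here, since bare 1/2/3 are not legal identifiers):

```python
import sqlite3

# Keep a col1 group only when no row in the group has col2 = 2 or col3 = 'B'.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (col1 INT, col2 INT, col3 TEXT);
INSERT INTO t VALUES (123,1,'A'),(123,1,'A'),(123,2,'A'),
                     (332,1,'A'),(332,1,'A'),
                     (321,2,'B'),(321,2,'A'),(321,1,'A');
""")
rows = conn.execute("""
SELECT col1
FROM t
GROUP BY col1
HAVING SUM(CASE WHEN col2 = 2 THEN 1 ELSE 0 END) = 0
   AND SUM(CASE WHEN col3 = 'B' THEN 1 ELSE 0 END) = 0
""").fetchall()
print(rows)  # → [(332,)]
```

123 and 321 are excluded because each has at least one row with col2 = 2; only 332 survives.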

SQL Concatenate Rows by Composite Group

I need to concatenate row values into a column based on which group the row belongs to, using two grouping values.
TBL1
cat1 cat2 cat3 value
---- ---- ---- -----
1 1 lvl1 100
1 2 lvl2 abc
1 3 lvl2 cba
2 1 lvl1 200
2 2 lvl2 abb
3 1 lvl1 100
3 2 lvl2 bbc
3 3 lvl2 acc
3 4 lvl1 400
3 5 lvl2 acc
4 1 lvl1 300
4 2 lvl2 aab
...
TBL2
cat1 cat2 value
---- ---- ---------
1 100 abc, cba
2 200 abb
3 100 bbc, acc
3 400 acc
4 300 aab
...
This is using static DB2 SQL. The actual table has over a thousand records.
At least some versions of DB2 support listagg(), so the tricky part is identifying the groups. You can do this by cumulatively counting the rows where the value is a number. The resulting query is something like this:
select cat1,
max(case when value >= '0' and value <= '999' then value end) as cat2,
listagg(case when not (value >= '0' and value <= '999') then value end, ', ') within group (order by cat2) as value
from (select t.*,
sum(case when value >= '0' and value <= '999' then 1 else 0 end) over (order by cat1, cat2) as grp
from t
) t
group by cat1, grp;
Checking for a number in DB2 can be tricky. The above uses simple between logic that is sufficient for your sample data.
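The same cumulative-count grouping works with SQLite's GROUP_CONCAT standing in for listagg(); the "is it a number" test below is a crude GLOB pattern (an assumption, replacing the between-'0'-and-'999' check, and sufficient only for this sample data):

```python
import sqlite3

# grp increments at every numeric row, so each lvl1 number starts a new
# group; within a group, MAX picks the number and GROUP_CONCAT joins the rest.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (cat1 INT, cat2 INT, cat3 TEXT, value TEXT);
INSERT INTO t VALUES
 (1,1,'lvl1','100'),(1,2,'lvl2','abc'),(1,3,'lvl2','cba'),
 (2,1,'lvl1','200'),(2,2,'lvl2','abb'),
 (3,1,'lvl1','100'),(3,2,'lvl2','bbc'),(3,3,'lvl2','acc'),
 (3,4,'lvl1','400'),(3,5,'lvl2','acc'),
 (4,1,'lvl1','300'),(4,2,'lvl2','aab');
""")
rows = conn.execute("""
SELECT cat1,
       MAX(CASE WHEN value GLOB '[0-9]*' THEN value END) AS cat2,
       GROUP_CONCAT(CASE WHEN value NOT GLOB '[0-9]*' THEN value END, ', ') AS vals
FROM (
  SELECT t.*,
         SUM(value GLOB '[0-9]*') OVER (ORDER BY cat1, cat2) AS grp
  FROM t
)
GROUP BY cat1, grp
ORDER BY cat1, grp
""").fetchall()
print(rows)
```

GROUP_CONCAT does not guarantee element order, so treat the concatenated values as a set rather than relying on "abc, cba" coming out in that exact order.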