Selecting max value between columns and replacing non-max values with 0 - sql

I have 4 columns in a table, each holding a different category value. For each row, I would like to find the maximum value across the columns and keep only that, turning all other values to 0. How can I go about doing this?
Reproducible example
CREATE TABLE #df (
cat1 int,
cat2 int,
cat3 int,
cat4 int
);
INSERT INTO #df
(
cat1,
cat2,
cat3,
cat4
)
VALUES
( 1, 0, 3, 4 ),
( 0, 2, 0, 4 ),
( 1, 2, 0, 0 ),
( 0, 0, 0, 4 )
SELECT * FROM #df
Final Table:
Cat1 Cat2 Cat3 Cat4
0 0 0 4
0 0 0 4
0 2 0 0
0 0 0 4
My attempt: This is close to what I want, but instead of changing the existing columns it adds a new column with the max value. I would like the same 4 columns as before, with the non-max values replaced by 0.
SELECT Cat1, Cat2, Cat3, Cat4,
(SELECT Max(Col) FROM (VALUES (Cat1), (Cat2), (Cat3), (Cat4)) AS X(Col)) AS TheMax
FROM #df

Use your existing statement inside a cross apply, then use an inline if (or a case expression) to pick the required value for each column:
select
Iif(cat1 = themax, cat1, 0) cat1,
Iif(cat2 = themax, cat2, 0) cat2,
Iif(cat3 = themax, cat3, 0) cat3,
Iif(cat4 = themax, cat4, 0) cat4
from #df
cross apply (
select Max(Col) from (values(Cat1), (Cat2), (Cat3), (Cat4))x(Col)
)m(themax)
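The answer mentions a case expression as an alternative to the inline if. A minimal sketch of that variant, assuming the #df temp table from the question and SQL Server (CROSS APPLY is T-SQL specific); the aliases x and m are only illustrative:

```sql
-- Same idea as the IIF version, written with CASE expressions.
SELECT
    CASE WHEN cat1 = m.themax THEN cat1 ELSE 0 END AS cat1,
    CASE WHEN cat2 = m.themax THEN cat2 ELSE 0 END AS cat2,
    CASE WHEN cat3 = m.themax THEN cat3 ELSE 0 END AS cat3,
    CASE WHEN cat4 = m.themax THEN cat4 ELSE 0 END AS cat4
FROM #df
CROSS APPLY (
    -- Row-wise maximum: turn the four columns into four rows, then aggregate.
    SELECT MAX(Col)
    FROM (VALUES (cat1), (cat2), (cat3), (cat4)) AS x(Col)
) AS m(themax);
```

Note that if two columns tie for the maximum in a row, both keep their value; neither version picks a single winner.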

Related

Calculating distance using geometry of x and y location in SQL

I'm using SQL Server and I need to calculate the distance between the x and y of a frame and the previous x and y of a frame where the day, team, and member are all the same. Currently, I have this code that works but doesn't accomplish what I need. I'm getting every distance permutation of the x and y location where the day, team, and member are all the same.
I need help to incorporate frames into the query so that I get the N+1 Frame x and y location minus the N Frame x and y location.
CREATE TABLE TestTable (
Day int NULL,
Frame int NULL,
Team int NULL,
Member int NULL,
x float NULL,
y float NULL
);
Insert into TestTable Values
(1, 1, 1, 1, 1486.64, 2017.55),
(1, 1, 1, 2, 1754.55, 1495.81),
(1, 1, 2, 1, 2049.15, 876.349),
(1, 2, 1, 1, 1707.59, 1171.22),
(1, 2, 1, 2, 1432.56, 1459.99),
(1, 2, 2, 1, 1470.27, 1086.22),
(1, 3, 1, 1, 3639.19, 1281.36),
(1, 3, 1, 2, 2751.37, 976.348),
(1, 3, 2, 1, 2496.69, 1283.29),
(1, 4, 1, 1, 2347.26, 984.255),
(1, 4, 1, 2, 2044.92, 711.154),
(1, 4, 2, 1, 2473.65, 1816.23);
Select A.Day, A.Frame, A.Team, A.Member,
GEOMETRY::Point(A.[x], A.[y], 0).STDistance(GEOMETRY::Point(B.[x], B.[y], 0)) As Distance
From TestTable A
Join TestTable B
ON A.Day = B.Day and A.Team = B.Team and A.Member = B.Member
I may also have to deal with NULL x and y values, so it would help if the query could handle that too, for example:
Where A.x IS NOT NULL and A.y IS NOT NULL
Ultimately I want to track the distance of every member throughout the day, frame by frame. Later, I'll add up each member's total distance for the day.
;WITH CTE1 AS
(
SELECT
[day], team, member, frame, x, y,
LAG(x) OVER (PARTITION BY [day], team, member ORDER BY frame) AS PervFrameX,
LAG(y) OVER (PARTITION BY [day], team, member ORDER BY frame) AS PervFrameY
FROM
TestTable
WHERE
X IS NOT NULL AND Y IS NOT NULL
),
CTE2 AS
(
SELECT
[day], team, member, frame, x, y, PervFrameX, PervFrameY,
IIF(PervFrameX IS NULL OR PervFrameY IS NULL, 0,
GEOMETRY::Point(x, y, 0).STDistance(GEOMETRY::Point(PervFrameX, PervFrameY, 0))) As Distance
FROM
CTE1
)
SELECT
*,
SUM(Distance) OVER (PARTITION BY [day], team, member) AS MemberTotalDistance,
SUM(Distance) OVER (PARTITION BY [day]) AS DailyTotalDistance
FROM
CTE2
ORDER BY
[day], team, member, frame
CTE1 and CTE2 are used to improve readability of the query.
Output:
day team member frame x y PervFrameX PervFrameY Distance MemberTotalDistance DailyTotalDistance
1 1 1 1 1486.64 2017.55 NULL NULL 0.000 4135.086 8812.698
1 1 1 2 1707.59 1171.22 1486.64 2017.55 874.696 4135.086 8812.698
1 1 1 3 3639.19 1281.36 1707.59 1171.22 1934.738 4135.086 8812.698
1 1 1 4 2347.26 984.255 3639.19 1281.36 1325.652 4135.086 8812.698
1 1 2 1 1754.55 1495.81 NULL NULL 0.000 2483.257 8812.698
1 1 2 2 1432.56 1459.99 1754.55 1495.81 323.976 2483.257 8812.698
1 1 2 3 2751.37 976.348 1432.56 1459.99 1404.695 2483.257 8812.698
1 1 2 4 2044.92 711.154 2751.37 976.348 754.586 2483.257 8812.698
1 2 1 1 2049.15 876.349 NULL NULL 0.000 2194.355 8812.698
1 2 1 2 1470.27 1086.22 2049.15 876.349 615.750 2194.355 8812.698
1 2 1 3 2496.69 1283.29 1470.27 1086.22 1045.167 2194.355 8812.698
1 2 1 4 2473.65 1816.23 2496.69 1283.29 533.438 2194.355 8812.698
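Since the points use SRID 0, STDistance here is a plain straight-line distance, so the same per-frame value can also be computed without the geometry type. A minimal sketch under that assumption, using the TestTable from the question and SQL Server 2012 or later (for LAG):

```sql
SELECT
    [day], team, member, frame, x, y,
    -- Straight-line distance from the previous frame of the same member;
    -- COALESCE turns the NULL of each member's first frame into 0.
    COALESCE(
        SQRT(
            SQUARE(x - LAG(x) OVER (PARTITION BY [day], team, member ORDER BY frame)) +
            SQUARE(y - LAG(y) OVER (PARTITION BY [day], team, member ORDER BY frame))
        ),
        0
    ) AS Distance
FROM TestTable
WHERE x IS NOT NULL AND y IS NOT NULL
ORDER BY [day], team, member, frame;
```

Because rows with NULL coordinates are filtered out before LAG runs, a frame that follows a missing frame is measured against the last frame that does have coordinates.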

Select column names with X highest values

I have created a matrix of users and interactions with product categories, my data looks like this, where each row is a user and each column is a category, with the number indicating how many interactions they have made with that category:
User Cat1 Cat2 Cat3 Cat4 Cat5 ...
1 0 1 0 2 30
2 0 0 10 5 0
3 0 5 0 0 0
4 2 0 20 2 0
5 0 40 0 0 0
...
I'd like to add a column (either in this query or in a fresh query on this table) which returns, for each user, the 3 column names that contain the highest values.
My complete data has 200+ columns.
Any suggestions on how I could achieve this in StandardSQL?
Here is the code I used to build my grid:
SELECT
customDimension.value AS UserID,
SUM(IF(LOWER(hits_product.productbrand) LIKE "Brand 1",1,0)) AS brand_1,
SUM(IF(LOWER(hits_product.productbrand) LIKE "Brand 2",1,0)) AS brand_2,
SUM(IF(LOWER(hits_product.productbrand) LIKE "Brand 3",1,0)) AS brand_3,
FROM
`table*` AS t
CROSS JOIN
UNNEST (hits) AS hits
CROSS JOIN
UNNEST(t.customdimensions) AS customDimension
CROSS JOIN
UNNEST(hits.product) AS hits_product
WHERE
parse_DATE('%y%m%d',
_table_suffix) BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 1 day)
AND DATE_SUB(CURRENT_DATE(), INTERVAL 1 day)
AND customDimension.index = 2
AND hits.eventInfo.eventCategory = 'Ecommerce'
AND hits.eventInfo.eventAction = 'Purchase'
GROUP BY
UserID
LIMIT 50
Below is for BigQuery Standard SQL. It does not depend on the number of category columns, even though the example has just 5.
#standardSQL
SELECT *,
ARRAY_TO_STRING(ARRAY(
SELECT SPLIT(kv, ':')[OFFSET(0)]
FROM UNNEST(SPLIT(REGEXP_REPLACE(TO_JSON_STRING(t), r'[{"}]', ''))) kv
WHERE LOWER(SPLIT(kv, ':')[OFFSET(0)]) <> 'user'
ORDER BY CAST(SPLIT(kv, ':')[OFFSET(1)] AS INT64) DESC
LIMIT 3
), ',') top3_cat
FROM `yourproject.yourdataset.yourtable` t
You can test and play with the above using the dummy data from your question:
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1 user, 0 cat1, 1 cat2, 0 cat3, 2 cat4, 30 cat5 UNION ALL
SELECT 2, 0, 0, 10, 5, 0 UNION ALL
SELECT 3, 0, 5, 0, 0, 0 UNION ALL
SELECT 4, 2, 0, 20, 2, 0 UNION ALL
SELECT 5, 0, 40, 0, 0, 0
)
SELECT *,
ARRAY_TO_STRING(ARRAY(
SELECT SPLIT(kv, ':')[OFFSET(0)]
FROM UNNEST(SPLIT(REGEXP_REPLACE(TO_JSON_STRING(t), r'[{"}]', ''))) kv
WHERE LOWER(SPLIT(kv, ':')[OFFSET(0)]) <> 'user'
ORDER BY CAST(SPLIT(kv, ':')[OFFSET(1)] AS INT64) DESC
LIMIT 3
), ',') top3_cat
FROM `project.dataset.table` t
with result
Row user cat1 cat2 cat3 cat4 cat5 top3_cat
1 1 0 1 0 2 30 cat5,cat4,cat2
2 2 0 0 10 5 0 cat3,cat4,cat2
3 3 0 5 0 0 0 cat2,cat3,cat1
4 4 2 0 20 2 0 cat3,cat4,cat1
5 5 0 40 0 0 0 cat2,cat3,cat1
I've updated my question with the code I used to build the matrix, would you mind showing how I would integrate your solution?
#standardSQL
WITH `query_result` AS (
SELECT
customDimension.value AS UserID,
SUM(IF(LOWER(hits_product.productbrand) LIKE "Brand 1",1,0)) AS brand_1,
SUM(IF(LOWER(hits_product.productbrand) LIKE "Brand 2",1,0)) AS brand_2,
SUM(IF(LOWER(hits_product.productbrand) LIKE "Brand 3",1,0)) AS brand_3,
...
...
FROM
`table*` AS t
CROSS JOIN
UNNEST (hits) AS hits
CROSS JOIN
UNNEST(t.customdimensions) AS customDimension
CROSS JOIN
UNNEST(hits.product) AS hits_product
WHERE
parse_DATE('%y%m%d',
_table_suffix) BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 1 day)
AND DATE_SUB(CURRENT_DATE(), INTERVAL 1 day)
AND customDimension.index = 2
AND hits.eventInfo.eventCategory = 'Ecommerce'
AND hits.eventInfo.eventAction = 'Purchase'
GROUP BY
UserID
LIMIT 50
)
SELECT *,
ARRAY_TO_STRING(ARRAY(
SELECT SPLIT(kv, ':')[OFFSET(0)]
FROM UNNEST(SPLIT(REGEXP_REPLACE(TO_JSON_STRING(t), r'[{"}]', ''))) kv
WHERE LOWER(SPLIT(kv, ':')[OFFSET(0)]) <> LOWER('UserID')
ORDER BY CAST(SPLIT(kv, ':')[OFFSET(1)] AS INT64) DESC
LIMIT 3
), ',') top3_cat
FROM `query_result` t
Expanding on my comment: If your data were in a more reasonable format like user | category | cat_count you could run something like:
SELECT user, group_concat(category) as top_3_cat
FROM
(
SELECT user, category, rank() OVER (PARTITION BY user ORDER BY cat_count DESC) as cat_rank
FROM yourtable
) cat_ranking
WHERE cat_rank <= 3
GROUP BY user;
Doing this in your current schema would be nearly impossible given the number of categories you have as columns.
I would focus on unpivoting your table first so it can be run through the SQL above. This may be possible using an unpivot transform, although I'm not sure what the limit is for unpivoting columns.
unpivot col:cat1, cat2, cat3, cat4, cat5, catN groupEvery:N
I don't use bigquery, so I'm not certain how that gets applied to your dataset, but it looks promising.
The other option is to UNION many statements together to make up yourtable in the SQL above:
SELECT user, 'cat1' as category, cat1 as cat_count FROM yourtable
UNION ALL SELECT user, 'cat2', cat2 FROM yourtable
UNION ALL SELECT user, 'cat3', cat3 FROM yourtable
UNION ALL SELECT user, 'cat4', cat4 FROM yourtable
UNION ALL SELECT user, 'cat5', cat5 FROM yourtable
UNION ALL SELECT user, 'catN', catN FROM yourtable;
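For the unpivot route, BigQuery Standard SQL also has an UNPIVOT operator that produces the user | category | cat_count shape directly. A minimal sketch, assuming the five sample columns from the question (the inline yourtable data and the names cat_count, category and top3_cat are only illustrative); with 200+ columns the IN list would still have to be written out or generated:

```sql
#standardSQL
WITH yourtable AS (
  SELECT 1 AS user, 0 AS cat1, 1 AS cat2, 0 AS cat3, 2 AS cat4, 30 AS cat5 UNION ALL
  SELECT 2, 0, 0, 10, 5, 0 UNION ALL
  SELECT 3, 0, 5, 0, 0, 0
)
SELECT
  user,
  -- Keep the three category names with the highest counts per user.
  ARRAY_TO_STRING(ARRAY_AGG(category ORDER BY cat_count DESC LIMIT 3), ',') AS top3_cat
FROM yourtable
UNPIVOT (cat_count FOR category IN (cat1, cat2, cat3, cat4, cat5))
GROUP BY user;
```

When several categories tie (for example a run of zeros), which of them makes the top 3 is not deterministic unless a tie-breaker is added to the ORDER BY.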
You would use arrays in bigquery:
select t.*,
(select array_agg(s.colname order by s.val desc limit 3)
from unnest(array[struct('col1' as colname, col1 as val),
struct('col2' as colname, col2 as val),
. . .
]
) s
) as top3
from t

SQL Server: how to find the record where a field is X for the first time and there are no later records where it isn't

I tried for quite some time now but cannot figure out how to best do this without using cursors. What I want to do (in SQL Server) is:
Find the earliest (by Date) record where Criterion=1 AND NOT followed by Criterion=0 for each Name and Category.
Or expressed differently:
Find the Date when Criterion turned 1 and not turned 0 again afterwards (for each Name and Category).
Some sort of CTE would seem to make sense I guess but that's not my strong suit unfortunately. So I tried nesting queries to find the latest record where Criterion=0 and then select the next record if there is one but I'm getting incorrect results. Another challenge with this is returning a record where there are only records with Criterion=1 for a Name and Category.
Here's the sample data:
Name Category Criterion Date
------------------------------------------------
Bob Cat1 1 22.11.16 08:54 X
Bob Cat2 0 21.02.16 02:29
Bob Cat3 1 22.11.16 08:55
Bob Cat3 0 22.11.16 08:56
Bob Cat4 0 21.06.12 02:30
Bob Cat4 0 18.11.16 08:18
Bob Cat4 1 18.11.16 08:19
Bob Cat4 0 22.11.16 08:20
Bob Cat4 1 22.11.16 08:50 X
Bob Cat4 1 22.11.16 08:51
Hannah Cat1 1 22.11.16 08:54 X
Hannah Cat2 0 21.02.16 02:29
Hannah Cat3 1 22.11.16 08:55
Hannah Cat3 0 22.11.16 08:56
The rows marked with an X are the ones I want to retrieve.
It's probably not all that complicated in the end...
If you just want the name, category, and date:
select name, category, min(date)
from t
where criterion = 1 and
not exists (select 1
from t t2
where t2.name = t.name and t2.category = t.category and
t2.criterion = 0 and t2.date >= t.date
)
group by name, category;
There are fancier ways to get this information, but this is a relatively simple method.
Actually, the fancier ways aren't particularly complicated:
select t.*
from (select t.*,
min(case when date > maxdate_0 or maxdate_0 is NULL then date end) over (partition by name, category) as mindate_1
from (select t.*,
max(case when criterion = 0 then date end) over (partition by name, category) as maxdate_0
from t
) t
where criterion = 1
) t
where mindate_1 = date;
EDIT:
SQL Fiddle doesn't seem to be working these days. The following is working for me (using Postgres):
with t(name, category, criterion, date) as (
values ('Bob', 'Cat1', 1, '2016-11-22 08:54'),
('Bob', 'Cat2', 0, '2016-02-21 02:29'),
('Bob', 'Cat3', 1, '2016-11-22 08:55'),
('Bob', 'Cat3', 0, '2016-11-22 08:56'),
('Bob', 'Cat4', 0, '2012-06-21 02:30'),
('Bob', 'Cat4', 0, '2016-11-18 08:18'),
('Bob', 'Cat4', 1, '2016-11-18 08:19'),
('Bob', 'Cat4', 0, '2016-11-22 08:20'),
('Bob', 'Cat4', 1, '2016-11-22 08:50'),
('Bob', 'Cat4', 1, '2016-11-22 08:51'),
('Hannah', 'Cat1', 1, '2016-11-22 08:54'),
('Hannah', 'Cat2', 0, '2016-02-21 02:29'),
('Hannah', 'Cat3', 1, '2016-11-22 08:55'),
('Hannah', 'Cat3', 0, '2016-11-22 08:56')
)
select t.*
from (select t.*,
min(case when date > maxdate_0 or maxdate_0 is NULL then date end) over (partition by name, category) as mindate_1
from (select t.*,
max(case when criterion = 0 then date end) over (partition by name, category) as maxdate_0
from t
) t
where criterion = 1
) t
where mindate_1 = date;
How about a left join, and filter the NULLs?
SELECT yt.Name, yt.Category, yt.Criterion, MIN(yt.Date) AS Date
FROM YourTable yt
LEFT JOIN YourTable lj ON lj.Name = yt.Name AND lj.Category = yt.Category AND
lj.Criterion != yt.Criterion AND lj.Date > yt.Date
WHERE yt.Criterion = 1 AND lj.Name IS NULL
GROUP BY yt.Name, yt.Category, yt.Criterion
There are tons of ways of doing it, especially with window functions. NOT EXISTS and the anti join are two of the better methods, but just for fun here is one of the fancier (to steal Gordon's term) ways of doing it with window functions:
;WITH cte AS (
SELECT
Name
,Category
,CASE WHEN Criterion = 1 THEN Date END as Criterion1Date
,MAX(CASE WHEN Criterion = 0 THEN Date END) OVER (PARTITION BY Name, Category) as MaxDateCriterion0
FROM
Table
)
SELECT
Name
,Category
,MIN(Criterion1Date) as Date
FROM
cte
WHERE
ISNULL(MaxDateCriterion0,'1/1/1900') < Criterion1Date
GROUP BY
Name
,Category
Or as a derived table, if you don't like CTEs. The only difference is that the CTE body is nested in the FROM clause.
SELECT
Name
,Category
,MIN(Criterion1Date) as Date
FROM
(
SELECT
Name
,Category
,CASE WHEN Criterion = 1 THEN Date END as Criterion1Date
,MAX(CASE WHEN Criterion = 0 THEN Date END) OVER (PARTITION BY Name, Category) as MaxDateCriterion0
FROM
Table
) t
WHERE
ISNULL(MaxDateCriterion0,'1/1/1900') < Criterion1Date
GROUP BY
Name
,Category
Modified answer
select name,category
,min (date) as date
from (select name,category,criterion,date
,min (criterion) over
(
partition by name,category
order by date
rows between current row and unbounded following
) as min_following_criterion
from t
) t
where criterion = 1
and ( min_following_criterion <> 0
or min_following_criterion is null
)
group by name,category
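The only self-contained test in this thread targets Postgres, while the question is about SQL Server. A minimal T-SQL sketch for checking the NOT EXISTS approach against the sample rows, assuming the dd.mm.yy dates from the question are rewritten as ISO strings (the helper names v, dt and first_stable_date are made up for the example):

```sql
WITH t AS (
    SELECT v.name, v.category, v.criterion, CAST(v.dt AS datetime) AS [date]
    FROM (VALUES
        ('Bob',    'Cat1', 1, '2016-11-22T08:54:00'),
        ('Bob',    'Cat2', 0, '2016-02-21T02:29:00'),
        ('Bob',    'Cat3', 1, '2016-11-22T08:55:00'),
        ('Bob',    'Cat3', 0, '2016-11-22T08:56:00'),
        ('Bob',    'Cat4', 0, '2012-06-21T02:30:00'),
        ('Bob',    'Cat4', 0, '2016-11-18T08:18:00'),
        ('Bob',    'Cat4', 1, '2016-11-18T08:19:00'),
        ('Bob',    'Cat4', 0, '2016-11-22T08:20:00'),
        ('Bob',    'Cat4', 1, '2016-11-22T08:50:00'),
        ('Bob',    'Cat4', 1, '2016-11-22T08:51:00'),
        ('Hannah', 'Cat1', 1, '2016-11-22T08:54:00'),
        ('Hannah', 'Cat2', 0, '2016-02-21T02:29:00'),
        ('Hannah', 'Cat3', 1, '2016-11-22T08:55:00'),
        ('Hannah', 'Cat3', 0, '2016-11-22T08:56:00')
    ) AS v(name, category, criterion, dt)
)
SELECT t.name, t.category, MIN(t.[date]) AS first_stable_date
FROM t
WHERE t.criterion = 1
  -- Keep a Criterion=1 row only if no Criterion=0 row follows it.
  AND NOT EXISTS (
        SELECT 1
        FROM t AS t2
        WHERE t2.name = t.name
          AND t2.category = t.category
          AND t2.criterion = 0
          AND t2.[date] > t.[date]
      )
GROUP BY t.name, t.category
ORDER BY t.name, t.category;
```

On this data it should return one row each for Bob/Cat1, Bob/Cat4 (the 08:50 record) and Hannah/Cat1, which are the rows marked with an X in the question.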

SQL Complex Summation

date, ID, Value, Exp1
201301, 1, 2
201302, 1, 3
201303, 1, 4
201301, 2, 2
201302, 2, 3
201303, 2, 4
201304, 2, 5
201305, 2, 6
201306, 2, 7
201307, 2, 8
201308, 2, 9
201309, 2, 10
201310, 2, 11
201311, 2, 12
201312, 2, 13
I have to calculate the value of Exp1 as follows.
For ID=2: Exp1 = (sum of Value from 201307 to 201312)/6 - (sum of Value from 201301 to 201306)/6
Some IDs might not have a value for every month; some might have only one value.
Is this possible in SQL?
For ID 2: Exp1 = (13+12+11+10+9+8)/6 - (7+6+5+4+3+2)/6
For ID 1: Exp1 = (0+0+0+0+0+0)/6 - (2+3+4+0+0+0)/6
This has to be done for all the IDs.
select
ID,
sum(
case
when YRMO between 201307 and 201312 then value
else 0
end)/6
- sum(
case
when YRMO between 201301 and 201306 then value
else 0
end)/6 as EXP1
from TABLE
group by ID;
select
id,
sum(value) / 6 exp1
from (
select
id,
case when YRMO between '201301' and '201306' then -value else value end value
from `table`
where YRMO between '201301' and '201312'
) q
group by id
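One detail neither query addresses: if the value column is an integer, dividing by 6 may truncate in engines that use integer division (SQL Server and PostgreSQL do). A minimal self-contained sketch of the conditional-aggregation approach that divides by 6.0 instead; it assumes the first two months of each ID are 201301 and 201302, as the worked example in the question implies, and the names t, v and yrmo are illustrative:

```sql
WITH t AS (
    SELECT *
    FROM (VALUES
        (201301, 1, 2),  (201302, 1, 3),  (201303, 1, 4),
        (201301, 2, 2),  (201302, 2, 3),  (201303, 2, 4),
        (201304, 2, 5),  (201305, 2, 6),  (201306, 2, 7),
        (201307, 2, 8),  (201308, 2, 9),  (201309, 2, 10),
        (201310, 2, 11), (201311, 2, 12), (201312, 2, 13)
    ) AS v(yrmo, id, value)
)
SELECT
    id,
    -- Second half-year average minus first half-year average.
    -- Missing months contribute 0 to the sum but still count toward the 6.
    SUM(CASE WHEN yrmo BETWEEN 201307 AND 201312 THEN value ELSE 0 END) / 6.0
  - SUM(CASE WHEN yrmo BETWEEN 201301 AND 201306 THEN value ELSE 0 END) / 6.0 AS exp1
FROM t
GROUP BY id
ORDER BY id;
```

For this data the result is -1.5 for ID 1 (0/6 - 9/6) and 6 for ID 2 (63/6 - 27/6).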

Subquery returned more than 1 value. Inner Query

I have two tables tbl_Category and tbl_Course.
In tbl_Category I have rows like this:
CatID CatName CatDesc
1 Cat1 catDesc1
2 Cat2 catDesc2
3 Cat3 catDesc3
4 Cat4 catDesc4
5 Cat5 catDesc5
and in tbl_course values are like
CoursID Name AssignCategory AdditionalCat
1 cou1 1 2,3
2 cou2 2 3
3 cou3 1 3,4
I need result like below
Categories that appear in AssignCategory or AdditionalCat
CatID CatName CatDesc
1 Cat1 catDesc1
2 Cat2 catDesc2
3 Cat3 catDesc3
4 Cat4 catDesc4
Categories that do not appear in AssignCategory or AdditionalCat
CatID CatName CatDesc
5 Cat5 catDesc5
I am using this split function
CREATE FUNCTION dbo.StrSplit (@sep char(1), @s varchar(512))
RETURNS table
AS
RETURN (
WITH Pieces(pn, start, stop) AS (
SELECT 1, 1, CHARINDEX(@sep, @s)
UNION ALL
SELECT pn + 1, stop + 1, CHARINDEX(@sep, @s, stop + 1)
FROM Pieces
WHERE stop > 0
)
SELECT pn,
SUBSTRING(@s, start, CASE WHEN stop > 0 THEN stop-start ELSE 512 END) AS s
FROM Pieces
)
I am using the queries below for the AssignCategory results:
select * from dbo.Tbl_Category
where catid in (select assigncategory from Tbl_Course)
select * from dbo.Tbl_Category
where catid not in (select assigncategory from Tbl_Course)
Please help me extend the above queries to also cover the additional categories.
You should use CROSS APPLY to make use of your StrSplit udf:
SELECT * FROM dbo.tbl_Category
WHERE CatID IN(
SELECT AssignCategory
FROM tbl_Course
UNION
SELECT CAST(split.S as int)
FROM tbl_Course
CROSS APPLY dbo.StrSplit(',', AdditionalCat) as split )
SELECT * FROM dbo.tbl_Category
WHERE CatID NOT IN(
SELECT AssignCategory
FROM tbl_Course
UNION
SELECT CAST(split.S as int)
FROM tbl_Course
CROSS APPLY dbo.StrSplit(',', AdditionalCat) as split )
SQLFiddle here.
You can also use UNPIVOT to avoid using UNION. But since there are only 2 columns that need to be merged, UNION is probably 'good enough' for this purpose.
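On SQL Server 2016 or later (database compatibility level 130 and up), the built-in STRING_SPLIT function can stand in for the custom StrSplit udf. A minimal sketch of the "not assigned anywhere" query under that assumption, written with NOT EXISTS so that a NULL AssignCategory cannot make a NOT IN list return nothing; the aliases c, co and s are illustrative:

```sql
SELECT c.*
FROM dbo.tbl_Category AS c
WHERE NOT EXISTS (
        -- Not used as any course's main category.
        SELECT 1
        FROM dbo.tbl_Course AS co
        WHERE co.AssignCategory = c.CatID
      )
  AND NOT EXISTS (
        -- Not listed in any course's comma-separated AdditionalCat either.
        SELECT 1
        FROM dbo.tbl_Course AS co
        CROSS APPLY STRING_SPLIT(co.AdditionalCat, ',') AS s
        WHERE TRY_CAST(s.value AS int) = c.CatID
      );
```

With the sample rows from the question this should leave only Cat5. Changing both NOT EXISTS to EXISTS and joining them with OR gives the categories that are assigned somewhere.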