Presto SQL pivoting based on a set of yes/no column values - sql

I have a table with a list of tasks, each of which helps satisfy one or more legal regulations, along with the status of each task. I would like to group by the regulation name in my Presto SQL table, where the regulations are a subset of the column names. Here's the table:
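| Task  | Regulation1 | Regulation2 | Regulation3 | Status   |
|-------|-------------|-------------|-------------|----------|
| Task1 | Yes         | No          | Yes         | On Track |
| Task2 | No          | No          | Yes         | On Track |
| Task3 | Yes         | No          | No          | At Risk  |
| Task4 | No          | No          | Yes         | Blocked  |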
I'd like the output to be one row per Regulation pivoted by the status with the count of tasks, like this:
I don't know how to do a "Group By" based on a subset of the column names, so I'm not sure how to do this. Thanks in advance for the help.

Maybe there is a better way but the one I came up with is a bit cumbersome - it involves grouping and then using unnest on "synthetic" arrays:
-- sample data
with dataset(Task, Regulation1, Regulation2, Regulation3, Status) as (
    values ('Task1', 'Yes', 'No', 'Yes', 'On Track'),
           ('Task2', 'No', 'No', 'Yes', 'On Track'),
           ('Task3', 'Yes', 'No', 'No', 'At Risk'),
           ('Task4', 'No', 'No', 'Yes', 'Blocked')
)
-- query
select regulation,
       sum(on_track) on_track,
       sum(at_risk) at_risk,
       sum(blocked) blocked
from (
    select Status,
           count_if(Regulation1 = 'Yes') Regulation1,
           count_if(Regulation2 = 'Yes') Regulation2,
           count_if(Regulation3 = 'Yes') Regulation3
    from dataset
    group by Status),
unnest(array['Regulation1', 'Regulation2', 'Regulation3'],
       array[if(Status='On Track', Regulation1), if(Status='On Track', Regulation2), if(Status='On Track', Regulation3)],
       array[if(Status='At Risk', Regulation1), if(Status='At Risk', Regulation2), if(Status='At Risk', Regulation3)],
       array[if(Status='Blocked', Regulation1), if(Status='Blocked', Regulation2), if(Status='Blocked', Regulation3)]
) as t(regulation, on_track, at_risk, blocked)
group by regulation;
Output:
| regulation  | on_track | at_risk | blocked |
|-------------|----------|---------|---------|
| Regulation2 | 0        | 0       | 0       |
| Regulation3 | 2        | 0       | 1       |
| Regulation1 | 1        | 1       | 0       |
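For comparison, here is a minimal sketch of the same pivot written as "unpivot with UNION ALL, then count conditionally". It runs against SQLite through Python's sqlite3 module only so that it can be executed without a Presto cluster; the CTE name `unpivoted` and the column alias `flag` are made up for illustration, and the same SQL shape should carry over to Presto, though that is an assumption rather than something tested here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dataset (Task TEXT, Regulation1 TEXT, Regulation2 TEXT,
                      Regulation3 TEXT, Status TEXT);
INSERT INTO dataset VALUES
    ('Task1', 'Yes', 'No', 'Yes', 'On Track'),
    ('Task2', 'No',  'No', 'Yes', 'On Track'),
    ('Task3', 'Yes', 'No', 'No',  'At Risk'),
    ('Task4', 'No',  'No', 'Yes', 'Blocked');
""")

pivot_sql = """
WITH unpivoted AS (
    -- one row per (task, regulation) pair; 'flag' holds the Yes/No value
    SELECT 'Regulation1' AS regulation, Regulation1 AS flag, Status FROM dataset
    UNION ALL
    SELECT 'Regulation2', Regulation2, Status FROM dataset
    UNION ALL
    SELECT 'Regulation3', Regulation3, Status FROM dataset
)
SELECT regulation,
       SUM(CASE WHEN flag = 'Yes' AND Status = 'On Track' THEN 1 ELSE 0 END) AS on_track,
       SUM(CASE WHEN flag = 'Yes' AND Status = 'At Risk'  THEN 1 ELSE 0 END) AS at_risk,
       SUM(CASE WHEN flag = 'Yes' AND Status = 'Blocked'  THEN 1 ELSE 0 END) AS blocked
FROM unpivoted
GROUP BY regulation
ORDER BY regulation
"""

pivot_rows = conn.execute(pivot_sql).fetchall()
for row in pivot_rows:
    print(row)
```

The counts match the output table above; only the row order differs because of the explicit ORDER BY.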

Related

select all "groups/partitions" where parameter is found

Without going into too much detail - I need to create groups (grouped on a specific field) of data and then display all GROUPS of records that contain a parameter. I need all records in a GROUP even if some do not match the parameter. Any GROUPS where no records contain the parameter would be suppressed.
I'm working with db2 and I just need help with the basic syntax. I'm thinking a PARTITION_BY used within a subquery might be the correct approach. Any ideas? Thanks in advance.
Does this answer the question?
with table1 (group_column, expression, other_column) as (
    values
        ('group1', 'false', 'First in G1'),
        ('group1', 'false', 'Second in G1'),
        ('group2', 'false', 'First in G2'),
        ('group2', 'true', 'Second in G2'),
        ('group3', 'true', 'Full G3')
)
select
    table1.group_column, expression, other_column
from table1
inner join
(
    select
        distinct group_column
    from table1
    where expression = 'true'
) as groups on table1.group_column = groups.group_column
| GROUP_COLUMN | EXPRESSION | OTHER_COLUMN |
|--------------|------------|--------------|
| group2       | false      | First in G2  |
| group2       | true       | Second in G2 |
| group3       | true       | Full G3      |
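The same "keep whole groups that contain at least one match" idea can also be written with an IN subquery instead of a join. The sketch below is only an illustration: it runs on SQLite via Python's sqlite3 module rather than on DB2, and it reuses the sample rows from the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (group_column TEXT, expression TEXT, other_column TEXT);
INSERT INTO table1 VALUES
    ('group1', 'false', 'First in G1'),
    ('group1', 'false', 'Second in G1'),
    ('group2', 'false', 'First in G2'),
    ('group2', 'true',  'Second in G2'),
    ('group3', 'true',  'Full G3');
""")

group_sql = """
SELECT group_column, expression, other_column
FROM table1
-- keep every row whose group has at least one matching row
WHERE group_column IN (SELECT group_column
                       FROM table1
                       WHERE expression = 'true')
ORDER BY group_column, other_column
"""

group_rows = conn.execute(group_sql).fetchall()
for row in group_rows:
    print(row)
```

group1 is suppressed because none of its rows match, while group2 is returned in full even though only one of its rows matches.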

Applying multiple conditions in SQL?

I haven't been able to find a similar question for an answer I'm looking for. What is the best way to apply multiple conditions to my query to exclude certain information. Case or Boolean?
Example code:
SELECT test,
testTypeID,
visitType,
submitted
FROM vReport
WHERE (vReport.submitted = 0 OR vReport.submitted IS NULL)
AND vReport.test IN ('Test 1','Test 2','Test 3')
How do I best code for it to return all the tests 1, 2, and 3 while excluding rows for certain visit types (i.e. exclude row ONLY if it is Test 3 AND it is visit Week 26 AND a certain testTypeID)?
I'm not sure what your column names and datatypes are for visitWeek (I'm assuming it is an INT) and testTypeID, or what values you want to filter by, but here is the logic for it:
SELECT test,
testTypeID,
visitType,
submitted
FROM vReport
WHERE (vReport.submitted = 0 OR vReport.submitted IS NULL)
AND vReport.test IN ('Test 1','Test 2','Test 3')
AND NOT (vReport.test IN ('Test 3') AND vReport.testTypeID IN (some value) AND vReport.visitWeek = 26)
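The rule in the question is "drop a row only when all three conditions hold at once", which is a NOT wrapped around an AND of the three tests. Here is a small sketch of that rule on toy data, run on SQLite through Python's sqlite3 module. The testTypeID value 7, the visit label 'Week 26' and the sample rows are invented for illustration, and the question's visitType column is used because the real column names are not known.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vReport (test TEXT, testTypeID INTEGER, "
             "visitType TEXT, submitted INTEGER)")
conn.executemany("INSERT INTO vReport VALUES (?, ?, ?, ?)", [
    ("Test 1", 7, "Week 26", 0),     # kept: not Test 3
    ("Test 3", 7, "Week 26", 0),     # dropped: all three conditions hold
    ("Test 3", 7, "Week 12", None),  # kept: different visit
    ("Test 3", 8, "Week 26", 0),     # kept: different testTypeID
    ("Test 2", 7, "Week 26", 1),     # dropped: already submitted
    ("Test 4", 7, "Week 26", 0),     # dropped: not in the test list
])

exclusion_sql = """
SELECT test, testTypeID, visitType
FROM vReport
WHERE (submitted = 0 OR submitted IS NULL)
  AND test IN ('Test 1', 'Test 2', 'Test 3')
  -- exclude a row only if ALL three conditions are true together
  AND NOT (test = 'Test 3' AND testTypeID = ? AND visitType = ?)
ORDER BY rowid
"""

kept_rows = conn.execute(exclusion_sql, (7, "Week 26")).fetchall()
for row in kept_rows:
    print(row)
```

Negating each condition separately and joining them with AND would remove every Test 3 row, which is stricter than what was asked.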
If you can define your exclusions homogeneously, you can store them in another table. Something like:
ExcludedTest (excludedTestId, test, visitType, testTypeId)
and your query can be done like this:
SELECT test,
testTypeID,
visitType,
submitted
FROM vReport VR
WHERE (VR.submitted = 0 OR VR.submitted IS NULL)
AND VR.test IN ('Test 1','Test 2','Test 3')
AND NOT EXISTS ( SELECT 1 FROM ExcludedTest ET
WHERE ET.testTypeID = VR.testTypeID
AND ET.visitType = VR.visitType
AND ET.test = VR.test)
Also, you should get better performance if you eliminate that OR. One way to do this is to declare submitted as NOT NULL with DEFAULT(0); then the condition vReport.submitted = 0 is enough.
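The exclusion-table idea can be tried out end to end with the sketch below, which runs on SQLite through Python's sqlite3 module. The sample rows and the single excluded combination ('Test 3', 'Week 26', testTypeID 7) are hypothetical, chosen only to show that the NOT EXISTS anti-join removes exactly the listed combinations.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE vReport (test TEXT, testTypeID INTEGER, visitType TEXT,
                      submitted INTEGER NOT NULL DEFAULT 0);
CREATE TABLE ExcludedTest (excludedTestId INTEGER PRIMARY KEY, test TEXT,
                           visitType TEXT, testTypeID INTEGER);
INSERT INTO vReport (test, testTypeID, visitType) VALUES
    ('Test 1', 7, 'Week 26'),
    ('Test 3', 7, 'Week 26'),
    ('Test 3', 7, 'Week 12');
-- hypothetical exclusion: Test 3 at Week 26 for test type 7
INSERT INTO ExcludedTest (test, visitType, testTypeID) VALUES
    ('Test 3', 'Week 26', 7);
""")

anti_join_sql = """
SELECT VR.test, VR.testTypeID, VR.visitType
FROM vReport VR
WHERE VR.submitted = 0
  AND VR.test IN ('Test 1', 'Test 2', 'Test 3')
  AND NOT EXISTS (SELECT 1 FROM ExcludedTest ET
                  WHERE ET.testTypeID = VR.testTypeID
                    AND ET.visitType = VR.visitType
                    AND ET.test = VR.test)
ORDER BY VR.rowid
"""

remaining_rows = conn.execute(anti_join_sql).fetchall()
for row in remaining_rows:
    print(row)
```

Adding another exclusion then becomes an INSERT into ExcludedTest rather than a change to the query.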

In SQL, how to query for rows based on the values in other rows at a certain relative position in the table

I have a database containing events which have a "time" (an integer) plus some other attributes.
E.g.
CREATE TABLE events (time, attr1, attr2);
INSERT INTO events VALUES (1, 'a', 'foo');
INSERT INTO events VALUES (2, 'b', 'bar');
INSERT INTO events VALUES (4, 'a', 'baz');
INSERT INTO events VALUES (9, 'b', 'quux');
INSERT INTO events VALUES (10, 'c', 'foobar');
Now I want to do a somewhat complicated query: I want to find all events which have the property that the next event in the table satisfies some condition. For instance, I might want to find all events that satisfy all these conditions:
attr1 == 'a'
the next event (as determined by the time field) has attr2 == 'bar'
This should return the event at time 1, but not the event at time 4. Or a more complicated example would be: find all events that satisfy
attr1 == 'a'
the next event for which attr1 == 'c' has attr2 == 'foobar'
This would return both the events at times 1 and 4.
It seems like this ought to be possible via some sort of complicated nested select, but I haven't managed to work out how.
Other notes:
I'm using sqlite.
Events are irregularly spaced, so strategies that involve computing the position of the 'next' event won't work.
I know these queries are going to be murder on the query optimizer, that's okay.
I know how to do this by doing multiple selects + non-SQL logic, but I'd much rather do it using pure SQL, because this is embedded in a larger query generation system. I need to be able to generate queries of this form in general, conjoined with other constraints, etc., it's not just a single query I'll write once and be done with.
You can find a record that is the next after some specific time by combining ORDER BY and LIMIT:
SELECT *
FROM events
WHERE time > 1
ORDER BY time
LIMIT 1
By using this in a subquery, you can look up values from the next record.
Your first query can be implemented like this:
SELECT *
FROM events AS e2
WHERE attr1 = 'a'
AND (SELECT attr2
FROM events
WHERE time > e2.time
ORDER BY time
LIMIT 1) = 'bar'
Your second query can be implemented like this (the additional condition belongs in the WHERE clause of the subquery):
SELECT *
FROM events AS e2
WHERE attr1 = 'a'
AND (SELECT attr2
FROM events
WHERE attr1 = 'c'
AND time > e2.time
ORDER BY time
LIMIT 1) = 'foobar'
The subquery lookups can be made faster with an index on the time column.
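Since the question mentions generating such queries programmatically, here is a rough sketch of how the correlated-subquery pattern above could be wrapped in one parameterized helper using Python's sqlite3 module. The function name `events_followed_by`, its parameters and the index name `events_time` are hypothetical, not part of the original answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (time, attr1, attr2);
INSERT INTO events VALUES (1, 'a', 'foo');
INSERT INTO events VALUES (2, 'b', 'bar');
INSERT INTO events VALUES (4, 'a', 'baz');
INSERT INTO events VALUES (9, 'b', 'quux');
INSERT INTO events VALUES (10, 'c', 'foobar');
CREATE INDEX events_time ON events (time);
""")

def events_followed_by(attr1, next_attr2, next_attr1=None):
    """Return times of events with the given attr1 whose next event
    (optionally only counting events with attr1 == next_attr1)
    has attr2 == next_attr2."""
    sql = """
    SELECT e.time
    FROM events AS e
    WHERE e.attr1 = :attr1
      AND (SELECT n.attr2
           FROM events AS n
           WHERE n.time > e.time
             AND (:next_attr1 IS NULL OR n.attr1 = :next_attr1)
           ORDER BY n.time
           LIMIT 1) = :next_attr2
    ORDER BY e.time
    """
    params = {"attr1": attr1, "next_attr1": next_attr1,
              "next_attr2": next_attr2}
    return [t for (t,) in conn.execute(sql, params)]

first_result = events_followed_by("a", "bar")
second_result = events_followed_by("a", "foobar", next_attr1="c")
print(first_result, second_result)
```

If no later event exists, the scalar subquery yields NULL, the comparison is not true, and the row is simply left out.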
select * from events a
where exists
(
select * from events c where c.time =
(select min(b.time) from events b where b.time > a.time)--next_event
and c.attr2 = 'bar'
)
and a.attr1 = 'a'
This should be your first query. It returns the event at time 1.
http://sqlfiddle.com/#!2/63baf/12
The second could be:
select * from events a
where exists
(
select * from events c where c.time =
(select min(b.time) from events b where b.time > a.time and attr1 = 'c')
and c.attr2 = 'foobar'
)
and a.attr1 = 'a'
This returns the events at times 1 and 4, since both of these rows comply with your conditions.
http://sqlfiddle.com/#!2/63baf/15
hope this helps
Nicolas

How to select multi row values without querying database using 1 line of SQL?

I need to write a SELECT query in one line only that could potentially return n number of rows.
eg
select 1 as 'primary', 'peter#email.com' as 'email'
Will return 1 row with column primary=1 and column email peter#email.com
My parser does not read beyond the first SELECT, so I need to write all this data using one SELECT. I have searched a bit but can't really find a proper answer.
eg.
select (1 as 'primary', 'peter#email.com' as 'email'),(2 as 'primary', 'dave#email.com' as 'email')
does not work.
How about this?
select 1 as 'primary', 'peter#email.com' as 'email' union select 2, 'dave#email.com'
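One detail worth knowing about this approach: UNION removes duplicate rows, while UNION ALL keeps every row as written. The sketch below runs a UNION ALL variant of the one-line statement through Python's sqlite3 module; SQLite is used only as a convenient stand-in because the question does not name the database, and the aliases are written with double quotes here, which is the standard identifier quoting.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A single statement that starts with one SELECT and yields two rows.
one_line_sql = """select 1 as "primary", 'peter#email.com' as "email" union all select 2, 'dave#email.com'"""

cursor = conn.execute(one_line_sql)
column_names = [col[0] for col in cursor.description]
union_rows = sorted(cursor.fetchall())

print(column_names)
print(union_rows)
```

The column names come from the first SELECT, so the aliases only need to be written once.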

What is the MS SQL Server capability similar to the MySQL FIELD() function?

MySQL provides a string function named FIELD() which accepts a variable number of arguments. The return value is the location of the first argument in the list of the remaining ones. In other words:
FIELD('d', 'a', 'b', 'c', 'd', 'e', 'f')
would return 4 since 'd' is the fourth argument following the first.
This function provides the capability to sort a query's results based on a very specific ordering. For my current application there are four statuses that I need to manage: active, approved, rejected, and submitted. However, if I simply order by the status column, I feel the usability of the resulting list is lessened, since rejected and active items are more important than submitted and approved ones.
In MySQL I could do this:
SELECT <stuff> FROM <table> WHERE <conditions> ORDER BY FIELD(status, 'rejected', 'active','submitted', 'approved')
and the results would be ordered such that rejected items were first, followed by active ones, and so on. Thus, the results were ordered in decreasing levels of importance to the visitor.
I could create a separate table which enumerates this importance level for the statuses and then order the query by that in descending order, but this has come up for me a few times since switching to MS SQL Server so I thought I'd inquire as to whether or not I could avoid the extra table and the somewhat more complex queries using a built-in function similar to MySQL's FIELD().
Thank you,
David Kees
Use a CASE expression (SQL Server 2005+):
ORDER BY CASE status
WHEN 'active' THEN 1
WHEN 'approved' THEN 2
WHEN 'rejected' THEN 3
WHEN 'submitted' THEN 4
ELSE 5
END
You can use this syntax for more complex evaluation (including combinations, or if you need to use LIKE)
ORDER BY CASE
WHEN status LIKE 'active' THEN 1
WHEN status LIKE 'approved' THEN 2
WHEN status LIKE 'rejected' THEN 3
WHEN status LIKE 'submitted' THEN 4
ELSE 5
END
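Applied to the ordering requested in the question (rejected, active, submitted, approved), the CASE expression would number the statuses in that sequence. The sketch below checks this on SQLite through Python's sqlite3 module, as a stand-in for SQL Server; the table name `items` and the extra 'archived' status are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (status TEXT)")
conn.executemany("INSERT INTO items VALUES (?)", [
    ("approved",), ("submitted",), ("active",), ("rejected",), ("archived",),
])

order_sql = """
SELECT status
FROM items
ORDER BY CASE status
             WHEN 'rejected'  THEN 1
             WHEN 'active'    THEN 2
             WHEN 'submitted' THEN 3
             WHEN 'approved'  THEN 4
             ELSE 5  -- anything unlisted sorts last
         END
"""

ordered_statuses = [status for (status,) in conn.execute(order_sql)]
print(ordered_statuses)
```

Unlike FIELD(), which returns 0 for values that are not in the list, the ELSE branch lets unlisted statuses be placed wherever is convenient, here at the end.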
For your particular example you could use:
ORDER BY CHARINDEX(
',' + status + ',',
',rejected,active,submitted,approved,'
)
Note that FIELD is supposed to return 0, 1, 2, 3, 4, whereas the above will return 0, 1, 10, 17 and 27, so this trick is only useful inside the ORDER BY clause.
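The positions quoted above can be reproduced with a quick check. The sketch below uses SQLite's instr() through Python's sqlite3 module as a stand-in for CHARINDEX; note that instr() takes the string being searched first and the substring second, the reverse of CHARINDEX. The 'pending' status is a made-up value that is not in the list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
ordering = ",rejected,active,submitted,approved,"

positions = {}
for status in ["pending", "rejected", "active", "submitted", "approved"]:
    # wrap the status in commas so only whole list entries can match
    (pos,) = conn.execute(
        "SELECT instr(?, ',' || ? || ',')", (ordering, status)
    ).fetchone()
    positions[status] = pos

print(positions)
```

The values grow with the position in the list, which is all that ORDER BY needs, but they are character offsets rather than list indexes.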
A set based approach would be to outer join with a table-valued-constructor:
LEFT JOIN (VALUES
('rejected', 1),
('active', 2),
('submitted', 3),
('approved', 4)
) AS lu(status, sort_order)
...
ORDER BY lu.sort_order
I recommend a CTE (SQL server 2005+).
No need to repeat the status codes or create the separate table.
WITH cte(status, RN) AS ( -- CTE to create ordered list and define where clause
SELECT 'active', 1
UNION SELECT 'approved', 2
UNION SELECT 'rejected', 3
UNION SELECT 'submitted', 4
)
SELECT <field1>, <field2>
FROM <table> tbl
INNER JOIN cte ON cte.status = tbl.status -- do the join
ORDER BY cte.RN -- use the ordering defined in the cte
Good luck,
Jason
ORDER BY CHARINDEX(',' + convert(varchar, status) + ',',
',rejected,active,submitted,approved,')
Just put a comma before and after the string being searched (the second parameter), and surround the first parameter of CHARINDEX with commas as well, so that only whole status names match.