Grouping the rows on the basis of specific condition in SQL Server - sql

I want to group the rows on the basis of a specific condition.
The table structure is something like this
EmpID | EmpName | TaskId | A_Shift_Status | B_Shift_Status | C_Shift_Status | D_Shift_Status
1 | John | 1 | 1 | null | 2 | 1
1 | John | 2 | 1 | null | 1 | 1
2 | Mike | 3 | 1 | 1 | 2 | 1
2 | Mike | 4 | null | 1 | null | 1
3 | Steve | 5 | null | 1 | 2 | 1
3 | Steve | 6 | 1 | null | 2 | 1
The criteria will be
Done 1
Pending 2
NA 3
The expected output is to group the employees by task and the status will be on the following condition
if ALL tasks are done by any employee then the status will be done
(i.e. 1)
if ANY of the tasks is incomplete then the status will be
incomplete/pending (i.e. 2)
So the desired output will be
EmpID | EmpName | A_Shift_Status | B_Shift_Status | C_Shift_Status | D_Shift_Status
1 | John | 1 | null | 2 | 1
2 | Mike | 1 | 1 | 2 | 1
3 | Steve | 1 | 1 | 2 | 1
So in other terms summary/grouping should only show complete/done (i.e. 1) when all the rows of a particular shift column of an employee have status as complete/done (i.e. 1)

Based on your data (where the criteria are 1, 2 and NULL for n/a), a simple 'group by' the employee, and MAX of the columns, should work e.g.,
SELECT
yt.EmpID,
yt.EmpName,
MAX(yt.A_Shift_Status) AS A_Shift_Status,
MAX(yt.B_Shift_Status) AS B_Shift_Status,
MAX(yt.C_Shift_Status) AS C_Shift_Status,
MAX(yt.D_Shift_Status) AS D_Shift_Status
FROM
yourtable yt
GROUP BY
yt.EmpID,
yt.EmpName;
For the shift statuses
If any of them are 2, it returns 2
otherwise if any of them are 1, it returns 1
otherwise it returns NULL
Notes re 1/2/3 (which was specified as criteria) vs 1/2/NULL (which is in the data)
It gets a little tricker if the inputs are supposed to use 1/2/3 instead of 1/2/NULL. Let us know if you are changing the inputs to reflect that.
If the input is fine as NULLs, but you need the output to have '3' for n/a (nulls), you can put an ISNULL or COALESCE around the MAX statements e.g., ISNULL(MAX(yt.A_Shift_Status), 3) AS A_Shift_Status

Related

Selecting rows of a table in which some special values exist

The first column of a table contains some Ids and the values in the other columns are the numbers corresponded to those Ids. Considering some special numbers, we want to select the rows that this special numbers are among the corresponded numbers to Ids. For example, let we have the following table and the special numbers are 3,5. We want to select the rows in which 2,5 are among the columns except Id:
| Id | corresponded numbers
|----|----------------------
| 1 | 2 | 3 | 5 |
| 2 | 1 | 5 |
| 3 | 1 | 2 | 4 | 5 | 7 |
| 4 | 3 | 5 | 6 |
Therefore, we want to have the following table as the result:
| Id | corresponded numbers
|----|----------------------
| 1 | 2 | 3 | 5 |
| 3 | 1 | 2 | 4 | 5 | 7 |
Would you please introduce me a function in Excel or a query in SQL to do the above selection?
SELECT id,
[corresponded numbers]
FROM TableName
WHERE (charIndex('2', [corresponded numbers]) > 0
AND charIndex('5', [corresponded numbers]) > 0)

Oracle: sql query for deleting duplicate rows based on a group

i need a SQL-Query to delete duplicates from a table. Lets start with my tables
rc_document: (there are more entries, this is just an example)
+----------------+-------------+----------------------+
| rc_document_id | document_id | rc_document_group_id |
+----------------+-------------+----------------------+
| 1 | 1 | 1 |
| 2 | 2 | 1 |
| 3 | 3 | 1 |
| 4 | 4 | 1 |
| 5 | 1 | 2 |
| 6 | 3 | 2 |
+----------------+-------------+----------------------+
(document_id can be exists in mulitple rc_document-group´s)
rc_document_group:
+----------------------+----------+
| rc_document_group_id | priority |
+----------------------+----------+
| 1 | 1 |
| 2 | 2 |
+----------------------+----------+
Each rc_document can be joined with the rc_document_group. In the rc_document_group is the priority for each rc_document.
I want to delete the rc_document rows with document_id which have not the highest priority in the rc_document_group. Because the document_id can be exists in multiple rc_document-group´s .. i just want to keep that one, with the highest priority.
here is my expected rc_document table after deleting duplicate document_id´s:
+----------------+-------------+----------------------+
| rc_document_id | document_id | rc_document_group_id |
+----------------+-------------+----------------------+
| 2 | 2 | 1 |
| 4 | 4 | 1 |
| 5 | 1 | 2 |
| 6 | 3 | 2 |
+----------------+-------------+----------------------+
the rc_document´s with rc_document_id 1 and 3 must be deleted, because there document_id 1 and 3 are in another rc_document_group with higher priority.
Im new in sql and i have no idea how to write these sql query ... thank for your help!!
First, you could join the two tables in order to get the corresponding priority on each row. After that, you could use the analytic function MAX() to get, for each row, the max priority within each group of document_id. At this point, you filter out the rows where the priority is not equal to the max priority in the group.
Try this query:
SELECT t.rc_document_id,
t.document_id,
t.rc_document_group_id
FROM (SELECT d.*,
g.priority,
MAX(g.priority) OVER(PARTITION BY document_id) max_priority
FROM rc_document d
INNER JOIN rc_document_group g
ON d.rc_document_group_id = g.rc_document_group_id) t
WHERE t.priority = t.max_priority

SQL moving aggregate SUM without partial results

Assume I have this schema (tested on postgresql) where the 'Scorelines' relation contains results of sport matches. (kickoff is a TIMESTAMP but replaced by INT for readability)
SQLFiddle here: http://sqlfiddle.com/#!12/52475/3
CREATE TABLE Scorelines (
team TEXT,
kickoff INT,
scored INT,
conceded INT
);
Now I want to produce another column 'three_matches_scored' that contains the sum of the points scored
over the 3 preceding game (determined by kickoff) of the same team. I have this:
SELECT team, kickoff, scored, conceded, SUM(scored) OVER three_matches AS three_matches_scored
FROM Scorelines
WINDOW three_matches AS
(PARTITION BY team ORDER BY kickoff
ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING)
ORDER BY kickoff;
This works beautifully so far, except that I get values starting from the second game. Example:
| TEAM | KICKOFF | SCORED | CONCEDED | THREE_MATCHES_SCORED |
|------|---------|--------|----------|----------------------|
| A | 1 | 1 | 0 | (null) |
| B | 2 | 1 | 1 | (null) |
| A | 3 | 1 | 1 | 1 |
| A | 4 | 3 | 0 | 2 |
| B | 4 | 1 | 4 | 1 |
| A | 6 | 0 | 2 | 5 |
| B | 6 | 4 | 2 | 2 |
| B | 8 | 1 | 2 | 6 |
| B | 10 | 1 | 1 | 6 |
| A | 11 | 2 | 1 | 4 |
I want the column 'three_matches_scored' to be (null) for the first 3 games because there are no 3 results to sum up. How can I achieve this?
I'd prefer simple understandable solutions, performance is not critical for this particular case.
My only idea right now, is to define a stored function SUM3, that results in (null) with less than 3 values to add up. But I never defined a function in SQL and can't seem to figure it out.
You can use a case statement to null the rows where there are less than 3 games:
SELECT team, kickoff, scored, conceded,
CASE WHEN COUNT(scored) OVER three_matches = 3
THEN SUM(scored) OVER three_matches
ELSE NULL
END AS three_matches_scored
FROM Scorelines
WINDOW three_matches AS
(PARTITION BY team ORDER BY kickoff
ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING)
ORDER BY kickoff;
Output:
team | kickoff | scored | conceded | three_matches_scored
------+---------+--------+----------+----------------------
A | 1 | 1 | 0 |
B | 2 | 1 | 1 |
A | 3 | 1 | 1 |
A | 4 | 3 | 0 |
B | 4 | 1 | 4 |
A | 6 | 0 | 2 | 5
B | 6 | 4 | 2 |
B | 8 | 1 | 2 | 6
B | 10 | 1 | 1 | 6
A | 11 | 2 | 1 | 4
(10 rows)
See harmics answer above.
(my first solution, just for reference)
Solution with user defined aggregate:
CREATE TYPE intermediate_sum AS (
sum INT,
count INT
);
CREATE FUNCTION sum_sfunc(intermediate_sum, INTEGER) RETURNS intermediate_sum AS
$$ SELECT $2 + $1.sum AS sum, $1.count - 1 AS count $$ LANGUAGE SQL;
CREATE FUNCTION sum_ffunc(intermediate_sum) RETURNS INTEGER AS
$$ SELECT (CASE WHEN $1.count > 1 THEN null
WHEN $1.count = 0 THEN $1.sum
END)
$$ LANGUAGE SQL;
CREATE AGGREGATE sum3(INTEGER) (
sfunc = sum_sfunc,
finalfunc = sum_ffunc,
stype = intermediate_sum,
initcond = '(0,3)'
);
The aggregate SUM3 wants at least 3 values, otherwise it returns (null). One can define other aggreates like SUM4 by changing the initcond, for example to '(0,4)'.

SQL 1toM join Query where criteria in Child isn't met are shown

I have a one to many table where I want to run a query and show records only for those records where in the child table a certain field has a certain value as well as seeing all those in the parent table that do not have this value.
Example of the Tables:
A (parent) B (child)
============ =============================
id | name pid | typeid | phone
------------ -----------------------------
1 | Alex 1 | 1 | 555-555-5555
2 | Bill 1 | 2 | 555-555-5556
3 | Cath 2 | 1 | 555-555-5557
4 | Dale 3 | 1 | 555-555-5558
5 | Evan 3 | 2 | 555-555-5559
6 | Steve 3 | 3 | 555-555-5561
7 | Henry 4 | 1 | 555-555-5562
8 | Paul 5 | 1 | 555-555-5563
6 | 1 | 555-555-5564
The result I'm hoping for is that I can get all those that have a phone number with the typeid of 2 as well as know which parents don't without having duplicate result in the child table. So the final desired output would be:
Desired Output (parent joined child)
==========================================
id | name | pid | typeid | phone
------------------------------------------
1 | Alex | 1 | 2 | 555-555-5556
2 | Bill | null | null | null
3 | Cath | 3 | 2 | 555-555-5559
4 | Dale | null | null | null
5 | Evan | null | null | null
6 | Steve | null | null | null
7 | Henry | null | null | null
8 | Paul | null | null | null
so far the query I've attempted looks like:
SELECT id, name, pid, typeid, phone
FROM
parent LEFT OUTER JOIN child
ON parent.id = child.pid
WHERE typeid = 2 or typeid is null
That will return a list where the id is 1,3,7,8 But since the id 2,4,5,6 already have an entry in the child table they don't evaluate to null since in theory they could be joined to a record, just not where typeid is 2.
Current Output (parent joined child)
==========================================
id | name | pid | typeid | phone
------------------------------------------
1 | Alex | 1 | 2 | 555-555-5556
3 | Cath | 3 | 2 | 555-555-5559
7 | Henry | null | null | null
8 | Paul | null | null | null
I was thinking the alternatives would be to do a union table, however even then I'm not sure how to test a join where an element (typeid = 2) doesn't exist without making two entries for id 3 pop up as well. So I guess I'm searching for a way to group by, but being selective about the data in the child table that gets populated by the groupby.
Move the type criteria into the join clause instead of the where clause.
SELECT id, name, pid, typeid, phone
FROM parent LEFT OUTER JOIN child
ON parent.id = child.pid and child.typeid = 2
This will select all the parents and left join children that meet your criteria.

Selecting several max() from a table

I will first say that the table structure is (unfortunately) set.
My goal is to select several max() from a query. Lets say I have the following tables
jobReferenceTable jobList
jobID | jobName | jobDepartment | listID | jobID |
_______|__________|_______________| _______|_________|
1 | dishes | cleaning | 1 | 1 |
2 |vacumming | cleaning | 2 | 5 |
3 | mopping | cleaning | 3 | 2 |
4 |countMoney| admin | 4 | 4 |
5 | hirePpl | admin | 5 | 1 |
6 | 2 |
7 | 3 |
8 | 3 |
9 | 1 |
10 | 5 |
Somehow, I would like to have a query that selects the jobID's from cleaning, and then shows the most recent jobList ID's for each job. I started a query below, and below that are what I'm hoping to get as results
query
SELECT jrt.jobName, jrt.jobDepartment
FROM jobReferenceTable
WHERE jobDepartment = 'cleaning'
JOIN jobList jl ON jr.jobID = jl.jobID
results
jobName | jobDepartment | listID |
________|_______________|________|
1 | cleaning | 9 |
2 | cleaning | 6 |
3 | cleaning | 8 |
Try this;
SELECT jrt.jobName, jrt.jobDepartment, MAX(jl.listID)
FROM jobReferenceTable AS jrt INNER JOIN jobList AS jl ON jrt.jobID = jl.jobID
WHERE jrt.jobDepartment = 'cleaning'
GROUP BY jrt.jobName, jrt.jobDepartment
So far as I can see, you need only the one MAX() - the listID.
MAX() is an aggregate function, meaning that the rest of your result set must then be 'grouped'.