SQL Server - COUNT with GROUP BY in subquery - sql

I have been really struggling with this one! Essentially, I have been trying to use COUNT and GROUP BY within a subquery, but I keep getting errors such as the subquery returning more than one value, along with a whole host of others.
So, I have the following table:
start_date | ID_val | DIR | tsk | status|
-------------+------------+--------+-----+--------+
25-03-2015 | 001 | U | 28 | S |
27-03-2016 | 003 | D | 56 | S |
25-03-2015 | 004 | D | 56 | S |
25-03-2015 | 001 | U | 28 | S |
16-02-2016 | 002 | D | 56 | S |
25-03-2015 | 001 | U | 28 | S |
16-02-2016 | 002 | D | 56 | S |
16-02-2016 | 005 | NULL | 03 | S |
25-03-2015 | 001 | U | 17 | S |
16-02-2016 | 002 | D | 81 | S |
Ideally, I need to count, for each unique value of ID_val, the number of rows that have DIR = U with tsk = 28, or DIR = D with tsk = 56, and only those combinations.
For example, I was hoping to return the results below, if it's possible:
start_date | ID_val | no of times | status |
-------------+------------+---------------+--------+
25-03-2015 | 001 | 3 | S |
27-03-2016 | 003 | 1 | S |
25-03-2015 | 004 | 1 | S |
25-03-2015 | 002 | 3 | S |
I've managed to get the no of times on its own, but not as part of a table with the other values (subquery?).
Any advice is much appreciated!

This is a basic conditional aggregation:
select id_val,
sum(case when (dir = 'U' and tsk = 28) or (dir = 'D' and tsk = 56)
then 1 else 0
end) as NumTimes
from t
group by id_val;
I left out the other columns because your question focuses on id_val, dir, and tsk. The other columns seem unnecessary.

You want one result per ID_val, so you'd group by ID_val.
You want the minimum start date: min(start_date).
You want any status (as it is always the same): e.g. min(status) or max(status).
You want to count matches: count(case when <match> then 1 end).
select
min(start_date) as start_date,
id_val,
count(case when (dir = 'U' and tsk = 28) or (dir = 'D' and tsk = 56) then 1 end)
as no_of_times,
min(status) as status
from mytable
group by id_val;
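The conditional aggregation above can be tried on the question's sample rows. The following is only a minimal sketch: it uses Python's built-in sqlite3 module as a stand-in for SQL Server, the table name mytable is taken from the answer, and the dates are rewritten as ISO strings (an assumption) so that MIN() sorts them correctly.

```python
import sqlite3

# Sample rows from the question; dates rewritten as ISO text (assumption).
ROWS = [
    ("2015-03-25", "001", "U", 28, "S"),
    ("2016-03-27", "003", "D", 56, "S"),
    ("2015-03-25", "004", "D", 56, "S"),
    ("2015-03-25", "001", "U", 28, "S"),
    ("2016-02-16", "002", "D", 56, "S"),
    ("2015-03-25", "001", "U", 28, "S"),
    ("2016-02-16", "002", "D", 56, "S"),
    ("2016-02-16", "005", None, 3, "S"),
    ("2015-03-25", "001", "U", 17, "S"),
    ("2016-02-16", "002", "D", 81, "S"),
]

QUERY = """
SELECT MIN(start_date) AS start_date,
       id_val,
       -- COUNT ignores NULL, so only rows matching a wanted pair are counted
       COUNT(CASE WHEN (dir = 'U' AND tsk = 28) OR (dir = 'D' AND tsk = 56)
                  THEN 1 END) AS no_of_times,
       MIN(status) AS status
FROM mytable
GROUP BY id_val
ORDER BY id_val
"""


def count_matches():
    """Load the sample rows into an in-memory database and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mytable "
        "(start_date TEXT, id_val TEXT, dir TEXT, tsk INTEGER, status TEXT)"
    )
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?, ?)", ROWS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


result = count_matches()
for row in result:
    print(row)
```

On these rows the counts come out as 3 for 001, 2 for 002 (its third row is D/81, which is not a wanted pair), 1 each for 003 and 004, and 0 for 005. Adding a HAVING clause on the count would drop IDs with no match.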

Use COUNT with GROUP BY.
Query
select start_date, ID_val, count(ID_Val) as [no. of times], [status]
from your_table_name
where (tsk = 28 and DIR = 'U') or (tsk = 56 and DIR = 'D')
group by start_date, ID_val, [status]

So far, all the answers assume you are going to know the value pairs in advance and will require modification if these change or are added to. This solution makes no assumptions.
Table Creation
CREATE TABLE IDCounts
(
start_date date
, ID_val char(3)
, DIR nchar(1)
, tsk int
, status nchar(1)
)
INSERT IDCounts
VALUES
('2015-03-25','001','U' , 28,'S')
,('2016-03-27','003','D' , 56,'S')
,('2015-03-25','004','D' , 56,'S')
,('2015-03-25','001','U' , 28,'S')
,('2016-02-16','002','D' , 56,'S')
,('2015-03-25','001','U' , 28,'S')
,('2016-02-16','002','D' , 56,'S')
,('2016-02-16','005', NULL, 03,'S')
,('2015-03-25','001','U' , 17,'S')
,('2016-02-16','002','D' , 81,'S');
Code
SELECT Distinct i1.start_date, i1.ID_Val, i2.NumOfTimes, i1.status
from IDCounts i1
JOIN
(
select start_date, ID_val, isnull(DIR,N'')+cast(tsk as nvarchar) ValuePair, count(DIR+cast(tsk as nvarchar)) as NumOfTimes
from IDCounts
GROUP BY start_date, ID_val, isnull(DIR,N'')+cast(tsk as nvarchar)
) i2 on i2.start_date=i1.start_date
and i2.ID_val =i1.ID_val
and i2.ValuePair =isnull(i1.DIR,N'')+cast(i1.tsk as nvarchar)
order by i1.ID_val, i1.start_date;

Related

Best Hive SQL query for this

I have 2 tables, something like this. I'm running a Hive query, and window functions seem pretty limited in Hive.
Table dept
id | name |
1 | a |
2 | b |
3 | c |
4 | d |
Table time (built by a heavy query, so joining it a second time makes the process very slow):
id | date | first | last |
1 | 1992-01-01 | 1 | 1 |
2 | 1993-02-02 | 1 | 2 |
2 | 1993-03-03 | 2 | 1 |
3 | 1993-01-01 | 1 | 3 |
3 | 1994-01-01 | 2 | 2 |
3 | 1995-01-01 | 3 | 1 |
I need to retrieve something like this:
SELECT d.id,d.name,
t.date AS firstdate,
td.date AS lastdate
FROM dbo.dept d LEFT JOIN dbo.time t ON d.id=t.id AND t.first=1
LEFT JOIN time td ON d.id=td.id AND td.last=1
What is the most optimized way to do this?
This can be done with a single GROUP BY operation, which runs as a single map-reduce job:
select id
,max(name) as name
,max(case when first = 1 then `date` end) as firstdate
,max(case when last = 1 then `date` end) as lastdate
from (select id
,null as name
,`date`
,first
,last
from time
where first = 1
or last = 1
union all
select id
,name
,null as `date`
,null as first
,null as last
from dept
) t
group by id
;
+----+------+------------+------------+
| id | name | firstdate | lastdate |
+----+------+------------+------------+
| 1 | a | 1992-01-01 | 1992-01-01 |
| 2 | b | 1993-02-02 | 1993-03-03 |
| 3 | c | 1993-01-01 | 1995-01-01 |
| 4 | d | (null) | (null) |
+----+------+------------+------------+
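The UNION ALL plus GROUP BY shape can be checked outside Hive. This is a rough sketch using Python's sqlite3 module as a stand-in (so it says nothing about map-reduce job counts). The identifiers "time", "date", "first" and "last" are double-quoted here only as a precaution for SQLite.

```python
import sqlite3

SETUP = """
CREATE TABLE dept (id INTEGER, name TEXT);
CREATE TABLE "time" (id INTEGER, "date" TEXT, "first" INTEGER, "last" INTEGER);
INSERT INTO dept VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd');
INSERT INTO "time" VALUES
    (1, '1992-01-01', 1, 1),
    (2, '1993-02-02', 1, 2),
    (2, '1993-03-03', 2, 1),
    (3, '1993-01-01', 1, 3),
    (3, '1994-01-01', 2, 2),
    (3, '1995-01-01', 3, 1);
"""

QUERY = """
SELECT id,
       MAX(name) AS name,
       MAX(CASE WHEN "first" = 1 THEN "date" END) AS firstdate,
       MAX(CASE WHEN "last"  = 1 THEN "date" END) AS lastdate
FROM (
    -- rows from time carry the dates, with a NULL name
    SELECT id, NULL AS name, "date", "first", "last"
    FROM "time"
    WHERE "first" = 1 OR "last" = 1
    UNION ALL
    -- rows from dept carry the name, with NULL date columns
    SELECT id, name, NULL, NULL, NULL
    FROM dept
) t
GROUP BY id
ORDER BY id
"""


def first_last_dates():
    """Stack both tables into one row set, then collapse it per id."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


dates = first_last_dates()
for row in dates:
    print(row)
```

Because every dept row is part of the stacked input, department 4 survives the GROUP BY even though it has no rows in time.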
select d.id
,max(d.name) as name
,max(case when t.first = 1 then t.`date` end) as firstdate
,max(case when t.last = 1 then t.`date` end) as lastdate
from dept d left join
time t on d.id = t.id
-- filter in ON rather than WHERE, so departments without a match (id 4) are kept
and (t.first = 1 or t.last = 1)
group by d.id
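With the LEFT JOIN form, where the first/last filter is placed matters: a predicate on the right-hand table in WHERE rejects the NULL-extended rows, while the same predicate in ON keeps them. A small sketch of the difference, again using Python's sqlite3 as a stand-in for Hive, with the same sample data:

```python
import sqlite3

SETUP = """
CREATE TABLE dept (id INTEGER, name TEXT);
CREATE TABLE "time" (id INTEGER, "date" TEXT, "first" INTEGER, "last" INTEGER);
INSERT INTO dept VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd');
INSERT INTO "time" VALUES
    (1, '1992-01-01', 1, 1),
    (2, '1993-02-02', 1, 2),
    (2, '1993-03-03', 2, 1),
    (3, '1993-01-01', 1, 3),
    (3, '1994-01-01', 2, 2),
    (3, '1995-01-01', 3, 1);
"""

# Filter in WHERE: rows where t.* is NULL fail the predicate and vanish.
FILTER_IN_WHERE = """
SELECT d.id
FROM dept d LEFT JOIN "time" t ON d.id = t.id
WHERE t."first" = 1 OR t."last" = 1
GROUP BY d.id
ORDER BY d.id
"""

# Filter in ON: unmatched dept rows are kept with NULLs on the right side.
FILTER_IN_ON = """
SELECT d.id
FROM dept d LEFT JOIN "time" t
     ON d.id = t.id AND (t."first" = 1 OR t."last" = 1)
GROUP BY d.id
ORDER BY d.id
"""


def compare_filters():
    """Return the department ids produced by each variant."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    in_where = [r[0] for r in conn.execute(FILTER_IN_WHERE)]
    in_on = [r[0] for r in conn.execute(FILTER_IN_ON)]
    conn.close()
    return in_where, in_on


where_ids, on_ids = compare_filters()
print("WHERE:", where_ids)
print("ON:   ", on_ids)
```

The WHERE variant returns only departments 1 to 3; the ON variant also returns department 4, which is what the expected output asks for.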

How to merge two different rows(how to assign different value is zero)

I am trying to use UNION to merge two outputs, but their rows differ. Where a row exists in only one output, I need the missing values to be zero, as in the third (output) table. I have been struggling with this for the past two days, please help me.
Select t1.round,
t1.SC,
t1.ST,
t1.OTHERS,
t2.round_up,
t2.SC_up,
t2.ST_up,
t2.OTHERS_up
From
(Select round as round,
Sum (non_slsc_qty) as SC,
Sum (non_slst_qty) as ST,
Sum (non_slot_qty) as OTHERS
FROM vhn_issue
where (date between '2015-08-01' and '2015-08-31')AND
dvn_cd='15' AND phc_cd='012' AND hsc_cd='05' GROUP BY round) t1
,
(Select round as round_up,
Sum (non_slsc_qty) as SC_up,
Sum (non_slst_qty) as ST_up,
Sum (non_slot_qty) as OTHERS_up
FROM vhn_issue
where (date between '2015-04-01' and '2015-08-31')AND
dvn_cd='15' AND phc_cd='012' AND hsc_cd='05' GROUP BY round) t2
This is the first table result
+-----------------------------------+------------+--------+--------
| round | SC | ST | OTHERS |
+-----------------------------------+------------+--------+--------
| 1 | 20 | 30 | 50 |
| | | | |
| | | | |
+-----------------------------------+------------+--------+--------+
This is second table result
+-----------------------------------+------------+--------+----------
| round_up | SC_up | ST_up | OTHERS_up |
+-----------------------------------+------------+--------+-----------
| 1 | 21 | 31 | 51 |
| 3 | 10 | 5 | 2 |
| | | | |
+-----------------------------------+------------+--------+--------+---
I need output like this
+------------+--------+----------------------------------------------
| round_up | SC | ST |OTHERS | SC_up | ST_up |OTHERS_up |
+------------+--------+-----------------------------------------------
| 1 | 20 | 30 | 50 | 21 | 31 | 51 |
| | | | | | | |
| 3 | 0 | 0 | 0 | 10 | 5 | 2 |
+------------+--------+--------+---------------------------------------
You can use WITH queries (common table expressions) to wrap the two selects and a RIGHT JOIN to get the desired output. COALESCE is used to print 0 instead of NULL.
WITH a
AS (
SELECT round AS round
,Sum(non_slsc_qty) AS SC
,Sum(non_slst_qty) AS ST
,Sum(non_slot_qty) AS OTHERS
FROM vhn_issue
WHERE (
DATE BETWEEN '2015-08-01'
AND '2015-08-31'
)
AND dvn_cd = '15'
AND phc_cd = '012'
AND hsc_cd = '05'
GROUP BY round
)
,b
AS (
SELECT round AS round_up
,Sum(non_slsc_qty) AS SC_up
,Sum(non_slst_qty) AS ST_up
,Sum(non_slot_qty) AS OTHERS_up
FROM vhn_issue
WHERE (
DATE BETWEEN '2015-04-01'
AND '2015-08-31'
)
AND dvn_cd = '15'
AND phc_cd = '012'
AND hsc_cd = '05'
GROUP BY round
)
SELECT coalesce(b.round_up, 0) round_up
,coalesce(a.sc, 0) sc
,coalesce(a.st, 0) st
,coalesce(a.others, 0) others
,coalesce(b.sc_up, 0) sc_up
,coalesce(b.st_up, 0) st_up
,coalesce(b.others_up, 0) others_up
FROM a
RIGHT JOIN b ON a.round = b.round_up
WITH Results_CTE AS
(
Select t1.round as round_up ,
t1.SC as SC,
t1.ST as ST,
t1.OTHERS as OTHERS,
0 as SC_up,
0 as ST_up,
0 as OTHERS_up
from round t1
union all
Select t2.round_up as round_up ,
0 as SC,
0 as ST,
0 as OTHERS,
t2.SC_up,
t2.ST_up,
t2.OTHERS_up from round t2
)
select round_up , sum(SC) as SC, sum(ST) as ST, sum(OTHERS) as OTHERS, sum(SC_up) as SC_up, sum(ST_up) as ST_up, sum(OTHERS_up) as OTHERS_up
from Results_CTE group by round_up
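The join-plus-COALESCE idea can be checked on the two small result sets shown in the question. This is a minimal sketch with Python's sqlite3 as a stand-in database. The tables a and b are hypothetical and hold the already aggregated rows, and the join is written as b LEFT JOIN a, which is equivalent to a RIGHT JOIN b and also runs on SQLite versions that lack RIGHT JOIN.

```python
import sqlite3

SETUP = """
-- a: August totals, b: April-to-August totals (already aggregated)
CREATE TABLE a (round INTEGER, sc INTEGER, st INTEGER, others INTEGER);
CREATE TABLE b (round_up INTEGER, sc_up INTEGER, st_up INTEGER, others_up INTEGER);
INSERT INTO a VALUES (1, 20, 30, 50);
INSERT INTO b VALUES (1, 21, 31, 51), (3, 10, 5, 2);
"""

QUERY = """
SELECT b.round_up,
       COALESCE(a.sc, 0)     AS sc,      -- 0 when the round is missing from a
       COALESCE(a.st, 0)     AS st,
       COALESCE(a.others, 0) AS others,
       b.sc_up, b.st_up, b.others_up
FROM b
LEFT JOIN a ON a.round = b.round_up      -- keep every row of b
ORDER BY b.round_up
"""


def merge_rounds():
    """Join the two result sets and zero-fill the side that has no row."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


merged = merge_rounds()
for row in merged:
    print(row)
```

Round 3 exists only in the second result set, so its first three value columns come back as 0, matching the requested output.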

get the value from the previous row if row is NULL

I have this pivoted table
+---------+----------+----------+-----+----------+
| Date | Product1 | Product2 | ... | ProductN |
+---------+----------+----------+-----+----------+
| 7/1/15 | 5 | 2 | ... | 7 |
| 8/1/15 | 7 | 1 | ... | 9 |
| 9/1/15 | NULL | 7 | ... | NULL |
| 10/1/15 | 8 | NULL | ... | NULL |
| 11/1/15 | NULL | NULL | ... | NULL |
+---------+----------+----------+-----+----------+
I want to fill each NULL cell with the nearest non-NULL value above it in the same column. So, the output should be something like this.
+---------+----------+----------+-----+----------+
| Date | Product1 | Product2 | ... | ProductN |
+---------+----------+----------+-----+----------+
| 7/1/15 | 5 | 2 | ... | 7 |
| 8/1/15 | 7 | 1 | ... | 9 |
| 9/1/15 | 7 | 7 | ... | 9 |
| 10/1/15 | 8 | 7 | ... | 9 |
| 11/1/15 | 8 | 7 | ... | 9 |
+---------+----------+----------+-----+----------+
I've found this article that might help me, but it only handles one column. How do I apply this to all my columns, or how else can I achieve this result, given that my columns are dynamic?
Any help would be much appreciated. Thanks!
The ANSI standard has the IGNORE NULLS option on LAG(). This is exactly what you want. Alas, SQL Server has not (yet?) implemented this feature.
So, you can do this in several ways. One is using multiple outer applys. Another uses correlated subqueries:
select p.date,
(case when p.product1 is not null then p.product1
else (select top 1 p2.product1 from pivoted p2 where p2.date < p.date and p2.product1 is not null order by p2.date desc)
end) as product1,
(case when p.product2 is not null then p.product2
else (select top 1 p2.product2 from pivoted p2 where p2.date < p.date and p2.product2 is not null order by p2.date desc)
end) as product2,
. . .
from pivoted p ;
I would recommend an index on date for this query.
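The correlated-subquery approach can be sketched on a two-product version of the sample table. The example below uses Python's sqlite3 as a stand-in for SQL Server, so TOP 1 becomes LIMIT 1 and the CASE expression is written with COALESCE. The ISO dates are an assumption about what 7/1/15 and the other dates mean.

```python
import sqlite3

SETUP = """
CREATE TABLE pivoted (date TEXT, product1 INTEGER, product2 INTEGER);
INSERT INTO pivoted VALUES
    ('2015-07-01', 5,    2),
    ('2015-08-01', 7,    1),
    ('2015-09-01', NULL, 7),
    ('2015-10-01', 8,    NULL),
    ('2015-11-01', NULL, NULL);
"""

QUERY = """
SELECT p.date,
       -- keep the value if present, else take the latest earlier non-NULL one
       COALESCE(p.product1,
                (SELECT p2.product1 FROM pivoted p2
                 WHERE p2.date < p.date AND p2.product1 IS NOT NULL
                 ORDER BY p2.date DESC LIMIT 1)) AS product1,
       COALESCE(p.product2,
                (SELECT p2.product2 FROM pivoted p2
                 WHERE p2.date < p.date AND p2.product2 IS NOT NULL
                 ORDER BY p2.date DESC LIMIT 1)) AS product2
FROM pivoted p
ORDER BY p.date
"""


def fill_forward():
    """Fill each NULL with the previous non-NULL value in the same column."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


filled = fill_forward()
for row in filled:
    print(row)
```

The IS NOT NULL test inside each subquery is what lets a value carry across several consecutive NULL rows, as in the last row here. For dynamic columns, the per-column expression would have to be generated for each product column.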
I would like to suggest a solution. It works well if your table consists of just two columns.
+---------+----------+
| Date | Product |
+---------+----------+
| 7/1/15 | 5 |
| 8/1/15 | 7 |
| 9/1/15 | NULL |
| 10/1/15 | 8 |
| 11/1/15 | NULL |
+---------+----------+
select x.[Date],
case
when x.[Product] is null
then min(c.[Product])
else
x.[Product]
end as Product
from
(
-- this subquery evaluates a minimum distance to the rows where Product column contains a value
select [Date],
[Product],
min(case when delta >= 0 then delta else null end) delta_min,
max(case when delta < 0 then delta else null end) delta_max
from
(
-- this subquery maps Product table to itself and evaluates the difference between the dates
select p.[Date],
p.[Product],
DATEDIFF(dd, p.[Date], pnn.[Date]) delta
from #products p
cross join (select * from #products where [Product] is not null) pnn
) x
group by [Date], [Product]
) x
left join #products c on x.[Date] =
case
when abs(delta_min) < abs(delta_max) then DATEADD(dd, -delta_min, c.[Date])
else DATEADD(dd, -delta_max, c.[Date])
end
group by x.[Date], x.[Product]
order by x.[Date]
In this query I joined the table to the subset of its own rows that contain values, using a CROSS JOIN. Then I calculated the differences between the dates in order to pick the closest ones and fill the empty cells with their values.
Result:
+---------+----------+
| Date | Product |
+---------+----------+
| 7/1/15 | 5 |
| 8/1/15 | 7 |
| 9/1/15 | 7 |
| 10/1/15 | 8 |
| 11/1/15 | 8 |
+---------+----------+
Actually, the suggested query doesn't choose the previous value. Instead of this, it selects the closest value. In other words, my code can be used for a number of different purposes.
First you need to add an identity column to a temporary or permanent table; the problem can then be solved with the following method.
--- Solution ----
Create Table #Test (ID Int Identity (1,1),[Date] Date , Product_1 INT )
Insert Into #Test ([Date], Product_1)
Values
('7/1/15',5)
,('8/1/15',7)
,('9/1/15',Null)
,('10/1/15',8)
,('11/1/15',Null)
Select ID , DATE ,
IIF ( Product_1 is null ,
(Select Product_1 from #TEST
Where ID = (Select Top 1 a.ID From #TEST a where a.Product_1 is not null and a.ID<b.ID
Order By a.ID desc)
),Product_1) Product_1
from #Test b
-- Solution End ---

How to retrieved specific data from three different tables

I have 3 tables, each containing different information about students (e.g. personal details, course details, academic details), and the students fall into 4 different categories (SC, ST, OBC and Gen). I want to retrieve the student list according to their category and Plus2Percentage.
For example:
1. I want to retrieve 2 students from the SC category whose Plus2Percentage is >= 60, then
2. 2 students from the ST category whose Plus2Percentage is >= 65, then
3. 1 student from the OBC category whose Plus2Percentage is >= 60, then
4. 2 students from all 4 categories (SC, ST, OBC and Gen) whose Plus2Percentage is >= 70, but here I don't want to retrieve students who have already been retrieved (e.g. the two students from the SC category retrieved in the first step, and likewise for the ST and OBC categories).
[Table1]:
| Roll No | Applicant Name| Gender | Category | Father's Name |
|------------|---------------|------------|------------|----------------|
| 001 | A | M | SC | as |
| 002 | B | F | ST | hg |
| 003 | C | F | ST | yj |
| 004 | D | M | OBC | uy |
| 005 | E | F | SC | bn |
| 006 | F | M | OBC | kl |
| 007 | E | F | Gen | bn |
| 008 | F | M | OBC | vg |
| 009 | E | F | Gen | gh |
| 010 | F | M | SC | we |
|------------|---------------|------------|------------|----------------|
[Table2]:
| ID | Semester | Major | Applied Course|
|------------|---------------|------------|---------------|
| 001 | 1 | English | B.A |
| 002 | 1 | English | B.A |
| 003 | 1 | History | B.A |
| 004 | 1 | botany | B.Sc |
| 005 | 1 | Hindi | B.A |
| 006 | 1 | History | B.A |
| 007 | 1 | Maths | B.A |
| 008 | 1 | Hindi | B.A |
| 009 | 1 | History | B.A |
| 010 | 1 | Pol.Science| B.A |
|------------|---------------|------------|---------------|
[Table3]:
| ID |Plus2Percentage|
|------------|---------------|
| 001 | 60 |
| 002 | 65 |
| 003 | 70 |
| 004 | 73 |
| 005 | 87 |
| 006 | 91 |
| 007 | 59 |
| 008 | 78 |
| 009 | 88 |
| 010 | 57 |
|------------|---------------|-
[Output]:
| Roll No |Plus2Percentage| Category |
|------------|---------------|-----------|
| 005 | 87 | SC |
| 001 | 60 | SC |
| 003 | 70 | ST |
| 002 | 65 | ST |
| 006 | 91 | OBC |
| 009 | 88 | Gen |
| 008 | 78 | OBC |
|------------|---------------|-----------|
2 Students from SC Category whose Percentage is above or equal to 60%.
Roll No 005 and 001 from sc.
2 Students from ST Category whose Percentage is above or equal to 65%.
Roll No. 002 and 003 from st.
1 Students from OBC Category whose Percentage is above or equal to 60%.
Roll No. 006 from OBC
and
2 Students from all category whose percentage is above 70%, but excluding previously retrieved students.
Roll No. 009 and 008 from all over
Previously working code, from when I was retrieving data from 1 table instead of 3 tables:
WITH PRIMARY_CHOICE AS (
SELECT
RollNo,
ApplicantName,
FatherName,
Gender,
Major,
Category,
Plus2Percentage
FROM (
SELECT
RollNo,
ApplicantName,
FatherName,
Gender,
Semester,
Major,
AppliedCourse,
Category,
Plus2Percentage,
row_number() over (partition by Category, Semester, Major, AppliedCourse order by Plus2Percentage desc) as rn
FROM [College Management System].[dbo].[ApplicantPersonalDetail]
) as T
WHERE
rn <= CASE
WHEN Category='SC' AND Semester='1' AND AppliedCourse= 'B.A' AND Plus2Percentage >= '60' THEN '2'
WHEN Category='ST' AND Semester= '1' AND AppliedCourse= 'B.A' AND Plus2Percentage >= '65'THEN '2'
WHEN Category='OBC' AND Semester= '1' AND AppliedCourse= 'B.A' AND Plus2Percentage >= '60' THEN '1'
ELSE 0
END
)
SELECT
RollNo,
ApplicantName,
FatherName,
Gender,
Major,
Category,
Plus2Percentage
FROM PRIMARY_CHOICE
UNION ALL
SELECT
RollNo,
ApplicantName,
FatherName,
Gender,
Major,
Category,
Plus2Percentage
FROM (
SELECT
RollNo,
ApplicantName,
FatherName,
Gender,
Semester,
Major,
AppliedCourse,
Category,
Plus2Percentage,
row_number() over (partition by Semester, Major1, AppliedCourse order by Plus2Percentage desc) as rn
FROM [College Management System].[dbo].[ApplicantPersonalDetail] x
WHERE NOT EXISTS (
select 1 from primary_choice y
where x.RollNo = y.RollNo
)
) AS T2
WHERE
rn <= 2
AND Semester = @semester
AND AppliedCourse = 'B.A'
AND Plus2Percentage >= 70
order by Plus2Percentage desc
SELECT RollNo, Plus2Percentage, Category
FROM TABLE1 a
INNER JOIN Table3 b on a.rollno=b.id
WHERE a.category='SC' and b.Plus2Percentage>=60
That should be enough for the first three bullets. The fourth one is a matter of either setting up the predicate or encapsulating a union of the first three and then doing a NOT IN.
Although I do find it difficult to believe that someone who knows enough SQL to use ROW_NUMBER cannot troubleshoot a JOIN.
I have achieved the output, but the query cost is high. If you have lots of data being loaded into your database every minute, this query might not meet your execution-time requirements.
I have used a CTE and UNION clauses to generate your output.
I have not used your Table2 because the output needs no data from it and does not depend on it.
SQL SELECT code (replace the table and column names with your own):
EDIT AFTER THE COMMENTS: THE SOLUTION HAS BEEN CHANGED INTO A STORED PROCEDURE READING DATA FROM A TABLE VARIABLE
CREATE PROCEDURE usp_SelectCategorywiseData
AS
BEGIN
SET NOCOUNT ON;
DECLARE @tbl_LIST TABLE (RollNo int, [Plus2Percentage] int, Category varchar(10));
WITH CTE AS
(
SELECT A.RollNo, P.Percentage AS [Plus2Percentage], A.Category
, row_number() OVER (PARTITION BY A.Category ORDER BY P.Percentage DESC) AS Rank
FROM APPLICANT A INNER JOIN Plus2Percentage P ON A.RollNo = P.ID
)
INSERT INTO @tbl_LIST
SELECT RollNo, Plus2Percentage as [Plus2Percentage], Category
FROM CTE
WHERE rank <=
CASE
WHEN Category='SC' AND [Plus2Percentage] >= '60' THEN '2'
WHEN Category='ST' AND [Plus2Percentage] >= '60' THEN '2'
WHEN Category='OBC' AND [Plus2Percentage] >= '55' THEN '1'
ELSE 0
End
INSERT INTO @tbl_LIST
SELECT TOP 2 A.RollNo, P.Percentage as [Plus2Percentage], A.Category
FROM APPLICANT A INNER JOIN Plus2Percentage P ON A.RollNo = P.ID
WHERE Percentage > 70 and RollNo NOT IN (SELECT RollNo FROM @tbl_LIST) ORDER BY P.Percentage DESC
SELECT * FROM @tbl_LIST
END
What you need to do is create a temp table that holds the first 5 records for the SC, ST and OBC categories.
Then insert the TOP 2 records across all categories, ordered by percentage descending, where the RollNo is not already in the temp table, so that duplicates are excluded and you get the data.
ORDER BY SHOULD NOT BE USED, AS ORDERING THE DATA CAN BE DONE LATER; IT COSTS EXECUTION TIME.
I will explain the USE of all the clauses here:
CTE - used to identify the row number (Ranking the records based on Percentage and grouping on Category)
Using CTE to select the data with individual conditional checks for Category and min. percentage.
Using UNION to get only distinct records as the final output.
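The two-stage selection (per-category quota first, then an open round that skips students already picked) can be tried on the question's sample data. This is a sketch under assumptions: Python's sqlite3 stands in for SQL Server (window functions need SQLite 3.25 or later), the table names applicant and plus2 are hypothetical, and the thresholds are the ones stated in the question (60, 65, 60, then 70).

```python
import sqlite3

SETUP = """
CREATE TABLE applicant (roll_no TEXT, category TEXT);
CREATE TABLE plus2 (id TEXT, pct INTEGER);
INSERT INTO applicant VALUES
    ('001','SC'), ('002','ST'), ('003','ST'), ('004','OBC'), ('005','SC'),
    ('006','OBC'), ('007','Gen'), ('008','OBC'), ('009','Gen'), ('010','SC');
INSERT INTO plus2 VALUES
    ('001',60), ('002',65), ('003',70), ('004',73), ('005',87),
    ('006',91), ('007',59), ('008',78), ('009',88), ('010',57);
"""

QUERY = """
WITH ranked AS (
    -- rank students inside each category by percentage, best first
    SELECT a.roll_no, a.category, p.pct,
           ROW_NUMBER() OVER (PARTITION BY a.category
                              ORDER BY p.pct DESC) AS rn
    FROM applicant a JOIN plus2 p ON p.id = a.roll_no
),
primary_choice AS (
    -- per-category quota and minimum percentage
    SELECT roll_no, pct, category FROM ranked
    WHERE (category = 'SC'  AND pct >= 60 AND rn <= 2)
       OR (category = 'ST'  AND pct >= 65 AND rn <= 2)
       OR (category = 'OBC' AND pct >= 60 AND rn <= 1)
),
open_choice AS (
    -- best two of everyone not already selected, any category
    SELECT a.roll_no, p.pct, a.category
    FROM applicant a JOIN plus2 p ON p.id = a.roll_no
    WHERE p.pct >= 70
      AND a.roll_no NOT IN (SELECT roll_no FROM primary_choice)
    ORDER BY p.pct DESC
    LIMIT 2
)
SELECT roll_no, pct, category FROM primary_choice
UNION ALL
SELECT roll_no, pct, category FROM open_choice
"""


def select_students():
    """Run both selection stages and return the combined rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


selected = select_students()
for row in sorted(selected):
    print(row)
```

On the sample data this picks roll numbers 005 and 001 (SC), 003 and 002 (ST), 006 (OBC), and then 009 and 008 in the open round, the same seven students as the expected output in the question.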

PostgreSQL: how to combine multiple rows?

I have a table like this that stores the results of medical checkups, with the date each report was sent and its result. The date sent is actually based on the clinic visit date. A client can have one or more reports (the dates may vary).
---------------------------------------
| client_id | date_sent | result |
---------------------------------------
| 1 | 2001 | A |
| 1 | 2002 | B |
| 2 | 2002 | D |
| 3 | 2001 | A |
| 3 | 2003 | C |
| 3 | 2005 | E |
| 4 | 2002 | D |
| 4 | 2004 | E |
| 5 | 2004 | B |
---------------------------------------
I want to extract the following report from the above data.
---------------------------------------------------
| client_id | result1 | result2 | result3 |
---------------------------------------------------
| 1 | A | B | |
| 2 | D | | |
| 3 | A | C | E |
| 4 | D | E | |
| 5 | B | | |
---------------------------------------------------
I'm working on PostgreSQL. The "crosstab" function won't work here because "date_sent" is not consistent across clients.
Can anyone please give a rough idea how it should be queried?
I suggest the following approach:
SELECT client_id, array_agg(result ORDER BY date_sent) AS results
FROM labresults
GROUP BY client_id;
It's not exactly the same output format, but it will give you the same information much faster and cleaner.
If you want the results in separate columns, you can always do this:
SELECT client_id,
results[1] AS result1,
results[2] AS result2,
results[3] AS result3
FROM
(
SELECT client_id, array_agg(result ORDER BY date_sent) AS results
FROM labresults
GROUP BY client_id
) AS r
ORDER BY client_id;
although that will obviously introduce a hardcoded number of possible results.
While I was reading about "simulating row_number", I tried to figure out another way to do this.
SELECT client_id,
MAX( CASE seq WHEN 1 THEN result ELSE '' END ) AS result1,
MAX( CASE seq WHEN 2 THEN result ELSE '' END ) AS result2,
MAX( CASE seq WHEN 3 THEN result ELSE '' END ) AS result3,
MAX( CASE seq WHEN 4 THEN result ELSE '' END ) AS result4,
MAX( CASE seq WHEN 5 THEN result ELSE '' END ) AS result5
FROM ( SELECT p1.client_id,
p1.result,
( SELECT COUNT(*)
FROM labresults p2
WHERE p2.client_id = p1.client_id
AND p2.result <= p1.result )
FROM labresults p1
) D ( client_id, result, seq )
GROUP BY client_id;
but the query took 10 minutes (500,000+ ms) for 30,000 records. This is too long.
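As an alternative to the correlated COUNT(*) used to simulate a row number, the sequence can come from the ROW_NUMBER() window function ordered by date_sent, which PostgreSQL also provides. The sketch below is not the method of either answer above. It runs on Python's sqlite3 as a stand-in (SQLite 3.25 or later), uses the table name labresults from the answers, and hardcodes three result columns as an assumption.

```python
import sqlite3

SETUP = """
CREATE TABLE labresults (client_id INTEGER, date_sent INTEGER, result TEXT);
INSERT INTO labresults VALUES
    (1, 2001, 'A'), (1, 2002, 'B'),
    (2, 2002, 'D'),
    (3, 2001, 'A'), (3, 2003, 'C'), (3, 2005, 'E'),
    (4, 2002, 'D'), (4, 2004, 'E'),
    (5, 2004, 'B');
"""

QUERY = """
SELECT client_id,
       MAX(CASE WHEN rn = 1 THEN result END) AS result1,
       MAX(CASE WHEN rn = 2 THEN result END) AS result2,
       MAX(CASE WHEN rn = 3 THEN result END) AS result3
FROM (
    -- number each client's reports in the order they were sent
    SELECT client_id, result,
           ROW_NUMBER() OVER (PARTITION BY client_id
                              ORDER BY date_sent) AS rn
    FROM labresults
) ranked
GROUP BY client_id
ORDER BY client_id
"""


def pivot_results():
    """Turn each client's reports into result1..result3 columns."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


pivoted_rows = pivot_results()
for row in pivoted_rows:
    print(row)
```

The window function numbers the rows in one pass over the table instead of running a counting subquery per row. Missing results come back as NULL (None in Python) rather than empty strings.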