Group by and check value for that group in hive/sql


I have a table with the data below (blank values in current are NULL).
id  start      current
1   today      True
2   yesterday  False
1   Monday     False
3   yesterday  True
3   Monday     False
4   today      NULL
4   Tuesday    NULL
5   Wednesday  True
6   Friday     NULL
6   Monday     NULL
7   Sunday     True
7   Tuesday    NULL
I want to count how many ids have only NULLs in the current column and print that count.
I thought of grouping by id and selecting the ids where current is null, but that does not give the right count. I want to count an id only if all of its rows have current set to NULL.

Try this: http://sqlfiddle.com/#!9/31f6e/12
select count(*)
from
(
  -- mt = 1 if the id has at least one non-NULL current value, else 0
  select id, max(case when current is not null then 1 else 0 end) mt
  from data
  group by id
) a
where mt = 0
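Here is a minimal way to check the corrected query locally. It is sketched in Python, with the standard-library sqlite3 module standing in for Hive. That is an assumption: SQLite is not Hive, but the query uses only GROUP BY, CASE and MAX. The column is written as "current" in double quotes because CURRENT is an SQLite keyword (it appears in CURRENT ROW). Hive quotes identifiers with backticks instead. The rows are the question's sample, with the blank values loaded as NULL.

```python
import sqlite3

# Question's sample rows; the blank "current" values are loaded as NULL (None).
ROWS = [
    (1, "today", "True"), (2, "yesterday", "False"), (1, "Monday", "False"),
    (3, "yesterday", "True"), (3, "Monday", "False"), (4, "today", None),
    (4, "Tuesday", None), (5, "Wednesday", "True"), (6, "Friday", None),
    (6, "Monday", None), (7, "Sunday", "True"), (7, "Tuesday", None),
]

# Grouped by id; mt = 1 when the id has at least one non-NULL value.
QUERY = """
SELECT COUNT(*)
FROM (
    SELECT id,
           MAX(CASE WHEN "current" IS NOT NULL THEN 1 ELSE 0 END) AS mt
    FROM data
    GROUP BY id
) AS a
WHERE mt = 0
"""


def count_all_null_ids() -> int:
    """Load the sample into an in-memory SQLite table and run QUERY."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute('CREATE TABLE data (id INTEGER, start TEXT, "current" TEXT)')
        conn.executemany("INSERT INTO data VALUES (?, ?, ?)", ROWS)
        return conn.execute(QUERY).fetchone()[0]
    finally:
        conn.close()


if __name__ == "__main__":
    print(count_all_null_ids())  # ids 4 and 6 have only NULLs
```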

First, find all the ids whose MAX(current) is NULL. MAX ignores NULLs, so it returns NULL only when every value in the group is NULL.
Then, simply count them.
Try the following query (works in MySQL):
SELECT COUNT(DISTINCT IF(derived_t.max_current IS NULL,
                         derived_t.id,
                         NULL)) AS ids_with_all_null
FROM (
  SELECT id, MAX(current) AS max_current
  FROM your_table
  GROUP BY id
) AS derived_t

You can use a NOT EXISTS clause for that: "Find the count of distinct ids that have no rows where current has a value other than NULL."
select count(distinct d.id)
from data d
where not exists (
select *
from data d2
where d2.id=d.id and d2.current is not null
)
See SQLFiddle
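The same sample can be used to sketch the NOT EXISTS answer in SQLite, via Python's sqlite3. SQLite is a stand-in engine here, not the one in the question. The block also runs a variant that does not come from the answer. Because COUNT(column) skips NULLs, HAVING COUNT("current") = 0 keeps exactly the ids whose values are all NULL. Both queries should count ids 4 and 6.

```python
import sqlite3

# Question's sample rows; blank "current" values are NULL (None).
ROWS = [
    (1, "today", "True"), (2, "yesterday", "False"), (1, "Monday", "False"),
    (3, "yesterday", "True"), (3, "Monday", "False"), (4, "today", None),
    (4, "Tuesday", None), (5, "Wednesday", "True"), (6, "Friday", None),
    (6, "Monday", None), (7, "Sunday", "True"), (7, "Tuesday", None),
]

# The answer's query: ids with no row where "current" is non-NULL.
NOT_EXISTS_QUERY = """
SELECT COUNT(DISTINCT d.id)
FROM data AS d
WHERE NOT EXISTS (
    SELECT 1
    FROM data AS d2
    WHERE d2.id = d.id AND d2."current" IS NOT NULL
)
"""

# Variant (not from the answer): COUNT(column) ignores NULLs, so 0 means all NULL.
HAVING_QUERY = """
SELECT COUNT(*)
FROM (SELECT id FROM data GROUP BY id HAVING COUNT("current") = 0) AS t
"""


def run_scalar(query: str) -> int:
    """Run a single-value query against a fresh in-memory copy of the sample."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute('CREATE TABLE data (id INTEGER, start TEXT, "current" TEXT)')
        conn.executemany("INSERT INTO data VALUES (?, ?, ?)", ROWS)
        return conn.execute(query).fetchone()[0]
    finally:
        conn.close()


if __name__ == "__main__":
    print(run_scalar(NOT_EXISTS_QUERY), run_scalar(HAVING_QUERY))
```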

Related

SQL Query getting the latest record of the Group and calculate the value of those particular records

I have the following table (just a sample) and would like to subtract the Points of Record1 from those of Record2 (Record2 - Record1), taking the latest record of each. The records are grouped by Match: one Match consists of two records, Record 1 and Record 2.
The output should be 3, as the newest records are IDs 3 and 4 from Match 2 (8 - 5 = 3).
ID  Name      Points  TimeRecorded  Match
1   Record 1  3       2-Mar 2pm     1
2   Record 2  5       2-Mar 2pm     1
3   Record 1  5       4-Mar 5pm     2
4   Record 2  8       4-Mar 5pm     2
I tried subtracting the results of two subqueries, as below, but I don't think this is a good approach because the Match and the record Names are hard-coded. How can I write a better query that gets the latest records of the grouped match and calculates the points by subtracting Record1 from Record2?
SELECT
(select Points from RunRecord where Name= 'Record2' AND Match = 2)
- (select Points from RunRecord where Name= 'Record1' AND Match = 2)

You could use:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY Name ORDER BY TimeRecorded DESC) rn
FROM yourTable
)
SELECT
MAX(CASE WHEN Name = 'Record 2' THEN Points END) -
MAX(CASE WHEN Name = 'Record 1' THEN Points END) AS diff
FROM cte
WHERE rn = 1;
The CTE assigns a row number for each group of records of the same name, with 1 being assigned to the most recent record. Then, we aggregate over the entire table and pivot out the points to find the difference.
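Here is a sketch of this answer run against the question's four rows in SQLite, through Python's sqlite3. It rests on a few assumptions:

- TimeRecorded is stored as an ISO-8601 string with a made-up year (2021), so that text order equals time order.
- The table is named RunRecord, as in the question's own query.
- "Match" is double-quoted because MATCH is an SQLite keyword.

```python
import sqlite3

# Question's rows; the year 2021 is a hypothetical value added so timestamps sort as text.
ROWS = [
    (1, "Record 1", 3, "2021-03-02 14:00", 1),
    (2, "Record 2", 5, "2021-03-02 14:00", 1),
    (3, "Record 1", 5, "2021-03-04 17:00", 2),
    (4, "Record 2", 8, "2021-03-04 17:00", 2),
]

# rn = 1 marks the latest row per Name; the outer query pivots and subtracts.
QUERY = """
WITH cte AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY Name ORDER BY TimeRecorded DESC) AS rn
    FROM RunRecord
)
SELECT MAX(CASE WHEN Name = 'Record 2' THEN Points END)
     - MAX(CASE WHEN Name = 'Record 1' THEN Points END) AS diff
FROM cte
WHERE rn = 1
"""


def latest_diff() -> int:
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            'CREATE TABLE RunRecord (ID INTEGER, Name TEXT, Points INTEGER, '
            'TimeRecorded TEXT, "Match" INTEGER)'
        )
        conn.executemany("INSERT INTO RunRecord VALUES (?, ?, ?, ?, ?)", ROWS)
        return conn.execute(QUERY).fetchone()[0]
    finally:
        conn.close()


if __name__ == "__main__":
    print(latest_diff())  # 8 - 5
```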

You can use the rank() window function to rank the records by match in descending order. Then keep the top-ranked records (those of the latest match) and use conditional aggregation to control the sign of each record's points.
SELECT sum(CASE x.name
WHEN 'Record2' THEN
x.points
WHEN 'Record1' THEN
-x.points
END)
FROM (SELECT rr.name,
rr.points,
rank() OVER (ORDER BY rr.match DESC) r
FROM runrecord rr
WHERE name IN ('Record1',
'Record2')) x
WHERE x.r = 1;
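The rank()-based answer can be sketched the same way, in SQLite via Python's sqlite3, with the same assumed ISO timestamps. One caveat applies. The question's table spells the names "Record 1" and "Record 2" with a space, while this answer (like the question's own query) uses 'Record1' and 'Record2'. The sketch uses the spaced spelling so the IN filter matches the sample rows.

```python
import sqlite3

# Question's rows; the year 2021 in TimeRecorded is a hypothetical value.
ROWS = [
    (1, "Record 1", 3, "2021-03-02 14:00", 1),
    (2, "Record 2", 5, "2021-03-02 14:00", 1),
    (3, "Record 1", 5, "2021-03-04 17:00", 2),
    (4, "Record 2", 8, "2021-03-04 17:00", 2),
]

# Rows of the highest Match get rank 1; Record 1 points are negated before summing.
QUERY = """
SELECT SUM(CASE x.Name
               WHEN 'Record 2' THEN x.Points
               WHEN 'Record 1' THEN -x.Points
           END)
FROM (
    SELECT rr.Name, rr.Points,
           RANK() OVER (ORDER BY rr."Match" DESC) AS r
    FROM RunRecord AS rr
    WHERE rr.Name IN ('Record 1', 'Record 2')
) AS x
WHERE x.r = 1
"""


def signed_sum_latest_match() -> int:
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            'CREATE TABLE RunRecord (ID INTEGER, Name TEXT, Points INTEGER, '
            'TimeRecorded TEXT, "Match" INTEGER)'
        )
        conn.executemany("INSERT INTO RunRecord VALUES (?, ?, ?, ?, ?)", ROWS)
        return conn.execute(QUERY).fetchone()[0]
    finally:
        conn.close()


if __name__ == "__main__":
    print(signed_sum_latest_match())
```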

How to use multiple counts in where clause to compare data of a table in sql?

I want to compare a table's data against its other records: for each group, the count of rows meeting a specific condition must equal the count of rows without that WHERE clause.
Below is the table:
-------------
id name time status
1 John 10 C
2 Alex 10 R
3 Dan 10 C
4 Tim 11 C
5 Tom 11 C
The output should be time = 11, because for that time the grouped count stays the same (2 rows either way) when a WHERE clause on status = 'C' is added, whereas for time = 10 it drops from 3 to 2.
SELECT q1.time
FROM (SELECT time,
             COUNT(id) AS cnt
      FROM table
      GROUP BY time) AS q1
     INNER JOIN (SELECT time,
                        COUNT(id) AS cnt
                 FROM table
                 WHERE status = 'C'
                 GROUP BY time) AS q2
             ON q1.time = q2.time
WHERE q1.cnt = q2.cnt
This gives the desired output, but is there a better, more efficient way to get the result?

Are you looking for this?
select t.*
from table t
where not exists (select 1 from table t1 where t1.time = t.time and t1.status <> 'C');
Alternatively, to return only the times, you can aggregate:
select time
from table t
group by time
having sum(case when status <> 'C' then 1 else 0 end) = 0;
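To compare both queries on the sample rows, here is a small SQLite sketch via Python's sqlite3. The table is named t because TABLE is a reserved word. SQLite's default comparison is case-sensitive. So the HAVING query with an upper-case 'C' returns time 11, but the same query with a lower-case 'c' treats every row as non-'C' and returns no times. The third query shows that.

```python
import sqlite3

# Sample rows from the question.
ROWS = [(1, "John", 10, "C"), (2, "Alex", 10, "R"), (3, "Dan", 10, "C"),
        (4, "Tim", 11, "C"), (5, "Tom", 11, "C")]

# Rows whose time group contains no status other than 'C'.
NOT_EXISTS_QUERY = """
SELECT t.id
FROM t
WHERE NOT EXISTS (SELECT 1 FROM t AS t1 WHERE t1.time = t.time AND t1.status <> 'C')
ORDER BY t.id
"""

# Times whose rows all have status 'C'.
HAVING_QUERY = """
SELECT time
FROM t
GROUP BY time
HAVING SUM(CASE WHEN status <> 'C' THEN 1 ELSE 0 END) = 0
"""

# Same as HAVING_QUERY but with a lower-case literal: every row counts as "not 'c'".
LOWERCASE_QUERY = HAVING_QUERY.replace("'C'", "'c'")


def run(query: str) -> list:
    """Return the first column of every result row."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE t (id INTEGER, name TEXT, time INTEGER, status TEXT)")
        conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", ROWS)
        return [row[0] for row in conn.execute(query)]
    finally:
        conn.close()


if __name__ == "__main__":
    print(run(NOT_EXISTS_QUERY), run(HAVING_QUERY), run(LOWERCASE_QUERY))
```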

If you want the times where the rows all satisfy the where clause, then in Postgres, you can express this as:
select time
from t
group by time
having count(*) = count(*) filter (where status = 'C');

If one value is null get previous value in that quarter, in sql select query

I have data as shown below.
Now I want to get the result as:
DateDisplayName Active
Q2(Jun)-2015 736
Q3(Sep)-2015 734
Q4(Dec)-2015 NULL
Q1(Mar)-2016 NULL
So if the last month's data in a quarter is null, get the data of the month before it.
E.g., in Q3, Active is null for Sep, so I should show the Aug data.

You'd rank your records: use ROW_NUMBER to give row number 1 to the best record per quarter (non-NULL values first, then the latest defaultdate), and keep only those rows.
select
date_display_name,
active
from
(
select
date_display_name,
active,
row_number() over
(
partition by date_display_name
order by
case when active is null then 2 else 1 end,
defaultdate desc
) as rn
from mytable
) ranked
where rn = 1;
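The question's input rows are not shown, so this sketch runs the answer's query on hypothetical monthly rows. The values are invented, chosen to be consistent with the desired output above. The query runs in SQLite through Python's sqlite3, and defaultdate is assumed to be an ISO date string so that DESC ordering is chronological.

```python
import sqlite3

# Hypothetical input rows (the real data is not shown in the question).
ROWS = [
    ("Q2(Jun)-2015", "2015-04-01", 730), ("Q2(Jun)-2015", "2015-05-01", 731),
    ("Q2(Jun)-2015", "2015-06-01", 736),
    ("Q3(Sep)-2015", "2015-07-01", 733), ("Q3(Sep)-2015", "2015-08-01", 734),
    ("Q3(Sep)-2015", "2015-09-01", None),
    ("Q4(Dec)-2015", "2015-10-01", None), ("Q4(Dec)-2015", "2015-11-01", None),
    ("Q4(Dec)-2015", "2015-12-01", None),
    ("Q1(Mar)-2016", "2016-01-01", None), ("Q1(Mar)-2016", "2016-02-01", None),
    ("Q1(Mar)-2016", "2016-03-01", None),
]

# Non-NULL values rank ahead of NULLs; ties are broken by the latest defaultdate.
QUERY = """
SELECT date_display_name, active
FROM (
    SELECT date_display_name, active,
           ROW_NUMBER() OVER (
               PARTITION BY date_display_name
               ORDER BY CASE WHEN active IS NULL THEN 2 ELSE 1 END,
                        defaultdate DESC
           ) AS rn
    FROM mytable
) AS ranked
WHERE rn = 1
"""


def best_per_quarter() -> dict:
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE mytable (date_display_name TEXT, defaultdate TEXT, active INTEGER)"
        )
        conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", ROWS)
        return dict(conn.execute(QUERY).fetchall())
    finally:
        conn.close()


if __name__ == "__main__":
    print(best_per_quarter())
```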

SQL Select with Group By and Order By Date

I am using SQL Server 2008, and I am wondering whether I can accomplish my query in one SELECT statement, without a subquery.
I want to set a variable to true if a field is true in each of the last 10 created records, and to false otherwise. The variable should also be false if there are fewer than 10 records in total.
My problem is that to get the latest 10 created records I need to order by date descending and filter on the top 10, so my query would look like the following, which is not valid:
declare @MyVar bit
set @MyVar = 0
select top(10) @MyVar = 1 from MyTable
where SomeId = 1000 and SomeFlag = 1
group by SomeId
having count(SomeId) >= 10
order by CreatedDate
Please provide me with your suggestions.
Here is an example. Say we have the following table, and I want to check the latest 3 records for each ID:
ID Joined CreatedDate
1 true 03/27/2013
1 false 03/26/2013
1 false 03/25/2013
1 true 03/24/2013
1 true 03/23/2013
2 true 03/22/2013
2 true 03/21/2013
2 true 03/20/2013
2 false 03/19/2013
3 true 03/18/2013
3 true 03/17/2013
For ID 1, the result will be FALSE, as the JOINED field is not true in all of the latest 3 created records.
For ID 2, the result will be TRUE, as the JOINED field is true in all of the latest 3 created records.
For ID 3, the result will be FALSE, as there must be at least 3 records to check.

(Answer given before the OP specified 2008. The query below only works on SQL Server 2012.)
This query gives (for each ID value) the number of rows in the last 10 for which flag is equal to 1. It should be simple enough (if required) to filter this to only rows for which the count is 10, and to restrict it to a single ID value.
Without better sample data, I'll leave it at that for now:
;with Vals as (
select
*,
ROW_NUMBER() OVER (PARTITION BY ID ORDER BY CreatedDate DESC) as rn,
SUM(CASE WHEN Flag = 1 THEN 1 ELSE 0 END)
OVER (PARTITION BY ID
ORDER BY CreatedDate ASC
ROWS BETWEEN 9 PRECEDING AND CURRENT ROW) as Cnt
from
T1
)
select * from Vals where rn = 1
(This does depend on the SQL Server 2012 version of the OVER clause - but you didn't specify which version)
Result:
ID Flag CreatedDate rn Cnt
----------- ----- ----------------------- -------------------- -----------
1 1 2012-01-12 00:00:00.000 1 10
2 1 2012-01-12 00:00:00.000 1 9
3 1 2012-01-12 00:00:00.000 1 6
(Only ID 1 meets your criteria)
Sample data:
create table T1 (ID int not null,Flag bit not null,CreatedDate datetime not null)
insert into T1 (ID,Flag,CreatedDate) values
(1,1,'20120101'),
(1,0,'20120102'),
(1,1,'20120103'),
(1,1,'20120104'),
(1,1,'20120105'),
(1,1,'20120106'),
(1,1,'20120107'),
(1,1,'20120108'),
(1,1,'20120109'),
(1,1,'20120110'),
(1,1,'20120111'),
(1,1,'20120112'),
(2,1,'20120101'),
(2,1,'20120102'),
(2,1,'20120103'),
(2,1,'20120104'),
(2,1,'20120105'),
(2,1,'20120106'),
(2,0,'20120107'),
(2,1,'20120108'),
(2,1,'20120109'),
(2,1,'20120110'),
(2,1,'20120111'),
(2,1,'20120112'),
(3,1,'20120107'),
(3,1,'20120108'),
(3,1,'20120109'),
(3,1,'20120110'),
(3,1,'20120111'),
(3,1,'20120112')
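SQLite (3.25 and later) also supports ROWS frames, so the 2012-style query can be checked against the sample data above with Python's sqlite3. Flag is stored as 0/1 integers because SQLite has no bit type, and the rows are generated to match the listed inserts. The latest row per ID should carry Cnt values 10, 9 and 6, as in the result table.

```python
import sqlite3

# Rebuild the sample: ID 1 has Flag 0 on Jan 2, ID 2 has Flag 0 on Jan 7,
# ID 3 has six rows (Jan 7 to Jan 12), all 1. 'YYYYMMDD' strings sort correctly.
ROWS = (
    [(1, 0 if day == 2 else 1, f"201201{day:02d}") for day in range(1, 13)]
    + [(2, 0 if day == 7 else 1, f"201201{day:02d}") for day in range(1, 13)]
    + [(3, 1, f"201201{day:02d}") for day in range(7, 13)]
)

# rn = 1 is the newest row; Cnt sums Flag over that row and the 9 rows before it.
QUERY = """
WITH Vals AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY ID ORDER BY CreatedDate DESC) AS rn,
           SUM(CASE WHEN Flag = 1 THEN 1 ELSE 0 END)
               OVER (PARTITION BY ID
                     ORDER BY CreatedDate ASC
                     ROWS BETWEEN 9 PRECEDING AND CURRENT ROW) AS Cnt
    FROM T1
)
SELECT ID, Cnt FROM Vals WHERE rn = 1 ORDER BY ID
"""


def last_ten_counts() -> dict:
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE T1 (ID INTEGER NOT NULL, Flag INTEGER NOT NULL, "
            "CreatedDate TEXT NOT NULL)"
        )
        conn.executemany("INSERT INTO T1 VALUES (?, ?, ?)", ROWS)
        return dict(conn.execute(QUERY).fetchall())
    finally:
        conn.close()


if __name__ == "__main__":
    print(last_ten_counts())
```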

In SQL Server 2008, instead of a subquery you can use a CTE with the ROW_NUMBER() ranking function:
;WITH cte AS
(
SELECT ID, CAST(Joined AS int) AS Flag,
ROW_NUMBER() OVER(PARTITION BY ID ORDER BY CreatedDate DESC) AS rn
FROM dbo.test63 t
)
SELECT ID, CASE WHEN SUM(Flag) != 3 THEN 0 ELSE 1 END AS Flag
FROM cte
WHERE rn <= 3
GROUP BY ID
Demo on SQLFiddle
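Here is a sketch of this answer on the question's 3-record example. It runs in SQLite via Python's sqlite3 instead of SQL Server 2008, which is an assumption, so a few details differ:

- The dbo. schema prefix is dropped.
- Joined is loaded as 1/0.
- The output column is aliased Result.
- CreatedDate is stored in ISO form so it sorts correctly as text.

The expected flags follow the question: 0 for ID 1, 1 for ID 2 and 0 for ID 3.

```python
import sqlite3

# Example table from the question (dates converted to ISO form).
ROWS = [
    (1, 1, "2013-03-27"), (1, 0, "2013-03-26"), (1, 0, "2013-03-25"),
    (1, 1, "2013-03-24"), (1, 1, "2013-03-23"),
    (2, 1, "2013-03-22"), (2, 1, "2013-03-21"), (2, 1, "2013-03-20"),
    (2, 0, "2013-03-19"),
    (3, 1, "2013-03-18"), (3, 1, "2013-03-17"),
]

# Keep the 3 newest rows per ID; the result is 1 only if all 3 exist and are true.
QUERY = """
WITH cte AS (
    SELECT ID, CAST(Joined AS INTEGER) AS Flag,
           ROW_NUMBER() OVER (PARTITION BY ID ORDER BY CreatedDate DESC) AS rn
    FROM test63
)
SELECT ID, CASE WHEN SUM(Flag) != 3 THEN 0 ELSE 1 END AS Result
FROM cte
WHERE rn <= 3
GROUP BY ID
"""


def latest_three_flags() -> dict:
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE test63 (ID INTEGER, Joined INTEGER, CreatedDate TEXT)")
        conn.executemany("INSERT INTO test63 VALUES (?, ?, ?)", ROWS)
        return dict(conn.execute(QUERY).fetchall())
    finally:
        conn.close()


if __name__ == "__main__":
    print(latest_three_flags())
```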

Find duplicates within a specific period

I have a table with the following structure:
ID Person LOG_TIME
-----------------------------------
1 1 2012-05-21 13:03:11.550
2 1 2012-05-22 13:09:37.050 <--- this is duplicate
3 1 2012-05-28 13:09:37.183
4 2 2012-05-20 15:09:37.230
5 2 2012-05-22 13:03:11.990 <--- this is duplicate
6 2 2012-05-24 04:04:13.222 <--- this is duplicate
7 2 2012-05-29 11:09:37.240
I have an application job that fills this table with data.
There is a business rule that each person should have only 1 record in every 7 days.
In the above example, records 2, 5 and 6 are considered duplicates, while 1, 3, 4 and 7 are OK.
I want a SQL query that checks whether there are records for the same person less than 7 days apart.

;WITH cte AS
(
SELECT ID, Person, LOG_TIME,
DATEDIFF(d, MIN(LOG_TIME) OVER (PARTITION BY Person), LOG_TIME) AS diff_date
FROM dbo.Log_time
)
SELECT *
FROM cte
WHERE diff_date BETWEEN 1 AND 6
Demo on SQLFiddle
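SQLite has no DATEDIFF, so this sketch of the MIN()-based answer (run through Python's sqlite3) takes the difference of julianday(date(...)) values instead. Like DATEDIFF(day, ...), that counts calendar-day boundaries and ignores the time of day. On the sample rows it flags IDs 2, 5 and 6. Note that the query measures every row against the person's first record only.

```python
import sqlite3

# Sample rows from the question: (ID, Person, LOG_TIME).
ROWS = [
    (1, 1, "2012-05-21 13:03:11.550"), (2, 1, "2012-05-22 13:09:37.050"),
    (3, 1, "2012-05-28 13:09:37.183"), (4, 2, "2012-05-20 15:09:37.230"),
    (5, 2, "2012-05-22 13:03:11.990"), (6, 2, "2012-05-24 04:04:13.222"),
    (7, 2, "2012-05-29 11:09:37.240"),
]

# first_time is each person's earliest log; rows 1 to 6 calendar days later are flagged.
QUERY = """
WITH cte AS (
    SELECT ID, Person, LOG_TIME,
           MIN(LOG_TIME) OVER (PARTITION BY Person) AS first_time
    FROM Log_time
)
SELECT ID
FROM cte
WHERE julianday(date(LOG_TIME)) - julianday(date(first_time)) BETWEEN 1 AND 6
ORDER BY ID
"""


def duplicates_vs_first() -> list:
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE Log_time (ID INTEGER, Person INTEGER, LOG_TIME TEXT)")
        conn.executemany("INSERT INTO Log_time VALUES (?, ?, ?)", ROWS)
        return [row[0] for row in conn.execute(QUERY)]
    finally:
        conn.close()


if __name__ == "__main__":
    print(duplicates_vs_first())
```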

Please see my attempt on SQLFiddle here.
You can use a join based on DATEDIFF() to find records which are logged less than 7 days apart:
WITH TooClose
AS
(
SELECT
a.ID AS BeforeID,
b.ID AS AfterID
FROM
Log a
INNER JOIN Log b ON a.Person = b.Person
AND a.LOG_TIME < b.LOG_TIME
AND DATEDIFF(DAY, a.LOG_TIME, b.LOG_TIME) < 7
)
However, this will include records which you don't consider "duplicates" (for instance, ID 3, because it is too close to ID 2). From what you've said, I'm inferring that a record isn't a "duplicate" if the record it is too close to is itself a "duplicate".
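So, to apply this rule and get the final list of duplicates, follow the CTE above with this SELECT (together they form a single statement). DISTINCT keeps an ID from being listed twice when it is too close to more than one earlier record:
SELECT DISTINCT
AfterID AS ID
FROM
TooClose
WHERE
BeforeID NOT IN (SELECT AfterID FROM TooClose)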
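The self-join version can be sketched in SQLite the same way, using Python's sqlite3 with julianday(date(...)) in place of DATEDIFF. Here the CTE and the final SELECT are a single statement. On the sample data the result is IDs 2, 5 and 6.

```python
import sqlite3

# Sample rows from the question: (ID, Person, LOG_TIME).
ROWS = [
    (1, 1, "2012-05-21 13:03:11.550"), (2, 1, "2012-05-22 13:09:37.050"),
    (3, 1, "2012-05-28 13:09:37.183"), (4, 2, "2012-05-20 15:09:37.230"),
    (5, 2, "2012-05-22 13:03:11.990"), (6, 2, "2012-05-24 04:04:13.222"),
    (7, 2, "2012-05-29 11:09:37.240"),
]

# TooClose pairs each record with later records of the same person < 7 calendar days on;
# a later record is a duplicate only if its earlier partner is not itself a duplicate.
QUERY = """
WITH TooClose AS (
    SELECT a.ID AS BeforeID, b.ID AS AfterID
    FROM Log AS a
    JOIN Log AS b
      ON a.Person = b.Person
     AND a.LOG_TIME < b.LOG_TIME
     AND julianday(date(b.LOG_TIME)) - julianday(date(a.LOG_TIME)) < 7
)
SELECT DISTINCT AfterID AS ID
FROM TooClose
WHERE BeforeID NOT IN (SELECT AfterID FROM TooClose)
ORDER BY ID
"""


def duplicates_self_join() -> list:
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE Log (ID INTEGER, Person INTEGER, LOG_TIME TEXT)")
        conn.executemany("INSERT INTO Log VALUES (?, ?, ?)", ROWS)
        return [row[0] for row in conn.execute(QUERY)]
    finally:
        conn.close()


if __name__ == "__main__":
    print(duplicates_self_join())
```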

Please take a look at this sample.
Reference: SQLFIDDLE
Query:
select person,
datediff(max(log_time),min(log_time)) as diff,
count(log_time)
from pers
group by person
;
select y.person, y.ct
from (
select person,
datediff(max(log_time),min(log_time)) as diff,
count(log_time) as ct
from pers
group by person) as y
where y.ct > 1
and y.diff <= 7
;
PERSON DIFF COUNT(LOG_TIME)
1 1 3
2 8 3
PERSON CT
1 3

declare @Count int
set @count=(
select COUNT(*)
from timeslot
where (( (TimeFrom<@Timefrom and TimeTo >@Timefrom)
or (TimeFrom<@Timeto and TimeTo >@Timeto))
or (TimeFrom=@Timefrom or TimeTo=@Timeto)))
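The fragment above counts timeslot rows that clash with the range @Timefrom to @Timeto. Its three conditions do not catch an existing slot that lies strictly inside the new one, for example an existing slot from 10 to 11 and a new request from 9 to 12. A common alternative is the two-comparison overlap test TimeFrom < @Timeto AND TimeTo > @Timefrom. The sketch below compares both predicates in SQLite via Python's sqlite3. The table contents and the integer hour values are hypothetical.

```python
import sqlite3

# The fragment's condition, with :tf / :tt standing in for @Timefrom / @Timeto.
ORIGINAL_PREDICATE = """
SELECT COUNT(*) FROM timeslot
WHERE ((TimeFrom < :tf AND TimeTo > :tf)
    OR (TimeFrom < :tt AND TimeTo > :tt))
   OR (TimeFrom = :tf OR TimeTo = :tt)
"""

# Standard overlap test: slots that merely touch at an endpoint do not count.
STANDARD_PREDICATE = """
SELECT COUNT(*) FROM timeslot
WHERE TimeFrom < :tt AND TimeTo > :tf
"""


def count_overlaps(query: str, existing: list, new_from: int, new_to: int) -> int:
    """Load hypothetical (TimeFrom, TimeTo) rows and count clashes with the new slot."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE timeslot (TimeFrom INTEGER, TimeTo INTEGER)")
        conn.executemany("INSERT INTO timeslot VALUES (?, ?)", existing)
        return conn.execute(query, {"tf": new_from, "tt": new_to}).fetchone()[0]
    finally:
        conn.close()


if __name__ == "__main__":
    # Existing 10-11 slot inside a new 9-12 request.
    print(count_overlaps(ORIGINAL_PREDICATE, [(10, 11)], 9, 12),
          count_overlaps(STANDARD_PREDICATE, [(10, 11)], 9, 12))
```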
