SQL query that finds repeated data in one column with different data in a second column - sql

Let's say I have data that looks like this:
ID Date Data
A D1 123
A D1 456
A D2 123
What I'm looking for is a select statement that will pull all rows where ID and Date are repeated as a pair, but the data doesn't match. In this case, it would return the top two rows, because ID A and Date D1 are repeated as a pair, but the Data is different. The third row would not be returned because the ID and Date combination are not repeated.

There are several ways to do this. Here's one option using exists:
select *
from yourtable t
where exists (
select 1
from yourtable t2
where t.id = t2.id and t.date = t2.date and t.data <> t2.data
)

SELECT d.Id ,
d.Date ,
d.Data
FROM YourTable d
INNER JOIN ( SELECT Id ,
Date
FROM YourTable
GROUP BY Id ,
Date
HAVING COUNT(*) > 1
) a ON a.Id = d.Id

Related

How to select groups which has only the values we want and not select it if it also has other values in SQL

id
code
value
A
cod
2
A
buy
34
A
cod
4
B
cod
44
B
F
23
C
thk
45
C
cod
33
C
F
31
D
cod
22
In this table for example, I want those groups of id which has 'code' column value as ONLY cod or F. so query should return values of id = B and nothing else. ( Not even values with id = C because id=C also has 'thk' in code , not even id= D, and output should have ids with ONLY the mentioned two values)
expected output
id
code
value
B
cod
44
B
F
23
You want all rows for the ID of which not exists a forbidden row:
select id, code, value
from mytable
where not exists
(
select null
from mytable forbidden_row
where forbidden_row.id = mytable.id
and forbidden_row.code not in ('cod', 'F')
);
One of approaches with nested query
SELECT ID,Code, value FROM (
select ID, Code,
(SELECT count(*) FROM TableA a where Code = 'cod' and a.ID = TableA.ID) Cod,
(SELECT count(*) FROM TableA a where Code = 'F' and a.ID = TableA.ID) F,
(SELECT count(*) FROM TableA a where Code not in ('F','cod') and a.ID = TableA.ID) Other,
Value
from TableA
) SOURCE
WHERE Cod <> 0 AND F <> 0 and Other = 0
We can achieve this using CTE. Check this,
-- Split the two record category first, then check cod Or F condition.
WITH Count2 AS (
SELECT id
FROM YourTable
GROUP BY id
HAVING COUNT(id) = 2
),
codORF AS (
SELECT id, code, COUNT(id) FROM YourTable T1
LEFT JOIN Count2 T2 On T1.id = T2.id
WHERE code = 'cod' OR code = 'F'
GROUP BY id, code
Having COUNT(id) = 1
)
-- Finally to take all values
SELECT T1.*
FROM YourTable T1
INNER JOIN codORF T2 ON T1.id = T2.id
with main as (
select *, count(id) over(partition by id order by id) as total_rows
from sample
), next_and_before as (
select *,
COALESCE(lag(code) over(partition by id order by id),lead(code) over(partition by id order by id)) as before_next
from main where total_rows <= 2
)
select * from next_and_before
where lower(trim(concat(code,before_next)))in('codf','fcod','cod','f')
Its a bit of hacky solution:
first you are filtering out all the rows that have less than or equal to 2 rows, since there could be cases where you only have one row per id with a code value = 'f' or 'cod', if you don't want that then simply change the last part to: in ('codf','fcod')
then out of two rows, you are looking at the next and before value and checking if it contains other than 'f' or 'cod'
where clause will filter those out if they exist
Test Results from the link below:
Results of sample data

How to compare two tables in Hive based on counts

I have below hive tables
Table_1
ID
1
1
2
Table_2
ID
1
2
2
I am comparing two tables based on count of ID in both tables, I need the output like below
ID
1 - 2records in table 1 and 1 record in Table 2
2 - one record in Table 1 and 2 records in table 2
Table_1 is parent table
i am using below query
select count(*),ID from Table_1 group by ID;
select count(*),ID from Table_2 group by ID;
Just do a full outer join on your queries with the on condition as X.id = Y.id, and then select * from the resultant table checking for nulls on either side.
Select id, concat(cnt1, " entries in table 1, ",cnt2, "entries in table 2") from (select * from (select count(*) as cnt1, id from table1 group by id) X full outer join (select count(*) as cnt2, id from table2 group by id)
on X.id=Y.id
)
Try This. You may use a case statement to check if it should be record / records etc.
SELECT m.id,
CONCAT (COALESCE(a.ct, 0), ' record in table 1, ', COALESCE(b.ct, 0),
' record in table 2')
FROM (SELECT id
FROM table_1
UNION
SELECT id
FROM table_2) m
LEFT JOIN (SELECT Count(*) AS ct,
id
FROM table_1
GROUP BY id) a
ON m.id = a.id
LEFT JOIN (SELECT Count(*) AS ct,
id
FROM table_2
GROUP BY id) b
ON m.id = b.id;
You could use this Python program to do a full comparison of 2 Hive tables:
https://github.com/bolcom/hive_compared_bq
If you want a quick comparison just based on counts, then pass the "--just-count" option (you can also specify the group by column with "--group-by-column").
The script also allows you to visually see all the differences on all rows and all columns if you want a complete validation.

Querying two tables to filter data using select case

I have two tables
Table 1 looks like this
ID Repeats
-----------
A 1
A 1
A 0
B 2
B 2
C 2
D 1
Table 2 looks like this
ID values
-----------
A 100
B 200
C 100
D 300
Using a view I need a result like this
ID values Repeats
-------------------
A 100 NA
B 200 2
C 100 2
D 300 1
that means, I want unique ID, its values and Repeats. Repeats value should display NA when there are multiple values against single ID and it should display the Repeats value in case there is single value for repeats.
Initially I needed to display the max value of repeats so I tried the following view
ALTER VIEW [dbo].[BookingView1]
AS
SELECT bv.*, bd2.Repeats FROM Table1 bv
JOIN
(
SELECT distinct bd.id, bd.Repeats FROM table2 bd
JOIN
(
SELECT Id, MAX(Repeats) AS MaxRepeatCount
FROM table2
GROUP BY Id
) bd1
ON bd.Id = bd1.Id
AND bd.Repeats = bd1.MaxRepeatCount
) bd2
ON bv.Id = bd2.Id;
and this returns the correct result but when trying to implement the CASE it fails to return unique ID results. Please help!!
One method uses outer apply:
select t2.*, t1.repeats
from table2 t2 outer apply
(select (case when max(repeats) = min(repeats) then max(repeats)
else 'NA'
end) as repeats
from table1 t1
where t1.id = t2.id
) t1;
Two notes:
This assumes that repeats is a string. If it is a number, you need to cast it to a string.
repeats is not null.
For the sake of completeness, I'm including another approach that will work if repeats is NULL. However, Gordon's answer has a much simpler query plan and should be preferred.
Option 1 (Works with NULLs):
SELECT
t1.ID, t2.[Values],
CASE
WHEN COUNT(*) > 1 THEN 'NA'
ELSE CAST(MAX(Repeats) AS VARCHAR(2))
END Repeats
FROM (
SELECT DISTINCT t1.ID, t1.Repeats
FROM #table1 t1
) t1
LEFT OUTER JOIN #table2 t2
ON t1.ID = t2.ID
GROUP BY t1.ID, t2.[Values]
Option 2 (does not contain explicit subqueries, but does not work with NULLs):
SELECT DISTINCT
t1.ID,
t2.[Values],
CASE
WHEN COUNT(t1.Repeats) OVER (PARTITION BY COUNT(DISTINCT t1.Repeats), t1.ID) > 1 THEN 'NA'
ELSE CAST(t1.Repeats AS VARCHAR(2))
END Repeats
FROM #table1 t1
LEFT OUTER JOIN #table2 t2
ON t1.ID = t2.ID
GROUP BY t1.ID, t2.[Values], t1.Repeats
NOTE:
This may not give desired results if table2 has different values for the same ID.

SQL Server Return Rows Where Field Changed

I have a table with 3 values.
ID AuditDateTime UpdateType
12 12-15-2015 18:09 1
45 12-04-2015 17:41 0
75 12-21-2015 04:26 0
12 12-17-2015 07:43 0
35 12-01-2015 05:36 1
45 12-15-2015 04:35 0
I'm trying to return only records where the UpdateType has changed from AuditDateTime based on the IDs. So in this example, ID 12 changes from the 12-15 entry to the 12-17 entry. I would want that record returned. There will be multiple instances of ID 12, and I need all records returned where an ID's UpdateType has changed from its previous entry. I tried adding a row_number but it didn't insert sequentially because the records are not in the table in order. I've done a ton of searching with no luck. Any help would be greatly appreciated.
By using a CTE it is possible to find the previous record based upon the order of the AuditDateTime
WITH CTEData AS
(SELECT ROW_NUMBER() OVER (PARTITION BY ID ORDER BY AuditDateTime) [ROWNUM], *
FROM #tmpTable)
SELECT A.ID, A.AuditDateTime, A.UpdateType
FROM CTEData A INNER JOIN CTEData B
ON (A.ROWNUM - 1) = B.ROWNUM AND
A.ID = B.ID
WHERE A.UpdateType <> B.UpdateType
The Inner Join back onto the CTE will give in one query both the current record (Table Alias A) and previous row (Table Alias B).
This should do what you're trying to do I believe
SELECT
T1.ID,
T1.AuditDateTime,
T1.UpdateType
FROM
dbo.My_Table T1
INNER JOIN dbo.My_Table T2 ON
T2.ID = T1.ID AND
T2.UpdateType <> T1.UpdateType AND
T2.AuditDateTime < T1.AuditDateTime
LEFT OUTER JOIN dbo.My_Table T3 ON
T3.ID = T1.ID AND
T3.AuditDateTime < T1.AuditDateTime AND
T3.AuditDateTime > T2.AuditDateTime
WHERE
T3.ID IS NULL
Alternatively:
SELECT
T1.ID,
T1.AuditDateTime,
T1.UpdateType
FROM
dbo.My_Table T1
INNER JOIN dbo.My_Table T2 ON
T2.ID = T1.ID AND
T2.UpdateType <> T1.UpdateType AND
T2.AuditDateTime < T1.AuditDateTime
WHERE
NOT EXISTS
(
SELECT *
FROM
dbo.My_Table T3
WHERE
T3.ID = T1.ID AND
T3.AuditDateTime < T1.AuditDateTime AND
T3.AuditDateTime > T2.AuditDateTime
)
The basic gist of both queries is that you're looking for rows where an earlier row had a different type and no other rows exist between the two rows (hence, they're sequential). Both queries are logically identical, but might have differing performance.
Also, these queries assume that no two rows will have identical audit times. If that's not the case then you'll need to define what you expect to get when that happens.
You can use the lag() window function to find the previous value for the same ID. Now you can pick only those rows that introduce a change:
select *
from (
select lag(UpdateType) over (
partition by ID
order by AuditDateTime) as prev_updatetype
, *
from YourTable
) sub
where prev_updatetype <> updatetype
Example at SQL Fiddle.

SQL - Query to return result

There is a table with Columns as below:
Id : long autoincrement;
timestamp:long;
price:long
Timestamp is given as a unix_time in ms.
Question: what is the average time difference between the records ?
First thought is a sub-query grabbing the record immediately previous:
SELECT timestamp -
(select top 1 timestamp from Table T1 where T1.Id < Table.Id order by Id desc)
FROM Table
Then you can take the average of that:
SELECT AVG(delta)
from (SELECT timestamp -
(select top 1 timestamp from Table T1 where T1.Id < Table.Id order by Id desc) as delta
FROM Table) T
There will probably need to be some handling of the null that results for the first row, but I haven't tested to be sure.
In SQL Server, you could write something like that to get that information:
SELECT
t1.ID, t2.ID,
DATEDIFF(MILLISECOND, t2.PriceTime, test2.PriceTime)
FROM table t1
INNER JOIN table t2 ON t2.ID = t1.ID-1
WHERE t1.ID > (SELECT MIN(ID) FROM table)
and if you're only interested in the AVG across all entries, you could use:
SELECT
AVG(DATEDIFF(MILLISECOND, t2.PriceTime, test2.PriceTime))
FROM table t1
INNER JOIN table t2 ON t2.ID = t1.ID-1
WHERE t1.ID > (SELECT MIN(ID) FROM table)
Basically, you need to join the table with itself, and use "t1.ID = t2.ID-1" to associate item no. 2 in one table with item no. 1 in the other table and then calculate the time difference between the two. In order to avoid accessing item no. 0 which doesn't exist, use the "T1.ID > (SELECT MIN(ID) FROM table)" clause to start from the second item.
Marc
At a guess:
SELECT AVG(timestamp)
I think you need to provide more information in your question for us to help.
If you mean difference between each-other row:
select AVG(x) from (
select a.timestamp - b.timestamp as x
from table a, table b -- this multiplies a*b ) sub
SELECT AVG(T2.Timestamp - T1.TimeStamp)
FROM Table T1
JOIN Table T2 ON T2.ID = T1.ID + 1
try this
Select Avg(E.Timestamp - B.Timestamp)
From Table B Join Table E
On E.Timestamp =
(Select Max(Timestamp)
From Table
Where Timestamp < R.Timestamp)