SQL: Compare rows in the same table

I'm trying to compare rows in a single table
and figure out whether "addr" and "zip" under the same id are the same or different.
id | addr | zip
------+----------+----------
1 | 123 | 0000
1 | 123 | 0000
1 | 123 | 0001
2 | 222 | 1000
2 | 221 | 1000
So the result should say id 1 has valid addr and invalid zip
id 2 has invalid addr and valid zip.
Any hint will be appreciated! Thank you!!

The query...
SELECT id, COUNT(DISTINCT addr), COUNT(DISTINCT zip)
FROM YOUR_TABLE
GROUP BY id
...should give the following result on your example data...
1, 1, 2
2, 2, 1
Any count greater than 1 indicates an "invalid" item.
If you want to actually filter on this, you can use a HAVING clause, for example:
SELECT id, COUNT(DISTINCT addr) ADDR_COUNT, COUNT(DISTINCT zip) ZIP_COUNT
FROM YOUR_TABLE
GROUP BY id
HAVING COUNT(DISTINCT addr) > 1 OR COUNT(DISTINCT zip) > 1
May I suggest that if you don't actually want this kind of "mismatched" data in your database, redesign your data model so duplicates cannot happen in the first place. No duplicates, no mismatches!
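To try the query above without a database server, here is a minimal sketch that runs it against the sample rows in an in-memory SQLite database from Python. The table name addr_zip is a placeholder chosen for this sketch, and the aggregates are repeated in HAVING because not every engine accepts column aliases there.

```python
import sqlite3

# In-memory database holding the sample rows from the question.
# The table name addr_zip is a placeholder, not from the original post.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE addr_zip (id INTEGER, addr TEXT, zip TEXT)")
conn.executemany(
    "INSERT INTO addr_zip VALUES (?, ?, ?)",
    [
        (1, "123", "0000"),
        (1, "123", "0000"),
        (1, "123", "0001"),
        (2, "222", "1000"),
        (2, "221", "1000"),
    ],
)

# Count distinct values per id and keep only ids with more than one
# distinct addr or zip.
rows = conn.execute(
    """
    SELECT id,
           COUNT(DISTINCT addr) AS addr_count,
           COUNT(DISTINCT zip)  AS zip_count
    FROM addr_zip
    GROUP BY id
    HAVING COUNT(DISTINCT addr) > 1 OR COUNT(DISTINCT zip) > 1
    ORDER BY id
    """
).fetchall()

for row in rows:
    print(row)
```

Both ids come back here because each has one inconsistent column; an id whose rows all agree would be filtered out by the HAVING clause.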

Group by id and select id, COUNT(DISTINCT addr) and COUNT(DISTINCT zip).
Keep only the groups where the number of distinct addresses or zips is greater than 1.
This gives you the ids whose duplicate rows are inconsistent.
Example:
SELECT id, COUNT(DISTINCT addr) nAddr, COUNT(DISTINCT zip) nZip
FROM [mytable]
GROUP BY id
HAVING COUNT(DISTINCT addr) > 1 OR COUNT(DISTINCT zip) > 1

SELECT id
, CASE s.addrcount
WHEN 1 THEN 'valid'
ELSE 'invalid' END as addrok
, CASE s.zipcount
WHEN 1 THEN 'valid'
ELSE 'invalid' END as zipok
FROM
(
SELECT id
, count(distinct addr) as addrcount
, count(distinct zip) as zipcount
FROM table1
GROUP BY id
) as s
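As a quick check of the CASE-based labelling above, this sketch runs the same query shape against the question's sample rows in an in-memory SQLite database. It assumes no NULL values: COUNT(DISTINCT ...) ignores NULLs, so a group whose values are all NULL would get a count of 0 and be labelled 'invalid'.

```python
import sqlite3

# Sample rows from the question, loaded into an in-memory table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER, addr TEXT, zip TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [
        (1, "123", "0000"),
        (1, "123", "0000"),
        (1, "123", "0001"),
        (2, "222", "1000"),
        (2, "221", "1000"),
    ],
)

# The inner query counts distinct values per id; the outer query turns
# each count into a 'valid' / 'invalid' label.
labels = conn.execute(
    """
    SELECT id,
           CASE s.addrcount WHEN 1 THEN 'valid' ELSE 'invalid' END AS addrok,
           CASE s.zipcount  WHEN 1 THEN 'valid' ELSE 'invalid' END AS zipok
    FROM (
        SELECT id,
               COUNT(DISTINCT addr) AS addrcount,
               COUNT(DISTINCT zip)  AS zipcount
        FROM table1
        GROUP BY id
    ) AS s
    ORDER BY id
    """
).fetchall()

for row in labels:
    print(row)
```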

Related

How to set an incrementing flag column for related rows?

I am trying to create a flag column called "Related" to use in reporting to highlight specific rows that are related based on the ID column (1 = related, NULL = not related). The original table "table1" looks like below:
Name ID Related
--------------------------------
Jack 101 NULL
John 101 NULL
Pat 105 NULL
Ben 106 NULL
Jordan 106 NULL
George 300 NULL
Alan 500 NULL
Bill 200 NULL
Bob 200 NULL
I then used this UPDATE statement below:
UPDATE a
SET Related = 1
FROM table1 a
JOIN (SELECT ID FROM table1 GROUP BY ID HAVING COUNT(*) > 1) b
ON a.ID = b.ID
Below is the result of this update statement:
Name ID Related
--------------------------------
Jack 101 1
John 101 1
Pat 105 NULL
Ben 106 1
Jordan 106 1
George 300 NULL
Alan 500 NULL
Bill 200 1
Bob 200 1
This gets me close, but instead of assigning the number 1 to every related row, I need the number to increment for each set of related rows, based on their distinct ID values.
Desired result:
Name ID Related
--------------------------------
Jack 101 1
John 101 1
Pat 105 NULL
Ben 106 2
Jordan 106 2
George 300 NULL
Alan 500 NULL
Bill 200 3
Bob 200 3
This is a possible solution using dense_rank to number your related values and an updateable CTE
with r as (
select id
from t
group by id having Count(*) > 1
),
n as (
select t.id, t.related, Dense_Rank() over (order by r.id) r
from r
join t on t.id = r.id
)
update n set related = r
You can do this without a self-join, just using window functions in a CTE, and updating the CTE directly:
WITH tCounted AS (
SELECT
t.id,
t.related,
c = COUNT(*) OVER (PARTITION BY t.id)
FROM t
),
tWithRelated AS (
SELECT
id,
related,
rn = DENSE_RANK() OVER (ORDER BY id)
FROM tCounted
WHERE c > 1
)
UPDATE tWithRelated
SET related = rn;
Use an updateable CTE - comments explain the logic.
with cte1 as (
select [Name], ID, Related
-- Get the count within the id partition, less 1 as specified
, count(*) over (partition by id) - 1 cnt
-- Get the row number within the id partition
, row_number() over (partition by id order by id) rn
from #Test
), cte2 as (
select [Name], ID, Related, cnt, rn
-- Add 1 *only* if the count is > 0 *and* it's the first row in the id partition
, case when cnt > 0 then sum(case when cnt > 0 and rn = 1 then 1 else 0 end) over (order by id) else null end NewRelated
from cte1
)
update cte2 set Related = NewRelated;
This doesn't assume Related is already null and works for more than 2 rows for any given ID.
It does assume that one can order by the ID column - even though the data provided doesn't do that.
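The answers above rely on SQL Server's updatable CTEs. As a rough, engine-neutral illustration of the same DENSE_RANK idea, here is a sketch using SQLite from Python. SQLite cannot update a CTE directly, so the sketch swaps in a correlated subquery for the update step; it also assumes a SQLite build with window function support (3.25 or later).

```python
import sqlite3

# Question's sample data; Related starts out NULL for every row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (Name TEXT, ID INTEGER, Related INTEGER)")
conn.executemany(
    "INSERT INTO table1 (Name, ID) VALUES (?, ?)",
    [
        ("Jack", 101), ("John", 101), ("Pat", 105),
        ("Ben", 106), ("Jordan", 106), ("George", 300),
        ("Alan", 500), ("Bill", 200), ("Bob", 200),
    ],
)

# 1. Find IDs that occur more than once.
# 2. Number those IDs with DENSE_RANK, ordered by ID.
# 3. Copy the rank into Related; IDs with no match get NULL.
conn.execute(
    """
    WITH ranked AS (
        SELECT d.ID, DENSE_RANK() OVER (ORDER BY d.ID) AS rn
        FROM (SELECT ID FROM table1 GROUP BY ID HAVING COUNT(*) > 1) AS d
    )
    UPDATE table1
    SET Related = (SELECT rn FROM ranked WHERE ranked.ID = table1.ID)
    """
)

result = conn.execute(
    "SELECT Name, ID, Related FROM table1 ORDER BY rowid"
).fetchall()
for row in result:
    print(row)
```

Because the ranks follow ID order, the numbering depends on the ID values and not on the order in which the rows were inserted.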

Return Max ID SQL

I have a table with several rows that share the same account_id, as shown below:
id account_id id_intern Name Active mobile_no landline_no email
1 0011abs 66654 A yes 098937888 098937888 a@gmail.com
2 0011abs 66655 B yes 098937666 098937666 b@gmail.com
3 0011abs 66656 C no 098937777 098937777 c@gmail.com
4 0011abs 66657 D yes 098937666 d@gmail.com
5 0011abs 66658 E yes 098937111 e@gmail.com
6 0011abs 66659 F yes 098937111 098937665
I am looking for a script that returns just one row for each account_id in the table, chosen by the following conditions:
For an account_id with several id_intern:
consider those lines with the status active 'yes',
then those lines with entered mobile_no (not empty),
then those lines with entered landline_no (not empty),
then those lines with entered email (not empty),
then if still we have several lines (in this case for users with name A and B)
then we will consider the line with max id_intern.
Expected Result :
id account_id id_intern Name active mobile_no landline_no email
2 0011abs 66655 B yes 098937666 098937666 b@gmail.com
I tried writing the script, but I am unable to fulfil these conditions.
Thanks in advance for your help.
Use row_number() window function:
select id, account_id, id_intern, Name, Active, mobile_no, landline_no, email
from (
select *, row_number() over (partition by account_id order by id_intern desc) rn
from tablename
where Active = 'yes'
and mobile_no is not null and landline_no is not null and email is not null
) t
where rn = 1
If you want 1 row for each account_id even if not all the conditions are satisfied, then you must use a conditional ORDER BY clause:
select id, account_id, id_intern, Name, Active, mobile_no, landline_no, email
from (
select *,
row_number() over (
partition by account_id
order by
case when Active = 'yes' then 1 else 2 end,
case when mobile_no is not null then 1 else 2 end,
case when landline_no is not null then 1 else 2 end,
case when email is not null then 1 else 2 end,
id_intern desc
) rn
from tablename
) t
where rn = 1
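Here is a small sketch of the conditional ORDER BY approach, run against SQLite from Python. The rows are made up for illustration (the example.com addresses and the choice of which fields are blank are assumptions, not the asker's data). Since the question says "not empty" and blanks might be stored as '' instead of NULL, the sketch wraps each column in NULLIF(col, '') so both cases sort last. It assumes a SQLite build with window functions (3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE tablename (
        id INTEGER, account_id TEXT, id_intern INTEGER, Name TEXT,
        Active TEXT, mobile_no TEXT, landline_no TEXT, email TEXT
    )
    """
)
# Hypothetical rows: A and B are complete, C is inactive,
# D has an empty-string landline, F has a NULL email.
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "0011abs", 66654, "A", "yes", "098937888", "098937888", "a@example.com"),
        (2, "0011abs", 66655, "B", "yes", "098937666", "098937666", "b@example.com"),
        (3, "0011abs", 66656, "C", "no", "098937777", "098937777", "c@example.com"),
        (4, "0011abs", 66657, "D", "yes", "098937666", "", "d@example.com"),
        (5, "0011abs", 66659, "F", "yes", "098937111", "098937665", None),
    ],
)

# Each CASE pushes rows that fail a condition behind rows that pass it;
# id_intern DESC breaks any remaining tie.
best = conn.execute(
    """
    SELECT id, account_id, id_intern, Name
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (
                   PARTITION BY account_id
                   ORDER BY
                       CASE WHEN Active = 'yes' THEN 1 ELSE 2 END,
                       CASE WHEN NULLIF(mobile_no, '') IS NOT NULL THEN 1 ELSE 2 END,
                       CASE WHEN NULLIF(landline_no, '') IS NOT NULL THEN 1 ELSE 2 END,
                       CASE WHEN NULLIF(email, '') IS NOT NULL THEN 1 ELSE 2 END,
                       id_intern DESC
               ) AS rn
        FROM tablename
    ) AS t
    WHERE rn = 1
    """
).fetchall()
print(best)
```

A and B tie on every condition, so the row with the larger id_intern (B) is the one kept.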

Duplicate id rows with few columns to unique id row with many columns Oracle SQL

I have a pole table in which each pole can have one to four streetlights. Each row has a pole ID and the type (a description) of streetlight. I need the IDs to be unique, with a column for each of the possible streetlights. The type/description can be any one of 26 strings.
I have something like this:
ID Description
----------------
1 S 400
1 M 200
1 HPS 1000
1 S 400
2 M 400
2 S 250
3 S 300
What I need:
ID Description_1 Description_2 Description_3 Description_4
------------------------------------------------------------------
1 S 400 M 200 HPS 1000 S 400
2 M 400 S 250
3 S 300
The order in which the descriptions are populated in the description columns is not important, e.g. for ID = 1 the HPS 1000 value could be in description column 1, 2, 3, or 4, so long as all values are present.
I tried to pivot it but I don't think that is the right tool.
select * from table t
pivot (
max(Description) for ID in (1, 2, 3))
Because there are ~3000 IDs I would end up with a table that is ~3001 columns wide...
I also looked at the question "Oracle SQL Cross Tab Query", but it is not quite the same situation.
What is the right way to solve this problem?
You can use row_number() and conditional aggregation:
select
id,
max(case when rn = 1 then description end) description_1,
max(case when rn = 2 then description end) description_2,
max(case when rn = 3 then description end) description_3,
max(case when rn = 4 then description end) description_4
from (
select t.*, row_number() over(partition by id order by description) rn
from mytable t
) t
group by id
This handles up to 4 descriptions per id. To handle more, you can just expand the select clause with more conditional max()s.
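To see the row_number() plus conditional aggregation pattern in action, here is a sketch that runs it on the question's sample rows in SQLite from Python. It assumes a SQLite build with window functions (3.25 or later); the query itself has the same shape as the Oracle one above.

```python
import sqlite3

# Sample pole/streetlight rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, description TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [
        (1, "S 400"), (1, "M 200"), (1, "HPS 1000"), (1, "S 400"),
        (2, "M 400"), (2, "S 250"),
        (3, "S 300"),
    ],
)

# ROW_NUMBER assigns slots 1..n within each id; each MAX(CASE ...) then
# picks the description sitting in one slot (NULL if the slot is unused).
pivoted = conn.execute(
    """
    SELECT id,
           MAX(CASE WHEN rn = 1 THEN description END) AS description_1,
           MAX(CASE WHEN rn = 2 THEN description END) AS description_2,
           MAX(CASE WHEN rn = 3 THEN description END) AS description_3,
           MAX(CASE WHEN rn = 4 THEN description END) AS description_4
    FROM (
        SELECT t.*,
               ROW_NUMBER() OVER (PARTITION BY id ORDER BY description) AS rn
        FROM mytable AS t
    ) AS t
    GROUP BY id
    ORDER BY id
    """
).fetchall()

for row in pivoted:
    print(row)
```

The duplicate 'S 400' for id 1 survives because ROW_NUMBER gives each physical row its own slot, which a plain DISTINCT-based approach would not do.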

SQL How to count one column when another column is distinct

I have a table that looks something like this:
fin_aid_status | student_number
---------------|---------------
YES | 111222
YES | 111222
| 111333
YES | 111444
I want to count the number of fin_aid_status but not double count rows where student_number is duplicated. So the result I would like from this table would be 2. Not 3 because 111222 is in the table twice. There are many other columns in the table as well though so just looking for unique values in the table will not work.
EDIT: This is Oracle.
For example I am using code like this already:
select count(*), count(distinct student_number) from table
So for the third column I would want to count the students on financial aid, counting each student number only once.
So my expected output would be:
count(*) | count(distinct student_number) | count_fin_aid
4 | 3 | 2
Use a CASE expression to return the student_number only when fin_aid_status is not null; then count the distinct values.
SELECT count(Distinct case when fin_aid_status is not null
then student_number end) as Distinct_Student
FROM tbl;
Result using sample data: 2
Given Oracle:
With cte (fin_aid_status, student_number) as (
SELECT 'YES' , 111222 from dual union all
SELECT 'YES' , 111222 from dual union all
SELECT '' , 111333 from dual union all
SELECT 'YES' , 111444 from dual )
SELECT count(Distinct case when fin_aid_status is not null
then student_number end) as DistinctStudentCnt
FROM cte;
If you are using MySQL you can write something as follows, if all you want is count
SELECT count(DISTINCT student_number) FROM your_table WHERE fin_aid_status = 'YES';
I'm guessing at what you want here (please add some expected results), but:
SELECT fin_aid_status,
COUNT(DISTINCT student_number)
FROM tablename
GROUP BY fin_aid_status;
This will give you the count of distinct values in the student_number column for each value in the fin_aid_status column.
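The COUNT(DISTINCT CASE ...) approach from the first answer can be checked with a short sketch in SQLite from Python, producing all three columns of the expected output in one query. One assumption to note: Oracle treats the empty string as NULL, while SQLite does not, so the blank status is inserted as an explicit NULL here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (fin_aid_status TEXT, student_number INTEGER)")
# The blank status is stored as NULL (None), mirroring Oracle's
# treatment of '' as NULL.
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?)",
    [("YES", 111222), ("YES", 111222), (None, 111333), ("YES", 111444)],
)

# The CASE yields student_number only for rows with a status, and NULL
# otherwise; COUNT(DISTINCT ...) skips the NULLs and the duplicates.
counts = conn.execute(
    """
    SELECT COUNT(*),
           COUNT(DISTINCT student_number),
           COUNT(DISTINCT CASE WHEN fin_aid_status IS NOT NULL
                               THEN student_number END) AS count_fin_aid
    FROM tbl
    """
).fetchone()
print(counts)
```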

Calculate product of column values on the basis of other column in SQL Server

I have a table
Tid Did value
------------------
1 123 100
1 234 200
2 123 323
2 234 233
All tids have dids as 123 and 234. So for every tid having dids 123 and 234 I want the product of corresponding values
The output table will be
Tid Product
------------------
1 20000 (product of 100 and 200)
2 75259 (product of 323 and 233)
Any help?
select tid,
min(case when did = 123 then value end)
* min(case when did = 234 then value end) product
from my_table
group by tid
To get the data for multiple rows combined (based on tid) you use GROUP BY.
Because you're grouping by tid, you have to use an aggregate function to do anything with other values from the individual rows. If implied assumptions hold (exactly 1 row matching each did for each tid) then it doesn't matter much what aggregate function you use; min is as good as anything.
Within the aggregation, you use CASE logic to select value for the required did (and NULL for all other rows in the tid group).
Then just do the math.
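As a sanity check of the CASE-inside-aggregate approach just described, this sketch runs the query on the question's sample rows in SQLite from Python. Like the answer, it assumes exactly one row per did within each tid.

```python
import sqlite3

# Sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (tid INTEGER, did INTEGER, value INTEGER)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [(1, 123, 100), (1, 234, 200), (2, 123, 323), (2, 234, 233)],
)

# Each MIN(CASE ...) isolates the value for one did within the tid group
# (other rows contribute NULL, which MIN ignores); the two are multiplied.
products = conn.execute(
    """
    SELECT tid,
           MIN(CASE WHEN did = 123 THEN value END)
         * MIN(CASE WHEN did = 234 THEN value END) AS product
    FROM my_table
    GROUP BY tid
    ORDER BY tid
    """
).fetchall()

for row in products:
    print(row)
```

If a tid is missing one of the two dids, the corresponding MIN is NULL and so is the product, which makes incomplete groups easy to spot.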
You can use some arithmetic to get the product per tid.
select tid,exp(sum(log(value))) as prod
from t
group by tid
To do this only for tid's having did values 123 and 234, use
select tid,exp(sum(log(value))) as prod
from t
group by tid
having count(distinct case when did in (123,234) then did end) = 2
Here's a Rexster solution, based on the good work of @gbn:
SELECT
Tid,
CASE
WHEN MinVal = 0 THEN 0
WHEN Neg % 2 = 1 THEN -1 * EXP(ABSMult)
ELSE EXP(ABSMult)
END
FROM
(
SELECT
Tid,
SUM(LOG(ABS(NULLIF(value, 0)))) AS ABSMult,
SUM(SIGN(CASE WHEN value < 0 THEN 1 ELSE 0 END)) AS Neg,
MIN(ABS(value)) AS MinVal
FROM
t
GROUP BY
Tid
) t2
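The logic of the query above (sum the logs of the absolute values, track the sign separately, and short-circuit on zero) can be mirrored in a few lines of plain Python. This is only an illustrative sketch; the function name product_via_logs is made up here. It also shows why the result is a floating-point approximation and may need rounding when exact integers are expected.

```python
import math


def product_via_logs(values):
    """Multiply numbers the way the SQL above does: via exp(sum(log(|v|)))."""
    # A zero anywhere makes the whole product zero (log(0) is undefined).
    if any(v == 0 for v in values):
        return 0.0
    # An odd number of negative factors makes the product negative.
    negatives = sum(1 for v in values if v < 0)
    magnitude = math.exp(sum(math.log(abs(v)) for v in values))
    return -magnitude if negatives % 2 == 1 else magnitude


# The result is a float that is close to, but not guaranteed to equal,
# the exact integer product, so round before comparing or displaying.
print(round(product_via_logs([100, 200])))
print(round(product_via_logs([323, 233])))
print(round(product_via_logs([-2, 3])))
print(product_via_logs([0, 5]))
```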