SQL query (Postgres) how to answer that? - sql

I have a table with company id's (non unique) and some attribute (let's call it status id), status can be between 1 to 18 (many to many the row id is what unique)
now I need to get results of companies who only have rows with 1 and 18, if they have any number as well (let's say 3) then this company should not be returned.
The data is stored as row id, some meta data, company id and one status id, the example below is AFTER I ran a group by query.
So as an example if I do group by and string agg, I am getting these values:
Company ID Status
1 1,9,12,18
2 12,13,18
3 1
4 8
5 18
So in this case I need to return only 3 and 5.

You should fix your data model. Here are some reasons:
Storing numbers in strings is BAD.
Storing multiple values in a string is BAD.
SQL has poor string processing capabilities.
Postgres offers many ways to store multiple values -- a junction table, arrays, and JSON come to mind.
For your particular problem, how about an explicit comparison?
where status in ('1', '18', '1,18', '18,1')

You can group by companyid and set 2 conditions in the having clause:
select companyid
from tablename
group by companyid
having
sum((status in (1, 18))::int) > 0
and
sum((status not in (1, 18))::int) = 0
Or with EXCEPT:
select companyid from tablename
except
select companyid from tablename
where status not in (1, 18)
See the demo.
Results:
> | companyid |
> | --------: |
> | 3 |
> | 5 |

You can utilize group by and having. ie:
select *
from myTable
where statusId in (1,18)
and companyId in (select companyId
from myTable
group by companyId
having count(distinct statusId) = 1);
EDIT: If you meant to include those who have 1,18 and 18,1 too, then you could use array_agg instead:
select *
from t t1
inner join
(select companyId, array_agg(statusId) as statuses
from t
group by companyId
) t2 on t1.companyid = t2.companyid
where array[1,18] #> t2.statuses;
EDIT: If you meant to get back only companyIds without the rest of columns and data:
select companyId
from t
group by companyId
having array[1,18] #> array_agg(statusId);
DbFiddle Demo

Related

Select distinct value and bring only the latest one

I have a table that stores different statuses of each transaction. Each transaction can have multiple statuses (pending, rejected, aproved, etc).
I need to build a query that brings only the last status of each transaction.
The definition for the table that stores the statuses is:
[dbo].[Cuotas_Estado]
ID int (PK)
IdCuota int (references table dbo.Cuotas - FK)
IdEstado int (references table dbo.Estados - FK)
Here's the architecture for the 3 tables:
When running a simple SELECT statement on table dbo.Cuotas_Estado you'll get:
SELECT
*
FROM [dbo].[Cuotas_Estado] [E]
But the result I need is:
IdCuota | IdEstado
2 | 1
3 | 2
9 | 3
10 | 3
11 | 4
I'm running the following select statement:
SELECT
DISTINCT([E].[IdEstado]),
[E].[IdCuota]
FROM [dbo].[Cuotas_Estado] [E]
ORDER BY
[E].[IdCuota] ASC;
This will bring this result:
So, as you can see, it's bringing a double value to entry 9 and entry 11, I need the query to bring only the latest IdEstado column (3 in the entry 9 and 4 in the entry 11).
can you try this?
with cte as (
select IdEstado,IdCuota,
row_number() over(partition by IdCuota order by fecha desc) as RowNum
from [dbo].[Cuotas_Estado]
)
select IdEstado,IdCuota
from cte
where RowNum = 1
You can use a correlated subquery:
SELECT e.*
FROM [dbo].[Cuotas_Estado] e
WHERE e.IdEstado = (SELECT MAX(e2.IdEstado)
FROM [dbo].[Cuotas_Estado] e2
WHERE e2.IdCuota = e.IdCuota
);
With an index on Cuotas_Estado(IdCuota, IdEstado) this is probably the most efficient method.

How to perform "Select Count" with complicated "Where" statement to compute co-occurrences?

Let's have an example to declare my concern:
Suppose we have a Table (Tags) which has two columns like this
UserID -------------------------------- Tag
1 -------------------------------------- SQL
1 -------------------------------------- Select
1 -------------------------------------- DB
2 -------------------------------------- SQL
2 -------------------------------------- Programming
2 -------------------------------------- Code
2 -------------------------------------- Software
3 -------------------------------------- Code
4 -------------------------------------- SQL
4 -------------------------------------- Code
I need to count DISTINCT co-occurrences for each tag based on UserID
So, the output should be like this (with Order by Co-occurrences desc):
Tag -------------------------------- Co-occurrences
---------------------------------------------
SQL --------------------------------------- 5
Programming ------------------------------- 3
Code -------------------------------------- 3
Software ---------------------------------- 3
Select ------------------------------------ 2
DB ---------------------------------------- 2
This is just an example..
How can I make a Select statement that can do this?
I came up with one way but for only ONE specific tag:
SELECT count (distinct (Tag)) - 1 as Co_occurrences
FROM Tags
WHERE Tag is NOT NULL and UserID in
( SELECT UserID
FROM Tags
where tag = 'SQL')
Is it possible to change the above statement to make it general for all tags in the table?
SELECT t2.tag, count (distinct (t1.Tag)) - 1 as Co_occurrences
FROM Tags t1 inner join
Tags t2 on t1.UserId = t2.UserId
GROUP BY t2.tag
ORDER BY count (distinct (t1.Tag)) desc
A GROUP BY is what you are looking for:
SELECT
UserID,
Tag,
COUNT(DISTINCT Tag) - 1 AS Co_occurrences
FROM Tags
GROUP BY UserID, Tag
ORDER BY UserID, Tag
Edit: As mentioned in the comments, the above does not answer the question. I improved the answer of #OSA-E a bit, to explain what the -1 is doing after the count.
SELECT
[t1].[Tag],
COUNT(DISTINCT [t2].[Tag]) AS [Co_occurrences]
FROM [Tags] [t1]
INNER JOIN [Tags] [t2] ON [t1].[UserID] = [t2].[UserID]
WHERE [t1].[Tag] <> [t2].[Tag]
GROUP BY [t1].[Tag]
ORDER BY [Co_occurrences] DESC
Here is the Fiddle.

SQL - Removing Duplicate without 'hard' coding?

Heres my scenario.
I have a table with 3 rows I want to return within a stored procedure, rows are email, name and id. id must = 3 or 4 and email must only be per user as some have multiple entries.
I have a Select statement as follows
SELECT
DISTINCT email,
name,
id
from table
where
id = 3
or id = 4
Ok fairly simple but there are some users whose have entries that are both 3 and 4 so they appear twice, if they appear twice I want only those with ids of 4 remaining. I'll give another example below as its hard to explain.
Table -
Email Name Id
jimmy#domain.com jimmy 4
brian#domain.com brian 4
kevin#domain.com kevin 3
jimmy#domain.com jimmy 3
So in the above scenario I would want to ignore the jimmy with the id of 3, any way of doing this without hard coding?
Thanks
SELECT
email,
name,
max(id)
from table
where
id in( 3, 4 )
group by email, name
Is this what you want to achieve?
SELECT Email, Name, MAX(Id) FROM Table WHERE Id IN (3, 4) GROUP BY Email;
Sometimes using Having Count(*) > 1 may be useful to find duplicated records.
select * from table group by Email having count(*) > 1
or
select * from table group by Email having count(*) > 1 and id > 3.
The solution provided before with the select MAX(ID) from table sounds good for this case.
This maybe an alternative solution.
What RDMS are you using? This will return only one "Jimmy", using RANK():
SELECT A.email, A.name,A.id
FROM SO_Table A
INNER JOIN(
SELECT
email, name,id,RANK() OVER (Partition BY name ORDER BY ID DESC) AS COUNTER
FROM SO_Table B
) X ON X.ID = A.ID AND X.NAME = A.NAME
WHERE X.COUNTER = 1
Returns:
email name id
------------------------------
jimmy#domain.com jimmy 4
brian#domain.com brian 4
kevin#domain.com kevin 3

Unique results from database?

I am selecting all badge numbers from a database where category is equal to 1.
category | badge number
0 | 1
1 | 1
2 | 5
1 | 1
Sometimes the category is duplicated, is there a way to only get unique badge numbers from the database?
So above there is two 1's in category, each with badge number 1. How can I make sure the result only gives '1' rather than '1,1'
Use DISTINCT key word in the SELECT statement.
SELECT DISTINCT badge_number FROM Your_Table WHERE category = 1
Use the distinct keyword in your select.
select distinct badge_number from table_name where category = 1
Have you tried Select Distinct :
SELECT DISTINCT [badge number] from table
where Category=1
http://www.w3schools.com/sql/sql_distinct.asp
Select Distinct Badgenumber from table where Category = 1
SELECT DISTINCT BadgeNumber FROM dbo.TableName
Where Category = 1
Edited:
Ohh, there are so many posts already .... !!

return count 0 with mysql group by

database table like this
============================
= suburb_id | value
= 1 | 2
= 1 | 3
= 2 | 4
= 3 | 5
query is
SELECT COUNT(suburb_id) AS total, suburb_id
FROM suburbs
where suburb_id IN (1,2,3,4)
GROUP BY suburb_id
however, while I run this query, it doesn't give COUNT(suburb_id) = 0 when suburb_id = 0
because in suburbs table, there is no suburb_id 4, I want this query to return 0 for suburb_id = 4, like
============================
= total | suburb_id
= 2 | 1
= 1 | 2
= 1 | 3
= 0 | 4
A GROUP BY needs rows to work with, so if you have no rows for a certain category, you are not going to get the count. Think of the where clause as limiting down the source rows before they are grouped together. The where clause is not providing a list of categories to group by.
What you could do is write a query to select the categories (suburbs) then do the count in a subquery. (I'm not sure what MySQL's support for this is like)
Something like:
SELECT
s.suburb_id,
(select count(*) from suburb_data d where d.suburb_id = s.suburb_id) as total
FROM
suburb_table s
WHERE
s.suburb_id in (1,2,3,4)
(MSSQL, apologies)
This:
SELECT id, COUNT(suburb_id)
FROM (
SELECT 1 AS id
UNION ALL
SELECT 2 AS id
UNION ALL
SELECT 3 AS id
UNION ALL
SELECT 4 AS id
) ids
LEFT JOIN
suburbs s
ON s.suburb_id = ids.id
GROUP BY
id
or this:
SELECT id,
(
SELECT COUNT(*)
FROM suburb
WHERE suburb_id = id
)
FROM (
SELECT 1 AS id
UNION ALL
SELECT 2 AS id
UNION ALL
SELECT 3 AS id
UNION ALL
SELECT 4 AS id
) ids
This article compares performance of the two approaches:
Aggregates: subqueries vs. GROUP BY
, though it does not matter much in your case, as you are querying only 4 records.
Query:
select case
when total is null then 0
else total
end as total_with_zeroes,
suburb_id
from (SELECT COUNT(suburb_id) AS total, suburb_id
FROM suburbs
where suburb_id IN (1,2,3,4)
GROUP BY suburb_id) as dt
#geofftnz's solution works great if all conditions are simple like in this case. But I just had to solve a similar problem to generate a report where each column in the report is a different query. When you need to combine results from several select statements, then something like this might work.
You may have to programmatically create this query. Using left joins allows the query to return rows even if there are no matches to suburb_id with a given id. If your db supports it (which most do), you can use IFNULL to replace null with 0:
select IFNULL(a.count,0), IFNULL(b.count,0), IFNULL(c.count,0), IFNULL(d.count,0)
from (select count(suburb_id) as count from suburbs where id=1 group by suburb_id) a,
left join (select count(suburb_id) as count from suburbs where id=2 group by suburb_id) b on a.suburb_id=b.suburb_id
left join (select count(suburb_id) as count from suburbs where id=3 group by suburb_id) c on a.suburb_id=c.suburb_id
left join (select count(suburb_id) as count from suburbs where id=4 group by suburb_id) d on a.suburb_id=d.suburb_id;
The nice thing about this is that (if needed) each "left join" can use slightly different (possibly fairly complex) query.
Disclaimer: for large data sets, this type of query might have not perform very well (I don't write enough sql to know without investigating further), but at least it should give useful results ;-)