SQL - Select unique rows from a group of results - sql

I have wrecked my brain on this problem for quite some time. I've also reviewed other questions but was unsuccessful.
The problem I have is, I have a list of results/table that has multiple rows with columns
| REGISTRATION | ID | DATE | UNITTYPE
| 005DTHGP | 172 | 2007-09-11 | MBio
| 005DTHGP | 1966 | 2006-09-12 | Tracker
| 013DTHGP | 2281 | 2006-11-01 | Tracker
| 013DTHGP | 2712 | 2008-05-30 | MBio
| 017DTNGP | 2404 | 2006-10-20 | Tracker
| 017DTNGP | 508 | 2007-11-10 | MBio
I am trying to select rows with unique REGISTRATIONS and where the DATE is max (the latest). The IDs are not proportional to the DATE, meaning the ID could be a low value yet the DATE is higher than the other matching row and vise-versa. Therefore I can't use MAX() on both the DATE and ID and grouping just doesn't seem to work.
The results I want are as follows;
| REGISTRATION | ID | DATE | UNITTYPE
| 005DTHGP | 172 | 2007-09-11 | MBio
| 013DTHGP | 2712 | 2008-05-30 | MBio
| 017DTNGP | 508 | 2007-11-10 | MBio
PLEASE HELP!!!?!?!?!?!?!?

You want embedded queries, which not all SQLs support. In t-sql you'd have something like
select r.registration, r.recent, t.id, t.unittype
from (
select registration, max([date]) recent
from #tmp
group by
registration
) r
left outer join
#tmp t
on r.recent = t.[date]
and r.registration = t.registration

TSQL:
declare #R table
(
Registration varchar(16),
ID int,
Date datetime,
UnitType varchar(16)
)
insert into #R values ('A','1','20090824','A')
insert into #R values ('A','2','20090825','B')
select R.Registration,R.ID,R.UnitType,R.Date from #R R
inner join
(select Registration,Max(Date) as Date from #R group by Registration) M
on R.Registration = M.Registration and R.Date = M.Date
This can be inefficient if you have thousands of rows in your table depending upon how the query is executed (i.e. if it is a rowscan and then a select per row).

In PostgreSQL, and assuming your data is indexed so that a sort isn't needed (or there are so few rows you don't mind a sort):
select distinct on (registration), * from whatever order by registration,"date" desc;
Taking each row in registration and descending date order, you will get the latest date for each registration first. DISTINCT throws away the duplicate registrations that follow.

select registration,ID,date,unittype
from your_table
where (registration, date) IN (select registration,max(date)
from your_table
group by registration)

This should work in MySQL:
SELECT registration, id, date, unittype FROM
(SELECT registration AS temp_reg, MAX(date) as temp_date
FROM table_name GROUP BY registration) AS temp_table
WHERE registration=temp_reg and date=temp_date
The idea is to use a subquery in a FROM clause which throws up a single row containing the correct date and registration (the fields subjected to a group); then use the correct date and registration in a WHERE clause to fetch the other fields of the same row.

Related

How to query to capture recency and scale in one query?

I've built a query that calculates the number of ids from a table, per url_count.
with cte as (
select id, count(distinct.url) url_count
from table
group by id
)
select sum(if(url_count >= 1,1,0) scale
from cte
union all
select sum(if(url_count >= 2,1,0) scale
from cte
union all
select sum(if(url_count >= 3,1,0) scale
from cte
union all
select sum(if(url_count >= 4,1,0) scale
from cte
union all
select sum(if(url_count >= 5,1,0) scale
from cte
The query above says; "Give me the list of ids and the number of urls they each go to, then accumulate the number of ids who have gone to [1-5] or more urls"
It's ofc a tedious method, but works and outputs something like;
---------
| scale |
---------
|1213432|
|867554 |
|523523 |
|342232 |
|145889 |
---------
From this table, I also have a date field on the last 5 days which I'm working on adding into this query. Thus lies the challenge; Trying to add a second layer of information to the query; i.e. Recency. Been working on multiple approaches to building a query that outputs all the combinations of different scales, per the date.
The sort of output I've imagined is a pivot table which presents something like;
-------------------------------------------------------------
| date | url_co1 | url_co2 | url_co3 | url_co4 | url_co5|
-------------------------------------------------------------
|2020-01-05| 1213432 | 1112321 | 984332 | 632131 | 234124 |
|2020-01-04| 1012131 | 934242 | 867554 | 533242 | 134234 |
| ... | ... | ... | ... | ... | ... |
| ... | ... | ... | ... | ... | ... |
| ... | ... | ... | ... | ... | ... |
-------------------------------------------------------------
Where url_co[1-5] represents the number of ids that visited [1-5] or more urls and dates gives up the date that volume was captured. No idea how to write that because once I query:
with cte as (
select id, date, count(distinct.url) url_count
from table
group by id, date
)
I've aggregated to per id, per date, which therefore something goes wrong. =/
Hope that all made sense!
Please, please help! I would appreciate some guidance.
There must be a methodology for getting the combination of volumes per recency that I've missed!
I don't really follow the full question, but the first query can be simplified to:
select url_count, count(*) as this_count,
sum(url_count) over (order by url_count desc) as descending_count
from (select id, count(distinct url) as url_count
from table
group by id
) t
group by url_count
order by url_count;

SQL : Getting data as well as count from a single table for a month

I am working on a SQL query where I have a rather huge data-set. I have the table data as mentioned below.
Existing table :
+---------+----------+----------------------+
| id(!PK) | name | Date |
+---------+----------+----------------------+
| 1 | abc | 21.03.2015 |
| 1 | def | 22.04.2015 |
| 1 | ajk | 22.03.2015 |
| 3 | ghi | 23.03.2015 |
+-------------------------------------------+
What I am looking for is an insert query into an empty table. The condition is like this :
Insert in an empty table where id is common, count of names common to an id for march.
Output for above table would be like
+---------+----------+------------------------+
| some_id | count | Date |
+---------+----------+----------------------+
| 1 | 2 | 21.03.2015 |
| 3 | 1 | 23.03.2015 |
+-------------------------------------------+
All I have is :
insert into empty_table values (some_id,count,date)
select id,count(*),date from existing_table where id=1;
Unfortunately above basic query doesn't suit this complex requirement.
Any suggestions or ideas? Thank you.
Udpated query
insert into empty_table
select id,count(*),min(date)
from existing_table where
date >= '2015-03-01' and
date < '2015-04-01'
group by id;
Seems you want the number of unique names per id:
insert into empty_table
select id
,count(distinct name)
,min(date)
from existing_table
where date >= DATE '2015-03-01'
and date < DATE '2015-04-01'
group by id;
If I understand correctly, you just need a date condition:
insert into empty_table(some_id, count, date)
select id, count(*), min(date)
from existing_table
where id = 1 and
date >= date '2015-03-01' and
date < date '2015-04-01'
group by id;
Note: the list after the table name contains the columns being inserted. There is no values keyword when using insert . . . select.
insert into empty_table
select id, count(*) as mycnt, min(date) as mydate
from existing_table
group by id, year_month(date);
Please use function provided by your RDBMS obtaining date part containing only year and month as far as you did not provide the RDBMS version and the date processing functionality varies wildly between them.

Count and name content from a SQL Server table

I have a table which is structured like this:
+-----+-------------+-------------------------+
| id | name | timestamp |
+-----+-------------+-------------------------+
| 1 | someName | 2016-04-20 09:41:41.213 |
| 2 | someName | 2016-04-20 09:42:41.213 |
| 3 | anotherName | 2016-04-20 09:43:41.213 |
| ... | ... | ... |
+-----+-------------+-------------------------+
Now, I am trying to create a query, which selects all timestamps since time x and count the amount of times the same name occurs in the result.
As an example, if we would apply this query to the table above, with 2016-04-20 09:40:41.213 as the date from which on it should be counted, the result should look like this:
+-------------+-------+
| name | count |
+-------------+-------+
| someName | 2 |
| anotherName | 1 |
+-------------+-------+
What I have accomplished so far is the following query, which gives me the the names, but not their count:
WITH screenshots AS
(
SELECT * FROM SavedScreenshotsLog
WHERE timestamp > '2016-04-20 09:40:241.213'
)
SELECT s.name
FROM SavedScreenshotsLog s
INNER JOIN screenshots sc ON sc.name = s.name AND sc.timestamp = s.timestamp
ORDER BY s.name
I have browsed through stackoverflow but was not able to find a solution which fits my needs and as I am not very experienced with SQL, I am out of ideas.
You mention one table in your question, and then show a query with two tables. That makes it hard to follow the question.
What you are asking for is a simple aggregation:
SELECT name, COUNT(*)
FROM SavedScreenshotsLog
WHERE timestamp > '2016-04-20 09:40:241.213'
GROUP BY name
ORDER BY COUNT(*) DESC;
EDIT:
If you want "0" values, you can use conditional aggregation:
SELECT name,
SUM(CASE WHEN timestamp > '2016-04-20 09:40:241.213' THEN 1 ELSE 0 END) as cnt
FROM SavedScreenshotsLog
GROUP BY name
ORDER BY cnt DESC;
Note that this will run slower because there is no filter on the dates prior to aggregation.
CREATE TABLE #TEST (name varchar(100), dt datetime)
INSERT INTO #TEST VALUES ('someName','2016-04-20 09:41:41.213')
INSERT INTO #TEST VALUES ('someName','2016-04-20 09:41:41.213')
INSERT INTO #TEST VALUES ('anotherName','2016-04-20 09:43:41.213')
declare #YourDatetime datetime = '2016-04-20 09:41:41.213'
SELECT name, count(dt)
FROM #TEST
WHERE dt >= #YourDatetime
GROUP BY name
I've posted the answer, because using the above query can generate errors in converting the string in where clause into a datetime, it depends on the format of the datetime.

PostgreSQL return multiple rows with DISTINCT though only latest date per second column

Lets says I have the following database table (date truncated for example only, two 'id_' preix columns join with other tables)...
+-----------+---------+------+--------------------+-------+
| id_table1 | id_tab2 | date | description | price |
+-----------+---------+------+--------------------+-------+
| 1 | 11 | 2014 | man-eating-waffles | 1.46 |
+-----------+---------+------+--------------------+-------+
| 2 | 22 | 2014 | Flying Shoes | 8.99 |
+-----------+---------+------+--------------------+-------+
| 3 | 44 | 2015 | Flying Shoes | 12.99 |
+-----------+---------+------+--------------------+-------+
...and I have a query like the following...
SELECT id, date, description FROM inventory ORDER BY date ASC;
How do I SELECT all the descriptions, but only once each while simultaneously only the latest year for that description? So I need the database query to return the first and last row from the sample data above; the second it not returned because the last row has a later date.
Postgres has something called distinct on. This is usually more efficient than using window functions. So, an alternative method would be:
SELECT distinct on (description) id, date, description
FROM inventory
ORDER BY description, date desc;
The row_number window function should do the trick:
SELECT id, date, description
FROM (SELECT id, date, description,
ROW_NUMBER() OVER (PARTITION BY description
ORDER BY date DESC) AS rn
FROM inventory) t
WHERE rn = 1
ORDER BY date ASC;

MIN() Function in SQL

Need help with Min Function in SQL
I have a table as shown below.
+------------+-------+-------+
| Date_ | Name | Score |
+------------+-------+-------+
| 2012/07/05 | Jack | 1 |
| 2012/07/05 | Jones | 1 |
| 2012/07/06 | Jill | 2 |
| 2012/07/06 | James | 3 |
| 2012/07/07 | Hugo | 1 |
| 2012/07/07 | Jack | 1 |
| 2012/07/07 | Jim | 2 |
+------------+-------+-------+
I would like to get the output like below
+------------+------+-------+
| Date_ | Name | Score |
+------------+------+-------+
| 2012/07/05 | Jack | 1 |
| 2012/07/06 | Jill | 2 |
| 2012/07/07 | Hugo | 1 |
+------------+------+-------+
When I use the MIN() function with just the date and Score column I get the lowest score for each date, which is what I want. I don't care which row is returned if there is a tie in the score for the same date. Trouble starts when I also want name column in the output. I tried a few variation of SQL (i.e min with correlated sub query) but I have no luck getting the output as shown above. Can anyone help please:)
Query is as follows
SELECT DISTINCT
A.USername, A.Date_, A.Score
FROM TestTable AS A
INNER JOIN (SELECT Date_,MIN(Score) AS MinScore
FROM TestTable
GROUP BY Date_) AS B
ON (A.Score = B.MinScore) AND (A.Date_ = B.Date_);
Use this solution:
SELECT a.date_, MIN(name) AS name, a.score
FROM tbl a
INNER JOIN
(
SELECT date_, MIN(score) AS minscore
FROM tbl
GROUP BY date_
) b ON a.date_ = b.date_ AND a.score = b.minscore
GROUP BY a.date_, a.score
SQL-Fiddle Demo
This will get the minimum score per date in the INNER JOIN subselect, which we use to join to the main table. Once we join the subselect, we will only have dates with names having the minimum score (with ties being displayed).
Since we only want one name per date, we then group by date and score, selecting whichever name: MIN(name).
If we want to display the name column, we must use an aggregate function on name to facilitate the GROUP BY on date and score columns, or else it will not work (We could also use MAX() on that column as well).
Please learn about the GROUP BY functionality of RDBMS.
SELECT Date_,Name,MIN(Score)
FROM T
GROUP BY Name
This makes the assumption that EACH NAME and EACH date appears only once, and this will only work for MySQL.
To make it work on other RDBMSs, you need to apply another group function on the Date column, like MAX. MIN. etc
SELECT T.Name, T.Date_, MIN(T.Score) as Score FROM T
GROUP BY T.Date_
Edit: This answer is not corrected as pointed out by JNK in comments
SELECT Date_,MAX(Name),MIN(Score)
FROM T
GROUP BY Date_
Here I am using MAX(NAME), it will pick one name if two names were found with the same goal numbers.
This will find Min score for each day (no duplicates), scored by any player. The name that starts with Z will be picked first than the name that starts with A.
Edit: Fixed by removing group by name