Getting rows with the highest SELECT COUNT from groups within a resultset - sql

I have a SQLite Database that contains parsed Apache log lines.
A simplified version of the DB's only table (accesses) looks like this:
|referrer|datestamp|
+--------+---------+
|xy.de | 20170414|
|ab.at | 20170414|
|xy.de | 20170414|
|xy.de | 20170414|
|12.com | 20170413|
|12.com | 20170413|
|xy.de | 20170413|
|12.com | 20170413|
|12.com | 20170412|
|xy.de | 20170412|
|12.com | 20170412|
|12.com | 20170412|
|ab.at | 20170412|
|ab.at | 20170412|
|12.com | 20170412|
+--------+---------+
I am trying to retrieve the top referrer for each day by performing a sub query that does a SELECT COUNT on the referrer. Afterwards I select the entries from that subquery that have the highest count:
SELECT datestamp, referrer, COUNT(*)
FROM accesses WHERE datestamp BETWEEN '20170414' AND '20170414'
GROUP BY referrer
HAVING COUNT(*) = (select MAX(anz)
FROM (SELECT COUNT(*) anz
FROM accesses
WHERE datestamp BETWEEN '20170414' AND '20170414'
GROUP BY referrer
)
);
The above approach works as long as I perform the query for a single date, but it falls apart as soon as I query for date ranges.
How can I achieve grouping by date? I am also only interested in the referrer with the highest count.

If you want all the days combined with a single best referrer, then:
SELECT referrer, COUNT(*) as anz
FROM accesses
WHERE datestamp BETWEEN '20170414' AND '20170414'
GROUP BY referrer
ORDER BY COUNT(*) DESC
LIMIT 1;
I think you might want this information broken out by day. If so, a correlated subquery helps -- and a CTE as well:
WITH dr as (
SELECT a.datestamp, a.referrer, COUNT(*) as cnt
FROM accesses a
WHERE datestamp BETWEEN '20170414' AND '20170414'
GROUP BY a.referrer, a.datestamp
)
SELECT dr.*
FROM dr
WHERE dr.cnt = (SELECT MAX(dr2.cnt)
FROM dr dr2
WHERE dr2.datestamp = dr.datestamp
);
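To check the per-day query against the sample table from the question, here is a minimal sketch using Python's built-in sqlite3 module. The function name `top_referrers` and the parameterised date range are assumptions made for illustration; the SQL is the CTE from above, with the range widened to three days and an ORDER BY added.

```python
import sqlite3

# Sample rows copied from the question's table (datestamp kept as text).
ROWS = [
    ("xy.de", "20170414"), ("ab.at", "20170414"), ("xy.de", "20170414"),
    ("xy.de", "20170414"), ("12.com", "20170413"), ("12.com", "20170413"),
    ("xy.de", "20170413"), ("12.com", "20170413"), ("12.com", "20170412"),
    ("xy.de", "20170412"), ("12.com", "20170412"), ("12.com", "20170412"),
    ("ab.at", "20170412"), ("ab.at", "20170412"), ("12.com", "20170412"),
]

QUERY = """
WITH dr AS (
    SELECT a.datestamp, a.referrer, COUNT(*) AS cnt
    FROM accesses a
    WHERE a.datestamp BETWEEN ? AND ?
    GROUP BY a.referrer, a.datestamp
)
SELECT dr.datestamp, dr.referrer, dr.cnt
FROM dr
WHERE dr.cnt = (SELECT MAX(dr2.cnt)
                FROM dr dr2
                WHERE dr2.datestamp = dr.datestamp)
ORDER BY dr.datestamp
"""


def top_referrers(first_day, last_day):
    """Return (datestamp, referrer, count) for the top referrer of each day."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accesses (referrer TEXT, datestamp TEXT)")
    conn.executemany("INSERT INTO accesses VALUES (?, ?)", ROWS)
    rows = conn.execute(QUERY, (first_day, last_day)).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in top_referrers("20170412", "20170414"):
        print(row)
```

If two referrers tie for the highest count on a day, this query returns both rows for that day.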

Just group by a date range. As an example,
SELECT referrer,
case when datestamp Between '20170101' AND '20170131' then 1
when datestamp Between '20170201' AND '20170228' then 2
when datestamp Between '20170301' AND '20170331' then 3
else 4 end DateRange,
COUNT(*) as anz
FROM accesses
GROUP BY referrer,
case when datestamp Between '20170101' AND '20170131' then 1
when datestamp Between '20170201' AND '20170228' then 2
when datestamp Between '20170301' AND '20170331' then 3
else 4 end
ORDER BY referrer, COUNT(*) DESC
LIMIT 1;
You can put any legal SQL expression in a GROUP BY clause. The query processor then creates one bucket per distinct value of that expression and aggregates the raw rows into those buckets.

Related

How to filter out conditions based on a group by in JPA?

I have a table like
| customer | profile | status | date |
| 1 | 1 | DONE | mmddyy |
| 1 | 1 | DONE | mmddyy |
In this case, I want to group by on the profile ID having max date. Profiles can be repeated. I've ruled out Java 8 streams as I have many conditions here.
I want to convert the following SQL into JPQL:
select customer, profile, status, max(date)
from tbl
group by profile, customer,status, date, column-k
having count(profile)>0 and status='DONE';
Can someone tell me how to write this query in JPQL, assuming it is correct SQL? If I declare columns in the select, they are needed in the group by as well, and the query results are different.
I am guessing that you want the most recent customer/profile combination that is done.
If so, the correct SQL is:
select t.*
from t
where t.date = (select max(t2.date)
from t t2
where t2.customer = t.customer and t2.profile = t.profile
) and
t.status = 'DONE';
I don't know how to convert this to JPQL, but you might as well start with working SQL code.
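To see what the correlated subquery returns, here is a small sketch run through Python's sqlite3 module. The table contents, the ISO-formatted dates and the function name `latest_done_rows` are hypothetical and chosen only for illustration; the SQL itself is the query above with an explicit column list.

```python
import sqlite3

# Hypothetical rows: (customer, profile, status, date as ISO text).
ROWS = [
    (1, 1, "DONE", "2020-01-01"),
    (1, 1, "DONE", "2020-02-01"),
    (1, 2, "DONE", "2020-01-15"),
    (1, 2, "PENDING", "2020-03-01"),
]

QUERY = """
SELECT t.customer, t.profile, t.status, t.date
FROM t
WHERE t.date = (SELECT MAX(t2.date)
                FROM t t2
                WHERE t2.customer = t.customer AND t2.profile = t.profile)
  AND t.status = 'DONE'
ORDER BY t.customer, t.profile
"""


def latest_done_rows(rows):
    """Return the most recent row per customer/profile, if that row is DONE."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE t (customer INTEGER, profile INTEGER, status TEXT, date TEXT)"
    )
    conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(latest_done_rows(ROWS))
```

Note that profile 2 is left out here: its most recent row is PENDING, and the status filter is applied to the most recent row only. If the latest DONE row is wanted instead, the status condition would have to move inside the subquery.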
In your query, the date column is not needed in the group by, and the status='DONE' condition belongs in the where clause:
select customer, profile, status, max(date)
from tbl
where status='DONE'
group by profile, customer, status
having count(profile)>0

`INTERSECT` does not return anything from two tables, separately values are returned fine

I'm not sure what I am doing wrong here. I haven't touched SQL queries for several years, and the MSSQL query language is a bit strange to me, but after 30 minutes of googling I still cannot find the answer.
Problem
I have two queries that work perfectly fine:
SELECT COUNT(*) AS 'NumberOfAccounts' FROM Accounts
SELECT COUNT(*) AS 'NumberOfUsers' FROM Users
I need to get this information in one go in my API response since I don't want to execute two statements. How can I combine them into one query so it will return table as follows:
+------------------+---------------+
| NumberOfAccounts | NumberOfUsers |
+------------------+---------------+
| 10 | 16 |
+------------------+---------------+
What I have tried
UNION
SELECT COUNT(*) AS 'NumberOfAccounts' FROM Accounts UNION SELECT COUNT(*) AS 'NumberOfUsers' FROM Users
This gives me the results of both tables, but it pushes both values into the NumberOfAccounts column, so I cannot parse the result reliably.
+------------------+
| NumberOfAccounts |
+------------------+
| 10 |
| 16 |
+------------------+
INTERSECT
SELECT COUNT(*) AS 'NumberOfAccounts' FROM Accounts INTERSECT SELECT COUNT(*) AS 'NumberOfUsers' FROM Users
This just gives me an empty result with only the NumberOfAccounts column in it.
You can just put these as subqueries in a select:
SELECT (SELECT COUNT(*) FROM Accounts) as NumberOfAccounts,
(SELECT COUNT(*) FROM Users) as NumberOfUsers
In SQL Server, no FROM clause is needed.
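As a quick check of the scalar-subquery approach, here is a sketch that runs it through Python's sqlite3 module (SQLite also accepts a SELECT without FROM). The helper `build_db` and the minimal one-column tables are assumptions for illustration; the row counts of 10 and 16 come from the question.

```python
import sqlite3


def build_db(n_accounts=10, n_users=16):
    """Create an in-memory database with the two tables from the question."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Accounts (id INTEGER PRIMARY KEY)")
    conn.execute("CREATE TABLE Users (id INTEGER PRIMARY KEY)")
    conn.executemany(
        "INSERT INTO Accounts (id) VALUES (?)", [(i,) for i in range(n_accounts)]
    )
    conn.executemany(
        "INSERT INTO Users (id) VALUES (?)", [(i,) for i in range(n_users)]
    )
    return conn


# Two scalar subqueries produce one row with two columns.
SIDE_BY_SIDE = """
SELECT (SELECT COUNT(*) FROM Accounts) AS NumberOfAccounts,
       (SELECT COUNT(*) FROM Users)    AS NumberOfUsers
"""

# INTERSECT keeps only rows present in both results, so 10 vs 16 yields nothing.
INTERSECTED = """
SELECT COUNT(*) AS NumberOfAccounts FROM Accounts
INTERSECT
SELECT COUNT(*) AS NumberOfUsers FROM Users
"""

conn = build_db()
cursor = conn.execute(SIDE_BY_SIDE)
column_names = [d[0] for d in cursor.description]
counts = cursor.fetchone()
intersect_rows = conn.execute(INTERSECTED).fetchall()

if __name__ == "__main__":
    print(column_names, counts)
    print(intersect_rows)
```

The second query also shows why the INTERSECT attempt came back empty: the two single-row results hold different values, so they have no row in common.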
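UNION is the wrong tool here. UNION appends the rows of one result set to the rows of another with a compatible column list; it does not place results side by side as columns. INTERSECT, in turn, returns only rows that appear in both results, so it comes back empty whenever the two counts differ.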
One solution might be:
SELECT AccountCount, UserCount FROM
(SELECT COUNT(*) AS AccountCount, 1 AS Id FROM Accounts) AS a
JOIN
(SELECT COUNT(*) AS UserCount, 1 as Id FROM Users) AS u ON (a.Id = u.Id)
Be aware of the artificial surrogate key 1 you need to insert to join both sub-selects together.
For completeness' sake, with UNION ALL you'd do:
SELECT 'NumberOfAccounts' AS what, COUNT(*) AS howmany FROM accounts
UNION ALL
SELECT 'NumberOfUsers' AS what, COUNT(*) AS howmany FROM users;
which results in
+------------------+---------+
| what | howmany |
+------------------+---------+
| NumberOfAccounts | 10 |
| NumberOfUsers | 16 |
+------------------+---------+
And another variation:
WITH cte AS
(
SELECT COUNT(*) AS cntAccounts, 0 AS cntUsers FROM accounts
UNION ALL
SELECT 0 AS cntAccounts, COUNT(*) AS cntUsers FROM users
)
SELECT
SUM(cntAccounts) AS NumberOfAccounts
,SUM(cntUsers ) AS NumberOfUsers
FROM cte
If you need better performance, you can read the row counts from sys.dm_db_partition_stats instead of counting the rows in each table. Note that the row_count values there are documented as approximate:
SELECT (
SELECT SUM (row_count)
FROM sys.dm_db_partition_stats
WHERE object_id=OBJECT_ID('Accounts')
AND (index_id=0 or index_id=1)) NumberOfAccounts,
(
SELECT SUM (row_count)
FROM sys.dm_db_partition_stats
WHERE object_id=OBJECT_ID('Users')
AND (index_id=0 or index_id=1)) NumberOfUsers

Avoid useless subqueries or aggregations when joining and grouping

I have two tables, room and message, in a chat database :
CREATE TABLE room (
id serial primary key,
name varchar(50) UNIQUE NOT NULL,
private boolean NOT NULL default false,
description text NOT NULL
);
CREATE TABLE message (
id bigserial primary key,
room integer references room(id),
author integer references player(id),
created integer NOT NULL
);
Let's say I want to get the rooms with the number of messages from a given user and the date of the most recent message:
id | number | last_created | description | name | private
----+--------+--------------+-------------+------------------+---------
2 | 1149 | 1391703964 | | Dragons & co | t
8 | 136 | 1391699600 | | Javascript | f
10 | 71 | 1391684998 | | WBT | t
1 | 86 | 1391682712 | | Miaou | f
3 | 423 | 1391681764 | | Code & Baguettes | f
...
I see two solutions :
1) selecting/grouping on the messages and using subqueries to get the room columns :
select m.room as id, count(*) number, max(created) last_created,
(select name from room where room.id=m.room),
(select description from room where room.id=m.room),
(select private from room where room.id=m.room)
from message m where author=$1 group by room order by last_created desc limit 10
This makes 3 almost identical subqueries, which looks very dirty. I could reverse it to do only 2 subqueries on message columns, but it wouldn't be much better.
2) selecting on both tables and using aggregate functions for all columns :
select room.id, count(*) number, max(created) last_created,
max(name) as name, max(description) as description, bool_or(private) as private
from message, room
where message.room=room.id and author=$1
group by room.id order by last_created desc limit 10
All those aggregate functions look messy and useless.
Is there a clean solution here?
It looks like a general problem to me. Theoretically, those aggregate functions are useless since, by construction, all the joined rows in a group come from the same room row. I'd like to know if there's a general solution.
Try performing the grouping in a subquery:
select m.id, m.number, m.last_created, r.name, r.description, r.private
from (
select m.room as id, count(*) number, max(created) last_created
from message m
where author=$1
group by room
) m
join room r
on r.id = m.id
order by m.last_created desc limit 10
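The aggregate-first, join-afterwards shape can be tried out with Python's sqlite3 module. This is only a sketch: the schema is simplified for SQLite, the rows and author ids are hypothetical, and `$1` is replaced by a `?` placeholder.

```python
import sqlite3

# Schema simplified for SQLite; the player table is left out.
SCHEMA = """
CREATE TABLE room (
    id INTEGER PRIMARY KEY,
    name TEXT UNIQUE NOT NULL,
    private INTEGER NOT NULL DEFAULT 0,
    description TEXT NOT NULL
);
CREATE TABLE message (
    id INTEGER PRIMARY KEY,
    room INTEGER REFERENCES room(id),
    author INTEGER,
    created INTEGER NOT NULL
);
"""

# Hypothetical data: rooms as (id, name, private, description),
# messages as (room, author, created).
ROOMS = [(1, "Miaou", 0, ""), (2, "Dragons & co", 1, ""), (3, "Code & Baguettes", 0, "")]
MESSAGES = [(1, 7, 100), (1, 7, 300), (2, 7, 200), (2, 8, 900), (3, 8, 500)]

QUERY = """
SELECT m.id, m.number, m.last_created, r.name, r.description, r.private
FROM (
    SELECT room AS id, COUNT(*) AS number, MAX(created) AS last_created
    FROM message
    WHERE author = ?
    GROUP BY room
) m
JOIN room r ON r.id = m.id
ORDER BY m.last_created DESC
LIMIT 10
"""


def rooms_for_author(author):
    """Aggregate the author's messages per room, then join the room columns."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO room VALUES (?, ?, ?, ?)", ROOMS)
    conn.executemany(
        "INSERT INTO message (room, author, created) VALUES (?, ?, ?)", MESSAGES
    )
    rows = conn.execute(QUERY, (author,)).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(rooms_for_author(7))
```

Because the grouping happens before the join, the room columns never pass through GROUP BY, so no extra aggregates or repeated subqueries are needed.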
Edit: Another option (likely with similar performance) is to move that aggregation into a view, something like:
create view MessagesByRoom
as
select m.author, m.room, count(*) number, max(created) last_created
from message m
group by author, room
And then use it like:
select m.room, m.number, m.last_created, r.name, r.description, r.private
from MessagesByRoom m
join room r
on r.id = m.room
where m.author = $1
order by m.last_created desc limit 10
Maybe use a join?
SELECT
r.id, count(*) number_of_posts,
max(m.created) last_created,
r.name, r.description, r.private
FROM room r
JOIN message m on r.id = m.room
WHERE m.author = $1
GROUP BY r.id
ORDER BY last_created desc
You can include the columns in the group by:
select room.id, count(*) number, max(message.created) last_created,
room.name, room.description, room.private
from message join
room
on message.room=room.id and author=$1
group by room.id, name, description, private
order by last_created desc
limit 10;
EDIT:
This query will work in more recent versions of Postgres (9.1 and later), where grouping by a table's primary key is enough:
select room.id, count(*) number, max(message.created) last_created,
room.name, room.description, room.private
from message join
room
on message.room=room.id and author=$1
group by room.id
order by last_created desc
limit 10;
Earlier versions of the documentation are pretty clear that you would need to include all the columns:
When GROUP BY is present, it is not valid for the SELECT list
expressions to refer to ungrouped columns except within aggregate
functions, since there would be more than one possible value to return
for an ungrouped column.
The ANSI standard does allow the above query with just group by room.id, because the other room columns are functionally dependent on the primary key. Support for this is a fairly recent addition in the databases that implement it.

SQL - Search a table for all instances where a value is repeated

I'm looking to find a way to search a table for duplicate values and return those duplicates (or even just one of the set of duplicates) as the result set.
For instance, let's say I have these data:
uid | semi-unique id
1 | 12345
2 | 21345
3 | 54321
4 | 41235
5 | 12345
6 | 21345
I need to return either:
12345
12345
21345
21345
Or:
12345
21345
I've tried googling around and keep coming up short. Any help please?
To get each row, you can use window functions:
select t.*
from (select t.*, count(*) over (partition by [semi-unique id]) as totcnt
from t
) t
where totcnt > 1
To get just one instance, try this:
select t.*
from (select t.*, count(*) over (partition by [semi-unique id]) as totcnt,
row_number() over (partition by [semi-unique id] order by (select NULL)
) as seqnum
from t
) t
where totcnt > 1 and seqnum = 1
The advantage of this approach is that you get all the columns, instead of just the id (if that helps).
Sorry, I was short on time earlier so I couldn't explain my answer. The first query groups the semi_unique_ids that are the same and only returns the ones that have a duplicate.
SELECT semi_unique_id
FROM your_table
GROUP BY semi_unique_id
HAVING COUNT(semi_unique_id) > 1
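Both result shapes asked for in the question can be checked against the sample data with Python's sqlite3 module. This is a sketch: the table name `your_table` follows the answer, while the second query (an IN filter on the grouped values) is one assumed way to list every duplicated row.

```python
import sqlite3

# Sample data from the question: (uid, semi_unique_id).
ROWS = [(1, 12345), (2, 21345), (3, 54321), (4, 41235), (5, 12345), (6, 21345)]

# One row per duplicated value.
DISTINCT_DUPLICATES = """
SELECT semi_unique_id
FROM your_table
GROUP BY semi_unique_id
HAVING COUNT(*) > 1
ORDER BY semi_unique_id
"""

# Every row whose value occurs more than once.
ALL_DUPLICATE_ROWS = """
SELECT uid, semi_unique_id
FROM your_table
WHERE semi_unique_id IN (
    SELECT semi_unique_id
    FROM your_table
    GROUP BY semi_unique_id
    HAVING COUNT(*) > 1
)
ORDER BY semi_unique_id, uid
"""


def run(query):
    """Load the sample rows into an in-memory table and run one query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE your_table (uid INTEGER, semi_unique_id INTEGER)")
    conn.executemany("INSERT INTO your_table VALUES (?, ?)", ROWS)
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(run(DISTINCT_DUPLICATES))
    print(run(ALL_DUPLICATE_ROWS))
```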
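If you want the uid in the result too, do not add it to the GROUP BY: every uid is unique, so each group would hold a single row and nothing would pass the HAVING filter. Filter on the duplicated values instead:
SELECT uid,
semi_unique_id
FROM your_table
WHERE semi_unique_id IN (
SELECT semi_unique_id
FROM your_table
GROUP BY semi_unique_id
HAVING COUNT(semi_unique_id) > 1
)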
Lastly, if you would like to know how many times each returned value occurs, compute the count per semi_unique_id and join it back:
SELECT t.uid,
t.semi_unique_id,
d.unique_id_count
FROM your_table t
JOIN (
SELECT semi_unique_id,
COUNT(*) AS unique_id_count
FROM your_table
GROUP BY semi_unique_id
HAVING COUNT(*) > 1
) d ON d.semi_unique_id = t.semi_unique_id
Try this for SQL Server:
SELECT t.semi_unique_id AS i
FROM your_table t
GROUP BY
t.semi_unique_id
HAVING (COUNT(t.semi_unique_id) > 1)

SQL query sum at bottom row

I am trying to get the sum of a column at the bottom row.
I have tried a few examples by using SUM() and COUNT(), but they have all failed with syntax errors.
Here is my current code without any sum or anything:
:XML ON
USE MYTABLE
SELECT sbc.PolicyC.PolicyName as namn,COUNT(*) as cnt
FROM sbc.AgentC, sbc.PolicyC
WHERE sbc.AgentC.PolicyGuid = sbc.PolicyC.PolicyGuid
GROUP BY sbc.AgentC.PolicyGuid, sbc.PolicyC.PolicyName ORDER BY namn ASC
FOR XML PATH ('celler'), ROOT('root')
GO
The XML output is reformatted to become a regular HTML table.
EDIT:
Here is the latest code, but it generates a "sum" (same number as the row above) on every other row:
:XML ON
USE MYTABLE
SELECT sbc.PolicyC.PolicyName as namn,COUNT(*) as cnt
FROM sbc.AgentC, sbc.PolicyC
WHERE sbc.AgentC.PolicyGuid = sbc.PolicyC.PolicyGuid
GROUP BY sbc.AgentC.PolicyGuid, sbc.PolicyC.PolicyName with rollup
FOR XML PATH ('celler'), ROOT('root')
GO
The XML output looks like this:
<root>
<celler>
<namn>example name one</namn>
<cnt>23</cnt>
</celler>
<celler>
<cnt>23</cnt>
</celler>
<celler>
<namn>example name two</namn>
<cnt>1</cnt>
</celler>
<celler>
<cnt>1</cnt>
</celler>
</root>
Try
SELECT sbc.PolicyC.PolicyName as namn,COUNT(*) as cnt
FROM sbc.AgentC, sbc.PolicyC
WHERE sbc.AgentC.PolicyGuid = sbc.PolicyC.PolicyGuid
GROUP BY sbc.AgentC.PolicyGuid, sbc.PolicyC.PolicyName
UNION
SELECT 'TOTAL' as namn,COUNT(*) as cnt
FROM sbc.AgentC, sbc.PolicyC
WHERE sbc.AgentC.PolicyGuid = sbc.PolicyC.PolicyGuid
ORDER BY namn ASC
This computes the total in a separate query. However, you might need to add either a non-printing, high-ASCII character to force the total to the bottom or a numeric ordering key. MySQL may also have an operator (similar to WITH ROLLUP in Microsoft SQL Server) that would be more efficient than the above code, so while this works, there are probably more efficient options available to you.
MySQL supports a rollup extension to group by.
select * from parts;
+-----------+--------+
| part_name | amount |
+-----------+--------+
| upper | 100 |
| lower | 100 |
| left | 50 |
| right | 50 |
+-----------+--------+
select part_name
,sum(amount)
from parts
group
by part_name with rollup;
+-----------+-------------+
| part_name | sum(amount) |
+-----------+-------------+
| left | 50 |
| lower | 100 |
| right | 50 |
| upper | 100 |
| NULL | 300 |
+-----------+-------------+
Updated to answer comments:
The following items list some
behaviors specific to the MySQL
implementation of ROLLUP:
When you use ROLLUP, you cannot also
use an ORDER BY clause to sort the
results. In other words, ROLLUP and
ORDER BY are mutually exclusive.
However, you still have some control
over sort order. GROUP BY in MySQL
sorts results, and you can use
explicit ASC and DESC keywords with
columns named in the GROUP BY list to
specify sort order for individual
columns. (The higher-level summary
rows added by ROLLUP still appear
after the rows from which they are
calculated, regardless of the sort
order.)
My code became something like:
SELECT * FROM (...old code here... UNION ...'Total:' ... COUNT() ...) z
ORDER BY CASE WHEN z.Namn = 'Total:' THEN '2' ELSE '1' END , z.Antal DESC
I have one column named Namn and one named Antal.
If the column Namn contains the value 'Total:', the CASE expression sorts that row as '2', otherwise as '1'. That moves the 'Total:' row to the bottom even though the Antal column is ordered descending.
The magic happens because the 'Total:' row is combined with the table via UNION, and the CASE expression in the ORDER BY then puts it last.
My complete code, which works for me, is a lot messier; it also unions two tables:
SELECT * FROM (
SELECT acrclient.Client_Name AS 'Namn', COUNT(x.client) AS 'Antal'
FROM
(SELECT 'B' tab,t.client
FROM asutrans t
where t.voucher_type!='IP' AND t.last_update >= {ts '2019-01-01 00:00:00'}
UNION ALL SELECT
'C' tab,t.client
FROM asuhistr t
WHERE t.voucher_type!='IP' AND t.last_update >= {ts '2019-01-01 00:00:00'} ) x
LEFT JOIN acrclient ON x.client = acrclient.client
GROUP BY x.client, acrclient.Client_Name
UNION ALL
SELECT 'Total:', COUNT(client) FROM (SELECT 'B' tab,t.client
FROM asutrans t
where t.voucher_type!='IP' AND t.last_update >= {ts '2019-01-01 00:00:00'}
UNION ALL SELECT
'C' tab,t.client
FROM asuhistr t
WHERE t.voucher_type!='IP' AND t.last_update >= {ts '2019-01-01 00:00:00'} ) y
) z
ORDER BY CASE WHEN z.Namn = 'Total:' THEN '2' ELSE '1' END , z.Antal DESC
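The same pattern (a total row appended with UNION ALL and pushed to the bottom by a CASE expression in the ORDER BY) can be tried on the small parts table from the ROLLUP answer. This is a sketch using Python's sqlite3 module; the function name `parts_with_total` and the extra tie-break on part_name are assumptions for illustration.

```python
import sqlite3

# The parts table shown in the ROLLUP answer above.
PARTS = [("upper", 100), ("lower", 100), ("left", 50), ("right", 50)]

QUERY = """
SELECT *
FROM (
    SELECT part_name, SUM(amount) AS total
    FROM parts
    GROUP BY part_name
    UNION ALL
    SELECT 'Total:', SUM(amount)
    FROM parts
) z
ORDER BY CASE WHEN z.part_name = 'Total:' THEN 2 ELSE 1 END,
         z.total DESC,
         z.part_name
"""


def parts_with_total():
    """Return per-part sums, largest first, followed by the grand total."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE parts (part_name TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO parts VALUES (?, ?)", PARTS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in parts_with_total():
        print(row)
```

The outer SELECT matters here: the CASE expression sorts on the combined result, so the total stays last even though its value is the largest and the other rows are sorted in descending order.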