In SQL, why does group by make a difference when using having count() - sql

I have a table that stores zone_id. Sometimes a zone id is twice in the database. I wrote a query to show only entries that have two or more entries of the same zone_id in the table.
The following query returns the correct result:
select *, count(zone_id)
from proxies.storage_used
group by zone_id desc
having count(zone_id) > 1;
However, if I group by last_updated or company_id, it returns random values. If I don't add a group by clause, it only displays one value as per the screenshot below. First output shows above query string, second output shows same query string without the 'group by' line and returns only one value:
correction: I'm a new member and thus can't post pictures directly, so I added it on minus: http://min.us/m3yrlkSMu#1o
While my query works, I don't understand why. Can somebody help me understand why group by is altering the actual output, instead of only the grouping of the output? I am using MySQL.

A group by divides the resulting rows into groups and performs the aggregate function on the records in each group. If you do a count(*) without a group by you will get a single count of all rows in a table. Since you didn't specify a group by there is only one group, all records in the table. If you do a count(*) with a group by of zone id, you will get a count of how many records there are for each zone id. If you do a count(*) of zone id and last updated date, you will get a count of how many rows were updated on each date in each zone.

Without a group by clause, everything is stored in the same group, so you get a single result. If there are more than one row in your table, then the having will succeed. So, you'll end up counting all the rows in your table...
source
From what I got, you could create a query with having and without group by only in two situations:
You have a where clause, and you want to test a condition on an aggregation of all rows that satisfy that clause.
Same as above, but for all rows in your table (in practice, it doesn't make sense, though).

Related

Counting the number of times same record exist in a given period of time

I am trying to write a query to find out whether a record exist more than one or not in a given period of time. And even if it exist, how many times the same record has been repeated.
Now to solve this issue, I have sorted the records.
select * from table_name where date = ? and date > ? order by email
And trying to count the number of times the same record exist.But I am not able to figure out a way to count the number of times the same record exists.
Here is a problem.The image below holds the basic data structure.
Here is the expected output for a year
The table above holds Xyz name and xyz#email.com data three times. And the name Abc and email abc#email.com two times and the third record name Def and email def#email.com two times. Now what I am trying to figure out a way to find out the number of times each records are being repeated in a given period of time using a single query. I am thinking to make use of recursion on a record and count till it didn't find a different record after sorting it. But using recursion on every records seems expensive.
Is there a better solution to solve this problem ?
Regards
Group and count.
SELECT column_to_compare1, column_to_compare2, COUNT(*)
FROM table_name
WHERE [date] BETWEEN #date1 AND #date2
GROUP BY column_to_compare1, column_to_compare2
HAVING COUNT(*) > 1 -- IF YOU WANT TO ONLY INCLUDE RECORDS WITH DUPLICATES
Between is inclusive, so you can adjust your dates with DATEADD if you really want between.
You can use the COUNT function to do this.
To do this using your own query:
SELECT Name, Email, COUNT(*) AS count
FROM table_name
WHERE date BETWEEN '01/01/2005' AND '31/12/2005'
GROUP BY Name, Email
However your example query is poor so I cannot give you a better solution. Here is an example of this working: SQL Fiddle
EDIT: Updated my solution to match you expected output.

Get latest data for all people in a table and then filter based on some criteria

I am attempting to return the row of the highest value for timestamp (an integer) for each person (that has multiple entries) in a table. Additionally, I am only interested in rows with the field containing ABCD, but this should be done after filtering to return the latest (max timestamp) entry for each person.
SELECT table."person", max(table."timestamp")
FROM table
WHERE table."type" = 1
HAVING table."field" LIKE '%ABCD%'
GROUP BY table."person"
For some reason, I am not receiving the data I expect. The returned table is nearly twice the size of expectation. Is there some step here that I am not getting correct?
You can 1st return a table having max(timestamp) and then use it in sub query of another select statement, following is query
SELECT table."person", timestamp FROM
(SELECT table."person",max(table."timestamp") as timestamp, type, field FROM table GROUP BY table."person")
where type = 1 and field LIKE '%ABCD%'
Direct answer: as I understand your end goal, just move the HAVING clause to the WHERE section:
SELECT
table."person", MAX(table."timestamp")
FROM table
WHERE
table."type" = 1
AND table."field" LIKE '%ABCD%'
GROUP BY table."person";
This should return no more than 1 row per table."person", with their associated maximum timestamp.
As an aside, I surprised your query worked at all. Your HAVING clause referenced a column not in your query. From the documentation (and my experience):
The fundamental difference between WHERE and HAVING is this: WHERE selects input rows before groups and aggregates are computed (thus, it controls which rows go into the aggregate computation), whereas HAVING selects group rows after groups and aggregates are computed.

SQL Developer: Select query results has the expected row but doesn't display it on the grid

I am using SQL developer 3.2.2 to query an Oracle 12 database. I have a select query where I am expecting a certain row to be picked by the query. The results of this query is moved to a global temp table for further processing. But when I query the newly created temp table for the row mentioned above using its key, the query doesn't find the row.
I initially thought that my query had a problem and it wasn't picking up the row at the first place and was debugging the query. But when I ran the query separately on SQL developer and looked for the row by applying a filter on the key column, it shows the row. But when I sort the key column and manually go look for the row in the grid, I don't see the row. I believe it is the same reason why this particular row isn't copied over to the temp table. This is happening to quite a few rows in the database. Has anyone experienced this problem before?
The query is a simple one and has just two columns UserID and LocationID. The query does a union on multiple sub-queries.
select distinct * from (
SELECT distinct UserID, LocationID
FROM TRANSACTION
WHERE "Deleted" = 0 and "TransactionType" in ('E1513','E1514')
AND "Date" <= '31-DEC-2016'
UNION
SELECT distinct UserID, LocationID
FROM FORMHIS
WHERE "FormID" in ('358465','358455')
AND "Date" <= '31-DEC-2016'
)
The output of the above query is missing few rows that I am sure should be in results.

Column value divided by row count in SQL Server

What happens when each column value in a table is divided with the total table row count. What function is basically performed by sql server? Can any one help?
More specifically: what is the difference between sum(column value ) / row count and column value/ row count. for e.g,
select cast(officetotal as float) /count(officeid) as value,
sum(officetotal)/ count(officeid) as average from check1
where officeid ='50009' group by officeid,officetotal
What is the operation performed on both select?
In your example both will be allways the same value because count(officeid) is allways equal to 1 because officeid is contained in the WHERE clause and officetotal is also contained in GROUP BY clause. So the example will not work because no grouping will be applied.
When you remove officetotal from the GROUP BY, you will get following message:
Column 'officetotal' is invalid in the select list because it is not
contained in either an aggregate function or the GROUP BY clause.
It means that you cannot use officetotal and SUM(officetotal) in one select - because SUM is meant to work for set of values and it is pointless to SUM only one value.
It is just not possible to write it this way in SQL using GROUP BY. If you look for something like first or last value from a group, you will have to use MIN(officetotal) or MAX(officetotal) or some other approach.

Assign an ID Value for Every Set of Duplicates

How can i generate an ID value for every set of duplicate records as seen in the second table with ID column? In other words, how can I let the first table to look like the second table using SQL query?
Assume that first name and last name in the first table can appear in duplicates.
Each first name and last name can have one or many purchase yr and cost.
The given image is just a sample. Total records in table 1 can reach thousands.
I'm using Oracle SQL.
Note: I'm working with one table only that is the first one. The second table is what I want.
You can use the DENSE_RANK analytic function to assign ID's as below:
EDIT:
Simplified query to generate ID's.
SELECT
DENSE_RANK() OVER (ORDER BY First_Name, Last_Name) ID,
t.*
FROM Table1 t;
Reference:
DENSE_RANK on Oracle Database SQL Reference