Determine the count of rows returned - sql

Okay, so I have been trying to find out whether it's possible to return how many times a particular piece of data appears in a result set:
Event_id
---------
Change
Change
Change
Problem
Task
I want to find out how many times each string value is returned and output that number; for Change, say, I would expect 3, and so on.
I was hoping this would be possible in a WHERE clause, but I have never used COUNT, so I am unsure how it all works.

This sounds like a classic use of the COUNT function combined with a GROUP BY clause; the GROUP BY should list every non-aggregated column in the select list (event_id in this case).
select event_id,
count(*)
from your_table
group by event_id
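To see the query above run end to end, here is a minimal sketch using Python's built-in sqlite3 module with the sample data from the question. The in-memory database, the `occurrences` alias, and the ORDER BY are assumptions added for illustration; the table name `your_table` is the placeholder from the answer.

```python
import sqlite3

# In-memory database holding the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (event_id TEXT)")
conn.executemany(
    "INSERT INTO your_table (event_id) VALUES (?)",
    [("Change",), ("Change",), ("Change",), ("Problem",), ("Task",)],
)

# One output row per distinct event_id, with the number of rows in each group.
counts = conn.execute(
    """
    SELECT event_id, COUNT(*) AS occurrences
    FROM your_table
    GROUP BY event_id
    ORDER BY event_id
    """
).fetchall()

for event_id, occurrences in counts:
    print(event_id, occurrences)
```

With this data, Change comes back with a count of 3, and Problem and Task with 1 each.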

Related

What does SELECT Function in SQL actually produce? Does it produce a new table by default?

I am struggling to understand what the output of SELECT is meant to be in SQL (I am using MS ACCESS), and what sort of criteria this output needs to specify, if any. As a result, I don't understand why some queries work and others don't. So I know it retrieves data from a table, does calculations with it and displays it. But I don't understand the "inner" working of SELECT function. For instance, what is the name of data structure / entity it displays? Is it a "new" table?
For example, suppose I have a table called "table_name" with 5 columns. One of the columns is called "column_3", and there are 20 records.
SELECT column_3, COUNT(*) AS Count
FROM table_name;
Why does this query fail to run? By logic, I would expect it to display two columns: first column will be "column_3", containing 20 rows with relevant data, and second column will be "Count", containing just one non-empty row (displaying 20), and other 19 rows will be empty (or NULL maybe)?
Is it because SELECT is meant to produce an equal number of rows for each column?
Your questions involve a basic understanding of SQL. SELECT statements do not create tables; they return virtual result sets. Nothing is persisted unless you turn the statement into an INSERT (or a make-table query such as SELECT ... INTO).
In your example, you need to tell the SQL engine what you want a count of. Because you selected column_3 alongside an aggregate, every row of the result must correspond to one group, so you need to group by that column:
SELECT column_3, COUNT(*) AS Count
FROM table_name
GROUP BY column_3
If you wanted a count of all the rows, simply:
SELECT COUNT(*) FROM table_name
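The difference between the two queries can be checked with a small sketch in Python's sqlite3 module. The six sample values and the `cnt` alias are made up for illustration. Note that SQLite itself happens to accept a bare column next to an aggregate without GROUP BY, so the failing query from the question is not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (column_3 TEXT)")
conn.executemany(
    "INSERT INTO table_name (column_3) VALUES (?)",
    [("a",), ("a",), ("b",), ("c",), ("c",), ("c",)],
)

# Grouped: one result row per distinct column_3 value.
per_value = conn.execute(
    "SELECT column_3, COUNT(*) AS cnt FROM table_name "
    "GROUP BY column_3 ORDER BY column_3"
).fetchall()

# Ungrouped: the whole table is a single group, so exactly one row comes back.
total = conn.execute("SELECT COUNT(*) FROM table_name").fetchone()[0]

print(per_value)
print(total)
```

Either way, the result is a rectangular set of rows: every row has a value for every selected column, which is why a 20-row column next to a 1-row count cannot be expressed.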

Counting the number of times the same record exists in a given period of time

I am trying to write a query to find out whether a record exists more than once in a given period of time and, if it does, how many times the same record is repeated.
To approach this, I have sorted the records:
select * from table_name where date = ? and date > ? order by email
I am now trying to count the number of times the same record exists, but I cannot figure out a way to do it.
Here is the problem. The image below holds the basic data structure.
Here is the expected output for a year.
The table above holds the name Xyz and email xyz@email.com three times, the name Abc and email abc@email.com two times, and the name Def and email def@email.com two times. I am trying to find out, with a single query, how many times each record is repeated in a given period of time. I was thinking of recursing over the sorted records and counting until a different record is found, but using recursion on every record seems expensive.
Is there a better solution to this problem?
Regards
Group and count.
SELECT column_to_compare1, column_to_compare2, COUNT(*)
FROM table_name
WHERE [date] BETWEEN @date1 AND @date2
GROUP BY column_to_compare1, column_to_compare2
HAVING COUNT(*) > 1 -- IF YOU WANT TO ONLY INCLUDE RECORDS WITH DUPLICATES
BETWEEN is inclusive on both ends, so adjust the bounds with DATEADD if you need an exclusive range.
You can use the COUNT function to do this.
To do this using your own query:
SELECT Name, Email, COUNT(*) AS count
FROM table_name
WHERE date BETWEEN '01/01/2005' AND '31/12/2005'
GROUP BY Name, Email
However your example query is poor so I cannot give you a better solution. Here is an example of this working: SQL Fiddle
EDIT: Updated my solution to match your expected output.
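As a rough illustration of the group-and-count approach from both answers, the sketch below runs it with Python's sqlite3 module. The sample rows, the ISO-formatted date strings (which compare correctly as text in SQLite), and the `cnt` alias are assumptions; the question's actual data was only shown as an image.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE table_name (name TEXT, email TEXT, "date" TEXT)')
conn.executemany(
    "INSERT INTO table_name VALUES (?, ?, ?)",
    [
        ("Xyz", "xyz@email.com", "2005-01-10"),
        ("Xyz", "xyz@email.com", "2005-03-15"),
        ("Xyz", "xyz@email.com", "2005-11-02"),
        ("Abc", "abc@email.com", "2005-02-01"),
        ("Abc", "abc@email.com", "2005-06-20"),
        ("Def", "def@email.com", "2005-07-07"),
        ("Def", "def@email.com", "2006-01-05"),  # outside the 2005 window
    ],
)

# Count each (name, email) pair inside the period; HAVING keeps only repeats.
duplicates = conn.execute(
    """
    SELECT name, email, COUNT(*) AS cnt
    FROM table_name
    WHERE "date" BETWEEN ? AND ?
    GROUP BY name, email
    HAVING COUNT(*) > 1
    ORDER BY name
    """,
    ("2005-01-01", "2005-12-31"),
).fetchall()

for row in duplicates:
    print(row)
```

Def appears twice overall but only once inside the 2005 window, so the HAVING clause drops it. Removing the HAVING line would list every pair with its count, including pairs that occur once.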

Get latest data for all people in a table and then filter based on some criteria

I am attempting to return the row of the highest value for timestamp (an integer) for each person (that has multiple entries) in a table. Additionally, I am only interested in rows with the field containing ABCD, but this should be done after filtering to return the latest (max timestamp) entry for each person.
SELECT table."person", max(table."timestamp")
FROM table
WHERE table."type" = 1
HAVING table."field" LIKE '%ABCD%'
GROUP BY table."person"
For some reason, I am not receiving the data I expect. The returned table is nearly twice the size of expectation. Is there some step here that I am not getting correct?
You can first build a derived table holding max(timestamp) for each person, join it back to the original table to recover the full row, and then filter in the outer query:
SELECT t."person", t."timestamp"
FROM table t
JOIN (SELECT "person", MAX("timestamp") AS max_ts FROM table GROUP BY "person") latest
  ON latest."person" = t."person" AND latest.max_ts = t."timestamp"
WHERE t."type" = 1 AND t."field" LIKE '%ABCD%'
Direct answer: as I understand your end goal, just move the HAVING clause to the WHERE section:
SELECT
table."person", MAX(table."timestamp")
FROM table
WHERE
table."type" = 1
AND table."field" LIKE '%ABCD%'
GROUP BY table."person";
This should return no more than 1 row per table."person", with their associated maximum timestamp.
As an aside, I am surprised your query worked at all. Your HAVING clause referenced a column that is neither grouped nor aggregated in your query. From the documentation (and my experience):
The fundamental difference between WHERE and HAVING is this: WHERE selects input rows before groups and aggregates are computed (thus, it controls which rows go into the aggregate computation), whereas HAVING selects group rows after groups and aggregates are computed.
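The order of filtering matters for the result, which a small sketch with Python's sqlite3 module can make concrete. The table name `readings` and the four sample rows are hypothetical (the question uses the placeholder name `table`, which is a reserved word). The first query filters rows before taking the maximum; the second finds each person's latest row first and only then applies the filters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE readings ("person" TEXT, "timestamp" INTEGER, "type" INTEGER, "field" TEXT)'
)
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?, ?)",
    [
        ("A", 1, 1, "xxABCDxx"),
        ("A", 2, 1, "other"),   # A's latest row does not contain ABCD
        ("B", 1, 1, "other"),
        ("B", 5, 1, "ABCD"),    # B's latest row does
    ],
)

# WHERE runs before grouping: the max is taken over matching rows only.
filter_first = conn.execute(
    """
    SELECT "person", MAX("timestamp")
    FROM readings
    WHERE "type" = 1 AND "field" LIKE '%ABCD%'
    GROUP BY "person"
    ORDER BY "person"
    """
).fetchall()

# Latest row per person first (derived table), then filter that row.
latest_then_filter = conn.execute(
    """
    SELECT r."person", r."timestamp"
    FROM readings AS r
    JOIN (SELECT "person", MAX("timestamp") AS max_ts
          FROM readings GROUP BY "person") AS latest
      ON latest."person" = r."person" AND latest.max_ts = r."timestamp"
    WHERE r."type" = 1 AND r."field" LIKE '%ABCD%'
    ORDER BY r."person"
    """
).fetchall()

print(filter_first)
print(latest_then_filter)
```

Person A shows up in the first result with an older timestamp, but not in the second, because A's most recent row does not match the pattern. Which behaviour is wanted depends on how the requirement in the question is read.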

In SQL, why does group by make a difference when using having count()

I have a table that stores zone_id. Sometimes a zone_id appears twice in the database. I wrote a query to show only rows whose zone_id occurs two or more times in the table.
The following query returns the correct result:
select *, count(zone_id)
from proxies.storage_used
group by zone_id desc
having count(zone_id) > 1;
However, if I group by last_updated or company_id, it returns random values. If I don't add a group by clause, it only displays one value as per the screenshot below. First output shows above query string, second output shows same query string without the 'group by' line and returns only one value:
correction: I'm a new member and thus can't post pictures directly, so I added it on minus: http://min.us/m3yrlkSMu#1o
While my query works, I don't understand why. Can somebody help me understand why group by is altering the actual output, instead of only the grouping of the output? I am using MySQL.
A GROUP BY divides the result rows into groups and applies the aggregate function to the records in each group. If you do a count(*) without a GROUP BY, you get a single count of all rows in the table: since you didn't specify a grouping, there is only one group, containing every record. If you do a count(*) grouped by zone id, you get a count of how many records there are for each zone id. If you group by both zone id and last updated date, you get a count of how many rows were updated on each date in each zone.
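The effect of different groupings can be compared side by side in a short sketch using Python's sqlite3 module. The four sample rows and the simplified table name `storage_used` (without the `proxies.` schema prefix) are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE storage_used (zone_id INTEGER, company_id INTEGER, last_updated TEXT)"
)
conn.executemany(
    "INSERT INTO storage_used VALUES (?, ?, ?)",
    [
        (1, 10, "2020-01-01"),
        (1, 10, "2020-01-02"),
        (2, 10, "2020-01-01"),
        (3, 20, "2020-01-03"),
    ],
)

# No GROUP BY: the whole table is one group, so a single count comes back.
all_rows = conn.execute("SELECT COUNT(*) FROM storage_used").fetchall()

# One group per zone_id; HAVING keeps only zones that occur more than once.
repeated_zones = conn.execute(
    "SELECT zone_id, COUNT(*) FROM storage_used "
    "GROUP BY zone_id HAVING COUNT(*) > 1 ORDER BY zone_id"
).fetchall()

# Grouping by a different column forms different groups and different counts.
per_company = conn.execute(
    "SELECT company_id, COUNT(*) FROM storage_used "
    "GROUP BY company_id ORDER BY company_id"
).fetchall()

# Grouping by two columns: one group per (zone_id, last_updated) pair.
per_zone_and_date = conn.execute(
    "SELECT zone_id, last_updated, COUNT(*) FROM storage_used "
    "GROUP BY zone_id, last_updated ORDER BY zone_id, last_updated"
).fetchall()

print(all_rows, repeated_zones, per_company, per_zone_and_date, sep="\n")
```

The same HAVING condition would therefore test a different count depending on which columns define the groups, which is why changing the GROUP BY changes which rows appear and not just how they are arranged.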
Without a GROUP BY clause, everything is placed in the same group, so you get a single result. If there is more than one row in your table, then the HAVING condition will succeed. So you'll end up counting all the rows in your table...
source
From what I got, you could create a query with having and without group by only in two situations:
You have a where clause, and you want to test a condition on an aggregation of all rows that satisfy that clause.
Same as above, but for all rows in your table (in practice, it doesn't make sense, though).

Getting the first of a GROUP BY clause in SQL

I'm trying to implement single-column regionalization for a Rails application and I'm running into some major headaches with a complex SQL need. For this system, a region can be represented by a country code (e.g. us), an uppercase continent code (e.g. NA), or NULL, indicating the "default" information. I need to group these items by some relevant information such as a foreign key (we'll call it external_id).
Given a country and its continent, I need to be able to select only the most specific region available. So if records exist with the country code, I select them. If not, I want the records with the continent code. Failing that, I want the records with a NULL code so I can receive the default values.
So far I've figured that I may be able to use a generated CASE statement to get an arbitrary sort order. Something like this:
SELECT *, CASE region
WHEN 'us' THEN 1
WHEN 'NA' THEN 2
ELSE 3
END AS region_sort
FROM my_table
WHERE region IN ('us','NA') OR region IS NULL
GROUP BY external_id
ORDER BY region_sort
The problem is that without an aggregate function the actual data returned by the GROUP BY for a given row seems to be untameable. How can I massage this query to make it return only the first record of the region_sort ordered groups?
Only the first record or only the first group of records? It's not even clear from what you've written whether there is more than one record or not.
In any case, it seems you are bending over backwards to do this in one query, but the database structure is not optimized for this. If there's just 3 levels and you want the most specific, why not just:
SELECT * FROM my_table WHERE region = 'us' GROUP BY external_id
If that returns something, then you stop; otherwise you run 1 or 2 more queries conditionally.
I could be wrong, but my instinct says that will yield much better overall performance, though I suppose it depends on the particulars of your DB.
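The conditional-fallback idea from this answer could be sketched as follows with Python's sqlite3 module. The helper name `most_specific_rows`, the `value` column, the sample rows, and the choice to look up one external_id at a time are all assumptions made for illustration; the answer itself does not specify them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (external_id INTEGER, region TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [
        (1, "us", "us value"),
        (1, "NA", "continent value"),
        (1, None, "default value"),
        (2, "NA", "continent value"),
        (2, None, "default value"),
        (3, None, "default value"),
    ],
)


def most_specific_rows(conn, external_id, country, continent):
    """Hypothetical helper: try country, then continent, then the NULL default."""
    select = (
        "SELECT external_id, region, value FROM my_table "
        "WHERE external_id = ? AND "
    )
    for region in (country, continent):
        rows = conn.execute(select + "region = ?", (external_id, region)).fetchall()
        if rows:
            return rows  # stop at the first (most specific) level that has data
    # NULL never matches "= ?", so the default level needs IS NULL.
    return conn.execute(select + "region IS NULL", (external_id,)).fetchall()


for ext_id in (1, 2, 3):
    print(most_specific_rows(conn, ext_id, "us", "NA"))
```

This issues at most three small queries per lookup. Whether that beats a single ranked query depends on the database and the indexes available, as the answer itself notes.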