Counting occurrences of each distinct element in a table - sql

I am writing a log viewer app in ASP.NET / C#. There is a report window where it will be possible to check some information about the whole database. One piece of information I want to display there is the number of times each generator (an entity in my domain, not a Firebird sequence) appears in the table. How do I do that using COUNT?
Do I have to:
1. Gather the key for each different generator
2. Run one query for each generator key using COUNT
3. Display it somehow
Is there any way that I can do it without having to do two queries to the database? The database size can be HUGE, and having to query it "X" times where "X" is the number of generators would just suck.
I am using a Firebird database. Is there any way to fetch this information from a metadata schema, or is there no such thing available?
Basically, what I want is to count the occurrences of each generator in the table. The result would be something like: Generator A: 10 times, Generator B: 7 times, Generator C: 0 times, and so on.

If I understand your question correctly, it is a simple matter of using the GROUP BY clause, e.g.:
select
key,
count(*)
from generators
group by key;

Something like the query below should be sufficient (depending on your exact structure and requirements):
SELECT KEY, COUNT(*)
FROM YOUR_TABLE
GROUP BY KEY

I solved my problem using this simple query:
SELECT GENERATOR_,count(*)
FROM EVENTSGENERAL GROUP BY GENERATOR_;
Thanks to those who helped me.
It took me 8 hours to come back and post the answer because of the StackOverflow limitation on answering my own question, which is based on my reputation.
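One thing the GROUP BY query above does not cover is the "Generator C: 0 times" case from the question: a generator with no rows in the events table simply does not show up. A minimal sketch of one way to get the zeros, assuming there is a separate table listing all generators (the GENERATORS table here is hypothetical, and in-memory SQLite stands in for Firebird):

```python
import sqlite3

# In-memory SQLite stands in for Firebird; the SQL used is plain standard SQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- GENERATORS is a hypothetical lookup table, not part of the original post.
    CREATE TABLE GENERATORS (GENERATOR_ TEXT PRIMARY KEY);
    CREATE TABLE EVENTSGENERAL (ID INTEGER PRIMARY KEY, GENERATOR_ TEXT);
    INSERT INTO GENERATORS VALUES ('A'), ('B'), ('C');
    INSERT INTO EVENTSGENERAL (GENERATOR_) VALUES ('A'), ('A'), ('A'), ('B'), ('B');
""")

# Plain GROUP BY: one row per generator that appears at least once.
grouped = conn.execute(
    "SELECT GENERATOR_, COUNT(*) FROM EVENTSGENERAL "
    "GROUP BY GENERATOR_ ORDER BY GENERATOR_"
).fetchall()

# LEFT JOIN from the lookup table: COUNT(e.ID) counts only matched rows,
# so generators without any event come back with 0.
with_zeros = conn.execute(
    "SELECT g.GENERATOR_, COUNT(e.ID) "
    "FROM GENERATORS g "
    "LEFT JOIN EVENTSGENERAL e ON e.GENERATOR_ = g.GENERATOR_ "
    "GROUP BY g.GENERATOR_ ORDER BY g.GENERATOR_"
).fetchall()

print(grouped)
print(with_zeros)
```

Either way it is a single round trip to the database, regardless of how many generators there are.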

Related

Why does the number of returned samples where name='keyword' not match the number of observed samples with 'keyword' in the table?

I have a Postgres table whose header is [id(uuid), name(str), arg_name(str), measurements(list), run_id(uuid), parent_id(uuid)] with a total of 237K entries.
When I want to filter for specific measurements I can use 'name', but for the majority of entries in the table 'name' == 'arg_name' and thus map to the same sample.
In my peculiar case I am interested in retrieving samples whose 'name'='TimeM12nS' and whose 'arg_name'='Time'. These two attributes point to the same samples when visually inspecting the table through PgAdmin. That is to say all entries which have arg_name='Time' also have the name='TimeM12nS' and vice-versa.
It's obvious there's a problem because the quantity of returned samples is not the same. I first noticed the problem using the Django ORM, but the problem is also present when I query the DB using PgAdmin.
SELECT *
FROM TableA
WHERE name='TimeM12nS'
returns 301 entries (name='TimeM12nS' and arg_name='Time' in all cases)
BUT the query:
SELECT *
FROM TableA
WHERE arg_name='Time'
returns 3945 (name='TimeM12nS' and arg_name='Time' in all cases)
I am completely stumped. Can anyone shed some light on what's happening here?
EDIT:
I should add that the query by 'arg_name' returns the 301 entries that are returned when querying by 'name'
First let me say thank you to everyone who pitched in ideas to solve this conundrum and especially to JGH for the solution (found in the comments of the original post).
Indeed, the problem was an indexing issue. After re-indexing, the queries return the same number of entries, 3945, as expected.
In Postgres, re-indexing a table can be done through pgAdmin by navigating to Databases > 'database_name' > Schemas > Tables, then right-clicking on the table_name, selecting Maintenance, and pressing the REINDEX button,
or more simply by running the following command:
REINDEX TABLE table_name
Postgres re-indexing docs
Without access to the database, it's not possible to give a definitive answer. All I can provide is the next query that I would use in this case.
SELECT COUNT(*), LENGTH(name), name, arg_name
FROM TableA
WHERE arg_name='Time'
GROUP BY name, arg_name;
This should show you any differences in the name column that you aren't able to see. The length of that string could also be informative.
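To illustrate what that diagnostic query can reveal, here is a small sketch on made-up data. It assumes a trailing space in one of the names (purely hypothetical; in the thread above the actual cause turned out to be the index), and uses in-memory SQLite rather than Postgres:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableA (name TEXT, arg_name TEXT)")

# Made-up rows: the third name carries an invisible trailing space.
conn.executemany(
    "INSERT INTO TableA VALUES (?, ?)",
    [("TimeM12nS", "Time"), ("TimeM12nS", "Time"), ("TimeM12nS ", "Time")],
)

# Same shape as the diagnostic query: group by the visible columns and
# show the string length so look-alike values separate into distinct rows.
rows = conn.execute(
    "SELECT COUNT(*), LENGTH(name), name, arg_name "
    "FROM TableA "
    "WHERE arg_name = 'Time' "
    "GROUP BY name, arg_name "
    "ORDER BY LENGTH(name)"
).fetchall()

for row in rows:
    print(row)
```

If the names really are identical, the query returns a single group, which points away from the data and towards something like a damaged index.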

How to get the data of the newly accessed record by a query on PostgreSQL using its internal variables and functions?

Let's say I have the following 'items' table in my PostgreSQL database:
id  item  value
1   a     10
2   b     20
3   c     30
For some reason I can't control I need to run the following query:
select max(value) from items;
which will return 30 as the result.
At this point, I know that I can find the record that contains that value using simple select statements, etc. That's not the actual problem.
My real questions are:
Does PostgreSQL know (behind the scenes) what the ID of that record is, although the query shows only the max value of the column 'value'?
If yes, can I have access to that information and, therefore, get the ID and other data from the found record?
I'm not allowed to create indexes and sequences, or change the way the max value is retrieved. That's a given. I need to work from that point onward and find a solution (which I have, actually, from regular query work).
I'm just guessing that the database knows in which record that information (30) is and that I could have access to it.
I've been searching for an answer for a couple of hours but wasn't able to find anything.
What am I missing? Any ideas?
Note: postgres (PostgreSQL) 12.5 (Ubuntu 12.5-0ubuntu0.20.10.1)
You can simply extract the whole record that contains max(value) w/o bothering about Postgres internals like this:
select id, item, "value"
from items
order by "value" desc
limit 1;
I do not think that using undocumented "behind the scenes" ways is a good idea at all. The planner is smart enough to do exactly what you need w/o extra work.
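As a quick check of the approach above, here is a sketch that runs it against the sample 'items' table. It uses in-memory SQLite instead of PostgreSQL 12.5 (an assumption made only to keep it self-contained), and also shows a subquery variant that returns every row tied for the maximum:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (id INTEGER PRIMARY KEY, item TEXT, "value" INTEGER);
    INSERT INTO items VALUES (1, 'a', 10), (2, 'b', 20), (3, 'c', 30);
""")

# ORDER BY ... LIMIT 1: the whole record holding the maximum value.
top = conn.execute(
    'SELECT id, item, "value" FROM items ORDER BY "value" DESC LIMIT 1'
).fetchone()

# Variant: keep max(value) as-is and use it as a filter, which returns
# all rows sharing the maximum if there are ties.
ties = conn.execute(
    'SELECT id, item, "value" FROM items '
    'WHERE "value" = (SELECT MAX("value") FROM items)'
).fetchall()

print(top)
print(ties)
```

The second form may be the closer fit when the max(value) query itself cannot be changed, since it only wraps it.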

Query to Find Adjacent Date Records

There exists in my database a page_history table; the idea is that whenever a record in the page table is changed, that record's old values are stored in the history table.
My job now is to find occasions in which a record was changed, and retrieve the pre- and post-conditions of that change. Specifically, I want to know when a page changed groups, and what groups were involved in the change. The query I have below can find these instances, but with the use of the min function, I can only get back the values that match between the two records:
select page_id,
       original_group,
       min(created2) change_date
  from (select h.page_id,
               h.group_id original_group,
               i.group_id new_group,
               h.created_dttm created1,
               i.created_dttm created2
          from page_history h,
               page_history i
         where h.page_id = i.page_id
           and h.created_dttm < i.created_dttm
           and h.group_id != i.group_id)
 group by page_id, original_group, created1
 order by page_id
When I try to get, say, any details of the second record, like new_group, I'm hit with an ORA-00979: not a GROUP BY expression error. I don't want to group by new_group, though, because that's going to destroy the logic (I think it would find records displaying times a page changed from a group to another group, regardless of any changes to other groups in between).
My question, then, is how can I modify this query, or go about writing a new one, that achieves a similar end, but with the added availability of columns that do not match between the two records? In essence, how can I find that min record without sacrificing all the other columns I'm not trying to compare? I don't exactly need a complete answer, any suggestions that point me in the right direction would be appreciated.
I use PL/SQL Developer, and it looks like version 11.2.0.2.0 of Oracle.
EDIT: I have found a solution. It's not pretty, and I'd still like to see some alternatives, but if helping me out would threaten to explode your brain, I would advise relocating to an easier question.
Without seeing your table structure it's hard to rewrite the query, but when a min function is used like that, it is usually better to put it into a separate subselect to get the value you want and then compare against that result.
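A rough sketch of that idea, under some assumptions: the table has only the three columns used in the question (page_id, group_id, created_dttm), the sample rows are invented, and in-memory SQLite stands in for Oracle 11.2. The min goes into a correlated subselect that picks the immediately following history row, and because the outer query is no longer grouped, columns such as new_group stay available:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE page_history (page_id INTEGER, group_id INTEGER, created_dttm TEXT);
    -- Invented sample data: page 1 moves 10 -> 20 -> 30, page 2 never changes.
    INSERT INTO page_history VALUES
        (1, 10, '2011-01-01'),
        (1, 10, '2011-01-05'),
        (1, 20, '2011-02-01'),
        (1, 30, '2011-03-01'),
        (2, 10, '2011-01-10');
""")

# For each row h, the subselect finds the earliest later row of the same page;
# i is joined on exactly that timestamp, so no GROUP BY is needed and every
# column of both rows can be selected.
changes = conn.execute("""
    SELECT h.page_id,
           h.group_id     AS original_group,
           i.group_id     AS new_group,
           i.created_dttm AS change_date
      FROM page_history h
      JOIN page_history i
        ON i.page_id = h.page_id
       AND i.created_dttm = (SELECT MIN(n.created_dttm)
                               FROM page_history n
                              WHERE n.page_id = h.page_id
                                AND n.created_dttm > h.created_dttm)
     WHERE h.group_id <> i.group_id
     ORDER BY h.page_id, i.created_dttm
""").fetchall()

for row in changes:
    print(row)
```

Note that this compares each row with its direct successor, which is slightly different from the original query (that one pairs a row with the first later row in a different group), so check it against your own definition of a "change".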

Is there a way to find all distinct values for multiple columns in one query?

I would really appreciate a bit of help/pointers on the following problem.
Background Info:
Database version: Oracle 9i
Java version: 1.4.2
The problem
I have a database table with multiple columns representing various meta data about a document.
E.g.:
CREATE TABLE mytable
(
document_id integer,
filename varchar(255),
added_date date,
created_by varchar(32),
....
)
Due to networking/latency issues between a webserver and database server, I would like to minimise the number of queries made to the database.
The documents are listed in a web page, but there are thousands of different documents.
To aid navigation, we provide filters on the web page to select just documents matching a certain value - e.g. created by user 'joe bloggs' or created on '01-01-2011'. Also, paging is provided so triggering a db call to get the next 50 docs or whatever.
The web pages themselves are kept pretty dumb - they just present what's returned by a java servlet. Currently, these filters are each provided with their distinct values through separate queries for distinct values on each column.
This is taking quite a long time due to networking latency and the fact it means 5 extra queries.
My Question
I would like to know if there is a way to get this same information in just one query?
For example, is there a way to get distinct results from that table in a form like:
DistinctValue Type
01-01-2011 added_date
01-02-2011 added_date
01-03-2011 added_date
Joe Bloggs created_by
AN Other created_by
.... ...
I'm guessing one issue with the above is that the datatypes are different across the columns, so dates and varchars could not both be returned in a "DistinctValue" column.
Is there a better/standard approach to this problem?
Many thanks in advance.
Jay
Edit
As I mentioned in a comment below, I thought of a possibly more memory/load effective approach that removes the original requirement to join the queries up -
I imagine another way it could work is instead of populating the drop-downs initially, have them react to a user typing and then have a "suggester" style drop-down appear of just those distinct values that match the entered text. I think this would mean a) keeping the separate queries for distinct values, but b) only running the queries individually as needed, and c) reducing the resultset by filtering the unique values on the user's text.
This query will return an output as you describe above:
SELECT DocumentID As DocumentID, 'FileName' As AttributeType, FileName As DistinctValue
FROM TableName
UNION
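-- TO_CHAR so that every branch returns a string; UNION needs matching column types
SELECT DocumentID, 'Added Date', TO_CHAR(Added_date, 'DD-MM-YYYY') FROM TableName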
UNION
SELECT DocumentID, 'Created By', created_by FROM TableName
UNION
....
If you have the privilege you could create a view using this SQL and you could use it for your queries.
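If only the distinct values per column are needed (without the document id), the same UNION idea can be sketched as below. This is an illustration on invented rows using in-memory SQLite, not Oracle 9i; dates are stored as text here, whereas on Oracle the date column would need something like TO_CHAR so that all branches share one type:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (
        document_id INTEGER,
        filename    TEXT,
        added_date  TEXT,   -- stored as text in this sketch
        created_by  TEXT
    );
    INSERT INTO mytable VALUES
        (1, 'a.doc', '2011-01-01', 'Joe Bloggs'),
        (2, 'b.doc', '2011-01-01', 'AN Other'),
        (3, 'c.doc', '2011-01-02', 'Joe Bloggs');
""")

# One round trip: each branch yields the distinct values of one column,
# tagged with the column name so the caller can split them apart again.
rows = conn.execute("""
    SELECT DISTINCT added_date AS distinct_value, 'added_date' AS type FROM mytable
    UNION ALL
    SELECT DISTINCT created_by, 'created_by' FROM mytable
    ORDER BY type, distinct_value
""").fetchall()

# Regroup on the application side: one list of values per filter drop-down.
filters = {}
for value, column in rows:
    filters.setdefault(column, []).append(value)

print(filters)
```

The servlet would then issue one query and fill all filter drop-downs from the regrouped result.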
Due to networking/latency issues between a webserver and database server, I would like to minimise the number of queries made to the database.
The documents are listed in a web page, but there are thousands of different documents.
You may want to look into Lucene. Whenever I see "minimise queries to db" combined with "searching documents", this is what I think of. I've used it with very good success, and it can be used in read-only or updating environments. Oracle's answer is Oracle Text, but (to me anyway) it's a bit of a bear to set up and use. It depends on your company's technical resources and strengths.
Anyway, sure beats the heck out of multiple queries to the db for each connection.

Best way to join unique month and year from db in rails 3 ( or otherwise )

I am trying to figure out a nice way of doing this and thought maybe there is a nicer way in the newer Rails 3.0 ActiveRecord query.
I have a bunch of Posts that have a published_at field.
Now I want to present an Archive in the sidebar listing every unique month and year that contains posts. What's the best way to do this while avoiding heavy hits on the DB on every pageload? Suggestions?
You need a query along the lines of select distinct date_format(published_at, '%m %y'), count(id) from posts group by 1. It's a trivial matter to convert this to AR syntax.
RE: pageload
Run the query for the archive and cache the result using either query caching or fragment caching.
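For reference, here is a small sketch of the grouping query itself on invented posts. It runs on in-memory SQLite, so it uses strftime where the MySQL query above uses date_format; the table layout (id, published_at) is assumed from the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, published_at TEXT);
    INSERT INTO posts (published_at) VALUES
        ('2011-01-03 10:00:00'),
        ('2011-01-20 09:30:00'),
        ('2011-02-14 18:00:00');
""")

# One row per (year, month) that has posts, newest first, with a post count.
archive = conn.execute("""
    SELECT strftime('%Y', published_at) AS year,
           strftime('%m', published_at) AS month,
           COUNT(id)                    AS post_count
      FROM posts
     GROUP BY year, month
     ORDER BY year DESC, month DESC
""").fetchall()

for year, month, post_count in archive:
    print(year, month, post_count)
```

The result is small (one row per month), which is what makes it a good candidate for the caching suggested above.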