Why does the number of returned samples where name='keyword' not match the number of observed samples with 'keyword' in the table? - sql

I have a Postgres table whose header is [id(uuid), name(str), arg_name(str), measurements(list), run_id(uuid), parent_id(uuid)] with a total of 237K entries.
When I want to filter for specific measurements I can use 'name', but for the majority of entries in the table 'name' == 'arg_name' and thus map to the same sample.
In my particular case I am interested in retrieving samples whose 'name'='TimeM12nS' and whose 'arg_name'='Time'. These two attributes point to the same samples when visually inspecting the table through PgAdmin. That is to say, all entries which have arg_name='Time' also have name='TimeM12nS' and vice versa.
It's obvious there's a problem because the number of returned samples is not the same. I first noticed the problem using the Django ORM, but it is also present when I query the DB using PgAdmin.
SELECT *
FROM TableA
WHERE name='TimeM12nS'
returns 301 entries (name='TimeM12nS' and arg_name='Time' in all cases)
BUT the query:
SELECT *
FROM TableA
WHERE arg_name='Time'
returns 3945 (name='TimeM12nS' and arg_name='Time' in all cases)
I am completely stumped. Can anyone shed some light on what's happening here?
EDIT:
I should add that the results of the query by 'arg_name' include the 301 entries that are returned when querying by 'name'.

First let me say thank you to everyone who pitched in ideas to solve this conundrum and especially to JGH for the solution (found in the comments of the original post).
Indeed, the problem was an indexing issue. After re-indexing, both queries return the same number of entries (3945), as expected.
In Postgres, re-indexing a table can be done through pgAdmin by navigating to Databases > 'database_name' > Schemas > Tables, then right-clicking on the table_name, selecting Maintenance and pressing the REINDEX button,
or more simply by running the following command:
REINDEX TABLE table_name
Postgres Re-Indexing Docs

Without access to the database, it's not possible to give a definitive answer. All I can provide is the next query that I would use in this case.
SELECT COUNT(*), LENGTH(name), name, arg_name
FROM TableA
WHERE arg_name='Time'
GROUP BY name, arg_name;
This should show you any differences in the name column that you aren't able to see. The length of that string could also be informative.
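As a rough illustration of what that diagnostic query surfaces, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for Postgres (an assumption made only so the example is self-contained). The sample rows are invented: two of them carry a trailing space in name, which is invisible in most table viewers but shows up as a separate group with a different length.

```python
import sqlite3

# In-memory SQLite database standing in for the Postgres table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableA (name TEXT, arg_name TEXT)")

# Invented sample data: two rows have a trailing space in 'name'.
rows = [("TimeM12nS", "Time")] * 3 + [("TimeM12nS ", "Time")] * 2
conn.executemany("INSERT INTO TableA VALUES (?, ?)", rows)

# Same shape as the query in the answer, with an ORDER BY for stable output.
query = """
    SELECT COUNT(*), LENGTH(name), name, arg_name
    FROM TableA
    WHERE arg_name = 'Time'
    GROUP BY name, arg_name
    ORDER BY LENGTH(name)
"""
groups = conn.execute(query).fetchall()

for count, length, name, arg_name in groups:
    # repr() makes trailing whitespace visible in the printed name.
    print(count, length, repr(name), arg_name)
```

If every row really holds the same name, the query returns a single group. More than one group points at a data difference rather than an index problem.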

Related

How to get the data of the newly accessed record by a query on PostgreSQL using its internal variables and functions?

Let's say I have the following 'items' table in my PostgreSQL database:
id | item | value
---+------+------
1  | a    | 10
2  | b    | 20
3  | c    | 30
For some reason I can't control I need to run the following query:
select max(value) from items;
which will return 30 as the result.
At this point, I know that I can find the record that contains that value using simple select statements, etc. That's not the actual problem.
My real questions are:
Does PostgreSQL know (behind the scenes) what the ID of that record is, although the query shows only the max value of the column 'value'?
If yes, can I have access to that information and, therefore, get the ID and other data from the found record?
I'm not allowed to create indexes and sequences, or change the way the max value is retrieved. That's a given. I need to work from that point onward and find a solution (which I have, actually, from regular query work).
I'm just guessing that the database knows in which record that information (30) is and that I could have access to it.
I've been searching for an answer for a couple of hours but wasn't able to find anything.
What am I missing? Any ideas?
Note: postgres (PostgreSQL) 12.5 (Ubuntu 12.5-0ubuntu0.20.10.1)
You can simply extract the whole record that contains max(value) w/o bothering about Postgres internals like this:
select id, item, "value"
from items
order by "value" desc
limit 1;
I do not think that using undocumented "behind the scenes" ways is a good idea at all. The planner is smart enough to do exactly what you need w/o extra work.
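A minimal sketch of this approach, run against the three sample rows from the question. It uses Python's built-in sqlite3 module instead of PostgreSQL (an assumption made only to keep the example self-contained); the ORDER BY ... LIMIT 1 statement itself is the one shown above.

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE items (id INTEGER, item TEXT, "value" INTEGER)')
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [(1, "a", 10), (2, "b", 20), (3, "c", 30)],
)

# The aggregate returns only the value, not the row it came from.
max_value = conn.execute('SELECT max("value") FROM items').fetchone()[0]

# Sorting descending and keeping one row returns the whole record.
top_row = conn.execute(
    'SELECT id, item, "value" FROM items ORDER BY "value" DESC LIMIT 1'
).fetchone()

print(max_value)  # 30
print(top_row)    # (3, 'c', 30)
```

Note that if several rows share the maximum value, LIMIT 1 returns just one of them.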

Query to Find Adjacent Date Records

There exists in my database a page_history table; the idea is that whenever a record in the page table is changed, that record's old values are stored in the history table.
My job now is to find occasions in which a record was changed, and retrieve the pre- and post-conditions of that change. Specifically, I want to know when a page changed groups, and what groups were involved in the change. The query I have below can find these instances, but with the use of the min function, I can only get back the values that match between the two records:
select page_id,
       original_group,
       min(created2) change_date
  from (select h.page_id,
               h.group_id original_group,
               i.group_id new_group,
               h.created_dttm created1,
               i.created_dttm created2
          from page_history h,
               page_history i
         where h.page_id = i.page_id
           and h.created_dttm < i.created_dttm
           and h.group_id != i.group_id)
 group by page_id, original_group, created1
 order by page_id
When I try to get, say, any details of the second record, like new_group, I'm hit with an ORA-00979: not a GROUP BY expression error. I don't want to group by new_group, though, because that's going to destroy the logic (I think it would find records displaying times a page changed from a group to another group, regardless of any changes to other groups in between).
My question, then, is how can I modify this query, or go about writing a new one, that achieves a similar end, but with the added availability of columns that do not match between the two records? In essence, how can I find that min record without sacrificing all the other columns I'm not trying to compare? I don't exactly need a complete answer, any suggestions that point me in the right direction would be appreciated.
I use PL/SQL Developer, and it looks like version 11.2.0.2.0 of Oracle.
EDIT: I have found a solution. It's not pretty, and I'd still like to see some alternatives, but if helping me out would threaten to explode your brain, I would advise relocating to an easier question.
Without seeing your table structure it's hard to rewrite the query, but when a min function is used like that, it invariably seems better to move it into a separate sub-select that finds the value you want, and then compare against the result of that sub-select.
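One way to read that suggestion, sketched with Python's built-in sqlite3 module rather than Oracle (an assumption, as are the three-column table layout and the sample rows): the MIN moves into a correlated sub-select that finds the next history row for the same page, so the outer query can select any column of either row without a GROUP BY.

```python
import sqlite3

# In-memory SQLite database standing in for Oracle (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE page_history (page_id INTEGER, group_id INTEGER, created_dttm TEXT)"
)
# Invented sample data: page 1 changes group twice, page 2 never changes.
conn.executemany(
    "INSERT INTO page_history VALUES (?, ?, ?)",
    [
        (1, 10, "2012-01-01"),
        (1, 10, "2012-02-01"),
        (1, 20, "2012-03-01"),
        (1, 30, "2012-04-01"),
        (2, 10, "2012-01-15"),
        (2, 10, "2012-02-15"),
    ],
)

# For each row h, join only to the immediately following row i of the same
# page; the sub-select computes that row's timestamp. Then keep the pairs
# whose group differs. Assumes timestamps are unique within a page.
query = """
    SELECT h.page_id,
           h.group_id     AS original_group,
           i.group_id     AS new_group,
           i.created_dttm AS change_date
    FROM page_history h
    JOIN page_history i
      ON i.page_id = h.page_id
     AND i.created_dttm = (SELECT MIN(n.created_dttm)
                           FROM page_history n
                           WHERE n.page_id = h.page_id
                             AND n.created_dttm > h.created_dttm)
    WHERE h.group_id != i.group_id
    ORDER BY h.page_id, i.created_dttm
"""
changes = conn.execute(query).fetchall()

for row in changes:
    print(row)
```

Because the aggregate lives only inside the sub-select, new_group is available in the outer SELECT without triggering a "not a GROUP BY expression" error.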

sql statement not working on AND (... OR ... OR ...)

This is probably a little thing,
but I am trying to use this SQL statement:
SELECT * FROM Colors
WHERE colorHueWarmth < 0
AND colorV >=0.7
AND (fk_subCategory=4 OR fk_subCategory=5 OR fk_subCategory=11)
In the results I get the correct colorHueWarmth and colorV, but I also get rows whose fk_subCategory has values other than 4, 5 or 11.
I tried changing the values, but with no results. Is it even possible to write such a statement?
Does anyone know what I am doing wrong?
Thanks in advance
You've actually got multiple options, although I'd point out that the query in your question actually works for me (see this SQL Fiddle):
SELECT
*
FROM
Colors
WHERE
colorHueWarmth < 0
AND colorV >=0.7
AND (fk_subCategory=4 OR fk_subCategory=5 OR fk_subCategory=11)
As stated in one of the comments, I would guess that your original didn't have brackets around the fk_subCategory clause (the third table in my previous fiddle). Brackets are immensely important when combining AND and OR conditions and should always be used to group items together.
The easiest solution is as follows:
SELECT
*
FROM
Colors
WHERE
colorHueWarmth < 0
AND colorV >=0.7
AND (fk_subCategory IN(4,5,11));
You will find loads of documentation online regarding the IN clause; here are a few links you might find useful:
http://webcheatsheet.com/sql/interactive_sql_tutorial/sql_in.php
http://www.w3schools.com/sql/sql_in.asp (note W3Schools can't always be taken at face value and is often excluded from suggested links due to the errors/omissions it often contains)
http://msdn.microsoft.com/en-gb/library/ms177682.aspx
Given the small number of foreign key values (4, 5 or 11), the IN clause is a reasonable option. If you have other queries using something similar with large collections, this can become quite inefficient, in which case you could create a temporary table which contains the IDs and INNER JOIN onto that. (here is a question regarding alternatives to IN)
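To make the point about brackets concrete, here is a small sketch using Python's built-in sqlite3 module (an assumption made only so the example runs anywhere; the id column and the sample rows are invented). AND binds more tightly than OR, so without brackets the trailing OR conditions are evaluated on their own and let through rows that fail the other filters.

```python
import sqlite3

# In-memory SQLite database with invented sample rows (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Colors (id INTEGER, colorHueWarmth REAL, colorV REAL, fk_subCategory INTEGER)"
)
conn.executemany(
    "INSERT INTO Colors VALUES (?, ?, ?, ?)",
    [
        (1, -0.5, 0.80, 4),   # passes every condition
        (2, 0.3, 0.20, 5),    # wrong warmth and value, subcategory 5
        (3, -0.2, 0.90, 7),   # right warmth and value, wrong subcategory
        (4, -0.1, 0.75, 11),  # passes every condition
        (5, 0.4, 0.10, 11),   # wrong warmth and value, subcategory 11
    ],
)


def ids(where_clause):
    """Return the ids matching the given WHERE clause, in id order."""
    sql = "SELECT id FROM Colors WHERE " + where_clause + " ORDER BY id"
    return [row[0] for row in conn.execute(sql)]


# No brackets: parsed as (warmth AND value AND sub=4) OR sub=5 OR sub=11.
unbracketed = ids(
    "colorHueWarmth < 0 AND colorV >= 0.7 "
    "AND fk_subCategory=4 OR fk_subCategory=5 OR fk_subCategory=11"
)

# Brackets group the OR conditions, as in the question's query.
bracketed = ids(
    "colorHueWarmth < 0 AND colorV >= 0.7 "
    "AND (fk_subCategory=4 OR fk_subCategory=5 OR fk_subCategory=11)"
)

# IN expresses the same grouping without any OR.
with_in = ids("colorHueWarmth < 0 AND colorV >= 0.7 AND fk_subCategory IN (4, 5, 11)")

print(unbracketed)  # [1, 2, 4, 5]
print(bracketed)    # [1, 4]
print(with_in)      # [1, 4]
```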

Counting occurrence of each distinct element in a table

I am writing a log viewer app in ASP.NET / C#. There is a report window where it will be possible to check some information about the whole database. One piece of information I want to display there is the number of times each generator (an entity in my domain, not Firebird's sequence) appears in the table. How do I do that using COUNT?
Do I have to:
Gather the key for each different generator
Run one query for each generator key using count
Display it somehow
Is there any way that I can do it without having to do two queries to the database? The database size can be HUGE, and having to query it "X" times where "X" is the number of generators would just suck.
I am using a Firebird database. Is there any way to fetch this information from a metadata schema, or is no such thing available?
Basically, what I want is to count the occurrences of each generator in the table. The result would be something like: Generator A: 10 times, Generator B: 7 times, Generator C: 0 times and so on.
If I understand your question correctly, it is a simple matter of using the GROUP BY clause, e.g.:
select
key,
count(*)
from generators
group by key;
Something like the query below should be sufficient (depending on your exact structure and requirements)
SELECT KEY, COUNT(*)
FROM YOUR_TABLE
GROUP BY KEY
I solved my problem using this simple Query:
SELECT GENERATOR_,count(*)
FROM EVENTSGENERAL GROUP BY GENERATOR_;
Thanks to those who helped me.
It took me 8 hours to come back and post the answer, because of the StackOverflow limitation on answering my own questions based on my reputation.
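One caveat worth sketching: a plain GROUP BY over the events table only lists generators that occur at least once, so a "Generator C: 0 times" line cannot come from it. The example below uses Python's built-in sqlite3 module instead of Firebird, and a hypothetical GENERATORS lookup table that does not appear in the question; both are assumptions. The LEFT JOIN keeps generators that have no events.

```python
import sqlite3

# In-memory SQLite database standing in for Firebird (assumption).
conn = sqlite3.connect(":memory:")
# GENERATORS is a hypothetical lookup table listing every known generator.
conn.execute("CREATE TABLE GENERATORS (NAME TEXT)")
conn.execute("CREATE TABLE EVENTSGENERAL (GENERATOR_ TEXT)")
conn.executemany("INSERT INTO GENERATORS VALUES (?)", [("A",), ("B",), ("C",)])
conn.executemany(
    "INSERT INTO EVENTSGENERAL VALUES (?)",
    [("A",), ("A",), ("A",), ("B",), ("B",)],
)

# Plain GROUP BY: generator C has no rows, so it is missing from the result.
plain = conn.execute(
    "SELECT GENERATOR_, COUNT(*) FROM EVENTSGENERAL "
    "GROUP BY GENERATOR_ ORDER BY GENERATOR_"
).fetchall()

# LEFT JOIN from the lookup table; COUNT(e.GENERATOR_) skips the NULLs
# produced for generators without events, which yields 0 for them.
with_zeros = conn.execute(
    "SELECT g.NAME, COUNT(e.GENERATOR_) "
    "FROM GENERATORS g "
    "LEFT JOIN EVENTSGENERAL e ON e.GENERATOR_ = g.NAME "
    "GROUP BY g.NAME ORDER BY g.NAME"
).fetchall()

print(plain)       # [('A', 3), ('B', 2)]
print(with_zeros)  # [('A', 3), ('B', 2), ('C', 0)]
```

Either way, the counting happens in a single query rather than one query per generator.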

Oracle DB simple SELECT where column order matters

I am doing a simple SELECT statement in an Oracle DB and need to select the columns in a somewhat-specific order. Example:
Table A has 100 attributes, one of which is "chapter" that occurs somewhere in the order of columns in the table. I need to select the data with "chapter" first and the remaining columns after in no particular order. Essentially, my statement needs to read something like:
SELECT a.chapter, a. *the remaining columns* FROM A
Furthermore, I cannot simply type:
SELECT a.chapter, a.*
because this will select "chapter" twice.
I know the SQL statement seems simple, but if I know how to solve this problem, I can extrapolate this thought into more complicated areas. Also, let's assume that I can't just scroll over to find the "chapter" column and drag it to the beginning.
Thanks.
You should not select * in a program. As your schema evolves, it will bring in things you do not know about yet. Think about what happens when someone adds a column with the whole book in it: the query you thought would be very cheap suddenly starts to bring in megabytes of data.
That means you have to specify every column you need.
Your best bet is just to select each column explicitly.
A quick way to get around this would be SELECT a.chapter AS chapterCol, a.* FROM table a; This means there will be one extra column named chapterCol (assuming there's not a column already named chapterCol ;)).
If you're going to embed the 'SELECT *' into program code, then I would strongly recommend against doing that. As noted by the previous authors, you're setting up the code to break if a column is ever added to (or removed from) the table. The simple advice is: don't do it.
If you're using this in development tools (viewing the data, and the like), then I'd recommend creating a view with the specific column order you need. Capture the output from 'SELECT COLUMN_NAME FROM ALL_TAB_COLUMNS' and create a select statement for the view with the column order you need.
This is how I would build your query without having to type all the names in, but with some manual effort.
Start with "Select a.chapter"
Now perform another select on your database as follows:
select ','|| column_name
from user_tab_cols
where table_name = 'YOUR_REAL_TABLE_NAME' -- upper case, as stored in the data dictionary
and column_name <> 'CHAPTER';
Now take the output from that, in a cut-and-paste manner, and append it to what you started with. Now run that query. It should be what you asked for.
Ta-da!
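The same idea can be automated instead of cut and pasted. The sketch below uses Python's built-in sqlite3 module, where PRAGMA table_info plays the role of Oracle's user_tab_cols; that swap, the table name table_a and its columns are all assumptions for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for Oracle (assumption).
conn = sqlite3.connect(":memory:")
# Hypothetical table with 'chapter' somewhere in the middle of the columns.
conn.execute("CREATE TABLE table_a (id INTEGER, title TEXT, chapter TEXT, body TEXT)")
conn.execute("INSERT INTO table_a VALUES (1, 'Intro', 'ch1', 'text')")

# PRAGMA table_info returns one row per column; index 1 is the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(table_a)")]

# Everything except 'chapter', in the table's original column order.
others = [c for c in columns if c.lower() != "chapter"]

# Start with a.chapter and append the remaining columns.
select_sql = (
    "SELECT a.chapter, " + ", ".join("a." + c for c in others) + " FROM table_a a"
)
first_row = conn.execute(select_sql).fetchone()

print(select_sql)  # SELECT a.chapter, a.id, a.title, a.body FROM table_a a
print(first_row)   # ('ch1', 1, 'Intro', 'text')
```

The generated statement names every column explicitly, so it stays stable once written, but it has to be regenerated when the table changes.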
Unless you have a very good reason to do so, you should not use SELECT * in queries. It will break your application every time the schema changes.