In PostgreSQL 8.2 I want to sequentially number rows. I have the table t at SQL Fiddle:
c
---
3
2
I want this:
c | i
---+---
2 | 1
3 | 2
I tried this:
select *
from
(select c from t order by c) s
cross join
generate_series(1, 2) i
And got:
c | i
---+---
2 | 1
3 | 1
2 | 2
3 | 2
The only thing I can think of is a sequence. You could do something like this:
drop sequence if exists row_numbers;
create temporary sequence row_numbers;
select nextval('row_numbers'), dt.c
from (select c from t order by c) as dt;
I'd throw a drop sequence row_numbers in as well but the temporary should take care of that if you forget.
This is a bit cumbersome but you might be able to wrap it in a function to hide some of the ugliness.
Keep in mind that 8.2 is no longer supported, but 8.4 is, and 8.4 has window functions.
References (8.2 versions):
CREATE SEQUENCE
DROP SEQUENCE
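For reference, on 8.4 or later a window function gives the numbering directly (a sketch, assuming the same table t):

```sql
-- row_number() assigns 1, 2, ... in the order given in the OVER clause
select c,
       row_number() over (order by c) as i
from t;
```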
You can use a "triangular join" as in the following:
select a.c, (select count(*) from t where c <= a.c) as i
from t as a
order by i
This assumes, however, that the values of c are unique, as the "row numbering" scheme is simply a count of rows that are less than or equal to the current row. This can be expanded to include a primary key or other unique column for tie-breaking, as necessary.
Also, there can be some performance implications with joining in this manner.
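A sketch of the tie-breaker variant, assuming t also has a unique column id (a hypothetical name):

```sql
-- count rows that sort strictly before the current row,
-- breaking ties on the unique id column
select a.c,
       (select count(*)
        from t
        where c < a.c
           or (c = a.c and id <= a.id)) as i
from t as a
order by i;
```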
(Edited below in response to answer)
Assume that I have a table R(a,b) with no primary key nor any other constraints.
dmg#[local] test1=# table R;
a | b
---+----
1 | 10
1 | 20
2 | 30
2 | 10
(4 rows)
the query
select a,b
from R group by a;
is invalid in PostgreSQL (it would be OK if a were the primary key of R).
dmg#[local] test1=# select a,b from R group by a;
ERROR: column "r.b" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: select a,b from R group by a;
^
But it is allowed in SQLite3. It chooses a value of b from some tuple within each group (non-deterministically).
sqlite> select a,b from R group by a;
a b
---------- ----------
1 10
2 30
Does Oracle accept this query as valid?
First, it is no longer accepted by default in MySQL.
Second, it does not return a "random" value. It returns a value from an indeterminate row.
Third, it violates the SQL standard.
And finally, Oracle does not support this syntax.
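For completeness, a standards-compliant way to get one b per a is to say explicitly which value you want, e.g. with an aggregate (a sketch against the R(a,b) table above):

```sql
-- deterministic: always picks the smallest b in each group
select a, min(b)
from R
group by a;
```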
What is the difference between
select * from degreeprogram NATURAL JOIN degreeprogram ;
and
select * from degreeprogram d1 NATURAL JOIN degreeprogram d2;
in oracle?
I expected that they would return the same result set; however, they do not. The second query does what I expect: it joins the two relations on the identically named attributes and so returns the same tuples as stored in degreeprogram. The first query, however, confuses me: each tuple occurs several times in the result set. What join condition is used here?
Thank you
NATURAL JOIN means join the two tables based on all columns having the same name in both tables.
I imagine that for each column in your table, Oracle is internally writing a condition like:
degreeprogram.column1 = degreeprogram.column1
(which you would not be able to write yourself due to ORA-00918 column ambiguously defined error)
And then, I imagine, Oracle is optimizing that away to just
degreeprogram.column1 is not null
So, you're not exactly getting a CROSS JOIN of your table with itself -- only a CROSS JOIN of those rows having no null columns.
UPDATE: Since this was the selected answer, I will just add from Thorsten Kettner's answer that this behavior is probably a bug on Oracle's part. In 18c, Oracle behaves properly and returns an ORA-00918 error when you try to NATURAL JOIN a table to itself.
The difference between those two statements is that the second explicitly defines a self join on the table, whereas in the first statement the optimizer is trying to figure out what you really want. On my database, the first statement performs a Cartesian merge join and is not optimized at all, while the second statement has a better explain plan, using a single full table access with index scanning.
I'd call this a bug. This query:
select * from degreeprogram d1 NATURAL JOIN degreeprogram d2;
translates to
select col1, col2, ... -- all columns
from degreeprogram d1
join degreeprogram d2 using (col1, col2, ...)
and gives you all rows from the table where all columns are not null (because using(col) never matches nulls).
This query, however:
select * from degreeprogram NATURAL JOIN degreeprogram;
is invalid according to standard SQL, because every table must have a unique name or alias in a query. Oracle lets this pass, but in doing so it should still do something to keep the table instances apart (e.g. internally create an alias for them). It obviously doesn't, and multiplies the result by the number of rows in the table. A bug.
A so-called natural join instructs the database to
Find all column names common to both tables (in this case, degreeprogram and degreeprogram, which of course have the same columns.)
Generate a join condition for each pair of matching column names, in the form table1.column1 = table2.column1 (in this case, there will be one for every column in degreeprogram.)
Therefore a query like this
select count(*) from demo natural join demo;
will be transformed into
select count(*) from demo, demo where demo.x = demo.x;
I checked this by creating a table with one column and two rows:
create table demo (x integer);
insert into demo values (1);
insert into demo values (2);
commit;
and then tracing the session:
SQL> alter session set tracefile_identifier='demo_trace';
Session altered.
SQL> alter session set events 'trace [SQL_Compiler.*]';
Session altered.
SQL> select /* nj test */ count(*) from demo natural join demo;
COUNT(*)
----------
4
1 row selected.
SQL> alter session set events 'trace [SQL_Compiler.*] off';
Session altered.
Then in twelve_ora_6196_demo_trace.trc I found this line:
Final query after transformations:******* UNPARSED QUERY IS *******
SELECT COUNT(*) "COUNT(*)" FROM "WILLIAM"."DEMO" "DEMO","WILLIAM"."DEMO" "DEMO" WHERE "DEMO"."X"="DEMO"."X"
and a few lines later:
try to generate single-table filter predicates from ORs for query block SEL$58A6D7F6 (#0)
finally: "DEMO"."X" IS NOT NULL
(This is merely an optimisation on top of the generated query above, as column X is nullable but the join allows the optimiser to infer that only non-null values are required. It doesn't replace the joins.)
Hence the execution plan:
-----------------------------------------+-----------------------------------+
| Id | Operation | Name | Rows | Bytes | Cost | Time |
-----------------------------------------+-----------------------------------+
| 0 | SELECT STATEMENT | | | | 7 | |
| 1 | SORT AGGREGATE | | 1 | 13 | | |
| 2 | MERGE JOIN CARTESIAN | | 4 | 52 | 7 | 00:00:01 |
| 3 | TABLE ACCESS FULL | DEMO | 2 | 26 | 3 | 00:00:01 |
| 4 | BUFFER SORT | | 2 | | 4 | 00:00:01 |
| 5 | TABLE ACCESS FULL | DEMO | 2 | | 2 | 00:00:01 |
-----------------------------------------+-----------------------------------+
Query Block Name / Object Alias(identified by operation id):
------------------------------------------------------------
1 - SEL$58A6D7F6
3 - SEL$58A6D7F6 / DEMO_0001#SEL$1
5 - SEL$58A6D7F6 / DEMO_0002#SEL$1
------------------------------------------------------------
Predicate Information:
----------------------
3 - filter("DEMO"."X" IS NOT NULL)
Alternatively, let's see what dbms_utility.expand_sql_text does with it. I'm not quite sure what to make of this given the trace file above, but it shows a similar expansion taking place:
SQL> var result varchar2(1000)
SQL> exec dbms_utility.expand_sql_text('select count(*) from demo natural join demo', :result)
PL/SQL procedure successfully completed.
RESULT
----------------------------------------------------------------------------------------------------------------------------------
SELECT COUNT(*) "COUNT(*)" FROM (SELECT "A2"."X" "X" FROM "WILLIAM"."DEMO" "A3","WILLIAM"."DEMO" "A2" WHERE "A2"."X"="A2"."X") "A1"
Lesson: NATURAL JOIN is evil. Everybody knows this.
I have two models, A and B. A has many B. Originally, both A and B had an auto-incrementing primary key field called id, and B had an a_id field. Now I have found myself needing a unique sequence of numbers for each B within an A. I was keeping track of this within my application, but then I thought it might make more sense to let the database take care of it. I thought I could give B a compound key where the first component is a_id and the second component auto-increments, taking into consideration the a_id. So if I insert two records with a_id 1 and one with a_id 2 then I will have something like:
a_id | other_id
1 | 1
1 | 2
2 | 1
If ids with lower numbers are deleted, then the sequence should not recycle these numbers. So if (1, 2) gets deleted:
a_id | other_id
1 | 1
2 | 1
When the next record with a_id 1 is added, the table will look like:
a_id | other_id
1 | 1
2 | 1
1 | 3
How can I do this in SQL? Are there reasons not to do something like this?
I am using in-memory H2 (testing and development) and PostgreSQL 9.3 (production).
The answer to your question is that you would need a trigger to get this functionality. However, you could just create a view that uses the row_number() function:
create view v_table as
select t.*,
row_number() over (partition by a order by id) as seqnum
from table t;
where I am calling the table's primary key id.
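If you do want the gap-preserving numbering from the question stored in the table, a trigger plus a counter table is one way. A minimal sketch for Postgres 9.5+ (ON CONFLICT is not available in 9.3, where you would need an upsert loop instead; all names here are hypothetical):

```sql
-- tracks the last other_id handed out per a_id
create table b_counter (
    a_id    int primary key,
    last_id int not null
);

create function assign_other_id() returns trigger as $$
begin
  -- first B for this a_id starts at 1; later ones increment,
  -- and deleted numbers are never reused because last_id only grows
  insert into b_counter (a_id, last_id)
  values (new.a_id, 1)
  on conflict (a_id)
  do update set last_id = b_counter.last_id + 1
  returning last_id into new.other_id;
  return new;
end;
$$ language plpgsql;

create trigger b_other_id
  before insert on b
  for each row execute procedure assign_other_id();
```

Note that the upsert serializes concurrent inserts for the same a_id on the counter row, which is what makes the numbering gap-free per parent.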
I am creating a database for the first time using Postgres 9.3 on MacOSX.
Let's say I have tables A and B. A starts off empty and B is filled. I would like table A to contain each unique entry from column all_names in table B, together with a count of its occurrences, as in the tables below. I am not used to the syntax yet, so I do not really know how to go about it. The birthday column is irrelevant here.
Table A
names | number
------+--------
Carl | 3
Bill | 4
Jen | 2
Table B
all_names | birthday
-----------+------------
Carl | 17/03/1980
Carl | 22/08/1994
Carl | 04/09/1951
Bill | 02/12/2003
Bill | 11/03/1975
Bill | 04/06/1986
Bill | 08/07/2005
Jen | 05/03/2009
Jen | 01/04/1945
Would this be the correct way to go about it?
insert into a (names, number)
select b.all_names, count(b.all_names)
from b
group by b.all_names;
Answer to original question
Postgres allows set-returning functions (SRF) to multiply rows. generate_series() is your friend:
INSERT INTO b (all_names, birthday)
SELECT names, current_date -- AS birthday ??
FROM (SELECT names, generate_series(1, number) FROM a) sub;
Since the introduction of LATERAL in Postgres 9.3 you can stick to standard SQL: the SRF moves from the SELECT list to the FROM list:
INSERT INTO b (all_names, birthday)
SELECT a.names, current_date -- AS birthday ??
FROM a, generate_series(1, a.number) AS rn
LATERAL is implicit here, as explained in the manual:
LATERAL can also precede a function-call FROM item, but in this case
it is a noise word, because the function expression can refer to
earlier FROM items in any case.
Reverse operation
The above is the reverse operation (approximately) of a simple aggregate count():
INSERT INTO a (names, number)
SELECT all_names, count(*)
FROM b
GROUP BY 1;
... which fits your updated question.
Note a subtle difference between count(*) and count(all_names). The former counts all rows, no matter what, while the latter only counts rows where all_names IS NOT NULL. If your column all_names is defined as NOT NULL, both return the same, but count(*) is a bit shorter and faster.
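A tiny illustration of that difference, using an inline VALUES list with one NULL:

```sql
select count(*)         as all_rows,
       count(all_names) as non_null_rows
from (values ('Carl'), ('Jen'), (null)) as b (all_names);
-- all_rows = 3, non_null_rows = 2
```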
About GROUP BY 1:
GROUP BY + CASE statement
For example, I want to order a table like this
Foo | Bar
---------
1 | a
5 | d
2 | c
1 | b
2 | a
to this:
Foo | Bar
---------
1 | a
1 | b
2 | a
2 | c
5 | d
(ordered by Foo column)
That's because I only want to select the Bars that have a given Foo, and if the table is already ordered, I guess the rows will be faster to select because I won't have to use ORDER BY.
And if it's possible, once sorting by columns Foo, I want to sort the rows which have the same Foo by Bar column.
Of course, if I INSERT or UPDATE to table, it should remain ordered.
In SQL, tables are inherently unordered. This is a very important characteristic of databases. For instance, you can delete a row in the middle of a table, and when a new row is inserted, it uses the space occupied by the deleted row. This is more efficient than just appending rows to the end of the data.
In other words, the order by clause is used basically for output purposes only. Okay, I can think of two other situations: with limit (or a related clause) and with window functions (which SQLite did not support at the time).
In any case, ordering the data also would not matter for a query such as this:
select bar
from t
where foo = $FOO
The SQL engine does not "know" that the table is ordered. So, it will start at the beginning of the table and do the comparison for each row.
The way to make this more efficient is by building an index on foo. Then you will be able to get the efficiencies that you want.
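A sketch of that index; a composite index on (foo, bar) additionally lets the matching bar values come back in sorted order without a separate sort step (names are hypothetical):

```sql
-- supports WHERE foo = ? efficiently, and returns bar pre-sorted per foo
create index idx_t_foo_bar on t (foo, bar);
```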