Creating dynamic columns from table data - sql

Table data sample:
--------------------------
| key | domain | value |
--------------------------
| a | en | English |
| a | de | Germany |
Query which returns result I need:
select * from
(
select t1.key,
(select value from TABLE where t1.key=key AND code='en') en,
(select value from TABLE where t1.key=key AND code='de') de
from TABLE t1
) as t2
Data returned from query:
---------------------------
| key | en | de |
---------------------------
| a | English | Germany |
I do not want to list all available domains with:
(select value from TABLE where t1.key=key AND code='*') *
Is it possible to make this query more dynamic in Postgres: automatically add all domain columns that exist in the table?

For more than a few domains use crosstab() to make the query shorter and faster.
PostgreSQL Crosstab Query
A completely dynamic query, returning a dynamic number of columns based on data in your table is not possible, because SQL is strictly typed. Whatever you try, you'll end up needing two steps. Step 1: generate the query, step 2: execute it.
Execute a dynamic crosstab query
Or you return something more flexible instead of table columns, like an array or a document type like json. Details:
Dynamic alternative to pivot with CASE and GROUP BY
Refactor a PL/pgSQL function to return the output of various SELECT queries

Related

Self-Join with Natural Join

What is the difference between
select * from degreeprogram NATURAL JOIN degreeprogram ;
and
select * from degreeprogram d1 NATURAL JOIN degreeprogram d2;
in oracle?
I expected that they return the same result set, however, they do not. The second query does what I expect: it joins the two relations using the same named attributes and so it returns the same tuples as stored in degreeprogram. However, the first query is confusing for me: here, each tuple occurs several times in the result set-> what join condition is used here?
Thank you
NATURAL JOIN means join the two tables based on all columns having the same name in both tables.
I imagine that for each column in your table, Oracle is internally writing a condition like:
degreeprogram.column1 = degreeprogram.column1
(which you would not be able to write yourself due to ORA-00918 column ambiguously defined error)
And then, I imagine, Oracle is optimizing that away to just
degreeprogram.column1 is not null
So, you're not exactly getting a CROSS JOIN of your table with itself -- only a CROSS JOIN of those rows having no null columns.
UPDATE: Since this was the selected answer, I will just add from Thorsten Kettner's answer that this behavior is probably a bug on Oracle's part. In 18c, Oracle behaves properly and returns an ORA-00918 error when you try to NATURAL JOIN a table to itself.
The difference between those two statements is that the second explicitly defines a self join on the table, where the first statement, the optimizer is trying to figure out what you really want. On my database, the first statement performs a cartesian merge join and is not optimized at all, and the second statement has a better explain plan, using a single full table access with index scanning.
I'd call this a bug. This query:
select * from degreeprogram d1 NATURAL JOIN degreeprogram d2;
translates to
select col1, col2, ... -- all columns
from degreeprogram d1
join degreeprogram d2 using (col1, col2, ...)
and gives you all rows from the table where all columns are not null (because using(col) never matches nulls).
This query, however:
select * from degreeprogram NATURAL JOIN degreeprogram;
is invalid according to standard SQL, because every table must have a unique name or alias in a query. Oracle lets this pass, but doing so it should do something still to keep the table instances apart (e.g. create internally an alias for them). It obviously doesn't and multiplies the result with the number of rows in the table. A bug.
A so-called natural join instructs the database to
Find all column names common to both tables (in this case, degreeprogram and degreeprogram, which of course have the same columns.)
Generate a join condition for each pair of matching column names, in the form table1.column1 = table2.column1 (in this case, there will be one for every column in degreeprogram.)
Therefore a query like this
select count(*) from demo natural join demo;
will be transformed into
select count(*) from demo, demo where demo.x = demo.x;
I checked this by creating a table with one column and two rows:
create table demo (x integer);
insert into demo values (1);
insert into demo values (2);
commit;
and then tracing the session:
SQL> alter session set tracefile_identifier='demo_trace';
Session altered.
SQL> alter session set events 'trace [SQL_Compiler.*]';
Session altered.
SQL> select /* nj test */ count(*) from demo natural join demo;
COUNT(*)
----------
4
1 row selected.
SQL> alter session set events 'trace [SQL_Compiler.*] off';
Session altered.
Then in twelve_ora_6196_demo_trace.trc I found this line:
Final query after transformations:******* UNPARSED QUERY IS *******
SELECT COUNT(*) "COUNT(*)" FROM "WILLIAM"."DEMO" "DEMO","WILLIAM"."DEMO" "DEMO" WHERE "DEMO"."X"="DEMO"."X"
and a few lines later:
try to generate single-table filter predicates from ORs for query block SEL$58A6D7F6 (#0)
finally: "DEMO"."X" IS NOT NULL
(This is merely an optimisation on top of the generated query above, as column X is nullable but the join allows the optimiser to infer that only non-null values are required. It doesn't replace the joins.)
Hence the execution plan:
-----------------------------------------+-----------------------------------+
| Id | Operation | Name | Rows | Bytes | Cost | Time |
-----------------------------------------+-----------------------------------+
| 0 | SELECT STATEMENT | | | | 7 | |
| 1 | SORT AGGREGATE | | 1 | 13 | | |
| 2 | MERGE JOIN CARTESIAN | | 4 | 52 | 7 | 00:00:01 |
| 3 | TABLE ACCESS FULL | DEMO | 2 | 26 | 3 | 00:00:01 |
| 4 | BUFFER SORT | | 2 | | 4 | 00:00:01 |
| 5 | TABLE ACCESS FULL | DEMO | 2 | | 2 | 00:00:01 |
-----------------------------------------+-----------------------------------+
Query Block Name / Object Alias(identified by operation id):
------------------------------------------------------------
1 - SEL$58A6D7F6
3 - SEL$58A6D7F6 / DEMO_0001#SEL$1
5 - SEL$58A6D7F6 / DEMO_0002#SEL$1
------------------------------------------------------------
Predicate Information:
----------------------
3 - filter("DEMO"."X" IS NOT NULL)
Alternatively, let's see what dbms_utility.expand_sql_text does with it. I'm not quite sure what to make of this given the trace file above, but it shows a similar expansion taking place:
SQL> var result varchar2(1000)
SQL> exec dbms_utility.expand_sql_text('select count(*) from demo natural join demo', :result)
PL/SQL procedure successfully completed.
RESULT
----------------------------------------------------------------------------------------------------------------------------------
SELECT COUNT(*) "COUNT(*)" FROM (SELECT "A2"."X" "X" FROM "WILLIAM"."DEMO" "A3","WILLIAM"."DEMO" "A2" WHERE "A2"."X"="A2"."X") "A1"
Lesson: NATURAL JOIN is evil. Everybody knows this.

Where clause by only number-convertable rows

I have a table with 2 columns like this
COLNUM | COLSTR
---------------
1 | 001223
2 | 002234
3 | ds2-dd
4 | 003344
I would like to make query like this
SELECT * FROM TABLE WHERE CAST(COLSTR AS NUMBER) IN (1223,3344)
But since 4th row can not cast to number, query fails. I could create a view which would eliminate non-number-transformable rows, but I would like to make this with one query. How do I do it?
Oracle 11g.
Thanks.
I would suggest to use regular expression
SELECT * FROM TABLE WHERE REGEXP_LIKE(COLSTR,'^[0-9]$')

Multiple records in a table matched with a column

The architecture of my DB involves records in a Tags table. Each record in the Tags table has a string which is a Name and a foreign kery to the PrimaryID's of records in another Worker table.
Records in the Worker table have tags. Every time we create a Tag for a worker, we add a new row in the Tags table with the inputted Name and foreign key to the worker's PrimaryID. Therefore, we can have multiple Tags with different names per same worker.
Worker Table
ID | Worker Name | Other Information
__________________________________________________________________
1 | Worker1 | ..........................
2 | Worker2 | ..........................
3 | Worker3 | ..........................
4 | Worker4 | ..........................
Tags Table
ID |Foreign Key(WorkerID) | Name
__________________________________________________________________
1 | 1 | foo
2 | 1 | bar
3 | 2 | foo
5 | 3 | foo
6 | 3 | bar
7 | 3 | baz
8 | 1 | qux
My goal is to filter WorkerID's based on an inputted table of strings. I want to get the set of WorkerID's that have the same tags as the inputted ones. For example, if the inputted strings are foo and bar, I would like to return WorkerID's 1 and 3. Any idea how to do this? I was thinking something to do with GROUP BY or JOINING tables. I am new to SQL and can't seem to figure it out.
This is a variant of relational division. Here's one attempt:
select workerid
from tags
where name in ('foo', 'bar')
group by workerid
having count(distinct name) = 2
You can use the following:
select WorkerID
from tags where name in ('foo', 'bar')
group by WorkerID
having count(*) = 2
and this will retrieve your desired result/
Regards.
This article is an excellent resource on the subject.
While the answer from #Lennart works fine in Query Analyzer, you're not going to be able to duplicate that in a stored procedure or from a consuming application without opening yourself up to SQL injection attacks. To extend the solution, you'll want to look into passing your list of tags as a table-valued parameter since SQL doesn't support arrays.
Essentially, you create a custom type in the database that mimics a table with only one column:
CREATE TYPE list_of_tags AS TABLE (t varchar(50) NOT NULL PRIMARY KEY)
Then you populate an instance of that type in memory:
DECLARE #mylist list_of_tags
INSERT #mylist (t) VALUES('foo'),('bar')
Then you can select against that as a join using the GROUP BY/HAVING described in the previous answers:
select workerid
from tags inner join #mylist on tag = t
group by workerid
having count(distinct name) = 2
*Note: I'm not at a computer where I can test the query. If someone sees a flaw in my query, please let me know and I'll happily correct it and thank them.

How to get sum of values per id and update existing records in other table

I have two tables like:
ID | TRAFFIC
fd56756 | 4398
645effa | 567899
894fac6 | 611900
894fac6 | 567899
and
USER | ID | TRAFFIC
andrew | fd56756 | 0
peter | 645effa | 0
john | 894fac6 | 0
I need to get SUM ("TRAFFIC") from first table AND set column traffic to the second table where first table ID = second table ID. ID's from first table are not unique, and can be duplicated.
How can I do this?
Table names from your later comment. Chances are, you are reporting table and column names incorrectly.
UPDATE users u
SET "TRAFFIC" = sub.sum_traffic
FROM (
SELECT "ID", sum("TRAFFIC") AS sum_traffic
FROM stats.traffic
GROUP BY 1
) sub
WHERE u."ID" = sub."ID";
Aside: It's unwise to use mixed-case identifiers in Postgres. Use legal, lower-case identifiers, which do not need to be double-quoted, to make your life easier. Start by reading the manual here.
Something like this?
UPDATE users t2 SET t2.traffic = t1.sum_traffic FROM
(SELECT sum(t1.traffic) t1.sum_traffic FROM stats.traffic t1)
WHERE t1.id = t2.id;

Why do WHERE and HAVING exist as separate clauses in SQL?

I understand the distinction between WHERE and HAVING in a SQL query, but I don't see why they are separate clauses. Couldn't they be combined into a single clause that could handle both aggregated and non-aggregated data?
Here's the rule. If a condition refers to an aggregate function, put that condition in the HAVING clause. Otherwise, use the WHERE clause.
Here's another rule: You can't use HAVING unless you also use GROUP BY.
The main difference is that WHERE cannot be used on grouped item (such as SUM(number)) whereas HAVING can.The reason is the WHERE is done before the grouping and HAVING is done after the grouping is done.
ANOTHER DIFFERENCE IS WHERE clause requires a condition to be a column in a table, but HAVING clause can use both column and alias.
Here's the difference:
SELECT `value` v FROM `table` WHERE `v`>5;
Error #1054 - Unknown column 'v' in 'where clause'
SELECT `value` v FROM `table` HAVING `v`>5; -- Get 5 rows
WHERE clause requires a condition to be a column in a table, but HAVING clause can use both column and alias.
This is because WHERE clause filters data before select, but HAVING clause filters data after select.
So put the conditions in WHERE clause will be more effective if you have many many rows in a table.
Try EXPLAIN to see the key difference:
EXPLAIN SELECT `value` v FROM `table` WHERE `value`>5;
+----+-------------+-------+-------+---------------+-------+---------+------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+-------+---------+------+------+--------------------------+
| 1 | SIMPLE | table | range | value | value | 4 | NULL | 5 | Using where; Using index |
+----+-------------+-------+-------+---------------+-------+---------+------+------+--------------------------+
EXPLAIN SELECT `value` v FROM `table` having `value`>5;
+----+-------------+-------+-------+---------------+-------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+-------+---------+------+------+-------------+
| 1 | SIMPLE | table | index | NULL | value | 4 | NULL | 10 | Using index |
+----+-------------+-------+-------+---------------+-------+---------+------+------+-------------+
You can see either WHERE or HAVING uses index, but the rows are different.
So there is a need of both of them especially when we need grouping and additional filters.
This question seems to illustrate a misunderstanding that WHERE and HAVING are both missing up to 1/2 of the information necessary to fully process a query.
Consider the following SQL:
drop table if exists foo; create table foo (
ID int,
bar int
); insert into foo values (1, 1);
select now() as d, bar as b
from foo
where bar = 1 and d <= now()
having bar = 1 and ID = 1
;
In the where clause, d is not available because the selected items have not been processed to create it yet.
In the having clause ID has been discarded because it was not selected. In aggregate queries ID may not even have meaning in context of multiple rows combined into one. ID may also be meaningless when joining different tables into a single result.
Could it be done? Sure, but on the back-end it'd do the same as it does now, because you have to aggregate something before you can filter based on that aggregation. Ultimately that's the reason, it's a logical separation of different processes. Why waste resources aggregating records you could have filtered with a WHERE?
The question could only be fully answered by the designer since it asks intent. But the implication is that both clauses do the same thing only against aggregated vs. non-aggregated data. That's not true. "The HAVING clause is typically used together with the GROUP BY clause to filter the results of aggregate values. However, HAVING can be specified without GROUP BY."
As I understand it, the important thing is that "The HAVING clause specifies additional filters that are applied after the WHERE clause filters."
http://technet.microsoft.com/en-us/library/ms179270(v=sql.105).aspx