dynamic sql pivot table - sql

I hope you can help me with this task we have.
Originally we have these tables:
hwtype
id  name
1   router
2   switch

hwelement
id  idhwtype  name
1   1         RTR1
2   1         RTR2
3   2         SWT1

hwattributes
id  idhwtype  name
1   1         speed
2   1         IP
3   2         ports

hwtypeattributes
id  idhwelement  idhwattribute  value
1   1            1              100mb
2   1            2              172.16.3.23
3   2            1              10mb
4   2            2              172.16.3.26
5   3            3              8
What we need now is a function that presents the data in this way (according to hwtype).
For hwtype.name = 'router':

element  speed  IP
RTR1     100mb  172.16.3.23
RTR2     10mb   172.16.3.26

The idea is that the tables can accommodate new element types, elements and attributes without having to change the table definitions or the code.
I have been looking for examples, but unfortunately the good ones I found do aggregation on the values, which is something I had not considered.
Thanks in advance for your help.

You're using the EAV antipattern. This breaks all sorts of rules of relational database design and as you have discovered, getting data out is very awkward. There are many other weaknesses of this design, recounted elsewhere.
Read the article "Bad CaRMa" for a great story of how an EAV system destroyed a company.
Here's what you have to do to get the router attributes out of your database:
SELECT e.name AS "element",
speedval.value AS "speed",
ipval.value AS "IP",
portsval.value AS "Ports"
FROM hwtype t
JOIN hwelement e ON (e.idhwtype = t.id)
JOIN hwattributes speed ON (speed.idhwtype = t.id AND speed.name = 'speed')
LEFT OUTER JOIN hwtypeattributes speedval
ON (speedval.idhwattribute = speed.id AND speedval.idhwelement = e.id)
JOIN hwattributes ip ON (ip.idhwtype = t.id AND ip.name = 'IP')
LEFT OUTER JOIN hwtypeattributes ipval
ON (ipval.idhwattribute = ip.id AND ipval.idhwelement = e.id)
JOIN hwattributes ports ON (ports.idhwtype = t.id AND ports.name = 'ports')
LEFT OUTER JOIN hwtypeattributes portsval
ON (portsval.idhwattribute = ports.id AND portsval.idhwelement = e.id)
WHERE t.name = 'router';
Note that you need an extra pair of joins for each attribute if you insist on fetching all attributes for a given element on a single row. This quickly gets prohibitively expensive for the SQL optimizer.
It's far easier to fetch the attributes on multiple rows, and sort it out in application code:
SELECT e.name AS "element", a.name, v.value
FROM hwtype t
JOIN hwelement e ON (e.idhwtype = t.id)
JOIN hwattributes a ON (a.idhwtype = t.id)
JOIN hwtypeattributes v ON (v.idhwattribute = a.id AND v.idhwelement = e.id)
WHERE t.name = 'router';
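If you do want one row per element without a join pair per attribute, conditional aggregation is a common alternative. This is only a sketch against the tables above, with the attribute names still hard-coded; a truly dynamic version would have to build this statement in application code or PL/pgSQL:
SELECT e.name AS "element",
       MAX(CASE WHEN a.name = 'speed' THEN v.value END) AS "speed",
       MAX(CASE WHEN a.name = 'IP'    THEN v.value END) AS "IP"
FROM hwtype t
JOIN hwelement e        ON e.idhwtype = t.id
JOIN hwattributes a     ON a.idhwtype = t.id
JOIN hwtypeattributes v ON v.idhwattribute = a.id AND v.idhwelement = e.id
WHERE t.name = 'router'
GROUP BY e.id, e.name;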

Related

Refactoring slow SQL query

I currently have this very very slow query:
SELECT generators.id AS generator_id, COUNT(*) AS cnt
FROM generator_rows
JOIN generators ON generators.id = generator_rows.generator_id
WHERE
generators.id IN (SELECT "generators"."id" FROM "generators" WHERE "generators"."client_id" = 5212 AND ("generators"."state" IN ('enabled'))) AND
(
generators.single_use = 'f' OR generators.single_use IS NULL OR
generator_rows.id NOT IN (SELECT run_generator_rows.generator_row_id FROM run_generator_rows)
)
GROUP BY generators.id;
And I'm trying to refactor/improve it with this query:
SELECT g.id AS generator_id, COUNT(*) AS cnt
from generator_rows gr
join generators g on g.id = gr.generator_id
join lateral (
    select case when exists (select * from run_generator_rows rgr
                             where rgr.generator_row_id = gr.id)
                then 0 else 1 end as noRows
) has on true
where g.client_id = 5212 and "g"."state" IN ('enabled') AND
(g.single_use = 'f' OR g.single_use IS NULL OR has.norows = 1)
group by g.id
For some reason it doesn't quite work as expected (it returns 0 rows). I think I'm pretty close to the end result but can't get it to work.
I'm running on PostgreSQL 9.6.1.
This appears to be the query, formatted so I can read it:
SELECT gr.generator_id, COUNT(*) AS cnt
FROM generators g JOIN
generator_rows gr
ON g.id = gr.generator_id
WHERE gr.generator_id IN (SELECT g.id
FROM generators g
WHERE g.client_id = 5212 AND
g.state = 'enabled'
) AND
(g.single_use = 'f' OR
g.single_use IS NULL OR
gr.id NOT IN (SELECT rgr.generator_row_id FROM run_generator_rows rgr)
)
GROUP BY gr.generator_id;
I would be inclined to do most of this work in the FROM clause:
SELECT gr.generator_id, COUNT(*) AS cnt
FROM generators g JOIN
generator_rows gr
ON g.id = gr.generator_id JOIN
generators gg
on g.id = gg.id AND
gg.client_id = 5212 AND gg.state = 'enabled' LEFT JOIN
run_generator_rows rgr
ON gr.id = rgr.generator_row_id
WHERE g.single_use = 'f' OR
g.single_use IS NULL OR
rgr.generator_row_id IS NULL
GROUP BY gr.generator_id;
This does make two assumptions that I think are reasonable:
generators.id is unique
run_generator_rows.generator_row_id is unique
(It is easy to avoid these assumptions, but the duplicate elimination is more work.)
Then, some indexes could help:
generators(client_id, state, id)
run_generator_rows(generator_row_id)
generator_rows(generator_id)
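For reference, those suggestions as DDL (the index names here are placeholders):
CREATE INDEX generators_client_state_id_idx  ON generators (client_id, state, id);
CREATE INDEX run_generator_rows_grr_idx      ON run_generator_rows (generator_row_id);
CREATE INDEX generator_rows_generator_id_idx ON generator_rows (generator_id);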
Generally avoid inner selects as in
WHERE ... IN (SELECT ...)
as they are usually slow.
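On PostgreSQL, rewriting the NOT IN as a NOT EXISTS anti-join is usually the better-optimized way to express the same filter, and it also avoids the surprises NOT IN has when the subquery can return NULLs. A sketch against the tables in the question (untested):
SELECT g.id AS generator_id, COUNT(*) AS cnt
FROM generator_rows gr
JOIN generators g ON g.id = gr.generator_id
WHERE g.client_id = 5212
  AND g.state = 'enabled'
  AND (g.single_use = 'f'
       OR g.single_use IS NULL
       OR NOT EXISTS (SELECT 1
                      FROM run_generator_rows rgr
                      WHERE rgr.generator_row_id = gr.id))
GROUP BY g.id;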
As was already shown for your problem, it's a good idea to think of SQL in terms of set theory.
You do not join tables row by row on some inherent identity:
conceptually, SQL takes the set (that is, all rows) of the first table and "multiplies" it with the set of the second table, ending up with n times m combinations.
The ON clause is then used to (often drastically) reduce the result by evaluating each of those combinations to either true (keep) or false (drop). This way you can choose any arbitrary logic to pick the combinations you want.
Things get trickier with LEFT JOIN and RIGHT JOIN, but you can think of them as taking one side for granted; each row of that side will either
output its combinations where the ON logic yields true (at least once), exactly like JOIN does, or
output exactly ONE row, with 'the other side' (the right side on LEFT JOIN and vice versa) consisting of NULL for every column.
COUNT(*) is fine too, but if things get complicated, don't cling to it: use sub-selects for the keys only, and once all the hard work is done, join the detail columns to the result. Like in:
SELECT sub.ID, sub.VALID_COUNT, details.*
FROM (
    SELECT ID, SUM(CASE WHEN X THEN 1 ELSE 0 END) AS VALID_COUNT
    FROM ...
    GROUP BY ID
) AS sub
JOIN ... AS details ON sub.ID = details.ID
The difference is: the inner query is executed only once. The outer query usually has no indexes left to work with and would be slow on its own, but as long as the inner select doesn't make the data explode, this is usually many times faster than SELECT ... WHERE ... IN (SELECT ...) constructs.

Aggregate to 'plain' query

I have a query which uses aggregate functions to assign the maximum absolute value to another column in the table. The problem is that it takes a whole lot of time (it adds approximately 10-15 seconds to the query completion time). This is what the query looks like:
UPDATE calculated_table c
SET tp = (SELECT MAX(ABS(s.tp))
FROM ts s INNER JOIN tc t ON s.id = t.id
GROUP BY s.id);
Where id is not unique, hence the grouping. tp is a numeric whole number field. Here is what the tables look like:
TABLE ts
PID (primary) | id (FKEY) | tp (integer)
--------------+-----------+--------------
            1 |         2 |         -100
            2 |         2 |         -500
            3 |         2 |        -1000
TABLE tc
PID (primary) | id (FKEY)
--------------+-----------
            1 |         2
I want the output to look like:
TABLE c
PID (primary) | tp (integer)
--------------+--------------
            1 |         1000
I tried to make it work like this:
UPDATE calculated_table c
SET tp = (SELECT s.tp
FROM ts s INNER JOIN tc t ON s.id = t.id
ORDER BY s.tp DESC
LIMIT 1);
Though it improved the performance, the results are incorrect. Any help would be appreciated.
I did manage to modify the query; turns out nesting aggregate functions is not a good option. However, if it helps anyone, here is what I ended up doing:
UPDATE calculated_table c
SET tp = (SELECT ABS(s.tp)
FROM ts s INNER JOIN tc t ON s.id = t.id
WHERE c.id = s.id
ORDER BY ABS(s.tp) DESC
LIMIT 1);
Though it improved the performance, however the results are incorrect.
The operation was a success, but the patient died.
The problem with your query is that
SELECT MAX(ABS(s.tp))
FROM ts s INNER JOIN tc t ON s.id = t.id
GROUP BY s.id
doesn't produce a scalar value; it produces a column of values, one for each s.id. Your DBMS really should raise an error. In terms of performance, I think you're sequentially applying each row produced by the subquery to each row in the target table. It's probably both slow and wrong.
What you want is to correlate your select output with the table you're updating, and limit the rows updated to those correlated. Here's ANSI syntax to update one table from another:
UPDATE calculated_table
SET tp = (SELECT MAX(ABS(s.tp))
FROM ts s INNER JOIN tc t ON s.id = t.id
where s.id = calculated_table.id)
where exists ( select 1 from ts join tc
on ts.id = tc.id
where ts.id = calculated_table.id )
That should be close to what you want.
BTW, it's tempting to interpret correlated subqueries literally, to think that the subquery is run N times, once for each row in the target table. And that's the right way to picture it, logically. The DBMS won't implement it that way, though, in all likelihood, and performance should be much better than that picture would suggest.
Try:
UPDATE calculated_table c
SET tp = (SELECT greatest( MAX( s.tp ) , - MIN( s.tp ))
FROM ts s INNER JOIN tc t ON s.id = t.id
WHERE c.id = s.id
);
Also try to create a multicolumn index on ts( id, tp )
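Assuming PostgreSQL, that could look like this (the index names are placeholders):
CREATE INDEX ts_id_tp_idx ON ts (id, tp);
-- if you keep the ORDER BY ABS(s.tp) variant, an expression index may serve better:
CREATE INDEX ts_id_abs_tp_idx ON ts (id, abs(tp) DESC);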
I hope the SQL below will be helpful to you. I tested it in Netezza, but not PostgreSQL. Also, I didn't wrap it in an UPDATE.
SELECT ABS(COM.TP)
FROM TC C LEFT OUTER JOIN
(SELECT ID,TP
FROM TS A
WHERE NOT EXISTS (SELECT 1
FROM TS B
WHERE A.ID = B.ID
AND ABS(B.TP)>ABS(A.TP))) COM
ON C.ID = COM.ID

Query different IDs with different values?

I'm trying to write a query for a golf database. It needs to return players who have statisticID = 1 with a p2sStatistic > 65 and who also have statisticID = 3 with p2sStatistic > 295.
One statisticID is driving distance, the other accuracy, etc. I've tried the following but it doesn't work and can't seem to find an answer online. How would I go about this without doing a view?
SELECT playerFirstName, playerLastName
FROM player2Statistic, player
WHERE player.playerID=player2Statistic.playerID
AND player2Statistic.statisticID=statistic.statisticID
AND p2sStatistic.3 > 295
AND p2sStatistic.1 > 65;
http://i.imgur.com/o8epk.png - pic of db
Trying to get it to just output the list of players that satisfy those two conditions.
For a list of players without duplicates an EXISTS semi-join is probably best:
SELECT playerFirstName, playerLastName
FROM player AS p
WHERE EXISTS (
SELECT 1
FROM player2Statistic AS ps
WHERE ps.playerID = p.playerID
AND ps.StatisticID = 1
AND ps.p2sStatistic > 65
)
AND EXISTS (
SELECT 1
FROM player2Statistic AS ps
WHERE ps.playerID = p.playerID
AND ps.StatisticID = 3
AND ps.p2sStatistic > 295
);
Column names and context are derived from the provided screenshot. The query in the question does not quite cover it.
Note the parentheses; they are needed to cope with operator precedence.
This is probably faster (duplicates are probably not possible):
SELECT p.playerFirstName, p.playerLastName
FROM player AS p
JOIN player2Statistic AS ps1 USING (playerID)
JOIN player2Statistic AS ps3 USING (playerID)
WHERE ps1.StatisticID = 1
AND ps1.p2sStatistic > 65
AND ps3.StatisticID = 3
AND ps3.p2sStatistic > 295;
If your top-secret brand of RDBMS does not support the SQL-standard USING (playerID), substitute ON ps1.playerID = p.playerID (and likewise for ps3) to the same effect.
It's a case of relational division. Find many more query techniques to deal with it under this related question:
How to filter SQL results in a has-many-through relation
You are missing the statistic table in your query. You need to join it in, based on your where clause.
You also need to use proper join syntax.
The following version joins in the player2Statistic and statistic tables twice, once for statistic "1" and once for statistic "3":
SELECT DISTINCT p.playerFirstName, p.playerLastName
FROM player p JOIN
     player2Statistic p2s1
     on p2s1.playerId = p.playerId and
        p2s1.statisticId = 1 JOIN
     statistic s1
     on s1.statisticId = p2s1.statisticId JOIN
     player2Statistic p2s3
     on p2s3.playerId = p.playerId and
        p2s3.statisticId = 3 JOIN
     statistic s3
     on s3.statisticId = p2s3.statisticId
WHERE p2s1.p2sStatistic > 65 and p2s3.p2sStatistic > 295
You will want to join to the statistics table twice:
SELECT playerFirstName, playerLastName
FROM player p
JOIN player2Statistic s1
on p.playerID=s1.playerID and s1.statisticID = 1
JOIN player2Statistic s3
on p.playerID=s3.playerID and s3.statisticID = 3
WHERE s1.p2sStatistic > 65 and s3.p2sStatistic > 295;
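Another common way to express this kind of relational division is a single join plus a HAVING count over the matching statistics. A sketch using the same column names (COUNT(DISTINCT ...) guards against duplicate rows per statistic):
SELECT p.playerFirstName, p.playerLastName
FROM player AS p
JOIN player2Statistic AS ps ON ps.playerID = p.playerID
WHERE (ps.statisticID = 1 AND ps.p2sStatistic > 65)
   OR (ps.statisticID = 3 AND ps.p2sStatistic > 295)
GROUP BY p.playerID, p.playerFirstName, p.playerLastName
HAVING COUNT(DISTINCT ps.statisticID) = 2;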

Receiving 1 row from joined (1 to many) postgresql

I have this problem:
I have 2 major tables (apartments, tenants) that have a connection of 1 to many (1 apartment, many tenants).
I'm trying to pull all of my building's apartments, but each with only one of its tenants.
The preferred tenant is the one who has ot=2 (there are 2 possible values: 1 or 2).
I tried to use subqueries, but PostgreSQL doesn't let a scalar subquery return more than 1 column.
I don't know how to solve it. Here is my latest code:
SELECT a.apartment_id, a.apartment_num, a.floor, at.app_type_desc_he, tn.otype_desc_he, tn.e_name
FROM
public.apartments a INNER JOIN public.apartment_types at ON
at.app_type_id = a.apartment_type INNER JOIN
(select t.apartment_id, t.building_id, ot.otype_id, ot.otype_desc_he, e.e_name
from public.tenants t INNER JOIN public.ownership_types ot ON
ot.otype_id = t.ownership_type INNER JOIN entities e ON
t.entity_id = e.entity_id
) tn ON
a.apartment_id = tn.apartment_id AND tn.building_id = a.building_id
WHERE
a.building_id = 4 AND tn.building_id=4
ORDER BY
a.apartment_num ASC,
tn.otype_id DESC
Thanks in advance
SELECT a.apartment_id, a.apartment_num, a.floor
,at.app_type_desc_he, tn.otype_desc_he, tn.e_name
FROM public.apartments a
JOIN public.apartment_types at ON at.app_type_id = a.apartment_type
LEFT JOIN LATERAL (
   SELECT ot.otype_id, ot.otype_desc_he, e.e_name
   FROM   public.tenants t
   JOIN   public.ownership_types ot ON ot.otype_id = t.ownership_type
   JOIN   entities e ON t.entity_id = e.entity_id
   WHERE  t.apartment_id = a.apartment_id
   AND    t.building_id  = a.building_id
   ORDER  BY (ot.otype_id = 2) DESC
   LIMIT  1
   ) tn ON true
WHERE a.building_id = 4
ORDER BY a.apartment_num; -- , tn.otype_id DESC -- pointless
The crucial part is the LEFT JOIN LATERAL subquery: it is correlated to each apartment and cut down to a single preferred tenant with ORDER BY (ot.otype_id = 2) DESC LIMIT 1.
This works in either case.
If there are tenants for an apartment, exactly 1 will be returned.
If there is one (or more) tenant of ot.otype_id = 2, it will be one of that type.
If there are no tenants, the apartment is still returned.
If, for ot.otype_id ...
there are 2 possible values: 1 or 2
... you can simplify to:
ORDER BY ot.otype_id DESC
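If LATERAL is not available (PostgreSQL before 9.3), a DISTINCT ON subquery gives the same one-tenant-per-apartment pick. A sketch, untested against your schema, that would replace the LEFT JOIN LATERAL part above:
LEFT JOIN (
   SELECT DISTINCT ON (t.apartment_id, t.building_id)
          t.apartment_id, t.building_id, ot.otype_id
        , ot.otype_desc_he, e.e_name
   FROM   public.tenants t
   JOIN   public.ownership_types ot ON ot.otype_id = t.ownership_type
   JOIN   entities e ON t.entity_id = e.entity_id
   ORDER  BY t.apartment_id, t.building_id, ot.otype_id DESC
   ) tn ON (tn.apartment_id, tn.building_id) = (a.apartment_id, a.building_id)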
Debug query
Try removing the WHERE clauses from the base query and change
JOIN public.apartment_types
to
LEFT JOIN public.apartment_types
and add them back one by one to see which condition excludes all rows.
Do at.app_type_id and a.apartment_type really match?

How do I use rows-as-fields in a SQL database

I've got a SQL related question regarding a general database structure that seems to be somewhat common. I came up with it one day while trying to solve a problem and (later on) I've seen other people do the same thing (or something remarkably similar) so I think the structure itself makes sense. I just have trouble trying to form certain queries against it.
The idea is that you've got a table with "items" in it and you want to store a set of fields and their values for each item. Normally this would be done by simply adding columns to the items table, the problem is that the field(s) themselves (not just the values) vary from item to item. For example, I might have two items:
Article 1
product_id = aproductid
hidden_key = ahiddenkeyvalue
Article 2
product_id = anotherproductid
address = anaddress
You can see that both items have a product_id field (with different values) but the data stored for each item is different.
The structure in the database ends up something like this:
ItemsTable
id
itemdata_1
itemdata_2
...
FieldsTable
id
field_name
...
And the table that relates them and makes it work
FieldsItemRelationsTable
field_id
item_id
value
Well when I'm trying to do something that involves just one "dynamic field" value there's no problem. I usually do something similar to:
SELECT i.* FROM ItemsTable i
INNER JOIN FieldsItemRelationsTable v ON v.item_id = i.id
INNER JOIN FieldsTable f ON f.id = v.field_id
WHERE v.value = 50 AND f.name = 'product_id';
Which selects all items where product_id=50
The problem arises when I need to do something involving multiple "dynamic field" values. Say I want to select all items where product_id = 50 AND hidden_key = 30. Is it possible with a single SQL statement? I've tried:
SELECT i.* FROM ItemsTable i
INNER JOIN FieldsItemRelationsTable v ON v.item_id = i.id
INNER JOIN FieldsTable f ON f.id = v.field_id
WHERE (v.value = 50 AND f.name = 'product_id')
AND (v.value = 30 AND f.name = 'hidden_key');
But it just returns zero rows.
You'll need to do a separate join for each value you are bringing back...
SELECT i.* FROM ItemsTable i
INNER JOIN FieldsItemRelationsTable v ON v.item_id = i.id
INNER JOIN FieldsTable f ON f.id = v.field_id
INNER JOIN FieldsItemRelationsTable v2 ON v2.item_id = i.id
INNER JOIN FieldsTable f2 ON f2.id = v2.field_id
WHERE v.value = 50 AND f.name = 'product_id'
AND (v2.value = 30 AND f2.name = 'hidden_key');
Er, that query might not function exactly as written (a bit of a copy/paste sludge job on my part), but you get the idea: you need the second value held in a second instance of the tables (v2 and f2 in my example here) that is separate from the first instance, so v.value = 50 and v2.value = 30. A single instance with v.value = 50 AND v.value = 30 should never return rows, as nothing will equal 30 and 50 at the same time.
As an afterthought, the query will probably read more easily if you move the WHERE conditions into the JOIN clauses:
SELECT i.* FROM ItemsTable i
INNER JOIN FieldsItemRelationsTable v ON v.item_id = i.id and v.value = 50
INNER JOIN FieldsTable f ON f.id = v.field_id and f.name = 'product_id'
INNER JOIN FieldsItemRelationsTable v2 ON v2.item_id = i.id and v2.value = 30
INNER JOIN FieldsTable f2 ON f2.id = v2.field_id and f2.name = 'hidden_key'
Functionally, both queries should operate the same, though. I'm not sure if there's a practical limit to how many of these joins you can stack; in scheduling systems you'll often see a setup for 'exceptions', and I've got a report query that joins like this 28 times, one for each exception type returned.
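Another way to express it without a join pair per field is to filter on the allowed name/value pairs and require that both matched, using a HAVING count. A sketch against the tables above (it returns the matching item ids, which you can then join back to ItemsTable for the rest of the columns):
SELECT i.id
FROM ItemsTable i
INNER JOIN FieldsItemRelationsTable v ON v.item_id = i.id
INNER JOIN FieldsTable f ON f.id = v.field_id
WHERE (f.name = 'product_id' AND v.value = 50)
   OR (f.name = 'hidden_key' AND v.value = 30)
GROUP BY i.id
HAVING COUNT(DISTINCT f.name) = 2;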
It's called EAV
Some people hate it
There are alternatives (SO)
Sorry to be vague, but I would investigate your options more.
Try doing some left or right joins to see if you get any results; inner joins can return nothing when the fields you join on are NULL.
It's a start.
Don't forget, though: an outer join still pairs rows like any join and can multiply your results; unmatched rows just get NULL-padded instead of dropped.