Using nested queries: SQL

I am trying to store the following data in a table where the visible_to is a multivalued attribute.
Wall_ID | Facebook_ID | Visible_To
--------+-------------+------------
W1      | F1          | F2,F3,F4
W2      | F2          | F1
W3      | F3          | F1
W4      | F4          | F1
I am trying to emulate Facebook on Oracle. I want to find the user who can view the maximum number of other users' walls (here: F1).
I have gotten to the point of storing the multivalued attribute using NESTED table in Oracle 11g. Do I have to un-nest the table to find the result of the query or is there another way to do it?
Thanks!

If I'm understanding what you're trying to do correctly, then something like this should work (Please see the SQL Fiddle):
SELECT *
FROM (
    SELECT f.wall_id
         , f.facebook_id
         , COUNT(t.COLUMN_VALUE) visible_to_count
    FROM facebook_data f
    CROSS JOIN TABLE(f.visible_to) t
    GROUP BY f.wall_id, f.facebook_id
    ORDER BY visible_to_count DESC
)
WHERE ROWNUM = 1
Results:
| WALL_ID | FACEBOOK_ID | VISIBLE_TO_COUNT |
--------------------------------------------
| W1      | F1          | 3                |
This simply unnests the nested table and then aggregates so you can count the number of values. You might also want to add the DISTINCT keyword inside COUNT() if you're likely to store duplicate values, or if other parts of the query could introduce duplicate rows.
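For reference, a minimal sketch of the kind of schema the query above assumes; the type name and storage clause here are guesses based on the question, not taken from it:

-- A nested table type holds the multivalued Visible_To attribute
CREATE TYPE visible_to_list AS TABLE OF VARCHAR2(10);
/
CREATE TABLE facebook_data (
    wall_id     VARCHAR2(10),
    facebook_id VARCHAR2(10),
    visible_to  visible_to_list
) NESTED TABLE visible_to STORE AS visible_to_tab;

-- Rows are inserted using the collection constructor
INSERT INTO facebook_data VALUES ('W1', 'F1', visible_to_list('F2', 'F3', 'F4'));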

Related

Get total count and first 3 columns

I have the following SQL query:
SELECT TOP 3 accounts.username
,COUNT(accounts.username) AS count
FROM relationships
JOIN accounts ON relationships.account = accounts.id
WHERE relationships.following = 4
AND relationships.account IN (
SELECT relationships.following
FROM relationships
WHERE relationships.account = 8
);
I want to return the total count of accounts.username and the first 3 accounts.username values (in no particular order). Unfortunately accounts.username and COUNT(accounts.username) cannot coexist; the query works fine when removing either one of them. I don't want to send the request twice with different select bodies. The count could reach 1000+, so I would prefer to calculate it in SQL rather than in code.
The current query returns the error Column 'accounts.username' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause., which has not led me anywhere. This is different from other questions because I do not want to use the GROUP BY clause. Is there a way to do this with FOR JSON AUTO?
The desired output could be:
+-------+----------+
| count | username |
+-------+----------+
| 1551 | simon1 |
| 1551 | simon2 |
| 1551 | simon3 |
+-------+----------+
or
+----------------------------------------------------------------+
| JSON_F52E2B61-18A1-11d1-B105-00805F49916B |
+----------------------------------------------------------------+
| [{"count": 1551, "usernames": ["simon1", "simon2", "simon3"]}] |
+----------------------------------------------------------------+
If you want to display the total count of rows that satisfy the filter conditions (and where username is not null) in an additional column in your resultset, then you could use window functions:
SELECT TOP 3
       a.username,
       COUNT(a.username) OVER() AS cnt
FROM relationships r
JOIN accounts a ON r.account = a.id
WHERE r.following = 4
  AND EXISTS (
      SELECT 1 FROM relationships r1
      WHERE r1.account = 8 AND r1.following = r.account
  );
Side notes:
- if username is not nullable, use COUNT(*) rather than COUNT(a.username): it is more efficient, since the database does not need to check every value for nullity
- table aliases make the query easier to write, read and maintain
- I usually prefer EXISTS over IN (but here this is mostly a matter of taste, as both techniques should work fine for your use case)
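Since the question also asks about FOR JSON (and the error message indicates SQL Server), a hedged sketch: FOR JSON PATH simply serializes the result set of a query, so it can be appended to the windowed query above unchanged (assuming SQL Server 2016 or later):

SELECT TOP 3
       a.username,
       COUNT(a.username) OVER() AS [count]
FROM relationships r
JOIN accounts a ON r.account = a.id
WHERE r.following = 4
  AND EXISTS (
      SELECT 1 FROM relationships r1
      WHERE r1.account = 8 AND r1.following = r.account
  )
FOR JSON PATH;
-- produces e.g. [{"username":"simon1","count":1551}, ...]

This yields the count repeated per row rather than the nested usernames array shown in the question; producing that exact shape would require additional string aggregation.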

Select conditionally bound columns from 2 tables into one line

I searched, but it seems I haven't found the right keyword to describe what I am trying to achieve, so please be lenient if this is a known problem and just point me to the right keywords.
I have following tables/entries:
select * from personnes where id=66;
 id | aclid | referendid | login          | validated | passwd
----+-------+------------+----------------+-----------+--------------------------------------------------------------
 66 |       |            | toto#tiiti.com | f         | $2y$10$w3DRh/g2Tebu/mkMcQz32OUB.dDjFiBP99vWlMrrPWpR45JZDdw4W
and
select * from pattributs where (name='nom' OR name='prenom') AND persid=66;
 id | name   | value | persid
----+--------+-------+--------
 90 | prenom | Jean  | 66
 91 | nom    | Meyer | 66
I use this form to avoid cluttering the main table, since depending on the case I may or may not record the name.
But having a view that presents the completed table would be nice, so I tried:
select (personnes."id","login",
(select "value" from pattributs where "name"='nom' AND "persid"=66),
(select "value" from pattributs where "name"='prenom' AND "persid"=66)
) from personnes where personnes.id=66;
which seems to do the job:
row
--------------------------------
(66,toto#tiiti.com,Meyer,Jean)
but the column labels disappeared, and being able to fetch them from the invoking PHP script is immensely useful. However, when I add:
select (personnes."id","login",
(select "value" from pattributs where "name"='nom' AND "persid"=66),
(select "value" from pattributs where "name"='prenom' AND "persid"=66) as 'prenom')
from personnes where personnes.id=66;
I get a syntax error at the AS directive. So I probably haven't understood how to do this properly (the parentheses indicate that this is no longer in tabular form). How can I achieve the following result:
 id | login          | nom   | prenom
----+----------------+-------+--------
 66 | toto#tiiti.com | Meyer | Jean
The idea being to store a suitable view for each use case, bundling only the relevant columns.
Thanks in advance
To answer your question: you are losing the column names because the parentheses create a composite value, a single row, rather than a list of columns. Without them you should get your expected result.
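For illustration, a hedged sketch of the original subquery approach without the row constructor; note that in PostgreSQL a column alias must be an identifier (double-quoted if quoted at all), so as 'prenom' with single quotes is a syntax error:

SELECT pe.id,
       pe.login,
       (SELECT "value" FROM pattributs WHERE "name" = 'nom'    AND "persid" = 66) AS nom,
       (SELECT "value" FROM pattributs WHERE "name" = 'prenom' AND "persid" = 66) AS prenom
FROM personnes pe
WHERE pe.id = 66;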
But your approach is not ideal: you should avoid computing each column in its own subquery. This can be done easily in one SELECT:
select
    pe.id,
    pe.login,
    MIN(pa.value) FILTER (WHERE pa.name = 'nom') as nom,
    MIN(pa.value) FILTER (WHERE pa.name = 'prenom') as prenom
from
    personnes pe
    join pattributs pa ON pe.id = pa.persid AND pe.id = 66
where pa.name = 'nom' or pa.name = 'prenom'
group by pe.id, pe.login
First you'll need a JOIN to bring the matching rows of both tables together; join on the id. Then you have the problem that there are two rows for the name attributes (which does not seem well designed; why not two columns?). These two values can be grouped by the id and then aggregated.
What I'm doing is "aggregating" them (it doesn't matter which function is used, because there should be only one value per group); the FILTER clause picks out the right value.
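Since the stated goal was "a suitable view for each use case", the query above can be wrapped in a view. A hedged sketch; the view name personnes_completes is hypothetical, and the fixed id filter is dropped so the view covers all persons:

CREATE VIEW personnes_completes AS
SELECT pe.id,
       pe.login,
       MIN(pa.value) FILTER (WHERE pa.name = 'nom') AS nom,
       MIN(pa.value) FILTER (WHERE pa.name = 'prenom') AS prenom
FROM personnes pe
JOIN pattributs pa ON pe.id = pa.persid
WHERE pa.name IN ('nom', 'prenom')
GROUP BY pe.id, pe.login;

-- then simply: SELECT * FROM personnes_completes WHERE id = 66;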

Hibernate @Formula recursive query with matching table

I have a table with a ManyToMany relation with itself.
So there are two tables in my H2-Database:
supporting_asset:         supporting_asset_dependencies:
 id | provided_csc         dependencies_id | supporting_assets_id
----+--------------       -----------------+----------------------
  1 | A1                                 1 | 2
  2 | A3                                 1 | 3
  3 | A2
I have a calculated attribute 'minCSC', and to calculate it I use the @Formula annotation:
@Formula(value="(Select min(sa.provided_csc) from supporting_asset_dependencies sad right join supporting_asset sa on sa.id = sad.dependencies_id where sad.supporting_assets_id = id group by id)")
This works fine, but depending assets can have dependencies of their own, which gives a multilevel dependency tree. My goal is to get the minimum csc from this tree.
I tried:
WITH RECURSIVE tree(id, provided_csc, dep_id)
AS (SELECT sa.id, sa.provided_csc, sad.supporting_assets_id
FROM supporting_asset_dependencies AS sad
JOIN supporting_asset AS sa
ON sa.id=sad.dependencies_id
WHERE sad.supporting_assets_id=2
UNION ALL
SELECT child.id, child.provided_csc, childd.supporting_assets_id
FROM supporting_asset_dependencies AS childd
JOIN supporting_asset AS child
ON child.id=childd.dependencies_id
JOIN tree ON childd.supporting_assets_id=tree.id )
SELECT min(provided_csc)
FROM tree
but I get a 'Syntax error in SQL statement'. It seems like the formula is computed into:
(SUPPORTING2_.WITH[*] TREE(SUPPORTING2_.ID, SUPPORTING2_.PROVIDED_CSC, SUPPORTING2_.DEP_ID) AS ( ..
.. ) SELECT MIN(SUPPORTING2_.PROVIDED_CSC) FROM TREE) AS FORMULA0_1_,
It looks like it does not recognize the WITH RECURSIVE command and tries to resolve it as a field of the table.
How do I have to change the query to make it work or is there another way to achieve what I want?
---UPDATE---
I changed the query a little bit and on DB Fiddle it works for a SQLite DB. It also seems to work in the h2 web-console.
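One commonly suggested workaround, since @Formula only accepts a plain subselect and Hibernate qualifies bare identifiers with the entity alias (which is exactly what mangles the WITH keyword above): hide the recursion behind a database view and let the formula query the view. This is only a sketch; the view name asset_min_csc is hypothetical, and it assumes your database accepts a recursive CTE inside a view definition:

CREATE VIEW asset_min_csc AS
WITH RECURSIVE tree(root_id, id, provided_csc) AS (
    SELECT sad.supporting_assets_id, sa.id, sa.provided_csc
    FROM supporting_asset_dependencies sad
    JOIN supporting_asset sa ON sa.id = sad.dependencies_id
    UNION ALL
    SELECT tree.root_id, child.id, child.provided_csc
    FROM supporting_asset_dependencies childd
    JOIN supporting_asset child ON child.id = childd.dependencies_id
    JOIN tree ON childd.supporting_assets_id = tree.id
)
SELECT root_id, MIN(provided_csc) AS min_csc
FROM tree
GROUP BY root_id;

-- the formula then reduces to a plain subselect:
-- @Formula("(SELECT v.min_csc FROM asset_min_csc v WHERE v.root_id = id)")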

Access join on first record

I have two tables in an Access database, tblProducts and tblProductGroups.
I am trying to run a query that joins both of these tables and brings back a single record for each product. The problem is that the current design allows a product to be listed in the tblProductGroups table more than once, i.e. a product can be a member of more than one group (I didn't design this!).
The query is this:
select tblProducts.intID, tblProducts.strTitle, tblProductGroups.intGroup
from tblProducts
inner join tblProductGroups on tblProducts.intID = tblProductGroups.intProduct
where tblProductGroups.intGroup = 56
and tblProducts.blnActive
order by tblProducts.intSort asc, tblProducts.curPrice asc
At the moment this returns results such as:
intID | strTitle  | intGroup
    1 | Product 1 | 1
    1 | Product 1 | 2
    2 | Product 2 | 1
    2 | Product 2 | 2
Whereas I only want the join to be based on the first matching record, so that would return:
intID | strTitle  | intGroup
    1 | Product 1 | 1
    2 | Product 2 | 1
Is this possible in Access?
Thanks in advance
Al
This option runs a subquery to find the minimum intGroup for each tblProducts.intID.
SELECT tblProducts.intID
, tblProducts.strTitle
, (SELECT TOP 1 intGroup
FROM tblProductGroups
WHERE intProduct=tblProducts.intID
ORDER BY intGroup ASC) AS intGroup
FROM tblProducts
WHERE tblProducts.blnActive
ORDER BY tblProducts.intSort ASC, tblProducts.curPrice ASC
This works for me. Maybe this helps someone:
SELECT
a.Lagerort_ID,
FIRST(a.Regal) AS frstRegal,
FIRST(a.Fachboden) AS frstFachboden,
FIRST(a.xOffset) AS frstxOffset,
FIRST(a.yOffset) AS frstyOffset,
FIRST(a.xSize) AS frstxSize,
FIRST(a.ySize) AS frstySize,
FIRST(a.Platzgr) AS frstyPlatzgr,
FIRST(b.Artikel_ID) AS frstArtikel_ID,
FIRST(b.Menge) AS frstMenge,
FIRST(c.Breite) AS frstBreite,
FIRST(c.Tiefe) AS frstTiefe,
FIRST(a.Fachboden_ID) AS frstFachboden_ID,
FIRST(b.BewegungsDatum) AS frstBewegungsDatum,
FIRST(b.ErzeugungsDatum) AS frstErzeugungsDatum
FROM ((Lagerort AS a)
LEFT JOIN LO_zu_ART AS b ON a.Lagerort_ID = b.Lagerort_ID)
LEFT JOIN Regal AS c ON a.Regal = c.Regal
GROUP BY a.Lagerort_ID
ORDER BY FIRST(a.Regal), FIRST(a.Fachboden), FIRST(a.xOffset), FIRST(a.yOffset);
I have non-unique entries for Lagerort_ID in the table LO_zu_ART. My goal was to use only the first matching entry from LO_zu_ART for each Lagerort.
The trick is to use FIRST() on any column but the grouped one. This may also work with MIN() or MAX(), but I have not tested it.
Also make sure to give each field an "AS" alias different from the original field name; I used frstFIELDNAME. This is important, otherwise I got errors.
Create a new query, qryFirstGroupPerProduct:
SELECT intProduct, Min(intGroup) AS lowest_group
FROM tblProductGroups
GROUP BY intProduct;
Then JOIN qryFirstGroupPerProduct (instead of tblProductsGroups) to tblProducts.
Or you could do it as a subquery instead of a separate saved query, if you prefer.
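For completeness, a hedged sketch of that second step, using the column names from the question:

SELECT p.intID, p.strTitle, q.lowest_group AS intGroup
FROM tblProducts AS p
INNER JOIN qryFirstGroupPerProduct AS q ON p.intID = q.intProduct
WHERE p.blnActive
ORDER BY p.intSort ASC, p.curPrice ASC;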
It's not very optimal, but if you're bringing in a few thousand records this will work:
Create a query that gets the max of tblProducts.intID from one table and call it qry_Temp.
Create another query and join qry_temp to the table you are trying to join against, and you should get your results.

SQL magic - query shouldn't take 15 hours, but it does

OK, so I have one really monstrous MySQL table (900k records, 180 MB total), and I want to extract from each subgroup the records with the highest date_updated and calculate a weighted average in each group. The calculation runs for ~15 hours, and I have a strong feeling I'm doing it wrong.
First, monstrous table layout:
category
element_id
date_updated
value
weight
source_prefix
source_name
The only key here is on element_id (BTREE, ~8k unique elements).
And the calculation process:
Make a hash for each group and subgroup:
CREATE TEMPORARY TABLE `temp1` (INDEX ( `ds_hash` ))
SELECT `category`,
`element_id`,
`source_prefix`,
`source_name`,
`date_updated`,
`value`,
`weight`,
MD5(CONCAT(`category`, `element_id`, `source_prefix`, `source_name`)) AS `subcat_hash`,
MD5(CONCAT(`category`, `element_id`, `date_updated`)) AS `cat_hash`
FROM `bigbigtable` WHERE `date_updated` <= '2009-04-28'
I really don't understand this fuss with hashes, but it worked faster this way. Dark magic, I presume.
Find the maximum date for each subgroup:
CREATE TEMPORARY TABLE `temp2` (INDEX ( `subcat_hash` ))
SELECT MAX(`date_updated`) AS `maxdate` , `subcat_hash`
FROM `temp1`
GROUP BY `subcat_hash`;
Join temp1 with temp2 to find the weighted average values for the categories:
CREATE TEMPORARY TABLE `valuebycats` (INDEX ( `category` ))
SELECT `temp1`.`element_id`,
`temp1`.`category`,
`temp1`.`source_prefix`,
`temp1`.`source_name`,
`temp1`.`date_updated`,
AVG(`temp1`.`value`) AS `avg_value`,
SUM(`temp1`.`value` * `temp1`.`weight`) / SUM(`weight`) AS `rating`
FROM `temp1` LEFT JOIN `temp2` ON `temp1`.`subcat_hash` = `temp2`.`subcat_hash`
WHERE `temp2`.`subcat_hash` = `temp1`.`subcat_hash`
AND `temp1`.`date_updated` = `temp2`.`maxdate`
GROUP BY `temp1`.`cat_hash`;
(Now that I've looked through it and written it all down, it seems to me that I should use an INNER JOIN in that last query, to avoid a 900k × 900k temp table.)
Still, is there a normal way to do so?
UPD: some picture for reference:
removed dead ImageShack link
UPD: EXPLAIN for proposed solution:
+----+-------------+-------+------+---------------+------------+---------+--------------------------------------------------------------------------------------+--------+----------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------+---------------+------------+---------+--------------------------------------------------------------------------------------+--------+----------+----------------------------------------------+
| 1 | SIMPLE | cur | ALL | NULL | NULL | NULL | NULL | 893085 | 100.00 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | next | ref | prefix | prefix | 1074 | bigbigtable.cur.source_prefix,bigbigtable.cur.source_name,bigbigtable.cur.element_id | 1 | 100.00 | Using where |
+----+-------------+-------+------+---------------+------------+---------+--------------------------------------------------------------------------------------+--------+----------+----------------------------------------------+
Using hashes is one of the ways in which a database engine can execute a join. It should be very rare that you'd have to write your own hash-based join, and this certainly doesn't look like one of those cases: a 900k-row table with some aggregates.
Based on your comment, this query might do what you are looking for:
SELECT cur.source_prefix,
       cur.source_name,
       cur.category,
       cur.element_id,
       MAX(cur.date_updated) AS DateUpdated,
       AVG(cur.value) AS AvgValue,
       SUM(cur.value * cur.weight) / SUM(cur.weight) AS Rating
FROM eev0 cur
LEFT JOIN eev0 next
    ON next.date_updated < '2009-05-01'
   AND next.source_prefix = cur.source_prefix
   AND next.source_name = cur.source_name
   AND next.element_id = cur.element_id
   AND next.date_updated > cur.date_updated
WHERE cur.date_updated < '2009-05-01'
  AND next.category IS NULL
GROUP BY cur.source_prefix, cur.source_name,
         cur.category, cur.element_id
The GROUP BY performs the calculations per source+category+element.
The JOIN is there to filter out old entries. It looks for later entries, and then the WHERE statement filters out the rows for which a later entry exists. A join like this benefits from an index on (source_prefix, source_name, element_id, date_updated).
There are many ways of filtering out old entries, but this one tends to perform reasonably well.
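For reference, a hedged sketch of the index mentioned above (eev0 is the table name used in the proposed query; the index name is arbitrary):

CREATE INDEX idx_latest_entry
    ON eev0 (source_prefix, source_name, element_id, date_updated);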
OK, so 900K rows isn't a massive table; it's reasonably big, but your queries really shouldn't be taking that long.
First things first, which of the 3 statements above is taking the most time?
The first problem I see is with your first query: your WHERE clause doesn't filter on an indexed column, which means it has to do a full scan of the entire table.
Create an index on the date_updated column, then run the query again and see what that does for you.
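A hedged sketch of that index (the index name is arbitrary; the table name is taken from the question):

CREATE INDEX idx_date_updated ON bigbigtable (date_updated);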
If you don't need the hashes and are only using them to avail of the dark magic, then remove them completely.
Edit: Someone with more SQL-fu than me will probably reduce your whole set of logic into one SQL statement without the use of the temporary tables.
Edit: My SQL is a little rusty, but are you joining twice in the third SQL statement? Maybe it won't make a difference, but shouldn't it be:
SELECT temp1.element_id,
       temp1.category,
       temp1.source_prefix,
       temp1.source_name,
       temp1.date_updated,
       AVG(temp1.value) AS avg_value,
       SUM(temp1.value * temp1.weight) / SUM(weight) AS rating
FROM temp1
LEFT JOIN temp2 ON temp1.subcat_hash = temp2.subcat_hash
WHERE temp1.date_updated = temp2.maxdate
GROUP BY temp1.cat_hash;
or
SELECT temp1.element_id,
       temp1.category,
       temp1.source_prefix,
       temp1.source_name,
       temp1.date_updated,
       AVG(temp1.value) AS avg_value,
       SUM(temp1.value * temp1.weight) / SUM(weight) AS rating
FROM temp1, temp2
WHERE temp2.subcat_hash = temp1.subcat_hash
  AND temp1.date_updated = temp2.maxdate
GROUP BY temp1.cat_hash;