Let's say there is a table called ITEM with 3 attributes (name, id, price):
name id price
Apple 1 3
Orange 1 3
Banana 2 4
Cherry 3 5
Mango 1 3
How should I write a query that uses a constant selection operator to select the items that share the same price and the same id? The first thing that comes to mind is to use the rename operator to rename id to id' and price to price', then union the result with the ITEM table. But since I need to select on two conditions (price = price' and id = id'), how can I do that without using the conjunction operator in relational algebra?
Thank you.
I'm not quite sure but for me, it would be something like this in relational calculus:
and then in SQL:
SELECT name FROM ITEM i WHERE
EXISTS (SELECT 1 FROM ITEM u
WHERE u.name != i.name
AND u.price = i.price
AND u.id = i.id)
But still, I think your assumption is right, you can still do it by renaming. I do believe it is a bit longer than what I did above.
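The EXISTS query can be checked against the sample ITEM rows from the question. This is a minimal sketch using Python's built-in sqlite3 module; the in-memory database and the variable names are assumptions made for illustration, not part of the original answer.

```python
import sqlite3

# In-memory database holding the sample ITEM rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ITEM (name TEXT, id INTEGER, price INTEGER)")
conn.executemany(
    "INSERT INTO ITEM VALUES (?, ?, ?)",
    [("Apple", 1, 3), ("Orange", 1, 3), ("Banana", 2, 4),
     ("Cherry", 3, 5), ("Mango", 1, 3)],
)

# An item qualifies if some *other* item has the same id and the same price.
rows = conn.execute(
    """
    SELECT i.name
    FROM ITEM i
    WHERE EXISTS (
        SELECT 1 FROM ITEM u
        WHERE u.name != i.name AND u.price = i.price AND u.id = i.id
    )
    ORDER BY i.name
    """
).fetchall()
matching_names = [r[0] for r in rows]
print(matching_names)
```

Banana and Cherry have no partner with the same id and price, so only the three items with id 1 and price 3 are returned.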
Is there a simple way to delete a STRUCT from a nested and repeated field in BigQuery (BQ table column Type: RECORD, Mode: REPEATED)?
Let's say I have the following tables:
wishlist
name toy.id toy.priority
Alice 1 high
2 medium
3 low
Kazik 3 high
1 medium
toys
id name available
1 car 0
2 doll 1
3 bike 1
I'd like to DELETE from wishlist toys that are not available (toys.available==0). In this case, it's toy.id==1.
As a result, the wishlist would look like this:
name toy.id toy.priority
Alice 2 medium
3 low
Kazik 3 high
I know how to select it:
WITH `project.dataset.wishlist` AS
(
SELECT 'Alice' name, [STRUCT<id INT64, priority STRING>(1, 'high'), (2, 'medium'), (3, 'low')] toy UNION ALL
SELECT 'Kazik' name, [STRUCT<id INT64, priority STRING>(3, 'high'), (1, 'medium')]
), toys AS (
SELECT 1 id, 'car' name, 0 available UNION ALL
SELECT 2 id, 'doll' name, 1 available UNION ALL
SELECT 3 id, 'bike' name, 1 available
)
SELECT wl.name, ARRAY_AGG(STRUCT(unnested_toy.id, unnested_toy.priority)) as toy
FROM `project.dataset.wishlist` wl, UNNEST (toy) as unnested_toy
LEFT JOIN toys t ON unnested_toy.id=t.id
WHERE t.available != 0
GROUP BY wl.name
But I don't know how to remove structs <toy.id, toy.priority> from wishlist when toys.available==0.
There are very similar questions, like How to delete/update nested data in bigquery or How to Delete rows from Structure in bigquery, but the answers are either unclear to me in terms of deletion or suggest copying the whole wishlist to a new table using a SELECT statement. My 'wishlist' is huge and 'toys.available' changes often. Copying it seems very inefficient to me.
Could you please suggest a solution aligned with BQ best practices?
Thank you!
... since row Deletion was implemented in BQ, I thought that STRUCT deletion inside a row is also possible.
You can use UPDATE DML for this (not DELETE, which removes whole rows, whereas UPDATE modifies a row in place):
update `project.dataset.wishlist` wl
set toy = ((
select array_agg(struct(unnested_toy.id, unnested_toy.priority))
from unnest(toy) as unnested_toy
left join `project.dataset.toys` t on unnested_toy.id=t.id
where t.available != 0
))
where true;
You can UNNEST() and reaggregate:
SELECT wl.name,
(SELECT ARRAY_AGG(t)
FROM UNNEST(wl.toy) t JOIN
toys
ON toys.id = t.id
WHERE toys.available <> 0
) as available_toys
FROM `project.dataset.wishlist` wl;
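To make the unnest, filter and re-aggregate idea concrete, here is a plain-Python model of what both answers compute per row. It is only a sketch of the semantics, not BigQuery code: the dictionaries stand in for the wishlist and toys tables, and the helper name keep_available is hypothetical.

```python
# Sample data from the question, modelled as plain Python structures.
wishlist = [
    {"name": "Alice", "toy": [{"id": 1, "priority": "high"},
                              {"id": 2, "priority": "medium"},
                              {"id": 3, "priority": "low"}]},
    {"name": "Kazik", "toy": [{"id": 3, "priority": "high"},
                              {"id": 1, "priority": "medium"}]},
]
# toys table reduced to id -> available.
available_by_id = {1: 0, 2: 1, 3: 1}


def keep_available(row):
    """Rebuild the row's toy array, keeping only structs whose toy is available.

    Toys missing from available_by_id are dropped as well, which mirrors the
    join plus "available != 0" filter. One difference from SQL: an empty result
    is [] here, whereas ARRAY_AGG over zero rows yields NULL.
    """
    kept = [t for t in row["toy"] if available_by_id.get(t["id"], 0) != 0]
    return {"name": row["name"], "toy": kept}


filtered = [keep_available(r) for r in wishlist]
for r in filtered:
    print(r["name"], [(t["id"], t["priority"]) for t in r["toy"]])
```

Each row keeps its identity and only its array is rebuilt, which is why UPDATE rather than DELETE is the matching DML statement.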
Let's assume, a table has the following rows
ID Name Value
1 Apple Red
1 Taste Sour
2 Apple Yellow
2 Taste Sweet
3 Apple Red
3 Taste Sour
4 Apple Green
4 Taste Tart
5 Apple Yellow
5 Taste Sweet
How can I select the IDs corresponding to distinct combinations of Apple and Taste? For example, ID=1 corresponds to a red sour apple, so ID=3 can be omitted from the query result. Similarly, ID=2 is a yellow sweet apple, so ID=5 can be excluded, etc. A valid query result can be any of the following ID sets: (1,2,4), (1,4,5), (2,3,4) etc.
The query or the model could be improved with more understanding of the problem.
But assuming the model is correct and the problem is presented as this, this would be my quick approach.
SELECT MIN(a.ID) as ID
FROM Table a
INNER JOIN Table b ON a.ID = b.ID AND a.Name > b.Name
GROUP BY a.Value, b.Value
This query is joining the table with itself using the ID. But because you would have four lines for each possible combination (Ex.: Apple-Apple, Taste-Taste, Apple-Taste and Taste-Apple), you need to state not only that they are different (Because you would still have Apple-Taste and Taste-Apple) but that one of them is bigger than the other (That way you choose to have Apples on one side of the join and Tastes in the other). That's why there is the a.Name > b.Name.
You then group by both values, stating that you don't want more than one row per combination of Apple value and Taste value. This results in only three lines.
The exact SELECT syntax may depend on the RDBMS (I used SQL Server syntax); it selects the lowest ID in each group. Since you don't care which ID is returned, you could choose Min or Max. Min results in lines with 1,2,4. Max would result in 3,4,5.
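The self-join can be tried on the sample rows with Python's built-in sqlite3 module. This is a sketch under assumptions: the table is named fruit here (a hypothetical name, since Table is a reserved word), and the helper pick_ids exists only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fruit (ID INTEGER, Name TEXT, Value TEXT)")
conn.executemany(
    "INSERT INTO fruit VALUES (?, ?, ?)",
    [(1, "Apple", "Red"), (1, "Taste", "Sour"),
     (2, "Apple", "Yellow"), (2, "Taste", "Sweet"),
     (3, "Apple", "Red"), (3, "Taste", "Sour"),
     (4, "Apple", "Green"), (4, "Taste", "Tart"),
     (5, "Apple", "Yellow"), (5, "Taste", "Sweet")],
)


def pick_ids(aggregate):
    """Return one ID per (Taste, Apple) combination using MIN or MAX."""
    assert aggregate in ("MIN", "MAX")  # only these two are interpolated
    sql = f"""
        SELECT {aggregate}(a.ID) AS picked
        FROM fruit a
        INNER JOIN fruit b ON a.ID = b.ID AND a.Name > b.Name
        GROUP BY a.Value, b.Value
    """
    return sorted(row[0] for row in conn.execute(sql))


min_ids = pick_ids("MIN")
max_ids = pick_ids("MAX")
print(min_ids, max_ids)
```

Because 'Taste' sorts after 'Apple', the a side of the join always holds the Taste row and the b side the Apple row, so each ID contributes exactly one joined row.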
I need some help on this one. I have a query that I need to make work but I need to limit it by the results of another query.
SELECT ItemID, ItemNums
FROM dbo.Tables
ItemNums is a varchar field that is used to store the strings of the various item numbers.
This produces the following.
ItemID ItemNums
1 1, 4, 5
2 1, 3, 4, 5
3 2
4 4
5 1
I have another table that has each item number as an INT that I need to use to pull all ItemIDs that have the associated ItemNums
Something like this.
SELECT *
FROM dbo.Tables
WHERE ItemNums IN (4,5)
Any help would be appreciated.
If possible, you should change your database schema. In general, it's not good to store comma delimited lists in a relational database.
However, if that's not an option, here's one way using a join with like:
select *
from dbo.Tables t
join dbo.SecondTable st on ', '+t.ItemNums+',' like '%, '+cast(st.ItemNumId as varchar(20))+',%'
This concatenates commas to the beginning and end of the itemnums to ensure you only match on the specific ids.
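Here is a small sketch of the delimiter-padding trick using Python's built-in sqlite3 module. SQLite concatenates with || rather than +, and the table names item_lists and wanted, plus the extra row with '14, 15', are hypothetical additions used to show that partial numbers do not match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item_lists (ItemID INTEGER, ItemNums TEXT)")
conn.execute("CREATE TABLE wanted (ItemNumId INTEGER)")
conn.executemany(
    "INSERT INTO item_lists VALUES (?, ?)",
    [(1, "1, 4, 5"), (2, "1, 3, 4, 5"), (3, "2"), (4, "4"), (5, "1"),
     (6, "14, 15")],  # hypothetical row: must NOT match 4 or 5
)
conn.executemany("INSERT INTO wanted VALUES (?)", [(4,), (5,)])

# Pad the list to ', 1, 4, 5,' so every number is wrapped in ', ' and ','.
# The integer id is cast to text before it is placed into the LIKE pattern.
rows = conn.execute(
    """
    SELECT DISTINCT t.ItemID
    FROM item_lists t
    JOIN wanted st
      ON ', ' || t.ItemNums || ',' LIKE '%, ' || CAST(st.ItemNumId AS TEXT) || ',%'
    """
).fetchall()
matched_ids = sorted(r[0] for r in rows)
print(matched_ids)
```

Without the padding, a pattern such as '%4%' would also match 14, which is the false positive the surrounding delimiters prevent.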
I personally would recommend normalizing your dbo.tables.
It would be better as:
ItemID ItemNums
1 1
1 4
1 5
2 1
etc.
Then you can use a join or a sub query to pull out the rows with ItemNums in some list.
Otherwise, it's going to be a mess and not very fast.
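As a rough sketch of the normalized layout, the comma-separated strings can be split into one row per (ItemID, ItemNum) pair and then filtered with an ordinary IN. The table name item_nums and the column name ItemNum are assumptions for this example, which uses Python's built-in sqlite3 module.

```python
import sqlite3

# The denormalized rows from the question.
raw = [(1, "1, 4, 5"), (2, "1, 3, 4, 5"), (3, "2"), (4, "4"), (5, "1")]

# Split each list into one (ItemID, ItemNum) pair per number.
# int() tolerates the leading space left over after splitting on ','.
normalized = [(item_id, int(num))
              for item_id, nums in raw
              for num in nums.split(",")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item_nums (ItemID INTEGER, ItemNum INTEGER)")
conn.executemany("INSERT INTO item_nums VALUES (?, ?)", normalized)

# With one number per row, a plain IN (or a join) does the filtering.
rows = conn.execute(
    "SELECT DISTINCT ItemID FROM item_nums WHERE ItemNum IN (4, 5)"
).fetchall()
item_ids = sorted(r[0] for r in rows)
print(item_ids)
```

In this shape the ItemNum column can also be indexed, which a LIKE pattern with a leading wildcard generally cannot take advantage of.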
I have a table called sales. In it there are columns catid, desc, parentforeignkey and the records are something like this:
catid desc parentforeignkey
1, clothes, 1
2, shoes, 1
3, socks, 1
4, gloves, 1
5, mittens, 4
6, leather gloves, 4
7, plain gloves, 4
...
How do I build a query to show this relationship?
I'll take a shot at this, but I'm struggling to see the question. I feel like you want a query that groups the selection by parentforeignkey and then lists the catid and desc for each parent. So basically something like
SELECT t.parentforeignkey, t.catid, t.[desc]
FROM table1 as t
GROUP BY t.parentforeignkey, t.catid, t.[desc];
NOTE: be careful with "desc" as a column name as DESC is a reserved word for descending (used for sorting)
That will give you a result like:
ParentForeignKey | CatID | Desc
1 1 Clothes
1 2 Shoes
2 2 Shoes
3 1 Clothes
So the trick is to use GROUP BY to assign the parent and child groups. Be careful though, because the order of the GROUP BY columns matters (GROUP BY catid, parentforeignkey yields a different result than what I listed above). Also, you need to explicitly say how each column is related to the grouping. If you leave a column out, you'll likely get an error (depending on your DBMS) that says something like "You tried to specify a query that does not include the specified expression as part of the aggregate function"
EDIT: I now see that you've included the DBMS in your question. If you're using BIDS or SSRS then this is supremely easy: you'll want a query that just selects the data (and filters out whatever you want), and then you'll go to the tablix controls and define the parent group of the details as catid, and the parent of catid as parentforeignkey, and then the table should take care of itself!
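As an alternative to grouping (a different technique from the one in this answer), a self-join can list each category next to its parent. This is a sketch under assumptions: it uses the sample rows from the question in an in-memory SQLite database via Python's sqlite3 module, and it treats a row whose parentforeignkey equals its own catid as the root.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "desc" is quoted because DESC is a reserved word.
conn.execute(
    'CREATE TABLE sales (catid INTEGER, "desc" TEXT, parentforeignkey INTEGER)'
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(1, "clothes", 1), (2, "shoes", 1), (3, "socks", 1), (4, "gloves", 1),
     (5, "mittens", 4), (6, "leather gloves", 4), (7, "plain gloves", 4)],
)

# Join each child row (c) to the row its parentforeignkey points at (p).
# The root is skipped because it points at itself (assumption noted above).
pairs = conn.execute(
    """
    SELECT p."desc" AS parent, c."desc" AS child
    FROM sales c
    JOIN sales p ON c.parentforeignkey = p.catid
    WHERE c.catid != p.catid
    ORDER BY p.catid, c.catid
    """
).fetchall()
for parent, child in pairs:
    print(parent, "->", child)
```

This shows one level of the relationship per row; deeper trees would need repeated joins or a recursive query.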
I've got two tables in SQL, one with a project and one with categories that projects belong to, i.e. the JOIN would look roughly like:
Project | Category
--------+---------
Foo | Apple
Foo | Banana
Foo | Carrot
Bar | Apple
Bar | Carrot
Qux | Apple
Qux | Banana
(Strings replaced with IDs from a higher normal form, obviously, but you get the point here.)
What I want to do is allow filtering such that users can select any number of categories and results will be filtered to items that are members of all the selected categories. For example, if a user selects categories "Apple" and "Banana", projects "Foo" and "Qux" show up. If a user select categories "Apple", "Banana", and "Carrot" then only the "Foo" project shows up.
The first thing I tried was a simple SELECT DISTINCT Project FROM ... WHERE Category = 'Apple' AND Category = 'Banana', but of course that doesn't work since Apple and Banana show up in the same column in two different rows for any common project.
GROUP BY and HAVING don't do me any good, so tell me: is there an obvious way to do this that I'm missing, or is it really so complicated that I'm going to have to resort to recursive joins?
This is in PostgreSQL, by the way, but of course standard SQL code is always preferable when possible.
See this article in my blog for performance details:
PostgreSQL: selecting items that belong to all categories
The solution below:
Works on any number of categories
Is more efficient than COUNT and GROUP BY, since it checks existence of any project / category pair exactly once, without counting.
SELECT *
FROM (
SELECT DISTINCT Project
FROM mytable
) mo
WHERE NOT EXISTS
(
SELECT NULL
FROM (
SELECT 'Apple' AS Category
UNION ALL
SELECT 'Banana'
UNION ALL
SELECT 'Carrot'
) list
WHERE NOT EXISTS
(
SELECT NULL
FROM mytable mii
WHERE mii.Project = mo.Project
AND mii.Category = list.Category
)
)
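The double NOT EXISTS pattern can be exercised on the sample data with Python's built-in sqlite3 module. This is a sketch, not the blog's benchmark: the helper name projects_with_all is hypothetical, and the category list is built from bound parameters instead of literals (at least one category is assumed).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (Project TEXT, Category TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [("Foo", "Apple"), ("Foo", "Banana"), ("Foo", "Carrot"),
     ("Bar", "Apple"), ("Bar", "Carrot"),
     ("Qux", "Apple"), ("Qux", "Banana")],
)


def projects_with_all(categories):
    """Projects for which no requested category is missing."""
    # One "SELECT ?" per requested category, glued with UNION ALL.
    wanted = " UNION ALL ".join("SELECT ? AS Category" for _ in categories)
    sql = f"""
        SELECT mo.Project
        FROM (SELECT DISTINCT Project FROM mytable) mo
        WHERE NOT EXISTS (
            SELECT NULL
            FROM ({wanted}) list
            WHERE NOT EXISTS (
                SELECT NULL
                FROM mytable mii
                WHERE mii.Project = mo.Project
                  AND mii.Category = list.Category
            )
        )
        ORDER BY mo.Project
    """
    return [row[0] for row in conn.execute(sql, list(categories))]


two = projects_with_all(["Apple", "Banana"])
three = projects_with_all(["Apple", "Banana", "Carrot"])
print(two, three)
```

Read from the inside out: a project is kept when there is no category in the list for which the project has no matching row.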
Since a project can only be in a category once, we can use COUNT to pull this stunt off:
SELECT project, COUNT(category) AS cat_count
FROM /* your join */
WHERE category IN ('Apple', 'Banana')
GROUP BY project
HAVING COUNT(category) = 2
A project with a category of only apple or banana will get a count of 1, and thus fail the HAVING clause. Only a project with both categories will get a count of 2.
If for some reason you have duplicate categories, you can use something like COUNT(DISTINCT category). COUNT(*) should work as well, and differs only if category can be null.
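The counting approach generalizes to any number of selected categories by comparing the count against the size of the selection. A minimal sketch with Python's built-in sqlite3 module follows; the table name mytable and the helper projects_matching_all are assumptions, and COUNT(DISTINCT ...) is used so duplicate rows would not inflate the count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (Project TEXT, Category TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [("Foo", "Apple"), ("Foo", "Banana"), ("Foo", "Carrot"),
     ("Bar", "Apple"), ("Bar", "Carrot"),
     ("Qux", "Apple"), ("Qux", "Banana")],
)


def projects_matching_all(categories):
    """Projects that belong to every category in the selection."""
    wanted = sorted(set(categories))  # de-duplicate the user's selection
    placeholders = ", ".join("?" for _ in wanted)
    sql = f"""
        SELECT Project
        FROM mytable
        WHERE Category IN ({placeholders})
        GROUP BY Project
        HAVING COUNT(DISTINCT Category) = ?
        ORDER BY Project
    """
    # The final parameter is the number of categories that must all be present.
    return [row[0] for row in conn.execute(sql, [*wanted, len(wanted)])]


apple_banana = projects_matching_all(["Apple", "Banana"])
all_three = projects_matching_all(["Apple", "Banana", "Carrot"])
print(apple_banana, all_three)
```

The aggregate is repeated in HAVING rather than referenced through an alias, since not every database (PostgreSQL included) accepts a select-list alias there.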
One other solution is, of course, something like "SELECT DISTINCT Project FROM ... AS a WHERE 'Apple' IN (SELECT Category FROM ... AS b WHERE a.Project = b.Project) AND 'Banana' IN (SELECT Category FROM ... AS b WHERE a.Project = b.Project)", but that gets pretty computationally expensive pretty quickly. I was hoping for something more elegant, and you guys haven't disappointed. I'm including this one mostly for completeness in case someone else consults this question. It's clearly worth zero points. :)