How can I do a group-concat call with a max value? - sql

I'm tracking game prices across multiple stores. I have a games table:
id | title | platform_id
---|-------------|-----------
1 | Super Mario | 1
2 | Tetris | 3
3 | Sonic | 2
a stores table:
id | title
---|-------------
1 | Target
2 | Amazon
3 | EB Games
and a copies table with one entry for Target's copy of a given game, one entry for Amazon's, etc. I store the SKU so I can use it when scraping their websites.
game_id | store_id | sku
--------|----------|----------
1 | 2 | AMZ-3F4YK
1 | 3 | 001481
I run one scrape a day or a week or however long, and I store the result as cents in a prices table:
sku | price | time
----------|---------|------
AMZ-3F4YK | 4010 | 13811101
001481 | 3210 | 13811105
Plus a platforms table that just maps IDs to names.
Here's where I get confused and stuck.
I want to issue a query that selects each game, plus its most recent price at each store. So it would net results like
games.title | platform_name | info
------------|---------------|------
Super Mario | NES | EB Games,1050;Amazon,3720;Target,5995
Tetris | Game Boy | EB Games,3720;Amazon,410;Target,5995
My best attempt thus far is
select
games.title as title,
platforms.name as platform,
group_concat(distinct(stores.name) || "~" || prices.price) as price_info
from games
join platforms on games.platform_id = platforms.id
join copies on copies.game_id = games.id
join prices on prices.sku = copies.sku
join stores on stores.id = copies.store_id
group by title
Which nets results like
Super Mario | NES | EB Games~2300,Target~2300,Target~3800
that is, it includes every price listed, when I only want one per store (and for it to be the most recent). Figuring out how to integrate the 'select price where id = (select id from max(time)...' etc subquery to sort this out has totally stumped me all night and I'd appreciate any advice anyone could offer me.
I'm using SQLite, but if there's a better option in Postgres I could do it there.

You need two levels of aggregation: first pick the most recent price for each game and store, then concatenate the results per game. Postgres makes this much simpler, so I'll use Postgres syntax:
select title, platform,
string_agg(store || '~' || price, ',' order by store) as price_info
from (select distinct on (g.title, p.name, s.name) g.title as title, p.name as platform, s.name as store, pr.price
from games g join
platforms p
on g.platform_id = p.id join
copies c
on c.game_id = g.id join
prices pr
on pr.sku = c.sku join
stores s
on s.id = c.store_id
order by g.title, p.name, s.name, pr.time desc
) gps
group by title, platform
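Since the question mentions SQLite, here is a minimal sketch of the same two-level idea there, run through Python's sqlite3 module. SQLite has no distinct on, so this sketch swaps in a correlated max(time) subquery to keep the latest row per SKU. The table layout follows the asker's query (including stores.name); the second scrape per SKU is made-up data for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table platforms (id integer primary key, name text);
create table games (id integer primary key, title text, platform_id integer);
create table stores (id integer primary key, name text);
create table copies (game_id integer, store_id integer, sku text);
create table prices (sku text, price integer, time integer);

insert into platforms values (1, 'NES');
insert into games values (1, 'Super Mario', 1);
insert into stores values (2, 'Amazon'), (3, 'EB Games');
insert into copies values (1, 2, 'AMZ-3F4YK'), (1, 3, '001481');
-- Two scrapes per SKU (the later ones are invented); only the later should survive.
insert into prices values
  ('AMZ-3F4YK', 4010, 13811101), ('AMZ-3F4YK', 3720, 13811201),
  ('001481',    3210, 13811105), ('001481',    1050, 13811205);
""")

query = """
select title, platform, group_concat(store || '~' || price, ';') as price_info
from (select g.title as title, p.name as platform, s.name as store, pr.price as price
      from games g
      join platforms p on g.platform_id = p.id
      join copies c on c.game_id = g.id
      join stores s on s.id = c.store_id
      join prices pr on pr.sku = c.sku
      -- inner level: keep only the most recent scrape for each SKU
      where pr.time = (select max(p2.time) from prices p2 where p2.sku = pr.sku)
     ) latest
group by title, platform
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The order of entries inside group_concat is not guaranteed here, so sort the pieces in application code if the order matters.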

Related

SQL Server avoid repeat same joins

I'm running the query below, where I repeat the same joins multiple times. Is there a better way to do it? (SQL Server Azure)
Ex.
Table: [Customer]
[Id_Customer] | [CustomerName]
1 | Tomy
...
Table: [Store]
[Id_Store] | [StoreName]
1 | SuperMarket
2 | BestPrice
...
Table: [SalesFrutes]
[Id_SalesFrutes] | [FruteName] | [Fk_Id_Customer] | [Fk_Id_Store]
1 | Orange | 1 | 1
...
Table: [SalesVegetable]
[Id_SalesVegetable] | [VegetableName] | [Fk_Id_Customer] | [Fk_Id_Store]
1 | Pea | 1 | 2
...
Select * From [Customer] as C
left join [SalesFrutes] as SF on SF.[Fk_Id_Customer] = C.[Id_Customer]
left join [SalesVegetable] as SV on SV.[Fk_Id_Customer] = C.[Id_Customer]
left join [Store] as S1 on S1.[Id_Store] = SF.[Fk_Id_Store]
left join [Store] as S2 on S2.[Id_Store] = SV.[Fk_Id_Store]
In my real case, I have many [Sales...] tables to join with [Customer] and many other tables similar to [Store] to join to each [Sales...] table, so the number of repeated joins grows quickly. Is there a better way to do it?
Bonus question: I would also like FruteName and VegetableName to appear under one column, together with the store name and the name of the food table each row came from.
The Expected Result is:
[CustomerName] | [FoodName] | [SalesTableName] | [StoreName]
Tomy | Orange | SalesFrute | SuperMarket
Tomy | Pea | SalesVegetable | BestPrice
...
Thank you!!
Based on the information provided, I would suggest using a CTE to "fix" the data model, which makes the query easier to write.
You say your real-world scenario differs from the example, so this might not fit exactly. It can still apply if the tables share, say, 80% of their columns: use placeholder or null values where a column is missing so the data sets can be unioned, and you still join each lookup table (such as Store) only once.
with allSales as (
select Id_SalesFrutes as Id, FruteName as FoodName, 'Fruit' as SaleType, Fk_Id_customer as Id_customer, Fk_Id_Store as Id_Store
from SalesFrutes
union all
select Id_SalesVegetable, VegetableName, 'Vegetable', Fk_Id_customer, Fk_Id_Store
from SalesVegetable
-- union all ... etc, one select per remaining Sales table
)
select c.CustomerName, s.FoodName, s.SaleType, st.StoreName
from Customer c
join allSales s on s.Id_customer=c.Id_customer
join Store st on st.Id_Store=s.Id_Store
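As a rough check of the CTE approach, the sketch below runs it against the sample rows from the question. It uses SQLite through Python's sqlite3 module as a stand-in for SQL Server (an assumption; the CTE and union all syntax used here is the same in both), and it labels each row with the source table name to match the expected result in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table Customer (Id_Customer integer, CustomerName text);
create table Store (Id_Store integer, StoreName text);
create table SalesFrutes (Id_SalesFrutes integer, FruteName text,
                          Fk_Id_Customer integer, Fk_Id_Store integer);
create table SalesVegetable (Id_SalesVegetable integer, VegetableName text,
                             Fk_Id_Customer integer, Fk_Id_Store integer);

insert into Customer values (1, 'Tomy');
insert into Store values (1, 'SuperMarket'), (2, 'BestPrice');
insert into SalesFrutes values (1, 'Orange', 1, 1);
insert into SalesVegetable values (1, 'Pea', 1, 2);
""")

query = """
with allSales as (
    -- one select per Sales table, all projected onto the same column list
    select FruteName as FoodName, 'SalesFrutes' as SalesTableName,
           Fk_Id_Customer as Id_Customer, Fk_Id_Store as Id_Store
    from SalesFrutes
    union all
    select VegetableName, 'SalesVegetable', Fk_Id_Customer, Fk_Id_Store
    from SalesVegetable
)
select c.CustomerName, s.FoodName, s.SalesTableName, st.StoreName
from Customer c
join allSales s on s.Id_Customer = c.Id_Customer
join Store st on st.Id_Store = s.Id_Store   -- Store is joined only once
order by s.FoodName
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Each additional Sales table adds one select to the union rather than another pair of joins.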

sql query with extra relation rows or null

I have
accounts (id, name)
deals (id, name, account_id) many to one accounts
pos (id, name, deal_id) many to one deals
I want to have an export that has all accounts, deals and pos.
Not all deals have a PO, so if I select from pos I will miss some deals.
If I select from deals I will only get 0 or 1 PO even if there are 2 or more. I might also miss some accounts because not all accounts have deals.
All pos have a deal, all deals have an account.
I believe I need to do a separate report for each case where the relation is missing and then union them together. I don't quite have the syntax correct.
table could be
account_name | deal_name | po_name
cool account | null | null
another | sweet deal | null
another | bitter sweet | null
last for best | deal 1 | po here
last for best | deal 1 | another po
last for best | deal 2 | null
last for best | deal 3 | o yea
You need left joins from accounts to deals and finally pos:
select
a.name account_name,
d.name deal_name,
p.name po_name
from accounts a
left join deals d on d.account_id = a.id
left join pos p on p.deal_id = d.id
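To see that the chained left joins produce the table from the question, here is a small sketch using Python's sqlite3 module. The ids and the order by clause are assumptions added only to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table accounts (id integer primary key, name text);
create table deals (id integer primary key, name text, account_id integer);
create table pos (id integer primary key, name text, deal_id integer);

insert into accounts values (1, 'cool account'), (2, 'another'), (3, 'last for best');
insert into deals values
  (1, 'sweet deal', 2), (2, 'bitter sweet', 2),
  (3, 'deal 1', 3), (4, 'deal 2', 3), (5, 'deal 3', 3);
insert into pos values (1, 'po here', 3), (2, 'another po', 3), (3, 'o yea', 5);
""")

rows = conn.execute("""
select a.name as account_name, d.name as deal_name, p.name as po_name
from accounts a
left join deals d on d.account_id = a.id   -- keeps accounts without deals
left join pos p on p.deal_id = d.id        -- keeps deals without pos
order by a.id, d.id, p.id
""").fetchall()

for row in rows:
    print(row)
```

Missing deals and pos come back as NULL (None in Python), and a deal with two pos yields two rows, so no union is needed.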

Finding duplicate rows and related records

I have a database that was migrated into a new schema. The old database had no referential integrity and so I need to get rid of lots of duplicates.
I have a table of RegisteredVehicles:
id | plate | state
# | 1425 | il
# | 3322 | il
And a table of ParkingRequests:
id | date | registeredVehicleId (FK)
# | 2/2/12 | #
The relationship is one to many: one registered vehicle to many requests.
The following query gets me each duplicate record by Plate and State and also outputs each RegisteredVehicle's Id.
select Id, Plate, [State] from RegisteredVehicles where Plate in (
select plate from RegisteredVehicles group by Plate having count(*) > 1
)
Which gives me something like this
Id Plate State
036d59f1-d928-40f2-b373-049122202bff 0000000 IL
615e2fab-8b43-4e42-b6f0-268038bba949 0000000
I am trying to get a count of parking requests for each vehicle row returned by the query above. Something like this
Id | Plate | State | # Requests
1 | 222 | IL | 2
2 | 333 | IL | 4
But am having issues making the query more complex than it already is. This itself took me quite a while to get working.
Please try this query:
SELECT
A.ID,
A.PLATE,
A.STATE AS [STATE],
COUNT(B.ID) AS [NO OF REQUESTS]
FROM REGISTEREDVEHICLES A
LEFT JOIN PARKINGREQUESTS B
ON B.REGISTEREDVEHICLEID = A.ID
WHERE
A.PLATE IN(
SELECT
PLATE
FROM REGISTEREDVEHICLES
GROUP BY PLATE
HAVING COUNT(*) > 1
)
GROUP BY
A.ID,
A.PLATE,
A.STATE
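One detail worth checking with a left join is which column gets counted. The sketch below (SQLite via Python's sqlite3, with made-up integer ids and request dates) runs the same query twice: counting the request id gives 0 for a vehicle with no requests, while counting the vehicle id gives 1, because the vehicle row itself is never NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table RegisteredVehicles (Id integer primary key, Plate text, State text);
create table ParkingRequests (Id integer primary key, Date text,
                              RegisteredVehicleId integer);

-- Vehicles 1 and 2 share a plate; vehicle 3 is not a duplicate.
insert into RegisteredVehicles values
  (1, '0000000', 'IL'), (2, '0000000', ''), (3, '1425', 'IL');
-- Vehicle 1 has two requests, vehicle 2 has none.
insert into ParkingRequests values
  (1, '2012-02-02', 1), (2, '2012-02-03', 1), (3, '2012-02-04', 3);
""")

template = """
select a.Id, a.Plate, a.State, count({counted}) as requests
from RegisteredVehicles a
left join ParkingRequests b on b.RegisteredVehicleId = a.Id
where a.Plate in (select Plate from RegisteredVehicles
                  group by Plate having count(*) > 1)
group by a.Id, a.Plate, a.State
order by a.Id
"""

# count(b.Id) skips the NULLs produced by the left join.
by_request = conn.execute(template.format(counted="b.Id")).fetchall()
# count(a.Id) counts the vehicle row even when no request matched.
by_vehicle = conn.execute(template.format(counted="a.Id")).fetchall()

print(by_request)
print(by_vehicle)
```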

SQL select only highest date

For a project I want to generate a price list.
I want to get only the latest prices from each supplier for each article.
There are just those two tables.
Table articles
ARTNR | TXT | ACTIVE | SUPPLIER
------------------------------------------
10 | APPLE | Y | 10
20 | ORANGE | Y | 10
30 | KEYBOARD | N | 20
40 | ORANGE | Y | 20
50 | BANANA | Y | 10
60 | CHERRY | Y | 10
Table prices
ARTNR | PRCGRP | PRCDAT | PRICE
--------------------------------------
10 | 10 | 01-Aug-10 | 2.1
10 | 10 | 05-Aug-11 | 2.2
10 | 10 | 21-Aug-12 | 2.5
20 | 0 | 01-Aug-10 | 2.1
20 | 10 | 09-Aug-12 | 2.3
10 | 10 | 14-Aug-13 | 2.7
This is what I have so far:
SELECT
ARTICLES.[ARTNR], ARTICLES.[TXT], ARTICLES.[ACTIVE], ARTICLES.[SUPPLIER], PRICES.PRCGRP, PRICES.PRCDAT, PRICES.PRICE
FROM
ARTICLES INNER JOIN PRICES ON ARTICLES.ARTNR = PRICES.ARTNR
WHERE
(
(ARTICLES.[ACTIVE]="Y") AND
(ARTICLES.[SUPPLIER]=10) AND
(PRICES.PRCGRP=0) AND
(PRICES.PRCDAT=(SELECT MAX(PRCDAT) FROM PRICES as art WHERE art.ARTNR = PRICES.artnr) )
)
ORDER BY ARTICLES.ARTNR
;
It is okay to choose just one supplier each time, but I want the max price.
The problem is:
Lots of articles do not show up with the query above,
but I cannot figure out what is wrong.
I can see that they should be in the resultset when I leave out the subselect on max prcdat.
What is wrong?
Your subquery to get the latest price does not take the other conditions into account. When it picks the latest price, it may pick one in another price group or one for an inactive article. When that date is then matched against the filtered list, which has only active articles and a single price group, no row satisfies both, so the article drops out.
You need to either duplicate your conditions or, better, move them inside the subquery so that it finds the latest price under those conditions. I can't test against Access, but something like this should be possible if its SQL is not too limited:
SELECT a.artnr, a.txt, a.active, a.supplier, p.prcgrp, p.prcdat, p.price
FROM articles a INNER JOIN prices p ON a.ARTNR = p.ARTNR
JOIN (
SELECT a.artnr, MAX(p.prcdat) prcdat
FROM articles a JOIN prices p ON a.artnr = p.artnr
WHERE a.active='Y' AND a.supplier=10 AND p.prcgrp=10
GROUP BY a.artnr) z
ON a.artnr = z.artnr AND p.prcdat = z.prcdat
ORDER BY a.ARTNR
If the SQL support in Access won't allow a join with a subquery, you can just move the conditions inside your existing subquery, something like this:
SELECT a.artnr, a.txt, a.active, a.supplier, p.prcgrp, p.prcdat, p.price
FROM articles a INNER JOIN prices p ON a.ARTNR = p.ARTNR
WHERE p.prcdat = (
SELECT MAX(p2.prcdat)
FROM articles a2 JOIN prices p2 ON a2.artnr = p2.artnr
WHERE a.artnr = a2.artnr AND a2.active='Y' AND a2.supplier=10 AND p2.prcgrp=10
)
ORDER BY a.ARTNR;
Note that due to limitations in identifying a unique price (no primary key in prices), the queries may give duplicates if several prices for the same article have the same prcdat. If that's a problem, you'll probably need to duplicate your conditions outside the subquery too.
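The effect can be reproduced with the sample data from the question. The sketch below uses SQLite through Python's sqlite3 module rather than Access, stores the dates as ISO strings so that max() orders them correctly (both are assumptions), and uses price group 0 as in the asker's query. It also repeats the price group condition outside the subquery, as the note above suggests.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table articles (artnr integer, txt text, active text, supplier integer);
create table prices (artnr integer, prcgrp integer, prcdat text, price real);

insert into articles values
  (10, 'APPLE', 'Y', 10), (20, 'ORANGE', 'Y', 10), (30, 'KEYBOARD', 'N', 20),
  (40, 'ORANGE', 'Y', 20), (50, 'BANANA', 'Y', 10), (60, 'CHERRY', 'Y', 10);
insert into prices values
  (10, 10, '2010-08-01', 2.1), (10, 10, '2011-08-05', 2.2),
  (10, 10, '2012-08-21', 2.5), (20, 0,  '2010-08-01', 2.1),
  (20, 10, '2012-08-09', 2.3), (10, 10, '2013-08-14', 2.7);
""")

# Asker's shape: the subquery finds the latest date across ALL price groups.
original = """
select a.artnr, a.txt, p.prcgrp, p.prcdat, p.price
from articles a join prices p on a.artnr = p.artnr
where a.active = 'Y' and a.supplier = 10 and p.prcgrp = 0
  and p.prcdat = (select max(art.prcdat) from prices art
                  where art.artnr = p.artnr)
order by a.artnr
"""

# Fixed shape: the same conditions are applied inside the subquery.
fixed = """
select a.artnr, a.txt, p.prcgrp, p.prcdat, p.price
from articles a join prices p on a.artnr = p.artnr
where p.prcgrp = 0
  and p.prcdat = (select max(p2.prcdat)
                  from articles a2 join prices p2 on a2.artnr = p2.artnr
                  where a2.artnr = a.artnr and a2.active = 'Y'
                    and a2.supplier = 10 and p2.prcgrp = 0)
order by a.artnr
"""

original_rows = conn.execute(original).fetchall()
fixed_rows = conn.execute(fixed).fetchall()
print(original_rows)
print(fixed_rows)
```

Article 20 disappears from the first result because its newest price belongs to group 10; the second query finds its latest group 0 price instead.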

Fetch Id's that are related to a specific set of items, but not others

Good morning all, apologies for the title... I had trouble simplifying the problem down to a line. My database platform is Teradata.
I am working w/ a table like the following (let's call it "t1")
+------------+----------------------------------------+
| Service_Id | Product |
+------------+----------------------------------------+
| 1 | Traffic |
| 1 | Weather |
| 1 | Travel |
| 1 | Audio |
| 1 | Audio Add-on |
| 2 | Traffic |
| 2 | Weather |
| 2 | Travel |
+------------+----------------------------------------+
I am trying to select service_id's that are related to the following products AND ONLY the following products: Traffic, Weather, Travel
"Service_Id = 1" does not apply here because while it has the required products, it also has an "audio" product related to it... so we have to leave it out. I was able to successfully do this through a series of temp (volatile) tables but it's feeling really hacky and I feel there's got to be a better way. Thanks for your assistance.
I'm doing stuff like that (find a subset/superset/exact match for a set of rows) in my training classes using pizzas :-)
There are several ways to get your result, but for an exact match the easiest way is a SUM using following logic:
SELECT service_id
FROM t1
GROUP BY 1
HAVING
SUM(CASE WHEN Product IN ('Traffic', 'Weather', 'Travel') THEN 1 ELSE -1 END) = 3
Assuming that Product is unique for every service_ID.
SELECT service_ID
FROM tableName a
WHERE Product IN ('Traffic', 'Weather', 'Travel') AND
EXISTS
(
SELECT 1
FROM tableName b
WHERE a.Service_ID = b.Service_ID
GROUP BY b.Service_ID
HAVING COUNT(*) = 3 -- <<== total number of products
)
GROUP BY service_ID
HAVING COUNT(*) = 3 -- <<== total number of products
SQLFiddle Demo (the demo runs on a MySQL database; I'm not sure it will work on Teradata)
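The conditional SUM from the first answer can be checked against the sample table with a short sketch. It runs on SQLite through Python's sqlite3 module rather than Teradata (an assumption) and groups by the column name instead of the positional GROUP BY 1. Each wanted product scores +1 and any other product scores -1, so only a service with exactly the three wanted products sums to 3, provided a product appears at most once per service.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table t1 (Service_Id integer, Product text);
insert into t1 values
  (1, 'Traffic'), (1, 'Weather'), (1, 'Travel'), (1, 'Audio'), (1, 'Audio Add-on'),
  (2, 'Traffic'), (2, 'Weather'), (2, 'Travel');
""")

rows = conn.execute("""
select Service_Id
from t1
group by Service_Id
-- +1 for each wanted product, -1 for anything else
having sum(case when Product in ('Traffic', 'Weather', 'Travel')
                then 1 else -1 end) = 3
""").fetchall()

print(rows)
```

Service 1 scores 3 - 2 = 1 and is excluded; service 2 scores exactly 3.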