Query just one row that meets three conditions in SQL

I'd like to make a query that returns just one row when it meets 3 conditions. I have a database that looks like this:
Location     Date        Item  Price
Chicago      2021-06-10  1     150
New York     2021-06-10  2     130
Chicago      2021-06-10  1     150
Los Angeles  2021-06-10  3     100
Atlanta      2021-06-10  4     120
New York     2021-06-09  2     125
Chicago      2021-06-09  1     155
Los Angeles  2021-06-09  3     99
Atlanta      2021-06-09  4     140
This table contains the price of different items by date and location. The price changes each day, and the same item does not need to have the same price in every location. Because the table holds every sale made in a day for each item, I'd like a query that returns only one row per Location, Date and Item. I want something like a time series of the price of each item in each location. The resulting table should look like this:
Location     Date        Item  Price
Chicago      2021-06-10  1     150
New York     2021-06-10  2     130
Los Angeles  2021-06-10  3     100
Atlanta      2021-06-10  4     120
New York     2021-06-09  2     125
Chicago      2021-06-09  1     155
Los Angeles  2021-06-09  3     99
Atlanta      2021-06-09  4     140
Hope someone can help me, thanks.

To elaborate on the comments, this will give exactly what you have specified.
SELECT
  DISTINCT
  *
FROM
  yourTable
The DISTINCT keyword compares all columns of each row and removes duplicates, so rows that match each other exactly are returned only once.
If the price can vary within a day and you want, for example, the maximum value, use a GROUP BY:
SELECT
  location,
  date,
  item,
  MAX(price) AS max_price
FROM
  yourTable
GROUP BY
  location,
  date,
  item
That will ensure you get one row per unique combination of location, date, item, and then you can pick which price to include using aggregate functions.
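To see how the two queries differ, here is a minimal sketch that loads the question's sample rows into an in-memory SQLite database from Python. The table name sales and the column name sale_date are assumptions made for illustration (the question names neither), and the last row is a hypothetical extra sale, added only so that one Location/Date/Item combination has two different prices.

```python
import sqlite3

rows = [
    ("Chicago", "2021-06-10", 1, 150),
    ("New York", "2021-06-10", 2, 130),
    ("Chicago", "2021-06-10", 1, 150),      # exact duplicate of the first row
    ("Los Angeles", "2021-06-10", 3, 100),
    ("Atlanta", "2021-06-10", 4, 120),
    ("New York", "2021-06-09", 2, 125),
    ("Chicago", "2021-06-09", 1, 155),
    ("Los Angeles", "2021-06-09", 3, 99),
    ("Atlanta", "2021-06-09", 4, 140),
    ("Chicago", "2021-06-10", 1, 152),      # hypothetical: same key, different price
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (location TEXT, sale_date TEXT, item INTEGER, price INTEGER)"
)
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)

# DISTINCT only collapses rows that are identical in every column.
distinct_rows = conn.execute("SELECT DISTINCT * FROM sales").fetchall()

# GROUP BY returns one row per (location, sale_date, item), whatever the prices.
grouped_rows = conn.execute(
    """
    SELECT location, sale_date, item, MAX(price) AS max_price
    FROM sales
    GROUP BY location, sale_date, item
    """
).fetchall()

print(len(rows), len(distinct_rows), len(grouped_rows))
```

With the hypothetical row included, DISTINCT still returns two Chicago rows for 2021-06-10 (150 and 152), while the GROUP BY version returns one.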
Note: Using keywords such as date as column names is a bad idea. Depending on your database you may need to "quote"/"escape" such column names, and even then they make the code harder for others to read.

Related

how to filter based on events that HAVE happened and HAVE NOT happened

I have a table named orders in a SQL database that looks like this:
user_id email segment destination revenue
1 joe@smith.com basic New York 500
1 joe@smith.com luxury London 750
1 joe@smith.com luxury London 500
1 joe@smith.com basic New York 625
1 joe@smith.com basic Miami 925
1 joe@smith.com basic Los Angeles 218
1 joe@smith.com basic Sydney 200
2 mary@jones.com basic Chicago 375
2 mary@jones.com luxury New York 1500
2 mary@jones.com basic Toronto 2800
2 mary@jones.com basic Miami 750
2 mary@jones.com basic New York 500
2 mary@jones.com basic New York 625
3 mike@me.com luxury New York 650
3 mike@me.com basic New York 875
4 sally@you.com luxury Chicago 1300
4 sally@you.com basic New York 1200
4 sally@you.com basic New York 1000
4 sally@you.com luxury Sydney 725
5 bob@gmail.com basic London 500
5 bob@gmail.com luxury London 750
Here's a SQL Fiddle: http://www.sqlfiddle.com/#!9/22f40a/1
I'd like to be able to apply the following logic to get the final result set:
Return only the distinct user_id and the user's email based on the following conditions:
where segment is equal to luxury and destination is equal to New York
OR
where segment is equal to luxury and destination is equal to London
OR
where segment is equal to basic and destination is equal to New York and the given user has a revenue amount in the basic and New York records that sums to greater than $2,000
BUT
a given user has not previously been to destination equal to Miami
Based on my sample data, I would like to see the following returned:
user_id email
3 mike@me.com
4 sally@you.com
5 bob@gmail.com
I tried to use the following to get part of what I need:
SELECT
DISTINCT(user_id),
email
FROM orders o
WHERE
(o.segment = 'luxury' AND o.destination = 'New York')
OR
(o.segment = 'luxury' AND o.destination = 'London')
But, this query doesn't handle conditions #3 and #4 above. I feel like a window function might be helpful here, but I don't know quite how to implement it.
If someone could help me with this query, I would be incredibly grateful!
Thanks!
You can use subqueries to achieve what you need:
SELECT
  DISTINCT(o.user_id),
  o.email
FROM orders o
WHERE
  (
    -- Clause 1
    (o.segment = 'luxury' AND o.destination = 'New York')
    OR
    -- Clause 2
    (o.segment = 'luxury' AND o.destination = 'London')
    OR
    -- Clause 3
    (o.user_id IN (
      SELECT DISTINCT(o.user_id)
      FROM orders o
      WHERE o.segment = 'basic' AND o.destination = 'New York'
      GROUP BY o.user_id, o.email, o.segment, o.destination
      HAVING SUM(o.revenue) > 2000
    ))
  )
  AND
  -- Clause 4
  o.user_id NOT IN (
    SELECT DISTINCT(o.user_id)
    FROM orders o
    WHERE o.destination = 'Miami'
  )
Here's another way to do it that scans the table only once, using GROUP BY and HAVING:
SELECT user_id, email,
  SUM(case
        when segment='luxury' and destination in ('New York','London') then 1
        else 0
      end) as is_luxury,
  SUM(case
        when segment='basic' and destination in ('New York') then 1
        else 0
      end) as is_basic,
  SUM(case
        when segment='basic' and destination in ('New York') then revenue
        else 0
      end) as basic_revenue,
  SUM(case when destination in ('Miami') then 1 else 0 end) as is_miami
FROM orders
GROUP BY 1,2
HAVING (is_luxury > 0 OR (is_basic > 0 AND basic_revenue > 2000))
  AND is_miami = 0;
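As a quick check of the single-scan idea, here is a minimal sketch that runs a conditional-aggregation query against the question's sample data in an in-memory SQLite database from Python. It is a variation on the query above, not the same statement: the CASE expressions are repeated inside HAVING instead of referring to column aliases, on the assumption that this form is more portable across databases, and the email column is left out for brevity.

```python
import sqlite3

# (user_id, segment, destination, revenue) copied from the question's table
orders = [
    (1, "basic", "New York", 500), (1, "luxury", "London", 750),
    (1, "luxury", "London", 500), (1, "basic", "New York", 625),
    (1, "basic", "Miami", 925), (1, "basic", "Los Angeles", 218),
    (1, "basic", "Sydney", 200), (2, "basic", "Chicago", 375),
    (2, "luxury", "New York", 1500), (2, "basic", "Toronto", 2800),
    (2, "basic", "Miami", 750), (2, "basic", "New York", 500),
    (2, "basic", "New York", 625), (3, "luxury", "New York", 650),
    (3, "basic", "New York", 875), (4, "luxury", "Chicago", 1300),
    (4, "basic", "New York", 1200), (4, "basic", "New York", 1000),
    (4, "luxury", "Sydney", 725), (5, "basic", "London", 500),
    (5, "luxury", "London", 750),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (user_id INTEGER, segment TEXT, destination TEXT, revenue INTEGER)"
)
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", orders)

query = """
SELECT user_id
FROM orders
GROUP BY user_id
HAVING (
         -- conditions 1 and 2: any luxury trip to New York or London
         SUM(CASE WHEN segment = 'luxury'
                   AND destination IN ('New York', 'London') THEN 1 ELSE 0 END) > 0
         -- condition 3: basic New York revenue sums to more than 2000
         OR SUM(CASE WHEN segment = 'basic'
                      AND destination = 'New York' THEN revenue ELSE 0 END) > 2000
       )
   -- condition 4: the user has never been to Miami
   AND SUM(CASE WHEN destination = 'Miami' THEN 1 ELSE 0 END) = 0
ORDER BY user_id
"""
qualifying_users = [row[0] for row in conn.execute(query)]
print(qualifying_users)
```

Users 1 and 2 drop out because of their Miami rows, and user 4 qualifies through the basic New York revenue of 1200 + 1000.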

What logic should be used to label customers (monthly) based on the categories they bought more often in the preceding 4 calendar months?

I have a table that looks like this:
user  type      quantity  order_id  purchase_date
john  travel    10        1         2022-01-10
john  travel    15        2         2022-01-15
john  books     4         3         2022-01-16
john  music     20        4         2022-02-01
john  travel    90        5         2022-02-15
john  clothing  200       6         2022-03-11
john  travel    70        7         2022-04-13
john  clothing  70        8         2022-05-01
john  travel    200       9         2022-06-15
john  tickets   10        10        2022-07-01
john  services  20        11        2022-07-15
john  services  90        12        2022-07-22
john  travel    10        13        2022-07-29
john  services  25        14        2022-08-01
john  clothing  3         15        2022-08-15
john  music     5         16        2022-08-17
john  music     40        18        2022-10-01
john  music     30        19        2022-11-05
john  services  2         20        2022-11-19
There are many different users and multiple types, with purchases made daily.
I want to end up with a table of this format:
user  label            month
john  travel           2022-01-01
john  travel           2022-02-01
john  clothing         2022-03-01
john  travel-clothing  2022-04-01
john  travel-clothing  2022-05-01
john  travel-clothing  2022-06-01
john  travel           2022-07-01
john  travel           2022-08-01
john  services         2022-10-01
john  music            2022-11-01
The label records the most popular type (by share of quantity sold) for each user over the last 4 months, including the current month. For instance, in March 2022 john ordered 200 clothing out of 339 in total (January through March), so his label is clothing. For months where two types are almost even I'd want a double label, as in April (185 travel and 200 clothing out of 409). The rules are not set in stone yet, but roughly: if two types are about even (e.g. each >40%), use both types in the label column; if three types are about even (e.g. around 30% each), use all three; if one type is at 40% and the rest is made up of many small shares, keep only the first; and where one type is clearly the majority, use that. One other tricky bit is that a user might have missing months.
I think for the rules I just need to compare the percentage of each type, but I don't know how to retrieve the type as a label afterwards. In general, I don't have the SQL/BigQuery logic very clear in my head. I have tried a few things, but nothing that comes close to the target table.
Broken down into steps, I think I need 3 things:
1. group by user, type and month, and get the partial and total counts (I have done this)
2. retrieve the counts for the past 4 months (I have done something, but it's not accurate yet)
3. compare the ratios and build the label column
Please advise me on the correct steps to achieve the above. I'm working in BigQuery, but general SQL logic will also help.
Consider the approach below. It looks a little messy and has room for optimization, but I hope it gives you an idea or a direction for addressing your problem.
WITH aggregation AS (
  SELECT user, type, DATE_TRUNC(purchase_date, MONTH) AS month, month_no,
         SUM(quantity) AS net_qty,
         SUM(SUM(quantity)) OVER w1 AS rolling_qty
    FROM sample_table, UNNEST([EXTRACT(YEAR FROM purchase_date) * 12 + EXTRACT(MONTH FROM purchase_date)]) month_no
   GROUP BY 1, 2, 3, 4
  WINDOW w1 AS (
    PARTITION BY user ORDER BY month_no RANGE BETWEEN 3 PRECEDING AND CURRENT ROW
  )
),
rolling AS (
  SELECT user, month, ARRAY_AGG(STRUCT(type, net_qty)) OVER w2 AS agg, rolling_qty
    FROM aggregation
  QUALIFY ROW_NUMBER() OVER (PARTITION BY user, month) = 1
  WINDOW w2 AS (PARTITION BY user ORDER BY month_no RANGE BETWEEN 3 PRECEDING AND CURRENT ROW)
)
SELECT user, month, ARRAY_TO_STRING(ARRAY(
         SELECT type FROM (
           SELECT type, SUM(net_qty) / SUM(SUM(net_qty)) OVER () AS pct,
             FROM r.agg GROUP BY 1
         ) QUALIFY IFNULL(FIRST_VALUE(pct) OVER (ORDER BY pct DESC) - pct, 0) < 0.10 -- set threshold to 0.1
       ), '-') AS label
  FROM rolling r
 ORDER BY month;
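The same logic can be restated outside BigQuery. Below is a minimal Python sketch, not a translation of the query, that follows the same steps for one user: sum quantity per month and type, add up the current month and the three calendar months before it, and keep every type whose share is within a threshold of the top share. The function names month_no and monthly_labels are made up for illustration, and ordering the joined types by descending share is my own assumption (the question's table writes "travel-clothing").

```python
from collections import defaultdict

# (type, quantity, purchase_date) for user "john", copied from the question
purchases = [
    ("travel", 10, "2022-01-10"), ("travel", 15, "2022-01-15"),
    ("books", 4, "2022-01-16"), ("music", 20, "2022-02-01"),
    ("travel", 90, "2022-02-15"), ("clothing", 200, "2022-03-11"),
    ("travel", 70, "2022-04-13"), ("clothing", 70, "2022-05-01"),
    ("travel", 200, "2022-06-15"), ("tickets", 10, "2022-07-01"),
    ("services", 20, "2022-07-15"), ("services", 90, "2022-07-22"),
    ("travel", 10, "2022-07-29"), ("services", 25, "2022-08-01"),
    ("clothing", 3, "2022-08-15"), ("music", 5, "2022-08-17"),
    ("music", 40, "2022-10-01"), ("music", 30, "2022-11-05"),
    ("services", 2, "2022-11-19"),
]


def month_no(date_str):
    """Map 'YYYY-MM-DD' to a running month index (year * 12 + month)."""
    return int(date_str[:4]) * 12 + int(date_str[5:7])


def monthly_labels(purchases, window=4, threshold=0.10):
    # Step 1: quantity per (month, type).
    per_month = defaultdict(lambda: defaultdict(int))
    for type_, qty, date_str in purchases:
        per_month[month_no(date_str)][type_] += qty

    labels = {}
    for m in sorted(per_month):  # only months with purchases get a label
        # Step 2: totals over the current month and the (window - 1) before it;
        # missing months simply contribute nothing.
        totals = defaultdict(int)
        for prev in range(m - window + 1, m + 1):
            for type_, qty in per_month.get(prev, {}).items():
                totals[type_] += qty
        grand_total = sum(totals.values())
        shares = {t: q / grand_total for t, q in totals.items()}

        # Step 3: keep every type whose share is within `threshold` of the top.
        top = max(shares.values())
        chosen = sorted(
            (t for t, s in shares.items() if top - s < threshold),
            key=lambda t: -shares[t],
        )
        year, month_index = divmod(m - 1, 12)
        labels[f"{year:04d}-{month_index + 1:02d}-01"] = "-".join(chosen)
    return labels


labels = monthly_labels(purchases)
for month, label in labels.items():
    print(month, label)
```

With a 0.10 threshold, March comes out as clothing and April as a double label, as in the question. Months where the asker's hand-written labels follow looser rules may come out differently, so the threshold is worth tuning against real data.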

Function to get rolling average with lowest 2 values eliminated?

This is my sample data; the Current_rating column is my desired output.
Date        Name    Subject           Importance  Location  Time   Rating  Current_rating
12/08/2020  David   Work              1           London    -      -       4
1/08/2020   David   Work              3           London    23.50  4       3.66
2/10/2019   David   Emails            3           New York  18.20  3       4.33
2/08/2019   David   Emails            3           Paris     18.58  4       4
11/07/2019  David   Work              1           London    -      3       4
1/06/2019   David   Work              3           London    23.50  4       4
2/04/2019   David   Emails            3           New York  18.20  3       5
2/03/2019   David   Emails            3           Paris     18.58  5       -
12/08/2020  George  Updates           2           New York  -      -       2
1/08/2019   George  New Appointments  5           London    55.10  2       -
I need a function to compute the values in the Current_rating column. Current_rating takes the previous 5 results from the Rating column for each name, eliminates the lowest 2, and averages the remaining 3. Some names may not have 5 results: with 3 or fewer I just need the average of those results, and with 4 I need to eliminate the lowest value and average the remaining 3. To pick the right 5 previous results, the rows need to be sorted by date. Is this possible? Thanks in advance for your time.
What a pain! I think the simplest method might be to use arrays and then unnest() and aggregate:
select t.*, r.current_rating
from (select t.*,
             array_agg(rating) over (partition by name order by date rows between 4 preceding and current row) as rating_5
      from t
     ) t cross join lateral
     (select avg(r) as current_rating
      from (select u.*
            from unnest(t.rating_5) with ordinality u(r, n)
            where r is not null
            order by r desc
            limit 3
           ) r
     ) r
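For checking results against the sample table, here is a minimal Python sketch of the rule as the question states it: take up to 5 earlier ratings per name and average the highest 3 of them. It rests on two assumptions of mine. Dates are read as day/month/year, which is the only reading that keeps the sample rows in order. The window covers strictly earlier rows, which is what the sample Current_rating values imply, whereas the query above also includes the current row. The name current_ratings is made up for illustration.

```python
from datetime import datetime

# (date, name, rating) from the question; None stands for a missing rating ("-")
sample = [
    ("12/08/2020", "David", None), ("1/08/2020", "David", 4),
    ("2/10/2019", "David", 3), ("2/08/2019", "David", 4),
    ("11/07/2019", "David", 3), ("1/06/2019", "David", 4),
    ("2/04/2019", "David", 3), ("2/03/2019", "David", 5),
    ("12/08/2020", "George", None), ("1/08/2019", "George", 2),
]


def current_ratings(rows):
    """Return {(date, name): average of the top 3 of the previous 5 ratings}."""
    by_name = {}
    for date_str, name, rating in rows:
        parsed = datetime.strptime(date_str, "%d/%m/%Y")  # assumed day/month/year
        by_name.setdefault(name, []).append((parsed, date_str, rating))

    result = {}
    for name, entries in by_name.items():
        entries.sort(key=lambda entry: entry[0])  # oldest first
        history = []  # non-missing ratings seen so far, in date order
        for _, date_str, rating in entries:
            last_five = history[-5:]
            top_three = sorted(last_five, reverse=True)[:3]  # drops the lowest ones
            result[(date_str, name)] = (
                sum(top_three) / len(top_three) if top_three else None
            )
            if rating is not None:
                history.append(rating)
    return result


ratings = current_ratings(sample)
for key, value in ratings.items():
    print(key, value)
```

Slicing the sorted list to three items covers all the cases in the question at once: with 3 or fewer earlier ratings nothing is dropped, with 4 the lowest is dropped, and with 5 the lowest two are dropped.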

SQL: Create a flag for separate records in the same table with overlapping date ranges

I'm trying to figure out how to create a boolean field that tells me when two records have overlapping date ranges.
In the following example, every unique Location/Counterparty combination within a given date range can have EITHER a Contract or a DeliveryPoint, not both. So ids 1 and 2 should be flagged, but ids 3 and 4 are fine because they don't overlap, so their flag should read "False".
I started with a self join, but after that I couldn't wrap my head around the next step. Did I start correctly, or is the solution totally different?
id  Location  Counterparty  Contract  DeliveryPoint  StartDate  EndDate
1   New York  Wal Mart                Philadelphia   3/1/2019   12/31/2020
2   New York  Wal Mart      123456                   5/1/2019   7/31/2019
3   Toronto   Target                  Boston         3/1/2019   5/31/2019
4   Toronto   Target        456789                   6/1/2019   12/31/2020
With the flag, I'd want it to look like:
id  Location  Counterparty  Contract  DeliveryPoint  StartDate  EndDate     Overlap
1   New York  Wal Mart                Philadelphia   3/1/2019   12/31/2020  TRUE
2   New York  Wal Mart      123456                   5/1/2019   7/31/2019   TRUE
3   Toronto   Target                  Boston         3/1/2019   5/31/2019   FALSE
4   Toronto   Target        456789                   6/1/2019   12/31/2020  FALSE
In your insert query, I think you could use a subquery that searches for other records with overlapping dates. Pay attention to the date comparisons. See the example:
insert into table(location, Counterparty, Overlap)
select
  @location,
  @Counterparty,
  case when exists(select Id
                     from table t
                    where t.location = @location
                      and t.Counterparty = @Counterparty
                      and @startDate <= t.EndDate
                      and @endDate >= t.StartDate
                  ) then 1 else 0 end as Overlap
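The date test in that subquery is the usual interval-overlap check: two ranges overlap when each one starts on or before the other one ends. As a minimal sketch, here it is applied in Python to the four sample rows, flagging existing records instead of a row being inserted, so a record is never compared with itself. Dates are read as month/day/year, and the function name overlap_flags is made up for illustration.

```python
from datetime import date

# (id, location, counterparty, start_date, end_date) from the question
records = [
    (1, "New York", "Wal Mart", date(2019, 3, 1), date(2020, 12, 31)),
    (2, "New York", "Wal Mart", date(2019, 5, 1), date(2019, 7, 31)),
    (3, "Toronto", "Target", date(2019, 3, 1), date(2019, 5, 31)),
    (4, "Toronto", "Target", date(2019, 6, 1), date(2020, 12, 31)),
]


def overlap_flags(records):
    """Return {id: True if another record with the same location and
    counterparty has an overlapping date range}."""
    flags = {}
    for rec_id, location, counterparty, start, end in records:
        flags[rec_id] = any(
            other_id != rec_id
            and other_location == location
            and other_counterparty == counterparty
            # overlap test: each range starts on or before the other ends
            and start <= other_end
            and end >= other_start
            for other_id, other_location, other_counterparty, other_start, other_end
            in records
        )
    return flags


flags = overlap_flags(records)
print(flags)
```

Ids 3 and 4 stay unflagged because the first range ends on 5/31/2019 and the second starts on 6/1/2019. If ranges that share an endpoint should not count as overlapping, the comparisons would need to be strict.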

Create all combinations of summations given criteria in Access VBA

I have a subset-summation problem that I cannot find the answer to. I am trying to write something in VBA for Access that takes all combinations of summations meeting certain criteria and places them in a table, so that I can match a different table against it. Right now I am more concerned with creating the table of combinations. This is the first time I have asked a question, so sorry if I mess something up.
Example:
Access Table: ImpTable
Fields: ID, Year-Month, Name, Country, Quantity
I need to make every combination of summations where the Country and Year-Month are the same, while keeping track of what was included in each sum. If the new table records which IDs were included in each combination, I can reference the original table for the names.
Expected Ending Table Results:
NewID, Year-Month, Country, SumQuantity, ComboName (ID's from original table)
Any help is appreciated.
Raw Data:
ID Year-Month Name Country Quantity
1 2016-06 Person1 US 10
2 2016-06 Person2 US 12
3 2016-10 Person3 US 4
4 2016-06 Person4 UK 5
5 2016-06 Person5 UK 6
6 2016-06 Person6 US 3
Desired Results:
NewID Year-Month Country SumQuantity ComboName
1 2016-06 US 22 1,2
2 2016-06 US 13 1,6
3 2016-06 US 25 1,2,6
4 2016-06 US 15 2,6
5 2016-06 UK 11 4,5
6 2016-10 US 4 3
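The combination logic itself can be sketched independently of Access. The following is a minimal Python illustration, not VBA: it groups the raw rows by Year-Month and Country, then lists every combination within each group along with its summed Quantity and the IDs it contains. One rule is inferred from the desired results and is an assumption on my part: single-ID rows appear only when a group has just one member. The function name summation_combos is made up for illustration.

```python
from itertools import combinations, groupby

# (ID, Year-Month, Name, Country, Quantity) from the raw data above
rows = [
    (1, "2016-06", "Person1", "US", 10),
    (2, "2016-06", "Person2", "US", 12),
    (3, "2016-10", "Person3", "US", 4),
    (4, "2016-06", "Person4", "UK", 5),
    (5, "2016-06", "Person5", "UK", 6),
    (6, "2016-06", "Person6", "US", 3),
]


def summation_combos(rows):
    """Return (NewID, Year-Month, Country, SumQuantity, ComboName) tuples."""

    def group_key(row):
        return (row[1], row[3])  # Year-Month, Country

    results = []
    for (year_month, country), group in groupby(sorted(rows, key=group_key), key=group_key):
        members = list(group)
        # Assumed rule: pairs and larger, unless the group has a single member.
        min_size = 2 if len(members) > 1 else 1
        for size in range(min_size, len(members) + 1):
            for combo in combinations(members, size):
                results.append((
                    year_month,
                    country,
                    sum(row[4] for row in combo),
                    ",".join(str(row[0]) for row in combo),
                ))
    # Number the rows; the order differs from the desired table's NewID order.
    return [(new_id, *result) for new_id, result in enumerate(results, start=1)]


combos = summation_combos(rows)
for combo in combos:
    print(combo)
```

A group of n rows yields 2^n - n - 1 combinations of two or more, so the output table grows quickly for large groups. In Access the same nested enumeration would have to be written out with loops or recursion.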