SQL - find prior string value - sql

I have a DB which 'tracks' the customer shopping journey. What I want to do is recall the previous value if their final destination or 'shop' is a particular value.
For example say the shops are named like this:
Shop 1
Shop 2
Shop 3
Shop 4
If my select query returns Shop 4 (for any customer), then I want an extra column to show the shop they visited immediately before it. There is no natural order to the shop names, so I can't simply assume that Shop 4 follows Shop 3; the column just needs to return whichever shop the customer last shopped at before Shop 4 (their previous shop could be any 'shop').
This is what I have so far but it's probably way off the mark. I have a date column in my table but don't know how to use it in this way.
Select ...
case
when TableShop.ShopName LIKE 'Shop4' then
cast(TableShop.ShopName -1 AS nvarchar(50))
end
From ...

Presumably, you have some column that specifies the ordering of the visits -- say a visitDatetime column.
Then, you can use the ANSI standard LAG() function:
select s.*,
(case when s.shopName = 'Shop4'
then lag(s.shopName) over (partition by customerId order by visitDateTime)
end) as prev_ShopName
from tableshop s;
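To see the LAG() approach run end to end, here is a minimal sketch using Python's built-in sqlite3 module (it assumes the bundled SQLite is 3.25 or newer, which is when window functions were added). The sample rows, and the customerId and visitDateTime columns, are assumptions made up for illustration; only the shape of the query follows the answer above.

```python
import sqlite3

# Hypothetical schema and rows, chosen only to exercise the query.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableshop (customerId INTEGER, shopName TEXT, visitDateTime TEXT);
    INSERT INTO tableshop VALUES
        (1, 'Shop2', '2023-01-01 10:00'),
        (1, 'Shop4', '2023-01-02 10:00'),
        (2, 'Shop1', '2023-01-01 09:00'),
        (2, 'Shop3', '2023-01-03 09:00');
""")

# LAG() looks one row back within each customer's visits, ordered by time.
# The CASE keeps the previous shop only when the current row is 'Shop4'.
rows = conn.execute("""
    SELECT customerId, shopName,
           CASE WHEN shopName = 'Shop4'
                THEN LAG(shopName) OVER (PARTITION BY customerId
                                         ORDER BY visitDateTime)
           END AS prev_ShopName
    FROM tableshop
    ORDER BY customerId, visitDateTime
""").fetchall()

for row in rows:
    print(row)
```

Rows that are not 'Shop4', and first visits with nothing before them, come back with NULL (None in Python) in prev_ShopName.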

Related

Counting how many times one specific value changed to another specific value in order of date range and grouped by ID

I have a table like below where I need to query a count of how many times each ID went from specifically 'Waste Sale' in one value to 'On Stop' in the very next value based on ascending date and if there are no instances of this, the count will be 0
ID  Stage name  Stage Changed Date
1   Waste Sale  06-05-2022
1   On Stop     08-06-2022
1   Cancelled   09-02-2022
2   Waste Sale  06-05-2022
2   On Stop     07-05-2022
2   Waste Sale  08-06-2022
2   On Stop     10-07-2022
3   Cancelled   10-07-2022
3   On Stop     11-07-2022
The result I would be looking for based on the above table would be something like this:
ID  Count of 'Waste Sales to On Stops'
1   1
2   2
3   0
ID 1 having a count of 1 because there was one instance of 'Waste Sale' changing to 'On Stop' in the very next value based on date range
ID 3 having a count of 0 because even though the stage name changed to 'On Stop' the previous value based on date range wasn't 'Waste Sale'.
I have a hunch I would have to use something like LEAD() and GROUP BY/ ORDER BY but since I'm so new to SQL would really appreciate some help on the specific syntax and coding. Any version of SQL is okay.
We can use the window function lead() to peek at the next value in the query result.
select distinct id,
(
select count(*)
from
(
select *,
lead(stage_name)
over(
partition by id
order by stage_changed_date)
as stage_next
from sales s2
) s3
where s3.id = s1.id
and s3.stage_name = 'waste sale'
and s3.stage_next = 'on stop'
) as count_of_waste_sales_to_on_stop
from sales s1
order by id;
The query above uses lead(stage_name) over(partition by id order by stage_changed_date) to get the next stage_name in the query result, partitioning the rows by id and ordering them by stage_changed_date. Check the query on DB Fiddle.
Note:
I have no experience with Zoho, so I'm unsure whether the query will work 100% there. They say it supports ANSI SQL; however, there might be some differences from MySQL.
The column names are not exactly the same as in the OP's question, because testing was only done using DB Fiddle.
There might be a better query out there waiting to be written.
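As a runnable illustration of the same lead() idea, here is a sketch using Python's built-in sqlite3 module (assuming SQLite 3.25+ for window functions). It is a variation rather than the exact query above: it counts with a conditional SUM and GROUP BY instead of a correlated subquery. The table name sales and the column names are taken from the answer; the dates are rewritten in ISO format on the assumption that the question's dates are month-day-year, so that they sort correctly as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (id INTEGER, stage_name TEXT, stage_changed_date TEXT);
    INSERT INTO sales VALUES
        (1, 'Waste Sale', '2022-06-05'),
        (1, 'On Stop',    '2022-08-06'),
        (1, 'Cancelled',  '2022-09-02'),
        (2, 'Waste Sale', '2022-06-05'),
        (2, 'On Stop',    '2022-07-05'),
        (2, 'Waste Sale', '2022-08-06'),
        (2, 'On Stop',    '2022-10-07'),
        (3, 'Cancelled',  '2022-10-07'),
        (3, 'On Stop',    '2022-11-07');
""")

# The inner query pairs each stage with the next stage for the same id.
# The outer query counts the pairs that go 'Waste Sale' -> 'On Stop';
# ids with no such pair still appear, with a count of 0.
counts = conn.execute("""
    SELECT id,
           SUM(CASE WHEN stage_name = 'Waste Sale' AND stage_next = 'On Stop'
                    THEN 1 ELSE 0 END) AS count_of_waste_sales_to_on_stop
    FROM (
        SELECT id, stage_name,
               LEAD(stage_name) OVER (PARTITION BY id
                                      ORDER BY stage_changed_date) AS stage_next
        FROM sales
    ) AS s
    GROUP BY id
    ORDER BY id
""").fetchall()

print(counts)  # [(1, 1), (2, 2), (3, 0)]
```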

COUNT with multiple LEFT joins [duplicate]

I am having some troubles with a count function. The problem is given by a left join that I am not sure I am doing correctly.
Variables are:
Customer_name (buyer)
Product_code (what the customer buys)
Store (where the customer buys)
The datasets are:
Customer_df (list of customers and product codes of their purchases)
Store1_df (list of product codes per week, for Store 1)
Store2_df (list of product codes per day, for Store 2)
Final output desired:
I would like to have a table with:
col1: Customer_name;
col2: Count of items purchased in store 1;
col3: Count of items purchased in store 2;
Filters: date range
My query looks like this:
SELECT
DISTINCT
C.customer_name,
C.product_code,
COUNT(S1.product_code) AS s1_sales,
COUNT(S2.product_code) AS s2_sales
FROM customer_df C
LEFT JOIN store1_df S1 USING(product_code)
LEFT JOIN store2_df S2 USING(product_code)
GROUP BY
customer_name, product_code
HAVING
S1_sales > 0
OR S2_sales > 0
The output I expect is something like this:
Customer_name  Product_code  Store1_weekly_sales  Store2_weekly_sales
Luigi          120012        4                    8
James          100022        6                    10
But instead, I get:
Customer_name  Product_code  Store1_weekly_sales  Store2_weekly_sales
Luigi          120012        290                  60
James          100022        290                  60
It works when, instead of COUNT(product_code), I use COUNT(DISTINCT product_code), but I would like to avoid that because I want to be able to aggregate over different timespans (e.g. if I use count distinct and take more than 1 week of data into account, I will not get the right numbers).
My hypotheses are:
I am joining the tables in the wrong way
There is a problem when joining two datasets with different time aggregations
What am I doing wrong?
The reason, as Philipxy indicated, is a common one. You are getting a Cartesian result from your data, which bloats your numbers. To simplify, let's consider a single customer purchasing one item from two stores. The first store has 3 purchases and the second store has 5 purchases, so your total count is 3 * 5 = 15. This is because each entry in the first store is joined to every entry for the same customer id in the second store: the 1st purchase is joined to second-store purchases 1-5, then the 2nd purchase is joined to second-store purchases 1-5, and so on, and you can see the bloat. So, by having each store pre-query its aggregates per customer, you will have AT MOST one record per customer per store (and per product, as per your desired outcome).
select
c.customer_name,
AllCustProducts.Product_Code,
coalesce( PQStore1.SalesEntries, 0 ) Store1SalesEntries,
coalesce( PQStore2.SalesEntries, 0 ) Store2SalesEntries
from
customer_df c
-- now, we need all possible UNIQUE instances of
-- a given customer and product to prevent duplicates
-- for subsequent queries of sales per customer and store
JOIN
( select distinct customerid, product_code
from store1_df
union
select distinct customerid, product_code
from store2_df ) AllCustProducts
on c.customerid = AllCustProducts.customerid
-- NOW, we can join to a pre-query of sales at store 1
-- by customer id and product code. You may also want to
-- get sum( SalesDollars ) if available, just add respectively
-- to each sub-query below.
LEFT JOIN
( select
s1.customerid,
s1.product_code,
count(*) as SalesEntries
from
store1_df s1
group by
s1.customerid,
s1.product_code ) PQStore1
on AllCustProducts.customerid = PQStore1.customerid
AND AllCustProducts.product_code = PQStore1.product_code
-- now, same pre-aggregation to store 2
LEFT JOIN
( select
s2.customerid,
s2.product_code,
count(*) as SalesEntries
from
store2_df s2
group by
s2.customerid,
s2.product_code ) PQStore2
on AllCustProducts.customerid = PQStore2.customerid
AND AllCustProducts.product_code = PQStore2.product_code
There is no need for a group by or having on the outer query, since each pre-aggregate returns at most 1 record per unique combination of customer and product. As for your need to filter by date ranges, I would just add a WHERE clause within each of the AllCustProducts, PQStore1, and PQStore2 sub-queries.
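The 3 * 5 bloat and the pre-aggregation fix can be reproduced with a small sketch using Python's built-in sqlite3 module. It is deliberately simplified: one hypothetical customer, one product, and a customerid column in every table (an assumption that the query above also makes), with the pre-aggregates grouped by customer only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer_df (customerid INTEGER, customer_name TEXT);
    CREATE TABLE store1_df (customerid INTEGER, product_code INTEGER);
    CREATE TABLE store2_df (customerid INTEGER, product_code INTEGER);
    INSERT INTO customer_df VALUES (1, 'Luigi');
""")
# One customer, one product: 3 purchases in store 1 and 5 in store 2.
conn.executemany("INSERT INTO store1_df VALUES (?, ?)", [(1, 120012)] * 3)
conn.executemany("INSERT INTO store2_df VALUES (?, ?)", [(1, 120012)] * 5)

# Joining both detail tables directly pairs every store 1 row with every
# store 2 row, so both counts see 3 * 5 = 15 joined rows.
naive = conn.execute("""
    SELECT c.customer_name,
           COUNT(s1.product_code) AS s1_sales,
           COUNT(s2.product_code) AS s2_sales
    FROM customer_df c
    LEFT JOIN store1_df s1 ON s1.customerid = c.customerid
    LEFT JOIN store2_df s2 ON s2.customerid = c.customerid
    GROUP BY c.customer_name
""").fetchone()

# Aggregating each store first leaves at most one row per customer per
# store, so the joins can no longer multiply rows.
pre_aggregated = conn.execute("""
    SELECT c.customer_name,
           COALESCE(p1.sales_entries, 0) AS s1_sales,
           COALESCE(p2.sales_entries, 0) AS s2_sales
    FROM customer_df c
    LEFT JOIN (SELECT customerid, COUNT(*) AS sales_entries
               FROM store1_df GROUP BY customerid) p1
           ON p1.customerid = c.customerid
    LEFT JOIN (SELECT customerid, COUNT(*) AS sales_entries
               FROM store2_df GROUP BY customerid) p2
           ON p2.customerid = c.customerid
""").fetchone()

print(naive)           # ('Luigi', 15, 15)
print(pre_aggregated)  # ('Luigi', 3, 5)
```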

PostgreSql - How to create conditional column with the filter on another column?

I want to add one more column that indicates whether the customer has sold at least one product or not.
Data example:
ProductID Customer Status
1 John Not sold
2 John Not Sold
3 John Sold
My expected result:
ProductID Customer Status Sold_at_least_1
1 John Not sold Yes
2 John Not Sold Yes
3 John Sold Yes
4 Andrew Not Sold No
5 Andrew Not Sold No
6 Brandon Sold Yes
This is example data. Sorry for any inconvenience, as I am unable to extract the real data. Any help is appreciated.
You can do a window count of records of the same customer that have status = 'Sold' in a case expression:
select
    t.*,
    -- count this customer's 'Sold' rows across the whole partition
    case when sum( (status = 'Sold')::int ) over(partition by customer) >= 1
        then 'Yes'
        else 'No'
    end as sold_at_least_1
from mytable t
NB: note that this does not magically create new records (as shown in your sample data). This query gives you as many records in the result set as there are in the table, with an additional column that indicates whether each customer has at least one sold item in the table.
Here is a demo provided by VBokšić (thanks).
Another option is to use bool_or() as a window function. If you can live with a boolean column rather than a varchar with Yes/No, this makes the expression even simpler:
select productid, customer, status,
bool_or(status = 'Sold') over (partition by customer) as sold_at_least_one
from mytable;
Online example: https://rextester.com/NDN54253
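Both queries above rely on PostgreSQL features (the ::int cast and bool_or()). As a portable sketch of the same per-customer window idea, the example below uses Python's built-in sqlite3 module (assuming SQLite 3.25+) and swaps in MAX over a CASE expression, which is a substitute technique and not what either answer wrote. The rows follow the expected result in the question, with the spelling of 'Not Sold' normalised.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (productid INTEGER, customer TEXT, status TEXT);
    INSERT INTO mytable VALUES
        (1, 'John',    'Not Sold'),
        (2, 'John',    'Not Sold'),
        (3, 'John',    'Sold'),
        (4, 'Andrew',  'Not Sold'),
        (5, 'Andrew',  'Not Sold'),
        (6, 'Brandon', 'Sold');
""")

# The inner CASE flags sold rows with 1; MAX(...) OVER (PARTITION BY customer)
# spreads the highest flag to every row of that customer.
flags = conn.execute("""
    SELECT productid, customer, status,
           CASE WHEN MAX(CASE WHEN status = 'Sold' THEN 1 ELSE 0 END)
                     OVER (PARTITION BY customer) = 1
                THEN 'Yes' ELSE 'No'
           END AS sold_at_least_1
    FROM mytable
    ORDER BY productid
""").fetchall()

for row in flags:
    print(row)
```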

Custom Sortby , Order by issue

I have data in table -> bp like below
1 Vendor
2 Customer
3 Transporter
I want select * from bp to return the rows in the order 2, 1, 3, so the result should be:
2 Customer
1 Vendor
3 Transporter
As the ordering isn't alphabetic or numeric, and appears somewhat arbitrary, use a CASE expression. However, this doesn't support growth: the code has to be changed any time a new value appears in col2. You'd be better off adding an orderBy column to the base table containing these values and letting a user specify the order, for long-term usability. Tying a user to one specific order seems odd, but this is the way to do it:
SELECT *
FROM bp
ORDER BY CASE WHEN col2 = 'Customer' THEN 1
              WHEN col2 = 'Vendor' THEN 2
              WHEN col2 = 'Transporter' THEN 3
              ELSE 4 END;  -- any unlisted value sorts last
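A quick way to check the CASE ordering is the sketch below, which uses Python's built-in sqlite3 module. The column name col2 comes from the answer; the id column name is an assumption, since the question does not name the columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bp (id INTEGER, col2 TEXT);
    INSERT INTO bp VALUES (1, 'Vendor'), (2, 'Customer'), (3, 'Transporter');
""")

# The CASE expression maps each name to an explicit rank; anything not
# listed falls through to 4 and sorts last.
ordered = conn.execute("""
    SELECT id, col2
    FROM bp
    ORDER BY CASE WHEN col2 = 'Customer' THEN 1
                  WHEN col2 = 'Vendor' THEN 2
                  WHEN col2 = 'Transporter' THEN 3
                  ELSE 4 END
""").fetchall()

print(ordered)  # [(2, 'Customer'), (1, 'Vendor'), (3, 'Transporter')]
```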
Try this:
Add one more column, SortBy, whose value encodes several sort orders: the first digit gives the position for one type of sort, the second digit for a second type, and the third digit for a third type. It is a very simple approach. If there are more records, you can arrange it in other ways.
1 Vendor 132
2 Customer 213
3 Transporter 321
Vendor --> select * from bp order by substring(SortBy,1,1)
Customer --> select * from bp order by substring(SortBy,2,1)
Transporter --> select * from bp order by substring(SortBy,3,1)

Determine records which held particular "state" on a given date

I have a state machine architecture, where a record will have many state transitions, the one with the greatest sort_key column being the current state. My problem is to determine which records held a particular state (or states) for a given date.
Example data:
items table
id
1
item_transitions table
id item_id created_at to_state sort_key
1 1 05/10 "state_a" 1
2 1 05/12 "state_b" 2
3 1 05/15 "state_a" 3
4 1 05/16 "state_b" 4
Problem:
Determine all records from items table which held state "state_a" on date 05/15. This should obviously return the item in the example data, but if you query with date "05/16", it should not.
I presume I'll be using a LEFT OUTER JOIN to join the item_transitions table to itself and narrow down the possibilities until I have something to query on that will give me the items that I need. Perhaps I am overlooking something much simpler.
Your question rephrased means "give me all items which were changed to state_a on 05/15 or before and have not changed to another state afterwards". Please note that for the example I added 2001 as the year to get a valid date. If your "created_at" column is not a datetime, I strongly suggest changing it.
So first you can retrieve the last sort_key for all items before the threshold date:
SELECT item_id, max(sort_key) last_change_sort_key
FROM item_transitions it
WHERE created_at <= '05/15/2001'
GROUP BY item_id
Next step is to join this result back to the item_transitions table to see to which state the item was switched at this specific sort_key:
SELECT *
FROM item_transitions it
JOIN (SELECT item_id, max(sort_key) last_change_sort_key
      FROM item_transitions it
      WHERE created_at <= '05/15/2001'
      GROUP BY item_id) tmp ON it.item_id = tmp.item_id AND it.sort_key = tmp.last_change_sort_key
Finally you only want those who switched to 'state_a' so just add a condition:
SELECT DISTINCT it.item_id
FROM item_transitions it
JOIN (SELECT item_id, max(sort_key) last_change_sort_key
      FROM item_transitions it
      WHERE created_at <= '05/15/2001'
      GROUP BY item_id) tmp ON it.item_id = tmp.item_id AND it.sort_key = tmp.last_change_sort_key
WHERE it.to_state = 'state_a'
You did not mention which DBMS you use, but I think this query should work with the most common ones.
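The three steps can be tried against the question's sample data with a sketch like the one below, using Python's built-in sqlite3 module. The helper name items_in_state is hypothetical, and the dates are stored as ISO text (with 2001 as the assumed year, as in the answer) so that plain string comparison orders them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item_transitions (
        id INTEGER, item_id INTEGER, created_at TEXT,
        to_state TEXT, sort_key INTEGER);
    INSERT INTO item_transitions VALUES
        (1, 1, '2001-05-10', 'state_a', 1),
        (2, 1, '2001-05-12', 'state_b', 2),
        (3, 1, '2001-05-15', 'state_a', 3),
        (4, 1, '2001-05-16', 'state_b', 4);
""")


def items_in_state(conn, state, on_date):
    """Return ids of items whose latest transition on or before on_date
    moved them into the given state (hypothetical helper)."""
    rows = conn.execute("""
        SELECT DISTINCT it.item_id
        FROM item_transitions it
        JOIN (SELECT item_id, MAX(sort_key) AS last_change_sort_key
              FROM item_transitions
              WHERE created_at <= ?
              GROUP BY item_id) tmp
          ON it.item_id = tmp.item_id
         AND it.sort_key = tmp.last_change_sort_key
        WHERE it.to_state = ?
        ORDER BY it.item_id
    """, (on_date, state)).fetchall()
    return [row[0] for row in rows]


print(items_in_state(conn, "state_a", "2001-05-15"))  # [1]
print(items_in_state(conn, "state_a", "2001-05-16"))  # []
```

Passing the date and state as bound parameters, rather than pasting them into the SQL string, keeps the query reusable for other dates and states.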