Comparing values on the previous row to perform calculations on the current row (SQL)

sql fiddle: http://sqlfiddle.com/#!4/a717b/1
Here is my Table:
Here is my Code:
select key, status, dat, sysdat,
(case when key = 1 and type = 'Car' and status = 'F' then round( sysdat-dat,2) else 0 end ) as Days,
(case when key = 1 and type ='Bus' and Status = 'F' then round( sysdat-dat,2) else 0 end) as Days
from ABC
Expected Output:
So I want to calculate the days between the 'dat' column and the current date under the following conditions.
1) For every key, the sequence is always car first, bus second,
which means that for every key we check the bus only when the status of the car is 'T'.
2) If the status of 'Car' is 'T', I don't want to calculate the days for the car.
3) If the status of 'Car' is 'F', I want to calculate the days only for the 'Car' and not for the 'Bus', because it is always car first, bus second.
4) If the status of 'Bus' is 'F' and the status of 'Car' is 'T', I calculate the days for the bus, because it matches the condition: car first, bus second.

With 2 vehicles
If you always have a car and a bus, and only one car and one bus per key, you can self-join the table and check whether the vehicle you're querying (a) is a car with status F, or whether the related vehicle (b) is a car with status T. In either case you get a number of days, and in any other case you don't. That covers your example, and also implies that if car and bus are both T, the days are still shown next to the bus only.
select a.key, a.type, a.status, a.dat,
case when
(a.type = 'car' and a.Status = 'F') or -- a is a car and is F
(b.type = 'car' and b.Status = 'T') -- a is related to a car (b), which is T
then
trunc(sysdate) - a.dat
end as DAYS
from
ABC a
join ABC b on b.key = a.key and b.type <> a.type
order by
-- Sort the query by key first, then type.
a.key,
decode(a.type, 'car', 1, 2)
The query above: http://sqlfiddle.com/#!4/a717b/5/0
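To make the self-join easier to try out, here is a minimal sketch that runs the same logic against SQLite from Python. The sample rows are hypothetical (the table from the question is not reproduced here), the key column is renamed key_id, julianday() stands in for Oracle date arithmetic, and "today" is passed in as a parameter so the result is reproducible.

```python
import sqlite3

# Hypothetical sample rows; the table from the question is not shown.
# The "key" column is called key_id here to avoid any keyword clash in SQLite.
SAMPLE_ROWS = [
    (1, "car", "F", "2024-01-01"),
    (1, "bus", "F", "2024-01-03"),
    (2, "car", "T", "2024-01-02"),
    (2, "bus", "F", "2024-01-05"),
]

SELF_JOIN_QUERY = """
select a.key_id, a.type, a.status,
       case when (a.type = 'car' and a.status = 'F')   -- a is a car and is F
              or (b.type = 'car' and b.status = 'T')   -- a's partner b is a car and is T
            then cast(julianday(:today) - julianday(a.dat) as integer)
       end as days
from abc a
join abc b on b.key_id = a.key_id and b.type <> a.type
order by a.key_id, case a.type when 'car' then 1 else 2 end
"""


def days_per_vehicle(today):
    """Run the self-join on an in-memory table, using a fixed 'today'."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table abc (key_id integer, type text, status text, dat text)"
    )
    conn.executemany("insert into abc values (?, ?, ?, ?)", SAMPLE_ROWS)
    rows = conn.execute(SELF_JOIN_QUERY, {"today": today}).fetchall()
    conn.close()
    return rows


for row in days_per_vehicle("2024-01-10"):
    print(row)
```

For key 1 only the car gets a value, and for key 2 only the bus does; every other row comes back with NULL (None in Python).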
With N vehicles
If you have more vehicles, a different approach can be better, especially when the number of vehicles is high, or not fixed.
The query below has a list of all the vehicles and their sort order. This is an inline view now, but you could use a separate lookup table for that. A lookup table is even more flexible, because you can just add vehicle types or change their sort order.
Anyway, that lookup table/view can be joined to your main table to get a sort order for each of your records.
You can then use a window function like rank or dense_rank to introduce a numbering ("falsenumber") based on that sort order and on the fact that status is 'F'. After that, it's easy to put the date on the first row that is F (falsenumber = 1).
with
VW_TYPES as
-- This could be a lookup table as well, instead of this hard-coded list of unions.
( select 'car' as type, 1 as sortorder from dual union all
select 'bus' as type, 2 as sortorder from dual union all
select 'train' as type, 3 as sortorder from dual union all
select 'airplane' as type, 4 as sortorder from dual union all
select 'rocket' as type, 5 as sortorder from dual),
VW_TYPESTATUS as
( select
a.*,
t.sortorder,
dense_rank() over (partition by key order by case when a.status = 'F' then t.sortorder end) as falsenumber
from
ABC a
join VW_TYPES t on t.type = a.type)
select
ts.key, ts.type, ts.status, ts.dat,
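-- The status check matters when no row of a key is 'F': all rows then tie at rank 1.
case when ts.status = 'F' and ts.falsenumber = 1 then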
trunc(sysdate) - ts.dat
end as DAYS
from
VW_TYPESTATUS ts
order by
ts.key, ts.sortorder
The query above: http://sqlfiddle.com/#!4/71f52/8/0
Vehicle types in a separate table: http://sqlfiddle.com/#!4/f055d/1/0
Do note that string comparisons in Oracle are case sensitive: 'car', 'Car' and 'CAR' are not the same thing. Use lower(type) = 'car' if you want to match the vehicle type in any casing. That prevents the use of a regular index on type, although I think the impact isn't that bad, since you only have a couple of rows per key.
Alternatively (arguably better), you could introduce a numeric VehicleTypeId in the new types table, and use that id in the ABC table instead of the string 'Car'.
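As a rough illustration of the N-vehicle idea, the sketch below uses a lookup table for the sort order and puts the days on the first vehicle per key whose status is F. It is a variant, not the query above: it runs on SQLite and uses a per-key MIN(sortorder) instead of dense_rank, because SQLite sorts NULLs first where Oracle sorts them last. Table names, the key_id column and the sample rows are assumptions made for the example.

```python
import sqlite3

SETUP = """
create table vehicle_types (type text primary key, sortorder integer);
insert into vehicle_types values ('car', 1), ('bus', 2), ('train', 3);

create table abc (key_id integer, type text, status text, dat text);
insert into abc values
    (1, 'car',   'T', '2024-01-01'),
    (1, 'bus',   'F', '2024-01-03'),
    (1, 'train', 'F', '2024-01-04'),
    (2, 'car',   'T', '2024-01-02'),
    (2, 'bus',   'T', '2024-01-05');
"""

FIRST_FALSE_QUERY = """
with first_false as (
    -- lowest sort order among the 'F' rows of each key
    select a.key_id, min(t.sortorder) as first_f
    from abc a
    join vehicle_types t on t.type = a.type
    where a.status = 'F'
    group by a.key_id
)
select a.key_id, a.type, a.status,
       case when t.sortorder = f.first_f
            then cast(julianday(:today) - julianday(a.dat) as integer)
       end as days
from abc a
join vehicle_types t on t.type = a.type
left join first_false f on f.key_id = a.key_id
order by a.key_id, t.sortorder
"""


def days_for_first_false(today):
    """Return one row per vehicle; only the first 'F' vehicle per key gets days."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    rows = conn.execute(FIRST_FALSE_QUERY, {"today": today}).fetchall()
    conn.close()
    return rows


for row in days_for_first_false("2024-01-10"):
    print(row)
```

Key 2 has no F row at all, so the left join leaves first_f NULL and no days are shown for that key.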

Related

SQL query to return rows with one distinct field and use CASE to create new evaluation column

I want to write an SQL query to return rows with one distinct field and use CASE to create new evaluation column. Any help is appreciated. Deets below:
table
id        status    category
string    string    bigint
--------  --------  ----------
pseudo query:
return (distinct id), time_created, NEW_COL
where category is 123123
and where new_col //create new col with these values
(
if status = 'good' then 'GOOD'
if status = 'bad' then 'BAD'
)
FROM table
result:
id   time_created   new_col
---------------------------
1    Jun-1          BAD
2    Jul-21         GOOD
3    Jun-12         GOOD
4    Aug-1          GOOD
I keep getting a lint error right on my CASE keyword:
"expecting " '%', '*', '+', '-', '.', '/', 'AT', '[', '||',""
One of the queries I tried:
SELECT
ID, time_created
CASE
WHEN status = 'good' THEN 'GOOD'
WHEN status = 'bad' THEN 'BAD'
END
as STATUS_new
FROM TBL
WHERE CATEGORY = '871654671'
ORDER BY time_created
You just have a small syntax error (and a bad column name in your SQL Fiddle): you need a comma after the time_created column, before CASE.
SELECT
ID, time as time_created,
CASE
WHEN status = 'good' THEN 'GOOD'
WHEN status = 'bad' THEN 'BAD'
END
as STATUS_new
FROM TBL
WHERE CATEGORY = '871654671'
ORDER BY time_created
Here is the working query:
http://www.sqlfiddle.com/#!18/7293b5/11
SELECT
ID, TIME, 'STATUS_new' =
CASE STATUS
WHEN 'good' THEN 'GOOD'
WHEN 'bad' THEN 'BAD'
END
FROM TBL
WHERE CATEGORY = '871654671'
ORDER BY TIME
In SQL Server you can put the new column name before the CASE ('STATUS_new' = CASE ...); this is an alternative to writing AS STATUS_new after END, not a requirement.
In this simple CASE form, the column being tested is named directly after CASE, and every WHEN value is compared against it.
In your fiddle you used the wrong column name for your TIME column.
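For comparison, here is a small sketch of the two CASE forms discussed in the answers, run on SQLite from Python rather than on SQL Server. The rows are invented to match the expected result in the question (the year 2023 and the fifth row are assumptions), and the last statement reproduces the missing comma to show that the parser rejects it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table tbl (id integer, status text, category integer, time text)"
)
conn.executemany(
    "insert into tbl values (?, ?, ?, ?)",
    [
        (1, "bad", 871654671, "2023-06-01"),
        (2, "good", 871654671, "2023-07-21"),
        (3, "good", 871654671, "2023-06-12"),
        (4, "good", 871654671, "2023-08-01"),
        (5, "good", 123, "2023-05-01"),  # other category, filtered out
    ],
)

# Searched CASE: each WHEN carries its own condition.
SEARCHED_CASE = """
select id, time as time_created,
       case when status = 'good' then 'GOOD'
            when status = 'bad'  then 'BAD'
       end as status_new
from tbl
where category = 871654671
order by time_created
"""

# Simple CASE: the tested column follows CASE, each WHEN is a value.
SIMPLE_CASE = """
select id, time as time_created,
       case status when 'good' then 'GOOD'
                   when 'bad'  then 'BAD'
       end as status_new
from tbl
where category = 871654671
order by time_created
"""

# Same mistake as in the question: no comma between the column and CASE.
MISSING_COMMA = (
    "select id, time case when status = 'good' then 'GOOD' end as status_new from tbl"
)

searched = conn.execute(SEARCHED_CASE).fetchall()
simple = conn.execute(SIMPLE_CASE).fetchall()

try:
    conn.execute(MISSING_COMMA)
    comma_error = None
except sqlite3.OperationalError as exc:
    comma_error = str(exc)

print(searched)
print(comma_error)
```

Both forms return the same rows; which one to use is mostly a matter of taste as long as every branch tests the same column for equality.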

How to create a case statement which groups fields?

I am trying to understand how to group values together to add an indicator. I want to 'fix' the values and based on this, attribute an indicator.
The values I am trying to group are date, customer name and product type, to create an indicator which captures what kind of order was placed (fruit only, fruit and vegetable, vegetable only). The goal is to calculate the total volume of each kind of order placed. The data is set out like this, and the column I am trying to create is the 'Order Type'.
What I have done so far:
I originally completed this analysis in Tableau, where I was able to use the 'Fixed' function and sum the value of indicators (for fruit or veggie) to determine each order type individually.
I have written case statements to identify the product type, with the idea that I could sum these to determine the order type (code below). However, this did not work, as I only need one instance of the indicator for each order. To solve this, I have written a case statement which partitions the fields and orders by date to get one instance of an indicator for each order.
Case Statements
CASE WHEN Product_Type = 'Fruit' THEN 1 ELSE 0 END AS Fruit_Indicator
, CASE WHEN Product_Type = 'Vegetable' THEN 1 ELSE 0 END AS Veg_Indicator
Case Statement with partition by and order by
, CASE WHEN ROW_NUMBER() OVER (PARTITION BY Order_Date, Customer ORDER BY Order_Date ASC) = 1 AND Product_Type = 'Fruit' THEN 1 ELSE NULL END AS Fruit_Ind
, CASE WHEN ROW_NUMBER() OVER (PARTITION BY Order_Date, Customer ORDER BY Order_Date ASC) = 1 AND Product_Type = 'Vegetable' THEN 1 ELSE NULL END AS Veg_Ind
I would appreciate any guidance on the right direction.
Thanks!
It appears you are trying to get the data grouped by date, such as Mar 21, Mar 22, etc. So you may want a secondary query that the primary data is joined to. The second query is an aggregate by customer and date. If the date field is a date/time, you will have to adjust the GROUP BY to use only the date part (month/day/year) and ignore any time component, for example with a function that extracts the date. Joining your original data to that aggregate should then get you what you need. Maybe something like:
select
yt.date,
yt.customer,
yt.product,
yt.productType,
case when PreQuery.IsFruit > 0 and PreQuery.IsVegetable > 0
then 'Fruit & Vegetable'
when PreQuery.IsFruit > 0 and PreQuery.IsVegetable = 0
then 'Fruit Only'
when PreQuery.IsFruit = 0 and PreQuery.IsVegetable > 0
then 'Vegetable Only' end OrderType
from
YourTable yt
JOIN
( select
yt2.customer,
yt2.date,
max( case when yt2.ProductType = 'Fruit'
then 1 else 0 end ) IsFruit,
max( case when yt2.ProductType = 'Vegetable'
then 1 else 0 end ) IsVegetable
from
YourTable yt2
-- if you want to restrict time period, add a where
-- clause here on the date range as to not query entire table
group by
yt2.customer,
yt2.date ) PreQuery
ON yt.customer = PreQuery.customer
AND yt.date = PreQuery.date
-- same here for your outer query to limit just date range in question.
-- if you want to restrict time period, add a where
-- clause here on the date range as to not query entire table
order by
yt.date,
yt.customer,
yt.product
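The same MAX(CASE ...) aggregate can also be used to count how many orders of each kind were placed, which is the stated goal of the question. The following is a sketch on SQLite with made-up rows; the table name orders and the column names are assumptions, and one order is taken to be one customer on one date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table orders (order_date text, customer text, product text, product_type text)"
)
conn.executemany(
    "insert into orders values (?, ?, ?, ?)",
    [
        ("2024-03-21", "Alice", "Apple", "Fruit"),
        ("2024-03-21", "Alice", "Carrot", "Vegetable"),
        ("2024-03-21", "Bob", "Banana", "Fruit"),
        ("2024-03-22", "Alice", "Leek", "Vegetable"),
        ("2024-03-22", "Bob", "Pear", "Fruit"),
    ],
)

ORDER_TYPE_COUNTS = """
select order_type, count(*) as order_count
from (
    -- one row per order (customer + date), classified by what it contains
    select customer, order_date,
           case when max(case when product_type = 'Fruit' then 1 else 0 end) = 1
                 and max(case when product_type = 'Vegetable' then 1 else 0 end) = 1
                then 'Fruit & Vegetable'
                when max(case when product_type = 'Fruit' then 1 else 0 end) = 1
                then 'Fruit Only'
                when max(case when product_type = 'Vegetable' then 1 else 0 end) = 1
                then 'Vegetable Only'
           end as order_type
    from orders
    group by customer, order_date
) per_order
group by order_type
order by order_type
"""

order_type_counts = conn.execute(ORDER_TYPE_COUNTS).fetchall()
print(order_type_counts)
```

Because the classification happens once per (customer, date) group, each order is counted exactly once regardless of how many product lines it has.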

How to Count Distinct on Case When?

I have been building up a query today and I have got stuck. I have two unique IDs that identify if an order is Internal or Web. I have been able to split this out so it counts how many times they appear, but unfortunately it is not giving me the intended result. From research I have tried creating a Count Distinct Case When statement to get the results.
Please see below where I have broken down what it is doing and how I expect it to be.
Original data looks like:
Company Name Order Date Order Items Orders Value REF
-------------------------------------------------------------------------------
CompanyA 03/01/2019 Item1 Order1 170 INT1
CompanyA 03/01/2019 Item2 Order1 0 INT1
CompanyA 03/01/2019 Item3 Order2 160 WEB2
CompanyA 03/01/2019 Item4 Order2 0 WEB2
How I expect it to be:
Company Name Order Date Order Items Orders Value WEB INT
-----------------------------------------------------------------------------------------
CompanyA 03/01/2019 4 2 330 1 1
What currently comes out
Company Name Order Date Order Items Orders Value WEB INT
-----------------------------------------------------------------------------------------
CompanyA 03/01/2019 4 2 330 2 2
As you can see from my current result, it is counting every line even though it is the same reference. It is not a hard and fast rule that it is always doubled up, which is why I think I need a Count Distinct Case When. Below is the query I am currently using. This pulls from a Progress V10 ODBC source that I connect to through Excel. Unfortunately I do not have SSMS, and Microsoft Query is just useless.
My Current SQL:
SELECT
Company_0.CoaCompanyName
, SopOrder_0.SooOrderDate
, Count(DISTINCT SopOrder_0.SooOrderNumber) AS 'Orders'
, SUM(CASE WHEN SopOrder_0.SooOrderNumber IS NOT NULL THEN 1 ELSE 0 END) AS 'Order Items'
, SUM(SopOrderItem_0.SoiValue) AS 'Order Value'
, SUM(CASE WHEN SopOrder_0.SooParentOrderReference LIKE 'INT%' THEN 1 ELSE 0 END) AS 'INT'
, SUM(CASE WHEN SopOrder_0.SooParentOrderReference LIKE 'WEB%' THEN 1 ELSE 0 END) AS 'WEB'
FROM
SBS.PUB.Company Company_0
, SBS.PUB.SopOrder SopOrder_0
, SBS.PUB.SopOrderItem SopOrderItem_0
WHERE
SopOrder_0.SopOrderID = SopOrderItem_0.SopOrderID
AND Company_0.CompanyID = SopOrder_0.CompanyID
AND SopOrder_0.SooOrderDate > '2019-01-01'
GROUP BY
Company_0.CoaCompanyName
, SopOrder_0.SooOrderDate
I have tried using the following line but it errors on me when importing:
, Count(DISTINCT CASE WHEN SopOrder_0.SooParentOrderReference LIKE 'INT%' THEN SopOrder_0.SooParentOrderReference ELSE 0 END) AS 'INT'
Just so you know, the error I get when importing at the moment is: syntax error at or about "CASE WHEN sopOrder_0.SooParentOrderRefer" (10713)
Try removing the ELSE:
COUNT(DISTINCT CASE WHEN SopOrder_0.SooParentOrderReference LIKE 'INT%' THEN SopOrder_0.SooParentOrderReference END) AS num_int
You don't specify the error, but the problem is probably that the THEN is returning a string and the ELSE a number -- so there is an attempt to convert the string values to a number.
Also, learn to use proper, explicit, standard JOIN syntax. Simple rule: Never use commas in the FROM clause.
count distinct on the SooOrderNumber or the SooParentOrderReference, whichever makes more sense for you.
If you are COUNTing, you need to make NULL the thing that you are not counting. I prefer to include an else in the case because it is more consistent and complete.
, Count(DISTINCT CASE WHEN SopOrder_0.SooParentOrderReference LIKE 'INT%' THEN SopOrder_0.SooParentOrderReference ELSE null END) AS 'INT'
Gordon Linoff is correct regarding the source of your error, i.e. datatype mismatch between the case then value else value end. null removes (should remove) this ambiguity - I'd need to double check.
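To see the effect of COUNT(DISTINCT CASE ... END) on the sample data from the question, here is a sketch on SQLite from Python. SQLite is only a stand-in: whether the Progress ODBC driver accepts the same expression is exactly what the thread is unsure about. The three joined tables are flattened into one hypothetical order_lines table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table order_lines ("
    "company text, order_date text, item text, order_no text, value integer, ref text)"
)
conn.executemany(
    "insert into order_lines values (?, ?, ?, ?, ?, ?)",
    [
        ("CompanyA", "03/01/2019", "Item1", "Order1", 170, "INT1"),
        ("CompanyA", "03/01/2019", "Item2", "Order1", 0, "INT1"),
        ("CompanyA", "03/01/2019", "Item3", "Order2", 160, "WEB2"),
        ("CompanyA", "03/01/2019", "Item4", "Order2", 0, "WEB2"),
    ],
)

COUNT_DISTINCT_CASE = """
select company, order_date,
       count(*)                 as order_items,
       count(distinct order_no) as orders,
       sum(value)               as order_value,
       -- no ELSE: non-matching rows yield NULL, which COUNT ignores
       count(distinct case when ref like 'WEB%' then ref end) as web,
       count(distinct case when ref like 'INT%' then ref end) as int_orders
from order_lines
group by company, order_date
"""

summary = conn.execute(COUNT_DISTINCT_CASE).fetchall()
print(summary)
```

Each reference is counted once per group, so the two item lines that share INT1 contribute a single internal order.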
Editing my earlier answer...
Even though it looks, as you say, like count distinct is not supported in Pervasive PSQL, CTEs are supported. So you can do something like...
This is what you are trying to do but it is not supported...
with
dups as
(
select 1 as id, 'A' as col1 union all select 1, 'A' union all select 1, 'B' union all select 2, 'B'
)
select id
,count(distinct col1) as col_count
from dups
group by id;
Stick another CTE in the query to de-duplicate the data first. Then count as normal. That should work...
with
dups as
(
select 1 as id, 'A' as col1 union all select 1, 'A' union all select 1, 'B' union all select 2, 'B'
)
,de_dup as
(
select id
,col1
from dups
group by id
,col1
)
select id
,count(col1) as col_count
from de_dup
group by id;
These 2 versions should give the same result set.
There is always a way!!
I cannot explain the error you are getting. You are mistakenly using single quotes for alias names, but I don't actually think this is causing the error.
Anyway, I suggest you aggregate your order items per order first and only join then:
SELECT
c.coacompanyname
, so.sooorderdate
, COUNT(*) AS orders
, SUM(soi.itemcount) AS order_items
, SUM(soi.ordervalue) AS order_value
, COUNT(CASE WHEN so.sooparentorderreference LIKE 'INT%' THEN 1 END) AS int
, COUNT(CASE WHEN so.sooparentorderreference LIKE 'WEB%' THEN 1 END) AS web
FROM sbs.pub.company c
JOIN sbs.pub.soporder so ON so.companyid = c.companyid
JOIN
(
SELECT soporderid, COUNT(*) AS itemcount, SUM(soivalue) AS ordervalue
FROM sbs.pub.soporderitem
GROUP BY soporderid
) soi ON soi.soporderid = so.soporderid
GROUP BY c.coacompanyname, so.sooorderdate
ORDER BY c.coacompanyname, so.sooorderdate;
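The pre-aggregation idea can be tried out on a cut-down schema. The sketch below uses SQLite with two hypothetical tables, orders and order_items, that mirror the sample data from the question; the company table is folded into orders to keep it short.

```python
import sqlite3

SETUP = """
create table orders (order_id integer, company text, order_date text, ref text);
insert into orders values
    (1, 'CompanyA', '03/01/2019', 'INT1'),
    (2, 'CompanyA', '03/01/2019', 'WEB2');

create table order_items (order_id integer, item text, value integer);
insert into order_items values
    (1, 'Item1', 170), (1, 'Item2', 0),
    (2, 'Item3', 160), (2, 'Item4', 0);
"""

PRE_AGGREGATED = """
select o.company, o.order_date,
       count(*)          as orders,
       sum(i.itemcount)  as order_items,
       sum(i.ordervalue) as order_value,
       count(case when o.ref like 'INT%' then 1 end) as int_orders,
       count(case when o.ref like 'WEB%' then 1 end) as web_orders
from orders o
join (
    -- collapse the items to one row per order before joining
    select order_id, count(*) as itemcount, sum(value) as ordervalue
    from order_items
    group by order_id
) i on i.order_id = o.order_id
group by o.company, o.order_date
order by o.company, o.order_date
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
per_company = conn.execute(PRE_AGGREGATED).fetchall()
print(per_company)
```

After the inner GROUP BY there is exactly one row per order, so plain COUNT is enough and no DISTINCT is needed.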

Constructing A Query In BigQuery With CASE Statements

So I'm trying to construct a query in BigQuery that I'm struggling with for a final part.
As of now I have:
SELECT
UNIQUE(Name) as SubscriptionName,
ID,
Interval,
COUNT(mantaSubscriptionIdmetadata) AS SubsPurchased,
SUM(RevenueGenerated) as RevenueGenerated
FROM (
SELECT
mantaSubscriptionIdmetadata,
planIdmetadata,
INTEGER(Amount) as RevenueGenerated
FROM
[sample_internal_data.charge0209]
WHERE
revenueSourcemetadata = 'new'
AND
Status = 'Paid'
GROUP BY
mantaSubscriptionIdmetadata,
planIdmetadata,
RevenueGenerated
)a
JOIN (
SELECT
id,
Name,
Interval
FROM
[sample_internal_data.subplans]
WHERE
id in ('150017','150030','150033','150019')
GROUP BY
id,
Name,
Interval )b
ON
a.planIdmetadata = b.id
GROUP BY
ID,
Interval,
Name
ORDER BY
Interval ASC
The resulting query looks like this
Which is exactly what I'm looking for up to that point.
Now here is what I'm stuck on. There is another column I need to add called SalesRepName. The field will either be null or not null. If it's null, it means the subscription was sold online. If it's not null, it means it was sold via telephone. What I want to do is create two additional columns that say how many were sold via telesales and how many online. The sum of the two columns will always equal the SubsPurchased total.
Can anyone help?
You can include case statements within aggregate functions. Here you could choose sum(case when SalesRepName is null then 1 else 0 end) as online and sum(case when SalesRepName is not null then 1 else 0 end) as telesales.
count(case when SalesRepName is null then 1 end) as online would give the same result. Using sum in these situations is simply my personal preference.
Note that omitting the else clause is equivalent to setting else null, and null isn't counted by count. This can be very useful in combination with exact_count_distinct, which has no equivalent in terms of sum.
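A quick way to convince yourself that the SUM and COUNT variants agree is to run them side by side. This sketch uses SQLite from Python, not BigQuery, and a hypothetical sales table with only the two columns that matter here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table sales (plan text, sales_rep_name text)")
conn.executemany(
    "insert into sales values (?, ?)",
    [
        ("A", None),    # sold online
        ("A", "Ann"),   # sold via telesales
        ("A", None),
        ("B", "Bob"),
    ],
)

CHANNEL_COUNTS = """
select plan,
       count(*) as subs_purchased,
       -- SUM form: every row contributes 1 or 0
       sum(case when sales_rep_name is null then 1 else 0 end) as online_sum,
       -- COUNT form: no ELSE, so non-matching rows are NULL and not counted
       count(case when sales_rep_name is null then 1 end) as online_count,
       count(case when sales_rep_name is not null then 1 end) as telesales
from sales
group by plan
order by plan
"""

channel_counts = conn.execute(CHANNEL_COUNTS).fetchall()
print(channel_counts)
```

In every row the online and telesales columns add up to subs_purchased, which matches the invariant stated in the question.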
Try the query below.
It assumes your SalesRepName field is in the [sample_internal_data.charge0209] table,
and it uses a "tiny version" of SUM(CASE WHEN ...) that works when the value to be summed is 0 or 1:
SUM(SalesRepName IS NULL) AS onlinesales,
SUM(NOT SalesRepName IS NULL) AS telesales
SELECT
UNIQUE(Name) AS SubscriptionName,
ID,
Interval,
COUNT(mantaSubscriptionIdmetadata) AS SubsPurchased,
SUM(RevenueGenerated) AS RevenueGenerated,
SUM(SalesRepName IS NULL) AS onlinesales,
SUM(NOT SalesRepName IS NULL) AS telesales
FROM (
SELECT SalesRepName, mantaSubscriptionIdmetadata, planIdmetadata, INTEGER(Amount) AS RevenueGenerated
FROM [sample_internal_data.charge0209]
WHERE revenueSourcemetadata = 'new'
AND Status = 'Paid'
GROUP BY SalesRepName, mantaSubscriptionIdmetadata, planIdmetadata, RevenueGenerated
)a
JOIN (
SELECT id, Name, Interval
FROM [sample_internal_data.subplans]
WHERE id IN ('150017','150030','150033','150019')
GROUP BY id, Name, Interval
)b
ON a.planIdmetadata = b.id
GROUP BY ID, Interval, Name
ORDER BY Interval ASC

Sql query to find count with a difference condition and total count in the same query

Here is a sample table I have
Logs
user_id, session_id, search_query, action
1, 100, dog, A
1, 100, dog, B
2, 101, cat, A
3, 102, ball, A
3, 102, ball, B
3, 102, kite, A
4, 103, ball, A
5, 104, cat, A
where
miss = for the same user_id and the same session_id, an action A that is not followed by an action B is termed a miss.
Note: action B can happen only after action A has happened.
I am able to find the count of misses for each unique search_query across all users and sessions.
SELECT l1.search_query, count(l1.*) as misses
FROM logs l1
WHERE NOT EXISTS
(SELECT NULL FROM logs l2
WHERE l1.user_id = l2.user_id
AND l1.session_id = l2.session_id
AND l1.session_id != ''
AND l2.action = 'B'
AND l1.action = 'A')
AND l1.action='A'
AND l1.search_query != ''
GROUP BY l1.search_query
order by misses desc;
I am trying to find the value of miss_percentage=(number of misses/total number of rows)*100 for each unique search_query. I couldn't figure out how to find the count with a condition and count without that condition in the same query. Any help would be great.
expected output:
cat 100
kite 100
ball 50
One way to do it is to move the EXISTS into the count
SELECT l1.search_query, count(case when NOT EXISTS
(SELECT 1 FROM logs l2
WHERE l1.user_id = l2.user_id
AND l1.session_id = l2.session_id
AND l1.search_query = l2.search_query
AND l2.action = 'B'
AND l1.action = 'A') then 1 else null end
)*100.0/count(*) as misses
FROM logs l1
WHERE l1.action='A'
AND l1.search_query != ''
GROUP BY l1.search_query
order by misses desc;
This produces the desired results, but also zeros if no misses were found. This can be removed with a HAVING clause, or postprocessing.
Note I also added the clause l1.search_query = l2.search_query that was missing, since otherwise it was counting kite as succeeded, since there is a row with B in the same session.
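The sample log from the question is small enough to check the percentages by hand. The sketch below runs on SQLite from Python and is a slight variant of the query above: the NOT EXISTS flag is computed per row in a derived table and then aggregated, which gives the same conditional count. The secondary sort on search_query is an addition to make the order deterministic.

```python
import sqlite3

LOGS = [
    (1, 100, "dog", "A"),
    (1, 100, "dog", "B"),
    (2, 101, "cat", "A"),
    (3, 102, "ball", "A"),
    (3, 102, "ball", "B"),
    (3, 102, "kite", "A"),
    (4, 103, "ball", "A"),
    (5, 104, "cat", "A"),
]

MISS_PERCENTAGE = """
select search_query,
       sum(miss) * 100.0 / count(*) as miss_percentage
from (
    -- one row per action A, flagged 1 when no matching action B exists
    select l1.search_query,
           case when not exists (
                    select 1 from logs l2
                    where l2.user_id = l1.user_id
                      and l2.session_id = l1.session_id
                      and l2.search_query = l1.search_query
                      and l2.action = 'B')
                then 1 else 0 end as miss
    from logs l1
    where l1.action = 'A'
      and l1.search_query != ''
) flagged
group by search_query
order by miss_percentage desc, search_query
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table logs (user_id integer, session_id integer, search_query text, action text)"
)
conn.executemany("insert into logs values (?, ?, ?, ?)", LOGS)
miss_rates = conn.execute(MISS_PERCENTAGE).fetchall()
print(miss_rates)
```

The dog row shows up with 0, which is the case the answer suggests removing with HAVING or in post-processing.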
I think you just need to use case statements here. If I have understood your problem correctly .. then the solution would be something like this -
WITH summary
AS (
SELECT user_id
,session_id
,search_query
,count(1) AS total_views
,sum(CASE
WHEN action = 'A'
THEN 1
ELSE 0
END) AS action_a
,sum(CASE
WHEN action = 'B'
THEN 1
ELSE 0
END) AS action_b
FROM logs l
GROUP BY user_id
,session_id
,search_query
)
SELECT search_query
,100.0 * sum(action_a - action_b) / sum(action_a) AS miss_percentage -- 100.0 avoids integer division
FROM summary
GROUP BY search_query;
You can always create two queries and combine them into one with a join. Then you can do the calculations in the bridging (or joining) SQL statement.
In MS-SQL compatible SQL this would be:
SELECT actionTypeA, countedA, isNull(countedB,0) as countedB,
(countedA-isNull(countedB,0))*100/countedA as missed
FROM (SELECT search_query as actionTypeA, count(*) as countedA
FROM logs WHERE Action='A' GROUP BY search_query
) as TpA
LEFT JOIN
(SELECT search_query as actionTypeB, count(*) as countedB
FROM logs WHERE Action='B' GROUP BY search_query
) as TpB
ON TpA.actionTypeA = TpB.actionTypeB
The LEFT JOIN is required to select all search queries from the 'A' results, and to join them to the 'B' results only where a B is available.
Since this is very basic SQL (and well optimized by SQL engines), I'd suggest avoiding WHERE EXISTS where possible. IsNull() is an MS-SQL function that turns a NULL value into the integer 0 here, so that it can be used in the calculation.
Finally, you could wrap this statement in an outer query (the alias missed cannot be referenced in the WHERE clause of the same SELECT) and filter on
WHERE missed>0
to get the final result.