Creating row with different where - sql

I have this code to get the number of users of all items in the list and the average level.
select itemId,count(c.characterid) as numberOfUse, avg(maxUpgrade) as averageLevel
from items i inner join characters c on i.characterId=c.characterId
where itemid in (22001,22002,22003,22004,22005,22006,22007,22008,22009,22010,22011,22012,22013,22014,22015,22016,22030,22031,22032,22033,22034,22035,22036,22037,22038,22039,22040,22041,22042,22050,22051,22052,22053,22054,22055,22056,22057,22058,22059,22060,22070,22071,22072,22073,22074,22075,22076,22077,22085,22086,22087,22091,22092)
and attached>0
group by itemId
It returns one column for the item (rune) id, one for the number of users, and one for the average level to which players upgraded it, computed across all players on the server.
I would like a new set of columns for every 10 levels, so I can see which items are used most depending on player level. The item level depends on the player level, so at the moment I select a single range with WHERE itemid>0 and itemid<10, repeat that for every 10 levels, copy the data, and paste it into a Google Sheet.
So I would like a result with these columns:
itemid use_1-10 avg_level_1-10 use_11-20 avg_level_11-20 etc...
That way I could copy all the results at once instead of repeating the same process 15 times.

If I am following this correctly, you can do conditional aggregation. Assuming that a "level" is stored in column level in table characters, you would do:
select i.itemId,
sum(case when c.level between 1 and 10 then 1 else 0 end) as use_1_10,
avg(case when c.level between 1 and 10 then maxUpgrade end) as avg_level_1_10,
sum(case when c.level between 11 and 20 then 1 else 0 end) as use_11_20,
avg(case when c.level between 11 and 20 then maxUpgrade end) as avg_level_11_20,
...
from items i
inner join characters c on i.characterId = c.characterId
where i.itemid in (...) and attached > 0
group by i.itemId
Note: consider prefixing column attached in the where clause with the table it belongs to, in order to avoid ambiguity.
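To see the conditional aggregation in action, here is a minimal sketch that runs the same query shape against an in-memory SQLite database from Python. The table layout, the `level` column on characters, and all sample rows are assumptions made up for illustration; only two level buckets are shown.

```python
import sqlite3

# In-memory database with invented sample data (assumed schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE characters (characterId INTEGER PRIMARY KEY, level INTEGER);
    CREATE TABLE items (itemId INTEGER, characterId INTEGER,
                        maxUpgrade INTEGER, attached INTEGER);
    INSERT INTO characters VALUES (1, 5), (2, 8), (3, 15);
    INSERT INTO items VALUES
        (22001, 1, 2, 1),
        (22001, 2, 4, 1),
        (22001, 3, 6, 1),
        (22002, 3, 3, 1),
        (22002, 1, 9, 0);   -- not attached, so the WHERE clause drops it
""")

# One SUM/AVG pair per level bucket; a CASE without ELSE yields NULL,
# which AVG ignores, so each average only covers its own bucket.
query = """
    SELECT i.itemId,
           SUM(CASE WHEN c.level BETWEEN 1 AND 10 THEN 1 ELSE 0 END) AS use_1_10,
           AVG(CASE WHEN c.level BETWEEN 1 AND 10 THEN i.maxUpgrade END) AS avg_level_1_10,
           SUM(CASE WHEN c.level BETWEEN 11 AND 20 THEN 1 ELSE 0 END) AS use_11_20,
           AVG(CASE WHEN c.level BETWEEN 11 AND 20 THEN i.maxUpgrade END) AS avg_level_11_20
    FROM items i
    INNER JOIN characters c ON i.characterId = c.characterId
    WHERE i.itemId IN (22001, 22002) AND i.attached > 0
    GROUP BY i.itemId
    ORDER BY i.itemId
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

An item that nobody in a bucket uses gets a count of 0 and a NULL (Python `None`) average for that bucket.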

Related

Inner join + group by - select common columns and aggregate functions

Let's say I have two tables:
Customer
---
Id Name
1 Foo
2 Bar
and
CustomerPurchase
---
CustomerId, Amount, AmountVAT, Accountable(bit)
1 10 11 1
1 20 22 0
2 5 6 0
2 2 3 0
I need a single record for every joined and grouped Customer and CustomerPurchase group.
Every record would contain
columns from table Customer
some aggregation functions like SUM
a 'calculated' column. For example difference of other columns
result of subquery to CustomerPurchase table
An example of the result I would like to get:
CustomerPurchases
---
Name Total TotalVAT VAT TotalAccountable
Foo 30 33 3 10
Bar 7 9 2 0
I was able to get a single row only by grouping by all the common columns, which I don't think is the right way to do it. Plus, I have no idea how to do the 'VAT' column and the 'TotalAccountable' column, which filters out only certain rows of CustomerPurchase and then runs some kind of aggregate function on the result. The following example doesn't work, of course, but I wanted to show what I would like to achieve:
select C.Name,
SUM(CP.Amount) as 'Total',
SUM(CP.AmountVAT) as 'TotalVAT',
diff? as 'VAT',
subquery? as 'TotalAccountable'
from Customer C
inner join CustomerPurchase CR
on C.Id = CR.CustomerId
group by C.Id
I would suggest you just need the following slight changes to your query. For clarity, I would also consider using the terms net and gross, if you can, which are typical for prices excluding and including VAT.
select c.[Name],
Sum(cp.Amount) as Total,
Sum(cp.AmountVAT) as TotalVAT,
Sum(cp.AmountVAT) - Sum(cp.Amount) as VAT,
Sum(case when cp.Accountable = 1 then cp.Amount else 0 end) as TotalAccountable
from Customer c
join CustomerPurchase cp on cp.CustomerId = c.Id
group by c.[Name];
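As a quick check against the sample data from the question, the sketch below loads the two tables into an in-memory SQLite database from Python and runs the same aggregation. Two details are choices of this sketch rather than part of the answer: the CASE uses ELSE 0 so that a customer with no accountable purchases shows 0 rather than NULL, and the grouping includes the customer Id so that two customers sharing a name are not merged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (Id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE CustomerPurchase (CustomerId INTEGER, Amount INTEGER,
                                   AmountVAT INTEGER, Accountable INTEGER);
    INSERT INTO Customer VALUES (1, 'Foo'), (2, 'Bar');
    INSERT INTO CustomerPurchase VALUES
        (1, 10, 11, 1), (1, 20, 22, 0), (2, 5, 6, 0), (2, 2, 3, 0);
""")

rows = conn.execute("""
    SELECT c.Name,
           SUM(cp.Amount)                      AS Total,
           SUM(cp.AmountVAT)                   AS TotalVAT,
           SUM(cp.AmountVAT) - SUM(cp.Amount)  AS VAT,
           -- conditional aggregation: only accountable rows contribute
           SUM(CASE WHEN cp.Accountable = 1 THEN cp.Amount ELSE 0 END) AS TotalAccountable
    FROM Customer c
    JOIN CustomerPurchase cp ON cp.CustomerId = c.Id
    GROUP BY c.Id, c.Name
    ORDER BY c.Id
""").fetchall()
for row in rows:
    print(row)
```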

Re-coding/transforming SQL values into new columns from linked data: why is CASE WHEN returning multiple values?

I work with a lot of linked data from multiple tables. As a result, I'm running into some challenges with deduplication and re-coding values into new columns in a more meaningful way.
My core data set is a list of person-level records as rows. However, the linked data include multiple rows per person based on the dates they've been booked into events, whether they've showed up or not, and whether they're a member of our organisation. There are usually multiple bookings. It is possible to lose membership status and continue to attend events/cancel/etc, but we are interested in whether or not they have ever been a member and if not, which is the highest level of contact they have ever had with our organisation.
In short: If they have ever been a member, that needs to take precedence.
select distinct
a.ticketnumber,
a.id,
-- (many additional columns from multiple tables here)
case
when b.Went_Member >=1 then 'Member'
when b.Went_NonMember >=1 then 'Attended but not member'
when b.Going_NonMember >=1 then 'Going but not member'
when b.OptOut='1' then 'Opt Out'
when b.Cancelled >=1 then 'Cancelled'
when c.MemberStatus = '9' then 'Member'
when c.MemberStatus = '6' then 'Attended but not member'
when c.DateBooked > current_timestamp then 'Going but not member'
when c.OptOut='1' then 'Opt out'
when c.MemberStatus = '8' then 'Cancelled'
end [NewMemberStatus]
from table1 a
left join TableWithMemberStatus1 b on a.id = b.id
left join TableWithMemberStatus2 c on a.id = c.id
-- (further left joins to additional tables here)
order by a.ticketnumber
Table b is more accurate because these are our internal records, whereas table c is from a third party. Annoyingly, the numbers in C aren't in the same meaningful order as we've decided so I can't have it select the highest value for each ID.
I was under the impression that CASE goes down the list of WHEN statements and returns the first matching value, but this will produce multiple rows. For example:
ID NewMemberStatus
---
989898 NULL
989898 Cancelled
777777 Member
111111 Cancelled
111111 Member
I feel like maybe there is something missing in terms of ORDER BY or GROUP BY that I should be adding? I tried COALESCE with CASE inside and it didn't work. Should I be nesting some things in parentheses?
In your query you are showing all rows (all bookings), because there is no WHERE clause and no aggregation. But you only want one result row per person.
You want a person's best status from the internal table. If there is no entry for the person in the internal table, you want their best status from the third party table. You get the best statuses by aggregating the rows in the internal and third party tables by person. Then join to the person.
I am using status numbers, because these can be ordered (I use 1 for the best status (member), so I look for the minimum status). In the end I replace the number found with the related text (e.g. 'Member' for status 1).
select
p.*,
case coalesce(i.best_status, tp.best_status)
when 1 then 'Member'
when 2 then 'Attended but not member'
when 3 then 'Going but not member'
when 4 then 'Opt out'
when 5 then 'Cancelled'
else 'unknown'
end as status
from person p
left join
(
select
person_id,
min(case when went_member >= 1 then 1
when went_nonmember >= 1 then 2
when going_nonmember >= 1 then 3
when optout = 1 then 4
when cancelled >= 1 then 5
end) as best_status
from internal_table
group by person_id
) i on i.person_id = p.person_id
left join
(
select
person_id,
min(case when MemberStatus = 9 then 1
when MemberStatus = 6 then 2
when DateBooked > current_timestamp then 3
when optout = 1 then 4
when memberstatus = 8 then 5
end) as best_status
from thirdparty_table
group by person_id
) tp on tp.person_id = p.person_id
order by p.person_id;
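The following sketch exercises that query on a handful of invented rows, using an in-memory SQLite database from Python. The table and column names follow the answer; the sample people, bookings, and the stored date format are assumptions. Person 111111 has a cancelled booking and a member attendance internally, 989898 only has a cancellation plus an empty row, 777777 exists only in the third-party table, and 555555 has no status rows at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (person_id INTEGER PRIMARY KEY);
    CREATE TABLE internal_table (person_id INTEGER, went_member INTEGER,
        went_nonmember INTEGER, going_nonmember INTEGER,
        optout INTEGER, cancelled INTEGER);
    CREATE TABLE thirdparty_table (person_id INTEGER, MemberStatus INTEGER,
        DateBooked TEXT, optout INTEGER);
    INSERT INTO person VALUES (111111), (555555), (777777), (989898);
    INSERT INTO internal_table VALUES
        (111111, 0, 0, 0, 0, 1),
        (111111, 1, 0, 0, 0, 0),
        (989898, 0, 0, 0, 0, 1),
        (989898, 0, 0, 0, 0, 0);
    INSERT INTO thirdparty_table VALUES
        (777777, 9, '2000-01-01 00:00:00', 0),
        (111111, 8, '2000-01-01 00:00:00', 0);
""")

rows = conn.execute("""
    SELECT p.person_id,
           CASE COALESCE(i.best_status, tp.best_status)
               WHEN 1 THEN 'Member'
               WHEN 2 THEN 'Attended but not member'
               WHEN 3 THEN 'Going but not member'
               WHEN 4 THEN 'Opt out'
               WHEN 5 THEN 'Cancelled'
               ELSE 'unknown'
           END AS status
    FROM person p
    LEFT JOIN (
        -- lowest number = best status per person in the internal data
        SELECT person_id,
               MIN(CASE WHEN went_member >= 1 THEN 1
                        WHEN went_nonmember >= 1 THEN 2
                        WHEN going_nonmember >= 1 THEN 3
                        WHEN optout = 1 THEN 4
                        WHEN cancelled >= 1 THEN 5 END) AS best_status
        FROM internal_table
        GROUP BY person_id
    ) i ON i.person_id = p.person_id
    LEFT JOIN (
        -- same ranking applied to the third-party codes
        SELECT person_id,
               MIN(CASE WHEN MemberStatus = 9 THEN 1
                        WHEN MemberStatus = 6 THEN 2
                        WHEN DateBooked > CURRENT_TIMESTAMP THEN 3
                        WHEN optout = 1 THEN 4
                        WHEN MemberStatus = 8 THEN 5 END) AS best_status
        FROM thirdparty_table
        GROUP BY person_id
    ) tp ON tp.person_id = p.person_id
    ORDER BY p.person_id
""").fetchall()
for row in rows:
    print(row)
```

Each person appears exactly once, and the internal 'Member' status for 111111 wins over the third-party 'Cancelled' code because COALESCE only falls back to the third-party value when the internal one is NULL.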

Create function that returns table in SQL

I wanted to create a view with some logic such as a for loop or if .. else, but since that's not supported in SQL,
I thought of creating a table function that takes no parameters and returns a table.
I have a table for orders as below
OrderId Destination Category Customer
----------------------------------------
6001 UK 5 Adam
6002 GER 3 Jack
And table for tracking orders as below
ID OrderID TrackingID
-----------------------
1 6001 1
2 6001 2
3 6002 2
And here are the types of tracking
ID Name
--------------
1 Processing
2 Shipped
3 Delivered
As you can see in the tracking order table, an order number may have more than one record, depending on how many tracking events have occurred.
We have more than 25 tracking types that I didn't include here, which means one order can appear up to 25 times in the tracking order table.
With that being said, my requirement is to create a view as below, with the condition that an order must belong to category 5 or 3 (we have more than 15 categories).
Whenever I run the function, it must return the updated information.
So, for example, when a new tracking event occurs and is inserted into the tracking order table, I want to run my function and see the update in the corresponding flag column (e.g. isDelivered).
I'm really confused about the best way to achieve this. I don't need the exact script; I just need to understand the approach, as I'm not very familiar with SQL.
It could be done with a crosstab query using conditional aggregation. Something like this:
select o.OrderID,
max(case when tt.[Name]='Processing' then 1 else 0 end) isPrepared,
max(case when tt.[Name]='Shipped' then 1 else 0 end) isShipped,
max(case when tt.[Name]='Delivered' then 1 else 0 end) isDelivered
from orders o
join tracking_orders tro on o.OrderID=tro.OrderID
join tracking_types tt on tro.TrackingID=tt.ID
where o.category in(3, 5)
group by o.OrderID;
[EDIT] To break out Category 3 orders, 3 additional columns were added to the cross tab.
select o.OrderID,
max(case when tt.[Name]='Processing' then 1 else 0 end) isPrepared,
max(case when tt.[Name]='Shipped' then 1 else 0 end) isShipped,
max(case when tt.[Name]='Delivered' then 1 else 0 end) isDelivered,
max(case when tt.[Name]='Processing' and o.category=3 then 1 else 0 end) isC3Prepared,
max(case when tt.[Name]='Shipped' and o.category=3 then 1 else 0 end) isC3Shipped,
max(case when tt.[Name]='Delivered' and o.category=3 then 1 else 0 end) isC3Delivered
from orders o
join tracking_orders tro on o.OrderID=tro.OrderID
join tracking_types tt on tro.TrackingID=tt.ID
where o.category in(3, 5)
group by o.OrderID;
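Because a view is re-evaluated every time it is queried, wrapping the crosstab in a plain view is enough to always see current flags; no loop or table function is needed. The sketch below shows this with an in-memory SQLite database from Python. The view name `order_status` and the table names are assumptions (the question does not name its tables); the rows are the sample data from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (OrderID INTEGER PRIMARY KEY, Destination TEXT,
                         Category INTEGER, Customer TEXT);
    CREATE TABLE tracking_types (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE tracking_orders (ID INTEGER PRIMARY KEY, OrderID INTEGER,
                                  TrackingID INTEGER);
    INSERT INTO orders VALUES (6001, 'UK', 5, 'Adam'), (6002, 'GER', 3, 'Jack');
    INSERT INTO tracking_types VALUES (1, 'Processing'), (2, 'Shipped'), (3, 'Delivered');
    INSERT INTO tracking_orders VALUES (1, 6001, 1), (2, 6001, 2), (3, 6002, 2);

    -- Hypothetical view name; one 0/1 flag per tracking type.
    CREATE VIEW order_status AS
    SELECT o.OrderID,
           MAX(CASE WHEN tt.Name = 'Processing' THEN 1 ELSE 0 END) AS isPrepared,
           MAX(CASE WHEN tt.Name = 'Shipped'    THEN 1 ELSE 0 END) AS isShipped,
           MAX(CASE WHEN tt.Name = 'Delivered'  THEN 1 ELSE 0 END) AS isDelivered
    FROM orders o
    JOIN tracking_orders tro ON o.OrderID = tro.OrderID
    JOIN tracking_types tt ON tro.TrackingID = tt.ID
    WHERE o.Category IN (3, 5)
    GROUP BY o.OrderID;
""")

before = conn.execute("SELECT * FROM order_status ORDER BY OrderID").fetchall()

# A new tracking event arrives: order 6001 is delivered.
conn.execute("INSERT INTO tracking_orders VALUES (4, 6001, 3)")

after = conn.execute("SELECT * FROM order_status ORDER BY OrderID").fetchall()
print(before)
print(after)
```

The second query reflects the new tracking row without any change to the view itself.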

Using Count distinct case in sql and group by multiple columns

I have a query that works great (listed below). The issue is that we have run into a patient who has had events on two different days, and because I am grouping by PATNUM, the patient is only counted once.
How can I get it to count 1 for each distinct combination of PATNUM and SCHDT?
Example:
PATNUM SCHDT
12345 30817
12345 30817
54321 30817
54321 30717
PATNUM 12345 should only count once while PATNUM 54321 should count twice.
My count statement is this:
SELECT ph.*, pi.*,
COUNT(DISTINCT CASE WHEN `SERVTYPE` IN ('INPT','INPFOP','INFOBS','IP') AND Complete ='7' THEN pi.PATNUM ELSE NULL END) AS count1,
COUNT(DISTINCT CASE WHEN `SERVTYPE` IN ('INPT','INPFOP','INFOBS','IP') AND Complete ='8' THEN pi.PATNUM ELSE NULL END) AS count2
FROM patientinfo as pi
INNER JOIN physicians as ph ON pi.SURGEON=ph.PName
WHERE PID NOT IN ('1355','988','767','1289','484','2784')
GROUP BY SURGEON
ORDER BY Dept,SURGEON ASC
Which columns do you want to see?
You can adjust your GROUP BY:
SELECT
ph.pname,
ph.specialty,
SUM(CASE WHEN complete = 7 THEN 1 ELSE 0 END) count1,
SUM(CASE WHEN complete = 8 THEN 1 ELSE 0 END) count2
FROM
(
SELECT
DISTINCT
surgeon,
patnum,
schdt,
complete,
servtype
FROM patientinfo
WHERE complete IN (7,8)
AND servtype IN ('INPT','INPFOP','INFOBS','IP')
AND pid NOT IN ('1355','988','767','1289','484','2784')
) pisub
INNER JOIN physicians ph ON pisub.surgeon = ph.pname
GROUP BY ph.pname, ph.specialty
ORDER BY ph.pname, ph.specialty;
Also, I would make a few suggestions:
If you're going to give your tables an alias, then use the alias when referring to any column in your query. I've made a guess here about which table some of your columns come from (e.g. dept), so feel free to change it if it is not correct.
You don't need to select all columns from both tables if you don't need them.
Most databases will not run a query unless you GROUP BY every non-aggregated column you select. I've written about this for Oracle and SQL in general. MySQL does run such a query when the ONLY_FULL_GROUP_BY mode is disabled, but the values it returns for the ungrouped columns are indeterminate.
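The idea of de-duplicating on (PATNUM, SCHDT) before counting can be checked with the example rows from the question. This is a minimal sketch using an in-memory SQLite database from Python; the surgeon name, the extra rows for Complete = 8 and for an excluded service type, and the reduced column set are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patientinfo (surgeon TEXT, patnum INTEGER, schdt INTEGER,
                              complete INTEGER, servtype TEXT);
    INSERT INTO patientinfo VALUES
        ('Dr A', 12345, 30817, 7, 'INPT'),
        ('Dr A', 12345, 30817, 7, 'INPT'),   -- same patient, same day: counts once
        ('Dr A', 54321, 30817, 7, 'IP'),
        ('Dr A', 54321, 30717, 7, 'IP'),     -- same patient, other day: counts again
        ('Dr A', 99999, 30817, 8, 'INPT'),
        ('Dr A', 11111, 30817, 7, 'OUTPT');  -- service type not in the list
""")

rows = conn.execute("""
    SELECT pisub.surgeon,
           SUM(CASE WHEN pisub.complete = 7 THEN 1 ELSE 0 END) AS count1,
           SUM(CASE WHEN pisub.complete = 8 THEN 1 ELSE 0 END) AS count2
    FROM (
        -- DISTINCT collapses repeated (patnum, schdt) rows before counting
        SELECT DISTINCT surgeon, patnum, schdt, complete
        FROM patientinfo
        WHERE complete IN (7, 8)
          AND servtype IN ('INPT', 'INPFOP', 'INFOBS', 'IP')
    ) pisub
    GROUP BY pisub.surgeon
""").fetchall()
print(rows)
```

PATNUM 12345 contributes 1 and PATNUM 54321 contributes 2 to count1, which matches the expectation stated in the question.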

Pivot for redshift database

I know this question has been asked before, but none of the answers helped me meet my requirements, so I am asking it in a new thread.
In Redshift, how can I pivot the data into one row per unique dimension set, e.g.:
id Name Category count
8660 Iced Chocolate Coffees 105
8660 Iced Chocolate Milkshakes 10
8662 Old Monk Beer 29
8663 Burger Snacks 18
to
id Name Cofees Milkshakes Beer Snacks
8660 Iced Chocolate 105 10 0 0
8662 Old Monk 0 0 29 0
8663 Burger 0 0 0 18
The categories listed above keep changing.
Redshift does not support the pivot operator, and a case expression would not be of much help (if it would, please suggest how to do it).
How can I achieve this result in Redshift?
(The above is just an example; we would have 1000+ categories, and these categories keep changing.)
I don't think there is an easy way to do that in Redshift.
Also, you say you have more than 1000 categories and the number is growing.
You need to take into account that there is a limit of 1600 columns per table;
see the linked documentation:
http://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_TABLE_usage.html
You can use case, but then you need to write a case expression for each category:
select id,
name,
sum(case when Category='Coffees' then count end) as Cofees,
sum(case when Category='Milkshakes' then count end) as Milkshakes,
sum(case when Category='Beer' then count end) as Beer,
sum(case when Category='Snacks' then count end) as Snacks
from my_table
group by 1,2
Another option is to load the table into R, for example, and use the cast function there:
cast(data, name~ category)
and then upload the data back to S3 or Redshift.
We do a lot of pivoting at Ro, so we built a Python-based tool for autogenerating pivot queries. This tool allows for the same basic options as what you'd find in Excel, including specifying aggregation functions as well as whether you want overall aggregates.
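When the category list keeps changing, the CASE expressions can be generated instead of typed by hand. The sketch below is not the tool mentioned above; it is a minimal illustration of the idea, with hypothetical helper names (`quote_ident`, `quote_literal`, `build_pivot_query`) and an in-memory SQLite database standing in for Redshift. It reads the distinct categories, builds one SUM(CASE ...) column per category, and runs the generated query on the sample data from the question.

```python
import sqlite3


def quote_ident(name):
    """Double-quote an identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value):
    """Single-quote a string literal, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_pivot_query(conn, table, key_cols, category_col, value_col):
    """Return a pivot query with one SUM(CASE ...) column per current category."""
    cat_sql = (f"SELECT DISTINCT {quote_ident(category_col)} "
               f"FROM {quote_ident(table)} ORDER BY 1")
    categories = [row[0] for row in conn.execute(cat_sql)]
    keys = ", ".join(quote_ident(k) for k in key_cols)
    cases = ",\n       ".join(
        f"SUM(CASE WHEN {quote_ident(category_col)} = {quote_literal(cat)} "
        f"THEN {quote_ident(value_col)} ELSE 0 END) AS {quote_ident(cat)}"
        for cat in categories
    )
    return (f"SELECT {keys},\n       {cases}\n"
            f"FROM {quote_ident(table)}\nGROUP BY {keys}\nORDER BY {keys}")


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE my_table (id INTEGER, name TEXT, category TEXT, "count" INTEGER);
    INSERT INTO my_table VALUES
        (8660, 'Iced Chocolate', 'Coffees', 105),
        (8660, 'Iced Chocolate', 'Milkshakes', 10),
        (8662, 'Old Monk', 'Beer', 29),
        (8663, 'Burger', 'Snacks', 18);
""")

pivot_sql = build_pivot_query(conn, "my_table", ["id", "name"], "category", "count")
print(pivot_sql)

cursor = conn.execute(pivot_sql)
columns = [d[0] for d in cursor.description]
rows = cursor.fetchall()
print(columns)
for row in rows:
    print(row)
```

The generated columns come out in alphabetical category order, and the per-table column limit mentioned earlier still applies to however many categories end up in the result.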
Redshift released PIVOT/UNPIVOT functionality at re:Invent 2021 (December 2021): https://docs.aws.amazon.com/redshift/latest/dg/r_FROM_clause-pivot-unpivot-examples.html
SELECT *
FROM (SELECT id, Name, Category, count FROM my_table) PIVOT (
SUM(count) FOR Category IN ('Coffees', 'Milkshakes', 'Beer', 'Snacks')
);
If you will typically want to query specific subsets of the categories from the pivot table, a workaround based on the approach linked in the comments might work.
You can populate your "pivot_table" from the original like so:
insert into pivot_table (id, Name, json_cats) (
select id, Name,
'{' || listagg(quote_ident(Category) || ':' || count, ',')
within group (order by Category) || '}' as json_cats
from to_pivot
group by id, Name
)
And access specific categories this way:
select id, Name,
nvl(json_extract_path_text(json_cats, 'Snacks')::int, 0) Snacks,
nvl(json_extract_path_text(json_cats, 'Beer')::int, 0) Beer
from pivot_table
Using varchar(max) for the JSON column type will give 65535 bytes which should be room for a couple thousand categories.
@user3600910 is right about the approach; however, 'END' is required, otherwise a '500310' invalid operation error occurs.
select id,
name,
sum(case when Category='Coffees' then count END) as Cofees,
sum(case when Category='Milkshakes' then count END) as Milkshakes,
sum(case when Category='Beer' then count END) as Beer,
sum(case when Category='Snacks' then count END) as Snacks
from my_table
group by 1,2
The answer given above worked for me after switching count to 1:
select id,
name,
sum(case when Category='Coffees' then 1 end) as Cofees,
sum(case when Category='Milkshakes' then 1 end) as Milkshakes,
sum(case when Category='Beer' then 1 end) as Beer,
sum(case when Category='Snacks' then 1 end) as Snacks
from my_table
group by 1,2