Can SQL compare rows in the same table and dynamically select values?

Recently, I got a table named Appointments.
The requirement is that I need to select only one row for each customer according to two rules:
If the rows have the same time (with the same or different locations), put null in tutor and location.
If the rows have different times (with the same or different locations), pick the row with the smallest (earliest) time.
Since I'm an SQL beginner, I looked into the self-join approach, but it doesn't seem to work in this case.
Expected result
Thanks all, have a great day...

You seem to want the minimum time for each customer, with null values if there are multiple rows and the tutor or location don't match.
You can use window functions:
select customer, starttime,
(case when min(location) = max(location) then min(location) end) as location,
(case when min(tutor) = max(tutor) then min(tutor) end) as tutor
from (select t.*, rank() over (partition by customer order by starttime) as seqnum
from t
) t
where seqnum = 1
group by customer, starttime
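The example below runs the window-function answer so you can see what it returns. It uses SQLite through Python's `sqlite3` module as a stand-in engine, which needs SQLite 3.25 or newer for window functions. The table name `t` comes from the answer. The sample rows, times, rooms and tutors are hypothetical. Customer A has two appointments tied at the earliest time. Customer B has appointments at two different times.

```python
import sqlite3

# Hypothetical sample data: (customer, starttime, location, tutor).
rows = [
    ("A", "09:00", "Room1", "Tom"),
    ("A", "09:00", "Room2", "Ann"),   # same earliest time as the row above
    ("A", "10:00", "Room1", "Tom"),
    ("B", "09:00", "Room1", "Tom"),
    ("B", "11:00", "Room2", "Ann"),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table t (customer text, starttime text, location text, tutor text)")
conn.executemany("insert into t values (?, ?, ?, ?)", rows)

# rank() gives every row tied at the earliest time seqnum = 1;
# min() = max() then checks whether those tied rows agree, else null.
query = """
select customer, starttime,
       case when min(location) = max(location) then min(location) end as location,
       case when min(tutor) = max(tutor) then min(tutor) end as tutor
from (select t.*,
             rank() over (partition by customer order by starttime) as seqnum
      from t) ranked
where seqnum = 1
group by customer, starttime
order by customer
"""
result = conn.execute(query).fetchall()
print(result)
# [('A', '09:00', None, None), ('B', '09:00', 'Room1', 'Tom')]
```

The example uses `rank()` rather than `row_number()` on purpose. That way all rows tied at the earliest time survive into the `group by` and can be compared.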

Related

PostgreSQL fill out nulls with previous value by category

I am trying to fill in some nulls with the previous available value for the same name (sorted by date).
So, from this table:
I need the query to output this:
Now, the idea is that for Jane there was no score on the second and third days, so each of those should equal Jane's score on the most recent previous date that has one. The same goes for Jon. I tried coalesce and range, but range is not implemented yet in Redshift. I also looked at other questions, but they don't fully apply to different categories. Any alternatives?
Thanks!
select d.day, d.name,
coalesce(d.score, (select t.score
from your_table as t
where t.name = d.name and t.day < d.day and t.score is not null
order by t.day desc limit 1)) as score
from your_table as d
The query straightforwardly implements the logic you described:
if score is not null, coalesce will return its value without executing the subquery
if score is null, the subquery will return the last available score for that name before the given date
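Here is the correlated-subquery version running end to end. SQLite through Python's `sqlite3` stands in for PostgreSQL/Redshift. The table name `your_table` and all rows are hypothetical. The `t.score is not null` filter matters for Jane's third day. Without it, the subquery would return day two's null instead of the last real score.

```python
import sqlite3

# Hypothetical data: (day, name, score); None means the score is missing.
rows = [
    ("2020-01-01", "Jane", 10),
    ("2020-01-02", "Jane", None),
    ("2020-01-03", "Jane", None),
    ("2020-01-04", "Jane", 15),
    ("2020-01-01", "Jon", 5),
    ("2020-01-02", "Jon", None),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table your_table (day text, name text, score integer)")
conn.executemany("insert into your_table values (?, ?, ?)", rows)

query = """
select d.day, d.name,
       coalesce(d.score,
                (select t.score
                 from your_table as t
                 where t.name = d.name
                   and t.day < d.day
                   and t.score is not null   -- skip earlier missing scores
                 order by t.day desc
                 limit 1)) as score
from your_table as d
order by d.name, d.day
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

The subquery runs once per null row, so on large tables the window-function approach is usually the cheaper option.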
It's a "gaps and islands" problem, and the query can look like this:
SELECT
day,
name,
MAX(score) OVER (PARTITION BY name, group_id) AS score
FROM (
SELECT
*,
SUM(CASE WHEN score IS NULL THEN 0 ELSE 1 END) OVER (PARTITION BY name ORDER BY day ROWS UNBOUNDED PRECEDING) AS group_id
FROM data
) groups
ORDER BY name DESC, day
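The running `SUM` only increases on rows that have a score. Every null row therefore shares a `group_id` with the last scored row before it, and `MAX(score)` over that group fills the gap. This sketch runs the gaps-and-islands query in SQLite (3.25+) through Python's `sqlite3`. The table `data` and its rows are hypothetical. The subquery alias is `g` here because `groups` is a keyword in newer SQLite versions.

```python
import sqlite3

# Hypothetical data: (day, name, score); None means the score is missing.
rows = [
    ("2020-01-01", "Jane", 10),
    ("2020-01-02", "Jane", None),
    ("2020-01-03", "Jane", None),
    ("2020-01-04", "Jane", 15),
    ("2020-01-01", "Jon", 5),
    ("2020-01-02", "Jon", None),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table data (day text, name text, score integer)")
conn.executemany("insert into data values (?, ?, ?)", rows)

query = """
select day, name,
       max(score) over (partition by name, group_id) as score
from (select *,
             -- running count of non-null scores per name
             sum(case when score is null then 0 else 1 end)
                 over (partition by name order by day
                       rows between unbounded preceding and current row) as group_id
      from data) g
order by name, day
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```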

How do we find frequency of one column based off two other columns in SQL?

I'm relatively new to working with SQL and wasn't able to find any past threads that solve my question. I have three columns in a table: name, customer, and location. I'd like to add a column that shows which location is most frequent for each combination of name and customer (the first two columns).
I have included a photo of an example where, for name Jane and customer BEC, my new column would be "Texas", because it has 2 occurrences as opposed to one for California. Is there any way to implement this?
If you want 'Texas' on all four rows:
select t.Name, t.Customer, t.Location,
(select t2.location
from table1 t2
where t2.name = t.name
group by name, location
order by count(*) desc
fetch first 1 row only
) as most_frequent_location
from table1 t ;
You can also do this with analytic functions:
select t.Name, t.Customer, t.Location,
max(location) keep (dense_rank first order by location_count desc) over (partition by name) most_frequent_location
from (select t.*,
count(*) over (partition by name, customer, location) as location_count
from table1 t
) t;
Both of these versions put 'Texas' in all four rows, because they look only at name. However, each can be tweaked with minimal effort (by also matching on customer) to put 'California' in the row for ARC.
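This sketch shows the correlated-subquery version with the tweak applied. It adds `t2.customer = t.customer`, so the most frequent location is computed per name and customer. SQLite through Python's `sqlite3` stands in for Oracle, so `fetch first 1 row only` becomes `limit 1`. The four rows are a hypothetical reconstruction of the example described in the question.

```python
import sqlite3

# Hypothetical rows: (name, customer, location).
rows = [
    ("Jane", "BEC", "Texas"),
    ("Jane", "BEC", "Texas"),
    ("Jane", "BEC", "California"),
    ("Jane", "ARC", "California"),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table table1 (name text, customer text, location text)")
conn.executemany("insert into table1 values (?, ?, ?)", rows)

query = """
select t.name, t.customer, t.location,
       (select t2.location
        from table1 t2
        where t2.name = t.name
          and t2.customer = t.customer   -- the tweak: also match on customer
        group by t2.location
        order by count(*) desc
        limit 1) as most_frequent_location
from table1 t
order by t.rowid
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```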
In Oracle, you can use the aggregate function stats_mode() to compute the most frequently occurring value in a group.
Unfortunately, it is not implemented as a window (analytic) function, so one option uses an aggregate subquery and then joins it with the original table:
select t.*, s.top_location
from mytable t
inner join (
select name, customer, stats_mode(location) top_location
from mytable
group by name, customer
) s on s.name = t.name and s.customer = t.customer
You could also use a correlated subquery:
select
t.*,
(
select stats_mode(t1.location)
from mytable t1
where t1.name = t.name and t1.customer = t.customer
) top_location
from mytable t
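SQLite has no stats_mode(). As a rough illustration of what the aggregate-then-join query computes, this plain-Python sketch groups hypothetical rows by (name, customer) and takes each group's mode with `collections.Counter`. It then attaches that mode to every original row. On ties, `Counter.most_common` returns the value seen first. That tie handling belongs to this sketch and is not a statement about how Oracle breaks ties.

```python
from collections import Counter, defaultdict

# Hypothetical rows: (name, customer, location).
rows = [
    ("Jane", "BEC", "Texas"),
    ("Jane", "BEC", "Texas"),
    ("Jane", "BEC", "California"),
    ("Jane", "ARC", "California"),
]

# Like GROUP BY name, customer: count each location within its group.
groups = defaultdict(Counter)
for name, customer, location in rows:
    groups[(name, customer)][location] += 1

# Like stats_mode(location): the most frequent location per group.
top_location = {key: counts.most_common(1)[0][0] for key, counts in groups.items()}

# Like the join on name and customer: attach the group's mode to every row.
result = [row + (top_location[(row[0], row[1])],) for row in rows]
for row in result:
    print(row)
```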
This is more a question about understanding the concepts of a relational database. If you want that information, you would not put it in an additional column. It is data derived from several columns across many rows, so why would you store it in the table itself? It is complex to code, and it would also be very expensive for the database (imagine all the rows for which the value would have to be recalculated if someone inserted a million rows).
Instead, you can do one of the following:
Calculate it at runtime, as shown in the other answers.
If you want to make it more persistent, you could embed the query above in a view.
If you want to physically store the info, you could use a materialized view.
The official Oracle documentation covers all three options in detail.
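Here is the view option as a sketch, using SQLite through Python's `sqlite3` as a stand-in. SQLite has no materialized views, so only the plain-view option is shown. The view name `table1_with_mode` and the data are hypothetical. The point is that a plain view recalculates the value each time it is read. Inserting new rows therefore changes the reported most frequent location without any stored column going stale.

```python
import sqlite3

rows = [
    ("Jane", "BEC", "Texas"),
    ("Jane", "BEC", "Texas"),
    ("Jane", "BEC", "California"),
    ("Jane", "ARC", "California"),
]
conn = sqlite3.connect(":memory:")
conn.execute("create table table1 (name text, customer text, location text)")
conn.executemany("insert into table1 values (?, ?, ?)", rows)

# A plain (non-materialized) view: recomputed on every query.
conn.execute("""
create view table1_with_mode as
select t.name, t.customer, t.location,
       (select t2.location
        from table1 t2
        where t2.name = t.name and t2.customer = t.customer
        group by t2.location
        order by count(*) desc
        limit 1) as most_frequent_location
from table1 t
""")

def modes_for(customer):
    """Distinct most_frequent_location values the view reports for one customer."""
    cur = conn.execute(
        "select distinct most_frequent_location from table1_with_mode where customer = ?",
        (customer,),
    )
    return [r[0] for r in cur]

before = modes_for("ARC")
conn.executemany("insert into table1 values (?, ?, ?)", [("Jane", "ARC", "Texas")] * 2)
after = modes_for("ARC")
print(before, after)
```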
Your first step is to construct a query that counts how often each location occurs for each name and customer, which is as simple as:
select Name, Customer, Location, count(*)
from table1
group by Name, Customer, Location
This isn't immediately useful on its own, but the count can be used in row_number(), which numbers the rows within each partition. In the query below, I'm ordering by count(*) in descending order so that the most frequent location gets the value 1.
Note that row_number() assigns 1 to only one row per partition, even when there are ties.
So now we have:
select Name, Customer, Location, row_number() over (partition by Name, Customer order by count(*) desc) freq_name_cust
from table1 tb_
group by Name, Customer, Location
The final step puts it all together:
select tab.*, tb_.Location most_freq_location
from table1 tab
inner join
(select Name, Customer, Location, row_number() over (partition by Name, Customer order by count(*) desc) freq_name_cust
from table1
group by Name, Customer, Location) tb_
on tb_.Name = tab.Name
and tb_.Customer = tab.Customer
and freq_name_cust = 1
For illustration, consider a customer with the same frequency for California and Texas. In that case, row_number() picks one of the tied locations arbitrarily, so add a tie-breaker to the order by if you need a deterministic result.
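This sketch runs the row_number() join against SQLite (3.25+) through Python's `sqlite3`. The hypothetical data includes a tie: ARC has one California row and one Texas row. The query is run twice. The first run orders only by `count(*) desc`, so either tied location can come back for ARC. The second run adds `location` as a tie-breaker. That tie-breaker is an illustrative choice, not part of the original answer.

```python
import sqlite3

rows = [
    ("Jane", "BEC", "Texas"),
    ("Jane", "BEC", "Texas"),
    ("Jane", "BEC", "California"),
    ("Jane", "ARC", "California"),   # tie: one California...
    ("Jane", "ARC", "Texas"),        # ...and one Texas
]
conn = sqlite3.connect(":memory:")
conn.execute("create table table1 (name text, customer text, location text)")
conn.executemany("insert into table1 values (?, ?, ?)", rows)

QUERY_TEMPLATE = """
select tab.name, tab.customer, tab.location, tb_.location as most_freq_location
from table1 tab
inner join (select name, customer, location,
                   row_number() over (partition by name, customer
                                      order by {order_by}) as freq_name_cust
            from table1
            group by name, customer, location) tb_
  on tb_.name = tab.name
 and tb_.customer = tab.customer
 and tb_.freq_name_cust = 1
"""

def run(order_by):
    """Run the join with the given ordering inside row_number()."""
    return conn.execute(QUERY_TEMPLATE.format(order_by=order_by)).fetchall()

arbitrary = run("count(*) desc")                  # ARC result depends on the engine
deterministic = run("count(*) desc, location")    # ties broken alphabetically
print({r[3] for r in arbitrary if r[1] == "ARC"})
print({r[3] for r in deterministic if r[1] == "ARC"})
```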

Generic SQL Question (Big Query) - removing rows after a date that is different for each customer

I want to remove all customer hits that I see on my site after they have registered. However, not all customers will register on the same day so I cannot simply filter on a specific date. I have a registration indicator of 1 or 0 and then a hit timestamp, along with unique indicators for the specific customers. I have tried this:
rank() over (partition by customer_id, registration_ind order by hit_timestamp asc) rnk
However, this still partitions by customer and isn't working for what I want.
Any help please?
Thanks
Is this what you want?
select t.*
from (select t.*,
min(case when registration_ind = 1 then hit_timestamp end) over (partition by customer_id) as registration_timestamp
from t
) t
where registration_timestamp is null or
hit_timestamp < registration_timestamp;
For each customer, it returns all rows before the first registration timestamp. Customers who never registered keep all of their rows.
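This sketch runs the query in SQLite (3.25+) through Python's `sqlite3` as a stand-in for BigQuery. The hit data is hypothetical. Customer 1 registers on the third hit. Customer 2 never registers. The inner alias is renamed to `hits` only for readability.

```python
import sqlite3

# Hypothetical hits: (customer_id, hit_timestamp, registration_ind).
rows = [
    (1, "2021-01-01 10:00", 0),
    (1, "2021-01-02 10:00", 0),
    (1, "2021-01-03 10:00", 1),   # customer 1 registers here
    (1, "2021-01-04 10:00", 0),
    (2, "2021-01-01 09:00", 0),   # customer 2 never registers
    (2, "2021-01-05 09:00", 0),
]
conn = sqlite3.connect(":memory:")
conn.execute("create table t (customer_id integer, hit_timestamp text, registration_ind integer)")
conn.executemany("insert into t values (?, ?, ?)", rows)

# The window min() spreads each customer's first registration time onto all of
# that customer's rows; it is null for customers who never registered.
query = """
select customer_id, hit_timestamp
from (select t.*,
             min(case when registration_ind = 1 then hit_timestamp end)
                 over (partition by customer_id) as registration_timestamp
      from t) hits
where registration_timestamp is null
   or hit_timestamp < registration_timestamp
order by customer_id, hit_timestamp
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```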

How to count rows in SQL Server 2012?

I am trying to find whether a person (id = A3) is continuously active in a program for at least five months in a given year (2013). Any suggestion would be appreciated. My data looks like this:
You simply use group by and a conditional expression:
select id,
(case when count(ActiveMonthYear) >= 5 then 'YES!' else 'NAW' end)
from table t
where ListOfTheMonths between '201301' and '201312'
group by id;
EDIT:
I suppose "continuously" doesn't just mean any five months. There are various ways to handle that; I like the difference-of-row-numbers approach:
select distinct id
from (select t.*,
(row_number() over (partition by id order by ListOfTheMonths) -
count(ActiveMonthYear) over (partition by id order by ListOfTheMonths)
) as grp
from table t
where ListOfTheMonths between '201301' and '201312'
) t
where ActiveMonthYear is not null
group by id, grp
having count(*) >= 5;
The difference in the subquery is constant for each run of consecutive active months, and it is then used for grouping. This assumes there is a row for every month, with ActiveMonthYear null in inactive months. The result is a list of all ids that meet this criterion. You can add a where condition for a particular id (do it in the subquery).
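This sketch runs both queries from this answer in SQLite (3.25+) through Python's `sqlite3`, as a stand-in for SQL Server 2012. The table name `activity` replaces the placeholder `table`, and the data is hypothetical. A3 is active for five consecutive months. B1 is active for six months, but none of them are consecutive. The simple count accepts both ids. The difference of row numbers keeps only A3.

```python
import sqlite3

# Hypothetical data: one row per id per month of 2013; ActiveMonthYear holds the
# month when the person was active and NULL otherwise.
active = {
    "A3": {2, 3, 4, 5, 6, 9, 11},   # Feb-Jun is a run of five months
    "B1": {1, 3, 5, 7, 9, 11},      # six months, none consecutive
}
conn = sqlite3.connect(":memory:")
conn.execute("create table activity (id text, ListOfTheMonths text, ActiveMonthYear text)")
for person, months in active.items():
    for m in range(1, 13):
        month = f"2013{m:02d}"
        conn.execute(
            "insert into activity values (?, ?, ?)",
            (person, month, month if m in months else None),
        )

# First answer: counts active months whether or not they are consecutive.
simple = conn.execute("""
select id, case when count(ActiveMonthYear) >= 5 then 'YES!' else 'NAW' end
from activity
where ListOfTheMonths between '201301' and '201312'
group by id
order by id
""").fetchall()

# Difference of row numbers: row_number() grows on every month, the running
# count only on active months, so their difference is constant within a run.
consecutive = conn.execute("""
select distinct id
from (select a.*,
             row_number() over (partition by id order by ListOfTheMonths)
             - count(ActiveMonthYear) over (partition by id order by ListOfTheMonths) as grp
      from activity a
      where ListOfTheMonths between '201301' and '201312') t
where ActiveMonthYear is not null
group by id, grp
having count(*) >= 5
order by id
""").fetchall()

print(simple)
print(consecutive)
```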
By the way, this is written using select distinct and group by. This is one of the rare cases where these two are appropriately used together. A single id could have two periods of five months in the same year. There is no reason to include that person twice in the result set.

T-SQL Select first instance

I have a table that contains patient locations. I'm trying to find the first patient location that is not the emergency department. I tried using MIN, but since the locations have numbers in them, it pulls the minimum location, which is not necessarily the first location. There is a datetime field associated with each location, but I'm not sure how to link the minimum datetime to the first location. Any help would be appreciated. My query looks something like this:
SELECT PatientName,
MRN,
CSN,
MIN (LOC) as FirstUnit,
MIN (DateTime)as FirstUnitTime
FROM Patients
WHERE LOC <> 'ED'
I presume that you want the first unit for each patient. If so, then you can use row_number():
select PatientName, MRN, CSN, LOC as FirstUnit, DateTime as FirstUnitTime
from (select p.*,
row_number() over (partition by PatientName, MRN, CSN
order by datetime asc) as seqnum
from Patients p
where loc <> 'ED'
) p
where seqnum = 1;
row_number() assigns a sequential number to each row in a group, where the group is specified by the partition by clause. The numbers follow the order defined by the order by clause, so the oldest (first) row in each group is assigned a value of 1. The outer query keeps only that row.
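This sketch contrasts the asker's MIN(LOC) idea with the row_number() answer. It uses SQLite (3.25+) through Python's `sqlite3` as a stand-in for SQL Server, with hypothetical patients and unit names. For Smith, MIN(LOC) compares names as text and returns "2E" even though "4W" came first in time. The row_number() query returns "4W".

```python
import sqlite3

# Hypothetical rows: (PatientName, MRN, CSN, LOC, DateTime).
rows = [
    ("Smith", "M1", "C1", "ED", "2021-01-01 08:00"),
    ("Smith", "M1", "C1", "4W", "2021-01-01 10:00"),   # first non-ED unit
    ("Smith", "M1", "C1", "2E", "2021-01-01 12:00"),   # sorts before "4W" as text
    ("Jones", "M2", "C2", "ED", "2021-01-02 09:00"),
    ("Jones", "M2", "C2", "ICU", "2021-01-02 09:30"),
]
conn = sqlite3.connect(":memory:")
conn.execute("create table Patients (PatientName text, MRN text, CSN text, LOC text, DateTime text)")
conn.executemany("insert into Patients values (?, ?, ?, ?, ?)", rows)

# MIN(LOC) picks the alphabetically smallest unit, not the earliest one.
naive = conn.execute("""
select PatientName, min(LOC)
from Patients
where LOC <> 'ED'
group by PatientName
order by PatientName
""").fetchall()

# row_number() ordered by DateTime marks the earliest non-ED row with 1.
first_unit = conn.execute("""
select PatientName, MRN, CSN, LOC as FirstUnit, DateTime as FirstUnitTime
from (select p.*,
             row_number() over (partition by PatientName, MRN, CSN
                                order by DateTime asc) as seqnum
      from Patients p
      where LOC <> 'ED') p
where seqnum = 1
order by PatientName
""").fetchall()

print(naive)
print(first_unit)
```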