Finding first pending events from SQL table - sql

I have a table which contains TV Guide data.
In a simplified form, the columns look like the following...
_id, title, start_time, end_time, channel_id
What I'm trying to do is create a list of TV shows in a NOW/NEXT format. Generating the 'NOW' list (what's currently being broadcast) is easy but trying to get a list of what is showing 'NEXT' is causing me problems.
I tried this...
SELECT * from TV_GUIDE where start_time >= datetime('now') GROUP BY channel_id
Sure enough this gives me one TV show for each TV channel_id but it gives me the very last shows (by date/time) in the TV_GUIDE table.
SQL isn't my strong point and I'm struggling to work out why only the last TV shows are returned. It seems I need to do a sub-query of a query (or a query of a sub-query). I've tried combinations of ORDER BY and LIMIT but they don't help.

I believe that you first must select the pairs (channel_id, start_time) from TV_GUIDE, based on your criteria.
Then, you will display those records of TV_GUIDE which match those criteria:
SELECT source.*
FROM TV_GUIDE AS source
JOIN (SELECT channel_id, MIN(start_time) AS start_time
      FROM TV_GUIDE
      WHERE start_time >= datetime('now')
      GROUP BY channel_id) AS start_times
  ON source.channel_id = start_times.channel_id
 AND source.start_time = start_times.start_time
ORDER BY channel_id;
This first selects, for each channel, the minimum start time after now, giving you the (channel_id, start_time) pairs. The JOIN back to the same table then fills in the remaining columns. (MySQL sometimes lets you retrieve those extra columns from the grouped query alone, but I feel that's a bad habit to acquire: add a field later and the query may silently stop working.)
You might want to add an index on the combined fields (start_time, channel_id). Just to be on the safe side, make it a UNIQUE index.
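If your SQLite build is 3.25 or newer (or you're on any database with window functions), a ROW_NUMBER sketch avoids the self-join entirely; this assumes the same TV_GUIDE columns as above:
SELECT *
FROM (SELECT tg.*,
             ROW_NUMBER() OVER (PARTITION BY channel_id
                                ORDER BY start_time) AS rn
      FROM TV_GUIDE tg
      WHERE start_time >= datetime('now')) AS ranked
WHERE rn = 1
ORDER BY channel_id;
Each channel's upcoming shows are numbered by start time, and rn = 1 keeps only the earliest one per channel.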

Select the last user_property in a dataset in BigQuery per user

I have this query, and its goal is to get all the user_properties of the events stored in the dataset. The result is around 300k+ rows per day, which is far too big, and I only care about one user_property per user, since it will contain the keys I want.
To explain further: we record the events a user performs on the mobile/web app in the dataset, so every button click and every screen search is recorded for later analysis by clients. A single user may have 0, 100, or more events per day, and usually the last recorded event contains all the updated keys I want.
SELECT
user_pseudo_id AS user_id,
user_properties AS user_properties
FROM
`TABLENAME`
ORDER BY user_pseudo_id, event_timestamp
I tried grouping the user_properties by user_pseudo_id, but that obviously didn't work because the properties are not the same for every row.
My solution was to get all the results from the query above, loop over them, and store them in a Map<String, List<FieldValue>>. This solution does the trick, but userPropertiesResult.iterateAll() is too expensive and takes a lot of time.
So I came up with a better query that reduced the number of rows by a lot following this answer https://stackoverflow.com/a/43863450/7298897
SELECT
a.user_pseudo_id AS user_id,
a.user_properties AS user_properties
FROM
`TABLENAME` AS a
JOIN (
SELECT
user_pseudo_id,
MAX(event_timestamp) AS event_timestamp
FROM
`TABLENAME`
GROUP BY
user_pseudo_id) AS b
ON
a.user_pseudo_id = b.user_pseudo_id
AND a.event_timestamp = b.event_timestamp
But the problem is that the data returned is not as accurate as it was before.
So my question would be: how can I get only the last user_property per user?
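One note on the JOIN approach: if several events share a user's MAX(event_timestamp), the join keeps all of them, which could explain the discrepancy. A ROW_NUMBER sketch in standard BigQuery SQL (using only the columns already shown above) picks exactly one row per user even on ties:
SELECT user_id, user_properties
FROM (
  SELECT
    user_pseudo_id AS user_id,
    user_properties,
    ROW_NUMBER() OVER (
      PARTITION BY user_pseudo_id
      ORDER BY event_timestamp DESC  -- latest event first
    ) AS rn
  FROM
    `TABLENAME`
)
WHERE rn = 1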

Select latest and earliest times within a time group and a pivot statement

I have attendance data that contains a username, time, and status (IN or OUT). I want to show attendance data that contains a name and the check in/out times. I expect a person to check in and out no more than twice a day. The data looks like this:
As you can see, my problem is that one person can have multiple entries, seconds apart, for the same login attempt. This is because I get the data from a fingerprint attendance scanner, and the machine in some cases makes multiple entries, sometimes just 5-10 seconds apart. I want to select the data to be like this:
How can I identify the proper time for the login attempt, and then select the data with a pivot?
First, you need to normalize your data by removing the duplicate entries. In your situation that's a challenge, because the duplicated data isn't easily identified as a duplicate. You can make some assumptions, though. Below, I assume that no one will make multiple login attempts in a two-minute window. You can do this by first using a Common Table Expression (CTE, via the WITH clause).
Within the CTE, you can use the LAG function. Essentially what this code says is: "for each partition of user and entry type, look at the previous time; if there is no previous time, or it is more than two minutes before this one, the row is a genuine new attempt." Note that analytic functions like LAG can't be used directly in a WHERE clause, so the flag is computed in an inline view and filtered one level up. The CTE then returns just the IDs of entry events that were distinct attempts.
Now, you prepare another CTE that a PIVOT will pull from that has everything from your table, but only for the entry IDs you cared about. The PIVOT is going to look over the MIN/MAX of your IN/OUT times.
WITH UNIQUE_LOGINS AS (
    SELECT ID
    FROM (SELECT ID, TIME,
                 LAG(TIME) OVER (PARTITION BY USERNAME, STATUS
                                 ORDER BY TIME) AS PREV_TIME
          FROM LOGIN_TABLE)
    WHERE PREV_TIME IS NULL                    -- first attempt in its partition
       OR PREV_TIME + (2/60/24) < TIME         -- or more than 2 minutes after the previous entry
),
TEMP_FOR_PIVOT AS (
    SELECT USERNAME, TIME, STATUS
    FROM LOGIN_TABLE
    WHERE ID IN (SELECT ID FROM UNIQUE_LOGINS)
)
SELECT *
FROM TEMP_FOR_PIVOT
PIVOT (
    MIN(TIME) AS FIRST_T, MAX(TIME) AS LAST_T
    FOR STATUS IN ('IN' AS CHECKIN, 'OUT' AS CHECKOUT)
)
From there, if you need to rearrange or rename your columns, then you can just put that last SELECT into yet another CTE and then select your values from it. There is some more about PIVOT here: Rotate/pivot table with aggregation in Oracle
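For instance, a minimal self-contained sketch of that renaming step (the dummy rows and output names here are illustrative assumptions; Oracle builds the pivoted column names by combining the IN alias with the measure alias):
WITH RAW_TIMES AS (
    SELECT 'jsmith' AS USERNAME, TIMESTAMP '2020-01-01 08:59:00' AS TIME, 'IN' AS STATUS FROM DUAL
    UNION ALL
    SELECT 'jsmith', TIMESTAMP '2020-01-01 17:01:00', 'OUT' FROM DUAL
),
PIVOTED AS (
    SELECT *
    FROM RAW_TIMES
    PIVOT (MIN(TIME) AS FIRST_T, MAX(TIME) AS LAST_T
           FOR STATUS IN ('IN' AS CHECKIN, 'OUT' AS CHECKOUT))
)
SELECT USERNAME,
       CHECKIN_FIRST_T  AS CHECK_IN,
       CHECKOUT_LAST_T  AS CHECK_OUT
FROM PIVOTED;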

Sum column total based on criteria sql

I am not even sure if I am asking this question correctly. Algorithmically I know what I want to do, but don't know the appropriate syntax in SQL.
I have created a table that contains total online session times by customer number, IP, session start time, and total session length. Here is an example of what this table looks like (IP and CustNo are masked; also I'm not sure how to format tables, so excuse the weirdness):
CustNo   minDate                  maxDate                  ClientIp   timeDiff
123456   2017-11-14-02:39:27.093  2017-11-14-02:39:59.213  1.1.1.1    0.000372
I then create another table looking for a specific type of activity and want to know how long this specific user has used that IP for before this specific activity. The second table contains each activity as a separate row, customerID, IP and a timestamp.
Up to here no issue and the tables look fine.
I now need to write the part that looks into the first table by customer ID and IP, then sums all usage of that IP for that customer where the session's min start time is earlier than the activity time, but I have no idea how to do this. Here is the current query (not working, obviously). I am doing a left join because this may be a new IP that is not in the first table yet.
SELECT
*,
SUM(##finalSessionSums.timeDiff)
FROM
##allTransfersToDiffReceip
LEFT JOIN
##finalSessionSums ON ##allTransfersToDiffReceip.CustNo = ##finalSessionSums.CustNo
AND ##allTransfersToDiffReceip.ClientIp = ##finalSessionSums.ClientIp
AND ##allTransfersToDiffReceip.[DateTime] < ##finalSessionSums.minDate
I get an aggregate function error here but I don't know how to approach this at all.
You have a SELECT * (return all columns) alongside an aggregate function (in this case SUM). Whenever you return specific columns alongside aggregated, summarised values, you need to list every non-aggregated column from the SELECT clause in the GROUP BY clause. For example
SELECT
A, B, SUM(C) as CSum
FROM
Table
GROUP BY
A, B
Given the limited information, I can't provide a perfect solution, but I'll give it a try:
First, as Alan mentioned, you have to select only the columns that you need alongside your aggregate function, which here are CustNo and ClientIp. To get the sums, you have to group the query like this:
SELECT s.CustNo, s.ClientIp, SUM(s.timeDiff) AS totalTimeDiff
FROM ##finalSessionSums s
INNER JOIN ##allTransfersToDiffReceip a ON a.CustNo = s.CustNo
    AND a.ClientIp = s.ClientIp
    AND a.[DateTime] < s.minDate
GROUP BY s.CustNo, s.ClientIp;
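Since the question mentions IPs that may not yet exist in ##finalSessionSums, a left-join variant of the same query (a sketch that keeps the original date comparison) preserves those activity rows with a zero total:
SELECT a.CustNo, a.ClientIp,
       COALESCE(SUM(s.timeDiff), 0) AS totalTimeDiff  -- 0 when no matching sessions exist
FROM ##allTransfersToDiffReceip a
LEFT JOIN ##finalSessionSums s ON s.CustNo = a.CustNo
    AND s.ClientIp = a.ClientIp
    AND a.[DateTime] < s.minDate
GROUP BY a.CustNo, a.ClientIp;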

Postgresql/SQL - My date/timestamp logic seems solid but producing nulls

I have a massive table full of hospital visit information. Each row corresponds to one visit. The visit/row itself has a unique ID but also contains a person ID (patient) to match back to that person's specific information.
I'm building a "new patient" sequence model. In doing so, I need to remove any patient from the table who has one (or more) visits before a set date. I can't just remove records before that date, as those "loyal patients" will still have visit information.
I tried to build a look-up table with all the patient IDs that have one or more visits before a certain time. I then tried to use this table to remove all visit information for patients who had one or more visits before that set time.
I've tried multiple variations of the below (WITH statements, DELETE statements, HAVING statements, etc.). Each time, the final table has no rows. I have verified that there are "new patients" with visit dates only after the set date.
My logic feels solid but clearly something is off. Here is the last command I tried. Any help would be greatly appreciated!
create table client_myvisit_notnew_id as
select patient, admissiondate from client_myvisit_primary_temp1
where admissiondate < '2015-05-30 00:00:00';
create table client_myvisit_primary_temp2 as
select * from client_myvisit_primary_temp1
where patient not in
(select patient from client_myvisit_notnew_id);
not in is a very dangerous construct with subqueries. If any of the values returned by the subquery is NULL, then nothing ever passes the filter. Although you can fix this by adding a WHERE clause to the subquery, I suggest that you get used to not exists instead:
select mpt.*
from client_myvisit_primary_temp1 mpt
where not exists (select 1
from client_myvisit_notnew_id mni
where mpt.patient = mni.patient
);
This has the semantics that most people expect.
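To see the trap in isolation, here is a minimal Postgres example with hypothetical values. The NULL in the list makes 1 NOT IN (2, NULL) evaluate to UNKNOWN rather than true, so nothing is returned:
-- Returns zero rows, even though 1 is absent from the non-NULL values
SELECT 1 AS result
WHERE 1 NOT IN (SELECT x FROM (VALUES (2), (NULL::int)) AS t(x));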
EDIT:
If you just want patients whose original visit is after a certain date, then use window functions:
select mpt.*
from (select t.*,
             min(t.admissiondate) over (partition by t.patient) as min_ad
      from client_myvisit_primary_temp1 t
     ) mpt
where min_ad >= '2015-05-30';
Defining the intermediate tables is not necessary.

Need help wrapping head around joins

I have a database of a service that helps people sell things. If they fail a delivery of a sale, they get penalised. I am trying to extract the number of active listings each user had when a particular penalty was applied.
I have the equivalent to the following tables(and relevant fields):
user (id)
listing (id, user_id, status)
transaction (listing_id, seller_id)
listing_history (id, listing_status, date_created)
penalty (id, transaction_id, user_id, date_created)
The listing_history table saves an entry every time a listing is modified, saving a record of what the new state of the listing is.
My goal is to end with a result table with the field: penalty_id, and number of active listings the penalised user had when the penalty was applied.
So far I have the following:
SELECT s1.penalty_id,
COUNT(s1.record_id) 'active_listings'
FROM (
SELECT penalty.id AS 'penalty_id',
listing_history.id AS 'record_id'
FROM user
JOIN penalty ON penalty.user_id = user.id
JOIN transaction ON transaction.id = penalty.transaction_id
JOIN listing_history ON listing_history.listing_id = listing.id
WHERE listing_history.date_created < penalty.date_created
AND listing_history.status = 0
) s1
GROUP BY s1.penalty_id
Status = 0 means that the listing is active (or that the listing was active at the time the record was created). I got results similar to what I expected, but I fear I may be missing something or doing the JOINs wrong. Would this have your approval? (Apart from the obvious non-use of aliases, which I know hurts clarity.)
UPDATE - As the comments on this answer indicate that changing the table structure isn't an option, here are more details on some queries you could use with the existing structure.
Note that I made a couple changes to the query before even modifying the logic.
As viki888 pointed out, there was a problematic reference to listing.id; I've replaced it.
There was no real need for a subquery in the original query; I've simplified it out.
So the original query is rewritten as
SELECT penalty.id AS 'penalty_id'
, COUNT(listing_history.id) 'active_listings'
FROM user
JOIN penalty
ON penalty.user_id = user.id
JOIN transaction
ON transaction.id = penalty.transaction_id
JOIN listing_history
ON listing_history.listing_id = transaction.listing_id
WHERE listing_history.date_created < penalty.date_created
AND listing_history.status = 0
GROUP BY penalty.id
Now the most natural way, in my opinion, to write the corrected timeline constraint is with a NOT EXISTS condition that filters out all but the most recent listing_history record for a given id. This does require thinking about some edge cases:
Could two listing history records have the same create date? If so, how do you decide which happened first?
If a listing history record is created on the same day as the penalty, which is treated as happening first?
If date_created is really a timestamp, then this may not matter much (if at all); if it's really a date, it might be a bigger issue. Since your original query required that the listing history be created before the penalty, I'll continue in that style; but it's still ambiguous how to handle the case where two history records with matching status have the same date. You may need to adjust the date comparisons to get the desired behavior.
SELECT penalty.id AS 'penalty_id'
, COUNT(DISTINCT listing_history.id) 'active_listings'
FROM user
JOIN penalty
ON penalty.user_id = user.id
JOIN transaction
ON transaction.id = penalty.transaction_id
JOIN listing_history
ON listing_history.listing_id = transaction.listing_id
WHERE listing_history.date_created < penalty.date_created
AND listing_history.status = 0
AND NOT EXISTS (SELECT 1
                FROM listing_history h2
                WHERE h2.listing_id = listing_history.listing_id
                  AND listing_history.date_created < h2.date_created
                  AND h2.date_created < penalty.date_created)
GROUP BY penalty.id
Note that I switched from COUNT(...) to COUNT(DISTINCT ...); this helps with some edge cases where two active records for the same listing might be counted.
If you change the date comparisons to use <= instead of < - or, equivalently, if you use BETWEEN to combine the date comparisons - then you'd want to add AND h2.status != 0 (or AND h2.status <> 0, depending on your database) to the subquery so that two concurrent ACTIVE records don't cancel each other out.
There are several equivalent ways to write this, and unfortunately it's the kind of query that doesn't always cooperate with a database query optimizer, so some trial and error may be necessary to make it run well with large data volumes. Hopefully that gives enough insight into the intended logic that you can work out equivalents if need be. You could consider using NOT IN instead of NOT EXISTS, or an outer join to a second instance of LISTING_HISTORY... There are probably others I'm not thinking of offhand.
I don't know that we're in a position to sign off on a general statement that the query is, or is not, "correct". If there's a specific question about whether a query will include/exclude a record in a specific situation (or why it does/doesn't, or how to modify it so it won't/will), those might get more complete answers.
I can say that there are a couple likely issues:
The only glaring logic issue has to do with timeline management, which is something that causes a lot of trouble with SQL. The issue is that while your query demonstrates that the listing was active at some point before the penalty creation date, it doesn't demonstrate that the listing was still active on the penalty creation date. Consider
PENALTY
id   transaction_id   date_created
1    10               2016-02-01

TRANSACTION
id   listing_id
10   100

LISTING_HISTORY
listing_id   status   date_created
100          0        2016-01-01
100          1        2016-01-15
The joins would create a single record, and the count for penalty 1 would include listing 100 even though its status had changed to something other than 0 before the penalty was created.
This is hard - but not impossible - to fix with your existing table structure. You could add a NOT EXISTS condition looking for another LISTING_HISTORY record matching the ID with a date between the first LISTING_HISTORY date and the PENALTY date, for one.
It would be more efficient to add an end date to the LISTING_HISTORY table, but that may not be so easy depending on how the data is maintained.
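For illustration, here is a sketch of that effective-dated structure; the date_ended column and the exact query shape are assumptions for demonstration, not the poster's schema:
-- Hypothetical: give each history record an end date (NULL = still current)
ALTER TABLE listing_history ADD COLUMN date_ended DATE NULL;

-- "Active at penalty time" then becomes a simple range check
SELECT penalty.id AS penalty_id
     , COUNT(DISTINCT listing_history.id) AS active_listings
FROM penalty
JOIN transaction
  ON transaction.id = penalty.transaction_id
JOIN listing_history
  ON listing_history.listing_id = transaction.listing_id
WHERE listing_history.status = 0
  AND listing_history.date_created < penalty.date_created
  AND (listing_history.date_ended IS NULL
       OR listing_history.date_ended >= penalty.date_created)
GROUP BY penalty.id;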
The second potential issue is the COUNT(RECORD_ID). This may not do what you mean - what COUNT(x) may intuitively seem like it should do, is what COUNT(DISTINCT RECORD_ID) actually does. As written, if the join produces two matches with the same LISTING_HISTORY.ID value - i.e. the listing became active at two different times before the penalty - the listing would be counted twice.