Help me build a SQL select statement - sql

SQL isn't my greatest strength and I need some help building a select statement.
Basically, this is my requirement. The table stores a list of names and a timestamp of when the name was entered in the table. Names may be entered multiple times during a week, but only once a day.
I want the select query to return names that were entered anytime in the past 7 days, but not today.
To get a list of names entered today, this is the statement I have:
Select * from table where Date(timestamp) = Date(now())
And to get a list of names entered in the past 7 days, not including today:
Select * from table where (Date(now())- Date(timestamp) < 7) and (date(timestamp) != date(now()))
If the first query returns a set of results, say A, and the second query returns B, how can I get
B-A

Try this if you're working with SQL Server:
SELECT * FROM Table
WHERE Timestamp BETWEEN
dateadd(day,datediff(day,0,getdate()),-7)
AND dateadd(day,datediff(day,0,getdate()),0)
This ensures that the timestamp is between 00:00 7 days ago, and 00:00 today. Today's entries with time greater than 00:00 will not be included.

In plain English, you want records from your second query where the name is not in your first query. In SQL:
Select *
from table
where (Date(now())- Date(timestamp) < 7)
and (date(timestamp) != date(now()))
and name not in (Select name
from table
where Date(timestamp) = Date(now())
)
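A minimal sketch of the NOT IN approach above, run against SQLite through Python's sqlite3 module so that it is self-contained. The table name `entries`, its columns, and the sample rows are assumptions made for illustration, and "today" is passed in as a parameter instead of using now() so the result is reproducible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (name TEXT, ts TEXT)")
conn.executemany(
    "INSERT INTO entries VALUES (?, ?)",
    [
        ("alice", "2024-03-10 09:00:00"),  # entered today
        ("alice", "2024-03-08 12:00:00"),  # also entered earlier this week
        ("bob",   "2024-03-07 08:30:00"),  # entered this week, not today
        ("carol", "2024-03-01 10:00:00"),  # older than 7 days
    ],
)

today = "2024-03-10"  # assumed "current date" for the example

query = """
SELECT DISTINCT name
FROM entries
WHERE date(ts) > date(:today, '-7 days')   -- within the past 7 days
  AND date(ts) < date(:today)              -- but not today
  AND name NOT IN (SELECT name
                   FROM entries
                   WHERE date(ts) = date(:today))
ORDER BY name
"""
names = [row[0] for row in conn.execute(query, {"today": today})]
print(names)
```

With this sample data only bob is returned: alice is excluded by the NOT IN subquery, and carol falls outside the 7-day window.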

Use NOT IN, along these lines:
select pk from B where pk not in (select pk from A)
Or you can do something like
Select * from table where (Date(now())- Date(timestamp) < 7) and (Date(now())- Date(timestamp) >= 1)

Related

Unable to divide the counts of two separate lists in SQL, keeps returning 1

I have one list of events. One event name is creating an account and another is creating an account with Facebook. I am trying to see what percentage of accounts created use Facebook.
The code below will give me an accurate count of the number of facebook accounts and total accounts, but when I try to divide the two numbers it just gives me the number 1.
I am very new to SQL, and have spent hours trying to figure out why it is doing that to no avail.
with
fb_act as (
select *
from raw_event
where name = 'onboard_fb_success'
and event_ts::date >= current_date - 30
),
total_act as (
select *
from raw_event
where name ='create_account'
and event_ts::date >= current_date - 30
)
select count(fb_act)/count(total_act), total_act.event_ts::date as day
from total_act, fb_act
group by day
order by day
I expect the output to be about ~.3, but the actual output is always exactly 1.
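Conditional aggregation is a much simpler way to write the query. Your version returns 1 because "from total_act, fb_act" is a cross join: count(fb_act) and count(total_act) both count the same joined rows, so the two counts are always equal. You appear to be using Postgres, so something like this:
select re.event_ts::date as day,
-- cast to numeric to avoid integer division; nullif avoids division by zero
(sum((re.name = 'onboard_fb_success')::int)::numeric /
nullif(sum((re.name = 'create_account')::int), 0)
) as ratio
from raw_event re
where re.event_ts::date >= current_date - 30
group by re.event_ts::date
order by day;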
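To see conditional aggregation at work on a small data set, here is a sketch that uses SQLite through Python's sqlite3 module. It is not Postgres: SQLite evaluates a comparison to 0 or 1, so no `::int` cast is needed, and `date(event_ts)` stands in for `event_ts::date`. The sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_event (name TEXT, event_ts TEXT)")
conn.executemany(
    "INSERT INTO raw_event VALUES (?, ?)",
    [
        ("create_account",     "2024-03-01 08:00:00"),
        ("create_account",     "2024-03-01 09:00:00"),
        ("create_account",     "2024-03-01 10:00:00"),
        ("create_account",     "2024-03-01 11:00:00"),
        ("onboard_fb_success", "2024-03-01 09:00:05"),
        ("create_account",     "2024-03-02 08:00:00"),
        ("create_account",     "2024-03-02 09:00:00"),
        ("onboard_fb_success", "2024-03-02 08:00:05"),
    ],
)

query = """
SELECT date(event_ts) AS day,
       -- multiply by 1.0 so the division is not integer division
       1.0 * SUM(name = 'onboard_fb_success')
           / NULLIF(SUM(name = 'create_account'), 0) AS ratio
FROM raw_event
GROUP BY date(event_ts)
ORDER BY day
"""
rows = conn.execute(query).fetchall()
for day, ratio in rows:
    print(day, ratio)
```

Each event row is read once, and both counts come from the same group, so there is no join between the two event types that could inflate either count.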

sql query to get today new records compared with yesterday

I have this table:
COD (Integer) (PK)
ID (Varchar)
DATE (Date)
I just want to get the new IDs from today, compared with yesterday (the IDs from today that are not present yesterday).
This needs to be done with just one query and maximum efficiency, because the table will have 4-5 million records.
As a Java developer I am able to do this with 2 queries, but doing it with just one is beyond my knowledge, so any help would be much appreciated.
EDIT: the date format is dd/mm/yyyy, and each ID may appear 0 or 1 times per day.
Here is a solution that will go over the base data one time only. It selects the id and the date where the date is either yesterday or today (or both). Then it GROUPS BY id - each group will have either one or two rows. Then it filters by the condition that the MIN date in the group is "today". Those are the id's that exist today but did not exist yesterday.
DATE is an Oracle keyword, best not used as a column name. I changed that to DT. I also assume that your "dt" field is a pure date (as pure as it can be in Oracle, meaning: time of day, which is always present, is 00:00:00).
select id
from your_table
where dt in (trunc(sysdate), trunc(sysdate) - 1)
group by id
having min(dt) = trunc(sysdate)
;
Edit: Gordon makes a good point: perhaps you may have more than one such row per ID, in the same day? In that case the time-of-day may also be different from 00:00:00.
If so, the solution can be adapted:
select id
from your_table
where dt >= trunc(sysdate) - 1 and dt < trunc(sysdate) + 1
group by id
having min(dt) >= trunc(sysdate)
;
Either way: (1) the base table is read just once; (2) the column DT is not wrapped within any function, so if there is an index on that column, it can be used to access just the needed rows.
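The GROUP BY / HAVING MIN idea can be tried on a few rows with the sketch below. It uses SQLite through Python's sqlite3 module instead of Oracle, so `date(:today)` and `date(:today, '-1 day')` stand in for `trunc(sysdate)` and `trunc(sysdate) - 1`. The sample data and the fixed value of "today" are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE your_table (cod INTEGER PRIMARY KEY, id TEXT, dt TEXT)"
)
conn.executemany(
    "INSERT INTO your_table (id, dt) VALUES (?, ?)",
    [
        ("A", "2024-03-09"),  # present yesterday and today: not new
        ("A", "2024-03-10"),
        ("B", "2024-03-10"),  # present today only: new
        ("C", "2024-03-09"),  # present yesterday only
        ("D", "2024-03-05"),  # older row is ignored by the WHERE clause
        ("D", "2024-03-10"),  # so D counts as new compared with yesterday
    ],
)

today = "2024-03-10"  # assumed "current date" for the example

query = """
SELECT id
FROM your_table
WHERE dt IN (date(:today), date(:today, '-1 day'))
GROUP BY id
HAVING MIN(dt) = date(:today)
ORDER BY id
"""
new_ids = [row[0] for row in conn.execute(query, {"today": today})]
print(new_ids)
```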
The typical method would use not exists:
select t.*
from t
where t.date >= trunc(sysdate) and t.date < trunc(sysdate + 1) and
not exists (select 1
from t t2
where t2.id = t.id and
t2.date >= trunc(sysdate - 1) and t2.date < trunc(sysdate)
);
This is a general solution. If you know that there is at most one record per day, there are better solutions, such as using lag().
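A small sketch of the NOT EXISTS variant, again using SQLite through Python's sqlite3 module as a stand-in for Oracle. The column is named `dt` here and holds a time of day as well, to show that the half-open range comparisons cope with that. Table contents and the fixed "today" are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (cod INTEGER PRIMARY KEY, id TEXT, dt TEXT)")
conn.executemany(
    "INSERT INTO t (id, dt) VALUES (?, ?)",
    [
        ("A", "2024-03-09 23:10:00"),  # A was seen yesterday
        ("A", "2024-03-10 07:45:00"),
        ("B", "2024-03-10 08:00:00"),  # B appears twice today, never yesterday
        ("B", "2024-03-10 17:30:00"),
        ("C", "2024-03-09 12:00:00"),  # C was seen yesterday only
    ],
)

today = "2024-03-10"  # assumed "current date" for the example

query = """
SELECT t.id, t.dt
FROM t
WHERE t.dt >= date(:today) AND t.dt < date(:today, '+1 day')
  AND NOT EXISTS (SELECT 1
                  FROM t t2
                  WHERE t2.id = t.id
                    AND t2.dt >= date(:today, '-1 day')
                    AND t2.dt < date(:today))
ORDER BY t.dt
"""
rows = conn.execute(query, {"today": today}).fetchall()
print(rows)
```

Unlike the GROUP BY version, this returns every row from today for an ID that was absent yesterday, so B shows up twice.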
Use MINUS. I suppose your date column has a time part, so you need to truncate it.
select id from mytable where trunc(date) = trunc(sysdate)
minus
select id from mytable where trunc(date) = trunc(sysdate) - 1;
I suggest the following function-based index. Without it, the query would have to full scan the table, which would probably be quite slow.
create index idx on mytable( trunc(date) , id );

SELECT statement optimization

I'm no expert in SQL queries, but I'm not a complete newbie either.
I'm exporting data from an MS SQL database to an Excel file using a SQL query.
I'm exporting many columns, and two of these columns contain a date and an hour; these are the columns I use in the WHERE clause.
In detail, I have about 200 rows for each day, each with a different hour, over many days. I need to extract the first value after 15:00 of each day, for several days.
Since the hours are different for each day, I can't specify something like
SELECT a,b,hour,day FROM table WHERE hour='15:01'
because sometimes the value is at 15:01, sometimes at 15:03, and so on (I'm looking for the closest value after 15:00). To fix this I used this workaround:
SELECT TOP 1 a,b,hour,day FROM table WHERE hour > "15:00"
This way I can take the first value after 15:00 for one day. The problem is that I need this for several days, for a user-specified interval of days. At the moment I handle this with a UNION ALL statement, like this:
SELECT TOP 1 a,b,hour,day FROM table WHERE data="first_day" AND hour > "15:00"
UNION ALL SELECT TOP 1 a,b,hour,day FROM table WHERE data="second_day" AND hour > "15:00"
UNION ALL SELECT TOP 1 a,b,hour,day FROM table WHERE data="third_day" AND hour > "15:00"
...and so on for all the days (I build the SQL string with a loop over each day in the specified interval).
Until now this has worked, but now I need to expand the day interval (currently a maximum of one week, so 5 days) to up to 60 days. I don't want to build a huge query string, but I can't think of an alternative way to write the SQL.
Any help appreciated
Ettore
A typical solution for this uses row_number():
SELECT a, b, hour, day
FROM (SELECT t.*,
ROW_NUMBER() OVER (PARTITION BY day ORDER BY hour) as seqnum
FROM table t
WHERE hour > '15:00'
) t
WHERE seqnum = 1;
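The row_number() query can be checked on sample data with the sketch below, which uses SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or later). The table name `readings`, the sample rows, and the added `BETWEEN` filter for a user-specified day range are assumptions, not part of the original answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (a INTEGER, b INTEGER, hour TEXT, day TEXT)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?, ?)",
    [
        (1, 10, "14:58", "2024-03-04"),
        (2, 20, "15:03", "2024-03-04"),  # first row after 15:00 on 03-04
        (3, 30, "15:20", "2024-03-04"),
        (4, 40, "15:01", "2024-03-05"),  # first row after 15:00 on 03-05
        (5, 50, "16:00", "2024-03-05"),
        (6, 60, "15:02", "2024-03-08"),  # outside the requested day range
    ],
)

query = """
SELECT a, b, hour, day
FROM (SELECT r.*,
             ROW_NUMBER() OVER (PARTITION BY day ORDER BY hour) AS seqnum
      FROM readings r
      WHERE hour > '15:00'
        AND day BETWEEN :first_day AND :last_day
     ) AS ranked
WHERE seqnum = 1
ORDER BY day
"""
params = {"first_day": "2024-03-04", "last_day": "2024-03-05"}
rows = conn.execute(query, params).fetchall()
print(rows)
```

Widening the interval to 60 days only changes the two parameters, so the query string stays the same size.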

generate each minute string for a day within specified time limit

My aim is to generate per minute count of all records existing in a table like this.
SELECT
COUNT(*) as RECORD_COUNT,
to_Char(MY_DATE,'HH24:MI') MINUTE_GAP
FROM
TABLE_A
WHERE
BLAH='Blah! Blah!!'
GROUP BY
to_Char(MY_DATE,'HH24:MI')
However, This query doesn't give me the minutes where there were no results.
To get the desired result, I'm using the following query to fill the gaps in the original query by doing a JOIN between these two results.
SELECT
*
FROM
( SELECT
TO_CHAR(TRUNC(SYSDATE)+( (ROWNUM-1) /1440) ,'HH24:MI') as MINUTE_GAP,
0 as COUNT
FROM
SOME_LARGE_TABLE_B
WHERE
rownum<=1440
)
WHERE
minute_gap>'07:00' /*I want only the data starting from 7:00AM*/
This works for me, but I can't rely on SOME_LARGE_TABLE_B to generate the minutes, because it might have no records at some point in the future. The query also doesn't look like a professional solution.
Is there any easier way to do this?
NOTE:I don't want any new tables created with static values for all the minutes just for one query.
Just generate your timestamps and left join your grouped data to it:
SELECT MINUTE, ....
FROM (
SELECT TO_CHAR(TO_DATE((LEVEL + 419) * 60, 'SSSSS'), 'HH24:MI') MINUTE /* 07:00 - 23:59 */ FROM DUAL CONNECT BY LEVEL <= 1020)
LEFT JOIN (
<your grouped subquery>
) ON MINUTE = MINUTE_GAP
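The same generate-then-left-join pattern can be sketched without Oracle's CONNECT BY. The example below uses SQLite through Python's sqlite3 module, where a recursive CTE plays the role of the row generator. The lowercase table and column names and the three sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_a (my_date TEXT)")
conn.executemany(
    "INSERT INTO table_a VALUES (?)",
    [("2024-03-10 07:00:15",), ("2024-03-10 07:00:40",), ("2024-03-10 07:02:05",)],
)

query = """
WITH RECURSIVE minutes(n) AS (
    SELECT 420                                  -- 07:00 as minutes since midnight
    UNION ALL
    SELECT n + 1 FROM minutes WHERE n < 1439    -- stop at 23:59
),
counted AS (
    SELECT strftime('%H:%M', my_date) AS minute_gap,
           COUNT(*) AS record_count
    FROM table_a
    GROUP BY strftime('%H:%M', my_date)
)
SELECT printf('%02d:%02d', m.n / 60, m.n % 60) AS minute_label,
       COALESCE(c.record_count, 0) AS record_count
FROM minutes m
LEFT JOIN counted c
       ON c.minute_gap = printf('%02d:%02d', m.n / 60, m.n % 60)
ORDER BY m.n
"""
rows = conn.execute(query).fetchall()
print(len(rows), rows[:3])
```

Every minute from 07:00 to 23:59 gets a row, and minutes with no matching records show a count of 0 instead of being missing.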

SQL performance when querying for a time interval

I have one table of tickets containing three relevant columns: id, start and finish where start and finish are timestamps.
I have a second table (intervals) with only one relevant column, time_point, which is also a timestamp. There is one time_point every 15 minutes. That is, the content of this second table is:
8:00
8:15
8:30
...
The first table (ticket) has 4 million records. The second has only 96 records (24 * 4).
I have to select how many tickets are open at each time_point.
I wrote the following query: (simplified version)
select *
from interval, ticket
where (finish is null or finish > time_point)
and start < time_point
which works but is too slow. The problem is that there is no real join between the two tables, and I presume that a full table scan is performed for every row.
How can I get better performance here?
Thanks!
EDIT: This is an Oracle DB.
I believe you don't need to cross join or create an interval table.
Instead, try the following:
select count(*), tsd from (
select
/****************************************************************
Now
1- bring your finish column into the format you need: HH24:MI
2- truncate its content down to the interval the row belongs to
****************************************************************/
to_char(dt,'HH24')|| decode(trunc(to_char(dt,'MI')/15) * 15,0,'00',trunc(to_char(dt,'MI')/15)*15)
tsd
from (
select nvl( finish ,to_date('31.12.2999', 'dd.mm.yyyy')) dt --
from tickets
/****************************************************************
Now filter out your tickets (before truncating): to find the relevant
tickets for your period, use a parameter date and compare it to the
start and end columns nvl( finish ,to_date('31.12.2999', 'dd.mm.yyyy'))
****************************************************************/
where P_YOUR_PARAM_DATE between start
and nvl( finish ,to_date('31.12.2999', 'dd.mm.yyyy'))
) dat
) group by tsd order by tsd ;
One way to speed this up is to include the finish column in a composite index so there's no need to read from the table to fetch that value:
create index IX_Tickets on Tickets(start,finish)
P.S. Drop any simple index on Tickets.start as well.
P.P.S. Please clarify: 8:00, 8:15 in your intervals table are not timestamp data type. Did you eliminate the date element in your question for the sake of simplicity?
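For reference, here is a small sketch of the count the question describes, with a composite index on (start, finish) in place. It runs on SQLite through Python's sqlite3 module, not Oracle. The index name, the sample tickets, and the full date-plus-time values in `intervals` are assumptions. Whether the index actually helps on the real 4-million-row table would need to be checked against the Oracle execution plan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ticket (id INTEGER PRIMARY KEY, start TEXT, finish TEXT)")
conn.execute("CREATE TABLE intervals (time_point TEXT)")
# Composite index so start and finish can both be read from the index.
conn.execute("CREATE INDEX ix_ticket_start_finish ON ticket (start, finish)")

conn.executemany(
    "INSERT INTO ticket (start, finish) VALUES (?, ?)",
    [
        ("2024-03-10 07:50", "2024-03-10 08:10"),
        ("2024-03-10 08:05", "2024-03-10 08:40"),
        ("2024-03-10 08:20", None),  # still open
    ],
)
conn.executemany(
    "INSERT INTO intervals VALUES (?)",
    [("2024-03-10 08:00",), ("2024-03-10 08:15",),
     ("2024-03-10 08:30",), ("2024-03-10 08:45",)],
)

query = """
SELECT i.time_point, COUNT(t.id) AS open_tickets
FROM intervals i
LEFT JOIN ticket t
       ON t.start < i.time_point
      AND (t.finish IS NULL OR t.finish > i.time_point)
GROUP BY i.time_point
ORDER BY i.time_point
"""
rows = conn.execute(query).fetchall()
for time_point, open_tickets in rows:
    print(time_point, open_tickets)
```

Returning one aggregated row per time_point, rather than `select *` over every matching pair, also keeps the result set down to 96 rows on the real data.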