Having trouble getting the correct dates from a table - SQL

I currently have this query with the date range set to only the 24th of February. Unfortunately my results have dates from all over the place.
Any direction as to why this is happening?
P.S. "TIMESTAMP" is the name of my column (I can't change this).
SELECT
  DATE(TIMESTAMP) AS dateofEvent,
  eventType,
  parameters.name,
  parameters.value,
  membershipStatus,
  gender,
  COUNT(*)
FROM (
  SELECT *
  FROM TABLE_DATE_RANGE(myFirstTable,
                        TIMESTAMP('2016-02-24 00:00:00'),
                        TIMESTAMP('2016-02-24 23:59:59'))
) AS EV
JOIN mySecondTable AS UD
  ON UD.userId = EV.userId
WHERE eventType = 'SettingUpdated'
  AND (UPPER(parameters.value) = 'TRUE' OR UPPER(parameters.value) = 'FALSE')
GROUP BY
  dateofEvent,
  eventType,
  gender,
  parameters.value,
  parameters.name,
  membershipStatus

TABLE_DATE_RANGE filters table names; it does nothing with field values.
So if you want only rows in a specific time range, you have to make sure the field values correspond to the table names. In this specific case, you need to make sure that only rows with a TIMESTAMP value between '2016-02-24 00:00:00' and '2016-02-24 23:59:59' get added to the table myFirstTable20160224. You likely have this violated, and the table myFirstTable20160224 contains TIMESTAMP values from all over the place.

When you use the query below
SELECT *
FROM TABLE_DATE_RANGE(myFirstTable,
    TIMESTAMP('2016-02-24 00:00:00'),
    TIMESTAMP('2016-02-24 23:59:59')) AS EV
the FROM statement gets "translated", and your query is executed as if it were written like below (see more in Table wildcard functions):
SELECT *
FROM myFirstTable20160224
So the fact that you are getting dates all over the place tells me that your TIMESTAMP field is not limited to 2016-02-24 but rather is all over the place, despite the table name.
If you need to filter the output by date, you can add something like
WHERE DATE(TIMESTAMP) = '2016-02-24'
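For example, here's a trimmed-down sketch of the question's query with that filter added (untested; the column list is shortened for brevity):
SELECT
  DATE(TIMESTAMP) AS dateofEvent,
  eventType,
  COUNT(*)
FROM (
  SELECT *
  FROM TABLE_DATE_RANGE(myFirstTable,
                        TIMESTAMP('2016-02-24 00:00:00'),
                        TIMESTAMP('2016-02-24 23:59:59'))
) AS EV
JOIN mySecondTable AS UD
  ON UD.userId = EV.userId
WHERE eventType = 'SettingUpdated'
  AND DATE(TIMESTAMP) = '2016-02-24'  -- the extra filter
GROUP BY dateofEvent, eventType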

Related

Postgres: Unable to determine percent of successful events ending in a completed trip

SQL gurus,
I'm trying to solve this challenging problem as I'm practicing my SQL skills; however, I'm stuck and would appreciate it if someone could help.
A signup is defined as an event labelled 'sign_up_success' within the events table. For each city ('A' and 'B') and each day of the week, determine the percentage of signups in the first week of 2016 that resulted in a completed trip within 10 hours of the sign-up date.
Table Name: trips
Column Name     Datatype
id              integer
client_id       integer (foreign keyed to events.rider_id)
driver_id       integer
city_id         integer (foreign keyed to cities.city_id)
client_rating   integer
driver_rating   integer
request_at      timestamp with timezone
predicted_eta   integer
actual_eta      integer
status          enum('completed', 'cancelled_by_driver', 'cancelled_by_client')

Table Name: cities
Column Name     Datatype
city_id         integer
city_name       string

Table Name: events
Column Name     Datatype
device_id       integer
rider_id        integer
city_id         integer
event_name      enum('sign_up_success', 'attempted_sign_up', 'sign_up_failure')
_ts             timestamp with timezone
Tried something along these lines, however it's nowhere near the expected answer:
SELECT *
FROM trips
LEFT JOIN cities ON trips.city_id = cities.city_id
LEFT JOIN events ON trips.client_id = events.rider_id
WHERE events.event_name = 'sign_up_success'
  AND CONVERT(datetime, trips.request_at) <= CONVERT(datetime, '2016-01-07')
  AND DATEDIFF(d, CONVERT(datetime, events._ts), CONVERT(datetime, trips.request_at)) < 7
  AND trips.status = 'completed'
Desired results look like below:
Monday   A  x%
Monday   B  y%
Tuesday  A  z%
Tuesday  B  p%
Can someone please help?
First of all, I assume that "trips"."city_id" is mandatory, so I use an INNER JOIN instead of a LEFT JOIN when joining with cities.
Then, to specify string constants, you need to use single quotes.
There are some other changes in the query -- hope you'll notice them yourself.
Also, the query might fail, since I didn't actually run it (you didn't provide boilerplate SQL, unfortunately).
The date_trunc() function with 'week' as its first parameter converts your timestamp to "the first day of the corresponding week, time 00:00:00", based on your current timezone settings (see https://www.postgresql.org/docs/current/static/functions-datetime.html).
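For example (a quick illustration; 2016-01-04 was the Monday of that week):
select date_trunc('week', timestamp '2016-01-06 14:30:00');
-- 2016-01-04 00:00:00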
I used GROUP BY on that value, and the second "layer" of grouping was the city ID.
Also, I used "filter (where ...)" next to count() -- it lets you count only the desired rows.
Finally, I used a CTE to improve the query's structure and readability.
Let me know if it fails and I'll fix it. In general, this approach should work.
with data as (
    select
        left(date_trunc('week', t.request_at)::text, 10) as period,
        c.city_id,
        count(distinct t.id) as trips_count,
        count(*) filter (
            where
                e.event_name = 'sign_up_success'
                and e._ts < t.request_at + interval '10 hour'
        ) as successes_count
    from trips as t
    join cities as c on t.city_id = c.city_id
    left join events as e on t.client_id = e.rider_id
        and e._ts <= t.request_at  -- the original condition was truncated; this assumes the sign-up precedes the trip request
    where
        t.request_at between '2016-01-01' and '2016-01-08'
    group by 1, 2
)
select
    *,
    round(100 * successes_count::numeric / trips_count, 2)::text || '%' as ratio_percent
from data
order by period, city_id
;
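If you need the day-of-week labels from the desired output rather than week-start dates, one possible tweak (untested) is to swap the period expression for Postgres's day-name formatting, e.g. to_char(t.request_at, 'Dy') as period:
select to_char(timestamp '2016-01-04 12:00:00', 'Dy');
-- Mon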

Other efficient way to write multiple select

I have the following query:
select * from
    (select volume as vol1 from table1 where code = 'A1' and daytime = '12-may-2012') a,
    (select volume as vol2 from table2 where code = 'A2' and daytime = '12-may-2012') b,
    (select volume as vol3 from table3 where code = 'A3' and daytime = '12-may-2012') c
result:
vol1  vol2  vol3
20    45
What would be another, more efficient way to write this query (in the real case it could be up to 15 subqueries), given that data does not always exist in all of these tables for the selected date? I think it could be a join, but I'm not sure.
thanks,
S
If the concern is that data might not exist, then cross join is not the right operator. If any subquery returns zero rows, then you will get an empty result set.
Assuming at most one row is returned per query, just use subqueries in the select:
select (select volume from table1 where code = 'A1' and daytime = date '2012-05-12') as vol1,
       (select volume from table2 where code = 'A2' and daytime = date '2012-05-12') as vol2,
       (select volume from table3 where code = 'A3' and daytime = date '2012-05-12') as vol3
from dual;
If a value is missing, it will be NULL. If a subquery returns more than one row, then you'll get an error.
I much prefer ANSI standard formats, which is why I use the date keyword.
I am highly suspicious of comparing a field called daytime to a date constant with no time component. I would double check the logic on this. Perhaps you intend trunc(daytime) = date '2012-05-12' or something similar.
I should also note that if performance is an issue, then you want an index on each table on (code, daytime, volume).
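For example (a sketch; the index name is made up, and you would create one such index per table):
create index table1_code_daytime_volume on table1 (code, daytime, volume);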

SQL merging result sets on a unique column value

I have 2 similar queries which both work on the same table, and I essentially want to combine their results such that the second query supplies default values for what the first query doesn't return. I've simplified the problem as much as possible here. I'm using Oracle btw.
The table has account information in it for a number of accounts, and there are multiple entries for each account, with a commit_date to tell when the account information was inserted. I need to get the account info which was current for a certain date.
The queries take a list of account ids and a date.
Here is the query:
-- Select the row which was current for the accounts for the given date.
-- (won't return anything for an account which didn't exist on the given date)
SELECT actr.*
FROM Account_Information actr
WHERE actr.account_id in (30000316, 30000350, 30000351)
AND actr.commit_date <= to_date('2010-DEC-30', 'YYYY-MON-DD')
AND actr.commit_date =
    (
        SELECT MAX(actrInner.commit_date)
        FROM Account_Information actrInner
        WHERE actrInner.account_id = actr.account_id
        AND actrInner.commit_date <= to_date('2010-DEC-30', 'YYYY-MON-DD')
    )
This looks a little ugly, but it returns a single row for each account which was current for the given date. The problem is that it doesn't return anything if the account didn't exist until after the given date.
Selecting the earliest account info for each account is trivial - I don't need to supply a date for this one:
-- Select the earliest row for the accounts.
SELECT actr.*
FROM Account_Information actr
WHERE actr.account_id in (30000316, 30000350, 30000351)
AND actr.commit_date =
    (
        SELECT MAX(actrInner.commit_date)
        FROM Account_Information actrInner
        WHERE actrInner.account_id = actr.account_id
    )
But I want to merge the result sets in such a way that:
For each account, if there is account info for it in the first result set - use that.
Otherwise, use the account info from the second result set.
I've researched all of the joins I can use, without success. Unions almost do it, but they only merge unique rows. I want to merge based on the account id in each row.
Sql Merging two result sets - my case is obviously more complicated than that
SQL to return a merged set of results - I might be able to adapt that technique? I'm a programmer being forced to write SQL and I can't quite follow that example well enough to see how I could modify it for what I need.
The standard way to do this is with a left outer join and coalesce. That is, your overall query will look like this:
SELECT ...
FROM defaultQuery
LEFT OUTER JOIN currentQuery ON ...
If you did a SELECT *, each row would correspond to the current account data plus your defaults. With me so far?
Now, instead of SELECT *, for each column you want to return, you do a COALESCE() on matched pairs of columns:
SELECT COALESCE(currentQuery.columnA, defaultQuery.columnA) ...
This will choose the current account data if present, otherwise it will choose the default data.
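Concretely, here's a minimal sketch using the question's table and IDs (untested; it returns only account_id and commit_date, which you could join back to Account_Information for the remaining columns):
SELECT def.account_id,
       COALESCE(cur.commit_date, def.commit_date) AS commit_date
FROM (
    -- default: latest commit_date per account, no date restriction
    SELECT account_id, MAX(commit_date) AS commit_date
    FROM Account_Information
    WHERE account_id IN (30000316, 30000350, 30000351)
    GROUP BY account_id
) def
LEFT OUTER JOIN (
    -- current: latest commit_date per account on or before the given date
    SELECT account_id, MAX(commit_date) AS commit_date
    FROM Account_Information
    WHERE account_id IN (30000316, 30000350, 30000351)
      AND commit_date <= to_date('2010-DEC-30', 'YYYY-MON-DD')
    GROUP BY account_id
) cur ON cur.account_id = def.account_id;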
You can do this more directly using analytic functions:
select *
from (SELECT actr.*,
             max(commit_date) over (partition by account_id) as maxCommitDate,
             max(case when commit_date <= to_date('2010-DEC-30', 'YYYY-MON-DD')
                      then commit_date
                 end) over (partition by account_id) as maxCommitDate2
      FROM Account_Information actr
      WHERE actr.account_id in (30000316, 30000350, 30000351)
     ) t
where (maxCommitDate2 is not null and commit_date = maxCommitDate2) or
      (maxCommitDate2 is null and commit_date = maxCommitDate)
The subquery calculates two values: the two possible commit dates. The where clause then chooses the appropriate row, using the logic that you want.
I've combined the other answers. Tried it out at apex.oracle.com. Here's some explanation.
MAX(CASE WHEN commit_date <= to_date('2010-DEC-30', 'YYYY-MON-DD') THEN commit_date END) will give us the latest date on or before Dec 30th, or NULL if there isn't one. Combining that with a COALESCE, we get
COALESCE(MAX(CASE WHEN commit_date <= to_date('2010-DEC-30', 'YYYY-MON-DD') THEN commit_date END), MAX(commit_date)).
Now we take the account id and commit date we have and join them with the original table to get all the other fields. Here's the whole query that I came up with:
SELECT *
FROM Account_Information
JOIN (SELECT account_id,
COALESCE(MAX(CASE WHEN commit_date <=
to_date('2010-DEC-30', 'YYYY-MON-DD')
THEN commit_date END),
MAX(commit_date)) AS commit_date
FROM Account_Information
WHERE account_id in (30000316, 30000350, 30000351)
GROUP BY account_id)
USING (account_id, commit_date);
Note that if you do use USING, you have to use * instead of actr.* (Oracle doesn't let you qualify the join columns with a table alias).

How to group by a date column by month

I have a table with a date column where date is stored in this format:
2012-08-01 16:39:17.601455+0530
How do I group or group_and_count on this column by month?
Your biggest problem is that SQLite won't directly recognize your dates as dates.
CREATE TABLE YOURTABLE (DateColumn date);
INSERT INTO "YOURTABLE" VALUES('2012-01-01');
INSERT INTO "YOURTABLE" VALUES('2012-08-01 16:39:17.601455+0530');
If you try to use strftime() to get the month . . .
sqlite> select strftime('%m', DateColumn) from yourtable;
01
. . . it picks up the month from the first row, but not from the second.
If you can reformat your existing data as valid timestamps (as far as SQLite is concerned), you can use this relatively simple query to group by year and month. (You almost certainly don't want to group by month alone.)
select strftime('%Y-%m', DateColumn) yr_mon, count(*) num_dates
from yourtable
group by yr_mon;
If you can't do that, you'll need to do some string parsing. Here's the simplest expression of this idea.
select substr(DateColumn, 1, 7) yr_mon, count(*) num_dates
from yourtable
group by yr_mon;
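For the sample value from the question, that substring gives:
select substr('2012-08-01 16:39:17.601455+0530', 1, 7);
-- 2012-08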
But that might not quite work for you. Since you have timezone information, it's sure to change the month for some values. To get a fully general solution, I think you'll need to correct for timezone, extract the year and month, and so on. The simpler approach would be to look hard at this data, declare "I'm not interested in accounting for those edge cases", and use the simpler query immediately above.
It took me a while to find the correct expression using Sequel. What I did was this:
Assuming a table like:
CREATE TABLE acct (date_time datetime, reward integer)
Then you can access the aggregated data as follows:
ds = DS[:acct]
ds.select_group(Sequel.function(:strftime, '%Y-%m', :date_time))
  .select_append { sum(:reward) }
  .each do |row|
    p row
  end
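For reference, the SQL this should generate looks roughly like the following (an assumption based on Sequel's select_group/select_append semantics, not captured from a live run):
SELECT strftime('%Y-%m', date_time), sum(reward)
FROM acct
GROUP BY strftime('%Y-%m', date_time)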

SELECT with MAX and SUM from multiple tables

I have 3 tables:
weather_data (hourly_date, rain)
weather_data_calculated (hourly_date, calc_value)
weather_data_daily (daily_date, daily_value)
I would like to get a list of DAILY values from these 3 tables using this select:
SELECT daily_date, daily_value, SUM(rain), MAX(calc_value)
The SUM and the MAX need to be computed over all the hours of the day.
This is what I did:
SELECT
    date_format(convert_tz(daily_date, 'GMT', 'America/Los_Angeles'), '%Y-%m-%d 00:00:00') as daily_date_gmt,
    daily_value,
    SUM(rain),
    MAX(calc_value)
FROM weather_data_daily wdd, weather_data wd, weather_data_calculated wdc
WHERE daily_date_gmt = date_format(convert_tz(wd.hourly_date, 'GMT', 'America/Los_Angeles'), '%Y-%m-%d 00:00:00')
  AND daily_date_gmt = date_format(convert_tz(wdc.hourly_date, 'GMT', 'America/Los_Angeles'), '%Y-%m-%d 00:00:00')
GROUP BY daily_date_gmt
ORDER BY daily_date_gmt;
This didn't work because I don't know how to deal with the GROUP BY in this case.
I also tried using a temporary table, but without success.
Thanks for your help!
Either include daily_value in your GROUP BY, or use two queries: one containing the date column and the two aggregates, the other containing the date column and the daily value. You can then use a single outer query to join these result sets on the date column.
EDIT: You say in your comment that including daily_value in the GROUP BY means the query doesn't complete. This is (probably) because you have no join criteria between all the tables your query includes. This results in a potentially VERY large result set which would take a very long time. I don't mind helping with the actual SQL, but you will need to update your question so that we can see which fields are coming from which tables.
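A sketch of that two-query approach (untested MySQL; it assumes one row per hour in each hourly table and skips the timezone conversion from the question for brevity):
SELECT d.daily_date, d.daily_value, h.total_rain, h.max_calc
FROM weather_data_daily d
JOIN (
    -- the date column plus the two aggregates, one row per day
    SELECT DATE(wd.hourly_date) AS day,
           SUM(wd.rain)         AS total_rain,
           MAX(wdc.calc_value)  AS max_calc
    FROM weather_data wd
    JOIN weather_data_calculated wdc ON wdc.hourly_date = wd.hourly_date
    GROUP BY DATE(wd.hourly_date)
) h ON DATE(d.daily_date) = h.day;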
Assuming you only have one entry per (daily_date, daily_value) in weather_data_daily, you should GROUP BY daily_date, daily_value; then your aggregations (SUM and MAX) will operate on the correct grouping.
try this:
select a.daily_date, a.daily_value, SUM(b.rain), MAX(c.calc_value)
from weather_data_daily a
join weather_data b
  on convert(varchar, a.daily_date, 101) = convert(varchar, b.hourly_date, 101)
join weather_data_calculated c
  on convert(varchar, a.daily_date, 101) = convert(varchar, c.hourly_date, 101)
group by a.daily_date, a.daily_value
You have to connect the tables together somehow (this uses an inner join). That requires getting the hourly dates and the daily dates into the same format; here they are all converted to MM/DD/YYYY strings.