Comparing Dates in PL/SQL (I'm a beginner, be gentle)

Relatively new to SQL, so still trying to get my head around a couple of concepts.
I have two tables, the first has a bunch of VARCHAR attributes, as well as a 'DAY' column which is in date format, and 'USAGE', a number.
The second table only has one column, 'HOLIDAY_DATE', also datatype date. As the name suggests, this is a bunch of dates corresponding to past and future holidays.
I'm trying to find 'USAGE' on days that are not holidays, by comparing 'DAY' in the first table to 'HOLIDAY_DATE' in the second. My select statement so far is:
SELECT DAY, USAGE
FROM FIRST_TABLE, HOLIDAYS
WHERE FIRST_TABLE.DAY, NOT LIKE HOLIDAYS.HOLIDAY_DATE
GROUP BY DAY, USAGE
ORDER BY DAY;
But this still seems to bring up ALL the days, including holidays. Not quite sure where to go from here.

SELECT DAY, USAGE
FROM FIRST_TABLE
WHERE FIRST_TABLE.DAY NOT IN (SELECT HOLIDAY_DATE FROM HOLIDAYS)
GROUP BY DAY, USAGE
ORDER BY DAY;

There may be other ways, but you can use a subquery:
SELECT DAY, USAGE
FROM FIRST_TABLE
WHERE DAY NOT IN (SELECT HOLIDAY_DATE FROM HOLIDAYS)
GROUP BY DAY, USAGE
ORDER BY DAY;
One more thing I want to add: besides the logical issue, your query also has a syntax error, the stray comma after FIRST_TABLE.DAY:
WHERE FIRST_TABLE.DAY, NOT LIKE HOLIDAYS.HOLIDAY_DATE
                     ^
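One caveat worth adding: if HOLIDAY_DATE can ever be NULL, NOT IN will return no rows at all, because DAY NOT IN (...) is unknown whenever the list contains a NULL. A NOT EXISTS anti-join avoids that pitfall; here is a minimal sketch using the table and column names from the question:
SELECT DAY, USAGE
FROM FIRST_TABLE F
WHERE NOT EXISTS (SELECT 1
                  FROM HOLIDAYS H
                  WHERE H.HOLIDAY_DATE = F.DAY)  -- NULL-safe, unlike NOT IN
GROUP BY DAY, USAGE
ORDER BY DAY;
If DAY carries a time-of-day component, compare TRUNC(F.DAY) instead, since an Oracle DATE equality match is exact to the second.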

Related

SQL query to count number of checkins per month

To put a long story short, I am working on a database using PostgreSQL that is managing yelp checkins. The checkin table has the attributes business_id (string), date (string in form yyyy-mm-dd), and time (string in form 00:00:00).
What I simply need to do is, given a business_id, I need to return a list of the total number of checkins based on just the mm (month) value.
So for instance, I need to retrieve the total checkins that were in Jan, Feb, March, April, etc, not based upon the year.
Any help is greatly appreciated. I've already considered group by clauses but I didn't know how to factor in '%mm%'.
Reiterating Gordon, class or not, storing dates and times as strings makes things harder, slower, and more likely to break. It's harder to take advantage of Postgres's powerful date math functions. Storing dates and times separately makes things even harder; you have to concatenate them together to get the full timestamp which means it will not be indexed. Determining the time between two events becomes unnecessarily difficult.
It should be a single timestamp column. Hopefully your class will introduce that shortly.
What I simply need to do is, given a business_id, I need to return a list of the total number of checkins based on just the mm (month) value.
This is deceptively straightforward. Cast your strings to dates; fortunately they're in ISO 8601 format, so no reformatting is required. Then use extract to pull out just the month part.
select
extract('month' from checkin_date::date) as month,
count(*)
from yelp_checkins
where business_id = ?
group by month
order by month
But there's a catch. What if there are no checkins for a business on a given month? We'll get no entry for that month. This is a pretty common problem.
If we want a row for every month, we need to generate a table with our desired months with generate_series, then left join with our checkin table. A left join ensures all the months (the "left" table) will be there even if there is no corresponding month in the join table (the "right" table).
select
months.month,
count(business_id)
from generate_series(1,12) as months(month)
left join yelp_checkins
on months.month = extract('month' from checkin_date::date)
and business_id = ?
group by months.month
order by months.month
Now that we have a table of months, we can group by that. We can't use a where business_id = ? clause or that will filter out empty months after the left join has happened. Instead we must put that as part of the left join.
Try it.
Why would you store the date as a string? That is a broken data model. You should fix the data.
That said, I recommend converting to a date and truncating to the first day of the month:
select date_trunc('month', datestr::date) as yyyymm, count(*)
from t
group by yyyymm
order by yyyymm;
If you don't want these based on the year, then use extract():
select extract(month from datestr::date) as mm, count(*)
from t
group by mm
order by mm;
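If you want month names rather than numbers, to_char can format the cast date; a sketch against the same hypothetical table t:
select extract(month from datestr::date) as mm,
       to_char(datestr::date, 'Mon') as mon,  -- 'Jan', 'Feb', ...
       count(*)
from t
group by mm, mon
order by mm;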

Detecting Invalid Dates in Oracle 11g database (ORA-01847)

I am querying an Oracle 11.2 instance to build a small data mart that includes extracting the date of birth and date of death of people.
Unfortunately the INSERT query (which takes its data from a SELECT) fails due to ORA-01847 (day of month must be between 1 and last day of month).
To find my bad dates I first did:
SELECT extract(day FROM SOME_DT_TM),
extract(month FROM SOME_DT_TM),
COUNT(*)
FROM PERSON
GROUP BY extract(day FROM SOME_DT_TM), extract(month FROM SOME_DT_TM)
ORDER BY COUNT(*) DESC;
It gave me 367 rows, one for each day of the year including NULL and February-29th (leap year). True for the other date column as well, so it looks like the data is fine from a SELECT perspective.
However if I set logging up on my insert
create table registry_new_dates
(some_dob date, some_death_date date);
exec dbms_errlog.create_error_log('SOME_NEW_DATES');
And then run my long insert query, I can pull the failing rows out of the error log:
SELECT some_dob,some_death_date,ora_err_mesg$ FROM ERR$_SOME_NEW_DATES;
I get the following weird results (first 3 rows shown) which makes me think that zip codes have been somehow inserted instead of dates for the 2nd column.
31-DEC-25 35244 "ORA-01847: day of month must be between 1 and last day of month"
13-DEC-33 35244-3402 "ORA-01847: day of month must be between 1 and last day of month"
23-JUN-58 35235 "ORA-01847: day of month must be between 1 and last day of month"
My question is: how do I detect these bad rows (there are apparently 11) with an SQL statement so I can fix or remove them? Fixing them in the originating table is not an option (no write privileges). I tried using queries like this:
SELECT DECEASED_DT_TM
FROM WH_CLN_PERSON
WHERE DECEASED_DT_TM LIKE '35%'
AND rownum<3;
But it did not find the offending rows.
Not sure if you are still actively researching this (or if you got an answer already).
To find the rows with the bad data, can't you instead select the DOB and the date of death, and express the WHERE clause in terms of DOB, like so:
...WHERE some_dob = to_date('31-DEC-25')
After you find those rows, you may want to do another query on just one or two of them, including a calculated column: dump(date of death). Then post that. We can learn a lot from the dump, the internal representation of the so-called "date" (which may very well be a ZIP code instead). With that in hand we may be able to figure out what's stored, and how to hunt for it.
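A sketch of that dump query, using names from the question's last query (which columns hold the DOB and the date of death is an assumption here, and the explicit format mask is added because a bare to_date depends on session NLS settings):
SELECT SOME_DT_TM AS dob,
       DECEASED_DT_TM,
       dump(DECEASED_DT_TM) AS death_dump  -- internal byte representation
FROM WH_CLN_PERSON
WHERE SOME_DT_TM = to_date('31-DEC-25', 'DD-MON-RR')
AND rownum < 3;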

How to break down smalldatetime into year, month, day indexes?

I am using SQL Server 2008.
I have some dates in my database that I "think" I want to break down into smaller parts. The dates are birthdays and death days. I want to be able to query them by part, for example to find people who were born in October, or on May 12th, or in 1945.
I was told that a typical way of doing this is to take a date and break it into smaller pieces and put each piece of the date into its own column, like this:
2001-03-12 00:00:00 // EventDate column
Add these columns:
2001 // EventYear column
03 // EventMonth column
12 // EventDay column
First, is this a good way of doing this? If so, second, can I somehow have SQL Server automatically break the date apart and put each piece into its own column?
I'd appreciate ideas and solutions.
I would recommend that you leave it as a date column and then use DatePart in queries to filter results.
Select * from TABLEX
where DatePart(YEAR,EventDate) = 1945
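The same approach covers the other examples from the question; a sketch for "born on May 12th", combining two DatePart filters:
Select * from TABLEX
where DatePart(MONTH, EventDate) = 5
and DatePart(DAY, EventDate) = 12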
It doesn't sound like the business requirement is very solidified. For what reason would you need to break out the different parts of the date? If you don't need to, then I wouldn't.
But if you do find it necessary, then I'd utilize computed columns that are persisted. There will be some overhead on an insert, but because there won't be any updates on existing data (your birth date and death date won't change), you won't see any performance overhead on a SELECT.
Something like this:
create table DateTest
(
    SomeDate datetime not null,
    SomeYear as datepart(yy, somedate) persisted,
    SomeMonth as datepart(mm, somedate) persisted,
    SomeDay as datepart(dd, somedate) persisted
)
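A quick usage sketch (the sample date is made up) showing the computed columns populating themselves:
insert into DateTest (SomeDate) values ('2001-03-12')

select SomeDate, SomeYear, SomeMonth, SomeDay
from DateTest
where SomeYear = 2001 and SomeMonth = 3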
Here is what I do.
I have a table "lib.Dates". It has a DATE as primary key.
It has additional columns with additional information for that date: for example, day of month, days to end of month, week of year, etc.
Joining this date table with dates allows me to:
* Get a list of all dates (for example, grouping sales per person by date would normally have no entry for days with zero sales; this way it can)
* Do funny things like all dates in week 23 of a year, which is normally harder to get.
This is one of a number of such tables that I have stored procedures maintain daily (covering -3 years to +5 years).
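A minimal sketch of such a date table in T-SQL; the names, columns, and populated range here are assumptions, not the poster's actual lib.Dates definition:
create table lib_Dates
(
    TheDate date not null primary key,
    DayOfMonth int not null,
    WeekOfYear int not null
);

-- populate one row per day over a fixed range with a recursive CTE
with d as
(
    select cast('2020-01-01' as date) as TheDate
    union all
    select dateadd(day, 1, TheDate) from d where TheDate < '2025-12-31'
)
insert into lib_Dates (TheDate, DayOfMonth, WeekOfYear)
select TheDate, datepart(dd, TheDate), datepart(wk, TheDate)
from d
option (maxrecursion 0);
Joining against such a table also makes "all dates in week 23" a simple WHERE WeekOfYear = 23.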

PostgreSQL - GROUP BY timestamp values?

I've got a table with purchase orders stored in it. Each row has a timestamp indicating when the order was placed. I'd like to be able to create a report indicating the number of purchases each day, month, or year. I figured I would do a simple SELECT COUNT(xxx) FROM tbl_orders GROUP BY tbl_orders.purchase_time and get the value, but it turns out I can't GROUP BY a timestamp column.
Is there another way to accomplish this? I'd ideally like a flexible solution so I could use whatever timeframe I needed (hourly, monthly, weekly, etc.) Thanks for any suggestions you can give!
This does the trick without the date_trunc function (easier to read).
-- 2014
select created_on::DATE from users group by created_on::DATE

-- updated September 2018 (thanks to @wegry)
select created_on::DATE as co from users group by co
What we're doing here is casting the original value into a DATE, rendering the time-of-day data in this value inconsequential.
Grouping by a timestamp column works fine for me here, keeping in mind that even a 1-microsecond difference will prevent two rows from being grouped together.
To group by larger time periods, group by an expression on the timestamp column that returns an appropriately truncated value. date_trunc can be useful here, as can to_char.
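For instance, against the question's tbl_orders (a sketch; swap 'day' for 'hour', 'week', or 'month' to change the granularity):
select date_trunc('day', purchase_time) as period,
       count(*) as orders
from tbl_orders
group by period
order by period;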

Query to find a weekly average

I have an SQLite database with the following fields for example:
date (yyyymmdd format)
total (0.00 format)
There is typically 2 months of records in the database. Does anyone know a SQL query to find a weekly average?
I could easily just execute:
SELECT COUNT(1) as total_records, SUM(total) as total FROM stats_adsense
Then just divide total by 7, but unless the number of days in the db happens to be exactly divisible by 7 I don't think it will be very accurate, especially if there are fewer than 7 days of records.
To get a daily summary it's obviously just total / total_records.
Can anyone help me out with this?
You could try something like this:
SELECT strftime('%W', thedate) theweek, avg(total) theaverage
FROM table GROUP BY strftime('%W', thedate)
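One caveat: SQLite's date and time functions expect ISO-8601 input (yyyy-mm-dd), while the question stores dates as yyyymmdd, so the strings would need reslicing first. A sketch, assuming the field names from the question:
SELECT strftime('%W', substr(date, 1, 4) || '-' || substr(date, 5, 2) || '-' || substr(date, 7, 2)) AS theweek,
       avg(total) AS theaverage
FROM stats_adsense
GROUP BY theweek;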
I'm not sure how the syntax would work in SQLite, but one way would be to parse out the date parts of each [date] field, specify the WEEK and DAY boundaries in your WHERE clause, and then GROUP BY the week. This will give you a true average regardless of whether there are rows or not.
Something like this (using T-SQL):
SELECT DATEPART(w, theDate), Avg(theAmount) as Average
FROM Table
GROUP BY DATEPART(w, theDate)
This will return a row for every week. You could filter it in your WHERE clause to restrict it to a given date range.
Hope this helps.
Your weekly average is
daily * 7
Obviously this doesn't take in to account specific weeks, but you can get that by narrowing the result set in a date range.
You'll have to omit from the sum those records which don't belong to a full week. So, prior to summing up, find the min and max of the dates, adjust them so that they form whole weeks, and then run your original query with a WHERE clause that limits the date values to the new range. Maybe you can even put all this into one query; I'll leave that up to you. ;-)
The "truncated" values are simply not used then. If there aren't enough values for even a single full week, there's no result at all, but there's no way around that.
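A sketch of that idea, assuming a column thedate already holding ISO-formatted dates (SQLite's 'weekday N' modifier advances a date forward to the next matching weekday, with Sunday = 0):
with bounds as
(
    select date(min(thedate), 'weekday 1') as start_date,         -- first Monday on/after the earliest date
           date(max(thedate), '-6 days', 'weekday 0') as end_date -- last Sunday on/before the latest date
    from stats_adsense
)
select avg(week_total) as weekly_average
from (
    select sum(total) as week_total
    from stats_adsense, bounds
    where thedate between bounds.start_date and bounds.end_date
    group by strftime('%W', thedate)
) as weeks;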