Most efficient way to subset a query to a list of dates (across two columns) in PostgreSQL? - sql

I am querying a table in a PG database which contains a period (character varying(255)) and a value (integer), which looks something like:
|period|value|
|Months|3 |
|Months|6 |
|Weeks |1 |
|Years |5 |
After a few joins, I'm looking to subset my result set to only include a subset of these period / value combinations, for example I may only want 3 Months, 6 Months and 5 Years (so not 1 Weeks).
I'd usually reach for a WHERE ... IN (..), but I don't think I can do this across two columns. Instead, I've tried to make a composite column with:
CURRENT_DATE + CAST(CONCAT(tbl.value, tbl.period) AS INTERVAL)
Producing a column of timestamps which I can then subset with an IN('2019-05-18', '2019-08-18', '2024-02-18').
This works but isn't particularly pretty or efficient. Is there a better way?
I'm free to change my query (so I can subset by dates as I currently am, or by 3 and Months) but importantly I do not know ahead of time whether 2 Years will be stored as 24 Months (nor do I have control of the table).
Thanks!

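You can write
WHERE (period, value) IN (('Months', 3), ('Months', 6), ('Years', 5))
and use an index over both columns.
The ROW keyword is optional in PostgreSQL, so this works as written. Note that the string comparison is case-sensitive, so the literals must match the stored values ('Months', not 'months').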
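To try the two-column IN without a PostgreSQL instance at hand, here is a minimal sketch using Python's built-in sqlite3 as a stand-in. The table name tbl and the sample rows come from the question; everything else is an assumption. SQLite only accepts the VALUES form on the right-hand side of a row-value IN; PostgreSQL accepts that form as well as the plain list shown above.

```python
import sqlite3

# In-memory SQLite database standing in for the PostgreSQL table in the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (period TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?)",
    [("Months", 3), ("Months", 6), ("Weeks", 1), ("Years", 5)],
)

# Row-value comparison: a row is kept only if BOTH columns match one listed pair.
rows = conn.execute(
    """
    SELECT period, value
    FROM tbl
    WHERE (period, value) IN (VALUES ('Months', 3), ('Months', 6), ('Years', 5))
    ORDER BY period, value
    """
).fetchall()

print(rows)
```

This only matches pairs exactly as stored, so it does not address the case where 2 Years might be stored as 24 Months.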

Related

Stacking column data in excel SQL query

I'm trying to pull data from an access database into an excel table with an SQL query. The problem is that my access database has columns with similar data that I want to combine into one single column. This should give me duplicates of the data in other columns for each entry. I'm not great with SQL but I think I have the basics down.
Database structure that I have:
Date | Product | Hours 1 | Reason 1 | Hours 2 | Reason 2 |
2019 A 3 "xxx" 5 "yyy"
Excel table that I want:
Date | Product | Hours | Reason |
2019 A 3 "xxx"
2019 A 5 "yyy"
Also not sure if it's possible but it would be great to see the source column of each
Date | Product | Hours | Reason | Source |
2019 A 3 "xxx" "Hours 1"
2019 A 5 "yyy" "Hours 2"
I've tried UNION ALL and got duplicates of the data but not merged into one column. I'm about to try INSERT INTO but sort of lost on how to get each one into the same column
Try this:
SELECT [Date], Product, Hours, Reason, Source
FROM (
SELECT [Date], Product, [Hours 1] AS Hours, [Reason 1] AS Reason, "Hours 1" AS Source
FROM [Table]
UNION ALL
SELECT [Date], Product, [Hours 2], [Reason 2], "Hours 2"
FROM [Table]
) AS t
It looks like you have a bad data structure in the table. By that I mean it's a "flat" table with multiple hours in one row for a record. This is generally a PITA when it comes to reviewing data in many-to-one situations. Normally there would be a table where records get logged separately for each hour involved. I understand you probably didn't build it, but it's worth pointing out for your own information.
Fundamentally, this issue is easier to approach once you understand how that less-than-desirable structure affects your task. This is essentially, in my mind, a pivot problem (strictly an unpivot, since columns become rows). PIVOT in SQL is essentially switching rows and columns. There are many ways to pivot data with code, so pick your favorite. Most people actually use the function PIVOT, whereas I tend to teeter between CTEs (common table expressions) and PIVOT. IMO CTEs are easier to read once you understand them. Because Access SQL doesn't support PIVOT or CTEs, we just have to treat the body of what a CTE would have been as a subquery in the FROM clause (a derived table).
SELECT x.*
FROM
(
SELECT
[Date],
Product,
[Hours 1] AS Hours,
[Reason 1] AS Reason,
'Hours 1' AS Source
FROM yourTableName
UNION ALL
SELECT
[Date],
Product,
[Hours 2] AS Hours,
[Reason 2] AS Reason,
'Hours 2' AS Source
FROM yourTableName
) AS x
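As a quick way to check the stacking logic outside Access, here is a small sketch using Python's sqlite3 as a stand-in engine. The table name log is made up; the column names and the sample row come from the question. UNION ALL is used rather than UNION so that two identical hours/reason pairs are not collapsed into a single row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table name "log"; columns mirror the flat layout in the question.
conn.execute(
    'CREATE TABLE log ("Date" INTEGER, Product TEXT, '
    '"Hours 1" INTEGER, "Reason 1" TEXT, "Hours 2" INTEGER, "Reason 2" TEXT)'
)
conn.execute("INSERT INTO log VALUES (2019, 'A', 3, 'xxx', 5, 'yyy')")

# Each SELECT picks one hours/reason pair and labels it with its source column;
# UNION ALL stacks the two result sets without removing duplicates.
stacked = conn.execute(
    """
    SELECT "Date", Product, "Hours 1" AS Hours, "Reason 1" AS Reason, 'Hours 1' AS Source
    FROM log
    UNION ALL
    SELECT "Date", Product, "Hours 2", "Reason 2", 'Hours 2'
    FROM log
    ORDER BY Source
    """
).fetchall()

for row in stacked:
    print(row)
```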

SQL Statement - want daily dates rolled up and displayed as Year

I have two years worth of data that I'm summing up for instance
Date | Ingredient_cost_Amount| Cost_Share_amount |
I'm looking at two years worth of data for 2012 and 2013,
I want to roll up all the totals so I have only two rows, one row for 2012 and one row for 2013. How do I write a SQL statement that will look at the dates but display only the 4-digit year instead of the 8-digit daily date? I suspect the sum piece of it will be taken care of by summing those columns with calculations, so I'm really looking for help with how to transpose a daily date to a 4-digit year.
Help is greatly appreciated.
select DATEPART(year,[Date]) [Year]
, sum(Ingredient_cost_Amount) Total
from #table
group by DATEPART(year,[Date])
Define a range/grouping table.
Something similar to the following should work in most RDBMSs:
SELECT Grouping.id, SUM(Ingredient.ingredient_cost_amount) AS Ingredient_Cost_Amount,
SUM(Ingredient.cost_share_amount) AS Cost_Share_Amount
FROM (VALUES (2013, DATE('2013-01-01'), DATE('2014-01-01')),
(2012, DATE('2012-01-01'), DATE('2013-01-01'))) Grouping(id, gStart, gEnd)
JOIN Ingredient
ON Ingredient.date >= Grouping.gStart
AND Ingredient.date < Grouping.gEnd
GROUP BY Grouping.id
(DATE() and related conversion functions are heavily DB dependent. Some RDBMSs don't support using VALUES this way, although there are other ways to create the virtual grouping table)
See this blog post for why I used an exclusive upper bound for the range.
Using a range table this way will potentially allow the db to use indices to help with the aggregation. How much this helps depends on a bunch of other factors, like the specific RDBMS used.
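Here is a runnable sketch of the range-table idea, using Python's sqlite3 as a stand-in engine. The sample rows are invented for illustration, the range table is named year_range (a hypothetical name), and dates are stored as ISO-8601 text so that plain string comparison orders them correctly. The virtual table is built with a CTE over VALUES, which is one of the "other ways" mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ingredient (date TEXT, ingredient_cost_amount REAL, cost_share_amount REAL)"
)
# Made-up sample data; note the rows sitting right on the year boundary.
conn.executemany(
    "INSERT INTO ingredient VALUES (?, ?, ?)",
    [
        ("2012-03-15", 10.0, 1.0),
        ("2012-12-31", 20.0, 2.0),
        ("2013-01-01", 5.0, 0.5),
        ("2013-07-04", 7.0, 1.5),
    ],
)

# Each range is [g_start, g_end): inclusive lower bound, exclusive upper bound,
# so 2013-01-01 lands in 2013 only.
totals = conn.execute(
    """
    WITH year_range(id, g_start, g_end) AS (
        VALUES (2012, '2012-01-01', '2013-01-01'),
               (2013, '2013-01-01', '2014-01-01')
    )
    SELECT r.id,
           SUM(i.ingredient_cost_amount),
           SUM(i.cost_share_amount)
    FROM year_range AS r
    JOIN ingredient AS i
      ON i.date >= r.g_start
     AND i.date < r.g_end
    GROUP BY r.id
    ORDER BY r.id
    """
).fetchall()

print(totals)
```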

is it possible to find out how much of the db data is older than some N years in SQL Server?

I have two databases in SQL Server. I want to find out how much of the data is older than, say, 3 years.
I know the database creation dates. Currently I have around 550 GB of data (both databases combined) spanning 7 years, and I want to know how much of that 550 GB is older than 3 years (or 5 years).
I was going through this link but couldn't get the expected data.
SQL SERVER – Query to find number Rows, Columns, ByteSize for each table in the current database – Find Biggest Table in Database
One solution that comes to mind is to find the total number of rows covering all 7 years (easy to get) and the total number of rows covering the first 5 years from the creation date (I don't know how to get this number).
Then, if row_count_7_years accounts for 550 GB of data, what does row_count_5_years account for? That would give me an approximate figure.
Please help.
For such purposes you should keep a datetime field, as marc mentioned. I suppose you don't have one.
With your suggested solution you can get the total row count for your table (for 7 years, I suppose), but you won't be able to get the rows for 5 years, because there is no date.
You can take the total number of records for 7 years and divide it by the number of years. Then, ONLY if your database fills at a roughly constant rate, you can query for the top (numberOfRows in one year) * 5 rows ordered by row_number(). The result is the set of rows you should delete. But I wouldn't recommend this solution.
I would recommend altering your tables and adding a datetime column to each of them. Before that, you should back up all the data and copy it somewhere. After 3 years you would be able to do your clean-up.
As mentioned above, you should have a date column. If you don't, then depending on the relationships between your tables, you might be able to estimate the number of rows by following a relationship to some other table that does have a datetime column. Otherwise, if you have an old backup (unlikely, but still), you can restore it to identify the delta.
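The row-count proportion from the question can be written down as a small calculation. This is only a rough sketch: it assumes rows are about the same size on average across the whole period, and the row counts below are made-up placeholders, not measurements. The function name estimate_old_data_gb is hypothetical.

```python
def estimate_old_data_gb(total_gb, total_rows, old_rows):
    """Scale the total size by the share of rows older than the cutoff.

    Assumes every row takes roughly the same space, which is an approximation.
    """
    if total_rows <= 0:
        raise ValueError("total_rows must be positive")
    if not 0 <= old_rows <= total_rows:
        raise ValueError("old_rows must be between 0 and total_rows")
    return total_gb * old_rows / total_rows


# Placeholder numbers: 550 GB in total, 1.1 million rows, 400,000 of them
# older than the cutoff (counted via a date column or a related table).
estimate = estimate_old_data_gb(550, 1_100_000, 400_000)
print(f"about {estimate:.0f} GB is older than the cutoff")
```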

Effect on Query to find range after calculation

I have a View V1 with a column Name, DateOfBirth.
In my stored proc, I created a temporary table T2 as given below.
MinAge MaxAge Category
0 5 Under 5
13 19 Teenager
My stored proc query goes:
Select V1.Name, T2.Category
from V1, T2
where DATEDIFF(hour,V1.DateOfBirth ,GETDATE())/8766 between T2.minage and T2.maxage
As of now my result set looks fine. My question is: since there is no direct relationship between these two tables, can my query result be affected in any way going forward?
Since there is no answer, I am forced to assume that there would be no major effect. The suggestion from @GregHNZ is quite helpful, though. Thanks.
It's easier to determine age by adding years to their date of birth than dividing by days or hours. I'd add 5 years to their date of birth and compare that with getdate(), or your reporting date, to determine who was under/over 5 years old at that point in time.
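To illustrate the calendar-based approach, here is a small Python sketch (not T-SQL) that computes age in whole years by checking whether this year's birthday has passed, then looks the age up in the same ranges as T2. The function names are made up for the example. It also shows one way the result can be affected: ages that fall in no range (6 to 12, or 20 and over, with the sample rows) get no category, and with the join in the question such people would drop out of the result.

```python
from datetime import date

# Same ranges as the temporary table T2 in the question: (MinAge, MaxAge, Category).
CATEGORIES = [(0, 5, "Under 5"), (13, 19, "Teenager")]


def age_in_years(dob, today):
    """Whole years of age on `today`, based on calendar dates rather than hours."""
    # Subtract one year if this year's birthday has not happened yet.
    had_birthday = (today.month, today.day) >= (dob.month, dob.day)
    return today.year - dob.year - (0 if had_birthday else 1)


def category_for(dob, today):
    """Return the matching category label, or None when no range covers the age."""
    age = age_in_years(dob, today)
    for min_age, max_age, label in CATEGORIES:
        if min_age <= age <= max_age:
            return label
    return None


print(category_for(date(2010, 6, 1), date(2024, 5, 31)))  # day before the 14th birthday
print(category_for(date(2015, 1, 1), date(2024, 1, 1)))   # age 9 is in no range
```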

SQL YTD for previous years and this year

Wondering if anyone can help with the code for this.
I want to query the data and get 2 entries, one for YTD previous year and one for this year YTD.
Only way I know how to do this is as 2 separate queries with where clauses.. I would prefer to not have to run the query twice.
One column called DatePeriod and populated with 2011 YTD and 2012YTD, would be even better if I could get it to do 2011YTD, 2012YTD, 2011Total, 2012Total... though guessing this is 4 queries.
Thanks
EDIT:
In response to help clear a few things up:
This is being coded in MS SQL.
The data looks like so: (very basic example)
Date | Call_Volume
1/1/2012 | 4
What I would like is to have the Call_Volume summed up, I have queries that group it by week, and others that do it by month. I could pull all the dailies in and do this in Excel but the table has millions of rows so always best to reduce the size of my output.
I currently group by Week/Month and Year and UNION ALL the results so it's one output. But that means I have 3 queries accessing the same table, which is a large pain, very slow and not efficient. That is fine, but now I also need a YTD, so it's either one more query or, ideally, a way to add it to the yearly query:
So
DatePeriod | Sum_Calls
2011 Total | 40
2011 YTD | 12
2012 Total | 45
2012 YTD | 15
Hope this makes any sense.
SQL is built to do operations on rows, not columns (you select columns, of course, but aggregate operations are all on rows).
The most standard approach to this is something like:
SELECT SUM(your_table.sales), YEAR(your_table.sale_date)
FROM your_table
GROUP BY YEAR(your_table.sale_date)
Now you'll get one row for each year on record, with no limit to how many years you can process. If you're already grouping by another field, that's fine; you'll then get one row for each year in each of those groups.
Your program can then iterate over the rows and organize/render them however you like.
If you absolutely, positively must have columns instead, you'll be stuck with something like this:
SELECT SUM(IF(YEAR(sale_date) = 2011, sales, 0)) AS total_2011,
SUM(IF(YEAR(sale_date) = 2012, sales, 0)) AS total_2012
FROM your_table
If you're building the query programmatically you can add as many of those column criteria as you need, but I wouldn't count on this running very efficiently.
(These examples are written with some MySQL-specific functions. Corresponding functions exist for other engines but the syntax would be a little different.)
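Combining the two ideas above (group by year, plus a conditional SUM) gives the yearly total and the YTD figure in a single pass over the table. Below is a minimal sketch using Python's sqlite3 as a stand-in for MS SQL. The table name calls, the sample rows and the fixed cutoff of March 31 are all assumptions chosen so the totals match the example output in the question. CASE WHEN is used instead of IF because it is standard SQL; in MS SQL the year and month-day parts would come from functions such as YEAR() rather than strftime.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calls (call_date TEXT, call_volume INTEGER)")
# Made-up rows, stored as ISO-8601 text dates.
conn.executemany(
    "INSERT INTO calls VALUES (?, ?)",
    [
        ("2011-01-10", 12),
        ("2011-09-01", 28),
        ("2012-01-01", 4),
        ("2012-02-15", 11),
        ("2012-11-20", 30),
    ],
)

as_of_month_day = "03-31"  # hypothetical YTD cutoff, applied to every year

# One scan: total sums every row in the year, ytd sums only rows on or before
# the cutoff month-day within that year.
rows = conn.execute(
    """
    SELECT strftime('%Y', call_date) AS yr,
           SUM(call_volume) AS total,
           SUM(CASE WHEN strftime('%m-%d', call_date) <= ?
                    THEN call_volume ELSE 0 END) AS ytd
    FROM calls
    GROUP BY yr
    ORDER BY yr
    """,
    (as_of_month_day,),
).fetchall()

for yr, total, ytd in rows:
    print(f"{yr} Total | {total}")
    print(f"{yr} YTD | {ytd}")
```

Each year comes back as one row with two columns; splitting that into separate "Total" and "YTD" lines, as in the desired output, is done here in the calling code.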