Query to complete empty spaces on table in Postgres - sql

I need to create a view that, given the first two tables (A and B), produces the result shown in table C.
Basically, I need to fill the empty spaces in table B using the most recent previous value available, as shown below.
I've accomplished this using two loops in a procedure, but I'd like to try a solution using only select statements.
table_a
date
1/1/2013
2/1/2013
3/1/2013
4/1/2013
5/1/2013
6/1/2013
7/1/2013
8/1/2013
9/1/2013
10/1/2013
....
table_b
date value
1/1/2013 10
3/1/2013 5
7/1/2013 30
10/1/2013 40
table_c - Desired result
date value
1/1/2013 10
2/1/2013 10
3/1/2013 5
4/1/2013 5
5/1/2013 5
6/1/2013 5
7/1/2013 30
8/1/2013 30
9/1/2013 30
10/1/2013 40
Does anyone have an idea how to accomplish this?

My SQL is very rusty, so I have a nagging feeling there's a better way, but what I came up with was to join against a sub-select that self-joins table_b to build a new table b with date ranges. With that, it's easy to match table_a with the proper value.
I left a test on sqlfiddle so you can see the assumptions I made. Note that date_format and IFNULL are MySQL functions; on Postgres the equivalents would be to_char and COALESCE. This is the code:
select date_format(a.date,'%m/%d/%Y') as date, b.value as value
from table_a as a join
  (select b1.date as start, IFNULL(min(b2.date),'9999-12-31') as end, b1.value as value
   from table_b as b1 left outer join table_b as b2
     on b1.date < b2.date
   group by b1.date) as b
  on a.date >= b.start and a.date < b.end
The self join pairs each b1 row with every later b2 row; the group by together with min(b2.date) then keeps only the next date after b1's date. For the very last entry there is no later b2 date, so the min is null; I map that to 12/31/9999 as a really large end date.
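For comparison, here is a minimal sketch of a different technique: a correlated subquery that picks, for each date in table_a, the latest table_b row on or before it. It runs the SQL through Python's sqlite3 module purely so it can be tried without a server; the query itself uses only ORDER BY ... LIMIT 1, which Postgres also accepts. The ISO date strings assume the sample dates are month/day/year, and the helper names make_db and fill_forward are made up for this example.

```python
import sqlite3


def make_db():
    """Build an in-memory database holding the sample rows from the question."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE table_a (date TEXT PRIMARY KEY);
        CREATE TABLE table_b (date TEXT PRIMARY KEY, value INTEGER);
        INSERT INTO table_a (date) VALUES
            ('2013-01-01'), ('2013-02-01'), ('2013-03-01'), ('2013-04-01'),
            ('2013-05-01'), ('2013-06-01'), ('2013-07-01'), ('2013-08-01'),
            ('2013-09-01'), ('2013-10-01');
        INSERT INTO table_b (date, value) VALUES
            ('2013-01-01', 10), ('2013-03-01', 5),
            ('2013-07-01', 30), ('2013-10-01', 40);
    """)
    return conn


def fill_forward(conn):
    """For every table_a date, take the value of the latest table_b row
    whose date is on or before it (NULL if there is no such row)."""
    query = """
        SELECT a.date,
               (SELECT b.value
                FROM table_b AS b
                WHERE b.date <= a.date
                ORDER BY b.date DESC
                LIMIT 1) AS value
        FROM table_a AS a
        ORDER BY a.date
    """
    return conn.execute(query).fetchall()


for row in fill_forward(make_db()):
    print(row)
```

Since this is a plain select, it could be wrapped in a view. Whether it beats the date-range join depends on indexing and data size, so it is worth testing both.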

Related

Join Running Sum in TSQL

I would like to join a running sum until a specific time point. E.g. I have two tables
Table A
TimestampOfInterest
2001-01-01
2001-02-01
2001-03-01
Table B
Timestamp Credits
2001-01-01 1
2001-01-05 1
2001-02-10 1
2001-03-15 1
Joining B -> A should lead to
TimestampOfInterest Credits
2001-01-01 0
2001-02-01 2
2001-03-01 3
That is the sum of credits until the given TimestampOfInterest.
Can someone help?
Lazloo
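Not sure you need a join. You can simply use a correlated subquery. SUM returns NULL when no rows match (as for 2001-01-01 here), so COALESCE turns that into 0. The Category condition assumes both tables have a Category column, which the sample tables above don't show; drop it if yours don't.
Select TimestampOfInterest,
  COALESCE((Select SUM(Credits)
            from TableB
            where Timestamp < A.TimeStampOfInterest and Category = A.Category), 0) Credits
From TableA A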
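To check the subquery approach against the sample data, here is a small sketch that runs it in SQLite through Python's sqlite3 module rather than in SQL Server. It leaves out the Category condition because the sample tables have no such column, and the helper names make_db and running_credits are invented for the example.

```python
import sqlite3


def make_db():
    """In-memory copy of the two sample tables from the question."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE TableA (TimestampOfInterest TEXT);
        CREATE TABLE TableB (Timestamp TEXT, Credits INTEGER);
        INSERT INTO TableA VALUES ('2001-01-01'), ('2001-02-01'), ('2001-03-01');
        INSERT INTO TableB VALUES
            ('2001-01-01', 1), ('2001-01-05', 1),
            ('2001-02-10', 1), ('2001-03-15', 1);
    """)
    return conn


def running_credits(conn):
    """Sum the credits strictly before each timestamp of interest.
    COALESCE maps the 'no matching rows' case (NULL) to 0."""
    query = """
        SELECT a.TimestampOfInterest,
               COALESCE((SELECT SUM(b.Credits)
                         FROM TableB AS b
                         WHERE b.Timestamp < a.TimestampOfInterest), 0) AS Credits
        FROM TableA AS a
        ORDER BY a.TimestampOfInterest
    """
    return conn.execute(query).fetchall()


for row in running_credits(make_db()):
    print(row)
```

The strict < comparison is what makes the 2001-01-01 row come out as 0, matching the desired output in the question.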

How to LEFT JOIN on ROW_NUM using WITH

Right now I'm in the testing phase of this query so I'm only testing it on two Queries. I've gotten stuck on the final part where I want to left join everything (this will have to be extended to 12 separate queries). The problem is basically as the title suggests--I want to join 12 queries on the created Row_Num column using the WITH() statement, instead of creating 12 separate tables and saving them as table in a database.
WITH Jan_Table AS
(SELECT ROW_NUMBER() OVER (ORDER BY a.SALE_DATE) as Row_ID, a.SALE_DATE, sum(a.revenue) as Jan_Rev
FROM ba.SALE_TABLE a
WHERE a.SALE_DATE BETWEEN '2015-01-01' and '2015-01-31'
GROUP BY a.SALE_DATE)
SELECT ROW_NUMBER() OVER (ORDER BY a.SALE_DATE) as Row_ID, a.SALE_DATE, sum(a.revenue) as Jun_Rev, j.Jan_Rev
FROM ba.SALE_TABLE a
LEFT JOIN Jan_Table j
on "j.Row_ID" = a.Row_ID
WHERE a.SALE_DATE BETWEEN '2015-06-01' and '2015-06-30'
GROUP BY a.SALE_DATE
And then I get this error message:
ERROR: column "j.Row_ID" does not exist
I put in the "j.Row_ID" because the previous message was:
ERROR: column a.row_id does not exist Hint: Perhaps you meant to
reference the column "j.row_id".
Each query works individually without the JOIN and WITH functions. I have one for every month of the year and want to join 12 of these together eventually.
The output should be a single column with ROW_NUM and 12 Monthly Revenues columns. Each row should be a day of the month. I know not every month has 31 days. So, for example, Feb only has 28 days, meaning I'd want days 29, 30, and 31 as NULLs. The query above still has the dates--but I will remove the "SALE_DATE" column after I can just get these two queries to join.
My initial thought was just to create 12 tables, but I think that'd be a really bad use of space and not the most logical solution to this problem if I were to extend this solution.
edit
Below are the separate outputs of the two queries above, and the third table is what I'm trying to make. I can't give you the raw data. Everything above has been altered from the actual column names and purposes of the data that I'm using. And I don't know how to create a dataset--that's too far above my head in SQL.
Jan_Table (first five lines)
Row_Num Date Jan_Rev
1 2015-01-01 20
2 2015-01-02 20
3 2015-01-03 20
4 2015-01-04 20
5 2015-01-05 20
Jun_Table (first five lines)
Row_Num Date Jun_Rev
1 2015-06-01 30
2 2015-06-02 30
3 2015-06-03 30
4 2015-06-04 30
5 2015-06-05 30
JOINED_TABLE (first five lines)
Row_Num Date Jun_Rev Date Jan_Rev
1 2015-06-01 30 2015-01-01 20
2 2015-06-02 30 2015-01-02 20
3 2015-06-03 30 2015-01-03 20
4 2015-06-04 30 2015-01-04 20
5 2015-06-05 30 2015-01-05 20
It seems like you can just use group by and conditional aggregation for your full query. The revenue columns use sum() to match the per-date sum(a.revenue) in your queries; if there is exactly one row per date, max() works as well:
select day(sale_date),
       max(case when month(sale_date) = 1 then sale_date end) as jan_date,
       sum(case when month(sale_date) = 1 then revenue end) as jan_revenue,
       max(case when month(sale_date) = 2 then sale_date end) as feb_date,
       sum(case when month(sale_date) = 2 then revenue end) as feb_revenue,
       . . .
from sale_table s
group by day(sale_date)
order by day(sale_date);
You haven't specified the database you are using. DAY() is a common function to get the day of the month; MONTH() is a common function to get the month of the year. However, those particular functions might be different in your database. In PostgreSQL, for example, you would use extract(day from sale_date) and extract(month from sale_date).
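Here is a small runnable sketch of the conditional aggregation idea, using SQLite through Python's sqlite3 module. The rows are invented for illustration (the real data wasn't posted), strftime stands in for DAY() and MONTH() because that is what SQLite offers, and the helper names make_db and monthly_pivot are made up.

```python
import sqlite3


def make_db():
    """In-memory sale_table with a few made-up rows (not the asker's data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE sale_table (sale_date TEXT, revenue INTEGER);
        INSERT INTO sale_table VALUES
            ('2015-01-01', 20),
            ('2015-01-02', 15), ('2015-01-02', 5),  -- two sales on one day
            ('2015-01-03', 20),
            ('2015-06-01', 30),
            ('2015-06-02', 30);                     -- no June sale on day 3
    """)
    return conn


def monthly_pivot(conn):
    """One row per day of month, one revenue column per month.
    A CASE without ELSE yields NULL, which SUM ignores, so a day with
    no sales in a given month comes out as NULL in that column."""
    query = """
        SELECT CAST(strftime('%d', sale_date) AS INTEGER) AS day_of_month,
               SUM(CASE WHEN strftime('%m', sale_date) = '01' THEN revenue END) AS jan_rev,
               SUM(CASE WHEN strftime('%m', sale_date) = '06' THEN revenue END) AS jun_rev
        FROM sale_table
        GROUP BY day_of_month
        ORDER BY day_of_month
    """
    return conn.execute(query).fetchall()


for row in monthly_pivot(make_db()):
    print(row)
```

Extending this to 12 months means adding one SUM(CASE ...) column per month, with no CTEs or row-number joins needed. A real query would probably also filter on the year.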

How Do I Select All Parents and the Top Previous Child Record Based on Dates in SQL Server 2008

I'm using a vendor provided database running on SQL Server 2008. There are two tables that track tests. For every record in Table A there may be zero, one or multiple records in Table B. There can also be multiple tests in Table A for the same user. The relationship is TableA.UserID = TableB.UserID. Tests taken in Table B can occur before or after Table A.
I need to select all of the records in Table A and, if test(s) from Table B have been taken by the same user before the test in Table A, data from Table B but only from the last previous child record. Both tables are structured similarly:
TABLE A
TestID INTEGER PRIMARY KEY,
UserID INTEGER,
TestDate DATE,
Score INTEGER
TABLE B
TestID INTEGER PRIMARY KEY,
UserID INTEGER,
TestDate Date,
Score INTEGER
Sample Data
TABLE A
TestID UserID TestDate Score
1 100 2014-02-15 80
2 101 2014-02-20 100
3 102 2014-02-22 90
4 102 2014-03-10 70
TABLE B
TestID UserID TestDate Score
1000 100 2014-02-01 55
1007 100 2014-02-05 85
1012 100 2014-02-20 95
1034 102 2014-02-12 65
1205 102 2014-03-05 75
1986 101 2014-03-10 45
What I'd like returned would be:
UserID TestA_ID TestADate TestAScore TestB_ID TestBDate TestBScore
100 1 2014-02-15 80 1007 2014-02-05 85
101 2 2014-02-20 100 NULL NULL NULL
102 3 2014-02-22 90 1034 2014-02-12 65
102 4 2014-03-10 70 1205 2014-03-05 75
I know how to get all of the previous Table B rows joined to the Table A rows by using a LEFT OUTER JOIN and filtering by date in the WHERE clause, and I know how to get the top row from Table B, but I haven't been able to work out how to get the top child record that occurs before the date of the record in Table A. Any help would be appreciated. Thanks.
You can do this using OUTER APPLY in T-SQL.
For each record in TableA, we're looking for a record in TableB for the same user but with a test date prior to the test date in TableA and we're also ordering the test in TableB to ensure we're getting the most recent test from TableB (but still prior to the test date from TableA).
SELECT
    A.[UserID],
    A.[TestID] [TestA_ID],
    A.[TestDate] [TestADate],
    A.[Score] [TestAScore],
    B.[TestB_ID],
    B.[TestBDate],
    B.[TestBScore]
FROM [TableA] A
OUTER APPLY
(
    SELECT TOP 1
        B1.[TestID] [TestB_ID],
        B1.[TestDate] [TestBDate],
        B1.[Score] [TestBScore]
    FROM [TableB] B1
    WHERE A.[UserID] = B1.[UserID]
        AND A.[TestDate] > B1.[TestDate]
    ORDER BY
        B1.[TestDate] DESC
) B
Another option might be to use the ROW_NUMBER() window function to find the record from TableB. I have a hunch this one wouldn't perform as well because it needs to hit TableA twice, but I can't be sure without running tests.
SELECT
    A.[UserID],
    A.[TestID] [TestA_ID],
    A.[TestDate] [TestADate],
    A.[Score] [TestAScore],
    B.[TestB_ID],
    B.[TestBDate],
    B.[TestBScore]
FROM [TableA] A
LEFT JOIN
(
    SELECT
        ROW_NUMBER() OVER (PARTITION BY A.[UserID], A.[TestID] ORDER BY B.[TestDate] DESC) [rn],
        A.[UserID],
        A.[TestID] [TestA_ID],
        B.[TestID] [TestB_ID],
        B.[TestDate] [TestBDate],
        B.[Score] [TestBScore]
    FROM [TableA] A
    INNER JOIN [TableB] B
        ON A.[UserID] = B.[UserID]
        AND A.[TestDate] > B.[TestDate]
) B
    ON A.[UserID] = B.[UserID]
    AND A.[TestID] = B.[TestA_ID]
    AND B.[rn] = 1
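The ROW_NUMBER() variant can be checked against the sample data with a quick sketch. This one runs in SQLite through Python's sqlite3 module, not SQL Server, so the bracket aliases are replaced with AS and OUTER APPLY is not shown (SQLite does not have it). The helper names make_db and latest_previous_b are invented for the example.

```python
import sqlite3


def make_db():
    """In-memory copy of the sample TableA and TableB rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE TableA (TestID INTEGER PRIMARY KEY, UserID INTEGER,
                             TestDate TEXT, Score INTEGER);
        CREATE TABLE TableB (TestID INTEGER PRIMARY KEY, UserID INTEGER,
                             TestDate TEXT, Score INTEGER);
        INSERT INTO TableA VALUES
            (1, 100, '2014-02-15', 80), (2, 101, '2014-02-20', 100),
            (3, 102, '2014-02-22', 90), (4, 102, '2014-03-10', 70);
        INSERT INTO TableB VALUES
            (1000, 100, '2014-02-01', 55), (1007, 100, '2014-02-05', 85),
            (1012, 100, '2014-02-20', 95), (1034, 102, '2014-02-12', 65),
            (1205, 102, '2014-03-05', 75), (1986, 101, '2014-03-10', 45);
    """)
    return conn


def latest_previous_b(conn):
    """Every TableA row plus the most recent earlier TableB row for the
    same user, or NULLs when the user has no earlier TableB test."""
    query = """
        SELECT a.UserID, a.TestID, a.TestDate, a.Score,
               b.TestB_ID, b.TestBDate, b.TestBScore
        FROM TableA AS a
        LEFT JOIN (
            -- rank each earlier B test per A test, newest first
            SELECT a2.TestID   AS TestA_ID,
                   b2.TestID   AS TestB_ID,
                   b2.TestDate AS TestBDate,
                   b2.Score    AS TestBScore,
                   ROW_NUMBER() OVER (PARTITION BY a2.TestID
                                      ORDER BY b2.TestDate DESC) AS rn
            FROM TableA AS a2
            JOIN TableB AS b2
              ON b2.UserID = a2.UserID
             AND b2.TestDate < a2.TestDate
        ) AS b
          ON b.TestA_ID = a.TestID
         AND b.rn = 1
        ORDER BY a.UserID, a.TestID
    """
    return conn.execute(query).fetchall()


for row in latest_previous_b(make_db()):
    print(row)
```

Keeping rn = 1 in the ON clause rather than in a WHERE clause is what preserves TableA rows with no earlier TableB test, such as user 101.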

SQL: Select all from column A and add a value from column B if present

I'm having quite an easy problem with SQL, I just can't word it properly (therefore I didn't find anything in google and my title probably is misleading)
The problem is: I have a big table containing transaction information in the form (ID, EmployeeID, Date, Value) (and some more columns, but only those matter currently) and a list of all EmployeeIDs. What I want is a result table showing all employee IDs with their aggregated value of transactions in a given timespan.
The problem is: How do I get those employees into the result table that don't have an entry for the given time period?
e.g.
ID EMPLID DATE VALUE
1 1 2013-01-01 1000
2 2 2013-02-02 2000
3 1 2013-01-03 3000
4 2 2013-04-01 2000
5 2 2013-03-01 2000
6 1 2013-02-01 4000
EMPLID NAME
1 bob
2 alice
And now I want the aggregated value of all transactions after 2013-03-01 like this
EMPLID VALUE
1 0 <- how to get this based on the employee table?
2 4000
The database server in use is Firebird, and I connect to it through JDBC (if that matters).
SELECT a.EMPLID, a.Name,
       COALESCE(SUM(b.Value), 0) AS TotalValue
FROM Employee a
LEFT JOIN Transactions b
       ON a.EMPLID = b.EMPLID
      AND b.Date >= '2013-03-01'
GROUP BY a.EMPLID, a.Name
To learn more about joins, see the link below:
Visual Representation of SQL Joins
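To see why the date filter belongs in the ON clause, here is a small sketch that runs both placements on the sample data. It uses SQLite through Python's sqlite3 module instead of Firebird, the table names Employee and Transactions are the answer's guesses rather than names from the question, and the helper names are made up.

```python
import sqlite3


def make_db():
    """In-memory copy of the sample employees and transactions."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Employee (EMPLID INTEGER PRIMARY KEY, Name TEXT);
        CREATE TABLE Transactions (ID INTEGER PRIMARY KEY, EMPLID INTEGER,
                                   Date TEXT, Value INTEGER);
        INSERT INTO Employee VALUES (1, 'bob'), (2, 'alice');
        INSERT INTO Transactions VALUES
            (1, 1, '2013-01-01', 1000), (2, 2, '2013-02-02', 2000),
            (3, 1, '2013-01-03', 3000), (4, 2, '2013-04-01', 2000),
            (5, 2, '2013-03-01', 2000), (6, 1, '2013-02-01', 4000);
    """)
    return conn


def totals_filter_in_on(conn, start):
    """Date filter in the ON clause: employees without matching
    transactions keep their row, and COALESCE turns their NULL sum into 0."""
    query = """
        SELECT e.EMPLID, e.Name, COALESCE(SUM(t.Value), 0) AS TotalValue
        FROM Employee AS e
        LEFT JOIN Transactions AS t
               ON t.EMPLID = e.EMPLID
              AND t.Date >= ?
        GROUP BY e.EMPLID, e.Name
        ORDER BY e.EMPLID
    """
    return conn.execute(query, (start,)).fetchall()


def totals_filter_in_where(conn, start):
    """Same filter moved to WHERE: it runs after the join, so employees
    whose transactions are all too old are removed from the result."""
    query = """
        SELECT e.EMPLID, e.Name, COALESCE(SUM(t.Value), 0) AS TotalValue
        FROM Employee AS e
        LEFT JOIN Transactions AS t
               ON t.EMPLID = e.EMPLID
        WHERE t.Date >= ?
        GROUP BY e.EMPLID, e.Name
        ORDER BY e.EMPLID
    """
    return conn.execute(query, (start,)).fetchall()


db = make_db()
print(totals_filter_in_on(db, '2013-03-01'))
print(totals_filter_in_where(db, '2013-03-01'))
```

With the filter in ON, bob appears with a total of 0; with the filter in WHERE, bob's row is dropped, which is the problem described in the question.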

How to get the last consecutive <n> like records in an Oracle query

I have several linked tables and I'm trying to find the sets of data where one of the sub tables has 3 (or some user-set parameter) records in a row with a given value, where the ordering is by date (in another linked table)
Table1
ID LinkID Flag
AA1 AA 30
AA2 AA 30
AA3 AA 60
AA4 AA 30
BB1 BB 30
BB2 BB 30
BB3 BB 30
BB4 BB 40
Table2
TA1 CA 2/1/2013
TA2 CA 1/1/2013
TA3 CA 12/1/2012
TA4 CA 11/1/2012
TB1 CB 2/2/2013
TB2 CB 1/1/2013
TB3 CB 12/1/2012
TB4 CB 11/2/2012
Other tables link them together, but I can get the AA to the CA records linked, and thus a joined result set that has
AA 30 2/1/2013
AA 30 1/1/2013
AA 60 12/1/2012
AA 30 11/1/2012
BB 30 1/1/2013
BB 30 2/2/2013
BB 30 12/1/2012
BB 40 11/2/2012
How do I query so that if they want the records with the last 3 consecutive '30' records I get only BB, but if they want the set with the last 2 consecutive '30' records I get them both? And, of course, for any data that doesn't have a flag of 30 in the most recent record, I don't ever get that data?
I'm starting from an existing query that joins the dozen tables or so, and returns data based on the most current one being 30, and shows the one previous. I think for this modification, I'll need to completely re-organize it, but I'm drawing a blank on how to even approach it, and the above I think shows what I'm trying to do.
I don't need working sql (I didn't provide enough data examples anyway), but rather an sql pseudo-code showing how to approach finding consecutive records with a given value, based on ordering found in another indirectly linked table. Or, for that matter, how to get it if it was all in one table, like the result set above.
It sounds like you'll want to use the LAG and/or LEAD analytic functions. So, for example,
LAG( flag ) OVER (PARTITION BY id ORDER BY date_column DESC) prior_flag_value
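will return the flag value from the prior row for that id, where "prior" follows the window's ordering. Since the ordering here is date_column DESC, that is the next more recent row; drop the DESC to get the previous row in time. You can look back more than one row as well: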
LAG( flag, 2 ) OVER (PARTITION BY id ORDER BY date_column DESC) prior_flag_value
will get you the value from two rows prior. Similarly, you can use LEAD to get the value for the next row.
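As a rough sketch of how LAG could answer the "last 3 consecutive" case, the example below looks at the most recent row per link and checks its flag together with the two rows before it in time. It runs in SQLite through Python's sqlite3 module rather than Oracle. The table name joined and the columns link_id, flag and dt are hypothetical stand-ins for the joined result set in the question, and the dates assume month/day/year.

```python
import sqlite3


def make_db():
    """In-memory stand-in for the joined result set from the question."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE joined (link_id TEXT, flag INTEGER, dt TEXT);
        INSERT INTO joined VALUES
            ('AA', 30, '2013-02-01'), ('AA', 30, '2013-01-01'),
            ('AA', 60, '2012-12-01'), ('AA', 30, '2012-11-01'),
            ('BB', 30, '2013-02-02'), ('BB', 30, '2013-01-01'),
            ('BB', 30, '2012-12-01'), ('BB', 40, '2012-11-02');
    """)
    return conn


def links_with_three_recent(conn, flag):
    """Links whose three most recent rows all carry the given flag.
    LAG is ordered by ascending date, so prev_1 and prev_2 are the one and
    two rows earlier in time; recency = 1 marks the newest row per link."""
    query = """
        WITH lagged AS (
            SELECT link_id, flag,
                   LAG(flag, 1) OVER (PARTITION BY link_id ORDER BY dt) AS prev_1,
                   LAG(flag, 2) OVER (PARTITION BY link_id ORDER BY dt) AS prev_2,
                   ROW_NUMBER() OVER (PARTITION BY link_id
                                      ORDER BY dt DESC) AS recency
            FROM joined
        )
        SELECT link_id
        FROM lagged
        WHERE recency = 1 AND flag = ? AND prev_1 = ? AND prev_2 = ?
        ORDER BY link_id
    """
    return [r[0] for r in conn.execute(query, (flag, flag, flag))]


print(links_with_three_recent(make_db(), 30))
```

This fixes the run length at 3 because each extra row needs its own LAG column, so it is less convenient when the length is a user-set parameter.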
It sounds like you are looking for the number of records that appear consecutively together, based on the date.
To do this, do the following (here col1 would be the link id, col2 the flag, and date_column whatever column holds the date):
(1) enumerate the rows by date using row_number() for each col1 value: row_number() over (partition by col1 order by date_column) as seqnum_1.
(2) enumerate the rows by date using row_number() for each col1, col2 combination: row_number() over (partition by col1, col2 order by date_column) as seqnum_2.
(3) Now seqnum_1 - seqnum_2 is constant within each run of consecutive col2 values, so together with col2 it identifies the groups.
(4) Count this for each group on each record: count(*) over (partition by col1, col2, seqnum_1 - seqnum_2) as thegroupsize.
Now, you can select where thegroupsize has 2 or more elements.
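The steps above can be sketched as follows, adapted to the "most recent run" wording of the question. Both row numbers are ordered by date descending, so the newest run of any flag has a difference of 0. The code runs in SQLite through Python's sqlite3 module rather than Oracle, and the table joined, its columns link_id, flag and dt, and the helper names are all hypothetical.

```python
import sqlite3


def make_db():
    """In-memory stand-in for the joined result set from the question."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE joined (link_id TEXT, flag INTEGER, dt TEXT);
        INSERT INTO joined VALUES
            ('AA', 30, '2013-02-01'), ('AA', 30, '2013-01-01'),
            ('AA', 60, '2012-12-01'), ('AA', 30, '2012-11-01'),
            ('BB', 30, '2013-02-02'), ('BB', 30, '2013-01-01'),
            ('BB', 30, '2012-12-01'), ('BB', 40, '2012-11-02');
    """)
    return conn


def links_with_recent_run(conn, flag, n):
    """Links whose most recent rows form a run of at least n rows with
    the given flag. seqnum_1 counts all rows of a link newest-first and
    seqnum_2 counts only rows sharing the flag, so the two stay equal
    exactly while no other flag has appeared yet."""
    query = """
        WITH numbered AS (
            SELECT link_id, flag,
                   ROW_NUMBER() OVER (PARTITION BY link_id
                                      ORDER BY dt DESC) AS seqnum_1,
                   ROW_NUMBER() OVER (PARTITION BY link_id, flag
                                      ORDER BY dt DESC) AS seqnum_2
            FROM joined
        )
        SELECT link_id
        FROM numbered
        WHERE flag = ? AND seqnum_1 - seqnum_2 = 0
        GROUP BY link_id
        HAVING COUNT(*) >= ?
        ORDER BY link_id
    """
    return [r[0] for r in conn.execute(query, (flag, n))]


db = make_db()
print(links_with_recent_run(db, 30, 3))
print(links_with_recent_run(db, 30, 2))
```

Because the run length is a bound parameter, the same query serves 2, 3, or any other user-set value.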