Cross joining two tables with "using" instead of "on" - sql

I found a SQL query in a book that I am not able to understand. From what I understand there are two tables: date, which has date_id and test_date columns, and a second table, which has date_id and obs_cnt.
select t1.test_date
,sum(t2.obs_cnt)
from date t1
cross join
(transactions join date using (date_id)) as t2
where t1.test_date>=t2.test_date
group by t1.test_date
order by t1.test_date
Can someone help me understand what this code does or what the output will look like?
I understand the obs_cnt variable is being aggregated at a test_date level.
I understand the use of using in place of on. But what I don't get is how the date table is being referenced twice. Does it mean it is being joined twice?

But what I don't get is how the date table is being referenced twice. Does it mean it is being joined twice?
Yes it is, although it's probably easier to think of t2 as a whole rather than as a function of the date table: t2 is the transaction table but with the actual date representation of the test_date rather than an ID.
I assume there's actually some context for all of this in the book, but it looks like this will produce:
one row of output for every row in the date table (t1), in order of test_date
for each row, total up the number of observations for all transactions that happened on or before that date, using our transactions-with-date table t2.
I understand the obs_cnt variable is being aggregated at a test_date level.
It's being aggregated against t1 test_date, which is the constraint we're using to select the rows in t2 that are summed.
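To make the running-total behaviour described above concrete, here is a minimal sketch that runs an equivalent query against an in-memory SQLite database from Python. The sample rows are invented for illustration, the date table is renamed dates, and t2 is written as an explicit subquery instead of the parenthesized join from the book; all three are assumptions made for this example, not part of the original query.

```python
import sqlite3


def running_totals():
    """Return (test_date, cumulative obs_cnt) rows for a tiny sample data set."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical sample data: three dates and four transactions.
    conn.executescript(
        """
        CREATE TABLE dates (date_id INTEGER PRIMARY KEY, test_date TEXT);
        CREATE TABLE transactions (date_id INTEGER, obs_cnt INTEGER);
        INSERT INTO dates VALUES
            (1, '2020-01-01'), (2, '2020-01-02'), (3, '2020-01-03');
        INSERT INTO transactions VALUES (1, 5), (1, 3), (2, 10), (3, 2);
        """
    )
    rows = conn.execute(
        """
        SELECT t1.test_date, SUM(t2.obs_cnt)
        FROM dates t1
        CROSS JOIN (
            -- t2: every transaction with its real date instead of date_id
            SELECT d.test_date, tr.obs_cnt
            FROM transactions tr
            JOIN dates d USING (date_id)
        ) AS t2
        WHERE t1.test_date >= t2.test_date  -- keep transactions on or before t1's date
        GROUP BY t1.test_date
        ORDER BY t1.test_date
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for test_date, total in running_totals():
        print(test_date, total)
```

With this sample data the totals grow from 8 to 18 to 20, which is the cumulative sum of obs_cnt up to and including each date.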

Related

Selecting distinct values from database

I have a table as follows:
ParentActivityID | ActivityID | Timestamp
1 A1 T1
2 A2 T2
1 A1 T1
1 A1 T5
I want to select unique ParentActivityIDs along with a Timestamp. The timestamp can be either the most recent one or the first one that occurs in the table.
I tried to use DISTINCT, but I came to realise that it doesn't work on individual columns. I am new to SQL. Any help in this regard will be highly appreciated.
DISTINCT applies to the whole select list rather than to a single column, so with two columns it returns each distinct pair. The equivalent GROUP BY query is:
SELECT ParentActivityID, Timestamp
FROM MyTable
GROUP BY ParentActivityID, Timestamp
Actually, I want only one row per ParentActivityID. Your solution will give each pair of ParentActivityID and Timestamp. For example, if I have [1, T1], [2, T2], [1, T3], then I want the result to be [1, T3] and [2, T2].
You need to decide which of the many timestamps to pick. If you want the earliest one, use MIN:
SELECT ParentActivityID, MIN(Timestamp)
FROM MyTable
GROUP BY ParentActivityID
Try this:
SELECT [ParentActivityId],
MIN([Timestamp]) AS [FirstTimestamp],
MAX([Timestamp]) AS [RecentTimestamp]
FROM [Table]
GROUP BY [ParentActivityId]
This will give you the first timestamp and the most recent timestamp for each ParentActivityId present in your table. You can pick whichever one you need.
"Group by" is what you need here. Group by ParentActivityID and ask for the most recent timestamp among all rows with the same ParentActivityID:
SELECT ParentActivityID, MAX(Timestamp) FROM Table GROUP BY ParentActivityID
The "group by" operator is like taking rows from a table and putting them in a map whose key is defined by the group by clause (ParentActivityID in this example). You have to define how the grouping handles rows with duplicate keys. For this you have various aggregate functions, which you apply to the columns you want to select that are not part of the key (not listed in the group by clause; think of them as the values in the map).
Some databases (MySQL, for instance, when ONLY_FULL_GROUP_BY is disabled) also allow you to select columns that are not part of the group by clause (not in the key) without applying an aggregate function to them. In that case you get an arbitrary value for the column from one of the rows in the group (this is like blindly overwriting the value in a map with a new value every time). Still, the SQL standard and most databases out there will generally not allow you to do it. In that case you can use an aggregate function such as min() or max(), or first() or last() where the database provides them, to work around it.
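The map analogy above can be sketched side by side with the SQL. This is only an illustration: the table name activity is made up, the rows are the sample rows from the question, and the timestamps are compared as plain strings.

```python
import sqlite3

# Sample rows from the question: (ParentActivityID, ActivityID, Timestamp).
ROWS = [(1, "A1", "T1"), (2, "A2", "T2"), (1, "A1", "T1"), (1, "A1", "T5")]


def latest_with_sql(rows):
    """GROUP BY ParentActivityID and keep MAX(Timestamp) per group."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE activity "
        "(ParentActivityID INTEGER, ActivityID TEXT, Timestamp TEXT)"
    )
    conn.executemany("INSERT INTO activity VALUES (?, ?, ?)", rows)
    result = conn.execute(
        "SELECT ParentActivityID, MAX(Timestamp) FROM activity "
        "GROUP BY ParentActivityID ORDER BY ParentActivityID"
    ).fetchall()
    conn.close()
    return result


def latest_with_dict(rows):
    """The same idea as a map: key = ParentActivityID, value = max timestamp."""
    latest = {}
    for parent_id, _activity_id, ts in rows:
        # The aggregate function decides what happens on a duplicate key.
        if parent_id not in latest or ts > latest[parent_id]:
            latest[parent_id] = ts
    return sorted(latest.items())


if __name__ == "__main__":
    print(latest_with_sql(ROWS))
    print(latest_with_dict(ROWS))
```

Both functions return one entry per ParentActivityID; swapping MAX for MIN (and > for <) would pick the earliest timestamp instead.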
Use a CTE to get the latest row for each parent ID; since the whole row is available in the output, you can select whichever columns you need.
;With cte_parent
As
(SELECT ParentActivityId,ActivityId,TimeStamp
, ROW_NUMBER() OVER(PARTITION BY ParentActivityId ORDER BY TimeStamp desc) RNO
FROM YourTable )
SELECT *
FROM cte_parent
WHERE RNO =1

SQL: Move duplicates to another table where condition

I am quite new to SQL and Stackoverflow, so pardon the layout of my post.
Currently, I am struggling with putting the following workflow into an executable SQL statement:
I have a table containing the following columns:
ID (not unique)
PARTYTYPE (1 or 2)
DATE column
several other, not relevant columns
Now I need to find those observations (rows) that have the same ID and same PARTYTYPE but are not the most recent, i.e. have a date in the DATE column that is earlier than the most recent one for the given combination of PARTYTYPE and ID. The rows that satisfy this condition need to be moved to another table with the same table schema in order to archive them.
Is there an efficient, yet simple way to accomplish this in SQL?
I have been looking for a long time, but since it involves finding duplicates with certain conditions and inserting it into a table, it is a rather specific problem.
This is what I have so far:
INSERT INTO table_history
select ID, PARTYTYPE, count(*) as count_
from table
group by ID, PARTYTYPE, DATE
having DATE = MAX(DATE)
Any help would be appreciated!
The way you describe the SQL almost exactly conforms to a correlated subquery:
INSERT INTO table_history( . . . )
select t.*
from table t
where date < (select max(date)
from table t2
where t2.id = t.id and t2.partytype = t.partytype
);
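As a rough end-to-end check of the correlated-subquery approach, the sketch below runs it on an in-memory SQLite database. The table names party and party_history, the column name event_date, and the sample rows are all assumptions made for this example. Because the question asks for the rows to be moved, the sketch also deletes the archived rows from the source table inside the same transaction; that DELETE is an addition not shown in the answer above.

```python
import sqlite3


def archive_older_rows():
    """Copy non-latest rows per (id, partytype) to a history table, then delete them."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE party (id INTEGER, partytype INTEGER, event_date TEXT);
        CREATE TABLE party_history (id INTEGER, partytype INTEGER, event_date TEXT);
        INSERT INTO party VALUES
            (1, 1, '2020-01-01'), (1, 1, '2020-02-01'),
            (1, 2, '2020-01-15'),
            (2, 1, '2020-03-01'), (2, 1, '2020-03-05'), (2, 1, '2020-03-03');
        """
    )
    # A row is "older" if its date is before the latest date of its own group.
    older = """
        event_date < (SELECT MAX(p2.event_date)
                      FROM party p2
                      WHERE p2.id = party.id
                        AND p2.partytype = party.partytype)
    """
    with conn:  # commit both statements together, or roll both back on error
        conn.execute("INSERT INTO party_history SELECT * FROM party WHERE " + older)
        conn.execute("DELETE FROM party WHERE " + older)
    order = " ORDER BY id, partytype, event_date"
    history = conn.execute("SELECT * FROM party_history" + order).fetchall()
    remaining = conn.execute("SELECT * FROM party" + order).fetchall()
    conn.close()
    return history, remaining


if __name__ == "__main__":
    history, remaining = archive_older_rows()
    print("archived:", history)
    print("kept:", remaining)
```

The latest row of each group never satisfies the condition, so exactly one row per (id, partytype) combination stays in the source table, provided the latest date is not shared by several rows.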

Oracle SQL Developer(4.0.0.12)

First time posting here, hope it goes well.
I am trying to write a query in Oracle SQL Developer that returns a customer_ID from one table and the time of the payment from another. I'm pretty sure that the problem lies in my logic (it has been a long time since I used SQL, back in school, so I'm a bit rusty). I wanted to list the IDs as DISTINCT and ORDER BY the dates ascending, so that only the first date would show up.
However, the returned table contains the same IDs twice or even more often in some cases. I even found the same ID and same DATE a few times while I was scrolling through it.
If you would like to know more please ask!
SELECT DISTINCT
FIRM.customer.CUSTOMER_ID,
FIRM.account_recharge.X__INSDATE FELTOLTES
FROM
FIRM.customer
INNER JOIN FIRM.account
ON FIRM.customer.CUSTOMER_ID = FIRM.account.CUSTOMER
INNER JOIN FIRM.account_recharge
ON FIRM.account.ACCOUNT_ID = FIRM.account_recharge.ACCOUNT
WHERE
FIRM.account_recharge.X__INSDATE BETWEEN TO_DATE('14-01-01', 'YY-MM-DD') AND TO_DATE('14-12-31', 'YY-MM-DD')
ORDER
BY FELTOLTES
Your select works like this because a CUSTOMER_ID indeed has more than one X__INSDATE, therefore the records in the result will be distinct. If you need only the first date then don't use DISTINCT and ORDER BY but try to select for MIN(X__INSDATE) and use GROUP BY CUSTOMER_ID.
SELECT DISTINCT FIRM.customer.CUSTOMER_ID,
FIRM.account_recharge.X__INSDATE FELTOLTES
Distinct is applied to both the columns together, which means you will get a distinct ROW for the set of values from the two columns. So, basically the distinct refers to all the columns in the select list.
It is equivalent to a select without distinct but a group by clause.
It means,
select distinct a, b....
is equivalent to,
select a, b...group by a, b
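Concatenating the columns would not change this: DISTINCT would then work on the single concatenated value, which is still different for every distinct pair of ID and date. To get one row per customer, aggregate the date with MIN and group by CUSTOMER_ID only.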
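The equivalence between DISTINCT and GROUP BY over the same columns can be checked with a small sketch. The table recharge, its columns, and the sample rows are hypothetical stand-ins for the joined FIRM tables, and SQLite is used here instead of Oracle.

```python
import sqlite3


def compare_queries():
    """Return results of DISTINCT a, b; GROUP BY a, b; and MIN(b) GROUP BY a."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE recharge (customer_id INTEGER, ins_date TEXT);
        INSERT INTO recharge VALUES
            (1, '2014-01-05'), (1, '2014-01-05'),
            (1, '2014-02-10'), (2, '2014-03-01');
        """
    )
    # DISTINCT looks at the (customer_id, ins_date) pair as a whole.
    distinct_rows = conn.execute(
        "SELECT DISTINCT customer_id, ins_date FROM recharge "
        "ORDER BY customer_id, ins_date"
    ).fetchall()
    # Grouping by both columns yields the same set of pairs.
    grouped_rows = conn.execute(
        "SELECT customer_id, ins_date FROM recharge "
        "GROUP BY customer_id, ins_date ORDER BY customer_id, ins_date"
    ).fetchall()
    # One row per customer: aggregate the date, group by the ID only.
    first_rows = conn.execute(
        "SELECT customer_id, MIN(ins_date) FROM recharge "
        "GROUP BY customer_id ORDER BY customer_id"
    ).fetchall()
    conn.close()
    return distinct_rows, grouped_rows, first_rows


if __name__ == "__main__":
    for label, rows in zip(("distinct", "grouped", "first"), compare_queries()):
        print(label, rows)
```

In this sample, customer 1 still appears twice under DISTINCT because it has two different dates; only the MIN plus GROUP BY form collapses it to a single row.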

Dedupe and retain record with most recent timestamp

I'm working with a wide dataset with 500+ columns. The dataset contains a customer ID field and a time-stamp field. I'd like to query the data and end up with a table with only one row per customer ID field where the row retained is the row with the most recent timestamp. The query will be run on a Netezza server if that makes a difference. It seems like I could do this with a sub-query, but I can't seem to get syntax that works.
Here is a typical way to approach this problem:
select t.*
from table t
where not exists (select 1
from table t2
where t2.customerid = t.customerid and
t2.timestamp > t.timestamp
);
This rephrases the question to: "Get me all rows from the table where there is no row with the same customer id and a larger timestamp."
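A small runnable sketch of the NOT EXISTS pattern follows, using SQLite from Python rather than Netezza. The table name customer_data, the shortened column name ts, and the payload column (standing in for the 500+ other columns) are assumptions made for this example.

```python
import sqlite3


def latest_row_per_customer():
    """Keep only rows for which no newer row exists for the same customer."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customer_data (customerid INTEGER, ts TEXT, payload TEXT);
        INSERT INTO customer_data VALUES
            (1, '2021-01-01', 'a'),
            (1, '2021-03-01', 'b'),
            (2, '2021-02-01', 'c');
        """
    )
    rows = conn.execute(
        """
        SELECT t.*
        FROM customer_data t
        WHERE NOT EXISTS (SELECT 1
                          FROM customer_data t2
                          WHERE t2.customerid = t.customerid
                            AND t2.ts > t.ts)
        ORDER BY t.customerid
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(latest_row_per_customer())
```

One thing to keep in mind: if two rows for the same customer share the same latest timestamp, both are returned, because neither has a strictly larger timestamp above it.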

Complex sql query involving timestamp to timestamp periods and joins and sums - is it even possible?

I am trying to create a database query that selects rows from one table, builds periods from those rows (using the lag window function), and joins the result with rows from a different table, summing the values of one column for each row of the first table.
Table A:
id,
created_at,
object_id
Table B:
id,
end_time,
value,
object_id
The rows that the query yields should consist of columns something like:
lag(tablea.created_at) over(tablea.object_id, tablea.created_at),
tablea.created_at,
tablea.object_id
sum(tableb.value) where it sums the tableb.value from matching period
I tried creating a query where I put the window function into the WHERE clause, only to get an error. I also tried putting the period into the join's ON clause, but that also raised an error.
It is no problem if it is not possible. I just want to know whether it is possible and, if so, how. If it is not possible, then I will try to come up with an alternative approach.
Edit: link to example sqlfiddle: http://sqlfiddle.com/#!1/c7878
Edit2: The SQL I tried was something like:
SELECT lag(a.created_at, a.created_at, a.object_id, sum(b.value) from tablea a left join tableb b on (something) order by a.object_id, a.created_at
But obviously that did not work, because I could not use a window function in the ON clause. That's where I got stuck.