Tricky SQL SELECT statement - combine two rows into two columns - sql

My problem:
I have a table with a Channel <int> and a Value <float> column, along with a timestamp and a couple of other columns with additional data. Channel is either 1 or 2, and there are either 1 or 2 rows that have everything except channel and value the same.
What I'd like to do is select this data into a new form, where the two channels show up as columns. I tried to do something with GROUP BY, but I couldn't figure out how to get the values into the correct columns based on the channel on the same row.
Example:
For those of you who would rather look at the data I have and the data I want and figure it out from there, here it is. What I have:
Channel Value Timestamp OtherStuff
1 0.2394 2010-07-09 13:00:00 'some other stuff'
2 1.2348 2010-07-09 13:00:00 'some other stuff'
1 24.2348 2010-07-09 12:58:00 'some other stuff'
2 16.3728 2010-07-09 12:58:00 'some other stuff'
1 12.284 2010-07-09 13:00:00 'unrelated things'
2 9.6147 2010-07-09 13:00:00 'unrelated things'
What I want:
Value1 Value2 Timestamp OtherStuff
0.2394 1.2348 2010-07-09 13:00:00 'some other stuff'
24.2348 16.3728 2010-07-09 12:58:00 'some other stuff'
12.284 9.6147 2010-07-09 13:00:00 'unrelated things'
Update in response to some questions that have arisen in comments, and a few follow-up questions/clarifications:
Yes, it is the combination of Timestamp and OtherStuff that links the two rows together. (OtherStuff is actually more than one column, but I simplified for brevity.) There are also a couple of other columns that are not necessarily equal, but should be kept just as they are.
The table in question is already joined from two tables: Value, Channel and Timestamp come from one of them, and the rest come from the other (a total of 7 more columns, of which 4 are always equal for "linked" rows and the other three mostly are not). There have been a couple of suggestions using INNER JOIN - will these still work if I'm already joining stuff together (even though I don't have a myTable to join to itself)?
There are a lot of rows with the same timestamp, so I need information from both the tables I'm joining to figure out which rows to link together.
I have a lot of data. The input comes from measurement devices stationed all over the country, and most of them (if not all) upload measurements (for up to 4 channels) every 2 minutes. Right now we have about 1000 devices online, so this means an addition of approximately 1000 rows every minute on average. I need to consider values that are up to at least 3, preferably 6, hours old, which means 180 000 to 360 000 rows in the table with channel, value and timestamp.

As long as you have something that links the 2 rows, something like this
SELECT
c1.Value AS Value1, c2.Value AS Value2, c1.timestamp, c2.otherstuff
FROM
MyTable c1
JOIN
MyTable c2 ON c1.timestamp = c2.timestamp AND c1.otherstuff = c2.otherstuff
WHERE
c1.Channel = 1 AND c2.Channel = 2
If you don't have anything that links the 2 rows, then it probably can't be done because how do you know they are paired?
If you have 1 or 2 rows (edit: and don't know which channel value you have)
SELECT
c1.Value AS Value1, c2.Value AS Value2,
COALESCE(c1.timestamp, c2.timestamp) AS timestamp,
COALESCE(c1.otherstuff, c2.otherstuff) AS otherstuff
FROM
(
SELECT Value, timestamp, otherstuff
FROM MyTable
WHERE Channel = 1
) c1
FULL OUTER JOIN
(
SELECT Value, timestamp, otherstuff
FROM MyTable
WHERE Channel = 2
) c2 ON c1.timestamp = c2.timestamp AND c1.otherstuff = c2.otherstuff

Something like...
SELECT MAX(CASE Channel WHEN 1 THEN Value ELSE 0 END) AS Value1,
MAX(CASE Channel WHEN 2 THEN Value ELSE 0 END) AS Value2,
Timestamp,
OtherStuff
FROM {tablename}
GROUP BY Timestamp, OtherStuff
(I haven't tested this!)
(and this assumes your Value is always positive!)
Alternatively (see comments below)...
SELECT SUM(CASE Channel WHEN 1 THEN Value ELSE 0 END) AS Value1,
SUM(CASE Channel WHEN 2 THEN Value ELSE 0 END) AS Value2,
Timestamp,
OtherStuff
FROM {tablename}
GROUP BY Timestamp, OtherStuff
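A variation that may be worth trying: if the CASE has no ELSE branch, a missing channel comes back as NULL instead of 0, and MAX no longer depends on the values being positive. Below is a minimal sketch using SQLite through Python's sqlite3 module purely as a convenient test harness. The table name readings, the column names and the last sample row are my own assumptions, not part of the question.

```python
import sqlite3

# In-memory SQLite database; "readings" and its column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings (channel INTEGER, value REAL, ts TEXT, other_stuff TEXT)"
)
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?, ?)",
    [
        (1, 0.2394, "2010-07-09 13:00:00", "some other stuff"),
        (2, 1.2348, "2010-07-09 13:00:00", "some other stuff"),
        (1, 24.2348, "2010-07-09 12:58:00", "some other stuff"),
        (2, 16.3728, "2010-07-09 12:58:00", "some other stuff"),
        (1, 12.284, "2010-07-09 13:00:00", "unrelated things"),
        (2, 9.6147, "2010-07-09 13:00:00", "unrelated things"),
        # Made-up extra row: a negative value with no channel 2 partner.
        (1, -3.5, "2010-07-09 13:02:00", "unrelated things"),
    ],
)

# CASE without ELSE yields NULL for the other channel; MAX skips NULLs,
# so a group without a channel 2 row gets NULL in value2 rather than 0.
rows = conn.execute(
    """
    SELECT MAX(CASE WHEN channel = 1 THEN value END) AS value1,
           MAX(CASE WHEN channel = 2 THEN value END) AS value2,
           ts,
           other_stuff
    FROM readings
    GROUP BY ts, other_stuff
    ORDER BY ts, other_stuff
    """
).fetchall()

for row in rows:
    print(row)
```

The made-up row comes out with value1 = -3.5 and value2 = NULL (None in Python), which the ELSE 0 version would report as 0 and 0.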

SELECT a.Value as Value1, b.Value as Value2,
a.TimeStamp, a.OtherStuff
FROM myTable a INNER JOIN myTable b
ON a.OtherStuff = b.OtherStuff and a.TimeStamp = b.TimeStamp
WHERE a.Channel = 1 AND b.Channel = 2
Written without a query editor.
Edit: INNER JOIN could also be used here.

Related

How can I replace the LAST() function in MS Access with proper ordering on a rather large table?

I have an MS Access database with the two tables, Asset and Transaction. The schema looks like this:
Table ASSET
Key Date1 AType FieldB FieldC ...
A 2023.01.01 T1
B 2022.01.01 T1
C 2023.01.01 T2
.
.
TABLE TRANSACTION
Date2 Key TType1 TType2 TType3 FieldOfInterest ...
2022.05.31 A 1 1 1 10
2022.08.31 A 1 1 1 40
2022.08.31 A 1 2 1 41
2022.09.31 A 1 1 1 30
2022.07.31 A 1 1 1 30
2022.06.31 A 1 1 1 20
2022.10.31 A 1 1 1 45
2022.12.31 A 2 1 1 50
2022.11.31 A 1 2 1 47
2022.05.23 B 2 1 1 30
2022.05.01 B 1 1 1 10
2022.05.12 B 1 2 1 20
.
.
.
The ASSET table has a PK (Key).
The TRANSACTION table has a composite key that is (Key, Date2, TType1, TType2, TType3).
Given the above tables let's see an example:
Input1 = 2022.04.01
Input2 = 2022.08.31
Desired result:
Key FieldOfInterest
A 41
because if the Transactions in scope were to be ordered by Date2, TType1, TType2, TType3 all ascending then the record having FieldOfInterest = 41 would be the last one.
Note that Asset B is not in scope due to Asset.Date1 < Input1, neither is Asset C because AType != T1. Ultimately I am curious about the SUM(FieldOfInterest) of all the last transactions belonging to an Asset that is in scope determined by the input variables.
The following query has so far provided the right results but after upgrading to a newer MS Access version, the LAST() operation is no longer reliably returning the row which is the latest addition to the Transaction table.
I have several input values, but the most important ones are two dates; let's call them InputDate1 and InputDate2.
This is how it worked so far:
SELECT Asset.AType, Last(FieldOfInterest) AS CurrentValue ,Asset.Key
FROM Transaction
INNER JOIN Asset ON Transaction.Key = Asset.Key
WHERE Transaction.Date2 <= InputDate2 And Asset.Date1 >= InputDate1
GROUP BY Asset.Key, Asset.AType
HAVING Asset.AType='T1'
It is known that the grouped records are not guaranteed to be in any order. Obviously it is a mistake to rely on the GROUP BY operation always keeping the original table order, but let's just ignore this for now.
I have been struggling to come up with the right way to do the following:
join the Asset and Transaction tables on Asset.Key = Transaction.Key
filter by Asset.Date1 >= InputDate1 AND Transaction.Date2 <= InputDate2
then I need to select one record for all Transaction.Key where Date2 and TType1 and TType2 and TType3 has the highest value. (this represents the actual last record for given Key)
As far as I know there is no way to order records within a group by clause which is unfortunate.
I have tried ranking, but the Transaction table is large (800k rows) and the performance was very slow; I need something faster. The following is an example of three saved queries that I wrote and chained together, but the performance is very disappointing, probably due to the ranking step.
-- Saved query step_1
SELECT Asset.*, Transaction.*
FROM Transaction
INNER JOIN Asset ON Transaction.Key = Asset.Key
WHERE Transaction.Date2 <= 44926
AND Asset.Date1 >= 44562
AND Asset.aType = 'T1'
-- Saved query step_2
SELECT tr.FieldOfInterest, (SELECT Count(*) FROM
(SELECT tr2.Transaction.Key, tr2.Date2, tr2.Transaction.tType1, tr2.tType2, tr2.tType3 FROM step_1 AS tr2) AS tr1
WHERE (tr1.Date2 > tr.Date2 OR
(tr1.Date2 = tr.Date2 AND tr1.tType1 > tr.Transaction.tType1) OR
(tr1.Date2 = tr.Date2 AND tr1.tType1 = tr.Transaction.tType1 AND tr1.tType2 > tr.tType2) OR
(tr1.Date2 = tr.Date2 AND tr1.tType1 = tr.Transaction.tType1 AND tr1.tType2 = tr.tType2 AND tr1.tType3 > tr.tType3))
AND tr1.Key = tr.Transaction.Key)+1 AS Rank
FROM step_1 AS tr
-- Saved query step_3
SELECT SUM(FieldOfInterest) FROM step_2
WHERE Rank = 1
I hope I am being clear enough so that I can get some useful recommendations. I've been stuck with this for weeks now and really don't know what to do about it. I am open for any suggestions.
Reading the following specification
then I need to select one record for all Transaction.Key where Date2 and TType1 and TType2 and TType3 has the highest value. (this represents the actual last record for given Key)
Consider a simple aggregation for step 2 to retrieve the max values then in step 3 join all fields to first query.
Step 1 (rewritten to avoid name collision and too many columns)
SELECT a.[Key] AS Asset_Key, a.Date1, a.AType,
t.[Key] AS Transaction_Key, t.Date2,
t.TType1, t.TType2, t.TType3, t.FieldOfInterest
FROM Transaction t
INNER JOIN Asset a ON t.[Key] = a.[Key]
WHERE t.Date2 <= 44926
AND a.Date1 >= 44562
AND a.AType = 'T1'
Step 2
SELECT Transaction_Key,
MAX(Date2) AS Max_Date2,
MAX(TType1) AS Max_TType1,
MAX(TType2) AS Max_TType2,
MAX(TType3) AS Max_TType3
FROM step_1
GROUP BY Transaction_Key
Step 3
SELECT s1.*
FROM step_1 s1
INNER JOIN step_2 s2
ON s1.Transaction_Key = s2.Transaction_Key
AND s1.Date2 = s2.Max_Date2
AND s1.TType1 = s2.Max_TType1
AND s1.TType2 = s2.Max_TType2
AND s1.TType3 = s2.Max_TType3
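One caveat worth checking: Step 2 takes each MAX independently, so the combination (Max_Date2, Max_TType1, ...) does not have to exist as a single row, and it is not necessarily the row that sorts last by Date2, TType1, TType2, TType3. A possible alternative is a NOT EXISTS test that keeps a row only if no other row of the same key sorts after it. The sketch below runs in SQLite via Python's sqlite3 module only so it can be tried quickly; the lower-case column names, the ISO date strings and asset D are my own assumptions, and step_1 stands in for the already-filtered saved query. I have not measured it against an 800k-row Access table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# step_1 stands in for the filtered saved query; column names are hypothetical.
conn.execute(
    "CREATE TABLE step_1 (asset_key TEXT, date2 TEXT, ttype1 INTEGER,"
    " ttype2 INTEGER, ttype3 INTEGER, field_of_interest INTEGER)"
)
conn.executemany(
    "INSERT INTO step_1 VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("A", "2022-05-31", 1, 1, 1, 10),
        ("A", "2022-07-31", 1, 1, 1, 30),
        ("A", "2022-08-31", 1, 1, 1, 40),
        ("A", "2022-08-31", 1, 2, 1, 41),
        # Made-up asset D: its latest date does not carry its highest TType1.
        ("D", "2022-08-01", 2, 1, 1, 5),
        ("D", "2022-08-31", 1, 1, 1, 7),
    ],
)

# Keep a row only if no row of the same key sorts after it
# when ordering by (date2, ttype1, ttype2, ttype3).
last_rows = conn.execute(
    """
    SELECT t.asset_key, t.field_of_interest
    FROM step_1 AS t
    WHERE NOT EXISTS (
        SELECT 1
        FROM step_1 AS later
        WHERE later.asset_key = t.asset_key
          AND (later.date2 > t.date2
               OR (later.date2 = t.date2 AND later.ttype1 > t.ttype1)
               OR (later.date2 = t.date2 AND later.ttype1 = t.ttype1
                   AND later.ttype2 > t.ttype2)
               OR (later.date2 = t.date2 AND later.ttype1 = t.ttype1
                   AND later.ttype2 = t.ttype2 AND later.ttype3 > t.ttype3))
    )
    ORDER BY t.asset_key
    """
).fetchall()

# For comparison: joining back on independently computed MAX values.
per_column_max = conn.execute(
    """
    SELECT s.asset_key, s.field_of_interest
    FROM step_1 AS s
    JOIN (SELECT asset_key, MAX(date2) AS d, MAX(ttype1) AS t1,
                 MAX(ttype2) AS t2, MAX(ttype3) AS t3
          FROM step_1
          GROUP BY asset_key) AS m
      ON s.asset_key = m.asset_key AND s.date2 = m.d AND s.ttype1 = m.t1
     AND s.ttype2 = m.t2 AND s.ttype3 = m.t3
    ORDER BY s.asset_key
    """
).fetchall()

total = sum(value for _, value in last_rows)
print(last_rows, per_column_max, total)
```

With this data the NOT EXISTS query returns one row per key (41 for A, 7 for D), while the per-column MAX join finds A but silently drops D, because no single row of D holds all four maxima.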

BQ/SQL join two tables in a way that one column fills up with all distinct values from the other table while remaining columns get a null

Hello everyone, this is my first question here. I have been browsing through the questions but couldn't quite find the answer to my problem:
I have a couple of tables which I need to join. The key I join with is non-unique (in this case it's a date). This is working fine, but now I also need to group the results based on another column without getting cross-join-like results (meaning each value of this column should only appear once, but depending on the table used the column can have different values in each table).
Here is an example of what I have and what I would like to get:
Table1
Date/Key    Group Column  Example Value1
01-01-2022  a             1
01-01-2022  d             2
01-01-2022  e             3
01-01-2022  f             4
Table 2
Date/Key    Group Column  Example Value 2
01-01-2022  a             1
01-01-2022  b             2
01-01-2022  c             3
01-01-2022  d             4
Wanted Result:
Table Result
Date/Key    Group Column  Example Value1  Example Value2
01-01-2022  a             1               1
01-01-2022  b             NULL            2
01-01-2022  c             NULL            3
01-01-2022  d             2               4
01-01-2022  e             3               NULL
01-01-2022  f             4               NULL
I have tried a couple of approaches but I always get results where values in the group column appear multiple times. I am under the impression that full joining and then grouping over the group column should work, but apparently I am missing something. I also figured I could brute-force the result by left joining everything with the ON set to table1.date = table2.date AND table1.Groupcolumn = table2.Groupcolumn etc. and then doing UNIONs of all permutations (so each table was on "the left" once), but this is not only tedious, BigQuery also doesn't like it since it contains too many subqueries.
I feel kinda bad that my first question is something that I should actually know but I hope someone can help me out!
I do not need a full code solution; just a hint to the correct approach would suffice (also, in case I missed it: if this was already answered, I would appreciate just a link to it!)
Edit:
So one solution I came up with, which appears to work, was to select the group column of each table and union them as a with() and then join this "list" onto the first table like
WITH list AS (
SELECT t1.GroupColumn FROM Table_1 t1 WHERE CONDITION1
UNION DISTINCT
SELECT t1.GroupColumn FROM Table_1 t1 WHERE CONDITION2 ... etc.
),
result AS (
SELECT s.GroupColumn, t1.Example_Value1, t2.Example_Value2
FROM Table_1 t1
LEFT JOIN (SELECT * FROM list) s
ON s.GroupColumn = t1.GroupColumn
LEFT JOIN Table_2 t2
ON s.GroupColumn = t2.GroupColumn
AND t1.key = t2.key
...
)
SELECT * FROM result
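For what it's worth, the key-list idea can be sketched so that the list drives the query and each table is left joined onto it; that way a group that exists in only one table still shows up once. The snippet below uses SQLite through Python's sqlite3 module only as a quick test bed (SQLite spells it UNION where BigQuery uses UNION DISTINCT). The snake_case table and column names are assumptions on my part, and the data is the sample from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical snake_case names for the two sample tables.
conn.execute(
    "CREATE TABLE table1 (date_key TEXT, group_column TEXT, example_value1 INTEGER)"
)
conn.execute(
    "CREATE TABLE table2 (date_key TEXT, group_column TEXT, example_value2 INTEGER)"
)
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [("01-01-2022", "a", 1), ("01-01-2022", "d", 2),
     ("01-01-2022", "e", 3), ("01-01-2022", "f", 4)],
)
conn.executemany(
    "INSERT INTO table2 VALUES (?, ?, ?)",
    [("01-01-2022", "a", 1), ("01-01-2022", "b", 2),
     ("01-01-2022", "c", 3), ("01-01-2022", "d", 4)],
)

# Build the distinct list of (date, group) pairs first, then left join
# every table onto that list so no pair is lost or duplicated.
rows = conn.execute(
    """
    WITH keys AS (
        SELECT date_key, group_column FROM table1
        UNION
        SELECT date_key, group_column FROM table2
    )
    SELECT k.date_key, k.group_column, t1.example_value1, t2.example_value2
    FROM keys AS k
    LEFT JOIN table1 AS t1
           ON t1.date_key = k.date_key AND t1.group_column = k.group_column
    LEFT JOIN table2 AS t2
           ON t2.date_key = k.date_key AND t2.group_column = k.group_column
    ORDER BY k.date_key, k.group_column
    """
).fetchall()

for row in rows:
    print(row)
```

On the sample data this gives one row per group a to f, with NULL (None in Python) where a table has no entry. Adding a third table would mean one more SELECT in the key list and one more LEFT JOIN.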
I think what you are looking for is a FULL OUTER JOIN and then you can coalesce the date and group columns. It doesn't exactly look like you need to group anything based on the example data you posted:
SELECT
coalesce(table1.date_key, table2.date_key) AS date_key,
coalesce(table1.group_column, table2.group_column) AS group_column,
table1.example_value_1,
table2.example_value_2
FROM
table1
FULL OUTER JOIN
table2
USING
(date_key,
group_column)
ORDER BY
date_key,
group_column;
Consider below simple approach
select * from (
select *, 'example_value1' type from table1 union all
select *, 'example_value2' type from table2
)
pivot (
any_value(example_value1)
for type in ('example_value1', 'example_value2')
)

SQL UNION ALL only include newer entries from 'bottom' table

Fair warning: I'm new to using SQL. I do so on an Oracle server either via AQT or with SQL Developer.
As I haven't been able to think or search my way to an answer, I put myself in your able hands...
I'd like to combine data from table A (high quality data) with data from table B (fresh data) such that the entries from B are only included when their date stamps are later than those available from table A.
Both tables include entries from multiple entities, and the latest date stamp varies with those entities.
On the 4th of January, the tables may look something like:
A____________________________ B_____________________________
entity date type value entity date type value
X 1.jan 1 1 X 1.jan 1 2
X 1.jan 0 1 X 1.jan 0 2
X 2.jan 1 1 X 2.jan 1 2
Y 1.jan 1 1 (new entry)X 3.jan 1 1
Y 3.jan 1 1 Y 1.jan 1 2
Y 3.jan 1 2
(new entry)Y 4.jan 1 1
I have made an attempt at some code that I hope clarifies my need:
WITH
AA AS
(SELECT entity, date, SUM(value)
FROM table_A
GROUP BY
entity,
date),
BB AS
(SELECT entity, date, SUM(value)
FROM table_B
WHERE date > ALL (SELECT date FROM AA)
GROUP BY
entity,
date
)
SELECT * FROM (SELECT * FROM AA UNION ALL SELECT * FROM BB)
Now, if the WHERE date > ALL (SELECT date FROM AA) would work separately for each entity, I think I would have what I need.
That is, for each entity I want all entries from A, and only newer entries from B.
As the data in table A often differs from that of B (values are often corrected), I don't think I can use something like: table A UNION ALL (table B MINUS table A)?
Thanks
Essentially you are looking for entries in BB which do not exist in AA. When you are doing date > ALL (SELECT date FROM AA) this will not take into consideration the entity in question and you will not get the correct records.
Alternative is to use the JOIN and filter out all matching entries with AA.
Something like below.
WITH
AA AS
(SELECT entity, date, SUM(value) AS value
FROM table_A
GROUP BY
entity,
date),
BB AS
(SELECT b.entity, b.date, SUM(b.value) AS value
FROM table_B b
LEFT OUTER JOIN AA
ON AA.entity = b.entity
AND AA.date = b.date
WHERE AA.date IS NULL
GROUP BY
b.entity,
b.date
)
SELECT * FROM (SELECT * FROM AA UNION ALL SELECT * FROM BB)
I find your question confusing, because I don't know where the aggregation is coming from.
The basic idea on getting newer rows from table_b uses conditions in the where clause, something like this:
select . . .
from table_a a
union all
select . . .
from table_b b
where b.date > (select max(a.date) from table_a a where a.entity = b.entity);
You can, of course, run this on your CTEs, if those are what you really want to combine.
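To see the per-entity comparison in action, here is a small sketch run in SQLite through Python's sqlite3 module (chosen only because it needs no server; the question is about Oracle). The column name day, the year 2023 in the dates and the extra source column are assumptions of mine. The rows mirror the sample tables in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "day" replaces the question's "date" column; the year is made up.
for name in ("table_a", "table_b"):
    conn.execute(
        f"CREATE TABLE {name} (entity TEXT, day TEXT, type INTEGER, value INTEGER)"
    )
conn.executemany(
    "INSERT INTO table_a VALUES (?, ?, ?, ?)",
    [("X", "2023-01-01", 1, 1), ("X", "2023-01-01", 0, 1), ("X", "2023-01-02", 1, 1),
     ("Y", "2023-01-01", 1, 1), ("Y", "2023-01-03", 1, 1)],
)
conn.executemany(
    "INSERT INTO table_b VALUES (?, ?, ?, ?)",
    [("X", "2023-01-01", 1, 2), ("X", "2023-01-01", 0, 2), ("X", "2023-01-02", 1, 2),
     ("X", "2023-01-03", 1, 1),  # new entry for X
     ("Y", "2023-01-01", 1, 2), ("Y", "2023-01-03", 1, 2),
     ("Y", "2023-01-04", 1, 1)],  # new entry for Y
)

# All of A, plus only those B rows that are newer than the latest
# A row of the same entity (correlated subquery on entity).
rows = conn.execute(
    """
    SELECT entity, day, SUM(value) AS total, 'A' AS source
    FROM table_a
    GROUP BY entity, day
    UNION ALL
    SELECT b.entity, b.day, SUM(b.value), 'B'
    FROM table_b AS b
    WHERE b.day > (SELECT MAX(a.day) FROM table_a AS a WHERE a.entity = b.entity)
    GROUP BY b.entity, b.day
    ORDER BY entity, day
    """
).fetchall()

for row in rows:
    print(row)
```

Only the two rows marked as new entries are taken from table_b. One thing to watch: an entity that exists only in table_b gets a NULL maximum from the subquery, so its rows are filtered out unless the comparison is adjusted, for example with COALESCE around the subquery.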
Use UNION instead of UNION ALL; it will remove the duplicate records:
SELECT * FROM (
SELECT *
FROM AA
UNION
SELECT *
FROM BB )

SQL select from multiple tables based on datetime

I am working on a script to analyze some data contained in thousands of tables on a SQL Server 2008 database.
For simplicity's sake, the tables can be broken down into groups of 4-8 semi-related tables. By semi-related I mean that they are data collections for the same item but they do not have any actual SQL relationship. Each table consists of a date-time stamp (datetime2 data type), value (can be a bit, int, or float depending on the particular item), and some other columns that are currently not of interest. The date-time stamp is set for every 15 minutes (on the quarter hour) within a few seconds; however, not all of the data is recorded precisely at the same time...
For example:
TABLE1:
TIMESTAMP VALUE
2014-11-27 07:15:00.390 1
2014-11-27 07:30:00.390 0
2014-11-27 07:45:00.373 0
2014-11-27 08:00:00.327 0
TABLE2:
TIMESTAMP VALUE
2014-11-19 08:00:07.880 0
2014-11-19 08:15:06.867 0.0979999974370003
2014-11-19 08:30:08.593 0.0979999974370003
2014-11-19 08:45:07.397 0.0979999974370003
TABLE3
TIMESTAMP VALUE
2014-11-27 07:15:00.390 0
2014-11-27 07:30:00.390 0
2014-11-27 07:45:00.373 1
2014-11-27 08:00:00.327 1
As you can see, not all of the tables will start with the same quarterly TIMESTAMP. Basically, what I am after is a query that will return the VALUE for each of the 3 tables for every 15 minute interval starting with the earliest TIMESTAMP out of the 3 tables. For the example given, I'd want to start at 2014-11-27 07:15 (don't care about seconds... thus, would need to allow for the timestamp to be +- 1 minute or so). Returning NULL for the value when there is no record for the particular TIMESTAMP is ok. So, the query for my listed example would return something like:
TIMESTAMP VALUE1 VALUE2 VALUE3
2014-11-27 07:15 1 NULL 0
2014-11-27 07:30 0 NULL 0
2014-11-27 07:45 0 NULL 1
2014-11-27 08:00 0 NULL 1
...
2014-11-19 08:00 0 0 1
2014-11-19 08:15 0 0.0979999974370003 0
2014-11-19 08:30 0 0.0979999974370003 0
2014-11-19 08:45 0 0.0979999974370003 0
I hope this makes sense. Any help/pointers/guidance will be appreciated.
Use Full Outer Join
SELECT COALESCE(a.[TIMESTAMP], b.[TIMESTAMP], c.[TIMESTAMP]) [TIMESTAMP],
Isnull(Max(a.VALUE), 0) VALUE1,
Max(b.VALUE) VALUE2,
Isnull(Max(c.VALUE), 0) VALUE3
FROM TABLE1 a
FULL OUTER JOIN TABLE2 b
ON CONVERT(SMALLDATETIME, a.[TIMESTAMP]) = CONVERT(SMALLDATETIME, b.[TIMESTAMP])
FULL OUTER JOIN TABLE3 c
ON CONVERT(SMALLDATETIME, a.[TIMESTAMP]) = CONVERT(SMALLDATETIME, c.[TIMESTAMP])
GROUP BY COALESCE(a.[TIMESTAMP], b.[TIMESTAMP], c.[TIMESTAMP])
ORDER BY [TIMESTAMP] DESC
The first thing I would do is normalize the timestamps to the minute. You can do this with an update to the existing column
UPDATE TABLENAME
SET TIMESTAMP = dateadd(minute,datediff(minute,0,TIMESTAMP),0)
or in a new column
ALTER TABLE TABLENAME ADD NORMTIME DATETIME;
UPDATE TABLENAME
SET NORMTIME = dateadd(minute,datediff(minute,0,TIMESTAMP),0)
For details on flooring dates, see this post: Floor a date in SQL server
The next step is to make a table that has all of the (normalized) timestamps that you expect to see, that is, one row for every 15 minutes. Let's call this table TIME_PERIOD and the column EVENT_TIME for my examples (call it whatever you want).
There are many ways to make such a table: a recursive CTE, ROW_NUMBER(), even brute force. I leave that part up to you.
Now the problem is simple select with left joins and a filter for valid values like this:
SELECT TP.EVENT_TIME, a.VALUE as VALUE1, b.VALUE as VALUE2, c.VALUE as VALUE3
FROM TIME_PERIOD TP
LEFT JOIN TABLE1 a ON a.[TIMESTAMP] = TP.EVENT_TIME
LEFT JOIN TABLE2 b ON b.[TIMESTAMP] = TP.EVENT_TIME
LEFT JOIN TABLE3 c ON c.[TIMESTAMP] = TP.EVENT_TIME
WHERE COALESCE(a.[TIMESTAMP], b.[TIMESTAMP], c.[TIMESTAMP]) is not null
ORDER BY TP.EVENT_TIME DESC
The where might get a little more complex if they are different types so you can always use this (which is not as good as coalesce but will always work):
WHERE a.[TIMESTAMP] IS NOT NULL OR
b.[TIMESTAMP] IS NOT NULL OR
c.[TIMESTAMP] IS NOT NULL
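As an illustration of the whole pipeline (floor to the minute, generate the 15-minute slots with a recursive CTE, left join each table), here is a sketch that runs in SQLite via Python's sqlite3 module. It is not T-SQL: strftime and datetime are SQLite functions standing in for the DATEADD/DATEDIFF flooring, and the names norm1, norm2, norm3, bounds and time_period are my own. The data is the sample from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
samples = {
    "table1": [("2014-11-27 07:15:00.390", 1), ("2014-11-27 07:30:00.390", 0),
               ("2014-11-27 07:45:00.373", 0), ("2014-11-27 08:00:00.327", 0)],
    "table2": [("2014-11-19 08:00:07.880", 0),
               ("2014-11-19 08:15:06.867", 0.0979999974370003),
               ("2014-11-19 08:30:08.593", 0.0979999974370003),
               ("2014-11-19 08:45:07.397", 0.0979999974370003)],
    "table3": [("2014-11-27 07:15:00.390", 0), ("2014-11-27 07:30:00.390", 0),
               ("2014-11-27 07:45:00.373", 1), ("2014-11-27 08:00:00.327", 1)],
}
for name, sample_rows in samples.items():
    conn.execute(f"CREATE TABLE {name} (ts TEXT, value REAL)")
    conn.executemany(f"INSERT INTO {name} VALUES (?, ?)", sample_rows)

rows = conn.execute(
    """
    WITH RECURSIVE
    -- Floor every timestamp to the minute (SQLite flavour of the idea).
    norm1 AS (SELECT strftime('%Y-%m-%d %H:%M:00', ts) AS slot, value FROM table1),
    norm2 AS (SELECT strftime('%Y-%m-%d %H:%M:00', ts) AS slot, value FROM table2),
    norm3 AS (SELECT strftime('%Y-%m-%d %H:%M:00', ts) AS slot, value FROM table3),
    bounds AS (
        SELECT MIN(slot) AS first_slot, MAX(slot) AS last_slot
        FROM (SELECT slot FROM norm1
              UNION ALL SELECT slot FROM norm2
              UNION ALL SELECT slot FROM norm3)
    ),
    -- One row per 15-minute slot between the earliest and latest reading.
    time_period(event_time) AS (
        SELECT first_slot FROM bounds
        UNION ALL
        SELECT datetime(event_time, '+15 minutes')
        FROM time_period, bounds
        WHERE event_time < last_slot
    )
    SELECT tp.event_time, a.value, b.value, c.value
    FROM time_period AS tp
    LEFT JOIN norm1 AS a ON a.slot = tp.event_time
    LEFT JOIN norm2 AS b ON b.slot = tp.event_time
    LEFT JOIN norm3 AS c ON c.slot = tp.event_time
    WHERE a.slot IS NOT NULL OR b.slot IS NOT NULL OR c.slot IS NOT NULL
    ORDER BY tp.event_time
    """
).fetchall()

for row in rows:
    print(row)
```

Flooring to the minute is enough here because every sample reading lands a few seconds after the quarter hour; readings that can arrive shortly before it would need rounding instead of flooring.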
Here is an updated version of NoDisplayName's answer that does what you want. It works for SQL 2012, but you could replace the DATETIMEFROMPARTS function with a series of other functions to get the same result.
;WITH
NewT1 as (
SELECT DATETimeFROMPARTS( DATEPART(year,Timestamp) , DATEPART(month,timestamp) , datepart(day,timestamp),datepart(hour,timestamp), datepart(minute,timestamp),0,0 ) as TimeStamp, Value
FROM Table1),
NewT2 as (
SELECT DATETimeFROMPARTS( DATEPART(year,Timestamp) , DATEPART(month,timestamp) , datepart(day,timestamp),datepart(hour,timestamp), datepart(minute,timestamp),0,0 ) as TimeStamp, Value
FROM Table2),
NewT3 as (
SELECT DATETimeFROMPARTS( DATEPART(year,Timestamp) , DATEPART(month,timestamp) , datepart(day,timestamp),datepart(hour,timestamp), datepart(minute,timestamp),0,0 ) as TimeStamp, Value
FROM Table3)
SELECT COALESCE(a.[TIMESTAMP], b.[TIMESTAMP], c.[TIMESTAMP]) [TIMESTAMPs],
Isnull(Max(a.VALUE), 0) VALUE1,
Isnull(Max(b.VALUE), 0) VALUE2,
Isnull(Max(c.VALUE), 0) VALUE3
FROM NewT1 a
FULL OUTER JOIN NewT2 b
ON a.[TIMESTAMP] = b.[TIMESTAMP]
FULL OUTER JOIN NewT3 c
ON COALESCE(a.[TIMESTAMP], b.[TIMESTAMP]) = c.[TIMESTAMP]
GROUP BY COALESCE(a.[TIMESTAMP], b.[TIMESTAMP], c.[TIMESTAMP])
ORDER BY [TIMESTAMPs]

Exclude value of a record in a group if another is present v2

In the example table below, I'm trying to figure out a way to sum amount over marks in two situations: the first, when mark 'C' exists within a single id, and the second, when mark 'C' doesn't exist within an id (see id 1 or 2). In the first situation, I want to exclude the amount against mark 'A' within that id (see id 3 in the desired conversion table below). In the second situation, I want to perform no exclusion and take a simple sum of the amounts against the marks.
In other words, for id's containing both mark 'A' and 'C', I want to make the amount against 'A' as zero. For id's that do not contain mark 'C' but contain mark 'A', keep the original amount against mark 'A'.
My desired output is at the bottom. I've considered trying to partition over id or use the EXISTS command, but I'm having trouble conceptualizing the solution. If any of you could take a look and point me in the right direction, it would be greatly appreciated :)
example table:
id mark amount
------------------
1 A 1
2 A 3
2 B 2
3 A 1
3 C 3
desired conversion:
id mark amount
------------------
1 A 1
2 A 3
2 B 2
3 A 0
3 C 3
desired output:
mark sum(amount)
--------------------
A 4
B 2
C 3
You could slightly modify my previous answer and end up with this:
SELECT
mark,
sum(amount) AS sum_amount
FROM atable t
WHERE mark <> 'A'
OR NOT EXISTS (
SELECT *
FROM atable
WHERE id = t.id
AND mark = 'C'
)
GROUP BY
mark
;
There's a live demo at SQL Fiddle.
Try:
select
mark,
sum(amount)
from ( select
id,
mark,
case
when (mark = 'A' and id in (select id from atable where mark = 'C')) then 0
else amount
end as amount
from atable ) t1
group by mark
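Both suggestions can be checked against the sample data. The sketch below does that in SQLite through Python's sqlite3 module, which is only a convenient way to run the SQL; the table name atable follows the first answer, everything else comes from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE atable (id INTEGER, mark TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO atable VALUES (?, ?, ?)",
    [(1, "A", 1), (2, "A", 3), (2, "B", 2), (3, "A", 1), (3, "C", 3)],
)

# Variant 1: filter out 'A' rows whose id also has a 'C' row.
filtered = conn.execute(
    """
    SELECT mark, SUM(amount) AS sum_amount
    FROM atable AS t
    WHERE mark <> 'A'
       OR NOT EXISTS (SELECT 1 FROM atable WHERE id = t.id AND mark = 'C')
    GROUP BY mark
    ORDER BY mark
    """
).fetchall()

# Variant 2: keep every row but count such 'A' rows as 0.
zeroed = conn.execute(
    """
    SELECT mark, SUM(amount) AS sum_amount
    FROM (SELECT id, mark,
                 CASE WHEN mark = 'A'
                           AND id IN (SELECT id FROM atable WHERE mark = 'C')
                      THEN 0 ELSE amount END AS amount
          FROM atable) AS t1
    GROUP BY mark
    ORDER BY mark
    """
).fetchall()

print(filtered)
print(zeroed)
```

On this data both variants give A = 4, B = 2, C = 3. They would differ if every 'A' row were excluded: the filtering variant then returns no row for A at all, while the zeroing variant returns A with a sum of 0.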