How to do a query that is agnostic of the sort field? - sql

I have multiple tables that have the same date_time added field in each table. After doing a UNION of all tables i want to sort them by the most recent one. But the query will tell me that the i have to add a table name like videos.date_time rather than ORDER BY date_time. How can i structure the query so that it is agnostic of the which date_time field?

Unless you are using a proprietary feature such as SQL Server's TOP directive, the Order By in a Union query is always at the bottom and always applies to the entire query. E.g.
Select Col1, date_time
From Table1
Union All
Select Col1, date_time
From Table2
Order By date_time
If your query does include various elements such TOP or LIMIT which require an Order By and thus you want to differentiate the Order By's, then you can encapsulate your query into a derived table:
Select Col, date_time
From (
Select Col1 As Col, date_time
From Table1
Union All
Select Col1, date_time
From Table2
) As Z
Order By Z.date_time

In SQL Server you can also order by a column number, e.g. "ORDER BY 2" in which case whatever the second column is in your union set would be the sort target.

As I understand you have X tables (where X is > 1), and every table have it's own date_time column and you want to get last updated. If that's true, than one of the possible ways is to do it that way
SELECT id, date_added FROM table1
UNION ALL
SELECT id, date_added FROM table2
ORDER BY date_added DESC;
Other ways which I have in mind is when you fetch results, put them in array and do the "magic" inside it.

Related

Get minimum without using row number/window function in Bigquery

I have a table like as shown below
What I would like to do is get the minimum of each subject. Though I am able to do this with row_number function, I would like to do this with groupby and min() approach. But it doesn't work.
row_number approach - works fine
SELECT * FROM (select subject_id,value,id,min_time,max_time,time_1,
row_number() OVER (PARTITION BY subject_id ORDER BY value) AS rank
from table A) WHERE RANK = 1
min() approach - doesn't work
select subject_id,id,min_time,max_time,time_1,min(value) from table A
GROUP BY SUBJECT_ID,id
As you can see just the two columns (subject_id and id) is enough to group the items together. They will help differentiate the group. But why am I not able to use the other columns in select clause. If I use the other columns, I may not get the expected output because time_1 has different values.
I expect my output to be like as shown below
In BigQuery you can use aggregation for this:
SELECT ARRAY_AGG(a ORDER BY value LIMIT 1)[SAFE_OFFSET(1)].*
FROM table A
GROUP BY SUBJECT_ID;
This uses ARRAY_AGG() to aggregate each record (the a in the argument list). ARRAY_AGG() allows you to order the result (by value) and to limit the size of the array. The latter is important for performance.
After you concatenate the arrays, you want the first element. The .* transforms the record referred to by a to the component columns.
I'm not sure why you don't want to use ROW_NUMBER(). If the problem is the lingering rank column, you an easily remove it:
SELECT a.* EXCEPT (rank)
FROM (SELECT a.*,
ROW_NUMBER() OVER (PARTITION BY subject_id ORDER BY value) AS rank
FROM A
) a
WHERE RANK = 1;
Are you looking for something like below-
SELECT
A.subject_id,
A.id,
A.min_time,
A.max_time,
A.time_1,
A.value
FROM table A
INNER JOIN(
SELECT subject_id, MIN(value) Value
FROM table
GROUP BY subject_id
) B ON A.subject_id = B.subject_id
AND A.Value = B.Value
If you do not required to select Time_1 column's value, this following query will work (As I can see values in column min_time and max_time is same for the same group)-
SELECT
A.subject_id,A.id,A.min_time,A.max_time,
--A.time_1,
MIN(A.value)
FROM table A
GROUP BY
A.subject_id,A.id,A.min_time,A.max_time
Finally, the best approach is if you can apply something like CAST(Time_1 AS DATE) on your time column. This will consider only the date part regardless of the time part. The query will be
SELECT
A.subject_id,A.id,A.min_time,A.max_time,
CAST(A.time_1 AS DATE) Time_1,
MIN(A.value)
FROM table A
GROUP BY
A.subject_id,A.id,A.min_time,A.max_time,
CAST(A.time_1 AS DATE)
-- Make sure the syntax of CAST AS DATE
-- in BigQuery is as I written here or bit different.
Below is for BigQuery Standard SQL and is most efficient way for such cases like in your question
#standardSQL
SELECT AS VALUE ARRAY_AGG(t ORDER BY value LIMIT 1)[OFFSET(0)]
FROM `project.dataset.table` t
GROUP BY subject_id
Using ROW_NUMBER is not efficient and in many cases lead to Resources exceeded error.
Note: self join is also very ineffective way of achieving your objective
A bit late to the party, but here is a cte-based approach which made sense to me:
with mins as (
select subject_id, id, min(value) as min_value
from table
group by subject_id, id
)
select distinct t.subject_id, t.id, t.time_1, t.min_time, t.max_time, m.min_value
from table t
join mins m on m.subject_id = t.subject_id and m.id = t.id

Get Records By Most Recent Date From two tables

I have two SQL tables. Each has an ID with other columns and a Date.
Is there a way that I can get the result from these two tables in one query sorted by the date? For example, as a result, I may have one record from table 1 followed by two records from table 2 and then another record from table one and so on. I have tried the code below but I think that I am not on the right track.
I would appreciate your help.
SELECT
app.ID as 'AppraisalID',
app.CityName,
app.CountryName,
app.Street,
app.DateCreated,
subApp.ID as 'SubAppraisalID',
subApp.Message,
subApp.DateCreated
From
(
SELECT TOP 10
dbo.Appraisal.ID,
dbo.Appraisal.Street,
dbo.Country.Name as 'CountryName',
dbo.City.Name as 'CityName',
dbo.Appraisal.DateCreated
FROM dbo.Appraisal
INNER JOIN dbo.Country ON dbo.Appraisal.CountryID = dbo.Country.ID
INNER JOIN dbo.City ON dbo.Appraisal.CityID = dbo.City.ID
Order by dbo.Appraisal.DateCreated DESC
) app
Cross Join
(
SELECT TOP 10
dbo.Sub_Appraisal.ID,
dbo.Sub_Appraisal.Message,
dbo.Sub_Appraisal.DateCreated
FROM dbo.Sub_Appraisal
Order by dbo.Sub_Appraisal.DateCreated DESC
) subApp
Order By
app.DateCreated DESC,
subApp.DateCreated DESC
Thanks guys.
What you want to use is the UNION operator, although the column lists for each table (or at least the ones that you are selecting) must match up. You'll want to make sure that you do the ordering after the UNION.
A simplified example:
SELECT
col1,
col2,
some_date
FROM
(
SELECT
col1,
col2,
some_date
FROM
Table1
UNION ALL
SELECT
col1,
col2,
some_date
FROM
Table2
) AS SQ
ORDER BY
some_date
Look at union all. You'll need to make sure that your result columns are the same data type.
select a.id "id", null "message", a.cityname "city", a.countryname "country", a.street "street", a.datecreated "dt"
from dbo.appraisal a
union all
select s.id, s.message, null, null, null, s.datecreated
from dbo.sub_appraisal s
order by 6
However, I suspect that your sub_appraisal table is missing an ID linking it to the appraisal table. This is how you would ideally join the two tables allowing you to accurately get the data out, in the correct order because you cannot guarantee that sub_appraisal records are created directly after appraisal records and before another appraisal record is created. If this happened, your query would give you results you're possibly not expecting.

Select all columns for a distinct column

I need to fetch distinct rows for a particular column from a table having over 200 columns. How can I achieve this?
I used
Select distinct col1 , * from table;
but it failed.
Please suggest an elegant way to achieve this.
Regards,
Tarun
The solution if you want one row for each distinct value in a particular column is:
SELECT col1, MAX(col2), MAX(col3), MAX(col4), ...
FROM mytable
GROUP BY col1
I chose MAX() arbitrarily. It could also be MIN() or some other aggregate function. The point is if you use GROUP BY to make sure to get one row per value in col1, then all the other columns must be inside aggregate functions.
There is no way to write MAX(*) to get that treatment for all columns. Sorry, you will have to write out all the column names (at least those that you need in this query, which might not be all 200).
We can generate a sequence of ROW_NUMBER() for every COL1 and then select the first entry of every sequence.
SELECT * FROM
(
SELECT E.* , ROW_NUMBER() OVER( PARTITION BY COL1 ORDER BY 1) AS ID
FROM YOURTABLE E
) MYDATA
WHERE MYDATA.ID = 1
A working example in fiddle

SQL Server : UNION ALL but remove duplicate IDs by choosing first date of occurrence

I am unioning two queries but I'm getting an ID that occurs in each query. I do not know how to keep only the first time the id occurs. Everything else about the row is different. In general, it will be hard to know which of the two queries I will have to keep a duplicate on, therefore, I need a general solution.
I was thinking about creating a temp table and choosing the min date (once the date has been converted to an int).
Any ideas on the proper syntax?
You can do this using the row_number() function. This will assign a sequential number, starting with 1, to each row with the same id (based on the partition by clause). The ordering of the sequence is determined by the order by clause. So, the following assigns 1 to the earliest date for each id:
select t.*
from (select t.*,
row_number() over (partition by id order by date asc) as seqnum
from ((select *
from <subquery1>
) union all
(select *
from <subquery2>
)
) t
) t
where seqnum = 1;
The final where clause simply filters for the first occurrence.
If you use the keyword UNION, then it will remove duplicates from the two data sets you are working with. UNION ALL preserves duplicates.
You can view the specifics here:
http://www.w3schools.com/sql/sql_union.asp
If you want to only have one of the 2 records and they are not identical you will have to filter them yourself. You may need to do something like the following. THis may be possible to do with the one (select union select) block but this should get you started.
select *
from (
select id
, date
, otherstuf
from table_1
union all
select id
, date
, otherstuf
from table_2
) x1
, (
select id
, date
, otherstuf
from table_1
union all
select id
, date
, otherstuf
from table_2
) x2
where x1.id = x2.id
and x1.date < x2.date
Although rethinking this if you go down a path like this why bother to UNION it?

Fastest way to identify differences between two tables?

I have a need to check a live table against a transactional archive table and I'm unsure of the fastest way to do this...
For instance, let's say my live table is made up of these columns:
Term
CRN
Fee
Level Code
My archive table would have the same columns, but also have an archive date so I can see what values the live table had at a given date.
Now... How would I write a query to ensure that the values for the live table are the same as the most recent entries in the archive table?
PS I'd prefer to handle this in SQL, but PL/SQL is also an option if it's faster.
SELECT term, crn, fee, level_code
FROM live_data
MINUS
SELECT term, crn, fee, level_code
FROM historical_data
Whats on live but not in historical. Can then union to a reverse of this to get whats in historical but not live.
Simply:
SELECT collist
FROM TABLE A
minus
SELECT collist
FROM TABLE B
UNION ALL
SELECT collist
FROM TABLE B
minus
SELECT collist
FROM TABLE A;
You didn't mention how rows are uniquely identified, so I've assumed you also have an "id" column:
SELECT *
FROM livetable
WHERE (term, crn, fee, levelcode) NOT IN (
SELECT FIRST_VALUE(term) OVER (ORDER BY archivedate DESC)
,FIRST_VALUE(crn) OVER (ORDER BY archivedate DESC)
,FIRST_VALUE(fee) OVER (ORDER BY archivedate DESC)
,FIRST_VALUE(levelcode) OVER (ORDER BY archivedate DESC)
FROM archivetable
WHERE livetable.id = archivetable.id
);
Note: This query doesn't take NULLS into account - if any of the columns are nullable you can add suitable logic (e.g. NVL each column to some "impossible" value).
unload to table.unl
select * from table1
order by 1,2,3,4
unload to table2.unl
select * from table2
order by 1,2,3,4
diff table1.unl table2.unl > diff.unl
Could you use a query of the form:
SELECT your columns FROM your live table
EXCEPT
SELECT your columns FROM your archive table WHERE archive date is most recent;
Any results will be rows in your live table that are not in your most recent archive.
If you also need rows in your most recent archive that are not in your live table, simply reverse the order of the selects, and repeat, or get them all in the same query by performing a (live UNION archive) EXCEPT (live INTERSECTION archive)