SQL query searching a common field in all tables - sql

All tables in a DB have the fields creationdate and revisiondate, which are date fields just as you'd think. Looking for a SQL query to find all instances where creationdate > '2017-02-01'. I'm not able to find an example where you loop through each table to return all new records as of X date in a DB. The DB has 1000 tables so I need to be able to search dynamically. The one table version of the query is (select * from tableA where creationdate > '2017-02-01') I just need to do that against all tables. Thanks!!!!

SELECT schema.column_1, schema.column2
FROM table_name_1
WHERE creation_date > '2017-02-01'
UNION
SELECT schema.column_same_datatype, schema.column2_same_datatype
FROM table_name_2
WHERE creation_date > '2017-02-01';
NOTE: Be careful with the date format — each branch of the UNION needs its own WHERE clause, and an unambiguous literal such as 'YYYY-MM-DD' (ISO 8601) is the safest choice across databases.
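With 1000 tables, writing the UNION by hand isn't practical. A minimal sketch of a dynamic approach, assuming SQL Server (the variable names, the INFORMATION_SCHEMA lookup, and the cursor loop are illustrative; adapt the metadata view and quoting to your RDBMS):

```sql
-- Build and run one query per table that has a creationdate column.
-- Assumes SQL Server; @tbl, @sql, and tbl_cursor are illustrative names.
DECLARE @tbl sysname, @sql nvarchar(max);

DECLARE tbl_cursor CURSOR FOR
    SELECT TABLE_SCHEMA + '.' + TABLE_NAME
    FROM INFORMATION_SCHEMA.COLUMNS
    WHERE COLUMN_NAME = 'creationdate';

OPEN tbl_cursor;
FETCH NEXT FROM tbl_cursor INTO @tbl;
WHILE @@FETCH_STATUS = 0
BEGIN
    -- Tag each result set with the table it came from.
    SET @sql = N'SELECT ''' + @tbl + N''' AS source_table, *'
             + N' FROM ' + @tbl
             + N' WHERE creationdate > ''2017-02-01'';';
    EXEC sp_executesql @sql;
    FETCH NEXT FROM tbl_cursor INTO @tbl;
END
CLOSE tbl_cursor;
DEALLOCATE tbl_cursor;
```

Each table comes back as its own result set because the column lists differ; if all the tables shared one schema you could instead concatenate the generated SELECTs into a single UNION ALL string. In production, wrap the identifiers with QUOTENAME to guard against odd table names.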

Related

Querying all partitioned tables

I have around 600 date-sharded tables under table.ga_session, one per day, each with its own unique name: for example, the table for 30/12/2021 is named table.ga_session_20211230. The same goes for the other tables; the naming format is table.ga_session_YYYYMMDD.
Now, when I try to query all of these tables at once, a command like the following does not work; the error says that _PARTITIONTIME is unrecognized:
SELECT
*,
_PARTITIONTIME pt
FROM `table.ga_sessions_20211228`
where _PARTITIONTIME
BETWEEN TIMESTAMP('2019-01-01')
AND TIMESTAMP('2020-01-02')
I also tried this, and it does not work either:
select *
from between `table.ga_sessions_20211228`
and
`table.ga_sessions_20211229`
I also cannot use FROM 'table.ga_sessions' with a WHERE clause to select a range of time, as that table does not exist. How do I query all of these sharded tables? Thank you in advance!
You can query using wildcard tables. For example:
SELECT max
FROM `bigquery-public-data.noaa_gsod.gsod*`
WHERE _TABLE_SUFFIX = '1929'
This will specifically query the gsod1929 table, but the _TABLE_SUFFIX filter can be dropped if you want every table that matches the wildcard.
In your scenario you could do:
select *
from `table.ga_sessions_*`
WHERE _TABLE_SUFFIX BETWEEN '20190101' AND '20200102'
For more information see the documentation here:
https://cloud.google.com/bigquery/docs/reference/standard-sql/wildcard-table-reference

Compare columns from two Oracle databases, matching field is date, one has time one doesn't

I need to compare and find a percent error for several columns from two different Oracle databases. DB1's DATE_TIME column only has the date, no time (DD-MON-YY); DB2's DATE_TIME column has both date and time. Each row represents one hour, and DB1 is in the correct order by hour, but with no actual times. I need the relevant columns to match up between id and date_time (specifically, by hour), but a WHERE clause testing for DATE equality only returns the DB1 entry corresponding to 12:00:00 AM, because DB1's dates carry no time component, so I can't compare the correct entries for the other hours. How can I get around this?
Code below to better illustrate:
SELECT db1.field1, db2.field1, db1.date_time, db2.date_time
FROM db1, db2
WHERE db1.date_time = db2.date_time AND db1.id = X
ORDER BY db2.date_time DESC;
This query runs, but none of the data actually matches because it's only returning the first row of each day from DB1 (corresponding to 12:00:00 AM).
I've thought of somehow inserting corresponding time stamps to DB1 DATE_TIME column based off position so I can include time in the WHERE, but not sure how to do that, or if it will even work. I've seen that running a test query using BETWEEN day1 and day2 (instead of =) returns the results I want for a given range of days, but I'm not sure how to implement that in the JOIN that I'm trying to do with DB2.
Any ideas?
If you care about performance, I would suggest that you create a function-based index on db2:
create index idx_db2_datetime_date on db2(trunc(date_time));
Then, you can use this construct in the query:
SELECT db1.field1, db2.field1, db1.date_time, db2.date_time
FROM db1 JOIN db2
ON db1.date_time = trunc(db2.date_time)
WHERE db1.id = X
ORDER BY db2.date_time DESC;
For this query, an index on db1(id, date_time) is also helpful.
The indexes are not necessary for the query to work, but the function-based index is a nice way to write a performant query with a function in the ON clause.
Note: Learn to use proper, explicit JOIN syntax. Never use commas in the FROM clause.
For a SELECT, you might want to try something along these lines:
SELECT db1_.field1, db2.field1, db1_.date_time, db2.date_time
FROM (
SELECT
id
, field1
, date_time + (RANK() OVER (PARTITION BY date_time ORDER BY id) - 1) / 24 date_time
FROM DB1
) db1_
JOIN db2
ON db1_.date_time = TRUNC(db2.date_time, 'HH24')
WHERE db1_.id = X
ORDER BY db2.date_time DESC;
If you prefer to get the hours added to DB1.date_time, please, try:
UPDATE
DB1 TDB1
SET TDB1.date_time =
(SELECT
d_t
FROM
(SELECT
id
, (date_time + (RANK() OVER (PARTITION BY date_time ORDER BY id) - 1) / 24) d_t
FROM DB1) sub
WHERE sub.id = TDB1.id
)
;
Sorry, no suitable test data to verify in full at this time.
Please comment if and as this requires adjustment / further detail.

SQL: Move duplicates to another table where condition

I am quite new to SQL and Stackoverflow, so pardon the layout of my post.
Currently, I am struggling with putting the following workflow into an executable SQL statement:
I have a table containing the following columns:
ID (not unique)
PARTYTYPE (1 or 2)
DATE column
several other, not relevant columns
Now I need to find those observations (rows) that have the same ID and same PARTYTYPE but are not the most recent, i.e. have a date in the DATE column that is less than the most recent for the given combination of PARTYTYPE and ID. The rows that satisfy this condition need to be moved to another table with the same table scheme in order to archive them.
Is there an efficient, yet simple way to accomplish this in SQL?
I have been looking for a long time, but since it involves finding duplicates with certain conditions and inserting it into a table, it is a rather specific problem.
This is what I have so far:
INSERT INTO table_history
select ID, PARTYTYPE, count(*) as count_
from table
group by ID, PARTYTYPE, DATE
having DATE = MAX(DATE)
Any help would be appreciated!
The way you describe the SQL almost exactly conforms to a correlated subquery:
INSERT INTO table_history( . . . )
select t.*
from table t
where date < (select max(date)
from table t2
where t2.id = t.id and t2.partytype = t.partytype
);
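The correlated subquery above copies the superseded rows, but the question asks to move them. A sketch of the full move, assuming the two statements run in one transaction with no concurrent writers; note that dialects differ on whether the DELETE target may be read inside its own subquery (older MySQL forbids it), and SQL Server can combine both steps with DELETE ... OUTPUT:

```sql
-- 1. Copy the superseded rows into the archive table.
INSERT INTO table_history
SELECT t.*
FROM table t
WHERE t.date < (SELECT MAX(t2.date)
                FROM table t2
                WHERE t2.id = t.id AND t2.partytype = t.partytype);

-- 2. Remove the same rows from the source table.
DELETE FROM table
WHERE date < (SELECT MAX(t2.date)
              FROM table t2
              WHERE t2.id = table.id
                AND t2.partytype = table.partytype);
```

Because both statements use the same condition, any row archived in step 1 is exactly the set deleted in step 2, as long as nothing inserts a newer row for an (id, partytype) pair in between.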

How to write two counts and the newest record in Hive

My data format is like this:
query  guid  result  time
I want to write a SQL query like:
select
query,
count(query),
count(distinct guid),
result
from
table
group by
query
The second column means the number of rows with the same query, the third column the number of distinct guids, and the fourth column the newest result: the same query may have several results, and we choose the newest one by time. Since the logic is a little complex, how can I write a single SQL query to do all of this?
select a.md5, a.cnt, a.wide, b.main_level
from (
  select md5, count(md5) cnt, count(distinct guid) wide, max(time) maxtime
  from hive
  group by md5
) a
join hive b on a.maxtime = b.time;
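The answer above uses column names (md5, main_level) from a different schema and joins only on the timestamp, which can cross-match rows from different groups if two queries share a max time. Mapped onto the question's own columns and joined on both the group key and the timestamp, the same pattern looks like this sketch (the table name t is a placeholder; in Hive, `time` may need backquoting since it can clash with a keyword):

```sql
SELECT a.query,
       a.cnt,    -- number of rows with the same query
       a.wide,   -- number of distinct guids for that query
       b.result  -- the newest result, chosen by time
FROM (
  SELECT query,
         COUNT(*) AS cnt,
         COUNT(DISTINCT guid) AS wide,
         MAX(time) AS maxtime
  FROM t
  GROUP BY query
) a
JOIN t b
  ON a.query = b.query
 AND a.maxtime = b.time;
```

If two rows for the same query share the exact max time, this still returns both; deduplicate with ROW_NUMBER() if one row per query is required.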

T-SQL dates comparison

Is a result set of the following query:
SELECT * FROM Table
WHERE Date >= '20130101'
equals to result set of the following query:
SELECT * FROM Table
WHERE Date = '20130101'
UNION ALL
SELECT * FROM Table
WHERE Date > '20130101'
?
Date is a DATETIME field.
In terms of the result set, yes; in terms of performance, no.
The second query can be slower: the first one scans the table only once, while the second scans it twice because of the UNION ALL (one SELECT statement is generally faster than two combined SELECT statements).
So I'd rather go with the first one.