Optimal SQL query for querying 31 tables (containing datestamp in tablename) - sql

Fairly new to SQL and I was stumped on this question I received in an interview recently.
The question was along the lines of how would you count the total occurrences of 'True' for Column B in July.
The problem was that there was no date or timestamp column in the table. Instead, the table naming convention was defined as "ProductX_YYYYMMDD", the assumption being that a new table is created for each day's data dump.
Is there an efficient query I can write to obtain the True COUNTs of Column B for each table (which doesn't involve ~30 JOIN or UNION statements to get the answer)?

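Split each table name on the underscore to isolate the date part, for example with STRING_SPLIT(tableName, '_').
Then take the month digits out of the YYYYMMDD part:
SELECT LEFT(RIGHT(datePart, 4), 2) AS MM
Now you have a temp table holding only the month (MM) of each daily table, and you can keep the July tables with
WHERE MM = '07'
For each remaining daily table, run
SELECT COUNT(*) FROM dailyTable WHERE ColumnB = 'True'
and add the count of every daily table to a variable.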
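A minimal sketch of that idea, written in Python with SQLite rather than any particular server database. The sample tables, their contents, and the use of sqlite_master as the catalog are assumptions made up for illustration; on another database the catalog view and the way dynamic SQL is executed would differ. Table names cannot be passed as bound parameters, so they are read from the catalog and interpolated into the query text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical daily tables following the ProductX_YYYYMMDD convention.
sample = {
    "ProductX_20210630": ["True", "True"],
    "ProductX_20210701": ["True", "False", "True"],
    "ProductX_20210702": ["False"],
    "ProductX_20210731": ["True"],
}
for name, values in sample.items():
    conn.execute(f'CREATE TABLE "{name}" (ColumnB TEXT)')
    conn.executemany(f'INSERT INTO "{name}" VALUES (?)', [(v,) for v in values])

# Look up the July 2021 tables in the catalog instead of listing them by hand.
july_tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name GLOB 'ProductX_202107[0-9][0-9]' "
        "ORDER BY name"
    )
]

# Count the 'True' rows per table, then add the per-table counts together.
counts = {
    name: conn.execute(
        f'SELECT COUNT(*) FROM "{name}" WHERE ColumnB = ?', ("True",)
    ).fetchone()[0]
    for name in july_tables
}
total_true = sum(counts.values())
print(counts, total_true)
```

The June table is skipped by the name filter, so only the three July tables contribute to the total.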

Related

Null values not being returned in Postgresql

I have two tables in PostgreSQL that I need to join while keeping the null values, so that I can fill in other values in another column of the joined result.
Table one:
I filtered by date, because this data is generated daily and I only need the current_date
Table two: All names.
In table two I have 9 names that are not found in table one.
When I try to perform the join, I only get the 9 names from table one as a result.
Trying with date from table one to current_date
But if I don't filter the date from table one, the null value is returned.
That is, the name that is in table two but not in table one.
What I need is to join the two tables and where there is no asset referring to the second table, fill it with 0 (zero).
In this part I understood that I must use COALESCE(vcm.ativo,0).
But first I need the names of the second table to appear as well.
The result should be like this:
If someone could help me, I'll be grateful.
As pointed out in a comment by the asker, the solution turned out to be
with todays_data as (
select vcm.cooperativa, vcm.ativo
from sga_bi.veiculos_coop_mensal as vcm
where data = current_date
)
select coop.nome, COALESCE(vcmm.ativo,0)
from sga.cooperativas as coop
left outer join todays_data as vcmm
on coop.nome = vcmm.cooperativa
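To see why moving the date filter into the CTE matters, here is a small sketch in Python with SQLite. The table contents, the three names, and the fixed "today" value are made-up assumptions; only the shape of the two queries mirrors the thread. Filtering the right-hand table in WHERE discards the NULL-extended rows of the left join, while filtering before the join keeps every name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cooperativas (nome TEXT);
    CREATE TABLE veiculos_coop_mensal (cooperativa TEXT, ativo INTEGER, data TEXT);
    INSERT INTO cooperativas VALUES ('A'), ('B'), ('C');
    INSERT INTO veiculos_coop_mensal VALUES
        ('A', 5, '2024-01-02'),   -- row for "today" in this sketch
        ('A', 4, '2024-01-01'),
        ('B', 7, '2024-01-01');   -- B only has an older row, C has none
""")
today = "2024-01-02"  # stands in for current_date

# Date filter in WHERE: rows where vcm.data is NULL or older are removed.
filtered_in_where = conn.execute("""
    SELECT coop.nome, COALESCE(vcm.ativo, 0)
    FROM cooperativas AS coop
    LEFT JOIN veiculos_coop_mensal AS vcm ON coop.nome = vcm.cooperativa
    WHERE vcm.data = ?
    ORDER BY coop.nome
""", (today,)).fetchall()

# Date filter applied before the join: every name survives.
filtered_before_join = conn.execute("""
    WITH todays_data AS (
        SELECT cooperativa, ativo FROM veiculos_coop_mensal WHERE data = ?
    )
    SELECT coop.nome, COALESCE(t.ativo, 0)
    FROM cooperativas AS coop
    LEFT JOIN todays_data AS t ON coop.nome = t.cooperativa
    ORDER BY coop.nome
""", (today,)).fetchall()

print(filtered_in_where)
print(filtered_before_join)
```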

SQL for append rows based on max date

This is more of a logic question as I am having a hard time wrapping my head around it.
Say I have table 1 that is truncated and populated everyday, and a time stamp column is added onto it. Everyday new records would be added to the table.
Table 1 is copied to table 2 initially, but on subsequent runs I only want to add the new records from table 1 into table 2.
I know this will be a mixture of matching the columns and only importing the MAX DATES, but I am confused about the actual logic of the query.
So in short I want to append only the latest rows from table 1 to table 2 based on the max date.
If you want to sync the tables daily, you may just look for timestamp_column > current_Date.
If you want to get the max dates, you can write something like this:
INSERT INTO table2 (x,y,z, timestamp_column)
SELECT x,y,z, current_timestamp() FROM table1
WHERE timestamp_column >
(SELECT IFNULL(MAX(timestamp_column), '0001-01-01' ) FROM table2);
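The max-date pattern can be tried out locally with a sketch like the following, which uses Python and SQLite instead of Snowflake. The single x column, the sample timestamps, and the reload_table1 helper are hypothetical. Unlike the statement above, this sketch copies the source row's own timestamp_column into table2 rather than current_timestamp(), which is an assumption about what should be stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (x INTEGER, timestamp_column TEXT);
    CREATE TABLE table2 (x INTEGER, timestamp_column TEXT);
""")

# Append only rows newer than anything already in table2.
# IFNULL supplies a very old date for the first run, when table2 is empty.
APPEND_NEW_ROWS = """
    INSERT INTO table2 (x, timestamp_column)
    SELECT x, timestamp_column FROM table1
    WHERE timestamp_column >
        (SELECT IFNULL(MAX(timestamp_column), '0001-01-01') FROM table2)
"""

def reload_table1(rows):
    """Hypothetical daily load: truncate table1 and repopulate it."""
    conn.execute("DELETE FROM table1")
    conn.executemany("INSERT INTO table1 VALUES (?, ?)", rows)

# First run: table2 is empty, so everything is copied.
reload_table1([(1, "2024-01-01 08:00:00"), (2, "2024-01-01 09:00:00")])
conn.execute(APPEND_NEW_ROWS)

# Second run: row 2 is already present, only row 3 is newer than the max.
reload_table1([(2, "2024-01-01 09:00:00"), (3, "2024-01-02 08:00:00")])
conn.execute(APPEND_NEW_ROWS)

rows_in_table2 = conn.execute("SELECT x FROM table2 ORDER BY x").fetchall()
print(rows_in_table2)
```

The comparison is on ISO-formatted text here, which sorts in date order; a real timestamp column would compare natively.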
On the other hand, I think Snowflake streams are a very good fit for this task:
https://docs.snowflake.com/en/user-guide/streams-intro.html
You can create an "Append-only" stream on table1, and use it as a source when synchronizing to table2.

I need to retrieve a column from already retrieved column from a table

I have a table with a column like this, which I retrieved with this query:
select distinct HDD_WP_RPTNG_AS_OF_SID
from wcadbo.WCA_MDW_D_HLDNGS_DATE
order by HDD_WP_RPTNG_AS_OF_SID desc;
Table:
HDD_WP_RPTNG_AS_OF_SID
20210501
20210430
20210429
20210428
It contains dates in integer format.
I wrote a query to retrieve another column of these dates in date format and I named column as AS_OF_DATE - like this:
SELECT DISTINCT
HDD_WP_RPTNG_AS_OF_SID,
to_date(HDD_WP_RPTNG_AS_OF_SID,'YYYYMMDD') AS_OF_DATE
FROM
WCADBO.WCA_MDW_D_HLDNGS_DATE
ORDER BY
HDD_WP_RPTNG_AS_OF_SID DESC;
Result set:
HDD_WP_RPTNG_AS_OF_SID AS_OF_DATE
----------------------------------
20210501 01-MAY-21
20210430 30-APR-21
20210429 29-APR-21
20210428 28-APR-21
Now I need another column, Display_Date, of char type, which gives Last_Available_date for the latest date in the previous column and gives the date as a string for all other dates, like this
I wrote this query, but it is not working:
SELECT
HDD_WP_RPTNG_AS_OF_SID,
AS_OF_DATE,
Display_date
FROM
(SELECT DISTINCT
HDD_WP_RPTNG_AS_OF_SID,
to_date(HDD_WP_RPTNG_AS_OF_SID,'YYYYMMDD') AS_OF_DATE
FROM
WCADBO.WCA_MDW_D_HLDNGS_DATE
ORDER BY
HDD_WP_RPTNG_AS_OF_SID DESC)
WHERE
Display_Date = (CASE
WHEN AS_OF_DATE = '01-MAY-21'
THEN 'Last_Available_date'
ELSE TO_CHAR(AS_OF_DATE, 'MON DD YYYY')
END);
Finally, I need three columns. One is already in the table but modified a bit. The other two are temporary ones (AS_OF_DATE and Display_Date) that I need to retrieve.
I'm a beginner in SQL and couldn't figure out how to retrieve a column from another temporary column.
Kindly help, Thank you.
BTW I was doing it in Oracle SQL Developer
It looks like you want something like this
SELECT
subQ.HDD_WP_RPTNG_AS_OF_SID,
subQ.AS_OF_DATE,
(CASE
WHEN subQ.AS_OF_DATE = date '2021-05-01'
THEN 'Last_Available_date'
ELSE TO_CHAR(subQ.AS_OF_DATE, 'MON DD YYYY')
END) Display_date
FROM
(SELECT DISTINCT
tbl.HDD_WP_RPTNG_AS_OF_SID,
to_date(tbl.HDD_WP_RPTNG_AS_OF_SID,'YYYYMMDD') AS_OF_DATE
FROM
WCADBO.WCA_MDW_D_HLDNGS_DATE tbl) subQ
ORDER BY
subQ.HDD_WP_RPTNG_AS_OF_SID DESC
Comments
If you want to add computed columns, that is done in the projection (the select list)
Always compare dates to dates and strings to strings. So in your case statement, compare the date as_of_date against another date. In this case I'm using a date literal. You could also call to_date on a string parameter.
If you want the results of the query ordered by a particular column, you want that order by applied at the outermost layer of the query, not in an inline view.
You basically always want to use aliases when referring to any column in a query. It's less critical in situations where everything is coming from one table, but as soon as you start referencing multiple tables in a query, it becomes annoying to look at a query and not be sure where a column is coming from. Even in a query like this, where there is an inline view and an outer query, it is easier to read the query if you're explicit about where the columns are coming from.
Do you really need the distinct? I kept it because it was a part of the original query, but I get antsy whenever I see a distinct, particularly from people learning SQL. Doing a distinct is a relatively expensive operation, and it is very commonly used to cover up an underlying issue in the query (i.e. that you're getting multiple rows because some other column you aren't showing has multiple values) that ought to be addressed correctly (i.e. by adding an additional predicate to ensure that you're only getting each hdd_wp_rptng_as_of_sid once).
Storing dates as strings in tables (as is apparently done with hdd_wp_rptng_as_of_sid) is a really bad practice. If one person writes one row to the table where the string isn't in the right format, your query will suddenly stop working and start throwing errors, for example.
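The first comment, putting the computed column in the select list, can be sketched in Python with SQLite as follows. Everything here is an assumption for illustration: the holdings_date table and its short column names are made up, SQLite has no TO_DATE or TO_CHAR so the date is built as an ISO string, and the latest date is found with a MAX subquery instead of the hard-coded literal used in the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE holdings_date (as_of_sid INTEGER)")
conn.executemany(
    "INSERT INTO holdings_date VALUES (?)",
    [(20210501,), (20210430,), (20210429,), (20210428,), (20210430,)],
)

rows = conn.execute("""
    SELECT
        subq.as_of_sid,
        subq.as_of_date,
        -- The computed column lives in the select list, not in WHERE.
        CASE
            WHEN subq.as_of_sid = (SELECT MAX(as_of_sid) FROM holdings_date)
            THEN 'Last_Available_date'
            ELSE subq.as_of_date
        END AS display_date
    FROM (
        SELECT DISTINCT
            as_of_sid,
            -- Build an ISO date string from the YYYYMMDD digits.
            substr(as_of_sid, 1, 4) || '-' ||
            substr(as_of_sid, 5, 2) || '-' ||
            substr(as_of_sid, 7, 2) AS as_of_date
        FROM holdings_date
    ) AS subq
    ORDER BY subq.as_of_sid DESC
""").fetchall()

for row in rows:
    print(row)
```

Using the subquery means the label follows whichever date is newest, at the cost of one extra lookup against the table.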

SQL-Server 2012 Subtracting Rows

I have some counts that get periodically recorded into SQL and I'm trying to find out the difference between the start count and the final count.
Raw Data Below
This table has around 30 columns, but they are more of the same, just different counts.
I want to take the min and max row for a time period based on user input from an SQL report. I can filter the data with the code below. (I can also filter it into two different tables, one with min and one with max, if that is easier.)
SELECT *
FROM #tempTable
WHERE TableIndex = (SELECT min(TableIndex) FROM #tempTable)
or TableIndex = (SELECT max(TableIndex) FROM #tempTable)
Filtered Data Below
The end goal is the difference between these two rows, I would then give that data to an SQL report to display a bar graph.
I've seen solutions but they seemed overly complex for what I'm trying to do and I would need to define each column I'm subtracting vs using *. Some of the tables have a couple hundred columns.
How about joining these together?
SELECT tt.*, ttm.*
FROM #tempTable tt JOIN
(SELECT min(TableIndex) as minti, max(TableIndex) as maxti
FROM #tempTable
) ttm
ON tt.TableIndex IN (ttm.minti, ttm.maxti);
You can then do whatever arithmetic operations you like with the values in the same row.
Personally, I would find it easier to just put the two rows into a spreadsheet and subtract the values using a formula.
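For tables with many columns, one option is to generate the subtraction list from the table's metadata rather than typing each column. The sketch below does this in Python with SQLite, not SQL Server: the tempTable contents, the CountA and CountB columns, and the lo and hi aliases are assumptions, and PRAGMA table_info is SQLite-specific, so a server database would read its own catalog instead. It uses a self-join so that the min and max rows end up side by side in one row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tempTable (TableIndex INTEGER, CountA INTEGER, CountB INTEGER);
    INSERT INTO tempTable VALUES (1, 100, 20), (2, 130, 25), (3, 175, 40);
""")

# Read the column names from the table metadata, skipping the index column.
columns = [
    row[1]
    for row in conn.execute("PRAGMA table_info(tempTable)")
    if row[1] != "TableIndex"
]

# Build "hi.col - lo.col" for every count column.
select_list = ", ".join(f'hi."{c}" - lo."{c}" AS "{c}_diff"' for c in columns)

# Self-join: lo is the row with the smallest index, hi the one with the largest.
row = conn.execute(f"""
    SELECT {select_list}
    FROM tempTable AS lo
    JOIN tempTable AS hi
      ON lo.TableIndex = (SELECT MIN(TableIndex) FROM tempTable)
     AND hi.TableIndex = (SELECT MAX(TableIndex) FROM tempTable)
""").fetchone()

differences = dict(zip(columns, row))
print(differences)
```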

SQL Query to count multiple values from one table into specific view

I would like to request your help. I can get the results separately, but now I want to create a query that presents them cleanly for an external person. My explanation:
I have a statistics database, and in this database there is a table where records come in. Each record has several columns with values.
One of these columns is called "MT".
The MT column can have only one of the following values per record: A, B, C, D, E.
The records also have a column called TotalAmount, which indicates the size of a value outside the database. This TotalAmount column is numeric without decimals and can have a value between 1 and 10.000.
The last part is the records themselves: the table has X records.
So basically I need to create a query which separates each MT value and calculates the number of records per MT and the sum of TotalAmount.
This is on SQL Server 2005.
Many thanks for your assistance!
It is very hard to guess without a full db schema, but I think you need:
SELECT MT, Count(*), SUM(TotalAmount)
FROM YourTable
GROUP BY MT
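A quick way to check what this query returns is to run it against a few made-up rows. The sketch below uses Python with SQLite instead of SQL Server 2005; the sample data and the record_count and total_amount aliases are assumptions added for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE YourTable (MT TEXT, TotalAmount INTEGER);
    INSERT INTO YourTable VALUES
        ('A', 10), ('A', 5), ('B', 7), ('C', 1), ('C', 2), ('C', 3);
""")

# One output row per MT value, with the record count and the summed amount.
summary = conn.execute("""
    SELECT MT, COUNT(*) AS record_count, SUM(TotalAmount) AS total_amount
    FROM YourTable
    GROUP BY MT
    ORDER BY MT
""").fetchall()

for mt, record_count, total_amount in summary:
    print(mt, record_count, total_amount)
```

MT values with no records (D and E in this sample) do not appear in the output, because GROUP BY only produces groups for values that exist in the table.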