I am querying a DATE field:
SELECT DATE,
       FIELD2,
       FIELD3
INTO Table_new
FROM Table_old
WHERE criteria ILIKE '%xxxyyy%'
The DATE field runs from 10/1/2010 to the present, but it has missing days along the way. When I export the data (to Tableau, for example), I need the data to line up with a calendar that DOES NOT have any missing dates. This means I need a placeholder for a date even if no data exists for that date in the query. How can I achieve this?
Right now I am exporting the data and manually creating a space wherever no data exists for a date, which is extremely inefficient.
Tableau can do this natively; no need to alter your data set. You just need to make sure that your DATE field is of the date type in Tableau, and then show empty columns/rows.
(Screenshots: my test data; the view before showing empty columns; the menu option used to show empty columns; and the end result after showing empty columns.)
If you want to then restrict those dates, you can add the date field to the filter, select your date range, and Apply to Context.
In Postgres, you can easily generate the dates:
select d.date, t.field1, t.field2
from (select generate_series(mm.mindate, mm.maxdate, interval '1 day') as date
      from (select min(date) as mindate, max(date) as maxdate
            from table_old
            where criteria ilike '%xxxyyy%'
           ) mm
     ) d left join
     table_old t
     on t.date = d.date and
        t.criteria ilike '%xxxyyy%';
This returns all dates between the minimum and maximum for the criteria. If you have another date range in mind, just use that for the generate_series().
Note: The final condition on criteria needs to go in the ON clause, not in a WHERE clause; in a WHERE clause it would turn the LEFT JOIN back into an inner join and drop the dates with no data.
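For instance, a minimal sketch with a fixed range instead of the min/max subquery, using the 10/1/2010 start date from the question (the end date and field names are assumptions):
select d.date, t.field1, t.field2
from generate_series(date '2010-10-01', current_date, interval '1 day') as d(date)
left join table_old t
       on t.date = d.date
      and t.criteria ilike '%xxxyyy%';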
Related
Say I want to match records in table_a that have a startdate and an enddate to individual days and see if on, for instance March 13, one or more records in table_a match. I'd like to solve this by generating a row per day, with the date as the leading column, and any matching data from table_a as a left join.
I've worked with data warehouses that have date dimensions that make this job easy. But unfortunately I need to run this particular query on an OLTP database that doesn't have such a table.
How can I generate a row-per-day table in SQL Server? How can I do this inside my query, without temp tables, functions/procedures etc?
An alternative is a recursive query to generate the date series. Based on your pseudo-code:
with dates_table as (
    select <your-start-date> as dt
    union all
    select dateadd(day, 1, dt) from dates_table where dt < <your-end-date>
)
select d.dt, a.<whatever>
from dates_table d
left outer join table_a a on <join / date matching here>
-- where etc etc
option (maxrecursion 0) -- lift the default recursion limit of 100
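A minimal runnable sketch of the same idea, assuming a hypothetical March 2021 window and matching days against the startdate/enddate columns from the question:
with dates_table as (
    select cast('2021-03-01' as date) as dt
    union all
    select dateadd(day, 1, dt) from dates_table where dt < '2021-03-31'
)
select d.dt, a.*
from dates_table d
left outer join table_a a
    on d.dt >= a.startdate and d.dt <= a.enddate
option (maxrecursion 0);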
I found a bit of a hacky way to do this. I'll assume two years of dates is sufficient for your dates table.
Now, find a table in your database that has at least 800 records (365 × 2 = 730, plus leap days, rounded up to roughly 800 to spare you the arithmetic). The question talks about selecting data from table_a, so we'll call this other table table_b. Then create this Common Table Expression at the top of the query:
with dates_table as (
    select top 800 -- or more/less if your timespan isn't ~2 years
        [date] = dateadd(day, ROW_NUMBER() over (order by <random column>) - 1, <your-start-date>)
    from table_b
)
select d.[date]
, a.<whatever>
from dates_table d
left outer join table_a a on <join / date matching here>
-- where etc, etc, etc
Some notes:
This works by just getting the numbers 0 - 799 and adding them to an arbitrary date.
If you need more or fewer dates than two years, increase or decrease the number fed into the TOP clause. Ensure that table_b has sufficient rows: select count(*) from table_b.
<random column> can be any column on table_b; the ordering doesn't matter. We're only interested in the numbers 1-800 (-1 for a range of 0-799), but ROW_NUMBER requires an ORDER BY clause.
<your-start-date> is the first date you want in the dates table, and it is included in the output.
In the WHERE clause of the joined query, you can filter out any excess days we overshot by taking 800 rows instead of 730 (+ leap days), by adding something like year(d.[date]) IN (2020, 2021).
If table_a itself has more than 800 records, it could be used as the basis for dates_table instead of some other table. A concrete sketch follows below.
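Putting it together, a minimal sketch that borrows sys.all_objects to play the part of table_b (it usually has well over 800 rows) and assumes a hypothetical start date of 2020-01-01:
with dates_table as (
    select top 800
        [date] = dateadd(day, ROW_NUMBER() over (order by object_id) - 1,
                         cast('2020-01-01' as date))
    from sys.all_objects
)
select d.[date], a.*
from dates_table d
left outer join table_a a
    on d.[date] >= a.startdate and d.[date] <= a.enddate
where year(d.[date]) in (2020, 2021);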
My problem is that in an ancient database there are two tables that I need to query for matching rows based on date. The catch is that in one table the date is represented as YYYYMM in a decimal(6) column, and in the other as a YYYY-MM-DD date.
How do I join these two tables together?
I am perfectly happy searching on any day or day 01.
You can format that date as YYYYMM using TO_CHAR or VARCHAR_FORMAT and then join the two tables together. For example, VARCHAR_FORMAT applied to 2010-08-25 with the format 'YYYYMM' yields '201008', and DEC(..., 6, 0) turns that string into a decimal comparable to the other table's column.
Assuming table A has the date field in col1 and table B has the decimal(6) field in col2, it would look like this:
select *
from A
join B on dec(varchar_format(a.col1, 'YYYYMM'),6,0) = b.col2
You can perform a join on those two tables. Suppose the first table, where the date is stored as decimal(6), is A with the value in column col1, and the other table is B with the date stored in column col2. The query would be something like this:
SELECT * FROM A, B
WHERE INT(A.col1) = INT(SUBSTR(CHAR(B.col2, ISO), 1, 4) || SUBSTR(CHAR(B.col2, ISO), 6, 2))
I have an Oracle table with the columns {date, id, profit, max_profit}.
The date and profit columns are populated, and I want max_profit to hold the highest profit seen up to each row's date. I am using the query below:
UPDATE MY_TABLE a
SET a.MAX_PROFIT = (SELECT MAX(b.PROFIT)
                    FROM MY_TABLE b
                    WHERE b.DATE <= a.DATE
                      AND a.id = b.id)
This gives the correct result, but I have millions of rows, and the query takes considerable time. Is there a faster way of doing it?
You can use a MERGE statement with an analytic function. The running MAX is computed in a single pass over the table instead of re-running a correlated subquery for every row:
MERGE INTO my_table dst
USING (
SELECT ROWID rid,
MAX( profit ) OVER ( PARTITION BY id ORDER BY "DATE" ) AS max_profit
FROM my_table
) src
ON ( src.rid = dst.ROWID )
WHEN MATCHED THEN
UPDATE SET max_profit = src.max_profit;
When you do something like "SELECT MAX(...)", you're going to scan all the records implicated in the "WHERE" part of the query, so you want to make getting at those records as easy on the database as possible.
Do you have an index on the table that includes the id and date columns?
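For instance, a composite index along these lines (the name is made up; adding profit makes it covering for the subquery) would let the database answer the MAX from the index alone:
CREATE INDEX my_table_id_date_idx ON my_table (id, "DATE", profit);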
Depending on the behavior of this application, if you're doing far fewer updates/inserts than reads (say, during reporting or some other process), a possible performance enhancement might be to keep the value stored in the max_profit column up to date as you change the data. Have you considered a separate table that just stores the profit calculation for each possible date?
I have a working PostgreSQL query. The column "code" is common to both tables, and table test.a has a date column. I want to limit the search results to a single year; the date format is like 2010-08-25.
SELECT *
FROM test.a
WHERE form IN ('xyz')
AND code IN (
    SELECT code
    FROM test.city)
Any help is appreciated.
To return rows with date_col values in the year 2010:
SELECT *
FROM test.a
WHERE form = 'xyz'
AND EXISTS (
SELECT 1
FROM test.city
WHERE code = a.code
)
AND date_col >= '2010-01-01'
AND date_col < '2011-01-01';
This way, the query can use an index on date_col (or, ideally, on (form, date_col) or (form, code, date_col) for this particular query). And the filter works correctly for the data types date and timestamp alike (you did not disclose the data type; the "date format" is irrelevant).
If performance is of any concern, do not use an expression like EXTRACT(YEAR FROM dateColumn) = 2010. While that seems clean and simple to the human eye, it kills performance in a relational DB. The left-hand expression has to be evaluated for every row of the table before the filter can be tested. What's more, simple indexes cannot be used; only an expression index on (EXTRACT(YEAR FROM dateColumn)) would qualify. Not important for small tables, crucial for big ones.
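For completeness, a sketch of such an expression index (the index name is made up; note the extra parentheses Postgres requires around the expression):
CREATE INDEX a_date_year_idx ON test.a ((EXTRACT(YEAR FROM date_col)));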
EXISTS can be faster than IN, except for simple cases where the query plan ends up being the same. The opposite NOT IN can be a trap if NULL values are involved, though:
Select rows which are not present in other table
If by "limit" you mean "filter", then I can give you an option
SELECT *
FROM test_a
WHERE form IN ('xyz')
  AND code IN (
      SELECT code
      FROM test_city
  )
  AND EXTRACT(YEAR FROM dateColumn) = 2010;
db-fiddle for you to run and play with it: https://www.db-fiddle.com/f/5ELU6xinJrXiQJ6u6VH5/6
I have a table as follows:
ParentActivityID | ActivityID | Timestamp
1                | A1         | T1
2                | A2         | T2
1                | A1         | T1
1                | A1         | T5
I want to select unique ParentActivityIDs along with a Timestamp. The timestamp can be either the most recent one or the first one occurring in the table.
I tried to use DISTINCT, but I came to realise that it doesn't work on individual columns. I am new to SQL. Any help in this regard will be highly appreciated.
DISTINCT applies to the entire select list, not to a single column. When you have multiple columns, use GROUP BY:
SELECT ParentActivityID, Timestamp
FROM MyTable
GROUP BY ParentActivityID, Timestamp
Actually, I want only one row per ParentActivityID. Your solution will give each distinct pair of ParentActivityID and Timestamp. For example, if I have [1, T1], [2, T2], [1, T3], then I want the result to be [1, T3] and [2, T2].
You need to decide which of the many timestamps to pick. If you want the earliest one, use MIN:
SELECT ParentActivityID, MIN(Timestamp)
FROM MyTable
GROUP BY ParentActivityID
Try this:
SELECT [ParentActivityId],
MIN([Timestamp]) AS [FirstTimestamp],
MAX([Timestamp]) AS [RecentTimestamp]
FROM [Table]
GROUP BY [ParentActivityId]
This will give you the first timestamp and the most recent timestamp for each ParentActivityId present in your table. You can then pick whichever one you need.
"Group by" is what you need here. Just do "group by ParentActivityID" and tell that most recent timestamp along all rows with same ParentActivityID is needed for you:
SELECT ParentActivityID, MAX(Timestamp) FROM Table GROUP BY ParentActivityID
"Group by" operator is like taking rows from a table and putting them in a map with a key defined in group by clause (ParentActivityID in this example). You have to define how grouping by will handle rows with duplicate keys. For this you have various aggregate functions which you specify on columns you want to select but which are not part of the key (not listed in group by clause, think of them as a values in a map).
Some databases (like mysql) also allow you to select columns which are not part of the group by clause (not in a key) without applying aggregate function on them. In such case you will get some random value for this column (this is like blindly overwriting value in a map with new value every time). Still, SQL standard together with most databases out there will not allow you to do it. In such case you can use min(), max(), first() or last() aggregate function to work around it.
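As a sketch of that workaround on the sample table above (the table name MyTable is assumed), every non-key column goes through an aggregate:
SELECT ParentActivityID,
       MAX(Timestamp) AS RecentTimestamp,
       MAX(ActivityID) AS AnyActivityID -- arbitrary but deterministic pick
FROM MyTable
GROUP BY ParentActivityID;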
Use a CTE to get the latest row from your table per ParentActivityId; since the entire row comes through, you can pick whichever columns you need from the output.
;WITH cte_parent AS
(
    SELECT ParentActivityId, ActivityId, TimeStamp,
           ROW_NUMBER() OVER (PARTITION BY ParentActivityId ORDER BY TimeStamp DESC) AS RNO
    FROM YourTable
)
SELECT *
FROM cte_parent
WHERE RNO = 1