Please help me with the following:
The table AR_X_LO is an SCD Type 2 table. There was a bug in the ETL, with the result that changed records have not been end-dated, e.g.
AR_X_LO_TP_ID AR_ID EFF_TMS LO_ID RANK END_TMS ORIG_SRC_STM_ID RT_TMS
------------- ------- ------------------- -------- ---- ---------- --------------- ----------
802 6751231 2016-06-08 00:00:00 39748325 1 NULL 9643 2016-06-09
802 6751231 2015-05-02 00:00:00 29496916 1 NULL 9643 2015-05-04
The ETL was supposed to end-date the changed row with the EFF_TMS of the new row minus 1 day.
AR_X_LO_TP_ID AR_ID EFF_TMS LO_ID RANK END_TMS ORIG_SRC_STM_ID RT_TMS
------------- ------- ------------------- -------- ---- ---------- --------------- ----------
802           6751231 2016-06-08 00:00:00 39748325 1    NULL       9643            2016-06-09
802           6751231 2015-05-02 00:00:00 29496916 1    2016-06-07 9643            2015-05-04
I want to write a SQL query that for each AR_ID, AR_X_LO_TP_ID, RANK, ORIG_SRC_STM_ID combination returns what the END_TMS was supposed to be.
Since your request is for
"a SQL query that [...] returns what the END_TMS was supposed to be"
and since you specified the SAS tag, the following SAS code will do just that:
proc sql;
  create table result as
    select t1.*,
           /* intended END_TMS: the day before the next row's EFF_TMS */
           datepart(t2.EFF_TMS) - 1 as END_TMS format=E8601DA.
    from AR_X_LO(drop=END_TMS) t1
    left join AR_X_LO t2
      on  t1.AR_ID           = t2.AR_ID
      and t1.AR_X_LO_TP_ID   = t2.AR_X_LO_TP_ID
      and t1.RANK            = t2.RANK
      and t1.ORIG_SRC_STM_ID = t2.ORIG_SRC_STM_ID
      and t1.EFF_TMS         < t2.EFF_TMS
    /* group by the full key so each t1 row keeps only its nearest later row */
    group by t1.AR_ID, t1.AR_X_LO_TP_ID, t1.RANK, t1.ORIG_SRC_STM_ID, t1.EFF_TMS
    having END_TMS = min(END_TMS)
  ;
quit;
Be aware that this code contains SAS-specific statements/functions (such as the datepart() function, the format= option, and the drop= dataset option) which will not work in other SQL environments (like Oracle, which you also tagged), and it will perform poorly in SAS if you are indeed working against an Oracle backend.
If the latter is true, you could probably do this more elegantly with analytic functions such as LAG, LEAD, and PARTITION BY (using SQL pass-through when working within SAS).
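For instance, a minimal Oracle sketch, assuming the column names from your sample and that EFF_TMS is a DATE (so subtracting 1 subtracts one day):

-- For each key combination, the intended END_TMS is the next row's EFF_TMS minus 1 day
SELECT t.*,
       LEAD(t.EFF_TMS) OVER (
           PARTITION BY t.AR_ID, t.AR_X_LO_TP_ID, t.RANK, t.ORIG_SRC_STM_ID
           ORDER BY t.EFF_TMS
       ) - 1 AS END_TMS_EXPECTED  -- stays NULL for the latest row, as desired
FROM AR_X_LO t;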
NOTE: conforming to your provided example of the expected result, I returned END_TMS as a date, even though the name of that variable suggests it should probably be a timestamp (a datetime in SAS).
I have a very big transaction table on DB2 v11, and I need to query a subset of it as efficiently as possible. All I need is the total count of the set (not known in advance; it's based on criteria, let's say 1 day), the ID of the first record, and the ID of the last record.
The old code was fetching the entire table and then using only the first record's ID, the last record's ID, and the size, making no use of the rest. Now this code is timing out. It's a complex query with several joins.
Is there a way to fetch the size of the set, the first record, and the last record all in one SELECT query?
I've read that reordering the list to fetch the first record (fetching with DESC, then changing to ASC) is not efficient.
Sample table 1, TRANSACTION_RECORDS:
tdID TIMESTAMP name
-------------------------------
123 2020-03-31 john
234 2020-03-31 dan
456 2020-03-01 Eve
675 2020-04-01 joy
Sample table 2, TRANSACTION_TYPE:
invoiceId tdID account
------------------------------
897 123 abc
898 123 def
877 234 mnc
899 456 opp
Sample query:
select min(tr.tdID), max(tr.tdID)
from TRANSACTION_RECORDS tr
join TRANSACTION_TYPE tt
  on tr.tdID = tt.tdID
where Date(tr.TIMESTAMP) = '2020-03-31'
group by tr.tdID
order by tr.tdID asc
This results in multiple rows (but it requires the GROUP BY):
123,123
234,234
456,456
What I want is:
123,456
As I mentioned in the comments, for this query you need neither GROUP BY nor ORDER BY; just do:
select min(tr.tdID), max(tr.tdID)
from TRANSACTION_RECORDS tr
join TRANSACTION_TYPE tt
  on tr.tdID = tt.tdID
where Date(tr.TIMESTAMP) = '2020-03-31'
It should work as expected.
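If you also need the size of the set in the same statement, a minimal sketch (assuming tdID is the transaction ID column from your sample tables; DISTINCT because the join to TRANSACTION_TYPE can multiply rows):

select min(tr.tdID)            as first_id,
       max(tr.tdID)            as last_id,
       count(distinct tr.tdID) as set_size
from TRANSACTION_RECORDS tr
join TRANSACTION_TYPE tt
  on tr.tdID = tt.tdID
where Date(tr.TIMESTAMP) = '2020-03-31'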
I want to convert the SQL query below to a Doctrine query in Symfony.
select p.name, p.id, sum(t.amount) as bal
from transactions t
right join processors p on t.processor_id = p.id
where user_id = 18 or user_id is null
group by p.id
The above code fetches balance from transactions table by summing up amounts of each transaction for a user for each processor.
Result:
Processor1 --------- 43
Processor2 --------- 12
Processor3 --------- NULL
Processor4 --------- NULL
Processor5 --------- NULL
The query I tried with DQL is:
$sql = $procRepo->createQueryBuilder('t');
$sql->select('p.name');
$sql->leftJoin('t.processorId','p');
$sql->addSelect('sum(t.amount) as bal');
$sql->groupBy('p.id');
$sql->orderBy('p.name')->getQuery()->getResult();
Result:
Processor1 --------- 43
Processor2 --------- 12
So my problem is that I also want to get the NULL rows.
Note: I am using Symfony 3.
Can anybody help?
You need to invert the join to get all processors:
$sql = $procRepo->createQueryBuilder('p');
$sql->select('p.name', 'sum(t.amount) as bal');
$sql->leftJoin('p.transaction', 't');
$sql->groupBy('p.id');
$result = $sql->orderBy('p.name')->getQuery()->getResult();
This query belongs in your ProcessorRepository.
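Note that your original SQL also filtered on user_id. With a LEFT JOIN, that condition has to live in the join clause rather than in WHERE, or the NULL rows disappear again. A sketch in plain SQL, using the table and column names from your original query:

select p.name, p.id, sum(t.amount) as bal
from processors p
left join transactions t
  on t.processor_id = p.id
 and t.user_id = 18  -- filtering here keeps processors with no matching rows
group by p.id, p.name

In the query builder, the condition can be passed as the third and fourth arguments of leftJoin(), e.g. $sql->leftJoin('p.transaction', 't', 'WITH', 't.user = :user') (the association and field names here are assumptions).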
I have two tables.
Order
Replication
A single Order record can have multiple Replication records. I want to join these two tables such that I always retrieve a single record out of the join, even if multiple Replication records exist.
Sample data
Replication table:
ORDID      | STATUS | ID           | ERRORMSG | HTTPSTATUS | DELIVERYCNT
========================================================================
1717410307 | 1      | JBM-9e92ae0c | NULL     | 200        | 1
1717410307 | 1      | JBM-9fb59af1 | NULL     | 400        | -99
1717410308 | 1      | JBM-0764b091 | NULL     | 403        | 1
1717410308 | 1      | JBM-0764b091 | NULL     | 200        | 1
Order table:
ORDID      | ORDTYPE | DATE
---------------------------------
1717410307 | CAR     | 22-SEP-2011
1717410308 | BUS     | 23-SEP-2011
How can I write the join so that I get one row for each record in the Order table, with the matching Replication record selected dynamically on a priority basis?
The priority can be defined as:
Any record with a delivery count of -99
HTTPSTATUS != 200
Please guide me on how to proceed with this join.
Please let me know if you need any clarification.
Your help is much appreciated!
Is it possible to use an ORDER BY clause based on HTTPSTATUS and DELIVERYCNT?
In that case you can write a specific ORDER BY and take the TOP 1 from it (I don't know which RDBMS you use), or compute ROW_NUMBER() OVER (ORDER BY ...) AS RowN and filter on RowN = 1.
But this is the ugly (yet quick) solution.
The other option is to make a subquery where you add a new column that performs the priority calculation.
To make the query efficient you should consider indexing (or RDBMS-specific solutions like included columns).
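For example, a sketch of the ROW_NUMBER() approach (column names taken from your samples; the Order table is called Orders here since Order is a reserved word, and the priority rules are encoded in a CASE expression):

SELECT *
FROM (
    SELECT o.ORDID, o.ORDTYPE, r.ID, r.HTTPSTATUS, r.DELIVERYCNT,
           ROW_NUMBER() OVER (
               PARTITION BY o.ORDID             -- one winner per order
               ORDER BY CASE
                            WHEN r.DELIVERYCNT = -99 THEN 1  -- highest priority
                            WHEN r.HTTPSTATUS <> 200 THEN 2
                            ELSE 3
                        END
           ) AS RowN
    FROM Orders o
    JOIN Replication r ON r.ORDID = o.ORDID
) x
WHERE RowN = 1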
I have a large table (TokenFrequency) which has millions of rows in it. The TokenFrequency table is structured like this:
Table - TokenFrequency
id - int, primary key
source - int, foreign key
token - char
count - int
My goal is to select all of the rows in which two sources share the same token. For example, if my table looked like this:
id --- source --- token --- count
1 ------ 1 --------- dog ------- 1
2 ------ 2 --------- cat -------- 2
3 ------ 3 --------- cat -------- 2
4 ------ 4 --------- pig -------- 5
5 ------ 5 --------- zoo ------- 1
6 ------ 5 --------- cat -------- 1
7 ------ 5 --------- pig -------- 1
I would want a SQL query to give me source 1, source 2, and the sum of the counts. For example:
source1 --- source2 --- token --- count
---- 2 ----------- 3 --------- cat -------- 4
---- 2 ----------- 5 --------- cat -------- 3
---- 3 ----------- 5 --------- cat -------- 3
---- 4 ----------- 5 --------- pig -------- 6
I have a query that looks like this:
SELECT F.source AS source1, S.source AS source2, F.token,
(F.count + S.count) AS sum
FROM TokenFrequency F
INNER JOIN TokenFrequency S ON F.token = S.token
WHERE F.source <> S.source
This query works fine, but the problems that I have with it are:
I have a TokenFrequency table that has millions of rows and therefore need a faster alternative to obtain this result.
The current query that I have is giving duplicates. For example, it's selecting:
source1=2, source2=3, token=cat, count=4
source1=3, source2=2, token=cat, count=4
That isn't too much of a problem, but if there is a way to eliminate those duplicates and in turn obtain a speed increase, it would be very useful.
The main issue I have is the speed of the query: with my current query it takes hours to complete. The INNER JOIN of the table to itself is what I believe to be the problem. I'm sure there has to be a way to eliminate the inner join and get similar results using just one instance of the TokenFrequency table. Fixing the second problem I mentioned might also yield a speed increase.
I need a way to restructure this query to provide the same results in a faster, more efficient manner.
Thanks.
I'd need a little more info to diagnose the speed issue, but to remove the duplicates, add this to the WHERE clause:
AND F.source < S.source
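With that in place, a sketch of the full query:

SELECT F.source AS source1, S.source AS source2, F.token,
       (F.count + S.count) AS sum
FROM TokenFrequency F
INNER JOIN TokenFrequency S ON F.token = S.token
WHERE F.source < S.source  -- < instead of <> keeps only one ordering of each pair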
Try this:
SELECT token, GROUP_CONCAT(source), SUM(count)
FROM TokenFrequency
GROUP BY token;
This should run a lot faster and also eliminate the duplicates. But the sources will be returned in a comma-separated list, so you'll have to explode that in your application.
You might also try creating a compound index over the columns token, source, count (in that order) and analyze with EXPLAIN to see if MySQL is smart enough to use it as a covering index for this query.
Update: I seem to have misunderstood your question. You don't want the sum of counts per token; you want the sum of counts for every pair of sources sharing a given token.
I believe the inner join is the best solution for this. An important guideline for SQL is that if you need to calculate an expression with respect to two different rows, then you need to do a join.
However, one optimization technique that I mentioned above is to use a covering index so that all the columns you need are included in an index data structure. The benefit is that all your lookups are O(log n), and the query doesn't need to do a second I/O to read the physical row to get other columns.
In this case, you should create the covering index over columns token, source, count as I mentioned above. Also try to allocate enough cache space so that the index can be cached in memory.
If token isn't indexed, it certainly should be.
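For reference, a sketch of that compound index (MySQL syntax, since GROUP_CONCAT suggests MySQL; the index name is arbitrary):

-- Covers the self-join: token for the join key, source and count for the output
CREATE INDEX idx_token_source_count ON TokenFrequency (token, source, count);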
I have a table where I store customer sales data (for periodicals, like newspapers). The product is stored by issue. Example:
custid prodid issue qty datesold
1 123 2 12 01052008
2 234 1 5 01022008
1 123 1 5 01012008
2 444 2 3 02052008
How can I retrieve (what's a fast way) the last issue for all products for a specific customer? Can I have samples for both SQL Server 2000 and 2005? Please note, the table is over 500k rows.
Thanks
Assuming that "latest" is determined by date (rather than by issue number), this method is usually pretty fast, assuming decent indexes:
SELECT
    T1.prodid,
    T1.issue
FROM
    dbo.Sales T1
    LEFT OUTER JOIN dbo.Sales T2 ON
        T2.custid = T1.custid AND
        T2.prodid = T1.prodid AND
        T2.datesold > T1.datesold
WHERE
    T1.custid = @custid AND
    T2.custid IS NULL
Handling 500k rows is something that a laptop can probably handle without trouble, let alone a real server, so I'd stay clear of denormalizing your database for "performance". Don't add extra maintenance, inaccuracy, and most of all headaches by tracking a "last sold" somewhere else.
EDIT: I forgot to mention... this doesn't specifically handle cases where two issues have the same exact datesold. You might need to tweak it based on your business rules for that situation.
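As an aside, on SQL Server 2005 (though not 2000) ROW_NUMBER() offers an alternative; a minimal sketch, reusing the same assumed table and parameter:

SELECT prodid, issue, datesold
FROM (
    SELECT prodid, issue, datesold,
           ROW_NUMBER() OVER (PARTITION BY prodid
                              ORDER BY datesold DESC) AS rn
    FROM dbo.Sales
    WHERE custid = @custid
) AS t
WHERE rn = 1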
Generic SQL; SQL Server's syntax shouldn't be much different:
SELECT prodid, max(issue) FROM sales WHERE custid = ? GROUP BY prodid;
Is this a new project? If so, I would be wary of setting up your database like this; read up a bit on normalization, so that you might end up with something like this:
CustID LastName FirstName
------ -------- ---------
1 Woman Test
2 Man Test
ProdID ProdName
------ --------
123 NY Times
234 Boston Globe
ProdID IssueID PublishDate
------ ------- -----------
123 1 12/05/2008
123 2 12/06/2008
CustID OrderID OrderDate
------ ------- ---------
1 1 12/04/2008
OrderID ProdID IssueID Quantity
------- ------ ------- --------
1 123 1 5
2 123 2 12
I'd have to know your database better to come up with a better schema, but it sounds like you're building too many things into a flat table, which will cause lots of issues down the road.
If you're looking for the most recent sale by date, maybe this is what you need:
SELECT prodid, issue
FROM Sales
WHERE custid = @custid
  AND datesold = (SELECT MAX(datesold)
                  FROM Sales s
                  WHERE s.prodid = Sales.prodid
                    AND s.custid = @custid)
Querying a constantly growing historical table this way is far too slow!
I strongly suggest you create a new table, tblCustomerSalesLatest, which stores the latest issue data for each customer, and select from there.
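For illustration, a sketch of what that could look like (the column types and the refresh query are assumptions based on the question; the summary is rebuilt from the base Sales table):

-- Hypothetical summary table: one row per (custid, prodid)
CREATE TABLE tblCustomerSalesLatest (
    custid   int      NOT NULL,
    prodid   int      NOT NULL,
    issue    int      NOT NULL,
    datesold datetime NOT NULL,
    CONSTRAINT PK_tblCustomerSalesLatest PRIMARY KEY (custid, prodid)
);

-- Rebuild from the base table (run after each load, or maintain it in the ETL)
-- NOTE: if two sales share the same max datesold, add a tie-breaker first
TRUNCATE TABLE tblCustomerSalesLatest;
INSERT INTO tblCustomerSalesLatest (custid, prodid, issue, datesold)
SELECT s.custid, s.prodid, s.issue, s.datesold
FROM dbo.Sales s
JOIN (SELECT custid, prodid, MAX(datesold) AS maxdate
      FROM dbo.Sales
      GROUP BY custid, prodid) m
  ON m.custid = s.custid
 AND m.prodid = s.prodid
 AND m.maxdate = s.datesold;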