Generate id row for a view with grouping - sql

I'm trying to create a view with row numbers like so:
create or replace view daily_transactions as
select
generate_series(1, count(t)) as id,
t.ic,
t.bio_id,
t.wp,
date_trunc('day', t.transaction_time)::date transaction_date,
min(t.transaction_time)::time time_in,
w.start_time wp_start,
w.start_time - min(t.transaction_time)::time in_diff,
max(t.transaction_time)::time time_out,
w.end_time wp_end,
max(t.transaction_time)::time - w.end_time out_diff,
count(t) total_transactions,
calc_att_status(date_trunc('day', t.transaction_time)::date,
min(t.transaction_time)::time,
max(t.transaction_time)::time,
w.start_time, w.end_time ) status
from transactions t
left join wp w on (t.wp = w.wp_name)
group by ic, bio_id, t.wp, date_trunc('day', transaction_time),
w.start_time, w.end_time;
I ended up with duplicate rows. SELECT DISTINCT doesn't work either. Any ideas?
Transaction Table:
create table transactions(
id serial primary key,
ic text references users(ic),
wp text references wp(wp_name),
serial_no integer,
bio_id integer,
node integer,
finger integer,
transaction_time timestamp,
transaction_type text,
transaction_status text
);
WP table:
create table wp(
id serial unique,
wp_name text primary key,
start_time time,
end_time time,
description text,
status text
);
View Output: (screenshot from the question, not reproduced here)

CREATE OR REPLACE VIEW daily_transactions as
SELECT row_number() OVER () AS id
, t.ic
, t.bio_id
, t.wp
, t.transaction_time::date AS transaction_date
, min(t.transaction_time)::time AS time_in
, w.start_time AS wp_start
, w.start_time - min(t.transaction_time)::time AS in_diff
, max(t.transaction_time)::time AS time_out
, w.end_time AS wp_end
, max(t.transaction_time)::time - w.end_time AS out_diff
, count(*) AS total_transactions
, calc_att_status(t.transaction_time::date, min(t.transaction_time)::time
, max(t.transaction_time)::time
, w.start_time, w.end_time) AS status
FROM transactions t
LEFT JOIN wp w ON t.wp = w.wp_name
GROUP BY t.ic, t.bio_id, t.wp, t.transaction_time::date
, w.start_time, w.end_time;
Major points
generate_series() is applied after aggregate functions, but produces multiple rows, thereby multiplying all output rows.
The window function row_number() is also applied after aggregate functions, but only generates a single number per row. You need PostgreSQL 8.4 or later for that.
date_trunc() is redundant in date_trunc('day', t.transaction_time)::date.
t.transaction_time::date achieves the same, simpler & faster.
Use count(*) instead of count(t). Same result here, but a bit faster.
Some other minor changes.
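To illustrate the first two points, here is a minimal sketch (the orders table and customer column are made up for illustration) contrasting the two approaches:
-- generate_series() returns a set, so after GROUP BY each
-- aggregated row is repeated count(*) times:
SELECT generate_series(1, count(*)) AS id, customer
FROM   orders
GROUP  BY customer;

-- row_number() is computed after aggregation and yields exactly
-- one number per result row (PostgreSQL 8.4+):
SELECT row_number() OVER () AS id, customer
FROM   orders
GROUP  BY customer;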

SQLite: Workaround for SQLite-TRIGGER with WITH

I'm working on a project to monitor downtimes of production lines with an embedded device. I want to automate acknowledging of these downtimes by generic rules the user can configure.
I want to use a TRIGGER but get a syntax error near UPDATE even though the documentation says it should be fine to use the WITH statement.
CREATE TRIGGER autoAcknowledge
AFTER UPDATE OF dtEnd ON ackGroups
FOR EACH ROW
WHEN old.dtEnd IS NULL AND new.dtEnd IS NOT NULL
BEGIN
WITH sub1(id, stationId, groupDur) AS (
SELECT MIN(d.id), d.station,
strftime('%s', ag.dtEnd) - strftime('%s', ag.dtStart)
FROM ackGroups AS ag
LEFT JOIN downtimes AS d on d.acknowledge = ag.id
WHERE ag.id = old.id
GROUP BY ag.id ),
sub2( originId, groupDur, reasonId, above, ruleDur) AS (
SELECT sub1.stationId, sub1.groupDur, aar.reasonId, aar.above, aar.duration
FROM sub1
LEFT JOIN autoAckStations AS aas ON aas.stationId = sub1.stationId
LEFT JOIN autoAckRules AS aar ON aas.autoAckRuleId = aar.id
ORDER BY duration DESC )
UPDATE ackGroups SET (reason, dtAck, origin)=(
SELECT reasonId, datetime('now'), originId
FROM sub2 as s
WHERE ( s.ruleDur < s.groupDur AND above = 1 ) OR (s.ruleDur > s.groupDur AND above = 0)
LIMIT 1
)
WHERE id = old.id;
END
Background: First we have the downtimes table. Each production line consists of multiple parts called stations. Each station can start the line downtime, and these can overlap with other stations' downtimes.
CREATE TABLE "downtimes" ("id" integer NOT NULL PRIMARY KEY AUTOINCREMENT,
"station" integer NOT NULL,
"acknowledge" integer,
"dtStart" datetime NOT NULL,
"dtEnd" datetime,
"dtLastModified" datetime)
Overlapping downtimes are grouped into acknowledge groups using a TRIGGER AFTER INSERT on downtimes that sets the acknowledge id correctly or creates a new group.
CREATE TABLE "ackGroups" ("id" integer NOT NULL PRIMARY KEY AUTOINCREMENT,
"reason" integer,
"dtAck" datetime,
"dtStart" datetime NOT NULL,
"dtEnd" datetime,
"line" integer NOT NULL,
"origin" integer)
The autoAckRules table represents the configuration. The user decides whether the rule should apply to durations above or below a certain value, and which reasonId should be used to acknowledge.
CREATE TABLE "autoAckRules" ("id" integer NOT NULL PRIMARY KEY AUTOINCREMENT,
"description" text NOT NULL,
"reasonId" integer NOT NULL,
"above" bool NOT NULL,
"duration" real NOT NULL)
The autoAckStations table is used to manage the M:N relationship. Each rule allows multiple stations which started the ackGroup.
CREATE TABLE autoAckStations (
autoAckRuleId INTEGER NOT NULL,
stationId INTEGER NOT NULL,
PRIMARY KEY ( autoAckRuleId, stationId )
)
When the last downtime ends, dtEnd of ackGroups is set to datetime('now') and the trigger is fired to check if there is an autoAckRule that fits.
If I substitute the sub-selects with a SELECT .. FROM (SELECT .. FROM (SELECT .. FROM ...)) cascade, is there a nice way to avoid the need to write and evaluate it twice?
Or am I missing something stupid?
Common table expressions are not supported in statements inside of triggers. You need to convert the CTE to a sub-query, such as:
CREATE TRIGGER autoAcknowledge
AFTER UPDATE OF dtEnd ON ackGroups
FOR EACH ROW
WHEN old.dtEnd IS NULL AND new.dtEnd IS NOT NULL
BEGIN
UPDATE ackGroups
SET (reason, dtAck, origin)= (
SELECT reasonId, datetime('now'), originId
FROM (SELECT sub1.stationId AS originId,
sub1.groupDur AS groupDur,
aar.reasonId AS reasonId,
aar.above AS above,
aar.duration AS ruleDur
FROM (SELECT MIN(d.id) AS id,
d.station AS stationId,
strftime('%s', ag.dtEnd) - strftime('%s', ag.dtStart) AS groupDur
FROM ackGroups AS ag
LEFT
JOIN downtimes AS d
ON d.acknowledge = ag.id
WHERE ag.id = old.id
GROUP BY ag.id ) AS sub1
LEFT
JOIN autoAckStations AS aas
ON aas.stationId = sub1.stationId
LEFT
JOIN autoAckRules AS aar
ON aas.autoAckRuleId = aar.id
ORDER BY duration DESC) as s
WHERE ( s.ruleDur < s.groupDur AND above = 1 ) OR (s.ruleDur > s.groupDur AND above = 0)
LIMIT 1
);
END;
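A quick way to reproduce the limitation in isolation (a sketch against a made-up scratch table t; affected SQLite versions reject the trigger with a syntax error):
CREATE TABLE t(x INTEGER);

-- WITH at the top level of an ordinary statement parses fine:
WITH one(v) AS (SELECT 1)
UPDATE t SET x = (SELECT v FROM one);

-- The same WITH as the first token of a trigger-body statement
-- is rejected by the parser:
CREATE TRIGGER trg AFTER INSERT ON t
BEGIN
  WITH one(v) AS (SELECT 1)
  UPDATE t SET x = (SELECT v FROM one);
END;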

I need an oracle sql extract

So I am new to Oracle SQL and need a data extract.
I have a table with information about customers (customers); they can have multiple emails (emailaddresses), which can have multiple usages (usages).
At the moment I have something like this:
Select emailaddresses.email as primary, customer.uid as customerUid,
emailaddresses.email as workmail
Join emailaddresses on emailaddresses.parentid = customer.id
Join usages on usages.parent_id = emailaddresses.id .... -- here I am stuck
workmail: (where usage.usagetype = 'work';)
and primary: (where usage.usagetype = 'primary';)
-- now the issue is, I don't know how to select both work mails and primary mails into this extract for one and the same customer. (And customer uid and id are not the same; I did not invent it and I cannot change it. I just need an extract.)
My tables and columns:
customer
  uid (int)
  id (varchar)
usages
  parent_id (int) -- links to emailaddresses.id
  customer_id (varchar) -- links to customer.id
  usagetype (varchar)
emailaddresses
  id (int)
  parentid (varchar) -- links to customer.id
  email (varchar)
My expected outcome:
customeruid   primary            workmail
-----------   ----------------   --------------------
01234         example#mail.com   example#workmail.com
01235         mail#mail.com      example#work.com
01236         mail1#mail2.com    mail#work2.com
One way you could do this is to use LISTAGG, as follows:
select customer_id, listagg(email_id, ', ') within group (order by email_id) as emails
from (
  select 1 as customer_id, 'hk#gmail.com' as email_id, 'primary' as usagetype from dual union
  select 1 as customer_id, 'hk#tmail.com' as email_id, 'work' as usagetype from dual union
  select 2 as customer_id, 'tt#tmail.com' as email_id, 'work' as usagetype from dual
) group by customer_id;
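Adapted to the tables in the question, the same pattern might look like this (an untested sketch; note it returns one concatenated column per customer rather than separate primary/workmail columns):
select c.uid as customeruid,
       listagg(ea.email, ', ') within group (order by u.usagetype) as emails
from customer c
join emailaddresses ea on ea.parentid = c.id
join usages u on u.parent_id = ea.id
where u.usagetype in ('primary', 'work')
group by c.uid;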
This should do:
SELECT *
FROM (SELECT CST.UID
, EA.EMAIL
, USG.USAGETYPE
FROM EMAILADDRESSES EA
, CUSTOMER CST
, USAGES USG
WHERE EA.PARENTID = CST.ID
AND EA.ID = USG.PARENT_ID
AND USG.USAGETYPE IN ('work','primary'))
PIVOT (MAX(EMAIL) FOR USAGETYPE IN ('work','primary'));
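If you want the output columns named like the expected result, you can alias the pivoted values (a sketch; PRIMARY is a reserved word in Oracle, so a different column name is used):
SELECT *
FROM (SELECT CST.UID AS CUSTOMERUID
           , EA.EMAIL
           , USG.USAGETYPE
      FROM EMAILADDRESSES EA
      JOIN CUSTOMER CST ON EA.PARENTID = CST.ID
      JOIN USAGES USG ON USG.PARENT_ID = EA.ID
      WHERE USG.USAGETYPE IN ('work','primary'))
PIVOT (MAX(EMAIL) FOR USAGETYPE IN ('primary' AS PRIMARY_MAIL, 'work' AS WORKMAIL));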

How to display (recursive) data-set in a particular manner?

My brain may not be working today... but I'm trying to get a dataset to be arranged in a particular way. It's easier to show what I mean.
I have a dataset like this:
CREATE TABLE #EXAMPLE (
ID CHAR(11)
, ORDER_ID INT
, PARENT_ORDER_ID INT
);
INSERT INTO #EXAMPLE VALUES
('27KJKR8K3TP', 19517, 0)
, ('27KJKR8K3TP', 10615, 0)
, ('27KJKR8K3TP', 83364, 19517)
, ('27KJKR8K3TP', 96671, 10615)
, ('TXCMK9757JT', 92645, 0)
, ('TXCMK9757JT', 60924, 92645);
SELECT * FROM #EXAMPLE;
DROP TABLE #EXAMPLE;
The PARENT_ORDER_ID field refers back to other orders on the given ID. E.g. ID TXCMK9757JT has order 60924 which is a child order of 92645, which is a separate order on the ID. The way I need this dataset to be arranged is like this:
CREATE TABLE #EXAMPLE (
ID CHAR(11)
, ORDER_ID INT
, CHILD_ORDER_ID INT
);
INSERT INTO #EXAMPLE VALUES
('27KJKR8K3TP', 19517, 19517)
, ('27KJKR8K3TP', 19517, 83364)
, ('27KJKR8K3TP', 10615, 10615)
, ('27KJKR8K3TP', 10615, 96671)
--, ('27KJKR8K3TP', 83364, 83364)
--, ('27KJKR8K3TP', 96671, 96671)
, ('TXCMK9757JT', 92645, 92645)
, ('TXCMK9757JT', 92645, 60924)
--, ('TXCMK9757JT', 60924, 60924)
;
SELECT * FROM #EXAMPLE;
DROP TABLE #EXAMPLE;
In this arrangement of the data set, instead of PARENT_ORDER_ID field there is CHILD_ORDER_ID, which basically lists every single ORDER_ID falling under a given ORDER_ID, including itself. I ultimately would like to have the CHILD_ORDER_ID field be the key for the data set, having only unique values (so that's why I've commented out the CHILD_ORDER_IDs that would only contain themselves, because they have a parent order ID which already contains them).
Any advice on how to achieve the described transformation of the data set would be greatly appreciated! I've tried recursive CTEs and different join statements but I'm not quite getting what I want. Thank you!
You can use a recursive CTE first; that will give you a result showing the full ID hierarchy. Then use a CASE WHEN expression to apply the logic:
;WITH CTE AS (
SELECT ID,ORDER_ID,PARENT_ORDER_ID
FROM #EXAMPLE
WHERE PARENT_ORDER_ID = 0
UNION ALL
SELECT c.Id,e.ORDER_ID,e.PARENT_ORDER_ID
FROM CTE c
INNER JOIN #EXAMPLE e
ON c.ORDER_ID = e.PARENT_ORDER_ID AND c.Id = e.Id
)
SELECT ID,
(CASE WHEN PARENT_ORDER_ID = 0 THEN ORDER_ID ELSE PARENT_ORDER_ID END) ORDER_ID,
ORDER_ID CHILD_ORDER_ID
FROM CTE
ORDER BY ID
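With the sample data above, this should return the requested arrangement (row order within an ID may vary; the sample hierarchy is only two levels deep, so mapping each row to its immediate parent is enough here):
ID            ORDER_ID   CHILD_ORDER_ID
-----------   --------   --------------
27KJKR8K3TP   19517      19517
27KJKR8K3TP   19517      83364
27KJKR8K3TP   10615      10615
27KJKR8K3TP   10615      96671
TXCMK9757JT   92645      92645
TXCMK9757JT   92645      60924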

SQL query with GROUP BY and HAVING

I have a little problem in my project: I'm trying to write a query on a single table but I'm not succeeding.
The table is this:
CREATE TABLE PARTITA(
COD_SFIDA VARCHAR (20) PRIMARY KEY,
DATA_P DATE NOT NULL,
RISULTATO CHAR (3) NOT NULL,
COD_DECK_IC VARCHAR (15),
COD_DECK_FC VARCHAR (15),
COD_EVT VARCHAR (15),
TAG_USR_IC VARCHAR (15),
TAG_USR_FC VARCHAR (15),
CONSTRAINT CHECK_RISULTATO CHECK (RISULTATO='0-0' OR RISULTATO='0-1' OR RISULTATO='1-0' OR RISULTATO='1-1'),
CONSTRAINT FK8 FOREIGN KEY (COD_DECK_IC, TAG_USR_IC) REFERENCES DECK (COD_DECK, TAG_USR) ON DELETE CASCADE,
CONSTRAINT FK17 FOREIGN KEY (COD_DECK_FC, TAG_USR_FC) REFERENCES DECK (COD_DECK, TAG_USR) ON DELETE CASCADE,
CONSTRAINT FK9 FOREIGN KEY (COD_EVT) REFERENCES TORNEO (COD_EVENTO) ON DELETE CASCADE
);
I would like to view the most used deck of each user.
This is the query I tried:
SELECT P.COD_DECK_FC, P.TAG_USR_FC, COUNT(P.COD_DECK_FC)
FROM PARTITA P
GROUP BY P.TAG_USR_FC, P.COD_DECK_FC
UNION
SELECT P.COD_DECK_IC, P.TAG_USR_IC, COUNT(P.COD_DECK_IC)
FROM PARTITA P
GROUP BY P.TAG_USR_IC, P.COD_DECK_IC
/
But I would like to see just the most used deck of each user, not all the decks and how many times users used them.
How can I do that?
I would like the query to show, for each user, the tag_usr and the cod_deck that is used the most.
eg:
cod_deck tag_usr count(cod_deck)
------------- ----------- --------------
1 A1BE2 5
2 AE3NF 6
5 FNKJD 3
Instead, the previous query returns:
cod_deck tag_usr count(cod_deck)
------------- ----------- --------------
1 A1BE2 5
2 AE3NF 6
5 FNKJD 3
2 A1BE2 2
1 AE3NF 3
I just want the query to show the users A1BE2 and AE3NF just once, because the query has to select the most used deck of each user.
You don't want to select a field that you're counting. Try something like this:
SELECT P.COD_DECK_FC, P.TAG_USR_FC, COUNT(P.COD_SFIDA)
FROM PARTITA P
GROUP BY P.COD_DECK_FC, P.TAG_USR_FC
UNION
SELECT P.COD_DECK_IC, P.TAG_USR_IC, COUNT(P.COD_SFIDA)
FROM PARTITA P
GROUP BY P.COD_DECK_IC, P.TAG_USR_IC
That will list all of the combinations of COD_DECK_FC and TAG_USR_FC and the number of times each appears in the table, and then do the same with COD_DECK_IC and TAG_USR_IC. It's not clear to me from your question exactly what you want, but I know that you shouldn't put a field in COUNT if you're selecting it.
If I understand correctly, you need a subquery with a ranking function:
with t as (
  select *, row_number() over (partition by tag_usr order by cnt desc) seq
  from (<union query>)  -- alias the user and count columns as tag_usr and cnt
)
select *
from t
where seq = 1;
I think you want this:
with ct as (
select P.COD_DECK_FC as deck, P.TAG_USR_FC as usr, COUNT(P.COD_DECK_FC) as cnt
from partita p
group by P.TAG_USR_FC, P.COD_DECK_FC
union all
select P.COD_DECK_IC, P.TAG_USR_IC, COUNT(P.COD_DECK_IC)
from partita P
group by P.TAG_USR_IC, P.COD_DECK_IC
)
select ct.*
from (select ct.*,
row_number() over (partition by usr order by cnt desc) as seqnum
from ct
) ct
where seqnum = 1;
You can also shorten this using grouping sets:
select p.*
from (select coalesce(P.COD_DECK_FC, P.COD_DECK_IC) as deck,
coalesce(P.TAG_USR_FC, P.TAG_USR_IC) as usr,
count(*) as cnt,
row_number() over (partition by coalesce(P.TAG_USR_FC, P.TAG_USR_IC) order by count(*) desc) as seqnum
from partita p
group by grouping sets ( (P.TAG_USR_FC, P.COD_DECK_FC), (P.TAG_USR_IC, P.COD_DECK_IC) )
) p
where seqnum = 1;
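The coalesce() calls work because under grouping sets each output row exposes only the columns of its active grouping set; columns from the other set come back as NULL. A toy illustration (hypothetical table t with columns a and b):
select coalesce(a, b) as k, count(*) as cnt
from t
group by grouping sets ( (a), (b) );
-- every row of t is aggregated twice: once grouped by a (b is NULL)
-- and once grouped by b (a is NULL)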

Improve view performance

I need to improve my view's performance. Right now the SQL that makes the view is:
select tr.account_number , tr.actual_collection_trx_date ,s.customer_key
from fct_collections_trx tr,
stg_scd_customers_key s
where tr.account_number = s.account_number
and trunc(tr.actual_collection_trx_date) between s.start_date and s.end_date;
Table fct_collections_trx has ~170k records (changes every day).
Table stg_scd_customers_key has 430 million records.
Table fct_collections_trx has a single unique index over all of (ACCOUNT_NUMBER, SUB_ACCOUNT_NUMBER, ACTUAL_COLLECTION_TRX_DATE, COLLECTION_TRX_DATE, COLLECTION_ACTION_CODE), plus a normal index on ENTRY_SCHEMA_DATE. DDL:
alter table stg_admin.FCT_COLLECTIONS_TRX
add primary key (ACCOUNT_NUMBER, SUB_ACCOUNT_NUMBER, ACTUAL_COLLECTION_TRX_DATE, COLLECTION_TRX_DATE, COLLECTION_ACTION_CODE)
using index
tablespace STG_COLLECTION_DATA
pctfree 10
initrans 2
maxtrans 255
storage
(
initial 80K
next 1M
minextents 1
maxextents unlimited
);
Table structure:
create table stg_admin.FCT_COLLECTIONS_TRX
(
account_number NUMBER(10) not null,
sub_account_number NUMBER(5) not null,
actual_collection_trx_date DATE not null,
customer_key NUMBER(10),
sub_account_key NUMBER(10),
schema_key VARCHAR2(10) not null,
collection_group_code CHAR(3),
collection_action_code CHAR(3) not null,
action_order NUMBER,
bucket NUMBER(5),
collection_trx_date DATE not null,
days_into_cycle NUMBER(5),
logical_delete_date DATE,
balance NUMBER(10,2),
abbrev CHAR(8),
customer_status CHAR(2),
sub_account_status CHAR(2),
entry_schema_date DATE,
next_collection_action_code CHAR(3),
next_collectin_trx_date DATE,
reject_key NUMBER(10) not null,
dwh_update_date DATE,
delta_type VARCHAR2(1)
)
Table stg_scd_customers_key has a single index over all of (ACCOUNT_NUMBER, START_DATE, END_DATE). DDL:
create unique index stg_admin.STG_SCD_CUST_KEY_PKP on stg_admin.STG_SCD_CUSTOMERS_KEY (ACCOUNT_NUMBER, START_DATE, END_DATE);
This table is also partitioned:
partition by range (END_DATE)
(
partition SCD_CUSTOMERS_20081103 values less than (TO_DATE(' 2008-11-04 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN'))
tablespace FCT_CUSTOMER_SERVICES_DATA
pctfree 10
initrans 1
maxtrans 255
storage
(
initial 8M
next 1M
minextents 1
maxextents unlimited
)
Table structure:
create table stg_admin.STG_SCD_CUSTOMERS_KEY
(
customer_key NUMBER(18) not null,
account_number NUMBER(10) not null,
start_date DATE not null,
end_date DATE not null,
curr_ind NUMBER(1) not null
)
I can't add a filter on the big table (I need the whole range of dates) and I can't use a materialized view. This query runs for about 20-40 minutes; I have to make it faster.
I've already tried dropping the trunc(); it makes no difference.
Any suggestions?
Explain plan: (screenshot from the question, not reproduced here)
First, write the query using explicit join syntax:
select tr.account_number , tr.actual_collection_trx_date ,s.customer_key
from fct_collections_trx tr join
stg_scd_customers_key s
on tr.account_number = s.account_number and
trunc(tr.actual_collection_trx_date) between s.start_date and s.end_date;
You already have appropriate indexes for the customers table. You can try an index on fct_collections_trx(account_number, trunc(actual_collection_trx_date), actual_collection_trx_date). Oracle might find this useful for the join.
However, if you are looking for a single match, then I wonder if there is another approach that might work. How does the following query perform?
select tr.account_number , tr.actual_collection_trx_date,
(select min(s.customer_key) keep (dense_rank first order by s.start_date desc)
from stg_scd_customers_key s
where tr.account_number = s.account_number and
tr.actual_collection_trx_date >= s.start_date
) as customer_key
from fct_collections_trx tr ;
This query is not exactly the same as the original query, because it is not doing any filtering -- and it is not checking the end date. Sometimes, though, this phrasing can be more efficient.
Also, I think the trunc() is unnecessary in this case, so an index on stg_scd_customers_key(account_number, start_date, customer_key) is optimal.
The expression min(x) keep (dense_rank first order by) essentially does first() -- it gets the first element in a list. Note that the min() isn't important; max() works just as well. So, this expression is getting the first customer key that meets the conditions in the where clause. I have observed that this function is quite fast in Oracle, and often faster than other methods.
If the start and end dates have no time elements (i.e. they both default to midnight), then you could do:
select tr.account_number , tr.actual_collection_trx_date ,s.customer_key
from fct_collections_trx tr,
stg_scd_customers_key s
where tr.account_number = s.account_number
and tr.actual_collection_trx_date >= s.start_date
and tr.actual_collection_trx_date < s.end_date + 1;
On top of that, you could add an index to each table, containing the following columns:
for fct_collections_trx: (account_number, actual_collection_trx_date)
for stg_scd_customers_key: (account_number, start_date, end_date, customer_key)
That way, the query should be able to use the indexes rather than having to go to the table as well.
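For reference, the DDL for those two covering indexes might look like this (the index names are made up):
create index fct_coll_trx_acct_date_ix
  on fct_collections_trx (account_number, actual_collection_trx_date);

create index stg_scd_cust_cover_ix
  on stg_scd_customers_key (account_number, start_date, end_date, customer_key);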
I suggest an index based on the most selective fields, in your case:
START_DATE, END_DATE
Try reversing the existing index (or adding a proper one) as
START_DATE, END_DATE, ACCOUNT_NUMBER
on table stg_scd_customers_key.
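A sketch of that suggestion as DDL (the index name is made up):
create index stg_scd_cust_date_first_ix
  on stg_scd_customers_key (start_date, end_date, account_number);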