Finding the current status of each phone number in PostgreSQL - sql

Terminology: msisdn = phone number
First I'd like to apologize for the names. This database schema was created using the squeryl ORM and it has some interesting table definition choices. I've included the two relevant tables below.
Basically, "Record" contains provisioning requests, and "Provisioning" holds each attempt at a Record. Some Records are attempted multiple times.
create table "Provisioning" (
"record_id" bigint not null,
"responseCode" varchar(128),
"response" varchar(128),
"id" bigint primary key not null,
"status" integer not null,
"actionRequired" boolean not null
);
create sequence "s_Provisioning_id";
create table "Record" (
"source" varchar(128) not null,
"timestamp" varchar(128) not null,
"finalState" varchar(128) not null,
"fromMSISDN" varchar(128) not null,
"reason" varchar(128) not null,
"id" bigint primary key not null,
"toICCID" varchar(128) not null,
"fromICCID" varchar(128) not null,
"toMSISDN" varchar(128) not null,
"batch_id" bigint not null,
"action" varchar(128) not null,
"transactionId" varchar(128) not null
);
I realize the Provisioning has no timestamp. Yes the latest id is the latest request. The developer working on it forgot to put in timestamps, the project manager convinced them it wasn't a part of the original requirements and then the client didn't want to pay to add it in later. No I'm not happy about it. No there's nothing I can do about it. Yes I hate working for a consulting company. Moving on.
The Problem: I need a report that gives me the latest state of each phone number (msisdn). There can be multiple records for each phone number. In the case of from/toMSISDN, the toMSISDN should always be used unless it is empty, in which case the from is used. The following query gets all the unique phone numbers that are in Record:
SELECT
CASE
WHEN "toMSISDN" = '' OR "toMSISDN" IS NULL THEN "fromMSISDN"
ELSE "toMSISDN"
END AS msisdn
FROM "Record"
GROUP BY msisdn
So that gives me the set of all the numbers I need to report on. Now I need the very latest Record and Provisioning. I can get the latest Provisioning with the following:
SELECT
max(p.id) latest_provision_id,
p.record_id
FROM "Provisioning" p
LEFT JOIN "Record" r ON r.id = p.record_id
group by p.record_id
So this gives me a 1-to-1 table of each record and what its latest provisioning is. And this is where I start to get stuck. From the Provisioning table I need the response and responseCode for the latest Provisioning. I thought about just adding max(p."responseCode") to the query, but then I realized it would most likely do an alphabetic comparison and not get the correct responseCode/response for the appropriate Provisioning.id. I tried adding those fields to the Group By, but then I started getting extra records in my queries and I wasn't quite sure what was going on.
This (very ugly subquery join) seems to give me the correct record row and provisioning row information, but it's for every record and I need to get the latest (max provisioning id) for each msisdn/phone number (computed field). I'm not sure what to group by and what aggregate functions to use.
SELECT *,
CASE
WHEN "toMSISDN" = '' OR "toMSISDN" IS NULL THEN "fromMSISDN"
ELSE "toMSISDN"
END AS msisdn
FROM (
SELECT
max(p.id) latest_provision_id,
p.record_id
FROM "Provisioning" p
LEFT JOIN "Record" r ON r.id = p.record_id
group by p.record_id
) latest_prov
LEFT JOIN "Provisioning" p2 ON p2.id=latest_prov.latest_provision_id
LEFT JOIN "Record" r2 ON r2.id=latest_prov.record_id
I can't seem to think of a clean way of doing this without running multiple queries and dealing with the results in the application layer.
I was also originally going to do this as a Scala app using the same squeryl ORM, but the query just got considerably more complicated and I stopped around the following statement, opting instead to do the reports as a Python application:
def getSimSnapshot() = {
join(record,provisioning.leftOuter)((r,p) =>
groupBy(r.fromMSISDN)
compute(max(r.id),r.finalState,r.fromMSISDN,r.reason,r.action)
on(r.id === p.map(_.record_id))
)
}
If there's an easier way to do this with the ORM I'm all for it.

Check out Window Functions: http://www.postgresql.org/docs/8.4/static/tutorial-window.html
Without joins you can get latest data for record_id:
select *
from
(
select p.record_id, p."responseCode", p."response", p.id, max(p.id) over (partition by p.record_id) max_id
from "Provisioning" p
) latest
where id = max_id
It can be a problem only if "Provisioning" also contains record_ids that belong to tables other than "Record".
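The window-function idea can be extended from "latest per record_id" to "latest per phone number" by partitioning on the computed msisdn instead. The following is only a sketch under stated assumptions: it runs against an in-memory SQLite database as a stand-in for PostgreSQL (the window syntax used is the same in both), the tables are trimmed to the relevant columns, and the sample rows are invented. COALESCE(NULLIF(...)) is used as a shorter equivalent of the CASE expression in the question.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL; tables are trimmed to the columns that matter.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "Record" (id INTEGER PRIMARY KEY, "fromMSISDN" TEXT NOT NULL, "toMSISDN" TEXT NOT NULL);
CREATE TABLE "Provisioning" (id INTEGER PRIMARY KEY, record_id INTEGER NOT NULL,
                             "responseCode" TEXT, "response" TEXT);
-- Hypothetical sample data: records 1 and 2 both resolve to msisdn 555-0001.
INSERT INTO "Record" VALUES (1, '555-0001', ''), (2, '555-0009', '555-0001'), (3, '555-0002', '');
INSERT INTO "Provisioning" VALUES
  (10, 1, '500', 'failed'), (11, 1, '200', 'ok'),
  (12, 2, '404', 'not found'),
  (13, 3, '200', 'ok'), (14, 3, '503', 'busy');
""")

# Rank every provisioning attempt within its msisdn, newest id first, and keep rank 1.
LATEST_PER_MSISDN = """
SELECT msisdn, record_id, provisioning_id, response_code, response
FROM (
    SELECT COALESCE(NULLIF(r."toMSISDN", ''), r."fromMSISDN") AS msisdn,
           r.id             AS record_id,
           p.id             AS provisioning_id,
           p."responseCode" AS response_code,
           p."response"     AS response,
           ROW_NUMBER() OVER (
               PARTITION BY COALESCE(NULLIF(r."toMSISDN", ''), r."fromMSISDN")
               ORDER BY p.id DESC
           ) AS rn
    FROM "Record" r
    JOIN "Provisioning" p ON p.record_id = r.id
) ranked
WHERE rn = 1
ORDER BY msisdn
"""

rows = conn.execute(LATEST_PER_MSISDN).fetchall()
for row in rows:
    print(row)
```

Because the response columns come from the same row as the winning id, there is no need for max() on the text columns. The inner join drops Records that have no Provisioning rows at all; a LEFT JOIN would keep them, with NULLs in the provisioning columns.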

Related

SQL - Query that returns the Username along with their total count of records

I'm new to the relational database stuff and I'm having a hard time understanding how to write a query to do what I want. I have two tables that have a relationship.
CREATE TABLE DocumentGroups (
id INTEGER PRIMARY KEY AUTOINCREMENT,
comments TEXT,
Username TEXT NOT NULL
)
CREATE TABLE Documents (
id INTEGER PRIMARY KEY,
documentGroupId INT NOT NULL,
documentTypeId INT NOT NULL,
documentTypeName TEXT NOT NULL,
succesfullyUploaded BIT
)
I would like to query the Documents table and get the record count for each username. Here is the query that I came up with:
SELECT Count(*)
FROM DOCUMENTS
JOIN DocumentGroups ON Documents.documentGroupId=DocumentGroups.id
GROUP BY Username
I currently have 2 entries in the Documents table, 1 from each user. This query prints out:
[{Count(*): 1}, {Count(*): 1}]
This looks correct, but is there any way for me to get the username associated with each count? Right now I have no way of knowing which count belongs to which user.
You are almost there. Your query already produces one row per user name (that's your group by clause). All that is left to do is to put that column in the select clause as well:
select dg.username, count(*) cnt
from documents d
join documentgroups dg on d.documentgroupid = dg.id
group by dg.username
Side notes:
table aliases make the queries easier to read and write
in a multi-table query, always qualify all columns with the (alias of) table they belong to
you probably want to alias the result of count(*), so it is easier to consume it from your application
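To see what the aliased columns look like from the application side, here is a small sketch using Python's built-in sqlite3 module (the AUTOINCREMENT in the question suggests SQLite, but that is an assumption). The user names and document rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets each row be converted to a dict keyed by column alias
conn.executescript("""
CREATE TABLE DocumentGroups (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    comments TEXT,
    Username TEXT NOT NULL
);
CREATE TABLE Documents (
    id INTEGER PRIMARY KEY,
    documentGroupId INT NOT NULL,
    documentTypeId INT NOT NULL,
    documentTypeName TEXT NOT NULL,
    succesfullyUploaded BIT
);
-- Hypothetical rows: alice owns two documents, bob owns one.
INSERT INTO DocumentGroups (id, comments, Username) VALUES (1, NULL, 'alice'), (2, NULL, 'bob');
INSERT INTO Documents VALUES (1, 1, 7, 'invoice', 1), (2, 1, 8, 'receipt', 1), (3, 2, 7, 'invoice', 0);
""")

# Same query as above: the grouped column is selected next to the aliased count.
counts = [dict(row) for row in conn.execute("""
    SELECT dg.Username AS username, COUNT(*) AS cnt
    FROM Documents d
    JOIN DocumentGroups dg ON d.documentGroupId = dg.id
    GROUP BY dg.Username
    ORDER BY dg.Username
""")]
print(counts)
```

Each result row now carries both the user name and its count, so the application no longer has to guess which count belongs to whom.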

PostgreSQL questions, constraints and queries

My task is to make a table that records the placement won by race car drivers competing in Race events.
The given schema is:
CREATE TABLE RaceEvent
(
Name text,
Year int,
);
CREATE TABLE Driver
(
Name text,
Date_of_birth date,
Gender char,
Nationality,
);
I then added the following constraints :
CREATE TABLE RaceEvent
(
RaceName text NOT NULL PRIMARY KEY,
Year int NOT NULL,
Description text NOT NULL
);
CREATE TABLE Driver
(
Name text NOT NULL,
Date_of_birth date NOT NULL PRIMARY KEY,
Gender char(1) NOT NULL,
Nationality text NOT NULL
);
The table I created looks like this :
CREATE TABLE Races
(
Medal char(6) CHECK (Medal = 'Gold' or Medal = 'Silver' or Medal = 'Bronze'),
Event text NOT NULL REFERENCES RaceEvent (RaceName),
DriverDOB date NOT NULL REFERENCES Driver (Date_of_birth)
);
I know using the date of birth as a primary key is very silly but for some reason that was part of the task.
I need to ensure a driver cannot gain multiple medals in the same race, can anybody give insight on a good way of doing this? I thought about using some sort of check but can't quite work it out.
After that, I need to write a query that can return the nationalities of drivers that won at least 2 gold medals in certain years, to figure out which nationalities seem to produce the best drivers. 2 versions of the same query, one using aggregation and one not.
I know I have to do something along these lines :
SELECT Nationality from Driver JOIN Races ON Driver.Date_of_Birth = Races.DriverDOB WHERE ....?
Not sure on what the best way of figuring out how to link the nationalities to the medals?
All feedback much appreciated
The "best" way to do it would be to restructure your schema, as right now it's pretty crap. I'm assuming you can't, so here's one way to prevent a driver from gaining multiple medals in the same race: add a composite primary key on (DriverDOB, Event) to the Races table.
Try it out here: http://sqlfiddle.com/#!17/dc8a9/1
As for the query to get the nationalities with multiple golds in a given year, here's one way to do it:
SELECT d.nationality, COUNT(*) AS golds
FROM races r
JOIN driver d
ON r.driverdob = d.date_of_birth
JOIN raceevent e
ON r.event = e.racename
AND e.year = 1999
WHERE r.medal = 'Gold'
GROUP BY d.nationality
HAVING COUNT(*) > 1;
Output:
nationality golds
NatA 3
NatB 2
And you can test it here: http://sqlfiddle.com/#!17/dc8a9/9
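The question also asked for a version without aggregation. One possible sketch, assuming the same interpretation as the query above (a nationality counts once it has at least two gold rows in the given year), is a self-join that looks for two different gold rows sharing a nationality. The snippet runs on in-memory SQLite rather than PostgreSQL, trims the tables to the columns it needs, and uses invented drivers and races; it also shows the composite primary key rejecting a second medal for the same driver in the same race.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE RaceEvent (RaceName TEXT NOT NULL PRIMARY KEY, Year INT NOT NULL);
CREATE TABLE Driver (Name TEXT NOT NULL, Date_of_birth DATE NOT NULL PRIMARY KEY,
                     Nationality TEXT NOT NULL);
CREATE TABLE Races (
    Medal TEXT CHECK (Medal IN ('Gold', 'Silver', 'Bronze')),
    Event TEXT NOT NULL REFERENCES RaceEvent (RaceName),
    DriverDOB DATE NOT NULL REFERENCES Driver (Date_of_birth),
    PRIMARY KEY (DriverDOB, Event)  -- one medal per driver per race
);
-- Hypothetical data: NatA wins two golds in 1999, NatB wins one.
INSERT INTO RaceEvent VALUES ('Race1', 1999), ('Race2', 1999), ('Race3', 1999);
INSERT INTO Driver VALUES ('Ann', '1980-01-01', 'NatA'), ('Bo', '1981-02-02', 'NatA'),
                          ('Cy', '1982-03-03', 'NatB');
INSERT INTO Races VALUES ('Gold', 'Race1', '1980-01-01'), ('Gold', 'Race2', '1981-02-02'),
                         ('Gold', 'Race3', '1982-03-03');
""")

# A second medal for the same driver in the same race violates the composite key.
try:
    conn.execute("INSERT INTO Races VALUES ('Silver', 'Race1', '1980-01-01')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# No GROUP BY / COUNT: pair every gold row with a different gold row of the same nationality.
no_aggregation = conn.execute("""
SELECT DISTINCT d1.Nationality
FROM Races r1
JOIN Driver d1    ON d1.Date_of_birth = r1.DriverDOB
JOIN RaceEvent e1 ON e1.RaceName = r1.Event
JOIN Races r2     ON r2.Medal = 'Gold'
JOIN Driver d2    ON d2.Date_of_birth = r2.DriverDOB
JOIN RaceEvent e2 ON e2.RaceName = r2.Event
WHERE r1.Medal = 'Gold'
  AND e1.Year = 1999 AND e2.Year = 1999
  AND d1.Nationality = d2.Nationality
  AND (r1.Event <> r2.Event OR r1.DriverDOB <> r2.DriverDOB)
ORDER BY d1.Nationality
""").fetchall()
print(duplicate_rejected, no_aggregation)
```

The last condition relies on (DriverDOB, Event) identifying a row in Races, which is exactly what the composite primary key guarantees.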

SQL algorithm with three identifiers from one table

This is probably really easy, but I can't figure out how to get the necessary values from my DB with one query. I'm going to run this query inside the CodeIgniter framework.
Table 'information' construction:
CREATE TABLE information (
planid int(11) NOT NULL,
production_nr int(11) NOT NULL,
status int(11) NOT NULL
);
Table 'information' content:
Necessary output:
I would like to get (ideally with a single query, but with multiple queries if that's not possible) all planids where ALL of that planid's production_nrs have status >= 3.
In this case, I would need to get planids 2 and 5, because for each of them ALL production_nrs have a status greater than or equal to 3.
select planid, production_nr
from information inf1
where not exists (select 1 from information inf2
where inf1.planid = inf2.planid
and status < 3)
You might consider amending the select clause (first row) according to your needs:
Add distinct (if the table PK includes status column)
Change the column list
Try this,
SELECT planid , production_nr FROM information
WHERE production_nr IN(SELECT production_nr FROM information) AND STATUS >=3
Your problem is known as relational division. There are basically two ways to approach it
1) Select each planid for which no production_nr with status < 3 exists
select planid
from information i1
where not exists (
select 1 from information i2
where i1.planid = i2.planid
and i2.status < 3
)
2) planid where the number of production_nr is equal to the number of production_nr with status >= 3. I'll leave that as an exercise ;-)
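For readers who want to check their attempt at the second approach, here is one possible sketch. It runs on in-memory SQLite via Python, and the sample rows are invented (the original table content was not preserved), chosen so that planids 2 and 5 qualify as in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE information (
    planid INT NOT NULL,
    production_nr INT NOT NULL,
    status INT NOT NULL
);
-- Hypothetical content: only plans 2 and 5 have every production_nr at status >= 3.
INSERT INTO information VALUES
    (1, 101, 3), (1, 102, 1),
    (2, 201, 3), (2, 202, 4),
    (3, 301, 2),
    (5, 501, 5);
""")

# Keep a planid when its total row count equals its count of rows with status >= 3.
plan_ids = [row[0] for row in conn.execute("""
    SELECT planid
    FROM information
    GROUP BY planid
    HAVING COUNT(*) = SUM(CASE WHEN status >= 3 THEN 1 ELSE 0 END)
    ORDER BY planid
""")]
print(plan_ids)
```

Since status is NOT NULL here, HAVING MIN(status) >= 3 would express the same condition more briefly.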

in SQL, best way to join first and last instance of child table without NOT EXISTS?

in PostgreSQL, have issue table and child issue_step table - an issue contains one or more steps.
the view issue_v pulls things from the issue and the first and last step: author and from_ts are pulled from the first step, while status and thru_ts are pulled from the last step.
the tables
create table if not exists seeplai.issue(
isu_id serial primary key,
subject varchar(240)
);
create table if not exists seeplai.issue_step(
stp_id serial primary key,
isu_id int not null references seeplai.issue on delete cascade,
status varchar(12) default 'open',
stp_ts timestamp(0) default current_timestamp,
author varchar(40),
notes text
);
the view
create view seeplai.issue_v as
select isu.*,
first.stp_ts as from_ts,
first.author as author,
first.notes as notes,
last.stp_ts as thru_ts,
last.status as status
from seeplai.issue isu
join seeplai.issue_step first on( first.isu_id = isu.isu_id and not exists(
select 1 from seeplai.issue_step where isu_id=isu.isu_id and stp_id<first.stp_id ) )
join seeplai.issue_step last on( last.isu_id = isu.isu_id and not exists(
select 1 from seeplai.issue_step where isu_id=isu.isu_id and stp_id>last.stp_id ) );
note1: issue_step.stp_id is guaranteed to be chronologically sequential, so using it instead of stp_ts because it's already indexed
this works, but ugly as sin, and cannot be the most efficient query in the world.
In this code, I use a sub-query to find the first and last step IDs, and then join to the two instances of the step table by using those found values.
SELECT ISU.*
,S1.STP_TS AS FROM_TS
,S1.AUTHOR AS AUTHOR
,S1.NOTES AS NOTES
,S2.STP_TS AS THRU_TS
,S2.STATUS AS STATUS
FROM SEEPLAI.ISSUE ISU
INNER JOIN
(
SELECT ISU_ID
,MIN(STP_ID) AS MIN_ID
,MAX(STP_ID) AS MAX_ID
FROM SEEPLAI.ISSUE_STEP
GROUP BY
ISU_ID
) SQ
ON SQ.ISU_ID = ISU.ISU_ID
INNER JOIN
SEEPLAI.ISSUE_STEP S1
ON S1.STP_ID = SQ.MIN_ID
INNER JOIN
SEEPLAI.ISSUE_STEP S2
ON S2.STP_ID = SQ.MAX_ID
Note: you really shouldn't use select * in a view. It is much better practice to explicitly list all the fields that the view needs.
Have you considered using window functions?
http://www.postgresql.org/docs/9.2/static/tutorial-window.html
http://www.postgresql.org/docs/9.2/static/functions-window.html
A starting point:
select steps.*,
first_value(steps.stp_id) over w as first_id,
last_value(steps.stp_id) over w as last_id
from issue_step steps
window w as (partition by steps.isu_id order by steps.stp_id
rows between unbounded preceding and unbounded following)
Btw, if you know the IDs in advance, you'll be much better off getting the details in a separate query. (Trying to fetch everything in one go will just yield sucky plans due to subqueries or joins on aggregates, which result in inefficiently considering/joining the entire tables together.)
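As a rough sketch of how the window-function starting point could be turned into the columns of issue_v: the snippet below uses in-memory SQLite in place of PostgreSQL, drops the seeplai schema prefix, and uses made-up issues and steps. The explicit frame matters: with ORDER BY and the default frame, last_value only sees rows up to the current one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE issue (isu_id INTEGER PRIMARY KEY, subject TEXT);
CREATE TABLE issue_step (
    stp_id INTEGER PRIMARY KEY,
    isu_id INT NOT NULL REFERENCES issue,
    status TEXT DEFAULT 'open',
    stp_ts TEXT,
    author TEXT,
    notes TEXT
);
-- Hypothetical data: issue 1 has three steps, issue 2 has a single step.
INSERT INTO issue VALUES (1, 'login fails'), (2, 'typo');
INSERT INTO issue_step VALUES
    (1, 1, 'open',    '2024-01-01 09:00:00', 'ann', 'cannot log in'),
    (2, 2, 'open',    '2024-01-01 10:00:00', 'bob', 'typo on home page'),
    (3, 1, 'working', '2024-01-02 09:00:00', 'cy',  'investigating'),
    (4, 1, 'closed',  '2024-01-03 09:00:00', 'cy',  'fixed');
""")

# first_value/last_value over the whole partition; DISTINCT collapses the per-step duplicates.
rows = conn.execute("""
    SELECT DISTINCT isu.isu_id, isu.subject,
           FIRST_VALUE(s.stp_ts) OVER w AS from_ts,
           FIRST_VALUE(s.author) OVER w AS author,
           FIRST_VALUE(s.notes)  OVER w AS notes,
           LAST_VALUE(s.stp_ts)  OVER w AS thru_ts,
           LAST_VALUE(s.status)  OVER w AS status
    FROM issue isu
    JOIN issue_step s ON s.isu_id = isu.isu_id
    WINDOW w AS (PARTITION BY s.isu_id ORDER BY s.stp_id
                 ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
    ORDER BY isu.isu_id
""").fetchall()
for row in rows:
    print(row)
```

Whether this beats the MIN/MAX subquery join depends on the data and the planner, so it is worth comparing both with EXPLAIN ANALYZE on the real tables.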

SQL Database Design for SSIS

OK my first question so here goes.
Currently users are using a huge Access Application. They wanted a web application with some functionality based off of the Access data and with some modifications.
Ok no problem. I used the Access to SQL migration assistant to convert the data over and then wrote some SSIS packages which are executed from the web end to allow the application to be updated as needed. All here is good.
Here is where I am kind of stumped. There are 2 types of imports, quarterly and yearly. The quarterly is fine but the yearly import is causing issues. The yearly import can be for an adopted budget or for a proposed budget (each is held in a separate Access db). I have one SSIS package for each type of yearly import. The table where the information goes is as follows..
CREATE TABLE Budget
(
BudgetID uniqueidentifier NOT NULL,
ProjectNumber int NOT NULL,
SubProjectNumber varchar(6) NOT NULL,
FiscalYearBegin int NOT NULL,
FiscalYearEnd int NOT NULL,
Sequence int NULL,
QuarterImportDate datetime NULL,
ProposedBudget money NULL,
AdoptedBudget money NULL,
CONSTRAINT PK_Budget PRIMARY KEY CLUSTERED
(
BudgetID ASC
),
CONSTRAINT uc_Budget UNIQUE NONCLUSTERED
(
ProjectNumber ASC,
SubProjectNumber ASC,
FiscalYearBegin ASC,
FiscalYearEnd ASC,
Sequence ASC
)
)
Also, there can be multiple versions of the budget for a specific year in terms of Project, SubProject, FiscalYearBegin, and FiscalYearEnd. That is why there is a sequence number.
So the problem becomes, since I have 2 different SSIS packages, each of which is an update statement on 1 specific column (either ProposedBudget or AdoptedBudget), I have no effective way of keeping track of the correct sequence.
Please let me know if I can make this any clearer, and any advice would be great!
Thanks.
Something like this will get you the next item with an empty AdoptedBudget, but I think you will need a cursor when there are multiple AdoptedBudgets. I was thinking of doing a nested subquery with an update, but that won't work when there are multiple AdoptedBudgets. It sounds like in the source application they should be selecting a ProposedBudget whenever they add the AdoptedBudget so that a relationship can be created. This way it is clear which AdoptedBudget goes with which ProposedBudget, and it would be a simple join. I have almost the same scenario, but the difference is that I don't keep all the versions. I only have to keep the most current "ProposedBudget" and most current "AdoptedBudget". It's a little bit more difficult trying to sequence them all.
--get the smallest SequenceId with an unfilled AdoptedBudget
Select min(SequenceID),
ProjectNumber,
FiscalYearBegin,
SubProjectNumber --any other fields needed for the join
From Budgets b
Where AdoptedBudget is null
Group By
ProjectNumber,
FiscalYearBegin,
SubProjectNumber --any other fields needed for the join
--This won't work I don't believe
Update Budgets
Set AdoptedBudget = BudgetAmount
From Budgets b
Inner Join SourceAdoptedBudgets ab on
b.ProjectNumber = ab.ProjectNumber
and b.FiscalYearBegin = ab.FiscalYearBegin
and b.FiscalYearEnd = ab.FiscalYearEnd
Inner Join
(
--get the smallest SequenceId with an unfilled AdoptedBudget
Select min(SequenceID),
ProjectNumber,
FiscalYearBegin,
SubProjectNumber --any other fields needed for the join
From Budgets b
Where AdoptedBudget is null
Group By
ProjectNumber,
FiscalYearBegin,
SubProjectNumber --any other fields needed for the join
) as nextBudgets
on --the join fields again
Something like this, using a BudgetType column. Of course you'd probably create a code table for these, or an IsAdopted bit field. But you get the idea.
Select
budgets.*
,row_number() over(partition by
ProjectNumber
,SubProjectNumber
,FiscalYearBegin
,FiscalYearEnd
order by QuarterImportDate) as SequenceNumber
From
(
Select
ProjectNumber
,SubProjectNumber
,FiscalYearBegin
,FiscalYearEnd
,QuarterImportDate
,'Proposed' as BudgetType
,ProposedBudget as Budget
From sourceProposed
Union
Select
ProjectNumber
,SubProjectNumber
,FiscalYearBegin
,FiscalYearEnd
,QuarterImportDate
,'Adopted' as BudgetType
,AdoptedBudget as Budget
From sourceAdopted
) as budgets
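To make the numbering behaviour concrete, here is a small runnable sketch of the same idea on in-memory SQLite via Python rather than SQL Server. The sourceProposed and sourceAdopted layouts and all rows are assumptions for illustration, and UNION ALL is used instead of UNION so that identical rows are not silently merged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical staging tables, one per yearly import type.
CREATE TABLE sourceProposed (ProjectNumber INT, SubProjectNumber TEXT, FiscalYearBegin INT,
                             FiscalYearEnd INT, QuarterImportDate TEXT, ProposedBudget NUMERIC);
CREATE TABLE sourceAdopted  (ProjectNumber INT, SubProjectNumber TEXT, FiscalYearBegin INT,
                             FiscalYearEnd INT, QuarterImportDate TEXT, AdoptedBudget NUMERIC);
INSERT INTO sourceProposed VALUES (100, 'A', 2023, 2024, '2023-01-15', 5000),
                                  (100, 'A', 2023, 2024, '2023-04-15', 5500);
INSERT INTO sourceAdopted  VALUES (100, 'A', 2023, 2024, '2023-06-30', 5200);
""")

# Stack both sources, then number the versions per project/sub-project/fiscal year by import date.
sequenced = conn.execute("""
    SELECT BudgetType, Budget,
           ROW_NUMBER() OVER (
               PARTITION BY ProjectNumber, SubProjectNumber, FiscalYearBegin, FiscalYearEnd
               ORDER BY QuarterImportDate
           ) AS SequenceNumber
    FROM (
        SELECT ProjectNumber, SubProjectNumber, FiscalYearBegin, FiscalYearEnd,
               QuarterImportDate, 'Proposed' AS BudgetType, ProposedBudget AS Budget
        FROM sourceProposed
        UNION ALL
        SELECT ProjectNumber, SubProjectNumber, FiscalYearBegin, FiscalYearEnd,
               QuarterImportDate, 'Adopted' AS BudgetType, AdoptedBudget AS Budget
        FROM sourceAdopted
    ) AS budgets
    ORDER BY SequenceNumber
""").fetchall()
for row in sequenced:
    print(row)
```

With a single budget column plus a type, both packages can insert rows instead of updating separate columns, and the sequence falls out of the import date ordering.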