how to segment groups based on different criteria - sql

I'm trying to assign test and control groups to the rows of the table below, based on the values in columns A to F.
Eventually, I want a table that looks like the one below. If different zips have the same values in all columns, assign half of those zips to test and half to control. If the zips cannot be split evenly, give the extra zip to control.

You could use row_number() and mod():
select
    t.*,
    case when mod(
        row_number() over(partition by A, B, C, D, E, F order by zip),
        2
    ) = 0 then 'T' else 'C' end tc_group
from mytable t
row_number() assigns increasing numbers to records that share the same (A, B, C, D, E, F) values, ordered by increasing zip. Even row numbers go to the test group T and odd row numbers to the control group C, so when a set has an odd number of zips, the extra one lands in control.
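To check the odd/even rule against the "extra zip goes to control" requirement, here is a small sketch that runs the same idea through Python's sqlite3 module. The zips and column values are made up for illustration, `%` stands in for mod() (which not every SQLite build provides), and window functions need SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table mytable (zip text, A int, B int, C int, D int, E int, F int)")

# Hypothetical data: three zips share one combination, one zip stands alone.
rows = [
    ("10001", 1, 1, 1, 1, 1, 1),
    ("10002", 1, 1, 1, 1, 1, 1),
    ("10003", 1, 1, 1, 1, 1, 1),
    ("20001", 2, 2, 2, 2, 2, 2),
]
conn.executemany("insert into mytable values (?, ?, ?, ?, ?, ?, ?)", rows)

# Number the zips inside each (A..F) combination; even numbers -> T, odd -> C.
query = """
select zip,
       case when row_number() over (partition by A, B, C, D, E, F order by zip) % 2 = 0
            then 'T' else 'C' end as tc_group
from mytable
order by zip
"""
assignment = dict(conn.execute(query).fetchall())
print(assignment)
```

In this sample the set of three zips gets two C and one T, and the lone zip goes to C, which matches the rule stated in the question.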

I think a stratified sample will do what you want:
select t.*,
(case when mod(row_number() over (order by a, b, c, d, e, f), 2) = 1
then 'C' else 'T'
end) as test_group
from t;
This is not exactly how you phrased the question, but it should have the same effect of splitting rows with the same values in the columns evenly in the two groups. When there are odd numbers, sometimes the extra will go to test and sometimes to control.
It is unclear from the question whether you want balanced control and test groups, which is what I would expect. If you actually want the extra zip in every odd-sized set to go to control (as you suggest), then every set with a single zip ends up in control, and that seems biased to me.
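The bias concern can be seen on a toy case. The sketch below (Python's sqlite3, SQLite 3.25+ assumed, made-up data with four single-zip combinations, `%` in place of mod()) counts how many rows land in each group under a single global row_number() versus one partitioned by the six columns. The helper name `group_counts` is just for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (zip text, a int, b int, c int, d int, e int, f int)")
conn.executemany(
    "insert into t values (?, ?, ?, ?, ?, ?, ?)",
    [(f"z{i}", i, i, i, i, i, i) for i in range(1, 5)],  # four singleton combinations
)

def group_counts(over_clause):
    """Count rows per group when odd row numbers go to C and even ones to T."""
    sql = f"""
    select grp, count(*) from (
        select case when row_number() over ({over_clause}) % 2 = 1
                    then 'C' else 'T' end as grp
        from t
    ) group by grp
    """
    return dict(conn.execute(sql).fetchall())

stratified = group_counts("order by a, b, c, d, e, f")
partitioned = group_counts("partition by a, b, c, d, e, f order by zip")
print(stratified, partitioned)
```

With only singletons, the global numbering alternates between the groups, while the partitioned numbering restarts at 1 for every combination and sends all four rows to control.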

Related

How to select another column in a table if the first column doesn't have the value that I need

I have a table with two columns that I need to use at the moment (length and length_to_fault). If length has a null (N/A) value, I need to select the value from length_to_fault, and vice versa.
I also need to sort everything, and I can do it with one column like this:
select d.* from (select d.*, lead(length::float) over (partition by port_nbr, pair order by add_date) as next_length from diags d where length != 'N/A') d
This select handles everything except length_to_fault. If a record has its value in length_to_fault, it is ignored and doesn't show up.
Are there any suggestions?
Maybe it's possible to just merge these two columns into one? That sounds more logical. But how?
I changed it to select d.* from (select d.*, lead(sum(length::float + length_to_fault::float)) over (partition by port_nbr, pair order by d.add_date) as next_length from diags d)d
I get the error: column "d.ip" must appear in the GROUP BY clause or be used in an aggregate function.
I don't need the ip column, and I don't even know where to put it.

sql sum 2 column values and sort records by 3 columns

I have the columns ip, port, pair, pair_status, length, length_to_fault, add_date.
I need to sort everything by port, and each port has each pair (A, B, C, D) at least once. Once that is sorted, I need to sort further: each pair within its port.
Currently I have a select that does everything I need, but only with length.
I want to change this fragment so that if length = N/A it takes length_to_fault, and if length_to_fault = N/A it takes length. My idea is to combine these two columns into one. Each record has a value in only one of the columns (either length or length_to_fault). So far I have this:
Select d.*
from (select d.*, lead(length::float) over (partition by port_nbr, pair order by d.add_date) as next_length
from diags d
where length !='N/A'
) d
This works perfectly, but there are records that have N/A in length and their value in length_to_fault, so this select doesn't pick them up. Is there a way to edit this fragment to include length_to_fault too? Maybe I can sum these two columns into one? Also, length and length_to_fault are stored as chars in the database, so I must cast them to float in this select.
You can use a case expression:
Select d.*
from (select d.*,
lead( (case when length <> 'N/A' then length else length_to_fault end)::float) over (partition by port_nbr, pair order by d.add_date) as next_length
from diags d
) d
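For a quick sanity check of the case expression, here is a minimal sketch using Python's sqlite3 (SQLite 3.25+ assumed for lead()). The sample rows are invented, and because SQLite has no `::float` syntax the cast is written as `cast(... as real)`. The original query targets PostgreSQL, so treat this only as an illustration of the logic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table diags (port_nbr int, pair text, length text, "
    "length_to_fault text, add_date text)"
)
# Hypothetical rows: each record has a value in exactly one of the two columns.
conn.executemany(
    "insert into diags values (?, ?, ?, ?, ?)",
    [
        (1, "A", "10.5", "N/A", "2020-01-01"),
        (1, "A", "N/A", "7.0", "2020-01-02"),
        (1, "A", "12.0", "N/A", "2020-01-03"),
    ],
)

# Pick whichever column holds a value, cast it, then look one row ahead.
query = """
select add_date,
       lead(cast(case when length <> 'N/A' then length else length_to_fault end as real))
           over (partition by port_nbr, pair order by add_date) as next_length
from diags
order by add_date
"""
next_lengths = conn.execute(query).fetchall()
print(next_lengths)
```

The second row, whose value sits in length_to_fault, now shows up as the next_length of the first row instead of being filtered out.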

Access - SQL Query Date wise with selection of column summarized value

Below is my source data.
Using the query below, I can get summarized data for '17-09-2016'.
SQL query:
SELECT key_val.A, key_val.B, key_val.C, key_val.D, Sum(IIf(key_val.Store_date=#9/17/2016#,key_val.Val,0)) AS [17-09-2016]
FROM key_val
GROUP BY key_val.A, key_val.B, key_val.C, key_val.D;
But the output I am looking for is supposed to look like this.
Specifically, I need summarized data for columns a, b, c and for the date '17-09-2016'. In Excel we would apply a SUMIFS formula to get the desired output, but in Access SQL I don't see how to form the query to get the same data.
Can anyone help me achieve the above result with an Access query?
Specifically, I need summarized data for columns a, b, c and for the date '17-09-2016'
I'm not sure where you get the 34 figure from - the sum of the first two rows even though the values in A, B, C & D are different (so the grouping won't work)?
Making an assumption that you want the values summed where all the other fields are equal (A, B, C, D & Store_Date):
This query will give you the totals, but not in the format you're after:
SELECT A, B, C, D, SUM(val) As Total, Store_Date
FROM key_val
WHERE Store_date = #9/17/2016#
GROUP BY A,B,C,D, Store_Date
This SQL will give you the same, but for all dates (just remove the WHERE clause).
SELECT A, B, C, D, SUM(val) As Total, Store_Date
FROM key_val
GROUP BY A,B,C,D, Store_Date
ORDER BY Store_Date
This will give the exact table shown in your example:
TRANSFORM Sum(val) AS SumOfValue
SELECT A, B, C, D
FROM key_val
WHERE Store_date = #9/17/2016#
GROUP BY A,B,C,D,val
PIVOT Store_Date
Again, just remove the WHERE clause to list all dates in the table.

Oracle-Complex sql view creation

I have a table like below:
For each distinct combination of ID and VALUE, I have several steps. For example, for the combination of A and B, I have three steps (QC, LC and DR), and likewise for C and D. Now, I want a view like below:
That is, I want a column "OUTPUT" in the view that holds the first step after QC for each combination of ID and VALUE. For example, for A and B, the first step after QC is LC, so the OUTPUT value is LC. For C and D, there is no QC, so the OUTPUT value is NA.
Can anyone please help me with this issue?
Thanks in advance.
In SQL, tables are inherently unordered. So, you need a column to specify the ordering. Let me assume that you have such a column, say StepOrder in the table. If so, then you can do what you want using analytic functions.
The lead() in the innermost subquery returns the next step. The max() in the middle subquery picks the step that follows QC, and the outer max() spreads that value over all rows with the same id and value:
select id, value, step,
       coalesce(max(qc_next) over (partition by id, value), 'NA') as "Output"
from (select t.*,
             max(case when step = 'QC' then nextstep end) over (partition by id, value) as qc_next
      from (select t.*,
                   lead(step) over (partition by id, value order by StepOrder) as nextStep
            from table t
           ) t
     ) t
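As a rough way to try the idea without an Oracle instance, the sketch below runs a condensed version through Python's sqlite3 (SQLite 3.25+ assumed). The table name `steps`, the `step_order` column and the sample rows are assumptions made for this example, and the two max() levels are folded into one since a single windowed max() already covers every row of the partition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table steps (id text, value text, step text, step_order int)")
# Hypothetical data following the question: A/B has QC, LC, DR; C/D has no QC.
conn.executemany(
    "insert into steps values (?, ?, ?, ?)",
    [
        ("A", "B", "QC", 1), ("A", "B", "LC", 2), ("A", "B", "DR", 3),
        ("C", "D", "LC", 1), ("C", "D", "DR", 2),
    ],
)

# Inner query: the step that follows each row. Outer query: keep the one that
# follows QC, spread it over the whole (id, value) group, default to 'NA'.
query = """
select id, value, step,
       coalesce(max(case when step = 'QC' then next_step end)
                    over (partition by id, value), 'NA') as output
from (select s.*,
             lead(step) over (partition by id, value order by step_order) as next_step
      from steps s) s
order by id, value, step_order
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Every A/B row carries LC, and every C/D row falls back to NA because no QC step exists for that combination.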

Purposely having a query return blank entries at regular intervals

I want to write a query that returns 3 results followed by blank results followed by the next 3 results, and so on. So if my database had this data:
CREATE TABLE table (a integer, b integer, c integer, d integer);
INSERT INTO table (a,b,c,d)
VALUES (1,2,3,4),
(5,6,7,8),
(9,10,11,12),
(13,14,15,16),
(17,18,19,20),
(21,22,23,24),
(25,26,27,28);
I would want my query to return this
1,2,3,4
5,6,7,8
9,10,11,12
, , ,
13,14,15,16
17,18,19,20
21,22,23,24
, , ,
25,26,27,28
I need this to work for arbitrarily many selected rows, always grouped in threes like this.
I'm running PostgreSQL 8.3.
This should work flawlessly in PostgreSQL 8.3:
SELECT a, b, c, d
FROM (
SELECT rn, 0 AS rk, (x[rn]).*
FROM (
SELECT x, generate_series(1, array_upper(x, 1)) AS rn
FROM (SELECT ARRAY(SELECT tbl FROM tbl) AS x) x
) y
UNION ALL
SELECT generate_series(3, (SELECT count(*) FROM tbl), 3), 1, (NULL::tbl).*
ORDER BY rn, rk
) z
Major points
Works for a query that selects all columns of tbl.
Works for any table.
For selecting arbitrary columns you have to substitute (NULL::tbl).* with a matching number of NULL columns in the second query.
Assuming that NULL values are ok for "blank" rows.
If not, you'll have to cast your columns to text in the first and substitute '' for NULL in the second SELECT.
Query will be slow with very big tables.
If I had to do it, I would write a plpgsql function that loops through the results and inserts the blank rows. But you mentioned you had no direct access to the db ...
In short, no, there's not an easy way to do this, and generally, you shouldn't try. The database is concerned with what your data actually is, not how it's going to be displayed. It's not an appropriate scope of responsibility to expect your database to return "dummy" or "extra" data so that some down-stream process produces a desired output. The generating script needs to do that.
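If the consuming script can be touched at all, the padding is only a few lines there. Here is a minimal Python sketch of that idea. The function name `with_blank_rows` and its parameters are made up for illustration, and the rows are typed in by hand rather than fetched from a database.

```python
def with_blank_rows(rows, every=3, blank=("", "", "", "")):
    """Return rows with a blank row inserted after each full group of `every` rows.

    Blank rows only appear between data rows, never at the very end.
    """
    padded = []
    for index, row in enumerate(rows):
        if index > 0 and index % every == 0:
            padded.append(blank)
        padded.append(row)
    return padded

# The seven rows from the question, as they might come back from a plain SELECT.
data = [
    (1, 2, 3, 4), (5, 6, 7, 8), (9, 10, 11, 12),
    (13, 14, 15, 16), (17, 18, 19, 20), (21, 22, 23, 24),
    (25, 26, 27, 28),
]
padded = with_blank_rows(data)
for row in padded:
    print(",".join(str(value) for value in row))
```

This keeps the query itself a plain SELECT and leaves the "three rows, then a gap" rule where the output is produced.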
As you can't change your down-stream process, you could (read that with a significant degree of skepticism and disdain) add things like this:
Select Top 3
a, b, c, d
From
table
Union Select Top 1
'', '', '', ''
From
table
Union Select Top 3 Skip 3
a, b, c, d
From
table
Please, don't actually try to do that.
You can do it (at least on DB2 - there doesn't appear to be equivalent functionality for your version of PostgreSQL).
No looping needed, although there is a bit of trickery involved...
Please note that though this works, it's really best to change your display code.
The statement requires CTEs (although it can be rewritten to use other table references) and OLAP functions (I guess you could rewrite it to count() previous rows in a subquery, but...).
WITH dataList (rowNum, dataColumn) as (SELECT CAST(CAST(:interval as REAL) /
(:interval - 1) * ROW_NUMBER() OVER(ORDER BY dataColumn) as INTEGER),
dataColumn
FROM dataTable),
blankIncluder(rowNum, dataColumn) as (SELECT rowNum, dataColumn
FROM dataList
UNION ALL
SELECT rowNum - 1, :blankDataColumn
FROM dataList
WHERE MOD(rowNum - 1, :interval) = 0
AND rowNum > :interval)
SELECT *
FROM blankIncluder
ORDER BY rowNum
This will generate a list of those elements from the datatable, with a 'blank' line every interval lines, as ordered by the initial query. The result set only has 'blank' lines between existing lines - there are no 'blank' lines on the ends.