SQL - Row For Each Criteria - Access - sql

I am trying to create a query that will take a list of data, and give me N rows each time it meets a criteria.
Say I have the following data:
ID | Type
1 | Vegetables
2 | Vegetables
3 | Vegetables
4 | Fruits
5 | Fruits
6 | Meats
7 | Dairy
8 | Dairy
9 | Dairy
10 | Dairy
And what I want is:
Type
Dairy
Dairy
Dairy
Fruits
Fruits
Meats
Meats
Vegetables
Vegetables
The criteria I have is that for every 2 of each Type I count it as a "whole" value. If there is anything more than a whole value, round up to the nearest whole number. So, the Vegetables Type rounds up from 1.5 to 2 rows and the Dairy Type stays at 2 rows.
Then I want to add a row for every Type that is not the last type in the set (which is why Vegetables only has two rows), perhaps with another column denoting that it was the added row.

This query will return every type along with the number of times that it has to be repeated:
SELECT Type, tot+IIf(Type=(SELECT MAX(Type) FROM tablename),0,1) AS Rep
FROM (SELECT tablename.Type, -Int(-Count([tablename].[ID])/2) AS tot
FROM tablename
GROUP BY tablename.Type
) AS s;
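The `-Int(-Count(...)/2)` expression is the usual Access idiom for rounding up, since Access has no CEILING function and Int() floors toward negative infinity, so negating twice turns a floor into a ceiling. A quick sketch of the same arithmetic, using Python's floor division purely for illustration:

```python
# Access's Int() floors toward negative infinity, so -Int(-n/2) == ceil(n/2).
# Python's // operator floors the same way, so -(-n // 2) is the same trick.
def ceil_half(n: int) -> int:
    # equals ceil(n / 2) for non-negative integer counts
    return -(-n // 2)

counts = {"Vegetables": 3, "Fruits": 2, "Meats": 1, "Dairy": 4}
whole_values = {t: ceil_half(c) for t, c in counts.items()}
# Vegetables: ceil(3/2) = 2, Fruits: 1, Meats: 1, Dairy: 2
```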
Then my idea is to use a table named [times] that contains every number n repeated n times:
n
---
1
2
2
3
3
3
...
and then your query could be like this:
SELECT s.*
FROM (
SELECT Type, tot+IIf(Type=(SELECT MAX(Type) FROM tablename),0,1) AS rep
FROM (SELECT tablename.Type, -Int(-Count([tablename].[ID])/2) AS tot
FROM tablename
GROUP BY tablename.Type
) AS s1) AS s INNER JOIN times ON s.rep=times.n
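As a sketch of how the [times]-table trick plays out, here is the same idea reproduced in SQLite via Python's sqlite3 module (table names follow the answer; `(COUNT(ID)+1)/2` stands in for the Access ceiling idiom, since SQLite's integer division truncates toward zero):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE tablename (ID INTEGER, Type TEXT);
INSERT INTO tablename VALUES
  (1,'Vegetables'),(2,'Vegetables'),(3,'Vegetables'),
  (4,'Fruits'),(5,'Fruits'),(6,'Meats'),
  (7,'Dairy'),(8,'Dairy'),(9,'Dairy'),(10,'Dairy');
-- [times]: every number n appears n times
CREATE TABLE times (n INTEGER);
INSERT INTO times VALUES (1),(2),(2),(3),(3),(3);
""")
types = [row[0] for row in con.execute("""
SELECT s.Type
FROM (SELECT Type,
             (COUNT(ID) + 1) / 2   -- ceil(count/2) for positive integer counts
             + (Type <> (SELECT MAX(Type) FROM tablename)) AS rep
      FROM tablename
      GROUP BY Type) AS s
JOIN times ON times.n = s.rep
ORDER BY s.Type
""")]
# Joining rep against a table that holds n exactly n times emits each
# Type rep times: Dairy x3, Fruits x2, Meats x2, Vegetables x2.
```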

So you want to count the records, divide by 2, round up and then add 1.
--Create a table variable with all numbers from 1 to 1024.
DECLARE @Numbers TABLE
(
MaxQty INT IDENTITY(1,1) PRIMARY KEY CLUSTERED
)
WHILE COALESCE(SCOPE_IDENTITY(), 0) < 1024
BEGIN
INSERT @Numbers DEFAULT VALUES
END
--First get the count of records
SELECT [Type], Sum(1) as CNT
INTO #TMP1
FROM MyTable
Group By [Type]
--Now get the number of times the record should be repeated, based on this formula :
-- count the records, divide by 2, round up and then add 1
SELECT [Type], CNT, CEILING(CNT / 2.0) + 1 as TimesToRepeat
INTO #TMP2
FROM #TMP1
--Join the #TMP2 table with the #Numbers table so you can repeat your records the
-- required number of times
SELECT A.*
from #TMP2 as A
join #Numbers as B
on B.MaxQty <= A.TimesToRepeat
Not pretty, but it should work. This still doesn't account for the last type in the set, I'm a little stumped by that part.
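One thing to watch with that formula in T-SQL: if CNT is an INT, then CNT/2 is integer division and truncates before CEILING ever runs, so the 2.0 divisor matters. Mimicking both variants in Python:

```python
import math

cnt = 3  # e.g. three Vegetables rows
# What CEILING(CNT/2) + 1 computes when CNT is INT: 3/2 truncates to 1 first.
truncated = math.ceil(cnt // 2) + 1
# What CEILING(CNT/2.0) + 1 computes: the division stays fractional.
correct = math.ceil(cnt / 2) + 1
```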

Related

Number of rows per "percentile"

I would like a Postgres query returning the number of rows per percentile.
Input:
id | name      | price
---+-----------+------
1  | apple     | 12
2  | banana    | 6
3  | orange    | 18
4  | pineapple | 26
4  | lemon     | 30
Desired output:
percentile_3_1 | percentile_3_2 | percentile_3_3
---------------+----------------+---------------
1              | 2              | 2
percentile_3_1 = number of fruits in the 1st 3-percentile (i.e. with a price < 10)
Postgres has the window function ntile() and a number of very useful ordered-set aggregate functions for percentiles. But you seem to have the wrong term.
number of fruits in the 1st 3-percentile (i.e. with a price < 10)
That's not a "percentile". That's the count of rows with a price below a third of the maximum.
Assuming price is defined numeric NOT NULL CHECK (price > 0), here is a generalized query to get row counts for any given number of partitions:
WITH bounds AS (
SELECT *
FROM (
SELECT bound AS lo, lead(bound) OVER (ORDER BY bound) AS hi
FROM (
SELECT generate_series(0, x, x/3) AS bound -- number of partitions here!
FROM (SELECT max(price) AS x FROM tbl) x
) sub1
) sub2
WHERE hi IS NOT NULL
)
SELECT b.hi, count(t.price)
FROM bounds b
LEFT JOIN tbl t ON t.price > b.lo AND t.price <= b.hi
GROUP BY 1
ORDER BY 1;
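SQLite has no generate_series() by default, but the same bounds-and-count approach can be sketched with a recursive CTE, run here through Python's sqlite3 (table and column names follow the answer):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE tbl (id INTEGER, name TEXT, price NUMERIC);
INSERT INTO tbl VALUES
  (1,'apple',12),(2,'banana',6),(3,'orange',18),
  (4,'pineapple',26),(4,'lemon',30);
""")
rows = con.execute("""
WITH RECURSIVE
mx AS (SELECT MAX(price) * 1.0 AS x FROM tbl),
bounds(lo, hi) AS (
    SELECT 0.0, (SELECT x / 3 FROM mx)          -- number of partitions here
    UNION ALL
    SELECT hi, hi + (SELECT x / 3 FROM mx)
    FROM bounds
    WHERE hi < (SELECT x FROM mx)
)
SELECT b.hi, COUNT(t.price)                     -- LEFT JOIN keeps empty bands
FROM bounds b
LEFT JOIN tbl t ON t.price > b.lo AND t.price <= b.hi
GROUP BY b.hi
ORDER BY b.hi
""").fetchall()
# Same result as the Postgres query: one row per third of the maximum price.
```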
Result:
hi | count
--------------------+------
10.0000000000000000 | 1
20.0000000000000000 | 2
30.0000000000000000 | 2
Notably, each partition includes the upper bound, as this makes more sense while deriving partitions from the maximum value. So your quote would read:
i.e. with a price <= 10

Grouping sequence number in SQL

I have a table like below.
DECLARE @Table TABLE (
[Text] varchar(100),
[Order] int,
[RequiredResult] int
);
INSERT INTO @Table
VALUES
('A',1,1),
('B',2,1),
('C',3,1),
('D',1,2),
('A',2,2),
('B',3,2),
('G',4,2),
('H',1,3),
('B',2,3);
I have used dense_rank, but the results are not correct.
select [Text], [Order], RequiredResult
, DENSE_RANK() OVER (ORDER BY [text],[Order]) AS ComputedResult
from @Table;
Results:
Text | Order | RequiredResult | ComputedResult
-----+-------+----------------+---------------
A    | 1     | 1              | 1
A    | 2     | 2              | 2
B    | 2     | 1              | 3
B    | 2     | 3              | 3
B    | 3     | 2              | 4
C    | 3     | 1              | 5
D    | 1     | 2              | 6
G    | 4     | 2              | 7
H    | 1     | 3              | 8
Please help me to calculate the RequiredResult column.
It looks like the RequiredResult column is simply a running sequence that resets each time the sequence in the Order column breaks, when you process the records in the order they were inserted.
This is a typical Data Island analysis task, except that in this case the islands are the sequential runs of rows and the boundary is where the numbering resets back to 1.
Record the input sequence by adding an IDENTITY column to the table variable.
Calculate an island identifier
Due to the rule about the rows being in sequence based on the Order column, we can calculate a unique number for the Island by subtracting the Order from the IDENTITY column, in this case Id
We can then use DENSE_RANK() ordering by the Island Number
Putting all that together:
DECLARE @Table TABLE (
[Id] int IDENTITY(1,1),
[Text] varchar(100),
[Order] int,
[RequiredResult] int
);
INSERT INTO @Table
VALUES
('A',1,1),
('B',2,1),
('C',3,1),
('D',1,2),
('A',2,2),
('B',3,2),
('G',4,2),
('H',1,3),
('B',2,3);
SELECT [Text],[Order]
, [Id]-[Order] as Island
, RequiredResult
, DENSE_RANK() OVER (ORDER BY [ID]-[ORDER]) AS CalculatedResult
FROM @Table
ORDER BY [ID]
Text | Order | Island | RequiredResult | CalculatedResult
-----+-------+--------+----------------+-----------------
A    | 1     | 0      | 1              | 1
B    | 2     | 0      | 1              | 1
C    | 3     | 0      | 1              | 1
D    | 1     | 3      | 2              | 2
A    | 2     | 3      | 2              | 2
B    | 3     | 3      | 2              | 2
G    | 4     | 3      | 2              | 2
H    | 1     | 7      | 3              | 3
B    | 2     | 7      | 3              | 3
The key here is that we need to record the input sequence so we can use it in the calculation. It doesn't matter what actual numbering the Id column has, only that it is also in sequence. If that number sequence is broken, you could use the ROW_NUMBER() function result to calculate the Island Number, but the specifics would depend on the initial query that provides the basic sequential dataset.
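The Id - Order island trick is easy to verify outside SQL Server; here is a sketch in SQLite via Python's sqlite3 (columns renamed Txt/Ord to sidestep the reserved words, and an AUTOINCREMENT key standing in for the IDENTITY column):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE t (Id INTEGER PRIMARY KEY AUTOINCREMENT,
                Txt TEXT, Ord INTEGER, RequiredResult INTEGER);
INSERT INTO t (Txt, Ord, RequiredResult) VALUES
  ('A',1,1),('B',2,1),('C',3,1),('D',1,2),('A',2,2),
  ('B',3,2),('G',4,2),('H',1,3),('B',2,3);
""")
rows = con.execute("""
SELECT Txt, Ord, RequiredResult,
       DENSE_RANK() OVER (ORDER BY Id - Ord) AS CalculatedResult
FROM t
ORDER BY Id
""").fetchall()
# Id - Ord is constant within each sequential run (0, 0, 0, 3, 3, 3, 3, 7, 7),
# so DENSE_RANK over it numbers the islands 1, 2, 3.
```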
You seem to have an ordering in mind for the rows. SQL tables represent unordered (multi)sets. The only column in your data that has the appropriate ordering is text, but your real data might have another column with this information.
Basically, you just want a cumulative sum of the number of 1s up to each row. That would be:
select t.*,
sum(case when [Order] = 1 then 1 else 0 end) over (order by text)
from t
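Given an ordering column (here an autoincrement Id standing in for the insertion order this answer says is required), the cumulative-sum-of-1s idea can be sketched in SQLite via Python's sqlite3:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE t (Id INTEGER PRIMARY KEY AUTOINCREMENT,
                Txt TEXT, Ord INTEGER);
INSERT INTO t (Txt, Ord) VALUES
  ('A',1),('B',2),('C',3),('D',1),('A',2),
  ('B',3),('G',4),('H',1),('B',2);
""")
rows = con.execute("""
SELECT Txt, Ord,
       SUM(CASE WHEN Ord = 1 THEN 1 ELSE 0 END)
           OVER (ORDER BY Id) AS grp
FROM t
ORDER BY Id
""").fetchall()
# Each Ord = 1 row starts a new group, so the running count of 1s
# reproduces the RequiredResult sequence 1,1,1,2,2,2,2,3,3.
```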

Select latest available value SQL

Below is a test table that simplifies what I am trying to achieve in a query. I want a running sum over column b that carries forward the last sum result that was not null. Imagine a cumulative sum of a customer's purchases every day: on some days no purchases occur for a particular customer, so I want to display the latest sum for that customer instead of 0/null.
CREATE TABLE test (a int, b int);
insert into test values (1,null);
insert into test values (2,1);
insert into test values (3,3);
insert into test values (4,null);
insert into test values (5,5);
insert into test values (6,null);
1- select sum(coalesce(b,0)),coalesce(0,sum(b)) from test
2- select a, sum(coalesce(b,0)) from test group by a order by a asc
3- select a, sum(b) over (order by a asc rows between unbounded preceding and current row) from test group by a,b order by a asc
I'm not sure if my interpretation of how coalesce works is correct. I thought sum(coalesce(b,0)) would insert 0 where b is null and always take the latest cumulative sum of column b.
Think I may have solved it with query 3.
The result I expect will look like this:
a | sum
--------
1 |
2 | 1
3 | 4
4 | 4
5 | 9
6 | 9
Each record of a displays the last cumulative sum of column b.
Any direction would be valuable.
Thanks
In Postgres you can also use SUM as a window function for a cumulative sum.
Example:
create table test (a int, b int);
insert into test (a,b) values (1,null),(2,1),(3,3),(4,null),(5,5),(6,null);
select a, sum(b) over (order by a, b) as "sum"
from test;
a | sum
-- | ----
1 | null
2 | 1
3 | 4
4 | 4
5 | 9
6 | 9
And if "a" isn't unique, but you want to group on a?
Then you could use a suminception:
select a, sum(sum(b)) over (order by a) as "sum"
from test
group by a
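Both variants are easy to try in SQLite through Python's sqlite3. The grouped form is written here with an explicit subquery, which is equivalent to the sum(sum(b)) nesting; the two results match on this sample only because a is already unique:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE test (a INTEGER, b INTEGER);
INSERT INTO test VALUES (1,NULL),(2,1),(3,3),(4,NULL),(5,5),(6,NULL);
""")
# Running SUM ignores NULLs, so the last non-null total carries forward.
plain = con.execute(
    "SELECT a, SUM(b) OVER (ORDER BY a, b) FROM test ORDER BY a"
).fetchall()
# Grouped variant: aggregate per a first, then take the running sum.
grouped = con.execute("""
SELECT a, SUM(sb) OVER (ORDER BY a)
FROM (SELECT a, SUM(b) AS sb FROM test GROUP BY a)
ORDER BY a
""").fetchall()
```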

Group by in SQL returning error: Selected non-aggregate values must be part of the associated group

I have a table that looks like this:
date store flag
1 5/4/2018 a 1
2 5/4/2018 a 1
3 5/3/2018 b 1
4 5/3/2018 b 0
5 5/2/2018 a 1
6 5/2/2018 b 0
I want to group by date and store and sum the number of flags
i.e. table_a below:
date store total_flag
1 5/4/2018 a 2
3 5/3/2018 b 1
4 5/2/2018 a 1
5 5/2/2018 b 0
This is what I'm trying:
create multiset volatile table flag_summary as (
sel table_a.*, SUM(table_a.flag) as total_flag
from table_a
group by date, store
)
with data primary index (date, store) on commit preserve rows;
The above gives me an error: "CREATE TABLE Failed. [3504] Selected non-aggregate values must be part of the associated group."
You are selecting all of table_a (including the flag). You should just pull the date and the store, since you want the sum of the flag:
SELECT date, store, SUM(flag) as total_flag
FROM table_a
GROUP BY date, store
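A minimal sketch of that fix in SQLite via Python's sqlite3 (the date column is renamed dt here purely to avoid quoting a keyword-like name):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE table_a (dt TEXT, store TEXT, flag INTEGER);
INSERT INTO table_a VALUES
  ('5/4/2018','a',1),('5/4/2018','a',1),('5/3/2018','b',1),
  ('5/3/2018','b',0),('5/2/2018','a',1),('5/2/2018','b',0);
""")
# Every selected column is either grouped or aggregated, so no
# "non-aggregate values" error is possible.
rows = con.execute("""
SELECT dt, store, SUM(flag) AS total_flag
FROM table_a
GROUP BY dt, store
ORDER BY dt DESC, store
""").fetchall()
```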

In SQL, find duplicates in one column with unique values for another column

So I have a table of aliases linked to record ids. I need to find duplicate aliases with unique record ids. To explain better:
ID Alias Record ID
1 000123 4
2 000123 4
3 000234 4
4 000123 6
5 000345 6
6 000345 7
The result of a query on this table should be something to the effect of
000123 4 6
000345 6 7
Indicating that both record 4 and 6 have an alias of 000123 and both record 6 and 7 have an alias of 000345.
I was looking into using GROUP BY, but if I group by alias then I can't select record id, and if I group by both alias and record id it only returns the first two rows in this example, where both columns are duplicates. The only solution I've found, and it's a terrible one that crashed my server, is to do two different selects for all the data and then join them
ON [T_1].[ALIAS] = [T_2].[ALIAS] AND NOT [T_1].[RECORD_ID] = [T_2].[RECORD_ID]
Are there any solutions out there that would work better? As in, not crash my server when run on a few hundred thousand records?
It looks as if you have two requirements:
Identify all aliases that have more than one record id, and
List the record ids for these aliases horizontally.
The first is a lot easier to do than the second. Here's some SQL that ought to get you where you want with the first:
WITH A -- Get a list of unique combinations of Alias and [Record ID]
AS (
SELECT Distinct
Alias
, [Record ID]
FROM T1
)
, B -- Get a list of all those Alias values that have more than one [Record ID] associated
AS (
SELECT Alias
FROM A
GROUP BY
Alias
HAVING COUNT(*) > 1
)
SELECT A.Alias
, A.[Record ID]
FROM A
JOIN B
ON A.Alias = B.Alias
Now, as for the second. If you're satisfied with the data in this form:
Alias Record ID
000123 4
000123 6
000345 6
000345 7
... you can stop there. Otherwise, things get tricky.
The PIVOT command will not necessarily help you, because it's trying to solve a different problem than the one you have.
I am assuming that you can't necessarily predict how many duplicate Record ID values you have per Alias, and thus don't know how many columns you'll need.
If you have only two, then displaying each of them in a column becomes a relatively trivial exercise. If you have more, I'd urge you to consider whether the destination for these records (a report? A web page? Excel?) might be able to do a better job of displaying them horizontally than SQL Server can do in returning them arranged horizontally.
Perhaps what you want is just the min() and max() of RecordId:
select Alias, min(RecordID), max(RecordId)
from yourTable t
group by Alias
having min(RecordId) <> max(RecordId)
You can also count the number of distinct values, using count(distinct):
select Alias, count(distinct RecordId) as NumRecordIds, min(RecordID), max(RecordId)
from yourTable t
group by Alias
having count(DISTINCT RecordID) > 1;
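A sketch of the count(distinct) variant just above in SQLite through Python's sqlite3, using the sample data from the question:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE yourTable (ID INTEGER, Alias TEXT, RecordId INTEGER);
INSERT INTO yourTable VALUES
  (1,'000123',4),(2,'000123',4),(3,'000234',4),
  (4,'000123',6),(5,'000345',6),(6,'000345',7);
""")
# Aliases tied to more than one distinct record id, with the id range.
rows = con.execute("""
SELECT Alias, COUNT(DISTINCT RecordId) AS NumRecordIds,
       MIN(RecordId), MAX(RecordId)
FROM yourTable
GROUP BY Alias
HAVING COUNT(DISTINCT RecordId) > 1
ORDER BY Alias
""").fetchall()
```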
This will give all repeated values:
select Alias, count(RecordId) as NumRecordIds
from yourTable t
group by Alias
having count(RecordId) <> count(distinct RecordId);
I agree with Ann L's answer but would like to show how you can use window functions with CTE's as you may prefer the readability.
(Re: how to pivot horizontally, I again agree with Ann)
create temporary table things (
id serial primary key,
alias varchar,
record_id int
);
insert into things (alias, record_id) values
('000123', 4),
('000123', 4),
('000234', 4),
('000123', 6),
('000345', 6),
('000345', 7);
with
things_with_distinct_aliases_and_record_ids as (
select distinct on (alias, record_id)
id,
alias,
record_id
from things
),
things_with_unique_record_id_counts_per_alias as (
select *,
COUNT(*) OVER(PARTITION BY alias) as unique_record_ids_count
from things_with_distinct_aliases_and_record_ids
)
select * from things_with_unique_record_id_counts_per_alias
where unique_record_ids_count > 1
The first CTE gets all the unique alias/record id combinations. E.g.
id | alias | record_id
----+--------+-----------
1 | 000123 | 4
4 | 000123 | 6
3 | 000234 | 4
5 | 000345 | 6
6 | 000345 | 7
The second CTE simply creates a new column for the above and adds the count of record ids for each alias. This allows you to filter only those aliases which have more than one record id associated with them.
id | alias | record_id | unique_record_ids_count
----+--------+-----------+-------------------------
1 | 000123 | 4 | 2
4 | 000123 | 6 | 2
3 | 000234 | 4 | 1
5 | 000345 | 6 | 2
6 | 000345 | 7 | 2
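SQLite lacks Postgres's DISTINCT ON, but a plain DISTINCT produces the same first CTE here, since only alias and record_id are kept. A sketch of the whole pipeline via Python's sqlite3:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE things (id INTEGER PRIMARY KEY, alias TEXT, record_id INTEGER);
INSERT INTO things (alias, record_id) VALUES
  ('000123',4),('000123',4),('000234',4),
  ('000123',6),('000345',6),('000345',7);
""")
rows = con.execute("""
WITH uniq AS (
    -- unique alias / record_id combinations
    SELECT DISTINCT alias, record_id FROM things
),
counted AS (
    -- count of distinct record ids per alias, attached to every row
    SELECT alias, record_id,
           COUNT(*) OVER (PARTITION BY alias) AS unique_record_ids_count
    FROM uniq
)
SELECT alias, record_id FROM counted
WHERE unique_record_ids_count > 1
ORDER BY alias, record_id
""").fetchall()
```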
SELECT A.CitationId, B.CitationId, A.CitationName, A.LoaderID,
       A.PrimaryReferenceLoaderID, B.SecondaryReference1LoaderID,
       A.SecondaryReference1LoaderID, A.SecondaryReference2LoaderID,
       A.SecondaryReference3LoaderID, A.SecondaryReference4LoaderID,
       A.CreatedOn, A.LastUpdatedOn
FROM CitationMaster A, CitationMaster B
WHERE A.PrimaryReferenceLoaderID = B.SecondaryReference1LoaderID
  AND ISNULL(A.PrimaryReferenceLoaderID, '') != ''
  AND ISNULL(B.SecondaryReference1LoaderID, '') != ''