PostgreSQL: Distribute rows evenly and according to frequency - sql

I have trouble with a complex ordering problem. I have following example data:
table "categories"
id | frequency
1 | 0
2 | 4
3 | 0
table "entries"
id | category_id | type
1 | 1 | a
2 | 1 | a
3 | 1 | a
4 | 2 | b
5 | 2 | c
6 | 3 | d
I want to put entries rows in an order so that category_id,
and type are distributed evenly.
More precisely, I want to order entries in a way that:
category_ids that refer to a category that has frequency=0 are
distributed evenly - so that a row is followed by a different category_id
whenever possible. e.g. category_ids of rows: 1,2,1,3,1,2.
Rows with category_ids of categories with frequency<>0 should
be inserted from ca. the beginning with a minimum of frequency rows between them
(the gaps should vary). In my example these are rows with category_id=2.
So the result could start with row id #1, then #4, then a minimum of 4 rows of other
categories, then #5.
in the end result rows with same type should not be next to each other.
Example result:
id | category_id | type
1 | 1 | a
4 | 2 | b
2 | 1 | a
6 | 3 | d
.. some other row ..
.. some other row ..
.. some other row ..
5 | 2 | c
entries are like a stream of things the user gets (one at a time).
The whole ordering should give users some variation. It's just there to not
present them similar entries all the time, so it doesn't have to be perfect.
The query also does not have to give the same result on each call - using
random() is totally fine.
frequencies are there to give entries of certain categories a higher
priority so that they are not distributed across the whole range, but are placed more
at the beginning of the result list. Even if there are a lot of these entries, they
should not completely crowd out the frequency=0 entries at the beginning, through.
I'm no sure how to start this. I think I can use window functions and
ntile() to distribute rows by category_id and type.
But I have no idea how to insert the non-0-category-entries afterwards.

Related

Flatten tree structure represented in SQL [duplicate]

This question already has an answer here:
SQL Server recursive self join
(1 answer)
Closed 3 years ago.
I'm using an engineering calculation package and trying to extract some information from it in a built in reporting tool that allows SQL query
An abbreviated example SQL tables are as follows:
Id | Description | Ref
---|---------------------
1 | system 1 |
3 | block 4 | 6
3 | block 4 | 1
5 | formula1 | 3
6 | f |
7 | something | 1
9 | cheese | 5
The "Ref" column identifies rows that are subrecords of other items.
What I want to do is run a query that will produce a list that will show all items that appear on a each page. As you can see from the table above "ID" is not the unique key; each item can appear in multiple locations within the table. In the example above:
ID 5 is a subitem of ID3
ID 3 is a subitem of ID 1 AND ID 6
ID 1 and ID 6 aren't subitems of anything
So effectively it is representing a tree structure:
ID 1
+-------- ID 7
|---- ID 3
+---- ID 5
+---- ID 9
ID 6
+---- ID 3
+---- ID 5
+---- ID 9
What I'm hoping to is work out which items appear under each top level item (so the end result should be a table where in the "Ref" column only top level items appear):
Id | Description | Ref
---|---------------------
1 | system 1 |
3 | block 4 | 6
3 | block 4 | 1
5 | formula1 | 1
5 | formula1 | 6
6 | f |
9 | cheese | 1
9 | cheese | 6
7 | something | 1
The tree structure can be a total of 5 levels deep
I've been trying to use left joins to build up a list of page references, but I think I'm also going to need to union results tables (because obviously rows like ID=9, ID=5, and ID = 6 have to be duplicated in the final results set). It starts to get a bit messy!
WITH A
AS (SELECT *
FROM [RbdBlocks]),
B
AS (SELECT [x].[Id],
[x].[Description],
[x].[Page] AS Page1,
[y].[Page] AS Page2,
FROM A AS x
LEFT OUTER JOIN
A AS y
ON y.Id = x.Page)
SELECT *
FROM B
The above gives me some of the nested references, but I'm not sure if there's a better way to get this data together, and to manage the recursion rather than just duplicating the set of queries 4 times?
Have a look at Recursive Common Table Expressions (CTEs). They should be able to accomplish exactly what you need.
Have a look at Example D on the SQL Docs page.
Basically what you'd do in your case is:
In the "anchor member" of the CTE, select all top-level items
In the "recursive member" of the CTE, join all of the nested children to the top-level item
Recursive CTEs are not really trivial to understand, so be sure to read the docs carefully.

Trying to find non-duplicate entries in mostly identical tables(access)

I have 2 different databases. They track different things about inventory. in essence they share 3 common fields. Location, item number and quantity. I've extracted these into 2 tables, with only those fields. Every time I find an answer, it doesn't get all the test cases, just some of the fields.
Items can be in multiple locations, and as a turn each location can have multiple items. The primary key would be location and item number.
I need to flag when an entry doesn't match all three fields.
I've only been able to find queries that match an ID or so, or who's queries are beyond my comprehension. in the below, I'd need a query that would show that rows 1,2, and 5 had issues. I'd run it on each table and have to verify it with a physical inventory.
Please refrain from commenting on it being silly having information in 2 different databases, All I get in response it to deal with it =P
Table A
Location ItemNum | QTY
-------------------------
1a1a | as1001 | 5
1a1b | as1003 | 10
1a1b | as1004 | 2
1a1c | as1005 | 15
1a1d | as1005 | 15
Table B
Location ItemNum | QTY
-------------------------
1a1a | as1001 | 10
1a1d | as1003 | 10
1a1b | as1004 | 2
1a1c | as1005 | 15
1a1e | as1005 | 15
This article seemed to do what I wanted but I couldn't get it to work.
To find entries in Table A that don't have an exactly matching entry in Table B:
select A.*
from A
left join B on A.location = B.location and A.ItemNum = B.ItemNum and A.qty = B.qty
where B.location Is Null
Just swap all the A's and B's to get the list of entries in B with no matching entry in A.

How to properly group SQL results set?

SQL noob, please bear with me!!
I am storing a 3-tuple in a database (x,y, {signal1, signal2,..}).
I have a database with tables coordinates (x,y) and another table called signals (signal, coordinate_id, group) which stores the individual signal values. There can be several signals at the same coordinate.
The group is just an abitrary integer which marks the entries in the signal table as belonging to the same set (provided they belong to the same coordinate). So that any signals with the same 'coordinate_id' and 'group' together form a tuple as shown above.
For example,
Coordinates table Signals table
-------------------- -----------------------------
| id | x | y | | id | signal | coordinate_id | group |
| 1 | 1 | 2 | | 1 | 45 | 1 | 1 |
| 2 | 2 | 5 | | 2 | 95 | 1 | 1 |
| 3 | 33 | 1 | 1 |
| 4 | 65 | 1 | 2 |
| 5 | 57 | 1 | 2 |
| 6 | 63 | 2 | 1 |
This would produce the tuples (1,2 {45,95,33}), (1,2,{65,57}), (2,5, {63}) and so on.
I would like to retrieve the sets of {signal1, signal2,...} for each coordinate. The signals belonging to a set have the same coordinate_id and group, but I do not necessarily know the group value. I only know that if the group value is the same for a particular coordinate_id, then all those with that group form one set.
I tried looking into SQL GROUP BY, but I realized that it is for use with aggregate functions.
Can someone point out how to do this properly in SQL or give tips for improving my database structure.
SQLite supports the GROUP_CONCAT() aggregate function similar to MySQL. It rolls up a set of values in the group and concatenates them together comma-separated.
SELECT c.x, c.y, GROUP_CONCAT(s.signal) AS signal_list
FROM Signals s
JOIN Coordinates ON s.coordinate_id = c.id
GROUP BY s.coordinate_id, s.group
SQLite also permits the mismatch between columns in the select-list and columns in the group-by clause, even though this isn't strictly permitted by ANSI SQL and most implementations.
personally I would write the database as 3 tables:
x_y(x, y, id) coords_groups(pos, group, id) signals(group, signal)
with signals.group->coords_groups.id and coords_groups.pos->x_y.id
as you are trying to represent a sort-of 4 dimensional array.
then, to get from a couple of coordinates (X, Y) an ArrayList of List of Signal you can use this
SELECT temp."group", signals.signal
FROM (
SELECT cg."group", cg.id
FROM x_y JOIN coords_groups AS cg ON x_y.id = cg.pos
WHERE x_y.x=X AND x_y.y=Y )
AS temp JOIN signals ON temp.id=signals."group"
ORDER BY temp."group" ASC
(X Y are in the innermost where)
inside this sort of pseudo-code:
getSignalsGroups(X, Y)
ArrayList<List<Signals>> a
List<Signals> temp
query=sqlLiteExecute(THE_SQL_SNIPPET, x, y)
row=query.fetch() //fetch the first row to set the groupCounter
actualGroup=row.group
temp.add(row.signal)
for(row : query) //foreach row add the signal to the list
if(row.group!=actualGroup) //or reset the list if is a new group
a.add(actualGroup, temp)
actualGroup=row.group; temp= new List
temp.add(row.signal)
return a

Out of a sample of 100 demands what else was asked for?

I am doing some work on an inbound call demand capture system where each call could have one or more than one demands linked to it.
There is a CaptureHeader table with CallDate, CallReference and CaptureID and a CaptureDemand table with CaptureID and DemandID.
EDIT:
I have added some representative data to show what would be expected in each table.
CaptureHeader
CaptureID | CallReference | CallDate
-----------------------------------------------
1 | 1 | 2009-11-02 20:37:00
2 | 3 | 2009-11-02 20:37:05
3 | 2 | 2009-11-02 20:37:10
4 | 4 | 2009-11-02 20:38:00
5 | 5 | 2009-11-02 20:38:30
CaptureDemand
DemandID | CaptureID | DemandText
------------------------------------
1 | 1 | Fund value
2 | 2 | Password reset
3 | 2 | Fund value
4 | 3 | Change address
5 | 3 | Fund value
6 | 3 | Rate change
7 | 3 | Fund value
8 | 4 | Variable to fixed
9 | 4 | Change address
10 | 5 | Fund value
11 | 5 | Address change
Using the tables above a filter on 'Fund value' would bring back call references of 1, 2, 3, 3, 5 because 3 has two fund values.
If I did a DISTINCT on this because I have ordered by date it would ask me to show that which would also give me two lines for 3.
To get the full set of data I would do the following query:
SELECT * FROM CaptureHeader AS ch
JOIN CaptureDemand AS cd ON ch.CaptureID = cd.CaptureID
JOIN DemandDetails AS dd ON cd.DemandID = dd.DemandID
What I would like though is to get the last 100 headers by date for a particular demand. Where it gets tricky is when there is more than one of the same demand on a header for a particular reference which is possible.
I would like 100 unique call references because I then need to get back all the demands for those call references and then count how many of each other demand was also recorded in the same call.
EDIT:
I would like to be able to say 'WHERE DemandID = SomeValue' to select my 100 references.
In other words out of 100 "value requested" demands what else was asked for. If this doesn't make sense let me know and I will try and modify the question to be clearer.
I would like to get a table like this:
Demands | Count
------------------------
Demand asked for | 100
Another demand | 36
Third demand | 12
Fourth demand | 6
Cheers, Ian.
Now that the sample data made your requirement more explicit, I believe the following will generally server your needs. It is essentially the same as previous submission, with an added condition on the JOIN; this condition essentially excludes any CaptureDemand row for which we readily have the same DemandText (within the same Capture), only retaining the one with the lowest DemandId.
WITH myCTE (CaptId, NbOfDemands)
AS (
SELECT CaptureID, COUNT(*) -- Can use COUNT(DISTINCT DemandText)
FROM CaptureDemand
WHERE CaptureID IN
(SELECT TOP 100 C.CaptureID
FROM CaptureHeader C
JOIN CaptureDemand D ON C.CaptureID = D.CaptureID
AND NOT EXISTS (
SELECT * FROM CaptureDemand X
WHERE X.CaptureId = D.CaptureId AND X.DemandText = D.DemandText
AND X.DemandId < D.DemandId
)
WHERE D.DemandText= 'Fund Value'
ORDER BY CallDate DESC)
)
SELECT NbOfDemands, COUNT(*)
FROM myCTE
GROUP BY NbOfDemands
ORDER BY NbOfDemands
What this query provides:
The number of Captures which had exactly one demand
The number of Captures which had exactly two demands
..
The number of Captures which had exactly n demands
For the 100 MOST RECENT Captures which included a Demand of a particular value 'someValue' (and, this time, giving indeed 100, i.e. not counting the same CaptureID twice in case of dups on the Demand Type).
A few points:
You may want to use COUNT(DISTINCT DemandText) rather than COUNT(*) in the select list of the CTE. (We do include 100 distinct CaptureIDs, i.e. that the Capture #3 in your sample doesn't come twice and hence hiding another capture at the end of the list, but we need to know if this #3 Capture should be counted as 3 Demands or a 4 Demands capture).
Oops, not quite what you required because each line show the number of Capture instances that have exactly this amount of demands...
use a CASE on NbOfDemands to display the text as in the question (trivial)
This may show Capture instances with more than 4 demands, but that's probably a plus (if any), but that is probably a plus
This would not show 0 if for example there were no Capture instances with the given number of demands.
It sounds like you are trying to solve a Many to Many problem with just two tables and you really need three tables. For example:
TABLE Calls
CallId | CallDate
----------------------------
1 | 2009-11-02 20:37:00
2 | 2009-11-02 20:37:05
3 | 2009-11-02 20:37:10
4 | 2009-11-02 20:38:00
5 | 2009-11-02 20:38:30
TABLE Requests
RequestId | RequestType
----------------------------
1 | Fund value
2 | Password reset
3 | Change address
4 | Rate change
5 | Variable to fixed
TABLE CallRequests (resolves the many to many)
CallId |RequestId
-----------------
1 |1
2 |2
2 |1
3 |3
3 |1
3 |4
3 |1
4 |5
4 |3
5 |1
5 |3
This data structure will let you query from the Call side of things and from the Request side of things.

How to sort sql result using a pre defined series of rows

i have a table like this one:
--------------------------------
id | name
--------------------------------
1 | aa
2 | aa
3 | aa
4 | aa
5 | bb
6 | bb
... one million more ...
and i like to obtain an arbitrary number of rows in a pre defined sequence and the other rows ordered by their name. e.g. in another table i have a short sequence with 3 id's:
sequ_no | id | pos
-----------------------
1 | 3 | 0
1 | 1 | 1
1 | 2 | 2
2 | 65535 | 0
2 | 45 | 1
... one million more ...
sequence 1 defines the following series of id's: [ 3, 1, 2]. how to obtain the three rows of the first table in this order and the rest of the rows ordered by their name asc?
how in PostgreSQL and how in mySQL? how would a solution look like in hql (hibernate query language)?
an idea i have is to first query and sort the rows which are defined in the sequence and than concat the other rows which are not in the sequence. but this involves tow queries, can it be done with one?
Update: The final result for the sample sequence [ 3, 1, 2](as defined above) should look like this:
id | name
----------------------------------
3 | aa
1 | aa
2 | aa
4 | aa
5 | bb
6 | bb
... one million more ...
i need this query to create a pagination through a product table where part of the squence of products is a defined sequence and the rest of the products will be ordered by a clause i dont know yet.
I'm not sure I understand the exact requirement, but won't this work:
SELECT ids.id, ids.name
FROM ids_table ids LEFT OUTER JOIN sequences_table seq
WHERE ids.id = seq.id
ORDER BY seq.sequ_no, seq.pos, ids.name, ids.id
One way: assign a position (e.g. 0) to each id that doesn't have a position yet, UNION the result with the second table, join the result with the first table, and ORDER BY seq_no, pos, name.