Create variations of data efficiently - sql

I have 4 parameters. I need to create a new table that holds all possible variations of these 4 parameters divided by themself.
this is an example of 4 parameters:
the original parameters:
p1 | p2 | p3 | p4 |
=====+=====+=====+=====+
8 | 8 | 8 | 8 |
The new table should contain:
p1 | p2 | p3 | p4 |
=====+=====+=====+=====+
8 | 8 | 8 | 8 | The original raw
1 | 8 | 8 | 8 | One cell devided by 8 (4 rows overall)
8 | 1 | 8 | 8 |
8 | 8 | 1 | 8 |
8 | 8 | 8 | 1 |
1 | 1 | 8 | 8 | Two cells divided by 8 (6 rows overall)
1 | 8 | 1 | 8 |
1 | 8 | 8 | 1 |
8 | 1 | 1 | 8 |
8 | 1 | 8 | 1 |
8 | 8 | 1 | 1 |
1 | 1 | 1 | 8 | Three cells divided by 8 (4 rows overall)
1 | 1 | 8 | 1 |
1 | 8 | 1 | 1 |
8 | 1 | 1 | 1 |
1 | 1 | 1 | 1 | All cells divided by 8 (1 row overall)
I'm looking for the most efficient way to do it, because the next level might be to do the same but with 5 parameters (and do all kinds of mathematical operations).
I thought to use a WHILE loop, but I don't know how can I "run" on the columns like a nested for loop in c/c++/java/python etc. Are there other ways to create this? What is the efficient way to do it?

The cross join solution:
select *
from
(values (8),(1)) as q1(p1)
cross join
(values (8),(1)) as q2(p2)
cross join
(values (8),(1)) as q3(p3)
cross join
(values (8),(1)) as q4(p4)

Based on your comment, you can use apply:
select p1.p1, p2.p2, p3.p3, p4.p4
from t cross apply
(values (t.p1, t.p1 / 8.0)) p1(p1) cross apply
(values (t.p2, t.p2 / 8.0)) p2(p2) cross apply
(values (t.p3, t.p3 / 8.0)) p3(p3) cross apply
(values (t.p4, t.p4 / 8.0)) p4(p4) ;
Note: A SQL query returns a fixed number of columns. If you need to handle a variable number of parameters, then you need dynamic SQL.
That said, it should be obvious how to handle whatever specific number of parameters you need to.

Related

How to get columns when using buckets (width_bucket)

I would like to know which row were moved to a bucket.
SELECT
width_bucket(s.score, sl.mins, sl.maxs, 9) as buckets,
COUNT(*)
FROM scores s
CROSS JOIN scores_limits sl
GROUP BY 1
ORDER BY 1;
My actual return:
buckets | count
---------+-------
1 | 182
2 | 37
3 | 46
4 | 15
5 | 29
7 | 18
8 | 22
10 | 11
| 20
What I expect to return:
SELECT buckets FROM buckets_table [...] WHERE scores.id = 1;
How can I get, for example, the column 'id' of table scores?
I believe you can include the id in an array with array_agg. If I recreate your case with
create table test (id serial, score int);
insert into test(score) values (10),(9),(5),(4),(10),(2),(5),(7),(8),(10);
The data is
id | score
----+-------
1 | 10
2 | 9
3 | 5
4 | 4
5 | 10
6 | 2
7 | 5
8 | 7
9 | 8
10 | 10
(10 rows)
Using the following and aggregating the id with array_agg
SELECT
width_bucket(score, 0, 10, 11) as buckets,
COUNT(*) nr_ids,
array_agg(id) agg_ids
FROM test s
GROUP BY 1
ORDER BY 1;
You get
buckets | nr_ids | agg_ids
---------+--------+----------
3 | 1 | {6}
5 | 1 | {4}
6 | 2 | {3,7}
8 | 1 | {8}
9 | 1 | {9}
10 | 1 | {2}
12 | 3 | {1,5,10}

How to return the same period last year data with SQL?

I am trying to create a view in postgreSQL with the requirements as below:
The table needs to show the same period last year data for every records.
Sample data:
date_sk | location_sk | division_sk | employee_type_sk | value
20180202 | 6 | 8 | 4 | 1
20180202 | 7 | 2 | 4 | 2
20190202 | 6 | 8 | 4 | 1
20190202 | 7 | 2 | 4 | 1
20200202 | 6 | 8 | 4 | 1
20200202 | 7 | 2 | 4 | 3
In the table, date_sk, location_sk, division_sk and employee_type_sk are super keys which form an unique record in the table.
You can check the required output as below:
date_sk | location_sk | division_sk | employee_type_sk | value | value_last_year
20180202 | 6 | 8 | 4 | 1 | NULL
20180203 | 7 | 2 | 4 | 2 | NULL
20190202 | 6 | 8 | 4 | 1 | 1
20190203 | 7 | 3 | 4 | 1 | NULL
20200202 | 6 | 8 | 4 | 1 | 1
20200203 | 7 | 3 | 4 | 3 | 1
The records start on 20180202, therefore, the data for the same period last year is unavailable. At the 4th record, there is a difference in division_sk comparing with the same period last year - hence, the head_count_last_year is NULL.
My current solution is to create a view from the sample data with an addition column as same_date_last_year then LEFT JOIN the same table. The SQL queries are below:
CREATE VIEW test_view AS
SELECT *,
CONCAT(LEFT(date_sk, 4) - 1, RIGHT(date_sk, 4)) AS same_date_last_year
FROM test_table
SELECT
test_view.date_sk,
test_view.location_sk,
test_view.division_sk,
test_view.employee_type_sk,
test_view.value,
test_table.value AS value_last_year
FROM test_view
LEFT JOIN test_table ON (test_view.same_date_last_year = test_table.date_sk)
We have a lot of data in the table. My solution above is unacceptable in terms of performance.
Is there a different query which yields the same result and might improve the performance ?
You could simply use a correlated subquery here which is likely best for performance:
select *,
(
select value from t t2
where t2.date_sk=t.date_sk - interval '1' year and
t2.location_sk=t.location_sk and
t2.division_sk=t.division_sk and
t2.employee_type_sk=t.employee_type_sk
) as value_last_year
from t
WITH CTE(DATE_SK,LOCATION_SK,DIVISION_SK,EMPLOYEE_TYPE_SK,VALUE)AS
(
SELECT CAST('20180202' AS DATE),6,8,4,1 UNION ALL
SELECT CAST('20180203'AS DATE),7,2,4,2 UNION ALL
SELECT CAST('20190202'AS DATE),6,8,4,1 UNION ALL
SELECT CAST('20190203'AS DATE),7,2,4,1 UNION ALL
SELECT CAST('20200202'AS DATE),6,8,4,1 UNION ALL
SELECT CAST('20200203'AS DATE),7,2,4,3
)
SELECT C.DATE_SK,C.LOCATION_SK,C.DIVISION_SK,C.EMPLOYEE_TYPE_SK,C.VALUE,
LAG(C.VALUE)OVER(PARTITION BY C.LOCATION_SK,C.DIVISION_SK,C.EMPLOYEE_TYPE_SK ORDER BY C.DATE_SK ASC)LAGG
FROM CTE AS C
ORDER BY C.DATE_SK ASC;
Could you please try if the above is suitable for you. I assume,DATE_SK is a date column or can be CAST to a date

Generate 'average' column from sub query and ROW_NUMBER window function in SQL SELECT

I have the following SQL Server tables (with sample data):
Questionnaire
id | coachNodeId | youngPersonNodeId | complete
1 | 12 | 678 | 1
2 | 12 | 52 | 1
3 | 30 | 99 | 1
4 | 12 | 678 | 1
5 | 12 | 678 | 1
6 | 30 | 99 | 1
7 | 12 | 52 | 1
8 | 30 | 102 | 1
Answer
id | questionnaireId | score
1 | 1 | 1
2 | 2 | 3
3 | 2 | 2
4 | 2 | 5
5 | 3 | 5
6 | 4 | 5
7 | 4 | 3
8 | 5 | 4
9 | 6 | 1
10 | 6 | 3
11 | 7 | 5
12 | 8 | 5
ContentNode
id | text
12 | Zak
30 | Phil
52 | Jane
99 | Ali
102 | Ed
678 | Chris
I have the following T-SQL query:
SELECT
Questionnaire.id AS questionnaireId,
coachNodeId AS coachNodeId,
coachNode.[text] AS coachName,
youngPersonNodeId AS youngPersonNodeId,
youngPersonNode.[text] AS youngPersonName,
ROW_NUMBER() OVER (PARTITION BY Questionnaire.coachNodeId, Questionnaire.youngPersonNodeId ORDER BY Questionnaire.id) AS questionnaireNumber,
score = (SELECT AVG(score) FROM Answer WHERE Answer.questionnaireId = Questionnaire.id)
FROM
Questionnaire
LEFT JOIN
ContentNode AS coachNode ON Questionnaire.coachNodeId = coachNode.id
LEFT JOIN
ContentNode AS youngPersonNode ON Questionnaire.youngPersonNodeId = youngPersonNode.id
WHERE
(complete = 1)
ORDER BY
coachNodeId, youngPersonNodeId
This query outputs the following example data:
questionnaireId | coachNodeId | coachName | youngPersonNodeId | youngPersonName | questionnaireNumber | score
1 | 12 | Zak | 678 | Chris | 1 | 1
2 | 12 | Zak | 52 | Jane | 1 | 3
3 | 30 | Phil | 99 | Ali | 1 | 5
4 | 12 | Zak | 678 | Chris | 2 | 4
5 | 12 | Zak | 678 | Chris | 3 | 4
6 | 30 | Phil | 99 | Ali | 2 | 2
7 | 12 | Zak | 52 | Jane | 2 | 5
8 | 30 | Phil | 102 | Ed | 1 | 5
To explain what's happening here… There are various coaches whose job is to undertake questionnaires with various young people, and log the scores. A coach might, at a later date, repeat the questionnaire with the same young person several times, hoping that they get a better score. The ultimate goal of what I'm trying to achieve is that the managers of the coaches want to see how well the coaches are performing, so they'd like to see whether the scores for the questionnaires tend to go up or not. The window function represents a way to establish how many times the questionnaire has been undertaken by the same coach/young person combo.
I need to be able to determine the average score based on the questionnaire number. So for example, the coach 'Zak' logged scores of '1' and '3' for his first questionnaires (where questionnaireNumber = 1) so the average would be 2. For his second questionnaires (where questionnaireNumber = 2) the scores were '3' and '5' so the average would be 4. So in analysing this data we know that over time Zak's questionnaire scores have improved from an average of '2' the first time to an average of '4' the second time.
I feel like the query needs to be grouped by the coachNodeId and questionnaireNumber values so it would output something like this (I've ommitted the questionnaireId, youngPersonNodeId, youngPersonName and score columns as they aren't crucial for the output — they're only used to derive the averageScore — and wouldn't be useful the way the results are grouped):
coachNodeId | coachName | questionnaireNumber | averageScore
12 | Zak | 1 | 2 (calculation: (1 + 3) / 2)
12 | Zak | 2 | 4 (calculation: (3 + 5) / 2)
12 | Zak | 3 | 4 (only one value: 4)
30 | Phil | 1 | 5 (calculation: (5 + 5) / 2)
30 | Phil | 2 | 2 (only one value: 2)
Could anyone suggest how I can modify my query to output the average scores based on the score from the sub-query and the ROW_NUMBER window function? I've hit the limits of my SQL skills!
Many thanks.
It is a bit hard to tell without sample data, but I think you are describing aggregation:
SELECT q.coachNodeId AS coachNodeId,
cn.[text] AS coachName,
q.youngPersonNodeId AS youngPersonNodeId,
ypn.[text] AS youngPersonName,
AVG(score)
FROM Questionnaire q JOIN
ContentNode cn
ON q.coachNodeId = cn.id JOIN
ContentNode ypn
ON q.youngPersonNodeId = ypn.id LEFT JOIN
Answer a
ON a.questionnaireId = q.id
WHERE complete = 1
GROUP BY q.coachNodeID, cn.[text] AS coachName,
q.youngPersonNodeId, ypn.[text]

Postgres Query Based on Previous and Next Rows

I'm trying to solve the bus routing problem in postgresql which requires visibility of previous and next rows. Here is my solution.
Step 1) Have one edges table which represents all the edges (the source and target represent vertices (bus stops):
postgres=# select id, source, target, cost from busedges;
id | source | target | cost
----+--------+--------+------
1 | 1 | 2 | 1
2 | 2 | 3 | 1
3 | 3 | 4 | 1
4 | 4 | 5 | 1
5 | 1 | 7 | 1
6 | 7 | 8 | 1
7 | 1 | 6 | 1
8 | 6 | 8 | 1
9 | 9 | 10 | 1
10 | 10 | 11 | 1
11 | 11 | 12 | 1
12 | 12 | 13 | 1
13 | 9 | 15 | 1
14 | 15 | 16 | 1
15 | 9 | 14 | 1
16 | 14 | 16 | 1
Step 2) Have a table which represents bus details like from time, to time, edge etc.
NOTE: I have used integer format for "from" and "to" column for faster results as I can do an integer query, but I can replace it with any better format if available.
postgres=# select id, "busedgeId", "busId", "from", "to" from busedgetimes;
id | busedgeId | busId | from | to
----+-----------+-------+-------+-------
18 | 1 | 1 | 33000 | 33300
19 | 2 | 1 | 33300 | 33600
20 | 3 | 2 | 33900 | 34200
21 | 4 | 2 | 34200 | 34800
22 | 1 | 3 | 36000 | 36300
23 | 2 | 3 | 36600 | 37200
24 | 3 | 4 | 38400 | 38700
25 | 4 | 4 | 38700 | 39540
Step 3) Use dijkstra algorithm to find the nearest path.
Step 4) Get the upcoming buses from the busedgetimes table in the earliest first order for the nearest path detected by dijkstra algorithm.
Problem: I am finding it difficult to make the query for the Step 4.
For example: If I get the path as edges 2, 3, 4, to travel from source vertex 2 to target vertex 5 in the above records. To get the first bus for the first edge, it's not so hard as I can simply query with from < 'expected departure' order by from desc but for the second edge, the from condition requires to time of first result row. Also, query requires edge ids filter.
How can I achieve this in a single query?
I am not sure if I understood your problem correctly. But getting values from other rows this can be done by window functions (https://www.postgresql.org/docs/current/static/tutorial-window.html):
demo: db<>fiddle
SELECT
id,
lag("to") OVER (ORDER BY id) as prev_to,
"from",
"to",
lead("from") OVER (ORDER BY id) as next_from
FROM bustimes;
The lag function moves the value of the previous row into the current one. The lead function does the same with the next row. So you are able to calculate a difference between last arrival and current departure or something like that.
Result:
id prev_to from to next_from
18 33000 33300 33300
19 33300 33300 33600 33900
20 33600 33900 34200 34200
21 34200 34200 34800 36000
22 34800 36000 36300
Please notice that "from" and "to" are reserved words in PostgreSQL. It would be better to chose other names.

SQL comparing data in a table

I have a table that stores user's points in reading and writing and the point needed to meet the requirements for each user. Here's an example:
User | READING_PNTS | READING_REQ | WRITING_PNTS | WRITING_REQ
jim | 3 | 8 | 6 | 5
tim | 7 | 4 | 6 | 3
kim | 7 | 5 | 2 | 5
Ron | 6 | 4 | 8 | 4
Dom | 10 | 7 | 6 | 3
ton | 3 | 5 | 6 | 5
My resulting table should be simply the number of ppl who meet both requirements and the number of ppl who don't meet both requirements. So in this case it would be this:
Meet | Not Meet
3 | 3
Any help would be appreciated. Also, I'm working in Access for the record.
Thanks!
I think you just want conditional aggregation, which in MS Access uses iif() or swtich():
select sum(iif(reading_pnts >= reading_req and writing_pnts >= writing_req, 1, 0)) as meet,
sum(iif(reading_pnts >= reading_req and writing_pnts >= writing_req, 0, 1)) as not_meet
from t;