Converting single entries into an array with PostgreSQL

I have three tables of data: A, AC, and C. Table AC simply connects A and C together with foreign keys; currently every row in AC is a single A-to-C link. My question is whether it makes sense to collapse all the AC rows for each A entry into one row holding an array.
The relationship between A and C is one-to-many, so the array length would basically be 1..x. The mean number of AC entries per A entry is around 6, so switching to arrays would shrink AC significantly.
Or should I instead remove AC and simply add the FK field to the A table?
What are the pitfalls of using arrays in this use case? Will I have to use JSON-style entries in an array: ['blah','blah2','blah3']?
The example below explains the structure of the database, while values are bogus:
Table A
| id | name |
-------------
| 1  | John |
| 2  | Jim  |
| 3  | Joe  |
Table AC
| id | id_a | id_c |
--------------------
| 1  | 1    | 1    |
| 2  | 1    | 4    |
| 3  | 2    | 2    |
| 4  | 3    | 3    |
| 5  | 3    | 1    |
Table C
| id | name   |
---------------
| 1  | Pie    |
| 2  | Cake   |
| 3  | Burger |
| 4  | Ice    |
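For reference, this is roughly what the array variant would look like in PostgreSQL. This is only a sketch against the example tables above; the column name c_ids is invented for illustration:

ALTER TABLE a ADD COLUMN c_ids integer[];

-- Fill the array column from the existing AC rows.
UPDATE a
SET    c_ids = sub.ids
FROM  (SELECT id_a, array_agg(id_c) AS ids
       FROM   ac
       GROUP  BY id_a) AS sub
WHERE  a.id = sub.id_a;

-- Membership queries then need ANY:
SELECT * FROM a WHERE 3 = ANY (c_ids);

The main pitfalls: array elements cannot be foreign keys, so the database can no longer guarantee that every id stored in c_ids actually exists in C, and the ANY lookup needs a GIN index to stay fast rather than the ordinary B-tree index you get for free with the junction-table pattern.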


Access - retrieving status and date puzzle

In Access 2016, I have the following tables:
Table1
------
| ID | FK_ID | Status_date | Status_ID |
----------------------------------------
| 1  | 11    | d1          | 1         |
| 2  | 11    | d2          | 2         |
| 3  | 22    | d3          | 3         |
| 4  | 22    | d4          | 3         |
LookupTable1
------------
| OBJ_ID | Status |
-------------------
| 1      | A      |
| 2      | B      |
| 3      | C      |
And I would like to produce the following result. This will ultimately be exported to Excel.
xls report
==========
| FK_ID | Status_1_date | Status_2_date | Status_3_date | <-- these will be aliased
=========================================================
| 11 | d1 | d2 | |
| 22 | | | d4 |
The part of the puzzle I'm struggling with is that there seem to be at least these different ways to achieve it: a) multiple Access queries; b) a single Access query with in-line queries (is that possible?); c) VBA code; d) in SQL Server itself; e) something else. What's the simplest way to create and maintain this, given that the LookupTable1.Status values will change?
Looks like a simple CROSSTAB query:
TRANSFORM Max(Table1.Status_date) AS MaxOfStatus_date
SELECT Table1.FK_ID
FROM Table1
GROUP BY Table1.FK_ID
PIVOT Table1.Status_ID;
If you want the output to show the Status alias values, first JOIN the two tables, then use the Status field as the column header in the CROSSTAB.
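That joined variant might look like the following. This is an untested sketch; it assumes Table1.Status_ID matches LookupTable1.OBJ_ID, as the sample data suggests:

TRANSFORM Max(Table1.Status_date) AS MaxOfStatus_date
SELECT Table1.FK_ID
FROM Table1
INNER JOIN LookupTable1 ON Table1.Status_ID = LookupTable1.OBJ_ID
GROUP BY Table1.FK_ID
PIVOT LookupTable1.Status;

Because the crosstab pivots on LookupTable1.Status, the column headers track any later changes to the Status values automatically, which addresses the maintenance concern.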

Is there an easy/efficient way to do multiple lookups in SQL for a value from an ID without multiple joins?

I have two tables. One has an ID column and then other data. The other has multiple "ID" fields that correspond to the first table. I know I can do multiple joins, but each table has 400,000+ records, so I thought there might be a more efficient way.
Here is an example of what I mean:
This is my lookup table:
+----+------+---------+
| ID | Name | Title |
+----+------+---------+
| 1 | a | alpha |
| 2 | b | bravo |
| 3 | c | charlie |
| 4 | d | delta |
+----+------+---------+
This is my input table. The example only has 2 lookup columns but the production table has 10.
+----+------------+------------+
| ID | Lookup ID1 | Lookup ID2 |
+----+------------+------------+
| 1 | 1 | 3 |
| 2 | 2 | 3 |
| 3 | 2 | 4 |
| 4 | 2 | 4 |
| 5 | 2 | 2 |
| 6 | 2 | 2 |
| 7 | 3 | 4 |
| 8 | 1 | 3 |
+----+------------+------------+
This is the expected output.
+----+---------------+----------------+---------------+----------------+
| ID | Lookup Name 1 | Lookup Title 1 | Lookup Name 2 | Lookup Title 2 |
+----+---------------+----------------+---------------+----------------+
| 1 | a | alpha | c | charlie |
| 2 | b | bravo | c | charlie |
| 3 | b | bravo | d | delta |
| 4 | b | bravo | d | delta |
| 5 | b | bravo | b | bravo |
| 6 | b | bravo | b | bravo |
| 7 | c | charlie | d | delta |
| 8 | a | alpha | c | charlie |
+----+---------------+----------------+---------------+----------------+
I know I can get this using multiple joins (10 in my production environment), but is there a more efficient way?
You do this operation with joins; that is what relational databases are built for:
select i.id, l1.*, l2.*
from input i
left join lookup l1 on i.lookupid1 = l1.id
left join lookup l2 on i.lookupid2 = l2.id;
(The left joins make sure no rows are lost if one of the ids is missing.)
For performance, you want an index on lookup(id) or lookup(id, name). Because id should be declared as a primary key, it is already indexed and no additional index is necessary.
You could do it with a single join, but I wouldn't recommend it. It would involve a complex join condition, something like a.id IN (b.id1, b.id2, ....), plus conditional aggregation, and it would likely be much slower than the multi-join equivalent.
It is likely that the table structure is simply poor for your needs (though it is hard to be certain without knowing more about what is being represented); if you cannot change it, the multi-join is your best solution.
Also, your assumption that multiple joins would be inefficient is incorrect. "Efficiency" in an RDBMS is generally determined more by indexing and the optimizer's ability to use those indexes. One of the reasons a.id IN (b.id1, b.id2, ....) performs poorly is that MySQL largely ignores indexes when a condition contains even a single OR (or its equivalent).
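For illustration, the single-join-with-conditional-aggregation version warned against above would look something like this (a sketch with only the two lookup columns from the example; not recommended):

select i.id,
       max(case when l.id = i.lookupid1 then l.name  end) as lookup_name_1,
       max(case when l.id = i.lookupid1 then l.title end) as lookup_title_1,
       max(case when l.id = i.lookupid2 then l.name  end) as lookup_name_2,
       max(case when l.id = i.lookupid2 then l.title end) as lookup_title_2
from input i
join lookup l on l.id in (i.lookupid1, i.lookupid2)
group by i.id;

The IN condition here is exactly the OR-style predicate that prevents index use, which is why the plain multi-join above is the better plan.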

Developing SCV using SQL

I am trying to identify all related records using IDs from two different systems.
I have seen solutions that match SourceA to SourceB and back to SourceA, but obviously this will not pick up everything.
The table below shows that 1-A is seemingly unrelated to 4-C; however, when we pair them up we can see that all of the records below are related, and that the latest ID combination is 4-C.
| SystemA_ID | SystemB_ID | Date     | PrimaryA | PrimaryB |
| 1          | A          | 1/1/2016 | 4        | C        |
| 2          | A          | 2/1/2016 | 4        | C        |
| 2          | B          | 3/1/2016 | 4        | C        |
| 3          | B          | 4/1/2016 | 4        | C        |
| 3          | C          | 5/1/2016 | 4        | C        |
| 4          | C          | 6/1/2016 | 4        | C        |
What I need is to populate the PrimaryA and PrimaryB columns with 4 and 'C' respectively.
I was thinking of doing a double loop similar to the solution described here
However, I could not get it working, and there might well be a better solution anyway.
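One approach that does chain everything together is label propagation: seed each row with its own group key, then repeatedly spread the smallest key across rows that share either ID until nothing changes. A T-SQL-style sketch, where the table name pairs and the column grp are invented for illustration:

-- Seed: every row starts in a group keyed by its own SystemA_ID.
ALTER TABLE pairs ADD grp int;
UPDATE pairs SET grp = SystemA_ID;

-- Repeat this pair of updates (e.g. in a WHILE loop checking
-- @@ROWCOUNT) until neither statement changes any rows:
UPDATE p SET grp = m.g
FROM pairs p
JOIN (SELECT SystemB_ID, MIN(grp) AS g FROM pairs GROUP BY SystemB_ID) m
  ON m.SystemB_ID = p.SystemB_ID
WHERE p.grp > m.g;

UPDATE p SET grp = m.g
FROM pairs p
JOIN (SELECT SystemA_ID, MIN(grp) AS g FROM pairs GROUP BY SystemA_ID) m
  ON m.SystemA_ID = p.SystemA_ID
WHERE p.grp > m.g;

Once the groups are stable, PrimaryA and PrimaryB for each group are simply the SystemA_ID and SystemB_ID of that group's latest-dated row. On the sample data this converges in a few passes, linking 1-A through to 4-C.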

Joining two tables while linking with a crosswalk table

I am trying to write a query to de-identify one of my tables. To make distinct IDs for people, I used name, age, and sex. However, in my main table the data has been collected for years, and the sex code changed from 1 meaning male and 2 meaning female to M meaning male and F meaning female. To make this uniform in my distinct-individuals table, I used a crosswalk table to convert the sex code to the correct format before placing it into the distinct-patients table.
I am now trying to write the query that matches the distinct patient IDs to the correct rows from the main table. The issue is that the sex code for some rows has changed. I know I could run an UPDATE on the main table and change all of the 1s and 2s to M and F. However, I was wondering whether there is a way to match the old sex codes to the new ones so I would not have to run that update. Is there a way to join the main and distinct-IDs tables while using the crosswalk table to convert the sex codes again? Below are the example tables I am currently using.
This is my main table, which I want to de-identify:
-----------------------------
| Name  | age | sex | Toy   |
-----------------------------
| Stacy | 30  | 1   | Bat   |
| Sue   | 21  | 2   | Ball  |
| Jim   | 25  | 1   | Ball  |
| Stacy | 30  | M   | Ball  |
| Sue   | 21  | F   | glove |
| Stacy | 18  | F   | glove |
-----------------------------
Sex code crosswalk table
-------------------
| SexOld | SexNew |
-------------------
| M | M |
| F | F |
| 1 | M |
| 2 | F |
-------------------
This is the table I used to populate IDs for the people I found to be distinct in my main table:
--------------------------
| ID | Name | age | sex |
--------------------------
| 1 | Stacy| 30 | M |
| 2 | Jim | 25 | M |
| 3 | Stacy| 18 | F |
| 4 | Sue | 21 | F |
--------------------------
This is what I want my de-identified table to look like:
---------------
| ID | Toy |
---------------
| 1 | Bat |
| 4 | Ball |
| 2 | Ball |
| 1 | Ball |
| 4 | glove |
| 3 | glove |
---------------
select c.ID, a.Toy
from maintable a
left join sexcodecrosswalk b on b.sexold = a.sex
left join peopleids c on c.Name = a.Name and c.age = a.age and c.Sex = b.sexNew
Here's a demonstration that this works:
http://sqlfiddle.com/#!3/a2d26/1

Possible pitfalls in my pagination technique, and how can I improve it?

I want to paginate results on my web page. The method I am using (and the one I mostly found on the internet) is explained below with an example.
Suppose I have the following table user
+----+------+----------+
| id | name | category |
+----+------+----------+
| 1 | a | 1 |
| 2 | b | 2 |
| 3 | c | 2 |
| 4 | d | 3 |
| 5 | e | 1 |
| 6 | f | 3 |
| 7 | g | 1 |
| 8 | h | 3 |
| 9 | i | 2 |
| 10 | j | 2 |
| 11 | k | 1 |
| 12 | l | 3 |
| 13 | m | 3 |
| 14 | n | 3 |
| 15 | o | 1 |
| 16 | p | 1 |
| 17 | q | 2 |
| 18 | r | 1 |
| 19 | s | 3 |
| 20 | t | 3 |
| 21 | u | 3 |
| 22 | v | 3 |
| 23 | w | 1 |
| 24 | x | 1 |
| 25 | y | 2 |
| 26 | z | 2 |
+----+------+----------+
I want to show information about category-3 users, two users per page, and I am using the following query for this:
select * from user where category=3 limit 0,2;
+----+------+----------+
| id | name | category |
+----+------+----------+
| 4 | d | 3 |
| 6 | f | 3 |
+----+------+----------+
and for the next two:
select * from user where category=3 limit 2,2;
+----+------+----------+
| id | name | category |
+----+------+----------+
| 8 | h | 3 |
| 12 | l | 3 |
+----+------+----------+
and so on.
Now, in practice I have around 7,000 rows in a single table. So is there a better way to do this in terms of speed, and are there any pitfalls this method may have?
Thanks.
You don't want to fetch more values than your current page can handle, so yes, you will essentially be making one query per page. Some other solutions (such as Rails will_paginate) will execute essentially the same queries.
Now, you could build some logic into your client side to do the pagination there - prefetch multiple (or all) pages at once and store them on the client side. This way pagination is handled completely on the client side without need for further queries. It is a bit wasteful if a user is likely to only look at a small percentage of pages overall though.
If your actual production table has more columns in it, select only the relevant columns instead of *. You may also want an ORDER BY for a defined sort order; without one, the order of rows across pages is not guaranteed.
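At around 7,000 rows, LIMIT/OFFSET is perfectly fine. If the table grows much larger, a common alternative is keyset ("seek") pagination: remember the last id shown and start the next page after it, which avoids scanning and discarding the skipped rows. A sketch against the example table:

-- Page 1:
SELECT id, name, category
FROM   user
WHERE  category = 3
ORDER  BY id
LIMIT  2;            -- returns ids 4 and 6

-- Page 2: seek past the last id seen instead of using OFFSET
SELECT id, name, category
FROM   user
WHERE  category = 3 AND id > 6
ORDER  BY id
LIMIT  2;            -- returns ids 8 and 12

The trade-off is that you can only step page by page (no jumping straight to page 50), and it relies on ordering by an indexed, unique column.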
I hope this helps. Put your page number in place of your_page_number, and the records per page in place of records_per_page (2 in your sample):
select A.*
from (select @row := @row + 1 as rn, user.*
      from user
      join (select @row := 0) row_temp_view
      where category = 3
     ) A
where rn between (your_page_number * records_per_page) - records_per_page + 1
             and your_page_number * records_per_page;
Notice that this fetches the right records where your sample may not, because your sample always fetches two records per page, which is not always correct. Say you have three users to show across two pages: your sample would show the first and second on page one, and the second and third on page two, which is wrong. This query shows the first and second on page one, and only the third on page two.
You can use DataTables. It's meant for exactly this use case. I use it successfully to paginate more than a million rows; it's very fast and easy to implement.