how to split table in postgresql - sql

I get a data set about 70 thousand rows and now I want to split this table into three with exact number of rows(the code was fisrt applied in SAS and now move to postgresql),one from 1-5000,two from 5001-25000 and last one with the rest row,and no duplicated rows in any of them.
like:
+--------+-----+--------+-----+
| cst_id | age | salary | sex |
+--------+-----+--------+-----+
| 1 | 44 | 2000 | M |
| 2 | 23 | 3000 | F |
| 3 | 34 | 4000 | M |
| 4 | 51 | 5000 | M |
| 5 | 26 | 6000 | F |
| 6 | 28 | 7000 | F |
| 7 | 39 | 8000 | M |
+--------+-----+--------+-----+
finally I want three table with the exact number of rows I assign(such as 3rows-2rows-rest rows),and they are all distinct.like:
table1:
+--------+-----+--------+-----+
| cst_id | age | salary | sex |
+--------+-----+--------+-----+
| 1 | 44 | 2000 | M |
| 2 | 23 | 3000 | F |
| 3 | 34 | 4000 | M |
+--------+-----+--------+-----+
table2:
+--------+-----+--------+-----+
| cst_id | age | salary | sex |
+--------+-----+--------+-----+
| 4 | 51 | 5000 | M |
| 5 | 26 | 6000 | F |
+--------+-----+--------+-----+
table3:
+--------+-----+--------+-----+
| cst_id | age | salary | sex |
+--------+-----+--------+-----+
| 6 | 28 | 7000 | F |
| 7 | 39 | 8000 | M |
+--------+-----+--------+-----+
how to use postgresql to finish this?

There is a window function "NTILE" can do this:
-- add a col to help split
create temp table help_table as
select *
,NTILE(3) OVER(ORDER BY cat_id) as batch_nbr
from your_table;
create table_1 as
select * from help_table where batch_nbr = 1;
create table_2 as
select * from help_table where batch_nbr = 2;
create table_3 as
select * from help_table where batch_nbr = 3;

You can split this process into steps as a function.
Get the total number of distinct rows.
Divide that value by 3 and store the value as a DECLARED variable (_size).
Create table_1, table_2, and table_3.
INSERT INTO table_1 with LIMIT (_size).
INSERT INTO table_2 with LIMIT (_size) WHERE id > table_1's greatest id.
INSERT INTO table_3 with LIMIT (_size) WHERE id > table_2's greatest id.
Hopefully this helps.

Related

Get the Id of the matched data from other table. No duplicates of ID from both tables

Here is my table A.
| Id | GroupId | StoreId | Amount |
| 1 | 20 | 7 | 15000 |
| 2 | 20 | 7 | 1230 |
| 3 | 20 | 7 | 14230 |
| 4 | 20 | 7 | 9540 |
| 5 | 20 | 7 | 24230 |
| 6 | 20 | 7 | 1230 |
| 7 | 20 | 7 | 1230 |
Here is my table B.
| Id | GroupId | StoreId | Credit |
| 12 | 20 | 7 | 1230 |
| 14 | 20 | 7 | 15000 |
| 15 | 20 | 7 | 14230 |
| 16 | 20 | 7 | 1230 |
| 17 | 20 | 7 | 7004 |
| 18 | 20 | 7 | 65523 |
I want to get this result without getting duplicate Id of both table.
I need to get the Id of table B and A where the Amount = Credit.
| A.ID | B.ID | Amount |
| 1 | 14 | 15000 |
| 2 | 12 | 1230 |
| 3 | 15 | 14230 |
| 4 | null | 9540 |
| 5 | null | 24230 |
| 6 | 16 | 1230 |
| 7 | null | 1230 |
My problem is when I have 2 or more same Amount in table A, I get duplicate ID of table B. which should be null. Please help me. Thank you.
I think you want a left join. But this is tricky because you have duplicate amounts, but you only want one to match. The solution is to use row_number():
select . . .
from (select a.*, row_number() over (partition by amount order by id) as seqnum
from a
) a left join
(select b.*, row_number() over (partition by credit order by id) as seqnum
from b
)b
on a.amount = b.credit and a.seqnum = b.seqnum;
Another approach, I think simplier and shorter :)
select ID [A.ID],
(select top 1 ID from TABLE_B where Credit = A.Amount) [B.ID],
Amount
from TABLE_A [A]

How to make rows shuffle?

There is a table with over 10+ rows, and now needed to shuffle all rows randomly and create a new table on it. any ideas ?
Using select * from table order by random() seems slow.
raw table is like,and the target column is separated into two parts:
+--------+------+--------+------+-----+--------+
| cst_id | name | salary | fund | age | target |
+--------+------+--------+------+-----+--------+
| 1 | a | 100 | Y | 33 | 0 |
| 2 | b | 200 | Y | 21 | 0 |
| 3 | c | 300 | Y | 45 | 0 |
| 4 | d | 400 | N | 26 | 0 |
| 5 | e | 500 | N | 37 | 0 |
| 6 | f | 600 | Y | 56 | 0 |
| 7 | g | 700 | Y | 44 | 0 |
| 8 | h | 800 | N | 22 | 1 |
| 9 | i | 900 | N | 38 | 1 |
| 10 | j | 1000 | Y | 61 | 1 |
| 11 | k | 1100 | N | 51 | 1 |
| 12 | l | 1200 | N | 21 | 1 |
| 13 | m | 1300 | Y | 32 | 1 |
| 14 | n | 1400 | N | 17 | 1 |
+--------+------+--------+------+-----+--------+
after:
+--------+------+--------+------+-----+--------+
| cst_id | name | salary | fund | age | target |
+--------+------+--------+------+-----+--------+
| 1 | a | 100 | Y | 33 | 0 |
| 2 | b | 200 | Y | 21 | 0 |
| 8 | h | 800 | N | 22 | 1 |
| 9 | i | 900 | N | 38 | 1 |
| 3 | c | 300 | Y | 45 | 0 |
| 13 | m | 1300 | Y | 32 | 1 |
| 14 | n | 1400 | N | 17 | 1 |
| 5 | e | 500 | N | 37 | 0 |
| 6 | f | 600 | Y | 56 | 0 |
| 7 | g | 700 | Y | 44 | 0 |
| 10 | j | 1000 | Y | 61 | 1 |
| 11 | k | 1100 | N | 51 | 1 |
| 4 | d | 400 | N | 26 | 0 |
+--------+------+--------+------+-----+--------+
Following explanation is to create NEW table from existing one with same data as in old one(same schema) with shuffled rows.
Create a new table and import all those rows and records from first table, randomly selected and ordered by the RAND() SQL function:
CREATE TABLE new_table SELECT * FROM old_table ORDER BY RAND()
Or if you have created a table identical to the structure of the old one, use INSERT INTO instead:
INSERT INTO new_table SELECT * FROM old_table ORDER BY RAND()
That is of course if you want to preserve the primary key identification of each row, which is most likely what you want to do with old tables because of the legacy code and data entity relationships. However, if you want a grand new table with all the shuffled records completely rearranged in order as if it’s for a different application, you can ignore the primary key or ID by not importing the ID field of the old table.
For instance, you got ID, col1 and col2 in the old table as data fields. To create a grand new reordered or shuffled rows version of old table:
CREATE TABLE new_table SELECT col1, col2 FROM old_table ORDER BY RAND()
And a new primary key ID will be automatically assigned to each of the rows in the new table.
But in SQL, Relations have no order. Rows in a relational database are not sorted. You may get different order while retrieving.

How can I compare 2 tables in MS-SQL?

How can I compare 2 tables with the same rows, but different data?
The tables are something like this:
1. Table price_old:
|-----------------------|
| id | price1 | price2 |
|-----------------------|
| 1 | 12 | 12 |
|-----------------------|
| 2 | 12 | 55 |
------------------------|
| 3 | 12 | 40 |
-------------------------
The tables are something like this:
2. Table price_old:
|-----------------------|
| id | price1 | price2 |
|-----------------------|
| 1 | 12 | 12 |
|-----------------------|
| 2 | 13 | 40 |
------------------------|
| 3 | 10 | 40 |
-------------------------
The Result should look like this:
3. Table Result:
|----------------------------------------------------------|
| id | price1_old | price1_new | price2_old | price2_new |
|----------------------------------------------------------|
| 2 | 12 | 13 | 55 | 40 |
|----------------------------------------------------------|
| 3 | 13 | 10 | 40 | 40 |
Try this, joining on the id and filtering out occurrences where at least one price is different:
SELECT
old.id,
old.price1 as price1_old,
old.price2 as price2_old,
new.price1 as price1_new,
new.price2 as price2_new
FROM price_old as old
LEFT JOIN price_new as new on old.id=new.id
WHERE old.price1<>new.price1
OR old.price2<>new.price2
This is might be an approach:
SELECT 'TableName' AS `set`, r.*
FROM robot r
WHERE ROW(r.col1, r.col2, …) NOT IN
(
SELECT *
FROM TableName2
)
UNION ALL
SELECT 'TableName2' AS `set`, t.*
FROM tbd_robot t
WHERE ROW(t.col1, t.col2, …) NOT IN
(
SELECT *
FROM TableName1
)

find other columns value based on maximum of one column using groupby particular column

I have data like below
+-------+---------+--------+
| Count | Mindif | Device |
+-------+---------+--------+
| 45 | 3 | A |
| 78 | 4 | A |
| 52 | 5 | A |
| 24 | 6 | A |
| 22 | 1 | B |
| 22 | 2 | B |
| 34 | 3 | B |
| 37 | 4 | B |
| 52 | 5 | B |
| 34 | 6 | B |
| 13 | 1 | C |
| 30 | 2 | C |
| 57 | 3 | C |
| 111 | 4 | C |
| 35 | 5 | C |
+-------+---------+--------+
Want to find Mindif and device based on max value of count.
Output be like
+-------+---------+--------+
| Count | Mindif | Device |
+-------+---------+--------+
| 78 | 4 | A |
| 52 | 5 | B |
| 111 | 4 | C |
+-------+---------+--------+
You can use a query like this:
SELECT t1.Count, t1.Mindif, t1.Device
FROM mytable AS t1
JOIN (
SELECT Device, MAX(Count) AS Count
FROM mytable
GROUP BY Device
) AS t2 ON t1.Device = t2.Device AND t1.Count = t2.Count
The query uses a derived table that returns the max Count value per Device. Joining back to the original table we can get the desired result.
using Window Function
SELECT Count, Mindif, Device
FROM
(SELECT Count, Mindif, Device,
rank() over (order by Count desc) as r
FROM table) S
WHERE S.r = 1;
OR
Simple Join with MAX
SELECT a.* FROM table a
LEFT SEMI JOIN
(SELECT MAX(Count)Cnt
FROM table)b on (a.Count = b.Cnt)

MS Access SQL query from 3 tables

I have 3 tables shown below in MS Access 2010:
Table: devices
id | device_id | Company | Version | Revision |
-----------------------------------------------
1 | dev_a | Almaras | 1.5.1 | 0.2A |
2 | dev_b | Enigma | 1.5.1 | 0.2A |
3 | dev_c | Almaras | 1.5.1 | 0.2C |
*Field: device_id is Primary Key Unique String
*Field ID is just an auto-number column
Table: activities
id | act_id | act_date | act_type | act_note |
------------------------------------------------
1 | dev_a | 07/22/2013 | usb_axc | ok |
2 | dev_a | 07/23/2013 | usb_axe | ok | (LAST ROW for dev_a)
3 | dev_c | 07/22/2013 | usb_axc | ok | (LAST ROW for dev_c)
4 | dev_b | 07/21/2013 | usb_axc | ok | (LAST ROW for dev_b)
*Field: act_id contains device_id; NOT UNIQUE
*Field ID is just an auto-number column
Table: matrix
id | mat_id | tc | ts | bat | cycles |
-----------------------------------------
1 | dev_a | 2811 | 10 | 99 | 200 |
2 | dev_a | 2911 | 10 | 97 | 400 |
3 | dev_a | 3007 | 10 | 94 | 600 |
4 | dev_a | 3210 | 10 | 92 | 800 | (LAST ROW for dev_d)
5 | dev_b | 1100 | 5 | 98 | 100 |
6 | dev_b | 1300 | 8 | 93 | 200 |
7 | dev_b | 1411 | 11 | 90 | 300 | (LAST ROW for dev_b)
8 | dev_c | 4000 | 27 | 77 | 478 | (LAST ROW for dev_c)
*Field: mat_id contains device_id; NOT UNIQUE
*Field ID is just an auto-number column
Is there any way to query tables to get results as shown below (each device from devices and only last row added [see example output table] from each of the other two tables):
Query Results:
device_id | Company | act_date | act_type | bat | cycles |
------------------------------------------------------------
device_a | Almaras | 07/23/2013 | usb_axe | 92 | 800 |
device_b | Enigma | 07/21/2013 | usb_axc | 90 | 300 |
device_c | Almaras | 07/22/2013 | usb_axc | 77 | 478 |
Any ideas? Thank you in advance for reading and helping me out :)
I think is what you want,
SELECT a.device_id, a.Company,
b.act_date, b.act_type,
c.bat, c.cycles
FROM ((((devices AS a
INNER JOIN activities AS b
ON a.device_id = b.act_id)
INNER JOIN matrix AS c
ON a.device_id = c.mat_id)
INNER JOIN
(
SELECT act_id, MAX(act_date) AS max_date
FROM activities
GROUP BY act_id
) AS d ON b.act_id = d.act_id AND b.act_date = d.max_date)
INNER JOIN
(
SELECT mat_id, MAX(tc) AS max_tc
FROM matrix
GROUP BY mat_id
) AS e ON c.mat_id = e.mat_id AND c.tc = e.max_tc)
The subqueries: d and e separately gets the latest row for every act_id.
Try
SELECT devices.device_id, devices.Company, activities.act_data, activities.act_type, matrix.bat, matrix.cycles
FROM devices
LEFT JOIN activities
ON devices.device_id = activities.act_id
LEFT JOIN matrix
ON devices.device_id = matrix.mat_id;
What do you consider the "last" row in Matrix?
You need to do something like
WHERE act_date in (SELECT max(a.act_date) from activities a where a.mat_id=d.device_id GROUP BY a.mat_id)
and something similar for the join to matrix.