Table whose columns are random samples from an original column - sql

I have numerical data related to clinical information from people with a particular disease, recorded in a specific column ('Lab') of Table A.
I need to get a Table B with 30 rows and 50 columns.
The columns of Table B should be random samples of the values contained in column 'Lab' (nearly 3300 records).
I am able to get a table with one column using:
SELECT Lab FROM Table_A sample (1) WHERE Lab IS NOT NULL;
Is it possible to write a query using the SELECT command that produces Table B with all 50 columns, without having to build its columns one by one?

You can use the RAND function with ORDER BY, like this:
SELECT * FROM Table_A ORDER BY RAND() LIMIT 0, 30;
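The ORDER BY query above returns 30 random rows, but not the 50 sampled columns the question asks for. One possible way to get the full 30 x 50 table in a single SELECT is to shuffle the non-null values once and pivot them with conditional aggregation. The following is only a sketch under assumptions not in the original: it uses SQLite through Python (RANDOM() and ROW_NUMBER(), which need SQLite 3.25 or later), the helper name sample_matrix and the column names s1..s50 are hypothetical, and the data is made up. The columns are sampled without replacement across the whole table.

```python
import sqlite3

def sample_matrix(conn, n_rows=30, n_cols=50):
    """Return n_rows x n_cols random values from Table_A.Lab using one SELECT."""
    # Shuffled position k lands in output row k / n_cols and output column k % n_cols.
    cols = ",\n      ".join(
        f"MAX(CASE WHEN k % {n_cols} = {c} THEN Lab END) AS s{c + 1}"
        for c in range(n_cols)
    )
    sql = f"""
    WITH shuffled AS (
      SELECT Lab, ROW_NUMBER() OVER (ORDER BY RANDOM()) - 1 AS k
      FROM Table_A
      WHERE Lab IS NOT NULL
    )
    SELECT {cols}
    FROM shuffled
    WHERE k < {n_rows * n_cols}
    GROUP BY k / {n_cols}
    """
    return conn.execute(sql).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table_A (Lab REAL)")
# Hypothetical data: 3300 distinct values plus a few NULLs.
values = [(i / 10,) for i in range(3300)] + [(None,)] * 5
conn.executemany("INSERT INTO Table_A VALUES (?)", values)

table_b = sample_matrix(conn)
print(len(table_b), len(table_b[0]))
```

The same pivot idea should carry over to other databases that have window functions, with the random function swapped for the local equivalent.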

Related

BigQuery: join 2 tables on an id concatenated from 4 columns and create a new table dynamically

I have two tables in BigQuery from two different data sources, let's say x and y. I want to join these two tables on the os_name, tracker_name, date and country columns. For that I am using the concat function and joining like this:
full outer join x on concat(x.date,x.os_name,x.tracker_name, x.country) = concat(y.date,y.os_name,y.tracker_name,y.country_code)
As a result, the common columns get duplicated in the query output: there are os_name and os_name_1, country_code and country_code_1, and so on. I don't want that. The final columns should be as shown below in the required final table schema.
I want to return all records from both sides. For example, if there is no match in table y, y_install and y_purchase will be 0, and vice versa.
X TABLE SCHEMA:
os_name,
tracker_name,
date,
country,
install,
purchase
Y TABLE SCHEMA:
os_name,
tracker_name,
date,
country,
y_install,
y_purchase
Final Table Schema required:
os_name,
tracker_name,
date,
country,
install,
purchase,
y_install,
y_purchase
I am going to schedule the query and write the results to a destination table at a given interval.
Can you help me out with this query?
Regarding the final table, I don't understand whether you want to return the first non-NULL result, or whether you want e.g. an array containing the results from both tables when both tables have a valid value. In my sample table, do you want row 1 or 2 (effectively the same thing), or row 3?
row_number  x_install  y_install  final_table_install
1           23         50         23
2           NULL       50         50
3           23         50         [23,50]
It turns out that what I wanted to use was UNION ALL. First, I added the non-common columns to the two tables so that their schemas are equal. That let me merge the tables vertically using UNION ALL. Thanks for trying to help out anyway.
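The padding-plus-UNION ALL approach described above can be sketched as follows. This is a minimal illustration, assuming SQLite through Python rather than BigQuery, the schemas listed in the question, and made-up sample rows; the missing measure columns are padded with 0 as the question requests.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE x (os_name TEXT, tracker_name TEXT, date TEXT, country TEXT,
                install INTEGER, purchase INTEGER);
CREATE TABLE y (os_name TEXT, tracker_name TEXT, date TEXT, country TEXT,
                y_install INTEGER, y_purchase INTEGER);
-- Hypothetical sample rows.
INSERT INTO x VALUES ('ios', 't1', '2021-01-01', 'US', 23, 2);
INSERT INTO y VALUES ('ios', 't1', '2021-01-01', 'US', 50, 5);
INSERT INTO y VALUES ('android', 't2', '2021-01-01', 'DE', 7, 1);
""")

# Pad each side with zeros for the columns it lacks, then stack the rows.
rows = conn.execute("""
SELECT os_name, tracker_name, date, country,
       install, purchase, 0 AS y_install, 0 AS y_purchase
FROM x
UNION ALL
SELECT os_name, tracker_name, date, country,
       0, 0, y_install, y_purchase
FROM y
""").fetchall()

for row in rows:
    print(row)
```

Note that this stacks the rows rather than matching them: a key present in both tables appears twice. If one row per key is wanted, the stacked result can be wrapped in a GROUP BY on the four key columns with SUM over the four measures.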

How to find out if a row of one table exists in the values of at least one row of another table?

I have two SQL tables, example below:
Table 1 (column types varchar, integer, numeric)
A    | B  | C     | D
A007 | 22 | 14.02 | _Z 1
A008 | 36 | 15.06 | _Z 1
Table 2 (column types varchar)
A              | B        | C                 | D
A009,A010,A011 | 33,35,36 | 16.06,17.06       | _Z 1,_Z 2
A003,A007,A009 | 14,22,85 | 13.01,17.05,14.02 | _Z 1
Is there a way to compare individual rows of the first table with the rows of the second table and find out which row of the first table does not occur in the values of any row of the second table?
As can be seen, the first row of table 1 occurs in the values of the second row of table 2.
However, the second row of table 1 does not occur in the values of the rows of table 2, therefore the desired output is row 2 of table 1.
Desired output table:
A    | B  | C     | D
A008 | 36 | 15.06 | _Z 1
What I have tried so far:
My solution was to create a table containing all possible combinations of column values for each row of the second table (with the same column data types as the columns of the first table) and then use SELECT * FROM TABLE1 EXCEPT SELECT * FROM TABLE2 to get the difference rows.
The solution worked (for relatively small tables) but I am currently in a situation where generating all combinations of column values for each row of the second table (which in my case has 500 rows) results in a table containing millions of rows, so I am looking for another solution, where I can use the original table with 500 rows.
Thank you in advance for any possible answer, preferably one that could also work in the IBM DB2 database.
We can use a LIKE trick here along with string concatenation:
SELECT t1.*
FROM Table1 t1
WHERE NOT EXISTS (
    SELECT 1
    FROM Table2 t2
    WHERE ',' || t2.A || ',' LIKE '%,' || t1.A || ',%'
);
Note that it would be a preferable table design for Table2 to not store CSV values in this way. Instead, get every A value onto a separate row.
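The query above only compares column A. A possible extension that checks all four columns is sketched below, assuming SQLite through Python and the sample data from the question. The CAST to TEXT for the numeric columns is an assumption about how values are rendered as strings; DB2 would need its own conversion (for example VARCHAR or CHAR), and the formatting of decimals may differ there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (A TEXT, B INTEGER, C NUMERIC, D TEXT);
CREATE TABLE Table2 (A TEXT, B TEXT, C TEXT, D TEXT);
INSERT INTO Table1 VALUES
  ('A007', 22, 14.02, '_Z 1'),
  ('A008', 36, 15.06, '_Z 1');
INSERT INTO Table2 VALUES
  ('A009,A010,A011', '33,35,36', '16.06,17.06', '_Z 1,_Z 2'),
  ('A003,A007,A009', '14,22,85', '13.01,17.05,14.02', '_Z 1');
""")

# A Table1 row counts as "found" only if one Table2 row contains all four values.
missing = conn.execute("""
SELECT t1.*
FROM Table1 t1
WHERE NOT EXISTS (
    SELECT 1
    FROM Table2 t2
    WHERE ',' || t2.A || ',' LIKE '%,' || t1.A || ',%'
      AND ',' || t2.B || ',' LIKE '%,' || CAST(t1.B AS TEXT) || ',%'
      AND ',' || t2.C || ',' LIKE '%,' || CAST(t1.C AS TEXT) || ',%'
      AND ',' || t2.D || ',' LIKE '%,' || t1.D || ',%'
)
""").fetchall()

print(missing)
```

One caveat: LIKE treats % and _ in the Table1 values as wildcards, so a value such as '_Z 1' also matches list entries whose first character is something other than an underscore. If that matters, the pattern needs an ESCAPE clause or a different matching function.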

How to aggregate data stored column-wise in a matrix table

I have the following table; ellipses (...) represent multiple columns of a similar type.
TABLE: diagnosis_info
COLUMNS: visit_id,
patient_diagnosis_code_1 ...
patient_diagnosis_code_100 -- char(100) with a value of '0' or '1'
How do I find the most common diagnosis_code? There are 101 columns including the visit_id. The table is like a matrix table of 0s and 1s. How do I write something that can dynamically account for all the columns and count all the rows where the value is 1?
What I would normally do is not feasible as there are too many columns:
SELECT COUNT(patient_diagnosis_code_1), COUNT(patient_diagnosis_code_2),... FROM diagnosis_info WHERE patient_diagnosis_code_1 = '1' and patient_diagnosis_code_2 = '1' and ….
And even if I typed all that out, how would I select which column had the highest count of values equal to 1? The table is column oriented instead of row oriented.
Unfortunately your data design is bad from the start. Instead it could be as simple as:
patient_id, visit_id, diagnosis_code
where a patient with 1 diagnostic code would have 1 row, and a patient with 100 diagnostic codes would have 100 rows. At any given time you could transpose this into the format you presented (this is called a pivot or cross tab). Also, in some databases, for example PostgreSQL, you could put all those diagnostic codes into an array field; then it would look like:
patient_id, visit_id, diagnosis_code (data type -bool or int- array)
Now you need the reverse of it, which is called unpivot. Some databases, such as SQL Server, provide an UNPIVOT operator for this.
Without knowing what your backend is, you could do that with an ugly SQL like:
select code, pdc
from
(
    -- one branch per diagnosis column; the columns hold the characters '0' or '1'
    select 1 as code, count(*) as pdc
    from diagnosis_info where patient_diagnosis_code_1 = '1'
    union all
    select 2 as code, count(*) as pdc
    from diagnosis_info where patient_diagnosis_code_2 = '1'
    union all
    ...
    select 100 as code, count(*) as pdc
    from diagnosis_info where patient_diagnosis_code_100 = '1'
) tmp
order by pdc desc, code;
PS: This returns all the codes with their frequency, ordered from most to least frequent. You could limit the result to 1 row to get the maximum (or use a variant that keeps ties, in case more than one code matches the maximum).
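Typing 100 branches by hand is tedious, so the statement can be generated instead. The sketch below is one way to do that, assuming SQLite through Python and the table and column names from the question. The helper name most_common_codes is hypothetical, and the demo uses 3 code columns with made-up rows in place of the real 100.

```python
import sqlite3

def most_common_codes(conn, n_codes):
    """Build one UNION ALL branch per code column and rank codes by frequency."""
    parts = [
        f"SELECT {i} AS code, COUNT(*) AS pdc FROM diagnosis_info "
        f"WHERE patient_diagnosis_code_{i} = '1'"
        for i in range(1, n_codes + 1)
    ]
    sql = " UNION ALL ".join(parts) + " ORDER BY pdc DESC, code"
    return conn.execute(sql).fetchall()

conn = sqlite3.connect(":memory:")
n = 3  # hypothetical: 3 code columns instead of 100
cols = ", ".join(f"patient_diagnosis_code_{i} CHAR(1)" for i in range(1, n + 1))
conn.execute(f"CREATE TABLE diagnosis_info (visit_id INTEGER, {cols})")
conn.executemany(
    "INSERT INTO diagnosis_info VALUES (?, ?, ?, ?)",
    [(1, "1", "0", "1"), (2, "0", "0", "1"), (3, "1", "0", "1")],
)

ranking = most_common_codes(conn, n)
print(ranking)
```

The first tuple in the result is the most common code together with its count. The column names are built from integers only, so no user input is interpolated into the SQL text.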

How can I use an input from another table in my query?

I'm creating a new table using PostgreSQL, but I need to get a parameter from another table as an input.
This is the table I have (I called table_1):
id column_1
1 100
2 100
3 100
4 100
5 100
I want to create a new table, but only using ids that are higher than the highest id from the table above (table_1). Something like this:
insert into table_new
select id, column_1 from table_old
where id > (max(id) from table_1)
How can I do this? I tried searching, but I got to several posts like https://community.powerbi.com/t5/Desktop/M-Query-Create-a-table-using-input-from-another-table/td-p/209923, Take one table as input and output using another table BigQuery and sql query needs input from another table, which are not exactly what I need.
Just use where id > (select max(id) from table_1).
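Put together, the full statement might look like the sketch below. It runs against SQLite through Python rather than PostgreSQL, and the contents of table_old are made up for illustration; the INSERT ... SELECT itself is standard SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_1 (id INTEGER, column_1 INTEGER);
CREATE TABLE table_old (id INTEGER, column_1 INTEGER);
CREATE TABLE table_new (id INTEGER, column_1 INTEGER);
INSERT INTO table_1 VALUES (1, 100), (2, 100), (3, 100), (4, 100), (5, 100);
-- Hypothetical contents for table_old.
INSERT INTO table_old VALUES (4, 100), (5, 100), (6, 200), (7, 300);

-- The scalar subquery supplies the highest id from table_1.
INSERT INTO table_new
SELECT id, column_1
FROM table_old
WHERE id > (SELECT MAX(id) FROM table_1);
""")

new_rows = conn.execute("SELECT id, column_1 FROM table_new ORDER BY id").fetchall()
print(new_rows)
```

If table_1 can be empty, MAX(id) is NULL and the comparison matches nothing; wrapping the subquery in COALESCE(..., 0) is one way to copy every row in that case.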

Query to find duplicate values for two fields

Sorry for the title, but I didn't know how to explain it.
I have a table that has 2 fields, A and B.
I want to find all rows in the table that have a duplicate A (more than one record), but A should count as a duplicate only if B differs between the rows.
Example:
Field A  Field B
10       10
10       10   // this is not a duplicate
10       10
10       5    // this is a duplicate
How do I do this in a single query?
Let's break this down into how you would go about constructing such a query. You don't make it clear whether you're looking for all values of A or all rows but let's assume all values of A initially.
The first step therefore is to create a list of all values of A. This can be done two ways, DISTINCT or GROUP BY. I'm going to use GROUP BY because of what else you want to do:
select a
from your_table
group by a
This returns a single column containing each distinct value of A. Now, how can you restrict this to the values you want? The most obvious tool is the HAVING clause, which lets you filter on aggregated values. For instance, the following gives you all values of A that appear only once in the table:
select a
from your_table
group by a
having count(*) = 1
That is, the number of rows in the group for that value of A is 1. You don't want this of course; you want to apply the condition to column B. There must be more than one value of B for the situation you want to identify to be possible (if there's only one value of B then it's impossible). This gets us to:
select a
from your_table
group by a
having count(b) > 1
This still isn't enough, as you want two different values of B. The above just counts the number of records in the group with a non-null B. Inside an aggregate function you can use the DISTINCT keyword to count only unique values, bringing us to:
select a
from your_table
group by a
having count(distinct b) > 1
To transcribe this into English: select all unique values of A from YOUR_TABLE that have more than one distinct value of B in the group.
You can use this method, or something similar, to build up your own queries as you create them. Determine what you want to achieve and slowly build up to it.
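The difference between the last two queries can be checked with a small runnable sketch. It assumes SQLite through Python and the sample data from the question, plus two hypothetical rows with a = 20 that repeat the same b. The final query, which returns the full rows rather than only the values of A, is an addition for the case where all rows are wanted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE your_table (a INTEGER, b INTEGER);
INSERT INTO your_table VALUES (10, 10), (10, 10), (10, 10), (10, 5);
-- Hypothetical extra rows: a repeated A whose B never differs.
INSERT INTO your_table VALUES (20, 7), (20, 7);
""")

# COUNT(b) counts rows, so a = 20 is reported even though its B never changes.
by_count = conn.execute(
    "SELECT a FROM your_table GROUP BY a HAVING COUNT(b) > 1 ORDER BY a"
).fetchall()

# COUNT(DISTINCT b) counts different values of B, which is what the question asks for.
by_distinct = conn.execute(
    "SELECT a FROM your_table GROUP BY a HAVING COUNT(DISTINCT b) > 1 ORDER BY a"
).fetchall()

# To get the full rows, filter the table by the qualifying values of A.
dup_rows = conn.execute("""
SELECT a, b
FROM your_table
WHERE a IN (SELECT a FROM your_table GROUP BY a HAVING COUNT(DISTINCT b) > 1)
ORDER BY a, b
""").fetchall()

print(by_count, by_distinct, dup_rows)
```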
select FIELD from your_table group by FIELD having count(b) > 1
Take into consideration that count(b) counts every row in the group, repeated values included.
Example: if you have the values
1
1
2
1
the count for value 1 is 3, not 2.