Data reorganization - SQL

I have a question about reorganizing data in SQL in the manner shown below.
I have a table with the following structure:
Username  Type  Data
test      1     Data1
test      1     Data2
test      1     Data3
test      2     Data1
test      2     Data2
test      2     Data3
Using a query, how can I get the view shown below? How can I turn the values of the last column into additional columns for those rows where the first and second columns are the same?
Username  Type  Data1  Data2  Data3
test      1     Data1  Data2  Data3
test      2     Data1  Data2  Data3

Assuming you know in advance that each output record needs exactly three values, you can use the ROW_NUMBER window function to number the rows within each <Username, Type> partition, then use conditional aggregation with CASE expressions to place each Data value in its own column. Finally, aggregate per <Username, Type> partition:
WITH cte AS (
    SELECT tab.*,
           ROW_NUMBER() OVER(PARTITION BY UserName, Type ORDER BY Data) AS rn
    FROM tab
)
SELECT Username, Type,
       MAX(CASE WHEN rn = 1 THEN Data END) AS Data1,
       MAX(CASE WHEN rn = 2 THEN Data END) AS Data2,
       MAX(CASE WHEN rn = 3 THEN Data END) AS Data3
FROM cte
GROUP BY Username, Type
This should work on most common, up-to-date DBMSs.
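To try the query above without a database server, here is a minimal sketch that runs it against an in-memory SQLite database from Python. The choice of SQLite (which needs version 3.25 or later for window functions) is an assumption for illustration, as is the added ORDER BY, which only makes the output order deterministic.

```python
import sqlite3

# In-memory table mirroring the sample data from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (Username TEXT, Type INTEGER, Data TEXT)")
conn.executemany(
    "INSERT INTO tab VALUES (?, ?, ?)",
    [("test", t, d) for t in (1, 2) for d in ("Data1", "Data2", "Data3")],
)

query = """
WITH cte AS (
    SELECT tab.*,
           ROW_NUMBER() OVER (PARTITION BY Username, Type ORDER BY Data) AS rn
    FROM tab
)
SELECT Username, Type,
       MAX(CASE WHEN rn = 1 THEN Data END) AS Data1,
       MAX(CASE WHEN rn = 2 THEN Data END) AS Data2,
       MAX(CASE WHEN rn = 3 THEN Data END) AS Data3
FROM cte
GROUP BY Username, Type
ORDER BY Username, Type  -- added only for a stable output order
"""

pivoted = conn.execute(query).fetchall()
for row in pivoted:
    print(row)
```

Each (Username, Type) pair comes back as a single row with its three Data values spread across Data1 to Data3. A group with fewer than three rows would get NULL in the unused columns, and a fourth value would be silently dropped.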

Related

sql how to assign the same ID for the same group

I have a dataset as this:
ID SESSION DATE
1 A 2021/1/1
1 A 2021/1/2
1 B 2021/1/3
1 B 2021/1/4
1 A 2021/1/5
1 A 2021/1/6
What I want to create is a GROUP column that assigns the same number to consecutive rows where the ID and SESSION columns are the same, as shown below:
ID SESSION DATE GROUP
1 A 2021/1/1 1
1 A 2021/1/2 1
1 B 2021/1/3 2
1 B 2021/1/4 2
1 A 2021/1/5 3
1 A 2021/1/6 3
Does anyone know how to do this in SQL in an efficient way because I have about 5 billion rows? Thank you in advance!

You have a kind of gaps-and-islands problem. You can create your groupings by counting each time the session changes, using LAG, like so:
select Id, Session, Date,
       Sum(case when session = prevSession then 0 else 1 end)
           over(partition by Id order by date) "Group"
from (
    select *,
           Lag(Session) over(partition by Id order by date) prevSession
    from t
) t;
The example Fiddle uses MySQL, but this is ANSI SQL that should work in most DBMSs.
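As a rough check of the LAG-plus-running-sum idea above, here is a small sketch against an in-memory SQLite database. The column names dt and grp and the subquery alias flagged are assumptions chosen to avoid keyword clashes, and the dates are rewritten in ISO format so that text ordering matches chronological order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, session TEXT, dt TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (1, "A", "2021-01-01"),
        (1, "A", "2021-01-02"),
        (1, "B", "2021-01-03"),
        (1, "B", "2021-01-04"),
        (1, "A", "2021-01-05"),
        (1, "A", "2021-01-06"),
    ],
)

query = """
SELECT id, session, dt,
       -- running count of session changes = island number
       SUM(CASE WHEN session = prev_session THEN 0 ELSE 1 END)
           OVER (PARTITION BY id ORDER BY dt) AS grp
FROM (
    SELECT t.*,
           LAG(session) OVER (PARTITION BY id ORDER BY dt) AS prev_session
    FROM t
) AS flagged
ORDER BY id, dt
"""

rows = conn.execute(query).fetchall()
groups = [row[3] for row in rows]
print(groups)
```

The first row of each id has no previous session, so the comparison yields NULL and the CASE falls through to 1, which starts the numbering at 1. Note that the second run of session A gets its own group number, which is what distinguishes this from a plain GROUP BY on (id, session).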

BigQuery - Populating SELECT fields from results of a temp function

In Google BigQuery, I have a query that has the same field name appearing multiple times in various join subqueries. I would like to abstract out this field name into a temporary function such that it will amend it in all places if I change it within the function only.
This is the query I have:
SELECT *
FROM
  (SELECT field1, COUNT(*) sq1_total
   FROM table
   WHERE condition = 1
   GROUP BY field1) sq1
LEFT JOIN
  (SELECT field1, COUNT(*) sq2_total
   FROM table
   WHERE condition = 0
   GROUP BY field1) sq2
USING(field1)
This is what I would like to have:
CREATE TEMP FUNCTION replace_field_name() AS (...);
SELECT *
FROM
  (SELECT replace_field_name(), COUNT(*) sq1_total
   FROM table
   WHERE condition = 1
   GROUP BY replace_field_name()) sq1
LEFT JOIN
  (SELECT replace_field_name(), COUNT(*) sq2_total
   FROM table
   WHERE condition = 0
   GROUP BY replace_field_name()) sq2
USING(replace_field_name())
So that when I want to compare many different fields like this, I only need to change the field name in one place as opposed to five places.
Is this possible?

The thoughts and proposals below apply to BigQuery Standard SQL.
"I would like to abstract out this field name into a temporary function ..."
As Tim mentioned in his comment, this is not really possible in the way you have sketched it.
"I want to compare many different fields like this, I only need to change the field name in one place as opposed to five places."
You can try to rewrite your query so that the field name needs to change in fewer places, as in the examples below:
#standardSQL
SELECT * FROM (SELECT field1, COUNT(*) sq1_total FROM `project.dataset.table` WHERE condition = 1 GROUP BY 1) sq1
LEFT JOIN (SELECT field1, COUNT(*) sq2_total FROM `project.dataset.table` WHERE condition = 0 GROUP BY 1) sq2
USING (field1)
OR
#standardSQL
SELECT DISTINCT field1,
COUNTIF(condition = 1) OVER(PARTITION BY field1) sq1_total,
COUNTIF(condition = 0) OVER(PARTITION BY field1) sq2_total
FROM `project.dataset.table`
In both of the queries above, there are "just" three places where the field name has to be replaced (as opposed to five in the original query).
Obviously, this does not address the problem qualitatively, only quantitatively.
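The second rewrite above can be tried outside BigQuery with a small sketch like the following, run on an in-memory SQLite database. SQLite has no COUNTIF, so SUM(CASE ... END) is swapped in as a stand-in. The table name tbl and the rows are made up so that the counts match the dummy output shown in this answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (field1 INTEGER, condition INTEGER)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?)",
    [(1, 1), (1, 0), (1, 0), (1, 0), (2, 1)],  # made-up sample rows
)

query = """
SELECT DISTINCT field1,
       -- SUM(CASE ...) stands in for BigQuery's COUNTIF(...)
       SUM(CASE WHEN condition = 1 THEN 1 ELSE 0 END)
           OVER (PARTITION BY field1) AS sq1_total,
       SUM(CASE WHEN condition = 0 THEN 1 ELSE 0 END)
           OVER (PARTITION BY field1) AS sq2_total
FROM tbl
ORDER BY field1
"""

totals = conn.execute(query).fetchall()
print(totals)
```

Unlike the LEFT JOIN version, a field value with no rows for condition = 0 shows a count of 0 here rather than NULL.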
"Is this possible?"
The good news is that there is always a workaround, but it usually requires slightly changing your requirements or expectations.
For example, in the solution below you need to set the field name only once, in the UNNEST(['field1']) field line:
#standardSQL
SELECT DISTINCT field, value,
  COUNTIF(condition = 1) OVER(PARTITION BY field, value) sq1_total,
  COUNTIF(condition = 0) OVER(PARTITION BY field, value) sq2_total
FROM (
  SELECT field, REGEXP_EXTRACT(x, CONCAT(r'"', field, '":"?([^",]*)"?')) value, condition
  FROM `project.dataset.table` t,
  UNNEST([TO_JSON_STRING(t)]) x,
  UNNEST(['field1']) field
)
The "price" is that the output takes the following form (shown with dummy data):
Row  field   value  sq1_total  sq2_total
1    field1  1      1          3
2    field1  2      1          0
instead of the output of the original query:
Row  field1  sq1_total  sq2_total
1    1       1          3
2    2       1          null
"I want to compare many different fields like this ..."
The added value of the approach above is that you can run the comparison for as many fields as you want in one shot, by adding the field names you need to the UNNEST(['field1']) field list, as in the example below:
#standardSQL
SELECT DISTINCT field, value,
  COUNTIF(condition = 1) OVER(PARTITION BY field, value) sq1_total,
  COUNTIF(condition = 0) OVER(PARTITION BY field, value) sq2_total
FROM (
  SELECT field, REGEXP_EXTRACT(x, CONCAT(r'"', field, '":"?([^",]*)"?')) value, condition
  FROM `project.dataset.table` t,
  UNNEST([TO_JSON_STRING(t)]) x,
  UNNEST(['field1', 'field2']) field
)
-- ORDER BY field, value
so the result could look like:
Row field value sq1_total sq2_total
1 field1 1 1 3
2 field1 2 1 0
3 field2 1 1 1
4 field2 2 0 2
5 field2 3 1 0

Select top N columns based on standardized values

I've got a bit of a googly question. Is it possible to select, say, 10 columns based on the values in each column, if all the values are standardized?
For example:
cluster Id | v1  | v2  | v3  | v4   | v6  | v26
___________________________________________
1          | 4.2 | 0.9 | 0.5 | 3.2  | 0.7 | 0.5
2          | 1.2 | 0.1 | 0.9 | 0.21 | 0.3 | 0.1
In this example, if I wanted the top three columns for cluster 1, I'd have:
cluster ID | v1  | v4  | v2
1          | 4.2 | 3.2 | 0.9
I'm open to any suggestions. At the moment I'm using Oracle SQL, but I'm willing to switch if there's a solution on a different platform and it's impossible using SQL.
Edit: I've added an image which shows the feature I'm trying to replicate in SQL Developer. The fetch size is the number of variables/attributes, and there must be some table sitting behind the model that's being queried when I change the fetch size. That is the statement I'm trying to reproduce.
Thank you.

If you want the top three values, I would unpivot the data and reaggregate. Oracle 12c has some useful functionality for this; for earlier versions I would just use more traditional SQL methods.
It is unclear whether you want the column names or the values. The following does both:
select id,
       max(case when seqnum = 1 then v end) as v_1,
       max(case when seqnum = 2 then v end) as v_2,
       max(case when seqnum = 3 then v end) as v_3,
       max(case when seqnum = 1 then which end) as which_1,
       max(case when seqnum = 2 then which end) as which_2,
       max(case when seqnum = 3 then which end) as which_3
from (select id, v, which, row_number() over (partition by id order by v desc) as seqnum
      from ((select id, v1 as v, 'v1' as which from t) union all
            (select id, v2 as v, 'v2' as which from t) union all
            (select id, v3 as v, 'v3' as which from t) union all
            (select id, v4 as v, 'v4' as which from t) union all
            (select id, v5 as v, 'v5' as which from t)
           ) t
     ) t
group by id;
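The unpivot-and-reaggregate query above can be exercised with a sketch like this one, written for an in-memory SQLite database rather than Oracle. It is trimmed to four value columns (v1 to v4) using the numbers from the question. SQLite does not accept parentheses around the individual UNION ALL branches, so they are dropped, and the aliases unpivoted and ranked are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, v1 REAL, v2 REAL, v3 REAL, v4 REAL)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?, ?)",
    [(1, 4.2, 0.9, 0.5, 3.2), (2, 1.2, 0.1, 0.9, 0.21)],
)

query = """
SELECT id,
       MAX(CASE WHEN seqnum = 1 THEN which END) AS which_1,
       MAX(CASE WHEN seqnum = 2 THEN which END) AS which_2,
       MAX(CASE WHEN seqnum = 3 THEN which END) AS which_3,
       MAX(CASE WHEN seqnum = 1 THEN v END) AS v_1,
       MAX(CASE WHEN seqnum = 2 THEN v END) AS v_2,
       MAX(CASE WHEN seqnum = 3 THEN v END) AS v_3
FROM (
    SELECT id, v, which,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY v DESC) AS seqnum
    FROM (
        -- unpivot: one row per (id, column name, value)
        SELECT id, v1 AS v, 'v1' AS which FROM t UNION ALL
        SELECT id, v2, 'v2' FROM t UNION ALL
        SELECT id, v3, 'v3' FROM t UNION ALL
        SELECT id, v4, 'v4' FROM t
    ) AS unpivoted
) AS ranked
GROUP BY id
ORDER BY id
"""

top3 = conn.execute(query).fetchall()
for row in top3:
    print(row)
```

For cluster 1 this picks v1, v4 and v2, matching the result the question asks for. Ties between equal values are broken arbitrarily by ROW_NUMBER unless a secondary sort key is added.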

In the end, the approach I took was to go through all the Oracle Data Miner tables created during the clustering of my dataset. One of them, table DM$PTCLUS_K_M_1_2, contained a pivot table with all the clusters, values, variable IDs and names. It is recreated here using my example:
cluster_id | variable_id | value | variable_name
1          | 1           | 4.2   | v1
By using a nested select statement with a where clause (on cluster_id) and ordering by value, I could then pick out the top 10 variables and their values for each cluster:
select * from
  (select * from DM$PTCLUS_K_M_1_2
   where cluster_id = 1
   order by value desc)
where rownum < 11
For those with a similar problem who want to get cluster centroids or values, I suggest looking at the Data Miner schema and checking the tables there. A few of them will contain the data you need.

Advanced SQL Select and Union Statements

I've seen other similar questions and have tried implementing many solutions, but to no avail so far. This specific question involves a little more complexity. What I need to do is create a table and join columns to the right side depending on certain criteria. It seems simple enough, but there are a few bumps that I am encountering.
The tables are as follows:
ADC_DATA_COLLECTION_HEADER
(PK)Transaction_ID | BEMSID | DEVICE | TIMESTAMP | CONFIG_NAME
ADC_DATA_COLLECTION_APPS
(FK)CONFIG_NAME | NUM_DATA_ELEMENTS | DATA_ELEMENT1 | DATA_ELEMENT2 | DATA_ELEMENT3 | DATA_ELEMENT4
ADC_DATA_COLLECTION_DATA
(FK)TRANSACTION_ID | DATA_ELEMENT_NUMBER | DATA
I want my final output to look like:
TRANSACTION_ID | DEVICE | CONFIG_NAME | DATA | DATA | DATA | DATA
The "data" column is filled in using the table ADC_DATA_COLLECTION_DATA. The first instance of "data" would be the "data" field in ADC_DATA_COLLECTION_DATA where DATA_ELEMENT_NUMBER = 1. The second instance of "data" would be the "data" field in ADC_DATA_COLLECTION_DATA where DATA_ELEMENT_NUMBER = 2... And so on.
The furthest I have gotten is by using a join statement, except I have nulls in places I do not want them. The code I have used and the results are posted below. So far I only wrote code for the first two columns of data.
SELECT
    ADC_Data_Collection_header.BEMSID,
    ADC_Data_Collection_header.DEVICE,
    ADC_Data_Collection_header.CONFIG_NAME,
    null AS locationlabel,
    null AS partno
    /*null AS partno2,
    null AS DE4,
    null AS DE5,
    null AS DE6 */
FROM
    ADC_Data_Collection_header,
    ADC_Data_Collection_apps,
    ADC_Data_Collection_data
WHERE
    ADC_Data_Collection_header.CONFIG_NAME = 'mobileScanning'
    AND ADC_Data_Collection_header.BEMSID = '2386531'
    AND ADC_Data_Collection_header.CONFIG_NAME = ADC_Data_Collection_apps.CONFIG_NAME
    AND (TO_DATE('7/19/2013','MM/DD/YYYY') <= timestamp AND TO_DATE('7/27/2013','MM/DD/YYYY') >= timestamp)
    AND ADC_DATA_COLLECTION_HEADER.transaction_ID = ADC_DATA_COLLECTION_DATA.Transaction_ID
UNION
SELECT
    null as BEMSID,
    null as DEVICE,
    null as CONFIG_NAME,
    ADC_Data_Collection_DATA.DATA AS locationlabel,
    null as partno
FROM
    ADC_DATA_COLLECTION_DATA,
    ADC_Data_Collection_header,
    ADC_Data_Collection_apps
WHERE
    ADC_DATA_COLLECTION_DATA.DATA_ELEMENT_NUMBER = 3
    AND ADC_Data_Collection_header.CONFIG_NAME = 'mobileScanning'
    AND (TO_DATE('7/19/2013','MM/DD/YYYY') <= timestamp AND TO_DATE('7/27/2013','MM/DD/YYYY') >= timestamp)
    AND ADC_DATA_COLLECTION_HEADER.transaction_ID = ADC_DATA_COLLECTION_DATA.Transaction_ID
UNION
SELECT
    null as BEMSID,
    null as DEVICE,
    null as CONFIG_NAME,
    null as locationlabel,
    ADC_Data_Collection_DATA.DATA AS partno
FROM
    ADC_DATA_COLLECTION_DATA,
    ADC_Data_Collection_header,
    ADC_Data_Collection_apps
WHERE
    ADC_DATA_COLLECTION_DATA.DATA_ELEMENT_NUMBER = 4
    AND ADC_Data_Collection_header.CONFIG_NAME = 'mobileScanning'
    AND (TO_DATE('7/19/2013','MM/DD/YYYY') <= timestamp AND TO_DATE('7/27/2013','MM/DD/YYYY') >= timestamp)
    AND ADC_DATA_COLLECTION_HEADER.transaction_ID = ADC_DATA_COLLECTION_DATA.Transaction_ID
The result from this appears with null values which I do not want to have.
If you can offer an explicit solution using a join statement or a fix to this union approach, it would be much appreciated. Thank you in advance!

UNION gives you additional rows so it's not the right tool for this situation.
Here's an abbreviated version that uses your ADC_DATA_COLLECTION_DATA table only; you should be able to incorporate this into your query:
SELECT
    Transaction_ID,
    MAX(CASE WHEN Data_Element_Number = 1 THEN Data END) AS Data1,
    MAX(CASE WHEN Data_Element_Number = 2 THEN Data END) AS Data2,
    MAX(CASE WHEN Data_Element_Number = 3 THEN Data END) AS Data3,
    MAX(CASE WHEN Data_Element_Number = 4 THEN Data END) AS Data4
FROM ADC_DATA_COLLECTION_DATA
GROUP BY Transaction_ID
This is a fairly common "Pivot Table" hack for Oracle (and MySQL and SQL Server). Oracle also supports PIVOT queries but I'm not that good with them.
Note that once you put your final query together with the Device and Config_Name columns, you'll need to add those columns to your GROUP BY.
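To illustrate the note about extending the GROUP BY, here is a sketch that joins a header table to the data table and pivots in one query, run on an in-memory SQLite database. The shortened table names adc_header and adc_data and all sample values are made up for the example. Only the column names follow the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE adc_header (transaction_id INTEGER PRIMARY KEY, device TEXT, config_name TEXT);
CREATE TABLE adc_data (transaction_id INTEGER, data_element_number INTEGER, data TEXT);
-- made-up sample rows
INSERT INTO adc_header VALUES (1, 'scanner-a', 'mobileScanning'),
                              (2, 'scanner-b', 'mobileScanning');
INSERT INTO adc_data VALUES (1, 1, 'loc-1'), (1, 2, 'part-7'),
                            (2, 1, 'loc-2'), (2, 3, 'part-9');
""")

query = """
SELECT h.transaction_id, h.device, h.config_name,
       MAX(CASE WHEN d.data_element_number = 1 THEN d.data END) AS data1,
       MAX(CASE WHEN d.data_element_number = 2 THEN d.data END) AS data2,
       MAX(CASE WHEN d.data_element_number = 3 THEN d.data END) AS data3,
       MAX(CASE WHEN d.data_element_number = 4 THEN d.data END) AS data4
FROM adc_header AS h
JOIN adc_data AS d ON d.transaction_id = h.transaction_id
-- every non-aggregated column in the SELECT list also goes in the GROUP BY
GROUP BY h.transaction_id, h.device, h.config_name
ORDER BY h.transaction_id
"""

wide = conn.execute(query).fetchall()
for row in wide:
    print(row)
```

Each transaction ends up on one row. NULLs now appear only where a transaction genuinely has no value for that element number, rather than on separate rows as with the UNION approach.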

I would use pivot for this:
select
    h.transaction_id,
    h.device,
    h.config_name,
    d.data1,
    d.data2,
    d.data3,
    d.data4
from
    ADC_DATA_COLLECTION_HEADER h
    inner join (
        select *
        from ADC_DATA_COLLECTION_DATA
        pivot
        (
            max(data)
            for data_element_number in (1 as data1, 2 as data2, 3 as data3, 4 as data4)
        )
    ) d
    on d.transaction_id = h.transaction_id
where
    (TO_DATE('7/19/2013','MM/DD/YYYY') <= timestamp AND TO_DATE('7/27/2013','MM/DD/YYYY') >= timestamp);
I put together an example SQL Fiddle at: http://www.sqlfiddle.com/#!4/fe1c94/9/0

Sort data row in sql

Please help me. I have columns from more than one table, and the data type of all these columns is integer.
I want to sort the data within each row (not the columns, so not ORDER BY), except for the primary key column.
For example:
column1(pk) column2 column3 column4 column5
1 6 5 3 1
2 10 2 3 1
3 1 2 4 3
How do I get this result?
column1(pk) column2 column3 column4 column5
1 1 3 5 6
2 1 2 3 10
3 1 2 3 4
Please help me quickly. Is it possible or impossible?
If it is impossible, how could I get a similar result regardless of sorting?

What database are you using? The capabilities of the database are important. Second, this suggests a data structure issue. Things that need to be sorted would normally be separate entities . . . that is, separate rows in a table. The rest of this post answers the question.
If the database supports pivot/unpivot you can do the following:
(1) Unpivot the data to get it in the format <id>, <column name>, <value>.
(2) Use row_number() to assign a new column, based on the ordering of the values.
(3) Use the row_number() to create a varchar new column name.
(4) Pivot the data again using the new column.
You can do something similar if this functionality is not available.
First, change the data to rows:
(select id, 'col1', col1 as val from t) union all
(select id, 'col2', col2 from t) union all
. . .
Call this byrow. The following query appends a row number:
select br.*, row_number() over (partition by id order by val) as seqnum
from byrow br
Put this into a subquery and pivot the data back. The final solution looks like:
with byrow as (<the big union all query>)
select id,
       max(case when seqnum = 1 then val end) as col1,
       max(case when seqnum = 2 then val end) as col2,
       ...
from (select br.*, row_number() over (partition by id order by val) as seqnum
      from byrow br
     ) br
group by id
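Here is a sketch of the whole approach with the union written out, using the sample rows from the question and an in-memory SQLite database. The table name t, the column names id and col2 to col5, and the alias ranked are assumptions for illustration. The source column name is left out of byrow because it is not needed once the values are re-ranked.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER PRIMARY KEY, "
    "col2 INTEGER, col3 INTEGER, col4 INTEGER, col5 INTEGER)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?, ?)",
    [(1, 6, 5, 3, 1), (2, 10, 2, 3, 1), (3, 1, 2, 4, 3)],
)

query = """
WITH byrow AS (
    -- one row per (id, value)
    SELECT id, col2 AS val FROM t UNION ALL
    SELECT id, col3 FROM t UNION ALL
    SELECT id, col4 FROM t UNION ALL
    SELECT id, col5 FROM t
)
SELECT id,
       MAX(CASE WHEN seqnum = 1 THEN val END) AS col2,
       MAX(CASE WHEN seqnum = 2 THEN val END) AS col3,
       MAX(CASE WHEN seqnum = 3 THEN val END) AS col4,
       MAX(CASE WHEN seqnum = 4 THEN val END) AS col5
FROM (
    SELECT br.*, ROW_NUMBER() OVER (PARTITION BY id ORDER BY val) AS seqnum
    FROM byrow AS br
) AS ranked
GROUP BY id
ORDER BY id
"""

sorted_rows = conn.execute(query).fetchall()
for row in sorted_rows:
    print(row)
```

The output matches the desired result in the question: each row keeps its primary key while its remaining values are rearranged in ascending order.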

You can use the PIVOT function of SQL Server to convert rows into columns. Apply the sorting there, and then convert the columns back to rows using UNPIVOT.

Here is a good example using PIVOT. You should be able to adapt it to meet your needs:
http://blogs.msdn.com/b/spike/archive/2009/03/03/pivot-tables-in-sql-server-a-simple-sample.aspx
