Row data to column data - pandas

I am new to Python. I have data that looks like this:
ID Annotation X Y
ID_1 first 767 942
ID_1 last 768 943
ID_2 first 769 944
ID_2 last 770 945
I want to create new columns that hold the X and Y values of the "first" and "last" annotation for each ID. My expected result:
ID X_first Y_first X_last Y_last
ID_1 767 942 768 943
ID_2 769 944 770 945
Thank you for your help.

I would use unstack for this pivot problem:
s=df.set_index(['ID','Annotation']).unstack()
s.columns=s.columns.map('_'.join) # columns flatten
s.reset_index(inplace=True)
s
Out[353]:
ID X_first X_last Y_first Y_last
0 ID_1 767 768 942 943
1 ID_2 769 770 944 945
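
The output above has the columns in X_first, X_last, Y_first, Y_last order, while the question asked for X_first, Y_first, X_last, Y_last. A minimal sketch of the same unstack approach followed by an explicit column selection might look like this. The DataFrame is rebuilt from the sample rows in the question, and the name wide is only illustrative:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": ["ID_1", "ID_1", "ID_2", "ID_2"],
    "Annotation": ["first", "last", "first", "last"],
    "X": [767, 768, 769, 770],
    "Y": [942, 943, 944, 945],
})

# Pivot the Annotation level into columns, as in the answer above.
wide = df.set_index(["ID", "Annotation"]).unstack()
# Flatten the (value, annotation) column pairs into names like "X_first".
wide.columns = wide.columns.map("_".join)
wide = wide.reset_index()

# Reorder the columns to match the layout requested in the question.
wide = wide[["ID", "X_first", "Y_first", "X_last", "Y_last"]]
print(wide)
```

Selecting the columns by name keeps the result independent of the order that unstack happens to produce.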

Related

How to remove duplicates based on condition?

Here is my sample table:
idmain idtime idperson1 idperson2
141 20220106 510 384
221 20220107 300 184
221 20220107 301 184
465 20220108 300 184
525 20220109 111 123
525 20220109 112 123
525 20220109 113 123
The duplicated records differ only by idperson1. I need to remove these duplicates, keeping only the record with the maximum value of idperson1. My final table should be:
idmain idtime idperson1 idperson2
141 20220106 510 384
221 20220107 301 184
465 20220108 300 184
525 20220109 113 123

First, use a correlated subquery to obtain the maximum idperson1 among the rows that share the same idtime and idperson2.
Then keep only the rows that match that maximum:
select a.* from fact1 a
where idperson1=(select max(b.idperson1) from fact1 b where a.idtime=b.idtime and a.idperson2=b.idperson2);
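
To check the query against the sample rows, here is a small sketch that runs it in an in-memory SQLite database through Python's sqlite3 module. SQLite is an assumption here (the question does not name the database), and the ORDER BY is added only to make the printed result stable:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fact1 (idmain INT, idtime INT, idperson1 INT, idperson2 INT)"
)

# Sample rows copied from the question.
rows = [
    (141, 20220106, 510, 384),
    (221, 20220107, 300, 184),
    (221, 20220107, 301, 184),
    (465, 20220108, 300, 184),
    (525, 20220109, 111, 123),
    (525, 20220109, 112, 123),
    (525, 20220109, 113, 123),
]
conn.executemany("INSERT INTO fact1 VALUES (?, ?, ?, ?)", rows)

# Keep only the row with the largest idperson1 per (idtime, idperson2).
kept = conn.execute(
    """
    SELECT a.* FROM fact1 a
    WHERE idperson1 = (SELECT MAX(b.idperson1) FROM fact1 b
                       WHERE a.idtime = b.idtime
                         AND a.idperson2 = b.idperson2)
    ORDER BY a.idmain
    """
).fetchall()
print(kept)
```

Note that the subquery groups by idtime and idperson2 only. If two different idmain values could share both, adding a.idmain = b.idmain to the subquery would keep them separate.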

How to remove unwanted values in data when reading csv file

When reading Pina_Indian_Diabities.csv, some of the values come in as strings, something like this:
+AC0-5.4128147485
734 2
735 4
736 0
737 8
738 +AC0-5.4128147485
739 1
740 NaN
741 3
742 1
743 9
744 13
745 12
746 1
747 1
As in row 738, there are such values in other rows and columns as well.
How can I drop them?
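
One possible approach, sketched with pandas under a few assumptions: the file is loaded into a DataFrame, every column is meant to be numeric, and a row with any unparseable value can be discarded. The inline CSV text and the column names a and b are hypothetical stand-ins for the real file:

```python
import io

import pandas as pd

# Hypothetical stand-in for the CSV file; column names are made up.
raw = io.StringIO(
    "a,b\n"
    "2,10\n"
    "+AC0-5.4128147485,11\n"
    "1,+AC0-0.5\n"
    "3,12\n"
)
df = pd.read_csv(raw)

# Convert every column to numbers; values that cannot be parsed become NaN.
numeric = df.apply(pd.to_numeric, errors="coerce")

# Drop every row that contains at least one NaN.
clean = numeric.dropna()
print(clean)
```

The +AC0- prefix looks like the UTF-7 encoding of a minus sign, so if the file was exported that way, replacing "+AC0-" with "-" before the conversion may recover those values instead of dropping them. That is a guess about how the file was produced, not something the question confirms.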

HSQLDB query to replace a null value with a value derived from another record

This is a small excerpt from a much larger table, call it LOG:
RN EID FID FRID TID TFAID
1 364 509 7045 null 7452
2 364 509 7045 7452 null
3 364 509 7045 7457 null
4 375 512 4525 5442 5241
5 375 513 4525 5863 5241
6 375 515 4525 2542 5241
7 576 621 5632 null 5452
8 576 621 5632 2595 null
9 672 622 5632 null 5966
10 672 622 5632 2635 null
I would like a query that replaces each null in the 'TFAID' column with the non-null 'TFAID' value from another row that has the same 'FID'.
Desired output would therefore be:
RN EID FID FRID TID TFAID
1 364 509 7045 null 7452
2 364 509 7045 7452 7452
3 364 509 7045 7457 7452
4 375 512 4525 5442 5241
5 375 513 4525 5863 5241
6 375 515 4525 2542 5241
7 576 621 5632 null 5452
8 576 621 5632 2595 5452
9 672 622 5632 null 5966
10 672 622 5632 2635 5966
I know that something like
SELECT RN,
EID,
FID,
FRID,
TID,
COALESCE(TFAID, {insert clever code here}) AS TFAID
FROM LOG
is what I need, but I can't for the life of me come up with the clever bit of SQL that will fill in the proper TFAID.

HSQLDB supports SQL features that can be used as alternatives. These features are not supported by some other databases.
CREATE TABLE LOG (RN INT, EID INT, FID INT, FRID INT, TID INT, TFAID INT);
-- using LATERAL
SELECT l.RN, l.EID, l.FID, l.FRID, l.TID,
COALESCE(l.TFAID, f.TFAID) AS TFAID
FROM LOG l, LATERAL (SELECT MAX(TFAID) AS TFAID FROM LOG f WHERE f.FID = l.FID) f;
-- using scalar subquery
SELECT l.RN, l.EID, l.FID, l.FRID, l.TID,
COALESCE(l.TFAID, (SELECT MAX(TFAID) AS TFAID FROM LOG f WHERE f.FID = l.FID)) AS TFAID
FROM LOG l;

Here is one approach. This aggregates the log to get the value and then joins the result in:
SELECT l.RN, l.EID, l.FID, l.FRID, l.TID,
COALESCE(l.TFAID, f.TFAID) AS TFAID
FROM LOG l join
(select fid, max(tfaid) as tfaid
from log
group by fid
) f
on l.fid = f.fid;
There may be other approaches that are more efficient. However, HSQL doesn't implement all SQL features.
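
As a quick check of the scalar-subquery and aggregate-join forms, the sketch below loads the sample rows into an in-memory SQLite database (an assumption made only so the example is runnable; the question targets HSQLDB) and compares both results. The LATERAL form is left out because SQLite does not support it:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE LOG (RN INT, EID INT, FID INT, FRID INT, TID INT, TFAID INT)"
)

# Sample rows copied from the question; None stands for SQL NULL.
rows = [
    (1, 364, 509, 7045, None, 7452),
    (2, 364, 509, 7045, 7452, None),
    (3, 364, 509, 7045, 7457, None),
    (4, 375, 512, 4525, 5442, 5241),
    (5, 375, 513, 4525, 5863, 5241),
    (6, 375, 515, 4525, 2542, 5241),
    (7, 576, 621, 5632, None, 5452),
    (8, 576, 621, 5632, 2595, None),
    (9, 672, 622, 5632, None, 5966),
    (10, 672, 622, 5632, 2635, None),
]
conn.executemany("INSERT INTO LOG VALUES (?, ?, ?, ?, ?, ?)", rows)

# Scalar-subquery form: look up the group's TFAID row by row.
by_subquery = conn.execute(
    """
    SELECT l.RN,
           COALESCE(l.TFAID,
                    (SELECT MAX(TFAID) FROM LOG f WHERE f.FID = l.FID)) AS TFAID
    FROM LOG l
    ORDER BY l.RN
    """
).fetchall()

# Aggregate-join form: compute one TFAID per FID, then join it back.
by_join = conn.execute(
    """
    SELECT l.RN, COALESCE(l.TFAID, f.TFAID) AS TFAID
    FROM LOG l
    JOIN (SELECT FID, MAX(TFAID) AS TFAID FROM LOG GROUP BY FID) f
      ON l.FID = f.FID
    ORDER BY l.RN
    """
).fetchall()

print(by_subquery)
print(by_subquery == by_join)
```

MAX ignores nulls, so it picks the one non-null TFAID in each FID group. If a group could hold several different non-null values, MAX would silently choose the largest.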

SQL Query: How to pull counts of two columns from respective tables

Given two tables:
1st Table Name: FACETS_Business_NPI_Provider
Buss_ID NPI Bussiness_Desc
11 222 Eleven 222
12 223 Twelve 223
13 224 Thirteen 224
14 225 Fourteen 225
11 226 Eleven 226
12 227 Tweleve 227
12 228 Tweleve 228
2nd Table : FACETS_PROVIDERs_Practitioners
NPI PRAC_NO PROV_NAME PRAC_NAME
222 943 P222 PR943
222 942 P222 PR942
223 931 P223 PR931
224 932 P224 PR932
224 933 P224 PR933
226 950 P226 PR950
227 951 P227 PR951
228 952 P228 PR952
228 953 P228 PR953
With the query below I get the following results, whereas the provider counts are expected to come from table FACETS_Business_NPI_Provider (i.e. 3 instead of 4 for Buss_ID 12, 2 instead of 3 for Buss_ID 11, etc.).
SELECT BP.Buss_ID,
COUNT(BP.NPI) PROVIDER_COUNT,
COUNT(PP.PRAC_NO) PRACTITIONER_COUNT
FROM FACETS_Business_NPI_Provider BP
LEFT JOIN FACETS_PROVIDERs_Practitioners PP
ON PP.NPI=BP.NPI
group by BP.Buss_ID
Buss_ID PROVIDER_COUNT PRACTITIONER_COUNT
11 3 3
12 4 4
13 2 2
14 1 0

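If I understood it correctly, you might want to add DISTINCT inside the COUNT calls, e.g. COUNT(DISTINCT BP.NPI), so that a provider repeated by the join is counted only once per Buss_ID.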
Here is an SQL Fiddle, which we can probably use to discuss further.
http://sqlfiddle.com/#!2/d9a0e6/3
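
A sketch of the DISTINCT variant, run against the sample data in an in-memory SQLite database (SQLite is an assumption made for the sake of a runnable example, and the ORDER BY is added only for stable output):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE FACETS_Business_NPI_Provider "
    "(Buss_ID INT, NPI INT, Bussiness_Desc TEXT)"
)
conn.execute(
    "CREATE TABLE FACETS_PROVIDERs_Practitioners "
    "(NPI INT, PRAC_NO INT, PROV_NAME TEXT, PRAC_NAME TEXT)"
)

# Sample rows copied from the question.
conn.executemany(
    "INSERT INTO FACETS_Business_NPI_Provider VALUES (?, ?, ?)",
    [
        (11, 222, "Eleven 222"), (12, 223, "Twelve 223"),
        (13, 224, "Thirteen 224"), (14, 225, "Fourteen 225"),
        (11, 226, "Eleven 226"), (12, 227, "Tweleve 227"),
        (12, 228, "Tweleve 228"),
    ],
)
conn.executemany(
    "INSERT INTO FACETS_PROVIDERs_Practitioners VALUES (?, ?, ?, ?)",
    [
        (222, 943, "P222", "PR943"), (222, 942, "P222", "PR942"),
        (223, 931, "P223", "PR931"), (224, 932, "P224", "PR932"),
        (224, 933, "P224", "PR933"), (226, 950, "P226", "PR950"),
        (227, 951, "P227", "PR951"), (228, 952, "P228", "PR952"),
        (228, 953, "P228", "PR953"),
    ],
)

# DISTINCT stops a provider from being counted once per matching practitioner.
counts = conn.execute(
    """
    SELECT BP.Buss_ID,
           COUNT(DISTINCT BP.NPI) AS PROVIDER_COUNT,
           COUNT(DISTINCT PP.PRAC_NO) AS PRACTITIONER_COUNT
    FROM FACETS_Business_NPI_Provider BP
    LEFT JOIN FACETS_PROVIDERs_Practitioners PP ON PP.NPI = BP.NPI
    GROUP BY BP.Buss_ID
    ORDER BY BP.Buss_ID
    """
).fetchall()
print(counts)
```

With the sample data this gives 2 providers for Buss_ID 11 and 3 for Buss_ID 12, which matches the counts the question expects.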

How to sort a sql result based on values in previous row?

I'm trying to sort a sql data selection by values in columns of the result set. The data looks like:
(This data is not sorted correctly, just an example)
ID projectID testName objectBefore objectAfter
=======================================================================================
13147 280 CDM-710 Generic TP-0000120 TOC~~#~~ -1 13148
1145 280 3.2 Quadrature/Carrier Null 25 Deg C 4940 1146
1146 280 3.2 Quadrature/Carrier Null 0 Deg C 1145 1147
1147 280 3.3 External Frequency Reference 1146 1148
1148 280 3.4 Phase Noise 50 Deg C 1147 1149
1149 280 3.4 Phase Noise 25 Deg C 1148 1150
1150 280 3.4 Phase Noise 0 Deg C 1149 1151
1151 280 3.5 Output Spurious 50 Deg C 1150 1152
1152 280 3.5 Output Spurious 25 Deg C 1151 1153
1153 280 3.5 Output Spurious 0 Deg C 1152 1154
............
18196 280 IP Regression Suite 18195 -1
The order of the data is based on the objectBefore and the objectAfter columns. The first row will always be when objectBefore = -1 and the last row will be when objectAfter = -1. In the above example, the second row would be ID 13148 as that is what row 1 objectAfter is equal to. Is there any way to write a query that would order the data in this manner?

This is actually sorting a linked list:
WITH SortedList (Id, objectBefore , projectID, testName, Level)
AS
(
SELECT Id, objectBefore , projectID, testName, 0 as Level
FROM YourTable
WHERE objectBefore = -1
UNION ALL
SELECT ll.Id, ll.objectBefore , ll.projectID, ll.testName, Level+1 as Level
FROM YourTable ll
INNER JOIN SortedList as s
ON ll.objectBefore = s.Id
)
SELECT Id, objectBefore , projectID, testName
FROM SortedList
ORDER BY Level
You can find more details in this post
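
To see the recursive CTE walk the list, here is a small sketch using an in-memory SQLite database. SQLite and its WITH RECURSIVE spelling are assumptions, and the three rows are hypothetical (their IDs are deliberately not in list order), not taken from the question's data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE YourTable "
    "(Id INT, projectID INT, testName TEXT, objectBefore INT, objectAfter INT)"
)

# Hypothetical linked list 7 -> 3 -> 5, inserted out of order.
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?, ?, ?)",
    [
        (5, 280, "third", 3, -1),
        (7, 280, "first", -1, 3),
        (3, 280, "second", 7, 5),
    ],
)

ordered = conn.execute(
    """
    WITH RECURSIVE SortedList (Id, objectBefore, projectID, testName, Level) AS (
        -- Anchor: the head of the list has no predecessor.
        SELECT Id, objectBefore, projectID, testName, 0
        FROM YourTable
        WHERE objectBefore = -1
        UNION ALL
        -- Step: the next row is the one whose objectBefore is the current Id.
        SELECT ll.Id, ll.objectBefore, ll.projectID, ll.testName, s.Level + 1
        FROM YourTable ll
        INNER JOIN SortedList s ON ll.objectBefore = s.Id
    )
    SELECT Id, testName FROM SortedList ORDER BY Level
    """
).fetchall()
print(ordered)
```

Each recursion step adds one row, so a very long list may run into the recursion limit that some databases impose on recursive CTEs.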
