Bring data frame rows in same order as second data frame - pandas

I have two data frames:
features:
id feat1 feat2 feat3
r0 a1 1 1 1
r1 b3 4 NA NA
r2 q7 8 1 1
labels:
id target
r0 b3 1
r1 q7 0
r2 a1 1
Both contain the same ids, but in a different order. How can I bring either of them into the order of the other? Is there a way to do this without sorting both of them alphabetically, so that I only need to alter one of them?
Thanks a lot for any comments!
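One possible approach, as a minimal sketch: index features by id and pull the rows out in the order that labels uses. The two frames below are rebuilt by hand from the tables above, and the name aligned is only illustrative. This assumes every id is unique and appears in both frames.

```python
import numpy as np
import pandas as pd

# The two frames from the question, rebuilt by hand for illustration.
features = pd.DataFrame(
    {
        "id": ["a1", "b3", "q7"],
        "feat1": [1, 4, 8],
        "feat2": [1, np.nan, 1],
        "feat3": [1, np.nan, 1],
    },
    index=["r0", "r1", "r2"],
)
labels = pd.DataFrame(
    {"id": ["b3", "q7", "a1"], "target": [1, 0, 1]},
    index=["r0", "r1", "r2"],
)

# Index features by id, select the rows in the order labels uses,
# then turn id back into an ordinary column.
aligned = features.set_index("id").reindex(labels["id"].tolist()).reset_index()

# Reuse the row labels of `labels` so both frames line up row by row.
aligned.index = labels.index

print(aligned)
```

Only features is touched here; labels keeps its original order, and nothing is sorted alphabetically.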

Create a single dimension from multiple fields

Essentially, I want to create a single dimension from the respective column/field names, say a Listbox called Asset, so that I can make selections on laptop, desktop, server, and tablet.
Many thanks.
What you'd want is the CrossTable prefix (see the Qlik Help here). It allows you to "pivot" a table so that 1 record with multiple columns becomes multiple records with just 2 columns (1 column for field name, 1 column for field value).
So given your table, which we'll call [Data]:
Code  # of laptop  # of desktop  # of servers  # of tablet
d1    0            1             0             1
a2    23           3             0             0
a3    12           5             0             0
f1    0            14            0             0
e3    0            12            0             0
z2    0            5             1             0
...you can use the following Qlik script in the Data Load Editor to get the desired output:
[Pivoted]:
CrossTable ([Device], [DeviceCount], 1)
Load * Resident [Data];
Drop Table [Data];
[New Data]:
NoConcatenate Load
    [Code]
  , Capitalize(SubField([Device], ' ', -1)) as [Device]
  , [DeviceCount]
Resident [Pivoted];
Drop Table [Pivoted];
That should give you this result:
Code  Device   DeviceCount
a2    Desktop  3
a2    Laptop   23
a2    Servers  0
a2    Tablet   0
a3    Desktop  5
a3    Laptop   12
a3    Servers  0
a3    Tablet   0
d1    Desktop  1
d1    Laptop   0
d1    Servers  0
d1    Tablet   1
e3    Desktop  12
e3    Laptop   0
e3    Servers  0
e3    Tablet   0
f1    Desktop  14
f1    Laptop   0
f1    Servers  0
f1    Tablet   0
z2    Desktop  5
z2    Laptop   0
z2    Servers  1
z2    Tablet   0
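To make the reshaping itself easier to follow outside Qlik, here is a rough Python sketch of the same wide-to-long step. It is not Qlik code and not how CrossTable is implemented; the function name unpivot and the two sample rows are just illustrative assumptions.

```python
# Wide rows as in [Data]: one record per Code, one column per device type.
data = [
    {"Code": "d1", "# of laptop": 0, "# of desktop": 1, "# of servers": 0, "# of tablet": 1},
    {"Code": "a2", "# of laptop": 23, "# of desktop": 3, "# of servers": 0, "# of tablet": 0},
]


def unpivot(rows, key="Code"):
    """Turn each wide row into one (key, Device, DeviceCount) record per column."""
    out = []
    for row in rows:
        for column, value in row.items():
            if column == key:
                continue  # the key column is repeated on every output record
            # Mirrors Capitalize(SubField([Device], ' ', -1)): last word, capitalised.
            device = column.split(" ")[-1].capitalize()
            out.append({key: row[key], "Device": device, "DeviceCount": value})
    return out


pivoted = unpivot(data)
for record in pivoted:
    print(record)
```

Each input row with four device columns becomes four output records, which is what lets a single Device field drive the Listbox.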

SQL find MIN and MAX range value for a subset of columns in SCD [duplicate]

This question already has answers here:
Find min and max for subsets of consecutive rows - gaps and islands
Group similar objects in different date ranges to get min and max dates in SQL Server
I have implemented CDC (SCD Type 2) on a customer data set.
CDC is implemented on a large number of columns, but the requirement is to track behavior for only a subset of those columns.
In the input table below, ID is the customer column, RATE_CODE is one of the CDC fields, and START_DATE and END_DATE are the dates of the CDC changes.
I want to know how the customer data (RATE_CODE) changes over time.
For example, rows 1-3 have the same RATE_CODE, so I need min(START_DATE) from row 1 and max(END_DATE) from row 3.
I tried grouping by (ID, RATE_CODE) and taking min and max of the dates, but that gives the wrong result: the max value is then picked from row 9, whereas I want a separate entry for rows 6-9.
INPUT TABLE
ROW NUMBER  ID  RATE_CODE  START_DATE  END_DATE
1           1   A1         01-01-2021  18-01-2021
2           1   A1         18-01-2021  25-02-2021
3           1   A1         25-02-2021  15-03-2021
4           1   A2         15-03-2021  28-03-2021
5           1   A2         28-03-2021  28-05-2021
6           1   A1         28-05-2021  28-06-2021
7           1   A1         28-06-2021  12-07-2021
8           1   A1         20-07-2021  28-07-2021
9           1   A1         28-08-2021  13-09-2021
10          1   A2         13-09-2021  13-10-2021
EXPECTED OUTPUT
ID  RATE_CODE  START_DATE  END_DATE
1   A1         01-01-2021  15-03-2021
1   A2         15-03-2021  28-05-2021
1   A1         28-05-2021  13-09-2021
1   A2         13-09-2021  13-10-2021
There may already be articles or answers about this online, but because of how I framed the question I couldn't find them.
I would like a solution in SQL, but PySpark and other languages are also welcome for the benefit of the community.
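Since other languages are welcome, here is a minimal Python sketch of the grouping logic, not the SQL itself. It treats each run of consecutive rows with the same (ID, RATE_CODE) as one group, which is the behavior the expected output implies (rows 6-9 are merged even though their dates are not contiguous). The function names parse and collapse are only illustrative.

```python
from datetime import datetime
from itertools import groupby

# (ID, RATE_CODE, START_DATE, END_DATE) copied from the input table.
rows = [
    (1, "A1", "01-01-2021", "18-01-2021"),
    (1, "A1", "18-01-2021", "25-02-2021"),
    (1, "A1", "25-02-2021", "15-03-2021"),
    (1, "A2", "15-03-2021", "28-03-2021"),
    (1, "A2", "28-03-2021", "28-05-2021"),
    (1, "A1", "28-05-2021", "28-06-2021"),
    (1, "A1", "28-06-2021", "12-07-2021"),
    (1, "A1", "20-07-2021", "28-07-2021"),
    (1, "A1", "28-08-2021", "13-09-2021"),
    (1, "A2", "13-09-2021", "13-10-2021"),
]


def parse(text):
    """Parse a DD-MM-YYYY string so dates compare chronologically."""
    return datetime.strptime(text, "%d-%m-%Y")


def collapse(rows):
    """Merge consecutive rows with the same (ID, RATE_CODE) into one date range."""
    ordered = sorted(rows, key=lambda r: (r[0], parse(r[2])))
    result = []
    # groupby only merges adjacent equal keys, so a RATE_CODE that
    # comes back later starts a new range instead of joining the old one.
    for (cust_id, rate_code), run in groupby(ordered, key=lambda r: (r[0], r[1])):
        run = list(run)
        start = min(run, key=lambda r: parse(r[2]))[2]
        end = max(run, key=lambda r: parse(r[3]))[3]
        result.append((cust_id, rate_code, start, end))
    return result


for line in collapse(rows):
    print(line)
```

A plain GROUP BY on (ID, RATE_CODE) loses the adjacency information that this loop relies on, which is why it picks the max from row 9 for the first A1 range.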

How to pivot output in SQL Server 2012

Input:
Table1->Student details
sno sname scourse sjoindate
1 Ram bsc 11/26/2011
2 Sham bcom 10/06/2010
3 Krishna ba 06/16/2012
Table2->Student marks
sno id marks
1 1 67
1 2 77
1 3 80
1 4 60
1 5 90
Table3->Subjectnames
id subjectname
1 Computerscience
2 Maths
3 Statistics
4 English
5 Hindi
Table4->Student_feedback_QuestionandAnswer
sno Questions Answers
1 Q1 A1
1 Q2 A2
1 Q3 A3
1 Q4 A4
Expected Output:
sno sname scourse sjoindate  Computerscience Maths Statistics English Hindi Questions Answers
1   Ram   bsc     11/26/2011 67              77    80         60      90    Q1        A1
                                                                            Q2        A2
                                                                            Q3        A3
                                                                            Q4        A4
In the above tables, the number of rows in Table2 and Table3 can increase or decrease depending on the student.
I need to combine the data from all four tables into one result set, but I don't know how to use pivot with a dynamic header and dynamic column values:
Data from the Table3.subjectname column should become the header column names, e.g. Computerscience, Hindi.
Data from Table2.marks should be the value under the header column name, e.g. marks for id=1 under Computerscience.
The Questions and Answers columns from Table4 should appear in the result set depending on the value of Table1.sno.
A hint for building a pivot with a dynamic header and dynamic column values would already help, even if it only uses two of the tables.
Please let me know if anything needs clarification.
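To illustrate what "dynamic header" means here, a small Python sketch of the marks pivot only (Table2 joined to Table3). It is not SQL Server code; the function name pivot_marks is made up for the example, and the subject spelling follows the expected output.

```python
# Table3: subject id -> subject name (these become the dynamic header).
subjects = {1: "Computerscience", 2: "Maths", 3: "Statistics", 4: "English", 5: "Hindi"}

# Table2: (sno, subject id, marks).
marks = [(1, 1, 67), (1, 2, 77), (1, 3, 80), (1, 4, 60), (1, 5, 90)]


def pivot_marks(marks, subjects):
    """Return (header, rows) with one output row per sno and one column per subject."""
    # The header is read from the data, so adding a subject adds a column.
    header = [subjects[subject_id] for subject_id in sorted(subjects)]
    by_student = {}
    for sno, subject_id, mark in marks:
        by_student.setdefault(sno, {})[subjects[subject_id]] = mark
    # Subjects a student has no mark for come out as None.
    rows = {sno: [cells.get(name) for name in header] for sno, cells in by_student.items()}
    return header, rows


header, rows = pivot_marks(marks, subjects)
print(header)
print(rows)
```

The point of the sketch is that the column list is built from Table3 at run time rather than typed into the query.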

pig order by with rank and join the rank together

I have the following data with the schema (t0:chararray, t1:int)
a0 1
a1 7
b2 9
a2 4
b0 6
I want to order it by t1 and then attach a rank to each row:
a0 1 1
a2 4 2
b0 6 3
a1 7 4
b2 9 5
Is there a convenient way to do this without writing a UDF in Pig?
There is the RANK operation in Pig. This should be sufficient:
X = rank A by t1 ASC;
Please see the Pig docs for more details.
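For comparison, the same order-then-number step can be sketched in a few lines of Python. This only illustrates the result the question asks for on the sample data (which has no ties); it does not try to reproduce how Pig handles ties.

```python
# (t0, t1) rows from the question.
data = [("a0", 1), ("a1", 7), ("b2", 9), ("a2", 4), ("b0", 6)]

# Sort by t1 ascending, then append a 1-based position as the rank.
ordered = sorted(data, key=lambda row: row[1])
ranked = [(t0, t1, rank) for rank, (t0, t1) in enumerate(ordered, start=1)]

for row in ranked:
    print(row)
```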

How to get the last non empty value of a hierarchy?

I've got a hierarchy with the appropriate value linked to each level, let's say :
A 100
  A1 NULL
  A2 NULL
B
  B1 NULL
  B2 1000
    B21 500
    B22 500
  B3 NULL
This hierarchy is materialized in my database as a parent-child hierarchy
Hierarchy Table
------------------------
Id Code Parent_Id
1 A NULL
2 A1 1
3 A2 1
4 B NULL
5 B1 4
6 B2 4
7 B21 6
8 B22 6
9 B3 4
And here is my fact table :
Fact Table
------------------------
Hierarchy_Id Value
1 100
6 1000
7 500
8 500
My question is: do you have any idea how to get only the last non-empty value of my hierarchy?
I know that there is an MDX function which could do this job, but I'd like to do it another way.
To be clear, the desired output would be :
Fact Table
------------------------
Hierarchy_Id Value
1 100
7 500
8 500
(If necessary, the work of flattening the hierarchy is already done...)
Thank you in advance!
If the codes for your hierarchy are correct, then you can use the information in the codes to determine the depth of the hierarchy. I think you want to filter out any "code" where there is a longer code that starts with it.
In that case:
-- keep a fact row only if no other fact row has a longer code starting with its code
select f.*
from fact f join
     hierarchy h
     on f.Hierarchy_Id = h.Id
where not exists (select 1
                  from fact f2 join
                       hierarchy h2
                       on f2.Hierarchy_Id = h2.Id
                  where h2.code like concat(h.code, '%') and
                        h2.code <> h.code
                 )
Here I've used the function concat() to create the pattern. In some databases, you might use + or || instead.
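To check the logic against the sample data, the same prefix test can be sketched in Python. This is only an illustration of the filter, assuming (as above) that a child's code always starts with its parent's code; the function name deepest_facts is made up for the example.

```python
# Hierarchy table: Id -> Code.
hierarchy = {1: "A", 2: "A1", 3: "A2", 4: "B", 5: "B1", 6: "B2", 7: "B21", 8: "B22", 9: "B3"}

# Fact table: (Hierarchy_Id, Value).
fact = [(1, 100), (6, 1000), (7, 500), (8, 500)]


def deepest_facts(fact, hierarchy):
    """Keep fact rows whose code is not a proper prefix of another fact row's code."""
    codes = [hierarchy[h_id] for h_id, _ in fact]
    kept = []
    for h_id, value in fact:
        code = hierarchy[h_id]
        # Same test as the NOT EXISTS subquery: a different, longer code that starts with this one.
        has_descendant = any(other != code and other.startswith(code) for other in codes)
        if not has_descendant:
            kept.append((h_id, value))
    return kept


print(deepest_facts(fact, hierarchy))
```

On the sample data the row for B2 is dropped because B21 and B22 have values, while A is kept because neither A1 nor A2 appears in the fact table.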