Input:
id type value
1 a aa
1 a aaa
1 b bb
1 b bbb
1 c cc
1 c ccc
Output:
id type_a type_b type_c
1 aa;aaa bb;bbb cc;ccc
I need to do this in DB2.
Please give some info on your database version, since different versions support different techniques.
Refer to the article below for all possible ways of concatenating strings:
String Aggregation Techniques
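For example, on DB2 9.7 and later the LISTAGG function handles this kind of concatenation. Since DB2 isn't available here, the pivot-and-concatenate pattern is sketched below with SQLite's GROUP_CONCAT via Python's sqlite3, purely as an illustration; the CASE expressions route one type into each output column:

```python
import sqlite3

# Sample data from the question, loaded into an in-memory SQLite table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id INTEGER, type TEXT, value TEXT)")
con.executemany("INSERT INTO t VALUES (?, ?, ?)",
                [(1, 'a', 'aa'), (1, 'a', 'aaa'),
                 (1, 'b', 'bb'), (1, 'b', 'bbb'),
                 (1, 'c', 'cc'), (1, 'c', 'ccc')])

# GROUP_CONCAT skips the NULLs produced by non-matching CASE branches,
# so each column collects only the values for its own type.
row = con.execute("""
    SELECT id,
           GROUP_CONCAT(CASE WHEN type = 'a' THEN value END, ';') AS type_a,
           GROUP_CONCAT(CASE WHEN type = 'b' THEN value END, ';') AS type_b,
           GROUP_CONCAT(CASE WHEN type = 'c' THEN value END, ';') AS type_c
    FROM t
    GROUP BY id
""").fetchone()
print(row)
```

In DB2 the GROUP_CONCAT calls would become LISTAGG(CASE WHEN … END, ';') with an optional WITHIN GROUP (ORDER BY value) to fix the ordering.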
This question may look like a repeat of ones answered before, but it is a bit tricky.
Let us say I have the following data frame.
Id Col_1
1 aaa
1 ccc
2 bbb
3 aa
Based on the columns Id and Col_1, I want to create a new column and assign a value by checking whether aa occurs in Col_1. This value should also be applied to every row that shares the same Id.
The expected result:
Id Col_1 New_Column
1 aaa aa
1 ccc aa
2 bbb
3 aa aa
I tried it with this:
df['New_Column'] = ((df['Id']==1) | df['Col_1'].str.contains('aa')).map({True:'aa', False:''})
and the result is
Id Col_1 New_Column
1 aaa aa
1 ccc
2 bbb
3 aa aa
But as I mentioned above, I want aa assigned in the new column for all rows with the same Id as well.
Can anyone help on this?
Use GroupBy.transform with GroupBy.any to get a mask covering all groups with at least one value containing aa:
mask = df['Col_1'].str.contains('aa').groupby(df['Id']).transform('any')
Alternatively, use Series.isin after filtering the Id values whose Col_1 contains aa:
mask = df['Id'].isin(df.loc[df['Col_1'].str.contains('aa'), 'Id'])
df['New_Column'] = np.where(mask, 'aa', '')
print(df)
Id Col_1 New_Column
0 1 aaa aa
1 1 ccc aa
2 2 bbb
3 3 aa aa
EDIT:
mask1 = df['Id'].isin(df.loc[df['Col_1'].str.contains('aa'), 'Id'])
mask2 = df['Id'].isin(df.loc[df['Col_1'].str.contains('bb'), 'Id'])
df['New_Column'] = np.select([mask1, mask2], ['aa', 'bb'], '')
print(df)
Id Col_1 New_Column
0 1 aaa aa
1 1 ccc aa
2 2 bbb bb
3 3 aa aa
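Putting the pieces together, a self-contained version of the isin/np.select approach, with the DataFrame reconstructed from the question:

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({'Id': [1, 1, 2, 3], 'Col_1': ['aaa', 'ccc', 'bbb', 'aa']})

# Each mask flags every row whose Id also appears on a row matching the pattern,
# so the label propagates to all rows of the group.
mask1 = df['Id'].isin(df.loc[df['Col_1'].str.contains('aa'), 'Id'])
mask2 = df['Id'].isin(df.loc[df['Col_1'].str.contains('bb'), 'Id'])
df['New_Column'] = np.select([mask1, mask2], ['aa', 'bb'], '')
print(df)
```

np.select evaluates the conditions in order, so a group matching both patterns gets the first label; the trailing '' is the default for groups matching neither.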
How do I remove duplicate rows for a particular ID based on the other columns, such that:
ID Att Comp Att. Inc. Att
aaa 2 0 2
aaa 2 0 2
bbb 3 1 2
bbb 3 1 2
bbb 3 0 2
becomes:
ID Att Comp Att. Inc. Att
aaa 2 0 2
bbb 3 1 2
I need to discard rows that are not just exact duplicates but that also convey the same data based on those columns.
Use drop_duplicates -- check out the documentation at http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.drop_duplicates.html
I can't tell for sure from your description what you want taken into account for duplicates, but you can tell drop_duplicates which column(s) to look at via its subset parameter.
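For instance, on the sample data above, restricting the duplicate check to a subset of columns reproduces the expected output. The subset chosen here (ID, Att, Inc. Att) is an assumption about which columns "infer the same data"; adjust it to your case:

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    'ID': ['aaa', 'aaa', 'bbb', 'bbb', 'bbb'],
    'Att': [2, 2, 3, 3, 3],
    'Comp Att.': [0, 0, 1, 1, 0],
    'Inc. Att': [2, 2, 2, 2, 2],
})

# Only the listed columns define a duplicate; the first row of each group is kept.
deduped = df.drop_duplicates(subset=['ID', 'Att', 'Inc. Att'])
print(deduped)
```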
I have a spreadsheet linking issue. For example, let the data in the workbooks look like this:
Book1
A 1 aaa
B 2 bbb
C 3 ccc
Now I link this sheet to another sheet by reference, so it has the same values:
Book 2
A 1 aaa
B 2 bbb
C 3 ccc
Now if I add another column in Book 2 like below
Book 2
A 1 aaa 10
B 2 bbb 20
C 3 ccc 30
Now if I link this new column back to Book1 it looks like below
Book 1
A 1 aaa 10
B 2 bbb 20
C 3 ccc 30
But now if I sort column 1 in Book 1, column 4 doesn't sort along with it, because it references Book 2. Book 1 then looks like this:
Book 1
C 3 ccc 10
B 2 bbb 20
A 1 aaa 30
But the expected output is:
Book 1
C 3 ccc 30
B 2 bbb 20
A 1 aaa 10
I want the sorting to apply to the newly added column as well. One solution I could think of is to share a single sheet, but that doesn't serve my purpose, so I need linking that is sophisticated enough to stay in sync across the books. Is there a pivot-column-based linking where column 1 in both books is taken as the reference when a manipulation happens in either sheet? Any help is appreciated.
I'm not sure why this isn't working for you, but I may be misinterpreting how you've set this up.
I've set up two workbooks, each with identical data for the first three columns, the fourth column in Book1 being linked to the same column in Book2.
Here, you can see the formula view after I sorted on Column A.
This will not work properly if you had enabled Auto Filter before putting in the links, since Auto Filter will not have extended its filter range by itself. To fix this particular issue, simply disable and re-enable Auto Filter (you should see the drop-down icon on Column D after that).
I've run into this type of "missorting" problem before. My solution was always to copy the formulas and paste them into a different sheet as values.
Otherwise, you could make column D dynamic with VLOOKUP instead of referencing hard-coded cell addresses.
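For example, a lookup-based link in column D of Book1 might look like the following (the file name, sheet name, and A:D range here are hypothetical; adapt them to your workbooks):

```
=VLOOKUP(A1, [Book2.xlsx]Sheet1!$A:$D, 4, FALSE)
```

Because the value is fetched by the key in column A rather than by cell position, sorting Book1 no longer breaks the association with Book2's column.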
I am sure my question is very simple for some, but I cannot figure it out, and it is one of those things that is difficult to search for an answer to. I hope you can help.
In a table in SQL I have the following (simplified data):
UserID UserIDX Number Date
aaa bbb 1 21.01.2000
aaa bbb 5 21.01.2010
ppp ggg 9 21.01.2009
ppp ggg 3 15.02.2020
xxx bbb 99 15.02.2020
And I need a view which will give me the same number of records, but for every combination of UserID and UserIDX there should be only one value in the Number field, i.e. the highest value found in that combination's data set. The Date field needs to remain unchanged. So the above would be transformed to:
UserID UserIDX Number Date
aaa bbb 5 21.01.2000
aaa bbb 5 21.01.2010
ppp ggg 9 21.01.2009
ppp ggg 9 15.02.2020
xxx bbb 99 15.02.2020
So, for all instances of aaa+bbb combination the unique value in Number should be 5 and for ppp+ggg the unique number is 9.
Thank you very much.
Leo
select a.userid, a.useridx, b.maxnum, a.date
from mytable a
inner join (
    select userid, useridx, max(number) as maxnum
    from mytable
    group by userid, useridx
) b
  on a.userid = b.userid and a.useridx = b.useridx
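That join-to-a-grouped-subquery pattern can be sketched end to end; SQLite (via Python's sqlite3) stands in here for the asker's unspecified database, with `t` as a placeholder table name:

```python
import sqlite3

# Sample data from the question.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (UserID TEXT, UserIDX TEXT, Number INTEGER, Date TEXT)")
con.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", [
    ('aaa', 'bbb', 1, '21.01.2000'),
    ('aaa', 'bbb', 5, '21.01.2010'),
    ('ppp', 'ggg', 9, '21.01.2009'),
    ('ppp', 'ggg', 3, '15.02.2020'),
    ('xxx', 'bbb', 99, '15.02.2020'),
])

# The subquery finds the per-group maximum; the join spreads it back onto
# every original row, leaving Date untouched.
rows = con.execute("""
    SELECT a.UserID, a.UserIDX, b.maxnum, a.Date
    FROM t a
    INNER JOIN (SELECT UserID, UserIDX, MAX(Number) AS maxnum
                FROM t GROUP BY UserID, UserIDX) b
      ON a.UserID = b.UserID AND a.UserIDX = b.UserIDX
""").fetchall()
for r in rows:
    print(r)
```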
I have written a query which involves joins and finally returns the below result,
Name ID
AAA 1
BBB 1
BBB 6
CCC 1
CCC 6
DDD 6
EEE 1
But I want the result filtered further so that, among duplicate values in the first column, the row with the lesser value is ignored; i.e., the CCC and BBB rows with value 1 should be removed. The result should be:
AAA 1
BBB 6
CCC 6
DDD 6
EEE 1
Note: I have a condition WHERE (ID = '6' OR ID = '1'). Is there any way to improve this condition so that it means WHERE ID = 6, or ID = 1 if no 6 is available in that table?
You will likely want to add:
GROUP BY name
to the bottom of your query, and change ID to MAX(ID) in your SELECT statement.
It is hard to give a more specific answer without seeing the query you've already written.
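A minimal runnable sketch of that suggestion, assuming the intermediate result from the question has been materialized in a table (SQLite via Python's sqlite3 stands in for the asker's unspecified database):

```python
import sqlite3

# The intermediate result from the question.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE r (Name TEXT, ID INTEGER)")
con.executemany("INSERT INTO r VALUES (?, ?)",
                [('AAA', 1), ('BBB', 1), ('BBB', 6), ('CCC', 1),
                 ('CCC', 6), ('DDD', 6), ('EEE', 1)])

# GROUP BY collapses duplicate names; MAX(ID) keeps the greater value,
# which also covers the "ID = 1 only if no 6 exists" requirement here.
rows = con.execute(
    "SELECT Name, MAX(ID) AS ID FROM r GROUP BY Name ORDER BY Name"
).fetchall()
print(rows)  # [('AAA', 1), ('BBB', 6), ('CCC', 6), ('DDD', 6), ('EEE', 1)]
```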