I'm trying to transform a table that has duplicate IDs into a new table with one row per ID, following the model below.
I can't get a result without duplicates, and I don't see how to do it.
Thank you in advance for your help.
Original table:
IDu| ID | Information
1 |A |1
2 |A |2
3 |A |3
4 |A |4
5 |A |5
6 |B |1
7 |B |2
8 |B |3
9 |B |4
10 |C |1
11 |D |1
12 |D |2
13 |D |3
Target table:
ID | Result/table2 | greatest value
A |(1,2,3,4,5) |5
B |(1,2,3,4) |4
C |(1) |1
D |(1,2,3) |3
You can use GROUP_CONCAT
(https://www.w3resource.com/mysql/aggregate-functions-and-grouping/aggregate-functions-and-grouping-group_concat.php):
SELECT
ID, GROUP_CONCAT(INFORMATION), COUNT(INFORMATION)
FROM
TABLE
GROUP BY
ID
Thank you very much, that was a quick and perfect answer.
One more question: how can I also get the greatest value for each ID?
This query lists the values from smallest to largest, but how do I keep only the largest one?
ID | Result/table2 | greatest value
A |(1,2,3,4,5) |5
B |(1,2,3,4) |4
C |(1) |1
D |(1,2,3) |3
I tried this, without success:
SELECT ID,GROUP_CONCAT(ID1)
from tournee_reduite
GROUP BY ID
ORDER BY MAX(ID1) desc;
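One way to get the largest value per ID is to select MAX() in the same grouped query rather than ordering by it. The following is only a minimal sketch: it reuses the table and column names from the attempt above (tournee_reduite, ID, ID1), the IDu column and the sample rows are copied from the original table, and Python's built-in sqlite3 module is used purely so the example can be run. SQLite's GROUP_CONCAT stands in for the MySQL one here, and the order of the concatenated values is not guaranteed unless it is requested explicitly.

```python
import sqlite3

# In-memory stand-in for the original table (names taken from the attempt above).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tournee_reduite (IDu INTEGER, ID TEXT, ID1 INTEGER)")
rows = [
    (1, "A", 1), (2, "A", 2), (3, "A", 3), (4, "A", 4), (5, "A", 5),
    (6, "B", 1), (7, "B", 2), (8, "B", 3), (9, "B", 4),
    (10, "C", 1),
    (11, "D", 1), (12, "D", 2), (13, "D", 3),
]
conn.executemany("INSERT INTO tournee_reduite VALUES (?, ?, ?)", rows)

# MAX() is evaluated once per group, next to GROUP_CONCAT(), so each ID
# keeps its full list of values plus its largest value.
query = """
    SELECT ID, GROUP_CONCAT(ID1) AS all_values, MAX(ID1) AS greatest_value
    FROM tournee_reduite
    GROUP BY ID
    ORDER BY ID
"""
result = conn.execute(query).fetchall()

# Map each ID to its greatest value for easy inspection.
greatest = {row[0]: row[2] for row in result}
```

ORDER BY MAX(ID1) DESC only changes the order in which the groups are returned; it is the MAX(ID1) in the select list that adds the extra column.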
Thank you very much again.
I have a dataframe in which some rows are useless except for the value of one variable.
I want to add that variable's value from those rows to the previous row and then delete the useless rows.
More precisely, my dataframe looks something like this:
|cat1| cat2|var1|var2|
|A |x |1 |2 |
|A |x |1 |0 |
|A |x |. |5 |
|A |y |1 |2 |
|A |y |1 |2 |
|A |y |1 |3 |
|A |y |. |6 |
|B |x |1 |2 |
|B |x |1 |4 |
|B |x |1 |2 |
|B |x |1 |1 |
|B |x |. |3 |
and I want to get
|cat1| cat2|var1|var2|
|A |x |1 |2 |
|A |x |1 |5(5+0)|
|A |y |1 |2 |
|A |y |1 |2 |
|A |y |1 |9(6+3)|
|B |x |1 |2 |
|B |x |1 |4 |
|B |x |1 |2 |
|B |x |1 |4(3+1)|
I've tried code like
test = df[df['var1'] == '.'].index
for num in test:
    df['var2'][num - 1] = df['var2'][num - 1] + df['var2'][num]
but it doesn't work.
Any help would be appreciated.
For a very readable solution, use np.where to pick out the rows whose next row has '.' in var1. shift(-1) aligns each row with the row that follows it. Where the next row is a '.' row, add its var2 to the current row; otherwise keep the original var2. Afterwards, just drop all the rows with a '.'.
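import numpy as np
# Where the next row's var1 is '.', add that row's var2 to the current row
df['var2_new'] = np.where(df['var1'].shift(-1) == '.',
                          df['var2'] + df['var2'].shift(-1), df['var2'])
# Drop the '.' rows, whose var2 has now been folded into the row above
df[df['var1'] != '.']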
# cat1 cat2 var1 var2 var2_new
#0 A x 1 2 2.0
#1 A x 1 0 5.0
#3 A y 1 2 2.0
#4 A y 1 2 2.0
#5 A y 1 3 9.0
#7 B x 1 2 2.0
#8 B x 1 4 4.0
#9 B x 1 2 2.0
#10 B x 1 1 4.0
I have a dataset from which I want the count of each distinct value in more than one column, with the result returned by one single select. How should I go about it?
Example:
Table:
|Col_A|Col_B|
|a |c |
|a |d |
|b |c |
|b |d |
|b |c |
I want a result like this (using a single select query):
|Col_A|Count_of_A|Col_B|Count_of_B|
|a |2 |c |3 |
|b |3 |d |2 |
How can I do this? The data is not known in advance, so we cannot use where or case statements written for specific values.
In practice this is a Spark Streaming problem: I want to run this operation on a Spark streaming dataframe every time new data comes in from Kafka.
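To make the desired shape concrete, here is a small plain-Python sketch (not Spark, and not a streaming solution): it counts each column independently and then places the two summaries side by side. The variable names are made up for illustration, and the pairing of rows ("a" next to "c", "b" next to "d") is purely positional after sorting, which is an assumption read off the example output.

```python
from collections import Counter
from itertools import zip_longest

# Sample rows from the example table: (Col_A, Col_B)
rows = [("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"), ("b", "c")]

# Count occurrences of each value, one column at a time
count_a = Counter(a for a, _ in rows)
count_b = Counter(b for _, b in rows)

# Pair the two summaries row by row; pad with (None, None) when one
# column has fewer distinct values than the other
side_by_side = [
    a + b
    for a, b in zip_longest(
        sorted(count_a.items()), sorted(count_b.items()), fillvalue=(None, None)
    )
]
```

Each tuple in side_by_side has the layout (Col_A, Count_of_A, Col_B, Count_of_B). The two halves of a row are unrelated to each other, which is why the counts have to be computed per column before being combined.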
Is there a word/phrase that describes the following action?
Data in the form:
ID |Group |Type |Data
------------------------
1 |A |a |10
2 |A |b |11
3 |A |c |12
4 |B |a |20
5 |B |d |40
6 |C |b |31
is transformed to this form:
Type |A |B |C (etc.)
-------------------------
a |10 |20 |NULL
b |11 |NULL |31
c |12 |NULL |NULL
d |NULL |40 |NULL
This is a kind of pivot, but with no summarising, so the data could (in theory) be updated via the transformed table.
I would have thought that this is needed quite widely for allocation of resources/stock to multiple projects. In the example above 'Group' would be project, 'Type' would be the resource and 'Data' would be the quantity needed or allocated.
I really want to ask a question about how this is normally approached in database design, but I need to know the terminology before I can do that!
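For reference, the transformation described above can be sketched in a few lines of plain Python. This is only an illustration of the reshaping, under the assumption that each (Type, Group) pair occurs at most once, which is what allows values to be moved rather than summarised; the variable names are made up for the example.

```python
# Rows from the example: (ID, Group, Type, Data)
records = [
    (1, "A", "a", 10), (2, "A", "b", 11), (3, "A", "c", 12),
    (4, "B", "a", 20), (5, "B", "d", 40), (6, "C", "b", 31),
]

# Distinct groups become columns, distinct types become rows
groups = sorted({group for _, group, _, _ in records})
types = sorted({type_ for _, _, type_, _ in records})

# Start with every cell empty (None plays the role of NULL)
pivoted = {type_: {group: None for group in groups} for type_ in types}

# Each (Type, Group) pair is assumed unique, so every value is copied
# into exactly one cell and nothing is aggregated
for _, group, type_, data in records:
    pivoted[type_][group] = data
```

Because every cell maps back to at most one source row, an update to pivoted["a"]["B"] corresponds unambiguously to the row with Group B and Type a.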
I'm new to SQL, so I spent a long time on this without being able to figure it out.
My table looks like this:
+------+------+------+
|ID |2016 | 2017 |
+------+------+------+
|1 |A |A |
+------+------+------+
|2 |A |B |
+------+------+------+
|3 |B |B |
+------+------+------+
|4 |B |C |
+------+------+------+
I would like to have only the rows which have changed from 2016 to 2017:
+------+------+------+
|ID |2016 | 2017 |
+------+------+------+
|2 |A |B |
+------+------+------+
|4 |B |C |
+------+------+------+
Could you please help?
select * from mytable where column_2016<>column_2017
assuming your column labels are column_2016 and column_2017
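A quick way to check this is to run it against the sample data. The sketch below uses Python's built-in sqlite3 module with the assumed names mytable, column_2016 and column_2017; the real table and column names may differ.

```python
import sqlite3

# In-memory table holding the sample data from the question
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (ID INTEGER, column_2016 TEXT, column_2017 TEXT)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(1, "A", "A"), (2, "A", "B"), (3, "B", "B"), (4, "B", "C")],
)

# Keep only the rows whose value differs between the two years
changed = conn.execute(
    "SELECT * FROM mytable WHERE column_2016 <> column_2017 ORDER BY ID"
).fetchall()
```

Note that a comparison with <> does not return rows where either value is NULL, so if a year can be missing those rows would need to be handled separately.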