Distinct count on multiple unrelated columns - sql

I have a dataset from which I want the distinct count of more than one column, with the result in one single select. How do I go about it?
Example:
Table:
|Col_A|Col_B|
|a |c |
|a |d |
|b |c |
|b |d |
|b |c |
I want like this (with the use of a single select query) -
|Col_A|Count_of_A|Col_B|Count_of_B|
|a |2 |c |3 |
|b |3 |d |2 |
How can this be done? The data is unknown each time, so we cannot hard-code WHERE or CASE statements for specific values.
Ultimately this is a Spark Streaming problem: I want to perform this operation on a Spark streaming DataFrame every time new data arrives from Kafka.
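Since the two columns are unrelated, one approach is to aggregate each column independently and place the results side by side by rank. A minimal sketch in plain Python (a Spark version would follow the same shape: two independent groupBy counts joined on row number):

```python
from collections import Counter
from itertools import zip_longest

rows = [("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"), ("b", "c")]

# Count each column on its own; the columns share no relationship,
# so each gets its own independent aggregation.
count_a = sorted(Counter(r[0] for r in rows).items())  # [('a', 2), ('b', 3)]
count_b = sorted(Counter(r[1] for r in rows).items())  # [('c', 3), ('d', 2)]

# Zip the two independent results side by side, padding the shorter one
# with empty cells if the columns have different numbers of distinct values.
result = [
    (va, ca, vb, cb)
    for (va, ca), (vb, cb) in zip_longest(count_a, count_b,
                                          fillvalue=(None, None))
]
# result == [('a', 2, 'c', 3), ('b', 3, 'd', 2)]
```

The key point is that the two result sets are only co-located in one output, not actually joined on any shared key.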


SparkSQL: how to resolve data dependency issues

scala> sql(""" select * from demo1""").show(false)
+-----+---+
|from1|to1|
+-----+---+
|c |d |
|b |c |
|a |b |
+-----+---+
demo1 is my input table.
From the table we can see the chain a → b, b → c, c → d, so every element on the chain should ultimately map to d.
so i need result like this:
+-----+---+
|from1|to1|
+-----+---+
|c |d |
|b |d |
|a |d |
+-----+---+
Note: a, b, c are not ordered alphabetically or by size; the chain relationships are arbitrary.
How do I write this SparkSQL?
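This is a transitive-closure problem: each source must be walked along the chain to its terminal node. In SparkSQL this is typically done with an iterative self-join repeated until a fixpoint. The underlying logic, as a plain-Python sketch (data from the question):

```python
# Each row maps from1 -> to1; follow each chain to its terminal node.
edges = {"c": "d", "b": "c", "a": "b"}

def resolve(node, mapping):
    """Follow the chain until a node with no outgoing edge is reached."""
    seen = set()
    while node in mapping and node not in seen:
        seen.add(node)          # guard against accidental cycles
        node = mapping[node]
    return node

# Every source ends up pointing at the terminal node 'd'.
result = {src: resolve(dst, edges) for src, dst in edges.items()}
# result == {'c': 'd', 'b': 'd', 'a': 'd'}
```

The SQL equivalent repeatedly joins the table to itself on `to1 = from1`, replacing `to1` with the joined row's `to1`, until no row changes; the number of iterations is bounded by the longest chain.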

transform table with duplicates

I'm trying to transform a table with duplicates into a new table following the model below, without the duplicates. I don't see how to do it; thanks in advance for your help.
Original table:
IDu| ID | Information
1 |A |1
2 |A |2
3 |A |3
4 |A |4
5 |A |5
6 |B |1
7 |B |2
8 |B |3
9 |B |4
10 |C |1
11 |D |1
12 |D |2
13 |D |3
Target table:
ID | Resultat/table2 | greatest value
A |(1,2,3,4,5) |5
B |(1,2,3,4) |4
C |(1) |1
D |(1,2,3) |3
You can use GROUP_CONCAT
(https://www.w3resource.com/mysql/aggregate-functions-and-grouping/aggregate-functions-and-grouping-group_concat.php):
SELECT ID,
       GROUP_CONCAT(INFORMATION),
       COUNT(INFORMATION)
FROM TABLE
GROUP BY ID
A huge thank you, quick and perfect response.
On the other hand, how can I filter to keep only the greatest value? The query lists the values from smallest to largest, but I want to keep only the largest one:
ID | Resultat/table2 | greatest value
A |(1,2,3,4,5) |5
B |(1,2,3,4) |4
C |(1) |1
D |(1,2,3) |3
I tried, but without success
SELECT ID,GROUP_CONCAT(ID1)
from tournee_reduite
GROUP BY ID
ORDER BY MAX(ID1) desc;
another huge thank you
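Note that COUNT(INFORMATION) only equals the greatest value because Information happens to run 1..n per ID; MAX(INFORMATION) is the aggregate that directly answers the follow-up. A runnable sketch using SQLite, whose group_concat mirrors MySQL's GROUP_CONCAT (table and column names taken from the question):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tournee_reduite (IDu INTEGER, ID TEXT, Information INTEGER)"
)
rows = [
    (1, "A", 1), (2, "A", 2), (3, "A", 3), (4, "A", 4), (5, "A", 5),
    (6, "B", 1), (7, "B", 2), (8, "B", 3), (9, "B", 4),
    (10, "C", 1),
    (11, "D", 1), (12, "D", 2), (13, "D", 3),
]
conn.executemany("INSERT INTO tournee_reduite VALUES (?, ?, ?)", rows)

# group_concat builds the value list; MAX keeps only the greatest value per ID.
result = conn.execute("""
    SELECT ID, group_concat(Information), MAX(Information)
    FROM tournee_reduite
    GROUP BY ID
    ORDER BY ID
""").fetchall()
# e.g. ('A', '1,2,3,4,5', 5), ('B', '1,2,3,4', 4), ...
```

If only the maximum is wanted, drop the group_concat column entirely rather than trying to sort the concatenated string.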

how to merge specific cells table data in oracle

I want to conditionally concatenate text cells in an Oracle table according to the sequence (SEQ) number attribute. Is it possible? I need your help with the query.
For example I have the following table DATA:
|-----------------|
|ID|CODE|SEQ|TEXT |
|--|----|---|-----|
|1 |a |1 |text1|
|1 |a |2 |text2|
|2 |b |1 |text3|
|3 |c |1 |text4|
|4 |d |1 |text6|
|4 |d |2 |text7|
|4 |d |3 |text8|
-------------------
What I want is to create a new table DATA1 that concatenates the TEXT values sharing the same ID and CODE, joining the texts in SEQ order whenever there is more than one row. The new table should look like this:
|-------------------------|
|ID|CODE|TEXT |
|--|----|-----------------|
|1 |a |text1 text2 |
|2 |b |text3 |
|3 |c |text4 |
|4 |d |text6 text7 text8|
---------------------------
The listagg() function can be used, grouping by id and code:
select id, code,
       listagg(text, ' ') within group (order by seq) as text
from data
group by id, code
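For reference, the same SEQ-ordered concatenation can be sketched outside the database in plain Python (data taken from the question):

```python
from itertools import groupby

# (ID, CODE, SEQ, TEXT) rows from the question's DATA table.
data = [
    (1, "a", 1, "text1"), (1, "a", 2, "text2"),
    (2, "b", 1, "text3"), (3, "c", 1, "text4"),
    (4, "d", 1, "text6"), (4, "d", 2, "text7"), (4, "d", 3, "text8"),
]

# Sort by (ID, CODE, SEQ) so groupby sees each group contiguously and
# the texts join in SEQ order, as listagg's WITHIN GROUP clause does.
data.sort(key=lambda r: (r[0], r[1], r[2]))
result = [
    (id_, code, " ".join(r[3] for r in grp))
    for (id_, code), grp in groupby(data, key=lambda r: (r[0], r[1]))
]
# e.g. (1, 'a', 'text1 text2'), ..., (4, 'd', 'text6 text7 text8')
```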

SQL Split multiple groups of columns into rows from a view

I'm working on a legacy view which for a key returns multiple subsets of data that I would like as separate rows. Example of what's being returned:
|a |b |cStartQty |cUpdatedQty |dStartQty |dUpdatedQty |
|1 |2 |10 |20 |15 |20 |
|2 |4 |11 |18 |16 |21 |
What I'd like returned is something like
|a |b |Account |StartQty |UpdatedQty |
|1 |2 |cXX |10 |20 |
|1 |2 |dXX |15 |20 |
|2 |4 |cXX |11 |18 |
|2 |4 |dXX |16 |21 |
At first I thought I could do this with a chain of unions but that would require many redundant queries on the view (there are approximately 15 subsets). Outside of that I don't really have a clue how to proceed. If necessary I thought I may have to wrap this view in a proc and go that route.
You can use UNNEST:
SELECT
    a,
    b,
    UNNEST(ARRAY['cXX', 'dXX']) AS Account,
    UNNEST(ARRAY[cStartQty, dStartQty]) AS StartQty,
    UNNEST(ARRAY[cUpdatedQty, dUpdatedQty]) AS UpdatedQty
FROM mytable
** UPDATE **
Sorry, I didn't notice you are using Sybase.
I don't know whether Sybase supports UNNEST, but I will leave the answer as-is in case someone can confirm. The query above works on PostgreSQL.
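Whatever the SQL dialect, the operation is a plain unpivot: one output row per (input row, subset prefix). A plain-Python sketch of that shape, with the column names from the question and the subset prefixes assumed to be known in advance:

```python
rows = [
    {"a": 1, "b": 2, "cStartQty": 10, "cUpdatedQty": 20,
     "dStartQty": 15, "dUpdatedQty": 20},
    {"a": 2, "b": 4, "cStartQty": 11, "cUpdatedQty": 18,
     "dStartQty": 16, "dUpdatedQty": 21},
]

prefixes = ["c", "d"]  # ~15 subsets in the real view, one prefix per subset

# Classic unpivot: emit one output row per (input row, prefix) pair.
result = [
    (r["a"], r["b"], p + "XX", r[p + "StartQty"], r[p + "UpdatedQty"])
    for r in rows
    for p in prefixes
]
# (1, 2, 'cXX', 10, 20), (1, 2, 'dXX', 15, 20), ...
```

In SQL terms this corresponds to a cross join against a small list of prefixes plus CASE expressions to pick the matching columns: one scan of the view instead of ~15 UNIONed queries.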

Is there a word or term for a PIVOT without data loss?

Is there a word/phrase that describes the following action?
Where data in the form:
ID |Group |Type |Data
------------------------
1 |A |a |10
2 |A |b |11
3 |A |c |12
4 |B |a |20
5 |B |d |40
6 |C |b |31
Is transformed to this form:
Type |A |B |C (etc.)
-------------------------
a |10 |20 |NULL
b |11 |NULL |31
c |12 |NULL |NULL
d |NULL |40 |NULL
This is a kind of pivot, but where there is no summarising so data could (in theory) be updated via the transformed table.
I would have thought that this is needed quite widely for allocation of resources/stock to multiple projects. In the example above 'Group' would be project, 'Type' would be the resource and 'Data' would be the quantity needed or allocated.
I really want to ask a question about how this is normally approached in database design, but I need to know the terminology before I can do that!
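For what it's worth, the transform itself is easy to state: because (Group, Type) is unique in the source, the pivot places each Data value in exactly one cell and no aggregation occurs, which is what makes it lossless and, in principle, updatable. A plain-Python sketch of that lossless crosstab, using the data above:

```python
# (ID, Group, Type, Data) rows from the question.
rows = [
    (1, "A", "a", 10), (2, "A", "b", 11), (3, "A", "c", 12),
    (4, "B", "a", 20), (5, "B", "d", 40), (6, "C", "b", 31),
]

groups = sorted({g for _, g, _, _ in rows})   # column headers: A, B, C
types = sorted({t for _, _, t, _ in rows})    # row labels: a, b, c, d

# (Group, Type) is a key, so each cell holds at most one value;
# nothing is summarised, and the mapping can be inverted exactly.
cell = {(g, t): d for _, g, t, d in rows}
table = {t: [cell.get((g, t)) for g in groups] for t in types}
# table['a'] == [10, 20, None], table['d'] == [None, 40, None]
```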