SQL pivot table for unknown number of columns

I need some tips for the Postgres pivot below, please.
I have a table like this:
+-------+----+-----+
| round | id | kpi |
+-------+----+-----+
| 0     | 1  | 0.1 |
| 1     | 1  | 0.2 |
| 0     | 2  | 0.5 |
| 1     | 2  | 0.4 |
+-------+----+-----+
The number of ids is unknown.
I need to convert the id column into multiple columns (one per distinct id), with the kpi values as the cell values, while keeping the rounds as in the first table.
+-------+-----+-----+
| round | id1 | id2 |
+-------+-----+-----+
| 0     | 0.1 | 0.5 |
| 1     | 0.2 | 0.4 |
+-------+-----+-----+
Is it possible to do this in SQL? How would I do it?

It's possible; check this question.
This other answer is a pivot I did, also with an unknown number of columns; maybe it can help you too: Advanced convert rows to columns (pivot) in SQL Server
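For Postgres specifically, a minimal sketch with the tablefunc extension might look like this; the table name scores is an assumption, since the question never names the table:
-- crosstab() takes the source rows and the category list, and still
-- needs an explicit output column list (table name scores is assumed)
CREATE EXTENSION IF NOT EXISTS tablefunc;
SELECT *
FROM crosstab(
    'SELECT round, id, kpi FROM scores ORDER BY 1, 2',
    'SELECT DISTINCT id FROM scores ORDER BY 1'
) AS ct (round int, id1 numeric, id2 numeric);
Because that output column list must be spelled out, a truly unknown number of ids means building this statement dynamically (for example in PL/pgSQL) before executing it.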

Related

SQL table transformation. How to pivot a certain table?

You are looking for a pivot function. You can find details on how to do this here and here. The first link also covers how to do this if you have an unknown number of column names.
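For a known set of ids, a plain conditional-aggregation sketch does the same pivot (the table name t is an assumption):
-- one MAX(CASE ...) per distinct id
SELECT round,
       MAX(CASE WHEN id = 1 THEN kpi END) AS id1,
       MAX(CASE WHEN id = 2 THEN kpi END) AS id2
FROM t
GROUP BY round
ORDER BY round;
With an unknown number of ids, the MAX(CASE ...) list itself has to be generated dynamically, which is what the linked answers walk through.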

Using pyspark to create a segment array from a flat record

I have a sparsely populated table with values for various segments for unique user ids. I need to create an array with user_id and the relevant segment headers only.
Please note that this is just an indicative dataset; I have several hundred segments like these.
----------------------------------------------
| user_id | seg1 | seg2 | seg3 | seg4 | seg5 |
----------------------------------------------
| 100     | M    | null | 25   | null | 30   |
| 200     | null | null | 43   | null | 250  |
| 300     | F    | 3000 | null | 74   | null |
----------------------------------------------
I am expecting the output to be:
--------------------------------
| user_id | segment_array      |
--------------------------------
| 100     | [seg1, seg3, seg5] |
| 200     | [seg3, seg5]       |
| 300     | [seg1, seg2, seg4] |
--------------------------------
Is there any function available in pyspark or pyspark-sql to accomplish this?
Thanks for your help!
I cannot find a direct way, but you can do this:
from pyspark.sql.functions import array, array_remove, col, lit, when

cols = df.columns[1:]  # every column except user_id

# Emit the column name where the value is non-null and a placeholder
# where it is null, then strip the placeholders out of the array.
r = df.withColumn('array', array(*[when(col(c).isNotNull(), lit(c)).otherwise('notmatch') for c in cols])) \
    .withColumn('array', array_remove('array', 'notmatch'))
r.show()
+-------+----+----+----+----+----+------------------+
|user_id|seg1|seg2|seg3|seg4|seg5| array|
+-------+----+----+----+----+----+------------------+
| 100| M|null| 25|null| 30|[seg1, seg3, seg5]|
| 200|null|null| 43|null| 250| [seg3, seg5]|
| 300| F|3000|null| 74|null|[seg1, seg2, seg4]|
+-------+----+----+----+----+----+------------------+
Not sure this is the best way, but I'd attack it this way:
There's the collect_set function, which will always give you unique values across the list of values you aggregate over.
Do a union over a select like this for each segment:
import pyspark.sql.functions as fn

df_seg_1 = df.select(
    'user_id',
    fn.when(
        fn.col('seg1').isNotNull(),
        fn.lit('seg1')
    ).alias('segment')
)
# repeat for all segments
df = df_seg_1.union(df_seg_2).union(...)
# collect_list skips the nulls produced by when() without otherwise()
df.groupBy('user_id').agg(fn.collect_list('segment'))

Oracle SQL: Merge select results on top of each other

Let's say you have two queries. Query A results in:
| A | B | C  |
+---+---+----+
| 1 | 5 | 9  |
| 2 | 6 | 10 |
And Query B results in:
| A | B | C  |
+---+---+----+
| 3 | 7 | 11 |
| 4 | 8 | 12 |
Is it possible to execute the statements in a way to get:
| A | B | C  |
+---+---+----+
| 1 | 5 | 9  |
| 2 | 6 | 10 |
| 3 | 7 | 11 |
| 4 | 8 | 12 |
Would the simpler solution be to join them? If it involves using ';' to separate the two selects, I get an error when I try that.
Also, I have tried using UNION or UNION ALL between the statements but that gives
ORA-00933: SQL command not properly ended
This is being done in Excel's Microsoft Query.
Use a UNION statement.
SELECT * FROM queryA
UNION ALL
SELECT * FROM queryB
The WHERE clauses stay with each query, but the ORDER BY moves to the very end.
When you use UNIONs, the name or alias of the column in the topmost query becomes the alias for the entire column, so at the end you would just write something like ORDER BY A, where A is your first column name. Make sure you don't specify aliases in queries other than the topmost one.
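As a sketch with placeholder table and column names: the WHERE clauses stay inside each branch and the single ORDER BY goes at the very end. An ORDER BY left inside one of the branch queries is a classic trigger for exactly this ORA-00933 error, and Microsoft Query is also happier without a trailing semicolon.
-- WHERE stays inside each branch; one ORDER BY at the very end
SELECT a, b, c FROM table_a WHERE b < 7
UNION ALL
SELECT a, b, c FROM table_b WHERE b >= 7
ORDER BY a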

SSRS Distinct Count Of Condition Once Per Group

Let's say I have data similar to this:
|NAME |AMOUNT|RANDOM_FLAG|
|------|------|-----------|
|MARK | 100| X |
|MARK | 400| |
|MARK | 200| X |
|AMY | 100| X |
|AMY | 400| |
|AMY | 300| |
|ABE | 300| |
|ABE | 900| |
|ABE | 700| |
How can I get a distinct count of names with at least one RANDOM_FLAG set? In my total row, I want to see a count of 2, since both Mark and Amy had the flag set, regardless of how many times it is set. I have tried everything I can think of in SSRS. I'm guessing there is a way to nest aggregates to get to this, but I can't come up with it. I do have a group on NAME.
You can use a conditional COUNTDISTINCT in SSRS.
=COUNTDISTINCT(
    IIF(NOT ISNOTHING(Fields!Random_Flag.Value) AND Fields!Random_Flag.Value <> "", Fields!Name.Value, Nothing),
    "DataSetName"
)
Replace DataSetName with the name of your dataset.
-- count(distinct name) so a name flagged on several rows counts once
select count(distinct name)
from table_name
where name in (select distinct name from table_name where random_flag = 'X');
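A nested-aggregate sketch of the same count, against the same assumed table_name, which avoids the subquery and shows the nesting-aggregates idea the question mentions:
-- one row per name, then count the names with at least one flag
select count(*)
from (
    select name
    from table_name
    group by name
    having count(case when random_flag = 'X' then 1 end) > 0
) flagged;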

Translation of a SQL Query Into DAX to create a Calculated Column in PowerPivot

Hi, I am building a PowerPivot data model using a "Person" table, which has the columns "Name" and "Amount".
Table - Person
| Name | Amount |
|------|--------|
| Red  | 10     |
| Blue | 10     |
| Red  | 16     |
| Blue | 82     |
| Red  | 82     |
| Red  | 54     |
| Red  | 61     |
| Blue | 82     |
| Blue | 82     |
The expected output is:
| Name | Amount | Count(Specific_Amount) |
|------|--------|------------------------|
| Red  | 10     | 2                      |
| Blue | 10     | 1                      |
| Red  | 16     | 1                      |
| Blue | 82     | 3                      |
| Red  | 82     | 1                      |
| Red  | 54     | 1                      |
| Red  | 61     | 1                      |
What I have tried so far is:
select Name, Amount, count(Amount) as CountOfAmountRepeated
from Person
group by Name, Amount
order by Amount;
I have imported my table "Person" into PowerPivot in Excel.
I want to create a calculated column in PowerPivot holding the count of repeated Amount values. I was able to do this in SQL with the query above, but I need an equivalent DAX expression to create the new column in PowerPivot.
Can someone translate this query into DAX, or suggest a tool that translates SQL into DAX, so that I can create the calculated column and use Power View to prepare a histogram of this data?
I tried googling without much luck. Thanks in advance.
There are a lot of facets of your question that need to be addressed, but very simply (without consideration of any other requirements) the calculation is:
Count(Specific_Amount):=COUNTROWS('Person')
All you seem to be looking to do here is count the instances of each combination.
If you then create a pivot table, dragging [Name] and [Amount] into the rows and [Count(Specific_Amount)] into the values, you will have the answer you are looking for. To get the layout you want, change the layout to tabular form and remove the subtotals.