How to merge columns of non-unique rows in a database? (Sybase ASE) - sql

Consider the data as:
|Column 1|Column 2|Column 3|
----------------------------
|A |Tom |1 |
|A |Tom |2 |
|B |Ron |3 |
There are a few duplicates in Column 1 that are preventing me from creating an index. I only need to create an index on Column 1.
How do I merge/flatten the values to get something like:
|Column 1|Column 2|Column 3|
----------------------------
|A |Tom |1,2 |
|B |Ron |3 |
How do we do this without using concatenate/LIST/STUFF? The database is Sybase ASE.

You'll have to write a loop to do this. But if you only want to create that index, why not create it as non-unique?
If you have to create it as unique, just add an identity column to the table and create the index on Column 1 plus the identity column (or use the 'auto identity' database option).
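A minimal T-SQL sketch of both options, assuming a hypothetical table my_table whose first column is named Column1:
-- Option 1: a plain non-unique index on Column1 (duplicates are allowed)
create index ix_col1 on my_table (Column1)
go
-- Option 2: add an identity column, then build a unique composite index on it
alter table my_table add row_id numeric(10,0) identity
go
create unique index ix_col1_rowid on my_table (Column1, row_id)
go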

Related

How to append data to a column value in dataframe

In Spark, I have a dataframe with a column named goals which holds a numeric value. I just want to append the string "goal" or "goals" to the actual value.
I want to print it as:
if value = 1 then "1 goal"
if value = 2 then "2 goals", and so on.
My data looks like this:
val goalsDF = Seq(("meg", 2), ("meg", 4), ("min", 3),
("min2", 1), ("ss", 1)).toDF("name", "goals")
goalsDF.show()
+-----+-----+
|name |goals|
+-----+-----+
|meg |2 |
|meg |4 |
|min |3 |
|min2 |1 |
|ss |1 |
+-----+-----+
Expected Output:
+-----+---------+
|name |goals |
+-----+---------+
|meg |2 goals |
|meg |4 goals |
|min |3 goals |
|min2 |1 goal |
|ss |1 goal |
+-----+---------+
I tried the code below but it doesn't work and prints the data as null:
goalsDF.withColumn("goals", col("goals") + lit("goals")).show()
+----+-----+
|name|goals|
+----+-----+
| meg| null|
| meg| null|
| min| null|
|min2| null|
| ss| null|
+----+-----+
Please suggest if we can do this inside .withColumn() without any additional user-defined method.
You should use case when. It's a pyspark example but you should be able to reference it and adapt it for Scala.
DF.withColumn('goals',
    F.when(F.col('goals') == 1, F.lit('1 goal'))
     .otherwise(F.concat_ws(' ', F.col('goals'), F.lit('goals'))))
For scala example see here: https://stackoverflow.com/a/37108127/5899997
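If the linked post isn't handy, a rough Scala equivalent of the same idea (assuming the goalsDF from the question; the cast to string is just a precaution) could look like:
import org.apache.spark.sql.functions.{col, concat, lit, when}
// append " goal" when goals == 1, otherwise " goals"
val labelled = goalsDF.withColumn("goals",
  when(col("goals") === 1, concat(col("goals").cast("string"), lit(" goal")))
    .otherwise(concat(col("goals").cast("string"), lit(" goals"))))
labelled.show(false)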

SQL triggers - conditionals and math operations

I have a database table, let's say like this:
|a |b |c |d |e |
----------------------------------
|0 |12 |NA | | |
|NA|NA |30 | 42 | |
|NA|NA |NA | |53 |
I'd like to do a few things:
for the first row - set the value of column D to sin(b)
for the second row - if the value of column B is NA, copy the previous known values (i.e. (a,b)=(0,12))
for the third row - if the value of column B is NA, copy the previous known values (i.e. (a,b)=(0,12)) and set the value of column D to sin(b).
Those are three examples of the operations I have to execute on my database for each row.
If triggers are not the right solution for this, I'd like a recommendation on how to solve it.
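Triggers can cover the sin(b) part; a minimal sketch (MySQL-style syntax, with a hypothetical table my_table and NULL standing in for NA) is shown below. The "copy the previous known value" part needs some notion of row order, so it is usually easier to do with an UPDATE or in application code:
DELIMITER //
CREATE TRIGGER fill_d BEFORE INSERT ON my_table
FOR EACH ROW
BEGIN
    -- only compute d when b is present
    IF NEW.b IS NOT NULL THEN
        SET NEW.d = SIN(NEW.b);
    END IF;
END//
DELIMITER ;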

How to add more rows in pyspark df by column value

I've been stuck on this problem for quite a while and am probably making it bigger than it really is. I will try to simplify it.
I'm using pyspark and dataframe functions throughout my code.
I already have a df as:
+--+-----+---------+
|id|col1 |col2 |
+--+-----+---------+
|1 |Hello|Repeat |
|2 |Word |Repeat |
|3 |Aux |No repeat|
|4 |Test |Repeat |
+--+-----+---------+
What I want to achieve is to repeat the df's rows when col2 is 'Repeat', appending an increasing counter (1, 2, 3) to col1's value:
+--+-----+---------+------+
|id|col1 |col2 |col3 |
+--+-----+---------+------+
|1 |Hello|Repeat |Hello1|
|1 |Hello|Repeat |Hello2|
|1 |Hello|Repeat |Hello3|
|2 |Word |Repeat |Word1 |
|2 |Word |Repeat |Word2 |
|2 |Word |Repeat |Word3 |
|3 |Aux |No repeat|Aux |
|4 |Test |Repeat |Test1 |
|4 |Test |Repeat |Test2 |
|4 |Test |Repeat |Test3 |
+--+-----+---------+------+
My first approach was to use the withColumn operator to create a new column with the help of a udf:
my_func = udf(lambda words: (words + str(i + 1 for i in range(3))), StringType())
df = df\
.withColumn('col3', when(col('col2') == 'No Repeat', col('col1'))
.otherwise(my_func(col('col1'))))
But when I evaluate this with df.show(10, False) it throws an error. My guess is that I just can't create more rows with the withColumn function in that way.
So I decided to go for another approach, also with no success, using rdd.flatMap:
test = df.rdd.flatMap(lambda row: (row if (row.col2== 'No Repeat') else (row.col1 + str(i+1) for i in range(3))))
print(test.collect())
But here I'm losing the df schema, and I can't output the full row in the else condition; it only gives me the col1 words plus its iterator.
Do you know any proper way to solve this?
In the end, my problem is that I can't find a proper way to create more rows based on column values because I'm quite new to this world. Also, the answers that I found don't seem to fit this problem.
All help will be appreciated.
One way is to use a condition and assign an array, then explode:
import pyspark.sql.functions as F
(df.withColumn("test",F.when(df['col2']=='Repeat',
F.array([F.lit(str(i)) for i in range(1,4)])).otherwise(F.array(F.lit(''))))
.withColumn("col3",F.explode(F.col("test"))).drop("test")
.withColumn("col3",F.concat(F.col("col1"),F.col("col3")))).show()
A neater version of the same, as suggested by @MohammadMurtazaHashmi, would look like:
(df.withColumn("test",F.when(df['col2']=='Repeat',
F.array([F.concat(F.col("col1"),F.lit(str(i))) for i in range(1,4)]))
.otherwise(F.array(F.col("col1"))))
.select("id","col1","col2", F.explode("test"))).show()
+---+-----+---------+------+
| id| col1| col2| col3|
+---+-----+---------+------+
| 1|Hello| Repeat|Hello1|
| 1|Hello| Repeat|Hello2|
| 1|Hello| Repeat|Hello3|
| 2| Word| Repeat| Word1|
| 2| Word| Repeat| Word2|
| 2| Word| Repeat| Word3|
| 3| Aux|No repeat| Aux|
| 4| Test| Repeat| Test1|
| 4| Test| Repeat| Test2|
| 4| Test| Repeat| Test3|
+---+-----+---------+------+

Select rows with same id but different result in another column

sql: I have a table like this:
+------+------+
|ID |Result|
+------+------+
|1 |A |
+------+------+
|2 |A |
+------+------+
|3 |A |
+------+------+
|1 |B |
+------+------+
|2 |B |
+------+------+
The output should be something like:
Output:
+------+-------+-------+
|ID |Result1|Result2|
+------+-------+-------+
|1 |A |B |
+------+-------+-------+
|2 |A |B |
+------+-------+-------+
|3 |A | |
+------+-------+-------+
How can I do this?
SELECT
    Id,
    MAX(CASE result WHEN 'A' THEN 'A' ELSE NULL END) result1,
    MAX(CASE result WHEN 'B' THEN 'B' ELSE NULL END) result2
FROM
    table1
GROUP BY Id
results
+------+-------+-------+
|Id |Result1|Result2|
+------+-------+-------+
|1 |A |B |
|2 |A |B |
|3 |A |NULL |
+------+-------+-------+
Run a live demo on SQL Fiddle: http://sqlfiddle.com/#!9/e1081/2
There are a few ways to do it. None of them are straightforward.
In theory, a simple way would be to create 2 temporary tables where you separate the data, all the "A" results in one table and "B" in another table. Then get the results with a simple query using JOIN (a rough sketch follows the example below).
If you are allowed to use some scripting in the process then it is simpler, otherwise you need more complex logic in your query. And for your query to always work, you need some rules, like: the A table always contains more ids than the B table.
If you post your real example, it is easier to get better answers.
For this reason, here is my real example:
ID Name filename
1001 swapan 4566.jpg
1002 swapan 678.jpg
1003 karim 7688.jpg
1004 tarek 7889.jpg
1005 karim fdhak.jpg
output:
ID Name filename
1001 swapan 4566.jpg 678.jpg
1003 karim 7688.jpg fdhak.jpg
1004 tarek 7889.jpg ...
.. ... ... ...
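A rough sketch of the two-temporary-table idea from the answer above, applied to the original ID/Result data and written with derived tables instead of real temp tables for brevity (table name table1 as in the first answer):
SELECT a.id, a.result AS result1, b.result AS result2
FROM (SELECT id, result FROM table1 WHERE result = 'A') a
LEFT JOIN (SELECT id, result FROM table1 WHERE result = 'B') b
       ON b.id = a.id
The LEFT JOIN from the "A" side relies on the rule mentioned above: the A set must cover at least the ids present in the B set.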

How to Merge 2 Rows into one by comma separate?

I need to merge these individual rows into one column. I know how to merge a column's values with comma separation.
+---------------+-------+-------+
|CID |Flag |Value |
+---------------+-------+-------+
|1 |F |10 |
|1 |N |20 |
|2 |F |12 |
|2 |N |23 |
|2 |F |14 |
|3 |N |21 |
|3 |N |22 |
+---------------+-------+-------+
The desired result can be either of the following:
+-----------+----------------------------+ +--------------------------+
|Part Number| Value | | Value |
+-----------+----------------------------+ +--------------------------+
| 1 | 1|F|10 ; 1|N|20 | Or | 1|F|10 ; 1|N|20 |
| 2 | 2|F|12 ; 2|N|23 ; 2|F|14 | | 2|F|12 ; 2|N|23 ; 2|F|14 |
| 3 | 3|N|21 ; 3|N|22 | | 3|N|21 ; 3|N|22 |
+-----------+----------------------------+ +--------------------------+
Note:
Any hint in the right direction with a small example is more than enough.
EDIT :
I have massive data in tables, thousands of records, where parent and child relationships are present. I have to dump this into text files as comma-separated values, with each record on a single line. Think of a primary record that has relationships with many other tables; all of these records have to be printed as one big line.
And I am trying to achieve this by building a query, so the load can be distributed to the database and the only thing I have to worry about on the business side is the logic for dumping into text files or whatever form we need in the future.
You can try to use LISTAGG and your query will look like this:
select a.cid, a.cid || listagg(a.flag || '|' || a.value, ',')
from foo.dat a
group by a.cid
You can use different separators and of course play with how the result will be formatted.
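If the target database supports Oracle-style LISTAGG, a variant that matches the desired cid|flag|value format (keeping the hypothetical table foo.dat from the query above) might look like:
select a.cid as part_number,
       listagg(a.cid || '|' || a.flag || '|' || a.value, ' ; ')
         within group (order by a.flag) as merged_value
from foo.dat a
group by a.cid
The within group clause just fixes the order of the concatenated pieces; any deterministic ordering will do.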