replacing values in pig latin - apache-pig

I have a dataset of the form:
id1, id2, id3
Any of id1, id2, or id3 (one, two, or all three) can be missing in a record.
If id1 is missing I want to replace it with 1, id2 with 3, and id3 with 7.
How do I do this?
Thanks

Use the bincond operator to test if the value is null and then replace it with the desired value. From Programming Pig, Chapter 5:
2 == 2 ? 1 : 4 --returns 1
2 == 3 ? 1 : 4 --returns 4
null == 2 ? 1 : 4 -- returns null
2 == 2 ? 1 : 'fred' -- type error, both values must be of the same type
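In your example, applied to each field inside a FOREACH ... GENERATE (the Pig documentation says a bincond should be enclosed in parentheses):
(id1 IS NULL ? 1 : id1), (id2 IS NULL ? 3 : id2), (id3 IS NULL ? 7 : id3)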
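If it helps to see the same logic outside Pig, here is a minimal Python sketch of the null replacement. It is only an illustration, not Pig code: the function name `fill_missing` and the use of `None` for a missing field are assumptions.

```python
# Defaults taken from the question: id1 -> 1, id2 -> 3, id3 -> 7.
DEFAULTS = (1, 3, 7)


def fill_missing(record, defaults=DEFAULTS):
    """Return a copy of record with each None replaced by its column default."""
    # Test for None explicitly (the analogue of IS NULL), so that a present
    # but falsy value such as 0 is kept.
    return tuple(d if v is None else v for v, d in zip(record, defaults))


print(fill_missing((None, 5, None)))  # (1, 5, 7)
print(fill_missing((0, None, 9)))     # (0, 3, 9)
```

As with the bincond, the test is "is it null", not "does it equal something", because a comparison against null yields null rather than true or false.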

Related

MSAccess - query to return result set of earliest rows with a unique combination of 2 columns

I have a table with the following columns.
ID (auto-inc)
When (datetime)
id1 (number)
id2 (number)
The combination of id1 and id2 can be unique or duplicated many times.
I need a query that returns the earliest record (by When) for each unique combination of id1+id2.
Example data:
ID When id1 id2
1 1-Jan-2020 4 5
2 1-Jan-2019 4 5
3 1-Jan-2021 4 5
4 1-Jan-2020 4 4
5 1-Jan-2019 4 4
6 1-Jan-2021 4 6
I need this to return rows 2, 5 and 6.
I cannot figure out how to do this with an SQL query.
I have tried Group By on the concatenation of id1 & id2, and I have tried "Distinct id1, id2", but neither returns the entire row of the record with the earliest When value.
If the result set can only return the ID, that is fine too; I just need to know which rows match these two requirements.
Okay, I had a few minutes to kill:
SELECT Data.* FROM Data WHERE ID IN (
SELECT TOP 1 ID FROM Data AS D
WHERE D.id1=Data.id1 AND D.id2=Data.id2 ORDER BY When);
or
SELECT Data.* FROM Data INNER JOIN (
SELECT id1, id2, Min(When) AS MW FROM Data
GROUP BY id1, id2) AS D
ON Data.When = D.MW AND Data.id1=D.id1 AND Data.id2=D.id2;
Either query returns:
ID When id1 id2
2 1/1/2019 4 5
5 1/1/2019 4 4
6 1/1/2021 4 6
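To try the second query without Access, here is a small sketch that runs the same join against an in-memory SQLite database from Python. Assumptions: SQLite instead of Access, dates stored as ISO strings so they sort correctly as text, and "When" quoted because WHEN is a keyword in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE Data (ID INTEGER PRIMARY KEY, "When" TEXT, id1 INTEGER, id2 INTEGER)'
)
conn.executemany(
    "INSERT INTO Data VALUES (?, ?, ?, ?)",
    [
        (1, "2020-01-01", 4, 5),
        (2, "2019-01-01", 4, 5),
        (3, "2021-01-01", 4, 5),
        (4, "2020-01-01", 4, 4),
        (5, "2019-01-01", 4, 4),
        (6, "2021-01-01", 4, 6),
    ],
)

# Find the earliest "When" per (id1, id2), then join back to get the full row.
query = """
SELECT Data.ID
FROM Data
INNER JOIN (
    SELECT id1, id2, MIN("When") AS MW
    FROM Data
    GROUP BY id1, id2
) AS D
  ON Data."When" = D.MW AND Data.id1 = D.id1 AND Data.id2 = D.id2
ORDER BY Data.ID
"""
earliest_ids = [row[0] for row in conn.execute(query)]
print(earliest_ids)  # [2, 5, 6]
```

Note that if two rows of the same id1+id2 combination share the earliest When, the join returns both of them.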

Select first value then fill with null for unique id

I'm using a Postgres database.
I've been trying to resolve this for hours and have read dozens of topics, with no result yet.
Since I don't know how to explain my issue in words, here is what I need, by example:
My query is
select distinct chiffre_affaires.contrat_pentete_id,
chiffre_affaires.chiffre_affaires_id,
chiffre_affaires.chiffre_affaires_montant_total
from chiffre_affaires;
Current output :
contrat_pentete_id chiffre_affaires_id chiffre_affaires_montant_total
1 1 111.7848
1 2 111.7848
1 3 111.7848
1 4 111.7848
1 5 111.7848
1 6 111.7848
2 7 90
2 8 90
2 9 90
2 10 90
Expected output :
null values can be replaced by 0, both null or 0 would work
contrat_pentete_id chiffre_affaires_id chiffre_affaires_montant_total
1 1 111.7848
1 2 null
1 3 null
1 4 null
1 5 null
1 6 null
2 7 90
2 8 null
2 9 null
2 10 null
Thank you in advance for any help !
Trying to understand what you want to achieve: for a group of rows with the same contrat_pentete_id, ordered by chiffre_affaires_id ASC, you want to display the chiffre_affaires_montant_total value for the first row, and NULL for the following rows. If so, you can try this:
SELECT DISTINCT
    ca.contrat_pentete_id,
    ca.chiffre_affaires_id,
    CASE
        -- PARTITION BY restarts first_value for each contrat_pentete_id
        WHEN ca.chiffre_affaires_id = first_value(ca.chiffre_affaires_id) OVER (PARTITION BY ca.contrat_pentete_id ORDER BY ca.chiffre_affaires_id)
        THEN ca.chiffre_affaires_montant_total
        ELSE NULL
    END AS chiffre_affaires_montant_total
FROM chiffre_affaires AS ca
ORDER BY ca.contrat_pentete_id, ca.chiffre_affaires_id;
Thanks to Edouard H. I finally wrote a script that did the job.
Here is the solution :
SELECT DISTINCT ca.contrat_pentete_id,
ca.chiffre_affaires_id,
ca.chiffre_affaires_annee_mois,
CASE
WHEN ca.chiffre_affaires_id =
first_value(ca.chiffre_affaires_id) OVER (PARTITION BY ca.contrat_pentete_id ORDER BY ca.chiffre_affaires_annee_mois)
THEN ca.chiffre_affaires_montant_total
END AS montant_facture
FROM chiffre_affaires AS ca
ORDER BY ca.contrat_pentete_id;
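For anyone who wants to check the expected output outside the database, here is a rough Python sketch of the same "keep the amount on the first row of each group, NULL afterwards" idea. The function name and the tuple layout (contract id, line id, amount) are assumptions; it orders by line id, as in the first answer, rather than by chiffre_affaires_annee_mois.

```python
def keep_first_amount(rows):
    """Blank the amount on every row except the first one of each contract.

    rows: iterable of (contract_id, line_id, amount) tuples, in any order.
    Returns a list sorted by (contract_id, line_id).
    """
    seen = set()
    result = []
    for contract_id, line_id, amount in sorted(rows, key=lambda r: (r[0], r[1])):
        if contract_id in seen:
            amount = None  # not the first row of this contract
        else:
            seen.add(contract_id)
        result.append((contract_id, line_id, amount))
    return result


sample = [(1, 2, 111.7848), (1, 1, 111.7848), (2, 8, 90), (2, 7, 90)]
print(keep_first_amount(sample))
# [(1, 1, 111.7848), (1, 2, None), (2, 7, 90), (2, 8, None)]
```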

Excluding rows based on column

I am trying to exclude rows whose value also appears in another column of a different row.
select * from TABLE1
ID1 ID2 VALUE
1 1 HIGH
2 2 MEDIUM
3 3 LOW
4 4 HIGH
5 4 HIGH
6 6 MEDIUM
All the data comes from the same table. I want to exclude ID1 = 4 because the value 4 also appears in column ID2 of row 5. The desired result is:
ID1 ID2 VALUE
1 1 HIGH
2 2 MEDIUM
3 3 LOW
6 6 MEDIUM
I tried using a simple query such as:
Select * from TABLE1 Where ID1 = ID2
But this wrongly includes row 4, as shown below. I need to exclude it because its value also appears in the ID2 column of another row:
ID1 ID2 VALUE
1 1 HIGH
2 2 MEDIUM
3 3 LOW
4 4 HIGH
6 6 MEDIUM
You just have to add the condition below; it excludes the records whose id2 appears more than once.
and id2 not in (Select id2 from table1 group by id2 having count(*) > 1)
Similarly, add one for id1 with OR.
You can use the logic in the query below.
select * from t T1
Where 2 > (Select count(1) from t T2 where T2.id2 = T1.id2);
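Here is a quick way to check that last query against the sample data, using an in-memory SQLite database from Python. SQLite and the lowercase column names are assumptions for the sake of a runnable sketch; the WHERE clause is the one from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id1 INTEGER, id2 INTEGER, value TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (1, 1, "HIGH"),
        (2, 2, "MEDIUM"),
        (3, 3, "LOW"),
        (4, 4, "HIGH"),
        (5, 4, "HIGH"),
        (6, 6, "MEDIUM"),
    ],
)

# Keep a row only if its id2 occurs fewer than 2 times in the whole table.
query = """
SELECT id1, id2, value
FROM t AS T1
WHERE 2 > (SELECT COUNT(1) FROM t AS T2 WHERE T2.id2 = T1.id2)
ORDER BY id1
"""
kept = conn.execute(query).fetchall()
print(kept)
# [(1, 1, 'HIGH'), (2, 2, 'MEDIUM'), (3, 3, 'LOW'), (6, 6, 'MEDIUM')]
```

Both rows sharing id2 = 4 are dropped, which matches the desired result in the question.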

Find all ids which lie between Range and are present in a common foreign key id

I want all the specific Ids which have a common other Id. I am sending the data through a user-defined table type.
CREATE TYPE rangeType AS TABLE (
ID2 int NOT NULL,
StartRange int NULL,
EndRange int NULL
);
The table looks like the following:
ID1 ID2 Value
11 2 3
12 2 4
12 3 8.9
15 3 10
15 2 4
The value I will send will be of the form
DECLARE @temp_table rangeType;
INSERT INTO @temp_table VALUES (2,4,10);
INSERT INTO @temp_table VALUES (3,5,10);
So I want the output to be all those ID1s that have rows for both ID2 = 2 and ID2 = 3, where the rows with ID2 = 2 have a value between 4 and 10 and the rows with ID2 = 3 have a value between 5 and 10.
So my output, in this case, should be
ID1
12
15
because ID1 12 and 15 both map to ID2 2 and 3, with values inside the respective ranges.
I tried an inner join on the table followed by a BETWEEN operator. It returns values, but the conditions are combined as an OR rather than the AND that I want.
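You can use the query below. The GROUP BY / HAVING step keeps only the ID1s that matched both ID2 values:
SELECT ID1 FROM TABLE
WHERE ID2 IN (2,3)
AND
CASE WHEN ID2 = 2 AND VALUE >= 4 AND VALUE <= 10 THEN 1
WHEN ID2 = 3 AND VALUE >= 5 AND VALUE <= 10 THEN 1
ELSE 0
END = 1
GROUP BY ID1
HAVING COUNT(DISTINCT ID2) = 2;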
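A variant that reads the ranges from a table instead of hard-coding them could look like the sketch below. It runs on an in-memory SQLite database from Python; the table names `readings` and `ranges` and the column names are made up for the illustration, and it assumes each ID2 appears at most once in the range table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (id1 INTEGER, id2 INTEGER, val REAL)")
conn.execute(
    "CREATE TABLE ranges (id2 INTEGER NOT NULL, start_range INTEGER, end_range INTEGER)"
)
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [(11, 2, 3), (12, 2, 4), (12, 3, 8.9), (15, 3, 10), (15, 2, 4)],
)
conn.executemany("INSERT INTO ranges VALUES (?, ?, ?)", [(2, 4, 10), (3, 5, 10)])

# Join each reading to its own range, then keep the id1s that satisfied
# every range (the AND the question asks for), not just one of them.
query = """
SELECT r.id1
FROM readings AS r
JOIN ranges AS g
  ON g.id2 = r.id2 AND r.val BETWEEN g.start_range AND g.end_range
GROUP BY r.id1
HAVING COUNT(DISTINCT r.id2) = (SELECT COUNT(*) FROM ranges)
ORDER BY r.id1
"""
matching = [row[0] for row in conn.execute(query)]
print(matching)  # [12, 15]
```

The join alone gives the OR behaviour described in the question; comparing the number of distinct matched ID2s with the number of ranges turns it into an AND.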

Select distinct not-null rows SQL server 2005

I ran into the following problem.
I have a table like this:
ID ID1 ID2 ID3 ID4 ID5
1 NULL NULL NULL NULL 1
2 NULL NULL NULL 2 NULL
3 NULL NULL NULL 2 1
4 3 NULL NULL 2 NULL
5 3 NULL NULL 2 1
6 NULL 5 NULL 2 NULL
And I need to get the distinct rows, in the sense that NULL equals any value. For this example the answer is:
ID ID1 ID2 ID3 ID4 ID5
5 3 NULL NULL 2 1
6 NULL 5 NULL 2 NULL
P.S. Here ID is primary key hence unique. ID1-ID5 - any integers.
Thanks in advance!
UPDATED
Saying that null equals any number I mean that it's absorbed by any number.
This works, don't know if it can be made any simpler
SELECT ID1, ID2, ID3, ID4, ID5
FROM IDS OUTT
WHERE NOT EXISTS (SELECT 1
FROM IDS INN
WHERE OUTT.ID != INN.ID AND
(ISNULL(OUTT.ID1, INN.ID1) = INN.ID1 OR (INN.ID1 IS NULL AND OUTT.ID1 IS NULL)) AND
(ISNULL(OUTT.ID2, INN.ID2) = INN.ID2 OR (INN.ID2 IS NULL AND OUTT.ID2 IS NULL)) AND
(ISNULL(OUTT.ID3, INN.ID3) = INN.ID3 OR (INN.ID3 IS NULL AND OUTT.ID3 IS NULL)) AND
(ISNULL(OUTT.ID4, INN.ID4) = INN.ID4 OR (INN.ID4 IS NULL AND OUTT.ID4 IS NULL)) AND
(ISNULL(OUTT.ID5, INN.ID5) = INN.ID5 OR (INN.ID5 IS NULL AND OUTT.ID5 IS NULL)))
EDIT: Found a sweeter alternative, if your ids never have negative numbers
SELECT ID1, ID2, ID3, ID4, ID5
FROM IDS OUTT
WHERE NOT EXISTS (SELECT 1
FROM IDS INN
WHERE OUTT.ID != INN.ID AND
coalesce(OUTT.ID1, INN.ID1,-1) = isnull(INN.ID1,-1) AND
coalesce(OUTT.ID2, INN.ID2,-1) = isnull(INN.ID2,-1) AND
coalesce(OUTT.ID3, INN.ID3,-1) = isnull(INN.ID3,-1) AND
coalesce(OUTT.ID4, INN.ID4,-1) = isnull(INN.ID4,-1) AND
coalesce(OUTT.ID5, INN.ID5,-1) = isnull(INN.ID5,-1))
EDIT2: There is one case where it won't work: when two rows (with different IDs) have exactly the same values. I am assuming that does not occur. If it does, first create a view with a SELECT DISTINCT on the base table, and then apply this query to the view.
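The rule both queries implement ("a row is dropped if some other row agrees with it everywhere it is not NULL") can be sketched in a few lines of Python, which may make the SQL easier to follow. The names `absorbed_by` and `drop_absorbed` are made up for this sketch, and `None` stands in for NULL.

```python
def absorbed_by(a, b):
    """True if every non-None field of a equals the same field of b."""
    return all(x is None or x == y for x, y in zip(a, b))


def drop_absorbed(rows):
    """Keep only the rows that no other row absorbs.

    rows: dict mapping the primary key to a tuple (ID1..ID5) of nullable ints.
    """
    return {
        key: row
        for key, row in rows.items()
        if not any(absorbed_by(row, other) for k, other in rows.items() if k != key)
    }


table = {
    1: (None, None, None, None, 1),
    2: (None, None, None, 2, None),
    3: (None, None, None, 2, 1),
    4: (3, None, None, 2, None),
    5: (3, None, None, 2, 1),
    6: (None, 5, None, 2, None),
}
print(sorted(drop_absorbed(table)))  # [5, 6]
```

It has the same caveat as EDIT2: two rows with identical values absorb each other, so both are dropped.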
Statement of your problem as I understand it:
You start with the full table:
ID ID1 ID2 ID3 ID4 ID5
1 NULL NULL NULL NULL 1
2 NULL NULL NULL 2 NULL
3 NULL NULL NULL 2 1
4 3 NULL NULL 2 NULL
5 3 NULL NULL 2 1
6 NULL 5 NULL 2 NULL
Then you eliminate "duplicate" rows, i.e. rows that have fewer, but otherwise the same, values as other rows (ignoring NULL; the ID column is not included):
Row 1 is eliminated because row 3 is identical, but has more values in the places where row 1 has NULL.
Row 2 likewise gets eliminated by (either of) row 3 or 4.
Rows 3 and 4 are eliminated by row 5.
You're then left with rows 5 and 6:
ID ID1 ID2 ID3 ID4 ID5
5 3 NULL NULL 2 1
6 NULL 5 NULL 2 NULL
My answer:
Frankly, I don't see how this could be done with SQL's SELECT DISTINCT, or more generally, with SQL's set-based logic. I could imagine that you might be able to do this kind of filtering with a more procedural approach (e.g. with cursors) — but I can't provide a solution for this.
A note about terminology:
"NULL equals any value"
NULL never equals any value, because NULL is itself not a value; it is the absence of a value. NULL essentially means "unknown". (The fact that NULL is not a value is the reason why you shouldn't write IDx = NULL, but IDx IS NULL instead.)
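A quick way to see this for yourself, sketched with Python's built-in sqlite3 module (SQLite is an assumption here; other databases may print the unknown result differently, but the principle is the same):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# "= NULL" yields NULL (unknown), while "IS NULL" yields a real truth value.
equals_null, is_null = conn.execute("SELECT NULL = NULL, NULL IS NULL").fetchone()
print(equals_null)  # None
print(is_null)      # 1
```

A WHERE clause only keeps rows whose condition is true, so a condition written as IDx = NULL never matches anything.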
If ID1, ID2 (...) always hold the same value wherever they are not NULL, as in your example, you could do this:
Select
SUM(id1)/COUNT(id1),
SUM(id2)/COUNT(id2),
SUM(id3)/COUNT(id3),
SUM(id4)/COUNT(id4),
SUM(id5)/COUNT(id5) From TABLE
The SUM and COUNT functions ignore NULL values.
But I'm still a little confused by your question. :)