I have a column of IDs with a large volume of data, and I want to retrieve the unique values, the duplicates, etc. in a single query that displays the data in the format below. Please suggest a query.
ID_Duplicates: a b c
ID_null : w, x, y, z
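To list each ID only once, use the DISTINCT keyword. To find the duplicate entries themselves, group by the ID and keep the groups where COUNT(*) is greater than 1.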
Do some programming like:
if(duplicate){
}
else{
}
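The duplicate/else branching above can also be pushed into the query itself. The following is a minimal sketch, assuming a hypothetical table `ids` with a single column `id`, run against an in-memory SQLite database through Python's `sqlite3` module; the syntax may need adjusting for other databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and sample data, for illustration only.
conn.execute("CREATE TABLE ids (id TEXT)")
conn.executemany(
    "INSERT INTO ids VALUES (?)",
    [("a",), ("a",), ("b",), ("c",), ("c",), ("c",), (None,)],
)

# One row per distinct id, labelled as null, duplicate, or unique.
rows = conn.execute(
    """
    SELECT id,
           COUNT(*) AS occurrences,
           CASE WHEN id IS NULL THEN 'null'
                WHEN COUNT(*) > 1 THEN 'duplicate'
                ELSE 'unique'
           END AS status
    FROM ids
    GROUP BY id
    ORDER BY id
    """
).fetchall()

for row in rows:
    print(row)
```

Adding `HAVING COUNT(*) > 1` after the `GROUP BY` would restrict the output to the duplicates only.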
I need to count the number of times each item appears in a column and return the different counts in a query. In detail, I have a table that saves contact logs, and I need to report the number of logs each person entered, grouped by x, y, z... where x, y and z are all entries of the same column.
Can someone please help me with this? I am working in MS Access
You'll have to do something like this:
SELECT Column_With_XYZ, Count(Column_With_XYZ) FROM TABLE_NAME group by Column_With_XYZ;
That way you will get the value of the Column_With_XYZ, and the number of times it appears.
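As a quick check of the GROUP BY / COUNT pattern, here is a small sketch using Python's `sqlite3` module with an in-memory database. The table name `contact_logs`, the column `person`, and the sample rows are assumptions made up for the example; the same SELECT shape should carry over to Access.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical contact-log table: one row per log entry.
conn.execute("CREATE TABLE contact_logs (person TEXT)")
conn.executemany(
    "INSERT INTO contact_logs VALUES (?)",
    [("x",), ("y",), ("x",), ("z",), ("x",), ("y",)],
)

# One output row per distinct person, with the number of log entries.
counts = conn.execute(
    """
    SELECT person, COUNT(person) AS log_count
    FROM contact_logs
    GROUP BY person
    ORDER BY person
    """
).fetchall()

for person, log_count in counts:
    print(person, log_count)
```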
I have two tables in MS Access, and I am trying to add a field to one of them that tells which record in the other table has the closest value that is still less than the first table's value. I have this query so far (just a SELECT statement to test the output without altering existing tables), but it lists all values that are less than the querying value:
SELECT JavaClassFileList.ClassFile, ModuleList.Module
FROM JavaClassFileList, ModuleList
WHERE ModuleList.[Order] < JavaClassFileList.[Order];
I tried things like SELECT JavaClassFileList.ClassFile, MAX(ModuleList.Module), which displays only the maximum module, and combined it with the SELECT statement above, but it reported that it would return only one record.
Desired output: suppose one table has records a, b, and c, each storing various information. In one column, a stores 732, b stores 731, and c stores 720. In the other table, d stores 730 and e stores 718. I want output like this (ordered largest to smallest):
a 732 d 730
b 731 d 730
c 720 e 718
There can be duplicates on the right, but no duplicates on the left. How can I get this result?
I would approach this type of query using a correlated subquery. I think the following works in Access:
SELECT jc.ClassFile,
       (select top 1 ml.Module
        from ModuleList as ml
        where ml.[Order] < jc.[Order]
        order by ml.[Order] desc
       ) as Module
FROM JavaClassFileList as jc;
I'm assuming Order is unique for Module. If it isn't, JavaClassFile records may show up multiple times in the result set.
If no module can be found for a JavaClassFile then it will not show up in the results. If you do want it to show up in cases like that (with a null module), replace INNER JOIN with LEFT OUTER JOIN.
SELECT j.ClassFile, m.Module
FROM JavaClassFileList AS j
INNER JOIN ModuleList AS m
ON m.[Order] =
    (SELECT MAX(m2.[Order])
     FROM ModuleList AS m2
     WHERE m2.[Order] < j.[Order]);
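To see the "closest lower value" join in action, here is a minimal sketch that loads the sample values from the question (a=732, b=731, c=720, d=730, e=718) into an in-memory SQLite database via Python's `sqlite3` module. SQLite quotes the reserved word `Order` with double quotes instead of Access's square brackets, and the alias `m2` is introduced only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE JavaClassFileList (ClassFile TEXT, "Order" INTEGER);
    CREATE TABLE ModuleList (Module TEXT, "Order" INTEGER);
    INSERT INTO JavaClassFileList VALUES ('a', 732), ('b', 731), ('c', 720);
    INSERT INTO ModuleList VALUES ('d', 730), ('e', 718);
    """
)

# For each class file, join to the module whose Order is the largest
# value that is still strictly less than the class file's Order.
rows = conn.execute(
    """
    SELECT j.ClassFile, j."Order", m.Module, m."Order"
    FROM JavaClassFileList AS j
    INNER JOIN ModuleList AS m
      ON m."Order" = (SELECT MAX(m2."Order")
                      FROM ModuleList AS m2
                      WHERE m2."Order" < j."Order")
    ORDER BY j."Order" DESC
    """
).fetchall()

for row in rows:
    print(*row)
```

With the sample data this pairs a and b with d, and c with e, matching the desired output in the question.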
I work for a small company dealing with herbal ingredients. We regularly measure the effectiveness of the ingredients, based on the "product mix" (how much of ingredients A, B and C each product contains). I have a table with thousands of rows, like the following:
    PRODUCT  Ingredient A  Ingredient B  Ingredient C  EFFECTIVENESS
1   A        28            94            550           4,1
2   B        50            105           400           4,3
3   C        30            104           312           3,5
..  Etc      etc           etc           etc           Etc
What I want as a result is the table below. I have been using Excel for the last few years, but it is difficult to handle millions of rows there, so I would now like to have something similar in SQL. I made several attempts with PIVOT and subqueries, but I did not manage to get the result I needed.
In particular, the first three columns hold various ranges / criteria. The column 'Average Effectiveness' holds the average effectiveness of the 'Total products' that meet these criteria. Because there are hundreds of ranges (for Ingredient A alone I have more than 100 different ranges, and similarly for Ingredients B and C), I would like a way to generate all combinations of the A, B and C ranges automatically.
Ingr. A  Ingr. B  Ingr. C  Total products  Average Effectiveness
1-10     50-60    90-110   ???             ???
1-10     50-60    110-130  ???             ??
1-10     50-60    130-150  ????            ??
1-10     60-70    150-170  ???             ??
10-20    60-70    90-110   ???             ??
10-20    60-70    110-130  ???             ??
10-20    60-70    130-150  ??              ??
Etc      etc
I'm unable to give a more specific answer, but I think what you need to do is:
1. Use CUBE to get all of the combinations and to aggregate the SUM and AVG values (see "Summarizing Data Using CUBE").
2. Have the CUBE query take its data from a nested query that stores the range a value falls in rather than the actual value. You can refer to SQL's CASE expression for more information on transforming the data this way.
So, in other words, first you transform your data so that you're storing which range a value occurs in. Then you summarize that transformed data using CUBE to get all the combinations. So #1 is the outer query and #2 is the inner query.
Here is a very rough idea of what the query might look like, just to give you an idea:
SELECT Ingr_A, Ingr_B, Ingr_C, COUNT(*), AVG(Effectiveness)
FROM
    (SELECT
        Product,
        Effectiveness,
        "Ingr_A" =
            CASE
                WHEN Ingredient_A >= 10 and Ingredient_A < 20 THEN '[10, 20)'
                WHEN Ingredient_A >= 20 and Ingredient_A < 30 THEN '[20, 30)'
                ...
            END,
        "Ingr_B" =
            CASE
                (like above)
            END,
        "Ingr_C" =
            (etc.)
    FROM ProductsTable) AS binned
GROUP BY Ingr_A, Ingr_B, Ingr_C WITH CUBE
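The binning step can be tried out on a small scale. The sketch below uses Python's `sqlite3` module with an in-memory database; it swaps the CASE expression for integer division (a fixed bin width of 10, chosen arbitrarily) and uses a plain GROUP BY, because SQLite has no WITH CUBE. Products A, B and C come from the question's sample table; product D is a hypothetical extra row added so that one bin contains two products.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE ProductsTable (
           Product TEXT,
           Ingredient_A INTEGER,
           Ingredient_B INTEGER,
           Ingredient_C INTEGER,
           Effectiveness REAL)"""
)
conn.executemany(
    "INSERT INTO ProductsTable VALUES (?, ?, ?, ?, ?)",
    [
        ("A", 28, 94, 550, 4.1),
        ("B", 50, 105, 400, 4.3),
        ("C", 30, 104, 312, 3.5),
        ("D", 25, 98, 555, 3.9),  # hypothetical row, not from the question
    ],
)

BIN_WIDTH = 10  # assumed width; each bin is [lower, lower + BIN_WIDTH)

# Inner query: replace each value by the lower bound of its bin.
# Outer query: count products and average effectiveness per bin combination.
rows = conn.execute(
    """
    SELECT Ingr_A, Ingr_B, Ingr_C,
           COUNT(*) AS total_products,
           AVG(Effectiveness) AS avg_effectiveness
    FROM (SELECT Effectiveness,
                 (Ingredient_A / :w) * :w AS Ingr_A,
                 (Ingredient_B / :w) * :w AS Ingr_B,
                 (Ingredient_C / :w) * :w AS Ingr_C
          FROM ProductsTable) AS binned
    GROUP BY Ingr_A, Ingr_B, Ingr_C
    ORDER BY Ingr_A, Ingr_B, Ingr_C
    """,
    {"w": BIN_WIDTH},
).fetchall()

for row in rows:
    print(row)
```

Only bin combinations that actually contain products appear in the output. On SQL Server, WITH CUBE would additionally add subtotal rows for each subset of the three grouping columns.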
I have a Pig question about converting the columns of a table into tuples so that I can pass them to a UDF. Details follow.
There is a result "C" which looks like following if I do "dump C"
(a1,b1,c1)
(a2,b2,c2)
I want to extract each column as a tuple, as follows:
(a1,a2,a3), (b1,b2,b3), (c1,c2,c3)
and then call a UDF on each possible pair of tuples:
UDF((a1,a2,a3), (b1,b2,b3))
UDF((a1,a2,a3), (c1,c2,c3))
UDF((c1,c2,c3), (b1,b2,b3))
How do I do this in PIG?
You can get all of the values for a given "column" by using GROUP .. ALL and then using bag projection:
-- grpd has two fields: group ('all') and a bag named C holding every record
grpd = GROUP C ALL;
udfs =
    FOREACH grpd
    GENERATE
        UDF(C.a, C.b),
        UDF(C.a, C.c),
        UDF(C.c, C.b);
Note, however, that the values for each column will be stored in bags rather than tuples. This is proper, because relations in Pig do not guarantee that the records are ordered in any particular way. So your UDF should be comparing bags and not rely on the order of the elements.
However, it may be important that you be able to compare values that were originally in the same row; i.e., match up a1 with b1, etc. For this, you will need to write your UDF to take a single bag, with each tuple containing the paired elements a_n and b_n. To do this, use bag projection of two columns:
-- C.(a,b) projects two fields at once, so each tuple keeps a_n next to b_n
grpd = GROUP C ALL;
udfs =
    FOREACH grpd
    GENERATE
        UDF(C.(a,b)),
        UDF(C.(a,c)),
        UDF(C.(c,b));
Again, the tuples will not necessarily be in order, so you should not rely on their order. Your bag will contain the tuples (a1,b1), (a2,b2), etc.
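To illustrate why the paired form is safe even without ordering, here is a plain Python sketch (not Pig, and not a real Pig UDF). The list `rows` stands in for relation C with numeric values, and `dot_product` is a hypothetical order-insensitive function of the kind a UDF over a bag of (a_n, b_n) tuples could compute.

```python
def dot_product(bag):
    """Sum of a_n * b_n over a bag of (a_n, b_n) tuples; order does not matter."""
    return sum(a * b for a, b in bag)


# Stand-in for relation C with fields (a, b, c); values are made up.
rows = [(1, 4, 7), (2, 5, 8), (3, 6, 9)]

# Analogous to projecting two columns together: each tuple keeps its row's pairing.
pairs_ab = [(a, b) for a, b, _ in rows]

# Simulate the bag arriving in a different order.
shuffled = list(reversed(pairs_ab))

result = dot_product(pairs_ab)
print(result, dot_product(shuffled))
```

Because each tuple carries both values from the same row, reordering the bag does not change the result. Passing the two columns as separate bags would lose that pairing.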
I am joining against a view in a stored procedure to try to return a record.
In one table I have something like this:
Measurement | MeasurementType | Date Performed
I need to plot TireHeight and TireWidth.
So I am flattening that table into one row.
I only want to return a row though if both TireHeight and TireWidth were measured on that date. I am not using the date performed for anything other than joining TireWidth and TireHeight together. I run a calculation on these 2 numbers for my chart point, and use TireAge for the other axis.
How can I exclude a result row if either TireHeight or TireWidth is not available?
Thanks!
You'd use an INNER JOIN to only return rows when they are present in both tables. For example:
SELECT th.DatePerformed
, th.Measurement as TireHeight
, tw.Measurement as TireWidth
FROM (
SELECT DatePerformed, Measurement
FROM Measurements
WHERE MeasurementType = 'TireHeight'
) th
INNER JOIN (
SELECT DatePerformed, Measurement
FROM Measurements
WHERE MeasurementType = 'TireWidth'
) tw
ON tw.DatePerformed = th.DatePerformed
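A small sketch of this self-join, run through Python's `sqlite3` module against an in-memory database. The `Measurements` table layout follows the query above; the dates and values are made up, with one date deliberately missing its TireWidth row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE Measurements (
           Measurement REAL,
           MeasurementType TEXT,
           DatePerformed TEXT)"""
)
conn.executemany(
    "INSERT INTO Measurements VALUES (?, ?, ?)",
    [
        (25.0, "TireHeight", "2020-01-01"),
        (8.5, "TireWidth", "2020-01-01"),
        (24.5, "TireHeight", "2020-02-01"),  # no TireWidth on this date
    ],
)

# The INNER JOIN keeps only dates that have both measurement types.
rows = conn.execute(
    """
    SELECT th.DatePerformed,
           th.Measurement AS TireHeight,
           tw.Measurement AS TireWidth
    FROM (SELECT DatePerformed, Measurement
          FROM Measurements
          WHERE MeasurementType = 'TireHeight') AS th
    INNER JOIN (SELECT DatePerformed, Measurement
                FROM Measurements
                WHERE MeasurementType = 'TireWidth') AS tw
      ON tw.DatePerformed = th.DatePerformed
    """
).fetchall()

for row in rows:
    print(row)
```

The 2020-02-01 height has no matching width, so that date drops out of the result.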