I need to merge data into different columns of table. I need to use stored proc to merge these data.
I have a(i = 1 to x), b, c and d as parameters
a(i = 1 to x) parameters depend on the user entry...
the other thing I need to check is at
if a(y) = b
then update b and c to table 1
else delete data from table 1 and update values a(y) to b into 4 different tables
Here y is some value in between 1 to x
should i do multiple dataconnections or use one data connection and do everything in sql?? what is the best way to tackle this problem...
If second option please samples would be a great help... Just sql side would be great!!!
Related
I have a table whose structure looks like the following:
k | i | p | v
Notice that the key (k) is not unique, there are no keys, nothing. Each key can have multiple attributes (i = 0, 1, 2, ...) which can be of different types (p) and have different values (v). One attribute type may also appear multiple times (p(i-1) = p(i)).
What I want to do is pick certain attribute types and their corresponding values and place them in the same row. For example I want to have:
k | attr_name1 | attr_name2
I have managed to make a query that does this and works for all keys (k) for which attr_name1 and attr_name2 appear in the column p of the initial table:
SELECT DISTINCT ON (key) fn.k AS key, fn.v AS attr_name1, a.v AS attr_name2
FROM Table fn
LEFT JOIN Table a ON fn.k = a.k
AND a.p = 'attr_name2'
WHERE fn.p = 'attr_name1'
I would like, however, to take into account the case where a certain key has no attribute named attr_name1 and insert a NULL value into the corresponding column of the new table. I am not sure how to achieve that. I have no issue using multiple queries or intermediate tables etc, but there are quite a lot of rows in the table and I need something that scales to millions of rows.
Any help would be appreciated.
Example:
k i p v
1 0 a 10
1 1 b 12
1 2 c 34
1 3 d 44
1 4 e 09
2 0 a 11
2 1 b 13
2 2 d 22
2 3 f 34
Would turn into (assuming I am only interested in columns a, b, c):
k a b c
1 10 12 34
2 11 13 NULL
I would use conditional aggregation. That is, an aggregate function around a CASE expression.
SELECT
k,
MAX(CASE WHEN p='a' THEN v END) AS a,
MAX(CASE WHEN p='b' THEN v END) AS b,
MAX(CASE WHEN p='c' THEN v END) AS c
FROM
your_table
GROUP BY
k
This presumes that (k, p) is unique. If there are duplicate keys, this will clearly find the one v with the highest value (for each (k,p))
As a general rule this kind of pivoting makes the data harder to process in SQL. This is often done for display purposes because humans find this easier to read. However, from a software engineering perspective, such formatting should not be done in the data layer; be careful that by doing this you don't actually make your future life harder.
I have two dataset data1 and data2
data data1;
input sn id $;
datalines;
1 a
2 a
3 a
;
run;
data data2;
input id $ sales x $;
datalines;
a 10 x
a 20 y
a 30 z
a 40 q
;
run;
I am merging them from below code:
data join;
merge data1(in=a) data2(in=b);
by id;
if a and b;
run;
Result: (I was expecting an Inner Join result which is not the case)
1 a 10 x
2 a 20 y
2 a 30 z
2 a 40 w
Result from proc sql inner join.
proc sql;
select data1.id,sn,sales,x from data2 inner join data1 on data1.hh_id;
quit;
Result: (As expected from an inner join)
a 1 10 x
a 1 20 y
a 1 30 z
a 1 40 w
a 2 10 x
a 2 20 y
a 2 30 z
a 2 40 w
b 3 10 x
b 3 20 y
b 3 30 z
b 3 40 w
I want to know the concept and STEP BY STEP working of merge statement in SAS with In= and proving the above result.
PS: I have read this, and it says
An obvious use for these variables is to control what kind of 'merge'
will occur, using if statements. For example, if
ThisRecordIsFromYourData and ThisRecordIsFromOtherData; will make SAS
only include rows that match on the by variables from both input data
sets (like an inner join).
which I guess, (like an Inner Join) is not always the case.
Basically, this is a result of the difference in how the SAS data step and SQL process their respective join/merges.
SQL creates a separate record for each possible combination of keys. This is a Cartesian Product (at the key level).
SAS data step, however, process merges very differently. MERGE is really nothing more than a special case of SET. It still processes rows iteratively, one at a time - it never goes back, and never has more than one row from any dataset in the PDV at once. Thus, it cannot create a Cartesian product in its normal process - that would require random access, which the SAS datastep doesn't do normally.
What it does:
For each unique BY value
Take the next record from the left side dataset, if one exists with that BY value
Take the next record from the right side dataset, if one exists with that BY value
Output a row
Continue until both datasets are exhausted for that BY value
With BY values that yield unique records per value on either side (or both), it is effectively identical to SQL. However, with BY values that yield duplicates on BOTH sides, you get what you have there: a side-by-side merge, and if one runs out before the other, the values from the last row of the shorter dataset (for that by value) are more-or-less copied down. (They're actually RETAINED, so if you overwrite them with changes, they will not reset on new records from the longer dataset).
So, if left has 3 records and right has 4 records for key value a, like in your example, then you get data from the following records (assuming you don't alter the data after):
left right
1 1
2 2
3 3
3 4
For example, the dataset a is
id x
1 15
2 25
3 35
4 45
I want to add a column y to dataset a, y being the average of x excluding the current id.
so y_1 = (x_2+x_3+x_4)/3 = (25+35+45)/3.
Easiest way to do it without SQL is to add the mean and the n to each row (use PROC MEANS, then merge on the values), and then use math to remove the current value. IE, if x_mean=(15+25+35+45)/4 = 30, and x=15, then
x_mean_others = ((30*4)-15)/(4-1) = 105/3 = 35
Alternateively, in SQL, you can calculate it on the fly with the same idea.
proc sql;
create table want as
select x, (mean(x)*n(x) - x)/(n(x)-1) as y
from have H
;
quit;
This takes advantage of SAS's automatic remerging, in something like SQL Server you'd need a WITH clause to make this work I imagine.
Example:
column A column B
A 1
A 2
B 2
B 2
C 1
C 1
I would somehow like to get the following result:
column A column B
A 1.5
B 2
C 1
(which are averages of 1 and 2, 2 and 2 and 1 and 1)
How do I achieve that?
Thanks
If you're using Excel 2007 or above, you can also use the shorter AVERAGEIF function:
=AVERAGEIF($A$1:$A:$6,D1,$B$1:$B$6)
Less typing, easier to read..
In D1:D3, type A, B, C. Then in E1, put this formula
=SUMIF($A$1:$A$6,D1,$B$1:$B$6)/COUNTIF($A$1:$A$6,D1)
and fill down to E3. If you want to replace the existing data, copy E1:E3 and paste-special-values over itself. Then delete A:C.
Alternatively, you can add headers to your data, say "Letter" and "Number". Then create a Pivot Table from your data. Put Letter in the rows section and Number in the Data section. Change your Data section from SUM to AVERAGE and you'll get the same result.
Total newbie here, regarding sqlite, so don't flame too hard :)
I have a table:
index name length L breadth B height H
1 M-1234 10 5 2
2 M-2345 20 10 3
3 ....
How do I put some tabular data (let' say ten x,y values) corresponding to index 1, then another table to index 2, and then another, etc. In short, so that I have a table of x and y values that is "connected" to first row, then another that is connected to second row.
I'm reading some tutorials on sqlite3 (which I'm using), but am having trouble finding this. If anyone knows a good newbie tutorial or a book dealing with sqlite3 (CLI) I'm all ears for that too :)
You are just looking for information on joins and the concept of foreign key, that although SQLite3 doesn't enforce, is what you need. You can go without it, anyway.
In your situation you can either add two "columns" to your table, being one x and another y, or create a new table with 3 "columns": foreign_index, x and y. Which one to use depends on what you are trying to accomplish, performance and maintainability.
If you go the linked table route, you'd end up with two tables, like this:
MyTable
index name length L breadth B height H
1 M-1234 10 5 2
2 M-2345 20 10 3
3 ....
XandY
foreign_index x y
1 12 9
2 8 7
3 ...
When you want the x and y values of your element, you just use something like SELECT x, y FROM XandY WHERE foreign_index = $idx;
To get all the related attributes, you just do a JOIN:
SELECT index, name, length, breadth, height, x, y FROM MyTable INNER JOIN XandY ON MyTable.index = XandY.foreign_index;