How to define input variable when datalines have spaces for a variable - input

I want to have two different strings in the same dataset.
I tried to separate the values with "" but it didn't work. Ideally I would not write the "" at all, only the strings themselves. I searched a lot but did not find anything related.
Could you please help me reach my goal?
data ecl.dim_produtos;
input id_produt id_departament id_order id_business id_portfolio initials $4. long_name $40. short_name $30.;
datalines;
1 1 10201 4 1 PZC "Puzzle Crédito" "Puzzle Crédito"
2 1 10202 4 1 PZR "Puzzle Reestruturados" "Reestruturados"
3 2 10207 30 1 DBO "Banca Online" "Banca Online"
4 3 10210 60 1 CLB "Colaboradores" "Colaboradores"
5 1 10203 4 1 PZF "Puzzle Formação" "Code Academy"
6 4 10205 5 1 HIP "Hipoteca Inversa" "Hip. Inversa"
7 5 10206 25 1 EMP "DEMP" "DEMP"
8 6 10208 45 1 NCO "NewCo" "NewCo"
9 6 10211 70 1 LDRC "Lendrock" "Lendrock"
10 4 10209 50 1 OTI "Otima Provision" "Otima"
11 6 10001 1 1 LDC "Lendico" "Lendico"
12 6 10007 1 1 MIBL "Market Invoice BL - EUR" "Market Invoice BL"
13 6 10003 1 1 CRS "CreditShelf" "CreditShelf"
14 6 10005 1 1 FUN "Funding Circle" "Funding Circle"
15 6 10002 1 1 RAI "Raize" "Raize"
16 4 10204 5 1 FLX "Flex" "Flex"
17 6 10101 2 1 AUX "Auxmoney" "Auxmoney"
18 6 10009 2 1 UPG "Upgrade - EUR" "Upgrade"
19 6 10104 2 1 PRO "Prodigy Finance" "Prodigy"
20 6 10102 2 1 FEL "Fellow Finance" "Fellow"
21 6 10008 1 1 ASZ "Assetz - EUR" "Assetz"
22 6 10010 2 1 LDB "Lendable - EUR" "Lendable"
23 6 10004 1 1 LIN "Linked Finance" "Linked"
24 6 10103 2 1 LDR "Lendrock" "Lendrock"
25 6 10105 3 1 EDX "Edebex" "Edebex"
26 6 10006 1 1 CAM "Camomille - FC" "Camomille"
27 6 10106 3 1 MIN "Market Invoice - EUR" "Market Invoice"
90 0 99991 102 2 DIV "Dívida Pública - EUR" "Dívida Pública"
91 6 99992 103 2 CRP "Obrigações Corporate - EUR" "Obrigações Corporate"
92 0 99990 101 3 SDA "Disp. Aplicações OIC - EUR" "Disp. Aplicações OIC"
9999 0 999999 999 99 TOT "Total Patrimonial - EUR" "Total Patrimonial"
;
run;

The most reliable approach would be to:
define the variables of the INPUT statement using a LENGTH or ATTRIB statement,
use INFILE options to specify how the data lines are parsed by INPUT,
take the $ informats out of the INPUT statement.
Example (leave data lines as-is):
length
id_produt id_departament id_order id_business id_portfolio 8
initials $4
long_name $40
short_name $30
;
infile cards dsd dlm=" ";
input id_produt id_departament id_order id_business id_portfolio initials long_name short_name;
If you want data lines without double quotes, you will have to modify the data lines to separate the values with two or more spaces and use the & modifier for those variables in a list-style INPUT statement.
You could also separate the values in the data lines with a tab character and use DLM='09'x. You might have some trouble seeing and entering tabs using the SAS editor.

First make sure to use the : modifier if you want to include informat specifications in the INPUT statement to avoid switching between list and formatted input modes.
If you can ensure that you have at least two spaces between the values (and that the values themselves do NOT have adjacent spaces inside them) you can use the & modifier.
data test;
input id_produt id_departament id_order id_business id_portfolio
initials &:$4. long_name &:$40. short_name &:$30.
;
datalines;
1 1 10201 4 1 PZC  Puzzle Crédito  Puzzle Crédito
2 1 10202 4 1 PZR  Puzzle Reestruturados  Reestruturados
;
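To make the two-space rule concrete, here is a small Python sketch (not SAS) of how such a line gets split. The function name split_ampersand_line and the fixed count of five leading numeric fields are assumptions made for this illustration, and it only approximates what the & modifier does.

```python
import re


def split_ampersand_line(line, n_numeric=5):
    """Split a data line whose trailing text values are separated by 2+ spaces."""
    # The leading numeric fields are separated by ordinary whitespace.
    parts = line.split(None, n_numeric)
    numbers = [int(p) for p in parts[:n_numeric]]
    # The remaining text values may contain single spaces, so only a run of
    # two or more spaces ends a value (the rule the & modifier relies on).
    texts = re.split(r" {2,}", parts[n_numeric].strip())
    return numbers + texts


gap = " " * 2  # two spaces between the text values
print(split_ampersand_line("1 1 10201 4 1 PZC" + gap + "Puzzle Crédito" + gap + "Puzzle Crédito"))
```

With only one space between the text values, everything after the numeric fields would come back as a single value, which is why the data lines have to be edited for this approach.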
Or keep the quotes and make sure there is exactly one space between each value (and don't indent the datalines!) and add the DSD option.
data test;
infile datalines dsd dlm=' ' truncover ;
input id_produt id_departament id_order id_business id_portfolio
initials :$4. long_name :$40. short_name :$30.
;
datalines;
1 1 10201 4 1 PZC "Puzzle Crédito" "Puzzle Crédito"
2 1 10202 4 1 PZR "Puzzle Reestruturados" "Reestruturados"
;
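As a rough analogy for what DSD with a space delimiter does, the following Python sketch parses the same kind of line with the standard csv module. The function name parse_dsd_line is made up for this illustration, and this is an approximation of the behaviour, not SAS itself.

```python
import csv


def parse_dsd_line(line):
    """Parse one space-delimited line, honouring double-quoted values."""
    # csv.reader keeps a quoted value together even if it contains the
    # delimiter, and it yields an empty field for two adjacent delimiters.
    return next(csv.reader([line], delimiter=" ", quotechar='"'))


print(parse_dsd_line('1 1 10201 4 1 PZC "Puzzle Crédito" "Puzzle Crédito"'))
```

Two adjacent spaces produce an empty field here, which mirrors why the data lines need exactly one space between values and no indentation when DSD is in effect.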
Or use a different delimiter, with or without the DSD option.
data test;
infile datalines dsd dlm='|' truncover ;
input id_produt id_departament id_order id_business id_portfolio
initials :$4. long_name :$40. short_name :$30.
;
datalines;
1|1|10201|4|1|PZC|Puzzle Crédito|Puzzle Crédito
2|1|10202|4|1|PZR|Puzzle Reestruturados|Reestruturados
;


How to create 2 datalines in sas with different length

I want to create a table like that:
a 1 2 3
b 1 2 3 4
a has 3 values, b has 4.
How can I do it in SAS?
When I enter it like that it deletes the 4 at the end.
data my_data;
input a b;
datalines;
1 1
2 2
3 3
4
I am very new to SAS. Thanks for your advice.
If you want to use LIST MODE input, like in your example, then each variable needs to have a "word" on the line. Use a period to indicate the missing values.
data my_data;
input a b;
datalines;
1 1
2 2
3 3
. 4
;
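Otherwise switch to COLUMN MODE input. Each variable is then read from fixed columns, so the last data line must start with two blanks to leave a missing.
data my_data;
input a 1-2 b 3-4 ;
datalines;
1 1
2 2
3 3
  4
;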
Or FORMATTED MODE, again with two leading blanks on the last data line:
data my_data;
input a 2. b 2.;
datalines;
1 1
2 2
3 3
  4
;
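To see why the position of the 4 matters once values are read by column, here is a small Python sketch of fixed-width reading. The function read_columns and its defaults (two fields, two characters each) are assumptions chosen to match the example, not a SAS feature.

```python
def read_columns(line, width=2, count=2):
    """Read `count` fixed-width integer fields; blank fields become None."""
    # Pad short lines so every field has its full width to slice from.
    padded = line.ljust(width * count)
    values = []
    for i in range(count):
        text = padded[i * width:(i + 1) * width].strip()
        values.append(int(text) if text else None)
    return values


print(read_columns("1 1"))          # both fields filled
print(read_columns(" " * 2 + "4"))  # first field blank, second field is 4
print(read_columns("4"))            # the 4 lands in the first field instead
```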
Note that you can use the period to indicate a missing value even when the variable is character. This is because the normal character informat will convert that single period into a blank value.
data my_data;
input a $ b;
datalines;
1 1
2 2
3 3
. 4
;

Reordering the results of an SQL query using SQLite

I have a SQLite database that models Sanskrit nouns and has tables like this: (Sorry if it is very lengthy. I've tried to cut things down to the minimum necessary to understand this problem.)
numbers:
id | number
1 | singular
2 | dual
3 | plural
cases:
id | case
1 | nominative
2 | accusative
3 | instrumental
4 | dative
5 | ablative
6 | genitive
7 | locative
8 | vocative
nouns:
id | name
1 | rAma
forms:
id | form | noun
1 | rAmaH | 1
2 | rAmau | 1
3 | rAmAH | 1
4 | rAmam | 1
5 | rAmAN | 1
6 | rAmENa | 1
7 | rAmAbhyAm | 1
8 | rAmaiH | 1
9 | rAmAya | 1
10 | rAmebhyaH | 1
11 | rAmAt | 1
12 | rAmasya | 1
13 | ramayoH | 1
14 | rAmANAm | 1
15 | rAme | 1
16 | rAmeShu | 1
17 | rAma | 1
noun is a foreign key which references nouns(id)
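nounforms:
id | form | case | number | noun
1 | 1 | 1 | 1 | 1
2 | 2 | 1 | 2 | 1
3 | 3 | 1 | 3 | 1
4 | 4 | 2 | 1 | 1
5 | 2 | 2 | 2 | 1
6 | 5 | 2 | 3 | 1
7 | 6 | 3 | 1 | 1
8 | 7 | 3 | 2 | 1
9 | 8 | 3 | 3 | 1
10 | 9 | 4 | 1 | 1
11 | 7 | 4 | 2 | 1
12 | 10 | 4 | 3 | 1
13 | 11 | 5 | 1 | 1
14 | 7 | 5 | 2 | 1
15 | 10 | 5 | 3 | 1
16 | 12 | 6 | 1 | 1
17 | 13 | 6 | 2 | 1
18 | 14 | 6 | 3 | 1
19 | 15 | 7 | 1 | 1
20 | 13 | 7 | 2 | 1
21 | 16 | 7 | 3 | 1
22 | 17 | 8 | 1 | 1
23 | 2 | 8 | 2 | 1
24 | 3 | 8 | 3 | 1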
form is a foreign key which references forms(id)
case is a foreign key which references cases(id)
number is a foreign key which references numbers(id)
noun is a foreign key which references nouns(id)
I can get all the declensions of the noun rAma with this SQL query:
SELECT forms.form FROM forms JOIN nouns,nounforms
WHERE forms.id = nounforms.form
AND nounforms.noun = nouns.id
AND nouns.name = "rAma"
GROUP BY nounforms.case, nounforms.number;
and that returns the whole noun perfectly in 24 rows:
form
rAmaH
rAmau
rAmAH
rAmam
rAmau
rAmAN
rAmENa
rAmAbhyAm
rAmaiH
rAmAya
rAmAbhyAm
rAmebhyaH
rAmAt
rAmAbhyAm
rAmebhyaH
rAmasya
ramayoH
rAmANAm
rAme
ramayoH
rAmeShu
rAma
rAmau
rAmAH
So far so good. But what I would really like is something like this:
singular | dual | plural
rAmaH | rAmau | rAmAH
rAmam | rAmau | rAmAN
rAmENa | rAmAbhyAm | rAmaiH
rAmAya | rAmAbhyAm | rAmebhyaH
rAmAt | rAmAbhyAm | rAmebhyaH
rAmasya | ramayoH | rAmANAm
rAme | ramayoH | rAmeShu
rAma | rAmau | rAmAH
i.e. 8 rows for each case with 3 columns for each number. The problem is my SQL knowledge is not quite enough to get me there. I think what I want is a view or a virtual table. Is that right? Also once that is solved, I would like to parametrize the query so I can use it for nouns other than rAma but SQLite does not I believe support stored procedures. Is that right? If so, what is the workaround?
Btw, I am aware that I can do the reordering in my application. In fact, that is what I am doing now, but I would like to keep as much centralized in the database as possible so I can port to other languages/environments.
Can anyone help?
You need conditional aggregation:
SELECT MAX(CASE WHEN nf.number = 1 THEN f.form END) singular,
MAX(CASE WHEN nf.number = 2 THEN f.form END) dual,
MAX(CASE WHEN nf.number = 3 THEN f.form END) plural
FROM forms f
JOIN nouns n ON n.id = f.noun
JOIN nounforms nf ON f.id = nf.form AND nf.noun = n.id
WHERE n.name = ?
GROUP BY nf.`case`;
Replace the placeholder ? with the noun that you want.
Also, always use proper joins with ON clauses and aliases for the tables to make the code shorter and more readable.
As you already know, SQLite does not support stored procedures or functions, so probably the best way to use this query is as it is in your app, with the placeholder ? in a prepared statement, passing the value of the noun as a parameter.
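A minimal sketch of that suggestion using Python's sqlite3 module, assuming the schema from the question. The function names build_demo_db and declension_table are made up for this illustration, and the sample loads only the nominative and accusative rows to keep it short.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database holding a small part of the question's data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE nouns (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE forms (id INTEGER PRIMARY KEY, form TEXT, noun INTEGER);
        CREATE TABLE nounforms (id INTEGER PRIMARY KEY, form INTEGER,
                                "case" INTEGER, number INTEGER, noun INTEGER);
        INSERT INTO nouns VALUES (1, 'rAma');
        INSERT INTO forms VALUES (1, 'rAmaH', 1), (2, 'rAmau', 1), (3, 'rAmAH', 1),
                                 (4, 'rAmam', 1), (5, 'rAmAN', 1);
        INSERT INTO nounforms VALUES (1, 1, 1, 1, 1), (2, 2, 1, 2, 1), (3, 3, 1, 3, 1),
                                     (4, 4, 2, 1, 1), (5, 2, 2, 2, 1), (6, 5, 2, 3, 1);
    """)
    return conn


def declension_table(conn, noun_name):
    """Return one (singular, dual, plural) row per case for the given noun."""
    query = """
        SELECT MAX(CASE WHEN nf.number = 1 THEN f.form END) AS singular,
               MAX(CASE WHEN nf.number = 2 THEN f.form END) AS dual,
               MAX(CASE WHEN nf.number = 3 THEN f.form END) AS plural
        FROM nounforms nf
        JOIN forms f ON f.id = nf.form
        JOIN nouns n ON n.id = nf.noun
        WHERE n.name = ?
        GROUP BY nf."case"
        ORDER BY nf."case"
    """
    # The noun is bound to the ? placeholder, so the same statement serves any noun.
    return conn.execute(query, (noun_name,)).fetchall()


for row in declension_table(build_demo_db(), "rAma"):
    print(row)
```

The explicit ORDER BY is an addition in this sketch so that the rows come back in case order regardless of how the grouping is executed.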

Generate multiple rows from row with bitmask

Let's say we have a table with 3 columns: key, value, and bitmask (a varchar of unknown maximum length):
abc | 23 | 101
xyz | 56 | 000101
Is it possible to write a query whose output has one row for every combination of key, value, and 1 in the bitmask, with the index of that 1 as an integer column (it doesn't matter whether it starts from 0 or 1)? So for the example above:
abc | 23 | 1
abc | 23 | 3
xyz | 56 | 4
xyz | 56 | 6
Thanks for any ideas!
I think you might be better off choosing a maximum length for your varchar.
SELECT * FROM
table
INNER JOIN
generate_series(1,1000) s(n)
ON
s.n <= char_length(bitmask) and
substring(bitmask from s.n for 1) = '1'
We generate a list of numbers:
s.n
---
1
2
3
4
...
And join it to the table in a way that causes repeated table rows:
s.n bitmask
--- -------
1 000101
2 000101
3 000101
4 000101
5 000101
6 000101
1 101
2 101
3 101
Then use the s.n to substring the bitmask, and look for being equal to 1:
s.n bitmask substr
--- ------- ------
1 000101 --substring('000101' from 1 for 1) = '1'? no
2 000101 --substring('000101' from 2 for 1) = '1'? no
3 000101 --substring('000101' from 3 for 1) = '1'? no
4 000101 --substring('000101' from 4 for 1) = '1'? yes...
5 000101
6 000101
1 101
2 101
3 101
So s.n gives us the number in the last column of your desired output, and the join condition keeps only the rows where the substring at that position equals '1'.
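The same idea can be sketched outside the database in a few lines of Python. The function name expand_bitmask is made up for this illustration; positions are 1-based, as in the query.

```python
def expand_bitmask(rows):
    """Yield (key, value, position) for every '1' in each row's bitmask."""
    for key, value, bitmask in rows:
        # enumerate(..., start=1) plays the role of generate_series: it pairs
        # each character of the bitmask with its 1-based position.
        for position, bit in enumerate(bitmask, start=1):
            if bit == "1":
                yield key, value, position


rows = [("abc", 23, "101"), ("xyz", 56, "000101")]
for out in expand_bitmask(rows):
    print(out)
```

Unlike the fixed generate_series(1,1000), this sketch walks each bitmask to its own length, so no maximum length has to be chosen.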

Mathematical operations in a text file using awk

I have a text file in which the first line of each group is an ID, and below each ID there are lines whose 1st column is a 3-letter string and whose 2nd column is a number. The 2 columns are tab-separated. It looks like this small example:
ID1
AAA 17
TTA 3
ATA 6
ATC 12
AAG 9
ACA 13
ATG 21
ACC 13
ACG 5
AAT 12
AGA 11
ATT 22
AGC 11
TAA 3
ACT 8
TAC 12
ID2
AAA 10
AAC 7
AAG 4
ACA 3
ACC 1
ATG 6
ACG 1
Below I also have a list of 3-letter strings. For each ID, I want the ratio of TTA to the total of those 3-letter strings under that ID that are also present in the list below.
ATT
ATC
ATA
CTT
AAC
CTA
CTG
TTA
TTG
GTT
GTC
GTA
GTG
the output for this example would look like this:
ID1 0.065
ID2 0
For ID2 the ratio is 0 because there is no TTA, and for ID1 it is 0.065 because 3 divided by 46 is equal to 0.065. For each ID, I only took the 3-letter strings that are common to the above list and the rows below that ID. Again, the 2 columns are tab-separated.
I am quite new to the awk programming language. I wrote the following piece of code, but it does not return what I want. Would you please help me fix it?
3_letter_list= [ATT, ATC, ATA, CTT, AAC, CTA, CTG, TTA, TTG, GTT, GTC, GTA, GTG]
awk -F "\t" '{if($1==3_letter_list), (if $1=="TTA" & ratio=$2/$1)}' filename.txt > out.txt
ID3
AAA 2
AAC 8
ATA 1
ATC 20
AAG 26
ACA 6
ATG 11
ACC 16
ACG 7
AAT 2
ATT 4
AGC 18
TAA 1
TAC 8
ACT 3
AGG 1
TTC 20
TCA 1
TCC 8
TTG 6
TCG 4
AGT 5
TAT 3
GAC 18
GTC 12
TTT 6
TGC 7
GAG 31
TCT 1
GCC 19
GTG 21
TGG 6
GCG 8
CAC 12
GAT 6
CTC 12
GGA 2
CAG 22
GGC 25
CTG 52
CCC 15
GCT 3
GGG 6
CCG 4
CAT 4
CTT 2
CGC 18
GGT 4
CCT 3
CGG 13
Awk solution:
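Assuming that the list of 3-letter character groups is saved in groups_list.txt:
awk 'NR==FNR{ a[$1]; next }   # first file: remember every listed group
/^ID[0-9]/{                    # an ID line starts a new block
if (id) { printf "%s %.4f\n", id, (sum ? tta/sum : 0); tta=sum=0 }
id = $1; next
}
$1 == "TTA"{ tta = $2 }        # TTA count of the current block
$1 in a{ sum += $2 }           # total over the listed groups
END{ if (id) printf "%s %.4f\n", id, (sum ? tta/sum : 0) }' groups_list.txt file.txt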
The output:
ID1 0.0698
ID2 0.0000
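For comparison, the same per-ID computation can be sketched in Python. The function name tta_ratios is made up for this illustration, and it assumes, like the awk script, that ID lines start with "ID" and that a block without any listed group should report 0.

```python
def tta_ratios(lines, groups):
    """Return {id: TTA count / total count of the listed groups} per ID block."""
    sums = {}        # id -> [TTA count, total of listed groups]
    current = None
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if fields[0].startswith("ID"):
            current = fields[0]
            sums[current] = [0, 0]
        elif current is not None:
            count = int(fields[1])
            if fields[0] == "TTA":
                sums[current][0] = count
            if fields[0] in groups:
                sums[current][1] += count
    # Guard against blocks that contain none of the listed groups.
    return {k: (tta / total if total else 0.0) for k, (tta, total) in sums.items()}


groups = {"ATT", "ATC", "ATA", "CTT", "AAC", "CTA", "CTG", "TTA", "TTG", "GTT", "GTC", "GTA", "GTG"}
lines = ["ID1", "AAA\t17", "TTA\t3", "ATA\t6", "ATC\t12", "ATT\t22", "ID2", "AAA\t10", "AAC\t7"]
for key, ratio in tta_ratios(lines, groups).items():
    print(key, format(ratio, ".4f"))
```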

Why do I get FAILED: Invalid table alias or column reference ‘ ’: (possible column names are: line) when I query in Hive?

I have a table whose structure is:
line string
and the content is:
product_id product_date orderattribute1 orderattribute2 orderattribute3 orderattribute4 ciiquantity ordquantity price
1 2014-09 2 1 1 1 1 3 153
1 2014-01 2 1 1 1 1 1 153
1 2014-04 2 2 1 1 1 1 164
1 2014-02 2 1 1 1 3 4 162
1 2014-07 2 1 1 1 9 23 224
1 2014-08 2 1 1 1 1 7 216
1 2014-03 2 1 1 1 3 13 180
1 2014-08 2 2 1 1 4 6 171
1 2014-05 2 1 1 1 3 7 180
....
(19000 lines omitted)
The total price of every line above is ordquantity*price.
I want to get the total price of every month like this:
month sum
201401 ****
201402 ****
According to just the 10 lines in the table above, the sum for month 201408 is 7*216+6*171, which is derived from the two lines (1 2014-08 2 1 1 1 1 7 216 and 1 2014-08 2 2 1 1 4 6 171).
I use the code:
create table product as select sum(ordquantity*price) as sum from text3 group by product_date;
and I got the FAILED:
FAILED:Invalid table alias or column reference 'product_date': (possible column names are: line)
I am not familiar with Hive and I don't know how to solve the problem.
Did you create the table with the correct schema? The error says that text3 has a single column named line, so there is no product_date column to group by. If you didn't, create it like this:
CREATE TABLE product
(product_id INT,
product_date STRING,
orderattribute1 INT,
orderattribute2 INT,
orderattribute3 INT,
orderattribute4 INT,
ciiquantity INT,
ordquantity INT,
price INT)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','   -- adjust to the delimiter your data file actually uses
LINES TERMINATED BY '\n';
And the code for your requirement:
SELECT product_date, SUM(ordquantity*price) FROM product
GROUP BY product_date;
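To check the expected totals independently of Hive, here is a small Python sketch of the same aggregation. The function name monthly_totals is made up for this illustration, and it assumes whitespace-separated rows with the column order shown in the question.

```python
from collections import defaultdict


def monthly_totals(lines):
    """Sum ordquantity * price per month from whitespace-separated rows."""
    totals = defaultdict(int)
    for line in lines:
        fields = line.split()
        if not fields or fields[0] == "product_id":  # skip blanks and the header
            continue
        month = fields[1].replace("-", "")           # "2014-08" becomes "201408"
        ordquantity, price = int(fields[7]), int(fields[8])
        totals[month] += ordquantity * price
    return dict(totals)


sample = [
    "1 2014-08 2 1 1 1 1 7 216",
    "1 2014-08 2 2 1 1 4 6 171",
    "1 2014-01 2 1 1 1 1 1 153",
]
print(monthly_totals(sample))
```

For 201408 this gives 7*216 + 6*171 = 2538, the figure worked out in the question.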
Hope I answered your question. Yippee!!