Pulling data from Netflix through private account - authentication

I want to pull a list of all the movies and shows I have seen on Netflix for a personal project; Netflix has a page that shows this list.
Results from trying curl:
curl https://www.netflix.com/MoviesYouveSeen -v
* Trying 50.112.92.119...
* Connected to www.netflix.com (50.112.92.119) port 443 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
* CAfile: /etc/pki/tls/certs/ca-bundle.crt
CApath: none
* ALPN/NPN, server did not agree to a protocol
* SSL connection using TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
* Server certificate:
* subject: CN=www.netflix.com,OU=Operations,O="Netflix, Inc.",L=Los Gatos,ST=CALIFORNIA,C=US
* start date: Apr 14 00:00:00 2015 GMT
* expire date: Apr 12 23:59:59 2017 GMT
* common name: www.netflix.com
* issuer: CN=Symantec Class 3 Secure Server CA - G4,OU=Symantec Trust Network,O=Symantec Corporation,C=US
> GET /MoviesYouveSeen HTTP/1.1
> Host: www.netflix.com
> User-Agent: curl/7.43.0
> Accept: */*
>
< HTTP/1.1 302 Found
< Cache-Control: no-cache, no-store
< Date: Tue, 26 Apr 2016 14:47:16 GMT
< Edge-Control: no-cache, no-store
< location: https://www.netflix.com/Login?nextpage=https%3A%2F%2Fwww.netflix.com%2FMoviesYouveSeen
< req_id: 2a134cc9-7f77-4a35-9d83-0099fc7a2466
< Server: shakti-prod i-8cf6164a
< Set-Cookie: nflx-rgn=uw2|1461682036196; Max-Age=-1; Expires=Tue, 26 Apr 2016 14:47:15 GMT; Path=/; Domain=.netflix.com
< Set-Cookie: memclid=b40d0e2c-27b3-4d72-9b14-4477fcf5fa39; Max-Age=31536000; Expires=Wed, 26 Apr 2017 14:47:16 GMT; Path=/; Domain=.netflix.com
< Set-Cookie: nfvdid=BQFmAAEBEDgFjrzXIIi7X6rTj6vmSYUwYpekhXXCCx5ywGWHaOvo0%2BmNx86oMCsliwERTTbRi6FwmgZM3YhqFUBfffSwJ0Kd; Max-Age=31536000; Expires=Wed, 26 Apr 2017 14:47:16 GMT; Path=/; Domain=.netflix.com
< Strict-Transport-Security: max-age=31536
< Via: 1.1 i-6af8eaad (us-west-2)
< X-Content-Type-Options: nosniff
< X-Frame-Options: DENY
< X-Netflix-From-Zuul: true
< X-Netflix.nfstatus: 1_1
< X-Originating-URL: https://www.netflix.com/MoviesYouveSeen
< X-Xss-Protection: 1; mode=block; report=https://ichnaea.netflix.com/log/freeform/xssreport
< Content-Length: 256
< Connection: keep-alive
<
* Connection #0 to host www.netflix.com left intact
I also tried wget:
wget https://www.netflix.com/MoviesYouveSeen
--2016-04-26 10:57:23-- https://www.netflix.com/MoviesYouveSeen
Resolving www.netflix.com (www.netflix.com)... 54.244.126.7, 50.112.115.177, 54.214.7.82, ...
Connecting to www.netflix.com (www.netflix.com)|54.244.126.7|:443... connected.
HTTP request sent, awaiting response... 302 Found
Syntax error in Set-Cookie: nflx-rgn=uw2|1461682643973; Max-Age=-1; Expires=Tue, 26 Apr 2016 14:57:23 GMT; Path=/; Domain=.netflix.com at position 39.
Location: https://www.netflix.com/Login?nextpage=https%3A%2F%2Fwww.netflix.com%2FMoviesYouveSeen [following]
--2016-04-26 10:57:24-- https://www.netflix.com/Login?nextpage=https%3A%2F%2Fwww.netflix.com%2FMoviesYouveSeen
Reusing existing connection to www.netflix.com:443.
HTTP request sent, awaiting response... 200 OK
Syntax error in Set-Cookie: nflx-rgn=uw2|1461682644112; Max-Age=-1; Expires=Tue, 26 Apr 2016 14:57:23 GMT; Path=/; Domain=.netflix.com at position 39.
Length: unspecified [text/html]
Saving to: ‘MoviesYouveSeen’
MoviesYouveSeen [ <=> ] 41.63K 220KB/s in 0.2s
2016-04-26 10:57:24 (220 KB/s) - ‘MoviesYouveSeen’ saved [42629]
It looks like I am not being properly authenticated. If I view the page source in my browser, I can see the list of movies. Any suggestions for getting the data?

That 302 response is redirecting you to the login page. You'd need to be logged in, i.e. send your authenticated session cookies with the request, for the query to work correctly.
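One common workaround (a sketch, not an official API) is to copy the session cookies out of your browser's developer tools and attach them to the scripted request. The cookie names below are placeholders, not Netflix's actual cookie names:

```python
import urllib.request

# Hypothetical session cookies copied from the browser's developer tools.
# The names and values here are placeholders, not Netflix's real cookies.
session_cookies = {
    "SessionId": "placeholder-id",
    "SecureSessionId": "placeholder-secure-id",
}

def cookie_header(cookies):
    """Serialize a dict of cookies into a single Cookie header value."""
    return "; ".join(f"{name}={value}" for name, value in cookies.items())

# Build the request with the browser session attached; calling
# urllib.request.urlopen(req) would then send it as the logged-in user.
req = urllib.request.Request(
    "https://www.netflix.com/MoviesYouveSeen",
    headers={"Cookie": cookie_header(session_cookies)},
)

print(cookie_header(session_cookies))
```

The equivalent with curl is the `-b "name=value; name2=value2"` option. Note that session cookies expire, so a scraper built this way needs its cookies refreshed periodically.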

Related

Index file for a subdirectory through CloudFront

I am trying to do a perfectly conventional thing: I am using CloudFront / S3 to host a static website, but I also want to host another website in a subdirectory. Following the instructions, I believe I got S3 to work:
% curl -v http://mydomain.me.s3-website-us-west-1.amazonaws.com/c
> GET /c HTTP/1.1
> Host: mydomain.me.s3-website-us-west-1.amazonaws.com
> User-Agent: curl/7.64.1
> Accept: */*
>
< HTTP/1.1 302 Moved Temporarily
< x-amz-error-code: Found
< x-amz-error-message: Resource Found
< x-amz-request-id: 9BB13A73FFB4503E
< x-amz-id-2: 3JX26tNdHi1irPbFJS7E1BifwliygqRZsZIc/qZptjBqBjjmGL7YGK6xfG23GZR70R0Ou+3ZAiM=
< Location: /c/
< Content-Type: text/html; charset=utf-8
< Content-Length: 313
< Date: Tue, 01 Dec 2020 01:58:08 GMT
< Server: AmazonS3
So /c is redirecting to /c/, which I believe is correct, and that new location definitely serves correctly:
% curl -v http://mydomain.me.s3-website-us-west-1.amazonaws.com/c/
> GET /c/ HTTP/1.1
> Host: mydomain.me.s3-website-us-west-1.amazonaws.com
> User-Agent: curl/7.64.1
> Accept: */*
>
< HTTP/1.1 200 OK
< x-amz-id-2: BD0wdDnhonp7Y5i2b7mUDVbIXKYu4O52YPUKVQx5GDaLW5hmDzcrsF/EixdksCtkt/NK6Bg24hY=
< x-amz-request-id: 7F11B109218EF9ED
< Date: Tue, 01 Dec 2020 01:58:11 GMT
< Last-Modified: Tue, 01 Dec 2020 01:31:59 GMT
< x-amz-version-id: zSq5IxE3Ug8oG5SSW.lZsCYydp42.h.4
< ETag: "7999ccd49fe930021167ae6f8fe95eb6"
< Content-Type: text/html
< Content-Length: 36
< Server: AmazonS3
<
And it actually gives me my file. But when I try to go through CloudFront for /c:
% curl -v https://mydomain.me/c
> GET /c HTTP/2
> Host: mydomain.me
> User-Agent: curl/7.64.1
> Accept: */*
>
< HTTP/2 403
< content-type: application/xml
< date: Tue, 01 Dec 2020 01:59:43 GMT
< server: AmazonS3
< x-cache: Error from cloudfront
< via: 1.1 58b53da3f7d231b76d30fcffbf4945a1.cloudfront.net (CloudFront)
< x-amz-cf-pop: SFO20-C1
< x-amz-cf-id: PSjqsinkkfheUfhEPVYbbujMqemugFbrYxM-pQMIihMk3dpp2W4Bmw==
and it returns the familiar S3 AccessDenied error. For /c/, it is even weirder:
% curl -v https://mydomain.me/c/
> GET /c/ HTTP/2
> Host: mydomain.me
> User-Agent: curl/7.64.1
> Accept: */*
>
< HTTP/2 200
< content-type: application/x-directory; charset=UTF-8
< content-length: 0
< last-modified: Tue, 01 Dec 2020 01:30:44 GMT
< x-amz-version-id: 4L.jn6WG3emcGutRuwEZv_lE0aO07AGR
< accept-ranges: bytes
< server: AmazonS3
< date: Tue, 01 Dec 2020 02:00:31 GMT
< etag: "d41d8cd98f00b204e9800998ecf8427e"
< x-cache: RefreshHit from cloudfront
< via: 1.1 37d64bca4c93552139fb3a85c9c4a119.cloudfront.net (CloudFront)
< x-amz-cf-pop: SFO20-C1
< x-amz-cf-id: r5lS4QTmg07XhIXRlXsNJ4qcJaWXfj5Ik9fXZPY_dzLjED-A2MhBiA==
It "works", but it returns an empty file, which it says is a directory listing.
I have logging turned on, and that last one returns:
b5063beaaa3c80c2ad85635ddb1c5fac3da6b5510e9ef332c9e0df0c9abdd45a mydomain.me [01/Dec/2020:01:57:47 +0000] 73.202.134.48 b5063beaaa3c80c2ad85635ddb1c5fac3da6b5510e9ef332c9e0df0c9abdd45a 116EA2ED16AA56DE REST.GET.NOTIFICATION - "GET /mydomain.me?notification= HTTP/1.1" 200 - 115 - 15 - "-" "S3Console/0.4, aws-internal/3 aws-sdk-java/1.11.888 Linux/4.9.217-0.3.ac.206.84.332.metal1.x86_64 OpenJDK_64-Bit_Server_VM/25.262-b10 java/1.8.0_262 vendor/Oracle_Corporation" - noe+YUO+FeYaIukSpTTKl9npt1R0+uAr4Hqzx/mQge2bfhydBiiquR9EWG3iGanDRjK/EagN5Ss= SigV4 ECDHE-RSA-AES128-SHA AuthHeader s3-us-west-1.amazonaws.com TLSv1.2
CloudFront is running some Java library?
curl -v https://mydomain.me/c/index.html works fine.
I assume I have misconfigured CloudFront, but cannot figure out how. Any suggestions?
1) Click on the CloudFront Distribution ID
2) Select the tab "Origins and Origin Groups"
3) Click the checkbox for the first item under "Origins" (assuming you only have one)
4) Click "Edit"
5) Change the "Origin Domain Name" to "mydomain.me.s3-website-us-west-1.amazonaws.com" (following your example)
6) Click "Yes, Edit"
I've done this a hundred times, I know this is a requirement, and it bites me every time!
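The underlying distinction is between the bucket's REST endpoint (which the CloudFront console suggests by default, and which serves neither index documents nor the /c to /c/ redirect) and the static-website endpoint. As a rough sketch of the two hostname forms (bucket and region are taken from the example above; some regions use a dot instead of a dash after "s3-website"):

```python
def rest_endpoint(bucket, region):
    # REST API endpoint: no index documents, no /c -> /c/ redirects.
    return f"{bucket}.s3.{region}.amazonaws.com"

def website_endpoint(bucket, region):
    # Static-website endpoint: serves index documents and issues redirects.
    # Note: a few regions format this as s3-website.<region> with a dot.
    return f"{bucket}.s3-website-{region}.amazonaws.com"

print(rest_endpoint("mydomain.me", "us-west-1"))
print(website_endpoint("mydomain.me", "us-west-1"))
# -> mydomain.me.s3-website-us-west-1.amazonaws.com
```

If the origin uses the first form, CloudFront talks to the S3 API directly and you see exactly the symptoms in the question: AccessDenied for /c and an empty application/x-directory object for /c/.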

Invalid client type getting verification code for limited input devices with google oauth

I'm trying to add google login for limited input devices as described here for a Web application type.
Using my client_id I cannot get the verification code because I keep getting this error:
$> curl -d "client_id=887293527777-tf5uf5q5skss8sbktp1vpo67p2v5b7i7.apps.googleusercontent.com&scope=email%20profile" https://accounts.google.com/o/oauth2/device/code
{
"error" : "invalid_client",
"error_description" : "Invalid client type."
}
And with verbose output:
$> curl -d "client_id=887293527777-tf5uf5q5skss8sbktp1vpo67p2v5b7i7.apps.googleusercontent.com&scope=email%20profile" https://accounts.google.com/o/oauth2/device/code -vvv
* Trying 209.85.203.84...
* TCP_NODELAY set
* Connected to accounts.google.com (209.85.203.84) port 443 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
* CAfile: /etc/pki/tls/certs/ca-bundle.crt
CApath: none
* ALPN, server accepted to use h2
* SSL connection using TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
* Server certificate:
* subject: CN=accounts.google.com,O=Google Inc,L=Mountain View,ST=California,C=US
* start date: Feb 07 21:22:30 2018 GMT
* expire date: May 02 21:11:00 2018 GMT
* common name: accounts.google.com
* issuer: CN=Google Internet Authority G2,O=Google Inc,C=US
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x56450ea69cd0)
> POST /o/oauth2/device/code HTTP/1.1
> Host: accounts.google.com
> User-Agent: curl/7.51.0
> Accept: */*
> Content-Length: 104
> Content-Type: application/x-www-form-urlencoded
>
* Connection state changed (MAX_CONCURRENT_STREAMS updated)!
* We are completely uploaded and fine
< HTTP/2 401
< content-type: application/json; charset=utf-8
< x-content-type-options: nosniff
< cache-control: no-cache, no-store, max-age=0, must-revalidate
< pragma: no-cache
< expires: Mon, 01 Jan 1990 00:00:00 GMT
< date: Fri, 23 Feb 2018 14:29:41 GMT
< server: ESF
< x-xss-protection: 1; mode=block
< x-frame-options: SAMEORIGIN
< alt-svc: hq=":443"; ma=2592000; quic=51303431; quic=51303339; quic=51303338; quic=51303337; quic=51303335,quic=":443"; ma=2592000; v="41,39,38,37,35"
< accept-ranges: none
< vary: Accept-Encoding
<
{
"error" : "invalid_client",
"error_description" : "Invalid client type."
* Curl_http_done: called premature == 0
* Connection #0 to host accounts.google.com left intact
}
This is really annoying, given that they provide a curl example in their guide. I've tried from JavaScript too, but with no luck.
Edit: the request works when I use "Other" as the client type, so I don't think the problem is on my side; but "Other" is no good to me, because I need a "Web application" client in order to set CORS.
That flow is only supported for client type "TVs and Limited Input devices". See https://developers.google.com/identity/sign-in/devices#get_a_client_id_and_client_secret

HTTP keep-alive: when should the client stop reusing a connection?

I understand that the client should stop reusing an HTTP keep-alive socket after the advertised timeout or the maximum number of requests. However, two popular server implementations seem to keep the socket open for longer than the timeout, hence this question: what strategy should a client adopt to reuse connections optimally?
We were testing our implementation against various servers, and while Apache seems to adhere to the timeout, lighttpd and nginx seem to keep the connection open for longer. As you can see from the logs below, despite the client and server agreeing on a 15-second timeout, the socket was kept open for more than 60 seconds, and the second request succeeded even after that.
1) First request and response with timestamp
GET /en/CHANGES HTTP/1.1
Host: nginx.org
Accept: */*
Accept-Encoding: identity
Content-Length: 0
Connection: Keep-Alive
Keep-Alive: timeout=15, max=1000
E(2410-130624-216): HTTP/1.1 200 OK
E(2410-130624-216): Server: nginx/1.13.3
E(2410-130624-216): Date: Tue, 24 Oct 2017 07:36:23 GMT
E(2410-130624-216): Content-Type: text/plain; charset=utf-8
E(2410-130624-216): Content-Length: 282456
E(2410-130624-216): Last-Modified: Tue, 10 Oct 2017 15:39:18 GMT
E(2410-130624-216): Connection: keep-alive
E(2410-130624-216): Keep-Alive: timeout=15
E(2410-130624-216): ETag: "59dce9a6-44f58"
E(2410-130624-216): Accept-Ranges: bytes
E(2410-130624-216):
2) Second request on the same socket, 60 seconds later
GET /en/CHANGES HTTP/1.1
Host: nginx.org
Accept: */*
Accept-Encoding: identity
Content-Length: 0
Connection: Keep-Alive
Keep-Alive: timeout=15, max=1000
E(2410-130726-881): HTTP/1.1 200 OK
E(2410-130726-881): Server: nginx/1.13.3
E(2410-130726-881): Date: Tue, 24 Oct 2017 07:37:26 GMT
E(2410-130726-881): Content-Type: text/plain; charset=utf-8
E(2410-130726-881): Content-Length: 282456
E(2410-130726-881): Last-Modified: Tue, 10 Oct 2017 15:39:18 GMT
E(2410-130726-881): Connection: keep-alive
E(2410-130726-881): Keep-Alive: timeout=15
E(2410-130726-881): ETag: "59dce9a6-44f58"
E(2410-130726-881): Accept-Ranges: bytes
E(2410-130726-881):
So the question is: when should the client stop reusing a connection, optimally? Should the client keep reusing it until it returns an error, or discard it after the advertised timeout? What strategies do browsers use?
Thanks
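One conservative client-side strategy (my sketch, not taken from any particular browser): treat the advertised timeout and max as upper bounds, track when the connection was last used, and open a fresh connection once idle time exceeds the timeout, even if the server might still be holding the socket open. Servers may also close earlier than advertised, so a robust client additionally retries idempotent requests once when a reused connection turns out to be stale:

```python
import time

class PooledConnection:
    """Tracks idle time against server-advertised keep-alive parameters."""

    def __init__(self, timeout=15, max_requests=1000):
        self.timeout = timeout            # from "Keep-Alive: timeout=15"
        self.max_requests = max_requests  # from "Keep-Alive: max=1000"
        self.requests_sent = 0
        self.last_used = time.monotonic()

    def mark_used(self):
        """Record that a request was just sent on this connection."""
        self.requests_sent += 1
        self.last_used = time.monotonic()

    def is_reusable(self, now=None):
        """Reuse only under the request cap and within the idle timeout."""
        now = time.monotonic() if now is None else now
        if self.requests_sent >= self.max_requests:
            return False
        return (now - self.last_used) < self.timeout

conn = PooledConnection(timeout=15)
conn.mark_used()
print(conn.is_reusable())                         # just used: True
print(conn.is_reusable(now=conn.last_used + 60))  # 60s idle: False
```

In the logs above, nginx happened to keep the socket alive past 60 seconds, but nothing obliges it to; closing proactively at the advertised timeout trades a little reuse for never racing the server's close.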

Will a server return formats other than the one specified in the Accept header?

Per my understanding, the Accept header is used by HTTP clients to tell the server which content types they will accept. The server then sends back a response that includes a Content-Type header telling the client what the content type of the returned content actually is.
With this understanding, I tried the following:
curl -X GET -H "Accept: application/xml" http://www.google.com -v
* About to connect() to www.google.com port 80 (#0)
* Trying 173.194.33.81...
* connected
* Connected to www.google.com (173.194.33.81) port 80 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.24.0 (x86_64-apple-darwin12.0) libcurl/7.24.0 OpenSSL/0.9.8y zlib/1.2.5
> Host: www.google.com
> Accept: application/xml
>
< HTTP/1.1 200 OK
< Date: Tue, 02 Sep 2014 17:58:05 GMT
< Expires: -1
< Cache-Control: private, max-age=0
< Content-Type: text/html; charset=ISO-8859-1
< Set-Cookie: PREF=ID=5c30672b67a74789:FF=0:TM=1409680685:LM=1409680685:S=PsGclk3vR4HWjann; expires=Thu, 01-Sep-2016 17:58:05 GMT; path=/; domain=.google.com
< Set-Cookie: NID=67=rPuxpwUu5UNuapzCdbD5iwVyjjC9TzP_Ado29h3ucjEq4A_2qkSM4nQM3RO02rfyuHmrh-hvmwmgFCmOvISttFfHv06f8ay4_6Gl4pXRjqxihNhJSGbvujjDRzaSibfy; expires=Wed, 04-Mar-2015 17:58:05 GMT; path=/; domain=.google.com; HttpOnly
< P3P: CP="This is not a P3P policy! See http://www.google.com/support/accounts/bin/answer.py?hl=en&answer=151657 for more info."
< Server: gws
< X-XSS-Protection: 1; mode=block
< X-Frame-Options: SAMEORIGIN
< Alternate-Protocol: 80:quic
< Transfer-Encoding: chunked
<
<!doctype html><html itemscope="" itemtype="http://schema.org/WebPage" lang="en"><
As you can see in the response, I am sent Content-Type: text/html; charset=ISO-8859-1, which is not what I asked for. Why is a different representation (HTML in this case) sent, even though I asked for XML?
Thanks
From RFC 2616:
If an Accept header field is present,
and if the server cannot send a response which is acceptable
according to the combined Accept field value, then the server SHOULD
send a 406 (not acceptable) response.
Here, "SHOULD" means that Google isn't actually obliged to return a 406 error; the server is free to ignore the Accept header and send the representation it has. Receiving an HTML response tells you effectively the same thing a 406 would: the server cannot produce what your Accept header asked for.
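The negotiation the RFC describes can be sketched as a tiny matcher: parse the Accept header, and if none of the media ranges match what the server offers, the server should send 406 but may, as Google does here, fall back to its default representation. This is an illustrative sketch (ignoring q-value ordering), not any real server's implementation:

```python
def acceptable(accept_header, offered_type):
    """Return True if offered_type satisfies the Accept header.

    Handles exact matches, type wildcards (text/*), and */*;
    q-values are parsed off but their ordering is ignored.
    """
    offered_major = offered_type.split("/")[0]
    for part in accept_header.split(","):
        media = part.split(";")[0].strip()  # drop q-value parameters
        if media in ("*/*", offered_type, offered_major + "/*"):
            return True
    return False

# Google only offers text/html for its homepage:
print(acceptable("application/xml", "text/html"))
# -> False, so per the RFC the server SHOULD answer 406
print(acceptable("text/*, application/xml;q=0.9", "text/html"))
# -> True
```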

google oauth1 to oauth2 migration invalid_token error

I have been trying to obtain new OAuth2 refresh tokens using an OAuth1 access token, but the endpoint constantly returns an "invalid_token" error. I have checked that the access token is working correctly. I have also tested the same credentials/parameters in the OAuth2 Playground, and the result is the same. Any help is appreciated.
Here is the curl verbose output:
> POST /o/oauth2/token HTTP/1.1
Host: accounts.google.com
Content-Type: application/x-www-form-urlencoded
Authorization: OAuth oauth_nonce="cb7407355fe20f509cb6bf901eae2d24", oauth_timestamp="1389169471", oauth_consumer_key="***", oauth_token="1%2FFVy....", oauth_signature_method="HMAC-SHA1", oauth_signature="0YL1hH5R571nOH1byeHxQlg%2Fa6g%3D"
Content-Length: 444
* upload completely sent off: 444 out of 444 bytes
< HTTP/1.1 400 Bad Request
< Cache-Control: no-cache, no-store, max-age=0, must-revalidate
< Pragma: no-cache
< Expires: Fri, 01 Jan 1990 00:00:00 GMT
< Date: Wed, 08 Jan 2014 08:24:31 GMT
< Content-Type: application/json
< X-Content-Type-Options: nosniff
< X-Frame-Options: SAMEORIGIN
< X-XSS-Protection: 1; mode=block
* Server GSE is not blacklisted
< Server: GSE
< Alternate-Protocol: 443:quic
< Transfer-Encoding: chunked
<
* Connection #0 to host accounts.google.com left intact
string(415) "HTTP/1.1 400 Bad Request
Cache-Control: no-cache, no-store, max-age=0, must-revalidate
Pragma: no-cache
Expires: Fri, 01 Jan 1990 00:00:00 GMT
Date: Wed, 08 Jan 2014 08:24:31 GMT
Content-Type: application/json
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
X-XSS-Protection: 1; mode=block
Server: GSE
Alternate-Protocol: 443:quic
Transfer-Encoding: chunked
{
"error" : "invalid_token"
}"
Can you check whether you are putting the client_secret in {} in the POST body?
grant_type=urn:ietf:params:oauth:grant-type:migration:oauth1&client_id=xxxxxxx.apps.googleusercontent.com&client_secret={xxxxxxx}
You will also need to put {} around the client_secret value when you are generating the oauth_signature.
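As a sketch of building that POST body (the credential values are placeholders, matching the example above; note that form-encoding percent-encodes the literal braces, which is expected):

```python
from urllib.parse import urlencode

# Placeholder credentials -- substitute your real values.
client_id = "xxxxxxx.apps.googleusercontent.com"
client_secret = "xxxxxxx"

body = urlencode({
    "grant_type": "urn:ietf:params:oauth:grant-type:migration:oauth1",
    "client_id": client_id,
    # The secret is wrapped in literal braces; the same braced value
    # must also be used when computing the oauth_signature.
    "client_secret": "{%s}" % client_secret,
})
print(body)
```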
We have made a few changes to the validation pieces of the OAuth1->OAuth2 token migration. Would you mind checking your migration flows again and updating this thread with the results?