Python web scraping question

markito

Iron User
3 March 2021
Hi everyone, I have a problem with this code:

Python:
import requests
from bs4 import BeautifulSoup

url = 'https://store.epicgames.com/it/free-games'

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36"}

page = requests.get(url, headers=headers)

soup = BeautifulSoup(page.content, "html.parser")

print(soup.prettify())

The goal is to get the HTML of that URL and, eventually, track this week's free games, but for now I'd first like to understand why the result is this:

Code:
<html>
 <head>
  <meta content="width=device-width, initial-scale=1" name="viewport"/>
  <style>
   @media screen and (max-width:374px){h1{font-size:18px}.cf_challenge_container{margin:25px;padding:25px}.cf_challenge_text{font-size:16px}}@media screen and (min-width:375px) and (max-width:1279px){h1{font-size:30px}.cf_challenge_container{margin:25px;padding:60px}.cf_challenge_text{font-size:16px}}@media screen and (min-width:1280px){h1{font-size:40px}.cf_challenge_container{margin:25px;padding:60px}.cf_challenge_text{font-size:18px}}@media screen and (min-width:1920px){h1{font-size:50px}}.cf_challenge_text_small{font-size:11px}*,:after :before{-webkit-font-smoothing:antialiased;-webkit-touch-callout:none;-moz-osx-font-smoothing:grayscale;box-sizing:border-box;text-size-adjust:100%}body{height:100vh;width:100vw;overflow-x:hidden;font-family:sans-serif;font-weight:400;font-size:14px;line-height:20px;letter-spacing:.2px;color:hsla(0,0%,100%,.5);margin:0;background:#121212;display:flex;flex-direction:column;justify-content:center;align-items:center}.logo{padding-bottom:25px}.logo,section{display:flex;align-items:center;flex-direction:column;text-align:center}h1,p{font-family:Brutal,sans-serif;font-weight:400;padding:0;margin:0}h1{line-height:34px;text-align:center;letter-spacing:-.5px;color:#fff;margin:0 0 20px}p{font-size:14px;line-height:20px;letter-spacing:.2px;color:hsla(0,0%,100%,.5)}strong{font-weight:400;color:#fff}.cf_challenge_container{max-width:430px;min-width:200px;background-color:#202020;font-family:sans-serif;line-height:normal;overflow:auto}.cf_challenge,.cf_challenge_container{display:flex;justify-content:center;flex-direction:column}.cf_challenge{text-align:center;margin:25px 0}.cf_challenge_container hr{border-bottom:0;max-width:500px;opacity:.25}.cf_close_button{background:transparent;border-radius:4px;color:#fff;cursor:pointer;padding:5px;position:absolute;right:15px;top:10px;transition:.1s}.cf_close_button:hover{background:#3b3b3b}.cf_error_container button{background:transparent;border:1px solid 
#000;border-radius:4px;color:#000;cursor:pointer;font-family:sans-serif;font-weight:700;margin:5px;padding:14px 22px}.cf_error_container p{color:#000;font-family:sans-serif;font-size:14px;margin:20px}.cf_error_container{align-items:flex-start;background:#ffa640;border-radius:4px;display:none;justify-content:space-between;margin:auto auto 8px;text-align:left;width:500px}.cf_logo{margin:0 auto;width:41px}
  </style>
  <script type="application/javascript">
   const localeStrings = {
            ar: {
                challengeTitle: 'خطوة واحدة إضافية',
                challengeSubtitle: 'يُرجى إكمال فحص الأمان للمتابعة',
                sessionId: 'مُعرّف الجلسة',
                ipAddress: 'عنوان IP',
            },
            de: {
                challengeTitle: 'Ein letzter Schritt',
                challengeSubtitle: 'Bitte führe eine Sicherheitskontrolle aus, um fortzufahren.',
                sessionId: 'Sitzungs-ID',
                ipAddress: 'IP-Adresse',
            },
            en: {
                challengeTitle: 'One More Step',
                challengeSubtitle: 'Please complete a security check to continue',
                sessionId: 'Session ID',
                ipAddress: 'IP Address',
            },
            'es-ES': {
                challengeTitle: 'Un paso más',
                challengeSubtitle: 'Completa el control de seguridad para continuar',
                sessionId: 'ID de sesión',
                ipAddress: 'Dirección IP',
            },
            'es-MX': {
                challengeTitle: 'Un paso más',
                challengeSubtitle: 'Completa el control de seguridad para continuar',
                sessionId: 'ID de sesión',
                ipAddress: 'Dirección IP',
            },
            fr: {
                challengeTitle: 'Encore une étape',
                challengeSubtitle: "Remplissez l'enquête de sécurité pour continuer",
                sessionId: 'ID de session',
                ipAddress: 'Adresse IP',
            },
            it: {
                challengeTitle: 'Ancora un passo da compiere',
                challengeSubtitle: 'Completa un controllo di sicurezza per continuare',
                sessionId: 'ID della sessione',
                ipAddress: 'Indirizzo IP',
            },
            ja: {
                challengeTitle: 'あともう1ステップ',
                challengeSubtitle: '継続するにはセキュリティチェックを完了してください',
                sessionId: 'セッションID',
                ipAddress: 'IPアドレス',
            },
            ko: {
                challengeTitle: '한 단계가 더 남았습니다',
                challengeSubtitle: '계속하려면 보안 검사를 완료해주세요',
                sessionId: '세션 ID',
                ipAddress: 'IP 주소',
            },
            pl: {
                challengeTitle: 'Jeszcze jeden krok',
                challengeSubtitle: 'Przeprowadź kontrolę bezpieczeństwa, by kontynuować',
                sessionId: 'Identyfikator sesji',
                ipAddress: 'Adres IP',
            },
            'pt-BR': {
                challengeTitle: 'Mais uma etapa',
                challengeSubtitle: 'Complete uma verificação de segurança para continuar',
                sessionId: 'ID da sessão',
                ipAddress: 'Endereço IP',
            },
            ru: {
                challengeTitle: 'Ещё один шаг',
                challengeSubtitle: 'Перед тем как продолжить, завершите проверку безопасности',
                sessionId: 'Идентификатор сеанса',
                ipAddress: 'IP-адрес',
            },
            th: {
                challengeTitle: 'อีกขั้นตอนเดียวเท่านั้น',
                challengeSubtitle: 'โปรดทำการตรวจสอบความปลอดภัยให้เสร็จเพื่อดำเนินการต่อ',
                sessionId: 'ID เซสชัน',
                ipAddress: 'ที่อยู่ IP',
            },
            tr: {
                challengeTitle: 'Son Bir Adım Daha',
                challengeSubtitle: 'Devam etmek için lütfen bir güvenlik kontrolünü tamamla',
                sessionId: 'Oturum NO',
                ipAddress: 'IP Adresi',
            },
            'zh-CN': {
                challengeTitle: '再进行一步操作',
                challengeSubtitle: '请完成安全检查以继续',
                sessionId: '会话 ID',
                ipAddress: 'IP 地址',
            },
            'zh-TW': {
                challengeTitle: '再一個步驟',
                challengeSubtitle: '請完成安全性確認以繼續',
                sessionId: '階段 ID',
                ipAddress: 'IP 位址',
            }
        }

        const getLocaleStrings = (locale = 'en') => {
            switch (locale.toLowerCase()) {
                case 'ar':
                    return localeStrings.ar;
                case 'de-de':
                case 'de':
                    return localeStrings.de;
                case 'en-us':
                case 'en':
                    return localeStrings.en;
                case 'es-es':
                    return localeStrings["es-ES"];
                case 'es-mx':
                    return localeStrings["es-MX"];
                case 'fr':
                case 'fr-fr':
                    return localeStrings.fr;
                case 'it':
                case 'it-it':
                    return localeStrings.it;
                case 'ja':
                case 'ja-jp':
                    return localeStrings.ja;
                case 'ko':
                case 'ko-kr':
                    return localeStrings.ko;
                case 'pl':
                    return localeStrings.pl;
                case 'pt':
                case 'pt-br':
                    return localeStrings["pt-BR"];
                case 'ru':
                case 'ru-ru':
                    return localeStrings.ru;
                case 'th':
                    return localeStrings.th;
                case 'tr':
                    return localeStrings.tr;
                case 'zh':
                case 'zh-cn':
                    return localeStrings["zh-CN"];
                case 'zh-tw':
                    return localeStrings["zh-TW"]
                default: {
                    const hyphIndex = locale.indexOf('-');
                    const localeSub = locale.substring(0, hyphIndex);
                    if (hyphIndex > -1) {
                        return getLocaleStrings(localeSub);
                    } else {
                        return localeStrings.en;
                    }
                }
            }
        }

        document.addEventListener("DOMContentLoaded", function () {
            const els = document.querySelectorAll('[data-t]');
            els.forEach((el) => {
                const tKey = el.getAttribute('data-t');
                const locale = window.navigator.language;
                const loc = getLocaleStrings(locale);
                el.innerText = loc[tKey];
            })
        });
  </script>
 </head>
 <body>
  <div class="cf_challenge_container">
   <div class="cf_challenge_section">
    <div class="logo">
     <img alt="Epic Games Logo" class="cf_logo" src="data:image/svg+xml;base64,PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0iVVRGLTgiIHN0YW5kYWxvbmU9Im5vIj8+...


From what I understand, it's some kind of page asking, in several languages, for an additional security check. I doubt that's the intended result, since the classes I'm looking for aren't there. Also, I have the link open in my browser, and when I press F12 I see something completely different. Since I'm still fairly new to HTML, I don't even know whether the result my code produces is supposed to match what I see on the site with F12 open.

In any case, I'd like to hear what you think, and if I'm stuck behind some kind of automatic check the site performs, maybe there's a way around it. Thanks in advance.
 
Hi, after a quick test I saw that the GET request returns a 403:

[Screenshot: the GET request returning status 403]


which means you don't have permission to view the page (HTTP 403 Forbidden). In fact, if you look closely at the HTML your script prints, it tells you there is a security check to complete before you can access the page (the cf_challenge_* class names suggest it is a Cloudflare challenge page):

[Screenshot from 2023-08-23 08-20-57: the security-check message in the HTML output]
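A minimal sketch of detecting this situation in code before parsing, so the script fails loudly instead of searching for classes that are not there. It assumes the block page keeps the cf_challenge class names seen in the output above (an observation from this thread, not a documented contract), and it uses only the standard library (urllib.request, html.parser) in place of requests. fetch, is_challenge_page and ChallengeDetector are hypothetical helper names.

```python
import urllib.error
import urllib.request
from html.parser import HTMLParser

# Same User-Agent string used in the original question
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36"
)


class ChallengeDetector(HTMLParser):
    """Sets `found` if any tag has a class name starting with 'cf_challenge'."""

    def __init__(self) -> None:
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        for name, value in attrs:
            if name == "class" and value:
                if any(c.startswith("cf_challenge") for c in value.split()):
                    self.found = True


def is_challenge_page(status: int, html: str) -> bool:
    """Heuristic: a 403, or the challenge markup, means we did not get the real page."""
    if status == 403:
        return True
    detector = ChallengeDetector()
    detector.feed(html)
    return detector.found


def fetch(url: str) -> tuple[int, str]:
    """Return (status code, body) without raising on 4xx/5xx.

    Network-level failures (urllib.error.URLError) still propagate.
    """
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status, response.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # HTTPError doubles as a response: its body is the block page itself
        return err.code, err.read().decode("utf-8", "replace")
```

Usage: status, html = fetch("https://store.epicgames.com/it/free-games"), then check is_challenge_page(status, html) before handing html to BeautifulSoup. Given the 403 reported above, it would be expected to return True for this URL.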



You could try registering and making the request with your credentials; maybe then it will respond with 200.
 
markito said:
In any case, I'd like to hear what you think, and if I'm stuck behind some kind of automatic check the site performs, maybe there's a way around it. Thanks in advance.
Hi,
you're going about web scraping the wrong way...
Scraping the rendered web page directly should be considered a last resort, because it is much, much slower and considerably more complex. The first thing to do is check whether the site uses APIs that might not be "protected" (in the browser's developer tools, opened with F12, the Network tab filtered to Fetch/XHR shows the requests the page makes).

In your specific case, you should send a GET request to this address: https://store-site-backend-static-ipv4.ak.epicgames.com/freeGamesPromotions?locale=it&country=IT&allowCountries=IT
to get the clean JSON that the frontend (the web page) then formats.
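A minimal sketch of that approach, under stated assumptions. The key path data.Catalog.searchStore.elements and the nested promotions.promotionalOffers[].promotionalOffers[].discountSetting layout are assumptions about this undocumented endpoint and may change, so print the JSON once and adjust the keys if they differ. It also assumes that a discountPercentage of 0 marks a game that is currently free. fetch_promotions, current_offers and free_games_now are hypothetical names, and urllib.request stands in for requests to keep the example dependency-free.

```python
import json
import urllib.request

# Endpoint quoted in the reply above
API_URL = (
    "https://store-site-backend-static-ipv4.ak.epicgames.com/freeGamesPromotions"
    "?locale=it&country=IT&allowCountries=IT"
)


def fetch_promotions(url: str = API_URL) -> dict:
    """Download the JSON that the store page formats (no HTML parsing involved)."""
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def current_offers(game: dict) -> list:
    """Flatten promotions.promotionalOffers[*].promotionalOffers[*] (ASSUMED layout)."""
    promotions = game.get("promotions") or {}  # may be null for items with no promotion
    return [
        offer
        for group in promotions.get("promotionalOffers") or []
        for offer in group.get("promotionalOffers") or []
    ]


def free_games_now(payload: dict) -> list:
    """Return (title, endDate) pairs for games whose current offer makes them free."""
    # ASSUMPTION: key path of this undocumented endpoint; may change without notice
    elements = payload["data"]["Catalog"]["searchStore"]["elements"]
    result = []
    for game in elements:
        for offer in current_offers(game):
            discount = offer.get("discountSetting") or {}
            # ASSUMPTION: discountPercentage == 0 means the final price is 0% of the original
            if discount.get("discountPercentage") == 0:
                result.append((game.get("title", ""), offer.get("endDate", "")))
                break  # one entry per game is enough
    return result
```

Usage: for title, end in free_games_now(fetch_promotions()): print(title, "free until", end). Upcoming offers (upcomingPromotionalOffers) are deliberately ignored here; they could be read the same way to list next week's games.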