Is stochastic gradient descent pseudo-stochastic?


I know that stochastic gradient descent (SGD) randomly chooses one sample at a time to update the weights. An epoch is defined as one pass through all $N$ samples, so with SGD we update the weights $N$ times per epoch.

My confusion: doesn't this mean you have to go through all $N$ samples before you can see the same sample twice? Doesn't that effectively make it pseudo-random rather than stochastic? If the sampling were entirely random, it would be possible to see the same sample more than once before going through all $N$ samples.

      machine-learning neural-networks gradient-descent sgd






asked Mar 20 at 15:14 by Iamanon
edited Mar 20 at 15:50 by Sycorax

          1 Answer
Exhausting all $N$ samples before a sample can repeat means that the draws are not independent of one another. However, the process is still stochastic.

Consider a shuffled deck of cards. You look at the top card, see the $\mathsf{A}\spadesuit$ (Ace of Spades), and set it aside. You'll never see another $\mathsf{A}\spadesuit$ in the rest of the deck. However, you don't know anything about the ordering of the remaining 51 cards, because the deck is shuffled. In this sense, the remainder of the deck still has a random order. The next card could be a $\mathsf{2}\color{red}{\heartsuit}$ or a $\mathsf{J}\clubsuit$. You don't know for sure; all you do know is that the next card isn't the Ace of Spades, because you've put the only $\mathsf{A}\spadesuit$ face-up somewhere else.

In the scenario you outline, you're suggesting looking at the top card and then shuffling it back into the deck. This implies that the probability of seeing the $\mathsf{A}\spadesuit$ is independent of the previously observed cards. Independence of events is an important property in probability theory, but it is not required to define a random process.

You might wonder why a person would want to construct mini-batches using the non-independent strategy. That question is answered here: Why do neural network researchers care about epochs?

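The two sampling schemes contrasted above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not code from the original answer: the function names `epoch_order` and `iid_order` are hypothetical, and the indices stand in for training samples (or cards). The first function deals from a shuffled deck (sampling without replacement); the second puts the card back each time (sampling with replacement).

```python
import random


def epoch_order(n, rng):
    """Sample without replacement: a fresh random permutation of 0..n-1.

    This is the "deal from a shuffled deck" scheme: no index repeats
    until all n have been used, yet the order itself is random.
    """
    order = list(range(n))
    rng.shuffle(order)
    return order


def iid_order(n, rng):
    """Sample with replacement: n independent uniform draws from 0..n-1.

    This is the "put the card back and reshuffle" scheme: an index can
    repeat before every index has been seen.
    """
    return [rng.randrange(n) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed only so the run is reproducible
    n = 10
    for _ in range(3):
        print("without replacement:", epoch_order(n, rng))
    for _ in range(3):
        print("with replacement:   ", iid_order(n, rng))
```

In both schemes every index is equally likely at any given step; what differs is whether a step depends on the steps before it. Each list from `epoch_order` contains every index exactly once but in a different order from epoch to epoch, while the lists from `iid_order` will typically contain repeats and omissions.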













answered Mar 20 at 15:39 by Sycorax
edited Mar 20 at 16:20
