Factor Analysis in Text Mining task


I am faced with the task of determining the topics of a large text corpus, for example 1 million arbitrary text phrases or sentences. I want to factorize the main topic out of this corpus. Ordinary factor analysis works with continuous data. Is there an analogue of factor analysis for text mining tasks? Ideally, I would factorize the large text corpus and then select some of the factors (the semantic core), for instance:

F1        F2
topic 1   topic 2
topic 3   topic 4

Or maybe you can help me find the best way to solve my task. That is, I want to understand what the main topics of interest to people are.







data-mining






asked Feb 6 '15 at 16:05









Julia








  • Your question is unclear. You should improve the grammar in your post and add as many details as possible so that it will be easier for someone to help you.
    – Mike Pierce, Feb 6 '15 at 16:09














2 Answers
I did this for Chinese-language text; I am guessing yours is in English? The methods I used may work for you too. First of all, you need to define the concept of a "topic", which could be a family of related keywords. Second, you need a database of synonyms and antonyms, and of word-form changes such as "calculate, -ing, -ed". Then you count how often each word is used, sort the counts by category, and so on. You also need to remove words such as "of" and "with". Hope it helps.

For example, the text of your question contains about 120 words. The counts are: topic (7), factor (6), text (5), task (4), analysis (3). The other words do not occur with high frequency. If you got these data from a computer without reading the text, you might guess that it is about "an analysis task of topic or factor of text".
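The counting steps described above could be sketched roughly as follows. This is only an illustration, not the code the answer refers to: the stop-word set, the suffix list, and the names `normalize` and `keyword_counts` are assumptions made up for the example, and the suffix stripping is a very crude stand-in for a real database of word forms (a proper stemmer such as the Porter stemmer would do better).

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real one would be much longer.
STOP_WORDS = {"a", "an", "the", "of", "with", "and", "or", "in", "to", "is", "for", "i"}

# Hypothetical suffix list: a rough substitute for a word-form database.
SUFFIXES = ("ing", "ed", "s")


def normalize(word):
    """Lower-case a word and strip at most one common suffix."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def keyword_counts(text):
    """Count normalized words in the text, skipping stop words."""
    words = re.findall(r"[A-Za-z]+", text)
    return Counter(normalize(w) for w in words if w.lower() not in STOP_WORDS)


if __name__ == "__main__":
    sample = "Counting topics: the topic of a text, the topics of texts, and counted words."
    for word, count in keyword_counts(sample).most_common(3):
        print(word, count)
```

The most frequent normalized words then serve as a first guess at what the text is about. Grouping them into families of related keywords (the "topic" definition above) would still need a synonym resource, which this sketch does not include.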







edited Feb 6 '15 at 16:32

answered Feb 6 '15 at 16:19

PdotWang












  • PdotWang, thank you. Do you know whether your algorithm has been implemented in R? Yes, my language is English :)
    – Julia, Feb 6 '15 at 16:33

  • R for programming? Mine is in Python. I am still searching for a mathematical expression for it. No, my work is very shallow so far. I would like to learn from people like you.
    – PdotWang, Feb 6 '15 at 16:35

  • By a topic I mean a generalization of the themes that people wrote about. For example, from 1,000,000 sentences I might conclude that people wrote about broken parts, changing the treatment plan, and so on.
    – Julia, Feb 6 '15 at 16:36

  • What do you think about this article: iase-web.org/documents/papers/icots7/5E2_MORI.pdf? Can it help me with my task?
    – Julia, Feb 6 '15 at 16:39

  • Thanks. It is good. It talks about the post-processing of search results.
    – PdotWang, Feb 6 '15 at 16:58


















For others landing on this page more recently, I wanted to provide an updated reply. If I understand the question correctly, the asker was looking for effective, all-purpose algorithms for sorting and grouping a large corpus of text by common themes or subjects. A few approaches to this task might be useful, such as TF-IDF, Latent Semantic Analysis (LSA), Non-negative Matrix Factorization (NMF), and Latent Dirichlet Allocation (LDA). Also of interest might be keyword extraction and automatic summarization methods such as Rapid Automatic Keyword Extraction (RAKE), TextTeaser, TextRank, and some deep learning approaches, including convolutional neural networks. Many of these are implemented in R and/or Python. See also Mehdi Allahyari et al., "Text Summarization Techniques: A Brief Survey," arXiv:1707.02268 [cs], July 7, 2017, http://arxiv.org/abs/1707.02268.
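Of the methods listed, Non-negative Matrix Factorization is perhaps the closest in spirit to factor analysis: a documents-by-terms count matrix X is approximated by a product W H of two non-negative matrices, where each row of H can be read as a "topic" (weights over terms) and each row of W gives the topic loadings of one document. The following is a minimal pure-Python sketch using the Lee-Seung multiplicative updates for squared error. The sample sentences, the function names, and the parameter defaults are all assumptions chosen for illustration; for a corpus of a million sentences one would use an optimized library implementation with sparse matrices (scikit-learn, for example, provides an NMF class) rather than nested lists.

```python
import random
import re


def term_document_matrix(docs):
    """Return (vocab, x), where x[i][j] counts term j in document i."""
    tokenized = [re.findall(r"[a-z]+", d.lower()) for d in docs]
    vocab = sorted({w for words in tokenized for w in words})
    index = {w: j for j, w in enumerate(vocab)}
    x = [[0.0] * len(vocab) for _ in docs]
    for i, words in enumerate(tokenized):
        for w in words:
            x[i][index[w]] += 1.0
    return vocab, x


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    cols = list(zip(*b))
    return [[sum(p * q for p, q in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def frobenius_error(x, w, h):
    """Sum of squared differences between x and the product w h."""
    wh = matmul(w, h)
    return sum((xv - av) ** 2 for xr, ar in zip(x, wh) for xv, av in zip(xr, ar))


def nmf(x, k, iterations=200, seed=0, eps=1e-9):
    """Approximate x (docs x terms) by w (docs x k) times h (k x terms)."""
    rng = random.Random(seed)
    n, m = len(x), len(x[0])
    # Strictly positive random start; the seed only makes runs repeatable.
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iterations):
        # H <- H * (W^T X) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, x), matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)] for a in range(k)]
        # W <- W * (X H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(x, ht), matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)] for i in range(n)]
    return w, h


def top_terms(h, vocab, n=3):
    """For each topic (row of h), list the n terms with the largest weights."""
    return [[vocab[j] for j in sorted(range(len(vocab)), key=lambda j: -row[j])[:n]]
            for row in h]


if __name__ == "__main__":
    # Hypothetical mini-corpus loosely based on the themes mentioned in the comments.
    docs = [
        "broken part replaced broken part",
        "part arrived broken",
        "change treatment plan",
        "doctor change treatment plan treatment",
    ]
    vocab, x = term_document_matrix(docs)
    w, h = nmf(x, k=2)
    for number, terms in enumerate(top_terms(h, vocab), start=1):
        print("topic", number, terms)
```

The number of topics k plays the role of the number of factors and has to be chosen by the analyst. In practice the raw counts are often replaced by TF-IDF weights and stop words are removed before factorizing.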







answered Mar 28 '18 at 14:00

Matthew Lavin





























