Factor Analysis in Text Mining task
I was given the task of determining the topics in a large body of text. For example, suppose you have 1 million arbitrary text phrases or sentences. I want to factor out the main topics from this corpus. Ordinary factor analysis works with continuous data. Is there an analogue of factor analysis for text mining tasks? Ideally, I would factorize the large corpus and then select some of the factors (the semantic core), for instance:

F1        F2
topic 1   topic 2
topic 3   topic 4

Or maybe you can help me find the best way to solve my task, i.e., I want to understand what the main topics of interest to people are.

data-mining
asked Feb 6 '15 at 16:05 by Julia
Your question is unclear. You should improve the grammar in your post and add as many details as possible so that it will be easier for someone to help you. – Mike Pierce Feb 6 '15 at 16:09
2 Answers
I did this for Chinese-language text. I am guessing yours is in English? The methods I used may work for you too. First, you need to define the concept of a "topic", which could be a family of related keywords. Second, you need a database of synonyms and antonyms, and of word-form changes such as "calculate, -ing, -ed". Then you count how often each word is used, sort the words by category, and so on. You also need to take out words such as "of" and "with". Hope it helps.
For example, the text of your question contains about 120 words. The counting results show topic (7), factor (6), text (5), task (4), analysis (3). Other words do not occur with high frequency. If you got the above data from a computer and had not read the text, you might guess that it is about "an analysis task of topic or factor of text".
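
The counting procedure described above might be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not the answerer's actual code: the stop-word set and the `WORD_FORMS` lookup table are tiny hypothetical stand-ins for the database of word forms the answer mentions, and `top_terms` is a name invented for this example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; real projects use a fuller one.
STOP_WORDS = {"of", "with", "the", "a", "an", "and", "is", "in", "for", "to"}

# Hypothetical stand-in for a database of word forms: inflected form -> base form.
WORD_FORMS = {
    "calculating": "calculate",
    "calculated": "calculate",
    "topics": "topic",
    "factors": "factor",
}


def top_terms(text, n=5):
    """Return the n most frequent base forms in text, ignoring stop words."""
    # Lowercase the text and keep runs of ASCII letters as words.
    words = re.findall(r"[a-z]+", text.lower())
    # Map each remaining word to its base form (if known) and count it.
    counts = Counter(WORD_FORMS.get(w, w) for w in words if w not in STOP_WORDS)
    return counts.most_common(n)


if __name__ == "__main__":
    sample = ("Topics and factors: the topic of a text, the factor of a text, "
              "the task of text analysis.")
    for term, count in top_terms(sample, 3):
        print(term, count)
```

Running the function over the whole corpus instead of a single sentence gives the kind of frequency table the answer describes. Grouping the counted words into categories would require an additional mapping from words to categories, which is not shown here.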
answered Feb 6 '15 at 16:19 (edited Feb 6 '15 at 16:32) by PdotWang
PdotWang, thank you. Do you know whether your algorithm has been implemented in R? Yes, my text is in English :) – Julia Feb 6 '15 at 16:33
R for programming? Mine is in Python. I am still searching for a mathematical expression for it. No, my work is very shallow so far. I would like to learn from people like you. – PdotWang Feb 6 '15 at 16:35
By a topic I mean a generalization of the themes that people wrote about. For example, from 1,000,000 sentences I may conclude that people wrote about broken parts, changing the treatment plan, and so on. – Julia Feb 6 '15 at 16:36
What do you think about this article: iase-web.org/documents/papers/icots7/5E2_MORI.pdf? Can it help me with my task? – Julia Feb 6 '15 at 16:39
Thanks. It is good. It talks about the post-processing of the search results. – PdotWang Feb 6 '15 at 16:58
For others landing on this page more recently, I wanted to provide an updated reply. If I understand the question correctly, the asker was looking for effective, all-purpose algorithms for sorting and grouping a large corpus of text by common themes or subjects. A few approaches to this task might be useful, such as TF-IDF, Latent Semantic Analysis, Non-negative Matrix Factorization, and Latent Dirichlet Allocation. Also of interest might be keyword extraction and automatic summarization methods such as Rapid Automatic Keyword Extraction (RAKE), TextTeaser, TextRank, and some deep learning or convolutional neural network approaches. Many of these are implemented in R and/or Python. See also Mehdi Allahyari et al., "Text Summarization Techniques: A Brief Survey," arXiv:1707.02268 [cs], July 7, 2017, http://arxiv.org/abs/1707.02268.
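
To give a feel for the first technique in that list, here is a minimal TF-IDF sketch in plain Python. It rests on several assumptions: the tokenizer is a simple regular expression, the weighting is the unsmoothed textbook form (term frequency times log(N / document frequency)), and the function names `tokenize`, `tf_idf`, and `top_keywords` are invented for this example. Library implementations typically use smoothed or normalized variants, so their numbers will differ. Factorization methods such as Latent Semantic Analysis and Non-negative Matrix Factorization are usually applied to a document-term matrix of weights like these, which is what makes them a rough text analogue of factor analysis.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into runs of ASCII letters."""
    return re.findall(r"[a-z]+", text.lower())


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    tf  = term count / number of tokens in the document
    idf = log(N / df), where df is the number of documents containing the term
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: count each term once per document it appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # guard against empty documents
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def top_keywords(documents, n=3):
    """For each document, list its n highest-weighted terms (ties broken alphabetically)."""
    result = []
    for doc_weights in tf_idf(documents):
        ranked = sorted(doc_weights.items(), key=lambda kv: (-kv[1], kv[0]))
        result.append([term for term, _ in ranked[:n]])
    return result


if __name__ == "__main__":
    docs = ["the part is broken", "the plan is changed", "the part was replaced"]
    print(top_keywords(docs, 1))
```

A word that occurs in every document (here "the") receives weight zero, so the highest-weighted terms are those that distinguish one document from the rest.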
answered Mar 28 '18 at 14:00 by Matthew Lavin