Factor Analysis in Text Mining task

I was faced with the task of determining the topics in a large body of text. For example, suppose you have 1 million arbitrary text phrases or sentences, and you want to extract the main topics from this collection. Ordinary factor analysis works with continuous data. Is there an analog of factor analysis for text mining tasks? Ideally, I would factorize the large text collection and then select some factors (a semantic core), for instance:

F1        F2
topic 1   topic 2
topic 3   topic 4

Or maybe you can help me find the best way to approach this task. That is, I want to understand what the main topics of interest to people are.

asked Feb 6 '15 at 16:05

Julia

Your question is unclear. You should improve the grammar in your post and add as many details as possible so that it will be easier for someone to help you.
– Mike Pierce
Feb 6 '15 at 16:09

data-mining

2 Answers

I did this for Chinese-language text; I'm guessing yours is in English? The methods I used may work for you too. First, you need to define the concept of a "topic"; it could be a family of related keywords. Second, you need a database of synonyms and antonyms, and of word-form changes such as "calculate", "calculating", "calculated". Then you count how often each word (after merging its forms) is used and sort the words by category. You also need to remove function words such as "of" and "with". Hope it helps.
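
A minimal Python sketch of the counting step described above (not PdotWang's actual code). It assumes a tiny hand-made stop-word list and a hypothetical `BASE_FORMS` table standing in for the word-form database; a real system would use a full stop-word list and a proper stemmer or lemmatizer instead.

```python
import re
from collections import Counter

# Illustrative stop-word list (an assumption, far from complete): function
# words such as "of" and "with" that carry no topic information.
STOP_WORDS = {"a", "an", "and", "the", "of", "with", "in", "on", "for",
              "to", "is", "are", "i", "you", "it", "this", "that", "or"}

# Hypothetical form-change table mapping inflected forms to one base word,
# standing in for the synonym/word-form database described above.
BASE_FORMS = {
    "calculating": "calculate", "calculated": "calculate",
    "calculates": "calculate", "topics": "topic",
    "factors": "factor", "tasks": "task", "texts": "text",
}


def tokenize(text):
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def word_frequencies(text):
    """Count words after dropping stop words and merging word forms."""
    counts = Counter()
    for token in tokenize(text):
        if token in STOP_WORDS:
            continue
        counts[BASE_FORMS.get(token, token)] += 1
    return counts


sample = ("Calculating topics of a text: the topic of each text is "
          "calculated from the factors of the text.")
print(word_frequencies(sample).most_common(3))
# → [('text', 3), ('calculate', 2), ('topic', 2)]
```

Merging "calculating" and "calculated" into "calculate" is what lets the two forms add up; without that step each form would be counted separately and neither would rank highly.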

For example, the text of your question contains about 120 words. Counting them gives topic (7), factor (6), text (5), task (4) and analysis (3); the other words occur less often. If a computer gave you these counts and you had not read the text, you might guess that it is about "an analysis task on topics or factors of text".
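
The idea of defining a topic as a family of related keywords and then sorting counts by category could look roughly like this. The topic names and keyword sets are hypothetical (loosely modelled on themes like "broken parts" and "treatment plan"); in practice they would come from a thesaurus or be curated by hand. Each sentence is simply assigned to the family with the most keyword hits, which is a deliberately crude rule.

```python
import re
from collections import Counter

# Hypothetical topic definitions: each topic is a family of related keywords.
TOPIC_KEYWORDS = {
    "broken parts": {"broken", "crack", "cracked", "part", "parts", "repair"},
    "treatment plan": {"treatment", "plan", "therapy", "dose", "change"},
}


def topic_scores(sentence):
    """Count how many words of the sentence fall into each keyword family."""
    words = Counter(re.findall(r"[a-z]+", sentence.lower()))
    return {topic: sum(words[w] for w in keywords)
            for topic, keywords in TOPIC_KEYWORDS.items()}


def tally_topics(sentences):
    """Assign each sentence to its best-scoring topic and count the topics.

    Sentences that match no keyword at all are counted under "other".
    """
    tally = Counter()
    for sentence in sentences:
        scores = topic_scores(sentence)
        best = max(scores, key=scores.get)
        tally[best if scores[best] > 0 else "other"] += 1
    return tally


sentences = [
    "The part arrived broken and needs repair.",
    "Please change my treatment plan.",
    "The cracked parts were replaced.",
    "Great service, thank you!",
]
print(tally_topics(sentences))
```

Here two sentences land in "broken parts", one in "treatment plan" and one in "other". The size of the "other" bucket is a useful signal that the keyword families are missing a theme.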

edited Feb 6 '15 at 16:32

answered Feb 6 '15 at 16:19

PdotWang

PdotWang, thank you. Do you know whether your algorithm has been implemented in R? And yes, my text is in English. :)
– Julia
Feb 6 '15 at 16:33

R, as in the programming language? Mine is in Python. I am still looking for a mathematical formulation of it. No, my work is still very shallow; I would like to learn from people like you.
– PdotWang
Feb 6 '15 at 16:35

By a topic I mean a generalization of the themes people wrote about. For example, from 1,000,000 sentences I might conclude that people wrote about broken parts, changing the treatment plan, and so on.
– Julia
Feb 6 '15 at 16:36

What do you think about this article: iase-web.org/documents/papers/icots7/5E2_MORI.pdf? Can it help with my task?
– Julia
Feb 6 '15 at 16:39

Thanks, it is good. It discusses post-processing of search results.
– PdotWang
Feb 6 '15 at 16:58

For others landing on this page more recently, I wanted to provide an updated reply. If I understand the question correctly, the asker was looking for effective, all-purpose algorithms for sorting and grouping a large corpus of text by common themes or subjects. A few approaches to this task might be useful, such as TF-IDF weighting, Latent Semantic Analysis (LSA), Non-negative Matrix Factorization (NMF), and Latent Dirichlet Allocation (LDA). Also of interest might be keyword-extraction and automatic summarization methods such as Rapid Automatic Keyword Extraction (RAKE), TextTeaser, TextRank, and some deep learning or convolutional neural network approaches. Many of these are implemented in R and/or Python. See also Mehdi Allahyari et al., "Text Summarization Techniques: A Brief Survey," arXiv:1707.02268 [cs], July 7, 2017, http://arxiv.org/abs/1707.02268.
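
Of the methods listed, LSA is the closest counterpart to the factor analysis the question asks about: it applies a truncated SVD to a TF-IDF-weighted document-term matrix, and each singular vector acts as a "factor" over terms. Below is a minimal numpy sketch, assuming whitespace tokenization, plain count-times-log(N/df) TF-IDF and a made-up five-document corpus. For real work, scikit-learn's `TfidfVectorizer`, `TruncatedSVD`, `NMF` and `LatentDirichletAllocation` are the usual route.

```python
import numpy as np


def tfidf_matrix(docs):
    """Build a documents-by-terms TF-IDF matrix from whitespace-split docs.

    Uses raw term counts times log(N / df); other TF-IDF variants exist.
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({w for doc in tokenized for w in doc})
    index = {w: j for j, w in enumerate(vocab)}
    counts = np.zeros((len(docs), len(vocab)))
    for i, doc in enumerate(tokenized):
        for w in doc:
            counts[i, index[w]] += 1
    df = (counts > 0).sum(axis=0)        # documents containing each term
    idf = np.log(len(docs) / df)
    return counts * idf, vocab


def lsa_topics(docs, n_topics=2, n_terms=3):
    """Latent Semantic Analysis: truncated SVD of the TF-IDF matrix.

    Each of the first n_topics right singular vectors plays the role of a
    factor; the terms with the largest absolute loadings describe it.
    """
    X, vocab = tfidf_matrix(docs)
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    topics = []
    for k in range(n_topics):
        # SVD signs are arbitrary, so rank terms by absolute loading.
        top = np.argsort(-np.abs(vt[k]))[:n_terms]
        topics.append([vocab[j] for j in top])
    return topics


docs = [
    "broken engine part",
    "broken engine",
    "broken part",
    "treatment plan",
    "treatment plan dose",
]
for k, terms in enumerate(lsa_topics(docs), start=1):
    print(f"factor {k}: {terms}")
```

On this toy corpus the first factor collects "treatment", "plan" and "dose" and the second collects "broken", "engine" and "part". The order of terms inside a factor can vary, because some loadings are exactly tied.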
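
Among the keyword-extraction methods mentioned, RAKE is simple enough to sketch in a few lines. Candidate phrases are the runs of words between stop words and punctuation. Each word is scored as degree/frequency, where degree sums the lengths of the phrases containing it, and a phrase scores the sum of its word scores. The short stop-word list below is an assumption; RAKE's output depends heavily on which list is used.

```python
import re
from collections import defaultdict

# Illustrative stop-word list (an assumption); RAKE is normally run with a
# much longer list, which strongly affects the phrases it finds.
STOP_WORDS = {"a", "an", "and", "the", "of", "for", "to", "in", "on", "is",
              "are", "was", "were", "with", "my", "it", "this", "that"}


def candidate_phrases(text):
    """Split text into candidate phrases at punctuation and stop words."""
    phrases = []
    for fragment in re.split(r"[.,;:!?()\n]", text.lower()):
        current = []
        for word in re.findall(r"[a-z']+", fragment):
            if word in STOP_WORDS:
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))
    return phrases


def rake(text, top_n=3):
    """Rank candidate phrases by RAKE score (sum of degree/frequency)."""
    phrases = candidate_phrases(text)
    freq = defaultdict(int)
    degree = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)  # co-occurrences, counting itself
    word_score = {w: degree[w] / freq[w] for w in freq}
    scored = {" ".join(p): sum(word_score[w] for w in p)
              for p in set(phrases)}
    return sorted(scored.items(), key=lambda kv: -kv[1])[:top_n]


text = ("The brake pads were worn. "
        "Replace the worn brake pads and check the fluid level.")
print(rake(text))
# → [('worn brake pads', 7.0), ('brake pads', 5.0), ('fluid level', 4.0)]
```

Because degree rewards words that appear in longer phrases, RAKE tends to favour multi-word phrases such as "worn brake pads" over single words.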

answered Mar 28 '18 at 14:00

Matthew Lavin
