Finding email addresses that begin with two given characters and end with given domain names (cell arrays)

Hello everyone, I have the following arrays:
dom_lists={'000' 'hotmal.com';'001' 'gmail.com';'010' 'yahoo.com'; '011' 'mail.com';'100' 'live.com';'101' 'myspace.com';'110' 'msn.com';'111' 'mynet.com'};
sub_g1={'aj';'ih';'vn';'hu';'eg';'is';'rd';'nt';'me';'ah';'zb';'en';'mm'};
sub_g2={'001';'101'; '101';'101'; '000'; '110'; '001'; '111'; '101'; '000'; '110'; '000'; '000'};
list_emialadre={'aakm@hotmail.com';'abomcn@hotmail.com';...............};
This list holds 5408 email addresses grouped by domain name; e.g. hotmail.com has 676 addresses, gmail.com also has 676, and so on.
What I want is, first, to convert each code in sub_g2 into its equivalent domain string from dom_lists. After that, I want to use sub_g1 to find the email addresses that begin with the two characters in sub_g1 and end with the corresponding sub_g2 domain. Please help me solve this problem.

 Accepted Answer

If you split the problem into parts then it is much easier to solve. Here are the data arrays, which I altered, e.g. by adding one sample email address that actually matches the second pair+domain data (otherwise neither of the two email addresses given matches any of them, so we would not get a positive result):
dom_lists = {'001' 'gmail.com'; '000' 'hotmal.com';'010' 'yahoo.com'; '011' 'mail.com';'100' 'live.com';'101' 'myspace.com';'110' 'msn.com';'111' 'mynet.com'};
sub_g1 = {'aj'; 'ih'; 'vn'; 'hu'; 'eg'; 'is'; 'rd'; 'nt'; 'me'; 'ah'; 'zb'; 'en'; 'mm'};
sub_g2 = {'001';'101'; '101';'101'; '000'; '110'; '001'; '111'; '101'; '000'; '110'; '000'; '000'};
email_list = {'aakm@hotmail.com'; 'abomcn@hotmail.com'; 'ihzzz@myspace.com'};
First convert the binary strings into numeric values with bin2dec, as this makes them easier to work with:
sub_N = bin2dec(cell2mat(sub_g2));
dom_N = bin2dec(cell2mat(dom_lists(:,1)));
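To see what this conversion produces, here is the sub_g2 step on its own. cell2mat works here only because every code has exactly three characters, so the codes stack into a 13x3 char matrix; bin2dec then reads each row as one binary number. The variable name codes is just for illustration.

```matlab
% sub_g2 exactly as in the answer above
sub_g2 = {'001';'101';'101';'101';'000';'110';'001';'111';'101';'000';'110';'000';'000'};

codes = cell2mat(sub_g2);   % 13x3 char matrix, one code per row
sub_N = bin2dec(codes);     % 13x1 double, each row read as a binary number

% values: 1 5 5 5 0 6 1 7 5 0 6 0 0
disp(sub_N.')
```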
Then simply match these numeric values using bsxfun, and extract the correct domain strings:
[row,~] = find(bsxfun(@eq,dom_N,sub_N.'));
dom_g2 = dom_lists(row,2);
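As a sketch of two variations on this matching step (neither is part of the answer's code): from R2016b onwards, implicit expansion gives the same logical matrix as bsxfun, and ismember can look up the code strings directly without converting them with bin2dec. The names row_bsx, row_ie, tf and loc are illustrative only.

```matlab
% Data as in the answer above
dom_lists = {'001' 'gmail.com'; '000' 'hotmal.com'; '010' 'yahoo.com'; '011' 'mail.com'; ...
             '100' 'live.com'; '101' 'myspace.com'; '110' 'msn.com'; '111' 'mynet.com'};
sub_g2 = {'001';'101';'101';'101';'000';'110';'001';'111';'101';'000';'110';'000';'000'};

sub_N = bin2dec(cell2mat(sub_g2));
dom_N = bin2dec(cell2mat(dom_lists(:,1)));

% Original approach: 8x13 logical matrix built with bsxfun
[row_bsx, ~] = find(bsxfun(@eq, dom_N, sub_N.'));

% Same matrix using implicit expansion (R2016b or later)
[row_ie, ~] = find(dom_N == sub_N.');

% Alternative: match the code strings directly, no bin2dec needed.
% tf flags codes that are missing from dom_lists; loc is their row in dom_lists.
[tf, loc] = ismember(sub_g2, dom_lists(:,1));
if ~all(tf)
    error('Some sub_g2 codes do not appear in dom_lists.');
end
dom_g2 = dom_lists(loc, 2);
```

One design point: if a code had no match in dom_lists, find would return fewer than 13 rows and the result would no longer line up with sub_g1, whereas ismember keeps one entry per code and reports the missing ones through tf.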
Then create regular expressions from the character pairs and the domain strings (escaping the dots in the domain names), and use regexp with these to locate the matching email addresses:
rgx = strcat('^',sub_g1,'.*@',strrep(dom_g2,'.','\.'),'$');
mtc = cellfun(@(s)regexp(email_list,s),rgx, 'UniformOutput',false);
out = ~cellfun('isempty',[mtc{:}]);
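For reference, here is the whole pipeline as one self-contained script on the sample data above. It is a sketch with two substitutions that are not in the original answer: ismember maps the codes to domains directly, and regexptranslate('escape',...) escapes every regexp metacharacter in the domains rather than only the dot. The 'once' option makes regexp stop at the first match in each address; the name dom_esc is illustrative.

```matlab
% Sample data from the answer above
dom_lists = {'001' 'gmail.com'; '000' 'hotmal.com'; '010' 'yahoo.com'; '011' 'mail.com'; ...
             '100' 'live.com'; '101' 'myspace.com'; '110' 'msn.com'; '111' 'mynet.com'};
sub_g1 = {'aj'; 'ih'; 'vn'; 'hu'; 'eg'; 'is'; 'rd'; 'nt'; 'me'; 'ah'; 'zb'; 'en'; 'mm'};
sub_g2 = {'001';'101';'101';'101';'000';'110';'001';'111';'101';'000';'110';'000';'000'};
email_list = {'aakm@hotmail.com'; 'abomcn@hotmail.com'; 'ihzzz@myspace.com'};

% Map each code to its domain string
[tf, loc] = ismember(sub_g2, dom_lists(:,1));
assert(all(tf), 'Some sub_g2 codes do not appear in dom_lists.');
dom_g2 = dom_lists(loc, 2);

% One anchored pattern per pair+domain: ^<pair>.*@<escaped domain>$
dom_esc = cellfun(@(d) regexptranslate('escape', d), dom_g2, 'UniformOutput', false);
rgx = strcat('^', sub_g1, '.*@', dom_esc, '$');

% Each cell of mtc is a 3x1 cell: [] where the address does not match
mtc = cellfun(@(s) regexp(email_list, s, 'once'), rgx, 'UniformOutput', false);

% Rows = email addresses, columns = pair+domain data
out = ~cellfun('isempty', [mtc{:}]);
```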
The resulting out array, as displayed in the Command Window:
>> out
out =
0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0 0 0 0
out is a logical array in which each row corresponds to one of the given email addresses (only three in the sample data!) and each column corresponds to one pair+domain combination from sub_g1 and sub_g2 (thus thirteen columns). The array shows that the third email address matches the second pair+domain combination ('ih' with myspace.com), which is exactly the test case added at the beginning, so the algorithm has successfully detected this positive result.

7 Comments

Thank you, Stephen Cobeldick, that was my objective, but I still have one problem:
mtc = cellfun(@(s)regexp(emaillist,s),rgx1, 'UniformOutput',false);
out = ~cellfun('isempty',[mtc{:}]);
How can I use the values in out to get the matched email addresses and store them in an array without a loop?
For example:
sub_N = bin2dec(cell2mat(sub_g2));
dom_N = bin2dec(cell2mat(dom_lists(:,1)));
[rowww,~] = find(bsxfun(@eq,dom_N,sub_N.'));
dom_g2 = dom_lists(rowww,2);
rgx1 = strcat(sub_g1,'@',dom_g2,'$');
[malisrow,malistcol]=size(emaillists);
comlett=1;
for mailout=1:malistcol
    for mailin=1:malisrow
        comresult=strncmpi(emaillists(mailin,mailout),rgx1(comlett,mailout),2);
        if(comresult>0)
            k11(comlett,mailout)=emaillists(mailin,mailout)
            comlett=comlett+1;
        end
    end
end
But this code does not work well. Please help me get the found email addresses from the mail-list vector. N.B.: the attached file contains the email addresses.
I wrote and tested some fully functioning code that correctly locates and matches a list of email addresses to the provided data arrays. You changed that code by removing the core function regexp and several cellfun calls, and added a strncmpi call and several for-loops... and now it does not work?
My answer did not use any for loops, and it already searches all of the email addresses: did you read the part of the description "...each row corresponds to one of the given email addresses ... and each column corresponds to the pair+domain data..." ? See, I already explained that all email addresses are accounted for, as are the data that you want to match them with. No loops are required.
If you attach your mail-list data correctly (you will need to push both buttons: Choose file and Attach file) then we can see this working properly.
I understand your explanation, but my point is: how can I get the actual email addresses after generating the pair+domain data? That is why I used the for loop. Please help me again to use the values in out to extract the matching addresses from the original 5408 email addresses. Thank you in advance.
Thank you for uploading your data. I ran my code and it works without error on your whole data list, exactly as I wrote it and without changing a single line of code.
You can simply use find on the output array out:
>> [idx,idy] = find(out)
idx =
686
4272
4616
4259
3607
1122
5090
4373
4032
idy =
1
2
3
4
6
7
8
9
11
Each element of idx is an index into the list of email addresses, and the corresponding element of idy is an index into the pair+domain data. We can check this, for example, by viewing the second value of idx and idy, first the pair+domain data:
>> sub_g1{idy(2)}
ans =
ih
>> dom_g2{idy(2)}
ans =
myspace.com
and then the email data:
>> email_list{idx(2)}
ans =
ihodan@myspace.com
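Note that some of the pair+domain data do not match any email address! The columns missing from idy (5, 10, 12 and 13) are exactly those whose sub_g2 code is '000', which dom_lists maps to 'hotmal.com' (without the i), so most likely their regular expressions simply cannot match any hotmail.com address.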
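To turn out into the actual addresses without an explicit for loop, a sketch along these lines should work. It uses a two-pair subset of the sample data (the first two pair+domain columns) so that it runs on its own; matched and per_pair are illustrative names. Note that arrayfun still iterates internally, it just hides the loop.

```matlab
% Two-pair subset of the sample data (columns 1 and 2 of the answer)
email_list = {'aakm@hotmail.com'; 'abomcn@hotmail.com'; 'ihzzz@myspace.com'};
sub_g1 = {'aj'; 'ih'};
dom_g2 = {'gmail.com'; 'myspace.com'};

% Same matching as in the answer: rows = addresses, columns = pair+domain data
rgx = strcat('^', sub_g1, '.*@', strrep(dom_g2, '.', '\.'), '$');
mtc = cellfun(@(s) regexp(email_list, s), rgx, 'UniformOutput', false);
out = ~cellfun('isempty', [mtc{:}]);

% Row (address) and column (pair+domain) index of every match
[idx, idy] = find(out);

% One row per match: character pair, domain, matching address
matched = [sub_g1(idy), dom_g2(idy), email_list(idx)];

% All matching addresses for each pair+domain column (empty cell if none)
per_pair = arrayfun(@(k) email_list(out(:,k)), 1:size(out,2), 'UniformOutput', false);
```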
Thank you, Stephen Cobeldick, for your continued support.
@abdulkarim hassan: I'm glad to help! On this forum it is also considered polite to accept answers that resolve your questions.
